Learning schedules

Some optimizers (such as neon.optimizers.GradientDescentMomentum), as well as any custom optimizers you create, can be initialized with a learning rate schedule that adjusts the learning rate over training epochs.

Function                        Description
neon.optimizers.Schedule        Constant or step learning rate
neon.optimizers.ExpSchedule     Exponential decay
neon.optimizers.PolySchedule    Polynomial learning rate

Schedule

This class can be configured to implement a constant, linear, or step learning rate. By default, Schedule keeps the learning rate constant.

from neon.optimizers import GradientDescentMomentum, Schedule

# Constant learning rate of 0.01 across training epochs
optimizer = GradientDescentMomentum(0.01, 0.9, schedule=Schedule())

To set a step schedule, pass the arguments step_config and change. The schedule multiplies the learning rate by change at each epoch number listed in step_config. For example, the following call:

# Lower the LR to 0.6 at epoch 2, and to 0.4 at epoch 6.
schedule = Schedule(step_config=[2, 6], change=[0.6, 0.4])

# Use learning rate of 1.0
optimizer = GradientDescentMomentum(1.0, 0.9, schedule=schedule)

yields the learning rate schedule below:

_images/docs_step_schedule.png
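The step behaviour described above can be sketched in plain Python. This is only an illustration of the example's comment, not neon's implementation: the helper name step_learning_rate is hypothetical, epochs are assumed to be counted from 0, and each entry of change is assumed to scale the initial learning rate from the matching epoch onward.

```python
def step_learning_rate(initial_lr, epoch, step_config, change):
    """Hypothetical helper: learning rate in effect at `epoch`.

    Assumption: each entry of `change` is the factor applied to the
    initial learning rate from the matching epoch in `step_config` onward.
    """
    lr = initial_lr
    for step_epoch, factor in zip(step_config, change):
        if epoch >= step_epoch:
            lr = initial_lr * factor
    return lr


# Same settings as the example: initial rate 1.0, steps at epochs 2 and 6.
step_rates = [step_learning_rate(1.0, e, [2, 6], [0.6, 0.4]) for e in range(8)]
print(step_rates)
# → [1.0, 1.0, 0.6, 0.6, 0.6, 0.6, 0.4, 0.4]
```

Under these assumptions the rate stays at 1.0 for epochs 0 and 1, drops to 0.6 at epoch 2, and drops to 0.4 at epoch 6, which matches the comment in the example.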

ExpSchedule

To set a decaying schedule, use ExpSchedule and pass the decay rate as the decay argument. This schedule implements

\[\alpha(t) = \frac{\alpha_\circ}{1 + \beta t}\]

where \(\beta\) is the decay rate, and \(\alpha_\circ\) is the initial learning rate.

# Blue line
s = ExpSchedule(decay=0.1)

# Green line
s = ExpSchedule(decay=0.3)

# Red line
s = ExpSchedule(decay=0.7)

These three settings yield the decay curves below:

_images/docs_expSchedule.png
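To make the formula concrete, the sketch below evaluates it directly for the three decay rates used above. It does not call neon; the helper name exp_schedule_rate is hypothetical, and an initial learning rate of 1.0 and epochs counted from 0 are assumptions made for illustration.

```python
def exp_schedule_rate(initial_lr, decay, epoch):
    """Hypothetical helper: alpha(t) = alpha_0 / (1 + beta * t)."""
    return initial_lr / (1.0 + decay * epoch)


# Compare the three decay rates from the example over the first five epochs.
# Assumption: initial learning rate of 1.0.
curves = {
    decay: [exp_schedule_rate(1.0, decay, t) for t in range(5)]
    for decay in (0.1, 0.3, 0.7)
}
for decay, rates in curves.items():
    print(decay, [round(r, 3) for r in rates])
```

Every curve starts at the initial learning rate at epoch 0, and a larger decay value lowers the learning rate faster.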

PolySchedule

A polynomial schedule takes as input the total number of epochs \(T\) and a power \(\beta\), and produces the learning schedule:

\[\alpha(t) = \alpha_\circ \times\left(1-\frac{t}{T}\right)^\beta\]

where \(\alpha_\circ\) is the initial learning rate. For example,

schedule = PolySchedule(total_epochs=10, power=0.7)

yields (with the initial learning rate set at 1.0):

_images/docs_poly_schedule.png
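The polynomial formula can likewise be evaluated by hand. The sketch below is a direct transcription of the equation, not neon's code; the helper name poly_schedule_rate is hypothetical, and epochs are assumed to run from 0 to T.

```python
def poly_schedule_rate(initial_lr, epoch, total_epochs, power):
    """Hypothetical helper: alpha(t) = alpha_0 * (1 - t / T) ** beta."""
    return initial_lr * (1.0 - epoch / float(total_epochs)) ** power


# Same settings as the example: T = 10, beta = 0.7, initial rate 1.0.
poly_rates = [poly_schedule_rate(1.0, t, 10, 0.7) for t in range(11)]
for t, rate in enumerate(poly_rates):
    print(t, round(rate, 3))
```

The rate starts at the initial learning rate at t = 0 and reaches 0 at t = T. The sketch evaluates only 0 through T, because for t > T the base becomes negative and a fractional power of it is not a real number.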