Some optimizers, such as `GradientDescentMomentum`, and any custom optimizers you may create, can also be initialized with a learning rate schedule to adjust the learning rate over training epochs.
| Schedule class | Description |
|---|---|
| `Schedule` | Constant or step learning rate |
| `ExpSchedule` | Exponential decay |
| `PolySchedule` | Polynomial learning rate |
The `Schedule` class can be configured to implement a constant, linear, or step learning rate. By default, the schedule is a constant learning rate.
```python
# Constant learning rate of 0.01 across training epochs
optimizer = GradientDescentMomentum(0.01, 0.9, schedule=Schedule())
```
To set a step schedule, pass the arguments `step_config` and `change`. The schedule will scale the learning rate by the corresponding entry of `change` at each epoch listed in `step_config`. For example, the following call:
```python
# Lower the LR to 0.6 at step 2, and 0.4 at step 6.
schedule = Schedule(step_config=[2, 6], change=[0.6, 0.4])

# Use learning rate of 1.0
optimizer = GradientDescentMomentum(1.0, 0.9, schedule=schedule)
```
yields the learning rate schedule below:
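In case the plot is not visible here, the step behavior can be made concrete with a small stand-alone sketch. This is an illustrative re-implementation of the rule described above (the function name `step_schedule` is ours, not neon's API): once the epoch reaches `step_config[i]`, the base rate is scaled by `change[i]`.

```python
def step_schedule(base_lr, epoch, step_config, change):
    # Illustrative sketch of a step schedule: when `epoch` has reached
    # step_config[i], the base rate is scaled by change[i] (the most
    # recently passed step wins).
    lr = base_lr
    for step, scale in zip(step_config, change):
        if epoch >= step:
            lr = base_lr * scale
    return lr

# With a base rate of 1.0: epochs 0-1 use 1.0, epochs 2-5 use 0.6, epochs 6+ use 0.4
rates = [step_schedule(1.0, e, [2, 6], [0.6, 0.4]) for e in range(8)]
print(rates)  # [1.0, 1.0, 0.6, 0.6, 0.6, 0.6, 0.4, 0.4]
```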
To set a decaying schedule, use `ExpSchedule` and pass the decay rate `decay`. This schedule implements

\[\alpha(t) = \frac{\alpha_\circ}{1 + \beta t},\]

where \(\beta\) is the decay rate, and \(\alpha_\circ\) is the initial learning rate.
```python
# Blue line
s = ExpSchedule(decay=0.1)

# Green line
s = ExpSchedule(decay=0.3)

# Red line
s = ExpSchedule(decay=0.7)
```
yields different decay rates:
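Without the plot, the three settings can still be compared by evaluating the decay formula directly. The sketch below assumes the schedule has the inverse-time form \(\alpha(t) = \alpha_\circ / (1 + \beta t)\); the helper `exp_schedule` is ours for illustration, not neon's API.

```python
def exp_schedule(alpha0, epoch, decay):
    # Assumed form of the decaying schedule: alpha(t) = alpha_0 / (1 + decay * t)
    return alpha0 / (1.0 + decay * epoch)

# Tabulate the first few epochs for each decay rate, starting from alpha_0 = 1.0
for decay in (0.1, 0.3, 0.7):
    curve = [round(exp_schedule(1.0, t, decay), 3) for t in range(5)]
    print(decay, curve)
```

Larger `decay` values shrink the learning rate faster, which is exactly the difference between the blue, green, and red curves.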
A polynomial schedule takes as input the total number of epochs \(T\) and a power \(\beta\), and produces the learning schedule

\[\alpha(t) = \alpha_\circ \left(1 - \frac{t}{T}\right)^\beta,\]

where \(\alpha_\circ\) is the initial learning rate. For example,
```python
schedule = PolySchedule(total_epochs=10, power=0.7)
```
yields (with the initial learning rate set at 1.0):
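Assuming the schedule follows the common polynomial-decay form \(\alpha(t) = \alpha_\circ (1 - t/T)^\beta\), the per-epoch rates for this configuration can be tabulated with a small stand-alone sketch (`poly_schedule` is an illustrative helper, not neon's API):

```python
def poly_schedule(alpha0, epoch, total_epochs, power):
    # Assumed polynomial decay: alpha(t) = alpha_0 * (1 - t/T) ** power
    return alpha0 * (1.0 - float(epoch) / total_epochs) ** power

# Initial learning rate 1.0, total_epochs=10, power=0.7
print([round(poly_schedule(1.0, t, 10, 0.7), 3) for t in range(11)])
```

The rate starts at \(\alpha_\circ\), decays gradually at first, then drops more steeply, reaching zero at epoch \(T\).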