class neon.optimizers.optimizer.Adagrad(stochastic_round=False, learning_rate=0.01, epsilon=1e-06, gradient_clip_norm=None, gradient_clip_value=None, param_clip_value=None, name=None)

Adagrad is an algorithm that adapts the learning rate individually for each parameter by dividing by the $$L_2$$-norm of all previous gradients. Given the parameters $$\theta$$, gradient $$\nabla J$$, learning rate $$\alpha$$, accumulated squared gradient $$G$$, and smoothing factor $$\epsilon$$, we use the update equations:

$G' = G + (\nabla J)^2$
$\theta' = \theta - \frac{\alpha}{\sqrt{G' + \epsilon}} \nabla J$

where the smoothing factor $$\epsilon$$ prevents division by zero. By adjusting the learning rate individually for each parameter, Adagrad adapts to the geometry of the error surface, so differently scaled weights receive appropriately scaled update steps.
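The two update equations can be sketched in plain Python. This is an illustration of the math only, not neon's implementation: the function name `adagrad_step` and the use of Python lists in place of backend tensors are assumptions made for the example.

```python
import math


def adagrad_step(theta, grad, G, learning_rate=0.01, epsilon=1e-6):
    """Apply one Adagrad update element-wise (hypothetical helper).

    theta, grad and G are equal-length lists of floats.
    Returns the updated parameters and the updated accumulator.
    """
    # G' = G + (grad J)^2
    new_G = [acc + g * g for acc, g in zip(G, grad)]
    # theta' = theta - alpha / sqrt(G' + epsilon) * grad J
    new_theta = [
        t - learning_rate / math.sqrt(acc + epsilon) * g
        for t, acc, g in zip(theta, new_G, grad)
    ]
    return new_theta, new_G


# Two parameters whose gradients differ by a factor of 100.
theta0 = [1.0, 1.0]
grad0 = [10.0, 0.1]
theta1, G1 = adagrad_step(theta0, grad0, [0.0, 0.0])
steps = [a - b for a, b in zip(theta0, theta1)]
```

On the first step both entries of `steps` come out close to the learning rate, even though the gradients differ by two orders of magnitude. This is the per-parameter scaling described above.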

Example usage:

from neon.optimizers import Adagrad

# use Adagrad with a learning rate of 0.01
optimizer = Adagrad(learning_rate=0.01)

__init__(stochastic_round=False, learning_rate=0.01, epsilon=1e-06, gradient_clip_norm=None, gradient_clip_value=None, param_clip_value=None, name=None)

Class constructor.

Parameters:
stochastic_round (bool) – Set this to True to perform stochastic rounding using the default width. If False, rounding is to nearest. Only affects the GPU backend.
learning_rate (float) – The multiplication coefficient of updates.
epsilon (float) – Smoothing epsilon to avoid division by zero.
gradient_clip_norm (float, optional) – Target gradient norm. Defaults to None.
gradient_clip_value (float, optional) – Value to element-wise clip gradients. Defaults to None.
param_clip_value (float, optional) – Value to element-wise clip parameters. Defaults to None.

Notes

Only a constant learning rate is currently supported.

Methods

__init__([stochastic_round, learning_rate, …]) – Class constructor.
clip_gradient_norm(param_list, clip_norm) – Returns a scaling factor to apply to the gradients.
clip_value(v[, abs_bound]) – Element-wise clip a gradient or parameter tensor to between -abs_bound and +abs_bound.
gen_class(pdict)
get_description([skip]) – Returns a dict that contains all necessary information needed to serialize this object.
optimize(layer_list, epoch) – Apply the learning rule to all the layers and update the states.
recursive_gen(pdict, key) – Helper method to check whether the definition dictionary is defining a NervanaObject child.

be = None
classnm

Returns the class name.

clip_gradient_norm(param_list, clip_norm)

Returns a scaling factor to apply to the gradients.

The scaling factor is computed such that the root mean squared average of the scaled gradients across all layers will be less than or equal to the provided clip_norm value. This factor is never greater than 1, so it never scales up the gradients.

Parameters:
param_list (list) – List of layer parameters.
clip_norm (float, optional) – Target norm for the gradients. If not provided, the returned scale_factor will equal 1.

Returns: Computed scale factor.
Return type: scale_factor (float)

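The idea of a scale factor that shrinks gradients but never enlarges them can be sketched as follows. This is a minimal sketch, not neon's code: the helper name `clip_norm_scale` is hypothetical, gradients are plain lists of floats rather than layer parameter tuples, and a plain global L2 norm is assumed where neon computes its own norm over the layers.

```python
import math


def clip_norm_scale(layer_grads, clip_norm=None):
    """Return a factor in (0, 1] that caps the global gradient norm.

    layer_grads is a list of per-layer gradient lists (an assumption
    for this sketch). With no clip_norm the factor is 1.
    """
    if not clip_norm:
        return 1.0
    # Assumed norm: L2 norm over every gradient element in every layer.
    grad_norm = math.sqrt(sum(g * g for layer in layer_grads for g in layer))
    # Dividing by the larger of the two keeps the factor at or below 1.
    return clip_norm / max(grad_norm, clip_norm)


grads = [[3.0], [4.0]]                 # global L2 norm is 5
small = clip_norm_scale(grads, 1.0)    # norm exceeds the target: scale down
large = clip_norm_scale(grads, 10.0)   # norm is under the target: unchanged
```

Multiplying every gradient by `small` brings the global norm down to the target, while `large` leaves the gradients as they are.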
clip_value(v, abs_bound=None)

Element-wise clip a gradient or parameter tensor to between -abs_bound and +abs_bound.

Parameters:
v (tensor) – Tensor of gradients or parameters for a single layer.
abs_bound (float, optional) – Value to element-wise clip gradients or parameters. Defaults to None.

Returns: Tensor of clipped gradients or parameters.
Return type: v (tensor)

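Element-wise clipping to a symmetric bound can be illustrated with a short stand-alone function. The name `clip_value_sketch` and the use of a Python list instead of a backend tensor are assumptions for the example. It mirrors the documented behavior only in outline, and returning the input unchanged when `abs_bound` is None is also an assumption.

```python
def clip_value_sketch(v, abs_bound=None):
    """Clamp each element of v to the range [-abs_bound, +abs_bound].

    v is a list of floats standing in for a tensor. If abs_bound is
    None, a copy of the input is returned with no clipping applied.
    """
    if abs_bound is None:
        return list(v)
    return [max(-abs_bound, min(abs_bound, x)) for x in v]


clipped = clip_value_sketch([-3.0, 0.5, 2.0], abs_bound=1.0)
untouched = clip_value_sketch([-3.0, 0.5, 2.0])
```

Values already inside the bound pass through unchanged; only the outliers are pulled back to the nearest limit.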
gen_class(pdict)
get_description(skip=[], **kwargs)

Returns a dict that contains all necessary information needed to serialize this object.

Parameters:
skip (list) – Objects to omit from the dictionary.

Returns: Dictionary format for object information.
Return type: dict

modulenm

Returns the full module path.

optimize(layer_list, epoch)

Apply the learning rule to all the layers and update the states.

Parameters:
layer_list (list) – A list of Layer objects to optimize.
epoch (int) – The current epoch, needed for the Schedule object.

recursive_gen(pdict, key)

Helper method that checks whether the definition dictionary defines a NervanaObject child. If so, it instantiates that object and replaces the dictionary element with the new instance.