# neon.layers.recurrent.GRU

class neon.layers.recurrent.GRU(output_size, init, init_inner=None, activation=None, gate_activation=None, reset_cells=False, name=None)[source]

Implementation of the Gated Recurrent Unit based on [Cho2014].

• It uses two gates: a reset gate (r) and an update gate (z)
• The update gate (z) decides how much the activation is updated
• The reset gate (r) decides how much of the previous activation to forget (fully reset when r = 0)
• The activation (h_t) is a linear interpolation (weighted by z) between the previous activation (h_t-1) and the new candidate activation (h_can)
• r and z are computed the same way, but with different weights
• The gate activation function and the unit activation function are usually different
• The gate activation is usually logistic
• The unit activation is usually tanh
• Internally, the computation is treated as having 3 gates: r, z, and h_can
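The bullets above correspond to the following single-step computation. This is a minimal pure-Python sketch for scalar states, not neon's implementation; it uses the interpolation convention of [Cho2014] (h_t weighted toward h_prev by z), with logistic gates and a tanh unit activation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, Wr, Ur, br, Wz, Uz, bz, Wh, Uh, bh):
    """One GRU step for scalar input and hidden state (illustration only).

    r and z use the logistic gate activation; the candidate uses tanh.
    h_t is a linear interpolation between h_prev and the candidate,
    weighted by z (convention of Cho et al., 2014).
    """
    r = sigmoid(Wr * x + Ur * h_prev + br)                # reset gate
    z = sigmoid(Wz * x + Uz * h_prev + bz)                # update gate
    h_can = math.tanh(Wh * x + Uh * (r * h_prev) + bh)    # candidate activation
    return z * h_prev + (1.0 - z) * h_can                 # interpolate by z
```

With all-zero weights the gates sit at 0.5 and the candidate at 0, so the state stays at zero; nonzero weights move the state toward the tanh-bounded candidate.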
Parameters:

• output_size (int) – Number of hidden/output units
• init (Initializer) – Function for initializing the model's input to hidden weights. By default, this initializer will also be used for recurrent parameters unless init_inner is also specified. Biases will always be initialized to zero.
• init_inner (Initializer, optional) – Function for initializing the model's recurrent parameters. If absent, will default to using the same initializer provided to init.
• activation (Transform) – Activation function for the input modulation
• gate_activation (Transform) – Activation function for the gates
• reset_cells (bool) – Defaults to False, which makes the layer stateful; set to True to make it stateless
• name (str, optional) – Name to refer to this layer as
x

Tensor – Input data tensor (input_size, sequence_length * batch_size)

W_input

Tensor – Weights on the input units (out size * 3, input size)

W_recur

Tensor – Weights on the recursive inputs (out size * 3, out size)

b

Tensor – Biases (out size * 3, 1)

References

• Learning phrase representations using rnn encoder-decoder for statistical machine translation [Cho2014]
• Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling [Chung2014]
[Cho2014] http://arxiv.org/abs/1406.1078
[Chung2014] http://arxiv.org/abs/1412.3555
__init__(output_size, init, init_inner=None, activation=None, gate_activation=None, reset_cells=False, name=None)[source]

Methods

• __init__(output_size, init[, init_inner, …])
• accumulates(f) – Higher-order decorator that enables accumulation functionality for that function.
• allocate([shared_outputs]) – Allocate output buffer to store activations from fprop.
• allocate_deltas(global_deltas)
• bprop(deltas[, alpha, beta]) – Backpropagate errors, output the delta for the previous layer, and calculate the update on model params.
• configure(in_obj) – Set shape-based parameters of this layer given an input tuple, int, or input layer.
• final_state() – Return final state for sequence-to-sequence models.
• fprop(inputs[, inference, init_state]) – Apply the forward pass transformation to the input data.
• gen_class(pdict)
• get_description([get_weights, keep_states]) – Get layer parameters.
• get_final_hidden_error() – Return hidden delta after bprop, adjusting for bprop from decoder to encoder in sequence-to-sequence models.
• get_is_mklop() – is_mklop being true means this op is on the MKL backend.
• get_param_attrs()
• get_params() – Get layer parameters, gradients, and states for optimization.
• get_params_serialize([keep_states])
• get_terminal() – Used for recursively getting final nodes from layer containers.
• init_buffers(inputs) – Initialize buffers for recurrent internal units and outputs.
• init_params(shape) – Initialize params for GRU, including weights and biases.
• load_weights(pdict[, load_states]) – Load weights.
• nested_str([level]) – Utility function for displaying layer info with a given indentation level.
• recursive_gen(pdict, key) – Helper method to check whether the definition dictionary defines a NervanaObject child.
• serialize() – Get state parameters for this layer.
• set_acc_on(acc_on) – Set the acc_on flag according to the bool argument.
• set_batch_size(N) – Set minibatch size.
• set_deltas(delta_buffers) – Use pre-allocated (by layer containers) list of buffers for backpropagated error.
• set_is_mklop()
• set_next(layer) – Set next_layer to the provided layer.
• set_not_mklop()
• set_params(pdict) – Set layer parameters (weights).
• set_seq_len(S) – Set sequence length.
• set_states(pdict)
accumulates(f)

Higher-order decorator that enables accumulation functionality for the wrapped function. Objects that use this decorator are required to have an acc_param attribute. This attribute tuple declares the names of the existing temporary parameter buffer and the real parameter buffer. The temporary buffer copies the value of the parameter buffer before f is called; after f is called, the temporary and normal buffers are summed. This decorator can be used to wrap any function that may want to accumulate parameters instead of overwriting them.
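The copy-then-sum behavior described above can be sketched as follows. This is a simplified stand-in, not neon's implementation: plain Python lists replace neon tensor buffers, and the `ToyLayer` class and its `dW`/`dW_temp` names are illustrative assumptions; only the `acc_param` attribute convention comes from the docstring.

```python
import functools

def accumulates(f):
    """Sketch of a copy-then-sum accumulation decorator (simplified).

    Assumes the object exposes acc_param = (temp_name, param_name), naming a
    temporary buffer and the real parameter buffer (plain lists here).
    """
    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        temp_name, param_name = self.acc_param
        if getattr(self, "acc_on", False):
            # Snapshot the parameter buffer before f overwrites it.
            setattr(self, temp_name, list(getattr(self, param_name)))
        result = f(self, *args, **kwargs)
        if getattr(self, "acc_on", False):
            # Sum the snapshot back into the freshly overwritten buffer.
            param = getattr(self, param_name)
            for i, v in enumerate(getattr(self, temp_name)):
                param[i] += v
        return result
    return wrapper

class ToyLayer:
    """Hypothetical layer with one accumulated gradient buffer, dW."""
    acc_param = ("dW_temp", "dW")

    def __init__(self):
        self.dW = [0.0, 0.0]
        self.acc_on = True

    @accumulates
    def update(self, grads):
        # Overwrites dW; with acc_on, the decorator turns this into accumulation.
        self.dW = list(grads)
```

With `acc_on` set, successive `update` calls sum into `dW` instead of replacing it; with `acc_on` off, the wrapped function's overwriting behavior is unchanged.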

allocate(shared_outputs=None)[source]

Allocate output buffer to store activations from fprop.

Parameters: shared_outputs (Tensor, optional) – pre-allocated tensor for activations to be computed into
allocate_deltas(global_deltas)
be = None
bprop(deltas, alpha=1.0, beta=0.0)[source]
Backpropagation of errors: output the delta for the previous layer and calculate the update on model params.

Parameters:

• deltas (Tensor) – error tensors for each time step of unrolling
• alpha (float, optional) – scale to apply to input for activation gradient bprop. Defaults to 1.0
• beta (float, optional) – scale to apply to output activation gradient bprop. Defaults to 0.0
dW_input

dW_recur

db

Returns: Backpropagated errors for each time step of model unrolling (Tensor)
classnm

Returns the class name.

configure(in_obj)

Set shape based parameters of this layer given an input tuple, int or input layer.

Parameters: in_obj (int, tuple, Layer, Tensor or dataset) – object that provides shape information for layer

Returns: shape of output data (tuple)
final_state()

Return final state for sequence to sequence models

fprop(inputs, inference=False, init_state=None)[source]
Apply the forward pass transformation to the input data. The input data is a list of inputs with an element for each time step of model unrolling.

Parameters:

• inputs (Tensor) – input data as a 3D tensor, then converted into a list of 2D tensors. The dimension is (input_size, sequence_length * batch_size)
• inference (bool, optional) – Set to True if you are running inference (only care about forward propagation without associated backward propagation). Default is False.

Returns: GRU output for each model time step (Tensor)
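The unrolling pattern described above can be sketched generically. This is not neon's actual fprop; `step_fn` is a hypothetical stand-in for the GRU cell computation, and the state threading is the only point being illustrated.

```python
def fprop_sketch(inputs_by_step, init_state, step_fn):
    """Apply step_fn at each time step, threading the hidden state through,
    and collect the per-step outputs (the shape of recurrent unrolling)."""
    h = init_state
    outputs = []
    for x_t in inputs_by_step:
        h = step_fn(x_t, h)   # new state depends on input and previous state
        outputs.append(h)     # one output per time step of unrolling
    return outputs
```

For example, with the toy step function `lambda x, h: x + h`, inputs `[1, 2, 3]`, and initial state `0`, the per-step outputs are the running sums `[1, 3, 6]`.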
gen_class(pdict)
get_description(get_weights=False, keep_states=True)

Get layer parameters. All parameters are needed for optimization, but only weights are serialized.

Parameters:

• get_weights (bool, optional) – Control whether all parameters are returned or just weights for serialization.
• keep_states (bool, optional) – Control whether layer states are also included.
get_final_hidden_error()

Return hidden delta after bprop and adjusting for bprop from decoder to encoder in sequence to sequence models.

get_is_mklop()

is_mklop being true means this op runs on the MKL backend and may require conversion when receiving input from a non-MKL op

get_param_attrs()
get_params()

Get layer parameters, gradients, and states for optimization.

get_params_serialize(keep_states=True)
get_terminal()

Used for recursively getting final nodes from layer containers.

init_buffers(inputs)

Initialize buffers for recurrent internal units and outputs. Buffers are initialized as 2D tensors with the second dimension being steps * batch_size. The second dimension is ordered as [s1b1, s1b2, …, s1bn, s2b1, s2b2, …, s2bn, …]. A list of views is created on the buffer for easy manipulation of data related to a certain time step.

Parameters: inputs (Tensor) – input data as 2D tensor. The dimension is (input_size, sequence_length * batch_size)
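The column ordering and per-step views described above can be illustrated with plain Python lists standing in for the 2D buffer; the sizes here are arbitrary examples, not anything neon prescribes.

```python
# A (input_size, seq_len * batch) buffer: step s owns the contiguous column
# block [s * batch, (s + 1) * batch), i.e. columns ordered s1b1..s1bn, s2b1..
input_size, seq_len, batch = 2, 3, 4

# Row-major "tensor": each cell records its (row, column) position.
flat = [[(row, col) for col in range(seq_len * batch)]
        for row in range(input_size)]

# One (input_size, batch) view per time step, sliced out of the flat buffer.
steps = [[row[s * batch:(s + 1) * batch] for row in flat]
         for s in range(seq_len)]
```

Each element of `steps` is the slice of the buffer for one time step, which is what the list of views gives fprop/bprop cheap access to.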
init_params(shape)[source]

Initialize params for GRU, including weights and biases. The weight matrix and bias matrix are built by concatenating the input weights, the recurrent weights, and the bias. The shape of the concatenated weights is (number of inputs + number of outputs + 1) by (number of outputs * 3).

Parameters: shape (Tuple) – contains number of outputs and number of inputs
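The concatenated layout described above implies the following shape arithmetic. This sketches only the bookkeeping; the helper name is hypothetical and tuples stand in for neon tensors.

```python
def gru_param_shapes(nout, nin):
    """Shapes implied by the concatenated GRU parameter layout: input
    weights, recurrent weights, and bias stacked along one axis, with the
    three gates (r, z, h_can) multiplying the output dimension."""
    W_input_shape = (nout * 3, nin)   # weights on the input units
    W_recur_shape = (nout * 3, nout)  # weights on the recurrent inputs
    b_shape = (nout * 3, 1)           # biases
    # Concatenated: (number of inputs + number of outputs + 1) x (nout * 3)
    concat_shape = (nin + nout + 1, nout * 3)
    return W_input_shape, W_recur_shape, b_shape, concat_shape
```

For example, with 2 inputs and 4 outputs the concatenated parameter block is 7 x 12: rows for 2 input weights, 4 recurrent weights, and 1 bias; columns for 4 outputs times the 3 gates.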
load_weights(pdict, load_states=True)

Parameters:

• pdict (dict) – layer parameters to load
• load_states – (Default value = True)
modulenm

Returns the full module path.

nested_str(level=0)

Utility function for displaying layer info with a given indentation level.

Parameters: level (int, optional) – indentation level

Returns: layer info at the given indentation level (str)
recursive_gen(pdict, key)

Helper method to check whether the definition dictionary defines a NervanaObject child; if so, it instantiates that object and replaces the dictionary element with an instance of that object.

serialize()

Get state parameters for this layer.

Returns: Whatever data this model wants to receive in order to restore state (return type varies)
set_acc_on(acc_on)

Set the acc_on flag according to the bool argument. If set to True, the layer will accumulate some (preset) parameters on calls to functions that are decorated with the accumulates decorator. In order to use this feature, accumulate_updates=True must have been passed to the layer's allocate function.

This currently only works for a few hard-coded parameters in select layers.

Parameters: acc_on (bool) – Value to set the acc_on flag to.
set_batch_size(N)

Set minibatch size.

Parameters: N (int) – minibatch size
set_deltas(delta_buffers)

Use the pre-allocated (by layer containers) list of buffers for backpropagated error. Deltas are only set for layers that own their own deltas, and space is only allocated if the layer owns its own deltas (e.g., bias and activation layers work in-place, so they do not own their deltas).

Parameters: delta_buffers (list) – list of pre-allocated tensors (provided by layer container)
set_is_mklop()
set_next(layer)

Set next_layer to provided layer.

Parameters: layer (layer) – Next layer
set_not_mklop()
set_params(pdict)

Set layer parameters (weights). Allocate space for other parameters but do not initialize them.

Parameters: pdict (dict, ndarray) – dictionary or ndarray with layer parameters [support for ndarray is DEPRECATED and will be removed]
set_seq_len(S)

Set sequence length.

Parameters: S (int) – sequence length
set_states(pdict)