neon.layers.recurrent.LSTM

class neon.layers.recurrent.LSTM(output_size, init, init_inner=None, activation=None, gate_activation=None, reset_cells=False, name=None)

Bases: neon.layers.recurrent.Recurrent

Long Short-Term Memory (LSTM) layer based on Hochreiter and Schmidhuber, Neural Computation 9(8): 1735-80 (1997).

Parameters:
  • output_size (int) – Number of hidden/output units
  • init (Initializer) – Function for initializing the model’s input-to-hidden weights. By default, this initializer is also used for the recurrent parameters unless init_inner is specified. Biases are always initialized to zero.
  • init_inner (Initializer, optional) – Function for initializing the model’s recurrent parameters. If absent, defaults to the same initializer provided to init.
  • activation (Transform) – Activation function for the input modulation
  • gate_activation (Transform) – Activation function for the gates
  • reset_cells (bool) – defaults to False, which makes the layer stateful (state persists across minibatches); set to True to make the layer stateless (state is reset at each minibatch)
  • name (str, optional) – name to refer to this layer as.
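
A minimal construction sketch follows; it is illustrative only and assumes neon’s stock GlorotUniform initializer and Tanh/Logistic transforms, which are common but not mandated choices:

    from neon.backends import gen_backend
    from neon.initializers import GlorotUniform
    from neon.layers import LSTM
    from neon.transforms import Logistic, Tanh

    be = gen_backend(backend='cpu', batch_size=32)

    lstm = LSTM(output_size=128,
                init=GlorotUniform(),        # input-to-hidden weights
                init_inner=GlorotUniform(),  # recurrent weights (optional)
                activation=Tanh(),           # input modulation
                gate_activation=Logistic(),  # gate activations
                reset_cells=True,            # stateless across minibatches
                name='lstm1')
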
x

Tensor – input data as a 2D tensor. The dimension is (input_size, sequence_length * batch_size)

W_input

Tensor – Weights on the input units (output_size * 4, input_size)

W_recur

Tensor – Weights on the recursive inputs (output_size * 4, output_size)

b

Tensor – Biases (output_size * 4, 1)
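
The factor of 4 in these shapes stacks the parameters for the three gates and the input modulation along the first axis. Below is a plain NumPy sketch of a single step under this stacked layout; the i/f/o/g gate ordering is an assumption for illustration, not a documented property of the layer:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def lstm_step(x_t, h_prev, c_prev, W_input, W_recur, b):
        out = W_recur.shape[1]
        # all four gate pre-activations in one product: (4*out, batch)
        z = W_input.dot(x_t) + W_recur.dot(h_prev) + b
        i = sigmoid(z[:out])              # input gate
        f = sigmoid(z[out:2 * out])       # forget gate
        o = sigmoid(z[2 * out:3 * out])   # output gate
        g = np.tanh(z[3 * out:])          # input modulation
        c = f * c_prev + i * g            # new cell state
        h = o * np.tanh(c)                # new hidden state
        return h, c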

__init__(output_size, init, init_inner=None, activation=None, gate_activation=None, reset_cells=False, name=None)

Methods

__init__(output_size, init[, init_inner, …])
accumulates(f) Higher order decorator function that enables accumulation functionality for that function.
allocate([shared_outputs]) Allocate output buffer to store activations from fprop.
allocate_deltas(global_deltas)
bprop(deltas[, alpha, beta]) Backpropagate errors, compute output deltas for the previous layer, and calculate updates to the model parameters.
configure(in_obj) Set shape based parameters of this layer given an input tuple, int or input layer.
final_state() Return final state for sequence to sequence models
fprop(inputs[, inference, init_state]) Apply the forward pass transformation to the input data.
gen_class(pdict)
get_description([get_weights, keep_states]) Get layer parameters.
get_final_hidden_error() Return hidden delta after bprop and adjusting for bprop from decoder to encoder in sequence to sequence models.
get_is_mklop() is_mklop being True means this op runs on the MKL backend.
get_param_attrs()
get_params() Get layer parameters, gradients, and states for optimization.
get_params_serialize([keep_states])
get_terminal() Used for recursively getting final nodes from layer containers.
init_buffers(inputs) Initialize buffers for recurrent internal units and outputs.
init_params(shape) Initialize params including weights and biases.
load_weights(pdict[, load_states]) Load weights.
nested_str([level]) Utility function for displaying layer info with a given indentation level.
recursive_gen(pdict, key) Helper method to check whether the definition dictionary defines a NervanaObject child and, if so, instantiate it.
serialize() Get state parameters for this layer.
set_acc_on(acc_on) Set the acc_on flag according to bool argument.
set_batch_size(N) Set minibatch size.
set_deltas(delta_buffers) Use pre-allocated (by layer containers) list of buffers for backpropagated error.
set_is_mklop()
set_next(layer) Set next_layer to provided layer.
set_not_mklop()
set_params(pdict) Set layer parameters (weights).
set_seq_len(S) Set sequence length.
set_states(pdict)
accumulates(f)

Higher-order decorator function that enables accumulation functionality for the decorated function. Objects that use this decorator are required to have an acc_param attribute. This attribute tuple declares the names of the existing temp parameter and real parameter buffers. The temp parameter buffer copies the value of the parameter buffer before f is called, and after f is called the temp and normal buffers are summed. This decorator can be used to wrap any function that may want to accumulate parameters instead of overwriting them.
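
A schematic sketch of this pattern (hypothetical buffer pairs; not neon’s actual implementation, which resolves the buffers from the names declared in acc_param):

    import functools

    def accumulates(f):
        # Schematic only: assumes the (temp, real) buffer pairs declared
        # by acc_param have been resolved to array-like objects; the real
        # decorator also honors the layer's acc_on flag.
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            if not getattr(self, 'acc_on', False):
                return f(self, *args, **kwargs)
            for tmp, param in self.acc_param:
                tmp[:] = param                 # snapshot before f runs
            result = f(self, *args, **kwargs)
            for tmp, param in self.acc_param:
                param[:] += tmp                # accumulate, don't overwrite
            return result
        return wrapper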

allocate(shared_outputs=None)

Allocate output buffer to store activations from fprop.

Parameters:shared_outputs (Tensor, optional) – pre-allocated tensor for activations to be computed into
allocate_deltas(global_deltas)
be = None
bprop(deltas, alpha=1.0, beta=0.0)

Backpropagate errors, compute the output deltas for the previous layer, and calculate the updates to the model parameters.

Parameters:
  • deltas (Tensor) – tensors containing the errors for each step of model unrolling. Expected 2D shape is (output_size, sequence_length * batch_size)
  • alpha (float, optional) – scale to apply to input for activation gradient bprop. Defaults to 1.0
  • beta (float, optional) – scale to apply to output activation gradient bprop. Defaults to 0.0
dW_input

Tensor – input weight gradients

dW_recur

Tensor – recursive weight gradients

db

Tensor – bias gradients

Returns:Backpropagated errors for each time step of model unrolling
Return type:Tensor
classnm

Returns the class name.

configure(in_obj)

Set shape based parameters of this layer given an input tuple, int or input layer.

Parameters:in_obj (int, tuple, Layer, Tensor or dataset) – object that provides shape information for layer
Returns:shape of output data
Return type:(tuple)
final_state()

Return final state for sequence to sequence models

fprop(inputs, inference=False, init_state=None)

Apply the forward pass transformation to the input data. The input data is a list of inputs with an element for each time step of model unrolling.

Parameters:
  • inputs (Tensor) – input data as a 2D tensor, internally converted into a list of 2D slices, one per time step. The dimension is (input_size, sequence_length * batch_size)
  • init_state (Tensor, optional) – starting cell values, if not None. For sequence to sequence models.
  • inference (bool, optional) – Set to True if running inference (forward propagation only, without the associated backward propagation). Default is False.
Returns:LSTM output for each model time step
Return type:Tensor
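
The 2D input packs the time dimension outermost, as described under init_buffers below; a shape-only NumPy sketch of that packing:

    import numpy as np

    input_size, seq_len, batch = 64, 10, 32
    # columns ordered [s1b1, ..., s1bn, s2b1, ..., s2bn, ...]
    x = np.random.rand(input_size, seq_len * batch).astype(np.float32)
    # the slice belonging to time step t is a contiguous view:
    t = 3
    x_t = x[:, t * batch:(t + 1) * batch]   # (input_size, batch)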

gen_class(pdict)
get_description(get_weights=False, keep_states=True)

Get layer parameters. All parameters are needed for optimization, but only weights are serialized.

Parameters:
  • get_weights (bool, optional) – Control whether weight values are included in the returned description.
  • keep_states (bool, optional) – Control whether layer states are included in the returned description.
get_final_hidden_error()

Return hidden delta after bprop and adjusting for bprop from decoder to encoder in sequence to sequence models.

get_is_mklop()

is_mklop being True means this op runs on the MKL backend and may require a conversion when its input comes from a non-MKL op.

get_param_attrs()
get_params()

Get layer parameters, gradients, and states for optimization.

get_params_serialize(keep_states=True)
get_terminal()

Used for recursively getting final nodes from layer containers.

init_buffers(inputs)

Initialize buffers for recurrent internal units and outputs. Buffers are initialized as 2D tensors with the second dimension being steps * batch_size. The second dimension is ordered as [s1b1, s1b2, …, s1bn, s2b1, s2b2, …, s2bn, …]. A list of views is created on the buffer for easy manipulation of the data related to a given time step.

Parameters:inputs (Tensor) – input data as a 2D tensor. The dimension is (input_size, sequence_length * batch_size)
init_params(shape)

Initialize parameters, including weights and biases. The weight and bias matrices are built by concatenating the weights for the inputs, the weights for the recurrent inputs, and the biases.

Parameters:shape (Tuple) – contains number of outputs and number of inputs
load_weights(pdict, load_states=True)

Load weights.

Parameters:
  • pdict (dict) – dictionary of saved layer parameters
  • load_states – (Default value = True)
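
A round-trip sketch, assuming the description dictionary produced by get_description(get_weights=True) is the pdict that load_weights expects (verify against your neon version):

    # lstm is an already-constructed layer (see the construction sketch above)
    pdict = lstm.get_description(get_weights=True, keep_states=True)
    lstm.load_weights(pdict, load_states=True)
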
modulenm

Returns the full module path.

nested_str(level=0)

Utility function for displaying layer info with a given indentation level.

Parameters:level (int, optional) – indentation level
Returns:layer info at the given indentation level
Return type:str
recursive_gen(pdict, key)

Helper method to check whether the definition dictionary defines a NervanaObject child; if so, it will instantiate that object and replace the dictionary element with an instance of that object.

serialize()

Get state parameters for this layer.

Returns:whatever data this model wants to receive in order to restore state
Return type:varies
set_acc_on(acc_on)

Set the acc_on flag according to the bool argument. If set to True, the layer will accumulate some (preset) parameters on calls to functions that are decorated with the accumulates decorator. In order to use this feature, accumulate_updates=True must have been passed to the layer’s allocate function.

This currently only works for a few hard-coded parameters in select layers.

Parameters:acc_on (bool) – Value to set the acc_on flag to.
set_batch_size(N)

Set minibatch size.

Parameters:N (int) – minibatch size
set_deltas(delta_buffers)

Use the pre-allocated (by layer containers) list of buffers for backpropagated error. Deltas are only set, and space only allocated, for layers that own their own deltas (e.g., bias and activation layers work in-place, so they do not own their deltas).

Parameters:delta_buffers (list) – list of pre-allocated tensors (provided by layer container)
set_is_mklop()
set_next(layer)

Set next_layer to provided layer.

Parameters:layer (layer) – Next layer
set_not_mklop()
set_params(pdict)

Set layer parameters (weights). Allocate space for other parameters but do not initialize them.

Parameters:pdict (dict, ndarray) – dictionary or ndarray with layer parameters [support for ndarray is DEPRECATED and will be removed]
set_seq_len(S)

Set sequence length.

Parameters:S (int) – sequence length
set_states(pdict)