Layers

To specify the architecture of a model, we can create a network by concatenating layers in a list:

from neon.layers import Affine
from neon.initializers import Gaussian
from neon.transforms import Rectlin

init = Gaussian()

# add three affine (all-to-all) layers
layers = []
layers.append(Affine(nout=100, init=init, bias=init, activation=Rectlin()))
layers.append(Affine(nout=50, init=init, bias=init, activation=Rectlin()))
layers.append(Affine(nout=10, init=init, bias=init, activation=Rectlin()))

Each layer has several core methods:

Method                                 Description
configure(self, in_obj)                Define the layer's out_shape and in_shape
allocate(self, shared_outputs=None)    Allocate the output buffer (if needed)
fprop(self, inputs, inference=False)   Forward propagate the activation based on the tensor
                                       inputs. If inference, do not store the outputs.
bprop(self, error)                     Backward propagate the tensor error and return the gradients

During model training, the provided training data is propagated through the model’s layers, calling the configure method to set the appropriate layer shapes. Then, each layer’s allocate method is called to allocate any needed buffers.
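
The following is a minimal sketch of that lifecycle, using only the four methods in the table above. It assumes configure returns the configured layer so the calls can be chained; the names layers, train_set, inputs, and error are placeholders, and this is not the actual Model code:

prev = train_set                    # object providing the input shape
for layer in layers:
    prev = layer.configure(prev)    # set in_shape/out_shape from the previous layer
for layer in layers:
    layer.allocate()                # allocate output buffers if needed

x = inputs
for layer in layers:                # forward pass
    x = layer.fprop(x)

delta = error
for layer in reversed(layers):      # backward pass, propagating gradients layer by layer
    delta = layer.bprop(delta)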

Layer taxonomy

The base classes neon.layers.Layer, neon.layers.ParameterLayer, and neon.layers.CompoundLayer are the classes from which all other layers inherit. These base classes are not meant to be instantiated directly. The figure below is a taxonomy of all the layers implemented in neon ( \(B\leftarrow A\) means that \(B\) inherits from \(A\)).

[Figure: taxonomy of the layers implemented in neon (_images/LayerTaxonomy_v3.gif)]

Layer

Because these layers do not have weights, they do not need to be instantiated with a neon.initializers.Initializer. Below is a table of these layers, their key layer-specific parameters, and a description.

Layer                    Parameters                           Description
neon.layers.Dropout      keep=0.5                             At each fprop call, retains a random
                                                              keep fraction of units
neon.layers.Pooling      fshape, op, strides, padding         Pools over a window fshape (height,
                                                              width) with the operation op
                                                              (either "max" or "avg")
neon.layers.BatchNorm    rho=0.9                              Z-scores each minibatch's input, then
                                                              scales with \(f(z) = \gamma z + \beta\).
                                                              See Ioffe, 2015
neon.layers.LRN          alpha=1, beta=0, ascale=1, bpower=1  Performs local response normalization
                                                              (see Section 3.3 in Krizhevsky, 2012)
neon.layers.Activation   transform                            Applies transform
                                                              (neon.transforms.Transform) to the input
neon.layers.BranchNode                                        Inserts a branching node
                                                              (see Layer containers)
neon.layers.SkipNode                                          Layer that allows pass-through
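
As a quick illustration (the parameter values below are arbitrary examples, not recommendations), these layers are constructed without an initializer:

from neon.layers import Dropout, Pooling, BatchNorm, Activation
from neon.transforms import Rectlin

# example instantiations of weight-less layers
drop = Dropout(keep=0.5)                       # retain half of the units at each fprop
pool = Pooling(fshape=2, op="max", strides=2)  # 2x2 max pooling with stride 2
bnorm = BatchNorm(rho=0.9)                     # batch normalization
act = Activation(transform=Rectlin())          # standalone rectified linear activation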

Parameter Layers

Layers with weights inherit from neon.layers.ParameterLayer, which handles the buffering and tracking of the weight parameters. They should be instantiated with an Initializer (neon.initializers.Initializer). For example,

from neon.layers import Linear
from neon.initializers import Gaussian

layer = Linear(nout=100, init=Gaussian())

Layer                      Parameters                 Description
neon.layers.Linear         nout                       Linear all-to-all layer with nout units
neon.layers.Convolution    fshape, strides, padding   Convolves the input with filters of size
                                                      fshape (height, width, num_filters)
neon.layers.Deconvolution  fshape, strides, padding   Applies deconvolution with filters of size
                                                      fshape
neon.layers.LookupTable    vocab_size, embedding_dim  Embeds input with vocab_size number of
                                                      unique symbols to embedding_dim dimensions
neon.layers.Bias                                      Adds a learned bias to the input
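
For illustration (the shapes and sizes below are arbitrary example values), each of these layers takes an init argument:

from neon.layers import Convolution, LookupTable
from neon.initializers import Gaussian

init = Gaussian(scale=0.01)

# 5x5 convolution producing 32 feature maps (example values)
conv = Convolution(fshape=(5, 5, 32), strides=1, padding=0, init=init)

# embed a 1000-symbol vocabulary into 128 dimensions (example values)
embed = LookupTable(vocab_size=1000, embedding_dim=128, init=init)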

Compound Layers

Filtering or linear layers are often combined with a bias and an activation function. For convenience, neon provides compound layers (neon.layers.CompoundLayer), which are simply lists of layers, to construct these combinations. For example,

from neon.layers import Conv
from neon.initializers import Gaussian, Constant
from neon.transforms import Rectlin

layers = Conv((11, 11, 64), init=Gaussian(scale=0.01), bias=Constant(0),
              activation=Rectlin(), name="myConv")

This code will create a convolution layer, followed by a bias layer and a rectified linear activation layer. The convolution layer takes the provided name "myConv"; by default, the bias layer is named "myConv_bias" and the activation layer "myConv_Rectlin".

Layer                 Description
neon.layers.Affine    Linear -> Bias -> Activation
neon.layers.Conv      Convolution -> Bias -> Activation
neon.layers.Deconv    Deconvolution -> Bias -> Activation
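
To make the correspondence concrete, the sketch below places an Affine compound layer next to the roughly equivalent sequence of simple layers it stands for (illustrative only; in practice the compound layer handles the wiring and naming for you):

from neon.layers import Affine, Linear, Bias, Activation
from neon.initializers import Gaussian, Constant
from neon.transforms import Rectlin

init = Gaussian()

# compound layer
affine = Affine(nout=100, init=init, bias=Constant(0), activation=Rectlin())

# roughly equivalent sequence of simple layers
expanded = [Linear(nout=100, init=init),
            Bias(init=Constant(0)),
            Activation(transform=Rectlin())]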

Recurrent Layers

Recurrent layers inherit from the base class neon.layers.Recurrent. The number of recurrent units is specified by the argument output_size. These layers also require an init argument (an Initializer) for the input-to-hidden connection weights and an activation argument (a Transform) for the hidden-unit activation function. An optional argument, init_inner, initializes the recurrent (hidden-to-hidden) weights; if it is absent, the initializer provided with init is used.

Additional layer-specific parameters are specified below:

Layer                  Parameters       Description
neon.layers.Recurrent                   Recurrent layer with all-to-all connections
neon.layers.LSTM       gate_activation  Long Short-Term Memory (LSTM) implementation
neon.layers.GRU        gate_activation  Gated Recurrent Unit (GRU)

Example of a recurrent layer with tanh units:

from neon.initializers import Uniform, GlorotUniform
from neon.layers import Recurrent, Affine, GRU
from neon.transforms import Tanh, Softmax, Logistic
init = Uniform(low=-0.08, high=0.08)

# Recurrent layer with tanh units
layers = [Recurrent(500, init, activation=Tanh()),
          Affine(1000, init, bias=init, activation=Softmax())]

LSTM layer with embedding for word analysis:

# LSTM layer with embedding layer
from neon.layers import LSTM, RecurrentSum, Dropout

g_uni = GlorotUniform()  # initializer for the LSTM and Affine weights
layers = [
    LSTM(128, g_uni, activation=Tanh(),
         gate_activation=Logistic()),
    RecurrentSum(),
    Dropout(keep=0.5),
    Affine(2, g_uni, bias=GlorotUniform(), activation=Softmax())
]

Network with two stacked GRU layers:

# set common parameters
from neon.layers import LookupTable  # needed for the embedding layer below
hidden_size = 500  # number of recurrent units (example value)
rlayer_params = {"output_size": hidden_size, "init": init,
                 "activation": Tanh(), "gate_activation": Logistic()}

# initialize two GRU layers
rlayer1, rlayer2 = GRU(**rlayer_params), GRU(**rlayer_params)

# build full model
layers = [
    LookupTable(vocab_size=1000, embedding_dim=200, init=init),
    rlayer1,
    rlayer2,
    Affine(1000, init, bias=init, activation=Softmax())
]

Summary layers

A recurrent layer can be followed by layers that collapse its output over the time dimension. These layers have no weights/parameters and therefore do not undergo any learning.

Layer                      Description
neon.layers.RecurrentSum   Sums unit output over time
neon.layers.RecurrentMean  Averages unit output over time
neon.layers.RecurrentLast  Retains output from the last time step only

If a recurrent layer is followed by, for example, an Affine layer, and not one of the above summary layers, then the Affine layer has connections to all the units from the different time steps.
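
For example, the following sketch (with arbitrary layer sizes) keeps only the last time step of the recurrent output before the classifier:

from neon.initializers import Uniform
from neon.layers import Recurrent, RecurrentLast, Affine
from neon.transforms import Tanh, Softmax

init = Uniform(low=-0.08, high=0.08)

# collapse the time dimension by keeping only the final time step
layers = [Recurrent(500, init, activation=Tanh()),
          RecurrentLast(),
          Affine(10, init, bias=init, activation=Softmax())]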