Design Decisions

Computation backend

  • All objects inherit from NervanaObject, which has a static be attribute referencing the computation backend in use (gpu, mkl, or cpu).
    • be also stores other important attributes, such as the batch size and data type.
    • A backend must be generated with gen_backend before a model can be run, as in the sketch below.
    • When swapping backends, buffers must be reinitialized by reinstantiating the model layers and calling fprop with the newly generated backend.
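
A minimal sketch of backend generation (the backend choice and batch size here are illustrative):

    from neon.backends import gen_backend

    # Generate a CPU backend; every NervanaObject now sees it as `be`.
    be = gen_backend(backend='cpu', batch_size=128)
    print(be.bsz)  # the batch size is stored on the backend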

Data Layout

Neon’s layers internally store data as two-dimensional tensors. For convolution and pooling layers, the data is formatted in \((C, H, W, N)\) layout (\(C\) = channels, \(H\) = height, \(W\) = width, \(N\) = batch size), and represented as a tensor of shape \((F, N)\), where \(F = C \times H \times W\).
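
As a plain NumPy illustration of this flattening (not part of neon's API; the dimensions are arbitrary):

    import numpy as np

    C, H, W, N = 3, 8, 8, 16              # channels, height, width, batch
    x = np.arange(C * H * W * N).reshape(C, H, W, N)
    flat = x.reshape(C * H * W, N)        # shape (F, N) with F = C*H*W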

For recurrent layers, the time dimension \(T\) is added to the \(N\) dimension, so the data format is \((F, T \times N)\). The second dimension is ordered by incrementing the batch index first: \(t_1 n_1, t_1 n_2, \ldots, t_1 n_N, t_2 n_1, t_2 n_2, \ldots\)
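
Continuing the NumPy illustration, folding time into the batch dimension reproduces this ordering (batch index varying fastest):

    import numpy as np

    F, T, N = 4, 5, 16
    x = np.zeros((F, T, N))
    # A C-ordered reshape over the last two axes yields columns in the
    # order t1n1, t1n2, ..., t1nN, t2n1, t2n2, ...
    flat = x.reshape(F, T * N)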

Layers

  • Most layers are in layer.py, recurrent layers are in recurrent.py, and merge layers for concatenating or summing inputs are in merge.py.

Composite layers

  • For convenience, some layers are composite layers built as lists of other layers.
    • Conv is a list of Convolution, Bias, and Activation layers
    • Affine is a list of Linear, Bias, and Activation layers
    • This provides the flexibility to add an optional bias and activation without specifying them as separate layers (see the sketch after this list).
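
A minimal sketch of the composite layers, assuming a backend has already been generated (the filter shapes and layer sizes are illustrative):

    from neon.initializers import Gaussian, Constant
    from neon.transforms import Rectlin
    from neon.layers import Conv, Affine

    init = Gaussian(scale=0.01)
    layers = [
        # Conv expands to Convolution + Bias + Activation
        Conv(fshape=(5, 5, 16), init=init, bias=Constant(0.0),
             activation=Rectlin()),
        # Affine expands to Linear + Bias + Activation
        Affine(nout=10, init=init, bias=Constant(0.0),
               activation=Rectlin()),
    ]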

Layer buffer allocations

  • Data buffers
    • A layer infers input shape from previous layers and initializes buffers accordingly.
    • Pre-allocating activation buffers allows buffer reuse and reduces memory usage.
    • Buffers will be reinitialized during the next fprop if the layer is reinstantiated.
  • Parameter layers (Linear, Bias, Convolution, and BatchNorm) maintain their own parameters W, gradients dW, and states (for the optimizer).
  • In general, layer buffer allocation is kicked off by the containing model prior to the first fit or eval call, as the sketch below illustrates.
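
A sketch of this deferred allocation, reusing `init` from the earlier snippet (the layer size is illustrative):

    from neon.layers import Linear

    lin = Linear(nout=10, init=init)
    # No input shape is supplied here: the layer infers it from the
    # previous layer, and the containing model triggers buffer allocation
    # before the first fit or eval call. Until then the W and dW buffers
    # are not allocated.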

Initialization

  • Weight initialization routines are in initializers.py; each has a fill method that describes how it fills a given parameter buffer.
  • The weight initialization object is passed to the layer constructor, and the layer fills its parameters during init_params, as sketched below.
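
A minimal sketch of the fill interface, assuming the backend `be` generated earlier (the buffer shape is illustrative):

    from neon.initializers import Gaussian

    w = be.empty((512, 128))               # a parameter buffer on the backend
    Gaussian(loc=0.0, scale=0.01).fill(w)  # fill in place with samples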

Models

Model container

  • The model is a container for all of the network layers and provides the function calls to run and train the network. It is also responsible for initializing and allocating layer parameter buffers.
  • We create a list of layers and pass it to the model.
  • When the forward or backward propagation functions are called, the model iterates through the layers, forward-passing the inputs and backward-passing the errors (see the conceptual sketch below).
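
Conceptually (this is a sketch of the idea, not neon's actual implementation), the propagation loops look like:

    def fprop(layers, x):
        # forward pass: each layer consumes the previous layer's output
        for layer in layers:
            x = layer.fprop(x)
        return x

    def bprop(layers, delta):
        # backward pass: errors flow through the layers in reverse
        for layer in reversed(layers):
            delta = layer.bprop(delta)
        return delta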

Learning

  • When training the model, the following components must be provided:
    • a training set object that can iterate over the training data
    • an optimizer applied to all layer updates, or a multi-optimizer that maps different optimizers to different layers by layer name
    • a cost function to compute the error
    • a callbacks object that configures whether to use a validation set, how frequently to validate during training, whether to display a progress bar, etc. For more information, see neon fundamentals – callbacks.
  • During update, the model sends the optimizer a list of all layers with learnable parameters.
    • The optimizer then grabs a tuple of (W, dW, states) from each layer and applies the updates. An end-to-end sketch follows below.
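
An end-to-end training sketch with synthetic data; the shapes, layer sizes, and hyperparameters are illustrative assumptions:

    import numpy as np
    from neon.backends import gen_backend
    from neon.data import ArrayIterator
    from neon.initializers import Gaussian
    from neon.layers import Affine, GeneralizedCost
    from neon.models import Model
    from neon.optimizers import GradientDescentMomentum, MultiOptimizer
    from neon.transforms import Rectlin, Softmax, CrossEntropyMulti
    from neon.callbacks.callbacks import Callbacks

    be = gen_backend(backend='cpu', batch_size=32)

    X = np.random.rand(320, 100)            # 320 samples, 100 features
    y = np.random.randint(10, size=320)     # integer labels for 10 classes
    train_set = ArrayIterator(X, y, nclass=10)

    init = Gaussian(scale=0.01)
    model = Model(layers=[Affine(nout=64, init=init, activation=Rectlin()),
                          Affine(nout=10, init=init, activation=Softmax())])

    cost = GeneralizedCost(costfunc=CrossEntropyMulti())
    opt = GradientDescentMomentum(learning_rate=0.1, momentum_coef=0.9)
    # Alternatively, a multi-optimizer maps optimizers to layers by name:
    # opt = MultiOptimizer({'default': opt,
    #                       'Bias': GradientDescentMomentum(0.02, 0.9)})
    callbacks = Callbacks(model)             # progress bar, validation, etc.

    model.fit(train_set, optimizer=opt, cost=cost, num_epochs=2,
              callbacks=callbacks)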

Choice of sizes

  • We get better hardware utilization if we pick friendly sizes for the batch size, sequence length, and feature sizes.
  • Our GPU kernels are optimized for sizes that are multiples of 4.
  • Many of our examples use parameters taken from reference implementations; however, multiples of 4 are recommended, and in many cases zero-padding is needed to implement the same model at those sizes. A helper like the one sketched below can compute the padded size.
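
A hypothetical helper (not part of neon) for rounding a dimension up to the nearest multiple of 4 before zero-padding:

    def round_up(n, multiple=4):
        """Round n up to the nearest multiple (e.g. for zero-padding)."""
        return ((n + multiple - 1) // multiple) * multiple

    round_up(37)  # -> 40: pad a 37-wide feature dimension with 3 zeros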