Creating new layers

A simple layer

To implement a simple custom layer in neon, write a Python class that subclasses Layer (neon.layers.Layer). Layer is a subclass of neon.NervanaObject, which contains a static instance of the computational backend. The backend is exposed in the class as self.be.

At minimum, the layer must implement configure() to properly set the input/output shapes as well as the fprop() and bprop() methods for forward and backward propagation, respectively. For computations that require pre-allocating buffer space, also implement Layer.allocate().

Here is a custom layer that multiplies the input by two.

class MultiplyByTwo(Layer):
    " A layer that multiples the input by two. "

    # constructor and initialize buffers
    def __init__(self, name=None):
        super(MultiplyByTwo, self).__init__(name)

    # configure the layer input and output shapes
    def configure(self, in_obj):
        super(MultiplyByTwo, self).configure(in_obj)
        self.out_shape = self.in_shape
        return self

    # compute the fprop
    def fprop(self, inputs, inference=False):
        # write the doubled inputs into the pre-allocated output buffer
        self.outputs[:] = inputs * 2
        return self.outputs

    # backprop the gradients
    def bprop(self, error):
        error[:] = 2*error
        return error

Let’s break this down. Because this layer does not change the shape of the data (unlike, say, a convolutional or pooling layer), configure() simply sets the output shape equal to the input shape, which the base class derives from in_obj (the preceding layer or data source). During model initialization, neon forward-propagates the shapes through the layers, calling each layer’s configure().
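
For context, here is a sketch of how such a layer might be dropped into a network; the surrounding Affine layers, the Gaussian initializer, and the layer sizes are illustrative assumptions, not part of the example above. configure() is called on every layer when the model is initialized (for example, at the start of fit()).

from neon.initializers import Gaussian
from neon.layers import Affine
from neon.models import Model

# a hypothetical network that places the custom layer between two Affine layers
layers = [Affine(nout=100, init=Gaussian(scale=0.01)),
          MultiplyByTwo(),
          Affine(nout=10, init=Gaussian(scale=0.01))]
model = Model(layers=layers)
# shapes are propagated (and each layer's configure() is called) when the
# model is initialized against a dataset, e.g. at the start of model.fit(...)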

The base class Layer pre-allocates a tensor self.outputs (its allocate() method is called during model initialization), which we use as a buffer to store the layer's outputs. In fprop() and bprop(), we execute the backend multiply operations and return the resulting tensors.
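
If a layer does need additional scratch space of its own, it can override allocate(). The sketch below is a minimal, illustrative example (not part of neon): it assumes the base class allocate(shared_outputs) hook and the backend's iobuf() helper for allocating per-minibatch buffers; the subclass name and the scratch buffer are hypothetical.

class MultiplyByTwoWithBuffer(MultiplyByTwo):
    "Hypothetical variant that allocates an extra scratch buffer."

    def allocate(self, shared_outputs=None):
        # the base class allocates self.outputs here
        super(MultiplyByTwoWithBuffer, self).allocate(shared_outputs)
        # allocate an extra (features x batch_size) scratch buffer
        self.scratch = self.be.iobuf(self.out_shape)

    def fprop(self, inputs, inference=False):
        self.scratch[:] = inputs * 2    # intermediate result in the scratch buffer
        self.outputs[:] = self.scratch  # copy into the output buffer
        return self.outputs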

Auto-differentiation

Neon provides an auto-differentiation feature, which is particularly useful for defining bprop(). Computations with the backend are first stored as a graph of numerical operations (see backend). For example, we can define the logistic function:

myTensor = be.zeros((10, 10), name='myTensor')
f = 1 / (1 + be.exp(-1 * myTensor))

Then, f is an op-tree (neon.backends.backend.OpTreeNode). To execute the op-tree, we assign it to a tensor.

fval = be.empty((10,10)) # allocate space for output
fval[:] = f # execute the op-tree

We compute gradients from an op-tree by constructing an Autodiff object, passing it the op-tree f and the backend be:

from neon.backends import Autodiff

myAutodiff = Autodiff(op_tree=f, be=be)

Then we retrieve the gradient with respect to myTensor by calling get_grad_tensor() and passing it that tensor.

grads = myAutodiff.get_grad_tensor(myTensor)

There are two other methods for computing gradients: get_grad_asnumpyarray() returns a NumPy array instead of a tensor, and back_prop_grad(), the most relevant for constructing layers, writes the result into a tensor you provide.

grads = be.empty((10, 10))
myAutodiff.back_prop_grad(myTensor, grads)
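
Putting these pieces together, here is a self-contained sketch that checks the gradient of the logistic function numerically, following the get_grad_tensor() usage shown above. The gen_backend() call and its arguments are illustrative (inside a layer, self.be is already available), and .get() copies the tensor back to a NumPy array.

from neon.backends import gen_backend, Autodiff

# illustrative backend setup for experimenting outside of a layer
be = gen_backend(backend='cpu', batch_size=10)

x = be.zeros((10, 10), name='x')   # all zeros
f = 1 / (1 + be.exp(-1 * x))       # logistic op-tree

ad = Autodiff(op_tree=f, be=be)
grad = ad.get_grad_tensor(x)       # df/dx as a backend tensor

# sigmoid'(0) = sigmoid(0) * (1 - sigmoid(0)) = 0.25, so every entry should be 0.25
print(grad.get())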

Example layer with autodiff

We can put this into action with a BatchNorm layer that uses Autodiff:

class BatchNormAutodiff(BatchNorm):

    def __init__(self, rho=0.99, eps=1e-6, name=None):
        super(BatchNormAutodiff, self).__init__(rho, eps, name)

    def get_forward_optree(self):
        """
        Initialize the fprop optree for batchnorm.
        """
        # get fprop op-tree
        xvar = self.be.var(self.x, axis=1)
        xmean = self.be.mean(self.x, axis=1)
        xhat = (self.x - xmean) / self.be.sqrt(xvar + self.eps)
        return xhat * self.gamma + self.beta

    def fprop(self, inputs, inference=False):
        """
        Compute the actual fprop from the op-tree and update the global (running) mean and variance estimates.
        """
        if inference:
            return self._fprop_inference(inputs)
        self.init_buffers(inputs)
        if self.allparams is None:
            self.init_params(self.nfm)
            self.fprop_op_tree = self.get_forward_optree()

        # the actual f-prop
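        # (self.y is a reshaped view of self.outputs, so writing into it also
        #  fills the output buffer that is returned below)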
        self.y[:] = self.fprop_op_tree

        # for inference
        self.gmean[:] = (self.gmean * self.rho + (1.0 - self.rho) * self.be.mean(self.x, axis=1))
        self.gvar[:] = (self.gvar * self.rho + (1.0 - self.rho) * self.be.var(self.x, axis=1))

        return self.outputs

    def bprop(self, error):
        """
        Use Autodiff.back_prop_grad to back propagate gradients for the
        corresponding tensors.
        """
        if not self.deltas:
            self.deltas = error.reshape((self.nfm, -1))

        # Autodiff caches and reuses the object automatically; if the `error`
        # buffer were known at init time, the Autodiff object could also be
        # created in the layer's __init__
        ad = Autodiff(self.fprop_op_tree, self.be, next_error=self.deltas)

        # back propagate
        ad.back_prop_grad([self.x, self.gamma, self.beta],
                          [self.deltas, self.grad_gamma, self.grad_beta])

        return error

Layers with parameters

For simple layers that do not carry any weights, inheriting from Layer suffices. However, if the layer has weight parameters (e.g. linear or convolutional layers), neon provides the ParameterLayer class with common functionality for storing and tracking weights.

This class has the variable W (a Tensor) for storing the weights and implements allocate() to allocate the buffer for W and initialize it with the provided initializer. New layers with weights should subclass ParameterLayer and implement configure(), fprop(), and bprop().
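
As an illustration (not part of neon's library), the sketch below shows a hypothetical ScaleByWeight layer that learns one multiplicative weight per input feature. It assumes, as described above, that ParameterLayer.allocate() creates and initializes self.W (and the gradient buffer self.dW) from self.weight_shape using the supplied initializer; the class name and the weight-shape handling are illustrative.

import numpy as np

from neon.layers.layer import ParameterLayer

class ScaleByWeight(ParameterLayer):
    "Hypothetical layer that scales each input feature by a learned weight."

    def __init__(self, init=None, name=None):
        super(ScaleByWeight, self).__init__(init=init, name=name)

    def configure(self, in_obj):
        super(ScaleByWeight, self).configure(in_obj)
        self.out_shape = self.in_shape
        # one weight per (flattened) input feature; allocate() uses this shape
        # to create and initialize self.W and self.dW
        nin = int(np.prod(self.in_shape)) if isinstance(self.in_shape, tuple) else self.in_shape
        self.weight_shape = (nin, 1)
        return self

    def fprop(self, inputs, inference=False):
        self.inputs = inputs
        # broadcast the (nin, 1) weights across the minibatch dimension
        self.outputs[:] = self.W * inputs
        return self.outputs

    def bprop(self, error):
        # gradient w.r.t. the weights: sum error * input over the minibatch
        self.dW[:] = self.be.sum(error * self.inputs, axis=1)
        # gradient w.r.t. the inputs, written back into the error buffer
        error[:] = self.W * error
        return error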