Layer containers

Layer containers are used to wrap groups of layers to provide convenience functions for iterating through them during training. Containers also support efficiently allocating memory for outputs or parameters that are shared between different layers.


Sequential containers are the default container for models. This is the simplest type of container, and is used to encapsulate a linear pathway of multiple layers. Upon construction, a Sequential container automatically flattens compound layers (such as Conv or Affine) into their constituent layers. Calling fprop() on a Sequential container in turn calls fprop() on each layer in its layers list, in order.
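The flattening and chained fprop behavior described above can be sketched with a toy model. This is an illustration only, not neon's implementation: ToySequential, Compound, Scale, and Shift are hypothetical stand-ins, and plain Python lists take the place of tensors.

```python
# Toy illustration only; none of these classes are part of neon.

class Scale:
    """Stand-in layer that multiplies every input element by a constant."""
    def __init__(self, factor):
        self.factor = factor

    def fprop(self, inputs):
        return [x * self.factor for x in inputs]


class Shift:
    """Stand-in layer that adds a constant to every input element."""
    def __init__(self, offset):
        self.offset = offset

    def fprop(self, inputs):
        return [x + self.offset for x in inputs]


class Compound:
    """Stand-in for a compound layer that bundles several simple layers."""
    def __init__(self, *sublayers):
        self.sublayers = list(sublayers)


class ToySequential:
    """Holds a flat list of layers and runs them one after another."""
    def __init__(self, layers):
        self.layers = []
        for layer in layers:
            if isinstance(layer, Compound):
                # Flatten: store the sublayers, not the compound wrapper.
                self.layers.extend(layer.sublayers)
            else:
                self.layers.append(layer)

    def fprop(self, inputs):
        # Feed the output of each layer into the next one.
        for layer in self.layers:
            inputs = layer.fprop(inputs)
        return inputs


model = ToySequential([Compound(Scale(2), Shift(1)), Scale(10)])
output = model.fprop([1, 2, 3])
print(len(model.layers), output)
```

After construction the toy container holds three simple layers rather than one compound layer plus one simple layer, which is the point of flattening: iteration over layers never has to recurse.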


Tree containers are used to represent branching pathways of layers where the overall structure has multiple output nodes. Each pathway is a Sequential container that terminates in an output node. The pathways can be provided to the Tree constructor either as a list of Sequential containers or as a list of lists of layers, in which case each inner list is implicitly wrapped in a Sequential during construction.

BranchNode is used to define branching points for the Tree.

The following are equivalent:

# Option 1: pathways given as plain lists of layers.
# ifunc is an initializer assumed to be defined elsewhere.
bnode = BranchNode()
layer_list = [
    [Conv((3, 3, 2), init=ifunc), bnode, Affine(nout=32, init=ifunc)],
    [bnode, Conv((3, 3, 2), init=ifunc), Affine(nout=16, init=ifunc)],
]
t = Tree(layers=layer_list, alphas=[1., 1.])

# Option 2: pathways given as explicit Sequential containers.
bnode = BranchNode()
layer_list = [
    Sequential([Conv((3, 3, 2), init=ifunc), bnode, Affine(nout=32, init=ifunc)]),
    Sequential([bnode, Conv((3, 3, 2), init=ifunc), Affine(nout=16, init=ifunc)]),
]
t = Tree(layers=layer_list, alphas=[1., 1.])

The layer pathways must be specified in order of precedence from the root of the tree: the “trunk” of the tree must be provided first, followed by the other pathways in the order in which they branch off up the tree. Every non-trunk pathway must start with an instantiated BranchNode layer that also occurs in the trunk at the point of branching.
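The ordering rule above can be expressed as a small check. The sketch below is hypothetical and does not reflect how neon validates its input: ToyBranchNode and find_pathway_problems are invented names, and layers are represented by plain strings. It tests object identity because a branch pathway must reuse the same BranchNode instance that appears in the trunk, not a fresh one.

```python
# Toy illustration only; ToyBranchNode and find_pathway_problems are hypothetical.

class ToyBranchNode:
    """Marker object standing in for a branch point."""
    def __init__(self, name):
        self.name = name


def find_pathway_problems(pathways):
    """Return a list of rule violations; an empty list means the order is valid."""
    problems = []
    trunk = pathways[0]  # the first pathway is the trunk
    for index, path in enumerate(pathways[1:], start=1):
        head = path[0] if path else None
        if not isinstance(head, ToyBranchNode):
            problems.append("pathway %d does not start with a branch node" % index)
        elif not any(head is layer for layer in trunk):
            # Same class is not enough: it must be the very same instance.
            problems.append(
                "pathway %d starts with a branch node that is not in the trunk" % index
            )
    return problems


b1 = ToyBranchNode("b1")
b2 = ToyBranchNode("b2")

# Trunk first, then one pathway per branch point.
good = [["conv", b1, "affine", b2, "out0"], [b1, "affine", "out1"], [b2, "out2"]]
# The second pathway creates a new node instead of reusing b1.
bad = [["conv", b1, "out0"], [ToyBranchNode("b1"), "out1"]]

good_problems = find_pathway_problems(good)
bad_problems = find_pathway_problems(bad)
print(good_problems, bad_problems)
```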

Multiple branching points can be used to construct complex networks. The figure below shows a branching model with three output nodes, and how that model is constructed from a list of containers.


An example of how to create a branching model using a Tree container is provided in

During training, the backpropagated errors of each branch are combined using the weighting parameters in the alphas list. By default, all branches are weighted equally (every entry of alphas defaults to one).
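One way to picture the weighting is as a weighted sum of the per-branch errors that arrive at a branch point. The sketch below assumes exactly that and uses plain lists of floats; combine_branch_errors is a hypothetical helper, not a neon function, and the library's actual accumulation may be organized differently.

```python
# Toy illustration only; combine_branch_errors is a hypothetical helper.

def combine_branch_errors(errors, alphas=None):
    """Weighted elementwise sum of per-branch error vectors."""
    if alphas is None:
        # Mirror the documented default: every branch gets weight one.
        alphas = [1.0] * len(errors)
    if len(alphas) != len(errors):
        raise ValueError("need exactly one alpha per branch")
    combined = [0.0] * len(errors[0])
    for error, alpha in zip(errors, alphas):
        for i, value in enumerate(error):
            combined[i] += alpha * value
    return combined


trunk_error = [0.5, -1.0]
branch_error = [2.0, 4.0]

equal = combine_branch_errors([trunk_error, branch_error])
weighted = combine_branch_errors([trunk_error, branch_error], alphas=[1.0, 0.5])
print(equal, weighted)
```

Lowering a branch's alpha shrinks its contribution to the layers shared with the trunk, which is useful when an auxiliary output should guide training without dominating it.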

During inference, only the trunk branch (first Sequential) is evaluated.


MergeBroadcast containers consist of multiple Sequential pathways that receive input from a single input layer (the broadcast), and then combine their outputs via concatenation (the merge). A MergeBroadcast container’s layers attribute consists of multiple Sequential containers, each representing one of the pathways that receives the broadcast input. The output of the MergeBroadcast is the merged output of the Sequential members of layers. The method of concatenation is specified by the merge argument, which indicates the axis along which to concatenate. As in Tree, the provided layers list can consist of either Sequential objects or lists of layers, which are converted to Sequential objects during construction.
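The broadcast-then-merge pattern can be sketched with nested lists standing in for two-dimensional tensors. Everything here is an assumption made for illustration: broadcast_and_merge is a hypothetical function, pathways are plain callables, and the integer axis argument is a simplification, since the values that neon's merge argument accepts are not listed in this section.

```python
# Toy illustration only; broadcast_and_merge is a hypothetical function.

def broadcast_and_merge(inputs, pathways, axis=0):
    """Send the same input to every pathway, then concatenate the outputs."""
    outputs = [path(inputs) for path in pathways]  # the broadcast
    if axis == 0:
        # Stack the rows of each output on top of one another.
        return [row for out in outputs for row in out]
    if axis == 1:
        # Join matching rows side by side.
        merged = []
        for rows in zip(*outputs):
            merged_row = []
            for row in rows:
                merged_row.extend(row)
            merged.append(merged_row)
        return merged
    raise ValueError("axis must be 0 or 1 in this toy example")


def double(matrix):
    return [[2 * x for x in row] for row in matrix]


def negate(matrix):
    return [[-x for x in row] for row in matrix]


inputs = [[1, 2], [3, 4]]
stacked = broadcast_and_merge(inputs, [double, negate], axis=0)
side_by_side = broadcast_and_merge(inputs, [double, negate], axis=1)
print(stacked)
print(side_by_side)
```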


MergeMultistream containers are a special case of the MergeBroadcast container in which the pathways originate from the input provided to the model (by ArrayIterator, for example). Instead of broadcasting from a single input layer to the internal Sequential pathways, each pathway gets its own input, which is unpacked from the input object. For example, the data source may provide a tuple of Tensor objects, with one Tensor serving as the input to each pathway (stream).
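The difference from broadcasting is that the input tuple is unpacked so that stream i feeds pathway i. A minimal sketch, assuming flat lists in place of Tensor objects and simple concatenation as the merge; multistream_and_merge is a hypothetical name, not part of neon.

```python
# Toy illustration only; multistream_and_merge is a hypothetical function.

def multistream_and_merge(streams, pathways):
    """Feed stream i to pathway i, then concatenate the pathway outputs."""
    if len(streams) != len(pathways):
        raise ValueError("need exactly one input stream per pathway")
    merged = []
    for stream, path in zip(streams, pathways):
        merged.extend(path(stream))
    return merged


def double(values):
    return [2 * v for v in values]


def add_one(values):
    return [v + 1 for v in values]


# The data source yields a tuple with one entry per pathway.
streams = ([1, 2, 3], [10, 20])
merged = multistream_and_merge(streams, [double, add_one])
print(merged)
```

Because each pathway sees only its own stream, the streams are free to differ in shape, as the two inputs of different lengths do here.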