MNIST is a computer vision dataset consisting of 70,000 images of handwritten digits. Each image has 28x28 pixels, for a total of 784 features, and is labeled with a digit from 0 to 9.
In this tutorial, we will construct a small multi-layer perceptron to recognize each image. Note that this tutorial assumes some basic familiarity with Python and machine learning.
This tutorial is similar to the model specified in neon's examples/mnist_mlp.py.
The first step is to set up the argument parser, which enables customizing options with flags (see the previous chapter).
#!/usr/bin/env python
from neon.util.argparser import NeonArgparser

parser = NeonArgparser(__doc__)
args = parser.parse_args()
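With the parser in place, standard neon options can be passed on the command line; for example, python mnist_mlp.py -b gpu -e 10 selects the GPU backend and trains for 10 epochs (this assumes the script has been saved as mnist_mlp.py).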
The MNIST dataset can be found on Yann LeCun's website. We have included an easy function that downloads the MNIST dataset into a local data directory and loads it into memory.
from neon.data import MNIST

mnist = MNIST()
(X_train, y_train), (X_test, y_test), nclass = mnist.load_data()
This function automatically splits the images X and labels y into training (60,000 examples) and testing (10,000 examples) data. The variable X_train is a numpy array with shape (num_examples, num_features) = (60000, 784).
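As a quick sanity check, we can print the array shapes; the label shape shown in the comment below is our assumption about load_data()'s return format:

print(X_train.shape)   # (60000, 784)
print(X_test.shape)    # (10000, 784)
print(y_train.shape)   # one label per training example; we assume (60000, 1)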
During training, neon iterates over the training examples to compute the gradients. We use the following commands to set up the ArrayIterator objects that we send to the optimizer.
from neon.data import ArrayIterator

# setup training set iterator
train_set = ArrayIterator(X_train, y_train, nclass=nclass)
# setup test set iterator
test_set = ArrayIterator(X_test, y_test, nclass=nclass)
For small datasets like MNIST, this step may seem trivial. However, for large datasets that cannot fit into memory (e.g. ImageNet or Sports-1M), the data has to be efficiently loaded and fed to the optimizer in batches. This requires more advanced iterators described in Loading Data.
Since this is a common operation, data iterators for stock datasets can be generated directly through helper methods of the Dataset class. For example, the MNIST training and validation set iterators can be obtained with the following code:
from neon.data import MNIST

mnist = MNIST()
train_set = mnist.train_iter
test_set = mnist.valid_iter
Training a deep learning model in neon requires specifying the dataset, a list of layers, a cost function, and the learning rule. Here we guide you through each item in turn.
Neon supports many ways of initializing weight matrices. In this tutorial, we initialize the weights using a Gaussian distribution with zero mean and 0.01 standard deviation.
from neon.initializers import Gaussian

init_norm = Gaussian(loc=0.0, scale=0.01)
The model is specified as a list of layers. For classifying MNIST images, we use a multi-layer perceptron with two fully connected layers:
- A hidden layer with 100 units and a rectified linear (Rectlin()) activation function.
- An output layer with 10 units to match the number of labels in the MNIST dataset. We use the Softmax() activation function to ensure the outputs sum to one and are within the range [0, 1].
from neon.layers import Affine
from neon.transforms import Rectlin, Softmax

layers = []
layers.append(Affine(nout=100, init=init_norm, activation=Rectlin()))
layers.append(Affine(nout=10, init=init_norm, activation=Softmax()))
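To make the normalization claim concrete, here is a minimal standalone sketch of the softmax computation in plain numpy (illustrative only, independent of neon's Softmax implementation):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)                  # approx [0.659, 0.242, 0.099], all within [0, 1]
print(probs.sum())            # 1.0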
We initialize the weights in each layer with the init_norm defined previously. Neon supports many other layer types (convolutional, pooling, recurrent, etc.) that will be described in subsequent examples.
We then construct the model via the Model() class:
# initialize model object
from neon.models import Model

mlp = Model(layers=layers)
The cost function is wrapped within a GeneralizedCost layer, which handles the comparison of the outputs with the provided labels in the dataset. One common cost function, which we use here, is the multiclass cross entropy.
from neon.layers import GeneralizedCost
from neon.transforms import CrossEntropyMulti

cost = GeneralizedCost(costfunc=CrossEntropyMulti())
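As a quick illustration of what this cost computes, the multiclass cross entropy for a single example is -sum(t * log(y)); here is a plain numpy sketch (not neon's CrossEntropyMulti internals):

import numpy as np

y = np.array([0.7, 0.2, 0.1])   # predicted class probabilities
t = np.array([1.0, 0.0, 0.0])   # one-hot target label
print(-np.sum(t * np.log(y)))   # approx 0.357; lower is better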
To read more about costs, see Costs and metrics.
For learning, we use stochastic gradient descent with a learning rate of 0.1 and momentum coefficient of 0.9.
from neon.optimizers import GradientDescentMomentum

optimizer = GradientDescentMomentum(0.1, momentum_coef=0.9)
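Conceptually, each update keeps a velocity term that accumulates past gradients. Here is a minimal numpy sketch of that update rule (a toy illustration, not neon's implementation, which also supports options such as weight decay):

import numpy as np

lr, momentum = 0.1, 0.9
W = np.zeros(5)                    # hypothetical weight vector
v = np.zeros_like(W)               # velocity buffer

def momentum_step(W, v, grad):
    v = momentum * v - lr * grad   # decaying accumulation of gradients
    return W + v, v                # step the weights along the velocity

W, v = momentum_step(W, v, np.ones(5))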
Additional optimizers and optional arguments are discussed in Optimizers.
Neon provides an API for calling operations during the model fit (see Callbacks). Here we set up the default callback, which displays a progress bar for each epoch.
from neon.callbacks.callbacks import Callbacks

callbacks = Callbacks(mlp, eval_set=test_set, **args.callback_args)
Putting it all together
We are ready to put all the ingredients together and run our model!
mlp.fit(train_set, optimizer=optimizer, num_epochs=args.epochs, cost=cost, callbacks=callbacks)
At the beginning of the fitting procedure, neon propagates train_set through the model to set the input and output shapes of each layer. Each layer has a configure() method that determines the appropriate layer shapes, and an allocate() method to set up the needed buffers for holding the forward propagation information.
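The sketch below mimics that two-phase pattern in plain Python; it is a toy illustration of the idea, not neon's actual Layer API:

import numpy as np

class ToyLayer:
    def __init__(self, nout):
        self.nout = nout

    def configure(self, in_shape):
        self.in_shape = in_shape        # shape received from the previous layer
        return self.nout                # becomes the next layer's input shape

    def allocate(self, batch_size=128):
        # buffer for holding forward propagation outputs (batch size assumed)
        self.outputs = np.empty((self.nout, batch_size))

shape = 784                             # MNIST input features
for layer in [ToyLayer(100), ToyLayer(10)]:
    shape = layer.configure(shape)
    layer.allocate()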
During training, neon sends batches of the training data through the model, calling each layer's fprop() and bprop() methods to compute the gradients and update the weights.
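To see the full cycle end to end, here is a self-contained toy that runs a few such updates on random data with a single softmax layer (purely illustrative; neon handles all of this inside fit()):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 784))               # one batch of fake "images"
t = np.eye(10)[rng.integers(0, 10, 128)]      # one-hot fake labels
W = rng.normal(scale=0.01, size=(784, 10))    # Gaussian init, as above

for step in range(10):
    # forward pass: linear transform followed by softmax
    z = X @ W
    z -= z.max(axis=1, keepdims=True)         # for numerical stability
    y = np.exp(z)
    y /= y.sum(axis=1, keepdims=True)         # softmax probabilities
    # backward pass: for softmax + cross entropy the output delta is (y - t)
    grad_W = X.T @ (y - t) / len(X)
    W -= 0.1 * grad_W                         # plain SGD step (no momentum here)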
Using the trained model
Now that the model is successfully trained, we can use the trained model to classify a novel image, measure performance, and visualize the weights and training results.
Given a set of images such as those contained in the iterable
test_set, we can fetch the output of the final model layer via
results = mlp.get_outputs(test_set)
results is a numpy array with shape (num_test_examples, num_outputs) = (10000, 10) with the model probabilities for each label.
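To turn these probabilities into hard predictions, take the most probable class for each row; for example:

import numpy as np

preds = np.argmax(results, axis=1)   # predicted digit for each test example
print(preds[:10])                    # first ten predictions, values in 0-9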
Neon supports convenience functions for evaluating performance using custom metrics. Here we measure the misclassification rate on the held out test set.
from neon.transforms import Misclassification

# evaluate the model on test_set using the misclassification metric
error = mlp.eval(test_set, metric=Misclassification())*100
print('Misclassification error = %.1f%%' % error)
This simple example guides you through the basic operations needed to create and fit a neural network. However, neon contains a rich feature set of customizable layers, metrics, and options. To learn more, we recommend reading through the CIFAR10 tutorial, which introduces convolutional neural networks.