neon.data.dataiterator.ArrayIterator

class neon.data.dataiterator.ArrayIterator(X, y=None, nclass=None, lshape=None, make_onehot=True, name=None)

Bases: neon.data.dataiterator.NervanaDataIterator

The ArrayIterator class iterates over minibatches of data that have been preloaded into memory as NumPy arrays. Use it when the entire dataset (e.g. CIFAR-10 or MNIST) is small enough to fit in memory. For example:

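```python
import numpy as np
from neon.data.dataiterator import ArrayIterator

# assumes a backend was already created, e.g. with neon.backends.gen_backend
X = np.random.rand(10000, 3072)
y = np.random.randint(0, 10, 10000)
train = ArrayIterator(X=X, y=y, nclass=10, lshape=(3, 32, 32))
```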

This creates the ArrayIterator object. The object implements Python’s __iter__ method and yields one minibatch of data per iteration as an (input, label) tuple. The minibatch size is set by the batch size of the generated backend (the batch_size argument to gen_backend).
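
The NumPy-only sketch below models what each iteration yields. It assumes, as with neon's backend tensors, that a minibatch is stored with one column per example, i.e. shape (features, batch_size). The generator name iterate_minibatches is hypothetical, it is not neon's implementation, and it only handles datasets whose size is a multiple of the batch size.

```python
import numpy as np


def iterate_minibatches(X, y, bsz):
    """Hypothetical model of ArrayIterator iteration (evenly divisible case only).

    Yields (inputs, targets) tuples where each array has one column per example.
    When y is None, the inputs double as the targets (autoencoder case).
    """
    ndata = X.shape[0]
    assert ndata % bsz == 0, "this sketch only handles ndata divisible by bsz"
    for start in range(0, ndata, bsz):
        inputs = X[start:start + bsz].T  # (features, bsz)
        targets = inputs if y is None else y[start:start + bsz].T
        yield inputs, targets


X = np.random.rand(256, 3072)
# Targets already in (# examples, k) form; here one-hot rows for 10 classes
y = np.eye(10)[np.random.randint(0, 10, 256)]
batches = list(iterate_minibatches(X, y, bsz=128))
print(len(batches), batches[0][0].shape, batches[0][1].shape)
```

Transposing to one column per example is the design point: every minibatch tensor keeps the batch dimension last, whether it holds inputs or targets.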

X should be an ndarray of shape (# examples, # features). For images, each row should hold one image flattened in (channel, height, width) order. The lshape keyword gives the local shape of the images in (channel, height, width) format, e.g. (3, 32, 32) for CIFAR-10.
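
Image arrays often arrive in (examples, height, width, channels) order instead. A minimal NumPy sketch of converting such an array into the (# examples, # features) layout with (channel, height, width) ordering, assuming the source order is NHWC; the array names are illustrative only.

```python
import numpy as np

# Hypothetical input: 4 RGB images of 32x32 stored as (N, H, W, C)
images_nhwc = np.random.rand(4, 32, 32, 3)

# Move channels first -> (N, C, H, W), then flatten each image into one row
images_nchw = images_nhwc.transpose(0, 3, 1, 2)
X = images_nchw.reshape(images_nchw.shape[0], -1)  # (4, 3072)

lshape = (3, 32, 32)  # (channel, height, width), as passed to ArrayIterator
# Each row can be reshaped back to lshape; transpose again to compare with HWC
restored = X[0].reshape(lshape).transpose(1, 2, 0)
print(X.shape, np.array_equal(restored, images_nhwc[0]))
```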

For classification tasks, the labels y should be integers from 0 to K-1, where K is the total number of classes. When y is not provided, the input features themselves are returned as the target values (e.g. for an autoencoder).
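
To make the label format concrete, here is a NumPy sketch of how integer labels 0 to K-1 map onto one-hot columns. It assumes the one-column-per-example convention (shape (K, # examples)); neon performs this conversion on the backend, so the helper onehot_columns is purely illustrative.

```python
import numpy as np


def onehot_columns(labels, nclass):
    """Hypothetical helper: integer labels -> (nclass, N) one-hot matrix."""
    labels = np.asarray(labels).ravel()
    if labels.min() < 0 or labels.max() >= nclass:
        raise ValueError("labels must be integers in [0, nclass - 1]")
    out = np.zeros((nclass, labels.size))
    # Column j gets a 1 in the row given by its label
    out[labels, np.arange(labels.size)] = 1.0
    return out


print(onehot_columns([2, 0, 1], nclass=3))
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]
```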

In regression tasks, where y is not a categorical label, set make_onehot to False. For example:

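```python
import numpy as np
from neon.data.dataiterator import ArrayIterator

# assumes a backend was already created, e.g. with neon.backends.gen_backend
X = np.random.rand(1000, 1)
y = 2 * X + 1  # continuous targets, shape (1000, 1)
train = ArrayIterator(X=X, y=y, make_onehot=False)
```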
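
Because the parameter description gives the shape of y as [# examples, 1 or feature size], a one-dimensional regression target is safest reshaped into a column before it is passed with make_onehot=False. A NumPy sketch of that preparation step, assuming 2-D targets are expected; ArrayIterator itself is not called here.

```python
import numpy as np

X = np.random.rand(1000, 1)
y = 2 * X + 1                  # already (1000, 1), since X is a column

y_flat = 2 * X[:, 0] + 1       # a 1-D target, shape (1000,)
y_col = y_flat.reshape(-1, 1)  # reshape to (# examples, 1)

print(y.shape, y_flat.shape, y_col.shape)  # (1000, 1) (1000,) (1000, 1)
```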

For more information, see the Loading data section of the documentation.

__init__(X, y=None, nclass=None, lshape=None, make_onehot=True, name=None)

During initialization, the input data will be converted to backend tensor objects (e.g. CPUTensor or GPUTensor). If the backend uses the GPU, the data is copied over to the device.

Parameters:
  • X (ndarray, shape [# examples, feature size]) – Input features of the dataset.
  • y (ndarray, shape [# examples, 1 or feature size], optional) – Labels corresponding to the input features. If absent, the input features themselves are returned as target values (e.g. for an autoencoder).
  • nclass (int, optional) – The number of classes in the labels. Not needed if labels are not provided or if the labels are non-categorical.
  • lshape (tuple, optional) – Local shape of the input features, e.g. (# channels, height, width).
  • make_onehot (bool, optional) – True (the default) if y is a categorical label that must be converted to a one-hot representation. Set to False for non-categorical targets such as regression values.
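
The constraints in this parameter list can be checked before the iterator is built. The helper below, check_iterator_args, is hypothetical and not part of neon; it encodes the documented rules (matching example counts, nclass for categorical labels, integer labels in [0, nclass - 1]) plus one assumption of ours: that lshape multiplies out to the feature size.

```python
import numpy as np


def check_iterator_args(X, y=None, nclass=None, lshape=None, make_onehot=True):
    """Hypothetical pre-flight check mirroring the documented parameter rules."""
    if X.ndim != 2:
        raise ValueError("X must have shape (# examples, feature size)")
    # Assumption: the local shape must account for every feature
    if lshape is not None and int(np.prod(lshape)) != X.shape[1]:
        raise ValueError("lshape must multiply out to the feature size")
    if y is None:
        return  # targets will be the inputs themselves (autoencoder)
    if y.shape[0] != X.shape[0]:
        raise ValueError("X and y must have the same number of examples")
    if make_onehot:
        if nclass is None:
            raise ValueError("nclass is required for categorical labels")
        if not np.issubdtype(y.dtype, np.integer):
            raise ValueError("categorical labels must be integers")
        if y.min() < 0 or y.max() >= nclass:
            raise ValueError("labels must lie in [0, nclass - 1]")


X = np.random.rand(100, 3072)
y = np.arange(100) % 10
check_iterator_args(X, y, nclass=10, lshape=(3, 32, 32))  # passes silently
```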

Methods

__init__(X[, y, nclass, lshape, …]) – During initialization, the input data will be converted to backend tensor objects (e.g. CPUTensor or GPUTensor).
gen_class(pdict)
get_description([skip]) – Returns a dict that contains all necessary information needed to serialize this object.
recursive_gen(pdict, key) – Helper method to check whether the definition dictionary is defining a NervanaObject child.
reset() – Resets the starting index of this dataset to zero.
be = None – The shared backend object; it is set when a backend is generated with gen_backend.
classnm

Returns the class name.

gen_class(pdict)
get_description(skip=[], **kwargs)

Returns a dict that contains all necessary information needed to serialize this object.

Parameters: skip (list) – Objects to omit from the dictionary.
Returns: Dictionary format for object information.
Return type: dict
modulenm

Returns the full module path.
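
In plain Python, a class name and a fully qualified path like those returned by classnm and modulenm are usually derived from the class's __name__ and __module__ attributes. The sketch below assumes that is how these two properties behave; the Example class is hypothetical. For ArrayIterator, the full path would read neon.data.dataiterator.ArrayIterator.

```python
class Example(object):
    """Hypothetical stand-in for a NervanaObject subclass."""

    @property
    def classnm(self):
        # Bare class name, e.g. "Example"
        return self.__class__.__name__

    @property
    def modulenm(self):
        # Module path plus class name, e.g. "__main__.Example"
        return self.__class__.__module__ + '.' + self.__class__.__name__


obj = Example()
print(obj.classnm, obj.modulenm)
```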

nbatches

Returns the number of minibatches in this dataset.
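
Assuming the count includes a final partial minibatch, the number of minibatches is the ceiling of (# examples / batch size). For the 10,000-example dataset in the first example and a hypothetical batch size of 128, that is 79 minibatches: 78 full ones (9,984 examples) plus one holding the remaining 16. A small integer-only sketch:

```python
def num_minibatches(ndata, bsz):
    """Ceiling division without floats: number of minibatches of size bsz."""
    return -(-ndata // bsz)


print(num_minibatches(10000, 128))  # 79
print(num_minibatches(1000, 100))   # 10
```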

recursive_gen(pdict, key)

Helper method that checks whether the definition dictionary is defining a NervanaObject child. If so, it instantiates that object and replaces the dictionary element with the instance.
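
A minimal sketch of the idea, assuming a definition dictionary marks an object to build with a 'type' key (a class name looked up in a registry) and a 'config' key holding constructor arguments. The key names, the REGISTRY mapping, and the Schedule class are hypothetical and not taken from neon.

```python
class Schedule(object):
    """Hypothetical object that can be described by a dictionary."""

    def __init__(self, step=1):
        self.step = step


REGISTRY = {'Schedule': Schedule}  # hypothetical name -> class lookup


def recursive_gen(pdict, key):
    """If pdict[key] describes a known object, replace it with an instance."""
    val = pdict[key]
    if isinstance(val, dict) and val.get('type') in REGISTRY:
        config = dict(val.get('config', {}))
        # Build nested objects first, so inner definitions are instantiated too
        for subkey in list(config):
            recursive_gen(config, subkey)
        pdict[key] = REGISTRY[val['type']](**config)


params = {'schedule': {'type': 'Schedule', 'config': {'step': 5}}, 'lr': 0.1}
recursive_gen(params, 'schedule')
print(type(params['schedule']).__name__, params['schedule'].step)  # Schedule 5
```

Plain values such as 'lr' are left untouched, since only dictionaries naming a registered type are replaced.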

reset()

Resets the starting index of this dataset to zero. This is useful for repeated evaluations on the dataset, so that each pass starts from the first example instead of continuing from the wraparound of the last uneven minibatch. It is not necessary when the number of examples is divisible by the batch size.
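
To illustrate why reset() matters, the sketch below models only the index bookkeeping. It assumes that the last uneven minibatch is filled by wrapping around to the start of the data and that the next pass then begins right after the wrapped-around examples. WrapIterator is a hypothetical stand-in, not neon's class.

```python
class WrapIterator(object):
    """Hypothetical model of minibatch index bookkeeping with wraparound."""

    def __init__(self, ndata, bsz):
        self.ndata, self.bsz, self.start = ndata, bsz, 0

    def __iter__(self):
        for i in range(self.start, self.ndata, self.bsz):
            idx = list(range(i, min(i + self.bsz, self.ndata)))
            short = self.bsz - len(idx)
            if short:
                # Fill the uneven last minibatch from the start of the data,
                # and begin the next pass after the wrapped-around examples
                idx += list(range(short))
                self.start = short
            yield idx

    def reset(self):
        # Start the next pass at example 0 again
        self.start = 0


it = WrapIterator(ndata=10, bsz=4)
print(list(it))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
print(list(it))  # [[2, 3, 4, 5], [6, 7, 8, 9]]
it.reset()
print(list(it))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
```

With 10 examples and a batch size of 4, the second pass sees a different partition of the data; resetting restores the original one. When the number of examples divides evenly, no wraparound occurs and every pass is identical.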