neon.data.hdf5iterator.HDF5Iterator

class neon.data.hdf5iterator.HDF5Iterator(hdf_filename, name=None)[source]

Bases: neon.data.dataiterator.ArrayIterator

Data iterator which uses an HDF5 file as the source of the data, useful when the entire HDF5 dataset cannot fit into memory (for smaller datasets, use the ArrayIterator).

To initialize the HDF5Iterator, simply call:

train_set = HDF5Iterator('your_data_path.h5')

The HDF5 file format must contain the following datasets:

  • input (ndarray):

    Input data, which is a 2-D array (float or uint8) that has size (N, F), where N is the number of examples, and F is the number of features. For images, F = C*H*W where C is the number of channels and H and W are the height and width of the image, respectively. This data must also have the following attributes:

    • lshape (tuple):
      Tuple of ints indicating the shape of each input (for example, image data may have an lshape of [C, H, W])
    • mean (ndarray):
      The mean to subtract, either formatted as (F, 1) or a mean for each channel with dimensions (C, 1)
  • output (ndarray):

    An optional dataset which, if supplied, will be used as the target/expected output of the network. The array should have the shape (N, M), where N is the number of items (must match the N dimension of the input dataset) and M is the size of the output data, which must match the size of the output from the output layer of the network.
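
A file with this layout could be produced with h5py, roughly as sketched below. This is an illustration under stated assumptions, not neon's own tooling: the helper names (make_example_arrays, write_hdf5), the random toy data, and the choice of a per-channel mean are made up for the example, and lshape and mean are assumed to be stored as HDF5 attributes on the input dataset, as the description above implies.

```python
import numpy as np


def make_example_arrays(n=8, lshape=(3, 4, 4), n_out=2, seed=0):
    """Build toy arrays matching the layout described above (hypothetical helper)."""
    c, h, w = lshape
    rng = np.random.RandomState(seed)
    # input: (N, F) with F = C*H*W
    inputs = rng.randint(0, 256, size=(n, c * h * w)).astype(np.uint8)
    # output: (N, M); N must match the input dataset
    outputs = rng.rand(n, n_out).astype(np.float32)
    # per-channel mean with shape (C, 1); an (F, 1) mean is the other option
    mean = inputs.reshape(n, c, h * w).mean(axis=(0, 2)).reshape(c, 1)
    return inputs, outputs, mean.astype(np.float32)


def write_hdf5(path, inputs, outputs, lshape, mean):
    """Write the arrays to `path` using the dataset and attribute names above."""
    import h5py  # imported here so the array helper works without h5py

    with h5py.File(path, "w") as f:
        dset = f.create_dataset("input", data=inputs)
        dset.attrs["lshape"] = lshape
        dset.attrs["mean"] = mean
        f.create_dataset("output", data=outputs)
```

Calling write_hdf5('your_data_path.h5', inputs, outputs, (3, 4, 4), mean) with the arrays returned by make_example_arrays() should then give a file that follows the layout described above.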

For cases where the output should be converted to a one-hot encoding (see Loading Data), use the HDF5IteratorOneHot. For autoencoder problems, use HDF5IteratorAutoencoder.

__init__(hdf_filename, name=None)[source]
Parameters:
  • hdf_filename (string) – Path to the HDF5 datafile.
  • name (string, optional) – Name to assign this iterator. Defaults to None.

Methods

__init__(hdf_filename[, name]) hdf_filename: Path to the HDF5 datafile.
allocate() After the input and output (self.inp and self.out) have been set this function will allocate the host and device buffers for the mini-batches.
allocate_inputs() Allocates the host and device input data buffers and any other associated storage.
allocate_outputs() Allocates the host and device output data buffers and any other associated storage.
cleanup() Closes the HDF file.
gen_class(pdict)
gen_input(mini_batch) Function to handle any preprocessing before pushing an input mini-batch to the device.
gen_output(mini_batch) Function to handle any preprocessing before pushing an output mini-batch to the device.
get_description([skip]) Returns a dict that contains all necessary information needed to serialize this object.
recursive_gen(pdict, key) Helper method to check whether the definition dictionary is defining a NervanaObject child.
reset() Resets the index to zero.
allocate()[source]

After the input and output (self.inp and self.out) have been set this function will allocate the host and device buffers for the mini-batches.

The host buffers are referenced as self.mini_batch_in and self.mini_batch_out, and the device buffers are stored as self.inbuf and self.outbuf.

allocate_inputs()[source]

Allocates the host and device input data buffers and any other associated storage.

  • self.inpbuf is the on-device buffer for the input minibatch
  • self.mini_batch_in is the on-host buffer for the input minibatch
  • self.mean is the on-device buffer of the mean array

allocate_outputs()[source]

Allocates the host and device output data buffers and any other associated storage.

  • self.outbuf is the on-device buffer for the output minibatch
  • self.mini_batch_out is the on-host buffer for the output minibatch

be = None
classnm

Returns the class name.

cleanup()[source]

Closes the HDF file.

gen_class(pdict)
gen_input(mini_batch)[source]

Function to handle any preprocessing, such as mean subtraction, before pushing an input mini-batch to the device.

Parameters: mini_batch (ndarray) – M-by-N array where M is the flattened input vector size and N is the batch size
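
The mean subtraction mentioned for gen_input can be sketched in NumPy as follows. This is not neon's implementation, which works on backend buffers. subtract_mean is a hypothetical helper, and it assumes the minibatch is laid out as M-by-N with the M features ordered channel first (C, H, W), so that the (F, 1) and (C, 1) mean formats described for the input dataset both apply.

```python
import numpy as np


def subtract_mean(mini_batch, mean, lshape):
    """Subtract an (F, 1) or (C, 1) mean from an (F, N) minibatch."""
    f, n = mini_batch.shape
    c = lshape[0]
    batch = mini_batch.astype(np.float32)
    if mean.shape == (f, 1):
        # one mean per feature: broadcasts across the N columns
        return batch - mean
    if mean.shape == (c, 1):
        # one mean per channel: group the F rows by channel first
        per_channel = batch.reshape(c, -1, n) - mean.reshape(c, 1, 1)
        return per_channel.reshape(f, n)
    raise ValueError("mean must have shape (F, 1) or (C, 1)")
```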
gen_output(mini_batch)[source]

Function to handle any preprocessing, such as one-hot generation, before pushing an output mini-batch to the device.

Parameters: mini_batch (ndarray) – M-by-N array where M is the flattened output vector size and N is the batch size
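
One-hot generation, the example given for gen_output, might look like the NumPy sketch below. It is only an illustration of the encoding: one_hot is a hypothetical helper, and it assumes the labels arrive as a (1, N) array of integer class ids and that the number of classes is known in advance.

```python
import numpy as np


def one_hot(labels, nclass):
    """Turn a (1, N) array of integer class ids into an (nclass, N) one-hot array."""
    labels = np.asarray(labels).reshape(-1).astype(int)
    encoded = np.zeros((nclass, labels.size), dtype=np.float32)
    # set row `label` of column `i` to 1 for each example i
    encoded[labels, np.arange(labels.size)] = 1.0
    return encoded
```

The result keeps the M-by-N orientation used above, with M equal to the number of classes.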
get_description(skip=[], **kwargs)

Returns a dict that contains all necessary information needed to serialize this object.

Parameters: skip (list) – Objects to omit from the dictionary.
Returns: Dictionary format for object information.
Return type: dict
modulenm

Returns the full module path.

nbatches

Returns the number of minibatches in this dataset.
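
This page does not say how a final, partially filled minibatch is counted. As a rough sketch, assuming a partial minibatch counts as one full minibatch, the count is a ceiling division of the number of examples by the batch size. count_minibatches is a hypothetical helper, not part of neon.

```python
def count_minibatches(ndata, batch_size):
    """Number of minibatches needed to cover ndata examples (ceiling division)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # negate, floor-divide, negate again: rounds up for non-negative ndata
    return -(-ndata // batch_size)
```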

recursive_gen(pdict, key)

Helper method that checks whether the definition dictionary defines a NervanaObject child. If so, it instantiates that object and replaces the dictionary element with the new instance.
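
The replace-definition-with-instance pattern can be sketched in plain Python as below. Everything here is hypothetical: the Scale class, the REGISTRY lookup, and the 'type' and 'config' keys are assumptions made for illustration and may not match how neon represents or resolves object definitions.

```python
class Scale(object):
    """Stand-in for a NervanaObject child; not a neon class."""

    def __init__(self, factor=1.0):
        self.factor = factor


# hypothetical registry mapping type names to classes
REGISTRY = {"Scale": Scale}


def instantiate_in_place(pdict, key):
    """If pdict[key] is a definition dict, replace it with an instance."""
    entry = pdict.get(key)
    if isinstance(entry, dict) and entry.get("type") in REGISTRY:
        cls = REGISTRY[entry["type"]]
        pdict[key] = cls(**entry.get("config", {}))
    return pdict
```

Entries that are not definition dictionaries are left untouched, so the helper can be applied to every key of a parameter dictionary.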

reset()[source]

Resets the index to zero.