neon.data.hdf5iterator.HDF5IteratorOneHot

class neon.data.hdf5iterator.HDF5IteratorOneHot(hdf_filename, name=None)[source]

Bases: neon.data.hdf5iterator.HDF5Iterator

Extends the HDF5Iterator class to add one-hot conversion of the target data. For example:

train_set = HDF5IteratorOneHot('your_data_path.h5')

The HDF5 file must contain the following datasets:
  • input dataset: the input data, a 2-D array (float or uint8) of
    size (N, C*H*W), where N is the number of examples, C is the number of channels, and H and W are the height and width of the image, respectively. This dataset must also have the following attributes:
      • lshape: a tuple of ints indicating the shape of each
        input (for example, image data may have an lshape of [C, H, W])
      • mean: the mean to subtract, formatted either as (C*H*W, 1) or
        as a per-channel mean with dimensions (C, 1)
  • output dataset: an optional dataset which, if supplied, will be
    used as the target/expected output of the network. The array should have shape (N, M), where N is the number of examples (this must match the N dimension of the input dataset) and M is the size of the output data, which must match the size of the output from the network's output layer.

If present, the “output” dataset in the HDF5 file must have an ‘nclass’ attribute specifying the total number of output classes, which is needed to generate the one-hot encoding.
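The layout described above can be sketched with h5py. The file name, array sizes, and values below are illustrative only and not taken from the neon documentation:

```python
import numpy as np
import h5py

# Illustrative sizes (assumptions): 10 RGB 4x4 images, 3 classes
N, C, H, W = 10, 3, 4, 4
nclass = 3

with h5py.File('example_data.h5', 'w') as f:
    # 'input': (N, C*H*W) float array with the required lshape and mean attributes
    inp = f.create_dataset('input',
                           data=np.random.rand(N, C * H * W).astype(np.float32))
    inp.attrs['lshape'] = (C, H, W)
    inp.attrs['mean'] = np.zeros((C * H * W, 1), dtype=np.float32)

    # 'output': integer class labels; the nclass attribute enables
    # the one-hot conversion performed by HDF5IteratorOneHot
    out = f.create_dataset('output',
                           data=np.random.randint(0, nclass, size=(N, 1)))
    out.attrs['nclass'] = nclass
```

A file written this way could then be passed to the iterator as in the example above, e.g. `train_set = HDF5IteratorOneHot('example_data.h5')`.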

__init__(hdf_filename, name=None)[source]
Parameters:
  • hdf_filename (string) – Path to the HDF5 datafile.
  • name (string, optional) – Name to assign this iterator. Defaults to None.

Methods

__init__(hdf_filename[, name]) Initializes the iterator; hdf_filename is the path to the HDF5 datafile.
allocate() After the input and output (self.inp and self.out) have been set this function will allocate the host and device buffers for the mini-batches.
allocate_inputs() Allocates the host and device input data buffers and any other associated storage.
allocate_outputs()
cleanup() Closes the HDF file.
gen_class(pdict)
gen_input(mini_batch) Function to handle any preprocessing before pushing an input mini-batch to the device.
gen_output(mini_batch)
get_description([skip]) Returns a dict that contains all necessary information needed to serialize this object.
recursive_gen(pdict, key) Helper method to check whether the definition dictionary defines a NervanaObject child.
reset() Resets the index to zero.
allocate()

After the input and output (self.inp and self.out) have been set this function will allocate the host and device buffers for the mini-batches.

The host buffer is referenced as self.mini_batch_in and self.mini_batch_out, and stored on device as self.inbuf and self.outbuf.

allocate_inputs()

Allocates the host and device input data buffers and any other associated storage.

self.inpbuf is the on-device buffer for the input minibatch.
self.mini_batch_in is the on-host buffer for the input minibatch.
self.mean is the on-device buffer for the mean array.

allocate_outputs()[source]
be = None
classnm

Returns the class name.

cleanup()

Closes the HDF file.

gen_class(pdict)
gen_input(mini_batch)

Function to handle any preprocessing (for example, mean subtraction) before pushing an input mini-batch to the device.

Parameters:mini_batch (ndarray) – M-by-N array, where M is the flattened input vector size and N is the batch size.
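The mean-subtraction step that gen_input performs can be sketched in NumPy. The function name is hypothetical, and the per-channel broadcast is an assumption based on the two mean formats described above ((C*H*W, 1) or (C, 1)):

```python
import numpy as np

def subtract_mean(mini_batch, mean, lshape):
    """Sketch (not neon's implementation): subtract a stored mean from an
    (M, N) mini-batch, where M = C*H*W and N is the batch size."""
    C, H, W = lshape
    M, bsz = mini_batch.shape
    if mean.shape[0] == M:
        # full (C*H*W, 1) mean: broadcasts directly over the batch dimension
        return mini_batch - mean
    # per-channel (C, 1) mean: expand each channel's mean over its H*W pixels
    expanded = np.repeat(mean, H * W).reshape(M, 1)
    return mini_batch - expanded

# C=3, H=W=2, batch size 4; per-channel means of 1.0, 0.5, 0.0
batch = np.ones((3 * 2 * 2, 4), dtype=np.float32)
channel_mean = np.array([[1.0], [0.5], [0.0]])
out = subtract_mean(batch, channel_mean, (3, 2, 2))
```

Rows belonging to the first channel come out as 0.0, the second as 0.5, and the third unchanged at 1.0.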
gen_output(mini_batch)[source]
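gen_output carries no docstring here, but the one-hot conversion this subclass adds can be sketched in NumPy. The helper name is illustrative, not neon's API:

```python
import numpy as np

def to_onehot(labels, nclass):
    """Sketch: turn an (N, 1) column of integer class labels into an
    (N, nclass) one-hot array, as implied by the 'nclass' attribute."""
    labels = labels.ravel().astype(int)
    onehot = np.zeros((labels.size, nclass), dtype=np.float32)
    # set a single 1.0 per row at the column given by the label
    onehot[np.arange(labels.size), labels] = 1.0
    return onehot

y = np.array([[0], [2], [1]])
encoded = to_onehot(y, 3)
```

Here `encoded` is a 3-by-3 array with exactly one 1.0 per row, in columns 0, 2, and 1 respectively.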
get_description(skip=[], **kwargs)

Returns a dict that contains all necessary information needed to serialize this object.

Parameters:skip (list) – Objects to omit from the dictionary.
Returns:Dictionary format for object information.
Return type:(dict)
modulenm

Returns the full module path.

nbatches

Returns the number of minibatches in this dataset.
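A minimal sketch of that relationship, assuming (this is an assumption, not stated here) that any trailing partial mini-batch is dropped:

```python
def nbatches(ndata, batch_size):
    # Hypothetical helper: number of whole mini-batches in ndata examples.
    # Whether a trailing partial batch counts is an assumption (floor here).
    return ndata // batch_size
```

For example, 1050 examples at a batch size of 100 would yield 10 full mini-batches under this assumption.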

recursive_gen(pdict, key)

Helper method to check whether the definition dictionary defines a NervanaObject child; if so, it instantiates that object and replaces the dictionary element with the instance.

reset()

Resets the index to zero.