neon.data.questionanswer.BABI

class neon.data.questionanswer.BABI(path='.', task='qa1_single-supporting-fact', subset='en')[source]

Bases: neon.data.datasets.Dataset

This class loads the Facebook bAbI dataset and vectorizes it into stories, questions, and answers, as described in “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks” (http://arxiv.org/abs/1502.05698).

__init__(path='.', task='qa1_single-supporting-fact', subset='en')[source]

Load the bAbI dataset, extract the text, and read the stories. For a particular task, the class reads both the train and test files and combines their vocabularies.

Parameters:
  • path (str) – Directory to store the dataset
  • task (str) – a particular task to solve (all bAbI tasks are trained and tested separately)
  • subset (str) – subset of the dataset to use: {en, en-10k, shuffled, hn, hn-10k, shuffled-10k}

Methods

__init__([path, task, subset]) Load the bAbI dataset, extract the text, and read the stories.
compute_statistics() Compute vocab, word index, and max length of stories and queries.
data_to_list(data) Clean a block of data and split into lines.
fetch_dataset(url, sourcefile, destfile, totalsz) Download the file specified by the given URL.
flatten(data) Flatten a list of data.
gen_class(pdict)
gen_iterators() Method that generates the data set iterators for the train, test and validation data sets.
get_description([skip]) Returns a dict that contains all necessary information needed to serialize this object.
get_iterator(setname) Helper method to get the data iterator for specified dataset
load_data([path, task, subset]) Fetch the Facebook bAbI dataset and load it to memory.
load_zip(filename, size) Helper function for downloading test files
one_hot_vector(answer) Create one-hot representation of an answer.
parse_babi(babi_file) Parse bAbI data into stories, queries, and answers.
recursive_gen(pdict, key) Helper method to check whether the definition dictionary defines a NervanaObject child.
serialize() Generates dictionary with the required parameters to describe this object
tokenize(sentence) Split a sentence into tokens including punctuation.
vectorize_stories(data) Convert (story, query, answer) word data into vectors.
words_to_vector(words) Convert a list of words into vector form.
be = None
classnm

Returns the class name.

compute_statistics()[source]

Compute vocab, word index, and max length of stories and queries.
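
The statistics named above can be illustrated with a minimal standalone sketch. It is not neon's implementation: the free function `compute_statistics`, its `examples` argument, and the choice to reserve index 0 for padding are all assumptions made for illustration.

```python
def compute_statistics(examples):
    """Return (vocab, word_to_index, story_maxlen, query_maxlen).

    `examples` is a list of (story, query, answer) tuples, where story and
    query are lists of tokens and answer is a single token (assumed layout).
    """
    words = set()
    for story, query, answer in examples:
        words.update(story)
        words.update(query)
        words.add(answer)
    vocab = sorted(words)
    # Assumption: index 0 is reserved for padding, so real words start at 1.
    word_to_index = {word: i + 1 for i, word in enumerate(vocab)}
    story_maxlen = max(len(story) for story, _, _ in examples)
    query_maxlen = max(len(query) for _, query, _ in examples)
    return vocab, word_to_index, story_maxlen, query_maxlen


examples = [
    (["Mary", "moved", "to", "the", "bathroom", "."],
     ["Where", "is", "Mary", "?"], "bathroom"),
    (["John", "went", "to", "the", "hallway", "."],
     ["Where", "is", "John", "?"], "hallway"),
]
vocab, word_to_index, story_maxlen, query_maxlen = compute_statistics(examples)
```

The maximum lengths are what a later padding step would use to give every story and query the same length.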

data_dict
static data_to_list(data)[source]

Clean a block of data and split into lines.

Parameters:data (string) – String of bAbI data.
Returns:List of cleaned lines of bAbI data.
Return type:list
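
As a rough illustration of "clean and split into lines", the sketch below strips surrounding whitespace and drops empty lines. The exact cleaning rules used by neon are not stated here, so this behavior is an assumption.

```python
def data_to_list(data):
    """Split a block of text into stripped, non-empty lines (assumed rules)."""
    return [line.strip() for line in data.splitlines() if line.strip()]


raw = "1 Mary moved to the bathroom.\n2 John went to the hallway.\n\n"
lines = data_to_list(raw)
```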
fetch_dataset(url, sourcefile, destfile, totalsz)

Download the file specified by the given URL.

Parameters:
  • url (str) – Base URL of the file to be downloaded.
  • sourcefile (str) – Name of the source file.
  • destfile (str) – Path to the destination.
  • totalsz (int) – Size of the file to be downloaded.
static flatten(data)[source]

Flatten a list of data.

Parameters:data (list) – List of list of words.
Returns:A single flattened list of all words.
Return type:list
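
Flattening a list of word lists can be sketched with the standard library as follows. This is one common way to do it, not necessarily how neon implements the method.

```python
from itertools import chain


def flatten(data):
    """Concatenate a list of word lists into one flat list."""
    return list(chain.from_iterable(data))


flat = flatten([["Mary", "moved"], ["to", "the", "bathroom"]])
```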
gen_class(pdict)
gen_iterators()

Method that generates the data set iterators for the train, test and validation data sets. This method needs to set the instance data_set attribute to a dictionary of data iterators.

Returns:dictionary with the various data set iterators
Return type:dict
get_description(skip=[], **kwargs)

Returns a dict that contains all necessary information needed to serialize this object.

Parameters:skip (list) – Objects to omit from the dictionary.
Returns:Dictionary format for object information.
Return type:(dict)
get_iterator(setname)

Helper method to get the data iterator for specified dataset

Parameters:setname (str) – which iterator to return (e.g. ‘train’, ‘valid’)
load_data(path='.', task='qa1_single-supporting-fact', subset='en')[source]

Fetch the Facebook bAbI dataset and load it to memory.

Parameters:
  • path (str, optional) – Local directory in which to cache the raw dataset. Defaults to current directory.
  • task (str, optional) – bAbI task to load
  • subset (str, optional) – The data comes in English, Hindi, or shuffled characters. Options are ‘en’, ‘hn’, and ‘shuffled’ for 1000 training and test examples, or ‘en-10k’, ‘hn-10k’, and ‘shuffled-10k’ for 10000 examples.
Returns:Training and test files.
Return type:tuple
load_zip(filename, size)

Helper function for downloading test files. Downloads and unzips the file into the directory self.path.

Parameters:
  • filename (str) – name of file to download from self.url
  • size (str) – size of the file in bytes?
Returns:Path to the downloaded dataset.
Return type:str
modulenm

Returns the full module path.

one_hot_vector(answer)[source]

Create one-hot representation of an answer.

Parameters:answer (string) – The word answer.
Returns:One-hot representation of answer.
Return type:list
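
A one-hot encoding of an answer word can be sketched as below. In the class, the word index presumably comes from instance state. Here `word_to_index` and `vocab_size` are passed explicitly and are hypothetical parameters, with index 0 assumed to be reserved for padding.

```python
def one_hot_vector(answer, word_to_index, vocab_size):
    """Return a list of zeros with a single 1 at the answer's index."""
    vector = [0] * vocab_size
    vector[word_to_index[answer]] = 1
    return vector


# Hypothetical 1-based word index; slot 0 is left for padding.
word_to_index = {"bathroom": 1, "hallway": 2, "kitchen": 3}
encoded = one_hot_vector("hallway", word_to_index, vocab_size=4)
```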
static parse_babi(babi_file)[source]

Parse bAbI data into stories, queries, and answers.

Parameters:babi_file (string) – Filename with bAbI data.
Returns:List of (story, query, answer) words.
Return type:list of tuples
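
The parsing step can be sketched on an in-memory string. In the bAbI text format each line starts with a numeric ID, an ID of 1 begins a new story, and question lines carry the question, the answer, and the supporting fact IDs separated by tabs. The function name `parse_babi_text` is hypothetical, and unlike the method above this sketch takes text rather than a filename and leaves sentences untokenized.

```python
def parse_babi_text(babi_data):
    """Parse bAbI-formatted text into (story, query, answer) tuples."""
    examples = []
    story = []
    for line in babi_data.splitlines():
        line = line.strip()
        if not line:
            continue
        line_id, text = line.split(" ", 1)
        if int(line_id) == 1:
            story = []  # an ID of 1 marks the start of a new story
        if "\t" in text:
            # Question line: question, answer, supporting fact IDs.
            query, answer = text.split("\t")[:2]
            examples.append((list(story), query.strip(), answer))
        else:
            story.append(text)
    return examples


sample = (
    "1 Mary moved to the bathroom.\n"
    "2 John went to the hallway.\n"
    "3 Where is Mary? \tbathroom\t1\n"
    "1 Daniel went to the kitchen.\n"
    "2 Where is Daniel? \tkitchen\t1\n"
)
parsed = parse_babi_text(sample)
```

Copying the story with `list(story)` keeps each example independent when later questions in the same story extend the running context.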
recursive_gen(pdict, key)

Helper method to check whether the definition dictionary defines a NervanaObject child. If so, it instantiates that object and replaces the dictionary element with an instance of that object.

serialize()

Generates dictionary with the required parameters to describe this object

test_iter

Helper method to return test set iterator

static tokenize(sentence)[source]

Split a sentence into tokens including punctuation.

Parameters:sentence (string) – String of sentence to tokenize.
Returns:List of tokens.
Return type:list
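
Tokenizing so that punctuation becomes its own token can be sketched with a regular expression. The pattern below is an assumption for illustration and may differ from the one neon uses.

```python
import re


def tokenize(sentence):
    """Return words and punctuation marks as separate tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)


tokens = tokenize("Where is Mary?")
```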
train_iter

Helper method to return training set iterator

valid_iter

Helper method to return validation set iterator

vectorize_stories(data)[source]

Convert (story, query, answer) word data into vectors.

Parameters:data (tuple) – Tuple of story, query, answer word data.
Returns:Tuple of story, query, answer vectors.
Return type:tuple
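
One plausible shape for this conversion is sketched below for a single example: words become indices, stories and queries are padded to fixed lengths, and the answer becomes a one-hot vector. Left-padding with 0, the helper `pad_left`, and the function `vectorize_example` with its explicit parameters are all assumptions, not neon's API.

```python
def pad_left(indices, maxlen, pad_value=0):
    """Left-pad (or left-truncate) a list of indices to exactly maxlen items."""
    if len(indices) > maxlen:
        indices = indices[len(indices) - maxlen:]
    return [pad_value] * (maxlen - len(indices)) + indices


def vectorize_example(story, query, answer, word_to_index,
                      story_maxlen, query_maxlen):
    """Convert one (story, query, answer) example into index vectors."""
    story_vec = pad_left([word_to_index[w] for w in story], story_maxlen)
    query_vec = pad_left([word_to_index[w] for w in query], query_maxlen)
    # One slot per word plus slot 0, which is assumed to be the padding index.
    answer_vec = [0] * (len(word_to_index) + 1)
    answer_vec[word_to_index[answer]] = 1
    return story_vec, query_vec, answer_vec


word_to_index = {"Mary": 1, "is": 2, "here": 3, "Where": 4, "?": 5, "bathroom": 6}
story_vec, query_vec, answer_vec = vectorize_example(
    ["Mary", "is", "here"], ["Where", "is", "Mary", "?"], "bathroom",
    word_to_index, story_maxlen=5, query_maxlen=4,
)
```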
words_to_vector(words)[source]

Convert a list of words into vector form.

Parameters:words (list) – List of words.
Returns:Vectorized list of words.
Return type:list
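
Converting words to vector form amounts to a lookup in the word index. In the sketch below the index is passed in explicitly as the hypothetical parameter `word_to_index`, whereas the method presumably reads it from the instance.

```python
def words_to_vector(words, word_to_index):
    """Map each word to its integer index."""
    return [word_to_index[word] for word in words]


# Hypothetical word index for illustration.
word_to_index = {"Mary": 1, "Where": 2, "is": 3, "?": 4}
vector = words_to_vector(["Where", "is", "Mary", "?"], word_to_index)
```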