neon.backends.nervanamkl.NervanaMKL¶

class neon.backends.nervanamkl.NervanaMKL(rng_seed=None, default_dtype=<class 'numpy.float32'>, hist_bins=64, hist_offset=48, compat_mode=None, num_devices=None, stochastic_round=None, device_id=None, deterministic=None)[source]¶
Bases: neon.backends.nervanacpu.NervanaCPU

MKL Backend

__init__(rng_seed=None, default_dtype=<class 'numpy.float32'>, hist_bins=64, hist_offset=48, compat_mode=None, num_devices=None, stochastic_round=None, device_id=None, deterministic=None)[source]¶
Methods

Relu(ary[, out])
    Calculates the ReLU transformation for the input array.
__init__([rng_seed, default_dtype, …])
absolute(a[, out])
    Perform elementwise absolute value of Tensor a, storing the result in Tensor out.
add(a, b[, out])
    Perform elementwise addition on the operands, storing the resultant values in the out Tensor.
add_fc_bias(inputs, bias)
    Add the bias for a fully connected network layer.
allocate_backend(name, **kargs)
    Allocate a named backend.
allocate_new_deltas(delta, in_shape, parallelism)
    For MKL backends, allocate new deltas for broadcast.
allocate_new_outputs(layer, share_output)
argmax(a[, axis, out, keepdims])
    Calculates the indices of the maximal element value along the specified axis.
argmin(a[, axis, out, keepdims])
    Calculates the indices of the minimal element value along the specified axis.
array(ary[, dtype, name, persist_values, …])
    Instantiate a new instance of the CPUTensor class setting each element value to what is specified in ary.
backend_choices()
    Return the list of available backends.
batched_dot(A, B, C[, alpha, beta, relu])
    Batched matrix multiply used for fprop, bprop, and update (see the detailed entry below).
batchnorm_layer(in_shape)
begin(block, identifier)
    Signal the start of a block of repeated computation (at the start of a loop).
bibnrnn_layer(h_buffer_all, h_ff_buffer, …)
    Create a new BiBNRNN parameter object.
binarize(ary, out[, stochastic])
    Binarizes the input array.
bprop_conv(layer, F, E, grad_I[, X, bias, …])
    Backward propagate the error through a convolutional network layer.
bprop_lrn(layer, I, O, E, delta, denom[, …])
    Backward propagate pooling layer.
bprop_mergebroadcast(ngLayer, layers, …)
bprop_mergesum(ngLayer, alpha, beta, layers, …)
bprop_pool(layer, I, O[, argmax, alpha, beta])
    Backward propagate pooling layer.
bprop_relu(layer, x, error, deltas, slope)
bprop_skipnode(error, delta, alpha, beta)
bprop_transform(nglayer, transform, outputs, …)
change_data_store_order(a, a_row, a_col, a_len)
check_caffe_compat()
    Check whether compatibility mode is set to 'caffe'.
clean_data(tensor, layer_mkl)
cleanup_backend()
    Release any resources that have been acquired by this backend.
clip(a, a_min, a_max[, out])
    Performs elementwise clipping of Tensor a, storing the result in out.
compound_bprop_bn(deltas, grad_gamma, …[, …])
compound_bprop_lut(nin, inputs, error, …)
    Backward propagate lookup table layer.
compound_dot(A, B, C[, alpha, beta, relu, bsum])
    Computes C = alpha * A * B + beta * C (also with A.T or B.T; * is matrix multiply).
compound_fprop_bn(x, xsum, xvar, gmean, …)
compound_rnn_unroll_bprop(W_recur, …[, …])
    Time step unrolling portion of recurrent layer bprop.
compound_rnn_unroll_bprop_bibnrnn(ngLayer, …)
compound_rnn_unroll_fprop(W_recur, h_prev_s, …)
    Time step unrolling portion of recurrent layer fprop.
compound_rnn_unroll_fprop_bibnrnn(ngLayer, …)
consume(buf_index, hostlist, devlist)
conv_layer(dtype, N, C, K[, D, H, W, T, R, …])
    Create a new ConvLayer parameter object.
convert(a)
convert_data(tensor, layer_mkl)
convert_mkl(a)
copy_transpose(a, out[, axes, repeat])
    Use MKL transposition to speed up the copy.
deconv_layer(dtype, N, C, K, M, P, Q[, T, …])
    Create a new DeconvLayer parameter object.
detectionOutput_fprop(conf_view, loc_view, …)
distribute_data(tensor, layer_parallelism)
    For backends which support distributed training, distribute or gather the error or activation tensor depending on the type of parallelism used to distribute the layer computation.
divide(a, b[, out])
    Perform elementwise division on the operands, storing the resultant values in the out Tensor.
dot(a, b[, out])
    Dot product of two Tensors.
dump_hist_data()
empty(shape[, dtype, name, persist_values, …])
    Instantiate a new instance of the CPUTensor class without initializing individual element values.
empty_like(ary[, dtype, name, persist_values])
    Instantiate a new instance of this backend's Tensor class, with the shape taken from ary.
end(block, identifier)
    Signal the corresponding end of a block of repeated computation (at the end of a loop).
equal(a, b[, out])
    Performs elementwise equality testing on each element of left and right, storing the result in out.
execute(optree)
exp(a[, out])
    Perform elementwise exponential transformation on Tensor a, storing the result in Tensor out.
exp2(a[, out])
    Perform elementwise base-2 exponential transformation on Tensor a, storing the result in Tensor out.
fabs(a[, out])
    Perform elementwise absolute value of Tensor a, storing the result in Tensor out.
fill_normal(ary[, mean, stdv])
    Fill ary with normally distributed random numbers.
finite(a[, out])
    Perform elementwise test of finiteness (neither infinity nor NaN) on Tensor a, storing the result in Tensor out.
fprop_conv(layer, I, F, O[, X, bias, bsum, …])
    Forward propagate the inputs of a convolutional network layer to produce output.
fprop_lrn(layer, I, O, denom[, alpha, beta, …])
    Forward propagate pooling layer.
fprop_mergebroadcast(ngLayer, inputs, …)
fprop_mergesum(ngLayer, inputs, inference, …)
fprop_pool(layer, I, O[, argmax, beta])
    Forward propagate pooling layer.
fprop_relu(layer, x, slope)
fprop_skipnode(x, y, beta)
fprop_softmax(x, axis)
fprop_transform(nglayer, transform, inputs, …)
gen_rng([seed])
    Generate the random number generator on host.
get_numpy(a)
get_time(start, end)
    Return time between start and end marks.
greater(a, b[, out])
    Performs elementwise greater-than testing on each element of left and right, storing the result in out.
greater_equal(a, b[, out])
    Performs elementwise greater-than-or-equal testing on each element of left and right, storing the result in out.
init_mark()
    Generate a timing mark object.
iobuf(dim0[, x, dtype, name, …])
    Allocate input and output buffer for a layer based on batch size.
is_mkl()
less(a, b[, out])
    Performs elementwise less-than testing on each element of left and right, storing the result in out.
less_equal(a, b[, out])
    Performs elementwise less-than-or-equal testing on each element of left and right, storing the result in out.
log(a[, out])
    Perform elementwise natural logarithm transformation on Tensor a, storing the result in Tensor out.
log2(a[, out])
    Perform elementwise base-2 logarithm transformation on Tensor a, storing the result in Tensor out.
lrn_layer(dtype, N, C[, D, H, W, J])
    Create a new PoolLayer parameter object.
make_binary_mask(out[, keepthresh])
    Create a binary mask for dropout layers.
max(a[, axis, out, keepdims])
    Calculates the maximal element value along the specified axes.
maximum(a, b[, out])
    Performs elementwise maximum value assignment based on corresponding elements of left and right, storing the result in out.
mean(a[, axis, partial, out, keepdims])
    Calculates the arithmetic mean of the elements along the specified axes.
mergebroadcast_layer(layer_num)
mergesum_layer(layer_num)
min(a[, axis, out, keepdims])
    Calculates the minimal element value along the specified axes.
minimum(a, b[, out])
    Performs elementwise minimum value assignment based on corresponding elements of left and right, storing the result in out.
multiply(a, b[, out])
    Perform elementwise multiplication on the operands, storing the resultant values in the out Tensor.
negative(a[, out])
    Perform elementwise negation of Tensor a, storing the result in Tensor out.
nms(detections, threshold[, normalized])
    Function to perform non-maximal suppression.
not_equal(a, b[, out])
    Performs elementwise inequality testing on each element of left and right, storing the result in out.
onehot(indices, axis[, out])
    Generate optree for converting indices to a one-hot representation.
ones(shape[, dtype, name, persist_values, …])
    Instantiate a new instance of the CPUTensor class setting each element value to 1.
ouput_dim entries:
output_dim(X, S, padding, strides[, …])
    Compute the output dimension along one dimension, given these sizes.
pool_layer(dtype, op, N, C[, D, H, W, J, T, …])
    Create a new PoolLayer parameter object.
power(a, b[, out])
    Perform elementwise raising of Tensor a values to the specified power, storing the result in Tensor out.
reciprocal(a[, out])
    Perform elementwise reciprocal of Tensor a, storing the result in Tensor out.
record_mark(marker)
    Mark the current time.
relu_layer()
revert_tensor(tensor)
    Reverts a tensor to its original state after being distributed by distribute_data.
rint(a[, out])
    Perform elementwise rounding to the nearest int.
rng_get_state()
    Return the current state of the on-host RNG.
rng_reset()
    Reset the random state to the state where the Backend was first initialized.
rng_set_state(state)
    Set the RNG state for the host RNG.
roipooling_bprop(I, rois, O, argmax, …)
    Function to perform bprop of ROIPooling.
roipooling_fprop(I, rois, O, argmax, …)
    Function to perform fprop of ROIPooling.
safelog(a[, out])
    Perform elementwise natural logarithm transformation on Tensor a, storing the result in Tensor out.
set_caffe_compat()
    Set flag to make layers compatible with Caffe in terms of conv and pool layer output size determination and dropout layer implementation.
set_hist_buffers(hist_bins, hist_offset)
sgn(a[, out])
    Perform elementwise indication of the sign of Tensor a, storing the result in Tensor out.
shared_iobuf_size(shape, parallelism)
    Computes the backend-specific size needed for an iobuf with a specified shape that is meant to be shared between layers.
shift(ary, shift_ary[, value, out])
    Shifts the input array.
sig(a[, out])
    Perform elementwise sigmoid transformation on Tensor a, storing the result in Tensor out.
sig2(a[, out])
    Perform elementwise base-2 sigmoid transformation on Tensor a, storing the result in Tensor out.
sqrt(a[, out])
    Perform elementwise square root of Tensor a, storing the result in Tensor out.
square(a[, out])
    Perform elementwise square of Tensor a, storing the result in Tensor out.
std(a[, axis, partial, out, keepdims])
    Calculates the standard deviation of the elements along the specified axes.
subtract(a, b[, out])
    Perform elementwise subtraction on the operands, storing the resultant values in the out Tensor.
sum(a[, axis, out, keepdims])
    Calculates the summation of the elements along the specified axis.
sum_tensor(sum, layer_num, tensors, output)
synchronize_mark(marker)
    Synchronize on the given marker.
take(a, indices, axis[, out])
    Extract elements based on the indices along a given axis.
tanh(a[, out])
    Perform elementwise hyperbolic tangent transformation on Tensor a, storing the result in Tensor out.
tanh2(a[, out])
    Perform elementwise base-2 hyperbolic tangent transformation on Tensor a, storing the result in Tensor out.
trans2d(W_recur_f, W_recur_b, …)
true_divide(a, b[, out])
    Alias of divide.
update_conv(layer, I, E, U[, alpha, beta, …])
    Compute the updated gradient for a convolutional network layer.
update_fc_bias(err, out)
    Compute the updated bias gradient for a fully connected network layer.
var(a[, axis, partial, out, keepdims, binary])
    Calculates the variance of the elements along the specified axes.
xnor_compound_dot(A, B, C[, beta, bsum])
    Performs XNOR GEMM.
zeros(shape[, dtype, name, persist_values, …])
    Instantiate a new instance of the CPUTensor class setting each element value to 0.
zeros_like(ary[, dtype, name, persist_values])
    Instantiate a new instance of this backend's Tensor class, with the shape taken from ary and populating each element with a value of 0.
Relu
(ary, out=None)¶ Calculates the ReLU transformation for the input array.
Parameters:  ary – numpy array
 out – reference to output
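A NumPy sketch of the semantics (the real method operates on backend tensors; the `relu` helper here is a hypothetical stand-in):

```python
import numpy as np

def relu(ary, out=None):
    """ReLU: elementwise max(x, 0), written into out when provided."""
    if out is None:
        out = np.empty_like(ary)
    np.maximum(ary, 0, out=out)
    return out

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(relu(x))  # [0. 0. 0. 3.]
```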

absolute
(a, out=None)¶ Perform elementwise absolute value of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

add
(a, b, out=None)¶ Perform elementwise addition on the operands, storing the resultant values in the out Tensor. Each operand and out must have identical shape or be broadcastable as such.
Parameters: Returns: the resulting optree
Return type:

add_fc_bias
(inputs, bias)¶ Add the bias for a fully connected network layer.
Parameters:

allocate_backend
(name, **kargs)¶ Allocate a named backend.

allocate_new_deltas
(delta, in_shape, parallelism)[source]¶ For MKL backends, allocate new deltas for broadcast

argmax
(a, axis=1, out=None, keepdims=True)¶ Calculates the indices of the maximal element value along the specified axis. If multiple elements contain the maximum, only the indices of the first are returned.
Parameters:  a (Tensor) – the Tensor on which to perform the operation
 axis (int, optional) – the dimension along which to compute. If set to None, we will take argmax over all dimensions. Defaults to 1
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
 keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns: the resulting optree
Return type:
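The keepdims behavior can be illustrated with plain NumPy (a stand-in for the backend's Tensor type; older NumPy argmax lacks keepdims, so the reduced axis is re-added by hand):

```python
import numpy as np

a = np.array([[1, 5, 2],
              [7, 0, 3]])

# axis=1 with keepdims=True, mirroring the backend defaults described above
idx = np.argmax(a, axis=1)[:, None]   # keep the reduced axis with size 1
print(idx.shape)    # (2, 1)
print(idx.ravel())  # [1 0]
```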

argmin
(a, axis=1, out=None, keepdims=True)¶ Calculates the indices of the minimal element value along the specified axis. If multiple elements contain the minimum, only the indices of the first are returned.
Parameters:  a (Tensor) – the Tensor on which to perform the operation
 axis (int, optional) – the dimension along which to compute. If set to None, we will take argmin over all dimensions. Defaults to 1
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
 keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns: the resulting optree
Return type:

array
(ary, dtype=None, name=None, persist_values=True, parallel=False, distributed=False)¶ Instantiate a new instance of the CPUTensor class setting each element value to what is specified in ary.
Parameters:  ary (numpy.ndarray) – The data structure containing element values spread across a number of dimensions. Python builtin types like ints and lists are supported.
 dtype (dtype, optional) – Element data type. If not specified we use default_dtype value (‘float32’ unless overridden).
 persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls
Returns: newly created data structure reference
Return type:

backend_choices
()¶ Return the list of available backends.

backend_name
= 'mkl'¶

backends
= {'cpu': <class 'neon.backends.nervanacpu.NervanaCPU'>, 'mkl': <class 'neon.backends.nervanamkl.NervanaMKL'>, 'gpu': <class 'neon.backends.nervanagpu.NervanaGPU'>}¶

batched_dot
(A, B, C, alpha=1.0, beta=0.0, relu=False)¶ Performs the following batched operations: (1) fprop: A(K, C), B(X, C, N), C(X, K, N) –> call batched_dot(A, B, C); (2) bprop: A(K, C), B(X, K, N), C(X, C, N) –> call batched_dot(A.T, B, C); (3) update: A(X, K, N), B(X, C, N), C(K, C) –> call batched_dot(A, B.T, C).
Parameters:  A, B – input operands
 C (CPUTensor) – output
 alpha, beta, relu – see usage in dot()
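The fprop mode above amounts to multiplying one shared weight matrix against a stack of inputs; a NumPy sketch (shapes and names are illustrative only):

```python
import numpy as np

K, C, N, X = 4, 3, 5, 2
rng = np.random.RandomState(0)
A = rng.rand(K, C)          # shared weight matrix
B = rng.rand(X, C, N)       # X batched (C, N) inputs
out = np.empty((X, K, N))

# fprop mode: out[x] = A @ B[x] for each batch item x
for x in range(X):
    out[x] = A @ B[x]

# The bprop and update modes follow the same pattern with A.T / B.T swapped in.
```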

begin
(block, identifier)¶ Signal the start of a block of repeated computation (at the start of a loop). This operation can be used to help the compiler optimize instruction performance, but has no direct effect on calculations. It must be bookended by a corresponding Backend.end() call. Note that multiple begin calls can appear adjacent in nested loops.
Parameters:  block (Block.attr) – identifies the type of computation being worked on based on Block attribute specified
 identifier (int) – unique identifier for this particular iteration of the block. Will typically be something like epoch number, minibatch number, and so forth.
See also

bibnrnn_layer
(h_buffer_all, h_ff_buffer, W_recur_f, W_recur_b, nsteps, nout)[source]¶ Create a new BiBNRNN parameter object (used to change the data storage type). This is then passed as an argument to all the BiBNRNN operations.
N: Number of images in minibatch
C: Number of output feature maps
K: Number of input feature maps

binarize
(ary, out, stochastic=True)¶ Binarizes the input array.
Parameters:  ary – tensor
 out – reference to output
 stochastic – stochastic or deterministic

bprop_conv
(layer, F, E, grad_I, X=None, bias=None, bsum=None, alpha=1.0, beta=0.0, relu=False, brelu=False, slope=0.0, layer_op=None)¶ Backward propagate the error through a convolutional network layer.
Parameters: Compounding Options:
 X: tensor to use in bprop_relu or beta
   can be same as grad_I for beta accumulate (this is the default when None); should be same shape as grad_I
 bias: (K,1) tensor to use for adding bias to output
   grad_I += bias
 bsum: (K,1) tensor to accumulate batch sum over (used in batchnorm or bprop_bias)
   bsum = sum(grad_I.reshape(K,-1), axis=1); the sum operation is fully deterministic
 alpha, beta:
   grad_I = alpha*grad_I + beta*X (grad_I = alpha*grad_I + beta*grad_I if X is grad_I)
 relu, slope: boolean flag to apply
   grad_I = max(grad_I, 0) + slope*min(grad_I, 0); can be combined with bias (bias is added first)
 brelu, slope: boolean flag to apply
   grad_I *= (X > 0) + slope*(X < 0); can be combined with bsum tensor to output bprop_bias

bprop_lrn
(layer, I, O, E, delta, denom, alpha=None, beta=None, ascale=1, bpower=1)¶ Backward propagate pooling layer.
Parameters:  layer (PoolLayer) – The pool layer object. Different backends have different pool layers.
 I (Tensor) – Input tensor.
 E (Tensor) – Error tensor.
 delta (Tensor) – Gradient tensor (delta)
 denom (Tensor) – denominator tensor computed during bprop
 ascale (float) – scaling parameter (alpha) to multiply the pooled sum (1.25e-5 in AK)
 bpower (float) – exponential parameter (beta) to raise denominator by (0.75 in AK)

bprop_mergebroadcast
(ngLayer, layers, error_views, error, deltas, out_shape, alpha, beta, alphas, betas)[source]¶

bprop_pool
(layer, I, O, argmax=None, alpha=1.0, beta=0.0)[source]¶ Backward propagate pooling layer.
Parameters:  layer (PoolLayer) – The pool layer object. Different backends have different pool layers.
 I (Tensor) – Input (error) tensor.
 O (Tensor) – Output (delta) tensor.
 argmax (Tensor) – tensor to store location of the maximum
 alpha (float) – linear scaling (does not work for l2 pooling)
 beta (float) – accumulation value into grad_I

check_caffe_compat
()¶ Check whether compatibility mode is set to ‘caffe’.

cleanup_backend
()¶ Release any resources that have been acquired by this backend.

clip
(a, a_min, a_max, out=None)¶ Performs elementwise clipping of Tensor a, storing the result in out. The clipped value will be between [a_min, a_max].
Parameters: Returns: the resulting optree
Return type:

compound_bprop_bn
(deltas, grad_gamma, grad_beta, error, inputs, xsum, xvar, gamma, eps, binary=False, layer=None)[source]¶

compound_bprop_lut
(nin, inputs, error, error_t, dW, pad_idx, alpha=1.0, beta=0)¶ Backward propagate lookup table layer.
Parameters:

compound_dot
(A, B, C, alpha=1.0, beta=0.0, relu=False, bsum=None)[source]¶ Performs one of the following operations (* is matrix multiply): C = alpha * A * B + beta * C; C = alpha * A.T * B + beta * C; C = alpha * A * B.T + beta * C.
relu: if true, applied before output (and prior to the beta addition)
The operation will be short-circuited to out <- alpha * left * right if beta is 0 (the default).
Parameters:
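The compounding described above can be sketched in NumPy (a hypothetical stand-in; the backend fuses these steps in MKL rather than materializing `tmp`):

```python
import numpy as np

def compound_dot(A, B, C, alpha=1.0, beta=0.0, relu=False):
    """C = alpha*(A @ B) + beta*C, with ReLU applied prior to the beta addition."""
    tmp = alpha * (A @ B)
    if relu:
        tmp = np.maximum(tmp, 0)
    C[:] = tmp + beta * C   # reduces to tmp when beta == 0 (the default)
    return C

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
C = np.zeros((2, 2))
compound_dot(A, np.eye(2), C)   # C now equals A
```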

compound_fprop_bn
(x, xsum, xvar, gmean, gvar, gamma, beta, y, eps, rho, compute_batch_sum, accumbeta=0.0, relu=False, binary=False, inference=False, outputs=None, layer=None)[source]¶

compound_rnn_unroll_bprop
(W_recur, delta_prev_s, delta_s, h_s, nout, num_steps, num_used_steps, activation, reverse=True)[source]¶ Time step unrolling portion of recurrent layer bprop.
Parameters:  W_recur (Tensor) – Recurrent weight matrix.
 delta_prev_s (Array) – Array of per time step input delta tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
 delta_s (Array) – Array of per time step input delta tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
 h_s (Tensor) – Array of per time step hidden state tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
 nout (integer) – Number of output units for the layer.
 num_steps (integer) – Total number of time steps in the buffer.
 num_used_steps (integer) – Number of time steps being used for real data.
 activation (Transform) – Activation function for the layer.
 reverse (boolean) – When true, unrolling will iterate over time steps in reverse (default case for RNN).

compound_rnn_unroll_bprop_bibnrnn
(ngLayer, error, in_deltas_f_not_used_in_mkl, prev_in_deltas_not_used_in_mkl, in_deltas_b_not_used_in_mkl, next_in_deltas_not_used_in_mkl, W_recur_f, W_recur_b, h_f_not_used_in_mkl, h_b_not_used_in_mkl, nout, nsteps, nsteps_used, activation, h_buffer_all)[source]¶

compound_rnn_unroll_fprop
(W_recur, h_prev_s, h_ff_s, h_s, bias, nout, num_steps, num_used_steps, activation, reverse=False)[source]¶ Time step unrolling portion of recurrent layer fprop.
Parameters:  W_recur (Tensor) – Recurrent weight matrix.
 h_prev_s (Array) – Array of per time step hidden state tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
 h_ff_s (Array) – Array of per time step hidden state tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
 h_s (Array) – Array of per time step hidden state tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
 bias (Tensor) – Bias tensor to add at each time step.
 nout (integer) – Number of output units for the layer.
 num_steps (integer) – Total number of time steps in the buffer.
 num_used_steps (integer) – Number of time steps being used for real data.
 activation (Transform) – Activation function for the layer.
 reverse (boolean) – When true, unrolling will iterate over time steps in reverse (for BiRNN).
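A minimal sketch of the unrolling loop, assuming the conventional recurrence h[t] = activation(W_recur @ h[t-1] + h_ff[t] + bias) (names and the list-of-views representation are illustrative):

```python
import numpy as np

def rnn_unroll_fprop(W_recur, h_ff_s, h_init, bias, activation):
    """Each step mixes the recurrent projection of the previous hidden
    state with that step's feed-forward input, then applies the activation."""
    h_s, h_prev = [], h_init
    for h_ff in h_ff_s:                      # iterate over time steps
        h_prev = activation(W_recur @ h_prev + h_ff + bias)
        h_s.append(h_prev)
    return h_s

nout, batch, steps = 3, 2, 4
W = np.zeros((nout, nout))                   # zero recurrence: easy to check
h_ff = [np.ones((nout, batch)) for _ in range(steps)]
hidden = rnn_unroll_fprop(W, h_ff, np.zeros((nout, batch)), 0.0, np.tanh)
```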

compound_rnn_unroll_fprop_bibnrnn
(ngLayer, h_buffer_all, h_ff_buffer, W_recur_f, h_prev_not_used_in_mkl, h_ff_f_not_used_in_mkl, h_f_not_used_in_mkl, b_f, W_recur_b, h_next_not_used_in_mkl, h_ff_b_not_used_in_mkl, h_b_not_used_in_mkl, b_b, nout, nsteps, nsteps_used, activation)[source]¶

consume
(buf_index, hostlist, devlist)¶

conv_layer
(dtype, N, C, K, D=1, H=1, W=1, T=1, R=1, S=1, pad_d=0, pad_h=0, pad_w=0, str_d=1, str_h=1, str_w=1, dil_d=1, dil_h=1, dil_w=1)[source]¶ Create a new ConvLayer parameter object. This then is passed as an argument to all the convolution operations.
N: Number of images in minibatch
C: Number of input feature maps
K: Number of output feature maps
D: Depth of input image
H: Height of input image
W: Width of input image
T: Depth of filter kernel
R: Height of filter kernel
S: Width of filter kernel
padding: amount of zero-padding around the given edge
strides: factor to step the filters by in a given direction
dilation: dilation factor for each dimension
dtype: need to know dtype to setup proper kernels and params.
bsum: calculate the sum along the batchnorm axis for fprop or bprop; outputs an fp32 tensor of size Kx1
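The layer derives its output spatial sizes from these parameters; a sketch using the conventional dilated-convolution formula (matching what output_dim's summary describes, so treat it as an assumption about the exact arithmetic):

```python
def conv_output_dim(X, S, padding, stride, dilation=1):
    """Conventional output-size formula for one spatial dimension:
    the filter of size S is dilated to an effective size dilation*(S-1)+1."""
    S_eff = dilation * (S - 1) + 1
    return (X + 2 * padding - S_eff) // stride + 1

print(conv_output_dim(32, 3, 1, 1))   # 32: 'same'-style padding for a 3-wide filter
print(conv_output_dim(224, 7, 3, 2))  # 112
```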

deconv_layer
(dtype, N, C, K, M, P, Q, T=1, R=1, S=1, pad_d=0, pad_h=0, pad_w=0, str_d=1, str_h=1, str_w=1, dil_d=1, dil_h=1, dil_w=1)[source]¶ Create a new DeconvLayer parameter object. This then is passed as an argument to all the convolution operations.
N: Number of images in minibatch
C: Number of output feature maps
K: Number of input feature maps
M: Depth of input
P: Height of input
Q: Width of input
D: Depth of output image
H: Height of output image
W: Width of output image
T: Depth of filter kernel
R: Height of filter kernel
S: Width of filter kernel
padding: amount of zero-padding around the given edge
strides: factor to step the filters by in a given direction
dilation: dilation factor for each dimension
dtype: need to know dtype to setup proper kernels and params.

detectionOutput_fprop
(conf_view, loc_view, detection, prior_boxes, proposals, nms_top_k, image_top_k, score_threshold, nms_threshold)[source]¶

distribute_data
(tensor, layer_parallelism)¶ For backends which support distributed training, this will distribute or gather the error or activation tensor depending on the type of parallelism used to distribute the layer computation. Currently this is only supported by multi-GPU in Nervana cloud.
Parameters:  tensor – Tensor containing either activations or errors
 layer_parallelism – Type of parallelism expected by the layer
Returns: Tensor which has been altered by this call or None

divide
(a, b, out=None)¶ Perform elementwise division on the operands, storing the resultant values in the out Tensor. Each operand and out must have identical shape or be broadcastable as such.
Parameters: Returns: the resulting optree
Return type:

dot
(a, b, out=None)¶ Dot product of two Tensors.
Parameters: Returns: the resulting optree from this operation.
Return type:

dump_hist_data
()¶

empty
(shape, dtype=None, name=None, persist_values=True, parallel=False, distributed=False)¶ Instantiate a new instance of the CPUTensor class without initializing individual element values.
Parameters:  shape (int, list) – The size of each dimension of the Tensor.
 dtype (dtype, optional) – Element data type. If not specified we use default_dtype value
 persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls
Returns: newly created data structure reference
Return type:

empty_like
(ary, dtype=None, name=None, persist_values=True)¶ Instantiate a new instance of this backend’s Tensor class, with the shape taken from ary.
Parameters:  ary (tensor object) – Tensor to inherit the dimensions of.
 dtype (datatype, optional) – If present, specifies the underlying type to employ for each element.
 persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls
Returns: array object
Return type:

end
(block, identifier)¶ Signal the corresponding end of a block of repeated computation (at the end of a loop). This operation can be used to help the compiler optimize performance, but has no direct effect on calculations. It must be preceded by a corresponding Backend.begin() call.
Parameters:  block (Block.attr) – identifies the type of computation being worked on based on Block attribute specified
 identifier (int) – unique identifier for this particular iteration of the block. Will typically be something like epoch number, minibatch number, and so forth.
See also

equal
(a, b, out=None)¶ Performs elementwise equality testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).
Parameters: Returns: the resulting optree
Return type:

exp
(a, out=None)¶ Perform elementwise exponential transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

exp2
(a, out=None)¶ Perform elementwise base-2 exponential transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

fabs
(a, out=None)¶ Perform elementwise absolute value of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape. Implemented as an alias of absolute.
Parameters: Returns: the resulting optree
Return type:

fill_normal
(ary, mean=0, stdv=1)¶ Fill ary with normally distributed random numbers.
Parameters:
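A NumPy sketch of the in-place fill semantics (the seeded `RandomState` stands in for the backend's host RNG; names are illustrative):

```python
import numpy as np

rng = np.random.RandomState(0)   # stand-in for the backend's seeded host RNG

def fill_normal(ary, mean=0, stdv=1):
    # fill ary in place with N(mean, stdv**2) samples
    ary[...] = rng.normal(mean, stdv, size=ary.shape)

a = np.empty(10000)
fill_normal(a, mean=2.0, stdv=0.5)
# the sample mean and std land close to the requested parameters
```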

finite
(a, out=None)¶ Perform elementwise test of finiteness (neither infinity nor NaN) on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

fprop_conv
(layer, I, F, O, X=None, bias=None, bsum=None, alpha=1.0, beta=0.0, relu=False, brelu=False, slope=0.0, layer_op=None)¶ Forward propagate the inputs of a convolutional network layer to produce output.
Parameters: Compounding Options:
 X: tensor to use in bprop_relu or beta
   can be same as O for beta accumulate (this is the default when None); should be same shape as O
 bias: (K,1) tensor to use for adding bias to output
   O += bias
 bsum: (K,1) tensor to accumulate batch sum over (used in batchnorm or bprop_bias)
   bsum = sum(O.reshape(K,-1), axis=1); the sum operation is fully deterministic
 alpha, beta:
   O = alpha*O + beta*X (O = alpha*O + beta*O if X is O)
 relu, slope: boolean flag to apply
   O = max(O, 0) + slope*min(O, 0); can be combined with bias (bias is added first)
 brelu, slope: boolean flag to apply
   O *= (X > 0) + slope*(X < 0); can be combined with bsum tensor to output bprop_bias
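The elementwise compounding applied after the convolution can be sketched in NumPy (the convolution itself is elided; ordering of bias before ReLU follows the description above, the rest is an assumption):

```python
import numpy as np

def compound_output(conv_out, X=None, bias=None, alpha=1.0, beta=0.0,
                    relu=False, slope=0.0):
    """Sketch of the elementwise compounding applied to a conv output O."""
    O = alpha * conv_out
    if bias is not None:
        O = O + bias                 # bias is added before the ReLU
    if relu:
        O = np.maximum(O, 0) + slope * np.minimum(O, 0)   # leaky when slope != 0
    if X is not None and beta != 0.0:
        O = O + beta * X
    return O

O = compound_output(np.array([-1.0, 2.0]), relu=True, slope=0.1)
print(O)  # [-0.1  2. ]
```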

fprop_lrn
(layer, I, O, denom, alpha=None, beta=None, ascale=1, bpower=1)¶ Forward propagate pooling layer.
Parameters:  layer (PoolLayer) – The pool layer object, different backends have different pool layers.
 I (Tensor) – Input tensor.
 O (Tensor) – output tensor.
 denom (Tensor) – denominator tensor, stores the result of the squared pooling/contrast
 ascale (float) – scaling parameter (alpha) to multiply the pooled sum (1.25e-5 in AK)
 bpower (float) – exponential parameter (beta) to raise denominator by (0.75 in AK)
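The role of denom, ascale, and bpower can be sketched with a cross-channel LRN in NumPy (window handling and the exact normalization constant are assumptions based on the standard LRN formula, not taken from this backend's kernels):

```python
import numpy as np

def lrn_fprop(I, J=5, ascale=1.25e-5, bpower=0.75):
    """Each channel is normalized by a window of J neighboring channels'
    squared activations: O = I / (1 + ascale*local_sum)**bpower."""
    C = I.shape[0]
    denom = np.empty_like(I)
    half = J // 2
    for c in range(C):
        lo, hi = max(0, c - half), min(C, c + half + 1)
        denom[c] = 1.0 + ascale * np.sum(I[lo:hi] ** 2, axis=0)
    return I / denom ** bpower, denom

O, denom = lrn_fprop(np.ones((8, 4)))
```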

fprop_pool
(layer, I, O, argmax=None, beta=0.0)[source]¶ Forward propagate pooling layer.
Parameters:

fprop_softmax
(x, axis)¶

gen_rng
(seed=None)¶ Generate the random number generator on host.
Parameters: seed (int) – random number generator seed
Returns: seeded numpy RNG

get_time
(start, end)¶ Return time between start and end marks.
Parameters:  start (time marker) – start time mark
 end (time marker) – end time mark
Returns: time elapsed between start and end time marks in milliseconds

greater
(a, b, out=None)¶ Performs elementwise greater than testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).
Parameters: Returns: the resulting optree
Return type:

greater_equal
(a, b, out=None)¶ Performs elementwise greater than or equal testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).
Parameters: Returns: the resulting optree
Return type:

init_mark
()¶ Generate a timing mark object.
Returns: timing mark (dict)

iobuf
(dim0, x=None, dtype=None, name=None, persist_values=True, shared=None, parallelism=None)¶ Allocate input and output buffer for layer based on batch size. This is used because the layer does not know about the batch size.
Parameters:  dim0 (tuple or int) – I/O buffer dimension for layer (without the axis specifying the batch size).
 x (datatype, optional) – If present and not None, x will be returned directly. x will be not None if the buffer has already been allocated.
 dtype (datatype, optional) – If present, specifies the underlying type to employ for each element.
 name (str, optional) – name identifying the tensor (used in printing).
 persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls.
 shared (buffer, optional) – If present will attempt to reuse the memory in shared to allocate the I/O buffer
 parallelism (str, optional) – Indicates type of parallelism (Data, Model) employed by this buffer. Ignored on CPU and GPU backends, defaults to no parallelism.
Returns: array object
Return type:
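The shape semantics of iobuf can be sketched in NumPy: the layer's I/O dimensions (flattened if given as a tuple) become the leading axis and the batch size the trailing axis. The helper name and exact flattening are illustrative assumptions, not the backend's implementation.

```python
import numpy as np

def iobuf_shape(dim0, batch_size):
    # Sketch: the buffer iobuf allocates is (layer dims, batch size),
    # with tuple dims flattened by taking their product.
    if isinstance(dim0, tuple):
        rows = int(np.prod(dim0))
    else:
        rows = dim0
    return (rows, batch_size)

# e.g. a 16-feature-map 7x7 layer output with a minibatch of 128
buf = np.zeros(iobuf_shape((16, 7, 7), batch_size=128), dtype=np.float32)
print(buf.shape)  # (784, 128)
```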

less
(a, b, out=None)¶ Performs elementwise less than testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).
Parameters: Returns: the resulting optree
Return type:

less_equal
(a, b, out=None)¶ Performs elementwise less than or equal testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).
Parameters: Returns: the resulting optree
Return type:

log
(a, out=None)¶ Perform elementwise natural logarithm transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

log2
(a, out=None)¶ Perform elementwise base-2 logarithm transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

lrn_layer
(dtype, N, C, D=1, H=1, W=1, J=1)¶ Create a new PoolLayer parameter object. This then is passed as an argument to all pooling kernels.
N: Number of images in minibatch
C: Number of input feature maps
H: Height of input image
W: Width of input image
J: Size of feature map pooling window (maxout n_pieces)
padding: amount of zero-padding around the given image or feature map edge
strides: factor to step the window by in a given direction (overlap allowed)
Leave spatial dimensions at 1 to allow feature map pooling in the fc layers.

make_binary_mask
(out, keepthresh=0.5)¶ Create a binary mask for dropout layers.
Parameters:
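The mask semantics can be sketched in NumPy: each element is 1 with probability keepthresh and 0 otherwise (the interpretation of keepthresh as the keep probability is an assumption for illustration; this is not the backend's RNG path).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_binary_mask(shape, keepthresh=0.5):
    # Dropout-mask sketch: element is 1.0 with probability keepthresh,
    # 0.0 otherwise (assumed semantics, not the MKL kernel).
    return (rng.random(shape) < keepthresh).astype(np.float32)

mask = make_binary_mask((4, 5), keepthresh=0.8)
```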

max
(a, axis=None, out=None, keepdims=True)¶ Calculates the maximal element value along the specified axes.
Parameters:  a (Tensor) – the Tensor on which to perform the operation
 axis (int, optional) – the dimension along which to compute. If set to None, we will take max over all dimensions.
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
 keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns: the resulting optree
Return type:
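The reduction methods (max, mean, min, std, sum, var) share the keepdims convention, which mirrors NumPy's:

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)

# keepdims=True (the backend default): the reduced axis is kept with
# size 1, so the result still broadcasts against the input.
m = a.max(axis=1, keepdims=True)
print(m.shape)        # (3, 1)
print((a - m).shape)  # (3, 4): broadcasts cleanly

# axis=None reduces over all dimensions.
print(a.max())        # 11.0
```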

maximum
(a, b, out=None)¶ Performs elementwise maximum value assignment based on corresponding elements of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).
Parameters: Returns: the resulting optree
Return type:

mean
(a, axis=None, partial=None, out=None, keepdims=True)¶ Calculates the arithmetic mean of the elements along the specified axes.
Parameters:  a (Tensor) – the Tensor on which to perform the operation
 axis (int, optional) – the dimension along which to compute. If set to None, we will take mean over all dimensions. Defaults to None
 partial (bool, optional) – Not currently used.
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
 keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns: the resulting optree
Return type:

min
(a, axis=None, out=None, keepdims=True)¶ Calculates the minimal element value along the specified axes.
Parameters:  a (Tensor) – the Tensor on which to perform the operation
 axis (int, optional) – the dimension along which to compute. If set to None, we will take min over all dimensions.
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
 keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns: the resulting optree
Return type:

minimum
(a, b, out=None)¶ Performs elementwise minimum value assignment based on corresponding elements of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).
Parameters: Returns: the resulting optree
Return type:

multiply
(a, b, out=None)¶ Perform elementwise multiplication on the operands, storing the resultant values in the out Tensor. Each operand and out must have identical shape or be broadcastable as such.
Parameters: Returns: the resulting optree
Return type:

negative
(a, out=None)¶ Perform elementwise negation of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

nms
(detections, threshold, normalized=False)¶ Function to perform non-maximal suppression.
Parameters:  detections (Tensor) – detection boxes (box_count, 5), each row has (x1, y1, x2, y2, score). Assume the boxes have already been sorted based on score in descending order
 output_mask (Tensor) – preallocated buffer for mask output from the kernel
 threshold (float) – box overlap threshold, boxes with smaller overlaps will be kept
 normalized (bool) – whether box coordinates are normalized to image dimensions
Returns: keep_ind (list) – list of indices of the boxes to keep
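The greedy algorithm behind this can be sketched in NumPy (an illustration of standard non-maximal suppression, not the MKL kernel itself):

```python
import numpy as np

def nms(detections, threshold):
    # Greedy NMS sketch. detections: (box_count, 5) rows of
    # (x1, y1, x2, y2, score), already sorted by score in descending
    # order. Boxes whose IoU with a kept box exceeds threshold are
    # suppressed; the indices of surviving boxes are returned.
    x1, y1 = detections[:, 0], detections[:, 1]
    x2, y2 = detections[:, 2], detections[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = np.arange(len(detections))
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= threshold]
    return keep

boxes = np.array([[0, 0, 10, 10, 0.9],
                  [1, 1, 11, 11, 0.8],    # heavily overlaps box 0
                  [20, 20, 30, 30, 0.7]], dtype=np.float32)
print(nms(boxes, threshold=0.5))  # [0, 2]
```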

not_equal
(a, b, out=None)¶ Performs elementwise nonequality testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).
Parameters: Returns: the resulting optree
Return type:

onehot
(indices, axis, out=None)¶ Generate optree for converting indices to a one-hot representation.
Parameters: Returns: the resulting optree
Return type:
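The conversion can be sketched in NumPy (the axis=0 layout placing the class dimension first is an assumption for illustration):

```python
import numpy as np

def onehot(indices, nclasses, axis=0):
    # One-hot sketch: expand integer class indices into a 0/1 matrix.
    # With axis=0 the class dimension comes first: (nclasses, N).
    eye = np.eye(nclasses, dtype=np.float32)
    out = eye[np.asarray(indices)]   # (N, nclasses)
    return out.T if axis == 0 else out

oh = onehot([1, 0, 2], nclasses=3)
print(oh)
```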

ones
(shape, dtype=None, name=None, persist_values=True, parallel=False, distributed=False)¶ Instantiate a new instance of the CPUTensor class setting each element value to 1.
Parameters:  shape (list of ints) – The size of each dimension of the Tensor.
 dtype (dtype, optional) – Element data type. If not specified we use default_dtype value (‘float32’ unless overridden).
 persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls.
Returns: newly created data structure reference
Return type:

output_dim
(X, S, padding, strides, pooling=False, dilation=1)¶ Compute, along one dimension with these sizes, what the output dimension will be.
Parameters:
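The standard output-size arithmetic this performs can be sketched as follows (assuming the usual floor-division convention with symmetric padding; the exact rounding for the pooling/Caffe-compatible path may differ):

```python
def output_dim(X, S, padding, strides, dilation=1):
    # Output size along one dimension for a window of size S with the
    # given padding, stride and dilation (floor-division convention).
    S_eff = dilation * (S - 1) + 1   # effective window size after dilation
    return (X - S_eff + 2 * padding) // strides + 1

print(output_dim(X=32, S=3, padding=1, strides=1))  # 32: 'same' convolution
print(output_dim(X=32, S=2, padding=0, strides=2))  # 16: typical 2x2 pooling
```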

pool_layer
(dtype, op, N, C, D=1, H=1, W=1, J=1, T=1, R=1, S=1, pad_c=0, pad_d=0, pad_h=0, pad_w=0, str_c=None, str_d=None, str_h=None, str_w=None)[source]¶ Create a new PoolLayer parameter object. This then is passed as an argument to all pooling kernels.
op: “max”, “avg”, “l2” pooling (currently bprop only supports max, not avg and l2)
N: Number of images in minibatch
C: Number of input feature maps
D: Depth of input image
H: Height of input image
W: Width of input image
J: Size of feature map pooling window (maxout n_pieces)
T: Depth of pooling window
R: Height of pooling window
S: Width of pooling window
padding: amount of zero-padding around the given image or feature map edge
strides: factor to step the window by in a given direction (overlap allowed)
Leave spatial dimensions at 1 to allow feature map pooling in the fc layers.

power
(a, b, out=None)¶ Perform elementwise raising of Tensor a values to the specified power b, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

reciprocal
(a, out=None)¶ Perform elementwise reciprocal of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

record_mark
(marker)¶ Mark the current time.
Parameters: marker (time mark) – timing mark generated by init_mark()

revert_tensor
(tensor)¶ Reverts a tensor to its original state after being distributed by distribute_data.
Parameters: tensor – Tensor to be reverted

rint
(a, out=None)¶ Perform elementwise rounding to nearest int.
Parameters: Returns: the resulting optree
Return type:

rng_get_state
()¶ Return the current state of the on-host RNG.
Returns: the on-host RNG state vectors Return type: np.array

rng_reset
()¶ Reset the random state to the state where the Backend is first initialized.

rng_set_state
(state)¶ Set the RNG state for host RNG.
Parameters: state (np.array) – numpy random number state vector

roipooling_bprop
(I, rois, O, argmax, roi_count, C, H, W, pooled_height, pooled_width, spatial_scale)¶ Function to perform bprop of ROIPooling.
Parameters:

roipooling_fprop
(I, rois, O, argmax, roi_count, C, H, W, pooled_height, pooled_width, spatial_scale)¶ Function to perform fprop of ROIPooling
Parameters:

safelog
(a, out=None)¶ Perform elementwise natural logarithm transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape. This log function has built-in safety for underflow.
Parameters: Returns: the resulting optree
Return type:
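The underflow guard can be sketched in NumPy: clamp the input away from zero before taking the log so that log(0) yields a large negative finite value instead of -inf. The clamping value below is an assumption, not the backend's exact choice.

```python
import numpy as np

def safelog(a, eps=np.finfo(np.float32).tiny):
    # Clamp to the smallest positive normal float before log so the
    # result is always finite (assumed clamp value, for illustration).
    return np.log(np.maximum(a, eps))

print(safelog(np.array([1.0, 0.0], dtype=np.float32)))
```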

set_caffe_compat
()¶ Set flag to make layers compatible with caffe in terms of conv and pool layer output size determination and dropout layer implementation.

set_hist_buffers
(hist_bins, hist_offset)¶

sgn
(a, out=None)¶ Perform elementwise indication of the sign of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

shared_iobuf_size
(shape, parallelism)¶ Computes the backend-specific size needed for an iobuf with a specified shape that is meant to be shared between layers.
Parameters: Returns: Size of required iobuf
Return type:

shift
(ary, shift_ary, value=True, out=None)¶ Shifts the input array.
Parameters:  ary – tensor
 shift_ary – tensor of shift amount
 out – reference to output

sig
(a, out=None)¶ Perform elementwise sigmoid transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

sig2
(a, out=None)¶ Perform elementwise base-2 sigmoid transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

sqrt
(a, out=None)¶ Perform elementwise square root of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

square
(a, out=None)¶ Perform elementwise square of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

std
(a, axis=None, partial=None, out=None, keepdims=True)¶ Calculates the standard deviation of the elements along the specified axes.
Parameters:  a (Tensor) – the Tensor on which to perform the operation
 axis (int, optional) – the dimension along which to compute. If set to None, we will take std over all dimensions.
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
 partial (bool, optional) – Not currently used.
 keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns: the resulting optree
Return type:

subtract
(a, b, out=None)¶ Perform elementwise subtraction on the operands, storing the resultant values in the out Tensor. Each operand and out must have identical shape or be broadcastable as such.
Parameters: Returns: the resulting optree
Return type:

sum
(a, axis=None, out=None, keepdims=True)¶ Calculates the summation of the elements along the specified axis.
Parameters:  a (Tensor) – the Tensor on which to perform the sum
 axis (int, optional) – the dimension along which to compute. If set to None, we will sum over all dimensions.
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
 keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns: the resulting optree
Return type:

synchronize_mark
(marker)¶ Synchronize on the given marker.
Parameters: marker (time mark) – timing mark generated by init_mark()

take
(a, indices, axis, out=None)¶ Extract elements based on the indices along a given axis.
Parameters:  a (Tensor) – the Tensor on which to perform the operation
 indices (Tensor, numpy ndarray) – indices of elements to select
 axis (int, optional) – the dimension along which to compute. If set to None, we will extract over all dimensions (flattened first)
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
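The selection semantics mirror NumPy's take, including the axis=None flattening behavior:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Select rows 0 and 2 along axis 0.
print(np.take(a, [0, 2], axis=0))

# axis=None flattens the array first, then indexes into the flat view.
print(np.take(a, [0, 5, 11]).tolist())  # [0, 5, 11]
```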

tanh
(a, out=None)¶ Perform elementwise hyperbolic tangent transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

tanh2
(a, out=None)¶ Perform elementwise base-2 hyperbolic tangent transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.
Parameters: Returns: the resulting optree
Return type:

true_divide
(a, b, out=None)¶ This is an alias of divide. Instead of Python's traditional ‘floor division’, this returns a true division.
Parameters: Returns: the resulting optree
Return type:
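The distinction in plain Python:

```python
# Floor division truncates toward negative infinity; true division
# (what true_divide implements) returns the exact quotient.
print(7 // 2)  # 3   (floor division)
print(7 / 2)   # 3.5 (true division)
```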

update_conv
(layer, I, E, U, alpha=1.0, beta=0.0, grad_bias=None, layer_op=None)¶ Compute the updated gradient for a convolutional network layer.
Parameters:

update_fc_bias
(err, out)¶ Compute the updated bias gradient for a fully connected network layer.
Parameters:

var
(a, axis=None, partial=None, out=None, keepdims=True, binary=False)¶ Calculates the variance of the elements along the specified axes.
Parameters:  a (Tensor) – the Tensor on which to perform the operation
 axis (int, optional) – the dimension along which to compute. If set to None, we will take var over all dimensions. Defaults to None
 partial (bool, optional) – Not currently used.
 out (Tensor, optional) – where the result will be stored. If out is None, only the optree will be returned.
 keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns: the resulting optree
Return type:

xnor_compound_dot
(A, B, C, beta=0.0, bsum=None)¶ Performs XNOR GEMM C = A * B
Parameters:
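For matrices binarized to {-1, +1}, the dot product reduces to XNOR plus popcount. The following NumPy sketch shows the arithmetic identity the XNOR GEMM exploits; it is an illustration of the technique, not the packed-bit MKL kernel:

```python
import numpy as np

def xnor_dot(A, B):
    # XNOR-GEMM arithmetic sketch for A, B with entries in {-1, +1}.
    # Encode -1/+1 as bits 0/1; a_i * b_i == +1 exactly when the bits
    # match (XNOR), so dot = matches - mismatches = 2*popcount(xnor) - K.
    K = A.shape[1]
    a_bits = (A > 0)
    b_bits = (B > 0)
    C = np.empty((A.shape[0], B.shape[1]), dtype=np.int64)
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            matches = np.sum(~(a_bits[i] ^ b_bits[:, j]))  # popcount of XNOR
            C[i, j] = 2 * matches - K
    return C

rng = np.random.default_rng(0)
A = rng.choice([-1, 1], size=(4, 8))
B = rng.choice([-1, 1], size=(8, 3))
assert np.array_equal(xnor_dot(A, B), A @ B)  # matches ordinary GEMM
```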

zeros
(shape, dtype=None, name=None, persist_values=True, parallel=False, distributed=False)¶ Instantiate a new instance of the CPUTensor class setting each element value to 0.
Parameters:  shape (list of ints) – The size of each dimension of the Tensor.
 dtype (dtype, optional) – Element data type. If not specified we use default_dtype value (‘float32’ unless overridden).
 persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls.
Returns: newly created data structure reference
Return type:

zeros_like
(ary, dtype=None, name=None, persist_values=True)¶ Instantiate a new instance of this backend’s Tensor class, with the shape taken from ary and populating each element with a value of 0.
Parameters:  ary (tensor object) – Tensor to inherit the dimensions of.
 dtype (datatype, optional) – If present, specifies the underlying type to employ for each element.
 persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls.
Returns: array object
Return type:
