neon.backends.nervanagpu.NervanaGPU

class neon.backends.nervanagpu.NervanaGPU(rng_seed=None, default_dtype=<class 'numpy.float32'>, stochastic_round=False, deterministic=None, device_id=0, bench=False, scratch_size=0, hist_bins=64, hist_offset=-48, compat_mode=None, enable_winograd=True, num_devices=None)[source]

Bases: neon.backends.backend.Backend

The primary interface class and factory for GPUTensors.

Parameters:
  • stochastic_round (int or bool, optional) – set to desired number of mantissa bits to stochastically round to. Set to 0 or False to disable stochastic rounding (the default). Set to True to use default rounding bit width.
  • bench (bool, optional) – set to True to print out performance data for most kernel calls. If False (default) no performance data is printed.
  • compat_mode (str, optional) – set flag to match implementation of other libraries for compatibility. Currently only ‘caffe’ is supported.
  • TODO – define other keyword parameters!
__init__(rng_seed=None, default_dtype=<class 'numpy.float32'>, stochastic_round=False, deterministic=None, device_id=0, bench=False, scratch_size=0, hist_bins=64, hist_offset=-48, compat_mode=None, enable_winograd=True, num_devices=None)[source]
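
In practice the backend is usually obtained through neon.backends.gen_backend rather than by constructing NervanaGPU directly. A minimal sketch (argument values are illustrative only):

    from neon.backends import gen_backend

    # Construct a GPU backend; rng_seed and stochastic_round mirror the
    # constructor arguments documented above.
    be = gen_backend(backend='gpu', batch_size=128, rng_seed=0,
                     stochastic_round=False)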

Methods

__init__([rng_seed, default_dtype, …])
absolute(a[, out]) Perform element-wise absolute value of Tensor a, storing the result in Tensor out.
add(a, b[, out]) Perform element-wise addition on the operands, storing the resultant values in the out Tensor.
add_fc_bias(inputs, bias) Add the bias for a fully connected network layer.
allocate_backend(name, **kargs) Allocate a named backend.
allocate_new_deltas(delta, in_shape, parallelism) For MKL backends, allocate new deltas for broadcast
allocate_new_outputs(layer, share_output)
argmax(a[, axis, out, keepdims]) Calculates the indices of the maximal element value along the specified axis.
argmin(a[, axis, out, keepdims]) Calculates the indices of the minimal element value along the specified axis.
array(ary[, dtype, name, persist_values, …]) Converts a numpy array to a GPUTensor
backend_choices() Return the list of available backends.
batched_dot(A, B, C[, alpha, beta, relu, …])
batchnorm_layer(in_shape)
begin(block, identifier) Signal the start of a block of repeated computation (at the start of a loop).
bibnrnn_layer(h_buffer_all, h_ff_buffer, …) Currently used by the MKL backend to create a new layer; the CPU and GPU backends return None.
binarize(ary, out[, stochastic]) Binarizes input array
bprop_conv(layer, F, E, grad_I[, X, bias, …]) bprop_conv:
bprop_lrn(layer, I, O, E, delta, denom[, …]) Backward propagate lrn layer.
bprop_mergebroadcast(ngLayer, layers, …)
bprop_mergesum(ngLayer, alpha, beta, layers, …)
bprop_pool(layer, I, O[, argmax, alpha, …])
bprop_relu(layer, x, error, deltas, slope)
bprop_skipnode(error, deltas, alpha, beta)
bprop_transform(ngLayer, transform, outputs, …)
check_caffe_compat() Check whether compatibility mode is set to ‘caffe’.
clean_data(tensor, layer_mkl) For MKL backends to clean mkl data (memory not freed)
cleanup_backend()
clip(a, a_min, a_max[, out]) Performs element-wise clipping of Tensor a, storing the result in out.
compensated_sum(sum_tensor, cmp_tensor, …)
compound_bprop_bn(delta_out, grad_gamma, …) Function to perform batch normalization backward pass.
compound_bprop_lut(nin, inputs, error, …) Backward propagate lookup table layer.
compound_dot(A, B, C[, alpha, beta, relu, …]) Performs one of the following operations (* is the dot product): C = alpha * A * B + beta * C, C = alpha * A.T * B + beta * C, or C = alpha * A * B.T + beta * C.
compound_fprop_bn(x, xsum, xvar, gmean, …) Function to perform compound kernel call for batch normalization forward pass.
compound_rnn_unroll_bprop(W_recur, …[, …]) Time step unrolling portion of recurrent layer bprop.
compound_rnn_unroll_bprop_bibnrnn(ngLayer, …)
compound_rnn_unroll_fprop(W_recur, h_prev_s, …) Time step unrolling portion of recurrent layer fprop.
compound_rnn_unroll_fprop_bibnrnn(ngLayer, …)
consume(buf_index, hostlist, devlist)
conv_layer(dtype, N, C, K[, D, H, W, T, R, …]) Create a new ConvLayer parameter object.
convert_data(tensor, layer_mkl) For MKL backends to convert data from mkl layout to normal numpy layout
copy_transpose(a, out[, axes, repeat]) Function to perform a fast copy transpose/dimshuffle operation.
cublas_dot(A, B, C[, alpha, beta]) Matrix multiplication using cublas library.
deconv_layer(dtype, N, C, K, M, P, Q[, T, …]) Create a new DeconvLayer parameter object.
distribute_data(tensor, layer_parallelism) For backends which support distributed training, this will distribute or gather the error or activation tensor depending on the type of parallelism used to distribute the layer computation.
divide(a, b[, out]) Perform element-wise division on the operands, storing the resultant values in the out Tensor.
dot(a, b[, out]) Dot product of two Tensors.
dropout([keep, out]) Returns a keep mask for dropout.
dump_hist_data()
empty(shape[, dtype, name, persist_values, …]) Allocate the space for a GPUTensor
empty_like(other_ary[, name]) Instantiate a new instance of this backend’s Tensor class, with the shape taken from ary.
end(block, identifier) Signal the corresponding end of a block of repeated computation (at the end of a loop).
equal(a, b[, out]) Performs element-wise equality testing on each element of left and right, storing the result in out.
execute(optree) Execute the optree.
exp(a[, out]) Perform element-wise exponential transformation on Tensor a, storing the result in Tensor out.
exp2(a[, out]) Perform element-wise 2-based exponential transformation on Tensor a, storing the result in Tensor out.
fabs(a[, out]) Perform element-wise absolute value of Tensor a, storing the result in Tensor out.
fill_normal(ary[, mean, stdv]) Fill ary with normally distributed random numbers.
finite(a[, out]) Perform element-wise test of finiteness (not infinity or not Not a Number) on Tensor a, storing the result in Tensor out.
fprop_conv(layer, I, F, O[, X, bias, bsum, …]) fprop_conv:
fprop_lrn(layer, I, O, denom[, alpha, beta, …]) Forward propagate lrn layer.
fprop_mergebroadcast(ngLayer, inputs, …)
fprop_mergesum(ngLayer, inputs, inference, …)
fprop_pool(layer, I, O[, argmax, alpha, …])
fprop_relu(layer, x, slope)
fprop_skipnode(x, y, beta)
fprop_softmax(x, axis)
fprop_transform(ngLayer, transform, inputs, …)
gen_rng([seed]) Generate the random number generator on device and on host.
get_events()
get_time(start, end) Return time between start and end marks.
greater(a, b[, out]) Performs element-wise greater than testing on each element of left and right, storing the result in out.
greater_equal(a, b[, out]) Performs element-wise greater than or equal testing on each element of left and right, storing the result in out.
init_mark() Generate a timing mark object.
iobuf(dim0[, x, dtype, name, …]) Allocate input and output buffer for layer based on batch size.
is_mkl()
less(a, b[, out]) Performs element-wise less than testing on each element of left and right, storing the result in out.
less_equal(a, b[, out]) Performs element-wise less than or equal testing on each element of left and right, storing the result in out.
log(a[, out]) Perform element-wise natural logarithm transformation on Tensor a, storing the result in Tensor out.
log2(a[, out]) Perform element-wise 2-based logarithm transformation on Tensor a, storing the result in Tensor out.
lrn_layer(dtype, N, C[, D, H, W, J]) Create a new PoolLayer parameter object.
make_binary_mask(out[, keepthresh]) Create a binary mask for dropout layers.
max(a[, axis, out, keepdims]) Calculates the maximal element value along the specified axes.
maximum(a, b[, out]) Performs element-wise maximum value assignment based on corresponding elements of left and right, storing the result in out.
mean(a[, axis, partial, out, keepdims]) Calculates the arithmetic mean of the elements along the specified axes.
mergebroadcast_layer(layer_num)
mergesum_layer(layer_num)
min(a[, axis, out, keepdims]) Calculates the minimal element value along the specified axes.
minimum(a, b[, out]) Performs element-wise minimum value assignment based on corresponding elements of left and right, storing the result in out.
multiply(a, b[, out]) Perform element-wise multiplication on the operands, storing the resultant values in the out Tensor.
negative(a[, out]) Perform element-wise negation of Tensor a, storing the result in Tensor out.
nms(detections, threshold[, normalized]) Function to perform non-maximal suppression.
not_equal(a, b[, out]) Performs element-wise non-equality testing on each element of left and right, storing the result in out.
onehot(indices, axis[, out]) Generate optree for converting indices to a onehot representation.
ones(shape[, dtype, name, persist_values, …]) Instantiate a new instance of the GPUTensor class setting each element value to 1.
output_dim(X, S, padding, strides[, …]) Compute the output size along one dimension, given the input size, filter size, padding, and stride.
pool_layer(dtype, op, N, C[, D, H, W, J, T, …]) Create a new PoolLayer parameter object.
power(a, b[, out]) Perform element-wise raise of tsr values to specified power, storing the result in Tensor out.
rand([out]) Generate random number uniformly distributed between 0 and 1.
reciprocal(a[, out]) Perform element-wise reciprocal of Tensor a, storing the result in Tensor out.
record_mark(marker) Mark the current time.
relu_layer()
revert_tensor(tensor) Reverts a tensor to its original state after being distributed by distribute_data.
rint(a[, out]) Perform element-wise rounding to nearest int.
rng_get_state() Return the current state of the on-host and on-device RNGs.
rng_reset() Reset the RNG to the initial state stored in self.init_rng_state and self.init_rng_state_dev for the host and device RNG, respectively.
rng_set_state(rng_states) Set the RNG state for both the on device and on host RNGs.
roipooling_bprop(I, rois, O, argmax, …) Function to perform bprop of ROIPooling.
roipooling_fprop(I, rois, O, argmax, …) Function to perform fprop of ROIPooling.
safelog(a[, out]) Perform element-wise natural logarithm transformation on Tensor a, storing the result in Tensor out.
scratch_buffer(size)
scratch_buffer_init()
scratch_buffer_offset(size)
scratch_buffer_reset()
set_caffe_compat() Set flag to make layers compatible with caffe in terms of conv and pool layer output size determination and dropout layer implementation.
set_hist_buffers(hist_bins, hist_offset)
set_scratch_size(*args)
sgn(a[, out]) Perform element-wise indication of the sign of Tensor a, storing the result in Tensor out.
shared_iobuf_size(shape, parallelism) Computes the backend specific size needed for an iobuf with a specified shape that is meant to be shared between layers.
shift(ary, shift_ary[, value, out]) Shifts input array
sig(a[, out]) Perform element-wise sigmoid transformation on Tensor a, storing the result in Tensor out.
sig2(a[, out]) Perform element-wise 2-based sigmoid transformation on Tensor a, storing the result in Tensor out.
sqrt(a[, out]) Perform element-wise square-root of Tensor a, storing the result in Tensor out.
square(a[, out]) Perform element-wise square of Tensor a, storing the result in Tensor out.
std(a[, axis, partial, out, keepdims]) Calculates the standard deviation of the elements along the specified axes.
subtract(a, b[, out]) Perform element-wise subtraction on the operands, storing the resultant values in the out Tensor.
sum(a[, axis, out, keepdims]) Calculates the summation of the elements along the specified axis.
synchronize_mark(marker) Synchronize on the given marker.
take(a, indices, axis[, out]) Extract elements based on the indices along a given axis.
tanh(a[, out]) Perform element-wise hyperbolic tangent transformation on Tensor a, storing the result in Tensor out.
tanh2(a[, out]) Perform element-wise 2-based hyperbolic tangent transformation on Tensor a, storing the result in Tensor out.
true_divide(a, b[, out]) Here it is an alias of divide.
update_conv(layer, I, E, grad_F[, alpha, …]) update_conv:
update_fc_bias(err, out) Compute the updated bias gradient for a fully connected network layer.
var(a[, axis, partial, out, keepdims, binary]) Calculates the variance of the elements along the specified axes.
xnor_compound_dot(A, B, C[, beta, bsum]) Performs XNOR GEMM
zeros(shape[, dtype, name, persist_values, …]) Instantiate a new instance of the GPUTensor class setting each element value to 0.
zeros_like(other_ary[, name]) Instantiate a new instance of this backend’s Tensor class, with the shape taken from ary and populating each element with a value of 0.
absolute(a, out=None)

Perform element-wise absolute value of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

add(a, b, out=None)

Perform element-wise addition on the operands, storing the resultant values in the out Tensor. Each operand and out must have identical shape or be broadcastable as such.

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

add_fc_bias(inputs, bias)

Add the bias for a fully connected network layer.

Parameters:
  • inputs (Tensor) – the input to update.
  • bias (Tensor) – the amount to increment by.
allocate_backend(name, **kargs)

Allocate a named backend.

allocate_new_deltas(delta, in_shape, parallelism)

For MKL backends, allocate new deltas for broadcast

allocate_new_outputs(layer, share_output)
argmax(a, axis=1, out=None, keepdims=True)

Calculates the indices of the maximal element value along the specified axis. If multiple elements contain the maximum, only the indices of the first are returned.

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • axis (int, optional) – the dimension along which to compute. If set to None, we will take argmax over all dimensions. Defaults to 1
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
  • keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns:

the resulting op-tree

Return type:

OpTreeNode

argmin(a, axis=1, out=None, keepdims=True)

Calculates the indices of the minimal element value along the specified axis. If multiple elements contain the minimum, only the indices of the first are returned.

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • axis (int, optional) – the dimension along which to compute. If set to None, we will take argmin over all dimensions. Defaults to 1
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
  • keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns:

the resulting op-tree

Return type:

OpTreeNode

array(ary, dtype=None, name=None, persist_values=True, parallel=False, distributed=False, allocator=drv.mem_alloc)[source]

Converts a numpy array to a GPUTensor

Parameters:
  • ary (numpy.ndarray) – The data structure containing element values spread across a number of dimensions. Python built-in types like ints and lists are supported.
  • dtype (dtype, optional) – Element data type. If not specified we use default_dtype value (‘float32’ unless overridden).
  • persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls
  • allocator (function, optional) – Memory allocator.
Returns:

newly created data structure reference

Return type:

GPUTensor
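
A minimal sketch of moving host data onto the device and evaluating a lazy op-tree, assuming a backend instance be (e.g. from gen_backend) and illustrative shapes:

    import numpy as np

    x = be.array(np.random.rand(10, 128))
    y = be.array(np.random.rand(10, 128))

    optree = be.add(x, y)   # out=None: builds an op-tree, nothing runs yet
    z = be.empty((10, 128))
    z[:] = optree           # assignment executes the op-tree on the device
    host_z = z.get()        # copy the result back into a numpy array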

backend_choices()

Return the list of available backends.

backend_name = 'gpu'
backends = {'cpu': <class 'neon.backends.nervanacpu.NervanaCPU'>, 'mkl': <class 'neon.backends.nervanamkl.NervanaMKL'>, 'gpu': <class 'neon.backends.nervanagpu.NervanaGPU'>}
batched_dot(A, B, C, alpha=1.0, beta=0.0, relu=False, repeat=1, size=None)[source]
batchnorm_layer(in_shape)[source]
begin(block, identifier)

Signal the start of a block of repeated computation (at the start of a loop). This operation can be used to help the compiler optimize instruction performance, but has no direct effect on calculations. It must be book-ended by a corresponding Backend.end() call. Note that multiple begin calls can appear adjacent in nested loops.

Parameters:
  • block (Block.attr) – identifies the type of computation being worked on based on Block attribute specified
  • identifier (int) – unique identifier for this particular iteration of the block. Will typically be something like epoch number, mini-batch number, and so forth.

See also

end()

bibnrnn_layer(h_buffer_all, h_ff_buffer, W_recur_f, W_recur_b, nsteps, nout)

bibnrnn_layer is currently used by the MKL backend to create a new layer; the CPU and GPU backends return None.

binarize(ary, out, stochastic=True)[source]

Binarizes input array

Parameters:
  • ary – tensor
  • out – reference to output
  • stochastic – stochastic or deterministic
bprop_conv(layer, F, E, grad_I, X=None, bias=None, bsum=None, alpha=1.0, beta=0.0, relu=False, brelu=False, slope=0.0, repeat=1, layer_op=None)[source]

bprop_conv:

Required Arguments:
  • layer: ConvLayer object created with conv_layer()
  • E: error tensor (output gradient from previous layer)
  • F: filter tensor (weights)
  • grad_I: output tensor (gradient with respect to inputs)
Compounding Options:
  • X: tensor to use in brelu or beta; can be the same as grad_I for beta accumulation (the default when None) and should have the same shape as grad_I
  • bias: (C,1) tensor used to add a bias to the output: grad_I += bias
  • bsum: (C,1) tensor in which to accumulate the batch sum (used in batchnorm or bprop_bias): bsum = sum(grad_I.reshape(C,-1), axis=1). The sum operation is fully deterministic; if combined with brelu, then brelu is applied first
  • alpha, beta: grad_I = alpha*grad_I + beta*X, or grad_I = alpha*grad_I + beta*grad_I if X == grad_I
  • relu: boolean flag to apply grad_I = max(grad_I, 0) + slope*min(grad_I, 0); can be combined with bias (bias is added first)
  • brelu: bprop_relu boolean flag to apply grad_I *= (X > 0) + slope*(X < 0); can be combined with a bsum tensor to output bprop_bias
  • repeat: used in benchmarking

bprop_lrn(layer, I, O, E, delta, denom, alpha=1.0, beta=0.0, ascale=1.0, bpower=1.0, repeat=1)[source]

Backward propagate lrn layer.

Parameters:
  • layer (PoolLayer) – The pool layer object. Different backends have different pool layers.
  • I (Tensor) – Input tensor.
  • E (Tensor) – Error tensor.
  • delta (Tensor) – Gradient tensor (delta)
  • denom (Tensor) – denominator tensor computed during fprop
  • ascale (float) – scaling parameter (alpha) to multiply the pooled sum (1.25e-5 in AK)
  • bpower (float) – exponential parameter (beta) to raise denominator by (0.75 in AK)
bprop_mergebroadcast(ngLayer, layers, error_views, error, delta, out_shape, alpha, beta, alphas, betas)[source]
bprop_mergesum(ngLayer, alpha, beta, layers, error, deltas)[source]
bprop_pool(layer, I, O, argmax=None, alpha=1.0, beta=0.0, repeat=1)[source]
bprop_relu(layer, x, error, deltas, slope)[source]
bprop_skipnode(error, deltas, alpha, beta)[source]
bprop_transform(ngLayer, transform, outputs, error, deltas, relu)[source]
check_caffe_compat()

Check whether compatibility mode is set to ‘caffe’.

clean_data(tensor, layer_mkl)

For MKL backends to clean mkl data (memory not freed)

cleanup_backend()[source]
clip(a, a_min, a_max, out=None)

Performs element-wise clipping of Tensor a, storing the result in out. The clipped value will be between [a_min, a_max].

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • a_min (Tensor, numeric) – lower bound for clip (inclusive).
  • a_max (Tensor, numeric) – upper bound for clip (inclusive).
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

compensated_sum(sum_tensor, cmp_tensor, add_tensor, cmp_scale=1.0, add_scale=1.0)[source]
compound_bprop_bn(delta_out, grad_gamma, grad_beta, delta_in, x, xsum, xvar, gamma, eps, threads=None, repeat=1, binary=False, layer=None)[source]

Function to perform batch normalization backward pass.

Parameters:
  • delta_out (Tensor) – Delta buffer (where to write the output deltas)
  • grad_gamma (Tensor) – Gradient w.r.t. gamma
  • grad_beta (Tensor) – Gradient w.r.t. beta
  • delta_in (Tensor) – Delta buffer (where to get the input deltas)
  • x (Tensor) – feedforward input
  • xsum (Tensor) – Batch sum over PQN dimension
  • xvar (Tensor) – Batch variance
  • gamma (Tensor) – scale parameter
  • eps (float) – constant for numerical stability
  • threads (int) – Number of GPU threads
  • repeat (int) – Repeats for benchmarking
  • binary (bool) – Binary shift based computations
compound_bprop_lut(nin, inputs, error, error_t, dW, pad_idx, alpha=1.0, beta=0)[source]

Backward propagate lookup table layer.

Parameters:
  • nin (int) – Number of input word_ids.
  • inputs (Tensor) – Input tensor.
  • error (Tensor) – Error tensor.
  • error_t (Tensor) – Transposed error tensor.
  • dW (Tensor) – Gradient tensor (delta).
  • pad_idx (int) –
  • alpha (float) –
  • beta (float) –
compound_dot(A, B, C, alpha=1.0, beta=0.0, relu=False, bsum=None, repeat=1, size=None)[source]

Performs one of the following operations (* is the dot product):
C = alpha * A * B + beta * C
C = alpha * A.T * B + beta * C
C = alpha * A * B.T + beta * C

relu: if True, applied before output (and prior to the beta addition)

size: one of 32x128, 128x32, 64x128, 128x64, 128x128. Sometimes the fastest tiling isn’t chosen for you.

Parameters:
  • A, B (Tensor) – input operands
  • C (GPUTensor) – output
  • alpha (float) – scale A*B term
  • beta (float) – scale C term before sum
  • relu (bool) – whether to apply ReLu before output
  • size (nxm) – Sometimes the fastest tiling isn’t chosen for you.
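
A minimal sketch, assuming a backend instance be and illustrative shapes, of a compound GEMM with a fused ReLU:

    import numpy as np

    A = be.array(np.random.rand(256, 512))   # (m, k)
    B = be.array(np.random.rand(512, 128))   # (k, n)
    C = be.empty((256, 128))                 # (m, n)

    # C = 1.0 * (A . B) + 0.0 * C, with ReLU applied before the result is written
    be.compound_dot(A, B, C, alpha=1.0, beta=0.0, relu=True)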
compound_fprop_bn(x, xsum, xvar, gmean, gvar, gamma, beta, y, eps, rho, compute_batch_sum, accumbeta=0.0, relu=False, threads=None, repeat=1, binary=False, inference=False, outputs=None, layer=None)[source]

Function to perform compound kernel call for batch normalization forward pass.

Parameters:
  • x (Tensor) – Input from previous layer
  • xsum (Tensor) – Precomputed batch sum over PQN dimension
  • xvar (Tensor) – Buffer for variance (computed in kernel)
  • gmean (Tensor) – global mean
  • gvar (Tensor) – global variance
  • gamma (Tensor) – scale parameter
  • beta (Tensor) – location parameter
  • y (Tensor) – normalized output
  • eps (float) – constant for numerical stability
  • rho (float) – exponential window averaging constant
  • accumbeta (float) – value to scale output by before accumulating
  • relu (bool) – Compound ReLU activation in kernel
  • threads (int) – Number of GPU threads
  • repeat (int) – Repeats for benchmarking
  • binary (bool) – Binary shift based computations
compound_rnn_unroll_bprop(W_recur, delta_prev_s, delta_s, h_s, nout, num_steps, num_used_steps, activation, reverse=True)[source]

Time step unrolling portion of recurrent layer bprop.

Parameters:
  • W_recur (Tensor) – Recurrent weight matrix.
  • delta_prev_s (Array) – Array of per time step input delta tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
  • delta_s (Array) – Array of per time step input delta tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
  • h_s (Tensor) – Array of per time step hidden state tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
  • nout (integer) – Number of output units for the layer.
  • num_steps (integer) – Total number of time steps in the buffer.
  • num_used_steps (integer) – Number of time steps being used for real data.
  • activation (Transform) – Activation function for the layer.
  • reverse (boolean) – When true, unrolling will iterate over time steps in reverse (default case for RNN).
compound_rnn_unroll_bprop_bibnrnn(ngLayer, error, in_deltas_f, prev_in_deltas, in_deltas_b, next_in_deltas, W_recur_f, W_recur_b, h_f, h_b, nout, nsteps, used_nsteps, activation, h_buffer_all)
compound_rnn_unroll_fprop(W_recur, h_prev_s, h_ff_s, h_s, bias, nout, num_steps, num_used_steps, activation, reverse=False)[source]

Time step unrolling portion of recurrent layer fprop.

Parameters:
  • W_recur (Tensor) – Recurrent weight matrix.
  • h_prev_s (Array) – Array of per time step hidden state tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
  • h_ff_s (Array) – Array of per time step hidden state tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
  • h_s (Array) – Array of per time step hidden state tensors. Each element in the array is a single time step view into one tensor containing all of the time steps in sequence.
  • bias (Tensor) – Bias tensor to add at each time step.
  • nout (integer) – Number of output units for the layer.
  • num_steps (integer) – Total number of time steps in the buffer.
  • num_used_steps (integer) – Number of time steps being used for real data.
  • activation (Transform) – Activation function for the layer.
  • reverse (boolean) – When true, unrolling will iterate over time steps in reverse (for BiRNN).
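
A conceptual reference (not the fused kernel itself) of what the fprop unrolling does, assuming be is the backend, the per-step views follow the layout described above, and activation is a neon Transform:

    def rnn_unroll_fprop_reference(be, W_recur, h_prev_s, h_ff_s, h_s, bias,
                                   activation, num_used_steps):
        # For each step t: h[t] = activation(W_recur . h_prev[t] + h_ff[t] + bias)
        for t in range(num_used_steps):
            be.compound_dot(W_recur, h_prev_s[t], h_s[t], beta=0.0)
            h_s[t][:] = activation(h_s[t] + h_ff_s[t] + bias)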
compound_rnn_unroll_fprop_bibnrnn(ngLayer, h_buffer_all, h_ff_buffer, W_recur_f, h_prev, h_ff_f, h_f, b_f, W_recur_b, h_next, h_ff_b, h_b, b_b, nout, nsteps, used_nsteps, activation)
consume(buf_index, hostlist, devlist)[source]
conv_layer(dtype, N, C, K, D=1, H=1, W=1, T=1, R=1, S=1, pad_d=0, pad_h=0, pad_w=0, str_d=1, str_h=1, str_w=1, dil_d=1, dil_h=1, dil_w=1)[source]

Create a new ConvLayer parameter object. This then is passed as an argument to all the convolution operations.

N: Number of images in mini-batch
C: Number of input feature maps
K: Number of output feature maps

D: Depth of input image
H: Height of input image
W: Width of input image

T: Depth of filter kernel
R: Height of filter kernel
S: Width of filter kernel

padding: amount of zero-padding around the given edge
strides: factor to step the filters by in a given direction
dilation: dilation factor for each dimension

dtype: need to know dtype to setup proper kernels and params.
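
A minimal sketch of describing and running a 3x3 convolution, assuming be is a NervanaGPU instance and the usual flattened (C*H*W, N), (C*R*S, K) and (K*P*Q, N) layouts for I, F and O (values are illustrative):

    import numpy as np

    N, C, K = 32, 3, 16           # batch size, input maps, output maps
    H = W = 64                    # input height/width
    R = S = 3                     # filter height/width

    layer = be.conv_layer(np.float32, N, C, K, H=H, W=W, R=R, S=S,
                          pad_h=1, pad_w=1, str_h=1, str_w=1)

    P = be.output_dim(H, R, 1, 1)                 # output height
    Q = be.output_dim(W, S, 1, 1)                 # output width

    I = be.array(np.random.rand(C * H * W, N).astype(np.float32))
    F = be.array(np.random.rand(C * R * S, K).astype(np.float32))
    O = be.empty((K * P * Q, N))

    be.fprop_conv(layer, I, F, O)                 # O now holds the activations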

convert_data(tensor, layer_mkl)

For MKL backends to convert data from mkl layout to normal numpy layout

copy_transpose(a, out, axes=None, repeat=1)[source]

Function to perform a fast copy transpose/dimshuffle operation. Works just like numpy.transpose, but requires an output tensor argument.
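
A minimal sketch, assuming be is a NervanaGPU instance: dimshuffle a (C, H, W, N) tensor into (N, C, H, W) order (shapes are illustrative):

    import numpy as np

    src = be.array(np.random.rand(3, 32, 32, 16).astype(np.float32))
    dst = be.empty((16, 3, 32, 32))
    be.copy_transpose(src, dst, axes=(3, 0, 1, 2))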

cublas_dot(A, B, C, alpha=1.0, beta=0.0)[source]

Matrix multiplication using cublas library. Intended for use on Kepler GPUs where maxas kernels are not supported.

C = alpha * (AB) + beta * C

Parameters:
  • A (Tensor) – Input tensor
  • B (Tensor) – Input tensor
  • C (Tensor) – Output tensor
  • alpha (float) – Scalar for AB
  • beta (float) – Scalar for C
deconv_layer(dtype, N, C, K, M, P, Q, T=1, R=1, S=1, pad_d=0, pad_h=0, pad_w=0, str_d=1, str_h=1, str_w=1, dil_d=1, dil_h=1, dil_w=1)[source]

Create a new DeconvLayer parameter object. This then is passed as an argument to all the convolution operations.

N: Number of images in mini-batch
C: Number of output feature maps
K: Number of input feature maps

M: Depth of input
P: Height of input
Q: Width of input

D: Depth of output image
H: Height of output image
W: Width of output image

T: Depth of filter kernel
R: Height of filter kernel
S: Width of filter kernel

padding: amount of zero-padding around the given edge
strides: factor to step the filters by in a given direction
dilation: dilation factor for each dimension

dtype: need to know dtype to setup proper kernels and params.

distribute_data(tensor, layer_parallelism)

For backends which support distributed training, this will distribute or gather the error or activation tensor depending on the type of parallelism used to distribute the layer computation. Currently this is only supported by multi-GPU in Nervana cloud.

Parameters:
  • tensor – Tensor containing either activations or errors
  • layer_parallelism – Type of parallelism expected by the layer
Returns:

Tensor which has been altered by this call or None

divide(a, b, out=None)

Perform element-wise division on the operands, storing the resultant values in the out Tensor. Each operand and out must have identical shape or be broadcastable as such.

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

dot(a, b, out=None)

Dot product of two Tensors.

Parameters:
  • a (Tensor) – left-hand side operand.
  • b (Tensor) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned. Note that this object should differ from left and right.
Returns:

the resulting op-tree from this operation.

Return type:

OpTreeNode

dropout(keep=0.5, out=None)[source]

Returns a keep mask for dropout.

Parameters:
  • keep (float, optional) – the keep probability; elements of the mask are 1 with probability keep and 0 otherwise.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode
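
A minimal sketch, assuming a backend instance be, of building a keep mask and applying it to a tensor of activations (the rescaling step is illustrative, not part of the backend call):

    import numpy as np

    acts = be.array(np.random.rand(512, 128))
    mask = be.empty(acts.shape)

    be.dropout(keep=0.8, out=mask)    # mask entries are 0 or 1
    acts[:] = acts * mask / 0.8       # inverted-dropout style rescaling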

dump_hist_data()[source]
empty(shape, dtype=None, name=None, persist_values=True, parallel=False, distributed=False, allocator=drv.mem_alloc)[source]

Allocate the space for a GPUTensor

Parameters:
  • shape (int, list) – The size of each dimension of the Tensor.
  • dtype (dtype, optional) – Element data type. If not specified we use default_dtype value
  • persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls
  • allocator (function, optional) – Memory allocator.
Returns:

newly created data structure reference

Return type:

GPUTensor

empty_like(other_ary, name=None)[source]

Instantiate a new instance of this backend’s Tensor class, with the shape taken from ary.

Parameters:
  • ary (tensor object) – Tensor to inherit the dimensions of.
  • dtype (data-type, optional) – If present, specifies the underlying type to employ for each element.
Returns:

array object

Return type:

Tensor

end(block, identifier)

Signal the corresponding end of a block of repeated computation (at the end of a loop). This operation can be used to help the compiler optimize performance, but has no direct effect on calculations. It must be preceded by a corresponding Backend.begin() call.

Parameters:
  • block (Block.attr) – identifies the type of computation being worked on based on Block attribute specified
  • identifier (int) – unique identifier for this particular iteration of the block. Will typically be something like epoch number, mini-batch number, and so forth.

See also

begin()

equal(a, b, out=None)

Performs element-wise equality testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

execute(optree)[source]

Execute the optree.

Parameters:optree (OpTreeNode) – the OpTreeNode object that represents all the operations
exp(a, out=None)

Perform element-wise exponential transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

exp2(a, out=None)

Perform element-wise 2-based exponential transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

fabs(a, out=None)

Perform element-wise absolute value of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape. Implemented as an alias of absolute.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

fill_normal(ary, mean=0, stdv=1)[source]

Fill ary with normally distributed random numbers.

Parameters:
  • ary (Tensor) – Tensor to fill with random values
  • mean (float) – Mean value. Default 0
  • stdv (float) – standard deviation value. Default 1
finite(a, out=None)

Perform element-wise test of finiteness (not infinity or not Not a Number) on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

fprop_conv(layer, I, F, O, X=None, bias=None, bsum=None, alpha=1.0, beta=0.0, relu=False, brelu=False, slope=0.0, repeat=1, layer_op=None)[source]

fprop_conv:

Required Arguments:
  • layer: ConvLayer object created with conv_layer()
  • I: input tensor (activations)
  • F: filter tensor (weights)
  • O: output tensor (activations)
Compounding Options:
  • X: tensor to use in bprop_relu or beta; can be the same as O for beta accumulation (the default when None) and should have the same shape as O
  • bias: (K,1) tensor used to add a bias to the output: O += bias
  • bsum: (K,1) tensor in which to accumulate the batch sum (used in batchnorm or bprop_bias): bsum = sum(O.reshape(K,-1), axis=1). The sum operation is fully deterministic
  • alpha, beta: O = alpha*O + beta*X, or O = alpha*O + beta*O if X == O
  • relu: boolean flag to apply O = max(O, 0) + slope*min(O, 0); can be combined with bias (bias is added first)
  • brelu: bprop_relu boolean flag to apply O *= (X > 0) + slope*(X < 0); can be combined with a bsum tensor to output bprop_bias
  • repeat: used in benchmarking

fprop_lrn(layer, I, O, denom, alpha=1.0, beta=0.0, ascale=1.0, bpower=1.0, repeat=1)[source]

Forward propagate lrn layer.

Parameters:
  • layer (PoolLayer) – The pool layer object, specified for LRN
  • I (Tensor) – Input tensor.
  • O (Tensor) – output tensor.
  • denom (Tensor) – denominator tensor, stores the result of the squared pooling/contrast
  • ascale (float) – scaling parameter (alpha) to multiply the pooled sum (1.25e-5 in AK)
  • bpower (float) – exponential parameter (beta) to raise denominator by (0.75 in AK)
fprop_mergebroadcast(ngLayer, inputs, inference, outputs, layers, out_shape)[source]
fprop_mergesum(ngLayer, inputs, inference, layers, outputs, out_shape)[source]
fprop_pool(layer, I, O, argmax=None, alpha=1.0, beta=0.0, repeat=1)[source]
fprop_relu(layer, x, slope)[source]
fprop_skipnode(x, y, beta)[source]
fprop_softmax(x, axis)[source]
fprop_transform(ngLayer, transform, inputs, outputs, relu=False)[source]
gen_rng(seed=None)[source]

Generate the random number generator on device and on host.

Parameters:seed (int) – random number generator seed
Returns:seeded numpy RNG
get_events()[source]
get_time(start, end)[source]

Return time between start and end marks.

Parameters:
  • start (time marker) – start time mark
  • end (time marker) – end time mark
Returns:

time elapsed between start and end time marks in milliseconds

greater(a, b, out=None)

Performs element-wise greater than testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

greater_equal(a, b, out=None)

Performs element-wise greater than or equal testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

init_mark()[source]

Generate a timing mark object.

Returns:timing mark (pycuda driver event)
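
A minimal sketch of timing a block of device work with the mark API, assuming a backend instance be:

    start = be.init_mark()
    end = be.init_mark()

    be.record_mark(start)
    # ... enqueue device work here, e.g. be.compound_dot(A, B, C) ...
    be.record_mark(end)

    be.synchronize_mark(end)                  # wait for the marked work to finish
    print("elapsed: %.3f ms" % be.get_time(start, end))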
iobuf(dim0, x=None, dtype=None, name=None, persist_values=True, shared=None, parallelism=None)

Allocate input and output buffer for layer based on batch size. This is used because the layer does not know about the batch size.

Parameters:
  • dim0 (tuple or int) – I/O buffer dimension for layer (without the axis specifying the batch size).
  • x (data-type, optional) – If present and not None, x will be returned directly. x will not be None if the buffer has already been allocated.
  • dtype (data-type, optional) – If present, specifies the underlying type to employ for each element.
  • name (str, optional) – name identifying the tensor (used in printing).
  • persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls
  • shared (buffer, optional) – If present will attempt to reuse the memory in shared to allocate the I/O buffer
  • parallelism (str, optional) – Indicates type of parallelism (Data, Model) employed by this buffer. Ignored on CPU and GPU backends, defaults to no parallelism.
Returns:

array object

Return type:

Tensor
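
A minimal sketch, assuming a backend instance be; the backend appends the batch-size axis itself, so a layer only specifies its per-example dimension:

    out_buf = be.iobuf(1000)                   # shape (1000, batch_size)
    delta_buf = be.iobuf(1000, name="deltas")  # named buffer (used in printing)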

is_mkl()
less(a, b, out=None)

Performs element-wise less than testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

less_equal(a, b, out=None)

Performs element-wise less than or equal testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

log(a, out=None)

Perform element-wise natural logarithm transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

log2(a, out=None)

Perform element-wise 2-based logarithm transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

lrn_layer(dtype, N, C, D=1, H=1, W=1, J=1)[source]

Create a new PoolLayer parameter object. This then is passed as an argument to all pooling kernels.

N: Number of images in mini-batch

C: Number of input feature maps
H: Height of input image
W: Width of input image

J: Size of feature map pooling window (maxout n_pieces)

padding: amount of zero-padding around the given image or feature map edge
strides: factor to step the window by in a given direction (overlap allowed)

Leave spatial dimensions at 1 to allow feature map pooling in the fc layers.

make_binary_mask(out, keepthresh=0.5)[source]

Create a binary mask for dropout layers.

Parameters:
  • out (GPUTensor) – Output tensor
  • keepthresh (float) – fraction of ones
max(a, axis=None, out=None, keepdims=True)

Calculates the maximal element value along the specified axes.

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • axis (int, optional) – the dimension along which to compute. If set to None, we will take max over all dimensions.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
  • keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns:

the resulting op-tree

Return type:

OpTreeNode

maximum(a, b, out=None)

Performs element-wise maximum value assignment based on corresponding elements of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

mean(a, axis=None, partial=None, out=None, keepdims=True)

Calculates the arithmetic mean of the elements along the specified axes.

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • axis (int, optional) – the dimension along which to compute. If set to None, we will take mean over all dimensions. Defaults to None
  • partial (bool, optional) – Not currently used.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
  • keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns:

the resulting op-tree

Return type:

OpTreeNode

mergebroadcast_layer(layer_num)[source]
mergesum_layer(layer_num)[source]
min(a, axis=None, out=None, keepdims=True)

Calculates the minimal element value along the specified axes.

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • axis (int, optional) – the dimension along which to compute. If set to None, we will take min over all dimensions.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
  • keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns:

the resulting op-tree

Return type:

OpTreeNode

minimum(a, b, out=None)

Performs element-wise minimum value assignment based on corresponding elements of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

multiply(a, b, out=None)

Perform element-wise multiplication on the operands, storing the resultant values in the out Tensor. Each operand and out must have identical shape or be broadcastable as such.

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

negative(a, out=None)

Perform element-wise negation of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

nms(detections, threshold, normalized=False)[source]

Function to perform non-maximal suppression.

Parameters:
  • detections (Tensor) – detection boxes (box_count, 5), each row has (x1, y1, x2, y2, score). Assume the boxes have already been sorted based on score in descending order
  • threshold (float) – box overlap threshold, boxes with smaller overlaps will be kept
  • normalized (bool) – whether box coordinates are normalized to image dimensions. This affects whether we use a +1 offset to compute box sizes.
Outputs:
keep_ind (list): list of indices
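
A minimal sketch, assuming a backend instance be and a handful of score-sorted boxes (values are illustrative):

    import numpy as np

    # each row: (x1, y1, x2, y2, score), sorted by score in descending order
    boxes = np.array([[ 10,  10, 100, 100, 0.9],
                      [ 12,  12,  98,  98, 0.8],
                      [200, 200, 250, 250, 0.7]], dtype=np.float32)
    dets = be.array(boxes)

    keep = be.nms(dets, threshold=0.5)   # e.g. [0, 2]: the second box overlaps box 0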
not_equal(a, b, out=None)

Performs element-wise non-equality testing on each element of left and right, storing the result in out. Each operand is assumed to be the same shape (or broadcastable as such).

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

onehot(indices, axis, out=None)

Generate optree for converting indices to a onehot representation.

Parameters:
  • indices (Tensor) – Elements must be of numpy integer type for gpu onehot to work.
  • axis (int) – the axis along the feature length dimension
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode
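
A minimal sketch, assuming a backend instance be, 10 classes, and 4 examples; each column of the result has a single 1 at the row given by the class index:

    import numpy as np

    labels = be.array(np.array([[1, 0, 3, 7]], dtype=np.int32), dtype=np.int32)
    onehot_out = be.zeros((10, 4))
    onehot_out[:] = be.onehot(labels, axis=0)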

ones(shape, dtype=None, name=None, persist_values=True, parallel=False, distributed=False, allocator=drv.mem_alloc)[source]

Instantiate a new instance of the GPUTensor class setting each element value to 1.

Parameters:
  • shape (list of ints) – The size of each dimension of the Tensor.
  • dtype (dtype, optional) – Element data type. If not specified we use default_dtype value (‘float32’ unless overridden).
  • persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls
  • allocator (function, optional) – Memory allocator.
Returns:

newly created data structure reference

Return type:

GPUTensor

output_dim(X, S, padding, strides, pooling=False, dilation=1)

Compute the output size along one dimension, given the input size, filter size, padding, and stride.

Parameters:
  • X (int) – input data dimension
  • S (int) – filter dimension
  • padding (int) – padding on each side
  • strides (int) – striding
  • pooling (bool) – flag for setting pooling layer size
  • dilation (int) – dilation of filter
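
For example, with the usual formula floor((X + 2*padding - S) / strides) + 1, a 3-wide filter over a 64-wide input with padding 1 and stride 2 gives 32 (assuming a backend instance be):

    out_w = be.output_dim(64, 3, padding=1, strides=2)   # -> 32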
pool_layer(dtype, op, N, C, D=1, H=1, W=1, J=1, T=1, R=1, S=1, pad_c=0, pad_d=0, pad_h=0, pad_w=0, str_c=None, str_d=None, str_h=None, str_w=None)[source]

Create a new PoolLayer parameter object. This then is passed as an argument to all pooling kernels.

op: max, avg, l2 pooling
N: Number of images in mini-batch

C: Number of input feature maps
D: Depth of input image
H: Height of input image
W: Width of input image

J: Size of feature map pooling window (maxout n_pieces)
T: Depth of pooling window
R: Height of pooling window
S: Width of pooling window

padding: amount of zero-padding around the given image or feature map edge
strides: factor to step the window by in a given direction (overlap allowed)

Leave spatial dimensions at 1 to allow feature map pooling in the fc layers.
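
A minimal sketch, assuming a backend instance be, the (C*H*W, N) tensor layout, and an illustrative uint8 argmax buffer for max pooling:

    import numpy as np

    N, C, H, W = 32, 16, 32, 32
    layer = be.pool_layer(np.float32, "max", N, C, H=H, W=W, R=2, S=2,
                          str_h=2, str_w=2)

    P = be.output_dim(H, 2, 0, 2, pooling=True)   # pooled height
    Q = be.output_dim(W, 2, 0, 2, pooling=True)   # pooled width

    I = be.array(np.random.rand(C * H * W, N).astype(np.float32))
    O = be.empty((C * P * Q, N))
    argmax = be.empty(O.shape, dtype=np.uint8)    # needed later for bprop_pool

    be.fprop_pool(layer, I, O, argmax=argmax)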

power(a, b, out=None)

Perform element-wise raising of Tensor a values to the specified power, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • b (Tensor, numeric) – exponentiated value to be applied to element. Examples include 2 (square), 0.5 (square root).
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

rand(out=None)[source]

Generate random number uniformly distributed between 0 and 1.

Parameters:out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:the resulting op-tree
Return type:OpTreeNode
reciprocal(a, out=None)

Perform element-wise reciprocal of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

record_mark(marker)[source]

Mark the current time.

Parameters:marker (time mark) – timing mark generated by init_mark()
relu_layer()[source]
revert_tensor(tensor)

Reverts a tensor to its original state after being distributed by distribute_data.

Parameters:tensor – Tensor to be reverted
rint(a, out=None)

Perform element-wise rounding to nearest int.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

rng_get_state()[source]

Return the current state of the on-host and on-device RNGs.

Returns:
the on-host and on-device RNG state vectors, respectively
Return type:(np.array, np.array)
rng_reset()[source]

Reset the RNG to the initial state stored in self.init_rng_state and self.init_rng_state_dev for the host and device RNG, respectively.

rng_set_state(rng_states)[source]

Set the RNG state for both the on device and on host RNGs.

Parameters:rng_states (tuple of np.arrays) – tuple with 2 elements: 1) the numpy random number state vector, and 2) an array of uint32 specifying the on-device RNG state
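
A minimal sketch, assuming a backend instance be, of checkpointing and restoring the RNG so that the same random values can be reproduced:

    state = be.rng_get_state()        # (host state, device state)
    first = be.empty((4, 4))
    first[:] = be.rand()

    be.rng_set_state(state)           # rewind both RNGs
    second = be.empty((4, 4))
    second[:] = be.rand()             # same values as `first`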
roipooling_bprop(I, rois, O, argmax, roi_count, fm_channel, fm_height, fm_width, pooled_height, pooled_width, spatial_scale)[source]

Function to perform bprop of ROIPooling.

Parameters:
  • I (Tensor) – input errors (C, pooled_height, pooled_width, roi_count)
  • argmax (Tensor) – max args from the fprop (C, pooled_height, pooled_width, roi_count)
  • rois (Tensor) – (ROIs, 5)
  • O (Tensor) – output deltas (C, H, W, N)
roipooling_fprop(I, rois, O, argmax, roi_count, fm_channel, fm_height, fm_width, pooled_height, pooled_width, spatial_scale)[source]

Function to perform fprop of ROIPooling.

Parameters:
  • I (Tensor) – (C, H, W, N)
  • rois (Tensor) – (ROIs, 5)
  • O (Tensor) – (C, pooled_height, pooled_width, roi_count)
  • argmax (Tensor) – (C, pooled_height, pooled_width, roi_count)
safelog(a, out=None)

Perform element-wise natural logarithm transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape. This log function has built in safety for underflow.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

scratch_buffer(size)[source]
scratch_buffer_init()[source]
scratch_buffer_offset(size)[source]
scratch_buffer_reset()[source]
set_caffe_compat()

Set flag to make layers compatible with caffe in terms of conv and pool layer output size determination and dropout layer implementation.

set_hist_buffers(hist_bins, hist_offset)[source]
set_scratch_size(*args)[source]
sgn(a, out=None)

Perform element-wise indication of the sign of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

shared_iobuf_size(shape, parallelism)

Computes the backend specific size needed for an iobuf with a specified shape that is meant to be shared between layers.

Parameters:
  • shape (tuple) – Requested iobuf shape
  • parallelism (string) – Parallelism of layer requesting this iobuf
Returns:

Size of required iobuf

Return type:

int

shift(ary, shift_ary, value=True, out=None)[source]

Shifts input array

Parameters:
  • ary – tensor
  • shift_ary – tensor of shift amount
  • out – reference to output
sig(a, out=None)

Perform element-wise sigmoid transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

sig2(a, out=None)

Perform element-wise 2-based sigmoid transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

sqrt(a, out=None)

Perform element-wise square-root of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

square(a, out=None)

Perform element-wise square of Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

std(a, axis=None, partial=None, out=None, keepdims=True)

Calculates the standard deviation of the elements along the specified axes.

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • axis (int, optional) – the dimension along which to compute. If set to None, we will take std over all dimensions.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
  • partial (bool, optional) – Not currently used.
  • keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns:

the resulting op-tree

Return type:

OpTreeNode

subtract(a, b, out=None)

Perform element-wise subtraction on the operands, storing the resultant values in the out Tensor. Each operand and out must have identical shape or be broadcastable as such.

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

sum(a, axis=None, out=None, keepdims=True)

Calculates the summation of the elements along the specified axis.

Parameters:
  • a (Tensor) – the Tensor on which to perform the sum
  • axis (int, optional) – the dimension along which to compute. If set to None, we will sum over all dimensions.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
  • keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns:

the resulting op-tree

Return type:

OpTreeNode

synchronize_mark(marker)[source]

Synchronize on the given marker.

Parameters:marker (time mark) – timing mark generated by init_mark()
take(a, indices, axis, out=None)

Extract elements based on the indices along a given axis.

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • indices (Tensor, numpy ndarray) – indices of elements to select
  • axis (int, optional) – the dimension along which to compute. If set to None, we will extract over all dimensions (flattened first)
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
tanh(a, out=None)

Perform element-wise hyperbolic tangent transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

tanh2(a, out=None)

Perform element-wise 2-based hyperbolic tangent transformation on Tensor a, storing the result in Tensor out. Both Tensors should have identical shape.

Parameters:
  • a (Tensor) – input to be transformed.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

true_divide(a, b, out=None)

Here it is an alias of divide. Instead of the Python traditional ‘floor division’, this returns a true division.

Parameters:
  • a (Tensor, numeric) – left-hand side operand.
  • b (Tensor, numeric) – right-hand side operand.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
Returns:

the resulting op-tree

Return type:

OpTreeNode

update_conv(layer, I, E, grad_F, alpha=1.0, beta=0.0, repeat=1, grad_bias=None, layer_op=None)[source]

update_conv:

Required Inputs:
  • layer: ConvLayer object created with conv_layer()
  • I: input tensor (activations)
  • E: error tensor (output gradient from previous layer)
  • grad_F: output tensor (gradient with respect to weights)
Compounding Options:
  • alpha, beta: O = alpha*O + beta*O
  • repeat: used in benchmarking

update_fc_bias(err, out)

Compute the updated bias gradient for a fully connected network layer.

Parameters:
  • err (Tensor) – backpropagated error
  • out (Tensor) – Where to store the updated gradient value.
var(a, axis=None, partial=None, out=None, keepdims=True, binary=False)

Calculates the variance of the elements along the specified axes.

Parameters:
  • a (Tensor) – the Tensor on which to perform the operation
  • axis (int, optional) – the dimension along which to compute. If set to None, we will take var over all dimensions. Defaults to None
  • partial (bool, optional) – Not currently used.
  • out (Tensor, optional) – where the result will be stored. If out is None, only the op-tree will be returned.
  • keepdims (bool, optional) – Keep the axes being computed over in the output (with size 1), instead of collapsing. Defaults to True.
Returns:

the resulting op-tree

Return type:

OpTreeNode

xnor_compound_dot(A, B, C, beta=0.0, bsum=None)[source]

Performs XNOR GEMM C = A * B

Parameters:
  • A (Tensor) – left-hand side operand.
  • B (Tensor) – right-hand side operand.
  • C (Tensor) – output operand
zeros(shape, dtype=None, name=None, persist_values=True, parallel=False, distributed=False, allocator=drv.mem_alloc)[source]

Instantiate a new instance of the GPUTensor class setting each element value to 0.

Parameters:
  • shape (list of ints) – The size of each dimension of the Tensor.
  • dtype (dtype, optional) – Element data type. If not specified we use default_dtype value (‘float32’ unless overridden).
  • persist_values (bool, optional) – If set to True (the default), the values assigned to this Tensor will persist across multiple begin and end calls. Setting to False may provide a performance increase if values do not need to be maintained across such calls
  • allocator (function, optional) – Memory allocator.
Returns:

newly created data structure reference

Return type:

GPUTensor

zeros_like(other_ary, name=None)[source]

Instantiate a new instance of this backend’s Tensor class, with the shape taken from ary and populating each element with a value of 0.

Parameters:
  • ary (tensor object) – Tensor to inherit the dimensions of.
  • dtype (data-type, optional) – If present, specifies the underlying type to employ for each element.
Returns:

array object

Return type:

Tensor