Overview
MXNet.jl Namespace
Most the functions and types in MXNet.jl are organized in a flat namespace. Because many some functions are conflicting with existing names in the Julia Base module, we wrap them all in a mx
module. The convention of accessing the MXNet.jl interface is the to use the mx.
prefix explicitly:
julia> using MXNet
julia> x = mx.zeros(2, 3) # MXNet NDArray
2×3 mx.NDArray{Float32} @ CPU0:
0.0 0.0 0.0
0.0 0.0 0.0
julia> y = zeros(eltype(x), size(x)) # Julia Array
2×3 Array{Float32,2}:
0.0 0.0 0.0
0.0 0.0 0.0
julia> copy!(y, x) # Overloaded function in Julia Base
2×3 Array{Float32,2}:
0.0 0.0 0.0
0.0 0.0 0.0
julia> z = mx.ones(size(x), mx.gpu()) # MXNet NDArray on GPU
2×3 mx.NDArray{Float32} @ GPU0:
1.0 1.0 1.0
1.0 1.0 1.0
julia> mx.copy!(z, y) # Same as copy!(z, y)
2×3 mx.NDArray{Float32} @ GPU0:
0.0 0.0 0.0
0.0 0.0 0.0
Note functions like size
, copy!
that is extensively overloaded for various types works out of the box. But functions like zeros
and ones
will be ambiguous, so we always use the mx.
prefix. If you prefer, the mx.
prefix can be used explicitly for all MXNet.jl functions, including size
and copy!
as shown in the last line.
Low Level Interface
NDArray
NDArray
is the basic building blocks of the actual computations in MXNet. It is like a Julia Array
object, with some important differences listed here:
- The actual data could live on different
Context
(e.g. GPUs). For some contexts, iterating into the elements one by one is very slow, thus indexing into NDArray is not recommanded in general. The easiest way to inspect the contents of an NDArray is to use thecopy
function to copy the contents as a JuliaArray
. - Operations on
NDArray
(including basic arithmetics and neural network related operators) are executed in parallel with automatic dependency tracking to ensure correctness. - There is no generics in
NDArray
, theeltype
is alwaysmx.MX_float
. Because for applications in machine learning, single precision floating point numbers are typical a best choice balancing between precision, speed and portability. Also since libmxnet is designed to support multiple languages as front-ends, it is much simpler to implement with a fixed data type.
While most of the computation is hidden in libmxnet by operators corresponding to various neural network layers. Getting familiar with the NDArray
API is useful for implementing Optimizer
or customized operators in Julia directly.
The followings are common ways to create NDArray
objects:
-
mx.empty(shape[, context])
: create on uninitialized array of a given shape on a specific device. For example,mx.empty(2, 3)
,mx.((2, 3), mx.gpu(2))
. -
mx.zeros(shape[, context])
andmx.ones(shape[, context])
: similar to the Julia's built-inzeros
andones
. -
mx.copy(jl_arr, context)
: copy the contents of a JuliaArray
to a specific device.
Most of the convenient functions like size
, length
, ndims
, eltype
on array objects should work out-of-the-box. Although indexing is not supported, it is possible to take slices:
julia> using MXNet
julia> a = mx.ones(2, 3)
2×3 mx.NDArray{Float32,2} @ CPU0:
1.0 1.0 1.0
1.0 1.0 1.0
julia> b = mx.slice(a, 1:2)
2×2 mx.NDArray{Float32,2} @ CPU0:
1.0 1.0
1.0 1.0
julia> b[:] = 2
2
julia> a
2×3 mx.NDArray{Float32,2} @ CPU0:
2.0 2.0 1.0
2.0 2.0 1.0
A slice is a sub-region sharing the same memory with the original NDArray
object. A slice is always a contiguous piece of memory, so only slicing on the last dimension is supported. The example above also shows a way to set the contents of an NDArray
.
julia> using MXNet
julia> mx.srand(42)
julia> a = mx.empty(2, 3)
2×3 mx.NDArray{Float32,2} @ CPU0:
1.34342f-36 1.79751f19 1.69501f-7
0.0 1.06216f-5 0.0
julia> a[:] = 0.5 # set all elements to a scalar
0.5
julia> a[:] = rand(size(a)) # set contents with a Julia Array
2×3 Array{Float64,2}:
0.707028 0.97237 0.491423
0.103496 0.367879 0.88511
julia> copy!(a, rand(size(a))) # set value by copying a Julia Array
2×3 mx.NDArray{Float32,2} @ CPU0:
0.380377 0.145532 0.420763
0.496733 0.54073 0.724241
julia> b = mx.empty(size(a))
2×3 mx.NDArray{Float32,2} @ CPU0:
2.894f-36 7.9325f34 1.19403f-7
0.0 4.17261f-8 0.0
julia> b[:] = a # copying and assignment between NDArrays
2×3 mx.NDArray{Float32,2} @ CPU0:
0.380377 0.145532 0.420763
0.496733 0.54073 0.724241
Note due to the intrinsic design of the Julia language, a normal assignment
a = b
does not mean copying the contents of b
to a
. Instead, it just make the variable a
pointing to a new object, which is b
. Similarly, inplace arithmetics does not work as expected:
julia> using MXNet
julia> a = mx.ones(2)
2 mx.NDArray{Float32,1} @ CPU0:
1.0
1.0
julia> r = a # keep a reference to a
2 mx.NDArray{Float32,1} @ CPU0:
1.0
1.0
julia> b = mx.ones(2)
2 mx.NDArray{Float32,1} @ CPU0:
1.0
1.0
julia> a += b # translates to a = a + b
2 mx.NDArray{Float32,1} @ CPU0:
2.0
2.0
julia> a
2 mx.NDArray{Float32,1} @ CPU0:
2.0
2.0
julia> r
2 mx.NDArray{Float32,1} @ CPU0:
1.0
1.0
As we can see, a
has expected value, but instead of inplace updating, a new NDArray
is created and a
is set to point to this new object. If we look at r
, which still reference to the old a
, its content has not changed. There is currently no way in Julia to overload the operators like +=
to get customized behavior.
Instead, you will need to write a[:] = a + b
, or if you want real inplace +=
operation, MXNet.jl provides a simple macro @mx.inplace
:
julia> @mx.inplace a += b
2 mx.NDArray{Float32,1} @ CPU0:
3.0
3.0
julia> macroexpand(:(@mx.inplace a += b))
2 mx.NDArray{Float32,1} @ CPU0:
4.0
4.0
As we can see, it translate the +=
operator to an explicit add_to!
function call, which invokes into libmxnet to add the contents of b
into a
directly. For example, the following is the update rule in the SGD Optimizer
(both grad
and weight
are NDArray
objects):
@inplace weight += -lr * (grad_scale * grad + self.weight_decay * weight)
Note there is no much magic in mx.inplace
: it only does a shallow translation. In the SGD update rule example above, the computation like scaling the gradient by grad_scale
and adding the weight decay all create temporary NDArray
objects. To mitigate this issue, libmxnet has a customized memory allocator designed specifically to handle this kind of situations. The following snippet does a simple benchmark on allocating temp NDArray
vs. pre-allocating:
using Benchmark
using MXNet
N_REP = 1000
SHAPE = (128, 64)
CTX = mx.cpu()
LR = 0.1
function inplace_op()
weight = mx.zeros(SHAPE, CTX)
grad = mx.ones(SHAPE, CTX)
# pre-allocate temp objects
grad_lr = mx.empty(SHAPE, CTX)
for i = 1:N_REP
copy!(grad_lr, grad)
@mx.inplace grad_lr .*= LR
@mx.inplace weight -= grad_lr
end
return weight
end
function normal_op()
weight = mx.zeros(SHAPE, CTX)
grad = mx.ones(SHAPE, CTX)
for i = 1:N_REP
weight[:] -= LR * grad
end
return weight
end
# make sure the results are the same
@assert(maximum(abs(copy(normal_op() - inplace_op()))) < 1e-6)
println(compare([inplace_op, normal_op], 100))
The comparison on my laptop shows that normal_op
while allocating a lot of temp NDArray in the loop (the performance gets worse when increasing N_REP
), is only about twice slower than the pre-allocated one.
Row | Function | Average | Relative | Replications |
---|---|---|---|---|
1 | "inplace_op" | 0.0074854 | 1.0 | 100 |
2 | "normal_op" | 0.0174202 | 2.32723 | 100 |
So it will usually not be a big problem unless you are at the bottleneck of the computation.
Distributed Key-value Store
The type KVStore
and related methods are used for data sharing across different devices or machines. It provides a simple and efficient integer - NDArray key-value storage system that each device can pull or push.
The following example shows how to create a local KVStore
, initialize a value and then pull it back.
kv = mx.KVStore(:local)
shape = (2, 3)
key = 3
mx.init!(kv, key, mx.ones(shape) * 2)
a = mx.empty(shape)
mx.pull!(kv, key, a) # pull value into a
a
2×3 mx.NDArray{Float32,2} @ CPU0:
2.0 2.0 2.0
2.0 2.0 2.0
Intermediate Level Interface
Symbols and Composition
The way we build deep learning models in MXNet.jl is to use the powerful symbolic composition system. It is like Theano, except that we avoided long expression compilation time by providing larger neural network related building blocks to guarantee computation performance. See also this note for the design and trade-off of the MXNet symbolic composition system.
The basic type is mx.SymbolicNode
. The following is a trivial example of composing two symbols with the +
operation.
A = mx.Variable(:A)
B = mx.Variable(:B)
C = A + B
print(C) # debug printing
Symbol Outputs:
output[0]=_plus0(0)
Variable:A
Variable:B
--------------------
Op:elemwise_add, Name=_plus0
Inputs:
arg[0]=A(0) version=0
arg[1]=B(0) version=0
We get a new SymbolicNode
by composing existing SymbolicNode
s by some operations. A hierarchical architecture of a deep neural network could be realized by recursive composition. For example, the following code snippet shows a simple 2-layer MLP construction, using a hidden layer of 128 units and a ReLU
activation function.
net = mx.Variable(:data)
net = mx.FullyConnected(net, name=:fc1, num_hidden=128)
net = mx.Activation(net, name=:relu1, act_type=:relu)
net = mx.FullyConnected(net, name=:fc2, num_hidden=64)
net = mx.SoftmaxOutput(net, name=:out)
print(net) # debug printing
Symbol Outputs:
output[0]=out(0)
Variable:data
Variable:fc1_weight
Variable:fc1_bias
--------------------
Op:FullyConnected, Name=fc1
Inputs:
arg[0]=data(0) version=0
arg[1]=fc1_weight(0) version=0
arg[2]=fc1_bias(0) version=0
Attrs:
num_hidden=128
--------------------
Op:Activation, Name=relu1
Inputs:
arg[0]=fc1(0)
Attrs:
act_type=relu
Variable:fc2_weight
Variable:fc2_bias
--------------------
Op:FullyConnected, Name=fc2
Inputs:
arg[0]=relu1(0)
arg[1]=fc2_weight(0) version=0
arg[2]=fc2_bias(0) version=0
Attrs:
num_hidden=64
Variable:out_label
--------------------
Op:SoftmaxOutput, Name=out
Inputs:
arg[0]=fc2(0)
arg[1]=out_label(0) version=0
Each time we take the previous symbol, and compose with an operation. Unlike the simple +
example above, the operations here are "bigger" ones, that correspond to common computation layers in deep neural networks.
Each of those operation takes one or more input symbols for composition, with optional hyper-parameters (e.g. num_hidden
, act_type
) to further customize the composition results.
When applying those operations, we can also specify a name
for the result symbol. This is convenient if we want to refer to this symbol later on. If not supplied, a name will be automatically generated.
Each symbol takes some arguments. For example, in the +
case above, to compute the value of C
, we will need to know the values of the two inputs A
and B
. For neural networks, the arguments are primarily two categories: inputs and parameters. inputs are data and labels for the networks, while parameters are typically trainable weights, bias, filters.
When composing symbols, their arguments accumulates. We can list all the arguments by
mx.list_arguments(net)
6-element Array{Symbol,1}:
:data
:fc1_weight
:fc1_bias
:fc2_weight
:fc2_bias
:out_label
Note the names of the arguments are generated according to the provided name for each layer. We can also specify those names explicitly:
julia> using MXNet
julia> net = mx.Variable(:data)
MXNet.mx.SymbolicNode data
julia> w = mx.Variable(:myweight)
MXNet.mx.SymbolicNode myweight
julia> net = mx.FullyConnected(net, weight=w, name=:fc1, num_hidden=128)
MXNet.mx.SymbolicNode fc1
julia> mx.list_arguments(net)
3-element Array{Symbol,1}:
:data
:myweight
:fc1_bias
The simple fact is that a Variable
is just a placeholder mx.SymbolicNode
. In composition, we can use arbitrary symbols for arguments. For example:
julia> using MXNet
julia> net = mx.Variable(:data)
MXNet.mx.SymbolicNode data
julia> net = mx.FullyConnected(net, name=:fc1, num_hidden=128)
MXNet.mx.SymbolicNode fc1
julia> net2 = mx.Variable(:data2)
MXNet.mx.SymbolicNode data2
julia> net2 = mx.FullyConnected(net2, name=:net2, num_hidden=128)
MXNet.mx.SymbolicNode net2
julia> mx.list_arguments(net2)
3-element Array{Symbol,1}:
:data2
:net2_weight
:net2_bias
julia> composed_net = net2(data2=net, name=:composed)
MXNet.mx.SymbolicNode composed
julia> mx.list_arguments(composed_net)
5-element Array{Symbol,1}:
:data
:fc1_weight
:fc1_bias
:net2_weight
:net2_bias
Note we use a composed symbol, net
as the argument data2
for net2
to get a new symbol, which we named :composed
. It also shows that a symbol itself is a call-able object, which can be invoked to fill in missing arguments and get more complicated symbol compositions.
Shape Inference
Given enough information, the shapes of all arguments in a composed symbol could be inferred automatically. For example, given the input shape, and some hyper-parameters like num_hidden
, the shapes for the weights and bias in a neural network could be inferred.
julia> using MXNet
julia> net = mx.Variable(:data)
MXNet.mx.SymbolicNode data
julia> net = mx.FullyConnected(net, name=:fc1, num_hidden=10)
MXNet.mx.SymbolicNode fc1
julia> arg_shapes, out_shapes, aux_shapes = mx.infer_shape(net, data=(10, 64))
(Tuple[(10, 64), (10, 10), (10,)], Tuple[(10, 64)], Tuple[])
The returned shapes corresponds to arguments with the same order as returned by mx.list_arguments
. The out_shapes
are shapes for outputs, and aux_shapes
can be safely ignored for now.
julia> for (n, s) in zip(mx.list_arguments(net), arg_shapes)
println("$n\t=> $s")
end
data => (10, 64)
fc1_weight => (10, 10)
fc1_bias => (10,)
julia> for (n, s) in zip(mx.list_outputs(net), out_shapes)
println("$n\t=> $s")
end
fc1_output => (10, 64)
Binding and Executing
In order to execute the computation graph specified a composed symbol, we will bind the free variables to concrete values, specified as mx.NDArray
. This will create an mx.Executor
on a given mx.Context
. A context describes the computation devices (CPUs, GPUs, etc.) and an executor will carry out the computation (forward/backward) specified in the corresponding symbolic composition.
julia> using MXNet
julia> A = mx.Variable(:A)
MXNet.mx.SymbolicNode A
julia> B = mx.Variable(:B)
MXNet.mx.SymbolicNode B
julia> C = A .* B
MXNet.mx.SymbolicNode _mul0
julia> a = mx.ones(3) * 4
3 mx.NDArray{Float32,1} @ CPU0:
4.0
4.0
4.0
julia> b = mx.ones(3) * 2
3 mx.NDArray{Float32,1} @ CPU0:
2.0
2.0
2.0
julia> c_exec = mx.bind(C, context=mx.cpu(), args=Dict(:A => a, :B => b));
julia> mx.forward(c_exec)
1-element Array{MXNet.mx.NDArray{Float32,1},1}:
NDArray Float32[8.0, 8.0, 8.0]
julia> c_exec.outputs[1]
3 mx.NDArray{Float32,1} @ CPU0:
8.0
8.0
8.0
julia> copy(c_exec.outputs[1]) # copy turns NDArray into Julia Array
3-element Array{Float32,1}:
8.0
8.0
8.0
For neural networks, it is easier to use simple_bind
. By providing the shape for input arguments, it will perform a shape inference for the rest of the arguments and create the NDArray automatically. In practice, the binding and executing steps are hidden under the Model
interface.
TODO Provide pointers to model tutorial and further details about binding and symbolic API.
High Level Interface
The high level interface include model training and prediction API, etc.