A tutorial based on basic information of Torch7. It covers installation, simple runable codes, tensor manipulations, sweep out key-packages and post-hoc audience q&a.
Sending Calendar Invites on SES and Calendarsnack.pdf
Simple, fast, and scalable torch7 tutorial
1. Simple, Fast, and Scalable
Torch7 Tutorial
Jin-Hwa Kim
Biointelligence Lab.
Program in Cognitive Science
SNU
Twitter@jnhwkim
2. BITable of Contents
■ Installing Torch
■ Simple examples
■ Tensor System
■ Notable Packages
• nn
• optim
• dp
• rnn
■ References
3. BIInstalling Torch
■ Installing Torch
$ curl -s https://raw.githubusercontent.com/torch/ezinstall/master/
install-deps | bash
$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd ~/torch; ./install.sh
■ LuaRocks
• Lua package manager (like a apt-get or homebrew for linux or OSX)
$ luarocks install image
$ luarocks list
■ TREPL
• Torch read-eval-print loop
$ th
$ th file.lua
$ th -i interactive.lua
4. BISimple example
■ Quick Lua (python + Matlab ?)
• object.method(self, a, b) ≡ object:method(a, b)
• index starts with 1, i = i + 1
• if not <condition> and/or <condition2> then <statement1> elseif
<statement2> else <statement3> end
• for i=1,10,2 do <statement> end
• You can break, but cannot continue.
• Generic data structure: table (JSON?)
{}, {1,2,3}, {‘a’=1, ‘b’=2, ‘c’=3}, {{1,2},{3,4}}, #table
■ Few Torch Functions
• rand() which creates tensor drawn from uniform distribution
• t() which transposes a tensor (note it returns a new view)
• dot() which performs a dot product between two tensors
• eye() which returns a identity matrix
• * operator over matrices (which performs a matrix-vector or matrix-matrix
multiplication)
5. BITensor System (1/6)
■ Fundamental data class, Tensor
• Handling numeric data
• Serializable (If you want, can save as a file.)
• Tensor interprets a chunk of memory as having dimensions.
■ Size
> x:nDimension()
6
> x:size() —- use x:size(dim) for a specific dimension
4
5
6
2
7
3
[torch.LongStorage of size 6]
■ Access
> x[3][4][5]
6. BITensor System (2/6)
■ Memory Contiguous
• It’s a C style, not Fortran.
x = torch.Tensor(4,5)
i = 0
x:apply(function()
i = i + 1
return i
end)
> x
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
[torch.DoubleTensor of dimension 4x5]
> x:stride()
5
1 -- element in the last dimension are contiguous!
[torch.LongStorage of size 2]
7. BITensor System (3/6)
■ Tensor Types
ByteTensor -- contains unsigned chars
CharTensor -- contains signed chars
ShortTensor -- contains shorts
IntTensor -- contains ints
FloatTensor -- contains floats
DoubleTensor -- contains doubles
■ Most numeric operations are implemented only for FloatTensor and
DoubleTensor (e.g. torch.histc()).
> torch.histc(torch.IntTensor(5))
[string "_RESULT={torch.histc(a)}"]:1: torch.IntTensor does not implement the
torch.histc() function
stack traceback:
[C]: in function 'histc'
[string "_RESULT={torch.histc(a)}"]:1: in main chunk
[C]: in function 'xpcall'
/Users/Calvin/torch/install/share/lua/5.1/trepl/init.lua:630: in function 'repl'
...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x010e9422f0
torch.setdefaulttensortype(‘torch.FloatTensor')
a = torch.Tensor()
a:type()
a:size(dim)
8. BITensor System (4/6)
■ Querying elements
> x[2][3] -- returns row 2, column 3
6
> x[{2,3}] -- another way to return row 2, column 3
6
> x[torch.LongStorage{2,3}] -- yet another way to return row 2, column 3
6
> x[torch.le(x,3)] -- torch.le returns a ByteTensor that acts as a mask
1
2
3
[torch.DoubleTensor of dimension 3]
■ Extracting sub-tensors
[self] narrow(dim, index, size)
[Tensor] sub(dim1s, dim1e ... [, dim4s [, dim4e]])
[Tensor] select(dim, index)
or just using operator [] …
9. BITensor System (5/6)
■ Indexing operator []
x = torch.Tensor(5, 6):zero()
> x
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ 1,3 }] = 1 -- sets element at
> x (i=1,j=3) to 1
0 0 1 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ 2,{2,4} }] = 2 -- sets a slice
> x of 3 elements to 2
0 0 1 0 0 0
0 2 2 2 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ {},4 }] = -1 -- sets the full 4th column to -1
> x
0 0 1 -1 0 0
0 2 2 -1 0 0
0 0 0 -1 0 0
0 0 0 -1 0 0
0 0 0 -1 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ {},2 }] = torch.range(1,5) -- copy a 1D tensor
> x to a slice of x
0 1 1 -1 0 0
0 2 2 -1 0 0
0 3 0 -1 0 0
0 4 0 -1 0 0
0 5 0 -1 0 0
[torch.DoubleTensor of dimension 5x6]
x[torch.lt(x,0)] = -2 -- sets all negative elements
> x to -2 via a bytetensor mask
0 1 1 -2 0 0
0 2 2 -2 0 0
0 3 0 -2 0 0
0 4 0 -2 0 0
0 5 0 -2 0 0
[torch.DoubleTensor of dimension 5x6]
10. BITensor System (6/6)
■ And so on …
• See how many functions are the same you’ve used in Matlab!
⋮
[Tensor] gather(dim, index)
[LongTensor] nonzero(tensor)
[result] expand([result,] sizes)
[Tensor] repeatTensor([result,] sizes)
[Tensor] squeeze([dim])
[Tensor] transpose(dim1, dim2)
[Tensor] permute(dim1, dim2, ..., dimn)
[Tensor] unfold(dim, size, step)
⋮
11. BINotable Packages nn
■ Stands for nueral network
• It provides a modular way to build a complex model.
■ Modules
• Module: abstract class
• Containers: e.g. Sequential, Parallel and Concat
• Transfer functions: e.g. Tanh, ReLU and Sigmoid
• Simple layers: e.g. Linear, Mean, Max and Reshape
• Convolutional layers: Temporal, Spatial and Volumetric (3d)
■ Criterions
• Criterions: abstract class
• MSECriterion
• ClassNLLCriterion
12. BINotable Packages nn
■ nn example of manual training (though you gonna not using this)
• Model definition
require "nn"
mlp = nn.Sequential(); -- make a multi-layer perceptron
inputs = 2; outputs = 1; HUs = 20; -- parameters
mlp:add(nn.Linear(inputs, HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs))
• Loss function
criterion = nn.MSECriterion()
• Training
for i = 1,2500 do
-- random sample
local input= torch.randn(2); -- normally distributed example in 2d
local output= torch.Tensor(1);
if input[1]*input[2] > 0 then -- calculate label for XOR function
output[1] = -1
else output[1] = 1 end
-- feed it to the neural network and the criterion
criterion:forward(mlp:forward(input), output) —- test using mlp:forward(input)
-- train over this example in 3 steps
-- (1) zero the accumulation of the gradients
mlp:zeroGradParameters()
-- (2) accumulate gradients
mlp:backward(input, criterion:backward(mlp.output, output))
-- (3) update parameters with a 0.01 learning rate
mlp:updateParameters(0.01)
end
13. BINotable Packages optim
■ optim
• Example - SGD
require “optim”
state = {
learningRate = 1e-3,
momentum = 0.5,
maxIter = 100
}
for i,sample in ipairs(training_samples) do
local func = function(x)
-- define eval function
return f,df_dx
end
optim.sgd(func,x,state)
end
• Algorithms
- adadelta, adagrad, asgd, cg, lbfgs, nag, …
14. BINotable Packages dp
■ Command-line Arguments
--[[command line arguments]]--
cmd = torch.CmdLine()
cmd:text()
cmd:text('Image Classification using MLP Training/Optimization')
cmd:text('Example:')
cmd:text('$> th neuralnetwork.lua --batchSize 128 --momentum 0.5')
cmd:text('Options:')
cmd:option('--learningRate', 0.1, 'learning rate at t=0')
cmd:option('--schedule', '{}', 'learning rate schedule')
cmd:option('--hiddenSize', '{200,200}', 'number of hidden units per layer')
⋮
cmd:option('--batchSize', 32, 'number of examples per batch')
cmd:option('--cuda', false, 'use CUDA')
cmd:option('--useDevice', 1, 'sets the device (GPU) to use')
cmd:option('--maxEpoch', 100, 'maximum number of epochs to run')
cmd:option('--dropout', false, 'apply dropout on hidden neurons')
cmd:option('--batchNorm', false, 'use batch normalization. dropout is mostly redundant with this')
cmd:option('--dataset', 'Mnist', 'which dataset to use : Mnist | NotMnist | Cifar10 | Cifar100')
cmd:option('--standardize', false, 'apply Standardize preprocessing')
cmd:option('--zca', false, 'apply Zero-Component Analysis whitening')
cmd:option('--progress', false, 'display progress bar')
cmd:option('--silent', false, 'dont print anything to stdout')
cmd:text()
opt = cmd:parse(arg or {})
opt.schedule = dp.returnString(opt.schedule)
opt.hiddenSize = dp.returnString(opt.hiddenSize)
if not opt.silent then
table.print(opt)
end
15. BINotable Packages dp
■ Preprocess
--[[preprocessing]]--
local input_preprocess = {}
if opt.standardize then
table.insert(input_preprocess, dp.Standardize())
end
if opt.zca then
table.insert(input_preprocess, dp.ZCA())
end
if opt.lecunlcn then
table.insert(input_preprocess, dp.GCN())
table.insert(input_preprocess, dp.LeCunLCN{progress=true})
end
■ DataSource
--[[data]]--
if opt.dataset == 'Mnist' then
ds = dp.Mnist{input_preprocess = input_preprocess}
elseif opt.dataset == 'NotMnist' then
ds = dp.NotMnist{input_preprocess = input_preprocess}
elseif opt.dataset == 'Cifar10' then
ds = dp.Cifar10{input_preprocess = input_preprocess}
elseif opt.dataset == 'Cifar100' then
ds = dp.Cifar100{input_preprocess = input_preprocess}
else
error("Unknown Dataset")
end
16. BINotable Packages dp
■ Model of Modules
--[[Model]]--
model = nn.Sequential()
model:add(nn.Convert(ds:ioShapes(), 'bf')) -- to batchSize x nFeature (also type converts)
-- hidden layers
inputSize = ds:featureSize()
for i,hiddenSize in ipairs(opt.hiddenSize) do
model:add(nn.Linear(inputSize, hiddenSize)) -- parameters
if opt.batchNorm then
model:add(nn.BatchNormalization(hiddenSize))
end
model:add(nn.Tanh())
if opt.dropout then
model:add(nn.Dropout())
end
inputSize = hiddenSize
end
-- output layer
model:add(nn.Linear(inputSize, #(ds:classes())))
model:add(nn.LogSoftMax())
17. BINotable Packages dp
■ Propagator
--[[Propagators]]--
if opt.lrDecay == 'adaptive' then
ad = dp.AdaptiveDecay{max_wait = opt.maxWait, decay_factor=opt.decayFactor}
elseif opt.lrDecay == 'linear' then
opt.decayFactor = (opt.minLR - opt.learningRate)/opt.saturateEpoch
end
train = dp.Optimizer{
acc_update = opt.accUpdate, loss = nn.ModuleCriterion(nn.ClassNLLCriterion(), nil, nn.Convert()),
epoch_callback = function(model, report) -- called every epoch
-- learning rate decay
if report.epoch > 0 then
if opt.lrDecay == 'adaptive' then
opt.learningRate = opt.learningRate*ad.decay
ad.decay = 1
elseif opt.lrDecay == 'schedule' and opt.schedule[report.epoch] then
opt.learningRate = opt.schedule[report.epoch]
elseif opt.lrDecay == 'linear' then
opt.learningRate = opt.learningRate + opt.decayFactor
end
opt.learningRate = math.max(opt.minLR, opt.learningRate)
if not opt.silent then
print("learningRate", opt.learningRate)
end
end
end,
callback = function(model, report) -- called every batch
if opt.accUpdate then
model:accUpdateGradParameters(model.dpnn_input, model.output, opt.learningRate)
else
model:updateGradParameters(opt.momentum) -- affects gradParams
model:updateParameters(opt.learningRate) -- affects params
end
model:maxParamNorm(opt.maxOutNorm) -- affects params
model:zeroGradParameters() -- affects gradParams
end,
feedback = dp.Confusion(), sampler = dp.ShuffleSampler{batch_size = opt.batchSize}, progress = opt.progress
}
valid = dp.Evaluator{
feedback = dp.Confusion(), sampler = dp.Sampler{batch_size = opt.batchSize}
}
test = dp.Evaluator{
feedback = dp.Confusion(), sampler = dp.Sampler{batch_size = opt.batchSize}
}
19. BINotable Packages dp
■ Running the Experiment
--[[GPU or CPU]]--
if opt.cuda then
require 'cutorch'
require 'cunn'
cutorch.setDevice(opt.useDevice)
xp:cuda()
end
--[[Experiment]]--
xp:run(ds)
■ Loading the saved Experiment
require 'dp'
require 'cuda' -- if you used cmd-line argument --cuda
xp = torch.load("/home/nicholas/save/xps:1432747515:1.dat")
model = xp:model()
print(torch.type(model))
nn.Serial
model = model.module
print(torch.type(model))
nn.Sequential
20. BINotable Packages rnn
■ RNN
-- recurrent layer
local rnn
if opt.lstm then
-- Long Short Term Memory
rnn = nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))
else
-- simple recurrent neural network
rnn = nn.Recurrent(
hiddenSize, -- first step will use nn.Add
nn.Identity(), -- for efficiency (see above input layer)
nn.Linear(hiddenSize, hiddenSize), -- feedback layer (recurrence)
nn.Sigmoid(), -- transfer function
99999 -- maximum number of time-steps per sequence
)
if opt.zeroFirst then
-- this is equivalent to forwarding a zero vector through the feedback layer
rnn.startModule:share(rnn.feedbackModule, 'bias')
end
rnn = nn.Sequencer(rnn)
end
22. BIQ1: What is the advantage over Theano?
■ Nicholas Leonard’ case
• He is from the LISA lab (Mecca of Theano) switched from Theano to Torch
for some reasons. He said:
• Easy to code components in C/Cuda.
• Theano is a C/CUDA compiler, which enables to perform automatic gradient
differentiation. Whereas Torch7 is not such a compiler, so you don’t need to
be symbolical, which means you don’t need to wait another 5 min for
compiling. The compiling also makes debugging harder.
• Pylearn2 adds fancy features to Theano, however, it isn’t easy. You have to
learn a new kind of programming always thinking symbolically, with the risk
of exeptions for it. The exeption may be your research.
(Though dp is Pylearn2-like alternative for Torch7.)
• So, I was tired, and wanted to get back to non-symbolic programming.
https://plus.google.com/+YannLeCunPhD/posts/iJV2tJgpF16
23. BIQ1: What is the advantage over Theano?
■ Yann LeCun’s recommandation
• Torch is used at Facebook AI Research and in other parts of Facebook.
• It's also used heavily at Deep Mind (now Google) and people in the Google
Brain group have started to use it too.
• Naturally, it's used at NYU and IDIAP where much of the original
development came from. But it's also used at INRIA in Paris, MSR in New
York, Intel, and a number of startups.
https://plus.google.com/+YannLeCunPhD/posts/iJV2tJgpF16
24. BIQ2: Is the # of dimmensions unlimited?
■ Torch7 supports an unlimited multi-dimmensional Tensor matrix.
• The number of dimensions is unlimited that can be created using
LongStorage with more dimensions.
--- creation of a 4D-tensor 4x5x6x2
z = torch.Tensor(4,5,6,2)
--- for more dimensions, (here a 6D tensor) one can do:
s = torch.LongStorage(6)
--- assigning lengths for each dimension, not values
s[1] = 4; s[2] = 5; s[3] = 6; s[4] = 2; s[5] = 7; s[6] = 3;
x = torch.Tensor(s)
> x:nDimension()
6
> x:size()
4
5
6
2
7
3
[torch.LongStorage of size 6]