Simple, Fast, and Scalable
Torch7 Tutorial
Jin-Hwa Kim
Biointelligence Lab.
Program in Cognitive Science, SNU
Twitter: @jnhwkim
Table of Contents
■ Installing Torch
■ Simple examples
■ Tensor System
■ Notable Packages
• nn
• optim
• dp
• rnn
■ References
Installing Torch
■ Installing Torch
$ curl -s
install-deps | bash
$ git clone ~/torch --recursive
$ cd ~/torch; ./
■ LuaRocks
• Lua package manager (like apt-get on Linux or Homebrew on OS X)
$ luarocks install image
$ luarocks list
■ TREPL
• Torch read-eval-print loop
$ th
$ th file.lua
$ th -i interactive.lua
Simple example
■ Quick Lua (Python + Matlab?)
• object.method(self, a, b) ≡ object:method(a, b)
• Indices start at 1; there is no ++ or += operator, so write i = i + 1
• if not <condition> and/or <condition2> then <statement1> elseif <condition3> then
<statement2> else <statement3> end
• for i=1,10,2 do <statement> end
• You can break, but there is no continue statement.
• Generic data structure: table (like JSON?)
{}, {1,2,3}, {a=1, b=2, c=3}, {{1,2},{3,4}}, #table
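The syntax points above can be tried in plain Lua, without Torch. The following is a minimal sketch; `Counter` and its fields are hypothetical names used only for illustration.

```lua
-- A tiny "class": Counter is a hypothetical example table.
local Counter = {}
Counter.__index = Counter

function Counter.new()
  return setmetatable({ count = 0 }, Counter)
end

function Counter:add(n)         -- same as Counter.add(self, n)
  self.count = self.count + n   -- Lua has no += operator
  return self.count
end

local c = Counter.new()
c:add(2)             -- colon call passes c as self
Counter.add(c, 3)    -- equivalent dot call with explicit self

-- Numeric for with a step of 2 visits 1, 3, 5, 7, 9.
-- Lua has no continue; wrap the body in an if to skip an iteration.
local odd_sum = 0
for i = 1, 10, 2 do
  if i ~= 5 then
    odd_sum = odd_sum + i
  end
end

-- Tables serve as both arrays (1-based) and dictionaries.
local arr = { 10, 20, 30 }
local dict = { a = 1, b = 2, c = 3 }
local nested = { {1, 2}, {3, 4} }

print(c.count, odd_sum, arr[1], #arr, dict.b, nested[2][1])
```

Note that `#arr` counts only the array part of a table, so `#dict` would be 0 here.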
■ Few Torch Functions
• rand() which creates tensor drawn from uniform distribution
• t() which transposes a tensor (note it returns a new view)
• dot() which performs a dot product between two tensors
• eye() which returns an identity matrix
• * operator over matrices (which performs a matrix-vector or matrix-matrix
multiplication)
Tensor System (1/6)
■ Fundamental data class, Tensor
• Handling numeric data
• Serializable (can be saved to a file if you want)
• Tensor interprets a chunk of memory as having dimensions.
■ Size
> x:nDimension()
6
> x:size() -- use x:size(dim) for a specific dimension
4
5
6
2
7
3
[torch.LongStorage of size 6]
■ Access
> x[3][4][5]
Tensor System (2/6)
■ Memory Contiguous
• Storage is row-major (C style), not column-major (Fortran style).

x = torch.Tensor(4,5)
i = 0
x:apply(function()
  i = i + 1
  return i
end)
> x
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
[torch.DoubleTensor of dimension 4x5]
> x:stride()
5
1 -- elements in the last dimension are contiguous!
[torch.LongStorage of size 2]
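The stride values above follow directly from the row-major layout. As a minimal plain-Lua sketch (the helper names `row_major_strides` and `storage_offset` are hypothetical, not part of Torch), the strides and the position of an element in flat storage can be computed like this:

```lua
-- Row-major strides for a tensor of the given sizes.
-- The last dimension gets stride 1; each earlier stride is the
-- product of all later sizes.
local function row_major_strides(sizes)
  local strides = {}
  local step = 1
  for d = #sizes, 1, -1 do
    strides[d] = step
    step = step * sizes[d]
  end
  return strides
end

-- 1-based position in flat storage of the element at a 1-based index.
local function storage_offset(index, strides)
  local offset = 1
  for d = 1, #index do
    offset = offset + (index[d] - 1) * strides[d]
  end
  return offset
end

local strides = row_major_strides({4, 5})
print(strides[1], strides[2])             -- strides of the 4x5 tensor
print(storage_offset({2, 3}, strides))    -- position of row 2, column 3
```

For the 4x5 tensor filled with 1..20, row 2, column 3 sits at position 8 in storage, which is also the value printed in that cell.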
Tensor System (3/6)
■ Tensor Types
ByteTensor -- contains unsigned chars
CharTensor -- contains signed chars
ShortTensor -- contains shorts
IntTensor -- contains ints
FloatTensor -- contains floats
DoubleTensor -- contains doubles
■ Most numeric operations are implemented only for FloatTensor and
DoubleTensor (e.g. torch.histc()).
> torch.histc(torch.IntTensor(5))
[string "_RESULT={torch.histc(a)}"]:1: torch.IntTensor does not implement the
torch.histc() function
stack traceback:
[C]: in function 'histc'
[string "_RESULT={torch.histc(a)}"]:1: in main chunk
[C]: in function 'xpcall'
/Users/Calvin/torch/install/share/lua/5.1/trepl/init.lua:630: in function 'repl'
...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x010e9422f0
torch.setdefaulttensortype('torch.FloatTensor')
a = torch.Tensor()
a:type()
a:size(dim)
Tensor System (4/6)
■ Querying elements
-- outputs below assume x is a 3x3 tensor holding 1..9
> x[2][3] -- returns row 2, column 3
6
> x[{2,3}] -- another way to return row 2, column 3
6
> x[torch.LongStorage{2,3}] -- yet another way to return row 2, column 3
6
> x[torch.le(x,3)] -- torch.le returns a ByteTensor that acts as a mask
1
2
3
[torch.DoubleTensor of dimension 3]
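Mask indexing can be pictured as two steps: build a 0/1 flag per element, then keep the flagged elements. The sketch below does this in plain Lua on a flat table holding 1..9. `le_mask` and `masked_select` are hypothetical helpers written for illustration, not Torch functions.

```lua
-- Element-wise "less than or equal", returning 0/1 flags in the
-- spirit of the ByteTensor mask that torch.le returns.
local function le_mask(values, threshold)
  local mask = {}
  for i, v in ipairs(values) do
    mask[i] = (v <= threshold) and 1 or 0
  end
  return mask
end

-- Keep only the elements whose mask flag is 1.
local function masked_select(values, mask)
  local out = {}
  for i, v in ipairs(values) do
    if mask[i] == 1 then
      out[#out + 1] = v
    end
  end
  return out
end

local flat = { 1, 2, 3, 4, 5, 6, 7, 8, 9 }
local picked = masked_select(flat, le_mask(flat, 3))
print(#picked, picked[1], picked[2], picked[3])
```

The result is always one-dimensional, which matches the 1D tensor of dimension 3 shown above.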
■ Extracting sub-tensors
[self] narrow(dim, index, size)
[Tensor] sub(dim1s, dim1e ... [, dim4s [, dim4e]])
[Tensor] select(dim, index)
or just use the [] operator …
Tensor System (5/6)
■ Indexing operator []
x = torch.Tensor(5, 6):zero()
> x
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ 1,3 }] = 1 -- sets element at (i=1,j=3) to 1
> x
0 0 1 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ 2,{2,4} }] = 2 -- sets a slice of 3 elements to 2
> x
0 0 1 0 0 0
0 2 2 2 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ {},4 }] = -1 -- sets the full 4th column to -1
> x
0 0 1 -1 0 0
0 2 2 -1 0 0
0 0 0 -1 0 0
0 0 0 -1 0 0
0 0 0 -1 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ {},2 }] = torch.range(1,5) -- copy a 1D tensor to a slice of x
> x
0 1 1 -1 0 0
0 2 2 -1 0 0
0 3 0 -1 0 0
0 4 0 -1 0 0
0 5 0 -1 0 0
[torch.DoubleTensor of dimension 5x6]
x[torch.lt(x,0)] = -2 -- sets all negative elements to -2 via a ByteTensor mask
> x
0 1 1 -2 0 0
0 2 2 -2 0 0
0 3 0 -2 0 0
0 4 0 -2 0 0
0 5 0 -2 0 0
[torch.DoubleTensor of dimension 5x6]
Tensor System (6/6)
■ And so on …
• See how many of these functions match ones you’ve used in Matlab!
[Tensor] gather(dim, index)
[LongTensor] nonzero(tensor)
[result] expand([result,] sizes)
[Tensor] repeatTensor([result,] sizes)
[Tensor] squeeze([dim])
[Tensor] transpose(dim1, dim2)
[Tensor] permute(dim1, dim2, ..., dimn)
[Tensor] unfold(dim, size, step)
Notable Packages nn
■ Stands for neural network
• It provides a modular way to build a complex model.
■ Modules
• Module: abstract class
• Containers: e.g. Sequential, Parallel and Concat
• Transfer functions: e.g. Tanh, ReLU and Sigmoid
• Simple layers: e.g. Linear, Mean, Max and Reshape
• Convolutional layers: Temporal, Spatial and Volumetric (3d)
■ Criterions
• Criterion: abstract class
• MSECriterion
• ClassNLLCriterion
Notable Packages nn
■ nn example of manual training (though you will rarely do this by hand)
• Model definition
require "nn"
mlp = nn.Sequential(); -- make a multi-layer perceptron
inputs = 2; outputs = 1; HUs = 20; -- parameters
mlp:add(nn.Linear(inputs, HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs))
• Loss function
criterion = nn.MSECriterion()
• Training
for i = 1,2500 do
  -- random sample
  local input = torch.randn(2);    -- normally distributed example in 2d
  local output = torch.Tensor(1);
  if input[1]*input[2] > 0 then    -- calculate label for XOR function
    output[1] = -1
  else
    output[1] = 1
  end
  -- feed it to the neural network and the criterion
  criterion:forward(mlp:forward(input), output) -- test using mlp:forward(input)
  -- train over this example in 3 steps
  -- (1) zero the accumulation of the gradients
  mlp:zeroGradParameters()
  -- (2) accumulate gradients
  mlp:backward(input, criterion:backward(mlp.output, output))
  -- (3) update parameters with a 0.01 learning rate
  mlp:updateParameters(0.01)
end
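To see what the three training steps do numerically, here is a minimal plain-Lua sketch with a single weight and a squared-error loss. The model y = w * x, the target y = 2x, and the input sequence are all assumptions made for illustration; no nn module is involved.

```lua
-- Hypothetical one-parameter model y = w * x, trained toward y = 2x.
local w, grad_w = 0.0, 0.0
local learning_rate = 0.01

for step = 1, 100 do
  local x = (step % 5) + 1      -- inputs cycle through 1..5
  local target = 2 * x
  local y = w * x               -- forward pass

  -- (1) zero the accumulated gradient
  grad_w = 0.0
  -- (2) accumulate d(loss)/dw for loss = (y - target)^2
  grad_w = grad_w + 2 * (y - target) * x
  -- (3) update the parameter with a 0.01 learning rate
  w = w - learning_rate * grad_w
end

print(w)   -- approaches 2
```

Skipping step (1) would make gradients from earlier samples pile up in `grad_w`, which is why the zeroing call comes first in each iteration.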
Notable Packages optim
■ optim
• Example - SGD
require 'optim'
state = {
  learningRate = 1e-3,
  momentum = 0.5,
  maxIter = 100
}
for i,sample in ipairs(training_samples) do
  local func = function(x)
    -- define eval function
    return f,df_dx
  end
  optim.sgd(func,x,state)
end
• Algorithms
- adadelta, adagrad, asgd, cg, lbfgs, nag, …
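The closure convention in the SGD example (a function of the parameters that returns the loss and its gradient) can be illustrated in plain Lua. This is a sketch only: `sgd_step` is a hypothetical stand-in that performs a plain gradient step without momentum, not optim.sgd itself, and the quadratic objective is made up.

```lua
-- One plain gradient-descent step. func(x) returns the loss f and
-- its gradient df_dx, as in the slide. Updates x in place.
local function sgd_step(func, x, state)
  local f, df_dx = func(x)
  for i = 1, #x do
    x[i] = x[i] - state.learningRate * df_dx[i]
  end
  return x, f
end

-- Toy objective: f(x) = (x1 - 3)^2 + (x2 + 1)^2, minimum at (3, -1).
local function func(x)
  local f = (x[1] - 3)^2 + (x[2] + 1)^2
  local df_dx = { 2 * (x[1] - 3), 2 * (x[2] + 1) }
  return f, df_dx
end

local x = { 0, 0 }
local state = { learningRate = 0.1 }
local loss
for iter = 1, 200 do
  x, loss = sgd_step(func, x, state)
end
print(x[1], x[2], loss)
```

Because the optimizer only sees `func`, swapping in another algorithm from the list above should, in principle, require changing only the call that consumes the closure.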
Notable Packages dp
■ Command-line Arguments
--[[command line arguments]]--
cmd = torch.CmdLine()
cmd:text()
cmd:text('Image Classification using MLP Training/Optimization')
cmd:text('Example:')
cmd:text('$> th neuralnetwork.lua --batchSize 128 --momentum 0.5')
cmd:text('Options:')
cmd:option('--learningRate', 0.1, 'learning rate at t=0')
cmd:option('--schedule', '{}', 'learning rate schedule')
cmd:option('--hiddenSize', '{200,200}', 'number of hidden units per layer')
-- ⋮ (further options omitted on the slide)
cmd:option('--batchSize', 32, 'number of examples per batch')
cmd:option('--cuda', false, 'use CUDA')
cmd:option('--useDevice', 1, 'sets the device (GPU) to use')
cmd:option('--maxEpoch', 100, 'maximum number of epochs to run')
cmd:option('--dropout', false, 'apply dropout on hidden neurons')
cmd:option('--batchNorm', false, 'use batch normalization. dropout is mostly redundant with this')
cmd:option('--dataset', 'Mnist', 'which dataset to use : Mnist | NotMnist | Cifar10 | Cifar100')
cmd:option('--standardize', false, 'apply Standardize preprocessing')
cmd:option('--zca', false, 'apply Zero-Component Analysis whitening')
cmd:option('--progress', false, 'display progress bar')
cmd:option('--silent', false, 'dont print anything to stdout')
cmd:text()
opt = cmd:parse(arg or {})
opt.schedule = dp.returnString(opt.schedule)
opt.hiddenSize = dp.returnString(opt.hiddenSize)
if not opt.silent then
  table.print(opt)
end
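Options such as --hiddenSize arrive as strings like '{200,200}' and need to become Lua tables before use. The sketch below shows one plausible way to do that in plain Lua. `parse_table_string` is a hypothetical helper, and the actual behavior of dp.returnString may differ.

```lua
-- Lua 5.1 / LuaJIT provide loadstring; Lua 5.2+ folds it into load.
local load_chunk = loadstring or load

-- Evaluate a string such as '{200,200}' as a Lua table literal.
-- Hypothetical helper for illustration only.
local function parse_table_string(str)
  local chunk, err = load_chunk('return ' .. str)
  if not chunk then
    error('could not parse option: ' .. tostring(err))
  end
  return chunk()
end

local hidden = parse_table_string('{200,200}')
local schedule = parse_table_string('{}')
print(#hidden, hidden[1], hidden[2], #schedule)
```

This approach executes whatever code the string contains, so it only makes sense for trusted command-line input.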
Notable Packages dp
■ Preprocess
--[[preprocessing]]--
local input_preprocess = {}
if opt.standardize then
  table.insert(input_preprocess, dp.Standardize())
end
if opt.zca then
  table.insert(input_preprocess, dp.ZCA())
end
if opt.lecunlcn then
  table.insert(input_preprocess, dp.GCN())
  table.insert(input_preprocess, dp.LeCunLCN{progress=true})
end
■ DataSource
--[[data]]--
if opt.dataset == 'Mnist' then
  ds = dp.Mnist{input_preprocess = input_preprocess}
elseif opt.dataset == 'NotMnist' then
  ds = dp.NotMnist{input_preprocess = input_preprocess}
elseif opt.dataset == 'Cifar10' then
  ds = dp.Cifar10{input_preprocess = input_preprocess}
elseif opt.dataset == 'Cifar100' then
  ds = dp.Cifar100{input_preprocess = input_preprocess}
else
  error("Unknown Dataset")
end
Notable Packages dp
■ Model of Modules
--[[Model]]--
model = nn.Sequential()
model:add(nn.Convert(ds:ioShapes(), 'bf')) -- to batchSize x nFeature (also type converts)
-- hidden layers
inputSize = ds:featureSize()
for i,hiddenSize in ipairs(opt.hiddenSize) do
  model:add(nn.Linear(inputSize, hiddenSize)) -- parameters
  if opt.batchNorm then
    model:add(nn.BatchNormalization(hiddenSize))
  end
  model:add(nn.Tanh())
  if opt.dropout then
    model:add(nn.Dropout())
  end
  inputSize = hiddenSize
end
-- output layer
model:add(nn.Linear(inputSize, #(ds:classes())))
model:add(nn.LogSoftMax())
Notable Packages dp
■ Propagator
--[[Propagators]]--
if opt.lrDecay == 'adaptive' then
  ad = dp.AdaptiveDecay{max_wait = opt.maxWait, decay_factor=opt.decayFactor}
elseif opt.lrDecay == 'linear' then
  opt.decayFactor = (opt.minLR - opt.learningRate)/opt.saturateEpoch
end
train = dp.Optimizer{
  acc_update = opt.accUpdate,
  loss = nn.ModuleCriterion(nn.ClassNLLCriterion(), nil, nn.Convert()),
  epoch_callback = function(model, report) -- called every epoch
    -- learning rate decay
    if report.epoch > 0 then
      if opt.lrDecay == 'adaptive' then
        opt.learningRate = opt.learningRate*ad.decay
        ad.decay = 1
      elseif opt.lrDecay == 'schedule' and opt.schedule[report.epoch] then
        opt.learningRate = opt.schedule[report.epoch]
      elseif opt.lrDecay == 'linear' then
        opt.learningRate = opt.learningRate + opt.decayFactor
      end
      opt.learningRate = math.max(opt.minLR, opt.learningRate)
      if not opt.silent then
        print("learningRate", opt.learningRate)
      end
    end
  end,
  callback = function(model, report) -- called every batch
    if opt.accUpdate then
      model:accUpdateGradParameters(model.dpnn_input, model.output, opt.learningRate)
    else
      model:updateGradParameters(opt.momentum) -- affects gradParams
      model:updateParameters(opt.learningRate) -- affects params
    end
    model:maxParamNorm(opt.maxOutNorm) -- affects params
    model:zeroGradParameters() -- affects gradParams
  end,
  feedback = dp.Confusion(),
  sampler = dp.ShuffleSampler{batch_size = opt.batchSize},
  progress = opt.progress
}
valid = dp.Evaluator{
  feedback = dp.Confusion(),
  sampler = dp.Sampler{batch_size = opt.batchSize}
}
test = dp.Evaluator{
  feedback = dp.Confusion(),
  sampler = dp.Sampler{batch_size = opt.batchSize}
}
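The 'linear' decay branch of the epoch callback is easy to trace by hand. The plain-Lua sketch below replays the same two formulas over a few epochs; the option values (0.1, 0.01, 9) are illustrative assumptions, not dp defaults.

```lua
-- Illustrative option values (assumptions, not dp defaults).
local opt = { learningRate = 0.1, minLR = 0.01, saturateEpoch = 9 }

-- Per-epoch increment (negative): the rate reaches minLR after
-- saturateEpoch epochs.
local decay_factor = (opt.minLR - opt.learningRate) / opt.saturateEpoch

local history = {}
for epoch = 1, 12 do
  opt.learningRate = opt.learningRate + decay_factor
  -- never drop below the floor
  opt.learningRate = math.max(opt.minLR, opt.learningRate)
  history[epoch] = opt.learningRate
end

for epoch, lr in ipairs(history) do
  print(epoch, lr)
end
```

The rate falls by 0.01 per epoch until it hits the floor, after which math.max keeps it pinned at minLR.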
Notable Packages dp
■ Experiment
--[[Experiment]]--
xp = dp.Experiment{
  model = model,
  optimizer = train,
  validator = valid,
  tester = test,
  observer = {
    dp.FileLogger(),
    dp.EarlyStopper{
      error_report = {'validator','feedback','confusion','accuracy'},
      maximize = true,
      max_epochs = opt.maxTries
    }
  },
  random_seed = os.time(),
  max_epoch = opt.maxEpoch
}
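The EarlyStopper options suggest the following logic: track the best validation accuracy (maximize = true) and stop once a set number of epochs pass without improvement. The plain-Lua sketch below implements that reading. `early_stop_epoch` is a hypothetical helper, the accuracy values are made up, and the real dp.EarlyStopper may behave differently.

```lua
-- Returns the epoch at which training would stop, plus the best
-- epoch and its accuracy. Stops after max_tries epochs in a row
-- without a new best. Hypothetical helper for illustration.
local function early_stop_epoch(accuracies, max_tries)
  local best, best_epoch = -math.huge, 0
  for epoch, acc in ipairs(accuracies) do
    if acc > best then                       -- higher is better
      best, best_epoch = acc, epoch
    elseif epoch - best_epoch >= max_tries then
      return epoch, best_epoch, best
    end
  end
  return #accuracies, best_epoch, best
end

-- Made-up validation accuracies, one per epoch.
local accs = { 0.80, 0.85, 0.84, 0.83, 0.86, 0.85, 0.84, 0.83 }
local stop, best_epoch, best = early_stop_epoch(accs, 3)
print(stop, best_epoch, best)
```

In this trace the improvement at epoch 5 resets the counter, so training continues past the first dip and stops three epochs later.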
Notable Packages dp
■ Running the Experiment
--[[GPU or CPU]]--
if opt.cuda then
  require 'cutorch'
  require 'cunn'
  cutorch.setDevice(opt.useDevice)
  xp:cuda()
end
--[[Experiment]]--
xp:run(ds)
■ Loading the saved Experiment
require 'dp'
require 'cuda' -- if you used cmd-line argument --cuda
xp = torch.load("/home/nicholas/save/xps:1432747515:1.dat")
model = xp:model()
print(torch.type(model)) -- nn.Serial
model = model.module
print(torch.type(model)) -- nn.Sequential
Notable Packages rnn
■ RNN
-- recurrent layer
local rnn
if opt.lstm then
  -- Long Short Term Memory
  rnn = nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))
else
  -- simple recurrent neural network
  rnn = nn.Recurrent(
    hiddenSize, -- first step will use nn.Add
    nn.Identity(), -- for efficiency (see above input layer)
    nn.Linear(hiddenSize, hiddenSize), -- feedback layer (recurrence)
    nn.Sigmoid(), -- transfer function
    99999 -- maximum number of time-steps per sequence
  )
  if opt.zeroFirst then
    -- this is equivalent to forwarding a zero vector through the feedback layer
    rnn.startModule:share(rnn.feedbackModule, 'bias')
  end
  rnn = nn.Sequencer(rnn)
end
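The simple recurrent layer can be pictured with scalars: the first step has no previous state and only adds a bias, while later steps add the fed-back hidden state before the transfer function. The plain-Lua sketch below follows that reading of the comments in the snippet. The weights are made-up values, and the real nn.Recurrent works on tensors and may differ in detail.

```lua
local function sigmoid(z)
  return 1 / (1 + math.exp(-z))
end

-- Scalar stand-ins with made-up values.
local start_bias = 0.0                    -- role of the start module (nn.Add)
local w_feedback, b_feedback = 0.5, 0.0   -- role of the nn.Linear feedback layer

-- Run a sequence of scalar inputs and return every hidden state.
local function run_sequence(inputs)
  local hidden = {}
  for t, x in ipairs(inputs) do
    local pre
    if t == 1 then
      pre = x + start_bias                -- no previous state yet
    else
      pre = x + w_feedback * hidden[t - 1] + b_feedback
    end
    hidden[t] = sigmoid(pre)              -- transfer function
  end
  return hidden
end

local h = run_sequence({ 0, 0, 0 })
print(h[1], h[2], h[3])
```

With `start_bias` equal to `b_feedback`, the first step matches feeding a zero hidden state through the feedback layer, which is the idea behind the zeroFirst option.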
Responsive merging pull requests…
Q1: What is the advantage over Theano?
■ Nicholas Leonard’s case
• He comes from the LISA lab (the Mecca of Theano) and switched from Theano to
Torch for several reasons. He said:
• It is easy to code components in C/CUDA.
• Theano is a C/CUDA compiler, which lets it perform automatic
differentiation. Torch7 is not such a compiler, so you don’t need to
program symbolically, and you don’t need to wait another 5 minutes for
compilation. Compilation also makes debugging harder.
• Pylearn2 adds fancy features to Theano, but it isn’t easy. You have to
learn a new kind of programming and always think symbolically, at the risk
of running into exceptions. The exception may be your research.
(Though dp is a Pylearn2-like alternative for Torch7.)
• So, I was tired, and wanted to get back to non-symbolic programming.
Q1: What is the advantage over Theano?
■ Yann LeCun’s recommendation
• Torch is used at Facebook AI Research and in other parts of Facebook.
• It's also used heavily at DeepMind (now Google) and people in the Google
Brain group have started to use it too.
• Naturally, it's used at NYU and IDIAP where much of the original
development came from. But it's also used at INRIA in Paris, MSR in New
York, Intel, and a number of startups.
Q2: Is the # of dimensions unlimited?
■ Torch7 supports Tensors with an unlimited number of dimensions.
• Tensors with more dimensions can be created by passing a LongStorage
that holds the size of each dimension.
--- creation of a 4D-tensor 4x5x6x2
z = torch.Tensor(4,5,6,2)
--- for more dimensions, (here a 6D tensor) one can do:
s = torch.LongStorage(6)
--- assigning lengths for each dimension, not values
s[1] = 4; s[2] = 5; s[3] = 6; s[4] = 2; s[5] = 7; s[6] = 3;
x = torch.Tensor(s)
> x:nDimension()
6
> x:size()
4
5
6
2
7
3
[torch.LongStorage of size 6]

  References