Getting your hands dirty with deep learning in Java
Dave Snowdon
@davesnowdon
github.com/davesnowdon
https://www.linkedin.com/in/davesnowdon/
About me
• Java & JavaScript by day
• Python & Clojure by night
• Amateur social roboticist
• Been learning about deep learning for 18 months
Objective
• Understand the basics of what neural networks are and how they work
• Understand the most common types of network you are likely to encounter and
what network architecture is likely to be suitable for a given application.
Agenda
• Setup
• Why neural networks
• How do neural networks work
• Exercise 1 building a neural network from “scratch”
• Exercise 2 building the same network with DL4J
• Convolutional neural networks
• Exercise 3 training a network to classify images
• Exercise 4 Using your own dataset (optional)
• Recurrent neural networks
• Exercise 5 sentiment analysis
• Exercise 6 text generation (optional)
Setup
• mkdir /my/data/directory
• export DEVOXXUK_GHDDL_DATA=/my/data/directory
• git clone https://github.com/davesnowdon/devoxxuk2018-dl-workshop.git
• cd devoxxuk2018-dl-workshop
• ./gradlew :ex0-setup:ex0run
Why neural networks?
Why care about deep learning?
• Impressive results in a wide range of domains
• image classification, text descriptions of images, language translation, speech
generation, speech recognition…
• Predictable execution (inference) time
• Amenable to hardware acceleration
• Automatic feature extraction
What are features?
10 PRINT "Hello Java developers"
20 GOTO 10
Average statement length
Number of statements
Number of variables
Cyclomatic complexity
Feature engineering
Traditional machine learning process: Data -> Pre-process -> Feature Engineering -> Model -> Results
Deep learning process: Data -> Pre-process -> Model (feature extraction learned inside the model) -> Results
Neural network downsides
• Need to define the model and its training parameters
• Large models can take days or weeks to train
• May need a lot of data (often more than 10K examples)
How neural networks work
NOT YOUR NEURAL NETWORK
Deep learning != your brain
Neuron model
(diagram: inputs x0…xN, each multiplied by a weight w0…wN, plus a bias b (a weight for a fixed input), are summed; the sum is passed through an activation function F( ) to produce the output)
Neuron model (worked example)
Inputs (0.5, 1, -0.5), weights (0.1, -0.5, 4), bias 0.8
Sum = 0.5 x 0.1 + 1 x (-0.5) + (-0.5) x 4 + 0.8 = -1.65
F( ) = identity -> output -1.65
F( ) = sigmoid -> output 0.1611
F( ) = tanh -> output -0.9289
F( ) = ReLU -> output 0
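As a minimal plain-Java sketch (illustrative, not the workshop code), the same neuron and worked example:

static double neuron(double[] x, double[] w, double b,
                     java.util.function.DoubleUnaryOperator f) {
    double sum = b;
    for (int i = 0; i < x.length; i++) {
        sum += x[i] * w[i]; // weighted sum of inputs
    }
    return f.applyAsDouble(sum); // apply activation F( )
}

double[] x = {0.5, 1, -0.5};
double[] w = {0.1, -0.5, 4};
double b = 0.8;
neuron(x, w, b, z -> z);                          // identity: -1.65
neuron(x, w, b, z -> 1.0 / (1.0 + Math.exp(-z))); // sigmoid:  0.1611
neuron(x, w, b, Math::tanh);                      // tanh:    -0.9289
neuron(x, w, b, z -> Math.max(0, z));             // ReLU:     0.0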
Neural networks are not graphs
Neural networks are like onions
(they have layers and can make you cry)
Input layer -> Hidden layer -> Output layer
Why layers?
(diagram: Layer 1 and Layer 2 acting on a set of 'x' data points)
Fully Connected / Dense Network
Input layer -> Hidden layer (weights W1) -> Output layer (weights W2)
output = f(W2 . f(W1 . Input + B1) + B2)
Going deeper
Input layer -> Hidden layer -> Hidden layer -> Output layer
What do the layers do?
Successive layers model higher level features
What input can a network accept?
• Anything you like as long as it’s a tensor
• Tensor = general multi-dimensional numeric quantity
• scalar = tensor of 0 dimensions (AKA rank 0)
• vector = 1 dimensional tensor (rank 1)
• matrix = 2 dimensional tensor (rank 2)
• tensor = N dimensional tensor (rank > 2)
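In ND4J, the linear algebra library used in the exercises, tensors of each rank can be created roughly like this (a sketch; the shapes are arbitrary examples):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray scalar = Nd4j.scalar(1.0);                   // rank 0
INDArray vector = Nd4j.create(new double[]{1, 2, 3}); // rank 1, shape [3]
INDArray matrix = Nd4j.zeros(2, 3);                   // rank 2, shape [2, 3]
INDArray image  = Nd4j.zeros(3, 22, 75);              // rank 3, e.g. channels x height x width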
Images
Can represent image as tensor of rank 3
Source: https://www.slideshare.net/BertonEarnshaw/a-brief-survey-of-tensors
One-hot encoding : input
FAVOURITE PROGRAMMING LANGUAGE
JAVA CLOJURE PYTHON JAVASCRIPT
BARRY 1 0 0 0
BRUCE 0 1 0 0
RUSSEL 0 0 1 0
Categorical variables (“enums”)
One-hot encoding: output
JAVA CLOJURE PYTHON JAVASCRIPT
BARRY 0.6 0.1 0.1 0.2
BRUCE 0.15 0.75 0.05 0.05
RUSSEL 0.34 0.05 0.6 0.01
One-hot encoding is also useful for output: the network produces a probability distribution over the classes
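A plain-Java sketch of the encoding (illustrative, not workshop code):

// one 1.0 at the category's index, zeros everywhere else
static double[] oneHot(int index, int numClasses) {
    double[] v = new double[numClasses];
    v[index] = 1.0;
    return v;
}

// e.g. with categories {JAVA, CLOJURE, PYTHON, JAVASCRIPT}:
double[] bruce = oneHot(1, 4); // {0, 1, 0, 0}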
Back propagation
A training example pairs an input example with its expected output. The input is fed through the network, and a cost function compares the network's output with the expected output to give the error (also known as cost or loss).
Gradient descent
Update delta = -gradient * learning rate
(plot: cost as a function of a model parameter (weight); each update steps downhill along the gradient)
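As code, a single parameter update is just the slide's formula (a sketch; gradient is this weight's error gradient from back propagation):

double updateDelta = -gradient * learningRate; // step downhill on the cost curve
weight += updateDelta;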
The Chain Rule
(diagram: inputs x0, x1, x2 and weights w0, w1, w2 feeding into a neuron's output)
We want the error gradient with respect to each weight.
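In symbols, with z = w0·x0 + w1·x1 + w2·x2 + b and output = F(z), the chain rule gives, for each weight:
∂Error/∂wᵢ = ∂Error/∂output x F′(z) x xᵢ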
Computation graph
(diagram: X multiplied by W1 plus b1 gives Z1; ReLU(Z1) = A1; A1 multiplied by W2 plus b2 gives Z2; softmax (S/M) gives A2, which feeds the loss. The backward pass computes the gradients dZ2, dW2, dB2, dZ1, dW1, dB1 through the same graph.)
More on back propagation
Exercise 1 Build a neural network
Use the ND4J linear algebra library to implement a neural network and train it with back propagation to distinguish
between two classes.
Project: ex1-no-framework
Files to modify: NeuralNetClassifier.java
EX1 data set
Frameworks
Exercise 2 Use DL4J to build a neural
network
We solve the same problem as in exercise 1 but with the DL4J framework so we can get a feel for the benefits of using
a deep learning framework.
Project: ex2-dl4j
Files to modify: MLPClassifierMoon.java
Basic config
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.updater(Updater.ADAM)
.iterations(iterations)
// normalize to prevent vanishing or exploding gradients
.gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.l1(1e-4)
.regularization(true)
.l2(5 * 1e-4)
.list()
Fully connected layer
.layer(10,
new DenseLayer.Builder()
.name("ffn1")
.nOut(512)
.learningRate(1e-3)
.biasInit(1e-3).biasLearningRate(1e-3 * 2)
.build())
Output layer
.layer(14,
new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
.name("output")
.nOut(numLabels)
.activation(Activation.SOFTMAX)
.build())
Build the model
.backprop(true)
.pretrain(false)
.setInputType(InputType.convolutional(height, width, channels))
.build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
Train it
for (int i = 0; i < epochs; i++) {
model.fit(dataIter);
}
Summary so far
• Neural networks are NOT like your brain
• Networks are arranged as layers
• Forward pass computes the output of the network
• Backward pass computes gradients & adjusts weights
• Frameworks take care of the math for you
• but still good to understand what’s going on
Convolutional Neural Networks
Images as input
• Even relatively small images have large numbers of pixels
• For example 75x22x3 -> 4950 inputs
• 1:1 mapping from input to a same-size hidden layer: 4950x4950 weights + 4950 biases -> 24,507,450 parameters (checked in the sketch below)
• Easily 4,950,000 weights in the first layer alone even if we only have 1000 neurons
• Maybe we need another neural network architecture
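A quick check of the arithmetic above (plain Java, illustrative):

int inputs = 75 * 22 * 3;                     // 4950 input values
long dense = (long) inputs * inputs + inputs; // same-size hidden layer: 24,507,450 parameters
long small = (long) inputs * 1000;            // only 1000 hidden neurons: 4,950,000 weights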
Convolution
Convolution example(s)
Horizontal edge detector:
1 1 1
0 0 0
-1 -1 -1
Edge detector (all directions):
-1 -1 -1
-1 8 -1
-1 -1 -1
Vertical edge detector:
1 0 -1
1 0 -1
1 0 -1
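A naive plain-Java sketch of sliding a 3x3 kernel over a grayscale image (strictly cross-correlation, which is what convolutional layers compute in practice; illustrative only):

// "valid" convolution: output shrinks by 2 in each dimension for a 3x3 kernel
static double[][] convolve(double[][] img, double[][] kernel) {
    int h = img.length - 2;
    int w = img[0].length - 2;
    double[][] out = new double[h][w];
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            double sum = 0;
            for (int ky = 0; ky < 3; ky++) {
                for (int kx = 0; kx < 3; kx++) {
                    sum += img[y + ky][x + kx] * kernel[ky][kx];
                }
            }
            out[y][x] = sum;
        }
    }
    return out;
}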
Convolutional layer
Max Pooling layer
Convolutional network
Exercise 3 Classifying CIFAR10 images
Training a convolutional network to classify each of the 10 image categories found in the CIFAR10 dataset.
Project: ex3-cnn-cifar10
Files to modify: Cifar.java
Classifying CIFAR10
Source: https://www.cs.toronto.edu/~kriz/cifar.html
DL4J model structure
Input
Convolution
Pooling
Convolution
Pooling
Fully connected
Softmax
Define the model
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.cacheMode(CacheMode.DEVICE)
.updater(Updater.ADAM)
.iterations(iterations)
.gradientNormalization(GradientNormalization.RenormalizeL2PerLayer) // normalize to prevent vanishing or exploding gradients
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.l1(1e-4)
.regularization(true)
.l2(5 * 1e-4)
.list()
.layer(0, new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0}).name("cnn1").convolutionMode(ConvolutionMode.Same)
.nIn(3).nOut(64).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)//.learningRateDecayPolicy(LearningRatePolicy.Step)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(1, new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0}).name("cnn2").convolutionMode(ConvolutionMode.Same)
.nOut(64).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(2, new SubsamplingLayer.Builder(PoolingType.MAX, new int[]{2, 2}).name("maxpool2").build())
.layer(3, new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0}).name("cnn3").convolutionMode(ConvolutionMode.Same)
.nOut(96).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(4, new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0}).name("cnn4").convolutionMode(ConvolutionMode.Same)
.nOut(96).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(5, new ConvolutionLayer.Builder(new int[]{3, 3}, new int[]{1, 1}, new int[]{0, 0}).name("cnn5").convolutionMode(ConvolutionMode.Same)
.nOut(128).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(6, new ConvolutionLayer.Builder(new int[]{3, 3}, new int[]{1, 1}, new int[]{0, 0}).name("cnn6").convolutionMode(ConvolutionMode.Same)
.nOut(128).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(7, new ConvolutionLayer.Builder(new int[]{2, 2}, new int[]{1, 1}, new int[]{0, 0}).name("cnn7").convolutionMode(ConvolutionMode.Same)
.nOut(256).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(8, new ConvolutionLayer.Builder(new int[]{2, 2}, new int[]{1, 1}, new int[]{0, 0}).name("cnn8").convolutionMode(ConvolutionMode.Same)
.nOut(256).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(9, new SubsamplingLayer.Builder(PoolingType.MAX, new int[]{2, 2}).name("maxpool8").build())
.layer(10, new DenseLayer.Builder().name("ffn1").nOut(1024).learningRate(1e-3).biasInit(1e-3).biasLearningRate(1e-3 * 2).build())
.layer(11, new DropoutLayer.Builder().name("dropout1").dropOut(0.2).build())
.layer(12, new DenseLayer.Builder().name("ffn2").nOut(1024).learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(13, new DropoutLayer.Builder().name("dropout2").dropOut(0.2).build())
.layer(14, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
.name("output")
.nOut(numLabels)
.activation(Activation.SOFTMAX)
.build())
.backprop(true)
.pretrain(false)
.setInputType(InputType.convolutional(height, width, channels))
.build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
Convolutional layer
.layer(0,
new ConvolutionLayer.Builder(
new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0})
.name("cnn1")
.convolutionMode(ConvolutionMode.Same)
.nIn(3).nOut(64)
.weightInit(WeightInit.XAVIER_UNIFORM)
.activation(Activation.RELU)
.learningRate(1e-2)
.biasInit(1e-2).biasLearningRate(1e-2 * 2)
.build())
Pooling layer
.layer(2,
new SubsamplingLayer.Builder(PoolingType.MAX,
new int[]{2, 2})
.name("maxpool2")
.build())
Fully connected layer
.layer(10,
new DenseLayer.Builder()
.name("ffn1")
.nOut(512)
.learningRate(1e-3)
.biasInit(1e-3).biasLearningRate(1e-3 * 2)
.build())
.layer(11,
new DropoutLayer.Builder()
.name("dropout1")
.dropOut(0.2) // probability of keeping the output of any given neuron
.build())
Data set 1
PathLabelGenerator labelMaker = new ParentPathLabelGenerator();
FileSplit fileSplit = new FileSplit(PATH,
NativeImageLoader.ALLOWED_FORMATS,
rng);
RandomPathFilter pathFilter = new RandomPathFilter(rng);
Data set 2
InputSplit[] inputSplit = fileSplit.sample(pathFilter, 0.8, 0.2);
InputSplit trainData = inputSplit[0];
InputSplit testData = inputSplit[1];
ImageRecordReader recordReader = new ImageRecordReader(22, 75, 3,
labelMaker);
dataIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1,
numLabels);
Train it
for (int i = 0; i < epochs; i++) {
model.fit(dataIter);
}
Use it
// Use NativeImageLoader to convert to tensor
NativeImageLoader loader = new NativeImageLoader(height, width, 3);
// Get the image into an INDArray
INDArray image = loader.asMatrix(file);
// gives a probability distribution
INDArray output = model.output(image);
// predict the output class
int[] predict = model.predict(image);
Exercise 4 Image classification with
your images
Experiment with training a network using your own images. The code is basically the same as for exercise 3 but with
dataset iterators that read images from a specified location in the file system.
Project: ex4-cnn-image-classification
Files to modify: ImageClassifier.java
First we need a dataset
Highlight the parts for training
Creating the dataset
• Grab images from Google image search
• PyImageSearch “How to create a deep learning dataset using Google Images”
• Use the dlib imglab tool to draw bounding boxes around logos / not-logos
• https://github.com/davisking/dlib/tree/master/tools/imglab
• Wrote python script to read imglab XML and produce cropped images using
OpenCV
Summary so far
Convolutional networks
• Used mostly for image processing
• Convolution layer applies learnt filter to inputs
• Pooling layer reduces size of inputs
• Fewer weights (parameters) to train compared to fully connected networks
Recurrent neural networks
Variable length input
VMware workstation rocks!
I’ve used vSphere for a number of years …
Feed forward networks don’t remember state
(diagram: “VMware”, “workstation”, “rocks!” fed into a feed-forward network one word at a time)
Recurrent network
(diagram: the same words fed into a recurrent network, whose hidden state carries over between words)
Vanishing/exploding gradients
(plot: over 20 time steps, a gradient repeatedly multiplied by 1.1 grows exponentially while one repeatedly multiplied by 0.9 shrinks towards zero)
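The same effect in a few lines of Java (a toy demonstration):

double exploding = 1.0;
double vanishing = 1.0;
for (int step = 1; step <= 20; step++) {
    exploding *= 1.1; // reaches ~6.7 after 20 steps
    vanishing *= 0.9; // shrinks to ~0.12 after 20 steps
}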
Long Short-Term Memory (LSTM) cell
Stanford large movie review dataset
Homelessness (or Houselessness as George Carlin stated) has been an issue for
years but never a plan to help those on the street that were once considered
human who did everything from going to school, …
This is the biggest Flop of 2008. I don know what Director has is his mind of
creating such a big disaster. The songs have been added without situations, the
story have been stretched to fill the 3 hrs gap …
http://ai.stanford.edu/~amaas/data/sentiment/
Words -> vectors
cat as one-hot (~3 million values): 0 0 0 0 0 0 0 1 0 0 0 0 0 0 …
cat as word2vec (300 values): 0.67 5.4 0.2 0.46 15 69 23 …
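With the DL4J word-vector API that exercise 5 uses, looking up a vector is roughly (a sketch; WORD_VECTORS_PATH as in the exercise code shown later):

WordVectors wordVectors =
    WordVectorSerializer.loadStaticModel(WORD_VECTORS_PATH);
double[] cat = wordVectors.getWordVector("cat"); // 300 values with the Google News model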
DL4J model
LSTM
The code
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.updater(Updater.ADAM) //To configure: .updater(Adam.builder().beta1(0.9).beta2(0.999).build())
.regularization(true).l2(1e-5)
.weightInit(WeightInit.XAVIER)
.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue).gradientNormalizationThreshold(1.0)
.learningRate(2e-2)
.trainingWorkspaceMode(WorkspaceMode.SEPARATE).inferenceWorkspaceMode(WorkspaceMode.SEPARATE)
.list()
.layer(0, new GravesLSTM.Builder().nIn(vectorSize).nOut(256)
.activation(Activation.TANH).build())
.layer(1, new RnnOutputLayer.Builder().activation(Activation.SOFTMAX)
.lossFunction(LossFunctions.LossFunction.MCXENT).nIn(256).nOut(2).build())
.pretrain(false).backprop(true).build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
Exercise 5 Implement sentiment
analysis with a recurrent network
Train a network to detect positive or negative sentiment by training it with movie reviews that have been classified as
either positive or negative.
Project: ex5-rnn-word2vecsentiment
Files to modify: Word2VecSentimentRNN.java
LSTM layer
.layer(0,
new GravesLSTM.Builder()
.nIn(vectorSize)
.nOut(256)
.activation(Activation.TANH)
.build())
Output layer
.layer(1,
new RnnOutputLayer.Builder()
.activation(Activation.SOFTMAX)
.lossFunction(LossFunctions.LossFunction.MCXENT)
.nIn(256)
.nOut(2)
.build())
Input pre-processing
WordVectors wordVectors =
WordVectorSerializer.loadStaticModel(WORD_VECTORS_PATH);
SentimentExampleIterator train =
new SentimentExampleIterator(DATA_PATH,
wordVectors, batchSize, truncateReviewsToLength, true);
…
// Get all word vectors for the current document and
// transpose them to fit the 2nd and 3rd feature shape
final INDArray vectors =
wordVectors.getWordVectors(tokens.subList(0, seqLength))
.transpose();
Classify example
INDArray features = test.loadFeaturesFromString(text, truncateReviewsToLength);
INDArray networkOutput = net.output(features);
int timeSeriesLength = networkOutput.size(2);
INDArray probabilitiesAtLastWord =
networkOutput.get(NDArrayIndex.point(0),
NDArrayIndex.all(),
NDArrayIndex.point(timeSeriesLength - 1));
RNN Summary
• Use recurrent neural networks (RNNs) for time series or sequential data
• RNNs can consume & generate sequences
• RNNs back propagate through time as well as layers
• effectively very deep networks, so training time increases
• Use LSTM (or GRU) layers
Exercise 6 Text generation using an
RNN
Use a text file of your choice to train an RNN to generate text.
Project: ex6-rnn-text-generation
Files to modify: GravesLSTMCharModellingExample.java
Text generation using RNNs
Shift the text left by one character to get the target output:
input:  T h e   q u i c k   b r o w n   f o
target: h e   q u i c k   b r o w n   f o x
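Equivalently, each character's training target is simply the character that follows it. A minimal sketch of forming input/target pairs:

String text = "The quick brown fox";
for (int i = 0; i < text.length() - 1; i++) {
    char input  = text.charAt(i);     // what the network sees at step i
    char target = text.charAt(i + 1); // what it should predict
}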
Test data
LJC meetup #1
“Please note that this is a talk organised by the Java Web
Users Group, who have kindly invited all of our members to
attend. …”
DL4J model
LSTM
LSTM
Pick first character
INDArray initializationInput = Nd4j.zeros(1, iter.inputColumns(),
initialization.length());
int idx = iter.convertCharacterToIndex(getRandomCharacter());
// one-hot encode the chosen character at time step 0
initializationInput.putScalar(
new int[]{0, idx, 0}, 1.0f);
StringBuilder sb = new StringBuilder();
net.rnnClearPreviousState();
Sample characters
for (int i = 0; i < charactersToSample; i++) {
INDArray nextInput = Nd4j.zeros(1, …);
// sample the next character from the output distribution
int charIdx = sampleFromDistribution(output);
// one-hot encode it as the next input
nextInput.putScalar(
new int[]{0, charIdx}, 1.0f);
sb.append(convertIndexToCharacter(charIdx));
output = net.rnnTimeStep(nextInput);
}
Results : epoch 0
hpee Te paol. and the: angocind ppeage. 0e3gfve antinlenf
sger ind dige pomet se andes af ivecockoat, ireloierd an., ge
toees afod cigitpectibhe thicer R Enderdlasd, siugiaganler
inobe. Nemteng fla tiye das aneed seveale dod shas soy to
has, Moste thameste teupelge. 5oantion ander then ins anfoe
ton-
Results : epoch 5
out more including up to this talk mixt on the Jinabchar is
architects. He has written at made community and testing
Jethinions, sides Cloud, all hew demonstrate Retald or
various minutes & deploying your own scalability in Java, and
how any's well as designed) our will have elow, years,
Results : epoch 50
Most of the core of the conference is to experienced
developer or simply RSVP to this event and they recruitment
way to guh a finish things to help you get started, including
some from IBM who work on specific techougher in the blog
here: http:blog.recworks.co.uk
Summary
• Each layer learns features composed of features from the previous layer
• Convolutional neural networks (CNN) well suited for images
• Recurrent Neural Networks (RNN) used for time series data
• Can have networks combining both convolutional & recurrent layers
Further info #1
Further info #2
• Andrew Ng Coursera : https://www.coursera.org/specializations/deep-learning
• Udacity : https://www.udacity.com/course/deep-learning--ud730
• CS231n Winter 2016 lecture videos
• Andrej Karpathy’s blog : http://karpathy.github.io/
• Andrew Trask’s blog : https://iamtrask.github.io/
Backup / bonus slides
Back propagation
A training example pairs an input example with its expected output. The input is fed through the network, and a cost function compares the network's output with the expected output to give the error (also known as cost or loss).
Time for some Java
Backpropagation
Sigmoid & derivative
// exp() is the static import org.nd4j.linalg.ops.transforms.Transforms.exp
private INDArray sigmoid(INDArray input) {
// element-wise 1 / (1 + e^-x)
return Nd4j.ones(input.shape()).div(exp(input.neg()).add(1));
}
// expects values already passed through sigmoid:
// sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
private INDArray sigmoidDerivative(INDArray input) {
return input.mul(Nd4j.ones(input.shape()).sub(input));
}
Inputs & Weights
final double[][] inputsArray = {
{0, 0, 1},
{0, 1, 1},
{1, 0, 1},
{1, 1, 1}
};
// labels: XOR of the first two inputs (the third input is a constant 1)
final double[] labelsArray = {0, 1, 1, 0};
final INDArray inputs = Nd4j.create(inputsArray);
final INDArray labels = Nd4j.create(labelsArray).transpose();
learningRate = 0.1;
weights1 = Nd4j.randn(inputWidth, hiddenWidth);
weights2 = Nd4j.randn(hiddenWidth, outputWidth);
Forward pass
for (int i = 0; i < numIterations; ++i) {
// forward pass (x = inputs, y = labels from the previous slide)
INDArray layer1 = sigmoid(x.mmul(weights1));
INDArray layer2 = sigmoid(layer1.mmul(weights2));
// compute error
INDArray layer2Error = y.sub(layer2);
Backward pass
// backward pass
INDArray delta2 = layer2Error.mul(sigmoidDerivative(layer2));
INDArray layer1Error = delta2.mmul(weights2.transpose());
INDArray delta1 = layer1Error.mul(sigmoidDerivative(layer1));
Backward pass
weights2 = weights2.add(
layer1.transpose() // chain rule
.mmul(delta2) // error value
.mul(learningRate)); // update scale factor
weights1=weights1.add(
x.transpose()
.mmul(delta1)
.mul(learningRate));
} // end update loop