Slides to accompany the introduction to deep learning workshop and exercises on GitHub: https://github.com/davesnowdon/devoxxuk2018-dl-workshop
Workshop held at Devoxx UK 2018, London
1. Getting your hands dirty with deep
learning in Java
Dave Snowdon
@davesnowdon
github.com/davesnowdon
https://www.linkedin.com/in/davesnowdon/
2. About me
• Java & JavaScript by day
• Python & Clojure by night
• Amateur social roboticist
• Been learning about deep learning for 18 months
3. Objective
• Understand the basics of what neural networks are and how they work
• Understand the most common types of network you are likely to encounter and
what network architecture is likely to be suitable for a given application.
4. Agenda
• Setup
• Why neural networks?
• How do neural networks work?
• Exercise 1: building a neural network from “scratch”
• Exercise 2: building the same network with DL4J
• Convolutional neural networks
• Exercise 3: training a network to classify images
• Exercise 4: using your own dataset (optional)
• Recurrent neural networks
• Exercise 5: sentiment analysis
• Exercise 6: text generation (optional)
7. Why care about deep learning?
• Impressive results in a wide range of domains
• image classification, text descriptions of images, language translation, speech
generation, speech recognition…
• Predictable execution (inference) time
• Amenable to hardware acceleration
• Automatic feature extraction
8. What are features?
10 PRINT “Hello Java developers”
20 GOTO 10
Features we might extract from this program:
• Average statement length
• Number of statements
• Number of variables
• Cyclomatic complexity
9. Feature engineering
Traditional machine learning process:
Data → Pre-process → Feature engineering → Model → Results
Deep learning process:
Data → Pre-process → Model (feature extraction is learned) → Results
10. Neural network downsides
• Need to define the model and its training parameters
• Large models can take days or weeks to train
• May need a lot of data (often >10K examples)
24. What do the layers do?
Successive layers model higher-level features
25. What input can a network accept?
• Anything you like as long as it’s a tensor
• Tensor = general multi-dimensional numeric quantity
• scalar = 0-dimensional tensor (AKA rank 0)
• vector = 1-dimensional tensor (rank 1)
• matrix = 2-dimensional tensor (rank 2)
• tensor = N-dimensional tensor (rank > 2)
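A minimal ND4J sketch of the four ranks (note: older ND4J versions pad scalars and vectors to rank 2 internally, so rank() may report 2 for the first two):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray scalar = Nd4j.scalar(42.0);                  // rank 0
INDArray vector = Nd4j.create(new double[]{1, 2, 3}); // rank 1, shape [3]
INDArray matrix = Nd4j.zeros(2, 3);                   // rank 2, shape [2, 3]
INDArray image  = Nd4j.zeros(3, 32, 32);              // rank 3, e.g. channels x height x width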
26. Images
Can represent image as tensor of rank 3
Source: https://www.slideshare.net/BertonEarnshaw/a-brief-survey-of-tensors
27. One-hot encoding : input
FAVOURITE PROGRAMMING LANGUAGE
         JAVA   CLOJURE   PYTHON   JAVASCRIPT
BARRY     1       0         0         0
BRUCE     0       1         0         0
RUSSEL    0       0         1         0
Categorical variables (“enums”) become binary vectors: one column per category, a single 1 per row.
28. One-hot encoding: output
         JAVA   CLOJURE   PYTHON   JAVASCRIPT
BARRY     0.6    0.1       0.1       0.2
BRUCE     0.15   0.75      0.05      0.05
RUSSEL    0.34   0.05      0.6       0.01
Also useful for output: each row is a probability distribution over the classes.
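A sketch of building a one-hot input row with ND4J (the index-to-language mapping here is hypothetical):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

int numClasses = 4;                              // Java, Clojure, Python, JavaScript
int favourite = 1;                               // Bruce likes Clojure
INDArray oneHot = Nd4j.zeros(1, numClasses);     // one row per example
oneHot.putScalar(new int[]{0, favourite}, 1.0);  // row becomes [0, 1, 0, 0]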
34. Exercise 1 Build a neural network
Use the ND4J linear algebra library to implement a neural network and train it with backpropagation to distinguish
between two classes.
Project: ex1-no-framework
Files to modify: NeuralNetClassifier.java
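A minimal sketch of the forward and backward pass for a one-hidden-layer sigmoid network, assuming X, y, W1, b1, W2, b2 and learningRate come from the exercise scaffold (bias updates omitted for brevity):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.ops.transforms.Transforms;

// forward pass: sigmoid(X·W1 + b1), then sigmoid(hidden·W2 + b2)
INDArray hidden = Transforms.sigmoid(X.mmul(W1).addRowVector(b1));
INDArray output = Transforms.sigmoid(hidden.mmul(W2).addRowVector(b2));

// backward pass for mean squared error; sigmoid'(a) = a * (1 - a)
INDArray outputDelta = output.sub(y).mul(output.mul(output.rsub(1.0)));
INDArray hiddenDelta = outputDelta.mmul(W2.transpose())
                                  .mul(hidden.mul(hidden.rsub(1.0)));

// gradient descent weight updates
W2.subi(hidden.transpose().mmul(outputDelta).mul(learningRate));
W1.subi(X.transpose().mmul(hiddenDelta).mul(learningRate));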
37. Exercise 2 Use DL4J to build a neural
network
We solve the same problem as in exercise 1, this time with DL4J, so we can get a feel for the benefits of using
a deep learning framework.
Project: ex2-dl4j
Files to modify: MLPClassifierMoon.java
38. Basic config
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.updater(Updater.ADAM)
.iterations(iterations)
// normalize to prevent vanishing or exploding gradients
.gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.l1(1e-4)
.regularization(true)
.l2(5 * 1e-4)
.list()
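The chain above stops at .list(), which opens the layer definitions. One minimal way it might continue for the two-class moon problem (layer sizes here are illustrative, not necessarily the exercise's exact values):

.layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(20)
    .weightInit(WeightInit.XAVIER)
    .activation(Activation.RELU).build())
.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .weightInit(WeightInit.XAVIER)
    .activation(Activation.SOFTMAX)
    .nIn(20).nOut(numOutputs).build())
.pretrain(false).backprop(true)
.build();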
43. Summary so far
• Neural networks are NOT like your brain
• Networks are arranged as layers
• The forward pass computes the output of the network
• The backward pass computes gradients & adjusts weights
• Frameworks take care of the math for you
• but it’s still good to understand what’s going on
45. Images as input
• Even relatively small images have large numbers of pixels
• For example 75x22x3 -> 4,950 inputs
• 1:1 mapping from input to hidden: 4950x4950 weights + 4950 biases -> 24,507,450 parameters
• Easily 4,950,000 weights in the first layer alone even if we only have 1,000 neurons
• Maybe we need another neural network architecture
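The arithmetic behind those bullet points, spelled out:

int inputs = 75 * 22 * 3;                          // 4,950 input values
long dense1to1 = (long) inputs * inputs + inputs;  // 24,507,450 weights + biases
long small = (long) inputs * 1000;                 // 4,950,000 weights for 1,000 neurons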
51. Exercise 3 Classifying CIFAR10 images
Train a convolutional network to classify images into the 10 categories found in the CIFAR10 dataset.
Project: ex3-cnn-cifar10
Files to modify: Cifar.java
54. Define the model
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.cacheMode(CacheMode.DEVICE)
.updater(Updater.ADAM)
.iterations(iterations)
.gradientNormalization(GradientNormalization.RenormalizeL2PerLayer) // normalize to prevent vanishing or exploding gradients
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.l1(1e-4)
.regularization(true)
.l2(5 * 1e-4)
.list()
.layer(0, new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0}).name("cnn1").convolutionMode(ConvolutionMode.Same)
.nIn(3).nOut(64).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)//.learningRateDecayPolicy(LearningRatePolicy.Step)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(1, new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0}).name("cnn2").convolutionMode(ConvolutionMode.Same)
.nOut(64).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(2, new SubsamplingLayer.Builder(PoolingType.MAX, new int[]{2, 2}).name("maxpool2").build())
.layer(3, new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0}).name("cnn3").convolutionMode(ConvolutionMode.Same)
.nOut(96).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(4, new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0}).name("cnn4").convolutionMode(ConvolutionMode.Same)
.nOut(96).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(5, new ConvolutionLayer.Builder(new int[]{3, 3}, new int[]{1, 1}, new int[]{0, 0}).name("cnn5").convolutionMode(ConvolutionMode.Same)
.nOut(128).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(6, new ConvolutionLayer.Builder(new int[]{3, 3}, new int[]{1, 1}, new int[]{0, 0}).name("cnn6").convolutionMode(ConvolutionMode.Same)
.nOut(128).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(7, new ConvolutionLayer.Builder(new int[]{2, 2}, new int[]{1, 1}, new int[]{0, 0}).name("cnn7").convolutionMode(ConvolutionMode.Same)
.nOut(256).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(8, new ConvolutionLayer.Builder(new int[]{2, 2}, new int[]{1, 1}, new int[]{0, 0}).name("cnn8").convolutionMode(ConvolutionMode.Same)
.nOut(256).weightInit(WeightInit.XAVIER_UNIFORM).activation(Activation.RELU)
.learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(9, new SubsamplingLayer.Builder(PoolingType.MAX, new int[]{2, 2}).name("maxpool8").build())
.layer(10, new DenseLayer.Builder().name("ffn1").nOut(1024).learningRate(1e-3).biasInit(1e-3).biasLearningRate(1e-3 * 2).build())
.layer(11, new DropoutLayer.Builder().name("dropout1").dropOut(0.2).build())
.layer(12, new DenseLayer.Builder().name("ffn2").nOut(1024).learningRate(1e-2).biasInit(1e-2).biasLearningRate(1e-2 * 2).build())
.layer(13, new DropoutLayer.Builder().name("dropout2").dropOut(0.2).build())
.layer(14, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
.name("output")
.nOut(numLabels)
.activation(Activation.SOFTMAX)
.build())
.backprop(true)
.pretrain(false)
.setInputType(InputType.convolutional(height, width, channels))
.build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
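Training the model might then look like this (a sketch; epochs, dataIter and testIter are assumed to come from the exercise scaffold):

for (int epoch = 0; epoch < epochs; epoch++) {
    dataIter.reset();        // rewind the iterator each epoch
    model.fit(dataIter);     // one full pass over the training data
}
Evaluation eval = model.evaluate(testIter);  // accuracy, precision, recall, F1
System.out.println(eval.stats());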
55. Convolutional layer
.layer(0,
    new ConvolutionLayer.Builder(
        new int[]{4, 4},  // kernel size
        new int[]{1, 1},  // stride
        new int[]{0, 0})  // padding
    .name("cnn1")
    .convolutionMode(ConvolutionMode.Same)
    .nIn(3).nOut(64)      // 3 input channels (RGB), 64 filters
    .weightInit(WeightInit.XAVIER_UNIFORM)
    .activation(Activation.RELU)
    .learningRate(1e-2)
    .biasInit(1e-2).biasLearningRate(1e-2 * 2)
    .build())
57. Fully connected layer
.layer(10,
    new DenseLayer.Builder()
        .name("ffn1")
        .nOut(512)
        .learningRate(1e-3)
        .biasInit(1e-3).biasLearningRate(1e-3 * 2)
        .build())
.layer(11,
    new DropoutLayer.Builder()
        .name("dropout1")
        .dropOut(0.2) // probability of keeping the output of any given neuron
        .build())
58. Data set 1
// labels are taken from each image’s parent directory name
PathLabelGenerator labelMaker = new ParentPathLabelGenerator();
FileSplit fileSplit = new FileSplit(PATH,
    NativeImageLoader.ALLOWED_FORMATS,
    rng);
// shuffle the file paths before splitting
RandomPathFilter pathFilter = new RandomPathFilter(rng);
59. Data set 2
// 80% train, 20% test
InputSplit[] inputSplit = fileSplit.sample(pathFilter, 0.8, 0.2);
InputSplit trainData = inputSplit[0];
InputSplit testData = inputSplit[1];
// height 22, width 75, 3 channels
ImageRecordReader recordReader = new ImageRecordReader(22, 75, 3,
    labelMaker);
dataIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1,
    numLabels);
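Before iterating, the reader needs initializing with the split, and pixel values are usually scaled to [0, 1]; a sketch using the variables above:

recordReader.initialize(trainData);  // bind the reader to the training split (throws IOException)
// scale raw 0-255 pixel values into the [0, 1] range
DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
dataIter.setPreProcessor(scaler);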
61. Use it
// Use NativeImageLoader to convert the image file to a tensor
NativeImageLoader loader = new NativeImageLoader(height, width, 3);
// Load the image into an INDArray
INDArray image = loader.asMatrix(file);
// get a probability distribution over the classes
INDArray output = model.output(image);
// predict the output class
int[] predict = model.predict(image);
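To turn the predicted index back into a label name (a sketch; the label list comes from the iterator built earlier):

List<String> labels = dataIter.getLabels();  // label names derive from the directory names
System.out.println("Predicted: " + labels.get(predict[0]));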
62. Exercise 4 Image classification with
your images
Experiment with training a network using your own images. The code is basically the same as for exercise 3 but with
dataset iterators that read images from a specified location in the file system.
Project: ex4-cnn-image-classification
Files to modify: ImageClassifier.java
65. Creating the dataset
• Grab images from Google Image Search
• PyImageSearch: “How to create a deep learning dataset using Google Images”
• Use the dlib imglab tool to draw bounding boxes around logos / non-logos
• https://github.com/davisking/dlib/tree/master/tools/imglab
• Wrote a Python script to read the imglab XML and produce cropped images using
OpenCV
66. Summary so far
Convolutional networks
• Used mostly for image processing
• A convolution layer applies learnt filters to its inputs
• A pooling layer reduces the spatial size of its inputs
• Fewer weights (parameters) to train compared to fully connected networks
73. Stanford large movie review dataset
Two sample reviews, quoted verbatim (typos are in the original data):
Homelessness (or Houselessness as George Carlin stated) has been an issue for
years but never a plan to help those on the street that were once considered
human who did everything from going to school, …
This is the biggest Flop of 2008. I don know what Director has is his mind of
creating such a big disaster. The songs have been added without situations, the
story have been stretched to fill the 3 hrs gap …
http://ai.stanford.edu/~amaas/data/sentiment/
76. The code
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.updater(Updater.ADAM) //To configure: .updater(Adam.builder().beta1(0.9).beta2(0.999).build())
.regularization(true).l2(1e-5)
.weightInit(WeightInit.XAVIER)
.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue).gradientNormalizationThreshold(1.0)
.learningRate(2e-2)
.trainingWorkspaceMode(WorkspaceMode.SEPARATE).inferenceWorkspaceMode(WorkspaceMode.SEPARATE)
.list()
.layer(0, new GravesLSTM.Builder().nIn(vectorSize).nOut(256)
.activation(Activation.TANH).build())
.layer(1, new RnnOutputLayer.Builder().activation(Activation.SOFTMAX)
.lossFunction(LossFunctions.LossFunction.MCXENT).nIn(256).nOut(2).build())
.pretrain(false).backprop(true).build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
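Training and evaluating might then look like this (a sketch; nEpochs, train and test come from the exercise scaffold):

for (int epoch = 0; epoch < nEpochs; epoch++) {
    net.fit(train);                              // one pass over the labelled reviews
    train.reset();
    Evaluation evaluation = net.evaluate(test);  // held-out reviews
    System.out.println(evaluation.stats());
}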
77. Exercise 5 Implement sentiment
analysis with a recurrent network
Train a network to classify movie reviews as positive or negative, using reviews that have already been labelled with
their sentiment.
Project: ex5-rnn-word2vecsentiment
Files to modify: Word2VecSentimentRNN.java
80. Input pre-processing
WordVectors wordVectors =
WordVectorSerializer.loadStaticModel(WORD_VECTORS_PATH);
SentimentExampleIterator train =
new SentimentExampleIterator(DATA_PATH,
wordVectors, batchSize, truncateReviewsToLength, true);
…
// Get all word vectors for the current document and
// transpose them to fit the 2nd and 3rd feature shape
final INDArray vectors =
wordVectors.getWordVectors(tokens.subList(0, seqLength))
.transpose();
81. Classify example
INDArray features = test.loadFeaturesFromString(text, truncateReviewsToLength);
INDArray networkOutput = net.output(features);
int timeSeriesLength = networkOutput.size(2);
INDArray probabilitiesAtLastWord =
networkOutput.get(NDArrayIndex.point(0),
NDArrayIndex.all(),
NDArrayIndex.point(timeSeriesLength - 1));
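Reading the two class probabilities out of that slice (assuming, as in the exercise, that index 0 is positive and index 1 negative):

double pPositive = probabilitiesAtLastWord.getDouble(0);
double pNegative = probabilitiesAtLastWord.getDouble(1);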
82. RNN Summary
• Use recurrent neural networks (RNNs) for time series or sequential data
• RNNs can consume & generate sequences
• RNNs back propagate through time as well as layers
• effectively very deep networks, so training takes longer
• Use LSTM (or GRU) layers
83. Exercise 6 Text generation using an
RNN
Use a text file of your choice to train an RNN to generate text
Project: ex6-rnn-text-generation
Files to modify: GravesLSTMCharModellingExample.java
84. Text generation using RNNs
Shift the text left by one character: the network sees the top row and must predict the bottom row.
Input:  T h e q u i c k b r o w n f o
Target: h e q u i c k b r o w n f o x
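In code the shift is just “input is the character at i, target is the character at i+1”; a toy sketch (the real exercise vectorizes this into one-hot tensors):

String text = "The quick brown fox";
for (int i = 0; i < text.length() - 1; i++) {
    char input  = text.charAt(i);      // what the network sees at this step
    char target = text.charAt(i + 1);  // what it should predict next
}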
85. Test data
LJC meetup #1
“Please note that this is a talk organised by the Java Web
Users Group, who have kindly invited all of our members to
attend. …”
87. Pick first character
INDArray initializationInput = Nd4j.zeros(1, iter.inputColumns(),
    initialization.length());
int idx = iter.convertCharacterToIndex(getRandomCharacter());
// one-hot the chosen character at time step 0: indices are [example, feature, time]
initializationInput.putScalar(
    new int[]{0, idx, 0}, 1.0f);
StringBuilder sb = new StringBuilder();
net.rnnClearPreviousState();
88. Sample characters
for (int i = 0; i < charactersToSample; i++) {
    INDArray nextInput = Nd4j.zeros(1, …);
    // sample the next character index from the network’s output distribution
    int charIdx = sampleFromDistribution(output);
    nextInput.putScalar(
        new int[]{0, charIdx}, 1.0f);
    sb.append(convertIndexToCharacter(charIdx));
    // feed the sampled character back in as the next time step
    output = net.rnnTimeStep(nextInput);
}
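sampleFromDistribution() isn’t shown on the slide; one plausible implementation, taking the output row as a double[], picks an index in proportion to its probability:

import java.util.Random;

private static int sampleFromDistribution(double[] distribution, Random rng) {
    double d = rng.nextDouble();
    double sum = 0.0;
    for (int i = 0; i < distribution.length; i++) {
        sum += distribution[i];
        if (d <= sum) {
            return i;  // cumulative probability has passed the random draw
        }
    }
    // floating-point rounding can leave a tiny remainder; fall back to the last index
    return distribution.length - 1;
}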
89. Results : epoch 0
hpee Te paol. and the: angocind ppeage. 0e3gfve antinlenf
sger ind dige pomet se andes af ivecockoat, ireloierd an., ge
toees afod cigitpectibhe thicer R Enderdlasd, siugiaganler
inobe. Nemteng fla tiye das aneed seveale dod shas soy to
has, Moste thameste teupelge. 5oantion ander then ins anfoe
ton-
90. Results : epoch 5
out more including up to this talk mixt on the Jinabchar is
architects. He has written at made community and testing
Jethinions, sides Cloud, all hew demonstrate Retald or
various minutes & deploying your own scalability in Java, and
how any's well as designed) our will have elow, years,
91. Results : epoch 50
Most of the core of the conference is to experienced
developer or simply RSVP to this event and they recruitment
way to guh a finish things to help you get started, including
some from IBM who work on specific techougher in the blog
here: http:blog.recworks.co.uk
92. Summary
• Each layer learns features composed of features from the previous layer
• Convolutional neural networks (CNN) well suited for images
• Recurrent Neural Networks (RNN) used for time series data
• Can have networks combining both convolutional & recurrent layers
94. Further info #2
• Andrew Ng Coursera : https://www.coursera.org/specializations/deep-learning
• Udacity : https://www.udacity.com/course/deep-learning--ud730
• CS231n Winter 2016 lecture videos
• Andrej Karpathy’s blog : http://karpathy.github.io/
• Andrew Trask’s blog : https://iamtrask.github.io/