Yes, Machine Learning is present in Java, but is Deep Learning too?
Tomasz Sikora
SJUG #26, Katowice, 2018-03-23
Dilemma
How to present DL at a JUG, for a technical audience?
What would be the most important to you?
From Machine Learning to Deep Learning
- Artificial Intelligence: a program that can sense, reason, act and adapt
- Machine Learning: algorithms whose output improves as they are exposed to more data
- Deep Learning: multilayer neural networks learning from vast amounts of data
Timeline:
- Intelligence Explosion (Good, 1965)
- Bayesian methods
- ‘AI Winter’, ML pessimism
- Backpropagation rediscovery
- Knowledge-driven to data-driven ML
- Kernel methods, PCA, clustering, SVM
- Feasible DL
From Programming to Building Model
Traditional Programming: Data + Program --> Computer --> Result
Machine Learning: Data + Result --> Computer --> Program/Model
(1) complex task or amount of data
(2) rules difficult to define or huge program
Categories of Machine Learning
- Supervised: the algorithm has training data with a known expected output.
- Unsupervised: the algorithm identifies patterns in the data without being told the expected outcome.
- Reinforcement Learning: the algorithm learns from interactions with the environment, using trial and error, and memorizes the strategy for further improvement.
- Anomaly detection: analyzes patterns.
- Classification: a set of data is given, and the answer is one of a fixed set of classes (discrete target).
- Regression: used to find numbers (numeric value, continuous target).
- Clustering: used if we need to know about structure; forms groups to interpret the data.
- Reinforcement: used when a decision needs to be made based on past experience and the environment.
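As an illustration of the Regression category (numeric, continuous target), here is a minimal sketch of ordinary least squares with a single feature. The class name `SimpleRegression` and the sample data are made up for this example and do not come from any of the libraries discussed in the talk.

```java
// Minimal sketch: fit y = slope * x + intercept by ordinary least squares.
// SimpleRegression is a hypothetical class written only for illustration.
final class SimpleRegression {
    final double slope;
    final double intercept;

    private SimpleRegression(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    // Learns the two parameters from (x, y) training pairs.
    static SimpleRegression fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = sxy / sxx;
        return new SimpleRegression(slope, meanY - slope * meanX);
    }

    // Predicts a continuous target for a new input.
    double predict(double x) {
        return slope * x + intercept;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9}; // generated from y = 2x + 1
        SimpleRegression model = fit(x, y);
        System.out.println("slope=" + model.slope + " intercept=" + model.intercept);
        System.out.println("predict(5)=" + model.predict(5));
    }
}
```

The "program" here (slope and intercept) is produced from data and expected results, rather than written by hand.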
“Traditional” ML in Java, part 1
(many libraries & algorithms for similar tasks)
Name | Licence | Short | Algorithms | Other, ANN
WEKA | GPLv3 | Collection of ML algorithms for DM | Classification, Regression, Clustering, Assoc Rules, Cross-validation, Bayesian Networks, Ensemble Learning, Visualization, Deep Learning | MLP, and wrapper to DL4J
H2O | Apache 2.0 | Distributed and scalable ML and predictive analytics platform | Deep Learning, Distributed Random Forest, Generalized Linear Model, Gradient Boosting Machine (GBM), Naïve Bayes Classifier, Stacked Ensembles, XGBoost, Generalized Low Rank Models, K-Means Clustering, Principal Component Analysis | MLP, RNN, CNN; Deep Water: TF, Caffe, MXNet; Sparkling Water (for Spark)
MOA | GPLv3 | Mining data streams | Unsupervised methods in Cluster Analysis and Outlier Detection, Decision Trees, Meta Classifiers, Naive Bayes | Weka
ELKI | AGPLv3 | Clustering and Outlier Detection | |
MLlib (Spark) | Apache 2.0 | Apache Spark's scalable ML library | Distributed Linear Algebra, SVD, PCA, Logistic Regression, Naive Bayes, Generalized Linear Regression, Decision trees, Random Forests, Gradient-boosted trees, Clustering, K-means, Gaussian | Apache Spark's scalable machine learning library
“Traditional” ML in Java, part 2
(many libraries & algorithms for similar tasks)
Name | Licence | Short | Algorithms | Other
Mahout | Apache 2.0 | Java libs for distributed / scalable ML algorithms | Distributed Linear Algebra, SVD, PCA, Collaborative Filtering, Canopy Clustering and Classification on top of Hadoop using map/reduce | Apache Hadoop, Spark, Flink and H2O
YALE | GNU Affero | RapidMiner | Linear Algebra, PCA, Clustering, ... | Extended as proprietary software
Shogun | GPLv3 | General ML | Binary and Multiclass Classifier, Regressors, Random Forest, SVM, Clustering, ... | NNs
JDMP | LGPLv3 | Data mining and ML | Java Data Mining Package, a Library for Machine Learning and Big Data Analytics |
Yooreka | Apache 2.0 | General ML | Clustering, Classification, Bayesian, Decision trees, Neural Networks, Collaborative filtering | NNs
SAMOA | Apache Incubator | Distributed streaming ML algorithms | Multiple-DSPE framework that contains a programming abstraction for distributed streaming ML algorithms | DSPEs, such as Apache Storm, Apache S4, and Apache Samza
Java-ML | GPLv2 | Java API | Java API with a collection of machine learning algorithms |
“Traditional” NN
(No GPU/CUDA Support)
Name (Leader) | License | Architectures and Training | Other
Neuroph (Zoran Sevarac) | Apache 2.0 | Architectures: Perceptron, Adaline, Multi Layer Perceptron, Hopfield network, Bidirectional Associative Memory, Kohonen network, Hebbian network, Maxnet, Competitive network, Instar, Outstar, RBF network, Neuro Fuzzy Reasoner. Training: Backpropagation, Momentum on Resilient Propagation... | CNNs!
Encog (Jeff Heaton) | Apache 2.0 | Architectures: Perceptron, Adaline, Adaptive Resonance Theory 1 (ART1), Bidirectional Associative Memory (BAM), Boltzmann Machine, Counterpropagation NN (CPN), Elman Recurrent NN, Hopfield Neural Network, Jordan Recurrent NN, Radial Basis Function Network, Recurrent Self Organizing Map (RSOM), Self Organizing Map (Kohonen). Training: Backpropagation, Resilient Propagation, Genetic Algorithm Training... | Neuroevolution of augmenting topologies, NEAT and HyperNEAT; Bayesian Networks, Hidden Markov Models and Support Vector Machines.
ML vs DL Performance and Scale
(Andrew Ng, 2016)
[Figure: performance (y-axis) vs. amount of data (x-axis), with curves for traditional ML algorithms, shallow NN, medium NN and deep NN]
Deep Learning Area
(Andrew Ng, 2016)
Unsupervised, Supervised
- General DL Models: MLP, densely connected layers
- Image 2D/3D: CNN
- Sequence Models: RNN, LSTM, GRU
- Other: Deep Reinforcement Learning
Deep Learning Area
Focus is on end-to-end:
Vision: image --> object/face --> caption/person
image --> ????? --> caption/person
Audio: wave --> phoneme --> transcript
wave --> ????? --> transcript
Instead of human/designer guidance, we need lots of labeled data
Natural language processing (NLP): English --> Polish (spoken language understanding)
Market segmentation, i.e. predict if a customer will respond to a promotion
Demo 1 CNN
(Andrej Karpathy, 2015)
https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
- Effect of hidden layer size
- Effect of regularization (pruning)
Demo 2 TF
http://playground.tensorflow.org
Supervised Learning, Steps
- Prepare the data: get the raw data and structure it.
- Train the model: use the data and train the model.
- Test the model with some test data; do the model fitting and test it again.
- Deploy the model: once satisfied with the model, deploy it to use.
- Validate: review the success of the model applied to real conditions.
Data split:
- Training set: train the model (60%)
- Cross-validation set: 20%
- Test set: test the model (20%)
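The 60% / 20% / 20% split can be sketched in plain Java as below. This is only an illustration under simple assumptions: the samples fit in memory as a list, a shuffle with a fixed seed is acceptable, and the class name `DataSplit` is made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Minimal sketch of a 60/20/20 train / cross-validation / test split.
// DataSplit is a hypothetical helper, not part of any library named in the talk.
final class DataSplit {

    // Returns three lists in this order: training, cross-validation, test.
    static <T> List<List<T>> split(List<T> samples, long seed) {
        // Shuffle a copy so the caller's list is left untouched.
        List<T> shuffled = new ArrayList<>(samples);
        Collections.shuffle(shuffled, new Random(seed));

        int n = shuffled.size();
        int trainEnd = n * 60 / 100; // first 60% for training
        int cvEnd = n * 80 / 100;    // next 20% for cross-validation

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainEnd)));
        parts.add(new ArrayList<>(shuffled.subList(trainEnd, cvEnd)));
        parts.add(new ArrayList<>(shuffled.subList(cvEnd, n))); // remaining ~20% for test
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            samples.add(i);
        }
        List<List<Integer>> parts = split(samples, 42L);
        System.out.println("train=" + parts.get(0).size()
                + " cv=" + parts.get(1).size()
                + " test=" + parts.get(2).size());
    }
}
```

Shuffling before cutting matters when the raw data is ordered (for example by date or by class); for time series a chronological cut may be the better choice.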
Pursuit of Good Generalisation...
[Figure: error vs. model complexity for the training sample and the test sample; training error and cross-validation error curves, with underfitting on one side, overfitting on the other and best generalization in between]
Use cases - we will NOT be focusing on details of...
- Training details and backpropagation
- Hyperparameter tuning
- Activation functions - Sigmoid, Tanh, ReLU, Maxout, ELU...
- Architectures - http://www.asimovinstitute.org/neural-network-zoo/
- Fighting the Vanishing Gradient Problem
- Testing approaches - Sampling, K-Fold Cross-Validation
- Regularisation (L1, L2) to avoid overfitting during training; adaptive learning rate, rate annealing, momentum training, dropout, checkpointing, and grid search to enable high predictive accuracy...
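For reference only, the scalar activation functions named above can be written in a few lines. This is a sketch using the standard textbook definitions; the class name `Activations` is made up, and Maxout is left out because it takes the maximum over several learned linear units rather than a single input.

```java
// Minimal sketch of common activation functions (standard definitions).
// Activations is a hypothetical utility class for illustration only.
final class Activations {

    // Sigmoid squashes any real input into (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Tanh squashes any real input into (-1, 1).
    static double tanh(double x) {
        return Math.tanh(x);
    }

    // ReLU passes positive inputs through and clips negatives to zero.
    static double relu(double x) {
        return Math.max(0.0, x);
    }

    // ELU behaves like ReLU for x >= 0 and decays smoothly towards -alpha for x < 0.
    static double elu(double x, double alpha) {
        return x >= 0.0 ? x : alpha * (Math.exp(x) - 1.0);
    }

    public static void main(String[] args) {
        double[] inputs = {-2.0, 0.0, 2.0};
        for (double x : inputs) {
            System.out.println("x=" + x
                    + " sigmoid=" + sigmoid(x)
                    + " tanh=" + tanh(x)
                    + " relu=" + relu(x)
                    + " elu=" + elu(x, 1.0));
        }
    }
}
```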
DL Use Cases
Image Classification, YOLO Object Recognition, Semantic Segmentation
MNIST CNN LeNet-5
(LeCun et al., 1989)
Deconvolutional Networks
(Krähenbühl, Koltun, 2012)
(Hong, Noh, Han, 2016)
(Zhao et al., 2017)
https://www.youtube.com/watch?v=qWl9idsCuLQ
Speech to text
DL Use Cases - Audio Classification
Canx Bookings Predictor
- Data set: 10 yrs of operation, ~2.3M samples, 1.1 GB CSV
- PoC: 2m of operation, 22 attributes, R nn --> NN acc. was 92% (whilst LM was 84%)
- 352 booking and pax attributes --> sparse matrix of 1386 elements
- TF+Keras, 3.5h learning on AWS t2.large --> NN acc. was 97.2%
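The slide does not say how 352 attributes became 1386 sparse elements. One common way such an expansion happens is one-hot encoding of categorical attributes, sketched below purely as an assumption. The class name `OneHot` and the example categories ("phone", "web", "agent") are hypothetical and are not taken from the booking data.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of one-hot encoding: one categorical attribute becomes a
// mostly-zero vector with a single 1.0. Assumption: this is only one possible
// explanation of the attribute expansion, not the method confirmed by the talk.
final class OneHot {

    // Encodes a value as a vector with one slot per known category.
    static double[] encode(String value, List<String> categories) {
        int index = categories.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("unknown category: " + value);
        }
        double[] vector = new double[categories.size()]; // all zeros by default
        vector[index] = 1.0;
        return vector;
    }

    public static void main(String[] args) {
        // Hypothetical booking-channel attribute with three possible values.
        List<String> channels = Arrays.asList("phone", "web", "agent");
        System.out.println(Arrays.toString(encode("web", channels)));
    }
}
```

Under this assumption, an attribute with k possible values contributes k input elements, which is how a few hundred attributes can grow into more than a thousand inputs.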
Part 2
Main Platforms - Big Fight (and Firms)
Name / Site | Licence | Written In | Interfaces | NN | Notes
DL4J (Skymind) | Apache 2.0 | Java, C++ | Java, Scala, Clojure, Kotlin, Python (Keras) | CNN, RNN, LSTM | ND4J, Hadoop, Spark
TensorFlow (Google Brain) | Apache 2.0 | Python, C++ | Python (Keras), C/C++, Java, Go, R | CNN, RNN, LSTM | H2O DW
Theano (U Montreal) | BSD | Python | Python (Keras) | CNN, RNN, LSTM | H2O DW
Keras (François Chollet, Google) | MIT | Python | Python, R | Interface to TF, MXNet, Theano | TensorFlow, Theano, MXNet
Caffe (U Berkeley), Caffe 2 (FB) | BSD, Apache 2.0 | C++ | Python | CNN, RNN, LSTM | CaffeOnSpark (Yahoo), H2O DW
DAAL (Intel) | Apache 2.0 | Python, Java, C++ | Python, C++, Java, R, Matlab | | Hadoop, Spark
MXNet (Apache) | Apache 2.0 | C++ | C++, Python, Scala, Julia, Matlab, JavaScript, Go, R, Perl | CNN, LSTM | AWS, H2O DW
Torch | BSD | C, Lua | C, Lua | CNN, RNN, LSTM | See PyTorch
Benchmarks
Name | Desc | A K 2016 | Libs
MNIST-10 | The MNIST database of handwritten digits has a training set of 60k examples (a subset of the larger NIST set) and a test set of 10k pics. The digits have been size-normalized and centered in a fixed-size 28x28 image. | https://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html | DL4J: https://github.com/deeplearning4j/dl4j-examples; Keras: https://github.com/keras-team/keras/tree/master/examples; TF: https://www.tensorflow.org/tutorials/layers; https://github.com/h2oai/h2o-3/tree/master/examples/deeplearning/notebooks
CIFAR | The CIFAR-10 dataset consists of 60k 32x32 colour images in 10 classes, with 6k images per class. There are 50k training images and 10k test images. Run with 100 epochs training. | https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html |
Demo 3 DL4J MNIST MLP(h:1x1000)
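To get a feel for the size of this model, the number of trainable parameters of a dense MLP can be counted by hand. The sketch below assumes that "h:1x1000" means one hidden layer of 1000 units, that the input is the flattened 28x28 image (784 values) and that the output layer has 10 units, one per digit. The class name `MlpParams` is made up for this example.

```java
// Minimal sketch: count weights and biases of a fully connected MLP.
// MlpParams is a hypothetical helper; the layer sizes below are assumptions
// about what MLP(h:1x1000) and MLP(h:2x500) mean.
final class MlpParams {

    // layerSizes = {inputs, hidden..., outputs}.
    // Each dense layer has (in * out) weights plus (out) biases.
    static long count(int... layerSizes) {
        long total = 0;
        for (int i = 1; i < layerSizes.length; i++) {
            total += (long) layerSizes[i - 1] * layerSizes[i] + layerSizes[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // 784 inputs -> 1000 hidden -> 10 outputs
        System.out.println("MLP(h:1x1000): " + count(784, 1000, 10));
        // 784 inputs -> 500 hidden -> 500 hidden -> 10 outputs
        System.out.println("MLP(h:2x500):  " + count(784, 500, 500, 10));
    }
}
```

Under these assumptions the single wide layer has 795,010 parameters and the two narrower layers have 648,010, so the deeper variant is actually the smaller model.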
Demo 3 DL4J MNIST CNN(...)
Demo 3 DL4J MNIST CNN LeNet
DL4J CIFAR
DL4J CIFAR CNN Simpler
Benchmarks
MNIST-10: MNIST database of handwritten digits, with a training set of 60k examples (subset of larger NIST) and a test set of 10k pics. The digits have been size-normalized and centered in a fixed-size 28x28 image.
CIFAR: CIFAR-10 dataset consists of 60k 32x32 colour images in 10 classes, with 6k images per class. There are 50k training images and 10k test images. Run with 100 epochs training.
Name | Model | DL4J (CPU) | TF + Keras (CPU)
MNIST-10 | MLP(h:1x1k) | 241s, acc: 0.9729 | 172s, acc: 0.9827
MNIST-10 | MLP(h:2x500) | 193s, acc: 0.9808 | 182s, acc: 0.9835
MNIST-10 | CNN (l6) | 126s, acc: 0.9917 | 860s, acc: 0.9955
MNIST-10 | LeNet | 100s, acc: 0.9750 |
CIFAR | CNN AlexNet (c64c64m,c96c96m,c128c128m,d1024d1024s) | 180ks, acc: 0.4568 | 9.9ks, acc: 0.4313
CIFAR | CNN (c32c32m,c64c64m,d512s) | 90ks, acc: 0.3437 | 2950s, acc: 0.4616
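The timings above are easier to compare as ratios. The sketch below only divides the numbers reported in the benchmark (with "ks" read as thousands of seconds); the class name `BenchmarkRatios` is made up, and nothing here re-runs or validates the measurements.

```java
import java.util.Locale;

// Minimal sketch: training-time ratios computed from the reported timings.
// A ratio above 1 means DL4J (CPU) took longer than TF + Keras (CPU).
final class BenchmarkRatios {

    static double ratio(double dl4jSeconds, double tfKerasSeconds) {
        return dl4jSeconds / tfKerasSeconds;
    }

    private static void print(String model, double dl4jSeconds, double tfKerasSeconds) {
        System.out.printf(Locale.ROOT, "%-22s DL4J / TF+Keras = %.2f%n",
                model, ratio(dl4jSeconds, tfKerasSeconds));
    }

    public static void main(String[] args) {
        print("MNIST MLP(h:1x1k)", 241, 172);
        print("MNIST MLP(h:2x500)", 193, 182);
        print("MNIST CNN (l6)", 126, 860);
        print("CIFAR CNN AlexNet", 180_000, 9_900); // 180ks vs 9.9ks
        print("CIFAR CNN (simpler)", 90_000, 2_950); // 90ks vs 2950s
    }
}
```

On these numbers the MNIST MLPs are within a factor of about 1.4, the MNIST CNN ran several times faster in DL4J, and the CIFAR runs were roughly 18 to 30 times slower in DL4J.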
Deep Learning in H2O
https://htmlpreview.github.io/?https://github.com/ledell/sldm4-h2o/blob/master/sldm4-deeplearning-h2o.html
RNN - LSTM
(Hochreiter, Schmidhuber, 1997)
(Brownlee, 2017)
Recurrent NN, nets with memory --> multivariate LSTM
Deep Reinforcement Learning
Q-learn Example (Karpathy, 2016)
RL4J - Deep Q-learning, A3C
https://github.com/deeplearning4j/rl4j
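For orientation, the core of plain tabular Q-learning is a one-line update rule. The sketch below shows that rule only; it is not RL4J's API, and deep Q-learning replaces the table with a neural network that approximates Q. The class name `QLearning` and the tiny two-state table are made up for this example.

```java
// Minimal sketch of the tabular Q-learning update:
//   Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
// QLearning is a hypothetical class; this is not the RL4J implementation.
final class QLearning {

    // Applies one update for the transition (state, action) -> nextState.
    static void update(double[][] q, int state, int action, double reward,
                       int nextState, double alpha, double gamma) {
        // Best value reachable from the next state (greedy estimate).
        double best = q[nextState][0];
        for (double value : q[nextState]) {
            best = Math.max(best, value);
        }
        // Move the current estimate a step of size alpha towards the target.
        q[state][action] += alpha * (reward + gamma * best - q[state][action]);
    }

    public static void main(String[] args) {
        double[][] q = new double[2][2]; // 2 states x 2 actions, all zeros
        update(q, 0, 1, 1.0, 1, 0.5, 0.9);
        System.out.println("Q(0,1) after one rewarded step: " + q[0][1]);
    }
}
```

Here alpha is the learning rate and gamma the discount factor; both values in `main` are arbitrary illustration values.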
GPU NVIDIA CUDA, cuDNN
PerfTest TF on CIFAR10
(Lazorenko, 2017)
Distributed Training
General Tips ‘n Tricks
- Always use the simplest architecture for a problem
- Data prep is key!
- Reduce the feature set -- covariance and PCA
- The more layers, the more features you can manage (dense MLP), but prune weights
- Train and validate with a test dataset -- use a cross-validation method
- Tune, tune, tune ;) -- or use hyper-parameter optimization
- Experiment with other platforms -- integrate
- Often we have not got to E2E DL yet!
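The cross-validation tip can be illustrated with a small k-fold index generator. This is a sketch under simple assumptions: samples are addressed by index 0..n-1, they have already been shuffled, and folds are filled round-robin. The class name `KFold` is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of k-fold cross-validation index assignment.
// In round i, fold i is the validation set and the other folds form the training set.
// KFold is a hypothetical helper, not part of any library named in the talk.
final class KFold {

    // Distributes indices 0..n-1 round-robin into k folds of near-equal size.
    static List<List<Integer>> folds(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(i);
        }
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 5);
        for (int round = 0; round < folds.size(); round++) {
            System.out.println("round " + round + " validates on " + folds.get(round));
        }
    }
}
```

Averaging the validation score over all k rounds gives a less noisy estimate than a single held-out set, at the cost of training k times.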
Q&A + Feedback
@tomaszsikora
Examples
ImageNet --
http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf
https://github.com/tensorflow/models/tree/master/research/object_detection
https://svds.com/tensorflow-rnn-tutorial/
???
No lib approach
http://ashishvs.in/2017-03-21-how-i-built-a-convolutional-neural-network-in-java/
https://github.com/BigPeng/JavaCNN
What can a Java dev use DL for?
Pre-trained models http://pretrained.ml/
https://github.com/fchollet/deep-learning-models
Architectures
https://www.slideshare.net/xavigiro/deep-learning-architectures-d2l2-insightdcu-machine-learning-workshop-2017
http://www.asimovinstitute.org/neural-network-zoo/
Performance Management

Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
