DL4J
DEEP LEARNING IS TAKING DATA SCIENCE BY STORM
Neural Nets & Deep Learning
▪ A perceptron/neuron can be loosely compared to a NAND gate
▪ Non-linear functions can be constructed
Not quite Terminator… {yet, despite our name}
▪ Applications are everywhere.
▪ OCR
▪ Netflix recommendations
REAL USE CASES
Deep Learning: why now?
Confluence of:
▪ Algorithms
▪ Compute
▪ Data
Skymind?
We are a distributed team living and working in every major time zone.
WE BUILT DEEPLEARNING4J
WHY THE JVM?
ENTERPRISE NEEDS
Data Gravity
• We need to process data in the workflows where the data lives
• If you move the data, you don't have big data
• Even if the data is not "big", we still want simpler workflows
Integration Issues
• Ingest, ETL, vectorization, modeling, evaluation, deployment, security, etc.
The JVM is great at network I/O and data access.
It is also great at streaming infrastructure.
BUT…
When considering HPC on the JVM
Scientific computing & the JVM:
• Vectorization
• Array indexing: 32-bit address space (Java arrays use int indices)
FULLY native backend: https://github.com/deeplearning4j/libnd4j (JavaCPP, OpenMP, CUDA)
JAVACPP
• Auto-generates JNI bindings for C++ by parsing classes
• Allows for easy maintenance and deployment of C++ binaries in Java
• Write efficient ETL pipelines for images via OpenCV (JavaCV)
• 64-bit pointers (wasn't possible before)
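To make the binding idea concrete, here is a minimal JavaCPP-style sketch. The header name and the dot function are hypothetical; the pattern (a @Platform annotation naming the C++ header, a Loader.load() call, and native method declarations) is the standard JavaCPP workflow.

    import org.bytedeco.javacpp.Loader;
    import org.bytedeco.javacpp.annotation.Platform;

    // Hypothetical binding: JavaCPP parses NativeMath.h and generates the
    // JNI glue so this native method dispatches into the C++ implementation.
    @Platform(include = "NativeMath.h")
    public class NativeMath {
        static { Loader.load(); }  // load the generated JNI library

        // A double* parameter in C++ maps naturally from a Java double[]
        public static native double dot(double[] a, double[] b, int n);
    }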
ND4J
§ Heterogeneous codebase
§ Supports CUDA, x86 and POWER
§ Shared indexing logic for writing ndarray routines
§ Memory management in Java (even CUDA memory!)
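For a flavor of the ND4J API, a minimal sketch (sizes are arbitrary); the same code runs unchanged on the x86 or CUDA backends:

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class Nd4jHello {
        public static void main(String[] args) {
            INDArray a = Nd4j.rand(2, 3);              // 2x3 uniform random matrix
            INDArray b = Nd4j.ones(2, 3);              // 2x3 matrix of ones
            INDArray c = a.add(b).mmul(b.transpose()); // elementwise add, then 2x2 matmul
            System.out.println(c);
        }
    }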
BOTTOM LINE:
If you don't want to end up rearranging deck chairs on the Titanic…
SCHEMATIC OVERVIEW
QUICK START
http://deeplearning4j.org/quickstart
Get support on Gitter: https://gitter.im/deeplearning4j/deeplearning4j
Example Repo
https://github.com/deeplearning4j/dl4j-examples
Recurrent NN
▪ Loops
▪ Temporal behavior
▪ Used for time series
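A minimal recurrent-net configuration sketch against the 0.x-era DL4J API this deck targets (class and method names shifted in later releases; the layer sizes are illustrative, not from the slides):

    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.GravesLSTM;
    import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    public class RnnConfigSketch {
        public static MultiLayerConfiguration build(int nIn, int nClasses) {
            return new NeuralNetConfiguration.Builder()
                .list()
                .layer(0, new GravesLSTM.Builder()           // the recurrent "loops"
                    .nIn(nIn).nOut(200)
                    .activation(Activation.TANH).build())
                .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .nIn(200).nOut(nClasses)                 // one prediction per time step
                    .activation(Activation.SOFTMAX).build())
                .build();
        }
    }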
Applications: Sequence to Sequence
Credits: Andrej Karpathy
Applications
Sample output from a character-level RNN trained on Shakespeare:

Second Lord:
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.

Clown:
Come, sir, I will make did behold your worship.

VIOLA:
I'll drink it.

Credits: Andrej Karpathy
Let's look at output from a dl4j run…
ADDITION as SeqToSeq
73+45=118
37+54=91
&#$%/!!*
#&%$/(!Z
What is the random probability of getting an addition right?
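Back-of-envelope answer (under an assumption not stated on the slide: two-digit operands, with each answer digit guessed uniformly at random): roughly 10^-d for a d-digit answer, so well under 1%. A quick Monte Carlo check in Java:

    import java.util.Random;

    public class RandomAdditionBaseline {
        public static void main(String[] args) {
            Random rng = new Random(42);
            int trials = 1_000_000, hits = 0;
            for (int t = 0; t < trials; t++) {
                int a = 10 + rng.nextInt(90);             // two-digit operand
                int b = 10 + rng.nextInt(90);             // two-digit operand
                String answer = String.valueOf(a + b);    // 2 or 3 digits
                StringBuilder guess = new StringBuilder();
                for (int d = 0; d < answer.length(); d++) {
                    guess.append(rng.nextInt(10));        // guess each digit uniformly
                }
                if (guess.toString().equals(answer)) hits++;
            }
            System.out.printf("random-guess accuracy: %.4f%%%n", 100.0 * hits / trials);
        }
    }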
Let's look at output from a dl4j run…
LET'S TALK TUNING!
Learning rate, updaters, batch size, loss functions, …
Normalize your data!
• Continuous values
• Discrete classes
• Repeat the exact same method for training and test (see the sketch below)
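A minimal sketch of the "exact same method" point with ND4J's NormalizerStandardize: fit the statistics on the training data only, then attach the identical preprocessor to both iterators.

    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
    import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

    public class NormalizeSketch {
        static void normalize(DataSetIterator trainIter, DataSetIterator testIter) {
            NormalizerStandardize normalizer = new NormalizerStandardize();
            normalizer.fit(trainIter);               // mean/std from training data only
            trainIter.reset();
            trainIter.setPreProcessor(normalizer);   // standardize training batches
            testIter.setPreProcessor(normalizer);    // identical transform for test batches
        }
    }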
Weight Initialization
• Xavier weight initialization
• ReLU weight initialization (both appear in the configuration sketch below)
Learning Rate, Epochs, Iterations, Minibatch Size
• Typical learning rate range: 0.1 to 1e-6
• Multiple epochs and one iteration
• Minibatch size of 16 to 128, with a maximum of ~1k
Activation Functions, Loss Functions
• Regression: MSE (MAE, MAPE, …); Classification: XENT (KLD, …) + softmax
• Activations: relu, leakyrelu, identity, tanh, softsign
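Pulling the last three slides together, a configuration sketch against the 0.x-era DL4J API (in recent releases the learning rate moves into the updater object; layer sizes here are illustrative):

    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.Updater;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    public class TuningSketch {
        public static MultiLayerConfiguration build() {
            return new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)   // Xavier init, per the weight-init slide
                .learningRate(1e-2)              // inside the 0.1 .. 1e-6 range above
                .updater(Updater.RMSPROP)
                .list()
                .layer(0, new DenseLayer.Builder().nIn(784).nOut(256)
                    .activation(Activation.RELU).build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .nIn(256).nOut(10)           // cross-entropy + softmax for classification
                    .activation(Activation.SOFTMAX).build())
                .build();
        }
    }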
Remember overfitting and the bias/variance trade-off?
Regularization
• L1, L2 regularization. Common values for L2 regularization are 1e-3 to 1e-6.
• Dropout
• DropConnect
• Restrict network size
• Early stopping
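For the first two bullets, a hedged fragment of the same 0.x Builder used in the sketch above (note that 0.x required regularization(true) for the L1/L2 terms to take effect):

    new NeuralNetConfiguration.Builder()
        .regularization(true)   // required in 0.x for l1/l2 to apply
        .l2(1e-4)               // within the slide's suggested 1e-3 .. 1e-6 range
        .dropOut(0.5)           // dropout on hidden-layer activations
        // ... list(), layers and build() as in the sketch above ...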
Updaters, Optimization Algorithms
• Updaters: momentum, RMSProp, AdaGrad, etc.
• Optimization algorithms: SGD with line search, conjugate gradient, and L-BFGS
Exploding and Vanishing Gradients…
Gradient Normalization
• gradientNormalization
• gradientNormalizationThreshold
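Both knobs sit on the configuration builder; a hedged fragment (ClipL2PerLayer is just one of several strategies in the GradientNormalization enum):

    import org.deeplearning4j.nn.conf.GradientNormalization;

    new NeuralNetConfiguration.Builder()
        .gradientNormalization(GradientNormalization.ClipL2PerLayer)  // clip per-layer gradients
        .gradientNormalizationThreshold(1.0)                          // clip when L2 norm exceeds 1.0
        // ... list(), layers and build() as in earlier sketches ...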
All this and more…
http://deeplearning4j.org/troubleshootingneuralnets
Inside the black box…
Use the UI
All this and more…
http://deeplearning4j.org/visualization
What happened here?
Here?
Here?
Here?
I don't understand why I get NaNs?!
The saga of the NaNs
• Not really a saga: any numerical/scientific-computing software can and will run into this.
• The default data type is float; you could change it to double.
• Most often it's a tuning issue: if training is diverging, keep adjusting your parameters.
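A sketch of the float-to-double switch (the same call the half-precision slide later uses, here with Type.DOUBLE; run it before any arrays or networks are created):

    import org.nd4j.linalg.api.buffer.DataBuffer;
    import org.nd4j.linalg.api.buffer.util.DataTypeUtil;

    public class UseDoubles {
        public static void main(String[] args) {
            // Rule out single-precision overflow/underflow as the NaN source
            DataTypeUtil.setDTypeForContext(DataBuffer.Type.DOUBLE);
            // ... build and train the network as usual ...
        }
    }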
Let’s try this…NOW
From the NeedForSpeed to Distributed Deep Learning
Switching to GPUs
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-7.5</artifactId>
  <version>${nd4j.version}</version>
</dependency>
Onto multi-GPUs
ParallelWrapper wrapper = new ParallelWrapper.Builder(YourExistingModel)
    .prefetchBuffer(24)                // batches to prefetch per worker
    .workers(4)                        // one training thread per GPU
    .averagingFrequency(1)             // average parameters every iteration
    .reportScoreAfterAveraging(true)
    .useLegacyAveraging(false)
    .build();
http://deeplearning4j.org/gpu
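Once built, training looks the same as on a single device (trainIter below is a hypothetical DataSetIterator over your training data):

    wrapper.fit(trainIter);   // workers train in parallel; parameters are averaged between them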
Deep Learning with Parameter
Averaging
http://engineering.skymind.io/distributed-deep-learning-part-1-an-
introduction-to-...
Onto Spark…
// Create the TrainingMaster instance
int examplesPerDataSetObject = 1;
TrainingMaster trainingMaster =
    new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObject)
        // (other configuration options)
        .build();

// Create the SparkDl4jMultiLayer instance
SparkDl4jMultiLayer sparkNetwork =
    new SparkDl4jMultiLayer(sc, networkConfig, trainingMaster);

// Fit the network using the training data:
sparkNetwork.fit(trainingData);

http://deeplearning4j.org/spark
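The "(other configuration options)" placeholder covers knobs like these (a hedged sketch; exact builder methods vary across dl4j-spark versions):

    TrainingMaster trainingMaster =
        new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObject)
            .batchSizePerWorker(32)         // minibatch size on each Spark executor
            .averagingFrequency(5)          // average parameters every 5 minibatches
            .workerPrefetchNumBatches(2)    // asynchronously prefetch batches per worker
            .build();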
Spark with a GPU cluster
Distributed GPUs with Spark…
http://deeplearning4j.org/spark-gpus
Tune appropriately and take GPU memory into consideration.
Double performance gains?
Huge gains on memory-intensive operations, if you are willing to sacrifice some precision:
DataTypeUtil.setDTypeForContext(DataBuffer.Type.HALF);
DEEP LEARNING: A PRACTITIONER'S APPROACH
Written by Skymind's Adam Gibson and Josh Patterson.
#1 New Release in Data Modeling and Design on Amazon.
BackUp
SKYMIND INTELLIGENCE LAYER (SKIL)
Reference Architecture
SKIL-DC/OS
SKIL / DL
DOCKER
DCOS + SPARK
MESOS
MULTI-GPUS