Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it the Right Way?


DeepLearning4J (DL4J) is a powerful Open Source distributed framework that brings Deep Learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers). It can be used on distributed GPUs and CPUs, and it is integrated with Hadoop and Apache Spark. ND4J is an Open Source, distributed and GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J and Spark is a powerful combination, but it presents some unexpected issues that can compromise performance and nullify the benefits of well-written code and good model design. In this talk I will walk through some of those problems and present some best practices to prevent them, drawn from lessons learned when putting things in production.



  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Guglielmo Iozzia, MSD. Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it the Right Way? #UnifiedDataAnalytics #SparkAISummit
  3. About Me: Currently at, Previously at, Author at Packt Publishing, got some awards lately, Champion, I love cooking. #GuglielmoIozzia
  4. MSD in Ireland: 50+ years; approx. 2,000 employees; $2.5 billion investment to date; approx. 50% of MSD's top 20 products manufactured here; exports to 60+ countries; €6.1 billion turnover in 2017; 300+ jobs & €280m investment in 2017; MSD Biotech, Dublin, coming in 2021.
  5. The Dublin Tech Hub
  6. Deep Learning is a subset of machine learning in which artificial neural networks, algorithms inspired by the human brain, learn from large amounts of data.
  7. Deep Learning. http://www.asimovinstitute.org/wp-content/uploads/2019/04/NeuralNetworkZoo20042019.png
  8. Deep Learning
  9. Some practical applications of Deep Learning: • Computer vision • Text generation • NLP and NLU • Autonomous cars • Robotics • Gaming • Quantitative finance • Manufacturing
  10. Challenges of training MNNs in Spark: • Different execution models between Spark and the DL frameworks • GPU configuration and management • Performance • Accuracy
  11. DeepLearning4J is an Open Source, distributed Deep Learning framework written for JVM languages. It can be used on distributed GPUs and CPUs. It is integrated with Hadoop and Apache Spark.
  12. DL4J modules: • DataVec • Arbiter • NN • Datasets • RL4J • DL4J-Spark • Model Import • ND4J
  13. DL4J Code Example: Network Configuration; Training and Evaluation.
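The code screenshot for this slide did not survive the export. As a substitute, here is a minimal, hypothetical DL4J sketch in Scala of the two labelled parts (network configuration, then training and evaluation); the architecture, hyperparameters and the trainIter/testIter iterators are illustrative, not the talk's actual example:

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.learning.config.Nesterovs
import org.nd4j.linalg.lossfunctions.LossFunctions

// Network configuration: a small feed-forward classifier.
// Layer sizes and hyperparameters are illustrative only.
val conf = new NeuralNetConfiguration.Builder()
  .seed(123)
  .weightInit(WeightInit.XAVIER)
  .updater(new Nesterovs(0.01, 0.9))
  .list()
  .layer(new DenseLayer.Builder()
    .nIn(784).nOut(256).activation(Activation.RELU).build())
  .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .nIn(256).nOut(10).activation(Activation.SOFTMAX).build())
  .build()

// Training and evaluation. trainIter/testIter are DataSetIterators
// assumed to be defined elsewhere.
val model = new MultiLayerNetwork(conf)
model.init()
model.fit(trainIter)
val eval = model.evaluate(testIter)
println(eval.stats())
```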
  14. ND4J is an Open Source linear algebra and matrix manipulation library that supports n-dimensional arrays and is integrated with Apache Hadoop and Spark.
  15. ND4J Code Example.
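The ND4J snippet on this slide was likewise an image that did not survive the export. The following minimal sketch (arbitrary values, not the talk's example) shows the style of the API:

```scala
import org.nd4j.linalg.factory.Nd4j

// Build a 2x2 matrix from a flat array plus a shape.
val a = Nd4j.create(Array(1.0, 2.0, 3.0, 4.0), Array(2, 2))
val b = Nd4j.eye(2) // 2x2 identity matrix

val sum  = a.add(b)   // element-wise addition
val prod = a.mmul(b)  // matrix multiplication
println(prod)          // INDArrays print as formatted matrices
println(a.transpose())
```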
  16. Why Distributed MNN Training with DL4J and Apache Spark? Why is this a powerful combination?
  17. DL4J + Apache Spark: • DL4J provides a high-level API to design, configure, train and evaluate MNNs. • Spark performance is excellent, in particular for ETL/streaming, but in terms of computation in an MNN training context, some data transformations/aggregations need to be done in a low-level language. • DL4J uses ND4J, a C++-backed library that provides a high-level Scala API to developers.
  18. DL4J + Apache Spark: Model Parallelization vs. Data Parallelization.
  19. How Does Training Happen in Spark with DL4J? Parameter Averaging (DL4J 1.0.0-alpha) and Asynchronous SGD (DL4J 1.0.0-beta+).
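For reference, the parameter-averaging path can be sketched with DL4J's Spark API (a sketch assuming the 1.0.0-beta APIs; sc, conf and trainData are assumed to exist and all values are illustrative):

```scala
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster

// Each worker fits on its data partition; parameters are periodically
// averaged across workers by the master.
val batchSizePerWorker = 32
val trainingMaster = new ParameterAveragingTrainingMaster.Builder(batchSizePerWorker)
  .averagingFrequency(5)        // average parameters every 5 minibatches
  .workerPrefetchNumBatches(2)  // prefetch up to 2 batches per worker
  .batchSizePerWorker(batchSizePerWorker)
  .build()

// sc: SparkContext, conf: MultiLayerConfiguration, trainData: RDD[DataSet]
val sparkNet = new SparkDl4jMultiLayer(sc, conf, trainingMaster)
val trainedModel = sparkNet.fit(trainData)
```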
  20. So: What could possibly go wrong?
  21. Memory Management. And now, for something (a little bit) different.
  22. Memory Utilization at Training Time.
  23. Memory Management in DL4J. Memory allocations can be managed using two different approaches: • JVM GC and WeakReference tracking • MemoryWorkspaces. The idea behind both is the same: once an INDArray is no longer required, the off-heap memory associated with it should be released so that it can be reused.
  24. Memory Management in DL4J. The difference between the two approaches: • JVM GC: when an INDArray is collected by the garbage collector, its off-heap memory is deallocated, on the assumption that it is not used elsewhere. • MemoryWorkspaces: when an INDArray leaves the workspace scope, its off-heap memory may be reused without deallocation and reallocation. This gives better performance for training and inference.
  25. Memory Management in DL4J. Please remember that, when a training process uses workspaces, in order to get the most from this approach, periodic GC calls need to be disabled:
      Nd4j.getMemoryManager.togglePeriodicGc(false)
  or their frequency needs to be reduced:
      val gcInterval = 10000 // In milliseconds
      Nd4j.getMemoryManager.setAutoGcWindow(gcInterval)
  26. The DL4J training UI.
  27. Root Cause and Potential Solutions. Dependency conflicts arise between the DL4J-UI library and Apache Spark when they run in the same JVM. Two alternatives are available: • Collect and save the relevant training stats at runtime, then visualize them offline later. • Run the UI, using its remote functionality, in a separate JVM (server); metrics are uploaded from the Spark master to the UI server.
  28. Serialization & ND4J. Data serialization is the process of converting in-memory objects to another format that can be used to store them or send them over the network. Two options are available in Spark: • Java (default) • Kryo.
  29. Do You Opt for Kryo? Kryo doesn’t work well with off-heap data structures.
  30. How to Use Kryo Serialization with ND4J?
      1. Add the ND4J-Kryo dependency to the project.
      2. Configure the Spark application to use the ND4J Kryo Registrator:
         val sparkConf = new SparkConf
         sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator")
  31. Spark and Large Off-heap Objects. Spark has problems handling Java objects with large off-heap components, in particular when caching or persisting them. When working with DL4J this is a frequent case, as DataSet and INDArray objects are involved.
  32. Spark and Large Off-heap Objects. Spark drops part of an RDD based on the estimated size of that block, and it estimates the size of a block depending on the selected persistence level. In the case of MEMORY_ONLY or MEMORY_AND_DISK, the estimate is done by walking the Java object graph. This process doesn't take into account the off-heap memory used by DL4J and ND4J, so Spark underestimates the true size of objects like DataSets or INDArrays. Result: Out of Memory Exception!
  33. Spark and Large Off-heap Objects. It is then good practice to use MEMORY_ONLY_SER or MEMORY_AND_DISK_SER when persisting an RDD<DataSet> or an RDD<INDArray>. This way Spark stores blocks on the JVM heap in serialized form. Because there is no off-heap memory for the serialized objects, Spark can accurately estimate their size, thus avoiding out-of-memory issues.
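In code, the practice above amounts to choosing a serialized storage level when persisting (trainData and its type are assumed here for illustration):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.nd4j.linalg.dataset.DataSet

// trainData: RDD[DataSet] is assumed to exist.
// Serialized storage keeps blocks on the JVM heap in serialized form,
// so Spark can estimate their size accurately.
trainData.persist(StorageLevel.MEMORY_ONLY_SER)

// If the serialized data may not fit in memory, spill to disk instead:
// trainData.persist(StorageLevel.MEMORY_AND_DISK_SER)
```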
  34. Configuring the Memory Limits. Java command line arguments available: • -Xms • -Xmx • -Dorg.bytedeco.javacpp.maxbytes: to specify the off-heap memory limit • -Dorg.bytedeco.javacpp.maxphysicalbytes: (optional) to specify the maximum bytes for the entire process.
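Putting the flags together, a launch command might look like the following (all values and the main class are illustrative, not from the talk):

```shell
java -Xms4G -Xmx8G \
     -Dorg.bytedeco.javacpp.maxbytes=16G \
     -Dorg.bytedeco.javacpp.maxphysicalbytes=20G \
     -cp app.jar com.example.TrainingApp
```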
  35. Configuring the Memory Limits. Caveat: in limited-memory environments it's a bad idea to use a high -Xmx value together with the -Xms option, because not enough off-heap memory would be left. Example: on a system with 32 GB of RAM, setting -Xmx28G leaves only 4 GB for off-heap memory, the OS and everything else running on the machine.
  36. Configuring the Memory Limits. General best practice: typically in DL4J applications you need less RAM in the JVM heap and more off-heap, since all INDArrays are stored off-heap. If you allocate too much to the JVM heap, there will not be enough memory left for off-heap allocations.
  37. Python Models Import in DL4J. TensorFlow. DL4J memory management applies here.
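The slide shows a TensorFlow logo with the note that DL4J memory management still applies to imported models. As an illustration of the model-import API, this sketch uses the Keras HDF5 route, which is the best-documented path; the file name model.h5 is a placeholder:

```scala
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport

// Import a network trained in Python and saved as a Keras HDF5 file.
// "model.h5" is a hypothetical path.
val model = KerasModelImport.importKerasSequentialModelAndWeights("model.h5")

// From here it is a regular MultiLayerNetwork: the same off-heap memory
// rules (workspaces, javacpp.maxbytes limits) apply to its INDArrays.
```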
  38. You can find more details on DL4J and Spark in my book: http://tinyurl.com/y9jkvtuy
  39. Thank You! Any Questions? You can find me at @guglielmoiozzia, https://ie.linkedin.com/in/giozzia, googlielmo.blogspot.com
  40. DON’T FORGET TO RATE AND REVIEW THE SESSIONS. SEARCH SPARK + AI SUMMIT.
