DeepLearning4J (DL4J) is a powerful Open Source distributed framework that brings Deep Learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers). It can be used on distributed GPUs and CPUs. It is integrated with Hadoop and Apache Spark. ND4J is an Open Source, distributed and GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J and Spark is a powerful combination, but the overall cluster configuration can present some unexpected issues that can compromise performance and nullify the benefits of well-written code and good model design. In this talk I will walk through some of those problems and present some best practices to prevent them. The presented use cases will refer to DL4J and ND4J on different Spark deployment modes (standalone, YARN, Kubernetes). The reference programming language for the code examples will be Scala, but no preliminary Scala knowledge is required to understand the presented topics.
2. Hello!
I am Guglielmo Iozzia
Associate Director – Business Tech Analysis at
Previously at
3. MSD in Ireland
50+ years in Ireland
Approx. 2,000 employees
Five sites: Ballydine, Brinny, Carlow and Dublin
$2.5 billion investment to date
Approx. 50% of MSD's top 20 products manufactured here
Exports to 60+ countries
€6.1 billion turnover in 2017
In 2017: 300+ jobs & €280m investment
MSD Biotech, Dublin, coming in 2021
4. Deep Learning
It is a subset of machine learning where artificial neural networks, algorithms
inspired by the human brain, learn from large amounts of data.
5. Some Practical Applications of Deep Learning
× Computer vision
× Text generation
× NLP and NLU
× Autonomous cars
× Robotics
× Gaming
× Quantitative finance
6. DL4J
It is an Open Source, distributed, Deep Learning
framework written for JVM languages.
7. It is integrated with Hadoop and Apache Spark and
can be used on distributed GPUs and CPUs.
10. ND4J
It is an Open Source linear algebra and matrix manipulation library that supports n-dimensional arrays and is integrated with Apache Hadoop and Spark.
11. Apache Spark
It is a unified analytics engine for large-scale data
processing.
12. Apache Spark
Speed: Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
Ease of Use: write applications quickly in Java, Scala, Python, R and SQL.
13. Apache Spark
Generality: Spark provides a stack of libraries that can be combined seamlessly in the same application.
Runs Everywhere: Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone or in the cloud. It can access diverse data sources.
14. Why Distributed MNN Training with DL4J and Apache Spark?
Why is this a powerful combination?
15. DL4J + Apache Spark
× DL4J provides a high-level API to design, configure, train and evaluate MNNs.
× Spark performance is excellent, in particular for ETL/streaming, but in terms of computation, in a MNN training context, some data transformation/aggregation needs to be done using a low-level language.
× DL4J uses ND4J, a C++-based library that provides a high-level Scala API to developers.
20. Memory Management in DL4J
Memory allocations can be managed using two different approaches:
×JVM GC and WeakReference tracking
×MemoryWorkspaces
The idea behind both is the same: once an INDArray is no longer required, the off-heap memory associated with it should be released so that it can be reused.
21. Memory Management in DL4J
The difference between the two approaches is:
×JVM GC: when an INDArray is collected by the garbage collector, its off-heap memory is deallocated, with the assumption that it is not used elsewhere.
×MemoryWorkspaces: when an INDArray leaves the workspace scope, its off-heap memory may be reused, without deallocation and reallocation (a minimal sketch follows).
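To make the workspace model concrete, here is a minimal sketch, assuming a recent ND4J version (the workspace id "MY_WS" and the array shape are illustrative only):

import org.nd4j.linalg.factory.Nd4j

// Activate a workspace: INDArrays created inside it use the workspace's off-heap memory
val ws = Nd4j.getWorkspaceManager.getAndActivateWorkspace("MY_WS")
try {
  val activations = Nd4j.create(64, 128) // allocated inside the workspace
  // ... use the array ...
} finally {
  ws.close() // memory is kept by the workspace for reuse, not deallocated
}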
22. Memory Management in DL4J
Please remember that, when a training process uses workspaces, in order to get the most from this approach, periodic GC calls need to be disabled:
Nd4j.getMemoryManager.togglePeriodicGc(false)
or their frequency needs to be reduced:
val gcInterval = 10000 // in milliseconds
Nd4j.getMemoryManager.setAutoGcWindow(gcInterval)
23. Spark & the DL4J Web UI
A love/hate relationship.
25. Root Cause and Potential Solutions
A dependency conflict arises between the DL4J UI library and Apache Spark when they run in the same JVM.
Two alternatives are available:
×Collect and save the relevant training stats at runtime, and then visualize them offline later.
×Run the UI in a separate JVM (server) and use its remote functionality: metrics are uploaded from the Spark master to the UI server (see the sketch below).
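A minimal sketch of the second alternative, assuming the DL4J UI server has been started in its own JVM and that sparkNet is the SparkDl4jMultiLayer instance used for training (the host and port are placeholders):

import java.util.Collections
import org.deeplearning4j.api.storage.impl.RemoteUIStatsStorageRouter
import org.deeplearning4j.ui.stats.StatsListener

// Route the training stats from the Spark master to the remote UI server
val remoteUIRouter = new RemoteUIStatsStorageRouter("http://UI_MACHINE_IP:9000")
sparkNet.setListeners(remoteUIRouter, Collections.singletonList(new StatsListener(null)))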
27. Data Serialization Options in Spark
Data serialization is the process of converting in-memory objects to another format that can be used to store them or send them over the network.
Two options are available in Spark:
×Java (default)
×Kryo
29. How to Use Kryo Serialization with ND4J?
×Add the ND4J-Kryo dependency to the project (see the sbt sketch after this list).
×Configure the Spark application to use the ND4J Kryo Registrator:
val sparkConf = new SparkConf()
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator")
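A minimal sketch of the dependency declaration, assuming an sbt build (the version and Scala suffix are illustrative and must match your ND4J setup):

// build.sbt: Kryo registrator aware of ND4J's off-heap INDArrays
libraryDependencies += "org.nd4j" % "nd4j-kryo_2.11" % "1.0.0-beta3"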
31. Data Locality in Spark
Data locality in Spark means doing computation on the
node where data resides.
In order to optimize processing tasks, Spark tries to place the
execution code as close as possible to the processed data.
×It tries first to move serialized code to the data.
×Sometimes this isn’t possible and the data must be moved to
the executor.
32. Data Locality in Spark
How does Spark handle data locality?
×It prefers to schedule all tasks at the best locality level.
×When there is no unprocessed data on any idle executor, it
switches to lower locality levels.
− It can wait until a busy CPU frees up to start a task on
data on the same server.
− It can immediately start a new task in a farther place that
requires moving data there.
33. Data Locality in Spark
Spark typically waits a bit in the hopes that a busy CPU
frees up. Once that timeout (default is 3 sec) expires, it
starts moving the data to the free CPU. But:
×Training neural networks with DL4J is computationally
expensive.
×So the Spark default behavior isn’t an ideal fit for maximizing
cluster utilization.
34. Data Locality in Spark and DL4J
During training on Spark, DL4J ensures that there is
exactly one task per executor: so it is always better to
immediately transfer data to a free executor, rather than
waiting for another one to become free. Computation
time is more important than any network transfer
time.
35. Data Locality in Spark and DL4J
Spark provides the spark.locality.wait configuration property: it is the timeout (in seconds) to wait before moving data to a free CPU.
So, when submitting the configuration for a DL4J training app, we have to set the value of the spark.locality.wait property to 0, as in the sketch below.
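A minimal sketch, assuming the property is set programmatically on the SparkConf (the application name is illustrative; the same can be done with --conf spark.locality.wait=0 on spark-submit):

// Give up on data locality immediately: with exactly one task per executor,
// moving data to a free executor beats waiting for a busy one
val sparkConf = new SparkConf()
  .setAppName("DL4J-Spark-Training")
  .set("spark.locality.wait", "0")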
37. Spark and Large Off-heap Objects
Spark has problems handling Java objects with large off-heap components, in particular when caching or persisting them.
When working with DL4J, this is a frequent case, as DataSet and INDArray objects are involved.
38. Spark and Large Off-heap Objects
Spark drops part of an RDD based on the estimated size of each block, and it estimates the size of a block depending on the selected persistence level.
In the case of MEMORY_ONLY or MEMORY_AND_DISK, the estimate is done by walking the Java object graph.
This process doesn't take into account the off-heap memory used by DL4J and ND4J, so Spark underestimates the true size of objects like DataSets or INDArrays.
39. Spark and Large Off-heap Objects
When deciding between keeping or dropping blocks, Spark considers only the amount of heap memory used.
Since DataSet and INDArray objects have a very small on-heap size, Spark will keep too many of them, causing out-of-memory issues as off-heap memory becomes exhausted.
40. Spark and Large Off-heap Objects
It is therefore good practice to use MEMORY_ONLY_SER or MEMORY_AND_DISK_SER when persisting an RDD<DataSet> or an RDD<INDArray>.
This way Spark stores blocks on the JVM heap in serialized form. Because there is no off-heap memory for the serialized objects, Spark can accurately estimate their size, thus avoiding out-of-memory issues.
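A minimal sketch, assuming trainingData is an existing RDD[DataSet]:

import org.apache.spark.storage.StorageLevel

// Serialized on-heap storage: block sizes can now be estimated accurately
trainingData.persist(StorageLevel.MEMORY_ONLY_SER)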
42. Convolutional Neural Network (CNN)
By Aphex34 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45679374
43. Image Pipeline Data Preparation
Single machine: you can use DataVec's ImageRecordReader.
Spark cluster: image preprocessing.
44. Image Pipeline Data Preparation
The Spark strategy assumes the images are in subdirectories based on their class labels. Example:
imageRootDir/car/img0.jpg
imageRootDir/car/img1.jpg
...
imageRootDir/truck/img0.jpg
imageRootDir/truck/img1.jpg
...
imageRootDir/motorbike/img0.jpg
imageRootDir/motorbike/img1.jpg
...
45. Image Pipeline Data Preparation (Spark)
The approach is to preprocess the images into batches of files
(ND4J’s FileBatch objects).
The motivation: the original image files typically use efficient compression (JPEG, PNG, etc.), which is much more space- and network-efficient than a bitmap representation. However, on a cluster we want to minimize disk reads due to latency issues with remote storage: a single file read/transfer is faster than multiple remote file reads.
46. Image Pipeline Data Preparation (Spark)
Step 1 (option 1): Preprocess images locally.
// createFileBatchesLocal expects java.io.File arguments
val sourceDirectory = new File("/home/guglielmo/training_images")
val destinationDirectory = new File("/home/guglielmo/preprocessed_images")
val batchSize = 32
SparkDataUtils.createFileBatchesLocal(sourceDirectory, NativeImageLoader.ALLOWED_FORMATS, true, destinationDirectory, batchSize)
After the preprocessing completes, the destination directory can be copied to the cluster.
47. Image Pipeline Data Preparation (Spark)
Step 1 (option 2): Preprocess images using Spark.
val sourceDirectory = "hdfs:///data/training_images"
val destinationDirectory = "hdfs:///data/preprocessed_images"
val batchSize = 32
SparkDataUtils.createFileBatchesSpark(sourceDirectory, destinationDirectory, batchSize, sparkContext)
48. Image Pipeline Data Preparation (Spark)
Step 2: Create a data loader...
val imageHeightWidth = 64
val imageChannels = 3
val labelMaker: PathLabelGenerator = new ParentPathLabelGenerator
val rr = new ImageRecordReader(imageHeightWidth, imageHeightWidth, imageChannels, labelMaker)
rr.setLabels(new TinyImageNetDataSetIterator(1).getLabels)
val numClasses = TinyImageNetFetcher.NUM_LABELS
val minibatch = 32 // minibatch size; not defined in the original slide
val loader = new RecordReaderFileBatchLoader(rr, minibatch, 1, numClasses)
loader.setPreProcessor(new ImagePreProcessingScaler)
49. Image Pipeline Data Preparation (Spark)
Step 2: ...and finally train the model
val trainDataPath = "hdfs:///data/preprocessed_images"
val pathsTrain: JavaRDD[String] = SparkUtils.listPaths(sparkContext, trainDataPath)
for (i <- 0 until numEpochs) {
  sparkNet.fitPaths(pathsTrain, loader)
}
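For completeness, a minimal sketch of how sparkNet could be built (networkConfig and the training master settings are illustrative assumptions, not part of the original slides):

import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster

// Parameter averaging training master; 32 = examples per DataSet object in the RDD
val trainingMaster = new ParameterAveragingTrainingMaster.Builder(32)
  .batchSizePerWorker(32)
  .build()
val sparkNet = new SparkDl4jMultiLayer(sparkContext, networkConfig, trainingMaster)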
50. All the Details on DL4J and Spark in my Book
http://tinyurl.com/y9jkvtuy
51. Thanks!
Any questions?
You can find me at
@guglielmoiozzia
https://ie.linkedin.com/in/giozzia
googlielmo.blogspot.com
52. Credits
Special thanks to all the people who made and
released these awesome resources for free:
×Presentation template by SlidesCarnival
Editor's Notes
The field of artificial intelligence is essentially when machines can do tasks that typically require human intelligence. It encompasses machine learning, where machines can learn by experience and acquire skills without human involvement. Deep learning is a subset of machine learning where artificial neural networks, algorithms inspired by the human brain, learn from large amounts of data. Similarly to how we learn from experience, the deep learning algorithm would perform a task repeatedly, each time tweaking it a little to improve the outcome. We refer to ‘deep learning’ because the neural networks have various (deep) layers that enable learning. Just about any problem that requires “thought” to figure out is a problem deep learning can learn to solve.