Why Scala for Data Science

HELLO!
I am Guglielmo Iozzia
I am here because I love AI and the With the Best conference series
You can follow me at @GuglielmoIozzia

Something about me
✘ Big Data Delivery Lead at (UHG)
✘ Previously at and of the UN
✘ Current fields of expertise are Big Data, ML/DL and DevOps
✘ Author of the upcoming book “Hands-on Deep Learning with Apache Spark”
✘ I love preparing home-made pizza

What is Scala?
Let’s get everyone on the same page

The Scala PL
Scala is a programming language that blends object-oriented and functional programming concepts on the JVM.

Functional Programming
✘ In FP you write pure functions.
✘ Given the same input, a pure function always returns the same output and produces no side effects.
✘ A function is first-class: it can be used like any other value.
✘ That means it can be assigned to a variable, passed as a parameter to another function, or returned by a function.

Functional Programming in Scala
An example of functional programming in Scala.
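As a minimal sketch of the ideas above (not the code from the original slide), the snippet below defines a pure function, assigns a function to a value, passes a function as a parameter, and returns a function from a function. All names are illustrative.

```scala
object FunctionalSample {
  // A pure function: the result depends only on the argument, with no side effects.
  def square(x: Int): Int = x * x

  // A function assigned to a value, like any other value.
  val double: Int => Int = x => x * 2

  // A function passed as a parameter to another function.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // A function returned by a function.
  def adder(n: Int): Int => Int = x => x + n

  def main(args: Array[String]): Unit = {
    println(applyTwice(double, 3))     // 12
    println(List(1, 2, 3).map(square)) // List(1, 4, 9)
    println(adder(10)(5))              // 15
  }
}
```

Because `square`, `double` and `adder` have no side effects, they can be combined and tested in isolation, which is one reason this style suits data pipelines.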

Why Scala for Data Science?
Let’s move towards the main topic of this talk

The Python’s Temptation
When it comes to Data Science, the first programming language people consider is Python.

Here are three valid reasons to consider Scala.

#1 Robustness
Robustness and performance when it comes to production systems and large datasets.

#2 Integration
Most of the systems and tools in the Big Data/ML space run on the JVM.

Think about these systems you most probably have in your production tech stack. They all run in JVMs.

#3 Libraries
Good availability of production-ready open source ML/DL frameworks and libraries.

Scala Open Source Projects for AI/ML/DL
✘ Spark MLlib: Spark’s library for ML algorithms, feature extraction, dimensionality reduction, linear algebra, etc.
✘ ND4J: a linear algebra and matrix manipulation library that supports n-dimensional arrays and is integrated with Apache Hadoop and Spark.

✘ DeepLearning4J: a distributed deep learning framework written for Java and Scala. It is integrated with Hadoop and Apache Spark, for use on distributed GPUs and CPUs.
✘ BigDL: a distributed deep learning framework for Apache Spark, created at Intel.

✘ XGBoost: a scalable, portable and distributed Gradient Boosting library.
✘ PredictionIO: an Apache template system for creating machine learning engines.
✘ Smile: a fast and comprehensive machine learning system.
✘ Saddle: a high-performance data manipulation library.

✘ Deeplearning.scala: a simple library for creating complex neural networks. It can be used either in standalone JVM applications or Jupyter Notebooks.
✘ ScalaNLP: a suite of ML and numerical computing libraries. It includes Breeze and Epic.

Code Examples
Let’s get practical!

import org.nd4j.linalg.factory.Nd4j

object Nd4JScalaSample {
  def main(args: Array[String]): Unit = {
    // Create arrays with a NumPy-like syntax
    val arr1 = Nd4j.create(4)
    val arr2 = Nd4j.linspace(1, 10, 10)
    // Fill an array with the value 5 (equivalent to the fill method in NumPy)
    println(s"${arr1.assign(5)} Assigned value of 5 to the array")
    // Basic stats methods (var is a Scala keyword, hence the backticks)
    println(s"${Nd4j.mean(arr1)} Calculate mean of array")
    println(s"${Nd4j.std(arr2)} Calculate standard deviation of array")
    println(s"${Nd4j.`var`(arr2)} Calculate variance")
    ...
ND4J Example
ND4J tries to fill the gap between JVM languages and Python programmers in terms of availability of powerful data analysis tools.

DL4J Example (1 of 3)
Multilayer Neural Network configuration in Scala with DL4J.

Network initialization and training in Scala with DL4J.
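A minimal sketch of what the configuration, initialization and training steps can look like in Scala, assuming DL4J 1.0.0-beta or later is on the classpath. The layer sizes, hyperparameters, the MNIST data set and the object name are illustrative assumptions, not the code from the original slides.

```scala
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
import org.deeplearning4j.nn.conf.{MultiLayerConfiguration, NeuralNetConfiguration}
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.learning.config.Adam
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

object Dl4jMlpSketch {
  // Hypothetical hyperparameters, chosen only for illustration
  val seed = 123
  val numInputs = 28 * 28
  val numHidden = 128
  val numClasses = 10

  // Step 1: describe the network (one hidden layer plus a softmax output layer)
  def buildConf(): MultiLayerConfiguration =
    new NeuralNetConfiguration.Builder()
      .seed(seed)
      .weightInit(WeightInit.XAVIER)
      .updater(new Adam(1e-3))
      .list()
      .layer(0, new DenseLayer.Builder()
        .nIn(numInputs).nOut(numHidden)
        .activation(Activation.RELU).build())
      .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(numHidden).nOut(numClasses)
        .activation(Activation.SOFTMAX).build())
      .build()

  def main(args: Array[String]): Unit = {
    // Step 2: initialize the network from its configuration
    val model = new MultiLayerNetwork(buildConf())
    model.init()
    // Step 3: train for one pass over the MNIST training set (downloaded on first use)
    val trainData = new MnistDataSetIterator(64, true, seed)
    model.fit(trainData)
  }
}
```

Keeping the configuration in its own function separates the network description from the training loop, so the same configuration can be reused, for example, for distributed training.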

The DL4J web UI (training time).

Can Scala and Python co-exist in Data Science projects?
Is there any bridge between these two worlds?

139,000
The number of Google search results for MNN models implemented through TensorFlow
8,330,000
The number of Google search results for a generic search about models
120,000
The number of Google search results for MNN examples

TensorFlow Pros and Cons
✘ Big community
✘ Lots of models, examples and use cases available
✘ Stunning features
Mostly Python. The Java API is currently experimental and is not covered by the TensorFlow API stability guarantees.

Keras to the Rescue
✘ It is an open source neural network library written in Python
✘ It can run on top of TensorFlow (and other backend engines)
✘ Easy prototyping
✘ Lightweight
✘ Can be used to import Python models to DL4J

Importing Keras Models into DL4J: example
DL4J provides a Java/Scala API to import a pre-trained TensorFlow model through Keras.

Importing Keras Models into DL4J: example
The imported model can then be used in a DL4J application implemented through Java or Scala only.
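A minimal sketch of the import step, assuming the `deeplearning4j-modelimport` module is on the classpath and that the Python side saved a Keras Sequential model (architecture plus weights) to a single HDF5 file. The object name, helper names and file path are hypothetical, and this is not the code from the original slides.

```scala
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.api.ndarray.INDArray

object KerasImportSketch {
  // Loads a Keras Sequential model (architecture + weights) from one HDF5 file
  def loadModel(h5Path: String): MultiLayerNetwork =
    KerasModelImport.importKerasSequentialModelAndWeights(h5Path)

  // Runs inference with the imported network on a batch of input features
  def predict(model: MultiLayerNetwork, features: INDArray): INDArray =
    model.output(features)

  def main(args: Array[String]): Unit = {
    // Hypothetical usage: the path to the .h5 file is the first program argument
    val model = loadModel(args(0))
    println(model.summary())
  }
}
```

Once loaded, the network is a regular `MultiLayerNetwork`, so the serving code needs no Python runtime.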

Conclusion
Bridging the Gap between Data Engineers and Data Scientists

The Missing Link
Data Engineers
• Scala/Java skills and experience
• Hands-on with Big Data and Streaming tools (Hadoop, HBase, Spark, Kafka, Beam, etc.)
• DevOps mindset
• Attention to testing, performance, scalability
• Containerization
• Often no skills in ML/DL
Data Scientists
• Strong ML/DL skills
• Python and R users
• Good data understanding
• Model training and evaluation strategies
• Probably some knowledge of Big Data and Streaming tools
• No DevOps mindset
• Research more than production

To Leverage the Specific Skills of Each Team
DL4J
Keras
TensorFlow
Data Engineers Data Scientists

To Leverage the Specific Skills of Each Team
Keras
Scala
(DL4J)
TensorFlow
(Python)

Hands-on Deep Learning with Apache Spark
More on some topics covered in this talk can be found in this book.
https://tinyurl.com/y9jkvtuy

THANK YOU!
Any questions?
You can find me at
✘ @GuglielmoIozzia
✘ https://ie.linkedin.com/in/giozzia
✘ googlielmo.blogspot.com/
✘ https://dzone.com/users/2532948/virtualramblas.html

Credits
Special thanks to all the people who made and released these awesome resources for free:
✘ Presentation template by SlidesCarnival
✘ The painting in slide 9 (“The Python’s Temptation”) is a detail of “Eve Tempted” (1887) by John Roddam Spencer Stanhope
