SlideShare a Scribd company logo
Why Scala for Data
Science?
HELLO!
I am Guglielmo
Iozzia
I am here because I love AI and the
With the Best conference series
You can follow me at
@GuglielmoIozzia
2
Something about me
✘ Big Data Delivery Lead at
(UHG)
✘ Previously at and of the UN
✘ Current fields of expertise are Big
Data, ML/DL and DevOps
✘ Author of the upcoming book “Hands-
on Deep Learning with Apache Spark”
✘ I love preparing
home-made pizza3
What is Scala?
Let’s get everyone on the same
page
The Scala PL
Scala is a programming language
that blends object-oriented and
functional programming concepts on
the JVM.
5
Functional Programming
✘ In FP you write pure functions.
✘ Given the same input, a function
always return the same output,
producing no side effect.
✘ A function is first-class: it can be used
like any other type.
✘ That means that it can be assigned to
a variable, passed as a parameter to
another function or returned by a
function.6
Place your screenshot here
Functional Programming
in Scala
An example of
functional
programming in
Scala.
7
Why Scala for Data
Science?
Let’s move towards the main topic
of this talk
The Python’s Temptation
When it comes to Data Science the first programming
language people take into consideration is Python.
9
Here are three valid reasons to
consider Scala.
10
#1 Robustness
Robustness and performance when it
comes to production system and
large datasets.
11
#2 Integration
Most part of the systems/tools in the
Big Data/ML space run on the JVM.
12
Think about these systems
you most probably have in
your production tech stack.
They all run in JVMs.
13
#3 Libraries
Good availability of ready to
production Open Source ML/DL
frameworks and libraries.
14
Scala Open Source Projects for AI/ML/DL
✘ Spark MLlib: Spark’s library for ML
algorithms, feature extraction,
dimensionality reduction, linear
algebra, etc.
✘ ND4J: a linear algebra and matrix
manipulation library which supports n-
dimensional arrays and it is integrated
with Apache Hadoop and Spark.
15
Scala Open Source Projects for AI/ML/DL
✘ DeepLearning4J: a distributed deep-
learning framework written for Java
and Scala. It is integrated with Hadoop
and Apache Spark, for use on
distributed GPUs and CPUs.
✘ BigDL: a distributed deep learning
framework for Apache Spark, created
at Intel.
16
Scala Open Source Projects for AI/ML/DL
✘ XGBoost: a scalable, portable and
distributed Gradient Boosting library.
✘ PredictionIO: an Apache template
system for creating machine learning
engines.
✘ Smile: a fast and comprehensive
machine learning system.
✘ Saddle: a high-performance data
manipulation library.17
Scala Open Source Projects for AI/ML/DL
✘ Deeplearning.scala: a simple library
for creating complex neural networks.
It can be used either in standalone
JVM applications or Jupyter
Notebooks.
✘ ScalaNLP: a suite of ML and
numerical computing libraries. It
includes Breeze and Epic.
18
Code Examples
Let’s get practical!
object Nd4JScalaSample {
def main (args: Array[String]) {
// Create arrays using the numpy syntax
var arr1 = Nd4j.create(4)
val arr2 = Nd4j.linspace(1, 10, 10)
// Fill an array with the value 5 (equivalent to fill method in numpy)
println(arr1.assign(5) + "Assigned value of 5 to the array")
// Basic stats methods
println(Nd4j.mean(arr1) + "Calculate mean of array")
println(Nd4j.std(arr2) + "Calculate standard deviation of array")
println(Nd4j.`var`(arr2), "Calculate variance")
...
ND4J Example
ND4J tries to fill the
gap between JVM
languages and
Python
programmers in
terms of availability
of powerful data
analysis tools.
20
Place your screenshot here
DL4J Example (1 of 3)
Multilayer Neural
Network
configuration in
Scala with DL4J.
21
Place your screenshot here
DL4J Example (2 of 3)
Network
initialization and
training in Scala
with DL4J.
22
Place your screenshot here
DL4J Example (3 of 3)
The DL4J web UI
(training time).
23
Can Scala and Python
co-exist in Data Science
projects?
Is there any bridge between this
two worlds?
139,000
The result of a search on Google about MNN models
implemented through Tensorflow
8,330,000
The result of a generic search on Google about models
implemented through Tensorflow
120,000
The result of a search on Google about MNN examples
implemented through Tensorflow
25
Tensorflow Pros and Cons
✘ Big community
✘ Lots of models, example and use
cases available
✘ Stunning features
Mostly Python. The Java API is currently
experimental and is not covered by the
TensorFlow API stability guarantees.
26
Keras to the Rescue
✘ It is an open source neural network
library written in Python
✘ It can run on top of TensorFlow (and
other backend engines)
✘ Easy prototyping
✘ Lightweight
✘ Can be used to import Python models
to DL4J
27
TensorFlow + Keras + DL4J
28
Place your screenshot here
Importing Keras Models
into DL4J: example
DL4J provides
Java/Scala API to
import a pre-trained
TensorFlow model
through Keras.
29
Place your screenshot here
Importing Keras Models
into DL4J: example
The imported model
can then be used in
a DL4J application
implemented
through Java or
Scala only.
30
Conclusion
Bridging the Gap between Data
Engineers and Data Scientists
The Missing Link
Data Engineers
• Scala/Java skills
and experience
• Hands-on Big Data
and Streaming tools
(Hadoop, HBase,
Spark, Kafka, Beam,
etc.)
• DevOps mindset
• Attention on testing,
performance,
scalability
• Containerization
• Often no skills in
ML/DL
Data Scientist
• Strong ML/DL skills
• Python and R users
• Good data
understanding
• Model training and
evaluating strategies
• Probably knowledge
on Big Data and
Streaming tools
• No DevOps mindset
• Research more than
production
32
To Leaverage the Specific Skills of Each Team
DL4J
Keras
TensorFlow
Data Engineers Data Scientists
33
To Leaverage the Specific Skills of Each Team
Keras
Scala
(DL4J)
TensorFlow
(Python)
34
Place your screenshot here
Hands-on Deep Learning
with Apache Spark
More on some topics
covered in this talk
can be found in this
book.
https://tinyurl.com/y9jkvtuy
35
THANK
YOU!
Any questions?
You can find me at
✘ @GuglielmoIozzia
✘ https://ie.linkedin.com/in/giozzia
✘ googlielmo.blogspot.com/
✘ https://dzone.com/users/253294
8/virtualramblas.html
36
Credits
Special thanks to all the people who made
and released these awesome resources for
free:
✘ Presentation template by SlidesCarnival
✘ The painting in slide 9 is a detail of “Eve
Tempted” (1887) by John Roddam
Spencer Stanhope
37

More Related Content

What's hot

Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
Spark Summit
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Databricks
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016
Adam Gibson
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
Databricks
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Sri Ambati
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaEdureka!
 
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Spark Summit
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
Adam Gibson
 
EclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache SparkEclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache Spark
Jen Aman
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
Adam Gibson
 
Productionizing Machine Learning Pipelines with Databricks and Azure ML
Productionizing Machine Learning Pipelines with Databricks and Azure MLProductionizing Machine Learning Pipelines with Databricks and Azure ML
Productionizing Machine Learning Pipelines with Databricks and Azure ML
Databricks
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Spark Summit
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
Spark Summit
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit
 
Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
Uwe Korn
 
Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...
Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...
Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...
Databricks
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
Wes McKinney
 
Operationalize Apache Spark Analytics
Operationalize Apache Spark AnalyticsOperationalize Apache Spark Analytics
Operationalize Apache Spark Analytics
Databricks
 

What's hot (20)

Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & Scala
 
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
 
EclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache SparkEclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache Spark
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
 
Productionizing Machine Learning Pipelines with Databricks and Azure ML
Productionizing Machine Learning Pipelines with Databricks and Azure MLProductionizing Machine Learning Pipelines with Databricks and Azure ML
Productionizing Machine Learning Pipelines with Databricks and Azure ML
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
 
Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
 
Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...
Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...
Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner ...
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
 
Operationalize Apache Spark Analytics
Operationalize Apache Spark AnalyticsOperationalize Apache Spark Analytics
Operationalize Apache Spark Analytics
 

Similar to Why scala for data science

Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
Mohammad Hossein Rimaz
 
Suneel Marthi - Deep Learning with Apache Flink and DL4J
Suneel Marthi - Deep Learning with Apache Flink and DL4JSuneel Marthi - Deep Learning with Apache Flink and DL4J
Suneel Marthi - Deep Learning with Apache Flink and DL4J
Flink Forward
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
Databricks
 
Deep Learning for Java Developer - Getting Started
Deep Learning for Java Developer - Getting StartedDeep Learning for Java Developer - Getting Started
Deep Learning for Java Developer - Getting Started
Suyash Joshi
 
Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU
Andrés Leonardo Martinez Ortiz
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
Databricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4j
Guglielmo Iozzia
 
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Codemotion
 
Spark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark GroupSpark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark Group
Phaneendra Chiruvella
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Amanda Casari
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
Mark Tabladillo
 
Sparklife - Life In The Trenches With Spark
Sparklife - Life In The Trenches With SparkSparklife - Life In The Trenches With Spark
Sparklife - Life In The Trenches With Spark
Ian Pointer
 
Sequoia Spark Talk March 2015.pdf
Sequoia Spark Talk March 2015.pdfSequoia Spark Talk March 2015.pdf
Sequoia Spark Talk March 2015.pdf
totomeme1991
 
Top Deep Learning Frameworks.pdf
Top Deep Learning Frameworks.pdfTop Deep Learning Frameworks.pdf
Top Deep Learning Frameworks.pdf
Appdeveloper10
 
Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018
Holden Karau
 
Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
Databricks
 
Scala final ppt vinay
Scala final ppt vinayScala final ppt vinay
Scala final ppt vinay
Viplav Jain
 
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Codemotion
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
Kaxil Naik
 

Similar to Why scala for data science (20)

Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
Suneel Marthi - Deep Learning with Apache Flink and DL4J
Suneel Marthi - Deep Learning with Apache Flink and DL4JSuneel Marthi - Deep Learning with Apache Flink and DL4J
Suneel Marthi - Deep Learning with Apache Flink and DL4J
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
Deep Learning for Java Developer - Getting Started
Deep Learning for Java Developer - Getting StartedDeep Learning for Java Developer - Getting Started
Deep Learning for Java Developer - Getting Started
 
Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4j
 
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
 
Spark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark GroupSpark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark Group
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
 
Sparklife - Life In The Trenches With Spark
Sparklife - Life In The Trenches With SparkSparklife - Life In The Trenches With Spark
Sparklife - Life In The Trenches With Spark
 
Sequoia Spark Talk March 2015.pdf
Sequoia Spark Talk March 2015.pdfSequoia Spark Talk March 2015.pdf
Sequoia Spark Talk March 2015.pdf
 
Top Deep Learning Frameworks.pdf
Top Deep Learning Frameworks.pdfTop Deep Learning Frameworks.pdf
Top Deep Learning Frameworks.pdf
 
Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018
 
Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
 
Scala final ppt vinay
Scala final ppt vinayScala final ppt vinay
Scala final ppt vinay
 
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 

Recently uploaded

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 

Recently uploaded (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 

Why scala for data science

  • 1. Why Scala for Data Science?
  • 2. HELLO! I am Guglielmo Iozzia I am here because I love AI and the With the Best conference series You can follow me at @GuglielmoIozzia 2
  • 3. Something about me ✘ Big Data Delivery Lead at (UHG) ✘ Previously at and of the UN ✘ Current fields of expertise are Big Data, ML/DL and DevOps ✘ Author of the upcoming book “Hands- on Deep Learning with Apache Spark” ✘ I love preparing home-made pizza3
  • 4. What is Scala? Let’s get everyone on the same page
  • 5. The Scala PL Scala is a programming language that blends object-oriented and functional programming concepts on the JVM. 5
  • 6. Functional Programming ✘ In FP you write pure functions. ✘ Given the same input, a function always return the same output, producing no side effect. ✘ A function is first-class: it can be used like any other type. ✘ That means that it can be assigned to a variable, passed as a parameter to another function or returned by a function.6
  • 7. Place your screenshot here Functional Programming in Scala An example of functional programming in Scala. 7
  • 8. Why Scala for Data Science? Let’s move towards the main topic of this talk
  • 9. The Python’s Temptation When it comes to Data Science the first programming language people take into consideration is Python. 9
  • 10. Here are three valid reasons to consider Scala. 10
  • 11. #1 Robustness Robustness and performance when it comes to production system and large datasets. 11
  • 12. #2 Integration Most part of the systems/tools in the Big Data/ML space run on the JVM. 12
  • 13. Think about these systems you most probably have in your production tech stack. They all run in JVMs. 13
  • 14. #3 Libraries Good availability of ready to production Open Source ML/DL frameworks and libraries. 14
  • 15. Scala Open Source Projects for AI/ML/DL ✘ Spark MLlib: Spark’s library for ML algorithms, feature extraction, dimensionality reduction, linear algebra, etc. ✘ ND4J: a linear algebra and matrix manipulation library which supports n- dimensional arrays and it is integrated with Apache Hadoop and Spark. 15
  • 16. Scala Open Source Projects for AI/ML/DL ✘ DeepLearning4J: a distributed deep- learning framework written for Java and Scala. It is integrated with Hadoop and Apache Spark, for use on distributed GPUs and CPUs. ✘ BigDL: a distributed deep learning framework for Apache Spark, created at Intel. 16
  • 17. Scala Open Source Projects for AI/ML/DL ✘ XGBoost: a scalable, portable and distributed Gradient Boosting library. ✘ PredictionIO: an Apache template system for creating machine learning engines. ✘ Smile: a fast and comprehensive machine learning system. ✘ Saddle: a high-performance data manipulation library.17
  • 18. Scala Open Source Projects for AI/ML/DL ✘ Deeplearning.scala: a simple library for creating complex neural networks. It can be used either in standalone JVM applications or Jupyter Notebooks. ✘ ScalaNLP: a suite of ML and numerical computing libraries. It includes Breeze and Epic. 18
  • 20. object Nd4JScalaSample { def main (args: Array[String]) { // Create arrays using the numpy syntax var arr1 = Nd4j.create(4) val arr2 = Nd4j.linspace(1, 10, 10) // Fill an array with the value 5 (equivalent to fill method in numpy) println(arr1.assign(5) + "Assigned value of 5 to the array") // Basic stats methods println(Nd4j.mean(arr1) + "Calculate mean of array") println(Nd4j.std(arr2) + "Calculate standard deviation of array") println(Nd4j.`var`(arr2), "Calculate variance") ... ND4J Example ND4J tries to fill the gap between JVM languages and Python programmers in terms of availability of powerful data analysis tools. 20
  • 21. Place your screenshot here DL4J Example (1 of 3) Multilayer Neural Network configuration in Scala with DL4J. 21
  • 22. Place your screenshot here DL4J Example (2 of 3) Network initialization and training in Scala with DL4J. 22
  • 23. Place your screenshot here DL4J Example (3 of 3) The DL4J web UI (training time). 23
  • 24. Can Scala and Python co-exist in Data Science projects? Is there any bridge between this two worlds?
  • 25. 139,000 The result of a search on Google about MNN models implemented through Tensorflow 8,330,000 The result of a generic search on Google about models implemented through Tensorflow 120,000 The result of a search on Google about MNN examples implemented through Tensorflow 25
  • 26. Tensorflow Pros and Cons ✘ Big community ✘ Lots of models, example and use cases available ✘ Stunning features Mostly Python. The Java API is currently experimental and is not covered by the TensorFlow API stability guarantees. 26
  • 27. Keras to the Rescue ✘ It is an open source neural network library written in Python ✘ It can run on top of TensorFlow (and other backend engines) ✘ Easy prototyping ✘ Lightweight ✘ Can be used to import Python models to DL4J 27
  • 28. TensorFlow + Keras + DL4J 28
  • 29. Place your screenshot here Importing Keras Models into DL4J: example DL4J provides Java/Scala API to import a pre-trained TensorFlow model through Keras. 29
  • 30. Place your screenshot here Importing Keras Models into DL4J: example The imported model can then be used in a DL4J application implemented through Java or Scala only. 30
  • 31. Conclusion Bridging the Gap between Data Engineers and Data Scientists
  • 32. The Missing Link Data Engineers • Scala/Java skills and experience • Hands-on Big Data and Streaming tools (Hadoop, HBase, Spark, Kafka, Beam, etc.) • DevOps mindset • Attention on testing, performance, scalability • Containerization • Often no skills in ML/DL Data Scientist • Strong ML/DL skills • Python and R users • Good data understanding • Model training and evaluating strategies • Probably knowledge on Big Data and Streaming tools • No DevOps mindset • Research more than production 32
  • 33. To Leaverage the Specific Skills of Each Team DL4J Keras TensorFlow Data Engineers Data Scientists 33
  • 34. To Leaverage the Specific Skills of Each Team Keras Scala (DL4J) TensorFlow (Python) 34
  • 35. Place your screenshot here Hands-on Deep Learning with Apache Spark More on some topics covered in this talk can be found in this book. https://tinyurl.com/y9jkvtuy 35
  • 36. THANK YOU! Any questions? You can find me at ✘ @GuglielmoIozzia ✘ https://ie.linkedin.com/in/giozzia ✘ googlielmo.blogspot.com/ ✘ https://dzone.com/users/253294 8/virtualramblas.html 36
  • 37. Credits Special thanks to all the people who made and released these awesome resources for free: ✘ Presentation template by SlidesCarnival ✘ The painting in slide 9 is a detail of “Eve Tempted” (1887) by John Roddam Spencer Stanhope 37