Future of AI on the JVM
Scala Days Amsterdam 2015
Adam Gibson
Creator of Deeplearning4j (and 4s :)
What is AI?
● Not Terminator (despite our name)
● Many subfields
● Our focus: Machine learning
Big Data?
Problem Space
● Spam Classification
● Summarization
● Face Detection
● Eye Tracking
● Targeted Ads
● Recommendation Engines
Current State of ML
● Simpler models
● Most of industry barely uses logistic regression
● Many problems are binary
o e.g. fraud, spam
● Some unsupervised (clustering, recommendations)
● Lots of ML frameworks on JVM
ML Frameworks on JVM...
● Apache Mahout
● Spark’s MLlib
● Weka (is that R?)
ML GUIs
● Prediction.io
● Encog
Problems
● Monolithic
● Makes assumptions about data
● Hard to use
● No separation of concerns
Ring a Bell?
● We call that “Monolithic”
● Separate ML concerns:
Data Pipelines/Vectorization
Scoring
Model Training
Evaluation
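One way to picture the separation listed above is as small interfaces that know nothing about each other. The following is a minimal sketch, not Deeplearning4j's API: the names `Pipeline`, `Vectorizer`, `Model`, `compose` and `demo` are hypothetical, and the hand-written rule only stands in for a trained model.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: vectorization and scoring kept as separate, swappable pieces. */
final class Pipeline {
    /** Vectorization: raw record to feature vector. */
    interface Vectorizer<R> { double[] vectorize(R raw); }

    /** Scoring: feature vector to a number. Knows nothing about raw data. */
    interface Model { double score(double[] features); }

    /** Glue code: chains the two stages without either depending on the other. */
    static <R> ToDoubleFunction<R> compose(Vectorizer<R> v, Model m) {
        return raw -> m.score(v.vectorize(raw));
    }

    /** Toy spam check: features are [text length, number of '!' characters]. */
    static double demo(String text) {
        Vectorizer<String> v = s -> new double[] {
            s.length(), s.chars().filter(ch -> ch == '!').count()
        };
        // Assumption: a fixed rule standing in for a model produced by a training step.
        Model m = f -> f[1] >= 3 ? 1.0 : 0.0;
        return compose(v, m).applyAsDouble(text);
    }
}
```

Because each stage sits behind its own interface, the vectorizer can be replaced (say, for audio instead of text) without touching the model, and the reverse.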
Micro-Services + ML?
● Kinda like micro-services
● Reduce lock-in
● Take math, data cleaning, model training, choosing algorithms ...
● … and separate them
Math
● Parametric Models (Matrices!)
● Non Parametric (Random forest)
● Focusing on Matrices (the hard part of ML systems)
Matrices
● NDArrays (more than 2 dimensions)
● Tensors (think of pages of matrices)
● Example: 2 x 2 x 2 (two 2 x 2 matrices)
● Applies to graphs w/ sparse representations
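The "pages of matrices" picture can be made concrete with a flat array and an index formula. This is a hedged sketch under stated assumptions: `Tensor3` is a hypothetical class, row-major layout is an assumption, and none of this is ND4J's actual API.

```java
/** Hypothetical sketch: a rank-3 tensor stored in one flat, row-major array. */
final class Tensor3 {
    final int pages, rows, cols;
    final double[] data;

    Tensor3(int pages, int rows, int cols) {
        this.pages = pages;
        this.rows = rows;
        this.cols = cols;
        this.data = new double[pages * rows * cols];
    }

    /** Maps (page, row, col) to a position in the flat buffer. */
    int offset(int p, int r, int c) {
        return (p * rows + r) * cols + c;
    }

    double get(int p, int r, int c) { return data[offset(p, r, c)]; }

    void set(int p, int r, int c, double v) { data[offset(p, r, c)] = v; }

    /** Copies one page out as an ordinary 2-D matrix. */
    double[][] page(int p) {
        double[][] m = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                m[r][c] = get(p, r, c);
            }
        }
        return m;
    }

    /** Builds a 2 x 2 x 2 tensor filled with 1..8 in storage order. */
    static Tensor3 example() {
        Tensor3 t = new Tensor3(2, 2, 2);
        for (int i = 0; i < t.data.length; i++) {
            t.data[i] = i + 1;
        }
        return t;
    }
}
```

With this layout, `Tensor3.example().page(0)` holds the values 1 to 4 and `page(1)` holds 5 to 8: two 2 x 2 matrices stacked as pages.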
Chips/Hardware/Matrices
● CPUs - We work with these
● GPUs - CUDA ditto
● FPGAs
o Intel bought Altera, an FPGA maker, for $17 billion this month
o The edge, the cloud
Why New Chips?
● See the numbers yourself:
● http://www.slideshare.net/airbots/cuda-29330283
● http://devblogs.nvidia.com/parallelforall/bidmach-machine-learning-limit-gpus/
● http://jcuda.org
Mixed clusters
● GPUs aren’t good for all workloads
● Because latency
● Need to upload data: not good for small problems
● Mixed CPU/GPU clusters are best bet
Data Pipelines
● More data will be binary
● Frameworks today can’t process binary well
● Binary data has different semantics
● Moving windows for audio
● 3d for images ...
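As an illustration of "moving windows for audio", a raw signal can be cut into fixed-size, possibly overlapping frames before any model sees it. This is a minimal sketch: `MovingWindow`, `windows`, `size` and `stride` are hypothetical names, and dropping the trailing partial window is a design assumption rather than anything the slides specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: slice a 1-D signal into fixed-size moving windows. */
final class MovingWindow {
    /**
     * Returns windows of length `size`, advancing `stride` samples each step.
     * Assumption: a trailing window shorter than `size` is dropped.
     */
    static List<double[]> windows(double[] signal, int size, int stride) {
        if (size <= 0 || stride <= 0) {
            throw new IllegalArgumentException("size and stride must be positive");
        }
        List<double[]> out = new ArrayList<>();
        for (int start = 0; start + size <= signal.length; start += stride) {
            double[] w = new double[size];
            System.arraycopy(signal, start, w, 0, size);
            out.add(w);
        }
        return out;
    }
}
```

Each window becomes one row of a matrix, which is how a stream of samples turns into something a matrix-based model can consume.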
People Roll Their Own b/c
● Current frameworks assume clean data :(
● Pipelines are brittle, hard to maintain
● Moving towards being composable (reuse)
Dedicated Libraries
● Let’s focus on vectorization -- now!
● Because IoT
● Because more access to raw media
● Should fit into current big data frameworks
Scoring
● AUC
● F1
● Different Loss Functions
● Hyper parameter optimization
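Two of the metrics named above need only labels and predictions, which is why they can live apart from any particular model. This is a sketch under assumptions: `Scoring`, `f1` and `auc` are hypothetical names, labels are 0 or 1, both arrays have equal length, and the AUC uses the simple pairwise definition (quadratic time) rather than a sorted, production-grade version.

```java
/** Hypothetical sketch: model-independent binary classification metrics. */
final class Scoring {
    /** F1 = 2*TP / (2*TP + FP + FN), with 1 as the positive label. */
    static double f1(int[] actual, int[] predicted) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 1 && actual[i] == 1) tp++;
            else if (predicted[i] == 1 && actual[i] == 0) fp++;
            else if (predicted[i] == 0 && actual[i] == 1) fn++;
        }
        int denom = 2 * tp + fp + fn;
        return denom == 0 ? 0.0 : (2.0 * tp) / denom;
    }

    /**
     * AUC as the fraction of (positive, negative) pairs where the positive
     * example gets the higher score; ties count as half.
     */
    static double auc(int[] actual, double[] scores) {
        double correct = 0.0;
        long pairs = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != 1) continue;
            for (int j = 0; j < actual.length; j++) {
                if (actual[j] != 0) continue;
                pairs++;
                if (scores[i] > scores[j]) correct += 1.0;
                else if (scores[i] == scores[j]) correct += 0.5;
            }
        }
        // Undefined when one class is missing.
        return pairs == 0 ? Double.NaN : correct / pairs;
    }
}
```

Neither method takes a model as an argument, so the same code can score logistic regression, a random forest or a neural net.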
All independent
● These things work for different models
● Shouldn’t be tied to a particular system
● Should be embeddable
Training
● Split Train/Test
● Sample data (no, not all the data ;) to validate model
● Increasingly compute intensive
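The train/test split above can be sketched as a seeded shuffle followed by a cut. `TrainTestSplit`, `split`, `trainFraction` and `seed` are hypothetical names for illustration; a fixed seed is an assumption made so that the same split can be reproduced between runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: reproducible random train/test split. */
final class TrainTestSplit {
    /** Returns a two-element list: index 0 is the training set, index 1 the test set. */
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        // Shuffle a copy so the caller's list is left untouched.
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }
}
```

For example, `TrainTestSplit.split(rows, 0.8, 42L)` keeps 80% of `rows` for training and holds out the rest for validation.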
Deep Learning
● Most done in Python...
● Normal training time is measured in hours or days, sometimes weeks
● Work being done in HPC (model parallelism)
● DistBelief (data parallelism)
Automatic Learning
● Good at unstructured data
● Images, Text, Audio and Sensors
● Quick, baseline feature engineering
● Not good at feature introspection
Or are they?
t-SNE
Where Does Scala Fit In?
● Akka - Real time streaming analytics/micro services
● Spark - Dataframes/number crunching
● JVM Key/Value Stores
● Pistachio (powers Yahoo’s ad network)
o http://yahooeng.tumblr.com/post/118860853846/distributed-word2vec-on-top-of-pistachio
The Way We Learn Now
● Monolithic ML frameworks
● No per-chip optimizations
● No Tensors (come on guys, it’s 2015...)
● Need isolation and less lock-in
● JVM is the platform to make it happen
Other Links
● http://deeplearning4j.org/
● http://nd4j.org/
● https://github.com/deeplearning4j/Canova
Questions?
● adam@skymind.io
● @agibsonccc
● github.com/agibsonccc

Future of AI on the JVM: Microservices and Deep Learning

  • 1. Future of AI on the JVM Scala Days Amsterdam 2015 Adam Gibson Creator of Deeplearning4j (and 4s :)
  • 2. What is AI? ● Not Terminator (despite our name) ● Many subfields ● Our focus: Machine learning
  • 4. Problem Space ● Spam Classification ● Summarization ● Face Detection ● Eye Tracking ● Targeted Ads ● Recommendation Engines
  • 5. Current State of ML ● Simpler models ● Most of industry barely uses Logistic Reg. ● Many problems are binary o e.g. fraud, spam ● Some unsupervised (clustering, reccos) ● Lots of ML frameworks on JVM
  • 6. ML Frameworks on JVM... ● Apache Mahout ● Spark’s MLlib ● Weka (is that R?)
  • 8. Problems ● Monolithic ● Makes assumptions about data ● Hard to use ● No separation of concerns
  • 9. Ring a Bell? ● We call that “Monolithic” ● Separate ML concerns: Data Pipelines/Vectorization Scoring Model Training Evaluation
  • 10. Micro-Services + ML? ● Kinda like micro-services ● Reduce lock in ● Take math, data cleaning, model training, choosing algorithms ... ● … and separate them
  • 11. Math ● Parametric Models (Matrices!) ● Non Parametric (Random forest) ● Focusing on Matrices (the hard part of ML systems)
  • 12. Matrices ● NDArrays (> 2d) ● Tensors (think of pages of matrices) ● Example: a 2 x 2 x 2 tensor is two 2 x 2 matrices ● Applies to graphs w/ sparse representations
  • 13. Chips/Hardware/Matrices ● CPUs - We work with these ● GPUs - CUDA ditto ● FPGAs o Intel bought Altera, an FPGA maker, for $17 billion this month o The edge, the cloud
  • 15. Why New Chips? ● See the numbers yourself: ● http://www.slideshare.net/airbots/cuda-29330283 ● http://devblogs.nvidia.com/parallelforall/bidmach-machine-learning-limit-gpus/ ● http://jcuda.org
  • 16. Mixed clusters ● GPUs aren’t good for all workloads ● Because latency ● Need to upload data: not good for small problems ● Mixed CPU/GPU clusters are best bet
  • 17. Data Pipelines ● More data will be binary ● Frameworks today can’t process binary well ● Binary data has different semantics ● Moving windows for audio ● 3d for images ...
  • 18. People Roll Their Own b/c ● Current frameworks assume clean data :( ● Pipelines are brittle, hard to maintain ● Moving towards being composable (reuse)
  • 19. Dedicated Libraries ● Let’s focus on vectorization -- now! ● Because IoT ● Because more access to raw media ● Should fit into current big data frameworks
  • 20. Scoring ● AUC ● F1 ● Different Loss Functions ● Hyper parameter optimization
  • 21. All independent ● These things work for different models ● Shouldn’t be tied to a particular system ● Should be embeddable
  • 22. Training ● Split Train/Test ● Sample data (no, not all the data ;) to validate model ● Increasingly compute intensive
  • 23. Deep Learning ● Most done in Python... ● Training time is normally measured in hours/days -- weeks!? ● Work being done in HPC (Model parallelism) ● DistBelief (Data parallelism)
  • 24. Automatic Learning ● Good at unstructured data ● Images, Text, Audio and Sensors ● Quick, baseline feature engineering ● Not good at feature introspection
  • 26. TSNE
  • 27. Where Does Scala Fit In? ● Akka - Real time streaming analytics/micro services ● Spark - Dataframes/number crunching ● JVM Key/Value Stores ● Pistachio (powers Yahoo’s ad network) o http://yahooeng.tumblr.com/post/118860853846/distributed-word2vec-on-top-of-pistachio
  • 28. The Way We Learn Now ● Monolithic ML frameworks ● No per-chip optimizations ● No Tensors (come on guys, it’s 2015...) ● Need isolation and less lock-in ● JVM is the platform to make it happen
  • 29. Other Links ● http://deeplearning4j.org/ ● http://nd4j.org/ ● https://github.com/deeplearning4j/Canova
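
The 2 x 2 x 2 example on the Matrices slide (12) can be read as two 2 x 2 matrices stacked as "pages". Below is a minimal plain-Java sketch of that idea, assuming a flat row-major backing array. The class name `TensorSketch` and its methods are hypothetical names chosen for illustration and are not the ND4J API.

```java
import java.util.Arrays;

/** Hypothetical sketch: a rank-3 tensor stored in one flat, row-major array. */
final class TensorSketch {
    public final int pages, rows, cols;
    public final double[] data;

    public TensorSketch(int pages, int rows, int cols, double[] data) {
        if (data.length != pages * rows * cols) {
            throw new IllegalArgumentException("data length does not match shape");
        }
        this.pages = pages;
        this.rows = rows;
        this.cols = cols;
        this.data = data;
    }

    /** Element at (page, row, col), computed from row-major offsets. */
    public double get(int p, int r, int c) {
        return data[(p * rows + r) * cols + c];
    }

    /** Copy one page out as an ordinary rows x cols matrix. */
    public double[][] page(int p) {
        double[][] m = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                m[r][c] = get(p, r, c);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // Shape 2 x 2 x 2: eight values, read as two 2 x 2 pages.
        TensorSketch t = new TensorSketch(2, 2, 2,
                new double[]{1, 2, 3, 4, 5, 6, 7, 8});
        System.out.println(Arrays.deepToString(t.page(0))); // [[1.0, 2.0], [3.0, 4.0]]
        System.out.println(Arrays.deepToString(t.page(1))); // [[5.0, 6.0], [7.0, 8.0]]
    }
}
```

Keeping the values in one flat array and deriving indices from the shape is one common layout choice; it lets the same buffer be viewed as pages, rows, or single elements without copying until a page is explicitly extracted.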