SlideShare a Scribd company logo
skymind.io | deeplearning.org | gitter.im/deeplearning4j
Deep Learning in Production
Building Production Class Deep Learning Workflows for the Enterprise
Adam Gibson / CTO Skymind
AI With the Best / The Internet
Topics
• Deep Learning in Production vs Academia
• Data Scientists vs Engineers
• Defining Production
• A solution
Deep Learning in Production vs
Academia
Academia/Research
Focus on accuracy and the latest architectures
Build proof of concepts quickly to validate an assumption
Prototype as many ideas as quickly as possible to come
up with a solution to a problem
Publish often incremental results to increase publications
Current state of research
Mostly funded by large consumer companies (Amazon,Google,Facebook,..)
Scant pockets of deep learning academic institutions (CMU,Stanford,NYU,..)
Large focus on audio and vision, somewhat spreading in to natural language
processing
Starting to focus more on reinforcement learning and better ways of tuning
People in Deep Learning
• Talent still sparse
• Most are in research labs
• Some of them are enthusiasts or startup founders
• Reality: Deep Learning hasn’t hit most of the world yet. It affects alot of people
but most aren’t doing it.
Industry (MOST Companies doing data science)
● Most use linear regression and random forest
● Prototyping happens in python - these are data scientists
● Data Engineers hold the keys to the cluster (write code in java)
● Most problems are simple - analytics, churn prediction, maybe
recommendation engines or price forecasting
● Deep Learning is seen as overkill - no gpus in your cluster
Data Scientists vs Engineers
Data Scientists
• Math or stats background - know r or python
• Often a beginning coder - may have started in sql and
moved up to analytics
• Know basic machine learning - problems are focused
on replacing excel spreadsheets or solving business
problems
Data Engineers
• Computer Science background
• Builds data pipelines and knows how to setup
production systems
• Doesn’t really know machine learning that well -
usually willing to learn
• Usually closer to the product team - may port python
algorithms to java depending on level of ability
The hybrid
• Been in the game a while knows CS and stats
• Knows SQL, machine learning, and how to operate a
spark cluster
• Can formulate problems and figure out what projects to
tackle next
• Either understands business objectives or can
implement machine learning algorithms themselves
Most companies
• 2 separate teams
• Data scientists use python/r and sql, experiment with
data and come up with new models (very little machine
learning)
• Data engineers use java (sometimes .net) and work on
terabytes of data - most time spent writing integrations
and data pipelines
Startups
● Tend to employ generalists
● Usually 3-5 people who can sort of do both. Startups aren’t usually ready to
hire specialists
● Sometimes have a product where something like deep learning is needed
● Usually ruby or python stack, not many users or scale
● Usually just want something simple to setup
● Not much need for compiled languages or scale yet - this comes later
Defining Production
Defining “Production”
● Varying degrees of scale
● Not everyone has terabytes of data
● Mysql and outsourced cloud services are “machine learning” for most startups
● Many will start out with scikit learn and flask, maybe add python based deep
learning later. This is “good enough” - this is also what you see the most
tutorials for
● Larger companies care more about other things - security,scale, and return on
investment for projects. These companies use java
● If you’re google you use c++ or facebook you use your own version of php
you wrote and maintain
Hardware
• GPUs have very little market penetration
• Deep Learning also has very little market penetration
(despite the marketing)
• Most of the world is cpus (this is changing very slowly)
• Startups are fine with cloud - on prem data centers are
usually dell or hp servers with red hat or ubuntu on
them
Typical stack
• Web based product (go,ruby,python,scala,java,mix)
• Storage (1 or more sql databases, elasticsearch/solr)
• Cloud infrastructure or on prem (bare metal)
• Machine Learning - ???
Machine Learning at startups
• Random 1 off scripts for analysis
• Random 1 off notebooks
• 1 off ETL pipelines written in java
• 1 or more models tied to a rest api that talks to your
product stack
Machine Learning at big companies
• Random 1 off scripts for analysis
• Random 1 off notebooks
• Large numbers of separate data bases and applications
run by different teams
• Multiple disconnected apis
• Some models connected to a spark or hadoop cluster
Challenges in Production
• Serving user traffic (latency)
• Data access (connecting everything together)
• Large amounts of time spent on data pipeline code
• Unclear metrics of success for the data team
• Lack of innovation or “too much” eg: “chase the shiny
new thing”
Challenges of Deep Learning in Production
• Same problems as machine learning
• Hard to interpret models
• Requires specialized hardware
• Not a lot of best practices
• Lack of expertise (machine learning is hard enough)
Closing the gap
Establish some best practices
• Kaggle is a good start for this - start with “somewhat real” problems
• Use higher level tools - keras, otherwise easy to get lost in weeds
• Consider having a real world goal - eg: if you’re in real estate figure out how to
use a simple cnn (not the latest algorithm) for image search
• Depending on need consider integration with hadoop/spark
• Lastly - don’t treat deep learning as special. It’s still a subfield of machine
learning
Going to production
• Sometimes python is enough for simple stuff
• Data Engineering teams should consider java/scala
based solutions (disclaimer: highly opinionated here)
• Follow same workflow - prototype in python port to
production
• Overall - scope to a core problem where deep learning
is worth it
Newer hardware
• Prototype on cloud infrastructure on a toy problem
• Try out this “GPU thing” and see what might be
involved
• Learn the trade offs of cpus and gpus - don’t believe
the marketing
• Buy new hardware as needed
In closing
• Use something open source to start off with
• Use something *supported* keep an eye on open
source activity
• Don’t just believe the research. Papers are not your
company. Do due diligence
Thank you!
Please visit
skymind.io/learn for more
information

More Related Content

What's hot

DeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoTDeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoT
Romeo Kienzler
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno Candel
Sri Ambati
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground up
Adam Gibson
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
Mateusz Dymczyk
 
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
Formulatedby
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
Adam Gibson
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
Adam Gibson
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) English
Adam Gibson
 
Deep learning on a mixed cluster with deeplearning4j and spark
Deep learning on a mixed cluster with deeplearning4j and sparkDeep learning on a mixed cluster with deeplearning4j and spark
Deep learning on a mixed cluster with deeplearning4j and spark
François Garillot
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Nikhil Garg
 
Distributed machine learning 101 using apache spark from a browser devoxx.b...
Distributed machine learning 101 using apache spark from a browser   devoxx.b...Distributed machine learning 101 using apache spark from a browser   devoxx.b...
Distributed machine learning 101 using apache spark from a browser devoxx.b...
Andy Petrella
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
Steve Moore
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
Adam Gibson
 
Adversarial Post-Ex: Lessons From The Pros
Adversarial Post-Ex: Lessons From The ProsAdversarial Post-Ex: Lessons From The Pros
Adversarial Post-Ex: Lessons From The Pros
Justin Warner
 
Testing for the deeplearning folks
Testing for the deeplearning folksTesting for the deeplearning folks
Testing for the deeplearning folks
Vishwas N
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
SigOpt
 
DevTernity - OOP in the enterprise
DevTernity - OOP in the enterpriseDevTernity - OOP in the enterprise
DevTernity - OOP in the enterprise
Nicolas Fränkel
 
Advanced deeplearning4j features
Advanced deeplearning4j featuresAdvanced deeplearning4j features
Advanced deeplearning4j features
Adam Gibson
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Sri Ambati
 
Practical Deep Learning
Practical Deep LearningPractical Deep Learning
Practical Deep Learning
André Karpištšenko
 

What's hot (20)

DeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoTDeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoT
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno Candel
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground up
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
 
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) English
 
Deep learning on a mixed cluster with deeplearning4j and spark
Deep learning on a mixed cluster with deeplearning4j and sparkDeep learning on a mixed cluster with deeplearning4j and spark
Deep learning on a mixed cluster with deeplearning4j and spark
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
 
Distributed machine learning 101 using apache spark from a browser devoxx.b...
Distributed machine learning 101 using apache spark from a browser   devoxx.b...Distributed machine learning 101 using apache spark from a browser   devoxx.b...
Distributed machine learning 101 using apache spark from a browser devoxx.b...
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Adversarial Post-Ex: Lessons From The Pros
Adversarial Post-Ex: Lessons From The ProsAdversarial Post-Ex: Lessons From The Pros
Adversarial Post-Ex: Lessons From The Pros
 
Testing for the deeplearning folks
Testing for the deeplearning folksTesting for the deeplearning folks
Testing for the deeplearning folks
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
DevTernity - OOP in the enterprise
DevTernity - OOP in the enterpriseDevTernity - OOP in the enterprise
DevTernity - OOP in the enterprise
 
Advanced deeplearning4j features
Advanced deeplearning4j featuresAdvanced deeplearning4j features
Advanced deeplearning4j features
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
 
Practical Deep Learning
Practical Deep LearningPractical Deep Learning
Practical Deep Learning
 

Viewers also liked

Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016
Adam Gibson
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singapore
Adam Gibson
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
Adam Gibson
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
Grigory Sapunov
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial Intelligence
Lukas Masuch
 
Cities and Startups: Cultivating Deep Engagement
Cities and Startups: Cultivating Deep EngagementCities and Startups: Cultivating Deep Engagement
Cities and Startups: Cultivating Deep Engagement
Code for America
 
Notes on Machine Learning and Data-centric Startups
Notes on Machine Learning and Data-centric StartupsNotes on Machine Learning and Data-centric Startups
Notes on Machine Learning and Data-centric Startups
Jay (Jianqiang) Wang
 
Predictive apps for startups
Predictive apps for startupsPredictive apps for startups
Predictive apps for startups
Louis Dorard
 
Investor's View on Machine Intelligence startups, 1.0, @YellowDoors meetup Ap...
Investor's View on Machine Intelligence startups, 1.0, @YellowDoors meetup Ap...Investor's View on Machine Intelligence startups, 1.0, @YellowDoors meetup Ap...
Investor's View on Machine Intelligence startups, 1.0, @YellowDoors meetup Ap...
Victor Osyka
 
Startups are about learning, SW Startup Day at TUT
Startups are about learning, SW Startup Day at TUTStartups are about learning, SW Startup Day at TUT
Startups are about learning, SW Startup Day at TUT
Topi Järvinen
 
SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...
SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...
SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...
Rod King, Ph.D.
 
Investors foresee a safe bet on deep tech startups
Investors foresee a safe bet on deep tech startupsInvestors foresee a safe bet on deep tech startups
Investors foresee a safe bet on deep tech startups
eTailing India
 
Self-Service.AI - Pitch Competition for AI-Driven SaaS Startups
Self-Service.AI - Pitch Competition for AI-Driven SaaS StartupsSelf-Service.AI - Pitch Competition for AI-Driven SaaS Startups
Self-Service.AI - Pitch Competition for AI-Driven SaaS Startups
Datentreiber
 
Recommender Systems and Active Learning (for Startups)
Recommender Systems and Active Learning (for Startups)Recommender Systems and Active Learning (for Startups)
Recommender Systems and Active Learning (for Startups)
Neil Rubens
 
Investor's view on machine intelligence startups, 2.0, Jan 2017
Investor's view on machine intelligence startups, 2.0, Jan 2017Investor's view on machine intelligence startups, 2.0, Jan 2017
Investor's view on machine intelligence startups, 2.0, Jan 2017
Victor Osyka
 
BootstrapLabs - Tracxn Report - artificial intelligence for the Applied Arti...
BootstrapLabs - Tracxn  Report - artificial intelligence for the Applied Arti...BootstrapLabs - Tracxn  Report - artificial intelligence for the Applied Arti...
BootstrapLabs - Tracxn Report - artificial intelligence for the Applied Arti...
BootstrapLabs
 
Machine learning and TensorFlow
Machine learning and TensorFlowMachine learning and TensorFlow
Machine learning and TensorFlow
Jose Papo, MSc
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
Roelof Pieters
 
Venture Scanner Artificial Intelligence 2016 Q4
Venture Scanner Artificial Intelligence 2016 Q4Venture Scanner Artificial Intelligence 2016 Q4
Venture Scanner Artificial Intelligence 2016 Q4
Nathan Pacer
 
Introduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep LearningIntroduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep Learning
Terry Taewoong Um
 

Viewers also liked (20)

Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singapore
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial Intelligence
 
Cities and Startups: Cultivating Deep Engagement
Cities and Startups: Cultivating Deep EngagementCities and Startups: Cultivating Deep Engagement
Cities and Startups: Cultivating Deep Engagement
 
Notes on Machine Learning and Data-centric Startups
Notes on Machine Learning and Data-centric StartupsNotes on Machine Learning and Data-centric Startups
Notes on Machine Learning and Data-centric Startups
 
Predictive apps for startups
Predictive apps for startupsPredictive apps for startups
Predictive apps for startups
 
Investor's View on Machine Intelligence startups, 1.0, @YellowDoors meetup Ap...
Investor's View on Machine Intelligence startups, 1.0, @YellowDoors meetup Ap...Investor's View on Machine Intelligence startups, 1.0, @YellowDoors meetup Ap...
Investor's View on Machine Intelligence startups, 1.0, @YellowDoors meetup Ap...
 
Startups are about learning, SW Startup Day at TUT
Startups are about learning, SW Startup Day at TUTStartups are about learning, SW Startup Day at TUT
Startups are about learning, SW Startup Day at TUT
 
SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...
SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...
SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp...
 
Investors foresee a safe bet on deep tech startups
Investors foresee a safe bet on deep tech startupsInvestors foresee a safe bet on deep tech startups
Investors foresee a safe bet on deep tech startups
 
Self-Service.AI - Pitch Competition for AI-Driven SaaS Startups
Self-Service.AI - Pitch Competition for AI-Driven SaaS StartupsSelf-Service.AI - Pitch Competition for AI-Driven SaaS Startups
Self-Service.AI - Pitch Competition for AI-Driven SaaS Startups
 
Recommender Systems and Active Learning (for Startups)
Recommender Systems and Active Learning (for Startups)Recommender Systems and Active Learning (for Startups)
Recommender Systems and Active Learning (for Startups)
 
Investor's view on machine intelligence startups, 2.0, Jan 2017
Investor's view on machine intelligence startups, 2.0, Jan 2017Investor's view on machine intelligence startups, 2.0, Jan 2017
Investor's view on machine intelligence startups, 2.0, Jan 2017
 
BootstrapLabs - Tracxn Report - artificial intelligence for the Applied Arti...
BootstrapLabs - Tracxn  Report - artificial intelligence for the Applied Arti...BootstrapLabs - Tracxn  Report - artificial intelligence for the Applied Arti...
BootstrapLabs - Tracxn Report - artificial intelligence for the Applied Arti...
 
Machine learning and TensorFlow
Machine learning and TensorFlowMachine learning and TensorFlow
Machine learning and TensorFlow
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
 
Venture Scanner Artificial Intelligence 2016 Q4
Venture Scanner Artificial Intelligence 2016 Q4Venture Scanner Artificial Intelligence 2016 Q4
Venture Scanner Artificial Intelligence 2016 Q4
 
Introduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep LearningIntroduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep Learning
 

Similar to Deep learning in production with the best

An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
Randy Shoup
 
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
Dataconomy Media
 
The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18
DataconomyGmbH
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
Johann Schleier-Smith
 
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
TheFamily
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
Vivian S. Zhang
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
Rachel Berryman
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
Charmi Chokshi
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
Venkatesh Umaashankar
 
Internet of Things, TYBSC IT, Semester 5, Unit II
Internet of Things, TYBSC IT, Semester 5, Unit IIInternet of Things, TYBSC IT, Semester 5, Unit II
Internet of Things, TYBSC IT, Semester 5, Unit II
Arti Parab Academics
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Ramiro Aduviri Velasco
 
Binary crosswords
Binary crosswordsBinary crosswords
Binary crosswords
Laurent Cerveau
 
Kubeflow.pptx
Kubeflow.pptxKubeflow.pptx
Kubeflow.pptx
dhaferbenali1
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
Paco Nathan
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
Shivaji Dutta
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
 
Deep learning for NLP
Deep learning for NLPDeep learning for NLP
Deep learning for NLP
Shishir Choudhary
 

Similar to Deep learning in production with the best (20)

An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
 
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
 
The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
 
Internet of Things, TYBSC IT, Semester 5, Unit II
Internet of Things, TYBSC IT, Semester 5, Unit IIInternet of Things, TYBSC IT, Semester 5, Unit II
Internet of Things, TYBSC IT, Semester 5, Unit II
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Binary crosswords
Binary crosswordsBinary crosswords
Binary crosswords
 
Kubeflow.pptx
Kubeflow.pptxKubeflow.pptx
Kubeflow.pptx
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Deep learning for NLP
Deep learning for NLPDeep learning for NLP
Deep learning for NLP
 

More from Adam Gibson

End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflows
Adam Gibson
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
Adam Gibson
 
Strata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4jStrata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4j
Adam Gibson
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
Adam Gibson
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on Spark
Adam Gibson
 
Skymind - Udacity China presentation
Skymind - Udacity China presentationSkymind - Udacity China presentation
Skymind - Udacity China presentation
Adam Gibson
 
Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)
Adam Gibson
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016
Adam Gibson
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learning
Adam Gibson
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensors
Adam Gibson
 
Nd4 j slides.pptx
Nd4 j slides.pptxNd4 j slides.pptx
Nd4 j slides.pptx
Adam Gibson
 
Deep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextMLDeep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextML
Adam Gibson
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Adam Gibson
 
Sf data mining_meetup
Sf data mining_meetupSf data mining_meetup
Sf data mining_meetup
Adam Gibson
 

More from Adam Gibson (14)

End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflows
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
 
Strata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4jStrata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4j
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on Spark
 
Skymind - Udacity China presentation
Skymind - Udacity China presentationSkymind - Udacity China presentation
Skymind - Udacity China presentation
 
Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learning
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensors
 
Nd4 j slides.pptx
Nd4 j slides.pptxNd4 j slides.pptx
Nd4 j slides.pptx
 
Deep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextMLDeep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextML
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the Enterprise
 
Sf data mining_meetup
Sf data mining_meetupSf data mining_meetup
Sf data mining_meetup
 

Recently uploaded

一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 

Recently uploaded (20)

一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 

Deep learning in production with the best

  • 1. skymind.io | deeplearning.org | gitter.im/deeplearning4j Deep Learning in Production Building Production Class Deep Learning Workflows for the Enterprise Adam Gibson / CTO Skymind AI With the Best / The Internet
  • 2. Topics • Deep Learning in Production vs Academia • Data Scientists vs Engineers • Defining Production • A solution
  • 3. Deep Learning in Production vs Academia
  • 4. Academia/Research Focus on accuracy and the latest architectures Build proof of concepts quickly to validate an assumption Prototype as many ideas as quickly as possible to come up with a solution to a problem Publish often incremental results to increase publications
  • 5. Current state of research Mostly funded by large consumer companies (Amazon,Google,Facebook,..) Scant pockets of deep learning academic institutions (CMU,Stanford,NYU,..) Large focus on audio and vision, somewhat spreading in to natural language processing Starting to focus more on reinforcement learning and better ways of tuning
  • 6. People in Deep Learning • Talent still sparse • Most are in research labs • Some of them are enthusiasts or startup founders • Reality: Deep Learning hasn’t hit most of the world yet. It affects alot of people but most aren’t doing it.
  • 7. Industry (MOST Companies doing data science) ● Most use linear regression and random forest ● Prototyping happens in python - these are data scientists ● Data Engineers hold the keys to the cluster (write code in java) ● Most problems are simple - analytics, churn prediction, maybe recommendation engines or price forecasting ● Deep Learning is seen as overkill - no gpus in your cluster
  • 8. Data Scientists vs Engineers
  • 9. Data Scientists • Math or stats background - know r or python • Often a beginning coder - may have started in sql and moved up to analytics • Know basic machine learning - problems are focused on replacing excel spreadsheets or solving business problems
  • 10. Data Engineers • Computer Science background • Builds data pipelines and knows how to setup production systems • Doesn’t really know machine learning that well - usually willing to learn • Usually closer to the product team - may port python algorithms to java depending on level of ability
  • 11. The hybrid • Been in the game a while knows CS and stats • Knows SQL, machine learning, and how to operate a spark cluster • Can formulate problems and figure out what projects to tackle next • Either understands business objectives or can implement machine learning algorithms themselves
  • 12. Most companies • 2 separate teams • Data scientists use python/r and sql, experiment with data and come up with new models (very little machine learning) • Data engineers use java (sometimes .net) and work on terabytes of data - most time spent writing integrations and data pipelines
  • 13. Startups ● Tend to employ generalists ● Usually 3-5 people who can sort of do both. Startups aren’t usually ready to hire specialists ● Sometimes have a product where something like deep learning is needed ● Usually ruby or python stack, not many users or scale ● Usually just want something simple to setup ● Not much need for compiled languages or scale yet - this comes later
  • 15. Defining “Production” ● Varying degrees of scale ● Not everyone has terabytes of data ● Mysql and outsourced cloud services are “machine learning” for most startups ● Many will start out with scikit learn and flask, maybe add python based deep learning later. This is “good enough” - this is also what you see the most tutorials for ● Larger companies care more about other things - security,scale, and return on investment for projects. These companies use java ● If you’re google you use c++ or facebook you use your own version of php you wrote and maintain
  • 16. Hardware • GPUs have very little market penetration • Deep Learning also has very little market penetration (despite the marketing) • Most of the world is cpus (this is changing very slowly) • Startups are fine with cloud - on prem data centers are usually dell or hp servers with red hat or ubuntu on them
  • 17. Typical stack • Web based product (go,ruby,python,scala,java,mix) • Storage (1 or more sql databases, elasticsearch/solr) • Cloud infrastructure or on prem (bare metal) • Machine Learning - ???
  • 18. Machine Learning at startups • Random 1 off scripts for analysis • Random 1 off notebooks • 1 off ETL pipelines written in java • 1 or more models tied to a rest api that talks to your product stack
  • 19. Machine Learning at big companies • Random 1 off scripts for analysis • Random 1 off notebooks • Large numbers of separate data bases and applications run by different teams • Multiple disconnected apis • Some models connected to a spark or hadoop cluster
  • 20. Challenges in Production • Serving user traffic (latency) • Data access (connecting everything together) • Large amounts of time spent on data pipeline code • Unclear metrics of success for the data team • Lack of innovation or “too much” eg: “chase the shiny new thing”
  • 21. Challenges of Deep Learning in Production • Same problems as machine learning • Hard to interpret models • Requires specialized hardware • Not a lot of best practices • Lack of expertise (machine learning is hard enough)
  • 23. Establish some best practices • Kaggle is a good start for this - start with “somewhat real” problems • Use higher level tools - keras, otherwise easy to get lost in weeds • Consider having a real world goal - eg: if you’re in real estate figure out how to use a simple cnn (not the latest algorithm) for image search • Depending on need consider integration with hadoop/spark • Lastly - don’t treat deep learning as special. It’s still a subfield of machine learning
  • 24. Going to production • Sometimes python is enough for simple stuff • Data Engineering teams should consider java/scala based solutions (disclaimer: highly opinionated here) • Follow same workflow - prototype in python port to production • Overall - scope to a core problem where deep learning is worth it
  • 25. Newer hardware • Prototype on cloud infrastructure on a toy problem • Try out this “GPU thing” and see what might be involved • Learn the trade offs of cpus and gpus - don’t believe the marketing • Buy new hardware as needed
  • 26. In closing • Use something open source to start off with • Use something *supported* keep an eye on open source activity • Don’t just believe the research. Papers are not your company. Do due diligence