SlideShare a Scribd company logo
1 of 40
Download to read offline
From Prototyping to Deployment at Scale with R and sparklyr
Kevin Kuo
June 2018
1 / 26
Menu today
The deployment problem
ML pipelines
Model deployment and demo
2 / 26
The deployment problem
3 / 26
Deployment
Or, putting ML models "into production".
4 / 26
Deployment
Or, putting ML models "into production".
Basically, make it so that someone else can use your model (i.e. make some predictions with
it).
4 / 26
Batch
Event-based/time-based
E.g. nightly portfolio risk calculations,
and
Email campaigns
It's OK to take a while
"Real-time"
On demand
E.g. instant loan approvals, and
Fraud detection on credit card swipes
Gotta be (relatively) fast, seconds to
less than a second
Deployment - latency dimension
5 / 26
Deployment - why is it hard?
Challenge 1/n: Putting ML models into production involves different
expertise
6 / 26
Deployment - why is it hard?
Challenge 1/n: Putting ML models into production involves different
expertise
...so it involves more people
6 / 26
Deployment - why is it hard?
Challenge 1/n: Putting ML models into production involves different
expertise
...so it involves more people
...mo ppl mo problems
6 / 26
Deployment - mo ppl mo problems
Credit: https://youtu.be/-K9SjrWpeys @josh_wills 7 / 26
Deployment - why is it hard?
Challenge 2/n: Rapidly changing landscape in deployment options
8 / 26
Deployment - why is it hard?
Challenge 2/n: Rapidly changing landscape in deployment options
What do?
Spark ML persistence?
dbml-local?
PMML?
PFA/Aardpfark?
MLeap?
ONNX?
Roll our own thing?
Re-implement the model in C++, because performance?
Throw it into a container and do orchestration cuz it's cool?
8 / 26
Deployment - why is it hard?
Challenge 3/n: Too many ML frameworks and no standardization
9 / 26
Deployment - why is it hard?
Challenge 3/n: Too many ML frameworks and no standardization
We're focusing on Spark in this session, but we'll acknowledge other technologies we need to deal with
9 / 26
Deployment - diversity of ML frameworks
Spark ML, xgboost, random CRAN packages, scikit-learn, H2O, ...
10 / 26
Deployment - one of many scenarios
Data scientist: Hey this random forest loan decision model is ready to go!
Engineer: OK, we need to recode it in C#/Java, see you in 6 months!
11 / 26
Deployment - one of many scenarios
Data scientist: Hey this random forest loan decision model is ready to go!
Engineer: OK, we need to recode it in C#/Java, see you in 6 months!
Data scientist: Oh no that's too long, what about I give you this GLM with just a few
parameters?
Engineer: 2 months.
11 / 26
Deployment - one of many scenarios
Data scientist: Hey this random forest loan decision model is ready to go!
Engineer: OK, we need to recode it in C#/Java, see you in 6 months!
Data scientist: Oh no that's too long, what about I give you this GLM with just a few
parameters?
Engineer: 2 months.
Data scientist:
11 / 26
Deployment - challenges for the R user
On average, R users tend to...
Be math/stats types
Have little CS/software engineering training
So it's slightly tougher for them to collaborate with the folks doing model implementation.
12 / 26
Deployment - challenges for the R user
On average, R users tend to...
Be math/stats types
Have little CS/software engineering training
So it's slightly tougher for them to collaborate with the folks doing model implementation.
However,
Data scientists (regardless of background) are becoming more comfortable moving up and down the stack
There has been active development of technology to faciliate the data science-engineering handoff
12 / 26
Deployment - technology
Technology won't solve your people/process/culture issues, but it can make collaboration easier!
13 / 26
Deployment - technology
Technology won't solve your people/process/culture issues, but it can make collaboration easier!
Next up: we'll provide a quick review of Spark ML pipelines, and offer a couple ways of
"deploying" them using the sparklyr ecosystem.
13 / 26
Spark ML pipelines
14 / 26
ML pipelines
Pipelines are basically...
A structure in which you can throw in data transformers and ML models.
15 / 26
ML pipelines
Pipelines are basically...
A structure in which you can throw in data transformers and ML models.
Keep in mind that when you deploy a model, you also need to deploy the feature engineering steps in order to
feed the right inputs to the model! (E.g. converting a numeric age variable into an age range bucket that the
model requires.)
15 / 26
ML pipelines
Pipelines are basically...
A structure in which you can throw in data transformers and ML models.
Keep in mind that when you deploy a model, you also need to deploy the feature engineering steps in order to
feed the right inputs to the model! (E.g. converting a numeric age variable into an age range bucket that the
model requires.)
Now let's go through a (very quick) overview of pipeline concepts.
15 / 26
ML pipelines
A Transformer takes a data frame, via ml_transform(), and returns a transformed data frame.
16 / 26
ML pipelines
A Transformer takes a data frame, via ml_transform(), and returns a transformed data frame.
An Estimator take a data frame, via ml_fit(), and returns a Transformer.
17 / 26
ML pipelines
A Transformer takes a data frame, via ml_transform(), and returns a transformed data frame.
An Estimator take a data frame, via ml_fit(), and returns a Transformer.
A Pipeline consists of a sequence of stages—PipelineStages—that act on some data in order.
A PipelineStage can be either a Transformer or an Estimator.
18 / 26
ML pipelines
A Pipeline consists of a sequence of stages—PipelineStages—that act on some data in order.
A PipelineStage can be either a Transformer or an Estimator.
A Pipeline is always an Estimator, and its fitted form is called PipelineModel which is a
Transformer.
19 / 26
ML pipelines
20 / 26
ML pipelines
21 / 26
Serving the model
Now, the trick is to persist this PipelineModel so we can use it to serve predictions later on.
22 / 26
Serving the model
Now, the trick is to persist this PipelineModel so we can use it to serve predictions later on.
We'll demo a couple ways today
Native Spark ML persistence support
MLeap (via the mleap R package)
22 / 26
Demo
23 / 26
Spark ML Persistence
Appropriate for batch jobs, scoring lots
of records at once
Requires Spark session
MLeap
Better for real-time prediction of a
small number of records
Doesn't require Spark session, portable
to apps/devices that support JVM
Model deployment paths
24 / 26
Towards a better deployment story
Data scientist: Hey this random forest loan decision model is ready to go! Here is the .zip
bundle, and here is the documentation you need to use it.
Engineer: Awesome! We won't need to write a bazillion if-else statements to recreate the
model!
25 / 26
Towards a better deployment story
Data scientist: Hey this random forest loan decision model is ready to go! Here is the .zip
bundle, and here is the documentation you need to use it.
Engineer: Awesome! We won't need to write a bazillion if-else statements to recreate the
model!
When the model needs updating...
25 / 26
Towards a better deployment story
Data scientist: Hey this random forest loan decision model is ready to go! Here is the .zip
bundle, and here is the documentation you need to use it.
Engineer: Awesome! We won't need to write a bazillion if-else statements to recreate the
model!
When the model needs updating...
Data scientist: We decided to use a GBM instead for better accuracy, here's the updated
bundle.
Engineer: Fantabulous! All we need to do is update the model directory!
25 / 26
Wrap up
Slides and code will be available at https://kevinykuo.com.
Inspirations/other talks to check out
"Productionizing Spark ML pipelines with the portable format for analytics" https://youtu.be/h-
B0VCkoRkE @MLnick
"How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x"
https://youtu.be/r740xbIpb54 Richard Garris
"MLeap and Combust ML" https://youtu.be/MGZDF6E41r4 Hollin Wilkins and Mikhail Semeniuk
26 / 26

More Related Content

What's hot

Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowDatabricks
 
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowContinuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowDatabricks
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking VN
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R MeetupJo-fai Chow
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Servingamesar0
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OSri Ambati
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformDatabricks
 
Grokking: Data Engineering Course
Grokking: Data Engineering CourseGrokking: Data Engineering Course
Grokking: Data Engineering CourseGrokking VN
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...Databricks
 
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflowAutomatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflowDatabricks
 
Porting R Models into Scala Spark
Porting R Models into Scala SparkPorting R Models into Scala Spark
Porting R Models into Scala Sparkcarl_pulley
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOpsRui Quintino
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science PlatformQAware GmbH
 
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Databricks
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithDatabricks
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowJan Kirenz
 
A Microservices Framework for Real-Time Model Scoring Using Structured Stream...
A Microservices Framework for Real-Time Model Scoring Using Structured Stream...A Microservices Framework for Real-Time Model Scoring Using Structured Stream...
A Microservices Framework for Real-Time Model Scoring Using Structured Stream...Databricks
 

What's hot (20)

MLflow with R
MLflow with RMLflow with R
MLflow with R
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflow
 
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowContinuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
 
Madrid Meetup
Madrid MeetupMadrid Meetup
Madrid Meetup
 
Grokking: Data Engineering Course
Grokking: Data Engineering CourseGrokking: Data Engineering Course
Grokking: Data Engineering Course
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
 
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflowAutomatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
 
Porting R Models into Scala Spark
Porting R Models into Scala SparkPorting R Models into Scala Spark
Porting R Models into Scala Spark
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science Platform
 
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
 
A Microservices Framework for Real-Time Model Scoring Using Structured Stream...
A Microservices Framework for Real-Time Model Scoring Using Structured Stream...A Microservices Framework for Real-Time Model Scoring Using Structured Stream...
A Microservices Framework for Real-Time Model Scoring Using Structured Stream...
 

Similar to From Prototyping to Deployment at Scale with R and sparklyr with Kevin Kuo

Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsStavros Kontopoulos
 
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...danielschulz2005
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEEGLOBALSOFTSTUDENTPROJECTS
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...IEEEFINALSEMSTUDENTPROJECTS
 
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela PoklukarDataScienceConferenc1
 
[Sirius Day Eindhoven 2018] ASML's MDE Going Sirius
[Sirius Day Eindhoven 2018]  ASML's MDE Going Sirius[Sirius Day Eindhoven 2018]  ASML's MDE Going Sirius
[Sirius Day Eindhoven 2018] ASML's MDE Going SiriusObeo
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
SiriusCon2016 - ASML's MDE Going Sirius
SiriusCon2016 - ASML's MDE Going SiriusSiriusCon2016 - ASML's MDE Going Sirius
SiriusCon2016 - ASML's MDE Going SiriusObeo
 
Victor Chang: Cloud computing business framework
Victor Chang: Cloud computing business frameworkVictor Chang: Cloud computing business framework
Victor Chang: Cloud computing business frameworkCBOD ANR project U-PSUD
 
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...IEEEFINALSEMSTUDENTPROJECTS
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemDatabricks
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...IEEEMEMTECHSTUDENTPROJECTS
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in productionAntoine Sauray
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkTheodoros Vasiloudis
 
Production machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleProduction machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleAlex Housley
 
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...IRJET Journal
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Data Con LA
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsPooyan Jamshidi
 

Similar to From Prototyping to Deployment at Scale with R and sparklyr with Kevin Kuo (20)

Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
 
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
 
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
 
[Sirius Day Eindhoven 2018] ASML's MDE Going Sirius
[Sirius Day Eindhoven 2018]  ASML's MDE Going Sirius[Sirius Day Eindhoven 2018]  ASML's MDE Going Sirius
[Sirius Day Eindhoven 2018] ASML's MDE Going Sirius
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
SiriusCon2016 - ASML's MDE Going Sirius
SiriusCon2016 - ASML's MDE Going SiriusSiriusCon2016 - ASML's MDE Going Sirius
SiriusCon2016 - ASML's MDE Going Sirius
 
Victor Chang: Cloud computing business framework
Victor Chang: Cloud computing business frameworkVictor Chang: Cloud computing business framework
Victor Chang: Cloud computing business framework
 
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
 
Production machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleProduction machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scale
 
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain Environments
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excelysmaelreyes
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证nhjeo1gg
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一z xss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 

Recently uploaded (20)

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 

From Prototyping to Deployment at Scale with R and sparklyr with Kevin Kuo

  • 1. From Prototyping to Deployment at Scale with R and sparklyr Kevin Kuo June 2018 1 / 26
  • 2. Menu today The deployment problem ML pipelines Model deployment and demo 2 / 26
  • 4. Deployment Or, putting ML models "into production". 4 / 26
  • 5. Deployment Or, putting ML models "into production". Basically, make it so that someone else can use your model (i.e. make some predictions with it). 4 / 26
  • 6. Batch Event-based/time-based E.g. nightly portfolio risk calculations, and Email campaigns It's OK to take a while "Real-time" On demand E.g. instant loan approvals, and Fraud detection on credit card swipes Gotta be (relatively) fast, seconds to less than a second Deployment - latency dimension 5 / 26
  • 7. Deployment - why is it hard? Challenge 1/n: Putting ML models into production involves different expertise 6 / 26
  • 8. Deployment - why is it hard? Challenge 1/n: Putting ML models into production involves different expertise ...so it involves more people 6 / 26
  • 9. Deployment - why is it hard? Challenge 1/n: Putting ML models into production involves different expertise ...so it involves more people ...mo ppl mo problems 6 / 26
  • 10. Deployment - mo ppl mo problems Credit: https://youtu.be/-K9SjrWpeys @josh_wills 7 / 26
  • 11. Deployment - why is it hard? Challenge 2/n: Rapidly changing landscape in deployment options 8 / 26
  • 12. Deployment - why is it hard? Challenge 2/n: Rapidly changing landscape in deployment options What do? Spark ML persistence? dbml-local? PMML? PFA/Aardpfark? MLeap? ONNX? Roll our own thing? Re-implement the model in C++, because performance? Throw it into a container and do orchestration cuz it's cool? 8 / 26
  • 13. Deployment - why is it hard? Challenge 3/n: Too many ML frameworks and no standardization 9 / 26
  • 14. Deployment - why is it hard? Challenge 3/n: Too many ML frameworks and no standardization We're focusing on Spark in this session, but we'll acknowledge other technologies we need to deal with 9 / 26
  • 15. Deployment - diversity of ML frameworks Spark ML, xgboost, random CRAN packages, scikit-learn, H2O, ... 10 / 26
  • 16. Deployment - one of many scenarios Data scientist: Hey this random forest loan decision model is ready to go! Engineer: OK, we need to recode it in C#/Java, see you in 6 months! 11 / 26
  • 17. Deployment - one of many scenarios Data scientist: Hey this random forest loan decision model is ready to go! Engineer: OK, we need to recode it in C#/Java, see you in 6 months! Data scientist: Oh no that's too long, what about I give you this GLM with just a few parameters? Engineer: 2 months. 11 / 26
  • 18. Deployment - one of many scenarios Data scientist: Hey this random forest loan decision model is ready to go! Engineer: OK, we need to recode it in C#/Java, see you in 6 months! Data scientist: Oh no that's too long, what about I give you this GLM with just a few parameters? Engineer: 2 months. Data scientist: 11 / 26
  • 19. Deployment - challenges for the R user On average, R users tend to... Be math/stats types Have little CS/software engineering training So it's slightly tougher for them to collaborate with the folks doing model implementation. 12 / 26
  • 20. Deployment - challenges for the R user On average, R users tend to... Be math/stats types Have little CS/software engineering training So it's slightly tougher for them to collaborate with the folks doing model implementation. However, Data scientists (regardless of background) are becoming more comfortable moving up and down the stack There has been active development of technology to faciliate the data science-engineering handoff 12 / 26
  • 21. Deployment - technology Technology won't solve your people/process/culture issues, but it can make collaboration easier! 13 / 26
  • 22. Deployment - technology Technology won't solve your people/process/culture issues, but it can make collaboration easier! Next up: we'll provide a quick review of Spark ML pipelines, and offer a couple ways of "deploying" them using the sparklyr ecosystem. 13 / 26
  • 24. ML pipelines Pipelines are basically... A structure in which you can throw in data transformers and ML models. 15 / 26
  • 25. ML pipelines Pipelines are basically... A structure in which you can throw in data transformers and ML models. Keep in mind that when you deploy a model, you also need to deploy the feature engineering steps in order to feed the right inputs to the model! (E.g. converting a numeric age variable into an age range bucket that the model requires.) 15 / 26
  • 26. ML pipelines Pipelines are basically... A structure in which you can throw in data transformers and ML models. Keep in mind that when you deploy a model, you also need to deploy the feature engineering steps in order to feed the right inputs to the model! (E.g. converting a numeric age variable into an age range bucket that the model requires.) Now let's go through a (very quick) overview of pipeline concepts. 15 / 26
  • 27. ML pipelines A Transformer takes a data frame, via ml_transform(), and returns a transformed data frame. 16 / 26
  • 28. ML pipelines A Transformer takes a data frame, via ml_transform(), and returns a transformed data frame. An Estimator take a data frame, via ml_fit(), and returns a Transformer. 17 / 26
  • 29. ML pipelines A Transformer takes a data frame, via ml_transform(), and returns a transformed data frame. An Estimator take a data frame, via ml_fit(), and returns a Transformer. A Pipeline consists of a sequence of stages—PipelineStages—that act on some data in order. A PipelineStage can be either a Transformer or an Estimator. 18 / 26
  • 30. ML pipelines A Pipeline consists of a sequence of stages—PipelineStages—that act on some data in order. A PipelineStage can be either a Transformer or an Estimator. A Pipeline is always an Estimator, and its fitted form is called PipelineModel which is a Transformer. 19 / 26
  • 33. Serving the model Now, the trick is to persist this PipelineModel so we can use it to serve predictions later on. 22 / 26
  • 34. Serving the model Now, the trick is to persist this PipelineModel so we can use it to serve predictions later on. We'll demo a couple ways today Native Spark ML persistence support MLeap (via the mleap R package) 22 / 26
  • 36. Spark ML Persistence Appropriate for batch jobs, scoring lots of records at once Requires Spark session MLeap Better for real-time prediction of a small number of records Doesn't require Spark session, portable to apps/devices that support JVM Model deployment paths 24 / 26
  • 37. Towards a better deployment story Data scientist: Hey this random forest loan decision model is ready to go! Here is the .zip bundle, and here is the documentation you need to use it. Engineer: Awesome! We won't need to write a bazillion if-else statements to recreate the model! 25 / 26
  • 38. Towards a better deployment story Data scientist: Hey this random forest loan decision model is ready to go! Here is the .zip bundle, and here is the documentation you need to use it. Engineer: Awesome! We won't need to write a bazillion if-else statements to recreate the model! When the model needs updating... 25 / 26
  • 39. Towards a better deployment story Data scientist: Hey this random forest loan decision model is ready to go! Here is the .zip bundle, and here is the documentation you need to use it. Engineer: Awesome! We won't need to write a bazillion if-else statements to recreate the model! When the model needs updating... Data scientist: We decided to use a GBM instead for better accuracy, here's the updated bundle. Engineer: Fantabulous! All we need to do is update the model directory! 25 / 26
  • 40. Wrap up Slides and code will be available at https://kevinykuo.com. Inspirations/other talks to check out "Productionizing Spark ML pipelines with the portable format for analytics" https://youtu.be/h- B0VCkoRkE @MLnick "How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x" https://youtu.be/r740xbIpb54 Richard Garris "MLeap and Combust ML" https://youtu.be/MGZDF6E41r4 Hollin Wilkins and Mikhail Semeniuk 26 / 26