SlideShare a Scribd company logo
How to Build a ML
Platform Efficiently
Using Open-Source
Jean Carlo Machado
Theodore Meynard
GetYourGuide 1
Agenda
▪ Introduction
▪ ML at GetYourGuide
▪ Before the Platform
▪ ML Platform
▪ Demo
▪ Final words
2
Who Are We
Theo: Senior Data Scientist
Jean: Senior Software Engineer
3
theodore.meynard@getyourguide.com
jean.machado@getyourguide.com
Introduction
4
We’ve built the world’s largest
marketplace for travel activities…
Millions of travelers use
GetYourGuide every year
We facilitate the transaction We offer more than 40,000
activities worldwide
Connecting
customers...
...to suppliers
around the world
5
ML at GetYourGuide
6
Data Product: Ranking Service
7
Data Product: Recommendation Panels
8
Data Product: Paid Search
9
Amsterdam
:
Canal
Cruise
Other Data Products
We also use ML for:
• Demand forecasting
• Inventory labeling
⇒ 20+ ML projects distributed in 2
teams + delivered models to other
teams to maintain
10
Data Product Principles
We follow clean code principles
PoCs are temporary
We build solid, resilient deployment processes
Know our models health at every point in time
Quality,
Testing &
Monitoring
We integrate into the engineering
ecosystem and leverage
open-source
Data workflows are efficient and
cost-effective
We take reproducibility and
modularity seriously
Engineering
We Promote the Data Product
mindset
We deeply integrate with data
stakeholders
Stakeholder
Engagement
We actively manage the unknowns in our
planning
Data analytics dynamically informs our
project plans
Workflow
Customer and business value over
fancy solutions
Exploration is one of our goals
Strategy
We value small iterations on
existing models
We value explainability over pure
accuracy
Performance is proven online
Model
Our principles explained 11
Before the platform
12
How We Started
Pros
● Widely used by ML practitioners
● Good to start new projects &
prototype
● Great visualization
Cons
● No proper version control
● No code reuse
● No automatic testing
13
A Major Improvement
Pros
● Tests included in library
● Version control with code review
● Maintainable projects
Cons
● No CI/CD
● No model tracking
14
ML Platform
15
From Amsterdam:
Volendam, Marken and
Windmills
ML Platform Key Features
● CI/CD
● Model Tracking
● Batch & Online inference
16
ML Platform Principles
• Maximize data scientist’s model
ownership
• Reproducible Machine Learning
• Reuse most of our existing
infrastructure
• Build incrementally
• Use open-source and open
standards
17
Our Current Workflow
18
ML CI/CD
19
Training Path
20
Batch Inference Path
21
Online Inference Path
22
Online Inference With BentoML
23
From MLflow to BentoML Example
import mlflow
from iris_classifier import IrisClassifier
# Load mlflow model
mlflow_model = mlflow.sklearn.load_model(model_uri)
# Create a iris classifier service instance
iris_classifier_service = IrisClassifier()
# Pack the newly trained model artifact
iris_classifier_service.pack("model", mlflow_model)
# Save the prediction service for model serving
saved_path = iris_classifier_service.save()
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
@artifacts([SklearnModelArtifact("model")])
class IrisClassifier(BentoService):
"""A minimum prediction service"""
@api(input=DataframeInput())
def predict(self, df: pd.DataFrame):
"""An inference API named `predict`"""
return self.artifacts.model.predict(df)
iris_classifier.py
mlflow2bentoml.py
24
Demo
25
Final remarks
26
● The integration with the existing
architecture fosters proactive
collaboration
● The tool space is very new making
the exploration vital but time
consuming
● A design review is necessary to align
everyone
Learnings
27
Amsterdam:
Moco Museum
Conclusion
● As we grew, we needed to refine our
Data science process
● Software engineering good practices
+ special twist for ML
● Our platform helps Data Scientists to
○ Build faster
○ Deploy safer
○ Document automatically
28
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
29

More Related Content

What's hot

Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Ed Fernandez
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
Weaveworks
 

What's hot (20)

Accelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowAccelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflow
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future VisionMLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AI
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&M
 
Simplifying Model Management with MLflow
Simplifying Model Management with MLflowSimplifying Model Management with MLflow
Simplifying Model Management with MLflow
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
 
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 

Similar to How to Build a ML Platform Efficiently Using Open-Source

Deploying ML models in the enterprise
Deploying ML models in the enterpriseDeploying ML models in the enterprise
Deploying ML models in the enterprise
doppenhe
 

Similar to How to Build a ML Platform Efficiently Using Open-Source (20)

From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
The Need for Speed
The Need for SpeedThe Need for Speed
The Need for Speed
 
Ds for finance day 4
Ds for finance day 4Ds for finance day 4
Ds for finance day 4
 
DutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorDutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive Sector
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Deploying ML models in the enterprise
Deploying ML models in the enterpriseDeploying ML models in the enterprise
Deploying ML models in the enterprise
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
 
Reproducibility and experiments management in Machine Learning
Reproducibility and experiments management in Machine Learning Reproducibility and experiments management in Machine Learning
Reproducibility and experiments management in Machine Learning
 
What is Machine Learning Operations (MLOps)?
What is Machine Learning Operations (MLOps)?What is Machine Learning Operations (MLOps)?
What is Machine Learning Operations (MLOps)?
 
Model driven engineering for big data management systems
Model driven engineering for big data management systemsModel driven engineering for big data management systems
Model driven engineering for big data management systems
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
 
Pivoting event streaming, from PROJECTS to a PLATFORM
Pivoting event streaming, from PROJECTS to a PLATFORMPivoting event streaming, from PROJECTS to a PLATFORM
Pivoting event streaming, from PROJECTS to a PLATFORM
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with Kubeflow
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with Kubeflow
 
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...
 
Regtech in Fintech + QuSandbox Demo
Regtech in Fintech + QuSandbox DemoRegtech in Fintech + QuSandbox Demo
Regtech in Fintech + QuSandbox Demo
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4j
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4jBuilding Intelligent Solutions with Graphs, Stefan Kolmar, Neo4j
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4j
 
ITMAGINATION - competences, facts, technologies, clients
ITMAGINATION - competences, facts, technologies, clientsITMAGINATION - competences, facts, technologies, clients
ITMAGINATION - competences, facts, technologies, clients
 

More from Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
DilipVasan
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
MAQIB18
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 

How to Build a ML Platform Efficiently Using Open-Source

  • 1. How to Build a ML Platform Efficiently Using Open-Source Jean Carlo Machado Theodore Meynard GetYourGuide 1
  • 2. Agenda ▪ Introduction ▪ ML at GetYourGuide ▪ Before the Platform ▪ ML Platform ▪ Demo ▪ Final words 2
  • 3. Who Are We Theo: Senior Data Scientist Jean: Senior Software Engineer 3 theodore.meynard@getyourguide.com jean.machado@getyourguide.com
  • 5. We’ve built the world’s largest marketplace for travel activities… Millions of travelers use GetYourGuide every year We facilitate the transaction We offer more than 40,000 activities worldwide Connecting customers... ...to suppliers around the world 5
  • 10. Amsterdam : Canal Cruise Other Data Products We also use ML for: • Demand forecasting • Inventory labeling ⇒ 20+ ML projects distributed in 2 teams + delivered models to other teams to maintain 10
  • 11. Data Product Principles We follow clean code principles PoCs are temporary We build solid, resilient deployment processes Know our models health at every point in time Quality, Testing & Monitoring We integrate into the engineering ecosystem and leverage open-source Data workflows are efficient and cost-effective We take reproducibility and modularity seriously Engineering We Promote the Data Product mindset We deeply integrate with data stakeholders Stakeholder Engagement We actively manage the unknowns in our planning Data analytics dynamically informs our project plans Workflow Customer and business value over fancy solutions Exploration is one of our goals Strategy We value small iterations on existing models We value explainability over pure accuracy Performance is proven online Model Our principles explained 11
  • 13. How We Started Pros ● Widely used by ML practitioners ● Good to start new projects & prototype ● Great visualization Cons ● No proper version control ● No code reuse ● No automatic testing 13
  • 14. A Major Improvement Pros ● Tests included in library ● Version control with code review ● Maintainable projects Cons ● No CI/CD ● No model tracking 14
  • 16. From Amsterdam: Volendam, Marken and Windmills ML Platform Key Features ● CI/CD ● Model Tracking ● Batch & Online inference 16
  • 17. ML Platform Principles • Maximize data scientist’s model ownership • Reproducible Machine Learning • Reuse most of our existing infrastructure • Build incrementally • Use open-source and open standards 17
  • 23. Online Inference With BentoML 23
  • 24. From MLflow to BentoML Example import mlflow from iris_classifier import IrisClassifier # Load mlflow model mlflow_model = mlflow.sklearn.load_model(model_uri) # Create a iris classifier service instance iris_classifier_service = IrisClassifier() # Pack the newly trained model artifact iris_classifier_service.pack("model", mlflow_model) # Save the prediction service for model serving saved_path = iris_classifier_service.save() from bentoml import env, artifacts, api, BentoService from bentoml.adapters import DataframeInput from bentoml.frameworks.sklearn import SklearnModelArtifact @artifacts([SklearnModelArtifact("model")]) class IrisClassifier(BentoService): """A minimum prediction service""" @api(input=DataframeInput()) def predict(self, df: pd.DataFrame): """An inference API named `predict`""" return self.artifacts.model.predict(df) iris_classifier.py mlflow2bentoml.py 24
  • 27. ● The integration with the existing architecture fosters proactive collaboration ● The tool space is very new making the exploration vital but time consuming ● A design review is necessary to align everyone Learnings 27
  • 28. Amsterdam: Moco Museum Conclusion ● As we grew, we needed to refine our Data science process ● Software engineering good practices + special twist for ML ● Our platform helps Data Scientists to ○ Build faster ○ Deploy safer ○ Document automatically 28
  • 29. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions. 29