SlideShare a Scribd company logo
1 of 18
Towards Data Science
Engineering Principles
Denis Jannot – EMEA Sales Engineer - @djannot
Milan | November 29 - 30, 2018
2
© 2018 Mesosphere, Inc. All Rights Reserved.
Machine Learning: Why it matters?
Health Care
Patient Diagnosis
Finance
Fraud Detection
Manufacturing
Anomaly Detection
Retail
Inventory Optimization
Media
Interaction and Speed
Transportation
Demand Forecasting
eCommerce
Recommender System
Building precise models, an organization has a better chance of
identifying profitable opportunities or avoiding unknown risks.
3
What you want to be doing
Get
Data
Write intelligent machine learning code
Train
Model
Run
Model
Repeat
4
ML Code isn’t the hardest part of Machine Learning Lifecycle
Sculley, D., Holt, G., Golovin, D. et al. Hidden Technical Debt in Machine Learning Systems
Configuration
Machine
Resource
Management
and
Monitoring
Serving
Infrastructure
Data Collection
Data Verification
Process Management
Tools
Feature
Extraction
ML Analysis Tools
Model
Monitoring
System Admin / DevOps
Data Engineer / DataOps
Data Scientist
Applying Software Delivery Best Practices to Data Science
The Rise of the DataOps Engineer
Combines two key skills:
- Data science
- Distributed systems engineering
The equivalent of DevOps for Data Science
Digital Transformation Challenges
Data
Analytics
Cluster
Message
Queue
Cluster
Data
Persistence
Cluster
Container
Orchestration
Cluster
CI/CD
Cluster High cost, low utilization
Cloud-native technologies are distributed systems
that require dedicated clusters
Significant training &
skills development
Complex IT projects to run containers and data
services; developed talent poached by competitors
Container Orchestration Access to the latest
technology is needed
Container orchestration, data services, machine
learning/AI tools, & CI/CD toolchains
Data Services
Locked to a cloud provider
Expensive services
ECS DynamoDBKinesisEMRCodePipeline
Mesosphere DC/OS can be deployed anywhere
Datacenter &
Edge Clouds
AWS EC2 Microsoft Azure Google Cloud
8
No lock-in
Moving working
between clouds or
even back on
premise
Mesosphere DC/OS can be deployed anywhere
Datacenter &
Edge Clouds
AWS EC2 Microsoft Azure Google Cloud
9
No lock-in
Moving working
between clouds or
even back on
premise
Apache Mesos aggregates & manages all the resources
Datacenter &
Edge Clouds
AWS EC2 Microsoft Azure Google Cloud
10
DISKGPUCPU
Reduce
infrastructure cost
Deploying all the
services on the same
hardware
No lock-in
Moving working
between clouds or
even back on
premise
Run everything on the same cluster
Typical Datacenter
siloed, over-provisioned servers,
low utilization
DC/OS Datacenter
automated schedulers, workload multiplexing onto the
same machines
Industry Average
12-15% utilization
DC/OS Multiplexing
30-40% utilization, up
to 96% at some
customers
4X
Kubernetes
Kafka
Jenkins
Cassandra
Spark
DC/OS provides everything you need as a Service
12
Transport
Message Queue
Store
Distributed Database
Analyze
AI / ML + Real-time
Serve
Container
Orchestration
Datacenter &
Edge Clouds
AWS EC2 Microsoft Azure Google Cloud
DISKGPUCPU
Reduce
management cost
Automating the
deployment, upgrade,
recovery, …
Reduce
infrastructure cost
Deploying all the
services on the same
hardware
Faster time to
market
Providing the latest
technology in few clicks
No lock-in
Moving working
between clouds or
even back on
premise
Automation AuditingMultitenancy Logging
All you need “as a Service” with Mesosphere support
14
Accelerate Machine Learning with Dramatically Lower Cost
● Ad-hoc provisioning
● Underutilized resource and data infrastructure
● Higher risk of data loss
● Jupyter Notebook as-a-Service
● Secure Collaboration
● Accelerated prototyping and deployment
Transport
Message Queue
Store
Distributed Database
Analyze
AI / ML + Real-time
Serve
Container
Orchestration
Unique Hybrid Cloud capabilities
Datacenter &
Edge Clouds
AWS EC2 Microsoft Azure Google Cloud
15
DISKGPUCPU
Reduce
management cost
Automating the
deployment, upgrade,
recovery, …
Reduce
infrastructure cost
Deploying all the
services on the same
hardware
Faster time to
market
Providing the latest
technology in few clicks
No lock-in
Moving working
between clouds or
even back on
premise
Hybrid Cloud ( Cloud Bursting)
Automation AuditingMultitenancy Logging
Get Pictures on 2
categories using the
Flickr API
Persist the pictures in
HDFS
Retrain an Image
Classifier for New
Categories
Kerberos
TLS
Provide a notebook
experience to the end
user
Push the model in a
git repository
Configure a CI/CD
pipeline to build and
deploy a Docker
image to serve the
model
What have we demonstrated ?
● Mesosphere DC/OS provides all the software you need to create a Secure ML pipeline
● It can deploy each of them in few minutes
● It can secure all the communications with Kerberos and TLS
● It provides the same experience on premise or on any public cloud, at a predictable cost
● It provides a nice notebook experience for the data scientists
● It can leverage GPUs
● Mesos quotas can be leveraged to guarantee resources
● Kubernetes can be used to serve the models
Learn more at mesosphere.com

More Related Content

What's hot

Datacentre Expo - case study consolidation
Datacentre Expo - case study consolidationDatacentre Expo - case study consolidation
Datacentre Expo - case study consolidation
Jan Wiersma
 

What's hot (20)

Cloud fundamentals, with Kevin Ashby and Brett Samuel
Cloud fundamentals, with Kevin Ashby and Brett SamuelCloud fundamentals, with Kevin Ashby and Brett Samuel
Cloud fundamentals, with Kevin Ashby and Brett Samuel
 
Cloud computing in business
Cloud computing in businessCloud computing in business
Cloud computing in business
 
Machine Learning Operations (MLOps) - Active Failures and Latent Conditions
Machine Learning Operations (MLOps) - Active Failures and Latent ConditionsMachine Learning Operations (MLOps) - Active Failures and Latent Conditions
Machine Learning Operations (MLOps) - Active Failures and Latent Conditions
 
Webinar: Which Storage Architecture is Best for Splunk Analytics?
Webinar: Which Storage Architecture is Best for Splunk Analytics?Webinar: Which Storage Architecture is Best for Splunk Analytics?
Webinar: Which Storage Architecture is Best for Splunk Analytics?
 
The Private Cloud Isn't Dead
The Private Cloud Isn't DeadThe Private Cloud Isn't Dead
The Private Cloud Isn't Dead
 
Webinar: Cut Disaster Recovery Expenses – Improve Recovery Times
Webinar: Cut Disaster Recovery Expenses – Improve Recovery TimesWebinar: Cut Disaster Recovery Expenses – Improve Recovery Times
Webinar: Cut Disaster Recovery Expenses – Improve Recovery Times
 
Why is hybrid cloud still so hard? 4 keys to unlock the future of IT
Why is hybrid cloud still so hard? 4 keys to unlock the future of ITWhy is hybrid cloud still so hard? 4 keys to unlock the future of IT
Why is hybrid cloud still so hard? 4 keys to unlock the future of IT
 
CS150
CS150CS150
CS150
 
Insider's introduction to microsoft azure machine learning: 201411 Seattle Bu...
Insider's introduction to microsoft azure machine learning: 201411 Seattle Bu...Insider's introduction to microsoft azure machine learning: 201411 Seattle Bu...
Insider's introduction to microsoft azure machine learning: 201411 Seattle Bu...
 
Why edge computing is critical to hybrid IT and cloud success
Why edge computing is critical to hybrid IT and cloud successWhy edge computing is critical to hybrid IT and cloud success
Why edge computing is critical to hybrid IT and cloud success
 
Storage Challenges for Production Machine Learning
Storage Challenges for Production Machine LearningStorage Challenges for Production Machine Learning
Storage Challenges for Production Machine Learning
 
Accelerating Digital Transformation with Better Data Management
Accelerating Digital Transformation with Better Data ManagementAccelerating Digital Transformation with Better Data Management
Accelerating Digital Transformation with Better Data Management
 
Datacentre Expo - case study consolidation
Datacentre Expo - case study consolidationDatacentre Expo - case study consolidation
Datacentre Expo - case study consolidation
 
For linked in part 2 no template
For linked in part 2  no templateFor linked in part 2  no template
For linked in part 2 no template
 
Denodo Cloud Survey Results 2017
Denodo Cloud Survey Results 2017Denodo Cloud Survey Results 2017
Denodo Cloud Survey Results 2017
 
A "First Time Right" Start with Data Virtualization by Bart De Groeve, Practi...
A "First Time Right" Start with Data Virtualization by Bart De Groeve, Practi...A "First Time Right" Start with Data Virtualization by Bart De Groeve, Practi...
A "First Time Right" Start with Data Virtualization by Bart De Groeve, Practi...
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the Cloud
 
Real timedata
Real timedataReal timedata
Real timedata
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists
 
MSPs: Give customers the cloud (without letting them float away)
MSPs: Give customers the cloud (without letting them float away)MSPs: Give customers the cloud (without letting them float away)
MSPs: Give customers the cloud (without letting them float away)
 

Similar to Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan 2018

Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Kai Wähner
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorial
Anna Liu
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Csco
rajramab
 
Cloud computing (2)
Cloud computing (2)Cloud computing (2)
Cloud computing (2)
Vincent Kwon
 

Similar to Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan 2018 (20)

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorial
 
Vucci IBM Smart Cloud Presentation
Vucci IBM Smart Cloud PresentationVucci IBM Smart Cloud Presentation
Vucci IBM Smart Cloud Presentation
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Digitalization strategy for downstream oil refineries
Digitalization strategy for downstream oil refineriesDigitalization strategy for downstream oil refineries
Digitalization strategy for downstream oil refineries
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Building what's next with google cloud's powerful infrastructure
Building what's next with google cloud's powerful infrastructureBuilding what's next with google cloud's powerful infrastructure
Building what's next with google cloud's powerful infrastructure
 
CRTC Cloud- Scott Sadler
CRTC Cloud- Scott SadlerCRTC Cloud- Scott Sadler
CRTC Cloud- Scott Sadler
 
Arocom Company - Portfolio Brochure Details.pdf
Arocom Company - Portfolio Brochure Details.pdfArocom Company - Portfolio Brochure Details.pdf
Arocom Company - Portfolio Brochure Details.pdf
 
DataAquitaine February 2022
DataAquitaine February 2022DataAquitaine February 2022
DataAquitaine February 2022
 
Cloud Computing - Beyond the Hype
Cloud Computing - Beyond the HypeCloud Computing - Beyond the Hype
Cloud Computing - Beyond the Hype
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
accelerate-intelligent-solutions-with-machine-learning-platform-brief
accelerate-intelligent-solutions-with-machine-learning-platform-briefaccelerate-intelligent-solutions-with-machine-learning-platform-brief
accelerate-intelligent-solutions-with-machine-learning-platform-brief
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Csco
 
The Eco-System of AI and How to Use It
The Eco-System of AI and How to Use ItThe Eco-System of AI and How to Use It
The Eco-System of AI and How to Use It
 
Cloud computing (2)
Cloud computing (2)Cloud computing (2)
Cloud computing (2)
 

More from Codemotion

More from Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Recently uploaded

TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc
 

Recently uploaded (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 

Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan 2018

  • 1. Towards Data Science Engineering Principles Denis Jannot – EMEA Sales Engineer - @djannot Milan | November 29 - 30, 2018
  • 2. 2 © 2018 Mesosphere, Inc. All Rights Reserved. Machine Learning: Why it matters? Health Care Patient Diagnosis Finance Fraud Detection Manufacturing Anomaly Detection Retail Inventory Optimization Media Interaction and Speed Transportation Demand Forecasting eCommerce Recommender System Building precise models, an organization has a better chance of identifying profitable opportunities or avoiding unknown risks.
  • 3. 3 What you want to be doing Get Data Write intelligent machine learning code Train Model Run Model Repeat
  • 4. 4 ML Code isn’t the hardest part of Machine Learning Lifecycle Sculley, D., Holt, G., Golovin, D. et al. Hidden Technical Debt in Machine Learning Systems Configuration Machine Resource Management and Monitoring Serving Infrastructure Data Collection Data Verification Process Management Tools Feature Extraction ML Analysis Tools Model Monitoring System Admin / DevOps Data Engineer / DataOps Data Scientist
  • 5. Applying Software Delivery Best Practices to Data Science
  • 6. The Rise of the DataOps Engineer Combines two key skills: - Data science - Distributed systems engineering The equivalent of DevOps for Data Science
  • 7. Digital Transformation Challenges Data Analytics Cluster Message Queue Cluster Data Persistence Cluster Container Orchestration Cluster CI/CD Cluster High cost, low utilization Cloud-native technologies are distributed systems that require dedicated clusters Significant training & skills development Complex IT projects to run containers and data services; developed talent poached by competitors Container Orchestration Access to the latest technology is needed Container orchestration, data services, machine learning/AI tools, & CI/CD toolchains Data Services Locked to a cloud provider Expensive services ECS DynamoDBKinesisEMRCodePipeline
  • 8. Mesosphere DC/OS can be deployed anywhere Datacenter & Edge Clouds AWS EC2 Microsoft Azure Google Cloud 8 No lock-in Moving working between clouds or even back on premise
  • 9. Mesosphere DC/OS can be deployed anywhere Datacenter & Edge Clouds AWS EC2 Microsoft Azure Google Cloud 9 No lock-in Moving working between clouds or even back on premise
  • 10. Apache Mesos aggregates & manages all the resources Datacenter & Edge Clouds AWS EC2 Microsoft Azure Google Cloud 10 DISKGPUCPU Reduce infrastructure cost Deploying all the services on the same hardware No lock-in Moving working between clouds or even back on premise
  • 11. Run everything on the same cluster Typical Datacenter siloed, over-provisioned servers, low utilization DC/OS Datacenter automated schedulers, workload multiplexing onto the same machines Industry Average 12-15% utilization DC/OS Multiplexing 30-40% utilization, up to 96% at some customers 4X Kubernetes Kafka Jenkins Cassandra Spark
  • 12. DC/OS provides everything you need as a Service 12 Transport Message Queue Store Distributed Database Analyze AI / ML + Real-time Serve Container Orchestration Datacenter & Edge Clouds AWS EC2 Microsoft Azure Google Cloud DISKGPUCPU Reduce management cost Automating the deployment, upgrade, recovery, … Reduce infrastructure cost Deploying all the services on the same hardware Faster time to market Providing the latest technology in few clicks No lock-in Moving working between clouds or even back on premise Automation AuditingMultitenancy Logging
  • 13. All you need “as a Service” with Mesosphere support
  • 14. 14 Accelerate Machine Learning with Dramatically Lower Cost ● Ad-hoc provisioning ● Underutilized resource and data infrastructure ● Higher risk of data loss ● Jupyter Notebook as-a-Service ● Secure Collaboration ● Accelerated prototyping and deployment
  • 15. Transport Message Queue Store Distributed Database Analyze AI / ML + Real-time Serve Container Orchestration Unique Hybrid Cloud capabilities Datacenter & Edge Clouds AWS EC2 Microsoft Azure Google Cloud 15 DISKGPUCPU Reduce management cost Automating the deployment, upgrade, recovery, … Reduce infrastructure cost Deploying all the services on the same hardware Faster time to market Providing the latest technology in few clicks No lock-in Moving working between clouds or even back on premise Hybrid Cloud ( Cloud Bursting) Automation AuditingMultitenancy Logging
  • 16. Get Pictures on 2 categories using the Flickr API Persist the pictures in HDFS Retrain an Image Classifier for New Categories Kerberos TLS Provide a notebook experience to the end user Push the model in a git repository Configure a CI/CD pipeline to build and deploy a Docker image to serve the model
  • 17. What have we demonstrated ? ● Mesosphere DC/OS provides all the software you need to create a Secure ML pipeline ● It can deploy each of them in few minutes ● It can secure all the communications with Kerberos and TLS ● It provides the same experience on premise or on any public cloud, at a predictable cost ● It provides a nice notebook experience for the data scientists ● It can leverage GPUs ● Mesos quotas can be leveraged to guarantee resources ● Kubernetes can be used to serve the models
  • 18. Learn more at mesosphere.com