Industrializing Machine Learning on an Enterprise Azure Platform with Databricks: Experiences and Feedbacks

<p>At Sodexo, the world leader in quality of life services, we believe that quality of life is created when we integrate our food services, facilities management, employee benefits and more. Our ambition is to positively impact one billion consumers worldwide. Sodexo launched a POC for food services that satisfies our business, but how do we go from a local, pure-Python Jupyter notebook to a product that can scale to cover all our consumers? To do so, we not only used DataBricks but also made it interact with many Azure services: Data Factory, Log Analytics, DevOps, CosmosDB, Kubernetes, Data Lake and Key Vault. This talk covers what those services are, how we use Databricks, how it interacts with those services, and our feedback and experiences.</p>

  1. 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  2. 2. RADJI Yannick, SODEXO Industrializing ML on Azure with DataBricks #UnifiedDataAnalytics #SparkAISummit
  3. 3. Who am I? 3 Data Scientist Data Engineer Data Architect Using: • Spark since 2015 • DataBricks since 2018 Work for: • Industry • Consulting • Services
  4. 4. Agenda • Sodexo • Food service use case • What was there? • What are we using now? • Our development practices
  5. 5. €20.4bn in consolidated revenues 460,000 employees #1 France-based private employer worldwide* 100 million consumers served daily 72 countries World leader in Quality of Life Services Key figures Sodexo *2018 Fortune 500 ranking
  6. 6. On-site Services Sodexo Benefits & Rewards Services Personal & Home Services 6 > Corporate Services > Energy & Resources > Government & Agencies > Sports & Leisure > Health Care > Seniors > Universities > Schools ▪ EMPLOYEE EXPERIENCE ▪ MOBILITY AND EXPENSE ▪ EDUCATION ▪ HEALTH CARE & SENIORS ▪ BUSINESS & ADMINISTRATIONS ▪ CHILD CARE ▪ CONCIERGE SERVICES ▪ HOME CARE We are the global leader in quality of life services
  7. 7. Sodexo 7 Business & Administrations 10,938 million euro in revenues 276,573 employees 56% of Group revenues REVENUES BY CLIENT SUB-SEGMENT KEY FIGURES
  8. 8. Food service use case 8
  9. 9. What was there? • Dozens of Jupyter notebooks running on Azure HDInsight Apache Spark on HDInsight Hive Query on HDInsight Azure Blob Storage Azure Data Lake Store
  10. 10. What was there? • Unversioned, no CI/CD • Hard to maintain & operate, with high costs • No orchestration • Not versatile • Not secure • Poor performance • No live data • No reusable design • Library conflicts
  11. 11. Current (macro) Architecture 11 Site managers Azure DevOps Azure Monitor Azure Data Factory Azure Data Lake Azure KeyVault Azure DataBricks Azure Kubernetes Batch data Store Prep & Train Serve Azure Container Registry
  12. 12. Azure DataBricks Why migrate? • Same tools for notebook exploration by data scientists and industrialisation by data engineers • Managed Spark cluster • Broad compatibility with Spark versions • Centralize all the ETL in Spark rather than splitting it across several technologies
  13. 13. Azure DataBricks Feedback: • Cluster autoscaling gives good results • Pools help to speed up jobs
  14. 14. Azure DataFactory Why do we use it? • Managed service • Natively integrated with all Azure services and on-premises databases • Easy orchestration & scheduling • No limitations on data volume or on the number of files
  15. 15. Azure DataFactory Lessons learned • Call DataBricks via a Web activity or a DataBricks activity • Save costs by running on a DataBricks job cluster; parameters are argparse-compatible • You can use 1 template for all jobs
  16. 16. Azure DataFactory 16
  17. 17. Azure Monitor Why do we use it? • Monitor and analyze daily runs and send email alerts • Same tool to get metrics and logs from all the packages of our software and from the services of the Azure platform (including Databricks)
  18. 18. Azure Monitor 18 source: docs.azuredatabricks.net
  19. 19. Azure Monitor 19 source: docs.azuredatabricks.net
  20. 20. Azure Monitor Lessons learned • It uses the KQL query language, which has its own specifics, so plan for a small learning curve • You can write your own library, but it is better to use the SDKs for Python, Java, JS, C# & .NET to send app logs
  21. 21. Azure Data Lake Why do we use it? • Used to store the source CSV files and the output Parquet files of DataBricks jobs Lessons learned • To use it easily, mount the Data Lake with dbutils • Data Lake Storage Gen2 is the convergence of Azure Blob storage and Azure Data Lake Storage Gen1 – not available in all regions • With a large number of files, propagating the permissions can take a long time
  22. 22. Azure Kubernetes Why do we use it? • To host our dashboard, REST API & database Lessons learned • The managed service saves you some DevOps time when setting up the cluster • Configuring a secure network can be challenging
  23. 23. Azure DevOps • Azure Repos • Azure Pipelines 23 source: docs.microsoft.com
  24. 24. Azure DevOps • Azure Boards • Azure Test Plans • Azure Artifacts 24 source: docs.microsoft.com
  25. 25. Azure DevOps Sprint planning Commits link to tasks Unit & Integration tests are passing Pull Request DEV Review, UAT Retro, Refinement
  26. 26. Our development practices • Use naming conventions • Validate input config and parameters • New code should come with new tests • Document • Avoid repetitive code • Use a logger, do not print • Take the time to think about the right data structure to use (a dictionary? Tuple? Dataframe? Array?) • Break down responsibilities between the classes • Code should be easy to explain to a third person (avoid complex code) • Developers should use a unified development environment • Use dockerized environments to simulate real interactions • Keep development, UAT, and production as similar as possible • One codebase protected in revision control, many releases • Strictly separate build and release stages
  27. 27. Summary • To industrialize, you need to design an architecture that is: – Versatile – Easy to maintain – Cheap to operate • Azure provides managed services to: – Orchestrate – Store – Set up all your DevOps – Host • Finally, you need a good team with solid development practices
  28. 28. DON’T FORGET TO RATE AND REVIEW THE SESSIONS Please send me any suggestions: yannick.radji@sodexo.com SEARCH SPARK + AI SUMMIT
  29. 29. Q&A #UnifiedDataAnalytics #SparkAISummit
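
Slide 15 notes that job parameters are argparse-compatible and that one template can serve all jobs. The slides do not show the entry point itself, so the following is only a minimal sketch of what such a shared entry point might look like; the function name and every parameter name (`--input-path`, `--output-path`, `--run-date`, `--env`) are assumptions made for illustration, not taken from the talk.

```python
import argparse


def parse_job_args(argv=None):
    """Parse the parameters an orchestrator passes to the job as plain strings.

    All argument names below are hypothetical; they only illustrate the idea
    of one argparse-based template reused by every job.
    """
    parser = argparse.ArgumentParser(description="Hypothetical batch job entry point")
    parser.add_argument("--input-path", required=True,
                        help="Folder holding the source CSV files")
    parser.add_argument("--output-path", required=True,
                        help="Folder receiving the output Parquet files")
    parser.add_argument("--run-date", required=True,
                        help="Business date of the run, formatted YYYY-MM-DD")
    parser.add_argument("--env", choices=["dev", "uat", "prod"], default="dev",
                        help="Target environment")
    return parser.parse_args(argv)
```

Calling `parse_job_args(["--input-path", "/mnt/in", "--output-path", "/mnt/out", "--run-date", "2019-10-15"])` returns a namespace whose `env` defaults to `"dev"`. Because the orchestrator only has to supply a list of strings, the same pipeline template can, in principle, launch any job that follows this convention.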

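Slide 21 recommends mounting the Data Lake with dbutils. The talk does not say which storage generation or authentication method was used, so this sketch assumes Data Lake Storage Gen2 accessed with a service principal (OAuth); the helper names, container, account and mount point are all hypothetical. The `dbutils` object is passed in explicitly because it only exists inside a Databricks notebook or job.

```python
def build_adls_gen2_mount(container, account, tenant_id, client_id, client_secret):
    """Return (source, extra_configs) for an ADLS Gen2 mount.

    Assumption: service-principal (OAuth) authentication; the talk does not
    state which method was actually used.
    """
    source = f"abfss://{container}@{account}.dfs.core.windows.net/"
    extra_configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    return source, extra_configs


def mount_if_missing(dbutils, mount_point, source, extra_configs):
    """Mount only when the mount point is absent, so that reruns are safe."""
    existing = {mount.mountPoint for mount in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point,
                     extra_configs=extra_configs)
    return True
```

In a notebook, the client secret would typically be read with `dbutils.secrets.get(scope=..., key=...)` rather than hard-coded; a Key Vault-backed secret scope is one way to do this, although the slides do not confirm that this is how Key Vault was wired in.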

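Two of the development practices on slide 26, validating input configuration and using a logger instead of print, can be illustrated together. This is a sketch under stated assumptions rather than the team's actual code: the logger name, the required keys and the date format are hypothetical.

```python
import logging
from datetime import datetime

# Hypothetical logger name; a real project would follow its own naming convention.
logger = logging.getLogger("food_service_job")

# Hypothetical configuration keys, chosen only for illustration.
REQUIRED_KEYS = ("input_path", "output_path", "run_date")


def validate_config(config):
    """Fail fast on a bad configuration instead of failing midway through a run."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"Missing configuration values: {', '.join(missing)}")
    try:
        datetime.strptime(config["run_date"], "%Y-%m-%d")
    except ValueError as exc:
        raise ValueError(f"run_date must be YYYY-MM-DD, got {config['run_date']!r}") from exc
    # Log through the logger so the message can be routed to a central log store.
    logger.info("Configuration validated for run date %s", config["run_date"])
    return config
```

Raising early with an explicit message keeps a misconfigured run from consuming cluster time, and routing messages through `logging` lets a handler forward them to whatever monitoring backend is in place without touching the job code.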