MLOps implemented - how we combine the cloud & open-source to boost data scientists' work - Krzysztof Zarzycki, Marek Wiewiórka - GetInData

Jan. 18, 2022

Slides from the talk our team gave at the NSML Summit.
Authors: Krzysztof Zarzycki, Marek Wiewiórka
LinkedIn: https://www.linkedin.com/in/kzarzycki/
https://www.linkedin.com/in/marekwiewiorka/
___
GetInData is a company founded in 2014 by ex-Spotify data engineers. From day one, our focus has been on Big Data projects. We bring together some of the most experienced experts in Poland, who work with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have extensive production experience implementing Big Data projects for Polish and foreign companies, including Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com

  1. 1. MLOps implemented - how we combine the cloud & open-source to boost data scientists' work
  2. 2. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
     Marek Wiewiórka, Chief Data Architect, marek.wiewiorka@getindata.com
     Krzysztof Zarzycki, Chief Technology Officer, krzysztof.zarzycki@getindata.com
  3. 3. GetInData in a Nutshell
     ● Founded by ex-Spotify engineers in 2014
     ● Focus only on Big Data and Cloud (from day 1)
     ● Community builders (Big Data Tech Warsaw, blogs, OSS)
     ● 80+ Big Data engineers (and growing)
  5. 5. How We Got to MLOps
     ● 2015: Google publishes “Hidden Technical Debt in Machine Learning Systems”
     ● 2018: Started building a cloud-native ML platform at ING Bank
     ● 2019: Started building an ML platform for a large Polish telecom
     ● 2020: Built an ML platform for Kcell, the largest Kazakh telecom
     ● 2020: MLOps projects started with retail (cloud), mobile app
     ● 2021: MLOps project started for the largest Polish bank (cloud)
     ● and more...
  6. 6. Our MLOps Principles
     ● Software engineering-like process, but for ML models
     ● The pipeline is the result, not the model
     ● No IT required to go from data science to production
     ● Freedom of choice of tools
     ● Loosely coupled mix of cloud services and open-source
     ● Best of breed instead of an all-in-one approach
  7. 7. Data Science Workbench - Our Vision
  8. 8. Data Scientist's IDE - Batteries Included
  9. 9. Data Scientist's IDE - Batteries Included
  10. 10. Kedro - Data Scientist’s Swiss Knife
      ● Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code
      ● Kedro’s main concepts:
         ○ Project template
         ○ Configuration and environments
         ○ Data catalog
         ○ Nodes and pipelines
  11. 11. Kedro - Project starters
      ● Common directory structure for all projects
      ● Customizable Cookiecutter templates
      ● Boilerplate code for an ML project using the Kedro framework
      ● Official and in-house starters
      Example: kedro new --starter=pyspark
  12. 12. Kedro - Data Catalog
      Data source definition:
      ● Separation of transformation code and data connectors
      ● Can be reused between projects
  13. 13. Kedro Nodes and Pipelines
      ● Node - a Python function with zero or more input and output datasets
      ● Pipeline - a DAG: a collection of nodes with defined relationships and dependencies
      Shown on the slide: kedro run, Kedro viz
  14. 14. Local development with Kedro and MLflow
      1. Log into JupyterLab
      2. Create a project with a Kedro starter
      3. EDA with notebooks & pipeline implementation using VS Code
      4. Run your project and automatically track experiments with a local MLflow
      5. Optionally schedule it with a local Airflow
      6. Repeat until you’re happy with your model!
  15. 15. Delivering an ML Model to Production
      ● Pipeline containerization with kedro-docker
      ● DAG generation and scheduling with one of our plugins:
         ○ kedro-airflow-k8s
         ○ kedro-kubeflow
      ● Dataset stability with kedro-popmon (together with ING)
      ● Kubernetes pod profiling (R&D)
      ● CI/CD for maximum automation
  16. 16. Model Training
      ● Freedom of toolkit choice with containerized execution
      ● Scalable training
      ● Experiment and model tracking
      ● “Continuous Training”: schedule- or event-driven
  17. 17. Model Serving
      ● CI/CD
      ● Models from registry
      ● Batch & online
      ● Scalability
      ● Extensive monitoring
  18. 18. Model Deployment to Production! (diagram; edge labels: writes, produces)
  19. 19. (diagram) Stages: Experimentation, Training, Serving. Components: kedro-kubeflow, kedro-airflow-k8s, model deployer, Jupyter plugins, prebaked images, Google AI Platform
  20. 20. MLOps R&D
      ● Align Data and ML engineering
      ● Feature Store
         ○ Feast, GCP, AWS
      ● Kedro
         ○ Company-wide data discovery tools
         ○ Hyperparameter tuning
         ○ Serving, model deployment
      ● Advanced deployments
      ● Retraining, data drift
      ● Business monitoring, outcome attribution
  21. 21. How to Start with MLOps?
      ● Focus on unlocking data scientists
         ○ Start with Data Science Workbench
         ○ Make code reproducible by CI
         ○ Then build Scalable Training
  22. 22. Thank you! - Dziękujemy!
      github.com/getindata/kedro-kubeflow
      github.com/getindata/kedro-airflow-k8s
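
The Kedro concepts from slides 12 and 13 (a data catalog that keeps data connectors apart from transformation code, nodes as plain Python functions, and a pipeline as a DAG of nodes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Kedro's actual API: `Node`, `MemoryDataSet` and `run_pipeline` are hypothetical names invented here, and the catalog is just a dictionary from dataset names to connectors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, NamedTuple


class Node(NamedTuple):
    """Hypothetical node: a function plus the dataset names it reads and writes."""
    func: Callable
    inputs: tuple
    outputs: tuple


class MemoryDataSet:
    """Hypothetical connector that keeps data in memory.

    A real catalog entry would point at a file, table or bucket instead.
    """

    def __init__(self, data=None):
        self._data = data

    def load(self):
        return self._data

    def save(self, data):
        self._data = data


def run_pipeline(nodes, catalog):
    """Run nodes in dependency order, reading and writing only via the catalog."""
    producers = {name: n for n in nodes for name in n.outputs}
    # A node depends on whichever node produces one of its inputs.
    graph = {n: {producers[i] for i in n.inputs if i in producers} for n in nodes}
    for n in TopologicalSorter(graph).static_order():
        args = [catalog[name].load() for name in n.inputs]
        result = n.func(*args)
        results = result if len(n.outputs) > 1 else (result,)
        for name, value in zip(n.outputs, results):
            catalog.setdefault(name, MemoryDataSet()).save(value)
    return catalog


# Transformation code: knows nothing about where the data lives.
def clean(raw):
    return [x for x in raw if x is not None]


def mean(values):
    return sum(values) / len(values)


catalog = {"raw": MemoryDataSet([1, None, 2, 3])}
# Nodes are listed out of order on purpose; the DAG decides the execution order.
nodes = [
    Node(mean, ("clean",), ("mean",)),
    Node(clean, ("raw",), ("clean",)),
]
run_pipeline(nodes, catalog)
print(catalog["mean"].load())  # prints 2.0
```

Because `clean` and `mean` only see their arguments, replacing the in-memory `raw` entry with a file- or database-backed connector would not touch the transformation code, which is the separation the Data Catalog slide describes.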
