Serverless machine learning operations

Any startup needs a clear go-to-market strategy from the beginning. Similarly, any data science project needs a go-to-production strategy from its first days so that it can move beyond proof of concept. Running machine learning and artificial intelligence in production results in hundreds of training pipelines and machine learning models that teams of data scientists continuously revise and that must connect seamlessly with web applications for tenants and users.

In this demo-based talk we walk through best practices for simplifying machine learning operations across the enterprise and for providing a serverless abstraction to data scientists and data engineers, so that they can train, deploy, and monitor machine learning models faster and with better quality.

  1. 1. Serverless Machine Learning Operations, by Stepan Pushkarev, CTO of Hydrosphere.io
  2. 2. About
      - Mission: accelerate machine learning to production
      - Open-source products:
          - Mist: Spark Compute as a Service
          - ML Lambda: ML Function as a Service
          - Sonar: Data and ML Monitoring
      - Business model: subscription services and hands-on consulting
  3. 3. Ops folks here? Machine Learning nerds here? VP/Managers/Strategy?
  4. 4. Development Operations are well studied
  5. 5. Machine Learning operations are ad hoc
      ● Research phase -> productization phase
      ● Script-driven: ./bin/spark-submit, python train.py
      ● Raw SQL / HiveQL / SQL on Hadoop
      ● Automated with cron and/or workflow managers
      ● Hosted-notebooks culture
  6. 6. ML Project Time to Market
  7. 7. ML Project Time to Market
  8. 8. ML Project Time to Market
  9. 9. Agenda
      - Go-to-production strategy from day 1
      - Training: serverless Spark compute
      - Serving/inference: serverless ML Lambdas
  10. 10. Why do businesses hire data scientists?
  11. 11. Why do companies hire data scientists? To make products smarter.
  12. 12. What does a data scientist deliver? An academic paper, an ML model, an R/Python script, a Jupyter notebook, or a BI dashboard.
  13. 13. How to move this to prod? Academic paper? ML model? R/Python script? Jupyter notebook? BI dashboard?
  14. 14. Tragedy 1: An engineer re-implements the R/Python script
  15. 15. Tragedy 2: Notebook/scripts deployments
  16. 16. Tragedy 2: Run the notebook/script as is using cron
  17. 17. Step 1 (management): Integrate data scientists into cross-functional teams. Source: © Daniel Tunkelang, "Where should you put your data scientists?", www.slideshare.net/dtunkelang/where-should-you-put-your-data-scientists
  18. 18. Step 2: Build/Deploy functions, not notebooks
  19. 19. Step 3: Monitor ML in production with other ML
      ● Data pipeline statistics
      ● Anomaly detection
      ● Pattern recognition
      ● Keep the data scientist in the loop
      ● Treat data errors as software bugs
  20. 20. Data Pipeline Functions
  21. 21. Batch Prediction Functions
  22. 22. From vanilla Spark (./bin/spark-submit) to serverless training and data processing
      - Spark Sessions Pool
      - Functions Registry
      - Multi-tenancy
      - REST API Framework
      - Data API Framework
      - Infrastructure Integration (EMR, Hortonworks, etc.)
  23. 23. UX: Deploy Spark functions and trigger them from apps
  24. 24. Demo: Mist, a serverless proxy for Spark
  25. 25. Machine Learning: training + serving
  26. 26. Training (estimation) pipeline: preprocess -> preprocess -> train
  27. 27. Prediction pipeline: preprocess -> preprocess
  28. 28. Local Spark ML Serving Library: https://github.com/Hydrospheredata/spark-ml-serving (diagram labels: cluster, data, model, data scientist, web app, docker, API, libs, model)
  29. 29. Model Artifact
  30. 30. Models - Runtimes - Formats Zoo
  31. 31. API & Logistics
      - HTTP/1.1, HTTP/2, gRPC
      - Kafka, Flink, Kinesis
      - Protobuf, Avro
      - Service Discovery
      - Pipelining
      - Tracing
      - Monitoring
      - Autoscaling
      - Versioning
      - A/B, Canary
      - Testing
      - CPU, GPU
  32. 32. Sidecar Architecture
  33. 33. UX: Train anywhere and deploy as a Function
  34. 34. UX: Models and Applications Applications provide public endpoints for the models and compositions of the models.
  35. 35. UX: Streaming Applications + Batching
  36. 36. UX: Pipelines, Assembles and BestSLA Applications
  37. 37. ML Function as a Service Demo!!!
  38. 38. Thank you
      - Looking for:
          - Feedback
          - Advisors, mentors & partners
          - Pilots and early adopters
      - Stay in touch:
          - @hydrospheredata
          - https://github.com/Hydrospheredata
          - http://hydrosphere.io/
          - spushkarev@hydrosphere.io
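
The "functions, not notebooks" step (slide 18) and the training and prediction pipelines (slides 26 and 27) could be sketched roughly as follows. This is a minimal illustration in plain Python, not the Mist or ML Lambda API: the names `Model`, `preprocess`, `train`, and `predict` are hypothetical, and the toy threshold classifier stands in for whatever model a team actually trains. The point it tries to show is that training and serving call the same preprocessing function, and that the trained artifact carries the parameters that preprocessing needs.

```python
"""Sketch: training and prediction as plain functions sharing one preprocessing step.

All names here (Model, preprocess, train, predict) are hypothetical and are
not part of Mist, ML Lambda, or any other Hydrosphere API.
"""
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Model:
    """Trained artifact: scaling parameters plus a decision threshold."""
    lo: float
    hi: float
    threshold: float


def preprocess(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Min-max scale values using parameters fixed at training time."""
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [(v - lo) / span for v in values]


def train(samples: Sequence[Tuple[float, int]]) -> Model:
    """Training pipeline: preprocess, then fit a midpoint threshold.

    Assumes samples contain at least one example of each label (0 and 1).
    """
    values = [v for v, _ in samples]
    lo, hi = min(values), max(values)
    scaled = preprocess(values, lo, hi)
    positives = [s for s, (_, label) in zip(scaled, samples) if label == 1]
    negatives = [s for s, (_, label) in zip(scaled, samples) if label == 0]
    threshold = (mean(positives) + mean(negatives)) / 2
    return Model(lo=lo, hi=hi, threshold=threshold)


def predict(model: Model, values: Sequence[float]) -> List[int]:
    """Prediction pipeline: the same preprocessing, then apply the threshold."""
    scaled = preprocess(values, model.lo, model.hi)
    return [1 if s >= model.threshold else 0 for s in scaled]


if __name__ == "__main__":
    data = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
    model = train(data)
    print(predict(model, [1.5, 8.5]))  # prints [0, 1]
```

Because `train` and `predict` are ordinary functions with explicit inputs and outputs, they can be unit-tested, versioned, and wrapped behind an HTTP or batch trigger without re-implementing notebook code.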
