Looking to implement MLOps using AWS services and Kubeflow? Come and learn about machine learning from the experts of Provectus and Amazon Web Services (AWS)!
Businesses recognize that machine learning projects matter, but successful ML goes beyond building and deploying models, which is where most organizations stop. A successful ML project entails a complete lifecycle spanning ML, DevOps, and data engineering, and is built on top of solid ML infrastructure.
AWS and Amazon SageMaker provide a foundation for building machine learning infrastructure, while Kubeflow is a great open source project that does not get enough credit in the AWS community. In this webinar, we show how to design and build an end-to-end ML infrastructure on AWS.
Agenda
- Introductions
- Case Study: GoCheck Kids
- Overview of AWS Infrastructure for Machine Learning
- Provectus ML Infrastructure on AWS
- Experimentation
- MLOps
- Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Qingwei Li, ML Specialist Solutions Architect, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-mlops-and-reproducible-ml-on-aws-with-kubeflow-and-sagemaker-aug-2020/
MLOps and Reproducible ML on AWS with Kubeflow and Amazon SageMaker
1. MLOps and Reproducible ML on AWS
with Kubeflow and Amazon SageMaker
Presented by:
Stepan Pushkarev, CTO @ Provectus
Qingwei Li, ML Specialist Solutions Architect @ AWS
2. 1. Learn how to build a scalable and secure ML infrastructure on AWS with
Provectus
2. Explore best practices of using Amazon SageMaker with open source tools
for better experience and productivity
Webinar Objectives
3. 1. Familiarity with AWS & Amazon SageMaker services
2. Familiarity with ML Workflow
3. Familiarity with Kubeflow & Kubeflow Pipelines
Webinar Prerequisites
4. 1. Introductions
2. Case Study: GoCheck Kids
3. Overview of AWS Infrastructure for Machine Learning
4. Provectus ML Infrastructure on AWS
a. Experimentation
b. MLOps
c. Feature Store
Agenda
5. AI-First Consultancy & Solutions Provider
Clients ranging from
fast-growing startups to
large enterprises
450 employees and
growing
Established in 2010
HQ in Palo Alto
Offices across the US,
Canada, and Europe
We are obsessed with leveraging cloud, data, and AI to reimagine the way
businesses operate, compete, and deliver customer value
6. Innovative Tech Vendors
Seeking niche expertise to
differentiate and win the market
Midsize to Large Enterprises
Seeking to accelerate innovation
and achieve operational excellence
Our Clients
7. Introductions
Stepan Pushkarev
Chief Technology
Officer, Provectus
Iskandar Sitdikov
ML Solutions Architect,
Provectus
Rinat Gareev
ML Solutions Architect,
Provectus
Ilnur Garifullin
ML Solutions Architect,
Provectus
Qingwei Li
ML Specialist Solutions
Architect, AWS
8. The past few years have been like a dream come true for those who work in
analytics and big data. There is a new career path for platform engineers to learn
Hadoop, Scala, and Spark. Java and Python programmers have a chance to move
to the Big Data world, where they find higher salaries, new challenges, and get
to scale up to distributed systems. But recently I have started to hear some
complaints and dashed hopes from engineers who have spent time working there.
9. 1. Tools evolution — The Apache Spark/Hadoop ecosystem is great, but it is not stable and user-friendly enough
to just run and forget. Engineers and data scientists should contribute to existing open source projects and create
new tools to fill the gaps in day-to-day operations.
2. Education and cross skills — When data scientists write code, they need to think not just about abstractions,
but consider the practical issues of what is possible and what is reasonable. For example, they need to think how
long their query will run and whether the data they extract will fit into the storage mechanism they are using.
3. Improve the process — DevOps might be a solution. Here DevOps does not just mean writing Ansible scripts
and installing Jenkins. We need DevOps working in optimal fashion to reduce handoffs and invent new tools to
give everyone self-service to make them as productive as possible.
12. Deep Vision Solution for GoCheck Kids
Business Problem:
● Reduce manual overhead for child vision screening.
● Detect strabismus, crescent, and dark iris/pupil population, and reject images where the child is not looking straight into the camera.
● Security and compliance requirements: track everything, do not touch anything.
Solution:
● End-to-end deep learning image classification models to detect child gaze, strabismus, crescent, and dark iris/pupil population.
13. Modeling Hypothesis
Provectus has developed quite a few ML models, varying:
● Input (pre-processing, region cropping, single vs. two eyes, etc.): 6 variants
● Feature-generation backbone (deep convolutional networks: ResNet, MobileNet, EfficientNet, custom, etc.): 7 variants
● Transfer learning from a synthetic dataset: 3 variants
● Tweaks to the objective function to tackle data imbalance: 5 variants
● Dataset splits: 10 variants
6 × 7 × 3 × 5 × 10 = 6,300 combinations to test in 3 weeks!
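The hypothesis space above multiplies out combinatorially, which is why manual experimentation does not scale. A quick sketch of the grid (the variant counts come from the slide; the option labels are illustrative placeholders, not the actual configurations):

```python
from itertools import product

# Variant counts from the slide; the labels are made up for illustration.
inputs = [f"input_{i}" for i in range(6)]          # pre-processing, cropping, ...
backbones = [f"backbone_{i}" for i in range(7)]    # ResNet, MobileNet, ...
transfer = [f"transfer_{i}" for i in range(3)]     # synthetic-dataset transfer
objectives = [f"objective_{i}" for i in range(5)]  # loss tweaks for imbalance
splits = [f"split_{i}" for i in range(10)]         # dataset splits

grid = list(product(inputs, backbones, transfer, objectives, splits))
print(len(grid))  # 6 * 7 * 3 * 5 * 10 = 6300
```

Even at 15 minutes per run, exhaustively testing this grid would take over 65 GPU-days, so the team prioritized hypotheses and relied on reproducible pipelines instead.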
14. Conducted ~100* experiments on the entire dataset using pipelines within 3 weeks
● 100,000+ images
● Each experiment takes 15 min – 6 hours on a single GPU (P3 instance type)
* not counting development runs and experiments in notebook instances
We always had quite a few pending improvement hypotheses in the backlog
● Each good hypothesis needs several runs to determine the best hyperparameters
● OR an automatic hyperparameter optimizer
Data preparation took ~5 hours
● Had to parallelize and reuse outputs
Each experiment produces artifacts: models, metrics, predictions
Met security and compliance requirements
Benefits and Outcomes of ML Infrastructure
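The slide's point about parallelizing runs and reusing outputs can be sketched with the standard library. This is a toy stand-in, assuming a `run_experiment` function that scores one configuration; it is not the actual training code:

```python
from concurrent.futures import ThreadPoolExecutor

def run_experiment(config):
    # Stand-in for a real training run; returns a (config, metric) pair.
    return config, sum(config.values())

# A small illustrative grid of experiment configurations.
configs = [{"lr_idx": i, "split": s} for i in range(3) for s in range(4)]

# Run experiments in parallel instead of one by one.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_experiment, configs))

best_config, best_metric = max(results, key=lambda r: r[1])
print(best_config, best_metric)
```

In the real infrastructure this role is played by pipeline orchestration, which additionally caches and reuses intermediate artifacts such as the ~5-hour data-preparation output.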
15. Results Summary
● 3X increase in the ML model's recall (at the same precision)
● 95% of the ML engineer's time dedicated to experimentation
● 100+ large-scale experiments in 3 weeks by 3 ML engineers
● 100% secure and FDA compliant
This could not be achieved without Provectus ML Infrastructure on AWS
17. AWS AI/ML Stack
AWS AI Services: vision (Amazon Rekognition), speech (Amazon Polly, Amazon Transcribe +Medical), text (Amazon Comprehend +Medical, Amazon Translate, Amazon Textract), search (Amazon Kendra), chatbots (Amazon Lex), personalization (Amazon Personalize), forecasting (Amazon Forecast), fraud (Amazon Fraud Detector), development (Amazon CodeGuru), and contact centers (Contact Lens for Amazon Connect)
AWS ML Services: Amazon SageMaker and the Amazon SageMaker Studio IDE, including SageMaker Ground Truth, Amazon A2I, SageMaker Neo, built-in algorithms, SageMaker Notebooks, SageMaker Experiments, model tuning, SageMaker Debugger, SageMaker Autopilot, model hosting, and SageMaker Model Monitor
AWS ML Frameworks & Infrastructure: Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, and FPGAs
18. Amazon SageMaker: A Fully Managed Service for ML
● Collect and prepare training data
● Select or build ML algorithms
● Set up and manage environments for training
● Train, debug, and tune models
● Manage training runs
● Deploy models in production
● Monitor models
● Scale and manage the production environment
● Validate predictions
32. How Provectus Adds Value
● Feature Store: store and reuse features to build ML models faster
● ML Workflow Orchestrator: reproduce and track the whole ML workflow
● Dataset Management: track and govern training datasets
● Dataset Sampling: sample from production streams
● Advanced Monitoring: detect drift in text & images
● MLOps: continuous training & delivery
33. The Core of MLOps and Reproducible Experimentation
Pipelines
34. 1. Backbone of Experimentation flow
2. Essential part of Continuous Integration and Delivery flow
3. Major part of Continuous Retraining flow
4. Production workload (unlike traditional CI/CD)
5. Part of day-to-day model tuning and development process
6. Idempotent — Should produce the same results with the same inputs
ML Pipeline Characteristics
42. Summary of Kubeflow on AWS
Best Practices:
● Invest in a library of reusable components
● Use Amazon SageMaker Components for Kubeflow
● Deploy on Amazon EKS, consider Provectus Swiss Army
Kube for a quick start
● Use Argo and Kubeflow for MLOps
Benefits:
● Metadata Tracker and Pipeline Orchestrator
● Minimal intervention into existing day-to-day ML routines
44. Value Proposition of Feature Store
A data management layer for machine learning features.
1. Better ROI from feature engineering — Facilitates collaboration,
sharing and reusing of features
2. Increases ML Engineer productivity — Storage is further
decoupled from ML pipelines
3. Prevents training-serving data skew by design
4. Can encapsulate or facilitate data versioning and features
quality monitoring
45. Good News: A properly designed Data Lake
covers 80% of requirements for Feature Store
46. Adding ML Awareness to the Data Lake
Higher-Level Operations:
● Fetch batch (take a sample)
● Get one
● Add / deprecate a feature
Lineage Metadata:
● Upstream models
● Data sources and transformations
Annotation Metadata:
● Agreements
● Judgements
● Annotation job parameters
Data Profiling Metadata:
● Min/max
● Uniqueness, missing values, etc.
Governance Metadata:
● Owner
● Description
● Version
● Last updated, SLA
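The operations and metadata listed above can be sketched as a minimal in-memory feature registry. This is illustrative only: the class, method names, and example feature are invented for this sketch, and a real feature store such as Feast backs these operations with a data lake and an online store:

```python
import random

class FeatureRegistry:
    """Toy feature store: values plus governance metadata per feature."""

    def __init__(self):
        self._features = {}

    def add_feature(self, name, values, owner, description, version=1):
        self._features[name] = {
            "values": values,
            "owner": owner,            # governance metadata
            "description": description,
            "version": version,
            "deprecated": False,
        }

    def deprecate(self, name):
        # "Add / deprecate a feature" from the slide.
        self._features[name]["deprecated"] = True

    def get_one(self, name, index):
        return self._features[name]["values"][index]

    def fetch_batch(self, name, k, seed=0):
        # "Fetch batch (take a sample)" from the slide.
        rng = random.Random(seed)
        return rng.sample(self._features[name]["values"], k)

    def profile(self, name):
        # Data-profiling metadata: min/max, missing values, uniqueness.
        vals = self._features[name]["values"]
        non_null = [v for v in vals if v is not None]
        return {
            "min": min(non_null),
            "max": max(non_null),
            "missing": len(vals) - len(non_null),
            "unique": len(set(non_null)),
        }

store = FeatureRegistry()
store.add_feature("pupil_radius", [3, 5, None, 5, 8],
                  owner="ml-team", description="Pupil radius in px")
print(store.profile("pupil_radius"))
```

A data-lake-backed implementation adds exactly what the slide lists: lineage and annotation metadata per feature, plus versioned, governed storage behind the same interface.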
47. Feature Store: Options
● A general-purpose data catalogue, not a store. Adds a nice UI, governance, and searchability.
● Feast: great design, early stage; nicely overlaps with the Data Lake. No extensive metadata management yet. AWS support: https://github.com/feast-dev/feast/issues/367
● By Ph.D.s for Ph.D.s: a tremendous amount of work and very advanced concepts, but overcomplicated.
● By the creators of Uber's Michelangelo. Closed source.
48. 1. Modern ML infrastructure accelerates time to value for ML initiatives and increases
trust from the business
2. Eliminates handoffs between data scientists, ML engineers, and IT
3. A must-have for both small ML shops and large organizations, spanning from
straightforward image classification projects to more complex ML pipelines
4. A must-have for secure and compliant environments
5. Minimizes growing technical debt in machine learning projects
6. Complements fully managed AWS services with open source projects for pipeline
orchestration, experiment tracking, dataset versioning, and a feature store
Summary of ML Infrastructure
49. 125 University Avenue
Suite 290, Palo Alto
California, 94301
hello@provectus.com
Questions, details?
We would be happy to answer!