Machine Learning - From Scratch to Production

Machine Learning (ML): From Scratch to Production
Instructor: Amitabha Banerjee

What are we trying to do here?
(a.k.a. how to learn ML)
• Pick an interesting problem to solve
– Not something off a Kaggle dataset
– Something that sounds like fun to you
• Define a measurable outcome
– In terms of improving performance over time
• Convince yourself that you need ML
– ML should be the last resort
– See if any algorithm/heuristic works better
• Understand what it takes to do good ML
– Data (high quality)
– Algorithms/Models
– User feedback
– Measurability
– Continuous improvement
• Deploy ML in production

What tools will we use?
• Two Vector Robots
• Roboflow
• A Linux machine, such as a Raspberry Pi, to control Vector

Introducing Vector
• Introduced by Anki in October 2018
• Anki went bankrupt in May 2019
• Digital Dream Labs (DDL) bought Anki's assets in 2020
• DDL released Vector 2.0 in 2022
• https://www.digitaldreamlabs.com/products/vector-robot

Goal: Give Vector the ability to see another Vector

MLOps Lifecycle
[Figure: MLOps lifecycle diagram. Labeled elements: Upload images; Data Repository; Datasets; Annotate; Image Transformations (Pre-processing, Augmentation); Ready Datasets; Train, Test, Validate; Trained Model(s); ML Operations (Model Repository + Manager); Provide Best Model.]
(c) Amitabha Banerjee, 2022

Steps
● Step 1: Uploading Raw Images
● Step 2: Annotate (Label) the Images
● Step 3: Image Pre-Processing and Augmentation
● Step 4: Check dataset health and improve

Step 1: Uploading Raw Images
[Figure: Upload Images; Roboflow Data Repository]

Remember
● Choose a good sampling frequency
● Choose diverse deployment settings
● Cleaner data is better than no data
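
One way to picture a fixed sampling frequency is to keep one frame out of every N from the robot's camera feed. The sketch below is only illustrative: `sample_frame_indices` is a hypothetical helper, and the 15 fps frame rate and two-second interval are assumed values, not figures from the talk.

```python
def sample_frame_indices(total_frames, fps, seconds_between_samples):
    """Return the indices of frames to keep, one per sampling interval."""
    if fps <= 0 or seconds_between_samples <= 0:
        raise ValueError("fps and seconds_between_samples must be positive")
    # Distance, in frames, between two kept frames (never less than 1).
    step = max(1, round(fps * seconds_between_samples))
    return list(range(0, total_frames, step))


# Assumed values for illustration: a 20-second clip captured at 15 fps.
indices = sample_frame_indices(total_frames=300, fps=15, seconds_between_samples=2)
print(len(indices), "frames selected for upload")
```

Sampling very often tends to produce near-duplicate images, while sampling too rarely can miss lighting or pose conditions, so the interval is worth tuning per deployment setting.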

Step 2: Annotate (Label) the images

Three ways to annotate images
● Annotate on the desktop using tools such as labelImg
– Then upload manually
● Annotate directly in Roboflow
● Annotate with the help of machine learning
– Roboflow Label Assist
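
Desktop tools such as labelImg can save annotations as Pascal VOC XML, which stores each box as pixel corners. The sketch below shows one way to read such a file and convert the boxes to normalized center/width/height values, a layout many detection trainers accept. The sample XML, the file name, and the helper `voc_to_normalized_boxes` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for one image; real files come from the labeling tool.
SAMPLE_VOC_XML = """
<annotation>
  <filename>vector_0001.jpg</filename>
  <size><width>640</width><height>360</height><depth>3</depth></size>
  <object>
    <name>vector</name>
    <bndbox><xmin>160</xmin><ymin>90</ymin><xmax>480</xmax><ymax>270</ymax></bndbox>
  </object>
</annotation>
"""


def voc_to_normalized_boxes(xml_text):
    """Return (label, x_center, y_center, width, height), all scaled to 0..1."""
    root = ET.fromstring(xml_text)
    img_w = int(root.find("size/width").text)
    img_h = int(root.find("size/height").text)
    boxes = []
    for obj in root.findall("object"):
        bnd = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bnd.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((
            obj.find("name").text,
            (xmin + xmax) / 2 / img_w,   # box center, x
            (ymin + ymax) / 2 / img_h,   # box center, y
            (xmax - xmin) / img_w,       # box width
            (ymax - ymin) / img_h,       # box height
        ))
    return boxes


boxes = voc_to_normalized_boxes(SAMPLE_VOC_XML)
```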

Step 3: Image Pre-Processing and Augmentation
● Grayscale
● Adjust brightness, contrast, and saturation
● Rotate
● Blur
● Noise
● Bounding-box-specific enhancements
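
To make a few of these transformations concrete, here is a minimal pure-Python sketch of grayscale conversion, brightness adjustment, and a 90-degree rotation that also moves the bounding box. It assumes an image stored as rows of (r, g, b) tuples and a box given as (xmin, ymin, xmax, ymax) pixel edges; Roboflow applies such transformations for you, so this is only meant to show what happens to the pixels and labels, not how Roboflow implements them.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of 0..255 luma values."""
    # Standard luma weights (ITU-R BT.601).
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]


def adjust_brightness(gray, factor):
    """Scale every pixel by `factor`, clamping the result to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in gray]


def rotate90_clockwise(gray, box):
    """Rotate the image 90 degrees clockwise and move the box with it."""
    height = len(gray)
    # Reversing the rows and transposing gives a clockwise rotation.
    rotated = [list(col) for col in zip(*gray[::-1])]
    xmin, ymin, xmax, ymax = box
    # A point (x, y) lands at (height - y, x) in the rotated image.
    return rotated, (height - ymax, xmin, height - ymin, xmax)


# Tiny 2x2 test image: red, green / blue, white.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(image)
brighter = adjust_brightness(gray, 2.0)
rotated, rotated_box = rotate90_clockwise(gray, (0, 0, 1, 1))
```

The rotation case is the important one for object detection: any geometric augmentation has to transform the annotation along with the pixels, otherwise the label no longer covers the object.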

Step 4: Check Dataset Health
● Do you have enough null images?
● Are all images the same size?
● Do you have sufficient representation for all your classes?
● How do you keep correcting and improving?
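
The first three questions can be checked mechanically. The sketch below assumes each image is described by a hypothetical record with a `size` tuple and a list of `labels` (empty for a null image); the record layout, the `cube` class, and the sample data are illustrative only.

```python
from collections import Counter


def dataset_health(records):
    """Summarize null images, image sizes, and class balance."""
    sizes = Counter(r["size"] for r in records)
    class_counts = Counter(label for r in records for label in r["labels"])
    null_images = sum(1 for r in records if not r["labels"])
    return {
        "image_count": len(records),
        "null_fraction": null_images / len(records) if records else 0.0,
        "uniform_size": len(sizes) <= 1,
        "class_counts": dict(class_counts),
    }


# Made-up records for illustration.
records = [
    {"size": (640, 360), "labels": ["vector"]},
    {"size": (640, 360), "labels": ["vector", "cube"]},
    {"size": (640, 360), "labels": []},          # null image: no objects
    {"size": (320, 180), "labels": ["vector"]},  # different size
]
report = dataset_health(records)
```

A report like this flags where to collect or relabel more data, which feeds the fourth question.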

Training a Machine Learning Model

Training Options
● Train and deploy at Roboflow
● Train using a cloud service
– AWS SageMaker Studio
– Google Colab
● Train in your local environment
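
Whichever option is used, the dataset is first divided into train, validation, and test subsets. A minimal sketch of a reproducible split follows; the 70/20/10 ratio, the seed, and the file names are assumptions for illustration rather than values from the talk.

```python
import random


def split_dataset(items, train=0.7, valid=0.2, seed=42):
    """Shuffle deterministically, then cut into train / valid / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


# Hypothetical file names standing in for uploaded images.
filenames = [f"vector_{i:04d}.jpg" for i in range(100)]
train_set, valid_set, test_set = split_dataset(filenames)
```

Fixing the seed means a re-trained model is evaluated on the same held-out images, which keeps comparisons between model versions fair.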

Deployment Options
● Deploy in a cloud service
– Roboflow
– AWS Lambda
● Deploy in your local environment
– Build Docker images (to keep track of models)
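
As a rough idea of what the AWS Lambda option could look like, the sketch below shows a Python handler that accepts a base64-encoded image and returns detections as JSON. It assumes the function is invoked through an API Gateway proxy integration (hence the `body` and `isBase64Encoded` fields), and `detect_vectors` is a hypothetical placeholder where a trained model would be called.

```python
import base64
import json


def detect_vectors(image_bytes):
    """Hypothetical stand-in for the trained detector; returns no detections."""
    return []


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda for each request."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        image_bytes = base64.b64decode(body)
    else:
        image_bytes = body.encode("utf-8")
    if not image_bytes:
        return {"statusCode": 400, "body": json.dumps({"error": "empty image"})}
    detections = detect_vectors(image_bytes)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"detections": detections}),
    }


# Local smoke test with a fake payload (not a real image).
fake_event = {"body": base64.b64encode(b"fake-image").decode("ascii"),
              "isBase64Encoded": True}
response = lambda_handler(fake_event, None)
```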

Iteratively Improving ML Models
● Identify weaknesses
– Failure
– Inaccuracy
● Enrich the dataset for those conditions
● Re-train the model
● Re-evaluate model performance
– Check Key Performance Indicators (KPIs)
– Possible A/B testing
● Track all models (possibly in a database)
[Figure: Tesla Motors Inc. ML Pipeline (courtesy Andrej Karpathy)]
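
Tracking models in a database can be as simple as one table that records each trained model, the dataset version it was trained on, and its KPI. The sketch below uses Python's built-in `sqlite3` module; the table layout, the model names, and the KPI numbers are placeholders invented for illustration, not measurements.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) a small model registry."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS models (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               dataset_version TEXT NOT NULL,
               kpi REAL NOT NULL)"""
    )
    return conn


def register_model(conn, name, dataset_version, kpi):
    """Record one trained model and the KPI it achieved."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO models (name, dataset_version, kpi) VALUES (?, ?, ?)",
            (name, dataset_version, kpi),
        )


def best_model(conn):
    """Return the model with the highest KPI (newest wins ties)."""
    return conn.execute(
        "SELECT name, dataset_version, kpi FROM models "
        "ORDER BY kpi DESC, id DESC LIMIT 1"
    ).fetchone()


conn = open_registry()
register_model(conn, "vector-v1", "ds-1", 0.82)  # placeholder numbers
register_model(conn, "vector-v2", "ds-2", 0.91)
register_model(conn, "vector-v3", "ds-3", 0.88)
best = best_model(conn)
```

Keeping the dataset version next to each KPI makes it possible to tell whether an improvement came from new data or from a change in training.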

Conclusion
● An ML Operations pipeline separates success from failure
● Success in ML depends on:
– Quickly testing ML model performance in real life
– Gathering feedback and improving labeled data
– Re-training and re-measuring performance against KPIs
