Machine Learning with PyCaret
Moez Ali
Creator of PyCaret
Agenda
• Introduction
• Machine Learning Life Cycle
• PyCaret
• Demo
• Q&A
About me
https://www.linkedin.com/in/profile-moez/
https://twitter.com/moezpycaretorg1
https://medium.com/@moez_62905/
moez@pycaret.org
• Background: Finance + Economics + Computer Science + Data Science
• Industry: Healthcare + Education + Consulting + Fintech
• Work: Asia + Middle East + East Africa + North America
• Open-Source: PyCaret
Important Links
• Official: https://www.pycaret.org
• GitHub: https://www.github.com/pycaret/pycaret
• LinkedIn: https://www.linkedin.com/company/pycaret
• YouTube: https://www.youtube.com/channel/UCxA1YTYJ9BEeo50lxyI_B3g
• Medium: https://medium.com/@moez_62905/
• Slack: https://join.slack.com/t/pycaret/shared_invite/zt-nm02k73a-PSDo5lwmQ6evlFCPRxsKKA
• Demo: https://www.github.com/pycaret/pycaret-demo-dataai2021
Machine Learning Life Cycle
Business Problem → Data Sourcing & ETL → Exploratory Data Analysis (EDA) → Data Prep. → Model Training & Selection → Deployment & Monitoring
Highlighted stages: Data Prep., Model Training & Selection
Data Prep, Model Training and Selection
Data → Train Test Split → Train / Test
Data preparation: Missing Values Imputation → Feature Scaling → Encodings → Feature Engineering
Cross Validation Environment: Model Training → Model Tuning → Model Ensemble → Model Selection
Finalize Pipeline → Deploy → Monitor
(Legend: some steps are marked Optional.)
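The steps in the diagram above can be sketched with plain scikit-learn. This is a minimal illustration on synthetic data, not code from the talk: the column names (`age`, `bmi`, `smoker`, `charges`), the Ridge model, and the 70/30 split are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data with hypothetical columns (illustration only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n).astype(float),
    "bmi": rng.normal(30, 5, size=n),
    "smoker": rng.choice(["yes", "no"], size=n),
})
df["charges"] = (
    200 * df["age"]
    + 300 * df["bmi"]
    + 10000 * (df["smoker"] == "yes")
    + rng.normal(0, 1000, size=n)
)
df.loc[::20, "bmi"] = np.nan  # inject some missing values

# Train/test split.
X = df.drop(columns="charges")
y = df["charges"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Data preparation: imputation and scaling for numeric columns,
# one-hot encoding for the categorical column.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
prep = ColumnTransformer([
    ("num", numeric, ["age", "bmi"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker"]),
])
model = Pipeline([("prep", prep), ("reg", Ridge(alpha=1.0))])

# Cross-validation on the training set, then a final fit and a
# single evaluation on the held-out test set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"CV R2: {cv_scores.mean():.3f}, test R2: {test_r2:.3f}")
```

Keeping the preparation steps inside the `Pipeline` means they are refit on each cross-validation fold, so no information from the validation fold leaks into imputation or scaling.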
Challenges of Machine Learning Lifecycle
● Machine learning is an iterative process and is very time-consuming.
● Having the right tooling in the hands of the right people is very important.
● Creating a seamless pipeline is hard. Managing it in production is harder.
● In small teams with growing technical debt, the focus on the end goal and on
solving business problems can take a backseat.
● Scalability is not just desirable; it is a necessity.
What is PyCaret?
PyCaret is an open-source, low-code machine learning library and end-to-end model
management tool that automates machine learning workflows. It is commonly used for
rapid prototyping and deployment of ML pipelines.
EASY TO USE | PRODUCTIVITY TOOL | BUSINESS READY
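As a rough illustration of what "low-code" means here, the sketch below chains the main calls of PyCaret's regression module. The function name `run_regression_experiment` and the `seed` parameter are hypothetical wrappers added for this example; the PyCaret calls themselves (`setup`, `compare_models`, `tune_model`, `finalize_model`, `predict_model`) belong to `pycaret.regression`, though exact arguments and behavior may differ between releases.

```python
def run_regression_experiment(data, target, seed=123):
    """Sketch: train, select, tune and finalize a regression pipeline."""
    # Imported lazily so that defining this helper does not require PyCaret.
    from pycaret.regression import (
        compare_models,
        finalize_model,
        predict_model,
        setup,
        tune_model,
    )

    # setup() infers column types, builds the preprocessing pipeline and
    # creates the train/test split. Some older releases may prompt for
    # confirmation of the inferred types at this point.
    setup(data=data, target=target, session_id=seed)

    best = compare_models()        # cross-validate the available models
    tuned = tune_model(best)       # hyperparameter search on the best one
    final = finalize_model(tuned)  # refit on the full dataset

    # Score rows without the target column, as one would for new data.
    predictions = predict_model(final, data=data.drop(columns=[target]))
    return final, predictions
```

With PyCaret installed, this could be called on the insurance data used in Demo 1, for example `run_regression_experiment(get_data("insurance"), "charges")` after `from pycaret.datasets import get_data`, which downloads the dataset.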
PyCaret Features
• Data Preparation
• Model Training
• Hyperparameter Tuning
• Model Selection
• Analysis & Interpretability
• Experiment Logging
Machine learning use cases supported:
• Supervised: Classification, Regression
• Unsupervised: Clustering, Anomaly Detection, Association Rule Mining, Natural Language Processing
• Time Series
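Each use case listed above corresponds to a PyCaret submodule. The mapping below is a small sketch of that correspondence; the dictionary and the `load_use_case` helper are hypothetical names for this example, and not every submodule is present in every PyCaret release (the NLP and association rule modules, for instance, exist only in older versions).

```python
import importlib

# Use case -> PyCaret submodule. Availability depends on the installed
# PyCaret version; treat this table as an assumption to verify locally.
USE_CASE_MODULES = {
    "Classification": "pycaret.classification",
    "Regression": "pycaret.regression",
    "Clustering": "pycaret.clustering",
    "Anomaly Detection": "pycaret.anomaly",
    "Association Rule Mining": "pycaret.arules",
    "Natural Language Processing": "pycaret.nlp",
    "Time Series": "pycaret.time_series",
}


def load_use_case(name):
    """Import and return the PyCaret module for a use case (hypothetical helper)."""
    return importlib.import_module(USE_CASE_MODULES[name])
```

The modules share the same function vocabulary where it applies (`setup`, `create_model`, and so on), so switching use case is mostly a matter of changing the import.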
Impact of PyCaret
[Figure: Cumulative Lines of Code Comparison, scikit-learn vs. PyCaret. X-axis: Machine Learning Workflow Level (Data Preparation, Model Training, Model Selection, Model Evaluation). Y-axis: cumulative lines of code, 0 to 180.]
Statistics
• Downloads: 500,000+
• Git Stars: 3,000+
• Contributors: 46
• Commits: 1,700+
Contributors
PyCaret on GPU and on distributed cluster
Training on GPU | Scalable Hyperparameter Tuning
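A hedged sketch of how these two options are typically switched on in PyCaret's regression module follows. The slide does not name a tuning backend; the choice of `tune-sklearn` with `hyperopt`, the `xgboost` model, and `n_iter=50` are assumptions for this example, and each requires the corresponding optional packages and a supported GPU. The wrapper `train_on_gpu_and_tune` is a hypothetical name.

```python
def train_on_gpu_and_tune(data, target, seed=123):
    """Sketch: GPU-enabled training plus a scalable hyperparameter search."""
    # Imported lazily so that defining this helper does not require PyCaret.
    from pycaret.regression import create_model, setup, tune_model

    # use_gpu=True asks PyCaret to use GPU-capable estimators where available.
    setup(data=data, target=target, session_id=seed, use_gpu=True)

    # Train a gradient-boosted model (assumes xgboost is installed).
    model = create_model("xgboost")

    # Delegate the search to an external backend instead of the default
    # scikit-learn random search (assumes tune-sklearn and hyperopt are
    # installed; other backends can be selected the same way).
    tuned = tune_model(
        model,
        n_iter=50,
        search_library="tune-sklearn",
        search_algorithm="hyperopt",
    )
    return tuned
```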
PyCaret Integrations
Demo
• Demo 1 – Regression Problem on Insurance data
• Demo 2 – Time Series Forecasting using Regression
• Demo 3 – Multiple Time Series Forecasting
• Demo 4 – Model serving through MLflow API
All Notebooks are available here:
https://www.github.com/pycaret/pycaret-demo-dataai2021
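Demo 2 treats forecasting as a regression problem. One common way to do that, shown here as a minimal sketch, is to turn a series into a supervised table of lagged values. The helper `make_lag_features` is a hypothetical name, and the demo notebook may engineer its features differently (for example from calendar fields).

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) where each row of X holds the n_lags previous values."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")

    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # values at t-n_lags .. t-1
        y.append(series[t])                   # value to predict at time t
    return X, y


X, y = make_lag_features([10, 12, 13, 15, 18], n_lags=2)
print(X)  # [[10, 12], [12, 13], [13, 15]]
print(y)  # [13, 15, 18]
```

The resulting table can be handed to any regressor. For time series, the train/test split should respect time order rather than shuffle rows.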
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
APPENDIX
Resources