2. 2
Who am I and What am I doing?
Muhammad A. Raza (Ph.D.)
A production ready, Scalable retail sales forecasting machine learning pipeline
Corporacion Favorita data
Model to forecast
Why Business need Model
Retail Sales Forecasting
Why we need Model
How do I approach it?
Proposed Model
Future Improvements
Tools/Technologies Used
Machine Learning Approach
3. 3
Corporacion Favorita & Julian Assange
Corporacion Favorita-
TOTAL ASSETS
$1,440,2 Millions
Revenue
$2045 Millions
Net Income
$154 Millions
• large Ecuadorian-based grocery retailer
• Public Listed Company
• Founded in 1945
• A group of 20+ companies
4. 4
03. Promo optimization
• Store sales planning
• Hot promotions planning
04. Consumer behavior
• What local consumer like the most
• Personalized consumer demand
• Meet individual consumer demand
even before they ask
01. Logistics Planning
• Optimal logistic transportation
• Optimal storage capacity
• Staff rostering
02. Better Business
Forecast
• Understand future revenue
• Helps to find demand and supply
• Better revenue predictions over
short periods as well as long period
How Sales Forecasting helps in Business?
5. 5
Deploy the Model for
Production
› Deploy the Model on Cloud or in-
house for production.
› Keep tracking of real time
performance of model
Design
Cycle
Feature Engineering
› Relationship between data tables
› Extracting the features from
existing data
› Creating new features using
domain specific knowledge
Model Training
› Train the model
› Find optimal/best model
› Save the model in pickle
Preparing Data for Model
› Preparing the training, validation
and test data
› Imputing missing values if
necessary with some logic
Exploratory Data Analysis
› Data Engineering & Governess
› Distributions of data
› Can we believe the data?
An Overview of Forecasting ML Framework
6. 6
Exploratory Data Analysis
Store Data Table1
Items Data Table2
Transactions Data Table3
Holidays Data Table4
Oil Data Table5
EDA-Part 1/1 EDA-Part 2/2
7. 7
Feature Engineering
U n d erstan d in g th e Train in g D ata
Size of training data1 Identify Target variable2 Problem Identified3
P rep arin g th e D ata F o r Train in g
Training data set1 Validation set2 Test Set3
Feature Engineering notebook
8. 8
Machine Learning Model
Model Training notebook
L ig h t GB M
What I chose Light GBM1 Industry Use Cases2 Fast and Scalable3
Model Comparison
Max
Round
Num_leaves Metric
Validation
MSE
Weight MSE
Model-Winner 200 3 l2 0.392 0.6266
Model-Proposed_v1 200 3 l2 0.2206 0.4699
Model-Proposed_v2
Comments
9. 9
Future work to complete the pipeline
One of the interesting task, I would like to complete is creating new features.
Feature Engineering
01
Adding options for more models. Use linear Bayesian model to combine the predictions from individual models
ML/DL Models
02
With the increase of data, we can scale this pipeline using spark capabilities. Light GBM already has support to run with
pyspark.
Distributed Computation
03
Designing a framework to monitor the consistency of Model(s). Rolling training and validation is one option.
Consistency check of Model
04
Deploying the model for production. Set up CI/CD and add the test coverage.
Deploying with CI/CD
05