=[J l[.—[[ ;l/’ [p;;;;’i8p
Retail Sales Forecasting: Corporacion Favorita
Performing Better Business Decisions
https://github.com/amjadraza/retail-sales-prediction
2
Who am I and What am I doing?
Muhammad A. Raza (Ph.D.)
A production ready, Scalable retail sales forecasting machine learning pipeline
Corporacion Favorita data
Model to forecast
Why Business need Model
Retail Sales Forecasting
Why we need Model
How do I approach it?
Proposed Model
Future Improvements
Tools/Technologies Used
Machine Learning Approach
3
Corporacion Favorita & Julian Assange
Corporacion Favorita-
TOTAL ASSETS
$1,440,2 Millions
Revenue
$2045 Millions
Net Income
$154 Millions
• large Ecuadorian-based grocery retailer
• Public Listed Company
• Founded in 1945
• A group of 20+ companies
4
03. Promo optimization
• Store sales planning
• Hot promotions planning
04. Consumer behavior
• What local consumer like the most
• Personalized consumer demand
• Meet individual consumer demand
even before they ask
01. Logistics Planning
• Optimal logistic transportation
• Optimal storage capacity
• Staff rostering
02. Better Business
Forecast
• Understand future revenue
• Helps to find demand and supply
• Better revenue predictions over
short periods as well as long period
How Sales Forecasting helps in Business?
5
Deploy the Model for
Production
› Deploy the Model on Cloud or in-
house for production.
› Keep tracking of real time
performance of model
Design
Cycle
Feature Engineering
› Relationship between data tables
› Extracting the features from
existing data
› Creating new features using
domain specific knowledge
Model Training
› Train the model
› Find optimal/best model
› Save the model in pickle
Preparing Data for Model
› Preparing the training, validation
and test data
› Imputing missing values if
necessary with some logic
Exploratory Data Analysis
› Data Engineering & Governess
› Distributions of data
› Can we believe the data?
An Overview of Forecasting ML Framework
6
Exploratory Data Analysis
Store Data Table1
Items Data Table2
Transactions Data Table3
Holidays Data Table4
Oil Data Table5
EDA-Part 1/1 EDA-Part 2/2
7
Feature Engineering
U n d erstan d in g th e Train in g D ata
Size of training data1 Identify Target variable2 Problem Identified3
P rep arin g th e D ata F o r Train in g
Training data set1 Validation set2 Test Set3
Feature Engineering notebook
8
Machine Learning Model
Model Training notebook
L ig h t GB M
What I chose Light GBM1 Industry Use Cases2 Fast and Scalable3
Model Comparison
Max
Round
Num_leaves Metric
Validation
MSE
Weight MSE
Model-Winner 200 3 l2 0.392 0.6266
Model-Proposed_v1 200 3 l2 0.2206 0.4699
Model-Proposed_v2
Comments
9
Future work to complete the pipeline
One of the interesting task, I would like to complete is creating new features.
Feature Engineering
01
Adding options for more models. Use linear Bayesian model to combine the predictions from individual models
ML/DL Models
02
With the increase of data, we can scale this pipeline using spark capabilities. Light GBM already has support to run with
pyspark.
Distributed Computation
03
Designing a framework to monitor the consistency of Model(s). Rolling training and validation is one option.
Consistency check of Model
04
Deploying the model for production. Set up CI/CD and add the test coverage.
Deploying with CI/CD
05
Python
PycharmPlotly Express
Jupyter
NotebooksUbuntu
Machine
Data Science Tools/Technologies
Sphnix Documentation
Git/GitHub

A presentation for Retail Sales Projects

  • 1.
    =[J l[.—[[ ;l/’[p;;;;’i8p Retail Sales Forecasting: Corporacion Favorita Performing Better Business Decisions https://github.com/amjadraza/retail-sales-prediction
  • 2.
    2 Who am Iand What am I doing? Muhammad A. Raza (Ph.D.) A production ready, Scalable retail sales forecasting machine learning pipeline Corporacion Favorita data Model to forecast Why Business need Model Retail Sales Forecasting Why we need Model How do I approach it? Proposed Model Future Improvements Tools/Technologies Used Machine Learning Approach
  • 3.
    3 Corporacion Favorita &Julian Assange Corporacion Favorita- TOTAL ASSETS $1,440,2 Millions Revenue $2045 Millions Net Income $154 Millions • large Ecuadorian-based grocery retailer • Public Listed Company • Founded in 1945 • A group of 20+ companies
  • 4.
    4 03. Promo optimization •Store sales planning • Hot promotions planning 04. Consumer behavior • What local consumer like the most • Personalized consumer demand • Meet individual consumer demand even before they ask 01. Logistics Planning • Optimal logistic transportation • Optimal storage capacity • Staff rostering 02. Better Business Forecast • Understand future revenue • Helps to find demand and supply • Better revenue predictions over short periods as well as long period How Sales Forecasting helps in Business?
  • 5.
    5 Deploy the Modelfor Production › Deploy the Model on Cloud or in- house for production. › Keep tracking of real time performance of model Design Cycle Feature Engineering › Relationship between data tables › Extracting the features from existing data › Creating new features using domain specific knowledge Model Training › Train the model › Find optimal/best model › Save the model in pickle Preparing Data for Model › Preparing the training, validation and test data › Imputing missing values if necessary with some logic Exploratory Data Analysis › Data Engineering & Governess › Distributions of data › Can we believe the data? An Overview of Forecasting ML Framework
  • 6.
    6 Exploratory Data Analysis StoreData Table1 Items Data Table2 Transactions Data Table3 Holidays Data Table4 Oil Data Table5 EDA-Part 1/1 EDA-Part 2/2
  • 7.
    7 Feature Engineering U nd erstan d in g th e Train in g D ata Size of training data1 Identify Target variable2 Problem Identified3 P rep arin g th e D ata F o r Train in g Training data set1 Validation set2 Test Set3 Feature Engineering notebook
  • 8.
    8 Machine Learning Model ModelTraining notebook L ig h t GB M What I chose Light GBM1 Industry Use Cases2 Fast and Scalable3 Model Comparison Max Round Num_leaves Metric Validation MSE Weight MSE Model-Winner 200 3 l2 0.392 0.6266 Model-Proposed_v1 200 3 l2 0.2206 0.4699 Model-Proposed_v2 Comments
  • 9.
    9 Future work tocomplete the pipeline One of the interesting task, I would like to complete is creating new features. Feature Engineering 01 Adding options for more models. Use linear Bayesian model to combine the predictions from individual models ML/DL Models 02 With the increase of data, we can scale this pipeline using spark capabilities. Light GBM already has support to run with pyspark. Distributed Computation 03 Designing a framework to monitor the consistency of Model(s). Rolling training and validation is one option. Consistency check of Model 04 Deploying the model for production. Set up CI/CD and add the test coverage. Deploying with CI/CD 05
  • 10.
    Python PycharmPlotly Express Jupyter NotebooksUbuntu Machine Data ScienceTools/Technologies Sphnix Documentation Git/GitHub