SlideShare a Scribd company logo
1 of 10
Download to read offline
=[J l[.—[[ ;l/’ [p;;;;’i8p
Retail Sales Forecasting: Corporacion Favorita
Performing Better Business Decisions
https://github.com/amjadraza/retail-sales-prediction
2
Who am I and What am I doing?
Muhammad A. Raza (Ph.D.)
A production ready, Scalable retail sales forecasting machine learning pipeline
Corporacion Favorita data
Model to forecast
Why Business need Model
Retail Sales Forecasting
Why we need Model
How do I approach it?
Proposed Model
Future Improvements
Tools/Technologies Used
Machine Learning Approach
3
Corporacion Favorita & Julian Assange
Corporacion Favorita-
TOTAL ASSETS
$1,440,2 Millions
Revenue
$2045 Millions
Net Income
$154 Millions
• large Ecuadorian-based grocery retailer
• Public Listed Company
• Founded in 1945
• A group of 20+ companies
4
03. Promo optimization
• Store sales planning
• Hot promotions planning
04. Consumer behavior
• What local consumer like the most
• Personalized consumer demand
• Meet individual consumer demand
even before they ask
01. Logistics Planning
• Optimal logistic transportation
• Optimal storage capacity
• Staff rostering
02. Better Business
Forecast
• Understand future revenue
• Helps to find demand and supply
• Better revenue predictions over
short periods as well as long period
How Sales Forecasting helps in Business?
5
Deploy the Model for
Production
› Deploy the Model on Cloud or in-
house for production.
› Keep tracking of real time
performance of model
Design
Cycle
Feature Engineering
› Relationship between data tables
› Extracting the features from
existing data
› Creating new features using
domain specific knowledge
Model Training
› Train the model
› Find optimal/best model
› Save the model in pickle
Preparing Data for Model
› Preparing the training, validation
and test data
› Imputing missing values if
necessary with some logic
Exploratory Data Analysis
› Data Engineering & Governess
› Distributions of data
› Can we believe the data?
An Overview of Forecasting ML Framework
6
Exploratory Data Analysis
Store Data Table1
Items Data Table2
Transactions Data Table3
Holidays Data Table4
Oil Data Table5
EDA-Part 1/1 EDA-Part 2/2
7
Feature Engineering
U n d erstan d in g th e Train in g D ata
Size of training data1 Identify Target variable2 Problem Identified3
P rep arin g th e D ata F o r Train in g
Training data set1 Validation set2 Test Set3
Feature Engineering notebook
8
Machine Learning Model
Model Training notebook
L ig h t GB M
What I chose Light GBM1 Industry Use Cases2 Fast and Scalable3
Model Comparison
Max
Round
Num_leaves Metric
Validation
MSE
Weight MSE
Model-Winner 200 3 l2 0.392 0.6266
Model-Proposed_v1 200 3 l2 0.2206 0.4699
Model-Proposed_v2
Comments
9
Future work to complete the pipeline
One of the interesting task, I would like to complete is creating new features.
Feature Engineering
01
Adding options for more models. Use linear Bayesian model to combine the predictions from individual models
ML/DL Models
02
With the increase of data, we can scale this pipeline using spark capabilities. Light GBM already has support to run with
pyspark.
Distributed Computation
03
Designing a framework to monitor the consistency of Model(s). Rolling training and validation is one option.
Consistency check of Model
04
Deploying the model for production. Set up CI/CD and add the test coverage.
Deploying with CI/CD
05
Python
PycharmPlotly Express
Jupyter
NotebooksUbuntu
Machine
Data Science Tools/Technologies
Sphnix Documentation
Git/GitHub

More Related Content

What's hot

A combination of decision tree learning and clustering
A combination of decision tree learning and clusteringA combination of decision tree learning and clustering
A combination of decision tree learning and clustering
Chinnapat Kaewchinporn
 
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Karthik Murugesan
 
ensemble learning
ensemble learningensemble learning
ensemble learning
butest
 

What's hot (20)

How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning Techniques
 
Machine learning life cycle
Machine learning life cycleMachine learning life cycle
Machine learning life cycle
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
A combination of decision tree learning and clustering
A combination of decision tree learning and clusteringA combination of decision tree learning and clustering
A combination of decision tree learning and clustering
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted trees
 
Machine Learning for Sales & Marketing
Machine Learning for Sales & MarketingMachine Learning for Sales & Marketing
Machine Learning for Sales & Marketing
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Data at Spotify
Data at SpotifyData at Spotify
Data at Spotify
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
 
ensemble learning
ensemble learningensemble learning
ensemble learning
 
Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Deep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle GroveDeep Credit Risk Ranking with LSTM with Kyle Grove
Deep Credit Risk Ranking with LSTM with Kyle Grove
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Market Basket Analysis in SAS
Market Basket Analysis in SASMarket Basket Analysis in SAS
Market Basket Analysis in SAS
 

Similar to A presentation for Retail Sales Projects

A Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationA Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence Application
Kate Subramanian
 

Similar to A presentation for Retail Sales Projects (20)

Big Data, Big Thinking: Untapped Opportunities
Big Data, Big Thinking: Untapped OpportunitiesBig Data, Big Thinking: Untapped Opportunities
Big Data, Big Thinking: Untapped Opportunities
 
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptxLecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Building a Marketing Data Warehouse from Scratch - SMX Advanced 202
Building a Marketing Data Warehouse from Scratch - SMX Advanced 202Building a Marketing Data Warehouse from Scratch - SMX Advanced 202
Building a Marketing Data Warehouse from Scratch - SMX Advanced 202
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_Trushita
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep Dive
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
 
INFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAININGINFORMATICA EASY LEARNING ONLINE TRAINING
INFORMATICA EASY LEARNING ONLINE TRAINING
 
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
 
Business function
Business functionBusiness function
Business function
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makers
 
Digital Velocity 2014 Morning Keynote: "Building an Effective Digital Marketi...
Digital Velocity 2014 Morning Keynote: "Building an Effective Digital Marketi...Digital Velocity 2014 Morning Keynote: "Building an Effective Digital Marketi...
Digital Velocity 2014 Morning Keynote: "Building an Effective Digital Marketi...
 
How to Capitalize on Big Data with Oracle Analytics Cloud
How to Capitalize on Big Data with Oracle Analytics CloudHow to Capitalize on Big Data with Oracle Analytics Cloud
How to Capitalize on Big Data with Oracle Analytics Cloud
 
Key Principles Of Data Mining
Key Principles Of Data MiningKey Principles Of Data Mining
Key Principles Of Data Mining
 
Big Data Business Transformation - Big Picture and Blueprints
Big Data Business Transformation - Big Picture and BlueprintsBig Data Business Transformation - Big Picture and Blueprints
Big Data Business Transformation - Big Picture and Blueprints
 
Meetup Data-science OVH
Meetup Data-science OVHMeetup Data-science OVH
Meetup Data-science OVH
 
How to Build an AI/ML Product and Sell it by SalesChoice CPO
How to Build an AI/ML Product and Sell it by SalesChoice CPOHow to Build an AI/ML Product and Sell it by SalesChoice CPO
How to Build an AI/ML Product and Sell it by SalesChoice CPO
 
Doing Analytics Right - Building the Analytics Environment
Doing Analytics Right - Building the Analytics EnvironmentDoing Analytics Right - Building the Analytics Environment
Doing Analytics Right - Building the Analytics Environment
 
A Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationA Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence Application
 

Recently uploaded

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 

Recently uploaded (20)

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 

A presentation for Retail Sales Projects

  • 1. =[J l[.—[[ ;l/’ [p;;;;’i8p Retail Sales Forecasting: Corporacion Favorita Performing Better Business Decisions https://github.com/amjadraza/retail-sales-prediction
  • 2. 2 Who am I and What am I doing? Muhammad A. Raza (Ph.D.) A production ready, Scalable retail sales forecasting machine learning pipeline Corporacion Favorita data Model to forecast Why Business need Model Retail Sales Forecasting Why we need Model How do I approach it? Proposed Model Future Improvements Tools/Technologies Used Machine Learning Approach
  • 3. 3 Corporacion Favorita & Julian Assange Corporacion Favorita- TOTAL ASSETS $1,440,2 Millions Revenue $2045 Millions Net Income $154 Millions • large Ecuadorian-based grocery retailer • Public Listed Company • Founded in 1945 • A group of 20+ companies
  • 4. 4 03. Promo optimization • Store sales planning • Hot promotions planning 04. Consumer behavior • What local consumer like the most • Personalized consumer demand • Meet individual consumer demand even before they ask 01. Logistics Planning • Optimal logistic transportation • Optimal storage capacity • Staff rostering 02. Better Business Forecast • Understand future revenue • Helps to find demand and supply • Better revenue predictions over short periods as well as long period How Sales Forecasting helps in Business?
  • 5. 5 Deploy the Model for Production › Deploy the Model on Cloud or in- house for production. › Keep tracking of real time performance of model Design Cycle Feature Engineering › Relationship between data tables › Extracting the features from existing data › Creating new features using domain specific knowledge Model Training › Train the model › Find optimal/best model › Save the model in pickle Preparing Data for Model › Preparing the training, validation and test data › Imputing missing values if necessary with some logic Exploratory Data Analysis › Data Engineering & Governess › Distributions of data › Can we believe the data? An Overview of Forecasting ML Framework
  • 6. 6 Exploratory Data Analysis Store Data Table1 Items Data Table2 Transactions Data Table3 Holidays Data Table4 Oil Data Table5 EDA-Part 1/1 EDA-Part 2/2
  • 7. 7 Feature Engineering U n d erstan d in g th e Train in g D ata Size of training data1 Identify Target variable2 Problem Identified3 P rep arin g th e D ata F o r Train in g Training data set1 Validation set2 Test Set3 Feature Engineering notebook
  • 8. 8 Machine Learning Model Model Training notebook L ig h t GB M What I chose Light GBM1 Industry Use Cases2 Fast and Scalable3 Model Comparison Max Round Num_leaves Metric Validation MSE Weight MSE Model-Winner 200 3 l2 0.392 0.6266 Model-Proposed_v1 200 3 l2 0.2206 0.4699 Model-Proposed_v2 Comments
  • 9. 9 Future work to complete the pipeline One of the interesting task, I would like to complete is creating new features. Feature Engineering 01 Adding options for more models. Use linear Bayesian model to combine the predictions from individual models ML/DL Models 02 With the increase of data, we can scale this pipeline using spark capabilities. Light GBM already has support to run with pyspark. Distributed Computation 03 Designing a framework to monitor the consistency of Model(s). Rolling training and validation is one option. Consistency check of Model 04 Deploying the model for production. Set up CI/CD and add the test coverage. Deploying with CI/CD 05
  • 10. Python PycharmPlotly Express Jupyter NotebooksUbuntu Machine Data Science Tools/Technologies Sphnix Documentation Git/GitHub