Development of Sales Forecasting Application
for Retail Hypermarkets
using Simulation & Genetic Algorithm
Chachrist Srisuwanrat, Ph.D.
The Founder of ThaiQuants.com
Main Characteristics of Retail Hypermarkets for this Project
• Sales area: 6,000 to 12,000 sq.m.
• Land size: 16 to 24 rai
• Number of stories: 1 to 2 floors
• Provide food & non-food products
• Include rental areas
Objectives of this Project
Forecast sales for a potential new hypermarket (one that does not exist yet)
• Method
• Application
• Workflow
• Database
• Documents
Note: the project was conducted in late 2011 under a confidentiality and non-compete agreement.
This presentation gives only a broad idea of its methods and development.
Timeline
Software
• Camtasia for documenting and consulting
• MS Excel for the existing calculation and a new proof-of-concept calculation
• Visual Studio .NET for prototyping and developing the application
• MS Access for Database
• GIS Application for retrieving demographic information
Requirements from Executives (key decision makers)
• Result in 90% accuracy
• Eliminate subjective judgement
• Keep information secret
• No “Linear Regression”
Requirements from Strategists (end users)
• Use/modify the current method
• Expedite the forecasting process
• Retain some flexibility
Parameters of Retail Hypermarkets
1. Size
2. Location
3. Demographic factors
4. Visibility
5. Traffic
6. Competition
7. Design
8. Access
9. Activity
10. Parking
11. Signing
12. …
Sales Forecasting Methods
1. Regression Method (Linear & Non-linear)
2. Gravity Model (Demographic/Catchment Calculation)
3. Analogue Method (Stores’ Parameter Comparison)
[Figure: sales (million baht, 600 to 1,400) plotted against sales area (sq.m., 6,000 to 12,000); labels: 1. Regression Method, 2. Gravity Model, 3. Analogue Method]
1. Regression Method
Sales = a × X1 + b × X2 + c × X3 + … + K
• Xs are primary parameters:
• Store Size
• Catchment Area & Competition
• Demographic Variables
• …
• Xi could be a ratio.
• K is a constant such as 500 million baht
[Figure: sales (million baht, 600 to 1,400) plotted against sales area (sq.m., 6,000 to 12,000)]
[Table: stores No. 1 to 7, with columns Actual Sales, X1, X2, X3, …]
Derive a, b, c, …, and K
2. Gravity Model
Sales = Adjusted Population × Expenditure/head
• Primary Data
I. Store’s physical data
II. Geographic data
III. Demographic data
IV. …
Consider I, II, III, IV, …
[Figure labels: Transportation, Mountain]
3. Analogue Method (Traditional)
Sales = Average (Actual Sales × Parameters)
• Based on actual sales of existing stores
• Compare parameters between analog and target stores
• The parameters include both:
  • Primary data: demographic/geographic data
  • Secondary data: size, location, …
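| | Analog Store A | Analog Store B | Target Store X |
|---|---|---|---|
| Actual Sales | 1,000 | 800 | ? |
| Size | ?% | ?% | |
| Location | ?% | ?% | |
| Sum | | | |
| Sales after adjustment | | | ??? |

Assign % parameters:

| | Analog Store A | Analog Store B | Target Store X |
|---|---|---|---|
| Actual Sales | 1,000 | 800 | ? |
| Size | +15% | -5% | |
| Location | +5% | +15% | |
| Sum | +20% | +10% | |
| Sales after adjustment | 1,200 | 880 | 1,040 |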
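The arithmetic behind the filled-in table can be sketched in a few lines. This is only an illustration of the traditional analogue calculation as the table shows it; the function name `analogue_forecast` is hypothetical and not part of the original application.

```python
def analogue_forecast(analog_sales, adjustments_pct):
    """Adjust each analog store's actual sales by its summed % parameters,
    then average the adjusted values to forecast the target store."""
    adjusted = [
        sales * (1 + sum(pcts) / 100)
        for sales, pcts in zip(analog_sales, adjustments_pct)
    ]
    return adjusted, sum(adjusted) / len(adjusted)


# Store A: Size +15%, Location +5%; Store B: Size -5%, Location +15%
adjusted, forecast = analogue_forecast([1000, 800], [[15, 5], [-5, 15]])
print([round(a) for a in adjusted], round(forecast))  # [1200, 880] 1040
```

The forecast for Target Store X is simply the mean of the adjusted analog sales, which is why every parameter carries the same weight in this version of the method.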
Problems with Current Practice, Analogue Method
• Assigning subjective values
• Missing parameters (cannibalization, …)
• Weighting parameters equally
• Omitting the impact of other parameters
• Selecting ambiguous analog stores (A, B, C, …)
• Time-consuming
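| | Analog Store A | Analog Store B | Analog Store C | Target Store X |
|---|---|---|---|---|
| Actual Sales | 1,000 | 800 | 1,000 | ? |
| Size | +15% | -5% | - | |
| Location | +5% | +15% | +5% | |
| Visibility | - | - | +5% | |
| Traffic | ? | - | -15% | |
| Competition | +5% | - | +5% | |
| Sum | …% | …% | | |
| Sales after adjustment | … | … | | ??? |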
During Proof-Of-Concept Development
• Researching/learning about the topic & data
• Limiting effort from the team
• Requesting data for 10 existing stores
• Asking for only available data from the team
• Finding/Testing new data
• Discussing/Proposing ideas
• Focusing on primary data: size, population, household income
ACCURACY: 75%
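| | Analog Store A | Analog Store B | Analog Store C | Target Store X |
|---|---|---|---|---|
| Actual Sales | # | # | # | ? |
| P1 | % | % | % | - |
| P2 | % | % | % | |
| P3 | % | % | % | |
| … | % | % | % | |
| P6 | % | % | % | |
| Size | # | # | # | # |
| Population | # | # | # | # |
| HH Income | # | # | # | # |
| Sum | …% | …% | …% | |
| | … | … | … | ??? |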
During Prototype Development
• Changing assigned values to boolean-type data
  • P2.1: Is it visible from 150 meters?
• Creating levels for parameters
  • P2.2: Is it visible from 300 meters?
• Requesting more data and 3-point estimates
• Incorporating Monte Carlo Simulation
• Grouping secondary parameters into 3-4 groups
• Testing Genetic Algorithm Optimization
ACCURACY: 86%
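| | Analog Store A | Analog Store B | Analog Store C | Target Store X |
|---|---|---|---|---|
| Actual Sales | # | # | # | ? |
| P1 | % | % | % | - |
| P2.1 | | | | |
| P2.2 | | | | |
| … | … | … | … | … |
| P15 | | | | |
| Size | # | # | # | # |
| Pop @ 5 km | # | # | # | # |
| Pop @ 15 km | # | # | # | # |
| Pop @ 30 km | # | # | # | # |
| HHI @ 5 km | # | # | # | # |
| HHI @ 15 km | # | # | # | # |
| HHI @ 30 km | # | # | # | # |
| Sum | …% | …% | …% | |
| | … | … | … | ??? |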
Example of Using Boolean Parameters
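| | Analog Store A | Analog Store B | Analog Store C | Target Store X |
|---|---|---|---|---|
| Actual Sales | 1,000 | 800 | 1,000 | ? |
| P1: +10% | % | % | % | - |
| P2.1: +4% | | | | |
| P2.2: +6% | | | | |
| … | … | … | … | … |
| Sum | …% | …% | …% | |
| | … | … | … | ??? |

| | Analog Store A | Analog Store B | Analog Store C | Target Store X |
|---|---|---|---|---|
| Actual Sales | 1,000 | 800 | 1,000 | ? |
| P1: +10% | 0 | +10% | +10% | |
| P2.1: +4% | -4% | -4% | 0 | |
| P2.2: +6% | +6% | 0 | +6% | |
| … | … | … | … | … |
| Sum | +2% | +6% | +16% | |
| | 1,020 | 848 | 1,160 | 1,009 |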
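$$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3}\left(\text{Sales}_{A,B,C:\text{actual}} \times \sum_{p=1}^{P} \Delta P\right)}{S}$$

Here the sum of ΔP is applied to each analog store's actual sales as a percentage adjustment (for Store A, a sum of +2% turns 1,000 into 1,020), and the adjusted sales are averaged over the S analog stores.

$$\%\text{Accuracy} = 100 - \frac{\left|\text{Sales}_{\text{forecast}} - \text{Sales}_{\text{actual}}\right|}{\text{Sales}_{\text{actual}}} \times 100$$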
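As a rough check of the two formulas, the sketch below recomputes the 1,009 forecast from the boolean example and evaluates %Accuracy. The function names are hypothetical, the absolute value in `pct_accuracy` is an assumption (so that over- and under-forecasts are penalized alike), and the 900 versus 1,000 accuracy example uses made-up numbers purely for illustration.

```python
def forecast_from_deltas(actual_sales, deltas_pct):
    """Average of each analog store's actual sales adjusted by its summed % deltas."""
    adjusted = [s * (1 + sum(d) / 100) for s, d in zip(actual_sales, deltas_pct)]
    return sum(adjusted) / len(adjusted)


def pct_accuracy(forecast, actual):
    """100 minus the absolute percentage error (absolute value is an assumption)."""
    return 100 - abs(forecast - actual) / actual * 100


# Deltas per store from the boolean example: A (0, -4, +6), B (+10, -4, 0), C (+10, 0, +6)
forecast_x = forecast_from_deltas(
    [1000, 800, 1000],
    [[0, -4, 6], [10, -4, 0], [10, 0, 6]],
)
print(round(forecast_x))  # 1009

# Hypothetical numbers: a forecast of 900 against actual sales of 1,000
print(round(pct_accuracy(900, 1000), 1))  # 90.0
```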
Three-point Estimates & Simulation
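| | Analog Store A | Analog Store B | Analog Store C | Target Store X |
|---|---|---|---|---|
| Actual Sales | 1,000 | 800 | 1,000 | ? |
| P1: | % | % | % | - |
| P2.1: | | | | |
| P2.2: | | | | |
| … | … | … | … | … |
| Sum | …% | …% | …% | |
| | … | … | … | ??? |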
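Three-point estimates (minimum, most likely, maximum):

| Parameter | Minimum | Most likely | Maximum |
|---|---|---|---|
| P1 | 4% | 10% | 12% |
| P2 | 2% | 4% | 6% |
| P3 | 4% | 6% | 10% |

Simulate 1,000 runs to get 1,000 values of Sales forecast.

[Figure: probability (0.25 to 1.00) plotted against Sales forecast (600 to 1,400), with P1 = 10%, P2.1 = 4%, P2.2 = 6% marked]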
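A minimal sketch of the simulation step, under several assumptions: the three-point estimates are read as (minimum, most likely, maximum) and mapped onto P1, P2.1 and P2.2; each draw uses a triangular distribution (the slides only say "3-point estimates" and "Monte Carlo Simulation"); and the sign of each parameter per analog store is taken from the boolean example. All names are hypothetical.

```python
import random

# (low, mode, high) in percent. Assumption: one reading of the slide's sketches.
ESTIMATES = {"P1": (4, 10, 12), "P2.1": (2, 4, 6), "P2.2": (4, 6, 10)}

# Sign of each parameter per analog store, from the boolean example
# (A: 0, -4%, +6%; B: +10%, -4%, 0; C: +10%, 0, +6%).
SIGNS = {
    "A": {"P1": 0, "P2.1": -1, "P2.2": 1},
    "B": {"P1": 1, "P2.1": -1, "P2.2": 0},
    "C": {"P1": 1, "P2.1": 0, "P2.2": 1},
}
ACTUAL_SALES = {"A": 1000, "B": 800, "C": 1000}


def simulate_forecasts(runs=1000, seed=42):
    """Draw every parameter once per run and return one forecast per run."""
    rng = random.Random(seed)
    forecasts = []
    for _ in range(runs):
        drawn = {
            name: rng.triangular(low, high, mode)
            for name, (low, mode, high) in ESTIMATES.items()
        }
        adjusted = [
            ACTUAL_SALES[store]
            * (1 + sum(sign * drawn[p] for p, sign in signs.items()) / 100)
            for store, signs in SIGNS.items()
        ]
        forecasts.append(sum(adjusted) / len(adjusted))
    return forecasts


forecasts = sorted(simulate_forecasts())
print("median forecast:", round(forecasts[len(forecasts) // 2]))
```

Sorting the 1,000 values gives the empirical distribution directly: the value at position 250 is the 25% point, the value at position 500 is the median, and so on.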
60 + 10 Parameters
• About 60 Boolean parameters
• …Visibility in 150 m
• …
• …
• About 10 Numerical parameters
• Store size
• Population in 5, 15, 30 Km
• Household Income in 5, 15, 30 Km
• Should these parameters be weighted equally for the sales forecast?
Improving Accuracy with Optimization
• Categorizing Boolean Parameters into 6 groups
• Introducing Weights for each group
• Therefore, there will be Weight1, Weight2, …, Weight6
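Without weights:

$$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3}\left(\text{Sales}_{A,B,C:\text{actual}} \times \sum_{p=1}^{P} \Delta P\right)}{S}$$

With group weights:

$$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3}\left(\text{Sales}_{A,B,C:\text{actual}} \times \dfrac{\sum_{w=1}^{W=6}\left(\text{Weight}_w \times \sum_{p=1}^{P} \Delta P_w\right)}{\sum_{w=1}^{W=6} \text{Weight}_w}\right)}{S}$$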
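The weighted term can be sketched as follows. This is an illustration only: the function names are hypothetical, the per-group delta sums in the example are made-up numbers, and applying the weighted term as a percentage adjustment to actual sales is an assumption carried over from the earlier worked tables.

```python
def weighted_adjustment(weights, group_delta_sums):
    """sum(Weight_w * sum of deltas in group w) / sum(Weight_w)."""
    return sum(w * d for w, d in zip(weights, group_delta_sums)) / sum(weights)


def weighted_forecast(actual_sales, group_delta_sums_per_store, weights):
    """Average of each analog store's sales adjusted by its weighted % term."""
    adjusted = [
        sales * (1 + weighted_adjustment(weights, deltas) / 100)
        for sales, deltas in zip(actual_sales, group_delta_sums_per_store)
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical delta sums (in %) for the six parameter groups of one store
deltas = [6, 0, -4, 10, 0, 0]
print(weighted_adjustment([1.0] * 6, deltas))           # 2.0 (equal weights: plain mean)
print(weighted_adjustment([1, 0, 0, 0, 0, 0], deltas))  # 6.0 (only group 1 counts)
```

Because the term is divided by the sum of the weights, only the relative sizes of the weights matter: doubling every weight leaves the forecast unchanged.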
To find Optimum Weights:
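• How can the optimum weights be derived?
Remember: we do not have Sales X:actual for the calculation of %Accuracy!

$$\text{Objective Function} = \text{Maximize}(\%\text{Accuracy})$$

$$\%\text{Accuracy} = 100 - \frac{\left|\text{Sales}_{\text{forecast}} - \text{Sales}_{\text{actual}}\right|}{\text{Sales}_{\text{actual}}} \times 100$$

$$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3}\left(\text{Sales}_{A,B,C:\text{actual}} \times \dfrac{\sum_{w=1}^{W=6}\left(\text{Weight}_w \times \sum_{p=1}^{P} \Delta P_w\right)}{\sum_{w=1}^{W=6} \text{Weight}_w}\right)}{S}$$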
Can we derive the Optimum Weights for StoreX?
• No, we cannot: we do not have the target store's actual sales, Sales X:actual.
• So how can we maximize %Accuracy to solve this problem?
• Let's make some assumptions.

$$\%\text{Accuracy} = 100 - \frac{\left|\text{Sales}_{\text{forecast}} - \text{Sales}_{\text{actual}}\right|}{\text{Sales}_{\text{actual}}} \times 100$$
Assumptions used to derive Optimum Weights
• The selected analogue stores should be able to predict one another's sales effectively:
  • (A & B predict C); (A & C predict B); (B & C predict A).
• Optimizing %Accuracy with proper weights should then improve the forecasts for the analogue stores themselves.
• Target Store X and the selected analog stores (A, B, and C) are assumed to have similar characteristics.
Thus,
• Weights from Optimize (A & B predict C) should improve accuracy in (A & B forecast X).
• Weights from Optimize (A & C predict B) should improve accuracy in (A & C forecast X).
• Weights from Optimize (B & C predict A) should improve accuracy in (B & C forecast X).
• Next step: use Genetic Algorithm Optimization to derive the weights.
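The pairing scheme in these assumptions is a leave-one-out arrangement: each pair of analog stores predicts the store left out, whose actual sales are known. A small sketch of how the three tasks could be enumerated (the function name is hypothetical):

```python
from itertools import combinations


def leave_one_out_tasks(stores):
    """Return (predicting_pair, held_out_store) for every way of leaving one store out."""
    tasks = []
    for pair in combinations(stores, len(stores) - 1):
        held_out = next(s for s in stores if s not in pair)
        tasks.append((pair, held_out))
    return tasks


tasks = leave_one_out_tasks(["A", "B", "C"])
for pair, held_out in tasks:
    print(f"({' & '.join(pair)} predict {held_out})")
# (A & B predict C)
# (A & C predict B)
# (B & C predict A)
```

Each task yields its own set of optimized weights, which is then reused with the same pair to forecast Target Store X.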
Why do we need GA Optimization?
All Possibilities
• 3 pairs of analogue stores: (A & B), (A & C), and (B & C)
• 10 possible values for each weight: 0.1, 0.2, 0.3, …, and 1.0
• 6 groups of parameters: W1, W2, W3, …, and W6
All Possibilities = 3 × 10^6 = 3,000,000
• Simulation: 200 runs per set of weights (2 minutes per 200 runs)
Total Duration = 3 × 10^6 × 2 = 6,000,000 minutes, or 4,166 days, or about 11.4 years
Therefore, we need Genetic Algorithm Optimization.
Note: the sets of weights from each pair of analogue stores are unique to that pair.
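The brute-force estimate is easy to reproduce. The sketch below only restates the slide's arithmetic; the variable names are hypothetical and a 365-day year is assumed.

```python
pairs = 3               # pairs of analogue stores
values_per_weight = 10  # 0.1, 0.2, ..., 1.0
weight_groups = 6       # W1 ... W6
minutes_per_set = 2     # 200 simulation runs per set of weights

weight_sets = pairs * values_per_weight ** weight_groups
total_minutes = weight_sets * minutes_per_set
total_days = total_minutes / (60 * 24)
total_years = total_days / 365  # assuming 365-day years

print(weight_sets)            # 3000000
print(total_minutes)          # 6000000
print(int(total_days))        # 4166
print(round(total_years, 1))  # 11.4
```

So an exhaustive search would take a little over 11 years of continuous computation, which is what motivates a guided search such as a genetic algorithm.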
Genetic Algorithm Optimization
“The Greater %Accuracy, the Greater Survival Rate to the new generation”
• 3 Main GA Operations
1. Reproduction
2. Crossover (mating)
3. Mutation
• GA parameters
• totalGeneration = 50, and 1 initial generation
• totalPopulation = 50 for each generation
• probCrossover = 0.20, and nCrossover = 1
• probMutation = 0.03, and nMutation = 2
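Actual optimization data structure (Generation i):

| Pop ID | W1 | W2 | W3 | W4 | W5 | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | | | |
| 2 | | | | | | | | |
| 3 | | | | | | | | |
| … | | | | | | | | |
| 50 | | | | | | | | |
| Total | | | | | | | | |

Simplified example (Generation i):

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | | | | | | |
| 2 | | | | | | |
| 3 | | | | | | |
| 4 | | | | | | |
| 5 | | | | | | |
| Total | | | | | | |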
Simplified Example of GA: (A & B) predict C, then X
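InitialGeneration: randomize all the weights

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | 0.4 | 0.6 | 0.2 | 0.7 | | |
| 2 | 0.5 | 0.7 | 1.0 | 0.9 | | |
| 3 | 0.8 | 0.1 | 0.4 | 0.2 | | |
| 4 | 0.6 | 1.0 | 0.8 | 0.8 | | |
| 5 | 0.5 | 0.4 | 0.3 | 0.9 | | |
| Total | | | | | | |

$$\%\text{Accuracy} = 100 - \frac{\left|\text{Sales}_{\text{forecast}} - \text{Sales}_{\text{actual}}\right|}{\text{Sales}_{\text{actual}}} \times 100$$

$$\text{Sales}_{C:\text{forecast}} = \frac{\sum_{s=1}^{S=2}\left(\text{Sales}_{A,B:\text{actual}} \times \dfrac{\sum_{w=1}^{W=4}\left(\text{Weight}_w \times \sum_{p=1}^{P} \Delta P_w\right)}{\sum_{w=1}^{W=4} \text{Weight}_w}\right)}{2}$$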
200 Simulation Runs
• Each %Accuracy is derived from 200 simulation runs, i.e. 200 values of Sales C:forecast

InitialGeneration: 200 Simulation Runs (200 runs per row)

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | 0.4 | 0.6 | 0.2 | 0.7 | 70 | |
| 2 | 0.5 | 0.7 | 1.0 | 0.9 | 95 | |
| 3 | 0.8 | 0.1 | 0.4 | 0.2 | 65 | |
| 4 | 0.6 | 1.0 | 0.8 | 0.8 | 20 | |
| 5 | 0.5 | 0.4 | 0.3 | 0.9 | 80 | |
| Total | | | | | | |
1. GA Reproduction
InitialGeneration (Gen00): Create Biased Roulette Wheel

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability | Cumulative |
|---|---|---|---|---|---|---|---|
| 1 | 0.4 | 0.6 | 0.2 | 0.7 | 70 | 0.21 | 0.21 |
| 2 | 0.5 | 0.7 | 1.0 | 0.9 | 95 | 0.29 | 0.50 |
| 3 | 0.8 | 0.1 | 0.4 | 0.2 | 65 | 0.20 | 0.70 |
| 4 | 0.6 | 1.0 | 0.8 | 0.8 | 20 | 0.06 | 0.76 |
| 5 | 0.5 | 0.4 | 0.3 | 0.9 | 80 | 0.24 | 1.00 |
| Total | | | | | 330 | 1.00 | |

For the new Generation 1:
1. Random = 0.80 then Gen00Pop05 is selected for Pop01.
2. Random = 0.15 then Gen00Pop01 is selected for Pop02.
3. Random = 0.44 then Gen00Pop02 is selected for Pop03.
4. Random = 0.98 then Gen00Pop05 is selected for Pop04.
5. Random = 0.22 then Gen00Pop02 is selected for Pop05.

Generation 1 (Gen01): Reproduction

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.4 | 0.3 | 0.9 | | |
| 2 | 0.4 | 0.6 | 0.2 | 0.7 | | |
| 3 | 0.5 | 0.7 | 1.0 | 0.9 | | |
| 4 | 0.5 | 0.4 | 0.3 | 0.9 | | |
| 5 | 0.5 | 0.7 | 1.0 | 0.9 | | |
| Total | | | | | | |
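The biased roulette wheel can be sketched as cumulative probabilities plus a lookup. The helper names `build_wheel` and `spin` are hypothetical; the %Accuracy values and the five random numbers are the ones from the example, so the picks reproduce the Generation 1 selection.

```python
import bisect
from itertools import accumulate


def build_wheel(accuracies):
    """Cumulative selection probabilities, each proportional to %Accuracy."""
    total = sum(accuracies)
    return list(accumulate(a / total for a in accuracies))


def spin(wheel, r):
    """Return the 1-based Pop ID whose slot on the wheel contains r."""
    index = bisect.bisect_right(wheel, r)
    return min(index, len(wheel) - 1) + 1  # guard against float round-off at 1.0


wheel = build_wheel([70, 95, 65, 20, 80])
print([round(c, 2) for c in wheel])  # [0.21, 0.5, 0.7, 0.76, 1.0]

picks = [spin(wheel, r) for r in (0.80, 0.15, 0.44, 0.98, 0.22)]
print(picks)  # [5, 1, 2, 5, 2]
```

Sets of weights with a higher %Accuracy occupy a wider slot on the wheel, so they are more likely to be copied into the next generation, while weak sets such as Pop 4 (slot width 0.06) rarely survive.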
2. GA Crossover
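Generation 1: Pairing

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.4 | 0.3 | 0.9 | | |
| 2 | 0.4 | 0.6 | 0.2 | 0.7 | | |
| 3 | 0.5 | 0.7 | 1.0 | 0.9 | | |
| 4 | 0.5 | 0.4 | 0.3 | 0.9 | | |
| 5 | 0.5 | 0.7 | 1.0 | 0.9 | | |
| Total | | | | | | |

    If rand(0,1) < probCrossover then    // probCrossover = 0.20
      For nCrossover                     // nCrossover = 1
        randInteger[1, 6]
        crossover()
      End For
    End If

Generation 1: Crossover

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.6 | 0.3 | 0.9 | | |
| 2 | 0.4 | 0.4 | 0.2 | 0.7 | | |
| 3 | 0.5 | 0.7 | 0.3 | 0.9 | | |
| 4 | 0.5 | 0.4 | 1.0 | 0.9 | | |
| 5 | 0.5 | 0.7 | 1.0 | 0.9 | | |
| Total | | | | | | |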
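A runnable reading of the crossover pseudocode is sketched below. It assumes, based on the before and after tables, that `crossover()` swaps the weight at one randomly chosen position between the two paired sets; the function names and the use of `random.Random` are illustrative choices, not the original implementation.

```python
import random


def crossover_pair(parent_a, parent_b, position):
    """Swap the weight at `position` between two paired sets of weights."""
    child_a, child_b = list(parent_a), list(parent_b)
    child_a[position], child_b[position] = child_b[position], child_a[position]
    return child_a, child_b


def maybe_crossover(parent_a, parent_b, rng, prob_crossover=0.20, n_crossover=1):
    """With probability prob_crossover, swap n_crossover randomly chosen positions."""
    child_a, child_b = list(parent_a), list(parent_b)
    if rng.random() < prob_crossover:
        for _ in range(n_crossover):
            position = rng.randrange(len(child_a))
            child_a, child_b = crossover_pair(child_a, child_b, position)
    return child_a, child_b


# Pop 1 and Pop 2 from the pairing table, swapping W2 (index 1)
pop1, pop2 = crossover_pair([0.5, 0.4, 0.3, 0.9], [0.4, 0.6, 0.2, 0.7], 1)
print(pop1, pop2)  # [0.5, 0.6, 0.3, 0.9] [0.4, 0.4, 0.2, 0.7]

rng = random.Random(0)
print(maybe_crossover([0.5, 0.7, 1.0, 0.9], [0.5, 0.4, 0.3, 0.9], rng))
```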
3. GA Mutation
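Generation 1: Randomly select

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.4 | 0.3 | 0.9 | | |
| 2 | 0.4 | 0.6 | 0.2 | 0.7 | | |
| 3 | 0.5 | 0.7 | 1.0 | 0.9 | | |
| 4 | 0.5 | 0.4 | 0.3 | 0.9 | | |
| 5 | 0.5 | 0.7 | 1.0 | 0.9 | | |
| Total | | | | | | |

    For nMutation                        // nMutation = 2
      If rand(0,1) < probMutation then   // probMutation = 0.03
        randInteger[1, 6]
        mutate()
      End If
    End For

Generation 1: Mutate

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.6 | 0.7 | 0.9 | | |
| 2 | 0.4 | 0.4 | 0.2 | 0.7 | | |
| 3 | 0.5 | 0.7 | 0.3 | 0.9 | | |
| 4 | 0.5 | 0.3 | 1.0 | 0.8 | | |
| 5 | 0.1 | 0.7 | 1.0 | 0.9 | | |
| Total | | | | | | |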
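The mutation pseudocode can be read the same way. The sketch assumes that `mutate()` replaces the weight at a randomly chosen position with a fresh value drawn from the ten allowed levels (0.1 to 1.0); that replacement rule, the function name and the `ALLOWED_WEIGHTS` list are assumptions for illustration.

```python
import random

# The ten weight levels used in the search: 0.1, 0.2, ..., 1.0
ALLOWED_WEIGHTS = [round(0.1 * i, 1) for i in range(1, 11)]


def mutate(individual, rng, prob_mutation=0.03, n_mutation=2):
    """Make n_mutation attempts; each succeeds with probability prob_mutation
    and overwrites one randomly chosen weight with a random allowed level."""
    mutated = list(individual)
    for _ in range(n_mutation):
        if rng.random() < prob_mutation:
            position = rng.randrange(len(mutated))
            mutated[position] = rng.choice(ALLOWED_WEIGHTS)
    return mutated


rng = random.Random(7)
original = [0.5, 0.4, 1.0, 0.9]
print(mutate(original, rng))                     # usually unchanged at 3% per attempt
print(mutate(original, rng, prob_mutation=1.0))  # both attempts mutate a position
```

With a low probability per attempt, most sets of weights pass through unchanged; mutation mainly serves to reintroduce weight levels that reproduction and crossover have lost.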
200 Simulation Runs

Generation 1: 200 Simulation Runs (200 runs per row), then Create Biased Roulette Wheel

| Pop ID | W1 | W2 | … | W6 | %Accuracy | Probability |
|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.6 | 0.7 | 0.9 | 60 | 0.15 |
| 2 | 0.4 | 0.4 | 0.2 | 0.7 | 75 | 0.19 |
| 3 | 0.5 | 0.7 | 0.3 | 0.9 | 90 | 0.23 |
| 4 | 0.5 | 0.3 | 1.0 | 0.8 | 70 | 0.18 |
| 5 | 0.1 | 0.7 | 1.0 | 0.9 | 95 | 0.24 |
| Total | | | | | 390 | 1.00 |
Create the next Generation
Repeat the processes (1. Reproduction, 2. Crossover, 3. Mutation) until Generation 50.
[Diagram: Generation 1 → 1. Reproduction → 2. Crossover → 3. Mutation → Generation 2]
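Rank the Sets of Optimum Weights by %Accuracy

| No | Gen ID | Pop ID | W1 | W2 | … | W6 | ±Error | %Accuracy |
|---|---|---|---|---|---|---|---|---|
| 1 | 00 | 01 | … | … | … | … | … | … |
| 2 | 00 | 02 | … | … | … | … | … | … |
| … | … | … | … | … | … | … | … | … |
| 50 | 00 | 50 | … | … | … | … | … | … |
| 51 | 01 | 01 | … | … | … | … | … | … |
| 52 | 01 | 02 | … | … | … | … | … | … |
| … | … | … | … | … | … | … | … | … |
| 100 | 01 | 50 | … | … | … | … | … | … |
| 101 | 02 | 01 | … | … | … | … | … | … |
| … | … | … | … | … | … | … | … | … |
| 2501 | 50 | 01 | … | … | … | … | … | … |
| 2502 | 50 | 02 | … | … | … | … | … | … |
| … | … | … | … | … | … | … | … | … |
| 2550 | 50 | 50 | … | … | … | … | … | … |

Rank & Select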
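Top 100 Sets of Optimum Weights (A & B predict C), Ranked by %Accuracy
Use these sets of optimum weights & ±Error for (A & B forecast X)

| RANK | No | Gen ID | Pop ID | W1 | W2 | … | W6 | ±Error | %Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 1 | … | … | … | … | … | | | | 98 |
| 2 | … | … | … | … | … | | | | 96 |
| 3 | … | … | … | … | … | | | | 96 |
| … | … | … | … | … | … | | | | … |
| 100 | … | … | … | … | … | | | | 88 |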
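The rank-and-select step amounts to sorting every evaluated set of weights by %Accuracy and keeping the best 100. A minimal sketch, assuming each evaluated set is stored as a dictionary; the field names and the three sample records are hypothetical, not taken from the project data.

```python
def top_sets(results, n=100):
    """Return the n records with the highest %Accuracy, best first."""
    return sorted(results, key=lambda r: r["accuracy"], reverse=True)[:n]


# Hypothetical records: (generation, population ID, weights, %Accuracy)
results = [
    {"gen": 0, "pop": 1, "weights": (0.4, 0.6, 0.2, 0.7), "accuracy": 70},
    {"gen": 0, "pop": 2, "weights": (0.5, 0.7, 1.0, 0.9), "accuracy": 95},
    {"gen": 1, "pop": 5, "weights": (0.1, 0.7, 1.0, 0.9), "accuracy": 96},
]
best = top_sets(results, n=2)
print([(r["gen"], r["pop"], r["accuracy"]) for r in best])  # [(1, 5, 96), (0, 2, 95)]
```

Ranking across all generations, rather than taking only the final generation, keeps good sets of weights that appeared early and were later lost to crossover or mutation.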
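Optimum Weights from (A & B predict C) to Forecast X

(A & B predict C)

| RANK | W1 | W2 | … | W6 | ±Error | Sales Forecast |
|---|---|---|---|---|---|---|
| 1 | … | … | … | … | … | |
| 2 | … | … | … | … | … | |
| 3 | … | … | … | … | … | |
| … | … | … | … | … | … | |
| 100 | … | … | … | … | … | |

(A & B predict C) → X (200 runs per set of weights)

| RANK | W1 | W2 | … | W6 | ±Error | Sales Forecast |
|---|---|---|---|---|---|---|
| 1 | … | … | … | … | … | 200 values |
| 2 | … | … | … | … | … | 200 values |
| 3 | … | … | … | … | … | 200 values |
| … | … | … | … | … | … | 200 values |
| 100 | … | … | … | … | … | 200 values |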
Optimum Weights from the 3 pairs to Forecast X
| Weights from → Target | RANK | W1 | W2 | … | W6 | ±Error | Sales Forecast |
|---|---|---|---|---|---|---|---|
| (A & B predict C) → X | 1 | … | … | … | … | … | 200 values |
| | … | … | … | … | … | … | 200 values |
| | 100 | … | … | … | … | … | 200 values |
| (A & C predict B) → X | 1 | … | … | … | … | … | 200 values |
| | … | … | … | … | … | … | 200 values |
| | 100 | … | … | … | … | … | 200 values |
| (B & C predict A) → X | 1 | … | … | … | … | … | 200 values |
| | … | … | … | … | … | … | 200 values |
| | 100 | … | … | … | … | … | 200 values |

3 × 100 × 200 values of Sales X:forecast

[Figure: probability (0.25 to 1.00) plotted against Sales X:forecast (600 to 1,400)]
Selection of Analog Stores
1. Size & Design
2. Region: North, South, …
3. Regional Characteristics: CBD, around Bangkok, international trading area, …
4. Population: 5, 15, and 30 Km
5. Household Income: 5, 15, and 30 Km
6. Competition Level: rival stores, local stores, cannibalization, …
These criteria are used to calculate the “Analogue Store Selection Score”.
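Example Table of Analogue Store Selection Score

| No | Score | Store ID | Size | Design | Region | Regional Characteristics | Population | Household Income | Competition Level |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 98 | 56 | | | | | | | |
| 2 | 96 | 42 | | | | | | | |
| 3 | 95 | 31 | | | | | | | |
| 4 | 92 | 7 | | | | | | | |
| 5 | 87 | 19 | | | | | | | |
| … | … | … | | | | | | | |
| 10 | 71 | 28 | | | | | | | |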
Optimize the Processes of GA & Simulation
1. Consider tradeoff between Time & Accuracy
2. GA Optimization
• Adjust GA parameters to suit the problem
• Start the initial generation (Gen00) better/smarter
3. Simulation
• Reduce the number of simulation runs
• Skip already-tested parameter sets, and just copy their results: %Accuracy and ±Error
Run time: from about 11 years (all possibilities) to 2 months (GA), to 2 days (optimized GA),
and finally to 2 hours (optimized GA & simulation).
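Skipping already-tested sets of weights is essentially memoization: the result of the 200 simulation runs is stored under the weight tuple and copied on any later request. A minimal sketch, where `make_cached_evaluator` and the stand-in `slow_evaluate` are hypothetical names and the stand-in does not perform a real simulation:

```python
def make_cached_evaluator(evaluate):
    """Wrap an expensive evaluation so each distinct set of weights is simulated once."""
    cache = {}

    def cached(weights):
        key = tuple(weights)  # lists are not hashable, tuples are
        if key not in cache:
            cache[key] = evaluate(key)
        return cache[key]

    return cached, cache


calls = []


def slow_evaluate(weights):
    """Stand-in for 200 simulation runs; records each call for demonstration."""
    calls.append(weights)
    return sum(weights)  # placeholder score, not a real %Accuracy


cached_evaluate, cache = make_cached_evaluator(slow_evaluate)
cached_evaluate([0.5, 0.4, 0.3, 0.9])
cached_evaluate([0.5, 0.4, 0.3, 0.9])  # same set again: result is copied, not re-simulated
print(len(calls))  # 1
```

Because reproduction copies strong sets of weights into later generations unchanged, repeats are frequent, which is where this kind of reuse saves the most time.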
Final Formula for Sales Forecast
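$$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3}\left(\text{Sales}_{A,B,C:\text{actual}} \times \dfrac{\sum_{w=1}^{W=6}\left(\text{Weight}_w \times \sum_{p=1}^{P} \Delta P_w\right)}{\sum_{w=1}^{W=6} \text{Weight}_w} \times \text{function}_{\text{Numerical-Parameters}}\right)}{S}$$

$$\text{function}_{\text{Numerical-Parameters}} = k_{\text{Size}}\,\text{func}_{\text{Size}} \times k_{\text{Pop}}\,\text{func}_{\text{Pop}} \times k_{\text{HHI}}\,\text{func}_{\text{HHI}} \times k_{\text{Comp}}\,\text{func}_{\text{Comp}}$$

Simplified example:

$$\text{function}_{\text{Size}} = \frac{\text{Size}_X}{\text{Size}_{\text{Analogue}}}$$

with a maximum and minimum value range
(Store Size, Population, Household Income, and Competition)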
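The simplified size function, a ratio held inside a minimum and maximum range, could look like the sketch below. The function name and the bounds of 0.8 and 1.2 are hypothetical; the presentation does not disclose the actual ranges or the k values.

```python
def clamped_ratio(target_value, analogue_value, lower, upper):
    """Ratio of the target store's value to the analog store's, limited to [lower, upper]."""
    return max(lower, min(upper, target_value / analogue_value))


# Hypothetical bounds of 0.8 and 1.2 on the size effect
print(clamped_ratio(9_000, 10_000, 0.8, 1.2))   # 0.9 (inside the range)
print(clamped_ratio(12_000, 8_000, 0.8, 1.2))   # 1.2 (1.5 capped at the maximum)
print(clamped_ratio(6_000, 12_000, 0.8, 1.2))   # 0.8 (0.5 raised to the minimum)
```

Limiting the ratio keeps a very large or very small analog store from dominating the forecast through a single numerical parameter.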
Requirements from Executives (key decision makers)
• Result in 90% accuracy → Final accuracy is about 90% or higher
• Eliminate subjective judgement → Use 60 boolean parameters with 3-point estimates
• Keep information secret → Data is locked on a single laptop
• No “Linear Regression” → Employ Simulation and Genetic Algorithm Optimization
Requirements from Strategists (end users)
• Use/modify the current method → Use the Analogue Method with modifications
• Expedite the forecasting process → Optimize the processes of GA & Simulation
• Retain some flexibility → Analogue store selection, and k values for numerical parameters
Keys to success for this project
• Full support from executives
• Professional participation from the team
• Great contribution from the head of the team (expert view)
• Willingness to test new ideas & data
• Effective and flexible time management
• Reasonable expectations for the project
Future Improvement
• Consider a dedicated weight for each boolean parameter
• Adjust 3-point Estimates
• Improve numerical-parameter functions
• Enhance Analogue Store Selection Scoring System
• Test different types of Optimization methods and other Machine Learning methods
• Use better Hardware and GPU
• Incorporate other Sales Forecasting Methods
• Develop automated and repeatable improvement processes in the application
THANK YOU
Chachrist Srisuwanrat, Ph.D.
The Founder of ThaiQuants.com

More Related Content

Similar to Location based sales forecast for superstores

A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...PAPIs.io
 
SMART Seminar Series: "Optimisation of closed loop supply chain decisions usi...
SMART Seminar Series: "Optimisation of closed loop supply chain decisions usi...SMART Seminar Series: "Optimisation of closed loop supply chain decisions usi...
SMART Seminar Series: "Optimisation of closed loop supply chain decisions usi...SMART Infrastructure Facility
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ FyberDaniel Hen
 
Labeling Foot Traffic in Dense Locations
Labeling Foot Traffic in Dense LocationsLabeling Foot Traffic in Dense Locations
Labeling Foot Traffic in Dense LocationsOm Patri
 
Big Data LDN 2017: Advanced Analytics Applied to Marketing Attribution
Big Data LDN 2017: Advanced Analytics Applied to Marketing AttributionBig Data LDN 2017: Advanced Analytics Applied to Marketing Attribution
Big Data LDN 2017: Advanced Analytics Applied to Marketing AttributionMatt Stubbs
 
Scheduling advertisements on a web page to maximize revenue
Scheduling advertisements on a web page to maximize revenueScheduling advertisements on a web page to maximize revenue
Scheduling advertisements on a web page to maximize revenueShu-Jeng Hsieh
 
Int'l Conference on Predictive APIs: RTB Optimizer presentation
Int'l Conference on Predictive APIs: RTB Optimizer presentationInt'l Conference on Predictive APIs: RTB Optimizer presentation
Int'l Conference on Predictive APIs: RTB Optimizer presentationDatacratic
 
Ed Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual AnalysisEd Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual AnalysisVolha Banadyseva
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitDatabricks
 
Winner Determination in Combinatorial Reverse Auctions
Winner Determination in Combinatorial Reverse AuctionsWinner Determination in Combinatorial Reverse Auctions
Winner Determination in Combinatorial Reverse AuctionsSamira Sadaoui
 
Wooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit CustomersWooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit CustomersLucinda Linde
 
Unit2 montecarlosimulation
Unit2 montecarlosimulationUnit2 montecarlosimulation
Unit2 montecarlosimulationDevaKumari Vijay
 
Unit 4 simulation and queing theory(m/m/1)
Unit 4  simulation and queing theory(m/m/1)Unit 4  simulation and queing theory(m/m/1)
Unit 4 simulation and queing theory(m/m/1)DevaKumari Vijay
 
Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!Louis Dorard
 
Assumptions: Check yo'self before you wreck yourself
Assumptions: Check yo'self before you wreck yourselfAssumptions: Check yo'self before you wreck yourself
Assumptions: Check yo'self before you wreck yourselfErin Shellman
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingDatabricks
 

Similar to Location based sales forecast for superstores (20)

Ali upload
Ali uploadAli upload
Ali upload
 
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
 
SMART Seminar Series: "Optimisation of closed loop supply chain decisions usi...
SMART Seminar Series: "Optimisation of closed loop supply chain decisions usi...SMART Seminar Series: "Optimisation of closed loop supply chain decisions usi...
SMART Seminar Series: "Optimisation of closed loop supply chain decisions usi...
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ Fyber
 
MATH 533 Entire Course NEW
MATH 533 Entire Course NEWMATH 533 Entire Course NEW
MATH 533 Entire Course NEW
 
Marketing strategy exo hind
Marketing strategy exo hindMarketing strategy exo hind
Marketing strategy exo hind
 
Labeling Foot Traffic in Dense Locations
Labeling Foot Traffic in Dense LocationsLabeling Foot Traffic in Dense Locations
Labeling Foot Traffic in Dense Locations
 
Big Data LDN 2017: Advanced Analytics Applied to Marketing Attribution
Big Data LDN 2017: Advanced Analytics Applied to Marketing AttributionBig Data LDN 2017: Advanced Analytics Applied to Marketing Attribution
Big Data LDN 2017: Advanced Analytics Applied to Marketing Attribution
 
Scheduling advertisements on a web page to maximize revenue
Scheduling advertisements on a web page to maximize revenueScheduling advertisements on a web page to maximize revenue
Scheduling advertisements on a web page to maximize revenue
 
Int'l Conference on Predictive APIs: RTB Optimizer presentation
Int'l Conference on Predictive APIs: RTB Optimizer presentationInt'l Conference on Predictive APIs: RTB Optimizer presentation
Int'l Conference on Predictive APIs: RTB Optimizer presentation
 
Ed Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual AnalysisEd Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual Analysis
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
 
Winner Determination in Combinatorial Reverse Auctions
Winner Determination in Combinatorial Reverse AuctionsWinner Determination in Combinatorial Reverse Auctions
Winner Determination in Combinatorial Reverse Auctions
 
Ahp
AhpAhp
Ahp
 
Wooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit CustomersWooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit Customers
 
Unit2 montecarlosimulation
Unit2 montecarlosimulationUnit2 montecarlosimulation
Unit2 montecarlosimulation
 
Unit 4 simulation and queing theory(m/m/1)
Unit 4  simulation and queing theory(m/m/1)Unit 4  simulation and queing theory(m/m/1)
Unit 4 simulation and queing theory(m/m/1)
 
Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!
 
Assumptions: Check yo'self before you wreck yourself
Assumptions: Check yo'self before you wreck yourselfAssumptions: Check yo'self before you wreck yourself
Assumptions: Check yo'self before you wreck yourself
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 

Recently uploaded

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 

Recently uploaded (20)

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b

Location based sales forecast for superstores

  • 1. Development of Sales Forecasting Application for Retail Hypermarkets using Simulation & Genetic Algorithm Chachrist Srisuwanrat, Ph.D. The Founder of ThaiQuants.com
  • 2. Main Characteristics of Retail Hypermarkets for this Project
      • Sales area: 6,000 to 12,000 sq.m.
      • Land size: 16 to 24 rai
      • Number of stories: 1 to 2 floors
      • Food and non-food products
      • Rental areas included
  • 3. Objectives of this Project
      Forecasting sales for a potential new hypermarket (one that does not exist yet)
      • Method
      • Application
      • Workflow
      • Database
      • Documents
      Note: the project was conducted in late 2011 under a Confidentiality & Non-Compete Agreement. The information in this presentation gives only a broad idea of its methods and development.
  • 6. Software
      • Camtasia for documenting and consulting
      • MS Excel for the existing calculation and a new proof-of-concept calculation
      • Visual Studio .NET for prototyping and developing the application
      • MS Access for the database
      • GIS application for retrieving demographic information
  • 7. Requirements from Executives (key decision makers)
      • Result in 90% accuracy
      • Eliminate subjective judgement
      • Keep information secret
      • No “Linear Regression”
    Requirements from Strategists (end users)
      • Use/modify the current method
      • Expedite the process of forecasting
      • Retain some flexibility
  • 8. Parameters of Retail Hypermarkets
      1. Size
      2. Location
      3. Demographic factors
      4. Visibility
      5. Traffic
      6. Competition
      7. Design
      8. Access
      9. Activity
      10. Parking
      11. Signage
      12. …
  • 9. Sales Forecasting Methods
      1. Regression Method (Linear & Non-linear)
      2. Gravity Model (Demographic/Catchment Calculation)
      3. Analogue Method (Stores’ Parameter Comparison)
      [Figure: Sales (Million Baht) versus Sales Area (sq.m.), labeled 1. Regression Method, 2. Gravity Model, 3. Analogue Method]
  • 10. 1. Regression Method
      Sales = a × X1 + b × X2 + c × X3 + … + K
      • The Xs are primary parameters:
          • Store Size
          • Catchment Area & Competition
          • Demographic Variables
          • …
      • Xi could be a ratio.
      • K is a constant, such as 500 million baht.
      [Figure: Sales (Million Baht) versus Sales Area (sq.m.)]
      [Table: stores No. 1 to 7 with columns Actual Sales, X1, X2, X3, …, used to derive a, b, c, …, and K]
  • 11. 2. Gravity Model
      Sales = Adjusted Population × Expenditure/head
      • Primary data
          I. Store’s physical data
          II. Geographic data
          III. Demographic data
          IV. …
      [Figure: catchment illustration that considers I, II, III, IV, …, with features labeled Transportation and Mountain]
  • 12. 3. Analogue Method (Traditional)
      Sales = Average (Actual Sales × Parameters)
      • Based on the actual sales of existing stores
      • Compares parameters between the analog stores and the target store
      • The parameters include both:
          • Primary data: demographic/geographic data
          • Secondary data: size, location, …

      Before assigning the % parameters:

                        Analog A   Analog B   Target X
      Actual Sales      1,000      800        ?
      Size              ?%         ?%
      Location          ?%         ?%
      Sum                                     ???

      After assigning the % parameters:

                        Analog A   Analog B   Target X
      Actual Sales      1,000      800        ?
      Size              +15%       -5%
      Location          +5%        +15%
      Sum               +20%       +10%
      Adjusted sales    1,200      880        1,040
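The arithmetic in the two-store example can be checked with a short Python sketch. The function name `analogue_forecast` and the input layout are illustrative assumptions, not part of the original application; the sketch only assumes that each summed percentage is applied to the analog store's actual sales and that the target forecast is the plain average.

```python
def analogue_forecast(analogs):
    """Scale each analog store's actual sales by its summed % adjustments,
    then average the adjusted values to forecast the target store."""
    adjusted = [sales * (1 + sum(pcts) / 100) for sales, pcts in analogs]
    return adjusted, sum(adjusted) / len(adjusted)

# Store A: 1,000 with Size +15% and Location +5%.
# Store B: 800 with Size -5% and Location +15%.
analogs = [(1000, [15, 5]), (800, [-5, 15])]
adjusted, forecast = analogue_forecast(analogs)
print([round(a) for a in adjusted], round(forecast))
```

The rounded values reproduce the 1,200, 880, and 1,040 shown in the table.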
  • 13. Problems with Current Practice, Analogue Method
      • Assigning subjective values
      • Missing parameters (cannibalization, …)
      • Weighting parameters equally
      • Omitting the impact of other parameters
      • Selecting ambiguous analog stores (A, B, C, …)
      • Time-consuming

                        Analog A   Analog B   Analog C   Target X
      Actual Sales      1,000      800        1,000      ?
      Size              +15%       -5%        -
      Location          +5%        +15%       +5%
      Visibility        -          -          +5%
      Traffic           ?          -          -15%
      Competition       +5%        -          +5%
      Sum               …%         …%         …%
      Adjusted sales    …          …          …          ???

      For comparison, the two-parameter case:

                        Analog A   Analog B   Target X
      Actual Sales      1,000      800        ?
      Size              +15%       -5%
      Location          +5%        +15%
      Sum               +20%       +10%
      Adjusted sales    1,200      880        1,040
  • 14. During Proof-Of-Concept Development
      • Researching and learning about the topic and data
      • Limiting the effort required from the team
      • Requesting data for 10 existing stores
      • Asking the team only for data that was already available
      • Finding and testing new data
      • Discussing and proposing ideas
      • Focusing on primary data: size, population, household income
      ACCURACY: 75%

                        Analog A   Analog B   Analog C   Target X
      Actual Sales      #          #          #          ?
      P1                %          %          %
      P2                %          %          %
      P3                %          %          %
      …                 %          %          %
      P6                %          %          %
      Size              #          #          #          #
      Population        #          #          #          #
      HH Income         #          #          #          #
      Sum               …%         …%         …%
      Adjusted sales    …          …          …          ???
  • 15. During Prototype Development
      • Changing assigned values to boolean-type data
          • P2.1: Is it visible from 150 meters?
      • Creating levels for parameters
          • P2.2: Is it visible from 300 meters?
      • Requesting more data and 3-point estimates
      • Incorporating Monte Carlo Simulation
      • Grouping secondary parameters into 3-4 groups
      • Testing Genetic Algorithm Optimization
      ACCURACY: 86%

                        Analog A   Analog B   Analog C   Target X
      Actual Sales      #          #          #          ?
      P1                %          %          %
      P2.1
      P2.2
      …                 …          …          …          …
      P15
      Size              #          #          #          #
      Pop @ 5 Km        #          #          #          #
      Pop @ 15 Km       #          #          #          #
      Pop @ 30 Km       #          #          #          #
      HHI @ 5 Km        #          #          #          #
      HHI @ 15 Km       #          #          #          #
      HHI @ 30 Km       #          #          #          #
      Sum               …%         …%         …%
      Adjusted sales    …          …          …          ???
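  • 16. Example of Using Boolean Parameters
      Effect assigned to each boolean parameter: P1: +10%, P2.1: +4%, P2.2: +6%.

                        Analog A   Analog B   Analog C   Target X
      Actual Sales      1,000      800        1,000      ?
      P1: +10%          0          +10%       +10%
      P2.1: +4%         -4%        -4%        0
      P2.2: +6%         +6%        0          +6%
      …                 …          …          …
      Sum               +2%        +6%        +16%
      Adjusted sales    1,020      848        1,160      1,009

      $$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3} \left( \text{Sales}_{A,B,C:\text{Actual}} \times \sum_{p=1}^{P} \Delta P \right)}{S}$$

      Each summed adjustment is applied to the analog store’s actual sales as a percentage change (for example, +2% turns 1,000 into 1,020), and the target forecast is the average of the adjusted values.

      $$\%\text{Accuracy} = 100 - \frac{\left| \text{Sales}_{\text{forecast}} - \text{Sales}_{\text{Actual}} \right|}{\text{Sales}_{\text{Actual}}} \times 100$$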
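A minimal Python sketch of the boolean-parameter example follows. The true/false answers for each store are hypothetical; they are chosen only so that the signed deltas match the table (+effect when only the target has the feature, -effect when only the analog has it). The function names and the use of an absolute error in `pct_accuracy` are likewise assumptions.

```python
# Percentage effect of each boolean parameter (values from the example).
EFFECTS = {"P1": 10, "P2.1": 4, "P2.2": 6}

# Hypothetical boolean answers chosen so the deltas match the example table.
TARGET = {"P1": True, "P2.1": False, "P2.2": True}
ANALOGS = {
    "A": (1000, {"P1": True, "P2.1": True, "P2.2": False}),
    "B": (800, {"P1": False, "P2.1": True, "P2.2": True}),
    "C": (1000, {"P1": False, "P2.1": False, "P2.2": False}),
}

def adjustment_pct(target, analog):
    """Sum of signed effects between the target store and one analog store."""
    return sum(EFFECTS[p] * (int(target[p]) - int(analog[p])) for p in EFFECTS)

def forecast_sales(target, analogs):
    """Average of the analog stores' actual sales after their % adjustments."""
    adjusted = [
        sales * (1 + adjustment_pct(target, flags) / 100)
        for sales, flags in analogs.values()
    ]
    return sum(adjusted) / len(adjusted)

def pct_accuracy(forecast, actual):
    """%Accuracy = 100 - |forecast - actual| / actual * 100."""
    return 100 - abs(forecast - actual) / actual * 100

sums = {name: adjustment_pct(TARGET, flags) for name, (_, flags) in ANALOGS.items()}
forecast = forecast_sales(TARGET, ANALOGS)
print(sums, round(forecast))
```

The sums come out as +2%, +6%, and +16%, and the rounded forecast is 1,009, matching the table.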
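  • 17. Three-point Estimates & Simulation
      • Three-point estimates (minimum, most likely, maximum) for each parameter:
          • P1: 4%, 10%, 12%
          • P2: 2%, 4%, 6%
          • P3: 4%, 6%, 10%
      • A single deterministic run uses the most likely values: P1 = 10%, P2.1 = 4%, P2.2 = 6%.
      • Simulate 1,000 runs to get 1,000 values of Sales forecast.
      [Table: Analog Stores A, B, C (Actual Sales 1,000, 800, 1,000) and Target Store X, with the % values of P1, P2.1, P2.2, … drawn in each run]
      [Figure: cumulative probability (0.25 to 1.00) versus Sales forecast (600 to 1,400)]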
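The simulation step can be sketched in Python as below. The slides do not say which distribution is fitted to the three-point estimates, so the triangular distribution here is an assumption, as are the sign table (taken from the boolean-parameter example) and all function names.

```python
import random

# (minimum, most likely, maximum) percentage effects, as read from the slide.
THREE_POINT = {"P1": (4, 10, 12), "P2.1": (2, 4, 6), "P2.2": (4, 6, 10)}

# Actual sales and the sign (-1, 0, +1) of each parameter for stores A, B, C.
ANALOGS = {
    "A": (1000, {"P1": 0, "P2.1": -1, "P2.2": 1}),
    "B": (800, {"P1": 1, "P2.1": -1, "P2.2": 0}),
    "C": (1000, {"P1": 1, "P2.1": 0, "P2.2": 1}),
}

def simulate_forecasts(n_runs=1000, seed=0):
    """Draw each effect from a triangular distribution and forecast once per run."""
    rng = random.Random(seed)
    forecasts = []
    for _ in range(n_runs):
        effects = {
            p: rng.triangular(low, high, mode)
            for p, (low, mode, high) in THREE_POINT.items()
        }
        adjusted = [
            sales * (1 + sum(signs[p] * effects[p] for p in effects) / 100)
            for sales, signs in ANALOGS.values()
        ]
        forecasts.append(sum(adjusted) / len(adjusted))
    return forecasts

def cumulative_probability(forecasts, threshold):
    """Share of simulated forecasts at or below the threshold."""
    return sum(f <= threshold for f in forecasts) / len(forecasts)

forecasts = simulate_forecasts()
print(min(forecasts), max(forecasts), cumulative_probability(forecasts, 1000))
```

Plotting `cumulative_probability` over a range of thresholds would give a curve of the kind shown on the slide.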
  • 18. 60 + 10 Parameters
      • About 60 boolean parameters
          • … Visibility from 150 m
          • …
          • …
      • About 10 numerical parameters
          • Store size
          • Population within 5, 15, 30 Km
          • Household income within 5, 15, 30 Km
      • Should these parameters be weighted equally for the sales forecast?
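  • 19. Improving Accuracy with Optimization
      • Categorizing boolean parameters into 6 groups
      • Introducing a Weight for each group
      • Therefore, there will be Weight1, Weight2, …, Weight6

      Without weights:
      $$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3} \left( \text{Sales}_{A,B,C:\text{Actual}} \times \sum_{p=1}^{P} \Delta P \right)}{S}$$

      With weights:
      $$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3} \left( \text{Sales}_{A,B,C:\text{Actual}} \times \frac{\sum_{w=1}^{W=6} \left( \text{Weight}_w \times \sum_{p=1}^{P} \Delta P_w \right)}{\sum_{w=1}^{W=6} \text{Weight}_w} \right)}{S}$$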
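A hedged Python sketch of the weighted formula is given below. It assumes that each store supplies six group sums of percentage deltas and that the weighted average adjustment is applied as a percentage change to actual sales, as in the earlier worked example; the function names and input layout are illustrative only.

```python
def weighted_adjustment(group_sums, weights):
    """Weighted average of the per-group sums of % deltas."""
    return sum(w * g for w, g in zip(weights, group_sums)) / sum(weights)

def weighted_forecast(analogs, weights):
    """Average the analog stores' sales after applying the weighted adjustment.

    analogs: list of (actual_sales, group_sums), one group sum per weight.
    """
    adjusted = [
        sales * (1 + weighted_adjustment(group_sums, weights) / 100)
        for sales, group_sums in analogs
    ]
    return sum(adjusted) / len(adjusted)

# Hypothetical input: one analog store whose six group sums are all +6%.
example = [(1000, [6, 6, 6, 6, 6, 6])]
print(weighted_forecast(example, [1.0] * 6))
```

Because the adjustment is divided by the sum of the weights, scaling every weight by the same factor leaves the forecast unchanged; only the relative sizes of the weights matter.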
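  • 20. To find Optimum Weights
      • How can the optimum Weights be derived? Remember: we do not have Sales X:Actual for the calculation of %Accuracy.

      $$\text{Objective Function} = \text{Maximize}(\%\text{Accuracy})$$

      $$\%\text{Accuracy} = 100 - \frac{\left| \text{Sales}_{\text{forecast}} - \text{Sales}_{\text{Actual}} \right|}{\text{Sales}_{\text{Actual}}} \times 100$$

      $$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3} \left( \text{Sales}_{A,B,C:\text{Actual}} \times \frac{\sum_{w=1}^{W=6} \left( \text{Weight}_w \times \sum_{p=1}^{P} \Delta P_w \right)}{\sum_{w=1}^{W=6} \text{Weight}_w} \right)}{S}$$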
  • 21. Can we derive the Optimum Weights for Store X?
      • No, we cannot. We do not have the target store’s actual sales, Sales X:Actual.
      • So how can we maximize %Accuracy to solve this problem?
      • We make assumptions.

      $$\%\text{Accuracy} = 100 - \frac{\left| \text{Sales}_{\text{forecast}} - \text{Sales}_{\text{Actual}} \right|}{\text{Sales}_{\text{Actual}}} \times 100$$
  • 22. Assumptions used to derive Optimum Weights
      • The selected analogue stores should be able to predict one another’s sales effectively:
          • (A & B predict C); (A & C predict B); (B & C predict A).
      • Optimizing %Accuracy with proper Weights should then improve the forecasts for the analogue stores themselves.
      • Target Store X and the selected analogue stores (A, B, and C) are assumed to have similar characteristics. Thus:
          • Weights from Optimize (A & B predict C) should improve accuracy in (A & B forecast X).
          • Weights from Optimize (A & C predict B) should improve accuracy in (A & C forecast X).
          • Weights from Optimize (B & C predict A) should improve accuracy in (B & C forecast X).
      • Next step: use Genetic Algorithm Optimization to derive the Weights.
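The three pairings amount to a leave-one-out scheme over the analogue stores. The Python sketch below only enumerates the splits; the function name `leave_one_out_splits` is hypothetical and the optimization itself is not shown.

```python
from itertools import combinations

def leave_one_out_splits(stores):
    """Return (predicting_pair, held_out_store) for every pair of stores."""
    return [
        (pair, next(s for s in stores if s not in pair))
        for pair in combinations(stores, 2)
    ]

splits = leave_one_out_splits(["A", "B", "C"])
for pair, held_out in splits:
    print(f"{pair[0]} & {pair[1]} predict {held_out}")
```

Each split has known actual sales for the held-out store, which is what makes %Accuracy computable during optimization.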
  • 23. Why do we need GA Optimization?
      All possibilities
      • 3 pairs of analogue stores: (A & B), (A & C), and (B & C)
      • 10 possible values for each Weight: 0.1, 0.2, 0.3, …, and 1.0
      • 6 groups of parameters: W1, W2, W3, …, and W6
      All possibilities = 3 × 10^6 = 3,000,000
      • Simulation: 200 runs per set of weights (2 minutes per 200 runs)
      Total duration = 3 × 10^6 × 2 = 6,000,000 minutes, or about 4,167 days, or about 11.4 years
      Therefore, we need Genetic Algorithm Optimization.
      Note: the sets of Weights from each pair of analogue stores are unique to that pair.
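The size of the exhaustive search can be re-derived in a few lines of Python. The variable names are illustrative, and the conversion assumes 365-day years, which gives roughly 11 years.

```python
pairs = 3               # pairs of analogue stores
values_per_weight = 10  # 0.1, 0.2, ..., 1.0
groups = 6              # W1 ... W6
minutes_per_set = 2     # 200 simulation runs per set of weights

combinations_total = pairs * values_per_weight ** groups
total_minutes = combinations_total * minutes_per_set
total_days = total_minutes / (60 * 24)
total_years = total_days / 365

print(combinations_total, total_minutes, round(total_days), round(total_years, 1))
```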
  • 24. Genetic Algorithm Optimization
      “The greater the %Accuracy, the greater the survival rate into the new generation”
      • 3 main GA operations
          1. Reproduction
          2. Crossover (mating)
          3. Mutation
      • GA parameters
          • totalGeneration = 50, plus 1 initial generation
          • totalPopulation = 50 for each generation
          • probCrossover = 0.20, and nCrossover = 1
          • probMutation = 0.03, and nMutation = 2
  • 25. Optimization Data Structure
      • Actual: each Generation i holds Pop ID 1 to 50, with columns W1, W2, W3, W4, W5, W6, %Accuracy, and Probability, plus a Total row.
      • Simplified example: each Generation i holds Pop ID 1 to 5, with columns W1, W2, …, W6, %Accuracy, and Probability, plus a Total row.
  • 26. Simplified Example of GA: (A & B) predict C, then X
      InitialGeneration: randomize all the Weights

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.4   0.6   0.2   0.7
      2        0.5   0.7   1.0   0.9
      3        0.8   0.1   0.4   0.2
      4        0.6   1.0   0.8   0.8
      5        0.5   0.4   0.3   0.9
      Total

      $$\%\text{Accuracy} = 100 - \frac{\left| \text{Sales}_{\text{forecast}} - \text{Sales}_{\text{Actual}} \right|}{\text{Sales}_{\text{Actual}}} \times 100$$

      $$\text{Sales}_{C:\text{forecast}} = \frac{\sum_{s=1}^{S=2} \left( \text{Sales}_{A,B:\text{Actual}} \times \frac{\sum_{w=1}^{W=4} \left( \text{Weight}_w \times \sum_{p=1}^{P} \Delta P_w \right)}{\sum_{w=1}^{W=4} \text{Weight}_w} \right)}{2}$$
  • 27. 200 Simulation Runs
      • Each %Accuracy is derived from 200 simulation runs, that is, 200 values of Sales C:forecast.

      InitialGeneration after 200 simulation runs per population member:

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.4   0.6   0.2   0.7   70
      2        0.5   0.7   1.0   0.9   95
      3        0.8   0.1   0.4   0.2   65
      4        0.6   1.0   0.8   0.8   20
      5        0.5   0.4   0.3   0.9   80
      Total
  • 28. 1. GA Reproduction
      InitialGeneration (Gen00): create the biased roulette wheel

      Pop ID   W1    W2    …     W6    %Accuracy   Probability   Cumulative
      1        0.4   0.6   0.2   0.7   70          0.21          0.21
      2        0.5   0.7   1.0   0.9   95          0.29          0.50
      3        0.8   0.1   0.4   0.2   65          0.20          0.70
      4        0.6   1.0   0.8   0.8   20          0.06          0.76
      5        0.5   0.4   0.3   0.9   80          0.24          1.00
      Total                            330         1.00

      For the new Generation 1:
      1. Random = 0.80, so Gen00Pop05 is selected for Pop01.
      2. Random = 0.15, so Gen00Pop01 is selected for Pop02.
      3. Random = 0.44, so Gen00Pop02 is selected for Pop03.
      4. Random = 0.98, so Gen00Pop05 is selected for Pop04.
      5. Random = 0.22, so Gen00Pop02 is selected for Pop05.

      Generation 1 (Gen01): Reproduction

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.5   0.4   0.3   0.9
      2        0.4   0.6   0.2   0.7
      3        0.5   0.7   1.0   0.9
      4        0.5   0.4   0.3   0.9
      5        0.5   0.7   1.0   0.9
      Total
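Roulette-wheel reproduction can be sketched in Python as follows. The function name `roulette_select` is an assumption; the draws are passed in explicitly so that the five random numbers from the slide can be replayed.

```python
from bisect import bisect_right
from itertools import accumulate

def roulette_select(accuracies, draws):
    """Map each draw in [0, 1) to a zero-based population index.

    A member's slice of the wheel is proportional to its %Accuracy.
    """
    cumulative = list(accumulate(accuracies))  # e.g. [70, 165, 230, 250, 330]
    total = cumulative[-1]
    last = len(accuracies) - 1
    # Clamp guards against a draw landing exactly on the upper edge.
    return [min(bisect_right(cumulative, r * total), last) for r in draws]

gen00_accuracy = [70, 95, 65, 20, 80]
draws = [0.80, 0.15, 0.44, 0.98, 0.22]
selected = roulette_select(gen00_accuracy, draws)
print([f"Gen00Pop{i + 1:02d}" for i in selected])
```

With the slide's draws this selects Pop05, Pop01, Pop02, Pop05, and Pop02, the same parents as listed for Generation 1. In practice the standard library's `random.choices` with a `weights` argument performs the same kind of weighted sampling.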
  • 29. 2. GA Crossover
      Generation 1: Pairing (before crossover)

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.5   0.4   0.3   0.9
      2        0.4   0.6   0.2   0.7
      3        0.5   0.7   1.0   0.9
      4        0.5   0.4   0.3   0.9
      5        0.5   0.7   1.0   0.9
      Total

      Generation 1: Crossover (after)

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.5   0.6   0.3   0.9
      2        0.4   0.4   0.2   0.7
      3        0.5   0.7   0.3   0.9
      4        0.5   0.4   1.0   0.9
      5        0.5   0.7   1.0   0.9
      Total

      Pseudocode:
          If rand(0,1) < probCrossover then    //probCrossover = 0.20
              For nCrossover                   //nCrossover = 1
                  randInteger[1, 6]
                  crossover()
              End For
          End If
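The crossover pseudocode can be turned into a runnable Python sketch. The tables suggest that a crossover swaps one randomly chosen weight between the two paired members; that reading, and the function name `crossover_pair`, are assumptions rather than the confirmed behaviour of the original application.

```python
import random

def crossover_pair(parent_a, parent_b, rng, prob_crossover=0.20, n_crossover=1):
    """With probability prob_crossover, swap n_crossover randomly chosen genes."""
    child_a, child_b = list(parent_a), list(parent_b)
    if rng.random() < prob_crossover:
        for _ in range(n_crossover):
            gene = rng.randrange(len(child_a))  # randInteger[1, 6], zero-based here
            child_a[gene], child_b[gene] = child_b[gene], child_a[gene]
    return child_a, child_b

rng = random.Random(42)
pop01 = [0.5, 0.4, 0.1, 0.2, 0.3, 0.9]  # hypothetical six-weight chromosomes
pop02 = [0.4, 0.6, 0.7, 0.8, 0.2, 0.7]
print(crossover_pair(pop01, pop02, rng, prob_crossover=1.0))
```

Setting `prob_crossover=1.0` in the usage line forces a swap so the effect is visible; with the slide's value of 0.20 most pairs pass through unchanged.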
  • 30. 3. GA Mutation
      Generation 1: Randomly select (before mutation)

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.5   0.4   0.3   0.9
      2        0.4   0.6   0.2   0.7
      3        0.5   0.7   1.0   0.9
      4        0.5   0.4   0.3   0.9
      5        0.5   0.7   1.0   0.9
      Total

      Generation 1: Mutate (after)

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.5   0.6   0.7   0.9
      2        0.4   0.4   0.2   0.7
      3        0.5   0.7   0.3   0.9
      4        0.5   0.3   1.0   0.8
      5        0.1   0.7   1.0   0.9
      Total

      Pseudocode:
          For nMutation                          //nMutation = 2
              If rand(0,1) < probMutation then   //probMutation = 0.03
                  randInteger[1, 6]
                  mutate()
              End If
          End For
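A Python sketch of the mutation pseudocode follows. The slides do not define `mutate()`, so replacing the chosen weight with a random value from 0.1 to 1.0 is an assumption, as is the function name.

```python
import random

# Allowed weight values: 0.1, 0.2, ..., 1.0.
WEIGHT_VALUES = [round(0.1 * i, 1) for i in range(1, 11)]

def mutate(chromosome, rng, prob_mutation=0.03, n_mutation=2):
    """Make n_mutation attempts; each succeeds with probability prob_mutation
    and overwrites one randomly chosen weight with a random allowed value."""
    mutated = list(chromosome)
    for _ in range(n_mutation):
        if rng.random() < prob_mutation:
            gene = rng.randrange(len(mutated))  # randInteger[1, 6], zero-based here
            mutated[gene] = rng.choice(WEIGHT_VALUES)
    return mutated

rng = random.Random(7)
print(mutate([0.5, 0.4, 0.1, 0.2, 0.3, 0.9], rng, prob_mutation=1.0))
```

The usage line forces mutation with `prob_mutation=1.0`; at the slide's 0.03 a chromosome is rarely changed, which keeps the search mostly driven by reproduction and crossover.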
  • 31. 200 Simulation Runs
      Generation 1: run 200 simulations for each population member, then create the biased roulette wheel

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.5   0.6   0.7   0.9   60          0.15
      2        0.4   0.4   0.2   0.7   75          0.19
      3        0.5   0.7   0.3   0.9   90          0.23
      4        0.5   0.3   1.0   0.8   70          0.18
      5        0.1   0.7   1.0   0.9   95          0.24
      Total                            390         1.00
  • 32. Create the next Generation
      Repeat the processes (1. Reproduction, 2. Crossover, 3. Mutation) until Generation 50.

      Generation 1

      Pop ID   W1    W2    …     W6    %Accuracy   Probability
      1        0.5   0.6   0.7   0.9   60          0.15
      2        0.4   0.4   0.2   0.7   75          0.19
      3        0.5   0.7   0.3   0.9   90          0.23
      4        0.5   0.3   1.0   0.8   70          0.18
      5        0.1   0.7   1.0   0.9   95          0.24
      Total                            390         1.00

      Generation 2: Pop ID 1 to 5, to be filled in by the three operations.
  • 33. Rank the Sets of Optimum Weights by %Accuracy
      All evaluated sets, in order of creation:

      No      Gen ID   Pop ID   W1   W2   …   W6   ±Error   %Accuracy
      1       00       01       …    …    …   …    …        …
      2       00       02       …    …    …   …    …        …
      …       …        …        …    …    …   …    …        …
      50      00       50       …    …    …   …    …        …
      51      01       01       …    …    …   …    …        …
      52      01       02       …    …    …   …    …        …
      …       …        …        …    …    …   …    …        …
      100     01       50       …    …    …   …    …        …
      101     02       01       …    …    …   …    …        …
      …       …        …        …    …    …   …    …        …
      2501    50       01       …    …    …   …    …        …
      2502    50       02       …    …    …   …    …        …
      …       …        …        …    …    …   …    …        …
      2550    50       50       …    …    …   …    …        …

      Rank and select the top 100.

      Sets of Optimum Weights (A & B predict C), ranked by %Accuracy:

      RANK    No   Gen ID   Pop ID   W1   W2   …   W6   ±Error   %Accuracy
      1       …    …        …        …    …    …   …    …        98
      2       …    …        …        …    …    …   …    …        96
      3       …    …        …        …    …    …   …    …        96
      …       …    …        …        …    …    …   …    …        …
      100     …    …        …        …    …    …   …    …        88

      Use these sets of Optimum Weights and ±Error for (A & B forecast X).
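Ranking and truncating the evaluated weight sets is a plain sort. The Python sketch below uses a hypothetical record layout (`gen`, `pop`, `accuracy`) and, for brevity, only the ten %Accuracy values from Gen00 and Gen01 of the simplified example.

```python
def top_weight_sets(records, n=100):
    """Return the n records with the highest %Accuracy, best first."""
    return sorted(records, key=lambda r: r["accuracy"], reverse=True)[:n]

# %Accuracy values of Gen00 and Gen01 in the simplified example.
accuracy_by_generation = [[70, 95, 65, 20, 80], [60, 75, 90, 70, 95]]
records = [
    {"gen": gen, "pop": pop, "accuracy": acc}
    for gen, accs in enumerate(accuracy_by_generation)
    for pop, acc in enumerate(accs, start=1)
]

best = top_weight_sets(records, n=3)
print(best)
```

Python's sort is stable, so records with equal %Accuracy keep their order of creation; whether the original application broke ties this way is not stated.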
  • 34. Optimum Weights from (A & B predict C) to Forecast X
      • Start from the ranked table for (A & B predict C): RANK 1 to 100, with columns W1, W2, …, W6, and ±Error.
      • For each ranked set of weights, run 200 simulations of (A & B forecast X).
      • Result: for each of the 100 ranks, 200 values of the Sales Forecast for X.
  • 35. Optimum Weights from the 3 pairs to Forecast X
      • (A & B predict C), applied to X: RANK 1 to 100 (W1, W2, …, W6, ±Error), 200 Sales Forecast values per rank
      • (A & C predict B), applied to X: RANK 1 to 100 (W1, W2, …, W6, ±Error), 200 Sales Forecast values per rank
      • (B & C predict A), applied to X: RANK 1 to 100 (W1, W2, …, W6, ±Error), 200 Sales Forecast values per rank
      • In total: 3 × 100 × 200 values of Sales X:forecast
      [Figure: cumulative probability (0.25 to 1.00) versus Sales X:forecast (600 to 1,400)]
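Pooling the simulated forecasts into one cumulative distribution can be sketched in Python as below. The nested layout (pair, then weight set, then run) and the tiny numbers are hypothetical; the project pooled 3 × 100 × 200 values.

```python
def pool_forecasts(forecasts_by_pair):
    """Flatten pair -> weight set -> run into one list of forecast values."""
    return [
        value
        for weight_sets in forecasts_by_pair.values()
        for runs in weight_sets
        for value in runs
    ]

def cumulative_probability(values, threshold):
    """Share of pooled forecasts at or below the threshold."""
    return sum(v <= threshold for v in values) / len(values)

# Tiny hypothetical input: 3 pairs x 2 weight sets x 2 runs.
forecasts_by_pair = {
    "A&B": [[1000, 1040], [980, 1020]],
    "A&C": [[1010, 1060], [990, 1030]],
    "B&C": [[950, 1000], [970, 1050]],
}
pooled = pool_forecasts(forecasts_by_pair)
print(len(pooled), cumulative_probability(pooled, 1000))
```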
  • 36. Selection of Analog Stores
      The following criteria are used to calculate the “Analogue Store Selection Score”:
      1. Size & Design
      2. Region: North, South, …
      3. Regional Characteristics: CBD, around Bangkok, international trading area, …
      4. Population: within 5, 15, and 30 Km
      5. Household Income: within 5, 15, and 30 Km
      6. Competition Level: rival stores, local stores, cannibalization, …
  • 37. Example Table of Analogue Store Selection Score
      Further columns (left blank on the slide): Size, Design, Region, Regional Characteristics, Population, Household Income, Competition Level.

      No   Score   Store ID
      1    98      56
      2    96      42
      3    95      31
      4    92      7
      5    87      19
      …    …       …
      10   71      28
  • 38. Optimize the Processes of GA & Simulation
      1. Consider the tradeoff between time and accuracy
      2. GA Optimization
          • Adjust GA parameters to suit the problem
          • Start the initial generation (Gen00) better/smarter
      3. Simulation
          • Reduce the number of simulation runs
          • Skip already-tested parameter sets, and just copy their results: %Accuracy and ±Error
      From 11 years (all possibilities) to 2 months (GA), to 2 days (Optimized GA), and to 2 hours (Optimized GA & Simulation).
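Skipping already-tested weight sets is a form of memoization. The Python sketch below wraps a hypothetical `evaluate` function (standing in for the 200 simulation runs, which are not shown) in a dictionary cache keyed by the weight tuple.

```python
def make_cached_evaluator(evaluate):
    """Wrap evaluate(weights) so each distinct set of weights is simulated once.

    evaluate is assumed to return the results to be copied on a repeat,
    for example a (%Accuracy, ±Error) tuple.
    """
    cache = {}

    def cached(weights):
        key = tuple(weights)
        if key not in cache:
            cache[key] = evaluate(key)
        return cache[key]

    return cached, cache

# Hypothetical stand-in for the expensive simulation step.
def slow_evaluate(weights):
    return (round(100 * sum(weights) / len(weights), 1), 0.0)

cached_evaluate, cache = make_cached_evaluator(slow_evaluate)
cached_evaluate([0.5, 0.4, 0.3, 0.9])
cached_evaluate([0.5, 0.4, 0.3, 0.9])  # served from the cache
print(len(cache))
```

Because reproduction copies fit members unchanged into later generations, repeated weight sets are common, which is presumably why this shortcut saves so much time.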
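  • 39. Final Formula for Sales Forecast

      $$\text{Sales}_{X:\text{forecast}} = \frac{\sum_{s=1}^{S=3} \left( \text{Sales}_{A,B,C:\text{Actual}} \times \frac{\sum_{w=1}^{W=6} \left( \text{Weight}_w \times \sum_{p=1}^{P} \Delta P_w \right)}{\sum_{w=1}^{W=6} \text{Weight}_w} \times \text{function}_{\text{Numerical-Parameters}} \right)}{S}$$

      $$\text{function}_{\text{Numerical-Parameters}} = k_{\text{Size}}\,\text{func}_{\text{Size}} \times k_{\text{Pop}}\,\text{func}_{\text{Pop}} \times k_{\text{HHI}}\,\text{func}_{\text{HHI}} \times k_{\text{Comp}}\,\text{func}_{\text{Comp}}$$

      Simplified example:

      $$\text{function}_{\text{Size}} = \frac{\text{Size}_X}{\text{Size}_{\text{Analogue}}}$$

      with a maximum and minimum value range. The numerical parameters are Store Size, Population, Household Income, and Competition.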
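The simplified size function, a ratio limited to a value range, can be sketched in Python. The bounds 0.8 and 1.2 are hypothetical placeholders, since the slides do not give the actual range, and the function name is illustrative.

```python
def size_function(size_target, size_analogue, lower=0.8, upper=1.2):
    """Ratio of target to analogue store size, clamped to [lower, upper].

    The default bounds are hypothetical; the source only says that a
    maximum and minimum value range applies.
    """
    ratio = size_target / size_analogue
    return max(lower, min(upper, ratio))

print(size_function(9000, 9000), size_function(12000, 6000), size_function(3000, 12000))
```

Clamping keeps a very large or very small target store from dominating the forecast through the size ratio alone.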
  • 40. Requirements from Executives (key decision makers)
      • Result in 90% accuracy → Final accuracy is about 90% or higher
      • Eliminate subjective judgement → Use 60 boolean parameters with 3-point estimates
      • Keep information secret → Data is kept on a single laptop
      • No “Linear Regression” → Employ Simulation and Genetic Algorithm Optimization
    Requirements from Strategists (end users)
      • Use/modify the current method → Use the Analogue Method with modification
      • Expedite the process of forecasting → Optimize the processes of GA & Simulation
      • Retain some flexibility → Analogue store selection, and k values for numerical parameters
  • 41. Keys to success for this project
      • Full support from executives
      • Professional participation by the team
      • Great contribution from the head of the team (expert view)
      • Willingness to test new ideas and data
      • Effective and flexible time management
      • Reasonable expectations for the project
  • 42. Future Improvement
      • Consider a dedicated weight for each boolean parameter
      • Adjust the 3-point estimates
      • Improve the numerical-parameter functions
      • Enhance the Analogue Store Selection Scoring System
      • Test different types of optimization methods and other machine learning methods
      • Use better hardware and GPUs
      • Incorporate other sales forecasting methods
      • Develop automated and repeatable improvement processes in the application
  • 44. THANK YOU Chachrist Srisuwanrat, Ph.D. The Founder of ThaiQuants.com