Hi guys,
Parth Cholera here. I would like to share one of my capstone projects on machine-learning-based model building. The objective is to develop a business model/proposal and find the optimal price for the products being sold, which will in turn help the CFO convince the merchants to aim for more profitable sales.
The project also analyzes the factors which drive the price of the product and provides suggestions that will help in increasing the pricing.
Contact & follow me to get PDF & PPT file - https://www.linkedin.com/in/parthcholera/
2. Parth Cholera || https://www.linkedin.com/in/parthcholera/
Table of contents
Project Introduction & Problem Statement
Business Objective & Goal
Executive Summary
Project Flow Chart & Process Cycle
Data Collection / Data Set
Data Preparation, Normalization, Understanding And Interpretation
Current Overview & Insight Of Sale, Consumption & Production.
Dashboards & Charts To Visualize The Data And Give Insights.
Choose Model & Feature Selection along with Sampling
Train and Evaluate the Model [Validation]
Methodology & Algorithm used : What we tried and why
Evaluation of Models, Evaluation Metrics and Observation
Forecast Analysis & Stats
Conclusion & Executive Summary Of Suggestion/Solution Proposed
Challenges
Appendix
Project Introduction
• Project Sponsor / Client: CFO of an American fashion retailer
• Problem Statement:
  • What is the optimal price at which the products should be sold for higher benefits?
  • Which factors define and drive the prices of products?
• Deliverables:
  • Using an ML model, analyse the factors which drive the price of the product.
  • Develop a dynamic dashboard to slice and dice the data.
• Desired End Outcome:
  • Project the optimal sale price for given products / groups of products, along with its confidence interval.
  • Derive the impact on pricing when the variables change.
  • Present the outcome as a visualization, using any tool of choice.
The client is the CFO of an American fashion retailer for women, which has over 700 stores across the US. The retailer
belongs to the Value Fashion segment, which provides wear-to-work dresses and clothing for working women at affordable
prices. The CFO has asked us to develop a proposal and find the optimal price for every product for higher benefits, which
will help to convince the merchants of a better price for the products.
Business Problem and Objective:
The main objective is to develop a business model / proposal and find the optimal price for
the products being sold. This will in turn help the CFO convince the merchants to aim for
more profitable sales.
We also need to analyze the factors which drive the price of the product and provide
suggestions which will help in increasing the pricing.
Methodology And Tools Used
Work task (tools used, where the slide lists them):
• Clean up the CSV data: remove unwanted and blank fields, sanitize/format fields such as store name, city name, state name, date format and class name (Jupyter Notebook, OpenRefine)
• Import the data sets from all 3 files and merge them on a common variable (Jupyter Notebook / Google Colab)
• Data preparation, formatting and cleaning; excluding NA values, blanks and outliers
• Model building and tuning, splitting the dataset into train and test, analyzing and transforming variables, random sampling
• Evaluation of models, evaluation metrics and observations, along with forecasting trend and analysis
• Prepare dashboards and charts to visualize the data and share insights along with the predictive trend (Tableau, MS Office)
**** Note: OpenRefine was used to have a quick glance at the data points of the dataset and to clean up some of the variables.
PROJECT FLOW
1. Understand the business problem and define the goal.
2. Data Collection: finalize the data set and gather the data.
3. Data Preparation: select and clean up the data.
4. Data Analysis: analytics to understand the data and derive the current situation.
5. Propose the required columns / variables / tables for modelling and a successful outcome.
6. Random Sampling: accurately sample / split / transform the data.
7. Model Selection: based on the business goal and the data set.
8. Build / Develop / Train Models: analyze the model output and re-develop / re-train the model.
9. Validate / Test Models: differentiate models as overfitting or underfitting; define and validate how a model learns.
10. Model Management: finalize an adequate model and tune it to get the best performance possible.
11. Analyze and derive insights from the dataset and build up business solutions / suggestions.
12. Visualization: dashboards and charts to visualize the data and give insights along with suggestions.
13. Team presentation in the TA session along with business insights.
14. Final milestone to demonstrate, and project submission.
** Milestone
Desired Model Building & Deployment Flow
[Figure: Model Building Cycle and Deployment Cycle]
https://rstudio-pubs-static.s3.amazonaws.com/223423_8ca6fccca1e44939be3f85ecbfa9598f.html
https://blogs.oracle.com/ai/7-artificial-intelligence-trends-and-how-they-work-with-operational-machine-learning
Assumptions
• To build the model and perform the optimal pricing, the model being constructed will predict
the selling price for every DBSKU.
• Price is derived from the number of sales and units; the other variables are assumed to have a
lower impact.
Executive Summary :: Observations
A 10% dip in sales [and so in profit] when compared year-on-year / quarter by quarter
The forecast trend for sales and profit is on the lower side
August (month) and Saturday (day) have the highest sales and so the highest profit
NY has the highest number of Strip stores
TX has the highest number of Power stores
FL has the highest number of Outlets
Lifestyle Centers are present only in the states MI and IL
Class 4 [for both departments] and Department 2 [for all classes] have the highest sales and so the highest profit
DBSKU or Location ID is the unique entity in all 3 data sets
NY and IN have the highest number of sales and so the highest profits
TX is the only state with a moderate number of sales but higher profits
Total Sales is correlated with Units, Profits and Cost
DBSKU is correlated with Department
Location ID is correlated with the Online flag and so with Total profit
Understanding Pricing Policy & then making decisions to fulfill the Business Objective:
As we know, the main objective is to find the optimal price for each product for higher benefits, so we need to analyze the
factors which drive the price of the product and the % profit rate.
• Pricing Policy & Decisions:
Pricing policy is one of the most important aspects of a business
and is used for setting the prices of its products.
Pricing is considered the most important part of
a company's marketing strategy.
Pricing influences many factors, above all
customers and their needs.
It is observed that when prices are fair and
competitive, customers come back, which
increases the profitability of the
business. Hence pricing policy and
decision making play a vital role in
enhancing the business.
Factors relating to Pricing Policy & Decision Making:
• Understand customers and their needs.
• Analyze and track how pricing affects sales and influences customers' purchasing decisions.
• Understand competitors' business strategy and their offering.
• Adjust quickly to customer needs and to changes in the market.
• Help customers understand why the products are priced at that rate.
• Be able to negotiate with wholesalers, retailers and other suppliers and resellers.
Understanding Pricing Policy & then making decisions to fulfill the Business Objective:
Types of Pricing Policy:
• There are 4 types of pricing policy
1. Cost Based =
• It adds a fixed profit % to the overall cost of a product.
• The end result is a selling price that aims to cover all the costs
incurred during the production or delivery stage and attain a certain level of
profit.
2. Value Based =
• The optimal price is a combination of the customer's
perception of the value of the offered goods and the production costs.
• Prices are based on market research.
• It depends entirely on customer demands, expectations and
preferences, financial resources and competition.
3. Demand Based =
• It is based on customer behavior, hence the name demand-based pricing.
• Prices depend on the demand, and so does the % profit.
4. Competitor Based =
• It forms prices by looking at what others are charging.
• After identifying the competition, a company first assesses its own
goods and then prices them lower than, higher than or equal to the
competition.
[Figure: understanding low to high price, with the factors which affect the price]
*** Cost-based and value-based pricing are the ones primarily used for higher profit
Optimal price with profits
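As a minimal sketch of the cost-based policy described above, the selling price can be computed by adding a fixed profit percentage to the unit cost. The function name and the sample figures are illustrative assumptions, not values taken from the project.

```python
def cost_based_price(unit_cost: float, profit_pct: float) -> float:
    """Cost-based pricing: unit cost plus a fixed profit percentage."""
    if unit_cost < 0 or profit_pct < 0:
        raise ValueError("unit_cost and profit_pct must be non-negative")
    return unit_cost * (1 + profit_pct / 100)


# Hypothetical example: a product costing 12.00 USD with a 50% profit target.
price = cost_based_price(12.00, 50)
print(f"Selling price: {price:.2f} USD")
```

Value-based, demand-based and competitor-based policies need extra inputs (market research, demand curves, competitor prices), so they are not sketched here.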
Functional Block diagram of Pricing Policy & decision :
Based on the above block diagram, we will select the variables which can affect the pricing policy and so the business goal.
Break-even point (most important parameter of pricing):
[Figure: profit and loss regions; break-even point = no loss, no profit]
Observation:
• When deciding the price, one of the most important
questions is at what price the company manufactures
or buys the product.
• Selling at the same price at which the product is manufactured
or bought won't yield any profit.
• So, once the point at which there is neither loss nor profit
is understood and derived, a pricing policy can be applied
on top of it to make higher profits.
• The break-even point is the production level where total
revenues equal total expenses. In other words, the
break-even point is where a company produces the same
amount of revenues as expenses, either during a
manufacturing process or an accounting period. Since
revenues equal expenses, the net income for the period
will be zero.
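The break-even definition above can be illustrated with a short calculation. This is a sketch under the usual textbook assumption of one fixed cost plus a constant variable cost per unit. The function name and all figures are hypothetical and do not come from the project data.

```python
import math


def break_even_units(fixed_cost: float, price_per_unit: float,
                     variable_cost_per_unit: float) -> int:
    """Smallest number of units at which total revenue covers total expenses."""
    margin = price_per_unit - variable_cost_per_unit
    if margin <= 0:
        raise ValueError("price must exceed variable cost to ever break even")
    return math.ceil(fixed_cost / margin)


# Hypothetical figures: 10,000 USD fixed cost, 20 USD price, 12 USD variable cost.
units = break_even_units(10_000, 20.0, 12.0)
revenue = units * 20.0
expenses = 10_000 + units * 12.0
print(units, revenue, expenses)  # revenue equals expenses at the break-even point
```

Any unit sold beyond this point contributes its full margin (price minus variable cost) to profit.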
Use of Break-Even Analysis [higher profits]:
• Determination of selling price.
• Helps in forecasting costs and profit.
• Gives suggestions for a shift in sales mix.
• Helps in making inter-firm comparisons of profitability.
• Determination of costs and revenue at various levels of output.
• Reveals business strength and profit-earning capacity.
• Helps in management decision-making (e.g. buy/sell).
• Helps in forecasting and long-term planning.
[Chart label: Fixed Cost]
Depending on the number of units / class, the break-even point
changes, but on average a sale of >= 18.5 USD is a
profitable transaction.
Overview of Data Collected / Data set
• Number of datasets = 3
• Features in the datasets [unique] = 20
• Number of observations = 13,052,345
• The datasets contain information about the retail stores across the
USA, such as class / sub-class / department /
state / city, and price-related columns/variables including profit.
• Information is missing for some variables [~1% of the total
dataset]; these records are ignored and the rest is used for model building.
E.g. 804 DBSKU (database stock keeping unit) values in the transaction
dataset are NULL and hence ignored.
Data Set Name | Variable | Description
Product_dataset | DBSKU | Stock Keeping Unit - Database ID - Unique
Product_dataset | DEPARTMENT | Department No
Product_dataset | CLASS | Class
Product_dataset | SUBCLASS | Sub Class
Product_dataset | DEPARTMENT_NAME | Department Name for the Department No
Product_dataset | CLASS_NAME | Class Name for the Class
Product_dataset | SUBCLASS_NAME | Sub Class Name for the Sub Class
store_dataset | LOC_IDNT | Location Identity - Unique
store_dataset | CITY | City
store_dataset | STATE | State
store_dataset | STORE_TYPE | Store Type
store_dataset | POSTAL_CD | Postal Code
store_dataset | STORE_SIZE | Store Size / Capacity
transcition_dataset | DAY_DT | Date
transcition_dataset | LOC_INDT | Location Identity - Unique
transcition_dataset | DBSKU | Stock Keeping Unit - Database ID - Unique
transcition_dataset | ONLINE_FLAG | Online Yes and No
transcition_dataset | FULL_PRICE_IND | FP - For Profit & NFP - Not for Profit
transcition_dataset | TOTAL_SALES | Price at which the sale was done
transcition_dataset | TOTAL_UNITS | No of Units
transcition_dataset | TOTAL_SALES_PRFT | Net profit price
transcition_dataset | TOTAL_COST | Total cost of the product
Data Cleaning, Exploration & Preparation / Data Analysis
• Check the dimensions of the data set (rows, columns)
• Remove / drop duplicate entries
• Inspect the data type of each column
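The three checks above might look like the following in pandas. This is only a sketch: the slides do not show the actual notebook code, and the small DataFrame is a hypothetical stand-in that reuses a few column names from the transaction data.

```python
import pandas as pd

# Hypothetical miniature of the transaction data (the real file is not shown).
df = pd.DataFrame({
    "DBSKU": [101, 101, 102, 103],
    "LOC_INDT": [1, 1, 2, 3],
    "TOTAL_SALES": [19.99, 19.99, 24.50, 15.00],
    "TOTAL_UNITS": [1, 1, 2, 1],
})

print(df.shape)            # dimensions of the data set (rows, columns)
df = df.drop_duplicates()  # drop rows that are exact duplicates
print(df.shape)            # one duplicate row has been removed
print(df.dtypes)           # data type of each column
```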
Data Cleaning, Exploration & Preparation / Data Analysis
To check the number of NULL values
Merging of the data sets & checking the dimensions:
Data 1 = transcition_dataset (keys: DBSKU, LOC_INDT)
Data 2 = store_dataset (key: LOC_INDT)
Data 3 = Product_dataset (key: DBSKU)
Step 1: Merge1 = Data 1 + Data 2, based on LOC_INDT
Step 2: df = Merge1 + Data 3, based on DBSKU
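The NULL check and the two merge steps can be sketched with pandas as follows. The tiny DataFrames are hypothetical stand-ins for the three files. The location key is spelled `LOC_INDT` in the transaction table and `LOC_IDNT` in the store table, following the variable list in the data overview; whether the real files differ in this way is an assumption.

```python
import pandas as pd

# Hypothetical stand-ins for the three data sets.
transactions = pd.DataFrame({
    "LOC_INDT": [1, 2, 2, 3],
    "DBSKU": [101.0, 102.0, None, 101.0],
    "TOTAL_SALES": [19.99, 24.50, 15.00, 19.99],
})
stores = pd.DataFrame({"LOC_IDNT": [1, 2, 3], "STATE": ["NY", "TX", "FL"]})
products = pd.DataFrame({"DBSKU": [101, 102], "DEPARTMENT": [1, 2]})

# Count NULL values per column, then drop rows without a DBSKU.
null_counts = transactions.isnull().sum()
print(null_counts)
transactions = transactions.dropna(subset=["DBSKU"]).astype({"DBSKU": int})

# Step 1: transactions + stores on the location key.
merge1 = transactions.merge(stores, left_on="LOC_INDT", right_on="LOC_IDNT", how="left")
# Step 2: result of step 1 + products on DBSKU.
df = merge1.merge(products, on="DBSKU", how="left")
print(df.shape)  # check the dimensions after merging
```

A left join keeps every transaction even when a store or product record is missing; an inner join would silently drop such rows, so the choice is worth checking against the row counts.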
Data Cleaning, Exploration & Preparation / Data Analysis
Number of store types & their counts
More than 3.3 lakh (330,000) values are NA,
hence the columns are dropped
Exploring Data / Data Insights – Summary & Analysis
Location wise Sales & Profits
NY & IN have the most sales; NY, IN & TX have more profits
Exploring Data / Data Insights – Summary & Analysis
Store Type wise Sales & Profits
The number of sales and the profit go hand in hand
for all store types.
A correlation between the number of sales and profit
is suspected; this was checked using a seaborn (sns) pair plot.
Exploring Data / Data Insights – Summary & Analysis
Year-on-year sales [and so profit] have reduced, which might be one of the major reasons why the CFO wants to build a model to increase
sales and profits. Every quarter shows a dip in sales and profit.
Forecast Analysis & Relational Stats
Based on the forecast analysis, the number
of sales, and so the profit, will be on
the lower side.
** Note: Forecasting in Tableau
uses a technique known as
exponential smoothing. Forecast
algorithms try to find a regular
pattern in measures that can be
continued into the future. All
forecast algorithms are simple
models of a real-world data
generating process (DGP).
The prediction interval is taken as
95%, shown as the
shaded area in the image.
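To give an idea of what exponential smoothing does, here is a sketch of its simplest form (a level only, with no trend or seasonality). It is not Tableau's implementation, which selects among richer models; the function name, the smoothing factor and the sales figures are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        # Recent observations get weight alpha; the past decays geometrically.
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


# Hypothetical, declining monthly sales figures.
monthly_sales = [120.0, 110.0, 105.0, 100.0]
smoothed = simple_exponential_smoothing(monthly_sales, alpha=0.5)
forecast_next = smoothed[-1]  # the last level serves as the one-step-ahead forecast
print(smoothed, forecast_next)
```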
Exploring Data / Data Insights – Summary & Analysis
August (month) and Saturday (day) have the highest sales and so the highest profit
Exploring Data / Data Insights – Summary & Analysis
Store Type & State wise Sales & Profits
Number of sales summary [profit is similar]:
NY has the highest number of Strip stores
TX has the highest number of Power Strip stores
FL has the highest number of Outlets
Lifestyle Centers are present only in the states MI & IL
Store Type & City wise Sales & Profits
Number of sales summary [profit is similar]:
• Brooklyn
• Orlando
• Houston
• Bronx
• New York
Exploring Data / Data Insights – Summary & Analysis
Class 4 [for both departments] and Department 2 [for all classes] have the highest sales and so the highest profit
Correlation Matrix
Department has a strong correlation with DBSKU
Location ID has a moderate correlation with the Online
Flag
Removing highly correlated features:
Features with high correlation are more linearly
dependent and have almost the same effect on the
dependent variable.
So, when two features have high correlation, we
should drop one of them.
The correlations are visualized with the seaborn and
matplotlib libraries, and features with a correlation value > 0.5 are dropped
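A sketch of this feature-dropping rule, assuming pandas and NumPy: for every pair of features whose absolute correlation exceeds the threshold, the later column of the pair is dropped. The helper name and the sample values are hypothetical; only the 0.5 threshold comes from the slide. The target variable should be left out of the DataFrame passed in.

```python
import numpy as np
import pandas as pd


def drop_correlated_features(features: pd.DataFrame, threshold: float = 0.5):
    """Drop one feature from each pair with |correlation| above the threshold."""
    corr = features.corr().abs()
    # Keep only the upper triangle so that each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop


# Hypothetical sample: TOTAL_COST is an exact multiple of TOTAL_UNITS.
features = pd.DataFrame({
    "TOTAL_UNITS": [1, 2, 3, 4, 5],
    "TOTAL_COST": [2, 4, 6, 8, 10],
    "STORE_SIZE": [5, 1, 4, 2, 3],
})
reduced, dropped = drop_correlated_features(features, threshold=0.5)
print(dropped, list(reduced.columns))
```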
Data preparation – Sampling/Splitting & Exploring Train Data
To develop the model which helps in predicting higher profit, the data is divided into train and test samples.
Sampling is required so that the trained model can be tested: predictions on the test set confirm that the model has
low variance and bias. This is also called an evaluation check of the trained model.
Train data set = 70% of the data, randomly selected.
Test data set = the remaining 30%.
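A random 70/30 split can be sketched with NumPy as below. The function name and the fixed seed are illustrative assumptions; scikit-learn's `train_test_split` with `test_size=0.3` is a common alternative.

```python
import numpy as np


def split_indices(n_rows: int, train_fraction: float = 0.7, seed: int = 42):
    """Shuffle row indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)   # fixed seed for a reproducible split
    shuffled = rng.permutation(n_rows)  # random order of 0 .. n_rows-1
    n_train = int(round(n_rows * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))
```

The two index arrays can then be used to select rows, for example with `df.iloc[train_idx]` and `df.iloc[test_idx]` on a pandas DataFrame.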
Analysis & Modelling – techniques Adopted
Linear Regression:
• It is a very powerful technique and has
been used to understand the factors
that influence profitability.
• It has helped to forecast sales in the
coming months by analyzing the sales
data for previous months.
• It has also helped to gain various
insights about customer behavior.
• It determines the line which best fits the
data.
Linear regression has five key assumptions:
• Linear relationship
• Multivariate normality
• No or little multicollinearity
• No auto-correlation
• Homoscedasticity
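The idea of a line that best fits the data can be sketched with an ordinary least-squares fit in NumPy. The helper name and the sample numbers are hypothetical; the project itself presumably used a library model, which the slides do not show.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, slope = coef
    return float(intercept), float(slope)


# Hypothetical data: units sold vs. total sales in USD.
units = [1, 2, 3, 4, 5]
sales = [20.0, 39.0, 61.0, 80.0, 100.0]
intercept, slope = fit_line(units, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * units")
```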
Analysis & Modelling – techniques Adopted
Decision tree:
• Decision Tree is one of the
most powerful and popular
algorithms, and is used as one of
the classifiers to solve
classification problems.
• Being a supervised learning
algorithm, it works for both
continuous and
categorical output variables.
Analysis & Modelling – techniques Adopted
Ridge & Lasso:
• They are regularization techniques.
• Regularization techniques are used to deal with overfitting and
when the dataset is large.
• Ridge and Lasso regression add penalties to the
regression function.
• The default value of the regularization parameter in Lasso
regression (given by α) is 1.
• The best fit can be found by tuning the hyperparameter alpha and increasing
the number of iterations.
• Ridge Regression:
Performs L2 regularization, i.e. adds a penalty equivalent to the
square of the magnitude of the coefficients. Minimization objective
= LS Obj + α * (sum of squares of coefficients)
• Lasso Regression:
Performs L1 regularization, i.e. adds a penalty equivalent to the
absolute value of the magnitude of the coefficients.
Minimization objective = LS Obj + α * (sum of absolute
values of coefficients)
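The two minimization objectives can be written out directly in NumPy. This sketch only evaluates the objectives for a given coefficient vector; it does not fit a model. The function names are hypothetical, and libraries may scale the least-squares term differently or leave the intercept unpenalized.

```python
import numpy as np


def least_squares_objective(X, y, coef):
    """LS Obj: sum of squared residuals."""
    residuals = y - X @ coef
    return float(np.sum(residuals ** 2))


def ridge_objective(X, y, coef, alpha):
    """LS Obj + alpha * (sum of squares of coefficients), the L2 penalty."""
    return least_squares_objective(X, y, coef) + alpha * float(np.sum(coef ** 2))


def lasso_objective(X, y, coef, alpha):
    """LS Obj + alpha * (sum of absolute values of coefficients), the L1 penalty."""
    return least_squares_objective(X, y, coef) + alpha * float(np.sum(np.abs(coef)))


# Hypothetical toy problem.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
coef = np.array([1.0, -2.0])
print(ridge_objective(X, y, coef, alpha=1.0), lasso_objective(X, y, coef, alpha=1.0))
```

Because the L1 penalty grows linearly even for small coefficients, Lasso can shrink some coefficients exactly to zero, while Ridge only shrinks them towards zero.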
Analysis and Modelling – Examining the models
Model | R2 Score | MAE
Linear Regression | 95.40% | 1.40
Decision Tree | 80.90% | 4.40
Ridge | 95.40% | 1.40
Lasso | 85.03% | 3.44
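For reference, the two metrics in the table can be computed as sketched below. The functions follow the standard definitions of R2 and MAE; the sample values are hypothetical and unrelated to the scores reported above.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return float(1.0 - ss_res / ss_tot)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical actual vs. predicted selling prices in USD.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(r2_score(actual, predicted), mean_absolute_error(actual, predicted))
```

R2 is unitless, while MAE is expressed in the units of the target, so the two are best read together.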
Conclusion – Model of Choice
• The dataset was run through different algorithms: Linear Regression and Decision Trees, with regularization using Lasso and Ridge.
• Linear Regression overfits the data; owing to its high R2 score, this model is not expected to generalize well in the real world.
• Although Decision Trees showed an improvement of fit over the Linear Regression model, they still have a very high MSE.
• Taking into account the evaluation and error metrics and comparing the results of all the models built, it is evident that the
model regularized with Lasso gives the best fit, so it has been chosen as the model of choice.
Business takeaway
• Based on the model output, we observed that some of the features that significantly impact the pricing [profit] are:
o Location [Location ID] [i.e. states / cities]
o Store size
o Type of product
o Total number of units sold & total cost
• The location of the store [Location ID] and the size of the store have a significant impact on the selling price and thereby on profits.
[Cities like Brooklyn / Orlando / Houston / Bronx / New York have very high sales and higher profits; the business should look into increasing
the number of stores and units for sale there.]
• Based on the output, higher profits were observed when the number of units sold was low. The higher the units, the lower the
profit rate [suspected to be due to sales/promotions].
• August [month] and Saturday [day] have the highest number of sales. New product launches / sales / higher-rate policies can be tried out in
that month and on those days.
• The business team should concentrate on the [dependent] features, study them further and re-tune them to arrive at a better pricing policy / pricing
decision.
• Further efforts should be made to capture more dimensions and features to enhance the model and to infer more factors impacting the
price and profits.
Challenges:
We faced no major challenges while making this project.
[One of the minor challenges was understanding the dataset, as no data dictionary was provided.]
The limitation of the dataset provided is that very few meaningful dimensions were captured
from which we could derive fruitful results.