Using Insight-informed
Data to Plan Inventory
in Next 6 Months
An Application of
Linear Regression Modelling
Author: Anthony Mok
Date: 16 Nov 2023
Email: xxiaohao@yahoo.com
Agenda
1. Modern-day Data Analytics
2. Linear Regression Analysis
3. Code-less, Less-code Analytical Tools
4. Project’s Primary Goals
5. Description of Dataset
6. Modelling Strategies
7. Findings, Conclusions & Recommendations
Modern-day
Data Analytics
Both traditional and modern-day data analytics deal
with extracting insights from information, but they
differ significantly in their methods and capabilities.
The significant differences are:
Features      | Traditional Data Analytics | Modern-Day Data Analytics
Data type     | Mostly structured          | Diverse (structured, semi-structured, unstructured)
Technology    | On-premises                | Cloud-based
Processing    | Batch-oriented             | Real-time or near-real-time
Analysis      | Descriptive, diagnostic    | Predictive, prescriptive
Accessibility | Limited to data analysts   | Aims for data democratisation
Linear
Regression Analysis
Linear regression is a statistical method that
models the relationship between a dependent
variable and one or more independent variables
using a straight line
It is used to understand trends, make predictions,
and test hypotheses
This analysis is suitable when the data exhibit a
linear relationship and assumptions such as normality
and constant variance of the residuals hold
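As a minimal sketch of what fitting a straight line means, the least-squares slope and intercept for a single independent variable can be computed directly. The data points below are made up for illustration; the project itself was carried out in KNIME, not in Python.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With several independent variables the same idea applies, but the coefficients are solved jointly rather than one at a time.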
Code-less &
Less-code Data Analytics
Code-less Applications
Offer drag-and-drop
interfaces, pre-built
connectors, and
automated workflows,
making data analysis
accessible to everyone,
even without technical
expertise
Less-code Applications
Requiring some coding
knowledge, less-code
platforms provide pre-
written code snippets,
wizards, and visual tools
to streamline complex
tasks
KNIME -
Less-code Data Analytics
• KNIME is a less-code data analytics platform
• Build visual workflows with pre-built nodes for data
preparation, analysis, and visualisation
• No coding required, but Python integration empowers
customisation
• Unique Selling Points: Open-source, free, and powerful.
Handles diverse data, builds predictive models, and deploys
insights
• This project is carried out using KNIME
Project’s Primary Goals
To analyse past sales data and
generate insights into which
features of mobile phones
drive sales
To use these insights to
efficiently plan the inventory for
the next 6 months
Dataset
Data Description
• Dataset consists of sales and product-related features
• Dataset contains descriptions of the top 5 most popular
mobile brands
• Dataset consists of 418 rows and 16 columns
Data Dictionary
• A sample data dictionary* is given below:
* More details are found in the project report, which is
not released at the request of the Social Enterprise
Strategies
for Modelling
• Check for missing values in the dataset and treat them
with suitable methods
• Look for outliers and take suitable steps to treat them
• Check for multicollinearity amongst variables and use
suitable steps to treat highly correlated variables
• Build a Linear Regression Model to predict the sales of
mobile phones
• Report on the metrics of the model
• Identify the significant variables, then rebuild and report on
the model using only these variables
• Based on the final model outcomes, determine the features
driving mobile phone sales
• List down the recommendations to help in the inventory
planning for the next 6 months
Check for Missing Values in Dataset By:
• A KNIME workflow was created
• ‘CSV Reader’ and ‘Data Explorer’ nodes were dragged
and dropped onto the KNIME Platform to ingest and
explore the variables and data in the dataset
• Using the ‘Interactive Viewer’ in the ‘Data Explorer’
node, 16 numeric variables were discovered
• The properties of the variables were expanded to
explore their missing values; missing values were
found only in Rows 7, 18 & 397 of the ‘display size’
variable
Treat Missing Values in Dataset
• Since ‘display size’ was treated as a categorical
variable, the mode of this variable was used to replace
its missing values
• To do this, the data in the ‘display size’ column was
converted from numbers to strings by using the
‘Number-To-String’ node
• The ‘Missing Value’ node was used to replace the
three missing values with the column's ‘Most Frequent
Value’ (its mode), which is 6.5
• Finally, the ‘String-To-Number’ node was deployed to
return this column of data to its original data format
for modelling purposes
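For readers who prefer code to nodes, the same mode-based imputation can be sketched in a few lines of Python. This is only an illustration under assumptions: the values below are hypothetical, `None` stands in for a missing cell, and no string conversion is needed because Python's `statistics.mode` works on numbers directly.

```python
from statistics import mode


def fill_with_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    present = [v for v in values if v is not None]
    most_frequent = mode(present)
    return [most_frequent if v is None else v for v in values]


# Hypothetical display sizes; None marks a missing cell
display_size = [6.5, 6.1, None, 6.5, 6.7, None, 6.5]
filled = fill_with_mode(display_size)
print(filled)
```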
Observe and Treat Outliers
A histogram of ‘ratings’
was constructed to study its distribution:
The histogram of ‘ratings’ shows a slight left skew. The
median of ‘ratings’ (the middle value when the ratings are
ordered from smallest to largest) is 4.3, while the mean
(the average of all ratings) is 4.339. The difference of
0.039 between the mean and the median is small, so
apart from its tail the distribution of ‘ratings’ is close to
symmetric. About 50% of the ‘ratings’ fall within the
Interquartile Range, between 4.3 and 4.4; about 25% are
higher than Quartile 3, mostly between 4.4 and 4.5; and
about 25% are lower than Quartile 1, mostly between 4.2
and 4.3
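The summary figures discussed above (mean, median, and quartiles) can be reproduced with Python's standard library. The ratings in this sketch are hypothetical and are not the project's data, and `statistics.quantiles` uses its default interpolation method, which may differ slightly from the one used by KNIME.

```python
from statistics import mean, median, quantiles

# Hypothetical ratings, not the project's data
ratings = [3.0, 4.2, 4.3, 4.3, 4.3, 4.4, 4.4, 4.5, 4.6]

avg = mean(ratings)
mid = median(ratings)
q1, q2, q3 = quantiles(ratings, n=4)  # the three quartile cut points

# A small gap between mean and median is a quick check for near-symmetry
print(f"mean={avg:.3f} median={mid:.3f} gap={avg - mid:+.3f}")
print(f"Q1={q1:.2f} Q3={q3:.2f} IQR={q3 - q1:.2f}")
```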
A box plot of ‘ratings’
was constructed to study its outliers:
Through the box plot, a total of 6 outliers were found. One
value (in Row 49 of the dataset) is above the upper whisker
boundary and five values (in Rows 158, 259, 286, 320
and 408 of the dataset) are below the lower whisker
boundary of the box plot. Relating these six outliers to
real-life circumstances, the decision is not to treat them,
since it is realistic to observe ratings of 4.6 (for ‘ratings’ in
Row 49) and 3.0 (for ‘ratings’ in Row 320) on a 5-point
customer rating scale. So, these rows are kept in the
analysis
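The whisker check can be sketched as follows. It assumes the common convention that whiskers sit at 1.5 times the Interquartile Range beyond Quartile 1 and Quartile 3; the slides do not state which rule the box plot used, and the ratings below are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (low_outliers, high_outliers) using the k * IQR whisker rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr  # assumed position of the lower whisker
    upper_fence = q3 + k * iqr  # assumed position of the upper whisker
    low = [v for v in values if v < lower_fence]
    high = [v for v in values if v > upper_fence]
    return low, high


# Hypothetical ratings, not the project's data
ratings = [3.0, 4.2, 4.3, 4.3, 4.3, 4.4, 4.4, 4.5, 4.6]
low, high = iqr_outliers(ratings)
print("below lower whisker:", low)
print("above upper whisker:", high)
```

Flagging a value this way only identifies it; whether to treat it remains a judgement call, as in the decision above to keep all six rows.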
Check for
Multicollinearity Amongst Variables
The ‘Linear Correlation’ node was used to compute the correlation coefficients
between all the numerical variables. After sorting the ‘Correlation Value’ in
descending order in the ‘View’ function of the ‘Linear Correlation’ node, the
correlation value between the variables ‘num_of_ratings’ and ‘sales’ is 0.9418.
This suggests that these two variables are highly correlated.
Multicollinearity (high correlation among independent variables) reduces the
precision of the estimated coefficients, since they can shift widely with slight
changes in other independent variables. In such a situation, the p-values become
unreliable for identifying which independent variables are statistically significant.
To strengthen the statistical power of the regression model, the multicollinearity
of these variables needs to be removed. Typically, variables whose correlation
values are >0.70 are deemed highly correlated and need to be treated
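The correlation screen can be illustrated with a short Python sketch that computes Pearson's coefficient for every pair of columns and lists those above the 0.70 threshold. The column values are hypothetical; only the variable names and the threshold come from the slides.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(columns, threshold=0.7):
    """List (name_a, name_b, r) for pairs with |r| above the threshold, strongest first."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            pairs.append((a, b, round(r, 4)))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical values; only the column names come from the slides
columns = {
    "num_of_ratings": [10, 20, 30, 40, 50],
    "sales": [12, 19, 33, 41, 48],
    "ratings": [4.3, 4.4, 4.2, 4.4, 4.3],
}
flagged = highly_correlated_pairs(columns)
print(flagged)
```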
Treating for
Multicollinearity Amongst Variables
The following steps were taken to achieve this outcome:
• Observe the correlation values and identify the highly correlated quantitative
(numerical) variables, that is, those with a correlation value of >0.7
• Shift the identified variable to the ‘Exclude’ box of the ‘Configure’ function of the
‘Linear Correlation’ node
• Using the remaining variables, re-execute the ‘Linear Correlation’ node
• Observe the correlation values of the remaining variables after re-executing the
node
• Identify the next highly correlated variables
• Repeat this process until all the variables have a correlation value of ≤0.7
• In this project the process did not need to be repeated, as no other highly
correlated quantitative (numerical) variables were found after treating the
multicollinearity of ‘num_of_ratings’ and ‘sales’
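The exclude-and-recheck loop described in this section can be sketched in Python as below. The column names and values are hypothetical, and the choice of which variable in a pair to drop (here, the second) is an assumption, since the slides do not say which one was excluded.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def prune_correlated(columns, threshold=0.7):
    """Drop one column at a time until no remaining pair exceeds the threshold."""
    kept = dict(columns)
    dropped = []
    while len(kept) > 1:
        # Find the most strongly correlated pair among the remaining columns
        first, second = max(
            combinations(kept, 2),
            key=lambda pair: abs(pearson(kept[pair[0]], kept[pair[1]])),
        )
        if abs(pearson(kept[first], kept[second])) <= threshold:
            break
        # Assumption: exclude the second column of the pair
        dropped.append(second)
        del kept[second]
    return list(kept), dropped


# Hypothetical columns: feature_b closely tracks feature_a
columns = {
    "feature_a": [1, 2, 3, 4, 5],
    "feature_b": [2, 4, 6, 8, 11],
    "feature_c": [3, 1, 4, 1, 5],
}
kept_names, dropped_names = prune_correlated(columns)
print(kept_names, dropped_names)
```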
Build the Linear Regression Model By:
1. ‘Partitioning’ node was configured
to split the dataset into training and
testing sets in the ratio of 7:3
2. ‘Linear Regression Learner’ was
created with these configurations,
with ‘sales’ as ‘Target’
3. Two sets of ‘Regression Predictor’ and ‘Numeric
Scorer’ nodes were created; one to ingest the training
dataset and the other to process the testing dataset
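A rough Python equivalent of the partition, learn, and predict steps is sketched below for a single feature. It assumes a random 70/30 split with a fixed seed (the slides do not say how rows were drawn), and the rows are made-up points on a perfect line, so the fit is exact.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(points):
    """Least-squares (slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (feature, sales) rows lying on sales = 3 * feature + 2
rows = [(float(x), 3.0 * x + 2.0) for x in range(10)]
train, test = partition(rows)
slope, intercept = fit_line(train)          # learn on the training set only
predictions = [slope * x + intercept for x, _ in test]  # score the held-out rows
print(len(train), len(test), slope, intercept)
```

Fitting on the training rows only and scoring the held-out rows is what allows the testing metrics to indicate how the model behaves on unseen data.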
Evaluate the Linear Regression Model
After feeding the training and testing dataset, from the
‘partitioning’ node, into the learner and predictors, their
numeric scorers produced the following metrics:
Training Dataset Numeric Scorer Testing Dataset Numeric Scorer
The model has performed well on both the training and testing datasets. The R-squared is around 0.882 on the training dataset
and 0.928 on the testing dataset. These are high R-squared values; the higher these values are, the better the model fits the data
and the closer the predictions are to the real data points. It is a clear indication that a good model has been created, one that is
able to explain about 88% of the variance in the sales of mobile phones on the training dataset. The Mean Absolute Error indicates
that my model is able to predict sales of mobile phones within a mean error of 9.4 units of SGD on the testing dataset
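For reference, the two metrics quoted above follow the standard definitions: R-squared is one minus the ratio of residual to total sum of squares, and Mean Absolute Error is the average absolute gap between actual and predicted values. The sketch below uses made-up numbers, not the project's results.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up sales figures and predictions
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 195.0, 255.0]

r2 = r_squared(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"R-squared={r2:.3f} MAE={mae:.1f}")
```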
Identify Significant Variables
The p-value of a coefficient measures how likely it is that an
association at least as strong as the one observed would arise by
chance if the variable had no real effect. There are 11 variables
whose p-values are more than 0.05, starting with ‘battery_capacity’
at 0.799. Typically, a p-value that is less than or equal to 0.05 is
considered statistically significant, which suggests that the observed
relationship is unlikely to be a result of chance
Rebuild Model with Significant Variables Only By:
• Shifting the variable with the highest p-value, that is >0.05, to the
‘Exclude’ box of the ‘Configure’ function of the ‘Linear Regression
Learner’
• Re-executing the node using the remaining variables
• Observing the changes in the p-values through the ‘Coefficients
and Statistics’ function of the node
• Identifying the next variable with the highest p-value
• Continuing to iterate the process until all p-values of remaining
variables are ≤ 0.05
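The iteration above is a form of backward elimination, and its control flow can be sketched as follows. Computing real p-values requires refitting the regression each round, so this sketch takes that step as a caller-supplied function; the `toy_pvalues` stand-in and every p-value in it except 0.799 for ‘battery_capacity’ are made up, and ‘ram’ is a hypothetical variable name.

```python
def backward_eliminate(features, fit_pvalues, alpha=0.05):
    """Repeatedly drop the feature with the largest p-value above alpha."""
    remaining = list(features)
    removed = []
    while remaining:
        pvalues = fit_pvalues(remaining)  # refit using only the remaining features
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break  # every remaining feature is significant
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed


# Made-up p-values for illustration; a real refit would change them each round
TOY_PVALUES = {
    "battery_capacity": 0.799,
    "discount_percent": 0.001,
    "display_size": 0.030,
    "ram": 0.240,  # hypothetical variable
}


def toy_pvalues(names):
    """Stand-in for refitting the model and reading off its p-values."""
    return {name: TOY_PVALUES[name] for name in names}


kept, removed = backward_eliminate(list(TOY_PVALUES), toy_pvalues)
print("kept:", kept)
print("removed in order:", removed)
```

Removing one variable at a time matters because p-values can change after each refit, so a variable that looks insignificant at first may become significant once a correlated variable is gone.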
These are the six variables with p-value ≤ 0.05 that are
retained to rebuild the model since they are statistically
significant:
Evaluate the Rebuilt Linear Regression Model
After the model has been rebuilt, the scorers for the training and
testing dataset show the following information:
Training Dataset Numeric Scorer Testing Dataset Numeric Scorer
This model continues to perform well on both the training and testing datasets. The R-squared is
around 0.875 on the training dataset and 0.924 on the testing dataset. These are 0.007 and 0.004
lower than in the original model. Nevertheless, they are high R-squared values, and the higher these
values are, the better the model fits the data and the closer the predictions are to the real data points.
It is a clear indication that I am able to create a good rebuilt model that is able to explain about 88% of
the variance in the sales of mobile phones on the training dataset. The Mean Absolute Error indicates
that my model is able to predict sales of mobile phones within a mean error of 9 units of SGD on the
testing dataset
Findings &
Conclusions*
Key Features Driving Mobile Phone Sales
• ‘discount_percent’ appears to be the only variable with a
comparatively high positive coefficient on ‘sales’. An
increase of one unit in ‘discount_percent’ will increase
‘sales’ by 0.46 unit of SGD
• ‘display_size’ has the most negative impact on
‘sales’. An increase of one unit in the ‘display_size’
variable would decrease ‘sales’ by around 1 unit of
SGD
• In ranking order, ‘num_of_ratings’, ‘model’, ‘processor’
and ‘num_rear_camera’ have similar negative effects on
‘sales’. A unit increase in any of these would reduce
‘sales’ by around 0.38 unit of SGD
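To make the coefficient reading concrete: in a linear model, the expected change in ‘sales’ is each coefficient multiplied by the change in its variable, summed, with all other variables held fixed. The sketch below uses the approximate coefficients quoted above; the scenario itself (a 5-unit rise in discount and a 0.5-unit rise in display size) is hypothetical.

```python
# Approximate coefficients as quoted in the findings (units of SGD per unit change)
coefficients = {
    "discount_percent": 0.46,
    "display_size": -1.0,      # "around 1 unit" in the findings
    "num_rear_camera": -0.38,  # "around 0.38 unit" in the findings
}


def sales_change(deltas):
    """Expected change in sales for the given changes, other variables held fixed."""
    return sum(coefficients[name] * delta for name, delta in deltas.items())


# Hypothetical scenario: discount up by 5 units, display size up by 0.5 unit
change = sales_change({"discount_percent": 5, "display_size": 0.5})
print(f"expected change in sales: {change:+.2f} units of SGD")
```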
* More details are found in the project report, which is
not released at the request of the Social Enterprise
Recommendations*
Recommendations On Inventory Planning
1. Consider offering
higher discounts
2. Stock mobile phones with
smaller display sizes and fewer
rear cameras
3. Narrow the range of models to
stock
4. Keep phones whose
processors are encoded at a
lower value
* More details are found in the project report, which is
not released at the request of the Social Enterprise
Thank you

Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6 Months

  • 1. Using Insight-informed Data to Plan Inventory in Next 6 Months An Application of Linear Regression Modelling Author: Anthony Mok Date: 16 Nov 2023 Email: xxiaohao@yahoo.com
  • 2. Agenda 1. Modern-day Data Analytics 2. Linear Regression Analysis 4. Project’s Primary Goals 5. Description of Dataset 7. Findings, Conclusions & Recommendations An Application of Linear Regression Modelling 3. Code-less, Less-code Analytical Tools 6. Modelling Strategies
  • 3. Modern-day Data Analytics Both traditional and modern-day data analytics deal with extracting insights from information, but they differ significantly in their methods and capabilities Here are the significant differences: An Application of Linear Regression Modelling 3 Features Traditional Data Analytics Modern-Day Data Analytics Data type Mostly structured Diverse (structured, semi-structured, unstructured) Technology On-premises Cloud-based Processing Batch-oriented Real-time or near-real-time Analysis Descriptive, diagnostic Predictive, prescriptive Accessibility Limited to data analysts Aims for data democratisation
  • 4. Linear Regression Analysis Linear regression is a statistical method that models the relationship between a dependent variable and one or more independent variables using a straight line It is used to understand trends, make predictions, and test hypotheses This analysis is suitable when the data exhibits a linear relationship where assumptions like normality and constant variance are held An Application of Linear Regression Modelling
  • 5. Code-less & Less-code Data Analytics Code-less Applications Offer drag-and-drop interfaces, pre-built connectors, and automated workflows, making data analysis accessible to everyone, even without technical expertise Less-code Applications Requiring some coding knowledge, less-code platforms provide pre- written code snippets, wizards, and visual tools to streamline complex tasks 5 An Application of Linear Regression Modelling
  • 6. KNIME - Less-code Data Analytics • Knime is a less-code data analytics platform • Build visual workflows with pre-built nodes for data preparation, analysis, and visualisation • No coding required, but Python integration empowers customisation • Unique Selling Points: Open-source, free, and powerful. Handles diverse data, builds predictive models, and deploys insights • This project is carried out using KNIME 6 An Application of Linear Regression Modelling
  • 7. Project’s Primary Goals To analyse past sales data to generate insights to understand what features of mobile phone that drive the sales To use these insights to efficiently plan the inventory in the next 6 months 7 An Application of Linear Regression Modelling
  • 8. Dataset Data Description • Dataset consists of sales and product-related features • Dataset contains descriptions of the top 5 most popular mobile brands • Dataset consists of 418 rows and 16 columns Data Dictionary • A sample data dictionary* is given below: 8 * More details are found in the project report, which are not released at the request of the Social Enterprise An Application of Linear Regression Modelling
  • 9. Strategies for Modelling: (1) Check for missing values in the dataset and treat them with suitable methods. (2) Look for outliers and take suitable steps to treat them. (3) Check for multicollinearity amongst the variables and take suitable steps to treat highly correlated variables. (4) Build a Linear Regression Model to predict the sales of mobile phones. (5) Report on the metrics of the model. (6) Identify the significant variables, then rebuild and report on the model using only these variables. (7) Based on the final model outcomes, determine the features driving mobile phone sales. (8) List the recommendations to help with inventory planning for the next 6 months.
  • 10. Check for Missing Values in Dataset: A KNIME workflow was created. The ‘CSV Reader’ and ‘Data Explorer’ nodes were dragged and dropped onto the KNIME platform to ingest and explore the variables and data in the dataset. Using the ‘Interactive Viewer’ in the ‘Data Explorer’ node, 16 numeric variables were discovered. The properties of the variables were expanded to check for missing values; missing values were found only in the ‘display size’ variable, in Rows 7, 18 and 397.
  • 11. Treat Missing Values in Dataset: Since ‘display size’ is treated as a categorical variable, the mode of this variable was used to replace its missing values. To do this, the data in the ‘display size’ column was first converted from numbers to strings using the ‘Number-To-String’ node. The ‘Missing Value’ node was then used to replace the three missing values with the ‘Most Frequent Value’ (the mode), which is 6.5. Finally, the ‘String-To-Number’ node was used to return this column to its original numeric format for modelling purposes.
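The mode-imputation step above can be sketched outside KNIME as follows. This is a minimal illustration, not the project's workflow: the column values are hypothetical, `impute_with_mode` is an illustrative helper name, and `None` stands in for a missing cell.

```python
from collections import Counter

def impute_with_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    observed = [v for v in values if v is not None]
    # most_common(1) returns [(value, count)] for the most frequent value
    mode_value, _ = Counter(observed).most_common(1)[0]
    filled = [mode_value if v is None else v for v in values]
    return filled, mode_value

# Hypothetical 'display size' column (assumption, not the project's data)
display_size = [6.5, 6.1, None, 6.5, 6.4, None, 6.5, 6.1]
filled, mode_value = impute_with_mode(display_size)
print(mode_value)
print(filled)
```

Counting values directly avoids the number-to-string round trip that the KNIME ‘Missing Value’ node needed in order to offer ‘Most Frequent Value’ for this column.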
  • 12. Observe and Treat Outliers: A histogram of ‘ratings’ was constructed to study its distribution. The distribution of ‘ratings’ is slightly left-skewed. The median of ‘ratings’ (the middle value when the ratings are ordered from smallest to largest) is 4.3, while the mean (the average of all ratings) is 4.339. The difference between the mean and the median is 0.039, so the two are close together; when the median is close to the mean, the distribution is approximately symmetric. About 50% of the ‘ratings’ fall within the Interquartile Range, between 4.3 and 4.4. About 25% of the ‘ratings’ are higher than Quartile 3, mostly between 4.4 and 4.5, and about 25% are lower than Quartile 1, mostly between 4.2 and 4.3.
  • 13. Observe and Treat Outliers: A box plot of ‘ratings’ was constructed to study its outliers. The box plot showed a total of 6 outliers: one value (in Row 49 of the dataset) above the upper whisker boundary and five values (in Rows 158, 259, 286, 320 and 408 of the dataset) below the lower whisker boundary. Relating these six outliers to real-life circumstances, the decision was not to treat them, since ratings of 4.6 (Row 49) and 3.0 (Row 320) are realistic on a 5-point customer rating scale. These rows were therefore kept in the analysis.
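A box plot's whisker boundaries can be reproduced numerically. The sketch below assumes the common convention of whiskers at 1.5 times the Interquartile Range beyond the quartiles; the exact quartile method used by the KNIME box plot may differ, and the ratings listed are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [(i, v) for i, v in enumerate(values) if v < lower or v > upper]
    return flagged, (lower, upper)

# Hypothetical ratings on a 5-point scale (assumption, not the project's data)
ratings = [4.3, 4.4, 4.2, 4.3, 3.0, 4.4, 4.3, 4.5, 4.3, 4.4, 4.6, 4.2, 4.3, 4.4]
outliers, (lower, upper) = iqr_outliers(ratings)
print(outliers)
print(round(lower, 2), round(upper, 2))
```

Flagging a value is only the first step; as in the slide above, whether to treat it is a judgement about whether the value is realistic.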
  • 14. Check for Multicollinearity Amongst Variables: The ‘Linear Correlation’ node was used to observe the correlation coefficients between all the numerical variables. After sorting the ‘Correlation Value’ in descending order in the ‘View’ function of the ‘Linear Correlation’ node, the correlation value between the variables ‘num_of_ratings’ and ‘sales’ was found to be 0.9418. This suggests that these two variables are highly correlated. Multicollinearity, that is, high correlation among independent variables, reduces the precision of the estimated coefficients, since the estimates can swing widely with slight changes in the other independent variables. In such a situation, the p-values become unreliable for identifying the independent variables that are statistically significant. To strengthen the statistical power of the regression model, the multicollinearity needs to be removed. Typically, variables whose correlation values exceed 0.70 are deemed highly correlated and need to be treated.
  • 15. Treating for Multicollinearity Amongst Variables: The following steps were taken. (1) Observe the correlation values and identify the highly correlated quantitative (numerical) variables, that is, those with a correlation value >0.7. (2) Shift the identified variable to the ‘Exclude’ box of the ‘Configure’ function of the ‘Linear Correlation’ node. (3) Using the remaining variables, re-execute the ‘Linear Correlation’ node. (4) Observe the correlation values of the remaining variables after re-executing the node. (5) Identify the next highly correlated variables. (6) Repeat this process until all the variables have correlation values <0.7. In this project the process did not need to be repeated, as no other highly correlated quantitative (numerical) variables were found after treating the multicollinearity of ‘num_of_ratings’ and ‘sales’.
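The exclude-and-re-execute loop above can be sketched with NumPy. This is an illustration under stated assumptions: the columns `a`, `b`, `c` are synthetic, `drop_highly_correlated` is an illustrative helper name, and dropping the later-listed variable of the most correlated pair is an arbitrary choice (the slides do not say which variable of a pair to exclude).

```python
import numpy as np

def drop_highly_correlated(columns, threshold=0.7):
    """columns: dict of name -> 1-D array. Returns (kept, dropped) name lists."""
    kept, dropped = list(columns), []
    while len(kept) > 1:
        data = np.column_stack([columns[name] for name in kept])
        corr = np.abs(np.corrcoef(data, rowvar=False))
        np.fill_diagonal(corr, 0.0)  # ignore each variable's correlation with itself
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= threshold:
            break  # no remaining pair exceeds the threshold
        # Assumption: exclude the later-listed variable of the pair
        dropped.append(kept.pop(int(max(i, j))))
    return kept, dropped

# Synthetic columns: 'b' is nearly a copy of 'a'; 'c' is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
columns = {"a": a, "b": a + 0.1 * rng.normal(size=200), "c": rng.normal(size=200)}
kept, dropped = drop_highly_correlated(columns)
print(kept, dropped)
```

Recomputing the correlation matrix after every exclusion mirrors re-executing the ‘Linear Correlation’ node on the remaining variables.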
  • 16. Build the Linear Regression Model: (1) The ‘Partitioning’ node was configured to split the dataset into training and testing sets in the ratio 7:3. (2) A ‘Linear Regression Learner’ node was created and configured with ‘sales’ as the ‘Target’. (3) Two sets of ‘Regression Predictor’ and ‘Numeric Scorer’ nodes were created: one to score the training dataset and the other to score the testing dataset.
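For readers without KNIME, the partition-then-learn steps can be approximated as below. This is a sketch only: the data is synthetic (418 rows to match the dataset size, two made-up predictors), the split is assumed to be a random draw, and ordinary least squares via `np.linalg.lstsq` stands in for the ‘Linear Regression Learner’ node.

```python
import numpy as np

def split_70_30(X, y, seed=0):
    """Randomly split rows into 70% training and 30% testing."""
    order = np.random.default_rng(seed).permutation(len(y))
    cut = int(round(0.7 * len(y)))
    train, test = order[:cut], order[cut:]
    return X[train], X[test], y[train], y[test]

def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

# Synthetic data (assumption): target = 5 + 2*x1 - 1*x2 + small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(418, 2))
y = 5.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=418)

X_train, X_test, y_train, y_test = split_70_30(X, y)
coef = fit_linear_regression(X_train, y_train)
test_predictions = predict(coef, X_test)
print(len(y_train), len(y_test))
print(np.round(coef, 2))
```

Fitting only on the training rows and predicting on the held-out rows is what allows the testing metrics to indicate how the model behaves on unseen data.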
  • 17. Evaluate the Linear Regression Model: After the training and testing datasets from the ‘Partitioning’ node were fed into the learner and predictors, the ‘Numeric Scorer’ nodes reported metrics for each dataset. The model performed well on both the training and testing datasets. The R-squared is around 0.882 on the training dataset and 0.928 on the testing dataset. The higher the R-squared, the better the model fits the data and the more closely the predictions approximate the real data points. This indicates that the model explains about 88% of the variance in mobile phone sales on the training dataset. The Mean Absolute Error indicates that the model predicts mobile phone sales with a mean error of 9.4 units of SGD on the testing dataset.
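The two metrics discussed above follow standard definitions: R-squared is one minus the ratio of the residual sum of squares to the total sum of squares, and Mean Absolute Error is the average absolute difference between actual and predicted values. The sketch below computes both on made-up numbers; it does not reproduce the project's figures.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical sales figures and predictions (assumption)
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
r2 = r_squared(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(round(r2, 3), mae)
```

Here every prediction is off by 10 units, so the error is 10 in the units of the target, while R-squared stays high because those errors are small relative to the spread of the actual values.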
  • 18. Identify Significant Variables: The p-value is the probability of observing a relationship at least as strong as the one in the data if there were no true relationship. There are 11 variables whose p-values exceed 0.05, starting with ‘battery_capacity’ at 0.799. Typically, a p-value less than or equal to 0.05 is considered statistically significant, which suggests that the observed relationship is unlikely to be the result of chance alone.
  • 19. Rebuild Model with Significant Variables Only: (1) Shift the variable with the highest p-value above 0.05 to the ‘Exclude’ box of the ‘Configure’ function of the ‘Linear Regression Learner’ node. (2) Using the remaining variables, re-execute the node. (3) Observe the changes in the p-values through the ‘Coefficients and Statistics’ function of the node. (4) Identify the next variable with the highest p-value. (5) Continue to iterate until the p-values of all remaining variables are ≤ 0.05. Six variables with p-values ≤ 0.05 were retained to rebuild the model, since they are statistically significant.
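The iterative exclusion above is a form of backward elimination, which can be sketched as follows. Several assumptions apply: the predictors `x1`, `x2`, `x3` form a small synthetic balanced design chosen so the outcome is easy to verify by hand, the helper names are illustrative, and the p-values use a normal approximation to the t-distribution, so they will not match the KNIME ‘Coefficients and Statistics’ output exactly.

```python
import itertools
import math
import numpy as np

def ols_p_values(X, y):
    """Two-sided p-values for [intercept, coef_1, ...] (normal approximation)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    xtx_inv = np.linalg.inv(design.T @ design)
    coef = xtx_inv @ design.T @ y
    residuals = y - design @ coef
    sigma2 = residuals @ residuals / (n - k - 1)  # residual variance
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    return [math.erfc(abs(t) / math.sqrt(2)) for t in coef / std_err]

def backward_elimination(columns, y, alpha=0.05):
    """Repeatedly drop the predictor with the highest p-value above alpha."""
    kept = list(columns)
    while kept:
        X = np.column_stack([columns[name] for name in kept])
        p = ols_p_values(X, y)[1:]  # skip the intercept
        worst = max(range(len(kept)), key=lambda i: p[i])
        if p[worst] <= alpha:
            break
        kept.pop(worst)
    return kept

# Synthetic 2x2x2 design: y depends on x1 and x2 but not on x3
grid = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
columns = {"x1": grid[:, 0], "x2": grid[:, 1], "x3": grid[:, 2]}
y = 3.0 * grid[:, 0] - 2.0 * grid[:, 1] + 0.5 * grid[:, 0] * grid[:, 1]
selected = backward_elimination(columns, y)
print(selected)
```

Removing one variable at a time and refitting matters because p-values of the remaining variables can change once a variable is excluded.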
  • 20. Evaluate the Rebuilt Linear Regression Model: After the model was rebuilt, the ‘Numeric Scorer’ nodes reported metrics for the training and testing datasets. The rebuilt model continues to perform well on both datasets. The R-squared is around 0.875 on the training dataset and 0.924 on the testing dataset. These are 0.007 and 0.004 lower, respectively, than for the original model. Nevertheless, the R-squared values remain high, and the higher these values are, the better the model fits the data and the more closely the predictions approximate the real data points. This indicates that the rebuilt model still explains close to 88% of the variance in mobile phone sales on the training dataset. The Mean Absolute Error indicates that the model predicts mobile phone sales with a mean error of 9 units of SGD on the testing dataset.
  • 21. Findings & Conclusions*: Key Features Driving Mobile Phone Sales. ‘discount_percent’ appears to be the only variable with a comparatively large positive coefficient on ‘sales’: holding the other variables constant, an increase of one unit in ‘discount_percent’ increases ‘sales’ by 0.46 unit of SGD. ‘display_size’ has the most negative impact on ‘sales’: an increase of one unit in ‘display_size’ decreases ‘sales’ by around 1 unit of SGD. In ranking order, ‘num_of_ratings’, ‘model’, ‘processor’ and ‘num_rear_camera’ have similar negative effects on ‘sales’: a unit increase in these reduces ‘sales’ by about 0.38 unit of SGD. * More details are found in the project report, which is not released at the request of the Social Enterprise.
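In a linear model, the expected change in the target is the sum of each coefficient multiplied by the change in its variable, with everything else held constant. The sketch below applies this to two of the coefficients quoted above; the value of -1.0 for ‘display_size’ is the slide's "around 1" taken as exact, and the scenario values are hypothetical.

```python
# Coefficients as quoted on the slide (display_size rounded to -1.0)
coefficients = {"discount_percent": 0.46, "display_size": -1.0}

def sales_change(changes):
    """Expected change in 'sales' for the given per-variable changes."""
    return sum(coefficients[name] * delta for name, delta in changes.items())

# Hypothetical scenario: discount raised by 5 units, display size unchanged
effect = sales_change({"discount_percent": 5})
print(round(effect, 2))
```

This reading only holds within the range of values the model was trained on and assumes the other variables stay fixed.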
  • 22. Recommendations*: Recommendations on Inventory Planning. (1) Consider offering higher discounts. (2) Stock mobile phones with smaller display sizes and fewer rear cameras. (3) Narrow the range of models to stock. (4) Stock phones whose processors are encoded at a lower value. * More details are found in the project report, which is not released at the request of the Social Enterprise.
  • 23. Thank you Author: Anthony Mok Date: 16 Nov 2023 Email: xxiaohao@yahoo.com