Using Insight-informed
Data to Plan Inventory
in Next 6 Months
An Application of
Linear Regression Modelling
Author: Anthony Mok
Date: 16 Nov 2023
Email: xxiaohao@yahoo.com
Agenda
1. Modern-day Data Analytics
2. Linear Regression Analysis
3. Code-less, Less-code Analytical Tools
4. Project’s Primary Goals
5. Description of Dataset
6. Modelling Strategies
7. Findings, Conclusions & Recommendations
Modern-day
Data Analytics
Both traditional and modern-day data analytics deal with extracting insights from information, but they differ significantly in their methods and capabilities. Here are the significant differences:

Features        Traditional Data Analytics    Modern-Day Data Analytics
Data type       Mostly structured             Diverse (structured, semi-structured, unstructured)
Technology      On-premises                   Cloud-based
Processing      Batch-oriented                Real-time or near-real-time
Analysis        Descriptive, diagnostic       Predictive, prescriptive
Accessibility   Limited to data analysts      Aims for data democratisation
Linear
Regression Analysis
Linear regression is a statistical method that models the relationship between a dependent variable and one or more independent variables using a straight line (or, with several independent variables, a linear equation).
It is used to understand trends, make predictions, and test hypotheses.
This analysis is suitable when the data exhibit a linear relationship and assumptions such as normality of the residuals and constant variance hold.
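As an illustration of fitting a straight line, here is a minimal sketch of ordinary least squares for a single independent variable, written in plain Python. The function name fit_line and the toy numbers are assumptions made for this example only; they do not come from the project dataset or from KNIME.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the co-variation of x and y divided by the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: discount (%) against units sold
discount = [0, 5, 10, 15, 20]
units = [10, 12, 14, 16, 18]

intercept, slope = fit_line(discount, units)
predicted = intercept + slope * 25  # prediction for a 25% discount
```

With more than one independent variable the same idea applies, but the fit is usually delegated to a statistics package or, as in this project, to a modelling node.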
Code-less &
Less-code Data Analytics
Code-less Applications
Offer drag-and-drop interfaces, pre-built connectors, and automated workflows, making data analysis accessible to everyone, even without technical expertise
Less-code Applications
Requiring some coding knowledge, less-code platforms provide pre-written code snippets, wizards, and visual tools to streamline complex tasks
KNIME -
Less-code Data Analytics
• KNIME is a less-code data analytics platform
• Build visual workflows with pre-built nodes for data preparation, analysis, and visualisation
• No coding is required, but Python integration allows customisation
• Unique selling points: open-source, free, and powerful. It handles diverse data, builds predictive models, and deploys insights
• This project was carried out using KNIME
Project’s Primary Goals
To analyse past sales data and generate insights into which mobile phone features drive sales
To use these insights to plan inventory efficiently for the next 6 months
Dataset
Data Description
• The dataset consists of sales and product-related features
• The dataset describes the top 5 most popular mobile brands
• The dataset consists of 418 rows and 16 columns
Data Dictionary
• A sample data dictionary* is given below:
* More details are found in the project report, which is not released at the request of the Social Enterprise
Strategies
for Modelling
• Check for missing values in the dataset and treat them with suitable methods
• Look for outliers and take suitable steps to treat them
• Check for multicollinearity amongst variables and take suitable steps to treat highly correlated variables
• Build a Linear Regression Model to predict the sales of mobile phones
• Report on the metrics of the model
• Identify the significant variables, then rebuild and report on the model using only these variables
• Based on the final model outcomes, determine the features driving mobile phone sales
• List the recommendations to help with inventory planning for the next 6 months
Check for Missing Values in Dataset By:
• Creating a KNIME workflow
• Dragging and dropping the ‘CSV Reader’ and ‘Data Explorer’ nodes onto the KNIME platform to ingest and explore the variables and data in the dataset
• Using the ‘Interactive Viewer’ in the ‘Data Explorer’ node, which showed 16 numeric variables
• Expanding the properties of the variables to explore their missing values; the only missing values found were in Rows 7, 18 & 397 of the ‘display size’ variable
Treat Missing Values in Dataset
• Since ‘display size’ is treated as a categorical variable, its mode was used to replace its missing values
• To do this, the data in the ‘display size’ column was converted from numbers to strings using the ‘Number-To-String’ node
• The ‘Missing Value’ node was used to replace the three missing values with the column’s ‘Most Frequent Value’, that is, its mode, of 6.5
• Finally, the ‘String-To-Number’ node was used to return this column to its original data format for modelling purposes
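For readers who prefer code, the mode replacement described above can be sketched in a few lines of Python. This is only an approximation of what the ‘Missing Value’ node does; the function name impute_with_mode and the sample values are hypothetical, chosen so that the mode is 6.5 as in the project.

```python
from statistics import mode


def impute_with_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    most_frequent = mode(observed)
    return [most_frequent if v is None else v for v in values]


# Hypothetical display sizes in inches; None marks a missing value
display_size = [6.5, 6.1, None, 6.5, 6.7, None, 6.5, 6.1]
filled = impute_with_mode(display_size)
```

Because the mode is taken directly over the numeric values here, the number-to-string round trip used in the KNIME workflow is not needed in this sketch.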
Observe and Treat Outliers
A histogram of ‘ratings’ was constructed to study its distribution:
The histogram of ‘ratings’ shows a slight left skew. The median of ‘ratings’, the middle value when the ratings are ordered from smallest to largest, is 4.3, while the mean, the average of all ratings, is 4.339. The difference between the mean and the median is only 0.039, so the two sit close together. When the median is close to the mean, the bulk of the data is roughly symmetric. About 50% of the ‘ratings’ fall in the Interquartile Range, between 4.3 and 4.4, about 25% are above Quartile 3, between 4.4 and 4.5, and about 25% are below Quartile 1, between 4.2 and 4.3
A box plot of ‘ratings’ was constructed to study its outliers:
The box plot shows a total of 6 outliers. One value (in Row 49 of the dataset) is above the upper whisker and five values (in Rows 158, 259, 286, 320 and 408 of the dataset) are below the lower whisker. Relating these six outliers to real-life circumstances, the decision was not to treat them, since ratings of 4.6 (Row 49) and 3.0 (Row 320) are realistic on a 5-point customer rating scale. These rows were therefore kept in the analysis
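The whisker check can be approximated in code. The sketch below assumes the common convention that whiskers extend 1.5 times the Interquartile Range beyond Quartile 1 and Quartile 3; the slides do not state which rule the box plot used, so treat the multiplier as an assumption. The function name iqr_outliers and the ratings are made up for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical ratings on a 5-point scale, not the project's data
ratings = [3.0, 4.2, 4.3, 4.3, 4.3, 4.3, 4.4, 4.4, 4.4, 4.5, 4.6]
lower, upper, outliers = iqr_outliers(ratings)
```

Flagging a value is a separate step from treating it: as in the project, flagged ratings can be kept when they are plausible in real life.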
Check for
Multicollinearity Amongst Variables
The ‘Linear Correlation’ node was used to observe the correlation coefficients between all the numerical variables. After sorting the ‘Correlation Value’ in descending order in the ‘View’ function of the ‘Linear Correlation’ node, the correlation value between the variables ‘num_of_ratings’ and ‘sales’ was 0.9418, which suggests that these two variables are highly correlated.
Multicollinearity reduces the precision of the estimated coefficients, since they can shift wildly with slight changes in other independent variables. In such a situation, the p-values cannot reliably identify the independent variables that are statistically significant. To strengthen the statistical power of the regression model, the multicollinearity of these variables needs to be removed. Typically, variables whose correlation values are >0.70 are deemed highly correlated and need to be treated
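A rough equivalent of sorting the correlation values and applying the 0.70 cut-off is sketched below in plain Python. It computes the Pearson correlation coefficient for every pair of columns and ranks the pairs by absolute value, so that strong negative correlations are flagged too (that choice is an assumption of this sketch). The column values are hypothetical and will not reproduce the 0.9418 reported above.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranked_correlations(columns):
    """List (|r|, name_a, name_b) for every pair of columns, strongest first."""
    pairs = [
        (abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(columns, 2)
    ]
    return sorted(pairs, reverse=True)


# Hypothetical toy columns, not the project's data
columns = {
    "num_of_ratings": [10, 20, 30, 40, 50],
    "sales": [12, 19, 33, 41, 48],
    "battery_capacity": [5000, 4000, 5000, 4500, 4000],
}
ranked = ranked_correlations(columns)
high = [(a, b) for r, a, b in ranked if r > 0.7]  # pairs above the cut-off
```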
Treating for
Multicollinearity Amongst Variables
The following steps were taken to achieve this outcome:
• Observe the correlation values and identify the highly correlated quantitative (numerical) variables, that is, those with a correlation value >0.7
• Shift the highly correlated variable to the ‘Exclude’ box of the ‘Configure’ function of the ‘Linear Correlation’ node
• Using the remaining variables, re-execute the ‘Linear Correlation’ node
• Observe the correlation values of the remaining variables after re-executing the node
• Identify the next highly correlated variables
• Repeat this process until all the variables have correlation values of <0.7
• This process was not repeated, as no other highly correlated quantitative (numerical) variables were found after treating the multicollinearity of ‘num_of_ratings’ and ‘sales’
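The exclude-and-re-execute loop can also be written as a short routine. This is a sketch under stated assumptions, not the KNIME procedure itself: it works on independent variables only, and it always excludes the second variable of the most correlated pair, which is an arbitrary rule chosen for this example (the slides do not say how the variable to exclude was picked). The names drop_correlated, ram_gb and storage_gb and all values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.7):
    """Exclude one variable at a time until no pair exceeds the threshold."""
    kept = dict(columns)
    dropped = []
    while len(kept) > 1:
        # Recompute all pairwise correlations on the remaining variables
        strongest, _, second = max(
            (abs(pearson(kept[a], kept[b])), a, b)
            for a, b in combinations(kept, 2)
        )
        if strongest <= threshold:
            break
        # Assumption: exclude the second variable of the strongest pair
        del kept[second]
        dropped.append(second)
    return list(kept), dropped


# Hypothetical independent variables, not the project's data
columns = {
    "ram_gb": [2, 4, 6, 8, 12],
    "storage_gb": [32, 64, 64, 128, 256],
    "battery_capacity": [5000, 4000, 5000, 4500, 4000],
}
kept, dropped = drop_correlated(columns)
```

In practice the choice of which variable to exclude is better guided by domain knowledge than by position in the pair.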
Build the Linear Regression Model By:
1. Configuring the ‘Partitioning’ node to split the dataset into training and testing sets in the ratio of 7:3
2. Creating the ‘Linear Regression Learner’ node, configured with ‘sales’ as the ‘Target’
3. Creating two sets of ‘Regression Predictor’ and ‘Numeric Scorer’ nodes, one to score the training dataset and the other to score the testing dataset
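The 7:3 split in step 1 can be sketched as a seeded random partition. The function name partition, the seed and the rounding rule are assumptions of this example; the ‘Partitioning’ node may sample and round differently, so the exact row counts are not guaranteed to match the project.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(418))  # stand-in row indices for a 418-row dataset
train, test = partition(rows)
```

Keeping the testing rows out of model fitting is what allows the testing metrics to indicate how the model behaves on unseen data.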
Evaluate the Linear Regression Model
After the training and testing datasets from the ‘Partitioning’ node were fed into the learner and predictors, the numeric scorers produced the following metrics:
[Figures: Training Dataset Numeric Scorer; Testing Dataset Numeric Scorer]
The model performed well on both the training and testing datasets. The R-squared is around 0.882 on the training dataset and 0.928 on the testing dataset. Both values are high; the higher the R-squared, the better the model fits the data and the more closely the predictions approximate the real data points. This indicates that the model explains about 88% of the variance in mobile phone sales on the training dataset and about 93% on the testing dataset. The Mean Absolute Error indicates that, on average, the model’s predictions of mobile phone sales on the testing dataset are off by about 9.4 units of SGD
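To make the two metrics concrete, here is a minimal sketch of how R-squared and Mean Absolute Error are computed from actual and predicted values. The numbers are invented for illustration and are unrelated to the scorer outputs reported above.

```python
def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the units of the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical sales figures and model predictions
actual = [100, 120, 140, 160, 180]
predicted = [110, 115, 135, 165, 170]

r2 = r_squared(actual, predicted)
mae = mean_absolute_error(actual, predicted)
```

R-squared is unitless, while Mean Absolute Error is expressed in the same units as ‘sales’, which is why the two are usually read together.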
Identify Significant Variables
The p-value indicates whether a variable’s estimated coefficient is statistically significant. There are 11 variables whose p-values are more than 0.05, starting with ‘battery_capacity’ at 0.799. Typically, a p-value that is less than or equal to 0.05 is considered statistically significant, which suggests that the observed relationship is unlikely to be the result of chance
Rebuild the Model with Significant Variables Only By:
• Shifting the variable with the highest p-value that is >0.05 to the ‘Exclude’ box of the ‘Configure’ function of the ‘Linear Regression Learner’
• Re-executing the node using the remaining variables
• Observing the changes in the p-values through the ‘Coefficients and Statistics’ function of the node
• Identifying the next variable with the highest p-value
• Continuing to iterate the process until all p-values of the remaining variables are ≤ 0.05
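The iteration above is a form of backward elimination, and its control flow can be sketched as follows. The refit itself is left to a caller-supplied function, here called fit_p_values, which is a hypothetical stand-in for re-executing the learner. The stub used in the example returns fixed p-values: only the 0.799 for ‘battery_capacity’ comes from the slides, the other values and the name ‘ram’ are made up, and a real refit would return different p-values after each exclusion.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05):
    """Drop the least significant variable until all p-values are <= alpha.

    fit_p_values refits the model on the given variable names and returns
    a dict mapping each name to its p-value.
    """
    remaining = list(variables)
    while remaining:
        p_values = fit_p_values(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # every remaining variable is significant
        remaining.remove(worst)
    return remaining


# Stand-in for a real refit: fixed p-values for illustration only
ILLUSTRATIVE_P_VALUES = {
    "battery_capacity": 0.799,
    "discount_percent": 0.001,
    "display_size": 0.03,
    "ram": 0.40,
}


def stub_fit(names):
    return {name: ILLUSTRATIVE_P_VALUES[name] for name in names}


selected = backward_eliminate(list(ILLUSTRATIVE_P_VALUES), stub_fit)
```

Removing one variable per pass, as the project does, matters because the p-values of the remaining variables can change once a variable is excluded.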
These are the six variables with p-value ≤ 0.05 that are retained to rebuild the model since they are statistically significant:
Evaluate the Rebuilt Linear Regression Model
After the model was rebuilt, the scorers for the training and testing datasets showed the following information:
[Figures: Training Dataset Numeric Scorer; Testing Dataset Numeric Scorer]
The rebuilt model continues to perform well on both the training and testing datasets. The R-squared is around 0.875 on the training dataset and 0.924 on the testing dataset, which are 0.007 and 0.004 lower than those of the original model. Nevertheless, both values remain high; the higher the R-squared, the better the model fits the data and the more closely the predictions approximate the real data points. This indicates that the rebuilt model explains about 88% of the variance in mobile phone sales on the training dataset and about 92% on the testing dataset. The Mean Absolute Error indicates that, on average, the model’s predictions on the testing dataset are off by about 9 units of SGD
Findings &
Conclusions*
Key Features Driving Mobile Phone Sales
• ‘discount_percent’ appears to be the only variable with a comparatively high positive coefficient, and so the only one with a notable positive impact on ‘sales’. An increase of one unit in ‘discount_percent’ increases ‘sales’ by 0.46 unit of SGD
• ‘display_size’ has the most negative impact on ‘sales’. An increase of one unit in ‘display_size’ decreases ‘sales’ by around 1 unit of SGD
• In ranking order, ‘num_of_ratings’, ‘model’, ‘processor’ and ‘num_rear_camera’ have similar negative effects on ‘sales’. A unit increase in any of these reduces ‘sales’ by about 0.38 unit of SGD
* More details are found in the project report, which is not released at the request of the Social Enterprise
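The coefficients in the findings can be read as "change in ‘sales’ per one-unit change in a feature, with the other features held constant". The sketch below applies that reading to the reported values. The function name sales_change is hypothetical, the value for ‘display_size’ is taken as exactly -1.0 although the slides say "around 1", and the intercept is omitted because it cancels out when only changes are compared.

```python
# Coefficients as reported in the findings (units of SGD per one-unit increase)
COEFFICIENTS = {
    "discount_percent": 0.46,
    "display_size": -1.0,  # reported as "around 1"; exact value assumed here
    "num_of_ratings": -0.38,
    "model": -0.38,
    "processor": -0.38,
    "num_rear_camera": -0.38,
}


def sales_change(deltas):
    """Predicted change in sales for the given change in each feature."""
    return sum(COEFFICIENTS[name] * delta for name, delta in deltas.items())


# Raise the discount by 5 units and add one rear camera
change = sales_change({"discount_percent": 5, "num_rear_camera": 1})
```

In this example the gain from the higher discount (5 × 0.46) outweighs the loss from the extra rear camera (0.38), for a net predicted change of about 1.92 units of SGD.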
Recommendations*
Recommendations On Inventory Planning
1. Look at including higher discounts
2. Stock mobile phones with smaller display sizes and fewer rear cameras
3. Narrow the range of models to stock
4. Keep phones whose processors are encoded at a lower value
* More details are found in the project report, which is not released at the request of the Social Enterprise
Thank you
Author: Anthony Mok
Date: 16 Nov 2023
Email: xxiaohao@yahoo.com

High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 

Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6 Months

  • 1. Using Insight-informed Data to Plan Inventory in Next 6 Months: An Application of Linear Regression Modelling. Author: Anthony Mok. Date: 16 Nov 2023. Email: xxiaohao@yahoo.com
  • 2. Agenda: 1. Modern-day Data Analytics 2. Linear Regression Analysis 3. Code-less, Less-code Analytical Tools 4. Project’s Primary Goals 5. Description of Dataset 6. Modelling Strategies 7. Findings, Conclusions & Recommendations
  • 3. Modern-day Data Analytics: Both traditional and modern-day data analytics extract insights from information, but they differ significantly in their methods and capabilities. The significant differences, by feature (Traditional Data Analytics vs Modern-Day Data Analytics): • Data type: mostly structured vs diverse (structured, semi-structured, unstructured) • Technology: on-premises vs cloud-based • Processing: batch-oriented vs real-time or near-real-time • Analysis: descriptive, diagnostic vs predictive, prescriptive • Accessibility: limited to data analysts vs aims for data democratisation
  • 4. Linear Regression Analysis: Linear regression is a statistical method that models the relationship between a dependent variable and one or more independent variables using a straight line. It is used to understand trends, make predictions, and test hypotheses. The analysis is suitable when the data exhibits a linear relationship and assumptions such as normality and constant variance of the residuals hold
  • 5. Code-less & Less-code Data Analytics: Code-less Applications: offer drag-and-drop interfaces, pre-built connectors, and automated workflows, making data analysis accessible to everyone, even without technical expertise. Less-code Applications: requiring some coding knowledge, less-code platforms provide pre-written code snippets, wizards, and visual tools to streamline complex tasks
  • 6. KNIME - Less-code Data Analytics • KNIME is a less-code data analytics platform • Build visual workflows with pre-built nodes for data preparation, analysis, and visualisation • No coding required, but Python integration empowers customisation • Unique Selling Points: open-source, free, and powerful; handles diverse data, builds predictive models, and deploys insights • This project was carried out using KNIME
  • 7. Project’s Primary Goals • To analyse past sales data and generate insights into which features of mobile phones drive sales • To use these insights to plan the inventory efficiently for the next 6 months
  • 8. Dataset: Data Description • Dataset consists of sales and product-related features • Dataset contains descriptions of the top 5 most popular mobile brands • Dataset consists of 418 rows and 16 columns. Data Dictionary • A sample data dictionary* is given below: * More details are found in the project report, which is not released at the request of the Social Enterprise
  • 9. Strategies for Modelling • Check for missing values in the dataset and treat them with suitable methods • Look for outliers and take suitable steps to treat them • Check for multicollinearity amongst variables and take suitable steps to treat highly correlated variables • Build a Linear Regression Model to predict the sales of mobile phones • Report on the metrics of the model • Identify the significant variables, then rebuild and report on the model using these variables only • Based on the final model outcomes, determine the features driving mobile phone sales • List the recommendations to help in inventory planning for the next 6 months
  • 10. Check for Missing Values in Dataset By: • A KNIME workflow was created • The ‘CSV Reader’ and ‘Data Explorer’ nodes were dragged and dropped onto the KNIME platform to ingest and explore the variables and data in the dataset • Using the ‘Interactive Viewer’ in the ‘Data Explorer’ node, 16 numeric variables were discovered • The properties of the variables were expanded to explore their missing values; the only missing values found were in Rows 7, 18 & 397 of the ‘display size’ variable
  • 11. Treat Missing Values in Dataset • Since ‘display size’ is treated as a categorical variable, its mode was used to replace its missing values • To do this, the data in the ‘display size’ column was converted from numbers to strings using the ‘Number-To-String’ node • The ‘Missing Value’ node was then used to replace the three missing values with the column’s ‘Most Frequent Value’ (its mode), which is 6.5 • Finally, the ‘String-To-Number’ node was deployed to return this column to its original data format for modelling purposes
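For readers working outside KNIME, the mode-based replacement described on this slide can be sketched in plain Python. This is only an illustration under assumptions: the `impute_with_mode` helper and the sample `display_size` values are hypothetical and are not taken from the project dataset.

```python
from collections import Counter


def impute_with_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    observed = [v for v in values if v is not None]
    # most_common(1) returns [(value, count)] for the most frequent value.
    mode_value, _count = Counter(observed).most_common(1)[0]
    filled = [mode_value if v is None else v for v in values]
    return filled, mode_value


# Hypothetical display sizes; None marks a missing cell.
display_size = [6.5, 6.1, None, 6.5, 6.7, None, 6.5]
filled, mode_value = impute_with_mode(display_size)
print(mode_value, filled)
```

Counting values directly plays the same role as the number-to-string round trip in the workflow: each distinct size is handled as a category rather than as a quantity.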
  • 12. Observe and Treat Outliers: The histogram for ‘ratings’ was constructed to study its distribution. The histogram shows a slight left skew. The median for ‘ratings’ (the middlemost value when the ratings are ordered from smallest to largest) is 4.3, while the mean (the average of all ratings) is 4.339. The difference of 0.039 between the mean and the median places them tightly together, and when the middle value resembles the average, the distribution is close to symmetric. About 50% of the ‘ratings’ lie in the Interquartile Range, between 4.3 and 4.4. About 25% of the ‘ratings’ are higher than Quartile 3, mostly between 4.4 and 4.5, and about 25% are lower than Quartile 1, mostly between 4.2 and 4.3
  • 13. Observe and Treat Outliers: The Box Plot for ‘ratings’ was constructed to study its outliers. Through the Box Plot, a total of 6 outliers were found. One value (in Row 49 of the dataset) is above the upper whisker boundary and five values (in Rows 158, 259, 286, 320 and 408 of the dataset) are below the lower whisker boundary. Relating these six outliers to real-life circumstances, the decision was not to treat them, since it is realistic to observe ratings of 4.6 (Row 49) and 3.0 (Row 320) on a 5-point customer rating scale. These rows were therefore kept for the analysis
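The whisker check behind a box plot can be approximated in a few lines of Python. The sketch below assumes the common convention of whiskers at 1.5 times the interquartile range beyond the quartiles; the slides do not state which rule the plot used, and the `ratings` list is hypothetical sample data chosen so that Quartile 1 is 4.3 and Quartile 3 is 4.4, as on the histogram slide.

```python
import statistics


def whisker_bounds(values, k=1.5):
    """Return (lower, upper) whisker limits using the k * IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return (index, value) pairs that fall outside the whiskers."""
    low, high = whisker_bounds(values, k)
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical ratings on a 5-point scale.
ratings = [4.3, 4.4, 4.6, 4.2, 4.3, 4.4, 3.0, 4.5, 4.3, 4.4, 4.3, 4.4]
outliers = find_outliers(ratings)
print(outliers)
```

Flagging a value is only the first step; as on the slide, whether to treat it remains a judgement about how realistic the value is.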
  • 14. Check for Multicollinearity Amongst Variables: The ‘Linear Correlation’ node was used to observe the correlation coefficients between all the numerical variables. After sorting the ‘Correlation Value’ in descending order in the ‘View’ function of the ‘Linear Correlation’ node, the correlation value between the variables ‘num_of_ratings’ and ‘sales’ was found to be 0.9418, that is, 94.18%. This suggests that these two variables are highly correlated. Multicollinearity reduces the precision of the estimated coefficients, since they shift wildly with slight changes in other independent variables. In such a situation, the p-values cannot reliably identify the independent variables that are statistically significant. To strengthen the statistical power of the regression model, the multicollinearity of these variables needs to be removed. Typically, variables whose correlation values are >0.70 are deemed highly correlated and need to be treated
  • 15. Treating for Multicollinearity Amongst Variables: The following steps were taken to achieve this outcome: • Observe the correlation values and identify the highly correlated quantitative (numerical) variables, that is, those with a correlation value >0.7 • Shift this variable to the ‘Exclude’ box of the ‘Configure’ function of the ‘Linear Correlation’ node • Using the remaining variables, re-execute the ‘Linear Correlation’ node • Observe the correlation values of the remaining variables after re-executing the node • Identify the next highly correlated variables • Repeat this process until all the variables have a correlation value of <0.7 • This process was not repeated, as no other highly correlated quantitative (numerical) variables were found after treating the multicollinearity of ‘num_of_ratings’ and ‘sales’
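The exclude-and-re-run loop listed on this slide can be sketched as follows. This is a minimal illustration, not the KNIME node's implementation: the column names and values are hypothetical, and dropping the second variable of the most correlated pair is an assumed tie-break, since in practice that choice is a modelling decision.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def most_correlated_pair(columns, threshold=0.7):
    """Return (name_a, name_b, r) for the strongest pair above threshold, or None."""
    best = None
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold and (best is None or abs(r) > abs(best[2])):
            best = (a, b, r)
    return best


def drop_until_uncorrelated(columns, threshold=0.7):
    """Exclude one variable at a time until no pair exceeds the threshold."""
    remaining = dict(columns)
    excluded = []
    while True:
        pair = most_correlated_pair(remaining, threshold)
        if pair is None:
            return remaining, excluded
        _first, second, _r = pair
        excluded.append(second)  # assumption: drop the second variable of the pair
        del remaining[second]


# Hypothetical numeric columns.
columns = {
    "ram": [4, 6, 8, 12, 16],
    "storage": [64, 128, 128, 256, 512],
    "rating": [4.3, 4.1, 4.4, 4.2, 4.3],
}
remaining, excluded = drop_until_uncorrelated(columns)
print(excluded, list(remaining))
```

Correlations are recomputed after every exclusion, mirroring the re-execution of the node between steps.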
  • 16. Build the Linear Regression Model By: 1. The ‘Partitioning’ node was configured to split the dataset into training and testing sets in the ratio of 7:3 2. The ‘Linear Regression Learner’ was created and configured with ‘sales’ as the ‘Target’ 3. Two sets of ‘Regression Predictor’ and ‘Numeric Scorer’ nodes were created: one to ingest the training dataset and the other to process the data from the testing dataset
  • 17. Evaluate the Linear Regression Model: After feeding the training and testing datasets from the ‘Partitioning’ node into the learner and predictors, the numeric scorers produced the following metrics (Training Dataset Numeric Scorer; Testing Dataset Numeric Scorer). The model performed well on both the training and testing datasets. The R-squared is around 0.882 on the training dataset and 0.928 on the testing dataset. These are high R-squared values; the higher these values are, the better the model fits the data and the more closely the predictions approximate the real data points. This indicates a good model that explains about 88% of the variance in the sales of mobile phones on the training dataset and about 93% on the testing dataset. The Mean Absolute Error indicates that the model predicts sales of mobile phones with a mean error of 9.4 units of SGD on the testing dataset
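The two metrics reported by the scorers can be reproduced by hand. The sketch below is a simplification under stated assumptions: it fits a single-predictor least-squares line rather than the multi-variable model built in KNIME, it uses synthetic data, and the helper names (`split_70_30`, `fit_line`, `r_squared`, `mean_absolute_error`) are hypothetical.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle the rows and split them into 70% training and 30% testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot: the share of variance explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic (x, sales) pairs: a linear trend plus a small alternating offset.
rows = [(x, 2.0 * x + 5.0 + (1 if x % 2 else -1)) for x in range(1, 21)]
train, test = split_70_30(rows)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

for name, part in (("train", train), ("test", test)):
    actual = [y for _, y in part]
    predicted = [slope * x + intercept for x, _ in part]
    print(name, round(r_squared(actual, predicted), 3),
          round(mean_absolute_error(actual, predicted), 3))
```

Scoring the training and testing partitions separately, as the two scorer nodes do, is what allows a check for overfitting: a testing R-squared far below the training one would be a warning sign.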
  • 18. Identify Significant Variables: The p-value indicates how likely it is that a relationship at least as strong as the one observed would arise by chance if there were no true relationship. There are 11 variables whose p-values are more than 0.05, starting with ‘battery_capacity’ at 0.799. Typically, a p-value that is less than or equal to 0.05 is considered statistically significant, which suggests that the observed relationship is unlikely to be a result of chance
  • 19. Rebuild Model with Significant Variables Only By: • Shifting the variable with the highest p-value that is >0.05 to the ‘Exclude’ box of the ‘Configure’ function of the ‘Linear Regression Learner’ • Using the remaining variables, re-executing the node • Observing the changes in the p-values through the ‘Coefficients and Statistics’ function of the node • Identifying the next variable with the highest p-value • Continuing to iterate the process until the p-values of all remaining variables are ≤ 0.05. These are the six variables with p-value ≤ 0.05 that are retained to rebuild the model, since they are statistically significant:
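The iteration on this slide is a form of backward elimination, and its control flow can be sketched independently of any modelling library. In the sketch below, `fit_pvalues` is a hypothetical callback standing in for "re-fit the model on these variables and return each variable's p-value". The stub uses a fixed table in which only the 0.799 for ‘battery_capacity’ comes from the slides; every other number, and the ‘ram’ variable, is made up. A real re-fit would change the p-values at every step.

```python
def backward_eliminate(variables, fit_pvalues, alpha=0.05):
    """Drop the variable with the highest p-value until all are <= alpha."""
    kept = list(variables)
    removed = []
    while kept:
        pvalues = fit_pvalues(kept)  # re-fit on the remaining variables
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break  # every remaining variable is significant
        kept.remove(worst)
        removed.append(worst)
    return kept, removed


# Stub "model": a fixed, largely hypothetical p-value table.
P_TABLE = {
    "battery_capacity": 0.799,
    "ram": 0.30,
    "discount_percent": 0.001,
    "display_size": 0.02,
}


def stub_fit_pvalues(names):
    """Pretend to re-fit and return p-values for the given variables."""
    return {name: P_TABLE[name] for name in names}


kept, removed = backward_eliminate(list(P_TABLE), stub_fit_pvalues)
print(kept, removed)
```

Removing one variable per iteration matters because the p-values of the remaining variables can shift once a correlated neighbour is dropped.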
  • 20. Evaluate the Rebuilt Linear Regression Model: After the model was rebuilt, the scorers for the training and testing datasets show the following information (Training Dataset Numeric Scorer; Testing Dataset Numeric Scorer). This model continues to perform well on both the training and testing datasets. The R-squared is around 0.875 on the training dataset and 0.924 on the testing dataset. These are 0.007 and 0.004 lower than those of the original model. Nevertheless, they are high R-squared values, and the higher these values are, the better the model fits the data and the more closely the predictions approximate the real data points. This indicates a good rebuilt model that explains about 88% of the variance in the sales of mobile phones on the training dataset and about 92% on the testing dataset. The Mean Absolute Error indicates that the model predicts sales of mobile phones with a mean error of 9 units of SGD on the testing dataset
  • 21. Findings & Conclusions*: Key Features Driving Mobile Phone Sales • ‘discount_percent’ appears to be the only variable with a comparatively large coefficient and a positive impact on ‘sales’. An increase of one unit in ‘discount_percent’ increases ‘sales’ by 0.46 unit of SGD • Conversely, ‘display_size’ has the most negative impact on ‘sales’. An increase of one unit in ‘display_size’ decreases ‘sales’ by around 1 unit of SGD • In ranking order, ‘num_of_ratings’, ‘model’, ‘processor’ and ‘num_rear_camera’ have similar negative effects on ‘sales’. A unit increase in any of these reduces ‘sales’ by 0.38 unit of SGD * More details are found in the project report, which is not released at the request of the Social Enterprise
  • 22. Recommendations*: Recommendations On Inventory Planning 1. Look at including higher discounts 2. Stock mobile phones with smaller display sizes and fewer rear cameras 3. Narrow the range of models to stock 4. Keep phones whose processors are encoded at a lower value * More details are found in the project report, which is not released at the request of the Social Enterprise
  • 23. Thank you Author: Anthony Mok Date: 16 Nov 2023 Email: xxiaohao@yahoo.com