- 1. Multiple Linear Regression Model
- 2. INTRODUCTION In this project, we discuss the results of our customer satisfaction analysis, conducted in RStudio. Our team analyzed a number of factors that could affect customer satisfaction with the company: Complaint Resolution, Delivery Speed, Order Billing, Warranty Claims, Technical Support, E-commerce, Product Quality, Sales Force Image, Advertising, Price, and Product Line. We studied a random sample of 70 data points, drawn using a pre-chosen seed value taken from the largest student number in our group. To handle missing values, we replaced each one with the mean of its column. This presentation covers our approach, major discoveries, and key findings.
- 3. Methodology
- 4. Methodology • Import the dataset into R • Print the dataset • View the full dataset • Remove duplicates
- 5. We then set the seed value to 232 and generate random numbers to draw the sample.
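The seeded sampling step can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the project; the population size `N_ROWS` is a hypothetical placeholder, and Python's generator will not reproduce the sample that R draws from the same seed.

```python
import random

SEED = 232          # seed value stated in the slides
SAMPLE_SIZE = 70    # sample size stated in the slides
N_ROWS = 100        # ASSUMPTION: hypothetical number of rows in the full dataset


def draw_sample_indices(n_rows, sample_size, seed):
    """Return sorted row indices of a simple random sample without replacement."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    return sorted(rng.sample(range(n_rows), sample_size))


indices = draw_sample_indices(N_ROWS, SAMPLE_SIZE, SEED)
```

Fixing the seed means that rerunning the script selects the same 70 rows, which keeps every later step reproducible.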
- 6. Preliminary analysis and descriptive study Some observations contain missing values. Our sample dataset has 8 missing values, so we compute the mean of each column and replace every missing value with the mean of its column.
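A minimal sketch of column-mean imputation, assuming the data is held as a list of row dictionaries with `None` marking a missing value. The toy rows are invented for illustration and are not taken from the project dataset.

```python
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    columns = list(rows[0].keys())
    # Mean of each column, computed from the non-missing entries only.
    col_means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [
        {c: (col_means[c] if r[c] is None else r[c]) for c in columns}
        for r in rows
    ]


# Hypothetical toy data (not from the project dataset).
toy = [
    {"ProdQual": 8.0, "CompRes": 5.0},
    {"ProdQual": None, "CompRes": 6.0},
    {"ProdQual": 6.0, "CompRes": None},
]
filled = impute_column_means(toy)
```

Here the missing `ProdQual` value becomes 7.0 and the missing `CompRes` value becomes 5.5. A column with no observed values at all would raise an error and would need separate handling.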
- 7. Preliminary analysis and descriptive study • Check for outliers in the sample of 70 using boxplots. • Then remove the outliers from the dataset.
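The boxplot rule for flagging outliers can be sketched as below, assuming the usual 1.5 × IQR whiskers. The function name and toy values are hypothetical, and R's `boxplot` computes its box edges from hinges, so its results may differ slightly from this quartile-based version.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical scores (not from the project dataset).
toy_scores = [5.0, 5.5, 6.0, 6.2, 6.5, 7.0, 7.1, 7.4, 12.0]
flagged = boxplot_outliers(toy_scores)
```

In this toy example only the value 12.0 falls outside the whiskers and would be removed.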
- 8. Model building Our purpose is to select a random sample of 70 data points from the dataset mentioned above, and then analyze this sample to build a model that estimates the best linear relationship between customer satisfaction and the different areas. We fit a multiple linear regression model. • Dependent variable: Customer satisfaction. Our predictor variables are • Complaint Resolution (CompRes), • Delivery Speed (DelSpeed), • Order Billing (OrdBilling), • Warranty Claims (WartyClaim), • Technical Support (TechSupport), • E-commerce (Ecom), • Product Quality (ProdQual), • Sales Force Image (SalesFImage), • Advertising (Advertising), • Price (ComPricing), • Product Line (ProdLine)
- 9. We now fit the model to the above dataset and compute the summary of the linear regression model.
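For reference, the full model being fitted can be written out as below. The symbols b_0, ..., b_11, the error term e_i, and the matrix notation are notational assumptions, not taken from the slides. The second line is the standard ordinary least squares estimator.

```latex
\[
\text{satisfaction}_i = b_0 + b_1\,\text{CompRes}_i + b_2\,\text{DelSpeed}_i + \dots + b_{11}\,\text{ProdLine}_i + e_i
\]
\[
\hat{\mathbf{b}} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}
\]
```

Here X is the design matrix with a leading column of ones and one column per predictor, and y is the vector of satisfaction scores.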
- 10. Model building In complex regression situations with a large number of predictor variables, some of which may not be relevant for predicting the response variable, it is useful to reduce the model so that it contains only the variables that provide important information about the response. Several methods can be used to select the independent variables that best explain the dependent variable. • Forward selection • Backward selection • Best subset selection • Stepwise selection
- 11. We use the backward method to find the best model. • Backward selection We use the backward elimination procedure to identify the best predictors. At each step we remove the variable with the highest p-value (equivalently, the smallest partial F statistic or the smallest absolute t statistic) in the test of significance of that variable, provided that its p-value is larger than some predefined level. The largest p-value in the above subset belongs to the Technical Support variable, so we remove it and compute a new regression model without Technical Support.
- 12. Likewise, we check all the independent variables in descending order of p-value. We then find that the 10th-greatest p-value in the above subset belongs to the Product Quality variable, and we compute a new regression model without the Product Quality variable.
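The elimination loop described above can be sketched as follows. This is a minimal illustration, not the project's R code: `fit_pvalues` is a hypothetical callback that refits the model on the given predictors and returns their p-values, and the toy p-values are invented and held fixed (in a real fit they change after every refit).

```python
def backward_eliminate(predictors, fit_pvalues, alpha=0.05):
    """Drop the predictor with the largest p-value until all are <= alpha."""
    current = list(predictors)
    while current:
        pvals = fit_pvalues(current)                  # refit on the current subset
        worst = max(current, key=lambda name: pvals[name])
        if pvals[worst] <= alpha:                     # every remaining predictor is significant
            break
        current.remove(worst)
    return current


# Hypothetical, fixed p-values for four made-up predictors.
TOY_PVALUES = {"x1": 0.60, "x2": 0.01, "x3": 0.20, "x4": 0.03}


def toy_fit(names):
    """Stand-in for a model refit: looks up the invented p-values."""
    return {name: TOY_PVALUES[name] for name in names}


kept = backward_eliminate(["x1", "x2", "x3", "x4"], toy_fit)
```

With these toy values the loop removes `x1`, then `x3`, and stops with `x2` and `x4`, both below the 0.05 threshold.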
- 13. In model 7, all independent variables are significant, so we take it as our best multiple regression model. According to this model: satisfaction = -1.90028 + (0.33350)ProdQual + (0.44008)CompRes + (0.20757)ProdLine + (0.53088)SalesFImage
- 14. Model interpretation and explanation The method above shows that this model is significant, so we take it as our best multiple regression model by backward selection. According to this model: satisfaction = -1.90028 + (0.33350)ProdQual + (0.44008)CompRes + (0.20757)ProdLine + (0.53088)SalesFImage
- 15. Interpreting coefficients • Intercept (-1.90028): The baseline level of satisfaction when all predictor variables (ProdQual, CompRes, ProdLine, SalesFImage) are zero. It represents the predicted satisfaction level in the absence of these factors. • ProdQual (0.33350): For each one-unit increase in Product Quality score, satisfaction is predicted to increase by 0.33350 units, holding other variables constant. • CompRes (0.44008): A one-unit increase in Complaint Resolution score is associated with a 0.44008 unit increase in satisfaction, keeping other variables constant. • ProdLine (0.20757): An increase in Product Line score by one unit corresponds to a predicted increase of 0.20757 units in satisfaction, assuming other variables remain constant. • SalesFImage (0.53088): A one-unit increase in Sales Force Image score is associated with a 0.53088 unit increase in satisfaction, keeping other variables constant.
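To make the interpretation concrete, the fitted equation can be evaluated for a given set of scores. The coefficients below are the ones reported in the slides. The function name and the input scores are hypothetical.

```python
# Coefficients of the final model as reported in the slides.
INTERCEPT = -1.90028
COEFFICIENTS = {
    "ProdQual": 0.33350,
    "CompRes": 0.44008,
    "ProdLine": 0.20757,
    "SalesFImage": 0.53088,
}


def predict_satisfaction(scores):
    """Evaluate the fitted linear equation for one customer's scores."""
    return INTERCEPT + sum(COEFFICIENTS[name] * scores[name] for name in COEFFICIENTS)


# Hypothetical customer scores, chosen only for illustration.
customer = {"ProdQual": 8.0, "CompRes": 6.0, "ProdLine": 6.0, "SalesFImage": 5.0}
predicted = predict_satisfaction(customer)  # approximately 7.308

# Raising one score by one unit changes the prediction by that coefficient.
bumped = predict_satisfaction({**customer, "CompRes": 7.0})
```

The difference `bumped - predicted` is approximately 0.44008, the CompRes coefficient, which is what "holding other variables constant" means in practice.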
- 16. Plotting models In the plot of standardized residuals vs. fitted values, the data are scattered around the mean residual line. The center panel of the above figure shows a dataset with two predictors. Most of the observations' predictor values fall within the green dashed ellipse, but observation 100 lies well outside this range. Yet neither of its values for the two variables is unusual on its own, so if we examine the variables one at a time we will fail to notice this high-leverage point.
- 17. Checking multicollinearity VIF < 5, so we can conclude that there is no multicollinearity.
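For reference, the variance inflation factor of predictor j is conventionally defined as below, where R_j^2 is the coefficient of determination from regressing predictor j on all the other predictors. The notation is an assumption, since the slides do not define it.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
\]
```

Under this definition, VIF < 5 is equivalent to R_j^2 < 0.8, meaning that less than 80% of the variation in that predictor is explained by the others.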
- 18. Hypothesis Test H0: b1 = b2 = b3 = b4 = 0 Ha: at least one coefficient is not equal to zero • The p-value of each variable is less than 0.05; therefore, the null hypothesis H0 is rejected. • Therefore, the estimated regression model is statistically significant at the 5% significance level.
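The joint hypothesis H0 is usually assessed with the overall F statistic, sketched below under the assumption of k = 4 predictors and n observations remaining after outlier removal. The slides do not report n or the sums of squares, so they are left symbolic.

```latex
\[
F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n - k - 1)} \sim F_{k,\; n-k-1} \quad \text{under } H_0,
\qquad k = 4
\]
```

Here SSR is the regression sum of squares and SSE is the residual sum of squares. H0 is rejected at the 5% level when the p-value of this statistic is below 0.05.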
- 19. Conclusion We took a sample from the given dataset and, using backward selection, estimated the following model as the best linear relationship between customer satisfaction and the different areas: satisfaction = -1.90028 + (0.33350)ProdQual + (0.44008)CompRes + (0.20757)ProdLine + (0.53088)SalesFImage
- 20. Key insights • Impact of Factors: The model suggests that Product Quality and Complaint Resolution have the most substantial impact on customer satisfaction among the studied variables. • Strategic Implications: Focusing on improving Product Quality and enhancing Complaint Resolution processes could potentially lead to notable increases in overall customer satisfaction, according to the model's coefficients.
- 21. MODEL LIMITATIONS • Scope: The model only considers the specified predictor variables and may not capture the entirety of factors influencing customer satisfaction. • Causation: While the model shows associations, causation cannot be directly inferred. Other unmeasured variables might also impact satisfaction.
- 22. • PS/2019/232 - A. H. M. P. S. Abesinghe (Group leader): Planning and designing the project work and distributing the work • PS/2019/075 - S. N. Y. A. Gunasekara: Importing the dataset and replacing missing values with the column means • PS/2019/192 - G. U. Nandadewa: Backward elimination and fitting the regression models • PS/2019/045 - E. D. D. Dilshan: Preparing the project report with all the details • PS/2019/135 - W. M. S. S. Wickramasinghe: Making the presentation slides • PS/2019/039 - H. R. I. Perera: Writing the conclusion and making interpretations • PS/2019/150 - W. M. D. Nuwan: Collecting data from other resources and coordinating the others with that knowledge
- 23. Thank You!