Project Title:
Data analysis on quality of Wine
Project Team:
1. Saurabh Choudhary: sxc143430
2. Vijay Ramanathan: vxr141530
3. Chaitanya Vejendla: cxv140530
4. Siri Venkat Vemuri: sxv141130

The Wine Dataset:
  
Two datasets (Red Wine and White Wine), each consisting of the following columns:
Input variables:
1. fixed acidity
2. volatile acidity
3. citric acid
4. residual sugar
5. chlorides
6. free sulfur dioxide
7. total sulfur dioxide
8. density
9. pH
10. sulphates
11. alcohol
Output variable:
12. quality (score between 0 and 10)
Regression Model:
We created the models with lm, regressing the response variable (quality) on different subsets of the predictor variables. Each fit produces a set of coefficients for the chosen predictors. Accuracy is calculated by rounding each predicted value to the nearest integer and comparing it with the observed quality score, as follows:
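White Wine:
# All predictors except density and total.sulfur.dioxide
lm.fit <- lm(quality ~ . - density - total.sulfur.dioxide, data = white)
# predict() without newdata returns fitted values, so this accuracy is in-sample
mean(round(predict(lm.fit)) == white$quality)
[1] 0.5165374

Red Wine:
# All predictors except fixed.acidity and citric.acid
lm.fit_red <- lm(quality ~ . - fixed.acidity - citric.acid, data = red)
mean(round(predict(lm.fit_red)) == red$quality)
[1] 0.5959975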
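The accuracy measure used here can be illustrated on a handful of values. The following is only a sketch: the helper name rounded_accuracy and the toy vectors are assumptions for illustration and are not taken from the wine data.

```r
# Treat a regression as a classifier: round each prediction to the nearest
# integer score and count exact matches with the observed score.
rounded_accuracy <- function(predicted, actual) {
  mean(round(predicted) == actual)
}

# Hypothetical predictions and observed scores (illustration only)
predicted <- c(5.2, 5.8, 6.4, 4.9, 7.3)
actual    <- c(5,   6,   7,   5,   7)

# Rounded predictions are 5, 6, 6, 5, 7, so four of five match
rounded_accuracy(predicted, actual)  # 0.8
```

A prediction that is off by less than half a point still counts as correct, while one that is off by slightly more counts as fully wrong, which is one reason this measure can look low for a regression model.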
Residual Plots:
Figure 1: White Wine
Figure 2: Red Wine
Conclusions:
- For white wine, the best accuracy of 51.65% is obtained when all predictors except density and total sulfur dioxide are considered.
- For red wine, the best accuracy of 59.60% is obtained when all predictors except fixed acidity and citric acid are considered.
- The physicochemical properties that should be considered when predicting quality differ between white and red wines.
 
Verifying Model with LDA:
1. White Wine LDA accuracy: 49.8%
2. Red Wine LDA accuracy: 56.1%
  
KNN Model:
Model: knn.pred = knn(train.x, test.x, train.y, k = 3)
The best accuracy was observed for k = 5 on the red wine data and for k = 1 on the white wine data.
  
K-Value   Red Accuracy (%)   White Accuracy (%)
1         49.5               58.96
3         47.9               49.18
5         52.16              48.15
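A comparison across k values like the one tabulated above could be produced with a short loop. This is a minimal sketch, not the team's code: it assumes the class package (which provides knn) is installed, and it uses the built-in iris data as a stand-in because the wine train/test split is not shown in the report. The names ks, acc and best_k are illustrative.

```r
library(class)  # provides knn()

set.seed(1)
# Stand-in data: built-in iris, split into 100 training and 50 test rows
idx     <- sample(nrow(iris), 100)
train.x <- iris[idx, 1:4]
test.x  <- iris[-idx, 1:4]
train.y <- iris$Species[idx]
test.y  <- iris$Species[-idx]

# Test accuracy (in %) for each candidate k
ks  <- c(1, 3, 5)
acc <- sapply(ks, function(k) {
  knn.pred <- knn(train.x, test.x, train.y, k = k)
  100 * mean(knn.pred == test.y)
})
names(acc) <- ks

# k with the highest test accuracy (first one in case of ties)
best_k <- ks[which.max(acc)]
```

Because knn works on distances, predictors on very different scales (for example total sulfur dioxide versus pH) would usually be standardized first; whether that was done here is not stated.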
Decision Tree:
Accuracies of 54.21% and 52.36% are obtained for the red and white data sets, respectively.
  
RANDOM FORESTS:
After getting poor accuracies, we divided the wines into 3 groups before running the random forest algorithm:
Good wine (quality score above 6)
Normal wine (quality score equal to 6)
Bad wine (quality score below 6)
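The three groups can be derived from the quality score in a few lines. This is a sketch under assumptions: the column name taste comes from the randomForest formula used in this report, but the toy quality vector is hypothetical and the exact code the team used is not shown.

```r
# Toy quality scores standing in for white$quality or red$quality
quality <- c(3, 5, 6, 6, 7, 8, 4, 9)

# Map each score to one of the three groups described above
taste <- ifelse(quality > 6, "good",
         ifelse(quality == 6, "normal", "bad"))

# randomForest needs a factor response to do classification
taste <- factor(taste, levels = c("bad", "good", "normal"))

table(taste)
```

With the real data the same expression would be applied to the quality column and stored as a new column, for example white$taste.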
  
White Wine:
randomForest(formula = taste ~ . - quality, data = white_train)
	
  
Predictor   bad   good   normal
bad         479     10      127
good         17    249       91
normal      171    152      664

Accuracy: 71.02%
  
Red Wine:
randomForest(formula = taste ~ . - quality, data = red_train)
  
	
  
	
Predictor   bad   good   normal
bad           0      0        0
good          5    274       56
normal       15     60      230

Accuracy: 78.75%
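Both reported accuracies can be re-derived from the confusion matrices as the share of wines on the diagonal. The sketch below copies the counts from the two tables; the names white_cm, red_cm and cm_accuracy are illustrative.

```r
# Counts copied from the two confusion matrices (rows and columns: bad, good, normal)
labels <- c("bad", "good", "normal")
white_cm <- matrix(c(479,  10, 127,
                      17, 249,  91,
                     171, 152, 664),
                   nrow = 3, byrow = TRUE, dimnames = list(labels, labels))
red_cm <- matrix(c( 0,   0,   0,
                    5, 274,  56,
                   15,  60, 230),
                 nrow = 3, byrow = TRUE, dimnames = list(labels, labels))

# Accuracy = wines on the diagonal (same label on both axes) / all wines
cm_accuracy <- function(cm) sum(diag(cm)) / sum(cm)

round(100 * cm_accuracy(white_cm), 2)  # 71.02 (1392 of 1960)
round(100 * cm_accuracy(red_cm), 2)    # 78.75 (504 of 640)
```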
  
	
  
Conclusion:
1. Grouping the dataset into quality segments gives better accuracy.
2. The low accuracies hint that external factors beyond the content of the wine, which are not included in the dataset, may affect its quality (e.g. age of the wine, manufacturing process).
