Capstone Project- Wine Quality Analysis
Overview
We consider a set of observations on a number of red wines, covering their chemical properties and their
ranking by tasters. The wine industry has grown recently as social drinking has been on the rise. The price of
wine depends in part on a rather abstract notion of wine appreciation by tasters, whose opinions may vary
widely, so pricing rests to some extent on a volatile factor. Another key factor in wine certification and
quality assessment is physicochemical testing, which is laboratory-based and takes into account acidity, pH
level, sugar content and other chemical properties. For the wine market, it would be of interest if human
tasting quality could be related to the chemical properties of wine, so that the certification, quality
assessment and assurance processes are more controlled.
Introduction
The Red Wine dataset contains 1599 wine samples, all produced in a particular area of Portugal. Data are
collected on 12 properties of the wines: one is Quality, based on sensory data, and the rest are chemical
properties, including density, acidity and alcohol content. All chemical properties are continuous
variables. Quality is an ordinal variable with possible scores from 0 (worst) to 10 (best). Each sample is
tasted by three independent tasters, and the final score assigned is the median of the tasters' scores.
Objectives of the Analysis
The objective is to predict the Quality ranking from the chemical properties of the wines. A predictive model
developed on this data is expected to give vineyards guidance on the quality and price they can expect
for their produce, without heavy reliance on the variability of wine tasters.
List of Attributes in Data
1. Fixed acidity: most acids involved with wine are fixed or non-volatile (they do not evaporate
readily)
2. Volatile acidity: the amount of acetic acid in wine, which at too high a level can lead to an
unpleasant, vinegar taste
3. Citric acid: found in small quantities, citric acid can add ‘freshness’ and flavour to wines
4. Residual sugar: the amount of sugar remaining after fermentation stops; it is rare to find wines with
less than 1 gram/litre, and wines with greater than 45 grams/litre are considered sweet
5. Chlorides: the amount of salt in the wine
6. Free sulphur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a
dissolved gas) and bisulphite ion; it prevents microbial growth and the oxidation of wine
7. Total sulphur dioxide: the amount of free and bound forms of SO2; in low concentrations, SO2 is mostly
undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the
nose and taste of wine
8. Density: the density of wine is close to that of water, depending on the percent alcohol and sugar
content
9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most
wines are between 3-4 on the pH scale.
10. Sulphates: a wine additive which can contribute to sulphur dioxide gas (SO2) levels, which acts as an
antimicrobial and antioxidant.
11. Alcohol: the percent alcohol content of the wine
12. Quality: output variable (based on sensory data, score between 0 and 10)
Analysis of Data
1. Basic Statistics
summary(wine.df)
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
1. The alcohol content varies from 8.40 to 14.90 across the samples in the dataset.
2. The quality of the samples ranges from 3 to 8, with a median of 6.
3. The range of fixed acidity is quite wide, with a minimum of 4.6 and a maximum of 15.9.
4. The pH value varies from 2.740 to 4.010, with a median of 3.310.
2. Histogram Plot Analysis
1. The quality distribution for red wine peaks at a rating of approximately 5.
2. The pH values appear to be normally distributed, with most samples falling between 3.0 and 3.6.
3. Free sulfur dioxide lies mostly between 1 and 60, peaking around the 10 mark.
4. Total sulfur dioxide spreads between 0 and 300, with a peak around 50.
5. The alcohol content varies from about 8 to 14, with the main peak around 9.
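The histogram readings above can also be checked numerically. The following is a minimal sketch, assuming a synthetic stand-in data frame (`demo.df` is hypothetical and its values are simulated, not the wine data); with the real data, `wine.df` would take its place.

```r
# Synthetic stand-in for wine.df (assumption: the real data frame is not reproduced here).
set.seed(1)
demo.df <- data.frame(
  pH      = rnorm(200, mean = 3.3, sd = 0.15),
  alcohol = runif(200, min = 8.4, max = 14.9),
  quality = sample(3:8, 200, replace = TRUE)
)

# One histogram object per column; plot = FALSE returns the bin counts
# instead of drawing, so the peaks can be inspected as numbers.
hists <- lapply(demo.df, hist, plot = FALSE)

# Midpoint of the most populated bin for each variable.
peak.bin <- sapply(hists, function(h) h$mids[which.max(h$counts)])
print(peak.bin)

# To draw the plots instead:
# par(mfrow = c(1, 3)); for (v in names(demo.df)) hist(demo.df[[v]], main = v, xlab = v)
```

Reading the peak from the bin counts avoids judging it by eye, although the result still depends on the bin width that `hist` chooses.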
3. Correlation Matrix, Correlogram and Covariance
#Correlation Matrix
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.256130895 0.67170343
## volatile.acidity -0.25613089 1.000000000 -0.55249568
## citric.acid 0.67170343 -0.552495685 1.00000000
## residual.sugar 0.11477672 0.001917882 0.14357716
## chlorides 0.09370519 0.061297772 0.20382291
## free.sulfur.dioxide -0.15379419 -0.010503827 -0.06097813
## total.sulfur.dioxide -0.11318144 0.076470005 0.03553302
## density 0.66804729 0.022026232 0.36494718
## pH -0.68297819 0.234937294 -0.54190414
## sulphates 0.18300566 -0.260986685 0.31277004
## alcohol -0.06166827 -0.202288027 0.10990325
## quality 0.12405165 -0.390557780 0.22637251
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.114776724 0.093705186 -0.153794193
## volatile.acidity 0.001917882 0.061297772 -0.010503827
## citric.acid 0.143577162 0.203822914 -0.060978129
## residual.sugar 1.000000000 0.055609535 0.187048995
## chlorides 0.055609535 1.000000000 0.005562147
## free.sulfur.dioxide 0.187048995 0.005562147 1.000000000
## total.sulfur.dioxide 0.203027882 0.047400468 0.667666450
## density 0.355283371 0.200632327 -0.021945831
## pH -0.085652422 -0.265026131 0.070377499
## sulphates 0.005527121 0.371260481 0.051657572
## alcohol 0.042075437 -0.221140545 -0.069408354
## quality 0.013731637 -0.128906560 -0.050656057
## total.sulfur.dioxide density pH
## fixed.acidity -0.11318144 0.66804729 -0.68297819
## volatile.acidity 0.07647000 0.02202623 0.23493729
## citric.acid 0.03553302 0.36494718 -0.54190414
## residual.sugar 0.20302788 0.35528337 -0.08565242
## chlorides 0.04740047 0.20063233 -0.26502613
## free.sulfur.dioxide 0.66766645 -0.02194583 0.07037750
## total.sulfur.dioxide 1.00000000 0.07126948 -0.06649456
## density 0.07126948 1.00000000 -0.34169933
## pH -0.06649456 -0.34169933 1.00000000
## sulphates 0.04294684 0.14850641 -0.19664760
## alcohol -0.20565394 -0.49617977 0.20563251
## quality -0.18510029 -0.17491923 -0.05773139
## sulphates alcohol quality
## fixed.acidity 0.183005664 -0.06166827 0.12405165
## volatile.acidity -0.260986685 -0.20228803 -0.39055778
## citric.acid 0.312770044 0.10990325 0.22637251
## residual.sugar 0.005527121 0.04207544 0.01373164
## chlorides 0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide 0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide 0.042946836 -0.20565394 -0.18510029
## density 0.148506412 -0.49617977 -0.17491923
## pH -0.196647602 0.20563251 -0.05773139
## sulphates 1.000000000 0.09359475 0.25139708
## alcohol 0.093594750 1.00000000 0.47616632
## quality 0.251397079 0.47616632 1.00000000
library("corrgram", lib.loc="/Library/Frameworks/R.framework/Versions/3.4/Resourc
corrgram(wine.df, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie,
         text.panel=panel.txt, main="Red Wine Quality")
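1. Free SO2: noticeable positive correlation with total SO2 (0.67) and residual sugar (0.19); only
negligible correlation with pH (0.07) and alcohol (-0.07).
2. Total SO2: positive correlation with free SO2 (0.67) and residual sugar (0.20); negative correlation
with alcohol (-0.21) and quality (-0.19).
3. pH: positive correlation with volatile acidity (0.23) and alcohol (0.21); negative correlation with
fixed acidity (-0.68) and citric acid (-0.54), and only weak negative correlation with residual sugar
(-0.09) and total SO2 (-0.07).
4. Alcohol: positive correlation with quality (0.48) and pH (0.21); negative correlation with density
(-0.50), total SO2 (-0.21) and, weakly, free SO2 (-0.07).
5. Quality: positive correlation with alcohol (0.48); negative correlation with volatile acidity (-0.39),
density (-0.17) and chlorides (-0.13).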
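To see at a glance which properties relate most strongly to quality, the quality column of the correlation matrix can be ranked by absolute value. This is a small sketch using the figures printed in the matrix above; with the data loaded, the same vector would presumably come from `cor(wine.df)[, "quality"]`, which is an assumption about how `wine.df` is laid out.

```r
# Correlations with quality, copied from the last column of the matrix above.
quality.cor <- c(
  fixed.acidity        =  0.12405165,
  volatile.acidity     = -0.39055778,
  citric.acid          =  0.22637251,
  residual.sugar       =  0.01373164,
  chlorides            = -0.12890656,
  free.sulfur.dioxide  = -0.05065606,
  total.sulfur.dioxide = -0.18510029,
  density              = -0.17491923,
  pH                   = -0.05773139,
  sulphates            =  0.25139708,
  alcohol              =  0.47616632
)

# Rank by strength of association, ignoring the sign.
ranked <- quality.cor[order(abs(quality.cor), decreasing = TRUE)]
print(round(ranked, 2))
```

Alcohol, volatile acidity and sulphates come out on top, which matches the predictors that carry the largest t values in the regression models later in the report.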
Alcohol Analysis - ScatterPlots
1. There seems to be no significant bias in the alcohol content, even though some red wine samples
have a higher alcohol content.
2. The pH scatterplot indicates an interesting observation: pH and alcohol are positively correlated,
although the correlation matrix puts the strength at only 0.21.
3. Total SO2 content decreases as the alcohol content increases.
4. Free SO2 content decreases slightly as the alcohol content increases.
pH Analysis - ScatterPlots
1. No clear relation is established between quality and pH.
2. The relation between pH and total sulphur dioxide is diffuse, with most total SO2 values
ranging up to around 150.
3. The relation between pH and free sulphur dioxide is similarly diffuse.
Hypothesis Testing
##Hypothesis 1
#A higher alcohol content and lower fixed acidity tend to equal a higher quality wine. Why is this?
I will use heatmaps and Chi-Square tests to evaluate this hypothesis.
#Chi-Sq on Quality and Alcohol
chisq.test(quality, alcohol)
## data: quality and alcohol
## X-squared = 1124.5, df = 320, p-value < 2.2e-16
#Chi-Sq on Quality and Fixed Acidity
chisq.test(quality, fixed.acidity)
## data: quality and fixed.acidity
## X-squared = 736.08, df = 475, p-value = 1.416e-13
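This hypothesis comes out to be correct in that the Chi-Sq tests confirm a significant association of
quality with both alcohol and fixed acidity, and the heatmap shows the distribution. The tests establish
association rather than direction: in the correlation matrix, alcohol is positively correlated with quality
(0.48), while fixed acidity is only weakly, and positively, correlated with it (0.12).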
##Hypothesis 2
#Higher quality wine tends to have lower residual sugar and lower citric acid. Why is this?
#Chi-Sq on Quality and Citric Acid
chisq.test(quality, citric.acid)
## data: quality and citric.acid
## X-squared = 695.82, df = 395, p-value < 2.2e-16
#Chi-Sq on Quality and Residual Sugar
chisq.test(quality, residual.sugar)
## data: quality and residual.sugar
## X-squared = 864.79, df = 450, p-value < 2.2e-16
This hypothesis is wrong considering the observations from the heatmap.
##Hypothesis 3
#Does lower sulfur content make wine higher quality?
plot(sulphates, quality, ylab="Quality", xlab="Sulphates", main="Quality vs
#Chi-Sq on Quality and Sulphates
chisq.test(quality, sulphates)
## data: quality and sulphates
## X-squared = 925.78, df = 475, p-value < 2.2e-16
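The Chi-Sq test indicates a significant association between quality and sulphates. The direction, however,
does not support the hypothesis: sulphates correlate positively with quality (0.25 in the correlation
matrix), so higher-quality samples tend to have higher, not lower, sulphate content.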
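When `chisq.test` is given two numeric vectors, it treats every distinct value as its own category, which is how df = 320 arises for quality against alcohol (6 quality levels and 65 distinct alcohol values give 5 x 64). A significant result then says that the two variables are associated, but not in which direction. The sketch below shows one possible complement, not the method used in this report: binning the predictor before the Chi-Sq test and adding a Spearman rank correlation to read off the sign. The data are simulated and all object names are hypothetical.

```r
# Simulated stand-in data (assumption): an ordinal score that rises with alcohol.
set.seed(42)
n <- 300
alcohol <- runif(n, 8.4, 14.9)
quality <- pmin(8, pmax(3, round(2 + 0.35 * alcohol + rnorm(n, sd = 0.7))))

# Chi-square on a coarse table: alcohol cut into quartile bins, so cells are not sparse.
alcohol.bin <- cut(alcohol,
                   breaks = quantile(alcohol, probs = seq(0, 1, 0.25)),
                   include.lowest = TRUE)
tab <- table(quality, alcohol.bin)
chi <- suppressWarnings(chisq.test(tab))

# Spearman rank correlation respects the ordinal response and gives a sign.
rho <- cor.test(alcohol, quality, method = "spearman", exact = FALSE)

cat("chi-square p-value:", chi$p.value, "\n")
cat("Spearman rho:", unname(rho$estimate), "\n")
```

A positive rho supports a "higher X, higher quality" hypothesis and a negative one supports "lower X, higher quality"; the Chi-Sq p-value alone cannot separate the two.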
Linear Regression Models and Testing
#Test Model 1
model1 <- lm( quality ~ alcohol)
## Call:
## lm(formula = quality ~ alcohol)
## Residuals:
## Min 1Q Median 3Q Max
## -2.8442 -0.4112 -0.1690 0.5166 2.5888
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.87497 0.17471 10.73 <2e-16 ***
## alcohol 0.36084 0.01668 21.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.7104 on 1597 degrees of freedom
## Multiple R-squared: 0.2267, Adjusted R-squared: 0.2263
## F-statistic: 468.3 on 1 and 1597 DF, p-value: < 2.2e-16
The p-value and the significance stars confirm that alcohol is a significant factor.
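As a quick worked check of Model 1, the fitted line can be evaluated by hand, and its R-squared can be compared with the correlation matrix: in a regression with a single predictor, R-squared equals the squared Pearson correlation. The helper name `quality_from_alcohol` is hypothetical; the numbers are those reported above.

```r
# Coefficients of model1 as reported in the summary output above.
b0 <- 1.87497   # intercept
b1 <- 0.36084   # slope for alcohol

# Hypothetical helper: predicted quality for a given alcohol percentage.
quality_from_alcohol <- function(alcohol) b0 + b1 * alcohol

# Predictions at the quartiles of alcohol from the Basic Statistics table.
print(quality_from_alcohol(c(9.5, 10.2, 11.1)))

# Squared correlation between alcohol and quality, from the correlation matrix.
r <- 0.47616632
r.squared <- r^2
print(round(r.squared, 4))
```

The squared correlation agrees with the Multiple R-squared of 0.2267 in the model summary, so alcohol alone accounts for a little under a quarter of the variance in quality.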
add1(model1, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
## Model:
## quality ~ alcohol
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 805.87 -1091.7
## volatile.acidity 1 94.074 711.80 -1288.1 210.9346 < 2.2e-16 ***
## citric.acid 1 31.953 773.92 -1154.3 65.8949 9.408e-16 ***
## residual.sugar 1 0.041 805.83 -1089.7 0.0822 0.774437
## chlorides 1 0.611 805.26 -1090.9 1.2103 0.271443
## free.sulfur.dioxide 1 0.325 805.55 -1090.3 0.6431 0.422696
## total.sulfur.dioxide 1 8.270 797.60 -1106.2 16.5475 4.976e-05 ***
## density 1 5.203 800.67 -1100.0 10.3708 0.001306 **
## pH 1 26.362 779.51 -1142.8 53.9749 3.226e-13 ***
## sulphates 1 44.977 760.89 -1181.5 94.3399 < 2.2e-16 ***
## quality 0 0.000 805.87 -1091.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We use F-tests on single-term additions to check which of the other factors would be significant if added to the model.
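The two warnings in the output above appear because the whole data frame, response included, is passed as `scope`. A sketch of one way to avoid them, assuming a formula scope built from the predictor names only, is shown here on simulated data (`demo.df` and its coefficients are hypothetical, not the wine data).

```r
# Simulated stand-in data (assumption): quality driven by alcohol and volatile acidity.
set.seed(7)
n <- 200
demo.df <- data.frame(
  alcohol          = runif(n, 8.4, 14.9),
  volatile.acidity = runif(n, 0.12, 1.58),
  residual.sugar   = runif(n, 0.9, 15.5)
)
demo.df$quality <- 3 + 0.3 * demo.df$alcohol -
  1.0 * demo.df$volatile.acidity + rnorm(n, sd = 0.6)

base.model <- lm(quality ~ alcohol, data = demo.df)

# Build the scope from every column except the response,
# giving ~ alcohol + volatile.acidity + residual.sugar.
candidates <- setdiff(names(demo.df), "quality")
scope <- reformulate(candidates)

# Terms already in the model are skipped; each remaining term gets one F-test.
steps <- add1(base.model, scope = scope, test = "F")
print(steps)
```

With the response left out of the scope, the table has no zero-df `quality` row and no warnings, and the remaining rows read the same way as in the output above.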
#Test Model 7
model7 <- lm(quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid +
             chlorides + sulphates + volatile.acidity)
## Call:
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide +
## citric.acid + chlorides + sulphates + volatile.acidity)
## Residuals:
## Min 1Q Median 3Q Max
## -2.58632 -0.36679 -0.04584 0.45297 1.95470
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6134833 0.4607493 10.013 < 2e-16 ***
## alcohol 0.2951742 0.0171178 17.244 < 2e-16 ***
## pH -0.5247565 0.1328432 -3.950 8.15e-05 ***
## total.sulfur.dioxide -0.0023114 0.0005082 -4.549 5.81e-06 ***
## citric.acid -0.1670682 0.1207391 -1.384 0.167
## chlorides -1.9153285 0.4028925 -4.754 2.17e-06 ***
## sulphates 0.8994970 0.1102877 8.156 6.96e-16 ***
## volatile.acidity -1.1146326 0.1145923 -9.727 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.6485 on 1591 degrees of freedom
## Multiple R-squared: 0.3579, Adjusted R-squared: 0.3551
## F-statistic: 126.7 on 7 and 1591 DF, p-value: < 2.2e-16
add1(model7, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid +
## chlorides + sulphates + volatile.acidity
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 669.13 -1377.0
## residual.sugar 1 0.41979 668.71 -1376.0 0.9982 0.3179
## free.sulfur.dioxide 1 2.06369 667.06 -1379.9 4.9190 0.0267 *
## density 1 0.05573 669.07 -1375.1 0.1324 0.7160
## quality 0 0.00000 669.13 -1377.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Test Model 8
model8 <- lm(quality ~ alcohol + pH + total.sulfur.dioxide + chlorides +
             sulphates + volatile.acidity)
This is the final model.
## Call:
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide +
## chlorides + sulphates + volatile.acidity)
## Residuals:
## Min 1Q Median 3Q Max
## -2.60575 -0.35883 -0.04806 0.46079 1.95643
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.2957316 0.3995603 10.751 < 2e-16 ***
## alcohol 0.2906738 0.0168108 17.291 < 2e-16 ***
## pH -0.4351830 0.1160368 -3.750 0.000183 ***
## total.sulfur.dioxide -0.0023721 0.0005064 -4.684 3.05e-06 ***
## chlorides -2.0022839 0.3980757 -5.030 5.46e-07 ***
## sulphates 0.8886802 0.1100419 8.076 1.31e-15 ***
## volatile.acidity -1.0381945 0.1004270 -10.338 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.6487 on 1592 degrees of freedom
## Multiple R-squared: 0.3572, Adjusted R-squared: 0.3548
## F-statistic: 147.4 on 6 and 1592 DF, p-value: < 2.2e-16
In this final model all the factors have come out to be significant.
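Dropping citric acid costs almost nothing in fit, which can be confirmed from the reported figures with the usual adjusted R-squared formula, 1 - (1 - R2)(n - 1)/(n - p - 1). The sketch below recomputes it for Model 7 and Model 8; `adj.r2` is a hypothetical helper, and n = 1599 and the R-squared values are taken from the outputs above.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p.
adj.r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 1599                                 # samples in the dataset
model7.adj <- adj.r2(0.3579, n, p = 7)    # with citric.acid
model8.adj <- adj.r2(0.3572, n, p = 6)    # without citric.acid

print(round(c(model7 = model7.adj, model8 = model8.adj), 4))
```

Both values reproduce the Adjusted R-squared lines of the two summaries (0.3551 and 0.3548), a difference of about 0.0003 for one fewer predictor.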
add1(model8, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + chlorides + sulphates +
## volatile.acidity
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 669.93 -1377.1
## citric.acid 1 0.80525 669.13 -1377.0 1.9147 0.16664
## residual.sugar 1 0.28390 669.65 -1375.7 0.6745 0.41161
## free.sulfur.dioxide 1 2.39413 667.54 -1380.8 5.7061 0.01702 *
## density 1 0.04468 669.89 -1375.2 0.1061 0.74465
## quality 0 0.00000 669.93 -1377.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A limitation of the current analysis is that the data consists of samples collected
from a specific region of Portugal. It would be interesting to obtain datasets from various
wine-making regions to eliminate any bias created by the specific qualities of this product.
Regression Equation
Quality = 4.30 + (0.29)*alcohol + (0.89)*sulphates - { (0.44)*pH + (0.002)*tot.SO2 +
(2.00)*chlorides + (1.04)*vol.acidity }
Hence quality depends positively on alcohol and sulphates, and negatively on pH, total SO2,
chlorides and volatile acidity.
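The regression equation can be applied directly to a set of measurements. The following is a minimal sketch using the unrounded coefficients from the final model summary; `predict_quality` and `mean.wine` are hypothetical names introduced for illustration, and the input values are the sample means from the Basic Statistics table.

```r
# Coefficients of the final model (model8), copied from the summary output above.
coefs <- c(
  intercept            =  4.2957316,
  alcohol              =  0.2906738,
  pH                   = -0.4351830,
  total.sulfur.dioxide = -0.0023721,
  chlorides            = -2.0022839,
  sulphates            =  0.8886802,
  volatile.acidity     = -1.0381945
)

# Hypothetical helper: `wine` is a named numeric vector with one value per predictor.
predict_quality <- function(wine) {
  predictors <- setdiff(names(coefs), "intercept")
  unname(coefs["intercept"] + sum(coefs[predictors] * wine[predictors]))
}

# Sample means taken from the Basic Statistics table.
mean.wine <- c(alcohol = 10.42, pH = 3.311, total.sulfur.dioxide = 46.47,
               chlorides = 0.08747, sulphates = 0.6581, volatile.acidity = 0.5278)

at.mean <- predict_quality(mean.wine)
print(round(at.mean, 3))
```

For a least-squares fit with an intercept, the prediction at the predictor means equals the mean of the response, so the result should land very close to the mean quality of 5.636 reported in the summary. That makes this a convenient sanity check on the transcribed coefficients.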

More Related Content

Similar to Red_wine_final_report

5228_Leeder Wine Bro-low res
5228_Leeder Wine Bro-low res5228_Leeder Wine Bro-low res
5228_Leeder Wine Bro-low res
Dr John Leeder
Duanrui Shi
Wine Quality
Wine QualityWine Quality
Wine Quality
Tapas Saha
Venkatesan R - 6369851191
Assignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docxAssignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docxAssignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docx
Analysis of beer
Analysis of beer Analysis of beer
Analysis of beer
Practical White Wine Production: Theory and Practice
Practical White Wine Production: Theory and PracticePractical White Wine Production: Theory and Practice
Practical White Wine Production: Theory and Practice
Sabrina Lueck
Wine quality Analysis
Wine quality AnalysisWine quality Analysis
Wine quality Analysis
Krishna Bollojula
Wine ppt template
Wine ppt templateWine ppt template
Wine ppt template
Krishna Bollojula
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
PerkinElmer, Inc.
Product profile
Product profileProduct profile
Product profile
K.K. Kumar
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
Oregon Wine Board
Wine quality
Wine qualityWine quality
Wine quality
Sadaseeb Choudhury
Analysis of fermentation products of (2) (1)
Analysis of fermentation products of (2) (1)Analysis of fermentation products of (2) (1)
Analysis of fermentation products of (2) (1)
Alcoholic beverages
Alcoholic beveragesAlcoholic beverages
Alcoholic beverages
Srooti Jos
Presentation of CDR WineLab®, Wine Analysis System
Presentation of CDR WineLab®, Wine Analysis SystemPresentation of CDR WineLab®, Wine Analysis System
Presentation of CDR WineLab®, Wine Analysis System
CDR S.r.l.
BDI Dec 2016 Sodium
BDI Dec 2016 SodiumBDI Dec 2016 Sodium
BDI Dec 2016 Sodium
Aaron Golston, M.Sc.
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantinaCDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina
CDR S.r.l.

Similar to Red_wine_final_report (20)

5228_Leeder Wine Bro-low res
5228_Leeder Wine Bro-low res5228_Leeder Wine Bro-low res
5228_Leeder Wine Bro-low res
Wine Quality
Wine QualityWine Quality
Wine Quality
Assignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docxAssignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docxAssignment - 03Model Building, Selection, & Prediction.docx
Assignment - 03Model Building, Selection, & Prediction.docx
Analysis of beer
Analysis of beer Analysis of beer
Analysis of beer
Practical White Wine Production: Theory and Practice
Practical White Wine Production: Theory and PracticePractical White Wine Production: Theory and Practice
Practical White Wine Production: Theory and Practice
Wine quality Analysis
Wine quality AnalysisWine quality Analysis
Wine quality Analysis
Wine ppt template
Wine ppt templateWine ppt template
Wine ppt template
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
Product profile
Product profileProduct profile
Product profile
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
Wine quality
Wine qualityWine quality
Wine quality
Analysis of fermentation products of (2) (1)
Analysis of fermentation products of (2) (1)Analysis of fermentation products of (2) (1)
Analysis of fermentation products of (2) (1)
Alcoholic beverages
Alcoholic beveragesAlcoholic beverages
Alcoholic beverages
Presentation of CDR WineLab®, Wine Analysis System
Presentation of CDR WineLab®, Wine Analysis SystemPresentation of CDR WineLab®, Wine Analysis System
Presentation of CDR WineLab®, Wine Analysis System
BDI Dec 2016 Sodium
BDI Dec 2016 SodiumBDI Dec 2016 Sodium
BDI Dec 2016 Sodium
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantinaCDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina

Recently uploaded

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
sameer shah
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano

Recently uploaded (20)

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf


Capstone Project- Wine Quality Analysis
#ALL LINES IN THIS COLOUR THROUGHOUT THE REPORT ARE INFERENCES FROM THE ANALYSIS DONE ABOVE THAT LINE IN THE REPORT
Overview
We consider a set of observations on a number of red wine varieties involving their chemical properties and ranking by tasters. The wine industry has shown recent growth as social drinking is on the rise. The price of wine depends on a rather abstract concept of wine appreciation by wine tasters, whose opinions may have a high degree of variability. Pricing of wine thus depends to some extent on such a volatile factor. Another key factor in wine certification and quality assessment is physicochemical tests, which are laboratory-based and take into account factors like acidity, pH level, presence of sugar and other chemical properties. For the wine market, it would be of interest if the human perception of quality could be related to the chemical properties of wine, so that the certification, quality assessment and assurance process is more controlled.
Introduction
The Red Wine dataset contains 1599 different varieties. All wines are produced in a particular area of Portugal. Data are collected on 12 different properties of the wines, one of which is Quality, based on sensory data; the rest are chemical properties of the wines, including density, acidity, alcohol content etc. All chemical properties of the wines are continuous variables. Quality is an ordinal variable with possible ranking from 1 (worst) to 10 (best). Each sample of wine is tasted by three independent tasters and the final rank assigned is the median rank given by the tasters.
Objectives of the Analysis
The objective is prediction of the Quality ranking from the chemical properties of the wines. A predictive model developed on this data is expected to provide guidance to vineyards regarding the quality and price expected for their produce, without heavy reliance on the volatility of wine tasters.
List of Attributes in Data
1. Fixed acidity: most acids involved with wine are fixed or non-volatile (they do not evaporate readily)
2. Volatile acidity: the amount of acetic acid in wine, which at too high levels can lead to an unpleasant, vinegar taste
3. Citric acid: found in small quantities, citric acid can add ‘freshness’ and flavour to wines
4. Residual sugar: the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/litre, and wines with greater than 45 grams/litre are considered sweet
5. Chlorides: the amount of salt in the wine
6. Free sulphur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulphite ion; it prevents microbial growth and the oxidation of wine
7. Total sulphur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8. Density: the density of wine is close to that of water, depending on the percent alcohol and sugar content
9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale
10. Sulphates: a wine additive which can contribute to sulphur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11. Alcohol: the percent alcohol content of the wine
12. Quality: output variable (based on sensory data, score between 0 and 10)
Analysis of Data
1. Basic Statistics
summary(wine.df)
##  fixed.acidity    volatile.acidity   citric.acid     residual.sugar
##  Min.   : 4.60    Min.   :0.1200     Min.   :0.000   Min.   : 0.900
##  1st Qu.: 7.10    1st Qu.:0.3900     1st Qu.:0.090   1st Qu.: 1.900
##  Median : 7.90    Median :0.5200     Median :0.260   Median : 2.200
##  Mean   : 8.32    Mean   :0.5278     Mean   :0.271   Mean   : 2.539
##  3rd Qu.: 9.20    3rd Qu.:0.6400     3rd Qu.:0.420   3rd Qu.: 2.600
##  Max.   :15.90    Max.   :1.5800     Max.   :1.000   Max.   :15.500
##    chlorides        free.sulfur.dioxide   total.sulfur.dioxide
##  Min.   :0.01200    Min.   : 1.00         Min.   :  6.00
##  1st Qu.:0.07000    1st Qu.: 7.00         1st Qu.: 22.00
##  Median :0.07900    Median :14.00         Median : 38.00
##  Mean   :0.08747    Mean   :15.87         Mean   : 46.47
##  3rd Qu.:0.09000    3rd Qu.:21.00         3rd Qu.: 62.00
##  Max.   :0.61100    Max.   :72.00         Max.   :289.00
##     density             pH            sulphates          alcohol
##  Min.   :0.9901    Min.   :2.740    Min.   :0.3300    Min.   : 8.40
##  1st Qu.:0.9956    1st Qu.:3.210    1st Qu.:0.5500    1st Qu.: 9.50
##  Median :0.9968    Median :3.310    Median :0.6200    Median :10.20
##  Mean   :0.9967    Mean   :3.311    Mean   :0.6581    Mean   :10.42
##  3rd Qu.:0.9978    3rd Qu.:3.400    3rd Qu.:0.7300    3rd Qu.:11.10
##  Max.   :1.0037    Max.   :4.010    Max.   :2.0000    Max.   :14.90
##     quality
##  Min.   :3.000
##  1st Qu.:5.000
##  Median :6.000
##  Mean   :5.636
##  3rd Qu.:6.000
##  Max.   :8.000
1. The alcohol content varies from 8.40 to 14.90 for the samples in the dataset.
2. The quality of the samples ranges from 3 to 8, with 6 being the median.
3. The range for fixed acidity is quite wide, with a minimum of 4.6 and a maximum of 15.9.
4. The pH value varies from 2.740 to 4.010, with a median of 3.310.
2. Histogram Plot Analysis
1. The quality ratings for red wine peak at a rating of approximately 5.
2. The pH value seems to display a normal distribution, with most samples exhibiting values between 3.0 and 3.6.
3. The free sulfur dioxide seems to lie between 1 and 60, peaking around the 10 mark.
4. The total sulfur dioxide seems to have a spread between 0 and 300, with a peak around 50.
5. The alcohol content seems to vary from 8 to 14, with major peaks around 9.
3. Correlation Matrix and Correlogram and CoVariance
#Correlation Matrix
cor(wine.df)
##                      fixed.acidity volatile.acidity citric.acid
## fixed.acidity           1.00000000     -0.256130895  0.67170343
## volatile.acidity       -0.25613089      1.000000000 -0.55249568
## citric.acid             0.67170343     -0.552495685  1.00000000
## residual.sugar          0.11477672      0.001917882  0.14357716
## chlorides               0.09370519      0.061297772  0.20382291
## free.sulfur.dioxide    -0.15379419     -0.010503827 -0.06097813
## total.sulfur.dioxide   -0.11318144      0.076470005  0.03553302
## density                 0.66804729      0.022026232  0.36494718
## pH                     -0.68297819      0.234937294 -0.54190414
## sulphates               0.18300566     -0.260986685  0.31277004
## alcohol                -0.06166827     -0.202288027  0.10990325
## quality                 0.12405165     -0.390557780  0.22637251
##                      residual.sugar    chlorides free.sulfur.dioxide
## fixed.acidity           0.114776724  0.093705186        -0.153794193
## volatile.acidity        0.001917882  0.061297772        -0.010503827
## citric.acid             0.143577162  0.203822914        -0.060978129
## residual.sugar          1.000000000  0.055609535         0.187048995
## chlorides               0.055609535  1.000000000         0.005562147
## free.sulfur.dioxide     0.187048995  0.005562147         1.000000000
## total.sulfur.dioxide    0.203027882  0.047400468         0.667666450
## density                 0.355283371  0.200632327        -0.021945831
## pH                     -0.085652422 -0.265026131         0.070377499
## sulphates               0.005527121  0.371260481         0.051657572
## alcohol                 0.042075437 -0.221140545        -0.069408354
## quality                 0.013731637 -0.128906560        -0.050656057
##                      total.sulfur.dioxide     density          pH
## fixed.acidity                 -0.11318144  0.66804729 -0.68297819
## volatile.acidity               0.07647000  0.02202623  0.23493729
## citric.acid                    0.03553302  0.36494718 -0.54190414
## residual.sugar                 0.20302788  0.35528337 -0.08565242
## chlorides                      0.04740047  0.20063233 -0.26502613
## free.sulfur.dioxide            0.66766645 -0.02194583  0.07037750
## total.sulfur.dioxide           1.00000000  0.07126948 -0.06649456
## density                        0.07126948  1.00000000 -0.34169933
## pH                            -0.06649456 -0.34169933  1.00000000
## sulphates                      0.04294684  0.14850641 -0.19664760
## alcohol                       -0.20565394 -0.49617977  0.20563251
## quality                       -0.18510029 -0.17491923 -0.05773139
##                         sulphates     alcohol     quality
## fixed.acidity         0.183005664 -0.06166827  0.12405165
## volatile.acidity     -0.260986685 -0.20228803 -0.39055778
## citric.acid           0.312770044  0.10990325  0.22637251
## residual.sugar        0.005527121  0.04207544  0.01373164
## chlorides             0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide   0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide  0.042946836 -0.20565394 -0.18510029
## density               0.148506412 -0.49617977 -0.17491923
## pH                   -0.196647602  0.20563251 -0.05773139
## sulphates             1.000000000  0.09359475  0.25139708
## alcohol               0.093594750  1.00000000  0.47616632
## quality               0.251397079  0.47616632  1.00000000
#Correlogram
library(corrgram)  # loaded from the default library path
corrgram(wine.df, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Red Wine Quality")
1. Free SO2: noticeable positive correlation with total SO2 (0.67) and residual sugar (0.19); the correlations with pH (0.07) and alcohol (-0.07) are close to zero.
2. Total SO2: positive correlation with free SO2 (0.67) and residual sugar (0.20); negative correlation with alcohol (-0.21).
3. pH: positive correlation with alcohol (0.21) and volatile acidity (0.23); negative correlation with citric acid (-0.54), and weakly negative with residual sugar (-0.09) and total SO2 (-0.07).
4. Alcohol: positive correlation with pH (0.21) and quality (0.48); negative correlation with density (-0.50), chlorides (-0.22), total SO2 (-0.21) and free SO2 (-0.07).
5. Quality: positive correlation with alcohol (0.48); negative correlation with volatile acidity (-0.39), density (-0.17) and chlorides (-0.13).
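The quality column of the correlation matrix can also be ranked directly, which makes it easier to see which attributes are the most promising predictors. The following is a minimal sketch: the values are copied (rounded to four decimals) from the matrix in this report, and the 0.2 cut-off is a hypothetical threshold chosen only for illustration.

```r
# Correlations of each attribute with quality, copied from the
# quality column of the correlation matrix (rounded to 4 decimals).
quality_cor <- c(
  fixed.acidity        =  0.1241,
  volatile.acidity     = -0.3906,
  citric.acid          =  0.2264,
  residual.sugar       =  0.0137,
  chlorides            = -0.1289,
  free.sulfur.dioxide  = -0.0507,
  total.sulfur.dioxide = -0.1851,
  density              = -0.1749,
  pH                   = -0.0577,
  sulphates            =  0.2514,
  alcohol              =  0.4762
)

# Order by absolute strength, strongest first.
ranked <- quality_cor[order(abs(quality_cor), decreasing = TRUE)]
print(round(ranked, 2))

# Hypothetical cut-off of |r| >= 0.2, used only for illustration.
strong <- names(ranked)[abs(ranked) >= 0.2]
print(strong)
```

With the data frame loaded, the same vector could presumably be obtained as `cor(wine.df)[, "quality"]` after dropping the `quality` entry itself.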
Alcohol Analysis - ScatterPlots
1. There seems to be no significant bias in the alcohol content, even though there are samples with higher alcohol content for red wine.
2. The pH scatterplot indicates an interesting observation: pH and alcohol are positively correlated (about 0.21 in the correlation matrix).
3. Total SO2 content decreases with alcohol content for wine.
4. Free SO2 content decreases as the alcohol content increases for wine.
pH Analysis - ScatterPlots
1. No clear relation is established between quality and pH.
2. The relation between pH and total sulphur dioxide is diffuse, with total SO2 values mostly ranging up to around 150.
3. The relation between pH and free sulphur dioxide is also diffuse.
Hypothesis Testing
##Hypothesis 1
#A higher alcohol content and lower fixed acidity tends to equal a higher quality wine. Why is this?
Sol. Heatmaps and chi-square tests are used to assess this hypothesis.
#HeatMap
#Chi-Sq on Quality and Alcohol
chisq.test(quality, alcohol)
## data: quality and alcohol
## X-squared = 1124.5, df = 320, p-value < 2.2e-16
#Chi-Sq on Quality and Fixed Acidity
chisq.test(quality, fixed.acidity)
## data: quality and fixed.acidity
## X-squared = 736.08, df = 475, p-value = 1.416e-13
This hypothesis comes out to be correct in that the chi-square tests confirm a significant relation, and the heatmap shows the distribution. The chi-square test does not indicate direction; in the correlation matrix, quality correlates positively with both alcohol (0.48) and fixed acidity (0.12).
##Hypothesis 2
#Higher quality wine tends to have a lower residual sugar and lower citric acid. Why is this?
#HeatMap
#Chi-Sq on Quality and Citric Acid
chisq.test(quality, citric.acid)
## data: quality and citric.acid
## X-squared = 695.82, df = 395, p-value < 2.2e-16
#Chi-Sq on Quality and Residual Sugar
chisq.test(quality, residual.sugar)
## data: quality and residual.sugar
## X-squared = 864.79, df = 450, p-value < 2.2e-16
This hypothesis is rejected on the basis of the observations from the heatmap.
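The chi-square tests used for these hypotheses establish that an association exists, but not its direction. One possible complement, not used in the original analysis, is Spearman's rank correlation, which suits an ordinal response such as quality and reports a sign. The sketch below runs on synthetic stand-in data (not the wine data), constructed so that quality rises with alcohol; the seed, sample size and generating coefficients are assumptions made purely for illustration.

```r
# Synthetic stand-in data (NOT the wine data): quality is built to
# increase with alcohol so that the expected direction is known.
set.seed(1)
n <- 300
alcohol <- runif(n, min = 8.4, max = 14.9)
quality <- round(1.9 + 0.36 * alcohol + rnorm(n, sd = 0.7))

# Spearman's rank correlation respects the ordinal nature of quality
# and reports the sign of the association, unlike chisq.test().
# exact = FALSE avoids the exact-p-value warning caused by ties.
res <- cor.test(alcohol, quality, method = "spearman", exact = FALSE)
print(res$estimate)  # rho; a positive value means quality rises with alcohol
print(res$p.value)
```

On the real data the call would take the form `cor.test(wine.df$alcohol, wine.df$quality, method = "spearman", exact = FALSE)`, assuming the column names shown in the summary output.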
##Hypothesis 3
#Does lower sulfur content make wine higher quality?
#ScatterPlot
plot(sulphates, quality, ylab="Quality", xlab="Sulphates", main="Quality vs Sulphates")
#Chi-Sq on Quality and Sulphates
chisq.test(quality, sulphates)
## data: quality and sulphates
## X-squared = 925.78, df = 475, p-value < 2.2e-16
Yes, the chi-square test shows a significant relation between quality and sulphates. The direction should be read with care: the correlation between sulphates and quality is positive (0.25) and the final regression model gives sulphates a positive coefficient, whereas total sulphur dioxide is negatively related to quality (-0.19).
Linear Regression Models and Testing
#Test Model 1
model1 <- lm(quality ~ alcohol)
summary(model1)
##
## Call:
## lm(formula = quality ~ alcohol)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.8442 -0.4112 -0.1690  0.5166  2.5888
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.87497    0.17471   10.73   <2e-16 ***
## alcohol      0.36084    0.01668   21.64   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7104 on 1597 degrees of freedom
## Multiple R-squared: 0.2267, Adjusted R-squared: 0.2263
## F-statistic: 468.3 on 1 and 1597 DF, p-value: < 2.2e-16
The p-value and the significance stars confirm that alcohol is a significant factor.
add1(model1, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
##
## Model:
## quality ~ alcohol
##                      Df Sum of Sq    RSS     AIC  F value    Pr(>F)
## <none>                            805.87 -1091.7
## volatile.acidity      1    94.074 711.80 -1288.1 210.9346 < 2.2e-16 ***
## citric.acid           1    31.953 773.92 -1154.3  65.8949 9.408e-16 ***
## residual.sugar        1     0.041 805.83 -1089.7   0.0822  0.774437
## chlorides             1     0.611 805.26 -1090.9   1.2103  0.271443
## free.sulfur.dioxide   1     0.325 805.55 -1090.3   0.6431  0.422696
## total.sulfur.dioxide  1     8.270 797.60 -1106.2  16.5475 4.976e-05 ***
## density               1     5.203 800.67 -1100.0  10.3708  0.001306 **
## pH                    1    26.362 779.51 -1142.8  53.9749 3.226e-13 ***
## sulphates             1    44.977 760.89 -1181.5  94.3399 < 2.2e-16 ***
## quality               0     0.000 805.87 -1091.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the reduction in residual variance (F test), we check which other factors can be significant here.
#Test Model 7
model7 <- lm(quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid + chlorides + sulphates + volatile.acidity)
summary(model7)
##
## Call:
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide +
##     citric.acid + chlorides + sulphates + volatile.acidity)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.58632 -0.36679 -0.04584  0.45297  1.95470
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)           4.6134833  0.4607493  10.013  < 2e-16 ***
## alcohol               0.2951742  0.0171178  17.244  < 2e-16 ***
## pH                   -0.5247565  0.1328432  -3.950 8.15e-05 ***
## total.sulfur.dioxide -0.0023114  0.0005082  -4.549 5.81e-06 ***
## citric.acid          -0.1670682  0.1207391  -1.384    0.167
## chlorides            -1.9153285  0.4028925  -4.754 2.17e-06 ***
## sulphates             0.8994970  0.1102877   8.156 6.96e-16 ***
## volatile.acidity     -1.1146326  0.1145923  -9.727  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6485 on 1591 degrees of freedom
## Multiple R-squared: 0.3579, Adjusted R-squared: 0.3551
## F-statistic: 126.7 on 7 and 1591 DF, p-value: < 2.2e-16
add1(model7, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
##
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid +
##     chlorides + sulphates + volatile.acidity
##                     Df Sum of Sq    RSS     AIC F value Pr(>F)
## <none>                           669.13 -1377.0
## residual.sugar       1   0.41979 668.71 -1376.0  0.9982 0.3179
## free.sulfur.dioxide  1   2.06369 667.06 -1379.9  4.9190 0.0267 *
## density              1   0.05573 669.07 -1375.1  0.1324 0.7160
## quality              0   0.00000 669.13 -1377.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Test Model 8
model8 <- lm(quality ~ alcohol + pH + total.sulfur.dioxide + chlorides + sulphates + volatile.acidity)
summary(model8)
This is the final model.
##
## Call:
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide +
##     chlorides + sulphates + volatile.acidity)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.60575 -0.35883 -0.04806  0.46079  1.95643
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)           4.2957316  0.3995603  10.751  < 2e-16 ***
## alcohol               0.2906738  0.0168108  17.291  < 2e-16 ***
## pH                   -0.4351830  0.1160368  -3.750 0.000183 ***
## total.sulfur.dioxide -0.0023721  0.0005064  -4.684 3.05e-06 ***
## chlorides            -2.0022839  0.3980757  -5.030 5.46e-07 ***
## sulphates             0.8886802  0.1100419   8.076 1.31e-15 ***
## volatile.acidity     -1.0381945  0.1004270 -10.338  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6487 on 1592 degrees of freedom
## Multiple R-squared: 0.3572, Adjusted R-squared: 0.3548
## F-statistic: 147.4 on 6 and 1592 DF, p-value: < 2.2e-16
In this final model all the factors have come out to be significant.
add1(model8, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
##
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + chlorides + sulphates +
##     volatile.acidity
##                     Df Sum of Sq    RSS     AIC F value  Pr(>F)
## <none>                           669.93 -1377.1
## citric.acid          1   0.80525 669.13 -1377.0  1.9147 0.16664
## residual.sugar       1   0.28390 669.65 -1375.7  0.6745 0.41161
## free.sulfur.dioxide  1   2.39413 667.54 -1380.8  5.7061 0.01702 *
## density              1   0.04468 669.89 -1375.2  0.1061 0.74465
## quality              0   0.00000 669.93 -1377.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
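The add1() calls above pass the whole data frame as scope, which appears to be why R warns that the response was found on the right-hand side. A formula scope that lists only candidate predictors avoids this. The sketch below shows the pattern on synthetic stand-in data (not the wine data); the column names x1, x2, x3 and y, the seed and the generating coefficients are all hypothetical.

```r
# Synthetic stand-in data (NOT the wine data); column names are hypothetical.
set.seed(42)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 1 + 2 * df$x1 + 1 * df$x2 + rnorm(n)  # x3 is pure noise

# Passing data = df keeps the model self-contained (no attach() needed).
fit <- lm(y ~ x1, data = df)

# A formula scope lists only candidate predictors, so the response
# never appears among the terms to be added.
steps <- add1(fit, scope = ~ x1 + x2 + x3, test = "F")
print(steps)

# Add the strongest candidate and compare the two nested models.
fit2 <- update(fit, . ~ . + x2)
print(anova(fit, fit2))
```

For the wine models the analogous call would list the eleven chemical attributes in the scope formula and fit with `data = wine.df`; the F tests for the candidate terms should then match the tables above without the warnings.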
Conclusion
A limitation of the current analysis is that the data consist of samples collected from a specific region of Portugal. It would be interesting to obtain datasets across various wine-making regions to eliminate any bias created by specific qualities of the product.
Regression Equation
Quality = 4.30 + (0.29)*alcohol + (0.89)*sulphates - { (0.44)*pH + (0.0024)*tot.SO2 + (2.00)*chlorides + (1.04)*vol.acidity }
Hence quality depends on alcohol and sulphates in a positive relation, and on pH, total SO2, chlorides and volatile acidity in a negative relation.
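The regression equation can be evaluated directly from the coefficients of the final model. The sketch below copies the unrounded estimates from the summary of model8 and applies them to an illustrative wine whose attributes are set to the medians in the summary table. The helper name `predict_quality` is hypothetical and not part of the original analysis.

```r
# Coefficients copied from the summary of the final model (model8).
coefs <- c(
  intercept            =  4.2957316,
  alcohol              =  0.2906738,
  pH                   = -0.4351830,
  total.sulfur.dioxide = -0.0023721,
  chlorides            = -2.0022839,
  sulphates            =  0.8886802,
  volatile.acidity     = -1.0381945
)

# Hypothetical helper: evaluates the regression equation for one wine,
# given as a named numeric vector of attribute values.
predict_quality <- function(wine) {
  vars <- setdiff(names(coefs), "intercept")
  unname(coefs["intercept"] + sum(coefs[vars] * wine[vars]))
}

# Illustrative input: the median of each attribute from the summary table.
median_wine <- c(alcohol = 10.2, pH = 3.31, total.sulfur.dioxide = 38,
                 chlorides = 0.079, sulphates = 0.62, volatile.acidity = 0.52)
pred <- predict_quality(median_wine)
print(round(pred, 2))
```

With the fitted object available, `predict(model8, newdata = ...)` would be the more usual route; the manual version only makes the arithmetic of the equation explicit.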