Capstone Project- Wine Quality Analysis
Note: the interpretive statements throughout this report (shown in a different colour in the original) are inferences from the analysis done directly above them.
Overview
We consider a set of observations on a number of red wines, covering their chemical properties and their
rankings by tasters. The wine industry has grown recently as social drinking has risen. The price of wine
depends partly on a rather abstract concept, wine appreciation by tasters, whose opinions can vary widely,
so pricing depends to some extent on a volatile factor. Another key factor in wine certification and quality
assessment is physicochemical testing, which is laboratory-based and takes into account factors such as acidity,
pH level, sugar content and other chemical properties. For the wine market, it would be useful if human quality
ratings could be related to the chemical properties of wine, so that certification, quality assessment and
quality assurance become more controlled.
Introduction
The red wine dataset contains 1599 samples, all produced in a particular region of Portugal. Data are
collected on 12 properties of each wine: Quality, based on sensory data, and 11 chemical properties including
density, acidity and alcohol content. All chemical properties are continuous variables. Quality is an ordinal
variable ranked from 0 (worst) to 10 (best). Each wine sample is tasted by three independent tasters, and the
final rank assigned is the median of their ranks.
Objectives of the Analysis
The objective is to predict the Quality ranking from the chemical properties of the wines. A predictive model
developed from this data is expected to guide vineyards on the quality and price to expect for their produce
without relying heavily on the variability of wine tasters.
List of Attributes in Data
1. Fixed acidity: most acids involved with wine are fixed or non-volatile (they do not evaporate readily)
2. Volatile acidity: the amount of acetic acid in wine, which at high levels can lead to an unpleasant,
vinegar taste
3. Citric acid: found in small quantities, citric acid can add ‘freshness’ and flavour to wines
4. Residual sugar: the amount of sugar remaining after fermentation stops, it’s rare to find wines with
less than 1 gram/litre and wines with greater than 45 grams/litre are considered sweet
5. Chlorides: the amount of salt in the wine
6. Free sulphur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a
dissolved gas) and bisulphite ion; it prevents microbial growth and the oxidation of wine
7. Total sulphur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly
undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the
nose and taste of wine
8. Density: the density of wine is close to that of water depending on the percent alcohol and sugar
content
9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most
wines are between 3-4 on the pH scale.
10. Sulphates: a wine additive which can contribute to sulphur dioxide gas (SO2) levels, which acts as an
antimicrobial and antioxidant.
11. Alcohol: the percent alcohol content of the wine
12. Quality: output variable (based on sensory data, score between 0 and 10)
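The report works with a data frame `wine.df` but does not show how it was loaded. A minimal sketch, assuming the data come from a semicolon-separated CSV such as `winequality-red.csv` (the file name and the ";" separator are assumptions here): `read.csv()` turns header names like "fixed acidity" into the dotted names used throughout the report, and the helper checks that all 12 attributes listed above are present. The later calls with bare column names, e.g. `chisq.test(quality, alcohol)`, suggest the data frame was also attached; `with(wine.df, ...)` is an alternative that avoids `attach()`.

```r
# Hypothetical loader: the file name and ";" separator are assumptions.
load_wine <- function(path) {
  # check.names = TRUE (the default) turns "fixed acidity" into fixed.acidity
  df <- read.csv(path, sep = ";")
  expected <- c("fixed.acidity", "volatile.acidity", "citric.acid",
                "residual.sugar", "chlorides", "free.sulfur.dioxide",
                "total.sulfur.dioxide", "density", "pH", "sulphates",
                "alcohol", "quality")
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  df
}

# Usage (path is hypothetical):
# wine.df <- load_wine("winequality-red.csv")
# dim(wine.df)   # the report describes 1599 rows and 12 columns
```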
Analysis of Data
1. Basic Statistics
summary(wine.df)
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
1. The alcohol content varies from 8.40 to 14.90 across the samples in the dataset.
2. The quality of the samples ranges from 3 to 8, with a median of 6, so the extreme grades (0-2 and 9-10) never occur.
3. Fixed acidity spans a wide range, from a minimum of 4.6 to a maximum of 15.9.
4. pH varies from 2.740 to 4.010, with a median of 3.310.
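The ranges and medians quoted above can also be read off programmatically rather than from the printed summary. A small sketch; the helper name `range_table` is illustrative and not part of the original analysis:

```r
# Return one row per column with its minimum, median and maximum.
range_table <- function(df) {
  summ <- sapply(df, function(x) c(min = min(x), median = median(x), max = max(x)))
  t(summ)  # sapply puts the statistics in rows; transpose to one row per variable
}

# Usage: range_table(wine.df)["alcohol", ] should match the 8.40 / 10.20 / 14.90
# reported by summary(wine.df) above.
```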
2. Histogram Plot Analysis
1. The quality ratings for red wine peak at approximately 5.
2. pH appears roughly normally distributed, with most samples between 3.0 and 3.6.
3. Free sulfur dioxide lies mostly between 1 and 60, peaking around 10.
4. Total sulfur dioxide spreads between 0 and 300, peaking around 50.
5. Alcohol content varies from 8.4 to 14.9, with the main peak around 9.
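The histograms themselves are not reproduced in this text. One way to redraw them in base R is sketched below; the panel layout and the helper name `plot_histograms` are assumptions, not the original plotting code.

```r
# Draw one histogram per column of a numeric data frame on a grid of panels.
plot_histograms <- function(df, n_cols = 4) {
  old_par <- par(mfrow = c(ceiling(ncol(df) / n_cols), n_cols))
  on.exit(par(old_par))  # restore the previous plot layout when done
  for (name in names(df)) {
    hist(df[[name]], main = name, xlab = name, col = "grey")
  }
  invisible(names(df))
}

# Usage: plot_histograms(wine.df)   # 12 variables fill a 3 x 4 grid
```

Because quality only takes whole-number values, `barplot(table(wine.df$quality))` may show its peak more directly than a histogram.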
3. Correlation Matrix and Correlogram
#Correlation Matrix
cor(wine.df)
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.256130895 0.67170343
## volatile.acidity -0.25613089 1.000000000 -0.55249568
## citric.acid 0.67170343 -0.552495685 1.00000000
## residual.sugar 0.11477672 0.001917882 0.14357716
## chlorides 0.09370519 0.061297772 0.20382291
## free.sulfur.dioxide -0.15379419 -0.010503827 -0.06097813
## total.sulfur.dioxide -0.11318144 0.076470005 0.03553302
## density 0.66804729 0.022026232 0.36494718
## pH -0.68297819 0.234937294 -0.54190414
## sulphates 0.18300566 -0.260986685 0.31277004
## alcohol -0.06166827 -0.202288027 0.10990325
## quality 0.12405165 -0.390557780 0.22637251
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.114776724 0.093705186 -0.153794193
## volatile.acidity 0.001917882 0.061297772 -0.010503827
## citric.acid 0.143577162 0.203822914 -0.060978129
## residual.sugar 1.000000000 0.055609535 0.187048995
## chlorides 0.055609535 1.000000000 0.005562147
## free.sulfur.dioxide 0.187048995 0.005562147 1.000000000
## total.sulfur.dioxide 0.203027882 0.047400468 0.667666450
## density 0.355283371 0.200632327 -0.021945831
## pH -0.085652422 -0.265026131 0.070377499
## sulphates 0.005527121 0.371260481 0.051657572
## alcohol 0.042075437 -0.221140545 -0.069408354
## quality 0.013731637 -0.128906560 -0.050656057
## total.sulfur.dioxide density pH
## fixed.acidity -0.11318144 0.66804729 -0.68297819
## volatile.acidity 0.07647000 0.02202623 0.23493729
## citric.acid 0.03553302 0.36494718 -0.54190414
## residual.sugar 0.20302788 0.35528337 -0.08565242
## chlorides 0.04740047 0.20063233 -0.26502613
## free.sulfur.dioxide 0.66766645 -0.02194583 0.07037750
## total.sulfur.dioxide 1.00000000 0.07126948 -0.06649456
## density 0.07126948 1.00000000 -0.34169933
## pH -0.06649456 -0.34169933 1.00000000
## sulphates 0.04294684 0.14850641 -0.19664760
## alcohol -0.20565394 -0.49617977 0.20563251
## quality -0.18510029 -0.17491923 -0.05773139
## sulphates alcohol quality
## fixed.acidity 0.183005664 -0.06166827 0.12405165
## volatile.acidity -0.260986685 -0.20228803 -0.39055778
## citric.acid 0.312770044 0.10990325 0.22637251
## residual.sugar 0.005527121 0.04207544 0.01373164
## chlorides 0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide 0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide 0.042946836 -0.20565394 -0.18510029
## density 0.148506412 -0.49617977 -0.17491923
## pH -0.196647602 0.20563251 -0.05773139
## sulphates 1.000000000 0.09359475 0.25139708
## alcohol 0.093594750 1.00000000 0.47616632
## quality 0.251397079 0.47616632 1.00000000
#Correlogram
library(corrgram)
corrgram(wine.df, order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie,
         text.panel = panel.txt, main = "Red Wine Quality")
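1. Free SO2: noticeable positive correlation with total SO2 (r = 0.67) and residual sugar (r = 0.19); weak
negative correlation with alcohol (r = -0.07) and almost none with pH (r = 0.07).
2. Total SO2: positive correlation with free SO2 and residual sugar (r = 0.20); negative correlation with
alcohol (r = -0.21).
3. pH: positive correlation with alcohol (r = 0.21) and volatile acidity (r = 0.23); strong negative
correlation with fixed acidity (r = -0.68) and citric acid (r = -0.54); weak negative correlation with
residual sugar (r = -0.09) and total SO2 (r = -0.07).
4. Alcohol: positive correlation with pH and quality (r = 0.48); negative correlation with density
(r = -0.50), total and free SO2, and chlorides (r = -0.22).
5. Quality: positive correlation with alcohol (r = 0.48) and, to a lesser extent, sulphates (r = 0.25) and
citric acid (r = 0.23); negative correlation with volatile acidity (r = -0.39), density (r = -0.17) and
chlorides (r = -0.13).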
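To rank the predictors by the strength of their linear association with quality, the quality column of the correlation matrix can be sorted by absolute value. A sketch, with a hypothetical helper name:

```r
# Correlations of every other column with the target, strongest first.
quality_correlations <- function(df, target = "quality") {
  r <- cor(df)[, target]
  r <- r[names(r) != target]          # drop the trivial self-correlation
  r[order(abs(r), decreasing = TRUE)] # sort by strength, keep the sign
}

# Usage: quality_correlations(wine.df)
```

Applied to the matrix printed above, this order starts with alcohol (0.48), volatile acidity (-0.39), sulphates (0.25) and citric acid (0.23), and ends with residual sugar (0.01).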
Alcohol Analysis - Scatter Plots
1. Alcohol content shows no strong bias, although some red wine samples have noticeably higher alcohol
content than the rest.
2. The pH scatter plot shows pH tending to rise with alcohol, although the correlation is weak (r = 0.21).
3. Total SO2 content decreases as alcohol content increases (r = -0.21).
4. Free SO2 content also decreases as alcohol content increases, but only slightly (r = -0.07).
pH Analysis - Scatter Plots
1. No clear relation is established between quality and pH (r = -0.06).
2. pH and total sulphur dioxide are widely scattered with no clear trend (r = -0.07), with total SO2 mostly
ranging up to about 150.
3. pH and free sulphur dioxide are likewise widely scattered with no clear trend (r = 0.07).
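The scatter plots are not reproduced in this text. When one axis is quality, which only takes whole-number values, points pile up on a few horizontal lines; a small vertical jitter and a fitted least-squares line make any trend easier to see. A sketch in base R; the helper name and the jitter amount are assumptions:

```r
# Scatter plot of y against x with optional vertical jitter and a fitted line.
scatter_vs <- function(df, x, y, jitter_amount = 0) {
  yy <- df[[y]]
  if (jitter_amount > 0) yy <- jitter(yy, amount = jitter_amount)  # for discrete y
  plot(df[[x]], yy, xlab = x, ylab = y, main = paste(y, "vs", x),
       pch = 16, col = rgb(0, 0, 0, 0.3))
  fit <- lm(df[[y]] ~ df[[x]])  # fitted on the un-jittered values
  abline(fit, col = "red")
  invisible(cor(df[[x]], df[[y]]))  # Pearson r, for comparison with the matrix
}

# Usage:
# scatter_vs(wine.df, "alcohol", "pH")
# scatter_vs(wine.df, "pH", "quality", jitter_amount = 0.2)
```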
Hypothesis Testing
##Hypothesis 1
#A higher alcohol content and lower fixed acidity tend to go with a higher quality wine. Why is this?
Sol.
I will use heat maps and chi-square tests to assess this hypothesis.
#HeatMap
#Chi-Sq on Quality and Alcohol
chisq.test(quality, alcohol)
## data: quality and alcohol
## X-squared = 1124.5, df = 320, p-value < 2.2e-16
#Chi-Sq on Quality and Fixed Acidity
chisq.test(quality, fixed.acidity)
## data: quality and fixed.acidity
## X-squared = 736.08, df = 475, p-value = 1.416e-13
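The chi-square tests show a significant association between quality and both alcohol and fixed acidity, but
they do not show its direction. The correlation matrix shows that quality rises with alcohol (r = 0.48) but
also rises slightly with fixed acidity (r = 0.12), so the data support the alcohol half of this hypothesis
but not the lower-fixed-acidity half.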
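In `chisq.test(quality, alcohol)` every distinct alcohol value becomes its own category: df = 320 = (6 - 1) x (65 - 1), so the 1599 samples are spread over 390 cells and many expected counts are very small. R issues a "Chi-squared approximation may be incorrect" warning in this situation, and the p-values are then unreliable. A sketch of one common alternative, binning the continuous variable into quartiles before testing; the helper name and the choice of four groups are assumptions, not the report's method:

```r
# Cut a continuous predictor at its quantiles and test independence from quality.
binned_chisq <- function(quality, x, groups = 4) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = groups + 1)))
  x_bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  tab <- table(quality, x_bin)
  list(table = tab, test = chisq.test(tab))
}

# Usage: binned_chisq(wine.df$quality, wine.df$alcohol)$test
```

The rarest grades (3 and 8) can still leave small expected counts in some cells; merging neighbouring grades is a further option.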
##Hypothesis 2
#Higher quality wine tends to have lower residual sugar and lower citric acid. Why is this?
#HeatMap
#Chi-Sq on Quality and Citric Acid
chisq.test(quality, citric.acid)
## data: quality and citric.acid
## X-squared = 695.82, df = 395, p-value < 2.2e-16
#Chi-Sq on Quality and Residual Sugar
chisq.test(quality, residual.sugar)
## data: quality and residual.sugar
## X-squared = 864.79, df = 450, p-value < 2.2e-16
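This hypothesis is not supported: the heat map and the correlation matrix show that quality tends to rise
with citric acid (r = 0.23) and is essentially unrelated to residual sugar (r = 0.01).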
##Hypothesis 3
#Does lower sulfur content make wine higher quality?
#ScatterPlot
plot(sulphates, quality, ylab = "Quality", xlab = "Sulphates", main = "Quality vs Sulphates")
#Chi-Sq on Quality and Sulphates
chisq.test(quality, sulphates)
## data: quality and sulphates
## X-squared = 925.78, df = 475, p-value < 2.2e-16
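This hypothesis is not supported for sulphates: quality is positively correlated with sulphates (r = 0.25),
and sulphates enter the final regression model with a positive coefficient. Total sulphur dioxide, in
contrast, is negatively correlated with quality (r = -0.19).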
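A chi-square test only says whether two variables are associated, not whether higher values of one go with higher or lower values of the other, which is what all three hypotheses are about. A Spearman rank correlation test gives a signed answer and suits an ordinal outcome such as quality. A sketch with a hypothetical helper name; `exact = FALSE` requests the asymptotic p-value, since an exact one cannot be computed when values are tied:

```r
# Spearman rank correlation of each variable with quality, plus its p-value.
direction_test <- function(df, vars, target = "quality") {
  res <- sapply(vars, function(v) {
    ct <- cor.test(df[[v]], df[[target]], method = "spearman", exact = FALSE)
    c(rho = unname(ct$estimate), p.value = ct$p.value)
  })
  t(res)  # one row per variable; positive rho means higher values go with higher quality
}

# Usage:
# direction_test(wine.df, c("alcohol", "fixed.acidity", "citric.acid",
#                           "residual.sugar", "sulphates", "total.sulfur.dioxide"))
```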
Linear Regression Models and Testing
#Test Model 1
model1 <- lm( quality ~ alcohol)
summary(model1)
##
## Call:
## lm(formula = quality ~ alcohol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8442 -0.4112 -0.1690 0.5166 2.5888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.87497 0.17471 10.73 <2e-16 ***
## alcohol 0.36084 0.01668 21.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7104 on 1597 degrees of freedom
## Multiple R-squared: 0.2267, Adjusted R-squared: 0.2263
## F-statistic: 468.3 on 1 and 1597 DF, p-value: < 2.2e-16
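The p-value (< 2e-16) and the significance stars confirm that alcohol is a significant predictor; on its own,
alcohol explains about 23% of the variation in quality (R-squared = 0.2267).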
add1(model1, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
##
## Model:
## quality ~ alcohol
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 805.87 -1091.7
## volatile.acidity 1 94.074 711.80 -1288.1 210.9346 < 2.2e-16 ***
## citric.acid 1 31.953 773.92 -1154.3 65.8949 9.408e-16 ***
## residual.sugar 1 0.041 805.83 -1089.7 0.0822 0.774437
## chlorides 1 0.611 805.26 -1090.9 1.2103 0.271443
## free.sulfur.dioxide 1 0.325 805.55 -1090.3 0.6431 0.422696
## total.sulfur.dioxide 1 8.270 797.60 -1106.2 16.5475 4.976e-05 ***
## density 1 5.203 800.67 -1100.0 10.3708 0.001306 **
## pH 1 26.362 779.51 -1142.8 53.9749 3.226e-13 ***
## sulphates 1 44.977 760.89 -1181.5 94.3399 < 2.2e-16 ***
## quality 0 0.000 805.87 -1091.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1() reports an F test for adding each remaining variable to this model one at a time, showing which other
factors could also be significant.
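The two warnings in the add1() output above have a specific cause. When `scope` is a data frame, R converts it to a formula whose response is the first column, here `fixed.acidity ~ volatile.acidity + ... + quality`. As a result `fixed.acidity` is never offered as a candidate term, and `quality` appears on the right-hand side (the `quality 0 0.000` row); the later add1() calls have the same gap. A sketch that builds the scope explicitly from the predictor names; the helper name is hypothetical:

```r
# Build "quality ~ <every other column>" so add1() sees all candidate terms
# and never finds the response among them.
full_scope <- function(df, response = "quality") {
  predictors <- setdiff(names(df), response)
  reformulate(predictors, response = response)
}

# Usage:
# add1(model1, scope = full_scope(wine.df), test = "F")
```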
#Test Model 7
model7 <- lm(quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid +
               chlorides + sulphates + volatile.acidity)
summary(model7)
##
## Call:
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide +
## citric.acid + chlorides + sulphates + volatile.acidity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.58632 -0.36679 -0.04584 0.45297 1.95470
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6134833 0.4607493 10.013 < 2e-16 ***
## alcohol 0.2951742 0.0171178 17.244 < 2e-16 ***
## pH -0.5247565 0.1328432 -3.950 8.15e-05 ***
## total.sulfur.dioxide -0.0023114 0.0005082 -4.549 5.81e-06 ***
## citric.acid -0.1670682 0.1207391 -1.384 0.167
## chlorides -1.9153285 0.4028925 -4.754 2.17e-06 ***
## sulphates 0.8994970 0.1102877 8.156 6.96e-16 ***
## volatile.acidity -1.1146326 0.1145923 -9.727 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6485 on 1591 degrees of freedom
## Multiple R-squared: 0.3579, Adjusted R-squared: 0.3551
## F-statistic: 126.7 on 7 and 1591 DF, p-value: < 2.2e-16
add1(model7, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
##
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid +
## chlorides + sulphates + volatile.acidity
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 669.13 -1377.0
## residual.sugar 1 0.41979 668.71 -1376.0 0.9982 0.3179
## free.sulfur.dioxide 1 2.06369 667.06 -1379.9 4.9190 0.0267 *
## density 1 0.05573 669.07 -1375.1 0.1324 0.7160
## quality 0 0.00000 669.13 -1377.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Test Model 8
model8 <- lm(quality ~ alcohol + pH + total.sulfur.dioxide + chlorides +
               sulphates + volatile.acidity)
summary(model8)
This is the final model: it is model 7 without citric acid, which was not significant there (p = 0.167).
##
## Call:
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide +
## chlorides + sulphates + volatile.acidity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.60575 -0.35883 -0.04806 0.46079 1.95643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.2957316 0.3995603 10.751 < 2e-16 ***
## alcohol 0.2906738 0.0168108 17.291 < 2e-16 ***
## pH -0.4351830 0.1160368 -3.750 0.000183 ***
## total.sulfur.dioxide -0.0023721 0.0005064 -4.684 3.05e-06 ***
## chlorides -2.0022839 0.3980757 -5.030 5.46e-07 ***
## sulphates 0.8886802 0.1100419 8.076 1.31e-15 ***
## volatile.acidity -1.0381945 0.1004270 -10.338 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6487 on 1592 degrees of freedom
## Multiple R-squared: 0.3572, Adjusted R-squared: 0.3548
## F-statistic: 147.4 on 6 and 1592 DF, p-value: < 2.2e-16
In this final model all six predictors are significant at the 0.001 level.
add1(model8, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
##
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + chlorides + sulphates +
## volatile.acidity
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 669.93 -1377.1
## citric.acid 1 0.80525 669.13 -1377.0 1.9147 0.16664
## residual.sugar 1 0.28390 669.65 -1375.7 0.6745 0.41161
## free.sulfur.dioxide 1 2.39413 667.54 -1380.8 5.7061 0.01702 *
## density 1 0.04468 669.89 -1375.2 0.1061 0.74465
## quality 0 0.00000 669.93 -1377.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
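Model 8 is model 7 with citric acid removed, so the two models are nested and can be compared directly with an F test through anova(). A sketch; the wrapper name is hypothetical:

```r
# F test for the extra term(s) in the larger of two nested linear models.
compare_nested <- function(small, large) {
  tab <- anova(small, large)  # row 2 compares large against small
  c(F = tab$F[2], p.value = tab$`Pr(>F)`[2])
}

# Usage: compare_nested(model8, model7)
```

When the models differ by a single term, this F statistic equals the square of that term's t value in the larger model, so it should agree with the citric.acid row of add1(model8, ...) above (1.384^2 is about 1.91).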
Conclusion
A limitation of the current analysis is that the data consist of samples from a single Portuguese region. It
would be interesting to obtain datasets from various wine-making regions to eliminate any bias created by the
specific qualities of this region's product.
Regression Equation
Using the model 8 coefficients (rounded to three significant figures):
Quality = 4.30 + 0.291*alcohol + 0.889*sulphates - 0.435*pH - 0.00237*total.sulfur.dioxide
          - 2.00*chlorides - 1.04*volatile.acidity
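Hence quality is positively associated with alcohol and sulphates and negatively associated with pH, total
SO2, chlorides and volatile acidity. Together these six variables explain about 36% of the variation in
quality (adjusted R-squared = 0.3548).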
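As a check, the regression equation can be evaluated directly. A sketch using the unrounded coefficients printed by summary(model8); the function and vector names are hypothetical. Because a least-squares fit with an intercept passes through the point of means, plugging in the sample means from summary(wine.df) should return roughly the mean quality of 5.636 (the small gap comes from the rounded means):

```r
# Coefficients copied from summary(model8) above.
model8_coef <- c(intercept = 4.2957316, alcohol = 0.2906738, pH = -0.4351830,
                 total.sulfur.dioxide = -0.0023721, chlorides = -2.0022839,
                 sulphates = 0.8886802, volatile.acidity = -1.0381945)

# Predicted quality for one wine from its six chemical measurements.
predict_quality <- function(alcohol, pH, total.sulfur.dioxide, chlorides,
                            sulphates, volatile.acidity, b = model8_coef) {
  unname(b["intercept"] + b["alcohol"] * alcohol + b["pH"] * pH +
           b["total.sulfur.dioxide"] * total.sulfur.dioxide +
           b["chlorides"] * chlorides + b["sulphates"] * sulphates +
           b["volatile.acidity"] * volatile.acidity)
}

# Evaluate at the sample means from summary(wine.df).
predict_quality(alcohol = 10.42, pH = 3.311, total.sulfur.dioxide = 46.47,
                chlorides = 0.08747, sulphates = 0.6581, volatile.acidity = 0.5278)
# about 5.635
```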

More Related Content

What's hot

Fortified Wine Production Swe
Fortified Wine Production SweFortified Wine Production Swe
Fortified Wine Production Swe
MGM Sommelier
 
Ripasso esame Sommelier - Italia e Mondo
Ripasso esame Sommelier -  Italia e MondoRipasso esame Sommelier -  Italia e Mondo
Ripasso esame Sommelier - Italia e Mondo
Veronica Montanelli
 

What's hot (20)

Vodka
VodkaVodka
Vodka
 
El vino y el sommelier
El vino y el sommelierEl vino y el sommelier
El vino y el sommelier
 
French wine
French wineFrench wine
French wine
 
Wine of france
Wine of franceWine of france
Wine of france
 
Champagne presentation
Champagne   presentationChampagne   presentation
Champagne presentation
 
Food & Wine Harmony f and b.pdf
Food & Wine Harmony f and b.pdfFood & Wine Harmony f and b.pdf
Food & Wine Harmony f and b.pdf
 
Bordeaux
BordeauxBordeaux
Bordeaux
 
Tipos de vinos
Tipos de vinosTipos de vinos
Tipos de vinos
 
Vodka
VodkaVodka
Vodka
 
Fortified Wine Production Swe
Fortified Wine Production SweFortified Wine Production Swe
Fortified Wine Production Swe
 
rum in history and today & effects
 rum in history and today & effects rum in history and today & effects
rum in history and today & effects
 
Vodka
VodkaVodka
Vodka
 
Classifiers for Predicting Wine Quality
Classifiers for Predicting Wine QualityClassifiers for Predicting Wine Quality
Classifiers for Predicting Wine Quality
 
Italian wine
Italian wine Italian wine
Italian wine
 
Spanish Wines
Spanish WinesSpanish Wines
Spanish Wines
 
Vodka
VodkaVodka
Vodka
 
All about Bordeaux Wines
All about Bordeaux WinesAll about Bordeaux Wines
All about Bordeaux Wines
 
F & B Service Notes for 2nd year Hotel Management Students: Chap 04. sparklin...
F & B Service Notes for 2nd year Hotel Management Students: Chap 04. sparklin...F & B Service Notes for 2nd year Hotel Management Students: Chap 04. sparklin...
F & B Service Notes for 2nd year Hotel Management Students: Chap 04. sparklin...
 
Ripasso esame Sommelier - Italia e Mondo
Ripasso esame Sommelier -  Italia e MondoRipasso esame Sommelier -  Italia e Mondo
Ripasso esame Sommelier - Italia e Mondo
 
Burgundy wine introduction
Burgundy wine introductionBurgundy wine introduction
Burgundy wine introduction
 

Similar to Red_wine_final_report

Wine Taste Preference Modeling Based On Physicochemical Tests_ShuaiWei
Wine Taste Preference Modeling Based On Physicochemical Tests_ShuaiWeiWine Taste Preference Modeling Based On Physicochemical Tests_ShuaiWei
Wine Taste Preference Modeling Based On Physicochemical Tests_ShuaiWei
Shuai Wei
 
5228_Leeder Wine Bro-low res
5228_Leeder Wine Bro-low res5228_Leeder Wine Bro-low res
5228_Leeder Wine Bro-low res
Dr John Leeder
 
Product profile
Product profileProduct profile
Product profile
K.K. Kumar
 
ACS NERM 2013 Sour Beer - NMR Talk
ACS NERM 2013   Sour Beer - NMR TalkACS NERM 2013   Sour Beer - NMR Talk
ACS NERM 2013 Sour Beer - NMR Talk
John Edwards
 

Similar to Red_wine_final_report (20)

pdf.pdf
pdf.pdfpdf.pdf
pdf.pdf
 
Red wine
Red wineRed wine
Red wine
 
Wine Taste Preference Modeling Based On Physicochemical Tests_ShuaiWei
Wine Taste Preference Modeling Based On Physicochemical Tests_ShuaiWeiWine Taste Preference Modeling Based On Physicochemical Tests_ShuaiWei
Wine Taste Preference Modeling Based On Physicochemical Tests_ShuaiWei
 
5228_Leeder Wine Bro-low res
5228_Leeder Wine Bro-low res5228_Leeder Wine Bro-low res
5228_Leeder Wine Bro-low res
 
Team_Random
Team_RandomTeam_Random
Team_Random
 
Wine Quality
Wine QualityWine Quality
Wine Quality
 
ABHISHEK S2 FA FERMENTATION food analysis
ABHISHEK S2 FA FERMENTATION food analysisABHISHEK S2 FA FERMENTATION food analysis
ABHISHEK S2 FA FERMENTATION food analysis
 
Analysis of beer
Analysis of beer Analysis of beer
Analysis of beer
 
Practical White Wine Production: Theory and Practice
Practical White Wine Production: Theory and PracticePractical White Wine Production: Theory and Practice
Practical White Wine Production: Theory and Practice
 
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
Determination of Wine Color and Total Phenol Content using the LAMBDA PDA UV/...
 
hw5report
hw5reporthw5report
hw5report
 
Product profile
Product profileProduct profile
Product profile
 
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
2018 Oregon Wine Symposium | Understanding Control Points from Crush Pad to B...
 
Analysis of fermentation products of (2) (1)
Analysis of fermentation products of (2) (1)Analysis of fermentation products of (2) (1)
Analysis of fermentation products of (2) (1)
 
Alcoholic beverages
Alcoholic beveragesAlcoholic beverages
Alcoholic beverages
 
Presentation of CDR WineLab®, Wine Analysis System
Presentation of CDR WineLab®, Wine Analysis SystemPresentation of CDR WineLab®, Wine Analysis System
Presentation of CDR WineLab®, Wine Analysis System
 
BDI Dec 2016 Sodium
BDI Dec 2016 SodiumBDI Dec 2016 Sodium
BDI Dec 2016 Sodium
 
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantinaCDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina
CDR WineLab®: controllare, intervenire e migliorare la vinificazione in cantina
 
ACS NERM 2013 Sour Beer - NMR Talk
ACS NERM 2013   Sour Beer - NMR TalkACS NERM 2013   Sour Beer - NMR Talk
ACS NERM 2013 Sour Beer - NMR Talk
 
Tester and rapid kit for analysis catalog
Tester and rapid kit for analysis catalog Tester and rapid kit for analysis catalog
Tester and rapid kit for analysis catalog
 

Recently uploaded

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

Red_wine_final_report

  • 1. Capstone Project- Wine Quality Analysis
#ALL LINES IN THIS COLOUR THROUGHOUT THE REPORT ARE INFERENCES FROM THE ANALYSIS DONE ABOVE THAT LINE IN THE REPORT
Overview
We consider a set of observations on a number of red wine samples, covering their chemical properties and the rankings given by tasters. The wine industry has grown recently as social drinking has risen. The price of a wine depends partly on a rather abstract concept, its appreciation by wine tasters, whose opinions can vary considerably. Another key factor in wine certification and quality assessment is physicochemical testing, which is laboratory-based and takes into account factors such as acidity, pH level, sugar content and other chemical properties. For the wine market, it would be useful if tasters' quality ratings could be related to the chemical properties of wine, so that certification, quality assessment and quality assurance become more controlled processes.
Introduction
The Red Wine dataset contains 1599 samples. All the wines are produced in one particular region of Portugal. Data are collected on 12 properties of each wine: Quality, which is based on sensory data, and 11 chemical properties, including density, acidity and alcohol content. All the chemical properties are continuous variables. Quality is an ordinal variable with possible scores from 0 (worst) to 10 (best). Each wine sample is tasted by three independent tasters, and the final score assigned is the median of their scores.
Objectives of the Analysis
The objective is to predict the Quality ranking from the chemical properties of the wines. A predictive model developed from this data is expected to give vineyards guidance on the quality and price to expect for their produce, without heavy reliance on the variable judgements of wine tasters.
List of Attributes in Data
1. Fixed acidity: most acids involved with wine are fixed or non-volatile (they do not evaporate readily)
2. Volatile acidity: the amount of acetic acid in wine, which at too high levels can lead to an unpleasant, vinegar taste
3. Citric acid: found in small quantities, citric acid can add ‘freshness’ and flavour to wines
4. Residual sugar: the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/litre, and wines with more than 45 grams/litre are considered sweet
5. Chlorides: the amount of salt in the wine
6. Free sulphur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulphite ion; it prevents microbial growth and the oxidation of wine
7. Total sulphur dioxide: the amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8. Density: the density of wine is close to that of water, depending on the percent alcohol and sugar content
9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale.
  • 2. 10. Sulphates: a wine additive that can contribute to sulphur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11. Alcohol: the percent alcohol content of the wine
12. Quality: output variable (based on sensory data, score between 0 and 10)
Analysis of Data
1. Basic Statistics
summary(wine.df)
##  fixed.acidity   volatile.acidity  citric.acid    residual.sugar 
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900 
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900 
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200 
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539 
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600 
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500 
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide
##  Min.   :0.01200   Min.   : 1.00       Min.   :  6.00      
##  1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00      
##  Median :0.07900   Median :14.00       Median : 38.00      
##  Mean   :0.08747   Mean   :15.87       Mean   : 46.47      
##  3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00      
##  Max.   :0.61100   Max.   :72.00       Max.   :289.00      
##     density             pH          sulphates         alcohol     
##  Min.   :0.9901   Min.   :2.740   Min.   :0.3300   Min.   : 8.40  
##  1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50  
##  Median :0.9968   Median :3.310   Median :0.6200   Median :10.20  
##  Mean   :0.9967   Mean   :3.311   Mean   :0.6581   Mean   :10.42  
##  3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10  
##  Max.   :1.0037   Max.   :4.010   Max.   :2.0000   Max.   :14.90  
##     quality     
##  Min.   :3.000  
##  1st Qu.:5.000  
##  Median :6.000  
##  Mean   :5.636  
##  3rd Qu.:6.000  
##  Max.   :8.000  
1. The alcohol content varies from 8.40 to 14.90 across the samples in the dataset.
2. The quality of the samples ranges from 3 to 8, with a median of 6.
3. The range of fixed acidity is quite wide, from a minimum of 4.6 to a maximum of 15.9.
4. pH varies from 2.740 to 4.010, with a median of 3.310.
  • 3. 2. Histogram Plot Analysis
1. The quality ratings of the red wines peak at approximately 5.
2. pH appears roughly normally distributed, with most samples between 3.0 and 3.6.
3. Free sulphur dioxide lies mostly between 1 and 60, peaking around 10.
4. Total sulphur dioxide spreads between 0 and 300, peaking around 50.
5. Alcohol content varies from about 8 to 15, with a major peak around 9.
3. Correlation Matrix, Correlogram and Covariance
#Correlation Matrix
cor(wine.df)
  • 4.
##                      fixed.acidity volatile.acidity citric.acid
## fixed.acidity           1.00000000     -0.256130895  0.67170343
## volatile.acidity       -0.25613089      1.000000000 -0.55249568
## citric.acid             0.67170343     -0.552495685  1.00000000
## residual.sugar          0.11477672      0.001917882  0.14357716
## chlorides               0.09370519      0.061297772  0.20382291
## free.sulfur.dioxide    -0.15379419     -0.010503827 -0.06097813
## total.sulfur.dioxide   -0.11318144      0.076470005  0.03553302
## density                 0.66804729      0.022026232  0.36494718
## pH                     -0.68297819      0.234937294 -0.54190414
## sulphates               0.18300566     -0.260986685  0.31277004
## alcohol                -0.06166827     -0.202288027  0.10990325
## quality                 0.12405165     -0.390557780  0.22637251
##                      residual.sugar    chlorides free.sulfur.dioxide
## fixed.acidity           0.114776724  0.093705186        -0.153794193
## volatile.acidity        0.001917882  0.061297772        -0.010503827
## citric.acid             0.143577162  0.203822914        -0.060978129
## residual.sugar          1.000000000  0.055609535         0.187048995
## chlorides               0.055609535  1.000000000         0.005562147
## free.sulfur.dioxide     0.187048995  0.005562147         1.000000000
## total.sulfur.dioxide    0.203027882  0.047400468         0.667666450
## density                 0.355283371  0.200632327        -0.021945831
## pH                     -0.085652422 -0.265026131         0.070377499
## sulphates               0.005527121  0.371260481         0.051657572
## alcohol                 0.042075437 -0.221140545        -0.069408354
## quality                 0.013731637 -0.128906560        -0.050656057
##                      total.sulfur.dioxide     density          pH
## fixed.acidity                 -0.11318144  0.66804729 -0.68297819
## volatile.acidity               0.07647000  0.02202623  0.23493729
## citric.acid                    0.03553302  0.36494718 -0.54190414
## residual.sugar                 0.20302788  0.35528337 -0.08565242
## chlorides                      0.04740047  0.20063233 -0.26502613
## free.sulfur.dioxide            0.66766645 -0.02194583  0.07037750
## total.sulfur.dioxide           1.00000000  0.07126948 -0.06649456
## density                        0.07126948  1.00000000 -0.34169933
## pH                            -0.06649456 -0.34169933  1.00000000
## sulphates                      0.04294684  0.14850641 -0.19664760
## alcohol                       -0.20565394 -0.49617977  0.20563251
## quality                       -0.18510029 -0.17491923 -0.05773139
##                         sulphates     alcohol     quality
## fixed.acidity         0.183005664 -0.06166827  0.12405165
## volatile.acidity     -0.260986685 -0.20228803 -0.39055778
## citric.acid           0.312770044  0.10990325  0.22637251
## residual.sugar        0.005527121  0.04207544  0.01373164
## chlorides             0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide   0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide  0.042946836 -0.20565394 -0.18510029
## density               0.148506412 -0.49617977 -0.17491923
## pH                   -0.196647602  0.20563251 -0.05773139
## sulphates             1.000000000  0.09359475  0.25139708
## alcohol               0.093594750  1.00000000  0.47616632
## quality               0.251397079  0.47616632  1.00000000
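As a quick check on which chemical properties matter most, the quality column of the correlation matrix above can be ranked by absolute value. The sketch below copies those eleven values by hand instead of recomputing them, so it runs without the dataset; with `wine.df` loaded, the same vector would be `cor(wine.df)[, "quality"]` minus its own "quality" entry. The small data frame `toy` at the end is hypothetical and only illustrates that `cov2cor()` turns a covariance matrix into the correlation matrix, which is how the covariance and correlation views in this section relate.

```r
# Correlations with quality, copied from the cor(wine.df) output above
quality_cor <- c(
  fixed.acidity        =  0.12405165,
  volatile.acidity     = -0.39055778,
  citric.acid          =  0.22637251,
  residual.sugar       =  0.01373164,
  chlorides            = -0.12890656,
  free.sulfur.dioxide  = -0.05065606,
  total.sulfur.dioxide = -0.18510029,
  density              = -0.17491923,
  pH                   = -0.05773139,
  sulphates            =  0.25139708,
  alcohol              =  0.47616632
)

# Rank predictors by the strength (absolute value) of their correlation with quality
ranked <- quality_cor[order(abs(quality_cor), decreasing = TRUE)]
print(round(ranked, 3))

# Covariance and correlation are linked: cor(x, y) = cov(x, y) / (sd(x) * sd(y)).
# Hypothetical toy data, used only to illustrate cov2cor().
set.seed(42)
toy <- data.frame(a = rnorm(50), b = rnorm(50))
toy$c <- toy$a + rnorm(50)
print(all.equal(cor(toy), cov2cor(cov(toy))))
```

The top three by this ranking, alcohol, volatile acidity and sulphates, are also the predictors with the largest t values in the final regression model.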
  • 5. #Correlogram
library(corrgram)
corrgram(wine.df, order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie,
         text.panel = panel.txt, main = "Red Wine Quality")
1. Free SO2: noticeable positive correlation with total SO2 (0.67) and residual sugar (0.19); weak negative correlation with fixed acidity (-0.15) and alcohol (-0.07).
2. Total SO2: positive correlation with free SO2 (0.67) and residual sugar (0.20); negative correlation with alcohol (-0.21).
3. pH: positive correlation with volatile acidity (0.23) and alcohol (0.21); negative correlation with fixed acidity (-0.68) and citric acid (-0.54), and weakly with residual sugar (-0.09) and total SO2 (-0.07).
4. Alcohol: positive correlation with quality (0.48) and pH (0.21); negative correlation with density (-0.50), chlorides (-0.22), total SO2 (-0.21) and free SO2 (-0.07).
5. Quality: positive correlation with alcohol (0.48), sulphates (0.25) and citric acid (0.23); negative correlation with volatile acidity (-0.39), density (-0.17) and chlorides (-0.13).
  • 6. Alcohol Analysis - Scatter Plots
1. There seems to be no significant bias in alcohol content, even though some red wine samples have a higher alcohol content.
2. The pH scatter plot shows an interesting positive correlation between pH and alcohol, although it is weak (r = 0.21).
3. Total SO2 content tends to decrease as alcohol content increases (r = -0.21).
4. Free SO2 content decreases only slightly as alcohol content increases (r = -0.07).
  • 7. pH Analysis - Scatter Plots
1. No clear relation is established between quality and pH.
2. The points for pH against total sulphur dioxide are widely scattered with no clear pattern; total SO2 mostly stays below about 150.
3. The points for pH against free sulphur dioxide are likewise widely scattered, with no clear pattern.
  • 8. Hypothesis Testing
##Hypothesis 1
#A higher alcohol content and lower fixed acidity tend to mean a higher quality wine. Why is this?
Sol. I will use heatmaps and chi-square tests to evaluate this hypothesis.
#HeatMap
#Chi-Sq on Quality and Alcohol
chisq.test(quality, alcohol)
## data: quality and alcohol
## X-squared = 1124.5, df = 320, p-value < 2.2e-16
#Chi-Sq on Quality and Fixed Acidity
chisq.test(quality, fixed.acidity)
## data: quality and fixed.acidity
## X-squared = 736.08, df = 475, p-value = 1.416e-13
The chi-square tests confirm a significant association between quality and both alcohol and fixed acidity, and the heatmap shows how the samples are distributed. A chi-square test does not show the direction of a relation, however: the correlation matrix shows quality rising with alcohol (r = 0.48), while fixed acidity is weakly positively correlated with quality (r = 0.12), so the data support the alcohol part of the hypothesis more clearly than the fixed-acidity part.
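The degrees of freedom reported above show how `chisq.test()` treats a numeric variable: each distinct value becomes its own category, and df = (rows - 1) × (columns - 1) of the resulting contingency table. Assuming all six quality levels from 3 to 8 occur, df = 320 implies 320 / 5 + 1 = 65 distinct alcohol values, and df = 475 implies 96 distinct fixed-acidity values, so many table cells hold very few samples. The sketch below uses hypothetical simulated vectors (not the wine data) to confirm the df formula. As an assumed alternative that the report does not use, it also bins the continuous variable with `cut()` before testing; the bin edges are arbitrary illustrations.

```r
# Hypothetical simulated data standing in for quality and alcohol
set.seed(1)
n <- 300
quality <- sample(3:8, n, replace = TRUE)
alcohol <- round(runif(n, 8.4, 14.9), 1)

# chisq.test(x, y) cross-tabulates every distinct value of x against every distinct value of y
tab <- table(quality, alcohol)
# Sparse cells trigger an "approximation may be incorrect" warning, silenced here
raw <- suppressWarnings(chisq.test(quality, alcohol))
cat("distinct alcohol values:", ncol(tab), " df:", raw$parameter, "\n")

# Assumed alternative: bin alcohol into a few ranges so each cell holds more samples
alcohol_bin <- cut(alcohol, breaks = c(8, 9.5, 10.5, 11.5, 15))
binned <- suppressWarnings(chisq.test(quality, alcohol_bin))
cat("df after binning:", binned$parameter, "\n")
```

Binning trades resolution for larger expected cell counts, which makes the chi-square approximation more trustworthy.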
  • 9. ##Hypothesis 2
#Higher quality wine tends to have lower residual sugar and lower citric acid. Why is this?
#HeatMap
#Chi-Sq on Quality and Citric Acid
chisq.test(quality, citric.acid)
## data: quality and citric.acid
## X-squared = 695.82, df = 395, p-value < 2.2e-16
#Chi-Sq on Quality and Residual Sugar
chisq.test(quality, residual.sugar)
## data: quality and residual.sugar
## X-squared = 864.79, df = 450, p-value < 2.2e-16
The heatmap observations show that this hypothesis is wrong: although both variables are significantly associated with quality, higher quality does not go with lower citric acid or lower residual sugar (their correlations with quality are 0.23 and 0.01).
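  • 10. ##Hypothesis 3
#Does lower sulfur content make wine higher quality?
#ScatterPlot
plot(sulphates, quality, ylab = "Quality", xlab = "Sulphates", main = "Quality vs Sulphates")
#Chi-Sq on Quality and Sulphates
chisq.test(quality, sulphates)
## data: quality and sulphates
## X-squared = 925.78, df = 475, p-value < 2.2e-16
The chi-square test shows a significant association between quality and sulphates. In the scatter plot, most samples, including most high-quality ones, have low sulphate contents, but this largely reflects the right-skewed distribution of sulphates (median 0.62, maximum 2.00). The correlation between sulphates and quality is positive (r = 0.25), and sulphates has a positive coefficient in the final regression model, so the data do not support the idea that lower sulphate content makes wine higher quality. Total sulphur dioxide, by contrast, is negatively correlated with quality (r = -0.19).
Linear Regression Models and Testing
#Test Model 1
model1 <- lm(quality ~ alcohol)
summary(model1)
## 
## Call:
## lm(formula = quality ~ alcohol)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8442 -0.4112 -0.1690  0.5166  2.5888 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    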
  • 11.
## (Intercept)  1.87497    0.17471   10.73   <2e-16 ***
## alcohol      0.36084    0.01668   21.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7104 on 1597 degrees of freedom
## Multiple R-squared:  0.2267, Adjusted R-squared:  0.2263 
## F-statistic: 468.3 on 1 and 1597 DF,  p-value: < 2.2e-16
The p-value (< 2e-16) and the significance stars confirm that alcohol is a significant predictor of quality.
add1(model1, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
## 
## Model:
## quality ~ alcohol
##                      Df Sum of Sq    RSS     AIC  F value    Pr(>F)    
## <none>                            805.87 -1091.7                       
## volatile.acidity      1    94.074 711.80 -1288.1 210.9346 < 2.2e-16 ***
## citric.acid           1    31.953 773.92 -1154.3  65.8949 9.408e-16 ***
## residual.sugar        1     0.041 805.83 -1089.7   0.0822  0.774437    
## chlorides             1     0.611 805.26 -1090.9   1.2103  0.271443    
## free.sulfur.dioxide   1     0.325 805.55 -1090.3   0.6431  0.422696    
## total.sulfur.dioxide  1     8.270 797.60 -1106.2  16.5475 4.976e-05 ***
## density               1     5.203 800.67 -1100.0  10.3708  0.001306 ** 
## pH                    1    26.362 779.51 -1142.8  53.9749 3.226e-13 ***
## sulphates             1    44.977 760.89 -1181.5  94.3399 < 2.2e-16 ***
## quality               0     0.000 805.87 -1091.7                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using add1(), we check which of the remaining variables would significantly reduce the residual sum of squares (F-test) if added to the model one at a time.
#Test Model 7
model7 <- lm(quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid + chlorides + sulphates + volatile.acidity)
summary(model7)
## 
## Call:
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide + 
##     citric.acid + chlorides + sulphates + volatile.acidity)
## 
## Residuals:
  • 12.
##      Min       1Q   Median       3Q      Max 
## -2.58632 -0.36679 -0.04584  0.45297  1.95470 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4.6134833  0.4607493  10.013  < 2e-16 ***
## alcohol               0.2951742  0.0171178  17.244  < 2e-16 ***
## pH                   -0.5247565  0.1328432  -3.950 8.15e-05 ***
## total.sulfur.dioxide -0.0023114  0.0005082  -4.549 5.81e-06 ***
## citric.acid          -0.1670682  0.1207391  -1.384    0.167    
## chlorides            -1.9153285  0.4028925  -4.754 2.17e-06 ***
## sulphates             0.8994970  0.1102877   8.156 6.96e-16 ***
## volatile.acidity     -1.1146326  0.1145923  -9.727  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6485 on 1591 degrees of freedom
## Multiple R-squared:  0.3579, Adjusted R-squared:  0.3551 
## F-statistic: 126.7 on 7 and 1591 DF,  p-value: < 2.2e-16
add1(model7, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
## 
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid + 
##     chlorides + sulphates + volatile.acidity
##                     Df Sum of Sq    RSS     AIC F value Pr(>F)  
## <none>                           669.13 -1377.0                 
## residual.sugar       1   0.41979 668.71 -1376.0  0.9982 0.3179  
## free.sulfur.dioxide  1   2.06369 667.06 -1379.9  4.9190 0.0267 *
## density              1   0.05573 669.07 -1375.1  0.1324 0.7160  
## quality              0   0.00000 669.13 -1377.0                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Citric acid is not significant in model 7 (p = 0.167), so it is dropped in model 8.
#Test Model 8
model8 <- lm(quality ~ alcohol + pH + total.sulfur.dioxide + chlorides + sulphates + volatile.acidity)
summary(model8)
This is the final model.
## 
## Call:
  • 13.
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide + 
##     chlorides + sulphates + volatile.acidity)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.60575 -0.35883 -0.04806  0.46079  1.95643 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4.2957316  0.3995603  10.751  < 2e-16 ***
## alcohol               0.2906738  0.0168108  17.291  < 2e-16 ***
## pH                   -0.4351830  0.1160368  -3.750 0.000183 ***
## total.sulfur.dioxide -0.0023721  0.0005064  -4.684 3.05e-06 ***
## chlorides            -2.0022839  0.3980757  -5.030 5.46e-07 ***
## sulphates             0.8886802  0.1100419   8.076 1.31e-15 ***
## volatile.acidity     -1.0381945  0.1004270 -10.338  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6487 on 1592 degrees of freedom
## Multiple R-squared:  0.3572, Adjusted R-squared:  0.3548 
## F-statistic: 147.4 on 6 and 1592 DF,  p-value: < 2.2e-16
In this final model, all the predictors are significant (p < 0.001).
add1(model8, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
## 
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + chlorides + sulphates + 
##     volatile.acidity
##                     Df Sum of Sq    RSS     AIC F value  Pr(>F)  
## <none>                           669.93 -1377.1                  
## citric.acid          1   0.80525 669.13 -1377.0  1.9147 0.16664  
## residual.sugar       1   0.28390 669.65 -1375.7  0.6745 0.41161  
## free.sulfur.dioxide  1   2.39413 667.54 -1380.8  5.7061 0.01702 *
## density              1   0.04468 669.89 -1375.2  0.1061 0.74465  
## quality              0   0.00000 669.93 -1377.1                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
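The F values and AIC figures in the `add1()` tables, and the fit statistics in `summary(model8)`, can be reproduced by hand from the printed numbers, which makes the model comparison easier to follow. The sketch below uses only values copied from the output above; the helper names `partial_f()` and `lm_aic()` are hypothetical. It assumes the standard formulas for linear models: the partial F statistic for adding one term, AIC on the scale that `extractAIC()` uses for `lm` fits (n·log(RSS/n) + 2k, which is what `add1()` reports), adjusted R² and the residual standard error.

```r
n <- 1599  # number of wine samples

# Partial F-test for adding one term:
# (RSS_old - RSS_new) / (RSS_new / residual df of the larger model)
partial_f <- function(rss_old, rss_new, k_new, n) {
  (rss_old - rss_new) / (rss_new / (n - k_new))
}

# AIC on the extractAIC() scale for lm fits; k = number of coefficients
lm_aic <- function(rss, k, n) n * log(rss / n) + 2 * k

# model1 (quality ~ alcohol, k = 2) plus volatile.acidity (k = 3), from add1(model1, ...)
f_va     <- partial_f(805.87, 711.80, k_new = 3, n = n)  # about 210.9
aic_base <- lm_aic(805.87, k = 2, n = n)                 # about -1091.7
aic_va   <- lm_aic(711.80, k = 3, n = n)                 # about -1288.1

# Fit statistics of model8 (6 predictors + intercept), from summary(model8)
r2     <- 0.3572
p      <- 6
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)  # about 0.3548
rse    <- sqrt(669.93 / (n - p - 1))            # about 0.6487

print(round(c(F = f_va, AIC_base = aic_base, AIC_va = aic_va,
              adj_R2 = adj_r2, RSE = rse), 4))
```

The same formula explains the last table: `partial_f(669.93, 667.54, k_new = 8, n = n)` gives about 5.70 for free sulfur dioxide, close to the printed 5.7061; the small gap comes from the rounded RSS values.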
  • 14. Conclusion
A limitation of the current analysis is that the data consist of samples collected from one specific region of Portugal. It would be interesting to obtain datasets from various wine-making regions to eliminate any bias created by specific qualities of the local product.
Regression Equation
Quality = 4.30 + 0.29*alcohol + 0.89*sulphates - (0.44*pH + 0.0024*tot.SO2 + 2.00*chlorides + 1.04*vol.acidity)
Hence quality depends positively on alcohol and sulphates, and negatively on pH, total SO2, chlorides and volatile acidity.
References
Practicalwinery.com: http://www.practicalwinery.com/janfeb09/page5.htm
Calwineries: http://www.calwineries.com/learn/wine-chemistry
Waterhouse Lab: http://waterhouse.ucdavis.edu/
Aroma Dictionary: http://www.aromadictionary.com/articles/salt_article.html
Wisconsin Dept. Health Services: https://www.dhs.wisconsin.gov/chemical/sulfates.htm
Wines.com: http://www.wines.com/wiki/Density/
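The regression equation can be applied directly to a new wine's measurements. The sketch below plugs in the unrounded coefficients from `summary(model8)`; the example wine is hypothetical, built from the median value of each predictor in the basic-statistics table. With the fitted object at hand, `predict(model8, newdata = ...)` would be the usual route; the hand-written `predict_quality()` helper here is only an illustration. Note that the prediction is a continuous number, whereas the tasters' scores are integers.

```r
# Coefficients of the final model (model8), copied from summary(model8)
coefs <- c(
  "(Intercept)"        =  4.2957316,
  alcohol              =  0.2906738,
  pH                   = -0.4351830,
  total.sulfur.dioxide = -0.0023721,
  chlorides            = -2.0022839,
  sulphates            =  0.8886802,
  volatile.acidity     = -1.0381945
)

# Hypothetical helper: linear predictor for one wine, given a named vector of measurements
predict_quality <- function(wine, coefs) {
  vars <- setdiff(names(coefs), "(Intercept)")
  unname(coefs["(Intercept)"] + sum(coefs[vars] * wine[vars]))
}

# Hypothetical wine with the median value of each predictor (from summary(wine.df))
median_wine <- c(alcohol = 10.20, pH = 3.310, total.sulfur.dioxide = 38,
                 chlorides = 0.079, sulphates = 0.62, volatile.acidity = 0.52)

predicted <- predict_quality(median_wine, coefs)
print(round(predicted, 2))  # 5.58
```

The result, about 5.58, is close to the mean quality score of 5.636 in the dataset.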