Red_wine_final_report

Capstone Project - Wine Quality Analysis
# Note: lines shown in colour in the original report are inferences drawn from the analysis directly above them.
Overview
We consider a set of observations on a number of red wine samples, covering their chemical properties and
their ranking by tasters. The wine industry has grown recently as social drinking has become more common.
The price of wine depends in part on a rather abstract concept, wine appreciation by tasters, whose opinions
can vary widely, so pricing rests to some extent on a volatile factor. Another key factor in wine
certification and quality assessment is physicochemical testing, which is laboratory-based and takes into
account acidity, pH level, sugar content and other chemical properties. For the wine market, it would be of
interest if human tasting quality could be related to the chemical properties of wine, so that the
certification, quality assessment and assurance processes are more controlled.
Introduction
The Red Wine dataset contains 1599 samples, all produced in a particular area of Portugal. Data are
collected on 12 properties of the wines: one is Quality, based on sensory data, and the rest are chemical
properties such as density, acidity and alcohol content. All chemical properties are continuous variables.
Quality is an ordinal variable with possible scores from 0 (worst) to 10 (best). Each sample is tasted by
three independent tasters, and the final score assigned is the median of the scores given by the tasters.
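The median rule described above can be illustrated with a small sketch. The score matrix below is hypothetical (it is not taken from the dataset); it only shows how a single outlying taster has limited influence on the final score.

```r
# Hypothetical scores from three tasters (columns) for four wines (rows).
# These numbers are invented for illustration and are NOT from the dataset.
scores <- matrix(c(5, 6, 6,
                   4, 5, 7,
                   7, 7, 8,
                   3, 5, 4),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(paste0("wine", 1:4), paste0("taster", 1:3)))

# Final score per wine = median of the three tasters' scores (row-wise).
final_score <- apply(scores, 1, median)
print(final_score)
```

For wine2 the scores 4, 5 and 7 give a final score of 5, so the one high score does not pull the result upwards as a mean would.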
Objectives of the Analysis
The objective is to predict the Quality ranking from the chemical properties of the wines. A predictive model
developed on this data is expected to give vineyards guidance on the quality and price they can expect
for their produce, without heavy reliance on the variability of wine tasters.
List of Attributes in Data
1. Fixed acidity: most acids involved with wine are fixed or non-volatile (they do not evaporate
readily)
2. Volatile acidity: the amount of acetic acid in wine, which at too high a level can lead to an
unpleasant, vinegar taste
3. Citric acid: found in small quantities, citric acid can add ‘freshness’ and flavour to wines
4. Residual sugar: the amount of sugar remaining after fermentation stops; it’s rare to find wines with
less than 1 gram/litre, and wines with greater than 45 grams/litre are considered sweet
5. Chlorides: the amount of salt in the wine
6. Free sulphur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a
dissolved gas) and bisulphite ion; it prevents microbial growth and the oxidation of wine
7. Total sulphur dioxide: the amount of free and bound forms of SO2; in low concentrations, SO2 is mostly
undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the
nose and taste of wine
8. Density: the density of wine is close to that of water, depending on the percent alcohol and sugar
content
9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most
wines are between 3 and 4 on the pH scale
10. Sulphates: a wine additive which can contribute to sulphur dioxide gas (SO2) levels, which acts as an
antimicrobial and antioxidant
11. Alcohol: the percent alcohol content of the wine
12. Quality: output variable (based on sensory data, score between 0 and 10)
Analysis of Data
1. Basic Statistics
summary(wine.df)
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
1. The alcohol content varies from 8.40 to 14.90 across the samples in the dataset.
2. The quality of the samples ranges from 3 to 8, with 6 being the median.
3. The range for fixed acidity is quite wide, with a minimum of 4.6 and a maximum of 15.9.
4. The pH value varies from 2.740 to 4.010, with a median of 3.310.

2. Histogram Plot Analysis
1. The quality ratings for red wine peak at approximately 5.
2. The pH values appear to be normally distributed, with most samples between 3.0 and 3.6.
3. Free sulfur dioxide lies mostly between 1 and 60, peaking around the 10 mark.
4. Total sulfur dioxide is spread between 0 and 300, with a peak around 50.
5. The alcohol content varies from about 8 to 14, with the main peak around 9.
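The plotting code for the histograms is not included in the report. A minimal sketch of how such a histogram and its peak could be obtained in base R is given here; the quality vector is a small invented stand-in, and with the real data one would pass wine.df$quality instead.

```r
# Toy stand-in for wine.df$quality (invented values, NOT the real data).
quality <- c(5, 5, 6, 5, 6, 7, 5, 6, 4, 5, 6, 5, 7, 6, 5, 3, 8, 6, 5, 6)

# One bin per integer score, so each bar counts a single quality rating.
breaks <- seq(2.5, 8.5, by = 1)

# Compute the bin counts without drawing, to read off the peak numerically.
h <- hist(quality, breaks = breaks, plot = FALSE)
peak_score <- h$mids[which.max(h$counts)]
print(peak_score)

# Draw the histogram itself.
hist(quality, breaks = breaks, main = "Quality", xlab = "Quality score")
```

Fixing the breaks at half-integers is a design choice for an integer-valued score; for continuous variables such as pH or alcohol, the default breaks of hist() are usually adequate.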
3. Correlation Matrix and Correlogram
#Correlation Matrix
cor(wine.df)

## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.256130895 0.67170343
## volatile.acidity -0.25613089 1.000000000 -0.55249568
## citric.acid 0.67170343 -0.552495685 1.00000000
## residual.sugar 0.11477672 0.001917882 0.14357716
## chlorides 0.09370519 0.061297772 0.20382291
## free.sulfur.dioxide -0.15379419 -0.010503827 -0.06097813
## total.sulfur.dioxide -0.11318144 0.076470005 0.03553302
## density 0.66804729 0.022026232 0.36494718
## pH -0.68297819 0.234937294 -0.54190414
## sulphates 0.18300566 -0.260986685 0.31277004
## alcohol -0.06166827 -0.202288027 0.10990325
## quality 0.12405165 -0.390557780 0.22637251
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.114776724 0.093705186 -0.153794193
## volatile.acidity 0.001917882 0.061297772 -0.010503827
## citric.acid 0.143577162 0.203822914 -0.060978129
## residual.sugar 1.000000000 0.055609535 0.187048995
## chlorides 0.055609535 1.000000000 0.005562147
## free.sulfur.dioxide 0.187048995 0.005562147 1.000000000
## total.sulfur.dioxide 0.203027882 0.047400468 0.667666450
## density 0.355283371 0.200632327 -0.021945831
## pH -0.085652422 -0.265026131 0.070377499
## sulphates 0.005527121 0.371260481 0.051657572
## alcohol 0.042075437 -0.221140545 -0.069408354
## quality 0.013731637 -0.128906560 -0.050656057
## total.sulfur.dioxide density pH
## fixed.acidity -0.11318144 0.66804729 -0.68297819
## volatile.acidity 0.07647000 0.02202623 0.23493729
## citric.acid 0.03553302 0.36494718 -0.54190414
## residual.sugar 0.20302788 0.35528337 -0.08565242
## chlorides 0.04740047 0.20063233 -0.26502613
## free.sulfur.dioxide 0.66766645 -0.02194583 0.07037750
## total.sulfur.dioxide 1.00000000 0.07126948 -0.06649456
## density 0.07126948 1.00000000 -0.34169933
## pH -0.06649456 -0.34169933 1.00000000
## sulphates 0.04294684 0.14850641 -0.19664760
## alcohol -0.20565394 -0.49617977 0.20563251
## quality -0.18510029 -0.17491923 -0.05773139
## sulphates alcohol quality
## fixed.acidity 0.183005664 -0.06166827 0.12405165
## volatile.acidity -0.260986685 -0.20228803 -0.39055778
## citric.acid 0.312770044 0.10990325 0.22637251
## residual.sugar 0.005527121 0.04207544 0.01373164
## chlorides 0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide 0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide 0.042946836 -0.20565394 -0.18510029
## density 0.148506412 -0.49617977 -0.17491923
## pH -0.196647602 0.20563251 -0.05773139
## sulphates 1.000000000 0.09359475 0.25139708
## alcohol 0.093594750 1.00000000 0.47616632
## quality 0.251397079 0.47616632 1.00000000

#Correlogram
library("corrgram", lib.loc="/Library/Frameworks/R.framework/Versions/3.4/Resources/library")
corrgram(wine.df, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Red Wine Quality")
1. Free SO2: noticeable positive correlation with total SO2 (0.67) and residual sugar (0.19); weak negative
correlation with fixed acidity (-0.15); correlations with pH (0.07) and alcohol (-0.07) are negligible.
2. Total SO2: positive correlation with free SO2 (0.67) and residual sugar (0.20); negative correlation with
alcohol (-0.21).
3. pH: positive correlation with alcohol (0.21) and volatile acidity (0.23); negative correlation with fixed
acidity (-0.68), citric acid (-0.54) and density (-0.34).
4. Alcohol: positive correlation with pH (0.21) and quality (0.48); negative correlation with density (-0.50),
chlorides (-0.22) and total SO2 (-0.21).
5. Quality: positive correlation with alcohol (0.48) and sulphates (0.25); negative correlation with volatile
acidity (-0.39), total SO2 (-0.19), density (-0.17) and chlorides (-0.13).
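To see at a glance which properties are most strongly associated with quality, the quality column of the correlation matrix can be sorted by absolute value. The sketch below copies that column from the cor(wine.df) output above so that it runs on its own; with the data loaded, cor(wine.df)[, "quality"] would supply the same numbers (plus quality itself).

```r
# Correlations with quality, copied from the cor(wine.df) output in this report.
quality_cor <- c(
  fixed.acidity        =  0.12405165,
  volatile.acidity     = -0.39055778,
  citric.acid          =  0.22637251,
  residual.sugar       =  0.01373164,
  chlorides            = -0.12890656,
  free.sulfur.dioxide  = -0.05065606,
  total.sulfur.dioxide = -0.18510029,
  density              = -0.17491923,
  pH                   = -0.05773139,
  sulphates            =  0.25139708,
  alcohol              =  0.47616632
)

# Rank by strength of association, ignoring the sign.
ranked <- quality_cor[order(abs(quality_cor), decreasing = TRUE)]
print(round(ranked, 2))
```

Alcohol, volatile acidity and sulphates come out on top, while residual sugar is essentially uncorrelated with quality.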

Alcohol Analysis - ScatterPlots
1. There seems to be no significant bias in the alcohol content, even though some red wine samples
have a higher alcohol content.
2. The pH scatterplot indicates an interesting observation: pH and alcohol are positively correlated
(0.21 in the correlation matrix).
3. Total SO2 content decreases as the alcohol content increases.
4. Free SO2 content decreases as the alcohol content increases.

pH Analysis - ScatterPlots
1. No clear relation is established between quality and pH.
2. The relation between pH and total sulphur dioxide is scattered, with SO2 values ranging up to
around 150 for most samples.
3. The relation between pH and free sulphur dioxide is likewise scattered.

Hypothesis Testing
##Hypothesis 1
#A higher alcohol content and lower fixed acidity tend to go with a higher quality wine. Why is this?
Sol.
I will use heatmaps and Chi-Square tests to evaluate this hypothesis.
#HeatMap
#Chi-Sq on Quality and Alcohol
chisq.test(quality, alcohol)
## data: quality and alcohol
## X-squared = 1124.5, df = 320, p-value < 2.2e-16
chisq.test(quality, fixed.acidity)
## data: quality and fixed.acidity
## X-squared = 736.08, df = 475, p-value = 1.416e-13
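The Chi-Sq tests confirm a significant association between quality and both alcohol and fixed acidity
(p < 0.001 in each case), and the heatmap shows the distribution. The tests indicate association only, not
direction: in the correlation matrix, quality correlates positively with alcohol (0.48) but also weakly
positively with fixed acidity (0.12), so only the alcohol half of the hypothesis is supported.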
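When chisq.test() is given two numeric vectors it treats every distinct value as a category, which yields a large, sparse contingency table and says nothing about direction. As a complementary check (not the method used in this report), a Spearman rank correlation test suits an ordinal score and a continuous predictor. The sketch below runs on simulated data; the generating coefficients are assumptions loosely modelled on the report, and with the real data one would call cor.test(wine.df$alcohol, wine.df$quality, ...).

```r
# Simulated stand-in for alcohol and quality (NOT the real data).
set.seed(42)
n <- 300
alcohol <- runif(n, min = 8.4, max = 14.9)

# Assumed relationship for illustration only; rounded and clipped to 3..8
# so that quality behaves like an ordinal score.
latent  <- 1.9 + 0.36 * alcohol + rnorm(n, sd = 0.7)
quality <- pmin(pmax(round(latent), 3), 8)

# Spearman's rho measures monotone association and keeps the sign.
# exact = FALSE avoids the exact-p-value warning caused by tied scores.
res <- cor.test(alcohol, quality, method = "spearman", exact = FALSE)
rho <- unname(res$estimate)
p_value <- res$p.value
print(c(rho = rho, p_value = p_value))
```

A positive rho with a small p-value would support "higher alcohol, higher quality"; a negative rho for fixed acidity would be needed to support the second half of the hypothesis.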

##Hypothesis 2
#Higher quality wine tends to have a lower residual sugar and lower citric acid. Why is this?
#HeatMap
chisq.test(quality, citric.acid)
## data: quality and citric.acid
chisq.test(quality, residual.sugar)
## data: quality and residual.sugar
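This hypothesis is not supported: the heatmap shows no such pattern, and in the correlation matrix quality
correlates positively with citric acid (0.23) and hardly at all with residual sugar (0.01).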

##Hypothesis 3
#Does lower sulfur content make wine higher quality?
#ScatterPlot
plot(sulphates, quality, ylab="Quality", xlab="Sulphates", main="Quality vs Sulphates")
#Chi-Sq on Quality and Sulphates
chisq.test(quality, sulphates)
## data: quality and sulphates
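This hypothesis is not supported for sulphates: in the correlation matrix, quality correlates positively
with sulphates (0.25), so higher-quality samples tend to have higher, not lower, sulphate content. Total
sulphur dioxide, by contrast, shows a weak negative correlation with quality (-0.19).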
Linear Regression Models and Testing
#Test Model 1
model1 <- lm( quality ~ alcohol)
summary(model1)
##
## Call:
## lm(formula = quality ~ alcohol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8442 -0.4112 -0.1690 0.5166 2.5888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 1.87497 0.17471 10.73 <2e-16 ***
## alcohol 0.36084 0.01668 21.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7104 on 1597 degrees of freedom
## Multiple R-squared: 0.2267, Adjusted R-squared: 0.2263
## F-statistic: 468.3 on 1 and 1597 DF, p-value: < 2.2e-16
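The p-value (< 2e-16) and the significance stars indicate that alcohol is a significant predictor, although
on its own it explains only about 23% of the variance in quality (R-squared = 0.2267).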
add1(model1, scope = wine.df, test = 'F')
## Warning in model.matrix.default(Terms, m, contrasts.arg = object
## $contrasts): the response appeared on the right-hand side and was dropped
## $contrasts): problem with term 11 in model.matrix: no columns are assigned
## Single term additions
##
## Model:
## quality ~ alcohol
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 805.87 -1091.7
## volatile.acidity 1 94.074 711.80 -1288.1 210.9346 < 2.2e-16 ***
## citric.acid 1 31.953 773.92 -1154.3 65.8949 9.408e-16 ***
## residual.sugar 1 0.041 805.83 -1089.7 0.0822 0.774437
## chlorides 1 0.611 805.26 -1090.9 1.2103 0.271443
## free.sulfur.dioxide 1 0.325 805.55 -1090.3 0.6431 0.422696
## total.sulfur.dioxide 1 8.270 797.60 -1106.2 16.5475 4.976e-05 ***
## density 1 5.203 800.67 -1100.0 10.3708 0.001306 **
## pH 1 26.362 779.51 -1142.8 53.9749 3.226e-13 ***
## sulphates 1 44.977 760.89 -1181.5 94.3399 < 2.2e-16 ***
## quality 0 0.000 805.87 -1091.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F-tests above show which of the remaining variables would significantly reduce the residual sum of
squares if added to the model.
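The warning in the output above arises because the scope (the whole data frame) includes the response, quality. One way to avoid it is to pass the scope as a one-sided formula listing only candidate predictors. The sketch below shows this on simulated data; the data frame toy and the coefficients used to generate it are assumptions for illustration, not the wine data.

```r
# Simulated stand-in for wine.df (NOT the real data); the generating
# coefficients are arbitrary assumptions chosen for illustration.
set.seed(1)
n <- 500
toy <- data.frame(
  alcohol          = runif(n, 8.4, 14.9),
  volatile.acidity = runif(n, 0.12, 1.2),
  residual.sugar   = runif(n, 0.9, 6)
)
toy$quality <- 2 + 0.35 * toy$alcohol - 1.2 * toy$volatile.acidity +
  rnorm(n, sd = 0.7)

# Start from the single-predictor model, as in Test Model 1.
base_model <- lm(quality ~ alcohol, data = toy)

# A formula scope names predictors only, so the response is never offered
# as a candidate term and no warning is raised.
candidates <- add1(base_model,
                   scope = ~ alcohol + volatile.acidity + residual.sugar,
                   test = "F")
print(candidates)

# Collect the F-test p-values by term name.
p_values <- candidates[["Pr(>F)"]]
names(p_values) <- rownames(candidates)
```

Terms with small p-values are the ones worth adding next, which is the reading applied to the table in this report.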
#Test Model 7
model7 <- lm( quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid + chlorides + sulphates + volatile.acidity)
summary(model7)
##
## Call:
## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide +
## citric.acid + chlorides + sulphates + volatile.acidity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.58632 -0.36679 -0.04584 0.45297 1.95470
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6134833 0.4607493 10.013 < 2e-16 ***
## alcohol 0.2951742 0.0171178 17.244 < 2e-16 ***
## pH -0.5247565 0.1328432 -3.950 8.15e-05 ***
## total.sulfur.dioxide -0.0023114 0.0005082 -4.549 5.81e-06 ***
## citric.acid -0.1670682 0.1207391 -1.384 0.167
## chlorides -1.9153285 0.4028925 -4.754 2.17e-06 ***
## sulphates 0.8994970 0.1102877 8.156 6.96e-16 ***
## volatile.acidity -1.1146326 0.1145923 -9.727 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
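##
add1(model7, scope = wine.df, test = 'F')
## Single term additions
##
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + citric.acid +
## chlorides + sulphates + volatile.acidity
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 669.13 -1377.0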
## residual.sugar 1 0.41979 668.71 -1376.0 0.9982 0.3179
## free.sulfur.dioxide 1 2.06369 667.06 -1379.9 4.9190 0.0267 *
## density 1 0.05573 669.07 -1375.1 0.1324 0.7160
## quality 0 0.00000 669.13 -1377.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Test Model 8
model8 <- lm( quality ~ alcohol + pH + total.sulfur.dioxide + chlorides + sulphates + volatile.acidity)
summary(model8)
This is the final model.
##
## Call:

## lm(formula = quality ~ alcohol + pH + total.sulfur.dioxide +
## chlorides + sulphates + volatile.acidity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.60575 -0.35883 -0.04806 0.46079 1.95643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.2957316 0.3995603 10.751 < 2e-16 ***
## alcohol 0.2906738 0.0168108 17.291 < 2e-16 ***
## pH -0.4351830 0.1160368 -3.750 0.000183 ***
## total.sulfur.dioxide -0.0023721 0.0005064 -4.684 3.05e-06 ***
## chlorides -2.0022839 0.3980757 -5.030 5.46e-07 ***
## sulphates 0.8886802 0.1100419 8.076 1.31e-15 ***
## volatile.acidity -1.0381945 0.1004270 -10.338 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
In this final model, all the predictors are significant.
add1(model8, scope = wine.df, test = 'F')
## Single term additions
##
## Model:
## quality ~ alcohol + pH + total.sulfur.dioxide + chlorides + sulphates +
## volatile.acidity
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 669.93 -1377.1
## citric.acid 1 0.80525 669.13 -1377.0 1.9147 0.16664
## residual.sugar 1 0.28390 669.65 -1375.7 0.6745 0.41161
## free.sulfur.dioxide 1 2.39413 667.54 -1380.8 5.7061 0.01702 *
## density 1 0.04468 669.89 -1375.2 0.1061 0.74465
## quality 0 0.00000 669.93 -1377.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion
A limitation of the current analysis is that the data consist of samples collected
from a specific region of Portugal. It would be interesting to obtain datasets from various
wine-making regions to eliminate any bias created by region-specific qualities of the product.
Regression Equation
Quality = 4.30 + 0.29*alcohol + 0.89*sulphates - 0.44*pH - 0.0024*total.sulfur.dioxide
- 2.00*chlorides - 1.04*volatile.acidity
Hence quality depends positively on alcohol and sulphates, and negatively on pH, total SO2,
chlorides and volatile acidity.
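The regression equation can be evaluated directly for a given sample. The sketch below uses the unrounded coefficients from the summary(model8) output and, as a worked example, the median value of each predictor from summary(wine.df). The helper predict_quality is a hypothetical name introduced here; with the fitted object available, predict(model8, newdata = ...) would serve the same purpose.

```r
# Coefficients copied from the summary(model8) output in this report.
coefs <- c("(Intercept)"        =  4.2957316,
           alcohol              =  0.2906738,
           pH                   = -0.4351830,
           total.sulfur.dioxide = -0.0023721,
           chlorides            = -2.0022839,
           sulphates            =  0.8886802,
           volatile.acidity     = -1.0381945)

# Hypothetical helper: intercept plus the sum of coefficient * value,
# matching predictors by name.
predict_quality <- function(sample, coefs) {
  predictors <- setdiff(names(coefs), "(Intercept)")
  unname(coefs["(Intercept)"] + sum(coefs[predictors] * sample[predictors]))
}

# Median of each predictor, taken from the summary(wine.df) output.
median_wine <- c(alcohol = 10.2, pH = 3.31, total.sulfur.dioxide = 38,
                 chlorides = 0.079, sulphates = 0.62, volatile.acidity = 0.52)

predicted <- predict_quality(median_wine, coefs)
print(round(predicted, 2))
```

For this "median" wine the predicted quality comes to about 5.58, which sits between the two most common scores, 5 and 6.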
References
Practicalwinery.com: http://www.practicalwinery.com/janfeb09/page5.htm
Calwineries: http://www.calwineries.com/learn/wine-chemistry
Waterhouse Lab: http://waterhouse.ucdavis.edu/
Aroma Dictionary: http://www.aromadictionary.com/articles/salt_article.html
Wisconsin Dept. of Health Services: https://www.dhs.wisconsin.gov/chemical/sulfates.htm
Wines.com: http://www.wines.com/wiki/Density/
