Red Wine Quality Evaluation
Weiyang Bi
Shilin Wang
Zheng Xue
Data Description
• Source:
Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez
A. Cerdeira, F. Almeida, T. Matos and J. Reis, Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal @2009
Data Description
• The dataset covers the red variant of
the Portuguese "Vinho Verde" wine.
• Due to privacy and logistical issues, only
physicochemical (input) and sensory
(output) variables are available.
Dataset
Missing values check:
> nrow(data[!complete.cases(data),])
[1] 0
No row of the dataset contains a missing value.
Attribute information
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Correlation matrix
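A correlation matrix of this kind can be computed with base R's cor(). The sketch below is only an illustration: the data frame `wine` is a small synthetic stand-in (column names follow the attribute list, all values are invented), so the printed numbers say nothing about the real data.

```r
# Synthetic stand-in for the wine data: every value here is invented.
set.seed(1)
n <- 200
wine <- data.frame(
  alcohol          = runif(n, 8, 15),
  volatile.acidity = runif(n, 0.1, 1.6),
  sulphates        = runif(n, 0.3, 2.0)
)
# Fake quality score, loosely driven by alcohol and volatile acidity.
wine$quality <- round(3 + 0.4 * (wine$alcohol - 8) -
                      1.5 * wine$volatile.acidity + rnorm(n, sd = 0.7))

# Pairwise Pearson correlations between all columns.
corr <- cor(wine)
print(round(corr, 2))

# Absolute correlation of each input with quality, strongest first.
q <- corr[setdiff(rownames(corr), "quality"), "quality"]
print(sort(abs(q), decreasing = TRUE))
```

With the real data, replacing `wine` by the full data frame would give the 12 x 12 matrix for all attributes.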
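R code for training set and test set
B <- 20                                  # number of random train/test splits
for (i in 1:B) {
  set.seed(i)                            # makes split i reproducible
  # Draw 1000 distinct row indices for the training set
  indexes <- sample(1:nrow(data), size = 1000, replace = FALSE)
  train <- data[indexes, ]               # 1000 training rows
  test  <- data[-indexes, ]              # all remaining rows
  # Fit and evaluate the models here: train and test are overwritten on every pass
}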
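The loop only becomes useful once a model is fitted and scored inside it. The following sketch shows one way to collect a test error per split and average it. It assumes a two-class response called `good` (a hypothetical coding, not stated on the slides), uses a single rpart tree as the model, and runs on a synthetic stand-in data frame `wine` with 400 training rows instead of 1000.

```r
library(rpart)

# Synthetic stand-in for the wine data: all values are invented.
set.seed(1)
n <- 600
wine <- data.frame(
  alcohol          = runif(n, 8, 15),
  volatile.acidity = runif(n, 0.1, 1.6),
  sulphates        = runif(n, 0.3, 2.0)
)
score <- 0.5 * wine$alcohol - 2 * wine$volatile.acidity + rnorm(n)
# Assumed two-class response (hypothetical coding, not taken from the slides).
wine$good <- factor(ifelse(score > median(score), "good", "bad"))

B <- 20                 # number of random splits, as on the slide
n_train <- 400          # the slide uses 1000 rows of the real data
err <- numeric(B)       # test error of each repetition

for (i in 1:B) {
  set.seed(i)           # reproducible split for repetition i
  idx   <- sample(seq_len(nrow(wine)), size = n_train, replace = FALSE)
  train <- wine[idx, ]
  test  <- wine[-idx, ]
  fit   <- rpart(good ~ ., data = train, method = "class")
  pred  <- predict(fit, newdata = test, type = "class")
  err[i] <- mean(pred != test$good)   # misclassification rate on held-out rows
}

mean_err <- mean(err)
print(round(c(mean = mean_err, sd = sd(err)), 3))
```

Averaging over the B splits reduces the dependence of the error estimate on any single random partition.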
Methods
Three methods were applied to the data set:
1) CART
2) Bagging
3) Random Forest
Classification and Regression Trees (CART)
CP Table
Pruned Tree
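As a rough illustration of how a CP table and a pruned tree are typically obtained with the rpart package, consider the sketch below. It is not the authors' code: the data frame `wine` is synthetic, the two-class response `good` is an assumed coding, and pruning at the cp value with the smallest cross-validated error is one common rule among several.

```r
library(rpart)

# Synthetic stand-in for the wine data: all values are invented.
set.seed(1)
n <- 600
wine <- data.frame(
  alcohol          = runif(n, 8, 15),
  volatile.acidity = runif(n, 0.1, 1.6),
  sulphates        = runif(n, 0.3, 2.0),
  residual.sugar   = runif(n, 1, 15)
)
score <- 0.5 * wine$alcohol - 2 * wine$volatile.acidity + rnorm(n)
# Assumed two-class response (hypothetical coding, not taken from the slides).
wine$good <- factor(ifelse(score > median(score), "good", "bad"))

train <- wine[1:400, ]
test  <- wine[401:n, ]

# Grow a deliberately large tree, then inspect the complexity-parameter table.
fit <- rpart(good ~ ., data = train, method = "class",
             control = rpart.control(cp = 0.001, xval = 10))
printcp(fit)

# Prune at the cp value with the smallest cross-validated error.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# Variables that actually appear in splits of the pruned tree.
used <- setdiff(unique(as.character(pruned$frame$var)), "<leaf>")
print(used)

# Misclassification rate of the pruned tree on the held-out rows.
pred     <- predict(pruned, newdata = test, type = "class")
cart_err <- mean(pred != test$good)
print(cart_err)
```

The vector `used` corresponds to the idea of variable selection by the tree: inputs that never appear in a split are effectively dropped.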
Variable Selection
Variables selected: total sulfur dioxide, volatile acidity, sulphates, residual sugar, alcohol
[Figure: axes labelled "Number of splits" and "Error rate"]
Bagging
Merging data
CP Table
Misclassification Rate
Misclassification rate of
100 bagged trees
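Bagging 100 trees can be sketched by hand with rpart: draw a bootstrap sample, grow an unpruned tree, repeat, and take a majority vote. This is a manual illustration under assumptions, not necessarily the package or settings used for the slides. The data frame `wine` is synthetic and the two-class response `good` is an assumed coding.

```r
library(rpart)

# Synthetic stand-in for the wine data: all values are invented.
set.seed(1)
n <- 600
wine <- data.frame(
  alcohol          = runif(n, 8, 15),
  volatile.acidity = runif(n, 0.1, 1.6),
  sulphates        = runif(n, 0.3, 2.0)
)
score <- 0.5 * wine$alcohol - 2 * wine$volatile.acidity + rnorm(n)
# Assumed two-class response (hypothetical coding, not taken from the slides).
wine$good <- factor(ifelse(score > median(score), "good", "bad"))

train <- wine[1:400, ]
test  <- wine[401:n, ]

B <- 100                                   # number of bagged trees
votes <- matrix(NA_character_, nrow = nrow(test), ncol = B)

for (b in 1:B) {
  # Bootstrap sample: same size as the training set, drawn with replacement.
  boot <- train[sample(nrow(train), replace = TRUE), ]
  # Unpruned tree (cp = 0), no internal cross-validation needed.
  tree <- rpart(good ~ ., data = boot, method = "class",
                control = rpart.control(cp = 0, minsplit = 5, xval = 0))
  votes[, b] <- as.character(predict(tree, newdata = test, type = "class"))
}

# Majority vote across the B trees (an exact tie is assigned to "bad").
prop_good <- rowMeans(votes == "good")
bag_pred  <- ifelse(prop_good > 0.5, "good", "bad")
bag_err   <- mean(bag_pred != as.character(test$good))
print(bag_err)
```

The fraction `prop_good` can also serve as a class score, for example when drawing an ROC curve for the bagged predictor.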
ROC Graph
[Figure: ROC curves; x-axis: 1 - Specificity, y-axis: Sensitivity, both from 0.0 to 1.0]
Best single tree: 0.64
Bagged 100 trees: 0.644
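For readers who want to see how the points of an ROC curve arise, here is a base R sketch that sweeps a threshold over predicted class probabilities and integrates the curve with the trapezoid rule. The data are synthetic, the response `good` is an assumed two-class coding, and the slide does not label its legend values, so the area computed here is not claimed to be the same quantity.

```r
library(rpart)

# Synthetic stand-in for the wine data: all values are invented.
set.seed(1)
n <- 600
wine <- data.frame(
  alcohol          = runif(n, 8, 15),
  volatile.acidity = runif(n, 0.1, 1.6),
  sulphates        = runif(n, 0.3, 2.0)
)
score <- 0.5 * wine$alcohol - 2 * wine$volatile.acidity + rnorm(n)
# Assumed two-class response (hypothetical coding, not taken from the slides).
wine$good <- factor(ifelse(score > median(score), "good", "bad"))

train <- wine[1:400, ]
test  <- wine[401:n, ]

fit     <- rpart(good ~ ., data = train, method = "class")
p_good  <- predict(fit, newdata = test, type = "prob")[, "good"]
is_good <- test$good == "good"

# Thresholds from strictest (nothing predicted "good") to loosest (everything).
thr  <- c(Inf, sort(unique(p_good), decreasing = TRUE))
sens <- sapply(thr, function(t) mean(p_good[is_good]  >= t))  # sensitivity
fpr  <- sapply(thr, function(t) mean(p_good[!is_good] >= t))  # 1 - specificity

# Area under the curve by the trapezoid rule.
auc <- sum(diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2)
print(round(auc, 3))
```

Calling plot(fpr, sens, type = "l", xlab = "1 - Specificity", ylab = "Sensitivity") would draw the curve with the same axes as the slide.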
Frequency Table
Evaluation of Variable Importance
Random Forest
Data Structure
Random Forest Fit
Random Forest Plot
Importance
Relative Variable Importance
Partial Dependence Plot
[Figure: partial dependence plots for Alcohol, Sulphates, Volatile acidity and Total sulfur dioxide; y-axis: partial dependence]
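A random forest fit with variable importance might look like the sketch below, assuming the randomForest package is available (the code skips the fit if it is not installed). The data frame `wine` is synthetic, the response `good` is an assumed two-class coding, and ntree = 500 is simply the package default, not a setting taken from the slides.

```r
# Synthetic stand-in for the wine data: all values are invented.
set.seed(1)
n <- 600
wine <- data.frame(
  alcohol          = runif(n, 8, 15),
  volatile.acidity = runif(n, 0.1, 1.6),
  sulphates        = runif(n, 0.3, 2.0),
  residual.sugar   = runif(n, 1, 15)
)
score <- 0.5 * wine$alcohol - 2 * wine$volatile.acidity + rnorm(n)
# Assumed two-class response (hypothetical coding, not taken from the slides).
wine$good <- factor(ifelse(score > median(score), "good", "bad"))

train <- wine[1:400, ]
test  <- wine[401:n, ]

if (requireNamespace("randomForest", quietly = TRUE)) {
  # importance = TRUE stores permutation-based importance measures.
  rf <- randomForest::randomForest(good ~ ., data = train,
                                   ntree = 500, importance = TRUE)

  # Misclassification rate on the held-out rows.
  rf_pred <- predict(rf, newdata = test)
  rf_err  <- mean(rf_pred != test$good)
  print(rf_err)

  # Mean decrease in accuracy per variable, most important first.
  imp <- randomForest::importance(rf, type = 1)
  print(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE])
} else {
  rf_err <- NA_real_   # package not installed: nothing fitted
}
```

With a fitted forest, varImpPlot(rf) draws the importance chart and partialPlot(rf, pred.data = train, x.var = "alcohol") draws a partial dependence curve for one input.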
CART Bagging and RF Comparison
Variable Selection

CART                    Bagging                 Random Forest
Alcohol                 Alcohol                 Alcohol
Total sulfur dioxide    Sulphates               Sulphates
Volatile acidity        Volatile acidity        Volatile acidity
Sulphates               Total sulfur dioxide    Total sulfur dioxide
Residual sugar          Density                 Density
                        Fixed acidity           Chlorides
                        Residual sugar          Fixed acidity
                        Citric acid             Free sulfur dioxide
                        pH                      pH
                        Free sulfur dioxide     Citric acid
                        Chlorides               Residual sugar
CART Bagging and RF Comparison
Conclusion
• In this case, random forest is the best
prediction tool: it has the lowest estimated
test error rate, ahead of CART and bagging.

Red Wine Quality Assessment