Exercise on Wine Dataset
Objective: Finding a Relation between Wine Quality and Chemical Composition of Wines
Summary of the Data:
Total number of observations: 6497
Number of instances for Red Wine: 1599
Number of instances for White Wine: 4898
Total number of Numeric variables: 11; Character variables: 2
Exploratory Data Analysis: The following graphical analysis helps us understand how the
chemical components vary across the wines and gives a sense of how each variable is
distributed.
Interpretation:
From the above plot, it is evident that quality is concentrated in categories 5, 6 and 7.
Only a small proportion falls in categories [3, 4] and [8, 9], and none in categories
[1, 2] and 10. Fixed acidity, volatile acidity and citric acid have outliers; in some of
these cases the extreme values may be what makes the wine taste bad, and correcting them
could have improved the quality. Residual sugar has a positively skewed distribution;
even after eliminating the outliers, the distribution remains skewed.
Descriptive Statistical Analysis of the Wine Dataset:
Interpretation: Overall, the average quality of the wine is 5.8. For most variables the
range is much larger than the interquartile range (IQR), with the maximum lying far above
the median (for example, residual sugar has a median of 3.0 but a maximum of 65.8). This
indicates outliers on the higher side, so due care should be taken during preparation of
the wine.
Outlier Analysis: Boxplot analysis shows that the dataset has outliers for almost all
the variables.
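The boxplot criterion mentioned above can be sketched as follows. This is a minimal illustration assuming the common 1.5 x IQR fence (Tukey's rule); the function name `iqr_outliers` and the sample readings are hypothetical and are not taken from the dataset, apart from the extreme value, which echoes the residual sugar maximum of 65.8.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the boxplot rule."""
    # Quartiles of the sample; "inclusive" treats the data as the full population.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical residual-sugar readings; only 65.8 echoes the table maximum.
sample = [1.2, 1.8, 2.0, 2.5, 3.0, 3.1, 4.7, 5.4, 8.1, 65.8]
lower, upper, outliers = iqr_outliers(sample)
print(f"fences: [{lower:.3f}, {upper:.3f}]  outliers: {outliers}")
```

A point is flagged only when it lies beyond a fence, so a large range on its own is not enough; the fences make the "range versus IQR" comparison explicit.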
Variables n mean sd median trimmed mad min max range skew kurtosis se
fixed_acidity 6497 7.2 1.3 7.0 7.1 0.9 3.8 15.9 12.1 1.7 5.1 0.0
volatile_acidity 6497 0.3 0.2 0.3 0.3 0.1 0.1 1.6 1.5 1.5 2.8 0.0
citric_acid 6497 0.3 0.1 0.3 0.3 0.1 0.0 1.7 1.7 0.5 2.4 0.0
residual_sugar 6497 5.4 4.8 3.0 4.7 2.5 0.6 65.8 65.2 1.4 4.4 0.1
chlorides 6497 0.1 0.0 0.0 0.1 0.0 0.0 0.6 0.6 5.4 50.8 0.0
free_sulfur_dioxide 6497 30.5 17.7 29.0 29.3 17.8 1.0 289.0 288.0 1.2 7.9 0.2
total_sulfur_dioxide 6497 115.7 56.5 118.0 115.9 57.8 6.0 440.0 434.0 0.0 -0.4 0.7
density 6497 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.1 0.5 6.6 0.0
pH 6497 3.2 0.2 3.2 3.2 0.2 2.7 4.0 1.3 0.4 0.4 0.0
sulphates 6497 0.5 0.1 0.5 0.5 0.1 0.2 2.0 1.8 1.8 8.6 0.0
alcohol 6497 10.5 1.2 10.3 10.4 1.3 8.0 14.9 6.9 0.6 -0.5 0.0
quality 6497 5.8 0.9 6.0 5.8 1.5 3.0 9.0 6.0 0.2 0.2 0.0
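The columns of the summary table can be reproduced for a single variable with a short helper. This is a sketch under assumptions: the 10% trimming per tail, the 1.4826 scaling of the median absolute deviation, and the moment-based skewness and excess kurtosis are common conventions chosen here for illustration, and the tool used for the table may define them slightly differently. The function name `describe` and the sample values are hypothetical.

```python
import math
import statistics

def describe(values, trim=0.1):
    """Summarise one numeric column with the statistics used in the table."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)                  # sample standard deviation
    median = statistics.median(xs)
    cut = int(n * trim)                        # observations dropped per tail
    trimmed = statistics.fmean(xs[cut:n - cut])
    # Median absolute deviation, scaled to be comparable with sd for normal data.
    mad = 1.4826 * statistics.median(abs(x - median) for x in xs)
    # Central moments used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "n": n, "mean": mean, "sd": sd, "median": median,
        "trimmed": trimmed, "mad": mad,
        "min": xs[0], "max": xs[-1], "range": xs[-1] - xs[0],
        "skew": m3 / m2 ** 1.5,                # moment-based skewness
        "kurtosis": m4 / m2 ** 2 - 3,          # excess kurtosis
        "se": sd / math.sqrt(n),               # standard error of the mean
    }

# Hypothetical alcohol readings, not taken from the dataset.
stats = describe([9.4, 9.8, 10.0, 10.5, 11.2, 12.8])
for name, value in stats.items():
    print(f"{name:>8}: {value:.2f}")
```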
Correlation Analysis: No very high correlation exists among the variables; every
variable carries its own information about the wine.
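A correlation check of this kind can be sketched with a plain Pearson coefficient. The helper names `pearson` and `correlation_matrix`, the 0.8 threshold, and the column values below are all hypothetical; the numbers are made up to show the mechanics and do not reproduce the dataset's correlations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Map every pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up values for three variables, for illustration only.
columns = {
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    "density": [0.9978, 0.9968, 0.9955, 0.9940, 0.9925, 0.9910],
    "pH":      [3.51, 3.20, 3.26, 3.16, 3.40, 3.30],
}
matrix = correlation_matrix(columns)

# List the pairs whose absolute correlation exceeds an assumed threshold.
names = list(columns)
strong = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
          if abs(matrix[a][b]) >= 0.8]
print(strong)
```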
Deep Dive on the Wine Dataset: As our main objective is to find the relation between the
quality of the wine and its chemical composition, we look at each quality level
separately and summarise the corresponding composition.
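Splitting the data by quality level before summarising can be sketched as below. The function name `summarise_by_quality`, the row layout (one dict per wine with a `quality` key), and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict
import statistics

def summarise_by_quality(rows, variable):
    """Group rows by their quality score and summarise one variable per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["quality"]].append(row[variable])
    return {
        quality: {
            "n": len(values),
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
        }
        for quality, values in sorted(groups.items())
    }

# Hypothetical rows, not taken from the dataset.
rows = [
    {"quality": 5, "alcohol": 9.4}, {"quality": 5, "alcohol": 9.8},
    {"quality": 6, "alcohol": 10.5}, {"quality": 6, "alcohol": 10.9},
    {"quality": 7, "alcohol": 11.4}, {"quality": 7, "alcohol": 12.0},
]
summary = summarise_by_quality(rows, "alcohol")
for quality, stats in summary.items():
    print(quality, stats)
```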
Quality-3:
Variables n mean sd median trimmed mad min max range skew kurtosis se
fixed_acidity 30 7.9 1.7 7.5 7.7 1.3 4.2 11.8 7.6 0.5 -0.1 0.3
volatile_acidity 30 0.5 0.3 0.4 0.5 0.3 0.2 1.6 1.4 1.3 1.1 0.1
citric_acid 30 0.3 0.2 0.3 0.3 0.1 0.0 0.7 0.7 -0.2 -0.8 0.0
residual_sugar 30 5.1 4.7 3.2 4.3 2.5 0.7 16.2 15.5 1.1 -0.2 0.9
chlorides 30 0.1 0.1 0.1 0.1 0.0 0.0 0.3 0.2 1.8 2.4 0.0
free_sulfur_dioxide 30 39.2 60.0 17.0 25.2 17.8 3.0 289.0 286.0 2.7 7.6 11.0
total_sulfur_dioxide 30 122.0 112.1 102.5 104.8 120.8 9.0 440.0 431.0 1.1 0.5 20.5
density 30 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 -1.2 0.0
pH 30 3.3 0.2 3.2 3.3 0.2 2.9 3.6 0.8 -0.2 -1.0 0.0
sulphates 30 0.5 0.1 0.5 0.5 0.1 0.3 0.9 0.6 0.6 0.2 0.0
alcohol 30 10.2 1.1 10.2 10.2 1.2 8.0 12.6 4.6 0.1 -0.5 0.2
Quality-4:
Variables n mean sd median trimmed mad min max range skew kurtosis se
fixed_acidity 216 7.3 1.3 7.0 7.2 0.9 4.6 12.5 7.9 1.1 1.8 0.1
volatile_acidity 216 0.5 0.2 0.4 0.4 0.2 0.1 1.1 1.0 1.0 0.2 0.0
citric_acid 216 0.3 0.2 0.3 0.3 0.2 0.0 1.0 1.0 0.6 0.7 0.0
residual_sugar 216 4.2 3.8 2.2 3.5 1.6 0.7 17.6 16.9 1.4 1.1 0.3
chlorides 216 0.1 0.0 0.1 0.1 0.0 0.0 0.6 0.6 8.1 86.3 0.0
free_sulfur_dioxide 216 20.6 18.9 15.0 17.3 13.3 3.0 138.5 135.5 2.5 10.0 1.3
total_sulfur_dioxide 216 103.4 61.3 102.0 101.6 71.2 7.0 272.0 265.0 0.3 -0.8 4.2
density 216 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 -0.1 -0.6 0.0
pH 216 3.2 0.2 3.2 3.2 0.2 2.7 3.9 1.2 0.5 0.2 0.0
sulphates 216 0.5 0.2 0.5 0.5 0.1 0.3 2.0 1.8 4.0 30.9 0.0
alcohol 216 10.2 1.0 10.0 10.1 1.0 8.4 13.5 5.1 0.7 0.1 0.1
Quality-5:
Variables n mean sd median trimmed mad min max range skew kurtosis se
fixed_acidity 2138 7.3 1.3 7.1 7.2 0.9 4.5 15.9 11.4 1.7 5.7 0.0
volatile_acidity 2138 0.4 0.2 0.3 0.4 0.1 0.1 1.3 1.2 1.1 1.3 0.0
citric_acid 2138 0.3 0.2 0.3 0.3 0.1 0.0 1.0 1.0 0.4 0.2 0.0
residual_sugar 2138 5.8 5.0 3.0 5.1 2.7 0.6 23.5 22.9 1.0 0.0 0.1
chlorides 2138 0.1 0.0 0.1 0.1 0.0 0.0 0.6 0.6 5.2 41.5 0.0
free_sulfur_dioxide 2138 30.2 18.6 27.0 28.9 20.8 2.0 131.0 129.0 0.7 0.3 0.4
total_sulfur_dioxide 2138 120.8 60.8 127.0 121.4 69.7 6.0 344.0 338.0 -0.1 -0.8 1.3
density 2138 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 -0.2 -0.3 0.0
pH 2138 3.2 0.2 3.2 3.2 0.1 2.8 3.8 1.0 0.4 0.3 0.0
sulphates 2138 0.5 0.1 0.5 0.5 0.1 0.3 2.0 1.7 2.5 13.3 0.0
alcohol 2138 9.8 0.8 9.6 9.7 0.6 8.0 14.9 6.9 1.2 2.0 0.0
Quality-6:
Variables n mean sd median trimmed mad min max range skew kurtosis se
fixed_acidity 2836 7.2 1.3 6.9 7.0 0.9 3.8 14.3 10.5 1.7 4.7 0.0
volatile_acidity 2836 0.3 0.1 0.3 0.3 0.1 0.1 1.0 1.0 1.5 2.4 0.0
citric_acid 2836 0.3 0.1 0.3 0.3 0.1 0.0 1.7 1.7 0.8 4.7 0.0
residual_sugar 2836 5.5 4.9 3.1 4.8 2.7 0.7 65.8 65.1 1.7 8.5 0.1
chlorides 2836 0.1 0.0 0.0 0.0 0.0 0.0 0.4 0.4 4.4 32.9 0.0
free_sulfur_dioxide 2836 31.2 16.8 29.0 30.2 17.8 1.0 112.0 111.0 0.7 0.5 0.3
total_sulfur_dioxide 2836 115.4 55.5 117.0 115.6 56.3 6.0 294.0 288.0 0.0 -0.6 1.0
density 2836 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.1 1.2 15.5 0.0
pH 2836 3.2 0.2 3.2 3.2 0.2 2.7 4.0 1.3 0.4 0.6 0.0
sulphates 2836 0.5 0.1 0.5 0.5 0.1 0.2 2.0 1.7 1.7 8.4 0.0
alcohol 2836 10.6 1.1 10.5 10.5 1.3 8.4 14.0 5.6 0.4 -0.6 0.0
Quality-7:
Variables n mean sd median trimmed mad min max range skew kurtosis se
fixed_acidity 1079 7.1 1.4 6.9 6.9 0.9 4.2 15.6 11.4 1.9 5.8 0.0
volatile_acidity 1079 0.3 0.1 0.3 0.3 0.1 0.1 0.9 0.8 1.3 2.9 0.0
citric_acid 1079 0.3 0.1 0.3 0.3 0.1 0.0 0.8 0.8 0.4 2.4 0.0
residual_sugar 1079 4.7 4.0 2.8 4.0 2.1 0.9 19.3 18.4 1.4 1.0 0.1
chlorides 1079 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.3 4.2 43.3 0.0
free_sulfur_dioxide 1079 30.4 14.9 30.0 29.9 14.8 3.0 108.0 105.0 0.5 0.8 0.5
total_sulfur_dioxide 1079 108.5 47.9 114.0 110.0 40.0 7.0 289.0 282.0 -0.2 0.0 1.5
density 1079 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.6 -0.4 0.0
pH 1079 3.2 0.2 3.2 3.2 0.2 2.8 3.8 1.0 0.3 0.0 0.0
sulphates 1079 0.5 0.2 0.5 0.5 0.2 0.2 1.4 1.1 0.8 0.6 0.0
alcohol 1079 11.4 1.2 11.4 11.4 1.2 8.6 14.2 5.6 -0.3 -0.5 0.0
Quality-8:
Variables n mean sd median trimmed mad min max range skew kurtosis se
fixed_acidity 193 6.8 1.1 6.8 6.8 0.7 3.9 12.6 8.7 1.3 4.7 0.1
volatile_acidity 193 0.3 0.1 0.3 0.3 0.1 0.1 0.9 0.7 1.2 2.2 0.0
citric_acid 193 0.3 0.1 0.3 0.3 0.1 0.0 0.7 0.7 0.5 2.9 0.0
residual_sugar 193 5.4 4.2 4.1 4.8 3.4 0.8 14.8 14.0 1.0 -0.3 0.3
chlorides 193 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 1.9 6.0 0.0
free_sulfur_dioxide 193 34.5 17.2 34.0 33.5 13.3 3.0 105.0 102.0 1.1 3.1 1.2
total_sulfur_dioxide 193 117.5 42.1 118.0 119.1 35.6 12.0 212.5 200.5 -0.3 0.5 3.0
density 193 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.9 0.1 0.0
pH 193 3.2 0.2 3.2 3.2 0.1 2.9 3.7 0.8 0.2 -0.2 0.0
sulphates 193 0.5 0.2 0.5 0.5 0.2 0.3 1.1 0.9 0.8 0.2 0.0
alcohol 193 11.7 1.3 12.0 11.8 1.2 8.5 14.0 5.5 -0.8 0.0 0.1
Interpretation: Although the analysis is quite lengthy, the above statistical summaries and
plots let us find cut-off points for bad and excellent quality wine. It is observed that
people find a wine to be of bad or excellent quality when the following criteria are
satisfied:
Variables              Bad Quality                            Excellent Quality
fixed_acidity          less than 4 or greater than 8          between 6.5 and 7.5
volatile_acidity       greater than 0.5                       between 0.2 and 0.4
citric_acid            greater than 0.5                       between 0.2 and 0.5
residual_sugar         greater than 10                        between 0.5 and 5
chlorides              greater than 0.05                      between 0.01 and 0.04
free_sulfur_dioxide    greater than 45                        between 20 and 40
total_sulfur_dioxide   less than 30 or greater than 150       between 60 and 145
density                greater than 1                         between 0.98 and 1
pH                     less than 3                            between 3 and 3.6
sulphates              less than 0.3                          between 0.4 and 0.75
alcohol                less than 8 or greater than 13.5       between 10.5 and 13
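The cut-offs above can be applied to a single wine as a simple range check. This is a sketch only: the dictionary names, the `screen` function, the "neutral" label for values that meet neither criterion, the inclusive range endpoints, and the sample wine are assumptions introduced for illustration.

```python
# Excellent-quality ranges (low, high) taken from the cut-off table.
EXCELLENT_RANGES = {
    "fixed_acidity": (6.5, 7.5), "volatile_acidity": (0.2, 0.4),
    "citric_acid": (0.2, 0.5), "residual_sugar": (0.5, 5),
    "chlorides": (0.01, 0.04), "free_sulfur_dioxide": (20, 40),
    "total_sulfur_dioxide": (60, 145), "density": (0.98, 1.0),
    "pH": (3.0, 3.6), "sulphates": (0.4, 0.75), "alcohol": (10.5, 13),
}

# Bad-quality limits (below, above); None means no limit on that side.
BAD_LIMITS = {
    "fixed_acidity": (4, 8), "volatile_acidity": (None, 0.5),
    "citric_acid": (None, 0.5), "residual_sugar": (None, 10),
    "chlorides": (None, 0.05), "free_sulfur_dioxide": (None, 45),
    "total_sulfur_dioxide": (30, 150), "density": (None, 1.0),
    "pH": (3.0, None), "sulphates": (0.3, None), "alcohol": (8, 13.5),
}

def screen(wine):
    """Label each measured variable as 'bad', 'excellent' or 'neutral'."""
    flags = {}
    for name, value in wine.items():
        below, above = BAD_LIMITS[name]
        low, high = EXCELLENT_RANGES[name]
        if (below is not None and value < below) or (above is not None and value > above):
            flags[name] = "bad"
        elif low <= value <= high:          # endpoints assumed inclusive
            flags[name] = "excellent"
        else:
            flags[name] = "neutral"         # between the two criteria
    return flags

# Hypothetical wine, not taken from the dataset.
flags = screen({"fixed_acidity": 7.1, "volatile_acidity": 0.3,
                "residual_sugar": 12.0, "alcohol": 9.5})
print(flags)
```

Values such as an alcohol level of 9.5 fall in neither column of the table, which is why the sketch needs a third label.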
Model Building and Classification Analysis:
After working through the exploratory data analysis, we have observed the behavior of the
wine dataset and, from the patterns in the data, the possible reasons for a wine to be of
bad or good quality. As a next step, I used the random forest algorithm to classify the
wines. The variable importance plot and the Mean Decrease Accuracy show which chemical
components are most important for the quality classification. I found an accuracy level of
85%, which is fairly good: the variables classify the quality correctly in most cases. The
model can be improved, and further analysis can be done to better understand the variable
importance.
Quality-9:
Variables n mean sd median trimmed mad min max range skew kurtosis se
fixed_acidity 5 7.4 1.0 7.1 7.4 0.4 6.6 9.1 2.5 0.8 -1.2 0.4
volatile_acidity 5 0.3 0.1 0.3 0.3 0.0 0.2 0.4 0.1 0.2 -2.2 0.0
citric_acid 5 0.4 0.1 0.4 0.4 0.1 0.3 0.5 0.2 0.1 -2.0 0.0
residual_sugar 5 4.1 3.8 2.2 4.1 0.9 1.6 10.6 9.0 0.9 -1.2 1.7
chlorides 5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.2 -2.1 0.0
free_sulfur_dioxide 5 33.4 13.4 28.0 33.4 4.4 24.0 57.0 33.0 1.0 -1.0 6.0
total_sulfur_dioxide 5 116.0 19.8 119.0 116.0 8.9 85.0 139.0 54.0 -0.4 -1.4 8.9
density 5 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0 -1.0 0.0
pH 5 3.3 0.1 3.3 3.3 0.1 3.2 3.4 0.2 0.0 -1.9 0.0
sulphates 5 0.5 0.1 0.5 0.5 0.1 0.4 0.6 0.3 0.4 -1.5 0.0
alcohol 5 12.2 1.0 12.5 12.2 0.3 10.4 12.9 2.5 -1.0 -1.0 0.5
Variable Importance Plot:
Importance of the Variables:
Variables MeanDecreaseAccuracy
alcohol 82.6
free_sulfur_dioxide 64.4
volatile_acidity 62.0
pH 61.1
sulphates 60.3
residual_sugar 59.7
chlorides 56.0
fixed_acidity 52.8
density 51.2
total_sulfur_dioxide 49.6
citric_acid 46.7
Alcohol is found to be the most important variable for classifying wine quality, and
citric acid the least important.
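The idea behind a mean-decrease-in-accuracy score can be sketched by permuting one variable at a time and measuring how much accuracy drops. This is a simplified stand-in, not the procedure that produced the table: the random forest implementation used there may compute and scale the score differently. The threshold rule `predict`, the function names, and the data are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def mean_decrease_accuracy(predict, rows, labels, feature, repeats=20, seed=0):
    """Average drop in accuracy after shuffling one feature across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only the chosen feature permuted.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats

# Hypothetical stand-in model: a single threshold on alcohol.
def predict(row):
    return "Good" if row["alcohol"] >= 11.0 else "Normal"

# Made-up rows; the labels follow the stand-in rule exactly.
rows = [{"alcohol": a, "citric_acid": c} for a, c in [
    (9.4, 0.30), (9.8, 0.25), (10.2, 0.40), (10.6, 0.35),
    (11.2, 0.28), (11.8, 0.33), (12.4, 0.45), (12.9, 0.31)]]
labels = [predict(r) for r in rows]

alcohol_drop = mean_decrease_accuracy(predict, rows, labels, "alcohol")
citric_drop = mean_decrease_accuracy(predict, rows, labels, "citric_acid")
print(f"alcohol: {alcohol_drop:.3f}  citric_acid: {citric_drop:.3f}")
```

A variable the model never uses, like citric acid in this toy rule, shows no drop at all, while a variable the model depends on shows a clear one.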
Confusion Matrix (rows: target class; columns: predicted class):
Target     Bad    Good    Normal
Bad          9       0         1
Good         1     227        54
Normal      68     162      1428
Accuracy: 85%
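The reported accuracy can be checked directly from the confusion matrix, and the same counts give per-class recall and precision. The sketch assumes the orientation stated by the labels (rows are target classes, columns are predicted classes); the variable names are illustrative.

```python
labels = ["Bad", "Good", "Normal"]

# matrix[target][predicted], copied from the confusion matrix above.
matrix = {
    "Bad":    {"Bad": 9,  "Good": 0,   "Normal": 1},
    "Good":   {"Bad": 1,  "Good": 227, "Normal": 54},
    "Normal": {"Bad": 68, "Good": 162, "Normal": 1428},
}

total = sum(sum(row.values()) for row in matrix.values())
correct = sum(matrix[c][c] for c in labels)          # diagonal entries
accuracy = correct / total

# Recall: share of each target class that was predicted correctly.
recall = {c: matrix[c][c] / sum(matrix[c].values()) for c in labels}
# Precision: share of each predicted class that was actually that class.
precision = {c: matrix[c][c] / sum(matrix[t][c] for t in labels) for c in labels}

print(f"accuracy = {correct}/{total} = {accuracy:.3f}")   # 1664/1950 = 0.853
for c in labels:
    print(f"{c:>6}: recall {recall[c]:.3f}  precision {precision[c]:.3f}")
```

The overall figure matches the reported 85%, and the per-class numbers show where the remaining errors fall.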
Note:
• Although the dataset contains outliers, I have not deleted the outlier observations, as I
believe we need to identify those cases where the wine-making procedure can be improved.
• As our target is to identify the relation between the perceived quality of the wine and
its chemical composition, I did not treat red and white wine as separate datasets; on
inspection, there are no significant differences between the red and white wine data.
• To classify the wines, I have considered quality levels 1, 2, 3, 4 as Bad, quality levels
5, 6, 7 as Normal, and quality levels 8, 9, 10 as Good.
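The three-class grouping in the note can be written as a small mapping function. The function name `quality_class` is hypothetical; the per-score counts are the `n` values of the per-quality summary tables (scores 3 to 9), which add up to the 6497 observations.

```python
from collections import Counter

def quality_class(score):
    """Map a 1-10 quality score to the Bad / Normal / Good grouping."""
    if not 1 <= score <= 10:
        raise ValueError(f"quality score out of range: {score}")
    if score <= 4:
        return "Bad"
    if score <= 7:
        return "Normal"
    return "Good"

# Observations per quality score, from the n column of the per-quality tables.
counts = {3: 30, 4: 216, 5: 2138, 6: 2836, 7: 1079, 8: 193, 9: 5}

class_totals = Counter()
for score, n in counts.items():
    class_totals[quality_class(score)] += n

print(dict(class_totals))
```

The totals make the class imbalance visible: the Normal group is far larger than the other two combined.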
