Practical Data Science: Data Modelling and Presentation
Goal:
To predict the quality of red wine using binary classification with KNN
and Decision Tree classifiers
Data Set Characteristics:
Number of Instances: 1599
Number of Attributes: 12 (11 predictors and 1 response)
Predictors:
fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulphur dioxide, total sulphur dioxide, density, pH, sulphates and alcohol
Response:
Quality
What's Good about the Data Set:
• None of the columns have missing values.
• Nominal variables have few levels.
• There are no text columns.
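The three checks above can be reproduced with a few pandas calls. This is a minimal sketch: `describe_quality_checks` is a hypothetical helper name, and the small `toy` DataFrame is made-up stand-in data rather than the wine values; in practice the helper would be applied to the loaded red-wine CSV.

```python
import pandas as pd


def describe_quality_checks(df: pd.DataFrame) -> dict:
    """Hypothetical helper: summarise missing values, text columns and levels."""
    return {
        # Total number of missing cells across all columns
        "missing_values": int(df.isna().sum().sum()),
        # Columns holding text (object or string dtype)
        "text_columns": list(df.select_dtypes(include=["object", "string"]).columns),
        # Number of distinct values per column, useful for spotting nominal variables
        "levels": df.nunique().to_dict(),
    }


# Hypothetical toy frame standing in for the wine data (not the real values)
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 11.2, 10.0],
    "pH": [3.51, 3.20, 3.16, 3.30],
    "quality": [5, 5, 6, 7],
})
checks = describe_quality_checks(toy)
print(checks)
```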
Data Preparation:
1. The Quality column, an ordinal variable ranging from 2 to 8, is split
into two groups at its median with pd.cut(), and the groups are
labelled 0 (bad) and 1 (good) with LabelEncoder().
2. Outliers are not removed; they are kept in the data used for predictive modelling.
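A minimal sketch of the median split with pd.cut() and LabelEncoder(), assuming the quality scores are held in a pandas Series. The scores below are hypothetical, and the lower bin edge `quality.min() - 1` is an assumption chosen so that the smallest score falls inside the first interval (for scores starting at 3, this gives the edge 2).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical quality scores standing in for the real 'quality' column
quality = pd.Series([3, 5, 5, 6, 6, 7, 8, 4], name="quality")

median = quality.median()  # 5.5 for this toy sample
# Two right-closed bins: (min - 1, median] -> "bad", (median, max] -> "good"
bins = [quality.min() - 1, median, quality.max()]
labels = pd.cut(quality, bins=bins, labels=["bad", "good"])

# LabelEncoder sorts class names alphabetically, so "bad" -> 0 and "good" -> 1
encoder = LabelEncoder()
y = encoder.fit_transform(labels)
print(list(y))
```

Because pd.cut() intervals are closed on the right by default, a score exactly equal to the median is labelled bad (0).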
Data Visualisation:
• Good-quality wines are less common than bad-quality wines, so the two classes are imbalanced.
• The numeric columns appear largely independent of each
other.
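Both observations can be checked numerically: `value_counts(normalize=True)` gives the class proportions, and `DataFrame.corr()` gives pairwise Pearson correlations, which is what a correlation heatmap displays. The sketch uses randomly generated stand-in columns (hypothetical, not the wine data), and `quality_bin` is an assumed name for the binarised target.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in data: two numeric predictors and a binary target
df = pd.DataFrame({
    "alcohol": rng.normal(10.4, 1.1, size=200),
    "sulphates": rng.normal(0.66, 0.17, size=200),
})
df["quality_bin"] = (rng.random(200) < 0.15).astype(int)  # roughly 15% "good" (made up)

# Class balance: share of bad (0) versus good (1) wines
print(df["quality_bin"].value_counts(normalize=True))

# Pairwise Pearson correlations between the numeric predictors
corr = df[["alcohol", "sulphates"]].corr()
print(corr.round(2))
```

A correlation near zero only rules out a linear relationship, so a heatmap of this matrix supports "largely independent" rather than proving independence.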
Data Modelling
❖ Split the data set into X (the predictors) and y (the response variable).
❖ Split X and y into training and test sets in the ratios 50:50, 60:40 and 80:20.
❖KNN Classifier:
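‣ Use StandardScaler to put all predictors on the same scale; KNN is distance-based, so features with large ranges would otherwise dominate the distance.
‣ Two important parameters of KNeighborsClassifier() are
metric: Minkowski (Euclidean distance when p = 2, scikit-learn's default)
n_neighbors: the square root of the number of observations in the test set
(len(y_test) here).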
‣ Create and Train the Model
‣ Make predictions with the Model
‣ Evaluate the predictions using a confusion matrix and a classification report.
‣ Accuracy changes only slightly with the split: it falls from 88.25% at 50:50 to 86.56% at 80:20, so it decreases as the test size gets smaller.
Train:Test                        50:50    60:40    80:20
Precision, bad quality (0) %        89       89       89
Precision, good quality (1) %       66       59       50
Accuracy %                       88.25    87.03    86.56
Precision (%) for bad (0) and good (1) quality, along with overall accuracy (%),
for each train:test split
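The KNN steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the original notebook: make_classification() generates synthetic stand-in data with 11 predictors and an imbalanced binary label, random_state=42 is an arbitrary choice, and truncating the square root to an integer is an assumed rounding rule. StandardScaler sits inside a pipeline so its scaling statistics are learned from the training split only.

```python
import math

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 1599 rows, 11 predictors, roughly 86% class 0
X, y = make_classification(
    n_samples=1599, n_features=11, weights=[0.86], random_state=42
)

results = {}
for test_size in (0.5, 0.4, 0.2):  # 50:50, 60:40 and 80:20 train:test splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    # n_neighbors = square root of the test-set size, truncated (assumed rounding)
    k = int(math.sqrt(len(y_test)))
    model = make_pipeline(
        StandardScaler(),  # fitted on the training split only, inside the pipeline
        KNeighborsClassifier(n_neighbors=k, metric="minkowski", p=2),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[test_size] = (k, accuracy_score(y_test, y_pred))

    print(f"test_size={test_size}, k={k}")
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))
```

With 1,599 rows, the three test sets hold 800, 640 and 320 observations, so k works out to 28, 25 and 17.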
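❖ Decision Tree Classifier:
‣ No standard scaling is required: each split compares a single feature against a threshold, so the tree depends only on the ordering of values, which also makes it fairly robust to
outliers.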
‣ Important parameters of DecisionTreeClassifier() are
criterion: Gini index
max_depth: values from 2 to 10 were tried, and 4 gave the best
results.
‣ Create and Train the Model
‣ Make predictions with the Model
‣ Evaluate the predictions using a confusion matrix
and a classification report.
‣ As with KNN, accuracy decreases as the test size gets smaller (89.37% at 50:50, 86.56% at 80:20).
‣ Interestingly, the KNN and Decision Tree models have exactly the same accuracy (86.56%) when the train and test sets are split in the ratio 80:20.
Train:Test                        50:50    60:40    80:20
Precision, bad quality (0) %        91       91       91
Precision, good quality (1) %       52       70       70
Accuracy %                       89.37    89.35    86.56
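A sketch of the max_depth search for the Decision Tree, again on synthetic stand-in data (an assumption). Here each depth from 2 to 10 is scored with 5-fold cross-validation on the training split only; this is a swapped-in choice that keeps the test set untouched until the final evaluation, since the slides do not say how the depths were compared. With criterion="gini", node impurity is measured as 1 minus the sum of squared class proportions in the node, and no scaler is used.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption), shaped like the wine data; no scaling is applied
X, y = make_classification(
    n_samples=1599, n_features=11, weights=[0.86], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Mean 5-fold cross-validated accuracy for each max_depth from 2 to 10
cv_scores = {}
for depth in range(2, 11):
    tree = DecisionTreeClassifier(criterion="gini", max_depth=depth, random_state=42)
    cv_scores[depth] = cross_val_score(tree, X_train, y_train, cv=5).mean()

best_depth = max(cv_scores, key=cv_scores.get)

# Refit on the whole training split with the chosen depth, then evaluate once on the test set
final_tree = DecisionTreeClassifier(criterion="gini", max_depth=best_depth, random_state=42)
final_tree.fit(X_train, y_train)
y_pred = final_tree.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print("best max_depth:", best_depth)
print(confusion_matrix(y_test, y_pred))
print(f"test accuracy: {test_accuracy:.4f}")
```

Choosing the depth on the test set itself would make the reported test accuracy optimistic, which is why the search here uses only the training split.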
Conclusion:
In conclusion, the fitted KNN and Decision Tree models have very similar accuracy. Since the Decision Tree classifier has slightly higher accuracy
at the 50:50 and 60:40 splits (89.37% and 89.35%, versus 88.25% and 87.03% for KNN), it is
recommended for this data set.
Thank You