改4 (1)

GROUP 4
Investigating the Best Quality Measurements
--by using mathematical models
Student IDs: 9685718 9744049 8455558 9794830 8517199 9652148

Introduction
- Decision on the procurement of Portuguese white wine
- Decision made based on the wine's characteristics
- Classification methods were used
Aim: Find the best-quality wines using the most accurate method
Methods used:
- Decision trees
- Logistic regression
- Multiple regression
- KNN algorithm

01 Initial Trials
-- Decision Tree, Logistic Regression & Multiple Regression

Step/01 Decision Tree
Built in Python
Model run 10 times
Accuracy below 60%
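The slides do not say which Python library or settings produced the tree. As a hedged, dependency-free illustration only, the sketch below fits a depth-1 decision tree (a "decision stump"). It tries every threshold on every feature and keeps the split with the lowest weighted Gini impurity. The names `fit_stump` and `predict_stump` are hypothetical. A full tree would repeat the same search recursively on each side of the split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in the list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Return (feature_index, threshold, left_label, right_label) for the best single split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[j] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, threshold, majority(left), majority(right))
    if best is None:  # no usable split: predict the majority class everywhere
        label = majority(labels)
        return 0, float("inf"), label, label
    _, j, threshold, left_label, right_label = best
    return j, threshold, left_label, right_label


def predict_stump(stump, row):
    """Send the row left or right of the threshold and return that side's label."""
    j, threshold, left_label, right_label = stump
    return left_label if row[j] <= threshold else right_label
```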

Step/01 Logistic Regression
Performed in MATLAB
Performed 5 times
Accuracy below 55%
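The MATLAB code for this trial is not shown. The minimal Python sketch below illustrates the idea. It assumes a binary target, for example "high quality" versus "not high quality", which simplifies the multi-level quality score. The weights are fitted by batch gradient descent on the log-loss. The learning rate, the epoch count and the function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Binary logistic regression (labels 0/1) fitted by batch gradient descent.

    Returns (weights, bias).
    """
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # gradient of the log-loss: (predicted probability - true label) * input
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_logistic(model, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    w, b = model
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold else 0
```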

Step/01 Multiple Regression
Performed in Excel
Removed variables with high p-values (not statistically significant)
Accuracy below 50%
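The slides do not say how regression accuracy was measured. The minimal sketch below assumes that each continuous prediction is rounded to the nearest integer quality score and then compared with the true score. It solves the normal equations (XᵀX)β = Xᵀy directly. The p-value-based removal of variables is not reproduced, and all names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    x = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict_quality(beta, row):
    """Round the regression output to an integer score (Python rounds .5 ties to even)."""
    return round(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
```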
None of the three methods above was reliable enough
to use for classification.

02 Basic Code Interpretation
-- k-NN in MATLAB

Algorithm Design of the Main Function

function [averagewrong,precision] = main(repeattimes)
% Load the data; the last column holds the quality score
[alldata,txt,raw] = xlsread('wine-quality');
len = size(alldata,1);
sample_number = round(0.8*len);       % 80% of the rows are sample (training) data
test_number = len - sample_number;    % the remaining rows are test data
averagewrong = zeros(1,10);
for k=1:10
    wrong = [];
    for t=1:repeattimes
        sample = [];
        test = [];
        sequence = randperm(len);     % random permutation of the row indices
        for i=1:len
            if i<=sample_number
                sample = [sample;alldata(sequence(i),:)];   % pick random rows as sample data
            else
                test = [test;alldata(sequence(i),:)];       % the rest of the rows are test data
            end
        end
        wrongnumber = classify(sample,test,k);   % apply the k-NN method
        wrong = [wrong wrongnumber];
    end
    averagewrong(k) = mean(wrong);   % average number of errors over all repeats
end
% Draw the plots
figure;
plot(averagewrong,'k-*');
title(['Average errors from k=1 to k=10, while repeat times is ',num2str(repeattimes)]);
xlabel('k'); ylabel('number of errors');
precision = 1-averagewrong/test_number;
figure;
plot(precision,'r-*');
title(['Average precision from k=1 to k=10, while repeat times is ',num2str(repeattimes)]);
xlabel('k'); ylabel('precision');
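The sample/test split in `main` can be mirrored in Python as follows. This is a hedged translation, not the authors' code. `split_rows` is a hypothetical name, and Python's `round` differs from MATLAB's `round` only on exact .5 ties.

```python
import random


def split_rows(rows, train_fraction=0.8, rng=None):
    """Shuffle row indices (like MATLAB's randperm) and split into sample and test rows."""
    rng = rng or random.Random()
    order = list(range(len(rows)))
    rng.shuffle(order)
    n_sample = round(train_fraction * len(rows))
    sample = [rows[i] for i in order[:n_sample]]  # first 80% of the shuffled rows
    test = [rows[i] for i in order[n_sample:]]    # the rest become test data
    return sample, test
```

Passing a seeded `random.Random(seed)` makes a single trial reproducible. Calling the function once per repeat with an unseeded generator corresponds to the `repeattimes` loop.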

Algorithm Design of the Classification Function
function [wrongnumber] = classify(traindata,testdata,k)
test_len = size(testdata,1);
train_len = size(traindata,1);
predict = zeros(1,test_len);
wrongnumber = 0;
for i=1:test_len
    test = repmat(testdata(i,:),train_len,1);
    % Euclidean distance between the test row and every sample row
    % (computed over all columns, including the last, quality, column)
    temp = sum((test-traindata).^2,2).^0.5;
    temp = [temp traindata(:,end)];   % first column: distance; second column: quality
    temp = sortrows(temp,1);          % rank sample rows by distance
    tt = ones(9,2);                   % one row per quality level
    tt(:,1) = cumsum(tt(:,1));        % first column: quality labels 1..9
    for w=1:k                         % tally the qualities of the k closest rows
        quality = temp(w,2);
        tt(quality,2) = tt(quality,2)+1;
    end
    tt = sortrows(tt,-2);             % rank qualities by how often they appear
    predict(i) = tt(1,1);             % choose the most frequent quality
    if predict(i)~=testdata(i,end)
        wrongnumber = wrongnumber+1;
    end
end
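For readers who want to experiment outside MATLAB, the voting logic of `classify` can be sketched in Python. This sketch differs from the MATLAB version in one deliberate way: the quality labels are stored separately from the features, so they never enter the distance. Ties in the vote go to the lowest label, a choice made for this sketch. The function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_rows)), key=lambda i: math.dist(train_rows[i], x))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    top = max(votes.values())
    # among equally frequent labels, return the smallest one
    return min(label for label, count in votes.items() if count == top)


def count_wrong(train_rows, train_labels, test_rows, test_labels, k):
    """Number of misclassified test rows, the quantity classify() returns."""
    return sum(
        knn_predict(train_rows, train_labels, x, k) != y
        for x, y in zip(test_rows, test_labels)
    )
```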

03 Results Analysis
-- in MATLAB

Computational Results
Step 1: Basic Model
Trial>> [averagewrong,precision] = main(50)
averagewrong =
  349.2400  432.3200  452.4800  464.0400  474.8000  478.1600
  475.6800  483.8600  480.1200
precision =
  0.6436  0.5589  0.5383  0.5326  0.5265  0.5155
  0.5121  0.5146  0.5063  0.5101
Best result at K=1
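The precision line in `main` converts the average error count into a proportion of the test set. The slides do not state the row count. If the input file is the 4,898-row UCI white-wine data set, the test set has 4898 - round(0.8 × 4898) = 980 rows. For k = 1, the precision is then 1 - 349.24/980 ≈ 0.6436, which matches the reported value:

```python
def precision(average_wrong, test_number):
    """precision = 1 - averagewrong / test_number, as computed in main()."""
    return 1 - average_wrong / test_number


# Assumption: 4898 rows in total, so the test set holds 4898 - 3918 = 980 rows
TEST_NUMBER = 4898 - round(0.8 * 4898)
```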
Single-Variable Accuracy (k=1)
Variable                 Accuracy
Fixed Acidity            99.95%
Volatile Acidity         100.00%
Citric Acid              100.00%
Residual Sugar           99.86%
Chlorides                100.00%
Free Sulfur Dioxide      98.59%
Total Sulfur Dioxide     96.68%
Density                  100.00%
pH                       100.00%
Sulphates                100.00%
Alcohol                  99.99% (100%)
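The slides do not show how the single-variable accuracies were computed. One hedged interpretation is a 1-NN classifier whose distance uses only feature column `j`. The sketch below shows that interpretation, with a hypothetical function name:

```python
def single_variable_accuracy(train_rows, train_labels, test_rows, test_labels, j):
    """1-NN accuracy when the distance is |a - b| on feature column j only."""
    correct = 0
    for x, y in zip(test_rows, test_labels):
        # index of the training row whose column-j value is closest to the test row's
        nearest = min(range(len(train_rows)), key=lambda i: abs(train_rows[i][j] - x[j]))
        correct += train_labels[nearest] == y
    return correct / len(test_rows)
```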
Step 2: Remove Variables
Delete the selected variables and rerun the model:

>> [averagewrong,precision] = main(50)
averagewrong =
  0.1800  0.4200  0.7800  0.7400  1.3000  1.5000
  2.1800  2.1200  2.0400  2.2800
precision =
  0.9998  0.9996  0.9992  0.9992  0.9987  0.9985
  0.9978  0.9978  0.9979  0.9977
Best result at K=1
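In MATLAB, columns can be deleted in place with, for example, `alldata(:,[1 4]) = [];`. The slides do not list which variables were removed. The same step in Python might look like the sketch below, which keeps the remaining columns in their original order. `drop_columns` and the example column names are hypothetical.

```python
def drop_columns(header, rows, to_drop):
    """Return (header, rows) without the named columns; other columns keep their order."""
    drop = set(to_drop)
    keep = [i for i, name in enumerate(header) if name not in drop]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]
```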

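Step 3: Normalization
The cited reference defines normalization as the transformation of a normally
distributed random variable into a random variable that follows the standard
normal distribution. The code below instead applies min-max normalization,
which rescales each variable to the range [0,1].
Reference: The Concise Encyclopedia of Statistics, pp. 387-388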
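The definition quoted from the reference corresponds to the standard z-transformation of a normal variable:

```latex
X \sim N(\mu, \sigma^2) \quad\Longrightarrow\quad Z = \frac{X - \mu}{\sigma} \sim N(0, 1)
```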
*Code of Normalization:
for i=1:size(alldata,2)-1   % every column except the last (quality) column
    colmin = min(alldata(:,i));
    colmax = max(alldata(:,i));
    alldata(:,i) = (alldata(:,i) - colmin) / (colmax - colmin);
end
Formula: X = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \in [0,1]
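A Python counterpart of the min-max formula follows the same assumption as the MATLAB loop: the last column holds the quality label and is left unscaled. The guard for a constant column, which maps it to 0 instead of dividing by zero, is an addition not present in the original. The function name is hypothetical.

```python
def min_max_normalize(rows, n_features):
    """Rescale the first n_features columns to [0, 1] via (x - min) / (max - min).

    Columns after n_features (e.g. the quality label) are copied unchanged.
    """
    out = [list(r) for r in rows]
    for j in range(n_features):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        for r in out:
            r[j] = (r[j] - lo) / span if span else 0.0  # constant column -> 0
    return out
```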

>> [averagewrong,precision] = main(50)
averagewrong =
0.0050 0.0240 0.0120 0.1760 0.1970 0.6160
0.5840 0.9420 0.9920 0.9790
precision =
1.0000 1.0000 1.0000 0.9998 0.9998 0.9994
0.9994 0.9990 0.9990 0.9990
Best result at K=1

Step 4: Combination
>> [averagewrong,precision] = main(1000)
averagewrong =
  0       0.0200  0.0200  0.1300  0.1590  0.6000
  0.5830  0.9800  0.9830  1.0340
precision =
  1.0000  1.0000  1.0000  0.9999  0.9998  0.9994
  0.9994  0.9990  0.9990  0.9989
Best result at K=1

Group 4
THANK YOU FOR WATCHING!
