1. GROUP 4
Investigating the Best Quality Measurements
-- by using mathematical models
Student IDs: 9685718 9744049 8455558 9794830 8517199 9652148
2. Introduction
- Decision for procurement of Portuguese white wine
- Decision made based on the wine's characteristics
- Classification methods were used
Aim: Find the best-quality wines using the most accurate method
Methods used:
- Decision Trees
- Logistic regression
- Multiple regression
- KNN algorithm
6. Step 01: Multiple Regression
Performed using Excel
Removed variables with high p-values (not statistically significant)
Accuracy below 50%
None of the three methods mentioned above was considered reliable for classification.
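The deck ran this step in Excel and does not say how the regression output was converted into a class. Purely as an illustration, the following MATLAB sketch fits a least-squares model on invented data and rounds each prediction to the nearest quality score; the data, the rounding rule, and the variable names are all assumptions, not the group's procedure.

```matlab
% Hypothetical data: 6 wines, 2 features, integer quality scores.
X = [ 1 2; 2 1; 3 4; 4 3; 5 6; 6 5 ];
y = [ 4; 4; 5; 5; 6; 6 ];

A    = [ones(size(X,1),1) X];   % prepend an intercept column
beta = A \ y;                   % least-squares regression coefficients
yhat = round(A * beta);         % assumption: round to the nearest quality score
accuracy = mean(yhat == y);     % fraction of wines classified correctly
```

On real data the rounded predictions rarely match this well, which is consistent with the low accuracy reported on the slide.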
8. Algorithm Design of Main Function
function [averagewrong,precision] = main(repeattimes)
[alldata,txt,raw] = xlsread('wine-quality');
len = size(alldata,1);
sample_number = round(0.8*len);        % Set 80% of the data as the sample (training) data
test_number = len - sample_number;     % Set the rest of the data as test data
for k=1:10
    wrong = [];
    for t=1:repeattimes
        sequence = randperm(len);                           % Generate a random permutation
        sample = alldata(sequence(1:sample_number),:);      % Pick random samples from the data
        test = alldata(sequence(sample_number+1:end),:);    % Rest of the data as test data
9. Algorithm Design of Main Function (continued): Draw the plot
        wrongnumber = classify(sample,test,k);   % KNN method applied
        wrong = [wrong wrongnumber];
    end
    averagewrong(k) = mean(wrong);
end
figure;
plot(averagewrong,'k-*');
title(['Average errors from k=1 to k=10, with ',num2str(repeattimes),' repetitions']);
xlabel('k'); ylabel('number of errors')
precision = 1 - averagewrong/test_number;
figure;                                  % New figure, so the error plot is not overwritten
plot(precision,'r-*');
title(['Average precision from k=1 to k=10, with ',num2str(repeattimes),' repetitions']);
xlabel('k'); ylabel('precision')
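To make the precision formula concrete, here is a small MATLAB sketch with invented numbers (the test-set size and the error counts are hypothetical, not results from the wine data). It applies the same expression as the main function and then picks the k with the highest precision.

```matlab
test_number  = 200;                 % hypothetical test-set size
averagewrong = [90 95 100 104];     % hypothetical mean error counts for k = 1..4

% Same formula as in the main function: share of test wines classified correctly.
precision = 1 - averagewrong / test_number;

% The k with the fewest average errors has the highest precision.
[best, best_k] = max(precision);
fprintf('best k = %d, precision = %.3f\n', best_k, best);
```

Because precision is a decreasing function of the error count, the two plots carry the same information on different scales.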
10. Algorithm Design of Classification Function
function [wrongnumber] = classify(traindata,testdata,k)
test_len = size(testdata,1);
train_len = size(traindata,1);
predict = zeros(test_len,1);
wrongnumber = 0;
for i=1:test_len
    % Euclidean distance between the test row and every sample row.
    % Only the feature columns are used; the quality column is excluded so the label does not leak into the distance.
    test = repmat(testdata(i,1:end-1),train_len,1);
    temp = sum((test-traindata(:,1:end-1)).^2,2).^0.5;
    temp = [temp traindata(:,end)];    % First column is distance, second column is quality
    temp = sortrows(temp,1);           % Rank by distance
    tt = [(1:9)' zeros(9,1)];          % Vote table: quality score (1-9) and its count among the k closest
    for w=1:k
        quality = temp(w,2);
        tt(quality,2) = tt(quality,2)+1;
    end
    tt = sortrows(tt,-2);              % Rank the qualities by how many times each appears
    predict(i) = tt(1,1);              % Choose the one that appears most often
    if predict(i)~=testdata(i,end)
        wrongnumber = wrongnumber+1;
    end
end
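The voting step for a single wine can be sketched on a toy training set as follows. The data values are invented, and `mode` is swapped in for the vote table used above; it is a compact alternative, not the group's code.

```matlab
% Hypothetical training set: two features, last column is the quality label.
train = [ 1.0 1.0 5;
          1.2 0.9 5;
          0.9 1.1 6;
          5.0 5.0 7;
          5.1 4.9 7 ];
query = [1.1 1.0];      % features of the wine to classify
k = 3;

% Euclidean distance from the query to every training row (features only).
diffs = bsxfun(@minus, train(:, 1:end-1), query);
dist  = sqrt(sum(diffs.^2, 2));

% Labels of the k nearest neighbours, then the most frequent label.
[~, order] = sort(dist);
nearest    = train(order(1:k), end);
predicted  = mode(nearest);
```

The three nearest rows carry the labels 5, 5 and 6, so the predicted quality is 5. When two labels are tied, `mode` returns the smaller one.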
16. Step 3: Normalization
Normalization is the transformation from a normally distributed random variable to a random variable following a standard normal distribution.
Reference: The Concise Encyclopedia of Statistics, pp. 387-388
The code below instead applies min-max scaling, which rescales each feature to the range [0, 1].
*Code of Normalization:
for i=1:size(alldata,2)-1
    % (x - x_min) / (x_max - x_min); the last column (quality) is left unchanged
    alldata(:,i) = (alldata(:,i)-min(alldata(:,i))) / ...
        (max(alldata(:,i))-min(alldata(:,i)));
end
Formula: $X = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}} \in [0, 1]$
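As a quick check of the formula, a minimal MATLAB sketch on a made-up 3-by-3 matrix (the values are hypothetical, not taken from the wine data set). The last column plays the role of the quality label and is left unscaled.

```matlab
% Toy data: two feature columns plus a label column (hypothetical values).
data = [ 2 10 5;
         4 30 6;
         6 20 7 ];

features = data(:, 1:end-1);
colmin = min(features, [], 1);          % per-column minimum
colmax = max(features, [], 1);          % per-column maximum

% Min-max scaling: (x - x_min) / (x_max - x_min), column by column.
% bsxfun keeps this working on releases without implicit expansion.
scaled = bsxfun(@rdivide, bsxfun(@minus, features, colmin), colmax - colmin);

data(:, 1:end-1) = scaled;
disp(data)
```

Each feature column now runs from 0 (its minimum) to 1 (its maximum). A column whose values are all equal would produce a division by zero, so such columns need separate handling.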