Math 156
Homework 3 (Cui Yi)    ID number: 605068398
Q1.
Solution
The simplest representation of a linear discriminant function is obtained by taking a linear function of the input vector, so that

$$y(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}\mathbf{x} + w_0.$$

We can avoid these difficulties by considering a single K-class discriminant comprising K linear functions of the form

$$y_k(\mathbf{x}) = \mathbf{w}_k^{\mathrm{T}}\mathbf{x} + w_{k0}.$$

The decision boundary between class $\mathcal{C}_k$ and class $\mathcal{C}_j$ is therefore given by $y_k(\mathbf{x}) = y_j(\mathbf{x})$ and hence corresponds to a $(D-1)$-dimensional hyperplane defined by

$$(\mathbf{w}_k - \mathbf{w}_j)^{\mathrm{T}}\mathbf{x} + (w_{k0} - w_{j0}) = 0.$$

Consider two points $\mathbf{x}_A$ and $\mathbf{x}_B$, both of which lie on the decision surface. Because $y(\mathbf{x}_A) = y(\mathbf{x}_B) = 0$, we have $\mathbf{w}^{\mathrm{T}}(\mathbf{x}_A - \mathbf{x}_B) = 0$, and hence the vector $\mathbf{w}$ is orthogonal to every vector lying within the decision surface, and so $\mathbf{w}$ determines the orientation of the decision surface. Similarly, if $\mathbf{x}$ is a point on the decision surface, then $y(\mathbf{x}) = 0$, and so the normal distance from the origin to the decision surface is given by

$$\frac{\mathbf{w}^{\mathrm{T}}\mathbf{x}}{\lVert\mathbf{w}\rVert} = -\frac{w_0}{\lVert\mathbf{w}\rVert}.$$
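As a quick numerical sanity check of these two facts (a hypothetical example, not part of the assignment):

% Hypothetical example: verify the geometry of the decision surface numerically
w  = [3; 4];        % weight vector
w0 = -5;            % bias
xA = [1; 0.5];      % on the surface: 3*1 + 4*0.5 - 5 = 0
xB = [3; -1];       % on the surface: 3*3 + 4*(-1) - 5 = 0
w' * (xA - xB)      % returns 0: w is orthogonal to the decision surface
-w0 / norm(w)       % returns 1: the normal distance from the origin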
Figure 1. Scatter plot
Q2.
Solution
It makes sense to classify this data with a linear classifier, since the two Gaussian clusters are well separated.
% Generate 50 points from each of two 2-D Gaussian distributions
mu = [1 2]; Sigma = [.1 .05; .05 .2];
r = mvnrnd(mu, Sigma, 50);
scatter(r(:,1),r(:,2),'b.');
hold on
mu1 = [2 4]; Sigma1 = [.2 -.1; -.1 .3];
r1 = mvnrnd(mu1, Sigma1, 50);
scatter(r1(:,1),r1(:,2),'r+');
Figure 2. Scatter plot and boundary
Q3.
Solution
% Least-squares fit: augment the data with a bias column and solve for the
% coefficients; labels are +1 for r and -1 for r1
X = [[r; r1] ones(2*50, 1)];
b = [ones(50, 1); -ones(50, 1)];
coef = lscov(X, b);
% plot the decision boundary using the learning data we generated
xline = [0; 3];
yline = (-coef(3)-coef(1).*xline)./coef(2);
plot( xline, yline, '-k' );
xlim([0 3]); ylim([1 6]);
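The expression for yline is just the decision boundary solved for the second coordinate: writing coef as $(w_1, w_2, w_0)$, the boundary $w_1 x_1 + w_2 x_2 + w_0 = 0$ gives

$$x_2 = \frac{-w_0 - w_1 x_1}{w_2},$$

which the script evaluates at the two endpoints stored in xline.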
Figure 5. Scatter plot and boundary
Figure 6. Scatter plot and boundary (after adjusting)
When calculating the accuracy between r4 and r5 (R = [r4; r5]),
Accuracy = 0.9400
When calculating the accuracy between r4 and r6 (R = [r4; r6]),
Accuracy = 0.7200
When calculating the accuracy between r5 and r6 (R = [r5; r6]),
Accuracy = 0.5000
The overall accuracy = 0.7200
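The script that produced these numbers is not shown in the write-up; the following is a minimal sketch of how each pairwise accuracy could be computed, assuming the same least-squares setup as in Q3 (labels +1/-1 and the sign of the prediction as the decision rule):

% Minimal sketch (assumed, not from the original script): accuracy for one pair
R = [r4; r5];                      % stack one pair of sets
X = [R ones(100, 1)];              % augment with a bias column
b = [ones(50, 1); -ones(50, 1)];   % labels: +1 for r4, -1 for r5
coef = lscov(X, b);                % least-squares fit
Accuracy = mean(sign(X * coef) == b)

The other two pairs are scored the same way with R = [r4; r6] and R = [r5; r6].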
Figure 7. New points
Q6.
Solution
Then we test the performance by creating some new sets.
When calculating the accuracy between r4 and r5 (R = [r4; r5]),
Accuracy = 0.8400
When calculating the accuracy between r4 and r6 (R = [r4; r6]),
Accuracy = 0.5400
When calculating the accuracy between r5 and r6 (R = [r5; r6]),
Accuracy = 0.5200
The overall accuracy = 0.6333
These success rates do not make sense given the distribution of the data, which implies that the least-squares discriminant functions do not work well when the classes intersect and overlap. It also shows that least squares is highly sensitive to outliers.
Q7.
Solution
For each point x in sets 1, 2, and 3, we run the k-nearest-neighbor classification for k = 1 to 15.
% Plot the three sets r4, r5 and r6 generated earlier (the generation lines
% are kept commented out so that the same samples are reused)
%mu = [2 2]; Sigma = [.2 .05; .05 .3];
%r4 = mvnrnd(mu, Sigma, 50);
scatter(r4(:,1),r4(:,2),'b.');
hold on
%mu1 = [2 4]; Sigma1 = [.4 -.1; -.1 .3];
%r5 = mvnrnd(mu1, Sigma1, 50);
scatter(r5(:,1),r5(:,2),'r+');
hold on
%mu2 = [3 3]; Sigma2 = [.5 -.3; -.3 .4];
%r6 = mvnrnd(mu2, Sigma2, 50);
scatter(r6(:,1),r6(:,2),'kd');
legend('r4','r5','r6','Location','best');
% Using the Euclidean distance
figure
x = [r4; r5; r6];             % rows 1-50: D1, 51-100: D2, 101-150: D3
Mdl = KDTreeSearcher(x);      % build the k-d tree once
for k = 1:15
    for cycy = 1:50
        newpoint = r4(cycy,:);            % query each point of D1 in turn
        %line(newpoint(1),newpoint(2),'marker','x','color','k',...
        %    'markersize',k,'linewidth',2)
        [n,d] = knnsearch(Mdl, newpoint, 'k', k);
        %line(x(n,1),x(n,2),'color',[.5 .5 .5],'marker','o',...
        %    'linestyle','none','markersize',10);
        % Count how many of the k nearest neighbours fall in each set
        cx(cycy,:) = 0;
        cy(cycy,:) = 0;
        cz(cycy,:) = 0;
        for i = 1:k
            if n(i) <= 50
                cx(cycy,:) = cx(cycy,:) + 1;
            elseif n(i) <= 100
                cy(cycy,:) = cy(cycy,:) + 1;
            else
                cz(cycy,:) = cz(cycy,:) + 1;
            end
        end
    end
    classify = [cx cy cz];
    % Fraction of each point's neighbours that lie in its own set D1
    the_point_in_cluster(:,k) = cx./(cy+cz+cx);
    subplot(3,5,k)
    plot(the_point_in_cluster(:,k),'o')
    title(sprintf('k = %d',k))
end
Figure 8. The percentage for each point in set D1
Figure 9. 10 nearest points of the first row of D1
For example, when we find the 10 nearest points of the first row of D1, we can see that 8 of the 10 points are in D1, so it is classified into D1, which also shows the correctness of the KNN classification.
Figure 10. 10 nearest points of the first row of D2
Figure 11. 10 nearest points of the first row of D3
When we find the 10 nearest points of the first row of D2, we can see that 6 of the 10 points are in D2, so it is classified into D2, which again shows the correctness of the KNN classification.
Figure 12. The percentage for each point in set D2
Figure 13. The percentage for each point in set D3
When we find the 10 nearest points of the first row of D3, all 10 of them are in D3, so it is classified into D3, which again shows the correctness of the KNN classification.
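The rule being applied in these examples is simply a majority vote over the 10 nearest neighbours. A compact sketch of classifying a single query point this way (hypothetical code, assuming 50 points per set stacked as [r4; r5; r6]):

% Hypothetical sketch: majority-vote classification of one query point
x = [r4; r5; r6];                              % rows 1-50: D1, 51-100: D2, 101-150: D3
query = r5(1, :);                              % e.g. the first row of D2
[n, ~] = knnsearch(KDTreeSearcher(x), query, 'k', 10);
votes = histcounts(ceil(n / 50), 0.5:1:3.5);   % neighbour counts in D1, D2, D3
[~, label] = max(votes)                        % predicted set index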
Adjusting the parameters of the function, we obtain the results below.
Finally, I calculate the overall accuracy of the KNN classification.
The failure of least squares should not surprise us when we recall that it corresponds to maximum likelihood under the assumption of a Gaussian conditional distribution, whereas binary target vectors clearly have a distribution that is far from Gaussian. By adopting the k-nearest-neighbor classifier, we find that the KNN algorithm is quite useful in multi-class classification problems.
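To spell out the Gaussian connection just mentioned (standard reasoning, not written out in the assignment): least squares minimizes

$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\bigl(t_n - \mathbf{w}^{\mathrm{T}}\mathbf{x}_n - w_0\bigr)^2,$$

which is, up to constants, the negative log-likelihood of the model $t_n = \mathbf{w}^{\mathrm{T}}\mathbf{x}_n + w_0 + \epsilon_n$ with $\epsilon_n \sim \mathcal{N}(0, \sigma^2)$. Binary $\pm 1$ targets clearly do not follow such a model.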
Whether we use a linear method or a nonlinear method like KNN depends on the number of classes to classify, the dimension of the data points, and the degree of intersection and overlap between the classes.
KNN is useful when the data have a relatively high level of intersection and overlap. It can also solve nonlinear problems that a linear classifier cannot.
% After using "fitcknn" and "predict"
% Calculate the accuracy (50 points per set, 150 points in total)
Acc_1 = sum(lab(1:50) == 1);
Acc_2 = sum(lab(51:100) == 2);
Acc_3 = sum(lab(101:150) == 3);
Accuracy(k) = ((Acc_1)+(Acc_2)+(Acc_3))./150;
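The surrounding loop is omitted from the write-up; a minimal sketch of what it presumably looks like, assuming the model is trained on [r4; r5; r6] and scored on 150 fresh points drawn from the same three distributions (the k = 1 accuracy being below 1 suggests held-out test data rather than resubstitution):

% Hypothetical reconstruction -- the original fitting script is not shown
xtrain = [r4; r5; r6];                              % training sets D1-D3
ytrain = [ones(50,1); 2*ones(50,1); 3*ones(50,1)];  % true labels 1/2/3
% Fresh test points, assumed drawn from the same distributions as r4, r5, r6:
xtest = [mvnrnd([2 2], [.2 .05; .05 .3], 50); ...
         mvnrnd([2 4], [.4 -.1; -.1 .3], 50); ...
         mvnrnd([3 3], [.5 -.3; -.3 .4], 50)];
Accuracy = zeros(15, 1);
for k = 1:15
    Mdl = fitcknn(xtrain, ytrain, 'NumNeighbors', k);  % fit the k-NN model
    lab = predict(Mdl, xtest);                         % predicted labels
    Accuracy(k) = mean(lab == ytrain);                 % fraction correct
end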
k     Accuracy
1     0.8133
2     0.8200
3     0.8333
4     0.8400
5     0.8533
6     0.8600
7     0.8600
8     0.8533
9     0.8533
10    0.8533
11    0.8600
12    0.8600
13    0.8400
14    0.8533
15    0.8400