Logistic Regression with R - Not for beginners. One should have a basic grounding in statistics to understand this worksheet and the different terms used in it. #Logistic Regression #R #Data & Analytics
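As a hedged illustration of the topic described above (not taken from the worksheet itself), a minimal logistic regression fit in base R might look like this; the built-in mtcars data and the choice of predictors are purely illustrative:

# Minimal logistic regression sketch in base R; data and predictors are illustrative.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(fit)                            # coefficients, z-values, deviance
exp(coef(fit))                          # odds ratios
head(predict(fit, type = "response"))   # fitted probabilities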
Regression diagnostics - Checking if linear regression assumptions are violat... — Jerome Gomes
Checking if linear regression assumptions (Linearity, Normality, Independence and Constant Variance) are violated with R - Not for beginners. One should have a basic grounding in statistics to understand this worksheet and the different terms used in it. #Regression diagnostics #R #Data & Analytics
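A minimal base-R sketch of such checks; the lm() model on mtcars is an illustrative assumption, not taken from the worksheet:

# Base-R diagnostics sketch on an illustrative model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
par(mfrow = c(2, 2))
plot(fit)                        # residuals vs fitted (linearity, constant variance), Q-Q (normality), scale-location, leverage
shapiro.test(residuals(fit))     # formal normality test on the residuals
acf(residuals(fit))              # autocorrelation of residuals as a rough independence check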
This presentation educates you about R vectors: vector creation, single-element vectors, multiple-element vectors, and accessing vector elements.
For more topics stay tuned with Learnbay.
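A short base-R sketch of the vector operations listed above (the values are illustrative):

x <- 5                      # single-element vector
v <- c(2, 4, 6, 8, 10)      # multiple-element vector created with c()
s <- seq(1, 10, by = 2)     # multiple-element vector created with seq()
v[2]                        # access by position -> 4
v[c(1, 3)]                  # access several elements -> 2 6
v[v > 5]                    # logical indexing -> 6 8 10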
Data Science: Prediction analysis for houses in Ames, Iowa. — Ashish Menkudale
For the vastly diversified realty market, with property prices increasing exponentially, it becomes essential to study the factors that directly or indirectly affect a customer's decision to buy a house, and to predict the market trend. In general, for any purchase, a potential customer makes the decision based on value for money.
The problem statement was taken from the Kaggle website. We chose this specific problem because it gave us an opportunity to build a prediction model for a real-life problem: predicting house prices in Ames, Iowa.
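A minimal first-pass sketch of such a price model in R; the file name and column names (SalePrice, GrLivArea, OverallQual, Neighborhood) follow the Kaggle House Prices training file and are assumptions here, not details taken from this project:

ames <- read.csv("train.csv", stringsAsFactors = TRUE)   # assumed Kaggle Ames training file
fit  <- lm(log(SalePrice) ~ GrLivArea + OverallQual + Neighborhood, data = ames)
summary(fit)                      # coefficients and R-squared
pred <- exp(predict(fit, ames))   # back-transform log prices to dollars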
Slides presented at the Greater Cleveland R User Meetup group on the statistical concept of mediation using the lavaan package for structural equation modeling.
Bank - Loan Purchase Modeling
This case is about a bank that has a growing customer base. The majority of these customers are liability customers (depositors) with deposits of varying sizes. The number of customers who are also borrowers (asset customers) is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business and, in the process, earn more through the interest on loans. In particular, the management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors). A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise campaigns with better target marketing to increase the success ratio with a minimal budget. The department wants to build a model that will help them identify the potential customers who have a higher probability of purchasing the loan. This will increase the success ratio while at the same time reducing the cost of the campaign. The dataset has data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.
Our job is to build the best model which can classify the right customers who have a higher probability of purchasing the loan. We are expected to do the following:
EDA of the data available. Showcase the results using appropriate graphs.
Apply appropriate clustering on the data and interpret the output.
Build appropriate models on both the test and train data (CART & Random Forest). Interpret all the model outputs and make the necessary modifications wherever applicable (such as pruning); a minimal sketch follows this list.
Check the performance of all the models that you have built (test and train). Use all the model performance measures you have learned so far. Share your remarks on which model performs best.
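A minimal R sketch of the CART and Random Forest steps referenced above; the file name and column names (Personal.Loan as the 0/1 target, with predictors such as Age and Income) are assumptions about the dataset described in this case:

library(rpart)          # CART
library(randomForest)   # Random Forest

bank <- read.csv("Bank_Personal_Loan.csv")   # assumed file name
bank$Personal.Loan <- factor(bank$Personal.Loan)

set.seed(42)
idx   <- sample(nrow(bank), 0.7 * nrow(bank))
train <- bank[idx, ]
test  <- bank[-idx, ]

# CART, pruned at the complexity parameter with the lowest cross-validated error
cart    <- rpart(Personal.Loan ~ ., data = train, method = "class", cp = 0.001)
best_cp <- cart$cptable[which.min(cart$cptable[, "xerror"]), "CP"]
cart_pruned <- prune(cart, cp = best_cp)

# Random Forest with variable importance
rf <- randomForest(Personal.Loan ~ ., data = train, ntree = 500, importance = TRUE)

# Compare test-set accuracy of the two models
mean(predict(cart_pruned, test, type = "class") == test$Personal.Loan)
mean(predict(rf, test) == test$Personal.Loan)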
This is a PPT on C++ parameters, covering reference parameters, passing objects by reference, constant parameters, and default parameters.
The determination of complex underlying relationships between system parameters from simulated and/or recorded data requires advanced interpolating functions, also known as surrogates. The development of surrogates for such complex relationships often requires the modeling of high dimensional and non-smooth functions using limited information. To this end, the hybrid surrogate modeling paradigm, where different surrogate models are aggregated, offers a robust solution. In this paper, we develop a new high fidelity surrogate modeling technique that we call the Reliability Based Hybrid Functions (RBHF). The RBHF formulates a reliable Crowding Distance-Based Trust Region (CD-TR), and adaptively combines the favorable characteristics of different surrogate models. The weight of each contributing surrogate model is determined based on the local reliability measure for that surrogate model in the pertinent trust region. Such an approach is intended to exploit the advantages of each component surrogate. This approach seeks to simultaneously capture the global trend of the function and the local deviations. In this paper, the RBHF integrates four component surrogate models: (i) the Quadratic Response Surface Model (QRSM), (ii) the Radial Basis Functions (RBF), (iii) the Extended Radial Basis Functions (E-RBF), and (iv) the Kriging model. The RBHF is applied to standard test problems. Subsequent evaluations of the Root Mean Squared Error (RMSE) and the Maximum Absolute Error (MAE) illustrate the promising potential of this hybrid surrogate modeling approach.
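As a toy illustration of the general weighted-aggregation idea behind hybrid surrogates (this is not the RBHF/CD-TR formulation of the paper; the test function, RBF shape parameter, and inverse-error weighting are all assumptions), a base-R sketch blending a quadratic response surface and a Gaussian RBF interpolant might look like this:

f <- function(x) sin(3 * x) + 0.5 * x            # stand-in "expensive" function
x_train <- seq(0, 3, length.out = 15); y_train <- f(x_train)
set.seed(5)
x_val   <- runif(10, 0, 3);            y_val   <- f(x_val)

# Component surrogate 1: quadratic response surface
quad <- lm(y ~ poly(x, 2, raw = TRUE), data = data.frame(x = x_train, y = y_train))

# Component surrogate 2: Gaussian RBF interpolant (small ridge added for stability)
eps <- 10
Phi <- exp(-eps * outer(x_train, x_train, "-")^2)
w   <- solve(Phi + 1e-8 * diag(nrow(Phi)), y_train)
rbf_pred <- function(xnew) as.vector(exp(-eps * outer(xnew, x_train, "-")^2) %*% w)

# Weights proportional to inverse RMSE on held-out validation points
rmse <- function(a, b) sqrt(mean((a - b)^2))
e_q  <- rmse(predict(quad, data.frame(x = x_val)), y_val)
e_r  <- rmse(rbf_pred(x_val), y_val)
w_q  <- (1 / e_q) / (1 / e_q + 1 / e_r)
w_r  <- 1 - w_q

# Hybrid prediction at new points
x_new  <- seq(0, 3, length.out = 100)
hybrid <- w_q * predict(quad, data.frame(x = x_new)) + w_r * rbf_pred(x_new)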
An introduction to using machine learning in Python and Pascal to predict prime numbers, even though deterministic algorithms for finding primes already exist. It walks through a dataframe, feature extraction, and a few plots as a starting point for further experiments in predicting primes.
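A small R sketch of the same idea (the feature choices are illustrative assumptions; as noted above, deterministic primality tests make this an experiment rather than a practical method): label integers as prime or not with trial division, derive simple features, and fit a classifier.

is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  all(n %% 2:floor(sqrt(n)) != 0)   # trial division
}
n  <- 2:5000
df <- data.frame(
  n     = n,
  mod6  = n %% 6,        # primes above 3 are congruent to 1 or 5 mod 6
  mod10 = n %% 10,       # last digit
  log_n = log(n),
  prime = sapply(n, is_prime)
)
set.seed(1)
idx  <- sample(nrow(df), 0.8 * nrow(df))
fit  <- glm(prime ~ mod6 + mod10 + log_n, data = df[idx, ], family = binomial)
pred <- predict(fit, df[-idx, ], type = "response") > 0.5
mean(pred == df$prime[-idx])   # accuracy on the held-out 20%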
Intro To C++ - Class 13 - Char, Switch, Break, Continue, Logical Operators — Blue Elephant Consulting
This presentation is part of the COP2272C college-level course taught at Florida Polytechnic University in Lakeland, Florida. The purpose of this course is to introduce students to the C++ language and the fundamentals of object-oriented programming.
The course is one semester in length and meets for 2 hours twice a week. The instructor is Dr. Jim Anderson.
Lab 2: Classification and Regression Prediction Models, training and testing ... — Yao Yao
https://github.com/yaowser/data_mining_group_project
https://www.kaggle.com/c/zillow-prize-1/data
From the Zillow real estate dataset of properties in the southern California area, conduct the following data cleaning, data analysis, predictive analysis, and machine learning steps:
Lab 2: Classification and Regression Prediction Models, training and testing splits, optimization of K Nearest Neighbors (KD tree), optimization of Random Forest, optimization of Naive Bayes (Gaussian), advantages and model comparisons, feature importance, feature ranking with recursive feature elimination, two-dimensional Linear Discriminant Analysis.
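A compact R analogue of the lab's train/test split and model comparison, using the built-in iris data as a stand-in for the Zillow features (the lab itself is in Python/scikit-learn; the package choices below are assumptions):

library(class)          # knn
library(e1071)          # naiveBayes
library(randomForest)

set.seed(7)
idx   <- sample(nrow(iris), 0.7 * nrow(iris))
train <- iris[idx, ]; test <- iris[-idx, ]

# K Nearest Neighbors
knn_pred <- knn(train[, 1:4], test[, 1:4], train$Species, k = 5)

# Gaussian Naive Bayes
nb_pred  <- predict(naiveBayes(Species ~ ., data = train), test)

# Random Forest with feature importance
rf       <- randomForest(Species ~ ., data = train, importance = TRUE)
rf_pred  <- predict(rf, test)
importance(rf)                               # feature ranking

sapply(list(knn = knn_pred, nb = nb_pred, rf = rf_pred),
       function(p) mean(p == test$Species))  # accuracy comparison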
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre... — Yao Yao
https://github.com/yaowser/data_mining_group_project
https://www.kaggle.com/c/zillow-prize-1/data
From the Zillow real estate dataset of properties in the southern California area, conduct the following data cleaning, data analysis, predictive analysis, and machine learning steps:
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regression Model Performance, Optimizing Support Vector Machine Classifier, Accuracy of results and efficiency, Logistic Regression Feature Importance, interpretation of support vectors, Density Graph
In this article you will learn how to use the TensorFlow softmax classifier estimator to classify the MNIST dataset in one script.
The paper also introduces the basic idea of an artificial neural network.
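A hedged R analogue of the same idea (the original article uses TensorFlow's Python estimator API, which is not shown here): a single dense softmax layer classifying MNIST via the keras R package, the simplest form of the neural network mentioned above.

library(keras)

mnist   <- dataset_mnist()
x_train <- mnist$train$x / 255   # scale pixel values to [0, 1]
y_train <- mnist$train$y
x_test  <- mnist$test$x / 255
y_test  <- mnist$test$y

model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 10, activation = "softmax")   # softmax classifier over the 10 digits

model %>% compile(optimizer = "adam",
                  loss = "sparse_categorical_crossentropy",
                  metrics = "accuracy")

model %>% fit(x_train, y_train, epochs = 3, batch_size = 128, verbose = 2)
model %>% evaluate(x_test, y_test)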
Come join our meet-up and learn how easily you can use R for advanced machine learning. In this meet-up, we will demonstrate how to understand and use XGBoost for Kaggle competitions. Tong is in Canada and will do a remote session with us through Google Hangouts.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist at Supstat Inc and also a master's student in Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the XGBoost R package, one of the most popular and contest-winning tools on kaggle.com today.
Prerequisites (if any): R / Calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction to XGBoost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XGBoost Demo
Reference:
https://github.com/dmlc/xgboost
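As a small, hedged illustration of the package covered in this talk (parameter values below are arbitrary choices, not the speaker's), the following trains an XGBoost model on the agaricus data that ships with the xgboost R package:

library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test,  package = "xgboost")

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               nrounds = 10, max_depth = 3, eta = 0.3,
               objective = "binary:logistic", verbose = 0)

pred <- predict(bst, agaricus.test$data)
mean((pred > 0.5) == agaricus.test$label)   # test accuracy
xgb.importance(model = bst)                 # which features mattered most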
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues (a minimal sketch follows this list).
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
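A minimal R sketch of the automated data-validation idea from point 4 above: a reusable check that flags missing values, duplicated rows, and out-of-range numbers before data moves downstream. The column names and ranges are illustrative assumptions.

validate_data <- function(df, ranges = list()) {
  issues <- list(
    missing    = colSums(is.na(df)),    # NA count per column
    duplicates = sum(duplicated(df))    # fully duplicated rows
  )
  for (col in names(ranges)) {          # range checks, e.g. ranges = list(age = c(0, 120))
    lo <- ranges[[col]][1]; hi <- ranges[[col]][2]
    issues[[paste0("out_of_range_", col)]] <- sum(df[[col]] < lo | df[[col]] > hi, na.rm = TRUE)
  }
  issues
}

# Example on a toy frame
toy <- data.frame(age = c(25, -3, 47, NA), income = c(50000, 62000, 62000, 150000))
validate_data(toy, ranges = list(age = c(0, 120)))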
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. those with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can then be calculated directly. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
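A small base-R sketch of plain power-iteration PageRank with one of the optimizations described above: vertices whose rank change falls below the tolerance are marked converged and skipped in later iterations. The toy graph is an assumption, and the other techniques (in-identical vertices, chain short-circuiting, SCC ordering, dangling-node handling) are not shown.

pagerank_skip <- function(adj, d = 0.85, tol = 1e-6, max_iter = 100) {
  n    <- length(adj)
  rank <- rep(1 / n, n)
  done <- rep(FALSE, n)
  # incoming-edge lists, so each vertex pulls contributions from its in-neighbours
  inn    <- lapply(seq_len(n), function(v) which(sapply(adj, function(out) v %in% out)))
  outdeg <- pmax(sapply(adj, length), 1)
  for (iter in seq_len(max_iter)) {
    new_rank <- rank
    for (v in which(!done)) {
      contrib     <- sum(rank[inn[[v]]] / outdeg[inn[[v]]])
      new_rank[v] <- (1 - d) / n + d * contrib
    }
    done <- abs(new_rank - rank) < tol   # skip already-converged vertices next round
    rank <- new_rank
    if (all(done)) break
  }
  rank / sum(rank)
}

# Toy 4-vertex graph given as out-neighbour lists
adj <- list(c(2, 3), c(3), c(1), c(1, 3))
pagerank_skip(adj)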
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Opendatabay - Open Data Marketplace.pptx — Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Machine learning and optimization techniques for electrical drives.pptx
Finding the best K - kNN
K Nearest Neighbours
Choosing the best K
#rstats
#ML
#Classification
In kNN, choosing the best k value (the number of nearest neighbours) is critical.
In this post, I write a function which can choose the best k from a range of k values, e.g. a range from 1 to 100.
The function plots two graphs, one for percentage accuracy and the other for percentage error, and it returns a data frame containing the k values and their percentage accuracies and errors.
The image below illustrates the plots, and we can see the best k is 1 with an accuracy above 85%. You could make these plots interactive with the ggplot2 or plotly packages.
The code is shown below:
analyze_k <- function(train, test, train_labels, test_labels, k_range)
{
  # train        - the training dataset
  # test         - the test dataset
  # train_labels - class labels for the training rows
  # test_labels  - class labels for the test rows
  # k_range      - the max number of k values to use
  #                (should be numeric and greater than 0)
  ## the classification package (provides knn)
  require(class)
  prediction_table <- data.frame()   # to store predicted classes, one column per k
  prediction_table[1:nrow(test), 1] <- seq(1, nrow(test), 1)
  for (i in 1:k_range)
  {
    ### storing predicted-class columns
    prediction_table[, i] <- knn(train, test, train_labels, k = i)
  }
  # a list of all tables comparing actual and predicted classes
  tab_list <- list()
  ### storing crosstable lists for all values of k
  for (i in 1:k_range)
  {
    tab_list[[i]] <- table(prediction_table[, i], test_labels)
  }
  l  <- length(unique(train_labels))
  sq <- seq(1, (l**2), l + 1)   # indexer sequence picking the table diagonal (correct predictions)
  # stores correct-prediction counts, percentage accuracy and errors
  d_f <- data.frame()
  d_f[1:k_range, 1] <- 1:k_range
  # storing the number of correct predictions for each k
  for (i in 1:k_range)
  {
    d_f[i, 2] <- sum(tab_list[[i]][sq])
  }
  d_f[, 3] <- (d_f[, 2] / nrow(test)) * 100
  d_f[, 4] <- 100 - d_f[, 3]
  colnames(d_f) <- c("k values", "correct predictions", "Percentage_accuracy",
                     "Percentage_error")
  par(bg = "black", mfrow = c(1, 2))
  plot(d_f[, 1], d_f[, 3], type = "l", xlab = "k values", col = "blue",
       ylab = "Percentage Accuracy", main = "ACCURACY PLOT", col.main = "white",
       ylim = c(10, 100), lwd = 2, col.axis = "azure3", col.lab = "azure3")
  abline(h = max(d_f[, 3]), lty = 1)
  grid(, lty = 1, col = "wheat4")
  plot(d_f[, 1], d_f[, 4], type = "l", xlab = "k values", col = "red",
       ylab = "Percentage error",
       main = "ERROR PLOT",
       col.main = "white",
       ylim = c(0, 100), lwd = 2, col.axis = "azure3", col.lab = "azure3")
  abline(h = min(d_f[, 4]), lty = 1)
  grid(, lty = 1, col = "wheat4")
  analyze_k_table <- d_f[-2]   # drop the raw correct-prediction counts
  ## returns a data frame containing k values and their
  ## respective accuracy and errors
  return(analyze_k_table)
}
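A possible usage example (not part of the original post), calling analyze_k() on the built-in iris data with scaled features:

set.seed(3)
idx    <- sample(nrow(iris), 0.7 * nrow(iris))
scaled <- scale(iris[, 1:4])                 # normalize the four numeric features
res <- analyze_k(train = scaled[idx, ], test = scaled[-idx, ],
                 train_labels = iris$Species[idx],
                 test_labels  = iris$Species[-idx],
                 k_range = 30)
head(res)   # k values with their accuracy and error percentages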