Datamining R 4th: Presentation Transcript

  • R: apply family, Fisher (sesejun@is.ocha.ac.jp, 2009/11/19)
  • USPS
  • ImageName    Class  0,0  0,1  0,2  0,3  0,4
    img_2_00_02      1    0    0    0    0    0
    img_2_00_03      1    0   38   22    0    0
    img_2_00_05      1   13    0   64   13   42
    ...
    img_0_00_09     -1   34   53    0   38    0
    img_0_00_28     -1    0   64    0   98   93
    img_0_01_08     -1   13    0    0   59   13
    img_0_03_05     -1   34   34    0    0    0
  • [Figure: digit images img_3_29_25, img_5_03_31, img_3_06_30, img_3_17_08]
  • k-NN
  • Apply Family: looping functions that can replace explicit for loops
    apply(X, 1, f)       apply f to each row of matrix X
    apply(X, 2, f)       apply f to each column of X
    apply(X, c(1,2), f)  apply f to each element of X
    lapply(X, f)         apply f to each element of a list (or each column of a data frame); returns a list
    sapply(X, f)         like lapply, but simplifies the result to a vector or table where possible
    sweep(X, M, V)       sweep V out of X (subtract, by default) along margin M: rows (M=1), columns (M=2), or elements (M=c(1,2))
  • > m <- matrix((1:9)**2, nrow=3)
    > m
         [,1] [,2] [,3]
    [1,]    1   16   49
    [2,]    4   25   64
    [3,]    9   36   81
    > apply(m, 1, sum)
    [1]  66  93 126
    > apply(m, 2, sum)
    [1]  14  77 194
    > apply(m, c(1,2), sqrt)
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
    > l <- list(a=1:3, b=4:6)
    > l
    $a
    [1] 1 2 3
    $b
    [1] 4 5 6
    > lapply(l, sum)
    $a
    [1] 6
    $b
    [1] 15
    > sapply(l, sum)
     a  b
     6 15
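The transcript above demonstrates apply, lapply, and sapply but not sweep(), which the K-NN code later relies on. A minimal sketch of sweep() centering the columns of a small matrix (the same pattern the K-NN slides use to subtract a query vector from every row of the training data):

```r
# sweep(X, 2, V) subtracts the vector V from each column of X (MARGIN = 2),
# i.e. V[j] is subtracted from every entry of column j.
m <- matrix(1:6, nrow=2)           # columns: (1,2), (3,4), (5,6)
colmeans <- apply(m, 2, mean)      # column means: 1.5 3.5 5.5
centered <- sweep(m, 2, colmeans)  # subtract each column's mean
apply(centered, 2, mean)           # every column mean is now 0
```

With MARGIN = 1 the vector would instead be swept out of each row, which is why the K-NN code below passes 2: each training row should have the same query vector subtracted, column by column.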
  • K-NN
    > iris.train <- read.table("iris_train.csv", sep=",", header=T)
    > iris.test <- read.table("iris_test.csv", sep=",", header=T)
    > q <- iris.test[1,1:4]
    > diff <- sweep(iris.train[1:4], 2, t(q))
    > diff * diff
    > distquery <- apply(diff * diff, 1, sum)
    > sort(distquery)
    > order(distquery)
  • > iris.train[order(distquery)[1:5],]
    > iris.train[order(distquery)[1:5],]$Class
    > knnclasses <- iris.train[order(distquery)[1:5],]$Class
    > as.factor(table(knnclasses))
    > sortedtable <- sort(as.factor(table(knnclasses)), decreasing=T)
    > labels(sortedtable)[1]
    > predclass <- labels(sortedtable)[1]
    > predclass == iris.test$Class[1]
  • > knnpredict <- function(train, class, query, k) {
    +   diff <- sweep(train, 2, query)
    +   distquery <- apply(diff * diff, 1, sum)
    +   knnclasses <- class[order(distquery)[1:k]]
    +   sortedtable <- sort(as.factor(table(knnclasses)), decreasing=T)
    +   labels(sortedtable)[1]
    + }
    > knnpredict(iris.train[1:4], iris.train$Class, t(iris.test[1,1:4]), 5)
    > knnpredict(iris.train[1:4], iris.train$Class, t(iris.test[10,1:4]), 1)
    > for (i in 1:length(rownames(iris.test))) {
    +   pred <- knnpredict(iris.train[1:4], iris.train$Class, t(iris.test[i,1:4]), 10)
    +   result <- pred == iris.test[i,]$Class
    +   cat(paste(pred, iris.test[i,]$Class, result, sep="\t"))
    +   cat("\n")
    + }
  • > resvec <- c()
    > for (i in 1:30) {
    +   pred <- knnpredict(iris.train[1:4], iris.train$Class, t(iris.test[i,1:4]), 10)
    +   resvec <- append(resvec, pred == iris.test[i,]$Class)
    + }
    > sum(resvec)/length(resvec)
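The accuracy loop above fixes k = 10. A self-contained sketch of scanning several k values with the same knnpredict() idea; it substitutes R's built-in iris dataset for the course's iris_train.csv/iris_test.csv split (an assumption, so the block runs standalone) and simplifies the vote counting to names(sort(table(...)))[1]:

```r
# k-NN prediction as on the slides: squared Euclidean distance via sweep(),
# majority vote over the k nearest training labels.
knnpredict <- function(train, class, query, k) {
  diff <- sweep(train, 2, query)
  distquery <- apply(diff * diff, 1, sum)
  knnclasses <- class[order(distquery)[1:k]]
  names(sort(table(knnclasses), decreasing=TRUE))[1]
}

set.seed(1)
idx    <- sample(nrow(iris), 100)          # 100 training rows, 50 test rows
train  <- iris[idx, 1:4];  trainc <- iris$Species[idx]
test   <- iris[-idx, 1:4]; testc  <- iris$Species[-idx]

# Test-set accuracy for several choices of k.
for (k in c(1, 3, 5, 10)) {
  resvec <- c()
  for (i in 1:nrow(test)) {
    pred   <- knnpredict(train, trainc, unlist(test[i, ]), k)
    resvec <- append(resvec, pred == as.character(testc[i]))
  }
  cat(k, sum(resvec) / length(resvec), "\n")
}
```

Plotting accuracy against k this way is one direct route to the "vary K" exercise on the final slide.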
  • SVM
  • SVM
    > iris.train <- read.table("iris_train.csv", sep=",", header=T)
    > iris.test <- read.table("iris_test.csv", sep=",", header=T)
    > library("e1071")
    > iris.model <- svm(iris.train[1:4], iris.train$Class)
    > iris.pred <- predict(iris.model, iris.test[1:4])
    > table(iris.pred, iris.test$Class)
    iris.pred         Iris-setosa Iris-versicolor Iris-virginica
      Iris-setosa               7               0              0
      Iris-versicolor           0               9              0
      Iris-virginica            0               0             14
  • > iris.model <- svm(iris.train[1:4], iris.train$Class, kernel="linear")
    > iris.pred <- predict(iris.model, iris.test[1:4])
    > table(iris.pred, iris.test$Class)
    iris.pred         Iris-setosa Iris-versicolor Iris-virginica
      Iris-setosa               7               0              0
      Iris-versicolor           0               9              0
      Iris-virginica            0               0             14
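The two slides above compare the default (radial) and linear kernels through their confusion matrices. A sketch that scores all four of e1071's built-in kernels by held-out accuracy instead; it uses R's bundled iris data rather than the course CSVs (an assumption) so the block runs standalone:

```r
# Compare svm() kernels by accuracy on a held-out split.
library("e1071")

set.seed(1)
idx <- sample(nrow(iris), 100)   # 100 training rows, 50 held out

for (kern in c("linear", "polynomial", "radial", "sigmoid")) {
  model <- svm(iris[idx, 1:4], iris$Species[idx], kernel=kern)
  pred  <- predict(model, iris[-idx, 1:4])
  acc   <- sum(pred == iris$Species[-idx]) / length(pred)
  cat(kern, acc, "\n")
}
```

Accuracy here is just the diagonal of the confusion matrix divided by its total, so sum(diag(table(pred, truth))) / length(pred) gives the same number as the comparison above.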
  • Exercises
    1. IRIS
       1. Classify the IRIS data (3 classes) using the 4 attributes ("Sepal.length", "Sepal.width", "Petal.length", "Petal.width")
       2. Apply K-NN to the IRIS data
    2. USPS
       1. Run 5-NN on the USPS digits (0-9)
       2. Vary K in K-NN and compare the results
       3. Apply SVM with the radial kernel to the USPS data