Kanto R Study Group, 2nd Meeting

  1. 1. 3rd R Study Group: Loop Processing in R
  2. 2. As in other programming languages, R has loops for repeating execution:
       ● for statement
       ● while statement
       ● repeat statement
  3. 3. for
       Syntax: for (name in expr_1) expr_2
       > for (i in 1:10) cat(pnorm(i)," ")
       0.8413447 0.9772499 0.9986501 0.9999683 0.9999997 1 1 1 1 1
       > for (i in 1:ncol(faithful)){
       >   print(c(min(faithful[,i]),max(faithful[,i]),mean(faithful[,i]),median(faithful[,i])))
       > }
       > xyz = list(42,c(1,2,3),matrix(c(1:4),2,2))
       > for (i in 1:length(xyz)) {
       >   xyz[[i]] = xyz[[i]]+1
       >   print(xyz[[i]])   # show the updated element
       > }
       [1] 43
       [1] 2 3 4
            [,1] [,2]
       [1,]    2    4
       [2,]    3    5
  4. 4. while
       Syntax: while (condition) expr
       > i = 0   # the counter must be initialized before the loop
       > while (i < 5){
       >   print(dnorm(i))
       >   i = i+1
       > }
       [1] 0.3989423
       [1] 0.2419707
       [1] 0.05399097
       [1] 0.004431848
       [1] 0.0001338302
  5. 5. repeat
       Syntax: repeat expr
       Without break, the loop never terminates:
       > x = 2
       > repeat{
       >   print(x)
       >   x = x^2
       > }
       ...
       [1] Inf
       [1] Inf
       [1] Inf
       [1] Inf
       [1] Inf
       [1] Inf
       [1] Inf
       [1] Inf
       ...
       With break:
       x = 2
       repeat{
         print(x)
         x = x^2
         if(x == Inf) break
       }
       [1] 2
       [1] 4
       [1] 16
       [1] 256
       [1] 65536
       [1] 4294967296
       [1] 1.844674e+19
       [1] 3.402824e+38
       [1] 1.157921e+77
       [1] 1.340781e+154
       Besides break, there is also next (a.k.a. "continue").
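Slide 5 mentions next without showing it. The following is a minimal sketch of how next and break behave inside a for loop; the variable name odds and the cut-off value 7 are chosen only for illustration and do not come from the slides.

```r
# Collect the odd numbers from 1:10, stopping once a value exceeds 7.
odds <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip the rest of this iteration for even i
  if (i > 7) break        # leave the loop entirely
  odds <- c(odds, i)
}
print(odds)
```

Here next only abandons the current iteration, while break ends the whole loop, so the loop body is never run for i = 10.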
  6. 6. As in other programming languages, R has loops for repeating execution:
       ● for statement
       ● while statement
       ● repeat statement
  7. 7. But...
       "Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a 'whole object' view is likely to be both clearer and faster in R."
       An Introduction to R (http://cran.r-project.org)
  8. 8. 1. Vectorization
       R functions are vectorized: they process vector-like objects as a whole.
       > A = matrix(1:4,nrow=2,ncol=2)
       > A
            [,1] [,2]
       [1,]    1    3
       [2,]    2    4
       > B = matrix(2,nrow=2,ncol=2)
       > B
            [,1] [,2]
       [1,]    2    2
       [2,]    2    2
       > A+B
            [,1] [,2]
       [1,]    3    5
       [2,]    4    6
       > A*B
            [,1] [,2]
       [1,]    2    6
       [2,]    4    8
       Matrix multiplication:
       > A %*% B
            [,1] [,2]
       [1,]    8    8
       [2,]   12   12
       > A = 1:4
       > A
       [1] 1 2 3 4
       > B = c(2,2,2,2)
       > B
       [1] 2 2 2 2
       > A+B
       [1] 3 4 5 6
       > A*B
       [1] 2 4 6 8
       > A^2
       [1]  1  4  9 16
  9. 9. 1. Vectorization - MSE
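  10. 10. 1. Vectorization – MSE
        $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{Y}_i - Y_i)^2$, where $\hat{Y}_i = Q_0 + Q_1 \times X_i$, so
        $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left((Q_0 + Q_1 \times X_i) - Y_i\right)^2$
        Loop version:
        mse = function(Q0,Q1,X,Y){
          total = 0                                # running sum of squared errors
          for(i in 1:length(X)){
            temp_sum = (Y[i] - (Q0+X[i]*Q1))^2     # squared error of observation i
            total = total + temp_sum
          }
          return(total / length(X))
        }
        Vectorized version:
        sum(((Q0 + Q1*X)-Y)^2) / length(X)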
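To make the equivalence of the two MSE forms on slide 10 concrete, here is a small worked sketch. The function names mse_loop and mse_vec and the three data points are assumptions made up for this example; they are not from the slides.

```r
# Loop form: accumulate the squared errors one observation at a time.
mse_loop <- function(Q0, Q1, X, Y) {
  total <- 0
  for (i in seq_along(X)) {
    total <- total + (Y[i] - (Q0 + Q1 * X[i]))^2
  }
  total / length(X)
}

# Vectorized form: the same formula applied to whole vectors.
mse_vec <- function(Q0, Q1, X, Y) {
  sum(((Q0 + Q1 * X) - Y)^2) / length(X)
}

# Illustrative data: predictions are 2, 4, 6, so the errors are 0, 0, 1.
X <- c(1, 2, 3)
Y <- c(2, 4, 7)
loop_result <- mse_loop(0, 2, X, Y)
vec_result  <- mse_vec(0, 2, X, Y)
print(c(loop_result, vec_result))   # both equal 1/3
```

With these values the squared errors sum to 1 over n = 3 observations, so both forms should give 1/3.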
  11. 11. > y = sort(rnorm(10e5,mean=20,sd=10),decreasing=TRUE)
        > x = seq(y[1],y[length(y)],length.out=10e5)
        > q0 = -1.5; q1 = 1   # same coefficients as passed to mse() below
        Vectorized version:
        > system.time(sum((y-(q0 +x*q1))^2)/length(x))
           user  system elapsed
           0.00    0.02    0.01
        Function (loop) version:
        > system.time(mse(-1.5,1,x,y))
           user  system elapsed
           2.57    0.00    2.57
        With the small iris data set:
        > X = iris[,1]
        > Y = iris[,3]
        Function (loop) version:
        > system.time(mse(-1.5,1,X,Y))
           user  system elapsed
              0       0       0
        Vectorized version:
        > system.time(sum((Y-(q0 + X*q1))^2)/length(X))
           user  system elapsed
              0       0       0
  12. 12. 2. Built-in functions
        > set.seed(42)
        > abc = matrix(rnorm(10e5),nrow=10e2)
        The matrix has only one million elements.
        maximum = function(mtrx){
          mx = mtrx[1,1]
          for(i in 1:nrow(mtrx)){          # rows
            for(j in 1:ncol(mtrx)){        # columns
              if(mtrx[i,j]>mx) mx = mtrx[i,j]
            }
          }
          return(mx)
        }
        > maximum(abc)
        > max(abc)
        > system.time(maximum(abc))
           user  system elapsed
           0.90    0.00    0.91
        > system.time(max(abc))
           user  system elapsed
           0.02    0.00    0.01
  13. 13. 2. Built-in functions
        Mathematical operation    R function
        Sum                       sum
        Mean                      mean
        Median                    median
        Variance                  var
        Covariance                cov
        Correlation               cor
        Logarithm                 log
        Range of values           range
        Scaling                   scale
  14. 14. 3. apply, lapply, sapply...
        apply – apply a function over array margins
        myMatrix, with its row means (right) and column sums (bottom):
                 [,1]   [,2]   [,3]    Mean
        [1,]     55.5   57.0   58.5    57.0
        [2,]    156.0  157.5    0.0   104.5
        [3,]     56.5   58.0   59.5    58.0
        Sum     268.0  272.5  118.0
        > apply(myMatrix,1,mean)
        [1]  57.0 104.5  58.0
        > apply(myMatrix,2,sum)
        [1] 268.0 272.5 118.0
        > apply(myMatrix,c(1,2),sqrt)
                  [,1]      [,2]     [,3]
        [1,]  7.449832  7.549834 7.648529
        [2,] 12.489996 12.549900 0.000000
        [3,]  7.516648  7.615773 7.713624
        > sqrt(myMatrix)
  15. 15. 3. apply, lapply, sapply...
        > set.seed(42)
        > abc = matrix(rnorm(10e5),nrow=10e2)
        my.mean = function(mat){
          res.vec = vector()
          for (i in 1:ncol(mat)){                  # one pass per column
            my.sum = 0
            for (j in 1:length(mat[,i])){
              my.sum = my.sum + mat[j,i]
            }
            my.col.mean = my.sum / length(mat[,i])
            res.vec = append(res.vec,my.col.mean)
          }
          return (res.vec)
        }
        > apply(abc,1,mean)   # row means (margin 1); my.mean() computes column means (margin 2)
        > system.time(apply(abc,2,mean))
           user  system elapsed
           0.01    0.01    0.03
        > system.time(my.mean(abc))
           user  system elapsed
           0.92    0.00    0.92
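For the specific case of column means, base R also provides colMeans(), which fits the earlier advice to prefer built-in functions. The sketch below checks that it agrees with apply(..., 2, mean) on a small matrix; the matrix m is made up for illustration and is not the abc matrix from the slides.

```r
# A small 2 x 3 matrix whose columns are (1,2), (3,4), (5,6).
m <- matrix(1:6, nrow = 2)

via_apply    <- apply(m, 2, mean)   # column means through apply
via_colmeans <- colMeans(m)         # dedicated built-in for the same task

print(via_apply)
print(via_colmeans)
```

Both calls should return the column means 1.5, 3.5 and 5.5. No timing is claimed here; on large matrices one would measure with system.time() as on the slide.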
  16. 16. 3. apply, lapply, sapply...
        ● lapply – apply a function over a list or vector
        ● sapply – user-friendly version of lapply
        ● vapply – similar to sapply, but with a pre-specified type of return value
        ● rapply – recursive apply
        ● tapply – apply a function over a ragged array
        ● mapply – apply a function to multiple list or vector arguments
        Input? Output?
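The "Input? Output?" question on slide 16 can be explored with a short base R sketch. All data and variable names below are invented for illustration; only the functions themselves come from the slide.

```r
x <- list(a = 1:3, b = 4:6)

l <- lapply(x, mean)               # list in, list out
s <- sapply(x, mean)               # list in, simplified to a named numeric vector
v <- vapply(x, mean, numeric(1))   # like sapply, but each result must be one number

# mapply walks over several vectors in parallel: 1+4, 2+5, 3+6
m <- mapply(function(p, q) p + q, 1:3, 4:6)

# tapply applies a function within groups defined by a second vector
t <- tapply(c(10, 20, 30, 40), c("x", "y", "x", "y"), sum)

print(l); print(s); print(v); print(m); print(t)
```

In this sketch lapply returns a list, sapply and vapply return a named vector, mapply returns a plain vector, and tapply returns one value per group.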
  17. 17. 4. The plyr package
        About the package name: (AP)PLY + R
        Input         Output        Function name
        data frame    data frame    ddply
        list          array         laply
        array         none          a_ply
  18. 18. 4. The plyr package
        GET https://api.twitter.com/1.1/statuses/home_timeline.json
        [{
          "coordinates": null,
          "truncated": false,
          "favorited": false,
          "created_at": "Mon Jun 27 19:32:19 +0000 2011",
          "id_str": "85430275915526144",
          "user": {
            "profile_sidebar_border_color": "0094C2",
            "profile_background_tile": false,
            "profile_sidebar_fill_color": "a9d9f1",
            "name": "Twitter API",
          },
          …...
        }]
  19. 19. 4. The plyr package
        in_reply_to_status_id_s    in_reply_to_user_id    ...    user.id    user.name    ….
        $statuses[[99]]$in_reply_to_status_id_str
        NULL
        $statuses[[99]]$in_reply_to_user_id
        NULL
        $statuses[[99]]$in_reply_to_user_id_str
        NULL
        $statuses[[99]]$user
        $statuses[[99]]$user$id
        [1] 375912401
        $statuses[[99]]$user$name
        [1] "Super Villain "
  20. 20. 4. The plyr package
        nestedParser = function(element,data){
          resultMat = matrix(nrow=length(data),ncol=length(element))
          colnames(resultMat) = element
          for (i in 1:length(element)){
            levels = strsplit(element[i],"$",fixed=TRUE)
            for(k in 1:length(data)){
              for(j in 1:(length(levels[[1]]))){
                if(j == 1){
                  temp = data[[k]][levels[[1]][j]]
                }
                else{
                  temp = temp[[1]][levels[[1]][j]]
                }
              }
              resultMat[k,i] = as.character(temp)
            }
          }
          return(resultMat)
        }
        > nestedParser(c("user$name","text","lang","created_at","id"),tw.data$statuses)
  21. 21. 4. The plyr package
        Extracting the columns one at a time:
        tw.data = laply(tw.data, .fun = function(x){x[c("text","id",...)]})
        Avoiding list recursion:
        tw.data = laply(tw.data, function(x) laply(x, identity))
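  22. 22. 5. Summary
        ● R has loops (for, while, repeat), but it is better to avoid them wherever possible.
        ● Vectorized functions and approaches are faster and easier to read than loops.
        ● Loops are bug-prone and hard to debug.
        ● Use R's built-in functions, the apply family, and the plyr package!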
