40

                                   SAKAUE Akkuş Tatsuya
                          2011-12-17


Tuesday, December 6, 11                                   1
Agenda
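                          1. About R (overview, installation, help)
                          2. Basic operations (arithmetic, functions, vectors)
                          3. Reading data
                          4. Graphs
                          5. Statistical tests in R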

• R was created by Ross Ihaka & Robert Gentleman ("R & R")
         • Ross Ihaka and Robert Gentleman. R: A language for data analysis and graphics.
           Journal of Computational and Graphical Statistics, 5(3):299-314, 1996.
         • http://biostat.mc.vanderbilt.edu/twiki/pub/Main/JeffreyHorner/JCGSR.pdf
•                IBM SPSS Statistics   10   ...


Excel     SPSS   ...
        • Excel
-Install / Update / Uninstall-



1. Go to http://www.r-project.org/
2. Click "download R"
3. Choose a CRAN mirror in JAPAN
4. Choose your OS
• Windows: http://cran.md.tsukuba.ac.jp/bin/windows/base/
    • Download R 2.14.0 for Windows
• Mac OS X: http://cran.md.tsukuba.ac.jp/bin/macosx/
    • R-2.14.0.pkg (latest version)
1. http://www.r-project.org/
                  2.

                  3.


• Windows
• Windows
    • [Windows XP] →
    • [Windows 7] →
• Mac OS X
    • Remove R from /Applications and R.framework from /Library/Frameworks (or use CleanApp)
• Windows
    • START → Program → R → R 2.14.0
• Mac OS X
    • R in /Applications
• Windows       Mac OS X
                1. q()
                2.
                3.            R


...
    • help(sth)
    • seekR (http://seekr.jp/)
    • R SEEK (http://www.rseek.org/)
    • RjpWiki (http://www.okada.jp.org/RWiki/)
    • R-Tips (http://cse.naro.affrc.go.jp/takezawa/r-tips/r.html)
    • R (http://aoki2.si.gunma-u.ac.jp/R/)
• help(sth) !!
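The help(sth) pattern can be tried on functions that appear later in the deck. The following is a minimal sketch, assuming an interactive R session; "sth" on the slide is a placeholder for a function name, and args() and apropos() are base R helpers that the slides do not mention.

```r
# help() opens the documentation page for a function.
# It is wrapped in interactive() because the help viewer is only
# useful when you are sitting at the console.
if (interactive()) {
  help(sqrt)   # same as typing ?sqrt
}

# args() shows a function's arguments without opening the help page.
args(read.csv)

# apropos() lists objects whose names match a pattern, which helps
# when you only remember part of a function name.
read_functions <- apropos("^read\\.")
read_functions
```

If only part of a name is known, apropos() is usually the quickest way to find the exact spelling to pass to help().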
• Type an expression and press [Enter]
    • > 3+5 [Enter]
    • > 10-3 [Enter]
    • > 2*3 [Enter]
    • > 100/20 [Enter]
    • > (12 + 34 - 56) * 78 / 90 [Enter]
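The order of evaluation in the last expression can be checked with a short script. This is a minimal sketch using base R only; the three extra operators at the end are not on the slide and are included as an assumption about what a beginner might want next.

```r
# The slide's last expression: parentheses are evaluated first,
# then * and / from left to right.
(12 + 34 - 56) * 78 / 90   # (-10 * 78) / 90

# Further base R arithmetic operators (not on the slide):
2^10       # exponentiation
17 %/% 5   # integer division
17 %% 5    # remainder
```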
“I don't know !” by fmgbain http://www.flickr.com/photos/fmgbain/4382010455/
sqrt()

                          • > sqrt(2)
                          • > sqrt(144)
                          • > sqrt(104976)
(   )

      • q()                help(sth)
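The functions seen so far all follow one pattern: a name followed by arguments in parentheses. Below is a minimal sketch of that pattern; round() and the named arguments digits and base are standard base R but do not appear in the slides.

```r
# A function call is the function name followed by its arguments
# in parentheses.
sqrt(144)                    # one argument
round(sqrt(2), digits = 3)   # a call can be nested inside another call
log(100, base = 10)          # arguments can be passed by name

# Some functions are called with empty parentheses, such as q() to quit.
# The parentheses are still required: typing only the name prints the
# function itself instead of running it.
sqrt(104976)
```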
“hako” (box)

     •        > hako <- c(1,2,3,4,5)
     •        > hako
                • c(): concatenate/combine

c() and “<-”

       hako <- c(1,2,3,4,5)

       “<-” is the assignment arrow (←): the values on the right are stored under the name on the left
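A minimal sketch of what can be done with the vector once it is stored; length(), sum(), the [ ] index, and the name longer are additions for illustration and are not on the slides.

```r
# Store five numbers in a vector named hako ("box").
hako <- c(1, 2, 3, 4, 5)

hako           # typing the name prints the contents
length(hako)   # number of elements
hako[2]        # the second element
sum(hako)      # 1 + 2 + 3 + 4 + 5

# c() can also combine an existing vector with more values.
longer <- c(hako, 6, 7)
longer
```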
When the prompt changes to "+" ...

    • The command is incomplete: type the rest and press [Enter]
    • To cancel: [STOP] or [Esc]

• Tab: completes function and object names
hako


               1          5          5



• > sqrt(hako)
    • sqrt(1), sqrt(2) ... sqrt(5)
• > log(hako)
    • log(1), log(2) ... log(5)
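The element-by-element behaviour can be confirmed directly. This is a minimal sketch; the names roots, by_hand, and logs are introduced here only for the comparison.

```r
hako <- c(1, 2, 3, 4, 5)

# sqrt() is applied to every element, so no loop is needed.
roots <- sqrt(hako)
roots

# The same result written out element by element.
by_hand <- c(sqrt(1), sqrt(2), sqrt(3), sqrt(4), sqrt(5))
all.equal(roots, by_hand)

# log() is the natural logarithm; log10() would use base 10.
logs <- log(hako)
logs
```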
• R
    • q(), help(), sqrt(), log(), c()
...

           • TOEIC

...

                          A   180     75
                          B   170     65
                          C   165     60
                          D   175     70
                          E   190     80
...

              • Excel
• Windows →
    • “My Documents”
• Mac OS X →
• Linux: up to you...
• getwd()
    • > getwd()
• setwd()
    • > setwd("/Users/sakaue/Desktop/")
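A minimal sketch of changing the working directory and returning afterwards. It assumes nothing about your folders: tempdir() stands in for the Desktop path on the slide so that the example runs on any machine, and start_dir is a name introduced here.

```r
# Remember where we started so we can return afterwards.
start_dir <- getwd()

# setwd() changes the working directory. tempdir() is used only so the
# example runs anywhere; in practice you would pass your own folder.
setwd(tempdir())
getwd()

# List the files R can now see without typing a full path.
list.files()

# Go back to the original directory.
setwd(start_dir)
```

Checking getwd() before calling a file-reading function is a quick way to diagnose "cannot open file" errors.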
• read.csv()
    • Reads a CSV file
    • CSV: Comma-Separated Values
1. demo.csv
      • XLS/XLSX
      • CSV (UTF-8)

2. > test <- read.csv("demo.csv")

3. > test [Enter]
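The steps above can be sketched end to end without preparing a file by hand. The values come from the A to E table shown earlier; the column names id, height, and weight are assumptions for illustration, since the original headers are not preserved, and the file is written to a temporary folder.

```r
# Create a small CSV file so the example is self-contained.
# The column names id, height and weight are assumptions.
csv_path <- file.path(tempdir(), "demo.csv")
writeLines(c("id,height,weight",
             "A,180,75",
             "B,170,65",
             "C,165,60",
             "D,175,70",
             "E,190,80"),
           csv_path)

# The first line is used as column names by default (header = TRUE).
# fileEncoding tells R which encoding the file was saved in.
test <- read.csv(csv_path, fileEncoding = "UTF-8")

test        # print the whole data frame
str(test)   # column types at a glance
nrow(test)  # number of rows
```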
CSV
                                ...

• read.delim()
    • delim: delimiter
1. Open demo.xls and copy the data
2. > test2 <- read.delim("clipboard")
       Mac: > test2 <- read.delim(pipe("pbpaste"))

3. > test2 [Enter]
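The two clipboard variants can be wrapped in one function. This is a minimal sketch under stated assumptions: read_clipboard is a hypothetical helper name, the column names group and score are made up, and the text argument is used to simulate copied spreadsheet cells so that the example runs without a clipboard.

```r
# Hypothetical helper: read tab-separated text from the clipboard.
# "clipboard" is understood by R on Windows; on Mac OS X the pbpaste
# command supplies the clipboard contents.
read_clipboard <- function() {
  if (Sys.info()[["sysname"]] == "Darwin") {
    read.delim(pipe("pbpaste"))
  } else {
    read.delim("clipboard")
  }
}

# Cells copied from a spreadsheet arrive as tab-separated text.
# The text argument lets us simulate that without a clipboard.
copied <- "group\tscore\nA\t60\nB\t50\nA\t72\n"
test2 <- read.delim(text = copied)
test2
```

In an interactive session, copying a cell range and then running test2 <- read_clipboard() would take the place of the simulated text.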
> table(test2[,1])
        • frequency table of column 1

> mean(test2[,2])
        • mean of column 2

> hist(test2[,2])
        • histogram of column 2
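The [, 1] and [, 2] notation selects whole columns of a data frame. Below is a minimal sketch with made-up data; the column names group and score are assumptions, and the $ form is an alternative that the slides do not show.

```r
# Made-up data: column 1 is a category, column 2 is numeric.
test2 <- data.frame(group = c("A", "B", "A", "B", "A"),
                    score = c(60, 50, 72, 43, 50))

# test2[, 1] means "all rows, column 1".
group_counts <- table(test2[, 1])
group_counts

# Columns can also be selected by name with $.
mean(test2[, 2])
mean(test2$score)   # same value as the line above
```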
•
         • CSV                 read.csv()

         •                or



CSV


                          Excel


                                  “y2.d175 | Lasershow! Relax!” by B Rosen
                                  http://www.flickr.com/photos/rosengrant/4751386872/
2

                          Excel       SPSS
                                      ... orz
> age <- c(18, 23, 14, 19, 21, 29, 22, 21, 23, 19,
           20, 20, 26, 18, 14, 6, 8, 16, 23, 20)
> hist(age)
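To see the numbers behind the histogram, hist() can be asked to compute the bins without drawing them. This is a minimal sketch; plot = FALSE and summary() are standard base R features that the slide does not use.

```r
age <- c(18, 23, 14, 19, 21, 29, 22, 21, 23, 19,
         20, 20, 26, 18, 14, 6, 8, 16, 23, 20)

# Compute the histogram without drawing it.
h <- hist(age, plot = FALSE)
h$breaks   # bin boundaries chosen by R
h$counts   # number of values in each bin

# A quick numeric summary of the same data.
summary(age)
```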
> score <- c(60, 50, 72, 43, 50, 55, 43, 50, 85, 40)
> words <- c(340, 190, 465, 170, 130, 225, 140, 310, 580, 120)
> plot(score, words)
> nns <- c(350, 285, 315, 340, 210, 185, 120, 740, 425, 155)
> ns <- c(365, 570, 645, 540, 645, 665, 880, 550, 410, 585)
> boxplot(nns, ns, names=c("NNS", "NS"))
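The values that the box plot draws can be inspected directly. This is a minimal sketch using the same data; plot = FALSE and the name box are additions for illustration.

```r
nns <- c(350, 285, 315, 340, 210, 185, 120, 740, 425, 155)
ns  <- c(365, 570, 645, 540, 645, 665, 880, 550, 410, 585)

# The thick line in each box is the median.
median(nns)
median(ns)

# boxplot() with plot = FALSE returns the numbers it would draw:
# each column of $stats holds lower whisker, lower hinge, median,
# upper hinge and upper whisker for one group.
box <- boxplot(nns, ns, names = c("NNS", "NS"), plot = FALSE)
box$stats
box$out   # values drawn as individual points (outliers)
```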
1.

                  2.

                  3.

                          twitter, FB

Excel

t-test
• Null hypothesis: H0

• Alternative hypothesis: H1

• H0 is tested with a test statistic (t, χ2, F)
t-test

    • In R: t.test()
t-test

     > like <- c(6,10,6,10,7,8,7,9,10,4)
     > dislike <- c(3,5,6,4,4,8,4,5,4,7)
     > t.test(like,dislike,var.equal=TRUE)

             Two Sample t-test

     data:  like and dislike
     t = 3.3041, df = 18, p-value = 0.003946
     alternative hypothesis: true difference in means is not equal to 0
     95 percent confidence interval:
      0.9831754 4.4168246
     sample estimates:
     mean of x mean of y
           7.7       5.0
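The t value in the output can be reproduced by hand, which makes it clearer what var.equal=TRUE does. This is a minimal sketch using the same data; the names res, pooled_var, t_by_hand, and welch are introduced here for illustration.

```r
like    <- c(6, 10, 6, 10, 7, 8, 7, 9, 10, 4)
dislike <- c(3, 5, 6, 4, 4, 8, 4, 5, 4, 7)

# t.test() returns a list; individual values can be pulled out by name.
res <- t.test(like, dislike, var.equal = TRUE)
res$statistic   # t value
res$parameter   # degrees of freedom
res$p.value     # p-value

# The same t value computed by hand from the pooled variance.
n1 <- length(like)
n2 <- length(dislike)
pooled_var <- ((n1 - 1) * var(like) + (n2 - 1) * var(dislike)) / (n1 + n2 - 2)
t_by_hand <- (mean(like) - mean(dislike)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
t_by_hand

# Without var.equal = TRUE, R runs Welch's t-test, which does not
# assume equal variances in the two groups.
welch <- t.test(like, dislike)
welch$parameter   # degrees of freedom are no longer a whole number
```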
                •         A   B
Example: “however”

                                  109     347        8   493

                              [   ] However, ....
                              [   ] ..., however, ....
                              [   ] ..., however.
    > freq <- c(109,347,8)
    > chisq.test(freq,correct=FALSE)

            Chi-squared test for given probabilities

    data:  freq
    X-squared = 391.7371, df = 2, p-value < 2.2e-16

    #                                              2
    #     http://homepage2.nifty.com/nandemoarchive/toukei_kiso/t_F_chi.htm
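The X-squared value can be reproduced from the observed and expected counts. This is a minimal sketch assuming the default null hypothesis of chisq.test() for a single vector, namely that all categories are equally likely; the names expected, chi_by_hand, and p_by_hand are introduced here.

```r
freq <- c(109, 347, 8)

# With no other arguments, chisq.test() tests whether all categories
# are equally likely, so each expected count is the total divided by 3.
expected <- rep(sum(freq) / length(freq), length(freq))
expected

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_by_hand <- sum((freq - expected)^2 / expected)
chi_by_hand

# p-value from the chi-squared distribution with (categories - 1) df.
p_by_hand <- pchisq(chi_by_hand, df = length(freq) - 1, lower.tail = FALSE)
p_by_hand

# Compare with the built-in test.
res <- chisq.test(freq, correct = FALSE)
res$statistic
res$expected
```

The correct argument only changes the result for 2 by 2 tables, so it has no effect on this goodness-of-fit test.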
•
                          •   t
                          •
              •                   !
                          •           ...


One more thing...


Package

        • base: about 1,000 functions
        • More functions can be added with packages
        • ex. RMeCab
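A minimal sketch of inspecting what base provides and of loading an add-on package safely. The name pkg is introduced here, and the commented install.packages() line uses a placeholder package name. How RMeCab itself is installed is not stated in the slides, so the sketch only checks whether it is already present.

```r
# Functions available without installing anything live in packages
# that ship with R, such as base.
length(ls("package:base"))   # how many objects the base package provides

# Packages currently attached to the session.
search()

# Check whether an add-on package is installed before loading it.
# install.packages() downloads from a repository and needs a network
# connection, so it is shown only as a comment here.
pkg <- "RMeCab"
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)
} else {
  message(pkg, " is not installed; see the package's own installation notes.")
  # install.packages("somePackage")   # generic form; placeholder name
}
```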
RMeCab
      • Lets R use MeCab, a Japanese morphological analyzer
• RMeCabText() : morphological analysis of a text file
• RMeCabFreq() : morpheme frequency table
• Ngram() : N-gram counts
• collocate() : collocations of a node word
2,940     1,785   3,780

twitter: @sakaue

                          e-mail: tsakaue<AT>hiroshima-u.ac.jp





Introduction to "R" for Language Researchers
