Descriptive Statistics with R
2012-10-12 @HSPH
Kazuki Yoshida, M.D.
MPH-CLE student


FREEDOM TO KNOW
Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
Previously in this group
- Introduction to R
- Reading Data into R (1)
- Reading Data into R (2)




Menu
- mean and sd
- median, quantiles, IQR, max, min, and range
- skewness and kurtosis
- smarter ways of doing these
Ingredients

Statistics
- Summary statistics for continuous data
  - Normal data
  - Non-normal data
  - Normality check

Programming
- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
- skewness(), kurtosis()
- summary()
- describe(), describeBy()
Data loaded
What’s next?
 http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html
Descriptive Statistics
 http://www.ehow.com/info_8650637_descriptive-statistical-methods.html
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data
http://en.wikipedia.org/wiki/Descriptive_statistics
Open RStudio
Download the comma-separated and Excel files
Put them in a folder
BONEDEN.DAT.txt
http://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20bI&product_isbn_issn=9780538733496
Read in BONEDEN.DAT.txt
      Name it bone
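A minimal sketch of this step, assuming the file is comma-separated with a header row. The real BONEDEN.DAT.txt is not included here, so the sketch writes a tiny stand-in file first; its values are made up and the column name id is hypothetical (only age and zyg appear in these slides).

```r
# Stand-in for BONEDEN.DAT.txt: a tiny comma-separated file with made-up values.
# With the real file, the path would simply be "BONEDEN.DAT.txt" in your folder.
path <- tempfile(fileext = ".txt")
writeLines(c("id,age,zyg",
             "1,27,1",
             "2,42,2",
             "3,35,1"), path)

# Read the comma-separated file and name the result bone.
bone <- read.csv(path, header = TRUE)

str(bone)    # structure: one column per field in the file
head(bone)   # first few rows
```

read.csv() returns a data frame, which is the object the following slides work with.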
Accessing a single variable in a data set

DATA$VAR
  DATA: dataset name
  VAR: variable name
  e.g., mean(bone$age)
vector?
http://healthy-india.org/enviromentalhealth/direct_indirect2.html
DATA$VAR is a vector

1 2 3 4 5 6 7 8
OR
“A” “B” “C” “D” “E” “F” “G” “H”

like strings with values attached
DATA is a data frame
Multiple vectors of the same length tied together

1   2   3   4   5   6   7   8
“A” “B” “C” “D” “E” “F” “G” “H”
1   2   3   4   5   6   7   8
“A” “B” “C” “D” “E” “F” “G” “H”
1   2   3   4   5   6   7   8
“A” “B” “C” “D” “E” “F” “G” “H”
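The picture of vectors tied together can be tried directly. This is a small sketch with made-up values; the names num, let, and dat are chosen only for illustration.

```r
# Two vectors of the same length (made-up values).
num <- 1:8
let <- c("A", "B", "C", "D", "E", "F", "G", "H")

# Tie them together into a data frame; each vector becomes a column.
dat <- data.frame(num = num, let = let)
dat

# DATA$VAR pulls one column back out as a plain vector.
dat$num
is.vector(dat$num)   # TRUE
length(dat$let)      # 8
```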
Indexing: extraction of data from a data frame

bone[1:15 , 1:12]
  1:15 before the comma: extract 1st to 15th rows
  1:12 after the comma: extract 1st to 12th columns
  Colon in between
  Don’t forget the comma
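A short sketch of [row, col] indexing on a stand-in data frame. The data frame toy and its values are made up; the same patterns should apply to bone.

```r
# Stand-in data frame with 20 rows and 3 columns (made-up values).
toy <- data.frame(id = 1:20, age = 21:40, zyg = rep(1:2, times = 10))

toy[1:5, 1:2]    # rows 1 to 5, columns 1 to 2
toy[1:5, ]       # rows 1 to 5, all columns (nothing after the comma)
toy[, 2]         # all rows, column 2 only: returned as a vector
toy[, -1]        # all columns except the first
```

Leaving one side of the comma empty means "everything" on that side, which is why the comma itself cannot be dropped.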
age vector within bone data frame
bone$age



Extracted as a vector
mean
mean(x, trim = 0, na.rm = FALSE)
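What the trim and na.rm arguments do can be seen on a small made-up vector (these are not values from the bone data).

```r
age <- c(25, 30, 35, 40, 95, NA)   # made-up ages; NA marks a missing value

mean(age)                            # NA, because one value is missing
mean(age, na.rm = TRUE)              # 45: missing value dropped first
mean(age, trim = 0.2, na.rm = TRUE)  # 35: lowest and highest 20% trimmed off
```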
Your turn    adopted from Hadley Wickham
- What is the mean of age?
sd
sd(x, na.rm = FALSE)
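sd() gives the sample standard deviation, dividing by n - 1. A quick check by hand on made-up values (n and by_hand are names used only in this sketch):

```r
age <- c(25, 30, 35, 40, 45)   # made-up ages

sd(age)                         # sample standard deviation

# The same value by hand: squared deviations divided by n - 1, then square root.
n <- length(age)
by_hand <- sqrt(sum((age - mean(age))^2) / (n - 1))
by_hand
```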
Your turn    adopted from Hadley Wickham
- What is the sd of age?
median
 median(x, na.rm = FALSE)
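For non-normal data the median is often preferred because extreme values do not move it. A sketch with made-up ages:

```r
age <- c(25, 30, 35, 40, 95)   # made-up ages with one extreme value

mean(age)     # 45: pulled up by the extreme value
median(age)   # 35: the middle value, unaffected by it

# With an even number of values, the median is the average of the two middle ones.
median(c(25, 30, 35, 40))   # 32.5
```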
Your turn    adopted from Hadley Wickham
- What is the median of age?
quantile
0th, 25th, 50th, 75th, and 100th percentiles by default
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
         names = TRUE, type = 7)
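The probs argument selects which percentiles come back. A sketch on made-up values, using the default type = 7:

```r
age <- 1:9   # made-up values

quantile(age)                               # 0%, 25%, 50%, 75%, 100% by default
quantile(age, probs = c(0.25, 0.75))        # only the 25th and 75th percentiles
quantile(age, probs = 0.9, names = FALSE)   # plain number without the "90%" label
```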
Your turn    adopted from Hadley Wickham
- What are the 25th and 75th percentiles of age?
IQR
75th percentile - 25th percentile
IQR(x, na.rm = FALSE, type = 7)
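The definition can be checked against quantile() on made-up values (q is a name used only in this sketch):

```r
age <- c(22, 25, 31, 38, 40, 47, 52, 60)   # made-up ages

q <- quantile(age, probs = c(0.25, 0.75), names = FALSE)
q[2] - q[1]   # 75th percentile minus 25th percentile
IQR(age)      # same value
```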
Your turn    adopted from Hadley Wickham
- What is the IQR of age?
max
max(..., na.rm = FALSE)
min
min(..., na.rm = FALSE)
Your turn    adopted from Hadley Wickham
- What are the minimum and maximum of age?
range
range(..., na.rm = FALSE)
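range() returns the minimum and the maximum together, not their difference. A sketch with made-up ages:

```r
age <- c(31, 27, NA, 64, 45)   # made-up ages; NA marks a missing value

min(age, na.rm = TRUE)     # 27
max(age, na.rm = TRUE)     # 64
range(age, na.rm = TRUE)   # 27 64: a vector of length two

diff(range(age, na.rm = TRUE))   # 37: the width of the range
```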
Your turn    adopted from Hadley Wickham
- What is the range of age?
We now resort to external packages
Install and Load
  e1071, psych
To load a package by command
library(package)
  package: package name here
  double quotes "" around the package name can be omitted
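A sketch of loading by command. The stats package is used here only because it ships with R, so the lines run without installing anything; e1071 and psych would be loaded the same way once installed. The variable loaded is just for illustration.

```r
library(stats)     # package name without quotes
library("stats")   # same effect with quotes

# Installation is a separate, one-time step, e.g. install.packages("e1071").
# It needs an internet connection, so it is left as a comment here.

loaded <- "package:stats" %in% search()
loaded   # TRUE once the package is attached
```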
Assessment of normality
Load e1071 package
library(e1071)

skewness
skewness(x, na.rm = FALSE, type = 3)
  type = 2: SAS
  type = 1: Stata
library(e1071)

kurtosis
kurtosis(x, na.rm = FALSE, type = 3)
  type = 2: SAS
  type = 1: Stata
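What the three type values change can be sketched in base R. The formulas below follow the definitions given in the e1071 help pages as I understand them, so treat this as an illustration rather than the package's own code. The data are made up, and the e1071 cross-check runs only if the package happens to be installed.

```r
x  <- c(2, 3, 3, 4, 4, 4, 5, 9, 15)   # made-up right-skewed values
n  <- length(x)
m2 <- mean((x - mean(x))^2)           # central moments (divide by n)
m3 <- mean((x - mean(x))^3)
m4 <- mean((x - mean(x))^4)

g1 <- m3 / m2^1.5                     # skewness, type = 1
g2 <- m4 / m2^2 - 3                   # excess kurtosis, type = 1

skew <- c(type1 = g1,
          type2 = g1 * sqrt(n * (n - 1)) / (n - 2),
          type3 = g1 * ((n - 1) / n)^1.5)
kurt <- c(type1 = g2,
          type2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
          type3 = (g2 + 3) * (1 - 1 / n)^2 - 3)
skew
kurt

# Optional cross-check against e1071, only if it is installed.
if (requireNamespace("e1071", quietly = TRUE)) {
  stopifnot(all.equal(unname(skew["type2"]), e1071::skewness(x, type = 2)),
            all.equal(unname(kurt["type2"]), e1071::kurtosis(x, type = 2)))
}
```

The three types differ only in small-sample corrections, so they converge as n grows. Kurtosis here is excess kurtosis, which is about 0 for normal data.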
Your turn    adopted from Hadley Wickham
- What are the skewness and kurtosis of age by the Stata-method?
Multiple variables at once
summary
  summary(object, ...)
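summary() works on a single vector and on a whole data frame. A sketch on a stand-in data frame; the values are made up and the column name id is hypothetical.

```r
# Stand-in data frame (made-up values).
toy <- data.frame(id  = 1:6,
                  age = c(27, 42, 35, 51, 30, NA),
                  zyg = c(1, 1, 2, 2, 1, 2))

s <- summary(toy$age)   # min, quartiles, median, mean, max, and the NA count
s
summary(toy)            # the same kind of summary for every column at once
```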
Your turn    adopted from Hadley Wickham
- Try summary on the dataset (data frame).
describe
Various summary measures
library(psych)
describe(x, na.rm = TRUE, interp = FALSE, skew = TRUE,
         ranges = TRUE, trim = .1, type = 3)
  type = 2: SAS
  type = 1: Stata
Your turn    adopted from Hadley Wickham
- describe(bone[,-1], type = 2)
describeBy
Groupwise summary
library(psych)
describeBy(x, group=NULL,mat=FALSE,type=3,...)
  type = 2: SAS
  type = 1: Stata
Your turn    adopted from Hadley Wickham
- describeBy(bone[ , c(-1)] , bone$zyg , type = 2)
    bone[ , c(-1)]: bone data frame without the 1st column
    bone$zyg: zyg vector for grouping
    type = 2: SAS method for skewness and kurtosis
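A sketch of a groupwise summary on a stand-in data frame. The values are made up, and the column names id and ht are hypothetical. The base R tapply() line shows the idea for one statistic; the describeBy() call runs only if psych happens to be installed.

```r
# Stand-in data frame (made-up values).
toy <- data.frame(id  = 1:8,
                  age = c(27, 42, 35, 51, 30, 46, 38, 60),
                  ht  = c(160, 172, 165, 158, 170, 163, 175, 168),
                  zyg = rep(1:2, each = 4))

# Base R: one statistic at a time, split by group.
group_means <- tapply(toy$age, toy$zyg, mean)
group_means

# psych::describeBy(): the full describe() table for each group.
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describeBy(toy[, c("age", "ht")], group = toy$zyg, type = 2))
}
```

This sketch leaves the grouping column out of the summarized columns, since it is constant within each group and its skewness and kurtosis would not be meaningful there.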
Ingredients

Statistics
- Summary statistics for continuous data
  - Normal data
  - Non-normal data
  - Normality check

Programming
- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
- skewness(), kurtosis()
- summary()
- describe(), describeBy()
Descriptive Statistics with R
