Descriptive Statistics with R

Published in: Education

  1. Descriptive Statistics with R. 2012-10-12 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student. FREEDOM TO KNOW
  2. The group website is at: http://rpubs.com/kaz_yos/useR_at_HSPH
  3. Previously in this group: Introduction to R; Reading Data into R (1); Reading Data into R (2). Group website: http://rpubs.com/kaz_yos/useR_at_HSPH
  4. Menu: mean and sd; median, quantiles, IQR, max, min, and range; skewness and kurtosis; smarter ways of doing these
  5. Ingredients. Statistics: summary statistics for continuous data; normal data; non-normal data; normality check. Programming: vector and data frame; DATA$VAR extraction; indexing by [row,col]; various functions: skewness(), kurtosis(), summary(), describe(), describeBy()
  6. Data loaded. What's next? http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html
  7. Descriptive Statistics http://www.ehow.com/info_8650637_descriptive-statistical-methods.html
  8. Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. http://en.wikipedia.org/wiki/Descriptive_statistics
  9. Open RStudio
  10. Download the comma-separated and Excel data files (e.g., BONEDEN.DAT.txt) and put them in a folder: http://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20bI&product_isbn_issn=9780538733496
  11. Read in BONEDEN.DAT.txt and name it bone
  12. Accessing a single variable in a data set: DATA$VAR, where DATA is the dataset name and VAR is the variable name, e.g., mean(bone$age)
  13. vector
  14. ? http://healthy-india.org/enviromentalhealth/direct_indirect2.html
  15. DATA$VAR is a vector, e.g., 1 2 3 4 5 6 7 8 or "A" "B" "C" "D" "E" "F" "G" "H": values in order, like a string with values attached
  16. DATA is a data frame: multiple vectors of the same length tied together side by side, e.g., a column 1 2 3 4 5 6 7 8 next to a column "A" "B" "C" "D" "E" "F" "G" "H"
  17. Indexing: extraction of data from a data frame. bone[1:15, 1:12] extracts the 1st to 15th rows and the 1st to 12th columns. Put a colon between the start and end of each range, and don't forget the comma between the row part and the column part.
  18. The age vector within the bone data frame
  19. bone$age: extracted as a vector
  20. mean: mean(x, trim = 0, na.rm = FALSE)
  21. Your turn (adapted from Hadley Wickham): What is the mean of age?
  22. sd: sd(x, na.rm = FALSE)
  23. Your turn (adapted from Hadley Wickham): What is the sd of age?
  24. median: median(x, na.rm = FALSE)
  25. Your turn (adapted from Hadley Wickham): What is the median of age?
  26. quantile: quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7) returns the 0th, 25th, 50th, 75th, and 100th percentiles by default
  27. Your turn (adapted from Hadley Wickham): What are the 25th and 75th percentiles of age?
  28. IQR: IQR(x, na.rm = FALSE, type = 7) returns the 75th percentile minus the 25th percentile
  29. Your turn (adapted from Hadley Wickham): What is the IQR of age?
  30. max: max(..., na.rm = FALSE)
  31. min: min(..., na.rm = FALSE)
  32. Your turn (adapted from Hadley Wickham): What are the minimum and maximum of age?
  33. range: range(..., na.rm = FALSE) returns the minimum and the maximum as a two-element vector
  34. Your turn (adapted from Hadley Wickham): What is the range of age?
  35. We now resort to external packages
  36. Install and load the e1071 and psych packages
  37. To load a package by command: library(package), with the package name in place of package; the double quotes "" around the name can be omitted
  38. Assessment of normality
  39. Load the e1071 package
  40. skewness (library(e1071)): skewness(x, na.rm = FALSE, type = 3); type = 2 matches SAS, type = 1 matches Stata
  41. kurtosis (library(e1071)): kurtosis(x, na.rm = FALSE, type = 3); type = 2 matches SAS, type = 1 matches Stata after subtracting 3 (Stata reports kurtosis without subtracting 3)
  42. Your turn (adapted from Hadley Wickham): What are the skewness and kurtosis of age by the Stata method?
  43. Multiple variables at once
  44. summary: summary(object, ...)
  45. Your turn (adapted from Hadley Wickham): Try summary() on the dataset (data frame).
  46. Various summary measures (library(psych)): describe(x, na.rm = TRUE, interp = FALSE, skew = TRUE, ranges = TRUE, trim = .1, type = 3); type = 2 matches SAS and type = 1 matches Stata, as for skewness() and kurtosis() above
  47. Your turn (adapted from Hadley Wickham): describe(bone[, -1], type = 2)
  48. Groupwise summary (library(psych)): describeBy(x, group = NULL, mat = FALSE, type = 3, ...); type = 2 matches SAS and type = 1 matches Stata
  49. Your turn (adapted from Hadley Wickham): describeBy(bone[, c(-1)], bone$zyg, type = 2), i.e., the bone data frame without its 1st column, grouped by the zyg vector, using the SAS method for skewness and kurtosis
  50. Ingredients. Statistics: summary statistics for continuous data; normal data; non-normal data; normality check. Programming: vector and data frame; DATA$VAR extraction; indexing by [row,col]; various functions: skewness(), kurtosis(), summary(), describe(), describeBy()
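The reading, DATA$VAR, and [row, col] slides (11 to 19) can be tried without the course file. The sketch below assumes a small made-up comma-separated snippet in place of BONEDEN.DAT.txt: the column names ID, zyg, and age mirror names used in the slides, while the score column and all values are hypothetical. If the downloaded file is comma-separated with a header row, something like bone <- read.csv("BONEDEN.DAT.txt") should read it the same way, but check the file's actual delimiter first.

```r
# Hypothetical stand-in for BONEDEN.DAT.txt: a few made-up rows in
# comma-separated form (these are NOT the real bone density values).
csv_text <- "ID,zyg,age,score
1,1,27,0.8
2,1,42,1.1
3,2,35,0.9
4,2,61,1.4
5,1,50,1.0"

# read.csv() expects a header row and comma separators by default
bone <- read.csv(text = csv_text)

# DATA$VAR pulls one column out as a plain vector
ages <- bone$age
is.vector(ages)   # TRUE
length(ages)      # 5, one value per row

# Indexing by [row, col]: rows 1 to 3 and columns 1 to 3 (mind the comma)
bone[1:3, 1:3]

# An empty row index keeps every row; a negative column index drops column 1
bone[, -1]
```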
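Slides 20 to 25 list the signatures of mean(), sd(), and median(). A minimal sketch of what the na.rm and trim arguments change, using a hypothetical age vector with one missing value:

```r
# Hypothetical ages (made-up values) with one missing entry
age <- c(27, 42, 35, 61, 50, NA)

mean(age)                  # NA: one missing value makes the result NA
mean(age, na.rm = TRUE)    # 43, computed on the 5 non-missing values
sd(age, na.rm = TRUE)      # sample SD (divisor n - 1), about 13.17
median(age, na.rm = TRUE)  # 42

# trim = 0.2 drops 20% of the sorted values from each end before averaging;
# with 5 values that removes the smallest (27) and the largest (61)
mean(age, trim = 0.2, na.rm = TRUE)  # (35 + 42 + 50) / 3, about 42.33
```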
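For slides 26 to 29: with the default type = 7, quantile() places the p-th quantile at position 1 + (n - 1)p of the sorted data and interpolates between neighbors when that position is not a whole number. With the five hypothetical ages below, every quartile lands exactly on a data point, so the output is easy to check by hand:

```r
age <- c(27, 42, 35, 61, 50)  # hypothetical values

# Default probs = seq(0, 1, 0.25): 0th, 25th, 50th, 75th, 100th percentiles
quantile(age)                 # 27 35 42 50 61

# Just the 25th and 75th percentiles
q <- quantile(age, probs = c(0.25, 0.75))
q                             # 35 50

# IQR() uses the same default type = 7, so it equals q[2] - q[1]
IQR(age)                      # 15
unname(q[2] - q[1])           # 15
```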
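Slides 30 to 34 cover max(), min(), and range(). One detail worth seeing, shown here on hypothetical values: range() returns the minimum and maximum as a pair rather than their difference.

```r
age <- c(27, 42, 35, 61, 50, NA)  # hypothetical values, one missing

max(age)                  # NA, because of the missing value
max(age, na.rm = TRUE)    # 61
min(age, na.rm = TRUE)    # 27

# range() returns both ends as a length-2 vector, c(min, max)
r <- range(age, na.rm = TRUE)
r                         # 27 61

# The single-number range (max - min), if that is what you want
diff(r)                   # 34
```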
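Slides 40 to 42 choose between type = 1, 2, and 3. To make the difference concrete without installing anything, here is a base-R sketch of the three definitions as described in the e1071 documentation, where m2, m3, and m4 are central moments computed with divisor n. The helper names skew_types() and kurt_types() are hypothetical and are not part of e1071; they also drop NAs silently, unlike e1071's default na.rm = FALSE.

```r
# Hypothetical helpers (not part of e1071) following the three definitions
# documented for e1071's `type` argument. NAs are dropped first.
skew_types <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)                  # second central moment (divisor n)
  m3 <- mean(d^3)                  # third central moment (divisor n)
  g1 <- m3 / m2^(3 / 2)            # type 1
  c(type1 = g1,
    type2 = g1 * sqrt(n * (n - 1)) / (n - 2),   # needs n >= 3
    type3 = g1 * ((n - 1) / n)^(3 / 2))
}

kurt_types <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  m4 <- mean(d^4)                  # fourth central moment (divisor n)
  g2 <- m4 / m2^2 - 3              # type 1: excess kurtosis
  c(type1 = g2,
    type2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),  # needs n >= 4
    type3 = (g2 + 3) * (1 - 1 / n)^2 - 3)
}

age <- c(27, 42, 35, 61, 50, 33, 45)  # hypothetical values
skew_types(age)
kurt_types(age)
```

With e1071 loaded, skewness(x, type = 1) and kurtosis(x, type = 1) should agree with the type1 entries, and likewise for types 2 and 3. The correction factors all tend to 1 as n grows, so the three types differ mainly in small samples.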
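Slides 43 to 49 summarize several variables at once, and by group, with summary() and psych's describeBy(). If psych is not at hand, a rough base-R stand-in is tapply() or aggregate(); note this is a different technique from describeBy() and reports far fewer statistics. The data frame below is hypothetical and borrows only the column names ID, zyg, and age from the slides.

```r
# Hypothetical data frame shaped loosely like bone (values are made up)
bone <- data.frame(
  ID  = 1:6,
  zyg = c(1, 1, 2, 2, 1, 2),
  age = c(27, 42, 35, 61, 50, 33)
)

# summary() on a data frame: min, quartiles, median, mean, max per column
summary(bone[, -1])

# Group-wise statistics in base R (a stand-in for psych::describeBy):
# split age by zyg and apply one function to each group
tapply(bone$age, bone$zyg, mean)
tapply(bone$age, bone$zyg, sd)

# aggregate() with a formula returns a data frame, one row per group
aggregate(age ~ zyg, data = bone, FUN = median)
```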
