Reading Data into R

Published in: Education, Technology, Business

  1. Reading data into R. 2012-09-28 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student. FREEDOM TO KNOW
  2. Previously in this group: Group website: http://rpubs.com/kaz_yos/useR_at_HSPH ; Introduction to R
  3. Menu: What statistics is all about; Data-reading functions in R; Installing packages; Reading Excel files; Reading other files
  4. http://mediacrushllc.com/2012/internet-statistics-2012/ Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data. http://en.wikipedia.org/wiki/Statistics
  5. No data, no life, no statistics
  6. Loading data is the first step. http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html
  7. Supported formats: .RData (native): load(); .csv: read.csv(); .xls/.xlsx: library(gdata) or library(XLConnect); .sas7bdat: read.sas7bdat() via library(sas7bdat); .dta: read.dta() via library(foreign); and more... http://cran.r-project.org/doc/manuals/R-data.html
  8. library() packages
  9. 4000+ user-contributed packages. Fast development. http://r4stats.com/articles/popularity/
  10. Downside: not much can be done without packages
  11. CRAN
  12. Comprehensive R Archive Network http://cran.r-project.org/web/packages/available_packages_by_date.html
  13. Let's try
  14. Open RStudio
  15. http://rstudio.org Watch the screencast
  16. Plot, Workspace (switched), Console, Source
  17. Menu: RStudio - Preferences. My configuration: Plot, Workspace, Console, Source
  18. Menu: RStudio - Preferences. My configuration: Configure CRAN mirror
  19. Comma Separated Values. Use .csv if possible. http://www.edrugsearch.com/edsblog/cvs-takes-on-wal-marts-generic-drug-prices-with-a-gimmicky-twist/#.UEfft0J8z0d
  20. .csv http://www.wondergraphs.com/img/SFO_Landings.csv
  21. read.csv("file.csv") http://www.wondergraphs.com/img/SFO_Landings.csv Careful: big file!
  22. new.dat <- read.csv("file.csv"): new.dat is the name of a dataset, read.csv() is the function to read .csv files, and "file.csv" is the file name
  23. Alternatively, new.dat <- read.csv(file.choose()): new.dat is the name of a dataset, read.csv() is the function to read .csv files, and file.choose() is a function to open a file-choose dialogue
  24. Space separated http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
  25. read.table("file.dat") or read.table("file.dat", header = TRUE) http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
  26. Tab separated
  27. read.delim("file.tsv") http://www.brookscole.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&flag=student&product_isbn_issn=9780495384960&disciplinenumber=1038&template=AUS
  28. For comma-, tab-, or space-separated text. Let's try!
  29. Excel files are prevalent. http://www.biography.com/people/bill-gates-9307520 http://www.last.fm/music/Excel/+images/285200
  30. http://www.philipcoppens.com/matrixconstructs.html We will use publicly available data: http://www.hsph.harvard.edu/faculty/miguel-hernan/files/nhefs_book.xls http://www.hsph.harvard.edu/faculty/miguel-hernan/causal-inference-book/
  31. install.packages("gdata", dep = T); library(gdata); read.xls("file.xls"). Perl configuration is necessary on Windows: http://cran.r-project.org/web/packages/gdata/INSTALL
  32. install.packages("XLConnect", dep = T); library(XLConnect); readWorksheet(loadWorkbook("file.xls"), sheet = 1). Define a function for simplicity: my.read.xls <- function(file) readWorksheet(loadWorkbook(file), sheet = 1); my.read.xls("file.xls"). On Mac: install.packages("XLConnect", type = "source")
  33. To install a package: install.packages("package", dep = T), where "package" is the package name, dep is short for dependencies, and T is short for TRUE
  34. To load a package: library(package), where package is the package name; the double quotes "" can be omitted
  35. Just click the box
  36. Install package; load package; read the chosen xls file into nhefs
  37. install.packages("sas7bdat", dep = T); library(sas7bdat); read.sas7bdat("file.sas7bdat") http://www.biostat.harvard.edu/~fitzmaur/ala2e/smoking.sas7bdat
  38. library(foreign); read.xport("file.xpt") ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/DEMO_F.xpt
  39. library(foreign); read.dta("file.dta") http://www.biostat.harvard.edu/~fitzmaur/ala2e/headache.dta
  40. HTML table http://www.drugs.com/top200_2003.html
  41. install.packages("XML", dep = T); library(XML); readHTMLTable("http://www.drugs.com/top200_2003.html", which = 2, skip.rows = 1) http://www.drugs.com/top200_2003.html
  42. Fixed width
  43. read.fwf("file.txt", widths = c(3, 5, ...)). Use widths = list(c(3,5,..), c(5,7,..)) for multiple rows per subject
  44. Important functions: install.packages("PackageName", dep = T); library(PackageName); str(dataset); summary(dataset); head(dataset)
  45. Appendix: Probability Functions
  46. What each prefix does, for the suffixes -norm, -t, -binom, -pois:
      • d- (dnorm, dt, dbinom, dpois): returns the density (mass), given an x-axis value
      • p- (pnorm, pt, pbinom, ppois): returns the probability, given an x-axis value (quantile)
      • q- (qnorm, qt, qbinom, qpois): returns the quantile (x-axis value), given a probability
      • -test: returns a p-value and confidence interval. Normal: z.test and zsum.test via library(BSDA); t: t.test, and tsum.test via library(BSDA); binomial: binom.test; Poisson: poisson.test
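
The text-file readers on slides 21 to 28 and slide 43 can be tried without downloading anything. What follows is a minimal sketch rather than part of the original slides: the demo data frame, the temporary files, and every object name (demo, from.csv, and so on) are hypothetical and chosen only for illustration. It writes a tiny dataset in four layouts, reads each one back with the matching function, and then inspects a result with the functions listed on slide 44.

```r
# Hypothetical demo data (not from the slides)
demo <- data.frame(id = 1:3,
                   dose = c(0.5, 1.0, 1.5),
                   arm = c("A", "P", "A"),
                   stringsAsFactors = FALSE)

# Temporary files keep the sketch self-contained
csv.file <- tempfile(fileext = ".csv")
dat.file <- tempfile(fileext = ".dat")
tsv.file <- tempfile(fileext = ".tsv")
fwf.file <- tempfile(fileext = ".txt")

# Write the same data as comma-, space-, and tab-separated text
write.csv(demo, csv.file, row.names = FALSE)
write.table(demo, dat.file, row.names = FALSE, quote = FALSE)
write.table(demo, tsv.file, row.names = FALSE, quote = FALSE, sep = "\t")

# Fixed-width layout: 3 characters for id, 3 for dose, 1 for arm
writeLines(c("0010.5A", "0021.0P", "0031.5A"), fwf.file)

# Read each file back with the matching function
from.csv <- read.csv(csv.file, stringsAsFactors = FALSE)
from.dat <- read.table(dat.file, header = TRUE, stringsAsFactors = FALSE)
from.tsv <- read.delim(tsv.file, stringsAsFactors = FALSE)
from.fwf <- read.fwf(fwf.file, widths = c(3, 3, 1),
                     col.names = c("id", "dose", "arm"),
                     stringsAsFactors = FALSE)

# Inspect what was read
str(from.csv)      # column types
head(from.dat)     # first rows
summary(from.fwf)  # per-column summaries
```

Passing header = TRUE matters for read.table(), which assumes no header row unless the first line has one fewer field than the rest; read.csv() and read.delim() assume a header by default.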

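The d-, p-, and q- prefixes from the appendix slide can be checked against each other in a few lines. This is a small sketch under stated assumptions: the object names and the numbers plugged in (1.96, 7 successes out of 20 trials, and so on) are hypothetical examples, not values from the slides. Only functions from the stats package that ships with R are used, so library(BSDA) is not needed here.

```r
# Standard normal distribution
d0   <- dnorm(0)       # d-: height of the density curve at x = 0
p196 <- pnorm(1.96)    # p-: probability of a value <= 1.96
q975 <- qnorm(0.975)   # q-: x value with probability 0.975 below it

# p- and q- undo each other
back <- qnorm(pnorm(1.96))

# Binomial distribution with 10 trials and success probability 0.5
mass2 <- dbinom(2, size = 10, prob = 0.5)  # P(X = 2)
cum2  <- pbinom(2, size = 10, prob = 0.5)  # P(X <= 2)

# -test: exact binomial test on hypothetical data
res <- binom.test(x = 7, n = 20, p = 0.5)
res$p.value    # p-value
res$conf.int   # confidence interval for the success probability
```

For a discrete distribution the p- function is the running sum of the d- function, so cum2 equals sum(dbinom(0:2, size = 10, prob = 0.5)).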