20130215 Reading data into R
Transcript

  • 1. Reading and manipulating data in R, 2013-02-15 @ HSPH. Kazuki Yoshida, M.D., MPH-CLE student. FREEDOM TO KNOW
  • 2. Reading data in: usually the first task in real-life data analysis.
  • 3. Supportedn .RData (native) files: load()n .csv files: read.csv()n .xls/.xlsx files: gdata::read.xls() or xlsx::read.xlsx()n .sas7bdat files: sas7bdat ::read.sas7bdat()n .dta files: foreign::read.dta()n and more... http://cran.r-project.org/doc/manuals/R-data.html
  • 4. Anatomy of foreign::read.dta(): foreign is the package name (packages add functions), and read.dta is the function name. A function name is followed by (), inside which you specify arguments.
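The package::function form also works without calling library() first, which makes it clear where a function comes from. A small sketch using two packages that ship with R (stats and tools), chosen here only because they need no installation:

```r
# Call functions through their package namespace, without attaching the package
x <- c(4, 1, 7, 3)
med <- stats::median(x)                  # median of 1, 3, 4, 7
ext <- tools::file_ext("file.sas7bdat")  # file extension without the dot

print(med)
print(ext)
```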
  • 5. Create a folder for this group
  • 6. Open RStudio
  • 7. Make sure your working directory is correct
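Relative file names such as "file.csv" are resolved against the working directory. A minimal sketch of checking and changing it from the console; a temporary directory stands in for your group folder, and the folder name "r-group" is hypothetical:

```r
# Show the current working directory
old_wd <- getwd()
print(old_wd)

# Hypothetical group folder, created under the session's temporary directory
group_dir <- file.path(tempdir(), "r-group")
dir.create(group_dir, showWarnings = FALSE)

# setwd() changes the directory and invisibly returns the previous one
prev <- setwd(group_dir)
print(getwd())
# ... read files by relative name here ...

# Restore the original working directory
setwd(prev)
```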
  • 8. Download filesn Rosner (ASCII, comma-separated and Stata): http://www.cengage.com/cgi-wadsworth/ course_products_wp.pl? fid=M20bI&product_isbn_issn=9780538733496n Hernan (Excel and SAS): http:// www.hsph.harvard.edu/miguel-hernan/causal- inference-book/
  • 9. .csv example: http://www.wondergraphs.com/img/SFO_Landings.csv
  • 10. For comma-, tab-, or space-separated text
  • 11. Reading a .csv file: new.dat <- read.csv("file.csv")
      ◦ new.dat: name of the object to create
      ◦ <-: assignment operator
      ◦ read.csv(): function to read .csv files
      ◦ "file.csv": file name here
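To try read.csv() without downloading anything, the sketch below writes a tiny comma-separated file to a temporary path and reads it back. The column names and values are made up for illustration.

```r
# Write a small comma-separated file (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,treated",
             "1,34,TRUE",
             "2,51,FALSE",
             "3,29,TRUE"), csv_path)

# read.csv() expects a header row and commas between fields by default
new.dat <- read.csv(csv_path)

# str() shows each column's class: integer, integer, logical here
str(new.dat)
```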
  • 12. Space-separated: http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
  • 13. read.table("file.dat"), or read.table("file.dat", header = TRUE) when the first line holds column names: http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
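The header argument matters: without it, read.table() treats the first line as data and names the columns V1, V2, and so on. A sketch with a small, hypothetical space-separated file:

```r
# Space-separated text whose first line holds column names (hypothetical values)
dat_path <- tempfile(fileext = ".dat")
writeLines(c("id group y0 y1",
             "1 P 30.8 26.9",
             "2 A 26.5 14.8"), dat_path)

no_header   <- read.table(dat_path)                 # header line read as a data row
with_header <- read.table(dat_path, header = TRUE)  # header line used as names

names(no_header)    # V1 V2 V3 V4
names(with_header)  # id group y0 y1
```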
  • 14. tab-separated
  • 15. read.delim("file.tsv"): http://www.brookscole.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&flag=student&product_isbn_issn=9780495384960&disciplinenumber=1038&template=AUS
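read.delim() is read.table() preset for tab-separated files with a header row, so spaces inside a field are kept. A short sketch with a hypothetical temporary file:

```r
# Tab-separated file with spaces inside the name field (hypothetical data)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("id\tname\tscore",
             "1\tAnn Lee\t90",
             "2\tBo Kim\t85"), tsv_path)

# Defaults: sep = "\t", header = TRUE
tsvdat <- read.delim(tsv_path)
print(tsvdat)
```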
  • 16. Excel files
  • 17. Install xlsx package
  • 18. Just click the checkbox in the RStudio Packages pane to load
  • 19. To install/load a package
      ◦ install.packages("package", dependencies = TRUE)
      ◦ library(package)
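The two steps can be wrapped so that installation happens only when needed. load_or_install() below is a hypothetical helper, not part of any package; it is demonstrated on tools, which ships with R, so the example runs without a download. For the Excel slides you would call it as load_or_install("xlsx").

```r
# Hypothetical helper: install a package only if it is missing, then attach it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
}

# "tools" is installed with R, so this only attaches it
load_or_install("tools")
print("package:tools" %in% search())
```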
  • 20. Reading an Excel sheet: xlsdat <- read.xlsx("file.xls", 1)
      ◦ xlsdat: name of the object to create
      ◦ <-: assignment operator
      ◦ read.xlsx(): function to read .xls/.xlsx files
      ◦ "file.xls": file name here
      ◦ 1: sheet number
  • 21. SAS native files
      ◦ library(sas7bdat)
      ◦ sasdat <- read.sas7bdat("file.sas7bdat")
  • 22. SAS xport files
      ◦ library(foreign)
      ◦ xptdat <- read.xport("file.xpt")
      ◦ Example: ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/DEMO_F.xpt
  • 23. Stata files
      ◦ library(foreign)
      ◦ statadat <- read.dta("file.dta")
      ◦ Example: http://www.biostat.harvard.edu/~fitzmaur/ala2e/headache.dta
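A self-contained round trip, assuming the foreign package is installed (it is a recommended package shipped with most R installations): write.dta() creates a small Stata file from a hypothetical data frame, and read.dta() reads it back.

```r
library(foreign)

# Write a hypothetical data frame to a temporary Stata file
stata_path <- tempfile(fileext = ".dta")
write.dta(data.frame(id = 1:3, pain = c(2.5, 4, 1.5)), stata_path)

# Read it back as a data frame
statadat <- read.dta(stata_path)
str(statadat)
```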
  • 24. Fixed width
  • 25. fwfdat <- read.fwf("file.txt", widths = c(3, 5, ...))
      ◦ Use widths = list(c(3,5,..), c(5,7,..)) for multiple rows per subject
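In a fixed-width file, each field occupies a set number of characters, and the widths argument gives those counts. The sketch below uses a hypothetical layout (3-character id, then a 4-character value) and, in the second part, a list of widths so that every two lines form one record.

```r
# One record per line: id in characters 1-3, value in characters 4-7
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("00172.5",
             "00268.0"), fwf_path)
fwfdat <- read.fwf(fwf_path, widths = c(3, 4))
print(fwfdat)  # columns V1 (id) and V2 (value)

# Two lines per record: line 1 as above, line 2 holds a 3-character value
multi_path <- tempfile(fileext = ".txt")
writeLines(c("00172.5", "120",
             "00268.0", "135"), multi_path)
multi <- read.fwf(multi_path, widths = list(c(3, 4), 3))
print(multi)   # columns V1, V2, V3; one row per record
```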
  • 26. Manipulating data in R
      ◦ Objects
      ◦ Classes
      ◦ Various data objects
  • 27. Objectsn Just about everything named in R is an objectn An object is a container that n knows its class (eg, I have numbers inside!). n has contents (eg, Actual numbers).
  • 28. Examples of objects
      ◦ Data, which you use for analysis (various classes)
      ◦ Functions, which perform analysis (function class)
      ◦ Results, which come out of analysis (various classes)
  • 29. Classes of data values inside data objects
      ◦ Numeric: continuous variables
      ◦ Factor: categorical variables
      ◦ Logical: TRUE/FALSE binary variables
      ◦ etc.
  • 30. Class?n An object’s class tells R how the object should be handled.n For example, summarizing data should work differently for numbers and categories!
  • 31. Data objectsn Vector (contains single class of data values)n List (contains multiple classes of data values)
  • 32. Data objectsn Vector (contains single class of data values) n Array including Matrixn List (contains multiple classes of data values) n Data frame
  • 33. Vectorn Smallest building block of data objectsn Single dimensionn Combination of values of same classn vec1 <- c(2013, 2, 15, -10) # combinen vec2 <- 1:16 # integers 1 to 16
  • 34. Arrayn Vector folded into a multidimensional structuren 2-dimensional array is a matrixn vec3 <- 1:16n dim(vec3) <- c(4, 4) # 4 x 4 structuren dim(vec3) <- c(2, 2, 4) # 2 x 2 x 4 structuren arr1 <- array(1:60, dim = c(3,4,5))
  • 35. Listn Combination of any values or objectsn Can contain objects of multiple classesn eg, a list of two vectors, a matrix, three arraysn list1 <- list(first = 1:17, second = matrix(letters, 13,2))n list2 <- list(alpha = c(1,4,5,7), beta = c("h","s","p","h"))
  • 36. Data framen Special case of a listn List of same-length vectors vertically alignedn df1 <- data.frame(list2)n list3 <- list(small = letters, large = LETTERS, number = 1:26)n df2 <- data.frame(list3)
  • 37. Access by indexes
      ◦ letters[3] # 1-dimensional object
      ◦ arr1[1, 2, 3] # 3-dimensional object
      ◦ arr1[1, , 3] # implies 1, (all), 3
      ◦ df2[, 3] # implies (all), 3 (df1 has only two columns)
      ◦ list1[[1]] # [[ ]] extracts a list element; [ ] returns a sub-list
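An empty index position means "all" along that dimension, and on a list the choice between [ ] and [[ ]] changes what comes back. A sketch that rebuilds the objects from the earlier slides:

```r
list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
arr1  <- array(1:60, dim = c(3, 4, 5))
df2   <- data.frame(small = letters, large = LETTERS, number = 1:26)

letters[3]          # "c"
row_slice <- arr1[1, , 3]
row_slice           # row 1 of the third 3 x 4 slice: 25 28 31 34
df2[, 3]            # the number column, 1 to 26

class(list1[1])     # "list": single brackets keep the list wrapper
class(list1[[1]])   # "integer": double brackets extract the element
```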
  • 38. Access named elements
      ◦ list3
      ◦ list3$small
      ◦ list3[["small"]]
      ◦ df2$large
      ◦ df2[, "large"]
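$name and [["name"]] return the same component, and a column name can be combined with a logical row condition. A sketch on the slide's list3 and df2; whether large comes back as character or factor depends on the R version (factor before R 4.0.0), so the last check converts it first.

```r
list3 <- list(small = letters, large = LETTERS, number = 1:26)
df2 <- data.frame(list3)

same_list <- identical(list3$small, list3[["small"]])  # TRUE
same_df   <- identical(df2$large, df2[, "large"])      # TRUE

# Rows where number <= 3, column "large"
first3 <- as.character(df2[df2$number <= 3, "large"])
first3   # "A" "B" "C"
```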