20130215 Reading data into R

Transcript of "20130215 Reading data into R"

  1. Reading and Manipulating Data in R
       • 2013-02-15 @HSPH
       • Kazuki Yoshida, M.D., MPH-CLE student
       • FREEDOM TO KNOW
  2. Reading data in
       • Usually the first task in real-life data analysis.
  3. Supported formats
       • .RData (native) files: load()
       • .csv files: read.csv()
       • .xls/.xlsx files: gdata::read.xls() or xlsx::read.xlsx()
       • .sas7bdat files: sas7bdat::read.sas7bdat()
       • .dta files: foreign::read.dta()
       • and more... http://cran.r-project.org/doc/manuals/R-data.html
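As a rough illustration of the native .RData route listed in slide 3, the sketch below saves an object to a temporary file and restores it with load(). The object name x and the temporary path are made up for this example and do not come from the slides.

```r
# Create an object and save it in R's native format.
x <- 1:3
rdata_path <- tempfile(fileext = ".RData")  # hypothetical file location
save(x, file = rdata_path)

# Remove the object, then restore it from disk.
rm(x)
loaded <- load(rdata_path)  # load() returns the names of the restored objects

loaded
x
```

Unlike the read.*() functions, load() restores objects under their original names, so no assignment is needed to get x back.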
  4. foreign::read.dta()
       • foreign: package name (packages add functions)
       • read.dta: function name
       • Functions are followed by (), in which you specify arguments
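The package::function() notation in slide 4 can be tried without installing anything, because read.csv() lives in the utils package that ships with R. This is only a sketch: the file contents and the object names dat_a and dat_b are invented for illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90", "2,85"), tmp)

# Call the function with an explicit package prefix...
dat_a <- utils::read.csv(tmp)

# ...or without it, which works because utils is attached by default.
dat_b <- read.csv(tmp)

identical(dat_a, dat_b)
```

The prefix form is handy when a package is installed but has not been attached with library().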
  5. Create a folder for this group
  6. Open RStudio
  7. Make sure your working directory is correct
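Slides 5 to 7 are done through the RStudio interface in the talk. One possible console equivalent is sketched below with getwd() and setwd(); the folder name r_group_demo is a placeholder, and it is created under tempdir() only so the example does not touch real files.

```r
# Remember where R currently looks for files.
old_wd <- getwd()

# Create a folder for the session (hypothetical name) and switch to it.
demo_dir <- file.path(tempdir(), "r_group_demo")
dir.create(demo_dir, showWarnings = FALSE)
setwd(demo_dir)

getwd()       # should now point at the new folder
list.files()  # files that can be read by bare file name

# Switch back so the rest of the session is unaffected.
setwd(old_wd)
```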
  8. Download files
       • Rosner (ASCII, comma-separated and Stata): http://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20bI&product_isbn_issn=9780538733496
       • Hernan (Excel and SAS): http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
  9. .csv: http://www.wondergraphs.com/img/SFO_Landings.csv
  10. For comma-, tab-, or space-separated text
  11. new.dat <- read.csv("file.csv")
       • new.dat: name of object to create
       • <-: assignment operator
       • read.csv(): function to read .csv files
       • "file.csv": file name here
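To try the pattern from slide 11 without downloading anything, the sketch below writes a small comma-separated file and reads it back. The column names and values are invented and are not taken from SFO_Landings.csv.

```r
# Build a small CSV in a temporary location (made-up data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("airline,landings", "A,120", "B,75"), csv_path)

# Same pattern as the slide: object name, assignment operator, function, file name.
new.dat <- read.csv(csv_path)

# Inspect what was read.
str(new.dat)
head(new.dat)
```

Note that the file name must be wrapped in straight quotes; curly quotes pasted from slides are a syntax error in R.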
  12. Space-separated: http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
  13. read.table("file.dat") or read.table("file.dat", header = TRUE)
       • http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
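The effect of the header argument in slide 13 can be seen on a small space-separated file. The data below are invented for the sketch and only loosely resemble a real .dat file.

```r
# Space-separated file whose first line holds the column names (made-up data).
dat_path <- tempfile(fileext = ".dat")
writeLines(c("id group week0", "1 P 30.8", "2 A 26.5"), dat_path)

# Without header = TRUE the first line is treated as data.
no_header <- read.table(dat_path)

# With header = TRUE the first line supplies the column names.
with_header <- read.table(dat_path, header = TRUE)

names(no_header)
names(with_header)
nrow(no_header)
nrow(with_header)
```

Writing TRUE in full is safer than T, since T is an ordinary variable that can be overwritten.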
  14. Tab-separated
  15. read.delim("file.tsv")
       • http://www.brookscole.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&flag=student&product_isbn_issn=9780495384960&disciplinenumber=1038&template=AUS
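A minimal sketch of read.delim() on a tab-separated file follows. The file and its columns are hypothetical; a value containing a space is included to show that only tabs separate fields.

```r
# Tab-separated file; "\t" is the tab character (made-up data).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("name\tage", "Ann Lee\t34", "Bo\t29"), tsv_path)

# read.delim() assumes tab separators and a header line by default.
tsvdat <- read.delim(tsv_path)

tsvdat
```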
  16. Excel files
  17. Install xlsx package
  18. Just click the box to load
  19. To install/load a package
       • install.packages("package", dependencies = TRUE)
       • library(package)
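The two steps in slide 19 are often combined so that a package is installed only when it is missing. The helper load_or_install() below is a hypothetical name, not part of R or of the slides, and the demo uses the utils package (which ships with R) so that the install branch is not triggered.

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE lets library() accept the name stored in a variable.
  library(pkg, character.only = TRUE)
  invisible(pkg %in% loadedNamespaces())
}

# utils is always available, so nothing is downloaded here.
ok <- load_or_install("utils")
ok
```

For the slides' use case the call would presumably be load_or_install("xlsx"), which needs an internet connection on first use.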
  20. xlsdat <- read.xlsx("file.xls", 1)
       • xlsdat: name of object to create
       • <-: assignment operator
       • read.xlsx(): function to read .xlsx files
       • "file.xls": file name here
       • 1: sheet number
  21. SAS native files
       • library(sas7bdat)
       • sasdat <- read.sas7bdat("file.sas7bdat")
  22. SAS xport files
       • library(foreign)
       • xptdat <- read.xport("file.xpt")
       • ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/DEMO_F.xpt
  23. Stata files
       • library(foreign)
       • statadat <- read.dta("file.dta")
       • http://www.biostat.harvard.edu/~fitzmaur/ala2e/headache.dta
  24. Fixed width
  25. fwfdat <- read.fwf("file.txt", widths = c(3, 5, ...))
       • Use widths = list(c(3,5,..), c(5,7,..)) for multiple rows per subject
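A small sketch of read.fwf() for the single-row-per-subject case follows. The file layout (3 characters of id, then 5 characters of name) and the column names are assumptions made up for this example; strip.white = TRUE is passed through to read.table() to drop padding spaces.

```r
# Fixed-width file: characters 1-3 are an id, characters 4-8 a name (made-up layout).
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("001Alice", "002Bob  "), fwf_path)

# widths gives the number of characters in each field, in order.
fwfdat <- read.fwf(fwf_path, widths = c(3, 5),
                   col.names = c("id", "name"),
                   strip.white = TRUE)

fwfdat
```

The argument is spelled widths; the shorter width only works through R's partial argument matching.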
  26. Manipulating data in R
       • Objects
       • Classes
       • Various data objects
  27. Objects
       • Just about everything named in R is an object
       • An object is a container that
           • knows its class (e.g., "I have numbers inside!")
           • has contents (e.g., the actual numbers)
  28. Examples of objects
       • data, which you use for analysis (various classes)
       • functions, which perform analysis (function class)
       • results, which come out of analysis (various classes)
  29. Classes of data values inside data objects
       • Numeric: Continuous variables
       • Factor: Categorical variables
       • Logical: TRUE/FALSE binary variables
       • etc...
  30. Class?
       • An object's class tells R how the object should be handled.
       • For example, summarizing data should work differently for numbers and categories!
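The point of slide 30 can be checked directly: the same summary() call behaves differently depending on the class of its input. The vectors ages and sex are invented for this sketch.

```r
# A continuous variable and a categorical variable (made-up values).
ages <- c(34, 29, 41)
sex <- factor(c("F", "M", "F"))

class(ages)  # "numeric"
class(sex)   # "factor"

# summary() picks its behaviour from the class of its argument.
summary(ages)  # minimum, quartiles, mean, maximum
summary(sex)   # counts per category

# Functions and results are objects with classes too.
class(summary)
class(summary(ages))
```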
  31. Data objects
       • Vector (contains single class of data values)
       • List (contains multiple classes of data values)
  32. Data objects
       • Vector (contains single class of data values)
           • Array including Matrix
       • List (contains multiple classes of data values)
           • Data frame
  33. Vector
       • Smallest building block of data objects
       • Single dimension
       • Combination of values of same class
       • vec1 <- c(2013, 2, 15, -10) # combine
       • vec2 <- 1:16 # integers 1 to 16
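Because a vector holds a single class of values, mixing types inside c() makes R convert everything to one common class. The sketch below reuses vec1 and vec2 from slide 33; the vector named mixed is an added illustration.

```r
vec1 <- c(2013, 2, 15, -10)  # combine
vec2 <- 1:16                 # integers 1 to 16

class(vec1)   # "numeric"
class(vec2)   # "integer"
length(vec2)  # 16

# Mixing a number, a string, and a logical: everything becomes character.
mixed <- c(2013, "Feb", TRUE)
class(mixed)  # "character"
mixed
```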
  34. Array
       • Vector folded into a multidimensional structure
       • 2-dimensional array is a matrix
       • vec3 <- 1:16
       • dim(vec3) <- c(4, 4) # 4 x 4 structure
       • dim(vec3) <- c(2, 2, 4) # 2 x 2 x 4 structure
       • arr1 <- array(1:60, dim = c(3,4,5))
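The "folding" in slide 34 fills the structure column by column, which the sketch below makes visible by looking up single elements. Only the element lookups and their arithmetic are added here; the objects are the ones from the slide.

```r
vec3 <- 1:16
dim(vec3) <- c(4, 4)  # 4 x 4 structure, filled column by column
is.matrix(vec3)       # TRUE: a 2-dimensional array is a matrix
vec3[2, 3]            # 2 + (3 - 1) * 4 = 10

dim(vec3) <- c(2, 2, 4)  # same 16 values, now 2 x 2 x 4
dim(vec3)

arr1 <- array(1:60, dim = c(3, 4, 5))
arr1[1, 2, 3]  # 1 + (2 - 1) * 3 + (3 - 1) * 12 = 28
```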
  35. List
       • Combination of any values or objects
       • Can contain objects of multiple classes
       • e.g., a list of two vectors, a matrix, three arrays
       • list1 <- list(first = 1:17, second = matrix(letters, 13,2))
       • list2 <- list(alpha = c(1,4,5,7), beta = c("h","s","p","h"))
  36. Data frame
       • Special case of a list
       • List of same-length vectors vertically aligned
       • df1 <- data.frame(list2)
       • list3 <- list(small = letters, large = LETTERS, number = 1:26)
       • df2 <- data.frame(list3)
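The claim that a data frame is a special case of a list can be checked with is.list(), and the same-length requirement with a deliberately failing call. The objects list3 and df2 follow slide 36; the failing example with columns a and b is an added illustration.

```r
list3 <- list(small = letters, large = LETTERS, number = 1:26)
df2 <- data.frame(list3)

is.list(df2)  # TRUE: a data frame is still a list underneath
dim(df2)      # 26 rows, 3 columns
names(df2)    # column names come from the list element names

# Vectors of different lengths cannot be aligned into a data frame.
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```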
  37. Access by indexes
       • letters[3] # 1-dimensional object
       • arr1[1,2,3] # 3-dimensional object
       • arr1[1, ,3] # implies 1,(all),3
       • df2[ ,3] # implies (all),3
       • list1[[1]] # list needs [[ ]]
  38. Access named elements
       • list3
       • list3$small
       • list3[["small"]]
       • df2$large
       • df2[, "large"]
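The access patterns in slides 37 and 38 can be compared side by side. This sketch rebuilds the objects it needs and assumes the three-column data frame made from list3 (called df2 in slide 36), since that is the one containing a column named large.

```r
list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
list3 <- list(small = letters, large = LETTERS, number = 1:26)
df2 <- data.frame(list3)

letters[3]           # "c"
class(list1[[1]])    # [[ ]] extracts the element itself: "integer"
class(list1[1])      # [ ] keeps the list wrapper: "list"

# $name and [["name"]] are two ways to reach the same element.
identical(list3$small, list3[["small"]])

# For a data frame, $name and [, "name"] return the same column.
identical(df2$large, df2[, "large"])

df2[3, "number"]     # row 3 of the column named number: 3
```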