Categorical data with R
Categorical data with R: Presentation Transcript

  • Tabulating data with R. 2012-10-22 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student. FREEDOM TO KNOW
  • Group Website is at: http://rpubs.com/kaz_yos/useR_at_HSPH
  • Previously in this group: Introduction; Reading Data into R (1); Reading Data into R (2); Descriptive statistics. Group Website: http://rpubs.com/kaz_yos/useR_at_HSPH
  • Menu: Categorical data; How to tabulate; Get sums and proportions
  • Ingredients. Epi/Stat: Tables; Cross tables; Stratified tables; Creating categorical variables. Programming: data(); table(), summary(); prop.table(); addmargins(); xtabs(), ftable(); gmodels::CrossTable(); epiR::epi.2by2()
  • Categorical data: country, race, gender, ethnicity, cancer stage, education level, disease severity
  • Open RStudio
  • Install and load vcd and epiR
  • We will use the “Arthritis” dataset in the vcd package. Load the built-in dataset named “Arthritis”: data(Arthritis)
  • Indexing: extraction of data from a data frame. Arthritis[1:17, ] extracts the 1st to 17th rows (colon in between) and shows all columns (don’t forget the comma)
  • The Treatment vector in the Arthritis data frame. The data frame is five vectors of the same length tied together
  • Summary of the whole dataset with summary: summary(Arthritis)
  • Your turn (adopted from Hadley Wickham):
    summary(Arthritis)
  • Accessing a single variable in a dataset: Arthritis$Treatment (dataset name, then $, then variable name)
  • Arthritis$Treatment: factor levels (categories)
  • Check the factor levels of a vector with levels: levels(Arthritis$Treatment)
  • Your turn (adopted from Hadley Wickham):
    Arthritis$Improved
    levels(Arthritis$Improved)
  • This is an ordered factor
  • factor
  • A factor is a categorical variable in R
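
The idea of an ordered factor can be sketched by building one by hand. This is a minimal illustration in base R; the vector `severity` and its values are made up for the example and are not taken from the Arthritis data.

```r
# Hypothetical values for illustration only (not the Arthritis data)
severity <- factor(c("Some", "None", "Marked", "None"),
                   levels = c("None", "Some", "Marked"),
                   ordered = TRUE)

severity.levels  <- levels(severity)      # categories, in the order given above
severity.ordered <- is.ordered(severity)  # TRUE for an ordered factor
below.marked     <- severity < "Marked"   # comparisons follow the level order

print(severity.levels)
print(severity.ordered)
print(below.marked)
```

Without `ordered = TRUE` the same call gives an unordered factor, for which `<` comparisons are not meaningful.
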
  • Create a single-variable summary table with table: table(Arthritis$Improved)
  • Your turn (adopted from Hadley Wickham):
    table(Arthritis$Improved)
  • Convert tables to proportions with prop.table: prop.table(table.object)
  • Your turn (adopted from Hadley Wickham):
    Improved.cat <- table(Arthritis$Improved)
    prop.table(Improved.cat)
  • Create cross tables with xtabs: xtabs(formula = ~ , data = Arthritis)
  • Your turn (adopted from Hadley Wickham):
    xtabs(~ Treatment + Improved, Arthritis)
    xtabs(~ Treatment + Improved + Sex, Arthritis)
  • 1st dimension, 2nd dimension, 3rd dimension
  • Add margins to tables with addmargins: addmargins(table.object)
  • Your turn (adopted from Hadley Wickham):
    tab1 <- xtabs(~ Treatment + Improved, Arthritis)
    addmargins(tab1)
  • Create flat tables with ftable (good for ≥ 3 dimensions): ftable(..., exclude = c(NA, NaN), row.vars = NULL, col.vars = NULL)
  • Your turn (adopted from Hadley Wickham):
    tab2 <- xtabs(~ Treatment + Improved + Sex, Arthritis)
    ftable(tab2)
  • Proportions again with prop.table: prop.table(cross.table.object)
  • Your turn (adopted from Hadley Wickham):
    tab3 <- xtabs(~ Treatment + Improved, Arthritis)
    prop.table(tab3)    # proportion of the total
    prop.table(tab3, 1) # proportion of the row sum (1st dimension)
    prop.table(tab3, 2) # proportion of the column sum (2nd dimension)
  • Chi-squared test with chisq.test: chisq.test(cross.table.object)
  • Fisher’s exact test with fisher.test: fisher.test(cross.table.object)
  • Your turn (adopted from Hadley Wickham):
    tab3 <- xtabs(~ Treatment + Improved, Arthritis)
    chisq.test(tab3)
    fisher.test(tab3)
  • SAS-like cross tables are available in the gmodels package via CrossTable: CrossTable(tab.2d)
  • Your turn (adopted from Hadley Wickham):
    tab3 <- xtabs(~ Treatment + Improved, Arthritis)
    CrossTable(tab3)
  • 2x2 tables with RR, RD, and OR are available in the epiR package via epi.2by2: epi.2by2(tab.2by2)
  • Your turn (adopted from Hadley Wickham):
    tab.2by2 <- xtabs(~ Sex + Treatment, Arthritis)
    epi.2by2(tab.2by2, units = 1)
  • Creating factor
  • Data in Excel (column types: factor, factor, integer)
  • To convert a number vector to a factor vector: dat$Stage <- factor(dat$Stage)
  • To convert back to numbers: dat$Stage <- as.numeric(as.character(dat$Stage))
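
The reason for going through `as.character()` when converting back can be seen on a small example. The vector `stage` below is hypothetical and stands in for a column such as `dat$Stage`; only base R is assumed.

```r
# Hypothetical values for illustration only
stage   <- c(10, 20, 20, 40)
stage.f <- factor(stage)             # levels are "10", "20", "40"

codes  <- as.numeric(stage.f)                # internal level codes, not the original numbers
values <- as.numeric(as.character(stage.f))  # labels converted back to numbers

print(codes)   # 1 2 2 3
print(values)  # 10 20 20 40
```

Calling `as.numeric()` directly on a factor returns the integer codes of the levels, so the intermediate `as.character()` step is what recovers the original values.
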