Grouping & Summarizing Data in R

by on May 06, 2011

  • 26,908 views

Overview of a few ways to group and summarize data in R using sample airfare data from DOT/BTS's O&D Survey.

Starts with a naive approach using subset() and loops, then shows base R's tapply() and aggregate(), and highlights the doBy and plyr packages.

Presented at the March 2011 meeting of the Greater Boston useR Group.
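
As a rough illustration of the progression described above, here is a minimal base R sketch that computes the same per-group mean three ways: with subset() in a loop, with tapply(), and with aggregate(). The built-in mtcars data set is an assumption standing in for the airfare sample used in the talk, and the variable names are made up for this example. The doBy and plyr versions are left out so that the sketch runs without extra packages.

```r
# Illustrative data: built-in mtcars stands in for the airfare sample.
# Goal: mean mpg for each number of cylinders.

# 1. Naive approach: subset() inside a loop over the group values
groups <- sort(unique(mtcars$cyl))
loop_means <- numeric(length(groups))
names(loop_means) <- groups
for (i in seq_along(groups)) {
  rows <- subset(mtcars, cyl == groups[i])
  loop_means[i] <- mean(rows$mpg)
}

# 2. tapply(): apply a function to a vector, split by a grouping vector
tapply_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# 3. aggregate(): formula interface, returns a data frame
agg_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

print(loop_means)
print(tapply_means)
print(agg_means)
```

All three produce the same numbers; the difference is mainly how much bookkeeping is needed and whether the result comes back as a named vector or a data frame.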

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved

Embeds: 4 sites, 42 views

http://dodata.wordpress.com: 33
http://www.linkedin.com: 7
https://si0.twimg.com: 1
https://twitter.com: 1

Statistics

Likes: 17
Downloads: 285
Comments: 4
Embed Views: 42
Views on SlideShare: 26,866
Total Views: 26,908

  • kerrynicholson Kerry Nicholson OMG thank you! Celebrate the little wins and this ddply is a BIG win! 1 year ago
  • waggonerc Lao Tzu, consultant at Humility Group Awesome! Thanks Jeff. 1 year ago
  • jeffreybreen Jeffrey Breen, Technology and travel analyst at Atmosphere Research Group Try naming the variables in your function like this:

    FUN = function(x) { c( sum=sum(x), quantile=quantile(x) ) }

    It should be closer to what you(r boss) wants:

    Category Amount.sum Amount.quantile.0% Amount.quantile.25% Amount.quantile.50% Amount.quantile.75% Amount.quantile.100%

    e.g.:

    > library(doBy)
    > data(beets)
    > summaryBy ( yield ~ harvest, beets, FUN = function(x) { c( sum=sum(x), quantile=quantile(x) ) } )

      harvest yield.sum yield.quantile.0% yield.quantile.25% yield.quantile.50% yield.quantile.75% yield.quantile.100%
    1   harv1    1825.5              91.5             116.75                128              132.5               137.5
    2   harv2    2053.0              99.5             137.75                141              150.5               156.0
    1 year ago
  • waggonerc Lao Tzu, consultant at Humility Group For me, summaryBy adds weird names.

    summaryBy ( Amount ~ Category, wuslich, FUN = function(x) { c( sum(x), quantile(x) ) } )

    yields

    Category Amount.FUN1 Amount.FUN2 Amount.FUN3 Amount.FUN4 Amount.FUN5 Amount.FUN6
    I don’t think my boss is going to understand that output without retitling.
    1 year ago
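
The naming fix suggested in the comments above relies on how c() combines names: a named argument that is itself a named vector yields element names of the form outer.inner. The sketch below shows that mechanism in base R only, so it runs without doBy. The function name summary_stats and the use of the built-in mtcars data are assumptions for illustration, and sapply() over split() stands in for summaryBy(); it is not meant to reproduce summaryBy()'s exact output.

```r
# Naming the elements inside c() gives every result a readable label.
# quantile() already returns a named vector ("0%", "25%", ...), so
# quantile = quantile(x) expands to "quantile.0%", "quantile.25%", ...
summary_stats <- function(x) {
  c(sum = sum(x), quantile = quantile(x))
}

# Labels produced for a single vector
stat_names <- names(summary_stats(mtcars$mpg))
print(stat_names)

# Per-group summary: one row per cylinder count, one column per statistic
by_group <- t(sapply(split(mtcars$mpg, mtcars$cyl), summary_stats))
print(by_group)
```

The column labels of by_group come straight from the names chosen inside the function, which is the same idea that makes the summaryBy() output readable without retitling.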
Grouping & Summarizing Data in R Presentation Transcript