SlideShare a Scribd company logo
1 of 13
Download to read offline
useR Vignette:



Reshaping Data in R


Greater Boston useR Group
        April 6, 2011


             by

      Jeffrey Breen
  jbreen@cambridge.aero




      Photo from http://www.ibm-1401.info/
Outline
 ●     Sample data from 2010
       U.S. Census
          ●     Starts “wide”, like a
                spreadsheet
 ●     reshape2 package
          ●     “melt” to make long
          ●     “*cast” to go back to wide
 ●     Extra credit fun with
       “dcast”
 ●     Further reading

useR Vignette: Reshaping Data in R      Greater Boston useR Meeting, April 2011   Slide 2
Sample data: U.S. Census 2010
Read the data:
> pop = read.csv('http://2010.census.gov/2010census/data/pop_density.csv', skip=3)

Just keep the first few columns (total state populations by year):
> pop = pop[,1:12]

Clean up column names:
> colnames(pop)

 [1] "STATE_OR_REGION" "X1910_POPULATION" "X1920_POPULATION" "X1930_POPULATION"
"X1940_POPULATION" "X1950_POPULATION"

 [7] "X1960_POPULATION" "X1970_POPULATION" "X1980_POPULATION" "X1990_POPULATION"
"X2000_POPULATION" "X2010_POPULATION"

> colnames(pop) = c('state', seq(1910, 2010, 10))

> colnames(pop)

 [1] "state" "1910"                  "1920"   "1930"   "1940"  "1950"  "1960"  "1970"  "1980"  "1990"  "2000" 
"2010"




useR Vignette: Reshaping Data in R                        Greater Boston useR Meeting, April 2011       Slide 3
Data set is “wide” like a spreadsheet
> head(pop,40)
                  state     1910      1920      1930      1940      1950      1960      1970      1980      1990      2000      2010
1         United States 92228531 106021568 123202660 132165129 151325798 179323175 203211926 226545805 248709873 281421906 308745538
2               Alabama 2138093    2348174   2646248   2832961   3061743   3266740   3444165   3893888   4040587   4447100   4779736
3                Alaska    64356     55036     59278     72524    128643    226167    300382    401851    550043    626932    710231
4               Arizona   204354    334162    435573    499261    749587   1302161   1770900   2718215   3665228   5130632   6392017
5              Arkansas 1574449    1752204   1854482   1949387   1909511   1786272   1923295   2286435   2350725   2673400   2915918
6            California 2377549    3426861   5677251   6907387 10586223 15717204 19953134 23667902 29760021 33871648 37253956
7              Colorado   799024    939629   1035791   1123296   1325089   1753947   2207259   2889964   3294394   4301261   5029196
8           Connecticut 1114756    1380631   1606903   1709242   2007280   2535234   3031709   3107576   3287116   3405565   3574097
9              Delaware   202322    223003    238380    266505    318085    446292    548104    594338    666168    783600    897934
10 District of Columbia   331069    437571    486869    663091    802178    763956    756510    638333    606900    572059    601723
11              Florida   752619    968470   1468211   1897414   2771305   4951560   6789443   9746324 12937926 15982378 18801310
12              Georgia 2609121    2895832   2908506   3123723   3444578   3943116   4589575   5463105   6478216   8186453   9687653
13               Hawaii   191909    255912    368336    423330    499794    632772    768561    964691   1108229   1211537   1360301
14                Idaho   325594    431866    445032    524873    588637    667191    712567    943935   1006749   1293953   1567582
15             Illinois 5638591    6485280   7630654   7897241   8712176 10081158 11113976 11426518 11430602 12419293 12830632
16              Indiana 2700876    2930390   3238503   3427796   3934224   4662498   5193669   5490224   5544159   6080485   6483802
17                 Iowa 2224771    2404021   2470939   2538268   2621073   2757537   2824376   2913808   2776755   2926324   3046355
18               Kansas 1690949    1769257   1880999   1801028   1905299   2178611   2246578   2363679   2477574   2688418   2853118
19             Kentucky 2289905    2416630   2614589   2845627   2944806   3038156   3218706   3660777   3685296   4041769   4339367
20            Louisiana 1656388    1798509   2101593   2363880   2683516   3257022   3641306   4205900   4219973   4468976   4533372
21                Maine   742371    768014    797423    847226    913774    969265    992048   1124660   1227928   1274923   1328361
22             Maryland 1295346    1449661   1631526   1821244   2343001   3100689   3922399   4216975   4781468   5296486   5773552
23        Massachusetts 3366416    3852356   4249614   4316721   4690514   5148578   5689170   5737037   6016425   6349097   6547629
24             Michigan 2810173    3668412   4842325   5256106   6371766   7823194   8875083   9262078   9295297   9938444   9883640
25            Minnesota 2075708    2387125   2563953   2792300   2982483   3413864   3804971   4075970   4375099   4919479   5303925
26          Mississippi 1797114    1790618   2009821   2183796   2178914   2178141   2216912   2520638   2573216   2844658   2967297
27             Missouri 3293335    3404055   3629367   3784664   3954653   4319813   4676501   4916686   5117073   5595211   5988927
28              Montana   376053    548889    537606    559456    591024    674767    694409    786690    799065    902195    989415
29             Nebraska 1192214    1296372   1377963   1315834   1325510   1411330   1483493   1569825   1578385   1711263   1826341
30               Nevada    81875     77407     91058    110247    160083    285278    488738    800493   1201833   1998257   2700551
31        New Hampshire   430572    443083    465293    491524    533242    606921    737681    920610   1109252   1235786   1316470
32           New Jersey 2537167    3155900   4041334   4160165   4835329   6066782   7168164   7364823   7730188   8414350   8791894
33           New Mexico   327301    360350    423317    531818    681187    951023   1016000   1302894   1515069   1819046   2059179
34             New York 9113614 10385227 12588066 13479142 14830192 16782304 18236967 17558072 17990455 18976457 19378102
35       North Carolina 2206287    2559123   3170276   3571623   4061929   4556155   5082059   5881766   6628637   8049313   9535483
36         North Dakota   577056    646872    680845    641935    619636    632446    617761    652717    638800    642200    672591
37                 Ohio 4767121    5759394   6646697   6907612   7946627   9706397 10652017 10797630 10847115 11353140 11536504
38             Oklahoma 1657155    2028283   2396040   2336434   2233351   2328284   2559229   3025290   3145585   3450654   3751351
39               Oregon   672765    783389    953786   1089684   1521341   1768687   2091385   2633105   2842321   3421399   3831074
40         Pennsylvania 7665111    8720017   9631350   9900180 10498012 11319366 11793909 11863895 11881643 12281054 12702379


useR Vignette: Reshaping Data in R                             Greater Boston useR Meeting, April 2011                           Slide 4
...and wide can be convenient

 ●      Wide format can be useful – just like spreadsheets
 ●      Easy to read, comprehend
 ●      Can sort whole data set by one column:
> library(doBy)
> top = orderBy(~-2010, pop)

> top = subset(top, state!='United States')
> top = head(top, 10)
> top$state = factor(top$state)

> head(top)
            state             1910     1920     1930     1940     1950          1960            1970        1980       1990       2000       2010
6      California          2377549 3426861 5677251 6907387 10586223         15717204        19953134    23667902   29760021   33871648   37253956
45          Texas          3896542 4663228 5824715 6414824 7711194           9579677        11196730    14229191   16986510   20851820   25145561
34       New York          9113614 10385227 12588066 13479142 14830192      16782304        18236967    17558072   17990455   18976457   19378102
11        Florida           752619   968470 1468211 1897414 2771305          4951560         6789443     9746324   12937926   15982378   18801310
15       Illinois          5638591 6485280 7630654 7897241 8712176          10081158        11113976    11426518   11430602   12419293   12830632
40   Pennsylvania          7665111 8720017 9631350 9900180 10498012         11319366        11793909    11863895   11881643   12281054   12702379


 ●      But how would you plot population vs. year?

useR Vignette: Reshaping Data in R                            Greater Boston useR Meeting, April 2011                                     Slide 5
...or too much of a good thing

I'm sure this seemed like a good idea at the time




useR Vignette: Reshaping Data in R   Greater Boston useR Meeting, April 2011   Slide 6
reshape2 package

 ●      Hadley Wickham's “reboot of the reshape
        package”
          ●      Like “plyr”, naming convention denotes output data
                 type: acast() → arrays, dcast() → data.frames
                    –     Beware conflict between reshape::melt() and
                          reshape2::melt()
 ●      Announced September 2010 on [R-pkgs]:
          ●      http://r.789695.n4.nabble.com/R-pkgs-reshape2-a-reboot-
 ●      Similar functions in Base R
          ●      utils::stack(), utils::unstack(), stats::reshape()
useR Vignette: Reshaping Data in R             Greater Boston useR Meeting, April 2011   Slide 7
melt() all those columns away

melt() treats column names as a variable as it
collapses data into long format:
> mtop = melt(top, id.vars='state', variable.name='year', value.name='population')

> head(mtop)
         state year population
1   California 1910     2377549
2        Texas 1910     3896542
3     New York 1910     9113614
4      Florida 1910      752619
5     Illinois 1910     5638591
6 Pennsylvania 1910     7665111
> tail(mtop)
             state year population
105       Illinois 2010    12830632
106   Pennsylvania 2010    12702379
107           Ohio 2010    11536504
108       Michigan 2010     9883640
109        Georgia 2010     9687653
110 North Carolina 2010     9535483

Long, “molten” format (may be) easier for analysis,
plotting, database storage, etc.
useR Vignette: Reshaping Data in R    Greater Boston useR Meeting, April 2011   Slide 8
Obligatory graph

Molten form certainly
makes graphing easier in
ggplot2:

library(ggplot2)

ggplot(data=mtop, aes(group=state)) +
geom_line(aes(x=year, y=population,
color=state))




useR Vignette: Reshaping Data in R   Greater Boston useR Meeting, April 2011   Slide 9
“cast” functions put data back into wide form

 ●      Sometimes you really can put the toothpaste back
        into the tube
          ●      Use acast() to produce arrays/matrices, dcast() for
                 data.frames
          ●      Accepts formula notation
> dcast(mtop, state~year, value_var='population')
           state                1910     1920     1930     1940     1950         1960           1970           1980       1990       2000       2010
1     California             2377549 3426861 5677251 6907387 10586223        15717204       19953134       23667902   29760021   33871648   37253956
2        Florida              752619   968470 1468211 1897414 2771305         4951560        6789443        9746324   12937926   15982378   18801310
3        Georgia             2609121 2895832 2908506 3123723 3444578          3943116        4589575        5463105    6478216    8186453    9687653
4       Illinois             5638591 6485280 7630654 7897241 8712176         10081158       11113976       11426518   11430602   12419293   12830632
5       Michigan             2810173 3668412 4842325 5256106 6371766          7823194        8875083        9262078    9295297    9938444    9883640
6       New York             9113614 10385227 12588066 13479142 14830192     16782304       18236967       17558072   17990455   18976457   19378102
7 North Carolina             2206287 2559123 3170276 3571623 4061929          4556155        5082059        5881766    6628637    8049313    9535483
8           Ohio             4767121 5759394 6646697 6907612 7946627          9706397       10652017       10797630   10847115   11353140   11536504
9   Pennsylvania             7665111 8720017 9631350 9900180 10498012        11319366       11793909       11863895   11881643   12281054   12702379
10         Texas             3896542 4663228 5824715 6414824 7711194          9579677       11196730       14229191   16986510   20851820   25145561



useR Vignette: Reshaping Data in R                               Greater Boston useR Meeting, April 2011                                        Slide 10
Extra credit: *cast functions for BI

 ●      Disclaimer on http://had.co.nz/reshape/ warns not
        a “fully fledged OLAP solution”
          ●      But *cast() can replace table() for computing
                 frequency/contingency tables and crosstabs
          ●      Formula notation allows you to pick out specific
                 columns so wide data can look molten
 ●      Here's some (fake) consumer survey data:
> head(survey)
            ResponseID    sex         age favorite.airline
1 R_51JpA6GecA0SRcU    Female 36-49 years   Virgin America
2 R_0N77P8pZyPnjctm      Male 50-65 years        Southwest
3 R_eFoGuGRSuzqnHgM      Male 50-65 years        Southwest
4 R_9KXgybRXPiDG3LS      Male 36-49 years      Continental
5 R_cCH0fRZmc0zwGkk      Male 50-65 years Delta/Northwest
6 R_ba8ujmCV5OnBNxW      Male 36-49 years        Southwest
[...]

useR Vignette: Reshaping Data in R      Greater Boston useR Meeting, April 2011   Slide 11
Crosstabs with dcast()
> dcast(survey, favorite.airline~sex, value_var='favorite.airline', fun.aggregate=length)
     favorite.airline Male Female
1            Air Tran    7      2
2     Alaska Airlines   17      3
3           Allegiant    0      1
4            American   20      9
5         Continental   31      8
6     Delta/Northwest   36      6
7    Frontier/Midwest    4      3
8             JetBlue   41     29
9           Southwest 100      40
10             Spirit    1      0
11             United   23     10
12         US Airways    2      4
13     Virgin America   30     18

> dcast(survey, favorite.airline~age, value_var='favorite.airline', fun.aggregate=length)
   favorite.airline 18-24 years 25-35 years 36-49 years 50-65 years 66+ years
1          Air Tran           0           1           3           4         1
2   Alaska Airlines           0           0           5          11         4
3         Allegiant           0           0           1           0         0
4          American           1           4           7          14         3
5       Continental           0           3          15          17         4
6   Delta/Northwest           0           6          16          20         0
7 Frontier/Midwest            0           1           3           3         0
8           JetBlue           2          12          19          35         2
9         Southwest           0          20          44          66        10
10           Spirit           0           0           0           1         0
11           United           2           4          13          14         0
12       US Airways           0           0           3           3         0
13   Virgin America           0           8          23          13         4


useR Vignette: Reshaping Data in R             Greater Boston useR Meeting, April 2011   Slide 12
Further reading
 ●     reshape2 package on CRAN
         ●     http://cran.r-project.org/web/packages/reshape2/
 ●     Hadley's github (bleeding edge)
         ●     https://github.com/hadley/reshape
 ●     Decision Stats: “Using Reshape2 for transposing datasets
       in R”
         ●     http://decisionstats.com/2010/11/06/using-reshape2-for-transposin
 ●     Recology: “Good riddance to Excel pivot tables”
         ●     http://r-ecology.blogspot.com/2011/01/good-riddance-to-excel-pivo
 ●     Stack Overflow discussions: “[r] reshape2”
         ●     http://stackoverflow.com/search?tab=votes&q=[r]%20reshape2
useR Vignette: Reshaping Data in R      Greater Boston useR Meeting, April 2011   Slide 13

More Related Content

Similar to Reshaping Data in R

Population by State: Census 2010 Data
Population by State: Census 2010 DataPopulation by State: Census 2010 Data
Population by State: Census 2010 DataKristen Carney
 
Matemática financeira 6 tabelas financeiras de alto nível - by prof. shibat...
Matemática financeira   6 tabelas financeiras de alto nível - by prof. shibat...Matemática financeira   6 tabelas financeiras de alto nível - by prof. shibat...
Matemática financeira 6 tabelas financeiras de alto nível - by prof. shibat...prof. Renan Viana
 
Stats Final-Traffic Fatalities
Stats Final-Traffic FatalitiesStats Final-Traffic Fatalities
Stats Final-Traffic FatalitiesBetty Trinh
 
Ppt presentation
Ppt presentationPpt presentation
Ppt presentationvishnugud
 
Market Indicators - General Overview - December 2014
Market Indicators - General Overview - December 2014Market Indicators - General Overview - December 2014
Market Indicators - General Overview - December 2014MLSListings Inc
 
June 2015 - General Overview
June 2015 - General Overview June 2015 - General Overview
June 2015 - General Overview MLSListings Inc
 
Energy Analysis - Growth Rate 1984 - 2000 - 2019
Energy Analysis - Growth Rate 1984 - 2000 - 2019Energy Analysis - Growth Rate 1984 - 2000 - 2019
Energy Analysis - Growth Rate 1984 - 2000 - 2019Calvin Korponai
 
July 2015 - Market Snapshot - General Overview
July 2015  - Market Snapshot - General OverviewJuly 2015  - Market Snapshot - General Overview
July 2015 - Market Snapshot - General OverviewMLSListings Inc
 
September 2015 - Market Snapshot - General Overvew
September 2015 - Market Snapshot - General OvervewSeptember 2015 - Market Snapshot - General Overvew
September 2015 - Market Snapshot - General OvervewMLSListings Inc
 
August 2015 - Market snapshot - General Overview
August 2015 - Market snapshot - General OverviewAugust 2015 - Market snapshot - General Overview
August 2015 - Market snapshot - General OverviewMLSListings Inc
 
December 2015 - Market Snapshot - General Overview
December 2015 - Market Snapshot - General OverviewDecember 2015 - Market Snapshot - General Overview
December 2015 - Market Snapshot - General OverviewMLSListings Inc
 
New Frontiers in the Political Economy of Minerals & Hydrocarbons: Section III
New Frontiers in the Political Economy of Minerals & Hydrocarbons: Section IIINew Frontiers in the Political Economy of Minerals & Hydrocarbons: Section III
New Frontiers in the Political Economy of Minerals & Hydrocarbons: Section IIIGraciela Chichilnisky
 
NYC Foreign Population by Decade, Country - Seventh Revision
NYC Foreign Population by Decade, Country - Seventh RevisionNYC Foreign Population by Decade, Country - Seventh Revision
NYC Foreign Population by Decade, Country - Seventh RevisionZach Sanders
 
United States Total Area Comparisons (Square miles)
United States Total Area Comparisons (Square miles)United States Total Area Comparisons (Square miles)
United States Total Area Comparisons (Square miles)PhactoidPhil
 
May 2015 - Market Snapshot - General Overview
May 2015 - Market Snapshot - General OverviewMay 2015 - Market Snapshot - General Overview
May 2015 - Market Snapshot - General OverviewMLSListings Inc
 

Similar to Reshaping Data in R (20)

Population by State: Census 2010 Data
Population by State: Census 2010 DataPopulation by State: Census 2010 Data
Population by State: Census 2010 Data
 
Matemática financeira 6 tabelas financeiras de alto nível - by prof. shibat...
Matemática financeira   6 tabelas financeiras de alto nível - by prof. shibat...Matemática financeira   6 tabelas financeiras de alto nível - by prof. shibat...
Matemática financeira 6 tabelas financeiras de alto nível - by prof. shibat...
 
Stats Final-Traffic Fatalities
Stats Final-Traffic FatalitiesStats Final-Traffic Fatalities
Stats Final-Traffic Fatalities
 
Ppt presentation
Ppt presentationPpt presentation
Ppt presentation
 
Market Indicators - General Overview - December 2014
Market Indicators - General Overview - December 2014Market Indicators - General Overview - December 2014
Market Indicators - General Overview - December 2014
 
June 2015 - General Overview
June 2015 - General Overview June 2015 - General Overview
June 2015 - General Overview
 
Energy Analysis - Growth Rate 1984 - 2000 - 2019
Energy Analysis - Growth Rate 1984 - 2000 - 2019Energy Analysis - Growth Rate 1984 - 2000 - 2019
Energy Analysis - Growth Rate 1984 - 2000 - 2019
 
bin4tsv
bin4tsvbin4tsv
bin4tsv
 
July 2015 - Market Snapshot - General Overview
July 2015  - Market Snapshot - General OverviewJuly 2015  - Market Snapshot - General Overview
July 2015 - Market Snapshot - General Overview
 
September 2015 - Market Snapshot - General Overvew
September 2015 - Market Snapshot - General OvervewSeptember 2015 - Market Snapshot - General Overvew
September 2015 - Market Snapshot - General Overvew
 
August 2015 - Market snapshot - General Overview
August 2015 - Market snapshot - General OverviewAugust 2015 - Market snapshot - General Overview
August 2015 - Market snapshot - General Overview
 
CRONY GROUP
CRONY GROUPCRONY GROUP
CRONY GROUP
 
Emi chart
Emi chartEmi chart
Emi chart
 
December 2015 - Market Snapshot - General Overview
December 2015 - Market Snapshot - General OverviewDecember 2015 - Market Snapshot - General Overview
December 2015 - Market Snapshot - General Overview
 
New Frontiers in the Political Economy of Minerals & Hydrocarbons: Section III
New Frontiers in the Political Economy of Minerals & Hydrocarbons: Section IIINew Frontiers in the Political Economy of Minerals & Hydrocarbons: Section III
New Frontiers in the Political Economy of Minerals & Hydrocarbons: Section III
 
CAR Email 5.16.03
CAR Email 5.16.03CAR Email 5.16.03
CAR Email 5.16.03
 
CAR Email 5.16.03 (a)
CAR Email 5.16.03 (a)CAR Email 5.16.03 (a)
CAR Email 5.16.03 (a)
 
NYC Foreign Population by Decade, Country - Seventh Revision
NYC Foreign Population by Decade, Country - Seventh RevisionNYC Foreign Population by Decade, Country - Seventh Revision
NYC Foreign Population by Decade, Country - Seventh Revision
 
United States Total Area Comparisons (Square miles)
United States Total Area Comparisons (Square miles)United States Total Area Comparisons (Square miles)
United States Total Area Comparisons (Square miles)
 
May 2015 - Market Snapshot - General Overview
May 2015 - Market Snapshot - General OverviewMay 2015 - Market Snapshot - General Overview
May 2015 - Market Snapshot - General Overview
 

More from Jeffrey Breen

Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with RJeffrey Breen
 
Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & HadoopJeffrey Breen
 
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Jeffrey Breen
 
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Jeffrey Breen
 
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2Jeffrey Breen
 
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMBig Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMJeffrey Breen
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeJeffrey Breen
 
R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesJeffrey Breen
 
Accessing Databases from R
Accessing Databases from RAccessing Databases from R
Accessing Databases from RJeffrey Breen
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in RJeffrey Breen
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterJeffrey Breen
 
FAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewFAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewJeffrey Breen
 

More from Jeffrey Breen (12)

Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with R
 
Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & Hadoop
 
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
 
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
 
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
 
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMBig Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VM
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R code
 
R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlines
 
Accessing Databases from R
Accessing Databases from RAccessing Databases from R
Accessing Databases from R
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in R
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
 
FAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewFAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overview
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Reshaping Data in R

  • 1. useR Vignette: Reshaping Data in R Greater Boston useR Group April 6, 2011 by Jeffrey Breen jbreen@cambridge.aero Photo from http://www.ibm-1401.info/
  • 2. Outline ● Sample data from 2010 U.S. Census ● Starts “wide”, like a spreadsheet ● reshape2 package ● “melt” to make long ● “*cast” to go back to wide ● Extra credit fun with “dcast” ● Further reading useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 2
  • 3. Sample data: U.S. Census 2010 Read the data: > pop = read.csv('http://2010.census.gov/2010census/data/pop_density.csv', skip=3) Just keep the first few columns (total state populations by year): > pop = pop[,1:12] Clean up column names: > colnames(pop) [1] "STATE_OR_REGION" "X1910_POPULATION" "X1920_POPULATION" "X1930_POPULATION" "X1940_POPULATION" "X1950_POPULATION" [7] "X1960_POPULATION" "X1970_POPULATION" "X1980_POPULATION" "X1990_POPULATION" "X2000_POPULATION" "X2010_POPULATION" > colnames(pop) = c('state', seq(1910, 2010, 10)) > colnames(pop) [1] "state" "1910" "1920" "1930" "1940"  "1950"  "1960"  "1970"  "1980"  "1990"  "2000"  "2010" useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 3
  • 4. Data set is “wide” like a spreadsheet > head(pop,40) state 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 1 United States 92228531 106021568 123202660 132165129 151325798 179323175 203211926 226545805 248709873 281421906 308745538 2 Alabama 2138093 2348174 2646248 2832961 3061743 3266740 3444165 3893888 4040587 4447100 4779736 3 Alaska 64356 55036 59278 72524 128643 226167 300382 401851 550043 626932 710231 4 Arizona 204354 334162 435573 499261 749587 1302161 1770900 2718215 3665228 5130632 6392017 5 Arkansas 1574449 1752204 1854482 1949387 1909511 1786272 1923295 2286435 2350725 2673400 2915918 6 California 2377549 3426861 5677251 6907387 10586223 15717204 19953134 23667902 29760021 33871648 37253956 7 Colorado 799024 939629 1035791 1123296 1325089 1753947 2207259 2889964 3294394 4301261 5029196 8 Connecticut 1114756 1380631 1606903 1709242 2007280 2535234 3031709 3107576 3287116 3405565 3574097 9 Delaware 202322 223003 238380 266505 318085 446292 548104 594338 666168 783600 897934 10 District of Columbia 331069 437571 486869 663091 802178 763956 756510 638333 606900 572059 601723 11 Florida 752619 968470 1468211 1897414 2771305 4951560 6789443 9746324 12937926 15982378 18801310 12 Georgia 2609121 2895832 2908506 3123723 3444578 3943116 4589575 5463105 6478216 8186453 9687653 13 Hawaii 191909 255912 368336 423330 499794 632772 768561 964691 1108229 1211537 1360301 14 Idaho 325594 431866 445032 524873 588637 667191 712567 943935 1006749 1293953 1567582 15 Illinois 5638591 6485280 7630654 7897241 8712176 10081158 11113976 11426518 11430602 12419293 12830632 16 Indiana 2700876 2930390 3238503 3427796 3934224 4662498 5193669 5490224 5544159 6080485 6483802 17 Iowa 2224771 2404021 2470939 2538268 2621073 2757537 2824376 2913808 2776755 2926324 3046355 18 Kansas 1690949 1769257 1880999 1801028 1905299 2178611 2246578 2363679 2477574 2688418 2853118 19 Kentucky 2289905 2416630 2614589 2845627 2944806 3038156 3218706 3660777 3685296 4041769 4339367 20 Louisiana 1656388 1798509 2101593 2363880 2683516 3257022 3641306 4205900 4219973 4468976 4533372 21 Maine 742371 768014 797423 847226 913774 969265 992048 1124660 1227928 1274923 1328361 22 Maryland 1295346 1449661 1631526 1821244 2343001 3100689 3922399 4216975 4781468 5296486 5773552 23 Massachusetts 3366416 3852356 4249614 4316721 4690514 5148578 5689170 5737037 6016425 6349097 6547629 24 Michigan 2810173 3668412 4842325 5256106 6371766 7823194 8875083 9262078 9295297 9938444 9883640 25 Minnesota 2075708 2387125 2563953 2792300 2982483 3413864 3804971 4075970 4375099 4919479 5303925 26 Mississippi 1797114 1790618 2009821 2183796 2178914 2178141 2216912 2520638 2573216 2844658 2967297 27 Missouri 3293335 3404055 3629367 3784664 3954653 4319813 4676501 4916686 5117073 5595211 5988927 28 Montana 376053 548889 537606 559456 591024 674767 694409 786690 799065 902195 989415 29 Nebraska 1192214 1296372 1377963 1315834 1325510 1411330 1483493 1569825 1578385 1711263 1826341 30 Nevada 81875 77407 91058 110247 160083 285278 488738 800493 1201833 1998257 2700551 31 New Hampshire 430572 443083 465293 491524 533242 606921 737681 920610 1109252 1235786 1316470 32 New Jersey 2537167 3155900 4041334 4160165 4835329 6066782 7168164 7364823 7730188 8414350 8791894 33 New Mexico 327301 360350 423317 531818 681187 951023 1016000 1302894 1515069 1819046 2059179 34 New York 9113614 10385227 12588066 13479142 14830192 16782304 18236967 17558072 17990455 18976457 19378102 35 North Carolina 2206287 2559123 3170276 3571623 4061929 4556155 5082059 5881766 6628637 8049313 9535483 36 North Dakota 577056 646872 680845 641935 619636 632446 617761 652717 638800 642200 672591 37 Ohio 4767121 5759394 6646697 6907612 7946627 9706397 10652017 10797630 10847115 11353140 11536504 38 Oklahoma 1657155 2028283 2396040 2336434 2233351 2328284 2559229 3025290 3145585 3450654 3751351 39 Oregon 672765 783389 953786 1089684 1521341 1768687 2091385 2633105 2842321 3421399 3831074 40 Pennsylvania 7665111 8720017 9631350 9900180 10498012 11319366 11793909 11863895 11881643 12281054 12702379 useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 4
  • 5. ...and wide can be convenient ● Wide format can be useful – just like spreadsheets ● Easy to read, comprehend ● Can sort whole data set by one column: > library(doBy) > top = orderBy(~-2010, pop) > top = subset(top, state!='United States') > top = head(top, 10) > top$state = factor(top$state) > head(top) state 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 6 California 2377549 3426861 5677251 6907387 10586223 15717204 19953134 23667902 29760021 33871648 37253956 45 Texas 3896542 4663228 5824715 6414824 7711194 9579677 11196730 14229191 16986510 20851820 25145561 34 New York 9113614 10385227 12588066 13479142 14830192 16782304 18236967 17558072 17990455 18976457 19378102 11 Florida 752619 968470 1468211 1897414 2771305 4951560 6789443 9746324 12937926 15982378 18801310 15 Illinois 5638591 6485280 7630654 7897241 8712176 10081158 11113976 11426518 11430602 12419293 12830632 40 Pennsylvania 7665111 8720017 9631350 9900180 10498012 11319366 11793909 11863895 11881643 12281054 12702379 ● But how would you plot population vs. year? useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 5
  • 6. ...or too much of a good thing I'm sure this seemed like a good idea at the time useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 6
  • 7. reshape2 package ● Hadley Wickham's “reboot of the reshape package” ● Like “plyr”, naming convention denotes output data type: acast() → arrays, dcast() → data.frames – Beware conflict between reshape::melt() and reshape2::melt() ● Announced September 2010 on [R-pkgs]: ● http://r.789695.n4.nabble.com/R-pkgs-reshape2-a-reboot- ● Similar functions in Base R ● utils::stack(), utils::unstack(), stats::reshape() useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 7
  • 8. melt() all those columns away melt() treats column names as a variable as it collapses data into long format: > mtop = melt(top, id.vars='state', variable.name='year', value.name='population') > head(mtop) state year population 1 California 1910 2377549 2 Texas 1910 3896542 3 New York 1910 9113614 4 Florida 1910 752619 5 Illinois 1910 5638591 6 Pennsylvania 1910 7665111 > tail(mtop) state year population 105 Illinois 2010 12830632 106 Pennsylvania 2010 12702379 107 Ohio 2010 11536504 108 Michigan 2010 9883640 109 Georgia 2010 9687653 110 North Carolina 2010 9535483 Long, “molten” format (may be) easier for analysis, plotting, database storage, etc. useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 8
  • 9. Obligatory graph Molten form certainly makes graphing easier in ggplot2: library(ggplot2) ggplot(data=mtop, aes(group=state)) + geom_line(aes(x=year, y=population, color=state)) useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 9
  • 10. “cast” functions put data back into wide form ● Sometimes you really can put the toothpaste back into the tube ● Use acast() to produce arrays/matrices, dcast() for data.frames ● Accepts formula notation > dcast(mtop, state~year, value_var='population') state 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 1 California 2377549 3426861 5677251 6907387 10586223 15717204 19953134 23667902 29760021 33871648 37253956 2 Florida 752619 968470 1468211 1897414 2771305 4951560 6789443 9746324 12937926 15982378 18801310 3 Georgia 2609121 2895832 2908506 3123723 3444578 3943116 4589575 5463105 6478216 8186453 9687653 4 Illinois 5638591 6485280 7630654 7897241 8712176 10081158 11113976 11426518 11430602 12419293 12830632 5 Michigan 2810173 3668412 4842325 5256106 6371766 7823194 8875083 9262078 9295297 9938444 9883640 6 New York 9113614 10385227 12588066 13479142 14830192 16782304 18236967 17558072 17990455 18976457 19378102 7 North Carolina 2206287 2559123 3170276 3571623 4061929 4556155 5082059 5881766 6628637 8049313 9535483 8 Ohio 4767121 5759394 6646697 6907612 7946627 9706397 10652017 10797630 10847115 11353140 11536504 9 Pennsylvania 7665111 8720017 9631350 9900180 10498012 11319366 11793909 11863895 11881643 12281054 12702379 10 Texas 3896542 4663228 5824715 6414824 7711194 9579677 11196730 14229191 16986510 20851820 25145561 useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 10
  • 11. Extra credit: *cast functions for BI ● Disclaimer on http://had.co.nz/reshape/ warns not a “fully fledged OLAP solution” ● But *cast() can replace table() for computing frequency/contingency tables and crosstabs ● Formula notation allows you to pick out specific columns so wide data can look molten ● Here's some (fake) consumer survey data: > head(survey) ResponseID sex age favorite.airline 1 R_51JpA6GecA0SRcU Female 36-49 years Virgin America 2 R_0N77P8pZyPnjctm Male 50-65 years Southwest 3 R_eFoGuGRSuzqnHgM Male 50-65 years Southwest 4 R_9KXgybRXPiDG3LS Male 36-49 years Continental 5 R_cCH0fRZmc0zwGkk Male 50-65 years Delta/Northwest 6 R_ba8ujmCV5OnBNxW Male 36-49 years Southwest [...] useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 11
  • 12. Crosstabs with dcast() > dcast(survey, favorite.airline~sex, value_var='favorite.airline', fun.aggregate=length) favorite.airline Male Female 1 Air Tran 7 2 2 Alaska Airlines 17 3 3 Allegiant 0 1 4 American 20 9 5 Continental 31 8 6 Delta/Northwest 36 6 7 Frontier/Midwest 4 3 8 JetBlue 41 29 9 Southwest 100 40 10 Spirit 1 0 11 United 23 10 12 US Airways 2 4 13 Virgin America 30 18 > dcast(survey, favorite.airline~age, value_var='favorite.airline', fun.aggregate=length) favorite.airline 18-24 years 25-35 years 36-49 years 50-65 years 66+ years 1 Air Tran 0 1 3 4 1 2 Alaska Airlines 0 0 5 11 4 3 Allegiant 0 0 1 0 0 4 American 1 4 7 14 3 5 Continental 0 3 15 17 4 6 Delta/Northwest 0 6 16 20 0 7 Frontier/Midwest 0 1 3 3 0 8 JetBlue 2 12 19 35 2 9 Southwest 0 20 44 66 10 10 Spirit 0 0 0 1 0 11 United 2 4 13 14 0 12 US Airways 0 0 3 3 0 13 Virgin America 0 8 23 13 4 useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 12
  • 13. Further reading ● reshape2 package on CRAN ● http://cran.r-project.org/web/packages/reshape2/ ● Hadley's github (bleeding edge) ● https://github.com/hadley/reshape ● Decision Stats: “Using Reshape2 for transposing datasets in R” ● http://decisionstats.com/2010/11/06/using-reshape2-for-transposin ● Recology: “Good riddance to Excel pivot tables” ● http://r-ecology.blogspot.com/2011/01/good-riddance-to-excel-pivo ● Stack Overflow discussions: “[r] reshape2” ● http://stackoverflow.com/search?tab=votes&q=[r]%20reshape2 useR Vignette: Reshaping Data in R Greater Boston useR Meeting, April 2011 Slide 13