SlideShare a Scribd company logo
Group Cases
group_by(.data, ..., add = FALSE)
Returns copy of table grouped by …
g_iris <- group_by(iris, Species)
ungroup(x, ...)
Returns ungrouped copy of table.
ungroup(g_iris)
wwwwww
www
Use group_by() to created a "grouped" copy of a table. dplyr
functions will manipulate each "group" separately and then
combine the results.
mtcars %>%
group_by(cyl) %>%
summarise(avg = mean(mpg))
Summarise Cases
These apply summary functions to
columns to create a new table.
Summary functions take vectors as
input and return one value (see back).
summary
function
Variations
• summarise_all() - Apply funs to every column.
• summarise_at() - Apply funs to specific columns.
• summarise_if() - Apply funs to all cols of one type.
www
www
summarise(.data, …)
Compute table of summaries. Also
summarise_().
summarise(mtcars, avg = mean(mpg))
count(x, ..., wt = NULL, sort = FALSE)
Count number of rows in each group defined
by the variables in … Also tally().
count(iris, Species)
Manipulate Cases
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.5.0 • tibble 1.2.0 • Updated: 2017-01
Extract Cases
Add Cases
Arrange Cases
filter(.data, …)
Extract rows that meet logical criteria. Also
filter_(). filter(iris, Sepal.Length > 7)
distinct(.data, ..., .keep_all = FALSE)
Remove rows with duplicate values. Also
distinct_(). distinct(iris, Species)
sample_frac(tbl, size = 1, replace = FALSE,
weight = NULL, .env = parent.frame())
Randomly select fraction of rows.
sample_frac(iris, 0.5, replace = TRUE)
sample_n(tbl, size, replace = FALSE,
weight = NULL, .env = parent.frame())
Randomly select size rows.
sample_n(iris, 10, replace = TRUE)
slice(.data, …)
Select rows by position. Also slice_().
slice(iris, 10:15)
top_n(x, n, wt)
Select and order top n entries (by group if
grouped data). top_n(iris, 5, Sepal.Width)
Row functions return a subset of rows as a new table. Use a variant
that ends in _ for non-standard evaluation friendly code.
wwwwww
wwwwww
wwwwww
wwwwww
Logical and boolean operators to use with filter()
See ?base::logic and ?Comparison for help.
> >= !is.na() ! &
< <= is.na() %in% | xor()
wwwwww
arrange(.data, ...)
Order rows by values of a column (low to high),
use with desc() to order from high to low.
arrange(mtcars, mpg)
arrange(mtcars, desc(mpg))
wwwwww
add_row(.data, ..., .before = NULL,
.after = NULL)
Add one or more rows to a table.
add_row(faithful, eruptions = 1, waiting = 1)
Manipulate Variables
Extract Variables
Make New Variables
wwww
www ww
Column functions return a set of columns as a new table. Use a
variant that ends in _ for non-standard evaluation friendly code.
vectorized
function
These apply vectorized functions to
columns. Vectorized funs take vectors
as input and return vectors of the
same length as output (see back).
Data Transformation
with dplyr Cheat Sheet
wwwwww
www
wwww
mutate(.data, …)
Compute new column(s).
mutate(mtcars, gpm = 1/mpg)
transmute(.data, …)
Compute new column(s), drop others.
transmute(mtcars, gpm = 1/mpg)
mutate_all(.tbl, .funs, ...)
Apply funs to every column. Use with
funs(). mutate_all(faithful, funs(log(.),
log2(.)))
mutate_at(.tbl, .cols, .funs, ...)
Apply funs to specific columns. Use with
funs(), vars() and the helper functions for
select().
mutate_at(iris, vars( -Species), funs(log(.)))
mutate_if(.tbl, .predicate, .funs, ...)
Apply funs to all columns of one type. Use
with funs().
mutate_if(iris, is.numeric, funs(log(.)))
add_column(.data, ..., .before =
NULL, .after = NULL)
Add new column(s).
add_column(mtcars, new = 1:32)
rename(.data, …)
Rename columns.
rename(iris, Length = Sepal.Length)
w ww
Use these helpers with select(),
e.g. select(iris, starts_with("Sepal"))
contains(match)
ends_with(match)
matches(match)
:, e.g. mpg:cyl
-, e.g, -Species
Each observation, or
case, is in its own row
A B C
Each variable is
in its own column
A B C
&
dplyr functions work with pipes and expect tidy data. In tidy data:
pipes
x %>% f(y)
becomes f(x, y) num_range(prefix, range)
one_of(…)
starts_with(match)
select(.data, …)
Extract columns by name. Also select_if()
select(iris, Sepal.Length, Species)
wwwwww
C A B
1 a t
2 b u
3 c v
1 a t
2 b u
3 c v
C A B
A B C
a t 1
b u 2
c v 3
1 a t
2 b u
3 c v
C A B
A.x B.x C A.y B.y
a t 1 d w
b u 2 b u
c v 3 a t
a t 1 d w
b u 2 b u
c v 3 a t
A1 B1 C A2 B2
A B.x C B.y D
a t 1 t 3
b u 2 u 2
c v 3 NA NA
A B D
a t 3
b u 2
d w 1
A B C D
a t 1 3
b u 2 2
c v 3 NA
d w NA 1
A B C D
a t 1 3
b u 2 2
a t 1 3
b u 2 2
d w NA 1
A B C D
A B C D
a t 1 3
b u 2 2
c v 3 NA
A B C A B D
a t 1 a t 3
b u 2 b u 2
c v 3 d w 1
A B C
c v 3
A B C
a t 1
b u 2
A B C
a t 1
b u 2
a t 1
b u 2
A B C
c v 3
d w 4
A B C
c v 3
DF A B C
x a t 1
x b u 2
x c v 3
z c v 3
z d w 4
Counts
dplyr::n() - number of values/rows
dplyr::n_distinct() - # of uniques
sum(!is.na()) - # of non-NA’s
Location
mean() - mean, also mean(!is.na())
median() - median
Logicals
mean() - Proportion of TRUE’s
sum() - # of TRUE’s
Position/Order
dplyr::first() - first value
dplyr::last() - last value
dplyr::nth() - value in nth location of vector
Rank
quantile() - nth quantile
min() - minimum value
max() - maximum value
Spread
IQR() - Inter-Quartile Range
mad() - mean absolute deviation
sd() - standard deviation
var() - variance
Offsets
dplyr::lag() - Offset elements by 1
dplyr::lead() - Offset elements by -1
Cumulative Aggregates
dplyr::cumall() - Cumulative all()
dplyr::cumany() - Cumulative any()
cummax() - Cumulative max()
dplyr::cummean() - Cumulative mean()
cummin() - Cumulative min()
cumprod() - Cumulative prod()
cumsum() - Cumulative sum()
Rankings
dplyr::cume_dist() - Proportion of all values <=
dplyr::dense_rank() - rank with ties = min, no
gaps
dplyr::min_rank() - rank with ties = min
dplyr::ntile() - bins into n bins
dplyr::percent_rank() - min_rank scaled to [0,1]
dplyr::row_number() - rank with ties = "first"
Math
+, - , *, /, ^, %/%, %% - arithmetic ops
log(), log2(), log10() - logs
<, <=, >, >=, !=, == - logical comparisons
Misc
dplyr::between() - x >= left & x <= right
dplyr::case_when() - multi-case if_else()
dplyr::coalesce() - first non-NA values by
element across a set of vectors
dplyr::if_else() - element-wise if() + else()
dplyr::na_if() - replace specific values with NA
pmax() - element-wise max()
pmin() - element-wise min()
dplyr::recode() - Vectorized switch()
dplyr::recode_factor() - Vectorized switch() for
factors
Summary FunctionsVectorized Functions Combine Tables
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.5.0 • tibble 1.2.0 • Updated: 2017-01
Combine Variables
bind_cols(…)
Returns tables placed side by
side as a single table.
BE SURE THAT ROWS ALIGN.
left_join(x, y, by = NULL,
copy=FALSE, suffix=c(“.x”,“.y”),…)
Join matching values from y to x.
right_join(x, y, by = NULL, copy =
FALSE, suffix=c(“.x”,“.y”),…)
Join matching values from x to y.
inner_join(x, y, by = NULL, copy =
FALSE, suffix=c(“.x”,“.y”),…)
Join data. Retain only rows with
matches.
full_join(x, y, by = NULL,
copy=FALSE, suffix=c(“.x”,“.y”),…)
Join data. Retain all values, all
rows.
Use a "Mutating Join" to join one table to columns
from another, matching values with the rows that
they correspond to. Each join retains a different
combination of values from the tables.
Use by = c("col1", "col2") to
specify the column(s) to match
on.
left_join(x, y, by = "A")
Use a named vector, by =
c("col1" = "col2"), to match on
columns with different names in
each data set.
left_join(x, y, by = c("C" = "D"))
Use suffix to specify suffix to give
to duplicate column names.
left_join(x, y, by = c("C" = "D"),
suffix = c("1", "2"))
Use bind_cols() to paste tables beside each other
as they are.
A B C
a t 1
b u 2
c v 3
+ =
x y
A B D
a t 3
b u 2
d w 1
Combine Cases
A B C
a t 1
b u 2
Use bind_rows() to paste tables below each other as
they are.
bind_rows(…, .id = NULL)
Returns tables one on top of the other
as a single table. Set .id to a column
name to add a column of the original
table names (as pictured)
intersect(x, y, …)
Rows that appear in both x and z.
setdiff(x, y, …)
Rows that appear in x but not z.
union(x, y, …)
Rows that appear in x or z. (Duplicates
removed). union_all() retains
duplicates.
Extract Rows
Use a "Filtering Join" to filter one table against the
rows of another.
semi_join(x, y, by = NULL, …)
Return rows of x that have a match in y.
USEFUL TO SEE WHAT WILL BE JOINED.
anti_join(x, y, by = NULL, …)
Return rows of x that do not have a
match in y. USEFUL TO SEE WHAT WILL
NOT BE JOINED.
A B C
a t 1
b u 2
c v 3
+
x
z
A B C
c v 3
d w 4
Use setequal() to test whether two data sets contain
the exact same rows (in any order).
A B C
a t 1
b u 2
c v 3
+ =
x y
A B D
a t 3
b u 2
d w 1
to use with summarise()to use with mutate()
mutate() and transmute() apply vectorized
functions to columns to create new columns.
Vectorized functions take vectors as input and
return vectors of the same length as output.
vectorized
function
summarise() applies summary functions to
columns to create a new table. Summary
functions take vectors as input and return single
values as output.
summary
function
Row names
Tidy data does not use rownames, which store
a variable outside of the columns. To work with
the rownames, first move them into a column.
rownames_to_column()
Move row names into col.
a <- rownames_to_column(iris,
var = "C")
column_to_rownames()
Move col in row names.
column_to_rownames(a,
var = "C")
Also has_rownames(), remove_rownames()
a t 1
b u 2
A B C
a t 1
b u 2
c v 3
A B C A B C
a t 3
b u 2
d w 1
a t 1
b u 2
c v 3
A B CA B C
a t 1
b u 2
c v 3
+ =
x y
A B D
a t 3
b u 2
d w 1
A B C
a t 1
b u 2
c v 3 + =
x y
A B D
a t 3
b u 2
d w 1

More Related Content

What's hot

Fact less fact Tables & Aggregate Tables
Fact less fact Tables & Aggregate Tables Fact less fact Tables & Aggregate Tables
Fact less fact Tables & Aggregate Tables
Sunita Sahu
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Mahir Haque
 
Dimensional modelling-mod-3
Dimensional modelling-mod-3Dimensional modelling-mod-3
Dimensional modelling-mod-3Malik Alig
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
Eva Durall
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Difference between ER-Modeling and Dimensional Modeling
Difference between ER-Modeling and Dimensional ModelingDifference between ER-Modeling and Dimensional Modeling
Difference between ER-Modeling and Dimensional ModelingAbdul Aslam
 
Introduction to R for data science
Introduction to R for data scienceIntroduction to R for data science
Introduction to R for data science
Long Nguyen
 
Data Visualization With Tableau | Edureka
Data Visualization With Tableau | EdurekaData Visualization With Tableau | Edureka
Data Visualization With Tableau | Edureka
Edureka!
 
Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012
Empowered Holdings, LLC
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Introduction to Tableau
Introduction to Tableau Introduction to Tableau
Introduction to Tableau
Mithileysh Sathiyanarayanan
 
Tableau
TableauTableau
Tableau
Nilesh Patel
 
Data Wrangling
Data WranglingData Wrangling
Data Wrangling
Gramener
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
SSaudia
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
krishna singh
 
The Relational Model
The Relational ModelThe Relational Model
The Relational Model
Bhandari Nawaraj
 
Tableau interview questions www.bigclasses.com
Tableau interview questions www.bigclasses.comTableau interview questions www.bigclasses.com
Tableau interview questions www.bigclasses.com
bigclasses.com
 
Réalisation d'un mashup de données avec DSS de Dataiku et visualisation avec ...
Réalisation d'un mashup de données avec DSS de Dataiku et visualisation avec ...Réalisation d'un mashup de données avec DSS de Dataiku et visualisation avec ...
Réalisation d'un mashup de données avec DSS de Dataiku et visualisation avec ...
Gautier Poupeau
 

What's hot (20)

Fact less fact Tables & Aggregate Tables
Fact less fact Tables & Aggregate Tables Fact less fact Tables & Aggregate Tables
Fact less fact Tables & Aggregate Tables
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Dimensional modelling-mod-3
Dimensional modelling-mod-3Dimensional modelling-mod-3
Dimensional modelling-mod-3
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Difference between ER-Modeling and Dimensional Modeling
Difference between ER-Modeling and Dimensional ModelingDifference between ER-Modeling and Dimensional Modeling
Difference between ER-Modeling and Dimensional Modeling
 
Introduction to R for data science
Introduction to R for data scienceIntroduction to R for data science
Introduction to R for data science
 
Data Visualization With Tableau | Edureka
Data Visualization With Tableau | EdurekaData Visualization With Tableau | Edureka
Data Visualization With Tableau | Edureka
 
Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012
 
Eda sri
Eda sriEda sri
Eda sri
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Introduction to Tableau
Introduction to Tableau Introduction to Tableau
Introduction to Tableau
 
Tableau
TableauTableau
Tableau
 
Data Wrangling
Data WranglingData Wrangling
Data Wrangling
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
The Relational Model
The Relational ModelThe Relational Model
The Relational Model
 
Tableau interview questions www.bigclasses.com
Tableau interview questions www.bigclasses.comTableau interview questions www.bigclasses.com
Tableau interview questions www.bigclasses.com
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Réalisation d'un mashup de données avec DSS de Dataiku et visualisation avec ...
Réalisation d'un mashup de données avec DSS de Dataiku et visualisation avec ...Réalisation d'un mashup de données avec DSS de Dataiku et visualisation avec ...
Réalisation d'un mashup de données avec DSS de Dataiku et visualisation avec ...
 

Similar to Data transformation-cheatsheet

Commands list
Commands listCommands list
Commands list
PRAVEENKUMAR CHIKOTI
 
Pandas,scipy,numpy cheatsheet
Pandas,scipy,numpy cheatsheetPandas,scipy,numpy cheatsheet
Pandas,scipy,numpy cheatsheet
Dr. Volkan OBAN
 
tidyr.pdf
tidyr.pdftidyr.pdf
tidyr.pdf
Mateus S. Xavier
 
Matlab cheatsheet
Matlab cheatsheetMatlab cheatsheet
Matlab cheatsheetlokeshkumer
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
University of Salerno
 
R Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementR Cheat Sheet – Data Management
R Cheat Sheet – Data Management
Dr. Volkan OBAN
 
R gráfico
R gráficoR gráfico
R gráfico
stryper1968
 
Short Reference Card for R users.
Short Reference Card for R users.Short Reference Card for R users.
Short Reference Card for R users.
Dr. Volkan OBAN
 
Reference card for R
Reference card for RReference card for R
Reference card for R
Dr. Volkan OBAN
 
@ R reference
@ R reference@ R reference
@ R reference
vickyrolando
 
R command cheatsheet.pdf
R command cheatsheet.pdfR command cheatsheet.pdf
R command cheatsheet.pdf
Ngcnh947953
 
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdfMATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
Central university of Haryana
 
R Programming Reference Card
R Programming Reference CardR Programming Reference Card
R Programming Reference Card
Maurice Dawson
 
Data import-cheatsheet
Data import-cheatsheetData import-cheatsheet
Data import-cheatsheet
Dieudonne Nahigombeye
 
3 Data Structure in R
3 Data Structure in R3 Data Structure in R
3 Data Structure in R
Dr Nisha Arora
 
Stata Cheat Sheets (all)
Stata Cheat Sheets (all)Stata Cheat Sheets (all)
Stata Cheat Sheets (all)
Laura Hughes
 

Similar to Data transformation-cheatsheet (20)

Matlab tut3
Matlab tut3Matlab tut3
Matlab tut3
 
Commands list
Commands listCommands list
Commands list
 
Pandas,scipy,numpy cheatsheet
Pandas,scipy,numpy cheatsheetPandas,scipy,numpy cheatsheet
Pandas,scipy,numpy cheatsheet
 
tidyr.pdf
tidyr.pdftidyr.pdf
tidyr.pdf
 
Matlab cheatsheet
Matlab cheatsheetMatlab cheatsheet
Matlab cheatsheet
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
 
ML-CheatSheet (1).pdf
ML-CheatSheet (1).pdfML-CheatSheet (1).pdf
ML-CheatSheet (1).pdf
 
R Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementR Cheat Sheet – Data Management
R Cheat Sheet – Data Management
 
LMD UGR
LMD UGRLMD UGR
LMD UGR
 
R gráfico
R gráficoR gráfico
R gráfico
 
Short Reference Card for R users.
Short Reference Card for R users.Short Reference Card for R users.
Short Reference Card for R users.
 
Reference card for R
Reference card for RReference card for R
Reference card for R
 
@ R reference
@ R reference@ R reference
@ R reference
 
R command cheatsheet.pdf
R command cheatsheet.pdfR command cheatsheet.pdf
R command cheatsheet.pdf
 
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdfMATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R Programming Reference Card
R Programming Reference CardR Programming Reference Card
R Programming Reference Card
 
Data import-cheatsheet
Data import-cheatsheetData import-cheatsheet
Data import-cheatsheet
 
3 Data Structure in R
3 Data Structure in R3 Data Structure in R
3 Data Structure in R
 
Stata Cheat Sheets (all)
Stata Cheat Sheets (all)Stata Cheat Sheets (all)
Stata Cheat Sheets (all)
 

More from Dieudonne Nahigombeye

Rstudio ide-cheatsheet
Rstudio ide-cheatsheetRstudio ide-cheatsheet
Rstudio ide-cheatsheet
Dieudonne Nahigombeye
 
Rmarkdown cheatsheet-2.0
Rmarkdown cheatsheet-2.0Rmarkdown cheatsheet-2.0
Rmarkdown cheatsheet-2.0
Dieudonne Nahigombeye
 
Reg ex cheatsheet
Reg ex cheatsheetReg ex cheatsheet
Reg ex cheatsheet
Dieudonne Nahigombeye
 
How big-is-your-graph
How big-is-your-graphHow big-is-your-graph
How big-is-your-graph
Dieudonne Nahigombeye
 
Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1
Dieudonne Nahigombeye
 
Eurostat cheatsheet
Eurostat cheatsheetEurostat cheatsheet
Eurostat cheatsheet
Dieudonne Nahigombeye
 
Devtools cheatsheet
Devtools cheatsheetDevtools cheatsheet
Devtools cheatsheet
Dieudonne Nahigombeye
 
Base r
Base rBase r
Advanced r
Advanced rAdvanced r

More from Dieudonne Nahigombeye (10)

Sparklyr
SparklyrSparklyr
Sparklyr
 
Rstudio ide-cheatsheet
Rstudio ide-cheatsheetRstudio ide-cheatsheet
Rstudio ide-cheatsheet
 
Rmarkdown cheatsheet-2.0
Rmarkdown cheatsheet-2.0Rmarkdown cheatsheet-2.0
Rmarkdown cheatsheet-2.0
 
Reg ex cheatsheet
Reg ex cheatsheetReg ex cheatsheet
Reg ex cheatsheet
 
How big-is-your-graph
How big-is-your-graphHow big-is-your-graph
How big-is-your-graph
 
Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1
 
Eurostat cheatsheet
Eurostat cheatsheetEurostat cheatsheet
Eurostat cheatsheet
 
Devtools cheatsheet
Devtools cheatsheetDevtools cheatsheet
Devtools cheatsheet
 
Base r
Base rBase r
Base r
 
Advanced r
Advanced rAdvanced r
Advanced r
 

Recently uploaded

Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 

Recently uploaded (20)

Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 

Data transformation-cheatsheet

  • 1. Group Cases group_by(.data, ..., add = FALSE) Returns copy of table grouped by … g_iris <- group_by(iris, Species) ungroup(x, ...) Returns ungrouped copy of table. ungroup(g_iris) wwwwww www Use group_by() to created a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and then combine the results. mtcars %>% group_by(cyl) %>% summarise(avg = mean(mpg)) Summarise Cases These apply summary functions to columns to create a new table. Summary functions take vectors as input and return one value (see back). summary function Variations • summarise_all() - Apply funs to every column. • summarise_at() - Apply funs to specific columns. • summarise_if() - Apply funs to all cols of one type. www www summarise(.data, …) Compute table of summaries. Also summarise_(). summarise(mtcars, avg = mean(mpg)) count(x, ..., wt = NULL, sort = FALSE) Count number of rows in each group defined by the variables in … Also tally(). count(iris, Species) Manipulate Cases RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.5.0 • tibble 1.2.0 • Updated: 2017-01 Extract Cases Add Cases Arrange Cases filter(.data, …) Extract rows that meet logical criteria. Also filter_(). filter(iris, Sepal.Length > 7) distinct(.data, ..., .keep_all = FALSE) Remove rows with duplicate values. Also distinct_(). distinct(iris, Species) sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select fraction of rows. sample_frac(iris, 0.5, replace = TRUE) sample_n(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select size rows. sample_n(iris, 10, replace = TRUE) slice(.data, …) Select rows by position. Also slice_(). slice(iris, 10:15) top_n(x, n, wt) Select and order top n entries (by group if grouped data). top_n(iris, 5, Sepal.Width) Row functions return a subset of rows as a new table. Use a variant that ends in _ for non-standard evaluation friendly code. wwwwww wwwwww wwwwww wwwwww Logical and boolean operators to use with filter() See ?base::logic and ?Comparison for help. > >= !is.na() ! & < <= is.na() %in% | xor() wwwwww arrange(.data, ...) Order rows by values of a column (low to high), use with desc() to order from high to low. arrange(mtcars, mpg) arrange(mtcars, desc(mpg)) wwwwww add_row(.data, ..., .before = NULL, .after = NULL) Add one or more rows to a table. add_row(faithful, eruptions = 1, waiting = 1) Manipulate Variables Extract Variables Make New Variables wwww www ww Column functions return a set of columns as a new table. Use a variant that ends in _ for non-standard evaluation friendly code. vectorized function These apply vectorized functions to columns. Vectorized funs take vectors as input and return vectors of the same length as output (see back). Data Transformation with dplyr Cheat Sheet wwwwww www wwww mutate(.data, …) Compute new column(s). mutate(mtcars, gpm = 1/mpg) transmute(.data, …) Compute new column(s), drop others. transmute(mtcars, gpm = 1/mpg) mutate_all(.tbl, .funs, ...) Apply funs to every column. Use with funs(). mutate_all(faithful, funs(log(.), log2(.))) mutate_at(.tbl, .cols, .funs, ...) Apply funs to specific columns. Use with funs(), vars() and the helper functions for select(). mutate_at(iris, vars( -Species), funs(log(.))) mutate_if(.tbl, .predicate, .funs, ...) Apply funs to all columns of one type. Use with funs(). mutate_if(iris, is.numeric, funs(log(.))) add_column(.data, ..., .before = NULL, .after = NULL) Add new column(s). add_column(mtcars, new = 1:32) rename(.data, …) Rename columns. rename(iris, Length = Sepal.Length) w ww Use these helpers with select(), e.g. select(iris, starts_with("Sepal")) contains(match) ends_with(match) matches(match) :, e.g. mpg:cyl -, e.g, -Species Each observation, or case, is in its own row A B C Each variable is in its own column A B C & dplyr functions work with pipes and expect tidy data. In tidy data: pipes x %>% f(y) becomes f(x, y) num_range(prefix, range) one_of(…) starts_with(match) select(.data, …) Extract columns by name. Also select_if() select(iris, Sepal.Length, Species) wwwwww
  • 2. C A B 1 a t 2 b u 3 c v 1 a t 2 b u 3 c v C A B A B C a t 1 b u 2 c v 3 1 a t 2 b u 3 c v C A B A.x B.x C A.y B.y a t 1 d w b u 2 b u c v 3 a t a t 1 d w b u 2 b u c v 3 a t A1 B1 C A2 B2 A B.x C B.y D a t 1 t 3 b u 2 u 2 c v 3 NA NA A B D a t 3 b u 2 d w 1 A B C D a t 1 3 b u 2 2 c v 3 NA d w NA 1 A B C D a t 1 3 b u 2 2 a t 1 3 b u 2 2 d w NA 1 A B C D A B C D a t 1 3 b u 2 2 c v 3 NA A B C A B D a t 1 a t 3 b u 2 b u 2 c v 3 d w 1 A B C c v 3 A B C a t 1 b u 2 A B C a t 1 b u 2 a t 1 b u 2 A B C c v 3 d w 4 A B C c v 3 DF A B C x a t 1 x b u 2 x c v 3 z c v 3 z d w 4 Counts dplyr::n() - number of values/rows dplyr::n_distinct() - # of uniques sum(!is.na()) - # of non-NA’s Location mean() - mean, also mean(!is.na()) median() - median Logicals mean() - Proportion of TRUE’s sum() - # of TRUE’s Position/Order dplyr::first() - first value dplyr::last() - last value dplyr::nth() - value in nth location of vector Rank quantile() - nth quantile min() - minimum value max() - maximum value Spread IQR() - Inter-Quartile Range mad() - mean absolute deviation sd() - standard deviation var() - variance Offsets dplyr::lag() - Offset elements by 1 dplyr::lead() - Offset elements by -1 Cumulative Aggregates dplyr::cumall() - Cumulative all() dplyr::cumany() - Cumulative any() cummax() - Cumulative max() dplyr::cummean() - Cumulative mean() cummin() - Cumulative min() cumprod() - Cumulative prod() cumsum() - Cumulative sum() Rankings dplyr::cume_dist() - Proportion of all values <= dplyr::dense_rank() - rank with ties = min, no gaps dplyr::min_rank() - rank with ties = min dplyr::ntile() - bins into n bins dplyr::percent_rank() - min_rank scaled to [0,1] dplyr::row_number() - rank with ties = "first" Math +, - , *, /, ^, %/%, %% - arithmetic ops log(), log2(), log10() - logs <, <=, >, >=, !=, == - logical comparisons Misc dplyr::between() - x >= left & x <= right dplyr::case_when() - multi-case if_else() dplyr::coalesce() - first non-NA values by element across a set of vectors dplyr::if_else() - element-wise if() + else() dplyr::na_if() - replace specific values with NA pmax() - element-wise max() pmin() - element-wise min() dplyr::recode() - Vectorized switch() dplyr::recode_factor() - Vectorized switch() for factors Summary FunctionsVectorized Functions Combine Tables RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.5.0 • tibble 1.2.0 • Updated: 2017-01 Combine Variables bind_cols(…) Returns tables placed side by side as a single table. BE SURE THAT ROWS ALIGN. left_join(x, y, by = NULL, copy=FALSE, suffix=c(“.x”,“.y”),…) Join matching values from y to x. right_join(x, y, by = NULL, copy = FALSE, suffix=c(“.x”,“.y”),…) Join matching values from x to y. inner_join(x, y, by = NULL, copy = FALSE, suffix=c(“.x”,“.y”),…) Join data. Retain only rows with matches. full_join(x, y, by = NULL, copy=FALSE, suffix=c(“.x”,“.y”),…) Join data. Retain all values, all rows. Use a "Mutating Join" to join one table to columns from another, matching values with the rows that they correspond to. Each join retains a different combination of values from the tables. Use by = c("col1", "col2") to specify the column(s) to match on. left_join(x, y, by = "A") Use a named vector, by = c("col1" = "col2"), to match on columns with different names in each data set. left_join(x, y, by = c("C" = "D")) Use suffix to specify suffix to give to duplicate column names. left_join(x, y, by = c("C" = "D"), suffix = c("1", "2")) Use bind_cols() to paste tables beside each other as they are. A B C a t 1 b u 2 c v 3 + = x y A B D a t 3 b u 2 d w 1 Combine Cases A B C a t 1 b u 2 Use bind_rows() to paste tables below each other as they are. bind_rows(…, .id = NULL) Returns tables one on top of the other as a single table. Set .id to a column name to add a column of the original table names (as pictured) intersect(x, y, …) Rows that appear in both x and z. setdiff(x, y, …) Rows that appear in x but not z. union(x, y, …) Rows that appear in x or z. (Duplicates removed). union_all() retains duplicates. Extract Rows Use a "Filtering Join" to filter one table against the rows of another. semi_join(x, y, by = NULL, …) Return rows of x that have a match in y. USEFUL TO SEE WHAT WILL BE JOINED. anti_join(x, y, by = NULL, …) Return rows of x that do not have a match in y. USEFUL TO SEE WHAT WILL NOT BE JOINED. A B C a t 1 b u 2 c v 3 + x z A B C c v 3 d w 4 Use setequal() to test whether two data sets contain the exact same rows (in any order). A B C a t 1 b u 2 c v 3 + = x y A B D a t 3 b u 2 d w 1 to use with summarise()to use with mutate() mutate() and transmute() apply vectorized functions to columns to create new columns. Vectorized functions take vectors as input and return vectors of the same length as output. vectorized function summarise() applies summary functions to columns to create a new table. Summary functions take vectors as input and return single values as output. summary function Row names Tidy data does not use rownames, which store a variable outside of the columns. To work with the rownames, first move them into a column. rownames_to_column() Move row names into col. a <- rownames_to_column(iris, var = "C") column_to_rownames() Move col in row names. column_to_rownames(a, var = "C") Also has_rownames(), remove_rownames() a t 1 b u 2 A B C a t 1 b u 2 c v 3 A B C A B C a t 3 b u 2 d w 1 a t 1 b u 2 c v 3 A B CA B C a t 1 b u 2 c v 3 + = x y A B D a t 3 b u 2 d w 1 A B C a t 1 b u 2 c v 3 + = x y A B D a t 3 b u 2 d w 1