2. dplyr provides a flexible grammar of data manipulation. It is the next iteration
of plyr, focused on tools for working with data frames (hence the d in the
name).
It has three main goals:
Identify the most important data manipulation verbs and make them easy to
use from R.
Provide blazing fast performance for in-memory data by writing key pieces
in C++ (using Rcpp).
Use the same interface to work with data no matter where it is stored,
whether in a data frame, a data table or a database.
dplyr: a grammar of data manipulation
Rupak Roy
3. >install.packages("dplyr")
>library(dplyr)
#converting the variables to factors
>mtcars$cyl<- as.factor(mtcars$cyl);
>mtcars$am<-as.factor(mtcars$am);
>str(mtcars);
#using OR with dplyr (mtcars only has 4, 6 and 8 cylinder cars)
>dmtcars<-filter(mtcars,cyl==6|cyl==8)
#using base R
>dmtcars<-mtcars[mtcars$cyl==6|mtcars$cyl==8,]
>View(dmtcars)
#using AND with dplyr (on two columns, since no car has both 6 and 4 cylinders)
>dmtcars<-filter(mtcars,cyl==6 & am==1)
#using base R
>dmtcars<-mtcars[mtcars$cyl==6 & mtcars$am==1, ]
>View(dmtcars)
Subsetting: rows
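As a minimal sketch of the same row subsetting, the block below uses two conveniences that dplyr's filter() supports: %in% for "any of these values" and comma-separated conditions, which are combined with AND. The names cars, six_or_eight and six_manual are illustrative, and a fresh copy of the data set is used so the result does not depend on the factor conversion above.

```r
library(dplyr)

# Fresh copy of the built-in data, independent of earlier changes to mtcars.
cars <- datasets::mtcars

# OR on a single column: %in% keeps rows whose cyl is any of the listed values.
six_or_eight <- filter(cars, cyl %in% c(6, 8))

# AND across columns: conditions separated by commas must all be TRUE.
six_manual <- filter(cars, cyl == 6, am == 1)

nrow(six_or_eight)
nrow(six_manual)
```

The base R equivalents are cars[cars$cyl %in% c(6, 8), ] and cars[cars$cyl == 6 & cars$am == 1, ].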
4. #using dplyr
>mtcars_col1<-select(mtcars, mpg, cyl, disp)
>View(mtcars_col1)
#using base R
>mtcars_col1<-mtcars[ , c("mpg", "cyl", "disp")]
#adding new columns using dplyr::mutate(); columns can be referred to by name
>mtcars<-mutate(mtcars, newcol1= ifelse(mpg<=15,"luxury",
ifelse(mpg<= 20,"sports","economy")) )
#using base R, where "newcol1" is the new column
>mtcars$newcol1<-ifelse(mtcars$mpg<=15,"luxury",ifelse (mtcars$mpg<=
20,"sports","economy"))
>View(mtcars)
>mtcars<-select(mtcars, -newcol1) #to delete a column
Subsetting: columns
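The nested ifelse() used for the new column can get hard to read as categories grow. One possible alternative, sketched here under the assumption that dplyr's case_when() is acceptable in place of ifelse(), checks the conditions top to bottom and uses the first one that matches. The column name category and the data frame name cars are illustrative.

```r
library(dplyr)

# Fresh copy of the built-in data, independent of earlier changes to mtcars.
cars <- datasets::mtcars

# case_when() evaluates the conditions in order; TRUE acts as the fallback.
cars <- mutate(cars, category = case_when(
  mpg <= 15 ~ "luxury",
  mpg <= 20 ~ "sports",
  TRUE      ~ "economy"
))

# select() also accepts a range of adjacent columns (mpg:disp) and
# a minus sign to drop columns.
head(select(cars, mpg:disp, category))
cars_without <- select(cars, -category)
```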
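5. #arrange using dplyr
>mtcars<-arrange(mtcars,cyl) #ascending order
>mtcars<-arrange(mtcars, desc(cyl)) #descending order
#arrange using base R
>mtcars<-mtcars[order(mtcars$cyl), ]
#group (the result must be stored or passed on to have any effect)
>by_cyl<-group_by(mtcars, cyl)
#summarize the whole data frame
>summarize(mtcars, mean(mpg), sd(mpg))
#summarize each group
>summarize(by_cyl, mean(mpg), sd(mpg))
order() and group_by()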
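To make the link between grouping and summarizing concrete, here is a small sketch that stores the grouped data and summarizes it with named output columns. The names cars, by_cyl, stats, mean_mpg, sd_mpg and n_cars are illustrative; n() is dplyr's helper for counting the rows in each group.

```r
library(dplyr)

# Fresh copy of the built-in data, independent of earlier changes to mtcars.
cars <- datasets::mtcars

# group_by() only marks the groups; the data itself is unchanged.
by_cyl <- group_by(cars, cyl)

# summarize() then returns one row per group.
stats <- summarize(by_cyl,
                   mean_mpg = mean(mpg),
                   sd_mpg   = sd(mpg),
                   n_cars   = n())
stats
```

Naming the summaries (mean_mpg = ...) gives readable column names instead of the default `mean(mpg)`.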
6. Pipelines built with the %>% operator help to better organize code by
structuring sequences of data operations left-to-right, which is much
easier to read, write, and maintain.
dplyr originally provided the %.% operator, which is similar to %>%;
however, %.% has been deprecated and dplyr now recommends %>%,
which it imports from magrittr.
Differences between %.% (dplyr) and %>% (magrittr):
> magrittr is a much more lightweight package that exists only to
define the pipe operator.
> %>% minimizes the need for local variables and function definitions.
Pipelines: the %>% pipe operator
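A short sketch of what the pipe does: x %>% f(y) is evaluated as f(x, y), so a nested call and its piped form give the same result. The names cars, nested and piped are illustrative.

```r
library(dplyr)  # loading dplyr also makes %>% available

# Fresh copy of the built-in data, independent of earlier changes to mtcars.
cars <- datasets::mtcars

# Nested form: read from the inside out.
nested <- head(arrange(cars, desc(mpg)), 3)

# Piped form: read left to right, no intermediate variables needed.
piped <- cars %>% arrange(desc(mpg)) %>% head(3)

identical(nested, piped)
```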
7. #using base R to find the average mpg of cars with 4 cylinders
>mean(mtcars[mtcars$cyl=="4","mpg"])
Note: here we use "4" because cyl is now a factor, whose levels are stored as text; for a numeric column we would write ==4
#using dplyr
>summarize(filter(mtcars,cyl=="4"), mean(mpg))
#using pipe
>mtcars%>%filter(cyl=="4")%>%summarize(mean(mpg))
#categorize mtcars based on mpg in a new column
>mtcars%>%mutate(newcol2=ifelse(mpg<=15,"luxury",ifelse (mpg<=
20,"sports","economy")))
magrittr
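Putting the verbs together, the following sketch chains mutate(), group_by(), summarize() and arrange() in a single pipeline. The names cars, result, category, n_cars and mean_hp are illustrative, and the mpg thresholds are the same ones used for the new column above.

```r
library(dplyr)

# Fresh copy of the built-in data, independent of earlier changes to mtcars.
cars <- datasets::mtcars

result <- cars %>%
  # label each car by fuel economy
  mutate(category = ifelse(mpg <= 15, "luxury",
                    ifelse(mpg <= 20, "sports", "economy"))) %>%
  # one group per label
  group_by(category) %>%
  # count the cars and average the horsepower in each group
  summarize(n_cars = n(), mean_hp = mean(hp)) %>%
  # most powerful group first
  arrange(desc(mean_hp))

result
```

Each step receives the output of the previous one, so the code reads in the same order in which the operations happen.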
8. Next:
We will see how to manipulate data using dates
Manipulating Data