Manipulating Data
Using dplyr
Rupak Roy
dplyr provides a flexible grammar of data manipulation. It is the next iteration
of plyr, focused on tools for working with data frames (hence the d in the
name).
It has three main goals:
• Identify the most important data manipulation verbs and make them easy to
use from R.
• Provide fast performance for in-memory data, including large data sets, by
writing key pieces in C++ (using Rcpp).
• Use the same interface to work with data no matter where it is stored,
whether in a data frame, a data table or a database.
dplyr: a grammar of data manipulation
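As a rough illustration of the "verbs" mentioned above, the sketch below calls each of the main dplyr verbs once on the built-in mtcars data. The object names (cars, four_cyl, two_cols and so on) and the hp_per_cyl column are made up for this example and are not part of the original slides.

```r
library(dplyr)

# Work on a copy so the built-in mtcars data set is left untouched
cars <- mtcars

# Each verb takes a data frame as its first argument and returns a data frame
four_cyl <- filter(cars, cyl == 4)                # keep matching rows
two_cols <- select(cars, mpg, cyl)                # keep named columns
with_new <- mutate(cars, hp_per_cyl = hp / cyl)   # add a derived column
by_mpg   <- arrange(cars, desc(mpg))              # reorder rows
overall  <- summarize(cars, avg_mpg = mean(mpg))  # collapse to a single row
```

Because every verb returns a data frame, the result of one call can be passed straight into the next.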
>install.packages("dplyr")   # the package name must be quoted
>library(dplyr)
# converting the variables to factors
>mtcars$cyl <- as.factor(mtcars$cyl)
>mtcars$am <- as.factor(mtcars$am)
>str(mtcars)
# using OR with dplyr
# (mtcars only has 4, 6 and 8 cylinders, so cyl == 7 matches no rows)
>dmtcars <- filter(mtcars, cyl == 6 | cyl == 7)
# using base R
>dmtcars <- mtcars[mtcars$cyl == 6 | mtcars$cyl == 7, ]
>View(dmtcars)
# using AND with dplyr
# (cyl == 6 & cyl == 4 can never be true for one row, so two columns are combined)
>dmtcars <- filter(mtcars, cyl == 6 & am == 1)
# using base R
>dmtcars <- mtcars[mtcars$cyl == 6 & mtcars$am == 1, ]
>View(dmtcars)
Subsetting: rows
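The row filters above can also be written in a couple of other ways. The following is a minimal sketch on an unmodified copy of mtcars (so cyl and am are still numeric); the object names are invented for illustration.

```r
library(dplyr)

cars <- mtcars  # copy of the built-in data; cyl and am are numeric here

# OR over several values of one column is often written with %in%
six_or_eight <- filter(cars, cyl %in% c(6, 8))

# Conditions separated by commas in filter() are combined with AND
six_manual <- filter(cars, cyl == 6, am == 1)

# A single column cannot hold two different values in the same row,
# so this AND condition keeps no rows at all
impossible <- filter(cars, cyl == 6 & cyl == 4)
nrow(impossible)  # 0
```

The last case is worth checking with nrow() whenever a filter unexpectedly returns an empty data frame.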
# using dplyr
>mtcars_col1 <- select(mtcars, mpg, cyl, disp)
>View(mtcars_col1)
# using base R
>mtcars_col1 <- mtcars[ , c("mpg", "cyl", "disp")]
# adding a new column using dplyr::mutate()
>mtcars <- mutate(mtcars, newcol1 = ifelse(mpg <= 15, "luxury",
ifelse(mpg <= 20, "sports", "economy")))
# using base R, where "newcol1" is the new column
>mtcars$newcol1 <- ifelse(mtcars$mpg <= 15, "luxury",
ifelse(mtcars$mpg <= 20, "sports", "economy"))
>View(mtcars)
>mtcars <- select(mtcars, -newcol1)   # to delete a column
Subsetting: columns
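Nested ifelse() calls become hard to read as the number of categories grows. One possible alternative, sketched here, is dplyr's case_when(), which tests the conditions in order and uses the first one that matches. The names cars, classified, trimmed and mpg_class are hypothetical and chosen only for this example.

```r
library(dplyr)

cars <- mtcars  # copy of the built-in data

# Same three categories as the nested ifelse(), one condition per line
classified <- mutate(
  cars,
  mpg_class = case_when(
    mpg <= 15 ~ "luxury",
    mpg <= 20 ~ "sports",
    TRUE      ~ "economy"   # fallback for every remaining row
  )
)

# Count how many cars fall into each category
table(classified$mpg_class)

# A negative selection drops the helper column again
trimmed <- select(classified, -mpg_class)
```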
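# arrange using dplyr
>mtcars <- arrange(mtcars, cyl)         # ascending order
>mtcars <- arrange(mtcars, desc(cyl))   # descending order
# arrange using base R
>mtcars <- mtcars[order(mtcars$cyl), ]
# group: store the result, otherwise the grouping is lost
>mtcars_grouped <- group_by(mtcars, cyl)
# summarize: one row for the whole data set
>summarize(mtcars, mean(mpg), sd(mpg))
# summarize: one row per cyl group
>summarize(mtcars_grouped, mean(mpg), sd(mpg))
order() and group_by()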
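group_by() on its own only marks the groups; it is the following summarize() that produces one row per group. A small sketch of the two steps together, with illustrative object and column names that are not from the original slides:

```r
library(dplyr)

# Mark the groups, one per distinct value of cyl
by_cyl <- group_by(mtcars, cyl)

# Summarize within each group; n() counts the rows in the group
mpg_by_cyl <- summarize(
  by_cyl,
  n_cars   = n(),
  mean_mpg = mean(mpg),
  sd_mpg   = sd(mpg)
)

# Order the per-group summary from highest to lowest average mpg
mpg_by_cyl <- arrange(mpg_by_cyl, desc(mean_mpg))
```

Naming the summary columns (mean_mpg rather than the default `mean(mpg)`) makes them easier to refer to in later steps.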
• Pipelines built with %>% organize code as a sequence of data operations
read left to right, which is much easier to read, write, and maintain
than nested function calls.
• dplyr originally provided its own %.% operator, which is similar to %>%;
however, it has been deprecated, and dplyr now recommends %>%,
which it imports from magrittr.
Differences between %.% (dplyr) and %>% (magrittr):
  • magrittr is a much more lightweight package that exists mainly to
  define the pipe operator.
  • Piping minimizes the need for local variables and function definitions.
Pipelines: %>% (pipe operator)
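To make the left-to-right reading concrete: x %>% f(y) is evaluated as f(x, y), so a pipeline is just another way of writing nested calls. The sketch below builds the same result both ways; the names nested and piped are invented for this example.

```r
library(dplyr)  # loading dplyr also makes %>% available

# Nested calls are read inside-out: arrange first, then head
nested <- head(arrange(mtcars, desc(mpg)), 3)

# The same steps as a pipeline are read left to right
piped <- mtcars %>%
  arrange(desc(mpg)) %>%
  head(3)

# Both forms give the same data frame
identical(nested, piped)
```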
# using base R to find the average mpg of cars with 4 cylinders
>mean(mtcars[mtcars$cyl == "4", "mpg"])
Note: "4" is quoted because cyl is now a factor rather than numeric; for a numeric column, write == 4.
# using dplyr
>summarize(filter(mtcars, cyl == "4"), mean(mpg))
# using the pipe
>mtcars %>% filter(cyl == "4") %>% summarize(mean(mpg))
# categorize mtcars based on mpg in a new column
>mtcars %>% mutate(newcol2 = ifelse(mpg <= 15, "luxury",
ifelse(mpg <= 20, "sports", "economy")))
magrittr
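The pipe pays off most when several verbs are chained. The following sketch combines mutate(), group_by(), summarize() and arrange() in one pipeline. It assumes dplyr 1.0.0 or later for the .groups argument, and the names mpg_summary, transmission and mean_mpg are made up for illustration.

```r
library(dplyr)

mpg_summary <- mtcars %>%
  # in mtcars, am is coded 0 = automatic, 1 = manual
  mutate(transmission = ifelse(am == 1, "manual", "automatic")) %>%
  # one group per cylinder count and transmission type
  group_by(cyl, transmission) %>%
  # average mpg per group; .groups = "drop" returns an ungrouped result
  summarize(mean_mpg = mean(mpg), .groups = "drop") %>%
  arrange(cyl, transmission)

mpg_summary
```

Each line adds one step, so a step can be commented out or reordered without rewriting the rest of the expression.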
Next: we will see how to manipulate data using dates.
