5. • Keeps only those variables (columns) that you
want to retain/extract.
• Syntax: select(dataset,[column1],[column2],…)
Examples:
Select columns Month, Dealer, Item, Quantity: select(sales,Month,Dealer,Item,Qty)
Select columns from Month to Quantity: select(sales,Month:Qty)
Deselect column Month from the dataset: select(sales,-Month)
Select columns ending with the letter “r”: select(sales,ends_with("r"))
Select columns containing the letter “r”: select(sales,contains("r"))
Select columns whose names match the regular expression “m.” (an “m” followed by any character): select(sales,matches("m."))
Select columns named in a character vector: select(sales,one_of(c("Month","Dealer")))
Select columns starting with the letter “d”: select(sales,starts_with("d"))
select()
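The select() helpers above can be tried on a small stand-in dataset. This is only a sketch: the `sales` data frame below and all of its values are invented for illustration, since the real dataset is not shown on the slides. Note that the helpers ignore case by default, which is why starts_with("d") picks up Dealer.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

by_range <- select(sales, Month:Qty)         # Month, Dealer, Item, Qty
no_month <- select(sales, -Month)            # every column except Month
ends_r   <- select(sales, ends_with("r"))    # Dealer
has_r    <- select(sales, contains("r"))     # Dealer
regex_m  <- select(sales, matches("m."))     # Month, Amount ("m" plus one more character)
starts_d <- select(sales, starts_with("d"))  # Dealer (matching ignores case by default)

print(names(regex_m))
```

Item is not returned by matches("m.") because its “m” is the last character, so nothing follows it.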
6. • Keeps only those records (rows) that you want to
retain/extract.
• Syntax: filter(dataset,criteria)
Examples:
Item is Pen: filter(sales,Item=="Pen")
Quantity is more than 50: filter(sales,Qty>50)
Item is Pencil and Quantity is more than 50: filter(sales,Item=="Pencil"&Qty>50)
Quantity is between 50 and 80: filter(sales,Qty>50&Qty<80)
Item is Pencil or Quantity is more than 50: filter(sales,Item=="Pencil"|Qty>50)
filter()
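A minimal sketch of the filter() criteria above, run on a hypothetical `sales` data frame (the values are invented for illustration). It also shows that conditions separated by a comma are combined with AND, just like `&`.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

pens      <- filter(sales, Item == "Pen")                # rows where Item is Pen
over_50   <- filter(sales, Qty > 50)                     # rows where Qty exceeds 50
and_amp   <- filter(sales, Item == "Pencil" & Qty > 50)  # both conditions must hold
and_comma <- filter(sales, Item == "Pencil", Qty > 50)   # comma also means AND
either    <- filter(sales, Item == "Pencil" | Qty > 50)  # at least one condition holds

print(either)
```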
7. Examples:
We want to extract the Sales Manager, Item and Quantity but only for Pencil:
i) k=select(sales,SalesManager,Item,Qty)
filter(k,Item=="Pencil")
ii) select(filter(sales,Item=="Pencil"),SalesManager,Item,Qty)
iii) filter(select(sales,SalesManager,Item,Qty),Item=="Pencil")
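We want to extract Dealer, Item and Quantity for the Month of May:
i) select(filter(sales,Month=="May"),Dealer,Item,Qty)
ii) filter(select(sales,Dealer,Item,Qty),sales$Month=="May")
Note: filter(select(sales,Dealer,Item,Qty),Month=="May") fails, because select() has already dropped Month. Version ii) only works because sales$Month reaches back into the original dataset, so filtering first (version i) is safer.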
select() and filter()
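When select() and filter() are nested, the order matters: filter() can only refer to columns that are still present. The sketch below uses a hypothetical `sales` data frame (values invented for illustration) and assumes no other object called Month exists in the session.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

# Filter first, while Month is still available, then keep the wanted columns
may_sales <- select(filter(sales, Month == "May"), Dealer, Item, Qty)

# Selecting first drops Month, so the later filter() cannot find it
attempt <- tryCatch(
  filter(select(sales, Dealer, Item, Qty), Month == "May"),
  error = function(e) "error: Month is no longer available"
)

print(may_sales)
print(attempt)
```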
8. • Orders or sorts the records (rows) based on the
variable(s).
• By default the arrangement is in ascending order.
• Syntax: arrange(dataset,column1,[column2],…)
Examples:
Sort the dataset based on Months: arrange(sales,Month)
Sort the dataset based on Months and Dealer: arrange(sales,Month,Dealer)
Arrange the data in descending order of Quantity: arrange(sales,desc(Qty))
arrange()
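A short sketch of arrange() on a hypothetical `sales` data frame (values invented for illustration). One thing worth noticing: when Month is stored as text, it is sorted alphabetically, not in calendar order.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

by_month        <- arrange(sales, Month)          # alphabetical: Apr, Jun, May, May
by_month_dealer <- arrange(sales, Month, Dealer)  # ties in Month are broken by Dealer
by_qty_desc     <- arrange(sales, desc(Qty))      # largest Qty first

print(by_qty_desc)
```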
9. • Helps extract unique values from a variable.
• Syntax: distinct(dataset,column1,[column2],…)
Examples:
Find the names of the Dealers: distinct(sales,Dealer)
Find the items sold by each Dealer: arrange(distinct(sales,Dealer,Item),Dealer)
distinct()
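A minimal sketch of distinct() on a hypothetical `sales` data frame (values invented for illustration). With several columns, distinct() returns the unique combinations of those columns.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

dealers      <- distinct(sales, Dealer)                         # one row per Dealer
dealer_items <- arrange(distinct(sales, Dealer, Item), Dealer)  # unique Dealer/Item pairs

print(dealer_items)
```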
10. • Adds a new variable (column) to the existing
dataset
• Syntax: mutate(dataset,newcolumn=criteria)
Example:
Add a new column Target where it is twice of Quantity: mutate(sales,Target=Qty*2)
mutate()
11. • Creates a new variable (column) but drops the
existing ones
• Syntax: transmute(dataset,newcolumn=criteria)
Example:
Create a new column Target where it is twice of Quantity: transmute(sales,Target=2*Qty)
transmute()
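The difference between mutate() and transmute() is easiest to see side by side. This sketch uses a hypothetical `sales` data frame (values invented for illustration).

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

with_target <- mutate(sales, Target = Qty * 2)     # keeps all columns and adds Target
only_target <- transmute(sales, Target = 2 * Qty)  # keeps Target only

print(names(with_target))
print(names(only_target))
```

Neither call changes `sales` itself; assign the result to a name to keep it.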
12. • Helps create groups in a dataset based on a
variable.
• Useful when nested with other functions.
• Syntax: group_by(dataset,column1,[column2]…)
• Ungroup Syntax: ungroup(dataset)
Example:
Create groups in the data based on Items: group_by(sales,Item)
Get the maximum units sold for each item: filter(group_by(sales,Item),Qty==max(Qty))
group_by()
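A sketch of group_by() nested inside filter(), using a hypothetical `sales` data frame (values invented for illustration). It also shows why the grouping column should be passed directly: writing `by = Item` does not select a grouping argument, it creates an extra column called `by`.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

# max(Qty) is evaluated within each Item group, so one top row per item survives
top_per_item <- filter(group_by(sales, Item), Qty == max(Qty))

# Remove the grouping once it is no longer needed
top_per_item <- ungroup(top_per_item)

# Pitfall: a named argument adds a new grouping column called `by`
with_by <- group_by(sales, by = Item)

print(top_per_item)
print(names(with_by))
```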
13. • Helps generate a single number/statistic for the dataset
• Syntax: summarise(dataset,newvariable=function….)
Examples:
Total number of units sold across all Items:
summarise(sales,total=sum(Qty))
Total number of units sold and total amount:
summarise(sales,t_Qty=sum(Qty),t_Amount=sum(Amount))
Total number of records in the dataset:
summarise(sales,rowscount=n())
Get the total number of records, quantity sold and amount for each item:
summarise(group_by(sales,Item),rcount=n(),unitqty=sum(Qty),totalamount=sum(Amount))
Every statistic for each dealer and their respective items:
summarise(group_by(sales,Dealer,Item),rcount=n(),unitqty=sum(Qty),totalamount=sum(Amount))
summarise()
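A minimal sketch of summarise() with and without groups, on a hypothetical `sales` data frame (values invented for illustration). Without group_by() the result is a single row; with it, there is one row per group.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

# One row for the whole dataset
overall <- summarise(sales, rowscount = n(), t_Qty = sum(Qty), t_Amount = sum(Amount))

# One row per Item
per_item <- summarise(
  group_by(sales, Item),
  rcount = n(),
  unitqty = sum(Qty),
  totalamount = sum(Amount)
)

print(overall)
print(per_item)
```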
14. We want to extract the top 6 records for Dealers who have sold the Item Pen only:
filter(sales,Item=="Pen")
select(filter(sales,Item=="Pen"),Item,Dealer,Qty)
arrange(select(filter(sales,Item=="Pen"),Item,Dealer,Qty),Dealer)
head(arrange(select(filter(sales,Item=="Pen"),Item,Dealer,Qty),Dealer))
We want the maximum quantity of every item for the month of May with just Dealer, Item and
Quantity variables:
filter(sales,Month=="May")
select(filter(sales,Month=="May"),Dealer,Item,Qty)
group_by(select(filter(sales,Month=="May"),Dealer,Item,Qty),Item,Dealer)
summarise(group_by(select(filter(sales,Month=="May"),Dealer,Item,Qty),Item,Dealer),max(Qty))
Assignment
15. • Belongs to magrittr Package.
• Helps structure sequence of operations in a
single code from left to right.
• Helps avoid nesting of functions.
• Operator: %>%
Examples:
We want to extract the top 6 records for Dealers who have sold the Item Pen only:
sales%>%filter(Item=="Pen")%>%select(Dealer,Item,Qty)%>%arrange(Dealer)%>%head
We want the maximum quantity of every item for the month of May with just Dealer, Item and
Quantity variables:
sales%>%filter(Month=="May")%>%select(Dealer,Item,Qty)%>%group_by(Item,Dealer)%>%summarise(max(Qty))
pipe operator %>%
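To see that a pipeline is just another way of writing nested calls, the sketch below builds both forms on a hypothetical `sales` data frame (values invented for illustration) and compares them. The column name `max_qty` and the `.groups = "drop"` argument are choices made for this example only.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

# Nested form: read from the inside out
nested <- head(arrange(select(filter(sales, Item == "Pen"), Dealer, Item, Qty), Dealer))

# Piped form: read from left to right, one step per line
piped <- sales %>%
  filter(Item == "Pen") %>%
  select(Dealer, Item, Qty) %>%
  arrange(Dealer) %>%
  head()

# Maximum quantity per Item and Dealer for May, filtering before Month is dropped
may_max <- sales %>%
  filter(Month == "May") %>%
  select(Dealer, Item, Qty) %>%
  group_by(Item, Dealer) %>%
  summarise(max_qty = max(Qty), .groups = "drop")

print(identical(nested, piped))
print(may_max)
```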
16. • Helps extract records (rows) based on their
position.
• Syntax: slice(dataset,row numbers)
Examples:
Select first ten rows: slice(sales,1:10)
Select rows fifteen to twenty: slice(sales,15:20)
slice()
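A short sketch of slice() on a hypothetical four-row `sales` data frame (values invented for illustration), so the row ranges are smaller than those on the slide.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

first_two <- slice(sales, 1:2)  # rows 1 and 2
last_two  <- slice(sales, 3:4)  # rows 3 and 4

print(last_two)
```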
17. • Helps count the number of times a value has
appeared in a variable.
• Syntax: count(dataset, [column1],[column2],…)
Examples:
Count the number of times each Dealer has appeared: count(sales,Dealer)
Count the number of times Pen has appeared: count(sales,Item=="Pen")
count()
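A minimal sketch of count() on a hypothetical `sales` data frame (values invented for illustration). Counting a condition such as Item=="Pen" gives one row for FALSE and one for TRUE; naming the condition (here `is_pen`, a name chosen for this example) makes the output easier to read.

```r
library(dplyr)

# Hypothetical stand-in for the `sales` dataset (values invented for illustration)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Arun", "Bela", "Arun", "Chad"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 55),
  Amount = c(400, 300, 750, 275)
)

per_dealer <- count(sales, Dealer)                 # counts are returned in column n
pen_or_not <- count(sales, is_pen = Item == "Pen") # one row each for FALSE and TRUE

print(per_dealer)
print(pen_or_not)
```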