Well-documented data visualization using ggplot2, geom_density2d, stat_density_2d, geom_smooth, stat_ellipse, scatterplot and much more. Let me know if anything is required. Ping me at google #bobrupakroy
Data visualization using the grammar of graphics
1. Data Visualization - II
Using ggplot2, geom_density2d(), stat_density_2d(),
geom_smooth(), stat_ellipse(), scatterplot()
Rupak Roy
2. ggplot2: a system for 'declaratively' creating graphics, based on "The
Grammar of Graphics". You provide the data, tell ggplot2 how to map variables to
aesthetics and which graphical primitives to use, and it takes care of all the
details.
A ggplot comprises 3 important components:
1. Aesthetic mapping, aes in short, defines which variables map to the x and y
axes, size, colour, position and more.
2. Geometrical objects, geoms in short, are the geometrical elements that
we mark on the plot, like boxplots, lines, points, polygons, bars etc.
3. Statistical transformation, stat in short: it's often useful to transform the data into
counts, bins, densities etc. before plotting. A few important stats:
stat_bin: for discretizing/binning
stat_smooth: for fitting a smoothed trend line
stat_density: for estimating the probability density function (PDF)
Create Elegant Data Visualizations Using
the Grammar of Graphics
3. #load the package and the data
>library(ggplot2)
>m<-mtcars
#set the aesthetic mapping
>p<-ggplot(m,aes(x=mpg,y=cyl))
#define the geometrical object (geom_point) and the statistical transformation (stat),
#where "identity" means no transformation
>p+geom_point(color="blue",stat="identity")
#histogram
#aesthetic mapping
>p<-ggplot(m,aes(x=mpg))
#plot the geometrical objects
>p+geom_histogram(color="blue",fill="green",stat="bin")
>p+geom_histogram(color="blue",fill="green")
#both give the same result because geom_histogram already uses stat="bin" by default.
4. #to know more about aes mappings use
>help(aes)
#for a list of geom objects use
>help(alpha) #and select the ggplot2 entry (objects exported from other packages)
#then open the package index from that page and look under G for the complete list of geoms.
#the package index can also be opened directly with
>help(package="ggplot2")
A few geoms from the list, with their default stat:
geom              default stat
geom_histogram    bin
geom_polygon      identity
geom_boxplot      boxplot
geom_point        identity
geom_line         identity
geom_tile         identity
Other geoms: geom_abline, geom_jitter, geom_violin etc.
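The geoms and default stats listed above can also be looked up from R itself. The following is a minimal sketch, assuming ggplot2 is installed. The helper name default_stat is hypothetical, and reading the stat field of a layer object relies on how ggplot2 stores layers internally rather than on a documented interface, so treat it as an illustration only.

```r
library(ggplot2)

# List every exported object whose name starts with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)

# Hypothetical helper: class name of the stat attached to a layer.
# Assumption: the layer object exposes its stat in the `stat` field.
default_stat <- function(layer) class(layer$stat)[1]

default_stat(geom_histogram())
default_stat(geom_point())
default_stat(geom_boxplot())
```

The class names returned by the helper (for example a name starting with "Stat") correspond to the short stat names used in the list, such as bin or identity.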
5. >library(ggplot2)
#load the dataset
>mt_cars<-mtcars
>p<-ggplot(mt_cars,aes(x=mpg,y=disp))
#now we can distinguish the points by "cyl" directly through color
>p+geom_point(aes(color=cyl))
#adding the axis labels (keep the + at the end of the line so that R continues the expression)
>p+geom_point(aes(color=cyl))+xlab("mileage")+
ylab("disp")
6. #use single color
>geom_point(colour="blue")
#use color by groups
>geom_point(aes(colour=VariableName))
#for boxplots (needs a discrete variable mapped to fill, e.g. aes(fill=VariableName))
>geom_boxplot()+scale_fill_manual(values=c("#999999","#E69F00",
"#56B4E9"))
Or >geom_boxplot()+scale_fill_manual(values=c("red","blue","green"))
#for scatterplot (needs a discrete variable mapped to color)
>geom_point()+scale_color_manual(values=c("red","blue","green"))
#using RColorBrewer package
>library(RColorBrewer)
>geom_boxplot()+scale_fill_brewer(palette="Set2") #for boxplot
>geom_point()+scale_color_brewer(palette="Set3") #for scatterplot
>display.brewer.all() #to display all the available palettes of RColorBrewer
Change default colour: ggplot()
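The colour fragments above only work once they are attached to a plot with a discrete variable mapped to colour. A minimal complete sketch, assuming the built-in mtcars data and treating cyl as a factor (the object names g and used_colors are illustrative only):

```r
library(ggplot2)

# factor(cyl) makes cyl a discrete grouping variable with levels 4, 6 and 8
g <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point() +
  scale_color_manual(values = c("red", "blue", "green"))  # one color per level

# layer_data() returns the layer's data after the scales have been applied
used_colors <- unique(layer_data(g)$colour)

print(g)
```

The manual values are assigned to the factor levels in order, so the number of colours supplied should be at least the number of groups.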
7. #-----------change colors to continuous colors i.e. gradient colors------------------------#
#for scatterplot (cyl is still numeric here, so a gradient applies)
>p+geom_point(aes(color=cyl),alpha=0.3)+
xlab("mileage")+ylab("disp")+
scale_color_gradient(low="blue",high="red")
#for histogram (only x is mapped; after_stat(count) replaces the deprecated ..count..)
>ggplot(mt_cars,aes(x=mpg))+geom_histogram(aes(fill=after_stat(count)))+
scale_fill_gradient(low="blue",high="red")
Change default colour:: ggplot()
8. #add a regression line: the line that best fits the trend of the data, chosen so that
#the sum of squared vertical distances from the plotted points to the line
#is the smallest.
#for continuous variable
>p+geom_point(aes(color=cyl))+xlab("mileage")+
ylab("disp")+geom_smooth(method="lm")+
scale_color_gradient(low="blue",high="red")
#for discrete variable
>mt_cars$cyl<-as.factor(mt_cars$cyl)
>plm<-ggplot(mt_cars,aes(x=wt,y=mpg))
#cyl is now a factor, so the default discrete colors are used instead of a gradient scale
>plm+geom_point(aes(color=cyl))+xlab("weight")+
ylab("mpg")+geom_smooth(method="lm",
se=FALSE,fullrange=TRUE)
regression line:: ggplot()
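To see what the regression line represents, the same kind of straight-line fit can be computed directly with lm() and drawn with geom_abline(). This is a minimal sketch on the built-in mtcars data, assuming the default formula y ~ x that geom_smooth() uses with method="lm"; the object names fit, coefs and g are illustrative only.

```r
library(ggplot2)

# Least-squares fit of mpg on wt
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(fit)  # named vector: "(Intercept)" and "wt" (the slope)
coefs

# Draw the fitted line by hand from its intercept and slope
g <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_abline(intercept = coefs[["(Intercept)"]],
              slope = coefs[["wt"]],
              color = "red")

print(g)
```

The negative slope says that mpg falls as weight increases, which is the trend the smoothed line shows on the plot.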
9. #alternative way, with a separate regression line for each cyl (cyl must be a factor)
>ggplot(mt_cars,aes(x=wt,y=mpg,color=cyl,shape=cyl))+
geom_point()+
geom_smooth(method="lm",aes(fill=cyl))
regression line::ggplot()
10. #scatterplots with 2d density
>p<-ggplot(mt_cars,aes(x=mpg,y=disp))
>p+geom_point()+geom_density2d()
#adding a rug plot (it draws short lines along the axes, one per point,
#to show the distribution of each variable, which
#sometimes remains unseen in the scatterplot alone)
>p+geom_point()+geom_density2d()+geom_rug()
#alternative to geom_density2d() is stat_density_2d()
#(after_stat(level) replaces the deprecated ..level..)
>p+stat_density_2d(aes(fill=after_stat(level)),
geom="polygon")+scale_fill_gradient(low=
"blue",high="red")+geom_rug()
Scatter plots with 2d density
11. #one ellipse for all points
>p+geom_point()+geom_rug()+stat_ellipse()
#by group
>p<-ggplot(mt_cars,aes(x=mpg,y=disp,
color=disp>280))
>p+geom_point()+stat_ellipse()
#choose the ellipse type: "t" (the default, assumes a multivariate t-distribution),
#"norm" (assumes a multivariate normal distribution) or "euclid" (draws a circle)
>p+geom_point()+stat_ellipse(type="norm")
Scatter plots with ellipses
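To compare the ellipse types, two of them can be overlaid on the same scatterplot. A minimal sketch on the built-in mtcars data; the colours, the dashed line type and the object names are illustrative choices only.

```r
library(ggplot2)

g <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  stat_ellipse(type = "t", color = "blue") +                       # default type
  stat_ellipse(type = "norm", color = "red", linetype = "dashed")  # normal-based

# Layer 1 is the points; layers 2 and 3 hold the computed ellipse outlines
ellipse_t <- layer_data(g, 2)
ellipse_norm <- layer_data(g, 3)

print(g)
```

Both ellipses are drawn at the default confidence level, so any difference between them comes from the distributional assumption alone.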
12. #another easy way to create a scatter plot is scatterplot() from the "car" package
>install.packages("car")
>library(car)
>scatterplot(mpg ~ qsec | cyl, data=mtcars)
car::scatterplot()
13. Next:
We will learn how to plot multiple groups using facet,
heatmaps, geom_density and more.
Data Visualization