Introduction to Data Analysis and Graphics in R
Hellen Gakuruh
2017-04-03
Slide 5: Graphics in R
Outline
What we will cover:
• Introduction
• High level plotting functions
• Low level plotting functions
• Interacting with graphics
• Modifying a graph
• Plotting dichotomous and categorical variables
• Plotting ordinal variables
• Plotting continuous variables
Introduction
• R is renowned for its plotting facilities; not only does it provide all the well-known graphs, it also makes it possible to build entirely new types of graph
• There are three well-known graphics systems in R: “base graphics”, “grid graphics” (often used through the lattice package) and “ggplot2”
• When a plot is first drawn, R opens a graphics device, typically X11() on Unix, windows() on Windows and quartz() on macOS
• Plotting functions fall into three types of commands: high-level, low-level and interactive
• Plots can be customized with “graphical parameters”
High level plotting functions
• They are designed to generate a complete plot with axes, labels and titles, unless these are suppressed (with graphical parameters)
• They start a new plot
• The core plotting function in R is plot()
• plot() is a generic function: the kind of plot it produces depends on the type/class of its first argument (it dispatches on class(x))
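The dispatch described above can be checked directly. This small sketch (the objects are made up for illustration) uses base R's getS3method() to confirm which classes have their own plot() method; a plain numeric vector has none, so plot.default() handles it.

```r
# plot() is an S3 generic: the method that runs is chosen from class(x)
objs <- list(
  series = ts(c(3, 5, 4, 6)),
  groups = factor(c("Y", "N", "Y")),
  frame  = data.frame(a = 1:3, b = c(2, 4, 6))
)
sapply(objs, function(o) class(o)[1])   # the classes plot() will dispatch on

# Each of these classes has its own registered plot() method
has_method <- sapply(c("ts", "factor", "data.frame"), function(cl)
  !is.null(getS3method("plot", cl, optional = TRUE)))
has_method

# A plain numeric vector has no special method, so plot.default() is used
no_numeric_method <- is.null(getS3method("plot", "numeric", optional = TRUE))
no_numeric_method
```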
Expected output of “plot()”
• If only “x” is given:
– if it is a time series object (class “ts”), a line plot is produced; otherwise, if it is numeric, a scatter plot of its values against their index is generated
– if class(x) == "factor", a bar plot is produced
– it is an error when class(x) == "character", because plot() cannot compute the finite axis limits it needs to set up the plotting window
• If two variables are given and both are numeric, the output is a scatter plot
Expected output of “plot()” cont.
• If a factor and a numeric vector are given, box plots are produced
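• If both vectors are factors, a spine plot (a stacked bar plot whose bar widths are proportional to the counts) is produced
• If the object passed is not a vector, plot() dispatches on its class too: a data frame with two columns gives a plot of the first column against the second, a data frame with more columns gives a scatter-plot matrix, and a two-column matrix or a list with x and y components is drawn as a scatter plot
• A few of these are produced below with plain plot(obj) calls (without changing or adding other arguments)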
Time series object
ts_obj <- ts(rnorm(12, 50), start = 1, end = 12, frequency = 1)
class(ts_obj)
[1] "ts"
plot(ts_obj)
Numeric vector
num <- rnorm(12, 50)
class(num)
[1] "numeric"
plot(num)
Factor vector
fac <- factor(sample(c("Y", "N"), 100, replace = TRUE, prob = c(0.7, 0.3)))
class(fac)
[1] "factor"
plot(fac)
Two numeric vectors
num2 <- rnorm(12, 88)
class(num2)
[1] "numeric"
plot(num, num2)
Factor and numeric vector
set.seed(5)
num3 <- rnorm(100, 88)
class(num3)
[1] "numeric"
plot(fac, num3)
Two factor vectors
fac2 <- factor(sample(c("F", "M"), 100, replace = TRUE, prob = c(0.8, 0.2)))
class(fac2)
[1] "factor"
plot(fac, fac2)
Summary
• All these plots come with axes and axis labels (but no title), and some with colour, which makes them informative as they are
• However, they may not meet aesthetic requirements; this can be changed by passing other arguments, including ones that suppress the axes
Other arguments to “plot”
• The type of plot produced by plot() depends on the first (and “y”) argument, but how it is drawn depends on the values passed to the other arguments
• The plot type can also be changed with the argument “type” (e.g. "p" for points, "l" for lines, "b" for both, "h" for vertical lines, "n" for nothing), but only do this when it makes sense for the data
• “xlim” and “ylim” define the x and y limits (minimum and maximum axis values); change them when, for example, a bit more padding is needed
Other arguments to “plot” cont.
• For customised axes (e.g. with log-scale tick marks), suppress the default axes with axes = FALSE and draw new ones with axis()
• To annotate a plot with additional graphical parameters, pass them as arguments to high- and low-level plotting functions or make a call to par(); more on this later (see ?par)
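A minimal sketch of these arguments on made-up data (x = 1:10, y = x^2): type, xlim and log change how the plot is drawn, while axes = FALSE plus axis() replaces the default axes with hand-chosen ticks. The tick positions are arbitrary choices for illustration.

```r
x <- 1:10
y <- x^2

# type = "b" draws points joined by lines; xlim sets the x-axis range.
# log = "y" puts y on a log scale; axes = FALSE suppresses the default axes
plot(x, y, type = "b", xlim = c(0, 10), log = "y", axes = FALSE,
     xlab = "x", ylab = "x squared (log scale)")

# Redraw the axes by hand with chosen tick positions
axis(1, at = seq(0, 10, by = 2))
axis(2, at = c(1, 10, 100), las = 1)
box()

# With the default axis style (xaxs = "r") R pads the limits by 4% at each end
par("usr")[1:2]
```

Setting xaxs = "i" (or yaxs = "i") removes that padding when an axis should start exactly at the requested limits.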
Other High-level plots
• hist() for histograms (univariate continuous distributions)
• boxplot() for box-and-whiskers plot (for univariate numerical variables
alone or categorised by a categorical variable)
• barplot() for bar plots (for categorical distribution)
• pie() for pie chart (for categorical distribution)
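As a quick orientation, this sketch draws each of these high-level functions once, using R's built-in mtcars data (an arbitrary choice) in a 2 x 2 layout that is reset afterwards.

```r
# One panel per high-level function
op <- par(mfrow = c(2, 2))
hist(mtcars$mpg, main = "hist()", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars, main = "boxplot()",
        xlab = "Cylinders", ylab = "Miles per gallon")
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, main = "barplot()", xlab = "Cylinders")
pie(cyl_counts, main = "pie()")
par(op)  # restore the single-panel layout
```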
Low level plotting functions
• These functions add more information to an existing plot
• They are used to customise plots
• Some of the most frequently used are points(), lines(), text(), title(), abline(), polygon(), legend() and axis()
• Some of these are used below when plotting the example distributions
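A sketch of the layering idea using R's built-in cars data (an illustrative choice): one high-level call creates the plot, and each low-level call adds one element to it. The label position and colours are arbitrary.

```r
# High-level call: creates the plot, axes and axis labels
plot(cars$speed, cars$dist, pch = 19, col = "grey40",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Low-level calls: each adds to the existing plot
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "red", lwd = 2)                             # fitted line
lines(lowess(cars$speed, cars$dist), col = "blue", lty = 2)   # smooth trend
points(mean(cars$speed), mean(cars$dist), pch = 4, cex = 2)   # mean point
text(5, 110, "cars data set", adj = 0)                        # free text
title("Stopping distance against speed")
legend("bottomright", legend = c("Least squares", "Lowess"),
       col = c("red", "blue"), lty = c(1, 2), lwd = c(2, 1))
```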
Interacting with graphics
• Interaction means extracting information from, or adding information to, a plot using the mouse (rather than by supplying data to the plot)
• Two functions for interaction in R are locator() and identify()
• locator(n, type) lets the user select up to “n” points with the left mouse button and returns their coordinates as a list with components x and y; if “type” is given (e.g. "p"), the selected points are also plotted
• locator() is particularly handy for positioning legends and labels, e.g. text(locator(1), "Outlier", adj = 0)
Interacting with graphics cont.
• identify(x, y, labels) labels the point (defined by x and y) nearest to each left mouse click, using the corresponding element of “labels”
• It returns the indices of the selected points, so it can be used to find particular observations (e.g. outliers) and label them
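Because locator() and identify() wait for mouse clicks, a script that uses them is safer wrapped in interactive(), as in this sketch on the built-in cars data. How identification is ended early (Esc, right-click or a Stop menu) depends on the graphics device.

```r
plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Only ask for clicks when a person is at the keyboard
if (interactive()) {
  # Click up to 3 points; each is labelled with its row name
  picked <- identify(cars$speed, cars$dist, labels = rownames(cars), n = 3)
  print(cars[picked, ])          # the rows behind the clicked points

  pos <- locator(1)              # click where the note should go
  if (!is.null(pos)) text(pos, "Note", adj = 0)
}
```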
Demonstration on interacting with graphics
Graphical parameters “par()”
• Almost every aspect of a plot can be customised with graphical parameters
• Graphical parameters come as “name = value” pairs, all with a default value
• To see the complete list of current values, call par() with no arguments
• To query a specific parameter, pass its name as a string, e.g. par("mfrow")
• Parameters can be changed globally with par(), which affects every later plot on the device (not recommended unless the old values are saved and restored), or individually, by passing them as arguments to a single plotting call
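A sketch of both styles using the built-in faithful data (an illustrative choice). par() returns the previous values of whatever it changes, which is what makes the save-and-restore pattern work.

```r
# Global change: keep the old values that par() returns...
op <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
hist(faithful$eruptions, main = "Eruption time", xlab = "Minutes")
hist(faithful$waiting, main = "Waiting time", xlab = "Minutes")
par(op)  # ...and restore them when done

# Individual change: a parameter passed to one call affects only that plot
plot(faithful, las = 1, col = "grey40")
```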
Plotting dichotomous and categorical variables
• How a distribution is plotted depends on whether it is univariate (one variable), bivariate (two variables) or multivariate
• Plots for univariate categorical variables (dichotomous included) are:
– Pie charts (for a few values, e.g. 2)
– Bar plots, and
– Cleveland’s dot plots
Plotting dichotomous and categorical variables cont.
• Bi-variate plots
– Stacked/side-by-side bar plots
– Four-fold display
• Multi-variate plots
– Mosaic
– Four-fold plots
Pie chart
• Suitable when there are few categories
• Useful for showing percentages
• Widely discouraged because angles are harder to compare than lengths; pie charts also use a lot of ink
Pie chart example
set.seed(5)
response <- sample(c("Yes", "No"), 300, T, c(0.68, 0.32))
tab_response <- table(response)
pie(tab_response, col = c("#99CCFF", "#6699CC"))
labs <- paste0("(", round(as.vector(prop.table(tab_response)*100)), "%)")
text(x = c(0.78, -0.50), y = c(0.80, -1), labels = c(labs[1], labs[2]))
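The text() call above relies on hand-tuned coordinates that only suit this particular pie. A sketch of an alternative, assuming the same simulated data: build the percentage labels first and pass them to pie()'s labels argument, so each label is placed next to its slice.

```r
set.seed(5)
response <- sample(c("Yes", "No"), 300, replace = TRUE, prob = c(0.68, 0.32))
tab_response <- table(response)

# "No (xx%)", "Yes (xx%)" labels, in the same order as the table
pct <- round(100 * prop.table(tab_response))
pie_labels <- paste0(names(tab_response), " (", pct, "%)")
pie(tab_response, labels = pie_labels, col = c("#99CCFF", "#6699CC"))
```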
Bar plot
• Consists of a sequence of rectangular bars whose heights are given by the values plotted
• Ideally, bars should be ordered by frequency rather than by label
• Often discouraged because it uses a lot of ink for the data shown (a low data-ink ratio); an alternative is Cleveland’s dot plot
Bar plot cont.
barplot(sort(tab_response, decreasing = TRUE), las = 1, col = c("#6699CC", "#99CCFF"))
title("Bar chart", xlab = "Response", ylab = "Frequency")
Cleveland’s dot plot
• An alternative to the bar chart that uses less ink (a higher data-ink ratio)
• As an example, generate a “Cleveland’s dot plot” of the following data set. It should:
– be titled “Total students trained by quarter (2016)”
– have an x axis titled “Total students trained”
– have a subtitle “Data Mania Inc” (grey and in italics), and
– have a y axis titled “Quarters”, labelled with the (ordered) months given (Mar, Jun, Sep and Dec)
– have blue points
Cleveland’s dot plot
• Example data: hypothetical, randomly generated totals of students trained per quarter in 2016
set.seed(5)
months <- sample(month.abb[c(3, 6, 9, 12)], size = 300, replace = TRUE)
tab_months <- table(months)[c("Mar", "Jun", "Sep", "Dec")]
tab_months
months
Mar Jun Sep Dec
81 78 60 81
Cleveland’s dot plot
dotchart(as.numeric(tab_months), xlab = "Total student's Trained", ylab = "Quarters", bg = 4
title("Total students trained by quarters (2016)", sub = "Data Mania Inc.,", font.sub = 3, c
axis(2, at = 1:4, labels = names(tab_months), las = 2)
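The dotchart() and title() lines above are cut off in the source, so this is only a sketch of a chart meeting the specification, not the original code. The choices labels = names(tab_months), pch = 21 and bg = "blue" are assumptions; font.sub = 3 gives the italic subtitle and col.sub = "grey" its colour.

```r
set.seed(5)
months <- sample(month.abb[c(3, 6, 9, 12)], size = 300, replace = TRUE)
tab_months <- table(months)[c("Mar", "Jun", "Sep", "Dec")]

# Blue filled points, one per quarter, labelled with the month names
dotchart(as.numeric(tab_months), labels = names(tab_months), pch = 21,
         bg = "blue", xlab = "Total students trained", ylab = "Quarters")

# Italic (font.sub = 3), grey subtitle below the plot
title("Total students trained by quarter (2016)",
      sub = "Data Mania Inc", font.sub = 3, col.sub = "grey")
```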
Bivariate stacked/side-by-side bar plots and dot plot
• Following the earlier example, generate stacked/side-by-side bar plots and a bivariate Cleveland’s dot plot
• Add a second variable: the gender composition of the students trained
Bivariate stacked/side-by-side bar plots and dot plot cont.
set.seed(5)
gender <- sample(c("Female", "Male"), 300, TRUE, c(0.7, 0.3))
monthgen_tab <- table(gender, months)[, c("Dec", "Sep", "Jun", "Mar")]
monthgen_tab
months
gender Dec Sep Jun Mar
Female 0 49 78 81
Male 81 11 0 0
Bivariate stacked/side-by-side bar plots and dot plot cont.
barplot(monthgen_tab, col = c("#6699CC", "#99CCFF"), beside = TRUE)
legend("topright", legend = c("Female", "Male"), pch = 22 , pt.bg = c("#6699CC", "#99CCFF"),
title("Student's trained by gender and month (2016)", xlab = "Month", ylab = "Number trained
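The barplot() call above shows the side-by-side version only, and its legend and title lines are cut off in the source. This sketch regenerates the data with the same seeds and draws the stacked and side-by-side versions next to each other; the panel titles and legend placement are assumptions.

```r
set.seed(5)
months <- sample(month.abb[c(3, 6, 9, 12)], size = 300, replace = TRUE)
set.seed(5)
gender <- sample(c("Female", "Male"), 300, replace = TRUE, prob = c(0.7, 0.3))
monthgen_tab <- table(gender, months)[, c("Mar", "Jun", "Sep", "Dec")]

fills <- c("#6699CC", "#99CCFF")   # one fill per gender (row of the table)
op <- par(mfrow = c(1, 2))
barplot(monthgen_tab, col = fills, main = "Stacked",
        xlab = "Month", ylab = "Number trained")
barplot(monthgen_tab, col = fills, beside = TRUE, main = "Side by side",
        xlab = "Month", ylab = "Number trained")
legend("topright", legend = rownames(monthgen_tab), pch = 22,
       pt.bg = fills, bty = "n")
par(op)
```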
Bivariate Cleveland’s dot plot
dotchart(as.matrix(monthgen_tab)[, c("Mar", "Jun", "Sep", "Dec")], bg = 4, xlab = "Total num
title("Total student's trained by gender and month", sub = "Data Mania Inc.", font.sub = 3,
title(ylab = "Gender and month", line = 2.5)
Four-fold plots
• Used to display association (or the lack of it)
• Designed for two binary variables (2 x 2 tables); the display can be stratified by a third categorical variable with k levels (2 x 2 x k tables)
• An association is indicated if diagonally opposite cells in one direction tend to differ in size from those in the other direction
• Colour is used to show this direction
Four-fold plots cont.
• The rings around each quadrant are confidence rings; if the rings of adjacent quadrants overlap, the data are consistent with H0: no association
• Example data: R’s “Titanic” data (passengers only, crew excluded)
# Drop the crew (4th level of Class), then sum over Class
titanic_passengers <- colSums(Titanic[-4,,,])
titanic_passengers
, , Survived = No
Age
Sex Child Adult
Male 35 659
Female 17 106
, , Survived = Yes
Age
Sex Child Adult
Male 29 146
Female 28 296
Four-fold for Titanic Passengers
# Plotting four fold plot
fourfoldplot(titanic_passengers, std = "margins")
• The plot shows an association (the rings do not overlap and diagonally opposite cells differ in size) between Titanic passengers’ age (child/adult) and gender (male/female), stratified by survival status (No/Yes)
• A four-fold display differs from a pie chart: it varies the radius while holding the angle constant, whereas a pie chart varies the angle while holding the radius constant
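The four-fold display is built around the odds ratio of each 2 x 2 table, where an odds ratio of 1 means no association. A short sketch computing the Sex x Age odds ratio in each survival stratum of the same passenger table; odds_ratio() is a hypothetical helper written here, not a base R function.

```r
titanic_passengers <- colSums(Titanic[-4, , , ])   # passengers only (crew dropped)

# Odds ratio of a 2 x 2 table: (a * d) / (b * c); 1 means no association
odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# One Sex x Age odds ratio per survival stratum (third dimension)
or_by_survival <- apply(titanic_passengers, 3, odds_ratio)
round(or_by_survival, 2)
```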
Mosaic plots
• Originally proposed by Hartigan and Kleiner (1981, 1984)
• Similar to a divided bar plot: it displays the counts of a contingency table directly as tiles whose area is proportional to the observed cell frequency
• Later extended by Friendly (1992, 1994b)
• The extended version has greater visual impact, using colour and shading to reflect the size of the residuals from independence (no association)
• Used for exploratory data analysis (establish associations) and model
building (display residuals of log-linear model)
mosaicplot(titanic_passengers, color = TRUE)
• The width of each column of tiles is proportional to the number of passengers of each sex, and the height of each tile within a column is the conditional proportion of each age group (child/adult) given that sex
# Height of tiles
prop.table(apply(titanic_passengers, 1:2, sum), 1)
Age
Sex Child Adult
Male 0.07364787 0.9263521
Female 0.10067114 0.8993289
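The residual shading mentioned above can be tried with mosaicplot()'s shade argument. In this sketch the chisq.test() call on the two-way Sex x Age margin is an added illustration of what a residual from independence is; by default, shade = TRUE colours tiles by Pearson residuals from a model of complete independence.

```r
titanic_passengers <- colSums(Titanic[-4, , , ])   # passengers only (crew dropped)

# Two-way Sex x Age table and its Pearson residuals under independence
sex_age <- apply(titanic_passengers, 1:2, sum)
sex_age_test <- chisq.test(sex_age)
round(sex_age_test$residuals, 2)

# Shade each tile by its residual from the independence model
mosaicplot(titanic_passengers, shade = TRUE, main = "Titanic passengers")
```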
Plotting continuous variables
• The display depends on whether the data are univariate, bivariate or multivariate
• Some often used displays for univariate:
– Histograms
– Density plots
– Box-and-whisker plots
– Dot plot
– Stem-and-leaf plots
Plotting continuous variables cont.
• Some bi-variate displays
– Scatter plot (both variables are continuous)
– Box-and-whisker plot (one variable is continuous and the other categorical)
Histogram
• Displays the distribution of observations in intervals called “bins”
• Each bin is represented by a rectangle whose width is the interval
• Intervals can be of equal width throughout (equidistant, R’s default) or not
• The height of each rectangle corresponds to the number of observations falling within the interval (bin); with unequal intervals, hist() plots densities instead so that areas remain comparable
• Generated with the function hist(); note that plot(x, type = "h") draws a vertical line at each value rather than a binned histogram
• hist() constructs the bins from its argument “breaks”
Histogram cont.
• Breaks are the boundary points of each interval (bin)
• Calling hist() without this argument is fine (R computes the breaks, by default using Sturges’ rule), but it is usually worth changing them to show the best picture of the distribution
• The argument “nclass” (kept for compatibility with S) can also be used to suggest the number of bins
• Histograms work best for data with many observations
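A sketch of the breaks options on the same sepal data, using hist(plot = FALSE) so the computed bins can be inspected before drawing. The 0.25 bin width is an arbitrary illustrative choice.

```r
sepal <- iris$Sepal.Length

# Default: Sturges' rule suggests a bin count, then pretty() picks breakpoints
h_default <- hist(sepal, plot = FALSE)
h_default$breaks

# A single number is only a suggestion (it is also passed through pretty())
h_suggest <- hist(sepal, breaks = 15, plot = FALSE)
length(h_suggest$counts)

# A vector of breakpoints is used exactly as given: 16 bins of width 0.25
h_exact <- hist(sepal, breaks = seq(4, 8, by = 0.25), plot = FALSE)
length(h_exact$counts)

plot(h_exact, col = "#99CCFF", main = "Sepal length, bin width 0.25",
     xlab = "Sepal Length")
```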
Histogram cont.
# Example data: Edgar Anderson's Iris Data
sepal <- iris$Sepal.Length
sepal
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4
[18] 5.1 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5
[35] 4.9 5.0 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0
[52] 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8
[69] 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4
[86] 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8
[103] 7.1 6.3 6.5 7.6 4.9 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7
[120] 6.0 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7
[137] 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5 6.2 5.9
Code used to plot
op <- par("mfrow")
par(mfrow = c(1, 2))
hist(sepal, col = "#99CCFF", ann = FALSE)
title("Default breaks (Sturges)", xlab = "Sepal Length", ylab = "Frequency")
hist(sepal, nclass = 15, col = "#6699cc", ann = FALSE)
title("nclass = 15", xlab = "Sepal Length", ylab = "Frequency")
par(mfrow = op)
Density Plots
• Fit a “smooth” curve by computing a kernel density estimate
• The curve estimates the probability density of the data, so the total area under it is 1
dens_sepal <- density(sepal)
plot(dens_sepal, type = "n")
polygon(dens_sepal, col = "#99CCFF")
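A sketch showing what density() returns for the same sepal data: the x and y coordinates of the estimated curve and the bandwidth used. adjust = 2 (an arbitrary choice) doubles the bandwidth to show how smoothing changes the curve.

```r
sepal <- iris$Sepal.Length

# Gaussian kernel (the default) with the default bandwidth rule (bw.nrd0)
d <- density(sepal)
d_smooth <- density(sepal, adjust = 2)   # twice the bandwidth: a smoother curve

# The estimate is a density, so the area under the curve is close to 1
area <- sum(d$y) * diff(d$x[1:2])
area

plot(d, main = "Kernel density of sepal length", xlab = "Sepal Length")
lines(d_smooth, lty = 2)
rug(sepal)   # tick marks at the observations
```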
Box-and-whisker plot (univariate)
• Used to visualise the distribution of data in terms of quartiles
• Shows outliers
• Good for comparisons, as multiple variables or groups can be plotted side by side
states <- as.data.frame(state.x77[, c("Illiteracy", "Life Exp", "Murder", "HS Grad")])
# Layout (1 row by 2 columns)
op <- par("mfrow")
par(mfrow = c(1, 2))
# Visualise distributions
boxplot(states$Illiteracy, col = "#99CCFF")
boxplot(states$'Life Exp', col = "#6699CC")
# Reset original layout
par(mfrow = op)
• Neither distribution has outliers (points beyond the whiskers)
• The first distribution has most of its values at the lower end, suggesting positive skewness (a right tail)
• The second distribution looks almost symmetrical, as its lower and upper quarters (the whiskers) look similar, though its median sits towards the lower part of the box
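The numbers behind a box plot come from boxplot.stats(). This sketch uses a small made-up vector with one obvious outlier so the whisker and outlier rule (by default, more than 1.5 times the box length beyond the hinges) is easy to follow.

```r
# A small vector with one obvious outlier
x <- c(1:10, 50)
s <- boxplot.stats(x)

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # values more than 1.5 box-lengths beyond the hinges

boxplot(x, horizontal = TRUE, col = "#99CCFF")
```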
Dot plots (Uni-variate)
• An alternative to box plot when n (sample size) is small
• They are one-dimensional scatter plots
• Drawn with stripchart() in R
• Example data: 49.3, 48.1, 51.4, 48.1, 49, 49.3, 49.5, 49.8, 49.9, 50.4, 50.1
and 50.3
col <- c("#99CCFF", "#6699CC")  # fill colours reused in the remaining plots
stripchart(round(num, 1), pch = 22, bg = col[1])
title("Dot plot for small sample size", xlab = "Observations")
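The example data above contain ties (48.1 and 49.3 each appear twice), which the default method = "overplot" draws on top of each other. A sketch using method = "stack" on the same twelve values, typed in directly; the offset value is an arbitrary choice.

```r
obs <- c(49.3, 48.1, 51.4, 48.1, 49, 49.3, 49.5, 49.8, 49.9, 50.4, 50.1, 50.3)

# method = "stack" piles tied values on top of each other instead of overplotting
stripchart(obs, method = "stack", pch = 22, bg = "#99CCFF", offset = 0.5)
title("Dot plot with stacked ties", xlab = "Observations")
```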
Stem-and-leaf plot
• Used to show the distribution of observations
• Uses the actual values rather than points
• Here the stem (the whole-number part) is printed on the left and, separated by a vertical bar, the leaves (the first decimal digit of each value) on the right
# Example data (sorted)
sort(round(num, 1))
[1] 48.1 48.1 49.0 49.3 49.3 49.5 49.8 49.9 50.1 50.3 50.4 51.4
# Stem-and-leaf plot
stem(round(num, 1))
The decimal point is at the |
48 | 11
49 | 033589
50 | 134
51 | 4
Scatter plot
• Used to show relationship between two continuous variables
• Relationship is said to exist if points have a visible pattern (positive or
negative)
• No relationship exists if no pattern is visible; the points are simply scattered
plot(states[, 1:2], pch = 21, bg = col[1])
title("Association between Illiteracy and Life Expectancy")
• The scatter plot shows a negative pattern, suggesting an association between “Life Expectancy” and “Illiteracy” (cor = -0.5884779)
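The correlation quoted above can be reproduced from state.x77; this sketch also adds a least-squares line to make the trend easier to see. The colours are arbitrary choices.

```r
states <- as.data.frame(state.x77[, c("Illiteracy", "Life Exp")])

# Pearson correlation between the two variables
r <- cor(states$Illiteracy, states$`Life Exp`)
r

# Scatter plot (first column on x, second on y) with a least-squares line
plot(states, pch = 21, bg = "#99CCFF")
abline(lm(`Life Exp` ~ Illiteracy, data = states), col = "#6699CC", lwd = 2)
```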
Box-and-whisker plot (bi-variate)
• Useful for displaying a numerical variable by the strata (groups) of another, categorical variable
• Can also be used to compare two numerical distributions
# Box plot with slant axis
op <- par("mar")
par(mar = c(7, 4, 4, 2) + 0.1)
# Plot without axis
boxplot(states$`Life Exp`~state.division, col = col[1], xaxt = "n", xlab = "")
# Add axis without labels
axis(1, labels = FALSE)
# Labels as levels of categorical variable
labs <- levels(state.division)
# Add labels
text(1:length(labs), par("usr")[3] - 0.25, srt = 45, adj = 1, labels = labs, xpd = TRUE)
# Add xlab
mtext("Divisions", side = 1, line = 6, font = 2)
# Annotate plot
title("Life expectancy for each US division", ylab = "Life expectancy")
# Reset parameter
par(mar = op)
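If 45-degree labels are not required, las = 2 is a simpler alternative to the text() approach above: it turns the tick labels perpendicular to the axis (90 degrees). A sketch on the same state data; the margin size is an assumption chosen to fit the longest division names.

```r
life_exp <- state.x77[, "Life Exp"]

# Wider bottom margin so the rotated labels fit
op <- par(mar = c(10, 4, 4, 2) + 0.1)
boxplot(life_exp ~ state.division, las = 2, xlab = "", col = "#99CCFF",
        ylab = "Life expectancy")
title("Life expectancy for each US division")
par(op)

# Median life expectancy per division (the thick line in each box)
meds <- tapply(life_exp, state.division, median)
round(sort(meds), 2)
```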
• Box plots can also be used to compare similar distributions
• Example data: Edgar Anderson’s Iris Data
# Comparing lengths (Sepal and Petal)
boxplot(iris[, c("Sepal.Length", "Petal.Length")], col = col)
title("Comparing length of Irises of Gaspe Peninsula")
# Comparing width (Sepal and Petal)
boxplot(iris[, c("Sepal.Width", "Petal.Width")], col = col)
title("Comparing width of Irises of Gaspe Peninsula")
• Sepals appear to be larger than petals in both length and width
• Will this pattern hold under different species?
• The pattern still holds: sepal length is higher than petal length for all species
• The pattern also holds for width, with sepal width higher than petal width for all species; interestingly, setosa has wider sepals than the other species
# High level functions
boxplot(iris$Sepal.Length~iris$Species, col = col[1], ylim = c(min(iris$Petal.Length) - 0.1,
boxplot(iris$Petal.Length~iris$Species, col = 4, add = TRUE)
# Low level functions
legend("bottomright", c("Sepal", "Petal"), pch = 22, pt.bg = c(col[1], 4), title = "Iris Typ
title("Comparison of Iris Length by species", xlab = "Species", ylab = "Length")
# High level functions
boxplot(iris$Sepal.Width~iris$Species, col = col[1], ylim = c(min(iris$Petal.Width) - 0.1, m
boxplot(iris$Petal.Width~iris$Species, col = 4, add = TRUE)
# Low level functions
legend("bottomright", c("Sepal", "Petal"), pch = 22, pt.bg = c(col[1], 4), title = "Iris Typ
title("Comparison of Iris Width by species", xlab = "Species", ylab = "Width")
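The first boxplot() call in each pair above is cut off in the source (its ylim is incomplete). A sketch of the same overlay, assuming ylim is simply the joint range of the sepal and petal values; compare_parts() is a hypothetical helper that draws both the length and the width comparison, and the medians checked at the end back up the pattern described above.

```r
part_cols <- c("#99CCFF", "#6699CC")   # sepal, petal fill colours

# Overlay sepal and petal box plots for one measurement ("Length" or "Width")
compare_parts <- function(measure) {
  sepal <- iris[[paste0("Sepal.", measure)]]
  petal <- iris[[paste0("Petal.", measure)]]
  species <- iris$Species
  ylim <- range(sepal, petal) + c(-0.1, 0.1)   # room for both sets of boxes
  boxplot(sepal ~ species, col = part_cols[1], ylim = ylim, ann = FALSE)
  boxplot(petal ~ species, col = part_cols[2], add = TRUE, ann = FALSE)
  legend("bottomright", c("Sepal", "Petal"), pch = 22, pt.bg = part_cols,
         title = "Iris part")
  title(paste("Comparison of iris", tolower(measure), "by species"),
        xlab = "Species", ylab = measure)
}

op <- par(mfrow = c(1, 2))
compare_parts("Length")
compare_parts("Width")
par(op)

# Is the sepal median above the petal median for every species?
sepal_wins <- sapply(c("Length", "Width"), function(m)
  tapply(iris[[paste0("Sepal.", m)]], iris$Species, median) >
    tapply(iris[[paste0("Petal.", m)]], iris$Species, median))
sepal_wins
```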
ewymefz
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 

Recently uploaded (20)

一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 

R training5

Introduction to Data Analysis and Graphics in R
Hellen Gakuruh
2017-04-03
Slide 5: Graphics in R
Outline
What we will cover:
• Introduction
• High level plotting functions
• Low level plotting functions
• Interacting with graphics
• Modifying a graph
• Plotting dichotomous and categorical variables
• Plotting ordinal variables
• Plotting continuous variables
Introduction
• R is renowned for its plotting facilities. It provides all the well-known graphs, and it also makes it possible to build entirely new types of graph
• There are three well-known graphics systems in R: “base graphics”, “grid graphics” (often used through the lattice package) and “ggplot2”
• When a plot is first drawn, R opens the default graphics device: X11() on Unix, windows() on Windows and quartz() on macOS
• Plotting functions fall under three types of commands: high-level, low-level and interactive
• Plots can be customized with “graphical parameters”
High level plotting functions
• They are designed to generate a complete plot with axes, labels and titles, unless these are suppressed (with graphical parameters)
• They start a new plot
• R’s core plotting function is plot()
• plot() can produce a variety of different plots depending on the type/class of its first argument, because plot() is an S3 generic that dispatches on class(x)
Expected output of “plot()”
• If only “x” is given:
– if it is a time series object (class "ts"), a line plot is produced; otherwise, if it is numeric, a scatter plot of its values against their index is generated
– if class(x) is "factor", a bar plot of the level counts is produced
– if class(x) is "character", plot() raises an error because it cannot compute finite axis limits for the plotting window
• If two variables are given and both are numeric, the output is a scatter plot
Expected output of “plot()” cont.
• If a factor and a numeric vector are given (factor first), box plots of the numeric vector for each factor level are produced
• If both vectors are factors, a spine plot (a stacked bar plot whose bar widths are proportional to the counts of the first factor) is produced
• If the object passed is not a vector but a matrix, data frame or list, plot() dispatches on that class as well; for example, a data frame with more than two columns gives a scatterplot matrix
• Below we produce a few of these with plain plot(obj), without changing or adding other arguments
Time series object
ts <- ts(rnorm(12, 50), start = 1, end = 12, frequency = 1)
class(ts)
[1] "ts"
plot(ts)
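A minimal sketch of the dispatch described above. method_for() is a hypothetical helper (not part of R) that uses utils::getS3method() to report which plot() method would handle each kind of object. It assumes only the methods that ship with R (plot.ts in stats, plot.factor and plot.data.frame in graphics). Packages you load may register others.

```r
# Hypothetical helper: name the plot() method R would dispatch to for x.
method_for <- function(x) {
  for (cls in class(x)) {
    # optional = TRUE makes getS3method() return NULL instead of erroring
    if (!is.null(utils::getS3method("plot", cls, optional = TRUE))) {
      return(paste0("plot.", cls))
    }
  }
  "plot.default"  # fallback, e.g. for plain numeric vectors
}

example_objects <- list(
  ts         = ts(rnorm(12, 50)),
  numeric    = rnorm(12, 50),
  factor     = factor(c("Y", "N", "Y")),
  data.frame = data.frame(a = 1:3, b = 4:6, c = 7:9)
)

dispatch <- sapply(example_objects, method_for)
dispatch
```

Calling plot(example_objects$data.frame) afterwards should draw the scatterplot matrix that plot.data.frame produces for three or more columns.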
Numeric vector
num <- rnorm(12, 50)
class(num)
[1] "numeric"
plot(num)
Factor vector
fac <- factor(sample(c("Y", "N"), 100, TRUE, c(0.7, 0.3)))
class(fac)
[1] "factor"
plot(fac)
Two numeric vectors
num2 <- rnorm(12, 88)
class(num2)
[1] "numeric"
plot(num, num2)
Factor and numeric vector
set.seed(5)
num3 <- rnorm(100, 88)
class(num3)
[1] "numeric"
plot(fac, num3)
Two factor vectors
fac2 <- factor(sample(c("F", "M"), 100, TRUE, c(0.8, 0.2)))
class(fac2)
[1] "factor"
plot(fac, fac2)
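The spine plot drawn by plot(fac, fac2) can be reproduced from a contingency table. The sketch below regenerates similar data with an assumed seed (set.seed(1)), so its counts will not match the slides. It calls spineplot() directly and computes the bar widths and segment heights that the plot displays.

```r
set.seed(1)  # assumed seed, for reproducibility only
fac  <- factor(sample(c("Y", "N"), 100, TRUE, c(0.7, 0.3)))
fac2 <- factor(sample(c("F", "M"), 100, TRUE, c(0.8, 0.2)))

counts <- table(fac, fac2)
# Bar widths: proportional to the marginal counts of the first factor
widths <- prop.table(margin.table(counts, 1))
# Stacked segments: conditional proportions of fac2 within each level of fac
heights <- prop.table(counts, 1)
widths
heights

# Draws the same display as plot(fac, fac2)
spineplot(fac, fac2)
```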
Summary
• In all these plots, axes and axis labels (but not a title) and, in some, colors are supplied by default, which makes them communicative
• However, they might not meet aesthetic requirements. This can be changed by passing other arguments, including ones that suppress the axes
Other arguments to “plot”
• The type of plot produced by plot() depends on the first (and “y”) argument, but how it is drawn depends on the values passed to the other arguments
• The plot type can also be changed with the argument “type”, but do this only when you are sure it makes sense
• “xlim” and “ylim” define the x and y limits (minimum and maximum axis values). These can be changed, especially if a bit more padding is needed
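A sketch of changing type and padding the axis limits. extendrange() from grDevices widens a range by a fraction f of its length on each side. The 10% and 5% padding values and the seed are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed
num <- rnorm(12, 50)

lims_y <- extendrange(num, f = 0.10)                # 10% padding above and below
lims_x <- extendrange(c(1, length(num)), f = 0.05)  # 5% padding left and right

plot(num, type = "b",  # "b" = both points and connecting lines
     xlim = lims_x, ylim = lims_y,
     xlab = "Index", ylab = "Value")
```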
Other arguments to “plot” cont.
• For customized axes, such as log-scale labels, the default axes can be suppressed with axes = FALSE and redrawn with axis()
• To annotate a plot with additional graphical parameters, add them as arguments to high- and low-level plotting functions or make a call to par() (more on this later; read ?par)
Other High-level plots
• hist() for histograms (univariate continuous distributions)
• boxplot() for box-and-whisker plots (for univariate numerical variables, alone or categorised by a categorical variable)
• barplot() for bar plots (for categorical distributions)
• pie() for pie charts (for categorical distributions)
Low level plotting functions
• These functions add more information to an existing plot
• They are used to customize plots
• Some of the most frequently used functions are points(), lines(), text(), title(), abline(), polygon(), legend() and axis()
• We use some of these when plotting the example distributions below
Interacting with graphics
• Interaction means extracting or adding information to a plot with the mouse (rather than typing in data to plot)
• The two functions for interaction in R are locator() and identify()
• locator(n, type): select up to “n” points with the left mouse button. If type is not specified, a list with two components, x and y, is returned; otherwise the selected points are also plotted according to “type”
• locator() is particularly handy for positioning legends and labels, e.g. text(locator(1), "Outlier", adj = 0)
Interacting with graphics cont.
• identify(x, y, labels) is used to highlight any of the points defined by x and y (using the left mouse button)
• It can be used to identify certain points and, optionally, label them
Demonstration on interacting with graphics
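A sketch of the high-level/low-level workflow on the built-in cars data. plot() creates the figure, and abline(), lines(), points(), legend() and title() add to it. Colors and labels are arbitrary choices. locator() and identify() only work on an interactive screen device, so that part is guarded with interactive() and is skipped when the script runs non-interactively.

```r
# High-level call: starts a new plot with axes and axis labels
plot(dist ~ speed, data = cars, pch = 21, bg = "#99CCFF",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Low-level calls: add to the existing plot
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "#6699CC", lwd = 2)                        # straight-line fit
lines(lowess(cars$speed, cars$dist), lty = 2)                # smooth trend
points(mean(cars$speed), mean(cars$dist), pch = 4, cex = 2)  # centroid
legend("topleft", legend = c("Linear fit", "Lowess"),
       lty = c(1, 2), col = c("#6699CC", "black"))
title("Stopping distance by speed")

# Interaction needs a screen device and a user, so only run it interactively
if (interactive()) {
  picked <- identify(cars$speed, cars$dist, n = 2)  # click two points
  text(locator(1), "Clicked here", adj = 0)         # click where to write
}
```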
Graphical parameters “par()”
• Almost every aspect of a plot can be customized with graphical parameters
• Graphical parameters come as “name = value” pairs, each with a default value
• To see the current values of all parameters, call par() for the complete list
• For a specific parameter, call par() with its name, e.g. par("mfrow")
• Parameters can be changed globally with par() (not recommended, because the change persists on the device) or individually, as arguments to a single plotting call
Plotting dichotomous and categorical variables
• How a distribution is plotted depends on whether it is univariate (one variable), bi-variate (two variables) or multi-variate
• Plots for univariate categorical variables (dichotomous included) are:
– Pie charts (for few values, e.g. 2)
– Bar plots, and
– Cleveland’s dot plots
Plotting dichotomous and categorical variables cont.
• Bi-variate plots
– Stacked/side-by-side bar plots
– Four-fold displays
• Multi-variate plots
– Mosaic plots
– Four-fold plots
Pie chart
• Suitable when there are few categories
• Useful for showing percentages
• Highly discouraged because angles are hard to compare; in addition, it uses a lot of ink
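As an alternative to hand-placed text() labels, pie() can position percentage labels itself through its labels argument. This sketch generates the data exactly as in the example that follows. The label wording ("No (32%)" style) is an assumption.

```r
set.seed(5)
response <- sample(c("Yes", "No"), 300, TRUE, c(0.68, 0.32))
tab_response <- table(response)

# Percentages rounded to whole numbers, attached to the category names
pct <- round(100 * prop.table(tab_response))
pie_labels <- paste0(names(tab_response), " (", pct, "%)")

pie(tab_response, labels = pie_labels, col = c("#99CCFF", "#6699CC"))
```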
Pie chart example
set.seed(5)
response <- sample(c("Yes", "No"), 300, TRUE, c(0.68, 0.32))
tab_response <- table(response)
pie(tab_response, col = c("#99CCFF", "#6699CC"))
labs <- paste0("(", round(as.vector(prop.table(tab_response) * 100)), "%)")
text(x = c(0.78, -0.50), y = c(0.80, -1), labels = c(labs[1], labs[2]))
Bar plot
• Consists of a sequence of rectangular bars whose heights are given by the values plotted
• Ideally, bars should be ordered by frequency rather than by bar label
• Not recommended because of its low data-ink ratio (an alternative is Cleveland’s dot plot)
Bar plot cont.
barplot(sort(tab_response, decreasing = TRUE), las = 1, col = c("#6699CC", "#99CCFF"))
title("Bar chart", xlab = "Response", ylab = "Frequency")
Cleveland’s dot plot
• An alternative to the bar chart (with a higher data-ink ratio)
• As an example, generate a Cleveland’s dot plot of the following data set. It should:
– be titled “Total students trained by quarters (2016)”
– have an x axis titled “Total students trained”
– have a sub-title “Data Mania Inc” (grey and slanted), and
– have a y axis titled “Quarters”, labelled according to the (ordered) months given (March, June, September and December)
– have blue points
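A sketch of one way to meet this specification, using the data generated on the next slide. The exact grey ("grey40"), the explicit pch = 21 and the y-label margin line are assumptions, not the author's settings. font.sub = 3 selects italic, which gives the slanted sub-title.

```r
set.seed(5)
months <- sample(month.abb[c(3, 6, 9, 12)], size = 300, replace = TRUE)
tab_months <- table(months)[c("Mar", "Jun", "Sep", "Dec")]

dotchart(as.numeric(tab_months),
         labels = names(tab_months),  # y axis labelled by quarter-end month
         pch = 21, bg = "blue",       # filled blue points
         xlab = "Total students trained")
title("Total students trained by quarters (2016)",
      sub = "Data Mania Inc", col.sub = "grey40",
      font.sub = 3)                   # 3 = italic (slanted)
title(ylab = "Quarters", line = 2.5)  # y-axis title, clear of the labels
```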
Cleveland’s dot plot cont.
• Example data: hypothetical random numbers of students trained, totalled by quarter, for the year 2016
set.seed(5)
months <- sample(month.abb[c(3, 6, 9, 12)], size = 300, replace = TRUE)
tab_months <- table(months)[c("Mar", "Jun", "Sep", "Dec")]
tab_months
months
Mar Jun Sep Dec 
 81  78  60  81 
Cleveland’s dot plot
dotchart(as.numeric(tab_months), xlab = "Total students trained", ylab = "Quarters", bg = 4 …
title("Total students trained by quarters (2016)", sub = "Data Mania Inc.", font.sub = 3, c …
axis(2, at = 1:4, labels = names(tab_months), las = 2)
Bi-variate stacked/side-by-side bar plots and dot plot
• Following the earlier example, generate a stacked/side-by-side bar plot and a bi-variate Cleveland’s dot plot
• Adding a second variable: the gender composition of the students trained
Bi-variate stacked/side-by-side bar plots and dot plot cont.
set.seed(5)
gender <- sample(c("Female", "Male"), 300, TRUE, c(0.7, 0.3))
monthgen_tab <- table(gender, months)[, c("Dec", "Sep", "Jun", "Mar")]
monthgen_tab
        months
gender   Dec Sep Jun Mar
  Female   0  49  78  81
  Male    81  11   0   0
• Note that set.seed(5) is called again before sampling gender, so gender is generated from the same random numbers as months. The two variables are therefore not independent, which explains the zero cells above
Bi-variate stacked/side-by-side bar plots and dot plot cont.
barplot(monthgen_tab, col = c("#6699CC", "#99CCFF"), beside = TRUE)
legend("topright", legend = c("Female", "Male"), pch = 22, pt.bg = c("#6699CC", "#99CCFF"), …
title("Students trained by gender and month (2016)", xlab = "Month", ylab = "Number trained …
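The legend and title calls above are cut off in the slides, so here is a complete, runnable sketch of both the side-by-side bar plot and the bivariate dot plot. Unlike the slides, it calls set.seed() only once, so gender is drawn independently of month and the counts will differ from the table shown. The legend placement, the headroom and the colors are assumptions.

```r
set.seed(5)  # seeded once, so gender and month are drawn independently
months <- sample(month.abb[c(3, 6, 9, 12)], size = 300, replace = TRUE)
gender <- sample(c("Female", "Male"), 300, TRUE, c(0.7, 0.3))
monthgen_tab <- table(gender, months)[, c("Mar", "Jun", "Sep", "Dec")]
fills <- c("#6699CC", "#99CCFF")

# Side-by-side bars: one group per month, one bar per gender
barplot(monthgen_tab, beside = TRUE, col = fills, las = 1,
        ylim = c(0, max(monthgen_tab) * 1.2))  # headroom for the legend
legend("topright", legend = rownames(monthgen_tab),
       pch = 22, pt.bg = fills, pt.cex = 2, bty = "n")
title("Students trained by gender and month (2016)",
      xlab = "Month", ylab = "Number trained")

# Bivariate Cleveland dot plot: columns (months) become groups,
# rows (genders) become the labelled dots within each group
dotchart(unclass(monthgen_tab), bg = 4, xlab = "Total number trained")
title("Total students trained by gender and month",
      sub = "Data Mania Inc.", font.sub = 3)
```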
Bi-variate Cleveland’s dot plot
dotchart(as.matrix(monthgen_tab)[, c("Mar", "Jun", "Sep", "Dec")], bg = 4, xlab = "Total num …
title("Total students trained by gender and month", sub = "Data Mania Inc.", font.sub = 3, …
title(ylab = "Gender and month", line = 2.5)
Four-fold plots
• Used to display an association (or the lack of one)
• Designed for two binary variables (2 x 2 tables); these can be stratified by a third categorical variable with k levels (2 x 2 x k tables)
• An association is indicated if diagonally opposite cells in one direction tend to differ in size from those in the other direction
• Color is used to show this direction
Four-fold plots cont.
• The rings around each circle are confidence rings. If the rings of adjacent quadrants overlap, the data are consistent with H0: no association
• Example data: R’s “Titanic” data (passengers only, i.e. excluding the crew)
# Convert Titanic data: drop the crew (4th Class level) and sum over Class
titanic_passengers <- colSums(Titanic[-4, , , ])
titanic_passengers
, , Survived = No

        Age
Sex      Child Adult
  Male      35   659
  Female    17   106

, , Survived = Yes

        Age
Sex      Child Adult
  Male      29   146
  Female    28   296
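Each panel of a four-fold plot summarises the odds ratio of its 2 x 2 table, where a value of 1 corresponds to no association. A sketch computing it per survival stratum. odds_ratio() is a hypothetical helper, and the data are built as above.

```r
titanic_passengers <- colSums(Titanic[-4, , , ])  # drop crew, sum over Class

# Hypothetical helper: odds ratio of a 2 x 2 table (ad / bc)
odds_ratio <- function(tab2x2) {
  (tab2x2[1, 1] * tab2x2[2, 2]) / (tab2x2[1, 2] * tab2x2[2, 1])
}

# One Sex x Age odds ratio per level of Survived (the third dimension)
or_by_survival <- apply(titanic_passengers, 3, odds_ratio)
round(or_by_survival, 2)  # No: 0.33, Yes: 2.10
```

Here the odds ratio compares the odds of being a child for male versus female passengers. Among non-survivors it is below 1, and among survivors it is above 1.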
Four-fold plot for Titanic passengers
# Plotting four-fold plot
fourfoldplot(titanic_passengers, std = "margins")
• The plot shows an association (rings do not overlap and diagonally opposite cells differ in size) between Titanic passengers’ age (child/adult) and sex (male/female), stratified by survival status (No/Yes)
• A four-fold plot differs from a pie chart in that it varies the radius while holding the angle constant, whereas a pie chart varies the angle while holding the radius constant
Mosaic plots
• Originally proposed by Hartigan and Kleiner (1981, 1984)
• Similar to a divided bar plot: it displays the counts of a contingency table directly as tiles whose area is proportional to the observed cell frequency
• Later extended by Friendly (1992, 1994b)
• The extended version generates greater visual impact by using color and shading to reflect the size of the residuals from independence (no association)
• Used for exploratory data analysis (to establish associations) and model building (to display the residuals of a log-linear model)
mosaicplot(titanic_passengers, color = TRUE)
• The width of each column of tiles in the figure above is proportional to the marginal frequency of each sex, and the height of each tile is determined by the conditional probabilities of the row variable (age) within each column (sex)
# Height of tiles
prop.table(apply(titanic_passengers, 1:2, sum), 1)
        Age
Sex           Child     Adult
  Male   0.07364787 0.9263521
  Female 0.10067114 0.8993289
Plotting continuous variables
• The display depends on whether the data are univariate, bi-variate or multivariate
• Some often-used displays for univariate data:
– Histograms
– Density plots
– Box-and-whisker plots
– Dot plots
– Stem-and-leaf plots
Plotting continuous variables cont.
• Some bi-variate displays:
– Scatter plot (both variables are continuous)
– Box-and-whisker plot (one variable is continuous and the other categorical)
Histogram
• Displays the distribution of observations in intervals called “bins”
• Each bin is represented by a rectangle whose width is the bin interval
• Intervals can be equal throughout (equidistant, R’s default) or not
• The height of each rectangle corresponds to the number of observations falling within that interval (bin); with unequal bins, hist() plots densities instead, so that area stays proportional to the count
• Generated with the function hist(). Note that plot(x, type = "h") only draws histogram-like vertical lines at each value, not binned counts
• hist() constructs bins from the argument “breaks”
Histogram cont.
• Breaks are the break points of each interval or bin
• Calling hist() without this argument is fine (R will compute them), but it is usually worth changing them to give the best picture of the distribution
• The argument “nclass” (for compatibility with S) can also be used to suggest the number of bins
• Histograms are excellent for data with numerous observations
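A sketch for inspecting the bins hist() would use without drawing anything. By default hist() takes the Sturges rule as a suggested number of classes and lets pretty() pick round break points. The 0.25-wide breaks in the second call are an arbitrary example of explicit breaks.

```r
sepal <- iris$Sepal.Length

nclass.Sturges(sepal)                   # suggested number of bins: 9
h_default <- hist(sepal, plot = FALSE)  # compute bins, draw nothing
h_default$breaks                        # break points chosen via pretty()
h_default$counts                        # observations per bin

# Explicit, equally spaced breaks of width 0.25 covering the data range
h_fine <- hist(sepal, breaks = seq(4, 8, by = 0.25), plot = FALSE)
length(h_fine$counts)                   # 16 bins
```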
Histogram cont.
# Example data: Edgar Anderson's Iris Data
sepal <- iris$Sepal.Length
sepal
  [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4
 [18] 5.1 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5
 [35] 4.9 5.0 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0
 [52] 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8
 [69] 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4
 [86] 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8
[103] 7.1 6.3 6.5 7.6 4.9 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7
[120] 6.0 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7
[137] 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5 6.2 5.9
Code used to plot
op <- par("mfrow")
par(mfrow = c(1, 2))
hist(sepal, col = "#99CCFF", ann = FALSE)
title("Default breaks (Sturges)", xlab = "Sepal Length", ylab = "Frequency")
hist(sepal, nclass = 15, col = "#6699cc", ann = FALSE)
title("nclass = 15", xlab = "Sepal Length", ylab = "Frequency")
par(mfrow = op)
Density plots
• Fit a “smooth” curve by computing kernel density estimates
• Based on probability theory
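To make “kernel density estimate” concrete: with a Gaussian kernel and bandwidth h, the estimate at a point x0 is the average of normal curves centred on each observation, f_hat(x0) = (1 / (n h)) Σ dnorm((x0 − x_i) / h). The sketch below compares a hand-written version, kde_at() (a hypothetical helper), with density() at one assumed point, x0 = 6. density() evaluates on a grid using binning and an FFT, so the two agree only approximately.

```r
sepal <- iris$Sepal.Length
h <- bw.nrd0(sepal)  # density()'s default bandwidth rule

# Hypothetical helper: Gaussian KDE evaluated directly at one point
kde_at <- function(x0, x, h) mean(dnorm((x0 - x) / h)) / h

dens_sepal <- density(sepal, bw = h, kernel = "gaussian")

manual       <- kde_at(6, sepal, h)
from_density <- approx(dens_sepal$x, dens_sepal$y, xout = 6)$y
c(manual = manual, density = from_density)

# A density integrates to (about) 1 over the evaluation grid
area <- sum(dens_sepal$y) * diff(dens_sepal$x[1:2])
area
```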
dens_sepal <- density(sepal)
plot(dens_sepal, type = "n")
polygon(dens_sepal, col = "#99CCFF")
Box-and-whisker plot (univariate)
• Used to visualize a data distribution in terms of quartiles
• Shows outliers
• Good for comparisons, as multiple variables or groups can be plotted side by side
states <- as.data.frame(state.x77[, c("Illiteracy", "Life Exp", "Murder", "HS Grad")])
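The layout code that follows saves mfrow, changes it, and restores it by hand. A common alternative, sketched here with a hypothetical helper side_by_side(), is to keep the value that par() returns when it sets parameters and restore it with on.exit(). The restore then also runs if a plotting call fails.

```r
# Hypothetical helper: two box plots side by side, layout always restored
side_by_side <- function(x, y) {
  old <- par(mfrow = c(1, 2))  # setting par() returns the previous values
  on.exit(par(old))            # runs when the function exits, even on error
  boxplot(x, col = "#99CCFF")
  boxplot(y, col = "#6699CC")
  invisible(NULL)
}

states <- as.data.frame(state.x77[, c("Illiteracy", "Life Exp", "Murder", "HS Grad")])
side_by_side(states$Illiteracy, states$`Life Exp`)
par("mfrow")  # back to its previous value (1 1 on a fresh device)
```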
# Layout (1 row by 2 columns)
op <- par("mfrow")
par(mfrow = c(1, 2))
# Visualise distributions
boxplot(states$Illiteracy, col = "#99CCFF")
boxplot(states$`Life Exp`, col = "#6699CC")
# Reset original layout
par(mfrow = op)
• Neither distribution has outliers (points beyond the whiskers)
• The first distribution has most of its values at the lower end, suggesting positive skewness (a right tail)
• The second distribution looks almost symmetrical, as its lower and upper quarters look the same, though its median lies more towards the lower side of the box
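A sketch of the numbers behind a box plot. boxplot() draws the statistics returned by boxplot.stats(): the hinges and median from fivenum(), and whiskers that extend to the most extreme data points within coef × (hinge spread) of the box, with coef = 1.5 by default. Points beyond that are drawn individually as outliers.

```r
states <- as.data.frame(state.x77[, c("Illiteracy", "Life Exp", "Murder", "HS Grad")])
x <- states$Illiteracy

bs <- boxplot.stats(x, coef = 1.5)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points beyond the whiskers (plotted individually)

# Reproduce the upper whisker by hand
hinge_spread <- bs$stats[4] - bs$stats[2]
upper_limit  <- bs$stats[4] + 1.5 * hinge_spread
max(x[x <= upper_limit])  # same as bs$stats[5]
```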
Dot plots (univariate)
• An alternative to the box plot when n (the sample size) is small
• They are one-dimensional scatter plots
• Called a stripchart in R
• Example data: 49.3, 48.1, 51.4, 48.1, 49, 49.3, 49.5, 49.8, 49.9, 50.4, 50.1 and 50.3
col <- c("#99CCFF", "#6699CC")  # assumed palette: col is used below but is not defined in these slides
stripchart(round(num, 1), pch = 22, bg = col[1])
title("Dot plot for small sample size", xlab = "Observations")
Stem-and-leaf plot
• Used to show the distribution of observations
• Uses the actual values rather than points
• The stem is the whole number and is shown on the left; on the right (after a vertical bar) are the leaves, here the first decimal digit
# Example data (sorted)
sort(round(num, 1))
 [1] 48.1 48.1 49.0 49.3 49.3 49.5 49.8 49.9 50.1 50.3 50.4 51.4
# Stem-and-leaf plot
stem(round(num, 1))

  The decimal point is at the |

  48 | 11
  49 | 033589
  50 | 134
  51 | 4

Scatter plot
• Used to show the relationship between two continuous variables
• A relationship is said to exist if the points show a visible pattern (positive or negative)
• No relationship exists if no pattern is visible and the points are simply scattered
plot(states[, 1:2], pch = 21, bg = col[1])
title("Association between Illiteracy and Life Expectancy")
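To put a number on the pattern in this scatter plot, Pearson's correlation coefficient can be computed with cor(), and cor.test() tests H0: no linear association. A short sketch on the same states data:

```r
states <- as.data.frame(state.x77[, c("Illiteracy", "Life Exp", "Murder", "HS Grad")])

r <- cor(states$Illiteracy, states$`Life Exp`)
r  # negative: higher illiteracy goes with lower life expectancy

# Test of H0: true correlation is 0
p_value <- cor.test(states$Illiteracy, states$`Life Exp`)$p.value
p_value
```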
• The scatter plot shows a negative pattern, suggesting an association between “Life Expectancy” and “Illiteracy” (cor = -0.5884779)
Box-and-whisker plot (bi-variate)
• Useful to display a numerical variable by the strata (groups) of another, categorical, variable
• Can also be used to compare two numerical distributions
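A grouped box plot draws one set of five-number statistics per level of the grouping factor. The sketch computes the per-division medians with tapply() and checks them against the statistics that boxplot(plot = FALSE) returns without drawing anything. state.division is a built-in factor aligned with the rows of state.x77.

```r
states <- as.data.frame(state.x77[, c("Illiteracy", "Life Exp", "Murder", "HS Grad")])
life_exp <- states$`Life Exp`

# Median life expectancy in each US division
by_division <- tapply(life_exp, state.division, median)
sort(by_division)

# Row 3 of $stats holds the medians, one column per division
bp <- boxplot(life_exp ~ state.division, plot = FALSE)
bp$stats[3, ]
```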
# Box plot with slanted axis labels
op <- par("mar")
par(mar = c(7, 4, 4, 2) + 0.1)
# Plot without the x axis
boxplot(states$`Life Exp` ~ state.division, col = col[1], xaxt = "n", xlab = "")
# Labels as levels of the categorical variable
labs <- levels(state.division)
# Add axis ticks (one per box) without labels
axis(1, at = seq_along(labs), labels = FALSE)
# Add slanted labels below the axis
text(seq_along(labs), par("usr")[3] - 0.25, srt = 45, adj = 1, labels = labs, xpd = TRUE)
# Add x-axis title
mtext("Divisions", side = 1, line = 6, font = 2)
# Annotate plot
title("Life expectancy for each US division", ylab = "Life expectancy")
# Reset parameter
par(mar = op)
• A box plot can also be used to compare similar distributions
• Example data: Edgar Anderson’s Iris Data
# Comparing lengths (Sepal and Petal)
boxplot(iris[, c("Sepal.Length", "Petal.Length")], col = col)
title("Comparing length of Irises of Gaspe Peninsula")
# Comparing widths (Sepal and Petal)
boxplot(iris[, c("Sepal.Width", "Petal.Width")], col = col)
title("Comparing width of Irises of Gaspe Peninsula")
• Sepals seem to be larger than petals in both length and width
• Will this pattern hold within each species?
• The pattern still holds: sepal length is greater than petal length across all species
• The pattern still holds, as sepal width is greater than petal width across all species. However, it is interesting to see that the sepal width of “setosa” is greater than that of the other species.
# High level functions
boxplot(iris$Sepal.Length ~ iris$Species, col = col[1], ylim = c(min(iris$Petal.Length) - 0.1, …
boxplot(iris$Petal.Length ~ iris$Species, col = 4, add = TRUE)
# Low level functions
legend("bottomright", c("Sepal", "Petal"), pch = 22, pt.bg = c(col[1], 4), title = "Iris Typ …
title("Comparison of Iris Length by species", xlab = "Species", ylab = "Length")
# High level functions
boxplot(iris$Sepal.Width ~ iris$Species, col = col[1], ylim = c(min(iris$Petal.Width) - 0.1, m …
boxplot(iris$Petal.Width ~ iris$Species, col = 4, add = TRUE)
# Low level functions
legend("bottomright", c("Sepal", "Petal"), pch = 22, pt.bg = c(col[1], 4), title = "Iris Typ …
title("Comparison of Iris Width by species", xlab = "Species", ylab = "Width")
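The ylim and legend calls above are cut off in the slides, so here is a runnable sketch of the overlaid sepal/petal box plots. It wraps them in a hypothetical helper, compare_by_species(). The y limits are taken from the combined range of both variables plus 0.1 of padding. That choice, the palette in col and the legend title "Measurement" are assumptions. The group means at the end check the claims made on the slides.

```r
col <- c("#99CCFF", "#6699CC")  # assumed palette

# Hypothetical helper: sepal boxes, with petal boxes drawn on top
compare_by_species <- function(sepal, petal, what) {
  lims <- range(sepal, petal) + c(-0.1, 0.1)  # room for both variables
  boxplot(sepal ~ iris$Species, col = col[1], ylim = lims, ann = FALSE)
  boxplot(petal ~ iris$Species, col = 4, add = TRUE)
  legend("bottomright", c("Sepal", "Petal"), pch = 22,
         pt.bg = c(col[1], 4), title = "Measurement")
  title(paste("Comparison of Iris", what, "by species"),
        xlab = "Species", ylab = what)
}

compare_by_species(iris$Sepal.Length, iris$Petal.Length, "Length")
compare_by_species(iris$Sepal.Width, iris$Petal.Width, "Width")

# Species means, to check the pattern described on the slides
means <- aggregate(cbind(Sepal.Length, Petal.Length, Sepal.Width, Petal.Width) ~ Species,
                   data = iris, FUN = mean)
means
```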