Exploratory Data Analysis 
Wesley GOI
In today’s session 
• Principles behind exploratory analyses 
• Plotting data with popular exploratory graphs
• Plotting systems in R
  • Base (Week 1)
  • Lattice (Week 2)
  • ggplot2 (Week 2)
• Choosing and using graphics devices, aka the output formats
Scripts can be downloaded at: 
https://www.dropbox.com/s/ii1yj8f650d4l1q/lesson1.r?dl=0 
https://www.dropbox.com/s/eme44h6lrhn775l/final.r?dl=0
Principles behind exploratory analyses 
• Show comparisons 
• Show causality, mechanism, explanation 
• Show multivariate data 
• Integrate multiple modes of evidence 
• Describe and document the evidence 
• Content is king 
• SPEED
Dimensionality 
• Five-number summary 
• Boxplots 
• Histograms 
• Density plot 
• Barplot 
Multiple overlaid 1D plots
Scatter plots
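The five-number summary and the density plot are listed above but have no code of their own in this session. A minimal sketch, assuming a small made-up vector `x` (not part of the course data):

```r
# Hypothetical sample values, used only for illustration
x <- c(2, 4, 4, 5, 7, 9, 12, 15, 21)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# Kernel density estimate, the smooth counterpart of a histogram
dens <- density(x)
plot(dens, main = "Density of x")
rug(x)  # one tick per observation
```

boxplot() draws the same five numbers as a box and whiskers, which is why it sits next to the summary in the list.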
Downloading our dataset
R code
dir.create("exploring_data")
setwd("exploring_data")
download.file("http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/therbook.zip", destfile="data.zip")
unzip("data.zip")
Boxplots
R code
# h=T is shorthand for header=TRUE: the first row holds the column names
weather = read.table("SilwoodWeather.txt", header=TRUE)
onemonth = subset(weather,
  month==1 & yr == 2004)
boxplot(onemonth$rain)
Histograms
R code
hist(weather$upper)
rug(weather$upper) # ticks for each value
Barplot
R code
barplot(
  table(weather$month),
  col = "wheat",
  main = "Number of Observations in Months")
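The barplot above relies on table() to count the rows per month before anything is drawn. A small sketch of that step, assuming a made-up `month` vector in place of weather$month:

```r
# Hypothetical month column, standing in for weather$month
month <- c(1, 1, 2, 2, 2, 3)

# table() counts how many rows fall in each month
counts <- table(month)
print(counts)

# barplot() then draws one bar per count
barplot(counts, col = "wheat", main = "Number of Observations in Months")
```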
Graphics devices (grDevices)
               Raster   Vector   Vector
Format         PNG      PDF      SVG
File size      small    medium   medium
Scalable       No       Yes      Yes
Web friendly   Yes      No       Yes
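Using a graphics device follows the same three steps whatever the format: open the device, plot, close the device. A minimal sketch with pdf(), assuming a temporary file as the (hypothetical) output path and random numbers as data:

```r
# Write to a temporary file so the sketch leaves nothing behind
out <- tempfile(fileext = ".pdf")

pdf(out, width = 7, height = 5)   # open a vector device (size in inches)
hist(rnorm(100), main = "Example histogram")
dev.off()                         # close the device to finish writing the file

file.exists(out)
# png() and svg() from grDevices follow the same pattern: open, plot, dev.off()
```

Forgetting dev.off() is the usual reason a file turns out empty or unreadable.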
Plotting Systems
                        Base          Lattice                 Grid
Libraries               -             lattice                 grid, gridExtra, ggplot2
Example functions       hist ✔        xyplot (scatterplots)   qplot
                        barplot ✔     bwplot (boxplots)       ggplot
                        boxplot ✔     levelplot               geom
                        plot
Faceted plots           Yes           Yes                     Yes
Grammar of graphics     No            No                      Yes
Interface with          Yes           Partial                 Partial + workarounds
statistical functions
Note: the systems cannot be mixed.
Base plots: Scatterplot
R code
data1 = read.table("scatter1.txt", header=TRUE)
data2 = read.table("scatter2.txt", header=TRUE)
#Color: col sets the colour of the points
with(data1, plot(xv, ys, col="red"))
#Regression line: abline() draws the line fitted by lm()
with(data1, abline(lm(ys~xv)))
Base plots: Scatterplot
Set the symbol used to represent each data point
R code
#Shape: points() adds the second dataset to the existing plot; pch sets the symbol
with(data2,
  points(xv2, ys2, col="blue",
    pch=11))
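The scatterplot steps above can be sketched end to end. This version assumes simulated data in place of scatter1.txt and scatter2.txt (the names `d1`, `d2` and the legend labels are illustrative), and adds a legend, which the slides do not show:

```r
set.seed(1)  # simulated data, standing in for scatter1.txt and scatter2.txt
d1 <- data.frame(xv = 1:20)
d1$ys <- 2 + 0.5 * d1$xv + rnorm(20)
d2 <- data.frame(xv2 = 1:20)
d2$ys2 <- 12 - 0.3 * d2$xv2 + rnorm(20)

# The first plot() call fixes the axes, so ylim must cover both series
ylims <- range(d1$ys, d2$ys2)
plot(d1$xv, d1$ys, col = "red", ylim = ylims, xlab = "xv", ylab = "ys")

fit <- lm(ys ~ xv, data = d1)
abline(fit)                                     # fitted regression line

points(d2$xv2, d2$ys2, col = "blue", pch = 11)  # second series, new symbol

legend("topleft", legend = c("data1", "data2"),
       col = c("red", "blue"), pch = c(1, 11))
```

Because points() never rescales the axes, any values outside the range set by the first plot() call would silently fall off the plot; computing `ylims` up front avoids that.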
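Base plots: Using par for multiple plots
R code
par(mfrow=c(1,2)) # 1 row, 2 columns of plots
with(data1, plot(xv, ys, col="red"))
with(data1, abline(lm(ys~xv)))
#Plot2
with(data2,
  plot(xv2, ys2, col="blue",
    pch=11))
# outer=TRUE draws the title in the outer margin, which needs room via par(oma=...)
title("My Title", outer=TRUE)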
Par: To set global settings
R code
par(
  mar=c(5.1,4.1,4.1,2.1), # inner margins in lines: bottom, left, top, right
  oma=c(2,2,2,2) # outer margins, same order
)
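Since par() changes settings globally, it is often worth saving the old values and restoring them afterwards. A minimal sketch under that assumption, using throwaway data and panel titles:

```r
old <- par(mfrow = c(1, 2),              # 1 row, 2 columns of plots
           mar = c(5.1, 4.1, 4.1, 2.1),  # inner margins: bottom, left, top, right
           oma = c(0, 0, 2, 0))          # outer margins; room at the top for a title

plot(1:10, main = "Left")
plot(10:1, main = "Right")
title("My Title", outer = TRUE)          # drawn in the outer margin

par(old)                                 # restore the previous settings
```

par() returns the previous values of whatever it changes, which is what makes the `old <- par(...)` and `par(old)` pairing work.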
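Lattice
R code
productivity = read.table("productivity.txt", header=TRUE)
# Number of mammal species in forests of differing productivity
library(lattice)
#plotting
# xyplot(formula, data frame): the formula reads vertical ~ horizontal,
# so y ~ x puts x on the horizontal axis, matching the axis labels
xyplot(y ~ x, productivity,
  xlab=list(label="Productivity"),
  ylab=list(label="Mammal Species"))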
Lattice
R code
# y ~ x | f reads "y against x, given f": one panel is drawn per level of f
xyplot(y ~ x | f, productivity,
  xlab=list(label="Productivity"),
  ylab=list(label="Mammal Species"))
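The "given" part of a lattice formula can be tried without the course files. A small sketch, assuming the built-in `mtcars` data as a stand-in, with `cyl` playing the role of the conditioning factor f:

```r
library(lattice)

# mtcars ships with R; cyl (number of cylinders) plays the role of f
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),   # three panels in one row
            xlab = "Weight", ylab = "Miles per gallon")
print(p)  # lattice plots are objects; print() draws them
```

At the console the plot is printed automatically, but inside a script, loop or function the explicit print() is needed.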
ggplot2 
• Grammar of graphics (gg) 
• Based on the grid plotting system; cannot be mixed with base graphics
ggplot2.org
ggplot 
Components 
• Data & relationship 
• GEOMetric Object 
• Statistical transformation 
• Scales 
• Coordinate system 
• Facetting
ggplot
[Figure slides: a plot built up from its data, the mapping, geometric objects (aka geoms), the coordinate system with respect to scales (log scale / sqrt / log ratio), and the title, plot theme, etc.]
ggplot
Components
• Data & relationship ✔
• GEOMetric Object
• Statistical transformation
• Scales
• Coordinate system
• Facetting
R code
library(ggplot2)
# Remember to change month into a factor
weather$month = factor(weather$month)
# First argument: the data.frame
# aes(): the aesthetics function, which maps the relationships
ggplot(weather, aes(x=month, y=upper))+
  geom_boxplot()
ggplot
Components
• Data & relationship ✔
• GEOMetric Object ✔
• Statistical transformation ✔
• Scales
• Coordinate system
• Facetting
R code
library(dplyr) # provides %>%, group_by() and summarise()
weather2 = weather %>%
  group_by(month) %>%
  summarise(average.upper = mean(upper))
ggplot(weather2, aes(month, average.upper))+
  geom_bar(stat="identity")
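The statistical transformation here is just a mean per month, computed before plotting. A base-R sketch of the same summary, assuming a made-up miniature data frame `weather_small` rather than the real weather data:

```r
# Hypothetical miniature of the weather data frame
weather_small <- data.frame(
  month = c(1, 1, 2, 2, 3, 3),
  upper = c(8, 10, 9, 11, 12, 16)
)

# Mean of upper within each month: the same summary as the dplyr pipeline
monthly <- aggregate(upper ~ month, data = weather_small, FUN = mean)
print(monthly)

barplot(monthly$upper, names.arg = monthly$month,
        xlab = "Month", ylab = "Mean of upper")
```

stat="identity" in geom_bar() tells ggplot to draw these precomputed values as they are instead of counting rows.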
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object ✔ 
• Statistical transformation✔ 
• Scales✔ 
• Coordinate system 
• Facetting 
R code 
plot2 = ggplot(weather2, 
aes(month, average.upper))+ 
geom_bar(aes(fill=month),stat="identity")+ 
scale_fill_brewer(palette="Set3")+ 
xlab("Months")+ 
ylab("Upper Quantile")+theme_bw()
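The two components left unticked in the list, coordinate system and faceting, are not demonstrated with the weather data. A minimal sketch of how they attach to a plot, assuming the built-in `mtcars` data as a stand-in and an installed ggplot2:

```r
library(ggplot2)

# mtcars ships with R and stands in for the weather data
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_y_log10() +     # scale: log-transform the y values
  coord_flip() +        # coordinate system: swap the x and y axes
  facet_wrap(~ cyl)     # faceting: one panel per value of cyl
print(p)
```

Each component is added with `+`, in the same way as the geoms and scales in the bar chart above.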
qplot 
A separate function that wraps ggplot for simpler syntax
R code 
qplot(month, upper, fill=month, data=weather, facets = ~yr, geom="bar", 
stat="identity")
Ethos behind visualization 
http://keylines.com/network-visualization
Final Challenge
R code
library(ggplot2)
library(dplyr) # provides %>%, group_by() and summarise()
#Reads in data
data = read.csv("final.csv")
#Preparing for the rectangle background: one rectangle per planning region
areas=unique(subset(data, select=c(Planning_Area,Planning_Region)))
areas=areas[order(areas$Planning_Region),]
areas$rectid=1:nrow(areas)
rectdata = areas %>% group_by(Planning_Region) %>%
  summarise(xstart=min(rectid)-0.5, xend=max(rectid)+0.5)
#Order the levels so areas from the same region sit next to each other
data$Planning_Area=factor(data$Planning_Area,
  levels=as.character(areas[order(areas$Planning_Region),]$Planning_Area))
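The rectangle preparation is easier to follow on a tiny example. A base-R sketch of the same xstart/xend calculation, assuming five made-up planning areas in two regions (not the contents of final.csv):

```r
# Hypothetical areas, already ordered by region
areas <- data.frame(
  Planning_Area   = c("A", "B", "C", "D", "E"),
  Planning_Region = c("East", "East", "North", "North", "North")
)
areas$rectid <- 1:nrow(areas)   # x position of each area on the discrete axis

# Each region spans from half a unit before its first area
# to half a unit after its last one
xstart <- tapply(areas$rectid, areas$Planning_Region, min) - 0.5
xend   <- tapply(areas$rectid, areas$Planning_Region, max) + 0.5
rectdata <- data.frame(Planning_Region = names(xstart),
                       xstart = as.vector(xstart),
                       xend = as.vector(xend))
print(rectdata)
```

The half-unit offsets make neighbouring rectangles meet exactly between two boxplots, because a discrete axis places its categories at 1, 2, 3 and so on.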
Final challenge
R code
#Plot
p0 =
  ggplot(data, aes(Planning_Area, Unit_Price____psm_))+
  geom_boxplot(outlier.colour=NA)+
  geom_rect(data=rectdata, aes(xmin=xstart, xmax=xend, ymin=-Inf, ymax=Inf,
    fill=Planning_Region, group=Planning_Region), alpha=0.4, inherit.aes=FALSE)+
  geom_jitter(alpha=0.40, aes(color=as.factor(Year)))+
  scale_color_brewer("Year", palette='RdBu')+
  scale_fill_brewer(palette="Set1", name='Region')+
  theme_minimal()+
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))+
  xlab("Planning Area")+ylab("Unit Price (PSM)")
#Save plot
ggsave("areaboxplots.pdf", plot=p0, width=20, height=10, units="in", dpi=300)
“Above all else show the data.” 
― Edward R. Tufte, The Visual Display of Quantitative Information 
Thank you for your time
Exploratory Analysis Part1 Coursera DataScience Specialisation

