A Survey of R Graphics June 18 2009 R Users Group of LA Michael E. Driscoll Principal, Dataspora [email_address] www.dataspora.com
“The sexy job in the next ten years will be statisticians…” - Hal Varian
Hypothesis (from Jessica Hagy’s thisisindexed.com)
Munge & Model
gdp <- read.csv('gdp.csv')
hours <- read.csv('hours.csv')
gdp.hours <- merge(hours, gdp)
gdp.hours$freetime <- 4380 - gdp.hours$hours
attach(gdp.hours)
plot(freetime ~ gdp)
m <- lm(freetime ~ gdp, data=gdp.hours)
abline(m, col=3, lwd=2)
pm <- loess(freetime ~ gdp)
lines(spline(gdp, fitted(pm)))
Visualization
library(ggplot2)
qplot(gdp, freetime, data=gdp.hours, geom=c("point", "smooth"), span=1)
basic graphics
R’s Two Graphics Systems
plot() graphs objects
plot(freetime ~ gdp, data=gdp.hours)
model <- lm(freetime ~ gdp, data=gdp.hours)
abline(model)
plot() graphs objects
abline(model, col="red", lwd=3)
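As a rough sketch of the abline() step on this slide: the block below fits a line and draws it three ways. The `toy` data frame is a hypothetical stand-in (the talk's gdp.hours data is not reproduced here), and `pdf(NULL)` is only there so the sketch runs without a display.

```r
# Hypothetical stand-in data, NOT the OECD figures used in the talk.
set.seed(1)
toy <- data.frame(gdp = seq(10, 60, length.out = 29))
toy$freetime <- 2400 + 8 * toy$gdp + rnorm(29, sd = 40)

pdf(NULL)  # null device so this runs non-interactively; omit to draw on screen
plot(freetime ~ gdp, data = toy)

model <- lm(freetime ~ gdp, data = toy)
abline(model, col = "red", lwd = 3)   # line taken straight from the fitted model

co <- coef(model)                     # named vector: "(Intercept)" and "gdp"
# Same line given explicitly: in abline(), a is the intercept and b is the slope.
abline(a = co[["(Intercept)"]], b = co[["gdp"]], lty = 2)

abline(h = mean(toy$freetime), col = "grey")  # horizontal reference line
invisible(dev.off())
```

Passing the model object is the shorter form; the explicit a/b form is useful when the intercept and slope come from somewhere other than lm().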
par sets graphical parameters
par(pch=20, cex=5, col="#5050a0BB")   # RGB hex + alpha blending!
help(par)
plot(freetime ~ gdp, data=gdp.hours)
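One detail the slide does not show is that par() returns the previous settings, which makes it easy to restore them afterwards. A minimal sketch, with plotted values chosen only for illustration:

```r
pdf(NULL)  # null device so this runs non-interactively; omit to draw on screen

# par() returns the old values of whatever it changes, so keep them.
old <- par(pch = 20, cex = 5, col = "#5050a0BB")
during <- par("pch")          # the new plotting character is now active

plot(1:10, (1:10)^2)          # drawn with the settings above

par(old)                      # put the previous settings back
after <- par("pch")

# The eight-digit hex string is red, green, blue plus an alpha channel.
rgba <- col2rgb("#5050a0BB", alpha = TRUE)
invisible(dev.off())
```

Here `rgba` holds the four channels as integers between 0 and 255, so the trailing "BB" is the alpha (opacity) value.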
par sets graphical parameters
parameters for par(): pch, col, adj, srt, pt.cex
graphing functions: points(), text(), xlab(), legend()
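The same parameters can also be passed straight to the graphing functions instead of going through par(). The sketch below layers points, a label, and a legend onto an empty frame; the coordinates and label text are made up for illustration.

```r
pdf(NULL)  # null device so this runs non-interactively; omit to draw on screen

# Illustrative coordinates only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 3, 5, 4)

plot(x, y, type = "n", xlab = "x", ylab = "y")  # empty frame, axes only
points(x, y, pch = 19, col = "blue")            # first layer
points(5, 3, pch = 19, col = "red")             # one extra point on top

# adj shifts the label relative to its anchor, srt rotates it (degrees).
text(5, 3, labels = "added point", adj = c(1.2, 0.5), srt = 0)

# pt.cex is an argument of legend() that scales the legend's symbols.
lg <- legend("topleft", legend = c("original", "added"),
             pch = 19, col = c("blue", "red"), pt.cex = 1.5)
invisible(dev.off())
```

Each call only adds to the current plot, which is what makes this layering style workable for hand-built charts.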
Paneling Graphics: number of rows, number of columns
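The code from the paneling slides did not survive extraction, so the following is only a sketch of the general technique, not the original example: the mfrow parameter takes the number of rows and the number of columns, and subsequent plots fill the panels row by row. The four plots are arbitrary.

```r
pdf(NULL)  # null device so this runs non-interactively; omit to draw on screen

old <- par(mfrow = c(2, 2))   # 2 rows, 2 columns, filled row-wise
x <- seq(0, 2 * pi, length.out = 100)

plot(x, sin(x), type = "l", main = "sin")   # top left
plot(x, cos(x), type = "l", main = "cos")   # top right
hist(sin(x), main = "hist")                 # bottom left
boxplot(sin(x), cos(x), names = c("sin", "cos"), main = "boxplot")  # bottom right

layout_used <- par("mfrow")
par(old)                      # back to a single panel
invisible(dev.off())
```

Using mfcol = c(2, 2) instead would fill the same grid column by column.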
Working with Graphics Devices
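The code on this slide was lost in extraction. As a hedged sketch of the usual pattern: open a file device, draw, and close it with dev.off(). The file names are hypothetical, pdf() is used because it is available in every R build (png() follows the same pattern where supported), and lattice is assumed to be installed, as it is with standard R distributions.

```r
# Base graphics: open a device, draw, close.
out <- file.path(tempdir(), "hello.pdf")      # hypothetical output path
pdf(out, width = 6, height = 4)
plot(sample(100, 100), col = 1:8, pch = 19)
invisible(dev.off())   # the file is not complete until the device is closed

# Grid-based plots (lattice, ggplot2) are objects. Inside source(), a
# function, or a loop they are drawn only when print() is called.
library(lattice)
out2 <- file.path(tempdir(), "hello-lattice.pdf")
pdf(out2, width = 6, height = 4)
p <- xyplot(dist ~ speed, data = cars)        # cars is a built-in data set
print(p)
invisible(dev.off())
```

Forgetting either dev.off() or print() typically leaves an empty or unreadable file, which is why both are worth writing out explicitly in scripts.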
library(ggplot2)
ggplot2 = grammar of graphics
Visualizing 50,000 Diamonds with ggplot2
qplot (carat, price, data = diamonds)
qplot(log(carat), log(price), data = diamonds)
OR
qplot(carat, price, log="xy", data = diamonds)
qplot(log(carat), log(price), data = diamonds,  alpha = I(1/20) )
qplot(log(carat), log(price), data = diamonds,  alpha = I(1/20),  colour=color )
Achieving small multiples with “facets” qplot(log(carat), log(price), data = diamonds, alpha=I(1/20)) +  facet_grid(. ~ color)
qplot(color, price/carat, data = diamonds, alpha = I(1/20), geom="jitter")
qplot(color, price/carat, data = diamonds, geom="boxplot")
old new
library(lattice)
lattice = trellis
visualizing six dimensions of MLB pitches with  lattice
xyplot(x ~ y, data=pitch)
xyplot(x ~ y, groups=type, data=pitch)
xyplot(x ~ y | type, data=pitch)
xyplot(x ~ y | type, data=pitch,
       fill.color = pitch$color,
       panel = function(x, y, fill.color, ..., subscripts) {
         # pick out the colors belonging to the rows shown in this panel
         fill <- fill.color[subscripts]
         panel.xyplot(x, y, fill = fill, ...)
       })
A Story of Two Pitchers Hamels Webb
list of  lattice functions densityplot(~ speed | type, data=pitch)
plotting big data
xyplot  with 1m points = Bad Idea Jeans xyplot(log(price)~log(carat),data=diamonds)
efficient plotting with  hexbinplot hexbinplot(log(price)~log(carat),data=diamonds,xbins=40)
100 thousand  gene measures
efficient plotting with  geneplotter
beautiful colors with colorspace
library("colorspace")
red <- LAB(50,64,64)
blue <- LAB(50,-48,-48)
mixcolor(10, red, blue)
R--> web
Linux Apache MySQL R http://labs.dataspora.com/gameday
Configuring rapache
setContentType("text/html")
png("/var/www/hello.png")
plot(sample(100,100), col=1:8, pch=19)
dev.off()
cat("<html>")
cat("<body>")
cat("<h1>hello world</h1>")
cat('<img src="../hello.png">')
cat("</body>")
cat("</html>")
Data Visualization References
Contact Us Michael E. Driscoll, Ph.D. Principal [email_address] www.dataspora.com


Editor's Notes

  1. “A Survey of R Graphics” – presented to the LA R Users Group, June 18, 2009. Today I’m going to go through a survey of data visualization functions and packages in R. In particular, I’ll discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package. I’ll also discuss some methods for visualizing large data sets. I’ll end with an overview of rapache, a tool for embedding R in web applications. For questions beyond this talk, I can be contacted at: Michael E Driscoll http://www.dataspora.com [email_address]
  2. Hal Varian said that “The sexy job in the next ten years will be statisticians…” (in a 2009 interview with McKinsey Quarterly). Data visualization is the fastest means to feed our brains data, because it leverages our highest-bandwidth sensory organ: our eyes. Statistical visualization is sexy both because high-density information plots tickle our brains (we crave information) and because it is hard to do well.
  3. A data visualization is often the final step in a three-step data sense-making process, whereby data is (i) “munged” (collected, cleansed, and structured), (ii) modeled (relationships in the data are explored and hypotheses tested), and finally (iii) visualized (a particular model of the data is represented graphically). At Facebook, the data engineers are called “data scientists.” I like this term because it conveys that working with data involves the scientific method, predicated on making hypotheses and testing them. Ultimately, we are interested in using data to make hypotheses about the world.
  4. Like this one, from Jessica Hagy’s witty blog, thisisindexed.com. She visualizes a hypothesis that free time and money are related: that you have the most free time when you’re broke and when you’re rich. I decided to test this hypothesis with data on working hours (its complement = free time) and GDP from 29 OECD countries.
  5. Using R, I decided to test this hypothesis with data for 29 OECD countries: 2006 figures on annual hours worked and GDP per capita. I modeled it with both a linear regression and a loess (local polynomial) fit. Just a few lines of code.
  6. And using R, I visualized it. Her wealth-free time hypothesis was half-right. The richer you are, the more free time you have (the extreme rightmost point is Luxembourg). But at least for this subset of countries that we examined, the relationship is strictly linear: the poorest OECD countries have the least free time. (In the code shown on the right, I’m using ggplot2, not the base graphics plot function from the previous slide; ggplot2 will automatically do a loess fit for us.)
  7. In this section, I describe the built-in graphics functions in R, which require no external packages.
  8. First, let’s peek under the covers of the R graphics stack. At the top-most level are packages, like “maps”, “lattice”, and “ggplot2”. These packages make calls to a lower-level graphics system, of which in R there are two, called “graphics” and “grid”. According to Nicholas Lewin-Koh, the goal of these graphics systems is to “create coordinates for each graphical object and render them to a device or canvas. In addition the system may manage (i) a stack of graphics objects, (ii) local state information, (iii) redrawing and resizing.” Finally, these graphics systems are capable of rendering output to a variety of devices, which for our purposes can be considered image formats such as PNG, JPG, and PDF. Devices also include interactive displays, such as the graphics windows in Windows or Mac OS X, which R sends its output to by default during an interactive session. Grid is a newer system, and both “lattice” and “ggplot2”, which I’ll discuss later, use Grid.
  9. plot() is a “do the right thing” graphics command. plot() is the simplest R command for generating a visualization of an R object. It’s an overloaded function that just “does the right thing”, and yields a quick view of many R objects that are passed to it. These built-in basic plotting commands are useful if you’re just doing quick, exploratory analysis, and publication-quality graphs are not what you’re looking for.
  10. We can interactively add layers (lines, points, and text) to plots using basic graphics functions. One such example is abline, so named for the a (intercept) and b (slope) parameters it uses to draw a line (as in y = a + b x).
  11. par is a function for setting graphical parameters for base graphics and, nota bene, these parameters are often shared by the higher-level packages I discuss later. Once parameters are defined via par, graphics functions like plot will use these new parameters in subsequent plots. The example above shows the setting of three parameters: pch sets the plotting character (20 denotes a small filled circle); cex sets size, or character expansion (1 is the default, 5 is bigger); col sets color, which is definable as a name (“blue”), an integer (1-7 for primaries), or an RGB hex value with an optional alpha channel (as above).
  12. Graphics parameters can be set via par(), or passed directly to graphics functions. Above are some more parameters that you can set using par(). For a full list, type help(par) at the R prompt. You can also pass these parameters directly to graphics functions, for example: points(5, 3, pch=19, col="blue"). The chart on the right is an example of a plot painstakingly created with the low-level plotting parameters and functions above. This was done by interactively layering additional text labels and legends on after the initial points were plotted.
  13. Edward Tufte has lauded the value of “small multiples” in information graphics: namely, the incorporation of many small plots in a single graphic. R provides a basic facility for the subdivision of a display device (or ultimately its printed representation) into several panels. This can be achieved by setting the graphics parameter mfrow, which stands for multiple figures plotted row-wise.
  14. With the mfrow parameter, a 2 x 2 matrix of sub-panels, as in the example above, can be set up, and plots will be drawn into these sub-panels one after another. The code above illustrates the creation of four figures in a single graphic, and the result is shown in the next slide. (There is also an mfcol parameter for plotting multiple figures in a column-wise manner.)
  15. Unless a data visualization is of unusually high density, most modern display devices allow for upwards of 16 figures to be suitably resolved on a single device. See the splom() function for automatic creation of such dense graphics.
  16. R graphics devices can present some “gotchas”. Normally one need not have any knowledge of the graphics devices that underlie the R graphics system. But in a few cases, it’s worth knowing something about them. While typical users can save R graphics in Windows or Mac OS X via a “Save As” dialog in the graphics window, if one is not using a GUI, exporting graphics requires manually opening a device with one of several device commands (such as pdf() or png()) and closing it properly (using dev.off()). Also, when exporting graphics in a non-interactive environment (via a script, for instance), it’s critical to invoke the print() function on grid-based plot objects such as those from lattice and ggplot2, which will properly write the graphic to the open device. This “print” issue can be a real gotcha for scripts.
  17. Okay, now I want you to try and forget everything you just heard about base graphics. ggplot2 is a new visualization package formally released in 2009, developed by Professor Hadley Wickham. It is based on a different perspective on developing graphics, and has its own set of functions and parameters.
  18. The ‘gg’ in ggplot2 is a reference to a book called The Grammar of Graphics, written by Leland Wilkinson. The book conceives of graphics as compositional: made up of colors, visual shapes, and coordinates, much as sentences are made up of parts of speech.
  19. I’ve illustrated an incomplete version of Wilkinson’s grammar in this slide, to convey how graphics are built up from, and out of, their component parts. As such, Wilkinson advocates that graphical tools should leave behind what he deems “chart typologies”: rigid casts of pie charts, bar graphs, or scatter plots, which data is poured into. (The Excel chart wizard might be thought of as the Mad Libs of graphics, with pre-defined structure and limited degrees of freedom.) Conceived as compositional, a graphical grammar allows for an infinite variety of graphical constructions.
  20. In the upcoming examples, drawn directly from Hadley Wickham’s book on ggplot2, we’ll visualize data concerning ~50,000 diamonds. We’ll start simple and build to more complex graphs by specifying additional elements of the graphical grammar. This data is in the ggplot2 package; more information is available with help(diamonds) (after loading ggplot2). For our purposes, we’re concerned with examining relationships between just a few dimensions of this data, namely carat, cut, clarity, and price.
  21. In ggplot2, the command to build this plot is qplot(), which stands for “quick plot”. We pass qplot() two dimensions of our data (carat and price), and it defaults to a scatter plot representation. Also worth noting: ggplot2’s visual defaults are quite easy on the eyes, in contrast to most of R’s base graphics. We begin with a basic scatter plot of these 50,000 diamonds. This plot reveals that, not surprisingly, the price of diamonds increases as they get bigger (in terms of carats). Somewhat more interesting is how: we perceive that price seems to increase exponentially (and we test this hypothesis in the next slide).
  22. Next, we log-transform our data and reveal a roughly linear relationship between the log of a diamond’s price and the log of its carat. (Strictly speaking, a straight line on log-log axes indicates a power law rather than an exponential, but it confirms the faster-than-linear growth we suspected.) It should be noted that we can achieve this transformation in two equivalent ways: (i) we can directly transform our data with the log function, or (ii) we can transform the coordinate scales on which our data is plotted. In ggplot2, this latter approach is achieved by passing the parameter log="xy" to qplot. Because the two approaches rely on different parts of graphical speech, data and scale, this nicely illustrates that, as in language, there is more than one way to express data visually using this grammar of graphics and ggplot2.
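The two routes can be sketched as follows. Note one assumption: qplot() is deprecated in recent ggplot2 releases, so this sketch uses the ggplot() interface rather than the call shown on the slide. The straight-line fit at the end is an addition for illustration.

```r
library(ggplot2)   # provides the diamonds data set

# (i) transform the data, leave the scales linear
p_data <- ggplot(diamonds, aes(log(carat), log(price))) +
  geom_point()

# (ii) leave the data alone, transform the scales
p_scale <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

# A straight-line fit on the logged data; if price is modelled as
# a * carat^b, the slope of this fit estimates the exponent b
fit <- lm(log(price) ~ log(carat), data = diamonds)
coef(fit)
```

The two plots show the same point cloud. They differ only in how the axes are labelled: (i) shows logged values, (ii) shows the original units on log-spaced axes.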
  23. Another element of the graphical grammar is the aesthetic appearance of plotting points. Here, we pass a parameter, alpha, which controls the transparency of the points plotted. The parameter’s value, I(1/20), indicates that each point should have 1/20th of full intensity: thus 20 overplotted points are required at any given location to achieve full saturation (in this case, to black). (Note: the “I” function in R inhibits further interpretation of its arguments, so the value can be thought of as simply the fraction 1/20.) This method uncovers some interesting distributions in the data that were previously obscured by overplotting. For example, we can detect that points are highly concentrated around specific carat sizes. Contrast this method with our earlier approach to alpha blending with base graphics, which required manually specifying the RGB hex code.
  24. Here we layer on yet another element of the grammar, color, to show how clearer stones are more expensive. ggplot2 automatically creates a legend for the mapping of the diamond color variable onto plotting colors. (Note: Wickham’s choice of a default color palette is not accidental. The colors are of equal luminance, so no one color dominates the others. For more than you ever want to know about color choice, see http://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture13.pdf ).
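To illustrate what “equal luminance” means, here is a small base R sketch that builds a palette with hcl(), varying only the hue. The specific chroma and luminance values are assumptions picked for illustration, not values taken from the slides.

```r
# Seven hues spaced evenly around the colour wheel, all sharing the same
# chroma (c) and luminance (l); the numeric values are assumed
n <- 7
hues <- seq(15, 375, length.out = n + 1)[1:n]
pal <- hcl(h = hues, c = 100, l = 65)

# Show the palette as a strip of coloured bars, one per colour grade
barplot(rep(1, n), col = pal, border = NA, axes = FALSE,
        names.arg = LETTERS[4:10])   # diamond colour grades D to J
```

Because only the hue changes, no bar looks markedly brighter or darker than its neighbours.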
  25. Now we use another element of the grammar, what is termed ‘facets’, to splinter our graphic into a number of subplots along a given dimension. Here we achieve the small multiples that we previously created using the par function and mfrow parameter. These sorts of sub-divided plots are what the lattice system excels at, which we’ll see later. What can we say from this plot? Well, if anything, clear diamonds (“D”) seem to get more expensive more quickly (slightly steeper slope as a function of their size) than yellower diamonds.
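A sketch combining the last three grammar elements (transparency, color, and facets) in one plot. It uses the ggplot() interface rather than the qplot() calls on the slides, which is an assumption made because qplot() is deprecated in recent ggplot2 releases.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(log(carat), log(price), colour = color)) +
  geom_point(alpha = 1/20) +     # fixed transparency, not mapped to data
  facet_wrap(~ color)            # one panel per colour grade
print(p)
```

Each element is a separate term added to the plot, so dropping facet_wrap() or the colour mapping gives back the earlier, simpler views.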
  26. Let’s take another view of the data. Here we’re interested in seeing how color influences the per-carat cost of a diamond. The boxplot on the left shows that nearly clear diamonds (color categories ‘D’ and ‘E’) have a greater number of high-priced outliers, but their median (the center line of each box) is nearly identical to the others. The so-called jitter plot on the right shows this same view of the data, but with all of the points shown. In this case, the points are plotted into bins according to a categorical variable, diamond color, and “jittered” within each bin to prevent overplotting and to give a sense of the local density at different values along the common y-dimension of price/carat.
  27. A display of 50,000 data points. Why not? Our eyes can handle, and I submit, crave these kinds of rich visualizations. This also allows us to detect features of the data (for example, several thin white bands across the bottom of the bars, perhaps preferred price/carat combinations?) that may be missing from more simplified data views.
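A hedged sketch of the two views described above, again written with ggplot() rather than qplot(). The transparency value is an assumption carried over from the earlier scatter plots.

```r
library(ggplot2)

# Price per carat by colour grade, summarised as boxplots
p_box <- ggplot(diamonds, aes(color, price / carat)) +
  geom_boxplot()

# The same view with every point drawn: jittered horizontally within
# each colour bin and made translucent to reveal local density
p_jit <- ggplot(diamonds, aes(color, price / carat)) +
  geom_jitter(alpha = 1/20, height = 0)
```

Setting height = 0 keeps the jitter purely horizontal, so the y-values (price per carat) are not distorted.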
  28. lattice is an alternative high-level graphics package for R. Like ggplot2 it is built on the grid graphics system.
  29. lattice is named in honor of its predecessor, trellis , which was a visualization library developed for the S language by William Cleveland. trellis was so named because of how it visualizes higher dimensions of data: it splinters these dimensions across space, producing a grid of small multiples that resemble a trellis. In the next series of slides I show how we can use lattice to visualize up to six dimensions of data in a single plot.
  30. To demonstrate lattice’s multivariate visualizing abilities, we’ll use a fascinating data set called MLB Gameday. Since 2007, Major League Baseball has tracked the path and velocity of more than 1 million pitches thrown. Sample data is here: http://gd2.mlb.com/components/game/mlb/year_2008/month_03/day_30/gid_2008_03_30_atlmlb_wasmlb_1/pbp/pitchers/400010.xml
  31. With just two dimensions of data to describe, the x and y location in the strike zone, we can use lattice’s xyplot function. Unlike ggplot2, the first argument that we pass to lattice’s plotting functions (of which xyplot is just one) is a formula that describes a relationship in the data to be plotted. In this case, “x ~ y” can be read as “x depends on y”. Note the visual defaults: not as easy on the eyes as ggplot2 (which has a lower-contrast gray background), but an improvement on R’s base graphics plots.
  32. In this plot, I’ve layered a third dimension, pitch type, into our plot by using lattice’s “groups” parameter, which uses a different plotting symbol for each type, and includes a legend across the top. Alas, this is not a particularly informative chart. The symbols are overplotted on top of each other: trends among the pitch types are hard to discern. With lattice, we can use yet another approach.
  33. Now we’re doing what lattice does best – splintering a dimension, in this case pitch type, into space. We do this by using R’s “condition” operator in the formula we pass to lattice (the formula “x ~ y | type” can be read as “x depends on y conditioned on type”).
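The pitch data is not included here, so the sketch below uses the built-in iris data as a stand-in: two numeric columns play the role of x and y location, and Species plays the role of pitch type. That substitution is an assumption made purely for illustration.

```r
library(lattice)

# Two dimensions: a plain formula, read as "Sepal.Length depends on Sepal.Width"
p_plain <- xyplot(Sepal.Length ~ Sepal.Width, data = iris)

# Third dimension via groups: one symbol per level, legend across the top
p_groups <- xyplot(Sepal.Length ~ Sepal.Width, data = iris,
                   groups = Species, auto.key = list(columns = 3))

# Third dimension via conditioning: one panel per level of Species
p_cond <- xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris)

print(p_cond)
```

The only change between the grouped and conditioned versions is where the third variable appears: as the groups argument or after the | in the formula.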
  34. Now we include a fourth dimension in our plot – pitch speed – by using color. The speed to color mapping is relatively intuitive (seen in upper right), red is fast, blue is slow. How we achieve this is not particularly simple: we must use what lattice deems “panel functions”, which allow us to extend the default appearance of the chart.
  35. Finally we add a fifth dimension, local density, to our plots using a two-dimensional color palette, where speed is related to chroma, and local density to luminance. This is an attempt to control for some overplotting that might otherwise occur when we shrink these pitch plots down in size.
  36. Now we can compare two different pitchers, the sixth dimension, in a single graphic. The six dimensions of data we visualized with lattice are thus: (1, 2) x and y location of the pitch; (3) pitch type; (4) pitch speed; (5) pitch density (lots of pitches make for darker luminosity without changing hue); (6) pitcher (Cole or Hamels).
  37. As mentioned, the lattice package provides several other graphics functions besides xyplot. Some are listed above, and the densityplot() function is highlighted at the bottom. This is a particularly useful alternative to standard histograms, which can suffer from binning artifacts.
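A brief sketch contrasting the two views, using the built-in faithful data set as a stand-in (an assumption; the slide's own data is not shown in this transcript).

```r
library(lattice)

# 'faithful' ships with R; eruption durations are clearly bimodal
h <- histogram(~ eruptions, data = faithful, nint = 8)   # shape depends on bin count
d <- densityplot(~ eruptions, data = faithful,
                 plot.points = "rug")                     # smooth density estimate

# Place the two plots side by side on one page
print(h, split = c(1, 1, 2, 1), more = TRUE)
print(d, split = c(2, 1, 2, 1))
```

Changing nint alters the histogram's apparent shape, while the density curve has no bin edges to shift (though it does depend on a smoothing bandwidth).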
  38. In this section I mention a couple of techniques for handling large data sets.
  39. This is bad for two reasons: (1) overplotting obscures data, even when alpha blending is used; (2) it’s highly inefficient, both on screen and especially if saved as a vector graphic (huge PDFs). Two solutions: (1) resort to sampling; (2) map the density of points onto some other attribute, such as color. hexbinplot and geneplotter do just this.
  40. hexbinplot() is a graphics function (in the hexbin package) that divides a scatter plot area into hexagons, counts occurrences within each of these hexagonal areas, and maps these counts to a color scale. The result is a plot, as shown, where the graphics device need only draw as many points as there are hexagons. In the case of the diamond data, rather than 50,000 points being graphed, just ~2000 hexagons are. This also reveals some of the clumpiness in the data, though not as well as ggplot2’s alpha-blended scatterplots.
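The idea of counting points per cell and mapping counts to color can be sketched in base R alone. Two assumptions to flag: the data is synthetic, and the bins are rectangular rather than hexagonal, so this is a simpler substitute for hexbinplot() and not its implementation.

```r
# Synthetic stand-in for a large scatter plot (50,000 correlated points)
set.seed(42)
n <- 50000
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)

# Bin both axes into a 40 x 40 grid and count the points in each cell.
# Note: rectangular bins, a simpler substitute for hexagons.
nb <- 40
xb <- cut(x, breaks = seq(min(x), max(x), length.out = nb + 1),
          include.lowest = TRUE)
yb <- cut(y, breaks = seq(min(y), max(y), length.out = nb + 1),
          include.lowest = TRUE)
counts <- table(xb, yb)

# Draw at most nb * nb cells instead of n points; empty cells stay blank
counts[counts == 0] <- NA
image(seq_len(nb), seq_len(nb), unclass(counts),
      col = rev(heat.colors(20)),
      xlab = "x bin", ylab = "y bin")
```

The device draws at most 1,600 cells no matter how many points went into the counts, which is what keeps the output small.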
  41. This is an Affymetrix gene chip, with 100,000 data points. On the right we have the output of a typical microarray assay: the colors correspond to RNA expression levels. With R, I can distill these 100,000 data points down to a simple model – and visualize it.
  42. The data visualization on the right, called an M-A plot, is a variation of an XY scatter plot in which we compare the observed signals for a particular microarray to a composite background distribution. Both are ordered by intensity of signal, and deviations from the straight line show differences between our array and the background (in this case, our array tends to have higher signals across the board). Typically we generate an M-A plot for every array in our compendium to yield a big-picture view of the consistency of our arrays across experiments: the flatter the red lines, the better (remember that in most models of cellular behavior we expect only a small fraction of genes to change in expression).
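A sketch of how such a plot can be built, using the conventional definitions of M (log ratio) and A (average log intensity). That the slide's plot was built exactly this way is an assumption, and the intensities below are synthetic.

```r
# Synthetic intensities: a reference distribution and one array whose
# signals run about 20% higher (made-up numbers for illustration)
set.seed(7)
n <- 5000
reference <- 2^rnorm(n, mean = 8, sd = 2)
array1    <- reference * 1.2 * 2^rnorm(n, sd = 0.3)

M <- log2(array1) - log2(reference)          # log ratio (difference)
A <- (log2(array1) + log2(reference)) / 2    # average log intensity

plot(A, M, pch = ".", main = "M-A plot")
abline(h = 0, lty = 2)                       # no-difference line
lines(lowess(A, M), col = "red", lwd = 2)    # trend of the deviations
```

A red trend line sitting flat but above zero, as here, indicates an array that reads uniformly higher than the reference.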
  43. Ross Ihaka’s colorspace package provides access to useful colorspaces beyond RGB, like LAB and HSV. These colorspaces are preferred by artists and designers for their more intuitive properties. This is the package I used to design the palettes in the pitching plots shown earlier. For my opinionated comments on using color in data visualizations, visit: http://dataspora.com/blog/how-to-color-multivariate-data/
  44. Before we end, some thoughts on how R can be used as a visualization engine on the web.
  45. So I’ve pushed this pitch visualization application into a web app, using RApache. I can do this because R is open source – without licensing restrictions. Data and the processing can both live on the server – important when your data set is huge (this one is around 20 Gigabytes). And when the data changes, the dashboard updates. No local software installation needed, and updates are instantly available to all web users. It can be part of the open source web-analytics stack, with a catchy name – LAMR. If you can think of something less lame, let me know.
  46. Why Embed R into a Web-based Architecture? Immediately access the many benefits of a web architecture that is: * Stateless/Scalable – URL requests can be distributed across one or many servers * Cacheable - common requests made to the R server can be cached by Apache * Secure - we can piggyback on existing HTTPS architecture for analysis of sensitive data
  47. rapache: Embedding R within the Apache Server Our tool of choice is rapache, developed by Jeff Horner at Vanderbilt University. http://biostat.mc.vanderbilt.edu/rapache/
  48. Naturally this is just scratching the surface of what rapache can do. An alternative approach to printing HTML directly is to use a templating system, similar to PHP. This is available via the R package brew (also developed by Jeffrey Horner), downloadable on CRAN and at: http://www.rforge.net/brew/
  49. The ggplot2 and lattice books are both published by Springer (ggplot2 as of July 2009), available via Amazon. Example code and figures from the ggplot2 book: http://had.co.nz/ggplot2 Example code and figures from the lattice book: http://lmdvr.r-forge.r-project.org/
  50. Michael E. Driscoll is Principal and Founder of Dataspora LLC. He has a decade of experience developing large-scale databases and data mining algorithms within industry, government, and academic institutions. He founded and until 2008 served on the board of CustomInk.com, an Inc. 500 online retailer. Michael has a Ph.D. in Bioinformatics from Boston University and an A.B. from Harvard College.