Transcript of "R for data visualization and graphics"

  1. R for Data Visualization and Graphics. Rob Kabacoff, Ph.D., Vice President of Research. Source code for presentation: http://tinyurl.com/Kabacoff-CS20
  2. R is a Statistical and Graphical Platform
     R Homepage: http://www.r-project.org/
     CRAN Mirrors: http://cran.r-project.org/
     • Free
     • Open source
     • State-of-the-art data analysis
     • Platform for programming new methods
     • Runs on Windows, Linux, Mac OS X
     • Enormous user base
     • Reproducible research
  3. Data Input. Sources shown feeding into R: Statistical Packages (SAS, SPSS, Stata); Keyboard; ASCII Text Files; Excel; netCDF; HDF5; XML; Webscraping; SQL; Database Management Systems (MySQL, Oracle, Access, Other)
  4. Statistical Methods: Descriptive Statistics; Experimental Design; Linear, Generalized, Nonlinear, and Hierarchical Models; Analysis of Categorical Data; Nonparametric Analysis; Survival Analysis; Latent Variable Models; Bayesian Models; Missing Values Analysis; Cluster Analysis; Decision Trees; Data Mining; Classical Test Theory; Item Response Theory; Correspondence Analysis; Multidimensional Scaling; Meta Analysis; Structural Equation Modeling; Complex Survey Design; Time Series Analysis; Longitudinal Analysis; Social Network Analysis; Study of Mediation and Moderation; Power Analysis; Clinical Trials; and …
  5. Graphs! [Collage of example plots: a conditioning plot of lat vs. long given depth; "A Topographic Map of Maunga Whau" (10 Meter Contour Spacing; Meters North vs. Meters West); a Sinc(r) surface over X and Y; "Survival on the Titanic" (Age, Sex, Survived; shaded by Pearson residuals, p-value < 2e-16); "University Salaries by Discipline" (Salary vs. Years Since Ph.D.; Theoretical vs. Applied)]
  6. A High Level Tour
     • General Systems: base, lattice, ggplot2
     • Interactive: iplots, rggobi, googleVis, Shiny
     • Specialized: vcd (categorical data), VIM (missing data), likert (likert data), scatterplot3d (3-D scatterplot), car (regression), corrplot (correlations), (decision trees), (dendrograms), effects (glm/ANOVA)
  7. 3 complete graphics systems: Base Graphics, Lattice Graphics, ggplot2 Graphics [the same histogram of Salary (dollars) vs. Frequency drawn in each system]
  8. BASE GRAPHICS
  9. histograms [three panels: "Histogram with Rug plot", "Histogram with Normal Curve", "Histogram of Kernel Density Curve"; x-axis: Salary (dollars); y-axis: Frequency or Density]
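The three histogram variants named on slide 9 can be sketched in base graphics. The salary data shown on the slide is not part of base R, so this sketch substitutes the built-in `faithful$waiting` variable; the data, bin count, and labels are assumptions for illustration and not the presenter's code.

```r
# Base-graphics histograms: rug plot, kernel density curve, normal curve.
# Assumption: faithful$waiting stands in for the salary variable on the slide.
x <- faithful$waiting

op <- par(mfrow = c(1, 3))   # three panels side by side

# 1. Frequency histogram with a rug of the (jittered) raw values
hist(x, breaks = 12, col = "grey",
     main = "Histogram with Rug Plot", xlab = "Waiting time (min)")
rug(jitter(x))

# 2. Density-scaled histogram with a kernel density curve on top
hist(x, breaks = 12, freq = FALSE, col = "grey",
     main = "Histogram with Kernel Density Curve", xlab = "Waiting time (min)")
lines(density(x), lwd = 2)

# 3. Frequency histogram with a normal curve rescaled to counts
h <- hist(x, breaks = 12, col = "grey",
          main = "Histogram with Normal Curve", xlab = "Waiting time (min)")
xfit <- seq(min(x), max(x), length.out = 100)
binwidth <- diff(h$mids[1:2])
yfit <- dnorm(xfit, mean = mean(x), sd = sd(x)) * binwidth * length(x)
lines(xfit, yfit, lwd = 2)

par(op)                      # restore the previous layout
```

Multiplying the normal density by the bin width and the sample size puts the curve on the same scale as the frequency axis.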
  10. bar charts
  11. box plots: "Singer Height by Voice Part" [box plots for Soprano 1, Soprano 2, Alto 1, Alto 2, Tenor 1, Tenor 2, Bass 1, Bass 2; x-axis: Heights in Inches]
  12. line charts: "Monthly Airline Passengers" [Passengers (K) vs. Time, 1950 to 1960]; "UK Lung Cancer Deaths" [Total, Male, Female; by year, 1974 to 1980]
  13. time series: "Monthly Air Passengers"; "Season Decomposition of a Time Series" [panels: data, seasonal, trend, remainder; x-axis: time, 1950 to 1960]
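A seasonal decomposition like the one on slide 13 can be sketched with `stl()` from the stats package and the built-in `AirPassengers` series. The log transform and `s.window = "periodic"` are illustrative assumptions; the slides do not say which settings were used.

```r
# Seasonal decomposition of the built-in AirPassengers monthly series.
# Assumptions: log transform and a periodic seasonal window are illustrative choices.
fit <- stl(log(AirPassengers), s.window = "periodic")

# Draws four stacked panels: data, seasonal, trend, remainder
plot(fit, main = "Seasonal Decomposition of a Time Series")

# The components are also available as a multivariate time series
components <- fit$time.series
head(components)
```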
  14. scatterplots: "High Density Scatterplot (n=10,000)" [Y vs. X]; "Iris Data" [Petal Length (cm) vs. Sepal Length (cm)]
  15. scatterplot matrix: "Anderson's Iris Data -- 3 species" [variables: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]
  16. dot plot: "MPG by Automobile" [32 car models, listed from Toyota Corolla to Cadillac Fleetwood; axis from 10 to 30]
  17. contour plots: "A Topographic Map of Maunga Whau" [10 Meter Contour Spacing; x-axis: Meters North; y-axis: Meters West]
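The contour map on slide 17 can be approximated with base R's `contour()` and the built-in `volcano` matrix, which holds Maunga Whau elevations on a 10 m grid. The contour range of 90 to 200 is an assumption chosen to cover the data; the titles are copied from the slide.

```r
# Contour map of the built-in volcano elevation matrix (10 m grid spacing).
x <- 10 * seq_len(nrow(volcano))        # metres north
y <- 10 * seq_len(ncol(volcano))        # metres west
contour_levels <- seq(90, 200, by = 10) # assumption: range chosen to span the data

contour(x, y, volcano, levels = contour_levels,
        main = "A Topographic Map of Maunga Whau",
        sub = "10 Meter Contour Spacing",
        xlab = "Meters North", ylab = "Meters West")
```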
  18. LATTICE GRAPHICS
  19. lattice graphs
     • expands base graphics to include trellis plots
     • seeks to improve on the graph defaults (symbols, axes, labels) of base graphics
     • grouping
       – color, fill, line type can be mapped to variable values
     • facets
       – subgroups can be plotted in an array based on the levels of (usually) one or two variables
     • customizable panel functions allow fine-grained control of what is plotted in each facet
     • comments
       – clean and fast
       – high degree of customization possible
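The grouping, faceting, and panel-function ideas from slide 19 can be sketched with `xyplot()`. The choice of `mtcars`, the variables, and the contents of the panel function are assumptions for illustration only.

```r
library(lattice)

# Facets: one panel per number of cylinders (the part after "|").
# Grouping: transmission type (am) is mapped to symbol and color.
# Custom panel function: adds a reference grid and one overall
# regression line to each facet, on top of the grouped points.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars, groups = am,
            layout = c(3, 1),
            auto.key = list(title = "am", columns = 2),
            xlab = "Weight (lb/1000)", ylab = "Miles per gallon",
            panel = function(x, y, ...) {
              panel.grid(h = -1, v = -1)   # grid aligned with the axis ticks
              panel.xyplot(x, y, ...)      # grouped points
              panel.lmline(x, y, lty = 2)  # overall linear fit in this facet
            })
print(p)   # lattice plots are objects; printing draws them
```

Because the plot is an ordinary R object, it can be stored, modified with `update()`, and printed later.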
  20. 3D graphs with faceting
  21. lattice graph with faceting and a customized panel function
  22. GGPLOT2 GRAPHICS
  23. ggplot2
     • Grammar of Graphics
     • graphs built up in layers by plotting "geoms"
     • grouping
       – color, fill, shape, size can be mapped to variable values
     • facets
       – subgroups can be plotted in an array based on the levels of (usually) one or two variables
     • comments
       – allows you to create novel plots
       – can be slow for large problems
       – no 3D graphs
       – HOT!
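The layering, grouping, and faceting described on slide 23 can be sketched as follows. The data set (`mtcars`) and the variable mappings are assumptions for illustration; the slides use a salary data set that is not part of base R.

```r
library(ggplot2)

# Each "+" adds a layer or a specification to the plot object.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(am))) +  # grouping via color
  geom_point() +                                                 # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +      # layer 2: fit line per group
  facet_wrap(~ cyl) +                                            # facets: one panel per cyl value
  labs(x = "Weight (lb/1000)", y = "Miles per gallon", color = "am")
print(p)
```

Swapping `geom_point()` for another geom, or `facet_wrap()` for `facet_grid()`, changes the plot without touching the data or the mappings.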
  24. kernel density plots with grouping
  25. histogram with faceting [facets: discipline (Theoretical, Applied) by rank (AsstProf, AssocProf, Prof); x-axis: salary; y-axis: count]
  26. boxplots [salary by rank (AsstProf, AssocProf, Prof); grouped by sex (Female, Male); faceted by Theoretical and Applied]
  27. jittered plots
  28. scatter plot with smooth line
  29. scatterplot with fit lines, grouping, and faceting
  30. SPECIALIZED GRAPHS
  31. visualizing missing data, VIM package [panels: "Number of missings" and "Combinations"; variables: Danger, Exp, Pred, Gest, Span, Sleep, Dream, NonD, BrainWgt, BodyWgt]
  32. car package: scatterplot matrices [variables: yrs.since.phd, yrs.service, salary; Frequency histograms included]
  33. corrplot package: visualizing correlations [variables: cyl, wt, hp, carb, qsec, disp, am, drat, vs, mpg, gear; variables reordered to find clusters; non-significant (.05) correlations indicated with an X; color scale from -1 to 1]
  34. Heatmap, stats package [columns: Specification Variables (disp, hp, mpg, qsec, gear, drat, wt, carb, vs, am, cyl); rows: Car Models]
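A heatmap of car models by specification variables, as on slide 34, can be sketched with `heatmap()` from the stats package. Column scaling is an assumption made here so that variables measured in different units are comparable; the slides do not state which options were used.

```r
# Heatmap of the built-in mtcars data: rows are car models,
# columns are specification variables.
m <- as.matrix(mtcars)

# scale = "column" standardizes each variable (an assumption, see above).
# Rows and columns are reordered by hierarchical clustering by default.
hm <- heatmap(m, scale = "column",
              main = "Heatmap",
              xlab = "Specification Variables", ylab = "Car Models")

# The returned list records the row and column order used in the plot
rownames(m)[hm$rowInd]
```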
  35. visualizing categorical data, vcd package [counts by Sex (Male, Female), Survived (No, Yes), and Class (1st, 2nd, 3rd)]
  36. visualizing effects (linear models): 2 x 3 ANCOVA
  37. rank by sex interaction (means) adjusting for other variables, effects package: "rank*sex effect plot" [salary by rank (AsstProf, AssocProf, Prof); panels: sex : Female, sex : Male]
  38. visualizing effects (generalized linear models): Logistic regression with 8 predictors
  39. rating effects (prob) by gender adjusting for other variables, effects package
  40. scatterplot3d package: "3D Scatterplot", "Automobile Data" [axes: Displacement (cu. in.), Weight (lb/1000), Miles/(US) Gallon; points labeled by car model]
  41. INTERACTIVE GRAPHICS
  42. iplots: hold [Ctrl] and mouse over graph for info
  43. rggobi
     • GGobi is an open source visualization program for exploring high-dimensional data
     • rggobi provides an R command line interface to GGobi
     Installation
     1. install GGobi: download from www.ggobi.org
     2. in R: install.packages("rggobi")
     see: http://www.ggobi.org/rggobi/introduction.pdf
  44. [GGobi screenshot annotations] Display to open new windows; Interaction to select, identify, or brush; View to change type of xy plot; right mouse to select
  45. googleVis
     • Provides access to Google Chart Tools
       – motion charts
       – annotated time lines
       – maps
       – other (e.g. line, bar, bubble, column, area, scatter, candlestick, pie, org charts)
       – https://developers.google.com/chart/
     • output is HTML code containing data and references to JavaScript functions hosted by Google
     • an internet connection is required to view the graphs
     demo(WorldBank) (cf. Hans Rosling in his TED talks)
  46. (no text)
  47. Shiny
     • Package for building interactive web applications with R
       – homepage: http://www.rstudio.com/shiny/
       – examples: http://www.rstudio.com/shiny/showcase/
     • Distribution
       – self hosted (requires free Shiny Server on Linux server)
       – RStudio hosted
       – distribute as a package
     Code shown on slide:
       pkgs <- c("Rcpp", "httpuv", "shiny")
       install.packages(pkgs)
       library(shiny)
       runExample("06_tabsets")
  48. shiny example
  49. RESOURCES
  50. www.statmethods.net
  51. Books
     • R in Action, Robert I. Kabacoff
     • R Graphics Cookbook, Winston Chang
     • Lattice, Deepayan Sarkar
     • ggplot2, Hadley Wickham
  52. additional websites
     • Cookbook for R: http://www.cookbook-r.com/
     • ggplot2 documentation: http://docs.ggplot2.org/current/
     • R-Bloggers: http://www.r-bloggers.com/