R for data visualization and graphics


  1. R for Data Visualization and Graphics. Rob Kabacoff, Ph.D., Vice President of Research. Source code for presentation: http://tinyurl.com/Kabacoff-CS20
  2. R is a Statistical and Graphical Platform. R Homepage: http://www.r-project.org/ CRAN Mirrors: http://cran.r-project.org/ • Free • Open source • State-of-the-art data analysis • Platform for programming new methods • Runs on Windows, Linux, Mac OS X • Enormous user base • Reproducible research
  3. Data Input. Sources R can read from: Statistical Packages (SAS, SPSS, Stata) • Keyboard • ASCII Text Files • Excel • netCDF • HDF5 • XML • Webscraping • Database Management Systems (SQL, MySQL, Oracle, Access) • Other
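As a minimal sketch of the text-file route on the Data Input slide, base R can read a delimited file into a data frame. The file contents and column names below are hypothetical and are written to a temporary file only so the example is self-contained; they do not come from the presentation.

```r
# Hypothetical data written to a temporary CSV file (not from the slides)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,salary", "a,50000", "b,75000", "c,120000"), csv_path)

# read.csv() returns a data frame; column types are inferred from the text
salaries <- read.csv(csv_path, stringsAsFactors = FALSE)
str(salaries)
summary(salaries$salary)
```

The other sources on the slide (statistical packages, Excel, databases) are typically reached through add-on packages rather than base R.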
  4. Statistical Methods • Descriptive Statistics • Experimental Design • Linear, Generalized, Nonlinear, and Hierarchical Models • Analysis of Categorical Data • Nonparametric Analysis • Survival Analysis • Latent Variable Models • Bayesian Models • Missing Values Analysis • Cluster Analysis • Decision Trees • Data Mining • Classical Test Theory • Item Response Theory • Correspondence Analysis • Multidimensional Scaling • Meta Analysis • Structural Equation Modeling • Complex Survey Design • Time Series Analysis • Longitudinal Analysis • Social Network Analysis • Study of Mediation and Moderation • Power Analysis • Clinical Trials • and …
  5. Graphs! [Figure: sample plots, including a conditioning plot of lat vs. long given depth; "A Topographic Map of Maunga Whau" (10 Meter Contour Spacing; Meters West vs. Meters North); a 3-D surface of Sinc(r) over X and Y; "Survival on the Titanic" (Sex, Age, Survived; Pearson residuals, p-value < 2e-16); "University Salaries by Discipline" (Salary vs. Years Since Ph.D.; Theoretical, Applied)]
  6. A High Level Tour • General Systems – base – lattice – ggplot2 • Interactive – iplots – rggobi – googleVis – Shiny • Specialized – vcd (categorical data) – VIM (missing data) – likert (Likert data) – scatterplot3d (3-D scatterplot) – car (regression) – corrplot (correlations) – (decision trees) – (dendrograms) – effects (glm/ANOVA)
  7. 3 complete graphics systems: Base Graphics, Lattice Graphics, ggplot2 Graphics [Figure: the same histogram, Frequency vs. Salary (dollars), drawn in each of the three systems]
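The comparison on this slide can be sketched roughly as follows: one histogram drawn three ways. The salary values are simulated here (an assumption; the slides use a real salary dataset), and the lattice and ggplot2 versions run only if those packages are installed.

```r
# Simulated salaries (an assumption, not the dataset used in the slides)
set.seed(1)
df <- data.frame(salary = rnorm(300, mean = 110000, sd = 30000))

# Base graphics: hist() draws the plot and returns the bin counts
h <- hist(df$salary, xlab = "Salary (dollars)", main = "Base Graphics")

# Lattice graphics: a formula interface that returns a trellis object
if (requireNamespace("lattice", quietly = TRUE)) {
  print(lattice::histogram(~ salary, data = df, type = "count",
                           xlab = "Salary (dollars)",
                           main = "Lattice Graphics"))
}

# ggplot2 graphics: the plot is built up from layers
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(df, aes(x = salary)) +
          geom_histogram(bins = 15) +
          labs(x = "Salary (dollars)", y = "Frequency",
               title = "ggplot2 Graphics"))
}
```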
  8. BASE GRAPHICS
  9. histograms [Figure: three histograms of Salary (dollars): "Histogram with Rug plot", "Histogram with Normal Curve", and "Histogram with Kernel Density Curve"]
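A rough sketch of the three histogram variants on this slide, combined into one base-graphics plot. The data are simulated (an assumption), so the output will not match the slide's figures.

```r
set.seed(42)
salary <- rnorm(400, mean = 110000, sd = 30000)  # simulated, hypothetical data

# Histogram on the density scale, so curves can be overlaid on the same axis
h <- hist(salary, freq = FALSE, breaks = 12,
          xlab = "Salary (dollars)",
          main = "Histogram with rug, normal curve, and kernel density")

# Rug plot: one tick per observation along the x axis
rug(salary)

# Normal curve using the sample mean and standard deviation
curve(dnorm(x, mean = mean(salary), sd = sd(salary)), add = TRUE, lty = 2)

# Kernel density estimate
d <- density(salary)
lines(d, lwd = 2)
```

Plotting with `freq = FALSE` is what makes the overlay meaningful: the bars, the normal curve, and the kernel estimate are then all densities.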
  10. bar charts
  11. box plots [Figure: "Singer Height by Voice Part", box plots of Heights in Inches for Soprano 1, Soprano 2, Alto 1, Alto 2, Tenor 1, Tenor 2, Bass 1, Bass 2]
  12. line charts [Figure: "Monthly Airline Passengers", Passengers (K) vs. Time, 1950 to 1960; "UK Lung Cancer Deaths", Total, Male, and Female by year, 1974 to 1980]
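Line charts like these can be sketched with datasets that ship with R. It is an assumption that the slides used `AirPassengers` and the `ldeaths`, `mdeaths`, and `fdeaths` series, although the titles match.

```r
# A single time series: plot() on a ts object draws a line chart
plot(AirPassengers, ylab = "Passengers (K)",
     main = "Monthly Airline Passengers")

# Several series on one set of axes
# ldeaths = both sexes, mdeaths = male, fdeaths = female
deaths <- cbind(Total = ldeaths, Male = mdeaths, Female = fdeaths)
ts.plot(ldeaths, mdeaths, fdeaths,
        gpars = list(xlab = "year", ylab = "Deaths", lty = 1:3,
                     main = "UK Lung Cancer Deaths"))
legend("topright", legend = colnames(deaths), lty = 1:3)
```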
  13. time series [Figure: "Monthly Air Passengers" and "Seasonal Decomposition of a Time Series", with panels data, seasonal, trend, and remainder vs. time, 1950 to 1960]
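One way to produce a decomposition with these four panels is `stl()` from the stats package. This is a sketch under the assumption that the slide used `stl()` on the untransformed `AirPassengers` series; the original code is not shown in the transcript.

```r
# Seasonal-trend decomposition using loess; "periodic" forces the
# seasonal component to be identical from year to year
fit <- stl(AirPassengers, s.window = "periodic")

# Panels: data, seasonal, trend, remainder
plot(fit, main = "Seasonal Decomposition of a Time Series")

# The three components add back up to the original series
head(fit$time.series)
```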
  14. scatterplots [Figure: "High Density Scatterplot (n=10,000)", Y vs. X; "Iris Data", Petal Length (cm) vs. Sepal Length (cm)]
  15. scatterplot matrix [Figure: "Anderson's Iris Data -- 3 species", scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width]
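A scatterplot matrix with this title can be drawn with base `pairs()` on the built-in `iris` data. The colors are an arbitrary choice, and it is an assumption that the slide was produced this way.

```r
# One fill color per species (arbitrary choice)
species_cols <- c("red", "green3", "blue")
point_bg <- species_cols[as.integer(iris$Species)]

# pairs() plots every numeric column against every other
pairs(iris[1:4],
      main = "Anderson's Iris Data -- 3 species",
      pch = 21,          # filled circle, so bg sets the fill color
      bg = point_bg)
```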
  16. dot plot [Figure: "MPG by Automobile", dot plot listing car models from Toyota Corolla to Cadillac Fleetwood, axis from 10 to 30]
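A sorted dot plot of this kind can be sketched with base `dotchart()` and the built-in `mtcars` data. The sort order and text size are assumptions chosen for readability.

```r
# Sort cars by miles per gallon so the dots form a readable ranking
cars_sorted <- mtcars[order(mtcars$mpg), ]

dotchart(cars_sorted$mpg,
         labels = rownames(cars_sorted),
         cex = 0.7,                     # shrink text so all labels fit
         xlab = "Miles per Gallon",
         main = "MPG by Automobile")
```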
  17. contour plots [Figure: "A Topographic Map of Maunga Whau", 10 Meter Contour Spacing, Meters West vs. Meters North]
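The built-in `volcano` matrix holds elevations for Maunga Whau on a 10 m grid, so a contour map like this one can be sketched as below. The contour levels are chosen here to give 10 meter spacing and are an assumption about the original plot.

```r
# volcano is an 87 x 61 matrix of heights on a 10 m by 10 m grid
x <- 10 * seq_len(nrow(volcano))   # meters north
y <- 10 * seq_len(ncol(volcano))   # meters west

contour(x, y, volcano,
        levels = seq(90, 200, by = 10),
        xlab = "Meters North", ylab = "Meters West",
        main = "A Topographic Map of Maunga Whau",
        sub = "10 Meter Contour Spacing")
```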
  18. LATTICE GRAPHICS
  19. lattice graphs • expands base graphics to include trellis plots • seeks to improve on the graph defaults (symbols, axes, labels) of base graphics • grouping – color, fill, line type can be mapped to variable values • facets – subgroups can be plotted in an array based on the levels of (usually) one or two variables • customizable panel functions give you fine-grained control of what is plotted in each facet • comments – clean and fast – high degree of customization possible
  20. 3D graphs with faceting
  21. lattice graph with faceting and a customized panel function
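The faceting and panel-function ideas from the lattice slides can be sketched as follows. The `mtcars` variables and the contents of the panel function are illustrative assumptions, not the example shown in the presentation.

```r
library(lattice)

# Treat number of cylinders as a factor so it can define the facets
mtcars2 <- transform(mtcars, cyl = factor(cyl))

# "mpg ~ wt | cyl" means: plot mpg against wt, one panel per level of cyl
p <- xyplot(mpg ~ wt | cyl, data = mtcars2,
            layout = c(3, 1),
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)       # the points themselves
              panel.lmline(x, y, lty = 2)   # least-squares line per panel
              panel.rug(x = x)              # rug of x values per panel
            },
            xlab = "Weight (lb/1000)", ylab = "Miles per Gallon",
            main = "MPG vs. weight by number of cylinders")
print(p)
```

The panel function is called once per facet with that facet's data, which is what gives the fine-grained control the slide describes.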
  22. GGPLOT2 GRAPHICS
  23. ggplot2 • Grammar of Graphics • graphs built up in layers by plotting "geoms" • grouping – color, fill, shape, size can be mapped to variable values • facets – subgroups can be plotted in an array based on the levels of (usually) one or two variables • comments – allows you to create novel plots – can be slow for large problems – no 3D graphs – HOT!
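A minimal sketch of layers, grouping, and facets in ggplot2, assuming the package is installed. The dataset and variable choices are illustrative and are not taken from the slides.

```r
library(ggplot2)

mtcars2 <- transform(mtcars,
                     cyl = factor(cyl),
                     am = factor(am, labels = c("Automatic", "Manual")))

p <- ggplot(mtcars2, aes(x = wt, y = mpg, color = cyl)) +     # grouping by color
  geom_point() +                                              # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # layer 2: fit lines
  facet_wrap(~ am) +                                          # one panel per level
  labs(x = "Weight (lb/1000)", y = "Miles per Gallon",
       color = "Cylinders")
print(p)
```

Each `+` adds a layer or a specification to the plot object, and nothing is drawn until the object is printed.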
  24. kernel density plots with grouping
  25. histogram with faceting [Figure: histograms of salary (count), faceted by rank (AsstProf, AssocProf, Prof) and by Theoretical vs. Applied]
  26. boxplots [Figure: box plots of salary by rank (AsstProf, AssocProf, Prof), grouped by sex (Female, Male) and faceted by Theoretical vs. Applied]
  27. jittered plots
  28. scatter plot with smooth line
  29. scatterplot with fit lines, grouping, and faceting
  30. SPECIALIZED GRAPHS
  31. visualizing missing data, VIM package [Figure: Number of missings and Combinations of missing values for BodyWgt, BrainWgt, NonD, Dream, Sleep, Span, Gest, Pred, Exp, Danger]
  32. car package, scatterplot matrices [Figure: scatterplot matrix of yrs.since.phd, yrs.service, and salary, with Frequency histograms]
  33. corrplot package, visualizing correlations [Figure: correlation plot of cyl, disp, wt, hp, carb, qsec, gear, am, drat, vs, and mpg, color scale from -1 to 1] • variables reordered to find clusters • non-significant (.05) correlations indicated with an X
  34. Heatmap, stats package [Figure: heatmap of Car Models (rows) by Specification Variables (columns: disp, hp, mpg, qsec, gear, drat, wt, carb, vs, am, cyl)]
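A heatmap of this kind can be sketched with `heatmap()` from the stats package on the built-in `mtcars` data. Standardizing the columns beforehand is an assumption; the scaling used on the slide is not stated.

```r
# Standardize each variable so columns with large units do not dominate
m <- scale(as.matrix(mtcars))

# heatmap() clusters rows and columns and reorders them by the dendrograms;
# it invisibly returns the row and column orderings
hm <- heatmap(m, scale = "none", margins = c(5, 8),
              xlab = "Specification Variables", ylab = "Car Models",
              main = "Heatmap")
```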
  35. visualizing categorical data, vcd package [Figure: counts by Sex (Male, Female), Survived (No, Yes), and Class (1st, 2nd, 3rd)]
  36. visualizing effects (linear models): 2 x 3 ANCOVA
  37. rank by sex interaction (means) adjusting for other variables, effects package [Figure: "rank*sex effect plot", salary by rank (AsstProf, AssocProf, Prof) in panels for sex : Female and sex : Male]
  38. visualizing effects (generalized linear models): Logistic regression with 8 predictors
  39. rating effects (prob) by gender adjusting for other variables, effects package
  40. scatterplot3d package [Figure: "3D Scatterplot", Automobile Data, Miles/(US) Gallon by Weight (lb/1000) and Displacement (cu. in.), points labeled by car model]
  41. INTERACTIVE GRAPHICS
  42. iplots: hold [Ctrl] and mouse over graph for info
  43. rggobi • GGobi is an open source visualization program for exploring high-dimensional data • rggobi provides an R command line interface to GGobi • Installation: 1. install GGobi: download from www.ggobi.org 2. in R: install.packages("rggobi") • see: http://www.ggobi.org/rggobi/introduction.pdf
  44. GGobi menus • Display to open new windows • Interaction to select, identify, or brush • View to change type of xy plot • right mouse to select
  45. googleVis • Provides access to Google Chart Tools – motion charts – annotated time lines – maps – other (e.g. line, bar, bubble, column, area, scatter, candlestick, pie, org charts) – https://developers.google.com/chart/ • output is html code containing data and references to JavaScript functions hosted by Google • an internet connection is required to view the graphs • demo(WorldBank) • Hans Rosling in his TED talks
  46. [Figure]
  47. Shiny • Package for building interactive web applications with R – homepage: http://www.rstudio.com/shiny/ – examples: http://www.rstudio.com/shiny/showcase/ • Distribution – self hosted (requires free Shiny Server on Linux server) – RStudio hosted – distribute as a package • Code: pkgs <- c("Rcpp", "httpuv", "shiny"); install.packages(pkgs); library(shiny); runExample("06_tabsets")
  48. shiny example
  49. RESOURCES
  50. www.statmethods.net
  51. Books • R in Action, Robert I. Kabacoff • R Graphics Cookbook, Winston Chang • Lattice, Deepayan Sarkar • ggplot2, Hadley Wickham
  52. additional websites • Cookbook for R http://www.cookbook-r.com/ • ggplot2 documentation http://docs.ggplot2.org/current/ • R-Bloggers http://www.r-bloggers.com/
