Colman McMahon, DIT School of Computing: Getting Started with Data Visualisation


Graduating with a BA from UCD in 1995, Colman emigrated to America to pursue a career that combined creativity, commerce and computers. Heading west to California, Colman worked for 11 years in Hollywood's visual effects (VFX) industry. During this time he worked mainly at The Walt Disney Co. and also as a. In 2006, Colman returned home to Ireland to undertake a . A short time after the conclusion of the course, while starting up his own , Colman was invited back to DIT as a part-time lecturer. In 2011, Colman was offered a PhD Fellowship at modeling and simulating the relationship between innovation and profit. This full-time study is under the direction of Prof. Petra Ahrweiler, Director UCD Innovation Research Unit and Professor of Technology and Innovation Management, Smurfit School of Business. In 2012, Colman designed and delivered the first iteration of a new Visualisation module as part of DIT's .

Details of Colman's research activities can be found at .

-Dublinked-
Drawing from a new module at DIT, Colman's presentation at Dublinked will be an introduction to the domain of visualisation and a demonstration of powerful yet "do-able" data visualisations. The ethos of the presentation is for people who have little or no visualisation experience but have an aptitude and appetite for using technical tools to surface meaning from data. The tools used will be R, R Studio and Inkscape.


Transcript

  • 1. Fingal County Council. Getting Started with Data Visualization. Colman McMahon, colman@colmanmcmahon.com. 2012-05-24 1
  • 2. Visualisation. MSc Data Analytics http://www.dit.ie/postgrad/programmes/dt285dt286mscincomputingdataanalytics/
  • 3. Agenda: 1) Background to Data Visualization* 2) Resources 3) Classification of Visualization 4) The Design Process 5) Demonstration. *Disclaimer (and apologies to some): I use the American spelling “visualization”.
  • 4. Take-away Points: 1) Open to all » new domain with many facets 2) Professional-level output is achievable » practice a few programming and graphic design techniques 3) It's (only) a means to an end » should affect behaviour
  • 5. Background: Data Visualisation (very briefly)
  • 6. Charles Joseph Minard (1781 – 1870)
  • 7. William Playfair (1759 – 1823)
  • 8. Broad Street cholera outbreak (John Snow - 1854)
  • 9. Crimean War deaths (Florence Nightingale - 1858)
  • 10. London Underground Map, Harry Beck (1933) http://briankerr.wordpress.com/2009/06/08/connections/
  • 11. John Tukey (1915 – 2000)
  • 12. Edward Tufte
  • 13. Hans Rosling
  • 14. ...and many other giants of statistics, mathematics, medicine, design, computing and related fields
  • 15. Agenda: 1) Background to Data Visualization* 2) Resources 3) Classification of Visualisation 4) The Design Process 5) Demonstration
  • 16. Resources (ever growing)
  • 17. Texts (1 of 2)► R in a Nutshell: A Desktop Quick Reference - Adler, Joseph► Excel 2007 Dashboards & Reports For Dummies - Alexander, Michael► Ways of Seeing: Based on the BBC Television Series - Berger, John S.► Semiology of Graphics: Diagrams, Networks, Maps - Bertin, Jacques► Statistics in a Nutshell: A Desktop Quick Reference - Boslaugh, Watters► The Jelly Effect: How to Make Your Communication Stick - Bounds, Andy► Gamestorming: A Playbook for Innovators, Rulebreakers, and Changemakers - Brown, Sunni► Sketching User Experiences: Getting the Design Right and the Right Design - Buxton, Bill► Readings in Information Visualization: Using Vision to Think - Card, Mackinlay and Shneiderman► The Elements of Graphing Data - Cleveland, William S.► Visualizing Data - Cleveland, William S.► Now You See It - Davidson, Cathy N.► slide:ology: The Art and Science of Creating Great Presentations - Duarte, Nancy► Art: The Whole Story - Farthing, Stephen► Information Dashboard Design: The Effective Visual Communication of Data - Few, Stephen► Now You See It: Simple Visualization Techniques for Quantitative Analysis - Few, Stephen► Show Me the Numbers: Designing Tables and Graphs to Enlighten - Few, Stephen► Freelance Design in Practice - Fishel, Cathy► Art of Plain Talk - Flesch, Rudolf► The Art of Looking Sideways - Fletcher, Alan► Graphic Artists Guild Handbook of Pricing and Ethical Guidelines - Graphic Artists Guild► Made to Stick: Why Some Ideas Survive and Others Die - Heath, Chip and Dan► Switch: How to Change Things When Change Is Hard - Heath, Chip and Dan► Data Analysis with Open Source Tools - Janert, Philipp K.► We Feel Fine: An Almanac of Human Emotion - Kamvar, Sep► Turning Numbers into Knowledge: Mastering the Art of Problem Solving - Koomey, Jon► Elements of Graph Design - Kosslyn, Stephen M.2012-05-24 17 Andy Kirk, http://www.visualisingdata.com
  • 18. Texts (2 of 2) ► Graph Design for the Eye and Mind - Kosslyn, Stephen M. ► Don't Make Me Think: A Common Sense Approach to Web Usability, 2nd Edition - Krug, Steve ► Universal Principles of Design, Revised and Updated - Lidwell, Holden, Butler ► Visual Complexity: Mapping Patterns of Information - Lima, Manuel ► The Power of the 2 x 2 Matrix: Using 2 x 2 Thinking to Solve Business Problems and Make Better Decisions - Lowy, Alex ► How Maps Work: Representation, Visualization, and Design - MacEachren, Alan M. ► The Laws of Simplicity (Simplicity: Design, Technology, Business, Life) - Maeda, John ► Visual Language for Designers: Principles for Creating Graphics that People Understand - Malamed, Connie ► Understanding Comics: The Invisible Art - McCloud, Scott ► The Chicago Guide to Writing about Numbers (Chicago Guides to Writing, Editing, and Publishing) - Miller, Jane E. ► How to make an IMPACT - Moon, Jon ► Designing Visual Interfaces: Communication Oriented Techniques - Mullet, Kevin ► The Designful Company: How to build a culture of nonstop innovation - Neumeier, Marty ► Emotional Design: Why We Love (or Hate) Everyday Things - Norman, Donald A. ► The Design of Everyday Things - Norman, Donald A. ► Playfair's Commercial and Political Atlas and Statistical Breviary - Playfair, William ► Presentation Zen Design: Simple Design Principles and Techniques to Enhance Your Presentations - Reynolds, Garr ► Presentation Zen: Simple Ideas on Presentation Design and Delivery - Reynolds, Garr ► The Back of the Napkin (Expanded Edition): Solving Problems and Selling Ideas with Pictures - Roam, Dan ► Unfolding the Napkin: The Hands-On Method for Solving Complex Problems with Simple Pictures - Roam, Dan ► Creating More Effective Graphs - Robbins, Naomi B. ► The Craft of Information Visualization: Readings and Reflections - Shneiderman, Ben ► The Visual Display of Quantitative Information - Tufte, Edward R. ► Envisioning Information - Tufte, Edward R. ► Beautiful Evidence - Tufte, Edward R. ► Graphic Discovery: A Trout in the Milk and Other Visual Adventures - Wainer, Howard ► Visual Thinking: for Design - Ware, Colin ► The Grammar of Graphics - Wilkinson, Leland ► The Non-Designer's Design Book (3rd Edition) - Williams, Robin ► Glut: Mastering Information Through the Ages - Wright, Alex. Andy Kirk, http://www.visualisingdata.com
  • 19. Tools for Analysis, Graphing and Enterprise► Microsoft Excel http://office.microsoft.com/en-us/excel/► Open Office Calc http://why.openoffice.org/why_great.html► Tableau Desktop http://www.tableausoftware.com/products/desktop► Tableau Public http://www.tableausoftware.com/public/► TIBCO Spotfire http://spotfire.tibco.com/► QlikView http://www.qlikview.com/► Grapheur http://grapheur.com/► Gephi http://gephi.org/► Visokio Omniscope http://www.visokio.com/► Panopticon http://www.panopticon.com/► Wolfram Mathematica http://www.wolfram.com/mathematica/► Data Graph http://www.visualdatatools.com/DataGraph/► OmniGraphSketcher http://www.omnigroup.com/products/omnigraphsketcher► PLOT http://plot.micw.eu/► MATLAB http://www.mathworks.com/products/matlab/► SPSS Visualisation Designer http://www-01.ibm.com/software/analytics/spss/products/statistics/vizdesigner/► STATA http://www.stata.com/► Visualize Free http://visualizefree.com/index.jsp► Dundas http://www.dundas.com/dashboard/► Wondergraphs http://www.wondergraphs.com/ Andy Kirk, http://www.visualisingdata.com2012-05-24 19
  • 20. Visual Programming Languages and Environments► Adobe Flash http://www.adobe.com/products/flash/► Processing http://processing.org/► Processing.js http://processingjs.org/► R http://www.r-project.org/► D3 http://mbostock.github.com/d3► Protovis http://protovis.org/► Prefuse http://prefuse.org/► Prefuse Flare http://flare.prefuse.org/► Impure http://www.impure.com/► Mondrian http://www.theusrus.de/Mondrian/► HTML5 http://dev.w3.org/html5/spec/► Python http://www.python.org/► Silverlight http://www.silverlight.net/► Orange http://orange.biolab.si► paper.js http://paperjs.org/about/► WebGL http://www.chromeexperiments.com/webgl► Dejavis http://dejavis.org/stacks► Simile Widgets http://simile-widgets.org/► JavaScript InfoVis Toolkit http://thejit.org/► Juice Kit http://www.juicekit.org/► Treevis http://treevis.net/ Andy Kirk, http://www.visualisingdata.com2012-05-24 20
  • 22. Googles Charting and Visualisation Tools► Google Docs https://docs.google.com/?pli=1#home► Google Fusion Tables http://www.google.com/fusiontables/Home?pli=1► Google Chart API http://code.google.com/apis/chart/► Google Visualization API http://code.google.com/apis/visualization/documentation/gallery.html► Google Motion Chart & Public Data Explorer http://www.google.com/publicdata/home► Google Insights for Search http://www.google.com/insights/search/#► Google Zeitgeist http://www.google.com/intl/en/press/zeitgeist2010/► Google Ngram Viewer http://ngrams.googlelabs.com/► Google Analytics http://www.google.com/intl/en_uk/analytics/► Google.org Philanthropy http://www.google.org/#one► Google Wonder Wheel http://www.google.com/landing/searchtips/engineers.html► GraphViz http://code.google.com/apis/chart/docs/gallery/graphviz.html► Choosel http://code.google.com/p/choosel/► Data Appeal http://dataappeal.com/ Andy Kirk, http://www.visualisingdata.com2012-05-24 22
  • 23. Tools for Mapping► Google Maps & Google Earth http://www.google.co.uk/help/maps/tour/► ArcGIS http://www.arcgis.com/home/index.html► GeoCommons http://geocommons.com/► OpenHeatMap http://www.openheatmap.com/► Indiemapper http://indiemapper.com/► InstantAtlas http://www.instantatlas.com/Choose_your_language.xhtml► Target Map http://www.targetmap.com/► TileMill http://tilemill.com/index.html► Polymaps http://polymaps.org/► Color Brewer http://colorbrewer2.org/► Dotspotting http://dotspotting.org/► DataMaps.eu http://www.datamaps.eu/► GeoTime http://geotime.com/ Andy Kirk, http://www.visualisingdata.com2012-05-24 23
  • 24. Specialist Tools and Visualisation Communities► Many Eyes http://www-958.ibm.com/software/data/cognos/manyeyes/► Visual.ly http://visual.ly/► Visualizing Player http://www.visualizing.org/► Number Picture http://numberpicture.com/► Parallel Sets http://eagereyes.org/parallel-sets► Dipity http://www.dipity.com/► Wordle http://www.wordle.net/► Tagxedo http://www.tagxedo.com/► VisualEyes http://www.viseyes.org/► Wordlings http://wordlin.gs/► Chartle http://www.chartle.net/► ChartsBin http://chartsbin.com/► Simple Usability http://www.simpleusability.com/services/usability/eye-tracking► Fineo http://fineo.densitydesign.org/custom/ Andy Kirk, http://www.visualisingdata.com2012-05-24 24
  • 25. Combination of Many Disciplines. Given the complexity of data, insights from diverse fields are required to provide meaningful solutions: Statistics, Graphic Design, Data Mining, Computer Science, Data/Info Visualisation (Ben Fry – “Visualizing Data”). 2011/12
  • 26. Pick an area of interest/define your requirements, then drill down...
  • 27. Primary Texts
  • 28. “Designing Data Visualizations: Intentional Communication from Data to Display”, Noah Iliinsky and Julie Steele. Publisher: O'Reilly Media (September 29, 2011). ISBN-10: 1449312284
  • 29. “Visualize This: The FlowingData Guide to Design, Visualization and Statistics”, Nathan Yau. Publisher: Wiley (July 20, 2011). ISBN-10: 0470944889
  • 30. “Visualizing Data”, Ben Fry. Publisher: O'Reilly Media (January 11, 2008). ISBN-10: 0596514557
  • 31. Course tools (all free/open source)
  • 32. R Project http://www.r-project.org/
  • 33. R Studio http://rstudio.org
  • 34. R & R Studio stack: R Studio » R » Computer OS (must have R for R Studio to work)
  • 35. Inkscape http://inkscape.org/
  • 36. Python http://python.org/ ► Download & install » http://wiki.python.org/moin/BeginnersGuide/Download ► Beginners Guide » http://wiki.python.org/moin/BeginnersGuide/NonProgrammers
  • 37. Beautiful Soup http://www.crummy.com/software/BeautifulSoup/
  • 38. Notepad++ http://notepad-plus-plus.org/
  • 39. 7Zip http://www.7-zip.org/
  • 40. Calibre http://calibre-ebook.com/
  • 41. Agenda: 1) Background to Data Visualization* 2) Resources 3) Classification of Visualisation 4) The Design Process 5) Demonstration
  • 42. Classification of Visualization
  • 43. “Designing Data Visualizations: Intentional Communication from Data to Display”, Noah Iliinsky and Julie Steele. Publisher: O'Reilly Media (September 29, 2011). ISBN-10: 1449312284
  • 44. Classifications of Visualizations: 1) Complexity 2) Infographics / Data Viz 3) Exploration / Explanation 4) Informative / Persuasive / Visual Art
  • 45. (Data Visualisations) (Infographics) Figure 1-2. The difference between infographics and data visualization may be loosely determined by the method of generation, the quantity of data represented, and the degree of aesthetic treatment applied.
  • 46. Infographics. Infographics is a useful term for referring to a visual representation of data that is: » manually drawn (and therefore a custom treatment of the information) » specific to the data at hand (and therefore non-trivial to recreate with different data) » aesthetically rich (strong visual content meant to draw the eye and hold interest) » relatively data-poor (because each piece of information must be manually encoded)
  • 47.
  • 48.
  • 49. Classifications of Visualizations: 1) Complexity 2) Infographics / Data Viz 3) Exploration / Explanation 4) Informative / Persuasive / Visual Art
  • 50. (Data Visualisations) (Infographics) Figure 1-2. The difference between infographics and data visualization may be loosely determined by the method of generation, the quantity of data represented, and the degree of aesthetic treatment applied.
  • 51. Data Visualization. The terms data visualization and information visualization refer to any visual representation of data that is: » algorithmically drawn (may have custom touches but is largely rendered with the help of computerized methods); » easy to regenerate with different data (the same form may be re-purposed to represent different datasets with similar dimensions or characteristics); » often aesthetically barren (data is not decorated); and » relatively data-rich (large volumes of data are welcome and viable, in contrast to infographics)
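The "algorithmically drawn" and "easy to regenerate with different data" properties can be illustrated with a minimal sketch. The function name `bar_chart_svg`, its parameters and both datasets are invented for illustration; this is not code from the deck or from the book it quotes.

```python
def bar_chart_svg(values, bar_width=20, gap=5, height=100):
    """Return an SVG string with one bar per value, scaled to the tallest bar."""
    peak = max(values)
    bars = []
    for i, value in enumerate(values):
        bar_height = height * value / peak
        x = i * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points downwards
        bars.append(
            f'<rect x="{x}" y="{y:.1f}" width="{bar_width}" '
            f'height="{bar_height:.1f}" fill="#93ceef"/>'
        )
    width = len(values) * (bar_width + gap)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(bars)
        + "</svg>"
    )


# The same form is reused for two unrelated, made-up datasets.
sales = bar_chart_svg([3, 7, 5])
rainfall = bar_chart_svg([12.5, 40.0, 22.1, 9.8])
```

Nothing in the function is specific to one dataset, which is the contrast with a hand-drawn infographic.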
  • 52. Figure 4-47: Unemployment rates with fitted LOESS curve
  • 53.
  • 54. Classifications of Visualizations: 1) Complexity 2) Infographics / Data Viz 3) Exploration / Explanation 4) Informative / Persuasive / Visual Art
  • 55. Exploration vs Explanation. Exploratory visualization: ► The dataset ► The mind of the designer. Explanatory visualization: ► The mind of the designer ► The mind of the reader
  • 56. "Holy Trinity": Designer-Reader-Data. Diagram labels: Reader, Data, Designer; Informative, Persuasive, Visual Art. Figure 1-4. The nature of the visualization depends on which relationship (between two of the three components) is dominant.
  • 57. Classifications of Visualizations: 1) Complexity 2) Infographics / Data Viz 3) Exploration / Explanation 4) Informative / Persuasive / Visual Art
  • 58. Informative http://www.irisheconomy.ie/wp-content/uploads/2009/05/unemployment.gif
  • 59. Persuasive
  • 60. http://www.flickr.com/photos/robertpalmer/3743826461/sizes/l/in/photostream/
  • 61. Visual Art. Nora Ligorano and Marshall Reese designed a project that converts Twitter streams into a woven fiber-optic tapestry (http://ligoranoreese.net/fiber-optic-tapestry)
  • 62. Classifications of Visualizations: 1) Complexity 2) Infographics / Data Viz 3) Exploration / Explanation 4) Informative / Persuasive / Visual Art
  • 63. Agenda: 1) Background to Data Visualization* 2) Resources 3) Classification of Visualisation 4) The Design Process 5) Demonstration
  • 64. The Design Process
  • 65. “Visualizing Data”, Ben Fry. Publisher: O'Reilly Media (January 11, 2008). ISBN-10: 0596514557
  • 66. Reconcile through a single process... ► Must reconcile the various elements through a single process ► The process begins with: » a set of numbers » a question
  • 67. Visualization Goals - Technical: 1) Highlight data features in order of their importance 2) Reveal patterns 3) Simultaneously show features across multiple dimensions » e.g. time, quantity & geography
  • 68. Visualization Goals - People ► The goal of your visualization will be informed by: » Your own goals and motivations » The needs of your reader • need for specific information • to change the reader’s opinions or behaviour
  • 69. Data Visualization Process - 7 Stages: acquire » parse » filter » mine » represent » refine » interact ► Iteration & combination » demonstrates how later decisions can affect earlier stages
  • 70. Data Process – 7 Stages: 1) Acquire: obtain the data (file, disk, over network) 2) Parse: provide some structure for the data's meaning, and order it into categories 3) Filter: remove all but the data of interest 4) Mine: apply methods from statistics or data mining as a way to discern patterns or place the data in mathematical context 5) Represent: choose a basic visual model, such as a bar graph, list or tree 6) Refine: improve the basic representation to make it clearer and more visually engaging 7) Interact: add methods for manipulating the data or controlling what features are visible (may not need every step in every project)
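The first five stages can be walked through in a few lines of Python. This is only a sketch: the four rows are quoted from the BLS sample on slide 117 of this deck, the text bar chart is a stand-in for a real visual model, and the refine and interact stages are left out.

```python
import csv
import io
import statistics

# 1) Acquire: normally a file or URL; here, four rows quoted from slide 117.
raw = '''LAUS_CODE,STATE_FIPS,COUNTY_FIPS,COUNTY,MONTH,RATE
CN010010,01,001,"Autauga County, AL",Aug-10(p),8.1
PA011000,01,003,"Baldwin County, AL",Aug-10(p),8.2
CN010050,01,005,"Barbour County, AL",Aug-10(p),11.6
CN010110,01,011,"Bullock County, AL",Aug-10(p),15.0
'''

# 2) Parse: give the text structure (one dict per county).
rows = list(csv.DictReader(io.StringIO(raw)))

# 3) Filter: keep only the fields of interest.
rates = {row["COUNTY"]: float(row["RATE"]) for row in rows}

# 4) Mine: a simple summary statistic.
mean_rate = statistics.mean(list(rates.values()))

# 5) Represent: a crude text bar chart, one '#' per percentage point.
for county, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{county:<20} {'#' * round(rate)} {rate}")
```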
  • 71. Represent ► Rule #1 - function then form ► The visual design elements should enhance and enable the function ► The key to a successful visualization is making good design choices » elegance, simplicity, efficiency
  • 72. Encodings
  • 73. Agenda: 1) Background to Data Visualization* 2) Resources 3) Classification of Visualisation 4) The Design Process 5) Demonstration
  • 74. Demonstration (walk-through followed by demo)
  • 75. “Visualize This: The FlowingData Guide to Design, Visualization and Statistics”, Nathan Yau. Publisher: Wiley (July 20, 2011). ISBN-10: 0470944889
  • 76. R Project http://www.r-project.org/
  • 77. R Studio http://rstudio.org
  • 78. The R Script ► A file in the R format ► Allows you to save your scripting work ► File » New » R Script (or Ctrl+Shift+N) ► Hit “Run” (or Ctrl + Enter) after each command
  • 79. The R Script
  • 80. The R Script pane
  • 81. Installing packages. Package installation in R Studio ► Option 1 (R or R Studio) » Type the following commands into the console or R script:
        install.packages("package-name")
        library(package-name)
    ► Option 2 (R Studio) » Use the GUI as shown on right -> Activate package
  • 82. Python http://python.org/ ► Download & install » http://wiki.python.org/moin/BeginnersGuide/Download ► Beginners Guide » http://wiki.python.org/moin/BeginnersGuide/NonProgrammers
  • 83. Beautiful Soup http://www.crummy.com/software/BeautifulSoup/
  • 84. Inkscape http://inkscape.org/
  • 85. Process (roughly): /cmd (run colorize_svg.py, or double-click to run) » colorize_svg.py with Beautiful Soup (uses BS & Python; data crunched) » counties.svg (writes to a new file)
  • 86. Chapter 8: Visualizing Spatial Relationships ► What to Look For ► Specific Locations » Just Points • Map with Dots • Map with Lines » Scaled Points • Map with Bubbles ► Regions » Color by Data • Map Counties • Map Countries
  • 87. Map the points
  • 88. New Map with Dots R script file ► R, although limited in mapping functionality, makes placing dots on a map easy ► The maps package does most of the work » install via Package Installer or console ► Next step: load the data. Use the Costco locations that you just geocoded, or load them directly from the URL:
        costcos <- read.csv("http://book.flowingdata.com/ch08/geocode/costcos-geocoded.csv", sep=",")
  • 89. Costco
  • 90. Mapping – first layer ► When you create your maps, it’s useful to think of them as layers (regardless of the software in use) ► The bottom layer is usually the base map that shows geographical boundaries, and then you place data layers on top of that ► In this case the bottom layer is a map of the United States, and the second layer is Costco locations:
        library(maps)  # provides map()
        map(database="state")
    Figure 8-2: Plain map of the United States
  • 91. Mapping – second layer ► The second layer, the Costco locations, is then mapped with the symbols() function:
        symbols(costcos$Longitude, costcos$Latitude,
            circles=rep(1, length(costcos$Longitude)), inches=0.05, add=TRUE)
    Figure 8-3: Map of Costco locations
  • 92. Change colours ► Change the colors of both the map and the circles so that the locations stand out and boundary lines sit in the background:
        map(database="state", col="#cccccc")
        symbols(costcos$Longitude, costcos$Latitude, bg="#e2373f", fg="#ffffff",
            lwd=0.5, circles=rep(1, length(costcos$Longitude)),
            inches=0.05, add=TRUE)
    Figure 8-4: Using color with mapped locations
  • 93. Result? ► Not bad for a few lines of code. Costco has clearly focused on opening locations on the coasts with clusters in southern and northern California, northwest Washington, and in the northeast of the country. Figure 8-4: Using color with mapped locations
  • 94. Anything missing? (US geography question)
  • 95. Alaska & Hawaii ► Alaska and Hawaii are in the “world” database, so you need to map the entire world:
        map(database="world", col="#cccccc")
        symbols(costcos$Longitude, costcos$Latitude, bg="#e2373f", fg="#ffffff",
            lwd=0.3, circles=rep(1, length(costcos$Longitude)),
            inches=0.03, add=TRUE)
    Figure 8-5: World map of Costco locations
  • 96. State specific ► Say you want to map Costco locations for only a few states. You can do that with the region argument:
        map(database="state", region=c("California", "Nevada", "Oregon",
            "Washington"), col="#cccccc")
        symbols(costcos$Longitude, costcos$Latitude, bg="#e2373f", fg="#ffffff",
            lwd=0.5, circles=rep(1, length(costcos$Longitude)), inches=0.05,
            add=TRUE)
    ► Some dots are not in any of those states » easy to remove in Inkscape
    Figure 8-6: Costco locations in selected states
  • 97. Chapter 8: Visualizing Spatial Relationships ► What to Look For ► Specific Locations » Just Points • Map with Dots • Map with Lines » Scaled Points • Map with Bubbles ► Regions » Color by Data • Map Counties • Map Countries
  • 98. Figure 8-7: Drawing a location trace
  • 99. New Map with Lines R script file ► Draw the lines by simply plugging the two columns into lines(). Also specify color (col) and line width (lwd):
        lines(faketrace$longitude, faketrace$latitude, col="#bb4cd4", lwd=2)
    ► Now also add dots, exactly as you just did with the Costco locations:
        symbols(faketrace$longitude, faketrace$latitude, lwd=1, bg="#bb4cd4", fg="#ffffff",
            circles=rep(1, length(faketrace$longitude)), inches=0.05, add=TRUE)
    Figure 8-7: Drawing a location trace
  • 100. Figure 8-8: Drawing worldwide connections
  • 101. Drawing Connections ► It could be interesting to draw lines from one location to all the others:
        map(database="world", col="#cccccc")
        # 2:length(x)-1 parses as (2:length(x))-1, i.e. 1:(length(x)-1); written out explicitly here
        for (i in 1:(length(faketrace$longitude) - 1)) {
            lngs <- c(faketrace$longitude[8], faketrace$longitude[i])
            lats <- c(faketrace$latitude[8], faketrace$latitude[i])
            lines(lngs, lats, col="#bb4cd4", lwd=2)
        }
    (run the loop as a block)
    ► Isn’t very informative, but maybe you can find a good use for it ► The point here is that you can draw a map and then use R’s other graphics functions to draw whatever you want using latitude and longitude coordinates.
    Figure 8-8: Drawing worldwide connections
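The hub-to-every-other-point idea can be sketched in Python as follows. The helper name `hub_segments` and the coordinates are invented for illustration and are not the faketrace data. Unlike the R loop on the slide, this sketch skips the hub itself and includes the last point.

```python
def hub_segments(longitudes, latitudes, hub):
    """Pair the hub point with every other point.

    Returns a list of ((lng, lat), (lng, lat)) tuples, one per line to draw.
    """
    origin = (longitudes[hub], latitudes[hub])
    return [
        (origin, (lng, lat))
        for i, (lng, lat) in enumerate(zip(longitudes, latitudes))
        if i != hub
    ]


# Invented coordinates, for illustration only.
lngs = [-6.26, -118.24, 2.35, 139.69]
lats = [53.35, 34.05, 48.86, 35.68]
segments = hub_segments(lngs, lats, hub=0)
```

Each tuple holds the two endpoints that a plotting call would connect.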
  • 102. Chapter 8: Visualizing Spatial Relationships ► What to Look For ► Specific Locations » Just Points • Map with Dots • Map with Lines » Scaled Points • Map with Bubbles ► Regions » Color by Data • Map Counties • Map Countries
  • 103. Figure 8-10: Rates more clearly explained for a wider audience
  • 104. Scaled Points ► Usually you don’t just have a location » you also have other values, e.g. • sales volume • city population ► Use the principle of the bubble plot and apply it to a map
  • 105. New R script file ► The code is almost the same as when you mapped Costco locations, but remember that there you passed a vector of ones for circle size in the symbols() function. Here, use the sqrt() of the rates to indicate size:
        fertility <- read.csv("http://book.flowingdata.com/ch08/points/adol-fertility.csv")
        map("world", fill = FALSE, col = "#cccccc")
        symbols(fertility$longitude, fertility$latitude,
            circles=sqrt(fertility$ad_fert_rate), add=TRUE,
            inches=0.15, bg="#93ceef", fg="#ffffff")
    Figure 8-9: Adolescent fertility rate worldwide
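The reason for taking the square root is that readers tend to compare bubbles by area, and area grows with the square of the radius. A small Python sketch of that arithmetic, where `bubble_radii`, `max_radius` and the two rates are hypothetical and chosen only to make the numbers easy to check:

```python
import math


def bubble_radii(values, max_radius=0.15):
    """Radii such that circle AREA is proportional to value.

    The largest value is drawn with radius max_radius.
    """
    roots = [math.sqrt(v) for v in values]
    largest = max(roots)
    return [max_radius * r / largest for r in roots]


rates = [25.0, 100.0]  # made-up values
small, large = bubble_radii(rates)

# A value four times larger gets four times the area, not four times the radius.
area_ratio = (math.pi * large**2) / (math.pi * small**2)
```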
  • 106. Figure 8-10: Rates more clearly explained for a wider audience
  • 107. Chapter 8: Visualizing Spatial Relationships ► What to Look For ► Specific Locations » Just Points • Map with Dots • Map with Lines » Scaled Points • Map with Bubbles ► Regions » Color by Data • Map Counties • Map Countries
  • 108. Regions ► Mapping points can take you only so far because they represent only single locations ► Large scale data is usually aggregated over whole counties, states, countries, and continents ► Use Python and SVG to generate the map » Python - to process the data » SVG - for the map. http://www.nevron.com/Gallery.DiagramFor.NET.Maps.ChoroplethMaps.aspx
  • 109. Color By Data ► Choropleth maps are the most common way to map regional data ► Based on some metric, regions are colored following a color scale that you define. Figure 8-11: Choropleth map framework
  • 110. Using colours ► When you have your color scheme, you have two more things to do: » Scale - decide how the colors you picked match up to the data range » Location - assign colors to each region based on your choice. http://gismapcatalog.blogspot.com/2010/07/standardized-choropleth-map.html
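The "scale" decision can be written as a small lookup that puts each value into a colour class. In this sketch the six-colour ramp and the class breaks at every two percentage points are assumptions chosen for illustration, not values stated on the slides.

```python
from bisect import bisect_right

# Assumed class breaks (percent) and a light-to-dark ramp; both are illustrative.
BREAKS = [2, 4, 6, 8, 10]
COLORS = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]


def color_for_rate(rate):
    """Return the colour of the class that rate falls into.

    A rate equal to a break belongs to the class above it.
    """
    return COLORS[bisect_right(BREAKS, rate)]


light = color_for_rate(1.9)
dark = color_for_rate(15.0)
```

With five breaks there are six classes, so the ramp needs exactly one more colour than there are breaks.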
  • 111. Chapter 8: Visualizing Spatial Relationships ► What to Look For ► Specific Locations » Just Points • Map with Dots • Map with Lines » Scaled Points • Map with Bubbles ► Regions » Color by Data • Map Counties • Map Countries
  • 112. Unemployment by county
  • 113. Connect data & map. Diagram labels: Unemployment rates, Blank map, Beautiful Soup, Python, “colorize_svg.py”, New map
  • 114. Connect data & map. Diagram labels: Beautiful Soup, Python, “colorize_svg.py”
  • 115. File structure
  • 116. Get data ► The U.S. Bureau of Labor Statistics provides county-level unemployment data every month ► Download the data at http://book.flowingdata.com/ch08/regions/unemployment-aug2010.txt ► There are six columns: 1) is a code specific to the Bureau of Labor Statistics; 2) and 3) together form a unique id specifying the county; 4) is the county name; 5) is the month the rate is an estimate for; 6) is the estimated percentage of people in the county who are unemployed ► For the purposes of this example, we are only interested in the COUNTY ID (FIPS) and the RATE
  • 117. US Unemployment figures (BLS)
        LAUS_CODE,STATE_FIPS,COUNTY_FIPS,COUNTY,MONTH,RATE
        CN010010,01,001,"Autauga County, AL",Aug-10(p),8.1
        PA011000,01,003,"Baldwin County, AL",Aug-10(p),8.2
        CN010050,01,005,"Barbour County, AL",Aug-10(p),11.6
        CN010070,01,007,"Bibb County, AL",Aug-10(p),10.1
        CN010090,01,009,"Blount County, AL",Aug-10(p),8.3
        CN010110,01,011,"Bullock County, AL",Aug-10(p),15.0
        CN010130,01,013,"Butler County, AL",Aug-10(p),12.2
        PA010250,01,015,"Calhoun County, AL",Aug-10(p),9.1
        CN010170,01,017,"Chambers County, AL",Aug-10(p),13.6
        CN010190,01,019,"Cherokee County, AL",Aug-10(p),8.8
        CN010210,01,021,"Chilton County, AL",Aug-10(p),9.4
        CN010230,01,023,"Choctaw County, AL",Aug-10(p),11.1
        CN010250,01,025,"Clarke County, AL",Aug-10(p),15.8
        CN010270,01,027,"Clay County, AL",Aug-10(p),13.3
        CN010290,01,029,"Cleburne County, AL",Aug-10(p),8.4
        CN010310,01,031,"Coffee County, AL",Aug-10(p),7.3
        PA010900,01,033,"Colbert County, AL",Aug-10(p),9.2
        CN010350,01,035,"Conecuh County, AL",Aug-10(p),15.4
        CN010370,01,037,"Coosa County, AL",Aug-10(p),12.2
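Reading this file into a lookup keyed by the full county id takes only a few lines with Python's standard csv module. A sketch, assuming the file starts with the header row shown on this slide; `load_rates` is a hypothetical helper name and the three rows are copied from the sample.

```python
import csv
import io

sample = '''LAUS_CODE,STATE_FIPS,COUNTY_FIPS,COUNTY,MONTH,RATE
CN010010,01,001,"Autauga County, AL",Aug-10(p),8.1
PA011000,01,003,"Baldwin County, AL",Aug-10(p),8.2
CN010050,01,005,"Barbour County, AL",Aug-10(p),11.6
'''


def load_rates(handle):
    """Map the 5-digit FIPS code (state + county) to the unemployment rate."""
    rates = {}
    for row in csv.DictReader(handle):
        # Codes stay strings so leading zeros survive ("01" + "001" -> "01001").
        fips = row["STATE_FIPS"] + row["COUNTY_FIPS"]
        rates[fips] = float(row["RATE"])
    return rates


unemployment = load_rates(io.StringIO(sample))
```

For the downloaded file, `load_rates(open("unemployment-aug2010.txt", newline=""))` would be the equivalent call, under the same header assumption.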
  • 118. Get map ► Blank map from Wikimedia Commons: http://commons.wikimedia.org/wiki/File:USA_Counties_with_FIPS_and_names.svg ► Download the SVG file and save it as counties.svg, in the same directory where you saved the unemployment data
  • 119. Download the SVG file http://commons.wikimedia.org/wiki/File:USA_Counties_with_FIPS_and_names.svg
  • 120. SVG map file ► SVG (scalable vector graphics) is an XML file ► It’s text with tags, and you can edit it in a text editor like you would an HTML file ► The browser or image viewer reads the XML, and the XML tells the browser what to show, such as the colors to use and shapes to draw.
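Because SVG is plain XML, any XML parser can read it. The deck uses Beautiful Soup for this; the sketch below swaps in the standard library's ElementTree so that it runs without extra installs, and the two-shape SVG is a hand-written stand-in for the real county map.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written SVG standing in for the real county map.
svg_text = '''<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <path id="a" d="M0 0 L40 0 L40 40 Z" style="fill:#d0d0d0;stroke:#000000"/>
  <path id="b" d="M50 0 L90 0 L90 40 Z" style="fill:#d0d0d0;stroke:#000000"/>
</svg>'''

SVG_NS = "{http://www.w3.org/2000/svg}"  # ElementTree prefixes tags with the namespace

root = ET.fromstring(svg_text)
paths = root.findall(f"{SVG_NS}path")
path_ids = [p.get("id") for p in paths]
styles = [p.get("style") for p in paths]
```

Each shape is an ordinary element whose attributes (`id`, `d`, `style`) can be read and changed like any other XML attribute.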
  • 121. Figure 8-15: Blank SVG county map from Wikimedia Commons
  • 122. SVG - colour of each county ► Change the fill color of each county to match the corresponding unemployment rate:
        <path     style="font-size:12px;fill:#d0d0d0;fill-rule:nonzero;stroke:#000000;stroke-opacity:1;stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel"
    ► There are more than 3,000 counties, so editing them by hand is impractical; use Beautiful Soup, a Python library that makes parsing XML and HTML easy
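Changing a county's colour comes down to changing the `fill` property inside its `style` attribute. The helper below, `set_fill`, is a hypothetical illustration of that single step using plain string handling. It is not taken from the book's script.

```python
def set_fill(style, color):
    """Return the CSS-like style string with its fill property set to color."""
    declarations = []
    seen_fill = False
    for decl in style.split(";"):
        if not decl.strip():
            continue  # ignore empty pieces such as a trailing ";"
        name, _, _value = decl.partition(":")
        if name.strip() == "fill":  # exact match, so "fill-rule" is left alone
            decl = "fill:" + color
            seen_fill = True
        declarations.append(decl)
    if not seen_fill:
        declarations.append("fill:" + color)
    return ";".join(declarations)


# Shortened version of the style string shown on the slide.
style = "font-size:12px;fill:#d0d0d0;fill-rule:nonzero;stroke:#000000"
new_style = set_fill(style, "#6a51a3")
print(new_style)
```

Matching the property name exactly matters here, because the style string also contains `fill-rule`, which must not be overwritten.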
  • 123. Load the elements (create a small script/program): colorize_svg.py ► Open a blank file in the same directory as your SVG map and unemployment data ► Save it as colorize_svg.py ► Follow the instructions in the book to construct the script
  • 124. Connect data & map ► The challenge is to link the unemployment data to the county map ► The linkage = the FIPS codes (Federal Information Processing Standard) ► Diagram labels: unemployment rates, FIPS codes, blank map
  • 126. Connect data & SVG (map) ► Each path in the SVG file has a unique id » the state FIPS code followed by the county FIPS code, e.g. id="01001" inkscape:label="Autauga, AL"
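The join between the data and the map can be sketched as a dictionary lookup on each path's `id`. This is an illustration under stated assumptions, not the book's Beautiful Soup script: it uses the standard-library `xml.etree.ElementTree`, a three-path stand-in map (`TINY_MAP`), a made-up non-county path id (`State_Lines`), and a hypothetical two-colour rule in `color_for`.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # write plain <svg>/<path> tags on output

# Stand-in for counties.svg: two counties plus one path that is not a county.
TINY_MAP = '''<svg xmlns="http://www.w3.org/2000/svg">
  <path id="01001" style="fill:#d0d0d0" d="M0 0 H10 V10 H0 Z"/>
  <path id="01003" style="fill:#d0d0d0" d="M10 0 H20 V10 H10 Z"/>
  <path id="State_Lines" style="fill:none" d="M0 0 H20"/>
</svg>'''

# FIPS code -> rate, taken from the first two sample rows on the data slide.
rates = {"01001": 8.1, "01003": 8.2}


def color_for(rate):
    """Hypothetical two-colour rule, for illustration only."""
    return "#6a51a3" if rate > 8.15 else "#f2f0f7"


root = ET.fromstring(TINY_MAP)
for path in root.iter("{%s}path" % SVG_NS):
    rate = rates.get(path.get("id"))  # the path id is the combined FIPS code
    if rate is None:
        continue  # no data for this id, so leave the path untouched
    path.set("style", "fill:" + color_for(rate))

colored = ET.tostring(root, encoding="unicode")
print(colored)
```

Paths whose id has no matching FIPS code keep their original style, which is why border lines and similar shapes survive the colouring step.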
  • 127. Run the Python script ► $ python colorize_svg.py > colored_map.svg
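  • 128. Possible code problem... ► The slide contrasts the code as printed in the book ("In book...") with the finished script ("Finished script...") at http://book.flowingdata.com/ch08/regions/colorize_svg.py.txt
        # `reader` is assumed to be a csv.reader over the unemployment file,
        # created earlier in the script (not shown on the slide)
        unemployment = {}
        rates_only = []  # To calculate quartiles
        min_value = 100; max_value = 0; past_header = False
        for row in reader:
            if not past_header:
                # Skip the header row
                past_header = True
                continue
            try:
                full_fips = row[1] + row[2]      # state FIPS + county FIPS
                rate = float(row[5].strip())
                unemployment[full_fips] = rate
                rates_only.append(rate)
            except:
                # Silently skip rows that cannot be parsed
                pass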
  • 129. Figure 8-18: Choropleth map showing unemployment rates ► Open your new choropleth map in a modern browser such as Firefox, Safari, or Chrome, or in Inkscape, to see the fruits of your labor
  • 130. Next... unemployment rates divided by quartiles
  • 131. Define thresholds by quartiles ► Another common way to define thresholds is by quartiles » A quarter of the counties have rates below 6.9 percent, another quarter between 6.9 and 8.7, another between 8.7 and 10.8, and the last quarter above 10.8 percent
        # Quantile scale
        if rate > 10.8:
            color_class = 3
        elif rate > 8.7:
            color_class = 2
        elif rate > 6.9:
            color_class = 1
        else:
            color_class = 0
  • 132. Define thresholds by quartiles ► Use four colors, each representing a quarter of the regions » one shade per quarter
        colors = ["#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"]
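The thresholds and the colour list fit together as a lookup: the class number is the index into `colors`. The sketch below expresses the same rule as the if/elif chain with the standard-library `bisect` module. This is a compact alternative for illustration, not the code from the book, and `THRESHOLDS`, `color_class` and `color_for` are names chosen here.

```python
from bisect import bisect_left

THRESHOLDS = [6.9, 8.7, 10.8]  # quartile cut-offs quoted on the slide
colors = ["#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"]  # lightest to darkest


def color_class(rate):
    """Count the thresholds strictly below rate (0-3), like the if/elif chain."""
    return bisect_left(THRESHOLDS, rate)


def color_for(rate):
    """Pick the shade for a county's unemployment rate."""
    return colors[color_class(rate)]


for rate in (5.0, 8.1, 9.4, 15.8):
    print(rate, color_class(rate), color_for(rate))
```

A rate exactly equal to a threshold falls into the lower class, matching the strict `>` comparisons on the slide.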
  • 133. Quartiles for re-use ► Instead of hard-coding the values 6.9, 8.7, and 10.8 in your code, you can replace them with q1, q2, and q3, respectively » The advantage of calculating the values programmatically is that you can reuse the code with a different dataset just by changing the CSV file
        # Quartiles
        rates_only.sort()
        q1_index = int( 0.25 * len(rates_only) )
        q1 = rates_only[q1_index]
        q2_index = int( 0.5 * len(rates_only) )
        q2 = rates_only[q2_index]
        q3_index = int( 0.75 * len(rates_only) )
        q3 = rates_only[q3_index]
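To make the reuse point concrete, the index-based calculation can be wrapped in a function and applied to any list of rates. This is a minimal sketch: `quartile_cutoffs` is a name invented here, and the sample is just the first eight rates from the data slide, so the resulting cut-offs differ from the full-dataset values of 6.9, 8.7 and 10.8.

```python
def quartile_cutoffs(values):
    """Return (q1, q2, q3) using the same index rule as the slide.

    Works on a sorted copy, so the caller's list is not reordered.
    Raises IndexError if values is empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    return tuple(ordered[int(fraction * n)] for fraction in (0.25, 0.5, 0.75))


# First eight rates from the data slide, used only as a small example.
sample = [8.1, 8.2, 11.6, 10.1, 8.3, 15.0, 12.2, 9.1]
q1, q2, q3 = quartile_cutoffs(sample)
print(q1, q2, q3)
```

This index rule picks actual data values and does no interpolation, so its results can differ slightly from other quantile definitions.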
  • 134. Modify the script (or create a new one): colorize_svg.py ► Follow the instructions in the book to construct the next script example ► Only minor alterations are needed
  • 135. Figure 8-19: Unemployment rates divided by quartiles
  • 136. Customise and reuse ► You can edit the SVG file in Inkscape: change border colors and sizes, and add annotation so that it becomes a complete graphic for a larger audience (hint: it still needs a legend) and fits the theme of your project ► The code is reusable: you can apply it to other datasets that use FIPS codes
  • 137. In action... (2012-05-24)
  • 138. Summary: 1) Background to Data Visualization 2) Resources 3) Classification of Visualization 4) The Design Process 5) Demonstration
  • 139. Take-away Points: 1) Open to all » new domain with many facets 2) Professional-level output is achievable » practice a few programming and graphic design techniques 3) It's (only) a means to an end » should affect behaviour
  • 140. The end. Thank you :)