Practical Considerations for Displaying Quantitative Data

Many librarians need to express data visually in reports, papers, and presentations. The goal of this talk is to cover the basics of effectively displaying quantitative data visually. It will include an overview of quantitative data types and common quantitative relationships that can be expressed visually. The talk will emphasize practical considerations and guidance for effectively selecting and designing data visualizations, such as those found in everyday tools like Microsoft Excel and the Google Visualization API.

Published in: Technology

  • Data visualization seems to have become a bit of a buzzword. So, at the risk of disappointing some of you, I'm not going to show a lot of fancy graphics. My goals for this talk are to dispel the myth that data visualization is something new and to provide a framework for thinking about the kind of data visualization most of us do. This will involve approaching data with good questions; knowing the material you're working with, in this case data and basic tables and graphs; and understanding how the human visual perceptual system affects what makes for useful displays of quantitative information. I'll also show a few applications you might want to try out and point you in some directions for future reading if you want to know more.
  • Now for some history
  • The first maps were of the sky. Cave paintings at Lascaux contain star maps. Image from Flickr user williamcromar.
  • Maps of land came later. There seem to be several contenders for the first town map, but here is a frequently cited example from Konya, Turkey, in 6200 BCE.
  • This graph by an unknown author attempts to show the movement of the planets over time. I can't vouch for its accuracy.
  • René Descartes invents the Cartesian coordinate system. This has a significant impact on how we visualize quantitative information.
  • William Playfair is credited with inventing statistical graphics. He invented the bar chart. This is a later example that shows the rise in the price of wheat along with the rise in wages over time.
  • A local example. Ben Shneiderman invented the treemap as a way to visualize usage of his Macintosh's hard drive. It's useful for displaying hierarchical data.
  • Hans Rosling invents the Motion Bubble Chart, which is now part of Google's Visualization API. It is an interactive chart that displays several variables at once and animates changes over time. It's featured in a popular TED talk.
  • Computers are powerful tools, and yet we still need a human brain to tell the computer what to process and how to process it. It's our job to approach the computer with the right questions. I want to emphasize the importance of asking good questions.
  • 1913 London Underground Map - http://homepage.ntlworld.com/clive.billson/tubemaps/1913.html Here is an example of a data visualization (or map) that is accurate but may not work well for its intended purpose. Things to notice: it's a standard map projection; subway lines appear where they would geographically if they were on the surface; roads and various municipal boundaries are visible. It works, but it's not optimal.
  • Harry Beck's 1933 Underground Map. Beck took a step back and considered the problem that the subway map was attempting to solve. What matters is the relation of stops and transfer stations to each other, and the legibility of stop names: where to get on and off. The subway is underground, so the map doesn't need roads. For simplicity and legibility, lines are drawn at 90 and 45 degree angles, similar to electrical circuit diagrams. http://sites.google.com/site/tombowersites/harry-beck
  • 2010 Boston T Map. This basic design is so successful that it is still used for subway maps around the world.
  • I've leaned heavily on Stephen Few's Show Me the Numbers – which is a great book for getting a handle on how to use tables and graphs effectively.
  • Here is data. It's a quantity, but we don't know enough to know what it is quantifying.
  • This just happens to be the number of keyword searches performed on the NCSU Libraries website last spring.
  • In order for data to mean something, in order for it to be information, it needs to express a relationship.
  • Nominal comparison: differences in particular values. Time series: how values change over time. Ranking: the order of values. Part to whole (%): what part of this whole is made up of that. Deviation: difference from some standard value. Distribution: how a set of values is distributed over a range. Correlation: whether two different values change together.
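A minimal sketch of how several of these relationships can be computed from one small table. The monthly counts below are made up for illustration and are not the NCSU Libraries data; the variable names are likewise only assumptions.

```python
# Hypothetical monthly search counts (invented numbers, for illustration only).
searches = {"Jan": 1200, "Feb": 1500, "Mar": 900, "Apr": 1400}

total = sum(searches.values())
mean = total / len(searches)

# Ranking: order the categories by value, largest first.
ranking = sorted(searches, key=searches.get, reverse=True)

# Part to whole: each value as a percentage of the total.
share = {month: round(100 * count / total, 1) for month, count in searches.items()}

# Deviation: difference of each value from a reference value (here, the mean).
deviation = {month: count - mean for month, count in searches.items()}

# Time series: change from one period to the next (dicts keep insertion order).
values = list(searches.values())
change = [later - earlier for earlier, later in zip(values, values[1:])]

print(ranking)
print(share)
print(deviation)
print(change)
```

The same four numbers answer different questions depending on which relationship is computed, which is why it helps to decide on the relationship before choosing a chart.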
  • It turns out that it's important to understand the kind of quantitative relationships you're working with, because particular methods of display are better at conveying particular quantitative relationships.
  • Most data is or can be arranged in tables. A table is often the perfect starting place and sometimes the right format for presenting quantitative information. Graphs aren't always very good at the things where tables excel.
  • Back to my library website search example. This table has precise values with mixed units of measure, and even a part-to-whole relationship on the bottom row.
  • Likewise, graphs excel in areas where tables aren't so useful: when meaning that is hidden in a table is revealed by the shape of the values.
  • 13,000 pages of data – how to make this understandable?
  • Before looking more closely at different kinds of graphs and the kinds of quantitative relationships they're good at expressing, I want to introduce the role that visual perception plays in data visualization. See Stephen Few's Show Me the Numbers and Christopher G. Healey's Perception in Visualization http://www.csc.ncsu.edu/faculty/healey/PP/index.html
  • We don't just see stuff that's out there. Light reflects off objects. That light gets collected by our eyes and stimulates the retina. The signals from the retina are interpreted by the brain. There are particular ways that our brain processes visual information that have a bearing on what is and isn't useful for visualizing data. Brain image: originally posted to Flickr, uploaded to Commons using Flickr upload bot on 22:05, 20 October 2008 (UTC) by Kaldari. Eye image: public domain, credit to NIH National Eye Institute requested. Mountain: some rights reserved by Ian BC North.
  • Here is a series of numbers. If I asked you to pick out and count all the 0's, you'd have to scan the numbers serially and count as you moved your eye from one digit to the next. This would take you some time, maybe 20 to 30 seconds. Example adapted from Stephen Few's Show Me the Numbers.
  • If I increase the intensity of the color of the 0's, suddenly you can pick out the zeroes without having to process all of the visual information serially. You can pick out, without thinking about it, all the items with increased intensity.
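As a rough illustration of the difference, the sketch below uses the digit rows from the example slide. The first function visits every digit in turn, much as a reader has to without a visual cue. The second marks each 0 with brackets, a crude plain-text stand-in for the color intensity used on the slide. The function names are hypothetical.

```python
# The six rows of digits shown on the example slide.
ROWS = [
    "9128732198432789543287",
    "6784905043267812837698",
    "7843928364382398731092",
    "3478957438298374209123",
    "0980934591283754845645",
    "8934678238328009748349",
]


def count_serially(rows, target="0"):
    """Visit every digit one at a time and count the matches."""
    count = 0
    for row in rows:
        for digit in row:
            if digit == target:
                count += 1
    return count


def highlight(rows, target="0"):
    """Wrap each target digit in brackets so it stands out from its neighbors."""
    return [row.replace(target, f"[{target}]") for row in rows]


print(count_serially(ROWS))
for line in highlight(ROWS):
    print(line)
```

The program still has to examine every digit; the point of the highlighted output is that a human reader no longer does.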
  • An important initial result was the discovery of a limited set of visual properties that are detected very rapidly and accurately by the low-level visual system. These properties were initially called preattentive, since their detection seemed to precede focused attention. One way to think about this is that we can process preattentive features all at once, while other features we have to process serially. Another thing to keep in mind is that the more of these attributes are present, the less effective each one is.
  • There is a small subset of these that we can interpret quantitatively. Notice that line length and 2d spatial position are the most effective attributes. Others can be used but they pose challenges. I am going to ignore flicker and direction to focus on static images, but you could also use these to display quantitative information. This is an incomplete list. For a really in-depth discussion of preattentive processing and attributes see http://www.csc.ncsu.edu/faculty/healey/PP/index.html
  • Scatterplot – takes advantage of 2D spatial position
  • A line chart also takes advantage of 2D spatial position. A line chart is really a scatterplot with lines drawn between points in some sequence.
  • Bar chart takes advantage of line length and 2D spatial position
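To make the role of line length concrete, here is a small plain-text bar chart sketch. The categories and counts are invented for illustration, and `text_bar_chart` is a hypothetical helper, not part of any tool mentioned in this talk. Each bar starts at a common baseline and its length is proportional to the value, so values can be compared by length alone.

```python
def text_bar_chart(values, width=40):
    """Return one line per category, sorted largest first.

    The longest bar is `width` characters; every other bar is scaled
    in proportion to its value, so all bars share a zero baseline.
    """
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Made-up counts, for illustration only.
counts = {"Catalog": 80, "Articles": 60, "Website": 20}
chart = text_bar_chart(counts, width=40)
print("\n".join(chart))
```

Sorting the bars also turns a nominal comparison into a ranking at no extra cost.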
  • This was created using Protovis. You can also consider using small multiples. In one case they are intended to show differences in rate of change over time across different departments. In another they show differences in wait times for different devices: the lines show the pattern within each device type, and the color intensity shows higher average wait time across devices. If you read Consumer Reports, you're familiar with this: they use a colored dot matrix to rate various products. That is an effective use of small multiples displays to enhance comparisons across categories. http://en.wikipedia.org/wiki/File:Smallmult.png Public domain image.
  • Both charts show the same data in the same order. Which makes it easier to determine whether B or C is larger? It turns out we're better at distinguishing small differences in length than small differences in area, which is why I think pie charts are usually a bad idea and I pretty much never use them.
  • No data visualization presentation could be complete without mentioning Edward Tufte. The graph should reveal more than the data can reveal in its raw form. Don't worry about doing something pretty or cool; do something effective. Don't lie, and 3D effects are probably a bad idea. A really good visualization lets you see some things quickly but can also reveal depth upon closer examination. Know what you're showing and why you're showing it.
  • We now have a historical context. We know that having good questions is important, what the common kinds of quantitative relationships are, how our visual perceptual system influences what makes for a good data visualization, and that particular graphs are better at displaying particular quantitative relationships. So let's look at some of the tools that are out there.
  • Google Docs is an alternative to Excel, which can also produce great charts. BTW, I think Excel is a great tool for exploring datasets, because the cost of trying things is so low.
  • It has the advantage of looking and working like the familiar spreadsheet applications. Different visualization options can be accessed by inserting Gadgets or Charts into the document. Here I've selected a Gadget.
  • Menu of available Google Gadgets.
  • So in this case I've created a treemap from some collection management data about our spending on resources. The area of each rectangle corresponds to that node's value. "Treemaps display hierarchical data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node's rectangle has an area proportional to a specified dimension on the data." -- http://en.wikipedia.org/wiki/Treemapping But what I really want to point out is the Publish button. Because this app is in the cloud, you get some advantages over Excel. One of the more interesting features of Google Documents is the ability to publish Documents and Gadgets. Publishing generates a code snippet that can be added to a web page to display the chart or document, which is useful if you want to create a web-accessible report.
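The nested-rectangle idea in the treemap description can be sketched with a simple "slice-and-dice" layout, which splits each rectangle among its children and alternates the split direction at each level. This is one basic layout among several and is not necessarily what the Google gadget uses. The spending figures and node names are invented for illustration.

```python
def node_value(node):
    """A leaf carries its own value; a branch is the sum of its children."""
    if "children" in node:
        return sum(node_value(child) for child in node["children"])
    return node["value"]


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Give each leaf a rectangle (x, y, w, h) with area proportional to its value."""
    if out is None:
        out = {}
    children = node.get("children")
    if not children:
        out[node["name"]] = (x, y, w, h)
        return out
    total = node_value(node)
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in children:
        fraction = node_value(child) / total
        if depth % 2 == 0:
            # Even depth: divide the rectangle left to right.
            slice_and_dice(child, x + offset * w, y, w * fraction, h, depth + 1, out)
        else:
            # Odd depth: divide the rectangle top to bottom.
            slice_and_dice(child, x, y + offset * h, w, h * fraction, depth + 1, out)
        offset += fraction
    return out


# Hypothetical spending hierarchy (made-up numbers).
spending = {
    "name": "Collections",
    "children": [
        {"name": "Journals", "children": [
            {"name": "Print", "value": 10},
            {"name": "Electronic", "value": 30},
        ]},
        {"name": "Books", "value": 40},
        {"name": "Databases", "value": 20},
    ],
}

rects = slice_and_dice(spending, 0, 0, 100, 50)
for name, rect in rects.items():
    print(name, rect)
```

Because every split hands out the parent's width or height in proportion to value, each leaf's area ends up proportional to its value, which is the property the quoted definition describes.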
  • Another web tool that doesn't require any programming.
  • After some struggle about whether I should present on two different tools from The Google, I decided I would be honest and go ahead and reveal the tools I use most often.
  • Somewhat like Google Gadgets but more powerful. The Google Visualization API is a collection of JavaScript visualizations that you can customize and embed in web pages. It requires some programming know-how.
  • Relatively simple JavaScript embedded in a web page generates the chart. You can modify this directly and create a chart, but the data will be static.
  • The advantage of this is that I can use PHP to generate JavaScript. In this case, every time I load this page, PHP processes the most current log data and generates the JavaScript, so I see an up-to-date chart of search activity.
  • There are lots of tools out there for doing various sorts of data visualizations. Thanks to Hilary Davis and Joe Ryan for some of the following tool, book, and website recommendations. With Many Eyes you have to make your dataset public, which can be a consideration.
  • More advanced tools are more flexible but often require some comfort with JavaScript and/or PHP, depending on what you want to accomplish.
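A minimal sketch of the same server-side idea, written in Python rather than PHP: the server turns current data into the JavaScript that draws the chart. The function name `chart_page` and the sample rows are hypothetical. The loader URL and the `google.charts.load`, `arrayToDataTable`, and `LineChart` calls follow the current Google Charts documentation as I understand it, and may differ from what the slides showed in 2010.

```python
import json


def chart_page(rows, title="Searches per day"):
    """Return an HTML page whose JavaScript draws a line chart of `rows`.

    `rows` is a list of (label, count) pairs, e.g. produced by summarizing
    a log file each time the page is requested.
    """
    table = [["Day", "Searches"]] + [[day, count] for day, count in rows]
    data_js = json.dumps(table)          # becomes a JavaScript array literal
    options_js = json.dumps({"title": title})
    return f"""<html>
<head>
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
google.charts.load('current', {{packages: ['corechart']}});
google.charts.setOnLoadCallback(drawChart);
function drawChart() {{
  var data = google.visualization.arrayToDataTable({data_js});
  var chart = new google.visualization.LineChart(document.getElementById('chart'));
  chart.draw(data, {options_js});
}}
</script>
</head>
<body><div id="chart" style="width: 600px; height: 300px;"></div></body>
</html>"""


# Made-up counts standing in for freshly processed log data.
page = chart_page([("2010-10-01", 120), ("2010-10-02", 95)])
print(page)
```

Because the data array is rebuilt on every request, the chart reflects whatever the logs contain at that moment, unlike a hand-edited page where the data is static.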
  • There seem to be more of these.
  • Adobe Illustrator often creates cleaner-looking charts than Excel, at the cost of some effort to learn the application. OmniGraffle is fantastic for making diagrams. Visio plays a similar role for PCs.
  • Edward Tufte gets a lot of attention. Personally, I think he's overrated. He popularized the idea of displaying information visually, and his first book is worth checking out, but Few will be more helpful for practical advice. I really like Show Me the Numbers, a very practical guide to statistics and basic charts.
  • Also a number of websites that are worth checking out on your own time.

    1. 1. Practical Considerations for Displaying Quantitative Data Cory Lown NCSU Libraries Maryland SLA 21 October 2010
    2. 2. Outline: History and context; Things to consider (Good questions; What is data?; What kind of chart?; Visual perception); Data visualization tools; Where to learn more
    3. 3. History and context
    4. 4. 16,500 BCE
    5. 5. 6,200 BCE
    6. 6. 950
    7. 7. 1637
    8. 8. 1786
    9. 9. 1991 – in Maryland
    10. 10. 2005
    11. 11. Data visualization isn't new
    12. 12. What is new: Amount of data; Computer processing ubiquity; Desktop and Web applications
    13. 13. "Computers are useless. They can only give you answers." (Pablo Picasso)
    14. 14. Good questions
    15. 15. Untitled Image Layout: Image of something built
    16. 16. Untitled Image Layout: Image of a tool
    17. 18. What is data? *See Stephen Few's Show Me the Numbers
    18. 19. 155,741
    19. 20. 155,741 Searches
    20. 21. Quantitative information always expresses relationships
    21. 22. Quantitative relationships are: An association between quantitative values and categories; Associations among multiple sets of quantitative values
    22. 23. Relationships among quantities: Nominal comparison; Time series; Ranking; Part to whole (%); Deviation; Distribution; Correlation
    23. 24. What kind of chart? *See Stephen Few's Show Me the Numbers
    24. 25. Charts: Tables; Graphs
    25. 26. Tables: Look up individual values; Compare individual values; Precision is important; Multiple units of measure
    26. 27. A table with mixed units
    27. 28. Graphs: Meaning is revealed by the shape of the values; Show relationships among many values
    28. 29. 1 of 13,000 pages of data
    29. 30. Same data in a graph
    30. 31. Visual perception *See Stephen Few's Show Me the Numbers and Christopher G. Healey's Perception in Visualization http://www.csc.ncsu.edu/faculty/healey/PP/index.html
    31. 32. Stimulus → Stimulation → Perception
    32. 33. Preattentive processing Extremely fast, pre-conscious visual processing
    33. 34. Example 9128732198432789543287 6784905043267812837698 7843928364382398731092 3478957438298374209123 0980934591283754845645 8934678238328009748349
    34. 35. Example 9128732198432789543287 67849 0 5 0 43267812837698 7843928364382398731 0 92 34789574382983742 0 9123 0 98 0 934591283754845645 8934678238328 00 9748349
    35. 36. Some preattentive attributes: Form (Orientation; Line length; Line width; Size; Shape; Curvature; Added marks; Enclosure); Color (Hue; Intensity); Spatial Position (2D)
    38. 40. Scatterplot: Correlation; Nominal comparisons
    39. 42. Line chart: Time series; Deviation; Distribution
    40. 44. Bar chart: Nominal comparison; Ranking; Part to whole; Deviation; Distribution
    41. 46. Stacked bar chart: Part to whole
    42. 48. The humble pie chart
    43. 49. Is B or C larger?
    44. 50. 3D effects distort 2D proportions
    45. 51. Advice from Edward Tufte: Show the data; Make large datasets coherent; Emphasize substance over method; Don't distort; Reveal several levels of detail; Serve a clear purpose
    46. 52. Data visualization tools
    47. 53. Google Docs
    48. 54. Untitled Image Layout
    49. 55. Untitled Image Layout
    50. 56. Untitled Image Layout
    51. 57. Many Eyes
    52. 59. Many Eyes
    53. 60. Many Eyes
    54. 61. Many Eyes
    55. 62. Many Eyes
    56. 63. Google Visualization API
    57. 64. Untitled Image Layout
    58. 65. Untitled Image Layout Some JavaScript – not so bad, right?
    59. 66. Untitled Image Layout
    60. 67. Web tools (no coding): Google Docs/Gadgets* http://docs.google.com/ ; Many Eyes http://manyeyes.alphaworks.ibm.com/manyeyes/
    61. 68. Web tools (coding): Google Visualization API* http://code.google.com/apis/visualization/documentation/gallery.html ; Protovis* http://vis.stanford.edu/protovis/ ; Flotr http://www.solutoire.com/experiments/flotr/examples/ ; Flot http://people.iola.dk/olau/flot/examples/
    62. 69. Web tools (coding): MIT Simile widgets http://www.simile-widgets.org/ ; Rgraph http://www.rgraph.net/ ; jQuery Visualize http://www.filamentgroup.com/lab/update_to_jquery_visualize_accessible_charts_with_html5_from_designing_with
    63. 70. Desktop apps (easier to use): OpenOffice Spreadsheet / MS Excel; Adobe Illustrator; OmniGraffle (diagramming - Mac); Visio (diagramming – PC)
    64. 71. Desktop apps (harder to use): GraphViz (network graphs); JMP (stats); R (stats); Processing* http://processing.org/
    65. 72. Where to learn more
    66. 73. Books: Show Me the Numbers* (Few, 2004); Now You See It (Few, 2009); The Visual Display of Quantitative Information (Tufte, 1983); Beautiful Data (Segaran & Hammerbacher, 2009); Visualizing Data (Fry, 2008)
    67. 74. Websites: http://flowingdata.com ; http://infosthetics.com/ ; http://www.visualcomplexity.com/vc/ ; http://www.gapminder.org/ ; http://www.visualizing.org/ ; http://understandinggraphics.com/
    68. 75. Thanks! Cory Lown; Digital Technologies Development Librarian; NCSU Libraries; [email_address]
