Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Beyond the Black Box:
Data Visualisation
Dr Mia Ridge, @mia_out
Digital Curator, British Library
Beyond the Black Box, Uni...
Before we get started
• Go to http://viewshare.org/ and sign up for an
account
• Raise your hand or ask your neighbour for...
Overview
• Foundations of data visualisation
– What is data visualisation and why use it?
– The building blocks of visuali...
What is visualisation?
Visualisation is the graphical display of
quantitative or qualitative information to create
insights by highlighting patte...
From this...
...to this
...or this
Why visualise information?
For 'sense-making (also called data analysis) and
communication' (Stephen Few)
'…showing quanti...
Data visualisation can help you...
Explore your data
Explain your results
Introductions
• In a sentence or two, what's your interest in
data visualisation?
– What kinds of data do you work with?
–...
The building blocks of visualisation
Charts
https://cloud.highcharts.com/show/azujym
Causes of death in Shakespeare plays
All the deaths depicted by the Bard
Florence Nightingale's petal charts, 1857
Joseph Priestley, 1769
John Snow's cholera map, 1854
Charles Minard's figurative map, 1869
'Figurative Map of the successive losses in men of the French Army in the Russian ca...
Web 2.0 and the mashup, 2006
http://www.bombsight.org
Isochronic maps
https://www.mysociety.org/files/2014/03/rail-edinburgh-1500px.png
The old tube map
Harry Beck, 1931
Small multiples
Exploring words
http://www.codeitmagazine.com/images/text.png
Exploring words
http://www.jasondavies.com/wordtree/
Networks
Every point on this diagram represents a male film producer. The pink dots represent men who worked exclusively w...
Networks
http://networks.viraltexts.org/1836to1899/
Visualising images and video
http://www.flickr.com/photos/culturevis/5883371358/
'Mondrian vs. Rothko', Lev Manovich, 2010...
Sonification
http://www.caseyrule.com/projects/sounds-of-sorting/
Visualisations and data types
• Quantitative
• Qualitative
• Geographic
• Temporal
• Media
• Entities (people, places,
eve...
Comments or questions?
Exploring scholarly visualisations
Scholarly data visualisations
• Exploring or explaining datasets / arguments
• Sometimes 'distant reading' - providing sen...
Exercise: critiquing scholarly visualisations
Go to http://bit.ly/2lHMyQB and follow the
steps for Exercise 1
Pair up and ...
America's Public Bible
http://americaspublicbible.org/
http://on-broadway.nyc/
http://www.sixdegreesoffrancisbacon.com/
Bristol Know Your Place
http://maps.bristol.gov.uk/knowyourplace/
https://www.historypin.org/
Visualizing Emancipation
http://www.americanpast.org/emancipation/
New York Society Library’s City Readers
http://cityreaders.nysoclib.org/About/visualizations
Mapping the Republic of Letters
http://www.stanford.edu/group/toolingup/rplviz/rplviz.swf
https://www.locatinglondon.org/
Digital Harlem
http://digitalharlem.org
Digital Public Library of America
http://dp.la/
Orbis
http://orbis.stanford.edu
Lost Change
http://tracemedia.co.uk/lostchange/
State of the Union
http://benschmidt.org/poli/2015-SOTU
http://viraltexts.northeastern.edu/
From the data you have to the
visualisation you want
How do you get data to visualise?
• Make it
– Mark up text or type/copy data into a structured
format
• Automate it
– Extr...
Proceedings of the Old Bailey, 25th April 1677, page 4.
http://criminalintent.org/OB-data-warehouse/
http://cloud.tapor.ca/digging2data/
Computational data generation
• Generate data from attributes of text, images,
etc
• Allows visualisation at scale
• Can b...
Topic modelling
http://discontents.com.au/mining-for-meanings/
Other forms of text analysis
Entity
recognition:
turning text into
things
Entity recognition examples
Information from video, images
http://emotions.periscopic.com/inauguration/
Extracting information from images
https://www.clarifai.com/demo
Exercise: Explore computational data
generation and entity extraction
Go to http://bit.ly/2lHMyQB and follow the
steps for...
Exercise: learning about black boxes
• What attributes does each tool report on? Which attributes, if any, were
unique to ...
Dealing with humanities data
Considerations for humanities data
Commercial tools often assume complete, born-
digital datasets
• Historical records oft...
Messiness in historical data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– ...
Computers don't cope
Preparing data for visualisations
Historical data often needs manual cleaning to:
 remove rows where vital information is...
Open Refine
…but be careful
Data Preparation
• Generally needs to be in tables, one row per
item, one column per value. One bit of data
per cell!
• De...
Viewshare's advice on
spreadsheets
• Remove any data that is not in a solid rectangular area.
This includes white space, p...
Document data preparation!
Putting visualisations in context
Visualisations and 'truthiness'
A sample of publication printing locations 1534-1831 (British Library data)
http://bit.ly/...
Visualising uncertainty
Matt Lincoln http://blogs.getty.edu/iris/metadata-specialists-share-their-challenges-defeats-and-t...
Visualising uncertainty
Publishing visualisations
• How can you contextualise, explain any
limitations of your visualisations? e.g.
– provenance a...
Choosing visualisation formats
Structure
Purpose
Data
Audience
Purpose, data, audience, structure
• Intersections of format and purpose
• Data types: quantitative, qualitative,
geograph...
Key format decisions
• Print or digital?
• Static or interactive?
• Narrative or 'factual'?
• Shape (distant view) or deta...
What do you want to do?
• See relationships between variables (data points)
• Compare sets of values
• Track change over t...
Dealing with complex data
• Find a visualisation type that can harbour the
data in a meaningful way or reduce the data in
...
If all else fails...
• Sketch out your visualisation on paper to test
it and work out what data is needed
• Iteration is k...
Practising with Viewshare
Browser-based - no need to install software
Supports a range of input formats, relatively
smart ...
Exercise: Create simple visualisations
with Viewshare
Go to http://bit.ly/2lHMyQB and follow the
steps for:
• Viewshare Ex...
Don’t Do try this at home
Tools that don't require programming
• Excel
• Google Fusion Tables, Google Drive
• Viewshare
• Tableau Public
Directories...
Giorgia Lupi and Stefanie Posavec http://www.dear-data.com/all
Thank you!
Final thoughts?
http://bit.ly/2lHMyQB
Mia Ridge @mia_out
Digital Curator, British Library
Beyond the Black Box,...
Beyond the Black Box: Data Visualisation
Beyond the Black Box: Data Visualisation
Beyond the Black Box: Data Visualisation
Beyond the Black Box: Data Visualisation
Beyond the Black Box: Data Visualisation
Upcoming SlideShare
Loading in …5
×

Beyond the Black Box: Data Visualisation

2,584 views

Published on

For Beyond the Black Box, University of Edinburgh, February 2017

As the datasets used by humanists become ever larger and more readily accessible, the ability to render and interpret overwhelmingly large amounts of information in graphically literate ways has become an increasingly important part of the researcher’s skillset. In this workshop, participants will be introduced to the core principles of scholarly data visualisation and shown how to use a variety of visualisation tools.

Visualisations may sound like the opposite of a black box, as they display the data provided. However, aside from 'truthiness' of things on a screen, lots of invisible algorithmic decisions affect what appears on the screen. Data used in visualisations is increasingly generated algorithmically rather than manually. What choices is software making for you, and whose world view do they reflect? Algorithms are choices - if you can't read the source code or access the learned model, how can you understand them?

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Beyond the Black Box: Data Visualisation

  1. 1. Beyond the Black Box: Data Visualisation Dr Mia Ridge, @mia_out Digital Curator, British Library Beyond the Black Box, University of Edinburgh, February 2017
  2. 2. Before we get started • Go to http://viewshare.org/ and sign up for an account • Raise your hand or ask your neighbour for help if you get stuck
  3. 3. Overview • Foundations of data visualisation – What is data visualisation and why use it? – The building blocks of visualisation – Create simple visualisations in Viewshare • Exploring and critiquing interactive, scholarly visualisations • What happens in the black box? – Explore different algorithms for entity recognition for images
  4. 4. What is visualisation?
  5. 5. Visualisation is the graphical display of quantitative or qualitative information to create insights by highlighting patterns, trends, variations and anomalies.
  6. 6. From this...
  7. 7. ...to this
  8. 8. ...or this
  9. 9. Why visualise information? For 'sense-making (also called data analysis) and communication' (Stephen Few) '…showing quantitative and qualitative information so that a viewer can see patterns, trends, or anomalies, constancy or variation' (Michael Friendly) '…interactive, visual representations of abstract data to amplify cognition' (Card et al) 'Distant reading' (Moretti) - focus on the shape rather than detail of a collection
  10. 10. Data visualisation can help you... Explore your data Explain your results
  11. 11. Introductions • In a sentence or two, what's your interest in data visualisation? – What kinds of data do you work with? – Who or what visualisations you're creating be for?
  12. 12. The building blocks of visualisation
  13. 13. Charts https://cloud.highcharts.com/show/azujym Causes of death in Shakespeare plays All the deaths depicted by the Bard
  14. 14. Florence Nightingale's petal charts, 1857
  15. 15. Joseph Priestley, 1769
  16. 16. John Snow's cholera map, 1854
  17. 17. Charles Minard's figurative map, 1869 'Figurative Map of the successive losses in men of the French Army in the Russian campaign 1812-1813'. Drawn up by M. Minard, Inspector General of Bridges and Roads in retirement. Paris, November 20, 1869.
  18. 18. Web 2.0 and the mashup, 2006 http://www.bombsight.org
  19. 19. Isochronic maps https://www.mysociety.org/files/2014/03/rail-edinburgh-1500px.png
  20. 20. The old tube map
  21. 21. Harry Beck, 1931
  22. 22. Small multiples
  23. 23. Exploring words http://www.codeitmagazine.com/images/text.png
  24. 24. Exploring words http://www.jasondavies.com/wordtree/
  25. 25. Networks Every point on this diagram represents a male film producer. The pink dots represent men who worked exclusively with other men in the period surveyed, and the green dots represent those who worked with women. https://theconversation.com/women-arent-the-problem-in-the-film-industry-men-are-68740 Deb Verhoeven and Stuart Palmer
  26. 26. Networks http://networks.viraltexts.org/1836to1899/
  27. 27. Visualising images and video http://www.flickr.com/photos/culturevis/5883371358/ 'Mondrian vs. Rothko', Lev Manovich, 2010. Image preparation: Xiaoda Wang
  28. 28. Sonification http://www.caseyrule.com/projects/sounds-of-sorting/
  29. 29. Visualisations and data types • Quantitative • Qualitative • Geographic • Temporal • Media • Entities (people, places, events, concepts, things)
  30. 30. Comments or questions?
  31. 31. Exploring scholarly visualisations
  32. 32. Scholarly data visualisations • Exploring or explaining datasets / arguments • Sometimes 'distant reading' - providing sense of overall connection, patterns by pulling back from detail, close reading (Moretti, Stanford LitLab) • Inspiring curiosity and research questions • But - which questions do they privilege and what do they leave out?
  33. 33. Exercise: critiquing scholarly visualisations Go to http://bit.ly/2lHMyQB and follow the steps for Exercise 1 Pair up and discuss together before reporting back.
  34. 34. America's Public Bible http://americaspublicbible.org/
  35. 35. http://on-broadway.nyc/
  36. 36. http://www.sixdegreesoffrancisbacon.com/
  37. 37. Bristol Know Your Place http://maps.bristol.gov.uk/knowyourplace/
  38. 38. https://www.historypin.org/
  39. 39. Visualizing Emancipation http://www.americanpast.org/emancipation/
  40. 40. New York Society Library’s City Readers http://cityreaders.nysoclib.org/About/visualizations
  41. 41. Mapping the Republic of Letters http://www.stanford.edu/group/toolingup/rplviz/rplviz.swf
  42. 42. https://www.locatinglondon.org/
  43. 43. Digital Harlem http://digitalharlem.org
  44. 44. Digital Public Library of America http://dp.la/
  45. 45. Orbis http://orbis.stanford.edu
  46. 46. Lost Change http://tracemedia.co.uk/lostchange/
  47. 47. State of the Union http://benschmidt.org/poli/2015-SOTU
  48. 48. http://viraltexts.northeastern.edu/
  49. 49. From the data you have to the visualisation you want
  50. 50. How do you get data to visualise? • Make it – Mark up text or type/copy data into a structured format • Automate it – Extract it from text, images, audio or video • Find it – Lots of freely available data to practice with or check and enhance
  51. 51. Proceedings of the Old Bailey, 25th April 1677, page 4.
  52. 52. http://criminalintent.org/OB-data-warehouse/ http://cloud.tapor.ca/digging2data/
  53. 53. Computational data generation • Generate data from attributes of text, images, etc • Allows visualisation at scale • Can be used in conjunction with manual methods • Tools often require calibration or 'training'
  54. 54. Topic modelling http://discontents.com.au/mining-for-meanings/
  55. 55. Other forms of text analysis Entity recognition: turning text into things
  56. 56. Entity recognition examples
  57. 57. Information from video, images http://emotions.periscopic.com/inauguration/
  58. 58. Extracting information from images https://www.clarifai.com/demo
  59. 59. Exercise: Explore computational data generation and entity extraction Go to http://bit.ly/2lHMyQB and follow the steps for Exercise 2: 1. Find a sample image 2. Load it onto the listed browser-based tools 3. Review and discuss the outputs
  60. 60. Exercise: learning about black boxes • What attributes does each tool report on? Which attributes, if any, were unique to a service? • Based on this, what do each vendor seem to think is important to them (or to their users)? • How many possible entities (e.g. concepts, people, places, events, references to time or dates) did it pick up? • Is any of the information presented useful? Did it label anything incorrectly? • What options for exporting or saving the results do the demo or full service offer? • For tools with configuration options - what could you configure? What difference did changing classifiers or other parameters make? • If you tried it with a few images, did it do better with some than others? Why might that be?
  61. 61. Dealing with humanities data
  62. 62. Considerations for humanities data Commercial tools often assume complete, born- digital datasets • Historical records often contain uncertainty and fuzziness (e.g. date ranges, multiple values, uncertain or unavailable information; data entry standards change over time) • Humanities data often multi-layered, multi- relational • 'Data' = metadata, data, digital surrogates, born-digital items
  63. 63. Messiness in historical data • 'Begun in Kiryu, Japan, finished in France' • 'Bali? Java? Mexico?' • Variations on USA: – U.S. – U.S.A – U.S.A. – USA – United States of America – USA ? – United States (case) • Inconsistency in uncertainty – U.S.A. or England – U.S.A./England ? – England & U.S.A.
  64. 64. Computers don't cope
  65. 65. Preparing data for visualisations Historical data often needs manual cleaning to:  remove rows where vital information is missing  tidy inconsistencies in term lists or spelling  convert words to numbers (e.g. dates)  remove hard returns and non-ASCII characters (or change data format)  split multiple values in one field into other columns (e.g. author name, date in single field)  expand coded values (e.g. countries, language)
  66. 66. Open Refine
  67. 67. …but be careful
  68. 68. Data Preparation • Generally needs to be in tables, one row per item, one column per value. One bit of data per cell! • Decide on aggregate or individual values - might need to calculate totals in advance • Data should be made as consistent as possible with tools like Excel, OpenRefine
  69. 69. Viewshare's advice on spreadsheets • Remove any data that is not in a solid rectangular area. This includes white space, page titles, scattered cells, and additional worksheets. • Check that your formatting is consistent throughout each column (e.g. column is all in date format, currency format, etc. as appropriate). • Make sure that data of the same type but in different columns is formatted consistently (e.g. dates in different columns are in the same date format).
  70. 70. Document data preparation!
  71. 71. Putting visualisations in context
  72. 72. Visualisations and 'truthiness' A sample of publication printing locations 1534-1831 (British Library data) http://bit.ly/W9VM7D
  73. 73. Visualising uncertainty Matt Lincoln http://blogs.getty.edu/iris/metadata-specialists-share-their-challenges-defeats-and-triumphs/#matt
  74. 74. Visualising uncertainty
  75. 75. Publishing visualisations • How can you contextualise, explain any limitations of your visualisations? e.g. – provenance and qualities of original dataset; – what you needed to do to it to get it into software (how transformed, how cleaned); – what's left out of the visualisation, and why?
  76. 76. Choosing visualisation formats
  77. 77. Structure Purpose Data Audience
  78. 78. Purpose, data, audience, structure • Intersections of format and purpose • Data types: quantitative, qualitative, geographic, time series, media, entities (people, places, events, concepts, things) • How clean are your sources? How much time do you have?
  79. 79. Key format decisions • Print or digital? • Static or interactive? • Narrative or 'factual'? • Shape (distant view) or detail (close view)?
  80. 80. What do you want to do? • See relationships between variables (data points) • Compare sets of values • Track change over time / distribution in space • See the parts of a whole (composition)
  81. 81. Dealing with complex data • Find a visualisation type that can harbour the data in a meaningful way or reduce the data in a meaningful way. – e.g. go from individual values to distribution of values – e.g. introduce interaction: overview, zoom and filter, details on demand (Ben Shneiderman)
  82. 82. If all else fails... • Sketch out your visualisation on paper to test it and work out what data is needed • Iteration is key, and... • Stubbornness is a virtue!
  83. 83. Practising with Viewshare Browser-based - no need to install software Supports a range of input formats, relatively smart about processing it for you Relatively easy to get started with maps, timelines, charts Interactive visualisations can be embedded in web pages, can save images as screenshots for print Supported by heritage institution
  84. 84. Exercise: Create simple visualisations with Viewshare Go to http://bit.ly/2lHMyQB and follow the steps for: • Viewshare Exercise 1: Ten minute tutorial - getting started with Viewshare • Viewshare Exercise 2: Create new views and widgets
  85. 85. Don’t Do try this at home
  86. 86. Tools that don't require programming • Excel • Google Fusion Tables, Google Drive • Viewshare • Tableau Public Directories listed at http://bit.ly/2lHMyQB NB: be careful about sensitive data on cloud platforms
  87. 87. Giorgia Lupi and Stefanie Posavec http://www.dear-data.com/all
  88. 88. Thank you! Final thoughts? http://bit.ly/2lHMyQB Mia Ridge @mia_out Digital Curator, British Library Beyond the Black Box, University of Edinburgh, February 2017

×