Introduction to information visualisation for humanities PhDs
Training workshop for the CHASE Arts and Humanities in the Digital Age programme.
This session will give you an overview of a variety of techniques and tools available for data visualisation and analysis in the humanities. You will learn about common types of visualisations and the role of exploratory and explanatory visualisations, explore examples of scholarly visualisations, try some visualisation tools, and know where to find further information about analysing and building data visualisations.
Dr Mia Ridge, @mia_out
Digital Curator, British Library
CHASE Arts and Humanities in the Digital Age, February 2017
While we're getting started...
• Check that you can get online with the browsers Firefox or Chrome
• The Exercises page contains all the links you need during the day
• Check you can view it now: http://bit.ly/2kYtGx4
• Check you can log in to Viewshare with your new account
• 11am Tea and coffee
• 1 - 1:45pm Lunch
• 3 - 3:15pm Tea and coffee
• 4:30pm Finish; free working time until 5pm
• What is information visualisation and why use
• The building blocks of visualisations
• Exploring and critiquing interactive
• Getting from the data you have to the
visualisation you want
Data visualisation can help you...
Explore your data
Explain your results
Why visualise information?
For 'sense-making (also called data analysis) and
communication' (Stephen Few)
'…showing quantitative and qualitative information
so that a viewer can see patterns, trends, or
anomalies, constancy or variation' (Michael
'…interactive, visual representations of abstract
data to amplify cognition' (Card et al)
'Distant reading' (Moretti) - focus on the shape
rather than detail of a collection
• In a sentence or two, what's your interest in
– What kinds of data do you work with?
– What's the goal of any visualisations you're
interested in creating?
– Do you have any potential users in mind?
Charles Minard's figurative map, 1869
'Figurative Map of the successive losses in men of the French Army in the Russian campaign
1812-1813'. Drawn up by M. Minard, Inspector General of Bridges and Roads in retirement.
Paris, November 20, 1869.
Web 2.0 and the mashup, 2006
Exercise: compare n-gram tools
• Think of two words or phrases you'd like to
compare over time (e.g. Burma, Burmah).
• Open two browser windows
• In one, go to http://books.google.com/ngrams
• In the other, go to http://benschmidt.org/OL/
• Enter your words or phrases in each and compare
• Discuss with your neighbour: what differences
did you find, and why?
Every point on this diagram represents a male film producer. The pink dots represent men who worked exclusively with other men in the period
surveyed, and the green dots represent those who worked with women.
https://theconversation.com/women-arent-the-problem-in-the-film-industry-men-are-68740 Deb Verhoeven and Stuart Palmer
Visualising images and video
'Mondrian vs. Rothko', Lev Manovich, 2010. Image preparation: Xiaoda Wang
• Entities (people, places, events, concepts, things)
How do you get data to visualise?
• Make it
– Type it into a spreadsheet or database
• Automate it
– Extract it from text, images, audio or video
• Find it
– Lots of freely available data to practice with
Scholarly data visualisations
• Visualisations as 'distant reading' where
distance is 'a specific form of knowledge:
fewer elements, hence a sharper sense of
their overall interconnection' (Moretti, 2005)
• Inspiring curiosity and research questions
• But - which questions do they privilege and
what do they leave out?
Exercise: critiquing scholarly visualisations
Go to http://bit.ly/2kYtGx4 and follow the steps
for Exercise 3
Pair up and discuss together before reporting
America's Public Bible
Considerations for humanities data
• Commercial tools often assume complete, born-digital
datasets – no missing fields or changes in
data entry over time
• Historical records often contain uncertainty
and fuzziness (e.g. date ranges, multiple
values, uncertain or unavailable information)
• Includes metadata, data, digital surrogates
Messiness in historical data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– United States of America
– USA ?
– United States (case)
• Inconsistency in uncertainty
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.
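One way to tidy place values like those above is sketched here. It is a minimal sketch, not a definitive method: the `COUNTRY_VARIANTS` lookup and the `normalise_place` function are hypothetical names invented for illustration, and the rule that '?' or 'or' signals uncertainty is an assumption you would need to check against your own cataloguing conventions.

```python
import re

# Hypothetical lookup of spelling variants; extend it for your own data.
COUNTRY_VARIANTS = {
    "usa": "United States of America",
    "u.s.a.": "United States of America",
    "united states": "United States of America",
    "united states of america": "United States of America",
    "england": "England",
}

# Assumption: '/', '&', '?' and the word 'or' separate alternative places.
SEPARATORS = r"\s*(?:/|&|\?|\bor\b)\s*"


def normalise_place(raw):
    """Return (list of normalised places, uncertain flag) for one value."""
    # Assumption: a question mark or the word 'or' marks the value as uncertain.
    uncertain = bool(re.search(r"\?|\bor\b", raw, flags=re.IGNORECASE))
    places = []
    for part in re.split(SEPARATORS, raw, flags=re.IGNORECASE):
        name = part.strip()
        if name:
            # Lower-casing the key also absorbs variations in case.
            places.append(COUNTRY_VARIANTS.get(name.lower(), name))
    return places, uncertain


for value in ["USA ?", "U.S.A. or England", "England & U.S.A."]:
    print(value, "->", normalise_place(value))
```

Keeping the uncertainty as a separate flag, rather than deleting the question mark, means the visualisation can still show which values were doubtful. Values such as 'Begun in Kiryu, Japan, finished in France' would still need manual review.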
When were objects collected?
Preparing data for visualisations
Historical data often needs manual cleaning to:
• remove rows where vital information is missing
• tidy inconsistencies in term lists or spelling
• convert words to numbers (e.g. dates)
• remove hard returns and non-ASCII characters (or
change data format)
• split multiple values in one field into other
columns (e.g. author name, date in single field)
• expand coded values (e.g. countries, language)
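Several of the cleaning steps above can also be scripted once the rules are clear. The following is a rough sketch under stated assumptions: the field names (`title`, `author_date`, `language`), the `LANGUAGE_CODES` list and the `clean_row` function are all hypothetical, and the sample record is invented for illustration.

```python
import re

# Hypothetical code list; a real project would use a published standard.
LANGUAGE_CODES = {"eng": "English", "fre": "French"}


def clean_row(row):
    """Return a cleaned copy of one record, or None if the title is missing."""
    # Remove rows where vital information is missing (here: the title).
    if not row.get("title", "").strip():
        return None
    cleaned = dict(row)
    # Remove hard returns and repeated spaces inside the title.
    cleaned["title"] = " ".join(row["title"].split())
    # Split 'author name, date' held in a single field into two columns,
    # converting the year from text to a number.
    combined = cleaned.pop("author_date", "")
    match = re.search(r"^(.*?),?\s*(\d{4})\s*$", combined)
    if match:
        cleaned["author"] = match.group(1).strip()
        cleaned["year"] = int(match.group(2))
    else:
        cleaned["author"] = combined.strip()
        cleaned["year"] = None
    # Expand coded values, leaving unknown codes untouched.
    code = row.get("language", "")
    cleaned["language"] = LANGUAGE_CODES.get(code, code)
    return cleaned


rows = [
    {"title": "Pride and\nPrejudice", "author_date": "Austen, Jane, 1813", "language": "eng"},
    {"title": "", "author_date": "Unknown", "language": "fre"},
]
cleaned_rows = [c for c in (clean_row(r) for r in rows) if c is not None]
print(cleaned_rows)
```

Scripted cleaning is repeatable and documents what was changed, but it does not replace checking the results by eye.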
What do you want to do?
• See relationships among data points
• Compare a set of values
• Track change over time
• See the parts of a whole
See relationships among data points
• Network diagram
Compare a set of values
• Bar chart
• Bubble chart
Track change over time
• Line graph
• Stack graph
See the parts of a whole
• Pie chart
Key format decisions
• Static or interactive?
• Print or digital?
• Narrative or 'factual'?
• Shape (distant view) or detail (close view)?
Purpose, data, audience, structure
• Intersections of format and purpose
• Data types: quantitative, qualitative,
geographic, time series, media, entities
(people, places, events, concepts, things)
• Static, interactive; print, digital; product,
• Exploratory, explanatory: find new insights, or
tell a story? Pragmatic, emotive?
Dealing with complex data
• Find a visualisation type that can harbour the
data in a meaningful way or reduce the data in
a meaningful way.
– e.g. go from individual values to distribution of
– e.g. introduce interaction: overview, zoom and
filter, details on demand (Ben Shneiderman)
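As an illustration of reducing data in a meaningful way, individual years can be grouped into a distribution by decade. This is a minimal sketch; the function name `decade_distribution` and the sample years are invented for the example.

```python
from collections import Counter


def decade_distribution(years):
    """Count how many individual years fall in each decade."""
    counts = Counter((year // 10) * 10 for year in years)
    # Sort by decade so the result reads chronologically.
    return dict(sorted(counts.items()))


# Invented sample values, one per item.
sample_years = [1851, 1857, 1860, 1869, 1902]
print(decade_distribution(sample_years))
```

The choice of bin size (decade, year, century) changes the shape the viewer sees, so it is worth trying more than one.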
Exercise: 10 minute Viewshare tutorial
Discuss: what did you learn about preparing
data and using visualisation software?
• Generally needs to be in tables, one row per
item, one column per value
• Aggregate or individual values - might need to
calculate totals in advance
• Data should be made as consistent as possible
with tools like Excel, OpenRefine
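If a tool expects totals rather than individual rows, they can be calculated in advance from a table with one row per item. A small sketch, assuming the data is available as CSV text; the `SAMPLE` data, its column names and the `totals_by` function are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample data: one row per item, one column per value.
SAMPLE = """id,object,country
1,textile,Japan
2,mask,Mexico
3,textile,Japan
"""


def totals_by(column, csv_text):
    """Count how many rows share each value in the named column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


print(totals_by("country", SAMPLE))
print(totals_by("object", SAMPLE))
```

The same totals could be produced with a pivot table in Excel or a facet in OpenRefine; the point is that the aggregate table is a second, derived dataset.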
From Viewshare, on spreadsheets:
• Remove any data that is not in a solid rectangular area.
This includes white space, page titles, scattered cells,
and additional worksheets.
• Check that your formatting is consistent throughout
each column (e.g. column is all in date format, currency
format, etc. as appropriate).
• Make sure that data of the same type but in different
columns is formatted consistently (e.g. dates in
different columns are in the same date format).
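The consistency check described above can be approximated in code before uploading a spreadsheet. This sketch assumes dates should follow the pattern year-month-day; the function name `inconsistent_dates` and the default format are assumptions, not something Viewshare prescribes.

```python
from datetime import datetime


def inconsistent_dates(values, date_format="%Y-%m-%d"):
    """Return (row_number, value) pairs that do not match the expected format."""
    problems = []
    for row_number, value in enumerate(values, start=1):
        try:
            datetime.strptime(value.strip(), date_format)
        except ValueError:
            problems.append((row_number, value))
    return problems


# Invented sample column with one value in a different date format.
column = ["1869-11-20", "20/11/1869", "1812-06-24"]
print(inconsistent_dates(column))
```

Running the same check on every date column, with the same `date_format`, also covers the advice that dates in different columns should share one format.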
If all else fails...
• Sketch out your visualisation on paper to test
• Iteration is key, and...
• Stubbornness is a virtue!
Exercise: try views and widgets in
• Lists, maps, pie charts, bar charts, scatter plots, tables,
timelines or galleries
• Search boxes, lists, tag clouds, sliders, ranges, logos or
How might you apply these with your own data?
• How can you contextualise and explain any
limitations of your visualisations? e.g.
– provenance and qualities of original dataset;
– what you needed to do to it to get it into software
(how transformed, how cleaned);
– what's left out of the visualisation, and why?
Best practice for design
• How effectively does the visualisation support
• The most important and frequent visual
queries/pattern finding should be supported
with the most visually distinct objects
• Question: which examples did this well?
Do you really need a visualisation?
• Use tables when:
– the document will be used to look up individual values
– the document will be used to compare individual values
– precise values are required
– the quantitative info to be communicated involves
more than one unit of measure
• Use graphs when:
– the message is contained in the shape of the values
– the document will be used to reveal relationships