Data visualisations as a gateway to programming



Short workshop on data visualisation for THATCamp Feminisms West, Scripps College, Claremont, California

  • Learn the basics of programming by fiddling with existing visualisations and prepared exercises. Background: participants will be thinking about how to structure data for use in software, learning basic programming concepts, and moving towards tinkering with scripts. This is a great workshop for humanists who want a friendly intro to the world of programming. Find out more at
  • This is the “don't be scared” slide! Computers are really picky about spelling, white space, matching quote marks and how sentences end. Think of your most pedantic friend, and multiply that by 1000. It's like dealing with a grumpy six-year-old: it might be tricky to negotiate, but it's not going to kill either of you. Thinking computationally is like cooking several courses for a fancy dinner party: you learn what needs to be prepped in advance or just before serving, which steps must be done in a particular order and what can be done at any time. 'Hard fun' is a phrase from gaming: when something is challenging, it's even more rewarding when you finally crack it. A lot of my 'don't be scared' message is aimed at getting you over those first hurdles and into the rewarding stuff. Persistence (or stubbornness) is one of the key characteristics of a good programmer. Finding a path through something you're still figuring out is something programmers and researchers have in common.
  • This is a short workshop, so it leaves loads out. I've prepared two routes you can take: one uses pre-made data in a tool called ManyEyes to learn how different types of visualisations work; the other loads a page that draws a timeline based on data in a Google Spreadsheet, and lets you play with bits of code to start to learn how it all comes together on a web page. When you’re working with your own data, about 80% of your time is spent massaging it into shape. Researching data also takes a long time: I spent several evenings putting together this list, and it’s nowhere near complete, with lots of values still missing. This starts to raise questions about writing history: it’s not like working with born-digital scientific datasets. There’s a bit of me talking at the start, but I want to let you get stuck into trying things out as soon as possible. This does mean it’s up to you to get the most out of it: ask questions, let me know when you get stuck, and follow your own curiosity in deciding what to try in the time. Knowing your way around a browser will help, but no hardcore technical skills are required. Making good visualisations takes time, but I hope you’ll get a taste of what can be done.
  • You can load this and have a play while I talk. I created it as an excuse to play with Neatline, software designed for hand-crafted visualisations with maps and timelines. One nice thing about it is that it illustrates how far some technical skills can take you, and it's not all about code: some of it overlaps heavily with things like design and library science. I'm currently a PhD student in Digital Humanities in the Department of History, Open University. My PhD and MSc (Human-Computer Interaction) research is on crowdsourcing. I call myself a cultural heritage technologist (Science Museum, Museum of London, Melbourne Museum) because it encompasses my background as a programmer and business analyst, my later interest in user experience design and research, and now my Digital Humanities research.
  • Data visualisation is about creating insight, or the formation of a mental model: a new way of thinking about data. Few, Stephen. 2013. ‘Data Visualization for Human Perception’. Ed. Mads Soegaard and Rikke Friis Dam. The Encyclopedia of Human-Computer Interaction, 2nd Ed. Aarhus, Denmark: The Interaction Design Foundation. Accessed January 14. Friendly quoted at … If you're interested in the history of visualisation, find out more in Milestones in the history of data visualisation or CABINET // A Timeline of Timelines.
  • Hopefully you now have some ideas about how visualisations can enable 'scholars to ask increasingly complex research questions by analysing large scale datasets with freely available tools.' Now think about how visualisations can be used to understand, analyse and present large-scale datasets in the humanities and sciences, and the value of visualisation tools in understanding the shape of a dataset. In digital humanities, this is part of the discourse around distant and close reading. Visualisation enables an overview of many sources over long periods of time, highlighting changes in style, genre or content. It allows a view of large numbers of items and, with tools like entity recognition, can help put them in spatial, historical or cultural context. Ultimately it's about spotting patterns; patterns can lead to hypotheses.
  • There are lots of different ways to think about types. Do you want to find new insights, or to communicate or convince? Visualisations can be exploratory (find stories) or explanatory (tell stories) in purpose, and range along an axis from analytic/pragmatic to abstract/emotive. Source: Tale of Two Types of Visualization and Much Confusion, Robert Kosara: 'two major types of data-based visualization, and understanding the differences. … Pragmatic Visualization…even if understanding this requires some work and experience, the goal of this method is to communicate the data, as efficiently as possible. ... If a visualization is designed to visually represent data, and to do that in such a way as to gain new insights into that data, it shall be called a pragmatic visualization. The basic idea is that using the human visual system (instead of automatic means like data mining or statistics), we can gain insight into data, and develop an understanding of the data and the structures in it. To determine whether a visualization is pragmatic, we simply ask if it allows us to efficiently read the data (or at least the relationships between subsets) from the display.' Cf. Artistic Visualization.
  • Scatterplots: good for relationships between variables; matrix chart: good for multi-dimensional data; bubble chart: good for data with big variations in numbers; line and stack graphs: good for changes in numbers over time; pie charts: good for showing proportions; treemap: good for hierarchical structures; word tree: good for unstructured text; Phrase Net: displays common relationships between words in text; maps: display data by location.
  • What types of data are suitable for visualisation? Consider the issues researchers commonly encounter when applying tools designed for the commercial sector to typically fuzzy, incomplete and complex humanities data. Data within one dataset might have been prepared by different departments, in different original systems or at different times, so when cleaning data, some content might be more likely to drop out than others.
  • Examples from the Cooper Hewitt collection. I spent 3/5 of my time at the Cooper Hewitt just trying to get the data clean enough to vaguely represent the collection. The problem is that computers think U.S., U. S., U.S.A., U. S. A., United States and United States of America are six different places. Fields also contain things like internal notes about potential duplicates and unexpected extra information, such as notes on what type of location it is. There are lots of inconsistencies: uncertainty and date ranges are expressed in different ways. More common museum issues: what year is 'early 18th century'? What do you do with '1836 (probably)'?
  • Tools die when they encounter messy data
  • There are also lots of software libraries for creating visualisations (… lets you toggle between ones that require you to code and ones that don’t), but many require some programming knowledge. If you want to do really interesting things, invent new types of visualisation or find ways of presenting your specific data, you might need to get stuck into some code. Finding someone to work with can be a good way of learning if you don’t have any training available to you.
  • Visualization Options Available in Many Eyes; formats for uploading data. 1] Prepare your data. First, find the data set that you want to put into Many Eyes. The size limit is 5 megabytes. Data tables: If your data is a list of values, first format it into a table with informative column headers. If your columns have different units of measure, be sure to include the units in the headers. Use a spreadsheet program such as Microsoft Excel or a text file where columns are separated with tabs. If this is your first upload, read the format guidelines. If you have a specific visualization in mind, take a look at its explanation page for additional information. Free text: If your data is free text (such as an essay or a speech), open the data in a word processor or web browser, select the text, and copy it to the clipboard by typing control-C (Windows) or command-C (Macintosh).
  • The code is heavily (and chattily) commented with things to try, so that you can start to see how the code affects what happens on the page.
  • CSDiff (Windows)
  • It physically hurts me to see unmatched quotes because they have been the cause of so much trauma in the past
  • Visualisation type: review the previous slides, and think about whether you're comparing categories; assessing hierarchies and part-to-whole relationships; showing changes over time; charting connections and relationships; or mapping geo-spatial data. You might get further working in pairs… [Exercises must include: creating a data visualisation (learn how to use online tools to create visualisations that explore British Library datasets such as the British National Bibliography or 19th Century books, designed to result in something to take home to mum); using Google Refine to clean and prepare data. Do, clean, re-do? How to design so that failure is a learning experience? Small, controlled 'compare and contrast' experiments with ManyEyes? Do an exercise on discussing how visualisations are good or bad in terms of design?]
  • Find out more at
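The Cooper Hewitt notes describe computers treating U.S., U. S., U.S.A., U. S. A., United States and United States of America as six different places. Here is a minimal sketch of one way to collapse such variants, assuming Python (the workshop's own exercises use JavaScript in the browser): strip full stops, spaces and case, then look the result up in a hand-built table. The names `PLACE_VARIANTS`, `simplify` and `normalise_place` are hypothetical, for illustration only.

```python
import re

# Hypothetical, hand-built lookup table. Keys use the simplified form
# produced by simplify(): lower case, no full stops, no whitespace.
PLACE_VARIANTS = {
    "us": "United States of America",
    "usa": "United States of America",
    "unitedstates": "United States of America",
    "unitedstatesofamerica": "United States of America",
}


def simplify(value: str) -> str:
    """Lower-case a value and remove full stops and all whitespace."""
    return re.sub(r"[\s.]", "", value.lower())


def normalise_place(value: str) -> str:
    """Return the preferred form of a known variant; otherwise return the
    original value (trimmed) so that it can be reviewed by hand."""
    return PLACE_VARIANTS.get(simplify(value), value.strip())


raw = ["U.S.", "U. S. ", "U.S.A.", "U. S. A. ", "United States",
       "United States of America", "England"]
print(sorted({normalise_place(v) for v in raw}))
# ['England', 'United States of America']
```

Unknown values are returned unchanged rather than guessed. As a result, mixed or uncertain entries such as 'U.S.A./England ?' still surface for a human decision.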
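The Cooper Hewitt notes also ask what year 'early 18th century' is, and what to do with '1836 (probably)'. One common approach is to store each date as a range plus a certainty flag instead of a single year. The Python sketch below follows that approach under stated assumptions. The century convention (18th century = 1700 to 1799), the split of a century into thirds for 'early', 'mid' and 'late', and the names `DateRange` and `parse_fuzzy_date` are all hypothetical choices made for illustration. A real project would need to agree on and document its own rules.

```python
import re
from typing import NamedTuple, Optional


class DateRange(NamedTuple):
    start: int     # earliest plausible year
    end: int       # latest plausible year
    certain: bool  # False if the source flagged doubt


# Assumed convention: split a century into thirds.
PART_OFFSETS = {"early": (0, 33), "mid": (33, 66), "late": (66, 99)}

YEAR = re.compile(r"^(\d{4})$")
CENTURY = re.compile(r"^(?:(early|mid|late)[\s-]+)?(\d{1,2})(?:st|nd|rd|th)\s+century$")
DOUBT_MARKERS = ("(probably)", "?")


def parse_fuzzy_date(text: str) -> Optional[DateRange]:
    """Turn a free-text date into a DateRange, or None if no rule matches
    (so the value can be checked by hand)."""
    value = text.strip().lower()
    certain = True
    for marker in DOUBT_MARKERS:
        if marker in value:
            certain = False
            value = value.replace(marker, "").strip()

    match = YEAR.match(value)
    if match:
        year = int(match.group(1))
        return DateRange(year, year, certain)

    match = CENTURY.match(value)
    if match:
        part, number = match.group(1), int(match.group(2))
        base = (number - 1) * 100  # assumed: the 18th century starts in 1700
        low, high = PART_OFFSETS.get(part, (0, 99))
        return DateRange(base + low, base + high, certain)

    return None


for text in ["1836 (probably)", "early 18th century", "mid-19th century", "Bali?"]:
    print(text, "->", parse_fuzzy_date(text))
```

Returning `None` for anything unrecognised is deliberate. It keeps unusual values visible for manual cleaning instead of silently inventing a year.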
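The Many Eyes upload guidelines quoted in the notes ask for a table with informative column headers (including units), columns separated with tabs, and a 5 megabyte size limit. The minimal Python sketch below prepares such a file with the standard `csv` module. The function name `to_tab_separated`, the column names and the placeholder rows are hypothetical, and reading "5 megabytes" as 5 × 1024 × 1024 bytes is an assumption.

```python
import csv
import io

MAX_BYTES = 5 * 1024 * 1024  # assumed reading of "5 megabytes"


def to_tab_separated(headers, rows):
    """Write a header row plus data rows as tab-separated text, and refuse
    to return anything larger than the upload limit."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    text = buffer.getvalue()
    size = len(text.encode("utf-8"))
    if size > MAX_BYTES:
        raise ValueError(f"{size} bytes is over the {MAX_BYTES}-byte limit")
    return text


# Placeholder values; units go in the headers, as the guidelines suggest.
headers = ["Name", "Birth year (CE)", "Works published (count)"]
rows = [("Person A", 1836, 12), ("Person B", 1901, 3)]
text = to_tab_separated(headers, rows)
print(text)
```

Opening the resulting text in a spreadsheet program is a quick way to check that each value landed in the right column before uploading.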

    1. Data visualisations as a gateway to programming. Mia Ridge, @mia_out. THATCamp Feminisms West, Scripps College, California, March 2013
    2. AKA: a whirlwind tour of data visualisation (and some bits to tempt you into playing with code)
    3. ‘Start small, make things, and then when you’re done, make some more things.’ Jake Levine
    4. Probably impossible things • Asking a question that’s actually stupid • Breaking the computer
    5. Some points about code • Computers are annoyingly pedantic • Scripting isn’t rocket science (but it is hard fun)
    6. Overview • What is data visualisation? • Tools and types of visualisations • A bit of programming jargon • Activity options: play with data in ManyEyes, or tweak timeline/map code to try basic programming
    7. Registering with Many Eyes • In your browser, go to http://www- ter and register for a Many Eyes account – Check your email to make sure the registration has come through for later use • There’s a dataset loaded into ManyEyes that you can try different things with, but you might want to make new versions of it to achieve particular effects
    8. Who am I? Tool from
    9. Who are you? • One sentence on your interest in data visualisation: do you have any potential uses in mind?
    10. What is data visualisation? • ‘…the graphical display of abstract information for two purposes: sense-making (also called data analysis) and communication’ (Stephen Few) • ‘…showing quantitative and qualitative information so that a viewer can see patterns, trends, or anomalies, constancy or variation, in ways that other forms – text and tables – do not allow.’ (Michael Friendly) • ‘…interactive, visual representations of abstract data to amplify cognition’ (Card et al., 1999)
    11. Scholarly data visualisations • Visualisations as ‘distant reading’, where distance is ‘a specific form of knowledge: fewer elements, hence a sharper sense of their overall interconnection’ (Moretti, 2005) • Inspiring curiosity and research questions • But what do they leave out?
    12. Types of visualisations • Different types of data in: quantitative, qualitative, geographic, time series, entities (people, places, events, concepts, things) • Static, interactive • Exploratory, explanatory: find new insights, or tell a story? • Pragmatic, analytic? Abstract, emotive?
    13. Visualisation types in Many Eyes
    14. Considerations for humanities data • Commercial tools often assume complete, born-digital datasets: no missing fields, consistent data entry over time • Humanities and GLAM (galleries, libraries, archives, museums) records contain uncertainty and fuzziness (e.g. date ranges, uncertain places, creators, etc.)
    15. Messiness in data • Begun in Kiryu, Japan, finished in France • Bali? Java? Mexico? • Variations on USA: – U.S. – U.S.A – U.S.A. – USA – United States of America – USA ? – United States (case) • Inconsistency in uncertainty: – U.S.A. or England – U.S.A./England ? – England & U.S.A.
    16. Computers don’t cope
    17. Cleaning data for visualisations • Humanities data often needs manual cleaning to: – remove rows where vital information is missing – tidy inconsistencies in term lists or spelling – convert words to numbers (e.g. dates) – remove hard returns and non-ASCII characters (or change the data format) – split multiple values in one field into separate columns (e.g. author name and date in one field) – expand coded values (e.g. countries, languages)
    18. What other data can you join to yours? • Information from general sites like Wikipedia, Freebase, VIAF • Information from other GLAMs • Other information about the same event, place, person, object, etc. • General contextualising information: science, history, reviews, citations?
    19. Dealing with complex data • Find a visualisation type that can harbour the data in a meaningful way, or reduce the data in a meaningful way: – e.g. go from individual values to a distribution of values – e.g. introduce interaction: overview, zoom and filter, details on demand (Ben Shneiderman)
    20. Visualisation tools
    21. IBM Many Eyes
    22. SIMILE example • Data: • mileexample.html
    23. Programming concepts
    24. Variables and comments • Variables: containers that store things • Comments: leave messages for other programmers; the computer can’t see them • Operators: small, simple bits of functionality
    25. Getting unstuck • Try copying and pasting or typing the error message into Google. • Make different versions as you go, and use software to compare two versions of a file. • Asking for help: what steps would someone need to take to reproduce the problem? What did you expect the output to be, and what happened instead? • Most browsers have built-in tools to help you debug JavaScript.
    26. Getting unstuck • Make a copy of the exercise file first so you can always compare with one that works • If it breaks or doesn’t work: – Check that “quotes’ and {brackets) are matched – Check that any named thing is spelt consistently – Check upper/lower case – Ask the person next to you (sometimes explaining it helps you spot the issue) – If the last version worked, use software to compare the two versions of the file
    27. Visualising ‘Inspiring Women’ • ManyEyes: online tool, no code required • SIMILE: start with a working example, read through the commented code and try the exercises listed in the comments
    28. ‘Inspiring Women’ in ManyEyes • Log into ManyEyes • Go to – visualisation options available from there • Choose a type of visualisation and evaluate the results – What cleaning, extra data or transformation might be needed? – You may need to iterate with different versions of the data from
    29. Review: visualisation tools • What did the tools you tried do well? Poorly? • Were the tool and the data a good match for each other? • Which tools might be useful in the future?
    30. ‘Start small, make things, and then when you’re done, make some more things.’ • Some links: … • Thank you! • Mia Ridge, Open University
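Slide 17 lists typical cleaning steps for humanities data. The minimal Python sketch below covers several of them and assumes records arrive as dictionaries (for example, from `csv.DictReader`). The field names (`title`, `creator`, `country`), the code table and the 'Name, year' pattern are hypothetical. Forcing text to ASCII is lossy: accents are stripped and non-Latin scripts vanish. That is why the slide offers changing the data format as an alternative.

```python
import unicodedata

# Hypothetical code table for expanding coded values.
COUNTRY_CODES = {"GB": "United Kingdom", "US": "United States of America"}


def to_ascii(value: str) -> str:
    """Strip accents, then drop any remaining non-ASCII characters (lossy)."""
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def clean_rows(rows, required=("title",)):
    cleaned = []
    for row in rows:
        # Remove rows where vital information is missing.
        if any(not (row.get(field) or "").strip() for field in required):
            continue
        # Replace hard returns (and runs of spaces) with single spaces.
        out = {key: to_ascii(" ".join((value or "").split()))
               for key, value in row.items()}
        # Split 'Name, year' held in one field into two columns.
        name, sep, year = out.get("creator", "").rpartition(", ")
        if sep and year.isdigit():
            out["creator"], out["creator_year"] = name, year
        # Expand coded values, leaving unknown codes untouched.
        if "country" in out:
            out["country"] = COUNTRY_CODES.get(out["country"], out["country"])
        cleaned.append(out)
    return cleaned


rows = [
    {"title": "First title\nwith a hard return", "creator": "Person A, 1850", "country": "GB"},
    {"title": "", "creator": "Person B", "country": "US"},
    {"title": "Café", "creator": "Person C", "country": "FR"},
]
for row in clean_rows(rows):
    print(row)
```

The second row is dropped because its title is empty. The unknown code 'FR' is left as it is, so it remains visible for review.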
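Slide 19 suggests reducing complex data by going from individual values to a distribution of values. The minimal Python sketch below assumes the values are years. It counts how many fall in each decade with `collections.Counter` and prints a rough text histogram. The function name `decade_counts` and the sample years are hypothetical, and missing values (`None`) are skipped rather than guessed.

```python
from collections import Counter


def decade_counts(years):
    """Map each decade (e.g. 1830) to how many years fall within it."""
    return Counter((year // 10) * 10 for year in years if year is not None)


years = [1836, 1838, 1851, None, 1859, 1902]  # hypothetical values
counts = decade_counts(years)
for decade in sorted(counts):
    print(f"{decade}s {'#' * counts[decade]}")
```

This prints one line per decade, starting with `1830s ##`. The same counts could feed a bar chart in a visualisation tool.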
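Slide 24 introduces variables, comments and operators. The workshop's timeline exercise uses JavaScript. The lines below are a hypothetical Python illustration of the same three ideas, not code from the exercise.

```python
# A comment: Python ignores everything after '#' on a line.
birth_year = 1836  # a variable: a named container holding a value
death_year = 1902  # hypothetical example values
name = "Person A"

age_at_death = death_year - birth_year       # '-' is an operator
label = name + " (" + str(birth_year) + ")"  # '+' joins strings together
print(age_at_death, label)
```

Changing the value stored in `birth_year` changes both printed results. That is the kind of small experiment the commented exercise code invites.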
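Slides 25 and 26 suggest using software to compare a working version of a file with a broken one. The notes mention CSDiff on Windows. Python's standard library includes `difflib`, which can produce the same kind of comparison. This minimal sketch compares two hypothetical versions of a couple of lines, where the second version has lost a closing quote mark. The file names are placeholders.

```python
import difflib


def show_changes(old_lines, new_lines, old_name="working.html", new_name="broken.html"):
    """Return a unified diff: '-' marks lines only in the old version,
    '+' marks lines only in the new one."""
    return "".join(difflib.unified_diff(old_lines, new_lines,
                                        fromfile=old_name, tofile=new_name))


working = ['var label = "Born";\n', 'var year = 1836;\n']
broken = ['var label = "Born;\n', 'var year = 1836;\n']
print(show_changes(working, broken))
```

Identical files produce an empty diff, so any output at all points straight at the lines that changed.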
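Slide 26's first check is that quotes and brackets are matched. The usual technique is a stack: push each opening bracket, and pop it when the matching closing bracket arrives. Below is a minimal Python sketch. `find_unmatched` is a hypothetical helper, and it only understands straight quote marks, ignoring escaped quotes and comments. It is a teaching aid, not a real code checker.

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {")", "]", "}"}
QUOTES = {'"', "'"}


def find_unmatched(code: str):
    """Return a message about the first unmatched bracket or quote, or None.
    Simplified: ignores escaped quotes, comments and curly quote marks."""
    stack = []      # (opening bracket, position) pairs still waiting to close
    quote = None    # quote character we are currently inside, if any
    quote_pos = -1
    for pos, ch in enumerate(code):
        if quote:
            if ch == quote:  # brackets inside a string don't count
                quote = None
            continue
        if ch in QUOTES:
            quote, quote_pos = ch, pos
        elif ch in OPENERS:
            stack.append((ch, pos))
        elif ch in CLOSERS:
            if not stack or OPENERS[stack[-1][0]] != ch:
                return f"unexpected {ch} at position {pos}"
            stack.pop()
    if quote:
        return f"unclosed {quote} opened at position {quote_pos}"
    if stack:
        ch, pos = stack[-1]
        return f"unclosed {ch} opened at position {pos}"
    return None


print(find_unmatched('alert("Hello");'))  # None
print(find_unmatched("{brackets)"))       # unexpected ) at position 9
```

Reporting a position is the useful part: it tells you where to start looking, much as browser developer tools do for real JavaScript errors.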