MPhil Lecture on Data Vis for Analysis


Published in: Education, Technology

  1. 1. An Introduction to Data Visualisation for Analysis: Exploring the Dataset - Textual, Numerical and Otherwise
  2. 2. Agenda • Thoughts from last week - Introduction • What do we mean by Data Analysis? • Some foundation terms and concepts • The Data Visualisation Process • Tools and Methods • Extending your toolset • An Exercise
  3. 3. Objective To appreciate the rich variety of techniques and tools available to digital humanities scholars for data visualisation and analysis. The intention is to be able to add tools to your arsenal and to have a sense of where to look for more.
  4. 4. Breakpoint One of the keys to good visualization is understanding what your immediate goals are. Are you visualizing data to understand what’s in it, or are you trying to communicate meaning to others? You - Visualisation for Data Analysis Others - Visualisation for Presentation
  5. 5. Speaking of Data Analysis • SPSS • SAS • OS Equivalents
  6. 6. So Why Would You Want to Visualise Your Data? • Bypass language centres to tap directly into the visual cortex • Leverage the ability to recognise patterns - what they call visual sense-making • Powerful graphics engines now allow for live data processing, sophisticated animations and interactive research environments Sources: Geoff McGhee, Getting Started with Data Viz
  7. 7. So Why Would You Want to Visualise Your Data? • Work with new data to create new knowledge • Explore data to discover things that used to be unknown, unknowable or impractical to know • Take a new perspective on the familiar to reveal previously hidden insights
  8. 8. Visualising New Information Tourists vs Locals, Eric Fischer, 2010 - Flickr
  9. 9. Visualising New Information Flickr Flow, Martin Wattenberg and Fernanda Viegas, 2009
  10. 10. The Familiar through New Eyes The Times Atlas
  11. 11. How Could You Use Data Analysis? • “In the Lab” - for your own analysis • Online as part of collaborative groups • Through dissemination for extension of your own work - crowdsourcing • Others?
  12. 12. The Time Ribbon and the Tree Map
  13. 13. Visualisation Objective Exploring the ordinary life of rural pioneers in nineteenth century Ontario
  14. 14. Farm Journal William Sunter Farm Diary, 1858
  15. 15. Diaries: the raw materials • 100s of pages • Varying hands • Varying quality
  16. 16. The Process • Generate word frequency (Voyeur, TAPoR) • Isolate known farm activities (NLP - LanguageWare) • Collocate to link activity references to time, duration, and resources (Voyeur)
  17. 17. Example: Medical Diary Medical Diary by BlueChillies
  18. 18. Example: History Flow History flow by Martin Wattenberg and Fernanda Viegas
  19. 19. The Result/ New Patterns
  20. 20. The Result/ New Patterns • Less time haying • The impact of technology • More tasks faster
  21. 21. How Else Could this be done?
  22. 22. What is the Value of this Visualisation? • Easier to compare over intervals • Multiple vectors with greater granularity in a compressed space • The challenge is to find rich enough source materials to yield substantive datasets
  23. 23. The Tree Map
  24. 24. Example: Newsmap
  25. 25. Example: Panopticon
  26. 26. Case Study: Occupations of Politicians • What are we studying? – Self-declared occupations of politicians • Why? – What bias might they bring to their job? • How? – Visualising past occupation and mapping it to the political platform of the party they are affiliated with
  27. 27. Occupations of TDs in the 30th Dáil
  28. 28. Occupations of MPs in the 2nd Parliament
  29. 29. Occupations of MPs in the 37th Parliament
  30. 30. The Result/ New Patterns • The emergence of the professional politician with no private sector experience • Occupational continuity across changes in governing party
  31. 31. How Else Could this be Done?
  32. 32. The Value of Data Vis for Analysis • New ways of presenting allow new ways of seeing • Hidden patterns become evident • Suggests other hypotheses to test
  33. 33. Basic Terms Datamining Statistics Structured/Unstructured Data Visualisation Modelling
  34. 34. Types of Data to Visualise • Audio Data • Categorical Data • Cartographic Data • Collections • Image Data (Still, Moving) • Metadata • Multimedia Data • Network Data (Social, Other) • Numerical Data • Temporal Data • Textual Data (Narrative, Qualitative) • ????
  35. 35. General Steps in Data Vis for DH • Discovery / Acquisition • Cleaning / ‘Munging’ • Analysis / Exploratory Vis • Presentation
  36. 36. Discovery / Acquisition • Original Research: Spreadsheets, Databases, Digitized Media, Other • Scraping: Junar, Outwit Hub, ScraperWiki • Downloads: Public Data, Archives/Libraries, Academic Partners • Purchase
  37. 37. Demo/Hands-On: Junar
  38. 38. Cleaning / Munging (Normalisation, Format Conversion) • Tools: Data Wrangler, Google Refine, Mr. Data Converter • Data Wrangler: does simple split, clear, fold/unfold transforms on data. See example --> Data and Script • Google Refine: works with larger datasets
  39. 39. Hands-On: Data Wrangler
  40. 40. Hands-On: Google Refine
  41. 41. Hands-On: Mr Data Converter
  42. 42. Analysis / Exploratory Visualisation • Web Services: Google Fusion Tables, Google Spreadsheets, IBM ManyEyes, TimeFlow • Applications: Tableau/Tableau Public, MS Office, OpenOffice, Gephi, Node XL (plug-in for Excel), Spotfire, R, Processing
  43. 43. Google NGram Viewer • Examine word frequency in digitised books • Currently about 4% of books ever published • In English, Chinese, French, German, Hebrew, Russian, and Spanish • Changes in word usage • Trends • Check out the Cultural Observatory @ Harvard
  44. 44. Google NGram Viewer
  45. 45. Wordle • Visually presents word frequency using size, weight, colour • Consider: “Word Clouds Considered Harmful”
  46. 46. Exercise • Choose a dataset from a source such as the CSO, Project Gutenberg, or your own material • Choose an appropriate data visualisation from a web service we explored in the workshop • Explain the process and how you made your choice, and embed it in your own blog as we explored last week • Suggest a research question that can be answered by using this data visualisation as a research environment • Send the link to me at: Maybe: business-post-red-c-poll-4th-september-2011/
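
The process on slide 16 (generate word frequency, isolate known farm activities, collocate activity references with their context) can be tried out on a small scale before reaching for Voyeur or TAPoR. What follows is a minimal sketch in plain Python, not the method used in the lecture: the diary entries, the activity list, and the function names are all hypothetical, and a simple keyword lookup stands in for the NLP step.

```python
import re
from collections import Counter

# Hypothetical diary entries standing in for transcribed farm journal text.
ENTRIES = [
    "Monday. Hauling hay all day with the team.",
    "Tuesday. Ploughing the east field in the morning, hauling hay after noon.",
    "Wednesday. Rain. Threshing oats in the barn all day.",
]

# Hypothetical list of known farm activities (a stand-in for the NLP step).
ACTIVITIES = {"hauling", "ploughing", "threshing"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequency(entries):
    """Count how often each word occurs across all entries."""
    counts = Counter()
    for entry in entries:
        counts.update(tokenize(entry))
    return counts


def collocates(entries, targets, window=3):
    """For each target word, count the words within `window` tokens of it."""
    result = {target: Counter() for target in targets}
    for entry in entries:
        tokens = tokenize(entry)
        for i, token in enumerate(tokens):
            if token in result:
                start = max(0, i - window)
                neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
                result[token].update(neighbours)
    return result


# Step 1: word frequency across the whole corpus.
frequencies = word_frequency(ENTRIES)

# Step 2: keep only the words that name known activities.
activity_counts = {word: frequencies[word] for word in ACTIVITIES}

# Step 3: collocate each activity with the words that surround it.
activity_context = collocates(ENTRIES, ACTIVITIES)
```

Printing `activity_context["hauling"].most_common(5)` lists the words that most often sit near "hauling". In a real diary, that is where references to time, duration, and resources would be expected to surface.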
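
The split and fold/unfold transforms mentioned on slide 38 are easier to reason about with a tiny worked example. The sketch below shows the general idea only (splitting a delimited line into fields, and reshaping a table between wide and long form) in plain Python. It does not reproduce Data Wrangler's own behaviour or script format, and the function names and sample figures are hypothetical.

```python
def split_line(line, delimiter, names):
    """Split one delimited line into a record keyed by the given column names."""
    return dict(zip(names, line.split(delimiter)))


def fold(rows, key, value_columns):
    """Wide to long: emit one row per (key, column, value) combination."""
    return [
        {key: row[key], "column": column, "value": row[column]}
        for row in rows
        for column in value_columns
    ]


def unfold(long_rows, key):
    """Long to wide: gather the rows that share a key back into one record."""
    wide = {}
    for row in long_rows:
        record = wide.setdefault(row[key], {key: row[key]})
        record[row["column"]] = row["value"]
    return list(wide.values())


# Hypothetical raw lines: year, days spent haying, days spent ploughing.
RAW = ["1858,12,9", "1859,10,11"]

wide_rows = [split_line(line, ",", ["year", "haying", "ploughing"]) for line in RAW]
long_rows = fold(wide_rows, "year", ["haying", "ploughing"])
round_trip = unfold(long_rows, "year")
```

The long form (one row per year and activity) is usually what charting tools want, while the wide form is easier to read in a spreadsheet. Being able to move between the two is most of what "munging" a small table involves.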