Transcript of "MPhil Lecture on Data Vis for Analysis"
An Introduction to Data Visualisation for Analysis: Exploring the Dataset - Textual, Numerical and Otherwise http://www.slideshare.net/shawnday/m-phil-datavisforanalysis
Agenda Thoughts from last week - wordpress.com? Introduction What do we mean by Data Analysis? Some foundation terms and concepts The Data Visualisation Process Tools and Methods Extending your toolset An Exercise
Objective To appreciate the rich variety of techniques and tools available to digital humanities scholars for data visualisation and analysis. The intention is to be able to add tools to your arsenal and to have a sense of where to look for more.
Breakpoint One of the keys to good visualization is understanding what your immediate goals are. Are you visualizing data to understand what’s in it, or are you trying to communicate meaning to others? You - Visualisation for Data Analysis Others - Visualisation for Presentation
Speaking of Data Analysis SPSS SAS OS Equivalents
So Why Would You Want to Visualise Your Data? Bypass language centres to tap directly into the visual cortex. Leverage the ability to recognise patterns - what they call visual sense-making. Powerful graphics engines now allow for live data processing and sophisticated animations and interactive research environments. Sources: Geoff McGhee, Getting Started with Data Viz
So Why Would You Want to Visualise Your Data? Work with new data to create new knowledge. Explore data to discover things that used to be unknown, unknowable or impractical to know. Take a new perspective on the familiar to reveal previously hidden insights.
Visualising New Information Tourists vs Locals, Eric Fischer, 2010 - Flickr
Visualising New Information Flickr Flow, Martin Wattenberg and Fernanda Viegas, 2009
What is the Value of this Visualisation? • Easier to compare over intervals • Multiple vectors with greater granularity in a compressed space • The challenge is to find rich enough source materials to yield substantive datasets
Case Study: Occupations of Politicians • What are we studying? – Self-declared occupations of politicians • Why? – What bias might they bring to their job? • How? – Visualising past occupations and mapping them to the political platform of the party each politician is affiliated with
The Value of Data Vis for Analysis • New ways of presenting allow new ways of seeing • Hidden patterns become evident • Suggest other hypotheses to test
Basic Terms Datamining Statistics Structured/Unstructured Data Visualisation Modelling
Types of Data to Visualise: Audio Data; Network Data (Social, Other); Categorical Data; Cartographic Data; Collections; Numerical Data; Image Data (Still, Moving); Temporal Data; Textual Data (Narrative, Qualitative); Metadata; Multimedia Data; ????
General Steps in Data Vis for DH Discovery / Acquisition Cleaning / ‘Munging’ Analysis / Exploratory Vis Presentation
Discovery / Acquisition Original Research: Spreadsheets, Databases, Digitized Media, Other. Scraping: Junar, Outwit Hub, ScraperWiki. Downloads: Public Data, Archives/Libraries, Academic Partners, Purchase
Cleaning / Munging (Normalisation, Format Conversion) Tools: Data Wrangler, Google Refine, Mr. Data Converter. Data Wrangler: does simple split, clear and fold/unfold transforms on data. See example --> Data and Script. Google Refine: works with larger datasets.
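To make the split and fold/unfold transforms on this slide concrete, here is a minimal sketch in plain Python. It is not Data Wrangler's own code or script format; the function names (split_column, fold, unfold), the column names and the sample rows are all hypothetical, and "fold" is taken in its usual sense of reshaping a wide table into a long one.

```python
# Illustrative sketch only: rows, column names and helpers are made up.

def split_column(rows, column, sep, new_names):
    """Split one text column into several, e.g. "Dublin, 2011" -> place, year."""
    out = []
    for row in rows:
        parts = [p.strip() for p in row[column].split(sep)]
        new_row = {k: v for k, v in row.items() if k != column}
        new_row.update(zip(new_names, parts))
        out.append(new_row)
    return out

def fold(rows, key, value_name="variable", measure_name="value"):
    """Fold (wide -> long): every non-key column becomes its own row."""
    out = []
    for row in rows:
        for col, val in row.items():
            if col != key:
                out.append({key: row[key], value_name: col, measure_name: val})
    return out

def unfold(rows, key, value_name="variable", measure_name="value"):
    """Unfold (long -> wide): the inverse of fold, one row per key."""
    wide = {}
    for row in rows:
        wide.setdefault(row[key], {key: row[key]})[row[value_name]] = row[measure_name]
    return list(wide.values())

# Hypothetical wide table: one column per year.
wide_rows = [
    {"county": "Cork", "2010": 12, "2011": 15},
    {"county": "Galway", "2010": 7, "2011": 9},
]
long_rows = fold(wide_rows, key="county")   # four rows, one per county/year pair
```

Many charting tools expect the long (folded) shape, which is one reason this step tends to come before exploratory visualisation.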
Hands-On: Data Wrangler http://vis.stanford.edu/wrangler/app/
Hands-On: Google Refine http://code.google.com/p/google-refine/
Hands-On: Mr Data Converter http://shancarter.com/data_converter/
Analysis / Exploratory Visualisation Web Services: Google Fusion Tables, Google Spreadsheets, IBM ManyEyes, TimeFlow. Applications: Tableau/Tableau Public, MS Office, OpenOffice, Gephi, Node XL (plug-in for Excel), Spotfire, R, Processing
Google Ngram Viewer: Examine word frequency in digitised books. Currently about 4% of books ever published. In English, Chinese, French, German, Hebrew, Russian, and Spanish. Changes in word usage. Trends. Check out the Cultural Observatory @ Harvard
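The idea of tracking word frequency over time can be sketched on a small scale as follows. This is only an illustration of relative frequency per year, not how the Ngram Viewer itself is implemented; the relative_frequency helper and the two-year toy corpus are assumptions made up for the example.

```python
from collections import Counter
import re

def relative_frequency(texts_by_year, word):
    """Return {year: occurrences of word / total words in that year's texts}."""
    result = {}
    for year, texts in sorted(texts_by_year.items()):
        tokens = []
        for text in texts:
            # Lower-case and keep alphabetic tokens only (a deliberately crude tokeniser).
            tokens.extend(re.findall(r"[a-z']+", text.lower()))
        counts = Counter(tokens)
        result[year] = counts[word.lower()] / len(tokens) if tokens else 0.0
    return result

# Hypothetical toy corpus keyed by publication year.
corpus = {
    1900: ["the telegraph office was busy", "send a telegraph"],
    2000: ["send an email", "the email arrived before the letter"],
}
trend = relative_frequency(corpus, "telegraph")
```

Dividing by the total number of words in each year matters because the number of books differs greatly between years; raw counts alone would mostly show how much was published.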
Wordle: Visually present word frequency using size, weight, colour. Consider "Word Clouds Considered Harmful"
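The mapping from word frequency to size can be sketched like this. It is not Wordle's actual algorithm; the linear scaling, the tiny stop-word list, the point sizes and the word_sizes helper are all assumptions chosen for illustration.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "of", "and", "to", "in"}  # tiny illustrative list

def word_sizes(text, min_pt=10, max_pt=48, top_n=5):
    """Map the top_n most frequent words to font sizes, scaled linearly by count."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    top = Counter(words).most_common(top_n)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in top}

sizes = word_sizes("data data data vis vis analysis of the data")
```

Here the most frequent word gets the largest size and the least frequent of the selected words gets the smallest. Size encodes only how often a word occurs, not the context it occurs in, which is worth keeping in mind when reading a word cloud as evidence.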
Exercise: Choose a dataset from a source such as the CSO, Project Gutenberg or your own material. Choose an appropriate data visualisation from a web service we explored in the workshop. Explain the process and how you made your choice, and embed it in your own blog using wordpress.com as we explored last week. Suggest a research question that can be answered by using this data visualisation as a research environment. Send the link to me at: firstname.lastname@example.org Maybe: http://politicalreform.ie/2011/12/04/state-of-enda-sunday-business-post-red-c-poll-4th-september-2011/