Data Journalism
MDST 3559: Dataesthetics
Prof. Alvarado
1/25/2011

Business
- Course web site: http://pages.shanti.virginia.edu/Dataesthetics_S11
- Course Collab site: Dataesthetics S11
- Syllabus
  - Posted as a page on the course web site
  - Review at end of class

Review
- Dataesthetics is about data design
- Data design is relevant at several levels:
  - Data modeling (tables, etc.)
  - Processing (code)
  - Visualizing (charts, graphs, interfaces, art, etc.)
  - Contextualizing (digital storytelling, arguments, presentations, etc.)
- Each level denotes a form of digital representation

Overview
- We look at the new field of Data Journalism (DJ)
- A framing example for the course
  - Accessible content
  - Shows all of the levels
  - Uses available tools
  - A great example to imitate
- Thursday we will do our own DJ
  - Acquire data and use the tools

What is Data Journalism?
How is DJ related to traditional journalism?
- That is, to news stories and op-eds, aka Plain Old Journalism (POJ)

Relation to POJ
- Data work is supplementary to the story
  - Combines data, visualization, and storytelling
- But also valuable in itself
  - The publishing of interesting data is a journalistic act that stands alone
  - “The Guardian curates far more data than it creates” (NJL)
- Data tells a story
- More interactive
  - “there’s somebody out there who knows a lot more than you do, and can thus contribute.” (NJL)

What is the workflow of DJ?
“Find, interrogate, visualize, mash”
- Acquisition from diverse sources
  - Well-formatted data sources
  - Web scraping from government PDFs, web sites
- Everything ends up in Google Docs
- Data is cleaned up
- Data is interrogated, explored
- Available tools used to make visualizations

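The clean-up, interrogation, and visualization steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows in `RAW` are invented stand-ins for a spreadsheet export, the column names `date` and `category` are hypothetical, and the text histogram stands in for a real visualization tool. The "mash" step is not shown.

```python
import csv
import io
from collections import Counter

# Hypothetical export with typical mess: stray spaces, a blank line,
# inconsistent capitalization. These are not real figures.
RAW = """date,category
2009-01-03, IED Found
2009-01-17,IED Explosion

2009-02-11,ied found
2009-02-12,IED Explosion
2009-02-20,IED Explosion
"""


def clean(raw_text):
    """Clean up: trim whitespace, skip blank lines, lowercase categories."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [name.strip() for name in next(reader)]
    rows = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # blank line in the export
        record = {key: value.strip() for key, value in zip(header, row)}
        record["category"] = record["category"].lower()
        rows.append(record)
    return rows


def count_by_month(rows):
    """Interrogate: how many records fall in each YYYY-MM month?"""
    return Counter(record["date"][:7] for record in rows)


def text_histogram(counts):
    """Visualize: one line per key, one '#' per record."""
    return "\n".join(f"{key} {'#' * counts[key]}" for key in sorted(counts))


rows = clean(RAW)
print(text_histogram(count_by_month(rows)))
```

Counting by a different column only requires changing the key passed to `Counter`, which is the sense in which the data is "interrogated" before anything is drawn.
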
Example: Afghanistan IEDs
Example
- Get IED data from Data Blog link to Google
  - http://www.guardian.co.uk/news/datablog/2010/jul/27/wikileaks-afghanistan-data-datajournalism
- Download as CSV
- Change extension to txt
- Open in Excel and save as tab delimited file
- Delete extra data
- Paste into Many Eyes
- Choose Block Histogram

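The CSV-to-tab-delimited conversion in the steps above can also be done without Excel. The following is a minimal sketch using Python's standard `csv` module; the function name `csv_to_tsv` and the sample rows are hypothetical and do not come from the Guardian spreadsheet.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert comma-separated text to tab-delimited text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical rows; note the quoted field that contains a comma.
sample = 'date,region,category\n2009-01-03,"Helmand, south",IED Found\n'
print(csv_to_tsv(sample))
```

Using a CSV parser rather than a plain find-and-replace of commas matters here: a quoted field such as "Helmand, south" stays in one column instead of being split in two.
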
Government Data
- http://www.guardian.co.uk/data
- http://www.data.gov/
- http://factual.com/

“The technology involved is surprisingly simple, and mostly free. The Guardian uses public, read-only Google Spreadsheets to share the data they’ve collected, which require no special tools for viewing and can be downloaded in just about any desired format. Visualizations are mostly via Many Eyes and Timetric, both free.”
http://www.niemanlab.org/2010/08/how-the-guardian-is-pioneering-data-journalism-with-free-tools/

Tim Berners-Lee (TBL) says the future of journalism "lies with journalists who know their CSV from their RDF, can throw together some quick MySQL queries for a PHP or Python output … and discover the story lurking in datasets released by governments, local authorities, agencies, or any combination of them – even across national borders."
Same for scholarship?

Types of Data
- Sources vary; data often must be scraped
- CSV (‘comma separated values’) is the lingua franca
  - Once data is in this form, you can do anything with it
  - Actually more general: any delimited format works

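To illustrate the point that the idea is more general than commas, here is a small sketch that reads the same table from comma-, tab-, and pipe-delimited text with Python's standard `csv` module. The helper name `parse_delimited` and the numbers are made up for the example.

```python
import csv
import io


def parse_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows (each a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The same hypothetical table in three delimited formats.
comma_text = "year,count\n2008,10\n2009,25\n"
tab_text = "year\tcount\n2008\t10\n2009\t25\n"
pipe_text = "year|count\n2008|10\n2009|25\n"

rows = parse_delimited(comma_text)
same = rows == parse_delimited(tab_text, "\t") == parse_delimited(pipe_text, "|")
print(rows, same)
```

Only the delimiter argument changes between formats; the resulting rows are identical, so everything downstream can ignore how the file was delimited.
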
Types of Visualization
- Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/page/Visualization_Options.html
- Google: http://code.google.com/apis/visualization/documentation/gallery.html

Homework
- Get a Google account and visit Google Docs
  - docs.google.com
  - Create a spreadsheet
- Create a Many Eyes account
  - http://www-958.ibm.com/software/data/cognos/manyeyes/
  - Read “Visualization Types”