When data journalism meets science | Erice, June 10th, 2014




  1. 1. ALESSIO CIMARELLI, data scientist at Dataninja | @jenkin27. International School of Science Journalism, The Digital World (Erice, June 10th, 2014)
  2. 2. aka jenkin. PAST: Master's degree in Physics at the University of Rome "La Sapienza"; Master in Science Communication at the International School for Advanced Studies (SISSA-ISAS) in Trieste; press officer at the European Laboratory for Non-Linear Spectroscopy (LENS) in Florence. PRESENT: freelance data journalist, web developer, open data activist, citizen scientist, ...
  3. 3. Data journalism & data visualization made in Italy
  4. 4. You know very well how it works... :)
  5. 5. As topic: stories about the cutting edge of scientific research and human knowledge. A key role in the relationship between science and society. Science journalists can be watchdogs against false science and scientific fraud.
  6. 6. As method: it would be evident in , because the workflow is similar to police inquiries or scientific research. A lot of information from different sources, accountability problems, hypotheses and proofs, trial-and-error cycles, and so on. Not only a story, but also a discovery in itself...
  7. 7. A word in a buzzword era: when a journalist's investigation is ultimately based on (or driven by) digital data, he acquires such a prefix. If a journalist wants to tell the world, and the world is now made of digital and quantitative information, he has to acquire skills in managing and interpreting data, or he will miss an opportunity.
  8. 8. Teamwork and multidisciplinarity. Nose for news, public interest, intuition based on knowledge of the context. Analytical mind, mathematical and statistical skills, intuition based on the science of numbers.
  9. 9. Teamwork and multidisciplinarity. Problem solving, hi-tech knowledge of hardware and software, nerd (or geek, if you prefer) attitude. Artistic sensibility and intuition, knowledge of User Experience theory and techniques.
  10. 10. Miners, dustmen, researchers, and storytellers. Public search engines or deep web? Official 5-star open data or web spiders and screen scrapers? Monitor and keyboard, smartphone and touch, or boots and mud? Data should be read by machines and not by humans! Datasets could hide errors, inconsistencies, lies... or show only a part of a story.
  11. 11. Miners, dustmen, researchers, and storytellers. Normalizations and comparisons, filtering, grouping, aggregation, correlations, ... How to represent numbers and the relations among them? Yes, with Arabic numerals, but pictures are worth a thousand words... as long as you keep in mind that there are facts behind the numbers, and (copyright of The Guardian).
  12. 12. In method: you run into a dataset and sense a possible news story... OR ... you have an interest, an idea, a thesis, so you go looking for data. Having quantitative data about a phenomenon means that somewhere there is a you have to understand, test, verify... and interpret! The data themselves can suggest new directions for your investigation or even falsify some hypotheses or assumptions. Common sense, intellectual honesty, professional ethics.
  13. 13. Some random examples: New Scientist Apps (tornadoes, warmingworld, exoplanets, planck, sealevel); The Telegraph map of wind farm; Sorting algorithms; Meteorites; Earth Journalism Network.
  14. 14. by Global Editors Network. Health: American Way of Birth, Costliest in the World (NYT); Inside the Government's Drug Data (ProPublica); Which Emergency Room Will See You the Fastest? (ProPublica). Environment: New York floods (ProPublica); Breathless and Burdened (Center for Public Integrity); When Italy is shaking (La Stampa); Italy, a delicate land (La Stampa). Astronomy: Kepler’s Tally of Planets (NYT). Energy: Biomassa (Planbureau voor de Leefomgeving).
  15. 15. Research data, science world, citizen science
  16. 16. Hard sciences and social sciences. OK, LHC petabytes are not for journalists, nor are statistical data from epidemiological surveys. But , or (open) , why not? If you are not specialized in a specific topic or if you lack knowledge of the framework, you can ask an expert you trust. You can also use numbers not in an investigation, but to tell a complex story using infographics and interactive visualizations.
  17. 17. Bibliographies, social networks of scientists, infrastructures. Science is a human activity and an industry (almost) like any other. How are European funds invested in scientific research? Where are the centers specialized in the treatment of specific diseases? Why are some well-known monitoring technologies not used in some countries?
  18. 18. Sensor-based journalism. Cheap electronics and sensors + open hardware + free information sharing = data from stakeholders other than scientists. It's early, but promising: Swiss Make Open Data Camps; Japan Geigermap at-a-glance; Citizen Science & Sensors.
  19. 19. If you have data, it's better if you know how to deal with them. If you think you may find some data, it's better if you use them. If someone uses data, it's better if you can check their claims. Playing with data is fun!
  20. 20. Welcome to the jungle!
  21. 21. Some examples: public administration, international organizations, NGOs, civic activists, press offices, leaks, social networks, journalistic sources, individual journalists, ourselves...
  22. 22. Data made public and reusable: (USA), (UK), Open Data Hub (Italy), OpenIR (Indonesia), ...
  23. 23. Remember the buzzword era? Data from big science experiments (Atlas, Human Brain Project, ...). Social networks (Facebook, Twitter, but also eBay, Amazon, ...). Maybe it's not for journalists, but it's a hot topic... Google Earth Engine.
  24. 24. For machines, not for humans. The keyword is ! A well-formed table represents a structured data set. A list of Facebook comments, newspaper articles, or a recorded speech is not structured data (and so is not machine-readable).
  25. 25. It all depends on the format. If we have Gladstone Gander as our best friend: spreadsheets (xls, xlsx, ods, csv, tsv); less common but good formats (xml, sql, json, shp, kml, ...). If we are not so lucky: tables or lists in web pages (html); simple tables in well-made pdfs (pdf). If we have Murphy as our worst enemy: scanned images, even if in a pdf wrapper (png, jpg, pdf); digital data behind complex search engines. And what if we have the best data ever, but under a closed license?
  26. 26. Well-formed data sets. Numbers are numbers; strings are strings and not numbers; datetimes must always have a single format (e.g. yyyy/mm/dd); localization is important; no gender values in a names column or similar mix-ups; every element should be identified by a Unique Identifier (ID). Data types a computer understands: integers (signed, zero included), floating-point numbers (signed), datetimes, characters and strings (case-sensitive), the null value (the strange case of a value that states "I'm not a value"). And simple comparisons are strict equalities, even for strings!
  27. 27. Aggregation, average, normalization, relative difference, distribution, ... A single rule: correlation does not imply causation! Spurious correlations: Correlated:
  28. 28. At a glance
  29. 29. With great power comes great responsibility. The basic idea is quite simple: you have quantities expressed in numbers and geometric objects defined by dimensions (e.g. the radius of a circle), so you just have to decide how to connect your quantities to visual dimensions. There are several (un)common charts and endless combinations: scatter plots, lines, bars, areas, pies, donuts, bubble charts, treemaps, word clouds, alluvial diagrams, dendrograms, networks, streamgraphs, gauges, chord diagrams, motion charts, parallel coordinates, Sankey diagrams, maps, choropleths, ... On there is an endless gallery of examples!
  30. 30. Building a simple dataset or a large and complex database focused on a topic of public interest leads to a valuable product: the database itself, intended as a collection of (linked) data plus metadata. Can a public frontend to such a database, designed for citizens, journalists, and stakeholders, be considered a journalistic outcome? If journalism is a public good, it can be a service, not only a product...
  31. 31. Scraping: "Copy & Paste" combo; Data Miner for Chrome browser; IMPORTXML() Google Spreadsheet function; Tabula for simple pdfs; Python (or other languages) scripts and libraries. Cleaning: filters and "Find & Replace" tools in spreadsheets; Open Refine. Analysis: pivot tables and simple charts in spreadsheets; dedicated software (e.g. open-source or ). Viz: QtiPlot, QGIS, Datawrapper, RAW, Google Fusion Tables, Tableau, CartoDB, Timelinejs, Timemapper, StoryMap, d3js, ...
  32. 32. Tina Casagrand, "Data journalism for science journalists", The Open Notebook (2014); Paul Bradshaw, "Scraping for Journalists", Leanpub (2014); John Mair, Richard Lance Keeble, "Data Journalism", abramis (2014); Paul Bradshaw, "Data Journalism Heist"; Claire Miller, "Getting Started with Data Journalism", Leanpub (2013); Nathan Yau, "Data Points", Wiley (2013); Simon Rogers, "Facts are Sacred", Faber & Faber (2013); Jonathan Gray, "The Data Journalism Handbook", O'Reilly (2012); Nathan Yau, "Visualize This", Wiley (2011)
  33. 33. Alessio "jenkin" Cimarelli @ Dataninja jenkin27 Q&A SWIM
  34. 34. Hacking + Marathon = Hackathon ESPAD (European students and drugs): RASFF (EU food safety):
  35. 35. The Rapid Alert System for Food and Feed (RASFF) was put in place to provide food and feed control authorities with an effective tool to exchange information about measures taken responding to serious risks detected in relation to food or feed. This exchange of information helps Member States to act more rapidly and in a coordinated manner in response to a health threat caused by food or feed.
  36. 36. This is the report from the fifth data-collection wave of the European School Survey Project on Alcohol and Other Drugs (ESPAD). It is based on data from more than 100,000 European students. Over the years about 500,000 European students have answered the ESPAD questionnaire. A total of 36 countries and regions have contributed data to the 2011 ESPAD Database. The list of drugs includes cigarettes, alcohol, cannabis, other illicit drugs, and tranquillizers and sedatives without prescription.
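
The rules on slides 26 and 27 (typed values, one date format, unique IDs, strict equality, normalization, and caution about correlation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table, its column names, and the helpers `parse_int`, `load_rows`, `rate_per_100k`, and `pearson` are all invented for the example and do not come from the talk.

```python
import csv
import io
from datetime import datetime
from math import sqrt

# Invented sample export: every cell arrives as a string, one cell is empty.
RAW_CSV = """id,region,date,cases,population
R1,North,2014/06/10,120,60000
R2,South,2014/06/10,300,200000
R3,Centre,2014/06/10,90,30000
R4,Islands,2014/06/10,,15000
"""


def parse_int(text):
    """Return an integer, or None (the null value) when the cell is empty."""
    text = text.strip()
    return int(text) if text else None


def load_rows(raw):
    """Convert each string cell to the data type it is meant to have."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw)):
        rows.append({
            "id": record["id"],
            "region": record["region"],
            # A single date format for the whole column: yyyy/mm/dd.
            "date": datetime.strptime(record["date"], "%Y/%m/%d").date(),
            "cases": parse_int(record["cases"]),
            "population": parse_int(record["population"]),
        })
    return rows


def rate_per_100k(cases, population):
    """Normalize a raw count by population; None if a value is missing."""
    if cases is None or not population:
        return None
    return cases * 100_000 / population


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rows = load_rows(RAW_CSV)

# Every element needs a unique identifier.
ids = [row["id"] for row in rows]
assert len(ids) == len(set(ids)), "duplicate IDs found"

# Comparisons are strict: a string is not a number, and case matters.
print("120" == 120)        # False
print("North" == "north")  # False

# Raw counts versus normalized rates can rank regions differently.
for row in rows:
    print(row["region"], row["cases"], rate_per_100k(row["cases"], row["population"]))

# Cases and population move together, but only because bigger regions have
# more people: a correlation, not a causal link.
complete = [row for row in rows if row["cases"] is not None]
r = pearson([row["population"] for row in complete],
            [row["cases"] for row in complete])
print(round(r, 2))
```

Keeping the empty cell as `None` rather than `0` is a deliberate choice here: a missing measurement then stays visibly missing instead of silently entering sums and averages.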
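
Slide 25 lists tables in web pages as the "not so lucky" case, and slide 31 mentions Python scripts for scraping. As a minimal sketch of that idea, the following uses only the standard-library `html.parser` module to turn an HTML table into a list of records. The page fragment, the `TableParser` class, and the column names are made up for illustration; real pages are usually messier (nested tables, merged cells), and the talk does not name a specific library.

```python
from html.parser import HTMLParser

# Invented page fragment standing in for a table found on a web page.
SAMPLE_HTML = """
<table>
  <tr><th>region</th><th>value</th></tr>
  <tr><td>North</td><td>12</td></tr>
  <tr><td>South</td><td>7</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)

# First row is the header; the rest become one dictionary per record.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]

# Scraped cells are always strings: convert numbers explicitly.
for record in records:
    record["value"] = int(record["value"])

print(records)
```

Once the rows are plain Python records they can be written to csv or json, which moves the data from the "not so lucky" formats to the machine-readable ones.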