When data journalism meets science | Erice, June 10th, 2014

ALESSIO CIMARELLI 
Data scientist at Dataninja 
jenkin@dataninja.it | @jenkin27 
dtnj.it/erice14 
International School of Science Journalism 
The Digital World (Erice, June 10th, 2014)
aka jenkin 
PAST 
Master's degree in Physics at the University of Rome "La Sapienza" 
Master in Science Communication at the International School for 
Advanced Studies (SISSA-ISAS) in Trieste 
Press officer at the European Laboratory for Non-Linear Spectroscopy 
(LENS) in Florence 
PRESENT 
Freelance data journalist, web developer, open data activist, citizen 
scientist, ...
Data journalism & data visualization made in Italy
You know very well how it works... :)
As topic 
Stories about the edge of scientific research and human knowledge. 
A key role in the relationship between science and society. 
Science journalists can be watchdogs against false science and scientific 
fraud.
As method 
It is most evident in investigative journalism, because the workflow is 
similar to police inquiries or scientific research. 
Much information from different sources, accountability problems, 
hypotheses and proofs, trial-and-error cycles, and so on. 
Not only a story, but also a discovery in itself...
A word in a buzzwords era 
When a journalist's investigation is ultimately based on (or driven by) 
digital data, the journalist acquires the "data" prefix. 
If a journalist wants to describe the world, and the world is now made of 
digital and quantitative information, they have to acquire skills in managing 
and interpreting data, or they will miss an opportunity.
Teamwork and multidisciplinarity 
Nose for news, public interest, intuition based on context knowledge 
Analytical mind, mathematical and statistical skills, intuition based on 
science of numbers
Teamwork and multidisciplinarity 
Problem solving, hi-tech knowledge in hardware and software, nerd (or 
geek, if you prefer) mindset 
Artistic sensibility and intuition, knowledge in User Experience theory and 
techniques
Miners, dustmen, researchers, and story tellers 
Public search engines or deep web? Official 5-star open data or web 
spiders and screen scrapers? Monitor and keyboard, smartphone and 
touch, or boots and mud? 
Data should be read by machines and not by humans! Datasets could 
hide errors, inconsistencies, lies... or show only a part of a story.
Miners, dustmen, researchers, and story tellers 
Normalizations and comparisons, filtering, grouping, aggregation, 
correlations, ... 
How to represent numbers and relations among numbers? Yes, with 
Arabic numerals, but pictures are worth a thousand words... as long as 
you keep in mind that there are facts behind the numbers, and 
facts are sacred (copyright of The Guardian).
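The operations named on this slide (normalization, filtering, grouping, aggregation) can be sketched in a few lines of Python. This is only a minimal illustration: the records, the field order, and the helper names `rate_per_100k` and `aggregate_by_region` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical records: (region, city, cases, population). All values are made up.
records = [
    ("North", "A", 120, 60_000),
    ("North", "B", 30, 10_000),
    ("South", "C", 200, 400_000),
    ("South", "D", 50, 100_000),
]

def rate_per_100k(cases, population):
    """Normalize a raw count by population so places of different size are comparable."""
    return cases / population * 100_000

def aggregate_by_region(rows):
    """Group rows by region, sum cases and population, then normalize the totals."""
    totals = defaultdict(lambda: [0, 0])
    for region, _city, cases, population in rows:
        totals[region][0] += cases
        totals[region][1] += population
    return {region: rate_per_100k(c, p) for region, (c, p) in totals.items()}

# Filtering: keep only cities with at least 50,000 inhabitants.
large_cities = [row for row in records if row[3] >= 50_000]

# Grouping + aggregation + normalization.
regional_rates = aggregate_by_region(records)
```

In this made-up sample the South has more cases in absolute terms, but a much lower rate once the counts are divided by population, which is the kind of reversal normalization is meant to expose.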
In method 
You run into a dataset and sense a possible news story... 
OR 
... you have an interest, an idea, a thesis, so you are looking for data. 
Having quantitative data about a phenomenon means that somewhere 
there is a you have to understand, test, 
verify... and interpret! 
Data themselves can suggest new directions for your investigation or even 
falsify some hypotheses or assumptions. 
Common sense, intellectual honesty, professional ethics
Some random examples 
New Scientist Apps 
tornadoes 
warmingworld 
exoplanets 
planck 
sealevel 
The Telegraph map of wind farm 
Sorting algorithms 
Meteorites 
Earth Journalism Network
by Global Editors Network 
Health 
American Way of Birth, Costliest in the World (NYT) 
Inside the Government's Drug Data (ProPublica) 
Which Emergency Room Will See You the Fastest? (ProPublica) 
Environment 
New York floods (ProPublica) 
Breathless and Burdened (Center for Public Integrity) 
When Italy is shaking (La Stampa) 
Italy, a delicate land (La Stampa) 
Astronomy 
Kepler’s Tally of Planets (NYT) 
Energy 
Biomassa (Planbureau voor de Leefomgeving)
Research data, science world, citizen science
Hard sciences and social sciences 
OK, neither LHC petabytes nor statistical data from epidemiological 
surveys are for journalists. 
But , or (open) 
, why not? 
If you are not specialized in a specific topic or if you lack knowledge 
of its framework, you can ask an expert you trust. 
You can also use numbers not in an investigation, but to tell a complex 
story using infographics and interactive visualizations.
Bibliographies, social networks of scientists, infrastructures 
Science is a human activity and an industry (almost) like any other. 
How are the European funds invested in scientific research? Where are 
the centers specialized in the treatment of specific diseases? Why are some 
well-known monitoring technologies not used in some countries?
Sensor-based journalism 
Cheap electronics and sensors 
+ 
open hardware 
+ 
free information sharing 
= 
data from stakeholders other than scientists 
It's early, but promising: 
Swiss Make Open Data Camps 
Japan Geigermap at-a-glance 
Citizen Science & Sensors
If you have data, it's better if you know how to deal with them. 
If you think you may find some data, it's better if you use them. 
If someone uses data, it's better if you can check their claims. 
Playing with data is fun!
Welcome to the jungle!
Some examples 
Public administration 
International organizations 
NGOs 
Civic activists 
Press offices 
Leaks 
Social networks 
Journalistic sources 
Single journalists 
Ourselves...
Data made public and reusable 
Data.gov (USA) 
Data.gov.uk (UK) 
Open Data Hub (Italy) 
OpenIR (Indonesia) 
...
Remember the buzzword era? 
Data from big science experiments (Atlas, Human Brain Project, ...) 
Social networks (Facebook, Twitter, but also eBay, Amazon, ...) 
Maybe it's not for journalists, but it's a hot topic... 
Google Earth Engine
For machine, not for human 
The keyword is ! 
A well-formed table represents a structured data set. A list of Facebook 
comments, the articles of a newspaper, or a recorded speech are not structured 
data (and so are not machine-readable).
It all depends on the format 
If we have Gladstone Gander as best friend: 
spreadsheets (xls, xlsx, ods, csv, tsv); 
not-so-common good formats (xml, sql, json, shp, kml, ...). 
If we are not so lucky: 
tables or lists in web pages (html); 
simple tables in well-done pdfs (pdf). 
If we have Murphy as worst enemy: 
scanned images, even if in a pdf wrapper (png, jpg, pdf); 
digital data behind complex search engines. 
And if we have the best data ever, but under a closed license?
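For the lucky cases in the list above (csv, json), a minimal sketch of what "machine-readable" means in practice, using only the Python standard library. The sample data and the helper names `rows_from_csv` and `rows_from_json` are invented for illustration; the text is embedded as strings so the example runs without any file.

```python
import csv
import io
import json

# Made-up sample data, embedded as strings so the sketch needs no files.
CSV_TEXT = "id,city,value\n1,Rome,10\n2,Florence,7\n"
JSON_TEXT = '[{"id": 1, "city": "Rome", "value": 10}, {"id": 2, "city": "Florence", "value": 7}]'

def rows_from_csv(text):
    """Parse CSV text into a list of dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))

def rows_from_json(text):
    """Parse a JSON array of objects; numbers keep their numeric type."""
    return json.loads(text)

csv_rows = rows_from_csv(CSV_TEXT)
json_rows = rows_from_json(JSON_TEXT)
```

Note the difference between the two formats: CSV carries no type information, so `value` arrives as the string "10", while JSON distinguishes numbers from strings.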
Well-formed data sets 
Numbers are numbers, strings are strings and not numbers, datetimes 
must always have a single format (e.g. yyyy/mm/dd), localization is 
important, no mixing of different values in one column (e.g. gender in a 
names column), and every element should be identified by a Unique Identifier (ID). 
Data types a computer understands: 
integers (signed, zero included), 
floating-point numbers (signed), 
datetime, 
characters and strings (case sensitive), 
null value (the strange case of a value that states "I'm not a value"). 
And simple comparisons are strict equalities, for strings too!
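A short Python sketch of the points above: types matter, the null value is not zero, and string comparison is a strict equality. The helper names `parse_int` and `parse_date` are hypothetical, and the date format is the yyyy/mm/dd example from the slide.

```python
from datetime import datetime

def parse_int(text):
    """Return an int, or None when the cell is empty or not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return None

def parse_date(text, fmt="%Y/%m/%d"):
    """Parse a date written in the single agreed format (yyyy/mm/dd by default)."""
    return datetime.strptime(text.strip(), fmt).date()

same_number = "10" == 10        # False: a string is never equal to an integer
same_case = "Rome" == "rome"    # False: string comparison is case sensitive
same_padding = "Rome" == "Rome "  # False: a trailing space makes a different string
normalized = "Rome ".strip().lower() == "rome"  # True after explicit normalization
missing = parse_int("")         # None: "no value" is not the same as 0
```

The design choice here is to return `None` rather than 0 for an unparseable cell, so that missing values stay visible instead of silently entering sums and averages.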
Aggregation, average, normalization, relative difference, distribution, ... 
A single rule: correlation does not imply causation! 
Spurious correlations: http://www.tylervigen.com/ 
Correlated: http://www.correlated.org/
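To see how easily a spurious correlation appears, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The two series are invented and have nothing to do with each other; they simply both grow over time. The function name `pearson` is just a label for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers.

    Raises ZeroDivisionError if one of the series is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up yearly figures: both simply grow over time, nothing links them.
ice_cream_sales = [100, 120, 135, 160, 180]
broadband_lines = [10, 14, 19, 22, 27]
r = pearson(ice_cream_sales, broadband_lines)
```

The coefficient comes out very close to 1, yet the only thing the two series share is a common upward trend: a high value of `r` is a reason to investigate, not a finding.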
At a glance
With great power comes great responsibility 
The basic idea is quite simple: you have quantities expressed in numbers 
and geometric objects defined by dimensions (e.g. the radius of a circle), so you 
just have to decide how to connect your quantities to visual dimensions. 
There are several (un)common charts and endless combinations: scatter 
plots, lines, bars, areas, pies, donuts, bubble charts, treemaps, word 
clouds, alluvial diagrams, dendrograms, networks, streamgraphs, 
gauges, chord diagrams, motion charts, parallel coordinates, Sankey 
diagrams, maps, choropleths, ... 
On d3js.org there is an endless gallery of examples!
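Connecting a quantity to a visual dimension is where charts most often go wrong. A minimal sketch, assuming a bubble chart and a bar chart: the eye reads the area of a circle, so the radius should grow with the square root of the value, while a bar can use a linear scale. The function names and the pixel sizes are arbitrary choices for this example.

```python
import math

def bubble_radius(value, max_value, max_radius=40.0):
    """Radius such that the circle AREA, not the radius, is proportional to the value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)

def bar_height(value, max_value, max_height=200.0):
    """Bar length is one-dimensional, so a linear scale is appropriate."""
    return max_height * value / max_value

values = [25, 50, 100]
radii = [bubble_radius(v, max(values)) for v in values]
heights = [bar_height(v, max(values)) for v in values]
```

With these numbers the largest value is four times the smallest: its bar is four times taller, but its bubble has only twice the radius, which gives four times the area.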
Building a simple dataset or a large and complex database focused on a 
topic of public interest leads to a valuable product: the database itself, 
intended as a collection of (linked) data plus metadata. 
Can a public frontend to such a database, designed for citizens, journalists, 
and stakeholders, be considered a journalistic outcome? If journalism is a 
public good, it can be a service, not only a product...
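As a rough illustration of "data plus metadata", here is a sketch with Python's built-in sqlite3 module. The schema, the table names `sources` and `alerts`, and every value in them are hypothetical; the point is only that each figure stays linked to the source and license it came from.

```python
import sqlite3

def build_database():
    """Create an in-memory database holding both the data and its metadata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sources (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            license TEXT NOT NULL,
            retrieved TEXT NOT NULL      -- date in yyyy-mm-dd
        );
        CREATE TABLE alerts (
            id INTEGER PRIMARY KEY,
            country TEXT NOT NULL,
            year INTEGER NOT NULL,
            n_alerts INTEGER NOT NULL,
            source_id INTEGER NOT NULL REFERENCES sources(id)
        );
        """
    )
    # Made-up metadata and figures, for illustration only.
    conn.execute(
        "INSERT INTO sources VALUES (1, 'Hypothetical agency', 'CC BY 4.0', '2014-06-10')"
    )
    conn.executemany(
        "INSERT INTO alerts (country, year, n_alerts, source_id) VALUES (?, ?, ?, 1)",
        [("IT", 2013, 12), ("IT", 2012, 9), ("FR", 2013, 5)],
    )
    return conn

def totals_with_source(conn):
    """Totals per country, each carrying the name and license of its source."""
    query = """
        SELECT a.country, SUM(a.n_alerts), s.name, s.license
        FROM alerts AS a JOIN sources AS s ON s.id = a.source_id
        GROUP BY a.country, s.name, s.license
        ORDER BY a.country
    """
    return conn.execute(query).fetchall()

rows = totals_with_source(build_database())
```

A public frontend would sit on top of queries like `totals_with_source`, so that readers can always trace a number back to where it came from.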
Scraping 
"Copy & Paste" combo 
Data Miner for Chrome browser 
IMPORTXML() Google Spreadsheet function 
Tabula for simple pdfs 
Python (or other languages) scripts and libraries 
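As an example of the last option, a minimal scraping sketch that extracts a table from a web page using only Python's standard html.parser module. The HTML fragment, the class name `TableParser`, and the function `scrape_table` are invented for this example; a real script would first download the page, and messy real-world pages may need a more forgiving library.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Made-up page fragment, embedded so the sketch runs offline.
HTML = """
<table>
  <tr><th>City</th><th>Value</th></tr>
  <tr><td>Rome</td><td>10</td></tr>
  <tr><td>Florence</td><td>7</td></tr>
</table>
"""

def scrape_table(html):
    """Return the table as a list of rows, each row a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows

table = scrape_table(HTML)
```

The result is a list of rows ready to be written to a csv file or loaded into a spreadsheet; every cell is still a string and needs cleaning and type conversion afterwards.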
Cleaning 
Filters and "Find & Replace" tools in spreadsheets 
Open Refine 
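When filters and "Find & Replace" are not enough, the same cleaning steps can be scripted. A minimal sketch under stated assumptions: dates arrive in one of three guessed formats (day first when slashes are used with a trailing year), and numbers use the Italian convention of "." for thousands and "," for decimals. The function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed input formats, tried in order; adapt the list to your own data.
DATE_FORMATS = ("%Y/%m/%d", "%d/%m/%Y", "%Y-%m-%d")

def clean_text(value):
    """Trim the ends and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", value).strip()

def clean_date(value):
    """Rewrite a date as yyyy/mm/dd; return None if no known format matches."""
    text = clean_text(value)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y/%m/%d")
        except ValueError:
            continue
    return None

def clean_number(value):
    """Turn '1.234,5' (Italian style, an assumption) into 1234.5; None if not a number."""
    text = clean_text(value).replace(".", "").replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None
```

Localization is the trap here: `clean_number` would misread an English-style "1234.5", so the convention of each source has to be checked before any conversion is applied.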
Analysis 
Pivot tables and simple charts in spreadsheets 
Dedicated software (e.g. open-source QtiPlot or QGIS) 
Viz 
Datawrapper, RAW, Google Fusion Tables, Tableau, CartoDB, 
infogr.am, easel.ly, Timelinejs, Timemapper, StoryMap, d3js, ...
Tina Casagrand, "Data journalism for science journalists", The Open Notebook (2014) 
Paul Bradshaw, "Scraping for Journalists", Leanpub (2014) 
John Mair, Richard Lance Keeble, "Data Journalism", abramis (2014) 
Paul Bradshaw, "Data Journalism Heist" 
Claire Miller, "Getting Started with Data Journalism", Leanpub (2013) 
Nathan Yau, "Data Points", Wiley (2013) 
Simon Rogers, "Facts are Sacred", Faber & Faber (2013) 
Jonathan Gray, "The Data Journalism Handbook", O'Reilly (2012) 
Nathan Yau, "Visualize This", Wiley (2011)
Alessio "jenkin" Cimarelli 
jenkin@dataninja.it | @jenkin27 
Dataninja 
www.dataninja.it 
school.dataninja.it 
dataninja.it/newsletter 
Q&A 
school.dataninja.it/qa 
SWIM 
sciencewritersinitaly.wordpress.com
Hacking + Marathon = Hackathon 
ESPAD (European students and drugs): http://www.espad.org/en/ 
RASFF (EU food safety): http://ec.europa.eu/food/food/rapidalert/
http://ec.europa.eu/food/food/rapidalert/ 
The Rapid Alert System for Food and Feed (RASFF) was put in place to 
provide food and feed control authorities with an effective tool to 
exchange information about measures taken responding to serious risks 
detected in relation to food or feed. This exchange of information helps 
Member States to act more rapidly and in a coordinated manner in 
response to a health threat caused by food or feed. 
dtnj.it/rasff2013
http://www.espad.org/en/ 
This is the report from the fifth data-collection wave of the European 
School Survey Project on Alcohol and Other Drugs (ESPAD). It is based on 
data from more than 100,000 European students. Over the years about 
500,000 European students have answered the ESPAD questionnaire. A 
total of 36 countries and regions have contributed data to the 
2011 ESPAD Database. The list of substances includes cigarettes, alcohol, cannabis, 
other illicit drugs, and tranquillisers and sedatives without prescription. 
dtnj.it/espad2011
1 of 38

Recommended

How to Uncover Big Growth Opportunities with Data by
How to Uncover Big Growth Opportunities with DataHow to Uncover Big Growth Opportunities with Data
How to Uncover Big Growth Opportunities with DataLooker
3.2K views23 slides
Data driven storytelling tips from an iron viz champion ryan sleeper by
Data driven storytelling tips from an iron viz champion   ryan sleeperData driven storytelling tips from an iron viz champion   ryan sleeper
Data driven storytelling tips from an iron viz champion ryan sleeperRyan Sleeper
5K views52 slides
Les outils de data visualisation by
Les outils de data visualisationLes outils de data visualisation
Les outils de data visualisationUNITEC
15K views59 slides
Fezeka nkosi tourist attractions by
Fezeka nkosi tourist attractionsFezeka nkosi tourist attractions
Fezeka nkosi tourist attractionsFezeka Nkosi
606 views38 slides
John Quinton-Barber, Social Communications by
John Quinton-Barber, Social CommunicationsJohn Quinton-Barber, Social Communications
John Quinton-Barber, Social CommunicationsAnaerobic Digestion & Biogas Association
381 views19 slides

More Related Content

Viewers also liked

Medicamentos antidiabéticos para adultos con diabetes tipo 2. Revisión de efe... by
Medicamentos antidiabéticos para adultos con diabetes tipo 2. Revisión de efe...Medicamentos antidiabéticos para adultos con diabetes tipo 2. Revisión de efe...
Medicamentos antidiabéticos para adultos con diabetes tipo 2. Revisión de efe...José Ignacio Sánchez Amezua
250 views2 slides
Northeast Asia Tourism Forum Cross border tourism planning presentation by
Northeast Asia Tourism Forum Cross border tourism planning presentationNortheast Asia Tourism Forum Cross border tourism planning presentation
Northeast Asia Tourism Forum Cross border tourism planning presentationJames MacGregor (jmacgregor@ecoplannet.com)
1.6K views38 slides
الهوبيت والفلسفة by
الهوبيت والفلسفةالهوبيت والفلسفة
الهوبيت والفلسفةاحمد الجسار
449 views261 slides
Final Presentaion BD by
Final Presentaion BDFinal Presentaion BD
Final Presentaion BDSai Brahma Penugonda
290 views38 slides
FANSHOES Informational Packet by
FANSHOES Informational PacketFANSHOES Informational Packet
FANSHOES Informational PacketNick Rovisa
455 views14 slides
الإدارة بالحب by
الإدارة بالحبالإدارة بالحب
الإدارة بالحبDr Ghaiath Hussein
717 views10 slides

Viewers also liked(15)

FANSHOES Informational Packet by Nick Rovisa
FANSHOES Informational PacketFANSHOES Informational Packet
FANSHOES Informational Packet
Nick Rovisa455 views
White Paper: Social Monitoring by Cory Grassell
White Paper: Social MonitoringWhite Paper: Social Monitoring
White Paper: Social Monitoring
Cory Grassell426 views
Final report for oap butterfly garden by miaomiaopig
Final report for oap butterfly gardenFinal report for oap butterfly garden
Final report for oap butterfly garden
miaomiaopig396 views
Pdf de taller apicultura marzo by Ruben NotFun
Pdf de taller apicultura marzoPdf de taller apicultura marzo
Pdf de taller apicultura marzo
Ruben NotFun1.2K views
Investment press release by Duczko
Investment press releaseInvestment press release
Investment press release
Duczko241 views
OBRA Y OBREROS EN VENEZUELA. AÑO 2011. No. 50 by CPV
OBRA Y OBREROS EN VENEZUELA. AÑO 2011. No. 50OBRA Y OBREROS EN VENEZUELA. AÑO 2011. No. 50
OBRA Y OBREROS EN VENEZUELA. AÑO 2011. No. 50
CPV586 views
Silsilah keluarga gaffar by Warnet Raha
Silsilah keluarga gaffarSilsilah keluarga gaffar
Silsilah keluarga gaffar
Warnet Raha163 views

Similar to When data journalism meets science | Erice, June 10th, 2014

The era of artificial intelligence by
The era of artificial intelligenceThe era of artificial intelligence
The era of artificial intelligencePrajjwal Kushwaha
2.4K views24 slides
Figures of the Many - Quantitative Concepts for Qualitative Thinking by
Figures of the Many - Quantitative Concepts for Qualitative ThinkingFigures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative ThinkingBernhard Rieder
4.6K views57 slides
Nine Algorithms That Changed The Future by
Nine Algorithms That Changed The FutureNine Algorithms That Changed The Future
Nine Algorithms That Changed The FutureBrittany Eason
2 views41 slides
Digital Scholarship Seminar: Implications of Data for the 21st-century Humanist by
Digital Scholarship Seminar: Implications of Data for the 21st-century HumanistDigital Scholarship Seminar: Implications of Data for the 21st-century Humanist
Digital Scholarship Seminar: Implications of Data for the 21st-century HumanistRebecca Davis
381 views40 slides
New Frontiers in IA: Design in the Era of Cognitive Computing by
New Frontiers in IA: Design in the Era of Cognitive ComputingNew Frontiers in IA: Design in the Era of Cognitive Computing
New Frontiers in IA: Design in the Era of Cognitive ComputingPaul King
614 views21 slides
Artificial intelligence by
Artificial intelligenceArtificial intelligence
Artificial intelligenceSantanu Mukhopadhyay
2.4K views26 slides

Similar to When data journalism meets science | Erice, June 10th, 2014(20)

Figures of the Many - Quantitative Concepts for Qualitative Thinking by Bernhard Rieder
Figures of the Many - Quantitative Concepts for Qualitative ThinkingFigures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative Thinking
Bernhard Rieder4.6K views
Nine Algorithms That Changed The Future by Brittany Eason
Nine Algorithms That Changed The FutureNine Algorithms That Changed The Future
Nine Algorithms That Changed The Future
Brittany Eason2 views
Digital Scholarship Seminar: Implications of Data for the 21st-century Humanist by Rebecca Davis
Digital Scholarship Seminar: Implications of Data for the 21st-century HumanistDigital Scholarship Seminar: Implications of Data for the 21st-century Humanist
Digital Scholarship Seminar: Implications of Data for the 21st-century Humanist
Rebecca Davis381 views
New Frontiers in IA: Design in the Era of Cognitive Computing by Paul King
New Frontiers in IA: Design in the Era of Cognitive ComputingNew Frontiers in IA: Design in the Era of Cognitive Computing
New Frontiers in IA: Design in the Era of Cognitive Computing
Paul King614 views
Data Science definition by CarloLauro1
Data Science definitionData Science definition
Data Science definition
CarloLauro124 views
Let's talk about Data Science by Carlo Lauro
Let's talk about Data ScienceLet's talk about Data Science
Let's talk about Data Science
Carlo Lauro76 views
Data science innovations by suresh sood
Data science innovations Data science innovations
Data science innovations
suresh sood207 views
Difference Between Discipline And Synopticon by Aurora Cuellar
Difference Between Discipline And SynopticonDifference Between Discipline And Synopticon
Difference Between Discipline And Synopticon
Aurora Cuellar2 views
AI 3.0 by InnoTech
AI 3.0AI 3.0
AI 3.0
InnoTech870 views
Harvesting collective intelligence. by Alberto Cottica
Harvesting collective intelligence. Harvesting collective intelligence.
Harvesting collective intelligence.
Alberto Cottica486 views
Making our mark: the important role of social scientists in the ‘era of big d... by The Higher Education Academy
Making our mark: the important role of social scientists in the ‘era of big d...Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...
Human-machine Inter-agencies by mo-seph
Human-machine Inter-agenciesHuman-machine Inter-agencies
Human-machine Inter-agencies
mo-seph4.8K views
Artificial Intelligence by Mhd Sb
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
Mhd Sb3.6K views
In the Age of Open Information - Do-It-Yourself Analytical Mashups on Schema-... by MaikThiele
In the Age of Open Information - Do-It-Yourself Analytical Mashups on Schema-...In the Age of Open Information - Do-It-Yourself Analytical Mashups on Schema-...
In the Age of Open Information - Do-It-Yourself Analytical Mashups on Schema-...
MaikThiele613 views
Data fluency for the 21st century by MartinFrigaard
Data fluency for the 21st centuryData fluency for the 21st century
Data fluency for the 21st century
MartinFrigaard97 views
AI WORLD: I-World: EIS Global Innovation Platform: BIG Knowledge World vs. BI... by Azamat Abdoullaev
AI WORLD: I-World: EIS Global Innovation Platform: BIG Knowledge World vs. BI...AI WORLD: I-World: EIS Global Innovation Platform: BIG Knowledge World vs. BI...
AI WORLD: I-World: EIS Global Innovation Platform: BIG Knowledge World vs. BI...
Azamat Abdoullaev3.2K views

More from Dataninja

Confiscatibene data & community driven journalism by
Confiscatibene data & community driven journalismConfiscatibene data & community driven journalism
Confiscatibene data & community driven journalismDataninja
1.9K views16 slides
The Migrants’ Files, one year later by
The Migrants’ Files, one year laterThe Migrants’ Files, one year later
The Migrants’ Files, one year laterDataninja
526 views9 slides
#migrantsfiles international by
#migrantsfiles international#migrantsfiles international
#migrantsfiles internationalDataninja
939 views25 slides
Confiscati Bene a Ferrara by
Confiscati Bene a FerraraConfiscati Bene a Ferrara
Confiscati Bene a FerraraDataninja
553 views20 slides
Guida galattica per i data journalists by
Guida galattica per i data journalistsGuida galattica per i data journalists
Guida galattica per i data journalistsDataninja
1K views23 slides
Un giornalista tra dati e sensori by
Un giornalista tra dati e sensoriUn giornalista tra dati e sensori
Un giornalista tra dati e sensoriDataninja
426 views12 slides

More from Dataninja(20)

Confiscatibene data & community driven journalism by Dataninja
Confiscatibene data & community driven journalismConfiscatibene data & community driven journalism
Confiscatibene data & community driven journalism
Dataninja1.9K views
The Migrants’ Files, one year later by Dataninja
The Migrants’ Files, one year laterThe Migrants’ Files, one year later
The Migrants’ Files, one year later
Dataninja526 views
#migrantsfiles international by Dataninja
#migrantsfiles international#migrantsfiles international
#migrantsfiles international
Dataninja939 views
Confiscati Bene a Ferrara by Dataninja
Confiscati Bene a FerraraConfiscati Bene a Ferrara
Confiscati Bene a Ferrara
Dataninja553 views
Guida galattica per i data journalists by Dataninja
Guida galattica per i data journalistsGuida galattica per i data journalists
Guida galattica per i data journalists
Dataninja1K views
Un giornalista tra dati e sensori by Dataninja
Un giornalista tra dati e sensoriUn giornalista tra dati e sensori
Un giornalista tra dati e sensori
Dataninja426 views
Storie che nascono dai dati, come cambia il giornalismo nell'età della Rete by Dataninja
Storie che nascono dai dati, come cambia il giornalismo nell'età della ReteStorie che nascono dai dati, come cambia il giornalismo nell'età della Rete
Storie che nascono dai dati, come cambia il giornalismo nell'età della Rete
Dataninja557 views
Data journalism: fare giornalismo con metodo (scientifico) by Dataninja
Data journalism: fare giornalismo con metodo (scientifico)Data journalism: fare giornalismo con metodo (scientifico)
Data journalism: fare giornalismo con metodo (scientifico)
Dataninja563 views
#migrantsfiles | Cortina d'Ampezzo, 8 luglio 2014 by Dataninja
#migrantsfiles | Cortina d'Ampezzo, 8 luglio 2014#migrantsfiles | Cortina d'Ampezzo, 8 luglio 2014
#migrantsfiles | Cortina d'Ampezzo, 8 luglio 2014
Dataninja378 views
Open Data & Data Visualization: dalle licenze ai grafici | Bologna, 16 giugno... by Dataninja
Open Data & Data Visualization: dalle licenze ai grafici | Bologna, 16 giugno...Open Data & Data Visualization: dalle licenze ai grafici | Bologna, 16 giugno...
Open Data & Data Visualization: dalle licenze ai grafici | Bologna, 16 giugno...
Dataninja765 views
Data Journalism: strumenti operativi | Bologna, 9 giugno 2014 by Dataninja
Data Journalism: strumenti operativi | Bologna, 9 giugno 2014Data Journalism: strumenti operativi | Bologna, 9 giugno 2014
Data Journalism: strumenti operativi | Bologna, 9 giugno 2014
Dataninja1.3K views
Introduzione al data journalism | Roma, 7 giugno 2014 by Dataninja
Introduzione al data journalism | Roma, 7 giugno 2014Introduzione al data journalism | Roma, 7 giugno 2014
Introduzione al data journalism | Roma, 7 giugno 2014
Dataninja530 views
Dispensa Datajournalism | Maggio 2014 | school.dataninja.it by Dataninja
Dispensa Datajournalism | Maggio 2014 | school.dataninja.itDispensa Datajournalism | Maggio 2014 | school.dataninja.it
Dispensa Datajournalism | Maggio 2014 | school.dataninja.it
Dataninja1.8K views
Tra dati e notizie by Dataninja
Tra dati e notizieTra dati e notizie
Tra dati e notizie
Dataninja463 views
Data visualization in data journalism workflow by Dataninja
Data visualization in data journalism workflowData visualization in data journalism workflow
Data visualization in data journalism workflow
Dataninja602 views
Data Visualization Lab - #SOD14 - Bologna - 30 marzo 2014 by Dataninja
Data Visualization Lab - #SOD14 - Bologna - 30 marzo 2014Data Visualization Lab - #SOD14 - Bologna - 30 marzo 2014
Data Visualization Lab - #SOD14 - Bologna - 30 marzo 2014
Dataninja851 views
Come nasce un'inchiesta data-driven by Dataninja
Come nasce un'inchiesta data-drivenCome nasce un'inchiesta data-driven
Come nasce un'inchiesta data-driven
Dataninja486 views
Pools of data by Dataninja
Pools of dataPools of data
Pools of data
Dataninja494 views
Web scraping e Datawrapper per giornalisti locali by Dataninja
Web scraping e Datawrapper per giornalisti localiWeb scraping e Datawrapper per giornalisti locali
Web scraping e Datawrapper per giornalisti locali
Dataninja881 views
20131130 - Open Ricostruzione: i fondi destinati a Bondeno (Ferrara) dopo il ... by Dataninja
20131130 - Open Ricostruzione: i fondi destinati a Bondeno (Ferrara) dopo il ...20131130 - Open Ricostruzione: i fondi destinati a Bondeno (Ferrara) dopo il ...
20131130 - Open Ricostruzione: i fondi destinati a Bondeno (Ferrara) dopo il ...
Dataninja314 views

Recently uploaded

11.28.23 Social Capital and Social Exclusion.pptx by
11.28.23 Social Capital and Social Exclusion.pptx11.28.23 Social Capital and Social Exclusion.pptx
11.28.23 Social Capital and Social Exclusion.pptxmary850239
312 views25 slides
Create a Structure in VBNet.pptx by
Create a Structure in VBNet.pptxCreate a Structure in VBNet.pptx
Create a Structure in VBNet.pptxBreach_P
78 views8 slides
Drama KS5 Breakdown by
Drama KS5 BreakdownDrama KS5 Breakdown
Drama KS5 BreakdownWestHatch
98 views2 slides
CONTENTS.pptx by
CONTENTS.pptxCONTENTS.pptx
CONTENTS.pptxiguerendiain
62 views17 slides
Gopal Chakraborty Memorial Quiz 2.0 Prelims.pptx by
Gopal Chakraborty Memorial Quiz 2.0 Prelims.pptxGopal Chakraborty Memorial Quiz 2.0 Prelims.pptx
Gopal Chakraborty Memorial Quiz 2.0 Prelims.pptxDebapriya Chakraborty
695 views81 slides
The Value and Role of Media and Information Literacy in the Information Age a... by
The Value and Role of Media and Information Literacy in the Information Age a...The Value and Role of Media and Information Literacy in the Information Age a...
The Value and Role of Media and Information Literacy in the Information Age a...Naseej Academy أكاديمية نسيج
58 views42 slides

Recently uploaded(20)

11.28.23 Social Capital and Social Exclusion.pptx by mary850239
11.28.23 Social Capital and Social Exclusion.pptx11.28.23 Social Capital and Social Exclusion.pptx
11.28.23 Social Capital and Social Exclusion.pptx
mary850239312 views
Create a Structure in VBNet.pptx by Breach_P
Create a Structure in VBNet.pptxCreate a Structure in VBNet.pptx
Create a Structure in VBNet.pptx
Breach_P78 views
Drama KS5 Breakdown by WestHatch
Drama KS5 BreakdownDrama KS5 Breakdown
Drama KS5 Breakdown
WestHatch98 views
The basics - information, data, technology and systems.pdf by JonathanCovena1
The basics - information, data, technology and systems.pdfThe basics - information, data, technology and systems.pdf
The basics - information, data, technology and systems.pdf
JonathanCovena1146 views
AUDIENCE - BANDURA.pptx by iammrhaywood
AUDIENCE - BANDURA.pptxAUDIENCE - BANDURA.pptx
AUDIENCE - BANDURA.pptx
iammrhaywood117 views
CUNY IT Picciano.pptx by apicciano
CUNY IT Picciano.pptxCUNY IT Picciano.pptx
CUNY IT Picciano.pptx
apicciano54 views
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB... by Nguyen Thanh Tu Collection
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
Structure and Functions of Cell.pdf by Nithya Murugan
Structure and Functions of Cell.pdfStructure and Functions of Cell.pdf
Structure and Functions of Cell.pdf
Nithya Murugan719 views
S1_SD_Resources Walkthrough.pptx by LAZAROAREVALO1
S1_SD_Resources Walkthrough.pptxS1_SD_Resources Walkthrough.pptx
S1_SD_Resources Walkthrough.pptx
LAZAROAREVALO164 views
Ch. 8 Political Party and Party System.pptx by Rommel Regala
Ch. 8 Political Party and Party System.pptxCh. 8 Political Party and Party System.pptx
Ch. 8 Political Party and Party System.pptx
Rommel Regala54 views
Solar System and Galaxies.pptx by DrHafizKosar
Solar System and Galaxies.pptxSolar System and Galaxies.pptx
Solar System and Galaxies.pptx
DrHafizKosar106 views
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx by Ms. Pooja Bhandare
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptxPharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx
Ms. Pooja Bhandare113 views
Sociology KS5 by WestHatch
Sociology KS5Sociology KS5
Sociology KS5
WestHatch85 views
EIT-Digital_Spohrer_AI_Intro 20231128 v1.pptx by ISSIP
EIT-Digital_Spohrer_AI_Intro 20231128 v1.pptxEIT-Digital_Spohrer_AI_Intro 20231128 v1.pptx
EIT-Digital_Spohrer_AI_Intro 20231128 v1.pptx
ISSIP386 views

When data journalism meets science | Erice, June 10th, 2014

  • 1. ALESSIO CIMARELLI Data scientist at Dataninja jenkin@dataninja.it | @jenkin27 dtnj.it/erice14 International School of Science Journalism The Digital World (Erice, June 10th, 2014)
  • 2. aka jenkin PAST Master Degree in Physics at the University of Rome "La Sapienza" Master in Science Communication at the International School for Advanced Studies (SISSA-ISAS) in Trieste Press officer at the European Laboratory for Non-Linear Spectroscopy (LENS) in Florence PRESENT Freelance data journalist, web developer, open data activist, citizen scientist, ...
  • 3. Data journalism & data visualization made in Italy
  • 5. You know very well how it works... :)
  • 6. As topic Stories about the edge of scientific research and human knowledge. Key role in relationship between science and society. Science journalist can be a watchdog against false science and scientific frauds.
  • 7. As method It would be evident in , because the workflow is similar to police inquiries or scientific research. Many informations from different sources, accountability problems, hypothesis and proofs, trial and error cycles, and so on. Not only a story, but also a discovery itself...
  • 8. A word in a buzzwords era when his investigation is ultimately based on (or driven by) digital data, he acquires such prefix. If a journalist want to tell the world, and the world is now made of digital and quantitative informations, he has to acquire skills in management and interpretation of data, or he will miss an opportunity.
  • 9. Teamwork and multidisciplinary Nose for news, public interest, intuition based on contest knowledge Analytical mind, mathematical and statistical skills, intuition based on science of numbers
  • 10. Teamwork and multidisciplinary Problem solving, hi-tech knowledge in hardware and software, nerd (or geek, if you prefer) mood Artistic sensibility and intuition, knowledge in User Experience theory and techniques
  • 11. Miners, dustmen, researchers, and story tellers Public search engines or deep web? Official 5-stars open data or web spiders and screen scrapers? Monitor and keyboard, smartphone and touch, or boots and mud? Data should be read by machines and not by humans! Datasets could hide errors, inconsistencies, lies... or show only a part of a story.
  • 12. Miners, dustmen, researchers, and story tellers Normalizations and comparisons, filtering, grouping, aggregation, correlations, ... How to represent numbers and relations among numbers? Yes, with arabic numerals, but pictures are worth a thousand words... as long as you keep in mind that there are facts behind the numbers, and (copyright of The Guardian).
  • 14. In method You run into a dataset and feel the presence of a possible news... OR ... you have an interest, an idea, a thesis, so you are looking for data. Having quantitative data about a phenomenon means that somewhere there is a you have to understand, test, verify... and interpret! Data themselves can suggest new ways for your investigation or even falsify some hypothesis or assumptions. Common sense, intellectual honesty, professional ethics
  • 15. Some random examples New Scientist Apps tornadoes warmingworld exoplanets planck sealevel The Telegraph map of wind farm Sorting algorithms Meteorites Earth Journalism Network
  • 16. by Global Editors Network Health: American Way of Birth, Costliest in the World (NYT); Inside the Government's Drug Data (ProPublica); Which Emergency Room Will See You the Fastest? (ProPublica). Environment: New York floods (ProPublica); Breathless and Burdened (Center for Public Integrity); When Italy is shaking (La Stampa); Italy, a delicate land (La Stampa). Astronomy: Kepler’s Tally of Planets (NYT). Energy: Biomassa (Planbureau voor de Leefomgeving).
  • 17. Research data, science world, citizen science
  • 18. Hard sciences and social sciences OK, neither LHC petabytes nor statistical data from epidemiological surveys are for journalists. But , or (open) , why not? If you are not specialized in a topic or lack knowledge of its framework, you can ask an expert you trust. You can also use numbers not for an investigation, but to tell a complex story through infographics and interactive visualizations.
  • 19. Bibliographies, social networks of scientists, infrastructures Science is a human activity and an industry (almost) like any other. How are European funds invested in scientific research? Where are the centers specialized in the treatment of specific diseases? Why are some well-known monitoring technologies not used in some countries?
  • 20. Sensor-based journalism Cheap electronics and sensors + open hardware + free information sharing = data from stakeholders other than scientists It's early, but promising: Swiss Make Open Data Camps Japan Geigermap at-a-glance Citizen Science & Sensors
  • 21. If you have data, it's better if you know how to deal with them. If you think you may find some data, it's better if you use them. If someone uses data, it's better if you can check their claims. Playing with data is fun!
  • 22. Welcome to the jungle!
  • 23. Some examples Public administration International organizations NGOs Civic activists Press offices Leaks Social networks Journalistic sources Single journalists Ourselves...
  • 24. Data made public and reusable Data.gov (USA), Data.gov.uk (UK), Open Data Hub (Italy), OpenIR (Indonesia), ...
  • 25. Remember the buzzword era? Data from big science experiments (Atlas, Human Brain Project, ...) Social networks (Facebook, Twitter, but also eBay, Amazon, ...) Maybe it's not for journalists, but it's a hot topic... Google Earth Engine
  • 26. For machines, not for humans The keyword is ! A well-formed table represents a structured data set. A list of Facebook comments, the articles of a newspaper, a recorded speech are not structured data (and so are not machine-readable).
  • 27. It all depends on the format If we have Gladstone Gander as best friend: spreadsheets (xls, xlsx, ods, csv, tsv); less common but good formats (xml, sql, json, shp, kml, ...). If we are not so lucky: tables or lists in web pages (html); simple tables in well-made pdfs (pdf). If we have Murphy as worst enemy: scanned images, even if in a pdf wrapper (png, jpg, pdf); digital data behind complex search engines. And what if we have the best data ever, but under a closed license?
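As a rough illustration of why the "lucky" formats on this slide are easy to work with, the sketch below reads the same small table from CSV and from JSON using only the Python standard library. The sample rows, column names and figures are invented for the example and do not come from the talk.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = "id,city,population\n1,Rome,2870000\n2,Florence,380000\n"
JSON_TEXT = (
    '[{"id": 1, "city": "Rome", "population": 2870000},'
    ' {"id": 2, "city": "Florence", "population": 380000}]'
)


def rows_from_csv(text):
    """Parse CSV text into a list of dicts, converting numeric columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # CSV carries no type information: every field arrives as a string.
        rows.append({
            "id": int(record["id"]),
            "city": record["city"],
            "population": int(record["population"]),
        })
    return rows


def rows_from_json(text):
    """Parse JSON text; numbers and strings are already typed."""
    return json.loads(text)


csv_rows = rows_from_csv(CSV_TEXT)
json_rows = rows_from_json(JSON_TEXT)

# Once parsed, both formats yield the same structured records.
same_content = csv_rows == json_rows
print(same_content)
```

A scanned image or a table locked behind a search form offers no such shortcut, which is the point of the Murphy case.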
  • 28. Well-formed data sets Numbers are numbers, strings are strings and not numbers, datetimes must always have a single format (e.g. yyyy/mm/dd), localization is important, no gender values in a names column or similar mix-ups, every element should be identified by a Unique Identifier (ID). Data types a computer understands: integers (signed, zero included), floating-point numbers (signed), datetimes, characters and strings (case sensitive), the null value (the strange case of a value that states "I'm not a value"). And simple comparisons are strict equalities, for strings too!
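The rules on this slide can be tried out directly. The following minimal Python sketch shows strict equality between types, case-sensitive string comparison, the null value, and a check that a date follows the single yyyy/mm/dd format. The helper names `parse_date` and `is_well_formed` are hypothetical, chosen only for this example.

```python
from datetime import datetime

# Strings are not numbers: "10" and 10 are different values.
text_is_number = ("10" == 10)

# String comparison is exact: case and stray spaces matter.
same_name = ("Rome" == "rome")
same_after_cleaning = (" Rome".strip().lower() == "rome")

# A missing value is neither zero nor an empty string.
missing = None
missing_is_zero = (missing == 0)
missing_is_empty = (missing == "")


def parse_date(text):
    """Parse a date written in the single agreed format yyyy/mm/dd."""
    return datetime.strptime(text, "%Y/%m/%d").date()


def is_well_formed(text):
    """Return True only if the text follows the yyyy/mm/dd format."""
    try:
        parse_date(text)
        return True
    except ValueError:
        return False


parsed = parse_date("2014/06/10")
print(text_is_number, same_name, same_after_cleaning)
print(is_well_formed("2014/06/10"), is_well_formed("10/06/2014"))
```

Cleaning a column (trimming spaces, fixing case, converting types) before comparing values is what makes the strict equality work in your favour.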
  • 29. Aggregation, average, normalization, relative difference, distribution, ... A single rule: correlation does not imply causation! Spurious correlations: http://www.tylervigen.com/ Correlated: http://www.correlated.org/
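Three of the operations named on this slide fit in a few lines of plain Python: normalization (here per 100,000 inhabitants), relative difference, and the Pearson correlation coefficient. This is a sketch with made-up numbers; the function names and the two series are assumptions for illustration, not data from the talk.

```python
import math


def per_capita(count, population, per=100_000):
    """Normalize a raw count to a rate per `per` inhabitants."""
    return count / population * per


def relative_difference(new, old):
    """Change from `old` to `new`, as a fraction of `old`."""
    return (new - old) / old


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented series: both simply grow over time.
ice_cream_sales = [1, 2, 3, 4, 5]
sunburn_cases = [2, 4, 6, 8, 10]

rate = per_capita(50, 1_000_000)
change = relative_difference(120, 100)
r = pearson(ice_cream_sales, sunburn_cases)
print(rate, change, r)
```

The two invented series are perfectly correlated, yet neither causes the other: a shared driver (summer, in this toy case) is enough to produce the pattern.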
  • 31. With great power comes great responsibility The basic idea is quite simple: you have quantities expressed in numbers and geometric objects defined by dimensions (e.g. the radius of a circle), so you just have to decide how to connect your quantities to visual dimensions. There are several (un)common charts and endless combinations: scatter plots, lines, bars, areas, pies, donuts, bubble charts, treemaps, word clouds, alluvial diagrams, dendrograms, networks, streamgraphs, gauges, chord diagrams, motion charts, parallel coordinates, sankey diagrams, maps, choropleths, ... On d3js.org there is an endless gallery of examples!
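Connecting a quantity to a visual dimension is a small calculation, and the choice matters. The sketch below uses one common convention, which is an assumption here and not something the slide prescribes: in a bubble chart the circle's area, not its radius, is made proportional to the value, so the radius grows with the square root. The function names and the pixel sizes are hypothetical.

```python
import math


def bar_height(value, max_value, max_height=200.0):
    """Linear mapping: bar height is proportional to the value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_height * value / max_value


def bubble_radius(value, max_value, max_radius=40.0):
    """Square-root mapping: circle AREA is proportional to the value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_radius * math.sqrt(value / max_value)


# A value that is a quarter of the maximum...
height = bar_height(25, 100)      # ...gets a quarter of the bar height,
radius = bubble_radius(25, 100)   # ...but half of the maximum radius.

# Check: the area ratio equals the value ratio.
area_ratio = (radius / 40.0) ** 2
print(height, radius, area_ratio)
```

Scaling the radius linearly instead would make a value four times larger look sixteen times larger, which is the kind of distortion the slide's title warns about.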
  • 32. Building a simple dataset or a large and complex database focused on a topic of public interest leads to a valuable product: the database itself, intended as a collection of (linked) data plus metadata. Can a public frontend to such a database, designed for citizens, journalists and stakeholders, be considered a journalistic outcome? If journalism is a public good, it can be a service, not only a product...
  • 33. Scraping: "Copy & Paste" combo; Data Miner for Chrome browser; IMPORTXML() Google Spreadsheet function; Tabula for simple pdfs; Python (or other languages) scripts and libraries. Cleaning: filters and "Find & Replace" tools in spreadsheets; Open Refine. Analysis: pivot tables and simple charts in spreadsheets; dedicated software (e.g. open-source QtiPlot or QGIS). Viz: Datawrapper, RAW, Google Fusion Tables, Tableau, CartoDB, infogr.am, easel.ly, Timelinejs, Timemapper, StoryMap, d3js, ...
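To give a flavour of the "Python scripts and libraries" option for scraping, here is a minimal sketch that extracts a simple HTML table using only the standard library's `html.parser`. The `TableParser` class, the HTML snippet and its figures are invented for the example; real pages are usually messier and often call for dedicated libraries.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Basic cleaning: join fragments and trim stray whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment, invented for illustration only.
HTML = """<table>
<tr><th>Country</th><th>Notifications</th></tr>
<tr><td> Italy </td><td>12</td></tr>
<tr><td>Spain</td><td>7</td></tr>
</table>"""

parser = TableParser()
parser.feed(HTML)

# First row is the header; the rest become one record per row.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

The scraped values are still strings, so a cleaning step (type conversion, consistent names) normally follows before any analysis.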
  • 34. Tina Casagrand, "Data journalism for science journalists", The Open Notebook (2014); Paul Bradshaw, "Scraping for Journalists", Leanpub (2014); John Mair, Richard Lance Keeble, "Data Journalism", abramis (2014); Paul Bradshaw, "Data Journalism Heist"; Claire Miller, "Getting Started with Data Journalism", Leanpub (2013); Nathan Yau, "Data Points", Wiley (2013); Simon Rogers, "Facts are Sacred", Faber & Faber (2013); Jonathan Gray, "The Data Journalism Handbook", O'Reilly (2012); Nathan Yau, "Visualize This", Wiley (2011)
  • 35. Alessio "jenkin" Cimarelli: jenkin@dataninja.it, @jenkin27. Dataninja: www.dataninja.it, school.dataninja.it, dataninja.it/newsletter. Q&A: school.dataninja.it/qa. SWIM: sciencewritersinitaly.wordpress.com
  • 36. Hacking + Marathon = Hackathon ESPAD (European students and drugs): http://www.espad.org/en/ RASFF (EU food safety): http://ec.europa.eu/food/food/rapidalert/
  • 37. http://ec.europa.eu/food/food/rapidalert/ The Rapid Alert System for Food and Feed (RASFF) was put in place to provide food and feed control authorities with an effective tool to exchange information about measures taken responding to serious risks detected in relation to food or feed. This exchange of information helps Member States to act more rapidly and in a coordinated manner in response to a health threat caused by food or feed. dtnj.it/rasff2013
  • 38. http://www.espad.org/en/ This is the report from the fifth data-collection wave of the European School Survey Project on Alcohol and Other Drugs (ESPAD). It is based on data from more than 100,000 European students. Over the years about 500,000 European students have answered the ESPAD questionnaire. A total of 36 countries and regions have contributed data to the 2011 ESPAD Database. The list of drugs includes cigarettes, alcohol, cannabis, other illicit drugs, and tranquillisers and sedatives without prescription. dtnj.it/espad2011