Workshop on Data Journalism
February 17, 2014
Ghent

How to get the
data and how to
process them?

Lorenzo Pellizzari
About me …

Get the data

Receive it

Advanced search
techniques

How to get
the data?
FOI laws

Scrape it

1

Receive it

Analyzing the War Logs (Associated Press)
2

Advanced search techniques:
Google
79.300.000 results

5 results

2

Advanced search techniques:
SPARQL

http://dbpedia.org/sparql
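
As a minimal sketch of how the DBpedia endpoint above could be queried from a script: the query below is a hypothetical example (it is not taken from the slides), and the helper names `build_request_url` and `rows_from_results` are made up for illustration. It assumes the endpoint accepts the standard `query` parameter plus a `format` parameter for JSON output, and that results come back in the standard SPARQL JSON results layout.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Hypothetical example query (an assumption, not from the slides):
# the five most populous places DBpedia links to Belgium.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city dbo:country dbr:Belgium ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 5
"""


def build_request_url(endpoint, query):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def rows_from_results(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in data["results"]["bindings"]
    ]
```

To run it against the live endpoint, the URL from `build_request_url(ENDPOINT, QUERY)` can be fetched with `urllib.request.urlopen` and the response body passed to `rows_from_results`. What the query returns depends on the current state of DBpedia.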

2

Advanced search techniques:
SPARQL

http://latemar.science.unitn.it/spacetime/spacetime.html
3

Freedom of Information laws

4

Scrape your data

“Web scraping (web harvesting or web data extraction) is a computer software
technique of extracting ...
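
A minimal sketch of what scraping looks like in practice, using only the Python standard library: it pulls the rows of an HTML table into lists of strings. The class `TableScraper`, the function `scrape_table`, and the sample HTML are all made up for illustration; a real page would first be downloaded and would usually need more careful handling.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page standing in for a downloaded website.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Event</th></tr>
  <tr><td>2014-01-10</td><td>Example event A</td></tr>
  <tr><td>2014-02-03</td><td>Example event B</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
```

The resulting `rows` list can be written out with the `csv` module and opened in a spreadsheet for cleaning and analysis.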
Process the data
What Analytics, Data mining, Big Data
software you used in the past 12 months for a
real project (not jus...
The software for data analysis
Share of R- or SAS-related posts to Stack
Overflow by week.

http://r4stats.com/articles/po...
Example: ABC News
Interactive map of gas wells and leases in Australia

Scraping: Main data coming from
governmental web...
Example: ABC News
• A web developer and designer
• A lead journalist
• A part time researcher with expertise in data ...

DataJournalism: How To get data and process them?

Workshop on datajournalism given at the DataDays organised by the Open Knowledge Foundation on the 17th of February 2014.

Published in: Technology

DataJournalism: How To get data and process them?

  1. Workshop on Data Journalism, February 17, 2014, Ghent. How to get the data and how to process them? Lorenzo Pellizzari
  2. About me …
  3. Get the data. How to get the data? Receive it; advanced search techniques; FOI laws; scrape it
  4. 1 Receive it: Analyzing the War Logs (Associated Press)
  5. 2 Advanced search techniques: Google. 79.300.000 results; 5 results
  6. 2 Advanced search techniques: SPARQL. http://dbpedia.org/sparql
  7. 2 Advanced search techniques: SPARQL
  8. 2 Advanced search techniques: SPARQL. http://latemar.science.unitn.it/spacetime/spacetime.html
  9. 3 Freedom of Information laws
  10. 3 Freedom of Information laws
  11. 4 Scrape your data: “Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites.” (Wikipedia) http://www-news.iaea.org/
  12. 4 Scrape your data
  13. 4 Scrape your data
  14. Process the data. What Analytics, Data mining, Big Data software you used in the past 12 months for a real project (not just evaluation) [798 voters] http://www.kdnuggets.com/
  15. The software for data analysis. Share of R- or SAS-related posts to Stack Overflow by week. http://r4stats.com/articles/popularity/
  16. The software for data analysis
  17. Example: ABC News. Interactive map of gas wells and leases in Australia. Scraping: main data coming from governmental websites. FOI: data on chemical releases. Variety of reports: data on salt and water. http://datajournalismhandbook.org/
  18. Example: ABC News. http://datajournalismhandbook.org/
      • A web developer and designer
      • A lead journalist
      • A part time researcher with expertise in data extraction, Excel spreadsheets and data cleaning
      • A part time junior journalist
      • A consultant executive producer
      • An academic consultant with expertise in data mining, graphic visualization and advanced research skills
      • The services of a project manager and the administrative assistance of the ABC’s multi-platform unit
      • Importantly we also had a reference group of journalists and others whom we consulted on a needs basis