  • 1. validating external data sets: what social scholars and data journalists can learn from one another
  • 2.     Hille van der Kaa @Hillevanderkaa  
  • 3. missing data, no value stored “I need to solve this”
  • 4. missing data, no value stored “I need to solve this” missing data, no value stored “I need to write a story about this”
  • 5. forreporters.com/andrew-lehren/
  • 6. “Trustworthiness and data management are vital to the success of qualitative studies … There is a lack of scientific literature regarding the structures and processes for managing large qualitative data sets.” (White, Oelken, Friesen, 2012)  
  • 7. “A simple answer to objective reporting is the kind of reporting that uses relevant and reliable sources which is not bias or slanted to a certain party.” (Ibrahim, Pawanteh, Kee, 2011)
  • 8. can I trust and use this dataset?
  • 9. check the data source what are his/her/its intentions?
  • 10. what is the citation index of the data owner? do other journalists cite the data owner?  
  • 11. benefit do I really need this? do I really need to use it?  
  • 12. check data gathering? is this correct? clarification of the data? do I understand?  
  • 13. missing data: what is wrong? I need to solve / what is the story? I need to write
  • 14. internal validation TEST! CALL!  
  • 15. I need more sources! (do I?) give me data: check consistency / give me humans: check my story
  • 16. scientists: check the source (citation), check the data, check benefit, check data gathering, TEST!, more data sources / data journalists: check the source (citation), check the data, check benefit, check clarification, CALL!, more human sources
  • 17. scientist to journalist: “You twist everything”
  • 18. “Dear data journalist, Please take a look at the research method yourself and act a bit more like a scientist.”
  • 19. journalist to scientist: “Your articles are useless”
  • 20. “Dear scientist, Try to avoid intellectual arrogance. There are other people who are just as smart.”  
  • 21. journalistic data mining The process of finding correlations or patterns in large relational databases. It is the process of analyzing data from different perspectives and summarizing it into useful and reliable information.  
  • 22. Gross Time Ranking versus Net Time Ranking ‘The net time is the measured time from starting line to finish line and the gross time is the measured time from the starting shot until the finish line. In photos of the starting line of marathons one can see thousands of runners who are eager to start. However, when one stands in the last starting pen, one cannot directly run at full speed. A kind of human traffic jam arises when the marathon starts. On the internet people complain about this difference in time results, because the ranking is based on gross times.’
  • 23. missing values - solve ‘We discovered that the data of 100 runners lacked. Apparently one scraped page was added twice. We removed the 100 duplicates.’
  • 24. missing values - story ‘Still, nineteen runners were missing from the Amsterdam data set. Perhaps these are runners that have been disqualified.’ Or…
  • 25. ‘To calculate the average position changes, caused by net ranking, we converted the difference scores to absolute figures. The average position change in the Amsterdam Marathon was 281.6 places.’
  • 26. scientific outcome ‘We calculated the Kendall's tau rank correlation coefficient for the net and gross ranking of the Amsterdam Marathon. This coefficient shows that despite the average differences between the rankings, the net and gross time rankings are almost equal to each other.’
  • 27. journalistic outcome ‘We spoke with Patrick Schuerman from Tilburg on the phone. Patrick had starting number 11797 in the Amsterdam Marathon of 2013 and had a gross time versus net time difference of over 21 minutes. In his opinion, the ranking of the marathon should be based on net times since these are the ‘real’ times people ran.’
  • 28. we are both right
  • 29. we can learn from each other
  • 30. Hille van der Kaa @Hillevanderkaa current topic: a citizen view on the credibility of machine written news http://tinyurl.com/research-uvt Part of PhD research Human Component in Machine Written Narratives
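
The checks described in slides 23, 25 and 26 (removing duplicate scraped rows, averaging the absolute position changes between gross and net ranking, and computing a Kendall's tau rank correlation) could be sketched roughly as follows. This is a minimal illustration, not the authors' actual procedure: the records, bib numbers and times are made up, the function names are hypothetical, and the tau shown is the simple tau-a variant that assumes no tied times.

```python
from itertools import combinations

# Hypothetical records: (bib number, gross time in seconds, net time in seconds).
# The last row repeats bib 102, as if one scraped page had been added twice.
RESULTS = [
    (101, 10800, 10790),
    (102, 10900, 10700),
    (103, 11000, 10950),
    (104, 11100, 10650),
    (102, 10900, 10700),
]


def drop_duplicates(rows):
    """Keep only the first occurrence of each bib number."""
    seen = set()
    unique = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            unique.append(row)
    return unique


def rank_by(rows, column):
    """Map bib number to 1-based position when sorted by the given column."""
    ordered = sorted(rows, key=lambda row: row[column])
    return {row[0]: position for position, row in enumerate(ordered, start=1)}


def mean_abs_position_change(gross_rank, net_rank):
    """Average of |gross position - net position| over all runners."""
    changes = [abs(gross_rank[bib] - net_rank[bib]) for bib in gross_rank]
    return sum(changes) / len(changes)


def kendall_tau(gross_rank, net_rank):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes there are no tied positions in either ranking.
    """
    concordant = 0
    discordant = 0
    for a, b in combinations(gross_rank, 2):
        product = (gross_rank[a] - gross_rank[b]) * (net_rank[a] - net_rank[b])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(gross_rank) * (len(gross_rank) - 1) / 2
    return (concordant - discordant) / pairs


runners = drop_duplicates(RESULTS)
gross = rank_by(runners, column=1)
net = rank_by(runners, column=2)

print("runners after de-duplication:", len(runners))
print("mean absolute position change:", mean_abs_position_change(gross, net))
print("Kendall's tau (gross vs net):", round(kendall_tau(gross, net), 3))
```

A tau close to 1 means the two rankings order runners almost identically even if individual positions shift; on a real data set with tied times, a tie-aware variant such as tau-b would be the safer choice.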