validating external data sets
what social scholars and data journalists can learn from each other

Hille van der Kaa
@Hillevanderkaa

missing data, no value stored	
“I need to solve this”
missing data, no value stored	
“I need to write a story about this”
forreporters.com/andrew-lehren/
“Trustworthiness and data
management are vital to the success of
qualitative studies … There is a lack of
scientific literature regarding the
structures and processes for managing
large qualitative data sets.”	
	
(White, Oelken, Friesen, 2012)	

	
	
  
“A simple answer to objective reporting
is the kind of reporting that uses relevant
and reliable sources which is not bias or
slanted to a certain party.”	
	
Ibrahim, Pawanteh, Kee (2011)
can I trust and use this dataset?
check the data source	
	
what are his/her/its intentions?
what is the citation index	
of the data owner?	
	
	
do other journalists	
cite the data owner?	
	
	
  
benefit	
	
do I really need this?	
	
	
	
do I really need to use it?	
	
	
  
check	
	
data gathering? 	
is this correct?	
	
	
clarification of the data?
do I understand?	
	
	
  
missing data	
	
what is wrong? 	
I need to solve	
	
	
what is the story?	
I need to write	
	
  
internal validation	
	
TEST!	
	
	
	
CALL!	
	
  
I need more sources! (do I?)	
	
give me data	
check consistency	
	
	
give me humans	
check my story	
	
  
scientists                      data journalists
check the source (citation)     check the source (citation)
check the data                  check the data
check benefit                   check benefit
check data gathering            check clarification
TEST!                           CALL!
more data sources               more human sources

scientist to journalist: “You twist everything”
“Dear data journalist,
	
Please take a look at the
research method yourself
and act a bit more like a
scientist.”
journalist to scientist: “Your articles are useless”
“Dear scientist,	
	
Try to avoid intellectual
arrogance. There are
other people who are just
as smart.”	
	
  
journalistic data mining	
The process of finding correlations or
patterns in large relational databases. 	
	
It is the process of analyzing data from
different perspectives and summarizing it
into useful and reliable information.	
	
  
Gross Time Ranking versus Net Time Ranking	
  

	
  

‘The net time is the time measured from the starting line
to the finish line, and the gross time is the time measured
from the starting shot until the finish line.

In photos of the starting line of marathons one can
see thousands of runners who are eager to start.
However, a runner standing in the last starting pen
cannot immediately run at full speed.

A kind of human traffic jam arises when the
marathon starts. On the internet people complain
about this difference in time results, because the
ranking is based on gross times.’
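The difference between the two rankings can be sketched in a few lines of Python. This is only an illustration of the definitions quoted above, not the method used in the study: the runner names, the times, and the `start_delay` field (seconds between the starting shot and crossing the starting line) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    name: str
    gross_time: int   # seconds from the starting shot to the finish line
    start_delay: int  # hypothetical: seconds from the shot until crossing the starting line

    @property
    def net_time(self) -> int:
        # Net time runs from starting line to finish line.
        return self.gross_time - self.start_delay


def rank_by(runners, key):
    """Return {name: position}, where position 1 is the fastest runner."""
    ordered = sorted(runners, key=key)
    return {r.name: pos for pos, r in enumerate(ordered, start=1)}


# Made-up runners: B starts in the last pen and loses ten minutes at the start.
runners = [
    Runner("A", gross_time=10900, start_delay=20),
    Runner("B", gross_time=11000, start_delay=600),
    Runner("C", gross_time=11100, start_delay=60),
]

gross_rank = rank_by(runners, key=lambda r: r.gross_time)
net_rank = rank_by(runners, key=lambda r: r.net_time)
```

In this made-up field, runner B finishes second on gross time but first on net time, which is the kind of shift the complaints refer to.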
missing values - solve	
	
	
‘We discovered that the data of 100
runners were lacking. Apparently one scraped
page had been added twice. We removed
the 100 duplicates.’
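A minimal sketch of this kind of duplicate removal, assuming each scraped row carries a unique starting number. The field name `bib` and the sample rows are hypothetical, and the original analysis may have used a different tool.

```python
def drop_duplicates(rows, key):
    """Keep the first occurrence of each key and preserve the original order."""
    seen = set()
    unique = []
    for row in rows:
        k = key(row)
        if k in seen:
            continue  # this row was already scraped once
        seen.add(k)
        unique.append(row)
    return unique


# Made-up scraped rows; the third row repeats the first.
rows = [
    {"bib": 101, "name": "runner one"},
    {"bib": 102, "name": "runner two"},
    {"bib": 101, "name": "runner one"},
]

cleaned = drop_duplicates(rows, key=lambda r: r["bib"])
removed = len(rows) - len(cleaned)
```

Comparing `removed` with the size of one scraped page is a quick way to confirm that a whole page was added twice rather than a few scattered rows.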
  
missing values - story	
	
	
‘Still, nineteen runners were missing from
the Amsterdam data set.
Perhaps these are runners who were
disqualified.’
	
Or…
‘To calculate the average position
change caused by net ranking, we
converted the difference scores to
absolute values.

The average position change in the
Amsterdam Marathon was 281.6
places.’
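The absolute values matter because both rankings use the same set of positions, so the signed differences cancel out and always sum to zero. A small sketch with four made-up runners (the names and ranks are hypothetical, not the Amsterdam data):

```python
def mean_absolute_position_change(gross_rank, net_rank):
    """Average of |gross position - net position| over all runners."""
    diffs = [abs(gross_rank[name] - net_rank[name]) for name in gross_rank]
    return sum(diffs) / len(diffs)


# Made-up rankings: A and B swap places, C and D keep theirs.
gross_rank = {"A": 1, "B": 2, "C": 3, "D": 4}
net_rank = {"A": 2, "B": 1, "C": 3, "D": 4}

signed_total = sum(gross_rank[n] - net_rank[n] for n in gross_rank)
avg_change = mean_absolute_position_change(gross_rank, net_rank)
```

Here `signed_total` is 0 while `avg_change` is 0.5 places, which is why the signed average would hide every shift.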
scientific outcome	
‘We calculated Kendall's tau rank
correlation coefficient for the net and
gross ranking of the Amsterdam
Marathon.
This coefficient shows that despite the
average differences between the
rankings, the net and gross time rankings
are almost equal to each other.’
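For readers unfamiliar with the coefficient, here is a minimal sketch of Kendall's tau for two rankings without ties (the tau-a variant). It assumes no tied positions and uses made-up ranks; the study may have used a statistics package and a tie-corrected variant.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1  # both rankings order this pair the same way
        elif sign < 0:
            discordant += 1  # the two rankings disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up positions of five runners; only the first two swap places.
gross = [1, 2, 3, 4, 5]
net = [2, 1, 3, 4, 5]
tau = kendall_tau(gross, net)
```

Tau measures the share of runner pairs whose order flips, so it can stay close to 1 in a large field even when individual runners move many places.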
journalistic outcome	
‘We spoke with Patrick Schuerman from Tilburg
on the phone. Patrick had starting number
11797 in the Amsterdam Marathon of 2013
and had a gross time versus net time
difference of over 21 minutes.

In his opinion, the ranking of the marathon
should be based on net times, since these
are the “real” times people ran.’
we are both right
we can learn from each other
 
	
  
	
Hille van der Kaa	
@Hillevanderkaa	
	
  

current topic:	
	
a citizen view on the
credibility of machine
written news	
	
  
http://tinyurl.com/research-uvt
	
Part of PhD research 	
Human Component in 	
Machine Written Narratives

Etmaal
