Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Transcript

  • 1. Web Science & Technologies University of Koblenz ▪ Landau, Germany Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts Cristina Sarasua csarasua@uni-koblenz.de Computational Social Science workshop Köln, 16.12.2013
  • 2. Ideal workflow: (1) read publications, (2) access data, (3) reuse data. Example: Peter Schumacher (social scientist) would like to analyse the voting patterns of Germans in the last 20 years. From past observations to new analysis and new findings.
  • 3. Reality: Publications and research data (coming from surveys and studies) are published independently. The link between them is missing. Researchers cannot easily access the research data.
  • 4. Scenario: publications and research data (studies). We need a method to process publications and studies in order to (1) find references to studies inside publications, (2) identify which publication is connected to which study, and (3) identify the type of relation between publication and study.
  • 5. Problem: Computers cannot perform these three tasks automatically without errors (e.g. an incorrect link between a publication and a study). We need human intervention. Domain experts are often not available for this kind of task.
  • 6. Solution: Crowdsourcing. “The process of outsourcing a task to a (potentially) large and undefined group of people in an open call” (Jeff Howe, 2006). Microtask crowdsourcing: simple and independent tasks; paid crowdsourcing; online labor marketplaces (e.g. MTurk).
  • 7. Amazon Mechanical Turk
  • 8. Crowdsourced interlinking: the GESIS case study. [Diagram labels: Researcher; SSOAR (Web portal, publications); da|ra (Web portal, research data); InfoLink (links); CrowdLINK (corrected links).] Hybrid solution: (1) automatic processing of publications and studies; (2) ask crowd workers to review links, i.e. correct errors and identify primary literature / secondary literature; (3) generate Linked Data.
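
Step 3 of the hybrid solution (generating Linked Data from reviewed links) could look roughly like the sketch below. It is only an illustration: the `Link` class, the `to_ntriples` function, the `http://example.org/` URIs and the two predicate names are assumptions made for this example, not the vocabulary or code used in the GESIS case study.

```python
from dataclasses import dataclass

# Hypothetical vocabulary namespace, used only for illustration.
EX = "http://example.org/vocab/"


@dataclass(frozen=True)
class Link:
    """A crowd-reviewed link between a publication and a study (both URIs)."""
    publication: str
    study: str
    is_primary: bool  # True: primary literature, False: secondary literature


def to_ntriples(links):
    """Serialize reviewed links as N-Triples, one triple per line."""
    lines = []
    for link in links:
        name = "isPrimaryLiteratureOf" if link.is_primary else "isSecondaryLiteratureOf"
        lines.append(f"<{link.publication}> <{EX}{name}> <{link.study}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    reviewed = [
        Link("http://example.org/pub/1", "http://example.org/study/9", True),
        Link("http://example.org/pub/2", "http://example.org/study/9", False),
    ]
    print(to_ntriples(reviewed))
```

Plain string formatting keeps the sketch free of dependencies; a real pipeline would more likely use an RDF library and an established vocabulary.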
  • 9. How is this related to CSS?
  • 10. On the one hand … the GESIS case study, in collaboration with GESIS colleagues Katarina Boland, Daniel Hienert et al.
  • 11. On the other hand … How to manage such a group of people to maximize their efficiency and make them happy?
  • 13. Open call: We can impose some restrictions (e.g. language, country, reputation gained). Different background, different motivations, different behaviour (e.g. spam). [Charts: Ipeirotis, 2010; Ross et al., 2010; CrowdFlower, 11.12.2013]
  • 14. The tasks at hand: They are not the “most exciting tasks” in the world. The data is in German. The domain is very specific.
  • 15. First experiments of the GESIS case study. Adopted measures: used majority voting; included verification questions (e.g. “please type the date shown for the publication”); defined gold standard links to check who could be trusted. Highlights of findings: We managed to get trusted workers quite quickly (e.g. 490 links reviewed in ~24 hours) and were able to improve the precision of the automatic software without losing considerable recall. The cases which required background knowledge showed worse results. The task of “relating publication and study” was solved with much better recall than the task of deciding “whether or not a publication is primaryLiterature of a study”. The precision was very high, though.
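
The adopted measures (gold standard links to find trusted workers, majority voting over their answers) and the precision/recall evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the function names, the `(worker, link, answer)` tuple layout and the 0.7 accuracy threshold are all hypothetical.

```python
from collections import defaultdict


def trusted_workers(judgments, gold, min_accuracy=0.7):
    """Return workers whose accuracy on gold standard links meets the threshold.

    judgments: iterable of (worker, link, answer) with boolean answers.
    gold: dict mapping gold standard links to the known correct answer.
    min_accuracy is an assumed threshold, chosen only for illustration.
    """
    hits = defaultdict(int)
    seen = defaultdict(int)
    for worker, link, answer in judgments:
        if link in gold:
            seen[worker] += 1
            hits[worker] += int(answer == gold[link])
    return {w for w in seen if hits[w] / seen[w] >= min_accuracy}


def majority_vote(judgments, workers):
    """Accept a link only if strictly more than half of the given workers said yes."""
    yes = defaultdict(int)
    total = defaultdict(int)
    for worker, link, answer in judgments:
        if worker in workers:
            total[link] += 1
            yes[link] += int(answer)
    return {link: yes[link] * 2 > total[link] for link in total}


def precision_recall(predicted, truth):
    """Compute precision and recall of accepted links against known correct links."""
    tp = sum(1 for link, ok in predicted.items() if ok and truth.get(link))
    fp = sum(1 for link, ok in predicted.items() if ok and not truth.get(link))
    fn = sum(1 for link, ok in truth.items() if ok and not predicted.get(link))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    judgments = [
        ("w1", "g1", True), ("w1", "a", True), ("w1", "b", False),
        ("w2", "g1", True), ("w2", "a", True), ("w2", "b", False),
        ("w3", "g1", False), ("w3", "a", False), ("w3", "b", True),
    ]
    trusted = trusted_workers(judgments, gold={"g1": True})
    print(majority_vote(judgments, trusted))
```

Requiring a strict majority means that ties reject the link, which favours precision over recall; other tie-breaking rules are equally possible.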
  • 16. Ongoing research work: Can we improve their results by including mixed incentives? Not only money, but also competition at a microtask level (e.g. “there are only X links left, be quick!” or “there are three workers who were faster in reviewing links!”). How can we better instruct crowd workers in (1) the type of tasks we are running and (2) the domain we are working with?
  • 17. Take-home message: We can employ crowd workers for connecting scientific publications and studies in the social sciences. This can improve automatically generated links. How can we transfer the knowledge of domain experts to the crowd?
  • 18. Call for discussion. Who? (1) Psychologists, (2) social scientists, (3) computer scientists. Possible topics: any feedback about the aforementioned ideas; well-established methodologies in psychology to instruct or train a large group of people; any suggestions on how to analyse crowd workers (i.e. criteria).
  • 19. Thank you. Vielen Dank.