Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts


  1. Web Science & Technologies (WeST), University of Koblenz ▪ Landau, Germany. Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts. Cristina Sarasua, csarasua@uni-koblenz.de. Computational Social Science workshop, Köln, 16.12.2013
  2. Ideal workflow: 1) Read publications, 2) Access data, 3) Reuse data
     • Peter Schumacher (social scientist) would like to analyse the voting patterns of Germans in the last 20 years
     • Past observations
     • New analysis, new findings
  3. Reality
     • Publications and research data (coming from surveys and studies) are published independently
     • The link between them is missing
     • Researchers cannot easily access the research data
  4. Scenario (diagram labels: publications, research data (studies))
     • We need a method to process publications and studies in order to be able to:
       1) Find references to studies inside publications
       2) Identify which publication is connected to which study
       3) Identify the type of relation between publication and study
  5. Problem
     • Computers cannot perform these three tasks perfectly on their own (figure: incorrect link between a publication and a study)
     • We need human intervention
     • Domain experts are often not available for this kind of task
  6. Solution: Crowdsourcing
     “The process of outsourcing a task to a (potentially) large and undefined group of people in an open call” (Jeff Howe, 2006)
     Microtask crowdsourcing:
     - Simple and independent tasks
     - Paid crowdsourcing
     - Online labor marketplaces (e.g. MTurk)
  7. Amazon Mechanical Turk
  8. Crowdsourced interlinking: the GESIS case study
     Diagram labels: Researcher, SSOAR, Web portal, Publications, da|ra, InfoLink links, CrowdLINK corrected links, Web portal, Research data
     Hybrid solution:
       1) Automatic processing of publications and studies
       2) Ask crowd workers to review links
          - Correct errors
          - Identify primary literature / secondary literature
       3) Generate Linked Data
  9. How is this related to CSS (Computational Social Science)?
  10. On the one hand … The GESIS case study, in collaboration with GESIS colleagues Katarina Boland, Daniel Hienert et al.
  11. On the other hand … How can we manage such a group of people to maximize their efficiency and keep them happy?
  12. (no transcribed text)
  13. Open call
      • We can impose some restrictions (e.g. language, country, reputation gained)
      • Different background
      • Different motivations
      • Different behaviour
      • Spam
      Chart sources: Ipeirotis, 2010; Ross et al., 2010; CrowdFlower, 11.12.2013
  14. The tasks at hand
      • They are not the “most exciting tasks” in the world
      • The data is in German
      • The domain is very specific
  15. First experiments of the GESIS case study
      Adopted measures:
      • Used majority voting
      • Included verification questions (e.g. “please type the date shown for the publication”)
      • Defined gold standard links to check who could be trusted
      Highlights of findings:
      • We found trusted workers quite quickly (e.g. 490 links reviewed in ~24 hours) and were able to improve the precision of the automatic software without losing considerable recall
      • The cases which required background knowledge showed worse results
      • The task of “relating publication and study” was solved with much better recall than the task of deciding “whether or not a publication is primaryLiterature of a study”. The precision was very high, though.
  16. Ongoing research work
      • Can we improve their results by including mixed incentives? Not only money, but also competition at a microtask level (e.g. “there are only X links left, be quick!” or “there are three workers who were faster in reviewing links!”)
      • How can we better instruct crowd workers in 1) the type of tasks we are running and 2) the domain we are working with?
  17. Take-home message
      • We can employ crowd workers to connect scientific publications and studies in the social sciences. Doing so can improve automatically generated links.
      • How can we transfer the knowledge of domain experts to the crowd?
  18. Call for discussion
      • Who?
        1) Psychologists
        2) Social scientists
        3) Computer scientists
      • Possible topics
        - Any feedback about the aforementioned ideas
        - Well-established methodologies in psychology to instruct or train a large group of people
        - Any suggestion on how to analyse crowd workers (i.e. criteria)
  19. Thank you. Vielen Dank.
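
The quality-control measures listed on slide 15 (gold standard links to decide who can be trusted, then majority voting over the remaining answers) could be sketched roughly as follows. This is an illustrative sketch only, not the code used in the GESIS case study: the function names, the 0.7 accuracy threshold, the tie handling, and the toy judgments are all assumptions, and the numbers it produces are not results from the talk.

```python
from collections import Counter, defaultdict


def trusted_workers(judgments, gold, min_accuracy=0.7):
    """Return workers whose accuracy on gold standard links meets the threshold.

    judgments: iterable of (worker_id, link_id, answer) tuples.
    gold: dict mapping link_id -> known correct answer.
    min_accuracy: hypothetical cut-off, chosen only for this sketch.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, link, answer in judgments:
        if link in gold:
            seen[worker] += 1
            correct[worker] += int(answer == gold[link])
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def majority_vote(judgments, trusted):
    """Aggregate the answers of trusted workers per link; ties stay undecided."""
    votes = defaultdict(Counter)
    for worker, link, answer in judgments:
        if worker in trusted:
            votes[link][answer] += 1
    decided = {}
    for link, counter in votes.items():
        ranked = counter.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            decided[link] = ranked[0][0]
    return decided


def precision_recall(predicted, truth):
    """Precision and recall of the links accepted as correct."""
    accepted = {link for link, ok in predicted.items() if ok}
    relevant = {link for link, ok in truth.items() if ok}
    true_positives = len(accepted & relevant)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (invented): True means "this publication-study link is correct".
judgments = [
    ("w1", "gold-1", True), ("w2", "gold-1", True), ("w3", "gold-1", False),
    ("w1", "link-a", True), ("w2", "link-a", True), ("w3", "link-a", False),
    ("w1", "link-b", False), ("w2", "link-b", False), ("w3", "link-b", True),
]
gold = {"gold-1": True}
truth = {"gold-1": True, "link-a": True, "link-b": True}

trusted = trusted_workers(judgments, gold)
decided = majority_vote(judgments, trusted)
precision, recall = precision_recall(decided, truth)
print(sorted(trusted), decided, precision, round(recall, 2))
```

Filtering by gold standard accuracy before voting keeps a single unreliable worker from tipping a close vote; leaving ties undecided is one possible choice, and such links could instead be sent to additional workers.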
