Crowdsourcing the Assembly of Concept Hierarchies

How to create a taxonomy with a paid workforce provided by Amazon Mechanical Turk, with an evaluative comparison to an existing community of motivated students and domain experts.

Presentation held at JCDL 2010, Brisbane, Australia (http://www.jcdl2010.org).

Published in: Technology, Education

Transcript

  • 1. Crowdsourcing the Assembly of Concept Hierarchies Kai Eckert¹ Cameron Buckner² Mathias Niepert¹ Colin Allen² Christof Niemann¹ Heiner Stuckenschmidt¹ ¹ University of Mannheim, Germany ² Indiana University, USA Presentation: Kai Eckert Wednesday, June 23, 2010 Joint Conference on Digital Libraries (JCDL), Brisbane, Australia, 2010
  • 2. Motivation ● Various types of Concept Hierarchies: ● Thesauri ● Taxonomies ● Classifications ● Ontologies ● ... ● Manual creation is expensive. ● Automatic creation lacks quality.
  • 3. Could the users do the work? ● Divide the work between a lot of users. ● Motivate them to be part of a community. ● Achieve quality control by means of redundancy. ● Can a concept hierarchy be created in the same way as, e.g., Wikipedia?
  • 4. ● The Indiana Philosophy Ontology Project (InPhO). ● A browsable taxonomy of philosophical ideas. ● Ideas are extracted from the Stanford Encyclopedia of Philosophy (SEP). ● Intuitive access to the SEP via the InPhO taxonomy. ● Entry point for other philosophical resources on the web.
  • 5. From the SEP to InPhO ● Start with a hand-built formal ontology describing major topics and sub-topics. ● Extraction of new ideas and relationships ● Gathering community feedback about ideas and relationships ● Process feedback and infer positions in the classification tree
  • 6. Gathering community feedback
  • 7. Gathering community feedback Relatedness
  • 8. Gathering community feedback ● Relatedness ● Relative Generality (e.g. "is more specific than")
  • 9. Great stuff, but... ● What if you do not have a motivated community of expert users? ● Well,... ● Like almost everything, you can buy it at Amazon... ● Amazon Mechanical Turk
  • 10. Amazon Mechanical Turk (AMT) ● Platform for placing and taking Human Intelligence Tasks (HITs). ● 100,000 – 400,000 HITs available. ● Number of workers: unknown (100,000 in 100 countries according to the New York Times, 2007).
  • 11. HIT Definition ● Time allotted per assignment: Maximum time a worker can work on a single task. ● Worker restrictions: Approval Rate, Location ● Reward per assignment: How much do you pay for each HIT? ● Number of assignments per HIT: How many unique workers do you want to work on each HIT?
  • 12. HIT Result ● Answer of each worker for each HIT ● Accept Time, Submit Time, Work Time In Seconds ● Worker ID
  • 13. Our questions ● Can we replace the InPhO community by means of Amazon Mechanical Turk? ● How much does it cost and what is the resulting quality?
  • 14. Experimental Setup ● We wanted some overlap within the experts:
      Minimum overlap i        1       2     3     4    5
      Number of pairs      3,237   1,154   370   187   92
    We chose the 1,154 pairs (minimum overlap 2). ● Each pair was evaluated by 5 different workers. ● Each worker evaluated at least 12 pairs (1 HIT). ● 87 distinct workers. ● The HITs were completed in 20 hours.
  • 15. Measuring Agreement ● Calculation of the distance between two answers: ● Relatedness: Absolute value of the difference ● Relative Generality: Match: 0, otherwise: 1 ● The evaluation deviation is the mean distance of a user to the users in a reference group.
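The agreement measure on slide 15 can be sketched in Python. This is a minimal illustration, not the authors' implementation: the data layout (one dict of answers per user, keyed by concept pair), the function names, and the choice to average over every shared (pair, reference user) comparison are all assumptions, since the slide only states the two distance functions and that the deviation is a mean distance.

```python
from statistics import mean


def relatedness_distance(a: int, b: int) -> int:
    """Relatedness: absolute value of the difference between two ratings."""
    return abs(a - b)


def generality_distance(a: str, b: str) -> int:
    """Relative generality: 0 if the two answers match, otherwise 1."""
    return 0 if a == b else 1


def evaluation_deviation(user_answers, reference_group, distance):
    """Mean distance of one user to the users in a reference group.

    user_answers:    {pair: answer} for the user being evaluated
    reference_group: {reference_user_id: {pair: answer}}
    Only pairs answered by both the user and a reference user are compared
    (an assumption); returns None if there is no overlap at all.
    """
    distances = []
    for pair, answer in user_answers.items():
        for ref_answers in reference_group.values():
            if pair in ref_answers:
                distances.append(distance(answer, ref_answers[pair]))
    return mean(distances) if distances else None


# Hypothetical example data, not taken from the study.
worker = {("Dualism", "Philosophy of Mind"): 3, ("Computer Ethics", "Ethics"): 1}
experts = {
    "expert_a": {("Dualism", "Philosophy of Mind"): 4},
    "expert_b": {("Dualism", "Philosophy of Mind"): 2, ("Computer Ethics", "Ethics"): 1},
}
deviation = evaluation_deviation(worker, experts, relatedness_distance)
```

Passing `generality_distance` instead of `relatedness_distance` gives the deviation for the relative generality question under the same assumptions.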
  • 16. Comparison with Experts (Relative Generality) [Chart: fraction of users in % for InPhO users and AMT users; x-axis from 0 (Follow Experts) to 1.0 (Own Opinion)]
  • 17. Comparison with Experts (Relative Generality) [Same chart, with the position of a "Random Clicker" marked]
  • 18. Comparison with Experts (Relative Generality) [Same chart]
  • 19. Comparison with Experts (Relative Generality) ● InPhO Users are quite consistent. [Same chart]
  • 20. Comparison with Experts (Relative Generality) ● InPhO Users are quite consistent. ● AMT Users are not consistent. → Are there good ones? [Same chart]
  • 21. Comparison with Experts (Relative Generality) ● InPhO Users are quite consistent. ● AMT Users are not consistent. → Are there good ones? Yes, there are! → But which ones? [Same chart]
  • 22. Comparison with Experts (Relative Generality) ● InPhO Users are quite consistent. ● AMT Users are not consistent. → Are there good ones? Yes, there are! → But which ones? [Same chart]
  • 23. Mixed Results... Can we just use the good ones?
  • 24. Telling the good from the bad ● First approach: Filtering by working time ● Hypothesis 1: Workers who think for some time before they answer give better answers. ● Hypothesis 2: There are probably workers who give quick random responses.
  • 25. Filtering by working time [Bar chart: number of users (# Users) by average working time for one HIT (12 pairs), for increasing minimum working times in seconds. Bar labels: 84, 75, 68, 57, 44, 36, 29, 22, 17, 13, 9, 9, 8, 7, 5, 4, 4, 3]
  • 26. Filtering by working time [Same bar chart with a second series, deviation from experts, plotted against the same minimum working times]
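The working-time filter from the first approach could look roughly like the following sketch. It assumes each HIT result is available as a dict with the hypothetical keys `worker_id` and `work_time_seconds` (mirroring the "Worker ID" and "Work Time In Seconds" fields of the HIT result), and it keeps workers whose average time per HIT exceeds a chosen threshold. The threshold values are not taken from the slides.

```python
from statistics import mean


def average_work_time(assignments):
    """Return {worker_id: mean working time in seconds over that worker's HITs}."""
    times_by_worker = {}
    for assignment in assignments:
        times_by_worker.setdefault(assignment["worker_id"], []).append(
            assignment["work_time_seconds"]
        )
    return {worker: mean(times) for worker, times in times_by_worker.items()}


def workers_slower_than(assignments, min_seconds):
    """Keep only workers whose average time per HIT exceeds min_seconds."""
    averages = average_work_time(assignments)
    return {worker for worker, avg in averages.items() if avg > min_seconds}


# Hypothetical HIT results, not data from the experiment.
results = [
    {"worker_id": "W1", "work_time_seconds": 30},
    {"worker_id": "W1", "work_time_seconds": 50},
    {"worker_id": "W2", "work_time_seconds": 200},
    {"worker_id": "W3", "work_time_seconds": 95},
]
kept = workers_slower_than(results, min_seconds=80)
```

Sweeping `min_seconds` over a range of thresholds and recomputing the deviation of the remaining workers would give the kind of curve the two charts describe.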
  • 27. Telling the good from the bad ● Second approach: Filtering by comparison with a hidden gold standard. ● Test pairs: ● Social Epistemology – Epistemology (P1) ● Computer Ethics – Ethics (P2) ● Chinese Room Argument – Chinese Philosophy (P3) ● Dualism - Philosophy of Mind (P4)
  • 28. Applying filters ● Test pairs: ● Social Epistemology – Epistemology (P1) ● Computer Ethics – Ethics (P2) ● Chinese Room Argument – Chinese Philosophy (P3) ● Dualism - Philosophy of Mind (P4) ● Filters: 1) P1 and P2 are correct (Common Sense) 2) Like 1), additionally P4 is correct (+Background) 3) Like 1), additionally P3 is correct (+Lexical) 4) All have to be correct (All)
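The four filters can be expressed as sets of test pairs that a worker must have answered correctly. The sketch below assumes we already know, per worker, which of the hidden test pairs P1 to P4 were answered correctly; the slides do not spell out what counts as the correct answer for each pair, so that step is left out. The names `FILTERS` and `passing_workers` are illustrative.

```python
# Test pairs each filter requires to be answered correctly (from slide 28).
FILTERS = {
    "Common Sense (1)": {"P1", "P2"},
    "+Background (2)": {"P1", "P2", "P4"},
    "+Lexical (3)": {"P1", "P2", "P3"},
    "All (4)": {"P1", "P2", "P3", "P4"},
}


def passing_workers(correct_by_worker, filter_name):
    """Return the workers whose correctly answered test pairs cover the filter.

    correct_by_worker: {worker_id: set of test pair ids answered correctly}
    """
    required = FILTERS[filter_name]
    return {
        worker
        for worker, correct in correct_by_worker.items()
        if required <= correct  # subset test: every required pair is correct
    }


# Hypothetical workers, not data from the experiment.
correct_by_worker = {
    "W1": {"P1", "P2"},
    "W2": {"P1", "P2", "P4"},
    "W3": {"P1", "P2", "P3", "P4"},
    "W4": {"P1"},
}
common_sense = passing_workers(correct_by_worker, "Common Sense (1)")
```

Because filters 2 to 4 all include P1 and P2, every worker who passes one of them also passes "Common Sense (1)", which matches the "Like 1), additionally ..." wording.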
  • 29. Filter results for relatedness
      Filter              Users   Deviation   Max. Dev.
      All (4)                 7        0.60        1.00
      +Lexical (3)           10        0.87        1.78
      +Background (2)        23        0.84        1.41
      Common Sense (1)       40        1.11        1.96
      All AMT                87        1.39        2.96
      All InPhO              25        0.77        1.75
      Random                ---        1.8          ---
  • 30. Filter results for relative generality
      Filter              Users     Deviation   Max. Dev.
      All (4)             7(5)           0.12        0.22
      +Lexical (3)        10(8)          0.14        0.27
      +Background (2)     23(20)         0.15        0.45
      Common Sense (1)    40(35)         0.21        0.59
      All AMT             87(78)         0.45        1.00
      All InPhO           25             0.23        0.47
      Random              ---            0.75         ---
  • 31. Financial considerations
      Filter              Pairs   Evaluations   Cost per Pair   Cost per Evaluation
      ---                 1,138         5,690   US$ 0.111       US$ 0.022
      Common Sense (1)    1,074         1,909   US$ 0.117       US$ 0.066
      +Background (2)     1,018         1,558   US$ 0.124       US$ 0.081
      +Lexical (3)          215           215   US$ 0.586       US$ 0.586
      All (4)               183           183   US$ 0.689       US$ 0.689
    ● Overall payments: 126 US$ ● Estimation for all pairs with filter "All (4)": 784 US$ ● Estimation for all pairs with redundancy (5x): 3,920 US$.
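The cost figures on slide 31 appear to follow from simple division of the overall payments by the number of pairs or evaluations that survive each filter. The sketch below reproduces that arithmetic; the interpretation is an assumption inferred from the numbers (it matches the table to three decimals), not something the slides state explicitly, and the variable names are illustrative.

```python
TOTAL_PAYMENT_USD = 126.0  # overall payments from slide 31
ALL_PAIRS = 1138           # pairs covered without any filter

# (pairs covered, evaluations kept) per filter, copied from slide 31.
COVERAGE = {
    "---": (1138, 5690),
    "Common Sense (1)": (1074, 1909),
    "+Background (2)": (1018, 1558),
    "+Lexical (3)": (215, 215),
    "All (4)": (183, 183),
}


def cost_per_pair(filter_name):
    """Assumed: total payment divided by the pairs still covered after filtering."""
    pairs, _ = COVERAGE[filter_name]
    return TOTAL_PAYMENT_USD / pairs


def cost_per_evaluation(filter_name):
    """Assumed: total payment divided by the evaluations kept after filtering."""
    _, evaluations = COVERAGE[filter_name]
    return TOTAL_PAYMENT_USD / evaluations


# Extrapolate the "All (4)" cost per pair to every pair in the experiment.
estimate_all_pairs = cost_per_pair("All (4)") * ALL_PAIRS

for name in COVERAGE:
    print(f"{name:18s} {cost_per_pair(name):.3f} {cost_per_evaluation(name):.3f}")
```

Under this reading, the 5x redundancy estimate on the slide is the single-evaluation estimate of 784 US$ multiplied by five.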
  • 32. Conclusion ● AMT answers are of varying quality. But this is true for many communities, too. ● With moderate filtering ("Background"), we achieved a quality comparable to the InPhO community. ● With 5 evaluations per pair, we still covered 89% of all pairs with this filter. ● The resulting InPhO taxonomy is online: http://inpho.cogs.indiana.edu/amt_taxonomy ● No need for existing data, gold standards or training data (besides the filter pairs). ● No need for a community?
  • 33. Thank you ● Questions? ● Kai Eckert, kai@informatik.uni-mannheim.de, http://www.slideshare.net/kaiec ● "Computer ethics doesn't exist. Blue is black and red is blood on the internet. Nobody cares, because they are lonely." (Anonymous Mechanical Turk Worker)
  • 34. Photo Credits ● Michal Zacharzewski (Title Crowd), http://www.sxc.hu/profile/mzacha ● Peter Suneson (Crowd silhouette), http://www.sxc.hu/profile/CMSeter ● Alaa Hamed (Egyptian Coins), http://www.sxc.hu/profile/alaasafei ● Piotr Lewandowski (Money), http://www.sxc.hu/profile/LeWy2005 ● Asif Akbar (Clock), http://www.sxc.hu/profile/asifthebes ● Zern Liew (Traffic Cone), http://www.sxc.hu/profile/eidesign ● Peter Gustafson (Counting Fingers), http://www.sxc.hu/profile/liaj ● Kostya Kisleyko (Yes No), http://www.sxc.hu/profile/dlnny ● Sergio Roberto Bichara (Barcode), http://www.sxc.hu/profile/srbichara ● Maggie Molloy (Icons), http://www.sxc.hu/profile/agthabrown ● Sanja Gjenero (World with Crowd), http://www.sxc.hu/profile/lusi ● Wikimedia Commons (The Turk), http://en.wikipedia.org/wiki/File:Kempelen_chess1.jpg
