Crowdsourcing the Assembly of Concept Hierarchies
 Kai Eckert¹                           Cameron Buckner²
 Mathias Niepert¹                      Colin Allen²
 Christof Niemann¹                     Heiner Stuckenschmidt¹

¹ University of Mannheim, Germany
² Indiana University, USA

 Presentation: Kai Eckert
 Wednesday, June 23, 2010


  Joint Conference on Digital Libraries (JCDL), Brisbane, Australia, 2010
Motivation
●   Various types of Concept Hierarchies:

    ●   Thesauri
    ●   Taxonomies
    ●   Classifications
    ●   Ontologies
    ●   ...
●   Manual creation is expensive.

●   Automatic creation lacks quality.
Could the users do the work?
●   Divide the work among many users.

●   Motivate them to be part of a community.

●   Achieve quality control by means of redundancy.

●   Can a concept hierarchy be
    created the way Wikipedia is?
●   The Indiana Philosophy Ontology Project.

●   A browsable taxonomy of philosophical ideas.

●   Ideas are extracted from the Stanford Encyclopedia of
    Philosophy (SEP).

●   Intuitive access to the SEP via the InPhO taxonomy.

●   Entry point for other philosophical resources on the web.
From the SEP to InPhO
1) Start with a hand-built formal ontology describing major topics and sub-topics.
2) Extraction of new ideas and relationships.
3) Gathering community feedback about ideas and relationships.
4) Process feedback and infer positions in the classification tree.
Gathering community feedback
●   Relatedness
●   Relative Generality (e.g. "is more specific than")
Great stuff, but...
●   What if you do not have a motivated community of expert
    users?

●   Well,...

●   Like almost everything,
    you can buy it
    at Amazon...

●   Amazon Mechanical Turk
Amazon Mechanical Turk (AMT)

●   Platform for posting and working on
    Human Intelligence Tasks (HITs).
●   100,000 – 400,000 HITs available.
●   Number of workers: unknown (100,000 in 100 countries
    in 2007, according to the New York Times).
HIT Definition
Time allotted per assignment: Maximum time
a worker can work on a single task.


Worker restrictions: Approval Rate, Location



Reward per assignment: How much do you pay for
each HIT?


Number of assignments per HIT: How many unique
workers do you want to work on each HIT?
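
The four settings above can be pictured as one small record. The following is a minimal sketch in Python; the class name, field names and example values are hypothetical and chosen for illustration only. It is not an AMT API type, and the values are not the ones used in the experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitDefinition:
    """Hypothetical container for the settings on this slide (not an AMT API type)."""
    time_allotted_s: int        # maximum time a worker may spend on one assignment
    min_approval_rate: float    # worker restriction: required approval rate in percent
    allowed_locations: tuple    # worker restriction: allowed worker locations
    reward_cents: int           # reward per assignment, in US cents
    assignments_per_hit: int    # number of unique workers per HIT

    def max_reward_cents(self) -> int:
        """Upper bound on worker payments for one HIT (platform fees not included)."""
        return self.reward_cents * self.assignments_per_hit


# Example values are assumptions, not taken from the slides.
example_hit = HitDefinition(
    time_allotted_s=600,
    min_approval_rate=95.0,
    allowed_locations=("US",),
    reward_cents=10,
    assignments_per_hit=5,
)
print(example_hit.max_reward_cents())
```

Keeping the reward in integer cents avoids floating-point rounding when rewards are multiplied by the number of assignments.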
HIT Result

Answer of each worker for each HIT



Accept Time, Submit Time, Work Time In
Seconds



Worker ID
Our questions
Can we replace the InPhO community by means
of Amazon Mechanical Turk?



How much does it cost and what is the resulting
quality?
Experimental Setup
●   We wanted some overlap among the experts:
         Minimum overlap i     1       2      3     4     5
         Number of pairs     3,237   1,154   370   187   92

    We chose the 1,154 pairs with a minimum overlap of 2.

●   Each pair was evaluated by 5 different workers.

●   Each worker evaluated at least 12 pairs (1 HIT).

●   87 distinct workers.

●   The HITs were completed in 20 hours.
Measuring Agreement
●   Calculation of the distance between two answers:

    ●   Relatedness: Absolute value of the difference
    ●   Relative Generality: Match: 0, otherwise: 1
●   The evaluation deviation is the mean distance of a user
    to the users in a reference group.
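
The two distance functions and the evaluation deviation can be sketched in Python as follows. This is one possible reading, not the authors' implementation. It assumes that answers are stored per user as a mapping from a concept pair to an answer, that relatedness is a numeric rating, that relative generality is a categorical label, and that the mean is taken over all answers shared with any member of the reference group. The ratings in the example are made up.

```python
from statistics import mean


def relatedness_distance(a, b):
    """Relatedness: absolute value of the difference of two ratings."""
    return abs(a - b)


def generality_distance(a, b):
    """Relative generality: 0 if the two answers match, otherwise 1."""
    return 0 if a == b else 1


def evaluation_deviation(user, reference_group, distance):
    """Mean distance of one user's answers to the reference users' answers.

    `user` and every member of `reference_group` map a concept pair to an
    answer. Only pairs answered by both sides are compared. Returns None
    when there is no overlap at all.
    """
    distances = [
        distance(answer, reference[pair])
        for reference in reference_group
        for pair, answer in user.items()
        if pair in reference
    ]
    return mean(distances) if distances else None


# Illustrative ratings only; the rating scale is an assumption.
worker = {("Computer Ethics", "Ethics"): 4, ("Dualism", "Philosophy of Mind"): 3}
experts = [
    {("Computer Ethics", "Ethics"): 4, ("Dualism", "Philosophy of Mind"): 4},
    {("Computer Ethics", "Ethics"): 3},
]
deviation = evaluation_deviation(worker, experts, relatedness_distance)
print(deviation)
```

Passing the distance function as a parameter lets the same deviation routine serve both question types.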
Comparison with Experts (Relative Generality)
[Figure: fraction of users in % (0 to 30) over the deviation from the experts (0.1 to 1.0, from "Follow Experts" to "Own Opinion") for InPhO users and AMT users, with a "Random Clicker" marker between 0.7 and 0.8.]
●   InPhO users are quite consistent.
●   AMT users are not consistent.
    → Are there good ones? Yes, there are!
    → But which ones?
Mixed Results...

Can we just use the good ones?
Telling the good from the bad

●   First approach: Filtering by working time

●   Hypothesis 1: Workers who take some time to think before
    they answer give better answers.

●   Hypothesis 2: Some workers probably give quick,
    random responses.
Filtering by working time
[Figure: number of users (right axis, 0 to 100) and deviation from experts (left axis, 0 to 1.5) by minimum average working time for one HIT (12 pairs). Number of users per threshold, from lowest to highest: 84, 75, 68, 57, 44, 36, 29, 22, 17, 13, 9, 9, 8, 7, 5, 4, 4, 3.]
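
A filter of this kind can be sketched in a few lines of Python. The sketch assumes the "Work Time In Seconds" values from the HIT results have been grouped per Worker ID. The function name, the data layout and the example numbers are hypothetical.

```python
from statistics import mean


def filter_by_working_time(work_times, min_avg_seconds):
    """Return the workers whose average working time per HIT exceeds the threshold.

    `work_times` maps a worker ID to the list of working times (in seconds)
    of that worker's HITs. Workers without any recorded HIT are dropped.
    """
    return {
        worker_id
        for worker_id, times in work_times.items()
        if times and mean(times) > min_avg_seconds
    }


# Made-up working times, for illustration only.
example_times = {"w1": [30, 50], "w2": [200, 260], "w3": [90]}
kept = filter_by_working_time(example_times, 80)
print(sorted(kept))
```

Raising the threshold shrinks the set of remaining workers, which is the trade-off the user counts in the figure illustrate.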
Telling the good from the bad

●   Second approach: Filtering by comparison with a hidden
    gold standard.

●   Test pairs:

    ●   Social Epistemology – Epistemology (P1)
    ●   Computer Ethics – Ethics (P2)
    ●   Chinese Room Argument – Chinese Philosophy (P3)
    ●   Dualism – Philosophy of Mind (P4)
Applying filters
●   Filters:
    1) P1 and P2 are correct (Common Sense)
    2) Like 1), additionally P4 is correct (+Background)
    3) Like 1), additionally P3 is correct (+Lexical)
    4) All have to be correct (All)
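
The four filters reduce to subset checks on the test pairs a worker answered correctly. Below is a minimal Python sketch; what counts as a "correct" answer for each test pair is not specified on the slide and is assumed to have been decided beforehand, and the function name is hypothetical.

```python
# Test pairs required by each filter, as listed on the slide.
FILTERS = {
    "Common Sense (1)": {"P1", "P2"},
    "+Background (2)": {"P1", "P2", "P4"},
    "+Lexical (3)": {"P1", "P2", "P3"},
    "All (4)": {"P1", "P2", "P3", "P4"},
}


def passed_filters(correct_pairs):
    """Return the names of all filters a worker passes.

    `correct_pairs` is the set of test pairs (P1 to P4) that the worker
    answered correctly; deciding correctness is outside this sketch.
    """
    correct = set(correct_pairs)
    return [name for name, required in FILTERS.items() if required <= correct]


# A worker who got P1, P2 and P4 right, but not P3.
example_passed = passed_filters({"P1", "P2", "P4"})
print(example_passed)
```

Because every filter includes P1 and P2, a worker who misses either of them passes no filter at all.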
Filter results for relatedness

Filter             Users    Deviation   Max. Dev.
All (4)                7         0.60        1.00
+Lexical (3)         10          0.87        1.78
+Background (2)      23          0.84        1.41
Common Sense (1)     40          1.11        1.96
All AMT              87          1.39        2.96
All InPhO            25          0.77        1.75
Random                ---         1.8          ---
Filter results for relative generality

Filter             Users    Deviation   Max. Dev.
All (4)              7(5)        0.12        0.22
+Lexical (3)        10(8)        0.14        0.27
+Background (2)    23(20)        0.15        0.45
Common Sense (1)   40(35)        0.21        0.59
All AMT            87(78)        0.45        1.00
All InPhO             25         0.23        0.47
Random                ---        0.75          ---
Financial considerations
Filter              Pairs   Evaluations   Cost per Pair   Cost per Evaluation
None                1,138         5,690       US$ 0.111             US$ 0.022
Common Sense (1)    1,074         1,909       US$ 0.117             US$ 0.066
+Background (2)     1,018         1,558       US$ 0.124             US$ 0.081
+Lexical (3)          215           215       US$ 0.586             US$ 0.586
All (4)               183           183       US$ 0.689             US$ 0.689



●     Overall payments: US$ 126

●     Estimate for all pairs with filter "All (4)": US$ 784

●     Estimate for all pairs with redundancy (5x): US$ 3,920
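
The figures on this slide are consistent with dividing the overall payments by the number of pairs or evaluations that survive each filter. The Python sketch below reproduces them under that assumption. The extrapolation step (cost per pair under "All (4)" times the 1,138 evaluated pairs, then times 5 for redundancy) is one reading that matches the slide, not a documented formula.

```python
TOTAL_PAYMENTS_USD = 126

# (pairs, evaluations) remaining after each filter, taken from the table.
REMAINING = {
    "No filter": (1138, 5690),
    "Common Sense (1)": (1074, 1909),
    "+Background (2)": (1018, 1558),
    "+Lexical (3)": (215, 215),
    "All (4)": (183, 183),
}


def unit_costs(pairs, evaluations, total=TOTAL_PAYMENTS_USD):
    """Cost per remaining pair and per remaining evaluation, rounded to 3 digits."""
    return round(total / pairs, 3), round(total / evaluations, 3)


for name, (pairs, evaluations) in REMAINING.items():
    per_pair, per_evaluation = unit_costs(pairs, evaluations)
    print(f"{name:18s} US$ {per_pair:.3f}  US$ {per_evaluation:.3f}")

# Assumed extrapolation: every one of the 1,138 pairs gets one evaluation
# that passes "All (4)", at the observed cost per pair for that filter.
all_pairs = REMAINING["No filter"][0]
estimate_single = round(unit_costs(*REMAINING["All (4)"])[0] * all_pairs)
estimate_redundant = estimate_single * 5
print(estimate_single, estimate_redundant)
```

Under the same assumption, the "+Background (2)" filter keeps 1,018 of 1,138 pairs, which is about 89%.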
Conclusion
●   AMT answers are of varying quality, but this is true of
    many communities, too.
●   With moderate filtering ("+Background"), we achieved
    a quality comparable to that of the InPhO community.
●   With 5 evaluations per pair, we still covered 89% of
    all pairs with this filter.
●   The resulting InPhO taxonomy is online:
    http://inpho.cogs.indiana.edu/amt_taxonomy
●   No need for existing data, gold standards or training
    data (besides the filter pairs).
●   No need for a community?
Thank you

                 Questions?

                Kai Eckert
     kai@informatik.uni-mannheim.de
      http://www.slideshare.net/kaiec


"Computer ethics doesn't exist. Blue is
black and red is blood on the internet.
Nobody cares, because they are lonely."

                    Anonymous Mechanical Turk Worker
Photo Credits
●   Michal Zacharzewski (Title Crowd), http://www.sxc.hu/profile/mzacha
●   Peter Suneson (Crowd silhouette), http://www.sxc.hu/profile/CMSeter
●   Alaa Hamed (Egyptian Coins), http://www.sxc.hu/profile/alaasafei
●   Piotr Lewandowski (Money), http://www.sxc.hu/profile/LeWy2005
●   Asif Akbar (Clock), http://www.sxc.hu/profile/asifthebes
●   Zern Liew (Traffic Cone), http://www.sxc.hu/profile/eidesign
●   Peter Gustafson (Counting Fingers), http://www.sxc.hu/profile/liaj
●   Kostya Kisleyko (Yes No), http://www.sxc.hu/profile/dlnny
●   Sergio Roberto Bichara (Barcode), http://www.sxc.hu/profile/srbichara
●   Maggie Molloy (Icons), http://www.sxc.hu/profile/agthabrown
●   Sanja Gjenero (World with Crowd), http://www.sxc.hu/profile/lusi
●   Wikimedia Commons (The Turk), http://en.wikipedia.org/wiki/File:Kempelen_chess1.jpg
