In traditional computation, a task is outsourced to one or more computers for solving… But in human computation, the task is routed to the crowd. This suits specific problems that are easy for humans but hard for automated systems, e.g., identifying objects in a room.
Agenda
Introduction to named entity (NE) concepts.
Extracting relationships among NEs and the problems associated with them.
Our approach: human in the loop.
Proposed design and results.
Terminology
Named entities, relations, co-references.
Named Entity: Definition
A named entity is an atomic element in a body of text. Types include person, organization, location, etc. Different named entities, when linked together, form a relation.
Named Entity: An example
Sachin Tendulkar was born in Bombay.
‘Sachin Tendulkar’ is an NE of type ‘Person’; ‘Bombay’ is an NE of type ‘Location’.
Relationship: Structure
Subject – Relation – Object
Subject: NE of any type. Relation: verb, adjective, or adverb. Object: NE of any type.
Relationship: An example
Sachin Tendulkar (subject) was born in (relation) Bombay (object).
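The Subject – Relation – Object structure maps naturally onto a small record type. The following is a minimal Python sketch; the class name `Relation` and its field names are hypothetical and are not taken from uPick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A Subject - Relation - Object triple linking two named entities."""
    subject: str        # NE of any type
    subject_type: str   # e.g. "Person"
    relation: str       # connecting verb, adjective, or adverb phrase
    obj: str            # NE of any type
    obj_type: str       # e.g. "Location"

    def as_triple(self) -> tuple[str, str, str]:
        # Drop the type labels and keep only the surface triple.
        return (self.subject, self.relation, self.obj)


born_in = Relation("Sachin Tendulkar", "Person", "was born in", "Bombay", "Location")
print(born_in.as_triple())
# ('Sachin Tendulkar', 'was born in', 'Bombay')
```

Making the record frozen lets triples serve as dictionary keys. This is convenient when votes are later collected per relation.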
Co-references: An example
Sachin was born in Bombay. He is a ...
Here ‘He’ refers to the same entity as ‘Sachin’.
Extracting relationships among NEs: Importance
Relationships among NEs express facts about a named entity. They are useful in question answering systems and for improving the accuracy of search results.
Extracting relationships among NEs: Standard process
1. Identify the named entities within a sentence.
2. Find the verb or adjective that connects the identified named entities.
3. Connect them together to form a relation.
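The three-step standard process can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the sentence is already tokenized and NE-tagged; the `(word, tag)` input format and the name `extract_relations` are hypothetical. For simplicity, it takes every word between two neighbouring NEs as the connecting phrase rather than isolating the verb or adjective.

```python
from typing import List, Tuple

# Hypothetical input format: (word, NE tag) pairs, with "O" marking non-entity words.
Token = Tuple[str, str]


def extract_relations(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    # Step 1: merge consecutive words that share the same non-"O" tag into one NE,
    # remembering where each NE starts and ends in the token list.
    spans = []  # (entity text, start index, end index exclusive)
    i = 0
    while i < len(tokens):
        tag = tokens[i][1]
        if tag == "O":
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j][1] == tag:
            j += 1
        spans.append((" ".join(word for word, _ in tokens[i:j]), i, j))
        i = j

    # Step 2: treat the words between two neighbouring NEs as the connecting phrase.
    # Step 3: join subject, phrase and object into a relation triple.
    relations = []
    for (subj, _, subj_end), (obj, obj_start, _) in zip(spans, spans[1:]):
        phrase = " ".join(word for word, _ in tokens[subj_end:obj_start])
        if phrase:  # adjacent NEs with nothing between them yield no relation
            relations.append((subj, phrase, obj))
    return relations


sentence = [("Sachin", "PERSON"), ("Tendulkar", "PERSON"), ("was", "O"),
            ("born", "O"), ("in", "O"), ("Bombay", "LOCATION"), (".", "O")]
print(extract_relations(sentence))
# [('Sachin Tendulkar', 'was born in', 'Bombay')]
```

A real pipeline would use part-of-speech tags in step 2 to keep only the verb, adjective, or adverb. This sketch skips that filter.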
Extracting relationships among NEs: Difficulty
Co-references.
Abbreviations, acronyms, and ambiguous words used in several places.
Complex sentence structure.
Extracting relationships among NEs: Difficulty example
“Tom called his father last night. They talked for an hour. He said he would be home the next day.”
Who does ‘He’ refer to: Tom or his father?
Extracting relationships among NEs: Required process
1. Identify part-of-speech constructs: noun, verb, adjective, etc.
2. Determine co-references, acronyms, and abbreviations.
3. Connect them together to form a relationship.
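Step 2 can be approximated by a toy heuristic that links each pronoun to the most recent person mention and expands acronyms from a lookup table. The Python below is a hypothetical sketch, not the method used here. The names `PRONOUNS` and `resolve` are illustrative, and the input is assumed to have multi-word NEs already merged into single tokens. In the Tom example, the heuristic would always pick Tom, because ‘his father’ is not a named entity. It therefore cannot settle that ambiguity.

```python
from typing import Dict, List, Tuple

# Hypothetical pronoun list used by this toy heuristic.
PRONOUNS = {"he", "she", "him", "her"}


def resolve(tokens: List[Tuple[str, str]], acronyms: Dict[str, str]) -> List[Tuple[str, str]]:
    """Replace pronouns with the most recent PERSON entity and expand known acronyms.

    Assumes multi-word NEs are already merged into a single token.
    """
    resolved = []
    last_person = None
    for word, tag in tokens:
        if tag == "PERSON":
            last_person = word
            resolved.append((word, tag))
        elif word.lower() in PRONOUNS and last_person is not None:
            # Naive co-reference: link the pronoun to the latest person seen.
            resolved.append((last_person, "PERSON"))
        elif word in acronyms:
            # Acronym expansion from a lookup table.
            resolved.append((acronyms[word], tag))
        else:
            resolved.append((word, tag))
    return resolved


text = [("Sachin", "PERSON"), ("was", "O"), ("born", "O"), ("in", "O"),
        ("Bombay", "LOCATION"), (".", "O"), ("He", "O"), ("played", "O"),
        ("for", "O"), ("India", "LOCATION"), (".", "O")]
print(resolve(text, {}))
```

Once ‘He’ is rewritten as ‘Sachin’, the second sentence also yields a Subject – Relation – Object triple.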
Extracting relationships among NEs: Automated approaches
Natural language processing: part-of-speech tagger, CRF (conditional random fields).
Machine learning: Hidden Markov Model, Support Vector Machine.
Statistical methods: Maximum Entropy method.
Other methods: vocabulary-based systems, context-based clustering.
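A vocabulary-based system tags text by looking phrases up in a list of known entities. The following Python sketch uses a greedy longest-match lookup. The function name `gazetteer_tag`, the `max_len` parameter, and the hand-written vocabulary are hypothetical; real systems would draw their entries from external sources.

```python
from typing import Dict, List, Tuple


def gazetteer_tag(words: List[str], gazetteer: Dict[str, str], max_len: int = 3) -> List[Tuple[str, str]]:
    """Tag words by greedy longest-match lookup in a vocabulary of known entities."""
    tagged = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in gazetteer:
                tagged.append((phrase, gazetteer[phrase]))
                i += n
                break
        else:
            # No vocabulary entry starts here: mark the word as a non-entity.
            tagged.append((words[i], "O"))
            i += 1
    return tagged


# Hypothetical, hand-written vocabulary.
vocab = {"Sachin Tendulkar": "PERSON", "Bombay": "LOCATION"}
print(gazetteer_tag("Sachin Tendulkar was born in Mumbai".split(), vocab))
# [('Sachin Tendulkar', 'PERSON'), ('was', 'O'), ('born', 'O'), ('in', 'O'), ('Mumbai', 'O')]
```

‘Mumbai’ is missed only because it is absent from the vocabulary. Coverage of this kind of tagger is bounded by the size and upkeep of its word list.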
Issues with the automated extraction techniques
Dependency: On external vocabulary sources, like Wikipedia, WordNet, MindNet, etc. Maintaining and updating these sources is manual, costly, and requires expertise. Their limited size produces context-based noise.
Scalability: Domain-dependent, corpus-dependent, and relation-specific.
Crowdsourcing: Harness the wisdom of the crowd
From traditional to human computation.
Crowdsourcing: In terms of NE relationship extraction
Advantages: Humans can easily extract and find different facts from a text body. They can also verify the accuracy of relationships obtained from automated techniques.
Disadvantages: Humans are not like computers. They find the task boring and cumbersome and need incentives to participate.
Incentive mechanisms
Money, fun, social.
1. Monetary incentives
Features: Can scale massively. Harness majority vote. Example: Amazon Mechanical Turk.
Disadvantages: Not the only source of motivation. Work is not credited, so it may encourage cheating. Requires filtering and monitoring. Needs labor-law regulation.
2. Social incentives
Features: Distribute among social peers (crowd). Harness trust. Examples: Flickr, Facebook, Quora.
Disadvantages: Do not scale beyond the closed network. Specific to the participating crowd. Require filtering and monitoring.
3. Fun incentives
Features: Games are seductive. They bring collaboration, curiosity, challenges, competition, and fun. The task is generally hidden. Examples: ESP, GWAP.
Disadvantages: Need someone to play with. Improper gameplay may encourage cheating. More cognitive work leads to less fun. Requires filtering and monitoring.
Existing crowdsourcing ideas
Monetary incentive: Amazon Mechanical Turk, used for collecting named entities, finding relational hierarchy, phrase detection, etc. Still no solution for verification!
Fun incentive: Games With a Purpose (Verbosity, Categorilla, Phrase Detectives), used for collecting common-sense facts and producing entities for templates. Still a higher cognitive task!
uPick: How it works
Step 1: Extract NEs and relations using a POS tagger (automated technique).
Step 2: Present the extracted relations to a crowd in the form of a game (challenge).
Step 3: Filter the relations by collecting the majority votes.
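Step 3 amounts to keeping each relation that most players accepted. A minimal Python sketch follows, assuming each player's answer is recorded as a boolean valid/invalid vote per relation triple. The function name `filter_by_majority` is hypothetical. The strict "more than half" threshold follows the > 50% rule used for scoring.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def filter_by_majority(votes: Dict[Triple, List[bool]]) -> List[Triple]:
    """Step 3: keep relations that more than half of the players marked as valid."""
    kept = []
    for relation, ballots in votes.items():
        # True counts as 1, so sum(ballots) is the number of "valid" votes.
        if ballots and sum(ballots) > len(ballots) / 2:
            kept.append(relation)
    return kept


votes = {
    ("Sachin Tendulkar", "was born in", "Bombay"): [True, True, False],
    ("Bombay", "was born in", "Sachin Tendulkar"): [False, False, True],
}
print(filter_by_majority(votes))
# [('Sachin Tendulkar', 'was born in', 'Bombay')]
```

A relation with an even split, or with no votes yet, is dropped. That is a conservative choice made in this sketch.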
uPick scoring
For the first player, the output is compared with the expert judgments. For subsequent players, it is compared with the majority vote (> 50%) of earlier players.
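The two scoring rules differ only in the answer key a player is checked against. The Python sketch below assumes one point per relation on which the player agrees with the key. The point value, the treatment of ties as invalid, and the names `reference_for` and `score_player` are all assumptions, not details given for uPick.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def reference_for(expert: Dict[Triple, bool],
                  earlier_votes: Dict[Triple, List[bool]]) -> Dict[Triple, bool]:
    """Pick the answer key: expert judgments for the first player,
    otherwise the > 50% majority of earlier players' votes."""
    if not any(earlier_votes.values()):
        return dict(expert)
    # A relation counts as valid only if strictly more than half voted it valid,
    # so a tie is treated as invalid (an assumption of this sketch).
    return {rel: sum(b) > len(b) / 2 for rel, b in earlier_votes.items() if b}


def score_player(answers: Dict[Triple, bool], reference: Dict[Triple, bool]) -> int:
    """Award one point per relation where the player's valid/invalid call matches the key."""
    return sum(1 for rel, ans in answers.items()
               if rel in reference and reference[rel] == ans)


r1 = ("Sachin Tendulkar", "was born in", "Bombay")
r2 = ("Bombay", "was born in", "Sachin Tendulkar")
expert = {r1: True, r2: False}
first_player = {r1: True, r2: True}
print(score_player(first_player, reference_for(expert, {})))  # 1
```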
uPick benefits
Effectiveness: Minimal cognitive effort because of click-based interaction. No dependency on external resources; therefore, scalable.
Generalization: Language- and corpus-independent. Can be extended to solve other similar NLP problems.
uPick user study
Supervised laboratory study. Participants: 12 (4 males and 8 females). Two sessions of one hour each: training and gameplay.
Document set: four documents, on Ashok Maurya, Sachin Tendulkar, Shahrukh Khan, and Sonia Gandhi.
RESULTS: Accuracy of uPick scheme after considering majority votes of the participants

Measure                                              D1   D2   D3   D4
Total number of presented relations                  37   39   40   33
Correctly identified valid relations                 19   18   19   15
Valid relations incorrectly identified as invalid     5    6    4    1
Correctly identified invalid relations               12   12   16   15
Invalid relations incorrectly identified as valid     1    3    1    2
Accuracy (correctly identified / total relations)   84%  77%  87%  91%
Accuracy, automated techniques only                 65%  61%  57%  49%
  (valid relations / total relations)
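The two accuracy rows follow directly from the four count rows. The Python check below copies those counts from the table; the names `COUNTS` and `accuracies` are illustrative. It assumes that ‘valid relations’ in the automated-only row means all truly valid relations, i.e., the correctly identified valid ones plus the valid ones marked invalid.

```python
# Counts per document, copied from the results table:
# (correct valid, valid marked invalid, correct invalid, invalid marked valid)
COUNTS = {
    "D1": (19, 5, 12, 1),
    "D2": (18, 6, 12, 3),
    "D3": (19, 4, 16, 1),
    "D4": (15, 1, 15, 2),
}


def accuracies(tv: int, fn: int, tn: int, fp: int) -> tuple[int, float, float]:
    total = tv + fn + tn + fp
    upick = (tv + tn) / total       # correctly identified / total relations
    automated = (tv + fn) / total   # truly valid relations / total relations
    return total, upick, automated


for doc, counts in COUNTS.items():
    total, upick, auto = accuracies(*counts)
    print(f"{doc}: total={total} uPick={upick:.1%} automated={auto:.1%}")
```

Recomputed this way, uPick accuracy is 83.8%, 76.9%, 87.5%, and 90.9% for D1 to D4. The automated-only figures are 64.9%, 61.5%, 57.5%, and 48.5%. Both sets agree with the table to within one percentage point of rounding.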
Conclusion
Participants did not find the game design engaging. However, uPick proved helpful in remembering various facts related to a text body.
Future work
A leaderboard.
A more engaging gameplay design, for example, physics-based puzzles and object-finding games.
Extension to a question answering system based on individual documents.