
EACL2012: In Search of a Gold Standard in Studies of Deception



Presentation by myself and Jeff Hancock on April 23, 2012, in Avignon, France, at the 2012 conference for the European Association of Computational Linguistics (EACL) Deception Detection Workshop.

Published in: Technology, Education



  1. 1. Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
  2. 2. In Search of a Gold Standard in Studies of Deception. Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
  3. 3. In Search of a Gold Standard in Studies of Deception. Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie. Newman-Pennebaker Model (2003)
  4. 4. The NP model is not consistent across contexts. On reflection, why would we expect it to be? Psychological and persuasion dynamics of deception are highly constrained by context.
  5. 5. Context: Deception in Online Reviews
  6. 6. Creating Deception for Research. 1. Sanctioned Lies: researcher asks participant to lie; topics include beliefs, attitudes, feelings, actions. Ex: mock crime.
  7. 7. Creating Deception for Research. 1. Sanctioned Lies: researcher asks participant to lie; topics include beliefs, attitudes, feelings, actions. Ex: mock crime. Adv: researcher can control when and where lie occurs. Limitations: permission to lie, requires high stakes.
  8. 8. Creating Deception for Research. 1. Sanctioned Lies. 2. Unsanctioned Lies: diary studies, retrospective identification, cheating paradigms.
  9. 9. Creating Deception for Research. 1. Sanctioned Lies. 2. Unsanctioned Lies. (Approaches 1 and 2: Psychology & Communication.)
  10. 10. Creating Deception for Research. 1. Sanctioned Lies. 2. Unsanctioned Lies. (Approaches 1 and 2: Psychology & Communication.) 3. Non-gold Standard Approaches (Computer Science): i. manual annotation; ii. heuristically labeled; iii. unlabeled (distributional analysis).
  11. 11. Creating Deception for Research. 1. Sanctioned Lies. 2. Unsanctioned Lies. 3. Non-gold Standard Approaches. A Novel Method: The Crowd-sourcing Approach…
  12. 12. The Crowdsourcing Approach. Crowdsourcing divides large projects into small, manageable tasks and matches these tasks with humans who will perform them: harness distributed resources; maximize speed; minimize cost; more powerful than local tech & small research groups; data collection, access, annotation, and analysis.
  13. 13. Amazon's Mechanical Turk. Requesters create a Human Intelligence Task (HIT) to be completed by Workers. HITs are similar to HTML forms and may include: the solicitation; information needed for the Workers to complete the task; collection of survey information.
  14. 14. 4 Assumptions of our Crowdsourcing Approach. 1. Balanced data set: equal # of truthful and deceptive reviews; uniform valence (the whole data set is either positive or negative). 2. Both truthful and deceptive reviews cover the same set of entities: minimize distinguishing features that may be context-based rather than language of deception. 3. Data set of reasonable size: 800 total reviews (400 crowdsourced).
  15. 15. 4 Assumptions of our Crowdsourcing Approach. 4. Deceptive reviews should be generated under the same basic guidelines as govern the generation of truthful reviews: length, quality, time.
  16. 16. STEP 1: Identify entities to be covered in the reviews. Truthful corpus: find all entities (specific hotels) from the real-world database (TripAdvisor); extract all statements (reviews) from those entities; identify the subcategories to which these entities belong (Chicago hotels).
  17. 17. STEP 1: Identify entities to be covered in the reviews
  18. 18. STEP 1: Identify entities to be covered in the reviews. Truthful corpus: find all entities (specific hotels) from the real-world database (TripAdvisor); extract all statements (reviews) from those entities; identify the subcategories to which these entities belong (Chicago hotels). Deceptive corpus: use entities from truthful corpus to create the prompt for the Turkers.
  19. 19. STEP 2: Develop the Mechanical Turk prompt. Survey real solicitations for deception (hotel reviews, doctor reviews, etc.).
  20. 20. A Real Solicitation
  21. 21. STEP 2: Develop the Mechanical Turk prompt. Survey real solicitations for deception (hotel reviews, doctor reviews, etc.). Mimic the workflow, vocabulary and tone of the Turkers.
  22. 22. Step 3: Attach appropriate warnings to the solicitation. Workers may not complete this task more than once. Their work will not be awarded if it is incoherent or off topic. This review is for academic purposes. Be aware of priming effects and placement of this warning.
  23. 23. Step 4: Gather demographic data and comments. Survey mechanism for demographics: age, education, etc. Qualitative, open-ended comment: provides technical information. Incentivize comments.
  24. 24. Step 5: Pilot. Pilot the resulting HIT in small batches (10). Remove all plagiarized results through automated processes (Yahoo! BOSS API); Workers do not receive payment for any plagiarized material. Manually evaluate the remaining set: coherence, topicality, length of review. Iterate until: no technical complaints; experiment quality. Full run of solicitation (400 reviews) by unique workers.
  25. 25. Let's see it!
  26. 26. Finding the Gold Standard. The resulting set of 400 reviews is then used to train the algorithm for deceptive positive reviews. The algorithm trains separately on the set of 400 truthful* reviews for comparison.
  27. 27. Discussion & Conclusion. Advantages: model the deception as closely to real-world as possible; known deceptive. Limitations: sanctioned?; limited knowledge of Turkers; constrained to certain contexts; construction of the 'truthful' set non-trivial.
  28. 28. Discussion & Conclusion. Key Potential: to create datasets more easily and efficiently in an effort to model deception customized to specific contexts, for a Context Constrained Approach to Deception.
  29. 29. In Search of a Gold Standard in Studies of Deception. Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
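
The first three assumptions on slide 14 (balanced labels, a shared set of entities, a known corpus size) can be checked mechanically before any training. The following is a minimal sketch of such a check, not the presenters' code: the `Review` record, its field names, the label strings, and the `check_corpus` function are all hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Review(NamedTuple):
    # Hypothetical record layout; the slides do not specify one.
    hotel: str  # entity the review is about
    label: str  # "truthful" or "deceptive"
    text: str


def check_corpus(reviews, expected_total=None):
    """Return a list of violated assumptions (empty if none are violated)."""
    problems = []

    # Assumption 1: equal number of truthful and deceptive reviews.
    counts = Counter(r.label for r in reviews)
    if counts["truthful"] != counts["deceptive"]:
        problems.append("unbalanced labels")

    # Assumption 2: both classes cover the same set of entities.
    truthful_hotels = {r.hotel for r in reviews if r.label == "truthful"}
    deceptive_hotels = {r.hotel for r in reviews if r.label == "deceptive"}
    if truthful_hotels != deceptive_hotels:
        problems.append("entity sets differ")

    # Assumption 3: the corpus has the planned size (e.g. 800 in the talk).
    if expected_total is not None and len(reviews) != expected_total:
        problems.append("unexpected corpus size")

    return problems
```

Calling `check_corpus(reviews, expected_total=800)` on the full corpus would flag any of the three assumptions that does not hold. Assumption 4 (length, quality, time) depends on how the reviews were collected and is not covered by this sketch.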
