Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
EACL2012: In Search of a Gold Standard in Studies of Deception


Presentation by myself and Jeff Hancock on April 23, 2012, in Avignon, France, at the Deception Detection Workshop of the 2012 Conference of the European Chapter of the Association for Computational Linguistics (EACL).


  1. Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
  2. In Search of a Gold Standard in Studies of Deception
     Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
  3. In Search of a Gold Standard in Studies of Deception
     Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
     Newman-Pennebaker Model (2003)
  4. The NP (Newman-Pennebaker) model is not consistent across contexts
     On reflection, why would we expect it to be?
     The psychological and persuasion dynamics of deception are highly constrained by context
  5. Context: Deception in Online Reviews
  6. Creating Deception for Research
     1. Sanctioned Lies
        • Researcher asks participant to lie
        • Topics include beliefs, attitudes, feelings, actions
        • Ex: mock crime
  7. Creating Deception for Research
     1. Sanctioned Lies
        • Researcher asks participant to lie
        • Topics include beliefs, attitudes, feelings, actions
        • Ex: mock crime
        • Advantage: researcher can control when and where the lie occurs
        • Limitations: permission to lie, requires high stakes
  8. Creating Deception for Research
     1. Sanctioned Lies
     2. Unsanctioned Lies
        • Diary Studies
        • Retrospective Identification
        • Cheating paradigms
  9. Creating Deception for Research
     Psychology & Communication:
     1. Sanctioned Lies
     2. Unsanctioned Lies
  10. Creating Deception for Research
      Psychology & Communication:
      1. Sanctioned Lies
      2. Unsanctioned Lies
      Computer Science:
      3. Non-gold Standard Approaches
         i. Manual annotation
         ii. Heuristically labeled
         iii. Unlabeled (distributional analysis)
  11. Creating Deception for Research
      1. Sanctioned Lies
      2. Unsanctioned Lies
      3. Non-gold Standard Approaches
      A Novel Method: The Crowdsourcing Approach…
  12. The Crowdsourcing Approach
      Crowdsourcing divides large projects into small, manageable tasks and matches these tasks with humans who will perform them
      - harness distributed resources
      - maximize speed
      - minimize cost
      - more powerful than local tech & small research groups
      - data collection, access, annotation, and analysis
  13. Amazon's Mechanical Turk
      Requesters create a Human Intelligence Task (HIT) to be completed by Workers
      HITs are similar to HTML forms and may include:
      - the solicitation
      - information needed for the Workers to complete the task
      - collection of survey information
  14. 4 Assumptions of our Crowdsourcing Approach
      1. Balanced data set
         • Equal # of truthful and deceptive reviews
         • Uniform valence: the whole data set is either positive or negative
      2. Both truthful and deceptive reviews cover the same set of entities
         • Minimize distinguishing features that may be context-based rather than the language of deception
      3. Data set of reasonable size
         • 800 total reviews (400 crowdsourced)
  15. 4 Assumptions of our Crowdsourcing Approach
      4. Deceptive reviews should be generated under the same basic guidelines as govern the generation of truthful reviews
         • Length
         • Quality
         • Time
  16. STEP 1: Identify entities to be covered in the reviews
      Truthful corpus
      – Find all entities (specific hotels) in the real-world database (TripAdvisor)
      – Extract all statements (reviews) for those entities
      – Identify the subcategories to which these entities belong (Chicago hotels)
  17. STEP 1: Identify entities to be covered in the reviews
  18. STEP 1: Identify entities to be covered in the reviews
      Truthful corpus
      – Find all entities (specific hotels) in the real-world database (TripAdvisor)
      – Extract all statements (reviews) for those entities
      – Identify the subcategories to which these entities belong (Chicago hotels)
      Deceptive corpus
      – Use entities from the truthful corpus to create the prompt for the Turkers
  19. STEP 2: Develop the Mechanical Turk prompt
      Survey real solicitations for deception (hotel reviews, doctor reviews, etc.)
  20. A Real Solicitation
  21. STEP 2: Develop the Mechanical Turk prompt
      Survey real solicitations for deception (hotel reviews, doctor reviews, etc.)
      Mimic the workflow, vocabulary, and tone of the Turkers
  22. STEP 3: Attach appropriate warnings to the solicitation
      • May not complete this task more than once
      • Their work will not be rewarded if it is incoherent or off topic
      • This review is for academic purposes
        – Be aware of priming effects and placement of this warning
  23. STEP 4: Gather demographic data and comments
      • Survey mechanism for demographics
        – Age, Education, etc.
      • Qualitative, open-ended comment
        – Provides technical information
        – Incentivize comments
  24. STEP 5: Pilot
      • Pilot the resulting HIT in small batches (10)
      • Remove all plagiarized results through automated processes (Yahoo! BOSS API)
        – Workers do not receive payment for any plagiarized material
      • Manually evaluate the remaining set
        – Coherence, topicality, length of review
      • Iterate until:
        – No technical complaints
        – Experiment quality
      • Full run of solicitation (400 reviews) by unique workers
  25. Let's see it!
  26. Finding the Gold Standard
      • The resulting set of 400 reviews is then used to train the algorithm for deceptive positive reviews
      • The algorithm trains separately on the set of 400 truthful* reviews for comparison
  27. Discussion & Conclusion
      Advantages
      • models deception as closely to the real world as possible
      • reviews are known to be deceptive
      Limitations
      • sanctioned?
      • limited knowledge of Turkers
      • constrained to certain contexts
      • construction of the 'truthful' set is non-trivial
  28. Discussion & Conclusion
      Key Potential: to create datasets more easily and efficiently, in an effort to model deception customized to specific contexts, for a Context-Constrained Approach to Deception
  29. In Search of a Gold Standard in Studies of Deception
      Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
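
Slide 14 lists the corpus assumptions: equal numbers of truthful and deceptive reviews, uniform valence, both halves covering the same entities, and a data set of reasonable size (400 reviews per class). A minimal Python sketch of an automated check for those assumptions follows. It assumes each review is stored as a hypothetical `Review` record with `hotel`, `text`, `deceptive`, and `positive` fields; none of these names come from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Review:
    # Hypothetical record layout, not taken from the talk.
    hotel: str       # entity the review is about, e.g. one Chicago hotel
    text: str
    deceptive: bool  # True for crowdsourced (deceptive) reviews
    positive: bool   # valence of the review


def check_assumptions(reviews: List[Review],
                      expected_per_class: Optional[int] = None) -> List[str]:
    """Return the violated assumptions; an empty list means every check passed."""
    problems = []
    truthful = [r for r in reviews if not r.deceptive]
    deceptive = [r for r in reviews if r.deceptive]

    # Assumption 1: balanced classes...
    if len(truthful) != len(deceptive):
        problems.append(f"unbalanced: {len(truthful)} truthful vs {len(deceptive)} deceptive")
    # ...and uniform valence (all positive or all negative).
    if len({r.positive for r in reviews}) > 1:
        problems.append("mixed valence")

    # Assumption 2: both halves cover the same set of entities.
    if {r.hotel for r in truthful} != {r.hotel for r in deceptive}:
        problems.append("truthful and deceptive reviews cover different hotels")

    # Assumption 3: optional check on the intended size per class.
    if expected_per_class is not None and len(deceptive) != expected_per_class:
        problems.append(f"expected {expected_per_class} reviews per class")
    return problems


# Usage with two toy reviews (not from the corpus):
reviews = [
    Review("hotel_a", "Great stay near the river", deceptive=False, positive=True),
    Review("hotel_a", "Amazing luxury experience", deceptive=True, positive=True),
]
print(check_assumptions(reviews))  # []
```

Returning a list of problems rather than raising on the first failure lets a single run report every violated assumption at once, which is convenient while the pilot batches are still being iterated.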

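Slide 24 removes plagiarized pilot results with the Yahoo! BOSS API before manual evaluation, and slide 22 warns that a Worker may not complete the task more than once. The sketch below does not call the BOSS API. As a plainly swapped-in stand-in, it flags a submission as plagiarized when more than half of its word 5-grams also appear in one locally supplied source text. The thresholds (`n=5`, `threshold=0.5`, `min_words`) and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Submission:
    worker_id: str
    text: str


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """All lowercase word n-grams of `text` (empty if it has fewer than n words)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def looks_plagiarized(text: str, sources: List[str],
                      n: int = 5, threshold: float = 0.5) -> bool:
    """Local stand-in for a web-search check: True if more than `threshold`
    of the text's word n-grams also occur in any single source text."""
    grams = word_ngrams(text, n)
    if not grams:
        return False
    return any(len(grams & word_ngrams(src, n)) / len(grams) > threshold
               for src in sources)


def filter_pilot(subs: List[Submission], sources: List[str], min_words: int = 20):
    """Split submissions into (accepted, rejected); accepted ones still need manual review."""
    seen_workers = set()
    accepted, rejected = [], []
    for sub in subs:
        if sub.worker_id in seen_workers:  # one HIT per worker (slide 22)
            rejected.append((sub, "repeat worker"))
            continue
        seen_workers.add(sub.worker_id)
        if looks_plagiarized(sub.text, sources):
            rejected.append((sub, "plagiarized"))  # not paid (slide 24)
        elif len(sub.text.split()) < min_words:
            rejected.append((sub, "too short"))
        else:
            accepted.append(sub)
    return accepted, rejected
```

Only a Worker's first submission is considered, so a later, cleaner resubmission from the same Worker is still rejected. The accepted list is the input to the manual check of coherence, topicality, and length, not a final verdict.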
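Slide 26 says the 400 crowdsourced deceptive reviews and the 400 truthful reviews are used to train "the algorithm", but the slides do not name it. Purely as an illustration, here is a multinomial naive Bayes classifier over lowercase unigrams with add-one smoothing. This is a swapped-in technique, not necessarily the model the authors used, and the training strings are toy examples rather than corpus data.

```python
import math
from collections import Counter
from typing import List


class NaiveBayes:
    """Multinomial naive Bayes over lowercase unigrams with add-one smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for the priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in text.lower().split():
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Toy training strings (not from the corpus):
texts = ["my husband and i loved this luxury hotel experience",
         "room was small but location near the el was fine",
         "luxury experience my husband loved it",
         "small room near the el"]
labels = ["deceptive", "truthful", "deceptive", "truthful"]
clf = NaiveBayes().fit(texts, labels)
print(clf.predict("luxury experience for my husband"))  # deceptive
```

With the real 800-review set, accuracy would normally be measured on held-out reviews (for example with cross-validation) rather than on the training texts themselves.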