Dispute finder
Slides about the Dispute Finder project. Slides are taken from the talks presented at WWW 2010 and WICOW 2010.


Published in: Technology, Business

Transcript

  • 1. 1/44
    Is there another side to this?
    Identifying Disputed Information on the Web
    Rob Ennals, Intel Research Berkeley  -  rob@ennals.org
    Work done in collaboration with:
        John Mark Agosta, Dan Byler, Beth Trushkowsky, Barbara Rosario, Tad Hirsch, Tye Rattenbury
  • 2. About Me: Rob Ennals
    • Senior Research Scientist at Intel Research
    • Represent Intel at W3C for HTML and Web Apps.
    • PhD from University of Cambridge
    (advised by Simon Peyton Jones – Microsoft Research)
    • Diverse interests: PL, Concurrency, Systems, Web, Mashups, HCI, NLP, Politics, etc
  • Not everything on the web is true, balanced, and objective
    3/44
  • 5. 4/44
    People increasingly rely on the web for information
    source: Pew Research
  • 6. Old Model: small number of known sources
    TV, Radio, Newspaper, Book Publishers
    New Model: huge number of unknown sources
    Blogs, random websites, foreign newspapers
    5/44
    Not just an issue of source credibility. If we ignore untrusted sources then we ignore a lot of the information on the web.
  • 7. 6/44
    Dispute Finder:
    inform users when information that they encounter in their lives is disputed by a source that they might trust
  • 8. 7/44
    Browser extension
    Firefox extension examines every page you browse (including email, intranet pages, etc).
    Highlights claims that are disputed.
  • 9. 8/44
    Click a dispute for more information
    Show sources that support or oppose the claim.
  • 10. 9/44
    Search Engine Front-End
    Built with Yahoo BOSS.
    Examines text on all linked pages.
  • 11. Early Work: Mobile Voice Interface
    Currently an early prototype, running on a laptop, based on Dragon NaturallySpeaking.
    Listen to everything people say around you. Keep a list of disputed things you may have heard. 
    Vibrate when you hear something disputed.
    10/44
  • 12. 11/44
    Future: Disputed Claims on TV
  • 13. 12/44
    Future: Mail, Books, News, etc ...
  • 14. People seem to like it
    Covered by: NPR, New Scientist, Fast Company, Christian Science Monitor, Wall Street Journal, NY Times Bay Area, San Jose Mercury, SF Chronicle, The Guardian, ACM TechNews, CBC (Canadian Public Radio), Cnet, Sacramento Bee, + many others
    TG Daily: “This is hands down, the most amazing idea I’ve ever heard of when it comes to using the web”
    Paper accepted for WWW 2010 + WICOW 2010.
  • 15. Overall structure:
    14/44
  • 16. Related Work: Social Annotation
    15/44
    Videolyzer
    Diigo
    Need to mark every instance individually
    SpinSpotter
  • 17. Related Work: Fact Checker Sites
    16/44
    Need to suspect something may be disputed.
  • 18. Related Work: Source Rating
    17/44
    Automatic quality metrics.
    But: Non-credible sources still have useful information.
    But: Credible sources still get stuff wrong.
  • 19. Related Work: Wiki Source Tracking
    18/44
    WikiTrust
    WikiScanner
    Who wrote this, and are they credible/biased?
    Great if your content is on Wikipedia.
  • 20. Overall structure:
    19/44
  • 21. 20/44
    Compare Observed Text to Known Disputes
    Glenn Beck falsely claimed that the moon is made of cheese, despite clear evidence to the contrary.
    False claim: "the moon is made of cheese"
    Disputed by: Huffington Post, New York Times
    Context: ...
    Entailment: "We should mine the moon because it is made of cheese"
  • 22. 21/44
    Contradiction detection via dispute detection
  • 23. 22/44
    Contradiction detection vs Dispute Detection
    Contradiction detection:
        Does statement X logically contradict statement Y?
        Hard: need lots of real-world knowledge.
    Dispute detection:
        Does author A believe that statement X is disputed or misleading?
        Humans determine what is actually disputed.
        Humans determine which disputes are interesting.
        Only detects contradictions that humans find.
        Detects statements that are misleading without being wrong.
    Once we have determined that a dispute is real, could use contradiction detection and sentiment analysis to see who is on each side.
  • 24. 23/44
    A statement can be misleading without being wrong
    GM's misleading claim that the Chevrolet Volt gets 230 miles per gallon
    deceptively claimed that fast food could be nutritious
    Logical truth isn't all that interesting.
    We want to know if there is a different way of looking at the subject. A different frame. 
  • 25. 24/44
    Mining claims from the web
  • 26. 25/44
    Use Patterns to Find Disputed Claims
    the false claim that Himalayan glaciers could melt away by 2035
    it is not true that anyone aged over 59 cannot receive heart repairs
    the misconception that everyone in the south are stupid
    the delusion that scientists in different countries do science differently
    into believing that Van Morrison had a new baby
    the myth that we can't afford good working conditions for everyone
    misleadingly claimed that unemployment is lower than the '70s
    We built a simple grammar for such prefixes.
    Currently 1293 patterns, identified on ~35 million web pages, of which we have downloaded and processed 2 million.
     
    Restricting to prefixes allows us to search for them using Yahoo BOSS.
    Future: automatically infer a larger grammar of patterns
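
The prefix idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix list is a small hypothetical sample taken from the slide's examples (the talk reports 1293 patterns), and the names `DISPUTE_PREFIXES` and `extract_candidate_claims` are invented here, not taken from the Dispute Finder code.

```python
import re

# Hypothetical sample of dispute prefixes, copied from the slide's examples.
DISPUTE_PREFIXES = [
    "the false claim that",
    "it is not true that",
    "the misconception that",
    "the delusion that",
    "into believing that",
    "the myth that",
    "misleadingly claimed that",
    "falsely claimed that",
]

# Try longer prefixes first, then capture text up to the end of the sentence.
_ALTERNATIVES = "|".join(
    re.escape(p) for p in sorted(DISPUTE_PREFIXES, key=len, reverse=True)
)
_PREFIX_RE = re.compile(r"(?:" + _ALTERNATIVES + r")\s+([^.!?\n]+)", re.IGNORECASE)


def extract_candidate_claims(text):
    """Return the text that follows each dispute prefix (unfiltered candidates)."""
    return [match.group(1).strip() for match in _PREFIX_RE.finditer(text)]


sample = (
    "Reports repeated the false claim that Himalayan glaciers could melt away by 2035. "
    "He falsely claimed that he didn't do it!"
)
extracted = extract_candidate_claims(sample)
```

The candidates that come out of such a matcher are raw: they can still be fragments or ambiguous, which is why a filtering step is needed afterwards.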
  • 27. 26/44
    Some Disputes I Wasn’t Aware of
    The Niger-Iraq Uranium connection has been discredited
    Medieval Europeans thought the world was flat
    Dinosaurs looked sleek and reptilian.
    Dietary Cholesterol is a problem.
    “Wear and Tear” causes arthritis
    Specific foods cause ulcers
    Estimates from Yahoo BOSS. Not all URLs downloaded.
  • 28. Most Disputed Nouns
    1. God
    2. Iraq
    3. Government
    4. Obama
    5. War
    6. Israel
    7. President
    8. Women
    9. Money
    10. Jesus
  • 29. 28/44
    Search for all patterns on Yahoo BOSS
    Yahoo BOSS is an API for Yahoo search.
     
    BOSS API has a limit of 1000 hits per query, so salt with year and month.
     
    +"falsely claimed that" +2010
    +"falsely claimed that" -2010 +2009
    +"falsely claimed that" -2010 -2009 +2008
    +"falsely claimed that" -2010 -2009 -2008 +2007
    Needed for 197 patterns.
    We talked to Yahoo first...
    Future: get direct access to complete results for a pattern
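
A rough sketch of how the year-salted queries shown on this slide could be generated. The function name `salted_queries` is hypothetical, only year salting is shown (the slide also mentions months), and no call to the BOSS API is made.

```python
def salted_queries(pattern, years):
    """Build one query per year, each excluding the years already covered."""
    queries = []
    excluded = []
    for year in years:
        terms = ['+"%s"' % pattern]
        terms += ["-%d" % earlier for earlier in excluded]  # skip hits already fetched
        terms.append("+%d" % year)                          # salt with this year
        queries.append(" ".join(terms))
        excluded.append(year)
    return queries


queries = salted_queries("falsely claimed that", [2010, 2009, 2008])
for query in queries:
    print(query)
```

Each query returns a different slice of the results, so a pattern with more than 1000 hits can be covered by several queries.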
  • 30. 29/44
    Claims need to be filtered
    fragment:
        the false claim that won't go away
    ambiguous:
        falsely claimed that he didn't do it
        wrongly think that the bill will pass
        wrongly think that Great Britain doesn't
    suffix:
        the myth that Elvis is alive has a long history
    extraction error:
        falsely claim that full commentary below
  • 31. 30/44
    Labeled data from Mechanical Turk
    $0.04 to label 10 claims, two of which are known.
    If a turker gets known items wrong, reject their work.
    Each claim labeled by two turkers.
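
The quality check described here might look like the sketch below. It assumes a batch is rejected if any known item is labeled wrongly (the slide does not say whether one miss or two is the cutoff), and the function names and label strings are hypothetical.

```python
def accept_batch(answers, gold):
    """Accept a worker's batch only if every known (gold) item is labeled correctly."""
    return all(answers.get(claim) == label for claim, label in gold.items())


def agreement_rate(labels_a, labels_b):
    """Fraction of claims labeled by both workers on which their labels match."""
    shared = [claim for claim in labels_a if claim in labels_b]
    if not shared:
        return 0.0
    matches = sum(labels_a[claim] == labels_b[claim] for claim in shared)
    return matches / len(shared)


gold = {"known-1": "good", "known-2": "bad"}
careful = {"known-1": "good", "known-2": "bad", "claim-1": "good"}
careless = {"known-1": "good", "known-2": "good", "claim-1": "bad"}
```

With two workers per claim, `agreement_rate` gives the kind of inter-annotator agreement figure that the later slides report.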
  • 32. 31/44
    Problem: text may not be a statement
    the false claim that won't go away
    the belief that works best
    the lie that people fell for
     
    Current approach: Is the first word a verb?
    finds 71% of bad claims
    mistakenly drops 2% of good claims
    Works for first two, but not last.
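
A toy version of the "first word is a verb" test, applied to the three examples above. The slide does not say how verbs were detected; a real system would presumably use a part-of-speech tagger, so the small word list below is a hypothetical stand-in.

```python
# Hypothetical stand-in for a part-of-speech tagger: a few verb forms only.
LEADING_VERBS = {"won't", "works", "is", "was", "has", "will", "can't", "doesn't"}


def looks_like_statement(claim_text):
    """Reject extracted text whose first word is a verb; keep everything else."""
    words = claim_text.lower().split()
    return bool(words) and words[0] not in LEADING_VERBS


examples = ["won't go away", "works best", "people fell for"]
results = [looks_like_statement(text) for text in examples]
```

The first two fragments are rejected because they start with a verb. The third starts with a noun, so this heuristic keeps it even though it is not a complete statement.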
  • 33. 32/44
    Problem: ambiguous claims
    he didn't do it
    the union was a party in the proceedings
    the other parent is abusive
    our troops have committed atrocities
     
    property taxes are regressive
    Obama is a communist
     
    Bad
    Maybe
    Good
    If two pages say X, do they mean the same thing?
    Turk: 61.9% agreement - often very subjective
       
    Future: associate claim with page topic
  • 34. 33/44
    Wikipedia links tell us what is unambiguous
    property taxes are regressive
    Obama is a communist
    Is this word always linked to the same thing?
    Precision: 73% Recall: 73%
    (vs gold data + word features)
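
One possible reading of "is this word always linked to the same thing?" is sketched below. The input is assumed to be the list of Wikipedia article titles that a given word or phrase links to across its occurrences; the 0.9 threshold and the function names are assumptions made for illustration, not values from the talk.

```python
from collections import Counter


def dominant_target_share(link_targets):
    """Share of occurrences that link to the most common target article."""
    if not link_targets:
        return 0.0
    counts = Counter(link_targets)
    return counts.most_common(1)[0][1] / len(link_targets)


def is_unambiguous(link_targets, threshold=0.9):
    """Treat a term as unambiguous if it almost always links to one article."""
    return dominant_target_share(link_targets) >= threshold


obama_links = ["Barack_Obama"] * 10
wood_links = ["Wood"] * 5 + ["Ronnie_Wood"] * 5
```

A claim whose key terms all pass this test is more likely to mean the same thing on every page where it appears.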
  • 35. Overall structure:
    34/44
  • 36. Users enter claims that they disagree with
    35/44
  • 37. Users add paraphrases for claims
    36/44
    Alternative ways to phrase the same claim.
  • 38. Teach Dispute Finder to recognize claims
    37/44
  • 39. Users add evidence to support claims
    38/44
    A claim will not be shown to others unless the user finds a source that argues against it.
  • 40. Users identify a disputed claim on a page
    39/44
    Define a new disputed claim, or add paraphrase for
    existing disputed claim.
  • 41. 40/44
    User Study Results
    Frustrated by low number of claims that were highlighted
    - motivated text mining approach
    Did not appreciate that a claim should apply to multiple pages
    - particularly when using context menu approach
    Confused about how specific a claim should be
    E.g. “Global temperatures will rise by X degrees”
    Users created claims with ambiguous meanings
    E.g. saying “wood” to mean “Ronnie Wood”
    Confused by double-negatives when adding evidence
    E.g. opposes global warming does not exist
    Future: use users to improve mined claims
  • 42. 41/44
    Entailment
  • 43. 42/44
    Entailment is resource constrained
    Must compare many sentences against a huge number of claims in a fraction of a second.
  • 44. 43/44
    Simple lexical entailment
    I think that global warming is just a hoax
    global warming is a hoax
    All non-stopwords present, and in the correct order.
    Very simple but:
    • it can be done very efficiently
    • if you have a big enough corpus then it works ok
    Future: better entailment that still scales
    Future: look at context, and other places same text appears
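
The rule "all non-stopwords present, and in the correct order" can be sketched as a subsequence test. The stopword list here is a small hypothetical one and the function names are invented for this example; the actual Dispute Finder implementation is not shown in the slides.

```python
import re

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "that", "just"}


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def entails(sentence, claim):
    """True if every non-stopword of the claim occurs in the sentence, in order."""
    needed = [word for word in tokens(claim) if word not in STOPWORDS]
    if not needed:
        return False
    remaining = iter(tokens(sentence))
    # `word in remaining` advances the iterator, which enforces the ordering.
    return all(word in remaining for word in needed)


match = entails("I think that global warming is just a hoax", "global warming is a hoax")
out_of_order = entails("a hoax about global warming", "global warming is a hoax")
```

The check is a single pass over the sentence for each claim, which is what makes it cheap enough to run while a page loads. It ignores negation and context, so it can also match a sentence that is itself disputing the claim.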
  • 46. What is Disputed?
    44/44
    Anything disputed by anyone?
    - we get overwhelmed with claims disputed by nutcases
    Anything disputed by a “reliable source”?
    - what is a “reliable source”? (Wikipedia rules?)
    - do we end up enforcing “orthodox” beliefs and stifling debate?
    Anything disputed by a source that I would trust?
    - we reinforce existing echo-chamber problem
    Anything disputed by my friends?
    - do I agree with my friends?
    - should I be encouraged to agree with them?
    Future: learn what to show a user by analyzing their behavior
  • 47. Interviews: Do people want this?
    45/44
    Hard to change established opinions
    They think they already understand the issue.
    They would have to publicly back down
    So focus on issues they don’t yet have an opinion on?
    Hard to make someone accept the other side
    Social identity in “us” vs “them”
    Not willing to listen to “other side”
    So give sources from their “own” side?
    Sometimes people may not care
    Reading just for entertainment and conversation material
    Don’t care much if they are wrong
    Not interested in challenging opinions of others
    Focus on issues that affect them personally
    Dispute Finder probably isn’t for everyone
  • 48. 46/44
    Questions?