Dispute finder

Slides about the Dispute Finder project. Slides are taken from the talks presented at WWW 2010 and WICOW 2010.

Published in: Technology, Business

  1. Is there another side to this? Identifying Disputed Information on the Web
     Rob Ennals, Intel Research Berkeley - rob@ennals.org
     Work done in collaboration with: John Mark Agosta, Dan Byler, Beth Trushkowsky, Barbara Rosario, Tad Hirsch, Tye Rattenbury
  2. About Me: Rob Ennals
     - Senior Research Scientist at Intel Research
  3. Represent Intel at W3C for HTML and Web Apps.
  4. PhD from University of Cambridge (advised by Simon Peyton Jones, Microsoft Research)
     - Diverse interests: PL, Concurrency, Systems, Web, Mashups, HCI, NLP, Politics, etc.
     Not everything on the web is true, balanced, and objective
  5. People increasingly rely on the web for information (source: Pew Research)
  6. Old Model: small number of known sources (TV, Radio, Newspaper, Book Publishers)
     New Model: huge number of unknown sources (blogs, random websites, foreign newspapers)
     Not just an issue of source credibility. If we ignore untrusted sources then we ignore a lot of the information on the web.
  7. Dispute Finder: inform users when information that they encounter in their lives is disputed by a source that they might trust
  8. Browser extension
     Firefox extension examines every page you browse (including email, intranet pages, etc).
     Highlights claims that are disputed.
  9. Click a dispute for more information
     Show sources that support or oppose the claim.
  10. Search Engine Front-End
      Built with Yahoo BOSS.
      Examines text on all linked pages.
  11. Early Work: Mobile Voice Interface
      Currently an early prototype, running on a laptop, based on Dragon NaturallySpeaking.
      Listen to everything people say around you. Keep a list of disputed things you may have heard.
      Vibrate when you hear something disputed.
  12. Future: Disputed Claims on TV
  13. Future: Mail, Books, News, etc ...
  14. People seem to like it
      Covered by: NPR, New Scientist, Fast Company, Christian Science Monitor, Wall Street Journal, NY Times Bay Area, San Jose Mercury, SF Chronicle, The Guardian, ACM TechNews, CBC (Canadian Public Radio), Cnet, Sacramento Bee, + many others
      TG Daily: “This is hands down, the most amazing idea I’ve ever heard of when it comes to using the web”
      Paper accepted for WWW 2010 + WICOW 2010.
  15. Overall structure
  16. Related Work: Social Annotation
      Videolyzer, Diigo, SpinSpotter
      Need to mark every instance individually
  17. Related Work: Fact Checker Sites
      Need to suspect something may be disputed.
  18. Related Work: Source Rating
      Automatic quality metrics.
      But: Non-credible sources still have useful information.
      But: Credible sources still get stuff wrong.
  19. Related Work: Wiki Source Tracking
      WikiTrust, WikiScanner
      Who wrote this, and are they credible/biased?
      Great if your content is on Wikipedia.
  20. Overall structure
  21. Compare Observed Text to Known Disputes
      Glenn Beck falsely claimed that the moon is made of cheese, despite clear evidence to the contrary.
      False claim: "the moon is made of cheese"
      Disputed by: Huffington Post, New York Times
      Context: ...
      Entailment: "We should mine the moon because it is made of cheese"
  22. Contradiction detection via dispute detection
  23. Contradiction detection vs Dispute Detection
      Contradiction detection:
      - Does statement X logically contradict statement Y?
      - Hard: need lots of real-world knowledge.
      Dispute detection:
      - Does author A believe that statement X is disputed or misleading?
      - Humans determine what is actually disputed.
      - Humans determine which disputes are interesting.
      - Only detects contradictions that humans find.
      - Detects statements that are misleading without being wrong.
      Once we have determined that a dispute is real, could use contradiction detection and sentiment analysis to see who is on each side.
  24. A statement can be misleading without being wrong
      - GM's misleading claim that the Chevrolet Volt gets 230 miles per gallon
      - deceptively claimed that fast food could be nutritious
      Logical truth isn't all that interesting.
      We want to know if there is a different way of looking at the subject. A different frame.
  25. Mining claims from the web
  26. Use Patterns to Find Disputed Claims
      - the false claim that Himalayan glaciers could melt away by 2035
      - it is not true that anyone aged over 59 cannot receive heart repairs
      - the misconception that everyone in the south are stupid
      - the delusion that scientists in different countries do science differently
      - into believing that Van Morrison had a new baby
      - the myth that we can't afford good working conditions for everyone
      - misleadingly claimed that unemployment is lower than the '70s
      We built a simple grammar for such prefixes.
      Currently 1293 patterns, identified on ~35 million web pages, of which we have downloaded and processed 2 million.
      Restricting to prefixes allows us to search for them using Yahoo BOSS.
      Future: automatically infer a larger grammar of patterns
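
The prefix idea above can be sketched roughly as follows. This is only an illustration: the short `PREFIXES` list is taken from the slide's examples, not from the project's 1293-pattern grammar, and the function name `extract_claims` and the "stop at end of sentence" rule are assumptions.

```python
import re

# A handful of dispute prefixes copied from the slide's examples.
# The real grammar is far larger and is not reproduced here.
PREFIXES = [
    "the false claim that",
    "it is not true that",
    "the misconception that",
    "the myth that",
    "falsely claimed that",
    "misleadingly claimed that",
]

# Match any prefix, then capture the text up to the end of the sentence
# (assumed boundary: '.', '!', '?' or a newline).
PATTERN = re.compile(
    r"(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s+([^.!?\n]+)",
    re.IGNORECASE,
)


def extract_claims(text):
    """Return candidate disputed claims that follow a known prefix."""
    return [m.group(1).strip() for m in PATTERN.finditer(text)]
```

Candidates extracted this way still include fragments and ambiguous text, which is why the later slides filter them.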
  27. Some Disputes I Wasn’t Aware of
      - The Niger-Iraq Uranium connection has been discredited
      - Medieval Europeans thought the world was flat
      - Dinosaurs looked sleek and reptilian
      - Dietary Cholesterol is a problem
      - “Wear and Tear” causes arthritis
      - Specific foods cause ulcers
      Estimates from Yahoo BOSS. Not all URLs downloaded.
  28. Most Disputed Nouns
      1. God
      2. Iraq
      3. Government
      4. Obama
      5. War
      6. Israel
      7. President
      8. Women
      9. Money
      10. Jesus
  29. Search for all patterns on Yahoo BOSS
      Yahoo BOSS is an API for Yahoo search.
      BOSS API has a limit of 1000 hits per query, so salt with year and month.
        +"falsely claimed that" +2010
        +"falsely claimed that" -2010 +2009
        +"falsely claimed that" -2010 -2009 +2008
        +"falsely claimed that" -2010 -2009 -2008 +2007
      Needed for 197 patterns.
      We talked to Yahoo first...
      Future: get direct access to complete results for a pattern
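
The salting trick can be sketched as a small query generator. This is a minimal sketch under assumptions: it only builds query strings in the style shown on the slide (with spacing normalised), it salts by year only, and it does not call any search API. The function name `salted_queries` is hypothetical.

```python
def salted_queries(pattern, years):
    """Build one query per year, excluding every year already covered.

    Each query requires the quoted pattern and one year, and excludes the
    earlier years in the list, so the result sets do not overlap.
    """
    queries = []
    for i, year in enumerate(years):
        excluded = " ".join(f"-{y}" for y in years[:i])
        parts = [f'+"{pattern}"', excluded, f"+{year}"]
        queries.append(" ".join(p for p in parts if p))
    return queries


for q in salted_queries("falsely claimed that", [2010, 2009, 2008, 2007]):
    print(q)
```

Because each query excludes the years used by the earlier ones, every query can return up to the per-query hit limit without repeating earlier results.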
  30. Claims need to be filtered
      - the false claim that won't go away
      - falsely claimed that he didn't do it
      - wrongly think that the bill will pass
      - wrongly think that Great Britain doesn't
      - the myth that Elvis is alive has a long history
      - falsely claim that full commentary below
      Problem categories: fragment, ambiguous, suffix, extraction error
  31. Labeled data from Mechanical Turk
      $0.04 to label 10 claims, two of which are known.
      If a turker gets known items wrong, reject their work.
      Each claim labeled by two turkers.
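
The known-item check can be sketched as below. The slide does not say whether one wrong known item is enough to reject a batch; this sketch assumes it is. The function name `accept_submission` and the label strings in the test data are hypothetical.

```python
def accept_submission(labels, gold):
    """Accept a worker's batch only if every known (gold) item is labeled correctly.

    labels: dict mapping claim id -> label given by the worker
    gold:   dict mapping claim id -> correct label for the known items
    A missing label for a gold item counts as wrong.
    """
    return all(labels.get(claim_id) == answer for claim_id, answer in gold.items())


# Illustrative data only: two known items hidden in a batch.
gold = {"c3": "good", "c7": "bad"}
batch = {"c1": "good", "c3": "good", "c7": "bad"}
print(accept_submission(batch, gold))
```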
  32. Problem: text may not be a statement
      - the false claim that won't go away
      - the belief that works best
      - the lie that people fell for
      Current approach: Is the first word a verb?
      - finds 71% of bad claims
      - mistakenly drops 2% of good claims
      Works for first two, but not last.
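
The first-word heuristic can be illustrated with a toy check. This is a sketch under a strong assumption: a small hand-written `VERB_LIKE` set stands in for whatever part-of-speech tagging the project actually used, which the slides do not specify.

```python
# Toy verb list standing in for a real part-of-speech tagger (assumption).
VERB_LIKE = {"won't", "works", "is", "was", "can't", "doesn't", "will"}


def looks_like_fragment(claim_text):
    """Flag text whose first word is verb-like, so it is probably not a full statement."""
    words = claim_text.lower().split()
    return bool(words) and words[0] in VERB_LIKE


for text in ["won't go away", "works best", "people fell for", "Elvis is alive"]:
    print(text, "->", looks_like_fragment(text))
```

As on the slide, the check catches the first two fragments but misses "people fell for", which starts with a noun.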
  33. Problem: ambiguous claims
      - he didn't do it
      - the union was a party in the proceedings
      - the other parent is abusive
      - our troops have committed atrocities
      - property taxes are regressive
      - Obama is a communist
      Labels: Bad, Maybe, Good
      If two pages say X, do they mean the same thing?
      Turk: 61.9% agreement - often very subjective
      Future: associate claim with page topic
  34. Wikipedia links tell us what is unambiguous
      - property taxes are regressive
      - Obama is a communist
      Is this word always linked to the same thing?
      Precision: 73% Recall: 73% (vs gold data + word features)
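
One way to read "is this word always linked to the same thing?" is as a consistency score over link anchor texts. The sketch below is an assumption about how such a score might be computed, not the project's method; the function name `link_consistency` and the sample link pairs are made up for illustration.

```python
from collections import Counter, defaultdict


def link_consistency(links):
    """Map each anchor text to the share of its links that go to its most common target.

    links: iterable of (anchor_text, target_page) pairs.
    A score near 1.0 suggests the phrase is unambiguous; lower scores suggest
    the same words are used for several different things.
    """
    targets = defaultdict(Counter)
    for anchor, target in links:
        targets[anchor.lower()][target] += 1
    return {
        anchor: counts.most_common(1)[0][1] / sum(counts.values())
        for anchor, counts in targets.items()
    }


# Invented sample data, for illustration only.
sample = [
    ("property taxes", "Property tax"),
    ("Property taxes", "Property tax"),
    ("the union", "European Union"),
    ("the union", "Trade union"),
]
print(link_consistency(sample))
```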
  35. Overall structure
  36. Users enter claims that they disagree with
  37. Users add paraphrases for claims
      Alternative ways to phrase the same claim.
  38. Teach Dispute Finder to recognize claims
  39. Users add evidence to support claims
      A claim will not be shown to others unless the user finds a source that argues against it.
  40. Users identify a disputed claim on a page
      Define a new disputed claim, or add a paraphrase for an existing disputed claim.
  41. User Study Results
      Frustrated by low number of claims that were highlighted
      - motivated text mining approach
      Did not appreciate that a claim should apply to multiple pages
      - particularly when using context menu approach
      Confused about how specific a claim should be
      - E.g. “Global temperatures will rise by X degrees”
      Users created claims with ambiguous meanings
      - E.g. saying “wood” to mean “Ronnie Wood”
      Confused by double-negatives when adding evidence
      - E.g. opposes “global warming does not exist”
      Future: use users to improve mined claims
  42. Entailment
  43. Entailment is resource constrained
      Must compare many sentences against a huge number of claims in a fraction of a second.
  44. Simple lexical entailment
      Sentence: I think that global warming is just a hoax
      Claim: global warming is a hoax
      All non-stopwords present, and in the correct order.
      Very simple but:
      - it can be done very efficiently
  45. - if you have a big enough corpus then it works ok
      Future: better entailment that still scales
      Future: look at context, and other places same text appears
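
The "all non-stopwords present, and in the correct order" rule amounts to an ordered-subsequence test. The following is a minimal sketch of that rule, assuming a toy stopword list and a simple regex tokenizer; neither is taken from the project, and a production version would need an index to meet the speed constraint mentioned earlier.

```python
import re

# Toy stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "that", "of", "to", "just"}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def entails(sentence, claim):
    """True if every non-stopword of the claim occurs in the sentence, in order."""
    needed = [w for w in tokens(claim) if w not in STOPWORDS]
    remaining = iter(tokens(sentence))
    # `word in remaining` advances the iterator, so matches must appear in order.
    return bool(needed) and all(word in remaining for word in needed)


print(entails("I think that global warming is just a hoax", "global warming is a hoax"))
print(entails("a hoax about global warming", "global warming is a hoax"))
```

The second call fails because "hoax" appears before "global warming", which shows the ordering requirement at work.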
  46. What is Disputed?
      Anything disputed by anyone?
      - we get overwhelmed with claims disputed by nutcases
      Anything disputed by a “reliable source”?
      - what is a “reliable source”? (Wikipedia rules?)
      - do we end up enforcing “orthodox” beliefs and stifling debate?
      Anything disputed by a source that I would trust?
      - we reinforce existing echo-chamber problem
      Anything disputed by my friends?
      - do I agree with my friends?
      - should I be encouraged to agree with them?
      Future: learn what to show a user by analyzing their behavior
  47. Interviews: Do people want this?
      Hard to change established opinions
      - They think they already understand the issue.
      - They would have to publicly back down
      - So focus on issues they don’t yet have an opinion on?
      Hard to make someone accept the other side
      - Social identity in “us” vs “them”
      - Not willing to listen to “other side”
      - So give sources from their “own” side?
      Sometimes people may not care
      - Reading just for entertainment and conversation material
      - Don’t care much if they are wrong
      - Not interested in challenging opinions of others
      - Focus on issues that affect them personally
      Dispute Finder probably isn’t for everyone
  48. Questions?