1/44 Is there another side to this? Identifying Disputed Information on the Web
Rob Ennals, Intel Research Berkeley - firstname.lastname@example.org
Work done in collaboration with: John Mark Agosta, Dan Byler, Beth Trushkowsky, Barbara Rosario, Tad Hirsch, Tye Rattenbury
3/44 Not everything on the web is true, balanced, and objective
4/44 People increasingly rely on the web for information (source: Pew Research).
5/44 Old model: a small number of known sources (TV, radio, newspapers, book publishers). New model: a huge number of unknown sources (blogs, random websites, foreign newspapers). This is not just an issue of source credibility: if we ignore untrusted sources, we ignore a lot of the information on the web.
6/44 Dispute Finder: inform users when information that they encounter in their lives is disputed by a source that they might trust
7/44 Browser extension: the Firefox extension examines every page you browse (including email, intranet pages, etc.) and highlights claims that are disputed.
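As a rough illustration of the highlighting step, the sketch below marks occurrences of known disputed claims in a page's plain text. It is not the extension's code: the real Firefox extension works on the browser's page content, and the exact, case-insensitive matching and the `highlight` function name are assumptions made for illustration.

```python
import html
import re


def highlight(text: str, claims: list[str]) -> str:
    """Return HTML-escaped text with each disputed claim wrapped in <mark> tags.

    Matching is exact and case-insensitive (a simplification). Longer claims are
    listed first in the alternation, so at a given position the longest claim wins.
    """
    claims = [c for c in claims if c]  # an empty pattern would match everywhere
    if not claims:
        return html.escape(text)
    pattern = re.compile(
        "|".join(re.escape(c) for c in sorted(claims, key=len, reverse=True)),
        re.IGNORECASE,
    )
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()]))          # text before the claim
        out.append(f"<mark>{html.escape(m.group(0))}</mark>")  # the claim itself
        pos = m.end()
    out.append(html.escape(text[pos:]))                       # trailing text
    return "".join(out)


print(highlight("Some say the moon is made of cheese.", ["the moon is made of cheese"]))
```

Escaping the text segments and the matched claims separately keeps page text from being interpreted as markup while leaving the inserted `<mark>` tags intact.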
8/44 Click a dispute for more information: this shows sources that support or oppose the claim.
9/44 Search engine front-end: built with Yahoo BOSS; examines the text of all linked pages.
10/44 Early work: mobile voice interface. Currently an early prototype, running on a laptop and based on Dragon NaturallySpeaking. It listens to everything people say around you, keeps a list of disputed things you may have heard, and vibrates when you hear something disputed.
People seem to like it. Covered by: NPR, New Scientist, Fast Company, Christian Science Monitor, Wall Street Journal, NY Times Bay Area, San Jose Mercury, SF Chronicle, The Guardian, ACM TechNews, CBC (Canadian Broadcasting Corporation), CNET, Sacramento Bee, and many others. TG Daily: “This is hands down, the most amazing idea I’ve ever heard of when it comes to using the web”. Paper accepted for WWW 2010 and WICOW 2010.
20/44 Compare observed text to known disputes. Observed text: "Glenn Beck falsely claimed that the moon is made of cheese, despite clear evidence to the contrary." False claim: "the moon is made of cheese". Disputed by: Huffington Post, New York Times. Context: ... Entailment: "We should mine the moon because it is made of cheese"
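A known dispute can be stored as a small record holding the fields shown on this slide. The sketch below assumes a hypothetical `Dispute` schema and uses exact substring matching as a placeholder for the comparison step.

```python
from dataclasses import dataclass, field


@dataclass
class Dispute:
    """A known disputed claim (hypothetical schema mirroring the slide's fields)."""
    claim: str                                            # e.g. "the moon is made of cheese"
    disputed_by: list[str] = field(default_factory=list)  # sources arguing against it
    context: str = ""                                     # surrounding text, if kept


def disputes_in(text: str, known: list[Dispute]) -> list[Dispute]:
    """Return the known disputes whose claim appears verbatim (ignoring case) in text.

    Exact substring matching is only a placeholder for a real entailment test.
    """
    lowered = text.lower()
    return [d for d in known if d.claim.lower() in lowered]


known = [Dispute("the moon is made of cheese", ["Huffington Post", "New York Times"])]
print(disputes_in("Glenn Beck falsely claimed that the moon is made of cheese.", known))
print(disputes_in("We should mine the moon because it is made of cheese", known))  # []
```

The exact match finds the Glenn Beck sentence but misses the entailment example ("We should mine the moon because it is made of cheese"), which is why a looser entailment test is needed.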
21/44 Contradiction detection via dispute detection
22/44 Contradiction detection vs. dispute detection. Contradiction detection: does statement X logically contradict statement Y? This is hard: it needs lots of real-world knowledge. Dispute detection: does author A believe that statement X is disputed or misleading? Humans determine what is actually disputed, and humans determine which disputes are interesting. It only detects contradictions that humans find, but it also detects statements that are misleading without being wrong. Once we have determined that a dispute is real, we could use contradiction detection and sentiment analysis to see who is on each side.
23/44 A statement can be misleading without being wrong. Examples: "GM's misleading claim that the Chevrolet Volt gets 230 miles per gallon"; "deceptively claimed that fast food could be nutritious". Logical truth isn't all that interesting. We want to know whether there is a different way of looking at the subject: a different frame.
25/44 Use patterns to find disputed claims. Examples: "the false claim that Himalayan glaciers could melt away by 2035"; "it is not true that anyone aged over 59 cannot receive heart repairs"; "the misconception that everyone in the south are stupid"; "the delusion that scientists in different countries do science differently"; "into believing that Van Morrison had a new baby"; "the myth that we can't afford good working conditions for everyone"; "misleadingly claimed that unemployment is lower than the '70s". We built a simple grammar for such prefixes. Currently there are 1,293 patterns, identified on ~35 million web pages, of which we have downloaded and processed 2 million.
Restricting patterns to prefixes allows us to search for them using Yahoo BOSS. Future: automatically infer a larger grammar of patterns.
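The prefix grammar itself is not reproduced in these slides. The regular expressions below are a hypothetical miniature built from a few of the example prefixes above, and they assume that a claim ends at the next sentence-ending punctuation mark.

```python
import re

# Hypothetical miniature of the prefix grammar:
# "the <false claim|misconception|...> that" or "<adverb> <claimed|think|...> that".
NOUN_PREFIX = r"the\s+(?:false\s+claim|misconception|delusion|myth|lie)\s+that"
VERB_PREFIX = r"(?:falsely|wrongly|misleadingly)\s+(?:claimed|claim|think|believe)\s+that"
PREFIX_RE = re.compile(rf"\b(?:{NOUN_PREFIX}|{VERB_PREFIX})\s+", re.IGNORECASE)

# Assumption: a claim runs from the end of the prefix to the next sentence break.
CLAIM_END_RE = re.compile(r"[.!?;]")


def extract_claims(text: str) -> list[str]:
    """Return the text following each disputed-claim prefix, up to the end of the sentence."""
    claims = []
    for match in PREFIX_RE.finditer(text):
        rest = text[match.end():]
        end = CLAIM_END_RE.search(rest)
        claim = rest[: end.start()] if end else rest
        claims.append(claim.strip())
    return claims


print(extract_claims("the myth that Elvis is alive has a long history"))
# ['Elvis is alive has a long history']
```

The Elvis example shows a limitation of this naive cut-off: the trailing "has a long history" is swept into the claim, the kind of suffix problem that makes claim filtering necessary.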
26/44 Some disputes I wasn't aware of: the Niger-Iraq uranium connection has been discredited; medieval Europeans thought the world was flat; dinosaurs looked sleek and reptilian; dietary cholesterol is a problem; "wear and tear" causes arthritis; specific foods cause ulcers. Estimates from Yahoo BOSS; not all URLs downloaded.
Most disputed nouns: 1. God, 2. Iraq, 3. Government, 4. Obama, 5. War, 6. Israel, 7. President, 8. Women, 9. Money, 10. Jesus.
28/44 Search for all patterns on Yahoo BOSS (an API for Yahoo search).
The BOSS API has a limit of 1,000 hits per query, so we salt queries with year and month:
+"falsely claimed that" +2010
+"falsely claimed that" -2010 +2009
+"falsely claimed that" -2010 -2009 +2008
+"falsely claimed that" -2010 -2009 -2008 +2007
Salting was needed for 197 patterns. We talked to Yahoo first... Future: get direct access to the complete results for a pattern.
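The salting scheme can be generated mechanically. This sketch (the function name is hypothetical) reproduces the year-based queries shown on the slide; it salts by year only, and month-level splitting would subdivide each bucket further. Because each query excludes all newer years, a page is counted in the bucket of the newest year it mentions, and pages that mention none of the years are not retrieved at all.

```python
def salted_queries(pattern: str, newest_year: int, oldest_year: int) -> list[str]:
    """Split one phrase query into per-year queries so each stays under the hit cap.

    Query for year Y requires Y and excludes every newer year, so the buckets
    do not overlap: a page falls under the newest year it mentions.
    """
    queries = []
    for year in range(newest_year, oldest_year - 1, -1):
        excluded = " ".join(f"-{y}" for y in range(newest_year, year, -1))
        parts = [f'+"{pattern}"', excluded, f"+{year}"]
        queries.append(" ".join(p for p in parts if p))  # skip the empty exclusion
    return queries


for q in salted_queries("falsely claimed that", 2010, 2007):
    print(q)
```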
29/44 Claims need to be filtered. Examples:
the false claim that won't go away
falsely claimed that he didn't do it
wrongly think that the bill will pass
wrongly think that Great Britain doesn't
the myth that Elvis is alive has a long history
falsely claim that full commentary below
Problem types: fragment, ambiguous, suffix, extraction error.
30/44 Labeled data from Mechanical Turk: $0.04 to label 10 claims, two of which have known labels. If a turker gets the known items wrong, we reject their work. Each claim is labeled by two turkers.
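The quality-control rule and the resulting cost per label can be sketched as follows. This assumes that the two known items in each batch are gold checks rather than new data, and it ignores Mechanical Turk fees and rejected work; the function names and label strings are hypothetical.

```python
def accept_batch(answers: dict[str, str], gold: dict[str, str]) -> bool:
    """Accept a worker's batch only if every known (gold) claim got its known label."""
    return all(answers.get(claim_id) == label for claim_id, label in gold.items())


def cost_per_new_claim(price_per_batch: float = 0.04, claims_per_batch: int = 10,
                       gold_per_batch: int = 2, labels_per_claim: int = 2) -> float:
    """Dollars spent per new claim, given that gold items yield no new labels."""
    new_claims_per_batch = claims_per_batch - gold_per_batch
    return labels_per_claim * price_per_batch / new_claims_per_batch


gold = {"g1": "bad", "g2": "good"}
print(accept_batch({"c1": "good", "g1": "bad", "g2": "good"}, gold))  # True
print(round(cost_per_new_claim(), 4))  # 0.01
```

With the slide's numbers, each new claim costs 2 × $0.04 / 8 = $0.01 to label twice.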
31/44 Problem: the extracted text may not be a statement. Examples: the false claim that won't go away; the belief that works best; the lie that people fell for.
Current approach: is the first word of the extracted claim a verb? This finds 71% of bad claims but mistakenly drops 2% of good claims. It works for the first two examples, but not the last.
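The first-word heuristic can be sketched without a full part-of-speech tagger. The slides do not say which tagger was used; the tiny verb list below is a hypothetical stand-in that only covers forms like those in the examples.

```python
# Hypothetical stand-in for a part-of-speech tagger: a handful of verb forms only.
VERB_FORMS = {"won't", "will", "works", "work", "is", "are", "was", "were", "has", "have"}


def looks_like_statement(claim: str) -> bool:
    """Heuristic from the slide: drop a claim whose first word is a verb.

    A leading verb suggests the "that" in the prefix was a relative pronoun
    ("the false claim that won't go away"), so the extracted text has no subject.
    """
    words = claim.split()
    return bool(words) and words[0].lower() not in VERB_FORMS


for claim in ["won't go away", "works best", "people fell for"]:
    print(claim, looks_like_statement(claim))
```

As on the slide, it rejects "won't go away" and "works best" but keeps "people fell for", whose first word is a noun.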
32/44 Problem: ambiguous claims. Examples:
he didn't do it
the union was a party in the proceedings
the other parent is abusive
our troops have committed atrocities
property taxes are regressive
Obama is a communist
Labels: Bad / Maybe / Good. If two pages say X, do they mean the same thing? Turker agreement: 61.9% (often very subjective).
Future: associate each claim with its page topic.
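The slide reports 61.9% agreement between turkers without saying how it was computed. Raw percent agreement, sketched below with a hypothetical function name and made-up labels, is one plausible reading (an assumption); a chance-corrected measure such as Cohen's kappa would be a stricter alternative.

```python
def percent_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Percentage of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * same / len(labels_a)


# Made-up labels on the Bad / Maybe / Good scale for four claims.
print(percent_agreement(["Bad", "Maybe", "Good", "Good"],
                        ["Bad", "Good", "Good", "Maybe"]))  # 50.0
```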
33/44 Wikipedia links tell us what is unambiguous. Examples: property taxes are regressive; Obama is a communist. Feature: is this word always linked to the same thing? Precision: 73%, recall: 73% (vs. gold data + word features).
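One way to compute the "is this word always linked to the same thing?" feature is to count, for each anchor text, how often Wikipedia links it to its most common target article. The function name and the link counts in the example are hypothetical, and the slides do not say how this feature is combined with the others.

```python
from collections import Counter, defaultdict


def link_consistency(links: list[tuple[str, str]]) -> dict[str, float]:
    """For each anchor text, the share of its links that point to its most common target.

    1.0 means the phrase is always linked to the same article (likely unambiguous).
    """
    targets: dict[str, Counter] = defaultdict(Counter)
    for anchor, target in links:
        targets[anchor.lower()][target] += 1
    return {
        anchor: counts.most_common(1)[0][1] / sum(counts.values())
        for anchor, counts in targets.items()
    }


# Made-up (anchor text, target article) pairs.
links = [("property tax", "Property tax"), ("property tax", "Property tax"),
         ("wood", "Wood"), ("wood", "Ronnie Wood"), ("wood", "Wood"), ("wood", "Wood")]
print(link_consistency(links))  # {'property tax': 1.0, 'wood': 0.75}
```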
35/44 Users enter claims that they disagree with.
36/44 Users add paraphrases for claims: alternative ways to phrase the same claim.
37/44 Teach Dispute Finder to recognize claims.
38/44 Users add evidence to support claims. A claim will not be shown to others unless the user finds a source that argues against it.
39/44 Users identify a disputed claim on a page: define a new disputed claim, or add a paraphrase for an existing disputed claim.
40/44 User study results. Users were frustrated by the low number of claims that were highlighted, which motivated the text-mining approach. They did not appreciate that a claim should apply to multiple pages, particularly when using the context-menu approach. They were confused about how specific a claim should be, e.g. "Global temperatures will rise by X degrees". They created claims with ambiguous meanings, e.g. "wood" to mean "Ronnie Wood". They were confused by double negatives when adding evidence, e.g. evidence that opposes the claim "global warming does not exist". Future: use users to improve mined claims.
42/44 Entailment is resource-constrained: we must compare many sentences against a huge number of claims in a fraction of a second.
43/44 Simple lexical entailment. The sentence "I think that global warming is just a hoax" entails the claim "global warming is a hoax": all of the claim's non-stopwords are present, and in the correct order. Very simple, but:
it can be done very efficiently
if you have a big enough corpus, it works OK
Future: better entailment that still scales. Future: look at the context, and at other places where the same text appears.
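The lexical entailment test amounts to an in-order subsequence check over non-stopwords. The sketch below assumes a small illustrative stopword list, since the slides do not say which list was used. To keep the check cheap against a huge number of claims, it also indexes each claim under its first content word, so a sentence is only compared with claims whose first content word it contains; this indexing scheme is an assumption, not necessarily how Dispute Finder does it.

```python
import re
from collections import defaultdict

# Illustrative stopword list (assumption).
STOPWORDS = {"a", "an", "the", "i", "it", "we", "is", "are", "was", "were", "be",
             "that", "of", "to", "and", "in", "on"}


def content_words(text: str) -> list[str]:
    """Lowercased non-stopword tokens, in their original order."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def lexically_entails(sentence: str, claim: str) -> bool:
    """True if every non-stopword of the claim appears in the sentence, in the same order."""
    remaining = iter(content_words(sentence))
    # "word in remaining" consumes the iterator up to the match, enforcing order.
    return all(word in remaining for word in content_words(claim))


def build_index(claims: list[str]) -> dict[str, list[str]]:
    """Index each claim under its first content word."""
    index: dict[str, list[str]] = defaultdict(list)
    for claim in claims:
        words = content_words(claim)
        if words:  # claims made only of stopwords cannot be indexed this way
            index[words[0]].append(claim)
    return index


def matching_claims(sentence: str, index: dict[str, list[str]]) -> list[str]:
    """Check the sentence only against claims indexed under one of its words."""
    found = []
    for word in set(content_words(sentence)):
        for claim in index.get(word, []):
            if lexically_entails(sentence, claim):
                found.append(claim)
    return found


index = build_index(["global warming is a hoax", "the moon is made of cheese"])
print(matching_claims("I think that global warming is just a hoax", index))
# ['global warming is a hoax']
```

The same test also matches the earlier example "We should mine the moon because it is made of cheese" against "the moon is made of cheese", which an exact substring match misses.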
44/44 What is disputed?
- Anything disputed by anyone? We get overwhelmed with claims disputed by nutcases.
- Anything disputed by a "reliable source"? What is a "reliable source" (Wikipedia rules?)? Do we end up enforcing "orthodox" beliefs and stifling debate?
- Anything disputed by a source that I would trust? We reinforce the existing echo-chamber problem.
- Anything disputed by my friends? Do I agree with my friends? Should I be encouraged to agree with them?
Future: learn what to show a user by analyzing their behavior.
45/44 Interviews: do people want this?
- Hard to change established opinions: people think they already understand the issue, and they would have to publicly back down. So focus on issues they don't yet have an opinion on?
- Hard to make someone accept the other side: social identity in "us" vs. "them"; people are not willing to listen to the "other side". So give sources from their "own" side?
- Sometimes people may not care: they read just for entertainment and conversation material, don't care much if they are wrong, and are not interested in challenging the opinions of others. Focus on issues that affect them personally.
Dispute Finder probably isn't for everyone.