Presented at the 31st ACM User Interface Software and Technology Symposium (UIST), 2018. Paper: https://www.ischool.utexas.edu/~ml/papers/nguyen-uist18.pdf
Talk given at Delft University speaker series on "Crowd Computing & Human-Centered AI" (https://www.academicfringe.org/). November 23, 2020. Covers two 2020 works:
(1) Anubrata Das, Brandon Dang, and Matthew Lease. Fast, Accurate, and Healthier: Interactive Blurring Helps Moderators Reduce Exposure to Harmful Content. In Proceedings of the 8th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2020.
(2) Alexander Braylan and Matthew Lease. Modeling and Aggregation of Complex Annotations via Annotation Distances. In Proceedings of the Web Conference, pages 1807--1818, 2020.
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno... Matthew Lease
This document summarizes a presentation about designing human-AI partnerships for fact-checking misinformation. It discusses using crowdsourced rationales to improve the accuracy and cost-efficiency of annotation tasks. It also addresses challenges in designing interfaces for automatic fact-checking models, such as integrating human knowledge and reasoning to correct errors and account for bias. The goal is to develop mixed-initiative systems where humans and AI can jointly reason and personalize fact-checking.
Designing Human-AI Partnerships to Combat Misinformation Matthew Lease
The document discusses designing human-AI partnerships to combat misinformation. It describes a prototype partnership where a human and AI work together to fact-check claims. The partnership aims to make the AI more transparent and address user bias by allowing the user to adjust the perceived reliability of news sources, which then changes the AI's political leaning analysis and fact checking results. The discussion wraps up by noting challenges like avoiding echo chambers and assessing potential harms, as well as opportunities for AI to reduce bias and increase trust through explainable, interactive systems.
Explainable Fact Checking with Humans in-the-loop Matthew Lease
Invited Keynote at KDD 2021 TrueFact Workshop: Making a Credible Web for Tomorrow, August 15, 2021.
https://www.microsoft.com/en-us/research/event/kdd-2021-truefact-workshop-making-a-credible-web-for-tomorrow/#!program-schedule
Talk given August 29, 2018 at the 1st Biannual Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES 2018). Paper: https://www.ischool.utexas.edu/~ml/papers/lease-desires18.pdf
Presentation given at the Linguistic Data Consortium (LDC), University of Pennsylvania, April 2019. Based on presentations at the 6th ACM Collective Intelligence Conference, 2018 and the 6th AAAI Conference on Human Computation & Crowdsourcing (HCOMP), 2018. Blog post: https://blog.humancomputation.com/?p=9932.
AI & Work, with Transparency & the Crowd Matthew Lease
The document discusses designing human-AI partnerships and the role of crowdsourcing in AI systems. It summarizes work on designing AI assistants to work with humans, using crowds to help fact-check information, and explores challenges around protecting crowd workers who review harmful content or do "dirty jobs". It advocates for more research on ethics in AI and using crowds to help check work for ethical issues.
Introduction to Data Science and Large-scale Machine Learning Nik Spirin
This document is a presentation about data science and artificial intelligence given by James G. Shanahan. It provides an outline that covers topics such as machine learning, data science applications, architecture, and future directions. Shanahan has over 25 years of experience in data science and currently works as an independent consultant and teaches at UC Berkeley. The presentation provides background on artificial intelligence and machine learning techniques as well as examples of their successful applications.
Smart Data - How you and I will exploit Big Data for personalized digital hea... Amit Sheth
Amit Sheth's keynote at IEEE BigData 2014, Oct 29, 2014.
Abstract from:
http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is personalized digital health, which relates to making better decisions about our health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around humans, on the humans, and inside/within the humans), public health signals (e.g., information coming from the healthcare system such as hospital admissions), and population health signals (such as Tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, for all the data relevant to my child with the four V-challenges, what I care about is simply, “How is her current health, and what is the risk of her having an asthma attack in her current situation (now and today), especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in cognitive models.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city.
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver... Saurabh Mishra
This group reviewed data and measurements indicating the positive potential of AI to serve Sustainable Development Goals (SDG’s). Alongside these optimistic inquiries, this group also investigated the risks of AI in areas such as privacy, vulnerable populations, human rights, workplace and organizational policy. The socio-political consequences of AI raise many complex questions which require continued rigorous examination.
Philosophy of Big Data: Big Data, the Individual, and Society Melanie Swan
Philosophical concepts elucidate the impact the Big Data Era (exabytes per year of scientific, governmental, corporate, and personal data being created) is having on our sense of ourselves as individuals in society: information generators in constant dialogue with a pervasive information climate.
Citizen Sensor Data Mining, Social Media Analytics and Applications Amit Sheth
Opening talk at Singapore Symposium on Sentiment Analysis (S3A), February 6, 2015, Singapore. http://s3a.sentic.net/#s3a2015
Abstract
With the rapid rise in the popularity of social media, and near-ubiquitous mobile access, the sharing of observations and opinions has become commonplace. This has given us unprecedented access to the pulse of a populace and the ability to perform analytics on social data to support a variety of socially intelligent applications -- be it for brand tracking and management, crisis coordination, organizing revolutions, or promoting social development in underdeveloped and developing countries.
I will review: 1) understanding and analysis of informal text, esp. microblogs (e.g., issues of cultural entity extraction and role of semantic/background knowledge enhanced techniques), and 2) how we built Twitris, a comprehensive social media analytics (social intelligence) platform.
I will describe the analysis capabilities along three dimensions: spatio-temporal-thematic, people-content-network, and sentiment-emotion-intent. I will couple technical insights with identification of computational techniques and real-world examples using live demos of Twitris (http://twitris2.knoesis.org).
A talk at the Urban Science workshop at the Puget Sound Regional Council, July 20, 2014, organized by the Northwest Institute for Advanced Computing, a joint effort between Pacific Northwest National Laboratory and the University of Washington.
A key contemporary trend emerging in big data science is the Quantified Self (QS) - individuals engaged in the deliberate self-tracking of any kind of biological, physical, behavioral, or transactional information as n=1 individuals or in groups. This is giving rise to interesting pools of individual data, group data, and big data which can be interlinked to create a new era of highly-targeted value-specific consumer applications. There are significant opportunities in big data to develop models to support QS data collection, integration, analysis, and use for personal lifestyle and consumption management. There are also opportunities to provide leadership in designing consumer-friendly standards and etiquette regarding the use of personal and collective data. Next-generation QS big data applications and services could include tools for rendering QS data meaningful in behavior change, establishing baselines and variability in objective metrics, applying new kinds of pattern recognition techniques, and aggregating multiple self-tracking data streams from wearable electronics, biosensors, mobile phones, genomic data, and cloud-based services. Potential limitations regarding QS activity need to be considered including consumer non-adoption, data privacy and sharing concerns, the digital divide, ease-of-use, and social acceptance.
Presented at the Panel on Sensor, Data, Analytics and Integration in Advanced Manufacturing, at the Connected Manufacturing track of the Bosch-USA-organized "Leveraging Public-Private Partnerships for Regional Growth Summit". Panel statement: Sensors, data and analytics are the core of any smart manufacturing system. What are the main challenges to create actionable outputs, replicate systems and scale efficiency gains across industries?
Moderator: Thomas Stiedl, Bosch
Panelists:
1. Amit Sheth, Wright State University
2. Howie Choset, Carnegie Mellon University
3. Nagi Gebraeel, Georgia Institute of Technology
4. Brian Anthony, Massachusetts Institute of Technology
5. Yarom Polsky, Oak Ridge National Laboratory
For an in-depth look:
Smart IoT: IoT as a human agent, human extension, and human complement
http://amitsheth.blogspot.com/2015/03/smart-iot-iot-as-human-agent-human.html
Semantic Gateway: http://knoesis.org/library/resource.php?id=2154
SSN Ontology: http://knoesis.org/library/resource.php?id=1659
Applications of Multimodal Physical (IoT), Cyber and Social Data for Reliable and Actionable Insights: http://knoesis.org/library/resource.php?id=2018
Smart Data: Transforming Big Data into Smart Data...: http://wiki.knoesis.org/index.php/Smart_Data
Historic use of the term Smart Data (2004): http://www.scribd.com/doc/186588820
Designing Cybersecurity Policies with Field Experiments Gene Moo Lee
This document summarizes Gene Moo Lee's research on using randomized field experiments to evaluate the effectiveness of cybersecurity policies at the organizational level. The research aims to set up an independent institution to monitor organizations' cybersecurity levels and evaluate how information disclosure impacts behavior. The experiment involved randomly assigning over 7,900 US organizations to control, private disclosure, or public disclosure treatment groups. Preliminary results found that private disclosure did not change behavior but public disclosure via a website reduced spam volumes, especially for organizations that initially had large spam volumes or less competition. Further analysis of the effects is ongoing.
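The core of the experimental design described above is balanced random assignment of organizations to control, private-disclosure, and public-disclosure arms. The sketch below is an illustrative reconstruction of that step only; the function and arm names are my own, not code from the study.

```python
import random

def assign_treatments(org_ids, arms=("control", "private", "public"), seed=0):
    """Randomly assign each organization to one treatment arm.

    Shuffling once and dealing round-robin keeps group sizes within
    one of each other, so the arms are as balanced as possible.
    """
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    ids = list(org_ids)
    rng.shuffle(ids)
    return {org: arms[i % len(arms)] for i, org in enumerate(ids)}

# ~7,900 organizations, as in the experiment described above
groups = assign_treatments(range(7900))
```

A real field experiment would likely stratify on covariates (e.g., baseline spam volume) before randomizing; the round-robin deal shown here is the simplest balanced variant.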
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
The Philosophy of Big Data is the branch of philosophy concerned with the foundations, methods, and implications of big data: the definitions, meaning, conceptualization, knowledge possibilities, truth standards, and practices in situations involving very large data sets that are big in volume, velocity, variety, veracity, and variability.
The document provides an overview of funding and active projects at Kno.e.sis as of December 2015. Key details include total extramural funds exceeding $8.3 million with the majority obtained that year from competitive NSF and NIH sources. Active projects focus on areas such as context-aware harassment detection on social media, monitoring drug trends on social media, disaster management using social and physical sensing, and modeling social behavior for healthcare utilization in depression. The summary highlights student and faculty involvement and accomplishments across multiple funded projects.
Teaching, Assessment and Learning Analytics: Time to Question Assumptions Simon Buckingham Shum
Presented by the Assessment Research Centre
and the Melbourne Centre for the Study of Higher Education
Teaching, Assessment and Learning Analytics: Time to Question Assumptions
Simon Buckingham Shum
Professor of Learning Informatics, and Director of the Connected Intelligence Centre (CIC)
University of Technology Sydney
When: 11.30 am - 12.30 pm, Wed. 13 Sep 2017
Where: Frank Tate Room, Level 9, 100 Leicester St, Carlton
This will be a non-technical talk accessible to a broad range of educational practitioners and researchers, designed to provoke a conversation that provides time to question assumptions. The field of Learning Analytics sits at the convergence of two fields: Learning (including learning technology, educational research and learning/assessment sciences) and Analytics (statistics; visualisation; computer science; data science; AI). Many would add Human-Computer Interaction (e.g. participatory design; user experience; usability evaluation) as a differentiator from related fields such as Educational Data Mining, since the Learning Analytics community attracts many with a concern for the sociotechnical implications of designing and embedding analytics in educational organisations.
Learning Analytics is viewed by many educators with the same suspicion they reserve for AI or “learning management systems”. While in some cases this is justified, I will question other assumptions with some learning analytics examples which can serve as objects for us to think with. I am curious to know what connections/questions arise when these are shared.
Simon Buckingham Shum is Professor of Learning Informatics at the University of Technology Sydney, where he was appointed in August 2014 to direct the new Connected Intelligence Centre. Previously he was Professor of Learning Informatics and an Associate Director at The UK Open University’s Knowledge Media Institute. He is active in the field of Learning Analytics as a co-founder and former Vice President of the Society for Learning Analytics Research, and Program Co-Chair of LAK18, the International Learning Analytics and Knowledge Conference. Previously he co-founded the Compendium Institute and Learning Emergence networks. Simon brings a Human-Centred Informatics (HCI) approach to his work, with a background in Psychology (BSc, York), Ergonomics (MSc, London) and HCI Design Argumentation (PhD, York). He co-edited Visualizing Argumentation (2003) followed by Knowledge Cartography (2008, 2nd Edn. 2014), and with Al Selvin, wrote Constructing Knowledge Art (2015). He was recently appointed as a Fellow of The RSA. http://Simon.BuckinghamShum.net
The document discusses learning analytics and cognitive automation, and their implications for education. It begins by outlining how cognitive automation is automating routine cognitive work. This will impact learning analytics, as analytics aggregate lower-level data and AI automates routine cognitive tasks. As a result, humans must focus on higher-order skills like creativity, ethics, resilience and curiosity. The document then provides examples of learning analytics research focusing on dispositions, teamwork and learning beyond the classroom. It argues analytics could assess holistic development if they evaluate integration of knowledge, skills and dispositions over time.
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ... Amit Sheth
Featured Keynote at Worldcomp'14, July 2014: http://www.world-academy-of-science.org/worldcomp14/ws/keynotes/keynote_sheth
Video of the talk at: http://youtu.be/2991W7OBLqU
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is human health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around humans, on the humans, and inside/within the humans), public health signals (information coming from the healthcare system such as hospital admissions), and population health signals (such as Tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information, etc.). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will put forward the concept of Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If I am an asthma patient, for all the data relevant to me with the four V-challenges, what I care about is simply, “How is my current health, and what is the risk of having an asthma attack in my personal situation, especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.
Strategic Network Formation in a Location-Based Social Network Gene Moo Lee
This document summarizes Gene Moo Lee's presentation on strategic network formation in location-based social networks (LBSNs). It introduces three research questions about how mobile users form friendships and how user similarity can be measured. It then provides an overview of Lee's structural model of network formation, his approach to measuring user similarity using topic models, and an empirical analysis using a large LBSN dataset to examine how different factors influence friendship links.
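Measuring user similarity with topic models, as described above, typically reduces to comparing users' topic-proportion vectors. The sketch below shows one common choice, cosine similarity; the vectors and function name are illustrative assumptions, not taken from Lee's model.

```python
import math

def topic_similarity(p, q):
    """Cosine similarity between two users' topic-proportion vectors
    (e.g., distributions over latent topics inferred from their posts).

    Returns a value in [0, 1] for non-negative topic proportions,
    with 1.0 meaning identical topic mixes.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# hypothetical 3-topic mixes for two users
user_a = [0.7, 0.2, 0.1]
user_b = [0.6, 0.3, 0.1]
sim = topic_similarity(user_a, user_b)
```

Such a pairwise similarity score could then enter a network-formation model as one covariate predicting whether a friendship link forms.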
This document discusses algorithmic fairness and the impacts of machine learning and AI systems on society. It provides an outline of topics to be covered, including sources of algorithmic bias, examination of key research papers in the field, and sketching out a different direction for the discussion. The document reviews several influential papers that proposed definitions of fairness for algorithms and techniques for achieving fairness, but also showed limitations and tradeoffs. It discusses how combining the results of these papers suggests that achieving perfect algorithmic fairness through technical means alone may not be possible. The document argues for taking a broader view of the problem that considers social and normative issues, in addition to technical approaches.
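One of the fairness definitions such papers formalize, demographic parity, is simple enough to illustrate with a toy metric. The sketch below is illustrative only and not drawn from any specific paper discussed; the data and function name are assumptions.

```python
def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rates across groups.

    A gap of 0 satisfies demographic parity; the impossibility results
    alluded to above show this generally cannot be achieved jointly
    with other fairness criteria such as calibration.
    """
    rates = {}
    for g in set(groups):
        member_preds = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(member_preds) / len(member_preds)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]

# toy binary predictions for two groups of four people each
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, groups)  # rates 0.75 vs 0.25, gap 0.5
```

Even this toy example shows why "fairness" needs a social framing: whether a 0.5 gap is unjust depends on context the metric cannot see.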
The document discusses how digital infrastructures like algorithms, social media platforms, and search engines reflect and perpetuate existing biases and inequities in society. It provides examples of how algorithms have exhibited biases against women in hiring and racial disparities in criminal risk assessments. Social media platforms have enabled the spread of misinformation and the harvesting of personal data without consent. Search engines aim to provide unbiased access to information but can also normalize hateful or extremist views through their commercial priorities and lack of vetting. Overall, the document argues that digital infrastructures are shaped by the values of their creators and serve to preserve existing power structures and inequities unless changes are made.
Introduction to Data Science and Large-scale Machine LearningNik Spirin
This document is a presentation about data science and artificial intelligence given by James G. Shanahan. It provides an outline that covers topics such as machine learning, data science applications, architecture, and future directions. Shanahan has over 25 years of experience in data science and currently works as an independent consultant and teaches at UC Berkeley. The presentation provides background on artificial intelligence and machine learning techniques as well as examples of their successful applications.
Smart Data - How you and I will exploit Big Data for personalized digital hea...Amit Sheth
Amit Sheth's keynote at IEEE BigData 2014, Oct 29, 2014.
Abstract from:
http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there is rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is personalized digital health that related to taking better decisions about our health, fitness, and well-being. Consider for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around humans, on the humans, and inside/within the humans), public health signals (e.g., information coming from the healthcare system such as hospital admissions), and population health signals (such as Tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has different set of relevant data!
In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, for all the data relevant to my child with the four V-challenges, what I care about is simply, “How is her current health, and what are the risk of having an asthma attack in her current situation (now and today), especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in the cognitive models.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city.
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...Saurabh Mishra
This group reviewed data and measurements indicating the positive potential of AI to serve Sustainable Development Goals (SDG’s). Alongside these optimistic inquiries, this group also investigated the risks of AI in areas such as privacy, vulnerable populations, human rights, workplace and organizational policy. The socio-political consequences of AI raise many complex questions which require continued rigorous examination.
Philosophy of Big Data: Big Data, the Individual, and SocietyMelanie Swan
Philosophical concepts elucidate the impact the Big Data Era (exabytes/year of scientific, governmental, corporate, personal data being created) is having on our sense of ourselves as individuals in society as information generators in constant dialogue with the pervasive information climate.
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsAmit Sheth
Opening talk at Singapore Symposium on Sentiment Analysis (S3A), February 6, 2015, Singapore. http://s3a.sentic.net/#s3a2015
Abstract
With the rapid rise in the popularity of social media, and near ubiquitous mobile access, the sharing of observations and opinions has become common-place. This has given us an unprecedented access to the pulse of a populace and the ability to perform analytics on social data to support a variety of socially intelligent applications -- be it for brand tracking and management, crisis coordination, organizing revolutions or promoting social development in underdeveloped and developing countries.
I will review: 1) understanding and analysis of informal text, esp. microblogs (e.g., issues of cultural entity extraction and role of semantic/background knowledge enhanced techniques), and 2) how we built Twitris, a comprehensive social media analytics (social intelligence) platform.
I will describe the analysis capabilities along three dimensions: spatio-temporal-thematic, people-content-network, and sentiment-emption-intent. I will couple technical insights with identification of computational techniques and real-world examples using live demos of Twitris (http://twitris2.knoesis.org).
A talk at the Urban Science workshop at the Puget Sound Regional Council July 20 2014 organized by the Northwest Institute for Advanced Computing, a joint effort between Pacific Northwest National Labs and the University of Washington.
A key contemporary trend emerging in big data science is the Quantified Self (QS) - individuals engaged in the deliberate self-tracking of any kind of biological, physical, behavioral, or transactional information as n=1 individuals or in groups. This is giving rise to interesting pools of individual data, group data, and big data which can be interlinked to create a new era of highly-targeted value-specific consumer applications. There are significant opportunities in big data to develop models to support QS data collection, integration, analysis, and use for personal lifestyle and consumption management. There are also opportunities to provide leadership in designing consumer-friendly standards and etiquette regarding the use of personal and collective data. Next-generation QS big data applications and services could include tools for rendering QS data meaningful in behavior change, establishing baselines and variability in objective metrics, applying new kinds of pattern recognition techniques, and aggregating multiple self-tracking data streams from wearable electronics, biosensors, mobile phones, genomic data, and cloud-based services. Potential limitations regarding QS activity need to be considered including consumer non-adoption, data privacy and sharing concerns, the digital divide, ease-of-use, and social acceptance.
Presented at the panel on Sensors, Data, Analytics and Integration in Advanced Manufacturing, in the Connected Manufacturing track of the Bosch-USA-organized "Leveraging Public-Private Partnerships for Regional Growth" Summit. Panel statement: Sensors, data and analytics are the core of any smart manufacturing system. What are the main challenges in creating actionable outputs, replicating systems, and scaling efficiency gains across industries?
Moderator: Thomas Stiedl, Bosch
Panelists:
1. Amit Sheth, Wright State University
2. Howie Choset, Carnegie Mellon University
3. Nagi Gebraeel, Georgia Institute of Technology
4. Brian Anthony, Massachusetts Institute of Technology
5. Yarom Polsky, Oak Ridge National Laboratory
For an in-depth look:
Smart IoT: IoT as a human agent, human extension, and human complement
http://amitsheth.blogspot.com/2015/03/smart-iot-iot-as-human-agent-human.html
Semantic Gateway: http://knoesis.org/library/resource.php?id=2154
SSN Ontology: http://knoesis.org/library/resource.php?id=1659
Applications of Multimodal Physical (IoT), Cyber and Social Data for Reliable and Actionable Insights: http://knoesis.org/library/resource.php?id=2018
Smart Data: Transforming Big Data into Smart Data...: http://wiki.knoesis.org/index.php/Smart_Data
Historic use of the term Smart Data (2004): http://www.scribd.com/doc/186588820
Designing Cybersecurity Policies with Field Experiments - Gene Moo Lee
This document summarizes Gene Moo Lee's research on using randomized field experiments to evaluate the effectiveness of cybersecurity policies at the organizational level. The research aims to set up an independent institution to monitor organizations' cybersecurity levels and evaluate how information disclosure impacts behavior. The experiment involved randomly assigning over 7,900 US organizations to control, private disclosure, or public disclosure treatment groups. Preliminary results found that private disclosure did not change behavior but public disclosure via a website reduced spam volumes, especially for organizations that initially had large spam volumes or less competition. Further analysis of the effects is ongoing.
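The experimental design described above can be sketched in a few lines; the group names and the spam-volume comparison below are illustrative assumptions, not the authors' actual analysis code:

```python
import random
from statistics import mean

def assign_groups(org_ids, seed=0):
    """Randomly assign organizations to the three experimental arms."""
    rng = random.Random(seed)
    shuffled = list(org_ids)
    rng.shuffle(shuffled)
    arms = ("control", "private_disclosure", "public_disclosure")
    return {org: arms[i % 3] for i, org in enumerate(shuffled)}

def avg_spam_change(spam_before, spam_after, assignment, arm):
    """Mean change in spam volume for organizations in one arm."""
    deltas = [spam_after[o] - spam_before[o]
              for o, a in assignment.items() if a == arm]
    return mean(deltas)
```

Because assignment is randomized, a larger post-treatment drop in the public-disclosure arm than in the control arm can be attributed to the disclosure itself rather than to pre-existing differences between organizations.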
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
The Philosophy of Big Data is the branch of philosophy concerned with the foundations, methods, and implications of big data: the definitions, meaning, conceptualization, knowledge possibilities, truth standards, and practices in situations involving very large data sets that are big in volume, velocity, variety, veracity, and variability.
The document provides an overview of funding and active projects at Kno.e.sis as of December 2015. Key details include total extramural funds exceeding $8.3 million with the majority obtained that year from competitive NSF and NIH sources. Active projects focus on areas such as context-aware harassment detection on social media, monitoring drug trends on social media, disaster management using social and physical sensing, and modeling social behavior for healthcare utilization in depression. The summary highlights student and faculty involvement and accomplishments across multiple funded projects.
Teaching, Assessment and Learning Analytics: Time to Question Assumptions - Simon Buckingham Shum
Presented by the Assessment Research Centre
and the Melbourne Centre for the Study of Higher Education
Teaching, Assessment and Learning Analytics: Time to Question Assumptions
Simon Buckingham Shum
Professor of Learning Informatics, and Director of the Connected Intelligence Centre (CIC)
University of Technology Sydney
When: 11.30 am - 12.30 pm, Wed. 13 Sep 2017
Where: Frank Tate Room, Level 9, 100 Leicester St, Carlton
This will be a non-technical talk accessible to a broad range of educational practitioners and researchers, designed to provoke a conversation that provides time to question assumptions. The field of Learning Analytics sits at the convergence of two fields: Learning (including learning technology, educational research and learning/assessment sciences) and Analytics (statistics; visualisation; computer science; data science; AI). Many would add Human-Computer Interaction (e.g. participatory design; user experience; usability evaluation) as a differentiator from related fields such as Educational Data Mining, since the Learning Analytics community attracts many with a concern for the sociotechnical implications of designing and embedding analytics in educational organisations.
Learning Analytics is viewed by many educators with the same suspicion they reserve for AI or “learning management systems”. While in some cases this is justified, I will question other assumptions with some learning analytics examples which can serve as objects for us to think with. I am curious to know what connections/questions arise when these are shared.
Simon Buckingham Shum is Professor of Learning Informatics at the University of Technology Sydney, where he was appointed in August 2014 to direct the new Connected Intelligence Centre. Previously he was Professor of Learning Informatics and an Associate Director at The UK Open University’s Knowledge Media Institute. He is active in the field of Learning Analytics as a co-founder and former Vice President of the Society for Learning Analytics Research, and Program Co-Chair of LAK18, the International Learning Analytics and Knowledge Conference. Previously he co-founded the Compendium Institute and Learning Emergence networks. Simon brings a Human-Centred Informatics (HCI) approach to his work, with a background in Psychology (BSc, York), Ergonomics (MSc, London) and HCI Design Argumentation (PhD, York). He co-edited Visualizing Argumentation (2003) followed by Knowledge Cartography (2008, 2nd Edn. 2014), and with Al Selvin, wrote Constructing Knowledge Art (2015). He was recently appointed as a Fellow of The RSA. http://Simon.BuckinghamShum.net
The document discusses learning analytics and cognitive automation, and their implications for education. It begins by outlining how cognitive automation is automating routine cognitive work. This will impact learning analytics, as analytics aggregate lower-level data and AI automates routine cognitive tasks. As a result, humans must focus on higher-order skills like creativity, ethics, resilience and curiosity. The document then provides examples of learning analytics research focusing on dispositions, teamwork and learning beyond the classroom. It argues analytics could assess holistic development if they evaluate integration of knowledge, skills and dispositions over time.
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ... - Amit Sheth
Featured Keynote at Worldcomp'14, July 2014: http://www.world-academy-of-science.org/worldcomp14/ws/keynotes/keynote_sheth
Video of the talk at: http://youtu.be/2991W7OBLqU
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there is rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is human health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around humans, on the humans, and inside/within the humans), public health signals (information coming from the healthcare system such as hospital admissions), and population health signals (such as Tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information, etc.). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will put forward the concept of Smart Data, which is realized by extracting value from Big Data to benefit not just large companies but each individual. If I am an asthma patient, for all the data relevant to me with the four V-challenges, what I care about is simply, “How is my current health, and what is the risk of having an asthma attack in my personal situation, especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.
Strategic Network Formation in a Location-Based Social Network - Gene Moo Lee
This document summarizes Gene Moo Lee's presentation on strategic network formation in location-based social networks. It introduces three research questions about how mobile users form friendships and measures user similarity. It then provides an overview of Lee's structural model of network formation, approach to measuring user similarity using topic models, and empirical analysis using a large LBSN dataset to examine how different factors influence friendship links.
This document discusses algorithmic fairness and the impacts of machine learning and AI systems on society. It provides an outline of topics to be covered, including sources of algorithmic bias, examination of key research papers in the field, and sketching out a different direction for the discussion. The document reviews several influential papers that proposed definitions of fairness for algorithms and techniques for achieving fairness, but also showed limitations and tradeoffs. It discusses how combining the results of these papers suggests that achieving perfect algorithmic fairness through technical means alone may not be possible. The document argues for taking a broader view of the problem that considers social and normative issues, in addition to technical approaches.
The document discusses how digital infrastructures like algorithms, social media platforms, and search engines reflect and perpetuate existing biases and inequities in society. It provides examples of how algorithms have exhibited biases against women in hiring and racial disparities in criminal risk assessments. Social media platforms have enabled the spread of misinformation and the harvesting of personal data without consent. Search engines aim to provide unbiased access to information but can also normalize hateful or extremist views through their commercial priorities and lack of vetting. Overall, the document argues that digital infrastructures are shaped by the values of their creators and serve to preserve existing power structures and inequities unless changes are made.
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ... - Alex Pinto
Implementing an appropriate data processing pipeline to make good use of your indicators of compromise is a problem that has been successfully addressed over the last few years. Even with all the push for automation and orchestration, a fundamental question remains: WHICH data should I be ingesting in my detection pipelines? There is no lack of data available, shared or not, paid or not. But how do I keep my CTI/IR team from spinning their wheels on a pile of CTI mud?
This talk will discuss statistical analysis you can do with the CTI indicators you collect and your own network telemetry to define:
- FIT: How well the CTI data applies to your own traffic. CTI vendors always talk about vertical-specific threats, but is that measurable and verifiable?
- IMPACT: How much your true-positive detections were assisted by matches and link analysis derived from those CTI feeds.
- COVERAGE: Is your current mix of CTI feeds providing "intelligence" on the current threats that you should actually be concerned with?
Those concepts will be introduced and explained with minimal math background needed, and pseudo-code (and real code!) will be provided to assist organizations in performing those experiments in their own environments. We hope those tools will help attendees better evaluate the quality of the CTI feeds they ingest from their open sources, paid providers, and sharing communities.
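One simple way to operationalize the FIT and IMPACT ideas is as overlap ratios between a feed and your own telemetry. This is a minimal sketch under assumed definitions (indicator-set intersection for FIT, matched true-positive fraction for IMPACT), not the speaker's actual formulas:

```python
def fit_score(feed_indicators, observed_indicators):
    """FIT: share of a feed's indicators that ever appear in your own telemetry.

    A feed full of indicators you never observe contributes little detection value.
    """
    feed = set(feed_indicators)
    if not feed:
        return 0.0
    return len(feed & set(observed_indicators)) / len(feed)

def impact_score(true_positive_alerts, feed_indicators):
    """IMPACT: share of confirmed (true-positive) alerts that matched a feed indicator."""
    feed = set(feed_indicators)
    if not true_positive_alerts:
        return 0.0
    matched = sum(1 for alert in true_positive_alerts
                  if alert["indicator"] in feed)
    return matched / len(true_positive_alerts)
```

For example, a feed of four indicators of which two ever appear in your traffic would score FIT = 0.5; comparing such scores across feeds helps decide which subscriptions actually fit your environment.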
How do we train AI to be Ethical and Unbiased? - Mark Borg
The document discusses recent achievements in AI such as improvements in speech recognition and image captioning. It then addresses the widespread use of AI and potential benefits as well as concerns regarding issues like data bias, model reliability, misuse of AI systems, and adversarial AI. The document argues that addressing these technical issues and social implications will help maximize the benefits of AI.
This document discusses perspectives on artificial intelligence (AI) from technology leaders and experts. It summarizes views that AI will benefit humanity by helping to solve major challenges, but could also pose existential risks if not developed responsibly. The document also outlines how AI is rapidly advancing and transforming industries like automotive, healthcare, and personal assistance. While AI may displace some jobs, it could also create new types of work. Overall the document expresses an optimistic view of AI's potential if issues around ethics, safety, and economic impacts are adequately addressed.
This document discusses bias in artificial intelligence and algorithms. It begins with an introduction to the topic and why it is important. It then explores how to detect bias through various fairness metrics and how to mitigate bias through preprocessing, inprocessing, and postprocessing techniques. The document provides examples of different sources of bias and strategies to address them. It also recommends resources like the AI Fairness 360 toolkit to help evaluate models for fairness and identify potential biases.
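To make the fairness-metric idea concrete, here is a minimal sketch of one common metric, demographic parity difference (the gap in positive-prediction rates between two groups). The metric choice and binary-group setup are illustrative assumptions; toolkits such as AI Fairness 360 implement this and many other metrics:

```python
def demographic_parity_difference(predictions, groups):
    """Absolute gap in positive-prediction rates between two groups.

    predictions: iterable of 0/1 model outputs
    groups:      iterable of group labels (exactly two distinct values)
    A value near 0 suggests the model selects both groups at similar rates.
    """
    labels = sorted(set(groups))
    assert len(labels) == 2, "expects exactly two groups"
    rates = {}
    for g in labels:
        picked = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(picked) / len(picked)
    return abs(rates[labels[0]] - rates[labels[1]])
```

Pre-, in-, and post-processing mitigation techniques then aim to drive metrics like this toward zero, typically trading off some accuracy in the process.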
Crowdsourcing & ethics: a few thoughts and references - Matthew Lease
Extracts and addendums from an earlier talk, for those interested in ethics and related issues in regard to crowdsourcing, particularly research uses. Slides updated Sept. 2, 2013.
This document provides an overview of artificial intelligence and machine learning. It begins by defining AI as computer systems that can perform tasks autonomously and adaptively. Machine learning is described as getting computers to learn without being explicitly programmed. Examples of machine learning in daily life are discussed. The basics of supervised and unsupervised learning are explained. Ethical issues around AI like bias, fairness, and determining appropriate use are then discussed. Options for addressing these issues like ensuring diversity of data and viewpoints are presented. The document concludes by providing recommendations for further learning.
The law and ethics of data-driven artificial intelligence - PyData
By Aileen Nielsen
PyData New York City 2017
This talk is a completely non-technical discussion of how the law currently regulates artificial intelligence (if it does at all) and what is likely to change in the near future. The talk is geared towards technically-minded practitioners of data driven intelligence with the aim of increasing discussion of the social and ethical impact of data driven AI and how to code responsibly.
This document discusses some of the challenges of recommendation systems and how they can go wrong if not implemented carefully. It provides three key points:
1) Recommendation systems are difficult to implement well due to issues like cold starts, sparsity of data, and the potential to propagate biases in the data. Features used and their interactions must be chosen carefully.
2) These systems can negatively impact people's lives if they are not developed with considerations for fairness, ethics, and accountability. Models must be interpretable and avoid learning or promoting prejudices.
3) Thorough testing is needed using diverse datasets, including outliers, to identify and prevent unfair, biased, or harmful behaviors before systems are deployed.
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021) - Krishnaram Kenthapadi
This document provides an overview of explainable AI techniques. It discusses how explainable AI aims to make AI models more transparent and understandable by providing explanations for their predictions. Various explanation methods are covered, including model-specific techniques like interpreting gradients in neural networks, as well as model-agnostic approaches like Shapley values from game theory. The document explains how explanations are important for building user trust in AI systems and can help with debugging, analyzing robustness, and extracting rules from complex models.
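The Shapley-value idea from game theory can be made concrete with a tiny brute-force implementation: each player's (feature's) value is its average marginal contribution across all orderings. Exact enumeration is only feasible for a handful of features; practical explainability toolkits approximate it. The value function below is a stand-in, not any particular model:

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values via average marginal contribution over all orderings.

    players: list of feature names
    value:   function mapping a frozenset of players to a payoff
    """
    n_orderings = 0
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        n_orderings += 1
        coalition = frozenset()
        for p in order:
            # marginal contribution of p when joining this coalition
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: phi[p] / n_orderings for p in phi}
```

A useful sanity check is the efficiency property: the Shapley values always sum to the payoff of the full coalition, which is one reason they are attractive for attributing a prediction across features.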
Artificial intelligence (AI) refers to a constellation of technologies, including machine learning, perception, reasoning, and natural language processing. While the field has been pursuing principles and applications for over 65 years, recent advances, uses, and attendant public excitement have returned it to the spotlight. The impact of early AI systems is already being felt, bringing with it challenges and opportunities, and laying the foundation on which future advances in AI will be integrated into social and economic domains. The potential wide-ranging impact makes it necessary to look carefully at the ways in which these technologies are being applied now, whom they’re benefiting, and how they’re structuring our social, economic, and interpersonal lives.
Emerging Technologies in Data Sharing and Analytics at Data61 - Liming Zhu
This document provides an overview of Data61, Australia's national science agency, and its work in emerging technologies related to data sharing and analytics. It discusses Data61's strategic goals and focus areas such as artificial intelligence, cybersecurity, digital agriculture, and quantum technologies. It also summarizes Data61's work establishing Australia's National AI Centre and its research on topics like blockchain, federated learning, and regulatory technologies.
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned - Krishnaram Kenthapadi
How do we protect the privacy of users when building large-scale AI based systems? How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of privacy-preserving AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma... - Shift Conference
Shift AI was a success, connecting hundreds of professionals that were eager to propel the progress of AI and discuss the newest technologies in data mining, machine learning and neural networks. More at https://ai.shiftconf.co/.
Talk description:
With all the breakthroughs in the Machine Learning space, ML models are now being used, more than ever, to make decisions affecting human lives. Hence the quality of a model can no longer be judged by accuracy, precision, and recall alone. It is important to ensure that each individual and group of people is treated equally, without the historical bias present in the data. This talk focuses on some of the many potential ways to establish fairness metrics for ML models in your organization, along with the learnings and challenges I encountered while building a fairness tool for data scientists and business stakeholders.
Demo: Algorithmic Fairness Tool (AFT) was an innovation project, done at Accenture The Dock, which focused on bringing the latest research from academia and building a tool for the industry.
Automated Models for Quantifying Centrality of Survey Responses - Matthew Lease
Research talk presented at "Innovations in Online Research" (October 1, 2021)
Event URL: https://web.cvent.com/event/d063e447-1f16-4f70-a375-5d6978b3feea/websitePage:b8d4ce12-3d02-4d24-897d-fd469ca4808a.
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio... - Matthew Lease
Presentation at the 1st Biannual Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES 2018). August 30, 2018. Paper: https://www.ischool.utexas.edu/~ml/papers/kutlu-desires18.pdf
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E... - Matthew Lease
Presentation at the 6th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), July 7, 2018. Work by Tanya Goyal, Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, and Matthew Lease. Pages 41-49 in conference proceedings. Online version of paper includes corrections to official version in proceedings: https://www.ischool.utexas.edu/~ml/papers/goyal-hcomp18
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for... - Matthew Lease
Invited Talk at the ACM JCDL 2018 Workshop on Cyberinfrastructure and Machine Learning for Digital Libraries and Archives. https://www.tacc.utexas.edu/conference/jcdl18
Deep Learning for Information Retrieval: Models, Progress, & Opportunities - Matthew Lease
Talk given at the 8th Forum for Information Retrieval Evaluation (FIRE, http://fire.irsi.res.in/fire/2016/), December 10, 2016, and at the Qatar Computing Research Institute (QCRI), December 15, 2016.
Systematic Review is e-Discovery in Doctor’s Clothing - Matthew Lease
This document discusses opportunities for collaboration between researchers working in systematic reviews and electronic discovery (e-discovery). It notes similarities in the challenges both fields face, including the need for high recall with bounded costs and reliance on multi-stage review pipelines. The document proposes that technologies developed for semi-automated citation screening and crowdsourcing could help address current limitations. It concludes by encouraging information retrieval researchers to investigate open problems in systematic reviews as opportunities to advance technologies beyond other tasks and help bring together interested parties through forums like the TREC Total Recall track.
Crowd computing utilizes both crowdsourcing and human computation to solve problems. Crowdsourcing enables more efficient and scalable data collection and processing by outsourcing tasks to a large, undefined group of people. Human computation allows software developers to incorporate human intelligence and judgment into applications to provide capabilities beyond current artificial intelligence. Examples discussed include Amazon Mechanical Turk, various crowd-powered applications, and how crowdsourcing has helped label large datasets to train machine learning models.
The Rise of Crowd Computing (December 2015) - Matthew Lease
Crowd computing is rising in two waves: the first uses crowds to label large amounts of data for artificial intelligence applications; the second delivers applications that go beyond AI's abilities by incorporating human computation. Open problems remain around ensuring high-quality outputs, task design, understanding the worker context and experience, and addressing ethics concerns around opaque platforms and working conditions. The future holds potential for empowering crowd work but also risks like digital sweatshops if worker freedoms and conditions are not considered.
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms - Matthew Lease
The document summarizes a presentation about analyzing paid crowd work platforms beyond Mechanical Turk. It discusses how Mechanical Turk has dominated research on paid crowdsourcing due to its early popularity, but that it has limitations. The presentation conducts a qualitative study of 7 alternative crowd work platforms to identify distinguishing capabilities not found on MTurk, such as different payment models, richer worker profiles, and support for confidential tasks. It aims to increase awareness of other platforms to further inform practice and research on crowdsourcing.
Toward Effective and Sustainable Online Crowd Work - Matthew Lease
New forms of online crowd work enabled by technology present both opportunities for innovation and risks of harm that require careful consideration. This document discusses three main issues. First, some crowd work tasks may enable illegal or unethical goals. Second, the lack of regulation means crowd work practices sometimes exploit vulnerable workers by not ensuring informed consent. Third, multi-stakeholder discussions are needed to develop win-win solutions that balance costs, quality, and what is fair for all parties in a global context. The goal is to learn from each other and find ways to encourage ethical practices.
Crowdsourcing: From Aggregation to Search Engine Evaluation - Matthew Lease
This document provides an overview of statistical crowdsourcing and its applications. It discusses crowdsourcing platforms like Amazon Mechanical Turk and how they have enabled large-scale data labeling for tasks in areas like natural language processing. It also summarizes research on using crowdsourcing to evaluate search engines and benchmarks different statistical consensus methods for aggregating judgments from crowds. Finally, it presents work on using psychometrics and crowdsourcing to model multidimensional relevance through structured surveys and factor analysis.
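The baseline consensus method against which weighted statistical models (e.g., Dawid-Skene-style approaches) are usually benchmarked is plain majority vote over each item's labels. A minimal sketch, with hypothetical item and label names:

```python
from collections import Counter

def majority_vote(worker_labels):
    """Aggregate crowd judgments per item by simple majority vote.

    worker_labels: dict mapping item -> list of labels from different workers.
    Ties are broken by the first-seen label (Counter preserves insertion order).
    """
    consensus = {}
    for item, labels in worker_labels.items():
        (label, _count), = Counter(labels).most_common(1)
        consensus[item] = label
    return consensus
```

Majority vote treats all workers as equally reliable; the statistical consensus methods benchmarked in this line of work instead weight workers by estimated accuracy, which typically helps when label redundancy is low or worker quality varies widely.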
Talk at AAAI Human Computation 2013 Workshop on Scaling Speech, Language Understanding and Dialogue through Crowdsourcing (November 9, 2013): http://faculty.washington.edu/mtjalve/HCOMP2013.Workshop.html
Crowdsourcing & Human Computation: Labeling Data & Building Hybrid Systems - Matthew Lease
This document provides an overview of crowdsourcing and human computation. It begins with examples of using Amazon Mechanical Turk for basic tasks like labeling data. It then discusses how crowdsourcing can be used for more complex applications and discusses factors like incentive design, quality control, and platform selection. The document provides guidance on task design, experiment workflow, and usability considerations for effective crowdsourcing.
Talk presented at the ID360 Conference (http://identity.utexas.edu/id360), May 1, 2013. Paper: http://ssrn.com/abstract=2228728. Joint work with Jessica Hullman, Jeffrey P. Bigham, Michael S. Bernstein, Juho Kim, Walter S. Lasecki, Saeideh Bakhshi, Tanushree Mitra, and Robert C. Miller.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Choosing The Best AWS Service For Your Website + API.pptx
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking
1. Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking
Joint work with
An Thanh Nguyen (UT), Byron Wallace (Northeastern), & more!
Matt Lease
School of Information, University of Texas at Austin
@mattlease • ml@utexas.edu
Slides: slideshare.net/mattlease
2. “The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at over 65 universities around the world
www.ischools.org
What’s an Information School?
Matt Lease (UT Austin) • Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking
3. Information Literacy
National Information Literacy Awareness Month,
US Presidential Proclamation, October 1, 2009.
“Though we may know how to find the information
we need, we must also know how to evaluate it.
Over the past decade, we have seen a crisis of
authenticity emerge. We now live in a world where
anyone can publish an opinion or perspective, whether
true or not, and have that opinion amplified…”
4. “Truthiness”
“Truthiness is tearing apart our country... It used to
be everyone was entitled to their own opinion, but
not their own facts. But that’s not the case anymore.”
– Stephen Colbert (Jan. 25, 2006)
9. Fake News Challenge
12. Challenges
• Fair, Accountable, & Transparent (AI)
– Why trust “black box” classifier?
– How do we reason about potential bias?
– Do people really only want to know true vs. false?
– How to integrate human knowledge/experience?
• Joint AI + Human Reasoning, Correct Errors, Personalization
• How to design strong Human + AI Partnerships?
– Horvitz, CHI’99: mixed-initiative design
– Dove et al., CHI’17 “Machine Learning As a Design Material”
13. MemeBrowser (Ryu et al., ACM HyperText’12)
http://odyssey.ischool.utexas.edu/mb
14. • Crowdsourced stance labels
– Hybrid AI + Human (near real-time) Prediction
• Joint model of stance, veracity, & annotators
– Interaction between variables
– Interpretable
• Source code on GitHub
Nguyen et al., AAAI’18
15. This Work
Demo!
16. This Work
http://fcweb.pythonanywhere.com
17. Primary Interface
18. Source Reputation
19. System Architecture
• Google Search API
• Two logistic regression models
– Stance (Ferreira & Vlachos ’16) w/ same features
• average accuracy > 70%, but variable across claims
– Veracity (Popat et al. ‘17)
– scikit-learn, L1 regularization, liblinear solver, & default parameters
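The classifier configuration named above can be sketched as follows. This is a minimal illustration of the stated setup (scikit-learn logistic regression, L1 penalty, liblinear solver, otherwise default parameters); the TF-IDF features and toy labels here are stand-ins, not the paper's actual feature set or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Logistic regression with L1 regularization and the liblinear solver,
# as named on the slide; all other parameters left at their defaults.
stance_clf = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(penalty="l1", solver="liblinear"),
)

# Toy usage: headlines paired with illustrative stance labels.
X = ["Claim is confirmed by officials", "Report denies the claim"]
y = ["for", "against"]
stance_clf.fit(X, y)
print(stance_clf.predict(["Officials confirm the report"]))
```

The L1 penalty induces sparse feature weights, which also supports the interpretability goals discussed earlier.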
20. Data: Train & Test
Emergent (Ferreira & Vlachos ’16)
Accuracy of prediction models
27. User Study 1: Setup
• 2 Groups: Control vs. System
– 113 participants (58 control, 55 system) – MTurk
• 1. Asked to predict claim veracity
– Likert: Def. False, Prob. F., neutral, Prob. True, Def. T.
– Error: distance on the Likert scale
• For a Definitely False claim, Prob. False → error = 1, Def. True → error = 4
• 2. Shown model’s claim prediction
– Asked whether they wanted to change their answer
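The distance-based error metric above can be written out directly. A minimal sketch, using the 5-point Likert scale from the slide; the label strings are illustrative.

```python
# 5-point Likert scale from the study, ordered from false to true.
LIKERT = ["Def. False", "Prob. False", "Neutral", "Prob. True", "Def. True"]

def likert_error(answer: str, truth: str) -> int:
    """Error as absolute distance on the Likert scale.

    For a Definitely False claim, answering Prob. False gives error 1,
    while Def. True gives the maximum error of 4.
    """
    return abs(LIKERT.index(answer) - LIKERT.index(truth))

print(likert_error("Prob. False", "Def. False"))  # 1
print(likert_error("Def. True", "Def. False"))    # 4
```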
28. Before seeing model's claim prediction
• “System” group:
– claims 1–2: higher avg. prediction error
– claim 4: lower error
– claims 3, 5: only small differences
• Human accuracy in claim prediction roughly
follows model’s accuracy in stance prediction
– i.e., helped when model correct, hurt when not
29. After seeing model's claim prediction
• “System” group:
– Smaller answer changes on model errors
– Changed answers more often than the "control" group
• Human accuracy in claim prediction roughly
follows model’s accuracy in veracity prediction
– i.e., helped when model correct, hurt when not
30. Study 1: Statistical Significance (1 of 2)
• Mixed-effects Generalized Linear Model (GLM)
• Before seeing model’s claim predictions
– CSP: # correct stance predictions seen
– WSP: # wrong stance predictions seen
– 2-tail test includes unlikely possibility that seeing
correct stance predictions increases human error
31. Study 1: Statistical Significance (2 of 2)
• Mixed-effects Generalized Linear Model (GLM)
• After seeing model’s claim predictions
– CSP: # correct stance predictions seen
– WSP: # wrong stance predictions seen
– 2-tail test includes unlikely possibility that seeing
correct stance predictions increases human error
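The significance analysis above can be sketched in code. This is an illustrative stand-in, not the paper's analysis: it fits a linear mixed-effects model (via statsmodels, as a simplification of the mixed-effects GLM named on the slide) with per-participant random intercepts, regressing prediction error on the CSP and WSP counts. All column names and data values are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per (participant, claim), recording the
# participant's prediction error and how many correct (CSP) and wrong
# (WSP) stance predictions they had seen.
df = pd.DataFrame({
    "error":       [0, 1, 2, 1, 0, 3, 1, 2, 0, 1, 2, 0],
    "CSP":         [3, 2, 0, 1, 4, 0, 2, 1, 3, 2, 1, 4],
    "WSP":         [0, 1, 3, 2, 0, 4, 1, 2, 0, 1, 2, 0],
    "participant": ["a", "a", "b", "b", "c", "c",
                    "d", "d", "e", "e", "f", "f"],
})

# Mixed-effects model: error ~ CSP + WSP with a random intercept per
# participant; the summary reports two-tailed tests on each coefficient.
model = smf.mixedlm("error ~ CSP + WSP", df, groups=df["participant"])
result = model.fit()
print(result.summary())
```

The random intercepts absorb per-participant skill differences, so the CSP/WSP coefficients isolate the effect of seeing correct vs. wrong stance predictions.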
33. User Study 2: Setup
• 2 Groups: Control vs. Slider
– 109 participants (51 control, 58 slider) – MTurk
• 1. Asked to predict claim veracity
– Likert: Def. False, Prob. F., neutral, Prob. True, Def. T.
and indicate confidence in prediction
– Score range (−20, +20): accuracy × confidence
• e.g., a correct answer with 75% confidence → 20 × 75% = 15
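The scoring rule above can be sketched as a one-liner. A minimal illustration: the slide only spells out the correct-answer case, so treating an incorrect answer as −20 × confidence is an assumption here.

```python
def score(correct: bool, confidence: float) -> float:
    """Accuracy x confidence, mapped onto a (-20, +20) range.

    `confidence` is the participant's self-reported confidence in [0, 1].
    Correct answers earn +20 * confidence; the -20 * confidence penalty
    for incorrect answers is an assumed symmetric extension, so confident
    wrong answers are penalized most.
    """
    return (20 if correct else -20) * confidence

print(score(True, 0.75))   # 15.0, matching the slide's example
```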
34. User Study 2: Results
• "Slider" group:
– Claims 1–3: higher score on average
– Claims 4–5: lower first quartiles, but same medians
• Some participants negatively impacted by slider
– No statistically significant difference on average
35. Discussion & Future Work
• Fact Checking & IR (Lease, DESIRES’18)
– How to diversify search results for controversial topics?
– Information evaluation (e.g., vaccination & autism)
• Potential harm as well as good
– Potential added confusion, data / algorithmic bias
– Potential for personal “echo chamber”
– Adversarial applications
• Future Work
– Making personalization more visible
– Collaborative use, small-scale and large-scale
• 1st author Nguyen looking for a postdoc!
36. Conclusion
• Fact-checking is more than black-box prediction
– Interaction, exploration, trust
• We proposed a mixed-initiative human + AI
partnership for fact-checking
– Back-end AI + front-end interface/interaction
– Support AI + human collaboration
– Fair, Accountable, & Transparent (FAT) AI
37. Thank You!
Slides: slideshare.net/mattlease
Lab: ir.ischool.utexas.edu