Measuring Human Perception
to Defend Democracy
Elissa M. Redmiles
@eredmil1
eredmiles@gmail.com
Recent attacks on
democracy are attacks
on human perceptions
This has been going on for a long time…
“The first casualty in war is truth”
“Black Brute” message
in Jim Crow propaganda
The “Evil Jew” message
in Nazi propaganda
Not all influence
has ill intent
Propaganda has two goals
Influence people Avoid detection
(labeling as propaganda)
Novel technology,
especially targeting,
helps propaganda
achieve these goals
Don’t take my word for it, let’s consider
a case study of the Russian DNC Ads
On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook
Ribeiro, F., Saha, K., Babaei, M., Henrique, L., Messias, J., Goga, O., Benevenuto, F., Gummadi, K.P.,
and Redmiles, E.M. ACM FAT* 2019.
DNC released dataset of 3,517 ads that
Facebook identified as IRA sponsored
[Figure: amount (log scale, 0 to 10^6) of Impressions, Clicks, Cost (USD), and Ads by month, Jun 2015 to Aug 2017, with the U.S. election marked]
Volume of advertising increased as
election approached & at inauguration
Ad volume increases by nearly an order of
magnitude in the 2 months before election
Avg. CTR (%) of Facebook Ads
• IRA: 10.8
• Retail: 1.6
• Fitness: 1.0
• Healthcare: 0.80
• Finance: 0.56
IRA-sponsored ads had incredibly high engagement
Avg. Clicks: 1,062
Avg. Impressions: 11,536
Avg. Spend: $34.50
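Click-through rate (CTR) is clicks divided by impressions, expressed as a percentage. The sketch below is illustrative only: it computes the ratio of the two averages reported above, then uses hypothetical per-ad numbers (the `ads` list is invented, not taken from the dataset) to show that an average of per-ad CTRs need not equal a pooled CTR.

```python
def ctr_percent(clicks: float, impressions: float) -> float:
    """Click-through rate as a percentage of impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 100.0 * clicks / impressions

# Ratio of the averages reported on the slide (avg. clicks / avg. impressions).
ratio_of_averages = ctr_percent(1062, 11536)

# Hypothetical per-ad (clicks, impressions) pairs, for illustration only.
ads = [(50, 200), (10, 1000), (300, 4000)]

# Average the CTR of each ad, versus pooling all clicks and impressions.
mean_of_ctrs = sum(ctr_percent(c, i) for c, i in ads) / len(ads)
pooled_ctr = ctr_percent(sum(c for c, _ in ads), sum(i for _, i in ads))

print(f"ratio of slide averages: {ratio_of_averages:.1f}%")
print(f"toy data, mean of per-ad CTRs: {mean_of_ctrs:.1f}%")
print(f"toy data, pooled CTR: {pooled_ctr:.1f}%")
```

Which of the two statistics a reported "average CTR" refers to depends on how it was computed, so the two should not be compared directly.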
Propaganda has two goals
Influence people Avoid detection
(labeling as propaganda)
Using human perception measurements to
understand whether IRA ads achieved these goals
Propaganda goal: Influence people → Perception proxy: Approval
Propaganda goal: Avoid detection (labeling as propaganda) → Perception proxy: Reporting
Propaganda goal: Influence people
Measured divisiveness in ad approval as a proxy
for whether propaganda influenced people
Survey Measurement
of Approval
1) Do you approve or disapprove
of what this ad says or implies?
Answer choices: Approve;
Disapprove; Neither; There is
nothing in this ad to approve or
disapprove of; I don’t know.
2) “Do you [approve/disapprove]
very strongly, or not so strongly?”
Metrics of
Divisiveness
Between-group:
how different are the
opinions of different
ideological groups?
Within-group:
do people of the same
political ideology have
consistent opinions?
Accurate human measurements require
careful questionnaire design
Pre-testing
Questions were evaluated & revised using cognitive interviews
that help ensure respondents consistently understand questions
Cognitive Load
Survey methodology research suggests that the shorter the
survey, the more accurate the measurements. Each participant
in our survey saw 10 ads
Internal measurement validity
Ads were randomized to avoid priming effects & questions had
a “don’t know” response option to avoid inaccurate answers
Accurate human measurements also
require careful choice of respondent sample
Sample should be representative of
the population you wish to study
To represent the U.S. population, we used
an online panel (SSI) to collect a
census-representative sample composed
of 40% Democrats, 40% Republicans, and
20% Independents across the U.S.
n = 2,886
Studied the 485 most impactful ads (>80% of spend,
impressions, clicks) run before the election
Each ad evaluated by at least 15 respondents
Democratic and Republican respondents react to the same ads
with similar strength but in near-opposite directions
Low within-group divisiveness:
people with similar political
leanings approve of the ads
with similar strength
$$\frac{\text{std dev}(\text{approval of ideological group})}{\text{std dev}(\text{all respondents})} < 1$$
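The within-group ratio above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's analysis code: the numeric coding of answers (-2 for strongly disapprove up to +2 for strongly approve), the example responses, the use of population standard deviation, and the between-group gap (absolute difference of group means) are all hypothetical choices.

```python
from statistics import mean, pstdev

# Hypothetical approval scores for one ad, coded from -2 (strongly
# disapprove) to +2 (strongly approve). The coding is an assumption.
responses = {
    "Democrat": [2, 2, 1, 2, 1],
    "Republican": [-2, -1, -2, -2, -1],
}

def within_group_ratio(group_scores, all_scores):
    """Std dev of one group's approval divided by std dev over all respondents."""
    overall = pstdev(all_scores)
    if overall == 0:
        raise ValueError("no variation across respondents")
    return pstdev(group_scores) / overall

all_scores = [s for scores in responses.values() for s in scores]
ratios = {g: within_group_ratio(s, all_scores) for g, s in responses.items()}

# One simple (assumed) between-group measure: gap between group means.
between_group_gap = abs(mean(responses["Democrat"]) - mean(responses["Republican"]))

for group, ratio in ratios.items():
    print(f"{group}: within-group ratio = {ratio:.2f}")
print(f"between-group gap = {between_group_gap:.1f}")
```

A ratio below 1 for each group means that members of a group agree with each other more than respondents overall do, while a large gap between group means indicates that the groups disagree with each other.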
High between-group divisiveness
We don’t want to honor racism, slavery
and hatred. This is what Confederate
Heritage is. Not My Heritage Rally
America is at risk. To protect our country we
need to secure the border
It’s ok they’re women so they’ll only
find the kitchen
Propaganda has two goals
Influence people Avoid detection
(labeling as propaganda)
Propaganda goal: Avoid detection (labeling as propaganda)
Measured reporting behavioral intent to proxy
for whether propaganda would avoid detection
Perception proxy: Reporting
Survey measurement:
Some social media platforms
allow you to report content by
clicking "report". Would you
report this ad?
Answer choices:
“Yes”, “No”, “I don’t know”
Between-group divisiveness in reporting
For nearly ¾ of the ads, at least 3 of 15 respondents (20%)
who reviewed the ad said they would have reported it
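The "at least 3 of 15 respondents" criterion can be expressed as a simple threshold on the share of "Yes" answers per ad. The sketch below uses invented answers for three hypothetical ads (`ad_a`, `ad_b`, `ad_c`); only the 3-of-15 threshold comes from the slide.

```python
from fractions import Fraction

THRESHOLD = Fraction(3, 15)  # at least 3 of 15 reviewers, i.e. 20%

# Hypothetical answers to "Would you report this ad?" for three ads.
would_report = {
    "ad_a": ["Yes"] * 4 + ["No"] * 11,
    "ad_b": ["Yes"] * 1 + ["No"] * 13 + ["I don't know"],
    "ad_c": ["Yes"] * 3 + ["No"] * 10 + ["I don't know"] * 2,
}

def report_share(answers):
    """Exact share of reviewers who said they would report the ad."""
    return Fraction(answers.count("Yes"), len(answers))

# Ads that meet or exceed the reporting threshold.
flagged = [ad for ad, answers in would_report.items()
           if report_share(answers) >= THRESHOLD]
share_flagged = Fraction(len(flagged), len(would_report))

print(flagged, share_flagged)
```

Using exact fractions avoids floating-point edge cases when an ad sits exactly on the 20% boundary.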
Reported by the Liberals
Not Reported by the Conservatives
Reported by the Conservatives
Not Reported by the Liberals
Propaganda has two goals
Influence people Avoid detection
(labeling as propaganda)
IRA used targeting to avoid detection: match ads to
those who wouldn’t report / would approve
[Figure: two panels, Reporting and Approval; ad proportion reported and ad approval score, each compared for Non-Targeted vs. Targeted respondents]
The IRA primarily targeted their ads using
attribute-based targeting
PII targeting
• Using name, phone number, or email address to target the ads
• None of the IRA ads used this method
Lookalike audience targeting
• Provide user PII or page, Facebook identifies targets "like" that user or page audience
• 1.1% of IRA ads were targeted this way
Attribute-based targeting
• Create a targeting formula based on demographics (gender, age, location, language),
advanced demographics (e.g., parents with toddlers, political leaning, etc.), interests (e.g.,
religion, travel) and behaviors (e.g., new vehicle buyers)
• ~99% of high impact IRA ads were targeted this way
• Up to 39 attributes used in targeting; 78% of ads had 2+ attributes used to target
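To make the idea of a "targeting formula" concrete, here is a minimal sketch of attribute-based matching. Everything in it is hypothetical: the `TargetingFormula` class, its fields, and the example users are invented for illustration and do not reflect Facebook's advertising API or the formulas the IRA actually used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TargetingFormula:
    """Hypothetical attribute-based audience definition (not a real ad API)."""
    min_age: int
    max_age: int
    locations: frozenset
    interests: frozenset

    def matches(self, user: dict) -> bool:
        # A user is in the audience if every demographic condition holds
        # and the user shares at least one of the targeted interests.
        return (
            self.min_age <= user["age"] <= self.max_age
            and user["location"] in self.locations
            and bool(self.interests & set(user["interests"]))
        )

formula = TargetingFormula(
    min_age=18,
    max_age=65,
    locations=frozenset({"US"}),
    interests=frozenset({"travel", "religion"}),
)

# Invented example users.
users = [
    {"name": "u1", "age": 30, "location": "US", "interests": ["travel"]},
    {"name": "u2", "age": 17, "location": "US", "interests": ["travel"]},
    {"name": "u3", "age": 40, "location": "US", "interests": ["cooking"]},
]
audience = [u["name"] for u in users if formula.matches(u)]
print(audience)
```

Each added attribute narrows the audience, which is how a formula with several attributes can restrict an ad to people unlikely to object to it.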
Most used targeting attributes
were race-related
Up to 64% of IRA ads may
have been targeted using
Facebook’s automated
targeting suggestions
Propaganda has two goals
Influence people Avoid detection
(labeling as propaganda)
The rest of this talk:
can we use human perceptions
to defend democracy?
Identifying perception
manipulation proactively could
be our “canary in the coal mine”
Leveraging human perceptions to protect
democracy: Case study of
fact checking prioritization
Analysing Biases in Perception of Truth in News Stories and their Implications for Fact Checking
Babaei, M., Kulshrestha, J., Chakraborty, A., Redmiles, E.M., Cha, M., and Gummadi, K.P. ACM FAT* 2019.
Traditionally, journalist claims were fact
checked; Now, anyone can produce “news”
• Produced by professional
journalists
• Limited number of media
sources
• Media watchdog groups check
for bias and inaccuracies
• Produced by members of
the crowd
• Number of potential news
sources is huge
• Not possible to monitor all
news sources
Fake news: news which is verifiably false
Different from the IRA ads
in that it is verifiably false
Similar in that it alters
human perceptions
Verifying news is time consuming & costly
Expert fact checkers follow principled approach
(e.g., Poynter’s Code of Principles)
• Verify stories to be True, Mostly True, False, Mostly False …
Dedicated fact-checking organizations
• Snopes, PolitiFact, FactCheck.org, AltNews, etc.
Problem: cannot fact check all stories
Solution: prioritize stories for fact checking
Current state-of-the-art: how Facebook,
Twitter, etc. prioritize stories for fact checking
Ask users to report news stories which they perceive to be false
Stories reported as false by many users are prioritized
Goal: Fact check false news with
higher probability than true news
Does the strategy achieve this goal?
Depends on how users perceive
the truthfulness of different stories
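The report-based strategy described above amounts to sorting stories by how many users flagged them as false. A minimal sketch, assuming a hypothetical stream of reports (the `reports` list is invented; platforms' actual pipelines are not described in this talk):

```python
from collections import Counter

# Hypothetical stream of user reports: each entry is the ID of a story
# that some user flagged as false.
reports = ["S1", "S4", "S1", "S3", "S1", "S4"]

def prioritize(reports):
    """Order stories for fact checking, most-reported first."""
    return [story for story, _ in Counter(reports).most_common()]

queue = prioritize(reports)
print(queue)
```

Note that a story nobody reports never enters the queue at all, whatever its actual truth value, which is why the outcome depends on how users perceive each story.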
Let’s consider an example set of news
• S1: President Trump inherited a White House infested with cockroaches due
to the careless behavior of his predecessor, Barack Obama. False
• S2: Sen. John McCain’s vote against a ‘skinny repeal’ health care proposal
stopped attempts to repeal the Affordable Care Act for FY ’17. False
• S3: The national debt saw a ‘surprising’ decline of $102 billion
between January 20 and July 27 2017. True
• S4: President Donald Trump changed the constitution to
read ‘citizens’ instead of persons. Mostly False
Goal: Fact check false news with higher probability than true news
S2, S1 > S4 > S3
But, how do people
perceive the truthfulness
of these stories?
Created Truth Perception Tests (similar to Implicit
Association Tests) to evaluate truth perceptions
Participants rapidly (in 15 seconds or less) provide their
truth perception for the story or headline
Human measurements must be done carefully: we
conducted six micro-experiments validating our TPTs
TPTs are robust to sampling, answer choice, incentive and satisficing effects
Survey panel
Expert panel
Accuracy incentive
7 point scale
6 point scale
5 point scale
Going back to our example, relying on people’s truth
perceptions (current approach) does not achieve goal
Goal ranking by ground truth:
S2, S1 > S4 > S3
Ranking by truth perception:
S1 > S4 > S3 > S2
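One way to check a ranking against the goal is to count pairs of stories where a less false story is placed ahead of a more false one. The sketch below uses the two rankings and the fact-check labels from the example; the numeric `falsity` scale and the pair-counting measure are assumptions made for illustration, not the paper's metric.

```python
# Fact-check labels from the example, mapped to an assumed numeric scale.
falsity = {"S1": 2, "S2": 2, "S3": 0, "S4": 1}  # False=2, Mostly False=1, True=0

def misordered_pairs(ranking, falsity):
    """Count pairs where a story is ranked ahead of a strictly more false one."""
    count = 0
    for i, earlier in enumerate(ranking):
        for later in ranking[i + 1:]:
            if falsity[later] > falsity[earlier]:
                count += 1
    return count

by_ground_truth = ["S2", "S1", "S4", "S3"]
by_perception = ["S1", "S4", "S3", "S2"]

print(misordered_pairs(by_ground_truth, falsity))
print(misordered_pairs(by_perception, falsity))
```

The perception-based ranking places the false story S2 behind both the mostly false S4 and the true S3, so it violates the goal for those pairs.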
But, truth perceptions can help us achieve other goals:
decreasing polarization, correcting misperceptions
Identifying perception
manipulation proactively could
be our “canary in the coal mine”
Platforms can use TPTs to assess
perceptions of content in an
evenly-sampled user population
to flag issues as they emerge
Opportunities for
future work
Human perception measurements, and human computation,
have rarely been combined with NLP in the democracy space
Resources for getting started with
human measurement (survey methods)
Overview of survey methodology best practices & pitfalls:
bit.ly/surveyHandbook
Tutorial presentation of best practices & case studies:
bit.ly/surveyTutorial
University of Maryland / Michigan Joint Institute on Survey Methods
Elissa Redmiles | eredmiles@gmail.com | @eredmil1

Editor's Notes

  • #4 Jim Crow: 1877-1950
  • #8, #9, #10 The entire IRA ads dataset consists of a total of 3,517 ads. The temporal figure shows that these ad campaigns intensified near the U.S. election period, increasing by nearly one order of magnitude. The average cost of the ads was 34.5 USD, with an average of about 11k views and 1,062 clicks per ad. Click-through rate (CTR) measures the effectiveness of ads: it is the percentage of an ad's views (impressions) that result in clicks. The average CTR of IRA ads was 10.8%, which is extremely high compared to other types of Facebook ads.
  • #12 We have some proof from the election, but we can’t do a controlled experiment (unethical) and we are doing this retroactively
  • #13 Many types of influence, in this case goal has appeared to be inciting division between groups
  • #17 We have some proof from the election, but we can’t do a controlled experiment (unethical) and we are doing this retroactively
  • #21 Here is a contrasting example. These ads target two different inflammatory topics. The ad on the left, which targets immigration, says that immigrants are like parasites and that we must get rid of them. This ad would have been reported by liberals but not by conservatives. The ad on the right targets the topic of Muslims and says that they are subject to Islamophobia. This ad would have been reported by conservatives but not by liberals. Therefore, if the first ad is targeted only to conservatives, it would probably never be reported, and the second would probably never be reported if targeted only to liberals.
  • #22 We have some proof from the election, but we can’t do a controlled experiment (unethical) and we are doing this retroactively
  • #37 What actually happens?
  • #39 Results correlated 0.90 to 0.96 meaning this was robust
  • #41 Using demographic features like those used in the ad targeting previously, we can predict the GTL of headlines with 82% accuracy. Look at the difference between predicted GTL and PTL. Objective 3 can be achieved with no prediction, just knowledge of political leaning.