This presentation discusses the initial steps taken to assemble a corpus of web-based “fake news” to facilitate a large-scale narrative framework analysis of online misinformation masquerading as news, using a modified version of software previously applied to the study of anti-vaccination narratives. The data-gathering discussion is accompanied by commentary on how current web-archiving approaches and frameworks might be enhanced to support such research-oriented objectives. The work also presents initial results of small pilot studies conducted to test the narrative analysis techniques that will ultimately be scaled up to millions of online postings. Because these subsequent studies are likely to compare the narrative “shapes” of news stories along a continuum from hoaxes to verifiable reporting, the pilot studies focus on archives of web materials built around two conspiracies: one that turned out to be real, the so-called “Bridgegate” scandal of politically motivated lane closures on the George Washington Bridge, and one that was false, the so-called “Pizzagate” hoax.
Conspiracy Stories: Building Archives to Facilitate Narrative Analyses of Online Fake News
1. Conspiracy Stories: Building Archives to Facilitate Narrative Analyses of Online Fake News
Peter Broadwell (@PeterBroadwell)
Digital Library Program
University of California, Los Angeles
IFLA International News Media Conference
Reykjavík, Iceland, 27-28 April 2017
2. Peter Broadwell (@PeterBroadwell) - UCLA Digital Library
Building Archives to Facilitate Narrative Analyses of Fake News
IFLA International News Media Conference, 27 April 2017
Fake news is…
• Difficult to define and identify
• A badly overused term – especially these days
• Prominent on social media: 30% of traffic to fake news sites comes from Facebook, compared with only 8% of traffic to legitimate news sites
• Financed by online advertising systems (Google), and maybe some governments
• Now seen as a serious issue by journalists and some Internet companies (finally)
• Not as dire a problem as some believe? [1]
• Worthy of further study
[1] Jacob L. Nelson. 2017. “Is ‘fake news’ a fake problem?” Columbia Journalism Review, January 31, 2017
3. Analyzing the “shape” of conspiracy stories
Gregor Aisch, Jon Huang, Cecilia Kang, “Dissecting the #PizzaGate Conspiracy Theories,” New York Times, December 10, 2016
4. TR Tangherlini, V Roychowdhury, B Glenn, CM Crespi, R Bandari, A Wadia, M Falahi, E Ebrahimzadeh, R Bastani. 2016. “‘Mommy Blogs’ and the Vaccination Exemption Narrative: Results From A Machine-Learning Approach for Story Aggregation on Parenting Social Media Sites.” JMIR Public Health and Surveillance 2:2 (2016).
5. Don’t believe everything you read online…
http://newsroom.ucla.edu/releases/ucla-researchers-teach-computer-to-read-the-internet
6. Narrative framework analysis
TR Tangherlini, V Roychowdhury, B Glenn, CM Crespi, R Bandari, A Wadia, M Falahi, E Ebrahimzadeh, R Bastani. 2016. “‘Mommy Blogs’ and the Vaccination Exemption Narrative: Results From A Machine-Learning Approach for Story Aggregation on Parenting Social Media Sites.” JMIR Public Health and Surveillance 2:2 (2016).
7. Building our own “fake news” web archive
8. Or… using someone else’s archive?
9. Small-scale case studies: Pizzagate and Bridgegate
10. Pre-selected “archives” of coverage
11. Web archives: too much data, not enough information
In WANE files (3,376 unique names):
Christie_(P): 376
Tony_(P): 241
Chris_Christie_(P): 236
David_Wildstein_(P): 190
Bridget_Anne_Kelly_(P): 169
Extracted by DBpedia Spotlight (954 unique Wikipedia entities):
Chris_Christie_(P): 435
David_Wildstein_(P): 263
Bridget_Anne_Kelly_(P): 222
Bill_Baroni_(P): 194
Mark_Sokolich_(P): 130
• 2 seed URLs, 1 from Huffington Post, 1 from NJ Record
• Full Archive-It WARCs contain 124,355 unique linked URLs
• But 47,398 are adverts/spam, 73,544 likely news articles
• Only 415 news articles (163 from HuffPo, 252 from the Record) are relevant to Bridgegate (0.5%)
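The gap between the two name lists above comes partly from surface variants ("Christie_(P)" versus "Chris_Christie_(P)") being counted separately. A minimal sketch of merging such variants is given here, assuming a hand-built alias table. The table and the function name `merge_counts` are hypothetical, and the merged totals are not meant to reproduce the DBpedia Spotlight figures. Ambiguous names such as "Tony_(P)" are left unmerged because the slide does not say whom they refer to.

```python
from collections import Counter

# Hypothetical alias table: surface form -> canonical entity name.
# The "_(P)" suffix mirrors the person marker used in the counts above.
ALIASES = {
    "Christie_(P)": "Chris_Christie_(P)",
}

def merge_counts(raw_counts, aliases):
    """Sum mention counts under each canonical entity name."""
    merged = Counter()
    for name, count in raw_counts.items():
        merged[aliases.get(name, name)] += count
    return merged

# Three of the raw name counts listed for the WANE files.
raw = {
    "Christie_(P)": 376,
    "Chris_Christie_(P)": 236,
    "David_Wildstein_(P)": 190,
}

merged = merge_counts(raw, ALIASES)
for name, count in merged.most_common():
    print(name, count)
```

An alias table like this only handles variants that someone has already identified by hand; it does not perform disambiguation.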
12. Returning to Pizzagate…
13. Can we generate this computationally?
Gregor Aisch, Jon Huang, Cecilia Kang, “Dissecting the #PizzaGate Conspiracy Theories,” New York Times, December 10, 2016
14. A simple Pizzagate network model
15. Towards a useful Pizzagate network model
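One common way to derive a network model like this from text is to link entities that appear in the same document. The sketch below illustrates that idea only; it is not necessarily the method used for these slides. The entity names are placeholders, and the function name `cooccurrence_edges` and the weight threshold are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs_entities):
    """Count how many documents mention each unordered pair of entities."""
    edges = Counter()
    for entities in docs_entities:
        # Deduplicate and sort so that each pair has one canonical ordering.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

# Placeholder entity lists, one per document (not real data).
docs = [
    ["Person_A", "Place_X", "Person_B"],
    ["Person_A", "Place_X"],
    ["Person_B", "Org_Y"],
]

edges = cooccurrence_edges(docs)

# Keeping only pairs seen in at least two documents (an assumed threshold)
# is one simple way to thin a noisy network.
strong = {pair: weight for pair, weight in edges.items() if weight >= 2}
print(strong)
```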
16. Bridgegate network, highlighting “degree”
17. Bridgegate network, showing “betweenness”
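For readers unfamiliar with the two measures: degree counts a node's direct connections, while betweenness counts how many shortest paths between other nodes pass through it. The sketch below computes both for a small placeholder graph, using Brandes' algorithm for unweighted, undirected graphs. The graph and node names are hypothetical and do not come from the Bridgegate data.

```python
from collections import deque

def degree(graph):
    """Number of direct neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: total / 2 for v, total in score.items()}

# Placeholder graph: hub A with leaves B, C, D, plus a chain A - E - F.
graph = {
    "A": ["B", "C", "D", "E"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
    "E": ["A", "F"],
    "F": ["E"],
}

deg = degree(graph)
btw = betweenness(graph)
print(deg)
print(btw)
```

In this toy graph, node E has only two neighbors but still carries every shortest path to F, which is the kind of "broker" role that betweenness is meant to surface and degree alone can miss.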
18. Archiving fake news for research…
• Is difficult to do well
• Requires constant monitoring and checking of targeted sites, given capabilities of existing tools
• Would benefit from coordination between institutions to distribute web-crawling tasks
• Entails a great deal of manual and semi-automated content classification and filtering after collection
• Calls for the use of more sophisticated tools for named entity resolution, disambiguation and versioning of archival content
• Is potentially very worthwhile, despite these issues
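The semi-automated filtering mentioned above could start with something as simple as a rule-based first pass over crawled URLs, with human review afterwards. The sketch below is only an illustration under stated assumptions: the host labels, topic keywords, and the function name `triage` are hypothetical, and the URLs use the reserved example.com domain rather than real crawl data.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical hints; a real project would curate these lists by hand.
AD_HOST_LABELS = {"ads", "adserver", "tracking"}
TOPIC_KEYWORDS = ("bridgegate", "lane-closure")

def triage(url):
    """Label a URL as 'advert/spam', 'relevant', or 'other'."""
    parsed = urlparse(url.lower())
    # Compare whole host labels so "uploads.example.com" is not mistaken
    # for an ad host just because it contains the substring "ads".
    host_labels = (parsed.hostname or "").split(".")
    if any(label in AD_HOST_LABELS for label in host_labels):
        return "advert/spam"
    if any(keyword in parsed.path for keyword in TOPIC_KEYWORDS):
        return "relevant"
    return "other"

urls = [
    "http://ads.example.com/banner?id=1",
    "http://news.example.com/2014/01/bridgegate-emails",
    "http://news.example.com/sports/scores",
    "http://uploads.example.com/photo.jpg",
]

labels = Counter(triage(u) for u in urls)
relevant_share = labels["relevant"] / len(urls)
print(labels, f"{relevant_share:.1%}")
```

Rules like these are cheap to run over an entire crawl but will miss relevant pages whose URLs carry no topical keywords, so they complement manual classification rather than replace it.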
19. Thanks!
• Prof. Tim Tangherlini, UCLA Scandinavian Section
• Prof. Vwani Roychowdhury, UCLA Electrical Engineering
• Ph.D. students, UCLA Electrical Engineering:
  • Ehsan Ebrahimzadeh
  • Behnam Shahbazi
  • Misagh Falahi
• Mark Graham, Internet Archive
• Karl Blumenthal, Archive-It