Providing tools to build collections of stories for local events from local sources
1
This work was made possible in part by IMLS LG-71-15-0077-15 and
support from the Harvard Law School Library. We are grateful for the support.
Local Memory Project (LMP)
http://www.localmemory.org/, https://twitter.com/localmem
Alexander C. Nwala, Michele C. Weigle, and Michael L. Nelson
@webscidl
Old Dominion University
Adam B. Ziegler and Anastasia Aizman
@harvardlil
Harvard Library Innovation Lab
Presented by: Alexander C. Nwala (@acnwala)
Computer Science Ph.D. student
Media Cloud Intern, Berkman Klein Center for Internet & Society, Harvard University
JCDL 2017, June 21, 2017
2
LMP: Outline
1. Introduction
2. LMP local stories collection building
a. Geo: Nearby news media discovery
b. Chrome Extension: Collection building
c. Collection archiving
d. Community collection building
3. Evaluation
a. Dataset
b. Metrics/Results
4. Conclusions
3
Local Michigan media first reported on the Flint water changeover in 2014
http://www.mlive.com/opinion/flint/index.ssf/2014/04/editorial_switch_to_flint_rive.html
● April 2014: Officials in Flint, Michigan switched
the city’s water source from Lake Huron (Detroit
water system) to the Flint River
● This news was reported by local media such as
Michigan Radio, the Flint Journal-MLive, and
local TV affiliates in Flint (WEYI, WJRT, WSMH,
and WNEM)1
1 Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. https://mediamatters.org/research/2016/02/02/analysis-how-michigan-and-national-reporters-co/208290. (2016).
4
http://www.mlive.com/news/flint/index.ssf/2014/05/state_says_flint_river_water_m.html
● May 23, 2014: City residents complained about
the water’s taste and smell
● This was reported by Ron Fonger of the Flint
Journal-MLive (local media)2
2 Ron Fonger. 2014. State says Flint River water meets all standards but more than twice the hardness of lake water. http://www.mlive.com/news/flint/index.ssf/2014/05/state_says_flint_river_water_m.html. (2014).
City residents complained about the water’s taste and smell
5
Between August and September 2014: the city issued three boil advisories to residents of Flint
after finding fecal coliform bacteria (E. coli) in the water1
http://www.mlive.com/news/flint/index.ssf/2014/09/flint_says_drinking_water_advi.html
http://www.mlive.com/news/flint/index.ssf/2014/09/flint_lifts_boil_water_advisor.html
http://www.mlive.com/news/flint/index.ssf/2014/09/flint_flushes_out_latest_water.html
Flint issues three boil advisories after finding E. coli in the water
6
1 Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. https://mediamatters.org/research/2016/02/02/analysis-how-michigan-and-national-reporters-co/208290. (2016).
January 5, 2016: Governor Rick Snyder declared a state of emergency for the city of Flint, due to
dangerously high levels of lead contamination in the drinking water
https://www.democracynow.org/2016/1/8/poisoned_democracy_how_an_unelected_official
January 2016, Governor Rick Snyder declared a state of emergency for Flint
7
● A chain of events about the Flint water crisis was reported by local media, but most of the non-local media did
not report this crucial story until 2016.1
● Local media is fundamental to journalism, but is in decline.3
LMP attempts to shed some light on local media
1 Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. https://mediamatters.org/research/2016/02/02/analysis-how-michigan-and-national-reporters-co/208290. (2016).
3 Rasmus Kleis Nielsen. 2015. Local journalism: the decline of newspapers and the rise of digital media. I.B. Tauris.
Non-local media did not report this crucial story until 2016
https://cloudfront.mediamatters.org/static/uploader/image/2016/02/03/flinttimeline1.png
8
Local and non-Local media have different priorities
Non-local news organizations such as CNN cover
stories of a broader (national/international) scope such
as Obamacare and the Syrian refugee migrant crisis
Local media such as the Caloosa Belle Newspaper
(LaBelle, FL) cover stories that would not naturally
be of interest to another locality, such as the annual
Swamp Cabbage Festival
http://caloosabelle.com/?s=swamp+cabbage http://www.cnn.com/specials/world/migration-crisis
9
LMP: Introduction
LMP provides a suite of tools (beginning with two) to help users and
small communities discover, collect, build, archive, and share collections
of stories for important local events by leveraging local news sources
10
Geo: Nearby news media discovery
● Given a zip code, Geo returns a list of newspapers, TV stations, and radio stations
in order of proximity to the location associated with the zip code.
● For example, given the zip code “23529” (Norfolk, Virginia, USA), here is a
list of 10 news media for Norfolk:
12
Geo: Nearby news media discovery
● For example, given the zip code “23529” (Norfolk, Virginia, USA), here is a
list of 10 news media for Norfolk (JSON):
13
Geo: Nearby news media discovery
● US local news repository:
○ 5,992 Newspapers
○ 1,061 TV stations, and
○ 2,539 Radio stations
■ Scraped from
http://www.usnpl.com/
● Non-US local news repository:
○ 6,638 Newspapers
○ 183 Countries
○ 3,151 Cities
■ Scraped from
https://www.thepaperboy.com/
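The proximity ordering that Geo performs can be sketched as below. This is a minimal illustration, not Geo's actual implementation: the use of the haversine distance, the record fields (name, lat, lon), the placeholder sources, and the coordinates are all assumptions made for the example.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearest_media(origin, media, k=10):
    """Return the k media records closest to origin, nearest first."""
    lat, lon = origin
    ranked = sorted(
        media,
        key=lambda m: haversine_miles(lat, lon, m["lat"], m["lon"]),
    )
    return ranked[:k]

# Hypothetical records; names and coordinates are illustrative only.
media = [
    {"name": "Paper A", "lat": 36.85, "lon": -76.29},
    {"name": "Station B", "lat": 37.54, "lon": -77.44},
    {"name": "Paper C", "lat": 36.89, "lon": -76.30},
]
origin = (36.885, -76.305)  # assumed coordinates for the zip code

print([m["name"] for m in nearest_media(origin, media, k=2)])
# ['Paper C', 'Paper A']
```

A real lookup would first map the zip code to coordinates and then rank the scraped repository entries the same way.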
14
SearchEngine(q = “protesters and police site:whro.org”)
...
SearchEngine(q = “protesters and police site:pilotonline.com”)
SearchEngine(q = “protesters and police site:wtkr.com”)
Chrome Extension: Collection building
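The per-source queries above follow one pattern: the user's query plus a site: restriction for each local news source. A minimal sketch of building those query strings, assuming the sources are given as URLs or bare domains (the function name site_queries is hypothetical, and issuing the queries to a search engine is left out):

```python
from urllib.parse import urlparse

def site_queries(query, sources):
    """Build one site-restricted query string per news source."""
    queries = []
    for source in sources:
        # Accept full URLs as well as bare domains.
        host = urlparse(source).netloc or source
        if host.startswith("www."):
            host = host[4:]
        queries.append(f"{query} site:{host}")
    return queries

sources = ["whro.org", "http://www.pilotonline.com/", "http://wtkr.com/"]
for q in site_queries("protesters and police", sources):
    print(q)
# protesters and police site:whro.org
# protesters and police site:pilotonline.com
# protesters and police site:wtkr.com
```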
15
Local Stories for Query: "protesters and police", for
23529 (Norfolk VA, USA).
Chrome Extension: Collection building
16
Local vs non-Local
Local news sources from Virginia, such as the Virginian-Pilot,
WHRO-TV, and WTKR-TV
Non-Local sources (e.g., CNN and NBC News),
Local sources (e.g., ABC7 Chicago and the
Chicago Tribune), and a YouTube source
A non-Local collection mixes Local and non-Local sources
Chrome Extension: Collection building
17
To mitigate the problems of content drift and link rot, as well as preserve
collections for future users and researchers, the LMP extension
implements collection archiving
18
Figure: Collection building and Collection archiving
Chrome Extension: Collection archiving
19
Figure: PRESENT IMPLEMENTATION (archive.is, e.g., https://archive.is/0hQQG) vs IDEAL IMPLEMENTATION (public archive_0, public archive_1, ..., public archive_n-1, with archive-uri_0, archive-uri_1, ..., archive-uri_n-1)
Chrome Extension: Collection archiving
20
Community collection building
● We believe there is value when multiple users contribute to the same collection
● This is similar in spirit to the Internet Archive’s request to the public to contribute URIs for
the 2016 Orlando Nightclub Shooting Web Archive:
21
The LMP Extension enables users to share collections on Twitter
● We believe there is value when multiple users contribute to the
same collection
● The LMP Extension enables users to share collections on Twitter.
Shared collections may be tagged with a hashtag
● The hashtag provides a means for thematically-related collections to
be organized
22
The hashtag provides a means for thematically-related collections to be organized
23
Evaluation
● We claim that Local collections have less exposure compared to
non-Local collections
● Through collection building, archiving, and sharing, LMP could
help increase the exposure of Local news sources
● To assess the validity of our claim, we measured the degree of
exposure Local collections have compared to non-Local collections
25
Evaluation: Dataset
● Our evaluation dataset comprised 20 pairs (Local and non-Local) of collections
corresponding to 20 different stories
● Each collection (Local and non-Local) was further split into two classes:
○ G - extracted from the default Google SERP, and
○ NV - extracted from the Google News vertical SERP
26
Evaluation: Metrics
● For each collection we measured:
○ Archival coverage and tweet index rate to approximate the
exposure of the Local and non-Local collections
● We also measured:
○ Temporal range,
○ Precision, and
○ Sub-collection overlap for experimentation
29
Archival coverage: Non-Local collections produced higher archive rates
than Local collections (claim confirmed)
● Definition: The archival coverage is the fraction of a
collection that is archived
● Claim: We claim that non-Local collections possess
higher archive rates than Local collections
● Extraction: The binary archived state of a story in a
collection was extracted by utilizing the MemGator
utility (http://memgator.cs.odu.edu/)
● Result:
○ Non-Local collections G and NV produced
archive rates of 0.83 and 0.80, respectively
○ Local collections G and NV produced archive
rates of 0.52 and 0.63, respectively
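The definition above reduces to a simple fraction once each story has a binary archived state. A minimal sketch, assuming the archived check is supplied as a callable (for instance a wrapper around a Memento aggregator lookup such as MemGator); the lookup here is a stand-in set, and the URIs are placeholders:

```python
def archival_coverage(uris, is_archived):
    """Fraction of the collection's URIs for which is_archived(uri) is true."""
    uris = list(uris)
    if not uris:
        return 0.0
    archived = sum(1 for uri in uris if is_archived(uri))
    return archived / len(uris)

# Stand-in for a real archive lookup (assumption for illustration).
known_archived = {"http://example.com/a", "http://example.com/b"}

collection = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
    "http://example.com/d",
]
print(archival_coverage(collection, lambda uri: uri in known_archived))
# 0.5
```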
30
Tweet index rates: Non-Local collections produced higher tweet index
rates than Local collections (claim confirmed)
● Definition: The tweet index rate is the fraction of a
collection which could also be found embedded in
a tweet
● Claim: We claim that non-Local collections possess
higher tweet index rates than Local collections
● Extraction: The binary tweet index state of a story
in the collection was extracted by searching Twitter
● Result:
○ Non-Local collections G and NV produced
tweet index rates of 0.71 and 0.80,
respectively
○ Local collections G and NV produced tweet
index rates of 0.44 and 0.59, respectively
31
Temporal range: Non-Local-NV collections were the most likely to produce
the newest document, with a probability of 0.75 (claim confirmed)
● Definition: the temporal range of a collection is the
distribution of the creation datestamps of the stories in
the collection
● Claim: We claim that non-Local collections are
temporally biased to produce newer stories than Local
collections
● Extraction: Most news stories have creation
datestamps. We extracted these datestamps from the
SERPs
● Result:
○ Local-G collections produce the oldest
documents with a probability of 0.7
○ The consequences of these probabilities are
crucial: One must sample Local-G collections in
order to maximize the chances of finding the first
reports about a story or event
32
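One plausible reading of "the probability of producing the oldest document" is the fraction of stories for which a given collection class holds the earliest datestamp. The sketch below assumes that reading; it is not confirmed as the authors' procedure, and the class labels and dates are illustrative only.

```python
from collections import Counter
from datetime import date

def oldest_class_probability(stories):
    """For each class, the fraction of stories in which it holds the oldest document.

    stories: list of dicts mapping class label -> list of creation dates.
    Ties go to the class listed first in the story's dict.
    """
    wins = Counter()
    counted = 0
    for per_class_dates in stories:
        # Earliest datestamp seen in each non-empty class for this story.
        earliest = {c: min(ds) for c, ds in per_class_dates.items() if ds}
        if not earliest:
            continue
        wins[min(earliest, key=earliest.get)] += 1
        counted += 1
    return {c: n / counted for c, n in wins.items()}

# Hypothetical datestamps for two stories.
stories = [
    {"Local-G": [date(2014, 4, 25), date(2015, 1, 2)],
     "non-Local-NV": [date(2016, 1, 5)]},
    {"Local-G": [date(2016, 2, 1)],
     "non-Local-NV": [date(2016, 1, 20)]},
]
print(oldest_class_probability(stories))
# {'Local-G': 0.5, 'non-Local-NV': 0.5}
```

Replacing min with max gives the analogous estimate for the newest document.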
Precision: Type-G collections produce documents at a higher precision
than NV (claim partially confirmed)
● Definition: The precision of a collection is the fraction of
stories in the collection that are relevant to the collection
query based on the judgement of a human evaluator. We
considered a story relevant or non-relevant only if the
relevant and non-relevant votes differed by a margin of 2 or more
● Claim: We claim that non-Local collections possess a
higher precision than Local collections
● Extraction: 14 evaluators evaluated our dataset. For each
story in a collection, an evaluator scored the story as
relevant if the story was on topic with respect to the
collection query, and non-relevant otherwise
● Result
○ Local-G precision: 0.84, non-Local-G: 0.72,
Local-NV: 0.71, and non-Local-NV: 0.68
Relevance margin of 2 votes or more
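The vote-margin rule can be sketched as follows. The sketch assumes that stories whose vote split falls short of the margin are left out of the precision calculation entirely; the slides do not say how such stories were treated, and the vote counts are made up for illustration.

```python
def precision_with_margin(votes, margin=2):
    """Precision over stories whose vote split is decisive by at least `margin`.

    votes: list of (relevant_votes, non_relevant_votes) pairs, one per story.
    Returns None when no story reaches the margin.
    """
    decided = [(r, n) for r, n in votes if abs(r - n) >= margin]
    if not decided:
        return None
    relevant = sum(1 for r, n in decided if r > n)
    return relevant / len(decided)

# Hypothetical vote counts for four stories.
votes = [(3, 0), (2, 1), (0, 3), (3, 1)]
print(precision_with_margin(votes, margin=2))  # 2 of 3 decided stories are relevant
print(precision_with_margin(votes, margin=1))  # 3 of 4 decided stories are relevant
```

Lowering the margin admits more stories into the calculation, which is why precision is reported separately for each margin.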
33
Precision: Type-G collections produce documents at a higher precision
than NV (claim partially confirmed)
● Result
○ non-Local-G precision: 0.84, Local-G: 0.79,
non-Local-NV: 0.71, and Local-NV: 0.70
Relevance Margin of 1 Vote or more
34
Sub-collection overlap: Local collections showed a higher overlap rate
than non-Local collections (claim confirmed)
● Definition:
○ Given a collection evaluation dataset, let sub-collection sets L_G and L_NV
define sets populated from Local-G and Local-NV, respectively
○ Similarly, let sub-collection sets NL_G and NL_NV define sets populated
from non-Local-G and non-Local-NV, respectively
○ The overlap of two sets X and Y is denoted overlap(X, Y)
● Claim: We claim Local sub-collections L_G and L_NV have more in common
(more overlap) than non-Local sub-collections NL_G and NL_NV
● Result: Local collections showed a higher overlap rate than non-Local
collections
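The overlap formula itself does not appear in the slide text, so the sketch below substitutes the overlap coefficient, |X ∩ Y| / min(|X|, |Y|), purely as an assumption; another set-similarity measure such as Jaccard would slot into the same comparison. The set contents are placeholders.

```python
def overlap_coefficient(x, y):
    """Overlap coefficient of two collections: |X & Y| / min(|X|, |Y|)."""
    x, y = set(x), set(y)
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))

# Hypothetical sub-collections of story URIs.
l_g = {"a", "b", "c", "d"}      # Local-G
l_nv = {"c", "d", "e"}          # Local-NV
nl_g = {"p", "q", "r"}          # non-Local-G
nl_nv = {"r", "s", "t", "u"}    # non-Local-NV

local_overlap = overlap_coefficient(l_g, l_nv)        # 2 shared of min size 3
non_local_overlap = overlap_coefficient(nl_g, nl_nv)  # 1 shared of min size 3
print(local_overlap > non_local_overlap)
```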
35
e1: Local collections overlap
e2: Non-Local collections overlap
e3: e1 and e2 overlap
Conclusions
● We cannot rely exclusively on non-Local sources to build our
collections
● Local news sources are fundamental to journalism, but less exposed
● LMP’s tools could help expose local news sources
○ Geo (http://www.localmemory.org/geo/)
○ Chrome Extension - Local stories collection generator
(http://www.localmemory.org/)
● Our tools, local news repository, and evaluation results are publicly
available (https://github.com/harvard-lil/local-memory)
37
Follow: @localmem
Download Chrome Extension:
http://www.localmemory.org/
Thank you!
@acnwala @webscidl @harvardlil
38

More Related Content

Similar to Local Memory Project

SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
Micah Altman
 
NewYorkHeritage.org overview for historians
NewYorkHeritage.org overview for historiansNewYorkHeritage.org overview for historians
NewYorkHeritage.org overview for historians
Larry Naukam
 
BL Social Sciences Post Graduate Training Day - Datasets
BL Social Sciences Post Graduate Training Day - DatasetsBL Social Sciences Post Graduate Training Day - Datasets
BL Social Sciences Post Graduate Training Day - Datasets
johnkayebl
 
Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell Extension
Sammy Fung
 
Creating Structure in Web Archives With Collections: Different Concepts From ...
Creating Structure in Web Archives With Collections: Different Concepts From ...Creating Structure in Web Archives With Collections: Different Concepts From ...
Creating Structure in Web Archives With Collections: Different Concepts From ...
Himarsha Jayanetti
 

Similar to Local Memory Project (20)

SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10
 
Mapping Historical Photos For The Common Good
Mapping Historical Photos For The Common GoodMapping Historical Photos For The Common Good
Mapping Historical Photos For The Common Good
 
NewYorkHeritage.org overview for historians
NewYorkHeritage.org overview for historiansNewYorkHeritage.org overview for historians
NewYorkHeritage.org overview for historians
 
Introduction to British Library digital resources for social scientists
Introduction to British Library digital resources for social scientistsIntroduction to British Library digital resources for social scientists
Introduction to British Library digital resources for social scientists
 
OKFN_OpenDataMx
OKFN_OpenDataMxOKFN_OpenDataMx
OKFN_OpenDataMx
 
BL Social Sciences Post Graduate Training Day - Datasets
BL Social Sciences Post Graduate Training Day - DatasetsBL Social Sciences Post Graduate Training Day - Datasets
BL Social Sciences Post Graduate Training Day - Datasets
 
The IMLS National Digital Platform & Your Library: Tools You Can Use
The IMLS National Digital Platform & Your Library: Tools You Can UseThe IMLS National Digital Platform & Your Library: Tools You Can Use
The IMLS National Digital Platform & Your Library: Tools You Can Use
 
BHL Tech Status Update Tech Director W.Ulate 2015.12.11
BHL Tech Status Update Tech Director W.Ulate 2015.12.11BHL Tech Status Update Tech Director W.Ulate 2015.12.11
BHL Tech Status Update Tech Director W.Ulate 2015.12.11
 
Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell Extension
 
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
Digital Library Project Proposal
Digital Library Project ProposalDigital Library Project Proposal
Digital Library Project Proposal
 
Creating Structure in Web Archives With Collections: Different Concepts From ...
Creating Structure in Web Archives With Collections: Different Concepts From ...Creating Structure in Web Archives With Collections: Different Concepts From ...
Creating Structure in Web Archives With Collections: Different Concepts From ...
 
AAPB: National Federation of Community Broadcasters
AAPB: National Federation of Community BroadcastersAAPB: National Federation of Community Broadcasters
AAPB: National Federation of Community Broadcasters
 
Update on IMLS National Digital Platform
Update on IMLS National Digital Platform Update on IMLS National Digital Platform
Update on IMLS National Digital Platform
 
Keeping the Broadcast Historic Record: An Archive of Public Media in the Making
Keeping the Broadcast Historic Record: An Archive of Public Media in the MakingKeeping the Broadcast Historic Record: An Archive of Public Media in the Making
Keeping the Broadcast Historic Record: An Archive of Public Media in the Making
 
Linked Data on the BBC
Linked Data on the BBCLinked Data on the BBC
Linked Data on the BBC
 
Bootstrapping Web Archive Collections of Stories from Micro-collections in S...
Bootstrapping Web Archive Collections  of Stories from Micro-collections in S...Bootstrapping Web Archive Collections  of Stories from Micro-collections in S...
Bootstrapping Web Archive Collections of Stories from Micro-collections in S...
 
BL Digital Scholarship
BL Digital Scholarship BL Digital Scholarship
BL Digital Scholarship
 

More from Alexander Nwala

More from Alexander Nwala (6)

Scraping SERPs For Archival Seeds - It Matters When You Start
Scraping SERPs For Archival Seeds - It Matters When You StartScraping SERPs For Archival Seeds - It Matters When You Start
Scraping SERPs For Archival Seeds - It Matters When You Start
 
Tweet Visibility Dynamics in a Tweet Conversation Graph
Tweet Visibility Dynamics in a Tweet Conversation GraphTweet Visibility Dynamics in a Tweet Conversation Graph
Tweet Visibility Dynamics in a Tweet Conversation Graph
 
Generating collections for stories and events
Generating collections for stories and eventsGenerating collections for stories and events
Generating collections for stories and events
 
Jcdl2016_keynote-zemankova
Jcdl2016_keynote-zemankovaJcdl2016_keynote-zemankova
Jcdl2016_keynote-zemankova
 
Tracking discourse on social media
Tracking discourse on social mediaTracking discourse on social media
Tracking discourse on social media
 
Information Visualization Project
Information Visualization ProjectInformation Visualization Project
Information Visualization Project
 

Recently uploaded

Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost LoverPowerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
PsychicRuben LoveSpells
 
₹5.5k {Cash Payment} Independent Greater Noida Call Girls In [Delhi INAYA] 🔝|...
₹5.5k {Cash Payment} Independent Greater Noida Call Girls In [Delhi INAYA] 🔝|...₹5.5k {Cash Payment} Independent Greater Noida Call Girls In [Delhi INAYA] 🔝|...
₹5.5k {Cash Payment} Independent Greater Noida Call Girls In [Delhi INAYA] 🔝|...
Diya Sharma
 

Recently uploaded (20)

04052024_First India Newspaper Jaipur.pdf
04052024_First India Newspaper Jaipur.pdf04052024_First India Newspaper Jaipur.pdf
04052024_First India Newspaper Jaipur.pdf
 
05052024_First India Newspaper Jaipur.pdf
05052024_First India Newspaper Jaipur.pdf05052024_First India Newspaper Jaipur.pdf
05052024_First India Newspaper Jaipur.pdf
 
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
 
Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost LoverPowerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
 
Verified Love Spells in Little Rock, AR (310) 882-6330 Get My Ex-Lover Back
Verified Love Spells in Little Rock, AR (310) 882-6330 Get My Ex-Lover BackVerified Love Spells in Little Rock, AR (310) 882-6330 Get My Ex-Lover Back
Verified Love Spells in Little Rock, AR (310) 882-6330 Get My Ex-Lover Back
 
Lorenzo D'Emidio_Lavoro sullaNorth Korea .pptx
Lorenzo D'Emidio_Lavoro sullaNorth Korea .pptxLorenzo D'Emidio_Lavoro sullaNorth Korea .pptx
Lorenzo D'Emidio_Lavoro sullaNorth Korea .pptx
 
WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
 
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 46 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 46 (Gurgaon)Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 46 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 46 (Gurgaon)
 
Group_5_US-China Trade War to understand the trade
Group_5_US-China Trade War to understand the tradeGroup_5_US-China Trade War to understand the trade
Group_5_US-China Trade War to understand the trade
 
BDSM⚡Call Girls in Greater Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Greater Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Greater Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Greater Noida Escorts >༒8448380779 Escort Service
 
Gujarat-SEBCs.pdf pfpkoopapriorjfperjreie
Gujarat-SEBCs.pdf pfpkoopapriorjfperjreieGujarat-SEBCs.pdf pfpkoopapriorjfperjreie
Gujarat-SEBCs.pdf pfpkoopapriorjfperjreie
 
Kishan Reddy Report To People (2019-24).pdf
Kishan Reddy Report To People (2019-24).pdfKishan Reddy Report To People (2019-24).pdf
Kishan Reddy Report To People (2019-24).pdf
 
KAHULUGAN AT KAHALAGAHAN NG GAWAING PANSIBIKO.pptx
KAHULUGAN AT KAHALAGAHAN NG GAWAING PANSIBIKO.pptxKAHULUGAN AT KAHALAGAHAN NG GAWAING PANSIBIKO.pptx
KAHULUGAN AT KAHALAGAHAN NG GAWAING PANSIBIKO.pptx
 
Enjoy Night⚡Call Girls Iffco Chowk Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Iffco Chowk Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Iffco Chowk Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Iffco Chowk Gurgaon >༒8448380779 Escort Service
 
Nara Chandrababu Naidu's Visionary Policies For Andhra Pradesh's Development
Nara Chandrababu Naidu's Visionary Policies For Andhra Pradesh's DevelopmentNara Chandrababu Naidu's Visionary Policies For Andhra Pradesh's Development
Nara Chandrababu Naidu's Visionary Policies For Andhra Pradesh's Development
 
BDSM⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
 
₹5.5k {Cash Payment} Independent Greater Noida Call Girls In [Delhi INAYA] 🔝|...
₹5.5k {Cash Payment} Independent Greater Noida Call Girls In [Delhi INAYA] 🔝|...₹5.5k {Cash Payment} Independent Greater Noida Call Girls In [Delhi INAYA] 🔝|...
₹5.5k {Cash Payment} Independent Greater Noida Call Girls In [Delhi INAYA] 🔝|...
 
Enjoy Night⚡Call Girls Rajokri Delhi >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Rajokri Delhi >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Rajokri Delhi >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Rajokri Delhi >༒8448380779 Escort Service
 
Defensa de JOH insiste que testimonio de analista de la DEA es falso y solici...
Defensa de JOH insiste que testimonio de analista de la DEA es falso y solici...Defensa de JOH insiste que testimonio de analista de la DEA es falso y solici...
Defensa de JOH insiste que testimonio de analista de la DEA es falso y solici...
 
06052024_First India Newspaper Jaipur.pdf
06052024_First India Newspaper Jaipur.pdf06052024_First India Newspaper Jaipur.pdf
06052024_First India Newspaper Jaipur.pdf
 

Local Memory Project

  • 1. Providing tools to build collections of stories for local events from local sources 1
  • 2. Th is work was made possible in part by IMLS LG-71-15-0077-15 and support from the Harvard Law School Library. We are grateful for the support. Local Memory Project (LMP) http://www.localmemory.org/, https://twitter.com/localmem Alexander C. Nwala, Michele C. Weigle, and Michael L. Nelson @webscidl Old Dominion University Adam B. Ziegler and Anastasia Aizman @harvardlil Harvard Library Innovation Lab Presented by: Alexander C. Nwala (@acnwala) Computer Science Ph.D student Media Cloud Intern, Berkman Klein Center for Internet & Society, Harvard University JCDL 2017, June 21, 2017 2
  • 3. LMP: Outline 1. Introduction 2. LMP local stories collection building a. Geo: Nearby news media discovery b. Chrome Extension: Collection building c. Collection archiving d. Community collection building 3. Evaluation a. Dataset b. Metrics/Results 4. Conclusions 3
  • 4. Local Michigan media first reported on the Flint water changeover in 2014 http://www.mlive.com/opinion/flint/index.ssf/2014/04/editorial_switch_to_flint_rive.html ● April 2014: Officials in Flint, Michigan switched the city’s water source from Lake Huron (Detroit water system) to the Flint River ● This news was reported by local media such as Michigan Radio, the Flint Journal-MLive, and local TV affiliates in Flint (WEYI, WJRT, WSMH, and WNEM)1 1 Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. h ttps://mediama tters.org/research/2016/02/02/analysis-how-michigan-and-national-reporters-co/208290. (2016). 4
  • 5. http://www.mlive.com/news/flint/index.ssf/2014/05/state_says_flint_river_water_m.html ● May 23, 2014: City residents complained about the water’s taste and smell ● This news was reported Ron Fonger of Flint Journal-MLive reported (local media)2 2 Ron Fonger. 2014. State says Flint River water meets all standards but more than twice the hardness of lake water. h ttp://www.mlive.com/news/ int/index.ssf/2014/05/state_says_fl int_river_water_m.html. (2014). City residents complained about the water’s taste and smell 5
  • 6. Between August and September 2014: the city issued three boil advisories to residents of Flint after finding fecal coliform bacteria (E. coli) in the water1 http://www.mlive.com/news/flint/index.ssf/2014/09/flint_says_drinking_water_advi .html http://www.mlive.com/news/flint/index.ssf/2014/09/flint_lifts_boil_water_advisor.html http://www.mlive.com/news/flint/index.ssf/2014/09/flint_flushes_out_latest_water.ht ml Flint issues three boil advisories after finding E. coli in the water 6 1 Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. h ttps://mediamatt ers.org/research/2016/02/ 02/analysis-how-michigan-and-national-reporters-co/208290. (2016).
  • 7. January 5, 2016: Governor Rick Snyder declared a state of emergency for the city of Flint, due to dangerously high levels of lead contamination in the drinking water https://www.democracynow.org/2016/1/8/poisoned_democracy_how_an_unelected_official January 2016, Governor Rick Snyder declared a state of emergency for Flint 7
  • 8. ● A chain of events about the Flint water crisis was reported by local media, but most of the non-local media did not report this crucial story until 2016.1 ● Local media is fundamental to journalism, but is in decline.3 LMP attempts to shed some light on local media 1 Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. h ttps://mediamatt ers.org/research/2016/02/02/analysis-how-michigan-and-national-reporters-co/208290. (2016). 3 Rasmus Kleis Nielsen. 2015. Local journalism: the decline of newspapers and the rise of digital media. IB Tauris. Non-local media did not report this crucial story until 2016 https://cloudfront.mediamatters.org/static/uploader/image/2016/02/03/flinttimeline1.png 8
  • 9. Local and non-Local media have different priorities Non-local news organizations such as CNN cover stories of a broader (national/international) scope such as Obamacare and the Syrian refugee migrant crisis Local media such as the Caloosa Belle Newspaper (LaBelle, FL) cover stories that would not naturally be of interest to another locality, such as the annual Swamp Cabbage Festival http://caloosabelle.com/?s=swamp+cabbage http://www.cnn.com/specials/world/migration-crisis 9
  • 10. LMP: Introduction LMP provides a suite of tools (beginning with two) to help users and small communities discover, collect, build, archive, and share collections of stories for important local events by leveraging local news sources 10
  • 11. LMP: Outline 1. Introduction 2. LMP local stories collection building a. Geo: Nearby news media discovery b. Chrome Extension: Collection building c. Collection archiving d. Community collection building 3. Evaluation a. Dataset b. Metrics/Results 4. Conclusions 11
  • 12. Geo: Nearby news media discovery ● Given a zip code, Geo, returns a list of newspapers, TV, and radio stations in order of proximity to location associated with the zip code. ● For example, given the zip code: “23529” (Norfolk Virginia, USA), here is a list of 10 news media for Norfolk: 12
  • 13. Geo: Nearby news media discovery ● For example, given the zip code: “23529” (Norfolk Virginia, USA), here is a list of 10 news media for Norfolk (JSON): 13
  • 14. Geo: Nearby news media discovery ● US local news repository: ○ 5,992 Newspapers ○ 1,061 TV stations, and ○ 2,539 Radio stations ■ Scraped from http://www.usnpl.com/ ● Non-US local news repository: ○ 6,638 Newspapers ○ 183 Countries ○ 3,151 Cities ■ Scraped from https://www.thepaperboy.com/ 14
  • 15. Chrome Extension: Collection building ● SearchEngine(q = “protesters and police site:whro.org”) ● SearchEngine(q = “protesters and police site:pilotonline.com”) ● SearchEngine(q = “protesters and police site:wtkr.com”) ● ... 15
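The per-source queries on this slide follow a simple pattern: the user's query plus a `site:` restriction for each local news source. A minimal sketch of that pattern is shown below. The helper name `site_queries` is hypothetical, and the sketch only builds the query strings; how the extension submits them to the search engine is not shown in the slides.

```python
from urllib.parse import urlparse


def site_queries(query, source_urls):
    """Build one site-restricted query string per distinct source domain."""
    queries = []
    seen = set()
    for url in source_urls:
        # urlparse only finds a hostname when a scheme is present,
        # so fall back to the raw string for bare domains.
        host = urlparse(url).hostname or url
        if host.startswith("www."):
            host = host[4:]
        if host in seen:
            continue
        seen.add(host)
        queries.append(f"{query} site:{host}")
    return queries


sources = ["http://whro.org/", "https://pilotonline.com/news", "http://www.wtkr.com"]
for q in site_queries("protesters and police", sources):
    print(q)
```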
  • 16. Chrome Extension: Collection building ● Local stories for the query "protesters and police", for 23529 (Norfolk, VA, USA) 16
  • 17. Chrome Extension: Collection building ● Local vs non-Local ● Local collection: news sources from Virginia, such as the Virginian-Pilot, WHRO-TV, and WTKR-TV ● Non-Local collection: non-Local sources (e.g., CNN and NBC News), Local sources (e.g., ABC7 Chicago and Chicago Tribune), and a YouTube source ● A non-Local collection mixes Local and non-Local sources 17
  • 18. To mitigate the problems of content drift and link rot, as well as preserve collections for future users and researchers, the LMP extension implements collection archiving 18
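  • 20. Chrome Extension: Collection archiving ● Figure: present vs ideal implementation ● Present implementation: the collection is archived at archive.is (e.g., https://archive.is/0hQQG) ● Ideal implementation: the collection is archived at n public archives (public archive 0, public archive 1, ..., public archive n-1), yielding archive-uri 0, archive-uri 1, ..., archive-uri n-1 20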
  • 21. Community collection building ● We believe there is value when multiple users contribute to the same collection ● This is similar in spirit to the Internet Archive’s request to the public to contribute URIs for the 2016 Orlando Nightclub Shooting Web Archive: 21
  • 22. The LMP Extension enables users to share collections on Twitter ● We believe there is value when multiple users contribute to the same collection ● Shared collections may be tagged with a hashtag ● The hashtag provides a means for thematically related collections to be organized 22
  • 23. The hashtag provides a means for thematically-related collections to be organized 23
  • 25. Evaluation ● We claim that Local collections have less exposure than non-Local collections ● Through collection building, archiving, and sharing, LMP could help increase the exposure of Local news sources ● To assess the validity of our claim, we measured the degree of exposure Local collections have compared to non-Local collections 25
  • 26. Evaluation: Dataset ● Our evaluation dataset comprised 20 pairs (Local and non-Local) of collections corresponding to 20 different stories ● Each collection (Local and non-Local) was further split into two classes: ○ G - extracted from the default Google SERP, and ○ NV - extracted from the Google News vertical SERP 26
  • 29. Evaluation: Metrics ● For each collection we measured: ○ Archival coverage and tweet index rate to approximate the exposure of the Local and non-Local collections ● We also measured: ○ Temporal range, ○ Precision, and ○ Sub-collection overlap for experimentation 29
  • 30. Archival coverage: Non-Local collections produced higher archive rates than Local collections (claim confirmed) ● Definition: The archival coverage is the fraction of a collection that is archived ● Claim: We claim that non-Local collections possess higher archive rates than Local collections ● Extraction: The binary archived state of a story in a collection was extracted by utilizing the MemGator utility (http://memgator.cs.odu.edu/) ● Result: ○ Non-Local collections G and NV produced archive rates of 0.83 and 0.80, respectively ○ Local collections G and NV produced archive rates of 0.52 and 0.63, respectively 30
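The archival coverage defined on this slide is a simple fraction over binary per-story states, and can be sketched as below. In the slides the archived state of each story comes from MemGator; this sketch assumes those lookups have already been done and uses a hard-coded list of booleans. The function name `coverage_rate` and the sample values are hypothetical, and the same computation applies to any binary per-story state.

```python
def coverage_rate(flags):
    """Fraction of stories whose binary state (e.g., 'is archived') is true."""
    flags = list(flags)
    if not flags:
        return 0.0  # assumption: an empty collection has zero coverage
    return sum(1 for f in flags if f) / len(flags)


# Hypothetical collection of six stories, five of which were found in an archive.
archived = [True, True, False, True, True, True]
print(round(coverage_rate(archived), 2))
```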
  • 31. Tweet index rates: Non-Local collections produced higher tweet index rates than Local collections (claim confirmed) ● Definition: The tweet index rate is the fraction of a collection which could also be found embedded in a tweet ● Claim: We claim that non-Local collections possess higher tweet index rates than Local collections ● Extraction: The binary tweet index state of a story in the collection was extracted by searching Twitter ● Result: ○ Non-Local collections G and NV produced tweet index rates of 0.71 and 0.80, respectively ○ Local collections G and NV produced tweet index rates of 0.44 and 0.59, respectively 31
  • 32. Temporal range: Non-Local-NV collections were the most likely to produce the newest document, with a probability of 0.75 (claim confirmed) ● Definition: The temporal range of a collection is the distribution of the creation datestamps of the stories in the collection ● Claim: We claim that non-Local collections are temporally biased to produce newer stories than Local collections ● Extraction: Most news stories have creation datestamps. We extracted these datestamps from the SERPs ● Result: ○ Local-G collections produce the oldest documents with a probability of 0.7 ○ The consequences of these probabilities are crucial: one must sample Local-G collections in order to maximize the chances of finding the first reports about a story or event 32
  • 33. Precision: Type-G collections produce documents at a higher precision than type-NV collections (claim partially confirmed) ● Definition: The precision of a collection is the fraction of stories in the collection that are relevant to the collection query, based on the judgement of human evaluators. We considered a story relevant or non-relevant only if the evaluators' votes were decided by a margin of 2 votes or more ● Claim: We claim that non-Local collections possess a higher precision than Local collections ● Extraction: 14 evaluators evaluated our dataset. For each story in a collection, an evaluator scored the story as relevant if the story was on topic with respect to the collection query, and non-relevant otherwise ● Result (relevance margin of 2 votes or more): ○ Local-G precision: 0.84, non-Local-G: 0.72, Local-NV: 0.71, and non-Local-NV: 0.68 33
  • 34. Precision: Type-G collections produce documents at a higher precision than type-NV collections (claim partially confirmed) ● Result (relevance margin of 1 vote or more): ○ non-Local-G precision: 0.84, Local-G: 0.79, non-Local-NV: 0.71, and Local-NV: 0.70 34
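The vote-margin rule used in the precision slides can be sketched as follows. This is an illustration under stated assumptions rather than the authors' evaluation code: each story is represented by a hypothetical pair (relevant votes, non-relevant votes), and stories whose votes differ by less than the margin are assumed to be left out of the precision denominator, which the slides do not spell out.

```python
def precision(votes, margin=2):
    """Precision over stories whose relevance vote is decided by at least `margin`.

    votes: iterable of (relevant_votes, non_relevant_votes) pairs, one per story.
    Assumption: undecided stories are excluded from the computation.
    """
    relevant = 0
    decided = 0
    for rel, non in votes:
        if abs(rel - non) < margin:
            continue  # vote too close to call at this margin
        decided += 1
        if rel > non:
            relevant += 1
    return relevant / decided if decided else 0.0


# Hypothetical votes for a four-story collection.
votes = [(3, 0), (2, 1), (0, 3), (3, 1)]
print(precision(votes, margin=2))
print(precision(votes, margin=1))
```

Lowering the margin admits more stories into the computation, which is one way the two sets of results on these slides can differ.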
  • 35. Sub-collection overlap: Local collections showed a higher overlap rate than non-Local collections (claim confirmed) ● Definition: ○ Given a collection evaluation dataset, let sub-collection sets LG and LNV be the sets populated from Local-G and Local-NV, respectively ○ Similarly, let sub-collection sets NLG and NLNV be the sets populated from non-Local-G and non-Local-NV, respectively ○ The overlap of two sets X and Y is written overlap(X, Y) (the formula appears as an image in the original slide) ● Claim: We claim Local sub-collections LG and LNV have more in common (more overlap) than non-Local sub-collections NLG and NLNV ● Result: Local collections showed a higher overlap rate than non-Local collections ● Figure legend: e1: Local collections overlap; e2: non-Local collections overlap; e3: e1 and e2 overlap 35
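The slide's overlap formula did not survive in this transcript, so the sketch below substitutes a common set-overlap measure, the overlap coefficient |X ∩ Y| / min(|X|, |Y|). This choice is an assumption made for illustration and may differ from the measure the authors used. The sample URIs are hypothetical.

```python
def overlap(x, y):
    """Overlap coefficient of two collections: |X & Y| / min(|X|, |Y|).

    Assumption: this stands in for the slide's missing formula.
    """
    x, y = set(x), set(y)
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


# Hypothetical sub-collections of story URIs.
lg = {"http://example.com/a", "http://example.com/b", "http://example.com/c"}
lnv = {"http://example.com/b", "http://example.com/c",
       "http://example.com/d", "http://example.com/e"}
print(round(overlap(lg, lnv), 2))
```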
  • 37. Conclusions ● We cannot rely exclusively on non-Local sources to build our collections ● Local news sources are fundamental to journalism, but less exposed ● LMP’s tools could help expose local news sources ○ Geo (http://www.localmemory.org/geo/) ○ Chrome Extension - Local stories collection generator (http://www.localmemory.org/) ● Our tools, local news repository, and evaluation results are publicly available (https://github.com/harvard-lil/local-memory) 37
  • 38. Follow: @localmem Download Chrome Extension: http://www.localmemory.org/ Thank you! @acnwala @webscidl @harvardlil 38