
Local Memory Project

Providing tools to build collections of stories for local events from local sources.

Published in: News & Politics

Local Memory Project

  1. Providing tools to build collections of stories for local events from local sources
  2. Local Memory Project (LMP) (http://www.localmemory.org/, https://twitter.com/localmem) ● Alexander C. Nwala, Michele C. Weigle, and Michael L. Nelson (@webscidl), Old Dominion University ● Adam B. Ziegler and Anastasia Aizman (@harvardlil), Harvard Library Innovation Lab ● Presented by: Alexander C. Nwala (@acnwala), Computer Science Ph.D. student; Media Cloud Intern, Berkman Klein Center for Internet & Society, Harvard University ● JCDL 2017, June 21, 2017 ● This work was made possible in part by IMLS LG-71-15-0077-15 and support from the Harvard Law School Library. We are grateful for the support.
  3. LMP: Outline 1. Introduction 2. LMP local stories collection building a. Geo: Nearby news media discovery b. Chrome Extension: Collection building c. Collection archiving d. Community collection building 3. Evaluation a. Dataset b. Metrics/Results 4. Conclusions
  4. Local Michigan media first reported on the Flint water changeover in 2014 http://www.mlive.com/opinion/flint/index.ssf/2014/04/editorial_switch_to_flint_rive.html ● April 2014: Officials in Flint, Michigan switched the city’s water source from Lake Huron (Detroit water system) to the Flint River ● This news was reported by local media such as Michigan Radio, the Flint Journal-MLive, and local TV affiliates in Flint (WEYI, WJRT, WSMH, and WNEM) [1] [1] Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. https://mediamatters.org/research/2016/02/02/analysis-how-michigan-and-national-reporters-co/208290. (2016).
  5. City residents complained about the water’s taste and smell http://www.mlive.com/news/flint/index.ssf/2014/05/state_says_flint_river_water_m.html ● May 23, 2014: City residents complained about the water’s taste and smell ● This news was reported by Ron Fonger of the Flint Journal-MLive (local media) [2] [2] Ron Fonger. 2014. State says Flint River water meets all standards but more than twice the hardness of lake water. http://www.mlive.com/news/flint/index.ssf/2014/05/state_says_flint_river_water_m.html. (2014).
  6. Flint issues three boil advisories after finding E. coli in the water ● Between August and September 2014: the city issued three boil advisories to residents of Flint after finding fecal coliform bacteria (E. coli) in the water [1] http://www.mlive.com/news/flint/index.ssf/2014/09/flint_says_drinking_water_advi.html http://www.mlive.com/news/flint/index.ssf/2014/09/flint_lifts_boil_water_advisor.html http://www.mlive.com/news/flint/index.ssf/2014/09/flint_flushes_out_latest_water.html [1] Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. https://mediamatters.org/research/2016/02/02/analysis-how-michigan-and-national-reporters-co/208290. (2016).
  7. January 2016: Governor Rick Snyder declared a state of emergency for Flint ● January 5, 2016: Governor Rick Snyder declared a state of emergency for the city of Flint, due to dangerously high levels of lead contamination in the drinking water https://www.democracynow.org/2016/1/8/poisoned_democracy_how_an_unelected_official
  8. Non-local media did not report this crucial story until 2016 ● A chain of events about the Flint water crisis was reported by local media, but most of the non-local media did not report this crucial story until 2016 [1] ● Local media is fundamental to journalism, but is in decline [3] ● LMP attempts to shed some light on local media https://cloudfront.mediamatters.org/static/uploader/image/2016/02/03/flinttimeline1.png [1] Denise Robbins. 2016. ANALYSIS: How Michigan And National Reporters Covered The Flint Water Crisis. https://mediamatters.org/research/2016/02/02/analysis-how-michigan-and-national-reporters-co/208290. (2016). [3] Rasmus Kleis Nielsen. 2015. Local journalism: the decline of newspapers and the rise of digital media. IB Tauris.
  9. Local and non-Local media have different priorities ● Non-local news organizations such as CNN cover stories of a broader (national/international) scope, such as Obamacare and the Syrian refugee migrant crisis (http://www.cnn.com/specials/world/migration-crisis) ● Local media such as the Caloosa Belle Newspaper (LaBelle, FL) cover stories that would not naturally be of interest to another locality, such as the annual Swamp Cabbage Festival (http://caloosabelle.com/?s=swamp+cabbage)
  10. LMP: Introduction ● LMP provides a suite of tools (beginning with two) to help users and small communities discover, collect, build, archive, and share collections of stories for important local events by leveraging local news sources
  12. Geo: Nearby news media discovery ● Given a zip code, Geo returns a list of newspapers, TV stations, and radio stations in order of proximity to the location associated with the zip code ● For example, given the zip code “23529” (Norfolk, Virginia, USA), Geo lists 10 news media for Norfolk
  13. Geo: Nearby news media discovery ● For the same zip code, “23529” (Norfolk, Virginia, USA), the list of 10 news media for Norfolk is also available as JSON
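The proximity ordering described for Geo can be sketched as a distance sort. This is a minimal illustration, not LMP's implementation: the haversine distance, the `nearest_media` helper, the record fields, and all coordinates below are assumptions made up for the example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def nearest_media(origin, media, k=10):
    """Return the k media records closest to origin, nearest first."""
    lat, lon = origin
    ranked = sorted(
        media,
        key=lambda m: haversine_km(lat, lon, m["lat"], m["lon"]),
    )
    return ranked[:k]


# Hypothetical coordinates and records, for illustration only.
zip_location = (36.885, -76.305)
media = [
    {"name": "Station A", "type": "tv", "lat": 36.85, "lon": -76.29},
    {"name": "Paper B", "type": "newspaper", "lat": 37.54, "lon": -77.43},
    {"name": "Radio C", "type": "radio", "lat": 36.89, "lon": -76.30},
]
ranked = nearest_media(zip_location, media, k=2)
```

Here `ranked` holds the two closest records, nearest first. Resolving a zip code to a latitude and longitude is assumed to happen before this step.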
  14. Geo: Nearby news media discovery ● US local news repository (scraped from http://www.usnpl.com/): ○ 5,992 newspapers ○ 1,061 TV stations ○ 2,539 radio stations ● Non-US local news repository (scraped from https://www.thepaperboy.com/): ○ 6,638 newspapers ○ 183 countries ○ 3,151 cities
  15. Chrome Extension: Collection building ● The user's query is issued once per local news source, restricted to that source's site: SearchEngine(q = “protesters and police site:whro.org”) ... SearchEngine(q = “protesters and police site:pilotonline.com”) SearchEngine(q = “protesters and police site:wtkr.com”)
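The site-restricted queries on this slide can be generated mechanically from a list of media websites. The sketch below assumes the websites are available as URLs. The `site_queries` helper is a hypothetical name, and the three example URLs are built from the domains shown on the slide.

```python
from urllib.parse import urlparse


def site_queries(query, websites):
    """Build one site-restricted search query per media website URL."""
    queries = []
    for url in websites:
        host = urlparse(url).netloc
        # Drop a leading "www." so the site: operator matches the bare domain.
        if host.startswith("www."):
            host = host[4:]
        queries.append(f"{query} site:{host}")
    return queries


websites = [
    "http://www.whro.org/",
    "http://pilotonline.com/",
    "http://wtkr.com/",
]
queries = site_queries("protesters and police", websites)
```

Each resulting string would be sent to the search engine separately. How the extension submits the queries and parses the results is not covered here.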
  16. Chrome Extension: Collection building ● Local stories for the query "protesters and police", for 23529 (Norfolk, VA, USA)
  17. Chrome Extension: Collection building ● Local vs. non-Local ● Local collection: news sources from Virginia, such as the Virginian-Pilot, WHRO-TV, and WTKR-TV ● Non-Local collection: non-Local sources (e.g., CNN and NBC News), Local sources (e.g., ABC7 Chicago and Chicago Tribune), and a YouTube source ● A non-Local collection mixes Local and non-Local sources
  18. To mitigate the problems of content drift and link rot, and to preserve collections for future users and researchers, the LMP extension implements collection archiving
  19. Chrome Extension: Collection archiving (figure panels: Collection building; Collection archiving)
  20. Chrome Extension: Collection archiving ● Present implementation: archive.is (e.g., https://archive.is/0hQQG) ● Ideal implementation: multiple public archives (public archive 0, public archive 1, ..., public archive n-1), producing archive-uri 0, archive-uri 1, ..., archive-uri n-1
  21. Community collection building ● We believe there is value when multiple users contribute to the same collection ● This is similar in spirit to the Internet Archive’s request to the public to contribute URIs for the 2016 Orlando Nightclub Shooting Web Archive
  22. The LMP Extension enables users to share collections on Twitter ● We believe there is value when multiple users contribute to the same collection ● Shared collections may be tagged with a hashtag ● The hashtag provides a means for thematically-related collections to be organized
  23. The hashtag provides a means for thematically-related collections to be organized
  25. Evaluation ● We claim that Local collections have less exposure than non-Local collections ● Through collection building, archiving, and sharing, LMP could help increase the exposure of Local news sources ● To assess the validity of our claim, we measured the degree of exposure Local collections have compared to non-Local collections
  26. Evaluation: Dataset ● Our evaluation dataset consisted of 20 pairs (Local and non-Local) of collections corresponding to 20 different stories ● Each collection (Local and non-Local) was further split into two classes: ○ G - extracted from the default Google SERP, and ○ NV - extracted from the Google News vertical SERP
  29. Evaluation: Metrics ● For each collection we measured: ○ Archival coverage and tweet index rate, to approximate the exposure of the Local and non-Local collections ● We also measured: ○ Temporal range, ○ Precision, and ○ Sub-collection overlap for experimentation
  30. Archival coverage: Non-Local collections produced higher archive rates than Local collections (claim confirmed) ● Definition: The archival coverage is the fraction of a collection that is archived ● Claim: We claim that non-Local collections possess higher archive rates than Local collections ● Extraction: The binary archived state of a story in a collection was extracted by utilizing the MemGator utility (http://memgator.cs.odu.edu/) ● Result: ○ Non-Local collections G and NV produced archive rates of 0.83 and 0.80, respectively ○ Local collections G and NV produced archive rates of 0.52 and 0.63, respectively
  31. Tweet index rates: Non-Local collections produced higher tweet index rates than Local collections (claim confirmed) ● Definition: The tweet index rate is the fraction of a collection which could also be found embedded in a tweet ● Claim: We claim that non-Local collections possess higher tweet index rates than Local collections ● Extraction: The binary tweet index state of a story in the collection was extracted by searching Twitter ● Result: ○ Non-Local collections G and NV produced tweet index rates of 0.71 and 0.80, respectively ○ Local collections G and NV produced tweet index rates of 0.44 and 0.59, respectively
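Archival coverage and tweet index rate are both defined as the fraction of stories in a collection whose binary state is true (archived, or found in a tweet). A minimal sketch of that computation follows. It assumes the per-story states have already been obtained, for example from an archive lookup or a Twitter search. The `exposure_rate` name and the sample values are made up for illustration.

```python
def exposure_rate(states):
    """Fraction of stories in a collection whose binary state is True."""
    states = list(states)
    if not states:
        raise ValueError("collection is empty")
    return sum(1 for s in states if s) / len(states)


# Hypothetical per-story states for one small collection.
archived = [True, True, False, True]
tweeted = [True, False, False, True]

archival_coverage = exposure_rate(archived)
tweet_index_rate = exposure_rate(tweeted)
```

Using one helper for both metrics reflects that they differ only in how the binary state is obtained, not in how the rate is computed.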
  32. Temporal range: Non-Local-NV collections possessed the highest probability of producing the newest document with a probability of 0.75 (claim confirmed) ● Definition: The temporal range of a collection is the distribution of the creation datestamps of the stories in the collection ● Claim: We claim that non-Local collections are temporally biased to produce newer stories than Local collections ● Extraction: Most news stories have creation datestamps. We extracted these datestamps from the SERPs ● Result: ○ Local-G collections produce the oldest documents with a probability of 0.7 ○ The consequences of these probabilities are crucial: One must sample Local-G collections in order to maximize the chances of finding the first reports about a story or event
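One way to read "produces the oldest documents with a probability of 0.7" is as the fraction of stories for which a given collection type holds the earliest datestamp. The sketch below follows that reading. The slide does not spell out the procedure, so this interpretation is an assumption, and the function name and sample dates are hypothetical. Ties go to whichever type appears first, and every story is assumed to have at least one dated document.

```python
from collections import Counter
from datetime import date


def oldest_source_probability(stories):
    """Map each collection type to the fraction of stories it wins.

    A type "wins" a story when it holds that story's earliest datestamp.
    stories: list of dicts mapping collection type -> list of dates.
    """
    wins = Counter()
    for story in stories:
        # Earliest datestamp per collection type, skipping empty collections.
        earliest = {kind: min(dates) for kind, dates in story.items() if dates}
        winner = min(earliest, key=earliest.get)
        wins[winner] += 1
    return {kind: count / len(stories) for kind, count in wins.items()}


# Two hypothetical stories with made-up datestamps.
stories = [
    {"Local-G": [date(2014, 4, 1), date(2015, 1, 1)],
     "non-Local-NV": [date(2016, 1, 5)]},
    {"Local-G": [date(2016, 2, 1)],
     "non-Local-NV": [date(2016, 1, 20)]},
]
probabilities = oldest_source_probability(stories)
```

Swapping `min` for `max` in both places would give the corresponding estimate for the newest document.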
  33. Precision: Type-G collections produce documents at a higher precision than NV (claim partially confirmed) ● Definition: The precision of a collection is the fraction of stories in the collection that are relevant to the collection query based on the judgement of human evaluators. We considered a story relevant or non-relevant only if the vote was decided by a margin of 2 votes or more ● Claim: We claim that non-Local collections possess a higher precision than Local collections ● Extraction: 14 evaluators evaluated our dataset. For each story in a collection, an evaluator scored the story as relevant if the story was on topic with respect to the collection query, and non-relevant otherwise ● Result (relevance margin of 2 votes or more): ○ Local-G precision: 0.84, non-Local-G: 0.72, Local-NV: 0.71, and non-Local-NV: 0.68
  34. Precision: Type-G collections produce documents at a higher precision than NV (claim partially confirmed) ● Result (relevance margin of 1 vote or more): ○ non-Local-G precision: 0.84, Local-G: 0.79, non-Local-NV: 0.71, and Local-NV: 0.70
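The vote-margin rule can be sketched as follows. A story counts only when the relevant and non-relevant votes differ by at least the margin, and precision is the share of those decided stories judged relevant. Treating undecided stories as excluded from the denominator is an assumption, since the slides do not say how they were handled. The function name and vote counts are hypothetical.

```python
def precision_with_margin(votes, margin=2):
    """Precision over stories whose vote was decided by at least `margin`.

    votes: list of (relevant_votes, non_relevant_votes) pairs, one per story.
    Returns None when no story meets the margin.
    """
    decided = [(rel, non) for rel, non in votes if abs(rel - non) >= margin]
    if not decided:
        return None
    relevant = sum(1 for rel, non in decided if rel > non)
    return relevant / len(decided)


# Hypothetical vote counts for a four-story collection.
votes = [(3, 0), (2, 1), (0, 3), (3, 1)]
strict = precision_with_margin(votes, margin=2)  # the (2, 1) story is dropped
loose = precision_with_margin(votes, margin=1)   # every story is decided
```

Changing the margin changes which stories are counted, so the two results are computed over different sets of stories.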
  35. Sub-collection overlap: Local collections showed a higher overlap rate than non-Local collections (claim confirmed) ● Definition: ○ Given a collection evaluation dataset, let sub-collection sets LG and LNV be the sets populated from Local-G and Local-NV, respectively ○ Similarly, let sub-collection sets NLG and NLNV be the sets populated from non-Local-G and non-Local-NV, respectively ○ The overlap of two sets X and Y is written overlap(X, Y) ● Claim: We claim Local sub-collections LG and LNV have more in common (more overlap) than non-Local sub-collections NLG and NLNV ● Result: Local collections showed a higher overlap rate than non-Local collections ● Figure legend: e1: Local collections overlap; e2: non-Local collections overlap; e3: e1 and e2 overlap
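The formula for overlap(X, Y) is not present in this transcript. As an illustration only, the sketch below uses the overlap coefficient (intersection size divided by the size of the smaller set), which is one common set-overlap measure. It is an assumption, not confirmed as the authors' definition, and the URIs are hypothetical.

```python
def overlap_coefficient(x, y):
    """Overlap coefficient |X ∩ Y| / min(|X|, |Y|); 0.0 if either set is empty.

    Assumed stand-in for overlap(X, Y); the slide's own formula is unknown.
    """
    x, y = set(x), set(y)
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


# Hypothetical story URIs in two Local sub-collections.
lg = {"uri-a", "uri-b", "uri-c"}
lnv = {"uri-b", "uri-c", "uri-d", "uri-e"}
local_overlap = overlap_coefficient(lg, lnv)
```

Under this measure, a value of 1.0 means the smaller sub-collection is fully contained in the larger one. A different measure such as Jaccard would change the values but could be dropped into the same comparison.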
  37. Conclusions ● We cannot rely exclusively on non-Local sources to build our collections ● Local news sources are fundamental to journalism, but less exposed ● LMP’s tools could help expose local news sources: ○ Geo (http://www.localmemory.org/geo/) ○ Chrome Extension - Local stories collection generator (http://www.localmemory.org/) ● Our tools, local news repository, and evaluation results are publicly available (https://github.com/harvard-lil/local-memory)
  38. Thank you! ● Follow: @localmem ● Download the Chrome Extension: http://www.localmemory.org/ ● @acnwala @webscidl @harvardlil
