2. Challenges
• Digitized newspaper corpora currently siloed in national collections
• Historical newspaper archives have been digitised by public entities
(the Library of Congress, Hemeroteca Nacional de México),
commercial companies (Gale Cengage or DC Thomson/FindMyPast),
and public-private partnerships (the British Newspaper Archive).
• The problem of OCR-related noise, or imperfect comparability of
corpora
3. Member Institutions and PIs
• Northeastern University, US. Ryan Cordell (Consortium PI)
• University of Nebraska–Lincoln, US. Elizabeth Lorang
• North Carolina State University, US. Paul Fyfe
• University of Turku, Finland. Hannu Salmi
• University College London, UK. Ulrich Tiedau
• Loughborough University, UK. Melodee Beals
• Utrecht University, Netherlands. Jaap Verheul
• National Autonomous University of Mexico.
Isabel Galina Russell
• Universität Stuttgart. Steffen Koch
4. Member Institutions and PIs
The Finnish Team:
Otto Latva
Asko Nivala
Mila Oiva
Hannu Salmi
5. Data Providers
Germany
• Berlin State Library
• Hamburg State Library
• Bavarian State Library
Netherlands
• National Library of the Netherlands
United Kingdom
• British Library
• Cengage Publishing
Finland
• National Library of Finland
6. Available Data for the Project
Australia’s Trove Newspapers http://trove.nla.gov.au 18.5 million
British Newspapers Archive http://www.britishnewspaperarchive.co.uk 14.5 million
Chronicling America (US) http://chroniclingamerica.loc.gov 11 million
Europeana Newspapers http://europeana-newspapers.eu 20 million
Hemeroteca Nacional Digital de México http://www.hndm.unam.mx 9 million
National Library of Finland http://digi.kansalliskirjasto.fi/sanomalehti 2 million
National Library of the Netherlands http://www.delpher.nl/nl/kranten 11 million
National Library of Wales http://newspapers.library.wales 1.1 million
New Zealand's PapersPast http://paperspast.natlib.govt.nz 4 million
Cengage Newsvault (commercial) http://goo.gl/OgCvUo 16 million
7. Aims
• To build classifiers for textual and visual similarity of related
newspaper passages;
• To create a networked ontology of different genres, forms, and
textual elements that emerged during the nineteenth century;
• To model and visualise textual migration and viral culture;
• To model and visualise conceptual migration and translation of
texts across regional, national, and linguistic boundaries;
• To analyze the sensitivity and generality of results; release public
collections
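As an illustration of the first aim (scoring textual similarity of related passages despite OCR noise), here is a minimal sketch. It is not the consortium's method: character n-gram shingling with Jaccard overlap is swapped in as a simple, well-known stand-in, and the function names `shingles` and `jaccard`, the n-gram size of 5, and the sample sentences are all assumptions made for this example.

```python
from __future__ import annotations

import re


def shingles(text: str, n: int = 5) -> set[str]:
    """Return the set of character n-grams of a normalized passage."""
    # Lowercase, then collapse runs of non-word characters (and underscores)
    # to a single space. \W is Unicode-aware for str patterns, so letters
    # such as ä or é are kept.
    normalized = re.sub(r"[\W_]+", " ", text.lower()).strip()
    if len(normalized) < n:
        # Too short to shingle: treat the whole string as one shingle.
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: str, b: str, n: int = 5) -> float:
    """Jaccard overlap of the two passages' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0  # Two empty passages: define similarity as zero.
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample passages (not taken from any of the corpora).
original = "The steamer arrived at Liverpool on Tuesday morning."
ocr_reprint = "The stcamer arrivcd at Liverpool on Tucsday morning."
unrelated = "Prices of wheat and barley remained steady this week."

print(jaccard(original, ocr_reprint))
print(jaccard(original, unrelated))
```

Character-level shingles are used here because a single misrecognized letter damages only the few n-grams that contain it, whereas word-level matching would lose the whole word. A noisy reprint should therefore score well above an unrelated passage, though any threshold for calling two passages "related" would need tuning per corpus and language.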
8. Questions
1. Which stories spread between nations and how quickly?
2. Which texts were translated and resonated across languages?
3. How did textual copying (reprinting) operate internationally compared to
conceptual copying (ideas spread)?
4. How did the migration of texts facilitate the circulation of knowledge,
ideas, and concepts, and how were these ideas transformed as they
moved from one Atlantic context to another?
5. How did geopolitical realities (e.g. economic integration, technology,
migration, geopolitical power) influence the directionality of these
transnational exchanges?
6. How does reporting in immigrant and ethnic communities differ from
reporting in surrounding host countries?
7. Does the national organization of digitized newspaper archives
artificially foreclose globally-oriented research questions and outcomes?