Challenges and Solutions in
Creating a European Historic
Newspapers Browser

Alastair Dunning
Europeana Newspapers
September 2013
Work
Package 4 -

Aggregation and
presentation of digitized
newspapers

Task $
“...
Creation of a full-text index
of newspaper content
Development of a
newspaper content
browser
In reality ...

The European Library is
building an interface to
allow cross-searching of
historic newspapers
digitised by project
partners
Title-level metadata
exported to Europeana.
Timetable

Sep 2013 - Beta version with limited
content and functionality made
available

2014 - Ongoing inclusion of more
content and functionality

Spring 2014 - Usability testing I (subject
to project funding)
Winter 2014 - Usability testing II (subject
to project funding)

Jan 2015 - All scheduled content and
functionality completed
What content will be included?

Full Images, Full Text, Metadata
Latvia, Belgrade, Hamburg, Berlin,
Estonia, Finland, Netherlands *,
Austria *

Snippets of Images, Full Text, Metadata
Friedrich Tessmann, France *,
Poland
Complete
Newspaper image
can be shown
Eesti Postimees ehk
Naddaleleht, 2
November 1866
(National Library of
Estonia)
What content will be included?

Full Images, Full Text, Metadata
Latvia, Belgrade, Hamburg, Berlin,
Estonia, Finland, Netherlands *,
Austria *

Snippets of Images, Full Text, Metadata
Friedrich Tessmann, France *,
Poland
Fragment of Newspaper
image can be shown
Dziennik Śląski, 10 June
1915
(National Library of
Poland)
What content will be included?

Just Metadata
Turkey
(Partners with copyright issues)
All Associate Partners (for now)

The available content is influenced
by the copyright restrictions and
business model of each
of the contributing libraries.
Only title-level metadata can
be shown:
“Kleine Blatt, 15 November
1932”
(National Library of Austria)
(Although can we have a dark
index of the full text?)
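
The three display levels and the "dark index" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the browser is built: the `Access` levels, the `Page` fields and the `matches` / `render` helpers are all hypothetical names. The point it shows is that a page can be searchable through its full text while the interface displays only what the contributing library permits.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    FULL = "full"          # full images, full text, metadata
    SNIPPET = "snippet"    # fragments of image and text, metadata
    METADATA = "metadata"  # title-level metadata only


@dataclass
class Page:
    title: str
    date: str
    library: str
    full_text: str   # OCR text held in the index (hypothetical field)
    access: Access   # hypothetical per-title setting agreed with the library


def matches(page: Page, query: str) -> bool:
    """Search runs over the full text even when it is never displayed (dark index)."""
    return query.lower() in page.full_text.lower()


def render(page: Page, query: str, context: int = 20) -> dict:
    """Return only the fields the contributing library allows to be shown."""
    view = {"title": page.title, "date": page.date, "library": page.library}
    if page.access is Access.FULL:
        view["text"] = page.full_text
    elif page.access is Access.SNIPPET:
        pos = page.full_text.lower().find(query.lower())
        if pos >= 0:
            start = max(pos - context, 0)
            view["text"] = page.full_text[start:pos + len(query) + context]
    # Access.METADATA: nothing beyond title, date and library is added
    return view
```

With a metadata-only page, `matches` can still report a hit, but `render` returns no `text` field, so the user is pointed to the title without seeing the OCR.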
Creating a
newspapers
interface
that ...

• Provides unique value to users
• Reflects relationship to original
  physical newspaper collections
• Is sustainable
• Offers contributors added value
• Defines relationship to
  Europeana
• Respects library wishes
Provides
unique value
to users

Users can cross-search
European Newspapers
18m pages, 10m with full text
Users can see what was
published on a particular day
across Europe
Users can see information on
individual newspapers
But who are
the users?

Local historians
Researchers
Undergraduates
Genealogists
Teachers and school pupils
‘Interested public’
….
(According to the project
Description of Work it is for the
‘researcher’)
Respects
library wishes

The available content is
influenced by the copyright
restrictions and
business model of each
of the contributing libraries.
●Location of digital image
●Size of image
●Format of image
Reflects
relationship to
original
physical
newspaper
collections

Not all issues in a
newspaper title will be
available to TEL, or even
digitised
Documents hosted by TEL
will be different quality than
those
Contextual information vital
Embedded in The European
Library (TEL) portal

Is sustainable
TEL membership fees will
help with ongoing costs
TEL members can add
content to newspaper
browser over time
Offers
contributors
added value

Logos and links back to
source of original content
But also evidence of usage
of library content via TEL /
what statistics are needed ?
Is developed
iteratively

Interface will respond to
usability testing
Harvesting of different
material will affect interface
Changing requests from
libraries
Uneven quality, especially in
First
Iteration

Basic text search
Filtering of results by
●date
●country
●newspaper
●language
●library
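
As a rough illustration of the first iteration's search, the sketch below combines a basic text match with filtering by the five facets listed above. It assumes a hypothetical in-memory list of `Hit` records; the field names simply mirror the facet names, and nothing here reflects the actual index technology used by The European Library.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    # Field names mirror the facets on the slide; the record itself is hypothetical.
    newspaper: str
    country: str
    language: str
    library: str
    date: dt.date
    text: str


def search(pages, query, **facets):
    """Basic text search, then keep only pages matching every requested facet."""
    q = query.lower()
    results = []
    for page in pages:
        if q not in page.text.lower():
            continue
        if all(getattr(page, name) == value for name, value in facets.items()):
            results.append(page)
    return results


# Illustrative records only, not real content from the partner libraries.
PAGES = [
    Hit("Title A", "Austria", "de", "Library A", dt.date(1932, 11, 15), "Nachrichten aus Linz"),
    Hit("Title B", "Finland", "fi", "Library B", dt.date(1932, 11, 15), "Uutisia Turusta ja Linz"),
    Hit("Title C", "Austria", "de", "Library A", dt.date(1915, 6, 10), "Bericht aus Graz"),
]
```

For example, `search(PAGES, "linz")` returns the first two records, while `search(PAGES, "linz", country="Austria")` narrows the result to the first one.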
First
Iteration

● OCR shown
● Zoomable version of full
image
● Clickable links between
full text and image
(sometimes)
● Link to newspaper source
library (where we have
been provided with links)
Second
Iteration

● Fragments (where requested by
library)
● See information on particular
title
● See what was published on a
particular day
● Search over titles (not just text)
● Other browseable visualisations
of publication and library
source
● Search / browse via entities
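
The "what was published on a particular day" view could, for instance, be built by grouping issues by date and country. The following is a minimal sketch under the assumption that each issue is available as a hypothetical `(title, country, date)` tuple; the titles are placeholders.

```python
import datetime as dt
from collections import defaultdict


def published_on(issues, day):
    """Group the titles that have an issue on `day` by country."""
    by_country = defaultdict(list)
    for title, country, issue_date in issues:
        if issue_date == day:
            by_country[country].append(title)
    # Sort titles so the listing is stable from one call to the next.
    return {country: sorted(titles) for country, titles in by_country.items()}


# Illustrative issue list only (hypothetical titles and dates).
ISSUES = [
    ("Title B", "Austria", dt.date(1932, 11, 15)),
    ("Title A", "Austria", dt.date(1932, 11, 15)),
    ("Title C", "Finland", dt.date(1932, 11, 15)),
    ("Title A", "Austria", dt.date(1932, 11, 16)),
]
```

Calling `published_on(ISSUES, dt.date(1932, 11, 15))` gives one entry per country with the titles available for that day.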
Testing the
Site

Newspapers from national libraries of
Finland and Austria are available for
searching (Sample search terms: Linz,
Graz, Salzburg, Turku, Oulu, Tampere)
{Site will be made available for testing
after the conference}
1. Play with site (it will break)
2. Put Post-it notes on wall …
Landing Page / Search Results Page /
Newspaper Page

Alastair Dunning, Challenges and Solutions in Creating a European Historic Newspaper Browser, TEL
