The document summarizes the Europeana Newspapers Project, which digitized 18 million newspaper pages from across Europe dating from the 17th to the 20th centuries. The project aims to improve search and access to these historical newspapers by applying optical character recognition (OCR) and extracting metadata on the people, places and organizations mentioned in articles. A network of 12 content providers, technical partners and others collaborated on enrichment, aggregation and dissemination of the newspaper content so that it can be explored through Europeana and other online interfaces.
This document summarizes a presentation about using digital technologies and "big data" to study the emergence of the United States as a "reference culture" in public discourse in the Netherlands between 1890 and 1990. It discusses both the promises and the limitations of digital approaches: they make it possible to analyze large amounts of newspaper text, but researchers still need to move from merely finding information to exploring meaningful patterns and relationships in the data.
Sigrid Bosmans & Marijke Wienen (Erfgoedcel Mechelen) on a participatory museum project with residents, heritage caretakers and experts in Mechelen.
Goal: to build the broadest possible support for the new city museum. The museum wants to create a lively connection between the city then and now, between present and past, and to give the urban past a new place in present-day city life. The residents of Mechelen, heritage caretakers, policymakers and opinion leaders help shape the content of the new museum.
Presentation by Hans-Jörg Lieder, Staatsbibliothek zu Berlin – Preußischer Kulturbesitz, at the BnF Information Day for Europeana Newspapers (November 2014).
Optical Character Recognition (OCR) technology can help users in their research by digitizing printed texts and enabling full-text search. However, OCR quality varies and error rates can be as high as 10-40% depending on factors like language and publication date. This can negatively impact researchers seeking all occurrences of search terms. Crowd-sourcing corrections for searched words and utilizing external knowledge sources like Wikipedia could help improve search results and researchers' experiences. Machine learning applied to large digitized collections also has potential to extract additional useful information and insights not readily apparent from the text alone.
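To see why character error rates in that range matter for full-text search, a rough back-of-the-envelope sketch can help. It assumes, purely for illustration, that OCR errors hit each character independently and with the same probability; real OCR errors tend to cluster, so the figures are indicative only. The function name `word_hit_probability` is hypothetical.

```python
def word_hit_probability(char_error_rate: float, word_length: int) -> float:
    """Probability that every character of a word was recognised correctly.

    Assumption (illustrative only): character errors are independent and
    equally likely, so the word survives with probability (1 - e) ** n.
    """
    if not 0.0 <= char_error_rate <= 1.0:
        raise ValueError("char_error_rate must be between 0 and 1")
    if word_length < 0:
        raise ValueError("word_length must not be negative")
    return (1.0 - char_error_rate) ** word_length


if __name__ == "__main__":
    # Compare the two ends of the 10-40% range for short and long words.
    for rate in (0.10, 0.40):
        for length in (5, 10):
            p = word_hit_probability(rate, length)
            print(f"error rate {rate:.0%}, {length}-letter word: "
                  f"exact-match hit probability {p:.2f}")
```

Under these simplifying assumptions, even a 10% character error rate would cause an exact-match search to miss roughly two in five occurrences of a five-letter word, and longer words fare worse.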
The document discusses Optical Layout Recognition (OLR) to convert scanned newspaper pages into structured digital files. It describes CCS's role in providing OLR technology and services to structure over 2 million newspaper pages from 5 European library partners. The general OLR workflow involves scanning, layout analysis to identify text blocks and zones, OCR, and quality assurance. CCS will analyze page layouts to recognize elements like articles, headlines, images and classify page types. Libraries can perform final quality assurance checking on the structured output, which is packaged in METS and ALTO formats for preservation and improved search and access capabilities.
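As a minimal sketch of what structured ALTO output makes possible, the snippet below reads recognised text back out of a tiny ALTO-shaped document, block by block. The sample XML, the block ID and the helper names are invented for illustration; the element and attribute names (`TextBlock`, `TextLine`, `String`, `CONTENT`) follow the ALTO schema. Nothing here reflects CCS's actual tooling.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of an ALTO file; real files carry
# coordinates, styles and much more detail.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="block_1">
      <TextLine>
        <String CONTENT="Historical"/><SP/>
        <String CONTENT="newspapers"/>
      </TextLine>
      <TextLine>
        <String CONTENT="online"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the XML namespace so the code does not depend on the ALTO version."""
    return tag.rsplit("}", 1)[-1]


def text_blocks(alto_xml: str) -> dict:
    """Map each TextBlock ID to the list of text lines it contains."""
    root = ET.fromstring(alto_xml)
    blocks = {}
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block.iter():
            if local_name(line.tag) != "TextLine":
                continue
            # Direct children of a TextLine: String (words) and SP (spaces).
            words = [child.get("CONTENT", "") for child in line
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
        blocks[block.get("ID", "")] = lines
    return blocks


if __name__ == "__main__":
    print(text_blocks(SAMPLE_ALTO))
```

Because the text stays attached to its block, a search interface can return the article or headline a word belongs to rather than just the page.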
The Europeana Newspapers project is digitizing newspapers from the 17th-20th centuries across 22 European languages. It has provided full text for over 2 million newspaper pages and metadata for over 18 million additional pages. Usability testing was conducted with researchers and improvements were made to search, browsing, and display functionality based on feedback. Researchers value the project for enabling new large-scale, interdisciplinary, and computational analyses of digitized newspaper archives.
The document discusses the Europeana Newspapers project, which aims to digitize over 18 million newspaper pages from various European newspapers ranging from the 17th to the 20th centuries. The project involves 12 content providers, 2 networking partners, 4 technology providers and 1 aggregator working together to improve access to historical newspapers. Key aspects of the project include cultural cooperation, skills sharing, and improved search capabilities through technologies like optical character recognition. The project highlights how digitization has improved access to historical newspapers and their coverage of events like the Titanic disaster across different European countries.
This document discusses optical character recognition (OCR) of historical newspapers. It describes the digitization process, which includes image capturing, text and structure recognition, natural language processing, and content representation. OCR accuracy can be improved through layout analysis, structural metadata extraction, and identifying different content units like articles, advertisements, and entertainment sections. The goal is to make the content and knowledge within digitized newspapers accessible beyond the scanned text.
1. The researcher and the newspaper
Dr. Samuël Kruizinga (KB Researcher-in-residence / Universiteit van Amsterdam)
2. The researcher and the newspaper
• Why newspaper research?
• Practical examples
• My own research
• A challenge to Europeana Newspapers
Researcher-in-residence 28-10-2014
3. The researcher and the newspaper
• Why newspaper research?
• Practical examples
• My own research
• A challenge to Europeana Newspapers
4. What is in the newspaper?
• 'News'
• Opinion and/or commentary
• Advertising
• Financial, legal, commercial or social data
• Humour
• Cartoons / images
• Literature / science
5. Newspapers as a historical source of information
• Hard data
• Advertisements provide information about purchasing power
• News, commentary, interpretation…
• … from different viewpoints and angles
• Cultural-historical data
• Changes in style and taste
• Examples: De Tijd, 24 July 1870
7. Newspapers and 'public opinion'
• Who holds an opinion when the newspaper holds an opinion?
• An individual journalist? Or the editor(-in-chief)?
• The journalist as representative of a social group?
• A reflection of the opinion of certain elites?
• A reflection of 'the' public opinion?
• What happens to the information from the newspaper?
• Influence on political / societal debates?
• Influence on "the reader"
• Broader question: to what extent is a newspaper a reflection of a society?
• Practical example
9. The newspaper is commercial
• Newspapers follow the news…
• … which is made by Bismarck and the Pope…
• … but news is selected and coloured by the societal preferences of the paper's own readership
• Tilburgsche Courant: a conservative-Catholic, regionally oriented daily
10. The newspaper and the reader
• What does a newspaper report do to a reader? Hard to say.
• statistical information on circulation figures
• opinion polls / election results and shifting public opinion
• literacy and socio-economic conditions
• In general
• The reader is and remains an enigma
• Newspapers shape or reinforce opinions
• Large-scale research removes (some of) the uncertainty
• 'Digital humanities': Pim Huijnen
11. The researcher and the newspaper
• Why newspaper research?
• Practical examples
• My own research
• Digital (im)possibilities
• A challenge to Europeana
13. The Dutch memory of WWI
• Memory: 'collective memory' (Halbwachs 1924)
• 'Impact' of an event on a society
• Current research: memorials, cultural expressions
• Emphasis on 'death'
• My approach
• From formal to informal, from elite to society-wide
• References to the war in newspapers in the period 1918-1940
• In which contexts did newspapers refer to the war?
• How?
• Topic modelling
Researcher-in-residence 16-09-2014
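The question of which contexts a term appears in can be approximated, as a first step, without a full topic model. The sketch below is not topic modelling: it swaps in a much simpler technique, counting which words occur near a target term. The toy sentences, the stopword list and the function names are all invented for illustration and are not taken from the research described on the slide.

```python
import re
from collections import Counter

# Invented toy sentences, not real newspaper text.
ARTICLES = [
    "the war memorial was unveiled and the mayor spoke of the dead",
    "since the war prices of bread and coal have risen sharply",
    "veterans of the war gathered at the memorial in the city",
]
WAR_TERMS = {"war"}
# Tiny illustrative stopword list; a real study would use a fuller one.
STOPWORDS = {"the", "of", "and", "was", "at", "in", "have", "since"}


def context_words(text, targets, window=4):
    """Count non-stopwords within `window` tokens of any target term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token not in targets:
            continue
        start = max(0, i - window)
        for j, neighbour in enumerate(tokens[start:i + window + 1], start=start):
            if j != i and neighbour not in STOPWORDS and neighbour not in targets:
                counts[neighbour] += 1
    return counts


def corpus_contexts(articles, targets, window=4):
    """Sum the context counts over all articles."""
    total = Counter()
    for article in articles:
        total += context_words(article, targets, window)
    return total


if __name__ == "__main__":
    for word, count in corpus_contexts(ARTICLES, WAR_TERMS).most_common():
        print(word, count)
```

A topic model would go further by grouping such co-occurring words into themes across the whole corpus; this count only shows the raw neighbourhood of the search term.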
14. The researcher and the newspaper
• Why newspaper research?
• Practical examples
• My own research
• A challenge to Europeana Newspapers
15. Europeana Newspapers…
• Makes a lot possible!
• Major advantages:
• Convenience
• Sheer scale
• Easier, broader and renewed research
• International and transnational phenomena
• But beware:
• Balance between quantitative and qualitative research
• Balance between context, paratext and 'bag of words'
• Show what is possible!
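The caution about 'bag of words' can be made concrete with a small sketch. A bag-of-words representation keeps only word counts and discards word order, so two sentences with opposite meanings can look identical. The example sentences and the function name are invented for illustration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Reduce a text to word counts, discarding order and layout."""
    return Counter(text.lower().split())


# Invented example sentences with the same words in a different order.
first = "the editor criticised the minister"
second = "the minister criticised the editor"

if __name__ == "__main__":
    print(bag_of_words(first))
    print(bag_of_words(first) == bag_of_words(second))
```

The two bags compare as equal although the sentences say opposite things, which is why quantitative counts benefit from being read alongside the context and paratext of the original page.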