This document summarizes a presentation given by Clemens Neudecker of the Staatsbibliothek zu Berlin on reading a million books and newspapers through digitization. It discusses various digital library projects and collections containing millions of digitized objects. It then focuses on the Europeana Newspapers project, which has digitized over 12 million historic newspaper pages from across Europe. The presentation describes the formats and standards used in digitization, as well as tools for working with digitized content. It also evaluates the performance of optical character recognition on the Europeana Newspapers collection and challenges involved in processing historic newspaper text.
6. About me
• Research Coordinator @ Berlin State Library
• M.A. Philosophy, Computer Science, Political Science
• Mostly curious about
– Optical Character Recognition, Document Analysis
– Natural Language Processing
– Digital Humanities
• More: @cneudecker, cneud.net
7. Staatsbibliothek zu Berlin
• Established 1661 as the Library of the King of Prussia
• Today the largest research library in Germany,
with approx. 11.5m volumes (23m objects)
• Part of the „Stiftung Preußischer Kulturbesitz“,
a unique union of museums, archives, libraries
and research institutes from Berlin
• http://staatsbibliothek-berlin.de/
15. Formats & Standards
• What data is available?
• Typically, a digital object is composed of:
– Scanned Images in TIFF, JP2 or JPEG
– Descriptive metadata in Dublin Core
– Structural metadata in METS
– Text content in ALTO or TEI
– Europeana metadata in EDM (Europeana Data Model)
– Linked Data in RDF or JSON-LD
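As an illustration of working with the text layer listed above, here is a minimal sketch of pulling plain text out of an ALTO file with Python's standard library. The sample document is hand-written for this example and is not taken from the collection; the helper names `local_name` and `alto_lines` are hypothetical. The sketch relies only on the ALTO convention that words are `String` elements with a `CONTENT` attribute, grouped into `TextLine` elements, and it ignores hyphenation and layout coordinates.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO-like sample (hypothetical, for illustration only).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Dieser"/><SP/><String CONTENT="Entwurf"/>
          </TextLine>
          <TextLine>
            <String CONTENT="ist"/><SP/><String CONTENT="vorgelegt"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the XML namespace so the code does not depend on the ALTO version."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return one string per TextLine, joining the CONTENT of its String children."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [
                child.attrib.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_lines(SAMPLE_ALTO):
        print(line)
```

Matching on the local element name rather than a fixed namespace is a design choice for this sketch, since ALTO files in the wild declare different schema versions.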
19. Europeana Newspapers
• EU project to make Europe's historical
newspapers searchable and accessible
• http://www.europeana-newspapers.eu/
20. Europeana Newspapers Collection
• Full text of 12 million historic newspaper pages
(> 10,000,000,000 tokens)
• 40 languages, 4 alphabets
• 400 years (1618 – 2016)
• http://www.theeuropeanlibrary.org/tel4/newspapers
21. OCR / OLR
(U.lag nul «chestttetrung- ■geeinoel II, Setch«it,zen I—Ig Ufr sterntpeechee g» U II.
für ftrene-geingelpilche: 13 01191 nnd 13 03 11 io"gl f l««lt-beOeu; OetHn *1,
blnftraße IS IZeinsptechee; H I Sanemeinummet gurfilrft 8MB); ««de«: gdn.o(tio||e III
(ZemlpreAei 284.3»). Iie.gonlen nur nnier heimnnn » Erden bei der veutlchen Bonl
»n« Vtdennld-Getelltchoil gttloto bumduig. Commerz- nndprinoldonlN voINchrSomI
bomduig u 189 ß>, .»ontbeegee Kochelchlen- eitchelne» 12 mal wSchenNIch. täglich
zweimal — morgen« nnd ndendn —, Sonntage nnr morgen». Toonlnge nur abend»
Zn den Kochdorerlen wird die Ndend-Nuegode noch am üben!
Dieser Entwurf ist. wie Bürgermeister Roß mitteilte, den Fraktionen zur
Stellungnahme vorgelegt worden. Zum Donnerstag war eine zweite Sitzung der
Fraktionsfübrer vom Vertreter des Senats angeordnet worden, zu der ober zwei
Fraktionen, die Teutschnationalen und die Nationalsozialisten, nicht erschienen
waren. Von den Nationalsozialisten ist kurz vor Beginn der Sitzung eine telephonische
Erklärung abgegeben worden, etwa des Inhalts, daß die Fraktion sich den sachlichen
Verhandlungen entziehen müsse, solange nicht gewisse Vorbedingungen erfüllt sein.
http://www.theeuropeanlibrary.org/tel4/newspapers/issue/Hamburger_Nachrichten/1932/12/31
27. Named Entity Recognition
• 3 Categories:
– PERSON; LOCATION; ORGANIZATION
• 3 Languages:
– Dutch; French; German
• Powered by Stanford CoreNLP - CRF-NER
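Annotated data for a CRF-based tagger such as Stanford NER is commonly stored as one token per line with a tab-separated label, using `O` for tokens outside any entity. The sketch below assumes that layout and the short labels PER, LOC and ORG; the sample sentence and the function name `count_labels` are made up for illustration, and the slides do not state the project's actual file format.

```python
from collections import Counter

# Hypothetical sample in token<TAB>label layout; "O" marks non-entity tokens.
SAMPLE_TSV = "Bürgermeister\tO\nRoß\tPER\nin\tO\nHamburg\tLOC\n"


def count_labels(tsv_text: str) -> Counter:
    """Count how many tokens carry each label in a token<TAB>label file."""
    counts = Counter()
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # blank lines usually separate sentences or documents
        _token, label = line.split("\t")
        counts[label] += 1
    return counts


if __name__ == "__main__":
    print(count_labels(SAMPLE_TSV))
```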
28. Annotations
Language   # tokens   # PER   # LOC   # ORG
French      207,000   5,672   5,614   2,574
Dutch       182,483   4,492   4,448   1,160
German       96,735   7,914   6,143   2,784
Language   % tokens   % PER   % LOC   % ORG
French         100%   2.75%   2.71%   1.24%
Dutch          100%   2.46%   2.44%   0.64%
German         100%   8.18%   6.35%   2.88%
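The percentages are the entity counts divided by the token counts. A short sketch that recomputes them from the figures on the slide; the dictionary layout and the function name `entity_shares` are choices made for this example.

```python
# Annotation counts as given on the slide.
COUNTS = {
    "French": {"tokens": 207_000, "PER": 5_672, "LOC": 5_614, "ORG": 2_574},
    "Dutch": {"tokens": 182_483, "PER": 4_492, "LOC": 4_448, "ORG": 1_160},
    "German": {"tokens": 96_735, "PER": 7_914, "LOC": 6_143, "ORG": 2_784},
}


def entity_shares(row: dict[str, int]) -> dict[str, float]:
    """Return each entity count as a percentage of the token count."""
    return {
        label: round(100 * row[label] / row["tokens"], 2)
        for label in ("PER", "LOC", "ORG")
    }


if __name__ == "__main__":
    for language, row in COUNTS.items():
        print(language, entity_shares(row))
```

Recomputed this way, French PER comes out at 2.74% rather than the 2.75% shown, possibly because the French token count on the slide is rounded; the other values agree with the slide.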
Language   Word Error Rate (Bag of Words)   Reading Order Success Rate
French     16.6%                            19.9%
Dutch      17.6%                            23.2%
German     15.9% / 21.9%                    13.6%
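A bag-of-words error rate compares the OCR output with the ground truth as unordered word multisets, so mistakes in reading order do not count as word errors. The sketch below shows one possible definition, assumed here for illustration: the larger of the missed and the spurious word counts, divided by the number of ground-truth words. The evaluation tool used in the project may define the measure differently.

```python
from collections import Counter


def bag_of_words_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Order-independent word error rate based on multiset overlap."""
    truth = Counter(ground_truth.split())
    output = Counter(ocr_output.split())
    total = sum(truth.values())
    if total == 0:
        raise ValueError("ground truth must contain at least one word")
    matched = sum((truth & output).values())  # multiset intersection
    missed = total - matched                  # truth words absent from the OCR
    spurious = sum(output.values()) - matched # OCR words absent from the truth
    # A substituted word shows up once as missed and once as spurious,
    # so taking the maximum counts it as a single error.
    return max(missed, spurious) / total


if __name__ == "__main__":
    truth = "die Sitzung der Fraktionen"
    print(bag_of_words_error_rate(truth, "die Sitznng der Fraktionen"))  # 0.25
    print(bag_of_words_error_rate(truth, "der Fraktionen die Sitzung"))  # 0.0
```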
31. Lack of metadata
Issue
There is no associated metadata for the annotated
text (newspaper title, date, etc.)
Solution
Automatically match lines with newspaper pages
through keyword search
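The slide does not describe the matching procedure, so the following is only a sketch of one way keyword search could link an annotated line back to its page: build an inverted index from longer words to page identifiers, then let the words of the line vote for pages. The page identifiers, sample texts and function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def keywords(text: str, min_len: int = 5) -> set[str]:
    """Lower-cased words of at least min_len characters, used as search keys."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in keywords(text):
            index[word].add(page_id)
    return index


def match_line(line: str, index: dict[str, set[str]]):
    """Return the page id sharing the most keywords with the line, or None."""
    votes = Counter()
    for word in keywords(line):
        for page_id in index.get(word, ()):
            votes[page_id] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    pages = {
        "page_a": "Dieser Entwurf ist den Fraktionen zur Stellungnahme vorgelegt worden",
        "page_b": "Die Fraktion müsse sich den Verhandlungen entziehen",
    }
    index = build_index(pages)
    print(match_line("den Fraktionen zur Stellungnahme vorgelegt", index))
```

Once a line is matched to a page, the page's own metadata (newspaper title, date) can be attached to the annotation. Exact keyword lookup is brittle on noisy OCR, so a real pipeline would likely need fuzzy matching as well.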
32. OCR errors vs. historical spelling
Issue
Text contains OCR errors but also
valid(!) historical spelling variants
Solution
Document language profiling to distinguish
OCR errors from spelling variants
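The slide names language profiling without giving details. As a heavily simplified stand-in, the sketch below classifies a token as a historical variant if a known historical spelling pattern rewrites it into a modern lexicon word, and as a suspected OCR error otherwise. The tiny lexicon, the two rewrite patterns (older German "th" for "t" and "ey" for "ei") and the function name `classify` are assumptions for illustration; an actual profiler would estimate such patterns from the documents themselves.

```python
# Toy modern lexicon and historical rewrite patterns (illustrative assumptions).
MODERN_LEXICON = {"teil", "sein", "bei", "sitzung"}
HISTORICAL_PATTERNS = [("th", "t"), ("ey", "ei")]


def classify(token: str,
             lexicon: set[str] = MODERN_LEXICON,
             patterns: list[tuple[str, str]] = HISTORICAL_PATTERNS) -> str:
    """Label a token as modern, historical_variant or suspected_ocr_error."""
    word = token.lower()
    if word in lexicon:
        return "modern"
    for old, new in patterns:
        # A pattern explains the token if rewriting yields a known modern word.
        if old in word and word.replace(old, new) in lexicon:
            return "historical_variant"
    return "suspected_ocr_error"


if __name__ == "__main__":
    for token in ("Teil", "Theil", "seyn", "Sitznng"):
        print(token, classify(token))
```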
33. Sentence splits
Issue
During data pre-processing, (parts of)
sentences have been erroneously cut
Solution
Reconstruct sentences through keyword
search and matching procedure
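One way to picture the reconstruction step, once the source page is known: split the page text into sentences and pick the sentence that shares the longest contiguous run of characters with the truncated fragment. This is a sketch under that assumption, not the procedure used in the project; the function names are hypothetical and the regular-expression sentence splitter is naive (it will break on abbreviations).

```python
import re
from difflib import SequenceMatcher


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recover_sentence(fragment: str, page_text: str):
    """Return the page sentence with the longest common substring with fragment."""
    best, best_size = None, 0
    for sentence in split_sentences(page_text):
        matcher = SequenceMatcher(None, fragment, sentence, autojunk=False)
        size = matcher.find_longest_match(0, len(fragment), 0, len(sentence)).size
        if size > best_size:
            best, best_size = sentence, size
    return best


if __name__ == "__main__":
    page = (
        "Von den Nationalsozialisten ist kurz vor Beginn der Sitzung eine "
        "telephonische Erklärung abgegeben worden. Dieser Entwurf ist den "
        "Fraktionen zur Stellungnahme vorgelegt worden."
    )
    print(recover_sentence("kurz vor Beginn der Sitzung eine", page))
```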