The document discusses OCR for typewritten documents. It describes the IMPACT project, which is supported by the European Community under the FP7 ICT Work Programme and coordinated by the National Library of the Netherlands. The presentation covers the challenges of typewritten documents for OCR, the specific approaches used in the IMPACT project's TOCR system, and some example results showing its performance.
The document discusses linguistic resources created to improve access to 16th-century German texts. It describes how the IMPACT project adapted resources such as lexicons to account for the differences between historical and modern German. A ground-truth corpus spanning 1500-1950 was created, along with a hypothetical lexicon of rule-based spelling variants and a manually verified lexicon mapping historical words to their modern equivalents. These resources covered 30% of 16th-century vocabulary and improved optical character recognition.
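The rule-based variant generation mentioned above can be sketched as follows; the rewrite patterns and example words are illustrative assumptions, not the project's actual rules.

```python
# Illustrative sketch: generate hypothetical historical spelling variants
# for a modern German word by applying rewrite patterns. The patterns
# below are assumptions for demonstration, not IMPACT's actual rule set.

PATTERNS = [
    ("t", "th"),   # e.g. "tun" -> "thun"
    ("ei", "ey"),  # e.g. "sein" -> "seyn"
    ("ä", "ae"),   # older prints often use "ae" for umlauts
]

def historical_variants(modern_word):
    """Return the set of single-pattern rewrites of a modern word."""
    variants = set()
    for modern, historical in PATTERNS:
        if modern in modern_word:
            variants.add(modern_word.replace(modern, historical))
    return variants

print(sorted(historical_variants("sein")))  # includes "seyn"
```

In practice such candidate variants would be filtered against attested corpus forms before entering the lexicon.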
The document discusses a structural analysis tool called the Functional Extension Parser (FEP). It can recognize structural elements in documents such as page numbers, headings, footnotes, and tables of contents. This information adds context and can improve search, navigation, and other functions. The FEP follows a rule-based approach and currently achieves over 80% accuracy for common book elements. It is being integrated into the IMPACT project and will soon be available as a web service.
The document discusses a project called IMPACT that is supported by the European Community under its FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. It focuses on developing computer lexicons to help with optical character recognition (OCR) and information retrieval of historical documents.
The document discusses the IMPACT project, which is supported by the European Community to improve optical character recognition (OCR). It is coordinated by the National Library of the Netherlands. The document outlines the OCR process of binarization, segmentation, and pattern matching. It highlights improvements made by the IMPACT project to these steps. Specifically, it details how IMPACT developed better algorithms for binarization, segmentation of text blocks and lines, and recognition of languages like Fraktur.
Presentation given by Hildelies Balk during the 2nd LIBER-EBLIDA Workshop on Digitisation of Library Material in Europe (19-21 October 2009, The Hague, the Netherlands)
The IMPACT project is supported by the European Community under its FP7 ICT Work Programme and coordinated by the National Library of the Netherlands. It aims to address challenges in digitizing historical materials through technical solutions for tasks like document warping correction, OCR, named entity recognition and collaborative correction. The project involves 13 universities and research centers and 2 industry partners.
The document discusses an analysis of optical character recognition (OCR) results for historical documents. It describes creating language and error profiles to characterize documents, including spelling variations and common OCR mistakes. These profiles help adapt OCR and post-processing to each document. The document also presents an interactive system to efficiently correct OCR errors in historical texts by utilizing the document profiles.
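The error-profiling idea can be illustrated with a minimal sketch that aligns ground truth against OCR output and counts character confusions; `difflib` stands in for the project's own, considerably more sophisticated profiler.

```python
# Minimal sketch of building an OCR error profile by aligning ground-truth
# and OCR text and counting character-level substitution confusions.
from collections import Counter
from difflib import SequenceMatcher

def error_profile(gt, ocr):
    """Count (correct, recognized) character confusions between texts."""
    confusions = Counter()
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, gt, ocr).get_opcodes():
        # only equal-length replacements map cleanly to 1:1 confusions
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for a, b in zip(gt[i1:i2], ocr[j1:j2]):
                confusions[(a, b)] += 1
    return confusions

profile = error_profile("die alte Schrift", "dle alte Schrlft")
# the "i" -> "l" confusions appear as (correct, recognized) pairs
```

A profile like this can then steer both the OCR engine's language model and the ranking of correction candidates in post-processing.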
This document provides an overview of the iTILT project, which aims to explore effective uses of interactive whiteboards (IWBs) for communicative language teaching. It discusses general tips for using IWBs, including classroom organization, organizing materials, and additional devices. It also covers criteria for designing and evaluating IWB-based materials, including ensuring tasks are communicative, interactive, and focus on meaning over form. Examples of using IWBs for teaching the four skills - speaking, listening, reading, and writing - as well as vocabulary and grammar are also provided.
Models and tools for aggregating and annotating content on ECLAP (Paolo Nesi)
This document summarizes the ECLAP project which provides tools and a semantic model for aggregating and annotating cultural heritage content. The model allows content from different sources to be collected, organized and shared without modification. It also supports educational uses through organizing content and giving semantic meaning. Key tools described include playlists, collections, and storytelling annotations. The project aims to make European performing arts content accessible through both its own portal and Europeana.
Europeana. A Digital Library for the Humanities? (AubreyMcFato)
A presentation given by Stefan Gradmann about Europeana and its role for the humanities.
Lesson given to DILL students at ECDL 2009, in Corfu.
Slides are released in CC-By-SA.
(+351) 21 421 12 34
Languages: Portuguese, English, French, Spanish
Driver’s license: B
Academic Qualifications:
- Integrated Master in Electronic and Telecom Engineering (1980)
- Post-Graduation in Information Sciences – Archives (2007/09)
- Post-Graduation in Information Sciences – Libraries (2007/09)
- PhD in Electronic Document Management (2010)
Areas of Expertise:
- Information Management
- Document and Records Management
- Digital Archives and Libraries
- Information Flow Optimization
- Project Management
- Training and Consulting
The document discusses an attractive way to teach programming to students in Argentina. It proposes using personal robots like Scribblers and teaching the Python programming language. Students will learn to program the robots' behaviors and sensors. This project aims to motivate students by making programming fun and hands-on, and to help integrate programming into school curricula in Argentina. It details the current situation, an inspiring robotics education project in the US, and pilot experiences in schools in La Plata, Argentina.
(+351) 21 421 30 00
Languages: Portuguese, English, French, Spanish
Driver’s license: B
Academic qualifications:
- Integrated Master in Electronic and Telecom Engineering (1980)
- Post-Graduation in Information Sciences – Archives (2007/09)
- Post-Graduation in Information Sciences – Libraries (2007/09)
- PhD in Electronic Document Management (2010)
Areas of expertise: Information and knowledge management, document management systems, business process management, information architecture.
Strong background in engineering and technology combined with information sciences. Able to understand business needs and translate them into technical solutions. Experienced in managing complex
The Postcorrection Tool from LMU Munich uses advanced language technology developed in the IMPACT project to improve interactive document postcorrection. It integrates lexicons, matching tools, and profiling to power automated error detection and suggested corrections. Unsupervised analysis of OCRed documents provides document-centric knowledge for identifying error classes. The tool also enables batch corrections of multiple errors at once.
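A batch correction of one detected error class might look like the following sketch; the function name, the confusion pair, and the lexicon check are illustrative assumptions, not the LMU tool's actual API.

```python
# Hedged sketch of batch correction for one detected error class: apply a
# confusion fix (here, "c" misrecognized for "e") to every affected token
# at once, but only when the corrected form is attested in a lexicon.
def batch_correct(tokens, wrong, right, lexicon):
    corrected = []
    for tok in tokens:
        candidate = tok.replace(wrong, right)
        if wrong in tok and candidate in lexicon:
            corrected.append(candidate)  # safe, lexicon-attested fix
        else:
            corrected.append(tok)        # leave uncertain tokens alone
    return corrected

lexicon = {"text", "cat"}
print(batch_correct(["tcxt", "cat"], "c", "e", lexicon))
# "tcxt" becomes "text"; "cat" stays, since "eat"-style rewrites
# are only applied when the result is in the lexicon
```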
(+351) 21 421 11 11
Languages: Portuguese, English, French, Spanish
Academic qualifications:
- Integrated Masters in Electronic and Telecom Engineering (1980)
- Post-Graduation in Information Sciences – Libraries (2007/09)
- Post-Graduation in Information Sciences – Archives (2007/09)
- PhD researcher in Electronic Document Management (2008/2010)
Main areas of expertise:
- Information management
- Digital transformation projects
- Complex information systems implementation
- Information flow regulation within organizations
- Team leadership and project management
Personal interests: New technologies, reading, traveling, cinema, music.
Tomaž Erjavec discusses the development of language resources for historical Slovene, including transcribed texts, an annotated corpus, and a historical lexicon. Over 10 million words of historical Slovene texts have been transcribed. A reference corpus of 300,000 words from the 15th-19th centuries was annotated for part-of-speech and modern equivalents. An initial lexicon of 3,000 entries was expanded to over 20,000 entries incorporating forms from the annotated corpus. The resources aim to support research on and processing of historical Slovene texts.
The document discusses digitization workflows for enhancing and segmenting documents for optical character recognition (OCR). It describes steps for image enhancement including border removal, page curl removal, and correction of arbitrary warping. It then discusses standalone methods for segmenting text lines, words, and characters without relying on character recognition. These include a hybrid text line segmenter and density-based word segmenter that have been evaluated on historical documents with promising results. The techniques allow digitization of documents with non-standard words or layouts.
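The text line segmentation step can be illustrated with a basic horizontal projection profile; the hybrid segmenter described above is far more robust than this sketch, which only shows the underlying idea.

```python
# Minimal projection-profile text line segmentation: rows of a binarized
# page image (1 = ink) are summed, and runs of non-empty rows become
# line candidates.
def segment_lines(image):
    """image: list of rows, each a list of 0/1 pixels; returns (top, bottom) spans."""
    profile = [sum(row) for row in image]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:
            start = y                      # a text line begins
        elif not ink and start is not None:
            lines.append((start, y - 1))   # a text line ends
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines

page = [[0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0]]
print(segment_lines(page))  # [(1, 2), (4, 4)]
```

Real historical pages defeat this naive version (skew, touching lines, marginalia), which is exactly why the hybrid and density-based methods in the slides exist.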
The document introduces the IMPACT Centre of Competence, a not-for-profit organization that aims to advance digitization of historical materials. It provides tools, services, and testing facilities for practitioners in content institutions, researchers, and industry. Membership offers benefits like access to datasets and tools, implementation support, and knowledge sharing. The Centre will be sustained through membership fees and contributions to support continued collaboration in the community.
The document summarizes research activities and tools developed by the National Center for Scientific Research "Demokritos" for the IMPACT project. It describes tools for border detection, page curl detection, and character segmentation. Evaluation results for the border detection and page curl detection tools on large datasets are provided.
- CLARIN aims to create a federated infrastructure providing researchers access to digital language data and tools through a single sign-on. It seeks to integrate existing resources across Europe to advance humanities and social sciences research.
- CLARIN's success requires collaboration with libraries, which hold vast amounts of printed materials indispensable for researchers but face obstacles like copyright and lack of standardization.
- The IMPACT project's work on optical character recognition technology and goal of an OCR center of expertise can help address a key challenge and bring CLARIN and libraries closer through continued collaboration beyond the project.
The document outlines the roadmap for updates and new features in the Taverna workflow system, including releasing versions 2.3 and 3.0 with improvements to the user interface, support for new standards, and integration with additional technologies and domains like clouds, semantic web, and biodiversity. It also discusses new plugins and tools being developed to enhance provenance capture, support additional file formats, and provide domain-specific functionality for astronomy, life sciences, and data mining.
The document announces an IMPACT-myGrid-Hackathon event scheduled for November 14-15, 2011 at the University of Manchester, focusing on myGrid and Taverna tools. Additional information is available on the event website at http://impact-mygrid-taverna-hackathon.wikispaces.com/.
The document discusses named entity (NE) recognition in digitized historical texts. It describes how NEs like people, locations and organizations can be identified during optical character recognition (OCR) and retrieved for users. The key steps include building an NE lexicon database by collecting data, tagging and enriching NEs with metadata, and linking variant names. This helps improve OCR quality and allows users to find NEs despite spelling variations in historical texts.
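Linking variant name spellings can be sketched with a simple similarity threshold; `difflib`'s ratio here stands in for whatever matcher the project actually used, and the place names are invented examples.

```python
# Hedged sketch of retrieving named entities despite spelling variation:
# return lexicon entries whose string similarity to the query clears a
# threshold, so an OCR variant like "Amsteldam" still finds "Amsterdam".
from difflib import SequenceMatcher

def link_variants(query, lexicon, threshold=0.8):
    """Return lexicon names sufficiently similar to the query."""
    return [name for name in lexicon
            if SequenceMatcher(None, query.lower(), name.lower()).ratio() >= threshold]

lexicon = ["Amsterdam", "Rotterdam", "Utrecht"]
print(link_variants("Amsteldam", lexicon))
```

A production NE lexicon would additionally store curated variant links and metadata, as the summary describes, rather than relying on string similarity alone.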
The document discusses the transformation of humanities research through digital technologies and optical character recognition (OCR). It describes efforts to extract over 2,000 years of Latin text from digitized books and track linguistic changes over time using machine learning techniques. Computational analysis is helping scholars build dynamic digital editions and study underrepresented languages on a massive scale.
The document discusses ABBYY's involvement in the IMPACT project. It states that ABBYY is the OCR technology provider for IMPACT members. It also notes that ABBYY improved its core OCR technologies for the recognition of old documents through its work on the IMPACT project, focusing on areas like image pre-processing, segmentation, character recognition, and export formats. The presentation provides examples of how ABBYY's technologies were enhanced between versions 9 and 10 for tasks like binarization, layout analysis, and character recognition of historical documents.
This document summarizes the results of experiments examining the effect of scanning parameters like color, resolution, and binarization method on OCR accuracy. The experiments found that bitonal images produced the best OCR results on average but the optimal method varied between images. Higher resolution images did not necessarily improve OCR accuracy. The quality of archival images was also found to affect OCR performance. The document concludes different scanning choices may be suitable depending on the document type and quality.
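One of the standard binarization baselines behind such experiments is Otsu's global threshold, sketched here in pure Python for clarity; production pipelines would use optimized image libraries.

```python
# Otsu's method: choose the gray-level threshold that maximizes the
# between-class variance of the resulting foreground/background split.
def otsu_threshold(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w_bg, sum_bg = 0, -1.0, 0, 0.0
    for t in range(levels):
        w_bg += hist[t]                    # background weight up to t
        if w_bg == 0:
            continue
        w_fg = total - w_bg                # foreground weight
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# two well-separated intensity clusters: the threshold lands between them
print(otsu_threshold([10] * 50 + [200] * 50))
```

As the experiments above note, no single global method wins on every image, which is why adaptive and per-image binarization choices matter.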
The IMPACT Interoperability Framework provides a way to integrate various OCR and other software components into reusable workflows. It uses a Java-based architecture with web services and the open source Taverna workflow system. Developers can integrate new command line tools as web services with minimal effort, and workflows can then be built, shared, and executed through a web portal. The framework has been evaluated for scalability and is intended to support a community around sharing workflows and experiments.
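The core idea of wrapping a command-line tool as a web service can be sketched in Python (the framework itself is Java-based and uses Taverna, so this is only an analogue); the wrapped command here, `tr`, is a placeholder, not an IMPACT tool.

```python
# Minimal analogue of exposing a command-line tool as a web service:
# POSTed input is piped through the tool and its output returned.
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

COMMAND = ["tr", "a-z", "A-Z"]  # placeholder for an OCR/post-processing tool

class ToolService(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = subprocess.run(COMMAND, input=body, stdout=subprocess.PIPE)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(result.stdout)

def main(port=8080):
    HTTPServer(("localhost", port), ToolService).serve_forever()

# main()  # start the service, then POST text to http://localhost:8080/
```

Services wrapped this way can then be chained into workflows, which is the role Taverna plays in the actual framework.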
Paul Fogel of the California Digital Library examined OCR quality at scale using the corpus from the HathiTrust and its member institutions. The document discusses issues that arise when performing OCR at a massive scale, including the challenges of indexing very large document collections, supporting many different languages, and correcting the inevitable OCR errors produced when scanning and recognizing text from millions of pages.
The document describes CONCERT, an adaptive collaborative correction platform for digitized text. It uses feedback from users to improve optical character recognition and increase productivity of post-correction. Key features include adaptive OCR, quality control tools, productivity tools like games to motivate volunteers, and monitoring of users to prevent data corruption. It has been used successfully in several library digitization projects worldwide.
An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis
International Workshop on Historical Document Imaging and Processing (HIP).
ICDAR 2011, 16-17 September 2011, Beijing, China.
The document discusses IMPACT, a project supported by the European Community to develop a uniform technical framework for end users to work with digital library tools and applications. The framework is built on open source components and standards and uses a service-oriented architecture. It allows tools to be transformed into web services and combined into workflows for tasks like optical character recognition. The project is coordinated by the National Library of the Netherlands and evaluates workflows using datasets and ground truths.
Scalable and sustainable - OCR & document image analysis in the cloud
New Trends in Humanities Computing. HPC Cloud day, 4 October 2011, Amsterdam, Netherlands.
OCR challenges in historic documents and the contribution of IMPACT (cneudecker)
IFLA 2010 Satellite Meeting "New Techniques for Old Documents", 16-18 August 2010, Uppsala, Sweden.
Experimental Workflow Development in Digitisation (cneudecker)
The document discusses the IMPACT project, which is developing collaborative workflows for digitizing historical books and newspapers printed before 1900. The project is coordinated by the National Library of the Netherlands and involves 26 partner libraries and research institutes. It aims to create a workflow development platform that allows technical staff to provide tools for library staff to design digitization workflows through a community-driven process. The platform will incorporate modular, transparent, flexible, and extensible architectural principles.
The document discusses the IMPACT project, which is supported by the European Community and coordinated by the National Library of the Netherlands. It proposes establishing a Centre of Competence after IMPACT to support ongoing work in digitization through tools, resources, training, and community support. The Centre would benefit content holders, researchers, and service providers working in digitization.
This document discusses metadata considerations for the Europeana Newspapers project. It begins with an introduction to the speaker and his background in digital library projects. It then covers general concepts of metadata, how metadata is important for digitized newspapers, and the Europeana Newspaper METS ALTO Profile (ENMAP) that is being developed to provide robust metadata for the project. The goal of ENMAP is to create a standardized format for metadata that can be used for preservation, access, and delivery of newspaper data to Europeana.
The document discusses the National and University Library of Slovenia's involvement in the IMPACT project. It summarizes that the project aims to improve OCR and information retrieval on historical collections. The library worked to build lexicons for historical Slovene language and improve OCR of Slovene historical documents from the 18th-19th centuries using these lexicons. The goals were interdependent and included OCR improvement, lexicon building, and improved information retrieval on old texts.
This document discusses optical character recognition (OCR) of historical newspapers. It describes the digitization process, which includes image capturing, text and structure recognition, natural language processing, and content representation. OCR accuracy can be improved through layout analysis, structural metadata extraction, and identifying different content units like articles, advertisements, and entertainment sections. The goal is to make the content and knowledge within digitized newspapers accessible beyond the scanned text.
The document discusses various tools and resources for living labs, including the ENoLL Living Lab Knowledge Center, Living Lab Methodology Handbook, and CoCo Toolkit. It provides an overview of the European Network of Living Labs (ENoLL), describing its members, goals of knowledge sharing and project collaboration between members, and influence on EU policies. Tools covered in the Knowledge Center and Handbook are meant to facilitate the living lab methodology.
Targeted Language Resources for the Digitisation of Historical Collections (Emma Huber)
The document discusses the IMPACT project, which is supported by the European Community under the FP7 ICT Work Programme. The project aims to develop targeted language resources for digitizing historical collections. It is coordinated by the National Library of the Netherlands. The document outlines some of the challenges of optical character recognition (OCR) and information retrieval (IR) on historical texts due to factors like image quality, historical language variants, and unknown words. It proposes the development of specialized language resources like lexicons and language models to help address these challenges.
The Presentation of Hans-Jörg Lieder, Staatsbibliothek zu Berlin – Preußischer Kulturbesitz, at the BnF Information Day for Europeana Newspapers (November 2014).
European network for co-ordination of policies and programmes on e-infrastruc...Jisc
The document discusses the e-InfraNet project, a European network that aims to coordinate policies and programs on e-infrastructures. It coordinates representatives from ministries and funding agencies in 9 European countries. The project objectives are to identify existing programs, analyze them, exchange strategic views, and prepare joint policy activities and calls. Some focus areas are cloud computing, environmental computing, and openness. Events have been held on related topics such as green computing and cloud computing.
The document summarizes the Europeana Newspapers project, which digitized over 18 million newspaper pages from 20 languages and 950 titles from 18 partner institutions. The project developed tools to extract text from images using OCR and named entity recognition in three languages. Digitized pages were made available through Europeana and other online interfaces with search and browsing functions.
The document discusses text summarization and describes the problem it addresses of producing concise summaries of lengthy documents. It outlines two main techniques for text summarization - extractive summarization which extracts key phrases and sentences, and abstractive summarization which generates a new summary using NLP techniques. The model used is natural language processing with Python's NLTK package. Tools used include Spyder/PyCharm and the technologies are NLP machine learning with the Python programming language. The overall goal is to create an efficient and accurate text summarizer.
ECLAP White paper, social network for Cultural Heritage on Peforming artsPaolo Nesi
the experience of a new generation digital content service is presented, namely ECLAP (European Collected Library of Artistic Performance, http://www.eclap.eu). ECLAP is a live lab in which several new technologies and solutions in the area of semantic computing and social media have been developed and put under trial of the final users and institutions. On this regard, ECLAP is open for both content and results experimentations, and presently comprises more than 35 prestigious international institutions; ECLAP provides services and tools for automated content ingestion, adaptation, metadata ingestion and editing, semantic information extraction, indexing and distribution by exploiting the most innovative and consolidated technologies. ECLAP supports the institutions in all their activities: metadata selection and mapping, content ingestion, to the definition and management of permissions and licenses on contents, and finally managing their users on ECLAP services. According to ECLAP workflow, the obtained metadata are sent to Europeana only after that the metadata have been enriched and linked to a reachable digital resource and when the IPR details have been finalized, with needed quality level. An ECLAP IPR Model can be associated with each single content or collection. ECLAP also provides infrastructural connection for direct promotion of content towards a large number of social networks, including: Facebook, LinkedIn, Diggs, Twitter, etc. On ECLAP, each content provider may have its own distribution channel/group (including a forum and a blog in addition to the space for their content collections, and the groups can be open, moderated or private) with the possibility of customizing the group user interface according to their logo and colours. 
This multitenant modality permits at the institutions to see ECLAP as a non-intrusive service, to reinforce their brand and at the same time to exploit and experiment a number of innovative ECLAP tools, to accelerate the promotion exploiting ECLAP social media, LOD and Europeana channels, and ready to access new users for their content. ECLAP provides the unique videos, images and texts related to more than 50 years of activity of the Dario Fo and Franca Rame theatre company, featuring videos, photos, texts, drawings, paintings, sketches, posters, copies of contracts and of invoices, notes, books, articles. Other unique, irreplaceable material includes video, audio recordings and photos of performances, workshops, seminars, rehearsals of Jerzy Grotowski, Peter Brook, Gennadi Bogdanov, Anatolij Vasil’ev, Alberto Sordi, Carmelo Bene, Giorgio Strehler, Mimmo Cuticchio, Gian Maria Volonté, Judith Malina.
Similar to IMPACT Final Conference - Muehlberger - FEP (20)
Slides of the paper Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts by Helmut Schmid at the 3rd Edition of the DATeCH2019 International Conference
This document discusses using text models to improve the accuracy of optical character recognition (OCR) on Chinese rare books. It conducted experiments using n-gram, backward/forward n-gram, and LSTM models on OCR data from ancient medicine books. The backward and forward 4-gram model achieved the highest correction rate at 97.57%. Mixing the LSTM 6-gram model with the OCR's top 5 candidates and probability of the top candidate further improved accuracy to 97.71%, demonstrating that combining text models with OCR probabilities can better correct OCR errors than text models alone. In conclusion, text models are effective for increasing OCR accuracy on rare books, with backward/forward 4-gram and LSTM 6-gram
Slides of the paper Turning Digitised Material into a Diachronic Corpus: Metadata Challenges in the Nederlab Project by Katrien Depuydt and Hennie Brugman at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Standoff Annotation for the Ancient Greek and Latin Dependency Treebank by Giuseppe Celano at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Using lexicography to characterise relations between species mentions in the biodiversity literature by Sandra Young at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Implementation of a Databaseless Web REST API for the Unstructured Texts of Migne's Patrologia Graeca with Searching capabilities and additional Semantic and Syntactic expandability by Evagelos Varthis, Marios Poulos, Ilias Yarenis and Sozon Papavlasopoulos at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Cross-disciplinary collaborations to enrich access to non-Western language material in the Cultural Heritage sector by Tom Derrick and Nora McGregor at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Tribunal Archives as Digital Research Facility (TRIADO): new ways to make archives accessible and useable by Anne Gorter, Edwin Klijn, Rutger Van Koert, Marielle Scherer and Ismee Tames at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Improving OCR of historical newspapers and journals published in Finland by Senka Drobac, Pekka Kauppinen and Krister Lindén at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards a generic unsupervised method for transcription of encoded manuscripts by Arnau Baró, Jialuo Chen, Alicia Fornés and Beáta Megyesi at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study by Christian Clausner, Apostolos Antonacopoulos, Christy Henshaw and Justin Hayes at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771–1929: Early Results Using the PIVAJ Software by Kimmo Kettunen, Teemu Ruokolainen, Erno Liukkonen, Pierrick Tranouez, Daniel Antelme and Thierry Paquet at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper OCR-D: An end-to-end open-source OCR framework for historical documents by Clemens Neudecker, Konstantin Baierer, Maria Federbusch, Kay-Michael Würzner, Matthias Boenig, Elisa Hermann and Volker Hartmann at the 3rd Edition of the DATeCH2019 International Conference
- The document describes a project to fill gaps in knowledge about diamond mining, trading, and polishing in Borneo by developing a workflow using various CLARIAH tools and resources.
- The workflow involved digitizing a diamond encyclopedia, extracting concepts and place names, linking the data to external sources to create linked open data, and querying newspaper archives to build a corpus of relevant articles.
- Promising results showed mining, trading, and polishing continued in Borneo for Southeast Asian customers, and described previously unknown diamond fields and polishing locations in Borneo. The project aims to apply the workflow to other commodities like sugar.
Slides of the paper Automatic Reconstruction of Emperor Itineraries from the Regesta Imperii by Juri Opitz, Leo Born, Vivi Nastase and Yannick Pultar at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification by Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner and Frank Puppe at the 3rd Edition of the DATeCH2019 International Conference
This document describes the SOS system for segmenting, stemming, and standardizing Arabic text. It presents the challenges of processing Arabic cultural heritage texts which contain orthographic variations. The system uses gradient boosting machines and achieves state-of-the-art performance on segmentation and derives stemming as a byproduct. It also standardizes orthography with high accuracy, which further improves segmentation. The system addresses issues like hamza forms and letter confusions that previous systems did not handle well.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Presentation of the OECD Artificial Intelligence Review of Germany
IMPACT Final Conference - Muehlberger - FEP
1. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
The Functional Extension Parser
A Document Understanding Platform
Günter Mühlberger
University Innsbruck Library (ULB Tyrol)
Document understanding
A book is more than just pure text – it contains a lot of structural metadata.
These metadata are (often) encoded in the layout of a document.
The size of characters, position on the page, distance to other lines, etc. are used to express structural meaning.
FEP is designed to “understand” the meaning of the layout.
Headlines
Footnotes
Print space
Running title
Page number
Signature mark
Table of Contents
Single entries
Authors
Titles
Page numbers
Why structural tagging is important – some examples
Search & Retrieval
References and links to other documents
Reading: analogue and digital
Search & retrieval
– Ranking and scoring, noise reduction
The same word appears in the running title of a journal on every page, e.g. “Alpenverein”.
Front matter, such as title pages, dedications, tables of contents, tables, etc.
Back matter, such as indexes, ads, etc.
Search & retrieval
– Facets for full-text
Currently, facets are used for metadata such as author, year, text type, ...
A user might be interested in facets such as headline, footnote, index, etc.
Citations index / cloud
– Footnotes, reference lists, and citations contain bibliographic links to books, journal articles, texts, etc.
– Structural tagging supports the detection of bibliographic references
– May also be used for catalogue enrichment
Example: Cawkell, A. E. (1971)
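As an illustration of the kind of pattern such detection can start from (a deliberately narrow sketch, not the FEP method), a regular expression can pick up author-year references of the shape shown above:

```python
import re

# Matches author-year citations of the shape "Surname, A. B. (1971)",
# like the "Cawkell, A. E. (1971)" example on this slide. Real reference
# detection needs far more patterns and layout evidence than this.
CITATION = re.compile(r"[A-Z][a-z]+,\s*(?:[A-Z]\.\s*)+\((1[5-9]\d\d|20\d\d)\)")

def find_citations(text: str) -> list[str]:
    """Return every author-year citation found in the text."""
    return [m.group(0) for m in CITATION.finditer(text)]

print(find_citations("See Cawkell, A. E. (1971) for a survey."))
# ['Cawkell, A. E. (1971)']
```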
Digital reading
– Tablet computers as an alternative for reading historical books with OCR below reading quality
– Expected features:
Nicely cropped pages
Bookmarks
ToC page linked with headings
Advanced reading
– eBooks for modern texts with satisfying OCR quality
– Structure can be encoded into ePUB etc.
Analogue reading
– Print on Demand
– Print space as an old concept with new benefits
– Reconstruction helps to semi-automate the standardized production of pre-press files
Technical background
Input
– OCR text which needs to contain at least word coordinates
– E.g. ALTO files, ABBYY XML or Google Books (Tesseract) HTML
Output
– Annotations of structural elements with coordinates, e.g. page numbers, running titles, headings, footnotes, print space, etc.
– Output format: METS/ALTO, XML, etc.
FEP System
– Images and/or OCR files are loaded via a web service
– OCR data are converted into an internal format
– Information is processed based on rules
– Results are stored in a database
– Quality control on the basis of “ground truth”, i.e. expected results
– Rules are either manually encoded (expert knowledge) and/or based on machine learning (large document sets)
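The input and processing steps above can be sketched as follows. The fragment below is a simplified, namespace-free ALTO-style example (real ALTO files carry a namespace and many more attributes), and the rule is a toy illustration of page-number detection, not the FEP rule set:

```python
import xml.etree.ElementTree as ET

# Simplified ALTO-style fragment: OCR words with coordinates.
ALTO = """
<alto>
  <TextLine VPOS="40">
    <String CONTENT="127" HPOS="800" VPOS="40"/>
  </TextLine>
  <TextLine VPOS="120">
    <String CONTENT="The" HPOS="100" VPOS="120"/>
    <String CONTENT="Alps" HPOS="160" VPOS="120"/>
  </TextLine>
</alto>
"""

def page_number_candidates(xml: str, page_height: int = 1200, margin: int = 100):
    """Toy rule: a line that is a single number and sits in the top or bottom margin."""
    root = ET.fromstring(xml)
    hits = []
    for line in root.iter("TextLine"):
        words = [s.get("CONTENT") for s in line.iter("String")]
        vpos = int(line.get("VPOS"))
        in_margin = vpos < margin or vpos > page_height - margin
        if len(words) == 1 and words[0].isdigit() and in_margin:
            hits.append(words[0])
    return hits

print(page_number_candidates(ALTO))  # ['127']
```

The real system chains many such rules, stores the annotations in a database, and checks them against ground truth, as described above.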
Apart from books...
FEP
– IMPACT: A generic rule set for historical books has been developed
– This rule set can be used as a basis for similar documents
Journals
Critical editions
etc.
– Other rule sets can be developed from scratch
Manual and/or machine learning
Other document types
– Index cards
– Title pages
– Journals
– Dissertations
– Printed catalogues and bibliographies
– ...
Results
Basic rule set
– General structural elements of books from ca. 1700 to 2010
– Data set: 155 books, 30,673 pages (141 in the training set, 41 in the evaluation set)
– All pages were manually annotated (ground truth)
Recall, Precision, F-Measure
– Example: 10 heading lines in a book; we find 12 lines, 8 of them correct, 4 false:
– Recall = 8 of 10 = 0.8
– Precision = 8 of 12 ≈ 0.67
– F-Measure = 2*0.8*0.67/(0.8+0.67) ≈ 0.73
More information
– Important: We count lines, not structural entities!
E.g. if a heading has two lines, one might be correct and the other one might not be recognised
– Differences between training and evaluation set are low
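The line-level scores above can be computed directly from the counts; a minimal sketch (values rounded to two decimals):

```python
def evaluate(found: int, correct: int, relevant: int):
    """Line-level scores: `relevant` ground-truth lines, `found` detected
    lines, `correct` of which are right."""
    recall = correct / relevant
    precision = correct / found
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure

# The heading example: 10 ground-truth lines, 12 found, 8 of them correct.
r, p, f = evaluate(found=12, correct=8, relevant=10)
print(round(r, 2), round(p, 2), round(f, 2))  # 0.8 0.67 0.73
```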
Some results on the evaluation set
Element          Recall   Precision   F-measure
Running text     0.99     0.98        0.98
Running titles   0.97     1.00        0.98
Page
Comment
Research situation
– Document analysis is a wide field with many applications
– But there is only very little research on (historical) books
– Due to the lack of datasets it is hard to compare our results with those of other research groups; our dataset will be published next year
Detection of ToC pages and ToC entries
– A rule set for ToC pages was developed recently
– Reasonable results compared with the INEX competition
– Results are foreseen to be published in spring 2011
Method
– Combination of manual and machine learning methods using fuzzy logic
– Application for a patent at the European Patent Office in September 2011
How to deal with uncertainty and errors?
Option 1: Leave it as it is
– Accept the accuracy which can be provided automatically
– Inclusion of ground truth in the database makes it possible to measure the quality of the automated processing exactly, so one knows in advance what can be expected
Pro
– Maybe the only solution for really large document sets
– It is much cheaper to develop better rule sets than to correct large numbers of documents
– Good results for homogeneous sets are possible
– Similar to OCR
Con
– You and your users need to accept errors
– People want to contribute and to correct
How to deal with uncertainty and errors?
Option 2: Correct it
– Service providers or library staff need to correct the results
– Manual correction with automated support
Pro
– Batch correction + offshoring is relatively cheap and effective
– Quick and standardized results
– Users are satisfied
Con
– A reasonable investment is necessary
– The complexity of the workflow must not be underestimated
– It will probably be too expensive to correct all interesting elements, therefore you and your users still need to accept “some” errors
– Users still want to contribute but do not have a chance
Option 3
Provide a user interface for the crowd
– Correction of OCR results may be only the start for also providing interfaces for structural annotations
– Might be combined with some basic corrections carried out by service providers
Pro
– Satisfies the willingness of users to contribute
– Users get immediate benefit, e.g. they are able to download structured PDFs for their iPad, or annotated full-text for further processing
– Users are satisfied AND are able to contribute
– Library gets correct and standardized data
Con
– A reasonable investment is necessary, both for the user interface and for adapting the digital library application
– User interfaces need to be powerful, self-explanatory and simple
– You and your users need to accept that there are always errors in the collection and that it will take decades to come to an end
FEP User Interface
A concept study for a powerful, self-explanatory and simple GUI
– Currently a “general purpose interface” to display, edit and correct the structural elements of books
– No optimisation for specific tasks and large amounts of documents
– Has the potential to become a user interface for the crowd
– Could look completely different!
Based on the Google Web Toolkit (GWT)
– Open-source toolkit for complex browser-based development
– GWT allows for features previously seen mainly in Flash interfaces
– Growing community
– Good experiences: GWT makes it possible to create interfaces in a relatively short time
Display of results
Rich interface
Recognized elements, e.g. headings
Display of ground truth
Page numbers
Page numbers control
ToC pages
ToC entries
Linking of entries with pages/headings
ToC hierarchy editor
Drag and drop of entries
Export from FEP web-interface
METS/ALTO
– XML standard for digitised books and documents
PDFs
– Advanced PDFs for eBooks
Original version
FEP-processed version
– Pre-press files for Print on Demand
FEP pre-press file
ePUB
– For modern documents with good OCR quality or corrected books
After the project
General
– Innovative projects with a research component will be done via the University of Innsbruck
– Commercial projects via a spin-out of the University (transidee)
FEP as a service
– It is currently not foreseen to create a product or stand-alone version, but to offer web services for OCR/structural annotation and remote correction
– Adaptation of the rule sets for specific documents
Pilot
– EOD Network: Digitisation on Demand carried out by more than 30 libraries in Europe
– FEP shall be integrated during 2012
– Member libraries get the chance to use the FEP for producing enhanced PDFs for eBooks
Thank you for your attention!