JTS 2010 Presentation: "Audiovisual Heritage and Participatory Culture" by Johan Oomen
As the Web gets more “social” and as museums, libraries and archives begin to offer online access to digital representations of their collections, users and institutions are starting to inhabit the same shared information space. This is an exciting prospect: we are witnessing new paradigms for engaging users with our shared heritage. 'Netizens' (people actively involved in online communities) are shaping this information space using technological advances offered by cultural heritage institutions, publishers and other commercial entities, as well as objects from a great variety of sources. These new paradigms often call for profound change in institutional practice, for instance harnessing the power of the Social Web to enrich knowledge about our shared heritage. As a result, republication and reuse of heritage are enhanced, and its value increases.
This presentation focuses on:
- www.openimages.eu
- www.waisda.nl
The document introduces the IMPACT Centre of Competence, a not-for-profit organization that aims to advance digitization of historical materials. It provides tools, services, and testing facilities for practitioners in content institutions, researchers, and industry. Membership offers benefits like access to datasets and tools, implementation support, and knowledge sharing. The Centre will be sustained through membership fees and contributions to support continued collaboration in the community.
The document discusses the transformation of humanities research through digital technologies and optical character recognition (OCR). It describes efforts to extract over 2,000 years of Latin text from digitized books and track linguistic changes over time using machine learning techniques. Computational analysis is helping scholars build dynamic digital editions and study underrepresented languages on a massive scale.
- CLARIN aims to create a federated infrastructure providing researchers access to digital language data and tools through a single sign-on. It seeks to integrate existing resources across Europe to advance humanities and social sciences research.
- CLARIN's success requires collaboration with libraries, which hold vast amounts of printed materials indispensable for researchers but face obstacles like copyright and lack of standardization.
- The IMPACT project's work on optical character recognition technology and goal of an OCR center of expertise can help address a key challenge and bring CLARIN and libraries closer through continued collaboration beyond the project.
The document discusses linguistic resources created for improving access to 16th century German texts. It describes how the IMPACT project adapted resources like lexicons to account for the differences between historical and modern German. A groundtruth corpus spanning 1500-1950 was created, as well as a hypothetical lexicon of rule-based variants and a manually verified lexicon to map historical words to their modern equivalents. These resources were able to cover 30% of 16th century vocabulary and improve optical character recognition.
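The mapping from historical spellings to modern equivalents via a rule-based "hypothetical lexicon" can be sketched in a few lines. The rewrite rules below (v/u, th/t, ey/ei) are common historical-German patterns chosen purely for illustration; they are not the IMPACT project's actual rule set, and the real lexicon was built and verified at a much larger scale:

```python
# Illustrative rewrite rules mapping historical to modern German spellings.
# These three rules are examples only, not the project's rule set.
REWRITE_RULES = [
    ("v", "u"),    # "vnd"  -> "und"
    ("th", "t"),   # "thun" -> "tun"
    ("ey", "ei"),  # "seyn" -> "sein"
]

def modern_candidates(historical_word):
    """Generate candidate modern forms by applying each rewrite rule."""
    candidates = {historical_word}
    for old, new in REWRITE_RULES:
        extra = {c.replace(old, new) for c in candidates if old in c}
        candidates |= extra
    return candidates

def link_to_modern(historical_word, modern_lexicon):
    """Return the modern-lexicon entries matched by any candidate form."""
    return sorted(modern_candidates(historical_word) & modern_lexicon)
```

Candidates that also appear in a verified modern lexicon become the historical word's modern equivalents; unmatched candidates would need the manual verification step the document describes.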
The document summarizes research activities and tools developed by the National Center for Scientific Research "Demokritos" for the IMPACT project. It describes tools for border detection, page curl detection, and character segmentation. Evaluation results for the border detection and page curl detection tools on large datasets are provided.
Tomaž Erjavec discusses the development of language resources for historical Slovene, including transcribed texts, an annotated corpus, and a historical lexicon. Over 10 million words of historical Slovene texts have been transcribed. A reference corpus of 300,000 words from the 15th-19th centuries was annotated for part-of-speech and modern equivalents. An initial lexicon of 3,000 entries was expanded to over 20,000 entries incorporating forms from the annotated corpus. The resources aim to support research on and processing of historical Slovene texts.
The document discusses ABBYY's involvement in the IMPACT project. It states that ABBYY is the OCR technology provider for IMPACT members. It also notes that ABBYY improved its core OCR technologies for the recognition of old documents through its work on the IMPACT project, focusing on areas like image pre-processing, segmentation, character recognition, and export formats. The presentation provides examples of how ABBYY's technologies were enhanced between versions 9 and 10 for tasks like binarization, layout analysis, and character recognition of historical documents.
The document discusses digitization workflows for enhancing and segmenting documents for optical character recognition (OCR). It describes steps for image enhancement including border removal, page curl removal, and correction of arbitrary warping. It then discusses standalone methods for segmenting text lines, words, and characters without relying on character recognition. These include a hybrid text line segmenter and density-based word segmenter that have been evaluated on historical documents with promising results. The techniques allow digitization of documents with non-standard words or layouts.
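Recognition-free line segmentation of the kind described above is often introduced via a projection-profile baseline: sum the ink pixels per row of a binarised page and cut where the profile drops to background. The hybrid segmenter in the document is more robust than this; the sketch below only illustrates the underlying idea and assumes a 0/1 image with horizontal text lines:

```python
import numpy as np

def segment_lines(binary_page, min_ink=1):
    """Return (top, bottom) row ranges of text lines in a 0/1 ink image.

    A simple horizontal projection-profile baseline: a text line is a
    maximal run of rows whose ink count is at least `min_ink`.
    """
    profile = binary_page.sum(axis=1)     # ink pixels per row
    lines, start = [], None
    for row, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = row                   # line begins
        elif ink < min_ink and start is not None:
            lines.append((start, row))    # line ends (exclusive bottom)
            start = None
    if start is not None:                 # page ends inside a line
        lines.append((start, len(profile)))
    return lines
```

Real historical pages have skew, touching lines, and marginalia, which is exactly why the document's hybrid method goes beyond a plain profile like this one.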
This document summarizes the results of experiments examining the effect of scanning parameters like color, resolution, and binarization method on OCR accuracy. The experiments found that bitonal images produced the best OCR results on average but the optimal method varied between images. Higher resolution images did not necessarily improve OCR accuracy. The quality of archival images was also found to affect OCR performance. The document concludes different scanning choices may be suitable depending on the document type and quality.
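As a concrete example of one binarization method that such experiments typically include, here is a plain-NumPy implementation of Otsu's global threshold, which picks the gray level maximising between-class variance. This is a generic illustration, not the specific method or evaluation setup used in the document, and the finding that the optimal method varies per image is exactly why a single global threshold is not always best:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the global threshold maximising between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, 0.0
    w0 = sum0 = 0.0
    for t in range(256):
        w0 += hist[t]                     # background pixel count
        if w0 == 0:
            continue
        w1 = total - w0                   # foreground pixel count
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    """Produce a bitonal (0/1) image from a grayscale page."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```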
The document discusses OCR for typewritten documents. It describes the IMPACT project, which is supported by the European Community under the FP7 ICT Work Programme and coordinated by the National Library of the Netherlands. The presentation covers the challenges of typewritten documents for OCR, the specific approaches used in the IMPACT project's TOCR system, and some example results showing its performance.
Paul Fogel of the California Digital Library examined OCR quality at scale using the corpus from the HathiTrust and its member institutions. The document discusses issues that arise when performing OCR at a massive scale, including the challenges of indexing very large document collections, supporting many different languages, and correcting the inevitable OCR errors produced when scanning and recognizing text from millions of pages.
The IMPACT Interoperability Framework provides a way to integrate various OCR and other software components into reusable workflows. It uses a Java-based architecture with web services and the open source Taverna workflow system. Developers can integrate new command line tools as web services with minimal effort, and workflows can then be built, shared, and executed through a web portal. The framework has been evaluated for scalability and is intended to support a community around sharing workflows and experiments.
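The framework's core move, exposing an existing file-in/file-out command-line tool so it can be chained into workflows, can be sketched as below. The actual framework generates Java web-service wrappers for Taverna; this Python sketch only shows the wrapping idea, and any specific tool name or flags you pass in are your own, not IMPACT components:

```python
import pathlib
import subprocess
import tempfile

def run_tool(command, input_bytes, suffix=".tif"):
    """Run a file-in/file-out CLI tool and return the output file's bytes.

    `command` is the tool invocation as a list (e.g. ["sometool", "--fast"]);
    the input and output file paths are appended as its last two arguments.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / ("in" + suffix)
        dst = pathlib.Path(tmp) / "out.bin"
        src.write_bytes(input_bytes)
        subprocess.run(command + [str(src), str(dst)], check=True)
        return dst.read_bytes()
```

Because every wrapped tool has the same bytes-in/bytes-out signature, wrappers like this compose into pipelines, which is the property the workflow system exploits.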
The document describes CONCERT, an adaptive collaborative correction platform for digitized text. It uses feedback from users to improve optical character recognition and increase productivity of post-correction. Key features include adaptive OCR, quality control tools, productivity tools like games to motivate volunteers, and monitoring of users to prevent data corruption. It has been used successfully in several library digitization projects worldwide.
The document announces an IMPACT-myGrid hackathon held November 14-15, 2011 at the University of Manchester, focused on the myGrid and Taverna tools. Additional information is available on the event website at http://impact-mygrid-taverna-hackathon.wikispaces.com/.
The document outlines the roadmap for updates and new features in the Taverna workflow system, including releasing versions 2.3 and 3.0 with improvements to the user interface, support for new standards, and integration with additional technologies and domains like clouds, semantic web, and biodiversity. It also discusses new plugins and tools being developed to enhance provenance capture, support additional file formats, and provide domain-specific functionality for astronomy, life sciences, and data mining.
The document discusses named entity (NE) recognition in digitized historical texts. It describes how NEs like people, locations and organizations can be identified during optical character recognition (OCR) and retrieved for users. The key steps include building an NE lexicon database by collecting data, tagging and enriching NEs with metadata, and linking variant names. This helps improve OCR quality and allows users to find NEs despite spelling variations in historical texts.
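Linking variant names is essentially an index from a normalised key to a canonical entity, so that differently spelled historical forms resolve to the same record. The normalisation rules below (lowercasing, merging ij/y and ck/k) are illustrative assumptions, not the document's actual linking rules:

```python
def normalise(name):
    """Crude illustrative normalisation: lowercase, merge ij/y and ck/k."""
    return name.lower().replace("ij", "y").replace("ck", "k")

def build_index(entities):
    """Build a lookup index.

    `entities` maps each canonical name to its recorded variant spellings;
    every spelling (canonical included) is keyed by its normalised form.
    """
    index = {}
    for canonical, variants in entities.items():
        for spelling in [canonical, *variants]:
            index[normalise(spelling)] = canonical
    return index

def find_entity(index, query):
    """Resolve a query spelling to its canonical entity, or None."""
    return index.get(normalise(query))
```

A user searching for any spelling variant then retrieves the same entity record, which is the retrieval behaviour the document describes for historical texts.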
The document discusses an analysis of optical character recognition (OCR) results for historical documents. It describes creating language and error profiles to characterize documents, including spelling variations and common OCR mistakes. These profiles help adapt OCR and post-processing to each document. The document also presents an interactive system to efficiently correct OCR errors in historical texts by utilizing the document profiles.
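An error profile of this kind records which character confusions an OCR engine makes on a given document, and correction candidates follow by inverting those confusions. The pairs below (rn/m, c/e, 1/l) are typical illustrative examples, not a measured profile from the document:

```python
# Illustrative OCR confusion pairs: wrong substring -> likely intended text.
CONFUSIONS = {"rn": "m", "c": "e", "1": "l"}

def correction_candidates(token):
    """Apply each recorded confusion once, at every position it occurs."""
    out = set()
    for wrong, right in CONFUSIONS.items():
        i = token.find(wrong)
        while i != -1:
            out.add(token[:i] + right + token[i + len(wrong):])
            i = token.find(wrong, i + 1)
    return out
```

In an interactive correction system, candidates generated this way would be ranked (e.g. against the document's language profile) before being offered to the corrector.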
The document provides an overview of language work being done in the IMPACT project to improve optical character recognition (OCR) of historical documents. It discusses the development of lexicons for various languages to incorporate historical spelling variations that can help OCR more accurately recognize words. Computational tools are being developed and adapted to assist with building lexicons from corpus materials and dictionaries. Challenges include a lack of resources for some languages and dealing with special characters. The work involves collaboration between institutes to share knowledge and resources for lexicon building across languages.
Digitization, industrialisation - sport broadcasting challenges and the value... by FIAT/IFTA
Mediaset Premium is transitioning its sports broadcasting operations from analog to digital. This involves digitizing processes and technologies as well as evolving roles and responsibilities. Key aspects of the transition include developing integrated digital workflows, centralized content management, and redefining roles like journalists and producers to work in a digital "content factory" model. The goal is to maximize the value of content assets across multiple platforms.
Muehlberger - PrestoPrime case study 2 @EUscreen Mykonos by EUscreen
This document discusses the digitization of audiovisual materials at the University of Innsbruck. It outlines the university's plans to digitize over 90,000 hours of audiovisual content from its collections over the next 5-10 years. As a pilot project, the document focuses on digitizing 2000 VHS cassettes containing 3000 hours of video from the Slavonic Studies Department. It describes the proposed mass digitization process, including using a custom VHS digitization machine to capture content, extracting descriptive and technical metadata, and ingesting the content and metadata into a digital preservation system for long-term access. The goals are to develop an institutional strategy for digitizing and preserving all analog audiovisual materials at the university.
Designing Smart Things: user experience design for networked devices by Mike Kuniavsky
In this workshop Mike Kuniavsky, author of "Smart Things: ubiquitous computing user experience design" introduces concepts of user experience design for the post-PC/post-phone world.
How do you design experiences that transcend a single device, or even a family of devices? How do you create experiences that exist simultaneously in your hand and in the cloud?
Using plentiful examples drawn from cutting edge products and the history of technology, the workshop describes underlying trends, shows the latest developments and asks broader questions.
This presentation introduces fundamental concepts of ubiquitous computing user experience design and specific techniques for designing services and interfaces.
Topics include:
- Design for multiple scales
- Design for services used by multiple devices
- Rethinking everyday objects and experiences
- Understanding use context
Cooperation in the Digital Age: Building the Library Platform by Constance Malpas
This document discusses building cooperative library infrastructure in the digital age. It argues that libraries must work together and with other institutions to aggregate and share digital collections and data. Specifically, it notes that libraries are shifting resources from local print collections to licensed electronic materials and digital formats. To better support research, libraries are focusing on making their special collections and institutional assets more discoverable outside their own institutions. The document advocates for a shared infrastructure approach where libraries pool resources and collections to create network effects that benefit all participants.
Live to e-Learning, a lecture capture and delivery service based on MediaMosa by MediaMosa
L2L (Live to e-Learning) a lecture capture and delivery service based on MediaMosa. Presentation by Matteo Bertazzo from CINECA InterUniversity Consortium at the MediaMosa Community day, November 25, 2010
From Essence to Assets. Making sense of an audiovisual archive by Brecht Declercq
As presented on November 5, 2016 at the Impact Hub in Athens, Greece, as a part of the Audiovisual archiving workshop of the Interfaces Projects supported by the European Commission
Thinking the archives of 2020: Opportunities, priorities, issues by FIAT/IFTA
This document summarizes a discussion between members of broadcasting archives organizations about priorities and challenges for archives in 2020. The discussion covered many topics, including storage formats and migration, rights management, metadata automation, user interfaces, and financing models. Participants shared their individual organization's priorities, such as NHK's focus on high resolution content and rich navigation or RAI's projects to digitize archives and automate rights management. Overall, the discussion aimed to identify common issues and opportunities to develop strategies together for the future of broadcasting archives.
The Big Data Is A Significant Subject Of Modern Times With... by Sarah Gordon
Big data is a significant topic as technologies like smartphones and computers generate large amounts of data daily. Companies need platforms to not only store but also analyze this data quickly, such as Google's BigQuery which runs in the cloud and provides real-time information. The document discusses how BigQuery manages vast amounts of both structured and unstructured data for Google's needs.
The document summarizes a presentation about high performance computing applications in the petroleum industry given by Dr. Leonid Sheremetov of the Mexican Petroleum Institute. It discusses the challenges of exploration and production for PEMEX and outlines IMP's research program including grid-based simulation, data mining, and task optimization on clusters and desktop grids. Specific applications mentioned include reservoir simulation, seismic analysis, data mining of production data, electron microscopy, and a data mining project between IMP and other Mexican institutions.
This document summarizes a presentation about high performance computing (HPC) in the petroleum industry given by Dr. Leonid Sheremetov of the Mexican Petroleum Institute (IMP). It outlines the challenges of HPC in petroleum exploration and production. It provides an overview of IMP's research program in applied mathematics and computing, including their use of HPC. It then summarizes several of IMP's research projects applying HPC to problems in petroleum such as reservoir simulation, data mining of petroleum data, and distributed computing applications.
The document discusses the reasons for and methods of digitizing materials in libraries. It outlines why digitization provides improved access and preservation of collections, what types of materials can be digitized, and how the digitization process works through tools like scanning and metadata capture. The document also considers who should perform the digitization work and where it could take place within the library or through outside contractors.
This document provides an overview of principles of multimedia including definitions of multimedia, its characteristics, applications, building blocks, and relationship with the internet. It also discusses topics like multimedia architecture, user interfaces, hardware support, distributed multimedia applications, streaming technologies, multimedia databases, authoring tools, and multimedia document standards.
The document discusses emerging trends in library networks in the new millennium, including the growth of digital resources and collections, developments in digital library technologies, and the future of networked digital resources. Some key points discussed are the exponential growth of information, transition from physical to digital media, consortium approaches for accessing content, developing digital collections and repositories, and emerging technologies like semantic retrieval and knowledge sharing platforms. The future of library networks is envisioned to include fluid and transient multimedia resources, free and flexible virtual information spaces, global and personalized access, and more emphasis on informal knowledge exchange and social relationships.
Myths, Challenges and Advances in Power & Signal Distribution for Live Event... by Bob Vanden Burgt
The document discusses myths, challenges, and advances in power and signal distribution for live production over the past decade. Digital networking and power distribution requirements have changed substantially, presenting unique reliability and portability challenges with the tight integration of lighting, media, video, and audio in touring shows. The session will provide an overview of some transport protocols, network topologies, and more contemporary methods for distributing power and data in complex and changing production environments.
A Short Course on the Internet of Things by Prasant Misra
This document provides an overview of a short course on the Internet of Things (IoT). The course content is divided into four sections: IoT Primer, IoT Architecture, IoT "Last-mile" Considerations, and Derivatives for Intelligence. It discusses key topics like IoT history and trends, functional architecture, field devices and standards, and how machine learning can be applied to IoT data. The course aims to provide foundational knowledge on IoT technologies and applications.
Digitisation-Industrialisation: Sport Broadcasting Challenges and the Value o... by FIAT/IFTA
The document discusses Mediaset Sport's digital transformation journey in broadcasting sport content. It describes Mediaset Sport's process before digitization, including analog production and distribution. It then outlines Mediaset Sport's roadmap for evolving to digital and file-based workflows across the entire content production process from ingest to publishing. This includes implementing a central newsroom system, tapeless workflows, and integrated search and cataloging.
The presenter is the managing director of Klokan Technologies GmbH, a small Swiss company that develops innovative geo applications for cultural heritage institutions. The document discusses Old Maps Online, a project that provides an easy-to-use gateway for searching historical maps from libraries around the world. It allows users to search maps by geographic location on an interactive world map and view high resolution maps from contributing institutions with proper crediting back to the libraries. The project is open to additional map contributors and uses tools like BoundingBox and Georeferencer to help enrich map metadata.
The document summarizes image retrieval techniques and applications at the BnF (French National Library). It discusses using deep learning for image segmentation, classification, and indexing. It then describes several BnF projects applying these techniques, including GallicaSimilitudes for visual similarity search of collections, GallicaPix for iconographic retrieval and digital humanities case studies, and collaborations with INRIA on object detection in manuscripts and iterative querying. The goal is improved search and access to the diverse range of images in BnF collections.
Similar to IMPACT Final Conference - Majlis Bremer Laamanen (20)
Slides of the paper Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts by Helmut Schmid at the 3rd Edition of the DATeCH2019 International Conference
This document discusses using text models to improve the accuracy of optical character recognition (OCR) on Chinese rare books. It conducted experiments using n-gram, backward/forward n-gram, and LSTM models on OCR data from ancient medicine books. The backward and forward 4-gram model achieved the highest correction rate at 97.57%. Mixing the LSTM 6-gram model with the OCR's top 5 candidates and the probability of the top candidate further improved accuracy to 97.71%, demonstrating that combining text models with OCR probabilities corrects OCR errors better than text models alone. In conclusion, text models are effective for increasing OCR accuracy on rare books, with the backward/forward 4-gram and LSTM 6-gram models performing best.
Slides of the paper Turning Digitised Material into a Diachronic Corpus: Metadata Challenges in the Nederlab Project by Katrien Depuydt and Hennie Brugman at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Standoff Annotation for the Ancient Greek and Latin Dependency Treebank by Giuseppe Celano at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Using lexicography to characterise relations between species mentions in the biodiversity literature by Sandra Young at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Implementation of a Databaseless Web REST API for the Unstructured Texts of Migne's Patrologia Graeca with Searching capabilities and additional Semantic and Syntactic expandability by Evagelos Varthis, Marios Poulos, Ilias Yarenis and Sozon Papavlasopoulos at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Cross-disciplinary collaborations to enrich access to non-Western language material in the Cultural Heritage sector by Tom Derrick and Nora McGregor at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Tribunal Archives as Digital Research Facility (TRIADO): new ways to make archives accessible and useable by Anne Gorter, Edwin Klijn, Rutger Van Koert, Marielle Scherer and Ismee Tames at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Improving OCR of historical newspapers and journals published in Finland by Senka Drobac, Pekka Kauppinen and Krister Lindén at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards a generic unsupervised method for transcription of encoded manuscripts by Arnau Baró, Jialuo Chen, Alicia Fornés and Beáta Megyesi at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study by Christian Clausner, Apostolos Antonacopoulos, Christy Henshaw and Justin Hayes at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771–1929: Early Results Using the PIVAJ Software by Kimmo Kettunen, Teemu Ruokolainen, Erno Liukkonen, Pierrick Tranouez, Daniel Antelme and Thierry Paquet at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper OCR-D: An end-to-end open-source OCR framework for historical documents by Clemens Neudecker, Konstantin Baierer, Maria Federbusch, Kay-Michael Würzner, Matthias Boenig, Elisa Hermann and Volker Hartmann at the 3rd Edition of the DATeCH2019 International Conference
- The document describes a project to fill gaps in knowledge about diamond mining, trading, and polishing in Borneo by developing a workflow using various CLARIAH tools and resources.
- The workflow involved digitizing a diamond encyclopedia, extracting concepts and place names, linking the data to external sources to create linked open data, and querying newspaper archives to build a corpus of relevant articles.
- Promising results showed mining, trading, and polishing continued in Borneo for Southeast Asian customers, and described previously unknown diamond fields and polishing locations in Borneo. The project aims to apply the workflow to other commodities like sugar.
Slides of the paper Automatic Reconstruction of Emperor Itineraries from the Regesta Imperii by Juri Opitz, Leo Born, Vivi Nastase and Yannick Pultar at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification by Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner and Frank Puppe at the 3rd Edition of the DATeCH2019 International Conference
This document describes the SOS system for segmenting, stemming, and standardizing Arabic text. It presents the challenges of processing Arabic cultural heritage texts which contain orthographic variations. The system uses gradient boosting machines and achieves state-of-the-art performance on segmentation and derives stemming as a byproduct. It also standardizes orthography with high accuracy, which further improves segmentation. The system addresses issues like hamza forms and letter confusions that previous systems did not handle well.
Assessment and Planning in Educational technology.pptxKavitha Krishnan
In an education system, it is understood that assessment is only for the students, but on the other hand, the Assessment of teachers is also an important aspect of the education system that ensures teachers are providing high-quality instruction to students. The assessment process can be used to provide feedback and support for professional development, to inform decisions about teacher retention or promotion, or to evaluate teacher effectiveness for accountability purposes.
Thinking of getting a dog? Be aware that breeds like Pit Bulls, Rottweilers, and German Shepherds can be loyal and dangerous. Proper training and socialization are crucial to preventing aggressive behaviors. Ensure safety by understanding their needs and always supervising interactions. Stay safe, and enjoy your furry friends!
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
This slide deck is intended for master's students (MIBS & MIFB) at UUM, and is also useful for readers interested in contemporary Islamic banking.
This presentation covers the basics of PCOS, its pathology and treatment, along with the Ayurvedic correlation of PCOS and the Ayurvedic line of treatment mentioned in the classics.
The simplified electron and muon model, Oscillating Spacetime: The Foundation...RitikBhardwaj56
Discover the Simplified Electron and Muon Model: A New Wave-Based Approach to Understanding Particles delves into a groundbreaking theory that presents electrons and muons as rotating soliton waves within oscillating spacetime. Geared towards students, researchers, and science buffs, this book breaks down complex ideas into simple explanations. It covers topics such as electron waves, temporal dynamics, and the implications of this model on particle physics. With clear illustrations and easy-to-follow explanations, readers will gain a new outlook on the universe's fundamental nature.
IMPACT Final Conference - Majlis Bremer Laamanen
1. CROWDSOURCING IN THE
DIGITALKOOT PROJECT
Majlis Bremer-Laamanen
IMPACT 24TH OF OCTOBER, 2011
Microtask.com:
Digitalkoot: Making Old Archives Accessible Using Crowdsourcing by
Otto Chrons and Sami Sundell,
Discussions: Managing Director Harri Holopainen
harri@microtask.com
2. The Centre for Preservation and Digitisation: statistics
• Established in 1990
• Digitisation started in 1998
• Over 50 employees
• Yearly average (past three years):
– Digitisation: 1.3 million pages
– Audio digitisation and cataloguing: 1,300 unique music cassettes and their sleeves
– Microfilm production: 1.3 million exposures
– Conservation: 10,000–15,000 units
3. ENRICHING CONTENT
(http://digi.nationallibrary.fi, http://www.doria.fi/handle/10024/4194)
• Newspapers -> over 2 million pages, the Historical Newspaper Library
• Journals -> over 2.7 million pages, free to 1910, in all legal deposit libraries to 1944
• Books -> travel, novels, dissertations from the 17th century, Save the Book
• Ephemera -> industrial price lists
• Sound -> national sound archive, C-cassettes
• Interest groups: the creators, users and contributors of the material
4. Context for mass digitisation and crowdsourcing
Process diagram: physical objects are transferred to the Centre for Preservation and Digitisation, prepared for digitisation, digitised and post-processed, with temporary physical storage for the objects along the way; the digitised objects are then retrieved and made accessible to the client.
Mass digitisation activities in the most cost-effective manner:
Newspapers, books, journals, ephemera, audio:
• Logistics for physical items
• Process for digital objects: network services and long-term preservation
• Metadata METS/ALTO: capturing through the process
• Metadata development: User experience and crowdsourcing
• Customizing of the tracking systems (CCS, Item Tracking, Scan Client)
• Operational environment: scaling architecture and implementation
5. DIGITALKOOT
DIGI = TO DIGITISE
TALKOOT = PEOPLE GATHERING TO WORK TOGETHER
VOLUNTARILY (WITHOUT PAYMENT)
FIRST EXPERIENCE 2011:
DIGITALKOOT: correction of OCR through gamification ("THE MOLE HUNT" by Microtask.com):
– People can spend hours on games
– Turning useful activities into games
– Activities can be rewarded with scores, achievements and social benefits
From February 8th to September 15th, 2011: about 80,000 visitors, 4,000 hours of effective game time, and more than 5 million tasks.
6. CHALLENGES
Meaningful tasks without breaking the flow of the game
Real-time feedback – many simultaneous players doing
the same task
Build a bridge to save the moles from falling down =>
– Correct typing gives you a block for the bridge
– Incorrect typing is punished by an explosion
15. GAMIFICATION CHALLENGES
Balancing game play elements with task completion speed and
accuracy
Keeping players motivated and enlarging the audience
Introducing meaningful tasks into the game without breaking the game play mechanisms
Instant feedback on players' actions (simultaneous players)
• pressure to adapt to varying feedback situations/latencies
16. POSITIVE EFFECT OF VERIFICATION
”The wisdom of the crowds”
• includes answers from possible spammers
Game start: verification tasks only
Accurate work shown => verification lowered in phases, never to zero
Verification tasks are created automatically:
• A randomly selected task is sent to several players: they all have to agree on the result => verification task
17. VERIFICATION OF THE OCR
Players and their pace cannot be synchronized.
Verification tasks in the task stream:
• The rate fed to players varies according to the number of active players
• The system knows the answer: game play is improved by fast feedback
• Downside: no new information is produced
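The verification scheme described on slides 16 and 17 can be sketched in a few lines of Python. This is a minimal illustration, not Digitalkoot's actual implementation: the function names, the agreement threshold and the idea of a tunable `verification_rate` are assumptions drawn from the slides.

```python
import random
from collections import Counter

def route_task(work_tasks, verification_pool, verification_rate):
    """Pick the next task for a player. With probability
    `verification_rate`, serve a verification task whose correct
    answer is already known (fast feedback, spammer detection);
    otherwise serve real OCR correction work."""
    if random.random() < verification_rate:
        return random.choice(verification_pool), True
    return random.choice(work_tasks), False

def agreement(answers, min_votes=3):
    """A correction is accepted once enough players agree on it;
    an agreed answer can then be reused as a new verification task.
    Returns the agreed answer, or None while players still disagree."""
    answer, votes = Counter(answers).most_common(1)[0]
    if votes >= min_votes and votes / len(answers) > 0.5:
        return answer
    return None
```

At game start the stream would be all verification tasks (`verification_rate = 1.0`), lowered in phases as a player proves accurate, but never to zero.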
18. USERS: February 8th to March 31st, 2011
31,816 visitors, 4,768 players, 2,740 hours of game time, 2.5 million tasks.
1% via the Internet, 99% via Facebook
Half of the users were men.
Game time: from seconds to over 100 hours (altogether).
Median time: 9 minutes.
Women: over 13 minutes and 54% of the tasks
The hardest-working top 4 were all men
19. ACCURACY
The OCR system's confidence threshold was 0.8 => human correction for about 30% of the words
Random selection of 2 articles:
• 1,467 words, Digitalkoot result: only 14 mistakes / 228 OCR errors
• 516 words, Digitalkoot result: 1 mistake / 118 OCR errors
• => well over 99% accuracy possible through gamification
Spammer play:
• One player: 1.5 hours and 5,692 tasks; detected by the verification system, only 4 tasks were accepted
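The accuracy figures above rest on a simple routing rule: words whose OCR confidence falls below the 0.8 threshold are sent to human correction. A minimal sketch of that rule, with an assumed function name and `(word, confidence)` data shape:

```python
def split_for_correction(ocr_words, threshold=0.8):
    """Accept words the OCR engine is confident about; turn
    low-confidence words (roughly 30% in Digitalkoot's material)
    into crowdsourcing tasks for human correction.
    `ocr_words` is a list of (word, confidence) pairs."""
    accepted, needs_human = [], []
    for word, confidence in ocr_words:
        if confidence >= threshold:
            accepted.append(word)
        else:
            needs_human.append(word)
    return accepted, needs_human
```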
20. Enriching Digitisation Production Processes, METS Profiles: a new development platform
Process diagram, from physical collections (source material) to a comprehensive digital resource:
• Source material: newspapers, serials, books, parchments, notes, maps, audio
• Cataloguing: two bibliographic records; descriptive metadata MARC21/MODS (MARCXML)
• Scanning: JPEG2000
• Post-processing: OCR text as ALTO XML; structural metadata METS/ALTO; mark-up of articles, illustrations and poems for a comprehensive level of digital collections
• Administrative/technical metadata: MIX/PREMIS
• METS export: standards- and OAI-PMH-compliant METS SIP packages; packages include JPEG2000, OCR text as ALTO XML, PDF, JPEG (150), METS XML and MARCXML
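As a rough illustration of the export step, the sketch below builds a heavily simplified METS wrapper with Python's standard library. A real METS SIP of the kind described here would also carry a structMap, MODS/MARC descriptive metadata and MIX/PREMIS sections; the function name, identifiers and file names are made up.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

def build_minimal_sip(objid, files):
    """Wrap the files of one digitised object (JPEG2000 master,
    ALTO XML, PDF, ...) in a bare-bones METS fileSec."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": objid})
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})
    for i, (path, mimetype) in enumerate(files, start=1):
        file_el = ET.SubElement(
            group, f"{{{METS_NS}}}file",
            {"ID": f"FILE{i}", "MIMETYPE": mimetype})
        # FLocat points at the actual file; METS uses xlink:href for this.
        ET.SubElement(file_el, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path})
    return ET.tostring(mets, encoding="unicode")
```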
21. IN THE MEDIA
– Until March 31st, over 30 articles all around the world: the New York Times…
– Television appearances ongoing
– Helsingin Sanomat: HS talkoot, using the National Library's digitised newspaper material from the Historical Newspaper Library > advertising Digitalkoot, e.g. September 15th
– Influenced user interest => stabilisation at 300 individual users per week
22. NEXT
1) Marking of articles and/or images
2) Indexing articles and/or images
23. KUVATALKOOT
Goal: a sophisticated user experience
Collections discovery and reuse of digital content by researchers and people at large:
Researchers will get better systematic coverage of images and articles in published printed material.
(Image: Luonnon-kirja ala-alkeiskouluin tarpeeksi / Z. Topelius, 1868)