Slides of the paper Cross-disciplinary collaborations to enrich access to non-Western language material in the Cultural Heritage sector by Tom Derrick and Nora McGregor at the 3rd Edition of the DATeCH2019 International Conference
The HiPerCiC initiative creates custom web applications for users through collaboration between computer science students and domain specialists. For the Fall 2015 semester, six digital humanities projects are planned for the associated course, ID 259. They include a tool for exploring music in 1924 Paris, an online art museum, a dance video archive, and an archaeology research website. Preparation for the course involved developing training materials and frameworks for 3D modeling, maps, and other technologies to support the projects.
Does DH Scholarship Take Place in the Lab? (Shawn Day)
This document discusses how digital tools and resources are changing humanities scholarship. It provides examples of different types of digital data and tools that can be used for analysis, including text, images, video, and more complex data types like networks and animations. It then highlights two specific digital projects - the Text Analysis Portal for Research (TAPoR) and ManyEyes - that aimed to make text analysis and data visualization tools accessible to researchers. TAPoR developed infrastructure and portals to support text analysis research across Canada, while ManyEyes allowed users to analyze and visualize data and discuss their findings online. The document argues that while traditional scholarship is still important, digital tools require new approaches to research in order to take advantage
A presentation for attendees of our workshop on transcribing Arabic scientific manuscripts to create ground truth for OCR.
For more details see: https://www.eventbrite.co.uk/e/arabic-scientific-manuscripts-transcription-workshop-tickets-43303096728
About the project: http://blogs.bl.uk/digital-scholarship/2018/03/arabic-handwrittten-ocr.html
NORFest 2023 Lightning Talks Session Three (dri_ireland)
Lightning Talk Session 3: Enabling FAIR Research Data and Other Outputs
The Irish ORCID Consortium
presented by Catherine Ferris, IReL;
Exploring Large-Scale Open Data: The Curatr Platform
presented by Derek Greene, University College Dublin;
A Workflow for Research Data Management (RDM): Aligning the Management of Research Data
presented by Gail Birkbeck, University College Dublin;
Making Cultural Heritage Data FAIR: Developing Recommendations for the WorldFAIR Project at the Digital Repository of Ireland
presented by Joan Murphy, Digital Repository of Ireland.
Publishing conference proceedings internationally: how does it work? (Aliaksandr Birukou)
In this presentation we look at the main elements to consider when organizing an international conference. First, we describe the role of conference proceedings in computer science and beyond. Second, we focus on the tasks of conference organizers. Third, we cover aspects of peer review and introduce a new group that CrossRef and DataCite have started in this area. We then cover indexing and dissemination, present several tips and guidelines for organizers of international conferences, and give a word of warning about predatory publishers and conferences.
Presentation to the National Science Library of the Chinese Academy of Sciences (labsbl)
1100 - 1300, Thursday, 26th April 2018,
British Library Labs and Digital Scholarship at the British Library, Harley Room, British Library, St Pancras, London.
Presented by Mahendra Mahey, Manager of BL Labs
The Work of British Library Labs and Digital Scholarship: Insights from British Library Labs and an emerging role for Libraries
SGCI Science Gateways: Ushering in a New Era of Sustainability (Sandra Gesing)
The computational landscape has never evolved as fast as in the last decade. Computational scientific methods tackle an increasing breadth and diversity of topics, analyzing data at large scale and accessing high-performance computing infrastructures, cutting-edge hardware and instruments. Novel technologies such as next-generation sequencing or the Square Kilometre Array, the world's largest radio telescope, generate data at exascale. While the availability of these data makes it possible to answer research questions that would not have been feasible before, the sheer volume creates new challenges that call for novel computational solutions. Such solutions require integrative approaches by multidisciplinary teams across geographical boundaries: approaches that improve the usability of scientific methods for their target user communities and aim at making science reproducible. Science gateways, also called virtual research environments or virtual laboratories, pursue exactly this goal. They provide easy-to-use, end-to-end solutions that hide the complex underlying infrastructure, giving researchers intuitive user interfaces so they can focus on their research questions instead of technological details.
Science gateways are often developed by research teams that do not necessarily come from computer science, and such projects depend on academic funding. Centralized teams of research programmers, who can bring broad experience and contribute to the sustainability of solutions, are rare at universities, and there is still a lack of incentives for interested developers to stay in academia. A key future challenge for science gateways, and thus for computational scientific methods, will be to increase sustainability and become less dependent on successful grant proposals. The US National Science Foundation has recognized the importance of this topic and funded the Science Gateways Community Institute (SGCI), not only to support teams developing science gateways but also to help communities find ways to sustain the gateways they rely on for their research. This talk will discuss current challenges, the landscape around science gateways, the services of SGCI, and approaches to achieving sustainability.
This document discusses virtual research environments (VREs) in the digital humanities field. It provides examples of several existing VREs, including TextGrid (Germany), TAPoR (Canada), NINES (US/UK), DARIAH (EU-wide), and a VRE for European Holocaust research. It explains that VREs aim to provide researchers with collaborative tools and interfaces to organize, analyze, and share digital research materials online. However, developing VREs for the humanities poses challenges around establishing common standards, balancing diversity of research with coordination needs, and ensuring new technologies support rather than hinder existing humanistic methods.
The document discusses tools for text digitization and transcription. It describes the IMPACT Centre of Competence, which brings together content holders, service providers, and researchers to advance digitization. The Centre shares expertise and tools for tasks like image enhancement, segmentation, optical character recognition, and evaluation. Member organizations can access resources, consult experts, and collaborate on projects. The Poznan Supercomputing and Networking Center is highlighted as a member that develops tools like the Virtual Transcription Laboratory to support high-quality digitization of cultural heritage materials.
Towards a Knowledge Graph for a Research Group with Focus on Qualitative Anal... (Vera G. Meister)
Support of scientific workflows by semantic technology has attracted increasing interest in recent years. Considerable effort goes into providing structured, standards-based metadata and into machine-based qualitative analysis of the unstructured content of scholarly papers. This helps researchers stay oriented in an ever-growing and increasingly complex field. Semantic technologies also have the potential to support in-depth engagement with scholarly papers, as practiced in research seminars. The paper reports preliminary results of an undertaking to support the collaborative documentation and reuse of qualitative analyses of scholarly papers in an information systems research group. A vocabulary has been developed and made openly available. The system is implemented on OntoWiki and is openly accessible.
SGCI - The Science Gateways Community Institute: Going Beyond Borders (Sandra Gesing)
The Science Gateways Community Institute (SGCI), which opened in August 2016, provides free resources, services, experts, and ideas for creating and sustaining science gateways. It offers five service areas to science gateway developer and user communities: the Incubator, Extended Developer Support, the Scientific Software Collaborative, Community Engagement and Exchange, and Workforce Development. While all these areas are available to US-based communities, the Incubator, the Scientific Software Collaborative, and Community Engagement and Exchange also serve international communities. We aim to reach out and provide support across borders through diverse measures, and we intend to form and deepen collaborations with partner organizations and coalitions that benefit or relate to the science gateways community. Research topics are independent of national borders, and researchers worldwide can benefit from each other's research results, software, data, and lessons learned, whether via online materials and publications or at international events. The gateway community has long benefited from this type of exchange. This paper presents related work describing the benefits of international collaboration in general and for science gateways in particular. We then detail SGCI's ongoing international work and the work planned for the near future.
This document provides information about a workshop on newspaper data visualization hosted by the British Library and London College of Communication. It discusses the British Library's collection of over 34,000 newspaper titles containing 450 million pages. It outlines plans to digitize 1.3 million additional newspaper pages by 2022 and make metadata and text data openly available. The workshop goals are to help researchers understand how to visualize and analyze the complexities of the library's newspaper collection using tools like Python, R, Voyant, and Palladio and methods like named entity recognition and text mining.
Ebooks: Challenges, Disruptions and Innovations (Ebooks: desafios, perturbações e inovações) (REA Brasil)
The document discusses challenges, disruptions and innovations related to ebooks. It covers price, digital inclusion, technological standards and interoperability, content, new opportunities for authorship and collaboration, and user behavior. Ebooks offer opportunities to lower costs and increase access, but they also raise challenges in establishing common standards and business models. The document also examines open educational resources and open licensing models as ways to increase access to and sharing of knowledge through digital means. Innovation in authorship, publishing, and access to content will be needed to fully realize the potential of ebooks.
The document discusses a project called PROMISE which aims to develop a Belgian strategy for web archiving. The project seeks to:
1) Identify best practices in web archiving.
2) Develop a Belgian web archiving strategy.
3) Pilot archiving the Belgian web and providing access to collections.
4) Make recommendations for implementing a sustainable web archiving service.
A demonstration of transparent and scalable OpenURL quality metrics for use i... (alc28)
This document summarizes Adam Chandler's presentation on using OpenURL quality metrics to promote metadata consistency across content providers. It discusses literature on OpenURL and metadata quality, analyzes elements in OpenURLs, and presents Chandler's 2008 findings on common and variable elements. The goal is to build a tool to evaluate OpenURL quality from content providers based on Hughes' metadata evaluation approach and analysis of core OpenURL elements.
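As an illustration of the kind of element analysis described above, the sketch below counts which OpenURL 1.0 (Z39.88-2004) key/value elements appear across a set of links and scores each link against a core element set. It is a minimal sketch, not Chandler's tool: the `CORE_ELEMENTS` list, the function names, and the example resolver URLs are hypothetical assumptions chosen for a journal-article citation.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical "core" elements for a journal-article OpenURL (KEV format).
# The core set used in the actual study may differ.
CORE_ELEMENTS = ("rft.atitle", "rft.jtitle", "rft.issn",
                 "rft.date", "rft.volume", "rft.spage")


def elements(openurl: str) -> set[str]:
    """Return the keys that carry a non-empty value in an OpenURL query string."""
    # parse_qs drops keys with blank values (keep_blank_values=False by default),
    # so "rft.spage=" does not count as present.
    return set(parse_qs(urlsplit(openurl).query))


def completeness(openurl: str) -> float:
    """Fraction of CORE_ELEMENTS present in one link (0.0 to 1.0)."""
    present = elements(openurl)
    return sum(e in present for e in CORE_ELEMENTS) / len(CORE_ELEMENTS)


def element_frequencies(openurls: list[str]) -> dict[str, float]:
    """Share of links in which each element appears, e.g. for one content provider."""
    counts = Counter()
    for url in openurls:
        counts.update(elements(url))
    return {key: counts[key] / len(openurls) for key in sorted(counts)}


links = [
    "https://resolver.example.edu/openurl?url_ver=Z39.88-2004&rft.atitle=A"
    "&rft.jtitle=J&rft.issn=1234-5678&rft.date=2008&rft.volume=3&rft.spage=10",
    "https://resolver.example.edu/openurl?url_ver=Z39.88-2004"
    "&rft.jtitle=J&rft.date=2008&rft.spage=",
]
print([round(completeness(u), 2) for u in links])  # [1.0, 0.33]
```

Averaging `completeness` over the links from each content provider gives one simple per-provider score; a real benchmark might weight elements differently or check value formats (for example ISSN syntax), not just presence.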
Multimodal Perspectives for Digitised Historical Newspapers (cneudecker)
This document discusses challenges and opportunities in analyzing digitized historical newspapers. It describes several projects aimed at improving OCR accuracy using deep learning models, extracting structural information using computer vision and heuristics, and establishing standards for metadata and evaluation. Key challenges include the need for more granular and representative ground truth newspaper data, methods that combine machine learning and domain knowledge, and community efforts around shared tasks, seminars, and an atlas of digitized newspapers to advance interdisciplinary research. The overall goal is to make cultural heritage collections more accessible online through improved digitization and analysis of newspapers.
Lorna Hughes, 12-05-2013: NeDiMAH and ontology for DH (lorna_hughes)
This document describes NeDiMAH, a network examining the use of digital methods in the arts and humanities. NeDiMAH is funded by the European Science Foundation and chaired by Lorna Hughes. It aims to research advanced ICT methods, develop activities/publications/networking, and create a map of digital humanities in Europe and a taxonomy of methods. NeDiMAH includes 16 supporting member organizations and has working groups on topics like spatial modeling, visualization, and scholarly publishing. A key output will be a formal ontology of digital methods to provide evidence of their use and enable evaluation of digital humanities projects.
Towards OpenURL Quality Metrics: Initial Findings (alc28)
Presentation on creating a method for benchmarking metadata consistency in OpenURL links, delivered at the July 2009 American Library Association conference in Chicago.
Transforming University Research - Mar 2006 (Jill Patrick)
The document discusses Scholars Portal, a consortium of Ontario university libraries that provides access to digital scholarly resources and services. It aims to create a single point of access for integrated searching, as well as long-term archiving of content. Services described include article searching, access to e-journals and databases, interlibrary loans, and a digital repository. Future plans include expanding content and developing a shared infrastructure to ensure sustainability. The goal is to transform research, teaching and learning through a centralized portal for high-quality scholarly materials.
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
Open Research Knowledge Graph (ORKG) - an overview (Jennifer D'Souza)
The ORKG makes scientific knowledge human- and machine-actionable and thus enables completely new ways of machine assistance. This will help researchers find relevant contributions to their field and create state-of-the-art comparisons and reviews. With the ORKG, scientists can explore knowledge in entirely new ways and share results even across disciplines. This presentation gave an overview of the ORKG. It was given on 15 July 2021 at a meeting of Lower Saxony librarian trainees.
CIL 2020 - Bringing Collections to the Screen (Matthew Ragucci)
Our NGA Library speakers discuss their procedures and challenges in providing digitized content from their collections via the International Image Interoperability Framework (IIIF), an initiative led by the world's leading research libraries. IIIF is an open-source, community-driven technology that aims to provide application programming interfaces (APIs) supporting the viewing, comparing, manipulating, and annotating of images from a variety of repositories. The NGA Library decided to implement IIIF alongside its new library system, Ex Libris's Alma and Primo VE products, and our speakers discuss the technical procedures required to integrate the IIIF APIs with Primo VE, the discovery client, and Alma, the cloud-based library services platform. Members of the NISO Content Platform Migration workgroup then discuss their recommended-practices document, which guides publishers, platform vendors, and librarians through content migrations. Hear about the problems encountered in migrations and the recommendations for making them proceed smoothly.
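For readers new to IIIF, the Image API addresses an image through a fixed URL pattern, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, plus an `info.json` document that describes the image. The sketch below builds such URLs. It assumes Image API 3.0 syntax, where the full-size value is `max` (version 2.x used `full`); the base URL and identifiers are hypothetical, and it says nothing about how the NGA Library's Alma/Primo VE integration is actually configured.

```python
from urllib.parse import quote

# The four quality values defined by the IIIF Image API.
QUALITIES = {"default", "color", "gray", "bitonal"}


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API request URL (3.0 syntax)."""
    if quality not in QUALITIES:
        raise ValueError(f"unknown IIIF quality: {quality!r}")
    # Identifiers must be URI-encoded; a "/" inside one becomes %2F.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


base = "https://images.example.org/iiif/3"  # hypothetical image server
print(iiif_image_url(base, "ms/001"))
# https://images.example.org/iiif/3/ms%2F001/full/max/0/default.jpg
print(iiif_image_url(base, "ms/001", region="0,0,500,500", size="250,"))
# https://images.example.org/iiif/3/ms%2F001/0,0,500,500/250,/0/default.jpg
```

Because every parameter is part of the URL, a viewer or discovery layer only needs the base URL and identifier to request thumbnails, crops, or full images from any compliant server.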
Slides of the paper Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts by Helmut Schmid at the 3rd Edition of the DATeCH2019 International Conference
This document discusses using text models to improve the accuracy of optical character recognition (OCR) on Chinese rare books. It reports experiments using n-gram, backward/forward n-gram, and LSTM models on OCR data from ancient medicine books. The backward and forward 4-gram model achieved the highest correction rate at 97.57%. Mixing the LSTM 6-gram model with the OCR's top 5 candidates and the probability of the top candidate further improved accuracy to 97.71%, demonstrating that combining text models with OCR probabilities corrects OCR errors better than text models alone. In conclusion, text models are effective for increasing OCR accuracy on rare books, with backward/forward 4-gram and LSTM 6-gram
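The idea of combining left-to-right and right-to-left text models with OCR candidate probabilities can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' system: it uses count-based character n-grams with add-one smoothing in place of their LSTM model, a hypothetical interpolation `weight` between OCR and language-model log-probabilities, and a toy training string.

```python
import math
from collections import Counter


class CharNgram:
    """Character n-gram model with add-one (Laplace) smoothing."""

    def __init__(self, corpus: str, n: int = 4):
        self.n = n
        starts = range(len(corpus) - n + 1)
        self.ngrams = Counter(corpus[i:i + n] for i in starts)
        # Contexts counted at the same positions, so the counts are consistent.
        self.contexts = Counter(corpus[i:i + n - 1] for i in starts)
        self.vocab = len(set(corpus))

    def logprob(self, context: str, char: str) -> float:
        """log P(char | last n-1 characters of context)."""
        context = context[-(self.n - 1):] if self.n > 1 else ""
        num = self.ngrams[context + char] + 1
        den = self.contexts[context] + self.vocab
        return math.log(num / den)


def correct(ocr, forward, backward, weight=0.5):
    """Pick one character per position from the OCR candidates.

    ocr: one list per position of (char, ocr_probability) pairs, probability > 0,
    best OCR guess first. forward is trained on the corpus, backward on the
    reversed corpus. weight=1.0 trusts OCR only; weight=0.0 trusts the models only.
    """
    text = [cands[0][0] for cands in ocr]  # start from the OCR top-1 reading
    for i, cands in enumerate(ocr):
        left = "".join(text[:i])              # characters already corrected
        right = "".join(text[i + 1:])[::-1]   # upcoming top-1 characters, reversed

        def score(item):
            char, p = item
            lm = forward.logprob(left, char) + backward.logprob(right, char)
            return weight * math.log(p) + (1 - weight) * lm

        text[i] = max(cands, key=score)[0]
    return "".join(text)


corpus = "abcabcabcabc"  # toy training text
fwd = CharNgram(corpus, n=3)
bwd = CharNgram(corpus[::-1], n=3)
ocr = [[("a", 1.0)], [("b", 1.0)], [("c", 1.0)], [("a", 1.0)],
       [("x", 0.6), ("b", 0.4)],  # OCR prefers "x", but "cab" is frequent in the corpus
       [("c", 1.0)]]
print(correct(ocr, fwd, bwd))              # abcabc
print(correct(ocr, fwd, bwd, weight=1.0))  # abcaxc (OCR scores only)
```

Correcting greedily from left to right and feeding corrected characters into later contexts is one simple design choice; a beam search over whole candidate sequences would be a natural refinement.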
CIL 2020 - Bringing Collections to the ScreenMatthew Ragucci
Our NGA library speakers discuss their procedures and challenges in providing digitized content from their collections via the International Image Interoperability Framework (IIIF), an initiative led by the world's leading research libraries. It is an open source, community-driven technology that aims to provide application programming interfaces (APIs) that support viewing, comparing, manipulating, and annotating images from a variety of repositories. The NGA Library made the decision to implement IIIF alongside its new library system, Ex Libris. Alma and Primo VE products, and our speakers discuss the technical procedures required to integrate the IIIF APIs with the Primo VE discovery client and Alma, the cloudbased library services platform. Members of the NISO Content Platform Migration workgroup discuss their recommended practices document to guide publishers, platform vendors, and librarians through content migrations. Hear about the problems encountered in migrations and the recommendations to make them progress smoothly.
Slides of the paper Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts by Helmut Schmid at the 3rd Edition of the DATeCH2019 International Conference
This document discusses using text models to improve the accuracy of optical character recognition (OCR) on Chinese rare books. It conducted experiments using n-gram, backward/forward n-gram, and LSTM models on OCR data from ancient medicine books. The backward and forward 4-gram model achieved the highest correction rate at 97.57%. Mixing the LSTM 6-gram model with the OCR's top 5 candidates and probability of the top candidate further improved accuracy to 97.71%, demonstrating that combining text models with OCR probabilities can better correct OCR errors than text models alone. In conclusion, text models are effective for increasing OCR accuracy on rare books, with backward/forward 4-gram and LSTM 6-gram
Slides of the paper Turning Digitised Material into a Diachronic Corpus: Metadata Challenges in the Nederlab Project by Katrien Depuydt and Hennie Brugman at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Standoff Annotation for the Ancient Greek and Latin Dependency Treebank by Giuseppe Celano at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Using lexicography to characterise relations between species mentions in the biodiversity literature by Sandra Young at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Implementation of a Databaseless Web REST API for the Unstructured Texts of Migne's Patrologia Graeca with Searching capabilities and additional Semantic and Syntactic expandability by Evagelos Varthis, Marios Poulos, Ilias Yarenis and Sozon Papavlasopoulos at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Tribunal Archives as Digital Research Facility (TRIADO): new ways to make archives accessible and useable by Anne Gorter, Edwin Klijn, Rutger Van Koert, Marielle Scherer and Ismee Tames at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Improving OCR of historical newspapers and journals published in Finland by Senka Drobac, Pekka Kauppinen and Krister Lindén at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards a generic unsupervised method for transcription of encoded manuscripts by Arnau Baró, Jialuo Chen, Alicia Fornés and Beáta Megyesi at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study by Christian Clausner, Apostolos Antonacopoulos, Christy Henshaw and Justin Hayes at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771–1929: Early Results Using the PIVAJ Software by Kimmo Kettunen, Teemu Ruokolainen, Erno Liukkonen, Pierrick Tranouez, Daniel Antelme and Thierry Paquet at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper OCR-D: An end-to-end open-source OCR framework for historical documents by Clemens Neudecker, Konstantin Baierer, Maria Federbusch, Kay-Michael Würzner, Matthias Boenig, Elisa Hermann and Volker Hartmann at the 3rd Edition of the DATeCH2019 International Conference
- The document describes a project to fill gaps in knowledge about diamond mining, trading, and polishing in Borneo by developing a workflow using various CLARIAH tools and resources.
- The workflow involved digitizing a diamond encyclopedia, extracting concepts and place names, linking the data to external sources to create linked open data, and querying newspaper archives to build a corpus of relevant articles.
- Promising results showed mining, trading, and polishing continued in Borneo for Southeast Asian customers, and described previously unknown diamond fields and polishing locations in Borneo. The project aims to apply the workflow to other commodities like sugar.
Slides of the paper Automatic Reconstruction of Emperor Itineraries from the Regesta Imperii by Juri Opitz, Leo Born, Vivi Nastase and Yannick Pultar at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification by Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner and Frank Puppe at the 3rd Edition of the DATeCH2019 International Conference
This document describes the SOS system for segmenting, stemming, and standardizing Arabic text. It presents the challenges of processing Arabic cultural heritage texts which contain orthographic variations. The system uses gradient boosting machines and achieves state-of-the-art performance on segmentation and derives stemming as a byproduct. It also standardizes orthography with high accuracy, which further improves segmentation. The system addresses issues like hamza forms and letter confusions that previous systems did not handle well.
Slides of the paper A-I-PoCoTo - Combining Automated and Interactive OCR PostCorrection by Tobias Englmeier, Florian Fink and Klaus U. Schulz at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Labelling OCR for Greek polytonic (multi accent) historical printed documents. Development, optimization and quality control by Anna-Maria Sichani, Panagotis Kaddas, Vassilis Gatos and George Mikros at the 3rd Edition of the DATeCH2019 International Conference
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Session5 02.tom derrick
1. Cross-disciplinary collaborations
to enrich access to non-Western
language material in the Cultural
Heritage sector
Tom Derrick, Nora McGregor, Dr Adi Keinan-Schoonbaert
Digital Scholarship Department, British Library
2. www.bl.uk 2
National library of the UK and the world's largest library by number of items catalogued.
c.150-200 million items stored in London and York.
20+ years creating digital assets.
Digitisation is key to opening up access.
We can now do much more than simply view these digital objects online, and must embrace the opportunities afforded by analysing digital collections at scale.
3. Our aims
• Support the British Library's mission to make our intellectual heritage accessible to everyone “for research, inspiration and enjoyment”, particularly our non-Western materials
• Raise awareness of our South Asian printed books and Arabic manuscript collections among a wide and diverse audience around the world, from the general public to computer scientists and students
• Instigate new collaborations in the computer science and text recognition domains, creating a dialogue around the challenges and opportunities for automatic transcription of historical Arabic and Bengali texts
• Create openly licensed ground truth datasets to aid digital humanists and researchers working on the state of the art in recognition software
4. Two Centuries of Indian Print
Scope of collection:
- Rare and unique South Asian printed books collection
- 1,000 Bengali books, 1713-1914
- 600 Assamese and Sylheti books being digitised in 2018/19
5. Challenges for OCR
• Bengali is not widely or well supported by leading OCR providers
• Extensive alphabet with complex character forms
• Varied historical fonts and reforms to the alphabet
• Physical defects in the material
• Quality of the digitised items
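To illustrate what "complex character forms" can mean for OCR, the sketch below uses only Python's standard `unicodedata` module to decompose the Bengali conjunct ক্ষ (kṣa). The choice of conjunct is our own illustrative assumption, not an example taken from the collection. It is usually drawn as a single glyph but is encoded as three Unicode code points, so a recognition engine has to map one visual shape onto a sequence of characters.

```python
import unicodedata

# Illustrative example (assumption, not from the BL collection):
# the conjunct "kṣa" is usually rendered as one glyph, but is encoded
# as KA + VIRAMA (hasanta) + SSA.
KSSA = "\u0995\u09cd\u09b7"


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for code_point, name in describe(KSSA):
        print(code_point, name)
```

Because OCR output is usually compared with ground truth character by character, whether such a conjunct counts as one unit or three can affect the error rates that are measured.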
6. Bangla OCR Competition
• ICDAR 2017 (Kyoto, Nov 2017)
• PRImA Research Lab, University of Salford
• 23 institutions from 7 countries (50% from India)
• Commercial tech companies and university computer science and engineering departments
7. Bangla OCR Competition Process
• Selected images from the collection representing OCR challenges
• Created a ground truth training set
• Entrants trained their systems on the ground truth
• Entrants performed OCR on the full collection of images
• Results evaluated by PRImA Research Lab
• Published a report and poster at the ICDAR conference, Kyoto, Nov 2017
[Images: neural network; competition poster]
10. Current Bangla OCR Initiatives: Transkribus
• Handwritten and printed text analysis
• Collaborative platform
• 100 pages of ground truth to train an HTR engine
• Supports non-Latin scripts
www.transkribus.eu
11. Initial Transkribus Results
• 100 pages of ground truth transcribed by Jadavpur University
• The new HTR+ engine achieved a 6% character error rate (CER) on the same set of pages!
• On par with Google for OCR performance, but requires a lot of manual work
www.transkribus.eu
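Character error rate (CER) is commonly defined as the edit (Levenshtein) distance between the recognised text and the ground truth, divided by the number of characters in the ground truth. The sketch below is a minimal pure-Python version of that common definition. It is not necessarily the exact evaluation used by Transkribus or PRImA, whose tools may normalise text or count characters differently.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the ground-truth reference."""
    if not reference:
        raise ValueError("reference (ground truth) must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, a 6% CER corresponds to roughly 6 character edits per 100 ground-truth characters. Python strings are sequences of code points, so a Bengali conjunct counts as several characters here.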
12. Future Plans
• Evaluate ICDAR2019 OCR competition methods
• Continue training Transkribus with new transcriptions in 2019
• Facilitate OCR training workshops in South Asia
13. BL Arabic scientific manuscript collections
In 2014 the British Library Qatar Foundation Partnership launched the Qatar Digital Library (QDL): a bilingual online portal providing access to digitised British Library archival materials and manuscripts relating to Gulf history and Arabic science.
• 600 manuscripts (215 digitised)
• 1,500 texts
• 184,000 pages
• Manuscripts produced in regions from Spain and North Africa to India
• Manuscripts dating from the 10th to the 20th centuries
• Authors dating from the 5th century BC to the 19th century
14. Challenges with Arabic script
Arabic script presents unique challenges for text recognition:
• Arabic writing styles are varied
• Characters are written cursively and joined from right to left; each may take 2 to 4 context-sensitive shapes
• The shape of each of the 28 Arabic letters may change drastically depending on its position in the word
• Some characters are non-joining: although the script is cursive, they do not join to the following letter, leaving a small space within a word
• Long strokes along the baseline
• A complex combination of ascenders, descenders, diacritics and special notation above or below the baseline, depending on the character, poses further challenges
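The positional shapes described above are visible in Unicode's Arabic Presentation Forms-B block, which encodes the contextual shapes of letters as compatibility characters. The sketch below uses only Python's standard `unicodedata` module to look up which positional forms exist for a given letter name. It illustrates the structure of the script and is not part of any recognition pipeline described here. In practice, text (including OCR output) is normally stored as base letters, and a rendering engine chooses the shape.

```python
import unicodedata

# Positional forms as named in Unicode's Arabic Presentation Forms-B block.
POSITIONS = ("ISOLATED", "FINAL", "INITIAL", "MEDIAL")


def contextual_forms(letter: str) -> dict[str, str]:
    """Map each positional form Unicode defines for an Arabic letter to its character."""
    forms = {}
    for position in POSITIONS:
        try:
            forms[position] = unicodedata.lookup(f"ARABIC LETTER {letter} {position} FORM")
        except KeyError:
            # No such form: non-joining letters such as ALEF and DAL
            # have no INITIAL or MEDIAL shape.
            continue
    return forms


if __name__ == "__main__":
    for name in ("BEH", "ALEF", "DAL"):
        print(name, sorted(contextual_forms(name)))
```

Letters that, like BEH, have all four forms join on both sides. Letters with only isolated and final forms are the non-joining letters that leave small gaps within words.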
16. RASM2018: ICFHR2018 Competition on Recognition of Historical Arabic Scientific Manuscripts
In collaboration with our partners at the Alan Turing Institute and PRImA Research Lab, we launched a competition as part of the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR 2018), held August 5-8, 2018 in Niagara Falls (USA).
The competition focused on finding an optimal solution for accurately and automatically transcribing historical Arabic scientific handwritten manuscripts, using ground truth that we created.
A paper describing the competition and results was published in the proceedings of ICFHR 2018:
C. Clausner, A. Antonacopoulos, N. McGregor, D. Wilson-Nunn, "ICFHR 2018 Competition on Recognition of Historical Arabic Scientific Manuscripts - RASM2018", Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR2018), Niagara Falls, USA, August 2018, pp. 471-476.
http://www.primaresearch.org/RASM2018/
18. Collaborative Transcription
We explored creating a ground truth dataset collaboratively and at scale, using the collective expertise of volunteers.
We used a free and open-source platform, FromThePage, which allowed anyone with an interest in historical Arabic manuscripts to experience them up close.
A BL team of curatorial and translation experts produced the first 10 pages as an example for volunteers.
It took only 18 days for 36 volunteers from around the world to fully transcribe 85 pages.
https://fromthepage.com/
20. Methods Evaluated
• Google Cloud Vision API
J. Walker, Y. Fujii, A.C. Popat, "A Web-Based OCR Service for Documents", in Proceedings of the 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria, Apr. 2018
• KFCN, Ben-Gurion University of the Negev
B. Kurar and J. El-Sana, "Binarization free layout analysis for Arabic historical documents using fully convolutional networks", in 2018 2nd International Workshop on Arabic Script Analysis and Recognition (ASAR), IEEE, 2018
• RDI, Cairo University
RDI Corporation's own historical Arabic handwritten/typewritten OCR system, built from a range of historical manuscripts
• Tesseract 3.04 and 4.0 (beta)
• ABBYY FineReader Engine 11
24. Future Plans for historical Arabic texts
• RASM2019 competition at ICDAR2019
• Test this material with Transkribus
• Explore external collaborations, e.g. with RDI, Transkribus and the Open Islamicate Texts Initiative (OpenITI)
25. What's next
• Integrate OCR with digital objects to make their full text searchable through the IIIF viewer
• Host all ground truth resources and make them freely available to anyone wishing to advance the state of the art in text recognition technology (BL Repository, replacing data.bl.uk)
• Host all resources on the IMPACT Centre of Competence website
• Pilot workflows to OCR our materials at scale using the most successful methods
• Promote our fully searchable digitised items to target audiences (e.g. researchers)
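One common way to attach OCR text to a digitised page for display in a IIIF viewer is a W3C Web Annotation with the `supplementing` motivation, which the IIIF Presentation API 3.0 uses for transcriptions. The sketch below builds one such annotation for a single line of recognised text. The helper name, example URIs and pixel coordinates are hypothetical, and this is not a description of the British Library's actual implementation. Making the text searchable would also require an index, for example one exposed through the IIIF Content Search API.

```python
import json


def ocr_line_annotation(annotation_id: str, canvas_id: str, text: str,
                        x: int, y: int, w: int, h: int,
                        language: str = "ar") -> dict:
    """Build a IIIF Presentation 3.0-style annotation for one OCR'd line.

    Hypothetical helper: identifiers are assumed to be URIs minted by the caller.
    """
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": annotation_id,
        "type": "Annotation",
        # "supplementing" marks content derived from the canvas, e.g. a transcription.
        "motivation": "supplementing",
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
            "language": language,
        },
        # Media fragment selecting the line's bounding box on the page image.
        "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


if __name__ == "__main__":
    # Hypothetical URIs and coordinates, for illustration only.
    annotation = ocr_line_annotation(
        "https://example.org/iiif/ms1/annotation/line-1",
        "https://example.org/iiif/ms1/canvas/p1",
        "بسم الله الرحمن الرحيم",
        100, 200, 800, 60,
    )
    print(json.dumps(annotation, ensure_ascii=False, indent=2))
```

Targeting a region (`xywh`) rather than the whole canvas lets a viewer highlight the matching line on the page image when a search hit is found.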