Presentation of the paper PoCoTo - An Open Source System For Efficient Interactive Postcorrection of OCRed Historical Text by Thorsten Vobl, Annette Gotscharek, Ulrich Reffle, Christoph Ringlstetter and Klaus Schulz in DATeCH 2014. #digidays
Presentation of the paper User-driven correction of OCR errors. Combining crowdsourcing and information retrieval technology by Günter Mühlberger, Johannes Zelger, David Sagmeister and Albert Greinöcker in DATeCH 2014. #digidays
Presentation of a paper from the Infobazy 2014 conference describing work carried out in the MARKOS project. The goal of the MARKOS project is to design and develop a web service that makes it possible to search the global space of Open Source projects for components that optimally meet the criteria specified by the system's user. With the developed system, authors and users of Open Source Software (OSS) will be able to easily and automatically analyse the dependencies between the OSS components they use, taking the functional, structural and licensing aspects of the source code into account.
The result of the project will be a prototype service run on the Internet by the project partners and made available through a set of interactive applications, both via a graphical user interface and via a semantic data access point in the linked data model. The service will be implemented by a set of internal components of the MARKOS system, whose task will be the multi-context analysis of information available on the web and its processing and storage in the system's internal semantic repository.
The MARKOS system will offer users semantic search and browsing of components and libraries, as well as navigation of the code structure at a high level of abstraction. This will make it easier, in particular for architects and analysts, to find a component that meets a system's functional, technical and legal requirements, and it will let programmers better understand the available interfaces and internal dependencies of the software. In addition, the MARKOS system will also take code-integration aspects into account, showing and exploiting the dependencies and relationships between software components from different projects. As a result, MARKOS will provide an integrated global view of existing Open Source software. MARKOS will also use the dependencies between components for a more effective and accurate analysis of licence compatibility, providing a basis for legal argumentation and conflict resolution. To facilitate cooperation between different projects, MARKOS will also provide tools for notifying dependent projects of significant changes in components. The MARKOS system, with its functionality in a global context, is therefore expected to facilitate software development based on the Open Source paradigm, contributing to the global community.
Purposeful Gaming, OCR Correction and Seed & Nursery Catalog Digitization (Marty Schlabach)
An online game will be developed to crowd-source the correction of OCRed content in the Biodiversity Heritage Library (BHL). Several additional content types will be digitized and added to BHL, namely seed lists, seed & nursery catalogs, and hand-written field notebooks.
Niall Anderson outlines the IMPACT approach to adaptive OCR and post-production, including tools prepared by IBM CONCERT and experimental tools from USAL, NCSR and UIBK.
Delivered at BL Demo Day - 12th July 2011
More Related Content
Similar to Datech2014 - Session 3 - PoCoTo - An Open Source System For Efficient Interactive Postcorrection of OCRed Historical Text
Digital Classicist London Seminars 2013 - Seminar 7 - Federico Boschetti & Bruce Robertson (DigitalClassicistLondon)
An Integrated System For Generating And Correcting Polytonic Greek OCR
Federico Boschetti (CNR, Pisa) and Bruce Robertson (Mount Allison University, Canada)
Digital Classicist London & Institute of Classical Studies seminar 2013
Friday July 19th at 16:30, in Room S264, Senate House, Malet Street, London WC1E 7HU
In many fields, the digital books revolution provides wide and highly detailed access to pertinent texts; but this revolution has left behind scholars working with ancient Greek. While it is true that Hellenists have had digitized canonical texts for many years, these collections' relatively limited scope and restrictive licenses are increasingly at odds with recent currents in computer-based humanities research: linked data, large-scale text mining, and syntactic treebanking, to name a few. Perhaps the most important impediments to digitizing polytonic Greek have been the lack of high-quality optical character recognition for this script, especially under open-source licenses, and of an assisted editor for polytonic Greek OCR output. In this seminar, we present an integrated system that fills these critical gaps, making it possible for polytonic Greek texts to be digitized en masse.
Rigaudon OCR is a complete suite of scripts, Python code and data required for producing polytonic Greek OCR. It comprises an OCR engine based on Gamera, with many features specific to the recognition of polytonic Greek and specific classifiers to identify the characters in Teubner, Teubner-sans-serif, OCT/Loeb, and Didot editions. It includes an automatic spellchecker designed to correct Greek OCR errors, and it has a process for combining existing, high-quality Latin-script OCR output with parallel Greek output, as illustrated by papyrological texts. Finally, it coordinates these steps through the Sun Grid Engine scripts required to queue and parallelize them.
Slides of the paper Labelling OCR for Greek polytonic (multi accent) historical printed documents. Development, optimization and quality control by Anna-Maria Sichani, Panagotis Kaddas, Vassilis Gatos and George Mikros at the 3rd Edition of the DATeCH2019 International Conference
Wroclaw University Library presentation at "Succeed in Digitisation. Spreading Excellence" Conference. Validation and take-up of text digitisation tools.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing (Lifeng (Aaron) Han)
Invited presentation at the NLP lab of Soochow University about my NLP journey and the ADAPT Centre. The NLP part covers machine translation evaluation, quality estimation, multiword expression identification, named entity recognition, word segmentation, treebanks, and parsing.
Optical Character Recognition (OCR) technology has revolutionized the way we process and digitize printed or handwritten text. It plays a crucial role in document management systems, data extraction, and many other applications where converting images of text into editable and searchable formats is essential. However, the accuracy and reliability of OCR heavily rely on the quality of the training dataset used during its development. In this blog post, we will explore the significance of an OCR training dataset and its impact on the performance of OCR systems.
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
Web Annotations – A Game Changer for Language Technology? (Georg Rehm)
Georg Rehm, Felix Sasaki, and Aljoscha Burchardt. Web Annotations - A Game Changer for Language Technologies? I Annotate 2016, Berlin, Germany, May 19/20, 2016.
Slides of the paper Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts by Helmut Schmid at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards a Higher Accuracy of Optical Character Recognition of Chinese Rare Books in Making Use of Text Model by Hsiang-An Wang and Pin-Ting Liu at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Turning Digitised Material into a Diachronic Corpus: Metadata Challenges in the Nederlab Project by Katrien Depuydt and Hennie Brugman at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Standoff Annotation for the Ancient Greek and Latin Dependency Treebank by Giuseppe Celano at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Using lexicography to characterise relations between species mentions in the biodiversity literature by Sandra Young at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Implementation of a Databaseless Web REST API for the Unstructured Texts of Migne's Patrologia Graeca with Searching capabilities and additional Semantic and Syntactic expandability by Evagelos Varthis, Marios Poulos, Ilias Yarenis and Sozon Papavlasopoulos at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Cross-disciplinary collaborations to enrich access to non-Western language material in the Cultural Heritage sector by Tom Derrick and Nora McGregor at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Tribunal Archives as Digital Research Facility (TRIADO): new ways to make archives accessible and useable by Anne Gorter, Edwin Klijn, Rutger Van Koert, Marielle Scherer and Ismee Tames at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Improving OCR of historical newspapers and journals published in Finland by Senka Drobac, Pekka Kauppinen and Krister Lindén at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards a generic unsupervised method for transcription of encoded manuscripts by Arnau Baró, Jialuo Chen, Alicia Fornés and Beáta Megyesi at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study by Christian Clausner, Apostolos Antonacopoulos, Christy Henshaw and Justin Hayes at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771–1929: Early Results Using the PIVAJ Software by Kimmo Kettunen, Teemu Ruokolainen, Erno Liukkonen, Pierrick Tranouez, Daniel Antelme and Thierry Paquet at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper OCR-D: An end-to-end open-source OCR framework for historical documents by Clemens Neudecker, Konstantin Baierer, Maria Federbusch, Kay-Michael Würzner, Matthias Boenig, Elisa Hermann and Volker Hartmann at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Diamonds in Borneo: Commodities as Concepts in Context by Karin Hofmeester, Ashkan Ashkpour, Katrien Depuydt and Jesse de Does at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Automatic Reconstruction of Emperor Itineraries from the Regesta Imperii by Juri Opitz, Leo Born, Vivi Nastase and Yannick Pultar at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification by Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner and Frank Puppe at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Arabic-SOS Segmenter, Stemmer and Orthography Standardizer for the Arabic Cultural Heritage by Emad Mohamed & Zeeshan Sayyed at the 3rd Edition of the DATeCH2019 International Conference
Datech2014 - Session 3 - PoCoTo - An Open Source System For Efficient Interactive Postcorrection of OCRed Historical Text
1. PoCoTo
An Open Source System for Efficient Interactive Postcorrection of OCRed Historical Texts
Thorsten Vobl, Annette Gotscharek, Ulrich Reffle, Christoph Ringlstetter, Klaus U. Schulz
CIS - Center for Information and Language Processing, University of Munich
Gini GmbH Munich
2. Motivation
- OCR of historical texts still produces many errors
- Downstream applications are harmed
Interactive postcorrection is an option to improve quality.
Why: selected and important texts/corpora, or parts of them, can/must be lifted to a much higher level of accuracy, up to perfection. Somehow "business driven".
How: the user experience of the software has a major influence on the time and effort needed for improving accuracy.
3. Approach
Features to raise productivity, within our competence and explorative:
• Plug in language technology that unmasks orthographic variation in historical language and returns document-specific distributions of OCR errors
• The tool visualizes series of similar OCR errors
• Error series can be corrected in one shot
• Implement a productive UX through interface and functionality
4. Evaluation
Tool developed in a university environment during the EU project IMPACT and maintained since, despite serious staff fluctuation.
Practical user tests in three major European libraries have shown gains in time/correction rates; user ratings from practitioners are high.
Maintained interest: open for new languages and new functionalities.
Separation of language resources and tool through a server-client model.
Published as an open source tool on GitHub.
5. Starting Point: Postcorrection Tool as a Carrier of Technology
§ Language technology used for improvement of interactive postcorrection
§ Lexica, matching tool, profiler integrated as background technology
§ Document-centric knowledge from unsupervised analysis of the OCRed document used for detection of error classes and suggested corrections
§ Batch mode for corrections of many errors in "one shot"
§ Rich graphical user interface to let users fully benefit from "knowledge" of document-derived error classes
6. Flexible GUI
[Screenshot: configurable views showing the OCR text, correction candidates / special workflows, and the page image]
§ Unlimited configuration of the views:
– OCR with image snippets
– Complete image page
– Correction candidates, special workflows
§ Font/window size configuration
7. View: OCR + Image Snippets
§ OCRed text is presented to the user with word-image alignment.
§ The natural flow of the text is maintained; comparison with the original text images is a lot easier than with focus hopping.
8. View: Original Image
§ Alternative view with the complete page image.
– Useful for difficult-to-read words
– Useful if word segmentation of the OCR is too poor
– Useful if long-distance text understanding is needed
10. Drop-Down Selection of Correction Candidates
§ Speed-up through selection of proposed correction candidates
§ In line with what is usually offered: "Base Mode"
11. Two-Channel Model for OCRed Historical Text
[Diagram: the modern word form Wmod, the word form in the ground truth Wgt, and the word form in the OCRed text Wocr. Historical patterns applied to Wmod yield Wgt (the "pattern trace"); OCR errors applied to Wgt yield Wocr (the "OCR trace"). Starting from the OCR token Wocr, an "interpretation" of the token recovers both traces and allows estimation of the channel model.]
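To make the two-channel idea concrete, here is a minimal illustrative sketch, not PoCoTo's actual profiler code: the lexicon, pattern table and error table are invented toy data, and only single substitutions are considered. It enumerates interpretations of an OCR token Wocr as a modern word plus a pattern trace and an OCR trace, scored by the product of their probabilities.

```python
# Minimal sketch of the two-channel interpretation of an OCR token.
# All tables are invented toy data, not PoCoTo's real language resources.

MODERN_LEXICON = {"und": 0.01, "uns": 0.005}          # modern words with priors
HIST_PATTERNS  = {("u", "v"): 0.3}                    # modern -> historical spelling
OCR_ERRORS     = {("v", "o"): 0.1, ("n", "u"): 0.2}   # ground truth -> OCR output

def single_subst_variants(word, table):
    """Apply each substitution from `table` once, at every position."""
    for (src, dst), prob in table.items():
        start = 0
        while (i := word.find(src, start)) != -1:
            yield word[:i] + dst + word[i + len(src):], (src, dst, i), prob
            start = i + 1

def interpretations(w_ocr):
    """Enumerate (w_mod, pattern_trace, ocr_trace, score) tuples explaining w_ocr."""
    results = []
    for w_mod, prior in MODERN_LEXICON.items():
        # historical channel: keep the modern form, or apply one pattern
        gt_forms = [(w_mod, None, 1.0)] + list(single_subst_variants(w_mod, HIST_PATTERNS))
        for w_gt, pattern_trace, p_pat in gt_forms:
            # OCR channel: perfect recognition, or one OCR error
            ocr_forms = [(w_gt, None, 1.0)] + list(single_subst_variants(w_gt, OCR_ERRORS))
            for w_out, ocr_trace, p_err in ocr_forms:
                if w_out == w_ocr:
                    results.append((w_mod, pattern_trace, ocr_trace,
                                    prior * p_pat * p_err))
    return sorted(results, key=lambda r: -r[3])

print(interpretations("vnd"))   # "und" via pattern trace u -> v, empty OCR trace
print(interpretations("uud"))   # "und" via OCR trace n -> u
```

For the token "vnd" this yields the modern word "und" explained by the historical pattern u -> v with an empty OCR trace; for "uud" it yields "und" explained by the OCR error n -> u.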
12. Profiling of Historical OCRed Corpora with EM
[Diagram: an iterative loop. For each OCR token Wocr, an improved list of interpretations with probabilities (modern word, ground truth, OCR trace, historical trace; local guess vs. global guess) yields an improved model for words, patterns and OCR errors and their probabilities, which in turn improves the interpretations, until a final result is reached.]
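A rough sketch of what such an EM-style profiling loop could look like, under the strong simplification that interpretations are the toy triples from the previous sketch; the actual IMPACT/PoCoTo profiler is considerably more elaborate.

```python
# Toy EM-style profiling loop in the spirit of this slide. The E-step spreads
# each OCR token's probability mass over its candidate interpretations; the
# M-step turns the expected pattern/error counts into new probability tables.

from collections import defaultdict

def profile(tokens, interpret, rounds=3):
    """tokens: the OCR tokens of one document.
    interpret(token, patterns, errors) -> [(pattern, ocr_error, score), ...],
    e.g. an adaptation of interpretations() from the previous sketch.
    Returns document-specific (patterns, errors) probability tables."""
    patterns = {("u", "v"): 0.1}                       # made-up initial guesses
    errors = {("n", "u"): 0.1, ("u", "n"): 0.1}
    for _ in range(rounds):
        pat_c, err_c = defaultdict(float), defaultdict(float)
        for tok in tokens:
            cands = interpret(tok, patterns, errors)
            z = sum(s for _, _, s in cands)
            if z == 0:
                continue
            for pat, err, s in cands:                  # E-step: posterior weights
                if pat is not None:
                    pat_c[pat] += s / z
                if err is not None:
                    err_c[err] += s / z
        n = len(tokens) or 1                           # M-step: relative frequencies
        patterns = {k: c / n for k, c in pat_c.items()}
        errors = {k: c / n for k, c in err_c.items()}
    return patterns, errors
```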
15. Lexicons Triggered by Profiles
§ Valid historical words not marked as errors even if not in the lexicon ("hypothetical lexicon")
§ Historical variants proposed as correction candidates
16. Selection of Correction Candidates
§ Improved ranking of candidates through the document-specific language and error profile
§ Concordance error view with high-confidence corrections
17. Rapid Workflow - Batch Processing of Identical Strings
§ High-probability identical strings corrected as a batch
§ Concordance views optional
18. Rapid Workflow - Batch Processing of Identical Error Patterns
§ Strings with identical error patterns corrected as a batch
§ In the example: n -> u
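The two batch workflows of slides 17 and 18 can be pictured with a short sketch: group the profiler's high-confidence suggestions by the single character substitution that explains them (the slide's n -> u example) and correct the whole series in one shot after a single user confirmation. The data structures here are hypothetical, not PoCoTo's internal API.

```python
# Sketch of the "one shot" batch workflow: group high-confidence suggestions
# by the single character confusion that explains them, then correct every
# occurrence of a series at once. All data structures are hypothetical.

from collections import defaultdict

def group_by_error_pattern(suggestions):
    """suggestions: [(position, ocr_word, suggested_word), ...]."""
    series = defaultdict(list)
    for pos, ocr, sugg in suggestions:
        if len(ocr) == len(sugg):
            diffs = [(a, b) for a, b in zip(sugg, ocr) if a != b]
            if len(diffs) == 1:                # exactly one character confusion
                series[diffs[0]].append((pos, ocr, sugg))
    return series

suggestions = [(3, "uud", "und"), (17, "vou", "von"), (42, "uuter", "unter")]
for (truth, err), occurrences in group_by_error_pattern(suggestions).items():
    print(f"error series {truth} -> {err}: {len(occurrences)} occurrences")
    # after one user confirmation, apply all corrections of the series as a batch
    batch = [(pos, sugg) for pos, _, sugg in occurrences]
```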
19. Controlled “Hard” Evaluations
[Chart: corrections made over time for BSB Dokument1; curves for User1 F, User2 F, User3 B, User4 B, User5 F, User6 B; x-axis: time in minutes (0-90), y-axis: corrections made (0-800).]
§ Measure points every 10 minutes, for 90 minutes
§ Each user with a base/full session (inter-/intra-user comparison)
§ More corrections, avg. 1.5x-3x, in full mode
§ Early gains: first 10 minutes
21. Soft Evaluations
Questionnaires with all three institutions.
Favorite aspect:
Batch Corrections
Main problems:
Stability
Correction of Segmentation Errors
22. Future work
• Extend to new languages, e.g. Latin
• New correction scenarios, e.g. specific named entity correction
• Turn interest into a community and implement industrial tool partnerships for isolated parts of the software
23. Thanks for your attention!
… and special thanks to the University of Alicante, the Bavarian State Library, and the Royal Library of the Netherlands for their time and efforts during the experiments