The document discusses semantic interoperability in the CLARIN infrastructure. CLARIN aims to provide researchers with access to digital language data and tools. It sets technical standards and recommendations to enable the discovery and sharing of resources held in different places. CLARIN achieves a degree of semantic interoperability through its Component Metadata Infrastructure (CMDI), which combines a metadata concept registry (ISOcat) with component-based metadata profiles. This infrastructure allows resources to be semantically mapped and searched despite differences in their metadata structures. While flexible, the system could be improved by simplifying the concept registry workflow and by integrating semantic annotation more tightly into researchers' tools and workflows.
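The core idea of concept-based metadata mapping can be sketched in a few lines: repositories keep their own field names but link each field to a shared concept identifier, so a harvester can align records across schemas. The schemas and concept IDs below are invented for illustration, not actual ISOcat entries.

```python
# Illustrative sketch: two repositories use different field names, but both
# link their fields to a shared concept identifier, so records can be
# aligned and searched uniformly. Concept IDs here are hypothetical.

SCHEMA_A = {"creator": "concept:contributor", "lang": "concept:language"}
SCHEMA_B = {"author": "concept:contributor", "language": "concept:language"}

def normalize(record, schema):
    """Re-key a metadata record by the shared concept IDs of its fields."""
    return {schema[k]: v for k, v in record.items() if k in schema}

rec_a = normalize({"creator": "Smith", "lang": "de"}, SCHEMA_A)
rec_b = normalize({"author": "Jones", "language": "nl"}, SCHEMA_B)

# Both records can now be queried by the same concept key, e.g.
# rec_a["concept:language"] and rec_b["concept:language"].
```

The point is that semantic agreement lives in the registry of concepts, not in a single enforced schema, which is what lets structurally different metadata remain searchable together.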
Wroclaw University Library presentation at the "Succeed in Digitisation. Spreading Excellence" conference, on the validation and take-up of text digitisation tools.
This document summarizes an experiment on automatically assigning topics to text from a historical encyclopedia using optical character recognition (OCR).
The researchers tested automated topic assignment on 14 OCR'd pages from an 18th-century German encyclopedia. They analyzed the recall and precision of topic assignment on the OCR'd text, the original text, and the original text with modernized spelling. Topic assignment proved challenging because of OCR errors and because some historical topics were not represented in their topic hierarchy.
While automated topic assignment showed some value in organizing the historical texts, errors limited its usefulness where high precision was required. The researchers identified ways to improve precision, such as updating the topic hierarchy, and proposed complementing automated assignment with social tagging.
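The recall/precision evaluation described above reduces to comparing the automatically assigned topic set against a human gold set. A minimal sketch, with invented topic sets:

```python
def precision_recall(assigned, gold):
    """Precision and recall of automatically assigned topics vs. a gold set."""
    tp = len(assigned & gold)                       # true positives
    precision = tp / len(assigned) if assigned else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {"medicine", "botany", "law"}        # topics a human indexer would assign
assigned = {"medicine", "botany", "music"}  # "music" is a spurious hit, "law" is missed
p, r = precision_recall(assigned, gold)     # both 2/3 in this toy case
```

OCR errors typically hurt recall (garbled words fail to trigger the right topics), while gaps in the topic hierarchy hurt precision (the system falls back on loosely related modern topics).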
1) The document outlines steps in a digitization workflow including scanning, image enhancement, page splitting, border and curl removal, layout analysis, segmentation, and optical character recognition.
2) It describes a fully automated tool for correcting arbitrary geometric distortions in documents, including those with multiple columns. The process is fully parameterized and reversible with no adverse effects on undistorted documents.
3) Preliminary results show the method more accurately corrects distortions compared to another method and the original images, by calculating deviations from straight lines.
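The evaluation criterion in point 3, deviation from straight lines, can be sketched as a simple baseline-straightness measure. The y-coordinates below are invented; the paper's actual metric may be computed differently.

```python
import statistics

def straightness_deviation(baseline_ys):
    """Mean absolute deviation of a text line's baseline y-coordinates from
    their mean. A perfectly straight horizontal baseline scores 0; larger
    values indicate more residual warp or curl."""
    mean_y = statistics.fmean(baseline_ys)
    return statistics.fmean(abs(y - mean_y) for y in baseline_ys)

warped = [10.0, 12.5, 14.0, 12.5, 10.0]     # curled line near the spine
flattened = [10.0, 10.1, 10.0, 9.9, 10.0]   # same line after dewarping
# straightness_deviation(warped) > straightness_deviation(flattened)
```

Comparing this score before and after correction, and against a competing method, gives the kind of quantitative evidence the preliminary results report.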
There are four main types of biomolecules that make up living things: proteins, carbohydrates, lipids, and nucleic acids. These large molecules are composed of carbon and other atoms bonded together. Energy stored in the covalent bonds of these biomolecules is released when they are broken down during chemical reactions in the body, allowing the body to use the parts to build new molecules and structures.
Governare Reti, Governare con le Reti (Governing Networks, Governing with Networks), with speaker notes, by Stefano Rossi
A critical reflection on Governance 2.0 proposals, presented at the Venezia BarCamp on 24/10/2009. I am adding the version with the notes I had prepared as an outline for the talk (which lasted 5 minutes, per the Ignite rules). I am doing so now because the M5S phenomenon seems to me to highlight all the issues I raised in that talk.
Slides of the paper Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts by Helmut Schmid at the 3rd Edition of the DATeCH2019 International Conference
This document discusses using text models to improve the accuracy of optical character recognition (OCR) on Chinese rare books. Experiments were conducted with n-gram, backward/forward n-gram, and LSTM models on OCR output from ancient medicine books. The backward/forward 4-gram model achieved the highest correction rate at 97.57%. Mixing the LSTM 6-gram model with the OCR's top 5 candidates and the probability of the top candidate further improved accuracy to 97.71%, demonstrating that combining text models with OCR probabilities corrects OCR errors better than text models alone. In conclusion, text models are effective for increasing OCR accuracy on rare books, with the backward/forward 4-gram and LSTM 6-gram models performing best.
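The mixing step, choosing among the OCR's top candidates by combining OCR confidence with a language-model score, can be sketched as a weighted rescoring. The weight, scores, and characters below are invented for illustration; the paper's exact mixing scheme may differ.

```python
def pick_character(ocr_candidates, lm_score, alpha=0.5):
    """Choose among the OCR's candidate characters by linearly mixing the
    OCR probability with a language-model score for the context.
    alpha (the OCR weight) is a hypothetical tuning parameter."""
    return max(ocr_candidates,
               key=lambda ch: alpha * ocr_candidates[ch]
                              + (1 - alpha) * lm_score(ch))

# Toy case: the OCR slightly prefers a visually similar wrong character,
# but the language model overrules it given the surrounding text.
candidates = {"人": 0.45, "入": 0.55}                  # OCR top candidates
lm = lambda ch: {"人": 0.9, "入": 0.1}.get(ch, 0.0)    # stand-in LM score
best = pick_character(candidates, lm)
```

This is why the mixed system beats the text model alone: visually ambiguous characters keep their OCR evidence instead of being decided purely from context.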
Slides of the paper Turning Digitised Material into a Diachronic Corpus: Metadata Challenges in the Nederlab Project by Katrien Depuydt and Hennie Brugman at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Standoff Annotation for the Ancient Greek and Latin Dependency Treebank by Giuseppe Celano at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Using lexicography to characterise relations between species mentions in the biodiversity literature by Sandra Young at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Implementation of a Databaseless Web REST API for the Unstructured Texts of Migne's Patrologia Graeca with Searching capabilities and additional Semantic and Syntactic expandability by Evagelos Varthis, Marios Poulos, Ilias Yarenis and Sozon Papavlasopoulos at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Cross-disciplinary collaborations to enrich access to non-Western language material in the Cultural Heritage sector by Tom Derrick and Nora McGregor at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Tribunal Archives as Digital Research Facility (TRIADO): new ways to make archives accessible and useable by Anne Gorter, Edwin Klijn, Rutger Van Koert, Marielle Scherer and Ismee Tames at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Improving OCR of historical newspapers and journals published in Finland by Senka Drobac, Pekka Kauppinen and Krister Lindén at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards a generic unsupervised method for transcription of encoded manuscripts by Arnau Baró, Jialuo Chen, Alicia Fornés and Beáta Megyesi at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study by Christian Clausner, Apostolos Antonacopoulos, Christy Henshaw and Justin Hayes at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771–1929: Early Results Using the PIVAJ Software by Kimmo Kettunen, Teemu Ruokolainen, Erno Liukkonen, Pierrick Tranouez, Daniel Antelme and Thierry Paquet at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper OCR-D: An end-to-end open-source OCR framework for historical documents by Clemens Neudecker, Konstantin Baierer, Maria Federbusch, Kay-Michael Würzner, Matthias Boenig, Elisa Hermann and Volker Hartmann at the 3rd Edition of the DATeCH2019 International Conference
- The document describes a project to fill gaps in knowledge about diamond mining, trading, and polishing in Borneo by developing a workflow using various CLARIAH tools and resources.
- The workflow involved digitizing a diamond encyclopedia, extracting concepts and place names, linking the data to external sources to create linked open data, and querying newspaper archives to build a corpus of relevant articles.
- Promising results showed mining, trading, and polishing continued in Borneo for Southeast Asian customers, and described previously unknown diamond fields and polishing locations in Borneo. The project aims to apply the workflow to other commodities like sugar.
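The corpus-building step of the workflow, querying newspaper archives for articles relevant to the commodity and the extracted place names, can be sketched as a simple filter. The Dutch search terms, place names, and sentences below are invented examples, not the project's actual query lists.

```python
COMMODITY_TERMS = {"diamant", "diamanten"}   # hypothetical Dutch search terms
PLACES = {"martapura", "landak"}             # hypothetical Borneo place names

def is_relevant(article_text):
    """Keep newspaper articles mentioning both the commodity and one of the
    place names extracted earlier in the workflow (crude token match)."""
    tokens = set(article_text.lower().split())
    return bool(tokens & COMMODITY_TERMS) and bool(tokens & PLACES)

hit = is_relevant("Handel in diamanten te Martapura neemt toe")
miss = is_relevant("Suiker uit Java naar Amsterdam verscheept")
```

In the actual project, the linked open data produced in the previous step would supply the place-name list, and the matching would run against digitized newspaper archives rather than literal strings.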
Slides of the paper Automatic Reconstruction of Emperor Itineraries from the Regesta Imperii by Juri Opitz, Leo Born, Vivi Nastase and Yannick Pultar at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification by Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner and Frank Puppe at the 3rd Edition of the DATeCH2019 International Conference
This document describes the SOS system for segmenting, stemming, and standardizing Arabic text. It presents the challenges of processing Arabic cultural heritage texts which contain orthographic variations. The system uses gradient boosting machines and achieves state-of-the-art performance on segmentation and derives stemming as a byproduct. It also standardizes orthography with high accuracy, which further improves segmentation. The system addresses issues like hamza forms and letter confusions that previous systems did not handle well.
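Framing segmentation as per-character boundary prediction means each position in a word is classified from features of its surrounding characters. A minimal sketch of such a feature extractor follows; the feature names and window size are illustrative, and the real SOS system feeds features like these to a gradient boosting classifier rather than inspecting them directly.

```python
def boundary_features(text, i, window=2):
    """Features for deciding whether a segmentation boundary follows text[i]:
    the characters in a small window around position i, padded at the edges.
    A classifier (gradient boosting in the SOS system) is trained on such
    feature dicts with boundary/no-boundary labels."""
    feats = {}
    for off in range(-window, window + 1):
        j = i + off
        feats[f"c{off:+d}"] = text[j] if 0 <= j < len(text) else "<pad>"
    return feats

feats = boundary_features("wktb", 1)   # window around the 2nd character
```

Because stems are just the spans between predicted boundaries, stemming falls out of segmentation as a byproduct, and normalizing hamza forms and commonly confused letters first makes these character windows more consistent, which is why standardization improves segmentation.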