IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.




TR5 Profiler and Post-Correction System
Ludwig-Maximilians-Universität München
Centrum für Informations- und Sprachverarbeitung




TR5 Post-Correction System

User interface for easy postcorrection of historical OCR'd documents
Stand-alone user interface
Innovative language technology enables identification and presentation of recognition errors, and efficient correction




Customizable user interface

Freely rearrangeable interface elements:
 – OCR with image snippets
 – Complete image
 – Correction candidates / special functions

[Screenshot with labelled areas: Font size; OCR and image fragments; Complete image; Correction candidates, Special functions]




View: OCR and image clippings

Word-by-word presentation of recognized text and image clippings.
Comparison of text and image follows reading order and is much easier than side-by-side presentation of image and text.




View: Original image

 – For difficult cases
 – When word segmentation by OCR fails
 – Current word is highlighted




Word-by-word correction of text

Correction by manual text entry
Choosing correction candidates
Faster correction thanks to candidates proposed by the postcorrection system




Batch correction: efficient postcorrection

Batch correction
 – Several occurrences of an identical word




Batch correction: efficient postcorrection

Batch correction
 – Classes of systematic errors
 – Errors where the correction candidate has a high degree of certainty
 – Further possibilities
      Frequent errors
      For instance, location names




Postcorrection system: Evaluation
User Experiment with 14 individual instances


Result:
Error correction thanks to text and error profiling is 2.7 times faster

Ulrich Reffle, 4,




Correction system




Why another postcorrection system?

Targets a more specialist audience

Thanks to underlying language technology:
   Historical variants are recognized and not marked as errors, even when not in the historical lexicon
   Historical variants are proposed as correction candidates
   Typical error patterns are exploited
   Ranking of correction candidates




Underlying language technology

Lexica and language models help in dealing with orthographical variants and unknown words.
Recognition of OCR errors and proposal of correction candidates depend on specially developed LMU language technology:
   Approximate search in “hypothetical lexica”
   An analysis of the whole work (“text and error profile”) produces document-specific information about the language and the type of OCR errors
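As an illustration of approximate search in a lexicon, the following minimal Python sketch looks up an OCR token among lexicon entries within a bounded Levenshtein distance. The function names, the toy lexicon and the distance bound are assumptions made for illustration; the LMU technology itself is not specified on the slide, and a real system would likely use an index structure rather than a linear scan.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def approximate_matches(token, lexicon, max_distance=1):
    """Return (entry, distance) pairs within max_distance, closest first."""
    hits = [(entry, levenshtein(token, entry)) for entry in lexicon]
    hits = [(entry, dist) for entry, dist in hits if dist <= max_distance]
    return sorted(hits, key=lambda pair: (pair[1], pair[0]))


# Toy lexicon (hypothetical): modern "teil" plus the historical variant "theil".
lexicon = ["teil", "theil", "heil", "seil"]
print(approximate_matches("iheil", lexicon))
# → [('heil', 1), ('theil', 1)]
```

Both "heil" and "theil" are one edit away from the garbled token, which is why a ranking step is needed on top of the raw lookup.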




Text and error profiles

Text profile
   Coverage of lexica
   Typical variant patterns
   → Targeted selection of lexica
   → Better language models
        → Distinguishing historical variants and OCR errors
        → Ranking of correction candidates
        → Recall and precision in IR

Error profile
   Estimate of error rate
   Typical OCR errors
   → Better modeling of error channel
        → Distinguishing historical variants and OCR errors
        → Ranking of correction candidates
        → Treatment of systematic errors




Underlying logic: Dual noisy channel model

Interpretation of OCR output tokens as the result of two “noisy channels”:

   modern word u  → (patterns) →  historical variant v  → (OCR errors) →  OCR result w

Given an OCR token w, give possible interpretations of w in terms of
   • the “underlying” modern word u (IR!)
   • the correct historical word v and its derivation from u via “patterns”
   • the OCR errors garbling v into w
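The two-channel reading can be sketched as scoring each interpretation of w by a product of three factors: a probability for the modern word u, one for the variant patterns turning u into v, and one for the OCR errors turning v into w. The Python below is a minimal sketch under that assumption; the class, the probability tables and all numbers are hypothetical and are not taken from the IMPACT profiler.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Interpretation:
    modern: str        # u
    historical: str    # v
    patterns: tuple    # variant patterns deriving v from u
    ocr_errors: tuple  # OCR error patterns garbling v into w


def score(interp, word_prob, pattern_prob, error_prob):
    """P(u) times the probabilities of all applied patterns and OCR errors."""
    return (word_prob.get(interp.modern, 0.0)
            * prod(pattern_prob.get(p, 0.0) for p in interp.patterns)
            * prod(error_prob.get(e, 0.0) for e in interp.ocr_errors))


def rank(interps, word_prob, pattern_prob, error_prob):
    """Normalise scores over one token's interpretations, best first."""
    scored = [(i, score(i, word_prob, pattern_prob, error_prob)) for i in interps]
    total = sum(s for _, s in scored)
    if total == 0:
        return []
    return sorted(((i, s / total) for i, s in scored),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical numbers for the OCR token w = "iheil".
word_prob = {"teil": 0.002, "heil": 0.0005}
pattern_prob = {"t->th": 0.03}
error_prob = {"t->i": 0.01, "ins:i": 0.0001}

candidates = [
    Interpretation("teil", "theil", ("t->th",), ("t->i",)),
    Interpretation("heil", "heil", (), ("ins:i",)),
]
for interp, p in rank(candidates, word_prob, pattern_prob, error_prob):
    print(interp.modern, interp.historical, round(p, 3))
```

The sketch ignores the probability of positions where no pattern or error applies; a fuller model would include those factors as well.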




Historical variant and OCR error patterns

Historical variants:    teil → theil
OCR error patterns:     theil → iheil




Relative frequency: 2.9% of all ‘t’ are rewritten to ‘th’

Absolute frequency: the pattern was found 120 times in the current document.
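The two figures are related by a simple ratio: the relative frequency is the number of times a pattern was applied divided by the number of places where it could have applied. A minimal Python sketch follows; the total of 4138 occurrences of ‘t’ is an assumption chosen only so that the numbers roughly match, since the slide does not state it.

```python
def relative_frequency(applications: int, opportunities: int) -> float:
    """Share of opportunities at which the pattern was actually applied."""
    if opportunities == 0:
        return 0.0
    return applications / opportunities


# 120 applications of t -> th; 4138 occurrences of 't' is a hypothetical total.
rate = relative_frequency(120, 4138)
print(f"{rate:.1%}")
# → 2.9%
```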




Local view: interpretations of tokens

 – “Meaningful interpretations” for all tokens of the OCR text are the matches in all attached lexicons, using the given settings.

[Screenshot callouts: occurrence of spelling variant “i→y”; occurrence of OCR error “i→y”]




Global view: pattern frequencies

 – Increment counters to estimate (relative) frequencies.

Occurrences of spelling variant “i→y”: +0.999771
Occurrences of OCR error “i→y”: +0.000224948
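The fractional increments suggest that each token distributes one unit of count over its competing interpretations, in proportion to their weights. A minimal sketch of that bookkeeping follows; the weights, the pattern labels and the helper name are assumptions for illustration, and the real profiler may weight interpretations differently.

```python
from collections import defaultdict


def add_token_votes(counters, interpretations):
    """Spread one unit of count over a token's interpretations.

    interpretations: list of (kind, pattern, weight) tuples with kind in
    {"variant", "ocr_error"}; the weights need not be normalised.
    """
    total = sum(weight for _, _, weight in interpretations)
    if total == 0:
        return
    for kind, pattern, weight in interpretations:
        counters[(kind, pattern)] += weight / total


counters = defaultdict(float)
# Hypothetical tokens: in the first, reading "i->y" as a spelling variant is
# far more plausible than reading it as an OCR error; the second is ambiguous.
add_token_votes(counters, [("variant", "i->y", 0.9998), ("ocr_error", "i->y", 0.0002)])
add_token_votes(counters, [("variant", "i->y", 0.5), ("ocr_error", "i->y", 0.5)])

for key, value in sorted(counters.items()):
    print(key, round(value, 4))
```

Dividing each counter by the number of opportunities for its pattern would then give the relative frequencies.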




Computation of profile: initialization

Initial global profile: non-specific model with probabilities for
 • Words
 • Variant patterns
 • Error

OCR result: w_0, w_1, w_2, w_3, …




Computation of profile: global to local

Initial global profile: non-specific model with probabilities for
 • Words
 • Variant patterns
 • Error

Local profile: for each token w_0, w_1, w_2, w_3, … a list of interpretations of the form … → … → …

OCR result: w_0, w_1, w_2, w_3, …




Computation of profile: local to global

Global profile: improved model with probabilities for
 • Words
 • Variant patterns
 • Error

Local profile: for each token w_0, w_1, w_2, w_3, … a list of interpretations of the form … → … → …

OCR result: w_0, w_1, w_2, w_3, …
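One round of the two steps (global to local, then local to global) can be sketched as an expectation-maximisation-style loop, and repeating the round refines the model. This is a simplified sketch, not the profiler's actual algorithm: the data layout, the smoothing constant and the function names are assumptions, and each token's candidate interpretations are taken as given.

```python
from collections import defaultdict


def local_profile(candidates, global_prob):
    """Global to local: weight each token's interpretations with the current model."""
    local = []
    for interps in candidates:  # one list of interpretation labels per OCR token
        weights = [global_prob[label] for label in interps]
        total = sum(weights)
        local.append({label: w / total for label, w in zip(interps, weights)})
    return local


def global_profile(local, labels, smoothing=1e-3):
    """Local to global: re-estimate probabilities from fractional counts."""
    counts = defaultdict(float)
    for posterior in local:
        for label, p in posterior.items():
            counts[label] += p
    total = sum(counts.values()) + smoothing * len(labels)
    return {label: (counts[label] + smoothing) / total for label in labels}


def compute_profile(candidates, labels, rounds=5):
    """Start from a non-specific (uniform) model and repeat both steps."""
    model = {label: 1.0 / len(labels) for label in labels}
    for _ in range(rounds):
        model = global_profile(local_profile(candidates, model), labels)
    return model


# Hypothetical tokens: two are explained only by a spelling variant, the third
# could be either a spelling variant or an OCR error.
labels = ["variant:i->y", "ocr:i->y"]
candidates = [["variant:i->y"], ["variant:i->y"], ["variant:i->y", "ocr:i->y"]]
print(compute_profile(candidates, labels))
```

In this toy input the unambiguous tokens pull the ambiguous one toward the spelling-variant reading with every round, which is the effect the document-specific profile is meant to achieve.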




Computation of profile: iteration

Global profile: improved model with probabilities for
 • Words
 • Variant patterns
 • Error

Local profile: for each token w_0, w_1, w_2, w_3, … a list of interpretations of the form … → … → …
                                                                                                                                   …→…→…
                                                                                                                                   …→…→…
       OCR result
    w0, w1 ,w2, w3, …
     0   1   2   3
                                                                                                                                            23
                                                                                                                                                   Ulrich Reffle, 4,




Profiler Evaluation

Measure the quality
1.   of global profiles
2.   of OCR error detection

   Challenges
      Measures not obvious
      Good evaluation data is difficult to gather
      Results need interpretation




Evaluation: Measures
(1) Global Profiles
    Percentage of matches for the first 10 patterns in the ranked output lists
    Two Values: Historical Patterns, OCR Patterns

(2) OCR Error Detection
    Precision and Recall for the OCR errors detected by the Profiler

(3) Indirect evaluation
    (For instance, by means of the postcorrection system)
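
The first two measures can be sketched in a few lines of Python. This is only an illustration under assumptions: the slides do not define how a "match" among the first 10 patterns is counted, so the sketch treats it as the share of the first k profiler patterns that also occur among the first k reference patterns. The function names and the sample patterns and positions are hypothetical.

```python
def top_k_match_rate(predicted, reference, k=10):
    """Share of the first k predicted patterns found among the first k reference patterns.

    Assumption: both arguments are lists ranked from most to least frequent.
    """
    top_predicted = predicted[:k]
    if not top_predicted:
        return 0.0
    top_reference = set(reference[:k])
    hits = sum(1 for pattern in top_predicted if pattern in top_reference)
    return hits / len(top_predicted)


def precision_recall(detected, actual):
    """Precision and recall of detected OCR errors against the annotated errors.

    Both arguments are collections of token positions (hypothetical representation).
    """
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Made-up ranked pattern lists, written as "modern->historical".
profiler_patterns = ["t->th", "ei->ey", "u->v", "k->ck"]
reference_patterns = ["t->th", "u->v", "i->j", "ei->ey"]
print(top_k_match_rate(profiler_patterns, reference_patterns, k=4))

# Made-up token positions flagged by the profiler vs. annotated as errors.
print(precision_recall({1, 4, 7, 9}, {1, 4, 5, 6, 7}))
```

High precision with low recall, as in such a toy case, would mean that most flagged tokens are real errors while many real errors go unflagged.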




Evaluation: Data preparation
(1) Deep Evaluation:
    For each token of the evaluation document, the historical interpretation and the
    OCR interpretation have been manually annotated.
    + fully accurate   - manual work

(2) Shallow Evaluation:
    The OCR'ed document is automatically aligned with its re-typed ground truth.
    For each token of the evaluation document, the historical and the OCR
    interpretations are automatically assigned from the ground truth.
    + no manual work   - not completely accurate
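
The automatic alignment step of the shallow evaluation could look roughly like the sketch below. The slides do not say which alignment method is used; Python's standard `difflib.SequenceMatcher` serves here only as a stand-in, and the function names and sample tokens are hypothetical.

```python
from difflib import SequenceMatcher


def align_tokens(ocr_tokens, truth_tokens):
    """Pair OCR tokens with ground-truth tokens; None marks a missing counterpart."""
    pairs = []
    matcher = SequenceMatcher(a=ocr_tokens, b=truth_tokens, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ocr_part = ocr_tokens[i1:i2]
        truth_part = truth_tokens[j1:j2]
        # Pad the shorter side so every token lands in exactly one pair.
        width = max(len(ocr_part), len(truth_part))
        ocr_part = ocr_part + [None] * (width - len(ocr_part))
        truth_part = truth_part + [None] * (width - len(truth_part))
        pairs.extend(zip(ocr_part, truth_part))
    return pairs


def ocr_error_positions(pairs):
    """Positions in the aligned sequence where OCR output and ground truth differ."""
    return [i for i, (ocr, truth) in enumerate(pairs) if ocr != truth]


# Made-up example: one misrecognised token ("Hcrr" for "Herr").
ocr = ["vnd", "der", "Hcrr", "sprach"]
truth = ["vnd", "der", "Herr", "sprach"]
aligned = align_tokens(ocr, truth)
print(aligned)
print(ocr_error_positions(aligned))
```

A token-level alignment like this can go wrong where the OCR merges or splits words, which is one reason the shallow evaluation is not completely accurate.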




Evaluation: Data


Deep:     Eckartshausen, 100 pages
          Briefkunst, 40 pages
Shallow:  5 books each for the
          16th, 17th and 18th century




Evaluation: Eckartshausen


     (1)  historical patterns
            matches first 10      70%
            precision all         68%
            recall all            73%
     (2)  OCR patterns
            matches first 6       67%
            precision all         59%
            recall all            19%
     (3)  OCR error detection
            precision             86%
            recall                46%




Graphical Evaluation: Eckartshausen




Graphical Evaluation: diacritics
[Figure panels: Hist. Var., OCR]




 Shallow Evaluation Results


                                              16th century   17th century   18th century
HIST Patterns first 10                        60%            74%            78%
OCR Patterns first 10                         48%            70%            50%
Error Detection Precision                     95%            92%            81%
Error Detection Recall                        49%            43%            45%
Content Words Errors                          64%            44%            16%
Easy Interactive Correction per 10,000 words  ≈3000 words    ≈1892 words    ≈720 words




Global Profile: Spelling variation patterns




Spelling variation profile




OCR Error Profile
Β 
VIAF GDPR
VIAF GDPRVIAF GDPR
VIAF GDPR
Β 
Renacer prensa historica
Renacer prensa historicaRenacer prensa historica
Renacer prensa historica
Β 
RDA y Linked data (Ricardo Santos MuΓ±oz)
RDA y Linked data (Ricardo Santos MuΓ±oz)RDA y Linked data (Ricardo Santos MuΓ±oz)
RDA y Linked data (Ricardo Santos MuΓ±oz)
Β 
Desarrollo actual de RDA (Pilar Tejero LΓ³pez)
Desarrollo actual de RDA (Pilar Tejero LΓ³pez)Desarrollo actual de RDA (Pilar Tejero LΓ³pez)
Desarrollo actual de RDA (Pilar Tejero LΓ³pez)
Β 

Recently uploaded

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
Β 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
Β 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
Β 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
Β 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
Β 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
Β 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
Β 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
Β 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
Β 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
Β 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
Β 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
Β 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
Β 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
Β 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
Β 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
Β 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
Β 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
Β 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
Β 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
Β 

Recently uploaded (20)

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Β 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Β 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Β 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
Β 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Β 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Β 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Β 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Β 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Β 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
Β 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Β 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Β 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Β 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
Β 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Β 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Β 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Β 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
Β 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Β 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Β 

TR5 Profiler and Post-Correction System. Ludwig Maximilians

  • 1. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. TR5 Profiler and Post-Correction System. Ludwig-Maximilians-Universität München, Centrum für Informations- und Sprachverarbeitung.
  • 2. TR5 Post-Correction System. User interface for easy postcorrection of historical OCR'd documents. Stand-alone user interface. Innovative language technology enables identification and presentation of recognition errors, and efficient correction.
  • 3. Customizable user interface. Freely rearrangeable interface elements: OCR with image snippets; complete image; correction candidates and special functions. Figure labels: font size; OCR and image fragments; complete image; correction candidates, special functions.
  • 4. View: OCR and image clippings. Word-by-word presentation of recognized text and image clippings. Comparison of text and image follows reading order and is much easier than side-by-side presentation of image and text.
  • 5. View: Original image. For difficult cases; when word segmentation by OCR fails. The current word is highlighted.
  • 6. Word-by-word correction of text. Correction by manual text entry or by choosing correction candidates. Faster correction thanks to candidates proposed by the postcorrection system.
  • 7. Batch correction: efficient postcorrection. Batch correction of several occurrences of an identical word.
  • 8. Batch correction: efficient postcorrection. Batch correction of: classes of systematic errors; errors where the correction candidate has a high degree of certainty; further possibilities. Frequent errors, for instance location names.
  • 9. Postcorrection system: Evaluation. User experiment with 14 individual instances. Result: error correction thanks to text and error profiling is 2.7 times faster. (Ulrich Reffle)
  • 10. Correction system
  • 11. Correction system
  • 12. Why another postcorrection system? Targets a more specialist audience. Thanks to the underlying language technology: historical variants are recognized and not marked as errors, even when not in the historical lexicon; historical variants are proposed as correction candidates; typical error patterns are exploited; correction candidates are ranked.
  • 13. Underlying language technology. Lexica and language models help deal with orthographical variants and unknown words. Recognition of OCR errors and proposal of correction candidates depend on specially developed LMU language technology. Approximate search in “hypothetical lexica”. An analysis of the whole work (“text and error profile”) produces document-specific information about the language and the type of OCR errors.
  • 14. Text and error profiles.
      Text profile: coverage of lexica; typical variant patterns. → Targeted selection of lexica → Better language models → Distinguishing historical variants and OCR errors → Ranking of correction candidates → Recall and precision in IR
      Error profile: estimate of error rate; typical OCR errors. → Better modeling of error channel → Distinguishing historical variants and OCR errors → Ranking of correction candidates → Treatment of systematic errors
  • 15. Underlying logic: Dual noisy channel model. Interpretation of OCR output tokens as the result of two “noisy channels”: modern word u → (patterns) → historical variant v → (OCR errors) → OCR result w. Given an OCR token w, give possible interpretations of w in terms of: the “underlying” modern word u (IR!); the correct historical word v and its derivation from u via “patterns”; the OCR errors garbling v into w.
  • 16. Historical variant and OCR error patterns. Historical variants: teil → theil. OCR error patterns: theil → iheil.
  • 17. Relative frequency: 2.9% of all ‘t’ are rewritten to ‘th’. Absolute frequency: the pattern was found 120 times in the current document.
  • 18. Local view: interpretations of tokens. “Meaningful interpretations” for all tokens of the OCR text are the matches in all attached lexicons, using the given settings. Occurrence of spelling variant “i→y”: Occurrence of OCR error “i→y”:
  • 19. Global view: pattern frequencies. Increment counters to estimate (relative) frequencies. Occurrences of spelling variant “i→y”: +0.999771. Occurrences of OCR error “i→y”: +0.000224948.
  • 20. Computation of profile: initialization. Initial global profile: non-specific model with probabilities for words, variant patterns, and errors. OCR result: w0, w1, w2, w3, …
  • 21. Computation of profile: global to local. [Diagram: the initial global profile (non-specific model with probabilities for words, variant patterns, and errors) yields a local profile, a list of interpretations “… → … → …” for each token w0, w1, w2, w3, … of the OCR result.]
  • 22. Computation of profile: local to global. [Diagram: the local profile, a list of interpretations “… → … → …” for each token w0, w1, w2, w3, … of the OCR result, yields a global profile (improved model with probabilities for words, variant patterns, and errors).]
  • 23. Computation of profile: iteration. [Diagram: the global profile (improved model with probabilities for words, variant patterns, and errors) and the local profile (interpretations “… → … → …” for each token w0, w1, w2, w3, … of the OCR result) are recomputed from each other in turn.]
  • 24. Profiler Evaluation. Measure the quality (1) of global profiles and (2) of OCR error detection. Challenges: measures are not obvious; good evaluation data is difficult to gather; results need interpretation.
  • 25. Evaluation: Measures. (1) Global profiles: percentage of matches for the first 10 patterns in the ranked output lists; two values: historical patterns, OCR patterns. (2) OCR error detection: precision and recall for the OCR errors detected by the Profiler. (3) Indirect evaluation (for instance, by means of the postcorrection system).
  • 26. Evaluation: Data preparation. (1) Deep evaluation: for each token of the evaluation document, the historical interpretation and the OCR interpretation have been manually annotated. Pro: fully accurate. Con: manual work. (2) Shallow evaluation: the OCR'ed document is automatically aligned with its re-typed ground truth; for each token of the evaluation document, the historical and the OCR interpretation are automatically assigned from the ground truth. Pro: no manual work. Con: not completely accurate.
  • 27. Evaluation: Data. Deep: Eckartshausen, 100 pages; Briefkunst, 40 pages. Shallow: 5 books each from the 16th, 17th and 18th century.
  • 28. Evaluation: Eckartshausen.
      (1) Historical patterns: matches first 10: 70%; precision all: 68%; recall all: 73%
      (2) OCR patterns: matches first 6: 67%; precision all: 59%; recall all: 19%
      (3) OCR error detection: precision: 86%; recall: 46%
  • 29. Graphical Evaluation: Eckartshausen
  • 30. Graphical Evaluation: diacritics. Hist. Var.; OCR
  • 31. Shallow Evaluation Results
                                                        16th            17th            18th
      HIST patterns, first 10                           60%             74%             78%
      OCR patterns, first 10                            48%             70%             50%
      Error detection, precision                        95%             92%             81%
      Error detection, recall                           49%             43%             45%
      Content words errors                              64%             44%             16%
      Easy interactive correction per 10,000 words      ≈ 3000 words    ≈ 1892 words    ≈ 720 words
  • 32. Global Profile: Spelling variation patterns
  • 33. Spelling variation profile
  • 34. OCR Error Profile
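
The profile computation outlined in slides 15 to 23 can be illustrated with a small sketch. This is not the LMU Profiler's implementation: the toy lexicon, the two pattern lists, the initial scores, the restriction to at most one pattern per channel, the normalization of counts by the number of tokens, and all function names are assumptions made for illustration only. The sketch enumerates interpretations u → v → w for each OCR token (local view), distributes one unit of count across them in proportion to the current pattern scores (fractional increments, as on slide 19), and re-estimates the pattern scores from those counts (global view), repeating for a few rounds.

```python
from collections import defaultdict

# Hypothetical toy data, not taken from the slides' lexica.
LEXICON = ["teil", "sei", "bei"]            # modern words u
HIST_PATTERNS = [("t", "th"), ("i", "y")]   # spelling variation: u -> v
OCR_PATTERNS = [("t", "i"), ("i", "y")]     # OCR errors: v -> w


def rewrites(word, patterns):
    """Yield (result, pattern) for zero or one pattern application."""
    yield word, None
    for left, right in patterns:
        start = word.find(left)
        while start != -1:
            yield word[:start] + right + word[start + len(left):], (left, right)
            start = word.find(left, start + 1)


def interpretations(token):
    """Local view: all (u, v, hist_pattern, ocr_pattern) that explain token."""
    for u in LEXICON:
        for v, hist_pattern in rewrites(u, HIST_PATTERNS):
            for w, ocr_pattern in rewrites(v, OCR_PATTERNS):
                if w == token:
                    yield u, v, hist_pattern, ocr_pattern


def profile(tokens, iterations=5, hist_init=0.1, ocr_init=0.01, floor=1e-6):
    """Alternate between local interpretations and global pattern scores."""
    # Assumed non-specific starting model: variants likelier than OCR errors.
    hist = defaultdict(lambda: hist_init)
    ocr = defaultdict(lambda: ocr_init)
    for _ in range(iterations):
        hist_counts = defaultdict(float)
        ocr_counts = defaultdict(float)
        for token in tokens:
            candidates = list(interpretations(token))
            # Steps that use no pattern get a neutral score of 1.0 (assumption).
            scores = [
                (hist[hp] if hp else 1.0) * (ocr[op] if op else 1.0)
                for _, _, hp, op in candidates
            ]
            total = sum(scores)
            if total == 0:
                continue  # token has no interpretation in the lexicon
            for (_, _, hp, op), score in zip(candidates, scores):
                share = score / total  # fractional counter increment
                if hp:
                    hist_counts[hp] += share
                if op:
                    ocr_counts[op] += share
        # Global view: re-estimate scores as counts per token (simplification).
        n = len(tokens)
        hist = defaultdict(lambda: floor,
                           {p: max(c / n, floor) for p, c in hist_counts.items()})
        ocr = defaultdict(lambda: floor,
                          {p: max(c / n, floor) for p, c in ocr_counts.items()})
    return hist, ocr
```

For a token such as "sey", both the spelling variant “i→y” and the OCR error “i→y” explain the form, so the count is split between the two readings according to the current scores. Unambiguous tokens such as "theil" contribute a full count to a single pattern.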