Text extraction from scientific figures has in the past been addressed by different unsupervised approaches due to the limited amount of training data. Motivated by recent advances in deep learning, we propose a two-step neural-network-based pipeline that localizes and extracts text using Fully Convolutional Networks. We improve the localization of text bounding boxes by applying a novel combination of a Residual Network with the Region Proposal Network of Faster R-CNN. The predicted bounding boxes are further pre-processed and used as input to the off-the-shelf optical character recognition engine Tesseract 4.0. We evaluate our improved text localization method on five different datasets of scientific figures and compare it with the best unsupervised pipeline. Since only limited training data is available, we further experiment with different data augmentation techniques to increase the size of the training datasets and demonstrate their positive impact. We use Average Precision and F1 measure to assess the text localization results. In addition, we apply Gestalt Pattern Matching and Levenshtein Distance to evaluate the quality of the recognized text. Our extensive experiments show that our new pipeline based on neural networks outperforms the best unsupervised approach by a large margin of 19-20%.
Text Localization in Scientific Figures using Fully Convolutional Neural Networks on Limited Training Data
Outline
Introduction
Datasets
Overview: Supervised Approach
Results
Outlook
Morten Jessen, Falk Böschen, Ansgar Scherp
DocEng, September 2019
Morten Jessen, Falk Böschen, Ansgar Scherp 1 / 24
Motivation
Figures are widely used in scientific papers, media, and other documents
Figures often contain information that is not present in the
surrounding text and convey core message(s) of a document
Extracted text can be used for
improving existing retrieval systems
building (better) figure retrieval systems
making figures available to visually impaired people
. . .
However, common Optical Character Recognition (OCR)
engines have problems with processing figures
So far, the focus has been on unsupervised approaches due to
the lack of training data
Our previous unsupervised approach [MTAP'18]
Observation: text localization (steps (1)-(4) of the pipeline) is the most challenging part
We propose a supervised approach for text localization that can
work with limited training data ⇒ DocEng'19
Datasets of Scholarly Figures
                   CHIME-R   CHIME-S   DeGruyter   EconBiz   DeTEXT
Number of Images     115        85        120        121       192
Text elements         14        12         24         25        14
Words                 18        18         34         35        20
Characters            76        69        149        151       120
Available datasets are quite small, which makes training
supervised methods difficult
Localization
Pre-Training
Artificial Dataset Extension
Recognition
Overview
Focus on a neural-network-based approach for text
localization in scientific figures
Evaluate different approaches to address the challenge of
limited training data
Pre-Training on large datasets
Artificial dataset extension
We use a common Optical Character Recognition engine for
text recognition (Tesseract)
Text Extraction with Tesseract 4.0
OCR engine using an LSTM neural network
Text extraction process
Generate multiple input images from one bounding box
(provided by Faster R-CNN)
Stop when Tesseract's confidence score is ≥ 96%, OR
take the best result otherwise
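The selection loop above can be sketched as follows. `recognize` is a stand-in for a Tesseract call (e.g. via pytesseract) returning text and a confidence in [0, 1]; how the input-image variants are generated is not specified on this slide, so they are passed in opaquely.

```python
def extract_text(variants, recognize, threshold=0.96):
    """Run OCR on each preprocessed variant of a bounding-box crop.

    Stops early once the confidence reaches the threshold; otherwise
    returns the highest-confidence result seen across all variants.
    """
    best_text, best_conf = "", -1.0
    for img in variants:
        text, conf = recognize(img)
        if conf > best_conf:
            best_text, best_conf = text, conf
        if conf >= threshold:
            break  # confident enough, skip remaining variants
    return best_text, best_conf
```

With pytesseract, `recognize` could wrap `pytesseract.image_to_data` and aggregate the per-word confidences; that wiring is an assumption, not shown on the slide.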
Evaluation Measures
Text localization: detection of bounding boxes
Average Precision (AP), AP50, AP75 over
“Intersection over Union” (IoU)
Precision, Recall
Text recognition: extraction of text from bounding boxes
Levenshtein Distance: number of character edits needed to correct a word
Gestalt Pattern Matching: correctness of the extraction in
relation to word length
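Both recognition measures can be stated concretely: Levenshtein distance counts character edits, and Gestalt Pattern Matching (Ratcliff/Obershelp) is the similarity ratio implemented by Python's `difflib.SequenceMatcher`. A minimal sketch:

```python
from difflib import SequenceMatcher

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    turning string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def gestalt_ratio(a: str, b: str) -> float:
    """Gestalt Pattern Matching: 2 * matches / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b).ratio()
```

For example, `levenshtein("kitten", "sitting")` is 3, and `gestalt_ratio` normalizes by the combined string length, which is what relates extraction correctness to word length on this slide.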
Average Precision (AP) over IoU
Figure: Visualization of different IoU Values.
AP50: Percentage of predictions with IoU > 0.5
AP75: Percentage of predictions with IoU > 0.75
AP: Summary metric, combines ten equally spaced IoU
thresholds (0.50, 0.55, 0.60, ..., 0.90, 0.95)
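These metrics can be made concrete in a few lines. Boxes are taken as (x1, y1, x2, y2) tuples (an assumed convention), and AP is computed here as the simple per-prediction fraction the slide describes, not the full precision-recall integral used by e.g. the COCO evaluator.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def ap_at(ious, threshold):
    """Fraction of predictions whose IoU with ground truth exceeds the threshold."""
    return sum(v > threshold for v in ious) / len(ious)

def ap(ious):
    """Average over the ten equally spaced thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [0.50 + 0.05 * k for k in range(10)]
    return sum(ap_at(ious, t) for t in thresholds) / len(thresholds)
```

So `ap_at(ious, 0.5)` corresponds to AP50 and `ap_at(ious, 0.75)` to AP75 as defined above.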
Effect of Pre-Training on COCO-Text
Pre-training   none      COCO-Text
AP50           91.35%    95.21%
AP75           63.49%    76.33%
AP             58.37%    65.98%
Table: Comparison of training with and without pre-training on
COCO-Text.
Effect of Dataset Augmentation
                       AP       AP75     AP50
without augmentation   52.90%   53.02%   90.34%
with augmentation      60.81%   67.57%   92.88%
Table: Effect of the artificially extended dataset on ResNet-101.
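This slide does not list the augmentation operations themselves, so as an illustration only, here is one typical geometric augmentation for detection data: a horizontal flip that also remaps the ground-truth boxes (my assumption, not necessarily the paper's exact recipe).

```python
import numpy as np

def hflip_with_boxes(img, boxes):
    """Mirror an image horizontally and remap (x1, y1, x2, y2) boxes so
    they still enclose the same regions in the flipped image."""
    width = img.shape[1]
    flipped = img[:, ::-1].copy()
    new_boxes = [(width - x2, y1, width - x1, y2)
                 for (x1, y1, x2, y2) in boxes]
    return flipped, new_boxes
```

Whether flips are appropriate for text (they mirror the glyphs) is a design choice; for training the localizer only the box positions matter, and other transforms such as scaling or cropping are equally plausible.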
Generalization Experiments: Train on 4 + Test on Last
Tested on   AP50     AP75     AP
CHIME-R     80.45%   45.19%   45.82%
CHIME-S     87.73%   30.59%   41.12%
DeGruyter   86.63%   35.88%   43.06%
EconBiz     84.61%   15.88%   34.03%
DeTEXT      70.32%   29.49%   34.46%
Table: Generalization: training on four of the datasets for 200,000
iterations and testing on the fifth dataset.
Comparison to Unsupervised Approach: Localization
     Precision   Recall   F1 (SD)
TX   0.66        0.55     0.56 (0.25)
NN   0.86        0.83     0.87 (0.12)
Table: Comparison of the unsupervised approach (TX) with our proposed
supervised approach (NN) for text localization in scientific figures.
Comparison to Unsupervised Approach: Recognition
     Levenshtein avg. (SD)   Global Levenshtein (SD)
TX   6.23 (4.93)             108.81 (108.53)
NN   3.44 (4.42)              39.11 (41.75)
Table: Comparison of text recognition between the unsupervised
approach (TX) and our proposed supervised approach (NN).
Summary
Proposed a supervised approach for extracting text from
scientific figures using neural networks
Showed that dataset extension and pre-training on natural
images alleviate the problem of limited training data
The supervised approach outperforms the previously best
known unsupervised approach(es)
Capable of handling different datasets: generalizes to new
datasets if they contain figures of the same type
Thank you! Any questions? Email: ansgar.scherp@essex.ac.uk