Super Resolution
on Text Images
BTech Project - UG 2nd
Year
Indian Institute of Technology, Jodhpur
Project By
Nivedit Jain
(B18CSE039)
Student
Dr Gaurav Harit
Mentor/Guide
Sanskar Mani
(B18CSE048)
Student
Overview
1. Problem
2. SRCNN
3. Data Set Prep
4. Our Implementations
5. Metrics
6. Use Cases
7. Learning Outcomes
Problem Statement
Simultaneous Optimisation of Image Quality Improvement and
Text Content Extraction from scanned documents.
Lower Resolution Text Image (64*64)
Higher Resolution Text Image (128*128)
Focused towards
optimizing for OCR
SRCNN!
SRCNN?
● Super-Resolution Convolutional Neural Network (SRCNN)
● used for single image super resolution (SR)
Dong, Chao et al. “Image Super-Resolution Using Deep Convolutional Networks.” IEEE Transactions on Pattern Analysis and
Machine Intelligence 38 (2016): 295-307.
Formulation
● Upscale the low-resolution image to the desired size using
bicubic interpolation.
● Apply a convolutional layer to the upscaled image to extract
overlapping patches and represent each one as a
high-dimensional vector.
● Apply a second convolutional layer to map each
high-dimensional vector non-linearly onto another
high-dimensional vector.
● Lastly, apply a final convolutional layer to the mapped
vectors to reconstruct the output image.
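The three convolutional stages can be summarised with a small sketch. The kernel sizes and feature-map counts below are the base setting reported by Dong et al. (9-1-5 filters with 64 and 32 feature maps); the slides do not say which sizes this project used, so they are assumptions here, and the names `LAYERS`, `count_weights` and `receptive_field` are illustrative only.

```python
# Assumed layer settings (base configuration in Dong et al.), not confirmed by the slides.
LAYERS = [
    # (stage, kernel_size, in_channels, out_channels)
    ("patch extraction and representation", 9, 1, 64),
    ("non-linear mapping", 1, 64, 32),
    ("reconstruction", 5, 32, 1),
]


def count_weights(layers):
    """Number of convolution weights, ignoring biases."""
    return sum(k * k * c_in * c_out for _, k, c_in, c_out in layers)


def receptive_field(layers):
    """Side length of the input window that influences one output pixel."""
    return 1 + sum(k - 1 for _, k, _, _ in layers)


print(count_weights(LAYERS))    # 8032
print(receptive_field(LAYERS))  # 13
```

With these assumed sizes, each output pixel depends on a 13*13 window of the bicubic-upscaled input.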
Loss Function
We use mean squared error (MSE) as the loss function.
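For reference, the MSE objective can be written as below, following the notation of Dong et al. The symbols are not on the slide and are assumptions here: Y_i is the bicubic-upscaled low-resolution input, X_i the high-resolution ground truth, F the network with parameters Θ, and n the number of training pairs.

```latex
L(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \left\lVert F(Y_i; \Theta) - X_i \right\rVert^{2}
```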
Pipeline: Bicubic Interpolation → 2D Convolution → 2D Convolution → 2D Convolution
Data Set
Preparation
Data Set Preparation
UNLV 1985 Distinct
Documents
Business Letters
Reports
Legal Documents
Newspaper
Magazines
Department of Energy
Randomly Select 50 Images from each category
Split 200/50/50 training/validate/test
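The sampling and split above could be sketched as follows. This is a minimal illustration, assuming six categories of 50 images each (300 in total) and a fixed random seed; the category keys and file names are hypothetical placeholders, not the actual UNLV file layout.

```python
import random

# Hypothetical category keys; the real dataset folders may be named differently.
CATEGORIES = [
    "business_letters", "reports", "legal_documents",
    "newspaper", "magazines", "department_of_energy",
]


def sample_and_split(images_by_category, per_category=50, sizes=(200, 50, 50), seed=0):
    """Pick `per_category` images per category, shuffle, then split train/validate/test."""
    rng = random.Random(seed)
    selected = []
    for name in sorted(images_by_category):
        selected.extend(rng.sample(images_by_category[name], per_category))
    if len(selected) != sum(sizes):
        raise ValueError("split sizes do not add up to the number of selected images")
    rng.shuffle(selected)
    n_train, n_val, _ = sizes
    train = selected[:n_train]
    validate = selected[n_train:n_train + n_val]
    test = selected[n_train + n_val:]
    return train, validate, test


# Placeholder file names, 80 per category, purely for demonstration.
images = {c: [f"{c}_{i:03d}.png" for i in range(80)] for c in CATEGORIES}
train, validate, test = sample_and_split(images)
print(len(train), len(validate), len(test))  # 200 50 50
```

Splitting at the document level before cutting patches keeps patches from the same page out of different subsets.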
Cut Images to Obtain HR (128*128)
Manually removed non-useful images
Apply Gaussian Blur (7*7 kernel, 𝜎 = uniform(0.5, 5)) and Downscale (64*64)
HR
LR
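The degradation step that turns an HR patch into its LR counterpart can be sketched in plain Python. The 7*7 kernel and the uniform(0.5, 5) range for 𝜎 come from the slide; the edge handling (replicating border pixels) and the downscaling method (averaging 2*2 blocks) are assumptions, since the slides do not specify them, and all function names are illustrative.

```python
import math
import random


def gaussian_kernel(size=7, sigma=1.0):
    """Return a normalised size x size Gaussian kernel as a list of rows."""
    half = size // 2
    weights = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                for x in range(-half, half + 1)]
               for y in range(-half, half + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def blur(image, kernel):
    """Filter a 2D list with the kernel, replicating edge pixels (assumed)."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, row in enumerate(kernel):
                yy = min(max(y + ky - half, 0), h - 1)
                for kx, weight in enumerate(row):
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += weight * image[yy][xx]
            out[y][x] = acc
    return out


def downscale_by_two(image):
    """Halve each dimension by averaging 2x2 blocks (assumed method)."""
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
             for x in range(0, len(image[0]) - 1, 2)]
            for y in range(0, len(image) - 1, 2)]


def make_lr(hr, rng):
    """Blur with a 7x7 Gaussian whose sigma is drawn from uniform(0.5, 5), then downscale."""
    sigma = rng.uniform(0.5, 5.0)
    return downscale_by_two(blur(hr, gaussian_kernel(7, sigma)))


rng = random.Random(0)
# Synthetic 128x128 checkerboard standing in for a real HR text patch.
hr = [[255.0 if (x // 8 + y // 8) % 2 else 0.0 for x in range(128)] for y in range(128)]
lr = make_lr(hr, rng)
print(len(lr), len(lr[0]))  # 64 64
```

Drawing a fresh 𝜎 per patch exposes the network to a range of blur strengths rather than a single fixed degradation.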
Data Set Preparation
So our prepared dataset has
Test (HR-LR) Pairs: 10,929 Images
Train (HR-LR) Pairs: 42,602 Images
Validate (HR-LR) Pairs: 9,659 Images
Our
Implementation
Function - g
Tesseract OCR
In 2006, Tesseract was considered one
of the most accurate
open-source OCR engines
then available.*
https://en.wikipedia.org/wiki/Tesseract_(software) , https://tesseract-ocr.github.io/
Otsu Thresholding
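As a reminder of what Otsu thresholding does, here is a minimal sketch on a flat list of 8-bit grey values: it picks the threshold that maximises the between-class variance of the two resulting pixel groups. The function names and the toy pixel values are illustrative, and this is not necessarily the implementation used in the project.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) maximising between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break             # every pixel is already in the lower class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, t):
    """Map each pixel to 0 or 255 depending on the threshold."""
    return [255 if p > t else 0 for p in pixels]


# Toy bimodal data: dark "ink" values and bright "paper" values.
pixels = [20] * 50 + [30] * 50 + [200] * 60 + [220] * 40
t = otsu_threshold(pixels)
print(binarize([20, 30, 200, 220], t))  # [0, 0, 255, 255]
```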
Loss Function
Results
Other Experiments
Normal SRCNN
Results
SRCNN
Training Data: (LR, Tesseracted)
Results
2 Layer SRCNN
Training Data: (LR, Tesseracted)
Results
Extra SRCNN
Training Data: (LR, Tesseracted)
Results
SRCNN
Training Data: (LR, Tesseract Otsu)
Results
Evaluation
Metrics
Flow Metric
Make a Bipartite Graph with Bounding Boxes
of Original and Predicted as nodes
Add edges with percentage of intersection as
weights
Add Source and Sink; connect Source to every Original node
and every Predicted node to Sink with edges of weight 1
Calculate the maximum flow
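The flow metric steps can be sketched with a small Edmonds-Karp maximum-flow routine. Several details are assumptions, because the slides do not define them: the "percentage of intersection" is taken as the intersection area divided by the area of the original box, boxes are `(x0, y0, x1, y1)` tuples, and the final score is the flow divided by the number of original boxes. All function names are illustrative.

```python
from collections import deque


def overlap_fraction(a, b):
    """Fraction of box a's area covered by box b; boxes are (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((a[2] - a[0]) * (a[3] - a[1]))


def max_flow(capacity, source, sink, eps=1e-9):
    """Edmonds-Karp on a dict-of-dicts residual graph (modified in place)."""
    total = 0.0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in capacity[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push flow through it.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] = capacity[v].get(u, 0.0) + bottleneck
            v = u
        total += bottleneck


def flow_metric(original, predicted):
    """Maximum flow from source to sink, normalised by the number of original boxes."""
    if not original:
        return 0.0
    cap = {"s": {}, "t": {}}

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, 0.0)  # reverse edge for the residual graph

    for i in range(len(original)):
        add_edge("s", ("o", i), 1.0)
    for j in range(len(predicted)):
        add_edge(("p", j), "t", 1.0)
    for i, a in enumerate(original):
        for j, b in enumerate(predicted):
            weight = overlap_fraction(a, b)
            if weight > 0:
                add_edge(("o", i), ("p", j), weight)
    return max_flow(cap, "s", "t") / len(original)


# One original box half covered by one predicted box.
print(flow_metric([(0, 0, 10, 10)], [(5, 0, 15, 10)]))  # 0.5
```

Because the edge capacities are fractional, partially overlapping boxes contribute partial credit instead of an all-or-nothing match.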
Bipartite Metric
Make a Bipartite Graph with Bounding Boxes
of Original and Predicted as nodes
Add edges with weight 1 when the character matches and
the intersection is greater than a threshold
Add Source and Sink; connect Source to every Original node
and every Predicted node to Sink with edges of weight 1
Calculate the maximum flow
This finds the maximum-cardinality bipartite matching
between the two sets of bounding boxes.
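With unit capacities, the maximum flow equals the size of a maximum-cardinality matching, so the sketch below computes the matching directly with augmenting paths (Kuhn's algorithm) rather than a general max-flow routine. The threshold value of 0.5, the overlap definition (intersection area over the original box's area), the `(character, box)` input format and the normalisation by the number of original boxes are all assumptions, not taken from the slides.

```python
def overlap_fraction(a, b):
    """Fraction of box a's area covered by box b; boxes are (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((a[2] - a[0]) * (a[3] - a[1]))


def bipartite_metric(original, predicted, threshold=0.5):
    """Share of original characters matched to a distinct predicted character.

    `original` and `predicted` are lists of (character, box) pairs.
    """
    if not original:
        return 0.0
    # Edge i -> j exists when the characters agree and the boxes overlap enough.
    adjacency = [
        [j for j, (char_p, box_p) in enumerate(predicted)
         if char_o == char_p and overlap_fraction(box_o, box_p) > threshold]
        for char_o, box_o in original
    ]
    match_of_predicted = [None] * len(predicted)

    def try_augment(i, seen):
        """Try to match original i, re-routing earlier matches if needed."""
        for j in adjacency[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_of_predicted[j] is None or try_augment(match_of_predicted[j], seen):
                match_of_predicted[j] = i
                return True
        return False

    matched = sum(try_augment(i, set()) for i in range(len(original)))
    return matched / len(original)


truth = [("a", (0, 0, 10, 10)), ("b", (10, 0, 20, 10))]
ocr = [("a", (0, 0, 10, 10)), ("c", (10, 0, 20, 10))]
print(bipartite_metric(truth, ocr))  # 0.5
```

Unlike the flow metric, a box here counts only if the recognised character is also correct, which makes this the stricter of the two scores.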
PSNR
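PSNR (peak signal-to-noise ratio) is derived from the mean squared error between two images: PSNR = 10 * log10(MAX^2 / MSE). The sketch below assumes 8-bit greyscale images stored as lists of rows, so MAX is 255; the function names are illustrative.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2D lists."""
    diffs = [(pa - pb) ** 2 for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(a, b)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


reference = [[100, 100], [100, 100]]
estimate = [[110, 110], [110, 110]]
print(round(psnr(reference, estimate), 2))  # 28.13
```

PSNR measures pixel-level fidelity only, so a model can score higher on PSNR and still be harder for the OCR engine to read.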
Model                     Flow      Bipartite   PSNR
Modified Loss Function    85.32%    82.11%      23.31 dB
Normal SRCNN              83.07%    80.55%      28.94 dB
LR - Tesseract            77.49%    74.04%      21.80 dB
2-Layer SRCNN             81.49%    78.82%      24.28 dB
Extra SRCNN               81.51%    78.91%      22.63 dB
LR - Tesseract Otsu       79.14%    75.76%      20.41 dB
Metric Limitation
Some Use Cases
01 Making OCR Better
02 Digitization of Documents
Learning Outcomes
Basics of AI
Python
Thank You!
Backup Slides
Super Resolution with OCR Optimization
