Super Resolution Text Images


We implemented and experimented with several deep learning models that aim to improve OCR accuracy while simultaneously increasing image resolution.


Super Resolution on Text Images
BTech Project - UG 2nd Year
Indian Institute of Technology, Jodhpur

Project By
Nivedit Jain (B18CSE039), Student
Sanskar Mani (B18CSE048), Student
Dr Gaurav Harit, Mentor/Guide

Overview
1. Problem
2. SRCNN
3. Data Set Prep
4. Our Implementations
5. Metrics
6. Use Cases
7. Learning Outcomes

Problem Statement
Simultaneous Optimisation of Image Quality Improvement and
Text Content Extraction from scanned documents.

Input: Lower Resolution Text Image (64*64)
Output: Higher Resolution Text Image (128*128)
Focused towards optimizing for OCR

SRCNN?
● Super-Resolution Convolutional Neural Network (SRCNN)
● Used for single image super resolution (SR)
Dong, Chao et al. “Image Super-Resolution Using Deep Convolutional Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2016): 295-307.

Formulation
● Upscale the low-resolution image to the desired size using bicubic interpolation.
● Apply a convolutional layer to the upscaled image to extract overlapping patches and represent each patch as a high-dimensional vector.
● Apply a second convolutional layer to map each high-dimensional vector non-linearly onto another high-dimensional vector.
● Lastly, apply a final convolutional layer to the mapped vectors to reconstruct the output image.

Loss Function
We use mean squared error (MSE) as the loss function.
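
As a rough illustration of the loss, the sketch below computes the MSE between two images stored as nested lists, together with the PSNR value that is commonly derived from it. The function names and the 8-bit peak value of 255 are assumptions made for this example; the slides do not show the project's own code.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized greyscale images."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b)) / n

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit pixels."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak * peak / err)

# Tiny hypothetical example: a 1x2 "image" and its reconstruction.
target = [[10, 20]]
output = [[12, 16]]
print(mse(target, output))   # (2**2 + 4**2) / 2 = 10.0
print(psnr(target, output))
```

A lower MSE gives a higher PSNR, so a network trained on MSE alone is in effect trained to maximise PSNR.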

Bicubic Interpolation → 2D Convolution → 2D Convolution → 2D Convolution
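
The pipeline above can be sketched in plain Python as a toy forward pass. Several things here are assumptions rather than details from the slides: the 9-1-5 filter sizes follow the base setting in Dong et al., the channel counts (4 and 2) are shrunk so the example runs quickly, the weights are random instead of trained, and nearest-neighbour upscaling stands in for bicubic interpolation to keep the example dependency-free.

```python
import random

def conv2d(maps, kernels, biases, relu=True):
    """Zero-padded ('same') 2D convolution over a list of feature maps.

    maps: C_in feature maps, each an H x W nested list.
    kernels[o][i]: k x k filter linking input map i to output map o
    (applied as cross-correlation, as deep learning frameworks do).
    """
    h, w = len(maps[0]), len(maps[0][0])
    out = []
    for filters, bias in zip(kernels, biases):
        r = len(filters[0]) // 2
        fmap = [[float(bias)] * w for _ in range(h)]
        for src, ker in zip(maps, filters):
            for y in range(h):
                for x in range(w):
                    acc = 0.0
                    for dy, krow in enumerate(ker):
                        yy = y + dy - r
                        if 0 <= yy < h:
                            for dx, kval in enumerate(krow):
                                xx = x + dx - r
                                if 0 <= xx < w:
                                    acc += kval * src[yy][xx]
                    fmap[y][x] += acc
        if relu:
            fmap = [[max(0.0, v) for v in row] for row in fmap]
        out.append(fmap)
    return out

def random_layer(c_out, c_in, k, rng):
    """Untrained placeholder weights; a real model would learn these."""
    kernels = [[[[rng.gauss(0, 0.05) for _ in range(k)] for _ in range(k)]
                for _ in range(c_in)] for _ in range(c_out)]
    return kernels, [0.0] * c_out

def upscale2x(img):
    """Nearest-neighbour 2x upscaling (stand-in for bicubic interpolation)."""
    return [[v for v in row for _ in range(2)] for row in img for _ in range(2)]

def srcnn_forward(lr, layers):
    """Upscale, then run the three convolutions; no ReLU after the last one."""
    x = [upscale2x(lr)]
    for idx, (kernels, biases) in enumerate(layers):
        x = conv2d(x, kernels, biases, relu=idx < len(layers) - 1)
    return x[0]

rng = random.Random(0)
layers = [random_layer(4, 1, 9, rng),   # patch extraction and representation
          random_layer(2, 4, 1, rng),   # non-linear mapping
          random_layer(1, 2, 5, rng)]   # reconstruction
lr = [[rng.random() for _ in range(8)] for _ in range(8)]
sr = srcnn_forward(lr, layers)
print(len(sr), len(sr[0]))   # 16 16
```

Because every convolution uses "same" padding, the output keeps the size of the upscaled input, which is why the interpolation step has to come first.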

Data Set Preparation
UNLV 1985 Distinct Documents
Categories: Business Letters, Reports, Legal Documents, Newspaper, Magazines, Department of Energy
Randomly select 50 images from each category
Split 200/50/50 into training/validation/test

Cut images to obtain HR patches (128*128)
Manually removed non-useful images
Apply Gaussian blur (7*7 kernel, 𝜎 = uniform(0.5, 5)) and downscale to obtain LR patches (64*64)
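
A minimal sketch of the degradation step described above, written in plain Python so that it is self-contained. The 7*7 kernel and the uniform(0.5, 5) range for 𝜎 come from the slides; the edge handling (replicating border pixels), the 2x2 averaging used for downscaling, and all function names are assumptions for illustration. A 32*32 toy patch is used to keep the run short; a 128*128 HR patch would be handled the same way and yield a 64*64 LR patch.

```python
import math
import random

def gaussian_kernel(size, sigma):
    """Normalised size x size Gaussian kernel (weights sum to 1)."""
    r = size // 2
    g = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(g) ** 2
    return [[a * b / total for b in g] for a in g]

def blur(img, kernel):
    """Filter img with kernel, replicating edge pixels at the border."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy, krow in enumerate(kernel):
                yy = min(max(y + dy - r, 0), h - 1)
                for dx, kval in enumerate(krow):
                    xx = min(max(x + dx - r, 0), w - 1)
                    acc += kval * img[yy][xx]
            out[y][x] = acc
    return out

def downscale2x(img):
    """Halve each dimension by averaging 2x2 blocks (assumed resampling)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]

def make_lr(hr, rng):
    """Blur with a random sigma in [0.5, 5], then downscale by 2."""
    sigma = rng.uniform(0.5, 5.0)
    return downscale2x(blur(hr, gaussian_kernel(7, sigma)))

rng = random.Random(0)
# Hypothetical checkerboard patch standing in for a cropped text image.
hr = [[255.0 * ((x // 4 + y // 4) % 2) for x in range(32)] for y in range(32)]
lr = make_lr(hr, rng)
print(len(hr), "->", len(lr))   # 32 -> 16
```

Drawing 𝜎 at random for each patch exposes the network to a range of blur levels instead of a single fixed degradation.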

Data Set Preparation
The prepared dataset contains:
Train (HR-LR) pairs: 42,602 images
Validate (HR-LR) pairs: 9,659 images
Test (HR-LR) pairs: 10,929 images

Tesseract OCR
Tesseract is considered one of the most accurate open-source OCR engines available.*
* https://en.wikipedia.org/wiki/Tesseract_(software) , https://tesseract-ocr.github.io/

SRCNN
Training Data: (LR, Tesseracted)

2 Layer SRCNN
Training Data: (LR, Tesseracted)

Extra SRCNN
Training Data: (LR, Tesseracted)

SRCNN
Training Data: (LR, Tesseract OTSU)

Flow Metric
Make a bipartite graph with the bounding boxes of the original and predicted images as nodes
Add edges between overlapping boxes, weighted by their percentage of intersection
Add a source and a sink, and connect them to the nodes with edges of weight 1
Calculate the maximum flow
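
The steps above can be sketched as follows. The slides leave several details open, so these are assumptions: boxes are (x0, y0, x1, y1) tuples, the "percentage of intersection" is taken as the intersection area divided by the area of the original box, the source feeds the original boxes and the predicted boxes drain into the sink, and the final score is the flow divided by the number of original boxes. The maximum flow is computed with a textbook Edmonds-Karp routine.

```python
from collections import deque

def max_flow(cap, s, t, eps=1e-9):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0.0)   # reverse edges
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:               # BFS for a shortest path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)          # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push

def overlap_fraction(a, b):
    """Intersection area over the area of box a (assumed definition)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    return w * h / ((a[2] - a[0]) * (a[3] - a[1]))

def flow_metric(original, predicted):
    """Max flow through the overlap graph, normalised by len(original)."""
    if not original:
        return 0.0
    cap = {"s": {}}
    for i, a in enumerate(original):
        cap["s"][("o", i)] = 1.0
        cap[("o", i)] = {}
        for j, b in enumerate(predicted):
            frac = overlap_fraction(a, b)
            if frac > 0:
                cap[("o", i)][("p", j)] = frac
    for j in range(len(predicted)):
        cap[("p", j)] = {"t": 1.0}
    return max_flow(cap, "s", "t") / len(original)

# Hypothetical boxes: one exact match and one half-overlapping box.
original = [(0, 0, 10, 10), (20, 0, 30, 10)]
predicted = [(0, 0, 10, 10), (25, 0, 35, 10)]
print(flow_metric(original, predicted))   # 0.75
```

Because the edge weights are fractional, a partially overlapping box still contributes to the score in proportion to its overlap.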

Bipartite Metric
Make a bipartite graph with the bounding boxes of the original and predicted images as nodes
Add an edge of weight 1 between two boxes when the characters match and the intersection is greater than a threshold
Add a source and a sink, and connect them to the nodes with edges of weight 1
Calculate the maximum flow
This finds the maximum cardinality bipartite matching between the two sets of bounding boxes
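
With unit weights, the maximum flow equals the size of a maximum cardinality bipartite matching, so the sketch below uses the simpler augmenting-path (Kuhn) algorithm instead of a general max-flow routine. As assumptions for illustration: each detection is a (character, box) pair with boxes as (x0, y0, x1, y1), the intersection is measured as intersection area over the original box's area, the threshold defaults to 0.5, and the score is the number of matched boxes divided by the number of original boxes. None of these values are given in the slides.

```python
def overlap_fraction(a, b):
    """Intersection area over the area of box a (assumed definition)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    return w * h / ((a[2] - a[0]) * (a[3] - a[1]))

def bipartite_metric(original, predicted, threshold=0.5):
    """Fraction of original boxes matched one-to-one with a predicted box."""
    if not original:
        return 0.0
    # Edge i -> j when the characters agree and the overlap is large enough.
    adj = [[j for j, (ch_p, box_p) in enumerate(predicted)
            if ch_p == ch_o and overlap_fraction(box_o, box_p) > threshold]
           for ch_o, box_o in original]
    match_of_pred = {}          # predicted index -> original index

    def try_assign(i, seen):
        """Find an augmenting path starting at original box i."""
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in match_of_pred or try_assign(match_of_pred[j], seen):
                match_of_pred[j] = i
                return True
        return False

    matched = sum(try_assign(i, set()) for i in range(len(original)))
    return matched / len(original)

# Hypothetical OCR output: "a" matches, "b" is misread as "d",
# and "c" is recognised but its box barely overlaps the original.
original = [("a", (0, 0, 10, 10)), ("b", (10, 0, 20, 10)), ("c", (20, 0, 30, 10))]
predicted = [("a", (1, 0, 11, 10)), ("d", (10, 0, 20, 10)), ("c", (28, 0, 38, 10))]
print(bipartite_metric(original, predicted))   # one of three boxes matched
```

Unlike the flow metric, this score only rewards boxes whose recognised character agrees with the original, which makes it the stricter of the two.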

Results
Model                     Flow      Bipartite   PSNR
Modified Loss Function    85.32%    82.11%      23.31 dB
Normal SRCNN              83.07%    80.55%      28.94 dB
LR - Tesseract            77.49%    74.04%      21.80 dB
2-Layer SRCNN             81.49%    78.82%      24.28 dB
Extra SRCNN               81.51%    78.91%      22.63 dB
LR - Tesseract OTSU       79.14%    75.76%      20.41 dB

Some Use Cases
01 Making OCR Better
02 Digitization of Documents

Learning Outcomes
Basics of AI
Python