The document discusses text extraction from product images, covering both text detection and text recognition. It describes training a CRNN-CTC model for text recognition on 15 million synthetic images, with Spatial Transformer Networks used to improve accuracy on irregular text. The models achieve high accuracy on various datasets, including ones with curved text. Training uses generators because of the large dataset size, and the models are deployed on a single GPU with a prediction time of ~0.45 seconds per image.
Detecting & Recognizing Arbitrarily Shaped Text from Product Images
1. Detecting & Recognizing Arbitrarily Shaped Text from Product Images
Rajesh Shreedhar Bhat
Senior Data Scientist, Walmart Global Tech India
2. Agenda
▪ Text Extraction Overview
▪ Text Detection (TD)
▪ Text Recognition (TR) training data preparation
▪ CRNN-CTC model for TR
▪ Attention-OCR
▪ Spatial Transformer Nets for improving TR accuracy
▪ Model accuracies on different datasets
▪ Training & Deployment
▪ Questions?
5. Text Detection – Model architecture
▪ VGG16-BN as the backbone
▪ The decoder has skip connections, similar to U-Net
▪ Outputs:
▪ Region score - localizing individual characters
▪ Affinity score - grouping characters into words
Ref: Baek, Youngmin, et al. "Character Region Awareness for Text Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
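The decoder described above can be sketched in PyTorch as a U-Net-style upsampling block that concatenates an encoder skip feature before convolving, ending in a two-channel head for the region and affinity scores. This is a minimal sketch with illustrative channel counts and feature sizes, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """Decoder block: upsample, concatenate the encoder skip feature, convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        # Upsample the deep feature to the skip feature's resolution, then fuse
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, skip], dim=1))

up = UpBlock(in_ch=512, skip_ch=256, out_ch=64)
head = nn.Conv2d(64, 2, kernel_size=1)  # 2 maps per pixel: region + affinity

deep = torch.randn(1, 512, 16, 16)   # deepest VGG16-BN feature (illustrative sizes)
skip = torch.randn(1, 256, 32, 32)   # earlier-stage feature for the skip connection
scores = head(up(deep, skip))
print(scores.shape)  # torch.Size([1, 2, 32, 32])
```

In the full model this block would be repeated at each decoder stage, with the final maps upsampled to the desired output resolution.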
6. Ground Truth Label Generation
Ref: Baek, Youngmin, et al. "Character Region Awareness for Text Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
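In CRAFT, ground-truth heatmaps are produced by placing a 2D Gaussian over each character box. A NumPy sketch of the idea, using an axis-aligned box and a nearest-neighbour resize (the paper perspective-warps the Gaussian to the character's quadrilateral, which is omitted here):

```python
import numpy as np

def gaussian_patch(size, sigma_ratio=0.3):
    """Isotropic 2D Gaussian peaking at the patch centre."""
    ax = np.linspace(-1.0, 1.0, size)
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma_ratio**2))

def place_char(heatmap, box, patch):
    """Max-blend a Gaussian patch into an axis-aligned character box."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    # nearest-neighbour resize of the patch to the box size
    ys = np.linspace(0, patch.shape[0] - 1, h).astype(int)
    xs = np.linspace(0, patch.shape[1] - 1, w).astype(int)
    resized = patch[np.ix_(ys, xs)]
    heatmap[y0:y1, x0:x1] = np.maximum(heatmap[y0:y1, x0:x1], resized)

region = np.zeros((64, 128))
place_char(region, (10, 20, 40, 50), gaussian_patch(32))  # one character box
print(region.max())  # close to 1.0 at the character centre, 0 elsewhere
```

The affinity map is built the same way, with Gaussians placed over boxes spanning adjacent character pairs.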
10. Text Recognition – Training Data Preparation
SynthText: an image-generation engine for building a large annotated dataset.
15 million images generated with different font styles, sizes, colors & varying backgrounds, using product descriptions + open-source datasets.
Vocabulary: 92 characters, including uppercase and lowercase letters, numbers, and special symbols.
11. Link to the "Text Recognition with CRNN-CTC model" blog published on Weights & Biases (W&B): https://bit.ly/3hBaWQv
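The CRNN-CTC model emits a per-timestep distribution over the vocabulary plus a CTC blank symbol; best-path decoding takes the argmax at each timestep, collapses consecutive repeats, and removes blanks. A self-contained sketch with a toy three-character vocabulary (the deck's real vocabulary has 92 characters):

```python
BLANK = 0  # CTC blank index (index 0 here by convention)
idx_to_char = {1: "c", 2: "a", 3: "t"}  # toy vocabulary, illustration only

def ctc_greedy_decode(best_path):
    """Standard CTC best-path decoding: collapse repeats, then drop blanks."""
    out, prev = [], None
    for idx in best_path:
        if idx != prev and idx != BLANK:
            out.append(idx_to_char[idx])
        prev = idx
    return "".join(out)

# Per-timestep argmax over the softmax outputs, e.g. for 9 timesteps:
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 3, 3]))  # -> "cat"
```

The blank is what lets CTC represent repeated letters: "cc" requires a blank between the two `c` indices, otherwise they collapse into one.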
12. Attention-OCR
• Encoder-decoder framework.
• A CNN is used as the visual feature encoder.
• An LSTM with an attention mechanism extracts the text in a generative fashion.
• Cross-entropy as the loss function.
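At each decoding step, the LSTM decoder's state is scored against the encoder's feature columns, and a softmax over those scores weights the features into a context vector. A NumPy sketch of one dot-product attention step (the actual model may use a different scoring function; the shapes are illustrative):

```python
import numpy as np

def attention_step(query, keys):
    """One dot-product attention step: softmax weights over keys, weighted-sum context."""
    scores = keys @ query                 # (T,) one score per encoder column
    scores -= scores.max()                # numerical stability before exp
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ keys              # (D,) context vector fed to the decoder
    return weights, context

rng = np.random.default_rng(0)
keys = rng.normal(size=(20, 64))   # 20 encoder feature columns, 64-dim each
query = rng.normal(size=64)        # decoder LSTM state at this step
weights, context = attention_step(query, keys)
print(weights.sum())  # 1.0 up to float error (softmax-normalised)
```

The decoder then emits one character per step from the context vector, which is what makes the extraction "generative".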
14. Spatial Transformer Networks
• A Spatial Transformer Network is a learnable module aimed at increasing the spatial invariance of Convolutional Neural Networks in a computationally and parameter-efficient manner.
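An STN predicts an affine transform from the input and resamples the image through a differentiable grid sampler, so the rectification is learned end-to-end with the recognizer. A minimal PyTorch sketch using the identity transform (in a real STN, a small localisation network would regress `theta` from the image):

```python
import torch
import torch.nn.functional as F

def spatial_transform(img, theta):
    """Warp `img` with the batch of 2x3 affine matrices `theta`, differentiably."""
    grid = F.affine_grid(theta, img.size(), align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)

img = torch.randn(1, 3, 32, 128)           # one text-line image (N, C, H, W)
theta = torch.tensor([[[1.0, 0.0, 0.0],    # identity transform; a localisation
                       [0.0, 1.0, 0.0]]])  # net would predict this per image
out = spatial_transform(img, theta)
print(out.shape)  # torch.Size([1, 3, 32, 128])
```

Because `grid_sample` is differentiable with respect to both the image and the grid, gradients from the recognition loss flow back into the localisation network, which is how the STN learns to straighten curved or perspective-distorted text.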
16. Training and deployment
▪ 15 million images ≈ 690 GB when loaded into memory, given that images are on average of shape (128 × 32 × 3) with dtype float32.
▪ Generators are used to load only a single batch into memory at a time.
▪ Deployed on a Machine Learning Platform internal to Walmart.
▪ Both text detection and recognition are deployed on a single V100 GPU; prediction time is ~0.45 seconds per image.
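The ~690 GB figure follows directly from 15 million × (128 × 32 × 3) float32 values at 4 bytes each, which is why the data is streamed one batch at a time rather than held in memory. A sketch of the arithmetic and a Keras-style infinite batch generator (`load_image` is a placeholder stub for the real decode-and-resize step):

```python
import numpy as np

IMG_SHAPE = (128, 32, 3)
bytes_total = 15_000_000 * int(np.prod(IMG_SHAPE)) * 4   # float32 = 4 bytes
print(bytes_total / 2**30)  # ~687 GiB, matching the ~690 GB on the slide

def load_image(path):
    # placeholder: real code would read, resize, and normalise the image file
    return np.zeros(IMG_SHAPE, dtype=np.float32)

def batch_generator(paths, labels, batch_size=64):
    """Yield one batch at a time so only `batch_size` images live in memory."""
    while True:  # Keras-style infinite generator, reshuffled each epoch
        order = np.random.permutation(len(paths))
        for i in range(0, len(order), batch_size):
            idx = order[i:i + batch_size]
            x = np.stack([load_image(paths[j]) for j in idx])
            y = np.stack([labels[j] for j in idx])
            yield x, y

gen = batch_generator([f"img_{i}.png" for i in range(10)],
                      [np.zeros(5, dtype=np.float32)] * 10, batch_size=4)
x, y = next(gen)
print(x.shape)  # (4, 128, 32, 3)
```

Peak memory is then bounded by one batch (4 × 48 KB here) instead of the full dataset, at the cost of disk I/O per step, which is usually hidden by prefetching.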
17. The Team behind the project
Rajesh Shreedhar Bhat
Senior Data Scientist
Pranay Dugar
Data Scientist
Anirban Chatterjee
Staff Data Scientist
Vijay Agneeswaran
Director - Data Science