Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques

Databricks
DatabricksDeveloper Marketing and Relations at MuleSoft
1
Text Extraction from Product Images
Rajesh Shreedhar Bhat
Data Scientist @Walmart Labs, Bengaluru
MS CS @ASU, Kaggle Competitions Expert
Agenda
▪ Intro to Text Extraction
▪ Text Detection(TD)
▪ TD Model Architecture
▪ Training data generation
▪ Text Recognition(TR) training data
preparation
▪ CRNN-CTC model for TR
▪ Receptive Fields
▪ CTC decoder and loss
▪ TR Training phase
▪ Other Advanced Techniques
▪ Questions ?
3
Introduction : Text Extraction
4
Text Detection
5
Image Segmentation - Input & Ground Truth
6
Ground Truth Label Generation
7
Text Detection – Model architecture
▪ VGG16 – BN as the backbone
▪ Model has skip connection in decoder
part which is similar to U-Nets.
▪ Output :
▪ Region score
▪ Affinity score - grouping
characters
Ref:Baek, Youngmin, et al. "Character Region
Awareness for Text detection." Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition. 2019.
8
Sample Output
Region Score Affinity Score Region Score Affinity Score
9
Sample Output..
10
Text Recognition
11
Text Recognition – Training Data Preparation
SynthText: image generation engine
for building a large annotated dataset.
15 million images generated with
different font styles, size, color &
varying backgrounds using product
descriptions + open source datasets
Vocabulary: 92 characters
Includes capital + small letters,
numbers and special symbols
12
Text Recognition CRNN CTC model
13
CNN - Receptive Fields
Usage of intermediate layer features in
SSD’s in Object detection tasks.
▪ Receptive field is defined as the region in the input image/space that a
particular CNN’s feature is looking at.
14
CNN features to LSTM
Input Image
(None, 128 * 32 * 1)
Feature map
(None, 1, 31, 512)
Feature map
(None, 1, 31, 512)
SoftMax probabilities for every
time step (i.e. 31), over the
vocabulary.
15
Ground Truth and Output of TR task
16
Hello
Input Image Ground Truth
Output from LSTM model
for 31 timesteps ..
Time step t1 t2 t3 t4 t5 ... …. t27 t28 t29 t30 t31
Prediction H H H e e ... ... l o o o o
Length of Ground truth is 5 which is not equal to length of prediction i.e 31
How to calculate the loss?
NER model – loss: categorical cross entropy
• Do we have labels for
every time steps of
LSTM model in CRNN
setting ?
• Can we use cross
entropy loss?
Answer is: NO!!
17
Mapping of Input to Output
Corresponding Text
Hello
Corresponding Text
Hello
Can we manually align each character to its location in the audio/image?
Yes!! But lot of manual effort is needed in creating training data.
18
CTC to rescue
With just mapping from image to text and not worrying about alignment of each
character to the location in input image, one should be able to train the network.
Merge repeats
Merge repeats
Drop blank character
19
Connectionist Temporal Classification (CTC) Loss
▪ Ground truth for an image --> AB
▪ Vocabulary is { A, B, - }
▪ Let's say we have predictions for 3-time steps from LSTM network (SoftMax
probabilities over vocabulary at t1, t2, t3)
▪ Given that we use CTC decode operation discussed earlier, in which scenarios we
can say output from the model is correct??
20
CTC loss continued ..
t1 t2 t3
A B B
A A B
- A B
A - B
A B -
• Merge
repeats
• Drop blank
character
AB
t1 t2 t3
A 0.8 0.7 0.1
B 0.1 0.1 0.8
- 0.1 0.2 0.1
Ground Truth : AB
Score for one path: AAB = (0.8 * 0.7 * 0.8) and similarly for other paths.
Probability of getting GT AB: = P(ABB) + P(AAB) + P(-AB) + P(A-B) + P(AB-)
Loss : - log( Probability of getting GT )
SoftMax probabilities
21
CTC loss perfect match
t1 t2 t3
A - -
- A -
- - A
- A A
A A -
A - A
A A A
• Merge
repeats
• Drop blank
character
t1 t2 t3
A 1 0 0
B 0 0 0
- 0 1 1
• SoftMax probabilities
Score for one path: A- - = (1 * 1 * 1) and similarly for other paths.
Probability of getting ground truth A: = P(A--) + P(-A-) + P(--A) + P(-AA) + P(AA-) + P(A-A) + P(AAA)
Loss : - log( Probability of getting GT ) = 0
Ground Truth : A
A
22
CTC loss perfect mismatch
• Merge
repeats
• Drop blank
character
A
t1 t2 t3
A 0 0 0
B 1 1 1
- 0 0 0
SoftMax probabilities
Score for one path: A - - = (0 * 0 * 0) and similarly for other paths.
Probability of getting ground truth A: = P(A--) + P(-A-) + P(--A) + P(-AA) + P(AA-) + P(A-A) + P(AAA)
Loss : - log( Probability of getting GT ) = tends to infinity !!
Ground Truth : A
t1 t2 t3
A - -
- A -
- - A
- A A
A A -
A - A
A A A
23
Model Architecture & CTC loss in TF
24
Training Phase
▪ 15 million images ~ 690 GB when loaded into memory!! Given that on an average
images are of the shape (128 * 32 * 3) and dtype is float32.
▪ Usage of Python Generators to load only single batch in memory.
▪ Reducing the training time by using workers, max_queue_size & multi-processing
in .fit_generator in Keras.
▪ Training time ~ 2 hours for single epoch on single P100 GPU machine and
prediction time ~1 sec for batch of 2048 images.
25
Other Advanced Techniques
Attention - OCR Spatial Transformer Network – before text recognition
Ref:Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer
networks." Advances in neural information processing systems. 2015.
26
Code + PPT
https://github.com/rajesh-bhat/spark-ai-summit-2020-text-extraction
27
Questions ??
rsbhat@asu.edu
https://www.linkedin.com/in/rajeshshreedhar
28
1 of 28

Recommended

Cloud Computing - Benefits and Challenges by
Cloud Computing - Benefits and ChallengesCloud Computing - Benefits and Challenges
Cloud Computing - Benefits and ChallengesThoughtWorks Studios
9K views27 slides
Case study on cloud computing by
Case study on cloud computingCase study on cloud computing
Case study on cloud computingSnehal Takawale
6.3K views16 slides
Data mining in social network by
Data mining in social networkData mining in social network
Data mining in social networkakash_mishra
13.7K views28 slides
Customer, Data Employee Trio by
Customer, Data Employee TrioCustomer, Data Employee Trio
Customer, Data Employee TrioEinat Shimoni
3.1K views121 slides
Sla in cloud by
Sla in cloudSla in cloud
Sla in cloudPankesh Patel
4.7K views19 slides
Cloud service management by
Cloud service managementCloud service management
Cloud service managementgaurav jain
18.4K views10 slides

More Related Content

What's hot

Functionalities in AI Applications and Use Cases (OECD) by
Functionalities in AI Applications and Use Cases (OECD)Functionalities in AI Applications and Use Cases (OECD)
Functionalities in AI Applications and Use Cases (OECD)AnandSRao1962
461 views12 slides
Data mining with big data implementation by
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementationSandip Tipayle Patil
2.7K views42 slides
Enterprise Architecture vs. Data Architecture by
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
794 views36 slides
Security in cloud computing by
Security in cloud computingSecurity in cloud computing
Security in cloud computingveena venugopal
2K views30 slides
Machine Learning in Big Data by
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big DataDataWorks Summit
2.4K views29 slides
MLOps and Data Quality: Deploying Reliable ML Models in Production by
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
211 views32 slides

What's hot(20)

Functionalities in AI Applications and Use Cases (OECD) by AnandSRao1962
Functionalities in AI Applications and Use Cases (OECD)Functionalities in AI Applications and Use Cases (OECD)
Functionalities in AI Applications and Use Cases (OECD)
AnandSRao1962461 views
Enterprise Architecture vs. Data Architecture by DATAVERSITY
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
DATAVERSITY794 views
MLOps and Data Quality: Deploying Reliable ML Models in Production by Provectus
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
Provectus211 views
Object Storage 1: The Fundamentals of Objects and Object Storage by Hitachi Vantara
Object Storage 1: The Fundamentals of Objects and Object StorageObject Storage 1: The Fundamentals of Objects and Object Storage
Object Storage 1: The Fundamentals of Objects and Object Storage
Hitachi Vantara6.5K views
Security & Privacy In Cloud Computing by saurabh soni
Security & Privacy In Cloud ComputingSecurity & Privacy In Cloud Computing
Security & Privacy In Cloud Computing
saurabh soni2.5K views
4.2 spatial data mining by Krish_ver2
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
Krish_ver210.2K views
Snowflake SnowPro Certification Exam Cheat Sheet by Jeno Yamma
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat Sheet
Jeno Yamma79.4K views
Building a Modern Data Architecture on AWS - Webinar by Amazon Web Services
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
Cloud Mashup by Vasco Elvas
Cloud MashupCloud Mashup
Cloud Mashup
Vasco Elvas13.5K views
Data-driven AI for Self-adaptive Information Systems by Andreas Metzger
Data-driven AI for Self-adaptive Information SystemsData-driven AI for Self-adaptive Information Systems
Data-driven AI for Self-adaptive Information Systems
Andreas Metzger527 views
AI in Finance: Moving forward! by Adrian Hornsby
AI in Finance: Moving forward!AI in Finance: Moving forward!
AI in Finance: Moving forward!
Adrian Hornsby979 views

Similar to Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques

Slides_for_Spark_AI_TE.pdf by
Slides_for_Spark_AI_TE.pdfSlides_for_Spark_AI_TE.pdf
Slides_for_Spark_AI_TE.pdfDhivya594398
5 views28 slides
CS 354 Understanding Color by
CS 354 Understanding ColorCS 354 Understanding Color
CS 354 Understanding ColorMark Kilgard
1.6K views42 slides
image compresson by
image compressonimage compresson
image compressonAjay Kumar
2.3K views128 slides
Image compression by
Image compressionImage compression
Image compressionAle Johnsan
10.3K views128 slides
Presentation: Plotting Systems in R by
Presentation: Plotting Systems in RPresentation: Plotting Systems in R
Presentation: Plotting Systems in RIlya Zhbannikov
326 views36 slides
Cs gate-2011 by
Cs gate-2011Cs gate-2011
Cs gate-2011Ved Prakash Gautam
560 views26 slides

Similar to Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques(20)

Slides_for_Spark_AI_TE.pdf by Dhivya594398
Slides_for_Spark_AI_TE.pdfSlides_for_Spark_AI_TE.pdf
Slides_for_Spark_AI_TE.pdf
Dhivya5943985 views
CS 354 Understanding Color by Mark Kilgard
CS 354 Understanding ColorCS 354 Understanding Color
CS 354 Understanding Color
Mark Kilgard1.6K views
image compresson by Ajay Kumar
image compressonimage compresson
image compresson
Ajay Kumar2.3K views
Image compression by Ale Johnsan
Image compressionImage compression
Image compression
Ale Johnsan10.3K views
Presentation: Plotting Systems in R by Ilya Zhbannikov
Presentation: Plotting Systems in RPresentation: Plotting Systems in R
Presentation: Plotting Systems in R
Ilya Zhbannikov326 views
The 2nd graph database in sv meetup by Joshua Bae
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
Joshua Bae421 views
Surrey dl-4 by ozzie73
Surrey dl-4Surrey dl-4
Surrey dl-4
ozzie731.3K views
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14 by Yuichiro Yasui
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Yuichiro Yasui1.6K views
Geometry Batching Using Texture-Arrays by Matthias Trapp
Geometry Batching Using Texture-ArraysGeometry Batching Using Texture-Arrays
Geometry Batching Using Texture-Arrays
Matthias Trapp2.3K views
testpang by pangpang2
testpangtestpang
testpang
pangpang2764 views
lecture1.ppt by SagarDR5
lecture1.pptlecture1.ppt
lecture1.ppt
SagarDR52 views
defense by Qing Dou
defensedefense
defense
Qing Dou319 views
Summer training on matlab by dangerahad
Summer training on matlabSummer training on matlab
Summer training on matlab
dangerahad451 views
Datastage real time scenario by Naresh Bala
Datastage real time scenarioDatastage real time scenario
Datastage real time scenario
Naresh Bala5.5K views

More from Databricks

DW Migration Webinar-March 2022.pptx by
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
4.3K views25 slides
Data Lakehouse Symposium | Day 1 | Part 1 by
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
1.5K views43 slides
Data Lakehouse Symposium | Day 1 | Part 2 by
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
741 views16 slides
Data Lakehouse Symposium | Day 4 by
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
1.8K views74 slides
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop by
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
6.3K views64 slides
Democratizing Data Quality Through a Centralized Platform by
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
1.4K views36 slides

More from Databricks(20)

DW Migration Webinar-March 2022.pptx by Databricks
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks4.3K views
Data Lakehouse Symposium | Day 1 | Part 1 by Databricks
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks1.5K views
Data Lakehouse Symposium | Day 1 | Part 2 by Databricks
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks741 views
Data Lakehouse Symposium | Day 4 by Databricks
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks1.8K views
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop by Databricks
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks6.3K views
Democratizing Data Quality Through a Centralized Platform by Databricks
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks1.4K views
Learn to Use Databricks for Data Science by Databricks
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks1.6K views
Why APM Is Not the Same As ML Monitoring by Databricks
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks743 views
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix by Databricks
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks689 views
Stage Level Scheduling Improving Big Data and AI Integration by Databricks
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks850 views
Simplify Data Conversion from Spark to TensorFlow and PyTorch by Databricks
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks1.8K views
Scaling your Data Pipelines with Apache Spark on Kubernetes by Databricks
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks2.1K views
Scaling and Unifying SciKit Learn and Apache Spark Pipelines by Databricks
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks667 views
Sawtooth Windows for Feature Aggregations by Databricks
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks605 views
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink by Databricks
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks677 views
Re-imagine Data Monitoring with whylogs and Spark by Databricks
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks550 views
Raven: End-to-end Optimization of ML Prediction Queries by Databricks
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks449 views
Processing Large Datasets for ADAS Applications using Apache Spark by Databricks
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks513 views
Massive Data Processing in Adobe Using Delta Lake by Databricks
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks719 views
Machine Learning CI/CD for Email Attack Detection by Databricks
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks389 views

Recently uploaded

SUPER STORE SQL PROJECT.pptx by
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptxkhan888620
13 views16 slides
Short Story Assignment by Kelly Nguyen by
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyenkellynguyen01
19 views17 slides
3196 The Case of The East River by
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East RiverErickANDRADE90
16 views4 slides
UNEP FI CRS Climate Risk Results.pptx by
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptxpekka28
11 views51 slides
Advanced_Recommendation_Systems_Presentation.pptx by
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptxneeharikasingh29
5 views9 slides
VoxelNet by
VoxelNetVoxelNet
VoxelNettaeseon ryu
9 views21 slides

Recently uploaded(20)

SUPER STORE SQL PROJECT.pptx by khan888620
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptx
khan88862013 views
Short Story Assignment by Kelly Nguyen by kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0119 views
3196 The Case of The East River by ErickANDRADE90
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9016 views
UNEP FI CRS Climate Risk Results.pptx by pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
Advanced_Recommendation_Systems_Presentation.pptx by neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
Data about the sector workshop by info828217
Data about the sector workshopData about the sector workshop
Data about the sector workshop
info82821712 views
Organic Shopping in Google Analytics 4.pdf by GA4 Tutorials
Organic Shopping in Google Analytics 4.pdfOrganic Shopping in Google Analytics 4.pdf
Organic Shopping in Google Analytics 4.pdf
GA4 Tutorials16 views
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P... by DataScienceConferenc1
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
Survey on Factuality in LLM's.pptx by NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra17 views
Data Journeys Hard Talk workshop final.pptx by info828217
Data Journeys Hard Talk workshop final.pptxData Journeys Hard Talk workshop final.pptx
Data Journeys Hard Talk workshop final.pptx
info82821710 views
CRIJ4385_Death Penalty_F23.pptx by yvettemm100
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptx
yvettemm1006 views
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx by DataScienceConferenc1
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
Ukraine Infographic_22NOV2023_v2.pdf by AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.4K views
PRIVACY AWRE PERSONAL DATA STORAGE by antony420421
PRIVACY AWRE PERSONAL DATA STORAGEPRIVACY AWRE PERSONAL DATA STORAGE
PRIVACY AWRE PERSONAL DATA STORAGE
antony4204215 views
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an... by StatsCommunications
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...

Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques

  • 1. 1
  • 2. Text Extraction from Product Images Rajesh Shreedhar Bhat Data Scientist @Walmart Labs, Bengaluru MS CS @ASU, Kaggle Competitions Expert
  • 3. Agenda ▪ Intro to Text Extraction ▪ Text Detection(TD) ▪ TD Model Architecture ▪ Training data generation ▪ Text Recognition(TR) training data preparation ▪ CRNN-CTC model for TR ▪ Receptive Fields ▪ CTC decoder and loss ▪ TR Training phase ▪ Other Advanced Techniques ▪ Questions ? 3
  • 4. Introduction : Text Extraction 4
  • 6. Image Segmentation - Input & Ground Truth 6
  • 7. Ground Truth Label Generation 7
  • 8. Text Detection – Model architecture ▪ VGG16 – BN as the backbone ▪ Model has skip connection in decoder part which is similar to U-Nets. ▪ Output : ▪ Region score ▪ Affinity score - grouping characters Ref:Baek, Youngmin, et al. "Character Region Awareness for Text detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. 8
  • 9. Sample Output Region Score Affinity Score Region Score Affinity Score 9
  • 12. Text Recognition – Training Data Preparation SynthText: image generation engine for building a large annotated dataset. 15 million images generated with different font styles, size, color & varying backgrounds using product descriptions + open source datasets Vocabulary: 92 characters Includes capital + small letters, numbers and special symbols 12
  • 13. Text Recognition CRNN CTC model 13
  • 14. CNN - Receptive Fields Usage of intermediate layer features in SSD’s in Object detection tasks. ▪ Receptive field is defined as the region in the input image/space that a particular CNN’s feature is looking at. 14
  • 15. CNN features to LSTM Input Image (None, 128 * 32 * 1) Feature map (None, 1, 31, 512) Feature map (None, 1, 31, 512) SoftMax probabilities for every time step (i.e. 31), over the vocabulary. 15
  • 16. Ground Truth and Output of TR task 16 Hello Input Image Ground Truth Output from LSTM model for 31 timesteps .. Time step t1 t2 t3 t4 t5 ... …. t27 t28 t29 t30 t31 Prediction H H H e e ... ... l o o o o Length of Ground truth is 5 which is not equal to length of prediction i.e 31
  • 17. How to calculate the loss? NER model – loss: categorical cross entropy • Do we have labels for every time steps of LSTM model in CRNN setting ? • Can we use cross entropy loss? Answer is: NO!! 17
  • 18. Mapping of Input to Output Corresponding Text Hello Corresponding Text Hello Can we manually align each character to its location in the audio/image? Yes!! But lot of manual effort is needed in creating training data. 18
  • 19. CTC to rescue With just mapping from image to text and not worrying about alignment of each character to the location in input image, one should be able to train the network. Merge repeats Merge repeats Drop blank character 19
  • 20. Connectionist Temporal Classification (CTC) Loss ▪ Ground truth for an image --> AB ▪ Vocabulary is { A, B, - } ▪ Let's say we have predictions for 3-time steps from LSTM network (SoftMax probabilities over vocabulary at t1, t2, t3) ▪ Given that we use CTC decode operation discussed earlier, in which scenarios we can say output from the model is correct?? 20
  • 21. CTC loss continued .. t1 t2 t3 A B B A A B - A B A - B A B - • Merge repeats • Drop blank character AB t1 t2 t3 A 0.8 0.7 0.1 B 0.1 0.1 0.8 - 0.1 0.2 0.1 Ground Truth : AB Score for one path: AAB = (0.8 * 0.7 * 0.8) and similarly for other paths. Probability of getting GT AB: = P(ABB) + P(AAB) + P(-AB) + P(A-B) + P(AB-) Loss : - log( Probability of getting GT ) SoftMax probabilities 21
  • 22. CTC loss perfect match t1 t2 t3 A - - - A - - - A - A A A A - A - A A A A • Merge repeats • Drop blank character t1 t2 t3 A 1 0 0 B 0 0 0 - 0 1 1 • SoftMax probabilities Score for one path: A- - = (1 * 1 * 1) and similarly for other paths. Probability of getting ground truth A: = P(A--) + P(-A-) + P(--A) + P(-AA) + P(AA-) + P(A-A) + P(AAA) Loss : - log( Probability of getting GT ) = 0 Ground Truth : A A 22
  • 23. CTC loss perfect mismatch • Merge repeats • Drop blank character A t1 t2 t3 A 0 0 0 B 1 1 1 - 0 0 0 SoftMax probabilities Score for one path: A - - = (0 * 0 * 0) and similarly for other paths. Probability of getting ground truth A: = P(A--) + P(-A-) + P(--A) + P(-AA) + P(AA-) + P(A-A) + P(AAA) Loss : - log( Probability of getting GT ) = tends to infinity !! Ground Truth : A t1 t2 t3 A - - - A - - - A - A A A A - A - A A A A 23
  • 24. Model Architecture & CTC loss in TF 24
  • 25. Training Phase ▪ 15 million images ~ 690 GB when loaded into memory!! Given that on an average images are of the shape (128 * 32 * 3) and dtype is float32. ▪ Usage of Python Generators to load only single batch in memory. ▪ Reducing the training time by using workers, max_queue_size & multi-processing in .fit_generator in Keras. ▪ Training time ~ 2 hours for single epoch on single P100 GPU machine and prediction time ~1 sec for batch of 2048 images. 25
  • 26. Other Advanced Techniques Attention - OCR Spatial Transformer Network – before text recognition Ref:Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in neural information processing systems. 2015. 26