VitaFlow
Video Image Text Audio - Flow
Mageswaran Dhandapani <mageswaran.dhandapani@imaginea.com>
Agenda
- Introduction
- Receipt Information Extraction
- How is it done?
- Demo
Flashback
- We deal with a lot of text-related problems such as document clustering/classification, information extraction, etc.
- Form filling: https://github.com/Imaginea/i-tagger
- Infinity : Pramati level innovation competition
- 2018 with GANs @ https://github.com/dhiraa/asariri
- 2019 with Audio and CNNs
- Our explorations and code bases were scattered.
- We organized all our explorations under one code base called VitaFlow
- Planned R&D
- Information Extraction with an aim to generalize, i.e. independent of the dataset (Text + Image)
- Audio-related Android applications (TensorFlow Lite + Audio + Android Applications) (need funding and resources ;))
Introduction
- Problem
- A pipeline to train and deploy DL models
- Address domain specific information extraction
- Traditional IE depends on rule based engines
- Often not easily extensible for new data
- Solution
- Design an ML/DL model pipeline
- Plug and play modules at each stage
- Data sets
- Annotations
- Pre-processing / Post processing
- Training and serving the models
- Metrics to evaluate
- Feedback loop
Pipeline
1. Raw Images
2. Image Annotations (Bounding boxes)
3. Text Detection - Text Localisation / Document Orientation Analysis + Fix
a. EAST
b. DOCT2TEXT
4. Text-Cleaner/Binarization
5. Text Recognition - OCR
a. Calamari (CNN+LSTM models)
b. Tesseract
6. Text Annotations
7. ML / Statistical Inference / Rules
8. Domain Specific Extraction
9. Data Store
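The plug-and-play idea behind this pipeline can be sketched as a list of interchangeable stage functions. This is only a minimal illustration, not VitaFlow's actual API: `run_pipeline`, the stage names, and the record keys are all hypothetical, and the stages return canned values where real modules (EAST, Calamari, rules) would run.

```python
from typing import Any, Callable, Dict, List

# A stage takes the shared record (a dict) and returns an updated record.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(record: Dict[str, Any], stages: List[Stage]) -> Dict[str, Any]:
    """Run each stage in order and keep a trace so every step can be inspected."""
    record = dict(record, trace=[])
    for stage in stages:
        record = stage(record)
        record["trace"].append(stage.__name__)
    return record


# Hypothetical stand-ins for real modules; each one returns canned output.
def detect_text(record):
    return dict(record, boxes=[(0, 0, 100, 20)])


def recognize_text(record):
    return dict(record, lines=["Total Amount : 31.00" for _ in record["boxes"]])


def extract_fields(record):
    total = record["lines"][0].split(":")[1].strip()
    return dict(record, fields={"total": total})


result = run_pipeline({"image": "receipt.jpg"},
                      [detect_text, recognize_text, extract_fields])
print(result["fields"], result["trace"])
```

Swapping one OCR engine for another then means replacing a single entry in the stage list, and the trace shows which stages ran.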
Plug and Play Design
Stages shown in the diagram, with a sample receipt:
- (Natural Scene) Text Segmentation: EAST/FOTS
- Extract Text Line segments
- Image to Text (OCR): CNN + LSTM Models
- Sample OCR output:
tan chay yee
0.2s4 JALAH HARMOHI
312
Date 09/0112019 8:01:11 PM
Total Amount : 31.00
….
- Positional Information
- Information Extraction
- Rules
- Statistical Inference
- ML/DL Models
- Domain Specific Information Extraction
Vendor : tan chay yee
Total: 31
Date : 09/01/2019
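As an illustration of the rule-based option, here is a minimal sketch that pulls vendor, date, and total out of the sample receipt lines above. The regular expressions, the "first line is the vendor" rule, and the tolerance for a "/" misread as "1" are assumptions made for this example, not rules taken from VitaFlow.

```python
import re

# OCR lines from the sample receipt (OCR noise kept as-is).
ocr_lines = [
    "tan chay yee",
    "0.2s4 JALAH HARMOHI",
    "312",
    "Date 09/0112019 8:01:11 PM",
    "Total Amount : 31.00",
]

# Assumed rule: the second "/" of a date may be misread as "1" by the OCR.
DATE_RE = re.compile(r"(\d{2})/(\d{2})[/1](\d{4})")
TOTAL_RE = re.compile(r"total\s*(?:amount)?\s*:?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_receipt_fields(lines):
    # Assumed positional rule: the first line of the receipt is the vendor.
    fields = {"vendor": lines[0] if lines else None, "date": None, "total": None}
    for line in lines:
        date = DATE_RE.search(line)
        if date and fields["date"] is None:
            fields["date"] = "/".join(date.groups())
        total = TOTAL_RE.search(line)
        if total and fields["total"] is None:
            fields["total"] = float(total.group(1))
    return fields


fields = extract_receipt_fields(ocr_lines)
print(fields)
```

Rules like these are quick to write but tied to one receipt layout, which is the extensibility problem that motivates the statistical and ML/DL options.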
Annotation Tool
OCR
- OCR : Text Localization + Text Extraction
- Text Localization
- EAST (https://arxiv.org/abs/1704.03155)
- FOTS (https://arxiv.org/abs/1801.01671)
- Text Extraction
- Calamari (https://github.com/Calamari-OCR/calamari)
- Tesseract
ICDAR Dataset
- ICDAR 2015
Natural images with incidental scene text
- ICDAR 2019
Receipts and invoices
- What's unique about data preparation for OCR Text recognition?
- It's Text + Image
- Format of Images : JPEG or PNG
- Ground truth :
● One text file per image
● UTF-8 format
● Each line specifies the coordinates of one word's bounding box and its transcription in a comma-separated format
img_01.png <-> img_01.txt
x1_1,y1_1,x2_1,y2_1,x3_1,y3_1,x4_1,y4_1,transcript_1
x1_2,y1_2,x2_2,y2_2,x3_2,y3_2,x4_2,y4_2,transcript_2
x1_3,y1_3,x2_3,y2_3,x3_3,y3_3,x4_3,y4_3,transcript_3
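A ground-truth file in this format can be read with a few lines of Python. The sketch below is illustrative: `WordBox`, `parse_ground_truth`, and the sample coordinates are hypothetical names and values, and integer pixel coordinates are assumed.

```python
from typing import List, NamedTuple, Tuple


class WordBox(NamedTuple):
    corners: List[Tuple[int, int]]  # four (x, y) corner points
    transcript: str


def parse_ground_truth(text: str) -> List[WordBox]:
    boxes = []
    for raw in text.splitlines():
        # Drop a UTF-8 byte order mark if the file starts with one.
        line = raw.lstrip("\ufeff").strip()
        if not line:
            continue
        # Split on the first 8 commas only: the transcript may contain commas.
        parts = line.split(",", 8)
        if len(parts) < 9:
            raise ValueError(f"expected 8 coordinates and a transcript: {raw!r}")
        coords = [int(p) for p in parts[:8]]
        corners = list(zip(coords[0::2], coords[1::2]))
        boxes.append(WordBox(corners, parts[8].strip()))
    return boxes


# Made-up sample content in the img_01.txt layout.
sample = (
    "72, 25,326,25,326,64,72,64, RECEIPT\n"
    "50,82,440,82,440,121,50,121,Total: 1,234.00\n"
)
boxes = parse_ground_truth(sample)
print(boxes[1].transcript, boxes[0].corners)
```

Limiting the split to eight commas matters on receipts, where amounts such as "1,234.00" would otherwise be cut in two.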
Data Preparation
Input:
- RGB color image (height×width×3) or a grayscale image (height×width×1)
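Output:
- Image matrix (height×width×3)
- Score map matrix (height×width×1) : per-pixel confidence that the pixel belongs to a text region
- Geometry map matrix (height×width×5) : in EAST's rotated-box form, four distances from the pixel to the box edges plus a rotation angle. Bit complicated, expect a post on this soon!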
https://www.jeremyjordan.me/semantic-segmentation/
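To make the two target maps concrete, here is a deliberately simplified sketch for a single axis-aligned box. It assumes a channel order of top, right, bottom, left, angle, keeps the angle at 0, and skips the box shrinking used in the EAST paper; `make_target_maps` is a hypothetical helper, not part of any library.

```python
def make_target_maps(height, width, box):
    """Build score and geometry maps for one axis-aligned box.

    box is (left, top, right, bottom) in pixel coordinates, inclusive.
    Simplification: no box shrinking and a rotation angle fixed at 0.
    """
    left, top, right, bottom = box
    score = [[0.0] * width for _ in range(height)]
    # Five channels per pixel: distance to top, right, bottom, left, and angle.
    geometry = [[[0.0] * 5 for _ in range(width)] for _ in range(height)]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            score[y][x] = 1.0
            geometry[y][x] = [float(y - top), float(right - x),
                              float(bottom - y), float(x - left), 0.0]
    return score, geometry


score, geometry = make_target_maps(height=6, width=8, box=(2, 1, 5, 3))
print(score[2][3], geometry[2][3])
```

Every pixel inside the box stores enough information to reconstruct the whole box, which is what lets the detector predict boxes densely.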
Some Refreshers....
Conv Net
ResNet
- http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
- http://teleported.in/posts/decoding-resnet-architecture/
- Increase the depth of the network without hurting its generalization power
- The network can be mathematically depicted as:
H(x) = F(x) + x, where F(x) = W2*relu(W1*x+b1)+b2
- During training, the residual network learns the weights of its layers such that, if the identity mapping were optimal, all the weights are driven to 0. In effect F(x) becomes 0, so x is mapped directly to H(x) and no corrections need to be made. These identity mappings are what let the network grow deep. If the optimal mapping deviates from the identity, the weights and biases of F(x) are learned to adjust for it. Think of F(x) as learning how to adjust our predictions to match the actuals.
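The formula H(x) = F(x) + x can be checked with a tiny fully connected residual block in plain Python. This is a sketch for intuition only; real ResNet blocks use convolutions and batch normalization, and the helper names here are made up.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def affine(W, x, b):
    """Compute W @ x + b for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, x)) + bias for row, bias in zip(W, b)]


def residual_block(x, W1, b1, W2, b2):
    """H(x) = F(x) + x with F(x) = W2 * relu(W1 * x + b1) + b2."""
    f = affine(W2, relu(affine(W1, x, b1)), b2)
    return [fi + xi for fi, xi in zip(f, x)]


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
no_bias = [0.0] * 3

# With all weights and biases at zero, F(x) = 0 and the block is the identity.
identity_out = residual_block(x, zeros, no_bias, zeros, no_bias)
print(identity_out)
```

With zero weights the block passes its input through unchanged, which is why stacking more such blocks does not have to make the network worse.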
EAST
- An Efficient and Accurate Scene Text Detector
- No hand-crafted image-processing steps such as edge detection, filtering, or smoothing
- Character and word segmentation graphs
- Basically no complicated algorithms
- Detects text in images and videos
- Outputs geometry and confidence scores for the detected text
- The network architecture is based on U-Net.
- Feed forward “stem” of this network may vary
- PVANet, VGG16 used in the paper
- Our pipeline uses ResNet
- A popular text detector; it was adopted by the OpenCV library.
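How the geometry and confidence outputs become boxes can be sketched as follows. This is a simplified decoder, not the paper's or OpenCV's implementation: it assumes the channel order top, right, bottom, left, angle, ignores the angle, uses an arbitrary threshold of 0.8, and leaves out the merging step (the EAST paper uses locality-aware NMS for that).

```python
def decode_boxes(score, geometry, threshold=0.8):
    """Turn per-pixel predictions into (left, top, right, bottom, score) boxes.

    Simplification: the rotation angle is ignored and no NMS is applied,
    so neighbouring pixels of the same word yield near-duplicate boxes.
    """
    boxes = []
    for y, row in enumerate(score):
        for x, confidence in enumerate(row):
            if confidence < threshold:
                continue
            d_top, d_right, d_bottom, d_left, _angle = geometry[y][x]
            boxes.append((x - d_left, y - d_top, x + d_right, y + d_bottom, confidence))
    return boxes


# Made-up 2x3 prediction with a single confident pixel at (x=1, y=0).
score = [[0.1, 0.9, 0.2],
         [0.0, 0.3, 0.1]]
geometry = [[[0.0] * 5 for _ in range(3)] for _ in range(2)]
geometry[0][1] = [0.0, 1.0, 1.0, 1.0, 0.0]

decoded = decode_boxes(score, geometry)
print(decoded)
```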
EAST ARCHITECTURE
These skip connections from earlier layers in the network (prior to a downsampling operation) should provide the necessary detail in order to reconstruct accurate shapes for segmentation boundaries.
Loss Function
- Cross-entropy loss won't work efficiently in this case, as one segmentation class can dominate the other
- Dice Loss
- Dice = 2|A∩B| / (|A| + |B|), where |A∩B| represents the common elements between sets A and B, and |A| represents the number of elements in set A (and likewise for set B). The loss is 1 - Dice.
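The effect of class imbalance is easy to see on a toy example. The sketch below computes a soft Dice loss over flat lists of per-pixel values; the `eps` smoothing term and the sample masks are assumptions added for illustration.

```python
def dice_loss(pred, target, eps=1e-5):
    """1 - Dice coefficient for flat lists of per-pixel values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps avoids division by zero when both masks are empty.
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


target = [0, 0, 0, 0, 0, 0, 1, 1]          # text covers only 2 of 8 pixels
all_background = [0, 0, 0, 0, 0, 0, 0, 0]  # 75% pixel accuracy, finds no text
perfect = list(target)

print(dice_loss(all_background, target))
print(dice_loss(perfect, target))
```

A prediction that labels everything as background is right on 75% of the pixels here, yet its Dice loss is close to 1, so the loss is not dominated by the majority class.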
Demo - VitaFlow in Action
Looking for Demo… We just have to move to the next slide… ;)
Calamari OCR
Output - Image to Text
GUARANTEE
N.ASDA.COM/PRICEGUARANT
MILK
£1.46D
ACCOUNT WILL BE DEBITED AS
Calamari OCR
Challenges
250 KG @ E0.67/KG
PACQTQN
SNUING YOU MONEY EUERY
£180D
OHN LEUIS NEUBURY AT HONE
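One way to quantify such errors is the character error rate (CER): edit distance divided by the length of the reference text. The sketch below applies it to one of the lines above; the reference string is an assumed ground truth that does not appear in the slides.

```python
def edit_distance(a, b):
    """Levenshtein distance, keeping only two rows of the table in memory."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(predicted, reference):
    return edit_distance(predicted, reference) / max(len(reference), 1)


predicted = "SNUING YOU MONEY EUERY"
reference = "SAVING YOU MONEY EVERY"  # assumed ground truth, not given in the slides

distance = edit_distance(predicted, reference)
cer = character_error_rate(predicted, reference)
print(distance, round(cer, 3))
```

Tracking a metric like this per stage helps tell recognition errors apart from detection or cropping errors.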
Takeaways
- Deliver a model pipeline, not just models
- Make the pipeline debuggable at each stage
- Provide a feedback loop, so that humans can aid the whole process
- Extracting text from images has its own challenges
- Identifying the text from its background
- Varying sizes and fonts
- Similar-looking characters (o/0, y/g)
- Recovering text from scanned/aged images
- Multi-oriented text
- Black and white images
VitaFlow
For more information
● Code: https://github.com/Imaginea/vitaFlow
** We are in the process of piecing together all our individual efforts as a pipeline
** README will be updated shortly with end-to-end replication of this talk
Thank You! ** Conditions Apply ;)
Q & A
