VitaFlow
Video Image Text Audio - Flow
Mageswaran Dhandapani <mageswaran.dhandapani@imaginea.com>
Agenda
- Introduction
- Receipt Information Extraction
- How is it done?
- Demo
Flashback
- We deal with a lot of text-related problems, such as document clustering/classification and information extraction.
- Form filling: https://github.com/Imaginea/i-tagger
- Infinity : Pramati level innovation competition
- 2018 with GANs @ https://github.com/dhiraa/asariri
- 2019 with Audio and CNNs
- Our exploration and code base were scattered.
- We organized all our explorations under one code base called VitaFlow
- Planned R&D
- Information extraction that aims to generalize, i.e. to be independent of the dataset (Text + Image)
- Audio-related Android applications (TensorFlow Lite + Audio + Android Applications) (need funding and resources ;))
Introduction
- Problem
- A pipeline to train and deploy DL models
- Address domain specific information extraction
- Traditional IE depends on rule based engines
- Often not easily extensible for new data
- Solution
- Design an ML/DL model pipeline
- Plug and play modules at each stage
- Data sets
- Annotations
- Pre-processing / Post processing
- Training and serving the models
- Metrics to evaluate
- Feedback loop
Pipeline
1. Raw Images
2. Image Annotations (Bounding boxes)
3. Text Detection - Text Localisation / Document Orientation Analysis + Fix
a. EAST
b. DOCT2TEXT
4. Text-Cleaner/Binarization
5. Text Recognition - OCR
a. Calamari (CNN+LSTM models)
b. Tesseract
6. Text Annotations
7. ML / Statistical Inference / Rules
8. Domain Specific Extraction
9. Data Store
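The plug-and-play idea behind these stages can be sketched in a few lines. This is only an illustration, not VitaFlow's actual API: the function names (`detect_text`, `recognize_text`, `extract_fields`, `run_pipeline`) and the dict-based context are assumptions, and the detector and OCR stages are stand-ins that operate on plain strings instead of images.

```python
from typing import Callable, Dict, List, Tuple

# A stage takes the shared context dict and returns an updated copy.
Stage = Callable[[Dict], Dict]


def detect_text(ctx: Dict) -> Dict:
    # Stand-in for a text detector such as EAST: the "image" here is already
    # a list of text lines, so each line simply becomes one segment.
    return {**ctx, "segments": list(ctx["image"])}


def recognize_text(ctx: Dict) -> Dict:
    # Stand-in for OCR (Calamari or Tesseract): only normalises whitespace.
    return {**ctx, "lines": [" ".join(s.split()) for s in ctx["segments"]]}


def extract_fields(ctx: Dict) -> Dict:
    # Stand-in for rule-based extraction: split "key : value" lines.
    fields = {}
    for line in ctx["lines"]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return {**ctx, "fields": fields}


def run_pipeline(stages: List[Tuple[str, Stage]], ctx: Dict) -> Dict:
    # Keep every intermediate result so each stage can be inspected on its own.
    trace = {}
    for name, stage in stages:
        ctx = stage(ctx)
        trace[name] = ctx
    return {**ctx, "trace": trace}


stages = [
    ("detect", detect_text),
    ("recognize", recognize_text),
    ("extract", extract_fields),
]
result = run_pipeline(stages, {"image": ["Total Amount :  31.00", "tan chay yee"]})
print(result["fields"])
```

Because every stage has the same signature, swapping one detector or OCR engine for another only means replacing one entry in `stages`, and the `trace` keeps each stage's output available for debugging.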
Plug and Play Design
[Figure: the plug-and-play stages applied to a sample receipt]
- (Natural Scene) Text Segmentation: EAST/FOTS, extract text line segments
- Image to Text (OCR): CNN + LSTM models
- Sample OCR output:
tan chay yee
0.2s4 JALAH HARMOHI
312
Date 09/0112019 8:01:11 PM
Total Amount : 31.00
….
- Information Extraction: Rules, Statistics Inference, ML/DL Models
- Positional Information
- Domain Specific Information Extraction:
Vendor : tan chay yee
Total: 31
Date : 09/01/2019
Annotation Tool
OCR
- OCR : Text Localization + Text Extraction
- Text Localization
- EAST (https://arxiv.org/abs/1704.03155)
- FOTS (https://arxiv.org/abs/1801.01671)
- Text Extraction
- Calamari (https://github.com/Calamari-OCR/calamari)
- Tesseract
ICDAR Dataset
- ICDAR 2015
Natural images with incidental scene text
- ICDAR 2019
Receipts and invoices
- What's unique about data preparation for OCR Text recognition?
- It's Text + Image
- Format of Images : JPEG or PNG
- Ground truth :
● One text file per image,
● UTF-8 format
● Each line specifies the coordinates of one word's bounding box and its transcription in a comma-separated format
img_01.png <-> img_01.txt
x1_1,y1_1,x2_1,y2_1,x3_1,y3_1,x4_1,y4_1,transcript_1
x1_2,y1_2,x2_2,y2_2,x3_2,y3_2,x4_2,y4_2,transcript_2
x1_3,y1_3,x2_3,y2_3,x3_3,y3_3,x4_3,y4_3,transcript_3
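A ground-truth file in this layout can be read with a small parser. The sketch below is an assumption-laden illustration: the names `WordBox` and `parse_ground_truth` are made up, the coordinates are assumed to be integers, and the sample lines are invented rather than taken from the dataset. The one detail worth noting is that a transcript may itself contain commas, so only the first eight commas are treated as separators.

```python
from typing import List, NamedTuple, Tuple


class WordBox(NamedTuple):
    corners: List[Tuple[int, int]]  # four (x, y) vertices of the bounding box
    transcript: str


def parse_ground_truth(text: str) -> List[WordBox]:
    boxes = []
    for line in text.splitlines():
        line = line.strip().lstrip("\ufeff")  # drop a leading byte-order mark
        if not line:
            continue
        # Split on the first 8 commas only: the transcript may contain commas.
        parts = line.split(",", 8)
        if len(parts) < 9:
            raise ValueError(f"expected 8 coordinates and a transcript: {line!r}")
        coords = [int(p) for p in parts[:8]]
        corners = list(zip(coords[0::2], coords[1::2]))
        boxes.append(WordBox(corners, parts[8].strip()))
    return boxes


# Invented sample content in the img_01.txt layout.
sample = (
    "377,117,463,117,465,130,378,130,Genaxis Theatre\n"
    "10, 20,110,20,110,60,10,60, 1,200.00\n"
)
boxes = parse_ground_truth(sample)
for box in boxes:
    print(box.corners, box.transcript)
```

When reading from disk, opening the file with `encoding="utf-8-sig"` is another way to handle a byte-order mark.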
Data Preparation
Input:
- RGB color image (height×width×3) or a grayscale image (height×width×1)
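Output:
- Image matrix (height×width×3)
- Score map matrix (height×width×1) : per-pixel text / no-text confidence (in the EAST paper the positive region is a shrunk version of the ground-truth box)
- Geometry map matrix (height×width×5) : in the EAST paper's rotated-box form, four distances from the pixel to the box edges plus the rotation angle. Bit complicated, expect a post on this soon!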
https://www.jeremyjordan.me/semantic-segmentation/
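To make the output shapes concrete, here is a deliberately simplified sketch that builds a score map and a 5-channel geometry map for a single axis-aligned box. It is not the EAST label-generation code: the function name `make_maps`, the uniform `shrink` factor, and the zero rotation angle are all assumptions made for illustration. Each positive pixel stores its distances to the top, right, bottom and left edges of the original box, followed by the angle.

```python
def make_maps(height, width, box, shrink=0.3):
    """Build toy training targets for one axis-aligned box.

    box = (x_min, y_min, x_max, y_max), with the max values exclusive.
    Returns (score, geometry) with shapes height×width and height×width×5.
    """
    x0, y0, x1, y1 = box
    # Shrink the box so that only pixels well inside the text are positive.
    dx = int((x1 - x0) * shrink / 2)
    dy = int((y1 - y0) * shrink / 2)
    sx0, sy0, sx1, sy1 = x0 + dx, y0 + dy, x1 - dx, y1 - dy

    score = [[0] * width for _ in range(height)]
    geometry = [[[0.0] * 5 for _ in range(width)] for _ in range(height)]
    for y in range(sy0, sy1):
        for x in range(sx0, sx1):
            score[y][x] = 1
            # Distances to top, right, bottom, left edges, then rotation angle.
            geometry[y][x] = [y - y0, x1 - x, y1 - y, x - x0, 0.0]
    return score, geometry


score, geometry = make_maps(height=8, width=12, box=(2, 2, 10, 6))
positives = sum(sum(row) for row in score)
print(positives, geometry[3][4])
```

A real implementation would handle rotated quadrilaterals, several boxes per image, and the network's reduced output resolution.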
Some Refreshers....
Conv Net
ResNet
- http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
- http://teleported.in/posts/decoding-resnet-architecture/
- Increase the depth of the network without hurting its generalization power
- The network can be mathematically depicted as:
H(x) = F(x) + x, where F(x) = W2*relu(W1*x+b1)+b2
- During training, the residual block learns its weights so that, if the identity mapping is optimal, the weights of F(x) are driven toward 0. F(x) then becomes 0, x is passed straight through to H(x), and no correction is needed. These identity mappings are what let the network grow deep. If the optimal mapping deviates from the identity, the weights and biases of F(x) are learned to adjust for it. Think of F(x) as learning how to adjust our predictions to match the actuals.
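The residual formula H(x) = F(x) + x, with F(x) = W2*relu(W1*x+b1)+b2, can be checked numerically. The sketch below uses plain Python lists instead of a deep learning framework, and the helper names (`matvec`, `add`, `relu`, `residual_block`) and the toy weights are assumptions for illustration. With all weights and biases at zero the block reduces to the identity mapping. With non-zero weights, F(x) acts as a correction added to x.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    # Matrix-vector product, one output entry per row of w.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def add(a: Vector, b: Vector) -> Vector:
    return [ai + bi for ai, bi in zip(a, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, xi) for xi in x]


def residual_block(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    # F(x) = W2 * relu(W1 * x + b1) + b2
    f = add(matvec(w2, relu(add(matvec(w1, x), b1))), b2)
    # H(x) = F(x) + x, the skip connection
    return add(f, x)


x = [1.0, -2.0]
zeros_m = [[0.0, 0.0], [0.0, 0.0]]
zeros_v = [0.0, 0.0]

# All-zero weights: F(x) = 0, so the block is exactly the identity.
identity_out = residual_block(x, zeros_m, zeros_v, zeros_m, zeros_v)

# Toy non-zero weights: F(x) adds a correction on top of x.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]
corrected_out = residual_block(x, w1, zeros_v, w2, zeros_v)

print(identity_out, corrected_out)
```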
EAST
- An Efficient and Accurate Scene Text Detector
- No hand-crafted image processing steps such as edge detection, filtering or smoothing
- No intermediate character or word segmentation graphs
- Basically no complicated multi-stage algorithms
- Detects text in images and videos
- Outputs geometry and confidence scores for the detected text
- The network architecture is based on U-Net
- The feed-forward “stem” of this network may vary
- PVANet and VGG16 are used in the paper
- Our pipeline uses ResNet
- A popular text detector; it was adopted by the OpenCV library
EAST ARCHITECTURE
These skip connections from earlier layers in the network (prior to a downsampling operation) should provide
the necessary detail in order to reconstruct accurate shapes for segmentation boundaries.
Loss Function
- Cross-entropy loss won’t work well in this case, as one class (typically the background) can dominate the other
- Dice Loss
- Dice(A, B) = 2|A∩B| / (|A| + |B|), and the loss is 1 - Dice(A, B)
- Where |A∩B| represents the common elements between sets A and B, and |A| represents the number of elements in set A (and likewise for set B).
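As a rough illustration, the Dice coefficient 2|A∩B| / (|A| + |B|) and the loss 1 - Dice can be computed on flattened masks as below. The function names and the smoothing term `eps` are assumptions, not taken from the VitaFlow code. The example uses a mask where only 2 of 8 pixels are text: predicting "all background" is 75% pixel-accurate, yet its Dice loss is close to 1, which is why this loss copes better with the imbalance.

```python
def dice_coefficient(pred, target, eps=1e-6):
    # |A ∩ B| for soft or binary masks is the sum of element-wise products.
    intersection = sum(p * t for p, t in zip(pred, target))
    # |A| + |B| is the sum of all mask values.
    total = sum(pred) + sum(target)
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def dice_loss(pred, target):
    return 1.0 - dice_coefficient(pred, target)


target = [1, 1, 0, 0, 0, 0, 0, 0]          # 2 text pixels out of 8
all_background = [0, 0, 0, 0, 0, 0, 0, 0]  # 75% pixel accuracy, finds no text
half_right = [1, 0, 0, 0, 0, 0, 0, 0]      # finds one of the two text pixels

loss_perfect = dice_loss(target, target)
loss_background = dice_loss(all_background, target)
loss_half = dice_loss(half_right, target)
print(loss_perfect, loss_background, loss_half)
```

In a training setup the same expression would be written with tensor operations so that it stays differentiable with respect to the predicted score map.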
Demo - VitaFlow in Action
Looking for Demo…
We just have to move to next
slide… ;)
Calamari OCR
Output - Image to Text
GUARANTEE
N.ASDA.COM/PRICEGUARANT
MILK
£1.46D
ACCOUNT WILL BE DEBITED AS
Calamari OCR
Challenges
250 KG @ E0.67/KG
PACQTQN
SNUING YOU MONEY EUERY
£180D
OHN LEUIS NEUBURY AT HONE
Takeaways
- Deliver model pipeline, not just models
- Make the pipeline debuggable at each stage
- Provide feedback loop, so that humans can aid the whole process
- Extracting text from images has its own challenges
- Identifying the text from its background
- Varying size and fonts
- Similar looking characters (o/0, y/g)
- Recovering text from scanned/aged images
- Multi oriented text
- Black and white images
VitaFlow
For more information
● Code: https://github.com/Imaginea/vitaFlow
** We are in the process of piecing together all our individual efforts as a pipeline
** README will be updated shortly with an end-to-end replication of this talk
Thank You! ** Conditions Apply ;)
Q & A
