Automated Cheque Processing of
Indian Bank Cheques
Story of Cheque Transactions
Bengal Bank, established in 1784, was the first bank to introduce the cheque system in India.
The majority of cheques processed are handwritten.
In 2017, 405 million cheques were used for payments and to acquire cash across the UK.
The Problem Begins…
Manual evaluation
Human errors
Workforce
Expensive
Proposed Solution
Automation
Recognition of handwritten information (digits)
Reduce manual effort
Reduce delays
Validations
What More?
Webservice Integration
Digitization for auditing
Faster operations
Parallel Processing
How is it done?
Automated Handwritten Digit Recognition
Fragmentation and Identification
Neural Networks
Validation
What did we use?
• MNIST Dataset
• Python
• Image processing libraries (PIL)
• Computer Vision libraries (OpenCV)
• Keras with TensorFlow backend
• CNN
Preprocessing
• Extract Amount ROI
• Identify ROI using pattern matching
• Convert to greyscale (invert) and intensify
• Extract Date ROI
• Segment Date ROI and Amount ROI to get individual digit images
• Resize the digit images to 28 x 28
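
The greyscale, invert, intensify and resize steps above could look roughly like the sketch below. It works on plain nested lists rather than the PIL/OpenCV calls the slides refer to, so it only illustrates the idea. The function names, the luminance weights, and the reading of "intensify" as a hard threshold at 128 are assumptions.

```python
def to_grey(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of 0-255 grey levels."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


def invert(grey):
    """Dark ink on light paper becomes light ink on black, as in MNIST."""
    return [[255 - v for v in row] for row in grey]


def intensify(grey, threshold=128):
    """Assumed meaning of 'intensify': push every pixel to pure black or white."""
    return [[255 if v >= threshold else 0 for v in row] for row in grey]


def resize_nearest(grey, size=28):
    """Nearest-neighbour resize to size x size."""
    h, w = len(grey), len(grey[0])
    return [[grey[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def preprocess(rgb_image):
    """Greyscale -> invert -> intensify -> 28 x 28."""
    return resize_nearest(intensify(invert(to_grey(rgb_image))))


# Tiny 2 x 2 "scan": white paper with one dark ink pixel in the top-right corner.
scan = [[(255, 255, 255), (20, 20, 20)],
        [(255, 255, 255), (255, 255, 255)]]
digit = preprocess(scan)
```

In a real pipeline the same steps would normally be done with library calls on image arrays; the list version is only meant to make each step visible.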
Algorithms
• Random Forest
  Pros: works with uneven datasets and missing values
  Cons: tendency to overfit
• Neural Networks
  Pros: fault tolerance, parallel processing
  Cons: black-box solution
Architecture
Scanned Images → Image Processor → ROI Extractor → Digit Segmenter → Digit Identifier (using Model) → Combiner → Business Validator
Code Concepts
• Image Processing
  • Resize
  • Convert to greyscale (invert)
  • Intensify
• ROI Extraction
  • Sectioning the area
  • Pattern matching
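
One way to picture ROI extraction by pattern matching: slide a small reference pattern over the cheque image, keep the position with the smallest pixel difference, and crop the region at a fixed offset from it. The sketch below uses a sum of absolute differences on nested lists. The function names, the marker pattern, and the offset of the ROI relative to the marker are all hypothetical; the slides do not say which matching method was used.

```python
def match_template(image, template):
    """Return ((row, col), score) of the best match (lowest sum of absolute differences)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(abs(image[r + dr][c + dc] - template[dr][dc])
                        for dr in range(th) for dc in range(tw))
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


def crop(image, top, left, height, width):
    """Cut a rectangular section out of the image."""
    return [row[left:left + width] for row in image[top:top + height]]


# Toy 6 x 8 greyscale image: a 2 x 2 marker at (1, 2) and "ink" to its right.
image = [[0] * 8 for _ in range(6)]
image[1][2] = 255
image[2][3] = 255
image[1][5] = 200
image[2][5] = 200

marker = [[255, 0],
          [0, 255]]

(pos_row, pos_col), score = match_template(image, marker)

# Hypothetical layout: the amount ROI starts right after the marker, 2 rows x 3 columns.
amount_roi = crop(image, pos_row, pos_col + len(marker[0]), 2, 3)
```

"Sectioning the area" would narrow the search first, for example by running the match only over the part of the cheque where the amount box is expected.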
Code Concepts
• Digit Segmentation
  • Find contours
  • Find bounding boxes and extract
  • Pad digit with black background
• Digit Identification
  • Load model and identify the digit
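
The segmentation steps can be sketched without any image library: group touching ink pixels into components, take each component's bounding box, and paste the cropped digit onto a square black canvas. This is a stand-in for the contour and bounding-box functions a computer vision library would provide, not the deck's actual code; the function names and the one-pixel margin are assumptions.

```python
from collections import deque


def find_digit_boxes(binary):
    """Bounding boxes (top, left, bottom, right), inclusive, sorted left to right."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill over 8-connected ink pixels.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda box: box[1])


def extract_and_pad(binary, box, margin=1):
    """Crop one digit and centre it on a square black (0) canvas."""
    top, left, bottom, right = box
    digit = [row[left:right + 1] for row in binary[top:bottom + 1]]
    dh, dw = len(digit), len(digit[0])
    side = max(dh, dw) + 2 * margin
    canvas = [[0] * side for _ in range(side)]
    oy, ox = (side - dh) // 2, (side - dw) // 2
    for y in range(dh):
        for x in range(dw):
            canvas[oy + y][ox + x] = digit[y][x]
    return canvas


# Toy ROI with two separate "digits": a vertical stroke and a ring.
roi = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 255, 0, 0, 0, 255, 255, 255, 0],
    [0, 255, 0, 0, 0, 255, 0, 255, 0],
    [0, 255, 0, 0, 0, 255, 255, 255, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = find_digit_boxes(roi)
padded = [extract_and_pad(roi, box) for box in boxes]
```

Padding to a square before resizing keeps a narrow digit such as 1 from being stretched. Digits whose strokes touch come out as a single box with this approach.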
Code Concepts
• Combiner
  • Get date string and convert to datetime
  • Get amount as string and convert to int
• Business Validation
  • Read validation CSV and compare values
  • Print results
amtPredicted  dtPredicted  fname                        CorrectAmtPrediction  CorrectDtPrediction  Lessthan90days  validCheque  humanValNeeded
567           29/11/18     cheque_3_noisy.jpg           TRUE                  TRUE                 TRUE            TRUE         0
29            19/07/15     cheque_4_amountStarted0.jpg  TRUE                  TRUE                 FALSE           FALSE        0
10000         11/02/19     cheque_5_comma.jpg           TRUE                  TRUE                 TRUE            TRUE         0
1234          10/10/18     cheque_2.jpg                 TRUE                  TRUE                 FALSE           FALSE        0
127           22/05/17     hdfc_joinedNumbers.jpg       FALSE                 FALSE                FALSE           FALSE        1
890           01/12/18     cheque_1.jpg                 TRUE                  TRUE                 TRUE            TRUE         0
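
A minimal sketch of how the combiner and business validation might fit together, reusing the column names from the results table. Several details are assumptions: the six-digit DDMMYY date layout, the validation CSV columns (`fname`, `amount`, `date`), the processing date of 15 February 2019, and the rule that human validation is needed whenever a prediction disagrees with the validation file.

```python
import csv
import io
from datetime import datetime, timedelta


def combine_date(digits):
    """Join six predicted digits (assumed DDMMYY) and parse them as a date."""
    return datetime.strptime("".join(str(d) for d in digits), "%d%m%y")


def combine_amount(digits):
    """Join predicted digits and convert the string to an int."""
    return int("".join(str(d) for d in digits))


def load_expected(csv_text):
    """Index the validation CSV rows by file name."""
    return {row["fname"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def validate(fname, amount, date, expected, today, max_age_days=90):
    """Compare predictions with the validation file and apply the 90-day rule."""
    row = expected[fname]
    amount_ok = amount == int(row["amount"])
    date_ok = date == datetime.strptime(row["date"], "%d/%m/%y")
    age = today - date
    fresh = timedelta(0) <= age < timedelta(days=max_age_days)
    return {
        "CorrectAmtPrediction": amount_ok,
        "CorrectDtPrediction": date_ok,
        "Lessthan90days": fresh,
        "validCheque": amount_ok and date_ok and fresh,
        # Assumed rule: a person checks the cheque when a prediction was wrong.
        "humanValNeeded": int(not (amount_ok and date_ok)),
    }


VALIDATION_CSV = """fname,amount,date
cheque_3_noisy.jpg,567,29/11/18
cheque_2.jpg,1234,10/10/18
"""

expected = load_expected(VALIDATION_CSV)
today = datetime(2019, 2, 15)  # hypothetical processing date

recent = validate("cheque_3_noisy.jpg", combine_amount([5, 6, 7]),
                  combine_date([2, 9, 1, 1, 1, 8]), expected, today)
stale = validate("cheque_2.jpg", combine_amount([1, 2, 3, 4]),
                 combine_date([1, 0, 1, 0, 1, 8]), expected, today)
```

With that processing date, the first cheque passes all checks, while the second is read correctly but is older than 90 days, which mirrors the corresponding rows of the table.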
Code Concepts
• Model
• Image Data Generator
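
The slides do not show the network itself, so the following is only a plausible small CNN for 28 x 28 digit images, with a Keras image data generator for light augmentation. The layer sizes, augmentation ranges, and function names are assumptions, and `ImageDataGenerator` is marked as deprecated in recent TensorFlow releases. TensorFlow is imported inside the functions so that the shape helpers can be used without it installed.

```python
def conv_output_size(size, kernel, stride=1):
    """Output width of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1


def flattened_features(size=28, filters=64):
    """Feature count after conv3 -> pool2 -> conv3 -> pool2 on a size x size input."""
    size = conv_output_size(size, 3) // 2   # 28 -> 26 -> 13
    size = conv_output_size(size, 3) // 2   # 13 -> 11 -> 5
    return size * size * filters


def build_cnn(num_classes=10):
    """Hypothetical CNN; requires TensorFlow to be installed."""
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(32, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def build_augmenter():
    """Small random shifts, rotations and zooms to mimic handwriting variation."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    return ImageDataGenerator(rotation_range=10,
                              width_shift_range=0.1,
                              height_shift_range=0.1,
                              zoom_range=0.1)
```

Training would then feed batches from `build_augmenter().flow(x_train, y_train, batch_size=...)` into `model.fit`, where `x_train` has shape `(n, 28, 28, 1)`.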
Results
• ROI extraction accuracy
  IoU / Jaccard index
• MNIST accuracy
  Model training
  Overall accuracy 95-98%
  Digit-level misclassifications
  Confusion matrix
  Classification report
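
For reference, the IoU (Jaccard index) used to score ROI extraction is the area of overlap between the predicted and the ground-truth box divided by the area of their union. A short sketch, assuming boxes are given as `(left, top, right, bottom)` with exclusive right and bottom edges:

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0


# Predicted ROI shifted half a box width to the right of the ground truth.
half_overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

A value of 1.0 means the extracted ROI matches the annotated one exactly, and 0.0 means the two do not overlap at all.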
Production Plan
• Use more than one algorithm to predict digits
• Provide as a web service
• Provide more configurable options for generalization
• Use rule-based predictions for the date
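
Using more than one algorithm could be as simple as a majority vote over the per-digit predictions, with the digit sent for human review when no strict majority exists. The sketch below assumes each model returns one label per digit image; the function name and the review rule are hypothetical, not part of the original plan.

```python
from collections import Counter


def vote(predictions):
    """Return (digit, needs_review) for one digit image.

    predictions: labels from different models, e.g. [7, 7, 1].
    digit is None when no label has a strict majority.
    """
    label, count = Counter(predictions).most_common(1)[0]
    if count * 2 > len(predictions):
        return label, False
    return None, True


agreed = vote([7, 7, 1])     # two of three models say 7
disputed = vote([0, 6, 8])   # every model disagrees
```

A disputed digit could feed the same human-validation flag that the business validator already reports.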
Challenges
• Joined digits
• Model accuracy for digits like 0 and 1
• Orientation of image
• Image quality and noise
• Special symbols
• Decimal values
Automated cheque recognition
