This document summarizes two projects:
1. A cognitive email auto-tagger that categorizes incoming customer emails to reduce manual work and improve response times. It uses machine learning to tag emails automatically and to send quick responses to requests that can be handled without human intervention. The reported model accuracy is 85% ± 3.
2. A project to convert creditors/debtors PDF files to Excel by using OCR to extract table data from images of the PDFs. A GUI allows users to crop areas of interest for extraction. The goal is to reduce manual data entry work and support an ongoing marketing project.
1. Pranay Mathur and Aman Gill
Data Science Interns
Edelweiss Springboard: Projects in Big Data and Machine Learning
2. COGNITIVE EMAIL AUTO TAGGER
Problem Statement: To categorize incoming customer emails.
Why is this needed?
To reduce the man-hours spent on categorizing emails.
To reduce the possibility of late responses to customer needs.
Objective
To automate the email categorization task.
To improve company-customer relations by speeding up responses to customers.
We are providing
Automatic responses to multiple requests in a single email.
A self-learning model that will improve over time.
A system that speeds up responses to customers, thereby improving customer relations.
Way Forward
Can be used for other Edelweiss business units.
As data for minority categories increases, those categories can be predicted as well.
Multi-label categorization as the data improves.
4. Approach
Training Module: Previous Emails + Categories -> Trained Model
Email Auto-Tagger: Uncategorized New Emails -> Trained Model -> outputs:
Auto-Assigned Category
Other Suggested Categories
Actionable Dates
Related Loan Nos.
Instant pulse response for STPs*
*STP: Straight Through Process. A process that can be handled automatically.
Example: Statement of Accounts, IT Certificate for previous financial years
Feedback
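The slides do not name the algorithm behind the trained model, so the following is only a minimal sketch of the training and tagging flow above, assuming a small multinomial Naive Bayes classifier written with the Python standard library. The class name `NaiveBayesTagger`, the category labels, and the sample emails are all made up for illustration. The top-ranked category plays the role of the auto-assigned category and the rest stand in for the other suggested categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical tagger: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of emails
        self.vocab = set()

    def train(self, emails, categories):
        """Training module: learn word statistics from previously tagged emails."""
        for text, category in zip(emails, categories):
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocab.update(tokens)

    def rank(self, text):
        """Return every known category, sorted from most to least likely."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            scores[category] = score
        return sorted(scores, key=scores.get, reverse=True)


# Made-up training examples; a real run would use historical tagged emails.
tagger = NaiveBayesTagger()
tagger.train(
    [
        "please send my statement of accounts",
        "need the statement of accounts for my loan",
        "please share the it certificate for last financial year",
        "i want to change my registered address",
    ],
    ["statement_of_accounts", "statement_of_accounts",
     "it_certificate", "address_change"],
)

ranking = tagger.rank("kindly send the statement of accounts")
auto_assigned, other_suggested = ranking[0], ranking[1:]
print(auto_assigned, other_suggested)
```

The feedback step in the diagram could then be approximated by calling `train` again with emails whose category a representative has corrected, though the slides do not say how feedback is actually fed back into the model.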
5. Results
Status and Result
Categorization model implemented with an accuracy of 85% ± 3.
Captures Straight Through Processes (automated responses) with high accuracy.
Integration with the live server is ongoing.
Gives better results than the manual categorization done by customer representatives.
Implemented.
Approved by business.
6. OCR Driven Creditor/Debtors PDF to Excel
Problem Statement: To convert Creditors/Debtors PDF files to Excel.
Why is this needed?
To reduce the man-hours spent on manually typing the table data into Excel.
It is a sub-module of an ongoing marketing project at Edelweiss.
Objective
To convert the PDF files into de-skewed images.
To extract tables from these images by developing a user-friendly interface.
We are providing
PDF to image conversion.
GUI for cropping the area of interest.
Extraction of table data and structure from images using OCR with very good accuracy.
Way Forward
Researching the profiles of companies present in the Creditors/Debtors tables in business with Edelweiss.
Finding customers who might require a loan, with a 2% hit rate compared to 0.4% for regular advertising strategies.
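To put the two hit rates side by side, here is a short worked calculation. The 2% and 0.4% figures come from the slide; the campaign size of 10,000 contacts is a hypothetical number chosen only to make the comparison concrete.

```python
from fractions import Fraction

# Hit rates quoted on the slide, kept as exact fractions to avoid rounding.
targeted_hit_rate = Fraction(2, 100)   # 2% with the Creditors/Debtors leads
regular_hit_rate = Fraction(4, 1000)   # 0.4% with regular advertising

# Relative improvement of the targeted approach over regular advertising.
uplift = targeted_hit_rate / regular_hit_rate

# Hypothetical campaign size, for illustration only.
contacts = 10_000
targeted_hits = contacts * targeted_hit_rate
regular_hits = contacts * regular_hit_rate

print(f"uplift: {uplift}x")
print(f"expected hits per {contacts} contacts: {targeted_hits} vs {regular_hits}")
```

Under these figures the targeted approach would yield five times as many interested customers for the same number of contacts.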
7. OCR Driven Creditor/Debtors PDF to Excel
Data Flow
Creditors/Debtors PDFs -> GUI Module -> Crop Area of Interest -> OCR
GUI Module:
1. Converts PDF to image.
2. De-skews image.
Implemented.
Used by business.
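The slides do not describe how the OCR output is turned back into a table, so the following is only a sketch of one possible final step, assuming the OCR engine returns each cell as a text string with the pixel coordinates of its top-left corner. The function name `words_to_rows`, the `row_tolerance` parameter, and the sample boxes are hypothetical. Boxes at roughly the same height are grouped into a row, each row is ordered left to right, and the result is written as CSV, which Excel can open.

```python
import csv
import io


def words_to_rows(words, row_tolerance=10):
    """Group OCR boxes into table rows by vertical position.

    `words` is a list of (text, x, y) tuples, where (x, y) is the top-left
    corner of a box in pixels. A box joins the current row when its y is
    within `row_tolerance` pixels of the first box in that row.
    """
    rows = []
    current_y = None
    # Visit boxes top to bottom, then left to right.
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(y - current_y) <= row_tolerance:
            rows[-1].append((x, text))
        else:
            rows.append([(x, text)])
            current_y = y
    # Order the cells of each row from left to right and drop the coordinates.
    return [[text for _, text in sorted(row)] for row in rows]


# Made-up OCR output for a cropped two-column table (one box per cell).
ocr_boxes = [
    ("Amount", 220, 12),
    ("Party", 20, 10),
    ("ABC Traders", 20, 48),
    ("1500", 222, 50),
]

table = words_to_rows(ocr_boxes)

# Write the rows as CSV text; saving this to a .csv file lets Excel open it.
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
print(buffer.getvalue())
```

This sketch assumes one box per cell and a well de-skewed image. Cells split across several boxes, or empty cells, would need an additional column-alignment step.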