Address Problems - Motivation and Challenges
Address Classification or Route Assignment
Monkey Typed Address Classification
Address Clustering
Recent Advances in Address Modelling
Address Models for the Indian e-Commerce Domain
T. Ravindra Babu, Ph.D.
Head, Data Science, Sahaj.ai
16 December 2021
T. Ravindra Babu, Ph.D. Address Models for the Indian e-Commerce Domain
Presentation Plan
- Address Problems - Motivation and Challenges
- Address Classification or Route Assignment
  - Data Challenges, Modeling, Solutions and Deployment
  - Experimental Results
  - Summary
- Monkey Typed Address Classification
  - Context Setting
  - Solution Overview
  - Experimentation
  - Summary
- Address Clustering
  - Motivation, Solution Overview and Experimental Results
- Recent Advances in Address Modelling
Motivation and Problem Definition
- Definition of an address [1]: an address specifies a location by reference to a thoroughfare or a landmark, or it specifies a point of postal delivery.
- Motivation
  - Non-standard addresses
  - Spelling variations
  - Inadvertent separation or joining of area names
  - Long addresses and their equivalence
  - Grouping of "similar" addresses for fraud detection
  - Origin, identification and isolation of monkey-typed addresses
  - Non-deliverable or incomplete addresses
[1] FGDC Subcommittee for Culture and Demographic Data. 2001. United States Thoroughfare, Landmark, and Postal Address Data Standard. https://www.fgdc.gov/standards/projects/address-data/AddressDataStandardPart01
Address Classification
- Problem Definition
- Typical operations scenario at a delivery hub without a model
  - Inscan of shipments received from the mother hub
  - Manual reading of each address and assignment to a route/FE
  - Sorting and delivery
- Overview of Proposed Solution
  - Capturing FEs' domain knowledge and modelling around it
  - Classifying an address as belonging to a pre-defined subarea
  - Allocating shipments to a route/FE using a machine-learning-based classifier
  - Sorting and delivery
Delivery Hub and Subareas
Figure: Hub and Subareas
Insights into Address Data
- The number of words in an address ranges from 4 to 75, apart from a few outliers of more than 100 words.
- A word like "apartments" is spelt in 263 different ways; "whitefield" in 24 ways, "industrial" in 25 ways, "Bangalore" in 161 ways, "karnataka" in 70 ways, etc.
- Structure is lacking in addresses, even in a city like Bangalore. A few examples follow.
- Some words are specific to certain places or states. Examples: halli, hobli; bawdi, kuan; society; layout; etc.
- Addressing systems across the world: US, Europe, Korea, Japan; countries like Brazil and India
Sample Addresses
Table: Sample Addresses
Sl. No.  Address
1        Raghavendra Layout PattanagereBhel Layout Rajarajeshwari nagar 560098
2        Adval Infotech BaNakal Karnataka India 560019
3        Jyothi Enclave 1st A cross Kaggadaspura CV Raman nagar opposite August Park 560093
4        1st Main Cross 2ndBCross Nanjappa Layout Adugudi. Ganesha Temple 560030
5        OCEANOUS TRITON OPP TOTAL MALL OFF SARJAPUR RDpo bellundur 560103
Example of address with 75 words
Example: AD5LXSZJYRIT40GGELRLWM Flat No. 1005, Oceanus
Greendale Apartment, Jasmine Block, Hoysala Nagar 3rd Main
Road, Ram Murthy Nagar, Bangalore - 560016, L/M- Opposite
Lord Ganesha Temple on Hoysalanagar 3r d Main Road.
Directions- 1. Take Right on Banaswadi signal on outer Ring
Road. 2. Enter Horamavu M ain Road. 3. After Railway gate take
first right. 4.At the dead end turh right, Lord ganesha templ e will
be on the left. 5. Opposite Road will take you to the apartment.
Bangalore 5 60016 2011-11-17 18:32:31
Preprocessing and Modelling Aspects
- An elaborate preprocessing model was necessary, accounting for the items below. The objective is to retain only those terms that help classification (discriminability).
  - Cleaning of addresses
  - Probabilistic separation of words
  - Integrating domain knowledge
  - Machine-learning-based dictionary generation
  - Classification of potentially fraudulent addresses
  - Generation of n-grams using a modified Frequent Pattern (FP)-tree
Preprocessing for Data Compaction
Figure: Impact of Preprocessing
Equivalent set generation using Clustering
Table: A Sample of Equivalent Terms identified by Clustering
Cluster Prototype    Cluster Members
adichunchagiri       adichuncangiri, adhichunchangiri, aadichunchangiri, adichunchungiri, adichunchangiri, adichunchungiri
apartment            apartment, apartmenet, apartmanet, apartmenst, apartmenrt, apartmennt, apartment, appratment, aparatment, appratmant, aparment, apartent
yewshwanthpur        yewsanthpur, yeswhwanthpur, yeshwenthpur, yeshwanyhpur, yeshwantrhpur, yeshwanthpua, yeswantpur, yesvantpur, yeshantpur, yaswantpur
Fraud Address Classification - Address Strings
Sl.No. Address
1 adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
2 gasdfashagadfasmejastic
3 fdgdf
4 hjsdhaddsdsasdsa
5 dsfadafadsasdfsdafsda
6 hjsdhaddsdsasdsa
7 asd
8 lmflvml
9 assasfsafasfsasfsfsafashaphilomena
10 faskjbdasdlkjbsaasd
Fraud Address Classification-Address Strings-Heatmap
Figure: MonkeyType Addresses
Fraud Address Classification - Items Bought
Figure: Items bought by such people
Probabilistic Separation of Compound Words
- To a large extent, addresses are not amenable to English dictionaries.
- When addresses are written, the space between words is often lost, either because the customer inadvertently omits it or because it is removed during storage/retrieval.
- Separating such compound words
  - Compute empirical probabilities of words
  - Assuming conditional independence, if the joint probability of a compound word is less than the product of the probabilities of the individual words, separate the words
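The separation rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the deployed implementation: probabilities are plain relative frequencies over a token list, and the names `build_probabilities`, `split_compound` and the `min_len` cut-off are hypothetical.

```python
from collections import Counter

def build_probabilities(tokens):
    """Empirical probability of each token: count / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def split_compound(token, prob, min_len=3):
    """Return (left, right) if some split beats the unsplit token, else None.

    A split is accepted when P(left) * P(right) > P(token), i.e. the rule
    stated above under the independence assumption. min_len is an assumed
    guard against splitting off very short fragments.
    """
    best, best_score = None, prob.get(token, 0.0)
    for i in range(min_len, len(token) - min_len + 1):
        left, right = token[:i], token[i:]
        score = prob.get(left, 0.0) * prob.get(right, 0.0)
        if score > best_score:
            best, best_score = (left, right), score
    return best

# Toy corpus (made up for illustration only)
corpus = ["sarjapur"] * 50 + ["road"] * 80 + ["layout"] * 20 + ["sarjapurroad"]
prob = build_probabilities(corpus)
print(split_compound("sarjapurroad", prob))  # ('sarjapur', 'road')
print(split_compound("layout", prob))        # None
```

Unseen fragments get probability zero here, so a token is only split into parts that already occur in the corpus.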
Frequent Pattern Tree for n-gram Generation
- The frequent pattern (FP) tree is a well-established approach for mining large datasets.
- We implement a modified version of the tree to generate n-grams.
  - Conventional method
  - New approach
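The slides do not spell out the modification made to the FP-tree, so the following Python sketch should not be read as the authors' method. It only illustrates the general idea of a count-annotated prefix tree over word sequences, from which n-grams meeting a minimum support are read off. The names `Node`, `build_ngram_tree`, `frequent_ngrams`, `max_n` and `min_support` are all assumptions.

```python
from collections import defaultdict

class Node:
    """Tree node holding how often the word path ending here was seen."""
    def __init__(self):
        self.count = 0
        self.children = defaultdict(Node)

def build_ngram_tree(addresses, max_n=3):
    """Insert every word sequence of length <= max_n, from every start position."""
    root = Node()
    for address in addresses:
        words = address.split()
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_n]:
                node = node.children[word]
                node.count += 1
    return root

def frequent_ngrams(root, min_support=2):
    """Collect n-grams whose count reaches min_support.

    An n-gram can never be more frequent than its prefix, so a branch is
    abandoned as soon as a node falls below the threshold.
    """
    result = {}
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for word, child in node.children.items():
            if child.count >= min_support:
                ngram = prefix + (word,)
                result[ngram] = child.count
                stack.append((ngram, child))
    return result

tree = build_ngram_tree(["cv raman nagar", "cv raman nagar bangalore", "raman nagar"])
for ngram, count in sorted(frequent_ngrams(tree).items()):
    print(" ".join(ngram), count)
```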
Clustering for equivalent set of words with spell variations
koramanagala koromangala kormanagala koramnagala
koramangalato kanamangala koramanagla koremangala
koaramangala koramamgala karamangala tkoramangala
kormangalla koramongala koarmangala korammangala
koramangalla koramangale koramanagal
electronice eclectronic elelctronic eelectronic electronica electroincs
electronics electroninc electrinics electroncis electronincs
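One simple way to obtain groups like the two above is a single-pass (leader) clustering over edit distance. The slides do not state which clustering algorithm or distance was used, so this Python sketch is an assumption throughout; the `threshold` value and the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def leader_cluster(words, threshold=2):
    """Assign each word to the first leader within `threshold` edits,
    or make it a new leader (cluster prototype) if none is close enough."""
    clusters = {}
    for word in words:
        for leader in clusters:
            if edit_distance(word, leader) <= threshold:
                clusters[leader].append(word)
                break
        else:
            clusters[word] = [word]
    return clusters

words = ["koramangala", "koromangala", "koramangalla",
         "electronics", "electroincs", "electronica"]
for leader, members in leader_cluster(words).items():
    print(leader, "->", members)
```

The result depends on input order, because the first word seen in a group becomes its prototype. Ordering words by descending frequency is one plausible way to make the commonest spelling the leader.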
Clustering for ... spell variations
bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad,
bannerughattaroad, bannarghattaroad, banergattaroad,
banneraghattaroad, bannerghettaroad, bannerugattaroad,
bhannerghattaroad, bennerghattaroad, bannerghttaroad,
bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad,
bennarghattaroad, baneerghattaroad, bannergettaroad,
banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad,
benerghattaroad, bannerghattaroadto, bannergataroad,
bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad,
bannnerghattaroad, bannarghettaroad, banerughattaroad,
bannergahttaroad, bhannerughattaroad, bennergattaroad,
bannerghattroad, bannaraghattaroad, bannerhattaroad,
bannerghatharoad, banneerghattaroad, bannaerghattaroad,
baneergattaroad, bhannergattaroad, bhanerghattaroad,
bannerughataroad, baneerghataroad, bannerghatroad, baneghattaroad,
Semi-Supervised Methods
Discussion
Modelling Approaches - Supervised Approach
Modelling Approaches - Unsupervised Approach
- Experiments with a limited dataset
- Semi-supervised approaches to increase the dataset
- Ensemble of classifiers
Summary
- Novelty
  - Solution is novel and developed in-house
  - No similar solution found in the literature
  - Publication [2]
[2] T. Ravindra Babu, et al. Geographical address classification without using geolocation coordinates. http://dl.acm.org/citation.cfm?id=2837696
Monkey Typed Address Classification [3] - Motivation
- Why do they occur?
- What is the impact of such addresses?
- Approaches to identify and eliminate them
[3] T. Ravindra Babu, Vishal Kakkar. Address Fraud: Monkey Typed Address Classification for e-Commerce Applications. SIGIR e-Com 2017. http://sigir-ecom.weebly.com/uploads/1/0/2/9/102947274/paper_21.pdf
Sample of Monkey-typed Addresses
Table: A Sample of Monkey-typed Addresses
Sl. No. Address
1 OEGVOQCS
2 ddfkddd
3 afadfsf
4 gdfgtdf
5 fjrjnvhejvnjdjdfogjfn vmfjgfnl
Challenges with Monkey-typed Addresses
- Variable-length strings, with the maximum length reaching about 100 characters.
- Many of them have repeated substrings such as asdf, asdfgij, etc.
- Some addresses are provided as multiple monkey-typed strings so as to mimic a normal address.
- Combination of upper-case and lower-case characters
Important Stages
- Address preprocessing
- Novel way of feature generation
- Pattern classification
Stages
1. Preprocess the addresses
2. Generate fixed-length features that are devoid of repeated substrings
3. Label the patterns as normal or monkey-typed addresses
4. Divide the dataset into training and test datasets
5. Classify the data and record the average performance
Steps in Data Preparation
1. Remove control and special characters; convert all data to lower case.
2. After these changes, identify the unique addresses in the dataset.
3. Remove the spaces between address words to convert each address into a single string without spaces.
4. Identify and eliminate repeated strings of constant length by starting from different start positions, 1 to 3, of a space-removed address string.
5. Split the strings into k-character substrings to form features.
6. Blank spaces remain in the last substring when the address string length is not divisible by the substring length k. Replace the blanks that remain after the "n modulo k" split with *'s, where n and k are the lengths of the full string and the substring respectively.
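Steps 1, 2, 3, 5 and 6 can be sketched in Python as below. Step 4 (elimination of repeated substrings) is deliberately left out because the slide does not give its exact rule. The function names and the choice to keep only letters and digits are assumptions made for illustration.

```python
import re

def clean(address):
    """Step 1: lower-case, drop control/special characters, normalise whitespace."""
    kept = re.sub(r"[^a-z0-9\s]", "", address.lower())
    return " ".join(kept.split())

def to_features(cleaned, k=8, pad="*"):
    """Steps 3, 5 and 6: remove spaces, cut into k-character substrings,
    and pad the last substring with '*' when n is not divisible by k."""
    s = cleaned.replace(" ", "")
    chunks = [s[i:i + k] for i in range(0, len(s), k)]
    if chunks:
        chunks[-1] = chunks[-1].ljust(k, pad)
    return chunks

def prepare(addresses, k=8):
    """Steps 1 to 6 without step 4; duplicates are dropped after cleaning (step 2)."""
    unique = list(dict.fromkeys(clean(a) for a in addresses))
    return [to_features(a, k) for a in unique]

print(to_features(clean("scbmdbsdvgjfsgk")))              # ['scbmdbsd', 'vgjfsgk*']
print(to_features(clean("KR Layout, JP Nagar 4 Phase")))  # ['krlayout', 'jpnagar4', 'phase***']
```

Setting `k=4` gives the 4-character features used in the second set of experiments.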
Summary of the Process through examples
Table: Addresses and Features
Stage: Preprocessed addresses
Sample data:
  anjaneyatempleomshanthitempleroadhegnahallicrosssunkadakatte
  krlayoutjpnagar4phasekalayanamagnumtechpark
  hshshshsdfhdhsh
  scbmdbsdvgjfsgk
  gtysfkjhjuhkjkeraladjuhgjdhiidjidjidkgjfkjhkdfijkjklfijkjfghkijgfkhjdfklfifkijkldflfi
Stage: Features (8 characters long)
Sample data:
  anjaneya templeom shanthit empleroa dhegnaha llicross sunkadak atte****
  krlayout jpnagar4 tphaseba ngalaore
  hshshshs jshshs**
  scbmdbsd vgjfsgk*
  gtysfkjh juhkjker aladjuhg jdhiidji dkgjfkjh kdfijkjk lfijkjfg hkijgfkh jdfklfif kijkldfl fi******
Exercises
Table: Case Studies and Datasets
Case Training dataset Test dataset
1 1-10000 10001-15000
2 5001-15000 1-5000
3 1-5000, 10001-15000 5001-10000
4 1-7500 7501-15000
5 7501-15000 1-7500
6 1-10000 10001-15000
7 5001-15000 1-5000
8 1-5000, 10001-15000 5001-10000
9 1-7500 7501-15000
10 7501-15000 1-7500
Results
Table: Experimental Results
8-character-long features
Case No.  Precision (%)  Recall (%)  F-Score
1         97.75          91.30       94.42
2         97.81          90.94       94.25
3         97.39          91.00       94.09
4         97.84          90.61       94.09
5         97.86          90.33       93.95
4-character-long features
Case No.  Precision (%)  Recall (%)  F-Score
6         97.98          97.86       97.92
7         98.35          97.74       98.04
8         98.19          97.46       97.82
9         98.58          97.72       98.05
10        98.35          97.68       98.01
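The F-Score column appears consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short Python check below recomputes it for two rows. The function name `f_score` is illustrative, and because the tabulated precision and recall are already rounded, the recomputed value can differ from the reported one in the last digit.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (case, precision %, recall %, reported F-score), copied from the results tables
reported = [
    (1, 97.75, 91.30, 94.42),
    (6, 97.98, 97.86, 97.92),
]
for case, p, r, f in reported:
    print(f"case {case}: recomputed {f_score(p, r):.2f}, reported {f:.2f}")
```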
Summary
- Motivation, problem definition
- Solution summary
- Discussion
Address Clustering - Context Setting and Motivation [4]
- Fraud in e-Commerce
  - Reseller Fraud, Missing item fraud, Seller Fraud–Protection Fund, Duplicate Items, Pricing Fraud
- Review of challenges in Indian addresses: lack of geolocation, limited literacy, ethnicity, variants of the same address, additional data as part of the address, wrong PIN code
- Need for Address Clustering

[4] Vishal Kakkar, T. Ravindra Babu. Address Clustering for e-Commerce Applications. SIGIR eCom 2018. https://sigir-ecom.github.io/ecom2018/ecom18Papers/paper8.pdf
Clustering Approaches
- Discussion on clustering methods [5]: iterative vis-a-vis single view, prototype vs centroid
- Cluster evaluation [6]: a brief discussion
- Leader Clustering
- Leader Clustering with edit distance
- Leader Clustering with word embeddings

[5] C. C. Aggarwal et al. A survey of text clustering algorithms. In Mining Text Data. Springer, Boston, MA, pp. 77-128, 2012.
[6] T. Ravindra Babu et al. On simultaneous selection of prototypes and features in large data. International Conference on Pattern Recognition and Machine Intelligence. Springer, Berlin, Heidelberg, 2005.
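Leader clustering with edit distance can be sketched as follows. This is a minimal illustration, not the authors' implementation: the word list, the threshold of 2, and the function names are assumptions chosen for the example. Each item joins the first leader within the threshold; otherwise it becomes a new leader, so the data is scanned only once.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def leader_clustering(items, distance, threshold):
    """Single pass: join the first leader within `threshold`, else become a leader."""
    clusters = {}  # leader -> members, in order of first appearance
    for item in items:
        for leader in clusters:
            if distance(item, leader) <= threshold:
                clusters[leader].append(item)
                break
        else:
            clusters[item] = [item]
    return clusters


# Hypothetical word list; the threshold is an assumption, not a tuned value.
words = ["apartment", "aparment", "apartmnt", "college", "collage", "colege"]
clusters = leader_clustering(words, edit_distance, threshold=2)
# clusters["apartment"] -> ["apartment", "aparment", "apartmnt"]
# clusters["college"]   -> ["college", "collage", "colege"]
```

The result depends on the order in which items arrive, because the first item of each group becomes its leader.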
Solution Overview
- Approach-1: Considering the words of an address directly, with each address as a document
- Approach-2: Word-embeddings-based approach (add2vec)

Table: Algorithm Complexity
Algorithm               Complexity
Leader (edit distance)  O((n ∗ d ∗ m)^2)
Leader (add2vec)        O(n^2 ∗ d)
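The embedding-based approach can be illustrated with a small sketch. It assumes that an address vector is the plain average of its word vectors and that addresses are grouped by cosine similarity to a leader; the slides do not spell out how add2vec builds the vector, so both choices, the toy 3-dimensional vectors, and the 0.95 threshold are assumptions for illustration only.

```python
import math

# Hypothetical word vectors; a real system would learn them from an address corpus.
TOY_VECTORS = {
    "iscon": [0.9, 0.1, 0.0],
    "elegance": [0.8, 0.2, 0.1],
    "ahmedabad": [0.7, 0.3, 0.0],
    "mg": [0.0, 0.9, 0.2],
    "road": [0.1, 0.8, 0.3],
    "bengaluru": [0.0, 0.7, 0.6],
}


def address_vector(address, vectors):
    """Average the vectors of known tokens; tokens without a vector are skipped."""
    known = [vectors[t] for t in address.lower().split() if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def leader_by_similarity(addresses, vectors, min_similarity):
    """Leader clustering where closeness is cosine similarity to the leader vector."""
    leaders = []  # list of (leader_vector, member_addresses)
    for addr in addresses:
        vec = address_vector(addr, vectors)
        if vec is None:
            continue  # no known token, nothing to compare
        for leader_vec, members in leaders:
            if cosine(vec, leader_vec) >= min_similarity:
                members.append(addr)
                break
        else:
            leaders.append((vec, [addr]))
    return [members for _, members in leaders]


addresses = [
    "iscon elegance ahmedabad",
    "711 iscon elegance ahmedabad 380015",
    "mg road bengaluru",
    "12 mg road bengaluru",
]
groups = leader_by_similarity(addresses, TOY_VECTORS, min_similarity=0.95)
```

Here the two Ahmedabad variants fall into one group and the two Bengaluru variants into another, because the extra numeric tokens have no vector in the toy vocabulary and are skipped.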
Experimental results
Spell Variants
Algorithm   Spell Variants
Edit-dist   apartmnt, aparent, aparmtement, apretment, apparment, aparment, apprmnts, apartemnets
Embeddings  appartment, appt, apt, apartments, apparment, aprtment, apartement, appts, appartments
Edit-dist   collage, colloge, coolege, cottege, callage, coolage, collega, callege
Embeddings  collage, collge, colleage, clg, colege, colleg, colg, clg
Sample Clusters with Leader Clustering and add2vec
Customer Address | Cosine Similarity
la renon healthcare prv ltd 711 iscon elegance prahlad nagar cross road s g highway ahmedabad | 1.0
la renon healthcare prv ltd 711 iscon elegance prahlad nagar cross road s g highway ahmedabad 201 limited | 0.989
la renon healthcare prv ltd 711 iscon elegance prahlad nagar cross road s g highway ahmedabad 201 limited. fax 91 office shapath India | 0.962
la renon healthcare prv ltd 711 iscon elegance prahlad nagar cross road s g highway ahmedabad 201 limited. fax 91 office shapath India 380015 p 5 1001 gujarat roads circle 1000 793046 | 0.955
Summary
- Problem Overview
- Approaches
- Results
- Applications
Recent Advances
- BERT variants for address classification [7]
- GIS and geocoding
- Global Grid representations [8], [9]
- BERT variants for address non-deliverability / incomplete-address prediction

[7] Shreyas Mangalgi, Lakshya Kumar, T. Ravindra Babu. Deep Contextual Embeddings for Address Classification in E-commerce. KDD AI for Fashion, 2020. arXiv:2007.03020
[8] https://mailingsystemstechnology.com/article-4199-Addresses-and-Discrete-Grid-Systems.html, 2017
[9] https://www.delhivery.com/news/economic-impact-of-discoverability-of-localities-and-addresses-in-india
BERT Variants for Address Classification
- BERT variants for address classification [10]
- Approach-1: Preprocessing (probabilistic splitting, spell correction, bigram separation, probabilistic merging), then Word2Vec (W2V) vectors for tokens with tf-idf weighting
  - Adv: Appropriate weighting for frequent and infrequent terms
  - Disadv: Averaging word vectors leads to loss of sequential information. Ex. Faridabad addresses

[10] Shreyas Mangalgi, Lakshya Kumar, T. Ravindra Babu. Deep Contextual Embeddings for Address Classification in E-commerce. KDD AI for Fashion, 2020. arXiv:2007.03020
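The loss of sequential information under averaging can be seen in a small sketch. It assumes a tf-idf weighted mean of token vectors with a smoothed idf, log(N / df) + 1; the 2-dimensional vectors, the two example addresses, and the function names are hypothetical and are not taken from the cited paper.

```python
import math
from collections import Counter

# Hypothetical token vectors standing in for trained Word2Vec vectors.
TOKEN_VECTORS = {
    "sector": [0.2, 0.7],
    "block": [0.6, 0.1],
    "15": [0.9, 0.4],
    "21": [0.3, 0.8],
    "faridabad": [0.5, 0.5],
}


def idf_table(documents):
    """Smoothed inverse document frequency: log(N / df) + 1 (an assumed variant)."""
    n_docs = len(documents)
    doc_freq = Counter(token for doc in documents for token in set(doc))
    return {token: math.log(n_docs / df) + 1.0 for token, df in doc_freq.items()}


def tfidf_average(tokens, vectors, idf):
    """tf-idf weighted mean of the vectors of the known tokens."""
    counts = Counter(t for t in tokens if t in vectors)
    if not counts:
        return None
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    # Sorted iteration keeps the floating-point sum independent of token order.
    for token, count in sorted(counts.items()):
        weight = (count / len(tokens)) * idf.get(token, 1.0)
        weight_sum += weight
        for k in range(dim):
            total[k] += weight * vectors[token][k]
    return [x / weight_sum for x in total]


# Two hypothetical addresses with the same tokens in a different order.
addr_a = "sector 15 block 21 faridabad".split()
addr_b = "sector 21 block 15 faridabad".split()
idf = idf_table([addr_a, addr_b, "mg road bengaluru".split()])
vec_a = tfidf_average(addr_a, TOKEN_VECTORS, idf)
vec_b = tfidf_average(addr_b, TOKEN_VECTORS, idf)
same_vector = vec_a == vec_b  # True: the average ignores token order
```

Both addresses map to the same vector even though they would point to different places, which is the limitation the sequence-aware approaches aim to remove.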
BERT Variants for Address Classification
- Approach-2: Bidirectional LSTM
  - Address token representation as the concatenation of the forward hidden state and the backward hidden state, obtained through bidirectional training
  - Adv: Bidirectional, captures sequential information
  - Disadv: Time consuming
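The concatenation of forward and backward states can be sketched with a toy recurrent cell. To stay dependency-free, a simple element-wise tanh cell stands in for the LSTM cell here, and the weights are fixed rather than trained; the cell, the weights, and the input vectors are all assumptions made for illustration.

```python
import math


def recurrent_states(inputs, w_in=0.5, w_rec=0.8):
    """Run a toy element-wise tanh cell over the sequence; return every hidden state."""
    hidden = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        hidden = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, hidden)]
        states.append(hidden)
    return states


def bidirectional_states(inputs):
    """Concatenate, per token, the left-to-right and right-to-left hidden states."""
    forward = recurrent_states(inputs)
    backward = recurrent_states(inputs[::-1])[::-1]  # re-align to token order
    return [f + b for f, b in zip(forward, backward)]


# Hypothetical 2-dimensional vectors for a three-token address.
token_vectors = [[0.2, 0.7], [0.9, 0.4], [0.5, 0.5]]
states = bidirectional_states(token_vectors)
reordered = bidirectional_states(token_vectors[::-1])
# Each token now has a 4-dimensional representation: 2 forward + 2 backward values.
```

Unlike an average, the representation of a token changes when the surrounding tokens are reordered, since each half summarises the context on one side of it.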
BERT Variants
- Approach-3: RoBERTa
  - Pretraining on addresses and fine-tuning for the subregion classification task
  - BERT optimises two pretraining tasks, Masked Language Modelling and Next Sentence Prediction; RoBERTa drops Next Sentence Prediction and pretrains with Masked Language Modelling only
  - Adv: Captures sequential information; faster
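The masked language modelling task used in pretraining can be sketched as below. This is a simplified illustration: it always substitutes a mask token, whereas BERT-style masking also leaves some selected tokens unchanged or swaps in random ones. The 15% rate, the `<mask>` string, and the helper name `mask_tokens` are assumptions.

```python
import random

MASK_TOKEN = "<mask>"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens; return (model inputs, prediction targets)."""
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_masked))
    inputs = [MASK_TOKEN if i in positions else tok for i, tok in enumerate(tokens)]
    # Targets are defined only at masked positions; None means "not scored".
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return inputs, labels


tokens = "711 iscon elegance prahlad nagar cross road s g highway ahmedabad".split()
inputs, labels = mask_tokens(tokens, seed=0)
```

Calling `mask_tokens` with a different seed on each pass over the data gives a fresh mask pattern every time, rather than one pattern fixed at preprocessing.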
Recent Advances
- GIS and geocoding
- Global Grid representations [11], [12]
- Predicting completeness of unstructured shipping addresses using ensemble methods, by the Razorpay team (ConvNet + XGBoost) [13]
- Mining points of interest via address embeddings: an unsupervised approach, by the Swiggy team and Univ. of Maryland (unsupervised + RoBERTa + PoI + alg. polygons) [14]

[11] https://mailingsystemstechnology.com/article-4199-Addresses-and-Discrete-Grid-Systems.html, 2017
[12] https://www.delhivery.com/news/economic-impact-of-discoverability-of-localities-and-addresses-in-india
[13] https://sigir-ecom.github.io/ecom21Papers/paper25.pdf
[14] https://dl.acm.org/doi/abs/10.1145/3486183.3491002
Thank You
Address classification

  • 1. Address Problems - Motivation and Challenges Address Classification or Route Assignment Monkey Typed Address Classification Address Clustering Recent Advances in Address Modelling Address Models for the Indian e-Commerce Domain T. Ravindra Babu, Ph.D. Head, Data Science, Sahaj.ai 16 December 2021 T. Ravindra Babu, Ph.D. Address Models for the Indian e-Commerce Domain
  • 2. Address Problems - Motivation and Challenges Address Classification or Route Assignment Monkey Typed Address Classification Address Clustering Recent Advances in Address Modelling Presentation Plan Address Problems - Motivation and Challenges Address Classification or Route Assignment Data Challenges, Modeling, Solutions and Deployment Experimental Results Summary Monkey Typed Address Classification Context Setting Solution Overview Experimentation Summary Address Clustering Motivation, Solution Overview and Experimental Results Recent Advances in Address Modelling T. Ravindra Babu, Ph.D. Address Models for the Indian e-Commerce Domain
  • 3. Address Problems - Motivation and Challenges Address Classification or Route Assignment Monkey Typed Address Classification Address Clustering Recent Advances in Address Modelling Motivation and Problem Definition I Definition of an address1: Address is the one that specifies location by reference to a thoroughfare or a landmark; or it specifies a point of postal delivery 1 PDFC Subcommittee for Culture and Demographic Data. 2001. United States Thoroughfare, Landmark, and Postal Address Data Standard. https://www.fgdc.gov/standards/projects/address-data/ AddressDataStandardPart01 T. Ravindra Babu, Ph.D. Address Models for the Indian e-Commerce Domain
  • 4. Motivation and Problem Definition
      • Definition of an address [1]: an address specifies a location by reference to a thoroughfare or a landmark, or it specifies a point of postal delivery.
      • Motivation
          • Non-standard addresses
          • Spelling variations
          • Inadvertent separation or joining of area names
          • Long addresses and their equivalence
          • Grouping of "similar" addresses for fraud detection
          • Origin, identification and isolation of monkey-typed addresses
          • Address non-deliverability / incomplete address
      [1] FGDC Subcommittee for Culture and Demographic Data. 2001. United States Thoroughfare, Landmark, and Postal Address Data Standard. https://www.fgdc.gov/standards/projects/address-data/AddressDataStandardPart01
  • 6. Address Classification
      • Problem Definition
      • Typical operations scenario at a Delivery Hub without a model
          • Inscan of shipments received from the Mother Hub
          • Manual reading of the address; assignment to the Route/FE
          • Sorting and Delivery
      • Overview of Proposed Solution
          • Capturing FEs' domain knowledge and modelling around it
          • Classifying an address as belonging to a pre-defined subarea
          • Allocation of the shipments to the Route/FE by a Machine Learning based classifier
          • Sorting and Delivery
  • 7. Delivery Hub and Subareas. Figure: Hub and Subareas
  • 12. Insights into Address Data
      • The number of words in an address ranges from 4 to 75, apart from a few outliers of more than 100.
      • A word like "Apartments" is spelt in 263 different ways; "whitefield" in 24 ways, "industrial" in 25 ways, "Bangalore" in 161 ways, "karnataka" in 70 ways, etc.
      • Structure in addresses is lacking even in a city like Bangalore. A few examples follow.
      • Some words are specific to certain places/states. Examples: halli, hobli; bawdi, kuan; society; layout; etc.
      • Addressing systems across the world: US, Europe, Korea, Japan; countries like Brazil and India
  • 13. Sample Addresses
      Table: Sample Addresses
      Sl. No. | Address
      1 | Raghavendra Layout PattanagereBhel Layout Rajarajeshwari nagar 560098
      2 | Adval Infotech BaNakal Karnataka India 560019
      3 | Jyothi Enclave 1st A cross Kaggadaspura CV Raman nagar opposite August Park 560093
      4 | 1st Main Cross 2ndBCross Nanjappa Layout Adugudi. Ganesha Temple 560030
      5 | OCEANOUS TRITON OPP TOTAL MALL OFF SARJAPUR RDpo bellundur 560103
  • 14. Example of address with 75 words
      Example: AD5LXSZJYRIT40GGELRLWM Flat No. 1005, Oceanus Greendale Apartment, Jasmine Block, Hoysala Nagar 3rd Main Road, Ram Murthy Nagar, Bangalore - 560016, L/M- Opposite Lord Ganesha Temple on Hoysalanagar 3rd Main Road. Directions- 1. Take Right on Banaswadi signal on outer Ring Road. 2. Enter Horamavu Main Road. 3. After Railway gate take first right. 4.At the dead end turh right, Lord ganesha temple will be on the left. 5. Opposite Road will take you to the apartment. Bangalore 560016 2011-11-17 18:32:31
  • 15. Preprocessing and Modelling Aspects
      • An elaborate preprocessing model was necessary, accounting for the following:
          • The objective is to retain only those terms that help classification (discriminability)
          • Cleaning of addresses
          • Probabilistic separation of words
          • Integrating domain knowledge
          • Machine Learning based dictionary generation
          • Classification of potentially fraudulent addresses
          • Generation of n-grams using a modified Frequent Pattern (FP)-tree
  • 16. Preprocessing for Data Compaction. Figure: Impact of Preprocessing
  • 17. Equivalent set generation using Clustering
      Table: A Sample of Equivalent Terms identified by Clustering
      Cluster Prototype | Cluster members
      adichunchagiri | adichuncangiri, adhichunchangiri, aadichunchangiri, adichunchungiri, adichunchangiri, adichunchungiri
      apartment | apartment, apartmenet, apartmanet, apartmenst, apartmenrt, apartmennt, apartment, appratment, aparatment, appratmant, aparment, apartent
      yewshwanthpur | yewsanthpur, yeswhwanthpur, yeshwenthpur, yeshwanyhpur, yeshwantrhpur, yeshwanthpua, yeswantpur, yesvantpur, yeshantpur, yaswantpur
  • 18. Fraud Address Classification - Address Strings
      Sl. No. | Address
      1 | adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
      2 | gasdfashagadfasmejastic
      3 | fdgdf
      4 | hjsdhaddsdsasdsa
      5 | dsfadafadsasdfsdafsda
      6 | hjsdhaddsdsasdsa
      7 | asd
      8 | lmflvml
      9 | assasfsafasfsasfsfsafashaphilomena
      10 | faskjbdasdlkjbsaasd
  • 19. Fraud Address Classification - Address Strings - Heatmap. Figure: MonkeyType Addresses
  • 20. Fraud Address Classification - Items Bought. Figure: Items bought by such people
  • 23. Probabilistic Separation of Compound Words
      • To a large extent, addresses are not amenable to English dictionaries.
      • In written addresses, the space between words is often missing, either because the customer inadvertently omits it or because it is removed during storage/retrieval.
      • Separating such compound words
          • Compute empirical probabilities of words
          • Assuming conditional independence, if the joint probability of a compound word is less than the product of the probabilities of the individual words, separate the words
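The separation rule on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_compound`, the `min_len` guard against very short fragments, and the toy word counts are all hypothetical. A token is split at the position where the product of the empirical probabilities of the two parts is largest, provided that product exceeds the empirical probability of the unsplit token.

```python
from collections import Counter


def split_compound(token, counts, total, min_len=3):
    """Split `token` in two if P(left) * P(right) > P(token).

    `counts` maps each word to its corpus frequency and `total` is the
    number of word occurrences in the corpus. `min_len` (an assumption
    of this sketch) prevents splits into very short fragments.
    """
    best_split = None
    best_prob = counts[token] / total  # Counter returns 0 for unseen words
    for i in range(min_len, len(token) - min_len + 1):
        left, right = token[:i], token[i:]
        if counts[left] == 0 or counts[right] == 0:
            continue  # both parts must be known words
        prob = (counts[left] / total) * (counts[right] / total)
        if prob > best_prob:
            best_split, best_prob = [left, right], prob
    return best_split if best_split else [token]


# Toy frequencies, invented for illustration only.
counts = Counter({
    "pattanagere": 40,
    "bhel": 30,
    "layout": 100,
    "road": 29,
    "pattanagerebhel": 1,
})
total = sum(counts.values())

print(split_compound("pattanagerebhel", counts, total))
print(split_compound("layout", counts, total))
```

With these toy counts the joined token is rare while both of its parts are frequent, so the rule separates it; "layout" has no split into two known words and is left alone.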
  • 27. Frequent Pattern Tree for n-gram Generation
      • The frequent pattern tree is a celebrated approach in mining large datasets
      • We implement a modified version of the tree to generate n-grams
      • Conventional method
      • New approach
  • 28. Clustering for equivalent set of words with spell variations
      • koramanagala, koromangala, kormanagala, koramnagala, koramangalato, kanamangala, koramanagla, koremangala, koaramangala, koramamgala, karamangala, tkoramangala, kormangalla, koramongala, koarmangala, korammangala, koramangalla, koramangale, koramanagal
      • electronice, eclectronic, elelctronic, eelectronic, electronica, electroincs, electronics, electroninc, electrinics, electroncis, electronincs
  • 29. Clustering for ... spell variations
      • bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad, bannerughattaroad, bannarghattaroad, banergattaroad, banneraghattaroad, bannerghettaroad, bannerugattaroad, bhannerghattaroad, bennerghattaroad, bannerghttaroad, bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad, bennarghattaroad, baneerghattaroad, bannergettaroad, banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad, benerghattaroad, bannerghattaroadto, bannergataroad, bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad, bannnerghattaroad, bannarghettaroad, banerughattaroad, bannergahttaroad, bhannerughattaroad, bennergattaroad, bannerghattroad, bannaraghattaroad, bannerhattaroad, bannerghatharoad, banneerghattaroad, bannaerghattaroad, baneergattaroad, bhannergattaroad, bhanerghattaroad, bannerughataroad, baneerghataroad, bannerghatroad, baneghattaroad
  • 30. Semi-Supervised Methods Discussion
  • 31. Modelling Approaches - Supervised Approach
  • 32. Modelling Approaches - Unsupervised Approach
  • 35. Experiments with limited dataset; semi-supervised approaches to increase dataset; ensemble of classifiers
  • 37. Summary
      • Novelty
          • Solution is novel and developed in-house
          • No similar solution found in the literature
      • Publication [2]
      [2] T. Ravindra Babu, et al. Geographical address classification without using geolocation coordinates. http://dl.acm.org/citation.cfm?id=2837696
  • 38. Monkey Typed Address Classification [3] - Motivation
      • Why do they occur?
      • Impact of such addresses?
      • Approaches to identify and eliminate
      [3] T. Ravindra Babu, Vishal Kakkar. Address Fraud: Monkey Typed Address Classification for e-Commerce Applications. SIGIR e-Com 2017. URL: http://sigir-ecom.weebly.com/uploads/1/0/2/9/102947274/paper_21.pdf
  • 39. Sample of Monkey-typed Addresses
      Table: A Sample of Monkey-typed Addresses
      Sl. No. | Address
      1 | OEGVOQCS
      2 | ddfkddd
      3 | afadfsf
      4 | gdfgtdf
      5 | fjrjnvhejvnjdjdfogjfn vmfjgfnl
  • 40. Challenges with Monkey-typed Addresses
      • Variable-length strings, with the maximum length reaching about 100 characters.
      • Many of them have repeated substrings such as asdf, asdfgij, etc.
      • Some addresses are provided as multiple monkey-typed strings so as to mimic a normal address.
      • Combination of upper and lower case characters
  • 43. Important Stages
      • Address Preprocessing
      • Novel way of feature generation
      • Pattern Classification
  • 44. Stages
      1. Preprocess the addresses
      2. Generate fixed-length features that are devoid of repeated substrings
      3. Label the patterns as normal or monkey-typed addresses
      4. Divide the dataset into training and test datasets
      5. Classify the data and record the average performance
  • 45. Steps in Data Preparation
      1. Remove control and special characters; convert all data to lower case.
      2. After these changes, identify the unique addresses in the dataset.
      3. Remove the spaces between address words to convert each address into a single string.
      4. Identify and eliminate repeated strings of constant length, starting from different start positions (1 to 3) of the space-removed address string.
      5. Split the strings into k-character substrings to form features.
      6. When the address string length n is not divisible by the substring length k, the last substring has blank positions left after its "n modulo k" characters; replace those blanks with *'s.
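A minimal Python sketch of steps 1, 3, 5 and 6 of this slide is given below. Steps 2 and 4 are omitted because the slide does not specify the repeat-elimination procedure in enough detail to reproduce it. The function names `prepare` and `k_char_features` are hypothetical, and keeping only ASCII letters and digits is an assumption about what "special characters" means here.

```python
import re


def prepare(address):
    """Steps 1 and 3: lower-case the address, then drop everything that is
    not an ASCII letter or digit (this also removes the spaces)."""
    return re.sub(r"[^a-z0-9]", "", address.lower())


def k_char_features(address, k=8, pad="*"):
    """Steps 5 and 6: split the prepared string into k-character substrings
    and pad the last one with `pad` up to length k."""
    s = prepare(address)
    chunks = [s[i:i + k] for i in range(0, len(s), k)]
    if chunks and len(chunks[-1]) < k:
        chunks[-1] = chunks[-1].ljust(k, pad)
    return chunks


print(k_char_features("Hegnahalli Cross, Sunkadakatte", k=8))
print(k_char_features("fdgdf", k=4))
```

Every feature produced this way has exactly k characters, which is what allows addresses of different lengths to be compared on a common feature vocabulary.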
  • 46. Summary of the Process through examples
      Table: Addresses and Features
      Stage | Sample Data
      Preprocessed Addresses | anjaneyatempleomshanthitempleroadhegnahallicrosssunkadakatte
      | krlayoutjpnagar4phasekalayanamagnumtechpark
      | hshshshsdfhdhsh
      | scbmdbsdvgjfsgk
      | gtysfkjhjuhkjkeraladjuhgjdhiidjidjidkgjfkjhkdfijkjklfijkjfghkijgfkhjdfklfifkijkldflfi
  • 47. Summary of the Process through examples
      Table: Addresses and Features
      Preprocess Stage | Sample Data
      Features (8-char long) | anjaneya templeom shanthit empleroa dhegnaha llicross sunkadak atte****
      | krlayout jpnagar4 tphaseba ngalaore
      | hshshshs jshshs**
      | scbmdbsd vgjfsgk*
      | gtysfkjh juhkjker aladjuhg jdhiidji dkgjfkjh kdfijkjk lfijkjfg hkijgfkh jdfklfif kijkldfl fi******
  • 48. Exercises
      Table: Case Studies and Datasets
      Case | Training dataset | Test dataset
      1 | 1-10000 | 10001-15000
      2 | 5001-15000 | 1-5000
      3 | 1-5000, 10001-15000 | 5001-10000
      4 | 1-7500 | 7501-15000
      5 | 7501-15000 | 1-7500
      6 | 1-10000 | 10001-15000
      7 | 5001-15000 | 1-5000
      8 | 1-5000, 10001-15000 | 5001-10000
      9 | 1-7500 | 7501-15000
      10 | 7501-15000 | 1-7500
  • 49. Results
      Table: Experimental Results (8-character long features)
      Case No. | Precision (%) | Recall (%) | F-Score
      1 | 97.75 | 91.30 | 94.42
      2 | 97.81 | 90.94 | 94.25
      3 | 97.39 | 91.00 | 94.09
      4 | 97.84 | 90.61 | 94.09
      5 | 97.86 | 90.33 | 93.95
  • 50. Results
      Table: Experimental Results (4-character long features)
      Case No. | Precision (%) | Recall (%) | F-Score
      6 | 97.98 | 97.86 | 97.92
      7 | 98.35 | 97.74 | 98.04
      8 | 98.19 | 97.46 | 97.82
      9 | 98.58 | 97.72 | 98.05
      10 | 98.35 | 97.68 | 98.01
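The F-Score values in the two results tables appear consistent with the harmonic mean of precision and recall (the F1 score). The slides do not state the formula, so this is an assumption. The short check below recomputes it for Cases 1 and 6 from the reported precision and recall; the function name `f_score` is hypothetical. Differences in the last digit are expected because the reported precision and recall are themselves rounded.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-Score) taken from the results tables.
reported = {
    1: (97.75, 91.30, 94.42),
    6: (97.98, 97.86, 97.92),
}

for case, (p, r, f) in reported.items():
    print(case, f_score(p, r), f)
```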
  • 51. Summary
      • Motivation, problem definition
      • Solution Summary
      • Discussion
  • 55. Address Clustering - Context Setting and Motivation [4]
      • Fraud in e-Commerce
          • Reseller Fraud, Missing item fraud, Seller Fraud–Protection Fund, Duplicate Items, Pricing Fraud
      • Review of challenges in Indian addresses: lack of geolocation, limited literacy, ethnicity, variants of the same address, additional data as part of the address, wrong PIN code
      • Need for Address Clustering
      [4] Vishal Kakkar, T. Ravindra Babu. Address Clustering for e-Commerce Applications. SIGIR eCom 2018. URL: https://sigir-ecom.github.io/ecom2018/ecom18Papers/paper8.pdf
  • 59. Address Problems - Motivation and Challenges Address Classification or Route Assignment Monkey Typed Address Classification Address Clustering Recent Advances in Address Modelling Motivation, Solution Overview and Experimental Results Clustering Approaches I Discussion on clustering methods5: Iterative vis-a-vis Single view, Prototype vs centroid, I Cluster evaluation6: A brief discussion I Leader Clustering 5 C.C.Aggarwal et al. A survey of text clustering algorithms. In Mining text data. Springer, Boston, MA, 77-128 pages, 2012 6 T. Ravindra Babu, et al. On Simultaneous selection of prototypes and features in large data. International Conference on Pattern Recognition and Machine Intelligence. Springer, Berlin, Heidelberg, 2005. T. Ravindra Babu, Ph.D. Address Models for the Indian e-Commerce Domain
  • 60. Address Problems - Motivation and Challenges Address Classification or Route Assignment Monkey Typed Address Classification Address Clustering Recent Advances in Address Modelling Motivation, Solution Overview and Experimental Results Clustering Approaches I Discussion on clustering methods5: Iterative vis-a-vis Single view, Prototype vs centroid, I Cluster evaluation6: A brief discussion I Leader Clustering I Leader Clustering with edit distance 5 C.C.Aggarwal et al. A survey of text clustering algorithms. In Mining text data. Springer, Boston, MA, 77-128 pages, 2012 6 T. Ravindra Babu, et al. On Simultaneous selection of prototypes and features in large data. International Conference on Pattern Recognition and Machine Intelligence. Springer, Berlin, Heidelberg, 2005. T. Ravindra Babu, Ph.D. Address Models for the Indian e-Commerce Domain
  • 61. Clustering Approaches
      - Discussion on clustering methods [5]: iterative vis-a-vis single view, prototype vs centroid
      - Cluster evaluation [6]: a brief discussion
      - Leader Clustering
      - Leader Clustering with edit distance
      - Leader Clustering with word embeddings
      [5] C. C. Aggarwal et al. A survey of text clustering algorithms. In Mining Text Data. Springer, Boston, MA, pp. 77-128, 2012.
      [6] T. Ravindra Babu et al. On Simultaneous Selection of Prototypes and Features in Large Data. International Conference on Pattern Recognition and Machine Intelligence. Springer, Berlin, Heidelberg, 2005.
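The Leader Clustering with edit distance named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it clusters single tokens rather than whole addresses, and the function names, the sample tokens and the threshold of 2 are assumptions chosen for the example. Leader clustering makes one pass over the data; each item joins the first existing leader within the threshold, otherwise it becomes a new leader.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def leader_cluster(items: List[str], threshold: int) -> Dict[str, List[str]]:
    """Single-pass Leader clustering keyed by leader string."""
    clusters: Dict[str, List[str]] = {}
    for item in items:
        for leader in clusters:
            if edit_distance(item, leader) <= threshold:
                clusters[leader].append(item)
                break
        else:
            # No existing leader is close enough: the item starts a new cluster.
            clusters[item] = [item]
    return clusters


# Illustrative tokens (assumed for this sketch, not taken from the experiments).
tokens = ["apartment", "aparment", "apartmnt", "college", "collage", "colege"]
clusters = leader_cluster(tokens, threshold=2)
for leader, members in clusters.items():
    print(leader, members)
```

The result depends on the order in which items arrive and on the threshold, which is a known property of single-pass leader clustering.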
  • 62. Solution Overview
      - Approach-1: Using the words of an address directly, with each address treated as a document
      - Approach-2: Word-embedding-based approach (add2vec)
      Table: Algorithm Complexity
      Algorithm              | Complexity
      Leader (edit distance) | O((n * d * m)^2)
      Leader (add2vec)       | O(n^2 * d)
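A rough sketch of the embedding-based approach follows, assuming an address vector is the mean of its token vectors and that leader clustering compares addresses by cosine similarity. Both are assumptions about how add2vec might work, not a description confirmed by the slides. `TOY_VECTORS` is a hypothetical hand-written table standing in for trained word embeddings, and the 0.95 threshold is arbitrary.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical 3-d vectors, hand-written so that variants of a word lie close together.
TOY_VECTORS: Dict[str, Vector] = {
    "apartment": (1.0, 0.0, 0.0),
    "appt": (0.9, 0.1, 0.0),
    "college": (0.0, 1.0, 0.0),
    "clg": (0.1, 0.9, 0.0),
    "road": (0.0, 0.0, 1.0),
    "rd": (0.0, 0.1, 0.9),
}


def address_vector(address: str, vectors: Dict[str, Vector], dim: int = 3) -> Vector:
    """Mean of the vectors of known tokens; zero vector if no token is known."""
    known = [vectors[t] for t in address.lower().split() if t in vectors]
    if not known:
        return (0.0,) * dim
    return tuple(sum(col) / len(known) for col in zip(*known))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def leader_cluster_by_cosine(addresses: List[str],
                             vectors: Dict[str, Vector],
                             threshold: float) -> List[List[str]]:
    """Single-pass Leader clustering on address vectors."""
    leaders: List[Vector] = []
    clusters: List[List[str]] = []
    for address in addresses:
        vec = address_vector(address, vectors)
        for idx, leader_vec in enumerate(leaders):
            if cosine(vec, leader_vec) >= threshold:
                clusters[idx].append(address)
                break
        else:
            leaders.append(vec)
            clusters.append([address])
    return clusters


addresses = ["apartment road", "appt rd", "college road", "clg rd"]
groups = leader_cluster_by_cosine(addresses, TOY_VECTORS, threshold=0.95)
print(groups)
```

Unlike edit distance, this grouping can bring together abbreviations such as "appt" and "apartment", provided the embeddings place them close to each other.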
  • 63. Experimental Results: Spell Variants
      Algorithm  | Spell Variants
      Edit-dist  | apartmnt, aparent, aparmtement, apretment, apparment, aparment, apprmnts, apartemnets
      Embeddings | appartment, appt, apt, apartments, apparment, aprtment, apartement, appts, appartments
      Edit-dist  | collage, colloge, coolege, cottege, callage, coolage, collega, callege
      Embeddings | collage, collge, colleage, clg, colege, colleg, colg, clg
  • 64. Sample Clusters with Leader Clustering and add2vec
      Customer Address | Cosine Similarity
      la renon healthcare prv ltd 711 iscon elegance prahlad nagar cross road s g highway ahmedabad | 1.0
      la renon healthcare prv ltd 711 iscon elegance prahlad nagar cross road s g highway ahmedabad 201 limited | 0.989
      la renon healthcare prv ltd 711 iscon elegance prahlad nagar cross road s g highway ahmedabad 201 limited. fax 91 office shapath India | 0.962
      la renon healthcare prv ltd 711 iscon elegance prahlad nagar cross road s g highway ahmedabad 201 limited. fax 91 office shapath India 380015 p 5 1001 gujarat roads circle 1000 793046 | 0.955
  • 65. Summary
      - Problem Overview
      - Approaches
      - Results
      - Applications
  • 66. Recent Advances
      - BERT variants for address classification [7]
      - GIS and geocoding
      - Global Grid representations [8, 9]
      - BERT variants for address non-deliverability / incomplete-address prediction
      [7] Shreyas Mangalgi, Lakshya Kumar, T. Ravindra Babu. Deep Contextual Embeddings for Address Classification in E-commerce. KDD AI for Fashion, 2020. arXiv:2007.03020
      [8] https://mailingsystemstechnology.com/article-4199-Addresses-and-Discrete-Grid-Systems.html, 2017
      [9] https://www.delhivery.com/news/economic-impact-of-discoverability-of-localities-and-addresses-in-india
  • 67. BERT Variants for Address Classification
      - BERT variants for address classification [10]
      - Approach-1: Preprocessing (probabilistic splitting, spell correction, bigram separation, probabilistic merging), then Word2Vec vectors for tokens with tf-idf weighting
      - Adv: Appropriate weighting for frequent and infrequent terms
      - Disadv: Averaging word vectors leads to loss of sequential information. Ex.: Faridabad addresses
      [10] Shreyas Mangalgi, Lakshya Kumar, T. Ravindra Babu. Deep Contextual Embeddings for Address Classification in E-commerce. KDD AI for Fashion, 2020. arXiv:2007.03020
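The disadvantage of Approach-1 can be made concrete with a small sketch. It assumes a smoothed idf of log(N/df) + 1 and a tf-idf-weighted mean of token vectors, which is one common formulation and not necessarily the one used in the paper. The vectors in `TOY_VECTORS` and the two sample addresses are hypothetical and only serve to show that reordering tokens leaves the address vector unchanged.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical 2-d token vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS: Dict[str, Tuple[float, float]] = {
    "house": (1.0, 0.0),
    "sector": (0.0, 1.0),
    "3": (0.5, 0.5),
    "15": (0.2, 0.8),
    "faridabad": (0.9, 0.1),
}


def idf_weights(corpus: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency: log(N / df) + 1 (an assumed variant)."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}


def weighted_average(tokens: Sequence[str],
                     vectors: Dict[str, Tuple[float, ...]],
                     idf: Dict[str, float]) -> Tuple[float, ...]:
    """tf-idf-weighted mean of token vectors; tokens without a vector are skipped."""
    counts = Counter(tokens)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in sorted(counts):  # fixed order keeps the float sum reproducible
        if tok not in vectors or tok not in idf:
            continue
        weight = counts[tok] * idf[tok]
        weight_sum += weight
        for k, value in enumerate(vectors[tok]):
            total[k] += weight * value
    if weight_sum == 0.0:
        return tuple(total)
    return tuple(x / weight_sum for x in total)


corpus = [
    "house 3 sector 15 faridabad".split(),
    "house 15 sector 3 faridabad".split(),  # same tokens, different place
    "flat 7 sector 21 faridabad".split(),
]
idf = idf_weights(corpus)
vec_a = weighted_average(corpus[0], TOY_VECTORS, idf)
vec_b = weighted_average(corpus[1], TOY_VECTORS, idf)
print(vec_a == vec_b)  # the two addresses are indistinguishable after averaging
```

Because the weighted mean depends only on which tokens occur and how often, any model built on it cannot tell "house 3 sector 15" from "house 15 sector 3", which is the motivation for the sequence-aware approaches that follow.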
  • 68. BERT Variants for Address Classification
      - Approach-2: Bidirectional LSTM
      - Each address token is represented as the concatenation of the forward and backward hidden states obtained through bidirectional training
      - Adv: Bidirectional; captures sequential information
      - Disadv: Time-consuming
  • 69. BERT Variants
      - Approach-3: RoBERTa
      - Pretraining on addresses and fine-tuning for the subregion classification task
      - BERT optimises for two pretraining tasks, Masked Language Model and Next Sentence Prediction; RoBERTa drops Next Sentence Prediction and keeps only the Masked Language Model objective
      - Adv: Captures sequential information
      - Disadv: Faster
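To illustrate the Masked Language Model objective mentioned above, here is a simplified sketch of how training inputs can be corrupted. It follows the masking scheme described for BERT (select about 15% of tokens; of those, replace 80% with a mask token, 10% with a random token, and leave 10% unchanged), applied independently per token. The function name, the whitespace tokenisation and the sample address are assumptions for this example; a real pipeline would use a subword tokenizer and a library data collator.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: Sequence[str],
                vocab: Sequence[str],
                mask_prob: float = 0.15,
                seed: int = 0) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted tokens, labels); labels are None where no prediction is required."""
    rng = random.Random(seed)
    corrupted: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() >= mask_prob:
            # Token not selected: keep it and do not score it.
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK)
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))
        else:
            corrupted.append(tok)
    return corrupted, labels


# Hypothetical address, tokenised on whitespace for simplicity.
address = "flat 12 mg road near city mall bengaluru".split()
corrupted, labels = mask_tokens(address, vocab=address, mask_prob=0.15, seed=7)
print(corrupted)
print(labels)
```

Pretraining on a corpus of addresses with this objective lets the model learn which tokens tend to co-occur in which positions before it is fine-tuned for subregion classification.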
  • 70. Recent Advances
      - GIS and geocoding
      - Global Grid representations [11, 12]
      - Predicting completeness of unstructured shipping addresses using ensemble methods, by the Razorpay team (ConvNet + XGBoost) [13]
      - Mining points of interest via address embeddings: an unsupervised approach, by the Swiggy team + Univ. of Maryland (unsupervised + RoBERTa + PoI + alg. polygons) [14]
      [11] https://mailingsystemstechnology.com/article-4199-Addresses-and-Discrete-Grid-Systems.html, 2017
      [12] https://www.delhivery.com/news/economic-impact-of-discoverability-of-localities-and-addresses-in-india
      [13] https://sigir-ecom.github.io/ecom21Papers/paper25.pdf
      [14] https://dl.acm.org/doi/abs/10.1145/3486183.3491002
  • 71. Thank You