The document discusses address classification in logistics without geolocation data. It describes the challenges of classifying addresses that vary widely in spelling and lack structure. It proposes a solution that uses natural language processing techniques such as preprocessing, clustering, and supervised classification to assign addresses to delivery subareas. The model was developed in-house at Flipkart and is presented as novel compared to other solutions.
Shipment Address Classification in Logistics using Machine Learning
1. Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Shipment Address Classification in Logistics in
the absence of Geolocation Information
Dr. T. Ravindra Babu,
Data Scientist,
Flipkart
August 1, 2015
Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classification in Logistics in the absence of G
Presentation Plan
Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Motivation and Problem Definition
Motivation
Problem Definition
Typical Operations Scenario at Delivery Hub without a model
Inscan of shipments received from Mother Hub
Manual reading of address; Assign to the Route/FE
Sorting and Delivery
Overview of Proposed Solution
Capturing FEs’ domain knowledge and modelling around it
Classifying an address as belonging to a pre-defined subarea
Allocating shipments to a Route/FE using a machine-learning
classifier
Sorting and Delivery
Delivery Hub and Subareas
Insights into Address Data
The number of words in an address ranges from 4 to 75, apart
from a few outliers with more than 100 words.
A word like Apartments is spelt in 263 different ways; whitefield
in 24 ways, industrial in 25 ways, Bangalore in 161 ways, karnataka
in 70 ways, etc.
Addresses lack structure even in a city like Bangalore.
A few examples.
Some words are specific to certain places/states. Examples:
halli, hobli; bawdi, kuan; society; layout; etc.
Addressing Systems across the world: US, Europe, Korea,
Japan; countries like Brazil, and India
Proposed Model
Preprocessing
An elaborate preprocessing model was necessary, one that accounts
for the following.
Retaining only those terms that possibly help classification
(discriminability)
Merging of terms by empirical statistical models as well as
domain knowledge based rules, n-grams, abbreviating, etc.
Developing data dependent dictionaries based on pattern
clustering (Machine Learning) and forming an equivalent set
Preprocessing reduces the vocabulary size by 65% as
measured on a large dataset
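As a rough illustration of how vocabulary reduction could be measured, the sketch below maps spelling variants to a canonical form, drops terms assumed to be non-discriminative, and compares vocabulary sizes before and after. The toy addresses, the `canonical` dictionary and the `drop_terms` set are hypothetical; the slides do not give the actual dictionaries or rules.

```python
def vocabulary(addresses):
    """Return the set of distinct lower-cased tokens across all addresses."""
    return {tok for addr in addresses for tok in addr.lower().split()}


def normalize(address, canonical, drop_terms):
    """Map variants to a canonical spelling and drop non-discriminative terms."""
    kept = []
    for tok in address.lower().split():
        tok = canonical.get(tok, tok)   # merge spelling variants
        if tok not in drop_terms:       # keep only terms that may help classification
            kept.append(tok)
    return " ".join(kept)


# Hypothetical toy data, for illustration only.
addresses = [
    "12 Koramanagala 4th Block Bangalore",
    "Flat 3 koromangala banglore",
    "near temple Koramangala Bengaluru",
]
canonical = {
    "koramanagala": "koramangala",
    "koromangala": "koramangala",
    "banglore": "bangalore",
    "bengaluru": "bangalore",
}
drop_terms = {"near", "flat"}

before = vocabulary(addresses)
after = vocabulary(normalize(a, canonical, drop_terms) for a in addresses)
reduction = 1 - len(after) / len(before)
print(f"vocabulary: {len(before)} -> {len(after)} terms ({reduction:.0%} smaller)")
```

The same before/after comparison on a real dataset is one way a figure like the 65% above could be obtained, assuming vocabulary size is counted as distinct tokens.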
Preprocessing for Data Compaction
Figure: Impact of Preprocessing
Models in Preprocessing :: Fraud Address Classification -
Address Strings
Sl.No. Address
1 adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
2 gasdfashagadfasmejastic
3 fdgdf
4 hjsdhaddsdsasdsa
5 dsfadafadsasdfsdafsda
6 hjsdhaddsdsasdsa
7 asd
8 lmflvml
9 assasfsafasfsasfsfsafashaphilomena
10 faskjbdasdlkjbsaasd
Models in Preprocessing :: Fraud Address Classification -
Address Strings - Heatmap
Figure: MonkeyType Addresses
Models in Preprocessing :: Fraud Address Classification -
Items Bought
Figure: Items bought by such people
Models in Preprocessing :: Probabilistic Separation of
Compound Words
To a large extent, addresses are not amenable to English
dictionaries
Spaces between words are often missing, either because the
customer inadvertently omitted them while writing the address or
because they were removed during storage/retrieval
Separating such compound words
Compute empirical probabilities of words
Assuming conditional independence, if the probability of a
compound word is less than the product of the probabilities of
the individual words, separate the words
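The separation rule above can be sketched as follows. This is a minimal illustration rather than the deployed model: probabilities are plain relative frequencies over a toy corpus, only two-way splits are tried, and the names `word_probabilities`, `split_compound` and the `min_len` parameter are assumptions introduced for the example.

```python
from collections import Counter


def word_probabilities(addresses):
    """Empirical probability of each token: its count divided by all tokens."""
    counts = Counter(tok for addr in addresses for tok in addr.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def split_compound(token, probs, min_len=3):
    """Split token in two if P(token) < P(left) * P(right) for some split.

    Returns [left, right] for the most probable qualifying split,
    otherwise [token] unchanged. min_len is an assumed guard that
    avoids splitting off very short fragments.
    """
    best_split, best_p = None, probs.get(token, 0.0)
    for i in range(min_len, len(token) - min_len + 1):
        left, right = token[:i], token[i:]
        if left in probs and right in probs:
            p_split = probs[left] * probs[right]
            if p_split > best_p:
                best_split, best_p = [left, right], p_split
    return best_split or [token]


# Hypothetical toy corpus: "main road" is common, "mainroad" is rare.
corpus = ["main road"] * 4 + ["mainroad"]
probs = word_probabilities(corpus)

print(split_compound("mainroad", probs))
print(split_compound("road", probs))
```

A token that never occurs as a whole word has probability zero here, so it is split whenever both halves are known words; whether that is desirable depends on the data.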
Models in Preprocessing :: Frequent Pattern Tree for
n-gram Generation
The frequent pattern tree is a celebrated approach to mining large
datasets
We implement a modified version of the tree to generate
n-grams
Conventional method
New approach
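The slides do not describe how the tree was modified, so the following is only a generic sketch of the idea of using a count-annotated prefix tree to find frequent word n-grams. Every window of up to `max_n` consecutive words is inserted as a path, and paths whose count meets `min_support` are reported. The names `Node`, `build_tree`, `frequent_ngrams` and the thresholds are assumptions, and this is not the author's modified frequent pattern tree.

```python
class Node:
    """Prefix-tree node holding how often the path to it was seen."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_tree(addresses, max_n=3):
    """Insert every run of up to max_n consecutive words as a path."""
    root = Node()
    for addr in addresses:
        tokens = addr.lower().split()
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_n]:
                node = node.children.setdefault(tok, Node())
                node.count += 1
    return root


def frequent_ngrams(root, min_support=2, min_n=2):
    """Return {n-gram: count} for paths of length >= min_n with enough support."""
    found = {}
    stack = [((), root)]
    while stack:
        path, node = stack.pop()
        for tok, child in node.children.items():
            # An n-gram cannot be more frequent than its prefix, so prune here.
            if child.count >= min_support:
                new_path = path + (tok,)
                if len(new_path) >= min_n:
                    found[" ".join(new_path)] = child.count
                stack.append((new_path, child))
    return found


# Hypothetical toy addresses.
addresses = ["4th cross hsr layout", "hsr layout sector 2", "near hsr layout"]
print(frequent_ngrams(build_tree(addresses)))
```

Frequent n-grams found this way (for example a locality name written as two words) could then be merged into single terms during preprocessing.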
Models in Preprocessing :: Clustering for equivalent sets of
words with spelling variations - Ex. koramangala, electronics
koramanagala koromangala kormanagala koramnagala
koramangalato kanamangala koramanagla koremangala
koaramangala koramamgala karamangala tkoramangala
kormangalla koramongala koarmangala korammangala
koramangalla koramangale koramanagal
electronice eclectronic elelctronic eelectronic electronica electroincs
electronics electroninc electrinics electroncis electronincs
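One simple way to form such equivalent sets is leader clustering on edit distance, sketched below. The slides do not name the clustering algorithm or distance used, so the choice of Levenshtein distance, the `max_dist` threshold and the word counts in the example are all assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_variants(word_counts, max_dist=2):
    """Map each word to a cluster leader within max_dist edits.

    Words are visited from most to least frequent, so the commonest
    spelling of each cluster becomes its canonical form.
    """
    leaders, mapping = [], {}
    for word, _ in sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for leader in leaders:
            if edit_distance(word, leader) <= max_dist:
                mapping[word] = leader
                break
        else:
            leaders.append(word)
            mapping[word] = word
    return mapping


# Spellings taken from the lists above; the counts are hypothetical.
word_counts = {
    "koramangala": 500, "koramanagala": 12, "koromangala": 9, "kormangalla": 3,
    "electronics": 300, "electronice": 5, "elelctronic": 2,
}
mapping = cluster_variants(word_counts)
print(mapping["kormangalla"], mapping["elelctronic"])
```

A fixed edit-distance threshold is a blunt instrument: it can merge distinct short place names, which is one reason a data-dependent dictionary may need review with domain knowledge.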
Models in Preprocessing :: Clustering for ... spelling variations
- Ex. Bannerghattaroad (61 variations)
bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad,
bannerughattaroad, bannarghattaroad, banergattaroad,
banneraghattaroad, bannerghettaroad, bannerugattaroad,
bhannerghattaroad, bennerghattaroad, bannerghttaroad,
bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad,
bennarghattaroad, baneerghattaroad, bannergettaroad,
banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad,
benerghattaroad, bannerghattaroadto, bannergataroad,
bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad,
bannnerghattaroad, bannarghettaroad, banerughattaroad,
bannergahttaroad, bhannerughattaroad, bennergattaroad,
bannerghattroad, bannaraghattaroad, bannerhattaroad,
bannerghatharoad, banneerghattaroad, bannaerghattaroad,
baneergattaroad, bhannergattaroad, bhanerghattaroad,
Models in Post-processing :: Semi-Supervised Methods
Discussion
Revisiting The Model
Supervised Classification
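The slides do not say which classifier is used. Purely as an illustration of assigning a preprocessed address to a pre-defined subarea, here is a small multinomial Naive Bayes over address tokens with add-one smoothing. The class name, the subarea labels `SA1` and `SA2`, and the training addresses are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesAddressClassifier:
    """Multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def fit(self, addresses, subareas):
        self.class_counts = Counter(subareas)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for addr, label in zip(addresses, subareas):
            tokens = addr.lower().split()
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, address):
        tokens = address.lower().split()
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(n / n_docs)  # log prior of the subarea
            for tok in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((self.token_counts[label][tok] + 1)
                                  / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: preprocessed addresses with subarea labels.
train_addresses = [
    "4th block koramangala",
    "koramangala 5th block near forum",
    "hsr layout sector 2",
    "27th main hsr layout",
]
train_subareas = ["SA1", "SA1", "SA2", "SA2"]

model = NaiveBayesAddressClassifier().fit(train_addresses, train_subareas)
print(model.predict("6th block koramangala"))
print(model.predict("hsr layout 1st sector"))
```

In this framing the preprocessing steps matter because they decide which tokens the classifier sees: merged spelling variants and frequent n-grams become single, more informative features.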
Summary
Novelty
The solution is novel and was developed in-house
No similar solution was found in the literature
Thank You