SoftAge Information Technology Ltd. : Confidential, 16 January 2015
SoftAge DDUP
Records management software that removes duplicate entries and
promotes consistency and data integrity.
De-Duplication
De-duplication is the process of identifying multiple records of the same
customer across the whole subscriber base.
• Various techniques can be used to identify such multiple records.
• This process identifies duplicate records on the basis of name,
father's name and address.
De-Duplication
Report duplicate records, identified by the following (flexible) criteria:
• 80% Name Match
• 80% Father Name Match
• 70% Address Match
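Because the criteria are described as flexible, the thresholds can be treated as configuration rather than constants. The sketch below is only an illustration: the class name `MatchCriteria`, its fields, and the choice to require all three thresholds at once are assumptions, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchCriteria:
    """Hypothetical container for the (flexible) match thresholds, in percent."""
    name_pct: float = 80.0
    father_name_pct: float = 80.0
    address_pct: float = 70.0

    def is_duplicate(self, name_score: float, father_name_score: float,
                     address_score: float) -> bool:
        # Assumption: a record is reported only when every score
        # reaches its threshold.
        return (name_score >= self.name_pct
                and father_name_score >= self.father_name_pct
                and address_score >= self.address_pct)
```

A stricter run could then be configured as `MatchCriteria(address_pct=90.0)` without touching the matching code.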
Challenges
• It is very difficult to identify duplicate records on the basis of
name and address when the entries are similar rather than identical.
• Catering for spelling mistakes and similar spellings
• Percentage-wise partial match criteria
• Dealing with large-volume databases, ranging from 30 to 100
million records each.
The Algorithm
• Generate names “similar” to the given name.
• Select the address and father's name from the database for records where these generated names match.
• Keep only the records where at least 80% of the father's name matches (by edit distance).
• Keep only the records where at least 70% of the tokens in the address match.
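The steps above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the product's implementation: "similar" names are taken to be all strings within one edit of the given name, the father's name score is one minus the edit distance divided by the longer length, the address score is the share of the query's tokens found in the candidate, and an in-memory list of dicts stands in for the database. All function and field names are hypothetical.

```python
import re
import string


def similar_names(name):
    """All strings within one edit of `name` (assumed notion of "similar")."""
    name = name.lower()
    letters = string.ascii_lowercase + " "
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:]
                for left, right in splits if right for c in letters}
    inserts = {left + c + right for left, right in splits for c in letters}
    return {name} | deletes | swaps | replaces | inserts


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def father_name_match(a, b):
    """Similarity in [0, 1] derived from edit distance (assumed formula)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def address_match(query, candidate):
    """Share of the query's address tokens that also occur in the candidate."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", candidate.lower()))
    return len(q & c) / len(q) if q else 0.0


def find_duplicates(query, records, father_min=0.80, address_min=0.70):
    """Apply the name, father's name and address filters in sequence."""
    candidates = similar_names(query["name"])
    hits = []
    for rec in records:
        if rec["name"].lower() not in candidates:
            continue
        if father_name_match(query["father"], rec["father"]) < father_min:
            continue
        if address_match(query["address"], rec["address"]) < address_min:
            continue
        hits.append(rec)
    return hits
```

Generating the name variants first means the name step can be an exact lookup against an indexed column, which matters at the volumes mentioned earlier; the costlier edit-distance and token comparisons then run only on the few records that survive it.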
De-duplication Web Service
• A simple and effective way to incorporate the de-duplication
process is to implement it as a web service.
• Whenever a record needs to be de-duplicated, one can simply
call the web service via an HTTP request.
• The code was implemented as a web service and hosted on
IIS (Internet Information Services).
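A client call to such a service might look like the sketch below. The slides do not give the service's URL, parameter names, HTTP method or response format, so everything here is a placeholder: the endpoint `http://dedup.example.internal/DedupService/check`, the query parameters `name`, `father_name` and `address`, the use of GET, and the JSON response are all assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; the real service address is not given in the slides.
BASE_URL = "http://dedup.example.internal/DedupService/check"


def build_dedup_request(name, father_name, address, base_url=BASE_URL):
    """Build a GET request carrying the record fields as query parameters."""
    query = urlencode({
        "name": name,
        "father_name": father_name,
        "address": address,
    })
    return Request(f"{base_url}?{query}", method="GET")


def check_record(name, father_name, address, timeout=10.0):
    """Send the request and decode the response, assumed here to be JSON."""
    request = build_dedup_request(name, father_name, address)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from the network call lets the URL encoding be checked without a running service.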
Thank You
For more details drop a CorpMail at harsh.tikku@softage.net
Or call +919811428984
