Efficient Marketing by Customer
Record Deduplication - UBM Asia

How Reifier is helping UBM Asia gain a single view of customers at low
cost
 
About UBM Asia 
UBM Asia (www.ubmasia.com) is Asia’s major trade fair and exhibition organizer. Owned by
UBM plc, which is listed on the London Stock Exchange, UBM Asia is headquartered in Hong Kong
and has subsidiary companies in Asia and the US, spanning 24 cities and 31 offices with a staff
of 1,300 people. With a track record spanning over 30 years, UBM Asia operates in 20 market
sectors with 230 dynamic face-to-face exhibitions and high-level professional conferences,
21 targeted trade publications, and 18 round-the-clock online products for over 2,000,000
quality exhibitors, visitors, conference delegates, advertisers and subscribers from all over
the world.
 
The problem 
UBM Asia collects contact information from visitors to its leading trade fairs. Due to the
international nature of the visitorship, the data is multilingual, with a mix of English,
Chinese, Thai, Turkish, Korean and Japanese. The contact information is collected at various
points in the cycle: registration desks, online portals, survey forms, social media and various
directories. It is primarily obtained through paper forms, which are digitized or manually
entered into a database. Typical record volumes are about a million entries. At the end of the
fair, the data is collated and fed into a CRM (Client Relationship Management) system for
future correspondence, offerings and promotions. The CRM is the cornerstone of UBM’s business.
The sales and marketing teams use the system heavily to invite attendees and to send marketing
promotions, service emails and critical event planning information to prospects, both
electronically and via traditional means.
 
The contact information is riddled with poor-quality data: missing fields, typographical and
lexical differences, and field swapping across multiple entries for the same person. Many
times, visitors provide a shared company sales or marketing email, phone number and address
instead of their own. Other times the same visitor may provide different emails or phone
numbers, or an official address in one case and a personal address in another. There are also
misspellings, partial names with missing first, middle or last names, leading and trailing
spaces, and other typographical variations across all the fields. As a consequence of these
duplicates, UBM Asia was
- Missing cost-saving opportunities
- Delivering a sub-optimal customer experience, because the same customer was
  approached multiple times for the same offer
 
www.nubetech.co​ | info@nubetech.co 
 
 
The sheer size of the data as well as the nuanced differences make manual deduplication                             
impossible. As exact matches are rare, database joins and filtering are ruled out too.  
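
To illustrate why exact matching falls short, here is a minimal Python sketch. The records, field names and the 0.85 threshold are invented for illustration, and the standard library's difflib stands in for a real matching engine; it is not a description of how Reifier works internally.

```python
from difflib import SequenceMatcher

# Hypothetical contact records, invented for this example.
RECORDS = [
    {"id": 1, "name": "Jonathan Lee"},
    {"id": 2, "name": " Jonathon  Lee"},  # same person: typo plus stray spaces
    {"id": 3, "name": "Mei Ling Tan"},
]


def exact_pairs(rows):
    """Pairs that an equality join on the name column would return."""
    return [(a["id"], b["id"])
            for i, a in enumerate(rows) for b in rows[i + 1:]
            if a["name"] == b["name"]]


def similarity(left, right):
    """Character-level similarity in [0, 1], ignoring case and extra spaces."""
    def clean(text):
        return " ".join(text.split()).casefold()
    return SequenceMatcher(None, clean(left), clean(right)).ratio()


def fuzzy_pairs(rows, threshold=0.85):
    """Pairs whose names are at least `threshold` similar (assumed cut-off)."""
    return [(a["id"], b["id"])
            for i, a in enumerate(rows) for b in rows[i + 1:]
            if similarity(a["name"], b["name"]) >= threshold]


print(exact_pairs(RECORDS))  # the equality join finds no pairs
print(fuzzy_pairs(RECORDS))  # the two near-identical names are paired
```

Comparing every record with every other record grows quadratically with the number of records, so at a million entries a practical system also needs some way to limit which pairs are compared.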
 
Requirements 
UBM Asia wanted a solution for data matching and quality which could
a. Handle different variations in fields across records - missing first, middle and last
   names, abbreviations in different parts of names and addresses, typographical errors
   etc. The tool also needed to handle a mix of Chinese and English characters within the
   same record, as well as datasets containing both Chinese and English records
b. Support different geographies - even when the names are in English, there are regional
   differences between an event held in India and one held in Singapore
c. Yield results faster
d. Work without data massaging, normalization and preprocessing
 
Approach  
UBM Asia tried multiple existing solutions, but none of them could handle the complexity,
volume and variety of the data and provide a usable level of accuracy. Existing deduplication
solutions are rule and dictionary based. Defining and managing the rules is a complex and
time-consuming activity, performed by a developer who has a background in matching algorithms
and who tweaks the weights assigned to different fields. To create precise rules, a lot of
data cleansing and preprocessing is also needed. Rules and dictionaries need modification when
the context of the data changes, or when the language or locale changes. A rule mapping the
English name Jonathan to Jo is invalid in an Asian context, where Jo is a name in itself. Thus
learnings from one set of data cannot easily be reused on another, and adapting them requires
costly and time-consuming intervention from an expert.
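
The Jonathan/Jo problem can be made concrete with a small Python sketch. The nickname dictionary, the locale labels and the function names below are hypothetical, chosen only to show how a rule that helps in one context produces a false match in another.

```python
# Hypothetical nickname dictionary of the kind a rule-based matcher relies on.
ENGLISH_NICKNAMES = {"jo": "jonathan", "bill": "william"}

# Hypothetical per-locale rule sets; the locale keys are illustrative labels.
RULES_BY_LOCALE = {
    "english": ENGLISH_NICKNAMES,
    "asian": {},  # "Jo" is a full name here, so the nickname rule is dropped
}


def canonical(first_name, rules):
    """Map a first name to its canonical form under the given rules."""
    key = first_name.strip().casefold()
    return rules.get(key, key)


def same_first_name(a, b, locale):
    """Rule-based comparison of two first names for one locale."""
    rules = RULES_BY_LOCALE[locale]
    return canonical(a, rules) == canonical(b, rules)


print(same_first_name("Jo", "Jonathan", "english"))  # True
print(same_first_name("Jo", "Jonathan", "asian"))    # False
```

Every new language or locale needs its own reviewed rule set in a design like this, which is the maintenance cost described above.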
  
UBM Asia’s Business Intelligence team uses the Reifier fuzzy machine engine for smart
matching. With minimal setup time, Reifier matches and links contact records containing
different languages as well as variations across fields, yielding an accuracy of 70% or
more. The same training model works with English and Chinese records. UBM Asia is also
able to match and link Japanese records on the same setup without any configuration
changes. Using Reifier’s smart web interface, UBM Asia’s Business Intelligence team performs
its matching tasks with ease, deduplicating and linking data within minutes instead of
days [1].

Reifier’s innovative fuzzy machine algorithms use machine learning to overcome the
limitations of traditional systems. Reifier is directly managed by the business user, data
scientist or data engineer, who can train Reifier to identify duplicates just as a human
would, without the need for a data matching developer or expert.

[1] As per industry average and UBM Asia’s internal findings, a temporary worker can
manually verify up to 1,000 records a day.
 
“Before Reifier we had to use a lot of manual efforts to identify potential
duplicates in customer data, now the system can learn patterns and find
duplicates for us intelligently. It’s a breakthrough to a long-standing issue of our
businesses.”
- Mr. Dave Chan, Regional Director Business Intelligence, UBM Asia
 
Reifier’s automated learning engine brings up the deduplication system 5 times faster
and identifies 2 times more duplicates than conventional tools [2]. As Reifier learns from
the data itself, it works seamlessly with different datasets: products, people,
organizations, addresses etc. Built on Apache Spark, Reifier is highly scalable to billions
of records. Reifier can be deployed on premise or on the cloud, providing sufficient ROI to
the end user.
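
The general idea of learning from labeled examples, rather than hand-tuning field weights, can be sketched in Python as follows. This is a deliberately simple nearest-centroid classifier over per-field similarity scores, swapped in for illustration; Reifier's actual algorithms are not described in this document, and the field names, sample records and helper functions are all assumptions.

```python
from difflib import SequenceMatcher

FIELDS = ("name", "company")  # illustrative field names


def rec(name, company):
    """Build a hypothetical contact record."""
    return {"name": name, "company": company}


def features(a, b):
    """One similarity score in [0, 1] per field for a pair of records."""
    return [SequenceMatcher(None, a[f].casefold(), b[f].casefold()).ratio()
            for f in FIELDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(labeled_pairs):
    """Learn one centroid for duplicate pairs and one for distinct pairs."""
    dup = [features(a, b) for a, b, is_dup in labeled_pairs if is_dup]
    distinct = [features(a, b) for a, b, is_dup in labeled_pairs if not is_dup]
    return centroid(dup), centroid(distinct)


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def is_duplicate(a, b, model):
    """Classify a pair by whichever learned centroid is closer."""
    dup_centroid, distinct_centroid = model
    x = features(a, b)
    return squared_distance(x, dup_centroid) < squared_distance(x, distinct_centroid)


# Pairs a reviewer has marked as duplicate (True) or distinct (False).
LABELED = [
    (rec("Jonathan Lee", "Acme Trading Ltd"),
     rec("Jonathon Lee", "Acme Trading Limited"), True),
    (rec("Mei Ling Tan", "Orchid Foods"),
     rec("Tan Mei Ling", "Orchid Foods Pte Ltd"), True),
    (rec("Jonathan Lee", "Acme Trading Ltd"),
     rec("Priya Sharma", "Lotus Textiles"), False),
    (rec("Mei Ling Tan", "Orchid Foods"),
     rec("Kenji Watanabe", "Sakura Machinery"), False),
]

MODEL = train(LABELED)

# Unseen pairs: an abbreviated name at the same company, and two strangers.
print(is_duplicate(rec("Kenji Watanabe", "Sakura Machinery"),
                   rec("K. Watanabe", "Sakura Machinery Co"), MODEL))
print(is_duplicate(rec("Priya Sharma", "Lotus Textiles"),
                   rec("Jo Lim", "Harbour Logistics"), MODEL))
```

Nothing in the sketch is specific to names or companies: changing `FIELDS` and the labeled pairs is enough to retarget it to products or addresses, which is the property the paragraph above attributes to a system that learns from the data.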
 
To see how Reifier can help you, write to us at info@nubetech.co / tweet to @nubetech / call
+91-8800541717 today.
 
 
[2] Comparison performed independently by another customer; reference available on request.
 