Reifier
© Nube Technologies
About Myself and Nube
- Big data - Hadoop, Spark
- Analytics, Data wrangling, Machine Learning
- Nube Products - Reifier, Crux and HIHO
- IIT Delhi, '98
- Co-founder from IIT Kanpur, '97
Business Challenges
Business data is spread across many systems
● Discovering information is a challenge - which entities do we need to address?
● Consolidating information is a challenge - it is unclear whether the data ties back to a single entity
● Enhancing data is a challenge - are these new records genuine, or do they already exist?
The problem - lake or swamp?
According to Gartner, businesses lose up to 25% of potential revenue due to the lack of a multichannel view of their data. 67% of data scientists say cleaning, organizing and linking data is their most time-consuming task, and 52.3% cite poor data quality as their biggest challenge.
Technical Challenges for Matching
● Data volumes are high
● Each record has multiple dimensions
● Exact matches are rare
● Comparing each record with every other record is not feasible
● There are many disparate systems
● Each language has its own issues
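Why comparing every record with every other is not feasible follows from simple counting: n records produce n(n-1)/2 pairs. A widely used workaround, usually called blocking, compares only records that share a cheap key. The Python sketch below is a generic illustration, not Reifier's implementation; the blocking key (first three letters of the surname plus postcode) and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n: int) -> int:
    """Comparisons needed when every record is compared with every other."""
    return n * (n - 1) // 2


def blocking_key(record: dict) -> str:
    # Hypothetical key: first 3 letters of the surname plus the postcode.
    return record["surname"][:3].lower() + "|" + record["postcode"]


def candidate_pairs(records: list[dict]) -> list[tuple[int, int]]:
    """Group records by blocking key and compare only within each group."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical sample records.
records = [
    {"surname": "Sharma", "postcode": "110016"},
    {"surname": "Sharmaa", "postcode": "110016"},
    {"surname": "Gupta", "postcode": "208016"},
    {"surname": "Gupta", "postcode": "400001"},
    {"surname": "Chan", "postcode": "999077"},
]

print(all_pairs_count(len(records)))  # 10 comparisons without blocking
print(candidate_pairs(records))       # [(0, 1)] with blocking
print(all_pairs_count(1_000_000))     # 499999500000
```

Records that land in different blocks are never compared, so a poorly chosen key can hide true duplicates: with this key the two Gupta records are never compared, whether or not they refer to the same person.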
Technical Challenges for Matching
● Discovering and maintaining rules for data quality is extremely tough
● Custom coding and domain-specific logic make maintenance a nightmare
● No one size fits all - big custom implementations are needed every time, even when using existing tools
Reifier Advantages
● Point and shoot - zero configuration
● Learns similarity definitions from data
● No hard coding of business rules
● Highly scalable - runs on open-source Apache Spark
● Advanced machine learning algorithms pick the optimal solution
● Domain agnostic - works with various kinds of data
● Utilities to create labeled data are available - just point them to the data
Reifier Advantages
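One common way to learn similarity definitions from data, instead of hard-coding rules, is to compute per-field similarity features for labeled record pairs and fit a classifier on them. The sketch below fits a tiny logistic regression with plain gradient descent. It is an illustrative assumption, not Reifier's actual algorithm; the field names, training pairs and hyperparameters are hypothetical.

```python
import math
from difflib import SequenceMatcher

# Hypothetical schema; the slides do not describe Reifier's real features.
FIELDS = ["name", "city", "phone"]


def features(a: dict, b: dict) -> list[float]:
    """Per-field string similarity in [0, 1], plus a constant bias feature."""
    sims = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in FIELDS]
    return sims + [1.0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    xs = [features(a, b) for a, b in pairs]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def match_probability(w, a, b) -> float:
    """Learned probability that records a and b refer to the same entity."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b))))


def rec(name, city, phone):
    return {"name": name, "city": city, "phone": phone}


# Hypothetical labeled pairs: 1 = duplicate, 0 = different entities.
pairs = [
    (rec("Dave Chan", "Hong Kong", "2827 1234"), rec("David Chan", "Hong Kong", "28271234")),
    (rec("Acme Corp", "Mumbai", "022 555 0101"), rec("ACME Corporation", "Mumbai", "0225550101")),
    (rec("Dave Chan", "Hong Kong", "2827 1234"), rec("Mary Wong", "Shanghai", "6123 9876")),
    (rec("Acme Corp", "Mumbai", "022 555 0101"), rec("Apex Ltd", "Delhi", "011 444 0202")),
]
labels = [1, 1, 0, 0]

weights = train(pairs, labels)
print([round(match_probability(weights, a, b), 2) for a, b in pairs])
```

The learned weights, rather than a hand-written rule, decide how much name, city and phone similarity each count toward a match; adding labeled pairs changes the definition without changing code.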
Reifier Advantages
● Handles different languages - English, Chinese, Japanese
● Highly accurate results
● Available as a library or as a private/public cloud deployment
● REST interface
● AJAX-based web front end
● Real-time as well as batch support
● Support and documentation through a web-based support portal: http://reifier.freshdesk.com
Reifier Advantages
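Mixed English, Chinese and Japanese data usually needs normalization before any comparison: CJK systems often store full-width Latin letters and ideographic spaces, and Chinese and Japanese text has no spaces between words. A common approach, sketched below as an assumption rather than a description of Reifier's internals, is Unicode NFKC normalization followed by character n-gram similarity. The sample strings are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFKC-fold full-width forms (e.g. 'Ａ' -> 'A', U+3000 -> ' '), lowercase, collapse spaces."""
    folded = unicodedata.normalize("NFKC", text).lower()
    return " ".join(folded.split())


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Character n-grams need no word boundaries, so they also work for CJK text."""
    s = normalize(text).replace(" ", "")
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' character n-gram sets."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


print(normalize("ＵＢＭ　Ａｓｉａ"))  # ubm asia
print(jaccard("東京都港区芝公園４丁目", "東京都 港区 芝公園4丁目"))  # 1.0
```

NFKC does not map traditional Chinese characters to simplified ones; that needs a separate conversion table.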
Customer Feedback
“Before Reifier we had to use a lot of manual efforts to identify potential duplicates in customer data, now the system can learn patterns and find duplicates for us intelligently. It’s a breakthrough to a long-standing issue of our businesses.”
- Mr. Dave Chan, Regional Director Business Intelligence, UBM Asia
Case Study - UBM Asia
- Deduplication of marketing data
- Combination of English, Chinese, Japanese and other languages
- Up to 1 million new records per week
- A temp worker can manually process only about 800 records per day
- Hosted on AWS, yearly license
- Reference customer
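A back-of-the-envelope calculation shows why manual review could not keep up with these volumes. Assuming a five-day working week (an assumption; the case study does not say), the number of temps needed to process 1 million new records per week at about 800 records per temp per day is:

```latex
\frac{10^{6}\ \text{records per week}}{800\ \text{records per temp per day} \times 5\ \text{days per week}}
  = \frac{10^{6}}{4 \times 10^{3}}
  = 250\ \text{temps}
```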
Case Study - Government of India
- Invited to perform data matching for intelligence agencies
- Reifier outperformed the leading international competition by 2x on accuracy and by more than 10x on speed
- Matched 40 million records
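For scale, exhaustively comparing every pair among n = 40 million records would require the count below. The rate of 10^8 comparisons per second in the second line is a hypothetical figure chosen only to make the magnitude concrete:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}
  = \frac{(4 \times 10^{7})(4 \times 10^{7} - 1)}{2}
  \approx 8 \times 10^{14}\ \text{comparisons}

\frac{8 \times 10^{14}\ \text{comparisons}}{10^{8}\ \text{comparisons/s}}
  = 8 \times 10^{6}\ \text{s} \approx 93\ \text{days}
```

This is why matching at this size has to restrict comparisons to candidate pairs grouped by an index rather than compare everything with everything.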
Case Study - BFSI
A banking institution uses Reifier to match loan applications against credit listing data to ensure that it is not dealing with blacklisted individuals or corporates.
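Screening applications against a blacklist comes down to fuzzy lookups of each applicant name in a reference list. The minimal sketch below uses Python's standard `difflib` with a hypothetical 0.85 threshold and made-up names. It is not Reifier's method: the slides describe Reifier as learning its similarity definitions from data, and a real system would use an index instead of scanning the whole list.

```python
import re
from difflib import SequenceMatcher


def canonical(name: str) -> str:
    """Lowercase, drop punctuation and sort tokens so word order does not matter."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", name.lower())))


def screen(applicant: str, blacklist: list[str], threshold: float = 0.85) -> list[tuple[str, float]]:
    """Return blacklist entries whose similarity to the applicant meets the threshold."""
    a = canonical(applicant)
    hits = []
    for entry in blacklist:
        score = SequenceMatcher(None, a, canonical(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 2)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical blacklist entries.
blacklist = ["Rajesh K. Malhotra", "Sunrise Traders Pvt Ltd", "Wong Mei Ling"]
print(screen("Malhotra, Rajeesh K", blacklist))  # [('Rajesh K. Malhotra', 0.97)]
print(screen("Ling Wong Mei", blacklist))        # [('Wong Mei Ling', 1.0)]
print(screen("Priya Nair", blacklist))           # []
```

Sorting tokens handles reordered names ("Malhotra, Rajesh" vs "Rajesh Malhotra"), while the similarity ratio tolerates small misspellings such as "Rajeesh".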
Case Study - BFSI
A leading insurance provider uses Reifier to prevent fraudulent claims. By creating a centralized, consolidated data repository, the company reduces its overexposure to individuals who hold multiple policies. By matching records, Reifier also helps find the average number of policies per individual and per household.
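Once pairs of records have been matched, they still have to be grouped into entities before a question like "how many policies does each individual hold?" can be answered. A standard way to do this, assumed here for illustration, is to treat matched pairs as graph edges and take connected components with a union-find structure. The policy records and matched pairs are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_entities(n_records: int, matched_pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Connected components of the match graph: one list of record indices per entity."""
    uf = UnionFind(n_records)
    for a, b in matched_pairs:
        uf.union(a, b)
    groups = defaultdict(list)
    for i in range(n_records):
        groups[uf.find(i)].append(i)
    return sorted(groups.values())


# Hypothetical policy records: each row is one policy held by someone.
policies = [
    {"policy": "P-001", "holder": "Anita Desai"},
    {"policy": "P-002", "holder": "A. Desai"},
    {"policy": "P-003", "holder": "Desai, Anita"},
    {"policy": "P-004", "holder": "Vikram Rao"},
]
# Pairs a matcher might return; records 0 and 2 are linked transitively via 1.
matched = [(0, 1), (1, 2)]

entities = group_entities(len(policies), matched)
print(entities)                       # [[0, 1, 2], [3]]
print(len(policies) / len(entities))  # 2.0 policies per individual on average
```

Transitivity matters here: the matcher never compared records 0 and 2 directly, yet they end up in the same entity.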
Case Study - BFSI
A credit rating company uses Reifier to consolidate personal credit histories from different sources and provide accurate ratings to its customers.
Case Study - Cross Selling
A telecom company offers various products and services and wants to cross-sell to existing customers. Existing customer information is fuzzily matched for accurate customer segmentation and marketing.
Case Study - Regulatory
Regulatory compliance of all kinds - including policies, taxes, privacy, anti-terrorism and anti-money-laundering - requires matching up data pulled from a variety of sources. With Reifier, organizations can meet regulatory mandates with capabilities that support everything from simple deduplication of customer lists to matching data against government lists of suspected terrorists.
Case Study - Lead Generation
A services company sources organization and people data from LinkedIn and Crunchbase and uses Reifier to match it against existing in-house entities to identify leads.
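Company names from different sources rarely agree character for character: legal suffixes, punctuation and "&" versus "and" vary. The sketch below shows a simple normalization step that makes such names directly comparable. The suffix list, company names and exact-lookup index are hypothetical; the slides describe Reifier as learning such rules from data, so this hand-written list only illustrates the kind of variation involved.

```python
import re

# Hypothetical, deliberately incomplete list of legal-form suffixes.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited", "pvt", "private",
                  "corp", "corporation", "co", "gmbh", "plc"}


def normalize_org(name: str) -> str:
    """Lowercase, spell out '&', drop punctuation and trailing legal-form suffixes (ASCII only)."""
    text = name.lower().replace("&", " and ")
    tokens = re.findall(r"[a-z0-9]+", text)
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


# Hypothetical in-house and crawled organization names.
in_house = ["Globex Software Pvt. Ltd.", "Smith & Sons Corp", "Acme Inc"]
crawled = ["GLOBEX SOFTWARE", "Smith and Sons Corporation", "Acme Co., Ltd.", "Initech LLC"]

index = {normalize_org(n): n for n in in_house}
for name in crawled:
    print(f"{name!r} -> {index.get(normalize_org(name))!r}")
```

Names that find no in-house match ("Initech LLC" here) are the candidate new leads; in practice the normalized names would feed a fuzzy matcher rather than an exact dictionary lookup.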
Case Study - Retail Operations
By consolidating vendor information from different geographies, source systems and channels, a retail operator gets a complete view of its supply chain and is able to negotiate better deals and discounts from its vendors. Reifier helps the retailer cut costs.
Case Study - Telecom
Using Reifier, telecom companies can detect delinquency patterns by identifying non-paying customers who evade detection by enrolling with similar-sounding names and with addresses that differ in formatting and spelling.
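Similar-sounding names are classically caught with a phonetic code that maps names which sound alike to the same key. American Soundex, implemented below, is one well-known example; it is shown only to illustrate the idea, nothing in the slides says Reifier uses it, and it only handles English-language names in ASCII letters.

```python
# American Soundex digit groups; vowels, Y, H and W get no digit.
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex code for an English-language name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    prev = CODES.get(first, "")
    digits = []
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")  # vowels and Y map to "" and reset the run
        if code and code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]


for a, b in [("Smith", "Smyth"), ("Robert", "Rupert"), ("Catherine", "Kathryn")]:
    print(a, soundex(a), b, soundex(b))
```

"Catherine" (C365) and "Kathryn" (K365) sound alike but get different codes because Soundex keeps the first letter verbatim, which is one limitation of relying on a single fixed rule.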
Case Study - Directory Service
A local search company lists millions of regional businesses, restaurants and contacts. It periodically crawls the web to update its listing database. Information crawled from the web contains similar entries found on different websites, as well as entries that already exist in the database. Reifier helps the search company compare its existing listings with potential listings from the crawled data, keeping its directory up to date and free from duplicate data.
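Keeping a directory free of duplicates as crawled data arrives amounts to an upsert: look up likely existing listings for each crawled entry, then either merge it into a match or add it as new. The sketch below indexes listings by a normalized phone number and confirms candidates with a name-similarity check. The fields, the last-8-digits phone key and the 0.8 threshold are all hypothetical choices, not Reifier's behavior.

```python
import re
from difflib import SequenceMatcher


def phone_key(phone: str) -> str:
    """Hypothetical index key: the last 8 digits, ignoring formatting and prefixes."""
    return re.sub(r"\D", "", phone)[-8:]


class Directory:
    """Toy listing store with a phone-number index for candidate lookup."""

    def __init__(self, threshold: float = 0.8):
        self.listings: list[dict] = []
        self.by_phone: dict[str, list[int]] = {}
        self.threshold = threshold  # hypothetical name-similarity cut-off

    def upsert(self, listing: dict) -> str:
        """Merge into a matching existing listing, or add it as a new one."""
        key = phone_key(listing["phone"])
        for idx in self.by_phone.get(key, []):
            existing = self.listings[idx]
            sim = SequenceMatcher(None, existing["name"].lower(), listing["name"].lower()).ratio()
            if sim >= self.threshold:
                # Only fill in fields the existing listing lacks; never overwrite.
                for field, value in listing.items():
                    existing.setdefault(field, value)
                return "merged"
        self.listings.append(dict(listing))
        self.by_phone.setdefault(key, []).append(len(self.listings) - 1)
        return "added"


directory = Directory()
print(directory.upsert({"name": "Blue Lotus Cafe", "phone": "+91 11 2345 6789"}))  # added
print(directory.upsert({"name": "Blue Lotus Café", "phone": "011-23456789", "hours": "9-22"}))  # merged
print(directory.upsert({"name": "Red Chilli Diner", "phone": "011 2345 6789"}))  # added
print(len(directory.listings))  # 2
```

The phone index keeps each lookup cheap, and the name check stops two different businesses that share a number from being merged.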
Reifier - News
● Reifier 1.0 released in October 2014 with one international paying customer.
● Reifier 2.0, with an interactive web GUI, released in March 2015.
● Government of India (GOI) proof of concept in August - September 2015.
● Working on real-time matching, merging and GUI enhancements.
Reifier Technology
● Accept or create training data with marked duplicates
● Identify similarity and indexing rules through machine learning
● Group near-similar records together
● Match and predict similar records
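The step of identifying indexing rules from training data can be illustrated with a simple heuristic: for each candidate blocking key, measure how many labeled duplicate pairs it keeps together (recall) and how many comparisons it generates, then pick the cheapest key that reaches a recall target. The candidate keys, scoring rule and data below are hypothetical; this is not Reifier's algorithm.

```python
from collections import Counter

# Hypothetical candidate blocking keys to choose from.
CANDIDATE_KEYS = {
    "surname_prefix3": lambda r: r["surname"][:3].lower(),
    "postcode": lambda r: r["postcode"],
    "first_initial": lambda r: r["first"][:1].lower(),
}


def evaluate_key(key_fn, records, duplicate_pairs):
    """Return (recall on labeled duplicate pairs, number of candidate comparisons)."""
    keys = [key_fn(r) for r in records]
    kept = sum(1 for a, b in duplicate_pairs if keys[a] == keys[b])
    comparisons = sum(c * (c - 1) // 2 for c in Counter(keys).values())
    return kept / len(duplicate_pairs), comparisons


def choose_key(records, duplicate_pairs, min_recall=1.0):
    """Among keys reaching min_recall, pick the one that generates the fewest comparisons."""
    scored = {name: evaluate_key(fn, records, duplicate_pairs) for name, fn in CANDIDATE_KEYS.items()}
    eligible = [(comps, name) for name, (recall, comps) in scored.items() if recall >= min_recall]
    best = min(eligible)[1] if eligible else None
    return best, scored


# Hypothetical training data: records plus index pairs marked as duplicates.
records = [
    {"first": "Anita", "surname": "Desai", "postcode": "560001"},
    {"first": "Anitha", "surname": "Desai", "postcode": "560001"},
    {"first": "Vikram", "surname": "Rao", "postcode": "560001"},
    {"first": "Vikram", "surname": "Rao", "postcode": "500032"},
    {"first": "Meera", "surname": "Iyer", "postcode": "560001"},
    {"first": "Arjun", "surname": "Mehta", "postcode": "400001"},
]
duplicates = [(0, 1), (2, 3)]

best, scored = choose_key(records, duplicates)
print(scored)  # {'surname_prefix3': (1.0, 2), 'postcode': (0.5, 6), 'first_initial': (1.0, 4)}
print(best)    # surname_prefix3
```

The postcode key is rejected because it separates a labeled duplicate pair (a person who moved), while the first-initial key keeps both pairs but costs twice as many comparisons as the surname prefix.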
Reifier Under The Hood
● Built using open source
● Apache Spark
● Elasticsearch
● Machine Learning
● Java
● Scala