- 1. Single View with MongoDB
MongoDB World London
6th November 2015
Robert Hill – Head of Big Data for Financial Services
- 2. Single View with MongoDB | November 2014
Copyright ©Capgemini 2014. All Rights Reserved 2
Single View – No, It’s Not Tinder!
Single View is the formation of a unified view of an “entity” from a mix of source systems
These entities can be customers, employees, partners, suppliers, etc.
In reality, customers make up the vast majority of use cases, so this is commonly called Single View of Customer, or SVC
[Figure: Canonical Single View Architecture. Callout on the matching step: fuzzy-matches customer records, generates link IDs, etc.]
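The fuzzy-matching and link-ID step named on this slide can be sketched as follows. This is a minimal illustration, not the method used in the canonical architecture: the field names (`name`, `postcode`), the 0.85 threshold, and the use of Python's `difflib` for string similarity are all assumptions made for the example.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import count


def similarity(a: dict, b: dict) -> float:
    """Average string similarity over name and postcode (hypothetical fields)."""
    fields = ("name", "postcode")
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields
    ]
    return sum(scores) / len(scores)


def assign_link_ids(records: list[dict], threshold: float = 0.85) -> list[dict]:
    """Give each record a link_id; a record matching an earlier one shares its ID."""
    next_id = count(1)
    linked: list[dict] = []
    for rec in records:
        match = next((r for r in linked if similarity(r, rec) >= threshold), None)
        link_id = match["link_id"] if match else next(next_id)
        linked.append({**rec, "link_id": link_id})
    return linked


records = [
    {"source": "crm", "name": "John Smith", "postcode": "EC1A 1BB"},
    {"source": "loans", "name": "Jon Smith", "postcode": "EC1A 1BB"},
    {"source": "web", "name": "Mary Jones", "postcode": "SW1A 2AA"},
]
for rec in assign_link_ids(records):
    print(rec["source"], rec["name"], rec["link_id"])
```

The two "Smith" records end up sharing one link ID while the source records themselves are left untouched. A production matcher would compare far more attributes and avoid the all-pairs comparison used here.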
- 3. Why Care About a Single View?
Let’s say we end up with 100 “John Smiths” in our Data Warehouse
How many are different John Smiths in person?
How many are simply different systems representing the same John Smith?
How many are a single system representing the same John Smith multiple times?
How many are a “John Smith” that has contacted us multiple times through differing
channels, branches, or brands, in differing contexts – e.g., corporate CFO John Smith of
XYZ Corp. is also citizen John Smith, who has a mortgage, auto loan, and a checking
account?
Any customer-centric activity becomes very difficult when we actually
cannot tell with certainty who a “customer” is…that includes Risk
modelling, Fraud detection, and of course Customer Analytics for
marketing and sales.
Taking the example of our CFO above, a bank would be hesitant to turn
him down for another car loan given he might have his company invest 20
million with the bank’s business division, wouldn’t they?
- 4. Lack of customer knowledge has a high potential cost – poor
understanding of customer view data has been known to have huge
business impacts.
For example, a customer data flaw in a demand forecasting system cost a major US
airline $50 million in one year of operation…on top of the $40 million development cost to
implement it
When a global stationery retailer examined their views of a customer, it was found that
every real customer had roughly 2.5 “virtual” customer records across 75 source data
systems. In short, they had no way of really understanding the value of any given
customer, or even a customer segment, via their sales and marketing data, and the
resulting cost was estimated as a net loss of 9% per customer transaction
Bad SVC is Bad, Bad Business
- 5. Big Data is Making SVC Harder Than Ever
The growth in Data Lakes (or Data Hubs depending…) means that
companies store more and more information about entities than they have
ever had access to before
More data is not the same as more information – what good is knowing
everything about “John Smith” when you have records for 30,000 “John
Smiths” stored from various sources in your Data Lake…notice that we
may have an order of magnitude more “John Smiths” than we had prior to
Big Data
Big Data also means richer data…now we require SVC programming to
detect duplicate customers from more varied data streams, such as web,
images, voice, geospatial, etc. The matching algorithms become much
harder, slower to develop, and more costly
The Data Hub exists to allow new extracts/titrations of it to change to meet
business needs…which places greater demands on the SVC solution to
adapt to new data formats
- 6. Single View Affects All Big 4 Big Data Use Cases
Data Rationalisation
The Data Lake / Data Hub architecture enables
companies to retain all data in original source formats
These source formats are rife with duplicated entity
objects, and any use of the Data Lake in its native
form for analytics or modelling may produce
indeterminate or inaccurate results
The move from Extract/Transform/Load to
Extract/Load/Transform has pushed this further down-stream
Fraud
Big Data is enabling longer retention of data, and
richer sources of data including voice and image
Fraud is moving towards real-time detection and
decisioning, where performance is important
But Big Data increases the difficulty of finding a “true”
customer record to model against, and can exacerbate
the performance issues of real-time or near-real-time
fraud models
Risk
Similarly to Fraud, Big Data is enabling Risk models to
have access to more and richer customer data,
including social media, detailed web interactions,
voice, and image data
This leads to more customer interactions in the data,
and potentially better data training sets for better risk
modelling - if a single customer can be identified to
input into the models! The confusion matrix of the
models is now highly dependent upon Single View.
Customer Analytics
As above, Customer Analytics and the CRM actions it
enables (NBA, NBO, real-time targeting, etc.) are all
potential beneficiaries of Big Data.
With CA, the risk of mis-identifying a customer is even
greater, as the messaging directly to the customer may
be obviously wrong. More subtly, constantly
suggesting “customers also liked…” and being entirely
wrong routinely suggests to customers that the
company really doesn’t know or care about them.
- 7. Single View Challenges – It Can’t Be Rocket
Science?
Speed of Comparison - SVC algorithms usually require retrieval and
comparison of vast amounts of entity data for duplicate detection.
Historically, this has made them poor candidates for executing
from RDBMSs, and flat files in the landing area of DWs are common
Flexibility – as source systems change, the data design of an entity data
object must absorb the changes of every source system
that underlies it. In an RDBMS, this can have a large impact on the stability
of the Customer table and associated reference data
Speed of Access - Real-Time decisioning requires very fast access to the
underlying data, usually precluding joins in high-load environments
Reliability – as SVC stores begin to underlie more real-time processes, it
becomes imperative that they have high-availability and fail-over
Representation Flexibility - SVC processing can either combine or link
entity data objects, depending upon use cases being considered
- 8. How the Canonical Model Stacks Up…And Why It Is
Falling Over
Speed - RDBMS provides limited throughput for Comparison Processing
Flexibility – RDBMS has limited flexibility, can require substantial redevelopment as source systems change
Speed of Access - RDBMS usually requires joins, limiting speed of access
Reliability – RDBMS may support clustering, but usually with extra software costs, e.g., Oracle RAC
Representation Flexibility - RDBMS usually requires joins or combining physical records destructively
- 9. Enter…Mongo! er, MongoDB
MongoDB is an exciting and powerful platform for implementing
enterprise-class Single View solutions
The design of MongoDB enables implementations that avoid the pitfalls of
traditional RDBMS-based Single View architectures, with a lower cost of
implementation
Due to the on-going flexibility of MongoDB to handle source system
changes and mixed data types, it is very likely that the overall Total Cost of
Ownership of MongoDB solutions will be lower for the entire solution
lifecycle
- 10. MongoDB – a new Big Data SVC architecture
We envision that MongoDB will usually sit on top of a Data Lake (or ODS)
ETL has therefore been replaced with EL
Single View processing may (if possible) be moved into MongoDB, using MapReduce
Let’s look in detail…
[Figure: New MongoDB Single View Architecture]
- 11. MongoDB provides Fast Speed of Access…
MongoDB Avoids Joins
Innate to MongoDB is a database architecture that
strives to minimise joins, which is a design philosophy
for most Real-Time Decisioning databases
Embedded documents provide a way to de-normalise
repeated source data with no performance hit (subject
to growth of the object in size)
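As a rough illustration of the embedding idea above, the sketch below folds joined rows into a single customer document using plain Python dictionaries. The table and field names are hypothetical, and no MongoDB driver is involved; the point is only the document shape.

```python
# Relational shape: three "tables" that must be joined on customer_id.
customers = [{"customer_id": 1, "name": "John Smith"}]
addresses = [{"customer_id": 1, "city": "London"}]
accounts = [
    {"customer_id": 1, "type": "mortgage"},
    {"customer_id": 1, "type": "checking"},
]


def embed(customer: dict) -> dict:
    """Fold the child rows into one document so a read needs no join."""
    cid = customer["customer_id"]
    return {
        "_id": cid,
        "name": customer["name"],
        "addresses": [
            {"city": a["city"]} for a in addresses if a["customer_id"] == cid
        ],
        "accounts": [
            {"type": a["type"]} for a in accounts if a["customer_id"] == cid
        ],
    }


# One lookup by _id now returns the whole customer view.
single_view = {doc["_id"]: doc for doc in map(embed, customers)}
print(single_view[1]["accounts"])
```

The trade-off noted on the slide still applies: the embedded arrays grow with the customer's history, so very large or unbounded child lists may be better kept as separate documents.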
Flexible Indexing
MongoDB provides flexible and powerful indexing
features that allow the system to access specific data
objects rapidly. As most Single View uses have very
specific and known access patterns, they are easily
indexed
Where possible, Covered Queries allow MongoDB to
return results from the index itself, without fetching the
underlying documents. When the index is held in memory,
this avoids disk access entirely
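To make the Covered Query condition concrete, here is a simplified check written in plain Python. It assumes a flat equality filter and an inclusion-style projection, and it ignores the further restrictions that apply to array and embedded fields; the index and field names (`link_id`, `name`, `email`) are hypothetical.

```python
def is_covered(index_fields, query, projection):
    """Return True if the index alone could answer the query (simplified rule).

    Every filtered field and every returned field must be part of the index,
    and _id must be excluded from the projection unless the index contains it.
    """
    indexed = set(index_fields)
    returned = {field for field, include in projection.items() if include}
    if not returned:
        # No explicit inclusions means whole documents are returned.
        return False
    id_excluded = projection.get("_id", 1) == 0
    if "_id" not in indexed and not id_excluded:
        return False
    return set(query) <= indexed and returned <= indexed


index = ("link_id", "name")
print(is_covered(index, {"link_id": 42}, {"_id": 0, "name": 1}))              # covered
print(is_covered(index, {"link_id": 42}, {"_id": 0, "name": 1, "email": 1}))  # not covered
```

The second call fails the check because `email` is not in the index, so the documents themselves would have to be read.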
Horizontal Scalability
MongoDB is horizontally scalable through the use of
sharding technology. Shards allow MongoDB
instances to be added to achieve the desired level of
concurrent performance for large numbers of queries
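The routing idea behind range-based sharding can be sketched in a few lines. This is a toy model only: the shard key (`link_id`), the chunk boundaries, and the shard names are invented for illustration, and a real cluster keeps this mapping in its config servers and rebalances it over time.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries on a "link_id" shard key.
# Chunk i covers [CHUNK_LOWER_BOUNDS[i], CHUNK_LOWER_BOUNDS[i + 1]).
CHUNK_LOWER_BOUNDS = [0, 1000, 2000]
CHUNK_OWNERS = ["shard-a", "shard-b", "shard-c"]


def route(link_id: int) -> str:
    """Return the shard that owns the chunk containing this shard-key value."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, link_id) - 1
    if index < 0:
        raise ValueError(f"no chunk covers shard key {link_id}")
    return CHUNK_OWNERS[index]


for key in (999, 1000, 1500, 5000):
    print(key, route(key))
```

Adding a shard in this model means splitting a range and assigning the new chunk to the new owner, which is why queries that include the shard key can be sent to a single shard rather than broadcast.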
Key to enabling the use of Single View
data is the ability to access it quickly to
perform Real-Time Decisioning.
- 12. MongoDB Provides Rapid Speed of Comparison…
MongoDB Integrates MapReduce
By embedding Map/Reduce processing, MongoDB
provides a better way to run large dataset Single View
processes
As MapReduce operates directly against the
MongoDB database, data import and export are
eliminated
MapReduce Allows Intelligent SV Algorithms
MapReduce can implement very powerful algorithms in
JavaScript expressions
One of the primary uses of MapReduce is to find
similar objects and tag them or collect them
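The "find similar objects and collect them" pattern can be sketched in plain Python as a map step that emits a match key and a reduce step that merges record IDs per key. MongoDB's own MapReduce functions are written in JavaScript; this version only mirrors the shape of the computation, and the blocking key (normalised surname plus date of birth) and field names are assumptions.

```python
from collections import defaultdict


def map_record(record):
    """Emit (match_key, value); the key is a hypothetical blocking key."""
    key = (record["surname"].strip().lower(), record["dob"])
    yield key, {"ids": [record["_id"]]}


def reduce_values(key, values):
    """Merge emitted values; the output has the same shape as each input value."""
    merged = []
    for value in values:
        merged.extend(value["ids"])
    return {"ids": sorted(merged)}


def map_reduce(records):
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    # Reduce is skipped for keys with a single value, as MongoDB does.
    return {
        key: values[0] if len(values) == 1 else reduce_values(key, values)
        for key, values in grouped.items()
    }


records = [
    {"_id": "crm-1", "surname": "Smith", "dob": "1970-01-01"},
    {"_id": "loans-9", "surname": " smith", "dob": "1970-01-01"},
    {"_id": "web-4", "surname": "Jones", "dob": "1982-05-17"},
]
results = map_reduce(records)
candidates = {k: v["ids"] for k, v in results.items() if len(v["ids"]) > 1}
print(candidates)
```

Keys with more than one ID are the suspected duplicates that a later step would tag or link.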
Data Access Speed
The same technologies that enable Rapid Speed of
Access also enable the rapid execution of SVC:
• Indexing allows rapid data access if needed, including
Covered Queries when possible
• Sharding again allows the MongoDB cluster to scale
appropriately to handle large data volumes and loads,
without the need for costly technologies such as Oracle RAC
Many Single View processing algorithms
are slow and inefficient if they use a
database, or rely upon difficult-to-manage
flat files as data input and
output
- 13. MongoDB Provides Representation Flexibility…
A key design issue for many SVC implementations is how strongly to link or combine
suspected duplicates. For applications such as maintaining a bank’s central records,
it is usually not advised to eliminate suspected duplicates unless the algorithm is
nearly 100% certain, or it is verified by human inspection. However, for a database
merely running marketing operations, there is a much lower cost of combining
suspected duplicates, even if they are false matches. MongoDB can easily cater for
both
MongoDB can provide Referenced documents
Despite aiming to eliminate joins, MongoDB can
flexibly support more normalized and linked records,
using Referenced documents
This allows suspected duplicate customer documents
to be linked to a real or generated customer master
document, and not be overwritten.
Such an approach remains auditable and reportable at
any time
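A minimal sketch of the referenced-document approach, using plain dictionaries in place of collections: suspected duplicates gain a reference to a generated master document, and none of their original fields are overwritten. The identifiers and the `master_id` / `source_ids` field names are hypothetical.

```python
import copy

# Source records as they arrived from two systems (illustrative data).
source_records = {
    "crm-1": {"_id": "crm-1", "name": "John Smith", "email": "j.smith@example.com"},
    "loans-9": {"_id": "loans-9", "name": "Jon Smith", "branch": "London"},
}
masters = {}


def link_to_master(master_id, source_ids):
    """Create a master document and add a reference to it on each source record."""
    masters[master_id] = {"_id": master_id, "source_ids": list(source_ids)}
    for source_id in source_ids:
        source_records[source_id]["master_id"] = master_id


def resolve(master_id):
    """Follow the references from a master document back to its source records."""
    return [source_records[sid] for sid in masters[master_id]["source_ids"]]


before = copy.deepcopy(source_records)
link_to_master("cust-0001", ["crm-1", "loans-9"])
print([rec["name"] for rec in resolve("cust-0001")])
```

Because linking only adds a reference, an incorrect match can be undone by removing `master_id`, and the original values remain available for audit.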
Batch or Real-Time
Due to the power of MapReduce integrated into the
MongoDB platform, various use cases may be catered
for
The traditional, batch-oriented approach may be
implemented and match keys written back to the
MongoDB database
For certain cases, it may be desirable to not perform
SVC comparisons and linking until query time, which
allows fully flexible linking
- 14. MongoDB Provides Data Flexibility…
Flexible Document Formatting
By retaining source system data longer, Data Lakes
increase the variability of source record formats. In a
traditional RDBMS Data Warehouse, these changes
are costly to implement and track
MongoDB’s flexible JSON/BSON document structure
accepts variant record formats easily, with no
conversion of existing records, no query rewrites,
etc.
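As a small illustration of variant record formats living side by side, the sketch below reads a display name from two differently shaped customer documents without converting either one. The field names are hypothetical, and the documents are plain Python dictionaries rather than driver objects.

```python
def display_name(doc: dict) -> str:
    """Read a name from either the older or the newer record layout."""
    if "name" in doc:
        return doc["name"]
    first = doc.get("first_name", "")
    last = doc.get("last_name", "")
    return f"{first} {last}".strip()


# Two source systems, two layouts, one collection.
docs = [
    {"_id": 1, "name": "John Smith"},
    {"_id": 2, "first_name": "John", "last_name": "Smith", "twitter": "@jsmith"},
]
print([display_name(d) for d in docs])
```

The newer record carries an extra attribute that the older one lacks, and neither had to be migrated for the other to be stored.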
Non-Structured Data Sources
MongoDB accommodates BSON objects up to 16MB,
and can easily incorporate larger non-structured
source data using GridFS.
GridFS stores very large image, video, audio and other
non-structured data sources as chunks, each in its
own document, alongside a separate document holding the file's metadata
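The chunking scheme can be sketched without a database. The function below splits a byte string into chunk documents plus one file document, using the field names and the 255 KB default chunk size of the GridFS convention; it is an illustration of the layout, not the driver's implementation, and the `customer_id` metadata field is an assumption.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes


def to_gridfs_docs(file_id, filename, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into one file document and a list of chunk documents."""
    file_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "metadata": {"customer_id": 1},  # hypothetical link to the entity
    }
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return file_doc, chunks


payload = bytes(600 * 1024)  # stand-in for a scanned document or voice recording
file_doc, chunks = to_gridfs_docs("f1", "signature.png", payload)
print(file_doc["length"], len(chunks), [len(c["data"]) for c in chunks])
```

Reassembling the file is a matter of reading the chunks for one `files_id` in order of `n` and concatenating their `data`.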
The ability to store non-structured content within
MongoDB with Customer (or Entity) data often avoids
the need for a separate Content Management System
Data Scalability
The power of Sharding does more than just allow
improved speed – it allows MongoDB to accommodate
data sources that simply grow and grow in size
New MongoDB technologies are expected shortly to
further push data scalability within each Shard while
constraining costs of growth
Big Data systems are incorporating
more data sources, longer retention
periods, and a great deal of non-structured
data. MongoDB provides the
flexibility to accommodate all of these
and lower TCO
- 15. As Single View becomes closely tied to
customer-facing CRM and Real-Time
Decisioning systems, it is imperative that
their source of truth does not fail,
particularly when used by on-line 24x7
customer channels.
MongoDB Provides Reliability…
Multiple Redundancies
A deployed MongoDB instance has redundancy built
into the Query Routing nodes, the data-bearing
Shards, and the 3 Config Servers.
Within each Shard, data is replicated between a
primary and one or more secondary copies, with the secondaries often
sited off-site for security and redundancy
This configuration also has inherent load balancing,
allowing degraded responses from one unit to be
balanced dynamically
- 16. Use Case: Internal Single View of Employee
Business Case
Capgemini needs to ensure it provides a flexible and
adaptable HR function for its employees. The following
requirements need to be improved and met by
this function:
• Availability of real time accurate and useful data (consolidated
where possible)
• Single Employee View - Masked data where needed
• Dashboarding & ability to extract and manipulate data
• Improve data quality
• Reduce current support problems
Problem
• Capgemini has a variety of employee-related databases – the Oracle HR system, Leave Management
System, Clarity time accounting, etc. Some key data is kept on spreadsheets and data comes from various
sources
• HR must produce both ad-hoc and periodic reports to managers and employees, as well as use the data
internally
• Most data is updated monthly, leading every reporting cycle to have to adjust the previous month’s summary
reports as corrections are applied. This affects accuracy and quality
• HR, Recruiting, and even Managers and Employees require a comprehensive view of HR-related data, with
appropriate data security and visibility rules strictly enforced.
Objectives / Scope
• Construct a Single-View of Employee data, comprised of HR, LMS, Clarity, Salary Reference Data, Bench and Roll-off data, using MongoDB
• Provide users with Tableau, Qlikview, or similar reporting tool
• Build template for SVC-type MongoDB projects
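One way to picture the Single View of Employee described in this use case is a single document per employee, with one section per source system and masking applied by section. The sketch below is an assumption about how that could look, not the design used in the project; the source names, fields, and masking rule are all hypothetical.

```python
def build_view(employee_id, sources: dict) -> dict:
    """Combine each source system's rows for one employee into one document."""
    view = {"_id": employee_id}
    for source_name, rows in sources.items():
        view[source_name] = [
            {k: v for k, v in row.items() if k != "employee_id"}
            for row in rows
            if row["employee_id"] == employee_id
        ]
    return view


def mask_view(view: dict, allowed_sections: set) -> dict:
    """Return a copy of the view limited to the sections a user may see."""
    return {k: v for k, v in view.items() if k == "_id" or k in allowed_sections}


sources = {
    "hr": [{"employee_id": 7, "name": "A. Example", "grade": "C2"}],
    "lms": [{"employee_id": 7, "days_remaining": 12}],
    "clarity": [{"employee_id": 7, "project": "SVC template", "hours": 37.5}],
    "salary_reference": [{"employee_id": 7, "band": "B"}],
}
view = build_view(7, sources)
manager_view = mask_view(view, {"hr", "lms", "clarity"})
print(sorted(manager_view))
```

Masking by whole section is the coarsest possible rule; field-level visibility would need a finer-grained filter than the one shown here.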
- 17. Summary and Questions
MongoDB is an excellent platform for building Single View architectures
and solutions
It solves a great many problems associated with existing RDBMS-based
SVC solutions, especially in the areas of
Speed of Access
Speed of Comparison
Representation Flexibility
Data Flexibility
Reliability
As a result of these features, MongoDB provides a demonstrably lower
Total Cost of Ownership for an SVC solution than previous generation
SVC solutions, but at the cost of a learning curve to master MongoDB’s
intricacies and associated domain knowledge.
- 18. Contact information
Robert Hill
Head of Big Data for Financial Services
robert.l.hill@capgemini.com
Capgemini
London
- 19. The information contained in this presentation is proprietary.
© 2014 Capgemini. All rights reserved.
About Capgemini
With almost 140,000 people in over 40 countries, Capgemini is one of the
world's foremost providers of consulting, technology and outsourcing
services. The Group reported 2013 global revenues of EUR 10.1 billion.
Together with its clients, Capgemini creates and delivers business and
technology solutions that fit their needs and drive the results they want. A
deeply multicultural organization, Capgemini has developed its own way
of working, the Collaborative Business Experience™, and draws on
Rightshore®, its worldwide delivery model.
About MongoDB
MongoDB is the next-generation database that helps businesses
transform their industries by harnessing the power of data. The world’s
most sophisticated organizations, from cutting-edge startups to the largest
companies, use MongoDB to create applications never before possible at
a fraction of the cost of legacy databases. MongoDB is the fastest-growing
database ecosystem, with over 8 million downloads, thousands
of customers, and over 650 technology and service partners.
www.capgemini.com www.mongodb.com