MongoDB and RDBMS: Using Polyglot Persistence at Equifax. Presented by Michael Lawrence, Pariveda Solutions on behalf of Equifax at MongoDB Evenings Atlanta on September 24, 2015.
1. 1
MongoDB and RDBMS:
Using Polyglot Persistence at Equifax
MongoDB Evenings Atlanta
September 24, 2015
Mike Lawrence
2. 2
“I specialize in business development utilizing a strong background in data science and architecture to improve business
Go-To-Market strategies and operation. I enjoy leveraging data to spot industry trends, make predictive decisions about
future growth areas, and improve context capture for data sets. I am also a caffeine-aholic, so please feel free to say hello
to me next time you’re at Starbucks.”
Mike Lawrence
Associate, Pariveda Solutions
@theMrLawrence
3. 3
You will develop a strong understanding of the polyglot persistence use case
Three Key Takeaways
1. Breaking traditional data storage patterns enabled Equifax to develop data persistence and access
patterns for agility
2. The key drivers to implement MongoDB and the benefits to the business and consumer experience
3. Leveraging the strengths of MongoDB and RDBMS provides a versatile data solution that increases the
lifetime value of consumer relationships and improves customer experience.
4. 4
From business overview to solution architecture
A Look at our Presentation Agenda
Equifax PSOL
A quick overview of the Equifax
Personal Solutions business unit
Data Access Patterns
Diving into the data persistence and
access patterns by the application
5. 5
Understanding the Data
Explore the different types of data
and understand their use
Document Storage
Use case, adoption, advantages of a
document storage solution
Cost Savings with MongoDB
A document storage solution
provided a reduction of overall
storage costs
Relational Storage
Not all data gets persisted into
MongoDB; some data remains highly
relational
6. 6
Let’s begin!
Solution Architecture
A quick glance into the polyglot
solution architecture between app
and persistence
Q & A
Time for all of your questions and
comments!
7. 7
Fueling New Product Innovation (NPI)
Equifax Personal Solutions
Consumer Impact
Personal Solutions continued to increase the
lifetime value of its consumer relationships
by improving the customer experience and
introducing new, high-value products
Equifax Personal Solutions, which contributes 10% of the overall Equifax revenue, supplies consumers with information to help them
understand their credit and protect their identity. In 2014 they launched a strategic transformation to ensure long-term, sustainable
growth in the face of a changing market environment.
8. 8
The first step to the future
Equifax PSOL Strategic Transformation
Equifax is re-engineering the consumer application platform to
better reach the digital consumer
A core principle of this strategic project is to introduce new
technologies to Equifax that further expand their ability to
execute on business objectives at lower operating costs and
improve overall system performance.
9. 9
Breaking Traditional Data Storage Patterns
Adopting new approaches to data persistence and data access for agility
10. Data Access Patterns
MapReduce: data processing for complex BI and reporting
Streaming: realtime processing and fulfillment
Document (transactional): document storage for cohesive and large transactional data
Relational (transactional): relational storage for highly structured transactional data
Document (archival): document storage for archival solutions
12. 12
The types of data are wide ranging, but centered around the consumer
Application Data is Consumer Centric
Consumer Information
Basic information about the
consumer must be persisted
to track identity
User Authentication
Role management, user, and
customer verification are
required for privileges
Order Management
Orders are tracked through
placement to completion
Product Catalogs
Available products, offers,
and cross-sell are managed
through the database
Configurations
Application configurations
are stored for light payload
Audit Logging
All activities must be tracked,
audited, and persisted
Digital Products
Credit products are large
documents of data to be
supplied to a customer
Alert Processing
Alerts are a form of product
that are persisted and
supplied to consumers
13. 13
Relational
Consumer Information, User Authentication, Order Management, Product Catalogs
14. 14
Document
Configurations, Audit Logging, Digital Products, Alert Processing
16. 16
As Equifax grows its consumer base, new data storage technologies were explored to keep up with increased demand
Performance and cost
are two key drivers of success
Developing a new platform offered an opportunity to explore
different technologies to solve new challenges
17. 17
As development moved forward, the need for
document storage became clearer
Building a Case for
Document Storage
Lightweight Searchable Storage
Documents stored in an RDBMS are bulky, slow to retrieve, and
difficult to search
Data Volume
High volume of data creation and retrieval requires scalability
Last Mile Delivery
Performance drives realtime rendering of credit reports, invoices,
and other large documents
Realtime Performance
Large volumes of data are analyzed realtime by the business
18. 18
The path that led Equifax from concept to adopting MongoDB
Choosing MongoDB as the Solution
Adoption of MongoDB is driven by
retrieval, scalability, and cost
• MongoDB offers flexible storage, easy scalability, and
high-performance searching and document retrieval
• High performance searching and retrieval allows
Equifax to render credit reports instantly
• MongoDB is a low-cost solution compared to RDBMS
storage of documents
• Scalability of MongoDB meets future growth needs of
Equifax as their data continues to grow exponentially
• Independent searching outside of RDBMS
Diagram: high-volume data stored in RDBMS CLOB/BLOB moves to a NoSQL data store (MongoDB) for fast retention and retrieval of large data
19. 19
Determining where to persist data is decided from a few key rules
Rules for Data Persisted in MongoDB
Cohesive: highly cohesive data or information that cannot be broken down
Unstructured: unstructured documents with few or no standards
Unknown Metadata: new products and developments may require different metadata
High Volatility: documents susceptible to frequent schema changes
20. 20
Configurations
Application configurations
are stored for light payload
Audit Logging
All activities must be tracked,
audited, and persisted
Digital Products
Credit products are large
documents of data to be
supplied to a customer
Alert Processing
Alerts are a form of product
that are persisted and
supplied to consumers
Documents Stored in MongoDB
22. 22
A dramatic reduction in cost over relational storage
MongoDB Helped Increase Bottom Line
MongoDB Storage Cost per GB
Reduction of storage costs has been a major driving
force behind the implementation of MongoDB to
supplement the relational database.
Chart: storage cost per GB, MongoDB $2/GB vs. RDBMS $8/GB
Storage costs were reduced fourfold (a 75% reduction),
from $8/GB using RDBMS to $2/GB using MongoDB
4x lower storage cost per GB
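Working only from the two figures on this slide ($8/GB with RDBMS, $2/GB with MongoDB), the change is a fourfold cost ratio, which is the same as a 75% reduction in cost per GB. A percentage reduction cannot exceed 100%, so the ratio is the clearer way to state the gain:

```latex
\[
\frac{\$8/\text{GB}}{\$2/\text{GB}} = 4
\qquad\text{and}\qquad
\frac{\$8/\text{GB} - \$2/\text{GB}}{\$8/\text{GB}} = \frac{6}{8} = 75\%
\]
```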
24. 24
Determining where to persist data is decided from a few key rules
Rules for Data Persisted in RDBMS
Loosely Coupled: data that withstands the breakdown into smaller pieces
Highly Structured: hierarchical or other defined structures
Highly Related: data extends out to many associations
Low Volatility: relational data tends to go through few and minor changes over time
25. 25
Relational
Consumer Information
Basic information about the
consumer must be persisted
to track identity
User Authentication
Role management, user, and
customer verification are
required for privileges
Order Management
Orders are tracked through
placement to completion
Product Catalogs
Available products, offers,
and cross-sell are managed
through the database
29. 29
Leveraging Both Storage Platforms
Enables Scalability, Performance, and Agility
MongoDB and RDBMS each have their place; using both together increases flexibility and room for growth
RDBMS: highly relational or structured, transactional data, loose cohesion, low volatility
MongoDB: unstructured, unknown metadata, tight cohesion, high volatility, large data
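The MongoDB and RDBMS rules from the preceding slides can be read as a routing decision. The sketch below is a minimal illustration, not Equifax's implementation: `DataProfile`, `choose_store`, the trait flags, the simple count of matched traits, and the tie-break toward RDBMS are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProfile:
    """Hypothetical description of one data type, using the traits named on the slides."""
    name: str
    tightly_cohesive: bool  # cannot be broken down into smaller pieces
    structured: bool        # hierarchical or otherwise well-defined structure
    metadata_known: bool    # fields are known up front
    volatile: bool          # schema tends to change frequently
    highly_related: bool    # extends out to many associations
    large: bool             # large payloads, e.g. a full credit file


def choose_store(p: DataProfile) -> str:
    """Return "mongodb" or "rdbms" depending on which rule set matches more traits.

    Ties go to "rdbms" (an assumption: prefer the existing relational infrastructure).
    """
    document_score = sum([
        p.tightly_cohesive,
        not p.structured,
        not p.metadata_known,
        p.volatile,
        p.large,
    ])
    relational_score = sum([
        not p.tightly_cohesive,
        p.structured,
        p.highly_related,
        not p.volatile,
    ])
    return "mongodb" if document_score > relational_score else "rdbms"


# Illustrative profiles; the trait values are assumptions, not measured data.
credit_file = DataProfile("credit file", tightly_cohesive=True, structured=False,
                          metadata_known=False, volatile=True,
                          highly_related=False, large=True)
consumer_info = DataProfile("consumer information", tightly_cohesive=False,
                            structured=True, metadata_known=True, volatile=False,
                            highly_related=True, large=False)

print(choose_store(credit_file))    # mongodb
print(choose_store(consumer_info))  # rdbms
```

Counting traits treats every rule as equally important. The product catalog discussed in the notes is a case where one trait (heavy relationships to the consumer graph) decides the outcome, so a rule-priority or weighted variant may fit real data better.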
30. 30
Branched Consumer Data with
Document Leaf Nodes
The complete consumer is much like a tree and its leaves
Document leaves: Credit Files, Audit Logs, Alerts, Configurations
The polyglot persistence architecture leverages the
strengths of both storage technologies. The natural
structure of the consumer and the product catalog dictate
relational, while the products to be fulfilled are highly
cohesive documents.
Figure: the consumer tree and its document nodes (a tree and its leaves)
31. 31
Q&A
Editor's Notes
Drink a lot of water
Positive mental process
Continue asking rhetorical questions.
Suppositional language (e.g. Suppose/Imagine)
First get excited about the story, tell people what they came here for.
Thank you
Excited to be speaking
“Many people think that they have to use MongoDB or a Relational database, having to choose one or the other. Wrong. You can use both, and I’m going to tell you about the cool story of how we leveraged the strengths of both at Equifax so we could have our cake and eat it too.”
Talk about relational background and challenges of understanding MongoDB…how do I adapt to this???
Let me tell you a little bit about me
I am a data guy
Leveraged background in data science to spot industry trends and future growth areas
Improve business and go to market strategies
I “grew up” with relational databases; when I first heard about NoSQL and MongoDB I was very confused about how to use it, adapt to it, and change my mindset.
Found NoSQL/Mongo “different”
What do you mean there's no schema?
Found it has some powerful benefits, especially when using both technologies together
Caffeine-aholic
Continue selling 21st century technology, using the right tool, using both relational and MongoDB. Talk about the use cases, and using the right tool for your needs.
I have to say, I’m excited to be sharing this with you tonight
Using both MongoDB and RDBMS together is a pretty cool experiment that turned out successful
Three key takeaways, and they all follow the theme of using the right tool for the job. You’re not going to use a sledgehammer to hang a picture frame
Breaking traditional storage patterns enabled Equifax to develop data persistence and access patterns for agility
How MongoDB benefited the business and consumer experience
Leveraging strengths of both provides a versatile data solution to increase lifetime value of consumer relationships
Let’s take a quick look at what we’ll be covering
First, overview Equifax Personal Solutions, PSOL
Persistence and access patterns of the consumer app
Understanding the data from the consumer app
The growing need for document storage and the path to MongoDB
Benefits and cost savings with MongoDB
How existing infrastructure is leveraged by using relational storage
What a solution architecture for using both MongoDB and RDBMS looks like
Equifax Personal Solutions, known as PSOL, is the business unit that delivers consumer credit products
Credit files, credit reports, scores, alerts, protection services
PSOL helps consumers understand and protect their identity
Accounts for 10% of the business
It was time for a change – BlackBerry guy
Get to try some cool R&D like things, inside of a Fortune 500 business
In 2014 PSOL launched the strategic transformation project
We were re-engineering the consumer app to better reach the digital consumer
50% of users use mobile devices now, we needed to keep up
Core principle was to explore new technologies
Expand ability to execute biz objectives at lower cost
Improve the consumer experience
30 years ago credit reports were on paper, now everything is digital
To meet all those goals, we had to look at breaking traditional storage patterns
The same old legacy infrastructure wasn’t going to cut it
So let’s take a look at the data access patterns that exist within the consumer app
Consumer platform, consumer centric data
During the development of the new platform realized this data naturally split into two groups
You may be asking, why do these particular types of data belong in a document store?
We’ll be covering that shortly,
but let’s first take a look at the road to document storage.
It was time for us to look at document storage
How can we leverage a document store to meet objectives?
We needed the right tool for the job
Performance and cost – two key drivers of success
As Equifax’s consumer base grows, new data storage technologies need to be explored
A great thing about developing this new platform, opportunity to explore different technologies
As the development of the new platform moved forward, it became evident we needed to explore document storage
We need lightweight searchable storage
RDBMS CLOB/BLOB storage was slow and bulky
As a credit bureau, there’s a high volume of data
Consumers could have a few megs of documents or several gigs
Last mile delivery
Render credit reports on demand across devices
Realtime performance
Business analyzes OF traffic for marketing campaigns
On top of that, faster to develop and go to market with non-relational.
Don’t need new columns for change
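To illustrate the point that a document store does not need a new column for each change, the sketch below contrasts the two approaches using only the Python standard library. SQLite stands in for the RDBMS and a list of dictionaries stands in for a MongoDB collection; the table and field names and the placeholder value `bureau_a` are hypothetical.

```python
import sqlite3

# Relational side: a new attribute needs a schema migration before it can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alert (id INTEGER PRIMARY KEY, consumer_id INTEGER, kind TEXT)")
conn.execute("INSERT INTO alert (consumer_id, kind) VALUES (?, ?)", (1, "inquiry"))
conn.execute("ALTER TABLE alert ADD COLUMN bureau TEXT")  # the "new column for change"
conn.execute("INSERT INTO alert (consumer_id, kind, bureau) VALUES (?, ?, ?)",
             (1, "new_account", "bureau_a"))
columns = [row[1] for row in conn.execute("PRAGMA table_info(alert)")]

# Document side: a plain list of dicts stands in for a MongoDB collection.
# Each document carries its own fields, so the new attribute needs no migration.
alert_docs = [
    {"consumer_id": 1, "kind": "inquiry"},
    {"consumer_id": 1, "kind": "new_account", "bureau": "bureau_a",
     "details": {"creditor": "hypothetical creditor"}},
]
with_bureau = [d for d in alert_docs if d.get("bureau") == "bureau_a"]

print(columns)           # ['id', 'consumer_id', 'kind', 'bureau']
print(len(with_bureau))  # 1
```

The trade-off is that the document side no longer enforces a shape, so code reading the documents has to tolerate missing fields (hence `d.get(...)` rather than `d["bureau"]`).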
Let’s take a look at the timeline that led us to MongoDB as our solution
MongoDB the right choice
Why Mongo?
Flexible storage, easy scalability, and high performance searching for document retrieval
Allows us to render credit reports instantly
Low-cost solution compared to RDBMS for storage of documents
Scalability of MongoDB meets future growth needs as consumer base grows
Independent searching outside RDBMS
Didn’t need to go through RDBMS to identify records, or go to a LOB store
Highly Cohesive – Cannot be broken down
Unstructured – No standards, 3 Bureau alert files
Unknown Metadata – Fields are different for different documents
High volatility – Document schemas tend to change frequently
A credit file, for example, is different.
Be human
Be relatable, to successes
Try to connect
Get excited
4x lower storage cost per GB (a 75% reduction)
$8/GB to $2/GB
Not including the performance benefits from using MongoDB
Mongo upgrade to 3.0.x 12:1 compression
Let’s take a few minutes now to discuss how we leveraged relational storage infrastructure.
We didn’t throw the baby out with the bath water. Not everything belongs in MongoDB
We still use RDBMS for many things; that data is typically:
Loosely coupled – can be broken down
Highly structured – hierarchical or other structure
Highly related – Data extends to many associations. Data becomes more meaningful building relationships
Low volatility – The schema doesn’t tend to change much, consumer information has been the same
Product catalog is interesting one
MongoDB has had many clients successfully use it for product catalogs
Equifax’s product catalog is a unique use case
Most products are a composite pattern, where one product is a bundle of several other objects
The composite pattern extends to the consumer graph
We have to join those products to the consumer information to identify and build a product option
It is highly relational, which is why, in our case, it exists in RDBMS
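The composite product pattern described above maps naturally onto a self-referencing relational table queried with a recursive common table expression. This is a hedged sketch using SQLite through Python's `sqlite3` module; the schema (`product`, `product_component`, `subscription`) and the product names are hypothetical, loosely based on the products named earlier in these notes, and are not Equifax's actual catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE consumer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Composite pattern: a bundle row points at its component products.
CREATE TABLE product_component (
    bundle_id    INTEGER NOT NULL REFERENCES product(id),
    component_id INTEGER NOT NULL REFERENCES product(id),
    PRIMARY KEY (bundle_id, component_id)
);
-- Links the product graph to the consumer graph.
CREATE TABLE subscription (
    consumer_id INTEGER NOT NULL REFERENCES consumer(id),
    product_id  INTEGER NOT NULL REFERENCES product(id)
);
INSERT INTO consumer VALUES (1, 'Consumer A');
INSERT INTO product VALUES
    (1, 'Protection bundle'), (2, 'Credit monitoring'),
    (3, 'Credit report'), (4, 'Credit score'), (5, 'Alerts');
INSERT INTO product_component VALUES (1, 2), (1, 5), (2, 3), (2, 4);
INSERT INTO subscription VALUES (1, 1);
""")


def leaf_products(conn, consumer_id):
    """Expand every bundle a consumer subscribes to down to its leaf products."""
    rows = conn.execute("""
        WITH RECURSIVE parts(id) AS (
            SELECT product_id FROM subscription WHERE consumer_id = ?
            UNION
            SELECT pc.component_id
            FROM product_component AS pc JOIN parts ON pc.bundle_id = parts.id
        )
        SELECT p.name
        FROM parts JOIN product AS p ON p.id = parts.id
        WHERE NOT EXISTS (SELECT 1 FROM product_component AS c WHERE c.bundle_id = parts.id)
        ORDER BY p.name
    """, (consumer_id,)).fetchall()
    return [name for (name,) in rows]


print(leaf_products(conn, 1))  # ['Alerts', 'Credit report', 'Credit score']
```

Because bundles can nest to any depth and always have to be joined back to the consumer, this shape is easy to express with joins and hard to keep consistent if copied into per-consumer documents, which fits the notes' reasoning for keeping the catalog relational.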
What does a polyglot persistence solution look like?
I’m sure you’re wondering, how do MongoDB and RDBMS play well together? You might be saying, “That sounds like oil and water.”
Suppose you need to persist some data into both Mongo and RDBMS, simultaneously. What does that look like?
That was a challenge we had to solve
Aha moments and evolution of developing MongoDB solution
Imagine a customer is being enrolled into a monitoring service.
Data will need to be persisted into MongoDB and the relational store
Wrapped in a transaction first
Avoid orphaned records
There is “virtual relational integrity” between the document and RDBMS record
Once both setters register success, the transaction is completed
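The enrollment flow above can be sketched as a write to the relational store and the document store that either completes on both sides or leaves nothing behind. The notes do not say how the transaction was implemented, so this sketch assumes one common approach: a local SQLite transaction plus a compensating delete on the document side. `InMemoryDocumentStore` is a hypothetical stand-in for a MongoDB collection, and the `enrollment` table and `doc_id` column are illustrative names.

```python
import sqlite3
import uuid


class InMemoryDocumentStore:
    """Hypothetical stand-in for a MongoDB collection; not the pymongo API."""

    def __init__(self, fail_on_insert=False):
        self.docs = {}
        self.fail_on_insert = fail_on_insert  # lets the example simulate an outage

    def insert(self, doc):
        if self.fail_on_insert:
            raise OSError("document store unavailable")
        doc_id = str(uuid.uuid4())
        self.docs[doc_id] = dict(doc)
        return doc_id

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


def enroll(conn, store, consumer_id, product, document):
    """Persist the enrollment row and its document together, or neither."""
    doc_id = None
    try:
        with conn:  # sqlite3: commit on success, roll back on exception
            cur = conn.execute(
                "INSERT INTO enrollment (consumer_id, product) VALUES (?, ?)",
                (consumer_id, product))
            enrollment_id = cur.lastrowid
            # "Virtual relational integrity": the document carries the row's key,
            doc_id = store.insert({"enrollment_id": enrollment_id, **document})
            # and the row records the document's id.
            conn.execute("UPDATE enrollment SET doc_id = ? WHERE id = ?",
                         (doc_id, enrollment_id))
        return enrollment_id
    except Exception:
        # The `with conn` block rolls back the relational side; if the document
        # was already written, delete it so it cannot become an orphan.
        if doc_id is not None:
            store.delete(doc_id)
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (id INTEGER PRIMARY KEY, consumer_id INTEGER,"
             " product TEXT, doc_id TEXT)")

healthy = InMemoryDocumentStore()
enroll(conn, healthy, 42, "monitoring", {"alerts": []})

failing = InMemoryDocumentStore(fail_on_insert=True)
try:
    enroll(conn, failing, 43, "monitoring", {"alerts": []})
except OSError:
    pass  # expected: neither store keeps a record for consumer 43

print(conn.execute("SELECT consumer_id FROM enrollment").fetchall())  # [(42,)]
print(len(healthy.docs), len(failing.docs))                           # 1 0
```

This is a compensation approach, not a distributed two-phase commit: a process crash between the document write and the compensating delete could still leave an orphan, so a periodic job that reconciles keys across both stores is a reasonable complement.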
At Equifax, we have leveraged both MongoDB and RDBMS to work together in harmony
This has provided us with a versatile data solution
Leveraging both platforms allows for scalability, performance, and agility
We can choose the right tool for the job
MongoDB and RDBMS each have their strengths
Using both increases our flexibility for growth
The complete consumer is much like a tree and its leaves
Polyglot persistence leverages the strengths of both storage technologies for a complete solution
Natural structure of consumer dictates a relational data store
Digital products fulfilled are highly cohesive documents
It is like a tree and its leaves, or an apple tree.
The branches are the consumer graph, and the apples are the nodes, or documents for the consumer
Not every consumer may have a document node
But we are able to use the right tool for the job to paint a complete picture.
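The tree-and-leaves picture corresponds to a read path: fetch the consumer from the relational store, then attach any documents keyed to that consumer. In this minimal sketch SQLite plays the RDBMS and a plain dictionary keyed by consumer id stands in for the MongoDB collections; the names `complete_consumer` and `documents` are hypothetical.

```python
import sqlite3

# Relational branch: the consumer graph (reduced to one table for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE consumer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO consumer VALUES (1, 'Consumer A'), (2, 'Consumer B');
""")

# Document leaves: a dict keyed by consumer id stands in for MongoDB collections.
documents = {
    1: [{"type": "credit_file", "consumer_id": 1},
        {"type": "alert", "consumer_id": 1}],
}


def complete_consumer(conn, documents, consumer_id):
    """Combine the relational consumer row with whatever document leaves it has."""
    row = conn.execute("SELECT id, name FROM consumer WHERE id = ?",
                       (consumer_id,)).fetchone()
    if row is None:
        return None  # no branch, so there is nothing to hang leaves on
    return {
        "id": row[0],
        "name": row[1],
        # Not every consumer has a document node; default to no leaves.
        "documents": documents.get(consumer_id, []),
    }


print(len(complete_consumer(conn, documents, 1)["documents"]))  # 2
print(complete_consumer(conn, documents, 2)["documents"])       # []
```

Against a real MongoDB deployment the dictionary lookup would become a query on an indexed consumer-id field, which is the "independent searching outside RDBMS" the earlier slides describe.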
Talk about the solution
Schema design conversation
Embedding vs Document
Infrastructure