An Enterprise Architect's View of MongoDB
1. An Enterprise Architect’s View of MongoDB
Matt Kalan
Business Architect
matt.kalan@mongodb.com
@matthewkalan
2. Agenda
• Modern drivers of change on enterprises
• Requirements these create
• How traditional databases are handling changes
• New capabilities needed
• How MongoDB provides these capabilities
• Case studies
• Enterprise adoption
5. More Technologies and Requirements Than Ever
NoSQL
Datawarehouse
Hadoop
Internet of Things
Document Data Stores
Big Data
Key-value
JSON
Wide-column
MongoDB
Cloud Computing
Mobile
Gamification
Social networking
Graph
Agile Development
ODS
Analytics
Consumerization
Opportunity cost
Globalization
Customer 360
Cross-channel
New Revenue Streams
Faster Competition
Emerging markets
Regulation
Data Monetization
More with less
Common Services
Empowered customers
Lowering TCO
6. Questions for Enterprise Architects
• What current and future requirements does all this raise?
• How to prepare my enterprise to handle these?
• Which technologies and products will help me?
• How to bring them into my enterprise successfully?
• How do old and new technologies work together?
• What does the future state architecture look like?
7. Modern Application Requirements
Data Types & OOP
• Object-oriented
• Variably structured
• Unstructured (not tabular)
Volume of Data
• Petabytes of data
• Trillions of records
• Millions of queries per second
Agile Development
• Iterative
• Short development cycles
• Fast time-to-market
New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing
RDBMS
Single Views
• Disparate data
• Intraday
• Cross-channel/silo
• Global
10. Impact of New Requirements Handled with 40-Year-Old Technology
• Customfield1…100 or separate tables
• Caching & ORMs
• Expensive hardware and storage
• Schema migration project
• One canonical schema
• Application-specific partitioning
• Use files instead of databases
• Schema change takes 6 months
Slow time-to-market
Agility lost
High cost
Failed projects
Business frustrated
12. How Do I Prepare My Enterprise for Modern Requirements?
14. What would we need to make it easier?
New capabilities
• Dynamic and variable schemas
• Richly-structured [object] data
• Higher performance
• Easy horizontal scaling
• Low TCO
Traditional capabilities
• Rich querying
• Strongly consistent data
• High availability
• Security
15. Documents Support Modern Requirements
Relational Document Data Structure
{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  location : [40.74, -73.97],
  image : <binary>,
  phones : [ {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    } ]
}
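As a rough illustration of the document above, the same customer can be sketched as a plain Python dictionary. This is only a sketch under stated assumptions: the `image` value is a placeholder byte string (the slide shows only `<binary>`), and the helper `phone_numbers` is a hypothetical name used for illustration, not part of any MongoDB API.

```python
# A minimal sketch: the slide's customer document as a plain Python dict.
# The image bytes are a placeholder; the slide only shows <binary>.
customer = {
    "customer_id": 1,
    "first_name": "Mark",
    "last_name": "Smith",
    "city": "San Francisco",
    "location": [40.74, -73.97],
    "image": b"",  # placeholder for binary data (assumption)
    "phones": [
        {"number": "1-212-777-1212", "dnc": True, "type": "home"},
        {"number": "1-212-777-1213", "type": "cell"},
    ],
}


def phone_numbers(doc, phone_type=None):
    """Return the phone numbers embedded in a document, optionally by type.

    Hypothetical helper: it shows that nested data travels with the
    customer record, so no join against a separate phones table is needed.
    """
    return [
        phone["number"]
        for phone in doc.get("phones", [])
        if phone_type is None or phone.get("type") == phone_type
    ]


if __name__ == "__main__":
    print(phone_numbers(customer))
    print(phone_numbers(customer, "cell"))
```

The point of the sketch is that the array of phones, including the optional `dnc` flag present on only one entry, lives inside the customer record itself.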
19. No SQL But Still Flexible Querying
MongoDB
Rich Queries
• Find anyone with phone # "212…"
• Check if the person with number "555…" is on the "do not call" list
Geospatial
• Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave
Text Search
• Find all tweets that mention the firm within the last 2 days
Aggregation
• Count and sort number of customers by city
Map Reduce
• For customers in each zip code, what are the top 5 most common products
{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "New York",
  phones : [ {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    } ]
}
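The rich-query and aggregation bullets above can be sketched roughly as follows. The three constants are query documents written in MongoDB's query syntax (`$regex`, `$elemMatch`, `$group`, `$sort`); with a driver such as PyMongo they would typically be passed to `Collection.find` and `Collection.aggregate`. So that the sketch runs without a server, the functions below are plain-Python equivalents. Customers 2 and 3, and all function names, are hypothetical additions for illustration.

```python
from collections import Counter

# Customer 1 comes from the slide; customers 2 and 3 are hypothetical,
# added only so the queries have something to tell apart.
customers = [
    {"customer_id": 1, "first_name": "Mark", "last_name": "Smith",
     "city": "New York",
     "phones": [{"number": "1-212-777-1212", "dnc": True, "type": "home"},
                {"number": "1-212-777-1213", "type": "cell"}]},
    {"customer_id": 2, "first_name": "Ann", "last_name": "Lee",
     "city": "New York",
     "phones": [{"number": "1-555-010-0001", "dnc": True, "type": "home"}]},
    {"customer_id": 3, "first_name": "Raj", "last_name": "Patel",
     "city": "Boston",
     "phones": [{"number": "1-555-010-0002", "type": "cell"}]},
]

# Query documents in MongoDB syntax (shown for reference, not executed here).
PHONE_212_FILTER = {"phones.number": {"$regex": "^1-212"}}
DNC_555_FILTER = {
    "phones": {"$elemMatch": {"number": {"$regex": "^1-555"}, "dnc": True}}
}
CUSTOMERS_BY_CITY = [
    {"$group": {"_id": "$city", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def with_phone_prefix(docs, prefix):
    """Customers having at least one phone number starting with prefix."""
    return [d for d in docs
            if any(p["number"].startswith(prefix) for p in d.get("phones", []))]


def on_do_not_call(docs, prefix):
    """Customers with a matching number that is flagged do-not-call."""
    return [d for d in docs
            if any(p["number"].startswith(prefix) and p.get("dnc", False)
                   for p in d.get("phones", []))]


def count_by_city(docs):
    """(city, count) pairs sorted by count, largest first."""
    return Counter(d["city"] for d in docs).most_common()


if __name__ == "__main__":
    print([d["customer_id"] for d in with_phone_prefix(customers, "1-212")])
    print([d["customer_id"] for d in on_do_not_call(customers, "1-555")])
    print(count_by_city(customers))
```

Note that the do-not-call check requires the number match and the `dnc` flag to hold on the same array element, which is what `$elemMatch` expresses in the query document.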
20. Security Capabilities
• Kerberos
• LDAP
• x.509 certificates
• User-Defined Roles
• Field Level Security
• Admin Actions
• CRUD operations
• Partner support
• SSL support on wire
• Disk encryption support by partners
21. Global Deployment with Local Read/Writes
Primary:NYC
Primary:LON
Secondary:NYC
Primary:SYD
Secondary:LON
Secondary:NYC
Secondary:SYD
Secondary:LON
Secondary:SYD
22. MongoDB Business Value
Faster Time to Market Lower TCO
Enabling New Apps Faster Response Time
& Scalability
26. MongoDB Hadoop Connector
Operational Database
• Low latency
• Rich fast querying
• Flexible indexing
• Aggregations in database
• Known data relationships
• Great for any subset of data
Analytics
• Longer jobs
• Batch analytics
• Highly parallel processing
• Unknown data relationships
• Great for looking at all data
MongoDB-Hadoop Connector
34. Architecture Patterns
1. Operational Data Store (ODS)
2. Enterprise Data Service
3. Datamart/Cache
4. Master Data Distribution
5. Single Operational View
System of Record
System of Engagement
38. Criteria for benefiting most from MongoDB instead of RDBMS
Data
Variably structured or unstructured
Hierarchical
Geo-coordinates
Disparate sources
Schema changes often
Querying
Real-time analytics & aggregations
Location-based
Lowest latency
Performance affects user experience
Requirements
Agile development & fastest time-to-market
Data will grow quickly
Best performance for request/response
Lowest TCO
Multiple sources aggregated
Challenges today with RDBMS
39. ADP’s Global Mobile Platform
One of the world's largest providers of payments solutions constructs a completely reliable and robust mobile experience
Problem
• Needed a signature mobile app for customers
• Must support millions of users
• Needed to quickly change features & functionality
• High availability was critically important
Why MongoDB
• Built-in high availability architecture optimized for global, multi-data center distribution
• Dynamic schema & rich querying – deep functionality from launch & new features easily added
• Much lower TCO, especially with commodity hardware
Results
• iTunes App Store “Top 15” business app since 2012 launch
• Over 1 million active users, 17 countries, 23 languages
• Extremely high performance through predictive caching
• Maintenance much easier => simple codebase, less hardware
• New functionality easy and quick to add
41. Challenge: Siloed operational applications
Silo 1 Systems
Silo 2 Systems
…
Silo N Systems
Silo 1 Data
Silo 2 Data
…
Silo N Data
Reporting (one per silo)
Impact
• Views are siloed
• Duplicate management and data access layer
• Need another layer to aggregate
42. Solution: Unified data services
Silo 1 Systems
Silo 2 Systems
…
Silo N Systems
Reporting
Benefit
• Each application can still save its own data
• Data is already aggregated for cross-silo reporting
• One cluster and data access layer to manage
43. Case Study: Global Broker Dealer Trade Mart for all OTC Trades
Distribute reference data globally in real-time for fast local accessing and querying
Problem
• Each application had its own persistence and audit trail
• Wanted one unified framework and persistence for all trades and products
• Needed to handle many variable structures across all securities
Why MongoDB
• Dynamic schema: can save trade for all products in one data service
• Easy scaling: can easily keep trades as long as required with high performance
• Rich querying: can query on any fields each business requires
Results
• Fast time-to-market using the persistence framework
• Store any structure of products/trades without changing a schema
• One consolidated trade store for auditing and reporting
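To make "store any structure of products/trades without changing a schema" concrete, here is a small sketch in plain Python. It is not the broker dealer's implementation: every trade, field name, and the `find` helper are hypothetical, and a list of dictionaries stands in for a single trade collection.

```python
# Hypothetical trades for three different product types. Each carries only
# the fields that make sense for its product, yet all sit in one store.
trades = [
    {"trade_id": "T1", "product": "irs", "notional": 10_000_000,
     "fixed_rate": 0.025, "tenor_years": 5},
    {"trade_id": "T2", "product": "fx_option", "notional": 2_000_000,
     "currency_pair": "EURUSD", "strike": 1.10},
    {"trade_id": "T3", "product": "cds", "notional": 5_000_000,
     "reference_entity": "ACME Corp"},
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion.

    A document lacking a queried field simply does not match, so
    product-specific fields can be queried without a shared schema.
    """
    return [
        doc for doc in docs
        if all(key in doc and doc[key] == value
               for key, value in criteria.items())
    ]


if __name__ == "__main__":
    print([t["trade_id"] for t in find(trades, product="cds")])
    print([t["trade_id"] for t in find(trades, currency_pair="EURUSD")])
    print(sum(t["notional"] for t in trades))  # cross-product reporting
```

Adding a new product type in this sketch means appending a differently shaped dictionary; nothing about the existing records has to change.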
45. Challenge: Response From Data Warehouse or Other System is Slow
Cards (Silo 1)
Loans (Silo 2)
Deposits (Silo 3)
…
Data Warehouse (Cards, Loans, …, Deposits)
Reporting
Issues
• Data stored normalized
• Reports slow to generate
• Data updated daily but user response must be fast
Impact
• Lost productivity
• Dissatisfied users and business
46. Solution: Optimize Data Structure as a Datamart In-memory or On-disk
Cards (Silo 1)
Loans (Silo 2)
Deposits (Silo 3)
…
Data Warehouse (Cards, Loans, …, Deposits)
Datamart/Cache
Fast Reporting
Solution
• Data stored in optimal structure for reports
• Optionally in memory
Impact
• Response time is as fast as possible
• Users and business satisfied
47. Case Study: Global Bank - Personalized In-memory Datamart
Needed fast reporting for finance on global banking transaction data (about 2 petabytes)
Problem
• Data warehouse was too slow for reporting
• No visibility into how long reports took
• Could not generate multiple ad hoc reports
• Users included regulators so even more demanding
Why MongoDB
• Dynamic schema: store data in optimal structure
• Performance: storing report results optimally
• In-memory caching of results
• Rich querying: can query on any field
• Easy scaling: results spread across shards to generate report in parallel
Results
• Create a personalized in-memory data mart
• Reports configured and notified when results ready
• Data all in memory so fast to manipulate
• Data spread across shards for ultra-fast reporting
49. Challenge: Master data can be hard to change and distribute
Golden Copy
Batch (×8)
Common issues
• Hard to change schema of master data
• Data copied everywhere and gets out of sync
Impact
• Process breaks from out-of-sync data
• Business doesn’t have data it needs
• Many copies create more management
50. Solution: Persistent dynamic cache replicated globally
Real-time (×8)
Solution
• Load into primary with any schema
• Replicate to and read from secondaries
Benefits
• Easy & fast change at speed of business
• Easy scale out for one stop shop for data
• Low TCO
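One way to picture "replicate to and read from secondaries" is through the client connection string. The sketch below assembles a MongoDB connection URI using the standard `replicaSet` and `readPreference` options. The host names, the replica set name `refdata`, and the helper `build_uri` are hypothetical; treat this as an illustration of the idea rather than a recommended configuration.

```python
from urllib.parse import urlencode

# Read preference modes accepted in MongoDB connection strings.
READ_PREFERENCES = {
    "primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest",
}


def build_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Assemble a replica-set connection URI (hypothetical helper).

    Writes always go to the primary; the read preference only tells the
    driver which members may serve reads.
    """
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options


if __name__ == "__main__":
    # Hypothetical members in three regions.
    hosts = [
        "nyc.example.com:27017",
        "lon.example.com:27017",
        "syd.example.com:27017",
    ]
    print(build_uri(hosts, "refdata"))
    print(build_uri(hosts, "refdata", read_preference="nearest"))
```

With `secondaryPreferred`, reads go to a secondary when one is available; `nearest` instead lets the driver pick the lowest-latency member, which is closer to the "read locally" idea in a globally distributed deployment.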
51. Case Study: Global bank Reference Data Distribution
Distribute reference data globally in real-time for fast local accessing and querying
Problem
• Delays up to 36 hours in distributing data by batch
• Charged multiple times globally for same data
• Incurring regulatory penalties from missing SLAs
• Had to manage 20 distributed systems with same data
Why MongoDB
• Dynamic schema: easy to load initially & over time
• Auto-replication: data distributed in real-time, read locally
• Both cache and database: cache always up-to-date
• Simple data modeling & analysis: easy changes and understanding
Results
• Will save about $40,000,000 in costs and penalties over 5 years
• Only charged once for data
• Data in sync globally and read locally
• Capacity to move to one global shared data service
55. Case Study
Insurance leader generates coveted 360-degree view of customers in 90 days – “The Wall”
Problem
• No single view of customer
• 145 yrs of policy data, 70+ systems, 15+ apps
• 2 years, $25M in failing to aggregate in RDBMS
• Poor customer experience
Why MongoDB
• Agility – prototype in 5 days; production in 90 days
• Dynamic schema: imperative to combine disparate data
• Rich querying: necessary to match data across silos
• Hot tech to attract top talent
Results
• Unified customer view available to all channels
• Increased call center productivity
• Better customer experience, reduced churn, more upsell opps
• Dozens more projects on same data platform
56. Single [Operational] View of ….
Cards (Silo 1)
Loans (Silo 2)
…
Deposits (Silo N)
Real-time or Batch
Operational Data Layer (Cards, Loans, …, Deposits)
• Millisecond latency
• Request/response
• Easily scalable
• Flexible schema
• Low TCO
• Rich querying
• Globally distributed
Operational Reporting
Single CSR Application
Unified Customer Portal
MongoDB Hadoop Connector
DW/Analytic Data Layer
• Analytical/Offline processing
• Tens of seconds to hours of latency
• Also scalable, low TCO, and flexible
• Pre-defined slices of data (few indexes)
Strategic Reporting
Analytics/Batch processing
61. Processing + Data Access Paradigm
Processing model
Data access model
Request/response
Map-reduce
Batch, ETL, etc.
Analytical Jobs
Latency important (e.g. user waiting)
Milliseconds to seconds
Small to large subsets of data
Indexes valuable
Multiple seconds to hours
Processing all or large sets of data
Indexes not used
Typical MongoDB Use Case
Typical Hadoop Use Case
Data Discovery
63. Example Adoption Path
[Chart: Use of MongoDB over Time]
One Project
MongoDB CoE
A Few Projects
Certified
Widespread Adoption
Operationally Supported
Defined
64. Traditional Data Integrity Enforcement
RDBMS
• Apps access DB directly
• Data Integrity must be in the RDBMS
• Schema implemented by a DBA
Application 1
Application 2
Application 3
65. Modern Apps (SOA) - Data Access Layer Should Enforce Data Integrity
Application 1
Application 2
…
Application N
Data Access Layer (REST/API/WS API on TCP/IP)
• Data integrity and validations done in the Data Access Layer
• Implemented in code
MongoDB Cluster
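The idea of enforcing integrity in a shared data access layer, rather than in the database schema, can be sketched as follows. This is a minimal illustration and not a prescribed design: the class `CustomerDataAccess`, the rule table `REQUIRED_FIELDS`, and the in-memory dictionary standing in for a MongoDB collection are all assumptions made so the example runs on its own.

```python
class ValidationError(ValueError):
    """Raised when a document breaks a rule enforced by the access layer."""


# Hypothetical integrity rules: required field name -> expected type.
REQUIRED_FIELDS = {"customer_id": int, "first_name": str, "last_name": str}


def validate_customer(doc):
    """Check required fields, their types, and the shape of phones."""
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            raise ValidationError(f"missing required field: {field}")
        if not isinstance(doc[field], expected):
            raise ValidationError(f"{field} must be {expected.__name__}")
    for phone in doc.get("phones", []):
        if not isinstance(phone, dict) or "number" not in phone:
            raise ValidationError("each phone needs a 'number'")


class CustomerDataAccess:
    """Single entry point every application uses to read and write customers."""

    def __init__(self):
        # In-memory stand-in for a MongoDB collection (assumption).
        self._store = {}

    def save(self, doc):
        validate_customer(doc)  # integrity enforced here, in code
        self._store[doc["customer_id"]] = dict(doc)

    def get(self, customer_id):
        return self._store.get(customer_id)


if __name__ == "__main__":
    dal = CustomerDataAccess()
    dal.save({"customer_id": 1, "first_name": "Mark", "last_name": "Smith",
              "phones": [{"number": "1-212-777-1212", "type": "home"}]})
    print(dal.get(1)["last_name"])
    try:
        dal.save({"customer_id": 2, "first_name": "Ann"})
    except ValidationError as exc:
        print("rejected:", exc)
```

Because every application goes through the same `save` method, the rules live in one place and can change along with the code, while optional fields such as `phones` remain free to vary.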
66. Data Governance Benefits
• Greater adoption from natural developer framework on common data models
• Easier for master data or upstream changes to flow into MongoDB-backed apps
• MongoDB useful for distributing master data
• ETL providers support MongoDB the most among NoSQL databases
68. Factors to Consider in Adoption
• SDLC and data governance for an application
• Enterprise-wide data governance (inter-app)
• Enterprise-wide security
• Roles and responsibilities
• Training requirements
• Operations/production support
• Center of Excellence (COE)
• Process for choosing which DB to use
• How to work with other technologies in-house
71. Recommended Center of Excellence
Database Engineering & CoE
RDBMS Engineering
Database Advisory Services
Operational Database CoE
Datawarehousing CoE
77. Summary
• Enormous technology and business change today
• Old technologies are not suited to many of these changes
• MongoDB is purpose-built for today's and future applications
• And can help solve common architectural challenges
• Bring MongoDB, Inc. in to learn how to adopt it more widely when appropriate
• Firms using MongoDB benefit from 50% time-to-market, 70% lower TCO, lower operating costs, and making the infeasible possible
79. For More Information
Resource Location
MongoDB Downloads mongodb.com/download
Free Online Training university.mongodb.com
Webinars and Events mongodb.com/events
White Papers mongodb.com/white-papers
Case Studies mongodb.com/customers
Presentations mongodb.com/presentations
Documentation docs.mongodb.org
Additional Info info@mongodb.com
Editor's Notes
Here’s a relational model for an application. It has hundreds of tables.
If you are the new developer who just joined the team, congratulations!!
Here’s a map of the database, now go figure out how to add your new feature (or fix a bug).
Good luck!
Point out what other NoSQL databases have (not rich querying and strong consistency)
One of the main reasons is the data model.
Documents are just easier.
If my app tracks car collections, I don’t need to know dozens of tables – all the data for an individual and their collection is in one document. (Walk through this example)
Dynamic schema
Single view of a customer
Compared to distributed cache - $ and fixed schema
Can store all accounts in one table
Have performance capacity and easy scaling to do real-time, not just batch
In terms of reporting, a number of Business Intelligence (BI) vendors have developed connectors to integrate MongoDB as a data source with their suites, alongside traditional relational databases. This integration provides reporting, visualization, and dashboarding of MongoDB data.