MongoDB on FS
Norberto Leite
@nleite
norberto@mongodb.com
http://www.mongodb.com/norberto
2
Agenda
Introduction to MongoDB
MongoDB 3.0
MongoDB on FS
Use Cases
MongoDB Introduction
4
Create Applications Never Before Possible
AGILE SCALABLE
5
The Database of the Post-Relational Era
Combines the foundation of relational databases with the innovations of NoSQL
• From NoSQL: Flexible Data Model, Performance, Scalability
• From relational: Strong Consistency, Powerful Query Language, Rich Indexes
6
MongoDB Features
• JSON Document Model with Dynamic Schemas
• Auto-Sharding for Horizontal Scalability
• Text Search
• Aggregation Framework and MapReduce
• Full, Flexible Index Support and Rich Queries
• Built-In Replication for High Availability
• Advanced Security
• Large Media Storage with GridFS
7
Defense in Depth Security Architecture
MongoDB, Inc.
400+ employees, 2,000+ customers
Over $311 million in funding, 13 offices around the world
Why another database?
Fastest Growing Database
[Figure labels: Facebook, LinkedIn, Google, Twitter]
11
Relational Database Challenges
• Data Types
– Unstructured data
– Semi-structured data
– Polymorphic data
• Agile Development
– Iterative
– Short development cycles
– New workloads
• Volume of Data
– Petabytes of data
– Trillions of records
– Millions of queries/sec
• New Architectures
– Horizontal scaling
– Commodity servers
– Cloud computing
Ps(x, s, e) = eng^e * s / x * C
Application change is constant in today's development process!
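The formula above leaves some symbols undefined. A minimal sketch that evaluates it, assuming Ps stands for productivity and reading the expression left to right as ((eng^e · s) / x) · C. The speaker notes define x (number of features), s (size of the team) and e (expertise); eng and C are not defined anywhere, so the code treats them as hypothetical constants with arbitrary defaults.

```python
def productivity(x: float, s: float, e: float, eng: float = 2.0, c: float = 1.0) -> float:
    """Evaluate Ps(x, s, e) = eng^e * s / x * C, read left to right.

    x = number of features, s = size of the team, e = expertise (speaker notes).
    eng and c are hypothetical constants: the slide does not define them.
    """
    if x <= 0:
        raise ValueError("x (number of features) must be positive")
    return (eng ** e) * s / x * c


# Same team (s = 5, e = 2) facing a growing number of features.
for features in (10, 20, 40):
    print(features, productivity(features, s=5, e=2))
```

With team size and expertise held fixed, the value halves every time the feature count doubles, which is the pressure the slide attributes to constant application change.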
13
Optimize for Engineer Productivity
[Chart: Infrastructure Cost vs. Engineer Cost, 1985 and 2013]
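The speaker notes (#14) give the figures behind this chart: in 1985, $100,000 per GB of storage and a $28,000 developer salary, for 2 developers and 5 GB; in 2013, $0.05 per GB and a $90,000 salary, for 2 developers and 5 TB; all as a 3-year TCO. The notes do not say how the numbers are combined, so this sketch assumes storage is a one-time purchase, salaries are paid for each of the three years, and 1 TB = 1,000 GB.

```python
# Figures from the speaker notes; the cost model itself is an assumption:
# storage is bought once, salaries are paid every year, and 1 TB = 1,000 GB.
YEARS = 3


def three_year_tco(developers: int, salary_per_year: float,
                   storage_gb: float, cost_per_gb: float) -> dict:
    """Return engineer cost, storage cost and the engineer share of the total."""
    engineer_cost = developers * salary_per_year * YEARS
    storage_cost = storage_gb * cost_per_gb
    total = engineer_cost + storage_cost
    return {
        "engineer_cost": engineer_cost,
        "storage_cost": storage_cost,
        "engineer_share": engineer_cost / total,
    }


tco_1985 = three_year_tco(developers=2, salary_per_year=28_000,
                          storage_gb=5, cost_per_gb=100_000)
tco_2013 = three_year_tco(developers=2, salary_per_year=90_000,
                          storage_gb=5_000, cost_per_gb=0.05)

print(f"1985: engineers are {tco_1985['engineer_share']:.1%} of the 3-year TCO")
print(f"2013: engineers are {tco_2013['engineer_share']:.2%} of the 3-year TCO")
```

Under these assumptions engineers account for about a quarter of the cost in 1985 and more than 99.9% in 2013, which is the shift the chart illustrates.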
14
New Challenges
• Analytical Workloads
• Large Data Sets
• Variable Structures
MongoDB 3.0
16
MongoDB 3.0
• Pluggable Storage Engine API
• Storage Engines
• Large Replica Sets
• Big Polygon
• Security Enhancements – SCRAM
• Audit Trail
• Simplified Operations – Ops Manager
• Tools Rewrite
17
Storage Engine API
• Allows different storage engines to be "plugged in"
– Different workloads require different performance characteristics
– MMAPv1 is not ideal for all workloads
– More flexibility
• Storage engines can be mixed in the same replica set or sharded cluster
• Opportunity for further integration (HDFS, native encryption, hardware-optimized …)
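To make the idea of a pluggable storage engine concrete, here is a minimal Python sketch: the layer above calls only a small interface, so engines with different internals can be swapped without changing the data model or the query API. All class and method names are hypothetical; MongoDB's real storage engine API is an internal C++ interface and is not modelled here.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class StorageEngine(ABC):
    """Hypothetical storage-engine interface; the layer above calls only these methods."""

    @abstractmethod
    def put(self, key: str, doc: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InPlaceEngine(StorageEngine):
    """Overwrites the stored document on every update."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, key: str, doc: dict) -> None:
        self._data[key] = dict(doc)

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)


class AppendOnlyEngine(StorageEngine):
    """Never updates in place: every write appends a version, reads return the latest."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, dict]] = []

    def put(self, key: str, doc: dict) -> None:
        self._log.append((key, dict(doc)))

    def get(self, key: str) -> Optional[dict]:
        for k, doc in reversed(self._log):
            if k == key:
                return doc
        return None


class Collection:
    """Same data model and query API whichever engine is plugged in."""

    def __init__(self, engine: StorageEngine) -> None:
        self.engine = engine

    def save(self, doc: dict) -> None:
        self.engine.put(doc["_id"], doc)

    def find_one(self, doc_id: str) -> Optional[dict]:
        return self.engine.get(doc_id)


results = {}
for engine in (InPlaceEngine(), AppendOnlyEngine()):
    trades = Collection(engine)
    trades.save({"_id": "t1", "qty": 100})
    trades.save({"_id": "t1", "qty": 150})
    results[type(engine).__name__] = trades.find_one("t1")
    print(type(engine).__name__, results[type(engine).__name__])
```

Both engines return the same answer through the same `Collection` calls; only the internal write path differs.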
18
What is WiredTiger?
• Storage engine company founded by Berkeley DB alumni
• Recently acquired by MongoDB
• Available as a storage engine option in MongoDB 3.0
19
Improving Concurrency Control
• 2.0 – Global
• 2.2 / 2.4 – Database-level
• 3.0 MMAPv1 – Collection-level
• 3.0 WT – Document-level
– Writes no longer block all other writes
– Higher level of concurrency leads to more
CPU usage
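The progression above is about lock granularity: the finer the lock, the fewer writes contend with each other. A deliberately simplified sketch, assuming every write takes a single exclusive lock at the configured level; it ignores intent locks, readers, and how WiredTiger actually resolves document-level conflicts. The database, collection and document names are hypothetical.

```python
from typing import NamedTuple


class Write(NamedTuple):
    db: str
    collection: str
    doc_id: str


def lock_key(w: Write, level: str) -> tuple:
    """The resource a write must lock exclusively at each granularity."""
    if level == "global":
        return ()
    if level == "database":
        return (w.db,)
    if level == "collection":
        return (w.db, w.collection)
    if level == "document":
        return (w.db, w.collection, w.doc_id)
    raise ValueError(f"unknown lock level: {level}")


def blocks(a: Write, b: Write, level: str) -> bool:
    """Two concurrent writes block each other when they need the same lock."""
    return lock_key(a, level) == lock_key(b, level)


a = Write("bank", "trades", "t1")
b = Write("bank", "trades", "t2")  # same collection, different document
for level in ("global", "database", "collection", "document"):
    print(f"{level:>10}: blocks={blocks(a, b, level)}")
```

Only at document level do writes to two different trades stop blocking each other, at the price of managing many more locks (hence the extra CPU noted above).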
20
Compression
• WT uses snappy compression by default
• Data is compressed on disk
• Two supported compression algorithms
– snappy: the default. Good compression, relatively low overhead
– zlib: better compression, at the cost of more CPU
• Indexes are compressed using prefix compression
– Allows compression in memory as well
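Two sketches of the compression ideas above, using only the Python standard library. snappy is not in the standard library, so the block-compression part uses zlib (the slide's second option) on a batch of similar JSON documents. The prefix-compression part shows the general technique (store each sorted key as the length of the prefix it shares with the previous key, plus the remaining suffix); it is not WiredTiger's actual format, and the key values are hypothetical.

```python
import json
import os
import zlib
from typing import List, Tuple


def prefix_compress(sorted_keys: List[str]) -> List[Tuple[int, str]]:
    """Encode each key as (chars shared with the previous key, remaining suffix)."""
    encoded: List[Tuple[int, str]] = []
    prev = ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decompress(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original keys from the (shared, suffix) pairs."""
    keys: List[str] = []
    prev = ""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical index keys: sorted keys tend to share long prefixes.
keys = sorted(f"SEC-EQ-US-{n:04d}" for n in range(5))
print(prefix_compress(keys))

# Block compression of a batch of similar documents with zlib.
docs = [{"_id": i, "desk": "equities", "currency": "USD", "qty": i * 10} for i in range(1000)]
raw = json.dumps(docs).encode("utf-8")
packed = zlib.compress(raw, 6)
print(f"{len(raw)} bytes -> {len(packed)} bytes")
```

Because prefix-compressed keys can still be decoded one entry at a time, the technique lends itself to keeping indexes compressed in memory, not just on disk.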
21
Consistency without Journaling
• MMAPv1 uses a write-ahead log (journal) to guarantee consistency
• WT does not need one for consistency: it makes no in-place updates
– The write-ahead log is committed at checkpoints
• Every 2 GB or 60 s by default (configurable!)
– No journal commit interval: writes are written to the journal as they come in
– Better for insert-heavy workloads
• Replication guarantees durability
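A toy model of the checkpoint-plus-journal scheme described above. It assumes the slide's defaults (a checkpoint every 2 GB or 60 s, whichever comes first) and that each write is journaled as it arrives. Recovery starts from the last consistent checkpoint and replays the journal written since. All names are hypothetical; this is not WiredTiger's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CHECKPOINT_BYTES = 2 * 1024 ** 3   # 2 GB, the default quoted on the slide
CHECKPOINT_SECONDS = 60            # 60 s, the default quoted on the slide


def checkpoint_due(bytes_since: int, seconds_since: float) -> bool:
    """A checkpoint is taken when either threshold is reached first."""
    return bytes_since >= CHECKPOINT_BYTES or seconds_since >= CHECKPOINT_SECONDS


@dataclass
class ToyEngine:
    """Writes are journaled as they arrive; checkpoints snapshot a consistent state."""
    live: Dict[str, int] = field(default_factory=dict)
    checkpoint: Dict[str, int] = field(default_factory=dict)
    journal: List[Tuple[str, int]] = field(default_factory=list)

    def write(self, key: str, value: int) -> None:
        self.journal.append((key, value))  # journaled immediately, no commit interval
        self.live[key] = value

    def take_checkpoint(self) -> None:
        self.checkpoint = dict(self.live)  # consistent snapshot of current state
        self.journal.clear()               # entries before the checkpoint are no longer needed

    def recover(self) -> Dict[str, int]:
        """After a crash: start from the last checkpoint, then replay the journal."""
        state = dict(self.checkpoint)
        for key, value in self.journal:
            state[key] = value
        return state


engine = ToyEngine()
engine.write("a", 1)
engine.take_checkpoint()
engine.write("b", 2)
engine.live.clear()  # simulate losing in-memory state in a crash
print(engine.recover())
print(checkpoint_due(bytes_since=10 * 1024 ** 2, seconds_since=61))
```

Without the journal, recovery would fall back to the last checkpoint alone: still consistent, but missing the writes since then, which is where replication provides durability.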
MongoDB 3.0 is a bag full of goodies!
23
Benefits
24
Wider Range of Use Cases
How: Flexible Storage Architecture
• Fundamental rearchitecture, with new pluggable storage engine API
• Same data model, same query language, same ops
• But under the hood, many storage engines optimized for many use cases
Use cases: Single View, Content Management, Real-Time Analytics, Catalog, Internet of Things (IoT), Messaging, Log Data, Tick Data
25
Performance!
Great, but … what's in it for me?
MongoDB on FS
Reference Data Management
29
Reference Data Management
• Securities Master
• Economic Calendar
• Corporate Actions
• Counter-party Information
• Legal Entity Identifier
30
Reference Data Management
[Diagram: current reference data architecture, with Data Feed, ETL, Master, Message Bus, regional copies (US, EU, AS), and Reporting, Broker, Sales, XYZ and XYZ^2 applications]
31
Replication + Distributed Cache out of the box
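Change-driven replication of reference data, sketched in plain Python: the master appends every update to an ordered log, and each regional copy (US, EU, AS) applies only the entries it has not yet seen, rather than waiting for a batch. This is a simplified model with hypothetical names and security identifiers, not MongoDB's replication protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Op = Tuple[int, str, dict]  # (sequence number, security id, full reference record)


@dataclass
class Master:
    """Hypothetical master store: every change is appended to an ordered log."""
    data: Dict[str, dict] = field(default_factory=dict)
    log: List[Op] = field(default_factory=list)

    def update(self, sec_id: str, record: dict) -> None:
        self.data[sec_id] = record
        self.log.append((len(self.log) + 1, sec_id, record))


@dataclass
class RegionalCopy:
    """A regional replica that tails the master's log and applies each change in order."""
    region: str
    data: Dict[str, dict] = field(default_factory=dict)
    applied: int = 0  # sequence number of the last change applied

    def sync(self, master: Master) -> int:
        """Apply all changes not yet seen; return how many were applied."""
        pending = master.log[self.applied:]
        for seq, sec_id, record in pending:
            self.data[sec_id] = dict(record)
            self.applied = seq
        return len(pending)


master = Master()
regions = [RegionalCopy(r) for r in ("US", "EU", "AS")]
master.update("SEC-0001", {"name": "Example Corp ord.", "currency": "USD"})
master.update("SEC-0001", {"name": "Example Corp ord.", "currency": "USD", "lot_size": 100})
for replica in regions:
    replica.sync(master)
print(all(replica.data == master.data for replica in regions))
```

Because each region tracks its own position in the log, a late or temporarily disconnected region catches up by applying only what it missed, and every region ends with the same view.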
Risk Aggregation & Reporting
33
Risk Aggregation & Reporting
• Intraday Controls
– Reporting in less than 1 minute
• Aggregate vast amounts of data from different trading desks (asset classes)
• Manage exposure to counter-party entities
– Can be thousands, depending on the trade
– A challenge for existing RDBMSs
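At its core, risk aggregation is a group-by over exposures coming from many desks. A minimal in-memory sketch with hypothetical records and field names; the docstring shows the aggregation-framework `$group` stage that would compute the same totals inside the database. The staleness check encodes the slide's intraday requirement (figures less than 1 minute old).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical exposure records from different trading desks.
positions = [
    {"desk": "rates", "counterparty": "CP-A", "exposure": 1_000_000.0},
    {"desk": "fx", "counterparty": "CP-A", "exposure": -250_000.0},
    {"desk": "equities", "counterparty": "CP-B", "exposure": 400_000.0},
    {"desk": "credit", "counterparty": "CP-B", "exposure": 150_000.0},
]


def exposure_by_counterparty(rows: List[dict]) -> Dict[str, float]:
    """Net exposure per counter-party across all desks.

    Roughly what an aggregation stage such as
    {"$group": {"_id": "$counterparty", "net": {"$sum": "$exposure"}}}
    would compute server-side.
    """
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["counterparty"]] += row["exposure"]
    return dict(totals)


def is_stale(as_of: datetime, now: datetime, limit: timedelta = timedelta(minutes=1)) -> bool:
    """Intraday control: risk figures must be less than 1 minute old."""
    return now - as_of >= limit


print(exposure_by_counterparty(positions))
```

Pushing the grouping into the database keeps the raw trade-level data in one place and returns only the per-counter-party totals to the reporting layer.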
Trade Repository
35
Trade Repository
• Scalable Database
– Size
– Velocity
– Variety
• Regulatory Requirements
– Dodd-Frank and EMIR
• Any trade, any point in time
• Unified view of products and trades across time
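"Any trade, any point in time" is a bi-temporal requirement: each version of a trade carries the business date it takes effect and the date the firm recorded it, and nothing is overwritten. A minimal sketch under those assumptions; the field names, dates and amounts are hypothetical.

```python
from datetime import date
from typing import List, Optional

# Hypothetical versions of one trade. Corrections become new versions.
versions = [
    {"trade_id": "T1", "notional": 10_000_000, "valid_from": date(2015, 3, 2), "recorded_at": date(2015, 3, 2)},
    {"trade_id": "T1", "notional": 12_000_000, "valid_from": date(2015, 3, 10), "recorded_at": date(2015, 3, 10)},
    # Back-dated correction recorded on 20 March for the 10 March amendment.
    {"trade_id": "T1", "notional": 11_000_000, "valid_from": date(2015, 3, 10), "recorded_at": date(2015, 3, 20)},
]


def as_of(rows: List[dict], trade_id: str, valid_at: date, known_at: date) -> Optional[dict]:
    """State of a trade on business date `valid_at`, as the firm knew it on `known_at`."""
    candidates = [
        r for r in rows
        if r["trade_id"] == trade_id
        and r["valid_from"] <= valid_at
        and r["recorded_at"] <= known_at
    ]
    if not candidates:
        return None
    # Latest business-effective version wins; ties go to the latest recording.
    return max(candidates, key=lambda r: (r["valid_from"], r["recorded_at"]))


print(as_of(versions, "T1", date(2015, 3, 15), known_at=date(2015, 3, 15))["notional"])
print(as_of(versions, "T1", date(2015, 3, 15), known_at=date(2015, 3, 25))["notional"])
```

The same business date gives two different answers depending on when the question is asked, which is exactly what a regulator reconstructing a past report needs to see.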
36
Trade Repository
• High-Speed Data
• Large Volumes of Information
• Very Diverse
• Time-to-Market
Single View of Customer
38
Single View of Customer
• Who is your customer?
• Large Company Problem
– Not unique to FS!
• Integration of data
• Consolidation of services
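A minimal sketch of the data-integration step behind a single view: records for the same customer, arriving from differently shaped source systems, are folded into one document with a sub-document per source. The source names, fields and ID mapping are hypothetical, and the sketch assumes customer IDs have already been matched across systems, which in practice is often the hard part.

```python
from typing import Dict, List

# Hypothetical extracts from three source systems, each with its own shape.
core_banking = [{"cust_id": "C42", "name": "A. Silva", "accounts": ["PT-001"]}]
cards = [{"customer": "C42", "card_last4": "1234", "status": "active"}]
insurance = [{"client_ref": "C42", "policies": [{"type": "life", "id": "P-9"}]}]


def single_view(sources: Dict[str, List[dict]], id_fields: Dict[str, str]) -> Dict[str, dict]:
    """Fold records from every source into one document per customer.

    Each source keeps its own sub-document, so sources with different fields
    coexist without a shared schema, and the document shows where data came from.
    """
    view: Dict[str, dict] = {}
    for source, records in sources.items():
        key_field = id_fields[source]
        for rec in records:
            cust_id = rec[key_field]
            doc = view.setdefault(cust_id, {"_id": cust_id, "sources": {}})
            doc["sources"][source] = {k: v for k, v in rec.items() if k != key_field}
    return view


customers = single_view(
    {"core_banking": core_banking, "cards": cards, "insurance": insurance},
    {"core_banking": "cust_id", "cards": "customer", "insurance": "client_ref"},
)
print(customers["C42"])
```

Adding a new source system means adding one entry to each mapping; existing sub-documents, and the applications reading them, are unaffected.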
Retail Bank Transactions Log
42
Retail Bank Transactions Log
• Data needs to be fetched from the mainframe
– That costs money!
• Read Requests
– Mobile Apps
– Home Banking
– Analytics
– Marketing Workloads
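The cost argument above (every mainframe read is billed, while mobile apps, home banking, analytics and marketing all read the same transaction history) is what motivates offloading reads. A minimal sketch, assuming a hypothetical read store that loads an account's history from the mainframe once and then serves every channel from its own copy; keeping that copy current with new mainframe transactions is deliberately left out.

```python
from typing import Dict, List


class MainframeStub:
    """Stand-in for the mainframe; in the slide's scenario every call costs money."""

    def __init__(self, ledger: Dict[str, List[dict]]) -> None:
        self.ledger = ledger
        self.calls = 0

    def fetch_transactions(self, account: str) -> List[dict]:
        self.calls += 1
        return list(self.ledger.get(account, []))


class OffloadedStore:
    """Hypothetical read store: serves channel reads, goes to the mainframe only on a miss."""

    def __init__(self, mainframe: MainframeStub) -> None:
        self.mainframe = mainframe
        self.copy: Dict[str, List[dict]] = {}

    def transactions(self, account: str) -> List[dict]:
        if account not in self.copy:
            self.copy[account] = self.mainframe.fetch_transactions(account)
        return self.copy[account]


mainframe = MainframeStub({"ACC-1": [{"amount": -20.0, "desc": "card payment"}]})
store = OffloadedStore(mainframe)
for channel in ("mobile", "home_banking", "analytics", "marketing"):
    store.transactions("ACC-1")  # each channel reads the same history
print(mainframe.calls)
```

Four channels reading the same account cost a single mainframe call; the remaining reads, and any read spikes, land on the offloaded store instead.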
43
Retail Bank Transactions Log
1990s
44
Retail Bank Transactions Log
2000s
45
Retail Bank Transactions Log
2010s
Use Cases
Reference Data Management
Data
Securities Master, Corporate Actions, Market Data, Counter-Party Information, Economic Calendar, Legal Entity Identifier.
Problem
Replicating reference data across geographies in a timely and efficient manner. Ensuring that data replication meets service level agreements (SLAs). Ensuring a congruent view across all trading entities in a global organisation.
Business Benefit
Reduced cost of managing infrastructure. Timely reference data, replicated within SLA. The company in question will save about $40m in costs and penalties over 5 years, and is charged only once for data from TR / Bloomberg / etc. instead of regionally as before.
Why MongoDB?
The dynamic data model means no schema changes across geographies, and the robust built-in replication mechanism simplifies infrastructure and removes the need for additional integration technologies. Data is replicated on each change, not in batches. Both the cache and the database are always up to date. Simple data modelling and analysis: easy to change and to understand.
Case Studies: Large American Investment / Retail Bank
Risk Aggregation & Reporting
Data
Risk metrics from upstream systems; for instance, data from the front-office system for monitoring counter-party exposure.
Problem
Investment banks need a congruent view of exposures across their business in order to manage risk effectively. Intraday controls require risk measures less than 1 minute old, and the existing RDBMS could not scale to that. Data was distributed across multiple silos and consequently needed to be aggregated. Versioning was needed for data lineage and auditing, and auditors require a longer time window.
Business Benefit
A single view of exposure / risk data across the business. Application changes can be made much faster. The bank can hedge / trade with more confidence, be more competitive, and hold smaller capital reserves.
Why MongoDB?
Scalable, replicable, flexible (quick time-to-market). Can handle more data and users easily.
Dynamic Schema: can store disparate data and make changes easily.
Replication: local reads and high availability.
Sharding: can add data and users easily by scaling out.
Case Studies: Tier-1 Bank (Prime Services); Large American Banking Group; Swiss Bank (Equity Derivatives)
Trade Repository
Data
Trade data for each new or updated trade.
Problem
Dodd-Frank and EMIR (European Market Infrastructure Regulation) have mandated firms to store all trade data (including updates) for seven years. Investment banks must also be able to query and report to the regulators at any time, in a bi-temporal manner. Each application builds its own persistence and audit trail. As an example, one customer wanted a single unified framework and persistence layer for all trades and products, but found it hard to find a solution that could handle the many variable structures across all securities.
Business Benefit
Quick access to data and reporting, ensuring that the regulators have what they need in a timely manner. Ensures compliance with regulatory mandates and helps avoid the consequences of non-compliance.
Why MongoDB?
Scalable; dynamic schema (trade information can vary over time); a cost structure that scales as data volumes grow, “pay as you grow”.
Case Studies: Global leader in institutional research and investment management; Large Australian Bank
DBaaS
Data
Market, client/customer, trade: any data.
Problem
The bank wanted its application groups to focus on building apps, not data access logic. It takes 6 months for app groups to get new infrastructure ordered and delivered. Application developers are not very interested in talking to the hardware/DBA groups. Horizontal scaling was done by each application.
Business Benefit
Time-to-market decreased by at least 50%. Object persistence included in the framework. Database capacity added in minutes, not months. The same environment from prototype to production.
Why MongoDB?
For new data marts and single views, a flexible schema simplifies integrating disparate systems and keeps them “loosely coupled”, i.e. changes to upstream systems won't break downstream applications. Native language drivers: groups can focus on agile application development. Auto-replication: data distributed globally in real time.
Case Studies: Large US Investment and Retail Bank.
Single View of Customer
Data
Client/customer data, addresses, personal details, purchase history, status, etc.
Problem
Siloed data across the organisation, with no consistent view of the customer. Difficult to identify customer needs for cross-sell / up-sell opportunities. Unable to engage positively with the customer, because source systems are hard to change, so the business and IT are normally stuck. In the customer example, they had 70 source systems and 20 screens to view customer policies, so a single view was not feasible.
Business Benefit
Provide the business with an accurate view of its customer base.
Why MongoDB?
A flexible schema simplifies integrating any disparate systems and keeps them “loosely coupled”, i.e. changes to upstream systems won't break downstream applications.
Performance: can handle all data in one DB.
Replication: local reads and high availability.
Sharding: can add more data and users globally by scaling out.
Case Studies: MetLife.
52
Register now: mongodbworld.com
Early Bird Ends May 1!
Use code NorbertoLeite for an additional 25% off
*Come as a group of 3 or more and save another 25%
We’re Always Looking for Top Talent
What are employees saying?
“Working with a group of individuals who you know will have your back is
one of the reasons I love working at MongoDB”
“Every day, we get to solve hard problems that make distributed databases
more accessible to developers all over the world”
“MongoDB lets you tackle real problems that affect hundreds of thousands
of users”
Why work with us?
• We’re built by developers, for developers
• $311 MM in capital raised to date
• #4 on DB-Engines list of top Database
Management Systems… and climbing
• Scaling our EMEA/APAC operations
aggressively
Visit us at www.mongodb.com/careers to see a full list of opportunities or email your resume to
jobs@mongodb.com
What are we hiring for?
• Technical Services Engineers (Dublin)
• Consulting Engineers (UK or France)
• Solution Architects (France, Spain, Germany)
• Enterprise Account Executives (France, Italy, UK, Germany)
• Corporate Account Executives (Dublin)
• Renewals Account Managers (Dublin)
54
For More Information
• Case Studies: mongodb.com/customers
• Presentations: mongodb.com/presentations
• Free Online Training: education.mongodb.com
• Webinars and Events: mongodb.com/events
• Documentation: docs.mongodb.org
• MongoDB Downloads: mongodb.com/download
• Additional Info: info@mongodb.com
Thank you!
Norberto Leite
Technical Evangelist
norberto@mongodb.com
@nleite
MongoDB on Financial Services Sector


Editor's Notes

  • #5 We are not in the business of doing things the way they were done before. We are in the disruptive technology business.
  • #7 MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search, geospatial, and more.
  • #13 x = number of features; s = size of the team; e = expertise
  • #14 In 1985, storage was the key expense: $100,000 per GB; developer salary: $28,000 per year. So relational databases were built to optimize for storage. In 2013, storage is cheap: $0.05 per GB. Developers are expensive: $90,000 per year. So MongoDB was built to optimize for developer productivity. This is what the ratio of those expenses looks like, in 1985 and today. Assumptions: 3-year TCO; 1985: 2 developers and 5 GB; 2013: 2 developers and 5 TB. Developer costs comprise the lion’s share relative to storage today, so optimize for developer productivity.
  • #15 Analysis of large sets of information. New streams of data from big data scenarios and IoT. Data formats that are highly variable and constantly changing.
  • #31 Enrichment of an existing feed and feed onboarding take months. Data updates reach the traders with intra-day frequency. Sub-optimal data access and global availability. Licensing agreements are not effective.
  • #32 Replication, Distributed Cache, Dynamic Schema, Search Engine
  • #41 Data flexibility, analytics, reduction in total cost of ownership, integration with other big data platforms, expansion of the use cases
  • #43 This can also be framed as mainframe offloading.
  • #44 Very controlled environment. No spikes (except for Christmas shopping). Limited number of users. Costs are easy to determine in advance.
  • #45 Very controlled environment. No spikes (except for Christmas shopping). Limited number of users. Costs are easy to determine in advance.
  • #46 Very controlled environment. No spikes (except for Christmas shopping). Limited number of users. Costs are easy to determine in advance.