Webinar: How MongoDB is Used to Manage Reference Data - May 2014

Managing and distributing reference data globally has always been a challenge for financial institutions. Maintaining database schemas while integrating and replicating that data across geographies is costly and time-consuming. MongoDB's native replication capabilities and partitioned architecture make it simple to distribute and synchronize data efficiently across the globe. MongoDB's dynamic schema dramatically reduces database maintenance for schema migrations: data structure changes can be applied with no downtime and with no impact on existing applications. For example, by migrating its reference data management application to MongoDB, a Tier 1 bank dramatically reduced the license and hardware costs associated with the proprietary relational database it previously ran.

Published in: Technology, Business

  • 117883, 69461, 102862, 73277, 65134
  • High Availability – Ensure application availability during many types of failures
    Disaster Recovery – Address the RTO and RPO goals for business continuity
    Maintenance – Perform upgrades and other maintenance operations with no application downtime

    Secondaries can be used for a variety of applications – failover, hot backup, rolling upgrades, data locality and privacy and workload isolation


  • MongoDB provides horizontal scale-out for databases using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.
    MongoDB supports three types of sharding:
    • Range-based Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values “close” to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries.
    • Hash-based Sharding. Documents are uniformly distributed according to an MD5 hash of the shard key value. Documents with shard key values “close” to one another are unlikely to be co-located on the same shard. This approach guarantees a uniform distribution of writes across shards, but is less optimal for range-based queries.
    • Tag-aware Sharding. Documents are partitioned according to a user-specified configuration that associates shard key ranges with shards. Users can optimize the physical location of documents for application requirements such as locating data in specific data centers.
    MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases.
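
The three sharding types described above can be illustrated with a minimal pure-Python sketch. It is not MongoDB's implementation: the shard names, split points, and the compound (region, IssID) key are assumptions made for illustration, and the modulo step in `hash_shard` is a simplification of how hashed keys are mapped to shards.

```python
import bisect
import hashlib

# Hypothetical shard names; a real cluster defines its own.
SHARDS = ["shard0", "shard1", "shard2"]

# Hypothetical split points on an integer shard key (IssID in the slides):
# keys below 70000 -> shard0, 70000..109999 -> shard1, 110000 and up -> shard2.
SPLIT_POINTS = [70000, 110000]


def range_shard(key: int) -> str:
    """Range-based: keys that are close together land on the same shard."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


def hash_shard(key: int) -> str:
    """Hash-based: an MD5 digest of the key scatters neighbouring keys.

    Taking the digest modulo the shard count is a simplification.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Tag-aware: user-specified ranges on a hypothetical compound key
# (region, IssID), each pinned to the shard in that region's data center.
INF = float("inf")
TAG_RANGES = [
    (("LON", -INF), ("LON", INF), "shard1"),
    (("NYC", -INF), ("NYC", INF), "shard0"),
    (("SYD", -INF), ("SYD", INF), "shard2"),
]


def tagged_shard(key: tuple) -> str:
    """Tag-aware: return the shard whose configured range contains the key."""
    for low, high, shard in TAG_RANGES:
        if low <= key < high:
            return shard
    raise KeyError(f"no tag range covers {key!r}")


if __name__ == "__main__":
    for iss_id in (65134, 69461, 73277, 102862, 117883):
        print(iss_id, range_shard(iss_id), hash_shard(iss_id))
    print(tagged_shard(("LON", 102862)))
```

With range-based placement, 65134 and 69461 share a shard because both fall below the first split point; the hash-based function gives no such guarantee, which is the trade-off the notes describe for range queries.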
  • Sharding is transparent to applications; whether there are one or one hundred shards, the application code for querying MongoDB is the same. Applications issue queries to a query router that dispatches the query to the appropriate shards.

    For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don’t use the shard key, the query router will dispatch the query to all shards and aggregate and sort the results as appropriate. Multiple query routers can be used with a MongoDB system, and the appropriate number is determined based on performance and availability requirements of the application.
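
The routing behaviour described above can be sketched as a toy in-memory router. This is an illustration under stated assumptions, not MongoDB's query router: the three shards, the split points, and the function names are hypothetical, and the sample documents reuse the IssID and PVCurrency values from the slide 6 table.

```python
import bisect
from typing import Dict, List, Tuple

Document = Dict[str, object]

# Hypothetical cluster: three shards, range-partitioned on IssID.
SPLIT_POINTS = [70000, 110000]
SHARDS: List[List[Document]] = [
    [{"IssID": 65134, "PVCurrency": "CHF"}, {"IssID": 69461, "PVCurrency": "USD"}],
    [{"IssID": 73277, "PVCurrency": "BMD"}, {"IssID": 102862, "PVCurrency": "EUR"}],
    [{"IssID": 117883, "PVCurrency": "USD"}],
]


def shard_index(iss_id: int) -> int:
    """Return the index of the shard that owns the given shard key value."""
    return bisect.bisect_right(SPLIT_POINTS, iss_id)


def find_by_key(iss_id: int) -> Tuple[List[Document], List[int]]:
    """Key lookup on the shard key: only the owning shard is contacted."""
    target = shard_index(iss_id)
    hits = [d for d in SHARDS[target] if d["IssID"] == iss_id]
    return hits, [target]


def find_by_range(low: int, high: int) -> Tuple[List[Document], List[int]]:
    """Inclusive range on the shard key: only overlapping shards are contacted."""
    targets = list(range(shard_index(low), shard_index(high) + 1))
    hits = [d for i in targets for d in SHARDS[i] if low <= d["IssID"] <= high]
    return hits, targets


def find_by_field(field: str, value: object) -> Tuple[List[Document], List[int]]:
    """Query without the shard key: scatter to all shards, then merge and sort."""
    targets = list(range(len(SHARDS)))
    hits = [d for i in targets for d in SHARDS[i] if d.get(field) == value]
    return sorted(hits, key=lambda d: d["IssID"]), targets


if __name__ == "__main__":
    print(find_by_key(102862))
    print(find_by_range(60000, 75000))
    print(find_by_field("PVCurrency", "USD"))
```

Each function returns the matching documents together with the list of shards it had to contact, which makes the difference between a targeted query and a scatter-gather query visible.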

  • Webinar: How MongoDB is Used to Manage Reference Data - May 2014

    1. How MongoDB is Used to Manage Reference Data
       Daniel Roberts @dmroberts #MongoDB
    2. Agenda
       • Problem space
       • Existing technology solutions
       • Why MongoDB?
       • Case Study
    3. Reference Data Distribution
    4. Problem Space
       • How do you globally distribute reference data?
         – Polymorphic data
           • Price / Products / Securities Master
           • Counterparty information - KYC
           • Corporate Actions
           • Golden / single source of truth
         – Often changing in structure
           • e.g. new products
         – Often high volume
       • How is this typically solved today?
    5. Problem Space
       • How do you make this available to client applications?
         – Easy to access
         – No stale data
       • Distribute data through multiple technologies
       • What happens when schema changes are required?
         – Multiple downstream systems affected
    6. Relational: All Data is Column/Row

       IssID   IssuerName                  PVCurrency
       117883  DWS Vietnam Fund            USD
       69461   Independence III Cdo Ltd    USD
       102862  Zamano Plc                  EUR
       73277   Green Way                   BMD
       65134   First European Growth Inc.  CHF

       SecID   EventID  Company_Meeting  IssID
       762288  407341   AGM              117883
       81198   243459   SDCHG            69461
       422999  410626   AGM              102862
       422999  243440   SDCHG            102862
       75128   20056    ISCHG            65134
    7. MongoDB stores data as JSON
       [Diagram labels: Relational, MongoDB]

       {
         "IssID" : 65134,
         "IssuerName" : "First European Growth Inc.",
         "PVCurrency" : "USD",
         "actions" : [
           { "Company_Meeting" : "ISCHG", "EventID" : 20056, "SecID" : 75128 },
           { "Company_Meeting" : "AGM", "EventID" : 2716296, "SecID" : 75128 }
         ]
       }
    8. Do More With Your Data (MongoDB)
       • Rich Queries: find all company AGMs that happened last week.
       • Text Search: find all actions where IssuerName includes "European".
       • Aggregations: how many companies have PVCurrency as USD?
       [The slide repeats the example document from slide 7.]
    9. Why MongoDB?
    10. Current Implementations
        • What do reference data solutions look like today?
        • Storage
          – Relational database and/or caching technologies
          – File
        • Replication
          – ETL or messaging
        • Complex, costly and brittle
          – Maintenance
            • Schema changes / infrastructure
            • Multiple technologies
    11. Why MongoDB?
        • What features in MongoDB are ideally suited for globally replicated reference data systems?
          1. Dynamic and flexible schema
    12. Document Model Benefits
        • Agility and flexibility
          – Data model supports business change
          – Rapidly iterate to meet new requirements
        • Intuitive, natural data representation
          – Eliminates ORM layer
          – Developers are more productive
        • Reduces the need for joins, disk seeks
          – Programming is simpler
          – Performance delivered at scale
    13. Developers are more productive
    14. Why MongoDB?
        • What features in MongoDB are ideally suited for globally replicated reference data systems?
          1. Dynamic and flexible schema
          2. Built-in replication and high availability
    15. Replica Sets
        • Replica set: two or more copies
        • Self-healing
        • Addresses availability considerations:
          – High availability
          – Disaster recovery
          – Maintenance
        • Deployment flexibility
          – Data locality to users
          – Workload isolation: operational & analytics
        [Diagram labels: Application, Driver, Primary, Secondary, Secondary, Replication]
    16. Global Replication
        • Avoid complicated and costly internal data distribution infrastructure.
        • Single data vendor interface
        [Diagram labels: Bloomberg, IDC, Reuter, Integration]
    17. Add many nodes
        [Diagram: one Primary and seven Secondaries, each labeled Real-Time]
    18. Why MongoDB?
        • What features in MongoDB are ideally suited for globally replicated reference data systems?
          1. Dynamic and flexible schema
          2. Built-in replication and high availability
          3. Tag-aware sharding (geo)
    19. Automatic Sharding
        • Three types of sharding: hash-based, range-based, tag-aware
        • Increase or decrease capacity as you go
        • Automatic balancing
    20. Query Routing
        • Multiple query optimization models
        • Each sharding option appropriate for different apps
    21. Read Global/Write Local
        [Diagram labels: Primary:NYC, Primary:LON, Primary:SYD, and two each of Secondary:NYC, Secondary:LON, Secondary:SYD]
    22. Case Study
    23. Case Study: Global investment bank
        Distribute reference data globally in real-time for fast local accessing and querying
        Problem
        • Delays up to 20 hours in distributing data via ETL
        • Charged multiple times globally for same data
        • Incurring regulatory penalties from missing SLAs
        • Had to manage 20 distributed systems with same data
        Why MongoDB
        • Dynamic schema: easy to load initially & over time
        • Auto-replication: data distributed in real-time, read locally
        • Both cache and database: cache always up-to-date
        • Simple data modeling & analysis: easy changes and understanding
        Results
        • Will save considerable costs.
        • Individual groups use internal data instead of paying vendors separately
        • Data in sync globally, usually within seconds
        • Moving towards one global shared data service
    24. Previous Reference Data Management Architecture
        Feeds & batch data
        • Pricing
        • Accounts
        • Securities Master
        • Corporate actions
        [Diagram labels: Source Master Data (RDBMS), seven ETL jobs, Destination Data (RDBMS)]
        Each represents
        • People $
        • Hardware $
        • License $
        • Reg penalty $
        • & other downstream problems
    25. Solution with MongoDB
        Feeds & batch data
        • Pricing
        • Accounts
        • Securities Master
        • Corporate actions
        [Diagram labels: MongoDB Primary, MongoDB Secondaries, seven links labeled Real-time]
        Each represents
        • No people $
        • Less hardware $
        • Less license $
        • No penalty $
        • & many fewer problems
    26. Summary
        • Reference data technology requirements:
          – Database
          – Cache
          – Geographically replicated
          – Rich Query & Search
          – Flexible Schema
          – Scalable
          – Cost Effective
        MongoDB: single technology to meet all these needs
    27. For More Information

        Resource              Location
        MongoDB Downloads     mongodb.com/download
        Free Online Training  education.mongodb.com
        Webinars and Events   mongodb.com/events
        White Papers          mongodb.com/white-papers
        Case Studies          mongodb.com/customers
        Presentations         mongodb.com/presentations
        Documentation         docs.mongodb.org
        Additional Info       info@mongodb.com
    28. MongoDB World – June 23-25, New York City
        • Learn to Build & Manage Modern Apps in Two Days
        • Largest Gathering of MongoDB World Experts Ever
        • 80+ Sessions from Fundamentals to Advanced Opps. Use cases from all industries
        • Connect with developers, administrators & execs building innovative applications
        • Ecosystem Partners: IBM, AWS, Microsoft + More
        • Meet the Experts – Includes Founder Dwight Merriman
        • Code Webinar300 - $300 off Registration
        • www.mongodbworld.com
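
The move from the two relational tables on slide 6 to the embedded document on slide 7, and the three query styles on slide 8, can be sketched in plain Python. This is only an illustration: the function names are hypothetical, the rows are copied from the slide 6 tables, and the substring match is a stand-in for text search rather than MongoDB's text index.

```python
from collections import Counter, defaultdict

# Rows from the slide 6 issuer table: (IssID, IssuerName, PVCurrency).
ISSUERS = [
    (117883, "DWS Vietnam Fund", "USD"),
    (69461, "Independence III Cdo Ltd", "USD"),
    (102862, "Zamano Plc", "EUR"),
    (73277, "Green Way", "BMD"),
    (65134, "First European Growth Inc.", "CHF"),
]

# Rows from the slide 6 actions table: (SecID, EventID, Company_Meeting, IssID).
ACTIONS = [
    (762288, 407341, "AGM", 117883),
    (81198, 243459, "SDCHG", 69461),
    (422999, 410626, "AGM", 102862),
    (422999, 243440, "SDCHG", 102862),
    (75128, 20056, "ISCHG", 65134),
]


def build_documents(issuers, actions):
    """Embed each issuer's action rows in one document, replacing the join."""
    by_issuer = defaultdict(list)
    for sec_id, event_id, meeting, iss_id in actions:
        by_issuer[iss_id].append(
            {"Company_Meeting": meeting, "EventID": event_id, "SecID": sec_id}
        )
    return [
        {
            "IssID": iss_id,
            "IssuerName": name,
            "PVCurrency": currency,
            "actions": by_issuer.get(iss_id, []),
        }
        for iss_id, name, currency in issuers
    ]


def issuers_with_meeting(docs, meeting):
    """Rich query: issuers that have at least one action of the given type."""
    return [
        d["IssID"]
        for d in docs
        if any(a["Company_Meeting"] == meeting for a in d["actions"])
    ]


def name_contains(docs, word):
    """Text-search stand-in: case-insensitive substring match on IssuerName."""
    return [d["IssID"] for d in docs if word.lower() in d["IssuerName"].lower()]


def count_by_currency(docs):
    """Aggregation: number of issuers per PVCurrency."""
    return Counter(d["PVCurrency"] for d in docs)


if __name__ == "__main__":
    documents = build_documents(ISSUERS, ACTIONS)
    print(issuers_with_meeting(documents, "AGM"))
    print(name_contains(documents, "European"))
    print(count_by_currency(documents)["USD"])
```

Because each issuer's actions travel inside its document, all three questions are answered from a single collection without a join, which is the benefit the Document Model Benefits slide claims.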
