Speaker: Feng Qu, Sr MTS, eBay
Level: 200 (Intermediate)
Track: Developer
Building applications resilient to infrastructure failure is essential to systems that run in distributed environments, including those with a MongoDB database. For example, failure can come from computer resources, such as nodes, network switches, or the entire data center. On occasion, MongoDB nodes may be marked down by Operations to perform administrative tasks, such as a software upgrade, adding extra capacity, etc.
In this session, we will discuss how to build resilient applications using appropriate design patterns suitable to enterprise class MongoDB applications.
What You Will Learn:
- How to manage updates within a resilient architecture.
- Design patterns for resilient applications.
- Practical advice for deploying resilient enterprise applications.
2. Speaker Bios
Donovan Hsieh - Sr. Enterprise Data Architect @ eBay Inc.
• Has worked on major RDBMS & Enterprise Data Architecture / Modeling
• New passion is highly available, fault tolerant distributed computing, (near) real-time Big Data & enterprise class NoSQL
• Speaker at 2015, 2016, 2017 EDW, NoSQL Now & DAMA Canada Conferences
• Speaker at 2016 Data Governance & Information Quality Conference
Feng Qu - Sr. MTS @ eBay Inc.
• Has worked on Oracle since 1995 and NoSQL (Cassandra, MongoDB and Couchbase) since 2011
• Led company wide NoSQL projects
• 2014 and 2015 Cassandra MVP
• Speaker at 2013, 2014 & 2015 Cassandra annual Summit
• Speaker at 2016 Couchbase Connect
• Speaker at 2016 & 2017 EDW Conferences
3. Presentation Outline
• Why Resiliency Design Pattern for NoSQL Databases?
• Doesn’t NoSQL Support Auto-Failover Out-of-Box?
• NoSQL Resiliency Design Pattern Consideration
• NoSQL Resiliency Design Pattern Approach
• MongoDB Architecture Overview
• MongoDB Resilience Design Pattern Examples
• Other Resilience Design Pattern Examples
• Future Work & Direction
• Key Takeaways & Conclusion
• Q & A
4. Why Resiliency Design Pattern for NoSQL?
• Optimize operation & management efficiency to achieve highest possible Production Availability
• Production Availability is more than just database clusters or nodes availability
• Facilitate development with availability architectural blueprint & SLA
• Application shouldn’t be burdened with unpreventable infrastructure failure
5. Doesn’t NoSQL Support Auto-Failover Out-of-Box?
• It depends on the type of NoSQL database & deployment topology, e.g.,
• Built-in Disaster Recovery (DR)
• Single Point of Failure (SPOF) for reads and/or writes
• Node or cluster failover time
• Client connection stacking
• Graceful cluster nodes rebalancing & migration
• Ease of management for large scale clusters
• Consistent point-in-time backup & recovery
• Not all NoSQL databases are created equal in terms of Availability, Consistency, Durability & Recoverability
6. Eight Fallacies of Distributed Computing
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
9. NoSQL applications are resilient to failure
- Peter Deutsch & James Gosling
7. NoSQL Resiliency Design Pattern Consideration
• It’s not one dimension but rather a coherent set of interconnected cogs working together:
• Use case qualification
• Application persistence error handling using suitable NoSQL vendors’ client drivers and SDKs
• Technology stack & engineering framework (e.g., Data Access Layer or Netflix Hystrix)
• Infrastructure (e.g., cloud) setup
• Operation & management best practice + SOP
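The error-handling and Hystrix items above point at the circuit-breaker idea: stop sending calls to a failing database for a while and serve a fallback instead. The following is only a minimal sketch of that idea, not Hystrix itself and not any driver's API; the class name, thresholds, and the choice of `ConnectionError` as the failure signal are assumptions made for illustration.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cool-down."""

    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        # While open and still cooling down, skip the database entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = operation()
        except ConnectionError:
            # Assumption: the data access layer maps driver errors to ConnectionError.
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # any success closes the circuit again
        return result
```

A data access layer could wrap each persistence call in `breaker.call(do_query, serve_cached_value)`, so that a node outage degrades the response rather than stalling every request thread.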
8. NoSQL Resiliency Design Pattern Approach
• Identify meaningful NoSQL database architectural abstractions based on the relevant CAP theorem, ACID / BASE properties & performance characteristics
• Define different pattern types & standardize a minimal NoSQL cluster deployment pattern for
- Common small-to-medium, non-mission critical use cases
• Define enhanced design patterns to support mission critical use cases that require high
- Availability, Consistency, Durability, Scalability & Performance
• Define other design patterns to support non-conforming use cases
- Standalone w/out DR, application sharding, etc.
9. NoSQL Resilience Design Pattern Types
Type                          | Pattern
Workload                      | General Purpose Mixed Read & Write
Performance                   | High Performance Read and / or Write
Durability                    | 100% Durability
                              | High Local and / or Cross Datacenter Durability
High Availability (HA)        | High Availability Local Read & Write
                              | High Availability Multi Datacenter Read & Write
High Read & Write Consistency | High Local Datacenter Read & Write Consistency
                              | High Multi Datacenter Read & Write Consistency
Others                        | Administration, Backup & Restore, Application Sharding …
14. MongoDB Resilience Architecture Recap
Std. Minimal Deployment Pattern
NoSQL Database | Cluster Type                      | Datacenter 1                 | Datacenter 2   | Datacenter 3
MongoDB        | Multi-DC Replica Set with DR      | 3+ (1 primary, 2+ secondary) | 2+ (secondary) | 2+ (secondary or arbiter)
               | Special qualified use case w/o DR | 3+ nodes                     |                |
NoSQL Database | Datacenter | High Availability          | High Consistency | High Durability | DR
MongoDB        | Local DC   | No for Write, Yes for Read | Yes              | Yes             | No
               | Multi DC   | Yes                        |                  |                 |
16. Read Intensive / Highly Available Read Pattern
• High read to low write ratio; reads can tolerate primary node failure
• A replica set can have up to 50 members to scale out heavy read traffic if needed, although only 7 of them can be voting members
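The two limits mentioned above (50 members, 7 voting members) are easy to check before a replica set configuration is applied. The sketch below is a hypothetical pre-flight check, not a MongoDB tool; the function name and the simplified member dictionaries are assumptions, and only the `votes` field mirrors the real replica set configuration.

```python
MAX_MEMBERS = 50  # replica set member limit cited on the slide
MAX_VOTING = 7    # voting member limit cited on the slide


def check_replica_set(members):
    """Return a list of problems for a list of member dicts with an optional 'votes' field."""
    problems = []
    # A member votes unless its 'votes' field is explicitly 0.
    voting = sum(1 for member in members if member.get("votes", 1) > 0)
    if len(members) > MAX_MEMBERS:
        problems.append(f"{len(members)} members exceeds the limit of {MAX_MEMBERS}")
    if voting > MAX_VOTING:
        problems.append(f"{voting} voting members exceeds the limit of {MAX_VOTING}")
    return problems
```

For a read-heavy deployment this means extra secondaries beyond the seventh voter would be added with `votes: 0`, so they serve reads without taking part in elections.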
17. Extreme High Read / Write Pattern
• Use sharded MongoDB to support horizontal write & read scaling
18. High Performance Local Read Pattern
• Load balance read traffic in front of the application servers
• Provision an appropriate number of local secondary nodes
• Use SSDs if the active working set exceeds RAM size
• Use readPreference=nearest or readPreference=secondaryPreferred with a suitable localThreshold value
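To make the last bullet concrete, here is a small sketch of how the read preference and latency window might be expressed and what the window means. It assumes the standard connection string options `replicaSet`, `readPreference`, and `localThresholdMS`; the host names, the replica set name, and both helper functions are hypothetical. The second helper only imitates the idea of the latency window (members within the threshold of the fastest one are eligible) and is not driver code.

```python
from urllib.parse import urlencode


def read_uri(hosts, replica_set, read_preference="nearest", local_threshold_ms=15):
    """Build a MongoDB connection string that routes reads to low-latency members."""
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
        "localThresholdMS": local_threshold_ms,
    })
    return f"mongodb://{','.join(hosts)}/?{options}"


def eligible_members(rtt_ms, local_threshold_ms=15):
    """Return members whose round-trip time is within the threshold of the fastest member."""
    fastest = min(rtt_ms.values())
    return sorted(host for host, rtt in rtt_ms.items() if rtt <= fastest + local_threshold_ms)
```

With measured round-trip times of 2 ms and 9 ms for two local secondaries and 48 ms for a remote one, a 15 ms window keeps reads on the two local nodes; widening the window trades locality for more read capacity.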
20. Write Durability Pattern – Quorum Write
• Use WriteConcern (“majority”, …): the write waits for acknowledgment from a majority of the replica set's voting, data-bearing members (the primary included), spread across different datacenters
• During primary node failover, a secondary node holding the latest majority-committed write is elected as the new primary
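The majority arithmetic behind this pattern can be sketched in a few lines. The example assumes every member is a voting, data-bearing node (no arbiters) and uses a hypothetical per-datacenter layout; it only shows whether majority writes remain possible after losing one datacenter.

```python
def majority(voting_members):
    """Smallest number of members that forms a majority."""
    return voting_members // 2 + 1


def writable_after_loss(layout, lost_dc):
    """True if a majority of members survives the loss of one datacenter.

    layout maps a datacenter name to its number of voting, data-bearing members.
    """
    total = sum(layout.values())
    remaining = total - layout[lost_dc]
    return remaining >= majority(total)
```

With a 3 / 2 / 2 layout across three datacenters the majority is 4 of 7, so any single datacenter can fail and majority writes still succeed. With only two datacenters (3 / 2), losing the larger one leaves 2 of 5 members and majority writes block until it returns.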
21. Strong Consistency Pattern – Quorum R & W
• Supports strong read & write consistency across multiple datacenters using writeConcern/readConcern=majority, which follows the quorum consistency property R + W > N commonly cited in CAP theorem discussions
• Caveat
– May incur additional wait time because of multi-datacenter read & write confirmation
– The application should set a suitable timeout so that it is not stuck in a longer-than-normal wait if any node involved in the read and/or write fails for any reason
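The R + W > N property cited above can be checked by brute force: if reads touch R of N replicas and writes touch W, the two sets always share a node exactly when R + W > N. The sketch below illustrates only that general quorum property. MongoDB's readConcern "majority" does not poll R members on each read; it returns majority-acknowledged data from a single member, so treat this as intuition rather than a model of the server.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every read quorum of size r shares a node with every write quorum of size w."""
    nodes = range(n)
    return all(
        set(read_quorum) & set(write_quorum)
        for read_quorum in combinations(nodes, r)
        for write_quorum in combinations(nodes, w)
    )
```

For five replicas, `quorums_always_overlap(5, 3, 3)` holds, while `quorums_always_overlap(5, 2, 3)` does not, because a two-node read can miss all three nodes that took the write.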
22. Strong Consistency – Cross DC Tagged R/W
• Supports strong read & write consistency from local tagged secondary node(s) without waiting for majority read confirmation
• Caveat
– Write still incurs wait time because of multi-datacenter write confirmation
– Requires a non-standard replica set configuration
– The developer should set a suitable timeout to avoid waiting indefinitely
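The "non-standard replica set configuration" in this pattern usually means member tags plus a custom write concern defined under `settings.getLastErrorModes`. The fragment below is a sketch under assumptions: the host names, the `dc` tag values, and the mode name `multiDC` are hypothetical, and the checker merely imitates the rule that a mode such as `{"dc": 2}` needs acknowledgments from members carrying two distinct `dc` values.

```python
# Hypothetical replica set configuration fragment (illustrative names only).
REPLICA_SET_CONFIG = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "east-1:27017", "tags": {"dc": "east"}},
        {"_id": 1, "host": "east-2:27017", "tags": {"dc": "east"}},
        {"_id": 2, "host": "west-1:27017", "tags": {"dc": "west"}},
    ],
    # Custom write concern: acknowledged by members in 2 distinct datacenters.
    "settings": {"getLastErrorModes": {"multiDC": {"dc": 2}}},
}


def satisfies_mode(mode_name, acked_hosts, config=REPLICA_SET_CONFIG):
    """True if the acknowledging hosts cover enough distinct tag values for the mode."""
    mode = config["settings"]["getLastErrorModes"][mode_name]
    acked_tags = [m["tags"] for m in config["members"] if m["host"] in acked_hosts]
    for tag, required in mode.items():
        distinct_values = {tags[tag] for tags in acked_tags if tag in tags}
        if len(distinct_values) < required:
            return False
    return True
```

An application would then write with the custom mode name as its write concern and read from a locally tagged secondary, for example through the `readPreferenceTags` connection string option, always with a write timeout so a datacenter outage does not block the caller indefinitely.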
23. Strong Consistency Pattern – R/W from Primary
• Supports strong read & write consistency for applications that cannot tolerate quorum read/write wait time
• Secondary nodes are only used for primary node failover, while arbiter nodes are only used for quorum voting
25. NoSQL DB Agnostic Application Sharding Pattern
• Reduces the HA risk associated with managing very large scale NoSQL clusters
• Requires a middle-tier Data Access Layer with built-in hash, range & modular sharding
• Works on top of built-in native NoSQL sharding
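The three routing strategies named above can be sketched as plain functions that a Data Access Layer might call before choosing a cluster. This is an illustrative sketch only; the function names, the use of MD5 as the hash, and the boundary convention for ranges are assumptions, not details from the talk.

```python
import bisect
import hashlib


def hash_shard(key, shard_count):
    """Hash sharding: spread arbitrary keys evenly using a stable digest."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def modular_shard(numeric_id, shard_count):
    """Modular sharding: route a numeric id by its remainder."""
    return numeric_id % shard_count


def range_shard(key, upper_bounds):
    """Range sharding: upper_bounds holds sorted exclusive upper bounds;
    keys at or beyond the last bound go to the final shard."""
    return bisect.bisect_right(upper_bounds, key)
```

For example, `modular_shard(10, 4)` routes to shard 2, and `range_shard("m", ["h", "p"])` routes to shard 1 of 3. Each shard index would map to a separate, smaller NoSQL cluster, which is how the pattern limits the blast radius of any one cluster failure.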
26. Future Work & Direction
• End-to-end integration of proven NoSQL design patterns with
• Application framework (e.g., Data Access Layer)
• Cloud provisioning & management infrastructure
• Formalize the above NoSQL design patterns as officially supported internal development products rather than engineering patterns
• Collaborate with NoSQL vendors to develop new design patterns for new features & capabilities