twitter: @MithunShanbhag
blog: mithunshanbhag.github.io
[Figure: CAP theorem diagram with three properties (Consistency, Availability, Partition Tolerance) and their pairwise overlaps CA, CP and AP]
Consistency Level    Write Latency    Ordering Guarantee(s)
Strong               Highest          • Linearizability (reads fetch the most recent write).
Bounded Staleness    High             • Delayed linearizability (reads lag writes by a time interval 'T' or by 'V' versions).
Session              Medium           • Monotonic reads
                                      • Monotonic writes
                                      • Read-your-writes
                                      • Writes-follow-reads
Consistent Prefix    Low              • No out-of-order writes.
Eventual             Lowest           • No ordering guarantees.
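The session-level guarantees in the table can be illustrated with a session token. The following is a minimal sketch, not how any particular database implements it: the `Replica` and `Session` classes and the integer version counter are assumptions made for illustration. The client remembers the highest version it has written or read and only accepts answers from replicas that have caught up to it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical): a key-value store plus the highest version applied."""
    data: dict = field(default_factory=dict)
    version: int = 0

    def apply(self, key, value, version):
        self.data[key] = value
        self.version = version


class Session:
    """Client-side session that tracks the highest version it has written or read."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.token = 0  # lowest version this session is allowed to observe

    def write(self, key, value):
        new_version = self.primary.version + 1
        self.primary.apply(key, value, new_version)
        self.token = new_version  # read-your-writes: remember our own write

    def read(self, key):
        # Use the first replica that has caught up to the session token,
        # falling back to the primary if no replica has.
        for replica in self.replicas + [self.primary]:
            if replica.version >= self.token:
                self.token = replica.version  # monotonic reads: never go backwards
                return replica.data.get(key)
        raise RuntimeError("no replica satisfies the session token")


primary = Replica()
lagging = Replica()  # has not received any writes yet
session = Session(primary, [lagging])
session.write("balance", 120)
value = session.read("balance")  # the lagging replica is skipped

other = Session(primary, [lagging])  # a different session has no such guarantee
other_value = other.read("balance")  # served by the lagging replica
```

The token lives with the client, so a second session with its own token may still see the older state from the lagging replica.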
Name     Balance    Version
John     100        9
Roger    250        17

Name     Balance    Version
John     120        9

Name     Balance    Version
Roger    300        17
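One possible reading of the Version column is optimistic concurrency control: a writer sends back the version it read, and the store rejects the write if the row has changed since. The slide itself does not spell this out, so treat the following as a sketch under that assumption. The `update_balance` function, the `VersionConflict` exception and the in-memory `accounts` dictionary are all hypothetical.

```python
class VersionConflict(Exception):
    """Raised when the row changed after the writer read it (hypothetical name)."""


# In-memory stand-in for the table on the slide.
accounts = {
    "John": {"balance": 100, "version": 9},
    "Roger": {"balance": 250, "version": 17},
}


def update_balance(store, name, new_balance, expected_version):
    """Apply the write only if the caller read the current version of the row."""
    row = store[name]
    if row["version"] != expected_version:
        raise VersionConflict(
            f"{name}: expected version {expected_version}, found {row['version']}"
        )
    row["balance"] = new_balance
    row["version"] += 1  # every successful write bumps the version


# First writer read version 9 and succeeds.
update_balance(accounts, "John", 120, expected_version=9)

# Second writer also read version 9, but the row has moved on to version 10.
try:
    update_balance(accounts, "John", 90, expected_version=9)
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True
```

The rejected writer would normally re-read the row and retry, which is why the check sits with the client rather than behind a lock.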
Balance
107.81
Latitude Longitude
12.345928 77.601231
Date Time Amount TransactionId
01/01/19 22:03 57.20 C920345
24/02/19 10:25 20.00 F456987
17/03/19 10:25 -10.20 D352344
21/04/19 16:07 40.81 P987341
Date Time Latitude Longitude
09/07/19 17:03:02 12.903356 77.232295
09/07/19 17:03:07 12.803472 77.341096
09/07/19 17:03:12 12.597920 77.402988
09/07/19 17:03:17 12.345928 77.601231
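The transaction table above holds events rather than a stored balance, so the balance on any date can be computed by replaying the rows up to that date. The following is a minimal sketch using the four rows from the table. The `balance_as_of` helper is a hypothetical name, and the dates are assumed to be in day/month/year order.

```python
from datetime import date
from decimal import Decimal

# (date, amount, transaction id) taken from the table above.
events = [
    (date(2019, 1, 1), Decimal("57.20"), "C920345"),
    (date(2019, 2, 24), Decimal("20.00"), "F456987"),
    (date(2019, 3, 17), Decimal("-10.20"), "D352344"),
    (date(2019, 4, 21), Decimal("40.81"), "P987341"),
]


def balance_as_of(events, as_of):
    """Replay every event dated on or before `as_of` and sum the amounts."""
    return sum(
        (amount for day, amount, _ in events if day <= as_of),
        Decimal("0"),
    )


balance_on_3_march = balance_as_of(events, date(2019, 3, 3))  # Decimal("77.20")
closing_april = balance_as_of(events, date(2019, 4, 30))      # Decimal("107.81")
```

Replaying all four rows gives 107.81, the same figure as the single stored Balance value shown earlier. The event log can also answer the historical questions that the stored value cannot.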
FirstName LastName Gender Phone Address City State
Aaron Paul M 3102103432 blah.. Austin TX
Walter White M 6723452110 blah.. Seattle WA
Saul Goodman M 8787822409 blah.. New York NY
Hank Schrader M 2787007844 blah.. Denver CO
Skyler White F 3529378921 blah.. Seattle WA
Date Time VehicleId TripId Latitude Longitude
09/07/19 17:03:02 CR34XYZ TYUYH7 12.903356 71.232295
09/07/19 17:03:02 RTYU651 7TR556 98.803472 32.341096
09/07/19 17:03:02 KLXC955 098JGB 23.597920 25.402988
09/07/19 17:03:02 JH44GV4 BMZA21 09.345928 77.601231
09/07/19 17:03:02 RTYU651 7TR556 94.234510 32.675444
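When tables like these grow too large for one store, rows can be spread across shards by a partition key. The following is a minimal sketch of hash-based routing, assuming VehicleId is the partition key and four shards. Both choices, and the `shard_for` helper, are illustrative assumptions and do not describe how any particular product routes data.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of shards


def shard_for(key, shard_count=SHARD_COUNT):
    """Map a partition key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# (VehicleId, TripId, Latitude, Longitude) rows from the table above.
rows = [
    ("CR34XYZ", "TYUYH7", 12.903356, 71.232295),
    ("RTYU651", "7TR556", 98.803472, 32.341096),
    ("RTYU651", "7TR556", 94.234510, 32.675444),
]

shards = {index: [] for index in range(SHARD_COUNT)}
for row in rows:
    shards[shard_for(row[0])].append(row)

# All rows for one vehicle land on the same shard, so a per-vehicle query
# only has to touch a single shard.
vehicle_shard = shard_for("RTYU651")
vehicle_rows = [row for row in shards[vehicle_shard] if row[0] == "RTYU651"]
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, whose output for strings varies between processes.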
image attribution: sql azure documentation
image attribution: azure table storage documentation
image attribution: mongodb documentation
image attribution: azure architecture center documentation
image attribution: azure architecture center documentation
image attribution: azure architecture center documentation
image attribution: microservices.io
image attribution: azure architecture center documentation
image attribution: azure architecture center documentation
image attribution: azure architecture center documentation
image attribution: azure sql stretch db documentation

Design Patterns for Data Management and Consistency

Editor's Notes

  • #3
    1. All nodes will be consistent: "Every node provides the most recently written update (or none at all if consistency cannot be guaranteed)." No node will return outdated state.
       - If I write to node X and immediately read from node Y, I should get the most recently written update.
       - If I query nodes X, Y and Z, they should all return the same (most recently written) state.
    2. Every node will immediately respond to a read or write request, but with no consistency guarantee (i.e. the response may not have the most recently written update): "Every node has constant read and write access."
    4. Every real-world system has to be partition tolerant, so the choice really comes down to consistency versus availability. It is also a spectrum, not absolute black and white.
    6. Most "strongly consistent" systems are CP.
    7. Most "eventually consistent" systems are AP.
  • #4
    - Monotonic reads: "If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value."
    - Monotonic writes: "A write operation by a process on a data item X is completed before any successive write operation on X by the same process."
    - Read-your-writes: "A value written by a process on a data item X will be always available to a successive read operation performed by the same process on data item X."
    - Writes-follow-reads: "A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x that was read."
  • #5 This is a “client side” thing.
  • #6
    5. What was the account balance on 3rd March 2019? What was the closing balance at the end of April?
    6. E.g. the movie scene where a student hacks into the college database and changes his/her marks.
    8. When not using optimistic locking. Concurrent writes are still problematic with pessimistic locks.
  • #7
    3. What was the account balance on 3rd March 2019? What was the closing balance at the end of April?
    5. More details in the "compensating transactions" section.
    META POINT: State should never be stored; it should always be computed on demand (on reads/queries).
  • #8 Hundreds of millions of rows and ever growing (especially for time-series data)
  • #9
    1. Sharding will require some native infrastructure support:
       - SQL shard map manager
       - Built-in partitioning in Azure Table Storage
       - mongos in MongoDB
    Vertical partitioning/sharding is also a possibility.
    Both Dapper and Entity Framework can be used to support SQL shards in a SQL elastic pool.
  • #11
    If you use read-replicas, you can mostly get away with using CQRS???
    1. With Event Sourcing, state must never be persisted and must always be computed (on reads/queries). This requires separate models for reads and writes(?)
    1. Also imagine a food delivery system. Any order posted must write to 3 systems (payment, kitchen, delivery etc).
    3. Food delivery reads (payment, kitchen, delivery). Also for querying cold storage.
    4. For pessimistic locks (r/w locks).
  • #12
    1. Will not satisfy the Swiggy scenario (read from multiple stores).
    2.3. Commands are written to the data store and also put on the message bus.
    2.4. Via message bus, queues or timers.
  • #14
    1. Multiple flight providers, multiple hotel providers. Distributed locks across multiple 3rd party systems.
    2. Most expensive, or the one which is hard to acquire (flight ticket) or has cancellation charges.
    3.1. User intervention if any one booking fails (try a different hotel, a different flight etc).
    3.2. Undoing an action might not be its mirror opposite (e.g. cancellation charges or other business logic involved).
    3.3. Unroll concurrently or in a particular sequence?
  • #16
    2.1. Ensure that the expiration policy matches the access pattern of the applications that use the data.
         - Don't make the expiration period too short (the app has to continually refetch data into the cache).
         - Don't make the expiration period too long (the cached data is likely to become stale).
         - Remember that caching is most effective for relatively static data, or data that is read frequently.
    2.2. Various flavors of LRU/LFU policies. Global policy vs. per-cached-item policy: sometimes the latter works best (e.g. if a cached item is very expensive to retrieve from the data store, it can be beneficial to keep this item in the cache at the expense of more frequently accessed but less costly items).
         - allkeys-lru: the service evicts the least recently used keys out of all keys
         - allkeys-lfu: the service evicts the least frequently used keys out of all keys
         - allkeys-random: the service randomly evicts keys out of all keys
         - volatile-lru: the service evicts the least recently used keys out of all keys with an "expire" field set
         - volatile-ttl: the service evicts the shortest time to live keys (out of all keys with an "expire" field set)
         - volatile-lfu: the service evicts the least frequently used keys out of all keys with an "expire" field set
         - volatile-random: the service randomly evicts keys with an "expire" field set
         - no-eviction: the service will not evict any keys and no writes will be possible until more memory is freed
  • #17
    2. Helps preserve access rules and queries.
    3.1. Cost: (hot/cache) highest -> (cold/storage) lowest
    3.2. Latency: (hot/cache) lowest -> (cold/storage) highest
    3.3. Rate of access: (hot/cache) highest -> (cold/storage) lowest
    3.4. Lifetime/durability: (hot/cache) lowest -> (cold/storage) highest
    3.5. Size of data: (hot/cache) lowest -> (cold/storage) highest
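The flight-and-hotel scenario in note #14 can be sketched as a sequence of steps, each paired with a compensating action. This is a minimal illustration under stated assumptions: the `run_saga` helper, the `BookingFailed` exception and the stub booking functions are hypothetical. Undoing in reverse order is one choice among those the note leaves open. Real compensations may not be mirror opposites (for example, cancellation charges).

```python
class BookingFailed(Exception):
    """Raised by a step that could not complete (hypothetical name)."""


def run_saga(steps):
    """Run (name, action, compensation) steps; on failure undo completed steps in reverse."""
    log = []
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            completed.append((name, compensate))
        except BookingFailed:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undone:{done_name}")
            break
    return log


# Stub actions standing in for calls to third-party providers.
def book_flight():
    pass


def cancel_flight():
    pass  # in reality this may incur cancellation charges


def book_hotel():
    raise BookingFailed("no rooms available")


def cancel_hotel():
    pass


log = run_saga([
    ("flight", book_flight, cancel_flight),
    ("hotel", book_hotel, cancel_hotel),
])
# log == ["done:flight", "failed:hotel", "undone:flight"]
```

The returned log is where a real workflow would decide whether to ask the user to try a different hotel or flight instead of unwinding everything.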
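The expiration and eviction guidance in note #16 can be sketched as a small cache that combines a time-to-live with least-recently-used eviction. This is an illustration only, not the behaviour of any specific cache service: the `TtlLruCache` class, its parameters and the injected `clock` function are assumptions. The `load` callback stands in for a read from the slower data store.

```python
from collections import OrderedDict


class TtlLruCache:
    """Toy cache (hypothetical): entries expire after a TTL, and the least
    recently used entry is evicted when capacity is exceeded."""

    def __init__(self, capacity, ttl_seconds, clock):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock  # injected so expiry can be demonstrated without sleeping
        self._items = OrderedDict()  # key -> (value, expires_at), oldest first

    def get(self, key, load):
        now = self.clock()
        entry = self._items.get(key)
        if entry is not None and entry[1] > now:
            self._items.move_to_end(key)  # mark as most recently used
            return entry[0]
        # Miss or expired: fetch from the backing store and cache the result.
        value = load(key)
        self._items[key] = (value, now + self.ttl)
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key
        return value


now = [0.0]
loads = []


def load(key):
    loads.append(key)  # record every trip to the backing store
    return key.upper()


cache = TtlLruCache(capacity=2, ttl_seconds=10, clock=lambda: now[0])
cache.get("a", load)  # miss: loaded from the store
cache.get("a", load)  # hit: served from the cache
now[0] = 11.0
cache.get("a", load)  # expired: loaded again
cache.get("b", load)  # miss
cache.get("c", load)  # miss: capacity exceeded, "a" is evicted
```

A shorter TTL would increase the number of entries in `loads` (more refetching). A longer TTL would keep serving the older value, which is the trade-off the note describes.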