Goals
● Reduce network
  ○ Latency
  ○ Inter-DC traffic
● Localize resource use
  ○ Reduce failure cases
  ○ Increase availability
  ○ Isolate dependencies
● Provide multiple active sites
● Partition geo/regional data

Challenges
● Data consistency
  ○ User experience
  ○ Backup and operational needs
● Scaling
● Partitioning/Sharding

Multiple Online Data Centers
● Europe
  ○ App
  ○ Data
● USA
  ○ App
  ○ Data
● Asia
  ○ App

Non-default behaviors
Default:
● Primary read/write
● No stale reads
Multi-DC needs:
● Read locally
● Support some stale reads

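
The contrast between the default behavior and the multi-DC needs can be sketched as a toy read-routing function. This is not MongoDB's driver logic, only a minimal model of the idea: the `Member` type, the member names, the lag figures, and the `max_stale_seconds` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str            # hypothetical member name
    dc: str              # data center the member lives in
    is_primary: bool
    lag_seconds: float   # replication lag behind the primary (0 for the primary)


def choose_read_target(members, client_dc, max_stale_seconds=None):
    """Pick the member a client should read from.

    max_stale_seconds=None models the default: always read from the primary,
    so no stale reads. Otherwise prefer a member in the client's own data
    center whose lag is within the bound, and fall back to the primary.
    """
    primary = next(m for m in members if m.is_primary)
    if max_stale_seconds is None:
        return primary
    local = [m for m in members
             if m.dc == client_dc and m.lag_seconds <= max_stale_seconds]
    if not local:
        return primary
    # Least-lagged local member; a local primary always wins with lag 0.
    return min(local, key=lambda m: m.lag_seconds)


# Illustrative topology: primary in the USA, secondaries elsewhere.
MEMBERS = [
    Member("usa-1", "usa", True, 0.0),
    Member("eu-1", "europe", False, 2.0),
    Member("asia-1", "asia", False, 30.0),
]
```

With this model, a European client reads from `usa-1` by default, from `eu-1` once it accepts up to 5 seconds of staleness, while an Asian client with the same bound still goes to the primary because `asia-1` lags too far behind.
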
Replication
● Replica Sets
  ○ Possible to read from non-primary replicas
  ○ Copy of data
  ○ Write Concern
    ■ Tagging
    ■ Verifiable writes
● Provides
  ○ Isolation (possible stale reads)
  ○ Availability
  ○ Distribution of reads, possibly stale (WriteConcern)

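
To make "Tagging" and "Verifiable writes" concrete, here is a small sketch of how a tagged write concern could be checked. It follows the general idea behind MongoDB's custom write concern modes, where a mode names a tag and the number of distinct values of that tag that must acknowledge a write. The function, the member names, and the `MULTI_DC` mode are hypothetical, and this is a model of the rule rather than the server's implementation.

```python
# Hypothetical replica set members and their tags.
MEMBER_TAGS = {
    "eu-1": {"dc": "europe"},
    "eu-2": {"dc": "europe"},
    "usa-1": {"dc": "usa"},
}

# Assumed custom mode: the write must reach members in 2 distinct data centers.
MULTI_DC = {"dc": 2}


def satisfies_tagged_write(mode, acked_members, member_tags):
    """Return True if the acknowledging members satisfy the tagged mode.

    mode maps a tag name to the required number of DISTINCT values of that
    tag among the members that acknowledged the write.
    """
    for tag, required in mode.items():
        distinct_values = {
            member_tags[name][tag]
            for name in acked_members
            if tag in member_tags[name]
        }
        if len(distinct_values) < required:
            return False
    return True
```

Two acknowledgements from `eu-1` and `eu-2` do not satisfy `MULTI_DC`, because both carry the same `dc` value; acknowledgements from `eu-1` and `usa-1` do. That is what makes the write verifiable across sites rather than just across nodes.
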
Sharding + Replication
● Range-based sharding (chunks)
  ○ Not tag aware
  ○ Random distribution (balancer)
● Shards made up of Replica Sets
  ○ All advantages
● Writes to (primary) shard per chunk
● Reads
  ○ From Primary by default
  ○ Optional non-primary reads

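
Range-based chunk routing can be illustrated with a short lookup over sorted split points. The split points, shard names, and the `route` function are assumptions for illustration only. The chunk-to-shard table is deliberately arbitrary, mirroring the point that placement is not tag aware and is decided by the balancer rather than by where the data belongs geographically.

```python
import bisect

# Hypothetical split points on the shard key. They define four chunks:
# (-inf, "g"), ["g", "n"), ["n", "t"), ["t", +inf)
SPLITS = ["g", "n", "t"]

# Arbitrary chunk placement, as a balancer might produce it.
CHUNK_TO_SHARD = ["shard-eu", "shard-usa", "shard-eu", "shard-asia"]


def route(shard_key):
    """Return the shard that owns the chunk containing shard_key."""
    # bisect_right counts the split points <= shard_key, which is the
    # index of the chunk whose range contains the key.
    chunk_index = bisect.bisect_right(SPLITS, shard_key)
    return CHUNK_TO_SHARD[chunk_index]
```

In this sketch, `route("alice")` lands on `shard-eu` and `route("zoe")` on `shard-asia` regardless of where those users actually live. A write for a given key goes to the primary of the replica set behind the returned shard.
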