This document summarizes a presentation about implementing cross-region replication in Amazon DynamoDB. It includes:
1. An introduction to DynamoDB and replication patterns using DynamoDB Streams and the AWS Lambda service.
2. Details about Under Armour's use of DynamoDB cross-region replication to distribute user data across regions while complying with data residency requirements.
3. Their experience so far with the current solution and next steps to improve latency, reliability, and support for concurrent writes across regions.
2. What to expect from the session
DynamoDB introduction
1. SQL vs NoSQL refresher
2. Amazon DynamoDB recap
3. DynamoDB replication patterns
Implementing cross-region replication at Under Armour
1. What does single sign-on mean?
2. Background and problem context
3. Decision process that led to our current solution
4. Our experience so far
5. Next steps
6. Starting over
3. Amazon DynamoDB
Fast and consistent
Scales to any workload
Document or key-value
Fully managed NoSQL
Event-driven programming
Access control
5. Partitions are three-way replicated
[Diagram: Partition 1 … Partition N, each stored as three identical replicas (Replica 1, Replica 2, Replica 3). Every replica holds the same items, e.g. {Id=1, Name=Jim}, {Id=2, Name=Andy, Dept=Engg}, {Id=3, Name=Kim, Dept=Ops}.]
7. Replication use cases
• Globally distributed applications
• Lower-latency data access
• Traffic distribution
• Disaster recovery
• In-region and cross-region
8. Stream of updates to a table
Asynchronous
Exactly once
Strictly ordered
• Per item
Highly durable
• Scale with table
24-hour lifetime
Sub-second latency
DynamoDB Streams
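The "strictly ordered, per item" guarantee shapes how a stream consumer must work: records for the same key have to be applied in sequence, while distinct keys are independent. A minimal sketch of that discipline (the record shape and names here are illustrative, not from the talk):

```scala
// Illustrative stream record: a key plus a sequence number within that key.
case class StreamRecord(key: String, seq: Long, payload: String)

// Apply each key's records in sequence order; distinct keys are
// independent and could safely be processed in parallel.
def applyPerItemInOrder(records: Seq[StreamRecord])(apply: StreamRecord => Unit): Unit =
  records.groupBy(_.key).foreach { case (_, recs) =>
    recs.sortBy(_.seq).foreach(apply)
  }
```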
9. In-region replication
• Automatic replication across AZs within region (natively provided)
• Writes replicated continuously across 3 AZs, persisted to disk (SSD)
• Reads—strong or eventually consistent
• For data redundancy and protection
• DynamoDB Streams and AWS Lambda
• Streams of updates to a table
• DynamoDB triggers invoke a Lambda function to run your code
10. Open Source Cross-Region Replication Library
• Solution uses the Amazon DynamoDB Cross-Region Replication Library
• Leverages DynamoDB Streams to keep tables in sync across multiple regions in near real time
• Leverage the cross-region replication library in your applications
• Available in the GitHub repository at https://github.com/awslabs/dynamodb-cross-region-library
26. Background and problem context
• 1 manager/developer/tech lead
• 1 developer
• 1 site reliability engineer (me!)
• Fast startup
• Fast iteration
• Low overhead
• Reliable
188 million users. Sign on once. That’s it.
27. Background and problem context
STOP
Personally identifiable information (PII)…as used in US privacy law…is
information that can be used…to identify, contact, or locate a single person, or
to identify an individual in context.
https://en.wikipedia.org/wiki/Personally_identifiable_information
28. Background and problem context
*not to scale
• Store data where it belongs
• Don’t store data where it doesn’t belong
• Get data where and when it’s needed
1. Replicate PII-free pointers across regions
2. Follow pointers to locate user data
userId  homeRegion
42      US
[Diagram: the pointer table is replicated across regions serving US users, German users, and other EU users.]
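The two-step lookup described above can be sketched in Scala. The pointer table contents and the region-to-endpoint mapping here are illustrative assumptions, not Under Armour's actual schema:

```scala
// A PII-free pointer, replicated to every region: maps a user to the
// region that owns that user's PII.
case class Pointer(userId: Long, homeRegion: String)

// Illustrative homeRegion -> DynamoDB endpoint mapping (hypothetical values).
val endpoints: Map[String, String] = Map(
  "US" -> "dynamodb.us-east-1.amazonaws.com",
  "DE" -> "dynamodb.eu-central-1.amazonaws.com",
  "EU" -> "dynamodb.eu-west-1.amazonaws.com"
)

// Step 1: read the local replica of the pointer table (stubbed here).
def lookupPointer(userId: Long): Option[Pointer] =
  Map(42L -> Pointer(42L, "US")).get(userId)

// Step 2: follow the pointer to find where the user's PII must be read.
def piiEndpoint(userId: Long): Option[String] =
  for {
    p  <- lookupPointer(userId)
    ep <- endpoints.get(p.homeRegion)
  } yield ep
```

With the stubbed data, `piiEndpoint(42L)` resolves user 42 to the us-east-1 endpoint; an unknown user yields `None`.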
31. Decision process—AWS CloudFormation
*This solution has now been deprecated.
• CloudFormation
• Amazon EC2 Container Service
• Tuning containers based on throughput
• Possible to wedge the whole thing if you go full chaos monkey
• No custom replication logic
Struggles
32. Decision process
Google: “dynamodb cross region replication.” Click first result:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.html
Check out the Amazon Kinesis Client Library, plus the DynamoDB Streams adapter.
Profit. …well, sort of.
33. Decision process—Amazon Kinesis Client Library
• Requires running a process somewhere
• Troubleshooting, startup, rebalancing, and failovers
• State tracking DynamoDB table in your account
• Scaling processes for throughput
• Less is more
Struggles
34. Decision process
Google: “dynamodb cross region replication.” Click first result:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.html
Check out the Amazon Kinesis Client Library, plus the DynamoDB Streams adapter.
DynamoDB Streams + Lambda.
Profit. …yep!
35. Decision process—Lambda
• 24 hours to respond to problems
• Parallelizable with 1,024 threads
• Almost zero operational overhead
• Automatically scales with throughput
Strengths
38. Experience—reads
• Public DynamoDB endpoints + TLS
• Read anonymous data locally
• Read PII from user’s home region
[Diagram: OpenID servers in us-east-1 and eu-west-1; anonymous data is read locally, PII is read from the user’s home region.]
39. Experience—writes
• Write anonymous data to us-east-1
• Replicate anonymous data
• Write PII to user’s home region
• Public DynamoDB endpoints + TLS
[Diagram: OpenID servers in us-east-1 and eu-west-1; anonymous data is written to us-east-1 and replicated out, PII is written to the user’s home region.]
40. Experience—replication
import scala.collection.JavaConverters._
import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent
import com.typesafe.scalalogging.StrictLogging

class Main extends StrictLogging {
  // Lambda entry point: invoked with a batch of DynamoDB Streams records.
  def handler(event: DynamodbEvent, context: Context): Unit = {
    val conf = Main.loadConfFromContext(context)
    logger.info("Replicating to regions: %s".format(Main.readConfRegions(conf)))
    val clients = Main.buildClientsFromConf(conf)
    // Partition the batch: records to replicate vs. records to skip
    // (e.g. writes that were themselves produced by replication).
    val (records, skipped) = event.getRecords.asScala.toList.partition(Main.filterReplicatedUpdate)
    logger.info("Skipping %s records: %s".format(
      skipped.length, for (r <- skipped) yield (r.getEventSourceARN, r.getDynamodb.getKeys)))
    logger.info("Replicating %s records: %s".format(
      records.length, for (r <- records) yield (r.getEventSourceARN, r.getDynamodb.getKeys)))
    // Replicate in parallel to all destination regions.
    records.par.foreach(Main.replicate(_, clients))
  }
}
41. Experience—latency
Latency, slowest to fastest:
• Outside us-east-1, outside home region
• Outside us-east-1, inside home region
• Inside us-east-1, outside home region
• Inside us-east-1, inside home region
From us-east-1: ~50 ms to eu-west-1, ~150 ms to ap-northeast-1.
45. Multimaster—latency
Latency, slowest to fastest:
• Outside us-east-1, outside home region
• Outside us-east-1, inside home region
• Inside us-east-1, outside home region
• Inside us-east-1, inside home region
Better non-PII data locality squishes the slow cases toward the fast ones.
From us-east-1: ~50 ms to eu-west-1, ~150 ms to ap-northeast-1.
46. Multimaster—write ordering
Extra fields:
1. Timestamp
2. Write ID
3. Replication flag
userId       42
email, etc.  e@mail.com
timestamp    1476106431728
writeId      5c0fb0d3-c1fe-4526-a2cf-0678880952f9
replicateMe  true
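A hedged sketch of how those three extra fields could support write ordering: compare timestamps, break ties deterministically with the write ID so every region picks the same winner, and use the replication flag to stop an already-replicated write from being forwarded again. This is an illustration of the idea, not the production code:

```scala
// Hypothetical record carrying the extra write-ordering fields from the slide.
case class VersionedItem(
  userId: Long,
  email: String,
  timestamp: Long,     // wall-clock millis of the originating write
  writeId: String,     // random UUID, used as a deterministic tiebreaker
  replicateMe: Boolean // false once the replicator has applied this write
)

// Last-writer-wins: the later timestamp wins; equal timestamps fall back
// to comparing write IDs so all regions converge on the same value.
def resolve(a: VersionedItem, b: VersionedItem): VersionedItem =
  if (a.timestamp != b.timestamp) {
    if (a.timestamp > b.timestamp) a else b
  } else if (a.writeId > b.writeId) a else b

// Only locally originated writes should be forwarded to other regions.
def shouldReplicate(item: VersionedItem): Boolean = item.replicateMe
```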
53. Concurrent writes will happen!
The question is not how to work around or avoid them.
The question is how to recognize and resolve them.
54. Document schema
Concurrent writes require storage for multiple versions
of your data.
Either formally as a CRDT data structure or ad hoc for
eventual conflict resolution by a person or process.
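One ad hoc way to keep multiple versions for later resolution, as the slide suggests, is to store concurrent writes as siblings and merge them when a person or process can decide. A minimal multi-value-register sketch under that assumption, not a full CRDT implementation:

```scala
// Each write is tagged with its writer and a per-writer counter (a tiny
// version-vector entry), so a writer's newer writes replace its older ones.
case class Tagged[A](writer: String, counter: Long, value: A)

// A multi-value register: keeps every concurrent sibling until resolved.
case class MvReg[A](siblings: List[Tagged[A]]) {
  // A new write supersedes the same writer's earlier siblings but is
  // concurrent with (kept alongside) everyone else's.
  def write(w: Tagged[A]): MvReg[A] =
    MvReg(w :: siblings.filterNot(s => s.writer == w.writer && s.counter <= w.counter))

  // A conflict exists when more than one sibling survives.
  def hasConflict: Boolean = siblings.size > 1

  // Eventual resolution by an external policy (a person or a process).
  def resolve(pick: List[Tagged[A]] => Tagged[A]): MvReg[A] =
    MvReg(List(pick(siblings)))
}
```

For example, concurrent writes from two regions leave two siblings; a resolution policy then collapses them to one.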
55. Dotted version vectors
Thank you:
basho http://basho.com
Russell Brown https://github.com/russelldb
Nuno Preguiça
Carlos Baquero
Paulo Almeida
Victor Fonte
Ricardo Gonçalves
Efficient Causality Tracking in Distributed Storage Systems With Dotted Version Vectors.