2. Agenda
• Why migrate to NoSQL (not only for “greenfield” projects)
• What is a Table
• What is a Schema
• What about Stored Procedures?
• Transactions?
• Top DynamoDB Mistakes or Optimization Opportunities
3. Learn and Be Curious
Leaders are never done learning and always seek to
improve themselves. They are curious about new
possibilities and act to explore them.
Supporting Amazon.com's journey to migrate from RDBMS to NoSQL
4. What is your first (DB) language?
RDBMS (ACID, SQL and Stored Procedures)
MongoDB (Document Store)
HBase (Column Families)
Redis (Advanced Data Types such as Sorted Sets)
5. There is always a wall
[Figure: “What you want” vs. “What you get”, plotting Effort/Cost and Revenues/Value against # Users]
8. Small Partition Units
[Figure: Table partitions over the hash key range 0000 to FFFF, split at 5333 and A666. Each partition holds up to 10GB and serves 1000 1KB writes/second and 3000 4KB reads/second. All items with hash key 8888 (range keys A–E) live in one partition: Put 8888 E, Update 8888 B, Get 8888 E, Get Range 8888 A–C. Other hash keys shown: 5555 (A–E), 9999 (A–B), 6666 (A–B), 8484, 7777.]
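To make the figure concrete, here is a minimal sketch of routing a hash key to a partition using the boundaries drawn on the slide. It assumes the keys shown (8888, 5555, ...) are already 16-bit hash values; DynamoDB's real hash function and split points are internal, so `BOUNDARIES` and `partition_for` are illustrative names only.

```python
import bisect

# Partition boundaries read off the slide's figure: the hash key space
# 0000-FFFF is split at 5333 and A666 into three partitions.
# Assumption: the keys shown are already 16-bit hash values; DynamoDB's
# real hash function and split points are internal and not published.
BOUNDARIES = [0x5333, 0xA666]

def partition_for(hashed_key: str) -> int:
    """Return the partition (0, 1 or 2) that owns a 4-hex-digit hash value."""
    value = int(hashed_key, 16)
    # bisect_right counts the boundaries <= value, so a boundary value
    # itself belongs to the partition that starts there.
    return bisect.bisect_right(BOUNDARIES, value)

# Every item with hash key 8888 (range keys A..E) maps to the same
# partition, so "Get Range 8888 A..C" is served by a single partition.
for key in ["0000", "5555", "8888", "A666", "FFFF"]:
    print(key, "-> partition", partition_for(key))
```

Because routing depends only on the hash key, a Get Range over one hash key never fans out across partitions.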
9. Distributed Hashtable
Hash/Partition Key for O(1) lookup
(Optional) Range/Sort Key for O(log n) lookup
Automatic repartitioning on Size and Read/Write Capacity
Per partition: 10GB; Write 1K × 1KB IOPS; Read 3K × 4KB IOPS
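The per-partition limits above imply how many partitions a provisioned table needs. The sketch below uses the rule of thumb AWS documented for provisioned tables at the time (partitions by size = size / 10GB, by throughput = reads / 3000 + writes / 1000, take the larger and round up). Current DynamoDB manages partitions internally, so treat `estimate_partitions` (a hypothetical helper) as an estimate, not the service's actual algorithm.

```python
import math

# Per-partition limits from the slide: 10 GB, 1,000 writes/s of 1 KB,
# 3,000 reads/s of 4 KB. The combination rule is the historical rule of
# thumb for provisioned tables; today's service may partition differently.
def estimate_partitions(size_gb: float, read_units: int, write_units: int) -> int:
    by_size = size_gb / 10
    by_throughput = read_units / 3000 + write_units / 1000
    # At least one partition; otherwise whichever constraint needs more.
    return max(1, math.ceil(max(by_size, by_throughput)))

print(estimate_partitions(25, 6000, 1000))  # throughput-bound
print(estimate_partitions(45, 1000, 500))   # size-bound
```

In that model, provisioned capacity was spread evenly across partitions, which is why a single hot hash key could be throttled well below the table's total capacity.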
18. Schema?
Schema on Write - RDBMS
Schema for Read - DynamoDB
Schema on Read - Hadoop
Define an attribute in the schema ONLY if you need to LOOKUP items by that attribute (rather than scan for it)
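In DynamoDB this rule is visible in the table definition: `AttributeDefinitions` lists only attributes used in a key schema (table or index), and the service validates that they match. The sketch below builds a request dict in the shape boto3's `create_table()` accepts, without calling AWS, and checks that every defined attribute is a lookup key. The helper names `key_attributes` and `unused_definitions` are hypothetical.

```python
# A CreateTable request in the shape boto3's create_table() accepts, kept
# as a plain dict so it can be inspected offline. Names follow the slides.
# Link and Size are deliberately NOT defined: they are never lookup keys.
images_table = {
    "TableName": "Images",
    "KeySchema": [
        {"AttributeName": "User", "KeyType": "HASH"},
        {"AttributeName": "Image", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "User", "AttributeType": "S"},
        {"AttributeName": "Image", "AttributeType": "S"},
        {"AttributeName": "Date", "AttributeType": "S"},
    ],
    "LocalSecondaryIndexes": [{
        "IndexName": "ByDate",
        "KeySchema": [
            {"AttributeName": "User", "KeyType": "HASH"},
            {"AttributeName": "Date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
}

def key_attributes(request: dict) -> set:
    """Attribute names used in the table's or any index's key schema."""
    schemas = [request["KeySchema"]]
    for group in ("LocalSecondaryIndexes", "GlobalSecondaryIndexes"):
        schemas += [index["KeySchema"] for index in request.get(group, [])]
    return {key["AttributeName"] for schema in schemas for key in schema}

def unused_definitions(request: dict) -> set:
    """Attributes defined in the schema but never used for a LOOKUP."""
    defined = {a["AttributeName"] for a in request["AttributeDefinitions"]}
    return defined - key_attributes(request)

print(unused_definitions(images_table))  # nothing defined without a lookup
```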
19. Hash Key (+ Range Key)
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob cf2e2 2013-09-05 s3://…
Bob f93bae 2013-10-08 s3://…
Alice ca61a 2013-09-12 s3://…
User: lookup (hash) key
Image: range key for uniqueness
The main key is used to LOOKUP an item
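A minimal in-memory model of the Images table can show how the two keys divide the work: a dict gives the O(1) hash-key lookup, and a sorted list per hash key gives the O(log n) range-key lookup. `HashRangeTable` is a hypothetical teaching class, not how DynamoDB stores data internally.

```python
import bisect

class HashRangeTable:
    """Hash key -> item collection sorted by range key (teaching model)."""

    def __init__(self):
        self._range_keys = {}  # hash key -> sorted list of range keys
        self._items = {}       # hash key -> items, aligned with _range_keys

    def put(self, hash_key, range_key, item):
        keys = self._range_keys.setdefault(hash_key, [])  # O(1) hash lookup
        items = self._items.setdefault(hash_key, [])
        i = bisect.bisect_left(keys, range_key)           # O(log n) search
        if i < len(keys) and keys[i] == range_key:
            items[i] = item  # same (hash, range) key: overwrite, like a Put
        else:
            keys.insert(i, range_key)
            items.insert(i, item)

    def get(self, hash_key, range_key):
        keys = self._range_keys.get(hash_key, [])
        i = bisect.bisect_left(keys, range_key)
        if i < len(keys) and keys[i] == range_key:
            return self._items[hash_key][i]
        return None

    def query(self, hash_key):
        """All items for one hash key, in range-key order."""
        return list(self._items.get(hash_key, []))

images = HashRangeTable()
for user, image, date in [("Bob", "aed4c", "2013-10-01"),
                          ("Bob", "cf2e2", "2013-09-05"),
                          ("Bob", "f93bae", "2013-10-08"),
                          ("Alice", "ca61a", "2013-09-12")]:
    images.put(user, image, {"User": user, "Image": image, "Date": date})

print(images.get("Bob", "cf2e2")["Date"])
```

Without the range key, Bob could store only one image: User alone would not identify an item uniquely.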
20. Flexible Attributes
Images Table
User Image Date Link Size KB
Bob aed4c 2013-10-01 s3://… 124
Bob cf2e2 2013-09-05 s3://… 251
Bob f93bae 2013-10-08 s3://… 98
Alice ca61a 2013-09-12 s3://… 155
New attribute: Size KB
Most attributes are not needed for LOOKUP
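Since items are just maps of attribute name to value, adding Size KB to some items needs no schema change; only the key attributes must be present. The sketch assumes the attribute name `SizeKB` and the helper `is_valid_item`, both hypothetical.

```python
KEY_ATTRIBUTES = ("User", "Image")  # the only attributes every item must carry

def is_valid_item(item: dict) -> bool:
    """An item is storable as long as all key attributes are present."""
    return all(name in item for name in KEY_ATTRIBUTES)

bob = {"User": "Bob", "Image": "aed4c", "Date": "2013-10-01"}
bob["SizeKB"] = 124                          # new attribute, no ALTER TABLE
alice = {"User": "Alice", "Image": "ca61a"}  # fewer attributes is fine too

print(is_valid_item(bob), is_valid_item(alice))
```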
21. Local Secondary Index
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob cf2e2 2013-09-05 s3://…
Bob f93bae 2013-10-08 s3://…
Alice ca61a 2013-09-12 s3://…
ByDate Local Secondary Index (Local Secondary Index on Date)
User Date Image
Bob 2013-09-05 cf2e2
Bob 2013-10-01 aed4c
Bob 2013-10-08 f93bae
Alice 2013-09-12 ca61a
An alternative sort order for the same hash key
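An LSI can be pictured as a second copy of each user's item collection, sorted by Date instead of Image. This sketch builds such a copy holding only key attributes (comparable to a KEYS_ONLY projection); `build_lsi` is a hypothetical helper, and DynamoDB maintains the real index for you on every write.

```python
from collections import defaultdict

images = [
    {"User": "Bob", "Image": "aed4c", "Date": "2013-10-01"},
    {"User": "Bob", "Image": "cf2e2", "Date": "2013-09-05"},
    {"User": "Bob", "Image": "f93bae", "Date": "2013-10-08"},
    {"User": "Alice", "Image": "ca61a", "Date": "2013-09-12"},
]

def build_lsi(items, hash_key, sort_key, table_range_key):
    """Same hash key, different sort key: (sort value, table range key) pairs."""
    index = defaultdict(list)
    for item in items:
        index[item[hash_key]].append((item[sort_key], item[table_range_key]))
    for entries in index.values():
        entries.sort()  # ISO dates sort correctly as strings
    return dict(index)

by_date = build_lsi(images, "User", "Date", "Image")
# A range query on the new sort key: Bob's images from October 2013.
october = [image for date, image in by_date["Bob"] if date.startswith("2013-10")]
print(by_date["Bob"], october)
```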
22. To project or not to project?
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob cf2e2 2013-09-05 s3://…
Bob f93bae 2013-10-08 s3://…
Alice ca61a 2013-09-12 s3://…
ByDate Local Secondary Index
User Date Image Link (projected)
Bob 2013-09-05 cf2e2 s3://…
Bob 2013-10-01 aed4c s3://…
Bob 2013-10-08 f93bae s3://…
Alice 2013-09-12 ca61a s3://…
Additional attributes can be “fetched” from the table
Or projected into the index
“Pay” on read (fetch) or on write (projection)
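The trade-off can be counted. In this sketch every index entry read and every extra table lookup counts as one read; that unit is a simplifying assumption (real DynamoDB charges by item size in 4KB units). The links are hypothetical placeholders because the slides elide the real S3 paths.

```python
# Hypothetical placeholder links; the slides elide the real S3 paths.
table = {
    ("Bob", "cf2e2"): {"Date": "2013-09-05", "Link": "link-cf2e2"},
    ("Bob", "aed4c"): {"Date": "2013-10-01", "Link": "link-aed4c"},
    ("Bob", "f93bae"): {"Date": "2013-10-08", "Link": "link-f93bae"},
}

# Bob's ByDate entries in Date order: keys only, or with Link projected.
keys_only = [{"User": u, "Image": i, "Date": a["Date"]} for (u, i), a in table.items()]
projected = [dict(e, Link=table[(e["User"], e["Image"])]["Link"]) for e in keys_only]

def read_links(index_entries, table, is_projected):
    """Return (links, reads) for reading Link through the index."""
    links, reads = [], 0
    for entry in index_entries:
        reads += 1                              # read the index entry itself
        if is_projected:
            links.append(entry["Link"])         # copied in at write time
        else:
            item = table[(entry["User"], entry["Image"])]
            reads += 1                          # extra fetch from the table
            links.append(item["Link"])
    return links, reads

print(read_links(keys_only, table, False), read_links(projected, table, True))
```

The projected index halves the reads in this model, but every write that changes Link must then also update the index entry: that is the “pay on write” side.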
23. Item Collection Size < 10GB
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob cf2e2 2013-09-05 s3://…
Bob f93bae 2013-10-08 s3://…
Alice ca61a 2013-09-12 s3://…
ByDate Local Secondary Index
User Date Image Link
Bob 2013-09-05 cf2e2 s3://…
Bob 2013-10-01 aed4c s3://…
Bob 2013-10-08 f93bae s3://…
Alice 2013-09-12 ca61a s3://…
Item collection: all items with the same hash key, plus their entries in all of the table's LSIs
Monitor for large item collections using ReturnItemCollectionMetrics
Up to 5 LSI per table
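The 10GB limit applies to a hash key's table items plus its LSI entries, which is why projecting attributes into several LSIs eats into it. The authoritative number comes from writes made with ReturnItemCollectionMetrics set to SIZE, which return a size-range estimate; the sketch below is only a rough offline approximation that counts UTF-8 bytes of names and values and ignores DynamoDB's own per-item accounting. Helper names are hypothetical.

```python
LIMIT_BYTES = 10 * 1024**3  # 10 GB item collection limit for tables with LSIs

def approx_item_bytes(item: dict) -> int:
    """Rough size: UTF-8 bytes of attribute names plus values."""
    return sum(len(k.encode()) + len(str(v).encode()) for k, v in item.items())

def collection_bytes(table_items, lsi_entries_by_index):
    """Table items for one hash key plus their entries in every LSI."""
    total = sum(approx_item_bytes(item) for item in table_items)
    for entries in lsi_entries_by_index.values():
        total += sum(approx_item_bytes(entry) for entry in entries)
    return total

bob_items = [
    {"User": "Bob", "Image": "aed4c", "Date": "2013-10-01"},
    {"User": "Bob", "Image": "cf2e2", "Date": "2013-09-05"},
    {"User": "Bob", "Image": "f93bae", "Date": "2013-10-08"},
]
by_date_entries = [{"User": i["User"], "Date": i["Date"], "Image": i["Image"]} for i in bob_items]

size = collection_bytes(bob_items, {"ByDate": by_date_entries})
print(size, "bytes, under limit:", size < LIMIT_BYTES)
```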
24. Sparse Secondary Index
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob cf2e2 2013-09-05 s3://…
Bob f93bae
Alice ca61a 2013-09-12 s3://…
ByDate Local Secondary Index
User Date Image
Bob 2013-09-05 cf2e2
Bob 2013-10-01 aed4c
Alice 2013-09-12 ca61a
Remove the Date attribute from an item to remove the item from the index
If any of the index key attributes is missing, the item won't be in the index
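A sparse index simply skips items that lack any index key attribute. This sketch rebuilds the ByDate index from the slide's data (Bob's f93bae has no Date) and then removes Date from another item, the offline equivalent of an update that removes the attribute. `build_sparse_index` is a hypothetical helper.

```python
images = [
    {"User": "Bob", "Image": "aed4c", "Date": "2013-10-01"},
    {"User": "Bob", "Image": "cf2e2", "Date": "2013-09-05"},
    {"User": "Bob", "Image": "f93bae"},                      # no Date
    {"User": "Alice", "Image": "ca61a", "Date": "2013-09-12"},
]

def build_sparse_index(items, hash_key="User", sort_key="Date", table_range_key="Image"):
    """Index only items that carry every index key attribute."""
    return sorted(
        (item[hash_key], item[sort_key], item[table_range_key])
        for item in items
        if hash_key in item and sort_key in item
    )

print(build_sparse_index(images))
images[0].pop("Date")             # remove Date: aed4c drops out of the index
print(build_sparse_index(images))
```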
25. Global Secondary Index
ImageTags Table
Image User
aed4c Alice
aed4c Bob
f93bae Alice
f93bae Bob
ByUser Global Secondary Index
User Image
Bob aed4c
Bob f93bae
Alice aed4c
Alice f93bae
Query for images tagged Alice
A completely new LOOKUP key
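A GSI re-keys the same data by a different attribute. This sketch assumes the ImageTags table is keyed by Image (hash) and User (range) and builds the ByUser view so that “images tagged Alice” becomes a single lookup. Unlike this synchronous sketch, DynamoDB updates a GSI asynchronously, so reads from it are eventually consistent. `build_gsi` is a hypothetical helper.

```python
from collections import defaultdict

# (Image, User) pairs from the ImageTags table on the slide.
image_tags = [("aed4c", "Alice"), ("aed4c", "Bob"), ("f93bae", "Alice"), ("f93bae", "Bob")]

def build_gsi(rows):
    """Re-key (image, user) pairs by user: the ByUser index."""
    by_user = defaultdict(list)
    for image, user in rows:
        by_user[user].append(image)
    return {user: sorted(images) for user, images in by_user.items()}

by_user = build_gsi(image_tags)
print(by_user["Alice"])  # images tagged Alice, without scanning the table
```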
26. Global Secondary Index R/W Capacity
ImageTags Table
Image User
aed4c Alice
aed4c Bob
f93bae Alice
f93bae Bob
Each index has its own read/write capacity (“credit bucket”) and is updated asynchronously
ByUser Global Secondary Index
User Image
Bob aed4c
Bob f93bae
Alice aed4c
Alice f93bae
ByPlace Global Secondary Index
Place Image
London aed4c
London f93bae
Rome ba763
Rome 63f11
If any of the indexes has no write capacity, the write is throttled
Up to 5 GSI per table (the limit at the time; the default quota is now 20)
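The throttling rule can be modelled with one credit bucket for the table and one per GSI: a write is accepted only if every bucket can pay for it. This is a simplified, hypothetical model: it charges every index on every write (in DynamoDB an index is only charged when the write affects it) and caps credits at one second's worth, whereas the real burst behaviour differs.

```python
class CreditBucket:
    """Write credits for the table or one GSI (hypothetical model)."""

    def __init__(self, rate_per_second: int):
        self.rate = rate_per_second
        self.credits = rate_per_second  # start with one second of credit

    def refill(self, seconds: int) -> None:
        # Credits accumulate up to one second's worth in this sketch.
        self.credits = min(self.rate, self.credits + self.rate * seconds)

def write(buckets: dict, cost: int = 1) -> bool:
    """Accept a write only if the table and every GSI can pay for it."""
    if any(bucket.credits < cost for bucket in buckets.values()):
        return False  # throttled: one exhausted index blocks the whole write
    for bucket in buckets.values():
        bucket.credits -= cost
    return True

buckets = {"ImageTags": CreditBucket(100), "ByUser": CreditBucket(100),
           "ByPlace": CreditBucket(2)}  # ByPlace is under-provisioned
results = [write(buckets) for _ in range(3)]
print(results)
```

The table itself still has plenty of credit after the third attempt, yet the write is rejected because the ByPlace index has none left.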
27. Negative Example
Images Table
User Image Date Country
Bob aed4c 2013-10-01 USA
Bob cf2e2 2013-09-05 USA
Bob f93bae 2013-10-08 DE
Alice ca61a 2013-09-12 BR
ByCountry Global Secondary Index
Country Date Image
USA 2013-09-05 cf2e2
USA 2013-10-01 aed4c
USA 2013-10-08 2ee4c
USA 2013-09-12 a5541
Bad distribution key: low cardinality and skew
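A quick way to spot a bad distribution key is to measure its cardinality and the share of items on its hottest value. This sketch applies that check to the slide's Country column and, for contrast, to the Image column; with only four items the numbers are illustrative, but the same check on real data shows whether one key value would concentrate traffic on one partition. `key_skew` is a hypothetical helper.

```python
from collections import Counter

def key_skew(keys):
    """(hottest key value, its share of items, number of distinct values)."""
    counts = Counter(keys)
    hottest, count = counts.most_common(1)[0]
    return hottest, count / len(keys), len(counts)

countries = ["USA", "USA", "DE", "BR"]          # Country column of the table
images = ["aed4c", "cf2e2", "f93bae", "ca61a"]  # Image column, for comparison

print(key_skew(countries))  # few distinct values, one dominant
print(key_skew(images))     # every value distinct, no hot key
```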
28. Term 3 – How do you make Coffee?
Data Processing
30. Updates Stream
Images Table
User Image Date Size KB
Bob aed4c 2013-10-01 124
Bob cf2e2 2013-09-05 251
Bob f93bae 2013-10-08 98
Alice ca61a 2013-09-12 155
Aggregation with Stream (total Size KB per user)
User Size KB
Bob 473
Alice 155
Post-processing of item updates
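The per-user totals can be maintained incrementally from the update stream: subtract the old image's size and add the new image's size for each record. The sketch assumes a stream view that includes both old and new images (NEW_AND_OLD_IMAGES), an attribute named `SizeKB`, and records in DynamoDB Streams' typed JSON (`{"S": ...}`, `{"N": ...}`); `insert_record` is a hypothetical helper that fabricates test records offline.

```python
from collections import defaultdict

def user_and_size(image):
    """(User, SizeKB) from a stream image in DynamoDB JSON, or None if absent."""
    if image is None:
        return None
    return image["User"]["S"], int(image["SizeKB"]["N"])

def apply_records(totals, records):
    """Apply stream records: subtract the old size, add the new size."""
    for record in records:
        change = record["dynamodb"]
        old = user_and_size(change.get("OldImage"))  # absent on INSERT
        new = user_and_size(change.get("NewImage"))  # absent on REMOVE
        if old is not None:
            totals[old[0]] -= old[1]
        if new is not None:
            totals[new[0]] += new[1]
    return totals

def insert_record(user, image, size_kb):
    """Build an INSERT record shaped like a DynamoDB Streams event."""
    return {"eventName": "INSERT", "dynamodb": {"NewImage": {
        "User": {"S": user}, "Image": {"S": image}, "SizeKB": {"N": str(size_kb)}}}}

records = [insert_record("Bob", "aed4c", 124), insert_record("Bob", "cf2e2", 251),
           insert_record("Bob", "f93bae", 98), insert_record("Alice", "ca61a", 155)]
totals = apply_records(defaultdict(int), records)
print(dict(totals))
```

Because each record carries both images, the same function handles inserts, updates, and deletes without re-reading the table.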
33. Top DynamoDB Mistakes
Too much "old" data
"Wrong" lookup keys (Market=NA, Status=Complete)
Scaling up and down too much
Writing "long" items
Using DynamoDB for Queues
Introducing artificial GUIDs
Creating Storms
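The cost of “long” items follows from the capacity units behind slide 8: a standard write consumes one write unit per 1KB (rounded up), and a strongly consistent read one read unit per 4KB (half that for an eventually consistent read). A minimal calculator, with hypothetical function names:

```python
import math

def write_units(item_bytes: int) -> int:
    """Standard write: 1 unit per 1 KB of item size, rounded up."""
    return max(1, math.ceil(item_bytes / 1024))

def read_units(item_bytes: int, strongly_consistent: bool = True) -> float:
    """1 unit per 4 KB, rounded up; eventually consistent reads cost half."""
    units = max(1, math.ceil(item_bytes / 4096))
    return units if strongly_consistent else units / 2

# A 50 KB item costs 50 write units on every write, even if only one
# small attribute changes, versus 1 unit for a 200-byte item.
print(write_units(200), write_units(50 * 1024), read_units(50 * 1024))
```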