"Attribution" is the marketing term of art for allocating full or partial credit to the individual advertisements that eventually lead to a purchase, sign-up, download, or other desired consumer interaction. We'll share how we use DynamoDB at the core of our attribution system to store terabytes of advertising history data. The system is cost effective and dynamically scales from 0 to 300K requests per second on demand, with predictable performance and low operational overhead.
8. SQL (Relational)
[Diagram: a normalized relational schema with rows, columns, and a primary key index. A Products table keys each Product ID (1, 2, 3) to a Type (Book, Album, Movie), and the details live in per-type tables:
• Books (Book ID, Title, Date, Author, Price, Desc.), e.g. 1, "Harry Potter…", 2010, "JK Ro.."
• Albums (Album ID, Title, Artist, Price, Desc.), e.g. 2, "The Fox", "Ylvis"
• Movies (Movie ID, Title, Genre, Director, Price, Desc.), e.g. 3, "Batman vs Super..", "Action", "Zack Snyder"
Prices shown: $11.50, $8.99, $14.95; descriptions truncated: "One of 2 major …", "The Sounds..", "Must watch..]
9. SQL (Relational) vs. NoSQL (Non-relational)
[Diagram: the same product catalog modeled both ways. Left: the normalized SQL schema from the previous slide (a Products table plus Books, Albums, and Movies tables, with rows, columns, and a primary key index). Right: a single NoSQL table of items whose primary key is a partition key (Product ID) plus a sort key (Type), with attributes defined per item: item 1 (Book ID) has Title "Harry Potter..", Author "JK Rowling", Date 2010; item 2 (Album ID) has Title "The Fox", Artist "Ylvis"; item 3 (Movie ID) has Title "Batman vs Super..", Genre "Action", Director "Zack Snyder", and a nested "Movie ID: Actor ID" attribute ("Ben Affleck").]
Schema is defined per item. NoSQL design optimizes for compute instead of storage.
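The per-item schema on the right side of the diagram can be sketched as plain items: every item shares the Product ID / Type primary key, but each carries only the attributes its product type needs. The attribute values below mirror the slide's catalog example; the dict representation is an illustration, not DynamoDB's wire format.

```python
# Each item defines its own schema: a shared primary key
# (partition key "ProductID", sort key "Type") plus whatever
# attributes that product type needs.
items = [
    {"ProductID": 1, "Type": "Book ID", "Title": "Harry Potter..",
     "Author": "JK Rowling", "Date": 2010},
    {"ProductID": 2, "Type": "Album ID", "Title": "The Fox",
     "Artist": "Ylvis"},
    {"ProductID": 3, "Type": "Movie ID", "Title": "Batman vs Super..",
     "Genre": "Action", "Director": "Zack Snyder"},
]

# No table-wide schema: the non-key attribute set differs per item.
attribute_sets = [set(item) - {"ProductID", "Type"} for item in items]
```

One table holds all three product types, so a single key lookup returns the whole product without the joins the relational schema requires.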
10. Why NoSQL?
SQL                       NoSQL
Optimized for storage     Optimized for compute
Normalized/relational     Denormalized/hierarchical
Ad hoc queries            Instantiated views
Scale vertically          Scale horizontally
Good for OLAP             Built for OLTP at scale
15. DynamoDB Streams
Stream of updates to a table:
• Asynchronous
• Exactly once
• Strictly ordered per item
• Highly durable, scales with the table
• 24-hour lifetime
• Sub-second latency
19. Analytics with DynamoDB Streams
Performing real-time aggregation and analytics:
• Collect and de-dupe data in DynamoDB
• Aggregate data in memory and flush periodically
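The collect, de-dupe, aggregate-in-memory, flush-periodically pattern above can be sketched as follows. The record shape `(event_id, key, value)` and the in-memory sink are illustrative assumptions, not the actual DynamoDB Streams record format.

```python
import time
from collections import defaultdict

class StreamAggregator:
    """Sketch of the aggregate-in-memory, flush-periodically pattern."""

    def __init__(self, flush_interval_s=60.0):
        self.flush_interval_s = flush_interval_s
        self.seen_ids = set()           # de-dupe window
        self.counts = defaultdict(int)  # in-memory aggregates
        self.flushed = []               # stand-in for a downstream sink
        self.last_flush = time.monotonic()

    def process(self, event_id, key, value):
        if event_id in self.seen_ids:   # drop duplicate deliveries
            return
        self.seen_ids.add(event_id)
        self.counts[key] += value
        if time.monotonic() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        if self.counts:
            self.flushed.append(dict(self.counts))
            self.counts.clear()
            self.seen_ids.clear()
        self.last_flush = time.monotonic()
```

Aggregating in memory between flushes is what keeps write amplification down: many stream records collapse into one periodic write downstream.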
21. DataXu’s DynamoDB Use Case
• Who is DataXu
• Attribution Use Case
• Why DynamoDB
• Deployment Architecture
• Capacity & Performance
• Tips & Lessons Learned
22. DataXu
• Who
  • Spun out of MIT Labs
  • A petabyte-scale digital marketing platform
  • One of the fastest-growing companies in the Inc. 5000
• What
  • Helps the world's most valuable brands understand and engage with their consumers
  • Maximizes ROI
Quick Statistics
• 2M+ bid requests per second
• Billions of impressions per month, petabytes of data
• ~10 ms round-trip response time
• 180+ TB of logs per day
• 2 PB of data analyzed
• 3,000+ servers powering the platform
• 13 regions, 24x7
24. DataXu Reads and Writes on DynamoDB
[Charts: read/write capacity used over time. One chart plots capacity per day; another plots it at 6-hour intervals.]
26. Attribution
Attribution is the science of allocating credit for an activity or sale to the marketing touchpoints that a customer was exposed to prior to the purchase or activity.
[Diagram: a customer journey of touchpoints (impression, click, impression) leading to an online purchase. Legend: I = Impression, E = Event, A = Activity.]
27. Generalized Event Chains
[Diagram: a timeline of touchpoints, e.g. I, E, I, A. Legend: I = Impression, E = Event, A = Activity.]
• Billions of events and activities are organized into sequences.
• Events are correlated based on time and user to construct paths leading to an activity.
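The correlation step described above, grouping events by user, ordering them by time, and closing a chain at each activity, can be sketched in a few lines. The record shape `(user_id, timestamp, kind)` and the `'I'`/`'E'`/`'A'` labels are illustrative assumptions, not DataXu's actual schema.

```python
from collections import defaultdict

def build_event_chains(records):
    """Correlate events by user and time into paths ending in an activity.

    records: iterable of (user_id, timestamp, kind) where kind is
    'I' (impression), 'E' (event, e.g. a click), or 'A' (activity).
    Returns a list of (user_id, path) chains, one per activity.
    """
    by_user = defaultdict(list)
    for user_id, ts, kind in records:
        by_user[user_id].append((ts, kind))

    chains = []
    for user_id, events in by_user.items():
        events.sort()                  # order touchpoints by time
        path = []
        for ts, kind in events:
            path.append(kind)
            if kind == "A":            # an activity closes a chain
                chains.append((user_id, path))
                path = []
    return chains
```

Users with no activity produce no chain; their open paths simply never close.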
29. Why DynamoDB
• Managed service
  • Easy to use
  • Elastic scaling, no need to overprovision
  • API driven
• Fast and predictable performance (milliseconds)
  • Fast lookup/scan of user events
  • Consistent and predictable read/write performance
• TCO
  • No capex and reasonable opex
36. R/W Operations vs. R/W Capacity Units
What influences capacity units for your table?
• Item size: a capacity unit covers 4 KB per read or 1 KB per write
• Read/write request rate: item Gets and Puts by your application
• Consistency: a Strongly Consistent Read consumes twice the capacity of an Eventually Consistent Read
• Local Secondary Indexes: synchronized with the table, so indexed writes consume additional capacity
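The bullets above reduce to simple arithmetic. This is a minimal sketch of that math, assuming the 4 KB read / 1 KB write unit sizes and the 2x strongly consistent multiplier from the slide; it is not an SDK API.

```python
import math

def read_capacity_units(item_size_bytes, reads_per_sec,
                        strongly_consistent=False):
    """RCUs needed: one unit covers a strongly consistent read (or two
    eventually consistent reads) of up to 4 KB; larger items consume
    one unit per 4 KB chunk."""
    chunks = math.ceil(item_size_bytes / 4096)
    units = chunks * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

def write_capacity_units(item_size_bytes, writes_per_sec):
    """WCUs needed: one unit covers a write of up to 1 KB."""
    return math.ceil(item_size_bytes / 1024) * writes_per_sec
```

Note how item size rounds up per request: a 4.1 KB item costs two read units per read, which is why compressing items (as in the lessons later in this deck) directly cuts provisioned capacity.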
37. Capacity Planning: Unit of Scaling
• Partition:
  • Storage: 10 GB per partition
  • Compute: 3000 RCU or 1000 WCU per partition
• Partitions (for throughput) = (RCU/3000) + (WCU/1000)
• Partitions (for size) = storage used in GB / 10
• Total number of partitions = Ceiling(Max(Partitions (for throughput), Partitions (for size)))
• e.g., Ceiling(Max(100/10, 9000/3000 + 3000/1000)) = Ceiling(Max(10, 6)) = 10
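The partition formula above can be written directly in code; this sketch just transcribes the slide's arithmetic (10 GB, 3000 RCU, or 1000 WCU per partition).

```python
import math

def partition_count(storage_gb, rcu, wcu):
    """Partitions a table needs: enough to satisfy whichever dimension,
    throughput or storage, demands more."""
    for_throughput = rcu / 3000 + wcu / 1000
    for_size = storage_gb / 10
    return math.ceil(max(for_throughput, for_size))
```

For the slide's example (100 GB, 9000 RCU, 3000 WCU), storage is the binding dimension, so the table gets 10 partitions even though throughput alone would need only 6.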
39. Throttling
Storage   Provisioned RCU   Provisioned WCU   Partitions   Reads per Partition   Writes per Partition
100 GB    9000              3000              10           900                   300
Provisioned throughput is divided evenly across partitions: 900 reads and 300 writes per partition. Throttling kicks in above 900 reads or 300 writes on a partition.
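The table above illustrates why hot keys throttle: each partition gets an even share of the provisioned totals, so one hot partition can be throttled while the table as a whole is under its limits. A minimal sketch of that split, using the slide's even-division model:

```python
def per_partition_limits(rcu, wcu, partitions):
    """Provisioned throughput divided evenly across partitions."""
    return rcu / partitions, wcu / partitions

def is_throttled(reads_to_partition, writes_to_partition,
                 rcu, wcu, partitions):
    """A single hot partition throttles once it exceeds its share,
    even if the table's aggregate traffic is within the totals."""
    read_limit, write_limit = per_partition_limits(rcu, wcu, partitions)
    return (reads_to_partition > read_limit
            or writes_to_partition > write_limit)
```

With the slide's numbers, a key receiving 1000 reads/sec is throttled at its partition's 900-read share despite 8000 RCU sitting idle elsewhere.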
43. Design Tips
• Understand scaling
• Understand hot keys/throttling
• Capture application metrics
• Configure table alarms
• Application tuning for outliers
  • Retry w/backoff
• DynamoDB Best Practices
  • http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html
• AWS Service Limits
  • http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html
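The "retry w/backoff" tip can be sketched as a generic wrapper. The delay parameters and the full-jitter strategy here are illustrative choices, not AWS SDK settings (the AWS SDKs ship their own configurable retry logic).

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay_s=0.05,
                       retryable=(Exception,)):
    """Retry a callable with exponentially growing, jittered delays.

    Re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to the exponential cap,
            # so throttled clients don't retry in lockstep.
            time.sleep(random.uniform(0, base_delay_s * 2 ** attempt))
```

Jitter matters at DataXu's scale: without it, a fleet of throttled writers all retries at the same instant and re-creates the hot spot it is backing off from.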
44. Lessons Learned
• Reduce RCUs and WCUs
  • Combined reads and writes with the Batch APIs
  • Combined multiple rows that share the same hash key into the same row (3x fewer puts)
  • LZ4 compression
• How do we handle deletes?
  • Table rotation to match attribution windows
  • Drop an entire table when it is no longer necessary
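The "3x fewer puts" lesson, collapsing rows that share a hash key into one item, can be sketched as a pre-write grouping step. The row shape `(hash_key, event)` is an illustrative assumption about the attribution data, not DataXu's actual schema.

```python
from collections import defaultdict

def combine_by_hash_key(rows):
    """Collapse rows sharing a hash key into single items.

    Instead of one put per event, all events for a hash key (e.g. a
    user) are written as one item, so n events cost one put each flush.
    """
    combined = defaultdict(list)
    for hash_key, event in rows:
        combined[hash_key].append(event)
    # One put per hash key instead of one per event.
    return [{"hash_key": k, "events": v} for k, v in combined.items()]
```

Combined with LZ4 compression of the `events` payload, this shrinks both the put count and the per-item WCU cost.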
45. Lessons Learned
• Dynamic scaling to a large number of partitions takes time
• Debugging
  • Application logging/metrics
  • TCP dumps
  • Turn on Request ID logging
  • CloudWatch
• Local DynamoDB for testing
  • http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLocal.html