6. Ad exchange
Exchange: Want to show an ad to device 123?
Tapad: Sure, show this ad for $2 CPM (or: No thanks)
Exchange: Great, you won. Ad was displayed!
Exchange: How about device XYZ?
95th-percentile response time: ~30 ms
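The request/response loop above can be sketched as a bid handler working against a latency budget. This is a minimal Python sketch: the in-memory dict standing in for the device store, the "auto-intender" segment, and the handler shape are all illustrative assumptions; only the $2 CPM and the ~30 ms target come from the slide.

```python
import time

# Hypothetical in-memory store standing in for the Aerospike device lookup
DEVICE_PROFILES = {"123": {"segment": "auto-intender"}}

BID_TIMEOUT_MS = 30  # 95th-percentile response-time target from the slide

def handle_bid_request(device_id):
    start = time.monotonic()
    profile = DEVICE_PROFILES.get(device_id)   # one key-value read
    if profile is None:
        return {"bid": False}                  # "No thanks"
    elapsed_ms = (time.monotonic() - start) * 1000
    # In production the response must come back well inside the timeout
    return {"bid": True, "cpm": 2.0, "latency_ms": elapsed_ms}
```

Device 123 gets a $2 CPM bid; an unknown device gets a no-bid, mirroring the dialogue on the slide.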
13. Expiration and eviction
Old data expires automatically
Oldest data is evicted if the database is running out of space
This is desired behavior in ad-tech world
16. Tapad’s Aerospike Configuration
100% keys in memory
100% data in SSD storage
Replication factor 2
512-byte block size
Need lots of free space in memory and storage for defrag (high-water mark)
18. Migrations and partitions
Adding a new node requires data migration, which means degraded performance
Network partition may trigger some data migration
Didn’t realize this had animation, which is why this is out of order.
But, it actually shows the asynchronous nature of the system pretty well!
This number is bid-requests per second. A given request may trigger multiple reads because of following aliases.
Marketers understand delivery and response for a single channel
However, marketers don’t have data about consumer exposure and marketing impact across devices
Replication factor 2. If a node goes down, the data is still there.
XDR - our west coast bidders read from the west coast Aerospike cluster, and vice versa. Changes propagate in milliseconds.
Read and write performance scales linearly. Much easier with a key-value store than a relational database, but hey! it fits our data model perfectly.
This is desired behavior because new information is more actionable than old information. A month-old record with no recent activity may never be accessed again (cleared cookies). Chuck it and make room for new data.
Expiring old data also makes our ETL process faster: the old data is gone, so we don’t need to export it.
Evicting older data makes space for good data when we enter a very high-write situation. Lets the system continue operating.
Config - what Tapad’s deployment looks like. We don’t have as many disks as the setups from Aerospike’s benchmarks because we don’t actually need that much storage.
Migrations - what happens when you add a node to the cluster?
Eviction - a great feature, but it was counter-intuitive for us; we had an interesting scenario with it. More later.
Usage - as a software developer, what is it like using Aerospike?
Multiple terabytes of data in 3.3 billion objects
Each object is pretty small, about 200 bytes. More on this later in relation to block size, which is 512 bytes
Recommended high-water mark is 50% of memory and 50% of storage; Aerospike needs this headroom to defragment the data. We set it aggressively to 70% on both memory and storage to save money on hardware.
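The config numbers above support a back-of-envelope sizing sketch. This assumes Aerospike's documented 64-byte in-RAM primary-index entry per record, plus the figures from the slides (3.3 billion objects, 512-byte blocks, replication factor 2, 70% high-water mark); treat the results as rough cluster-wide estimates, not Tapad's actual hardware bill.

```python
OBJECTS = 3.3e9            # object count from the slides
INDEX_ENTRY_BYTES = 64     # Aerospike primary-index entry size in RAM
BLOCK_BYTES = 512          # write-block size from the config above
HIGH_WATER = 0.70          # Tapad's high-water mark (recommended: 0.50)
REPLICATION = 2            # replication factor from the config above

# 100% of keys in memory: index RAM across the cluster
index_ram_gb = OBJECTS * INDEX_ENTRY_BYTES * REPLICATION / 1e9

# 100% of data on SSD: each ~200-byte record still occupies a 512-byte block
raw_ssd_gb = OBJECTS * BLOCK_BYTES * REPLICATION / 1e9

# Provision extra space so defrag has room below the high-water mark
provisioned_ssd_gb = raw_ssd_gb / HIGH_WATER
```

With these assumptions the index alone needs roughly 420 GB of RAM cluster-wide, and the SSD tier needs to be provisioned well above the raw data size so defrag can keep running.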
It’s possible to send updates faster than the defragmenter can keep up, leading to out-of-disk-space errors even with 40% of the disk free. Oops. The solution was to set the high-water mark lower (to 50%), which evicted a bunch of older data and got us back in business.
Most of our records are above 128 bytes and below 256 bytes.
Block size is 512 bytes, which means we are often wasting 50% or more of each record’s allocated storage space.
Block size can be set to 128 bytes in versions 2.7 and 3.1. Looking forward to deploying this.
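The block-size waste is simple round-up arithmetic. This sketch uses a simplified model that ignores any per-record on-disk overhead; the 200-byte record size and the 512-byte vs. 128-byte block sizes come from the slides.

```python
def allocated_bytes(record_bytes, block_bytes):
    """Records round up to a whole number of blocks on SSD."""
    blocks = -(-record_bytes // block_bytes)   # ceiling division
    return blocks * block_bytes

def wasted_fraction(record_bytes, block_bytes):
    """Fraction of the allocation that the record does not use."""
    alloc = allocated_bytes(record_bytes, block_bytes)
    return (alloc - record_bytes) / alloc

# A ~200-byte record in a 512-byte block wastes ~61% of its allocation;
# in 128-byte blocks it takes two blocks and wastes only ~22%.
```

That gap is why moving to 128-byte blocks is attractive for records mostly in the 128-256 byte range.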
Adding a new node could mean a day (24 hours) of degraded performance.
This is tunable via migration speed: faster migration means the cluster is slower because of the many extra writes; the alternative is a slow migration.
New nodes should be homogeneous; a new node with more storage than the other nodes cannot make use of the extra capacity until all nodes have as much storage.
Go to next slide for the picture of the buckets.
Eviction takes place by splitting records into buckets based on TTL, and evicting randomly from lowest TTL (soonest to expire)
If records are not evenly distributed across the TTL buckets, chaos ensues: it appears that data is being evicted at random.
Solution was to change the long TTL to a short one, and refresh those devices with a regularly scheduled job.
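The bucket-based eviction described above can be sketched as follows. This is a simplified model, not Aerospike's actual implementation: the bucket count, TTL range, and data are illustrative.

```python
import random
from collections import defaultdict

def evict(records, n_to_evict, n_buckets=10, max_ttl=3600):
    """Group records into buckets by remaining TTL, then evict
    randomly from the soonest-to-expire buckets first."""
    width = max_ttl / n_buckets
    buckets = defaultdict(list)
    for key, ttl in records.items():
        buckets[int(ttl // width)].append(key)
    evicted = []
    for b in sorted(buckets):        # lowest-TTL bucket first
        pool = buckets[b]
        random.shuffle(pool)         # random *within* a bucket
        while pool and len(evicted) < n_to_evict:
            evicted.append(pool.pop())
        if len(evicted) >= n_to_evict:
            break
    return evicted
```

If nearly every record carries the same long TTL, they all land in one bucket, so eviction looks random across the whole dataset; that is the chaos described above, and the fix was shorter TTLs refreshed by a scheduled job.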
Hot key = 3 outstanding reads on the same key; you can’t get away from hot spots in data. The error surfaces in the client.
Multiple keys pointing to the same record could save us a lot of extra reads (250k bid requests per second vs. 350k reads per second; a lot of that difference is following ID aliases).
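The alias-following cost can be sketched like this. The record layout (an `alias_of` bin pointing at a canonical device key) and all the IDs here are hypothetical assumptions used only to illustrate why one bid request can trigger more than one read.

```python
# Hypothetical store: alias records point at a canonical device record,
# so resolving an incoming ID can cost two reads instead of one.
STORE = {
    "idfa:abc":   {"alias_of": "tapad:123"},      # mobile ID -> canonical
    "cookie:xyz": {"alias_of": "tapad:123"},      # browser cookie -> canonical
    "tapad:123":  {"segments": ["auto-intender"]},
}

def resolve(key, max_hops=3):
    """Follow alias records until a canonical record is found,
    counting how many key-value reads it took."""
    reads = 0
    while reads < max_hops:
        record = STORE.get(key)
        reads += 1
        if record is None or "alias_of" not in record:
            return record, reads
        key = record["alias_of"]
    return None, reads
```

Resolving `idfa:abc` costs two reads (alias, then canonical); if multiple keys could point directly at the same record, that would collapse to one.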