12. That doesn’t sound so hard
We don’t know when sessions end
There’s a lot of data
It’s all done in (close to) real time
13. Numbers
200 GB of logs
100 million data points
per day
~300 metrics per data point
= 6000 updates / s at peak
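As a sanity check on those figures, the average ingest rate works out to roughly 1,200 data points per second (the 6,000 updates/s number is the stated peak, not derived here). A quick sketch:

```python
# Back-of-the-envelope rates from the figures above.
data_points_per_day = 100_000_000
metrics_per_point = 300
seconds_per_day = 86_400

avg_points_per_sec = data_points_per_day / seconds_per_day
print(round(avg_points_per_sec))  # ~1157 data points/s on average
```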
14. How we use(d) MongoDB
“Virtual memory” to offload data while we wait
for sessions to finish
Short-term storage (<48 hours) for batch jobs,
replays and manual analysis
Metrics storage
15. Why we use MongoDB
Schemalessness makes things so much easier;
the data we collect changes as we come up
with new ideas
Sharding makes it possible to scale writes
Secondary indexes and rich query language are
great features (for the metrics store)
It’s just… nice
23. 2nd iteration
using scans for two-step assembling
Instead of updating, save each fragment, then
scan over _id to assemble sessions
24. 2nd iteration
using scans for two-step assembling
Outcome: less lock contention, but still poor
performance. We also realised we couldn’t
remove data fast enough
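The two-step scheme can be sketched in a few lines of Python (the session IDs and fragment layout are hypothetical, and a plain list stands in for the collection): fragments whose _id sorts by (session, sequence) come back contiguously, so one ordered scan reassembles every session.

```python
from itertools import groupby

# Hypothetical fragment records: _id encodes (session_id, fragment_no),
# so a scan in _id order yields all fragments of a session contiguously.
fragments = [
    {"_id": ("sess-a", 0), "data": "start"},
    {"_id": ("sess-b", 0), "data": "start"},
    {"_id": ("sess-a", 1), "data": "end"},
]

def assemble(fragments):
    """Step 2: scan in _id order and stitch each session back together."""
    ordered = sorted(fragments, key=lambda f: f["_id"])
    return {
        session_id: [f["data"] for f in group]
        for session_id, group in groupby(ordered, key=lambda f: f["_id"][0])
    }

print(assemble(fragments))  # {'sess-a': ['start', 'end'], 'sess-b': ['start']}
```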
29. 4th iteration
sharding
To get around the global write lock and get
higher write performance we moved to a
sharded cluster.
Outcome: higher write performance, lots of
problems, lots of ops time spent debugging
36. 5th iteration
moving things to separate clusters
We saw very different loads on the shards and
realised we had databases with very different
usage patterns, some of which made
autosharding ineffective. We moved these off
the cluster.
Outcome: a more balanced and stable cluster
40. 6th iteration
monster machines
We ran into new problems removing data and
needed some room to breathe and think
Solution: upgraded the servers to High-
Memory Quadruple Extra Large (with cheese).
43. 7th iteration
partitioning (again) and pre-chunking
We rewrote the database layer to write to a
new database each day, and we created all
chunks in advance. We also decreased the size
of our documents by a lot.
Outcome: no more problems removing data.
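A minimal sketch of both ideas, assuming a metrics_YYYYMMDD naming scheme and a uniform integer key space (both hypothetical; in a real cluster the split points would be fed to the split admin command before writes begin):

```python
from datetime import date

def daily_db_name(day: date, prefix: str = "metrics") -> str:
    # One database per day lets old data be dropped wholesale:
    # dropping a database is far cheaper than deleting documents.
    return f"{prefix}_{day:%Y%m%d}"

def chunk_split_points(n_chunks: int, key_space: int = 2**32) -> list:
    # Pre-computed split points so all chunks exist before the day's
    # writes start, avoiding splits and migrations during peak load.
    step = key_space // n_chunks
    return [i * step for i in range(1, n_chunks)]

print(daily_db_name(date(2011, 9, 1)))  # metrics_20110901
print(chunk_split_points(4))            # [1073741824, 2147483648, 3221225472]
```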
47. 8th iteration
realize when you have the wrong tool
Transient data might not need all the bells and
whistles.
Outcome: Redis gave us 100x performance in
the assembling step
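The assembling step maps naturally onto Redis lists: RPUSH each fragment under a session key, read the whole list back when the session ends, then DELETE. A sketch with an in-memory stand-in for the Redis client (the method names mirror redis-py; the session key is hypothetical):

```python
from collections import defaultdict

class FakeRedis:
    """In-memory stand-in mirroring the redis-py calls used below."""
    def __init__(self):
        self._lists = defaultdict(list)
    def rpush(self, key, value):
        self._lists[key].append(value)
    def lrange(self, key, start, stop):
        stop = None if stop == -1 else stop + 1
        return self._lists[key][start:stop]
    def delete(self, key):
        self._lists.pop(key, None)

r = FakeRedis()

# Fragments arrive one at a time, keyed by session.
r.rpush("session:42", "frag-1")
r.rpush("session:42", "frag-2")

# When the session is declared finished, assemble and clean up.
assembled = r.lrange("session:42", 0, -1)
r.delete("session:42")
print(assembled)  # ['frag-1', 'frag-2']
```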
49. 9th iteration
rinse and repeat
We now have the same scaling issues later in
the chain.
Outcome: Upcoming rewrite to make writes/
updates more efficient
Redis was actually slower
53. Tips
EC2
You have three copies of your data, do you
really need EBS?
Instance store disks are included in the price
and they have predictable performance.
m1.xlarge comes with 1.7 TB of storage.
54. Tips
Avoid bulk inserts
Very dangerous if there’s a possibility of
duplicate key errors
It’s not fixed in 2.0 even though the driver has a
flag for it.
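The hazard is that a bulk insert aborts at the first duplicate key, silently dropping everything after it in the batch. A toy illustration, with a dict standing in for the collection's unique _id index:

```python
class DuplicateKeyError(Exception):
    pass

def bulk_insert(collection: dict, docs: list) -> None:
    # Mimics the hazard: the batch stops at the first duplicate key,
    # so documents after it are never written.
    for doc in docs:
        if doc["_id"] in collection:
            raise DuplicateKeyError(doc["_id"])
        collection[doc["_id"]] = doc

store = {"a": {"_id": "a"}}
try:
    bulk_insert(store, [{"_id": "b"}, {"_id": "a"}, {"_id": "c"}])
except DuplicateKeyError:
    pass
print("c" in store)  # False: "c" was silently lost to the aborted batch
```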
55. Tips
Safe mode
Run every Nth insert in safe mode
This will give you warnings when bad things
happen, like failovers
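One way to sketch the "every Nth insert in safe mode" idea (the helper is hypothetical; with a real driver the boolean would be passed as the insert's safe/write-concern flag):

```python
def safe_every_nth(n: int):
    # Returns a predicate: issue insert i with safe mode (getLastError)
    # only when i is a multiple of n, trading latency for early warning.
    i = 0
    def use_safe() -> bool:
        nonlocal i
        i += 1
        return i % n == 0
    return use_safe

use_safe = safe_every_nth(100)
flags = [use_safe() for _ in range(300)]
print(sum(flags))  # 3 safe-mode inserts out of 300
```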