MongoDB - Scaling write performance

  1. MongoDB: Scaling write performance Junegunn Choi
  2. MongoDB • Document data store • JSON-like document • Secondary indexes • Automatic failover • Automatic sharding
  3. First impression: Easy • Easy installation • Easy data model • No prior schema design • Native support for secondary indexes
  4. Second thought: Not so easy • No SQL • Coping with massive data growth • Setting up and operating sharded cluster • Scaling write performance
  5. Today we’ll talk about insert performance
  6. Insert throughput on a replica set
  7. Steady 5k inserts/sec • 1 kB records • ObjectId as PK • WriteConcern: journal sync on majority
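     A minimal sketch of the benchmark’s write settings, assuming pymongo; the connection string, database name and collection name are illustrative, not from the deck:

        # Journal sync + majority acknowledgement, roughly 1 kB per record.
        from pymongo import MongoClient
        from pymongo.write_concern import WriteConcern

        client = MongoClient("mongodb://primary.example.com:27017/?replicaSet=rs0")
        events = client["test"].get_collection(
            "events", write_concern=WriteConcern(w="majority", j=True))

        payload = "x" * 1024                  # ~1 kB record
        events.insert_one({"data": payload})  # _id defaults to a (mostly) sequential ObjectId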
  8. Insert throughput on a replica set with a secondary index
  9. Culprit: B+Tree index • Good at sequential insert • e.g. ObjectId, Sequence #, Timestamp • Poor at random insert • Indexes on randomly-distributed data
  10. Sequential vs. Random insert • Sequential insert ➔ small working set ➔ fits in RAM ➔ sequential I/O (bandwidth-bound) • Random insert ➔ large working set ➔ cannot fit in RAM ➔ random I/O (IOPS-bound)
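     A small sketch of the two access patterns on this slide, assuming pymongo; the collection and field names are illustrative:

        # Sequential vs. random insert into an index.
        import uuid
        from pymongo import MongoClient, ASCENDING

        coll = MongoClient()["test"]["records"]
        coll.create_index([("token", ASCENDING)])      # secondary index

        for i in range(100000):
            coll.insert_one({
                # _id (ObjectId) is roughly monotonic, so inserts append at the
                # right edge of its B+Tree: small working set, sequential I/O.
                "seq": i,
                # A random token lands anywhere in the secondary index, touching
                # pages all over the tree: large working set, random I/O.
                "token": uuid.uuid4().hex,
            })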
  11. So, what do we do now?
  12. 1. Partitioning • (diagram: monthly partitions Aug 2012, Sep 2012, Oct 2012; each partition’s B+Tree fits in memory, while a single unpartitioned B+Tree does not)
  13. 1. Partitioning • MongoDB doesn’t support partitioning • Partitioning at the application level • e.g. daily log collections: logs_20121012
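     A sketch of application-level partitioning into daily collections, assuming pymongo; the helper name and database are illustrative:

        # Route each write to a per-day collection such as logs_20121012.
        from datetime import datetime, timezone
        from pymongo import MongoClient

        db = MongoClient()["test"]

        def log_collection(now=None):
            now = now or datetime.now(timezone.utc)
            return db["logs_" + now.strftime("%Y%m%d")]    # e.g. logs_20121012

        log_collection().insert_one({"msg": "hello", "at": datetime.now(timezone.utc)})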
  14. Switch collection every hour
  15. 2. Better H/W • More RAM • More IOPS • RAID striping • SSD • AWS Provisioned IOPS (1k ~ 10k)
  16. 3. More H/W: Sharding • Automatic partitioning across nodes SHARD1 SHARD2 SHARD3 mongos router
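     A minimal sketch of turning on sharding through the mongos router, assuming pymongo; host and database names are illustrative:

        # Connect to mongos (not to a shard directly) and mark the database
        # as sharded; collections are sharded individually afterwards.
        from pymongo import MongoClient

        mongos = MongoClient("mongodb://mongos.example.com:27017")
        mongos.admin.command("enableSharding", "test")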
  17. 3 shards (3x3)
  18. 3 shards (3x3) on RAID 1+0
  19. There’s no free lunch • Manual partitioning: incidental complexity • Better H/W: $ • Sharding: $$, operational complexity
  20. “Do you really need that index?”
  21. Scaling insert performance with sharding
  22. = Choosing the right shard key
  23. Shard key example: year_of_birth • USERS collection split into 64 MB chunks by key range: ~1950, 1951~1970, 1971~1990, 1991~2005, 2006~2010, 2010~∞ • Chunks distributed across SHARD1, SHARD2, SHARD3 behind the mongos router
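     A sketch of sharding the USERS collection on year_of_birth, assuming pymongo connected to mongos; the namespace is illustrative:

        from pymongo import MongoClient

        mongos = MongoClient("mongodb://mongos.example.com:27017")
        mongos.admin.command("shardCollection", "test.users",
                             key={"year_of_birth": 1})
        # mongos splits the collection into ~64 MB chunks by key range
        # (e.g. ~1950, 1951~1970, ...) and balances them across the shards.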
  24. 5k inserts/sec w/o sharding
  25. Sequential key • ObjectId as shard key • Sequence # • Timestamp
  26. Worse throughput with 3x H/W.
  27. Sequential key • All inserts go into the last chunk • Cannot scale insert performance • Chunk migration overhead • (diagram: USERS chunks 1000~2000, 5000~7500, 9000~∞ on SHARD-x; new keys 9001, 9002, 9003, 9004, ... all land in 9000~∞)
  28. Sequential key
  29. Hash key • e.g. SHA1(_id) = 9f2feb0f1ef425b292f2f94 ... • Distributes inserts evenly across all chunks
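     A sketch of computing the hash shard key in the application, as on the slide; the field and collection names are illustrative (recent MongoDB versions can also declare a hashed shard key directly):

        import hashlib
        from bson import ObjectId
        from pymongo import MongoClient

        users = MongoClient("mongodb://mongos.example.com:27017")["test"]["users"]

        _id = ObjectId()
        users.insert_one({
            "_id": _id,
            # SHA-1 of _id spreads inserts evenly across all chunks.
            "shard_key": hashlib.sha1(str(_id).encode()).hexdigest(),
        })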
  30. Hash key • Performance drops as collection grows • Why? Mandatory index on shard key • B+Tree problem again!
  31. Sequential key Hash key
  32. Sequential + hash key • Coarse-grained sequential prefix • e.g. year-month + hash value: 201210_24c3a5b9 • (diagram: B+Tree regions 201208_*, 201209_*, 201210_*; only the current month’s region is written)
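     A sketch of building the coarse sequential prefix plus hash suffix (e.g. 201210_24c3a5b9); the helper name and the 8-character suffix length are illustrative:

        import hashlib
        from datetime import datetime, timezone

        def seq_hash_key(_id, now=None):
            now = now or datetime.now(timezone.utc)
            prefix = now.strftime("%Y%m")                               # 201210
            suffix = hashlib.sha1(str(_id).encode()).hexdigest()[:8]    # 24c3a5b9
            return prefix + "_" + suffix
        # Only the current month's region of the shard-key index is written,
        # so the hot part of the B+Tree stays small.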
  33. But what if... the working set becomes large anyway? (diagram: B+Tree with a large working set over 201208_*, 201209_*, 201210_*)
  34. Sequential + hash key • Can you predict the data growth rate? • Balancer not clever enough • Only considers # of chunks • Migration is slow during heavy writes
  35. Sequential key Hash key Sequential + hash key
  36. Low-cardinality hash key • Small portion of the hash value • e.g. A~Z, 00~FF • Alleviates the B+Tree problem: sequential access on a fixed # of parts per local B+Tree • Parts per shard ≈ cardinality / # of shards • (diagram: shard key range A ~ D on one shard; local B+Tree holding runs of A, B, C)
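     A sketch of a low-cardinality hash key (keep only the first byte of the hash, 00 ~ FF, 256 values); the helper name is illustrative:

        import hashlib

        def low_card_key(_id):
            # Keep only two hex characters of the hash: "00" .. "ff".
            return hashlib.sha1(str(_id).encode()).hexdigest()[:2]
        # With N shards, each shard serves roughly 256 / N of these values,
        # so its local B+Tree is written at a fixed, small number of spots.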
  37. Low-cardinality hash key • Limits the # of possible chunks • e.g. 00 ~ FF ➔ 256 chunks • Chunk grows past 64MB • Balancing becomes difficult
  38. Sequential key Hash key Sequential + hash key Low-cardinality hash key
  39. Low-cardinality hash prefix + sequential part • e.g. short hash prefix + timestamp • Nice index access pattern • Unlimited number of chunks • (diagram: shard key range A000 ~ C999 on one shard; local B+Tree with keys A000, A123, B000, B123, C000, C123)
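     A sketch of the final scheme, a short hash prefix followed by a sequential (timestamp) part; the helper name and key format are illustrative:

        import hashlib
        import time

        def prefixed_seq_key(_id, ts=None):
            prefix = hashlib.sha1(str(_id).encode()).hexdigest()[:2]   # low-cardinality part
            seq = int(ts if ts is not None else time.time())           # sequential part
            return "%s_%010d" % (prefix, seq)
        # Each shard appends at a handful of hot spots in its local B+Tree,
        # while the sequential suffix keeps the number of possible chunks unbounded.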
  40. Finally, 2x throughput
  41. Lessons learned • Know the performance impact of secondary indexes • Choose the right shard key • Test with large data sets • Linear scalability is hard • If you really need it, consider HBase or Cassandra • SSD
  42. Thank you. Questions? 유응섭 (Eungsub Yoo) rspeed@daumcorp.com • 최준건 (Junegunn Choi) gunn@daumcorp.com