MongoDB: Scaling write performance


1. MongoDB: Scaling write performance (Junegunn Choi)
2. MongoDB
   • Document data store
     • JSON-like document
   • Secondary indexes
   • Automatic failover
   • Automatic sharding
3. First impression: Easy
   • Easy installation
   • Easy data model
   • No prior schema design
   • Native support for secondary indexes
4. Second thought: Not so easy
   • No SQL
   • Coping with massive data growth
   • Setting up and operating a sharded cluster
   • Scaling write performance
5. Today we’ll talk about insert performance
6. Insert throughput on a replica set
7. Steady 5k inserts/sec
   * 1kB record, ObjectId as PK
   * WriteConcern: journal sync on majority
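
As a rough illustration, an insert benchmark like the one described above might look as follows in pymongo; the host, database, and collection names are assumptions, and the write concern mirrors the slide (journaled, acknowledged by a majority):

```python
# A rough sketch of the benchmark insert loop, assuming pymongo.
# Host, database, and collection names are made up for illustration.
from pymongo import MongoClient, WriteConcern

client = MongoClient('mongodb://primary.example.com:27017/?replicaSet=rs0')
events = client.benchmark.get_collection(
    'events',
    # As on the slide: journaled writes, acknowledged by a majority.
    write_concern=WriteConcern(w='majority', j=True))

payload = 'x' * 1024  # ~1kB record
for _ in range(100000):
    # No explicit _id: the driver assigns an ObjectId, which is
    # roughly increasing over time (a sequential PK).
    events.insert_one({'payload': payload})
```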
8. Insert throughput on a replica set with a secondary index
9. Culprit: B+Tree index
   • Good at sequential insert
     • e.g. ObjectId, sequence #, timestamp
   • Poor at random insert
     • Indexes on randomly-distributed data
10. Sequential vs. random insert
   [Diagram: a B+Tree receiving sequential keys touches only its right-most
   leaves; one receiving random keys touches leaves across the whole tree.]
   • Sequential insert ➔ small working set ➔ fits in RAM ➔ sequential I/O (bandwidth-bound)
   • Random insert ➔ large working set ➔ cannot fit in RAM ➔ random I/O (IOPS-bound)
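
A small sketch of the two access patterns (collection and field names are assumptions): auto-assigned ObjectIds arrive in roughly increasing order, so the _id index only appends at its right edge, while a secondary index on random values touches leaf pages all over the tree:

```python
# Contrasting sequential vs. random index inserts (names assumed).
import uuid
from pymongo import MongoClient

coll = MongoClient().test.docs
coll.create_index('token')  # secondary B+Tree index on a random field

for _ in range(100000):
    coll.insert_one({
        # _id: auto-assigned ObjectId -> appends at the right edge of
        # the _id index (small, hot working set).
        'token': uuid.uuid4().hex,  # random -> hits leaves all over the
                                    # 'token' index (large working set).
    })
```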
11. So, what do we do now?
12. 1. Partitioning
   [Diagram: monthly partitions (Aug 2012, Sep 2012, Oct 2012); the active
   partition's B+Tree fits in memory, while a single unpartitioned B+Tree
   does not.]
13. 1. Partitioning
   • MongoDB doesn’t support partitioning
   • Partitioning at application-level
   • e.g. daily log collection
     • logs_20121012
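
A minimal sketch of such application-level partitioning by day; the database name and helper function are hypothetical:

```python
# Route each insert to a per-day collection such as logs_20121012.
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient().logdb

def log_collection(when=None):
    """Pick the collection for a timestamp, e.g. logs_20121012."""
    when = when or datetime.now(timezone.utc)
    return db['logs_' + when.strftime('%Y%m%d')]

log_collection().insert_one({'level': 'INFO', 'msg': 'hello'})
# Dropping an old collection later is a cheap way to expire old data.
```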
14. Switch collection every hour
15. 2. Better H/W
   • More RAM
   • More IOPS
     • RAID striping
     • SSD
     • AWS Provisioned IOPS (1k ~ 10k)
16. 3. More H/W: Sharding
   • Automatic partitioning across nodes
   [Diagram: a mongos router in front of SHARD1, SHARD2, SHARD3]
17. 3 shards (3x3)
18. 3 shards (3x3) on RAID 1+0
19. There’s no free lunch
   • Manual partitioning
     • Incidental complexity
   • Better H/W
     • $
   • Sharding
     • $$
     • Operational complexity
20. “Do you really need that index?”
21. Scaling insert performance with sharding
22. = Choosing the right shard key
23. Shard key example: year_of_birth
   [Diagram: the users collection split into 64MB chunks by key range
   (~ 1950, 1951 ~ 1970, 1971 ~ 1990, 1991 ~ 2005, 2006 ~ 2010, 2010 ~ ∞),
   distributed across SHARD1, SHARD2, and SHARD3 behind a mongos router.]
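
For reference, a shard key like this could be declared through a mongos as sketched below; the host and namespace are assumptions, and shardCollection requires an index on the key:

```python
# Declaring the shard key for the users collection via a mongos router.
from pymongo import MongoClient

mongos = MongoClient('mongodb://mongos.example.com:27017')

# shardCollection requires an index on the shard key.
mongos.mydb.users.create_index('year_of_birth')
mongos.admin.command('enableSharding', 'mydb')
mongos.admin.command('shardCollection', 'mydb.users',
                     key={'year_of_birth': 1})
```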
24. 5k inserts/sec w/o sharding
25. Sequential key
   • ObjectId as shard key
   • Sequence #
   • Timestamp
26. Worse throughput with 3x H/W.
27. Sequential key
   • All inserts into one chunk
   • Cannot scale insert performance
   • Chunk migration overhead
   [Diagram: chunks 1000 ~ 2000, 5000 ~ 7500, 9000 ~ ∞ of the users
   collection; new keys 9001, 9002, 9003, 9004, ... all land in the last
   chunk (9000 ~ ∞) on a single shard.]
28. Sequential key (throughput graph)
29. Hash key
   • e.g. SHA1(_id) = 9f2feb0f1ef425b292f2f94 ...
   • Distributes inserts evenly across all chunks
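
A sketch of computing such a hash key in the application; the field name 'h' is an assumption. (Later MongoDB releases added built-in hashed shard keys; here the hash is stored explicitly, as the slide suggests.)

```python
# Application-computed hash shard key.
import hashlib
from bson import ObjectId
from pymongo import MongoClient

users = MongoClient().mydb.users  # assumed to be sharded on {'h': 1}

def with_hash_key(doc):
    oid = doc.setdefault('_id', ObjectId())
    doc['h'] = hashlib.sha1(str(oid).encode()).hexdigest()
    return doc

users.insert_one(with_hash_key({'name': 'alice'}))
```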
30. Hash key
   • Performance drops as collection grows
     • Why? Mandatory index on shard key
     • B+Tree problem again!
31. Sequential key vs. hash key (throughput graph)
32. Sequential + hash key
   • Coarse-grained sequential prefix
   • e.g. year-month + hash value
     • 201210_24c3a5b9
   [Diagram: one B+Tree with key ranges 201208_*, 201209_*, 201210_*]
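
A sketch of generating this kind of compound key (function name and exact formats are assumptions); only the current month's key range takes writes, so the hot part of the B+Tree stays small:

```python
# Coarse sequential prefix + hash suffix, e.g. '201210_24c3a5b9'.
import hashlib
from datetime import datetime, timezone
from bson import ObjectId

def month_hash_key(oid=None):
    oid = oid or ObjectId()
    prefix = datetime.now(timezone.utc).strftime('%Y%m')      # '201210'
    suffix = hashlib.sha1(str(oid).encode()).hexdigest()[:8]  # '24c3a5b9'
    return prefix + '_' + suffix

print(month_hash_key())  # e.g. '201210_24c3a5b9'
```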
33. But what if...
   [Diagram: the same B+Tree with a large working set spanning 201208_*,
   201209_*, and 201210_*]
34. Sequential + hash key
   • Can you predict the data growth rate?
   • Balancer not clever enough
     • Only considers # of chunks
     • Migration is slow during heavy writes
35. Sequential key vs. hash key vs. sequential + hash key (throughput graph)
36. Low-cardinality hash key
   • Small portion of the hash value
     • e.g. A ~ Z, 00 ~ FF
   • Alleviates the B+Tree problem
     • Sequential access on a fixed # of parts
     • (# of parts per shard = cardinality / # of shards)
   [Diagram: shard key range A ~ D on one shard; its local B+Tree is
   appended to at a fixed number of points (A..., B..., C...).]
37. Low-cardinality hash key
   • Limits the # of possible chunks
     • e.g. 00 ~ FF ➔ 256 chunks
     • Chunk grows past 64MB
     • Balancing becomes difficult
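
A sketch of such a key (helper name hypothetical): truncating the hash to one byte yields at most 256 distinct values, so each shard serves only a handful of fixed insertion points:

```python
# Low-cardinality hash key: first byte of the SHA1 hex digest.
import hashlib
from bson import ObjectId

def coarse_hash_key(oid=None):
    oid = oid or ObjectId()
    return hashlib.sha1(str(oid).encode()).hexdigest()[:2]  # '00'..'ff'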
38. Sequential key vs. hash key vs. sequential + hash key vs.
    low-cardinality hash key (throughput graph)
39. Low-cardinality hash prefix + sequential part
   • e.g. short hash prefix + timestamp
   • Nice index access pattern
   • Unlimited number of chunks
   [Diagram: shard key range A000 ~ C999; the local B+Tree grows at the
   tail of each prefix (A000 ... A123, B000 ... B123, C000 ... C123).]
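
A sketch of this final scheme (helper name and exact formats are assumptions): the low-cardinality prefix spreads load across shards, while the timestamp part keeps inserts sequential within each prefix, so chunks can keep splitting at the tail:

```python
# Low-cardinality hash prefix + sequential (timestamp) part.
import hashlib
from datetime import datetime, timezone
from bson import ObjectId

def prefix_seq_key(oid=None):
    oid = oid or ObjectId()
    prefix = hashlib.sha1(str(oid).encode()).hexdigest()[:1].upper()
    seq = datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S%f')
    return prefix + '_' + seq

print(prefix_seq_key())  # e.g. 'A_20121012153000123456'
```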
40. Finally, 2x throughput
41. Lessons learned
   • Know the performance impact of secondary indexes
   • Choose the right shard key
   • Test with large data sets
   • Linear scalability is hard
     • If you really need it, consider HBase or Cassandra
     • SSD
42. Thank you. Questions?
    유응섭 rspeed@daumcorp.com
    최준건 gunn@daumcorp.com
