Zero to 1 Billion+ Records: A True Story of Learning & Scaling GameChanger

Published in: Technology

  1. Zero to 1 Billion Records. Kiril Savino, @holacrat
  2. GC.com/about/product-team
  3. • have a sense of humor
     • know what use cases work best
     • remember that databases are hard
     • don’t understate the difficulty in scaling up
  4. • 1,480,808,857 events
     • 8 terabytes of primary data
     • 35 nodes
     • 420GB RAM on primaries
     • 21TB SSD storage
     • 14TB EBS storage
     • 120,000 ops/s
  5. • Model
     • Scale
     • Grow
     • Extend
  6. Model
  7. November 2009: MongoDB 1.2
     • More indexes per collection
     • Faster index creation
     • Map/Reduce
     • Stored JavaScript functions
     • Configurable fsync time
     • Several small features and fixes
     {.}
  8. {.?!?.}
  9. Decoding/Unmarshalling Django ORM {.} [---] business logic REST API MySQL
  10. Decoding/Unmarshalling Django ORM REST API {.} [---] business logic MySQL
  11. Inning, Outs, Balls, Strikes, Pitcher, Batter
  12. Inning, Outs, Balls, Strikes, Pitcher, Batter, Period, Minute, Location, Shooter, Rebounder, Assist
  13. [play] [participant] [role] [sport] [play_property]
  14. [play] [participant] [role] [sport] [play_property]
  15. {_id: ObjectId(),
       code: "1B",
       participants: [{player_id: ObjectId(), roles: ["batter", "out"]},
                      {player_id: ObjectId(), roles: ["pitcher"]}],
       situation: {outs: 1, balls: 2, strikes: 0},
       properties: {location: [0.45, 0.721]}}
  16. {_id: ObjectId(),
       code: "shot",
       participants: [{player_id: ObjectId(), roles: ["shooter"]},
                      {player_id: ObjectId(), roles: ["rebounder"]}],
       situation: {period: 1, time: "5:29"},
       properties: {location: [0.45, 0.721]}}
  17. Decoding/Unmarshalling Django ORM REST API {.} business logic {.} MongoDB
  18. Decoding/Unmarshalling Django ORM REST API {.} business logic {.} MongoDB 👏
  19. [no text on slide]
  20. Modeling data in MongoDB
      • JSON won the internet
      • Don’t write your own JSON storage engine
      • Flexible schemas promote app simplicity
      • Validation is your responsibility
      • Invest in schema design early
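
The "Validation is your responsibility" point on slide 20 can be sketched in application code. This is a minimal illustration in Python, not GameChanger's actual validator: the function name validate_play and every rule in it are assumptions, inferred only from the shape of the example play documents on slides 15 and 16. Player ids are plain strings so the sketch runs without a MongoDB driver, and the 0-to-1 range for location is a guess based on the sample values.

```python
from typing import Any


def validate_play(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the document passed.

    All rules here are hypothetical, inferred from the sample play documents.
    """
    errors: list[str] = []

    code = doc.get("code")
    if not isinstance(code, str) or not code:
        errors.append("code must be a non-empty string")

    participants = doc.get("participants")
    if not isinstance(participants, list) or not participants:
        errors.append("participants must be a non-empty list")
    else:
        for i, participant in enumerate(participants):
            if not isinstance(participant, dict) or "player_id" not in participant:
                errors.append(f"participants[{i}] needs a player_id")
                continue
            roles = participant.get("roles")
            if (not isinstance(roles, list) or not roles
                    or not all(isinstance(r, str) for r in roles)):
                errors.append(f"participants[{i}].roles must be a non-empty list of strings")

    if not isinstance(doc.get("situation"), dict):
        errors.append("situation must be an object")

    properties = doc.get("properties", {})
    if not isinstance(properties, dict):
        errors.append("properties must be an object")
    elif "location" in properties:
        location = properties["location"]
        # Assumed convention: a pair of coordinates normalized to [0, 1].
        ok = (isinstance(location, list) and len(location) == 2
              and all(isinstance(x, (int, float)) and not isinstance(x, bool)
                      and 0.0 <= x <= 1.0 for x in location))
        if not ok:
            errors.append("properties.location must be two numbers between 0 and 1")

    return errors


# Mirrors the baseball document on slide 15, with string ids.
baseball_play = {
    "code": "1B",
    "participants": [
        {"player_id": "player-a", "roles": ["batter", "out"]},
        {"player_id": "player-b", "roles": ["pitcher"]},
    ],
    "situation": {"outs": 1, "balls": 2, "strikes": 0},
    "properties": {"location": [0.45, 0.721]},
}

# Deliberately malformed: empty code, participant without a player_id.
broken_play = {
    "code": "",
    "participants": [{"roles": ["shooter"]}],
    "situation": {"period": 1},
}

print(validate_play(baseball_play))
print(validate_play(broken_play))
```

Running a check like this before every insert keeps the flexible schema from drifting, since the database itself will accept either document.
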
  21. Scale
  22. [no text on slide]
  23. [no text on slide]
  24. [no text on slide]
  25. $$$
  26. $$$ 😱
  27. User Load, System Latency
  28. User Load, System Latency
  29. User Load, System Latency
  30. Scaling is the process of decoupling load from latency.
  31. Latency comes from
      • Writing data to your database
      • Reading data from your database
      • Aggregating data from multiple locations
      • Running complex calculations
  32. {.} This is a document.
  33. {.} {.} {.} {.} {.} API MongoDB Browser
  34. {.} {.} {.} {.} {.} API MongoDB Browser
  35. {.} {.} {.} {.} {.} API MongoDB Browser +/-*
  36. Read Load, System Latency
  37. {.} {.} {.} {.} {.} API MongoDB Browser
  38. {.} {.} {.} {.} {.} API MongoDB Browser +/-*
  39. Write Load, System Latency
  40. {.} {.} {.} {.} {.} API MongoDB Browser Background +/-*
  41. {.} {.} {.} {.} {.} API MongoDB Browser Background +/-*
  42. User Load, System Latency
  43. {.} {.} {.}
  44. {.} {.}{.} {.} }
  45. {.} {.}{.} {.} }
  46. [no text on slide]
  47. Scaling data access
      • Decouple load from latency
      • Queries are expensive
      • Aggregation is expensive
      • Do calculation in the background
      • Serve content from single* documents
  48. Grow
  49. [no text on slide]
  50. [no text on slide]
  51. [no text on slide]
  52. {.}
  53. {.}
  54. {.}
  55. {.}
  56. [no text on slide]
  57. {.} {$addToSet: {a: 2}}
  58. {.} {$addToSet: {a: 2}} {.} {v: 2}, {$set: {v: 3}}
  59. {.}
  60. [no text on slide]
  61. {.} {.}
  62. {a} {abc}{b} {c} }
  63. {.}
  64. {.} {.}
  65. {.} {.} {.}
  66. {.} {.} {.}
  67. {.} {.} {.}
  68. {.} {.} {.}
  69. {.} {.} {.}
  70. {.} {.} {.}
  71. <id> <id> <id> <id> <id> <id> <id> To Propagate
  72. <id> <id> <id> <id> <id> <id> <id> To Propagate Propagating…
  73. <id> <id> <id> <id> <id> <id> <id> To Propagate Propagating… <id> {.} {.} {.}
  74. {$} {$} {$} {$} {$}
  75. Growing load
      • Denormalize for constant access time
      • Use MongoDB atomic operators
      • Check out optimistic locking and MVCC
      • Leverage external concurrency control
      • Watch your oplog
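
Slide 58 pairs the filter {v: 2} with the update {$set: {v: 3}}, which reads like version-based optimistic locking. The following is a minimal in-memory sketch of that idea, assuming a version field named v as on the slide; the helper names conditional_update and update_with_retry are hypothetical, and a dict stands in for the collection. With a real MongoDB driver the version check and the write would be one conditional update rather than two Python statements.

```python
import copy
from typing import Callable

store: dict[str, dict] = {}  # stand-in for a collection, keyed by _id


def conditional_update(doc_id: str, expected_version: int, changes: dict) -> bool:
    """Apply changes only if the stored version still matches.

    Mimics filtering on {_id, v: expected} and setting v to expected + 1.
    """
    current = store.get(doc_id)
    if current is None or current["v"] != expected_version:
        return False  # someone else wrote first; the caller must re-read
    current.update(changes)
    current["v"] = expected_version + 1
    return True


def update_with_retry(doc_id: str, mutate: Callable[[dict], dict],
                      max_attempts: int = 5) -> dict:
    """Read, compute changes from a snapshot, and retry on version conflict."""
    for _ in range(max_attempts):
        snapshot = copy.deepcopy(store[doc_id])
        changes = mutate(snapshot)
        if conditional_update(doc_id, snapshot["v"], changes):
            return store[doc_id]
    raise RuntimeError("gave up after repeated version conflicts")


store["game-1"] = {"_id": "game-1", "v": 2, "score": 0}

first = conditional_update("game-1", 2, {"score": 1})    # version matches
stale = conditional_update("game-1", 2, {"score": 99})   # version is now 3
final = update_with_retry("game-1", lambda doc: {"score": doc["score"] + 1})
print(first, stale, final)
```

In this single-threaded sketch the retry loop never actually conflicts; it is there to show where a concurrent writer would force a re-read.
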
  76. Extend
  77. {.} +
  78. [no text on slide]
  79. [no text on slide]
  80. [no text on slide]
  81. So there we have it
      • Design your schema to MongoDB’s strengths
      • Use monolithic documents
      • Don’t do (live) querying
      • You can still do transactional things
      • You may need to denormalize & propagate
      • Think about your overall architecture
  82. • have a sense of humor
      • know what use cases work best
      • remember that databases are hard
      • don’t understate the difficulty in scaling up
      @holacrat
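
The "denormalize & propagate" point on slide 81, together with the "To Propagate" queue pictured on slides 71 to 73, can be sketched as below. The slides carry no explanatory text, so this reading is an assumption: a changed record's id goes onto a queue, and a background pass copies the new value into every document that embeds it. The collections, names and sample data are all hypothetical, and dicts stand in for MongoDB.

```python
from collections import deque

# Source-of-truth documents (hypothetical sample data).
players = {"p1": {"_id": "p1", "name": "Sam"}}

# Team documents embed a copy of each player's name (denormalized),
# so a roster can be served from one document.
teams = {
    "t1": {"_id": "t1", "roster": [{"player_id": "p1", "name": "Sam"}]},
    "t2": {"_id": "t2", "roster": [{"player_id": "p1", "name": "Sam"}]},
}

to_propagate: deque[str] = deque()  # ids whose embedded copies are stale


def rename_player(player_id: str, new_name: str) -> None:
    """Update the source document and queue its id for propagation."""
    players[player_id]["name"] = new_name
    if player_id not in to_propagate:
        to_propagate.append(player_id)


def propagate_pending() -> int:
    """Background pass: push queued changes into every embedded copy."""
    touched = 0
    while to_propagate:
        player_id = to_propagate.popleft()
        name = players[player_id]["name"]
        for team in teams.values():
            for entry in team["roster"]:
                if entry["player_id"] == player_id and entry["name"] != name:
                    entry["name"] = name
                    touched += 1
    return touched


rename_player("p1", "Samantha")
stale_copy = teams["t1"]["roster"][0]["name"]  # embedded copy not yet updated
updated = propagate_pending()
print(stale_copy, updated, teams["t2"]["roster"][0]["name"])
```

Reads stay a single-document lookup; the cost moves to the write side, where embedded copies are briefly stale until the queue drains.
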
