Published in: Technology, Business

Tales from the Field

  1. Tales from the Field, or True Stories (Anonymized), or Don’t Solve The Wrong Problem. Richard Kreuter, Director of Consulting Engineering, MongoDB, Inc.
  2. These stories are (mostly) true. Only the names have been changed, to protect the (mostly) innocent.
  3. Story 1: Roberta the Retailer. Roberta the Retailer had an ecommerce site, selling diverse goods in 20+ countries.
  4. Product Catalog (before): { _id: 375, en_US : { name : ..., description : ..., <etc...> }, en_GB : { name : ..., description : ..., <etc...> }, fr_FR : { name : ..., description : ..., <etc...> }, de_DE : ..., de_CH : ..., <... and so on for other locales... > }
  5. What’s good about this solution? • Each document contains all the data about the product, across all possible locales. • It is the most efficient way to retrieve the English, French, German, etc. translations of a single product’s information in one query.
  6. But that’s not how product data is used (except by translation staff, maybe).
  7. Dominant Catalog Queries: db.catalog.find( { _id : 375 }, { en_US : true } ); db.catalog.find( { _id : 375 }, { fr_FR : true } ); db.catalog.find( { _id : 375 }, { de_DE : true } ); ... and so forth for other locales ...
  8. The Product Catalog's data model didn't fit the way the data are used.
  9. Consequences for the catalog: • The catalog documents contained 20x more data than any common use case demands. • MongoDB lets you request just a subset of a document's contents (via the 2nd argument to find())... • …but typically the whole document will get loaded into RAM in order to serve the query.
  10. Why is that an issue? { _id: 709, en_US : { name : ..., description : ..., <etc...> }, en_GB : { name : ..., description : ..., <etc...> }, fr_FR : { name : ..., description : ..., <etc...> }, de_DE : ..., de_CH : ..., <... and so on for other locales... > } { _id: 42, en_US : { name : ..., description : ..., <etc...> }, en_GB : { name : ..., description : ..., <etc...> }, fr_FR : { name : ..., description : ..., <etc...> }, de_DE : ..., de_CH : ..., <... and so on for other locales... > } { _id: 3600, en_US : { name : ..., description : ..., <etc...> }, en_GB : { name : ..., description : ..., <etc...> }, fr_FR : { name : ..., description : ..., <etc...> }, de_DE : ..., de_CH : ..., <... and so on for other locales... > } Data in RED are being used. Data in BLUE take up memory but aren't in demand.
  11. So what's the right approach for the problem?
  12. Design for your use case: 99.99% of queries want the product data for exactly one locale at a time.
  13. Product Catalog (after): { _id: "375-en_US", name : ..., description : ..., <etc...> } { _id: "375-en_GB", name : ..., description : ..., <etc...> } { _id: "375-fr_FR", name : ..., description : ..., <etc...> } ... and so on for other locales ...
  14. Consequences of the redesign: • Queries induced minimal memory overhead. • 20x as many distinct products fit in RAM at once. • Disk I/O utilization was reduced. • UI latency decreased. • Profit (well, we hope).
  15. Story #2: Sal the Securities Trader. Sal had some software for analyzing the day's trades.
  16. Sal's Shard Key (before): sh.shardCollection( "mydb.trades", { "analytics_serverid" : 1 } );
  17. Why did Sal pick this approach?
  18. What's good about this architecture? … 60 more servers…
  19. Why the shard key was an issue: None of Sal's clients ever cared what server analyzed the data.
  20. All queries became scatter/gather.
  21. Adding shards didn't help queries.
  22. And there were subtler issues: • MongoDB's sharding will automatically rebalance data as you add shards. • But a low-cardinality shard key (one with few distinct values) will inhibit balancing.
  23. What would have been a better shard key? Very nearly anything. Really. (Sal picked a local pessimum in the option space.)
  24. What did we propose? • The common query patterns were based on security_id and time. • The compound key { sid : 1, ts : 1 } made a good shard key.
  25. The outcome: success! • Read throughput increased 500%. • Balancing worked as expected.
  26. Story #3: Bill the Bulk Updater. Bill built a system that tracked status information for entities in his business domain. State changes happen in batches; sometimes 10% of entities get updated, sometimes 100% get updated.
  27. Bill's architecture: Application / mongos, mongod
  28. What happened when it went into production? Bill's system was a success! The number of business entities grew by a factor of 5.
  29. Bill's eventual architecture: Application / mongos, …16 more shards…, mongod
  30. Bill's cluster scaled linearly! (Bill's TCO scaled linearly, too.)
  31. … and the usage was going to grow…
  32. What problem did Bill overlook? Horizontal Scaling = Linear Scaling
  33. What we recommended: Scale up the random IOPS!
  34. Bill's final architecture: Application / mongos, mongod SSD
  35. So what can we say? • Even smart people sometimes solve the wrong problems: – Roberta's product catalog was optimized for non-existent usage – Sal solved for data ingestion, not query – Bill went horizontal when vertical was needed • Often, MongoDB's staff can tell you in advance when you're going in the wrong direction… • … and what you ought to do to get where you need to arrive.
  36. Thank you
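
The catalog redesign in Story 1 (one document per product and locale, instead of one document per product) can be sketched in plain JavaScript. This is only an illustration under assumptions: the helper name `splitByLocale` and the sample field values are hypothetical and do not come from the talk; only the `_id` scheme ("375-en_US") follows the "Product Catalog (after)" slide.

```javascript
// Hypothetical helper: split one "before" catalog document (all locales
// embedded) into one "after" document per locale, keyed "<productId>-<locale>".
function splitByLocale(product) {
  // Everything except _id is assumed to be a locale sub-document.
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}

// Illustrative sample data; the field values are made up.
const catalogBefore = {
  _id: 375,
  en_US: { name: "Color pencils", description: "Box of 12" },
  fr_FR: { name: "Crayons de couleur", description: "Boîte de 12" },
};

const catalogAfter = splitByLocale(catalogBefore);
console.log(catalogAfter.map((doc) => doc._id));
```

With documents shaped like this, the dominant query would look up a single small document by its compound `_id` (for example `"375-en_US"`), so only the one locale actually requested has to be held in memory.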

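
The cardinality point from Story 2 can also be checked with a rough sketch. A chunk cannot be split below a single shard key value, so the number of distinct key values limits how finely data can be spread across shards. The function `distinctKeyValues` and the synthetic trade data below are assumptions made for illustration; only the field names `analytics_serverid`, `sid`, and `ts` come from the slides.

```javascript
// Count distinct shard-key values in a sample of documents.
// keyFields lists the fields that make up the (possibly compound) key.
function distinctKeyValues(docs, keyFields) {
  const seen = new Set();
  for (const doc of docs) {
    // Serialize the key tuple so that compound keys compare by value.
    seen.add(JSON.stringify(keyFields.map((field) => doc[field])));
  }
  return seen.size;
}

// Made-up sample: 2 analytics servers, 50 securities, 1,000 trades,
// each trade with a unique timestamp.
const trades = [];
for (let i = 0; i < 1000; i++) {
  trades.push({ analytics_serverid: i % 2, sid: i % 50, ts: i });
}

const serverKeyCount = distinctKeyValues(trades, ["analytics_serverid"]);
const compoundKeyCount = distinctKeyValues(trades, ["sid", "ts"]);
console.log({ serverKeyCount, compoundKeyCount });
```

In this synthetic sample the server-based key yields far fewer distinct values than the compound key, which is the balancing problem the slides describe. In the shell, the proposed key would be declared with `sh.shardCollection( "mydb.trades", { sid : 1, ts : 1 } )`, reusing the namespace from Sal's original command.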