Scaling with MongoDB
Jared Rosoff (jsr@10gen.com) - @forjared
How do we do it today?
We use a relational database, but …
We don’t use joins
We don’t use transactions
We add read-only slaves
We add a caching layer
We de-normalize our data
We implement custom sharding
We buy bigger servers
How’s that working out for you?
Costs go up
Productivity goes down
By engineers, for engineers
The landscape
[chart: systems plotted on two axes, scalability & performance vs. depth of functionality; Memcached and key/value stores at the scalability end, RDBMSs at the depth-of-functionality end]
Scaling your app
Use documents
Indexes make me happy
Knowing your working set
Disks are the bottleneck
Replication makes reading fun
Sharding for profit
Scaling your data model
Documents
{
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
    {
      author : "Fred",
      date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text : "Best Movie Ever"
    }
  ]
}
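A minimal mongo shell sketch of storing a post like the one above (the collection name posts is an assumption; newer shells prefer insertOne over insert):

// Save the post with its tags and comments embedded in one document
db.posts.insert({
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
    { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Best Movie Ever" }
  ]
})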
Disk Seeks & Data Locality
Read = really, really fast
Seek = 5+ ms
Disk Seeks & Data Locality
[diagram: Post, Comment, and Author records stored separately, scattered across the disk; assembling a page costs a seek per record]
Disk Seeks & Data Locality
[diagram: one Post document with its Author and Comments embedded, stored contiguously; one seek retrieves everything]
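To make the payoff concrete, a hedged sketch against the hypothetical posts collection from above: one query returns the post and every comment from one contiguous region, and dot notation still reaches the embedded fields.

// The whole post, author, tags, and comments come back in a single read
db.posts.findOne({ author : "roger" })

// Dot notation matches against fields inside the embedded comments array
db.posts.find({ "comments.author" : "Fred" })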
Optimized indexes
Table scans
Find where x equals 7
[diagram: linear scan across documents 1 through 7]
Looked at 7 objects
Tree Lookup
Find where x equals 7
[diagram: B-tree traversal, root to leaf]
Looked at 3 objects
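A sketch of building the index behind that lookup (collection and field names are assumptions; ensureIndex is the shell helper of this era, newer shells call it createIndex):

// Build a B-tree index on x
db.foo.ensureIndex({ x : 1 })

// explain() reports how much work the query did; with the index it should
// examine a handful of objects instead of scanning the whole collection
// (field names in the output vary by server version)
db.foo.find({ x : 7 }).explain()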
Random Index
Entire index must fit in RAM
Right Aligned
Only a small portion in RAM
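A right-aligned index falls out of an ever-increasing key. A hedged sketch, assuming a hypothetical events collection with an insertion timestamp: new keys always land at the right edge of the B-tree, and queries for recent data touch only those right-hand pages, so only that slice needs to be in RAM.

// Timestamps only grow, so inserts always hit the rightmost B-tree pages
db.events.ensureIndex({ created_at : 1 })

// Reads of recent data stay on the same hot right-hand pages
db.events.find({ created_at : { $gte : new Date(Date.now() - 3600 * 1000) } })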
Working set size
Working Set
Working set = active documents + used indexes
[diagram: the working set held in RAM, the rest of the data on disk]
Page Fault
1. App requests document
2. Document not in memory
3. Evict a page from memory
4. Read block from disk
5. Return document from memory
[diagram: the five steps flowing between App, RAM, and Disk]
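If you want to watch faulting happen, a hedged sketch: on Linux builds the server reports a cumulative fault counter through serverStatus() (the exact field layout varies by version).

// Cumulative page faults since the server started (Linux)
db.serverStatus().extra_info.page_faults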
Figuring out working set
> db.foo.stats()
{
	"ns" : "test.foo",
	"count" : 1338330,
	"size" : 46915928,                  // size of data
	"avgObjSize" : 35.05557523181876,   // average document size
	"storageSize" : 86092032,           // size on disk (and in memory!)
	"numExtents" : 12,
	"nindexes" : 2,
	"lastExtentSize" : 20872960,
	"paddingFactor" : 1,
	"flags" : 0,
	"totalIndexSize" : 99860480,        // size of all indexes
	"indexSizes" : {                    // size of each index
		"_id_" : 55877632,
		"x_1" : 43982848
	},
	"ok" : 1
}
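A hedged back-of-the-envelope using those fields: a crude upper bound on the working set is everything stored on disk plus every index, and ideally that fits in RAM.

var s = db.foo.stats()
// Upper bound: all mapped data plus all indexes
var upperBound = s.storageSize + s.totalIndexSize
print("working set upper bound: " + (upperBound / 1024 / 1024).toFixed(1) + " MB")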
Disk configurations
Single Disk
~200 seeks / second
RAID0
[diagram: three disks striped together, ~200 seeks / second each]
RAID10
[diagram: three mirrored pairs, ~400 seeks / second each]
Replication
Replica Sets
[diagram: one Primary serving reads and writes, two Secondaries serving reads]
Replica Sets
[diagram: one Primary serving reads and writes, four Secondaries serving reads; adding secondaries adds read capacity]
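A hedged sketch of standing up a small set like the ones pictured (hostnames are placeholders) and then reading from a secondary; rs.slaveOk() is the call from this era, newer shells use setReadPref():

// Run once against the member that should become primary
rs.initiate({
  _id : "rs0",
  members : [
    { _id : 0, host : "db1.example.com:27017" },
    { _id : 1, host : "db2.example.com:27017" },
    { _id : 2, host : "db3.example.com:27017" }
  ]
})

// Allow this connection to read from a secondary
rs.slaveOk()   // newer shells: db.getMongo().setReadPref("secondaryPreferred")
db.foo.find({ x : 7 })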
Sharding
[diagram: two mongos routers in front of four shards; each shard is a replica set with one Primary and two Secondaries; key ranges: Shard 1: 0..10, Shard 2: 10..20, Shard 3: 20..30, Shard 4: 30..40]
400GB Index?
400GB Index?
[diagram: the 400GB index split across four shards (Shard 1: 0..10, Shard 2: 10..20, Shard 3: 20..30, Shard 4: 30..40), each holding a 100GB index]
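A hedged sketch of carving a collection into ranges like those pictured (database, collection, and key names are assumptions; older shells used db.adminCommand with enablesharding / shardcollection instead of the sh helpers):

// Enable sharding on the database, then shard the collection on x
sh.enableSharding("test")
sh.shardCollection("test.foo", { x : 1 })

// mongos splits the key space into chunks (0..10, 10..20, ...) and spreads
// them across the shards, so each shard indexes only its own slice
sh.status()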
Summary
Summary
Use documents to your advantage!
Optimize your indexes
Understand your working set
Use a sane disk configuration
Use replicas to scale reads
Use sharding to scale writes & working RAM

Scaling with MongoDB - SF Mongo User Group 7-19-2011

Editor's Notes

  • #6 Let’s talk about infrastructure costs. You probably started building your application on top of an RDBMS; that is how we have built enterprise and web applications for years. The problem is that an RDBMS doesn’t have a smooth cost curve as you scale up. You start on a small server that is totally adequate for your load. When you exceed its capacity, you have to buy a bigger server; you can’t just add a second small one. The process repeats: you exceed the capacity of the new server and upgrade the hardware again. There are two long-term problems with this. First, you pay more and more for each transaction your system processes: a small server may cost you $1,000 per CPU, but by the time you need 128 processors you might be paying as much as $100,000 per CPU. Each incremental step up in hardware gets more expensive, not cheaper. Second, this scaling approach runs out of road. Once you have scaled up to the biggest hardware platform available on the market, there is no bigger box to buy. At that point you need to change strategies, even if you can afford those ultra-high-end boxes.
  • #7 And while we’ve been spending more and more money on hardware, our developer productivity has gone down too. You will hear this story over and over again from CIOs and architects: “Well, we use <insert RDBMS>, but we don’t use joins or transactions, and we’ve de-normalized our schema.” As our hardware gets more and more expensive, we ask our developers to squeeze more and more performance out of the same box. To achieve this, they go through “herculean efforts” to strip their code of the advanced features that once made them productive: de-normalizing data, eliminating joins and transactions, adding caching and sharding layers. These are risky projects that slow down feature velocity.