Scaling with mongo db - SF Mongo User Group 7-19-2011


- MongoDB allows scaling by using documents, optimizing indexes, and understanding your working data set size.
- Replica sets can scale reads by adding secondary nodes for load balancing, while sharding scales writes and RAM usage by splitting data across multiple shards.
- Proper disk configuration and replication are important to maximize performance when scaling with MongoDB.




1. Scaling with MongoDB. Jared Rosoff (jsr@10gen.com) - @forjared

3. How do we do it today? We use a relational database but …
   - We don’t use joins
   - We don’t use transactions
   - We add read-only slaves
   - We added a caching layer
   - We de-normalized our data
   - We implemented custom sharding
   - We buy bigger servers

4. How’s that working out for you?

5. Costs go up

6. Productivity goes down

7. By engineers, for engineers

8. The landscape (chart: Scalability & Performance vs. Depth of functionality; points: Memcached, Key / Value, RDBMS)

9. Scaling your app
   - Use documents
   - Indexes make me happy
   - Knowing your working set
   - Disks are the bottleneck
   - Replication makes reading fun
   - Sharding for profit

10. Scaling your data model

11. Documents
    {
      author : "roger",
      date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
      text : "Spirited Away",
      tags : [ "Tezuka", "Manga" ],
      comments : [
        {
          author : "Fred",
          date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
          text : "Best Movie Ever"
        }
      ]
    }

12. Disk Seeks & Data Locality: Read = really really fast; Seek = 5+ ms

13. Disk Seeks & Data Locality (diagram: Post, Comment, Author)

14. Disk Seeks & Data Locality (diagram: Post, Author, and five Comments)
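
One way to read the two locality diagrams is as a seek count. The sketch below is only an illustration: it assumes that in a normalized layout the post, its author, and each comment sit in separate places on disk, while an embedded document is stored contiguously. The function names are hypothetical, and the 5 ms figure is the lower bound from the slide above.

```javascript
// Seek cost taken from the "Seek = 5+ ms" slide; real disks vary.
const SEEK_MS = 5;

// Normalized layout (assumed): post, author, and each comment are stored
// separately, so each one costs its own seek.
function seeksNormalized(commentCount) {
  return 1 /* post */ + 1 /* author */ + commentCount;
}

// Embedded layout (assumed): the whole post document is contiguous,
// so reading it costs one seek followed by a sequential read.
function seeksEmbedded() {
  return 1;
}

const commentCount = 5; // the five comments drawn on the slide
const normalizedMs = seeksNormalized(commentCount) * SEEK_MS;
const embeddedMs = seeksEmbedded() * SEEK_MS;

console.log(`normalized: ${normalizedMs} ms, embedded: ${embeddedMs} ms`);
```

Under these assumptions the gap grows linearly with the number of comments, which is the argument for keeping data that is read together in one document.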

15. Optimized indexes

16. Table scans: Find where x equals 7. Scanning 1 2 3 4 5 6 7 in order: looked at 7 objects.

17. Tree Lookup: Find where x equals 7. Tree with 4 at the root, 6 and 2 below it, and 7 5 3 1 as leaves: looked at 3 objects.
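
The two slides above can be reproduced with a small counting sketch. It uses a plain binary search tree as a stand-in for an index; MongoDB's indexes are B-trees, so this only illustrates the idea, and all function names here are hypothetical.

```javascript
// Linear scan: examine items in storage order until a match is found.
function tableScan(items, target) {
  let looked = 0;
  for (const item of items) {
    looked++;
    if (item === target) break;
  }
  return looked;
}

// Minimal unbalanced binary search tree, standing in for an index.
function insert(node, value) {
  if (node === null) return { value, left: null, right: null };
  if (value < node.value) node.left = insert(node.left, value);
  else node.right = insert(node.right, value);
  return node;
}

// Walk from the root, counting every node examined.
function treeLookup(root, target) {
  let looked = 0;
  let node = root;
  while (node !== null) {
    looked++;
    if (target === node.value) break;
    node = target < node.value ? node.left : node.right;
  }
  return looked;
}

// Insertion order chosen so that 4 is the root, 2 and 6 sit below it,
// and 1, 3, 5, 7 are leaves, as on the slide.
let root = null;
for (const v of [4, 2, 6, 1, 3, 5, 7]) root = insert(root, v);

const scanned = tableScan([1, 2, 3, 4, 5, 6, 7], 7);
const walked = treeLookup(root, 7);
console.log(`table scan looked at ${scanned}, tree lookup looked at ${walked}`);
```

With seven values the difference is 7 versus 3; the scan grows linearly with collection size while the tree walk grows with the tree's height.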

18. Random Index: Entire index must fit in RAM

19. Right Aligned: Only small portion in RAM
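
A rough way to see why the two index shapes differ is to count how many index pages a batch of inserts touches. The page capacity, key counts, and the prime-multiplier trick used to scatter keys are all assumptions made for this sketch, not MongoDB internals.

```javascript
const KEYS_PER_PAGE = 100;   // assumed index page capacity
const EXISTING_KEYS = 10000; // assumed number of keys already indexed
const INSERTS = 200;

// Count the distinct index pages a batch of key inserts lands on.
function pagesTouched(keys) {
  const pages = new Set();
  for (const key of keys) pages.add(Math.floor(key / KEYS_PER_PAGE));
  return pages.size;
}

// Right-aligned: keys only ever increase (timestamps, counters).
const increasingKeys = Array.from(
  { length: INSERTS },
  (_, i) => EXISTING_KEYS + i
);

// Random-like: keys scattered over the existing key space. Multiplying by
// a prime and taking the remainder is a deterministic stand-in for hashes.
const scatteredKeys = Array.from(
  { length: INSERTS },
  (_, i) => (i * 7919) % EXISTING_KEYS
);

const hotPagesRightAligned = pagesTouched(increasingKeys);
const hotPagesRandom = pagesTouched(scatteredKeys);

console.log(`right-aligned inserts touched ${hotPagesRightAligned} pages`);
console.log(`scattered inserts touched ${hotPagesRandom} pages`);
```

Increasing keys keep landing on the rightmost pages, so only those need to stay in RAM; scattered keys can land anywhere, so any page may be needed next.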

20. Working set size

21. Working Set = Active Documents + Used Indexes (diagram: RAM, Disk)

22. Page Fault (diagram: App, RAM, Disk)
    1. App requests document
    2. Document not in memory
    3. Evict a page from memory
    4. Read block from disk
    5. Return document from memory
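
The page-fault steps above can be simulated to show what happens when the working set outgrows RAM. This sketch assumes a strict least-recently-used eviction policy and a cyclic access pattern; a real operating system's page cache is more sophisticated, and the function names are hypothetical.

```javascript
// Count page faults for a sequence of page accesses, given a RAM size in
// pages and (assumed) least-recently-used eviction.
function countPageFaults(accesses, ramPages) {
  const ram = new Map(); // Map keeps insertion order, used here as recency
  let faults = 0;
  for (const page of accesses) {
    if (ram.has(page)) {
      // Hit: move the page to the most-recently-used position.
      ram.delete(page);
      ram.set(page, true);
      continue;
    }
    // Miss: this is the page fault from the slide.
    faults++;
    if (ram.size >= ramPages) {
      const leastRecent = ram.keys().next().value;
      ram.delete(leastRecent); // evict a page from memory
    }
    ram.set(page, true); // read block from disk into memory
  }
  return faults;
}

// Touch pages 0..workingSetPages-1 over and over.
function cyclicAccesses(workingSetPages, count) {
  return Array.from({ length: count }, (_, i) => i % workingSetPages);
}

const RAM_PAGES = 4;
const faultsWhenFits = countPageFaults(cyclicAccesses(3, 20), RAM_PAGES);
const faultsWhenTooBig = countPageFaults(cyclicAccesses(5, 20), RAM_PAGES);

console.log(`working set of 3 pages: ${faultsWhenFits} faults`);
console.log(`working set of 5 pages: ${faultsWhenTooBig} faults`);
```

In this model a working set that fits only faults while warming up, whereas one that is a single page too large faults on every access.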

23. Figuring out working set
    > db.foo.stats()
    {
      "ns" : "test.foo",
      "count" : 1338330,
      "size" : 46915928,                  // Size of data
      "avgObjSize" : 35.05557523181876,   // Average document size
      "storageSize" : 86092032,           // Size on disk (and in memory!)
      "numExtents" : 12,
      "nindexes" : 2,
      "lastExtentSize" : 20872960,
      "paddingFactor" : 1,
      "flags" : 0,
      "totalIndexSize" : 99860480,        // Size of all indexes
      "indexSizes" : {                    // Size of each index
        "_id_" : 55877632,
        "x_1" : 43982848
      },
      "ok" : 1
    }
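
The stats output above gives enough to bound the working set from above. The sketch below copies the relevant fields into a plain object and compares storage plus indexes against a hypothetical 8 GB of RAM. Treat the result as a ceiling: the real working set is only the actively used part of the data and indexes.

```javascript
// Fields copied from the db.foo.stats() output on the slide (bytes).
const stats = {
  size: 46915928,
  storageSize: 86092032,
  totalIndexSize: 99860480,
  indexSizes: { _id_: 55877632, x_1: 43982848 },
};

const toMB = (bytes) => bytes / (1024 * 1024);

// Upper bound: everything on disk for this collection plus all its indexes.
const upperBoundBytes = stats.storageSize + stats.totalIndexSize;

// Sanity check: the per-index sizes should add up to the reported total.
const indexSum = Object.values(stats.indexSizes).reduce((a, b) => a + b, 0);

// Hypothetical server memory, chosen only for illustration.
const ramBytes = 8 * 1024 ** 3;
const fitsInRam = upperBoundBytes <= ramBytes;

console.log(`upper bound: ${toMB(upperBoundBytes).toFixed(1)} MB`);
console.log(`index sizes consistent: ${indexSum === stats.totalIndexSize}`);
console.log(`fits in RAM: ${fitsInRam}`);
```

Note that the indexes here are larger than the data itself, which is why index size deserves as much attention as document size when sizing RAM.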

24. Disk configurations

25. Single Disk ~200 seeks / second

26. RAID0 (diagram, three labels): ~200 seeks / second each

27. RAID10 (diagram, three labels): ~400 seeks / second each
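
The per-disk figure on these slides lines up with the earlier "Seek = 5+ ms" number, and the aggregate rates follow by multiplication. This is an idealized sketch: it assumes seeks spread evenly across disks, that each RAID10 label refers to a mirrored pair whose two disks can both serve reads, and it ignores writes, which must go to both mirrors. The function names are hypothetical.

```javascript
const SEEK_MS = 5; // from the "Seek = 5+ ms" slide

// One disk can do at most this many seeks per second.
const seeksPerDisk = 1000 / SEEK_MS;

// RAID0: data is striped, so each disk serves its own share of seeks.
function raid0Seeks(disks) {
  return disks * seeksPerDisk;
}

// RAID10: stripes of mirrored pairs; either disk in a pair can serve a
// read, so each pair is assumed to handle twice a single disk's read seeks.
function raid10ReadSeeks(mirroredPairs) {
  return mirroredPairs * 2 * seeksPerDisk;
}

console.log(`single disk: ~${seeksPerDisk} seeks/s`);
console.log(`RAID0, 3 disks: ~${raid0Seeks(3)} seeks/s`);
console.log(`RAID10, 3 pairs: ~${raid10ReadSeeks(3)} read seeks/s`);
```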

28. Replication

29. Replica Sets (diagram: one Primary taking reads and writes, two Secondaries taking reads)

30. Replica Sets (diagram: one Primary taking reads and writes, four Secondaries taking reads)
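
The read-scaling idea in these diagrams can be sketched as a tiny router: writes always go to the primary, reads rotate across secondaries. `ReplicaSetRouter` is a hypothetical class written for this illustration, not a driver API; real drivers handle this through their own read-routing settings, and reads from a secondary may lag behind the primary.

```javascript
// Hypothetical router: one primary for writes, round-robin reads
// across secondaries.
class ReplicaSetRouter {
  constructor(primary, secondaries) {
    this.primary = primary;
    this.secondaries = secondaries;
    this.nextRead = 0;
  }

  // All writes go to the single primary.
  routeWrite() {
    return this.primary;
  }

  // Reads rotate across secondaries; fall back to the primary if none.
  routeRead() {
    if (this.secondaries.length === 0) return this.primary;
    const node = this.secondaries[this.nextRead % this.secondaries.length];
    this.nextRead++;
    return node;
  }
}

const router = new ReplicaSetRouter("primary", ["secondary-1", "secondary-2"]);

// Tally where eight reads end up.
const readCounts = {};
for (let i = 0; i < 8; i++) {
  const node = router.routeRead();
  readCounts[node] = (readCounts[node] || 0) + 1;
}

console.log(readCounts);
console.log(`writes go to: ${router.routeWrite()}`);
```

Adding a secondary to the list lowers each node's share of reads, while the write path stays a single node, which is the gap sharding is meant to close.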

31. Sharding

32. Sharding (diagram: two MongoS routers in front of four shards, Shard 1 0..10, Shard 2 10..20, Shard 3 20..30, Shard 4 30..40; each shard has one Primary and two Secondaries)

33. 400GB Index?

34. 400GB Index? Split across Shard 1 0..10, Shard 2 10..20, Shard 3 20..30, Shard 4 30..40: 100GB Index! on each shard
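
The range layout on the sharding slides can be sketched as a lookup table. The ranges come from the slides; treating them as half-open (lower bound included, upper bound excluded) and assuming an even split of the index are choices made for this example. `routeToShard` is a hypothetical helper, not the mongos implementation.

```javascript
// Shard key ranges from the slide, treated as [min, max).
const shards = [
  { name: "Shard 1", min: 0, max: 10 },
  { name: "Shard 2", min: 10, max: 20 },
  { name: "Shard 3", min: 20, max: 30 },
  { name: "Shard 4", min: 30, max: 40 },
];

// Find the shard whose range contains the key.
function routeToShard(key) {
  const shard = shards.find((s) => key >= s.min && key < s.max);
  if (!shard) throw new RangeError(`no shard owns key ${key}`);
  return shard.name;
}

// If data is spread evenly, each shard holds an equal slice of the index.
const TOTAL_INDEX_GB = 400;
const indexPerShardGB = TOTAL_INDEX_GB / shards.length;

console.log(`key 7 -> ${routeToShard(7)}`);
console.log(`key 25 -> ${routeToShard(25)}`);
console.log(`index per shard: ${indexPerShardGB} GB`);
```

Each shard then needs RAM for roughly a quarter of the index, and writes for different key ranges land on different primaries.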

35. Summary

36. Summary
    - Use documents to your advantage!
    - Optimize your indexes
    - Understand your working set
    - Use a sane disk configuration
    - Use replicas to scale reads
    - Use sharding to scale writes & working RAM

Editor's Notes

Let’s talk about infrastructure costs. You probably started building your application on top of an RDBMS. This is the way we have built enterprise and web applications for years. But the problem is that your RDBMS doesn’t have a smooth cost curve when you scale it up. When you start off, you may be running on a smaller server, totally adequate for your load. When you exceed the capacity of that small server, you need to buy a bigger server. You can’t add a second small server. This process repeats: you exceed the capacity of your new server and upgrade your hardware again. There are two long-term problems with this:
 - As you scale up, you end up paying more and more for each transaction that your system processes. A small server may cost you $1,000 per CPU, but when you need 128 processors, you might be paying as much as $100,000 per CPU. Each incremental step up in hardware gets more expensive, not cheaper.
 - You reach an end of this scaling approach. Once you have scaled up to the biggest hardware platform available on the market, there is nowhere to go; no bigger box to buy. At this point you need to change strategies, even if you can afford those ultra-high-end boxes.
And while we’ve been spending more and more money on hardware, our developer productivity has gone down too. You will hear this story over and over again from CIOs and architects: “Well, we use <insert RDBMS> but we don’t use joins or transactions and we’ve de-normalized our schema.” As our hardware gets more and more expensive, we ask our developers to squeeze more and more performance out of the same box. To achieve this, they go through “herculean efforts” to strip their code of the advanced features that once made them productive. De-normalizing data, eliminating joins and transactions, adding caching and sharding layers… These are risky projects that slow down feature velocity.
