Scaling with MongoDB
Jared Rosoff (jsr@10gen.com) - @forjared
How do we do it today?
We use a relational database but …
  • We don’t use joins
  • We don’t use transactions
  • We add read-only slaves
  • We added a caching layer
  • We de-normalized our data
  • We implemented custom sharding
  • We buy bigger servers
How’s that working out for you?
Costs go up
Productivity goes down
By engineers, for engineers
The landscape
[Chart axes: Scalability & Performance, Depth of functionality; plotted: Memcached, Key / Value, RDBMS]
Scaling your app
  • Use documents
  • Indexes make me happy
  • Knowing your working set
  • Disks are the bottleneck
  • Replication makes reading fun
  • Sharding for profit
Scaling your data model
Documents
    {
      author : "roger",
      date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
      text : "Spirited Away",
      tags : [ "Tezuka", "Manga" ],
      comments : [
        {
          author : "Fred",
          date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
          text : "Best Movie Ever"
        }
      ]
    }
Disk Seeks & Data Locality
  • Read = really, really fast
  • Seek = 5+ ms
[Diagram: Post, Comment, Author]
[Diagram: Post, Author, Comment x5]
Optimized indexes
Table scans
Find where x equals 7
    1 2 3 4 5 6 7
Looked at 7 objects
Tree Lookup
Find where x equals 7
          4
      2       6
    1   3   5   7
Looked at 3 objects
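
The two slides above can be reproduced with a small sketch that counts how many objects each strategy examines. This is only an illustration: it uses a plain balanced binary search tree, as the slide's picture does, rather than the B-tree structure a real database index uses, and the function names (scanLookup, buildTree, treeLookup) are made up for this example.

```javascript
// Table scan: examine every value in order until the target is found.
function scanLookup(values, target) {
  let examined = 0;
  for (const v of values) {
    examined += 1;
    if (v === target) return { found: true, examined };
  }
  return { found: false, examined };
}

// Build a balanced binary search tree from an already sorted array.
function buildTree(sorted, lo = 0, hi = sorted.length - 1) {
  if (lo > hi) return null;
  const mid = Math.floor((lo + hi) / 2);
  return {
    key: sorted[mid],
    left: buildTree(sorted, lo, mid - 1),
    right: buildTree(sorted, mid + 1, hi),
  };
}

// Tree lookup: walk from the root, going left or right at each node.
function treeLookup(root, target) {
  let examined = 0;
  let node = root;
  while (node !== null) {
    examined += 1;
    if (target === node.key) return { found: true, examined };
    node = target < node.key ? node.left : node.right;
  }
  return { found: false, examined };
}

const values = [1, 2, 3, 4, 5, 6, 7];
const scanResult = scanLookup(values, 7);
const treeResult = treeLookup(buildTree(values), 7);
console.log(scanResult.examined, treeResult.examined); // 7 3
```

The tree visits 4, then 6, then 7, which matches the "Looked at 3 objects" figure on the slide.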
Random Index Entire index must fit in RAM
Right Aligned Only small portion in RAM
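
A toy model may help show why the two index shapes differ in RAM needs. It assumes, purely for illustration, that an index is a sorted key space cut into fixed-size pages, and it counts how many distinct pages a batch of inserts touches. All constants and names here are hypothetical, and a small deterministic generator stands in for random keys so that the run is repeatable.

```javascript
// Toy model: the index covers KEY_SPACE keys, split into pages of KEYS_PER_PAGE.
const KEY_SPACE = 1000000;
const KEYS_PER_PAGE = 1000; // 1,000 pages in total
const INSERTS = 5000;

// Count the distinct index pages that a list of inserted keys lands on.
function pagesTouched(keys) {
  const pages = new Set();
  for (const k of keys) pages.add(Math.floor(k / KEYS_PER_PAGE));
  return pages.size;
}

// Small deterministic linear congruential generator (illustration only).
function makeLcg(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

// Random index: keys are spread over the whole key space.
const rand = makeLcg(42);
const randomKeys = Array.from({ length: INSERTS }, () =>
  Math.floor(rand() * KEY_SPACE)
);

// Right-aligned index: keys only ever increase (timestamps, counters).
const increasingKeys = Array.from(
  { length: INSERTS },
  (_, i) => KEY_SPACE - INSERTS + i
);

const randomPages = pagesTouched(randomKeys);
const rightAlignedPages = pagesTouched(increasingKeys);
console.log(`random keys touch ${randomPages} pages`);
console.log(`increasing keys touch ${rightAlignedPages} pages`);
```

In this model the increasing keys touch only the last 5 pages, while the random keys touch most of the 1,000 pages, which is the intuition behind "entire index must fit in RAM" versus "only small portion in RAM".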
Working set size
Working Set = Active Documents + Used Indexes
[Diagram: RAM, Disk]
Page Fault
  1. App requests document
  2. Document not in memory
  3. Evict a page from memory
  4. Read block from disk
  5. Return document from memory
[Diagram: App, RAM, and Disk, with arrows numbered 1 to 5 for the steps above]
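
The page-fault sequence above can be sketched as a tiny cache simulation. The eviction policy used here (least recently used) and the PageCache class are assumptions made for illustration; the slide does not say how the page to evict is chosen, and the "disk read" is faked with a string.

```javascript
// Minimal page cache: a Map keeps insertion order, so the first key is the
// least recently used page once hits are re-inserted at the end.
class PageCache {
  constructor(capacity) {
    this.capacity = capacity; // number of pages that fit in "RAM"
    this.pages = new Map();
    this.faults = 0;
  }

  read(pageId) {
    if (this.pages.has(pageId)) {
      // Hit: the document is already in memory; refresh its recency.
      const data = this.pages.get(pageId);
      this.pages.delete(pageId);
      this.pages.set(pageId, data);
      return data;
    }
    // Miss (steps 2 to 4): evict if full, then "read the block from disk".
    this.faults += 1;
    if (this.pages.size >= this.capacity) {
      const oldest = this.pages.keys().next().value;
      this.pages.delete(oldest);
    }
    const data = `block-${pageId}`; // stand-in for a real disk read
    this.pages.set(pageId, data);
    return data; // step 5: return the document from memory
  }
}

const cache = new PageCache(2);
[1, 2, 1, 3, 2].forEach((id) => cache.read(id));
console.log(cache.faults); // 4
```

With room for two pages, only the second read of page 1 is served from memory; every other access pays for a disk read, which is why a working set larger than RAM hurts so much.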
Figuring out the working set
    > db.foo.stats()
    {
      "ns" : "test.foo",
      "count" : 1338330,
      "size" : 46915928,                  // Size of data
      "avgObjSize" : 35.05557523181876,   // Average document size
      "storageSize" : 86092032,           // Size on disk (and in memory!)
      "numExtents" : 12,
      "nindexes" : 2,
      "lastExtentSize" : 20872960,
      "paddingFactor" : 1,
      "flags" : 0,
      "totalIndexSize" : 99860480,        // Size of all indexes
      "indexSizes" : {                    // Size of each index
        "_id_" : 55877632,
        "x_1" : 43982848
      },
      "ok" : 1
    }
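
As a rough worked example, the numbers from the stats output above can be combined into an upper bound on the working set: if every document and every index were active, the collection would need storageSize plus totalIndexSize bytes of RAM. The helper names and the 1 GiB RAM figure are assumptions for illustration; a real working set is usually smaller, since only active documents and used indexes count.

```javascript
// Values copied from the db.foo.stats() output on the slide.
const stats = {
  size: 46915928,
  storageSize: 86092032,
  totalIndexSize: 99860480,
  indexSizes: { _id_: 55877632, x_1: 43982848 },
};

const MIB = 1024 * 1024;

// Upper bound: all data on disk plus all indexes resident at once.
function workingSetUpperBound(s) {
  return s.storageSize + s.totalIndexSize;
}

// Does the worst case fit in the given amount of RAM?
function fitsInRam(s, ramBytes) {
  return workingSetUpperBound(s) <= ramBytes;
}

const upperBound = workingSetUpperBound(stats);
const assumedRamBytes = 1024 * MIB; // hypothetical 1 GiB server

console.log((upperBound / MIB).toFixed(1) + " MiB"); // 177.3 MiB
console.log(fitsInRam(stats, assumedRamBytes)); // true
```

Note that the two entries in indexSizes add up exactly to totalIndexSize, and that the indexes are larger than the data itself in this collection.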
Disk configurations
Single Disk ~200 seeks / second
RAID0 ~200 seeks / second ~200 seeks / second ~200 seeks / second
RAID10 ~400 seeks / second ~400 seeks / second ~400 seeks / second
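
The seek figures on these slides follow from simple arithmetic, sketched below. The reading of the diagrams is an assumption: each "~200" or "~400" is taken to be one stripe member, a RAID10 member is taken to be a mirrored pair that can serve reads from either disk, and seeks are assumed to add up across members. This applies to reads; a write to a mirrored pair has to reach both disks.

```javascript
// Assumption: one seek costs about 5 ms, the figure from the earlier slide.
const SEEK_MS = 5;
const seeksPerDisk = 1000 / SEEK_MS; // ~200 seeks per second

// Aggregate read seeks for a number of independent stripe members.
function aggregateSeeks(members, seeksPerMember) {
  return members * seeksPerMember;
}

const singleDisk = aggregateSeeks(1, seeksPerDisk); // 200
const raid0 = aggregateSeeks(3, seeksPerDisk); // 3 x 200 = 600
// Each RAID10 member is a mirrored pair: 2 x 200 = 400 read seeks per second.
const raid10 = aggregateSeeks(3, 2 * seeksPerDisk); // 3 x 400 = 1200

console.log({ singleDisk, raid0, raid10 });
```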
Replication
Replica Sets
[Diagram: Primary (read / write) with 2 Secondaries (read)]
Replica Sets
[Diagram: Primary (read / write) with 4 Secondaries (read)]
Sharding
[Diagram: 2 MongoS routers in front of 4 shards, each with a Primary and 2 Secondaries]
  • Shard 1: 0..10
  • Shard 2: 10..20
  • Shard 3: 20..30
  • Shard 4: 30..40
400GB Index?
  • Shard 1 (0..10): 100GB Index!
  • Shard 2 (10..20): 100GB Index!
  • Shard 3 (20..30): 100GB Index!
  • Shard 4 (30..40): 100GB Index!
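
Range-based routing like the one in the diagram can be sketched in a few lines. This is not how MongoS is implemented; it is a minimal illustration that assumes each range includes its lower bound and excludes its upper bound, and the shards table and routeToShard function are hypothetical names.

```javascript
// Shard key ranges from the slide; min is inclusive, max is exclusive.
const shards = [
  { name: "Shard 1", min: 0, max: 10 },
  { name: "Shard 2", min: 10, max: 20 },
  { name: "Shard 3", min: 20, max: 30 },
  { name: "Shard 4", min: 30, max: 40 },
];

// Pick the shard whose range contains the given shard key value.
function routeToShard(key) {
  const shard = shards.find((s) => key >= s.min && key < s.max);
  if (!shard) throw new RangeError(`no shard covers key ${key}`);
  return shard.name;
}

// An evenly distributed 400GB index is split across the shards.
const totalIndexGB = 400;
const indexPerShardGB = totalIndexGB / shards.length;

console.log(routeToShard(7)); // Shard 1
console.log(routeToShard(10)); // Shard 2
console.log(indexPerShardGB); // 100
```

Each shard then only needs RAM for its own 100GB slice of the index, which is the point of the slide; the even split assumes keys are spread uniformly over the ranges.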
Summary
  • Use documents to your advantage!
  • Optimize your indexes
  • Understand your working set
  • Use a sane disk configuration
  • Use replicas to scale reads
  • Use sharding to scale writes & working RAM

Scaling with mongo db - SF Mongo User Group 7-19-2011


Editor's Notes

  1. Let’s talk about infrastructure costs. You probably started building your application on top of an RDBMS. This is the way we have built enterprise and web applications for years. But the problem is that your RDBMS doesn’t have a smooth cost curve when you scale it up. When you start off, you may be running on a smaller server, totally adequate for your load. When you exceed the capacity of that small server, you need to buy a bigger server; you can’t add a second small server. This process repeats: you exceed the capacity of your new server and upgrade your hardware again. There are two long-term problems with this. First, as you scale up, you end up paying more and more for each transaction that your system processes. A small server may cost you $1,000 per CPU, but when you need 128 processors, you might be paying as much as $100,000 per CPU. Each incremental step up in hardware gets more and more expensive, not cheaper and cheaper. Second, you reach an end of this scaling approach. Once you have scaled up to the biggest hardware platform available on the market, there is nowhere to go; no bigger box to buy. At this point you need to change strategies, even if you can afford those ultra-high-end boxes.
  2. And while we’ve been spending more and more money on hardware, our developer productivity has gone down too. You will hear this story over and over again from CIOs and architects: “Well, we use <insert RDBMS> but we don’t use joins or transactions and we’ve de-normalized our schema.” As our hardware gets more and more expensive, we ask our developers to squeeze more and more performance out of the same box. To achieve this, they go through “herculean efforts” to strip their code of advanced features that once made them productive. De-normalizing data, eliminating joins and transactions, adding caching and sharding layers… These are risky projects that slow down feature velocity.