SlideShare a Scribd company logo
1 of 32
Download to read offline
Rick Copeland @rick446
Arborian Consulting, LLC
   Now a consultant, but formerly…

     Software engineer at SourceForge, early adopter of
     MongoDB (version 0.8)

     Wrote the SQLAlchemy book (I love SQL when it’s
     used well)

     Mainly write Python now, but have done C++, C#,
     Java, Javascript, VHDL, Verilog, …
   You can do it with an RDBMS as long as you…
     Don’t use joins
     Don’t use transactions
     Use read-only slaves
     Use memcached
     Denormalize your data
     Use custom sharding/partitioning
     Do a lot of vertical scaling
      ▪ (we’re going to need a bigger box)
+1
Year
Scaling with MongoDB
   Use documents to improve locality

   Optimize your indexes

   Be aware of your working set

   Scaling your disks

   Replication for fault-tolerance and read scaling

   Sharding for read and write scaling
Relational (SQL)   MongoDB
Database           Database              Dynamic
                                          Typing
Table              Collection            B-tree
                                     (range-based)
Index              Index
Row                Document
                                          Think JSON
Column             Field
                                 Primitive types +
                                arrays, documents
{
    title: "Slides for Scaling with MongoDB",
    author: "Rick Copeland",
    date: ISODate("20012-02-29T19:30:00Z"),
    text: "My slides are available on speakerdeck.com",
    comments: [
      { author: "anonymous",
         date: ISODate("20012-02-29T19:30:01Z"),
        text: "Fristpsot!" },
      { author: "mark”,
        date: ISODate("20012-02-29T19:45:23Z"),
        text: "Nice slides" } ] }
                                                 Embed comment data in
                                                  blog post document
Seek = 5+ ms   Read = really really fast
Post
                Comment
Author
Post

Author

Comment
Comment
 Comment
 Comment
  Comment
Find where x equals 7

1   2    3   4   5   6   7




        Looked at 7 objects
Find where x
equals 7         4



        2                6



    1        3       5       7




            Looked at 3 objects
Entire index
must fit in
RAM
Only small
 portion in
      RAM
   Working set =
     sizeof(frequently used data)
     + sizeof(frequently used indexes)

   Right-aligned indexes reduce working set size

   Working set should fit in available RAM for best
    performance

   Page faults are the biggest cause of performance
    loss in MongoDB
>db.foo.stats()
                                      Data Size
{
  "ns" : "test.foo",
  "count" : 1338330,
  "size" : 46915928,                                 Average doc size
  "avgObjSize" : 35.05557523181876,
  "storageSize" : 86092032,
  "numExtents" : 12,
  "nindexes" : 2,                        Size on disk (or RAM!)
  "lastExtentSize" : 20872960,
  "paddingFactor" : 1,
  "flags" : 0,                                    Size of all indexes
  "totalIndexSize" : 99860480,
  "indexSizes" : {
    "_id_" : 55877632,
    "x_1" : 43982848},
  "ok" : 1                                        Size of each index
}
~200 seeks / second
~200 seeks / second   ~200 seeks / second   ~200 seeks / second


   Faster, but less reliable
~400 seeks / second   ~400 seeks / second   ~400 seeks / second


   Faster and more reliable ($$$ though)
   Old and busted  master/slave replication

   The new hotness  replica sets with automatic
    failover
                 Read / Write       Primary




                       Read       Secondary




                       Read       Secondary
   Primary handles all
    writes

   Application optionally
    sends reads to slaves

   Heartbeat manages
    automatic failover
   Special collection (the oplog) records operations
    idempotently

   Secondaries read from primary oplog and replay
    operations locally

   Space is preallocated and fixed for the oplog
{
"ts" : Timestamp(1317653790000, 2),
                                     Insert
"h" : -6022751846629753359,
"op" : "i",
"ns" : "confoo.People",                  Collection name
"o" : {
"_id" : ObjectId("4e89cd1e0364241932324269"),
"first" : "Rick",
"last" : "Copeland”
   }
}                                                   Object to insert
   Use heartbeat signal to detect failure

   When primary can’t be reached, elect a new one

   Replica that’s the most up-to-date is chosen

   If there is skew, changes not on new primary are
    saved to a .bson file for manual reconciliation

   Application can require data to be replicated to a
    majority to ensure this doesn’t happen
   Priority
     Slower nodes with lower priority
     Backup or read-only nodes to never be primary

   slaveDelay
     Fat-finger protection

   Data center awareness and tagging
     Application can ensure complex replication
     guarantees
   Reads scale nicely
     As long as the working set fits in RAM
     … and you don’t mind eventual consistency


   Sharding to the rescue!
     Automatically partitioned data sets
     Scale writes and reads
     Automatic load balancing between the shards
Configuration
             MongoS        MongoS
                                           Config 1    Config 2             Config 3




Shard 1               Shard 2       Shard 3               Shard 4
 0..10                 10..20        20..30                30..40


   Primary               Primary       Primary                    Primary




 Secondary             Secondary     Secondary              Secondary




 Secondary             Secondary     Secondary              Secondary
   Sharding is per-collection and range-based

   The highest-impact choice (and hardest to
    change decision) you make is the shard key
     Random keys: good for writes, bad for reads
     Right-aligned index: bad for writes
     Small # of discrete keys: very bad
     Ideal: balance writes, make reads routable by mongos
     Optimal shard key selection is hard
Primary Data Center               Secondary Data Center


 Shard 1           Shard 1                    Shard 1
Priority 1        Priority 1                 Priority 0



 Shard 2           Shard 2                    Shard 2
Priority 1        Priority 1                 Priority 0



 Shard 3           Shard 3                    Shard 3
                             RS3
Priority 1        Priority 1                 Priority 0



Config 1          Config 2                   Config 3
   Writes and reads both scale (with good choice of
    shard key)

   Reads scale while remaining strongly consistent

   Partitioning ensures you get more usable RAM

   Pitfall: don’t wait too long to add capacity
Rick Copeland @rick446
Arborian Consulting, LLC

More Related Content

What's hot

MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Consjohnrjenson
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity PlanningNorberto Leite
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSMongoDB
 
Migrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMongoDB
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDBMongoDB
 
3 scenarios when to use MongoDB!
3 scenarios when to use MongoDB!3 scenarios when to use MongoDB!
3 scenarios when to use MongoDB!Edureka!
 
Webinar: Deploying MongoDB to Production in Data Centers and the Cloud
Webinar: Deploying MongoDB to Production in Data Centers and the CloudWebinar: Deploying MongoDB to Production in Data Centers and the Cloud
Webinar: Deploying MongoDB to Production in Data Centers and the CloudMongoDB
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBAthiq Ahamed
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to ShardingMongoDB
 
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...leifwalsh
 
MongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About ShardingMongoDB
 
Strengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBStrengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBlehresman
 
Evolution of MonogDB Sharding and Its Best Practices - Ranjith A - Mydbops Team
Evolution of MonogDB Sharding and Its Best Practices - Ranjith A - Mydbops TeamEvolution of MonogDB Sharding and Its Best Practices - Ranjith A - Mydbops Team
Evolution of MonogDB Sharding and Its Best Practices - Ranjith A - Mydbops TeamMydbops
 
Efficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoEfficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoHyunsik Choi
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDBMongoDB
 
Challenges with MongoDB
Challenges with MongoDBChallenges with MongoDB
Challenges with MongoDBStone Gao
 
Sharding
ShardingSharding
ShardingMongoDB
 

What's hot (20)

MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
 
Migrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best Practices
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
3 scenarios when to use MongoDB!
3 scenarios when to use MongoDB!3 scenarios when to use MongoDB!
3 scenarios when to use MongoDB!
 
Webinar: Deploying MongoDB to Production in Data Centers and the Cloud
Webinar: Deploying MongoDB to Production in Data Centers and the CloudWebinar: Deploying MongoDB to Production in Data Centers and the Cloud
Webinar: Deploying MongoDB to Production in Data Centers and the Cloud
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
 
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
 
MongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB Deployment Checklist
MongoDB Deployment Checklist
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
 
Strengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBStrengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDB
 
Evolution of MonogDB Sharding and Its Best Practices - Ranjith A - Mydbops Team
Evolution of MonogDB Sharding and Its Best Practices - Ranjith A - Mydbops TeamEvolution of MonogDB Sharding and Its Best Practices - Ranjith A - Mydbops Team
Evolution of MonogDB Sharding and Its Best Practices - Ranjith A - Mydbops Team
 
Efficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoEfficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajo
 
Mongo db
Mongo dbMongo db
Mongo db
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
 
Challenges with MongoDB
Challenges with MongoDBChallenges with MongoDB
Challenges with MongoDB
 
Sharding
ShardingSharding
Sharding
 

Viewers also liked

2017 Volvo S60 Brochure | Orange County Volvo
2017 Volvo S60 Brochure | Orange County Volvo2017 Volvo S60 Brochure | Orange County Volvo
2017 Volvo S60 Brochure | Orange County VolvoVolvo Cars Mission Viejo
 
How to configure static nat on cisco routers
How to configure static nat on cisco routersHow to configure static nat on cisco routers
How to configure static nat on cisco routersIT Tech
 
Assessment for learning meeting april 29th 2014
Assessment for learning meeting april 29th 2014Assessment for learning meeting april 29th 2014
Assessment for learning meeting april 29th 2014Mr Bounab Samir
 
Global IT Consulting Market
Global IT Consulting MarketGlobal IT Consulting Market
Global IT Consulting MarketJoyjeet Dan
 
Best practices multichannel-integration
Best practices multichannel-integrationBest practices multichannel-integration
Best practices multichannel-integrationGiuseppe Monserrato
 
Dantes Inferno Study Guide
Dantes Inferno Study GuideDantes Inferno Study Guide
Dantes Inferno Study Guidefollowthelamb
 
Finding the best Radio Network Planning and Radio Network Optimization software
Finding the best Radio Network Planning and Radio Network Optimization softwareFinding the best Radio Network Planning and Radio Network Optimization software
Finding the best Radio Network Planning and Radio Network Optimization softwareMuhammad Waqas Akram
 
Temperature Transducer
Temperature TransducerTemperature Transducer
Temperature TransducerAIT
 

Viewers also liked (11)

Custom Courseware Development
Custom Courseware DevelopmentCustom Courseware Development
Custom Courseware Development
 
2017 Volvo S60 Brochure | Orange County Volvo
2017 Volvo S60 Brochure | Orange County Volvo2017 Volvo S60 Brochure | Orange County Volvo
2017 Volvo S60 Brochure | Orange County Volvo
 
Containerization and palletization
Containerization and palletizationContainerization and palletization
Containerization and palletization
 
How to configure static nat on cisco routers
How to configure static nat on cisco routersHow to configure static nat on cisco routers
How to configure static nat on cisco routers
 
Assessment for learning meeting april 29th 2014
Assessment for learning meeting april 29th 2014Assessment for learning meeting april 29th 2014
Assessment for learning meeting april 29th 2014
 
Global IT Consulting Market
Global IT Consulting MarketGlobal IT Consulting Market
Global IT Consulting Market
 
Best practices multichannel-integration
Best practices multichannel-integrationBest practices multichannel-integration
Best practices multichannel-integration
 
Dantes Inferno Study Guide
Dantes Inferno Study GuideDantes Inferno Study Guide
Dantes Inferno Study Guide
 
Finding the best Radio Network Planning and Radio Network Optimization software
Finding the best Radio Network Planning and Radio Network Optimization softwareFinding the best Radio Network Planning and Radio Network Optimization software
Finding the best Radio Network Planning and Radio Network Optimization software
 
Temperature Transducer
Temperature TransducerTemperature Transducer
Temperature Transducer
 
Camels approach
Camels approachCamels approach
Camels approach
 

Similar to Scaling with MongoDB

Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDBMongoDB
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localyticsandrew311
 
Python mongo db-training-europython-2011
Python mongo db-training-europython-2011Python mongo db-training-europython-2011
Python mongo db-training-europython-2011Andreas Jung
 
MongoDB Knowledge Shareing
MongoDB Knowledge ShareingMongoDB Knowledge Shareing
MongoDB Knowledge ShareingPhilip Zhong
 
Intro to mongo db
Intro to mongo dbIntro to mongo db
Intro to mongo dbChi Lee
 
MongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyMongoDB
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB
 
DBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsDBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsSrinivas Mutyala
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB
 
2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2MongoDB
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for SysadminsNathan Milford
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsServer Density
 
Deployment Preparedness
Deployment Preparedness Deployment Preparedness
Deployment Preparedness MongoDB
 
OSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyOSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyNETWAYS
 
Sedna XML Database: Memory Management
Sedna XML Database: Memory ManagementSedna XML Database: Memory Management
Sedna XML Database: Memory ManagementIvan Shcheklein
 

Similar to Scaling with MongoDB (20)

Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
Python mongo db-training-europython-2011
Python mongo db-training-europython-2011Python mongo db-training-europython-2011
Python mongo db-training-europython-2011
 
MongoDB 3.0
MongoDB 3.0 MongoDB 3.0
MongoDB 3.0
 
MongoDB Knowledge Shareing
MongoDB Knowledge ShareingMongoDB Knowledge Shareing
MongoDB Knowledge Shareing
 
Intro to mongo db
Intro to mongo dbIntro to mongo db
Intro to mongo db
 
2012 phoenix mug
2012 phoenix mug2012 phoenix mug
2012 phoenix mug
 
MongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo Seattle
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data Safety
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
MongoDB is the MashupDB
MongoDB is the MashupDBMongoDB is the MashupDB
MongoDB is the MashupDB
 
DBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsDBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training Presentations
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: Sharding
 
Mongo db japan
Mongo db japanMongo db japan
Mongo db japan
 
2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
Deployment Preparedness
Deployment Preparedness Deployment Preparedness
Deployment Preparedness
 
OSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyOSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross Lawley
 
Sedna XML Database: Memory Management
Sedna XML Database: Memory ManagementSedna XML Database: Memory Management
Sedna XML Database: Memory Management
 

More from Rick Copeland

Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)Rick Copeland
 
Schema Design at Scale
Schema Design at ScaleSchema Design at Scale
Schema Design at ScaleRick Copeland
 
Building Your First MongoDB Application
Building Your First MongoDB ApplicationBuilding Your First MongoDB Application
Building Your First MongoDB ApplicationRick Copeland
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRick Copeland
 
Chef on MongoDB and Pyramid
Chef on MongoDB and PyramidChef on MongoDB and Pyramid
Chef on MongoDB and PyramidRick Copeland
 
Chef on Python and MongoDB
Chef on Python and MongoDBChef on Python and MongoDB
Chef on Python and MongoDBRick Copeland
 
Real-Time Python Web: Gevent and Socket.io
Real-Time Python Web: Gevent and Socket.ioReal-Time Python Web: Gevent and Socket.io
Real-Time Python Web: Gevent and Socket.ioRick Copeland
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRick Copeland
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
 
Rapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRick Copeland
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeRick Copeland
 
MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBRick Copeland
 

More from Rick Copeland (12)

Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)Python Functions (PyAtl Beginners Night)
Python Functions (PyAtl Beginners Night)
 
Schema Design at Scale
Schema Design at ScaleSchema Design at Scale
Schema Design at Scale
 
Building Your First MongoDB Application
Building Your First MongoDB ApplicationBuilding Your First MongoDB Application
Building Your First MongoDB Application
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
 
Chef on MongoDB and Pyramid
Chef on MongoDB and PyramidChef on MongoDB and Pyramid
Chef on MongoDB and Pyramid
 
Chef on Python and MongoDB
Chef on Python and MongoDBChef on Python and MongoDB
Chef on Python and MongoDB
 
Real-Time Python Web: Gevent and Socket.io
Real-Time Python Web: Gevent and Socket.ioReal-Time Python Web: Gevent and Socket.io
Real-Time Python Web: Gevent and Socket.io
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
 
Rapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and Python
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForge
 
MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDB
 

Recently uploaded

How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 

Recently uploaded (20)

How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 

Scaling with MongoDB

  • 2. Now a consultant, but formerly…  Software engineer at SourceForge, early adopter of MongoDB (version 0.8)  Wrote the SQLAlchemy book (I love SQL when it’s used well)  Mainly write Python now, but have done C++, C#, Java, Javascript, VHDL, Verilog, …
  • 3. You can do it with an RDBMS as long as you…  Don’t use joins  Don’t use transactions  Use read-only slaves  Use memcached  Denormalize your data  Use custom sharding/partitioning  Do a lot of vertical scaling ▪ (we’re going to need a bigger box)
  • 6. Use documents to improve locality  Optimize your indexes  Be aware of your working set  Scaling your disks  Replication for fault-tolerance and read scaling  Sharding for read and write scaling
  • 7. Relational (SQL) MongoDB Database Database Dynamic Typing Table Collection B-tree (range-based) Index Index Row Document Think JSON Column Field Primitive types + arrays, documents
  • 8. { title: "Slides for Scaling with MongoDB", author: "Rick Copeland", date: ISODate("20012-02-29T19:30:00Z"), text: "My slides are available on speakerdeck.com", comments: [ { author: "anonymous", date: ISODate("20012-02-29T19:30:01Z"), text: "Fristpsot!" }, { author: "mark”, date: ISODate("20012-02-29T19:45:23Z"), text: "Nice slides" } ] } Embed comment data in blog post document
  • 9. Seek = 5+ ms Read = really really fast
  • 10. Post Comment Author
  • 12. Find where x equals 7 1 2 3 4 5 6 7 Looked at 7 objects
  • 13. Find where x equals 7 4 2 6 1 3 5 7 Looked at 3 objects
  • 16. Working set =  sizeof(frequently used data)  + sizeof(frequently used indexes)  Right-aligned indexes reduce working set size  Working set should fit in available RAM for best performance  Page faults are the biggest cause of performance loss in MongoDB
  • 17. >db.foo.stats() Data Size { "ns" : "test.foo", "count" : 1338330, "size" : 46915928, Average doc size "avgObjSize" : 35.05557523181876, "storageSize" : 86092032, "numExtents" : 12, "nindexes" : 2, Size on disk (or RAM!) "lastExtentSize" : 20872960, "paddingFactor" : 1, "flags" : 0, Size of all indexes "totalIndexSize" : 99860480, "indexSizes" : { "_id_" : 55877632, "x_1" : 43982848}, "ok" : 1 Size of each index }
  • 18. ~200 seeks / second
  • 19. ~200 seeks / second ~200 seeks / second ~200 seeks / second  Faster, but less reliable
  • 20. ~400 seeks / second ~400 seeks / second ~400 seeks / second  Faster and more reliable ($$$ though)
  • 21. Old and busted  master/slave replication  The new hotness  replica sets with automatic failover Read / Write Primary Read Secondary Read Secondary
  • 22. Primary handles all writes  Application optionally sends reads to slaves  Heartbeat manages automatic failover
  • 23. Special collection (the oplog) records operations idempotently  Secondaries read from primary oplog and replay operations locally  Space is preallocated and fixed for the oplog
  • 24. { "ts" : Timestamp(1317653790000, 2), Insert "h" : -6022751846629753359, "op" : "i", "ns" : "confoo.People", Collection name "o" : { "_id" : ObjectId("4e89cd1e0364241932324269"), "first" : "Rick", "last" : "Copeland” } } Object to insert
  • 25. Use heartbeat signal to detect failure  When primary can’t be reached, elect a new one  Replica that’s the most up-to-date is chosen  If there is skew, changes not on new primary are saved to a .bson file for manual reconciliation  Application can require data to be replicated to a majority to ensure this doesn’t happen
  • 26. Priority  Slower nodes with lower priority  Backup or read-only nodes to never be primary  slaveDelay  Fat-finger protection  Data center awareness and tagging  Application can ensure complex replication guarantees
  • 27. Reads scale nicely  As long as the working set fits in RAM  … and you don’t mind eventual consistency  Sharding to the rescue!  Automatically partitioned data sets  Scale writes and reads  Automatic load balancing between the shards
  • 28. Configuration MongoS MongoS Config 1 Config 2 Config 3 Shard 1 Shard 2 Shard 3 Shard 4 0..10 10..20 20..30 30..40 Primary Primary Primary Primary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary
  • 29. Sharding is per-collection and range-based  The highest-impact choice (and hardest to change decision) you make is the shard key  Random keys: good for writes, bad for reads  Right-aligned index: bad for writes  Small # of discrete keys: very bad  Ideal: balance writes, make reads routable by mongos  Optimal shard key selection is hard
  • 30. Primary Data Center Secondary Data Center Shard 1 Shard 1 Shard 1 Priority 1 Priority 1 Priority 0 Shard 2 Shard 2 Shard 2 Priority 1 Priority 1 Priority 0 Shard 3 Shard 3 Shard 3 RS3 Priority 1 Priority 1 Priority 0 Config 1 Config 2 Config 3
  • 31. Writes and reads both scale (with good choice of shard key)  Reads scale while remaining strongly consistent  Partitioning ensures you get more usable RAM  Pitfall: don’t wait too long to add capacity

Editor's Notes

  1. You’d like to just ‘add capacity’ but you end up having to buy a bigger serverBuild your own infrastructure and you pay more for less as you scaleThe cloud can help with this, but only up to a point; what happens when you’re using the largest instance? Time to rearchitect.
  2. There are a lot of features that make RDBMSs attractiveBut as we scale we need to turn off a lot of them to get performance increasesWe end up with something that scales, but it’s hard to use
  3. RAM functions as a cacheReplication ends up caching documents in multiple locationsSharding makes sure documents only have one ‘home’
  4. A single shard is a replica setMongoS is a router that determines where reads and writes goDocuments is ‘chunked’ into ranges. Chunks can be split and migrated to other servers based on load.Configuration servers persist location of particular shard key ranges Cluster is alive when one or more config servers are down, but there can be no migration