SCALING MONGODB IN THE CLOUD
        Simon Maynard - Bugsnag CTO
               @snmaynard
WHERE HAVE I USED MONGODB?
HEYZAP

• Largest mobile gaming social network
• MongoDB the main datastore
   • Also MySQL & Redis
• High number of reads, fewer writes
BUGSNAG
bugsnag.com

• Exception tracking service for mobile and web
• MongoDB only persistent datastore
   • Redis caching
• Lots of writes, fewer reads
WHAT ARE THE PROS & CONS OF MONGODB?
MONGODB PROS & CONS
    Pros

•   Schemaless

•   Fire & Forget

•   Scalable writes / reads

•   Fast!

    Cons

•   Schemaless

•   Fire & Forget

•   No joins

•   No transactions

•   Database level locking
WHEN SHOULD YOU THINK ABOUT SCALING?
• From the start!

• Monitor

• Anticipate

• React early
WHAT ARE THE KEY RESOURCES?
RAM

•   Heavily reliant on available RAM

•   “Working set” should fit in RAM

•   Indexes & documents
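As a rough illustration of the working-set rule, the sketch below adds the index size to an estimate of the frequently accessed document data and compares the total with the available RAM. The function name, the 20% headroom, and every size used here are assumptions chosen for illustration; they are not figures from the talk.

```javascript
// Rough working-set check: indexes plus "hot" documents should fit in RAM.
// All names and numbers are illustrative assumptions.
const GIB = 1024 ** 3;

function workingSetFits({ indexBytes, hotDataBytes, ramBytes, headroom = 0.2 }) {
  // Keep some RAM back for the OS, connections and other processes.
  const usableRam = ramBytes * (1 - headroom);
  const workingSet = indexBytes + hotDataBytes;
  return { workingSet, usableRam, fits: workingSet <= usableRam };
}

const check = workingSetFits({
  indexBytes: 20 * GIB,   // hypothetical total index size
  hotDataBytes: 30 * GIB, // hypothetical size of frequently read documents
  ramBytes: 64 * GIB,     // hypothetical machine RAM
});

console.log(check.fits ? "working set fits in RAM" : "expect page faults and disk reads");
```

In practice the hot-data figure is the hard part to estimate; page faults reported by monitoring are usually the first sign that the estimate was too low.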
RAM

[Figure: data in RAM vs. data not in RAM]
I/O
•   When data is not in RAM, MongoDB hits the disk

•   Ensure this happens infrequently

•   When it does, it should be fast

•   EBS throughput sucks
HOW TO KEEP I/O FAST

•   Fast filesystem - 10gen recommends xfs

•   Use RAID - e.g. RAID 10 (stripe of mirrors)

•   Increase file descriptor limits

•   Turn off atime and diratime

•   Tweak read-ahead settings

•   http://www.mongodb.org/display/DOCS/Production+Notes
HOW CAN YOU ARCHITECT MONGODB TO SCALE?
VERTICAL SCALING

•   Buy more resources on single machine

    •   RAM

    •   I/O
HORIZONTAL SCALING

  •   Buy more machines

      •   Replica sets

      •   Sharding
REPLICA SETS

•   Scales reads well

•   One primary, many secondaries

•   Read from all members

•   Write to primary only

•   Inconsistent reads from secondaries
SHARDING


•   Many primaries, many secondaries

•   Scales writes and reads

•   Harder to set up well
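To make the "many primaries" idea concrete, here is a toy sketch of range-based routing: each chunk of the shard-key space is owned by one shard, and a write goes to whichever shard owns the chunk containing the key. The chunk table, shard names and `shardFor` helper are hypothetical; in a real cluster the routing is handled for you, so treat this only as a mental model.

```javascript
// Toy model of range-based sharding. Each chunk covers [min, max) of the
// shard key and lives on exactly one shard. All values are hypothetical.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard0" },
];

function shardFor(key, chunkTable) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.shard;
}

console.log(shardFor(1854, chunks)); // shard1
console.log(shardFor(8122, chunks)); // shard0
```

The sketch also hints at why sharding is harder to set up well: if most keys fall into one chunk range, one shard takes most of the writes.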
WHAT OTHER TECHNIQUES TO SCALE?
STANDARD RULES
•   Standard DB scaling rules apply to MongoDB

    •   Use skip() and limit()

    •   Return subsets of fields

    •   Index all your queries

    •   Run explain() on new/slow queries
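The rules above can be sketched as a small helper that assembles a paginated query returning only a subset of fields. The helper name, the page size and the choice of fields are assumptions for illustration; only the field names `project_id` and `errorHash` are borrowed from the profiler output elsewhere in this deck.

```javascript
// Builds the pieces of a paginated, projected query.
// Function name, page size and projected fields are hypothetical.
function buildPageQuery(projectId, page, perPage = 20) {
  if (page < 1) throw new RangeError("page numbers start at 1");
  return {
    filter: { project_id: projectId },           // should be backed by an index
    projection: { errorHash: 1, project_id: 1 }, // return only the fields needed
    sort: { _id: 1 },
    skip: (page - 1) * perPage,
    limit: perPage,
  };
}

const q = buildPageQuery("4ff24b7e2511bb1a70000004", 3);
console.log(q.skip, q.limit); // 40 20
```

In the mongo shell the pieces would be passed along the lines of `db.errors.find(q.filter, q.projection).sort(q.sort).skip(q.skip).limit(q.limit)`, and appending `.explain()` shows whether an index was used.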
SCHEMA DESIGN
•   De-normalize
                   {
                       "_id" : ObjectId("505bd6a6c6b6b99254000003"),
                       "author" : "Simon Maynard",
                       "post" : "Hey everyone!",
                       "comments" : [
                           {
                               "author" : "anonymous",
                               "text" : "Hey!"
                           },
                           {
                               "author" : "James Smith",
                               "text" : "Hey Simon!"
                           }
                       ]
                   }
SCHEMA DESIGN
•   Indexes should be minimized in size and number
        Separate flag per platform:
        {
            "name" : "Angry Birds",
            "android" : true,
            "iphone" : true
        }

        Single combined platform field:
        {
            "name" : "Angry Birds",
            "platform" : 3
        }
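The `"platform" : 3` value on the slide looks like a bitmask that folds several boolean flags into one indexable field. A minimal sketch of that idea follows; the bit assignments (Android = 1, iPhone = 2) are an assumption, since the slide only shows the combined value.

```javascript
// Hypothetical bit assignments; the slide only shows the combined value 3.
const PLATFORM = { ANDROID: 1, IPHONE: 2 };

// Fold per-platform booleans into a single integer field.
function encodePlatforms({ android = false, iphone = false }) {
  return (android ? PLATFORM.ANDROID : 0) | (iphone ? PLATFORM.IPHONE : 0);
}

// Test whether a stored mask includes a given platform bit.
function supports(platformMask, flag) {
  return (platformMask & flag) !== 0;
}

const game = {
  name: "Angry Birds",
  platform: encodePlatforms({ android: true, iphone: true }),
};

console.log(game.platform);                            // 3
console.log(supports(game.platform, PLATFORM.IPHONE)); // true
```

With this encoding one index on `platform` replaces an index per flag; a query for a single platform would then match every mask value containing that bit, for example `{ platform: { $in: [1, 3] } }` for Android under the assumed bit values.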
SCHEMA DESIGN
•   Minimize key lengths on small documents

•   Can reduce storage requirements and improve performance

Long keys (92 bytes):
{
    "_id":"AHAHSPGPGSAVKLPAPHSVGKSALR",
    "game_id":"8122",
    "user_id":"1854",
    "session_start":"51067007",
    "session_end":"51067085"
}

Short keys (58 bytes):
{
    "_id":"AHAHSPGPGSAVKLPAPHSVGKSALR",
    "g":"8122",
    "u":"1854",
    "s":"51067007",
    "e":"51067085"
}

About 1/3 memory saved!
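Key names are stored inside every document, so the saving comes purely from the shorter names. The sketch below renames the keys and counts the bytes saved per document; the helper names are made up for illustration, while the key mapping and sample document are taken from the slide.

```javascript
// Key mapping and sample document mirror the slide; helper names are hypothetical.
const keyMap = { game_id: "g", user_id: "u", session_start: "s", session_end: "e" };

const utf8Length = (s) => new TextEncoder().encode(s).length;

// Return a copy of the document with long keys swapped for short ones.
function shortenKeys(doc, map) {
  return Object.fromEntries(
    Object.entries(doc).map(([key, value]) => [map[key] ?? key, value])
  );
}

// Count how many key-name bytes the renaming saves for one document.
function keyBytesSaved(doc, map) {
  return Object.keys(doc).reduce(
    (saved, key) => saved + utf8Length(key) - utf8Length(map[key] ?? key),
    0
  );
}

const session = {
  _id: "AHAHSPGPGSAVKLPAPHSVGKSALR",
  game_id: "8122",
  user_id: "1854",
  session_start: "51067007",
  session_end: "51067085",
};

console.log(shortenKeys(session, keyMap));
console.log(keyBytesSaved(session, keyMap)); // 34, the same as 92 - 58 on the slide
```

A mapping like this usually lives in the application or ODM layer so that code can keep using readable names while the stored documents use the short ones.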
PROFILER
•   MongoDB has a built in profiler

•   Use the profiler all the time

•   db.setProfilingLevel(1, 100)

•   ‘show profile’ shows recent profiles

•   Stored in db.system.profile
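With `db.setProfilingLevel(1, 100)` the profiler records operations slower than 100 ms. Once entries are collected, a few fields are worth checking on each one. The sketch below flags common warning signs in a profile entry; the function name and thresholds are assumptions, and the sample entry reuses field values from this deck's profiler output.

```javascript
// Flags common warning signs in a system.profile entry.
// Function name and thresholds are illustrative assumptions.
function profileWarnings(entry, { slowMs = 100, scanRatio = 10 } = {}) {
  const warnings = [];
  if (entry.millis > slowMs) warnings.push("slow");
  // scanAndOrder: results were sorted in memory rather than in index order.
  if (entry.scanAndOrder) warnings.push("in-memory sort");
  // Scanning far more entries than are returned suggests a poorly matched index.
  if (entry.nscanned > scanRatio * Math.max(entry.nreturned, 1)) {
    warnings.push("high scan ratio");
  }
  return warnings;
}

const entry = {
  op: "query",
  ns: "bugsnag.errors",
  nscanned: 1,
  scanAndOrder: true,
  nreturned: 1,
  millis: 0,
};

console.log(profileWarnings(entry)); // [ 'in-memory sort' ]
```

Here the query is fast because it touches one document, but the in-memory sort flag suggests the index does not cover the `orderby`, which could become costly as the result set grows.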
PROFILER OUTPUT
"ts" : ISODate("2012-09-24T23:24:28.908Z"),
"op" : "query",
"ns" : "bugsnag.errors",
"query" : {
    "query" : {
        "errorHash" : "2ff33b4f86543972577cdee34f60e4b2",
        "project_id" : "4ff24b7e2511bb1a70000004"
    },
    "orderby" : {
        "_id" : 1
    }
},
"ntoreturn" : 1,
"ntoskip" : 0,
"nscanned" : 1,
"scanAndOrder" : true,
"numYield" : 0,
"lockStats" : {
    "timeLockedMicros" : { },
    "timeAcquiringMicros" : {
        "r" : NumberLong(2),
        "w" : NumberLong(3)
    }
},
"nreturned" : 1,
"responseLength" : 5240,
"millis" : 0,
WHAT SHOULD I MONITOR?
MONITORS
•   Chart the index size

•   Chart the number of current ops

•   Monitor index misses

•   Monitor replication lag

•   Monitor I/O performance (iostat)

•   Monitor disk space
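Replication lag, for example, can be derived by comparing each secondary's last applied operation time with the primary's. The sketch below does this on an object shaped like the output of `rs.status()`; the host names and timestamps are made up, and the function name is hypothetical.

```javascript
// Computes how far each secondary is behind the primary, in seconds,
// from an rs.status()-shaped object. Sample data below is made up.
function replicationLag(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary in replica set status");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

const sampleStatus = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-09-24T23:24:28Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-09-24T23:24:26Z") },
    { name: "db3:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-09-24T23:23:58Z") },
  ],
};

console.log(replicationLag(sampleStatus));
```

Charting these values over time makes it easier to spot a secondary that is steadily falling behind rather than briefly catching up.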
HOW CAN I MONITOR MONGODB?
db.currentOp()

{
    "opid" : 783608,
    "active" : true,
    "secs_running" : 149,
    "op" : "query",
    "ns" : "bugsnag.accounts",
    "query" : {
         "_id" : ObjectId("505bd6a6c6b6b99254000003"),
    },
    "waitingForLock" : false,
    "numYields" : 349,
}
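`db.currentOp()` returns its operations in an `inprog` array, so long-running ones can be picked out with a simple filter. The sketch below assumes a 60-second threshold and a made-up second operation; the first operation reuses the values shown above.

```javascript
// Picks out active operations that have been running for a while.
// The threshold and the second sample operation are assumptions.
function longRunningOps(currentOp, minSeconds = 60) {
  return currentOp.inprog
    .filter((op) => op.active && op.secs_running >= minSeconds)
    .map(({ opid, ns, op, secs_running }) => ({ opid, ns, op, secs_running }));
}

const sampleCurrentOp = {
  inprog: [
    { opid: 783608, active: true, secs_running: 149, op: "query", ns: "bugsnag.accounts" },
    { opid: 783611, active: true, secs_running: 2, op: "insert", ns: "bugsnag.events" }, // hypothetical
  ],
};

console.log(longRunningOps(sampleCurrentOp));
```

An operation found this way can be inspected further and, if necessary, stopped with `db.killOp(opid)`.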
db.serverStatus()
"locks" : {
    "bugsnag" : {
        "timeLockedMicros" : {
            "r" : NumberLong(1639187950),
            "w" : NumberLong(1313312267)
        },
        "timeAcquiringMicros" : {
            "r" : NumberLong(1041368094),
            "w" : NumberLong(630905947)
        }
    },
},
"indexCounters" : {
    "btree" : {
        "accesses" : 610645909,
        "hits" : 610645909,
        "misses" : 0,
        "resets" : 0,
        "missRatio" : 0
    }
},
"opcounters" : {
    "insert" : 13674147,
    "query" : 5261723,
    "update" : 2576757,
    "delete" : 22324,
    "getmore" : 4459,
    "command" : 4382007
},
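Two quick numbers can be derived from this output: the share of operations that are writes, and the index miss ratio. The sketch below computes both from the counters shown on the slide; the function name is hypothetical, and counting `getmore` as a read is a simplifying assumption.

```javascript
// Derives a write share and an index miss ratio from serverStatus-style counters.
// Counter values are copied from the slide; the function name is hypothetical.
function workloadSummary(status) {
  const { insert, query, update, delete: del, getmore } = status.opcounters;
  const writes = insert + update + del;
  const reads = query + getmore; // assumption: treat getmore as a read
  const { accesses, misses } = status.indexCounters.btree;
  return {
    writeShare: writes / (writes + reads),
    indexMissRatio: accesses === 0 ? 0 : misses / accesses,
  };
}

const serverStatus = {
  opcounters: {
    insert: 13674147, query: 5261723, update: 2576757,
    delete: 22324, getmore: 4459, command: 4382007,
  },
  indexCounters: { btree: { accesses: 610645909, hits: 610645909, misses: 0 } },
};

const summary = workloadSummary(serverStatus);
console.log(summary.writeShare.toFixed(2), summary.indexMissRatio);
```

Roughly three quarters of the counted operations are writes, which fits the "lots of writes, fewer reads" description of Bugsnag, and a miss ratio of zero suggests the indexes were fitting in RAM at the time. The counters are cumulative since server start, so trends need the difference between two samples.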
db.stats()
{
    "db" : "bugsnag",
    "collections" : 14,
    "objects" : 68081951,
    "avgObjSize" : 10147.85350585104,
    "dataSize" : 690885147618,
    "storageSize" : 1290028235245,
    "numExtents" : 67,
    "indexes" : 28,
    "indexSize" : 21240430449,
    "fileSize" : 1925185536051,
    "nsSizeMB" : 16,
    "ok" : 1
}
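The raw byte counts are easier to chart once converted to readable units. The sketch below summarizes a `db.stats()` result using the values from the slide; the function name and the choice of derived figures are assumptions.

```javascript
// Turns db.stats() byte counts into a few chartable figures.
// Input values are copied from the slide; the function name is hypothetical.
const GIB = 1024 ** 3;

function summarizeStats(stats) {
  return {
    dataGiB: stats.dataSize / GIB,
    indexGiB: stats.indexSize / GIB,
    // Allocated storage relative to actual data; values well above 1
    // indicate preallocation or fragmentation.
    storageToDataRatio: stats.storageSize / stats.dataSize,
    indexesPerCollection: stats.indexes / stats.collections,
  };
}

const dbStats = {
  collections: 14,
  objects: 68081951,
  dataSize: 690885147618,
  storageSize: 1290028235245,
  indexes: 28,
  indexSize: 21240430449,
};

const s = summarizeStats(dbStats);
console.log(s.indexGiB.toFixed(1), s.dataGiB.toFixed(1), s.storageToDataRatio.toFixed(2));
```

For this database the indexes come to roughly 20 GiB, which is the figure to compare against available RAM when charting index size.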
MONGOTOP

ns                        total    read    write
bugsnag.events             80ms    12ms     68ms
bugsnag.projects            2ms     2ms      0ms
bugsnag.users               1ms     1ms      0ms
bugsnag.system.indexes      4ms     4ms      0ms
MONGOSTAT

            insert  query  update  delete  getmore  command  flushes  faults  locked db  idx miss %
localhost      147    210      51      13        4      215        0       0        14%           0
MONGO MONITORING SERVICE


•   MMS is 10gen's hosted MongoDB monitoring service

•   Available as web app (https://mms.10gen.com)

•   Android client also available from Google Play
KIBANA & LOGSTASH




•   Logstash is an open-source log parser - http://logstash.net/

•   Kibana is an alternative UI for Logstash - http://kibana.org/

•   Cool trend analysis for mongo logs
•   Questions?

•   Check out www.bugsnag.com

•   Follow me on Twitter @snmaynard

Mongo scaling

  • 1. SCALING MONGODB IN THE CLOUD Simon Maynard - Bugsnag CTO @snmaynard
  • 2. WHERE HAVE I USED MONGODB?
  • 3. HEYZAP • Largest mobile gaming social network • MongoDB the main datastore • Also MySQL & Redis • High number of reads, fewer writes
  • 4. BUGSNAG bugsnag.com • Exception tracking service for mobile and web • MongoDB only persistent datastore • Redis caching • Lots of writes, fewer reads
  • 5. WHAT ARE THE PROS & CONS OF MONGODB?
  • 6. MONGODB PROS & CONS Pros • Schemaless • Fire & Forget • Scalable writes / reads • Fast!
  • 7. MONGODB PROS & CONS Pros Cons • Schemaless • Schemaless • Fire & Forget • Fire & Forget • Scalable writes / reads • No joins • Fast! • No transactions • Database level locking
  • 8. WHEN SHOULD YOU THINK ABOUT SCALING?
  • 9. • From the start! • Monitor • Anticipate • React early
  • 10. WHAT ARE THE KEY RESOURCES?
  • 11. RAM • Heavily reliant on available RAM • “Working set” should fit in RAM • Indexes & documents
  • 12. RAM in RAM not in RAM
  • 13. I/O • When data is not in RAM, MongoDB hits the disk • Ensure this happens infrequently • When it does, it should be fast • EBS throughput sucks
  • 14. HOW TO KEEP I/O FAST • Fast filesystem - 10gen recommends xfs • Use RAID - e.g. RAID 10 (stripe of mirrors) • Increase file descriptor limits • Turn off atime and diratime • Tweak read-ahead settings • http://www.mongodb.org/display/DOCS/Production+Notes
  • 15. HOW CAN YOU ARCHITECT MONGODB TO SCALE?
  • 16. VERTICAL SCALING • Buy more resources on single machine • RAM • I/O
  • 17. HORIZONTAL SCALING • Buy more machines • Replica sets • Sharding
  • 18. REPLICA SETS • Scales reads well • One primary, many secondaries • Read from all members • Write to primary only • Inconsistent reads from secondaries
  • 19. SHARDING • Many primaries, many secondaries • Scales writes and reads • Harder to set up well
  • 21. STANDARD RULES • Standard DB scaling rules apply to MongoDB • Use skip() and limit() • Return subsets of fields • Index all your queries • Run explain() on new/slow queries
  • 22. SCHEMA DESIGN • De-normalize
      {
        "_id" : ObjectId("505bd6a6c6b6b99254000003"),
        "author" : "Simon Maynard",
        "post" : "Hey everyone!",
        "comments" : [
          { "author" : "anonymous", "text" : "Hey!" },
          { "author" : "James Smith", "text" : "Hey Simon!" }
        ]
      }
  • 23. SCHEMA DESIGN • Indexes should be minimized in size and number
      Two boolean fields: { "name" : "Angry Birds", "android" : true, "iphone" : true }
      One bitfield:       { "name" : "Angry Birds", "platform" : 3 }
  • 24. SCHEMA DESIGN • Minimize key lengths on small documents • Can reduce storage requirements and improve performance
      {
        "_id" : "AHAHSPGPGSAVKLPAPHSVGKSALR",
        "game_id" : "8122",
        "user_id" : "1854",
        "session_start" : "51067007",
        "session_end" : "51067085"
      }
      92 bytes
  • 25. SCHEMA DESIGN • Minimize key lengths on small documents • Can reduce storage requirements and improve performance
      Long keys (92 bytes):
      {
        "_id" : "AHAHSPGPGSAVKLPAPHSVGKSALR",
        "game_id" : "8122",
        "user_id" : "1854",
        "session_start" : "51067007",
        "session_end" : "51067085"
      }
      Short keys (58 bytes):
      {
        "_id" : "AHAHSPGPGSAVKLPAPHSVGKSALR",
        "g" : "8122",
        "u" : "1854",
        "s" : "51067007",
        "e" : "51067085"
      }
      About 1/3 memory saved!
  • 26. PROFILER • MongoDB has a built in profiler • Use the profiler all the time • db.setProfilingLevel(1, 100) • ‘show profile’ shows recent profiles • Stored in db.system.profile
  • 27. PROFILER OUTPUT
      "ts" : ISODate("2012-09-24T23:24:28.908Z"),
      "op" : "query",
      "ns" : "bugsnag.errors",
      "query" : {
        "query" : {
          "errorHash" : "2ff33b4f86543972577cdee34f60e4b2",
          "project_id" : "4ff24b7e2511bb1a70000004"
        },
        "orderby" : { "_id" : 1 }
      },
      "ntoreturn" : 1,
      "ntoskip" : 0,
      "nscanned" : 1,
      "scanAndOrder" : true,
      "numYield" : 0,
      "lockStats" : {
        "timeLockedMicros" : { },
        "timeAcquiringMicros" : { "r" : NumberLong(2), "w" : NumberLong(3) }
      },
      "nreturned" : 1,
      "responseLength" : 5240,
      "millis" : 0,
  • 28. PROFILER OUTPUT (same output as slide 27)
  • 29. WHAT SHOULD I MONITOR?
  • 30. MONITORS • Chart the index size • Chart the number of current ops • Monitor index misses • Monitor replication lag • Monitor I/O performance (iostat) • Monitor disk space
  • 31. HOW CAN I MONITOR MONGODB?
  • 32. db.currentOp()
      {
        "opid" : 783608,
        "active" : true,
        "secs_running" : 149,
        "op" : "query",
        "ns" : "bugsnag.accounts",
        "query" : { "_id" : ObjectId("505bd6a6c6b6b99254000003") },
        "waitingForLock" : false,
        "numYields" : 349
      }
  • 33. db.serverStatus()
      "locks" : {
        "bugsnag" : {
          "timeLockedMicros" : {
            "r" : NumberLong(1639187950),
            "w" : NumberLong(1313312267)
          },
          "timeAcquiringMicros" : {
            "r" : NumberLong(1041368094),
            "w" : NumberLong(630905947)
          }
        },
      },
      "indexCounters" : {
        "btree" : {
          "accesses" : 610645909,
          "hits" : 610645909,
          "misses" : 0,
          "resets" : 0,
          "missRatio" : 0
        }
      },
      "opcounters" : {
        "insert" : 13674147,
        "query" : 5261723,
        "update" : 2576757,
        "delete" : 22324,
        "getmore" : 4459,
        "command" : 4382007
      },
  • 34. db.stats()
      {
        "db" : "bugsnag",
        "collections" : 14,
        "objects" : 68081951,
        "avgObjSize" : 10147.85350585104,
        "dataSize" : 690885147618,
        "storageSize" : 1290028235245,
        "numExtents" : 67,
        "indexes" : 28,
        "indexSize" : 21240430449,
        "fileSize" : 1925185536051,
        "nsSizeMB" : 16,
        "ok" : 1
      }
  • 35. MONGOTOP
      ns                       total   read   write
      bugsnag.events            80ms   12ms    68ms
      bugsnag.projects           2ms    2ms     0ms
      bugsnag.users              1ms    1ms     0ms
      bugsnag.system.indexes     4ms    4ms     0ms
  • 36. MONGOSTAT
                 insert  query  update  delete  getmore  command  flushes  faults  locked db  idx miss %
      localhost     147    210      51      13        4      215        0       0        14%           0
  • 37. MONGO MONITORING SERVICE • MMS is 10gen hosted Mongo monitoring • Available as web app (https://mms.10gen.com) • Android client also available from Google Play
  • 38. KIBANA & LOGSTASH • Logstash is open-source log parser - http://logstash.net/ • Kibana is an alternative UI for Logstash - http://kibana.org/ • Cool trend analysis for mongo logs
  • 39. Questions? • Check out www.bugsnag.com • Follow me on twitter @snmaynard
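
The key-length saving on slides 24 and 25 can be checked with a rough size estimate. This is a minimal sketch, not MongoDB's own accounting: `utf8Length`, `keyBytes` and `bsonSizeOfStringDoc` are hypothetical helper names, and the estimator assumes a flat document whose values are all strings, laid out as the BSON spec describes. The key bytes saved come to 34, which matches the slide's 92 − 58. The absolute BSON sizes come out larger than the slide's figures because each field also carries a type byte, length prefix and terminators.

```javascript
// UTF-8 byte length of a string.
function utf8Length(s) {
  return new TextEncoder().encode(s).length;
}

// Total bytes spent on field names alone.
function keyBytes(doc) {
  return Object.keys(doc).reduce((n, k) => n + utf8Length(k), 0);
}

// Estimated BSON size of a flat, string-only document (assumption: no
// nesting, no other types). Layout: int32 length + elements + 0x00, where
// each string element = type byte + key cstring + int32 length + bytes + 0x00.
function bsonSizeOfStringDoc(doc) {
  let size = 4 + 1; // document length prefix + trailing terminator
  for (const [key, value] of Object.entries(doc)) {
    if (typeof value !== "string") {
      throw new TypeError(`only string values are supported (key: ${key})`);
    }
    size += 1;                          // element type byte
    size += utf8Length(key) + 1;        // key as NUL-terminated cstring
    size += 4 + utf8Length(value) + 1;  // length prefix + bytes + NUL
  }
  return size;
}

const longKeys = {
  _id: "AHAHSPGPGSAVKLPAPHSVGKSALR",
  game_id: "8122",
  user_id: "1854",
  session_start: "51067007",
  session_end: "51067085",
};

const shortKeys = {
  _id: "AHAHSPGPGSAVKLPAPHSVGKSALR",
  g: "8122",
  u: "1854",
  s: "51067007",
  e: "51067085",
};

const savedBytes = keyBytes(longKeys) - keyBytes(shortKeys);
const longSize = bsonSizeOfStringDoc(longKeys);
const shortSize = bsonSizeOfStringDoc(shortKeys);

console.log(savedBytes);          // 34
console.log(longSize, shortSize); // 131 97
```

The saving per document is fixed by the key names, so the relative benefit shrinks as values get larger, which is the point note 25 makes.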

Editor's Notes

  3. All user activity is stored in Mongo: check-ins, game usernames, etc. The Heyzap SDK is in many top-tier titles, so there are lots of events; analytics cover the millions of game sessions involving the Heyzap SDK. Geospatial queries find where people checked in. MySQL supplements Mongo (it allows joins etc.), with Redis as a caching layer.
  4. High burst writes: people deploy bad code and we get all their exceptions. Bugsnag uses Mongo and Redis alone, with Redis as a caching layer on top of Mongo.
  6. Schemaless: no migrations. Migrating SQL caused a lot of downtime for Heyzap. Fire & Forget: by default Mongo doesn't wait for the write to complete before returning to the app.
  7. Many pros are also cons. Know what you are getting into. Schemaless means the app has to cope with bad data, migrations, bad states, etc. Fire & Forget: you can use the safe keyword, but that affects speed. No joins: you can only pull data from one collection at a time. There is a single write lock across a database, which is not great for a high proportion of writes, but writes yield. In 2.2, mitigate with a database per collection. 2.4 was expected to bring collection-level locks (they actually shipped in 3.0).
  9. Design with performance in mind and think about future-proofing. Work out where your pain points will be. Begin to scale before you hit 95% capacity; you need spare capacity in order to scale.
  11. Working set = frequently used data. In a logging app it would be the last n days of logs; 99% of queries would hit that. Indexes and documents should be in RAM for best results. The bare minimum is indexes!
  12. When RAM gets full! This is no exaggeration: Mongo's performance drops massively.
  13. For Heyzap, I/O is the single biggest headache on EC2: EBS suffers random spikes. Heyzap moved to provisioned IOPS when it was released, to smooth the spikes rather than to get better throughput.
  14. xfs supports I/O suspend and write-cache flushing, which are essential for AWS snapshots. Increase file descriptor limits to allow more open files. atime updates a file's access time on every read, which turns reads into writes = bad. Read-ahead means the system reads extra blocks from disk when doing a read: good for sequential access, bad for random (Mongo) access.
  16. Bigger machine. It is hard to get more on one machine, especially in the cloud. Can be viable in the short term. You can do this with no downtime; Heyzap & Bugsnag do.
  18. If you use replica sets, monitor the replication lag. It should be close to zero; otherwise users can write something but can't read it back. You can send a "Write Concern" to require replication to secondaries, but that can hurt you if they are behind. The whole working set still has to be in memory on each member, so replica sets scale the volume of reads, not data size.
  19. Mongo supports automatic sharding. Pick your shard key carefully to distribute the load correctly across shards. Sharding distributes the working set across all shards, which helps with big working sets, and also distributes writes. Heyzap did manual sharding by collection.
  21. Only returning what you need will be faster. On large datasets, I advise ensuring that pretty much every query is indexed. Cron jobs running unindexed queries have caused Heyzap downtime. On smaller datasets it's fine. Run explain() on a new query you are about to deploy and verify it uses an index. Saves a lot of downtime!
  22. De-normalizing means we don't have to read as many documents, which means we don't need to seek as much on disk. Not always applicable: sometimes the same data would end up in too many different places, which would make updates too hard.
  23. If we wanted to index on android and iphone separately here, that would be two indexes. We can combine them into one "bitfield", halving our index size. Heyzap had a very similar issue with its schema. It means we can use less RAM. #1 rule in Mongo: use less RAM.
  25. Whether it's worth it depends on how small your values/documents are. It can reduce your working set, since commonly accessed documents are smaller. No effect on indexes.
  26. The small performance hit from using the profiler is worth it. You need to know how fast your db is running. In mongo (the command-line shell) run db.setProfilingLevel(1, 100), which logs all queries that took more than 100ms. system.profile is a capped collection; it may need resizing depending on your throughput.
  27. Sample output of the profiler.
  28. ts = when it ran; tie that to your other logs. nscanned = number of index entries or documents scanned. scanAndOrder = set when Mongo can't use the index to sort. numYield = how many times it yielded, an indication of page faults etc. millis = total duration.
  30. Graphing index size allows you to predict scaling needs; Heyzap could predict accurately to within about a day. Spikes in current ops show you when to look at the profiler. Indexes should rarely miss. Replication lag leads to a bunk user experience on reads and to harder app code (read from primary).
  32. opid = operation ID; pass it to db.killOp() to stop the operation. ns = namespace = database.collection. currentOp can show you why everything has suddenly gone slow, but you can miss the guilty query; the profiler is better.
  33. Locks shows the time, in microseconds, spent holding locks and waiting for locks. Index counters say how many index hits we had. A miss means the index was not in RAM = bad.
  34. Useful stats. Index size: keep it in RAM, and graph it. These metrics can help you predict the need for scaling. You can also call db.collection.stats() to get something similar.
  35. You can use --locks to show lock statistics if you prefer that view. Good to check if you aren't sure which collections are heavily used.
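
Notes 18 and 30 recommend watching replication lag. One way to compute it is sketched below, assuming an object shaped like the output of the shell's rs.status(), using its `members[].name`, `stateStr` and `optimeDate` fields. The function name `replicationLagSeconds`, the host names, the timestamps and the one-second threshold are all hypothetical, chosen only for illustration.

```javascript
// For each secondary, return how many seconds its last applied operation
// is behind the primary's. `status` is assumed to look like rs.status():
// { members: [ { name, stateStr, optimeDate }, ... ] }.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary found in replica set status");
  }
  const lag = {};
  for (const member of status.members) {
    if (member.stateStr !== "SECONDARY") continue;
    const ms = primary.optimeDate.getTime() - member.optimeDate.getTime();
    lag[member.name] = Math.max(0, ms / 1000); // clamp small clock skew to 0
  }
  return lag;
}

// Hypothetical sample data standing in for a real rs.status() result.
const sampleStatus = {
  members: [
    { name: "db1.example.com:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2012-09-24T23:24:30Z") },
    { name: "db2.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2012-09-24T23:24:28Z") },
    { name: "db3.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2012-09-24T23:24:30Z") },
  ],
};

const MAX_LAG_SECONDS = 1; // hypothetical alert threshold
const lag = replicationLagSeconds(sampleStatus);
const lagging = Object.keys(lag).filter((name) => lag[name] > MAX_LAG_SECONDS);

console.log(lag);     // db2 is 2 seconds behind, db3 is 0 seconds behind
console.log(lagging); // [ 'db2.example.com:27017' ]
```

Charting the per-member values over time, rather than alerting on a single reading, makes it easier to tell a brief burst of writes from a secondary that is steadily falling behind.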