Scaling to 30,000 Requests Per Second
and Beyond
with MongoDB
Mike Chesnut
Director of Operations Engineering
Crittercism
MongoDB World
June 23-25
world.mongodb.com
Code: 25GN for 25% off
What I’ll Talk About
● Crittercism - Overview
● Router (mongos) Architecture
● Sharding Considerations
● The Balancer and Me
● Q&A
How a Startup Gets Started
● Pick something and go with it
● Make mistakes along the way
● Correct the mistakes you can
● Work around the ones you can’t
Critter-What?
A Brief History...
Architecture
[Diagram, built up over several slides. Components, in the order they are added: API; Feedback; App Loads; Crashes; Handled Exceptions; DynamoDB; Metadata; a second API; Performance Data; Geo Data]
Critter-What?
… Which brings us to today.
Critter-What?
● feedback widget
● crash reporting
● live stats
● crash grouping
● app performance management
● geo data
● user analytics
● executive dashboard
Architecture
40,000+ req/s
Growth
Router Architecture
[Diagram: Client Application(s) on one side, MongoDB Cluster on the other. Each application server runs its client processes alongside a local mongos. The cluster consists of three replica sets, each with three mongod servers.]
Single mongos per client problems we encountered:
● thousands of connections to config servers
● config server CPU load
● configdb propagation delays
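A rough back-of-the-envelope sketch of why one mongos per application server can add up to thousands of config server connections. The host counts and the connections-per-mongos figure are hypothetical, not Crittercism's real numbers; the only point is that the total grows with the number of mongos processes.

```javascript
// Hypothetical estimate: if every mongos keeps a few connections open to each
// config server, the load on a config server grows linearly with the number
// of mongos processes in the fleet.
function configServerConnections(mongosCount, connsPerMongos) {
  // Connections arriving at EACH config server.
  return mongosCount * connsPerMongos;
}

// Assumed figures, for illustration only.
const perClient = configServerConnections(1000, 3); // one mongos on each of 1000 app servers
const routerTier = configServerConnections(20, 3);  // a dedicated tier of 20 mongos hosts

console.log(perClient, routerTier);
```

With these assumed inputs the per-client layout produces 50 times as many config server connections as the small router tier.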
Router Architecture
[Diagram: Client Application(s), Router Tier, MongoDB Cluster. The mongos routers sit in a separate Router Tier between the client processes on the application servers and the three replica sets (three mongod servers each).]
Router Architecture
Separate mongos tier advantages:
● greatly reduced number of connections to each mongod
● far fewer hosts talking to the config servers
● much faster configdb propagation
Disadvantages:
● additional network hop
● fewer points of failure
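The speaker notes mention routing clients to mongos routers in the same availability zone (AZ), using Chef. The deck does not show that configuration, so the following is only a sketch of the selection logic in JavaScript. The hostnames, AZ names, and database name are hypothetical; the one real feature it relies on is that a MongoDB connection string may list several mongos hosts separated by commas.

```javascript
// Hypothetical inventory of mongos routers, tagged by availability zone (AZ).
const routers = [
  { host: "mongos-a1.example.internal", az: "us-east-1a" },
  { host: "mongos-a2.example.internal", az: "us-east-1a" },
  { host: "mongos-b1.example.internal", az: "us-east-1b" },
];

// Build a connection string that lists only the routers in the caller's own
// AZ, falling back to every router if none match.
function mongosUri(routers, localAz, dbName, port = 27017) {
  const local = routers.filter((r) => r.az === localAz);
  const chosen = local.length > 0 ? local : routers;
  const hosts = chosen.map((r) => `${r.host}:${port}`).join(",");
  return `mongodb://${hosts}/${dbName}`;
}

console.log(mongosUri(routers, "us-east-1a", "appdata"));
```

Listing more than one router per AZ keeps a single mongos failure from cutting off the clients in that zone.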
Sharding Considerations
Pick something you want to live with.
The Balancer and Me
Why wouldn’t you run the balancer in the first place?
● great question
● for us, it’s because we deleted a ton of data at one point, and left a bunch of holes
○ we turned it off while deleting this data
○ and then were unable to turn it back on
● but maybe you start without it
● or maybe you need to turn it off for maintenance and forget to turn it back on
Obviously, don’t do this. But if you do, here’s what happens...
The Balancer and Me
Fresh, new, empty cluster… But no balancer running.
The Balancer and Me
Now we’re pretty full, so let’s add another shard...
The Balancer and Me
And keep inserting...
The Balancer and Me
Suddenly we find ourselves with a very unbalanced cluster.
The Balancer and Me
But if we enable the balancer, it will DoS the 5th shard!
The Balancer and Me
The approximate effect looks something like this:
The Balancer and Me
So what can we do?
1. add IOPS
2. make sure your config servers have plenty of CPU (and IOPS)
3. slowly move chunks manually
4. approach a balanced state
5. hold your breath
6. try re-enabling the balancer
The Balancer and Me
How to manually balance:
1. determine a chunk on a hot shard
mongos> use config
mongos> db.chunks.find({"shard":"<shard_name>",
"ns":"<db_name>.<collection>"}).limit(1).pretty()
You’ll get a single chunk document with its min and max bounds; note the shard key and ObjectId in its min:
"min" : {
"unsymbolized_hash" :
"1572663b72e87[...]",
"_id" : ObjectId("50b97db98238[...]")
},
2. monitor effects on both the source and target shards
iostat -xhm 1
mongostat
3. move the chunk
mongos> sh.moveChunk("<db_name>.<collection>", {
"unsymbolized_hash" : "1572663b72e87[...]",
"_id" : ObjectId("50b97db98238[...]") },
"<target_shard>")
4. allow the system to settle
5. repeat
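The "move one chunk, settle, repeat" loop can be planned ahead of time. The sketch below is not from the talk: it is a plain JavaScript helper (the name planMoves and its input shape are hypothetical) that takes chunk counts per shard and lists single-chunk moves from the fullest shard to the emptiest one until the cluster is roughly even.

```javascript
// Plan a sequence of single-chunk moves that walks an unbalanced cluster
// toward an even chunk count, one move at a time.
// `chunkCounts` maps shard name -> number of chunks (hypothetical input;
// in practice it could be derived from the config database).
function planMoves(chunkCounts, maxMoves) {
  const counts = { ...chunkCounts };
  const moves = [];
  while (moves.length < maxMoves) {
    const shards = Object.keys(counts);
    // Fullest shard is the source, emptiest is the target.
    const from = shards.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const to = shards.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    // Stop once the spread is at most one chunk: more moves would not help.
    if (counts[from] - counts[to] <= 1) break;
    counts[from] -= 1;
    counts[to] += 1;
    moves.push({ from, to });
  }
  return { moves, counts };
}

const plan = planMoves({ shard1: 4, shard2: 4, shard3: 0 }, 10);
console.log(plan.moves.length, plan.counts);
```

Each planned move would correspond to one sh.moveChunk call for a chunk on the source shard, followed by a pause while watching iostat and mongostat on both shards. The maxMoves cap keeps a single run deliberately small.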
The Balancer and Me
Conclusion here:
Run the balancer.
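A small guard against the "turned it off for maintenance and forgot" case. This is a sketch, not something shown in the talk: the function name is made up, while sh.getBalancerState() and sh.setBalancerState() are the mongo shell helpers for reading and setting the balancer state. The sh object is passed in as a parameter so the logic can be tried with a stub outside the shell.

```javascript
// Re-enable the balancer only if it is currently off.
// `sh` is the sharding helper object available in the mongo shell when
// connected to a mongos.
function ensureBalancerEnabled(sh) {
  if (sh.getBalancerState()) {
    return false; // already enabled, nothing changed
  }
  sh.setBalancerState(true);
  return true; // the balancer was off and has been turned back on
}
```

In a mongo shell session connected to a mongos this would be called as ensureBalancerEnabled(sh), for example at the end of a maintenance script. On a badly unbalanced cluster, do the manual moves first, as described above.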
Summary
● Design ahead of time
○ “NoSQL” lets you play it by ear
○ but some of these decisions will bite you later
● Be willing to correct past mistakes
○ dedicate time and resources to adapting
○ learn how to live with the mistakes you can’t correct
References
● MongoDB Blog post: http://blog.mongodb.org/post/77278906988/crittercism-scaling-to-billions-of-requests-per-day-on
● MongoDB Documentation on mongos routers: http://docs.mongodb.org/master/core/sharded-cluster-query-routing/
● MongoDB Documentation on the balancer: http://docs.mongodb.org/manual/tutorial/manage-sharded-cluster-balancer/
● MongoDB Documentation on shard keys: http://docs.mongodb.org/manual/core/sharding-shard-key/
Crittercism: http://www.crittercism.com/
Q&A
Thank You!


Editor's Notes

  1. I’m going to tell you the story of how we’ve scaled to handle over 30k req/s using a storage strategy based on MongoDB
  2. Between proposing this talk and now, we’ve actually grown some more, and now top 40-45k req/s on a daily basis. This is about 3.5B requests per day.
  3. This is a preview of a talk I’ll be giving at MongoDB World, June 23-25 in NYC. You can still register.
  4. and of course Crittercism will be there
  5. some advice from our experience about things to do and things not to do
  6. I’ll be sure to leave time for Q&A
  7. I’ll tell you how Crittercism got started, some of the lessons we’ve learned along the way, and some advice we can share based on those experiences
  8. September 2010 (from the Wayback Machine). Started as a “feedback widget”: enable mobile app developers to let their users provide “criticism” of their apps (outside of the app store), not just a star rating.
  9. this is pretty easy - set up a (mongo) db, put an api in front of it, collect user feedback from our SDK
  10. added more types of data we collect
  11. volume starts getting large, so let’s count app loads in a memory-based data store (redis), and persist it to mongo
  12. then we added user metadata as well, but that’s a different kind of data and a different volume and access pattern, so let’s add dynamodb into the mix
  13. our volume keeps going up, so let’s cache this app data to make our responses faster
  14. then we added APM, which introduced a lot of different data types and structures so we added another ingest API and postgres into the mix (but obviously we’re not going to talk about that part here…)
  15. Today (2014): what it’s evolved into. We’re collecting tons of detailed analytics data: crash reports, groupings. Geo data launched in 2013 (just kidding, this is stored in postgres). iPad app launched in 2014, with more aggregations of performance data (more ways to view it).
  16. Lots to deal with... So we started as a way for people to “criticize” your apps; then we helped you catch bugs, so we’re the ones doing the “criticism”.
  17. so how do we handle 40k/s on mongodb?
  18. We don’t, but that’s our ingest rate, and most of it ends up in mongodb. The takeaway here is to be willing to use whatever works.
  19. 2-year period went from 700/s (60M/day) to 40-45k/s (3.8B/day)
  20. one of the biggest things we did to help ourselves scale was to consolidate the mongos routers
  21. Default, first-pass architecture (for a sharded cluster): one mongos per client machine. Each client process connects to a local mongos router. Each mongos routes queries and returns results.
  22. could mean your application is reading stale data, or can’t find the data it needs when it needs it (and maybe it has to retry, which means it’s now slower)
  23. move the mongos routers to their own tier. be smart about how you route to them (we use chef to keep it within the same AZ)
  24. be aware that this does introduce some disadvantages, too
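The "keep it within the same AZ" idea from note 23 can be sketched roughly as below. The talk does this with chef; this Python stand-in only illustrates the selection logic. The hostnames, AZ names, and both function names are hypothetical, and the fallback to all routers when none are local is an assumption, not something the notes describe.

```python
# Hypothetical inventory of (mongos host, availability zone) pairs.
ROUTERS = [
    ("mongos-1.example.internal", "us-east-1a"),
    ("mongos-2.example.internal", "us-east-1a"),
    ("mongos-3.example.internal", "us-east-1b"),
]


def routers_for_client(client_az, routers=ROUTERS):
    """Return mongos hosts in the client's AZ, or every host if none match."""
    local = [host for host, az in routers if az == client_az]
    return local or [host for host, _ in routers]


def mongo_uri(hosts, port=27017):
    """Build a MongoDB connection string listing several mongos routers."""
    return "mongodb://" + ",".join(f"{host}:{port}" for host in hosts) + "/"


print(mongo_uri(routers_for_client("us-east-1a")))
```

Listing more than one router per client means a single mongos failure does not take the client down, at the cost of an extra network hop compared with a local mongos.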
  25. This is a fundamental design decision that will have huge implications for a long time, so think about it carefully.
  26. Hard (impossible) to change after the fact!
  27. Say you have 4 shards. Let’s say each of the NHL teams that made the playoffs this year has an app, and we shard by app_id.
  28. Say you have 4 shards. Let’s say each of the NHL teams that made the playoffs this year has an app, and we shard by app_id. Let’s distribute them evenly, as is likely to be the case (assuming a sufficiently randomly-generated app_id)
  29. this looks nice and even, right?
  30. So now it’s time for the Western Conference Finals, and the Blackhawks are playing the Kings
  31. So those 2 apps are going to get heavy use, but they’re on the same shard, so uh-oh...
  32. Now this shard isn’t happy: higher load, slower response time for queries to this shard (which are your most common queries due to these apps’ popularity)
  33. so let’s add another shard
  34. That might help if we have more teams’ apps to add
  35. Those new apps had somewhere to go, to keep our cluster balanced. But this hasn’t helped our uneven access pattern at all
  36. Only option now is to vertically scale the problem shard
  37. and hopefully that cools it off, but now we have an uneven cluster to manage. and what happens next year, when it’s two different teams in the conference finals? maybe we get lucky and they’re on different shards… but even then, maybe the access is uneven enough that those 2 shards still get hot. so maybe you just live with this and have heterogeneous shard servers. (this is probably a much lesser evil than trying to re-shard.) lesson: you’re going to have to live with the shard key you choose, so choose wisely! another option might’ve been to spread data for each app_id across all shards--but then your queries will likely be slower (due to having to read from many/all shards). it’s a trade-off.
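The hot-shard scenario in notes 27-37 can be put into numbers with a small simulation. Everything here is assumed for illustration: the request rates, the placeholder team names (only the two finalists come from the notes), and the index-based placement that happens to put both finalists on shard 0.

```python
from collections import defaultdict

NUM_SHARDS = 4

# 16 playoff-team apps; only the two finalists are named in the notes,
# the other 14 are placeholders.
others = [f"team_{i:02d}" for i in range(1, 15)]
apps = ["blackhawks"] + others[:3] + ["kings"] + others[3:]

# Even placement by count: app i lands on shard i % NUM_SHARDS,
# which puts both finalists (indexes 0 and 4) on shard 0.
shard_of = {app: i % NUM_SHARDS for i, app in enumerate(apps)}

# Assumed request rates (req/s): the finalists are hot, the rest are quiet.
load = {app: 100 for app in apps}
load["blackhawks"] = load["kings"] = 2000


def load_by_app_id():
    """Per-shard load when each app lives entirely on one shard."""
    per_shard = defaultdict(int)
    for app, shard in shard_of.items():
        per_shard[shard] += load[app]
    return [per_shard[s] for s in range(NUM_SHARDS)]


def load_spread_evenly():
    """Per-shard load if every app's data were spread across all shards."""
    total = sum(load.values())
    return [total / NUM_SHARDS] * NUM_SHARDS


print("shard by app_id:   ", load_by_app_id())
print("spread over shards:", load_spread_evenly())
```

The cluster is balanced by app count (4 apps per shard) yet one shard carries most of the traffic. The spread-out variant looks even, but it ignores the cost noted above: each query then has to read from many or all shards.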
  38. The balancer is a super-important part of a sharded mongo cluster… You should love it.
  39. Start with an empty cluster, and start filling it with data (we’ll denote “fullness” by going from green to red). This is an example of what can happen when the balancer is not running
  40. Okay, so now we have a very unbalanced cluster. 3 of our replica sets are very full, one is pretty full, and the newest one is hardly in use. (remember that the balancer isn’t running in this scenario)
  41. The balancer will see the full shards and one near-empty one, and will want to move a ton of chunks all at once, causing severe I/O strain on the system. (no way to tell the balancer to chill)
  42. remember that all of these chunk moves cause updates to your configdb, which places load on your config servers, and each update has to propagate to all mongos routers, too
  43. you’re going to be adding a lot of I/O to the system when you move chunks, and it still has to be able to perform its normal functions, so over-provision. we’re in AWS so we just go for PIOPS… but if you’re on physical hardware, consider RAIDing wider, or upgrading your SAN, or...
  44. updating the configdb (when you move chunks) puts load on your config servers, so make sure they’re ready to handle it
  45. this is tedious and will take a LONG time (more detail in a minute)
  46. gradually you’ll get to a happier place
  47. take a deep breath before you...
  48. be ready to turn it off and return to step 3 if needed, then try again
  49. (this was step 3)
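To get a feel for why moving chunks by hand "will take a LONG time", here is a toy model of the manual approach: move one chunk at a time from the fullest shard to the emptiest until the spread is small. This is not the MongoDB balancer's algorithm; the shard names, chunk counts, and the `max_spread` threshold are all made up for illustration.

```python
def rebalance_one_at_a_time(chunks, max_spread=8):
    """Move one chunk per step from the fullest shard to the emptiest.

    `chunks` maps shard name -> chunk count and is updated in place.
    Returns the list of (source, target) moves that were made.
    """
    if max_spread < 1:
        raise ValueError("max_spread must be at least 1")
    moves = []
    while True:
        source = max(chunks, key=chunks.get)
        target = min(chunks, key=chunks.get)
        if chunks[source] - chunks[target] <= max_spread:
            return moves
        chunks[source] -= 1
        chunks[target] += 1
        moves.append((source, target))


# Three very full shards, one pretty full, and a nearly empty new one.
cluster = {"rs0": 900, "rs1": 880, "rs2": 870, "rs3": 600, "rs4": 10}
moves = rebalance_one_at_a_time(cluster)
print(len(moves), "single-chunk moves ->", cluster)
```

In practice each of those moves would be a real chunk migration with a pause to check monitoring in between, which is where the time goes.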
  50. here’s an example from our “rawcrashlog” collection (hash and _id truncated)
  51. start both commands running on both the source and target
  52. don’t need to specify source shard, since your shard key (unsymbolized_hash in our case) and _id are sufficient for mongo to know where it’s coming from
  53. watch your monitoring (iostat/mongostat) -- look for spikes in page faults, queued reads/writes, database lock percentages. obviously look at your application monitoring too, to ensure no adverse effects. use MMS as well (e.g., lock %, page faults). if everything looks good, keep going. if not, you need to start over with more IOPS, more config server capacity, etc.
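The shape of a single manual move plus the go/no-go check from notes 50-53 might look something like the sketch below. The `moveChunk` admin command with `find` and `to` fields is a real MongoDB command, but the database name, shard name, placeholder key values, metric names, and limits here are all hypothetical, and the hash and `_id` stay as placeholders because the slide truncates them. The code only builds the command document; it does not talk to a cluster.

```python
def move_chunk_command(namespace, shard_key_match, to_shard):
    """Build a moveChunk command document (command name must come first)."""
    return {"moveChunk": namespace, "find": shard_key_match, "to": to_shard}


def safe_to_continue(sample, limits):
    """True if every monitored metric is at or below its limit."""
    return all(sample[name] <= limit for name, limit in limits.items())


# Placeholder values: the real hash and _id are truncated on the slide.
command = move_chunk_command(
    "mydb.rawcrashlog",  # hypothetical database name
    {"unsymbolized_hash": "<hash>", "_id": "<id>"},
    "shard0004",  # hypothetical target shard
)

# Hypothetical limits for the metrics the notes say to watch.
LIMITS = {"page_faults_per_s": 50, "queued_reads": 10, "queued_writes": 10, "lock_pct": 40}
sample = {"page_faults_per_s": 12, "queued_reads": 2, "queued_writes": 1, "lock_pct": 18}

print(command)
print("keep going" if safe_to_continue(sample, LIMITS) else "stop and add capacity")
```

With a driver such as pymongo, a document like this would be sent to the `admin` database through a mongos; no source shard appears in it, matching note 52.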
  54. seems obvious, but not always the case. and if you’re not running it, you can embark on this tedious journey to get it running again.
  55. best-case scenario is to make all of the right choices up front… but you’re probably not going to do that. (though hopefully you can learn a bit from our experience and minimize the wrong choices you make). the good news is MongoDB is still working for us, despite the headaches we’ve had to deal with.
  56. reminder that MongoDB World is right around the corner. along with all of these great presenters, I’ll be giving a version of this talk there, and would love to meet you