Deploying MongoDB can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a MongoDB single instance, a replica set, or tens of sharded clusters, you probably share the same challenges in trying to size that deployment. This talk will cover what resources MongoDB uses and how to plan for their use in your deployment. Topics include how to model and plan capacity for a new deployment, how to grow an existing one, and how to define the scalability steps along your path to the top. The goal of this presentation is to give you the tools you need to manage your MongoDB capacity planning tasks successfully.
30. What is failure?
• We have failed at Capacity Planning when our resources don't meet our requirements
• Because our requirements can have many dimensions, we may exceed them in one characteristic but not meet them in another
• This means that we can spend many $$$ and still fail!
31. What about Legacy Hardware?
• Let's hope whatever worked for this legacy technology also works for MongoDB
• The same principles of Capacity Planning still apply
33. Capacity Planning: When
• When? Before it's too late!
• Timeline: Start -> Launch -> Version 2
34. Capacity Planning is Measurement
Measuring early gives you a comparison point for when you need to do it again
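What an early measurement might look like in practice: a minimal sketch, assuming pymongo and a mongod on localhost:27017. The serverStatus fields used (mem, connections, opcounters) are real MongoDB metrics; the script name and output file are illustrative.

```python
# baseline.py - capture a capacity baseline so a later run has a
# comparison point. Assumes pymongo and a mongod on localhost:27017.
import json
import time

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
status = client.admin.command("serverStatus")

# Keep just the resource numbers that matter for capacity planning.
baseline = {
    "timestamp": time.time(),
    "mem_resident_mb": status["mem"]["resident"],
    "mem_virtual_mb": status["mem"]["virtual"],
    "connections": status["connections"]["current"],
    "opcounters": status["opcounters"],
}

# Persist the snapshot; re-run after growth or a change and diff the two.
with open("capacity-baseline.json", "w") as f:
    json.dump(baseline, f, indent=2)
```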
35. Velocity of Change
• Limitations -> change takes time
– Data Movement
– Allocation/Provisioning (servers/mem/disk)
• Improvements
– Limit the size of each change (if you can)
– Increase the frequency
– MEASURE its effect (see the sketch below)
– Practice
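One way to "MEASURE its effect": diff the cumulative operation counters across the change window. A sketch assuming pymongo and a mongod on localhost:27017; the 60-second window is an arbitrary choice.

```python
# measure_change.py - compare throughput before and after a change.
# Assumes pymongo and a mongod on localhost:27017; 60s window is arbitrary.
import time

from pymongo import MongoClient

client = MongoClient("localhost", 27017)

def opcounters():
    # serverStatus exposes cumulative per-operation counters.
    return client.admin.command("serverStatus")["opcounters"]

before = opcounters()
time.sleep(60)  # apply or observe the change during this window
after = opcounters()

for op in ("insert", "query", "update", "delete"):
    print(f"{op}: {(after[op] - before[op]) / 60.0:.1f} ops/sec")
```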
41. Starter Questions
• What is the working set?
– How does that equate to memory?
– How much disk access will that require?
• How efficient are the queries? (see the explain() sketch below)
• What is the rate of data change?
• How big are the highs and lows?
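To put a number on query efficiency, explain() on a representative query is the usual starting point. A sketch assuming pymongo; the 'events' collection and the filter are illustrative, and the field names shown ('n', 'nscanned') come from the 2.x explain format current for this talk (3.0+ moved them under executionStats).

```python
# query_efficiency.py - gauge how selective a representative query is.
# Assumes pymongo and a mongod on localhost:27017; the 'events'
# collection and the filter are purely illustrative.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["mydb"]

# explain() describes how the server would execute the query.
plan = db.events.find({"user_id": 42}).explain()

# 2.x format: 'n' = documents returned, 'nscanned' = index entries examined.
returned = plan.get("n", 0)
scanned = plan.get("nscanned", 0)
print(f"returned {returned} of {scanned} scanned")
if returned and scanned / returned > 10:
    print("scanning far more than we return - consider adding an index")
```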
42. Deployment Types
All of these use the same resources:
• Single Instance
• Multiple Instances (Replica Set)
• Cluster (Sharding)
• Data Centers
7: Why, What, When, Where, How, Who
– Skipping Who: Who is you – that's why you're here, right?
– Where: hopefully on the same systems you'll be running in production
– How: bonus
– Show evidence; capacity planning prerequisites
– Why do we need to do this? Epic fail – what's the consequence of not doing this?
– Thankless job: if you get it right no one notices, but if you get it wrong...
– Failure to project performance drop-off as the amount of data increases
– Once we launch, we don't want avoidable downtime due to poorly selected HW
– As our success grows, we want to stay in front of the demand curve
– We want to meet the business' and users' expectations
– We want to keep our jobs and get big raises! ;)
Initialize -> Election. Primary + data replication from primary to secondary
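For reference, the step that kicks off that initialize -> election sequence: a sketch using the replSetInitiate command via pymongo. The set name 'rs0' and the host names are assumptions, and each mongod must already be running with a matching --replSet option.

```python
# rs_init.py - initiate a replica set: triggers the election, after which
# the winner becomes primary and the others replicate from it.
# Assumes three mongods started with --replSet rs0; hosts are illustrative.
from pymongo import MongoClient

# Connect directly to one member (no replica set to discover yet).
client = MongoClient("node1.example.com", 27017, directConnection=True)

config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node1.example.com:27017"},
        {"_id": 1, "host": "node2.example.com:27017"},
        {"_id": 2, "host": "node3.example.com:27017"},
    ],
}

client.admin.command("replSetInitiate", config)
```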
– Top three all add to memory – what happens in RAM... if you have 64GB, 20GB may go to sorting, aggregation, and connections
– Show that each using more decreases availability for the others
– 2.4 will add a way to estimate "working set"
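The 2.4 working set estimator mentioned above surfaces through serverStatus when explicitly requested. A sketch assuming pymongo against a 2.4-era mongod; the workingSet section only appears on server versions that implement the estimator (it was removed again later), hence the guard.

```python
# working_set.py - ask a 2.4-era mongod for its working set estimate,
# alongside the other big RAM consumers. Assumes pymongo, localhost:27017.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# workingSet=1 asks serverStatus to include the estimator's output
# (added in MongoDB 2.4; absent on versions without the estimator).
status = client.admin.command("serverStatus", workingSet=1)

print("resident RAM (MB):", status["mem"]["resident"])
print("open connections:", status["connections"]["current"])

if "workingSet" in status:
    ws = status["workingSet"]
    print("working set pages in memory:", ws.get("pagesInMemory"))
    print("measured over (s):", ws.get("overSeconds"))
```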
• Priority
– Floating point number between 0..1000
– Highest member that is up to date wins
– Up to date == within 10 seconds of primary
– If a higher priority member catches up, it will force an election and win
• Slave Delay
– Lags behind master by a configurable time delay
– Automatically hidden from clients
– Protects against operator errors: fat fingering, application corrupting data
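Tying the Priority and Slave Delay notes together: a sketch of a replica set config with a preferred primary and a delayed, hidden member. The set name, hosts, and the one-hour delay are assumptions; 'slaveDelay' is the config field name from this era (later renamed secondaryDelaySecs), and a delayed member needs priority 0 so it can never be elected.

```python
# rs_priority_delay.py - replica set config combining member priority with
# a delayed, hidden member. Names, ports, and the 3600s delay are
# illustrative; mongods must be running with --replSet rs0.
from pymongo import MongoClient

client = MongoClient("node1.example.com", 27017, directConnection=True)

config = {
    "_id": "rs0",
    "members": [
        # Highest-priority member that is up to date wins the election.
        {"_id": 0, "host": "node1.example.com:27017", "priority": 2},
        {"_id": 1, "host": "node2.example.com:27017", "priority": 1},
        # Delayed member: lags the primary by an hour, hidden from
        # clients, priority 0 so it can never become primary.
        {
            "_id": 2,
            "host": "node3.example.com:27017",
            "priority": 0,
            "hidden": True,
            "slaveDelay": 3600,
        },
    ],
}

client.admin.command("replSetInitiate", config)
```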