• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
 

2013 CPM Conference, Nov 6th, NoSQL Capacity Planning

on

  • 191 views

 

Statistics

Views

Total Views
191
Views on SlideShare
191
Embed Views
0

Actions

Likes
1
Downloads
0
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    2013 CPM Conference, Nov 6th, NoSQL Capacity Planning 2013 CPM Conference, Nov 6th, NoSQL Capacity Planning Presentation Transcript

    • #MongoDB #CMGNews NoSQL: Capacity Planning Asya Kamsky Senior Solutions Architect, MongoDB Inc.
    • No SQL?
    • Some History •  1970's Relational Databases Invented –  Storage is expensive –  Data is normalized –  Data is abstracted away from app
    • Some History •  1970's Relational Databases Invented –  Storage is expensive –  Data is normalized –  Data is abstracted away from app •  1980's RDBMS commercialized –  Client/Server model –  SQL becomes the standard
    • Some History •  1970's Relational Databases Invented –  Storage is expensive –  Data is normalized –  Data storage is abstracted away from app •  1980's RDBMS commercialized –  Client/Server model –  SQL becomes the standard •  1990's Things begin to change –  Client/Server=> 3-tier architecture –  Internet and the Web
    • Some History •  2000's Web 2.0 –  "Social Media" –  E-Commerce –  Decrease of HW prices –  Increase of collected data
    • Some History •  2000's Web 2.0 –  "Social Media" –  E-Commerce –  Decrease of HW prices –  Increase of collected data •  Result –  Need to scale -- How do we keep up?
    • Developers •  Agile Development Methodology –  Shorter development cycles –  Constant evolution of requirements –  Flexibility at design time
    • Developers •  Agile Development Methodology –  Shorter development cycles –  Constant evolution of requirements –  Flexibility at design time •  Relational Schema –  Hard to evolve •  must stay in sync with application
    • Workarounds on Application
    • NoSQL vs Relational •  Relational •  Key-Value Graph XML Document Column •  ACID •  BASE •  Two-phase commit •  ACID on document level •  Joins •  No Joins
    • NoSQL Examples
    • NoSQL Examples
    • NoSQL Examples
    • NoSQL Examples
    • All Different NoSQL != RDBMS MongoDB != Cassandra != Neo4j != Redis != Riak != CouchDB != Couchbase
    • MongoDB
    • MongoDB History •  Designed/developed by founders of Doubleclick, ShopWiki, GILT groupe, etc. •  First production site March 2008 - businessinsider.com •  Open Source – AGPL, written in C++ •  Version 0.8 – first official release February 2009 •  Version 2.4 – March 2013
    • MongoDB Design Goals
    • MongoDB: scalable, high-performance •  Document-oriented Storage •  Based on JSON Documents •  Flexible Schema •  Scalable Architecture •  Auto-sharding •  Replication & high availability •  Key Features Include: •  Full featured indexes •  Query language •  Aggregation & Map/Reduce
    • MongoDB Performance Just like all other systems, w/o understanding what their strengths and weaknesses are, it is easy to build a bad system.
    • MongoDB Performance Better data locality Relational MongoDB
    • Better Data Locality •  Data model means "entities" can reside "together" •  Optimize schema for read and write access patterns •  Minimize "seeks" as they dominate IO slowdown •  Failure to take advantage of document model: –  no improved performance –  all the disadvantages with non of the advantages! –  incorrect model can overshoot "all data embedded"
    • MongoDB Performance Better data locality Relational MongoDB In-Memory Caching
    • In-memory Caching •  memory mapped files, •  caching handled by OS, •  naturally leaves most frequently accessed data in RAM •  have enough RAM to fit indexes and working data set for best performance
    • MongoDB Performance In-Memory Caching Auto-Sharding High Availability Better data locality Relational MongoDB Read/Write scaling
    • Auto-Sharding •  horizontal scaling is "built-in" to the product •  Replication is for HA •  Sharding is for scaling •  Number of servers in replica set based on HA requirements •  Number of shards is based on capacity needed vs. single server/replicaset capacity
    • MongoDB Performance* Top 5 Marketing Firm Government Agency Top 5 Investment Bank 10+ fields, arrays, nested documents 20+ fields, arrays, nested documents Queries Key-based 1 – 100 docs/query 80/20 read/write Compound queries Range queries MapReduce 20/80 read/write Compound queries Range queries 50/50 read/write Servers ~250 ~50 ~40 Ops/sec 1,200,000 500,000 30,000 Data Key/value * These figures are provided as examples. Your application governs your performance.
    • Key Performance Considerations Capacity Planning Performance Tuning
    • Capacity Planning: Why, What, When Why? Consequences of not planning?
    • Capacity Planning: Why, What, When
    • Capacity Planning: Why, What, When What? Requirements
    • What •  There is one thing that is absolutely mandatory to have in order to succeed in capacity planning •  Without it, you will not be successful •  We must have REQUIREMENTS from business –  without requirements, we're building a roadmap without knowing the desired destination Imagine building a car without knowing what its top speed should be, acceleration, MPH, and cost?
    • Capacity Planning: Why, What, When What? •  Availability •  Throughput •  Responsiveness
    • What •  Availability: what is uptime requirement? •  Throughput –  average read/write/users –  peak throughput? –  OPS (operations per second)? per hour? per day? •  Responsiveness –  what is acceptable latency? –  is higher during peak times acceptable?
    • Capacity Planning: Why, What, When What? •  Availability •  Throughput •  Responsiveness
    • Capacity Planning: Why, What, When When? •  Before it's too late! Start Launch Version 2
    • When •  At the beginning before production, but after you launch you must continue the process •  Lack of future planning: Failure to project performance drop-off as the amount of data increases – •  Process (steps): -> ACTIONS –  Requirements ask, guess, try/measure. –  Understand application needs –  Choose hardware to meet that pattern (...) –  How many machines you need –  Monitor to recognize growth exceeding current capacity.
    • Capacity Planning: What? Understand Resources –  Storage –  Memory –  CPU –  Network •  Understand Your Application –  Monitor and Collect Metrics –  Model to Predict Change –  Allocate and Deploy –  (repeat process)
    • Resource Usage Storage –  IOPS –  Size –  Data & Loading Patterns Memory –  Working Set CPU –  Speed –  Cores Network –  Latency –  Throughput
    • Storage •  Active •  Archival •  Loading Patterns •  Integration (BI/DW)
    • Storage Capability Example IOPS 7,200 rpm SATA ~ 75-100 IOPS 15,000 rpm SAS ~ 175-210 IOPS Amazon EBS/Provisioned ~ 100 IOPS "up to" 2,000 IOPS Amazon SSD 9,000 – 120,000 IOPS Intel X25-E (SLC) ~ 5,000 IOPS Fusion IO ~ 135,000 IOPS Violin Memory 6000 ~ 1,000,000 IOPS
    • Storage Measuring and Monitoring
    • Storage Measuring and Monitoring
    • Storage Measuring and Monitoring
    • Memory Working Set –  Active Data in Memory –  Measured Over Periods
    • Memory Work: – Sorting – Aggregation – Connections
    • Memory Work: – Sorting – Aggregation – Connections SORTS Connections Aggregations
    • Memory Measuring and Monitoring New in 2.4 –  workingSet option on db.serverStatus() db.serverStatus( { workingSet: 1 } )
    • Memory & Storage ? > <
    • Memory & Storage MOPS: MongoDB Ops/sec
    • Memory & Storage MOPs MOPS: MongoDB Ops/sec PFs
    • Memory & Storage % Disk Util MOPS
    • CPU Non-indexed Data Sorting Aggregation –  Map/Reduce –  Framework Data –  Fields –  Nesting –  Arrays/Embedded-Docs
    • CPU MOPs
    • CPU MOPs CPU %
    • Network Latency –  WriteConcern –  ReadPreference –  Batching –  Documents (and Collections) Throughput –  Update/Write Patterns –  Reads/Queries
    • Starter Questions What is the working set? –  How does that equate to memory –  How much disk access will that require How efficient are the queries? What is the rate of data change? How big are the highs and lows?
    • Deployment Types All of these use the same resources: •  Single Instance •  Multiple Instances (Replica Set) •  Cluster (Sharding) •  Data Centers
    • Capacity Planning: Monitoring Monitor §  Storage §  Memory §  CPU §  Network §  Application Metrics
    • Monitoring •  CLI and internal status commands •  mongostat; mongotop; db.serverStatus() •  Plug-ins for munin, Nagios, cacti, etc. •  Integration via SNMP to other tools •  MMS
    • MongoDB Management Service Cloud-based suite of services for managing MongoDB deployments
    • A Picture Speaks a Thousand Words
    • Symptoms High Use CPU Similar Query Pattern
    • Monitoring Best Practices •  Monitor Logs –  Alert, escalate –  Correlate •  Disk –  Monitor •  Instrument/Monitor App (including logs!) •  Know your application and application (write) characteristics
    • Models •  Load/Users –  Response Time/TTFB •  System Performance –  Peak Usage –  Min/avg Usage
    • Velocity of Change •  Limitations -> takes time –  Data Movement –  Allocation/Provisioning (servers/mem/disk) •  Improvement –  Limit Size of Change (if you can) –  Increase Frequency –  MEASURE its effect –  Practice
    • Repeat (continuously) Repeat Testing Repeat Evaluations Repeat Deployment
    • #MongoDB Thank You Asya Kamsky Senior Solutions Architect, MongoDB