PetaMongo:
A Petabyte Database for as Little as $200
Chris Biow, MongoDB
Miles Ward, AWS
November 13, 2013

PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

1,000,000,000,000,000 bytes. On demand. Online. Live. Big doesn't quite describe this data. Amazon Web Services makes it possible to construct highly elastic computing systems, and you can further increase cost efficiency by leveraging the Spot Pricing model for Amazon EC2. We showcase elasticity by demonstrating the creation and teardown of a petabyte-scale multiregion MongoDB NoSQL database cluster, using Amazon EC2 Spot Instances, for as little as $200 in total AWS costs. Oh, and it offers up four million IOPS to storage via the power of PIOPS EBS. Christopher Biow, Principal Technologist at 10gen | MongoDB, covers MongoDB best practices on AWS, so you can implement this NoSQL system (perhaps at a more pedestrian hundred-terabyte scale?) confidently in the cloud. You could build a massive enterprise warehouse, process a million human genomes, or collect a staggering number of cat GIFs. The possibilities are huMONGOus.

  1. PetaMongo: A Petabyte Database for as Little as $200
     Chris Biow, MongoDB
     Miles Ward, AWS
     November 13, 2013
     © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Agenda
     • MongoDB on AWS review
       – Guidance, Storage, Architecture
     • MongoDB at PetaScale on AWS
  3. Tools to simplify your design
     • Whitepaper
     • AWS Marketplace
     • AWS CloudFormation
     http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
  4. AWS Marketplace
     • Easy to start a single node
     • Correctly configured PIOPS EBS Storage
     • No extra cost
     https://aws.amazon.com/marketplace/pp/B00COAAEH8/ref=srh_res_product_title?ie=UTF8&sr=0-6&qid=1383897659043
  5. AWS CloudFormation
     • Nested Templates
     • Nodes and Storage
     • Configurable Scale
     • AWS CloudFormation: Your Infrastructure belongs in your source control
     mongodb.org/display/DOCS/Automating+Deployment+with+CloudFormation
  6. AWS Storage Options: EBS, PIOPS, SSD
     • Amazon EBS – Provisioned IOPS volumes
       • Deliver predictable, high performance for I/O-intensive workloads
       • Specify the IOPS required up front, and EBS provisions them for the lifetime of the volume
         – 4000 IOPS per volume; can stripe to get thousands of IOPS to an EC2 instance
     • High IO Instances – hi1.4xlarge
       • For some applications that require tens of thousands of IOPS
       • Eliminates network latency/bandwidth as a performance constraint to storage
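
The striping arithmetic on slide 6 can be sketched as follows. This is a minimal illustration, assuming every striped volume is provisioned at the 4000 IOPS per-volume ceiling quoted on the slide and that striping scales linearly; the function names are hypothetical and not from the talk.

```python
import math

# Per-volume Provisioned IOPS ceiling quoted on slide 6 (a 2013 figure).
IOPS_PER_VOLUME = 4000


def volumes_needed(target_iops, iops_per_volume=IOPS_PER_VOLUME):
    """Smallest number of striped volumes whose combined IOPS meets the target."""
    if target_iops <= 0:
        raise ValueError("target_iops must be positive")
    return math.ceil(target_iops / iops_per_volume)


def aggregate_iops(volume_count, iops_per_volume=IOPS_PER_VOLUME):
    """Combined IOPS of volume_count volumes, assuming linear scaling."""
    return volume_count * iops_per_volume


print(volumes_needed(16_000))  # 4
print(aggregate_iops(1024))    # 4096000
```

Under these assumptions, 1024 volumes total roughly 4.1 million IOPS, the same order as the four million quoted in the talk abstract. Whether the speakers derived the figure this way is not stated in the slides.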
  7. AWS Storage Options
     Testing: random 4k reads
     EBS
     • One Volume: ~200 MongoOPS with some variability, <1 MB/s
     • Loaded instance: ~1000 MongoOPS with some variability, <10 MB/s
     PIOPS
     • One Volume: 2000 MongoOPS with <1% variability, 16 MB/s
     • Loaded Instance: 16,000 MongoOPS with <1% variability, 64 MB/s
     • Loaded Cluster Instance: MongoOPS, 320 MB/s
     SSD
     • hi1.4xlarge ephemeral: ~64,000 MongoOPS with low variability, ~245 MB/s
  8. Testing: random 4k reads
     [Chart labels: PIOPS, Stable, EBS, SSD]
  9. Stability Tips
     • Ext4 or XFS, nodiratime, noatime
     • Raise file descriptor limits
     • Set disk read-ahead
     • No large virtual memory pages
     • SNAPSHOT SNAPSHOT SNAPSHOT
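
Some of the stability tips on slide 9 can be verified on a running Linux host. The sketch below is one hedged way to do that; the helper names are hypothetical, the input strings are assumed to come from `/proc/mounts` and `/sys/kernel/mm/transparent_hugepage/enabled`, and read-ahead is not covered.

```python
import resource


def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode from text such as 'always madvise [never]'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active mode found in: %r" % sysfs_text)


def missing_mount_options(options, wanted=("noatime", "nodiratime")):
    """List the wanted options absent from a comma-separated mount option string."""
    present = set(options.split(","))
    return [opt for opt in wanted if opt not in present]


def nofile_soft_limit():
    """Current soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


print(active_thp_mode("always madvise [never]"))
print(missing_mount_options("rw,noatime,data=ordered"))
print(nofile_soft_limit())
```

The parsing helpers take plain strings so they can be exercised without root access; reading the actual files is left to the caller.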
  10. • Retain a PIOPS EBS node for snapshot backups
      • Snapshots allow cross-AZ and cross-region recovery
      • SSD hosts as primary
      • Shard for scale
  11. Another option…
      244 GB cr1.8xlarge
  12. So, about that Petabyte
      v.cheap
      • Spot Market
      • m1.small
      • 1024 shards
      • 1TB EBS from snapshot
      • PowerBench reader
      • Aggregation queries
      v.fast
      • Auto Scaling On-Demand
      • m2.4xlarge
      • 50 shards
      • 20TB PIOPS indexed
      • PowerBench loader
      • Aggregation queries
  13. The naming of parts
      Amazon Terms
      • Provisioned IOPS
      • Elastic Compute Cloud
      • EC2 Spot Instances
      • Auto Scaling groups
      Nicks
      • PIOPS
      • EC2
      • Here, Spot!
      • ASG
  14. Players
  15. MongoDB
      • Document-model, NoSQL database
      • Dev adoption is STRONG
      • MongoDB Inc. trending toward zero h/w
      • Scale-up with commodity h/w
      • Scale-out with sharding
      • Scale-around with replication
  16. AWS
      • PIOPS for an IO-hungry client
      • 40% of MongoDB customer usage
      • 90% of MongoDB internal usage
      • More ports :2701[79] than :[15]521
  17. PB & Chocolate
      Differentiators for mutual customers
      • Fast time-to-solution
      • Easy global distribution
      • Secondary index
      • Geo, text, security
      • Fast analytic aggregation
  18. Challenge
  19. Motivation: IWBCI…
      • Test scale-out of MongoDB beyond typical
      • Learn massive scale-out on AWS
      • Do it as cheaply as possible
      • Apply customer data
      • Break the petabarrier
  20. m1.small us-east-1 Spot Market
  21. m1.small us-east-1d Spot Market
  22. Proposal
      Item            Units        Time    Unit Cost     Net Cost
      m1.small Spot   1050         3hr     $0.007/hr     $22.05
      m1.large        3            48hrs   $0.056/hr     $8.07
      S3              1TB          1wk     $95/TB/mo     $23.75
      EBS             1024 x 1TB   1hr     $100/TB/mo    $142.22
      S3 → EBS        1PB          ??      $0/TB         $0.00
      Total                                              $196.09
      http://calculator.s3.amazonaws.com/G77798SS77SH72
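
The cost table on slide 22 can be reproduced with a few lines of arithmetic. The sketch below uses the slide's unit prices and makes two assumptions that are not stated in the deck: a 720-hour month for the EBS line and four weeks per month for the S3 line. Both were chosen because they reproduce the slide's figures.

```python
# Assumptions (not stated on the slide): 30-day month, 4 weeks per month.
HOURS_PER_MONTH = 720
WEEKS_PER_MONTH = 4

line_items = {
    # units * hours * $/hr
    "m1.small Spot": 1050 * 3 * 0.007,
    "m1.large": 3 * 48 * 0.056,
    # 1 TB for one week at $95/TB/month
    "S3": 1 * 95 / WEEKS_PER_MONTH,
    # 1024 x 1 TB for one hour at $100/TB/month
    "EBS": 1024 * 100 * 1 / HOURS_PER_MONTH,
    # The slide prices the S3-to-EBS transfer at $0/TB.
    "S3 to EBS": 0.0,
}

for name, cost in line_items.items():
    print(f"{name:<14} ${cost:8.2f}")

total = sum(line_items.values())
print(f"{'Total':<14} ${total:8.2f}")
```

With these assumptions the total rounds to $196.09, matching the slide. The m1.large line computes to $8.064, a cent below the $8.07 shown, which suggests the slide rounded that line up.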
  23. Initial Directions
      • Spot Instance requests
        – m1.small market, mostly us-east-1 (my zone “d”)
        – Net: $0.007 / hour = $7 / hr / K-shard
      • Perl
        – use Net::Amazon::EC2;
        – gaps: parse EC2 command-line API
      • Defer Chef, Puppet, AWS CloudFormation
      • YCSB
      • userdata.sh
      • t1.micro / m1.small / cr1.8xlarge
  24. MongoDB Architecture
      • 3x Config Servers
        – mongod --configsvr
      • Routing
        – mongos --configdb a,b,c
      • Replica sets (not used)
      • Shards
        – mongod
      • Client load
        – java -cp [] com.yahoo.ycsb.Client
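
The roles on slide 24 map onto a small launch plan. The sketch below only assembles the command lines shown on the slide; the host names and helper functions are hypothetical, and real deployments would add options (ports, data paths) that the slide omits.

```python
def config_server_cmd():
    """Command line for a config server, as shown on the slide."""
    return ["mongod", "--configsvr"]


def router_cmd(config_hosts):
    """Command line for a mongos router pointing at the three config servers."""
    if len(config_hosts) != 3:
        raise ValueError("the slide's layout uses exactly three config servers")
    return ["mongos", "--configdb", ",".join(config_hosts)]


def shard_cmd():
    """Command line for a shard; replica sets were not used in the talk."""
    return ["mongod"]


def cluster_plan(config_hosts, router_hosts, shard_hosts):
    """Map each (hypothetical) host name to the command it should run."""
    plan = {host: config_server_cmd() for host in config_hosts}
    plan.update({host: router_cmd(config_hosts) for host in router_hosts})
    plan.update({host: shard_cmd() for host in shard_hosts})
    return plan


plan = cluster_plan(
    config_hosts=["a", "b", "c"],
    router_hosts=["router-0"],
    shard_hosts=["shard-%04d" % n for n in range(4)],
)
for host, argv in plan.items():
    print(host, " ".join(argv))
```

Keeping the plan as data makes it easy to hand each instance its own command at boot, which fits the later lesson to make the instances do the work.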
  25. Range-based sharding
  26. Hash-based sharding
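
Slides 25 and 26 name the two sharding strategies without detail. The toy sketch below illustrates the general difference for monotonically increasing keys. It is not MongoDB's implementation: the split points, key values, and the modulo placement are assumptions made for illustration (MongoDB range-partitions the hashed key space into chunks rather than taking a modulo).

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def range_shard(key, split_points):
    """Index of the range that contains key, given sorted split points."""
    return bisect_right(split_points, key)


def hash_shard(key, shard_count):
    """Toy hashed placement: MD5 of the key, reduced modulo the shard count."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical workload: 100 sequential keys, all above the last split point.
keys = range(1000, 1100)
splits = [250, 500, 750]

print("range:", Counter(range_shard(k, splits) for k in keys))
print("hash: ", Counter(hash_shard(k, 4) for k in keys))
```

With range placement every sequential key lands in the last range, so one shard takes all the inserts; with hashed placement the same keys scatter across shards.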
  27. Process Flow
      Spot Instance Request (sir-)
      • Rejected
      • Awaiting evaluation
      • Awaiting fulfillment
        – Partial
        – Launch intervals
      • Fulfilled
      Instances (i-)
      • Requested
      • Initializing (i)
      • Config running (C)
      • MongoS starting (s)
      • MongoS running (S)
      • MongoD starting (D)
      • Failed/slow response (X)
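
The single-letter codes on slide 27 suggest a compact per-instance status display. The sketch below models them as an enum and tallies a status string; the one-character-per-instance format is an assumption, and the names are hypothetical. The "Requested" state is left out because the slide gives it no code.

```python
from collections import Counter
from enum import Enum


class InstanceState(Enum):
    """Instance states and the one-letter codes listed on the slide."""
    INITIALIZING = "i"
    CONFIG_RUNNING = "C"
    MONGOS_STARTING = "s"
    MONGOS_RUNNING = "S"
    MONGOD_STARTING = "D"
    FAILED_OR_SLOW = "X"


def summarize(status_line):
    """Count instances per state; raises ValueError on an unknown code."""
    counts = Counter(InstanceState(code) for code in status_line)
    return {state.name: n for state, n in counts.items()}


print(summarize("iiCSDX"))
```

A tally like this makes stragglers visible at a glance, for example a growing `FAILED_OR_SLOW` count while the cluster is still starting up.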
  28. Spot Instance Lifecycle
      [Diagram labels: Config, sir-, Sharded, Shard, MongoD, MongoS]
  29. Progress
  30. Scaling Experience
      • 4, 16, 64, 256, 1024
      • 4: minimum magnitude for 3x Config
      • 16: startup variation, process flow
      • 64: full speed ahead!
      • 256: chunk distribution time
      • 1024: market dependence, client wire saturation
  31. Lessons Learned
      • Code defensively
      • Monitor: MongoDB Mgt Svc, top, iftop, mongostat
      • Avoid sentimental attachment
      • Prototype / refactor
      • Make the instances do the work
      • Mitigate chunk migration
  32. Refactor
      • BenchPress YCSB
      • Auto Scaling groups request-spot-instances
      • use VM::EC2; Net::Amazon::EC2
      • gsh monolithic Perl
      • serf polling
  33. Dee-Luxe
  34. Docs Loaded, 512 shards
      [Chart: documents loaded, 0 to 1.8E+09, against time, 5:16:48 to 7:40:48; annotation "1X RAM"]
  35. Further Work
      • Replication
      • Self-healing
      • MongoDB-appropriate benchmarks
      • Customer data
      • Self-hosting cluster
  36. Please give us your feedback on this presentation
      BDT307
      As a thank you, we will select prize winners daily for completed surveys!
