PetaMongo: A Petabyte Database for as Little as $200

1,000,000,000,000,000 bytes. On demand. Online. Live. Big doesn't quite describe this data. Amazon Web Services makes it possible to construct highly elastic computing systems, and you can further increase cost efficiency by leveraging the Spot Pricing model for Amazon EC2. We showcase elasticity by demonstrating the creation and teardown of a petabyte-scale multiregion MongoDB NoSQL database cluster, using Amazon EC2 Spot Instances, for as little as $200 in total AWS costs. Oh and it offers up four million IOPS to storage via the power of PIOPS EBS. Christopher Biow, Principal Technologist at 10gen | MongoDB covers MongoDB best practices on AWS, so you can implement this NoSQL system (perhaps at a more pedestrian hundred-terabyte scale?) confidently in the cloud. You could build a massive enterprise warehouse, process a million human genomes, or collect a staggering number of cat GIFs. The possibilities are huMONGOus.


Transcript

  • 1. PetaMongo: A Petabyte Database for as Little as $200 Chris Biow, MongoDB Miles Ward, AWS November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Agenda • MongoDB on AWS review – Guidance, Storage, Architecture • MongoDB at PetaScale on AWS
  • 3. Tools to simplify your design • Whitepaper • Marketplace • CloudFormation http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
  • 4. Marketplace • Easy to start a single node • Correctly configured PIOPS EBS storage • No extra cost https://aws.amazon.com/marketplace/pp/B00COAAEH8/ref=srh_res_product_title?ie=UTF8&sr=0-6&qid=1383897659043
  • 5. CloudFormation • Nested Templates • Nodes and Storage • Configurable Scale • CloudFormation: Your Infrastructure belongs in your source control mongodb.org/display/DOCS/Automating+Deployment+with+CloudFormation
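As a sketch of the "infrastructure belongs in your source control" point, a minimal CloudFormation fragment for one shard host with a PIOPS data volume might look like the following. This uses modern YAML template syntax; the AMI ID, volume size, and IOPS value are illustrative placeholders, not values from the talk.

```yaml
Resources:
  MongoShardInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: m1.small
      ImageId: ami-00000000            # placeholder AMI
  MongoDataVolume:
    Type: AWS::EC2::Volume
    Properties:
      Size: 1024                       # GiB
      VolumeType: io1                  # Provisioned IOPS volume
      Iops: 1000
      AvailabilityZone: !GetAtt MongoShardInstance.AvailabilityZone
  MongoDataAttachment:
    Type: AWS::EC2::VolumeAttachment
    Properties:
      InstanceId: !Ref MongoShardInstance
      VolumeId: !Ref MongoDataVolume
      Device: /dev/sdf
```

Checking a template like this into source control, with nested templates for nodes and storage, gives the configurable scale the slide describes.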
  • 6. AWS Storage Options: EBS PIOPS, SSD
    – EBS Provisioned IOPS volumes deliver predictable, high performance for I/O-intensive workloads
    – Specify the IOPS required up front; EBS provisions them for the lifetime of the volume
    – 4,000 IOPS per volume; stripe volumes to get thousands of IOPS to a single EC2 instance
    – High I/O instances (hi1.4xlarge): for applications that require tens of thousands of IOPS; local SSD eliminates network latency/bandwidth as a performance constraint to storage
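The striping point is easy to sanity-check against the abstract's "four million IOPS" figure, using numbers that appear elsewhere in the deck (4,000 PIOPS per volume here, 24 volumes per instance on slide 38, 44 instances on slide 12). A back-of-envelope sketch:

```python
# Aggregate IOPS from striped PIOPS volumes; all three figures are
# taken from the deck (slides 6, 12, and 38).
PIOPS_PER_VOLUME = 4000
VOLUMES_PER_INSTANCE = 24
INSTANCES = 44

iops_per_instance = PIOPS_PER_VOLUME * VOLUMES_PER_INSTANCE
total_iops = iops_per_instance * INSTANCES
print(iops_per_instance)  # 96000 provisioned IOPS per instance
print(total_iops)         # 4224000 -- the abstract's "four million IOPS"
```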
  • 7. AWS Storage Options, testing random 4K reads:
    – EBS, one volume: ~200 MongoOPS with some variability, <1 MB/s
    – EBS, loaded instance: ~1,000 MongoOPS with some variability, <10 MB/s
    – PIOPS, one volume: 2,000 MongoOPS with <1% variability, 16 MB/s
    – PIOPS, loaded instance: 16,000 MongoOPS with <1% variability, 64 MB/s
    – PIOPS, loaded cluster instance: 320 MB/s
    – SSD (hi1.4xlarge ephemeral): ~64,000 MongoOPS with low variability, ~245 MB/s
  • 8. [Chart: testing random 4K reads; EBS, PIOPS, and SSD compared, with PIOPS stable]
  • 9. Stability Tips • Ext4 or XFS, nodiratime, noatime • Raise file descriptor limits • Set disk read-ahead • No large virtual memory pages • SNAPSHOT SNAPSHOT SNAPSHOT
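A sketch of what those tips look like as concrete settings. The device name, mount point, and limit values below are common MongoDB-on-Linux conventions I've assumed, not figures from the slide:

```
# /etc/fstab -- ext4 (or XFS) data volume with atime updates disabled
/dev/xvdf  /data  ext4  defaults,noatime,nodiratime  0 2

# /etc/security/limits.conf -- raise file descriptor limits for mongod
mongod  soft  nofile  64000
mongod  hard  nofile  64000

# One-off commands, run as root:
#   blockdev --setra 32 /dev/xvdf   # set a modest disk read-ahead
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled   # no large pages
```

The last tip stands on its own: snapshot early and often, since EBS snapshots are the cross-AZ and cross-region recovery path described on the next slide.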
  • 10. Retain a PIOPS EBS node for snapshot backups • Snapshots allow cross-AZ and cross-region recovery • SSD hosts as primary • Shard for scale
  • 11. Another option… cr1.8xlarge (244 GB RAM)
  • 12. So, about that Petabyte
    v.cheap (Spot Market):
    – m1.small
    – 1024 shards
    – 1 TB EBS from snapshot
    – PowerBench reader
    – Aggregation queries
    v.fast (AutoScaling On-Demand):
    – cc2.8xlarge
    – 44 instances x 24 shards each
    – 24 TB x 1K PIOPS, indexed
    – YCSB loader
    – Aggregation queries
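The cheap configuration's petabyte claim is just the shard math. A quick sanity check, using decimal terabytes to match the abstract's 10^15 bytes:

```python
# 1,024 m1.small spot shards, each backed by a 1 TB EBS volume
# restored from snapshot (figures from this slide).
TB = 10**12                    # decimal terabyte
shards = 1024
bytes_total = shards * 1 * TB  # 1 TB per shard
print(bytes_total)             # 1024000000000000
print(bytes_total >= 10**15)   # True: just over one petabyte
```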
  • 13. The naming of parts: Amazon terms and their nicknames
    – Provisioned IOPS → PIOPS
    – Elastic Compute Cloud → EC2
    – EC2 Spot Instances → "Here, Spot!"
    – Auto Scaling groups → ASG
  • 14. Players
  • 15. MongoDB • Document-model, NoSQL database • Dev adoption is STRONG • MongoDB Inc. trending toward zero h/w • Scale-up with commodity h/w • Scale-out with sharding • Scale-around with replication
  • 16. Dev Activity: stackoverflow.com
  • 17. AWS
    – PIOPS for an IO-hungry client
    – 40% of MongoDB customer usage
    – 90% of MongoDB internal usage
    – More ports :2701[79] than :[15]521
  • 18. PB & Chocolate: differentiators for mutual customers
    – Fast time-to-solution
    – Easy global distribution
    – Document model
    – Secondary indexes
    – Geo, text, security
    – Fast analytic aggregation
  • 19. Challenge
  • 20. Motivation: IWBCI…
    – Test scale-out of MongoDB beyond typical
    – Learn massive scale-out on AWS
    – Do it as cheaply as possible
    – Apply customer data
    – Break the petabarrier
  • 21. m1.small us-east-1 Spot Market
  • 22. m1.small us-east-1d Spot Market
  • 23. Proposal
    – m1.small Spot: 1050 units x 3 hr @ $0.007/hr = $22.05
    – m1.large: 3 units x 48 hr @ $0.056/hr = $8.07
    – S3: 1 TB x 1 wk @ $95/TB/mo = $23.75
    – EBS: 1024 x 1 TB x 1 hr @ $100/TB/mo = $142.22
    – S3 → EBS: 1 PB, lazy-loaded, $0/TB = $0.00
    – Total: $196.09
    http://calculator.s3.amazonaws.com/G77798SS77SH72
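The line items multiply out as stated. A quick reproduction, assuming a 720-hour billing month and one week = 1/4 month, which is how the $23.75 and $142.22 figures fall out:

```python
HOURS_PER_MONTH = 720  # assumed billing month

line_items = {
    "m1.small Spot (1050 x 3 hr @ $0.007/hr)": 1050 * 3 * 0.007,
    "m1.large (3 x 48 hr @ $0.056/hr)":        3 * 48 * 0.056,
    "S3 (1 TB x 1 wk @ $95/TB/mo)":            95 / 4,
    "EBS (1024 TB x 1 hr @ $100/TB/mo)":       1024 * 100 / HOURS_PER_MONTH,
    "S3 -> EBS restore (lazy, $0/TB)":         0.0,
}

total = sum(line_items.values())
for name, cost in line_items.items():
    print(f"{name:42s} ${cost:7.2f}")
print(f"{'Total':42s} ${total:7.2f}")  # $ 196.09, matching the slide
```

(The m1.large line computes to $8.06; the slide's $8.07 is a rounding difference.)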
  • 24. Initial Directions
    – Spot instance requests: m1.small market, mostly us-east-1 (my zone "d"); net $0.007/hr = $7/hr per 1,000 shards
    – Perl: use Net::Amazon::EC2; gaps filled by parsing the EC2 command-line API
    – Defer Chef, Puppet, CloudFormation
    – YCSB
    – userdata.sh
    – t1.micro / m1.small / cr1.8xlarge
  • 25. MongoDB Architecture • 3x Config Servers – mongod --configsvr • Routing – mongos --configdb a,b,c • Replica sets (not used) • Shards – mongod • Client load – java -cp [] com.yahoo.ycsb.Client
  • 26. Range-based sharding
  • 27. Hash-based sharding
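The contrast between slides 26 and 27 can be sketched in a few lines. This is a toy model of the two strategies, not MongoDB's actual chunk mechanics; the shard count and key space are arbitrary assumptions.

```python
import hashlib

SHARDS = 4  # assumed toy cluster size

def range_shard(key: int, max_key: int = 1_000_000) -> int:
    """Range-based: contiguous key ranges map to the same shard."""
    return min(key * SHARDS // max_key, SHARDS - 1)

def hash_shard(key: int) -> int:
    """Hash-based: hash the key first, so nearby keys scatter."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % SHARDS

sequential = [1000, 1001, 1002, 1003]
print([range_shard(k) for k in sequential])  # [0, 0, 0, 0]: one hot shard
print([hash_shard(k) for k in sequential])   # shards depend on the digest
```

Range sharding preserves locality for range queries but sends monotonically increasing keys to a single hot shard during bulk loads; hashing spreads writes evenly at the cost of scatter-gather range reads.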
  • 28. Process Flow
    Spot Instance Requests (sir-):
    – Rejected
    – Awaiting evaluation
    – Awaiting fulfillment (partial; launch intervals)
    – Fulfilled
    Instances (i-):
    – Requested
    – Initializing (i)
    – Config running (C)
    – MongoS starting (s)
    – MongoS running (S)
    – MongoD starting (D)
    – Failed/slow response (X)
  • 29. Spot Instance Lifecycle [diagram: sir- request through Config, MongoS, MongoD, Shard, and Sharded states]
  • 30. Progress
  • 31. Scale Out Experience
    – Sharding by magnitude: 4, 16, 64, 256, 1024
    – 4: functional validation
    – 16: startup variation, process flow
    – 64: full speed ahead!
    – 256: chunk distribution time, single Config
    – 1024: market dependence, client wire saturation
  • 32. Lessons Learned • Code defensively • Monitor: MongoDB Mgt Svc, top, iftop, iostat, mongostat • Avoid sentimental attachment (i-8bad8bee) • Prototype / refactor • Make the instances do the work • Mitigate chunk migration
  • 33. Refactor
    – YCSB → BenchPress
    – request-spot-instances → Auto Scaling Groups
    – Net::Amazon::EC2 → use VM::EC2;
    – monolithic Perl → gsh
    – polling → serf
  • 34. Secure Cloud Networking: enable customers to easily connect, manage, and secure applications across VPCs, regions, and hybrid infrastructures. Cloud-scale your VPC connectivity! After the session: complete the survey for a $500 gift card, or schedule a demo: Info@unionbaynetworks.com [diagram: VPC 1, VPC 2, Application Service Mesh]
  • 35. [Chart: 1 KB docs loaded, 512 shards; documents loaded over time from 5:16 to 7:40, scale 0 to 1.8 billion, with a 1x RAM marker]
  • 36. [Chart: 1 KB docs loaded, 1035 shards, 2 conflicting jobs; documents loaded over time from 4:19 to 13:55, scale 0 to 2.5 billion, with a 1x RAM marker]
  • 37. Dee-Luxe
  • 38. [Chart: cc2.8xlarge, 24 x 1 TB 4K-PIOPS EBS, bulk-loading 64 KB docs; documents loaded from 12:00 to 12:28 AM, scale 0 to 3.5 million, with a 100% RAM marker]
  • 39. [Chart: cc2.8xlarge, 24 x 1 TB 4K-PIOPS EBS, bulk-loading 64 KB docs; documents loaded from 12:00 AM to 7:12 PM, scale 0 to 140 million]
  • 40. Further Work • • • • • • Completion Replication Self-healing MongoDB-appropriate benchmarks Customer data Self-hosting cluster
  • 41. Please give us your feedback on this presentation BDT307 As a thank you, we will select prize winners daily for completed surveys!