PetaMongo: A Petabyte Database for as Little as $200
Chris Biow, MongoDB
Miles Ward, AWS
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Agenda
• MongoDB on AWS review
– Guidance, Storage, Architecture

• MongoDB at PetaScale on AWS
Tools to simplify your design
• Whitepaper
• Marketplace
• CloudFormation

http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
• Easy to start a single node
• Correctly configured PIOPS EBS storage
• No extra cost
https://aws.amazon.com/marketplace/pp/B00COAAEH8/ref=srh_res_product_title?ie=UTF8&sr=0-6&qid=1383897659043
CloudFormation
• Nested Templates
• Nodes and Storage
• Configurable Scale
• CloudFormation: your infrastructure belongs in your source control
mongodb.org/display/DOCS/Automating+Deployment+with+CloudFormation
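As a rough illustration of nested templates with configurable scale, here is a minimal Python sketch that generates a parent CloudFormation template containing one nested `AWS::CloudFormation::Stack` per shard node. The child template URL and its parameters (`ShardIndex`, `VolumeSizeGiB`, `Iops`) are hypothetical; they are not the templates from the MongoDB page above.

```python
import json
from typing import Any, Dict


def parent_template(shard_count: int, child_template_url: str,
                    volume_size_gib: int = 1000, iops: int = 4000) -> Dict[str, Any]:
    """Build a parent template with one nested stack per shard node.

    Hypothetical: the child template at child_template_url is assumed to
    declare ShardIndex, VolumeSizeGiB, and Iops parameters.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    resources: Dict[str, Any] = {}
    for i in range(shard_count):
        resources[f"Shard{i}"] = {
            "Type": "AWS::CloudFormation::Stack",
            "Properties": {
                "TemplateURL": child_template_url,
                # Nested-stack parameters are passed as a map of strings.
                "Parameters": {
                    "ShardIndex": str(i),
                    "VolumeSizeGiB": str(volume_size_gib),
                    "Iops": str(iops),
                },
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Hypothetical MongoDB cluster with {shard_count} shard nodes",
        "Resources": resources,
    }


if __name__ == "__main__":
    # Write the generated JSON to a file and commit it next to the child template.
    print(json.dumps(parent_template(4, "https://example.com/mongod-node.json"), indent=2))
```

Because CloudFormation caps the number of resources in a single template, a cluster with hundreds of shards would need another level of nesting, for example one child stack per group of shards.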
AWS Storage Options
EBS / PIOPS / SSD
• EBS Provisioned IOPS volumes
– Deliver predictable, high performance for I/O-intensive workloads
– Specify the IOPS you need up front; EBS provisions them for the lifetime of the volume
– Up to 4,000 IOPS per volume; stripe volumes to deliver many thousands of IOPS to one EC2 instance
• High I/O instances (hi1.4xlarge)
– For applications that require tens of thousands of IOPS
– Eliminate network latency and bandwidth as a performance constraint on storage
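The striping point is simple arithmetic: provisioned IOPS add up across the volumes in a RAID-0 stripe, and for random reads throughput is roughly IOPS times the I/O size. A minimal sketch, assuming the 4,000-IOPS-per-volume ceiling quoted above and 4 KiB reads; it gives an upper bound only, since the instance's own EBS or network bandwidth may cap throughput first.

```python
def striped_iops(volumes: int, iops_per_volume: int = 4000) -> int:
    """Provisioned IOPS available from a RAID-0 stripe of identical PIOPS volumes."""
    if volumes < 1 or iops_per_volume < 1:
        raise ValueError("volumes and iops_per_volume must be positive")
    return volumes * iops_per_volume


def throughput_mb_s(iops: int, io_size_bytes: int = 4096) -> float:
    """Approximate throughput in MB/s (10**6 bytes per second) at a fixed I/O size."""
    return iops * io_size_bytes / 1_000_000


if __name__ == "__main__":
    for n in (1, 4, 8):
        iops = striped_iops(n)
        print(f"{n} volume(s): {iops} IOPS, ~{throughput_mb_s(iops):.1f} MB/s at 4 KiB")
```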
AWS Storage Options
Testing: random 4k reads
EBS
– One volume: ~200 MongoOPS with some variability, <1 MB/s
– Loaded instance: ~1,000 MongoOPS with some variability, <10 MB/s
PIOPS
– One volume: 2,000 MongoOPS with <1% variability, 16 MB/s
– Loaded instance: 16,000 MongoOPS with <1% variability, 64 MB/s
– Loaded cluster instance: … MongoOPS, 320 MB/s
SSD
– hi1.4xlarge ephemeral: ~64,000 MongoOPS with low variability, ~245 MB/s
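The MongoOPS figures above come from the presenters' own harness, which is not shown. As a rough stand-in for the same access pattern, here is a minimal Python sketch that issues random, 4 KiB-aligned reads against a file with `os.pread` (POSIX only) and reports reads per second. The file path and read count are assumptions; unless the file is much larger than RAM or the page cache is dropped first, it measures cached reads rather than the storage device.

```python
import os
import random
import time

BLOCK = 4096  # 4 KiB, matching the "random 4k reads" test pattern


def random_read_rate(path: str, reads: int = 10_000, seed: int = 0) -> float:
    """Issue `reads` random 4 KiB-aligned reads from `path`; return reads per second."""
    blocks = os.path.getsize(path) // BLOCK
    if blocks == 0:
        raise ValueError("file must contain at least one full 4 KiB block")
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(reads):
            offset = rng.randrange(blocks) * BLOCK
            if len(os.pread(fd, BLOCK, offset)) != BLOCK:
                raise OSError(f"short read at offset {offset}")
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return reads / elapsed if elapsed > 0 else float("inf")
```

For example, `random_read_rate("/data/testfile")` against a hypothetical test file on the volume under test; run several client processes in parallel to approximate a "loaded instance".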
Testing: random 4k reads
[Chart: EBS, PIOPS, and SSD compared; PIOPS labeled "Stable"]
Stability Tips
• Ext4 or XFS, nodiratime, noatime

• Raise file descriptor limits
• Set disk read-ahead

• No large virtual memory pages
• SNAPSHOT SNAPSHOT SNAPSHOT
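The stability tips map onto a handful of Linux settings. A minimal sketch that renders them as lines for a node's startup script, reading "no large virtual memory pages" as disabling transparent huge pages. The device `/dev/md0`, mount point `/data`, 64000-descriptor limit, and 32-sector read-ahead are illustrative assumptions, not values from the slides, and the filesystem is assumed to already exist (for example, a volume restored from a snapshot).

```python
def tuning_script(device: str = "/dev/md0", mount_point: str = "/data",
                  fs: str = "xfs", nofile: int = 64000,
                  readahead_sectors: int = 32) -> str:
    """Render hypothetical startup-script lines for the stability tips."""
    if fs not in ("ext4", "xfs"):
        raise ValueError("the tips recommend ext4 or xfs")
    lines = [
        "#!/bin/sh",
        "set -e",
        # Ext4 or XFS, mounted without access-time updates.
        f"echo '{device} {mount_point} {fs} defaults,noatime,nodiratime 0 0' >> /etc/fstab",
        f"mkdir -p {mount_point}",
        f"mount {mount_point}",
        # Read-ahead is expressed in 512-byte sectors.
        f"blockdev --setra {readahead_sectors} {device}",
        # Disable transparent huge pages (the sysfs path differs on some distributions).
        "echo never > /sys/kernel/mm/transparent_hugepage/enabled",
        # Raise the open-file limit for processes started later in this script.
        f"ulimit -n {nofile}",
    ]
    return "\n".join(lines) + "\n"
```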
• Retain a PIOPS EBS node for snapshot backups
• Snapshots allow cross-AZ and cross-region recovery
• SSD hosts as primary
• Shard for scale
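For snapshot backups and cross-region recovery, a minimal sketch using boto3's EC2 `create_snapshot`, the `snapshot_completed` waiter, and `copy_snapshot`. The clients are passed in so the logic can be exercised without AWS credentials; the label text is a placeholder. A snapshot of a running mongod, especially across a striped set of volumes, is only crash-consistent unless writes are flushed and locked first (for example with `db.fsyncLock()` in the mongo shell).

```python
from typing import Any, Dict, List


def snapshot_and_copy(src_ec2: Any, dst_ec2: Any, volume_ids: List[str],
                      src_region: str, label: str) -> Dict[str, str]:
    """Snapshot volumes in the source region and copy them to another region.

    src_ec2 and dst_ec2 are boto3 EC2 clients for the source and destination
    regions. Returns {volume ID: snapshot ID in the destination region}.
    """
    source_snaps: Dict[str, str] = {}
    for vol in volume_ids:
        resp = src_ec2.create_snapshot(VolumeId=vol, Description=f"{label} {vol}")
        source_snaps[vol] = resp["SnapshotId"]

    # Only completed snapshots can be copied, so wait for all of them first.
    if source_snaps:
        src_ec2.get_waiter("snapshot_completed").wait(
            SnapshotIds=list(source_snaps.values()))

    copies: Dict[str, str] = {}
    for vol, snap_id in source_snaps.items():
        # copy_snapshot is issued against the destination-region client.
        resp = dst_ec2.copy_snapshot(SourceRegion=src_region,
                                     SourceSnapshotId=snap_id,
                                     Description=f"{label} copy of {snap_id}")
        copies[vol] = resp["SnapshotId"]
    return copies
```

In practice the clients would be `boto3.client("ec2", region_name="us-east-1")` and a second client for the recovery region. Creating every snapshot before waiting lets EBS work on them in parallel.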
Another option…

cr1.8xlarge with 244 GB of RAM
So, about that Petabyte
v.cheap
• Spot Market
• m1.small
• 1024 shards
• 1TB EBS from snapshot
• PowerBench reader
• Aggregation queries

v.fast
• AutoScaling On-Demand
• cc2.8xlarge
• 44 instances x 24 shards each
• 24TBx1K PIOPS indexed
• YCSB loader
• Aggregation queries
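A quick check that both plans reach a petabyte, assuming 1 TB per shard in each (the v.fast line reads as 24 x 1 TB per cc2.8xlarge, consistent with the later Dee-Luxe chart). Because EBS volume sizes are specified in GiB, a 1 TB volume is taken here as 1 TiB, which makes the v.cheap plan exactly one pebibyte.

```python
from typing import Tuple

TIB = 2**40  # bytes in a tebibyte
PIB = 2**50  # bytes in a pebibyte


def cluster_capacity(instances: int, shards_per_instance: int,
                     tib_per_shard: int = 1) -> Tuple[int, int]:
    """Return (shard count, total bytes), assuming one volume per shard."""
    shards = instances * shards_per_instance
    return shards, shards * tib_per_shard * TIB


cheap_shards, cheap_bytes = cluster_capacity(1024, 1)  # one m1.small per shard
fast_shards, fast_bytes = cluster_capacity(44, 24)     # 24 shards per cc2.8xlarge

print(f"v.cheap: {cheap_shards} shards, {cheap_bytes / PIB:.3f} PiB")
print(f"v.fast:  {fast_shards} shards, {fast_bytes / PIB:.3f} PiB")
```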
The naming of parts
Amazon terms → nicks
• Provisioned IOPS → PIOPS
• Elastic Compute Cloud → EC2
• EC2 Spot Instances → Here, Spot!
• Auto Scaling groups → ASG
Players
MongoDB
• Document-model NoSQL database
• Dev adoption is STRONG
• MongoDB Inc. trending toward zero h/w

• Scale-up with commodity h/w
• Scale-out with sharding
• Scale-around with replication
[Chart: Dev Activity: stackoverflow.com]
AWS
• PIOPS for an IO-hungry client
• 40% of MongoDB customer usage
• 90% of MongoDB internal usage
• More ports :2701[79] than :[15]521
PB & Chocolate
Differentiators for mutual customers
• Fast time-to-solution
• Easy global distribution
• Document model
• Secondary indexes
• Geo, text, security
• Fast analytic aggregation
Challenge
Motivation: IWBCI…
• Test MongoDB scale-out beyond typical deployments
• Learn massive scale-out on AWS
• Do it as cheaply as possible
• Apply customer data
• Break the petabarrier
[Chart: m1.small Spot Market, us-east-1]
[Chart: m1.small Spot Market, us-east-1d]
Proposal
Item            Units         Time     Unit Cost     Net Cost
m1.small Spot   1050          3 hr     $0.007/hr     $22.05
m1.large        3             48 hr    $0.056/hr     $8.07
S3              1 TB          1 wk     $95/TB/mo     $23.75
EBS             1024 x 1 TB   1 hr     $100/TB/mo    $142.22
S3 → EBS        1 PB          lazy     $0/TB         $0.00
Total                                                $196.09
http://calculator.s3.amazonaws.com/G77798SS77SH72
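The proposal's line items can be reproduced from the 2013 unit prices in the table. The month lengths are assumptions chosen because they reproduce the slide's figures: 720 hours for the EBS row and 4 weeks for the S3 row. The m1.large row computes to $8.064 (the slide shows $8.07); the total rounds to $196.09 either way.

```python
HOURS_PER_MONTH = 720  # assumption: reproduces the EBS line item
WEEKS_PER_MONTH = 4    # assumption: reproduces the S3 line item

line_items = {
    "m1.small Spot": 1050 * 3 * 0.007,        # instances x hours x $/hr
    "m1.large": 3 * 48 * 0.056,               # instances x hours x $/hr
    "S3": 1 * 95 / WEEKS_PER_MONTH,           # 1 TB stored for one week
    "EBS": 1024 * 1 * 100 / HOURS_PER_MONTH,  # 1024 x 1 TB for one hour
    "S3 -> EBS": 0.0,                         # lazy restore from snapshot
}
total = sum(line_items.values())

for item, cost in line_items.items():
    print(f"{item:<15} ${cost:8.3f}")
print(f"{'Total':<15} ${total:8.2f}")
```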
Initial Directions
• Spot instance requests
– m1.small market, mostly us-east-1 (my zone “d”)
– Net: $0.007/hour per shard = $7/hour per 1,000 shards

• Perl
– use Net::Amazon::EC2;
– gaps: parse EC2 command-line API

• Defer Chef, Puppet, CloudFormation
• YCSB
• userdata.sh
• t1.micro / m1.small / cr1.8xlarge
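The presenters drove spot requests from Perl (Net::Amazon::EC2). Purely for illustration, here is a Python sketch that swaps in boto3: it builds the keyword arguments for the EC2 client's `request_spot_instances` call in the m1.small market in zone us-east-1d, plus the per-hour cost check ($0.007 per shard, $7 per 1,000 shards). The AMI ID and user-data script are placeholders, and m1.small is a previous-generation type that may not be offered everywhere today.

```python
import base64
from typing import Any, Dict

SPOT_PRICE_PER_HOUR = 0.007  # m1.small Spot price used in the proposal


def spot_request_kwargs(count: int, ami_id: str, user_data: str,
                        zone: str = "us-east-1d", instance_type: str = "m1.small",
                        max_price: float = SPOT_PRICE_PER_HOUR) -> Dict[str, Any]:
    """Keyword arguments for boto3's EC2 client.request_spot_instances()."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "SpotPrice": f"{max_price:.3f}",  # the API takes the price as a string
        "InstanceCount": count,
        "Type": "one-time",
        "LaunchSpecification": {
            "ImageId": ami_id,  # placeholder AMI
            "InstanceType": instance_type,
            "Placement": {"AvailabilityZone": zone},
            # Spot launch specifications expect base64-encoded user data.
            "UserData": base64.b64encode(user_data.encode()).decode(),
        },
    }


def hourly_cost(count: int, price: float = SPOT_PRICE_PER_HOUR) -> float:
    """Cluster cost per hour at a flat Spot price."""
    return count * price
```

Usage would look like `boto3.client("ec2", region_name="us-east-1").request_spot_instances(**spot_request_kwargs(1024, "ami-EXAMPLE", script))`, where `script` is the contents of a userdata.sh.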
MongoDB Architecture
• 3x Config Servers
– mongod --configsvr

• Routing
– mongos --configdb a,b,c

• Replica sets (not used)
• Shards
– mongod

• Client load
– java -cp [] com.yahoo.ycsb.Client
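The roles above can be expressed as command lines. A minimal sketch that generates them for hypothetical host names, using MongoDB's default ports per role. `--shardsvr` is an addition to the slide's plain `mongod`; here it mainly switches the default port to 27018. The three-host `--configdb a,b,c` form is the 2013-era mirrored config server layout; MongoDB 3.4 and later require config servers to run as a replica set instead.

```python
from typing import Dict, List, Union


def cluster_commands(config_hosts: List[str], shard_hosts: List[str],
                     dbpath: str = "/data/db") -> Dict[str, Union[str, List[str]]]:
    """Command lines for each role on the architecture slide (2013-era layout).

    Host names are hypothetical; ports are MongoDB's defaults for each role.
    """
    if len(config_hosts) != 3:
        raise ValueError("this layout uses exactly three config servers")
    configdb = ",".join(f"{h}:27019" for h in config_hosts)
    return {
        "config": f"mongod --configsvr --dbpath {dbpath}",  # each config host, port 27019
        "router": f"mongos --configdb {configdb}",          # listens on 27017
        "shard": f"mongod --shardsvr --dbpath {dbpath}",    # each shard host, port 27018
        # Run in a mongo shell connected to the mongos to register each shard.
        "add_shards": [f'sh.addShard("{h}:27018")' for h in shard_hosts],
    }
```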
Range-based sharding
Hash-based sharding
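The difference between the two schemes shows up with monotonically increasing keys such as timestamps. A minimal simulation, with hypothetical split points: under range-based sharding every new key lands in the last chunk, while splitting a hashed key space into equal ranges spreads the same inserts across all shards. MD5 is a stand-in hash here; this does not reproduce MongoDB's hashed-index values.

```python
import bisect
import hashlib
from collections import Counter
from typing import List


def range_shard(key: int, boundaries: List[int]) -> int:
    """Range-based: shard i owns keys in [boundaries[i-1], boundaries[i])."""
    return bisect.bisect_right(boundaries, key)


def hash_shard(key: int, shards: int) -> int:
    """Hash-based: split the 64-bit hashed key space into equal ranges (MD5 stand-in)."""
    h = int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")
    return (h * shards) >> 64


# Monotonically increasing inserts arriving after the split points were chosen.
new_keys = range(10_000, 20_000)
boundaries = [2_500, 5_000, 7_500]  # hypothetical split points for 4 shards
print("range:", Counter(range_shard(k, boundaries) for k in new_keys))
print("hash: ", Counter(hash_shard(k, 4) for k in new_keys))
```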
Process Flow
Spot Instance Request (sir-)

• Rejected
• Awaiting evaluation
• Awaiting fulfillment
– Partial
– Launch intervals

• Fulfilled

Instances (i-)
• Requested
• Initializing (i)
• Config running (C)
• MongoS starting (s)
• MongoS running (S)
• MongoD starting (D)
• Failed/slow response (X)
[Diagram: Spot Instance Lifecycle, with labels sir-, Config, MongoS, Shard, MongoD, Sharded]
Progress
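The per-instance status codes from the Process Flow list lend themselves to a compact progress display, one character per instance. A minimal sketch of such a display; the names `NodeState`, `progress_line`, and `status_counts` are hypothetical, and this is not the presenters' Perl code.

```python
from collections import Counter
from enum import Enum
from typing import Dict


class NodeState(Enum):
    """Per-instance status codes from the Process Flow slide."""
    INITIALIZING = "i"
    CONFIG_RUNNING = "C"
    MONGOS_STARTING = "s"
    MONGOS_RUNNING = "S"
    MONGOD_STARTING = "D"
    FAILED_OR_SLOW = "X"


def progress_line(states: Dict[str, NodeState], width: int = 64) -> str:
    """One status character per instance, ordered by instance ID, wrapped at `width`."""
    chars = "".join(states[iid].value for iid in sorted(states))
    return "\n".join(chars[i:i + width] for i in range(0, len(chars), width))


def status_counts(states: Dict[str, NodeState]) -> str:
    """Summary such as 'C=3 D=1000 X=2', sorted by status code."""
    counts = Counter(s.value for s in states.values())
    return " ".join(f"{code}={counts[code]}" for code in sorted(counts))
```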
Scale Out Experience
• Sharding by magnitude: 4, 16, 64, 256, 1024
– 4: functional validation
– 16: startup variation, process flow
– 64: full speed ahead!
– 256: chunk distribution time, single Config
– 1024: market dependence, client wire saturation
Lessons Learned
• Code defensively
• Monitor: MongoDB Management Service (MMS), top, iftop, iostat, mongostat
• Avoid sentimental attachment (i-8bad8bee)
• Prototype / refactor
• Make the instances do the work
• Mitigate chunk migration
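The slides do not say how chunk migration was mitigated. One common approach is to pre-split the collection and place the still-empty chunks before loading, so the balancer has little data to move. A minimal sketch that generates mongo shell `sh.splitAt()` and `sh.moveChunk()` commands for an integer shard key; the namespace `test.docs`, the key field, the key range, and the shard names are hypothetical.

```python
from typing import List


def presplit_commands(ns: str, key: str, lo: int, hi: int, shards: List[str]) -> List[str]:
    """Mongo shell commands that split [lo, hi) into one chunk per shard and place them."""
    n = len(shards)
    if n < 2 or hi - lo < n:
        raise ValueError("need at least two shards and at least one key per shard")
    step = (hi - lo) // n
    points = [lo + step * i for i in range(1, n)]
    cmds = [f'sh.splitAt("{ns}", {{{key}: {p}}})' for p in points]
    # Chunk i contains `lo` (i == 0) or starts at points[i - 1]; send each to its own shard.
    # A move to the shard that already owns a chunk is rejected and can be ignored.
    starts = [lo] + points
    cmds += [f'sh.moveChunk("{ns}", {{{key}: {s}}}, "{shard}")'
             for s, shard in zip(starts, shards)]
    return cmds
```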
Refactor
• BenchPress YCSB
• Auto Scaling Groups request-spot-instances
• use VM::EC2; Net::Amazon::EC2
• gsh monolithic Perl
• serf polling
[Chart: 1KB Docs Loaded, 512 shards. Documents loaded (0 to 1,800,000,000) vs. time (5:16 to 7:40); marker "^ 1X RAM"]
[Chart: 1KB Docs Loaded, 1035 shards, 2 jobs conflicting. Documents loaded (0 to 2,500,000,000) vs. time (4:19 to 13:55); marker "^ 1X RAM"]
Dee-Luxe
[Chart: cc2.8xlarge, 24 x 1TB-4K PIOPS EBS, bulk-load 64KB docs. Documents loaded (0 to 3,500,000) vs. time (12:00 AM to 12:28 AM); marker "100% RAM"]
[Chart: cc2.8xlarge, 24 x 1TB-4K PIOPS EBS, bulk-load 64KB docs. Documents loaded (0 to 140,000,000) vs. time (12:00 AM to 7:12 PM)]
Further Work
• Completion
• Replication
• Self-healing
• MongoDB-appropriate benchmarks
• Customer data
• Self-hosting cluster
Please give us your feedback on this presentation (BDT307)


PetaMongo: A Petabyte Database for as Little as $200

  • 1. PetaMongo: A Petabyte Database for as Little as $200 Chris Biow, MongoDB Miles Ward, AWS November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Agenda • MongoDB on AWS review – Guidance, Storage, Architecture • MongoDB at PetaScale on AWS
  • 3. Tools to simplify your design • Whitepaper • Marketplace • CloudFormation http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
  • 4. • Easy to start a single node • Correctly configured PIOPS EBS Storage • No extra cost https://aws.amazon.com/marketplace/pp/B00COAAEH8/ref=srh_res_product_title?ie=UTF8&sr=0-6&qid=1383897659043
  • 5. CloudFormation • Nested Templates • Nodes and Storage • Configurable Scale • CloudFormation: Your Infrastructure belongs in your source control mongodb.org/display/DOCS/Automating+Deployment+with+CloudFormation
  • 6. AWS Storage Options EBS PIOPS SSD • EBS – Provisioned IOPS volumes • Deliver predictable, high performance for I/O intensive workloads • Specify IOPS required upfront, and EBS provisions for lifetime of volume – 4000 IOPS per volume, can stripe to get thousands of IOPS to an EC2 instance • High IO Instances – hi1.4xlarge • • For some applications that require tens of thousands of IOPS Eliminates network latency/bandwidth as a performance constraint to storage
  • 7. AWS Storage Options Testing: random 4k reads EBS + One Volume: ~200 MongoOPS with some variability, <1mb/s Loaded instance: ~ 1000 MongoOPS with some variability <10mb/s One Volume: 200 0 MongoOPS with <1% variability, 16mb/s Loaded Instance: 16,000 MongoOPS with <1% variability, 64mb/s PIOPS Loaded Cluster Instance: SSD MongoOPS, 320mb/s Hi1.4xlarge ephemeral: ~64,000 MongoOPS with low variability, ~245mb/s
  • 8. Testing: random 4k reads + PIOPS Stable EBS SSD
  • 9. Stability Tips • Ext4 or XFS, nodiratime, noatime • Raise file descriptor limits • Set disk read-ahead • No large virtual memory pages • SNAPSHOT SNAPSHOT SNAPSHOT
  • 10. • Retain a PIOPS EBS node for snapshot backups • Snapshots allow cross-AZ and cross-region recovery • SSD hosts as primary • Shard for scale
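The snapshot-based recovery path can be sketched with the AWS CLI: snapshot the retained PIOPS EBS volume, copy the snapshot to another region, or restore it as a new volume in another Availability Zone. This sketch only builds the argument lists. The volume ID, snapshot ID, regions, and zone are hypothetical. A data set striped across several volumes would also need writes paused (for example with MongoDB's `db.fsyncLock()`) so that all volumes are captured at a consistent point.

```python
import shlex

def create_snapshot_cmd(volume_id: str, region: str, description: str) -> list[str]:
    """Point-in-time snapshot of one EBS volume."""
    return ["aws", "ec2", "create-snapshot", "--region", region,
            "--volume-id", volume_id, "--description", description]

def copy_snapshot_cmd(snapshot_id: str, source_region: str, dest_region: str) -> list[str]:
    """copy-snapshot is issued in the destination region and pulls from the source region."""
    return ["aws", "ec2", "copy-snapshot", "--region", dest_region,
            "--source-region", source_region, "--source-snapshot-id", snapshot_id]

def restore_volume_cmd(snapshot_id: str, zone: str, iops: int) -> list[str]:
    """New PIOPS volume from a snapshot, in any Availability Zone of the snapshot's region."""
    return ["aws", "ec2", "create-volume", "--snapshot-id", snapshot_id,
            "--availability-zone", zone, "--volume-type", "io1", "--iops", str(iops)]

if __name__ == "__main__":
    # Hypothetical IDs, regions, and zone
    for cmd in (create_snapshot_cmd("vol-12345678", "us-east-1", "mongo backup"),
                copy_snapshot_cmd("snap-12345678", "us-east-1", "us-west-2"),
                restore_volume_cmd("snap-12345678", "us-east-1b", 4000)):
        print(shlex.join(cmd))  # shell-safe quoting for arguments with spaces
```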
  • 12. So, about that Petabyte • v.cheap – Spot Market – m1.small – 1024 shards – 1TB EBS from snapshot – PowerBench reader – Aggregation queries • v.fast – AutoScaling On-Demand – cc2.8xlarge – 44 instances x 24 shards each – 24TBx1K PIOPS indexed – YCSB loader – Aggregation queries
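A quick arithmetic check of the two designs, assuming 1 TB of EBS per shard (one shard per m1.small in the cheap design is also an assumption):

```python
SHARD_TB = 1  # assumption: 1 TB of EBS per shard

designs = {
    "v.cheap": 1024,    # spot m1.small instances, one shard each (assumed)
    "v.fast": 44 * 24,  # 44 cc2.8xlarge instances x 24 shards each
}

for name, shards in designs.items():
    print(f"{name}: {shards} shards x {SHARD_TB} TB = {shards * SHARD_TB} TB")
```

The cheap design totals 1024 TB, which is exactly 1 PiB if the slides' TB means TiB; the fast design totals 1056 TB.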
  • 13. The naming of parts (Amazon term: nickname) • Provisioned IOPS: PIOPS • Elastic Compute Cloud: EC2 • EC2 Spot Instances: Here, Spot! • Auto Scaling groups: ASG
  • 15. MongoDB • Document-model, NoSQL database • Dev adoption is STRONG • MongoDB Inc. trending toward zero h/w • Scale-up with commodity h/w • Scale-out with sharding • Scale-around with replication
  • 17. AWS • PIOPS for an IO-hungry client • 40% of MongoDB customer usage • 90% of MongoDB internal usage • More ports :2701[79] than :[15]521
  • 18. PB & Chocolate: differentiators for mutual customers • Fast time-to-solution • Easy global distribution • Document model • Secondary indexes • Geo, text, security • Fast analytic aggregation
  • 20. Motivation: IWBCI… • Test scale-out of MongoDB beyond the typical • Learn massive scale-out on AWS • Do it as cheaply as possible • Apply customer data • Break the petabarrier
  • 23. Proposal
    Item            Units          Time     Unit cost     Net cost
    m1.small Spot   1050           3 hr     $0.007/hr     $22.05
    m1.large        3              48 hr    $0.056/hr     $8.07
    S3              1 TB           1 wk     $95/TB/mo     $23.75
    EBS             1024 x 1 TB    1 hr     $100/TB/mo    $142.22
    S3 → EBS        1 PB           lazy     $0/TB         $0.00
    Total                                                 $196.09
    http://calculator.s3.amazonaws.com/G77798SS77SH72
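The proposal's total can be reproduced from its line items. This sketch assumes a 720-hour (30-day) month for the EBS line and a 4-week month for the S3 line; both assumptions are chosen because they reproduce the slide's figures.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month, reproduces the EBS line
WEEKS_PER_MONTH = 4    # assumption: reproduces the S3 line

line_items = {
    "m1.small Spot": 1050 * 3 * 0.007,                        # instances x hours x $/hr
    "m1.large": 3 * 48 * 0.056,                               # instances x hours x $/hr
    "S3 (1 TB, 1 wk)": 1 * 95 / WEEKS_PER_MONTH,              # TB x $/TB-month x months
    "EBS (1024 x 1 TB, 1 hr)": 1024 * 100 / HOURS_PER_MONTH,  # TB x $/TB-month x months
    "S3 -> EBS (lazy)": 0.0,                                  # slide lists $0/TB
}

total = sum(line_items.values())
for item, cost in line_items.items():
    print(f"{item:26s} ${cost:8.2f}")
print(f"{'Total':26s} ${total:8.2f}")
```

The m1.large line computes to $8.064, which prints as $8.06 rather than the slide's $8.07, a rounding difference; the total still rounds to $196.09.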
  • 24. Initial Directions • Spot instance requests – m1.small market, mostly us-east-1 (my zone “d”) – Net: $0.007/hour = $7/hr/K-shard • Perl – use Net::Amazon::EC2; – gaps: parse EC2 command-line API • Defer Chef, Puppet, CloudFormation • YCSB • userdata.sh • t1.micro / m1.small / cr1.8xlarge
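The slides drove spot requests from Perl. As a language-neutral sketch, the same request can be expressed as an AWS CLI call built in Python. The bid, instance type, and zone follow the slide; the AMI ID and user-data contents are hypothetical.

```python
import base64
import json

def spot_request_cmd(ami_id: str, count: int, bid_usd: float, zone: str, userdata: str) -> list[str]:
    """Argument list for `aws ec2 request-spot-instances` (built, not executed)."""
    spec = {
        "ImageId": ami_id,  # hypothetical AMI
        "InstanceType": "m1.small",
        "Placement": {"AvailabilityZone": zone},
        # Inside a launch specification, EC2 expects user data already base64-encoded
        "UserData": base64.b64encode(userdata.encode()).decode(),
    }
    return ["aws", "ec2", "request-spot-instances",
            "--spot-price", f"{bid_usd:.3f}",
            "--instance-count", str(count),
            "--type", "one-time",
            "--launch-specification", json.dumps(spec)]

def max_hourly_cost(count: int, bid_usd: float) -> float:
    """Upper bound on the hourly bill: spot instances are charged the market price, never more than the bid."""
    return count * bid_usd
```

With 1000 instances at a $0.007 bid, the ceiling is $7/hr, the slide's "$7/hr/K-shard".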
  • 25. MongoDB Architecture • 3x Config Servers – mongod --configsvr • Routing – mongos --configdb a,b,c • Replica sets (not used) • Shards – mongod • Client load – java -cp [] com.yahoo.ycsb.Client
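The roles on this slide map to a handful of process command lines plus one shell call per shard. This sketch matches the slide's three mirrored config servers, the 2013-era layout; current MongoDB releases require config servers to run as a replica set, so treat the flags as era-specific. Host names, paths, and ports are assumptions.

```python
CONFIG_PORT = 27019  # conventional config-server port
SHARD_PORT = 27017   # default mongod port

def config_server_cmd(dbpath: str = "/data/configdb") -> list[str]:
    return ["mongod", "--configsvr", "--dbpath", dbpath, "--port", str(CONFIG_PORT)]

def mongos_cmd(config_hosts: list[str]) -> list[str]:
    """Router pointed at three comma-separated config servers ("mongos --configdb a,b,c")."""
    configdb = ",".join(f"{h}:{CONFIG_PORT}" for h in config_hosts)
    return ["mongos", "--configdb", configdb]

def shard_cmd(dbpath: str = "/data/db") -> list[str]:
    return ["mongod", "--dbpath", dbpath, "--port", str(SHARD_PORT)]

def add_shards_script(shard_hosts: list[str]) -> str:
    """mongo-shell JavaScript, run against a mongos, registering each shard."""
    return "\n".join(f'sh.addShard("{h}:{SHARD_PORT}");' for h in shard_hosts)

if __name__ == "__main__":
    print(" ".join(mongos_cmd(["cfg-a", "cfg-b", "cfg-c"])))  # hypothetical hosts
    print(add_shards_script(["shard-0001", "shard-0002"]))
```

The script output can be passed to `mongo --host <router> --eval`.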
  • 29. Process Flow • Spot instance request (sir-) – Rejected – Awaiting evaluation – Awaiting fulfillment (partial; launch intervals) – Fulfilled • Instances (i-) – Requested – Initializing (i) – Config running (C) – MongoS starting (s) – MongoS running (S) – MongoD starting (D) – Failed/slow response (X)
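The one-letter instance codes suggest a compact progress display for a large fleet. A hypothetical sketch: the enum names are illustrative, the letters come from the slide, and "." for the requested state is an assumption because the slide gives it no code.

```python
from collections import Counter
from enum import Enum

class NodeState(Enum):
    REQUESTED = "."  # assumption: the slide gives no letter for this state
    INITIALIZING = "i"
    CONFIG_RUNNING = "C"
    MONGOS_STARTING = "s"
    MONGOS_RUNNING = "S"
    MONGOD_STARTING = "D"
    FAILED_OR_SLOW = "X"

def status_strip(states: list[NodeState]) -> str:
    """One character per instance, in launch order."""
    return "".join(s.value for s in states)

def tally(states: list[NodeState]) -> dict[str, int]:
    """Number of instances in each state code."""
    return dict(Counter(s.value for s in states))
```

A single strip line per polling cycle makes stragglers and failures (X) visible at a glance across hundreds of instances.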
  • 32. Scale-Out Experience • Sharding by magnitude: 4, 16, 64, 256, 1024 • 4: functional validation • 16: startup variation, process flow • 64: full speed ahead! • 256: chunk distribution time, single Config • 1024: market dependence, client wire saturation
  • 33. Lessons Learned • Code defensively • Monitor: MongoDB Mgt Svc, top, iftop, iostat, mongostat • Avoid sentimental attachment (i-8bad8bee) • Prototype / refactor • Make the instances do the work • Mitigate chunk migration
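The slides do not say how chunk migration was mitigated. One common approach, shown here as an assumption rather than the authors' method, is to shard on a hashed key with pre-created chunks so every shard starts with data ranges, then keep the balancer off during the bulk load. The sketch builds a mongo-shell script using 2.4-era syntax; the database and collection names (YCSB-style `ycsb.usertable`) and the chunks-per-shard factor are hypothetical.

```python
def presplit_script(db: str, coll: str, num_shards: int, chunks_per_shard: int = 2) -> str:
    """mongo-shell JS: hashed shard key with pre-created chunks, balancer off for the bulk load."""
    ns = f"{db}.{coll}"
    n = num_shards * chunks_per_shard
    lines = [
        f'sh.enableSharding("{db}");',
        # numInitialChunks applies only to an empty collection sharded on a hashed key
        f'db.adminCommand({{shardCollection: "{ns}", key: {{_id: "hashed"}}, numInitialChunks: {n}}});',
        "sh.stopBalancer();  // re-enable with sh.startBalancer() after loading",
    ]
    return "\n".join(lines)
```

Run the output once against a mongos before loading; with chunks already spread across shards, the loader's inserts land in place instead of triggering migrations.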
  • 34. Refactor • YCSB → BenchPress • request-spot-instances → Auto Scaling Groups • Net::Amazon::EC2 → use VM::EC2 • monolithic Perl → gsh • polling → serf
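Moving from raw request-spot-instances calls to Auto Scaling Groups can be sketched as two AWS CLI calls: a launch configuration that carries a spot bid and the user-data script, and a group that holds the desired instance count. Names, AMI ID, counts, and file paths are hypothetical. Launch configurations were the 2013-era mechanism; AWS now steers users toward launch templates.

```python
def launch_config_cmd(name: str, ami_id: str, bid_usd: float, userdata_path: str) -> list[str]:
    return ["aws", "autoscaling", "create-launch-configuration",
            "--launch-configuration-name", name,
            "--image-id", ami_id,
            "--instance-type", "m1.small",
            "--spot-price", f"{bid_usd:.3f}",  # a bid makes the group launch spot instances
            "--user-data", f"file://{userdata_path}"]

def asg_cmd(name: str, launch_config: str, count: int, zone: str) -> list[str]:
    return ["aws", "autoscaling", "create-auto-scaling-group",
            "--auto-scaling-group-name", name,
            "--launch-configuration-name", launch_config,
            "--min-size", "0",
            "--max-size", str(count),
            "--desired-capacity", str(count),
            "--availability-zones", zone]
```

The group then replaces reclaimed spot instances on its own, and tearing the fleet down is a single `aws autoscaling set-desired-capacity` call to zero.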
  • 36. Chart: 1KB docs loaded, 512 shards (documents loaded vs. time, roughly 5:16 to 7:40; the point where data reaches 1X RAM is marked)
  • 37. Chart: 1KB docs loaded, 1035 shards, 2 jobs conflicting (documents loaded vs. time, roughly 4:19 to 13:55; the 1X RAM point is marked)
  • 39. Chart: cc2.8xlarge, 24 x 1TB-4K PIOPS EBS, bulk-load 64KB docs (64KB docs loaded vs. time, first half hour from 12:00 AM; the 100% RAM point is marked)
  • 40. Chart: cc2.8xlarge, 24 x 1TB-4K PIOPS EBS, bulk-load 64KB docs (64KB docs loaded vs. time, 12:00 AM to about 7:12 PM)
  • 43. Please give us your feedback on this presentation BDT307 As a thank you, we will select prize winners daily for completed surveys!