13. Bare Metal
• Build to your specs
• Robust, quickly scaled environment
• Management of all aspects of environment
• Great for Big Data Solutions like MongoDB
14. Cloud Subscription
• Preconfigured
• Performance Tuned
• Bare Metal Single Tenant
• Complex Environment Configurations
15. Pre-configurations
• Set SSD Read-Ahead Defaults to 16 Blocks – SSD drives have excellent seek times, allowing the read-ahead to be shrunk to 16 blocks. Spinning disks may require slight buffering, so those have been set to 32 blocks.
• noatime – Adding the noatime mount option eliminates the need for the system to write to the file system for files which are simply being read. In other words: faster file access and less disk wear.
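As a rough sketch, the two settings above can be applied on a typical Linux host like this. The device names and mount point are placeholders for illustration, not the actual configuration used:

```shell
# Shrink read-ahead on the SSD-backed volume to 16 blocks and keep
# 32 blocks for the spinning-disk volume. Device names are examples.
blockdev --setra 16 /dev/sda    # SSD: excellent seek times, small read-ahead
blockdev --setra 32 /dev/sdb    # spinning disk: a little extra buffering

# Mount the data filesystem with noatime so plain reads don't trigger
# access-time writes. Example /etc/fstab entry:
# /dev/sdb1  /var/lib/mongo  ext4  defaults,noatime  0 0
mount -o remount,noatime /var/lib/mongo
```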
16. • Turn NUMA Off in BIOS – Linux, NUMA, and MongoDB tend not to work well together. If you are running MongoDB on NUMA hardware, we recommend turning it off (running with an interleaved memory policy). If you don't, problems will manifest in strange ways, like massive slowdowns for periods of time or high system CPU time.
• Set ulimit – We have set the ulimit to 64000 for open files and 32000 for user processes to prevent failures due to exhaustion of available file handles or user processes.
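A minimal sketch of both settings, assuming NUMA cannot be switched off in the BIOS itself; the config path and user name are placeholders:

```shell
# If NUMA can't be disabled in the BIOS, start mongod with an
# interleaved memory policy instead (the behavior described above):
numactl --interleave=all mongod --config /etc/mongod.conf

# Raise per-process limits to the values mentioned above.
# Example /etc/security/limits.conf entries (user name is a placeholder):
# mongod  soft  nofile  64000
# mongod  hard  nofile  64000
# mongod  soft  nproc   32000
# mongod  hard  nproc   32000
ulimit -n 64000   # open files (current shell only)
ulimit -u 32000   # user processes (current shell only)
```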
17. Use ext4 – We have selected ext4 over ext3. We found ext3
to be very slow in allocating files (or removing them).
Additionally, access within large files is poor with ext3.
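For illustration, creating and mounting an ext4 data volume looks like the following; the device and mount point are example values:

```shell
# Create and mount an ext4 data volume (device and mount point are examples).
# ext4 allocates and removes large files much faster than ext3, which
# matters for MongoDB's preallocated data files.
mkfs.ext4 /dev/sdb1
mkdir -p /var/lib/mongo
mount -t ext4 -o noatime /dev/sdb1 /var/lib/mongo
```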
20. Tests Performed
• Small MongoDB Cloud Subscription vs Shared
Virtual Instance
• Medium MongoDB Cloud Subscription vs
Shared Virtual Instance
• SSD and 15K SAS
• Large MongoDB Cloud Subscription vs Shared
Virtual Instance
• SSD and 15K SAS
21. Small Test
Small (SM) Cloud Subscription MongoDB Server
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
2 x 500GB SATAII – RAID1
1Gb Network
Virtual Provider Instance
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
2 x 500GB Network Storage – RAID1
1Gb Network
22. Small Test
Tests Performed
Small Data Set (8GB of 0.5MB documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 32
Test duration spanned 48 hours
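The ramp-up described above can be sketched as a shell loop that doubles the client count at each step; `run_bench` is a hypothetical stand-in for whatever client tool drives the 6:1 query-to-update workload:

```shell
# Build the exponential ramp of concurrent clients: 1 doubling up to 32.
clients=1
schedule=""
while [ "$clients" -le 32 ]; do
    schedule="$schedule $clients"
    clients=$((clients * 2))
done
echo "ramp:$schedule"    # each step runs the workload at that concurrency

# At each step, a hypothetical run_bench tool would drive
# 200 iterations of the 6:1 query-to-update workload, e.g.:
#   run_bench --clients "$c" --iterations 200 --ratio 6:1
```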
27. Medium Test
Medium (MD) Cloud Subscription MongoDB Server
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
2 x 64GB SSD – RAID1 (Journal Mount)
4 x 300GB 15K SAS – RAID10 (Data Mount)
1Gb Network – Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
30GB RAM
2 x 64GB Network Storage – RAID1 (Journal Mount)
4 x 300GB Network Storage – RAID10 (Data Mount)
1Gb Network
28. Medium Test
Tests Performed
Small Data Set (32GB of 0.5MB documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
29. Medium Test 15k SAS
Average Read Operations per Second by Concurrent Client
30. Medium Test 15k SAS
Peak Read Operations per Second by Concurrent Client
31. Medium Test 15k SAS
Average Write Operations per Second by Concurrent Client
32. Medium Test 15k SAS
Peak Write Operations per Second by Concurrent Client
37. Large Test
Large (LG) Cloud Subscription MongoDB Server
Dual 8-core Intel E5-2620 CPUs
64-bit CentOS
128GB RAM
2 x 64GB SSD – RAID1 (Journal Mount)
6 x 600GB 15K SAS – RAID10 (Data Mount)
1Gb Network – Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
64GB RAM (Maximum available on this provider)
2 x 64GB Network Storage – RAID1 (Journal Mount)
6 x 600GB Network Storage – RAID10 (Data Mount)
1Gb Network
38. Large Test
Tests Performed
Small Data Set (64GB of 0.5MB documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
39. Large Test 15k SAS
Average Read Operations per Second by Concurrent Client
40. Large Test 15k SAS
Peak Read Operations per Second by Concurrent Client
41. Large Test 15k SAS
Average Write Operations per Second by Concurrent Client
42. Large Test 15k SAS
Peak Write Operations per Second by Concurrent Client
I am HH. I have worked for SoftLayer for about six to seven years now, in product innovation as a Sr. Software Architect. Part of what we do is R&D for new product solutions for SoftLayer, which gives me the opportunity to get exposure to a lot of exciting new technologies and solutions. One thing I've been working with lately has been Big Data solutions, specifically MongoDB. Today we are talking about the MongoDB Cloud Subscription: some of how we put it together, some considerations for deployment and how we arrived at the model we did, some metrics on performance, and some helpful hints.
Softlayer?
In celebration of Valentine's weekend…
When I talk about any big data solution, I love to put this slide up. It's a great illustration of why we are doing this, and it helps get you thinking about the challenges of deploying a solution like MongoDB.
So here is our one and only obligatory analyst slide, I promise. Think in terms of the 3 V's Gartner defined. There are lots of fourth V's (Value, Veracity, etc.), but really those apply to all data, right? These three are at the core. For our discussion today we are mostly going to be focused on Volume and Velocity (Variety is a given for us). These are important to consider when we start talking about how we want to deploy our solution: how much, and how fast, is our data going to come at us?
So when we talk about physical deployment, we obviously have two options: multi-tenant and single-tenant. Both have their strengths and weaknesses.
Multi-tenant deployments are typically fast to set up up front. They are great for entry level, proof of concept, testing, and small applications where things like velocity aren't as important. At first these deployments look very affordable, but we are usually talking about shared, network-attached resources. With shared I/O comes widely varied performance that, I am convinced, is based upon the direction of the wind in some cases. Personal tests have shown standard deviation swings as large as 30% or higher. You are going to hear me talk a lot today about RSD (relative standard deviation) when we get to some actual performance testing numbers. Most platforms use network-attached storage. I DO NOT USE NETWORK-ATTACHED-STORAGE-BACKED VIRTUAL INSTANCES for disk-intensive applications like MongoDB. For everyone that hit the snooze button on my presentation, this is probably the most important takeaway I can give you, so I will repeat it because it is very important. We found that customers wanting I/O-intensive applications like MongoDB, who have an absolute requirement for virtual instances, do better with local disk for obvious reasons: no network hop to data means better performance. So we push customers implementing heavy disk-I/O solutions like MongoDB to our local-disk virtual instances when they have a hard requirement for the multi-tenant public cloud. That's not our best solution, but when they just can't leave a virtual instance, at least local disk helps alleviate some of the shared-resource pain for these sorts of applications.
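The RSD mentioned here is just the standard deviation expressed as a percentage of the mean. A quick sketch with made-up ops/sec samples (the numbers are illustrative, not the measured data):

```shell
# Relative standard deviation of some illustrative ops/sec samples.
printf '%s\n' 1200 900 1500 800 1600 |
awk '{ sum += $1; sumsq += $1 * $1; n++ }
     END {
       mean = sum / n
       sd   = sqrt(sumsq / n - mean * mean)   # population std dev
       printf "RSD = %.1f%%\n", 100 * sd / mean
     }'
# prints: RSD = 26.4%
```

A swing of 30% RSD means a run averaging 1,000 ops/sec routinely wanders by ±300 ops/sec, which is why consistency matters as much as raw throughput in these tests.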
When we talk about a public cloud deployment, everyone has this dream of just right-clicking, choosing "add new", and everything is perfect.
Although at first things seem simple, scaling on multi-tenant (especially with NAS) gets tricky. In this case this is a SINGLE instance of a Mongo node (this is one node; most deployments are going to have three or more of these). In order to achieve the desired performance, you have to RAID network volumes and attach them to virtual instances. This still doesn't solve the shared-I/O deviation issues; it just smears them so they may not spike as drastically.
It gets even crazier when you do highly available deployments: striped volumes (sometimes up to 10) attached. So you can see that as you scale in a NAS virtual environment, your simple virtualized environment has suddenly started to get very complex. If you are an engineer that believes in keeping things simple to avoid issues, this sort of thing keeps you up at night. Both complexity and cost can start to spiral beyond what you may have anticipated.
So let's look at a different strategy for deploying. We have seen a growing number of customers coming to us wanting a single-tenant solution for high disk-I/O data storage needs like Big Data applications. We consider our platform to be a complete portfolio of cloud offerings, including single-tenant options beyond our multi-tenant public cloud. We do have multi-tenant with local disk, but we believe our Bare Metal Cloud offering is far better suited for Big Data solutions than any other: all the advantages of the cloud without the pain points. Easy automated provisioning. Consistent high performance, because you have no shared I/O, no network disk, and no wildly deviating performance. You get consistent, solid performance every time because our single-tenant offerings are backed by BARE METAL. Stress consistent.
So let's go back to that deployment insanity and inject a little serenity into the picture. All of the stacking of RAIDed NAS volumes and configuration is simplified to a "node" deployment concept with some options. Physical volumes are already RAIDed behind a controller to appear as a single volume, giving consistent, dependable performance. This is the strategy we are taking for MongoDB that we feel best serves our customers, and I think it is the best platform for any application with high disk I/O like MongoDB. You just can't beat physical volumes with single tenancy when it comes to I/O performance for the dollar. It brings the simplicity, performance, and consistent dependability that you need to back your MongoDB.
This is caramel mango macadamia nut pudding, by the way, and it is delicious. I can talk all I want about how, theoretically, sharing resources and network hops impact high-storage-I/O deployments, but let's look at some numbers.
Thank you for your time; I hope you found this helpful. I guess we have some time to take some questions.