2. Overview
• Fictional story of a startup using MongoDB &
MEAN stack to build IoT application
• We’ll take a devops perspective - show you what
to watch out for with a framework like MEAN
• Tips you can use to help development team focus
on the right things when close to production
• Questions
• How many from operations?
• How many from development?
3. 5 Things we Learned
Capacity planning/prototyping is a good idea but
performance is sensitive to sample test data
The MEAN stack rocks - fast to get started - profiler
can help you understand what’s under the hood
Realtime/incremental aggregation works well with IoT
workloads - the “MMS approach”
With NodeJS/Express number of app servers becomes
bottleneck before MongoDB
Performance tuning patterns apply - "bottleneck
whack-a-mole" & “slam-dunk-optimization”
5. Internet of Things
Big Data => Humongous Data
“The rise of device oriented development …
new architectural and workflow challenges
… distinctly different from … web and
mobile development so far.” - Morten Bagai
6. Internet of Things
• Bosch: “IoT brings
root and branch
changes to the
world of business”
• Richard Kreuter's
Webinar May 2013
• Earlier bootcamp
looked at sharding
IoT
Photo by jurvetson - Creative Commons Attribution License - http://www.flickr.com/photos/jurvetson/916142
7. MEAN stack
MongoDB - the database
Express - web app framework/router
Angular - browser HTML/JS MVC
Node - JavaScript application server
Photo by benmizen - Creative Commons ShareAlike License - http://www.flickr.com/photos/benmizen/9456440635
8. Learn more about MEAN
Valeri Karpov - MongoDB Kernel Tools Team
http://thecodebarbarian.wordpress.com/2013/07/22/introduction-to-the-mean-stack-part-one-setting-up-your-tools/
MEAN.io
http://mean.io
9. About MongoDB Bootcamp
We invest in technical new hires
Everyone does “bootcamp”
NYC for 2 weeks - product internals
Then work on a longer project 3-4 weeks
In our case: we wanted to do a bit of everything -
capacity planning, iterating on user stories, with
MongoDB as one component
11. Location based advertising - IoMT
[Diagram: a customer with several advertisers nearby]
• IoT example 3 from Richard’s Webinar
12. User Stories - for the application
US1 - customer looks
for advertisers nearby
US2 - advertiser wants
to see how many
customers saw offer
US3 - find hot spots
with many customers
but few advertisers
Photo by consumerist - Creative Commons Attribution License - http://www.flickr.com/photos/consumerist/2158190589
16. US1 Initial Measurements
MongoDB shell scripts
9 advertisers, small area, distance 10km
MongoDB has 5 kinds of geo query and 3 kinds of geo
index
geoSearch (haystack) looked much better than
others (our 1st mistake)
TIP: performance is sensitive to test data & query
18. The good thing about frameworks is…
they do lots of things for developers
…and the bad thing about frameworks?
they do lots of things for developers
19. To find out what’s happening - debug
We used Express passport-http to add Basic/Digest
auth (client id lookup)
It can be hard to figure out what a framework like
Express/Mongoose really does
Tip: mongoose.set('debug', true) - detailed logging
Console
Mongoose: clients.findOne({ _id: ObjectId("…") })
Mongoose: advertisers.geoHaystack({…[-6.267765, 53.34087]})
20. Find out what’s happening - profiler
Tip: The MongoDB profiler shows the operations
actually happening on the DB - check them with the developers
db.system.profile.find()
{"op":"query", "ns":"tings.clients",...
{"op":"command", "command":{"geoSearch"...
{"op":"update", "ns":"tings.sessions"...
exports.all = function(req, res) {
  . . .
  req.session = null;
  res.jsonp(advertisers);
}
10% performance improvement
Where did that come from?
Fixing it is not obvious
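A minimal sketch of how profiler output could be summarized to see which operations dominate. The entries below are made-up stand-ins shaped like documents from db.system.profile (only the op, ns and millis fields are used); the namespaces follow the slide where shown, and the timings are illustrative, not measurements from this project.

```javascript
// Hypothetical profiler entries. On a real system these would come from
// db.system.profile.find() after enabling profiling with db.setProfilingLevel(2).
const entries = [
  { op: "query", ns: "tings.clients", millis: 1 },
  { op: "command", ns: "tings.$cmd", millis: 3 }, // ns for the geoSearch command is an assumption
  { op: "update", ns: "tings.sessions", millis: 2 },
  { op: "update", ns: "tings.sessions", millis: 4 },
];

// Group entries by operation type and namespace, summing count and time,
// then list the most expensive group first.
function summarize(profileEntries) {
  const groups = new Map();
  for (const e of profileEntries) {
    const key = `${e.op} ${e.ns}`;
    const g = groups.get(key) || { count: 0, millis: 0 };
    g.count += 1;
    g.millis += e.millis;
    groups.set(key, g);
  }
  return [...groups]
    .map(([key, g]) => ({ key, count: g.count, millis: g.millis }))
    .sort((a, b) => b.millis - a.millis);
}

const summary = summarize(entries);
console.log(summary);
```

An unexpected group at the top of such a list (here the session updates) is the kind of thing to raise with the developers.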
22. US2 means we built on US1
US1 - customer looks
for advertisers nearby
• Need to store
customer location
US2 - advertiser wants
to see how many
customers are nearby
Being a startup, we decided to
take a naive, pragmatic approach:
• Store all samples
• US2 aggregates on-demand
Photo by consumerist - Creative Commons Attribution License - http://www.flickr.com/photos/consumerist/2158190589
24. US2 - Aggregation of Raw Samples
1 hour of raw samples @ 2k RPS
= 7.2M documents
Aggregation on 7.2M raw samples
took 1 second on our instances
Significant impact
• Run every 2 seconds
RPS dropped by factor of 4!
(single instance)
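The arithmetic behind these numbers, as a small worked example. It assumes one stored sample per request, which is how the slide's figures line up; the variable names are ours.

```javascript
const requestsPerSecond = 2000;      // "2k RPS" from the slide
const windowSeconds = 60 * 60;       // one hour of raw samples
const rawDocuments = requestsPerSecond * windowSeconds;

const aggregationSeconds = 1;        // one aggregation pass over the hour of samples
const intervalSeconds = 2;           // the aggregation is re-run every 2 seconds
const busyFraction = aggregationSeconds / intervalSeconds;

console.log(rawDocuments);           // 7200000
console.log(busyFraction);           // 0.5
```

With the aggregation running for about half of every interval, it competes with inserts for the same instance, which is consistent with the drop in RPS reported above.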
[Diagram: raw samples are inserted; each query aggregates over the raw samples]
25. US2 - Pre aggregation
[Diagram: raw samples are inserted and pre-aggregates are updated at write time; each query aggregates over the pre-aggregates]
An MMS-type approach
Document per
advertiser-customer-month
Using update with multi: true
(more on this later)
Query now only needs to
aggregate unique
customers
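A minimal sketch of what such a pre-aggregation update could look like. The sample shape and the field names (customer, advertiser, month, total, days) are assumptions for illustration, not the schema used in the project; only the idea of one multi-document $inc per incoming sample comes from the slide.

```javascript
// Month bucket such as "2014-03", computed in UTC.
function monthKey(date) {
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${date.getUTCFullYear()}-${month}`;
}

// Build the arguments for a single update that increments the counters in
// every advertiser-customer-month document touched by this sample.
// Hypothetical sample shape: { customerId, advertiserIds, ts }.
function buildPreAggUpdate(sample) {
  const day = sample.ts.getUTCDate();
  return {
    filter: {
      customer: sample.customerId,
      advertiser: { $in: sample.advertiserIds },
      month: monthKey(sample.ts),
    },
    update: { $inc: { total: 1, [`days.${day}`]: 1 } },
    options: { multi: true },
  };
}

const spec = buildPreAggUpdate({
  customerId: "c42",
  advertiserIds: ["a1", "a2"],
  ts: new Date(Date.UTC(2014, 2, 14, 9, 30)),
});
console.log(JSON.stringify(spec));
// In the mongo shell the three parts would be passed as
// db.preagg.update(spec.filter, spec.update, spec.options)
// where "preagg" is a hypothetical collection name.
```

This sketch assumes the monthly documents already exist; a multi update with $in does not create one document per advertiser on its own.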
26. US1 measurements revisited
MongoDB shell scripts
More realistic data - the old measurements used repeated
locations
110k advertisers with clusters in DUB and NYC
Performance best for near and nearSphere (2x
better than Haystack)
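A minimal sketch of generating clustered test data rather than repeated locations. The cluster centres are approximate city coordinates, and the spread, the seed, the document shape and the field names are assumptions; the script used for the measurements is not shown in the slides.

```javascript
// Small seeded PRNG (mulberry32-style) so every run produces the same data set.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Approximate [longitude, latitude] centres for the two clusters.
const CENTRES = [
  [-6.26, 53.35],  // DUB
  [-74.0, 40.71],  // NYC
];

// Generate `count` advertisers as GeoJSON points scattered uniformly
// within +/- spreadDeg degrees of alternating cluster centres.
function generateAdvertisers(count, spreadDeg, seed) {
  const rand = seededRandom(seed);
  const docs = [];
  for (let i = 0; i < count; i++) {
    const [lng0, lat0] = CENTRES[i % CENTRES.length];
    const lng = lng0 + (rand() - 0.5) * 2 * spreadDeg;
    const lat = lat0 + (rand() - 0.5) * 2 * spreadDeg;
    docs.push({ name: `advertiser-${i}`, loc: { type: "Point", coordinates: [lng, lat] } });
  }
  return docs;
}

const advertisers = generateAdvertisers(1000, 0.25, 42);
console.log(advertisers.length, advertisers[0]);
```

Raising the count to 110k gives a data set of the size mentioned above.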
27. Where does the time go?
• Express/Mongoose/Node
• Customer Lookup
• Find ($near)
• Save Sample DB
• Save Sample File
• Preagg=multiple docs (6)
• Preagg=multi-update 1 doc
32. 1 - number of Node.JS
2 - HAproxy
3 - load gen threads/BW
34. Performance tips
1. Increase number of Node.js servers
2. Increase perf of proxy/balancer instance
HAProxy more balanced than Amazon ELB
3. Tweak Nodeload (generates/measures REST load)
Nodeload concurrency 3x Node servers
Run Nodeload on same machine as HAProxy
Development recommendation: Postman Chrome
extension - generates REST requests with Basic Auth
36. US3 Overview
What are the top 10 hot sales areas?
• What is an “area”…?
Requirements
• Little impact, easy to calculate
• Approx. Regular size
• Optimal approx. distance - “bounding areas”
• Plays nice with sharding
Internals of haystack, 2dsphere? Polygon? MGRS?
38. MGRS - Military Grid Reference
System
• 4QFJ123678 precision level 100m
Image by Mikael Rittri - Creative Commons ShareAlike License
http://en.wikipedia.org/wiki/File:MGRSgridHawaiiSchemeAARealigned.png
39. MGRS - But at the poles…
Image by Mikael Rittri - Creative Commons ShareAlike License
http://en.wikipedia.org/wiki/File:MGRSgridNorthPole.png
41. The “box” - the poor-man’s MGRS
• Reinvented the sphere
• Long/lat -> box number
• Tailored to specific distance
• Boxes are at least 1km
• Search in current and 8
neighbouring boxes
• Filter outside circle in JS
• Performed relatively well
• Can be used to shard
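A minimal sketch of the box idea as described in the bullets above, not the project's code. It assumes a fixed grid in degrees sized so that boxes are at least 1 km wide up to 60 degrees of latitude, and it ignores wrap-around at the antimeridian and the poles; all names and constants are ours.

```javascript
const EARTH_RADIUS_KM = 6371;
const KM_PER_DEG_LAT = (Math.PI * EARTH_RADIUS_KM) / 180; // about 111.19 km
const BOX_KM = 1;        // minimum box edge, from the slide
const MAX_LAT_DEG = 60;  // assumption: service area stays within +/-60 degrees

const LAT_STEP = BOX_KM / KM_PER_DEG_LAT;
// A degree of longitude shrinks with cos(latitude), so size the step for the worst case.
const LNG_STEP = LAT_STEP / Math.cos((MAX_LAT_DEG * Math.PI) / 180);
const COLS = Math.ceil(360 / LNG_STEP);

function rowCol(lng, lat) {
  return {
    row: Math.floor((lat + 90) / LAT_STEP),
    col: Math.floor((lng + 180) / LNG_STEP),
  };
}

// Long/lat -> box number.
function boxNumber(lng, lat) {
  const { row, col } = rowCol(lng, lat);
  return row * COLS + col;
}

// The current box plus its 8 neighbours.
function searchBoxes(lng, lat) {
  const { row, col } = rowCol(lng, lat);
  const boxes = [];
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      boxes.push((row + dr) * COLS + (col + dc));
    }
  }
  return boxes;
}

// Great-circle distance between two [lng, lat] points.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLng = rad(lng2 - lng1);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Candidate lookup by box (an indexed $in query in MongoDB), then the circle filter in JS.
function findNear(customer, advertisers, radiusKm) {
  const boxes = new Set(searchBoxes(customer[0], customer[1]));
  return advertisers
    .filter((ad) => boxes.has(ad.box))
    .filter((ad) => haversineKm(customer, ad.loc) <= radiusKm);
}

const customer = [-6.267765, 53.34087];
const ads = [
  { name: "A", loc: [-6.264, 53.342] },    // a few hundred metres away
  { name: "B", loc: [-6.247765, 53.34087] }, // a bit over 1 km east
  { name: "C", loc: [-74.0, 40.71] },      // another continent
].map((ad) => ({ ...ad, box: boxNumber(ad.loc[0], ad.loc[1]) }));

const hits = findNear(customer, ads, 1).map((ad) => ad.name);
console.log(hits);
```

Because every box is at least 1 km on a side within the assumed latitude range, the 3x3 block always covers a 1 km search circle; a larger search distance would need larger boxes.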
43. Impact of Replication
Secondary reads
Worked for this app
Beware - don’t try
this at home!
44. Apply the production notes
Change from default readahead
Disable NUMA & THP
ext4 or XFS
noatime
Load test workload on different configurations
Instance Store / EBS (PIOPs)
SSDs / spinning rust
AWS instance types
46. 5 Things we Learned
Capacity planning/prototyping is a good idea but
performance is sensitive to sample test data
The MEAN stack rocks - fast to get started but profiler
can help you understand what’s under the hood
Realtime/incremental aggregation works well with IoT
workloads - the “MMS approach”
Performance tuning patterns apply - "bottleneck
whack-a-mole" & “slam-dunk-optimization”
With NodeJS/Express number of app servers becomes
bottleneck before MongoDB
48. Next Steps
Plan to publish as blog post series and GitHub
project
Check blog.mongodb.org
Continue to explore…
49. Next Steps - continuation
Hadoop/YARN for aggregations
Use “box” to geo-shard
Try 2.6 bulk updates
Dynamic angular-google-maps with socket-io
Implement in another framework (Go/Clojure) to
load MongoDB with less hardware
Find balance between batch and pre-aggregation
(see next slide)
50. Learn More & Thank You
Introduction to MEAN - Valeri Karpov
http://thecodebarbarian.wordpress.com/2013/07/22/introduction-to-the-mean-stack-part-one-setting-up-your-tools/
MEAN.io
http://mean.io
Richard Kreuter's webinar - M2M
http://www.mongodb.com/presentations/webinar-realizing-promise-machine-machine-m2m-mongodb
Building MongoDB Into Your Internet of Things
http://blog.mongohq.com/building-mongodb-into-your-internet-of-things-a-tutorial/
Schema design for time series data (MMS)
http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb