4. Big Data = MongoDB = Solved
• Content Management
• Operational Intelligence
• E-Commerce
• User Data Management
• High Volume Data Feeds
• Mobile
Thursday, 25 October 12
5. Location Based Service
•Problem:
•Location-based social networking service needs to scale to
a high number of users and check-ins
•Solution:
•Used MongoDB deployed on EC2
•8 clusters, 40 machines, 15k QPS, 2.3 billion records
•Auto-sharding and geo-spatial indexing are key
•Results:
•To date have scaled to 9m users, 3m check-ins per day,
750m total check-ins, 20m places, 400k merchants
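The slide names geo-spatial indexing as a key feature for this workload. A minimal sketch of what such an index and query could look like, assuming a hypothetical `places` collection with a `loc: [longitude, latitude]` field (neither name comes from the deck):

```javascript
// Hypothetical collection and field names; the deck does not show the real schema.
var placeIndexSpec = { loc: "2d" };   // legacy 2d geo-spatial index specification

// Build a $near query document for places closest to a point.
function nearbyPlacesQuery(longitude, latitude) {
  return { loc: { $near: [longitude, latitude] } };
}

var query = nearbyPlacesQuery(-0.1276, 51.5072);   // roughly central London

// In a mongo shell of that era, the two documents could be used like this:
//   db.places.ensureIndex(placeIndexSpec)
//   db.places.find(query).limit(10)
```

The sketch only builds the index and query documents, so it runs without a server; the commented shell calls show where they would be passed.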
6. •Problem:
•Business needed modern data store for rapid
development and scale
•Solution:
•Used PHP and MongoDB
•Results:
•Real-time statistics
•All data, images, etc. stored together
•No need for complex migrations
•Enable very rapid development and growth
7. •Problem:
•Deal with massive data volume across all customers
•Solution:
•Use MongoDB to replace Google Analytics / Omniture
•Results:
•Less than one week to build prototype and POC
•Rapid deployment of new features
8. •Problem:
•Lots of friction with RDBMS for archive storage
•Needed a more scalable archive storage database
•Solution:
•Keep MySQL for active data ( 100 Million )
•MongoDB for archive ( 2 Billion )
•Results:
•No more ALTER TABLE statements taking over 2 months
•Sharding fixed vertical scale problem
•Very happily looking for other ways to use MongoDB
9. How Telefónica uses MongoDB
[Architecture diagram; labels as extracted: Apps; M2M Event Acquisition; Event notification; Event Notifier Portal; API; Event Mng; Core Storage Mng Mng; Storage; Platform; Event Gateway; BOSS; Event acquisition; Operator Network (MNO1, MNO2, MNOn)]
12. The Evolution of MongoDB
1.8 (March ‘11)
• Journaling
• Sharding and Replica set enhancements
• Spherical geo search
2.0 (Sept ‘11)
• Index enhancements to improve size and performance
• Authentication with sharded clusters
• Replica Set Enhancements
• Concurrency improvements
2.2 (Aug ‘12)
• Aggregation Framework
• Multi-Data Center Deployments
• Improved Performance and Concurrency
2.4 (winter ‘12)
13. 2.2 Release August 2012
• Concurrency: yielding + db level locking
• New aggregation framework
• TTL Collections
• Improved free list implementation
• Tag aware sharding
• Read Preferences
• http://docs.mongodb.org/manual/release-notes/2.2/
14. Yielding + DB Locking
• improved yielding on page fault
• breaking down the global level lock
• Lock per Database in 2.2
• Lock per Collection post 2.2
15. Aggregation Framework
• pipeline model (a bit like unix pipes)
• like a "group by"
– Operators
– $project, $group, $match, $limit, $skip, $unwind, $sort
– Expressions
– Logical Expressions: $and, $not, $or, $cmp ...
– Math Expressions: $add, $divide, $mod ...
– String Expressions: $strcasecmp, $substr, $toLower ...
– Date/Time Expressions: $dayOfMonth, $hour...
– Multi-Expressions: $ifNull, $cond
• Use Cases: Real-time / inline analytics
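To make the "unix pipes" comparison concrete, here is a plain-JavaScript sketch of the pipeline idea. It is an illustration only, not MongoDB's implementation: the helper names (`match`, `group`, `sortBy`, `runPipeline`) and the sample events are hypothetical.

```javascript
// Each stage takes an array of documents and returns a new array.
function match(predicate) {
  return function (docs) { return docs.filter(predicate); };
}

function group(keyFn, initial, accumulate) {
  return function (docs) {
    var buckets = {};
    docs.forEach(function (doc) {
      var key = keyFn(doc);
      if (!Object.prototype.hasOwnProperty.call(buckets, key)) buckets[key] = initial();
      buckets[key] = accumulate(buckets[key], doc);
    });
    return Object.keys(buckets).map(function (key) {
      return { _id: key, value: buckets[key] };
    });
  };
}

function sortBy(compare) {
  return function (docs) { return docs.slice().sort(compare); };
}

// Feed the output of each stage into the next, like `a | b | c` in a shell.
function runPipeline(docs, stages) {
  return stages.reduce(function (current, stage) { return stage(current); }, docs);
}

// Hypothetical page-view events for an inline analytics query.
var events = [
  { page: "/home", ms: 120, status: 200 },
  { page: "/home", ms: 80, status: 200 },
  { page: "/cart", ms: 300, status: 200 },
  { page: "/cart", ms: 50, status: 500 }
];

// Count successful hits per page, most popular first.
var hitsPerPage = runPipeline(events, [
  match(function (e) { return e.status === 200; }),
  group(function (e) { return e.page; },
        function () { return 0; },
        function (count) { return count + 1; }),
  sortBy(function (a, b) { return b.value - a.value; })
]);
```

In the aggregation framework the same query would roughly correspond to the stages `{ $match: { status: 200 } }`, `{ $group: { _id: "$page", value: { $sum: 1 } } }` and `{ $sort: { value: -1 } }`.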
16. Example - For each "tag", list the authors
{
title : "my tech blog" ,
author : "bob" ,
tags : [ "fun" , "good" , "tech" ] ,
}
{
title : "cool tech" ,
author : "jim" ,
tags : [ "awesome" , "tech" ] ,
}
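One possible pipeline for this example is sketched below. It is an assumption, since the deck's own solution is not part of this excerpt; the `posts` collection name is hypothetical, and the plain-JavaScript function only emulates the `$unwind` and `$group` steps to show the expected result shape.

```javascript
// Sample documents from the slide.
var posts = [
  { title: "my tech blog", author: "bob", tags: ["fun", "good", "tech"] },
  { title: "cool tech", author: "jim", tags: ["awesome", "tech"] }
];

// Candidate aggregation pipeline (assumed, not taken from the deck).
var pipeline = [
  { $project: { author: 1, tags: 1 } },
  { $unwind: "$tags" },
  { $group: { _id: "$tags", authors: { $addToSet: "$author" } } }
];

// Plain-JavaScript emulation of $unwind followed by $group with $addToSet.
function authorsByTag(docs) {
  var byTag = {};
  docs.forEach(function (doc) {
    doc.tags.forEach(function (tag) {               // $unwind: one row per tag
      if (!Object.prototype.hasOwnProperty.call(byTag, tag)) byTag[tag] = [];
      if (byTag[tag].indexOf(doc.author) === -1) {  // $addToSet: no duplicates
        byTag[tag].push(doc.author);
      }
    });
  });
  return byTag;
}

var result = authorsByTag(posts);

// In the mongo shell, the pipeline itself could be run with:
//   db.posts.aggregate(pipeline)
```

With the two sample documents, only "tech" ends up with both authors; every other tag maps to a single author.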
18. Time To Live (TTL) Collections
• auto expire data out of a collection
• TTL index must be on a field of date datatype
• single value is evaluated
• Use Cases: data retention, cache expiration
db.events.ensureIndex(
{ "timestamp": 1 },
{ expireAfterSeconds: 3600 } )
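The expiry rule behind that index can be modelled in a few lines. This is a simplified sketch, assuming the server compares the indexed date against the current time minus `expireAfterSeconds`; the `isExpired` helper and the sample documents are hypothetical, and the real deletion is done by a periodic background task, so expired documents can linger briefly.

```javascript
// Simplified model of the TTL rule; the server's background task does the real work.
function isExpired(doc, field, expireAfterSeconds, now) {
  var value = doc[field];
  if (!(value instanceof Date)) return false;   // non-date values never expire
  return value.getTime() + expireAfterSeconds * 1000 < now.getTime();
}

var now = new Date("2012-10-25T12:00:00Z");
var fresh = { timestamp: new Date("2012-10-25T11:30:00Z") };   // 30 minutes old
var stale = { timestamp: new Date("2012-10-25T10:00:00Z") };   // 2 hours old
var notADate = { timestamp: "2012-10-25T10:00:00Z" };          // string, not a Date

var freshExpired = isExpired(fresh, "timestamp", 3600, now);
var staleExpired = isExpired(stale, "timestamp", 3600, now);
var stringExpired = isExpired(notADate, "timestamp", 3600, now);
```

With the one-hour setting from the slide, only the two-hour-old document qualifies for removal; the string-valued one is ignored because it is not a date.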
19. Tag aware sharding
• Distribute data based on a Tag
• Use Cases: Locality for Data by Data Center
sh.addShardTag("shard0000", "dc-emea")
sh.addTagRange("mydb.users",
{ country: "uk"}, { country: "ul"},
"dc-emea"
);
sh.addTagRange("mydb.users",
{ country: "by"},{ country: "bz"},
"dc-emea"
);
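How a shard key value maps to a tag can be sketched as a range lookup. This is a simplified model, assuming `country` is the shard key and that each range includes its lower bound and excludes its upper bound; the `tagForCountry` helper is hypothetical and is not part of the shell API.

```javascript
// Tag ranges from the slide: lower bound inclusive, upper bound exclusive.
var tagRanges = [
  { min: "uk", max: "ul", tag: "dc-emea" },
  { min: "by", max: "bz", tag: "dc-emea" }
];

// Simplified lookup on a single string shard key (plain string comparison).
function tagForCountry(country) {
  for (var i = 0; i < tagRanges.length; i++) {
    var range = tagRanges[i];
    if (country >= range.min && country < range.max) return range.tag;
  }
  return null;   // untagged: the data is not pinned to a tagged shard
}

var ukTag = tagForCountry("uk");
var byTag = tagForCountry("by");
var usTag = tagForCountry("us");
var ulTag = tagForCountry("ul");
```

The `"uk"` to `"ul"` bounds work because `"ul"` is the next string after every value starting with `"uk"`, so the range effectively selects just that country code.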
20. Read Preferences
• Mode
• PRIMARY, PRIMARY_PREFERRED
• SECONDARY, SECONDARY_PREFERRED
• NEAREST
• Tag Sets
• Uses Replica Set tags
• Passed Tag is used to find matching members
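The interplay of modes and tag sets can be sketched as a member filter. This is a simplified model under stated assumptions: real drivers also weigh ping latency and member health, and the function names, hosts, and `dc` tag values here are hypothetical.

```javascript
// True when the member carries every key/value pair in the tag set.
function matchesTags(member, tagSet) {
  return Object.keys(tagSet).every(function (key) {
    return member.tags && member.tags[key] === tagSet[key];
  });
}

// Tag sets are tried in order; the first set that matches any member wins.
function filterByTagSets(members, tagSets) {
  if (!tagSets || tagSets.length === 0) return members;
  for (var i = 0; i < tagSets.length; i++) {
    var tagSet = tagSets[i];
    var hits = members.filter(function (m) { return matchesTags(m, tagSet); });
    if (hits.length > 0) return hits;
  }
  return [];
}

// Return the members a read could be routed to for a given mode.
function candidates(members, mode, tagSets) {
  var primary = members.filter(function (m) { return m.state === "PRIMARY"; });
  var allSecondaries = members.filter(function (m) { return m.state === "SECONDARY"; });
  var secondaries = filterByTagSets(allSecondaries, tagSets);
  switch (mode) {
    case "PRIMARY": return primary;
    case "PRIMARY_PREFERRED": return primary.length ? primary : secondaries;
    case "SECONDARY": return secondaries;
    case "SECONDARY_PREFERRED": return secondaries.length ? secondaries : primary;
    case "NEAREST": return filterByTagSets(primary.concat(allSecondaries), tagSets);
    default: throw new Error("unknown read preference mode: " + mode);
  }
}

var members = [
  { host: "a:27017", state: "PRIMARY", tags: { dc: "emea" } },
  { host: "b:27017", state: "SECONDARY", tags: { dc: "emea" } },
  { host: "c:27017", state: "SECONDARY", tags: { dc: "us" } }
];

var usSecondaries = candidates(members, "SECONDARY", [{ dc: "us" }]);
var fallbackToPrimary = candidates(members, "SECONDARY_PREFERRED", [{ dc: "apac" }]);
var nearestEmea = candidates(members, "NEAREST", [{ dc: "emea" }]);
```

Ending a tag set list with an empty document `{}` acts as a catch-all in this model, since an empty tag set matches every member.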
21. 2.4 Roadmap
Must
• Kerberos integration
• LDAP/AD integration
Nice To Have
• Hash Shard Key
• Background Index Build on Secondaries
• V8 for Map/Reduce (replaces SpiderMonkey)
• Geo: intersecting polygons, Geo shard key
• Agg: $out, more functions, speed improvements
22. And beyond
• Full Text Search
• Collection / Extent level locking
• Field level security
• Audit