MongoDB: How We Did It – Reanimating Identity at AOL
Topics 
• Motivation 
• Challenges 
• Approach 
• MongoDB Testing 
• Deployment 
• Collections 
• Problem/Solution 
• Lessons Learned 
• Going Forward
Motivation
Motivation 
• Cluttered data 
• Ambiguous data 
• Functionally shared data 
• Immutable data model
Challenges
Challenges 
• Leaving behind a fault-tolerant (NonStop) platform and its transactional integrity 
• Merging/extricating Identity data 
• Scaling to handle consolidated traffic 
• Continuing to support legacy systems
Approach
Approach 
• Document-based data model – use MongoDB 
• Migrate data 
• Build adapter/interceptor layer 
• Production testing with no impacts
Approach 
• Audit of our setup with MongoDB 
• Tune MongoDB settings, including the driver, to optimize performance 
• Leverage eventual consistency to overcome the loss of transactional integrity 
• Switch Identity to the new data model on MongoDB
Migration
Migration 
• Adapters support four stages (see the sketch after this list): 
1. Read/write legacy 
2. Read/write legacy, write MongoDB (shadow read MongoDB) 
3. Read/write MongoDB, write legacy 
4. Read/write MongoDB
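A minimal sketch of what a stage 2 adapter might look like, assuming a hypothetical legacy DAO and the 2.x Java driver; names are illustrative, not AOL's actual adapter code.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;

// Hypothetical stage 2 adapter: the legacy store stays authoritative while
// MongoDB is shadow-written and shadow-read for comparison.
public class Stage2IdentityAdapter {
    private final LegacyIdentityStore legacy;   // assumed legacy DAO
    private final DBCollection userIdentity;    // MongoDB collection

    public Stage2IdentityAdapter(LegacyIdentityStore legacy, DBCollection userIdentity) {
        this.legacy = legacy;
        this.userIdentity = userIdentity;
    }

    public void write(String user, DBObject doc) {
        legacy.write(user, doc);                // authoritative write
        try {
            userIdentity.save(doc);             // shadow write, best effort
        } catch (RuntimeException e) {
            // stage 2: MongoDB failures must not affect the legacy path
        }
    }

    public DBObject read(String user) {
        DBObject result = legacy.read(user);    // serve from legacy
        try {
            DBObject shadow = userIdentity.findOne(new BasicDBObject("user", user));
            // 'shadow' could be compared with 'result' out of band to validate migration
        } catch (RuntimeException e) {
            // ignore: shadow read only
        }
        return result;
    }
}

// Minimal stand-in for the legacy data access layer.
interface LegacyIdentityStore {
    void write(String user, DBObject doc);
    DBObject read(String user);
}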
Stage 1 through Stage 4 (diagrams of the adapter read/write paths for each migration stage)
MongoDB Testing
Production Testing 
• “Chaos Monkey”-style testing of MongoDB 
• 4 million requests/minute (production load, roughly 99% reads) 
• Test primary failover (graceful step-down) 
• Kill the primary
Production Testing 
• Test secondary failure 
• Shut down all secondaries 
• Manually shut down the network interface on the primary 
• Performance benchmarking
Production Testing 
• Performance very good: shard-key reads ~2-3 ms 
• Scatter-gather reads ~12 ms 
• Writes good as well, ~3-20 ms 
• Failovers 4-5 minutes
MongoDB Healthcheck 
• Use dedicated machines for config servers 
• Place config servers in different data centers 
• Handle failover in the application: on a network exception, fall back to a secondary (sketch below) 
• Set lower TCP keepalive values (5 minutes)
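The keepalive value is an operating-system setting (for example net.ipv4.tcp_keepalive_time on Linux), not a driver option. A sketch of the application-side fallback, assuming the 2.x Java driver: on a network-related error the read is retried against a secondary.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoException;
import com.mongodb.ReadPreference;

public class FailoverAwareReader {
    private final DBCollection collection;

    public FailoverAwareReader(DBCollection collection) {
        this.collection = collection;
    }

    // Read from the primary first; on an exception (typically a network error
    // during failover) retry against a secondary so reads keep being served.
    public DBObject findByGuid(String guid) {
        DBObject query = new BasicDBObject("_id", guid);
        try {
            return collection.findOne(query, null, ReadPreference.primary());
        } catch (MongoException e) {
            return collection.findOne(query, null, ReadPreference.secondaryPreferred());
        }
    }
}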
Deployment
Deployment 
• Version 2.4.9 
• All 75 mongods on separate switches 
• 2 x 12-core CPUs, 192 GB of RAM, and internal controller-based RAID 10 with ext4 file systems 
• Using the default chunk size (64 MB)
Deployment 
• Dedicated slaves for backup (configured as hidden members with priority 0); backups run during a 6-8am window 
• Enable powerOf2Sizes on collections to reduce fragmentation 
• Balancer restricted to a 4-6am daily window (sketch below)
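The powerOf2Sizes and balancer settings above can be applied as admin operations; a sketch using the 2.x Java driver, where the host, database, and collection names are assumptions (the hidden backup members are set separately via a replica set reconfiguration, not shown here).

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.MongoClient;

public class DeploymentSettings {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("mongos-host"); // illustrative host

        // Enable usePowerOf2Sizes on a collection to reduce fragmentation (MongoDB 2.x).
        DB identity = client.getDB("identity");              // illustrative database name
        identity.command(new BasicDBObject("collMod", "userIdentity")
                .append("usePowerOf2Sizes", true));

        // Restrict the balancer to a 4-6am window via the config database.
        DB config = client.getDB("config");
        config.getCollection("settings").update(
                new BasicDBObject("_id", "balancer"),
                new BasicDBObject("$set",
                        new BasicDBObject("activeWindow",
                                new BasicDBObject("start", "4:00").append("stop", "6:00"))),
                true,   // upsert
                false); // multi

        client.close();
    }
}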
Collections
Document Model 
• The entire data set must fit in memory to meet performance demands 
• Document field names are abbreviated, but descriptive 
• Don’t store default values; the legacy document is 80% defaults (sketch below) 
• Working hard to keep legacy artifacts out, but it’s always about trade-offs
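A small sketch of the "don't store defaults" rule, with illustrative field names and defaults: values equal to the default are simply never written into the document.

import com.mongodb.BasicDBObject;

public class ProfileDocumentBuilder {
    private static final String DEFAULT_LANG = "en_US"; // illustrative default

    // Abbreviated-but-descriptive field names; defaults are not written, which
    // keeps documents small enough for the working set to stay in memory.
    public static BasicDBObject build(String country, String lang) {
        BasicDBObject profile = new BasicDBObject();
        if (country != null) {
            profile.append("cc", country);
        }
        if (lang != null && !lang.equals(DEFAULT_LANG)) {
            profile.append("lang", lang);
        }
        return profile;
    }
}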
UserIdentity Collection 
• Core data model for Identity 
• Heterogeneous collection (some documents are “aliases”, pointers to the primary document) 
• Index on user+namespace 
• Shard key is guid (UUID Type 1, flipped: node then time); see the setup sketch after the sample document
UserIdentity 
{ 
  "_id": "baebc8bcc8e14f6e9bf70221d81711e2", 
  "user": "jdoe", 
  "ns": "aol", 
  ... 
  "profile": { 
    "cc": "US", 
    "firstNm": "John", 
    "lang": "en_US", 
    "lastNm": "Doe" 
  }, 
  "sysTime": ISODate("2014-05-03T04:43:49.899Z") 
}
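A sketch of how the user+namespace index and the guid shard key (held in _id in the sample above) might be declared with the 2.x Java driver; the host, database, and collection names are assumptions, and sharding is assumed to be already enabled for the database.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class UserIdentitySetup {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("mongos-host"); // illustrative host

        // Secondary index supporting lookups by user within a namespace.
        DBCollection userIdentity = client.getDB("identity").getCollection("userIdentity");
        userIdentity.ensureIndex(new BasicDBObject("user", 1).append("ns", 1));

        // Shard on _id, which holds the flipped Type 1 UUID ("guid"), so point
        // reads by guid are routed to a single shard. Assumes sharding has
        // already been enabled for the 'identity' database.
        client.getDB("admin").command(
                new BasicDBObject("shardCollection", "identity.userIdentity")
                        .append("key", new BasicDBObject("_id", 1)));

        client.close();
    }
}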
Relationship Collection 
• Supports all cardinalities 
• Equivalent to an RDBMS intersection table (a guid on each end of the relationship) 
• Uses the eventually consistent framework for non-atomic writes 
• Shard key is parent+child+type (parent lookup is the primary use case)
Relationship Collection 
{ 
  "_id": "baa000163e5ff405b8083d5f164c11e3", 
  "child": "8a9e00237d617f08df7f1685527711e2", 
  "createTime": ISODate("2013-09-05T17:00:51.209Z"), 
  "modTime": ISODate("2013-09-05T17:00:51.209Z"), 
  "attributes": null, 
  "parent": "baebc8bcc8e14f6e9bf70221d81711e2", 
  "type": "CLASSROOM" 
}
Legacy Collection 
• Bridge collection to ease migration from the old data model to the new one 
• Near-image of the old data model, with some refactoring (3 tables folded into 1 document) 
• Once migration is complete, the plan is to drop this collection 
• Defaults not stored; 1-2 character field names
Legacy Collection 
{ 
  "_id": "jdoe", 
  "subData": { 
    "f": NumberLong(1018628731), 
    "g": "jdoe", 
    "d": false, 
    "e": NumberLong(1018628731), 
    "b": NumberLong(434077116), 
    "a": "JDoe", 
    "l": NumberLong("212200907100000000"), 
    "i": NumberLong(659952670) 
  }, 
  "guid": "baebc8bcc8e14f6e9bf70221d81711e2", 
  "st": ISODate("2013-06-24T20:13:16.627Z") 
}
Reservation Collection 
• Namespace protection 
• Uniqueness of user/namespace is enforced from the application side, because the UserIdentity shard key is guid (see the sketch after the sample document) 
• Shard key is username+namespace
Reservation Collection 
{ 
  "_id": "b13a00163e062d8ee9dc9eaf3e2411e1", 
  "createTime": ISODate("2012-01-13T20:26:46.111Z"), 
  "user": "jdoe", 
  "expires": ISODate("2012-01-13T21:26:46.111Z"), 
  "rsvId": "e9bddfe1-1c84-42c9-8f4c-1a7a96920ff4", 
  "data": { "k1": "v1", "k2": "v2" }, 
  "ns": "aol", 
  "type": "R" 
}
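Because UserIdentity is sharded on guid, a global unique index on user+namespace cannot live there; the Reservation collection, sharded on username+namespace, is what the application writes to first in order to claim a name. A minimal sketch, with illustrative names and a generic MongoException catch standing in for duplicate-key handling.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoException;

import java.util.Date;
import java.util.UUID;

public class ReservationDao {
    private final DBCollection reservations;

    public ReservationDao(DBCollection reservations) {
        this.reservations = reservations;
        // Unique index on the shard key (user+ns) enforces one reservation per
        // user and namespace across the cluster.
        reservations.ensureIndex(
                new BasicDBObject("user", 1).append("ns", 1),
                new BasicDBObject("unique", true));
    }

    // Try to claim a username in a namespace; returns false if the write fails
    // (typically a duplicate key, meaning the name is already reserved).
    public boolean reserve(String user, String ns, Date expires) {
        BasicDBObject doc = new BasicDBObject("_id",
                UUID.randomUUID().toString().replace("-", "")) // illustrative id
                .append("user", user)
                .append("ns", ns)
                .append("createTime", new Date())
                .append("expires", expires)
                .append("type", "R");
        try {
            reservations.insert(doc);
            return true;
        } catch (MongoException e) {
            return false;
        }
    }
}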
Problems/Solutions
Problem 
Writes spanning multiple documents sometimes 
fail part way
Solution 
• Developed an eventually consistent framework, the “synchronizer” (sketch below) 
• Events are sent to the framework to validate, repair, or finish the write 
• Events are retried until they succeed or their TTL expires
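A heavily simplified sketch of the synchronizer idea: each multi-document write emits an event that is retried until the related documents are validated, repaired, or finished, or until its TTL expires. All names are illustrative, not AOL's framework.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical repair event for a write that spans multiple documents.
class SyncEvent {
    final String guid;          // identity the partial write belongs to
    final long expiresAtMillis; // give up after this TTL

    SyncEvent(String guid, long ttlMillis) {
        this.guid = guid;
        this.expiresAtMillis = System.currentTimeMillis() + ttlMillis;
    }
}

public class Synchronizer implements Runnable {
    private final BlockingQueue<SyncEvent> events = new LinkedBlockingQueue<SyncEvent>();

    public void submit(SyncEvent event) {
        events.offer(event);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                SyncEvent event = events.poll(1, TimeUnit.SECONDS);
                if (event == null) {
                    continue;
                }
                if (System.currentTimeMillis() > event.expiresAtMillis) {
                    continue; // TTL expired: stop retrying this event
                }
                if (!validateAndRepair(event)) {
                    events.offer(event); // not consistent yet: retry later
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Check the related documents and finish or repair the write; returns true
    // once they are consistent. Left abstract in this sketch.
    private boolean validateAndRepair(SyncEvent event) {
        return true;
    }
}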
Problem 
Scatter-gather queries are slower and take a 100% performance hit during failover
Solution 
• Use Memcached to map a non-shard-key value to the shard key (99% hit ratio for one mapping, 55% for the other; sketch below) 
• Use Memcached to cache potentially expensive intermediary results (88% hit ratio)
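A sketch of the cache-assisted routing: a non-shard-key lookup (user+ns here) is mapped to the guid shard key through a cache so most reads become targeted queries. The KeyValueCache interface is a stand-in for a Memcached client, not a specific library API.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;

// Stand-in for a Memcached client; a real deployment would put a Memcached
// library behind this interface.
interface KeyValueCache {
    String get(String key);
    void set(String key, String value, int ttlSeconds);
}

public class GuidResolver {
    private final DBCollection userIdentity;
    private final KeyValueCache cache;

    public GuidResolver(DBCollection userIdentity, KeyValueCache cache) {
        this.userIdentity = userIdentity;
        this.cache = cache;
    }

    // Resolve user+ns to the guid shard key, using the cache to avoid a
    // scatter-gather query on most requests.
    public String resolveGuid(String user, String ns) {
        String cacheKey = "guid:" + ns + ":" + user;
        String guid = cache.get(cacheKey);
        if (guid != null) {
            return guid; // cache hit: the follow-up read is a targeted query
        }
        // Cache miss: this query is not on the shard key, so it fans out.
        DBObject doc = userIdentity.findOne(
                new BasicDBObject("user", user).append("ns", ns));
        if (doc == null) {
            return null;
        }
        guid = (String) doc.get("_id");
        cache.set(cacheKey, guid, 3600); // illustrative one-hour TTL
        return guid;
    }
}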
Problem 
Querying lists of users required parallel processing for performance, increasing connection requirements
Solution 
Use the $in operator to query lists of users rather than looping through individual queries
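A sketch of the batched lookup with the 2.x Java driver: one $in query replaces a loop of per-user queries.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

import java.util.ArrayList;
import java.util.List;

public class BatchedUserLookup {
    // One query with $in instead of N parallel single-user queries, which keeps
    // the number of in-flight operations (and connections) down.
    public static List<DBObject> findUsers(DBCollection userIdentity, List<String> users) {
        BasicDBObject query = new BasicDBObject("user", new BasicDBObject("$in", users));
        List<DBObject> results = new ArrayList<DBObject>();
        DBCursor cursor = userIdentity.find(query);
        try {
            while (cursor.hasNext()) {
                results.add(cursor.next());
            }
        } finally {
            cursor.close();
        }
        return results;
    }
}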
Problem 
At application startup, a large number of requests failed because of the overhead of creating mongos connections
Solution 
Build a “warm-up” stage into the application that executes stock queries before the instance goes online and takes traffic
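A sketch of the warm-up idea, with illustrative names: a handful of stock queries are run before the instance starts taking traffic so the mongos connections already exist.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;

import java.util.List;

public class WarmUp {
    // Execute stock queries before the application registers as online, so
    // connections to mongos are created ahead of real traffic.
    public static void run(DBCollection userIdentity, List<String> sampleGuids) {
        for (String guid : sampleGuids) {
            try {
                userIdentity.findOne(new BasicDBObject("_id", guid));
            } catch (RuntimeException e) {
                // warm-up is best effort; failures here should not block startup
            }
        }
    }
}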
Problem 
During failovers or other slow periods, application request queues back up and recovery takes too long
Solution 
Determine the request's time in queue; if it exceeds the client's timeout, don't process the request, drop it
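A sketch of the fail-fast check, with illustrative names: the time a request has spent in the queue is compared against the client's timeout before any work is done.

public class QueuedRequest {
    private final long enqueuedAtMillis = System.currentTimeMillis();
    private final long clientTimeoutMillis;

    public QueuedRequest(long clientTimeoutMillis) {
        this.clientTimeoutMillis = clientTimeoutMillis;
    }

    // True if the client has already given up on this request; processing it
    // would only slow recovery after a failover or other stall.
    public boolean isExpired() {
        return System.currentTimeMillis() - enqueuedAtMillis > clientTimeoutMillis;
    }
}

A worker thread would call isExpired() when it dequeues the request and simply drop the request if the client has already given up.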
Problem 
An application-applied optimistic lock encounters lock errors during concurrent writes (the entire document is updated on each write)
Solution 
Use the $set operator to target writes to just the impacted fields, and let MongoDB enforce atomicity
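A sketch of the targeted update: only the changed field is written with $set, letting MongoDB's single-document atomicity resolve concurrent writers instead of an application-level optimistic lock. Field names follow the UserIdentity sample above.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;

public class ProfileUpdater {
    // Update only the impacted field; concurrent writers touching different
    // fields of the same document no longer conflict.
    public static void updateLanguage(DBCollection userIdentity, String guid, String lang) {
        userIdentity.update(
                new BasicDBObject("_id", guid),
                new BasicDBObject("$set", new BasicDBObject("profile.lang", lang)));
    }
}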
Problem 
Reads go to the primary, but when the secondaries are lost (and the primary steps down), reads fail
Solution 
Use primaryPreferred for reads: we want the freshest data (a password, for example), but still want reads to work when no primary exists
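A sketch of setting primaryPreferred with the 2.x Java driver, either client-wide or per collection; the host and names are illustrative.

import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;

import java.util.Arrays;

public class ReadPreferenceSetup {
    public static void main(String[] args) throws Exception {
        // Client-wide default: read from the primary when one exists, otherwise
        // fall back to a secondary so reads keep working during failover.
        MongoClientOptions options = MongoClientOptions.builder()
                .readPreference(ReadPreference.primaryPreferred())
                .build();
        MongoClient client = new MongoClient(
                Arrays.asList(new ServerAddress("mongos-host", 27017)), options);

        // The same preference can also be set per collection.
        DBCollection userIdentity = client.getDB("identity").getCollection("userIdentity");
        userIdentity.setReadPreference(ReadPreference.primaryPreferred());

        client.close();
    }
}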
Problem 
A large number of connections to mongos/mongod extends failover times and approaches connection limits
Solution 
• Application DAOs share connections to the same MongoDB cluster 
• Connection parameters were initially set too high 
• Set connectionsPerHost and the connection multiplier, plus a buffer, to cover the fixed number of worker threads per application (15/5 for 32 worker threads; sketch below) 
• Went from 15K connections to 2K connections
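A sketch of the shared, right-sized pool, assuming the 2.x Java driver; the "connection multiplier" is taken to mean the driver's threadsAllowedToBlockForConnectionMultiplier option, and the 15/5 values are those quoted above for 32 worker threads.

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

import java.util.Arrays;

public class PooledClientFactory {
    // Shared across all DAOs talking to the same cluster, so each application
    // instance holds one modest pool instead of many large ones.
    public static MongoClient create() throws Exception {
        MongoClientOptions options = MongoClientOptions.builder()
                .connectionsPerHost(15)                          // pool size per mongos
                .threadsAllowedToBlockForConnectionMultiplier(5) // waiters allowed per connection
                .build();
        return new MongoClient(
                Arrays.asList(new ServerAddress("mongos-host", 27017)), options);
    }
}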
Benefits
Benefits 
• An unanticipated benefit was the ability for all eligible users to use the AOL client 
• Easily added Identity extensions leveraging the new data model 
• Support for multiple namespaces made building APIs for multi-tenancy straightforward 
• The model is positioned in such a way as to make the vision for AOL Identity feasible
Lessons Learned
Lessons Learned 
• Keep connections as low as possible 
– Higher connection numbers increase failover 
times 
• Avoid scatter-gather reads (use cache if 
possible to get to shard key) 
• Keep data set in memory 
• Fail fast on application side to lower recovery 
time
Going Forward
Going forward 
• Implement tagging to target secondaries 
• Further reduction in scatter-gather reads 
• Reduce failover window to as short as possible 
• Contact: doug.haydon@teamaol.com
