Why Wordnik went non-relational

A presentation on the selection criteria, testing + evaluation, and successful zero-downtime migration to MongoDB. It also covers Wordnik's speed and stability, and how NoSQL technologies have changed the way Wordnik scales.

Published in: Technology

  • Moving to a JSON-based mapper, 10k/second. Moving to direct mapping, 35k/second

    1. NoSQL Now 2011: Why Wordnik went Non-Relational
       • Tony Tam
       • @fehguy
    2. What this Talk is About
       • 5 key reasons why Wordnik migrated to a non-relational database
       • Process for selection, migration
       • Optimizations and tips from living survivors of the battlefield
    3. Why Should You Care?
       • MongoDB user for almost 2 years
       • Lessons learned, analysis, benefits from process
       • We migrated from MySQL to MongoDB with no downtime
       • We have interesting/challenging data needs, likely relevant to you
    4. More on Wordnik
       • World’s fastest updating English dictionary
       • Based on input of text up to 8k words/second
       • Word Graph as basis to our analysis
       • Synchronous & asynchronous processing
       • Tens of billions of documents in NR storage
       • 20M daily REST API calls, billions served
       • Powered by Swagger OSS API framework
       • swagger.wordnik.com
    5. Architectural History
       • 2008: Wordnik was born as a LAMP AWS EC2 stack
       • 2009: Introduced public REST API, powered wordnik.com, partner APIs
       • 2009: Drank NoSQL Kool-Aid
       • 2010: Scala
       • 2011: Micro SOA
    6. Non-relational by Necessity
       • Moved to NR because of “4S”: Speed, Stability, Scaling, Simplicity
       • But… MySQL can go a LONG way
       • Takes right team, right reasons (+ patience)
       • NR offerings simply too compelling to focus on scaling MySQL
    7. Wordnik’s 5 Whys for NoSQL
    8. Why #1: Speed bumps with MySQL
       • Inserting data fast (50k recs/second) caused MySQL mayhem
       • Maintaining indexes largely to blame
       • Operations for consistency unnecessary but “cannot be turned off”
       • Devised twisted schemes to avoid client blocking, aka the “master/slave tango”
    9. Why #2: Retrieval Complexity
       • Objects typically mapped to tables
       • Object hierarchy always => inner + outer joins
       • Lots of static data, so why join?
       • “Noun” is not getting renamed in my code’s lifetime!
       • Logic like this is probably in application logic
       • Since storage is cheap, I’ll choose speed
    10. Why #2: Retrieval Complexity
        • One definition = 10+ joins
        • 50 requests per second!
    11. Why #2: Retrieval Complexity
        • Embed objects in rows “sort of works”
        • Filtering gets really nasty
        • Native XML in MySQL? If a full table-scan is OK…
        • OK, then cache it!
        • Layers of caching introduced layers of complexity: stale data/corruption, object versionitis, cache stampedes
    12. Why #3: Object Modeling
        • Object models being compromised for sake of persistence
        • This is backwards!
        • Extra abstraction for the wrong reason
        • OK, then performance suffers
        • In-application joins across objects
        • “Who ran the fetch all query against production?!” – any sysadmin
        • “My zillionth ORM layer that only I understand” (and can maintain)
    13. Why #4: Scaling
        • Needed “cloud friendly storage”
        • Easy up, easy down!
        • Startup: Sync your data, and announce to clients when ready for business
        • Shutdown: Announce your departure and leave
        • Adding MySQL instances was a dance
        • Snapshot + bin files
        • mysql> change master to MASTER_HOST='db1', MASTER_USER='xxx', MASTER_PASSWORD='xxx', MASTER_LOG_FILE='master-relay.000431', MASTER_LOG_POS=1035435402;
    14. Why #4: Scaling
        • What about those VMs?
        • So convenient! But… they kind of suck
        • Can the database succeed on a VM?
        • VM performance: Memory, CPU or I/O. Pick only one
        • Can your database really reduce CPU or disk I/O with lots of RAM?
    15. Why #5: Big Picture
        • BI tools use relational constraints for discovery
        • Is this the right reason for them?
        • Can we work around this?
        • Let’s have a BI tool revolution, too!
        • True service architecture makes relational constraints impractical/impossible
        • Distributed sharding makes relational constraints impractical/impossible
    16. Why #5: Big Picture
        • Is your app smarter than your database?
        • The logic line is probably blurry!
        • What does count(*) really mean when you add 5k records/sec?
        • Maybe eventual consistency is not so bad…
        • 2PC? Do some reading and decide!
        • http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
    17. Ok, I’m in!
        • I thought deciding was easy!?
        • Many quickly maturing products
        • Divergent features tackle different needs
        • Wordnik spent 8 weeks researching and testing NoSQL solutions
        • This is a long time! (for a startup)
        • Wrote ODM classes and migrated our data
        • Surprise! There were surprises
        • Be prepared to compromise
    18. Choice Made, Now What?
        • We went with MongoDB ***
        • Fastest to implement
        • Most reliable
        • Best community
        • Why?
        • Why #1: Fast loading/retrieval
        • Why #2: Fast ODM (50 tps => 1000 tps!)
        • Why #3: Document Models === Object models
        • Why #4: MMF => Kernel-managed memory + RS
        • Why #5: It’s 2011, is there no progress?
    19. More on Why MongoDB
        • Testing, testing, testing
        • Used our migration tools to load test
        • Read from MySQL, write to MongoDB
        • We loaded 5+ billion documents, many times over
        • In the end, one server could…
          • Insert 100k records/sec sustained
          • Read 250k records/sec sustained
          • Support concurrent loading/reading
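To put the load-test numbers in perspective, here is a back-of-the-envelope calculation using only the figures quoted on the slide. It assumes a single pass over 5 billion documents at the final sustained insert rate, which is a simplification: the talk does not say the earlier test loads ran at that speed.

```javascript
// Figures taken from the slide; the single-pass assumption is ours.
const documents = 5e9;           // "5+ billion documents"
const insertsPerSecond = 100e3;  // "Insert 100k records/sec sustained"

const loadSeconds = documents / insertsPerSecond;
const loadHours = loadSeconds / 3600;

// prints "50000 s, about 13.9 hours"
console.log(`${loadSeconds} s, about ${loadHours.toFixed(1)} hours`);
```

At that rate a full reload fits comfortably inside a day, which is consistent with being able to repeat it "many times over" during an 8-week evaluation.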
    20. Migration & Testing
        • Iterated ODM mapping multiple times
        • Some issues
          • Type safety: cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
          • Dates as Strings: obj.put("a_date", "2011-12-31") != obj.put("a_date", new Date("2011-12-31"))
          • Storage size: obj.put("very_long_field_name", true) >> obj.put("vsfn", true)
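Two of the issues above (dates stored as strings, and long field names) can be illustrated without a database. The sketch below uses plain JavaScript objects. The JSON length is only a rough stand-in for on-disk size, although in this case the 16-character difference in key length is the same either way. The 5 billion multiplier is the document count quoted earlier in the talk.

```javascript
// Dates: a string and a Date are different types, so a query or sort that
// expects one will not behave the same way on the other.
const asString = { a_date: "2011-12-31" };
const asDate = { a_date: new Date("2011-12-31T00:00:00Z") };

const stringIsDate = asString.a_date instanceof Date; // false
const dateIsDate = asDate.a_date instanceof Date;     // true

// Field names: every document carries its own keys, so a long name costs
// bytes on every record. JSON length is used here only as a rough proxy.
const longForm = JSON.stringify({ very_long_field_name: true });
const shortForm = JSON.stringify({ vsfn: true });
const bytesSavedPerDoc = longForm.length - shortForm.length; // 16

// Scale to the 5 billion documents mentioned in the talk (decimal GB).
const gigabytesSaved = (bytesSavedPerDoc * 5e9) / 1e9; // 80

console.log({ stringIsDate, dateIsDate, bytesSavedPerDoc, gigabytesSaved });
```

A single renamed boolean field is worth roughly 80 GB at this scale, which is why short field names came up as a storage issue.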
    21. Migration & Testing
        • Expect data model iterations
        • Wordnik migrated table to Mongo collection “as-is”
        • Easier to migrate, test
        • _id field used same MySQL PK
        • Auto Increment?
        • Used MySQL to “check-out” sequences
        • One row per mongo collection
        • Run out of sequences => get more
        • Need exclusive locks here!
    22. Migration & Testing
        • Sequence generator in-process: SequenceGenerator.checkout("doc_metadata,100")
        • Sequence generator as web service
        • Centralized UID management
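The check-out scheme described on the last two slides can be sketched as a block allocator. This is a minimal in-memory illustration, not Wordnik's implementation: `CentralSequenceStore`, `nextId`, and the constructor arguments are hypothetical names, and the central store here is a plain `Map` standing in for the MySQL row (one per collection) that the talk says was protected by an exclusive lock.

```javascript
// Hypothetical central store: one counter per collection name.
class CentralSequenceStore {
  constructor() {
    this.next = new Map();
    this.checkouts = 0; // how often a client had to come back for more
  }
  // Reserve `count` consecutive ids and return the first one.
  checkout(name, count) {
    const start = this.next.get(name) ?? 1;
    this.next.set(name, start + count);
    this.checkouts += 1;
    return start;
  }
}

// In-process generator: hands out ids from its current block and only
// returns to the central store when the block is exhausted.
class SequenceGenerator {
  constructor(store, name, blockSize) {
    this.store = store;
    this.name = name;
    this.blockSize = blockSize;
    this.current = 0;
    this.limit = 0; // exclusive upper bound of the current block
  }
  nextId() {
    if (this.current >= this.limit) {
      this.current = this.store.checkout(this.name, this.blockSize);
      this.limit = this.current + this.blockSize;
    }
    return this.current++;
  }
}

// Two independent processes sharing one collection's sequence.
const store = new CentralSequenceStore();
const genA = new SequenceGenerator(store, "doc_metadata", 100);
const genB = new SequenceGenerator(store, "doc_metadata", 100);

const idsA = Array.from({ length: 150 }, () => genA.nextId());
const idsB = Array.from({ length: 50 }, () => genB.nextId());
console.log(idsA[0], idsA[149], idsB[0], store.checkouts);
```

Handing out ids in blocks means the contended, locked step happens once per block rather than once per document, at the cost of gaps when a process exits with part of its block unused.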
    23. Migration & Testing
        • Expect data access pattern iterations
        • So much more flexibility!
        • Reach into objects
          • > db.dictionary_entry.find({"hdr.sr":"cmu"})
        • Access to a whole object tree at query time
        • Overwrite a whole object at once… when desired
          • Not always! This clobbers the whole record
          • > db.foo.save({_id:18727353,foo:"bar"})
        • Update a single field:
          • > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
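The difference between overwriting a record and setting one field is easy to miss, so here is an in-memory illustration. The functions `replaceDocument`, `setFields`, and `getPath` are stand-ins written for this sketch, not MongoDB driver calls, and `setFields` only handles top-level fields (the real `$set` also accepts dotted paths). The document contents are made up.

```javascript
// Like save(): the stored document is replaced wholesale.
function replaceDocument(collection, doc) {
  collection.set(doc._id, { ...doc });
}

// Like update(..., {$set: ...}): only the named fields change.
function setFields(collection, id, fields) {
  const existing = collection.get(id);
  if (existing) Object.assign(existing, fields);
}

// Resolve a dotted path such as "hdr.sr" against a nested document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Illustrative document; the field values are assumptions.
const makeDoc = () => ({ _id: 18727353, foo: "old", hdr: { sr: "cmu" } });

const replaced = new Map([[18727353, makeDoc()]]);
replaceDocument(replaced, { _id: 18727353, foo: "bar" });

const patched = new Map([[18727353, makeDoc()]]);
setFields(patched, 18727353, { foo: "bar" });

console.log(getPath(replaced.get(18727353), "hdr.sr")); // undefined: hdr was clobbered
console.log(getPath(patched.get(18727353), "hdr.sr"));  // cmu: hdr survived
```

Both collections end with `foo` set to `"bar"`, but only the field-level update keeps the nested `hdr` object.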
    24. Flip the Switch
        • Migrate production with zero downtime
        • We temporarily halted loading data
        • Added a switch to flip between MySQL/MongoDB
        • Instrument, monitor, flip it, analyze, flip back
        • Profiling your code is key
        • What is slow?
        • Build this in your app from day 1
    25. Flip the Switch
    26. Flip the Switch
        • Storage selected at runtime
          val h = shouldUseMongoDb match {
            case true => new MongoDbSentenceDAO
            case _ => new MySQLDbSentenceDAO
          }
          h.find(...)
        • Hot-swappable storage via configuration
        • It worked!
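The "instrument, flip it, analyze, flip back" loop can be sketched as a DAO lookup that reads the flag on every call and records a timing sample per request. Everything here is hypothetical: the two DAO classes return canned objects instead of talking to a database, and `config`, `timings`, and `findSentence` are names invented for the example.

```javascript
// Hypothetical DAOs; real ones would talk to MySQL and MongoDB.
class MySqlSentenceDao {
  find(id) { return { id, source: "mysql" }; }
}
class MongoDbSentenceDao {
  find(id) { return { id, source: "mongodb" }; }
}

// Mutable configuration; in production this might come from a config service.
const config = { shouldUseMongoDb: false };

// One sample per call, tagged by backend, for later comparison.
const timings = [];

const daos = {
  mysql: new MySqlSentenceDao(),
  mongodb: new MongoDbSentenceDao(),
};

// The flag is read on every call, so a flip takes effect immediately
// without restarting the process.
function findSentence(id) {
  const backend = config.shouldUseMongoDb ? "mongodb" : "mysql";
  const started = Date.now();
  const result = daos[backend].find(id);
  timings.push({ backend, millis: Date.now() - started });
  return result;
}

const before = findSentence(1);
config.shouldUseMongoDb = true;  // flip
const after = findSentence(1);
config.shouldUseMongoDb = false; // flip back
const reverted = findSentence(1);

console.log(before.source, after.source, reverted.source);
```

Because callers only ever see `findSentence`, the backend can be swapped and swapped back while the timing samples accumulate for both.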
    27. Then What?
        • Watch our deployment, many iterations to mapping layer
        • Settled on in-house, type-safe mapper
        • https://github.com/fehguy/mongodb-benchmark-tools
        • Some gotchas (of course)
        • Locking issues on long-running updates (more in a minute)
        • We want more of this!
        • Migrated shared files to Mongo GridFS
        • Easy-IT
    28. Performance + Optimization
        • Loading data is fast!
        • Fixed collection padding, similarly-sized records
        • Tail of collection is always in memory
        • Append faster than MySQL in every case tested
        • But... random access started getting slow
        • Indexes in RAM? Yes
        • Data in RAM? No, > 2TB per server
        • Limited by disk I/O / seek performance
        • EC2 + EBS for storage?
    29. Performance + Optimization
        • Moved to physical data center
        • DAS & 72GB RAM => great uncached performance
        • Good move? Depends on use case
        • If “access anything anytime”, not many options
        • You want to support this?
    30. Performance + Optimization
        • Inserts are fast, how about updates?
        • Well… update => find object, update it, save
        • Lock acquired at “find”, released after “save”
        • If hitting disk, lock time could be large
        • Easy answer, pre-fetch on update
        • Oh, and NEVER do “update all records” against a large collection
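The pre-fetch idea can be shown with a toy model. This is not how MongoDB implements locking. It only encodes the claim on the slide: the lock spans the lookup and the save, so a lookup that has to go to disk makes the lock expensive. `ToyCollection`, `inMemory`, and `diskHitsUnderLock` are invented for the sketch.

```javascript
// Toy model: `inMemory` tracks which ids are paged in. Touching an id that
// is not in memory counts as a slow disk hit.
class ToyCollection {
  constructor(docs) {
    this.docs = new Map(docs.map((d) => [d._id, d]));
    this.inMemory = new Set();
    this.diskHitsUnderLock = 0;
  }
  // Plain read: no write lock held; pages the document into memory.
  find(id) {
    this.inMemory.add(id);
    return this.docs.get(id);
  }
  // Update: in this model the write lock covers the lookup and the save.
  update(id, fields) {
    if (!this.inMemory.has(id)) {
      this.diskHitsUnderLock += 1; // slow path: disk seek while locked
      this.inMemory.add(id);
    }
    Object.assign(this.docs.get(id), fields);
  }
}

// Read first so the update finds the document already in memory.
function prefetchThenUpdate(collection, id, fields) {
  collection.find(id);
  collection.update(id, fields);
}

const cold = new ToyCollection([{ _id: 1, n: 0 }, { _id: 2, n: 0 }]);
cold.update(1, { n: 1 });
cold.update(2, { n: 1 });

const warm = new ToyCollection([{ _id: 1, n: 0 }, { _id: 2, n: 0 }]);
prefetchThenUpdate(warm, 1, { n: 1 });
prefetchThenUpdate(warm, 2, { n: 1 });

console.log(cold.diskHitsUnderLock, warm.diskHitsUnderLock);
```

The same disk reads still happen in the pre-fetch case. They are simply moved outside the locked section, so other writers are not blocked while the disk seeks.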
    31. Performance + Optimization
        • Indexes
        • Can't always keep index in RAM. MMF “does its thing”
        • Right-balanced b-tree keeps necessary index hot
        • Indexes hit disk => mute your pager
    32. More Mongo, Please!
        • We modeled our word graph in mongo
        • 50M Nodes
        • 80M Edges
        • 80mS edge fetch
    34. More Mongo, Please!
        • Analytics rolled-up from aggregation jobs
        • Send to Hadoop, load to mongo for fast access
    35. What’s next
        • Liberate our models
        • Stop worrying about how to store them (for the most part)
        • New features almost always NR
        • Some MySQL left
        • Less on each release
    36. Questions?
        • See more about Wordnik APIs: http://developer.wordnik.com
        • Migrating from MySQL to MongoDB: http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
        • Maintaining your MongoDB Installation: http://www.slideshare.net/fehguy/mongo-sv-tony-tam
        • Swagger API Framework: http://swagger.wordnik.com
        • Mapping Benchmark: https://github.com/fehguy/mongodb-benchmark-tools
        • Wordnik OSS Tools: https://github.com/wordnik/wordnik-oss