What Drove Wordnik Non-Relational?


Wordnik's technical co-founder Tony Tam explains why the company went NoSQL. The talk covers the selection criteria, testing and evaluation, and the successful zero-downtime migration to MongoDB. It also covers Wordnik's speed and stability, and how NoSQL technologies have changed the way Wordnik scales.

Published in: Technology


  1. NoSQL Now 2011: Why Wordnik Went Non-Relational
     Tony Tam, @fehguy
  2. What this Talk is About
     • 5 key reasons why Wordnik migrated into a non-relational database
     • Process for selection, migration
     • Optimizations and tips from living survivors of the battlefield
  3. Why Should You Care?
     • MongoDB user for almost 2 years
     • Lessons learned, analysis, benefits from process
     • We migrated from MySQL to MongoDB with no downtime
     • We have interesting/challenging data needs, likely relevant to you
  4. More on Wordnik
     • World's fastest updating English dictionary
       • Based on input of text up to 8k words/second
       • Word Graph as basis to our analysis
       • Synchronous & asynchronous processing
     • 10's of billions of documents in NR storage
     • 20M daily REST API calls, billions served
       • Powered by Swagger OSS API framework: swagger.wordnik.com
  5. Architectural History
     • 2008: Wordnik was born as a LAMP AWS EC2 stack
     • 2009: Introduced public REST API, powered wordnik.com, partner APIs
     • 2009: Drank NoSQL Kool-Aid
     • 2010: Scala
     • 2011: Micro SOA
  6. Non-relational by Necessity
     • Moved to NR because of "4S"
       • Speed
       • Stability
       • Scaling
       • Simplicity
     • But…
       • MySQL can go a LONG way
       • Takes right team, right reasons (+ patience)
       • NR offerings simply too compelling to focus on scaling MySQL
  7. Wordnik's 5 Whys for NoSQL
  8. Why #1: Speed bumps with MySQL
     • Inserting data fast (50k recs/second) caused MySQL mayhem
       • Maintaining indexes largely to blame
       • Operations for consistency unnecessary but "cannot be turned off"
     • Devised twisted schemes to avoid client blocking
       • Aka the "master/slave tango"
  9. Why #2: Retrieval Complexity
     • Objects typically mapped to tables
       • Object hierarchy always => inner + outer joins
     • Lots of static data, so why join?
       • "Noun" is not getting renamed in my code's lifetime!
       • Logic like this is probably in application logic
     • Since storage is cheap
       • I'll choose speed
  10. Why #2: Retrieval Complexity
     • One definition = 10+
     • 50 requests per second!
  11. Why #2: Retrieval Complexity
     • Embed objects in rows "sort of works"
       • Filtering gets really nasty
       • Native XML in MySQL?
       • If a full table-scan is OK…
     • OK, then cache it!
       • Layers of caching introduced layers of complexity
       • Stale data/corruption
       • Object versionitis
       • Cache stampedes
  12. Why #3: Object Modeling
     • Object models being compromised for sake of persistence
       • This is backwards!
       • Extra abstraction for the wrong reason
     • OK, then performance suffers
       • In-application joins across objects
       • "Who ran the fetch all query against production?!" (any sysadmin)
     • "My zillionth ORM layer that only I understand" (and can maintain)
  13. Why #4: Scaling
     • Needed "cloud friendly storage"
       • Easy up, easy down!
       • Startup: Sync your data, and announce to clients when ready for business
       • Shutdown: Announce your departure and leave
     • Adding MySQL instances was a dance
       • Snapshot + bin files
         mysql> CHANGE MASTER TO MASTER_HOST='db1',
                MASTER_USER='xxx', MASTER_PASSWORD='xxx',
                MASTER_LOG_FILE='master-relay.000431',
                MASTER_LOG_POS=1035435402;
  14. Why #4: Scaling
     • What about those VMs?
       • So convenient! But… they kind of suck
       • Can the database succeed on a VM?
     • VM performance:
       • Memory, CPU or I/O: pick only one
       • Can your database really reduce CPU or disk I/O with lots of RAM?
  15. Why #5: Big Picture
     • BI tools use relational constraints for discovery
       • Is this the right reason for them?
       • Can we work around this?
       • Let's have a BI tool revolution, too!
     • True service architecture makes relational constraints impractical/impossible
     • Distributed sharding makes relational constraints impractical/impossible
  16. Why #5: Big Picture
     • Is your app smarter than your database?
       • The logic line is probably blurry!
     • What does count(*) really mean when you add 5k records/sec?
       • Maybe eventual consistency is not so bad…
     • 2PC? Do some reading and decide!
       http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pd
  17. Ok, I'm in!
     • I thought deciding was easy!?
       • Many quickly maturing products
       • Divergent features tackle different needs
     • Wordnik spent 8 weeks researching and testing NoSQL solutions
       • This is a long time! (for a startup)
       • Wrote ODM classes and migrated our data
     • Surprise! There were surprises
       • Be prepared to compromise
  18. Choice Made, Now What?
     • We went with MongoDB ***
       • Fastest to implement
       • Most reliable
       • Best community
     • Why?
       • Why #1: Fast loading/retrieval
       • Why #2: Fast ODM (50 tps => 1000 tps!)
       • Why #3: Document models === Object models
       • Why #4: MMF => Kernel-managed memory + RS
       • Why #5: It's 2011, is there no progress?
  19. More on Why MongoDB
     • Testing, testing, testing
       • Used our migration tools to load test
       • Read from MySQL, write to MongoDB
       • We loaded 5+ billion documents, many times over
     • In the end, one server could…
       • Insert 100k records/sec sustained
       • Read 250k records/sec sustained
       • Support concurrent loading/reading
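To put the load-test figures in perspective, here is a rough back-of-the-envelope calculation in Scala. It assumes, purely for illustration, that a full load of 5 billion documents runs on a single server at the quoted sustained insert rate; the slides do not say how the test loads were actually distributed.

```scala
object LoadEstimate {
  // Figures quoted on the slide.
  val documents: Long = 5000000000L   // "5+ billion documents"
  val insertsPerSec: Long = 100000L   // "Insert 100k records/sec sustained"

  // Time for one full load at the sustained insert rate (assumed single server).
  val seconds: Long = documents / insertsPerSec
  val hours: Double = seconds / 3600.0

  def main(args: Array[String]): Unit =
    println(f"$seconds%d s, about $hours%.1f h per full load")
}
```

Under that assumption a single pass takes 50,000 seconds, a little under 14 hours, which suggests why loading the data "many times over" was still practical within the 8-week evaluation.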
  20. Migration & Testing
     • Iterated ODM mapping multiple times
     • Some issues
       • Type safety
         cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
       • Dates as strings
         obj.put("a_date", "2011-12-31") != obj.put("a_date", new Date("2011-12-31"))
       • Storage size
         obj.put("very_long_field_name", true) >> obj.put("vsfn", true)
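Two of the issues above can be sketched in a few lines of Scala. This is a minimal illustration, not Wordnik's mapper: `asLong` and `bytesSaved` are hypothetical helper names, and the document count is borrowed from the load-test slide. The first helper avoids the blind `asInstanceOf[Long]` cast by matching on the runtime type; the second estimates the space a shorter field name saves, given that field names are stored inside every document.

```scala
object MigrationPitfalls {
  // A field that was once an Int may come back as an Int in some
  // documents and a Long in others, so match on the runtime type
  // instead of casting with asInstanceOf[Long].
  def asLong(value: Any): Option[Long] = value match {
    case l: Long  => Some(l)
    case i: Int   => Some(i.toLong)
    case s: Short => Some(s.toLong)
    case _        => None
  }

  // Field names are repeated in every stored document, so the saving
  // is (difference in name length in bytes) * (number of documents).
  def bytesSaved(longName: String, shortName: String, docs: Long): Long =
    (longName.getBytes("UTF-8").length - shortName.getBytes("UTF-8").length).toLong * docs
}
```

For example, `MigrationPitfalls.bytesSaved("very_long_field_name", "vsfn", 5000000000L)` comes to 80,000,000,000 bytes, about 80 GB for a single boolean field across 5 billion documents.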
  21. Migration & Testing
     • Expect data model iterations
       • Wordnik migrated table to Mongo collection "as-is"
         • Easier to migrate, test
       • _id field used same MySQL PK
     • Auto increment?
       • Used MySQL to "check-out" sequences
         • One row per mongo collection
         • Run out of sequences => get more
         • Need exclusive locks here!
  22. Migration & Testing
     • Sequence generator in-process
       SequenceGenerator.checkout("doc_metadata,100")
     • Sequence generator as web service
       • Centralized UID management
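The sequence check-out scheme described on the last two slides could look roughly like the sketch below. It is an in-memory approximation under stated assumptions: `SequenceStore` is a hypothetical stand-in for the MySQL table (one counter per collection), `synchronized` stands in for the exclusive lock, and the two-argument `checkout` signature is a guess rather than Wordnik's actual API.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the central store (MySQL in the talk):
// one counter per collection, advanced a whole block at a time.
class SequenceStore {
  private val next = mutable.Map.empty[String, Long].withDefaultValue(1L)

  // Reserve `count` consecutive ids and return the first one.
  // synchronized plays the role of the exclusive lock.
  def checkout(collection: String, count: Int): Long = synchronized {
    val first = next(collection)
    next(collection) = first + count
    first
  }
}

// In-process generator: hands out ids from its current block and
// checks out a new block only when the current one is used up.
class SequenceGenerator(store: SequenceStore, collection: String, blockSize: Int) {
  private var current = 0L
  private var limit = 0L // exclusive upper bound of the current block

  def nextId(): Long = synchronized {
    if (current >= limit) {
      current = store.checkout(collection, blockSize)
      limit = current + blockSize
    }
    val id = current
    current += 1
    id
  }
}
```

Because each generator touches the central store only once per block, most id requests never leave the process, and two generators sharing a store can never hand out the same id.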
  23. Migration & Testing
     • Expect data access pattern iterations
       • So much more flexibility!
       • Reach into objects
         > db.dictionary_entry.find({"hdr.sr":"cmu"})
       • Access to a whole object tree at query time
       • Overwrite a whole object at once… when desired
         • Not always! This clobbers the whole record
           > db.foo.save({foo:"bar"})
         • Update a single field:
           > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
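The difference between overwriting a whole record and updating a single field can be modelled with plain Scala maps. This is only a conceptual sketch: `replaceDoc` and `setFields` are hypothetical names that mimic the effect of a whole-document save and a `$set` update, and no MongoDB driver is involved.

```scala
object UpdateSemantics {
  type Doc = Map[String, Any]

  // Whole-document write: the new document replaces the stored one,
  // so fields absent from `replacement` disappear.
  def replaceDoc(stored: Doc, replacement: Doc): Doc = replacement

  // $set-style write: only the named fields change, the rest survive.
  def setFields(stored: Doc, changes: Doc): Doc = stored ++ changes
}
```

Starting from a stored document with fields `_id`, `foo` and `defs`, `replaceDoc` with `Map("_id" -> 18727353, "foo" -> "bar")` drops `defs`, while `setFields` with `Map("foo" -> "bar")` keeps it.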
  24. Flip the Switch
     • Migrate production with zero downtime
       • We temporarily halted loading data
       • Added a switch to flip between MySQL/MongoDB
       • Instrument, monitor, flip it, analyze, flip back
     • Profiling your code is key
       • What is slow?
       • Build this in your app from day 1
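One simple way to build profiling in from day 1 is a timing wrapper around every storage call. The sketch below is an assumption about how such instrumentation might look, not Wordnik's code; `Profiler`, `timed` and the label names are all hypothetical.

```scala
import scala.collection.mutable

object Profiler {
  // Call count and total elapsed nanoseconds per label.
  private val stats = mutable.Map.empty[String, (Long, Long)]

  // Run `block`, record how long it took under `label`, return its result.
  def timed[A](label: String)(block: => A): A = {
    val start = System.nanoTime()
    try block
    finally {
      val elapsed = System.nanoTime() - start
      stats.synchronized {
        val (n, total) = stats.getOrElse(label, (0L, 0L))
        stats(label) = (n + 1, total + elapsed)
      }
    }
  }

  def calls(label: String): Long =
    stats.synchronized(stats.get(label).map(_._1).getOrElse(0L))

  def avgMillis(label: String): Double = stats.synchronized {
    stats.get(label).map { case (n, total) => total / 1e6 / n }.getOrElse(0.0)
  }
}
```

Wrapping both back ends, for instance `Profiler.timed("mysql.find") { ... }` and `Profiler.timed("mongo.find") { ... }`, gives comparable numbers before and after the flip.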
  25. Flip the Switch
  26. Flip the Switch
     • Storage selected at runtime
         val h = shouldUseMongoDb match {
           case true => new MongoDbSentenceDAO
           case _ => new MySQLDbSentenceDAO
         }
         h.find(...)
     • Hot-swappable storage via configuration
       • It worked!
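A self-contained version of the switch might look like the following. The DAO class names and `shouldUseMongoDb` come from the slide; the `SentenceDAO` trait, the `find` signature, the stub bodies and the `Storage` object are assumptions added so the sketch runs on its own.

```scala
trait SentenceDAO { def find(id: Long): Option[String] }

// Stand-ins for the two real back ends named on the slide.
class MySQLDbSentenceDAO extends SentenceDAO {
  def find(id: Long): Option[String] = Some(s"mysql:$id")
}
class MongoDbSentenceDAO extends SentenceDAO {
  def find(id: Long): Option[String] = Some(s"mongo:$id")
}

object Storage {
  // Hypothetical flag; a real deployment would read it from its config system.
  @volatile var shouldUseMongoDb: Boolean = false

  private val mysql = new MySQLDbSentenceDAO
  private val mongo = new MongoDbSentenceDAO

  // Evaluated on every call, so flipping the flag takes effect without a restart.
  def sentences: SentenceDAO = if (shouldUseMongoDb) mongo else mysql
}
```

Because the flag is read on each call rather than once at startup, flipping it back is as cheap as flipping it forward, which is what makes the "flip it, analyze, flip back" loop workable.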
  27. Then What?
     • Watch our deployment, many iterations to mapping layer
       • Settled on in-house, type-safe mapper
         https://github.com/fehguy/mongodb-benchmark-tools
     • Some gotchas (of course)
       • Locking issues on long-running updates (more in a minute)
     • We want more of this!
       • Migrated shared files to Mongo GridFS
       • Easy-IT
  28. Performance + Optimization
     • Loading data is fast!
       • Fixed collection padding, similarly-sized records
       • Tail of collection is always in memory
       • Append faster than MySQL in every case tested
     • But... random access started getting slow
       • Indexes in RAM? Yes
       • Data in RAM? No, > 2TB per server
       • Limited by disk I/O / seek performance
       • EC2 + EBS for storage?
  29. Performance + Optimization
     • Moved to physical data center
       • DAS & 72GB RAM => great uncached performance
     • Good move? Depends on use case
       • If "access anything anytime", not many options
       • You want to support this?
  30. Performance + Optimization
     • Inserts are fast, how about updates?
       • Well… update => find object, update it, save
       • Lock acquired at "find", released after "save"
       • If hitting disk, lock time could be large
     • Easy answer, pre-fetch on update
       • Oh, and NEVER do "update all records" against a large collection
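The "pre-fetch on update" idea can be sketched as a read followed by a write: the read pulls the document into memory first, so the write lock is held only briefly. The code below is a conceptual sketch against a hypothetical `DocCollection` interface with an in-memory fake; it is not a real driver API and does not model locking itself.

```scala
import scala.collection.mutable

// Hypothetical minimal collection interface; not a real driver API.
trait DocCollection {
  def findOne(id: Long): Option[Map[String, Any]]
  def setFields(id: Long, changes: Map[String, Any]): Unit
}

object Updates {
  // Read first so the document is paged into memory before the write
  // takes its lock, then apply the update. Returns false if the
  // document does not exist.
  def prefetchThenUpdate(coll: DocCollection, id: Long, changes: Map[String, Any]): Boolean =
    coll.findOne(id) match {
      case Some(_) =>
        coll.setFields(id, changes)
        true
      case None => false
    }
}

// In-memory fake used only to exercise the sketch; it logs call order.
class InMemoryCollection extends DocCollection {
  val docs = mutable.Map.empty[Long, Map[String, Any]]
  val log = mutable.ArrayBuffer.empty[String]

  def findOne(id: Long): Option[Map[String, Any]] = {
    log += s"find $id"
    docs.get(id)
  }

  def setFields(id: Long, changes: Map[String, Any]): Unit = {
    log += s"update $id"
    docs(id) = docs(id) ++ changes
  }
}
```

The fake's log makes the ordering visible: every update is preceded by a find for the same id, and a missing document never reaches the write path.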
  31. Performance + Optimization
     • Indexes
       • Can't always keep index in RAM. MMF "does its thing"
       • Right-balanced b-tree keeps necessary index hot
       • Indexes hit disk => mute your pager
  32. More Mongo, Please!
     • We modeled our word graph in mongo
       • 0M Nodes
       • 0M Edges
       • 0μS edge fetch
  33. More Mongo, Please!
     • Analytics rolled-up from aggregation jobs
       • Send to Hadoop, load to mongo for fast access
  34. What's next
     • Liberate our models
       • Stop worrying about how to store them (for the most part)
     • New features almost always NR
     • Some MySQL left
       • Less on each release
  35. Questions?
     • See more about Wordnik APIs
       http://developer.wordnik.com
     • Migrating from MySQL to MongoDB
       http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordn
     • Maintaining your MongoDB Installation
       http://www.slideshare.net/fehguy/mongo-sv-tony-tam
     • Swagger API Framework
       http://swagger.wordnik.com
     • Mapping Benchmark
       https://github.com/fehguy/mongodb-benchmark-tools
     • Wordnik OSS Tools
       https://github.com/wordnik/wordnik-oss