Why Wordnik went non-relational

A presentation on the selection criteria, testing + evaluation and successful, zero-downtime migration to MongoDB. Additionally, details on Wordnik's speed and stability are covered, as well as how NoSQL technologies have changed the way Wordnik scales.

Published in: Technology
  • Moving to a JSON-based mapper, 10k/second. Moving to direct mapping, 35k/second
  • Transcript

    • 1. NoSQL Now 2011: Why Wordnik went Non-Relational
      Tony Tam
      @fehguy
    • 2. What this Talk is About
      5 Key reasons why Wordnik migrated into a Non-Relational database
      Process for selection, migration
      Optimizations and tips from living survivors of the battlefield
    • 3. Why Should You Care?
      MongoDB user for almost 2 years
      Lessons learned, analysis, benefits from process
      We migrated from MySQL to MongoDB with no downtime
      We have interesting/challenging data needs, likely relevant to you
    • 4. More on Wordnik
      World’s fastest updating English dictionary
      Based on input of text up to 8k words/second
      Word Graph as basis to our analysis
      Synchronous & asynchronous processing
      Tens of billions of documents in NR storage
      20M daily REST API calls, billions served
      Powered by Swagger OSS API framework
      swagger.wordnik.com
      Powered API
    • 5. Architectural History
      2008: Wordnik was born as a LAMP AWS EC2 stack
      2009: Introduced public REST API, powered wordnik.com, partner APIs
      2009: Drank the NoSQL Kool-Aid
      2010: Scala
      2011: Micro SOA
    • 6. Non-relational by Necessity
      Moved to NR because of “4S”
      Speed
      Stability
      Scaling
      Simplicity
      But…
      MySQL can go a LONG way
      Takes right team, right reasons (+ patience)
      NR offerings simply too compelling to focus on scaling MySQL
    • 7. Wordnik’s 5 Whys for NoSQL
    • 8. Why #1: Speed bumps with MySQL
      Inserting data fast (50k recs/second) caused MySQL mayhem
      Maintaining indexes largely to blame
      Operations for consistency unnecessary but “cannot be turned off”
      Devised twisted schemes to avoid client blocking
      Aka the “master/slave tango”
    • 9. Why #2: Retrieval Complexity
      Objects typically mapped to tables
      Object Hierarchy always => inner + outer joins
      Lots of static data, so why join?
      “Noun” is not getting renamed in my code’s lifetime!
      Logic like this is probably in application logic
      Since storage is cheap
      I’ll choose speed
    • 10. Why #2: Retrieval Complexity
      One definition = 10+ joins
      50 requests per second!
    • 11. Why #2: Retrieval Complexity
      Embed objects in rows “sort of works”
      Filtering gets really nasty
      Native XML in MySQL?
      If a full table-scan is OK…
      OK, then cache it!
      Layers of caching introduced layers of complexity
      Stale data/corruption
      Object versionitis
      Cache stampedes
    • 12. Why #3: Object Modeling
      Object models being compromised for sake of persistence
      This is backwards!
      Extra abstraction for the wrong reason
      OK, then performance suffers
      In-application joins across objects
      “Who ran the fetch all query against production?!” – any sysadmin
      “My zillionth ORM layer that only I understand” (and can maintain)
    • 13. Why #4: Scaling
      Needed "cloud friendly storage"
      Easy up, easy down!
      Startup: Sync your data, and announce to clients when ready for business
      Shutdown: Announce your departure and leave
      Adding MySQL instances was a dance
      Snapshot + bin files
      mysql> change master to MASTER_HOST='db1', MASTER_USER='xxx', MASTER_PASSWORD='xxx', MASTER_LOG_FILE='master-relay.000431', MASTER_LOG_POS=1035435402;
    • 14. Why #4: Scaling
      What about those VMs?
      So convenient! But… they kind of suck
      Can the database succeed on a VM?
      VM Performance:
      Memory, CPU or I/O—Pick only one
      Can your database really reduce CPU or disk I/O with lots of RAM?
    • 15. Why #5: Big Picture
      BI tools use relational constraints for discovery
      Is this the right reason for them?
      Can we work around this?
      Let’s have a BI tool revolution, too!
      True service architecture makes relational constraints impractical/impossible
      Distributed sharding makes relational constraints impractical/impossible
    • 16. Why #5: Big Picture
      Is your app smarter than your database?
      The logic line is probably blurry!
      What does count(*) really mean when you add 5k records/sec?
      Maybe eventual consistency is not so bad…
      2PC? Do some reading and decide!
      http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
    • 17. Ok, I’m in!
      I thought deciding was easy!?
      Many quickly maturing products
      Divergent features tackle different needs
      Wordnik spent 8 weeks researching and testing NoSQL solutions
      This is a long time! (for a startup)
      Wrote ODM classes and migrated our data
      Surprise! There were surprises
      Be prepared to compromise
    • 18. Choice Made, Now What?
      We went with MongoDB ***
      Fastest to implement
      Most reliable
      Best community
      Why?
      Why #1: Fast loading/retrieval
      Why #2: Fast ODM (50 tps => 1000 tps!)
      Why #3: Document Models === Object models
      Why #4: MMF (memory-mapped files) => kernel-managed memory + RS (replica sets)
      Why #5: It’s 2011, is there no progress?
    • 19. More on Why MongoDB
      Testing, testing, testing
      Used our migration tools to load test
      Read from MySQL, write to MongoDB
      We loaded 5+ billion documents, many times over
      In the end, one server could…
      Insert 100k records/sec sustained
      Read 250k records/sec sustained
      Support concurrent loading/reading
    • 20. Migration & Testing
      Iterated ODM mapping multiple times
      Some issues
      Type Safety
      cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
      Dates as Strings
      obj.put("a_date", "2011-12-31") !=
      obj.put("a_date", new Date("2011-12-31"))
      Storage Size
      obj.put("very_long_field_name", true) >>
      obj.put("vsfn", true)
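The type-safety issue above comes up when a field was written as an Int in some records and as a Long in others: a direct `asInstanceOf[Long]` on a boxed Int throws a `ClassCastException`. A minimal sketch of a defensive read follows, assuming the value arrives as `Any`; the `SafeFields` helper is hypothetical and not part of any driver.

```scala
// Hypothetical helper, not part of any MongoDB driver.
object SafeFields {
  // Reads a numeric field that may have been stored as Int or Long.
  // Pattern matching widens an Int instead of failing on the cast.
  def asLong(value: Any): Option[Long] = value match {
    case i: Int  => Some(i.toLong)
    case l: Long => Some(l)
    case _       => None // not a recognised integer type
  }
}
```

With this, `SafeFields.asLong(doc("iWasAnIntOnce"))` returns an `Option[Long]` for either stored width, and the caller decides what to do with non-numeric values.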
    • 21. Migration & Testing
      Expect data model iterations
      Wordnik migrated table to Mongo collection “as-is”
      Easier to migrate, test
      _id field used same MySQL PK
      Auto Increment?
      Used MySQL to “check-out” sequences
      One row per mongo collection
      Run out of sequences => get more
      Need exclusive locks here!
    • 22. Migration & Testing
      Sequence generator in-process
      SequenceGenerator.checkout("doc_metadata,100")
      Sequence generator as web service
      Centralized UID management
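The sequence check-out described on the last two slides can be sketched roughly as below. This is an illustration under stated assumptions, not Wordnik's implementation: `InMemorySequenceStore` stands in for the MySQL table with one counter row per collection, `synchronized` stands in for the exclusive lock, and both class names are hypothetical.

```scala
import scala.collection.mutable

// Stand-in for the MySQL table holding one counter row per collection.
// The read-and-advance step must be exclusive; `synchronized` plays
// the role of the database lock in this sketch.
class InMemorySequenceStore {
  private val next = mutable.Map.empty[String, Long].withDefaultValue(1L)

  // Reserves `count` consecutive ids and returns the first one.
  def reserve(name: String, count: Int): Long = synchronized {
    val start = next(name)
    next(name) = start + count
    start
  }
}

// Hands out ids from a locally held block and checks out a new block
// from the store only when the current one is exhausted.
class BlockSequence(store: InMemorySequenceStore, name: String, blockSize: Int) {
  private var current = 0L
  private var limit = 0L // exclusive upper bound of the held block

  def nextId(): Long = synchronized {
    if (current >= limit) {
      current = store.reserve(name, blockSize)
      limit = current + blockSize
    }
    val id = current
    current += 1
    id
  }
}
```

Two generators sharing one store never hand out the same id, at the cost of gaps when a process exits with part of its block unused.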
    • 23. Migration & Testing
      Expect data access pattern iterations
      So much more flexibility!
      Reach into objects
      > db.dictionary_entry.find({"hdr.sr":"cmu"})
      Access to a whole object tree at query time
      Overwrite a whole object at once… when desired
      Not always! This clobbers the whole record
      > db.foo.save({_id:18727353,foo:"bar"})
      Update a single field:
      > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
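The difference between overwriting a whole object and updating one field can be modelled without a database. The sketch below uses plain immutable maps as stand-ins for a collection; `SaveVsSet` and its functions are hypothetical names that only mirror the semantics of the two shell commands above.

```scala
// Hypothetical in-memory model of the two shell commands.
object SaveVsSet {
  type Doc = Map[String, Any]

  // Models save(): the stored document is replaced wholesale, so any
  // field missing from `replacement` is lost.
  def save(stored: Map[Long, Doc], id: Long, replacement: Doc): Map[Long, Doc] =
    stored.updated(id, replacement)

  // Models update with $set: only the named fields change.
  def set(stored: Map[Long, Doc], id: Long, fields: Doc): Map[Long, Doc] =
    stored.get(id) match {
      case Some(doc) => stored.updated(id, doc ++ fields)
      case None      => stored // no upsert in this sketch
    }
}
```

Starting from a record with fields `foo` and `other`, `save` with only `foo` drops `other`, while `set` keeps it.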
    • 24. Flip the Switch
      Migrate production with zero downtime
      We temporarily halted loading data
      Added a switch to flip between MySQL/MongoDB
      Instrument, monitor, flip it, analyze, flip back
      Profiling your code is key
      What is slow?
      Build this in your app from day 1
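One way to build profiling into the app from day 1 is a small timing wrapper around each storage call. The sketch below is an assumption about what such instrumentation could look like; `Profiler` and its methods are hypothetical, and a production version would likely export the numbers to a monitoring system.

```scala
import scala.collection.mutable

// Minimal in-process profiler: accumulates call counts and elapsed
// nanoseconds per label. All names here are hypothetical.
object Profiler {
  // label -> (number of calls, total elapsed nanoseconds)
  private val stats = mutable.Map.empty[String, (Long, Long)]

  // Runs `block`, records its duration under `label`, returns its result.
  def time[A](label: String)(block: => A): A = {
    val start = System.nanoTime()
    try block
    finally {
      val elapsed = System.nanoTime() - start
      synchronized {
        val (n, total) = stats.getOrElse(label, (0L, 0L))
        stats(label) = (n + 1, total + elapsed)
      }
    }
  }

  def calls(label: String): Long =
    synchronized(stats.get(label).map(_._1).getOrElse(0L))

  def totalNanos(label: String): Long =
    synchronized(stats.get(label).map(_._2).getOrElse(0L))
}
```

Wrapping both the MySQL and MongoDB code paths, for example `Profiler.time("sentence.find") { dao.find(id) }`, gives comparable numbers before and after the flip.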
    • 25. Flip the Switch
    • 26. Flip the Switch
      Storage selected at runtime
      val h = shouldUseMongoDb match {
      case true => new MongoDbSentenceDAO
      case _ => new MySQLDbSentenceDAO
      }
      h.find(...)
      Hot-swappable storage via configuration
      It worked!
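A self-contained version of the runtime switch might look like the sketch below. The two DAO class names come from the slide; the `SentenceDAO` trait, the `find` signature, the `Storage` object and the stubbed return values are assumptions added so the example compiles on its own.

```scala
// Hypothetical DAO interface; the slide only names the implementations.
trait SentenceDAO {
  def find(id: Long): Option[String]
}

class MySQLDbSentenceDAO extends SentenceDAO {
  // A real implementation would query MySQL here.
  def find(id: Long): Option[String] = Some("mysql:" + id)
}

class MongoDbSentenceDAO extends SentenceDAO {
  // A real implementation would query MongoDB here.
  def find(id: Long): Option[String] = Some("mongo:" + id)
}

object Storage {
  // Read on every call, so a configuration reload can flip storage live.
  @volatile var shouldUseMongoDb: Boolean = false

  private val mongo = new MongoDbSentenceDAO
  private val mysql = new MySQLDbSentenceDAO

  def sentences: SentenceDAO = if (shouldUseMongoDb) mongo else mysql
}
```

Because callers only see `SentenceDAO`, flipping `Storage.shouldUseMongoDb` and flipping it back requires no redeploy.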
    • 27. Then What?
      Watch our deployment, many iterations to mapping layer
      Settled on in-house, type-safe mapper
      https://github.com/fehguy/mongodb-benchmark-tools
      Some gotchas (of course)
      Locking issues on long-running updates (more in a minute)
      We want more of this!
      Migrated shared files to Mongo GridFS
      Easy-IT
    • 28. Performance + Optimization
      Loading data is fast!
      Fixed collection padding, similarly-sized records
      Tail of collection is always in memory
      Append faster than MySQL in every case tested
      But... random access started getting slow
      Indexes in RAM? Yes
      Data in RAM? No, > 2TB per server
      Limited by disk I/O and seek performance
      EC2 + EBS for storage?
    • 29. Performance + Optimization
      Moved to physical data center
      DAS & 72GB RAM => great uncached performance
      Good move? Depends on use case
      If “access anything anytime”, not many options
      You want to support this?
    • 30. Performance + Optimization
      Inserts are fast, how about updates?
      Well… update => find object, update it, save
      Lock acquired at “find”, released after “save”
      If hitting disk, lock time could be large
      Easy answer, pre-fetch on update
      Oh, and NEVER do “update all records” against a large collection
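The "pre-fetch on update" advice amounts to reading the document before issuing the update, so that, per the slide, the update's lock is held while the data is already in memory rather than during a disk seek. Here is a rough sketch against a hypothetical `DocCollection` interface (not a real driver API), with an in-memory stand-in that records the order of operations.

```scala
import scala.collection.mutable

// Hypothetical minimal collection interface, not a real driver API.
trait DocCollection {
  def findOne(id: Long): Option[Map[String, Any]]
  def setField(id: Long, field: String, value: Any): Unit
}

object PrefetchUpdate {
  // Reads the document first so it is paged into memory, then updates.
  // Returns false when there is no such document.
  def update(coll: DocCollection, id: Long, field: String, value: Any): Boolean =
    coll.findOne(id) match {
      case Some(_) => coll.setField(id, field, value); true
      case None    => false // nothing to update
    }
}

// In-memory stand-in that logs each operation in call order.
class RecordingCollection(initial: Map[Long, Map[String, Any]]) extends DocCollection {
  private val docs = mutable.Map.empty[Long, Map[String, Any]] ++= initial
  val log = mutable.ArrayBuffer.empty[String]

  def findOne(id: Long): Option[Map[String, Any]] = { log += "find"; docs.get(id) }

  def setField(id: Long, field: String, value: Any): Unit = {
    log += "update"
    docs.get(id).foreach(d => docs(id) = d.updated(field, value))
  }
}
```

The extra read costs one round trip, which is usually cheaper than holding a write lock while the disk seeks.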
    • 31. Performance + Optimization
      Indexes
      Can't always keep index in RAM. MMF "does its thing"
      Right-balanced b-tree keeps necessary index hot
      Indexes hit disk => mute your pager
    • 32. More Mongo, Please!
      We modeled our word graph in mongo
    • More Mongo, Please!
      Analytics rolled-up from aggregation jobs
      Send to Hadoop, load to mongo for fast access
    • 35. What’s next
      Liberate our models
      stop worrying about how to store them (for the most part)
      New features almost always NR
      Some MySQL left
      Less on each release
    • 36. Questions?
      See more about Wordnik APIs
      http://developer.wordnik.com
      Migrating from MySQL to MongoDB
      http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
      Maintaining your MongoDB Installation
      http://www.slideshare.net/fehguy/mongo-sv-tony-tam
      Swagger API Framework
      http://swagger.wordnik.com
      Mapping Benchmark
      https://github.com/fehguy/mongodb-benchmark-tools
      Wordnik OSS Tools
      https://github.com/wordnik/wordnik-oss
