Migrating from MySQL to MongoDB at Wordnik

Slides from Tony Tam's presentation at MongoSF on 4/30/2010

Presentation Transcript

    • MongoSF 4/30/2010: From MySQL to MongoDB
      Migrating a Live Application
      Tony Tam
    • What is Wordnik?
      Project to track language
      like GPS for English
      The dictionary is a roadblock to the language
      Roughly 200 new words created daily
      Language is not static
      Capture information about all words
      Meaning is often undefined in the traditional sense
      Machines can determine meaning through analysis
      Needs LOTS of data
    • Why should you care?
      Every Developer can use a Robust Language API!
      Wordnik migrated to MongoDB
      > 5 Billion documents
      > 1.2 TB
      Zero application downtime
      Learn from our Experience
    • Wordnik
      Not just a website!
      But we have one
      Launched Wordnik entirely on MySQL
      Hit road bumps with insert speed at ~4B rows on MyISAM tables
      Tables locked for tens of seconds during inserts
      But we need more data!
      Created elaborate update schemes to work around it
      Lost lots of sleep babysitting servers while researching a long-term solution
    • Wordnik + MongoDB
      What are our storage needs?
      Database vs. Application Logic
      No PK/FK constraints
      No Stored Procedures
      Consistency?
      Lots of R&D
      Tried nearly all NoSQL solutions
    • Migrating Storage Engines
      Many parts to this effort
      Setup & Administration
      Software Design
      Optimization
      Many types of data at Wordnik
      Corpus
      Structured Hierarchical Data
      User Data
      Migrated #1 & #2
    • Server Infrastructure
      Wordnik traffic is heavily read-only
      Master / Slave deployment
      Looking at replica pairs
      MongoDB loves system resources
      Wordnik runs MongoDB on dedicated boxes so other apps aren't paged out to disk (and time out)
      Memory + Disk = Happy Mongo
      Uses many times the disk space of MySQL
      Easy pill to swallow until…
    • Server Infrastructure
      Physical Hardware
      2 x 4-core CPUs, 32 GB RAM, FC SAN
      Had bad luck on VMs
      (you might not)
      Disk speed => performance
    • Software Design
      Two distinct use cases for MongoDB
      Identical structure, different storage engine
      Same underlying objects, same storage fidelity (largely key/value)
      Hierarchical data structure
      Same underlying objects, document-oriented storage
    • Software Design
      Create BasicDBObjects from POJOs and use collection methods
      BasicDBObject dbo =
          new BasicDBObject("sentence", s.getSentence())
              .append("rating", s.getRating()).append(...);
      ID generation to manage unique _id values
      Analogous to MySQL auto-increment behavior
      Compatible with MySQL ids (more later)
      dbo.append("_id", getId());
      collection.save(dbo);
      Implemented all CRUD methods in DAO
      Swappable between MongoDB and MySQL at runtime (sketch below)
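
      A minimal sketch of the save path above, assuming the pre-3.x MongoDB
      Java driver; the Sentence class, the id seeding, and the collection
      wiring are hypothetical stand-ins, not Wordnik's actual code:

      import com.mongodb.BasicDBObject;
      import com.mongodb.DBCollection;
      import com.mongodb.Mongo;
      import java.util.concurrent.atomic.AtomicLong;

      public class MongoDBSentenceSaver {
          // Hypothetical POJO standing in for Wordnik's sentence object
          public static class Sentence {
              private final String sentence;
              private final int rating;
              public Sentence(String sentence, int rating) {
                  this.sentence = sentence;
                  this.rating = rating;
              }
              public String getSentence() { return sentence; }
              public int getRating() { return rating; }
          }

          // Seeded from the current MySQL max id so ids stay compatible
          private final AtomicLong idCounter;
          private final DBCollection collection;

          public MongoDBSentenceSaver(Mongo mongo, long lastMySQLId) {
              this.collection = mongo.getDB("wordnik").getCollection("sentences");
              this.idCounter = new AtomicLong(lastMySQLId);
          }

          public long save(Sentence s) {
              long id = idCounter.incrementAndGet(); // auto-increment behavior
              BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence())
                      .append("rating", s.getRating())
                      .append("_id", id); // explicit _id instead of an ObjectId
              collection.save(dbo);
              return id;
          }
      }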
    • Software Design
      Key-Value storage use case
      Easy as implementing new DAOs
      SentenceHandler h = new MongoDBSentenceHandler();
      Save methods construct BasicDBObject and call save() on collection
      Implement the same interface (sketch below)
      Same methods against DAO between MySQL and MongoDB versions
      Data Abstraction 101
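
      A sketch of the swappable-DAO idea; the SentenceHandler name and the
      two implementations come from the slides, but this exact method set is
      a hypothetical reconstruction:

      // Callers code against the interface only; which engine sits behind
      // it is decided once, at construction time.
      public interface SentenceHandler {
          long save(Sentence s);
          Sentence findById(long id);
      }

      // MongoDBSentenceHandler and MySQLSentenceHandler both implement
      // SentenceHandler, so swapping engines is a one-line change:
      SentenceHandler h = new MongoDBSentenceHandler();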
    • Software Design
      What about bulk inserts?
      Fire-and-forget (FAF) queued approach (sketch below)
      Add objects to queue, return to caller
      Every X seconds, process queue
      All objects from same collection are appended to a single List<DBObject>
      Call collection.insert(…) before the batch exceeds 2M characters
      Reduces network overhead
      Very fast inserts
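
      A minimal sketch of the fire-and-forget queued insert described above,
      assuming the pre-3.x Java driver; the flush interval is configurable
      and the 2M-character threshold is approximated by flushing whatever
      has accumulated each tick:

      import com.mongodb.DBCollection;
      import com.mongodb.DBObject;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.*;

      public class QueuedInserter {
          private final BlockingQueue<DBObject> queue =
                  new LinkedBlockingQueue<DBObject>();
          private final DBCollection collection;

          public QueuedInserter(DBCollection collection, long flushSeconds) {
              this.collection = collection;
              Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(
                      new Runnable() { public void run() { flush(); } },
                      flushSeconds, flushSeconds, TimeUnit.SECONDS);
          }

          // Fire-and-forget: enqueue and return to the caller immediately
          public void asyncWrite(DBObject dbo) {
              queue.add(dbo);
          }

          // Drain the queue and insert the whole batch in one call:
          // one network round trip instead of one per document
          private void flush() {
              List<DBObject> batch = new ArrayList<DBObject>();
              queue.drainTo(batch);
              if (!batch.isEmpty()) {
                  collection.insert(batch);
              }
          }
      }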
    • Software Design
      Hierarchical Data done more elegantly
      Wordnik Dictionary Model
      Java POJOs already had JAXB annotations
      Part of public REST api
      Used MySQL
      12+ tables
      13 DAOs
      2500 lines of code
      50 requests/second uncached
      Memcached needed to maintain reasonable speed
    • Software Design
      TMGO
    • Software Design
      MongoDB’s Document Storage let us…
      Turn the Objects into JSON via Jackson Mapper (fasterxml.com)
      Call save
      Support all fetch types, enhanced filters
      1000 requests / second
      No explicit caching
      No less scary code
    • Software Design
      Saving a complex object
      String rawJSON = getMapper().writeValueAsString(veryComplexObject);
      collection.save(new BasicDBObject(getId(), JSON.parse(rawJSON)));
      Fetching a complex object
      BasicDBObject dbo = cursor.next();
      ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class);
      No joins, 20x faster (fuller sketch below)
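
      A fuller sketch of that round trip, assuming the pre-3.x Java driver
      and a Jackson 1.x ObjectMapper; ComplexObject, the id scheme, and the
      wiring are hypothetical:

      import com.mongodb.BasicDBObject;
      import com.mongodb.DBCollection;
      import com.mongodb.DBObject;
      import com.mongodb.util.JSON;
      import org.codehaus.jackson.map.ObjectMapper;

      public class DocumentStore {
          private final ObjectMapper mapper = new ObjectMapper();
          private final DBCollection collection;

          public DocumentStore(DBCollection collection) {
              this.collection = collection;
          }

          // Serialize the whole object graph to JSON, parse it into a
          // DBObject, and save it as a single document -- no joins
          public void save(long id, Object complexObject) throws Exception {
              String rawJSON = mapper.writeValueAsString(complexObject);
              DBObject doc = (DBObject) JSON.parse(rawJSON);
              doc.put("_id", id);
              collection.save(doc);
          }

          // Fetch by _id and rebuild the object graph from the document
          public <T> T find(long id, Class<T> type) throws Exception {
              DBObject dbo = collection.findOne(new BasicDBObject("_id", id));
              return dbo == null ? null : mapper.readValue(dbo.toString(), type);
          }
      }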
    • Migrating Data
      Migrating => reuse existing data logic
      Use logic to select DAOs appropriately
      Read from old, write with new
      Great system test for MongoDB
      SentenceHandler mysqlSh = new MySQLSentenceHandler();
      SentenceHandler mongoSh = new MongoDbSentenceHandler();
      while(hasMoreData){
          mongoSh.asyncWrite(mysqlSh.next());
          ...
      }
    • Migrating Data
      Wordnik moved 5 billion rows from MySQL
      Sustained 100,000 inserts/second
      Migration tool was CPU bound
      ID generation logic, among other things
      Wordnik reads MongoDB fast
      Read + create Java objects @ 250k/second (!)
    • Going live to Production
      Choose your use case carefully if migrating incrementally
      Scary no matter what
      Test your perf monitoring system first!
      Use your DAOs from migration
      Turn on MongoDB on one server, monitor, tune (rollback, repeat)
      Full switch over when comfortable
    • Going live to Production
      Really?
      SentenceHandler h = null;
      if (useMongoDb) {
          h = new MongoDbSentenceHandler();
      } else {
          h = new MySQLSentenceHandler();
      }
      return h.find(...);
    • Optimizing Performance
      Home-grown connection pooling (sketch below)
      Master only
      ConnectionManager.getReadWriteConnection()
      Slave only
      ConnectionManager.getReadOnlyConnection()
      Round-robin all servers, bias on slaves
      ConnectionManager.getConnection()
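
      A sketch of that routing; ConnectionManager and the three method names
      come from the slides, the round-robin internals are a hypothetical
      reconstruction:

      import com.mongodb.Mongo;
      import java.util.List;
      import java.util.concurrent.atomic.AtomicInteger;

      public class ConnectionManager {
          private final Mongo master;
          private final List<Mongo> slaves;
          private final AtomicInteger counter = new AtomicInteger();

          public ConnectionManager(Mongo master, List<Mongo> slaves) {
              this.master = master;
              this.slaves = slaves;
          }

          // Writes (and read-your-write reads) always hit the master
          public Mongo getReadWriteConnection() {
              return master;
          }

          // Read-only traffic round-robins across the slaves
          public Mongo getReadOnlyConnection() {
              return slaves.get(next(slaves.size()));
          }

          // Round-robin all servers; the master gets 1 slot in N+1, so the
          // distribution is biased toward the slaves
          public Mongo getConnection() {
              int n = next(slaves.size() + 1);
              return n == 0 ? master : slaves.get(n - 1);
          }

          private int next(int bound) {
              // mask keeps the index non-negative after counter overflow
              return (counter.getAndIncrement() & Integer.MAX_VALUE) % bound;
          }
      }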
    • Optimizing Performance
      Caching
      Had complex logic to handle cache invalidation
      Out-of-process caches are not free
      MongoDB loves your RAM
      Let it do your LRU cache (it will anyway)
      Hardware
      Do not skimp on your disk or RAM
      Indexes (example below)
      Schema-less design
      Without an index, a query must read every document's own schema just to check whether the field exists at all
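
      A hedged example of the fix, assuming the pre-3.x Java driver; the
      collection and field names are illustrative:

      // Without this index, a query on "rating" scans every document,
      // reading each one's per-document schema just to check for the field
      collection.ensureIndex(new BasicDBObject("rating", 1));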
    • Optimizing Performance
      Disk space
      Schema-less => schema per document (row)
      Choose your mappings wisely
      ({veryLongAttributeName:true}) => more disk space than ({vlan:true})
    • Optimizing Performance
      A Typical Day at the Office for MongoDB
      [Monitoring chart: API call rate 47.7 calls/sec]
    • Other Tips
      Data Types
      Use caution when changing (safe-cast sketch below)
      DBObject obj = cur.next();
      long id = (Long) obj.get("IWasAnIntOnce");
      Attribute names
      Don’t change w/o migrating existing data!
      WTFDMDG????
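
      A defensive-read sketch for the pitfall above (names are illustrative):
      a field first written as an int and later as a long comes back as
      Integer or Long depending on the document, so a blind (Long) cast can
      throw ClassCastException; casting through Number handles both:

      DBObject obj = cur.next();
      long id = ((Number) obj.get("IWasAnIntOnce")).longValue();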
    • What’s next?
      GridFS
      Audio files currently stored on disk
      Requires a clustered file system for shared access
      Capped Collections (rolling out this week)
      User-generated content (UGC) from MySQL => MongoDB
      Beg/Bribe 10gen for some Features
    • Questions?