Migrating from MySQL to MongoDB at Wordnik


Slides from Tony Tam's presentation at MongoSF, 4/30/2010



  1. MongoSF 4/30/2010
     From MySQL to MongoDB: Migrating a Live Application
     Tony Tam
  2. What is Wordnik
     - A project to track language, like GPS for English
     - The dictionary is a roadblock to the language
       - Roughly 200 new words are created daily
       - Language is not static
     - Capture information about all words
       - Meaning is often undefined in the traditional sense
       - Machines can determine meaning through analysis
     - Needs LOTS of data
  3. Why Should You Care
     - Every developer can use a robust language API!
     - Wordnik migrated to MongoDB:
       - > 5 billion documents
       - > 1.2 TB
       - Zero application downtime
     - Learn from our experience
  4. Wordnik
     - Not just a website! (But we have one)
     - Launched Wordnik entirely on MySQL
     - Hit road bumps with insert speed at ~4 billion rows on MyISAM tables
       - Tables locked for tens of seconds during inserts
       - But we need more data!
     - Created elaborate update schemes to work around it
     - Lost lots of sleep babysitting servers while researching a long-term solution
  5. Wordnik + MongoDB
     - What are our storage needs?
       - Database vs. application logic
       - No PK/FK constraints
       - No stored procedures
       - Consistency?
     - Lots of R&D: tried nearly all the NoSQL solutions
  6. Migrating Storage Engines
     - Many parts to this effort:
       - Setup & administration
       - Software design
       - Optimization
     - Many types of data at Wordnik:
       1. Corpus
       2. Structured hierarchical data
       3. User data
     - Migrated #1 & #2
  7. Server Infrastructure
     - Wordnik is heavily read-only
       - Master/slave deployment
       - Looking at replica pairs
     - MongoDB loves system resources
       - Wordnik runs dedicated boxes so other apps can't push MongoDB's working set to disk (a.k.a. time-outs)
       - Memory + disk = happy Mongo
     - Uses many times the disk space of MySQL
       - An easy pill to swallow, until...
  8. Server Infrastructure
     - Physical hardware:
       - 2 x 4-core CPUs, 32 GB RAM, FC SAN
     - Had bad luck on VMs (you might not)
     - Disk speed => performance
  9. Software Design
     - Two distinct use cases for MongoDB:
       - Identical structure, different storage engine: same underlying objects, same storage fidelity (largely key/value)
       - Hierarchical data structure: same underlying objects, document-oriented storage
  10. Software Design
      - Create BasicDBObjects from POJOs and use collection methods:

            BasicDBObject dbo =
                new BasicDBObject("sentence", s.getSentence())
                    .append("rating", s.getRating()).append(...);

      - ID generation to manage unique _id values
        - Analogous to MySQL auto-increment behavior
        - Compatible with MySQL IDs (more later)

            dbo.append("_id", getId());
            collection.save(dbo);

      - Implemented all CRUD methods in DAOs, swappable between MongoDB and MySQL at runtime
  11. Software Design
      - The key/value storage use case was as easy as implementing new DAOs:

            SentenceHandler h = new MongoDBSentenceHandler();

      - Save methods construct a BasicDBObject and call save() on the collection
      - Both versions implement the same interface: the same methods against the DAO, whether backed by MySQL or MongoDB
      - Data abstraction 101
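The interface-driven DAO swap described on this slide can be sketched in plain Java. This is a minimal illustration, with in-memory maps standing in for the real MySQL- and MongoDB-backed implementations; the SentenceHandler name comes from the slides, and everything else (method signatures, the factory) is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// The shared DAO interface: callers never see the storage engine.
interface SentenceHandler {
    void save(long id, String sentence);
    String find(long id);
}

// Stand-in for the MySQL-backed DAO; a map replaces the real table.
class MySQLSentenceHandler implements SentenceHandler {
    private final Map<Long, String> rows = new HashMap<>();
    public void save(long id, String sentence) { rows.put(id, sentence); }
    public String find(long id) { return rows.get(id); }
}

// Stand-in for the MongoDB-backed DAO; same interface, different engine.
class MongoDBSentenceHandler implements SentenceHandler {
    private final Map<Long, String> docs = new HashMap<>();
    public void save(long id, String sentence) { docs.put(id, sentence); }
    public String find(long id) { return docs.get(id); }
}

public class DaoSwapDemo {
    // Because callers depend only on the interface, the engine can be
    // chosen at runtime with a flag, as the deck describes.
    static SentenceHandler handlerFor(boolean useMongoDb) {
        return useMongoDb ? new MongoDBSentenceHandler()
                          : new MySQLSentenceHandler();
    }

    public static void main(String[] args) {
        SentenceHandler h = handlerFor(true);
        h.save(1L, "the quick brown fox");
        System.out.println(h.find(1L));
    }
}
```

The payoff is that migration and rollback become configuration changes rather than code changes.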
  12. Software Design
      - What about bulk inserts? A fire-and-forget (FAF) queued approach:
        - Add objects to a queue and return to the caller
        - Every X seconds, process the queue
        - All objects from the same collection are appended to a single List<DBObject>
        - Call collection.insert(...) before reaching 2M characters
      - Reduces network overhead; very fast inserts
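The fire-and-forget batching steps above can be sketched as follows. The 2M-character cutoff comes from the slides; the class and method names are made up, and a plain list of string documents stands in for the driver's collection.insert(List<DBObject>) call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class BatchingWriter {
    // Flush a batch before it exceeds this many characters (from the deck).
    private static final int MAX_BATCH_CHARS = 2_000_000;

    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final List<List<String>> insertedBatches = new ArrayList<>();

    // Callers just enqueue and return immediately: fire-and-forget.
    public void asyncWrite(String doc) {
        queue.add(doc);
    }

    // Run periodically (e.g. from a scheduled thread every X seconds):
    // drain the queue into batches, flushing before the size cutoff.
    public void flush() {
        List<String> batch = new ArrayList<>();
        int chars = 0;
        String doc;
        while ((doc = queue.poll()) != null) {
            if (chars + doc.length() > MAX_BATCH_CHARS && !batch.isEmpty()) {
                insertedBatches.add(batch);  // would be collection.insert(batch)
                batch = new ArrayList<>();
                chars = 0;
            }
            batch.add(doc);
            chars += doc.length();
        }
        if (!batch.isEmpty()) {
            insertedBatches.add(batch);      // would be collection.insert(batch)
        }
    }

    public List<List<String>> batches() {
        return insertedBatches;
    }
}
```

The network win comes from the single insert call per batch instead of one round trip per document.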
  13. Software Design
      - Hierarchical data done more elegantly
      - The Wordnik dictionary model:
        - Java POJOs already had JAXB annotations (part of the public REST API)
      - Under MySQL:
        - 12+ tables, 13 DAOs, 2500 lines of code
        - 50 requests/second uncached; memcached was needed to maintain reasonable speed
  14. Software Design
      - TMGO
  15. Software Design
      - MongoDB's document storage let us:
        - Turn the objects into JSON via the Jackson mapper (fasterxml.com)
        - Call save
        - Support all fetch types and enhanced filters
      - 1000 requests/second with no explicit caching
      - No less scary code
  16. Software Design
      - Saving a complex object:

            String rawJSON = getMapper().writeValueAsString(veryComplexObject);
            collection.save(new BasicDBObject(getId(), JSON.parse(rawJSON)));

      - Fetching a complex object:

            BasicDBObject dbo = cursor.next();
            ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class);

      - No joins; 20x faster
  17. Migrating Data
      - Migrating => existing data logic
      - Use that logic to select DAOs appropriately: read with the old one, write with the new one
      - A great system test for MongoDB:

            SentenceHandler mysqlSh = new MySQLSentenceHandler();
            SentenceHandler mongoSh = new MongoDbSentenceHandler();
            while (hasMoreData) {
                mongoSh.asyncWrite(mysqlSh.next());
                ...
            }
  18. Migrating Data
      - Wordnik moved 5 billion rows from MySQL
      - Sustained 100,000 inserts/second
      - The migration tool was CPU-bound (ID generation logic, among other things)
      - Wordnik reads MongoDB fast: reading + creating Java objects at 250k/second (!)
  19. Going Live to Production
      - Choose your use case carefully if migrating incrementally
      - Scary no matter what: test your perf monitoring system first!
      - Use your DAOs from the migration
      - Turn on MongoDB on one server; monitor, tune (rollback, repeat)
      - Full switch-over when comfortable
  20. Going Live to Production
      - Really?

            SentenceHandler h = null;
            if (useMongoDb) {
                h = new MongoDbSentenceHandler();
            } else {
                h = new MySQLSentenceHandler();
            }
            return h.find(...);
  21. Optimizing Performance
      - Home-grown connection pooling:
        - Master only: ConnectionManager.getReadWriteConnection()
        - Slaves only: ConnectionManager.getReadOnlyConnection()
        - Round-robin across all servers, biased toward slaves: ConnectionManager.getConnection()
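A slave-biased round-robin picker like the ConnectionManager above might look like this. The slides only say "round-robin all servers, bias on slaves," so the weighting scheme, constructor, and the use of plain strings for connections are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionManager {
    private final String master;
    private final List<String> slaves;
    // Rotation list: master appears once, each slave appears slaveWeight
    // times, which biases getConnection() toward the slaves.
    private final List<String> rotation = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    public ConnectionManager(String master, List<String> slaves, int slaveWeight) {
        this.master = master;
        this.slaves = slaves;
        rotation.add(master);
        for (String s : slaves) {
            for (int i = 0; i < slaveWeight; i++) {
                rotation.add(s);
            }
        }
    }

    // Writes (and reads that must see them) always hit the master.
    public String getReadWriteConnection() {
        return master;
    }

    // Read-only traffic round-robins over the slaves.
    public String getReadOnlyConnection() {
        int i = next.getAndIncrement();
        return slaves.get(Math.floorMod(i, slaves.size()));
    }

    // General traffic round-robins over every server, slave-biased.
    public String getConnection() {
        int i = next.getAndIncrement();
        return rotation.get(Math.floorMod(i, rotation.size()));
    }
}
```

For a read-heavy site like Wordnik's, routing most traffic to slaves keeps the master free for writes.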
  22. Optimizing Performance
      - Caching:
        - Had complex logic to handle cache invalidation
        - Out-of-process caches are not free
        - MongoDB loves your RAM: let it do your LRU caching (it will anyway)
      - Hardware: do not skimp on your disks or RAM
      - Indexes:
        - With a schema-less design, even if no document contains a given field, MongoDB still has to read each document's schema to check
  23. Optimizing Performance
      - Disk space:
        - Schemaless => a schema per document (row)
        - Choose your mappings wisely: {veryLongAttributeName: true} takes more disk space than {vlan: true}
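Because every document carries its own field names, a long attribute name is paid for on every row. This toy calculation makes the point using JSON text length as a rough proxy for on-disk cost (it is not an exact BSON measurement, and the class name is invented):

```java
public class KeyLengthDemo {
    // Total bytes of JSON text for `documents` copies of {"<key>":true}.
    static long totalBytes(String key, long documents) {
        String doc = "{\"" + key + "\":true}";
        return doc.length() * documents;
    }

    public static void main(String[] args) {
        long docs = 5_000_000_000L; // the deck's ~5 billion document scale
        long longName = totalBytes("veryLongAttributeName", docs);
        long shortName = totalBytes("vlan", docs);
        // Each document saves 17 characters; at 5B documents the short
        // key saves tens of gigabytes of repeated field-name text.
        System.out.println("bytes saved: " + (longName - shortName));
    }
}
```

At billions of documents, even a 17-byte-per-document difference adds up to tens of gigabytes.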
  24. Optimizing Performance
      - A typical day at the office for MongoDB:
        - API call rate: 47.7 calls/sec
  25. Other Tips
      - Data types: use caution when changing them

            DBObject obj = cur.next();
            long id = (Long) obj.get("IWasAnIntOnce");  // ClassCastException if the stored value is still an Integer

      - Attribute names: don't change them without migrating the existing data!
      - WTFDMDG????
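The data-type pitfall above can be reproduced with a plain Map standing in for a fetched DBObject: a value written as an int comes back as an Integer, and a blind (Long) cast throws. The class and method names here are invented; the `Number` workaround is one common way to tolerate both types.

```java
import java.util.HashMap;
import java.util.Map;

public class TypeChangeDemo {
    // The pattern from the slide: throws ClassCastException if the
    // stored value is an Integer rather than a Long.
    public static Long unsafeRead(Map<String, Object> doc, String field) {
        return (Long) doc.get(field);
    }

    // Tolerates documents written before the field became a long.
    public static Long safeRead(Map<String, Object> doc, String field) {
        return ((Number) doc.get(field)).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("IWasAnIntOnce", 42);  // stored as int in older documents

        System.out.println(safeRead(doc, "IWasAnIntOnce"));
        try {
            unsafeRead(doc, "IWasAnIntOnce");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as the slide warns");
        }
    }
}
```

The same lesson applies to renamed attributes: old documents keep the old shape until you migrate them.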
  26. What's Next?
      - GridFS
        - Audio files are stored on disk today, which requires a clustered file system for shared access
      - Capped collections (rolling out this week)
      - UGC from MySQL => MongoDB
      - Beg/bribe 10gen for some features
  27. Questions?