From MySQL to MongoDB at Wordnik (Tony Tam)
Published in: Technology
Transcript

  • 1. MongoSF 4/30/2010
      • From MySQL to MongoDB
      • Migrating a Live Application
      • Tony Tam
  • 2. What is Wordnik
      • Project to track language, like GPS for English
      • Dictionary is a road block to the language
      • Roughly 200 new words created daily
      • Language is not static
      • Capture information about all words
      • Meaning is often undefined in traditional sense
      • Machines can determine meaning through analysis
      • Needs LOTS of data
  • 3. Why should you care
      • Every developer can use a robust language API!
      • Wordnik migrated to MongoDB
      • > 5 billion documents
      • > 1.2 TB
      • Zero application downtime
      • Learn from our experience
  • 4. Wordnik
      • Not just a website!
      • But we have one
      • Launched Wordnik entirely on MySQL
      • Hit road bumps with insert speed at ~4B rows on MyISAM tables
      • Tables locked for tens of seconds during inserts
      • But we need more data!
      • Created elaborate update schemes to work around it
      • Lost lots of sleep babysitting servers while researching a long-term solution
  • 5. Wordnik + MongoDB
      • What are our storage needs?
      • Database vs. application logic
      • No PK/FK constraints
      • No stored procedures
      • Consistency?
      • Lots of R&D
      • Tried almost all NoSQL solutions
  • 6. Migrating Storage Engines
      • Many parts to this effort
      • Setup & Administration
      • Software Design
      • Optimization
      • Many types of data at Wordnik
          1. Corpus
          2. Structured Hierarchical Data
          3. User Data
      • Migrated #1 & #2
  • 7. Server Infrastructure
      • Wordnik is heavily read-only
      • Master / slave deployment
      • Looking at replica pairs
      • MongoDB loves system resources
      • Wordnik runs dedicated boxes to avoid other apps being sent to disk (aka time-out)
      • Memory + Disk = Happy Mongo
      • Many times the disk space of MySQL
      • Easy pill to swallow until…
  • 8. Server Infrastructure
      • Physical hardware
      • 2 x 4-core CPU, 32 GB RAM, FC SAN
      • Had bad luck on VMs (you might not)
      • Disk speed => performance
  • 9. Software Design
      • Two distinct use cases for MongoDB
      • Identical structure, different storage engine
      • Same underlying objects, same storage fidelity (largely key/value)
      • Hierarchical data structure
      • Same underlying objects, document-oriented storage
  • 10. Software Design
      • Created BasicDBObjects from POJOs and used collection methods
          BasicDBObject dbo =
              new BasicDBObject("sentence", s.getSentence())
                  .append("rating", s.getRating()).append(...);
      • ID generation to manage unique _id values
      • Analogous to MySQL AUTO_INCREMENT behavior
      • Compatible with MySQL IDs (more later)
          dbo.append("_id", getId());
          collection.save(dbo);
      • Implemented all CRUD methods in DAO
      • Swappable between MongoDB and MySQL at runtime
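The slide does not show how getId() works. As a minimal sketch only, assuming a single-process counter seeded with the highest id already used in MySQL, the AUTO_INCREMENT-like behavior could look like the following. The class name SequentialIdGenerator and the seed value are hypothetical and are not taken from the talk.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the getId() helper mentioned on the slide.
class SequentialIdGenerator {
    private final AtomicLong last;

    // Seed with the largest id already in use (for example, the highest MySQL id)
    // so that new ids never collide with migrated rows.
    SequentialIdGenerator(long highestExistingId) {
        this.last = new AtomicLong(highestExistingId);
    }

    // Returns the next unused id; safe to call from several threads.
    long getId() {
        return last.incrementAndGet();
    }
}

class IdGeneratorDemo {
    public static void main(String[] args) {
        SequentialIdGenerator ids = new SequentialIdGenerator(1000L);
        System.out.println(ids.getId());
        System.out.println(ids.getId());
    }
}
```

A counter like this only guarantees uniqueness inside one JVM. Several writers on different machines would need a shared sequence or a partitioned id range, which the slides do not describe.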
  • 11. Software Design
      • Key-value storage use case
      • Easy as implementing new DAOs
          SentenceHandler h = new MongoDBSentenceHandler();
      • Save methods construct BasicDBObject and call save() on collection
      • Implement same interface
      • Same methods against DAO between MySQL and MongoDB versions
      • Data Abstraction 101
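The "same interface, different storage engine" idea can be sketched as below. The slides name SentenceHandler but do not show its methods, so save() and find() here are assumed signatures, and MapBackedSentenceHandler is a hypothetical in-memory stand-in for both the MySQL and the MongoDB implementations so that the example runs without a database.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed DAO contract; the real interface is not shown in the slides.
interface SentenceHandler {
    void save(long id, String sentence);
    String find(long id);
}

// In-memory stand-in for a storage-specific DAO (MySQL or MongoDB).
class MapBackedSentenceHandler implements SentenceHandler {
    private final Map<Long, String> rows = new HashMap<>();

    @Override
    public void save(long id, String sentence) {
        rows.put(id, sentence);
    }

    @Override
    public String find(long id) {
        return rows.get(id);
    }
}

class SentenceHandlers {
    // Callers depend only on the interface, so the backing store
    // can be switched with a runtime flag.
    static SentenceHandler choose(boolean useMongoDb,
                                  SentenceHandler mongo,
                                  SentenceHandler mysql) {
        return useMongoDb ? mongo : mysql;
    }
}

class DaoSwapDemo {
    public static void main(String[] args) {
        SentenceHandler mongo = new MapBackedSentenceHandler();
        SentenceHandler mysql = new MapBackedSentenceHandler();
        SentenceHandler h = SentenceHandlers.choose(true, mongo, mysql);
        h.save(1L, "Language is not static.");
        System.out.println(h.find(1L));
    }
}
```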
  • 12. Software Design
      • What about bulk inserts?
      • FAF (fire-and-forget) queued approach
      • Add objects to queue, return to caller
      • Every X seconds, process queue
      • All objects from same collection are appended to a single List<DBObject>
      • Call collection.insert(…) before 2M characters
      • Reduces network overhead
      • Very fast inserts
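The queued bulk-insert steps above can be sketched roughly as follows. This is an illustration under assumptions, not Wordnik's code: documents are represented as JSON strings, BatchSink is a hypothetical callback standing in for collection.insert(…), and the periodic "every X seconds" trigger is left to the caller (for example a scheduled task that calls flush()).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sink; a real one would call insert(...) on the named collection.
interface BatchSink {
    void insert(String collection, List<String> documents);
}

class BulkInsertQueue {
    private static final class Pending {
        final String collection;
        final String json;

        Pending(String collection, String json) {
            this.collection = collection;
            this.json = json;
        }
    }

    private final ConcurrentLinkedQueue<Pending> queue = new ConcurrentLinkedQueue<>();
    private final BatchSink sink;
    private final int maxCharsPerBatch;

    BulkInsertQueue(BatchSink sink, int maxCharsPerBatch) {
        this.sink = sink;
        this.maxCharsPerBatch = maxCharsPerBatch;
    }

    // Fire and forget: enqueue and return to the caller immediately.
    void add(String collection, String json) {
        queue.add(new Pending(collection, json));
    }

    // Drain the queue, group documents by collection, and send each group
    // in batches that stay at or under the character cap.
    void flush() {
        Map<String, List<String>> byCollection = new LinkedHashMap<>();
        Pending p;
        while ((p = queue.poll()) != null) {
            byCollection.computeIfAbsent(p.collection, k -> new ArrayList<>()).add(p.json);
        }
        for (Map.Entry<String, List<String>> e : byCollection.entrySet()) {
            List<String> batch = new ArrayList<>();
            int chars = 0;
            for (String json : e.getValue()) {
                // Send the current batch before adding would exceed the cap.
                if (!batch.isEmpty() && chars + json.length() > maxCharsPerBatch) {
                    sink.insert(e.getKey(), batch);
                    batch = new ArrayList<>();
                    chars = 0;
                }
                batch.add(json);
                chars += json.length();
            }
            if (!batch.isEmpty()) {
                sink.insert(e.getKey(), batch);
            }
        }
    }
}
```

With the slide's numbers, maxCharsPerBatch would be set to about 2,000,000. A single document larger than the cap is still sent, alone in its own batch.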
  • 13. Software Design
      • Hierarchical data done more elegantly
      • Wordnik Dictionary Model
      • Java POJOs already had JAXB annotations
      • Part of public REST API
      • Used MySQL
      • 12+ tables
      • 13 DAOs
      • 2500 lines of code
      • 50 requests/second uncached
      • Memcache needed to maintain reasonable speed
  • 14. Software Design
      • TMGO
  • 15. Software Design
      • MongoDB’s document storage let us…
      • Turn the objects into JSON via Jackson Mapper (fasterxml.com)
      • Call save
      • Support all fetch types, enhanced filters
      • 1000 requests / second
      • No explicit caching
      • No less scary code
  • 16. Software Design
      • Saving a complex object
          String rawJSON = getMapper().writeValueAsString(veryComplexObject);
          collection.save(new BasicDBObject(getId(), JSON.parse(rawJSON)));
      • Fetching complex object
          BasicDBObject dbo = (BasicDBObject) cursor.next();
          ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class);
      • No joins, 20x faster
  • 17. Migrating Data
      • Migrating => existing data logic
      • Use logic to select DAOs appropriately
      • Read from old, write with new
      • Great system test for MongoDB
          SentenceHandler mysqlSh = new MySQLSentenceHandler();
          SentenceHandler mongoSh = new MongoDbSentenceHandler();
          while (hasMoreData) {
              mongoSh.asyncWrite(mysqlSh.next());
              ...
          }
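The "read from old, write with new" loop can be written generically. The sketch below is an assumption-laden illustration: it uses a plain Iterator as the old store and a Consumer as the new store's writer (both hypothetical stand-ins for the two DAOs), and it writes synchronously, whereas the slide calls an asyncWrite method whose behavior is not shown.

```java
import java.util.Iterator;
import java.util.function.Consumer;

class Migrator {
    // Reads every item from the old store and hands it to the new store's writer.
    // Returns how many items were copied so the caller can compare counts.
    static <T> long copyAll(Iterator<T> oldStore, Consumer<T> newStoreWriter) {
        long copied = 0;
        while (oldStore.hasNext()) {
            newStoreWriter.accept(oldStore.next());
            copied++;
        }
        return copied;
    }
}
```

Returning the copied count gives a cheap sanity check against the source row count after the run.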
  • 18. Migrating Data
      • Wordnik moved 5 billion rows from MySQL
      • Sustained 100,000 inserts/second
      • Migration tool was CPU bound
      • ID generation logic, among other things
      • Wordnik reads MongoDB fast
      • Read + create Java objects @ 250k/second (!)
  • 19. Going live to Production
      • Choose your use case carefully if migrating incrementally
      • Scary no matter what
      • Test your perf monitoring system first!
      • Use your DAOs from migration
      • Turn on MongoDB on one server, monitor, tune (rollback, repeat)
      • Full switch over when comfortable
  • 20. Going live to Production
      • Really?
          SentenceHandler h = null;
          if (useMongoDb) {
              h = new MongoDbSentenceHandler();
          }
          else {
              h = new MySQLDbSentenceHandler();
          }
          return h.find(...);
  • 21. Optimizing Performance
      • Home-grown connection pooling
      • Master only
          ConnectionManager.getReadWriteConnection()
      • Slave only
          ConnectionManager.getReadOnlyConnection()
      • Round-robin all servers, bias on slaves
          ConnectionManager.getConnection()
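The slides do not show how ConnectionManager.getConnection() biases toward slaves. One simple way to get that effect, sketched here purely as an assumption, is a round-robin rotation in which each slave appears more often than the master. RoundRobinHosts, the host names, and the slaveWeight parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinHosts {
    private final List<String> rotation = new ArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger();

    // The master appears once in the rotation; each slave appears slaveWeight times.
    RoundRobinHosts(String master, List<String> slaves, int slaveWeight) {
        rotation.add(master);
        for (int i = 0; i < slaveWeight; i++) {
            rotation.addAll(slaves);
        }
    }

    // Returns the next host in the rotation; floorMod keeps the index valid
    // even after the counter overflows.
    String next() {
        int i = Math.floorMod(cursor.getAndIncrement(), rotation.size());
        return rotation.get(i);
    }
}
```

With one master, two slaves, and a weight of 2, the master serves one request in five. Writes would still go through a master-only path such as getReadWriteConnection().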
  • 22. Optimizing Performance
      • Caching
      • Had complex logic to handle cache invalidation
      • Out-of-process caches are not free
      • MongoDB loves your RAM
      • Let it do your LRU cache (it will anyway)
      • Hardware
      • Do not skimp on your disk or RAM
      • Indexes
      • Schema-less design
      • Even if no values in any document, needs to read document schema to check
  • 23. Optimizing Performance
      • Disk space
      • Schemaless => schema per document (row)
      • Choose your mappings wisely
      • ({veryLongAttributeName:true}) => more disk space than ({vlan:true})
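To put a rough number on the attribute-name point: BSON stores each field name as a null-terminated UTF-8 string inside every document, so a longer name costs its extra bytes once per document. The sketch below computes that difference. Applying it to 5 billion documents is an illustrative assumption (it supposes the field appears in every document), and the result ignores padding, indexes, and other storage overhead.

```java
import java.nio.charset.StandardCharsets;

class KeyOverhead {
    // Bytes a field name occupies in one BSON document: UTF-8 bytes plus the null terminator.
    static long bytesPerDocument(String fieldName) {
        return fieldName.getBytes(StandardCharsets.UTF_8).length + 1;
    }

    // Bytes saved across a collection by renaming longName to shortName.
    static long bytesSaved(String longName, String shortName, long documents) {
        return (bytesPerDocument(longName) - bytesPerDocument(shortName)) * documents;
    }

    public static void main(String[] args) {
        long saved = bytesSaved("veryLongAttributeName", "vlan", 5_000_000_000L);
        System.out.println(saved + " bytes"); // 17 bytes per document * 5 billion documents
    }
}
```

Under these assumptions the rename saves 17 bytes per document, or about 85 GB for that single field.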
  • 24. Optimizing Performance
      • A Typical Day at the Office for MongoDB
      • API call rate: 47.7 calls/sec
  • 25. Other Tips
      • Data types
      • Use caution when changing
          DBObject obj = cur.next();
          long id = (Long) obj.get("IWasAnIntOnce");
      • Attribute names
      • Don’t change w/o migrating existing data!
      • WTFDMDG????
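The cast on the slide throws a ClassCastException when the stored value is still an Integer, because an Integer cannot be cast to Long. A defensive read, sketched here with a plain Map standing in for the driver's DBObject (an assumption made so the example runs without the driver), goes through Number instead. NumericFields and readLong are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class NumericFields {
    // Reads a numeric field as a long whether it was stored as an Integer or a Long.
    static long readLong(Map<String, Object> document, String field) {
        Object value = document.get(field);
        if (!(value instanceof Number)) {
            throw new IllegalStateException("Field " + field + " is missing or not numeric");
        }
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = new HashMap<>();
        oldDoc.put("IWasAnIntOnce", 7);   // written before the type change
        Map<String, Object> newDoc = new HashMap<>();
        newDoc.put("IWasAnIntOnce", 8L);  // written after the type change
        System.out.println(readLong(oldDoc, "IWasAnIntOnce"));
        System.out.println(readLong(newDoc, "IWasAnIntOnce"));
    }
}
```

This only papers over mixed numeric types while reading; migrating the existing documents to the new type remains the cleaner fix.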
  • 26. What’s next?
      • GridFS
      • Store audio files on disk
      • Requires clustered file system for shared access
      • Capped Collections (rolling out this week)
      • UGC from MySQL => MongoDB
      • Beg/bribe 10gen for some features
  • 27. Questions?
