Migrating from MySQL to MongoDB at Wordnik
Slides from Tony Tam's presentation at MongoSF on 4/30/2010



  • 1. MongoSF 4/30/2010: From MySQL to MongoDB
    Migrating a Live Application
    Tony Tam
  • 2. What is Wordnik?
    Project to track language
    like GPS for English
    Dictionary is a roadblock to the language
    Roughly 200 new words created daily
    Language is not static
    Capture information about all words
    Meaning is often undefined in traditional sense
    Machines can determine meaning through analysis
    Needs LOTS of data
  • 3. Why should you care?
    Every Developer can use a Robust Language API!
    Wordnik migrated to MongoDB
    > 5 Billion documents
    > 1.2 TB
    Zero application downtime
    Learn from our Experience
  • 4. Wordnik
    Not just a website!
    But we have one
    Launched Wordnik entirely on MySQL
    Hit road bumps with insert speed at ~4B rows on MyISAM tables
    Tables locked for tens of seconds during inserts
    But we need more data!
    Created elaborate update schemes to work around it
    Lost lots of sleep babysitting servers while researching a long-term solution
  • 5. Wordnik + MongoDB
    What are our storage needs?
    Database vs. Application Logic
    No PK/FK constraints
    No Stored Procedures
    Lots of R&D
    Tried almost all NoSQL solutions
  • 6. Migrating Storage Engines
    Many parts to this effort
    Setup & Administration
    Software Design
    Many types of data at Wordnik
    Structured Hierarchical Data
    User Data
    Migrated #1 & #2
  • 7. Server Infrastructure
    Wordnik is Heavily Read-only
    Master / Slave deployment
    Looking at replica pairs
    MongoDB loves system resources
    Wordnik runs dedicated boxes to avoid other apps being sent to disk (aka time-out)
    Memory + Disk = Happy Mongo
    Many X the disk space of MySQL
    Easy pill to swallow until…
  • 8. Server Infrastructure
    Physical Hardware
    2 x 4 core CPU, 32gb RAM, FC SAN
    Had bad luck on VMs
    (you might not)
    Disk speed => performance
  • 9. Software Design
    Two distinct use cases for MongoDB
    Identical structure, different storage engine
    Same underlying objects, same storage fidelity (largely key/value)
    Hierarchical data structure
    Same underlying objects, document-oriented storage
  • 10. Software Design
    Created BasicDBObjects from POJOs and used collection methods
    BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence());
    ID Generation to manage unique _ID values
    Analogous to MySQL AutoIncrement behavior
    Compatible with MySQL Ids (more later)
    dbo.append("_ID", getId());
    Implemented all CRUD methods in DAO
    Swappable between MongoDB and MySQL at runtime
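The ID generation described on slide 10 could look roughly like the sketch below. It is not Wordnik's code: the class name `SequentialIdGenerator` and the idea of seeding it with the highest ID already issued by MySQL are assumptions made for illustration, and it only covers a single JVM (a multi-process deployment would need a shared counter).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical auto-increment style ID source; not taken from the slides. */
class SequentialIdGenerator {
    private final AtomicLong last;

    /** Seed with the largest ID already in use, e.g. the current MySQL maximum. */
    SequentialIdGenerator(long highestExistingId) {
        this.last = new AtomicLong(highestExistingId);
    }

    /** Returns the next unused ID; safe to call from several threads. */
    long nextId() {
        return last.incrementAndGet();
    }

    public static void main(String[] args) {
        SequentialIdGenerator ids = new SequentialIdGenerator(4_000_000_000L);
        System.out.println(ids.nextId()); // prints 4000000001
    }
}
```

Starting the counter above the existing maximum keeps new IDs from colliding with rows that were created in MySQL, which is one way to read "Compatible with MySQL Ids".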
  • 11. Software Design
    Key-Value storage use case
    Easy as implementing new DAOs
    SentenceHandler h = new MongoDBSentenceHandler();
    Save methods construct BasicDBObject and call save() on collection
    Implement same interface
    Same methods against DAO between MySQL and MongoDB versions
    Data Abstraction 101
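A minimal sketch of the "same interface, different storage engine" idea from slide 11 follows. Only the name `SentenceHandler` comes from the slides; its methods, the `Fake...` classes, and the `create` factory are hypothetical, and both implementations are in-memory stand-ins so the example runs without a database.

```java
import java.util.HashMap;
import java.util.Map;

class SwappableDaoSketch {
    /** Storage-agnostic contract; the method names are illustrative assumptions. */
    interface SentenceHandler {
        void save(long id, String sentence);
        String find(long id);
        String backend();
    }

    /** Shared in-memory storage standing in for a real database connection. */
    static abstract class InMemoryHandler implements SentenceHandler {
        private final Map<Long, String> rows = new HashMap<>();
        public void save(long id, String sentence) { rows.put(id, sentence); }
        public String find(long id) { return rows.get(id); }
    }

    static class FakeMySqlSentenceHandler extends InMemoryHandler {
        public String backend() { return "mysql"; }
    }

    static class FakeMongoSentenceHandler extends InMemoryHandler {
        public String backend() { return "mongodb"; }
    }

    /** Callers see only the interface, so the backend can be chosen at runtime. */
    static SentenceHandler create(boolean useMongo) {
        return useMongo ? new FakeMongoSentenceHandler() : new FakeMySqlSentenceHandler();
    }

    public static void main(String[] args) {
        SentenceHandler h = create(true);
        h.save(1L, "Language is not static.");
        System.out.println(h.backend() + ": " + h.find(1L));
    }
}
```

Because calling code never names a concrete class, flipping the boolean (or reading it from configuration) is enough to move traffic between the two stores.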
  • 12. Software Design
    What about bulk inserts?
    Fire-and-forget (FAF) queued approach
    Add objects to queue, return to caller
    Every X seconds, process queue
    All objects from same collection are appended to a single List<DBObject>
    Call collection.insert(…) before 2M characters
    Reduces network overhead
    Very fast inserts
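The queued bulk-insert approach on slide 12 can be sketched as below, under several assumptions: documents are represented as JSON strings, there is a single collection, and the actual bulk call is passed in as a `Consumer` (in real code that is where `collection.insert(...)` would go). The class name `BatchingInsertQueue` and the `maxBatchChars` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

class BatchingInsertQueue {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    private final int maxBatchChars;
    private final Consumer<List<String>> bulkInsert;

    BatchingInsertQueue(int maxBatchChars, Consumer<List<String>> bulkInsert) {
        this.maxBatchChars = maxBatchChars;
        this.bulkInsert = bulkInsert;
    }

    /** Fire and forget: enqueue the document and return to the caller at once. */
    void add(String json) {
        queue.add(json);
    }

    /** Drains the queue, sending a batch before it would exceed the size limit. */
    void flush() {
        List<String> batch = new ArrayList<>();
        int chars = 0;
        String doc;
        while ((doc = queue.poll()) != null) {
            if (!batch.isEmpty() && chars + doc.length() > maxBatchChars) {
                bulkInsert.accept(batch);
                batch = new ArrayList<>();
                chars = 0;
            }
            batch.add(doc);
            chars += doc.length();
        }
        if (!batch.isEmpty()) {
            bulkInsert.accept(batch);
        }
    }

    public static void main(String[] args) {
        BatchingInsertQueue q = new BatchingInsertQueue(10,
                batch -> System.out.println("insert " + batch));
        q.add("aaaa");
        q.add("bbbb");
        q.add("cccc");
        q.flush();
    }
}
```

To get the "every X seconds" behaviour, `flush()` could be scheduled with `ScheduledExecutorService.scheduleWithFixedDelay`. One list per flush means one network round trip per batch rather than one per document.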
  • 13. Software Design
    Hierarchical Data done more elegantly
    Wordnik Dictionary Model
    Java POJOs already had JAXB annotations
    Part of public REST API
    Used MySQL
    12+ tables
    13 DAOs
    2500 lines of code
    50 requests/second uncached
    Memcache needed to maintain reasonable speed
  • 14. Software Design
  • 15. Software Design
    MongoDB’s Document Storage let us…
    Turn the Objects into JSON via Jackson Mapper
    Call save
    Support all fetch types, enhanced filters
    1000 requests / second
    No explicit caching
    No less scary code
  • 16. Software Design
    Saving a complex object
    String rawJSON = getMapper().writeValueAsString(veryComplexObject);
    …, JSON.parse(rawJSON));
    Fetching complex object
    BasicDBObject dbo = …;
    ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class);
    No joins, 20x faster
  • 17. Migrating Data
    Migrating => existing data logic
    Use logic to select DAOs appropriately
    Read from old, write with new
    Great system test for MongoDB
    SentenceHandler mysqlSh = new MySQLSentenceHandler();
    SentenceHandler mongoSh = new MongoDbSentenceHandler();
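"Read from old, write with new" might be sketched as the loop below. The `allIds`, `find`, and `save` methods and the `InMemoryHandler` class are assumptions for illustration; the slides only show that both handlers share the `SentenceHandler` type. IDs are copied unchanged so that records keep the identifiers they had in the source store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MigrationSketch {
    /** Hypothetical DAO contract shared by the old and the new store. */
    interface SentenceHandler {
        List<Long> allIds();
        String find(long id);
        void save(long id, String sentence);
    }

    /** In-memory stand-in used for both sides of the migration. */
    static class InMemoryHandler implements SentenceHandler {
        private final Map<Long, String> rows = new TreeMap<>();
        public List<Long> allIds() { return new ArrayList<>(rows.keySet()); }
        public String find(long id) { return rows.get(id); }
        public void save(long id, String sentence) { rows.put(id, sentence); }
    }

    /** Copies every record from source to target and returns the number copied. */
    static int migrate(SentenceHandler source, SentenceHandler target) {
        int copied = 0;
        for (long id : source.allIds()) {
            target.save(id, source.find(id));
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) {
        SentenceHandler oldStore = new InMemoryHandler();
        oldStore.save(1L, "first");
        oldStore.save(2L, "second");
        SentenceHandler newStore = new InMemoryHandler();
        System.out.println("copied " + migrate(oldStore, newStore)); // prints copied 2
    }
}
```

A real migration of billions of rows would page through the source rather than load every ID at once; the loop is kept simple here to show the shape of the idea.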
  • 18. Migrating Data
    Wordnik moved 5 billion rows from MySQL
    Sustained 100,000 inserts/second
    Migration tool was CPU bound
    ID generation logic, among other things
    Wordnik reads MongoDB fast
    Read + create Java objects @ 250k/second (!)
  • 19. Going live to Production
    Choose your use case carefully if migrating incrementally
    Scary no matter what
    Test your perf monitoring system first!
    Use your DAOs from migration
    Turn on MongoDB on one server, monitor, tune (rollback, repeat)
    Full switch over when comfortable
  • 20. Going live to Production
    SentenceHandler h = null;
    h = new MongoDbSentenceHandler();
    h = new MySQLDbSentenceHandler();
    return h.find(...);
  • 21. Optimizing Performance
    Home-grown connection pooling
    Master only
    Slave only
    Round-robin all servers, bias on slaves
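The slides do not show how the home-grown pool picks a server. One simple way to get "round-robin all servers, bias on slaves" is a rotation list in which each slave appears several times and the master once, as in this sketch. The class name, the `slaveWeight` parameter, and the use of host-name strings in place of real connections are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class BiasedRoundRobin {
    private final List<String> rotation = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    /** The master appears once per cycle, each slave slaveWeight times. */
    BiasedRoundRobin(String master, List<String> slaves, int slaveWeight) {
        rotation.add(master);
        for (int i = 0; i < slaveWeight; i++) {
            rotation.addAll(slaves);
        }
    }

    /** Returns the next host in the cycle; floorMod keeps the index valid after overflow. */
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), rotation.size());
        return rotation.get(i);
    }

    public static void main(String[] args) {
        BiasedRoundRobin pool = new BiasedRoundRobin("master", List.of("slave1", "slave2"), 2);
        for (int i = 0; i < 5; i++) {
            System.out.println(pool.pick());
        }
    }
}
```

With two slaves and a weight of 2, one read in five goes to the master and the rest are split evenly across the slaves. "Master only" and "slave only" modes correspond to rotation lists that contain just one kind of host.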
  • 22. Optimizing Performance
    Had complex logic to handle cache invalidation
    Out-of-process caches are not free
    MongoDB loves your RAM
    Let it do your LRU cache (it will anyway)
    Do not skimp on your disk or RAM
    Schema-less design
    Even if no values in any document, needs to read document schema to check
  • 23. Optimizing Performance
    Disk space
    Schemaless => schema per document (row)
    Choose your mappings wisely
    ({veryLongAttributeName:true}) => more disk space than ({vlan:true})
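The disk-space point on slide 23 follows from BSON storing each field name as a string inside every document. The arithmetic below estimates the saving from renaming one field, assuming (hypothetically) that the field is present in all 5 billion documents; it counts only the name bytes and ignores padding and indexes.

```java
import java.nio.charset.StandardCharsets;

class FieldNameOverhead {
    /** Bytes saved across all documents by shortening one field name. */
    static long bytesSaved(String longName, String shortName, long documents) {
        int perDocument = longName.getBytes(StandardCharsets.UTF_8).length
                - shortName.getBytes(StandardCharsets.UTF_8).length;
        return perDocument * documents;
    }

    public static void main(String[] args) {
        long saved = bytesSaved("veryLongAttributeName", "vlan", 5_000_000_000L);
        System.out.println(saved + " bytes"); // prints 85000000000 bytes
    }
}
```

The names differ by 17 bytes (21 versus 4), so under these assumptions a single rename is worth about 85 GB.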
  • 24. Optimizing Performance
    A Typical Day at the Office for MongoDB
    API call rate: 47.7 calls/sec
  • 25. Other Tips
    Data Types
    Use caution when changing
    DBObject obj = …;
    long id = (Long) obj.get("IWasAnIntOnce");
    Attribute names
    Don’t change w/o migrating existing data!
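The data-type warning on slide 25 is easy to reproduce with plain Java. In this sketch a `Map<String, Object>` stands in for a `DBObject` (an assumption made so the example runs without the driver), and `readLong` is a hypothetical helper. A value stored as an `Integer` cannot be cast to `Long`, but it can be read safely through `Number`.

```java
import java.util.HashMap;
import java.util.Map;

class NumericFieldSketch {
    /** Reads a numeric field as long whether it was stored as Integer or Long. */
    static long readLong(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        throw new IllegalStateException(field + " is missing or not numeric: " + value);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("IWasAnIntOnce", 42); // autoboxed as Integer, like an old document

        try {
            long id = (Long) doc.get("IWasAnIntOnce"); // the cast from the slide
            System.out.println(id);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed");
        }

        System.out.println(readLong(doc, "IWasAnIntOnce")); // prints 42
    }
}
```

Old documents keep the type they were written with, so code that changes a field from `int` to `long` has to tolerate both until the existing data is migrated.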
  • 26. What’s next?
    Store audio files on disk
    Requires clustered file system for shared access
    Capped Collections (rolling out this week)
    UGC from MySQL => MongoDB
    Beg/Bribe 10gen for some Features
  • 27. Questions?