Why Wordnik went Non-Relational

            Tony Tam
            @fehguy
What this Talk is About

• Five key reasons why Wordnik migrated to
  a non-relational database
• Process for selection, migration
• Optimizations and tips from living
  survivors of the battlefield
Why Should You Care?

• MongoDB user for 2 years
• Lessons learned, analysis, benefits from
  process
• We migrated from MySQL to MongoDB
  with no downtime
• We have interesting/challenging data
  needs, likely relevant to you
More on Wordnik

• World’s fastest-updating English dictionary
 •   Based on text input of up to 8k words/second
 •   Word Graph as the basis of our analysis
     •   Synchronous & asynchronous processing
• Tens of billions of documents in NR
  storage
• 20M daily REST API calls, billions served
 •   Powered by the Swagger OSS API framework
     (swagger.wordnik.com)
Architectural History

• 2008: Wordnik was born as a LAMP stack
  on AWS EC2
• 2009: Introduced a public REST
  API, powering wordnik.com and partner APIs
• 2009: Drank the NoSQL Kool-Aid
• 2010: Scala
• 2011: Micro SOA
Non-relational by Necessity

• Moved to NR because of the “4S”
 •   Speed
 •   Stability
 •   Scaling
 •   Simplicity
• But…
 •   MySQL can go a LONG way
     •   Takes the right team, the right reasons (+ patience)
 •   NR offerings were simply too compelling to keep
     focusing on scaling MySQL
Wordnik’s 5 Whys for NoSQL
Why #1: Speed bumps with MySQL

• Inserting data fast (50k recs/second)
  caused MySQL mayhem
 •   Maintaining indexes largely to blame
 •   Operations for consistency were unnecessary but
     “cannot be turned off”
• Devised twisted schemes to avoid client
  blocking
 •   Aka the “master/slave tango”
Why #2: Retrieval Complexity

• Objects typically mapped to tables
  •   Object Hierarchy always => inner + outer joins
• Lots of static data, so why join?
  •   “Noun” is not getting renamed in my code’s
      lifetime!
  •   Logic like this probably belongs in the application anyway
• Since storage is cheap
  •   I’ll choose speed
Why #2: Retrieval Complexity




[Figure: one definition = 10+ joins; 50 requests per second!]
Why #2: Retrieval Complexity
• Embed objects in rows “sort of works”
 •   Filtering gets really nasty
 •   Native XML in MySQL?
     •   If a full table-scan is OK…


• OK, then cache it!
 •   Layers of caching introduced layers of complexity
     •   Stale data/corruption
     •   Object versionitis
     •   Cache stampedes
Why #3: Object Modeling

• Object models being compromised for
  the sake of persistence
  •   This is backwards!
  •   Extra abstraction for the wrong reason
• OK, then performance suffers
  •   In-application joins across objects
  •   “Who ran the fetch-all query against production?!”
      – any sysadmin

• “My zillionth ORM layer that only I
  understand” (and can maintain)
Why #4: Scaling

• Needed “cloud-friendly” storage
 •   Easy up, easy down!
     •   Startup: sync your data, and announce to
         clients when ready for business
     •   Shutdown: announce your departure and leave
• Adding MySQL instances was a dance
 •   Snapshot + binlog files
 mysql> change master to
     MASTER_HOST='db1', MASTER_USER='xxx', MASTER_PASSWORD='xxx',
     MASTER_LOG_FILE='master-relay.000431', MASTER_LOG_POS=1035435402;
Why #4: Scaling

• What about those VMs?
 •   So convenient! But… they kind of suck
 •   Can the database succeed on a VM?
• VM Performance:
 •   Memory, CPU or I/O: pick only one
 •   Can your database really reduce CPU or disk I/O
     with lots of RAM?
Why #5: Big Picture
• BI tools use relational constraints for discovery
  •   Is this the right reason for them?
  •   Can we work around this?
  •   Let’s have a BI tool revolution, too!
• True service architecture makes relational
  constraints impractical/impossible
• Distributed sharding makes relational
  constraints impractical/impossible
Why #5: Big Picture

• Is your app smarter than your database?
  •   The logic line is probably blurry!
• What does count(*) really mean when you
  add 5k records/sec?
  •   Maybe eventual consistency is not so bad…
• 2PC? Do some reading and decide!
http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
Ok, I’m in!

• I thought deciding was easy!?
 •   Many quickly maturing products
 •   Divergent features tackle different needs
• Wordnik spent 8 weeks researching and
  testing NoSQL solutions
 •   This is a long time! (for a startup)
 •   Wrote ODM classes and migrated our data
• Surprise! There were surprises
 •   Be prepared to compromise
Choice Made, Now What?
• We went with MongoDB ***
 •   Fastest to implement
 •   Most reliable
 •   Best community
• Why?
 •   Why #1: Fast loading/retrieval
 •   Why #2: Fast ODM (50 tps => 1000 tps!)
 •   Why #3: Document Models === Object models
 •   Why #4: MMF (memory-mapped files) => kernel-managed memory + RS (replica sets)
 •   Why #5: It’s 2011, is there no progress?
More on Why MongoDB

• Testing, testing, testing
  •   Used our migration tools to load test
      •   Read from MySQL, write to MongoDB
  •   We loaded 5+ billion documents, many times over
• In the end, one server could…
  •   Insert 100k records/sec sustained
  •   Read 250k records/sec sustained
  •   Support concurrent loading/reading
Migration & Testing

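• Iterated ODM mapping multiple times
 •   Some issues
     •   Type Safety
 // throws ClassCastException if the value was stored as an Int
 cur.next.get("iWasAnIntOnce").asInstanceOf[Long]

     •   Dates as Strings
 // a String and a Date are different BSON types: date-range queries won't match the String
 obj.put("a_date", "2011-12-31") !=
 obj.put("a_date", new Date("2011-12-31"))

     •   Storage Size
 // field names are stored in every document, so short names save space
 obj.put("very_long_field_name", true) >>
 obj.put("vsfn", true)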
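The storage-size point can be handled in the mapping layer by translating long application-side field names to short stored names on write, and back on read. A minimal sketch in JavaScript, assuming a hand-maintained alias table; `FIELD_ALIASES`, `toStorage` and `fromStorage` are hypothetical names, not Wordnik's ODM:

```javascript
// Hypothetical alias table: application-side field name -> short stored name.
const FIELD_ALIASES = new Map([
  ["very_long_field_name", "vsfn"],
  ["another_long_field", "alf"],
]);

// Reverse table, built once, for translating documents read back from storage.
const REVERSE_ALIASES = new Map(
  [...FIELD_ALIASES].map(([longName, shortName]) => [shortName, longName])
);

// Rename top-level keys using the given table; unknown keys pass through unchanged.
function renameKeys(obj, table) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    out[table.get(key) ?? key] = value;
  }
  return out;
}

const toStorage = (obj) => renameKeys(obj, FIELD_ALIASES);
const fromStorage = (doc) => renameKeys(doc, REVERSE_ALIASES);

const record = { very_long_field_name: true, another_long_field: 42, id: 7 };
const stored = toStorage(record); // { vsfn: true, alf: 42, id: 7 }
// JSON length is only a rough proxy for BSON size, but the trend is the same.
console.log(JSON.stringify(record).length, JSON.stringify(stored).length);
```

Only top-level keys are renamed here; nested documents would need a recursive version.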
Migration & Testing

• Expect data model iterations
 •   Wordnik migrated each table to a Mongo collection “as-is”
     •   Easier to migrate, test
     •   _id field reused the MySQL PK
 •   Auto Increment?
     •   Used MySQL to “check out” sequences
         •   One row per Mongo collection
         •   Run out of sequences => get more
     •   Need exclusive locks here!
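The checkout scheme above (one counter row per collection, reserve a block of ids, come back for more when it runs out) can be sketched as follows. This is a minimal JavaScript sketch under assumptions: `SequenceStore` and `IdAllocator` are hypothetical, `SequenceStore` stands in for the MySQL sequence table, starting at 1 is arbitrary, and in MySQL the read-and-advance step would need the exclusive lock the slide mentions (for example a transaction using `SELECT ... FOR UPDATE`).

```javascript
// Stand-in for the MySQL sequence table: one counter per Mongo collection.
// JavaScript's single thread makes checkout() atomic here; in MySQL the
// read-and-advance must hold an exclusive row lock.
class SequenceStore {
  constructor() {
    this.rows = new Map(); // collection name -> next unreserved id
  }

  // Reserve `count` ids for `collection`; returns the half-open range [start, end).
  checkout(collection, count) {
    const start = this.rows.get(collection) ?? 1;
    const end = start + count;
    this.rows.set(collection, end);
    return { start, end };
  }
}

// Per-process allocator: hands out ids from its current block and checks
// out a new block only when the current one is exhausted.
class IdAllocator {
  constructor(store, collection, blockSize) {
    this.store = store;
    this.collection = collection;
    this.blockSize = blockSize;
    this.next = 0;
    this.end = 0; // empty block forces a checkout on first use
  }

  nextId() {
    if (this.next >= this.end) {
      const block = this.store.checkout(this.collection, this.blockSize);
      this.next = block.start;
      this.end = block.end;
    }
    return this.next++;
  }
}

const store = new SequenceStore();
const a = new IdAllocator(store, "doc_metadata", 100);
const b = new IdAllocator(store, "doc_metadata", 100);
console.log(a.nextId()); // 1
console.log(b.nextId()); // 101: b checked out the next block
console.log(a.nextId()); // 2
```

Larger blocks mean fewer trips to the locked table, at the cost of gaps in the id sequence when a process exits with unused ids.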
Migration & Testing

• Sequence generator in-process
 SequenceGenerator.checkout("doc_metadata", 100)

• Sequence generator as web service
 •   Centralized UID management
Migration & Testing

• Expect data access pattern iterations
 •   So much more flexibility!
     •   Reach into objects
     > db.dictionary_entry.find({"hdr.sr":"cmu"})

 •   Access to a whole object tree at query time
 •   Overwrite a whole object at once… when desired
     •   Not always! This clobbers the whole record
     > db.foo.save({_id:18727353,foo:"bar"})

     •   Update a single field:
     > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
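To make the difference between clobbering a record and updating one field concrete, here is a toy in-memory model in JavaScript. It is not the MongoDB driver; `ToyCollection`, `save`, `setFields` and `findByPath` are hypothetical names that only imitate the shell behaviour shown above (whole-document replace vs. `$set`, and dotted-path queries like `"hdr.sr"`).

```javascript
// Toy stand-in for a collection (not the MongoDB driver API).
class ToyCollection {
  constructor() {
    this.docs = new Map(); // _id -> stored document
  }

  // Like db.foo.save(doc): the stored document is replaced wholesale.
  save(doc) {
    this.docs.set(doc._id, JSON.parse(JSON.stringify(doc)));
  }

  // Like db.foo.update({_id: id}, {$set: fields}): only the named fields change.
  // Dotted paths such as "hdr.sr" create or overwrite nested fields.
  setFields(id, fields) {
    const doc = this.docs.get(id);
    if (!doc) return false; // no upsert in this sketch
    for (const [path, value] of Object.entries(fields)) {
      const parts = path.split(".");
      let target = doc;
      for (const part of parts.slice(0, -1)) {
        if (typeof target[part] !== "object" || target[part] === null) {
          target[part] = {};
        }
        target = target[part];
      }
      target[parts[parts.length - 1]] = value;
    }
    return true;
  }

  // Like db.foo.find({"hdr.sr": "cmu"}): match a value at a dotted path.
  findByPath(path, value) {
    const parts = path.split(".");
    const valueAt = (doc) =>
      parts.reduce((v, p) => (v === null || v === undefined ? undefined : v[p]), doc);
    return [...this.docs.values()].filter((doc) => valueAt(doc) === value);
  }
}

const foo = new ToyCollection();
foo.save({ _id: 18727353, foo: "baz", hdr: { sr: "cmu" } });
foo.setFields(18727353, { foo: "bar" }); // hdr.sr is still there
foo.save({ _id: 18727353, foo: "bar" }); // hdr is gone: the record was clobbered
```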
Flip the Switch

• Migrate production with zero downtime
  •   We temporarily halted loading data
  •   Added a switch to flip between MySQL/MongoDB
  •   Instrument, monitor, flip it, analyze, flip back
• Profiling your code is key
  •   What is slow?
  •   Build this in your app from day 1
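Building profiling in from day one can start as small as a wrapper that records call counts and elapsed time per operation. A minimal JavaScript sketch; `timed`, `report` and the operation names are hypothetical, and it only measures synchronous functions (an async version would need to await the result before recording):

```javascript
// Per-operation counters: name -> { calls, totalMs }.
const stats = new Map();

// Wrap a synchronous function so every call is counted and timed,
// including calls that throw.
function timed(name, fn) {
  return function (...args) {
    const start = process.hrtime.bigint();
    try {
      return fn.apply(this, args);
    } finally {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      const entry = stats.get(name) ?? { calls: 0, totalMs: 0 };
      entry.calls += 1;
      entry.totalMs += elapsedMs;
      stats.set(name, entry);
    }
  };
}

// Summarize operations, slowest in total first.
function report() {
  return [...stats.entries()]
    .map(([name, s]) => ({ name, calls: s.calls, totalMs: s.totalMs, avgMs: s.totalMs / s.calls }))
    .sort((a, b) => b.totalMs - a.totalMs);
}

// Hypothetical DAO call being instrumented.
const findSentence = timed("mysql.findSentence", (id) => ({ id }));
findSentence(1);
findSentence(2);
console.log(report());
```

Wrapping both the MySQL and the MongoDB code paths under parallel names makes the before/after comparison during a flip a matter of reading two rows of the report.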
Flip the Switch

• Storage selected at runtime
 val h = if (shouldUseMongoDb) new MongoDbSentenceDAO else new MySQLDbSentenceDAO
 h.find(...)

• Hot-swappable storage via configuration
 •   It worked!
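If the flag is consulted on every lookup rather than once, it can be flipped (and flipped back) while the process keeps running, which is what hot-swapping via configuration needs. A JavaScript sketch with stand-in DAOs that reuse the class names above; the `config` object and the returned records are assumptions, and real DAOs would query MySQL and MongoDB:

```javascript
// Stand-ins for the real DAOs; both expose the same find(id) method.
class MySQLDbSentenceDAO {
  find(id) {
    return { id, source: "mysql" };
  }
}

class MongoDbSentenceDAO {
  find(id) {
    return { id, source: "mongodb" };
  }
}

// Hypothetical runtime configuration; in practice it might be reloaded from a
// file or config service while the process keeps running.
const config = { useMongoDb: false };

// Build each DAO once and keep both around so either can be selected instantly.
const daos = {
  mysql: new MySQLDbSentenceDAO(),
  mongodb: new MongoDbSentenceDAO(),
};

// Consult the flag on every call, so a flip applies to the next request.
function sentenceDao() {
  return config.useMongoDb ? daos.mongodb : daos.mysql;
}

console.log(sentenceDao().find(42).source); // prints mysql
config.useMongoDb = true; // flip the switch
console.log(sentenceDao().find(42).source); // prints mongodb
```

Keeping both DAOs constructed is what makes "flip back" cheap: reverting is just resetting the flag.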
Then What?

• Watched our deployment; many iterations of the
  mapping layer
 •   Settled on an in-house, type-safe mapper
 https://github.com/fehguy/mongodb-benchmark-tools

• Some gotchas (of course)
 •   Locking issues on long-running updates (more in a
     minute)
• We want more of this!
 •   Migrated shared files to Mongo GridFS
 •   Easy-IT
Performance + Optimization

• Loading data is fast!
  •   Fixed collection padding, similarly-sized records
  •   Tail of collection is always in memory
  •   Append faster than MySQL in every case tested
• But... random access started getting slow
  •   Indexes in RAM? Yes
  •   Data in RAM? No, > 2TB per server
  •   Limited by disk I/O and seek performance
  •   EC2 + EBS for storage?
Performance + Optimization

• Moved to physical data center
 •   DAS & 72GB RAM => great uncached
     performance
• Good move? Depends on the use case
 •   If “access anything anytime”, not many options
 •   You want to support this?
Performance + Optimization

• Inserts are fast, how about updates?
 •   Well… update => find object, update it, save
 •   Lock acquired at “find”, released after “save”
     •   If hitting disk, lock time could be large
• Easy answer: pre-fetch on update
 •   Oh, and NEVER do “update all records” against a
     large collection
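Why pre-fetching helps can be shown with a toy cost model: if the document is not in memory, the disk read happens while the lock is held; reading it first, without the lock, pages it in so the locked find-update-save is cheap. The costs below (`DISK_MS`, `MEMORY_MS`) are made-up numbers for illustration, not measurements, and `ToyStore` and `lockTimeFor` are hypothetical:

```javascript
// Assumed costs for illustration only.
const DISK_MS = 10; // touching a document that is not in RAM
const MEMORY_MS = 0.01; // touching a document that is already in RAM

class ToyStore {
  constructor() {
    this.cached = new Set(); // ids whose pages are in memory
    this.lockHeldMs = 0; // total time spent holding the write lock
  }

  // Read a document; pages it into memory as a side effect.
  touch(id) {
    const cost = this.cached.has(id) ? MEMORY_MS : DISK_MS;
    this.cached.add(id);
    return cost;
  }

  // Unlocked read whose only purpose is to warm the cache.
  prefetch(id) {
    return this.touch(id);
  }

  // find + modify + save, all while holding the write lock.
  update(id) {
    const cost = this.touch(id) + MEMORY_MS; // fetch, then write the in-memory page
    this.lockHeldMs += cost;
    return cost;
  }
}

// Total lock-held time for updating `ids`, with or without a pre-fetch.
function lockTimeFor(ids, { prefetch }) {
  const store = new ToyStore();
  for (const id of ids) {
    if (prefetch) store.prefetch(id);
    store.update(id);
  }
  return store.lockHeldMs;
}

// Roughly 30 vs 0.06 in this toy model: the disk reads move outside the lock.
console.log(lockTimeFor([1, 2, 3], { prefetch: false }));
console.log(lockTimeFor([1, 2, 3], { prefetch: true }));
```

Total work is the same either way; what changes is how long other writers wait.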
Performance + Optimization

• Indexes
 •   Can't always keep the index in RAM; MMF “does its
     thing”
 •   Right-balanced b-tree keeps the necessary part of the index hot
 •   Indexes hit disk => mute your pager
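The right-balanced point is about where inserts land in the index. With ever-increasing keys (such as auto-increment ids reused as `_id`), every insert goes to the right-most part of the b-tree, so only that edge needs to stay hot; scattered keys touch pages all over the index. A toy JavaScript model that treats the index as one sorted array cut into fixed-size pages (a simplification of a real b-tree; `lowerBound` and `pagesTouched` are hypothetical helpers):

```javascript
// Binary search: first position whose key is >= `key`.
function lowerBound(sorted, key) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Insert `newKeys` one by one and count the distinct "pages" they land on.
// Fewer pages means a smaller part of the index has to stay in RAM.
function pagesTouched(existingKeys, newKeys, pageSize) {
  const index = [...existingKeys].sort((a, b) => a - b);
  const pages = new Set();
  for (const key of newKeys) {
    const pos = lowerBound(index, key);
    pages.add(Math.floor(pos / pageSize));
    index.splice(pos, 0, key);
  }
  return pages.size;
}

const existing = Array.from({ length: 1000 }, (_, i) => i);
const increasing = Array.from({ length: 50 }, (_, i) => 1000 + i); // like auto-increment ids
const scattered = Array.from({ length: 50 }, (_, i) => 20 * i + 0.5); // spread over the key range
console.log(pagesTouched(existing, increasing, 100)); // 1
console.log(pagesTouched(existing, scattered, 100)); // 11
```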
More Mongo, Please!

• We modeled our word graph in Mongo

• 50M nodes
• 80M edges
• 80 µs edge fetch
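One way to store a graph in a document database is one document per edge, indexed on the source node, so fetching a node's edges is a single indexed lookup rather than a join. The JavaScript sketch below is an assumption about the general approach, not Wordnik's actual schema; `EdgeStore` and the `from`/`to`/`rel` fields are hypothetical, and the `Map` plays the role of the index on `from`:

```javascript
// Hypothetical edge documents: { from, to, rel }.
class EdgeStore {
  constructor() {
    this.byFrom = new Map(); // from -> array of edge documents
  }

  addEdge(from, to, rel) {
    const edges = this.byFrom.get(from) ?? [];
    edges.push({ from, to, rel });
    this.byFrom.set(from, edges);
  }

  // All outgoing edges of a node, optionally filtered by relationship type.
  edgesFrom(from, rel) {
    const edges = this.byFrom.get(from) ?? [];
    return rel === undefined ? edges : edges.filter((e) => e.rel === rel);
  }
}

const graph = new EdgeStore();
graph.addEdge("dog", "canine", "synonym");
graph.addEdge("dog", "cat", "related");
console.log(graph.edgesFrom("dog", "synonym").map((e) => e.to)); // [ 'canine' ]
```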
More Mongo, Please!

• Analytics rolled up by aggregation jobs
 •   Send to Hadoop, load into Mongo for fast access
What’s next

• Liberate our models
 •   stop worrying about how to store them (for the
     most part)
• New features almost always NR
• Some MySQL left
 •   Less on each release
Questions?
•   See more about Wordnik APIs
                    http://developer.wordnik.com

•   Migrating from MySQL to MongoDB
http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik

•   Maintaining your MongoDB Installation
               http://www.slideshare.net/fehguy/mongo-sv-tony-tam

•   Swagger API Framework
                          http://swagger.wordnik.com

•   Mapping Benchmark
               https://github.com/fehguy/mongodb-benchmark-tools

•   Wordnik OSS Tools
                    https://github.com/wordnik/wordnik-oss

 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

A Case Study of NoSQL Adoption: What Drove Wordnik to Go Non-Relational?

1. Why Wordnik went Non-Relational

Tony Tam (@fehguy)
2. What this Talk is About

• 5 key reasons why Wordnik migrated to a non-relational database
• Process for selection and migration
• Optimizations and tips from living survivors of the battlefield
3. Why Should You Care?

• MongoDB user for 2 years
• Lessons learned, analysis, and benefits from the process
• We migrated from MySQL to MongoDB with no downtime
• We have interesting/challenging data needs, likely relevant to you
4. More on Wordnik

• World’s fastest-updating English dictionary
  • Based on input of text at up to 8k words/second
  • Word graph as the basis of our analysis
    • Synchronous & asynchronous processing
• Tens of billions of documents in NR storage
• 20M daily REST API calls, billions served
  • Powered by the Swagger OSS API framework (swagger.wordnik.com)
5. Architectural History

• 2008: Wordnik was born as a LAMP stack on AWS EC2
• 2009: Introduced the public REST API, powering wordnik.com and partner APIs
• 2009: Drank the NoSQL Kool-Aid
• 2010: Scala
• 2011: Micro SOA
6. Non-relational by Necessity

• Moved to NR because of the “4S”:
  • Speed
  • Stability
  • Scaling
  • Simplicity
• But…
  • MySQL can go a LONG way
    • Takes the right team and the right reasons (+ patience)
  • NR offerings were simply too compelling to justify focusing on scaling MySQL
7. Wordnik’s 5 Whys for NoSQL
8. Why #1: Speed bumps with MySQL

• Inserting data fast (50k recs/second) caused MySQL mayhem
  • Maintaining indexes was largely to blame
  • Consistency operations we didn't need, but that “cannot be turned off”
• Devised twisted schemes to avoid client blocking
  • AKA the “master/slave tango”
9. Why #2: Retrieval Complexity

• Objects typically mapped to tables
  • Object hierarchy always => inner + outer joins
• Lots of static data, so why join?
  • “Noun” is not getting renamed in my code’s lifetime!
  • Logic like this probably lives in the application anyway
  • Since storage is cheap, I’ll choose speed
10. Why #2: Retrieval Complexity

• One definition = 10+ joins
• 50 requests per second!
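To make the “one definition = many joins” point concrete, here is a minimal sketch using a hypothetical, heavily simplified definition model (`Definition`, `Example`, `Citation`, and the row types are illustrative, not Wordnik's actual schema). The normalized store has to touch several tables to rebuild one object; the embedded store keeps the whole tree in one record, which is what a document database returns in a single read. The `lookups` counters are only there to make the difference visible.

```scala
// Hypothetical, simplified object model (not Wordnik's real schema).
final case class Citation(source: String, text: String)
final case class Example(text: String, citations: List[Citation])
final case class Definition(id: Long, word: String, partOfSpeech: String, text: String, examples: List[Example])

// Normalized form: one "table" per entity, linked by foreign keys.
final case class DefinitionRow(id: Long, word: String, posId: Int, text: String)
final case class ExampleRow(id: Long, definitionId: Long, text: String)
final case class CitationRow(exampleId: Long, source: String, text: String)

final class NormalizedStore(
    definitions: Map[Long, DefinitionRow],
    partsOfSpeech: Map[Int, String], // static lookup data, e.g. 1 -> "noun"
    examples: Seq[ExampleRow],
    citations: Seq[CitationRow]) {

  var lookups = 0 // table accesses needed to assemble one object

  def load(id: Long): Option[Definition] = {
    lookups += 1 // definitions table
    definitions.get(id).map { d =>
      lookups += 2 // part-of-speech lookup + examples table
      val exs = examples.filter(_.definitionId == d.id).map { e =>
        lookups += 1 // citations table, once per example
        Example(e.text, citations.filter(_.exampleId == e.id).map(c => Citation(c.source, c.text)).toList)
      }
      Definition(d.id, d.word, partsOfSpeech(d.posId), d.text, exs.toList)
    }
  }
}

// Embedded form: the whole object tree is stored and fetched as one record.
final class DocumentStore(docs: Map[Long, Definition]) {
  var lookups = 0
  def load(id: Long): Option[Definition] = { lookups += 1; docs.get(id) }
}
```

Even this toy definition with two examples needs five table accesses; real definitions with more related data grow from there, while the embedded read stays at one.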
11. Why #2: Retrieval Complexity

• Embedding objects in rows “sort of works”
  • Filtering gets really nasty
  • Native XML in MySQL?
    • If a full table scan is OK…
• OK, then cache it!
  • Layers of caching introduced layers of complexity
    • Stale data/corruption
    • Object versionitis
    • Cache stampedes
12. Why #3: Object Modeling

• Object models were being compromised for the sake of persistence
  • This is backwards!
  • Extra abstraction for the wrong reason
• OK, then performance suffers
  • In-application joins across objects
  • “Who ran the fetch-all query against production?!” (any sysadmin)
  • “My zillionth ORM layer that only I understand” (and can maintain)
13. Why #4: Scaling

• Needed “cloud-friendly storage”
  • Easy up, easy down!
  • Startup: sync your data, and announce to clients when ready for business
  • Shutdown: announce your departure and leave
• Adding MySQL instances was a dance
  • Snapshot + bin files
    mysql> change master to MASTER_HOST='db1', MASTER_USER='xxx', MASTER_PASSWORD='xxx', MASTER_LOG_FILE='master-relay.000431', MASTER_LOG_POS=1035435402;
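The “easy up, easy down” lifecycle on this slide (sync first, then announce; announce departure, then leave) can be sketched as a small membership registry. This is a hypothetical in-process model: `NodeRegistry`, `StorageNode`, and `Client` are illustrative names, and a real deployment would keep membership in a shared service or in the database's own replica-set configuration rather than in one JVM.

```scala
import scala.collection.mutable

// Hypothetical registry of storage nodes that are ready to serve traffic.
final class NodeRegistry {
  private val ready = mutable.LinkedHashSet.empty[String]

  // Startup: called only after the node has finished syncing its data.
  def announceReady(node: String): Unit = synchronized { ready += node }

  // Shutdown: the node removes itself before going away.
  def announceDeparture(node: String): Unit = synchronized { ready -= node }

  def readyNodes: List[String] = synchronized { ready.toList }
}

// Hypothetical storage node: sync first, announce second.
final class StorageNode(val name: String, registry: NodeRegistry) {
  def start(sync: () => Unit): Unit = {
    sync() // e.g. copy a snapshot and catch up from peers
    registry.announceReady(name)
  }
  def stop(): Unit = registry.announceDeparture(name)
}

// Clients round-robin across whatever is currently announced.
final class Client(registry: NodeRegistry) {
  private var next = 0
  def pickNode(): Option[String] = {
    val nodes = registry.readyNodes
    if (nodes.isEmpty) None
    else { val n = nodes(next % nodes.size); next += 1; Some(n) }
  }
}
```

The point of the design is that clients never see a half-synced node: a node is invisible until its `start` completes, and it disappears from client rotation before it shuts down.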
14. Why #4: Scaling

• What about those VMs?
  • So convenient! But… they kind of suck
  • Can the database succeed on a VM?
• VM performance:
  • Memory, CPU, or I/O: pick only one
  • Can your database really reduce CPU or disk I/O with lots of RAM?
15. Why #5: Big Picture

• BI tools use relational constraints for discovery
  • Is this the right reason for them?
  • Can we work around this?
  • Let’s have a BI tool revolution, too!
• True service architecture makes relational constraints impractical/impossible
• Distributed sharding makes relational constraints impractical/impossible
16. Why #5: Big Picture

• Is your app smarter than your database?
  • The logic line is probably blurry!
• What does count(*) really mean when you add 5k records/sec?
  • Maybe eventual consistency is not so bad…
• 2PC? Do some reading and decide!
  http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
17. OK, I’m in!

• I thought deciding was easy!?
  • Many quickly maturing products
  • Divergent features tackle different needs
• Wordnik spent 8 weeks researching and testing NoSQL solutions
  • This is a long time! (for a startup)
  • Wrote ODM classes and migrated our data
• Surprise! There were surprises
  • Be prepared to compromise
18. Choice Made, Now What?

• We went with MongoDB ***
  • Fastest to implement
  • Most reliable
  • Best community
• Why?
  • Why #1: Fast loading/retrieval
  • Why #2: Fast ODM (50 tps => 1000 tps!)
  • Why #3: Document models === object models
  • Why #4: MMF (memory-mapped files) => kernel-managed memory + RS (replica sets)
  • Why #5: It’s 2011; is there no progress?
19. More on Why MongoDB

• Testing, testing, testing
  • Used our migration tools to load test
    • Read from MySQL, write to MongoDB
  • We loaded 5+ billion documents, many times over
• In the end, one server could…
  • Insert 100k records/sec sustained
  • Read 250k records/sec sustained
  • Support concurrent loading/reading
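A migration tool that doubles as a load test mostly comes down to streaming rows out of the source and writing them to the target in batches while measuring throughput. The sketch below assumes a generic `Iterator` as the source (standing in for a MySQL cursor) and a caller-supplied `write` function (standing in for a bulk insert into MongoDB); `Migrator` and `Stats` are hypothetical names, not Wordnik's tooling.

```scala
// Hypothetical batched copier: source rows in, batched writes out, plus stats.
object Migrator {
  final case class Stats(records: Long, batches: Long, nanos: Long) {
    // Guard against a zero duration on tiny inputs.
    def recordsPerSecond: Double = records / (math.max(nanos, 1L) / 1e9)
  }

  def copy[A](source: Iterator[A], batchSize: Int)(write: Seq[A] => Unit): Stats = {
    require(batchSize > 0, "batchSize must be positive")
    val start = System.nanoTime()
    var records = 0L
    var batches = 0L
    source.grouped(batchSize).foreach { batch =>
      write(batch) // one bulk write per batch instead of one per record
      records += batch.size
      batches += 1
    }
    Stats(records, batches, System.nanoTime() - start)
  }
}
```

Running the same copy repeatedly against a scratch target is one way to get the “loaded 5+ billion documents, many times over” style of repeatable measurement.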
20. Migration & Testing

• Iterated ODM mapping multiple times
• Some issues
  • Type safety
    cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
  • Dates as strings
    obj.put("a_date", "2011-12-31") != obj.put("a_date", new Date("2011-12-31"))
  • Storage size
    obj.put("very_long_field_name", true) >> obj.put("vsfn", true)
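All three issues come from documents being loosely typed. A minimal sketch of defensive accessors, assuming documents are exposed as `java.util.Map[String, AnyRef]` (a common shape for JVM drivers); `SafeGet` and `FieldNames` are hypothetical helpers, not the API of Wordnik's mapper or of any driver. Casting a boxed `java.lang.Integer` with `asInstanceOf[Long]` throws a `ClassCastException`, so the getter widens explicitly; dates stored as strings and as `Date` objects are normalized to one type.

```scala
import java.time.{LocalDate, ZoneOffset}
import java.util.{Date, Map => JMap}
import scala.util.Try

object SafeGet {
  // Older writers may have stored an Int; widen instead of casting.
  def getLong(doc: JMap[String, AnyRef], key: String): Option[Long] = doc.get(key) match {
    case i: java.lang.Integer => Some(i.longValue())
    case l: java.lang.Long    => Some(l.longValue())
    case _                    => None // missing (null) or some other type
  }

  // Accept both "2011-12-31" strings and Date objects, normalized to a UTC LocalDate.
  def getDate(doc: JMap[String, AnyRef], key: String): Option[LocalDate] = doc.get(key) match {
    case s: String => Try(LocalDate.parse(s)).toOption
    case d: Date   => Some(d.toInstant().atZone(ZoneOffset.UTC).toLocalDate())
    case _         => None
  }
}

// Field names are stored in every document, so short stored names save space.
// Keeping the mapping in one place lets code keep using readable names.
object FieldNames {
  val toStored: Map[String, String] = Map("very_long_field_name" -> "vsfn")
  val fromStored: Map[String, String] = toStored.map(_.swap)
}
```

Normalizing on read is only a stopgap; writing one consistent type in the first place keeps queries (especially date-range queries) correct.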
21. Migration & Testing

• Expect data model iterations
• Wordnik migrated each table to a Mongo collection “as-is”
  • Easier to migrate and test
  • _id field used the same MySQL PK
• Auto increment?
  • Used MySQL to “check out” sequences
    • One row per Mongo collection
    • Run out of sequences => get more
    • Need exclusive locks here!
22. Migration & Testing

• Sequence generator in-process
  SequenceGenerator.checkout("doc_metadata,100")
• Sequence generator as web service
  • Centralized UID management
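The sequence check-out scheme (one row per collection, reserve a block of IDs under an exclusive lock, fetch another block when the current one runs out) can be sketched as follows. `SequenceTable` is an in-memory stand-in for the MySQL table, with `synchronized` playing the role of the row lock (in MySQL this would be something like `SELECT ... FOR UPDATE` inside a transaction). The `checkout(collection, blockSize)` shape and `IdAllocator` are hypothetical, not Wordnik's actual `SequenceGenerator` API.

```scala
import scala.collection.mutable

// Stand-in for the MySQL sequence table: next unreserved ID per collection.
final class SequenceTable {
  private val nextFree = mutable.Map.empty[String, Long]

  /** Reserves `blockSize` IDs for `collection`; returns the half-open range [start, end). */
  def checkout(collection: String, blockSize: Int): (Long, Long) = synchronized {
    require(blockSize > 0, "blockSize must be positive")
    val start = nextFree.getOrElse(collection, 1L)
    val end = start + blockSize
    nextFree(collection) = end
    (start, end)
  }
}

// In-process allocator: serves IDs from its block, checks out more when empty.
final class IdAllocator(table: SequenceTable, collection: String, blockSize: Int) {
  private var current = 0L
  private var limit = 0L

  def nextId(): Long = synchronized {
    if (current >= limit) {
      val (start, end) = table.checkout(collection, blockSize)
      current = start
      limit = end
    }
    val id = current
    current += 1
    id
  }
}
```

Larger blocks mean fewer trips to the locked table but bigger gaps in the ID space when a process exits with unused IDs; moving `SequenceTable` behind a web service gives the centralized variant on the slide.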
23. Migration & Testing

• Expect data access pattern iterations
• So much more flexibility!
  • Reach into objects
    > db.dictionary_entry.find({"hdr.sr":"cmu"})
  • Access to a whole object tree at query time
  • Overwrite a whole object at once… when desired
    • Not always! This clobbers the whole record:
      > db.foo.save({_id:18727353,foo:"bar"})
    • Update a single field:
      > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
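The difference between the two writes above is easy to miss: `save` replaces the entire document, while an update with `$set` changes only the named fields. The toy collection below models just those two semantics in memory; `ToyCollection` is a hypothetical illustration, not a MongoDB driver API.

```scala
import scala.collection.mutable

// Toy model of two write styles (not a driver API):
//   save(id, doc)   ~ db.foo.save(...): replace the whole document
//   set(id, fields) ~ db.foo.update({_id: id}, {$set: fields}): touch only those fields
final class ToyCollection {
  private val docs = mutable.Map.empty[Long, Map[String, Any]]

  // Whole-document overwrite: any field not in `doc` is lost.
  def save(id: Long, doc: Map[String, Any]): Unit = docs(id) = doc

  // Field-level update: other fields are preserved; no-op if the document is missing.
  def set(id: Long, fields: Map[String, Any]): Unit =
    docs.get(id).foreach(existing => docs(id) = existing ++ fields)

  def find(id: Long): Option[Map[String, Any]] = docs.get(id)
}
```

For example, after `save(1, {foo: "old", count: 7})`, a `$set` of `foo` keeps `count`, whereas a second `save` with only `foo` silently drops it.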
24. Flip the Switch

• Migrate production with zero downtime
  • We temporarily halted loading data
  • Added a switch to flip between MySQL/MongoDB
  • Instrument, monitor, flip it, analyze, flip back
• Profiling your code is key
  • What is slow?
  • Build this into your app from day 1
26. Flip the Switch

• Storage selected at runtime
    val h = shouldUseMongoDb match {
      case true => new MongoDbSentenceDAO
      case _ => new MySQLDbSentenceDAO
    }
    h.find(...)
• Hot-swappable storage via configuration
• It worked!
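A self-contained version of the runtime switch, with the timing hook the previous slide asks for. The DAO class names come from the slide, but their bodies here are stubs returning canned strings; `SentenceDAO`, `SentenceStore`, and the `find(wordId)` signature are assumptions, since the slide elides the real arguments. The flag is read on every call, so flipping it (for example from a configuration reload) takes effect without a restart.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical common interface for both storage back ends.
trait SentenceDAO {
  def find(wordId: Long): List[String]
}

// Stubs standing in for the real MySQL- and MongoDB-backed implementations.
final class MySQLDbSentenceDAO extends SentenceDAO {
  def find(wordId: Long): List[String] = List(s"mysql sentence for $wordId")
}

final class MongoDbSentenceDAO extends SentenceDAO {
  def find(wordId: Long): List[String] = List(s"mongo sentence for $wordId")
}

object SentenceStore {
  // Flipped at runtime, e.g. by a config reload; read on every call.
  val shouldUseMongoDb = new AtomicBoolean(false)
  private val mongo = new MongoDbSentenceDAO
  private val mysql = new MySQLDbSentenceDAO

  def dao: SentenceDAO = if (shouldUseMongoDb.get) mongo else mysql

  // Returns the result plus elapsed nanoseconds so both back ends can be compared.
  def timedFind(wordId: Long): (List[String], Long) = {
    val start = System.nanoTime()
    val result = dao.find(wordId)
    (result, System.nanoTime() - start)
  }
}
```

Using a plain `if` on the flag does the same job as the slide's `match` on a Boolean; the `AtomicBoolean` just makes the flip safe to do from another thread.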
27. Then What?

• Watched our deployment; many iterations to the mapping layer
  • Settled on an in-house, type-safe mapper
    https://github.com/fehguy/mongodb-benchmark-tools
  • Some gotchas (of course)
    • Locking issues on long-running updates (more in a minute)
• We want more of this!
  • Migrated shared files to Mongo GridFS
  • Easy-IT
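GridFS stores a file as one metadata document in `fs.files` plus the bytes split into numbered chunks in `fs.chunks` (fields `files_id`, `n`, `data`); the default chunk size in current MongoDB releases is 255 KiB. The sketch below models only that layout in plain Scala to show why large shared files fit comfortably in the database; `FileDoc`, `ChunkDoc`, and `GridFsLayout` are hypothetical names and this is not the driver's GridFS API.

```scala
// Simplified model of the GridFS layout (not the driver API).
final case class FileDoc(id: String, filename: String, length: Long, chunkSize: Int)
final case class ChunkDoc(filesId: String, n: Int, data: Array[Byte])

object GridFsLayout {
  val DefaultChunkSize: Int = 255 * 1024 // 255 KiB

  // Split a file into numbered chunks plus one metadata document.
  def split(id: String, filename: String, bytes: Array[Byte],
            chunkSize: Int = DefaultChunkSize): (FileDoc, Vector[ChunkDoc]) = {
    require(chunkSize > 0, "chunkSize must be positive")
    val chunks = bytes.grouped(chunkSize).zipWithIndex.map { case (data, n) =>
      ChunkDoc(id, n, data)
    }.toVector
    (FileDoc(id, filename, bytes.length.toLong, chunkSize), chunks)
  }

  // Reassemble by concatenating chunks in `n` order.
  def join(chunks: Seq[ChunkDoc]): Array[Byte] = {
    val out = new java.io.ByteArrayOutputStream()
    chunks.sortBy(_.n).foreach(c => out.write(c.data))
    out.toByteArray
  }
}
```

Because each chunk is an ordinary document, the files get the same replication and backup story as the rest of the data.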
28. Performance + Optimization

• Loading data is fast!
  • Fixed collection padding, similarly sized records
  • Tail of collection is always in memory
  • Append faster than MySQL in every case tested
• But... random access started getting slow
  • Indexes in RAM? Yes
  • Data in RAM? No, > 2TB per server
  • Limited by disk I/O and seek performance
  • EC2 + EBS for storage?
29. Performance + Optimization

• Moved to a physical data center
  • DAS & 72GB RAM => great uncached performance
• Good move? Depends on the use case
  • If you need to “access anything anytime”, there are not many options
  • You want to support this?
30. Performance + Optimization

• Inserts are fast; how about updates?
  • Well… update => find object, update it, save
  • Lock acquired at “find”, released after “save”
    • If hitting disk, lock time could be large
• Easy answer: pre-fetch on update
  • Oh, and NEVER do “update all records” against a large collection
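Why pre-fetching helps: if the document is not in memory, the slow disk read happens while the write lock is held, blocking everyone else. Reading the document first, without the lock, pulls it into memory, so the locked find-modify-save only touches RAM. The toy store below models that effect and counts disk reads that happen under the lock; `ToyStore` is a hypothetical illustration and makes no claim about how MongoDB's locking is implemented internally.

```scala
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable

// Toy model: uncached documents cost a simulated disk read, and the single
// write lock is held from find through save.
final class ToyStore(diskLatencyMillis: Long) {
  private val disk = mutable.Map.empty[Long, Int]
  private val cache = mutable.Set.empty[Long]
  private val writeLock = new ReentrantLock()
  @volatile var diskReadsUnderLock = 0 // only incremented while holding writeLock

  def put(id: Long, value: Int): Unit = synchronized { disk(id) = value }

  // Faults the document into memory if it is not cached yet.
  def find(id: Long): Option[Int] = {
    val cached = synchronized(cache.contains(id))
    if (!cached) {
      if (writeLock.isHeldByCurrentThread) diskReadsUnderLock += 1
      Thread.sleep(diskLatencyMillis) // simulated seek
      synchronized(cache += id)
    }
    synchronized(disk.get(id))
  }

  private def findModifySave(id: Long, f: Int => Int): Unit = {
    writeLock.lock()
    try find(id).foreach(v => synchronized { disk(id) = f(v) })
    finally writeLock.unlock()
  }

  def update(id: Long)(f: Int => Int): Unit = findModifySave(id, f)

  def updateWithPrefetch(id: Long)(f: Int => Int): Unit = {
    find(id) // warm the cache without holding the write lock
    findModifySave(id, f)
  }
}
```

The same reasoning explains the warning about “update all records”: every cold document in a large collection would pay its disk read inside the lock.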
31. Performance + Optimization

• Indexes
  • Can't always keep the index in RAM; MMF “does its thing”
  • A right-balanced B-tree keeps the necessary part of the index hot
  • Indexes hit disk => mute your pager
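“Right-balanced” here means that monotonically increasing keys (such as the sequence-generated `_id` values) always land at the right edge of the index, so only the last few leaf pages need to stay in RAM. The sketch below uses a deliberately simplified model, an assumption rather than MongoDB's actual B-tree code: leaf pages are treated as consecutive runs of `pageSize` keys in sorted order, and it counts how many distinct leaves the most recent inserts touch. `IndexLocality` is a hypothetical name.

```scala
import java.util.TreeSet
import scala.collection.mutable
import scala.util.Random

// Simplified model: a key's leaf page = (number of smaller keys) / pageSize.
object IndexLocality {
  def leavesTouched(keys: Seq[Int], pageSize: Int, window: Int): Int = {
    val index = new TreeSet[Integer]()
    val touched = mutable.Set.empty[Int]
    keys.zipWithIndex.foreach { case (k, i) =>
      val rank = index.headSet(k).size() // keys already in the index that are smaller
      if (i >= keys.size - window) touched += rank / pageSize // only the recent inserts
      index.add(k)
    }
    touched.size
  }

  // Auto-increment style keys vs. the same keys in random order.
  def sequential(n: Int): Seq[Int] = 0 until n
  def random(n: Int, seed: Long): Seq[Int] = new Random(seed).shuffle((0 until n).toVector)
}
```

With 5,000 keys, 100 keys per page, and a window of the last 1,000 inserts, sequential keys touch exactly 10 leaf pages, while shuffled keys scatter across nearly every page in the index.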
32. More Mongo, Please!

• We modeled our word graph in Mongo
  • 50M nodes
  • 80M edges
  • 80 S edge fetch
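One common way to hold a graph in a document store is one document per edge, indexed on the source node, so fetching a word's neighbours is a single indexed lookup. The slide does not say how Wordnik laid out its word graph, so the `Edge` fields and `EdgeCollection` below are hypothetical; the in-memory `groupBy` stands in for a secondary index on `from`.

```scala
// Hypothetical edge-per-document layout for a word graph.
final case class Edge(from: String, to: String, relation: String)

final class EdgeCollection(edges: Seq[Edge]) {
  // Stand-in for a secondary index on the `from` field.
  private val byFrom: Map[String, Seq[Edge]] = edges.groupBy(_.from)

  // All outgoing edges of a word: one indexed lookup, no join.
  def outgoing(word: String): Seq[Edge] = byFrom.getOrElse(word, Seq.empty)

  // Neighbours reachable via one relation type.
  def neighbours(word: String, relation: String): Seq[String] =
    outgoing(word).filter(_.relation == relation).map(_.to)
}
```

Keeping each edge small and looking edges up by an indexed source key is what keeps edge fetches cheap even when the edge count runs into the tens of millions.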
33. More Mongo, Please!

• Analytics rolled up from aggregation jobs
  • Send to Hadoop, load to Mongo for fast access
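The roll-up pattern collapses raw events offline into one small summary record per key, and only those summaries are loaded into the serving store, so a read is a key lookup rather than a scan. The sketch below shows the shape of that roll-up in plain Scala; the event type, the (word, day) grouping, and the names `LookupEvent`, `DailyCount`, and `Rollup` are illustrative assumptions, and the aggregation step is what would run as a Hadoop job at scale.

```scala
import java.time.LocalDate

// Hypothetical raw event and rolled-up summary record.
final case class LookupEvent(word: String, day: LocalDate)
final case class DailyCount(word: String, day: LocalDate, count: Long)

object Rollup {
  // Aggregation step (the part that would run as a batch job).
  def aggregate(events: Seq[LookupEvent]): Seq[DailyCount] =
    events.groupBy(e => (e.word, e.day)).map { case ((w, d), es) =>
      DailyCount(w, d, es.size.toLong)
    }.toSeq.sortBy(c => (c.word, c.day.toEpochDay))

  // Serving step: summaries keyed for direct lookup (what gets loaded for fast access).
  def load(counts: Seq[DailyCount]): Map[(String, LocalDate), Long] =
    counts.map(c => (c.word, c.day) -> c.count).toMap
}
```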
34. What’s Next

• Liberate our models
  • Stop worrying about how to store them (for the most part)
• New features almost always NR
• Some MySQL left
  • Less on each release
35. Questions?

• See more about Wordnik APIs: http://developer.wordnik.com
• Migrating from MySQL to MongoDB: http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
• Maintaining your MongoDB Installation: http://www.slideshare.net/fehguy/mongo-sv-tony-tam
• Swagger API Framework: http://swagger.wordnik.com
• Mapping Benchmark: https://github.com/fehguy/mongodb-benchmark-tools
• Wordnik OSS Tools: https://github.com/wordnik/wordnik-oss