The days of the relational database being a one-stop shop for all of your persistence needs are over. Although NoSQL databases address some issues that relational databases cannot, the opposite is true as well. The relational database offers an unparalleled feature set and rock-solid stability. One cannot overstate the importance of using the right tool for the job, and for some jobs, one tool is not enough. This talk covers the strengths and weaknesses of both relational and NoSQL databases, the benefits and challenges of polyglot persistence, and examples of polyglot persistence in the wild.
These slides were presented at WindyCityDB 2010.
Polyglot Persistence - Two Great Tastes That Taste Great Together
1. Polyglot Persistence
Two Great Tastes
That Taste Great Together!
John Wood
john_p_wood@yahoo.com
@johnpwood
2. About Me
● Software Developer at Interactive Mediums
● Primarily work on a web application that allows our customers to engage and interact with their customers
● Writing code for about 15 years
● Tinkering with NoSQL for about 1.5 years
● Have a NoSQL solution that has been running in production for a year
14. The RDBMS Is No Longer The Default Choice
● Can be very difficult to scale horizontally
● Schemas can be difficult to maintain and migrate
● For some applications, the data integrity features of the RDBMS are an unnecessary overhead
● Data constraints and JOINs can be expensive at runtime
16. NoSQL Databases Have Stepped Up To Address These Issues
● Schema-less
● Little to no data integrity enforcement
● Self-contained data
● Eventually consistent
● Easy to scale horizontally to add processing power and storage
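To make the schema-less point concrete, here is a minimal sketch using the CouchRest gem that appears later in the deck. The database name and document fields are invented for illustration, and it assumes a CouchDB instance running locally; the point is that two documents with completely different shapes can live in the same database with no migration required.

require 'couchrest'

# Create (or open) a database; assumes CouchDB at the default local port.
db = CouchRest.database!("http://127.0.0.1:5984/polyglot_demo")

# No schema to declare: each document carries its own structure.
db.save_doc("type" => "user", "name" => "John Wood", "twitter" => "@johnpwood")
db.save_doc("type" => "contest_entry", "entry_number" => 42, "answers" => %w[a b c])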
18. But The RDBMS Is Far From Dead
● Incredibly mature, and battle tested
● Immediate and constant consistency
● Integrity of data is enforced
● Efficient use of storage space if data is normalized properly
● Supported by everyone and everything (tools, frameworks, libraries, etc.)
● Incredibly flexible and powerful query language
● Help is plentiful and easy to find
28. “Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand.” - Scott Leberknight, October 2008
http://www.nearinfinity.com/blogs/scott_leberknight/polyglot_persistence.html
53. class User < ActiveRecord::Base
end

class ContestEntry < CouchRest::ExtendedDocument
  property :entry_number
end
54. class User < ActiveRecord::Base
  def contest_entries
    ContestEntry.entries_for_user(self.id)
  end
end

class ContestEntry < CouchRest::ExtendedDocument
  property :entry_number
  property :user_id

  def self.entries_for_user(user_id)
    # Execute your view to fetch the contest entries
  end

  def user
    User.find_by_id(user_id)
  end
end
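The body of entries_for_user is left as a comment on the slide. As one hedged way to fill it in, ExtendedDocument's view_by macro declares a CouchDB view keyed on user_id and generates a by_user_id query method; the view name and this wiring are an assumption for illustration, not part of the original slides.

class ContestEntry < CouchRest::ExtendedDocument
  property :entry_number
  property :user_id

  # Declares a CouchDB view emitting each entry under its user_id key
  # (assumed here; the deck does not show the view definition).
  view_by :user_id

  def self.entries_for_user(user_id)
    # Query the generated view for this user's entries.
    by_user_id(:key => user_id)
  end
end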
58. ● Primary MySQL database with a backup
● A few very large tables, containing 5M – 30M rows each, and growing quickly
● Increasing query execution time
● Some pages on the web app were timing out
● Increasing database migration time
● Rigid schema of the RDBMS was preventing some planned features from moving forward
59. ● Brought in a consultant to help us optimize our MySQL setup
● Optimized slow queries
● Added some indexes
● Offloaded some work to the backup database
● Considered the use of summary tables for statistics
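As a sketch of what "added some indexes" looks like in a Rails application of that vintage, an ActiveRecord migration such as the one below would do it. The table and column names are hypothetical, not taken from the slides.

class AddContestEntryIndexes < ActiveRecord::Migration
  def self.up
    # Index the columns that the slow, frequently-run queries filter on.
    add_index :contest_entries, :user_id
    add_index :contest_entries, :created_at
  end

  def self.down
    remove_index :contest_entries, :created_at
    remove_index :contest_entries, :user_id
  end
end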
61. ● Migrated old data from large tables to CouchDB
● Using CouchDB views to aggregate summary data
● Data is imported and views are updated nightly
● Queries for statistics now very fast
● Using Lucene (via couchdb-lucene) for full-text searching
● Taking full advantage of CouchDB's schema-less nature in several new application features
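A CouchDB view is a map/reduce pair, which is what makes the nightly summary aggregation cheap to query. Below is a hedged sketch using ExtendedDocument's raw map/reduce form of view_by; the per-day entry count is an invented example, not the deck's actual view, and the exact option names are an assumption.

class ContestEntry < CouchRest::ExtendedDocument
  # Map each entry to (day, 1); reduce with CouchDB's built-in sum helper.
  view_by :day,
    :map => "function(doc) {
               if (doc['couchrest-type'] == 'ContestEntry') {
                 emit(doc.created_at.substr(0, 10), 1);
               }
             }",
    :reduce => "function(keys, values, rereduce) { return sum(values); }"
end

# :group => true returns one summed row per day key.
daily_counts = ContestEntry.by_day(:reduce => true, :group => true)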
63. ● CouchDB databases and views can be very large on disk
● Some queries could not be substituted with CouchDB views
● Indexing tens of millions of documents for full-text search with Lucene takes weeks
● Development takes longer, as the map/reduce model requires additional thought and planning
● Changing/upgrading views in production not straightforward
http://www.couch.io/migrating-to-couchdb
67. ● Vertically and horizontally partitioned MySQL
● Several layers of aggressive caching, all application managed
● Schema changes impossible, resulting in the use of bitfields and piggyback tables
● Hardware intensive
● Error prone
● Hitting MySQL limits
● Already eventually consistent
69. ● Migrating from MySQL to Cassandra as their main online data store
● Hadoop/HBase used for people search feature
● FlockDB used to manage the social graph
● Hadoop for analytics
● “As with all NoSQL systems, strengths in different situations” - Kevin Weil, Analytics Lead, Twitter
http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010
70. ● Increased availability
● The ability to support new features
● The ability to analyze their massive amount of data in a reasonable amount of time
http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010