NoSQL and CouchDB: the view from MOO

Published in: Technology

1. NoSQL and CouchDB
   The view from MOO

2. Who is Steve Storey?

• Hobby Coder for ~20 years
  Everything from Spectrum 48k through x86 assembler
• Professional Coder for 12 years
  Everything from Pascal + REXX up to Java + PHP
• Application Architect at MOO
  Everything from coding to meetings

3. Who are MOO?

• MOO is a London-based online printing company
• Launched in 2006 with one product – The MiniCard
• Expanded to 5 products
• Now has a UK and US printing/shipping facility
• We ship globally – to over 43 countries in our first year
  Now well over 100, including Antarctica

4. MOO Technology Stack

• Debian Lenny platform
• LAMP stack
• PHP 5.2

5. Where's the NoSQL?

    <This slide intentionally left blank>

        But who can tell what the future holds?

6. What is NoSQL?

• Mostly an inflammatory descriptive term
• Refers to a database of semi-structured data
  “semi-structured” defined however you like it
  “database” might or might not have in-built query capability
  not “relational” as per RDBMS, but might allow arbitrary relationships between data nodes
• 4 general types
  Key/Value – simple arbitrary data store (unstructured)
  Graph databases – Inspired by Euler + graph theory
  BigTable – clones of Google's BigTable database
  Document – essentially associative arrays

7. What is NoSQL NOT?

• A new idea
• A replacement for SQL
  NoSQL = “Not Only SQL” ?
  Entirely complementary to RDBMS systems
• Non-transactional
  This does slightly depend on your definition of transactional

8. Quick List of popular implementations

• Apache CouchDB
• MongoDB
• Amazon Dynamo
  Powers the Amazon S3 web service
• Memcached
• Neo4J
• More at http://en.wikipedia.org/wiki/Structured_storage

9. Quick List of less popular implementations

• Lotus Notes/Domino
  In fact – very popular with corporates, just not their employees
  1.0 released in 1989
  One of its engineers was Damien Katz, who later went on to write CouchDB

10. What is CouchDB?

• Document store
  A document is an associative array (in fact a JSON associative array)
• Allows developer-defined views on the documents
  Akin to materialised views found in Oracle
  Views use a Map/Reduce engine
• RESTful HTTP interface
  Client APIs written for most higher level languages
  Also means that you can host an AJAX app entirely in CouchDB
• Built-in fault tolerant replication
  NOTE! Not clustering
  “Eventually consistent”
  Lock-less updates (Multi-version concurrency control)

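As a rough illustration of the "JSON associative array over RESTful HTTP" idea, the sketch below builds a document and the PUT request that would store it, without sending anything. The field names, the database name moo_demo and the localhost URL are assumptions made up for this example and are not MOO's actual data model.

```python
import json
import urllib.request

# Hypothetical document: the field names are illustrative only.
doc = {
    "_id": "pack-0001",
    "type": "pack",
    "product": "MiniCard",
    "cards": [
        {"side": "front", "image": "photo-1.jpg"},
        {"side": "back", "text": "Hello from London"},
    ],
}


def build_put_request(base_url, db_name, document):
    """Build (but do not send) the HTTP PUT that would store the document."""
    url = f"{base_url}/{db_name}/{document['_id']}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed server address and database name; nothing is sent over the network.
request = build_put_request("http://localhost:5984", "moo_demo", doc)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The nested "cards" list is the kind of hierarchy that would need extra tables and joins in a relational model; here it travels as one document in one request.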
11. Why do I need documents?

• How much data is document like?
  Wikis
  Blogs
  SQL tables with CLOB fields (text/mediumtext/longtext)
• Schema-less
  Arbitrary fields can be added at any point to any document
  The DB doesn't attach any significance to (almost) any field
  “_id” and “_rev” are special
• Hierarchical data structures
  moo.com Pack data model

12. Woah! No schema?

• Requires thinking a bit differently
  Field usage is defined by the code
  Less restrictive in reality since different fields can be used for different concerns
  No type or null restrictions (but documents can be validated at save time)
  A document should represent the complete state of that part of the data model
• Doesn't necessarily mean acting very differently
  Does all your code definitely attach the same meaning to all the DB fields? Even the meaning of status flags?
  How long does it take to add a new column to a MySQL DB?
  How much time do developers take learning your ORM solution?
  How much time is spent mapping objects and relationships to tables, only to load the complete tree on every request?
  What happens to the careful DB guarantees if you shard your data?

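The point that documents "can be validated at save time" might look something like the sketch below. CouchDB's own hook is a validate_doc_update function stored in a design document and normally written in JavaScript; this is only a plain-Python analogue over an in-memory dict, and the "pack", "product" and "quantity" fields are hypothetical.

```python
class ValidationError(Exception):
    """Raised when a document is rejected at save time."""


def validate_doc_update(new_doc, old_doc):
    """Reject documents that break the rules this application relies on."""
    if new_doc.get("type") != "pack":
        return  # this sketch only has rules for pack documents
    quantity = new_doc.get("quantity")
    if not isinstance(quantity, int) or quantity <= 0:
        raise ValidationError("quantity must be a positive integer")
    if old_doc is not None and old_doc.get("product") != new_doc.get("product"):
        raise ValidationError("product cannot change once a pack is saved")


def save(store, doc):
    """Validate against the currently stored version, then store the document."""
    validate_doc_update(doc, store.get(doc["_id"]))
    store[doc["_id"]] = doc


store = {}
save(store, {"_id": "p1", "type": "pack", "product": "MiniCard", "quantity": 100})
# Extra fields are fine: nothing here plays the role of a schema.
save(store, {"_id": "p1", "type": "pack", "product": "MiniCard",
             "quantity": 200, "gift_wrap": True})
```

The rules live in code next to the data rather than in column definitions, which is the "field usage is defined by the code" trade-off in miniature.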
13. Simple Views

• Matches a set of documents on some condition
  the WHERE clause
  also the FROM clause
• Outputs a set of fields, or parts of the associative array
  the SELECT clause
• Usually coded in JavaScript
  CouchDB does however support alternative view servers
  View servers for Python, PHP, Ruby, Erlang, Perl available
• Uses only the “map” part of map/reduce
• No joins
  But the documents represent the full state of that part of the data model ... right?

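A map-only view can be imitated in a few lines of plain Python. This is a local simulation, not a CouchDB view server: run_map is a helper invented for this sketch, and the documents and field names are hypothetical. The id/key/value shape of the rows and the ordering by key mirror what a CouchDB view returns.

```python
def run_map(docs, map_fn):
    """Apply map_fn to every document and return the emitted rows sorted by key."""
    rows = []
    for doc in docs:
        # Bind the current document id so each emitted row records its source.
        def emit(key, value, _id=doc["_id"]):
            rows.append({"id": _id, "key": key, "value": value})
        map_fn(doc, emit)
    return sorted(rows, key=lambda row: row["key"])


def packs_by_product(doc, emit):
    """Map function: the 'if' plays FROM and WHERE, the emit plays SELECT."""
    if doc.get("type") == "pack":
        emit(doc["product"], doc["quantity"])


docs = [
    {"_id": "a", "type": "pack", "product": "MiniCard", "quantity": 100},
    {"_id": "b", "type": "customer", "name": "Ada"},
    {"_id": "c", "type": "pack", "product": "Business Card", "quantity": 50},
    {"_id": "d", "type": "pack", "product": "MiniCard", "quantity": 200},
]

rows = run_map(docs, packs_by_product)
for row in rows:
    print(row)
```

The map function only ever sees one document at a time, which is why there are no joins: anything the view needs has to be inside the document already.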
14. Advanced Views

• Still no joins
• Can perform complex calculations
  You can only rely on the content of the document being processed
  But the documents represent the full state of that part of the data model ... right?
• A Reduce function can be used to aggregate calculations
• Map and reduce intermediate results are indexed
  Once calculated for a document, they never need to be re-calculated until the document is updated
  It's therefore very fast!
• Not as obvious how to program them

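Adding a reduce step to a map function might be sketched as below. It is again a local simulation with hypothetical documents: the reduce function here takes only a list of values, whereas CouchDB's real reduce functions also receive the keys and a rereduce flag, and the indexing of intermediate results is not modelled at all.

```python
from itertools import groupby


def map_reduce(docs, map_fn, reduce_fn):
    """Run map_fn over all documents, then reduce the emitted values per key."""
    rows = []
    for doc in docs:
        map_fn(doc, lambda key, value: rows.append((key, value)))
    rows.sort(key=lambda row: row[0])
    return {
        key: reduce_fn([value for _, value in group])
        for key, group in groupby(rows, key=lambda row: row[0])
    }


def quantity_by_product(doc, emit):
    """Map step: one (product, quantity) row per pack document."""
    if doc.get("type") == "pack":
        emit(doc["product"], doc["quantity"])


docs = [
    {"_id": "a", "type": "pack", "product": "MiniCard", "quantity": 100},
    {"_id": "b", "type": "customer", "name": "Ada"},
    {"_id": "c", "type": "pack", "product": "Business Card", "quantity": 50},
    {"_id": "d", "type": "pack", "product": "MiniCard", "quantity": 200},
]

totals = map_reduce(docs, quantity_by_product, sum)
print(totals)
```

This corresponds roughly to a SQL GROUP BY with SUM, but the grouping key is whatever the map function chooses to emit.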
15. What about transactions?

• ACID compliant
  On a document-by-document basis
  Tolerant of a very wide array of failure modes due to Erlang paradigms
  The documented way to cleanly stop a CouchDB server is to kill the process
• No user-defined transactions
  Essentially it's auto-commit
  But the documents represent the full state of that part of the data model ... right?
  No isolation levels, so don't run your banking on it …
  No isolation levels to get in the way when you're storing data for a single user
  Effective isolation is READ_COMMITTED
• No distributed transactions
  The world is eventually consistent
  A given user tied to a particular CouchDB server will always have a consistent world view

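The per-document, lock-less update model rests on the "_rev" field: a write must name the revision it was based on, and a stale write is refused (CouchDB answers with HTTP 409 Conflict). The in-memory TinyStore below is an assumption-laden toy built for this example; in particular its "N-demo" revision strings only imitate the shape of real revision ids.

```python
class ConflictError(Exception):
    """Raised when a write is based on a revision that is no longer current."""


class TinyStore:
    """Toy in-memory store that imitates revision checks on each document."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy so callers cannot modify stored state directly.
        return dict(self._docs[doc_id])

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        current_rev = current["_rev"] if current else None
        if doc.get("_rev") != current_rev:
            raise ConflictError(f"document {doc['_id']} was updated by someone else")
        number = 1 if current is None else int(current_rev.split("-")[0]) + 1
        stored = dict(doc, _rev=f"{number}-demo")
        self._docs[doc["_id"]] = stored
        return stored["_rev"]


store = TinyStore()
store.put({"_id": "pack-1", "quantity": 100})

first = store.get("pack-1")    # two clients read the same revision
second = store.get("pack-1")

first["quantity"] = 150
store.put(first)               # accepted: based on the current revision

second["quantity"] = 200
try:
    store.put(second)          # refused: based on a stale revision
except ConflictError as error:
    print("conflict:", error)
```

No locks are taken at any point: the losing client simply re-reads the document and tries again, which is the "auto-commit per document" behaviour described above.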
16. Scaling

• Master/master replication strategy
• Eventually consistent replication
  But the documents represent the full state of that part of the data model ... right?
• Requires conflict resolution in the application code
  This might be as simple as last update wins
  Can equally be a user-driven process – the application code sees all conflicts of a document and can decide how to proceed
• Offline working is easy
  In fact – in-built for AJAX applications hosted within the CouchDB database

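A "last update wins" policy in application code could be as small as the sketch below. It assumes the application stamps every document with a hypothetical "updated_at" field holding a UTC ISO 8601 string (so plain string comparison orders the timestamps); CouchDB itself does not add such a field. The revision ids are made up.

```python
def resolve_last_update_wins(revisions):
    """Pick a winner among conflicting revisions of one document.

    Ties on the timestamp are broken by the revision id so that every
    replica applying this rule picks the same winner.
    """
    winner = max(revisions, key=lambda doc: (doc["updated_at"], doc["_rev"]))
    losers = [doc["_rev"] for doc in revisions if doc is not winner]
    return winner, losers


# Two replicas changed the same document while they were out of contact.
conflicting = [
    {"_id": "pack-1", "_rev": "2-aaa", "quantity": 100,
     "updated_at": "2010-03-01T10:00:00Z"},
    {"_id": "pack-1", "_rev": "2-bbb", "quantity": 150,
     "updated_at": "2010-03-01T10:05:00Z"},
]

winner, losers = resolve_last_update_wins(conflicting)
print("keep", winner["_rev"], "discard", losers)
```

In a real deployment the application would then remove the losing revisions so the conflict is marked as resolved; a user-driven process would show both versions instead of calling max.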
17. Weaknesses

• DBs require periodic compaction
  All document operations (including deletion) are appended to the DB file
• Under heavy update load, storage may be sub-optimal
  Try MongoDB, which does in-place updates – but requires greater transactional overhead as a result
• SQL skills don't map (or reduce) over to CouchDB

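Why an append-only file needs compaction can be seen with a toy log. The list of dicts below stands in for the DB file and is purely an assumption for illustration; the real on-disk format is quite different. Compaction here keeps only the newest entry per document, which for a deleted document is its deletion marker.

```python
def compact(log):
    """Return a new log that keeps only the newest entry for each document."""
    latest = {}
    for entry in log:
        latest[entry["_id"]] = entry   # later entries overwrite earlier ones
    return list(latest.values())


# Every create, update and delete is appended; nothing is changed in place.
log = [
    {"_id": "a", "_rev": "1-demo", "quantity": 100},
    {"_id": "b", "_rev": "1-demo", "quantity": 50},
    {"_id": "a", "_rev": "2-demo", "quantity": 150},
    {"_id": "b", "_rev": "2-demo", "_deleted": True},
    {"_id": "a", "_rev": "3-demo", "quantity": 200},
]

compacted = compact(log)
print(len(log), "entries before,", len(compacted), "after")
```

Three updates to one document leave three entries behind until compaction runs, which is the storage cost under heavy update load mentioned above.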
18. The known unknowns
    What's been left out

• Security
  Recently introduced in 0.11.0
  Pluggable authentication, defaults to CouchDB hosted _users database
  Together with the validation functionality, fairly powerful
• Caveat emptor ...
  There's more to the details of everything I've talked about

19. Concluding ...

• What's MOO doing with all this?
• NoSQL databases have their place
  There's more to the details of everything I've talked about
• SQL can do everything NoSQL can
  Might take rather longer to do it
  NoSQL is better suited for some use-cases
• Many different implementations for different use-cases
  Each has its own strengths and weaknesses
• Download and try a few!

20. Questions?

• Steve Storey - steves@moo.com
• CouchDB - http://couchdb.apache.org/
• Further reading - http://books.couchdb.org/relax/
