NoSQL and CouchDB: the view from MOO
 

    Presentation Transcript

    • NoSQL and CouchDB: The view from MOO
    • Who is Steve Storey?
        • Hobby Coder for ~20 years
            • Everything from Spectrum 48k through x86 assembler
        • Professional Coder for 12 years
            • Everything from Pascal + REXX up to Java + PHP
        • Application Architect at MOO
            • Everything from coding to meetings
    • Who are MOO?
        • MOO is a London-based online printing company
        • Launched in 2006 with one product – The MiniCard
        • Expanded to 5 products
        • Now has a UK and US printing/shipping facility
        • We ship globally – to over 43 countries in our first year
            • Now well over 100, including Antarctica
    • MOO Technology Stack
        • Debian Lenny platform
        • LAMP stack
        • PHP 5.2
    • Where's the NoSQL?
        • <This slide intentionally left blank>
        • But who can tell what the future holds?
    • What is NoSQL?
        • Mostly an inflammatory descriptive term
        • Refers to a database of semi-structured data
            • “semi-structured” defined however you like it
            • “database” might or might not have in-built query capability
            • not “relational” as per RDBMS, but might allow arbitrary relationships between data nodes
        • 4 general types
            • Key/Value – simple arbitrary data store (unstructured)
            • Graph databases – Inspired by Euler + graph theory
            • BigTable – clones of Google's BigTable database
            • Document – essentially associative arrays
    • What is NoSQL NOT?
        • A new idea
        • A replacement for SQL
            • NoSQL = “Not Only SQL”?
            • Entirely complementary to RDBMS systems
        • Non-transactional
            • This does slightly depend on your definition of transactional
    • Quick List of popular implementations
        • Apache CouchDB
        • MongoDB
        • Amazon Dynamo
            • Powers the Amazon S3 web service
        • Memcached
        • Neo4j
        • More at http://en.wikipedia.org/wiki/Structured_storage
    • Quick List of less popular implementations
        • Lotus Notes/Domino
            • In fact – very popular with corporates, just not their employees
            • 1.0 released in 1989
            • One of its engineers was Damien Katz, who later went on to write CouchDB
    • What is CouchDB?
        • Document store
            • A document is an associative array (in fact a JSON associative array)
        • Allows developer-defined views on the documents
            • Akin to materialised views found in Oracle
            • Views use a Map/Reduce engine
        • RESTful HTTP interface
            • Client APIs written for most higher level languages
            • Also means that you can host an AJAX app entirely in CouchDB
        • Built-in fault tolerant replication
            • NOTE! Not clustering
            • “Eventually consistent”
            • Lock-less updates (Multi-version concurrency control)
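
As a rough illustration of the document model and HTTP interface described on this slide, the sketch below builds a JSON document and the URL at which CouchDB's HTTP API would address it. The database name `moo_demo`, the document fields and the helper `doc_url` are hypothetical; only the `/{db}/{docid}` layout, the default port 5984 and the special `_id` field are CouchDB's own. Nothing is sent over the network.

```python
import json
from urllib.parse import quote

# Hypothetical document: every field except "_id" is invented for illustration.
doc = {
    "_id": "pack-0001",
    "type": "pack",
    "product": "MiniCard",
    "cards": [
        {"side": "front", "image": "a.png"},
        {"side": "back", "text": "Hello"},
    ],
}


def doc_url(base, db, doc_id):
    """Build the URL that addresses a single document: <base>/<db>/<docid>."""
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


body = json.dumps(doc, sort_keys=True)
url = doc_url("http://localhost:5984", "moo_demo", doc["_id"])
# A client would create the document with:  PUT <url>  and `body` as payload,
# and read it back with:                    GET <url>
print(url)
```

Because the document is just nested JSON, hierarchical data such as the list of cards travels as one unit rather than being spread over several tables.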
    • Why do I need documents?
        • How much data is document-like?
            • Wikis
            • Blogs
            • SQL tables with CLOB fields (text/mediumtext/longtext)
        • Schema-less
            • Arbitrary fields can be added at any point to any document
            • The DB doesn't attach any significance to (almost) any field
            • “_id” and “_rev” are special
        • Hierarchical data structures
            • moo.com Pack data model
    • Woah! No schema?
        • Requires thinking a bit differently
            • Field usage is defined by the code
            • Less restrictive in reality since different fields can be used for different concerns
            • No type or null restrictions (but documents can be validated at save time)
            • A document should represent the complete state of that part of the data model
        • Doesn't necessarily mean acting very differently
            • Does all your code definitely attach the same meaning to all the DB fields? Even the meaning of status flags?
            • How long does it take to add a new column to a MySQL DB?
            • How much time do developers take learning your ORM solution?
            • How much time is spent mapping objects and relationships to tables, only to load the complete tree on every request?
            • What happens to the careful DB guarantees if you shard your data?
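
The point about validating documents at save time can be sketched as follows. In CouchDB itself such checks are written as `validate_doc_update` functions (normally in JavaScript) stored in a design document; the Python below only mimics the idea against a plain dict. The `save` helper, the `ValidationError` class and the `pack`/`quantity` fields are all assumptions made for illustration.

```python
class ValidationError(Exception):
    """Raised when a document is rejected at save time."""


def validate_pack(new_doc, old_doc):
    """Reject 'pack' documents that lack a positive integer quantity."""
    if new_doc.get("type") != "pack":
        return  # other document types are not this validator's concern
    quantity = new_doc.get("quantity")
    if type(quantity) is not int or quantity <= 0:
        raise ValidationError("pack.quantity must be a positive integer")


def save(store, doc, validators=(validate_pack,)):
    """Run every validator, then store the document (a plain dict here)."""
    old_doc = store.get(doc["_id"])
    for validate in validators:
        validate(doc, old_doc)
    store[doc["_id"]] = doc


store = {}
save(store, {"_id": "p1", "type": "pack", "quantity": 50})
# No schema: a different kind of document with different fields is fine.
save(store, {"_id": "n1", "type": "note", "anything": ["goes", "here"]})
try:
    save(store, {"_id": "p2", "type": "pack", "quantity": "lots"})
except ValidationError as err:
    rejected = str(err)
```

The rule lives in code rather than in a table definition, which is the trade the slide describes: the database enforces nothing by default, but the application can still refuse bad writes.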
    • Simple Views
        • Matches a set of documents on some condition
            • the WHERE clause
            • also the FROM clause
        • Outputs a set of fields, or parts of the associative array
            • the SELECT clause
        • Usually coded in JavaScript
            • CouchDB does however support alternative view servers
            • View servers for Python, PHP, Ruby, Erlang, Perl available
        • Uses only the “map” part of map/reduce
        • No joins
            • But the documents represent the full state of that part of the data model ... right?
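
A minimal sketch of what a map-only view does, written in plain Python rather than as a real CouchDB view function. The generator `packs_by_product` stands in for a map function that calls `emit(key, value)`; `run_view` and the sample documents are hypothetical. The only CouchDB behaviour relied on is that view rows carry `id`, `key` and `value` and come back sorted by key.

```python
def packs_by_product(doc):
    """Map step: yield (key, value) pairs, like emit(key, value) in a view."""
    if doc.get("type") == "pack":              # the FROM / WHERE part
        yield doc["product"], doc["quantity"]  # the SELECT part


def run_view(docs, map_fn):
    """Apply map_fn to every document and return rows sorted by key."""
    rows = [
        {"id": doc["_id"], "key": key, "value": value}
        for doc in docs
        for key, value in map_fn(doc)
    ]
    return sorted(rows, key=lambda row: (row["key"], row["id"]))


docs = [
    {"_id": "a", "type": "pack", "product": "MiniCard", "quantity": 100},
    {"_id": "b", "type": "customer", "name": "Ada"},
    {"_id": "c", "type": "pack", "product": "Business Card", "quantity": 50},
]
rows = run_view(docs, packs_by_product)
for row in rows:
    print(row)
```

Note that the map function sees one document at a time, which is why there are no joins: anything the view needs has to be inside that document.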
    • Advanced Views
        • Still no joins
        • Can perform complex calculations
            • You can only rely on the content of the document being processed
            • But the documents represent the full state of that part of the data model ... right?
        • A Reduce function can be used to aggregate calculations
        • Map and reduce intermediate results are indexed
            • Once calculated for a document, they never need to be re-calculated until the document is updated
            • It's therefore very fast!
        • Not as obvious how to program them
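
The idea that map results are kept until a document changes, with a reduce step aggregating them, can be sketched like this. It is a toy model under stated assumptions: the `IncrementalView` class, its `map_calls` counter and the sample documents are hypothetical, revisions are simplified to short strings, deletions are ignored, and only the map output is cached (CouchDB also stores intermediate reduce results in its index, which this sketch does not attempt).

```python
class IncrementalView:
    """Cache map output per document revision; reduce on query."""

    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self._rows = {}      # doc id -> (revision, list of (key, value))
        self.map_calls = 0   # counts how often map_fn actually ran

    def update(self, docs):
        """Re-run map_fn only for documents whose revision changed."""
        for doc in docs:
            cached = self._rows.get(doc["_id"])
            if cached is None or cached[0] != doc["_rev"]:
                self.map_calls += 1
                self._rows[doc["_id"]] = (doc["_rev"], list(self.map_fn(doc)))

    def query(self):
        """Group cached values by key and reduce each group."""
        grouped = {}
        for _rev, rows in self._rows.values():
            for key, value in rows:
                grouped.setdefault(key, []).append(value)
        return {key: self.reduce_fn(vals) for key, vals in grouped.items()}


def quantity_by_product(doc):
    if doc.get("type") == "pack":
        yield doc["product"], doc["quantity"]


docs = [
    {"_id": "a", "_rev": "1", "type": "pack", "product": "MiniCard", "quantity": 100},
    {"_id": "b", "_rev": "1", "type": "pack", "product": "MiniCard", "quantity": 50},
    {"_id": "c", "_rev": "1", "type": "pack", "product": "Postcard", "quantity": 20},
]
view = IncrementalView(quantity_by_product, sum)
view.update(docs)
first = view.query()

docs[1] = dict(docs[1], _rev="2", quantity=75)  # only document "b" changes
view.update(docs)
second = view.query()
```

After the second `update`, the map function has run four times in total rather than six, because the two unchanged documents were served from the cache.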
    • What about transactions?
        • ACID compliant
            • On a document-by-document basis
            • Tolerant of a very wide array of failure modes due to Erlang paradigms
            • The documented way to cleanly stop a CouchDB server is to kill the process
        • No user-defined transactions
            • Essentially it's auto-commit
            • But the documents represent the full state of that part of the data model ... right?
            • No isolation levels, so don't run your banking on it …
            • No isolation levels to get in the way when you're storing data for a single user
            • Effective isolation is READ_COMMITTED
        • No distributed transactions
            • The world is eventually consistent
            • A given user tied to a particular CouchDB server will always have a consistent world view
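
Per-document atomicity without locks can be illustrated with a small in-memory store that accepts a write only if it is based on the current revision. This is a sketch, not CouchDB's implementation: `TinyStore` and `Conflict` are hypothetical names, and revisions are plain integers here, whereas real CouchDB revisions are strings and a stale write is answered with an HTTP 409 Conflict.

```python
import copy


class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        """Return a private copy, so readers never block writers."""
        return copy.deepcopy(self._docs[doc_id])

    def put(self, doc):
        """Store doc only if its _rev matches the current revision."""
        current = self._docs.get(doc["_id"])
        current_rev = current["_rev"] if current else None
        if doc.get("_rev") != current_rev:
            raise Conflict(doc["_id"])
        stored = copy.deepcopy(doc)
        stored["_rev"] = (current_rev or 0) + 1
        self._docs[doc["_id"]] = stored
        return stored["_rev"]


store = TinyStore()
store.put({"_id": "basket-42", "items": []})

alice = store.get("basket-42")   # both readers see revision 1
bob = store.get("basket-42")

alice["items"].append("MiniCards")
new_rev = store.put(alice)       # accepted, document moves to revision 2

bob["items"].append("Postcards")
try:
    store.put(bob)               # still based on revision 1
    outcome = "saved"
except Conflict:
    outcome = "conflict"
```

The losing writer is expected to re-read the document and retry, which works well when each document holds the full state for one user and contention is rare.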
    • Scaling
        • Master/master replication strategy
        • Eventually consistent replication
            • But the documents represent the full state of that part of the data model ... right?
        • Requires conflict resolution in the application code
            • This might be as simple as last update wins
            • Can equally be a user-driven process – the application code sees all conflicts of a document and can decide how to proceed
        • Offline working is easy
            • In fact – in-built for AJAX applications hosted within the CouchDB database
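
A "last update wins" policy, the simplest resolution strategy mentioned on this slide, might look like the sketch below. It assumes the application maintains its own `updated_at` field in a sortable ISO 8601 format (CouchDB does not timestamp documents for you); the function name, the sample revisions and the tie-break on `_rev` are likewise assumptions for illustration.

```python
def last_update_wins(revisions):
    """Pick the most recently updated revision; return (winner, losers).

    Ties on the timestamp are broken by revision id so that every replica
    running this function reaches the same decision.
    """
    ordered = sorted(revisions, key=lambda d: (d["updated_at"], d["_rev"]))
    return ordered[-1], ordered[:-1]


conflicting = [
    {"_id": "profile-7", "_rev": "2-aaa",
     "updated_at": "2010-05-01T10:00:00Z", "email": "old@example.com"},
    {"_id": "profile-7", "_rev": "2-bbb",
     "updated_at": "2010-05-01T10:05:00Z", "email": "new@example.com"},
]
winner, losers = last_update_wins(conflicting)
# The application would then keep `winner` and delete the losing revisions.
```

A user-driven process would replace this function with one that shows both versions to the user and stores whatever they choose.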
    • Weaknesses
        • DBs require periodic compaction
            • All document operations (including deletion) are appended to the DB file
        • Under heavy update load, storage may be sub-optimal
            • Try MongoDB, which does in-place updates – but requires greater transactional overhead as a result
        • SQL skills don't map (or reduce) over to CouchDB
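
Why an append-only file needs compaction can be seen in a toy model where the "file" is a list and every write, including a delete, adds an entry. The `compact` function and the log layout are hypothetical, and revisions are simplified to integers; the sketch keeps the latest entry per document, deletion markers included, and drops everything older.

```python
def compact(log):
    """Keep only the most recent entry for each document id."""
    latest = {}
    for entry in log:
        latest[entry["_id"]] = entry  # later entries supersede earlier ones
    return list(latest.values())


# Four operations on two documents: the log only ever grows.
log = [
    {"_id": "a", "_rev": 1, "qty": 10},
    {"_id": "b", "_rev": 1, "qty": 5},
    {"_id": "a", "_rev": 2, "qty": 12},      # update appends a new entry
    {"_id": "b", "_rev": 2, "_deleted": True},  # so does a delete
]
compacted = compact(log)
print(len(log), "->", len(compacted))
```

Under a heavy update load the log grows with every write, so the gap between the file size and the live data is what periodic compaction reclaims.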
    • The known unknowns: what's been left out
        • Security
            • Recently introduced in 0.11.0
            • Pluggable authentication, defaults to CouchDB hosted _users database
            • Together with the validation functionality, fairly powerful
        • Caveat emptor ...
            • There's more to the details of everything I've talked about
    • Concluding ...
        • What's MOO doing with all this?
        • NoSQL databases have their place
            • There's more to the details of everything I've talked about
        • SQL can do everything NoSQL can
            • Might take rather longer to do it
            • NoSQL is better suited for some use-cases
        • Many different implementations for different use-cases
            • Each has its own strengths and weaknesses
        • Download and try a few!
    • Questions?
        • Steve Storey - steves@moo.com
        • CouchDB - http://couchdb.apache.org/
        • Further reading - http://books.couchdb.org/relax/