CouchDB: No SQL? No driver? No problem

I cover the basics of CouchDB and use a simple example of storing and querying a protein annotation record from NCBI.

  • - If the key is a DateTime, then B-tree is a much better choice
  • Brewer’s CAP Theorem http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

    Partition tolerance encompasses both business logic and data partitioning.

    PAXOS will override more recent updates to a disconnected resource if it did not vote on a previous transaction.
  • Highlighted words covered later in order that they appear
  • Other stuff, but this is the most relevant for the discussion

    Older browsers only support green verbs
  • CRUD = Create Read Update Delete
  • Next: the API discussion
  • You can give a “count” parameter to UUID function:

    $ curl -X GET http://localhost:5984/_uuids?count=10
  • The revision can be given as a URL parameter or in the If-Match HTTP header (CouchDB returns the current revision in the ETag header).

    You cannot delete a specific revision! The revision number is only there so that the server can definitively say you are talking about the most recent record.

    The delete itself gets a revision so that it can be replicated to other servers that are being synced to this one.
  • Might also be able to delete a particular version. Will have to check that.
  • Note: I could’ve made GI a number, but did not in this case
    Zipcodes would be a bad thing to turn into numbers, b/c of possible leading zeros
  • Best practice = One design document per application or set of requirements

    Next: Map-Reduce Views
  • We are just going to take a look at a simple plain-text example of a FASTA file
  • Append-only file structure ensures that your DB is always valid, even during mid-write server failures.

Transcript

  • 1. CouchDB: No SQL? No Driver? No problem. *

    Angel Pizarro, angel@upenn.edu

    * www.bauwel-movement.co.uk/sculpture.php
  • 2. About Me
    - CBIL alumnus!
    - Work in mass spec proteomics
    - Lots of data in lots of formats in bioinformatics
    - Ruby for programming and Ruby on Rails for Web apps
    - But that doesn’t matter for CouchDB!
    - Interested in CouchDB for AWS deployment
  • 3. Overview
    - Talk about Key-Value stores
    - Introduce some general theory and concepts
    - CouchDB specifics
    - Example problem
    - More CouchDB specifics
    - Questions?
  • 4. Key-Value Databases
    - Datastore of values indexed by keys (duh!)
    - Hash or B-Tree index for keys
    - Hash is FAST, but only allows single-value lookups
    - B-Tree is slower, but allows range queries
    - Horizontally scalable via key partitioning
    - E.g. Cassandra
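The trade-off between the two index types can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: a `Map` stands in for a hash index, a sorted array with binary search stands in for a B-tree, and the date keys and `doc-*` values are made up.

```javascript
// Hash index: a Map gives fast exact-key lookups, but has no notion of key order.
const hashIndex = new Map([
  ["2009-07-09", "doc-a"],
  ["2009-07-10", "doc-b"],
  ["2009-08-01", "doc-c"],
]);

// Ordered index: a sorted array stands in for a B-tree in this sketch.
const orderedIndex = [...hashIndex.entries()].sort(([a], [b]) =>
  a < b ? -1 : a > b ? 1 : 0
);

// Binary search for the first position whose key is >= target.
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query: all values whose key satisfies startKey <= key <= endKey.
function rangeQuery(entries, startKey, endKey) {
  const out = [];
  for (
    let i = lowerBound(entries, startKey);
    i < entries.length && entries[i][0] <= endKey;
    i++
  ) {
    out.push(entries[i][1]);
  }
  return out;
}

// Exact lookup works on either index; the range scan needs the ordered one.
console.log(hashIndex.get("2009-07-10"));
console.log(rangeQuery(orderedIndex, "2009-07-01", "2009-07-31"));
```

The range scan only works because the keys are kept in order, which is the property a B-tree maintains on disk.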
  • 5. The CAP theorem: applies when business logic is separate from storage
    - Consistency vs. Availability vs. Partition tolerance
    - RDBMS = enforced consistency
    - PAXOS = quorum consistency
    - CouchDB (and others) = eventual consistency and horizontally scalable
    - http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  • 6. CouchDB
  • 7-11. CouchDB
    - Document Oriented Database
    - JSON documents
    - HTTP protocol using REST operations
    - No direct native language drivers *
    - JavaScript is the lingua franca
    - ACID & MVCC guarantees on a per-document basis
    - Map-Reduce indexing and views
    - Back-ups and replication are easy-peasy

    * Hovercraft: http://github.com/jchris/hovercraft/
  • 12-13. JavaScript Object Notation *

    * http://www.json.org
  • 14. Example JSON

    { "name": "J. Doe", "friends": 0, "traits": ["nice", "outgoing"] }
  • 15-22. REST
    - Representational State Transfer
    - Client-server separation with a uniform interface (HTTP)
    - Load-balancing, caching, authorization & authentication, proxies
    - Stateless: the client is responsible for creating a self-sufficient request
    - Resources are cacheable: servers must mark non-cacheable resources as such
    - Only 5 HTTP verbs matter here: GET, PUT, POST, DELETE, HEAD
  • 23. CouchDB REST/CRUD
    - GET: read
    - PUT: create or update
    - DELETE: delete something
    - POST: bulk operations
  • 24. CouchDB passes the ACID test
    - Each document is completely self-sufficient
    - Each document has a version number
    - An update operation writes a complete new copy of the record, which is assigned the new version number
    - Append-only file structure allows the write to occur while still serving read requests
  • 25. MVCC: RDBMS vs. CouchDB
    - Multi-Version Concurrency Control
    - An RDBMS enforces consistency using read/write locks
    - Instead of locks, CouchDB just serves up old data
    - Multi-document (multi-row) transactional semantics must be handled by the application
  • 26-31. Database API

    Create a DB:

    $ curl -X PUT http://127.0.0.1:5984/friendbook
    {"ok":true}

    The URL is made up of the protocol (http), the CouchDB server (127.0.0.1:5984), and the DB name (friendbook).

    Try it again:

    {"error":"db_exists"}

    Delete a DB (not recoverable!):

    $ curl -X DELETE http://localhost:5984/friendbook
    {"ok":true}
  • 32-34. Inserting a document

    All inserts require that you give a unique ID. You can request one from CouchDB:

    $ curl -X GET http://localhost:5984/_uuids
    {"uuids":["d1dde0996a4db7c1ebc78fb89c01b9e6"]}

    We’ll just give one (-d @j_doe.json reads the body from a JSON file):

    $ curl -X PUT http://localhost:5984/friendbook/j_doe -d @j_doe.json
    {"ok":true, "id":"j_doe", "rev":"1-062af1c4ac73287b7e07396c86243432"}
  • 35. Full JSON document

    Before:

    { "name": "J. Doe", "friends": 0 }

    After:

    { "_id": "j_doe", "_rev": "1-062af1c4ac73287b7e07396c86243432", "name": "J. Doe", "friends": 0 }
  • 36-37. Updating a document

    $ curl -X PUT http://localhost:5984/friendbook/j_doe -d '{"name": "J. Doe", "friends": 1 }'
    {"error":"conflict","reason":"Document update conflict."}

    Must give _rev (revision number) for updates!

    revised.json:

    { "_rev":"1-062af1c4ac73287b7e07396c86243432", "name":"J. Doe", "friends": 1 }

    $ curl -X PUT http://localhost:5984/friendbook/j_doe -d @revised.json
    {"ok":true,"id":"j_doe","rev":"2-0629239b53a8d146a3a3c4c63e2dbfd0"}
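The revision check behind that conflict error can be mimicked with a small in-memory store. This is a minimal sketch, not CouchDB's implementation: `TinyStore` is a hypothetical class, and the way it derives the revision suffix (an MD5 of the body) is an assumption made only so that revisions look familiar.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory store that enforces "updates must name the current _rev".
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  put(id, doc) {
    const current = this.docs.get(id);
    // An update to an existing document must carry the revision it is based on.
    if (current && doc._rev !== current._rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    // The number before the dash counts how many times the document was written.
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const { _id, _rev, ...body } = doc;
    // Assumption: hash the body to get a suffix; real CouchDB computes its own.
    const hash = crypto
      .createHash("md5")
      .update(JSON.stringify(body))
      .digest("hex");
    const rev = `${generation}-${hash}`;
    this.docs.set(id, { ...body, _id: id, _rev: rev });
    return { ok: true, id, rev };
  }

  get(id) {
    return this.docs.get(id);
  }
}

const store = new TinyStore();
const created = store.put("j_doe", { name: "J. Doe", friends: 0 });
// No _rev given: rejected, just like the first curl call on the slide.
const rejected = store.put("j_doe", { name: "J. Doe", friends: 1 });
// Same update with the current _rev: accepted and bumped to generation 2.
const updated = store.put("j_doe", { _rev: created.rev, name: "J. Doe", friends: 1 });
```

Because the writer has to prove it saw the latest revision, two clients editing the same document cannot silently overwrite each other; the slower one gets the conflict and must re-read.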
  • 38. Deleting a document

    $ curl -X DELETE http://localhost:5984/friendbook/j_doe
    {"error":"conflict","reason":"Document update conflict."}

    Must give revision number for deletes!

    $ curl -X DELETE http://localhost:5984/friendbook/j_doe?rev=2-0629239b53a8d146a3a3c4c63e2dbfd0
    {"ok":true,"id":"j_doe", "rev":"3-57673a4b7b662bb916cc374a92318c6b"}

    Returns a revision number for the delete.

    $ curl -X GET http://localhost:5984/friendbook/j_doe
    {"error":"not_found","reason":"deleted"}
  • 39. Bulk operation

    POST /database/_bulk_docs with a JSON document containing all of the new or updated documents.

    // documents to bulk upload
    { "docs": [
      {"_id": "0", "integer": 0, "string": "0"},
      {"_id": "1", "integer": 1, "string": "1"},
      {"_id": "2", "integer": 2, "string": "2"}
    ] }

    // reply from CouchDB
    [
      {"id":"0","rev":"1-62657917"},
      {"id":"1","rev":"1-2089673485"},
      {"id":"2","rev":"1-2063452834"}
    ]
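A client-side helper for the bulk endpoint might look like the sketch below. The function names (`bulkDocsBody`, `splitBulkReply`, `postBulk`) are hypothetical, and `postBulk` assumes a runtime with a global `fetch` (for example Node 18 or later) and a reachable CouchDB server, so it is defined but not called here.

```javascript
// Wrap documents in the {"docs": [...]} envelope that _bulk_docs expects.
function bulkDocsBody(docs) {
  return JSON.stringify({ docs });
}

// Split a _bulk_docs reply into rows that were saved and rows that were rejected.
// Saved rows carry a "rev"; rejected rows carry an "error" instead.
function splitBulkReply(reply) {
  return {
    saved: reply.filter((row) => row.rev !== undefined),
    failed: reply.filter((row) => row.error !== undefined),
  };
}

// Send the documents to <dbUrl>/_bulk_docs. Not invoked in this sketch.
async function postBulk(dbUrl, docs) {
  const response = await fetch(`${dbUrl}/_bulk_docs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: bulkDocsBody(docs),
  });
  return splitBulkReply(await response.json());
}

// Offline demonstration with a hand-written reply.
const body = bulkDocsBody([
  { _id: "0", integer: 0, string: "0" },
  { _id: "1", integer: 1, string: "1" },
]);
const outcome = splitBulkReply([
  { id: "0", rev: "1-62657917" },
  { id: "1", error: "conflict", reason: "Document update conflict." },
]);
```

Checking each row matters because a bulk request can partly succeed: some documents are written while others come back with an error.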
  • 40. GOTCHAs!
    - Version storage is not guaranteed! Do not use this as a VCS!
    - POST to /db/_compact deletes all older versions
    - To “roll back a transaction” you must:
      - Retrieve all related records and cache them
      - Insert any updates to records
      - On failure, use the returned revision numbers to re-insert the older record as a new one
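The manual roll-back procedure can be sketched as follows. Everything here is illustrative: `makeFakeDb` is a hypothetical in-memory stand-in for a database with simplified revisions of the form `N-x`, and `updateAllOrRollBack` is one possible way to arrange the three steps, not a CouchDB feature.

```javascript
// Hypothetical in-memory stand-in for a database with revision checks.
function makeFakeDb(initial) {
  const docs = new Map(initial.map((d) => [d._id, { ...d }]));
  return {
    get: (id) => ({ ...docs.get(id) }),
    put(doc) {
      const current = docs.get(doc._id);
      if (current && current._rev !== doc._rev) return { error: "conflict" };
      const rev = `${parseInt(doc._rev || "0", 10) + 1}-x`;
      docs.set(doc._id, { ...doc, _rev: rev });
      return { ok: true, id: doc._id, rev };
    },
  };
}

function updateAllOrRollBack(db, updates) {
  // Step 1: retrieve and cache every related record before touching anything.
  const cached = updates.map((u) => db.get(u._id));
  const applied = []; // one { original, rev } entry per successful write
  for (let i = 0; i < updates.length; i++) {
    // Step 2: insert the update.
    const result = db.put(updates[i]);
    if (result.error) {
      // Step 3: re-insert the cached copies as new revisions, using the
      // revision numbers returned by the writes that did succeed.
      for (const { original, rev } of applied) {
        db.put({ ...original, _rev: rev });
      }
      return { ok: false, failedId: updates[i]._id };
    }
    applied.push({ original: cached[i], rev: result.rev });
  }
  return { ok: true };
}

const db = makeFakeDb([
  { _id: "a", _rev: "1-x", n: 1 },
  { _id: "b", _rev: "1-x", n: 1 },
]);
// The second update carries a stale revision, so the whole batch is undone.
const result = updateAllOrRollBack(db, [
  { _id: "a", _rev: "1-x", n: 2 },
  { _id: "b", _rev: "0-x", n: 2 },
]);
```

Note that the "rolled back" document ends up with a higher revision than it started with: the old content is restored by writing it again, not by rewinding history.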
  • 41-46. Our Example Problem

    Hello world? Blog? Twitter clone?

    Let’s store all human proteins instead

    LOCUS       YP_003024029             227 aa            linear   PRI 09-JUL-2009
    DEFINITION  cytochrome c oxidase subunit II [Homo sapiens].
    ACCESSION   YP_003024029
    VERSION     YP_003024029.1  GI:251831110
    DBLINK      Project:30353
    DBSOURCE    REFSEQ: accession NC_012920.1
    KEYWORDS    .
    SOURCE      mitochondrion Homo sapiens (human)
      ORGANISM  Homo sapiens
                Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
                Catarrhini; Hominidae; Homo.
    FEATURES             Location/Qualifiers
         source          1..227
                         /organism="Homo sapiens"
                         /organelle="mitochondrion"
                         /isolation_source="caucasian"
                         /db_xref="taxon:9606"
                         /tissue_type="placenta"
                         /country="United Kingdom: Great Britain"
                         /note="this is the rCRS"
         Protein         1..227
                         /product="cytochrome c oxidase subunit II"
                         /calculated_mol_wt=25434

    http://www.ncbi.nlm.nih.gov/
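Turning such a record into a JSON document could go roughly like this. It is a minimal sketch that only handles the single-line header fields shown on the slide; the function name `genbankToDoc` and the output field names are assumptions, while using the accession as `_id` and keeping the GI as a string follow the talk (see the view result and the slide notes).

```javascript
const record = [
  "LOCUS       YP_003024029             227 aa            linear   PRI 09-JUL-2009",
  "DEFINITION  cytochrome c oxidase subunit II [Homo sapiens].",
  "ACCESSION   YP_003024029",
  "VERSION     YP_003024029.1  GI:251831110",
  "DBSOURCE    REFSEQ: accession NC_012920.1",
].join("\n");

// Parse the top-level KEYWORD lines of a GenBank-style record into a document.
function genbankToDoc(text) {
  const fields = {};
  for (const line of text.split("\n")) {
    // A header line starts with an upper-case keyword followed by whitespace.
    const match = /^([A-Z]+)\s+(.*)$/.exec(line);
    if (match) fields[match[1]] = match[2].trim();
  }
  const [version, gi] = fields.VERSION.split(/\s+/);
  return {
    _id: fields.ACCESSION,
    definition: fields.DEFINITION,
    version,
    gi: gi.replace(/^GI:/, ""), // kept as a string, not converted to a number
    length: parseInt(fields.LOCUS.split(/\s+/)[1], 10),
    dbsource: fields.DBSOURCE,
  };
}

const proteinDoc = genbankToDoc(record);
console.log(JSON.stringify(proteinDoc, null, 2));
```

A fuller converter would also need to handle continuation lines and the FEATURES table, which this sketch deliberately skips.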
  • 47-50. Futon: A Couchapp

    This one is going to be a bit tougher
  • 51-54. Design Documents
    - The key to using CouchDB as more than a key-value store
    - Just another JSON document, but it contains JavaScript functions that CouchDB treats as application code
    - Functions are executed within CouchDB
    - Contains sections for map-reduce views, data validation, alternate formatting, ...
    - Also library code & data structures specific to the design document
  • 55-57. Soy Map!
    - Views use a Map-Reduce model for indexing and defining “virtual” documents
    - Fits well with assumptions of self-sufficient documents and eventual consistency
    - Map function is applied to all documents in the database
    - Emits (parts of) documents that pass muster
    - Indexing is incremental after an initial definition
    - You can choose to defer an index update for insert speed
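How a map function builds a view can be simulated outside CouchDB. In this sketch the `emit` function and `buildView` helper are stand-ins written for illustration, the `db_xref` array on each document is an assumed document shape, and the plain string sort only approximates CouchDB's key collation.

```javascript
let viewRows = [];
let currentDocId = null;

// Stand-in for the emit() function CouchDB makes available to map functions.
function emit(key, value) {
  viewRows.push({ id: currentDocId, key, value });
}

// Map function: one index row per cross-reference on the document.
// Documents without a db_xref array emit nothing and are left out of the view.
const dbXrefMap = function (doc) {
  if (Array.isArray(doc.db_xref)) {
    doc.db_xref.forEach(function (xref) {
      emit(xref, doc._id);
    });
  }
};

// Apply the map function to every document, then sort the rows by key.
function buildView(mapFn, docs) {
  viewRows = [];
  for (const doc of docs) {
    currentDocId = doc._id;
    mapFn(doc);
  }
  return viewRows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { _id: "YP_003024029", db_xref: ["taxon:9606"] },
  { _id: "NP_000006", db_xref: ["GeneID:10"] },
  { _id: "no_xrefs" },
];
const view = buildView(dbXrefMap, docs);
```

Each row keeps the emitting document's id next to the key and value, which is what lets a later lookup by key lead back to the full document.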
  • 58-59. Map function example
  • 60-61. Complex Map
  • 62-63. View Result
  • 64-65. GET by the indexed key

    GET /refseq_human/_design/gb/_view/dbXref?key="GeneID:10"

    {"total_rows":7,"offset":2,"rows":[
      {"id":"NP_000006", "key":"GeneID:10", "value":"NP_000006"}
    ]}
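One detail of that request is easy to miss: the `key` parameter is a JSON value, so the double quotes are part of it and the whole thing should be URL-encoded when built in code. A small sketch, where `viewUrl` is a hypothetical helper and the base URL is assumed to be a local server:

```javascript
// Build a view query URL. Every parameter value is JSON-encoded first,
// then URL-encoded, so a string key keeps its surrounding quotes.
function viewUrl(base, db, designDoc, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  const path = `${base}/${db}/_design/${designDoc}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}

// Exact-key lookup, as on the slide.
const byKey = viewUrl("http://localhost:5984", "refseq_human", "gb", "dbXref", {
  key: "GeneID:10",
});

// Range lookup over the sorted keys, using startkey and endkey.
const byRange = viewUrl("http://localhost:5984", "refseq_human", "gb", "dbXref", {
  startkey: "GeneID:1",
  endkey: "GeneID:2",
});
console.log(byKey);
```

The range form is where the B-tree backing of views pays off, since rows between `startkey` and `endkey` sit next to each other in the index.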
  • 66. Reduce functions
    - Optional, and used in concert with a specific map function
    - Great for summarizing or collating numerical data points
    - E.g. counts, number of X over time, average load, probability of conversion
    - Not really applicable to our example, so we’ll not cover it today
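Although the talk skips reduce, a counting reduce is short enough to sketch. The function below uses the `(keys, values, rereduce)` signature that CouchDB passes to reduce functions; the name `countReduce` and the sample inputs are made up for illustration.

```javascript
// Count rows. On the first pass, values holds the emitted values for a batch
// of rows. On a rereduce pass, values holds partial counts from earlier passes
// and keys is null, so the partial counts are summed instead.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((total, partial) => total + partial, 0);
  }
  return values.length;
}

// First pass over two batches of rows (keys are [key, docId] pairs).
const batchA = countReduce(
  [["GeneID:10", "NP_000006"], ["GeneID:10", "NP_000007"]],
  ["NP_000006", "NP_000007"],
  false
);
const batchB = countReduce([["GeneID:12", "NP_000008"]], ["NP_000008"], false);

// Rereduce pass combines the partial results.
const total = countReduce(null, [batchA, batchB], true);
```

The two-phase shape is what lets the reduction be computed incrementally over the view's index instead of over all rows at once.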
  • 67. Show me the ... HTML?
    - JSON is great, but what about, ya know, something useful?
    - You can make a separate app to reformat the JSON
    - OR you can use the “shows” section of a _design document
    - Rich formatting is possible with functions, templates, and special include macros
  • 68-69. FASTA format

    "shows" : {
      "fasta" : "function(doc, req) { return '>' + doc._id + '\n' + doc.seq; }",
      ...

    GET /refseq_human/_design/gb/_show/fasta/NP_000006

    >NP_000006
    MDIEAYFERIGYKNSRNKLDLETLTDILEHQIRAVPFENLNMHCGQAMELGLEAIFDHIVRR
    NRGGWCLQVNQLLYWALTTIGFQTTMLGGYFYIPPVNKYSTGMVHLLLQVTIDGRNYIV
    DAGSGSSSQMWQPLELISGKDQPQVPCIFCLTEERGIWYLDQIRREQYITNKEFLNSHLLPK
    KKHQKIYLFTLEPRTIEDFESMNTYLQTSPTSSFITTSFCSLQTPEGVYCLVGFILTYRKFNYKD
    NTDLVEFKTLTEEEVEEVLRNIFKISLGRNLVPKPGDGSLTI
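The show function can be tried outside CouchDB as an ordinary JavaScript function. The sketch below also serializes it into a design document to illustrate one wrinkle: because the function is stored as a JSON string, the backslash in the newline escape gets doubled in the JSON text. The `_id` value `_design/gb` follows the URL on the slide; the test document is made up.

```javascript
// Show function: format a protein document as a FASTA record.
const fasta = function (doc, req) {
  return ">" + doc._id + "\n" + doc.seq;
};

// Design documents are JSON, so the function is stored as its source text.
const designDoc = {
  _id: "_design/gb",
  shows: { fasta: fasta.toString() },
};

// JSON.stringify escapes the backslash, so the JSON text contains \\n.
const designJson = JSON.stringify(designDoc);

// Round trip: parse the JSON and rebuild the function from its source text.
const revived = new Function(
  "return " + JSON.parse(designJson).shows.fasta
)();

console.log(revived({ _id: "NP_000006", seq: "MDIEAYFERIGYK" }));
```

Writing the function in a `.js` file and letting a tool do the JSON encoding avoids hand-escaping mistakes of this kind.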
  • 70-75. Backups & Replication
    - Backup: simply copy the database file
    - Replicate: send a POST request with a source and target database
    - Source and target DBs can either be local (just the db name) or remote (full URL)
    - The "continuous": true option will register the target to the source’s _changes notification API

    $ curl -X POST http://localhost:5984/_replicate -d '{"source":"db", "target":"db-replica", "continuous":true}'
  • 76-80. Data normalization? Schema? Foreign Keys? Column Constraints?
    - forgetaboutit: Italian for “forget about it” … “or die”
    - Denormalize “until it hurts”
    - But validations are available
    - A JS function validates a record on update
  • 81. Required Fields

    function(newDoc, oldDoc, userCtx) {
      function require(field, message) {
        message = message || "Document must have a " + field;
        if (!newDoc[field]) throw({forbidden : message});
      };
      if (newDoc.type == "blogPost") {
        require("title");
        require("created_at");
        require("body");
        require("author");
      }
      ...
    }

    Convention alert!
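The same validation pattern can be exercised locally before it goes into a design document. This sketch adapts it to the protein example: the `"protein"` type and the required `seq` and `definition` fields are assumptions, the inner helper is renamed `requireField` so it does not shadow Node's `require`, and `tryValidate` is a hypothetical wrapper that mimics how a rejected write would surface.

```javascript
// Validation function with the (newDoc, oldDoc, userCtx) signature.
// Throwing an object with a "forbidden" property rejects the write.
const validateProtein = function (newDoc, oldDoc, userCtx) {
  function requireField(field, message) {
    message = message || "Document must have a " + field;
    if (!newDoc[field]) throw { forbidden: message };
  }
  // Only documents that declare themselves as proteins are checked.
  if (newDoc.type == "protein") {
    requireField("seq");
    requireField("definition");
  }
};

// Hypothetical wrapper: run the validation and report the outcome.
function tryValidate(newDoc) {
  try {
    validateProtein(newDoc, null, {});
    return { ok: true };
  } catch (err) {
    return { ok: false, reason: err.forbidden };
  }
}

const good = tryValidate({ type: "protein", seq: "MDIEAY", definition: "example" });
const bad = tryValidate({ type: "protein", definition: "example" });
const other = tryValidate({ type: "note" });
```

Documents of other types pass straight through, which is how one database can hold several kinds of document without a schema.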
  • 82. Thank You!
    - Learn: http://couchdb.apache.org/, http://books.couchdb.org/relax, http://wiki.apache.org/couchdb/
    - Awesome posts by community: http://planet.couchdb.org
    - Development libraries: http://github.com/jchris/couchrest, http://github.com/couchapp/couchapp