
Python-CouchDB Training at PyCon PL 2012


  1. 1. Using CouchDB with Python Stefan Kögl @skoegl
  2. 2. What we will cover ● What is CouchDB? – Access from Python through couchdbkit – Key-value Store Functionality – MapReduce Queries – HTTP API ● When is CouchDB useful and when not? – Multi-Master Replication – Scaling up and down ● Pointers to other resources, CouchDB ecosystem
  3. 3. What we won't cover ● CouchApps – browser-based apps that are served by CouchDB ● Detailed Security, Scaling and other operational issues ● Other functionality that didn't fit
  4. 4. Training Modes ● Code-Along – Follow Examples, write your own code – Small Scripts or REPL ● Learning-by-Watching – Example Application at https://github.com/stefankoegl/python-couchdb-examples – Slides at https://slideshare.net/skoegl/couch-db-pythonpyconpl2012 – Use example scripts and see what happens – Submit Pull-Requests!
  5. 5. Contents ● Intro – Contents – CouchDB – Example Application ● DB Initialization ● Key-Value Store ● Simple MapReduce Queries ● The _changes Feed ● Complex MapReduce Queries ● Replication ● Additional Features and the Couch Ecosystem
  6. 6. CouchDB ● Apache Project ● https://couchdb.apache.org/ ● Current Version: 1.2 ● Apache CouchDB™ is a database that uses JSON for documents, JavaScript for MapReduce queries, and regular HTTP for an API
  7. 7. Example Application ● Lending Database – Stores Items that you might want to lend – Stores when you have lent what to whom ● Stand-alone or distributed ● Small Scripts that do one task each ● Look at HTTP traffic
  8. 8. Contents ● Intro ● DB Initialization – Setting Up CouchDB – Installing couchdbkit – Creating a Database ● Key-Value Store ● Simple MapReduce Queries ● The _changes Feed ● Complex MapReduce Queries ● Replication ● Additional Features and the Couch Ecosystem
  9. 9. Getting Set Up: CouchDB ● Provided by me (not valid anymore after the training) ● http://couch.skoegl.net:5984/<yourname> ● Authentication: username training, password training ● Set up your DB_URL in settings.py ● If you want to install your own – Tutorials: https://wiki.apache.org/couchdb/Installation – Ubuntu: https://launchpad.net/~longsleep/+archive/couchdb – Mac, Windows: https://couchdb.apache.org/#download
  10. 10. Getting Set Up: couchdbkit ● http://couchdbkit.org/ ● Python client library
      # install with pip
      pip install couchdbkit
      # or from source
      git clone git://github.com/benoitc/couchdbkit.git
      cd couchdbkit
      sudo python setup.py install
      # and then you should be able to import
      import couchdbkit
  11. 11. Contents ● Intro ● DB Initialization – Setting Up CouchDB – Installing couchdbkit – Creating a Database ● Key-Value Store ● Simple MapReduce Queries ● Complex MapReduce Queries ● The _changes Feed ● Replication ● Additional Features and the Couch Ecosystem
  12. 12. Creating a Database ● What we have: a CouchDB server and its URL eg http://127.0.0.1:5984 ● What we want: a database there eg http://127.0.0.1:5984/myname ● http://wiki.apache.org/couchdb/HTTP_database_API
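At the HTTP level, creating a database amounts to a PUT request to the database URL. The following is a minimal sketch using only Python's standard library. It builds the request but does not send it. The server URL and database name are the placeholder values from the slide, and the helper name `build_create_db_request` is made up for this illustration.

```python
import urllib.request


def build_create_db_request(server_url, db_name):
    """Build (but do not send) the PUT request that would create db_name."""
    url = server_url.rstrip("/") + "/" + db_name
    return urllib.request.Request(url, method="PUT")


request = build_create_db_request("http://127.0.0.1:5984", "myname")
print(request.get_method(), request.full_url)
# To actually create the database, pass the request to urllib.request.urlopen();
# a server that requires authentication would also need credentials.
```

The couchdbkit call shown on the following slides issues this kind of request for you.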
  13. 13. A note on Debugging ● Apache-style log files ● Locally – $ tail -f /var/log/couchdb/couch.log ● HTTP – http://127.0.0.1:5984/_log?bytes=5000 – http://wiki.apache.org/couchdb/HttpGetLog
  14. 14. Creating a Database
      # ldb-init.py
      from restkit import BasicAuth
      from couchdbkit import Database
      from couchdbkit.exceptions import ResourceNotFound

      # dburl is the URL of your database (see DB_URL in settings.py)
      auth_filter = BasicAuth('username', 'pwd')
      db = Database(dburl, filters=[auth_filter])
      server = db.server
      # start from scratch: drop the database if it already exists
      try:
          server.delete_db(db.dbname)
      except ResourceNotFound:
          pass
      db = server.get_or_create_db(db.dbname)
  15. 15. Creating a Database
      [Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1435.0>] 127.0.0.1 - - DELETE /myname/ 200
      [Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1435.0>] 127.0.0.1 - - HEAD /myname/ 404
      [Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1440.0>] 127.0.0.1 - - PUT /myname/ 201
  16. 16. Contents ● Intro ● DB Initialization ● Key-Value Store – Modelling Documents – Storing and Retrieving Documents – Updating Documents ● Simple MapReduce Queries ● Complex MapReduce Queries ● The _changes Feed ● Replication ● Additional Features and the Couch Ecosystem
  17. 17. Key-Value Store ● Core of CouchDB ● Keys (_id): any valid JSON string ● Values (documents): any valid JSON objects ● Stored in B+-Trees ● http://guide.couchdb.org/draft/btree.html
  18. 18. Modelling a Thing ● A thing that we want to lend – Name – Owner – Dynamic properties like ● Description ● Movie rating ● etc
  19. 19. Modelling a Thing ● In CouchDB documents are JSON objects ● You can store any dict – Wrapped in couchdbkit's Document classes for convenience ● Documents can be serialized to JSON … mydict = mydoc.to_json() ● … and deserialized from JSON mydoc = DocClass.wrap(mydict)
  20. 20. Modelling a Thing
      # models.py
      from couchdbkit import Database, Document, StringProperty

      class Thing(Document):
          owner = StringProperty(required=True)
          name = StringProperty(required=True)

      # DB_URL comes from settings.py
      db = Database(DB_URL)
      Thing.set_db(db)
  21. 21. Storing a Document ● Document identified by _id – Auto-assigned by Database (bad) – Provided when storing the document (good) – Think about lost responses – couchdbkit does that for us ● couchdbkit adds property doc_type with value "Thing"
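The "lost responses" point can be illustrated with a toy in-memory store. This is a sketch under simplifying assumptions, not CouchDB or couchdbkit code: `TinyStore` and its `post`/`put` methods are hypothetical stand-ins for letting the server pick the `_id` versus supplying it yourself. If the response to a request is lost and the client retries, a server-assigned id produces a second copy, while a client-chosen id hits the same document again (a real CouchDB server would answer such a retry with a conflict instead of storing a duplicate).

```python
import uuid


class TinyStore:
    """In-memory stand-in for a database; illustration only, not CouchDB."""

    def __init__(self):
        self.docs = {}

    def post(self, doc):
        # The store assigns the _id: every call creates a new document.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = dict(doc)
        return doc_id

    def put(self, doc_id, doc):
        # The client supplies the _id: repeating the call cannot duplicate it.
        created = doc_id not in self.docs
        self.docs.setdefault(doc_id, dict(doc))
        return created


store = TinyStore()
thing = {"owner": "stefan", "name": "CouchDB The Definitive Guide"}

# Response lost with a store-assigned id: the retry stores a second copy.
store.post(thing)
store.post(thing)
count_after_post = len(store.docs)

# Response lost with a client-chosen id: the retry finds the same document.
my_id = uuid.uuid4().hex
first_attempt = store.put(my_id, thing)
retry = store.put(my_id, thing)
count_after_put = len(store.docs)

print(count_after_post, count_after_put, first_attempt, retry)
```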
  22. 22. Internal Storage ● Database File /var/lib/couchdb/dbname.couch ● B+-Tree of _id ● Access: O(log n) ● Append-only storage ● Accessible in historic order (we'll come to that later)
  23. 23. Storing a Document
      # ldb-new-thing.py
      couchguide = Thing(owner='stefan', name='CouchDB The Definitive Guide')
      # dynamic property that is not declared on the Thing class
      couchguide.publisher = "O'Reilly"
      couchguide.to_json()
      # {'owner': u'stefan', 'doc_type': 'Thing',
      #  'name': u'CouchDB The Definitive Guide',
      #  'publisher': u"O'Reilly"}
      couchguide.save()
      print couchguide._id
      # 448aaecfe9bc1cde5d6564a4c93f79c2
  24. 24. Storing a Document
      [Thu, 06 Sep 2012 19:40:26 GMT] [info] [<0.962.0>] 127.0.0.1 - - GET /_uuids?count=1000 200
      [Thu, 06 Sep 2012 19:40:26 GMT] [info] [<0.962.0>] 127.0.0.1 - - PUT /lendb/8f14ef7617b8492fdbd800f1101ebb35 201
  25. 25. Retrieving a Document ● Retrieve a document by its _id – Limited use – Does not allow queries by other properties
      # ldb-get-thing.py
      thing = Thing.get(thing_id)
  26. 26. Retrieving a Document [Thu, 06 Sep 2012 19:45:30 GMT] [info] [<0.962.0>] 127.0.0.1 - - GET /lendb/8f14ef7617b8492fdbd800f1101ebb35 200
  27. 27. Updating a Document ● Optimistic Concurrency Control ● Each Document has a revision ● Each Operation includes revision ● Operation fails if revision doesn't match
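The revision check can be sketched with a toy in-memory store. This is an illustration only: `RevisionedStore` and `ConflictError` are made-up names, and revisions are plain integers here, whereas CouchDB uses strings of the form `N-hash`.

```python
class ConflictError(Exception):
    """Raised when an update is based on an outdated revision."""


class RevisionedStore:
    """Toy store with per-document revisions; not CouchDB itself."""

    def __init__(self):
        self._docs = {}  # _id -> (revision number, document body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def save(self, doc_id, body, rev=None):
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if rev != current_rev:
            # The caller worked on a revision that is no longer current.
            raise ConflictError("Document update conflict.")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = RevisionedStore()
store.save("thing", {"name": "robot"})

# Two clients read the same revision of the document.
rev1, doc1 = store.get("thing")
rev2, doc2 = store.get("thing")

doc1["something"] = True
new_rev = store.save("thing", doc1, rev=rev1)  # first writer succeeds

doc2["conflicting"] = "test"
try:
    store.save("thing", doc2, rev=rev2)  # second writer is out of date
    outcome = "saved"
except ConflictError:
    outcome = "conflict"

print(new_rev, outcome)
```

The second writer has to fetch the document again, reapply its change and retry.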
  28. 28. Updating a Document
      >>> thing1 = Thing.get(some_id)
      >>> thing2 = Thing.get(some_id)
      >>> thing1._rev
      '1-110e1e46bcde6ed3c2d9b1073f0b26'
      >>> thing2._rev
      '1-110e1e46bcde6ed3c2d9b1073f0b26'
      >>> thing1.something = True
      >>> thing1.save()    # Success
      >>> thing1._rev
      '2-3f800dffa62f4414b2f8c84f7cb1a1'
      >>> thing2._rev
      '1-110e1e46bcde6ed3c2d9b1073f0b26'
      >>> thing2.conflicting = 'test'
      >>> thing2.save()    # Failed
      couchdbkit.exceptions.ResourceConflict: Document update conflict.
  29. 29. Updating a Document
      [Thu, 13 Sep 2012 06:16:52 GMT] [info] [<0.7977.0>] 127.0.0.1 - - GET /lendb/d46d311d9a0f64b1f7322d20721f9f1d 200
      [Thu, 13 Sep 2012 06:16:55 GMT] [info] [<0.7977.0>] 127.0.0.1 - - GET /lendb/d46d311d9a0f64b1f7322d20721f9f1d 200
      [Thu, 13 Sep 2012 06:17:34 GMT] [info] [<0.7977.0>] 127.0.0.1 - - PUT /lendb/d46d311d9a0f64b1f7322d20721f9f1d 201
      [Thu, 13 Sep 2012 06:17:48 GMT] [info] [<0.7977.0>] 127.0.0.1 - - PUT /lendb/d46d311d9a0f64b1f7322d20721f9f1d 409
  30. 30. Contents ● Intro ● DB Initialization ● Key-Value Store ● Simple MapReduce Queries – Create a View – Query the View ● Complex MapReduce Queries ● The _changes Feed ● Replication ● Additional Features and the Couch Ecosystem
  31. 31. Views ● A specific "view" on (parts of) the data in a database ● Indexed incrementally ● Query is just reading a range of a view sequentially ● Generated using MapReduce
  32. 32. MapReduce Views ● Map Function – Called for each document – Has to be side-effect free – Emits zero or more intermediate key-value pairs ● Reduce Function (optional) – Aggregates intermediate pairs ● View Results stored in B+-Tree – Incrementally pre-computed at query-time – Queries are just an O(log n) lookup
  33. 33. List all Things ● Implemented as MapReduce View ● Contained in a Design Document – Create – Store – Query
  34. 34. Create a Design Document ● Regular document, interpreted by the database ● Views Mapped to Filesystem by directory structure _design/<ddoc name>/views/<view name>/{map,reduce}.js ● Written in JavaScript or Erlang ● Pluggable View Servers – http://wiki.apache.org/couchdb/View_server – http://packages.python.org/CouchDB/views.html – Lisp, PHP, Ruby, Python, Clojure, Perl, etc
  35. 35. Design Document
      // _design/things/views/by_owner_name/map.js
      function(doc) {
          if(doc.doc_type == "Thing") {
              emit([doc.owner, doc.name], null);
          }
      }
  36. 36. Intermediate Results
      Key                               Value
      ["stefan", "couchguide"]          null
      ["stefan", "Polish Dictionary"]   null
      ["marek", "robot"]                null
  37. 37. Design Document
      // _design/things/views/by_owner_name/reduce.js
      _count
  38. 38. Reduced Results ● Result depends on group level
      group_level=2
      Key                               Value
      ["stefan", "couchguide"]          1
      ["stefan", "Polish Dictionary"]   1
      ["marek", "robot"]                1
      group_level=1
      Key                               Value
      ["stefan"]                        2
      ["marek"]                         1
      group_level=0
      Key                               Value
      null                              3
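The effect of the group level on a `_count` reduce can be mimicked in plain Python. This is a sketch of the idea only: the function name is made up, and rows are ordered with Python's default list ordering, which is not the same as CouchDB's view collation.

```python
from itertools import groupby

# Intermediate rows as (key, value) pairs, as the map function would emit them.
rows = [
    (["marek", "robot"], None),
    (["stefan", "Polish Dictionary"], None),
    (["stefan", "couchguide"], None),
]


def count_by_group_level(rows, group_level):
    """Count rows per key prefix of length group_level (0 = one total)."""
    if group_level == 0:
        return [(None, len(rows))]

    def prefix(row):
        return row[0][:group_level]

    ordered = sorted(rows, key=prefix)
    return [(key, len(list(group))) for key, group in groupby(ordered, key=prefix)]


for level in (2, 1, 0):
    print(level, count_by_group_level(rows, level))
```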
  39. 39. Synchronize Design Docs ● Upload the design document ● _id: _design/<ddoc name> ● couchdbkit syncs ddocs from filesystem ● We'll need this a few more times – Put the following in its own script – or run $ ./ldb-sync-ddocs.py
  40. 40. Synchronize Design Docs
      # ldb-sync-ddocs.py
      from restkit import BasicAuth
      from couchdbkit import Database
      from couchdbkit.loaders import FileSystemDocsLoader

      auth_filter = BasicAuth('username', 'pwd')
      db = Database(dburl, filters=[auth_filter])
      # upload every design document found below the _design directory
      loader = FileSystemDocsLoader('_design')
      loader.sync(db, verbose=True)
  41. 41. View things/by_owner_name ● Emitted key-value pairs ● Sorted by key http://wiki.apache.org/couchdb/View_collation ● Keys can be complex (lists, dicts) ● Query http://127.0.0.1:5984/myname/_design/things/_view/by_owner_name?reduce=false
      Key                               Value   _id (implicit)   Document (implicit)
      ["stefan", "couchguide"]          null                     {…}
      ["stefan", "Polish Dictionary"]   null                     {…}
  42. 42. Query a View
      # ldb-list-things.py
      things = Thing.view('things/by_owner_name',
                          include_docs=True, reduce=False)
      for thing in things:
          print thing._id, thing.name, thing.owner
  43. 43. Query a View – Reduced
      # ldb-overview.py
      owners = Thing.view('things/by_owner_name',
                          group_level=1)
      for owner_status in owners:
          owner = owner_status['key'][0]
          count = owner_status['value']
          print owner, count
  44. 44. Break
  45. 45. From the Break ● Filtering by Price – startkey = 5 – endkey = 10 ● Structure: ddoc name / view name – Logical Grouping – Performance
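Because view rows are kept sorted by key, a `startkey`/`endkey` query is a range read over that order. The sketch below imitates this with a sorted Python list and `bisect`. The price data and the function name are hypothetical, and the end key is treated as inclusive, which matches CouchDB's default.

```python
from bisect import bisect_left, bisect_right

# Hypothetical view rows (key = price, value = thing name), sorted by key.
price_index = [
    (3, "pen"),
    (5, "dictionary"),
    (7, "couchguide"),
    (10, "robot"),
    (25, "bike"),
]


def query_range(index, startkey, endkey):
    """Return all rows whose key lies between startkey and endkey (inclusive)."""
    keys = [key for key, _ in index]
    low = bisect_left(keys, startkey)
    high = bisect_right(keys, endkey)
    return index[low:high]


in_range = query_range(price_index, 5, 10)
print(in_range)
```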
  46. 46. Contents ● Intro ● DB Initialization ● Key-Value Store ● Simple MapReduce Queries ● The _changes Feed – Accessing the _changes Feed – Lending Objects ● Advanced MapReduce Queries ● Replication ● Additional Features and the Couch Ecosystem
  47. 47. Changes Sequence ● With every document update, a change is recorded ● Local history, ordered by _seq value ● Only the latest _seq of each document is kept
  48. 48. Changes Feed ● List of all documents, in the order they were last modified ● Possibility to – React to changes – Process all documents without skipping any – Continue at some point with since parameter ● CouchDB as a distributed, persistent MQ ● http://guide.couchdb.org/draft/notifications.html ● http://wiki.apache.org/couchdb/HTTP_database_API#Changes
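A toy model may help to see how the sequence and the `since` parameter interact. This is a sketch, not CouchDB's implementation: `ChangesLog` and its methods are made-up names, and sequence numbers are simple counters.

```python
class ChangesLog:
    """Toy changes sequence: one entry per document, latest update only."""

    def __init__(self):
        self._seq = 0
        self._latest = {}  # document id -> seq of its most recent update

    def record_update(self, doc_id):
        self._seq += 1
        # An older entry for the same document is replaced.
        self._latest[doc_id] = self._seq
        return self._seq

    def changes(self, since=0):
        entries = sorted((seq, doc_id) for doc_id, seq in self._latest.items())
        return [(seq, doc_id) for seq, doc_id in entries if seq > since]


log = ChangesLog()
for doc_id in ["couchguide", "robot", "couchguide", "dictionary"]:
    log.record_update(doc_id)

all_changes = log.changes()
later_changes = log.changes(since=3)
print(all_changes)
print(later_changes)
```

A consumer that remembers the last sequence number it processed can resume with `since` and will not skip any document.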
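  49. 49. Changes Feed
      # ldb-changes-log.py
      from couchdbkit import Consumer
      from models import db, Thing, Lending  # defined in models.py

      DOC_CLASSES = {'Thing': Thing, 'Lending': Lending}
      since = 0  # sequence number to continue from; 0 starts at the beginning

      def callback(line):
          seq = line['seq']
          doc = line['doc']
          # get obj according to doc['doc_type']; unknown types stay plain dicts
          doc_class = DOC_CLASSES.get(doc.get('doc_type'))
          obj = doc_class.wrap(doc) if doc_class else doc
          print seq, obj

      consumer = Consumer(db)
      consumer.wait(callback, since=since, include_docs=True)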
  50. 50. "Lending" Objects ● Thing that is lent ● Who lent it (i.e. who is the owner of the thing) ● To whom it is lent ● When it was lent ● When it was returned
  51. 51. Modelling a "Lend" Object
      # models.py
      from datetime import datetime
      from couchdbkit import DateTimeProperty

      class Lending(Document):
          thing = StringProperty(required=True)
          owner = StringProperty(required=True)
          to_user = StringProperty(required=True)
          lent = DateTimeProperty(default=datetime.now)
          returned = DateTimeProperty()

      Lending.set_db(db)
  52. 52. Lending a Thing
      # ldb-lend-thing.py
      lending = Lending(thing=thing_id,
                        owner=username,
                        to_user=to_user)
      lending.save()
  53. 53. Returning a Thing
      # ldb-return-thing.py
      lending = Lending.get(lend_id)
      lending.returned = datetime.now()
      lending.save()
  54. 54. Contents ● Intro ● DB Initialization ● Key-Value Store ● Simple MapReduce Queries ● The _changes Feed ● Advanced MapReduce Queries – Imitating Joins with "Mixed" Views ● Replication ● Additional Features and the Couch Ecosystem
  55. 55. Current Thing Status ● View to get the current status of a thing ● No Joins ● We emit rows with keys that group together
  56. 56. Complex View
      // _design/things/views/status/map.js
      function(doc) {
          if(doc.doc_type == "Thing") {
              emit([doc.owner, doc._id, 1], doc.name);
          }
          if(doc.doc_type == "Lending") {
              /* only lendings that have not been returned yet */
              if(doc.lent && !doc.returned) {
                  emit([doc.owner, doc.thing, 2], doc.to_user);
              }
          }
      }
  57. 57. Intermediate View Results
      Key                      Value
      ["stefan", 12345, 1]     "couchguide"
      ["stefan", 12345, 2]     ["someone", "2012-09-12"]
      ["marek", 34544, 1]      "robot"
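  58. 58. Reduce Intermediate Results
      // _design/things/views/status/reduce.js
      /* use with group_level = 2 */
      function(keys, values, rereduce) {
          if(rereduce) {
              /* keys is null on rereduce; values holds partial results */
              if(values.length > 1 || values[0] == "lent") {
                  return "lent";
              }
              return "available";
          }
          /* there is at least one "Lending" row */
          if(keys.length > 1) {
              return "lent";
          } else {
              return "available";
          }
      }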
  59. 59. Reduce Intermediate Results ● Don't forget to synchronize your design docs! ● Group Level: 2 ● Reduce Function receives rows with same grouped value
      Intermediate – not reduced
      Key                      Value
      ["stefan", 12345, 1]     "couchguide"
      ["stefan", 12345, 2]     ["someone", "2012-09-12"]
      ["marek", 34544, 1]      "robot"
      Reduced
      Key                      Value
      ["stefan", 12345]        "lent"
      ["marek", 34544]         "available"
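The grouping trick that imitates a join can be replayed in plain Python. This sketch makes several assumptions: the rows are made-up sample data shaped like the map output of the status view, the function name is hypothetical, and the grouping is done in one pass, whereas CouchDB may reduce in several steps.

```python
from itertools import groupby

# (key, value) rows; key = [owner, thing id, row type],
# where row type 1 is the thing itself and 2 is an open lending.
rows = [
    (["marek", 34544, 1], "robot"),
    (["stefan", 12345, 1], "couchguide"),
    (["stefan", 12345, 2], "someone"),
]


def status_by_thing(rows):
    """Group rows by [owner, thing id]; more than one row means 'lent'."""

    def group_key(row):
        return tuple(row[0][:2])

    ordered = sorted(rows, key=lambda row: row[0])
    result = {}
    for key, group in groupby(ordered, key=group_key):
        result[key] = "lent" if len(list(group)) > 1 else "available"
    return result


statuses = status_by_thing(rows)
print(statuses)
```

Thing rows and lending rows share the first two key elements, which is what lets them end up in the same group.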
  60. 60. Get Status
      # ldb-status.py
      things = Thing.view('things/status', group_level=2)
      for result in things:
          owner = result['key'][0]
          thing_id = result['key'][1]
          status = result['value']
          print owner, thing_id, status
  61. 61. Contents ● Intro ● DB Initialization ● Key-Value Store ● Simple MapReduce Queries ● The _changes Feed ● Advanced MapReduce Queries ● Replication – Setting up filters – Find Friends and Replicate from them – Eventual Consistency and Conflicts ● Additional Features and the Couch Ecosystem
  62. 62. Replication ● Replicate Things and their status from friends ● Don't replicate things from friends of friends – we don't want to borrow anything from them
  63. 63. Replication ● Pull replication – Pull documents from our friends, and store them locally ● There's also Push replication, but we won't use it ● Goes through the source's _changes feed ● Compares with local documents, updates or creates conflicts
  64. 64. Set up a Filter ● A Filter is a JavaScript function that takes – a document – a request object ● and returns – true, if the document passes the filter – false otherwise ● A filter is evaluated at the source
  65. 65. Replication Filter
      // _design/things/filters/from_friend.js
      /* doc is the document,
         req is the request that uses the filter */
      function(doc, req) {
          /* Allow only if entry is owned by the friend */
          return (doc.owner == req.query.friend);
      }
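What the filter does during a pull replication can be approximated in a few lines of Python. This is a heavily simplified sketch: the function names and sample documents are made up, plain dicts stand in for databases, and revisions and conflicts are ignored. In CouchDB the filter is evaluated at the source, not by the puller.

```python
def from_friend(doc, query):
    """Python rendering of the filter: pass only documents owned by the friend."""
    return doc.get("owner") == query["friend"]


def pull_replicate(source_changes, target, doc_filter, query):
    """Copy every changed document that passes the filter into target."""
    for doc in source_changes:
        if doc_filter(doc, query):
            target[doc["_id"]] = doc
    return target


# Marek's database also holds a thing replicated from his own friend Ania.
source_changes = [
    {"_id": "t1", "doc_type": "Thing", "owner": "marek", "name": "robot"},
    {"_id": "t2", "doc_type": "Thing", "owner": "ania", "name": "bike"},
    {"_id": "l1", "doc_type": "Lending", "owner": "marek",
     "thing": "t1", "to_user": "stefan"},
]

local_db = pull_replicate(source_changes, {}, from_friend, {"friend": "marek"})
print(sorted(local_db))
```

Ania's bike is skipped, which matches the goal of not replicating things from friends of friends.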
  66. 66. Replication ● Sync design docs to your own database! ● Find friends to borrow from – Post your nickname and Database URL to http://piratepad.net/pycouchpl – Pick at least two friends
  67. 67. Replication ● _replicator database ● Objects describe Replication tasks – Source – Target – Continuous – Filter – etc ● http://wiki.apache.org/couchdb/Replication
  68. 68. Replication
      # ldb-replicate-friend.py
      from restkit import BasicAuth
      from couchdbkit import Database

      auth_filter = BasicAuth(username, password)
      db = Database(db_url, filters=[auth_filter])
      replicator_db = db.server['_replicator']
      replication_doc = {
          "source": friend_db_url,
          "target": db_url,
          "continuous": True,
          "filter": "things/from_friend",
          "query_params": { "friend": friend_name }
      }
      # the document _id names the replication task
      replicator_db[username + "-" + friend_name] = replication_doc
  69. 69. Replication ● Documents should be propagated into own database ● Views should contain both own and friends' things
  70. 70. Dealing with Conflicts ● Conflicts introduced by – Replication – "forcing" a document update ● _rev calculated based on – Previous _rev – document content ● Conflict when two documents have – The same _id – Distinct _rev
  71. 71. Dealing with Conflicts ● Select a Winner ● Database can't do this for you ● Automatic strategy selects a (temporary) winner – Deterministic: always the same winner on each node – leaves them in conflict state ● View that contains all conflicts ● Resolve conflict programmatically ● http://guide.couchdb.org/draft/conflicts.html ● http://wiki.apache.org/couchdb/Replication_and_conflicts
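The point of a deterministic winner is that every node picks the same revision without talking to the others. A simplified sketch of such a rule follows: prefer the higher generation number, then the revision id that sorts higher. This is an assumption-laden illustration, not CouchDB's exact algorithm (which, for example, also takes deletions into account); the function name and the revision ids are made up.

```python
def pick_winner(revisions):
    """Choose one revision id (format 'N-hash') in an order-independent way."""

    def sort_key(rev):
        generation, _, digest = rev.partition("-")
        return (int(generation), digest)

    return max(revisions, key=sort_key)


conflicting = ["2-3f800dff", "2-a41c9e07", "1-110e1e46"]
winner = pick_winner(conflicting)
losers = [rev for rev in conflicting if rev != winner]
print(winner, losers)

# Any node that sees the same set of revisions selects the same winner,
# regardless of the order in which the revisions arrived.
same_on_other_node = pick_winner(list(reversed(conflicting)))
```

The losing revisions stay around, so an application can still inspect them and resolve the conflict properly.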
  72. 72. Contents ● Intro ● DB Initialization ● Key-Value Store ● Simple MapReduce Queries ● The _changes Feed ● Advanced MapReduce Queries ● Replication ● Additional Features and the Couch Ecosystem – Scaling and related Projects – Fulltext Search – Further Reading
  73. 73. Scaling Up / Out ● BigCouch – Cluster of CouchDB nodes that appears as a single server – http://bigcouch.cloudant.com/ – will be merged into CouchDB soon ● refuge – Fully decentralized data platform based on CouchDB – Includes fork of GeoCouch for spatial indexing – http://refuge.io/
  74. 74. Scaling Down ● CouchDB-compatible Databases on a smaller scale ● PouchDB – JavaScript library http://pouchdb.com/ ● TouchDB ● iOS: https://github.com/couchbaselabs/TouchDB-iOS ● Android: https://github.com/couchbaselabs/TouchDB-Android
  75. 75. Fulltext and Relational Search ● http://wiki.apache.org/couchdb/Full_text_search ● CouchDB Lucene – http://www.slideshare.net/martin.rehfeld/couchdblucene – https://github.com/rnewson/couchdb-lucene ● Elastic Search – http://www.elasticsearch.org/
  76. 76. Operations Considerations ● Append Only Storage ● Your backup tools: cp, rsync ● Regular Compaction needed
  77. 77. Further Features ● Update Handlers: JavaScript code that carries out update in the database server ● External Processes: use CouchDB as a proxy to other processes (eg search engines) ● Attachments: attach binary files to documents ● Update Validation: JavaScript code to validate doc updates ● CouchApps: Web-Apps served directly by CouchDB ● Bulk APIs: Several Updates in one Request ● List and Show Functions: Transforming responses before serving them
  78. 78. Summing Up ● Apache CouchDB™ is a database that uses JSON for documents, JavaScript for MapReduce queries, and regular HTTP for an API ● couchdbkit is a Python library providing access to Apache CouchDB
  79. 79. Thanks! Time for Questions and Discussion Stefan Kögl stefan@skoegl.net @skoegl Downloads https://slideshare.net/skoegl/couch-db-pythonpyconpl2012 https://github.com/stefankoegl/python-couchdb-examples
