Python-CouchDB Training at PyCon PL 2012

    • Using CouchDB with Python
      Stefan Kögl, @skoegl
    • What we will cover
      ● What is CouchDB?
        – Access from Python through couchdbkit
        – Key-Value Store Functionality
        – MapReduce Queries
        – HTTP API
      ● When is CouchDB useful and when not?
        – Multi-Master Replication
        – Scaling up and down
      ● Pointers to other resources, CouchDB ecosystem
    • What we won't cover
      ● CouchApps
        – browser-based apps that are served by CouchDB
      ● Detailed security, scaling and other operational issues
      ● Other functionality that didn't fit
    • Training Modes
      ● Code-Along
        – Follow examples, write your own code
        – Small scripts or REPL
      ● Learning-by-Watching
        – Example application at https://github.com/stefankoegl/python-couchdb-examples
        – Slides at https://slideshare.net/skoegl/couch-db-pythonpyconpl2012
        – Use the example scripts and see what happens
        – Submit pull requests!
    • Contents
      ● Intro
        – Contents
        – CouchDB
        – Example Application
      ● DB Initialization
      ● Key-Value Store
      ● Simple MapReduce Queries
      ● The _changes Feed
      ● Advanced MapReduce Queries
      ● Replication
      ● Additional Features and the Couch Ecosystem
    • CouchDB
      ● Apache Project
      ● https://couchdb.apache.org/
      ● Current version at the time of the talk: 1.2
      ● Apache CouchDB™ is a database that uses JSON for documents, JavaScript for MapReduce queries, and regular HTTP for an API
    • Example Application
      ● Lending Database
        – Stores items that you might want to lend
        – Stores when you have lent what to whom
      ● Stand-alone or distributed
      ● Small scripts that do one task each
      ● Look at HTTP traffic
    • Contents
      ● Intro
      ● DB Initialization
        – Setting Up CouchDB
        – Installing couchdbkit
        – Creating a Database
      ● Key-Value Store
      ● Simple MapReduce Queries
      ● The _changes Feed
      ● Advanced MapReduce Queries
      ● Replication
      ● Additional Features and the Couch Ecosystem
    • Getting Set Up: CouchDB
      ● Provided by me (no longer available after the training)
      ● http://couch.skoegl.net:5984/<yourname>
      ● Authentication: username training, password training
      ● Set up your DB_URL in settings.py
      ● If you want to install your own
        – Tutorials: https://wiki.apache.org/couchdb/Installation
        – Ubuntu: https://launchpad.net/~longsleep/+archive/couchdb
        – Mac, Windows: https://couchdb.apache.org/#download
    • Getting Set Up: couchdbkit
      ● http://couchdbkit.org/
      ● Python client library
        # install with pip
        pip install couchdbkit

        # or from source
        git clone git://github.com/benoitc/couchdbkit.git
        cd couchdbkit
        sudo python setup.py install

        # and then you should be able to import it in Python
        >>> import couchdbkit
    • Contents
      ● Intro
      ● DB Initialization
        – Setting Up CouchDB
        – Installing couchdbkit
        – Creating a Database
      ● Key-Value Store
      ● Simple MapReduce Queries
      ● The _changes Feed
      ● Advanced MapReduce Queries
      ● Replication
      ● Additional Features and the Couch Ecosystem
    • Creating a Database
      ● What we have: a CouchDB server and its URL, e.g. http://127.0.0.1:5984
      ● What we want: a database there, e.g. http://127.0.0.1:5984/myname
      ● http://wiki.apache.org/couchdb/HTTP_database_API
    • A note on Debugging
      ● Apache-style log files
      ● Locally
        – $ tail -f /var/log/couchdb/couch.log
      ● HTTP
        – http://127.0.0.1:5984/_log?bytes=5000
        – http://wiki.apache.org/couchdb/HttpGetLog
    • Creating a Database
        # ldb-init.py
        from restkit import BasicAuth
        from couchdbkit import Database
        from couchdbkit.exceptions import ResourceNotFound

        # username, pwd and dburl are assumed to be defined (e.g. read from settings.py)
        auth_filter = BasicAuth(username, pwd)
        db = Database(dburl, filters=[auth_filter])
        server = db.server

        # start from a clean slate: drop the database if it already exists
        try:
            server.delete_db(db.dbname)
        except ResourceNotFound:
            pass

        db = server.get_or_create_db(db.dbname)
    • Creating a Database
        [Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1435.0>] 127.0.0.1 - - DELETE /myname/ 200
        [Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1435.0>] 127.0.0.1 - - HEAD /myname/ 404
        [Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1440.0>] 127.0.0.1 - - PUT /myname/ 201
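
The log shows that couchdbkit talks plain HTTP to the server. As a rough sketch, the same DELETE and PUT requests can be built with Python's standard library alone. The server URL, database name and the training/training credentials are placeholders taken from the setup slide; the requests are only constructed, not sent, so the snippet runs without a server.

```python
import base64
import urllib.request

# Placeholder settings (assumptions): replace with your own server and account.
SERVER_URL = "http://127.0.0.1:5984"
DB_NAME = "myname"
USERNAME, PASSWORD = "training", "training"


def couch_request(method, path):
    """Build (but do not send) a request that uses HTTP Basic authentication."""
    req = urllib.request.Request(SERVER_URL + path, method=method)
    credentials = f"{USERNAME}:{PASSWORD}".encode("utf-8")
    req.add_header("Authorization", "Basic " + base64.b64encode(credentials).decode("ascii"))
    return req


# Same order as in the log: drop the database, then create it again.
plan = [
    couch_request("DELETE", f"/{DB_NAME}/"),
    couch_request("PUT", f"/{DB_NAME}/"),
]

for req in plan:
    print(req.get_method(), req.full_url)

# Sending one would be: urllib.request.urlopen(req)
```

When the requests are actually sent, `urlopen` raises `urllib.error.HTTPError` for error statuses, so catching a 404 on the DELETE would play the role of the `ResourceNotFound` handler in ldb-init.py.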
    • Contents
      ● Intro
      ● DB Initialization
      ● Key-Value Store
        – Modelling Documents
        – Storing and Retrieving Documents
        – Updating Documents
      ● Simple MapReduce Queries
      ● The _changes Feed
      ● Advanced MapReduce Queries
      ● Replication
      ● Additional Features and the Couch Ecosystem
    • Key-Value Store
      ● Core of CouchDB
      ● Keys (_id): any valid JSON string
      ● Values (documents): any valid JSON object
      ● Stored in B+-Trees
      ● http://guide.couchdb.org/draft/btree.html
    • Modelling a Thing
      ● A thing that we want to lend
        – Name
        – Owner
        – Dynamic properties like
          ● Description
          ● Movie rating
          ● etc.
    • Modelling a Thing
      ● In CouchDB, documents are JSON objects
      ● You can store any dict
        – Wrapped in couchdbkit's Document classes for convenience
      ● Documents can be serialized to JSON …
            mydict = mydoc.to_json()
      ● … and deserialized from JSON
            mydoc = DocClass.wrap(mydict)
    • Modelling a Thing
        # models.py
        from couchdbkit import Database, Document, StringProperty
        from settings import DB_URL  # DB_URL is configured in settings.py

        class Thing(Document):
            owner = StringProperty(required=True)
            name = StringProperty(required=True)

        db = Database(DB_URL)
        Thing.set_db(db)
    • Storing a Document
      ● Document identified by _id
        – Auto-assigned by the database (bad)
        – Provided by the client when storing the document (good)
        – Think about lost responses: a retried request must not create a second copy
        – couchdbkit does that for us
      ● couchdbkit adds the property doc_type with the value "Thing"
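
Why a client-supplied _id matters for lost responses can be shown with a toy in-memory store (a hypothetical class, not CouchDB or couchdbkit code). If the response to the first request is lost and the client retries, a server-assigned id silently produces a duplicate, while retrying with a fixed, client-chosen _id is rejected, so at most one copy exists. couchdbkit takes the second route, requesting UUIDs from the server's /_uuids resource.

```python
import uuid


class Conflict(Exception):
    """Raised when a document with the given _id already exists."""


class ToyDatabase:
    """Hypothetical in-memory stand-in for a CouchDB database (not a real API)."""

    def __init__(self):
        self.docs = {}

    def post(self, doc):
        # Server-assigned id: every call stores a new document.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = dict(doc)
        return doc_id

    def put(self, doc_id, doc):
        # Client-assigned id: storing the same new document twice is rejected.
        if doc_id in self.docs:
            raise Conflict(doc_id)
        self.docs[doc_id] = dict(doc)
        return doc_id


thing = {"owner": "stefan", "name": "CouchDB The Definitive Guide"}

# The response to the first request is lost, so the client simply retries.
server_ids = ToyDatabase()
server_ids.post(thing)
server_ids.post(thing)  # retry: now there are two copies

client_ids = ToyDatabase()
doc_id = uuid.uuid4().hex  # id chosen before the first attempt
client_ids.put(doc_id, thing)
try:
    client_ids.put(doc_id, thing)  # the retry is rejected ...
except Conflict:
    pass  # ... which tells the client the first attempt already succeeded

print(len(server_ids.docs), len(client_ids.docs))  # 2 1
```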
    • Internal Storage
      ● Database file: /var/lib/couchdb/dbname.couch
      ● B+-Tree of _id
      ● Access: O(log n)
      ● Append-only storage
      ● Accessible in historic order (we'll come to that later)
    • Storing a Document
        # ldb-new-thing.py
        from models import Thing

        couchguide = Thing(owner='stefan', name='CouchDB The Definitive Guide')
        couchguide.publisher = "O'Reilly"  # dynamic property, not declared in the class

        couchguide.to_json()
        # {'owner': u'stefan', 'doc_type': 'Thing',
        #  'name': u'CouchDB The Definitive Guide',
        #  'publisher': u"O'Reilly"}

        couchguide.save()
        print couchguide._id
        # 448aaecfe9bc1cde5d6564a4c93f79c2
    • Storing a Document
        [Thu, 06 Sep 2012 19:40:26 GMT] [info] [<0.962.0>] 127.0.0.1 - - GET /_uuids?count=1000 200
        [Thu, 06 Sep 2012 19:40:26 GMT] [info] [<0.962.0>] 127.0.0.1 - - PUT /lendb/8f14ef7617b8492fdbd800f1101ebb35 201
    • Retrieving a Document
      ● Retrieve a document by its _id
        – Limited use
        – Does not allow queries by other properties
        # ldb-get-thing.py
        from models import Thing

        thing = Thing.get(thing_id)  # thing_id is given, e.g. on the command line
    • Retrieving a Document
        [Thu, 06 Sep 2012 19:45:30 GMT] [info] [<0.962.0>] 127.0.0.1 - - GET /lendb/8f14ef7617b8492fdbd800f1101ebb35 200
    • Updating a Document
      ● Optimistic concurrency control
      ● Each document has a revision (_rev)
      ● Each operation includes the revision
      ● The operation fails if the revision doesn't match
    • Updating a Document
      Session 1:
        >>> thing1 = Thing.get(some_id)
        >>> thing1._rev
        1-110e1e46bcde6ed3c2d9b1073f0b26
        >>> thing1.something = True
        >>> thing1.save()
        >>> thing1._rev
        2-3f800dffa62f4414b2f8c84f7cb1a1
        Success
      Session 2, side by side with session 1:
        >>> thing2 = Thing.get(some_id)
        >>> thing2._rev
        1-110e1e46bcde6ed3c2d9b1073f0b26
        >>> thing2._rev      # after session 1 has saved
        1-110e1e46bcde6ed3c2d9b1073f0b26
        >>> thing2.conflicting = 'test'
        >>> thing2.save()
        couchdbkit.exceptions.ResourceConflict: Document update conflict.
        Failed
    • Updating a Document
        [Thu, 13 Sep 2012 06:16:52 GMT] [info] [<0.7977.0>] 127.0.0.1 - - GET /lendb/d46d311d9a0f64b1f7322d20721f9f1d 200
        [Thu, 13 Sep 2012 06:16:55 GMT] [info] [<0.7977.0>] 127.0.0.1 - - GET /lendb/d46d311d9a0f64b1f7322d20721f9f1d 200
        [Thu, 13 Sep 2012 06:17:34 GMT] [info] [<0.7977.0>] 127.0.0.1 - - PUT /lendb/d46d311d9a0f64b1f7322d20721f9f1d 201
        [Thu, 13 Sep 2012 06:17:48 GMT] [info] [<0.7977.0>] 127.0.0.1 - - PUT /lendb/d46d311d9a0f64b1f7322d20721f9f1d 409
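
The rule behind the 201 and the 409 above fits in a few lines. Below is a toy in-memory store (hypothetical class names; the md5-based revision id is only illustrative, CouchDB computes its own) that replays the two-session example: an update must carry the revision it was based on, and a stale revision is rejected.

```python
import hashlib
import json


class UpdateConflict(Exception):
    """Raised when an update is based on an outdated _rev."""


class ToyRevisionedStore:
    """Hypothetical in-memory store with CouchDB-style optimistic concurrency."""

    def __init__(self):
        self.docs = {}  # _id -> stored document (including _rev)

    def get(self, doc_id):
        return dict(self.docs[doc_id])  # callers get a copy, like a GET

    def save(self, doc):
        current = self.docs.get(doc["_id"])
        # The update must name the revision it was based on.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise UpdateConflict("Document update conflict.")
        generation = int(current["_rev"].split("-")[0]) + 1 if current else 1
        body = json.dumps({k: v for k, v in doc.items() if k != "_rev"}, sort_keys=True)
        # Illustrative revision id: "<generation>-<md5 of the body>".
        doc["_rev"] = f"{generation}-{hashlib.md5(body.encode()).hexdigest()}"
        self.docs[doc["_id"]] = dict(doc)


store = ToyRevisionedStore()
store.save({"_id": "some_id", "owner": "stefan", "name": "couchguide"})

thing1 = store.get("some_id")
thing2 = store.get("some_id")

thing1["something"] = True
store.save(thing1)  # succeeds: thing1 now carries a "2-..." revision

thing2["conflicting"] = "test"
try:
    store.save(thing2)  # still based on the "1-..." revision
    outcome = "saved"
except UpdateConflict:
    outcome = "conflict"

print(thing1["_rev"].split("-")[0], outcome)  # 2 conflict
```

The loser has to re-read the document (getting the new _rev), reapply its change and save again; nothing is locked in the meantime.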
    • Contents
      ● Intro
      ● DB Initialization
      ● Key-Value Store
      ● Simple MapReduce Queries
        – Create a View
        – Query the View
      ● The _changes Feed
      ● Advanced MapReduce Queries
      ● Replication
      ● Additional Features and the Couch Ecosystem
    • Views
      ● A specific "view" on (parts of) the data in a database
      ● Indexed incrementally
      ● A query just reads a range of a view sequentially
      ● Generated using MapReduce
    • MapReduce Views
      ● Map function
        – Called for each document
        – Has to be side-effect free
        – Emits zero or more intermediate key-value pairs
      ● Reduce function (optional)
        – Aggregates intermediate pairs
      ● View results stored in a B+-Tree
        – Incrementally pre-computed at query time
        – Queries are just an O(log n) lookup followed by a sequential range read
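
To make the map / sorted index / range read pipeline concrete, here is a toy view in plain Python. The class and function names are hypothetical, incremental index updates are not modelled, and Python's own ordering stands in for CouchDB's view collation, which differs for mixed types and mixed-case strings. The price property is a hypothetical dynamic property.

```python
import bisect


class ToyView:
    """Hypothetical in-memory view: map every document, keep rows sorted by key."""

    def __init__(self, docs, map_fun):
        rows = []
        for doc in docs:
            for key, value in map_fun(doc):  # zero or more emitted pairs per doc
                rows.append((key, doc["_id"], value))
        rows.sort(key=lambda row: (row[0], row[1]))  # ordered by key, then _id
        self.rows = rows
        self.keys = [row[0] for row in rows]

    def query(self, startkey, endkey):
        lo = bisect.bisect_left(self.keys, startkey)  # O(log n) to find the start
        hi = bisect.bisect_right(self.keys, endkey)   # ... and the end
        return self.rows[lo:hi]                       # then a sequential read


def by_price(doc):
    # Side-effect free map function; "price" is a hypothetical dynamic property.
    if doc.get("doc_type") == "Thing" and "price" in doc:
        yield doc["price"], doc["name"]


docs = [
    {"_id": "a", "doc_type": "Thing", "name": "couchguide", "price": 7},
    {"_id": "b", "doc_type": "Thing", "name": "robot", "price": 12},
    {"_id": "c", "doc_type": "Thing", "name": "Polish Dictionary", "price": 5},
    {"_id": "d", "doc_type": "Lending", "thing": "a"},
]

view = ToyView(docs, by_price)
print([value for key, doc_id, value in view.query(5, 10)])
# ['Polish Dictionary', 'couchguide']
```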
    • List all Things
      ● Implemented as a MapReduce view
      ● Contained in a design document
        – Create
        – Store
        – Query
    • Create a Design Document
      ● Regular document, interpreted by the database
      ● Views mapped to the filesystem by directory structure
            _design/<ddoc name>/views/<view name>/{map,reduce}.js
      ● Written in JavaScript or Erlang
      ● Pluggable view servers
        – http://wiki.apache.org/couchdb/View_server
        – http://packages.python.org/CouchDB/views.html
        – Lisp, PHP, Ruby, Python, Clojure, Perl, etc.
    • Design Document
        # _design/things/views/by_owner_name/map.js
        function(doc) {
            if(doc.doc_type == "Thing") {
                emit([doc.owner, doc.name], null);
            }
        }
    • Intermediate Results
        Key                               Value
        ["stefan", "couchguide"]          null
        ["stefan", "Polish Dictionary"]   null
        ["marek", "robot"]                null
    • Design Document
        # _design/things/views/by_owner_name/reduce.js
        _count
    • Reduced Results
      ● Result depends on group level
        group_level=2:
        Key                               Value
        ["stefan", "couchguide"]          1
        ["stefan", "Polish Dictionary"]   1
        ["marek", "robot"]                1
        group_level=1:
        Key          Value
        ["stefan"]   2
        ["marek"]    1
        group_level=0 (no grouping):
        Key    Value
        null   3
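
How group_level decides which rows are reduced together can be mimicked in a few lines: truncate each array key to its first group_level elements and count the rows per truncated key, as the built-in _count does. The helper below is a hypothetical sketch, not CouchDB code; it uses Python's string ordering, which compares code points and can therefore order mixed-case strings differently from CouchDB's collation.

```python
from itertools import groupby


def reduce_count(rows, group_level):
    """Mimic the built-in _count reduce at a given group_level for array keys."""
    if group_level == 0:
        return [(None, len(rows))]  # no grouping: one row with key null
    prefixes = sorted(tuple(key[:group_level]) for key, _ in rows)
    return [(list(prefix), sum(1 for _ in members))
            for prefix, members in groupby(prefixes)]


rows = [
    (["stefan", "couchguide"], None),
    (["stefan", "Polish Dictionary"], None),
    (["marek", "robot"], None),
]

print(reduce_count(rows, 1))  # [(['marek'], 1), (['stefan'], 2)]
print(reduce_count(rows, 0))  # [(None, 3)]
```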
    • Synchronize Design Docs
      ● Upload the design document
      ● _id: _design/<ddoc name>
      ● couchdbkit syncs ddocs from the filesystem
      ● We'll need this a few more times
        – Put the following in its own script
        – or run $ ./ldb-sync-ddocs.py
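    • Synchronize Design Docs
        # ldb-sync-ddocs.py
        from restkit import BasicAuth
        from couchdbkit import Database
        from couchdbkit.loaders import FileSystemDocsLoader

        # username, pwd and dburl are assumed to be defined (e.g. read from settings.py)
        auth_filter = BasicAuth(username, pwd)
        db = Database(dburl, filters=[auth_filter])

        # reads the design documents from the local '_design' directory
        loader = FileSystemDocsLoader('_design')
        loader.sync(db, verbose=True)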
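
FileSystemDocsLoader does this for you. To make the mapping from directories to the uploaded JSON concrete, here is a hypothetical, simplified helper (not couchdbkit's implementation; it handles views only and ignores filters and other design document fields) that turns the _design/<ddoc name>/views/<view name>/{map,reduce}.js layout into a design document dict.

```python
import json
import os
import tempfile


def load_design_doc(ddoc_dir):
    """Hypothetical helper: turn <ddoc_dir>/views/<view name>/{map,reduce}.js
    into a design document dict (only views are handled here)."""
    name = os.path.basename(os.path.normpath(ddoc_dir))
    ddoc = {"_id": "_design/" + name, "language": "javascript", "views": {}}
    views_dir = os.path.join(ddoc_dir, "views")
    for view_name in sorted(os.listdir(views_dir)):
        view = {}
        for part in ("map", "reduce"):
            path = os.path.join(views_dir, view_name, part + ".js")
            if os.path.exists(path):
                with open(path) as source:
                    view[part] = source.read().strip()
        ddoc["views"][view_name] = view
    return ddoc


# Recreate the by_owner_name view from the slides in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    view_dir = os.path.join(root, "_design", "things", "views", "by_owner_name")
    os.makedirs(view_dir)
    with open(os.path.join(view_dir, "map.js"), "w") as f:
        f.write('function(doc) {\n  if(doc.doc_type == "Thing") {\n'
                '    emit([doc.owner, doc.name], null);\n  }\n}\n')
    with open(os.path.join(view_dir, "reduce.js"), "w") as f:
        f.write("_count\n")
    ddoc = load_design_doc(os.path.join(root, "_design", "things"))

print(json.dumps(ddoc, indent=2))
```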
    • View things/by_owner_name
      ● Emitted key-value pairs
      ● Sorted by key
        – http://wiki.apache.org/couchdb/View_collation
      ● Keys can be complex (lists, dicts)
      ● Query
        – http://127.0.0.1:5984/myname/_design/things/_view/by_owner_name?reduce=false
        Key                               Value   Document (implicit)
        ["stefan", "couchguide"]          null    {…}
        ["stefan", "Polish Dictionary"]   null    {…}
        (each row also carries the _id of the emitting document implicitly)
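
Query parameters such as key, startkey and endkey take JSON values, so strings and arrays have to be JSON-encoded (and then URL-encoded) when a view URL is written by hand; couchdbkit's view() should take care of this for you. A small sketch with the standard library follows (the helper name view_url is hypothetical).

```python
import json
from urllib.parse import urlencode

# Parameters whose values CouchDB expects as JSON.
JSON_PARAMS = {"key", "startkey", "endkey"}


def view_url(db_url, ddoc, view, **params):
    """Hypothetical helper: build a view query URL with JSON-encoded key values."""
    query = {}
    for name, value in params.items():
        if name in JSON_PARAMS or isinstance(value, bool):
            value = json.dumps(value, separators=(",", ":"))  # True -> "true"
        query[name] = value
    return f"{db_url}/_design/{ddoc}/_view/{view}?{urlencode(query)}"


# All of stefan's things: objects ({}) collate after strings, so
# ["stefan", {}] is a common upper bound for keys starting with "stefan".
print(view_url("http://127.0.0.1:5984/myname", "things", "by_owner_name",
               startkey=["stefan"], endkey=["stefan", {}], reduce=False))
```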
    • Query a View
        # ldb-list-things.py
        from models import Thing

        things = Thing.view('things/by_owner_name',
                            include_docs=True, reduce=False)
        for thing in things:
            print thing._id, thing.name, thing.owner
    • Query a View – Reduced
        # ldb-overview.py
        from models import Thing

        owners = Thing.view('things/by_owner_name', group_level=1)
        for owner_status in owners:
            owner = owner_status['key'][0]
            count = owner_status['value']
            print owner, count
    • Break
    • From the Break
      ● Filtering by price
        – startkey = 5
        – endkey = 10
      ● Structure: ddoc name / view name
        – Logical grouping
        – Performance
    • Contents
      ● Intro
      ● DB Initialization
      ● Key-Value Store
      ● Simple MapReduce Queries
      ● The _changes Feed
        – Accessing the _changes Feed
        – Lending Objects
      ● Advanced MapReduce Queries
      ● Replication
      ● Additional Features and the Couch Ecosystem
    • Changes Sequence
      ● With every document update, a change is recorded
      ● Local history, ordered by sequence number (seq)
      ● Only the latest change of each document is kept
    • Changes Feed
      ● List of all documents, in the order they were last modified
      ● Possibility to
        – React to changes
        – Process all documents without skipping any
        – Continue at some point with the since parameter
      ● CouchDB as a distributed, persistent MQ
      ● http://guide.couchdb.org/draft/notifications.html
      ● http://wiki.apache.org/couchdb/HTTP_database_API#Changes
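
The changes sequence boils down to three rules: every update bumps a database-wide sequence number, only the newest entry per document is kept, and since=N returns what changed after N. A toy model of just that (a hypothetical class that ignores deletions and the continuous and longpoll modes):

```python
class ToyChangesFeed:
    """Hypothetical model of a database's changes sequence (not CouchDB code)."""

    def __init__(self):
        self.seq = 0
        self.latest = {}  # doc _id -> seq of its most recent change

    def record_update(self, doc_id):
        self.seq += 1
        self.latest[doc_id] = self.seq  # the older entry for this doc is dropped

    def changes(self, since=0):
        """All documents changed after `since`, ordered by their last change."""
        return sorted((seq, doc_id) for doc_id, seq in self.latest.items() if seq > since)


feed = ToyChangesFeed()
for doc_id in ["thing-a", "thing-b", "thing-a", "lending-1"]:
    feed.record_update(doc_id)

print(feed.changes())         # [(2, 'thing-b'), (3, 'thing-a'), (4, 'lending-1')]
print(feed.changes(since=3))  # [(4, 'lending-1')]
```

A consumer that remembers the last seq it processed can restart with since=<that seq> and will neither skip nor repeat documents.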
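    • Changes Feed
        # ldb-changes-log.py
        from couchdbkit.consumer import Consumer
        from models import Thing, db

        # doc_type (added by couchdbkit) -> document class; extend as models are added
        DOC_CLASSES = {'Thing': Thing}

        def callback(line):
            seq = line['seq']
            doc = line['doc']
            # get obj according to doc['doc_type']
            cls = DOC_CLASSES.get(doc.get('doc_type'))
            if cls is None:
                return  # e.g. design documents or deleted documents
            obj = cls.wrap(doc)
            print seq, obj

        # since: the last seq already processed (e.g. given on the command line)
        consumer = Consumer(db)
        consumer.wait(callback, since=since, include_docs=True)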
    • "Lending" Objects
      ● Thing that is lent
      ● Who lent it (i.e. who is the owner of the thing)
      ● To whom it is lent
      ● When it was lent
      ● When it was returned
    • Modelling a "Lend" Object
        # models.py (continued; Document, StringProperty and db are defined above)
        from datetime import datetime
        from couchdbkit import DateTimeProperty

        class Lending(Document):
            thing = StringProperty(required=True)
            owner = StringProperty(required=True)
            to_user = StringProperty(required=True)
            lent = DateTimeProperty(default=datetime.now)
            returned = DateTimeProperty()

        Lending.set_db(db)
    • Lending a Thing
        # ldb-lend-thing.py
        from models import Lending

        # thing_id, username and to_user are given (settings / command line)
        lending = Lending(thing=thing_id,
                          owner=username,
                          to_user=to_user)
        lending.save()
    • Returning a Thing
        # ldb-return-thing.py
        from datetime import datetime
        from models import Lending

        lending = Lending.get(lend_id)
        lending.returned = datetime.now()
        lending.save()
    • Contents
      ● Intro
      ● DB Initialization
      ● Key-Value Store
      ● Simple MapReduce Queries
      ● The _changes Feed
      ● Advanced MapReduce Queries
        – Imitating Joins with "Mixed" Views
      ● Replication
      ● Additional Features and the Couch Ecosystem
    • Current Thing Status
      ● View to get the current status of a thing
      ● No joins
      ● We emit keys that group related documents together
    • Complex View
        # _design/things/views/status/map.js
        function(doc) {
            if(doc.doc_type == "Thing") {
                emit([doc.owner, doc._id, 1], doc.name);
            }
            if(doc.doc_type == "Lending") {
                if(doc.lent && !doc.returned) {
                    emit([doc.owner, doc.thing, 2], doc.to_user);
                }
            }
        }
    • Intermediate View Results
        Key                    Value
        ["stefan", 12345, 1]   "couchguide"
        ["stefan", 12345, 2]   "someone"
        ["marek", 34544, 1]    "robot"
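    • Reduce Intermediate Results
        # _design/things/views/status/reduce.js
        /* use with group_level = 2 */
        function(keys, values, rereduce) {
            if(rereduce) {
                /* values are results of earlier reduce calls */
                return (values.indexOf("lent") >= 0) ? "lent" : "available";
            }
            /* keys are [key, docid] pairs; 2 marks an open "Lending" row */
            for(var i = 0; i < keys.length; i++) {
                if(keys[i][0][2] == 2) {
                    return "lent";
                }
            }
            return "available";
        }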
    • Reduce Intermediate Results
      ● Don't forget to synchronize your design docs!
      ● Group level: 2
      ● The reduce function receives the rows that share the same grouped key
        Intermediate, not reduced:
        Key                    Value
        ["stefan", 12345, 1]   "couchguide"
        ["stefan", 12345, 2]   "someone"
        ["marek", 34544, 1]    "robot"
        Reduced:
        Key                 Value
        ["stefan", 12345]   "lent"
        ["marek", 34544]    "available"
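
A toy re-run of the status view in plain Python shows how the "mixed" keys imitate a join. The helper names are hypothetical; each group is reduced in one call, so CouchDB's rereduce step is not exercised, and the reduce receives plain keys instead of CouchDB's [key, docid] pairs. A group counts as "lent" if it contains a row with the marker 2, i.e. an open Lending.

```python
from itertools import groupby


def status_map(doc):
    """Python transcription of the status map function."""
    if doc.get("doc_type") == "Thing":
        yield [doc["owner"], doc["_id"], 1], doc["name"]
    if doc.get("doc_type") == "Lending" and doc.get("lent") and not doc.get("returned"):
        yield [doc["owner"], doc["thing"], 2], doc["to_user"]


def status_reduce(keys, values):
    # A group is "lent" if it contains at least one open Lending row (marker 2).
    return "lent" if any(key[2] == 2 for key in keys) else "available"


def query_status(docs, group_level=2):
    """Toy group_level query: sort rows, group by key prefix, reduce each group."""
    rows = sorted(row for doc in docs for row in status_map(doc))
    result = {}
    for prefix, group in groupby(rows, key=lambda row: tuple(row[0][:group_level])):
        group = list(group)
        result[prefix] = status_reduce([k for k, _ in group], [v for _, v in group])
    return result


docs = [
    {"_id": "12345", "doc_type": "Thing", "owner": "stefan", "name": "couchguide"},
    {"_id": "34544", "doc_type": "Thing", "owner": "marek", "name": "robot"},
    {"_id": "l1", "doc_type": "Lending", "thing": "12345", "owner": "stefan",
     "to_user": "someone", "lent": "2012-09-12T10:00:00", "returned": None},
]

print(query_status(docs))
# {('marek', '34544'): 'available', ('stefan', '12345'): 'lent'}
```

Setting returned on the Lending document removes its row from the map output, so the thing becomes "available" again without touching the Thing document.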
    • Get Status
        # ldb-status.py
        from models import Thing

        things = Thing.view('things/status', group_level=2)
        for result in things:
            owner = result['key'][0]
            thing_id = result['key'][1]
            status = result['value']
            print owner, thing_id, status
    • Contents
      ● Intro
      ● DB Initialization
      ● Key-Value Store
      ● Simple MapReduce Queries
      ● The _changes Feed
      ● Advanced MapReduce Queries
      ● Replication
        – Setting up filters
        – Find Friends and Replicate from them
        – Eventual Consistency and Conflicts
      ● Additional Features and the Couch Ecosystem
    • Replication
      ● Replicate things and their status from friends
      ● Don't replicate things from friends of friends
        – we don't want to borrow anything from them
    • Replication
      ● Pull replication
        – Pull documents from our friends and store them locally
      ● There's also push replication, but we won't use it
      ● Goes through the source's _changes feed
      ● Compares with local documents, then updates them or creates conflicts
    • Set up a Filter
      ● A filter is a JavaScript function that takes
        – a document
        – a request object
      ● and returns
        – true, if the document passes the filter
        – false otherwise
      ● A filter is evaluated at the source
    • Replication Filter
        # _design/things/filters/from_friend.js
        /* doc is the document,
           req is the request that uses the filter */
        function(doc, req) {
            /* Allow only if entry is owned by the friend */
            return (doc.owner == req.query.friend);
        }
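
Why this filter keeps friends of friends out: a friend's database also contains things replicated from their own friends, and only documents owned by the friend pass. A Python transcription run against a toy source database (the owner "anna" and the helper names are hypothetical) shows which documents would be sent:

```python
def from_friend(doc, req):
    """Python transcription of the from_friend filter function."""
    return doc.get("owner") == req["query"].get("friend")


def filtered_changes(source_docs, friend_name):
    """Toy pull replication step: the filter runs at the source, and only
    documents that pass it are sent to the target."""
    req = {"query": {"friend": friend_name}}  # mirrors query_params
    return [doc for doc in source_docs if from_friend(doc, req)]


# The friend's database also holds things replicated from *their* friends.
marek_db = [
    {"_id": "t1", "doc_type": "Thing", "owner": "marek", "name": "robot"},
    {"_id": "t2", "doc_type": "Thing", "owner": "anna", "name": "bike"},
    {"_id": "_design/things", "views": {}},
]

print([doc["name"] for doc in filtered_changes(marek_db, "marek")])  # ['robot']
```

Design documents have no owner and are filtered out as well, which is why each participant syncs the design docs to their own database separately.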
    • Replication
      ● Sync design docs to your own database!
      ● Find friends to borrow from
        – Post your nickname and database URL to http://piratepad.net/pycouchpl
        – Pick at least two friends
    • Replication
      ● _replicator database
      ● Objects describe replication tasks
        – Source
        – Target
        – Continuous
        – Filter
        – etc.
      ● http://wiki.apache.org/couchdb/Replication
    • Replication
        # ldb-replicate-friend.py
        from restkit import BasicAuth
        from couchdbkit import Database

        # username, password, db_url, friend_db_url and friend_name are given
        auth_filter = BasicAuth(username, password)
        db = Database(db_url, filters=[auth_filter])
        replicator_db = db.server['_replicator']

        replication_doc = {
            "source": friend_db_url,
            "target": db_url,
            "continuous": True,
            "filter": "things/from_friend",
            "query_params": {"friend": friend_name},
        }
        replicator_db[username + "-" + friend_name] = replication_doc
    • Replication
      ● Documents should be propagated into your own database
      ● Views should contain both your own and your friends' things
    • Dealing with Conflicts
      ● Conflicts introduced by
        – Replication
        – "Forcing" a document update
      ● _rev calculated based on
        – Previous _rev
        – Document content
      ● Conflict when two documents have
        – The same _id
        – Distinct _rev
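
How a conflict arises from "_rev = f(previous _rev, content)" can be sketched as follows. The hash input is illustrative only (CouchDB hashes different data), but the idea is the same: two replicas that edit the same revision differently end up with the same _id and distinct _rev values, while in this sketch the identical edit on both sides yields the identical _rev.

```python
import hashlib
import json


def next_rev(prev_rev, content):
    """Illustrative revision id "<generation>-<hash>" derived from the previous
    _rev and the new content (CouchDB hashes different input, same idea)."""
    generation = int(prev_rev.split("-")[0]) + 1 if prev_rev else 1
    payload = json.dumps([prev_rev, content], sort_keys=True).encode("utf-8")
    return f"{generation}-{hashlib.md5(payload).hexdigest()}"


base = next_rev(None, {"owner": "marek", "name": "robot"})

# Two replicas edit the same revision independently before replicating.
rev_a = next_rev(base, {"owner": "marek", "name": "robot", "rating": 5})
rev_b = next_rev(base, {"owner": "marek", "name": "robot", "rating": 3})
print(rev_a != rev_b)  # True: same _id, distinct _rev, so a conflict

# The very same edit on both sides yields the same _rev: no conflict.
print(next_rev(base, {"rating": 5}) == next_rev(base, {"rating": 5}))  # True
```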
    • Dealing with Conflicts
      ● Select a winner
      ● The database can't do this for you
      ● Automatic strategy selects a (temporary) winner
        – Deterministic: always the same winner on each node
        – Leaves the documents in conflict state
      ● View that contains all conflicts
      ● Resolve conflicts programmatically
      ● http://guide.couchdb.org/draft/conflicts.html
      ● http://wiki.apache.org/couchdb/Replication_and_conflicts
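
Every node can pick the same temporary winner without any coordination as long as the rule depends only on the revisions themselves. The sketch below follows CouchDB's rule as we understand it (prefer non-deleted leaves, then the highest generation number, then the greatest revision hash); the revision strings are made up. The losing revisions stay stored as conflicts until the application resolves them.

```python
def winning_rev(leaf_revs, deleted=frozenset()):
    """Pick a deterministic winner among conflicting leaf revisions: prefer
    non-deleted leaves, then the highest generation, then the greatest hash."""
    def sort_key(rev):
        generation, rev_hash = rev.split("-", 1)
        return (rev not in deleted, int(generation), rev_hash)
    return max(leaf_revs, key=sort_key)


conflicts = ["2-3f800dffa62f4414b2f8c84f7cb1a1", "2-7c1a0b", "3-01ab"]
print(winning_rev(conflicts))                      # 3-01ab
print(winning_rev(conflicts, deleted={"3-01ab"}))  # 2-7c1a0b
```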
    • Contents
      ● Intro
      ● DB Initialization
      ● Key-Value Store
      ● Simple MapReduce Queries
      ● The _changes Feed
      ● Advanced MapReduce Queries
      ● Replication
      ● Additional Features and the Couch Ecosystem
        – Scaling and related Projects
        – Fulltext Search
        – Further Reading
    • Scaling Up / Out
      ● BigCouch
        – Cluster of CouchDB nodes that appears as a single server
        – http://bigcouch.cloudant.com/
        – will be merged into CouchDB soon
      ● refuge
        – Fully decentralized data platform based on CouchDB
        – Includes a fork of GeoCouch for spatial indexing
        – http://refuge.io/
    • Scaling Down
      ● CouchDB-compatible databases on a smaller scale
      ● PouchDB
        – JavaScript library: http://pouchdb.com/
      ● TouchDB
        – iOS: https://github.com/couchbaselabs/TouchDB-iOS
        – Android: https://github.com/couchbaselabs/TouchDB-Android
    • Fulltext and Relational Search
      ● http://wiki.apache.org/couchdb/Full_text_search
      ● CouchDB Lucene
        – http://www.slideshare.net/martin.rehfeld/couchdblucene
        – https://github.com/rnewson/couchdb-lucene
      ● Elastic Search
        – http://www.elasticsearch.org/
    • Operations Considerations
      ● Append-only storage
      ● Your backup tools: cp, rsync
      ● Regular compaction needed
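
Roughly why append-only storage needs regular compaction: writes only ever add to the end of the file, so old document versions pile up until compaction rewrites the file with just the current ones. Because existing bytes are never modified in place, a copy taken with cp or rsync should contain a consistent prefix of the file. A toy model (a hypothetical class, ignoring B+-tree nodes and revision trees):

```python
class ToyAppendOnlyFile:
    """Hypothetical model of an append-only database file (not CouchDB's format)."""

    def __init__(self):
        self.log = []  # (doc_id, body) entries in write order

    def write(self, doc_id, body):
        self.log.append((doc_id, body))  # older versions are never overwritten

    def read(self, doc_id):
        for entry_id, body in reversed(self.log):  # the newest entry wins
            if entry_id == doc_id:
                return body
        raise KeyError(doc_id)

    def compact(self):
        """Rewrite the log keeping only the newest version of each document."""
        latest = {}
        for doc_id, body in self.log:
            latest[doc_id] = body
        self.log = list(latest.items())


db_file = ToyAppendOnlyFile()
for rating in (3, 4, 5):
    db_file.write("thing-a", {"name": "robot", "rating": rating})
db_file.write("thing-b", {"name": "couchguide"})

print(len(db_file.log))         # 4 entries before compaction
db_file.compact()
print(len(db_file.log))         # 2 entries afterwards
print(db_file.read("thing-a"))  # {'name': 'robot', 'rating': 5}
```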
    • Further Features
      ● Update handlers: JavaScript code that carries out updates in the database server
      ● External processes: use CouchDB as a proxy to other processes (e.g. search engines)
      ● Attachments: attach binary files to documents
      ● Update validation: JavaScript code to validate document updates
      ● CouchApps: web apps served directly by CouchDB
      ● Bulk APIs: several updates in one request
      ● List and show functions: transforming responses before serving them
    • Summing Up
      ● Apache CouchDB™ is a database that uses JSON for documents, JavaScript for MapReduce queries, and regular HTTP for an API
      ● couchdbkit is a Python library providing access to Apache CouchDB
    • Thanks! Time for Questions and Discussion
      Stefan Kögl
      stefan@skoegl.net
      @skoegl
      Downloads
        – https://slideshare.net/skoegl/couch-db-pythonpyconpl2012
        – https://github.com/stefankoegl/python-couchdb-examples