Mongopersist

Published in: Technology, Education
  • A few sentences about mongoDB, databases and persistence. Then we'll dive into the features of mongopersist.
  • Ask around who knows mongoDB, pg/mysql, zodb, ZTK, pyramid. Authors and stuff: I had the idea first, using a KV store for a ZODB backend, because of relstorage and its memcached caching. But having ACID transactions on top of any KV store is a pain. Then MongoDB came along. Stephan had the idea to implement on top of Persistent and just skip ZODB. I've been the co-pilot since the beginning.
  • Most of us know it, but it grew tons of features recently. Its document store is great; indexing and query features keep improving. The pain is having no ACID transactions: you have to embrace eventual consistency. Some like it, some hate it; lately I have read complaints about it. We like it because of its schemaless documents and because it makes querying easy, NOT really for scaling.
  • Tools/databases have their strengths and weaknesses. You have to evaluate and pick the right tool for the job. Sometimes it turns out later that the tool that looked best isn't the best after all. But sometimes it's not just the tool, but its usage, e.g. it matters whether a sub-object is a sub-document or a separate collection. Optimize on demand. The goal of mongopersist is to address the cons of mongoDB by effectively reducing the impedance mismatch and being almost fully transparent, like ZODB.
  • State that outlives the process that created it, so the optimal case is when my objects just keep their state without any extra calls. I get my object from somewhere, manipulate it and I'm done. OK, I need to commit the transaction. Otherwise we do this all the time, just with extra steps: query, deserialize, modify, serialize, write. Also, when directly manipulating documents / raw data, there are no objects in sight, or I need to (de)serialize manually.
  • Let's have a look at a handmade example. This is a VERY simple example; imagine traversal etc. Here there is no de/serialization step, but at the same time we lose the OO paradigm. At the beginning of a project I want simplicity, and I optimize later on demand.
  • Then, what we can do using mongopersist.
  • A quick look at the class we'll be using in the examples. Note the friends {} and visited () attributes. IMPORTANT: subclass Persistent, PersistentDict, PersistentList.
  • As good as it gets. We need some connection setup code. Either app startup code or request setup code will do that, so the DB/datamanager is more or less a given for a single request.
  • `dm.root` is a mapping from names to mongoDB DBRef objects that are automatically resolved to objects when accessed. That sounds complicated, but in reality it does a find() on the object.name. Let's create an object. Add it to the datamanager root, which is our persistence root. transaction.commit, abort.
  • dumpCollection is our helper.
  • Another sample class; note the _p_mongo_collection attribute. It means the instances will get stored in the specified collection.
  • Sub-object, which becomes a document in a different collection.
  • It sometimes makes sense to store multiple types of (similar) objects in the same collection. mongopersist will automatically notice these cases and store the Python type as part of the document.
  • _p_mongo_sub_object: the ``_p_mongo_sub_object`` attribute is used to mark a type of object to be just part of another document. This is a design decision, which can: - avoid multiple queries - get more consistency - ...
  • In this case Phone does NOT have Persistent as its superclass. Sub-objects get dumped, but later changes to them do not; that's why Persistent is needed. mongopersist will return list and dict types converted to PersistentList and PersistentDict, respectively. This makes life easier.
  • Look, no declaration needed.
  • Object, list of objects. Note the possibility of recursion in Person. Mongopersist silently changes basic mutable types to their persistent implementations. Circular references: object trees might be a problem when inserting.
  • The process of dumping data during a transaction under the assumption the transaction will succeed. CRUD and query don't match? E.g. you need to store changes first, because a query needs to see those changes. Mongopersist keeps the original state of the objects, so we can revert in case of a problem. The database might temporarily be in an inconsistent state.
  • Here is the code example. We modify some objects. NO COMMIT! If mongopersist did not flush, the count would return who knows what.
  • No MVCC in mongoDB; we use a serial number in the document. - NoCheckConflictHandler: This conflict handler does absolutely nothing to resolve conflicts. Default of the library. Last flush wins. - SimpleSerialConflictHandler: Detects conflicts by comparing serial numbers and always raises a ``ConflictError`` error. - ResolvingSerialConflictHandler: Detects conflicts by comparing serial numbers and allows objects to resolve conflicts by calling their ``_p_resolveConflict()`` method.
  • Some objects might not naturally serialize well and create a very ugly Mongo entry, making querying a pain. Thus, we allow custom serializers to be registered, which can encode/decode different types of objects. Register those in serialize.SERIALIZERS. There's already one for datetime.date, which serializes nicely to an ordinal number.
  • datamanager.get_collection(): find_objects, find_one_object. The datamanager can load any object by DBRef; that means if you have the database, collection and _id, it's straightforward to get the object.
  • DB access and creating objects in Python is quite slow, so there are some methods in place to improve on this: - dbref -> python class lookup - object cache: the same object instance is returned on access - document cache: retrieved documents can be cached, so that on object lookup a trip to the DB can be avoided.
  • LoggingDecorator: LOGGED_METHODS = ['insert', 'update', 'remove', 'save', 'find_and_modify', 'find_one', 'find', 'count'] logs the calls to those methods, incl. args and kwargs, optionally with traceback (by default it's added). mongoDB has its own query logging, but it definitely won't log tracebacks → __traceback_info__
  • mapping.MongoCollectionMapping: it has a dict-ish interface; it subclasses UserDict.DictMixin, which should provide all the methods of a dict. With __mongo_collection__ you specify the collection NAME within the DB. By default it uses the “key” attribute of the contained object as key. Override with __mongo_mapping_key__.
  • Explain (not just): events. Contained, Container, __name__, __parent__: those are the basics of the tree structure that you usually build with a ZTK/ZODB app. ZODB containers can hold Mongo items, allowing switching to mongo on any level. MongoContained: works hard on __name__+__parent__, which are not easily persisted. Don't be afraid of MongoContaine[r/d]; they provide handy features. Find* constrains the scope to the objects contained in the actual container.
  • A usual webapp will be multithreaded. Each thread should have its own MongoDataManager. Therefore Mongopersist provides connection pooling. A bit more setup code, but with a webapp you'll need the pool. Again, don't be afraid of the ZCA. Also useful without ZTK: just copy the ZCA calls, it will work.
  • Annotations: IMongoAttributeAnnotatable. Zope annotations can store metadata about the object itself, like DublinCore data, permissions, etc. The default zope IAnnotatable BtreeContainers don't serialize well to mongo, so it had to be rewritten. It does not use __annotations__ but stores the keys directly as document attributes, which makes nicer mongo documents. DublinCore: all sorts of metadata (created, modified, etc.) which get automatically updated by zope events.
  • pickle? Quite handy: it persists any Python object (well, most, without external effects like files), but its downside is the BLOB-ish storage that makes it unqueryable and inaccessible without Python. The source code must be available to unpickle.
  • It's less widely known, so some details here: it's actually a key-value store, and it was one before any KV store was hype (first commit 1997/feb). Features of the ZODB include: transactions, history/undo, transparently pluggable storage, built-in caching, multiversion concurrency control (MVCC), and scalability across a network (using ZEO). pro: very transparent (my favourite for MVPs), ACID, good for read-intensive apps. con: the V in KV uses pickles; build your own indexes. Usual practice is to build the indexes with ZODB, even text indexes; lately text indexes rather go into Lucene and friends.
  • pro: data goes into a well-known SQL database, ACID. con: very strict schema, object-table impedance mismatch. It starts with types, for example a list of strings: how do you put that into pgsql? This can be more or less avoided by designing with the ORM in mind and/or doing conversions. But do I want that? Keeping the object and DB schema in sync. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
  • pro: documents match objects, quite complex structures can be built, no strict schema. con: no real ACID; you'd better think twice about how you store the data.
  • Mongopersist

    1. 2013-07-05 Europython Florence mongopersist 1/37
        mongopersist
        Adam GROSZER
        https://github.com/zopefoundation/mongopersist
        https://pypi.python.org/pypi/mongopersist
        slideshare URL
        https://github.com/agroszer
        http://hu.linkedin.com/in/agroszer/
    2. contents
        ● mongoDB
        ● databases
        ● persistence
        ● mongopersist features
    3. mongopersist
    4.
    5. choose your database, choose your future (pro / con)
        ZODB
        - very transparent
        - object store
        - only python/native
        - no query lang
        - no 3rd party tools
        - no default indexing
        RDBMS ORM
        - ad-hoc SQL queries
        - indexes, tools, etc.
        - BIG impedance mismatch
        - limited transparency
        - strict schema
        mongoDB
        - document store
        - ad-hoc queries
        - indexes, tools, etc.
        - small impedance mismatch
        - limited/no transparency
    6. persistence
        state that outlives the process
        - get the object
        - modify
        - (finish the transaction)
    7. persistence
        hand-crafted example, using pymongo:
        >>> stephan = db.coll.find_one(
        ...     {'name': 'Stephan'})
        >>> stephan['phone'] = {
        ...     'country': '+1',
        ...     'area': '555',
        ...     'number': '3945456'}
        >>> db.coll.save(stephan)
    8. persistence
        ideal case with mongopersist:
        >>> persons = dm.root
        >>> stephan = persons['stephan']
        >>> stephan.phone = Phone(
        ...     '+1', '555', '3945456')
        >>> transaction.commit()
    9. mongopersist features
        ● transparency
        ● transactions
            Optimistic Data Dumping
        ● write conflict detection
        ● custom de/serialization
        ● object caching
        ● query logging, incl. traceback
        ● (not just) ZTK corner
    10. sample class
        class Person(persistent.Persistent):
            def __init__(self, name, phone=None, address=None,
                         friends=None, visited=(), birthday=None):
                self.name = name
                self.address = address
                self.friends = friends or {}
                self.visited = visited
                self.phone = phone
                self.birthday = birthday
                self.today = datetime.datetime.now()
            ...
    11. transparency
        setup code:
        >>> conn = pymongo.Connection(
        ...     'localhost', 27017, tz_aware=False)
        >>> from mongopersist import datamanager
        >>> dm = datamanager.MongoDataManager(conn)
    12. transparency
        >>> dm.root
        >>> stephan = Person(u'Stephan')
        >>> dm.root['stephan'] = stephan
        >>> stephan = dm.root['stephan']
        >>> stephan.name = u'Stephan Richter'
        >>> transaction.commit()
    13. defaults
        >>> dumpCollection('__main__.Person')
        [{u'_id': ObjectId('...'),
          u'address': None,
          u'birthday': None,
          u'friends': {},
          u'name': u'Stephan Richter',
          u'phone': None,
          u'today': datetime.datetime(2013, 6, 18, 14, 48, 30, 970000),
          u'visited': []}]
    14. customizing
        class Address(persistent.Persistent):
            _p_mongo_collection = 'address'
            def __init__(self, city, zip):
                self.city = city
                self.zip = zip
    15. sub-objects via DBRef
        >>> stephan.address = Address('Maynard', '01754')
        >>> transaction.commit()
        >>> dumpCollection('address')
        [{u'_id': ObjectId('...'),
          u'city': u'Maynard',
          u'zip': u'01754'}]
        >>> dumpCollection('__main__.Person')
        [{u'_id': ObjectId('...'),
          u'address': DBRef(u'address',
                            ObjectId('...'),
                            u'mongopersist_test'),
          ...}]
    16. collection sharing
        class Person(persistent.Persistent):
            _p_mongo_collection = 'person'
            name = u''
            ...
        class Employee(Person):
            _p_mongo_collection = 'person'
            salary = 0
            ...
        mongopersist will automatically notice these cases and store the Python type as part of the document
    17. sub object/document
        class Car(persistent.Persistent):
            _p_mongo_sub_object = True
            def __init__(self, year, make, model):
                self.year = year
                self.make = make
                self.model = model
        >>> dm.root['stephan'].car = Car('2005', 'Ford', 'Explorer')
        >>> dumpCollection('__main__.Person')
        [{...
          u'car': {u'_py_persistent_type': u'__main__.Car',
                   u'make': u'Ford',
                   u'model': u'Explorer',
                   u'year': u'2005'},
          ...}]
    18. beware of non Persistent objects
        class Phone(object):
            def __init__(self, country, area, number):
                ...
        >>> stephan.phone = Phone('+1', '978', '394-5124')
        >>> dumpCollection('__main__.Person')
        [{...
          u'phone': {u'_py_type': u'__main__.Phone',
                     u'area': u'978',
                     u'country': u'+1',
                     u'number': u'394-5124'},
          ...}]
        >>> stephan.phone.number = '555-1234'
        >>> transaction.commit()
        Changes not saved, because Phone does not subclass Persistent
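Why the change to a plain Phone goes unnoticed can be illustrated without mongopersist. The sketch below is a toy model only: `Tracked` is a hypothetical stand-in for `persistent.Persistent` that flags attribute writes, and `mark_saved` is a made-up helper that plays the role of a commit. It is not how the persistent package is implemented, just the idea behind it.

```python
class Tracked(object):
    """Hypothetical stand-in for persistent.Persistent: flags attribute writes."""
    _changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != '_changed':
            object.__setattr__(self, '_changed', True)


class PlainPhone(object):
    """Like Phone on the slide: no change tracking at all."""
    def __init__(self, number):
        self.number = number


class TrackedPhone(Tracked):
    def __init__(self, number):
        self.number = number


class Person(Tracked):
    def __init__(self, phone):
        self.phone = phone


def mark_saved(*objs):
    """Made-up helper: pretend a commit happened and reset the flags."""
    for obj in objs:
        obj._changed = False


plain = PlainPhone('394-5124')
owner_of_plain = Person(plain)
tracked = TrackedPhone('394-5124')
owner_of_tracked = Person(tracked)
mark_saved(owner_of_plain, owner_of_tracked, tracked)

# Mutate the sub-objects in place; neither owner is assigned to.
plain.number = '555-1234'
tracked.number = '555-1234'

# Nobody is flagged for the plain phone, so nothing would be written.
plain_change_seen = owner_of_plain._changed or getattr(plain, '_changed', False)
# The tracked phone flags itself, so a data manager could pick it up.
tracked_change_seen = owner_of_tracked._changed or tracked._changed
```

In this model the owner is never flagged either way; only a sub-object that tracks its own writes can report the change.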
    19. add/delete property
        >>> stephan.foobar = 42
        >>> transaction.commit()
        No declaration needed!
        >>> dumpCollection('__main__.Person')
        [{...
          u'foobar': 42,
          u'name': u'Stephan',
          ...}]
        >>> del stephan.foobar
        >>> transaction.commit()
    20. custom property
        >>> stephan.friends[u'roger'] = Person(u'Roger')
        >>> stephan.visited.append('Italy')
        >>> transaction.commit()
        >>> dumpCollection('__main__.Person')
        [{...
          u'friends': {u'roger': DBRef(u'__main__.Person',
                                       ObjectId('...'),
                                       u'mongopersist_test')},
          u'visited': [u'Italy']},
         {...
          u'name': u'Roger',
          ...}]
        >>> stephan.friends[u'roger'].name
        u'Roger'
        Circular references:
        - Persistent: OK
        - non-Persistent: no-go!
    21. Optimistic Data Dumping
        The process of dumping data during a transaction under the assumption the transaction will succeed.
        object modifications
        ...
        object modifications
        ...
        automatic/implicit flush
        query
    22. Optimistic Data Dumping
        >>> stephan.foobar = 42
        ...code...
        >>> roy.foobar = 88
        ...code...
        >>> dm.get_collection_from_object(
        ...     roy).count({'foobar': 88})
        1
        ALL query methods are wrapped to call flush first
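The flush-before-query idea, together with keeping original state for a later revert, can be sketched with a dict standing in for the database. Everything here (`TinyDataManager`, its method names, the in-memory `db`) is invented for illustration and is not mongopersist's API; it only mirrors the behaviour described on the two slides above.

```python
import copy


class TinyDataManager(object):
    """Toy model: a dict plays the database, keyed by document id."""

    def __init__(self):
        self.db = {}         # "collection": id -> stored document
        self.dirty = {}      # id -> modified, not yet flushed document
        self.originals = {}  # id -> document as it was before the first flush

    def modify(self, doc_id, **changes):
        current = copy.deepcopy(self.db.get(doc_id, {}))
        doc = self.dirty.setdefault(doc_id, current)
        doc.update(changes)

    def flush(self):
        for doc_id, doc in self.dirty.items():
            # Remember the pre-transaction state once, so abort can restore it.
            self.originals.setdefault(
                doc_id, copy.deepcopy(self.db.get(doc_id)))
            self.db[doc_id] = doc
        self.dirty = {}

    def count(self, **query):
        self.flush()  # a query has to see the pending changes
        return sum(1 for doc in self.db.values()
                   if all(doc.get(k) == v for k, v in query.items()))

    def commit(self):
        self.flush()
        self.originals = {}

    def abort(self):
        self.dirty = {}
        for doc_id, original in self.originals.items():
            if original is None:
                self.db.pop(doc_id, None)   # document did not exist before
            else:
                self.db[doc_id] = original  # put the old state back
        self.originals = {}


dm = TinyDataManager()
dm.db = {'stephan': {'name': 'Stephan'}, 'roy': {'name': 'Roy'}}

dm.modify('stephan', foobar=42)
dm.modify('roy', foobar=88)
count_before_commit = dm.count(foobar=88)  # flushes first, NO COMMIT yet

dm.abort()                                 # revert the optimistic writes
count_after_abort = dm.count(foobar=88)
```

Between the flush and the abort the toy database holds uncommitted data, which is the temporarily inconsistent state the notes warn about.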
    23. write conflict detection
        _py_serial
        ● Conflict detection
        ● Conflict resolution
        Handlers:
        ● NoCheckConflictHandler
            ● Ignore conflicts, last flush wins, default
        ● SimpleSerialConflictHandler
            ● Detects conflicts, always raises ConflictError
        ● ResolvingSerialConflictHandler
            ● Detects conflicts, calls _p_resolveConflict()
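Serial-number conflict detection can be sketched in a few lines. The `_py_serial` field name comes from the slide; the dict-based `db`, the function `write_with_serial_check` and the local `ConflictError` class are assumptions made for this example and are not the library's handlers.

```python
class ConflictError(Exception):
    """Local stand-in for the error raised on a detected conflict."""


def write_with_serial_check(db, doc_id, new_doc, read_serial):
    """Store new_doc only if the serial is still the one we read earlier."""
    current = db.get(doc_id, {'_py_serial': 0})
    if current['_py_serial'] != read_serial:
        raise ConflictError(
            '%s: read serial %s, found %s'
            % (doc_id, read_serial, current['_py_serial']))
    stored = dict(new_doc, _py_serial=read_serial + 1)
    db[doc_id] = stored
    return stored


db = {'stephan': {'name': 'Stephan', '_py_serial': 1}}

# Two transactions load the same document and remember its serial.
serial_a = db['stephan']['_py_serial']
serial_b = db['stephan']['_py_serial']

# The first write succeeds and bumps the serial to 2.
write_with_serial_check(db, 'stephan', {'name': 'Stephan Richter'}, serial_a)

# The second write still carries serial 1, so the conflict is detected.
try:
    write_with_serial_check(db, 'stephan', {'name': 'S. Richter'}, serial_b)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

The check and the write are two separate steps in this sketch, so it is not atomic; it only shows the comparison that tells a stale write from a fresh one.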
    24. custom serializers
        stephan.birthday = datetime.date(1980, 1, 25)
        u'birthday': {u'_py_factory': u'datetime.date',
                      u'_py_factory_args': [Binary('\x07\xbc\x01\x19', 0)]},
        class DateSerializer(serialize.ObjectSerializer):
            def can_read(self, state):
                return (isinstance(state, dict) and
                        state.get('_py_type') == 'datetime.date')
            def read(self, state):
                return datetime.date.fromordinal(state['ordinal'])
            def can_write(self, obj):
                return isinstance(obj, datetime.date)
            def write(self, obj):
                return {'_py_type': 'datetime.date',
                        'ordinal': obj.toordinal()}
        >>> serialize.SERIALIZERS.append(DateSerializer())
        u'birthday': {u'_py_type': u'datetime.date',
                      u'ordinal': 722839},
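The serializer from the slide can be tried without a database. In this sketch the class does not subclass `serialize.ObjectSerializer`, and the module-level `SERIALIZERS` list plus the `dump`/`load` helpers are hypothetical stand-ins for whatever mongopersist does internally; only the four methods are taken from the slide.

```python
import datetime


class DateSerializer(object):
    """The four methods from the slide, minus the mongopersist base class."""

    def can_read(self, state):
        return (isinstance(state, dict) and
                state.get('_py_type') == 'datetime.date')

    def read(self, state):
        return datetime.date.fromordinal(state['ordinal'])

    def can_write(self, obj):
        return isinstance(obj, datetime.date)

    def write(self, obj):
        return {'_py_type': 'datetime.date', 'ordinal': obj.toordinal()}


# Stand-in for serialize.SERIALIZERS: the first serializer that accepts wins.
SERIALIZERS = [DateSerializer()]


def dump(obj):
    """Hypothetical helper: turn an object into a document-friendly state."""
    for serializer in SERIALIZERS:
        if serializer.can_write(obj):
            return serializer.write(obj)
    return obj


def load(state):
    """Hypothetical helper: turn a stored state back into an object."""
    for serializer in SERIALIZERS:
        if serializer.can_read(state):
            return serializer.read(state)
    return state


state = dump(datetime.date(1980, 1, 25))
restored = load(state)
```

The resulting state is a plain dict with an integer ordinal, so a range query on the birthday becomes a simple numeric comparison.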
    25. querying mongoDB
        datamanager.get_collection(dbname, collname)
        datamanager.get_collection_from_object(obj)
        - find, find_one, count, etc
        extra methods, which return objects:
        - find_objects()
        - find_one_object()
        ALL query methods are wrapped to call flush first
        datamanager.load(dbref)
    26. object caching
        DB access and object instantiation is quite slow
        ● class Lookup Cache: dbref → python class lookup
        ● object Cache: dbref → object (within transaction)
        ● document Cache: dbref → document (avoid DB trip)
    27. query logging, incl. traceback
        LoggingDecorator:
        Logs the calls to insert, update, remove, save, find, find_one, find_and_modify, count including args and kwargs.
        With optional traceback.
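A logging wrapper of this kind might look roughly like the sketch below. `LOGGED_METHODS` is the list quoted in the speaker notes; `LoggingCollection`, `FakeCollection` and `ListHandler` are names invented here, and the real `LoggingDecorator` may be structured quite differently.

```python
import logging
import traceback

LOGGED_METHODS = ['insert', 'update', 'remove', 'save',
                  'find_and_modify', 'find_one', 'find', 'count']


class LoggingCollection(object):
    """Hypothetical proxy: logs calls to the listed methods, then delegates."""

    def __init__(self, collection, log, add_tb=True):
        self._collection = collection
        self._log = log
        self._add_tb = add_tb

    def __getattr__(self, name):
        attr = getattr(self._collection, name)
        if name not in LOGGED_METHODS:
            return attr

        def logged(*args, **kwargs):
            # The call stack shows WHERE in the application the query came from.
            tb = ''.join(traceback.format_stack()) if self._add_tb else ''
            self._log.debug('%s(%r, %r)\n%s', name, args, kwargs, tb)
            return attr(*args, **kwargs)
        return logged


class FakeCollection(object):
    """Stands in for a pymongo collection so the sketch runs without a DB."""
    def count(self, spec=None):
        return 1


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list for inspection."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log = logging.getLogger('mongo-logging-sketch')
log.setLevel(logging.DEBUG)
handler = ListHandler()
log.addHandler(handler)

coll = LoggingCollection(FakeCollection(), log)
result = coll.count({'foobar': 88})
```

Capturing the stack on the client side is what the server's own query log cannot provide, which is the point made in the notes.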
    28. containers and collections
        class People(MongoCollectionMapping):
            __mongo_collection__ = '__main__.Person'
            __mongo_mapping_key__ = 'name'
        ● Mapping/dict API for a Mongo collection.
        ● Specify the collection to use for the mapping
        ● Specify the attribute that represents the dictionary key.
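The idea of a dict API keyed by one attribute can be shown with a list of documents in place of a collection. `ListBackedMapping` and its `mapping_key` attribute are hypothetical; they only imitate the role of `__mongo_mapping_key__` and say nothing about how `MongoCollectionMapping` is written.

```python
from collections.abc import MutableMapping


class ListBackedMapping(MutableMapping):
    """Dict API over a list of documents; the list plays the collection."""

    mapping_key = 'key'  # plays the role of __mongo_mapping_key__

    def __init__(self, documents):
        self._documents = documents

    def _index(self, key):
        for i, doc in enumerate(self._documents):
            if doc.get(self.mapping_key) == key:
                return i
        raise KeyError(key)

    def __getitem__(self, key):
        return self._documents[self._index(key)]

    def __setitem__(self, key, doc):
        # The key is stored as an ordinary attribute of the document.
        doc = dict(doc, **{self.mapping_key: key})
        try:
            self._documents[self._index(key)] = doc
        except KeyError:
            self._documents.append(doc)

    def __delitem__(self, key):
        del self._documents[self._index(key)]

    def __iter__(self):
        return (doc[self.mapping_key] for doc in self._documents)

    def __len__(self):
        return len(self._documents)


class People(ListBackedMapping):
    mapping_key = 'name'


people = People([{'name': 'Stephan'}, {'name': 'Roy'}])
people['Adam'] = {'visited': ['Italy']}
names = sorted(people)
```

Deriving from `MutableMapping` supplies `keys()`, `items()`, `get()` and the `in` operator from the five methods above, much as the notes describe for `UserDict.DictMixin`.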
    29. (not just) ZTK corner
        Contained, Container, __name__, __parent__
        zope.container.MongoContained
        zope.container.MongoContainer
        - mapping/dict interface
        - and more: add, (raw_)find, (raw_)find_one
        zope.container.IdNamesMongoContainer
        - uses the item's ObjectID as key
    30. (not just) ZTK corner
        connection pooling
        app setup:
        mdmp = pool.MongoDataManagerProvider(
            host='localhost', port=27017,
            logLevel=20, tz_aware=True, w=1, j=True)
        zope.component.provideUtility(mdmp)
        request setup:
        mdmp = getUtility(interfaces.IMongoDataManagerProvider)
        dm = mdmp.get()
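One way to give each thread its own data manager is thread-local storage. The sketch below assumes that approach; `DataManagerProvider` and the `factory` argument are invented for the example, and whether `pool.MongoDataManagerProvider` works this way internally is not stated in the slides.

```python
import threading


class DataManagerProvider(object):
    """Hypothetical provider: one data manager per thread, created lazily."""

    def __init__(self, factory):
        self._factory = factory          # callable that builds a data manager
        self._local = threading.local()  # separate attribute space per thread

    def get(self):
        dm = getattr(self._local, 'dm', None)
        if dm is None:
            dm = self._local.dm = self._factory()
        return dm


created = []


def factory():
    # A bare object stands in for a real MongoDataManager.
    dm = object()
    created.append(dm)
    return dm


provider = DataManagerProvider(factory)

main_dm = provider.get()
same_again = provider.get()      # same thread, same data manager

from_other_thread = []
worker = threading.Thread(
    target=lambda: from_other_thread.append(provider.get()))
worker.start()
worker.join()
```

The request setup code then only needs to call `get()`, regardless of which worker thread serves the request.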
    31. (not just) ZTK corner
        IMongoAttributeAnnotatable - storing metadata
        DublinCore - standard for metadata
    32. Q&A
    33. timesink
    34. pickle
        persistence in python = pickle?
    35. databases (pro / con)
        ZODB
        - very transparent
        - object store
        - only python/native
        - no query lang
        - no 3rd party tools
        - no default indexing
        RDBMS ORM
        - ad-hoc SQL queries
        - indexes, tools, etc.
        - BIG impedance mismatch
        - limited transparency
        - rigid schema
        mongoDB
        - document store
        - ad-hoc queries
        - indexes, tools, etc.
        - small impedance mismatch
        - limited/no transparency
