PyConUK2013 - Validated documents on MongoDB with Ming


Ming is a SQLAlchemy-inspired object-document mapper (ODM) for MongoDB, developed at SourceForge. The TurboGears2 web framework also uses it to provide MongoDB support.

After a short introduction to the basic Ming layer, we will cover the Ming Object Document Mapper layer and show how to use its Unit of Work to avoid applying incomplete changes and how to express relations between collections.

The last part of the talk will show how to use Ming to perform lazy migration of data when your schema changes and how to drop below the ODM layer to achieve maximum speed.

Published in: Technology, Education

  1. 1. VALIDATED DOCUMENTS ON MONGODB WITH MING Alessandro Molina @__amol__
  2. 2. Who am I
     ● CTO @, mostly Python company (with some iOS and Android)
     ● TurboGears development team member
     ● Contributor to the Ming project's ODM layer
     ● Really happy to be here at PyConUK!
       ○ I thought I would crash my car driving on the wrong side!
  3. 3. MongoDB Models
     ● Schema free
       ○ It looks like you don’t have a schema, but your code depends on properties that need to be there.
     ● SubDocuments
       ○ You know that a blog post contains a list of comments, but what is a comment?
     ● Relations
       ○ You don’t have joins and foreign keys, but you still need to express relationships.
  4. 4. What’s Ming?
     ● MongoDB toolkit
       ○ Validation layer on pymongo
       ○ Manages schema migrations
       ○ In Memory MongoDB
       ○ ODM on top of all of those
     ● Born at
     ● Supported by TurboGears community
     [Diagram labels: MongoDB, PyMongo, Ming, Ming.ODM]
  5. 5. Getting Started with the ODM
     ● Ming.ODM looks like SQLAlchemy
     ● UnitOfWork
       ○ Avoid half-saved changes in case of crashes
       ○ Flush all your changes at once
     ● IdentityMap
       ○ Same DB objects are the same object in memory
     ● Supports Relations
     ● Supports events (after_insert, before_update, …)
  6. 6. Declaring Schema with the ODM
     ● Declarative interface for models
     ● Supports polymorphic models

         from ming import schema
         from ming.odm import (MappedClass, FieldProperty,
                               ForeignIdProperty, RelationProperty)

         # DBSession is the application's ODM session, defined elsewhere.

         class WikiPage(MappedClass):
             # Metadata for the collection
             # like its name, indexes, session, ...
             class __mongometa__:
                 session = DBSession
                 name = 'wiki_page'
                 unique_indexes = [('title',)]

             _id = FieldProperty(schema.ObjectId)
             title = FieldProperty(schema.String)
             text = FieldProperty(schema.String)

             # Ming automatically generates
             # the relationship query
             comments = RelationProperty('WikiComment')

         class WikiComment(MappedClass):
             class __mongometa__:
                 session = DBSession
                 name = 'wiki_comment'

             _id = FieldProperty(schema.ObjectId)
             text = FieldProperty(schema.String, if_missing='')

             # Provides an actual relation point
             # between comments and pages
             page_id = ForeignIdProperty('WikiPage')
  7. 7. Querying the ODM
     ● Query language tries to be natural for both SQLAlchemy and MongoDB users

         from pymongo import DESCENDING

         wp = WikiPage.query.get(title='FirstPage')

         # Identity map prevents duplicates
         wp2 = WikiPage.query.get(title='FirstPage')
         assert wp is wp2

         # manually fetching related comments
         comments = WikiComment.query.find(dict(page_id=wp._id)).all()
         # or
         comments = wp.comments

         # gets last 5 wikipages in natural order
         wps = WikiPage.query.find().sort('$natural', DESCENDING).limit(5).all()
  8. 8. The Unit Of Work
     ● Flush or Clear the pending changes
     ● Avoid mixing UOW and atomic operations
     ● UnitOfWork as a cache

         wp = WikiPage(title='FirstPage', text='This is my first page')
         DBSession.flush()

         wp.title = "TITLE 2"
         DBSession.update(WikiPage, {'_id': wp._id},
                          {'$set': {'title': "TITLE 3"}})
         DBSession.flush()  # wp.title will be TITLE 2, not TITLE 3

         wp2 = DBSession.get(WikiPage, wp._id)
         # wp2 lookup won't query the database again
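
One way to picture why `wp.title` ends up as TITLE 2: the session keeps the in-memory object and writes it back on flush, so an update that bypasses the session is overwritten. The sketch below is plain Python and is not Ming's implementation; `FakeCollection` and `SketchSession` are hypothetical names invented for illustration, and documents are plain dicts.

```python
class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection: documents keyed by _id."""

    def __init__(self):
        self.docs = {}
        self.reads = 0  # counts lookups that "hit the database"

    def save(self, doc):
        self.docs[doc["_id"]] = dict(doc)

    def update_set(self, _id, fields):
        # Atomic-style update that goes straight to storage, bypassing any session.
        self.docs[_id].update(fields)

    def find_one(self, _id):
        self.reads += 1
        doc = self.docs.get(_id)
        return dict(doc) if doc is not None else None


class SketchSession:
    """Hypothetical session combining an identity map and a unit of work."""

    def __init__(self, collection):
        self.collection = collection
        self.identity_map = {}  # _id -> the single in-memory object
        self.pending = {}       # _id -> objects waiting to be written

    def save(self, obj):
        self.identity_map[obj["_id"]] = obj
        self.pending[obj["_id"]] = obj

    def get(self, _id):
        # Serve from the identity map when possible; query storage otherwise.
        if _id not in self.identity_map:
            doc = self.collection.find_one(_id)
            if doc is None:
                return None
            self.identity_map[_id] = doc
        return self.identity_map[_id]

    def flush(self):
        # Write every pending object in one go.
        for obj in self.pending.values():
            self.collection.save(obj)
        self.pending.clear()

    def clear(self):
        # Drop pending changes without writing them.
        self.pending.clear()
        self.identity_map.clear()


collection = FakeCollection()
session = SketchSession(collection)

wp = {"_id": 1, "title": "FirstPage"}
session.save(wp)
nothing_written_yet = collection.find_one(1) is None
session.flush()

wp["title"] = "TITLE 2"
session.save(wp)
collection.update_set(1, {"title": "TITLE 3"})  # bypasses the session
session.flush()                                 # overwrites TITLE 3

stored_title = collection.find_one(1)["title"]
reads_before = collection.reads
wp2 = session.get(1)  # answered from the identity map
```

In this sketch the bypassing update is lost because flush rewrites the whole object, which is the reason the slide advises against mixing the unit of work with atomic operations.
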
  9. 9. How Validation works
     ● Ming documents are validated at certain points in their life cycle
       ○ When saving the document to the database
       ○ When loading it from the database
       ○ Additionally, validation is performed when the document is created through the ODM layer or using the .make() method
         ■ Happens before they get saved for real
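
As a rough illustration of validating at creation time, the following plain-Python sketch checks a document against a schema and fills in defaults before anything would reach the database. It does not use Ming: the schema format, `validate`, `make`, and `Invalid` here are hypothetical stand-ins, and rejecting unknown fields is a choice made for this sketch.

```python
class Invalid(Exception):
    """Raised when a document does not match its schema."""


# Hypothetical schema format: field name -> (expected type, default if missing)
WIKI_COMMENT_SCHEMA = {
    "text": (str, ""),
    "page_id": (int, None),
}


def validate(schema, doc):
    """Return a validated copy of doc, filling defaults for missing fields."""
    unknown = set(doc) - set(schema)
    if unknown:
        raise Invalid("unexpected fields: %s" % sorted(unknown))
    result = {}
    for name, (expected_type, if_missing) in schema.items():
        if name not in doc:
            result[name] = if_missing
        elif isinstance(doc[name], expected_type):
            result[name] = doc[name]
        else:
            raise Invalid("%s: expected %s" % (name, expected_type.__name__))
    return result


def make(schema, **fields):
    """Validate at creation time, before the document is saved."""
    return validate(schema, fields)


comment = make(WIKI_COMMENT_SCHEMA, page_id=7)

try:
    make(WIKI_COMMENT_SCHEMA, text=42)  # wrong type for "text"
    rejected = False
except Invalid:
    rejected = True
```

The same `validate` function could be called again on save and on load, which is where the per-document cost discussed in the next slides comes from.
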
  10. 10. Cost of Validation
      ● MongoDB is famous for its speed, but validation has a cost
        ○ MongoDB documents can contain many subdocuments
        ○ Each subdocument must be validated by Ming
        ○ Can even contain lists of multiple subdocuments
  11. 11. Cost of Validation benchmark

          # With validation
          class User(MappedClass):
              # ...
              friends = FieldProperty([dict(fbuser=s.String,
                                            photo=s.String,
                                            name=s.String)],
                                      if_missing=[])

          >>> timeit.timeit('User.query.find().all()', number=20000)
          31.97218942642212

          # Without validation
          class User(MappedClass):
              # ...
              friends = FieldProperty(s.Anything, if_missing=[])

          >>> timeit.timeit('User.query.find().all()', number=20000)
          23.391359090805054

          # Avoiding the field at query time
          >>> timeit.timeit('User.query.find({}, fields=("_id","name")).all()', number=20000)
          21.58667516708374
  12. 12. Only query what you need
      ● The previous benchmark shows why it pays to query only the fields you need to process the current request
      ● Fields you don’t query for are still available on the object, with a value of None
  13. 13. Evolving the Schema
      ● Migrations are performed lazily as the objects are loaded from the database
      ● Simple schema evolutions:
        ○ New field: It will just be None for old entities.
        ○ Removed: Declare it as ming.schema.Deprecated
        ○ Changed Type: Declare it as ming.schema.Migrate
      ● Complex schema evolutions:
        ○ Add a migration function in __mongometa__
  14. 14. Complex migrations with Ming

          from ming import Document, Field, schema

          # Old layout: tags and categories nested under "metadata"
          class OldWikiPage(Document):
              _id = Field(schema.ObjectId)
              title = Field(str)
              text = Field(str, if_missing='')
              metadata = Field(dict(tags=[str], categories=[str]))

          class WikiPage(Document):
              class __mongometa__:
                  session = DBSession
                  name = 'wiki_page'
                  version_of = OldWikiPage

                  # Called on documents that still match the old schema
                  def migrate(data):
                      result = dict(data, version=1,
                                    tags=data['metadata']['tags'],
                                    categories=data['metadata']['categories'])
                      del result['metadata']
                      return result

              version = Field(1, required=True)
              # … more fields ...
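
The idea of migrating lazily, only when a document is loaded, can be sketched without Ming. In the plain-Python example below, `load`, `MIGRATIONS`, and `CURRENT_VERSION` are hypothetical names for this illustration; only the body of the migration function mirrors the slide.

```python
CURRENT_VERSION = 1


def migrate_v0_to_v1(data):
    """Same transformation as the slide: flatten metadata into tags/categories."""
    result = dict(data, version=1,
                  tags=data["metadata"]["tags"],
                  categories=data["metadata"]["categories"])
    del result["metadata"]
    return result


# Hypothetical registry: stored version -> function producing the next version
MIGRATIONS = {0: migrate_v0_to_v1}


def load(raw):
    """Upgrade a raw document step by step until it reaches CURRENT_VERSION."""
    doc = dict(raw)
    while doc.get("version", 0) < CURRENT_VERSION:
        doc = MIGRATIONS[doc.get("version", 0)](doc)
    return doc


old_raw = {"_id": 1, "title": "FirstPage",
           "metadata": {"tags": ["a"], "categories": ["b"]}}
new_raw = {"_id": 2, "title": "SecondPage", "version": 1,
           "tags": [], "categories": []}

migrated = load(old_raw)   # upgraded in memory at load time
untouched = load(new_raw)  # already current, returned as a copy
```

Nothing is rewritten in storage until the migrated document is saved again, so old and new layouts can coexist in the same collection.
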
  15. 15. Testing MongoDB
      ● Ming makes testing easy
        ○ Your models can be directly imported from tests
        ○ Just bind the session to a DataStore created in your test suite
      ● Ming provides MongoInMemory
        ○ much like sqlite://:memory:
      ● Implements 90% of MongoDB, including JavaScript execution with SpiderMonkey
  16. 16. Ming for Web Applications
      ● Ming can be integrated in any WSGI framework through ming.odm.middleware.MingMiddleware
        ○ Automatically disposes open sessions at the end of requests
        ○ Automatically provides session flushing
        ○ Automatically clears the session in case of exceptions
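
The three behaviours listed for the middleware follow a common WSGI pattern: flush on success, clear on error, and always dispose of the session. The sketch below shows that pattern in plain Python; it is not the code of `MingMiddleware`, and `SessionMiddleware` and `RecordingSession` are hypothetical names. It also consumes the response eagerly, which a production middleware would not necessarily do.

```python
class RecordingSession:
    """Hypothetical session that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def flush(self):
        self.calls.append("flush")

    def clear(self):
        self.calls.append("clear")

    def close(self):
        self.calls.append("close")


class SessionMiddleware:
    """Wraps a WSGI app and manages the session around each request."""

    def __init__(self, app, session):
        self.app = app
        self.session = session

    def __call__(self, environ, start_response):
        try:
            body = list(self.app(environ, start_response))
        except Exception:
            self.session.clear()   # discard pending changes on error
            raise
        else:
            self.session.flush()   # write pending changes on success
            return body
        finally:
            self.session.close()   # always dispose of the session


def ok_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"saved"]


def failing_app(environ, start_response):
    raise RuntimeError("boom")


def run(app):
    """Run one fake request and report what happened to the session."""
    session = RecordingSession()
    wrapped = SessionMiddleware(app, session)
    try:
        wrapped({}, lambda status, headers: None)
    except RuntimeError:
        pass
    return session.calls


ok_calls = run(ok_app)
failed_calls = run(failing_app)
```
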
  17. 17. Ming with TurboGears
      ● Provides built-in support for Ming
        ○ $ gearbox quickstart --ming projectname
      ● Ready-made test suite with fixtures on MIM
      ● Facilities to debug and benchmark Ming queries through the DebugBar
      ● TurboGears Admin automatically generates CRUD from Ming models
  18. 18. Debugging MongoDB
      ● TurboGears debugbar has built-in support for MongoDB
        ○ Executed queries logging and results
        ○ Queries timing
        ○ Syntax prettifier and highlight for Map-Reduce and $where JavaScript code
        ○ Queries tracking on logs for performance reporting of webservices
  19. 19. DebugBar in action
  20. 20. Ming without learning MongoDB
      ● The transition from SQL/relational solutions to MongoDB can be scary the first time.
      ● You can use Sprox to lower the learning cost for simple applications
        ○ Sprox is the library that powers TurboGears Admin, automatically generating pages from SQLA or Ming
  21. 21. Sprox ORM abstractions
      ● ORMProvider provides an abstraction over the ORM
      ● ORMProviderSelector automatically detects the provider to use from a model
      ● Mix those together and you have a database-independent layer with automatic storage backend detection
  22. 22. Hands on Sprox
      ● Provider.query(self, entity, **kwargs) → get all objects of a collection
      ● Provider.get_obj(self, entity, params) → get an object
      ● Provider.update(self, entity, params) → update an object
      ● Provider.create(self, entity, params) → create a new object

          # Sprox (Ming or SQLAlchemy)
          count, transactions = provider.query(MoneyTransfer)

          transactions = DBSession.query(MoneyTransfer).all()  # SQLAlchemy
          transactions = MoneyTransfer.query.find().all()      # Ming
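
To make the provider idea concrete, here is a small plain-Python sketch of a selector that picks a provider from the model and a caller that never needs to know which backend is in use. None of this is Sprox code: `ListProvider`, `DictProvider`, `ProviderSelector`, and the two toy models are hypothetical, and only the `(count, objects)` return shape of `query` is taken from the slide.

```python
class ListProvider:
    """Hypothetical provider for models that keep rows in a list."""

    def query(self, entity, limit=None, **kwargs):
        return len(entity.rows), entity.rows[:limit]

    def create(self, entity, params):
        obj = dict(params)
        entity.rows.append(obj)
        return obj


class DictProvider:
    """Hypothetical provider for models that keep rows in a dict keyed by id."""

    def query(self, entity, limit=None, **kwargs):
        rows = list(entity.rows.values())
        return len(rows), rows[:limit]

    def create(self, entity, params):
        obj = dict(params)
        entity.rows[obj["id"]] = obj
        return obj


class ProviderSelector:
    """Detects which provider fits a model by inspecting its storage."""

    def get_provider(self, entity):
        if isinstance(entity.rows, dict):
            return DictProvider()
        return ListProvider()


class MoneyTransfer:
    rows = []   # list-backed toy model


class Account:
    rows = {}   # dict-backed toy model


def count_all(entity):
    """Backend-independent code: the same call works for both models."""
    provider = ProviderSelector().get_provider(entity)
    count, _objects = provider.query(entity)
    return count


selector = ProviderSelector()
selector.get_provider(MoneyTransfer).create(MoneyTransfer, {"id": 1, "amount": 10})
selector.get_provider(MoneyTransfer).create(MoneyTransfer, {"id": 2, "amount": 25})
selector.get_provider(Account).create(Account, {"id": 1, "owner": "amol"})

transfer_count = count_all(MoneyTransfer)
account_count = count_all(Account)
```
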
  23. 23. Questions?