Rapid and Scalable Development with MongoDB, PyMongo, and Ming


This talk, given at PyGotham 2011, will teach you techniques for writing maintainable, high-performance, and scalable applications using the popular NoSQL database MongoDB and the Python library Ming. We will cover everything you need to become an effective Ming/MongoDB developer, from basic PyMongo queries to high-level object-document mapping setups in Ming.

Published in: Technology

  1. Rapid and Scalable Development with MongoDB, PyMongo, and Ming
     Rick Copeland, @rick446, [email_address]
  2. • SourceForge and MongoDB
     • Get started with PyMongo
     • Sprinkle in some Ming schemas
     • ORM: When a dict just won’t do
     • What we are learning
  3. SourceForge and MongoDB
     • Tried CouchDB: liked the dev model, not so much the performance
     • Migrated consumer-facing pages (summary, browse, download) to MongoDB and it worked great (on MongoDB 0.8, no less!)
     • All our new stuff uses MongoDB (Allura, Zarkov, Ming, …)
  4. What is MongoDB?
     MongoDB (from "humongous") is a scalable, high-performance, open source, document-oriented database.
     • Sharding, replication
     • 20k inserts/s? No problem
     • Hierarchical JSON-like store, easy to develop apps
     • SourceForge. Yeah. We like FOSS
  5. MongoDB to Relational Mental Mapping
     Relational (SQL)    MongoDB
     Database            Database
     Table               Collection
     Index               Index
     Row                 Document
     Column              Field
     • Rows are flat, documents are nested
     • Typing: SQL is static, MongoDB is dynamic
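The "rows are flat, documents are nested" point can be illustrated with plain Python dicts. This is only a sketch: the `pages` and `comments` data and the `to_document` helper are made up for illustration and are not part of the talk or of PyMongo.

```python
# Relational shape: two flat "tables", joined by a foreign key.
pages = [{"id": 1, "title": "Cats"}]
comments = [
    {"id": 10, "page_id": 1, "text": "First!"},
    {"id": 11, "page_id": 1, "text": "Nice page"},
]


def to_document(page, comment_rows):
    """Fold a page row and its comment rows into one nested document."""
    return {
        "_id": page["id"],
        "title": page["title"],
        "comments": [
            {"text": row["text"]}
            for row in comment_rows
            if row["page_id"] == page["id"]
        ],
    }


doc = to_document(pages[0], comments)

# Dynamic typing: another document in the same collection may carry
# fields the first one lacks; nothing in the store itself forbids it.
other = {"_id": 2, "title": "Dogs", "tags": ["pets"]}
```

Whether to nest comments inside the page or keep them in their own collection is a design choice; the later slides keep them separate and link them by `page_id`.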
  6. • SourceForge and MongoDB
     • Get started with PyMongo
     • Sprinkle in some Ming schemas
     • ORM: When a dict just won’t do
     • What we are learning
  7. PyMongo: Getting Started
        >>> import pymongo
        >>> conn = pymongo.Connection()
        >>> conn
        Connection('localhost', 27017)
        >>> conn.test
        Database(Connection('localhost', 27017), u'test')
        >>> conn.test.foo
        Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
        >>> conn['test-db']
        Database(Connection('localhost', 27017), u'test-db')
        >>> conn['test-db']['foo-collection']
        Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
        >>> conn.test.foo.bar.baz
        Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
  8. PyMongo: Insert / Update / Delete
        >>> db = conn.test
        >>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
        >>> id
        ObjectId('4e712e21eb033009fa000000')
        >>> db.foo.find()
        <pymongo.cursor.Cursor object at 0x29c7d50>
        >>> list(db.foo.find())
        [{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}]
        >>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})
        >>> db.foo.find().next()
        {u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}
        >>> db.foo.remove({'_id': id})
        >>> list(db.foo.find())
        []
  9. PyMongo: Queries, Indexes
        >>> db.foo.insert([dict(x=x) for x in range(10)])
        [ObjectId('4e71313aeb033009fa00000b'), … ]
        >>> list(db.foo.find({'x': {'$gt': 3}}))
        [{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
         {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
         {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
        >>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}))
        [{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8},
         {u'x': 9}]
        >>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0})
        ...      .skip(1).limit(2))
        [{u'x': 5}, {u'x': 6}]
        >>> db.foo.ensure_index([
        ...     ('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
        u'x_1_y_-1'
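To make the query semantics on the slide above concrete without a running server, here is a rough in-memory model of a `$gt` filter followed by field exclusion, skip, and limit. The `matches` and `find` functions are hypothetical stand-ins written for this sketch; they are not PyMongo code and cover only the one operator used on the slide.

```python
from itertools import islice


def matches(doc, spec):
    """True if doc satisfies spec; supports equality and {'$gt': n} only."""
    for field, cond in spec.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(cond, dict):
            # Only the $gt operator is modelled in this sketch.
            if not value > cond["$gt"]:
                return False
        elif value != cond:
            return False
    return True


def find(docs, spec, exclude=(), skip=0, limit=None):
    """Filter, drop the excluded fields, then apply skip and limit."""
    hits = (d for d in docs if matches(d, spec))
    projected = ({k: v for k, v in d.items() if k not in exclude} for d in hits)
    stop = None if limit is None else skip + limit
    return list(islice(projected, skip, stop))


docs = [{"_id": i, "x": i} for i in range(10)]
page = find(docs, {"x": {"$gt": 3}}, exclude=("_id",), skip=1, limit=2)
```

Here `page` holds the same two documents as the slide's `.skip(1).limit(2)` call. A real server would use an index on `x` rather than scanning every document, which is what `ensure_index` is for.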
  10. PyMongo: Aggregation et al.
      • You gotta write JavaScript (for now)
      • It’s pretty slow (single-threaded JS engine)
      • JavaScript is used by
          • $where in a query
          • .group(key, condition, initial, reduce, finalize=None)
          • .map_reduce(map, reduce, out, finalize=None, …)
      • If you shard, you can get some parallelism across multiple mongod instances with .map_reduce() (and possibly ‘$where’). Otherwise you’re single-threaded.
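For readers unfamiliar with the map/reduce shape that `.map_reduce()` expects, the following is a small pure-Python model of it. In MongoDB the map and reduce functions are JavaScript run on the server; the `map_reduce`, `map_parity`, and `reduce_sum` names below are illustrative only.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Group emitted (key, value) pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_parity(doc):
    # Emit one (key, value) pair per document: key is 'even' or 'odd'.
    yield ("even" if doc["x"] % 2 == 0 else "odd", doc["x"])


def reduce_sum(key, values):
    # Summing is safe to apply to partial results, which matters when
    # groups are reduced in several steps (for example across shards).
    return sum(values)


docs = [{"x": x} for x in range(10)]
totals = map_reduce(docs, map_parity, reduce_sum)
```

Because each key's group can be reduced independently, the work can be spread over several processes, which is the parallelism the last bullet refers to.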
  11. PyMongo: GridFS
         >>> import gridfs
         >>> fs = gridfs.GridFS(db)
         >>> with fs.new_file() as fp:
         ...     fp.write('The file')
         ...
         >>> fp
         <gridfs.grid_file.GridIn object at 0x2cae910>
         >>> fp._id
         ObjectId('4e727f64eb03300c0b000003')
         >>> fs.get(fp._id).read()
         'The file'
      • Arbitrary data can be attached to the ‘fp’ object, since it’s just a document
          • Mime type
          • Filename
  12. PyMongo: GridFS Versioning
         >>> file_id = fs.put('Moar data!', filename='foo.txt')
         >>> fs.get_last_version('foo.txt').read()
         'Moar data!'
         >>> file_id = fs.put('Even moar data!', filename='foo.txt')
         >>> fs.get_last_version('foo.txt').read()
         'Even moar data!'
         >>> fs.get_version('foo.txt', -2).read()
         'Moar data!'
         >>> fs.list()
         [u'foo.txt']
         >>> fs.delete(fs.get_last_version('foo.txt')._id)
         >>> fs.list()
         [u'foo.txt']
         >>> fs.delete(fs.get_last_version('foo.txt')._id)
         >>> fs.list()
         []
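The versioning behaviour in the session above (each `put` under the same filename adds a version, and negative version numbers count back from the newest) can be modelled in a few lines. The `VersionedStore` class is a toy written for this sketch and says nothing about how GridFS stores data internally.

```python
import itertools


class VersionedStore:
    """Toy model of filename versioning: each put() adds a new version."""

    def __init__(self):
        self._files = {}               # file id -> (filename, data)
        self._ids = itertools.count(1)

    def put(self, data, filename):
        file_id = next(self._ids)
        self._files[file_id] = (filename, data)
        return file_id

    def _versions(self, filename):
        # Ids increase with each put(), so sorted ids run oldest to newest.
        return sorted(i for i, (name, _) in self._files.items() if name == filename)

    def get_version(self, filename, version=-1):
        """Return (file_id, data); version indexes like a Python list."""
        file_id = self._versions(filename)[version]
        return file_id, self._files[file_id][1]

    def delete(self, file_id):
        del self._files[file_id]

    def list(self):
        return sorted({name for name, _ in self._files.values()})


store = VersionedStore()
store.put("Moar data!", filename="foo.txt")
store.put("Even moar data!", filename="foo.txt")
latest_id, latest = store.get_version("foo.txt")
_, previous = store.get_version("foo.txt", -2)
store.delete(latest_id)
```

After the delete, the filename is still listed because the older version remains, which mirrors the two-step delete on the slide.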
  13. • SourceForge and MongoDB
      • Get started with PyMongo
      • Sprinkle in some Ming schemas
      • ORM: When a dict just won’t do
      • What we are learning
  14. Why Ming?
      • Your data has a schema
          • Your database can define and enforce it
          • It can live in your application (as with MongoDB)
          • Nice to have the schema defined in one place in the code
      • Sometimes you need a “migration”
          • Changing the structure/meaning of fields
          • Adding indexes, particularly unique indexes
          • Sometimes lazy, sometimes eager
      • “Unit of work”: queuing up all your updates can be handy
      • Python dicts are nice; objects are nicer
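The "sometimes lazy, sometimes eager" remark can be sketched as follows. This is an assumption-laden illustration, not Ming's migration machinery: the `schema_version` field, the `body` to `text` rename, and the `migrate` and `load` helpers are all hypothetical.

```python
CURRENT_VERSION = 2


def migrate(doc):
    """Upgrade a document to the current shape; no-op if already current."""
    doc = dict(doc)  # work on a copy so the caller's dict is untouched
    if doc.get("schema_version", 1) < CURRENT_VERSION:
        # Hypothetical change: version 1 stored 'body', version 2 calls it 'text'.
        doc["text"] = doc.pop("body", "")
        doc["schema_version"] = CURRENT_VERSION
    return doc


def load(raw_docs):
    """Lazy migration: upgrade each document only when it is read."""
    for raw in raw_docs:
        yield migrate(raw)


stored = [
    {"_id": 1, "title": "Cats", "body": "old shape"},
    {"_id": 2, "title": "Dogs", "text": "new shape", "schema_version": 2},
]
loaded = list(load(stored))
```

An eager migration would instead run `migrate` over every stored document once and write the results back, which is the usual choice when a new unique index must hold for all data.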
  15. Ming: Engines & Sessions
         >>> import ming.datastore
         >>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
         >>> ds.db
         Database(Connection('localhost', 27017), u'test')
         >>> session = ming.Session(ds)
         >>> session.db
         Database(Connection('localhost', 27017), u'test')
         >>> ming.configure(**{'ming.main.master': 'mongodb://localhost:27017', 'ming.main.database': 'test'})
         >>> ming.Session.by_name('main').db
         Database(Connection(u'localhost', 27017), u'test')
  16. Ming: Define Your Schema
         from ming import collection, schema, Field

         WikiDoc = collection('wiki_page', session,
             Field('_id', schema.ObjectId()),
             Field('title', str, index=True),
             Field('text', str))
         CommentDoc = collection('comment', session,
             Field('_id', schema.ObjectId()),
             Field('page_id', schema.ObjectId(), index=True),
             Field('text', str))
  17. Ming: Define Your Schema… Once more, with feeling
         from ming import Document, Session, Field

         class WikiDoc(Document):
             class __mongometa__:
                 session = Session.by_name('main')
                 name = 'wiki_page'
                 indexes = ['title']
             title = Field(str)
             text = Field(str)
             …
      • Old declarative syntax continues to exist and be supported, but it’s not being actively improved
  18. Ming: Use Your Schema
         >>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
         >>> doc.m.save()
         >>> WikiDoc.m.find()
         <ming.base.Cursor object at 0x2c2cd90>
         >>> WikiDoc.m.find().all()
         [{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
         >>> WikiDoc.m.find().one().text
         u'I can haz cheezburger?'
         >>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
         >>> doc.m.save()
         Traceback (most recent call last):
           File "<stdin>", line 1, …
         ming.schema.Invalid: <class 'ming.metadata.Document<wiki_page>'>: Extra keys: set(['tietul'])
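The "Extra keys" failure above is the kind of check an application-side schema buys you. A stripped-down version of the idea, assuming a schema is just a mapping from field name to Python type, might look like this; the `validate` function and this local `Invalid` class are illustrative and are not Ming's implementation.

```python
class Invalid(Exception):
    """Raised when a document does not match the schema."""


def validate(schema, doc):
    """Check doc against {field: type}; reject unknown keys and bad types."""
    extra = set(doc) - set(schema)
    if extra:
        raise Invalid("Extra keys: %s" % sorted(extra))
    for field, expected in schema.items():
        # Missing fields are allowed in this sketch; present ones must match.
        if field in doc and not isinstance(doc[field], expected):
            raise Invalid("%s: expected %s" % (field, expected.__name__))
    return doc


WIKI_SCHEMA = {"title": str, "text": str}

ok = validate(WIKI_SCHEMA, {"title": "Cats", "text": "I can haz cheezburger?"})
try:
    validate(WIKI_SCHEMA, {"tietul": "LOL", "text": "Invisible bicycle"})
    error = None
except Invalid as exc:
    error = str(exc)
```

Catching the misspelled `tietul` at save time is the point: without a schema, the typo would be stored silently as a new field.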
  19. Ming: Adding Your Own Types
      • Not usually necessary; built-in SchemaItems provide BSON types, default values, etc.
         import ming.schema

         class ForceInt(ming.schema.FancySchemaItem):
             def _validate(self, value):
                 try:
                     return int(value)
                 except (TypeError, ValueError):
                     # int() raises ValueError for strings such as 'abc'
                     raise ming.schema.Invalid('Bad value %s' % value, value, None)
  20. Ming Bonus: Mongo-in-Memory
         >>> ming.datastore.DataStore('mim://', database='test').db
         mim.Database(test)
      • MongoDB is (generally) fast
          • … except when creating databases
          • … particularly when you preallocate
      • Unit tests like things to be isolated
      • MIM gives you isolation at the expense of speed & scaling
  21. • SourceForge and MongoDB
      • Get started with PyMongo
      • Sprinkle in some Ming schemas
      • ORM: When a dict just won’t do
      • What we are learning
  22. Ming ORM: Classes and Collections
         from ming import collection, schema, Field
         from ming.orm import (mapper, Mapper, RelationProperty,
                               ForeignIdProperty)

         # session and ormsession are assumed to be configured already
         WikiDoc = collection('wiki_page', session,
             Field('_id', schema.ObjectId()),
             Field('title', str, index=True),
             Field('text', str))
         CommentDoc = collection('comment', session,
             Field('_id', schema.ObjectId()),
             Field('page_id', schema.ObjectId(), index=True),
             Field('text', str))

         class WikiPage(object): pass
         class Comment(object): pass

         ormsession.mapper(WikiPage, WikiDoc, properties=dict(
             comments=RelationProperty('Comment')))
         ormsession.mapper(Comment, CommentDoc, properties=dict(
             page_id=ForeignIdProperty('WikiPage'),
             page=RelationProperty('WikiPage')))
         Mapper.compile_all()
  23. Ming ORM: Classes and Collections (declarative)
         from ming import schema as S
         from ming.orm import FieldProperty, ForeignIdProperty, RelationProperty
         from ming.orm.declarative import MappedClass

         class WikiPage(MappedClass):
             class __mongometa__:
                 session = main_orm_session
                 name = 'wiki_page'
                 indexes = ['title']
             _id = FieldProperty(S.ObjectId)
             title = FieldProperty(str)
             text = FieldProperty(str)

         class CommentDoc(MappedClass):
             class __mongometa__:
                 session = main_orm_session
                 name = 'comment'
                 indexes = ['page_id']
             _id = FieldProperty(S.ObjectId)
             page_id = ForeignIdProperty(WikiPage)
             page = RelationProperty(WikiPage)
             text = FieldProperty(str)
  24. Ming ORM: Sessions and Queries
      • Session → ORMSession
      • My_collection.m… → My_mapped_class.query…
      • ORMSession actually does stuff
          • Track object identity
          • Track object modifications
          • Unit of work flushing all changes at once
         >>> pg = WikiPage(title='MyPage', text='is here')
         >>> session.db.wiki_page.count()
         0
         >>> main_orm_session.flush()
         >>> session.db.wiki_page.count()
         1
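The count going from 0 to 1 only after `flush()` is the unit-of-work pattern at work. As a bare-bones sketch of that pattern, with a plain list standing in for the collection, consider the following; `UnitOfWorkSession` is a hypothetical class for illustration and does not reflect how Ming's ORMSession is built.

```python
class UnitOfWorkSession:
    """Toy session: remembers new objects and writes them all on flush()."""

    def __init__(self, collection):
        self.collection = collection   # stand-in for a database collection
        self._new = []

    def add(self, doc):
        self._new.append(doc)

    def flush(self):
        """Write every pending document in one pass, then clear the queue."""
        self.collection.extend(self._new)
        written = len(self._new)
        self._new = []
        return written


wiki_page = []                         # plays the role of db.wiki_page
session = UnitOfWorkSession(wiki_page)
session.add({"title": "MyPage", "text": "is here"})
count_before = len(wiki_page)          # nothing has been written yet
session.flush()
count_after = len(wiki_page)
```

Deferring writes this way gives one place to batch, validate, or discard changes before they reach the database.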
  25. Ming ORM: Extending the Session
      • Various plug points in the session
          • before_flush
          • after_flush
      • Some uses
          • Logging changes to sensitive data or for analytics purposes
          • Full-text search indexing
          • “Last modified” fields
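One way to picture the `before_flush` and `after_flush` plug points is as hooks the session calls around its write step. The sketch below uses invented class names and hook signatures (`Extension`, `Stamp`, `AuditLog`, and a toy `Session`); the real Ming extension interface may differ, so treat it purely as an illustration of the idea.

```python
class Extension:
    """Base class: override only the hooks you need."""

    def before_flush(self, pending):
        pass

    def after_flush(self, written):
        pass


class Stamp(Extension):
    """Sets a 'last_modified' field just before documents are written."""

    def __init__(self, clock):
        self.clock = clock

    def before_flush(self, pending):
        for doc in pending:
            doc["last_modified"] = self.clock()


class AuditLog(Extension):
    """Records which titles were written, e.g. for analytics."""

    def __init__(self):
        self.entries = []

    def after_flush(self, written):
        self.entries.extend(doc["title"] for doc in written)


class Session:
    """Toy session that calls each extension around its write step."""

    def __init__(self, store, extensions):
        self.store, self.extensions, self.pending = store, extensions, []

    def add(self, doc):
        self.pending.append(doc)

    def flush(self):
        for ext in self.extensions:
            ext.before_flush(self.pending)
        written, self.pending = self.pending, []
        self.store.extend(written)
        for ext in self.extensions:
            ext.after_flush(written)


store, audit = [], AuditLog()
session = Session(store, [Stamp(clock=lambda: 1234), audit])
session.add({"title": "MyPage"})
session.flush()
```

Passing the clock in as a function keeps the example deterministic; a real "last modified" hook would more likely call `datetime.utcnow` or similar.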
  26. • SourceForge and MongoDB
      • Get started with PyMongo
      • Sprinkle in some Ming schemas
      • ORM: When a dict just won’t do
      • What we are learning
  27. Tips From the Trenches
      • Watch your document size
      • Choose your indexes well
          • Watch your server log; bad queries show up there
      • Don’t go crazy with denormalization
          • Try to use an index if all you need is a backref
          • Stale data is a tricky problem
      • Try to stay with one database
      • Watch the # of queries
      • Drop to lower levels (ORM → document → pymongo) when performance is an issue
  28. Future Work
      • Performance
      • Analytics in MongoDB: Zarkov
      • Web framework integration
      • Magic Columns (?)
      • ???
  29. Related Projects
      • Ming: http://sf.net/projects/merciless/ (MIT License)
      • Zarkov: http://sf.net/p/zarkov/ (Apache License)
      • Allura: http://sf.net/p/allura/ (Apache License)
      • PyMongo: http://api.mongodb.org/python (Apache License)
  30. Rick Copeland, @rick446, [email_address]