Lessons learnt building MongoEngine




                @RossC0
                http://github.com/rozza
WHAT IS MONGODB?

 A document database
 Highly scalable
 Developer friendly

In BSON:

{ _id : ObjectId("..."),
  author : "Ross",
  date : ISODate("2012-07-05..."),
  text : "About MongoDB...",
  tags : [ "tech", "databases" ],
  comments : [{
    author : "Tim",
    date : ISODate("2012-07-05..."),
    text : "Best Post Ever!"
  }],
  comment_count : 1
}

http://mongodb.org
WHAT IS MONGODB?

In Python:

{ "_id" : ObjectId("..."),
  "author" : "Ross",
  "date" : datetime(2012,7,5,10,0),
  "text" : "About MongoDB...",
  "tags" : ["tech", "databases"],
  "comments" : [{
    "author" : "Tim",
    "date" : datetime(2012,7,5,11,35),
    "text" : "Best Post Ever!"
  }],
  "comment_count" : 1
}

In BSON:

{ _id : ObjectId("..."),
  author : "Ross",
  date : ISODate("2012-07-05..."),
  text : "About MongoDB...",
  tags : [ "tech", "databases" ],
  comments : [{
    author : "Tim",
    date : ISODate("2012-07-05..."),
    text : "Best Post Ever!"
  }],
  comment_count : 1
}
Want to know more?




http://education.10gen.com
WHY DO YOU EVEN NEED AN ODM?
http://www.flickr.com/photos/51838104@N02/5841690990
SCHEMALESS != CHAOS
MongoDB is a good fit
Document schemas in code
Enforces schema
Data validation
Speeds up development
Build tooling off it
Can DRY up code...
Inspired by Django's ORM

Supports Python 2.5 - Python 3.3

Originally authored by Harry Marr in 2010

I took over development in May 2011

Current release 0.7.5

http://github.com/MongoEngine/mongoengine
INTRODUCING MONGOENGINE
from mongoengine import (Document, EmbeddedDocument, EmbeddedDocumentField,
                         ListField, ReferenceField, StringField)

class Post(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField('User')
    tags = ListField(StringField(max_length=30))
    comments = ListField(EmbeddedDocumentField('Comment'))

class Comment(EmbeddedDocument):
    content = StringField()
    name = StringField(max_length=120)

class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)
CREATING A MODEL
class Post(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField('User')
    tags = ListField(StringField(max_length=30))
    comments = ListField(EmbeddedDocumentField('Comment'))


       Define a class inheriting from Document

       Map each field to a defined data type:
       strings, ints, binary, files, lists etc.

       By default, declared fields are not required

       Pass keyword arguments to apply constraints,
       e.g. unique, max_length, default values.
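To make the keyword-argument constraints concrete, here is a minimal sketch of how a field might store and check them. It is not MongoEngine's implementation: `SimpleStringField`, its `validate()` method and the local `ValidationError` are hypothetical names chosen for illustration.

```python
class ValidationError(Exception):
    """Raised when a value breaks a field constraint (hypothetical)."""


class SimpleStringField:
    """Hypothetical field that keeps its constraints as keyword arguments."""

    def __init__(self, required=False, max_length=None, default=None):
        self.required = required      # optional unless told otherwise
        self.max_length = max_length
        self.default = default

    def validate(self, name, value):
        # Fall back to the default before checking any constraint.
        if value is None:
            value = self.default
        if value is None:
            if self.required:
                raise ValidationError(f"{name} is required")
            return None
        if not isinstance(value, str):
            raise ValidationError(f"{name} must be a string")
        if self.max_length is not None and len(value) > self.max_length:
            raise ValidationError(f"{name} exceeds max_length={self.max_length}")
        return value


title = SimpleStringField(max_length=120, required=True)
print(title.validate("title", "mongoengine"))   # mongoengine
try:
    title.validate("title", None)
except ValidationError as exc:
    print(exc)                                   # title is required
```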
INSERTING DATA
# Pass data into the constructor
user = User(email="ross@10gen.com", first_name="Ross").save()

# Create instance and edit / update in place
post = Post()
post.title = "mongoengine"
post.author = user
post.tags = ['odm', 'mongodb', 'python']
post.save()


      Create an instance of the object

      Update its attributes

      Call save (or insert / update) to persist the data
QUERYING DATA

# An `objects` manager is added to every `Document` class
users = User.objects(email='ross@10gen.com')

# Querysets are lazy and can be extended as needed
users = users.filter(first_name='Ross')

# Iterating evaluates the queryset
print([u for u in users])


      Documents have a queryset manager (objects) for
      querying

      Querysets can be extended by chaining further filters

      The queryset is evaluated on iteration
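The lazy, chainable behaviour can be sketched with an in-memory stand-in for a collection. `LazyQuerySet` and its `evaluations` counter are hypothetical; a real MongoEngine queryset builds a MongoDB query rather than filtering Python dicts. In this sketch each `filter()` call returns a new queryset, and nothing runs until iteration.

```python
class LazyQuerySet:
    """Hypothetical queryset: filters accumulate, evaluation waits for iteration."""

    def __init__(self, documents, filters=None):
        self._documents = documents          # stand-in for a MongoDB collection
        self._filters = dict(filters or {})
        self.evaluations = 0                 # how often the "query" actually ran

    def __call__(self, **kwargs):
        return self.filter(**kwargs)

    def filter(self, **kwargs):
        # Return a new, extended queryset; nothing is queried yet.
        merged = dict(self._filters)
        merged.update(kwargs)
        return LazyQuerySet(self._documents, merged)

    def __iter__(self):
        # Only now is the query executed.
        self.evaluations += 1
        for doc in self._documents:
            if all(doc.get(key) == value for key, value in self._filters.items()):
                yield doc


users = [
    {"email": "ross@10gen.com", "first_name": "Ross"},
    {"email": "ross@10gen.com", "first_name": "Other"},
    {"email": "tim@example.com", "first_name": "Tim"},
]
objects = LazyQuerySet(users)
qs = objects(email="ross@10gen.com")
qs = qs.filter(first_name="Ross")
print(qs.evaluations)                    # 0: no query has run yet
print([u["first_name"] for u in qs])     # ['Ross']
```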
6 LESSONS LEARNT
LESSON 1: DIVE IN!
http://www.flickr.com/photos/jackace/565857899/
PROJECT STALLED
     In May 2011
>200 forks
>100 issues
~50 pull requests

I needed it
Volunteered to help
Started reviewing issues
Supported Harry and the community
LESSON 2: METACLASSES
http://www.flickr.com/photos/ubique/135848053
WHAT'S NEEDED TO MAKE AN ORM?

Instance methods

 validate data
 manipulate data
 convert data to and from MongoDB

Queryset methods

 Finding data
 Bulk changes
METACLASSES
class Document(object):
    __metaclass__ = TopLevelDocumentMetaclass
    ...

class EmbeddedDocument(object):
    __metaclass__ = DocumentMetaclass
    ...

   Needed for:

    1. inspecting the object inheritance

    2. injecting functionality into the class

   It's surprisingly simple: all we need is __new__
METACLASSES 101

IN:  cls, name, bases, attrs
OUT: new class

TopLevelDocument
  -> Document
    -> python's type
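The IN/OUT contract above maps directly onto `type.__new__`. A minimal sketch, assuming Python 3's `metaclass=` keyword (the slide uses Python 2's `__metaclass__` attribute); the metaclass bodies and the `_meta` / `_created_by` attributes are hypothetical stand-ins for MongoEngine's real bookkeeping.

```python
class DocumentMetaclass(type):
    """Hypothetical base metaclass: IN is (mcs, name, bases, attrs)."""

    def __new__(mcs, name, bases, attrs):
        # IN: manipulate attrs before the class exists.
        attrs.setdefault("_meta", {})
        new_class = super().__new__(mcs, name, bases, attrs)
        # OUT: the freshly built class can be decorated further.
        new_class._created_by = mcs.__name__
        return new_class


class TopLevelDocumentMetaclass(DocumentMetaclass):
    """Hypothetical top-level metaclass: adds collection-level defaults."""

    def __new__(mcs, name, bases, attrs):
        meta = dict(attrs.get("_meta", {}))
        meta.setdefault("collection", name.lower())   # default collection name
        attrs["_meta"] = meta
        # Delegates to DocumentMetaclass.__new__, which delegates to type.
        return super().__new__(mcs, name, bases, attrs)


class EmbeddedDocument(metaclass=DocumentMetaclass):
    pass


class Document(metaclass=TopLevelDocumentMetaclass):
    pass


class Post(Document):
    pass


print(Post._meta)          # {'collection': 'post'}
print(Post._created_by)    # TopLevelDocumentMetaclass
```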
METACLASSES

IN

 Creates default meta data:
 inheritance rules, id_field, index information, default ordering
 Merges in parents' meta
 Validation:
 abstract flag on an inherited class
 collection set on a subclass

Manipulates the attrs going in.
METACLASSES

IN

 Merges all fields from parents
 Adds in own field definitions
 Creates lookups: _db_field_map, _reverse_db_field_map
 Determines superclasses (for model inheritance)
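Merging parent fields and building the two lookups could look roughly like this. Only the `_db_field_map` and `_reverse_db_field_map` names come from the slide; `Field`, `_superclasses` and `FieldCollectingMetaclass` are simplified, hypothetical stand-ins (MongoEngine's fields also accept a `db_field` argument, but do far more).

```python
class Field:
    """Hypothetical field; db_field is the key used in the stored document."""

    def __init__(self, db_field=None):
        self.db_field = db_field
        self.name = None


class FieldCollectingMetaclass(type):
    def __new__(mcs, name, bases, attrs):
        fields = {}
        # 1. Merge all fields from parents (each parent already holds its own parents').
        for base in reversed(bases):
            fields.update(getattr(base, "_fields", {}))
        # 2. Add in this class's own field definitions.
        for attr_name, value in attrs.items():
            if isinstance(value, Field):
                value.name = attr_name
                value.db_field = value.db_field or attr_name
                fields[attr_name] = value
        attrs["_fields"] = fields
        # 3. Lookups between Python attribute names and stored keys.
        attrs["_db_field_map"] = {n: f.db_field for n, f in fields.items()}
        attrs["_reverse_db_field_map"] = {f.db_field: n for n, f in fields.items()}
        # 4. Record the superclass chain for model inheritance.
        superclasses = ()
        for base in bases:
            if isinstance(base, FieldCollectingMetaclass):
                superclasses += base._superclasses + (base.__name__,)
        attrs["_superclasses"] = superclasses
        return super().__new__(mcs, name, bases, attrs)


class Base(metaclass=FieldCollectingMetaclass):
    pass


class Post(Base):
    title = Field()
    author = Field(db_field="a")


class FeaturedPost(Post):
    rank = Field(db_field="r")


print(FeaturedPost._db_field_map)   # {'title': 'title', 'author': 'a', 'rank': 'r'}
print(FeaturedPost._superclasses)   # ('Base', 'Post')
```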
METACLASSES

OUT

 Adds in handling for delete rules,
 so we can handle deleted references
 Adds class to the registry,
 so we can load the data into the correct class
METACLASSES

OUT

 Builds index specifications
 Injects queryset manager
 Sets primary key (if needed)
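Injecting the `objects` manager and defaulting a primary key can be sketched with a descriptor attached by the metaclass. Everything here is hypothetical: `QuerySetManager` returns a list from an in-memory `_store` rather than querying MongoDB, and the integer counter stands in for real `ObjectId` generation.

```python
import itertools


class QuerySetManager:
    """Hypothetical descriptor: Class.objects returns that class's documents."""

    def __get__(self, instance, owner):
        if instance is not None:
            raise AttributeError("objects is only accessible via the class")
        # Stand-in for a real queryset: a copy of the in-memory store.
        return list(owner._store)


class TopLevelMetaclass(type):
    def __new__(mcs, name, bases, attrs):
        # Sets primary key (if needed): default to 'id' unless a parent or
        # the class itself already chose a primary key field.
        if not any(hasattr(base, "_id_field") for base in bases):
            attrs.setdefault("_id_field", "id")
        new_class = super().__new__(mcs, name, bases, attrs)
        # Injects the queryset manager and a per-class in-memory store.
        new_class.objects = QuerySetManager()
        new_class._store = []
        return new_class


class Document(metaclass=TopLevelMetaclass):
    _ids = itertools.count(1)   # hypothetical primary key generator

    def __init__(self, **values):
        self.__dict__.update(values)

    def save(self):
        id_field = type(self)._id_field
        if getattr(self, id_field, None) is None:
            setattr(self, id_field, next(Document._ids))
        if self not in type(self)._store:
            type(self)._store.append(self)
        return self


class User(Document):
    pass


class Book(Document):
    _id_field = "isbn"   # this class picks its own primary key field


user = User(email="ross@10gen.com").save()
print(user.id)              # 1
print(len(User.objects))    # 1
print(len(Book.objects))    # 0
```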
LESSONS LEARNT

Spend time learning what is being done and why

Don't continually patch

We rewrote the metaclasses in 0.7
LESSON 3: STRAYING FROM THE PATH
http://www.flickr.com/photos/51838104@N02/5841690990
REWRITING THE QUERY LANGUAGE
# In pymongo you pass dictionaries to query
published_pages = db.page.find({"published": True})
# In mongoengine
published_pages = Page.objects(published=True)

# pymongo uses dot notation to query sub-documents
uk_pages = db.page.find({"author.country": "uk"})
# In mongoengine we use double underscores (dunder) instead
uk_pages = Page.objects(author__country='uk')
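The double-underscore syntax translates mechanically into pymongo's dot notation. A sketch of that translation; `to_mongo_query` and its operator set are hypothetical, and a real ODM would also map attribute names to stored field names and convert values.

```python
# Operators recognised as the last dunder segment (a small, hypothetical subset).
OPERATORS = {"ne", "gt", "gte", "lt", "lte", "in", "nin", "exists"}


def to_mongo_query(**kwargs):
    """Turn author__country='uk' style kwargs into a pymongo filter dict."""
    query = {}
    for key, value in kwargs.items():
        parts = key.split("__")
        op = None
        if len(parts) > 1 and parts[-1] in OPERATORS:
            op = parts.pop()
        path = ".".join(parts)            # dunder becomes MongoDB dot notation
        if op is None:
            query[path] = value
        else:
            query.setdefault(path, {})["$" + op] = value
    return query


print(to_mongo_query(published=True))          # {'published': True}
print(to_mongo_query(author__country="uk"))    # {'author.country': 'uk'}
print(to_mongo_query(comment_count__gte=1))    # {'comment_count': {'$gte': 1}}
```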
REWRITING THE QUERY LANGUAGE
import re

# Some things are nicer, e.g. case-insensitive regular expression search
db.post.find({'title': re.compile('MongoDB', re.IGNORECASE)})
Post.objects(title__icontains='MongoDB')  # In mongoengine

# Chaining is nicer
db.post.update({"published": False},
               {"$set": {"published": True}}, multi=True)

Post.objects(published=False).update(set__published=True)
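Both conveniences reduce to plain pymongo documents. A sketch, assuming a small hypothetical set of update prefixes; `to_mongo_update` and `icontains_regex` are illustrative helpers, not MongoEngine functions. Escaping the search text matters: without `re.escape`, characters such as `.` would act as regex wildcards.

```python
import re

# Keyword prefixes mapped to MongoDB update operators (hypothetical subset).
MODIFIERS = {"set": "$set", "unset": "$unset", "inc": "$inc", "push": "$push"}


def to_mongo_update(**kwargs):
    """Turn set__published=True style kwargs into a pymongo update document."""
    update = {}
    for key, value in kwargs.items():
        modifier, _, field = key.partition("__")
        if modifier not in MODIFIERS or not field:
            raise ValueError("unsupported update keyword: %s" % key)
        mongo_op = MODIFIERS[modifier]
        # MongoDB ignores the value given to $unset; "" is a common placeholder.
        if mongo_op == "$unset":
            value = ""
        update.setdefault(mongo_op, {})[field.replace("__", ".")] = value
    return update


def icontains_regex(text):
    """title__icontains='MongoDB' as a case-insensitive, escaped regex."""
    return re.compile(re.escape(text), re.IGNORECASE)


print(to_mongo_update(set__published=True))                        # {'$set': {'published': True}}
print(bool(icontains_regex("MongoDB").search("About mongodb...")))  # True
```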
LESSON 4: NOT ALL IDEAS ARE GOOD
http://www.flickr.com/photos/abiding_silence/6951229015
CHANGING SAVE
# In pymongo save replaces the whole document
db.post.save({'_id': 'my_id', 'title': 'MongoDB',
              'published': True})

# In mongoengine we track changes
post = Post.objects(pk='my_id').first()
post.published = True
post.save()

# Results in:
db.post.update({'_id': 'my_id'},
               {'$set': {'published': True}})
CHANGING SAVE

Changes to any field are recorded

How do we monitor lists and dicts?

  Custom List and Dict classes

  Observe changes and mark the document as dirty
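Noting changes to plain fields needs nothing more than `__setattr__`. A minimal sketch; `TrackedDocument`, `_changed_fields` and `_delta()` are hypothetical names. It only sees assignments, so in-place edits such as `post.tags.append(...)` go unnoticed, which is exactly why the custom List and Dict classes are needed.

```python
class TrackedDocument:
    """Hypothetical document that records which top-level fields changed."""

    def __init__(self, **values):
        # Load the initial state without marking anything as changed.
        object.__setattr__(self, "_changed_fields", set())
        for name, value in values.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        if not name.startswith("_"):
            self._changed_fields.add(name)
        object.__setattr__(self, name, value)

    def _delta(self):
        """Build $set / $unset documents from the changed fields only."""
        sets, unsets = {}, {}
        for name in sorted(self._changed_fields):
            value = getattr(self, name)
            if value is None:
                unsets[name] = ""
            else:
                sets[name] = value
        update = {}
        if sets:
            update["$set"] = sets
        if unsets:
            update["$unset"] = unsets
        return update


post = TrackedDocument(_id="my_id", title="MongoDB", published=False)
post.published = True
print(post._delta())   # {'$set': {'published': True}}
```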
HOW IT WORKS
class Post(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField('User')
    tags = ListField(StringField(max_length=30))
    comments = ListField(EmbeddedDocumentField('Comment'))

class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

class Comment(EmbeddedDocument):
    content = StringField()
    name = StringField(max_length=120)
HOW IT WORKS
post = Post.objects.first()
post.comments[1].name = 'Fred'
post.save()

Post
 - comments
     comment
     comment
     comment
HOW IT WORKS
post.comments[1].name = 'Fred'

1. Convert the comments data to a BaseList.
   The BaseList stores the instance and name / location.
HOW IT WORKS
post.comments[1].name = 'Fred'

2. Convert the comment data to a BaseDict,
   which sets its name as 'comments.1'.
HOW IT WORKS
post.comments[1].name = 'Fred'

3. Change name to "Fred".

4. Tell Post that 'comments.1.name' has changed.
HOW IT WORKS
post.save()

5. On save(), iterate all the changes on post and run
   $set / $unset queries:

db.post.update(
  {'_id': 'my_id'},
  {'$set': {
    'comments.1.name': 'Fred'}}
)
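Steps 1 to 5 can be sketched in plain Python. `BaseList` and `BaseDict` borrow the slide's names, but their bodies are a simplified assumption, not MongoEngine's code: comments are plain dicts rather than `EmbeddedDocument`s, only item and attribute assignment are tracked, and `save()` returns the arguments for `db.post.update` instead of calling the database.

```python
class BaseDict(dict):
    """Dict that reports changes as '<location>.<key>' to its owning document."""

    def __init__(self, data, owner, location):
        super().__init__(data)      # dict.__init__ does not call __setitem__
        self._owner = owner         # the Post instance
        self._location = location   # e.g. 'comments.1'

    def __getattr__(self, name):
        # Allow comment.name as well as comment['name'].
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name.startswith("_"):
            super().__setattr__(name, value)   # internal bookkeeping attributes
        else:
            self[name] = value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._owner._mark_changed("%s.%s" % (self._location, key))


class BaseList(list):
    """List that wraps dict items as BaseDicts located at '<location>.<index>'."""

    def __init__(self, items, owner, location):
        super().__init__(
            BaseDict(item, owner, "%s.%d" % (location, i)) if isinstance(item, dict) else item
            for i, item in enumerate(items)
        )
        self._owner = owner
        self._location = location

    def __setitem__(self, index, value):
        # Only non-negative integer indexes are handled in this sketch.
        super().__setitem__(index, value)
        self._owner._mark_changed("%s.%d" % (self._location, index))


class Post:
    """Hypothetical top-level document holding a tracked comments list."""

    def __init__(self, _id, comments):
        self._id = _id
        self._changed = []                                    # dotted paths, in order
        self.comments = BaseList(comments, self, "comments")  # steps 1 and 2

    def _mark_changed(self, path):
        # Step 4: a container tells the Post which path changed.
        if path not in self._changed:
            self._changed.append(path)

    def _resolve(self, path):
        first, *rest = path.split(".")
        value = getattr(self, first)
        for part in rest:
            value = value[int(part)] if isinstance(value, list) else value[part]
        return value

    def save(self):
        # Step 5: turn each changed path into a $set entry.
        if not self._changed:
            return None                                       # nothing to update
        update = {"$set": {path: self._resolve(path) for path in self._changed}}
        self._changed = []
        return {"_id": self._id}, update


post = Post("my_id", [{"name": "Tim"}, {"name": "Ann"}, {"name": "Bo"}])
post.comments[1].name = "Fred"     # step 3
print(post.save())
# ({'_id': 'my_id'}, {'$set': {'comments.1.name': 'Fred'}})
```

Appending to the list or deleting keys is not tracked here; a fuller version would also override methods such as `append` and `__delitem__` and emit `$unset` for removed keys.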
A GOOD IDEA?

+ Makes it easier to use
+ save acts how people think it should
- It's expensive
- Doesn't help people understand MongoDB
LESSON 5: MANAGING A COMMUNITY
http://kingscross.the-hub.net/
CODERS JUST WANT TO CODE *

 Github effect:
 >10 django mongoengine projects
 None active on pypi
 Little cross-project communication

* Side effect of being stalled?
REACH OUT
 Flask-mongoengine on pypi
  There were 2 different projects
  Now has extra maintainers from flask-mongorest

 Django-mongoengine*
 Spoke to the authors of 7 projects and merged
 their work together to form a single library

* Coming soon!
THE COMMUNITY

Not all ideas are good!

Vocal people don't always have great ideas

Travis is great*
* but you still have to read the pull request

Communities have to be managed
I've only just started learning how to herd cats
LESSON 6: DON'T BE AFRAID TO ASK
http://www.flickr.com/photos/kandyjaxx/2012468692
I NEED HELP ;)
 Website
 Documentation
 Tutorials
 Framework support
 Core mongoengine

http://twitter.com/RossC0




          http://github.com/MongoEngine
QUESTIONS?
http://www.flickr.com/photos/9550033@N04/5020799468

Pyconie 2012
