Python mongo db-training-europython-2011

Slides of my Python/MongoDB training given at EuroPython 2011 in Florence.

Python mongo db-training-europython-2011 Presentation Transcript

  • 1. Python and MongoDB: The Perfect Match
    Andreas Jung, www.zopyx.com
  • 2. Trainer Andreas Jung
    Python developer since 1993
    Python, Zope & Plone development
    Specialized in Electronic Publishing
    Director of the Zope Foundation
    Author of dozens of add-ons for Python, Zope and Plone
    Co-founder of the German Zope User Group (DZUG)
    Member of the Plone Foundation
    Using MongoDB since 2009
  • 3. Agenda (45 minutes per slot)
    Introduction to MongoDB
    Using MongoDB
    Using MongoDB from Python with PyMongo
    (PyMongo extensions/ORM-ish layers or Q/A)
  • 4. Things not covered in this tutorial
    Geospatial indexing
    Map-reduce
    Details on scaling (sharding, replica sets)
  • 5. Part 1/4
    Introduction to MongoDB:
    Concepts of MongoDB
    Architecture
    How MongoDB compares with relational databases
    Scalability
  • 6. MongoDB is...
    an open-source,
    high-performance,
    schema-less,
    document-oriented
    database
  • 7. Let's agree on the following or leave...
    MongoDB is cool
    MongoDB is not the multi-purpose, one-size-fits-all database
    MongoDB is another additional tool for the software developer
    MongoDB is not a replacement for RDBMS in general
    Use the right tool for each task
  • 8. And.....
    Don't ask me about how to do JOINs in MongoDB
  • 9. Oh, SQL – let's have some fun first
    A SQL statement walks into a bar and sees two tables. He walks over and says: "Hello, may I join you?"
    A SQL injection walks into a bar and starts to quote something but suddenly stops, drops a table and dashes out.
  • 10. The history of MongoDB
    10gen founded in 2007
    Started as a cloud alternative to GAE
    App engine: ed
    Database: p
    JavaScript as implementation language
    2008: focusing on the database part: MongoDB
    2009: first MongoDB release
    2011: MongoDB 1.8:
    Major deployments
    A fast-growing community
    Fast adoption for large projects
    10gen growing
  • 11. Major MongoDB deployments
  • 12. MongoDB is schema-less
    JSON-style data store
    Each document can have its own schema
    Documents inside a collection usually share a common schema by convention
    {'name': 'kate', 'age': 12}
    {'name': 'adam', 'height': 180}
    {'q': 1234, 'x': ['foo', 'bar']}
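As a rough illustration of what schema-less means in practice, the three documents on this slide can be written as plain Python dicts. The `collection` list below is only a stand-in for a MongoDB collection and is not part of any MongoDB API.

```python
import json

# The three documents from the slide, as plain Python dicts.
# A real collection would live in MongoDB; this list is only a stand-in.
collection = [
    {"name": "kate", "age": 12},
    {"name": "adam", "height": 180},
    {"q": 1234, "x": ["foo", "bar"]},
]

# Every document carries its own set of keys: nothing enforces a shared schema.
key_sets = [sorted(doc) for doc in collection]

# Each document is still valid JSON on its own.
as_json = [json.dumps(doc, sort_keys=True) for doc in collection]

for keys, text in zip(key_sets, as_json):
    print(keys, text)
```

The first two documents share the `name` key by convention only; the third has nothing in common with them and would still be accepted into the same collection.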
  • 13. Terminology: RDBMS vs. MongoDB
  • 14. Characteristics of MongoDB (I)
    High-performance
    Rich query language (similar to SQL)
    Map-Reduce (if you really need it)
    Secondary indexes
    Geospatial indexing
    Replication
    Auto-sharding (partitioning of data)
    Many platforms, drivers for many languages
  • 15. Characteristics of MongoDB (II)
    No transaction support, only atomic operations
    Default: "fire-and-forget" mode for high throughput
    "Safe mode": wait for server confirmation, checking for errors
  • 16. Typical performance characteristics
    Decent commodity hardware:
    Up to 100,000 reads/writes per second (fire-and-forget)
    Up to 50,000 reads/writes per second (safe mode)
    Your mileage may vary, depending on
    RAM
    Speed of the IO system
    CPU
    Client-side driver & application
  • 17. Functionality vs. Scalability
  • 18. MongoDB: Pros & Cons
  • 19. Durability
    Default: fire-and-forget (use safe mode)
    Changes are kept in RAM (!)
    Fsync to disk every 60 seconds (default)
    Deployment options:
    Standalone installation: use journaling (V 1.8+)
    Replicated: use replica set(s)
  • 20. Differences from Typical RDBMS
    Memory mapped data
    All data in memory (if it fits), synced to disk periodically
    No joins
    Reads have greater data locality
    No joins between servers
    No transactions
    Improves performance of various operations
    No transactions between servers
  • 21. Replica Sets
    Cluster of N servers
    Only one node is ‘primary’ at a time
    This is equivalent to master
    The node where writes go
    Primary is elected by consensus
    Automatic failover
    Automatic recovery of failed nodes
  • 22. Replica Sets - Writes
    A write is only ‘committed’ once it has been replicated to a majority of nodes in the set
    Before this happens, reads to the set may or may not see the write
    On failover, data which is not ‘committed’ may be dropped (but not necessarily)
    If dropped, it will be rolled back from all servers which wrote it
    For improved durability, use getLastError/w
    Other criteria – block writes when nodes go down or slaves get too far behind
    Or, to reduce latency, reduce getLastError/w
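The counting rule behind "committed" and behind the `w` value passed to getLastError can be sketched in a few lines. This is a toy model with hypothetical function names; it only models the arithmetic, not replication itself.

```python
def majority(n_members):
    """Smallest number of nodes that forms a strict majority of the set."""
    return n_members // 2 + 1


def is_committed(acknowledged, n_members):
    """True once the write has been replicated to a majority of nodes."""
    return acknowledged >= majority(n_members)


def satisfies_w(acknowledged, w):
    """True if at least `w` nodes (the primary included) hold the write."""
    return acknowledged >= w
```

In a three-node set, a write held by the primary alone satisfies `w=1` but is not yet committed; once one secondary has it, `is_committed(2, 3)` is true. Raising `w` trades latency for durability, lowering it does the opposite.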
  • 23. Replica Sets - Nodes
    Nodes monitor each other’s heartbeats
    If primary can’t see a majority of nodes, it relinquishes primary status
    If a majority of nodes notice there is no primary, they elect a primary using criteria
    Node priority
    Node data’s freshness
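A toy model of the two rules on this slide might look as follows. The dict keys (`name`, `priority`, `optime`) and the tie-breaking order (priority first, then freshness) are assumptions made for illustration; the real election protocol is more involved.

```python
def can_stay_primary(visible_members, n_members):
    """A primary steps down unless it sees a strict majority (itself included)."""
    return visible_members > n_members // 2


def elect(candidates):
    """Pick a new primary among the reachable candidates.

    Each candidate is a dict with hypothetical keys:
      'name', 'priority' (higher wins) and 'optime' (higher = fresher data).
    Candidates with priority 0 are treated as never eligible.
    """
    eligible = [c for c in candidates if c["priority"] > 0]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: (c["priority"], c["optime"]))
    return best["name"]
```

With two surviving members of equal priority, the one holding more recent data wins, which is the situation drawn on the following slides where Member 2 takes over after Member 3 goes down.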
  • 24. Replica Sets - Nodes
    Member 1
    Member 2
    Member 3
  • 25. Replica Sets - Nodes
    {a:1}
    Member 1
    SECONDARY
    {a:1}
    {b:2}
    Member 2
    SECONDARY
    {a:1}
    {b:2}
    {c:3}
    Member 3
    PRIMARY
  • 26. Replica Sets - Nodes
    {a:1}
    Member 1
    SECONDARY
    {a:1}
    {b:2}
    Member 2
    PRIMARY
    {a:1}
    {b:2}
    {c:3}
    Member 3
    DOWN
  • 27. Replica Sets - Nodes
    {a:1}
    {b:2}
    Member 1
    SECONDARY
    {a:1}
    {b:2}
    Member 2
    PRIMARY
    {a:1}
    {b:2}
    {c:3}
    Member 3
    RECOVERING
  • 28. Replica Sets - Nodes
    {a:1}
    {b:2}
    Member 1
    SECONDARY
    {a:1}
    {b:2}
    Member 2
    PRIMARY
    {a:1}
    {b:2}
    Member 3
    SECONDARY
  • 29. Replica Sets – Node Types
    Standard – can be primary or secondary
    Passive – will be secondary but never primary
    Arbiter – will vote on primary, but won’t replicate data
  • 30. SlaveOk
    db.getMongo().setSlaveOk();
    Syntax varies by driver
    Writes to master, reads to slave
    Slave will be picked arbitrarily
  • 31. Sharding Architecture
  • 32. Shard
    A replica set
    Manages a well defined range of shard keys
  • 33. Shard
    Distribute data across machines
    Reduce data per machine
    Better able to fit in RAM
    Distribute write load across shards
    Distribute read load across shards, and across nodes within shards
  • 34. Shard Key
    { user_id: 1 }
    { lastname: 1, firstname: 1 }
    { tag: 1, timestamp: -1 }
    { _id: 1 }
    This is the default
  • 35. Mongos
    Routes data to/from shards
    db.users.find( { user_id: 5000 } )
    db.users.find( { user_id: { $gt: 4000, $lt: 6000 } } )
    db.users.find( { hometown: 'Seattle' } )
    db.users.find( { hometown: 'Seattle' } ).sort( { user_id: 1 } )
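How a router decides which shards to ask can be sketched with a range table keyed on `user_id`. The shard names and range boundaries below are made up, and the overlap check is deliberately conservative; this is not how mongos is implemented, only the idea behind targeted versus broadcast queries.

```python
# Hypothetical chunk layout: each shard owns a half-open range [start, end)
# of the shard key user_id.
SHARDS = {
    "shard_a": (0, 4500),
    "shard_b": (4500, 9000),
    "shard_c": (9000, float("inf")),
}


def target_shards(query, shard_key="user_id"):
    """Return the sorted names of the shards a router would have to contact."""
    if shard_key not in query:
        # No shard key in the query: every shard has to be asked.
        return sorted(SHARDS)
    condition = query[shard_key]
    if isinstance(condition, dict):
        # Range query: keep every shard whose range overlaps (low, high).
        low = condition.get("$gt", float("-inf"))
        high = condition.get("$lt", float("inf"))
        return sorted(
            name for name, (start, end) in SHARDS.items()
            if low < end and high > start
        )
    # Equality on the shard key: exactly one shard owns the value.
    return sorted(
        name for name, (start, end) in SHARDS.items()
        if start <= condition < end
    )
```

The first query on the slide hits one shard, the range query hits two, and the `hometown` queries go to all shards because they do not mention the shard key; with a sort, the router additionally has to merge the per-shard results.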
  • 36. Differences from Typical RDBMS
    Memory mapped data
    All data in memory (if it fits), synced to disk periodically
    No joins
    Reads have greater data locality
    No joins between servers
    No transactions
    Improves performance of various operations
    No transactions between servers
    A weak authentication and authorization model
  • 37. Part 2/4
    Using MongoDB
    Starting MongoDB
    Using the interactive Mongo console
    Basic database operations
  • 38. Getting started... the server
    wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.8.1.tgz
    tar xfz mongodb-osx-x86_64-1.8.1.tgz
    cd mongodb-osx-x86_64-1.8.1
    mkdir /tmp/db
    bin/mongod --dbpath /tmp/db
    Pick up your OS-specific package from http://www.mongodb.org/downloads
    Take care of 32-bit vs. 64-bit version
  • 39. Getting started... the console
    bin/mongo
    mongod listens to port 27017 by default
    HTTP interface on port 28017
    > help
    > db.help()
    > db.some_collection.help()
  • 40. Data types...
    Remember: MongoDB is schema-less
    MongoDB supports JSON + some extra types
  • 41. A small address database
    Person:
    firstname
    lastname
    birthday
    city
    phone
  • 42. Inserting
    > db.foo.insert(document)
    > db.foo.insert({'firstname': 'Ben'})
    Every document has an "_id" field
    "_id" is inserted automatically if not present
  • 43. Querying
    > db.foo.find(query_expression)
    > db.foo.find({'firstname': 'Ben'})
    Queries are expressed using JSON notation with JSON/BSON objects
    Query expressions are combined using AND (by default)
    http://www.mongodb.org/display/DOCS/Querying
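The implicit AND can be emulated on a list of dicts. This is a minimal sketch covering plain equality only; `matches` and `find` are hypothetical helpers, not PyMongo functions, and the sample data is invented.

```python
def matches(document, query):
    """True if every key/value pair in `query` equals the document's value."""
    return all(
        key in document and document[key] == value
        for key, value in query.items()
    )


def find(collection, query):
    """Return all documents in the list that satisfy the query."""
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"firstname": "Ben", "city": "Florence"},
    {"firstname": "Ben", "city": "Berlin"},
    {"firstname": "Ann", "city": "Florence"},
]

print(find(people, {"firstname": "Ben", "city": "Florence"}))
```

Adding a second key to the query narrows the result, and an empty query `{}` matches every document.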
  • 44. Querying with sorting
    > db.foo.find({}).sort({'firstname': 1, 'age': -1})
    Sorting specification in JSON notation
    1 = ascending, -1 = descending
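The effect of a two-key sort specification can be reproduced in plain Python. In this sketch the specification is a list of `(field, direction)` pairs, since the key order matters; `sort_documents` is a hypothetical helper and the sample data is invented.

```python
def sort_documents(documents, spec):
    """Sort a list of dicts by a list of (field, direction) pairs."""
    result = list(documents)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for field, direction in reversed(spec):
        result.sort(key=lambda doc: doc[field], reverse=(direction == -1))
    return result


people = [
    {"firstname": "Ben", "age": 30},
    {"firstname": "Ann", "age": 25},
    {"firstname": "Ben", "age": 41},
]

ordered = sort_documents(people, [("firstname", 1), ("age", -1)])
print(ordered)
```

PyMongo's cursor `sort()` likewise accepts a list of `(key, direction)` pairs when sorting on more than one field.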
  • 45. Advanced querying
    $all
    $exists
    $mod
    $ne
    $in
    $nin
    $nor
    $or
    $size
    $type
    http://www.mongodb.org/display/DOCS/Advanced+Queries
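To give a feel for what some of these operators mean, here is a small pure-Python matcher for a subset of them. It handles scalar and list values only, leaves out `$or`, `$nor` and `$type`, and is an approximation of the semantics rather than MongoDB's implementation.

```python
MISSING = object()  # sentinel for "field not present in the document"

OPERATORS = {
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
    "$exists": lambda value, arg: (value is not MISSING) == bool(arg),
    # $mod takes [divisor, remainder]
    "$mod": lambda value, arg: value is not MISSING and value % arg[0] == arg[1],
    "$size": lambda value, arg: isinstance(value, list) and len(value) == arg,
    "$all": lambda value, arg: isinstance(value, list)
    and all(item in value for item in arg),
}


def matches(document, query):
    """True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field, MISSING)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$in": [41, 42]}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain equality form, e.g. {"age": 42}
            return False
    return True
```

For a document `{"age": 42, "colors": ["green", "blue"]}`, the query `{"age": {"$mod": [2, 0]}}` matches (42 is even) and `{"phone": {"$exists": False}}` matches because the field is absent.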
  • 46. Updating
    > db.foo.update(criteria, obj, upsert, multi)
    update() updates only one document by default (specify multi=1)
    upsert=1: if the document does not exist, insert it
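The effect of the `multi` and `upsert` flags can be emulated on a list of dicts. In this sketch `changes` stands in for a `$set` modifier, criteria are plain equality only, and `update` is a hypothetical helper that takes the flags as keyword arguments; it is not the shell or PyMongo method.

```python
def update(collection, criteria, changes, upsert=False, multi=False):
    """Set fields on matching documents; return how many were touched."""
    def matches(doc):
        return all(doc.get(key) == value for key, value in criteria.items())

    touched = 0
    for doc in collection:
        if matches(doc):
            doc.update(changes)
            touched += 1
            if not multi:
                break  # default behaviour: stop after the first match
    if touched == 0 and upsert:
        # Nothing matched: insert a new document built from criteria + changes.
        new_doc = dict(criteria)
        new_doc.update(changes)
        collection.append(new_doc)
        touched = 1
    return touched
```

Without `multi`, only the first matching document changes; without `upsert`, an update that matches nothing leaves the collection untouched.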
  • 47. Updating – modifier operations
    $inc
    $set
    $unset
    $push
    $pushAll
    $addToSet
    $pop
    $pull
    $pullAll
    $rename
    $bit
    http://www.mongodb.org/display/DOCS/Updating
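What a handful of these modifiers do to a document can be sketched on a plain dict. `apply_modifiers` is a hypothetical helper covering only `$set`, `$unset`, `$inc`, `$push`, `$addToSet` and `$pull` with simple values; it approximates the behaviour and does not talk to MongoDB.

```python
def apply_modifiers(document, modifiers):
    """Apply a subset of update modifiers to a dict in place and return it."""
    for op, changes in modifiers.items():
        for field, arg in changes.items():
            if op == "$set":
                document[field] = arg
            elif op == "$unset":
                document.pop(field, None)
            elif op == "$inc":
                document[field] = document.get(field, 0) + arg
            elif op == "$push":
                document.setdefault(field, []).append(arg)
            elif op == "$addToSet":
                values = document.setdefault(field, [])
                if arg not in values:
                    values.append(arg)
            elif op == "$pull":
                if field in document:
                    document[field] = [v for v in document[field] if v != arg]
            else:
                raise ValueError("unsupported modifier: %s" % op)
    return document
```

For example, `apply_modifiers(doc, {"$inc": {"age": 1}, "$push": {"tags": "b"}})` bumps a counter and appends to a list in one step, which is the point of modifiers: the change is expressed without loading and re-saving the whole document.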
  • 48. Updating
    > db.foo.update(criteria, obj, upsert, multi)
    update() updates only one document by default (specify multi=1)
    upsert=1: if the document does not exist, insert it
  • 49. Removing
    db.foo.remove({}) // remove all
    db.foo.remove({'firstname': 'Ben'}) // remove by key
    db.foo.remove({'_id': ObjectId(...)}) // remove by _id
    Atomic removal (locks the database)
    db.foo.remove( { age: 42, $atomic : true } )
    http://www.mongodb.org/display/DOCS/Removing
  • 50. Indexes
    Work similarly to indexes in relational databases
    db.foo.ensureIndex({age: 1}, {background: true})
    One query, one index
    Compound indexes
    db.foo.ensureIndex({age: 1, firstname: -1})
    Ordering of query parameters matters
    http://www.mongodb.org/display/DOCS/Indexes
  • 51. Embedded documents
    MongoDB docs = JSON/BSON-like
    Embedded documents are similar to nested dicts in Python
    db.foo.insert({firstname: 'Ben', data: {a: 1, b: 2, c: 3}})
    db.foo.find({'data.a': 1})
    Dotted notation for reaching into embedded documents
    Use quotes around dotted names
    Indexes work on embedded documents
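The dotted notation maps directly onto walking nested dicts in Python. The helper below, `get_path`, is a hypothetical name for a minimal sketch of that lookup; it handles nested dicts only, not arrays.

```python
def get_path(document, dotted_name, default=None):
    """Follow a dotted field name such as 'data.a' into nested dicts."""
    current = document
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


doc = {"firstname": "Ben", "data": {"a": 1, "b": 2, "c": 3}}
print(get_path(doc, "data.a"))
```

A query on `'data.a'` is then an equality test against the value this lookup returns.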
  • 52. Arrays (1/2)
    Like (nested) lists in Python
    db.foo.insert({colors: ['green', 'blue', 'red']})
    db.foo.find({colors: 'red'})
    Use indexes
  • 53. Arrays (2/2) – matching arrays
    db.bar.insert({users: [{name: 'Hans', age: 42}, {name: 'Jim', age: 30}]})
    db.bar.find({users: {'$elemMatch': {age: {$gt: 42}}}})
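Both kinds of array matching from the two array slides can be emulated in plain Python. The helper names are hypothetical and only the `$gt` condition is modelled.

```python
def array_contains(document, field, value):
    """Emulates {field: value} against an array field: any element may match."""
    return value in document.get(field, [])


def elem_match_gt(document, field, key, threshold):
    """Emulates {field: {'$elemMatch': {key: {'$gt': threshold}}}}."""
    return any(
        isinstance(item, dict) and key in item and item[key] > threshold
        for item in document.get(field, [])
    )


colors_doc = {"colors": ["green", "blue", "red"]}
users_doc = {"users": [{"name": "Hans", "age": 42}, {"name": "Jim", "age": 30}]}
```

Note that with the data from the slide, a `$gt` threshold of 42 matches nothing, because Hans is exactly 42; a threshold of 40 would match the document.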
  • 54. Part 3/4
    Using MongoDB from Python
    PyMongo
    Installing PyMongo
    Using PyMongo
  • 55. Installing and testing PyMongo
    Install pymongo
    virtualenv --no-site-packages pymongo
    bin/easy_install pymongo
    Start MongoDB
    mkdir /tmp/db
    mongod --dbpath /tmp/db
    Start Python
    bin/python
    > import pymongo
    > conn = pymongo.Connection('localhost', 27017)
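For readers on a current PyMongo, a rough equivalent of this session is sketched below. `pymongo.Connection` was removed in PyMongo 3.0 and `MongoClient` is its replacement; the helper names and the database and collection names used in the note after the block are placeholders.

```python
def connect(host="localhost", port=27017):
    """Open a client; needs the pymongo package and a running mongod."""
    # Imported lazily so this module still loads where pymongo is not installed.
    from pymongo import MongoClient
    return MongoClient(host, port)


def insert_and_find(collection, firstname):
    """Insert one document, then read it back by its firstname."""
    collection.insert_one({"firstname": firstname})
    return collection.find_one({"firstname": firstname})
```

With mongod running locally, `insert_and_find(connect()["training"]["people"], "Ben")` would return the stored document including its automatically generated `_id`.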
  • 56. Part 4/4
    ? High-level PyMongo frameworks
    Mongokit
    Mongoengine
    MongoAlchemy
    ? Migration SQL to MongoDB
    ? Q/A
    ? Looking at a real-world project done with Pyramid and MongoDB?
    ? Let's talk about..
  • 57. Mongokit (1/3)
    schema validation (which uses simple Python types for the declaration)
    dotted notation
    nested and complex schema declaration
    untyped field support
    required fields validation
    default values
    custom validators
    cross-database document reference
    random query support (which returns a random document from the database)
    inheritance and polymorphism support
    versionized document support (in beta stage)
    partial auth support (it brings a simple User model)
    operators for validation (currently: OR, NOT and IS)
    simple web framework integration
    import/export to JSON
    i18n support
    GridFS support
    document migration support
  • 58. Mongokit (2/3)
    class BlogPost(Document):
        structure = {
            'title': unicode,
            'body': unicode,
            'author': pymongo.objectid.ObjectId,
            'created_at': datetime.datetime,
            'tags': [unicode],
        }
        required_fields = ['title', 'author', 'created_at']
    blog_post = BlogPost()
    blog_post['title'] = 'myblogpost'
    blog_post['created_at'] = datetime.datetime.utcnow()
    blog_post.save()
  • 59. Mongokit (3/3)
    Speed and performance impact
    Mongokit is always behind the most current pymongo versions
    One-man developer show
    http://namlook.github.com/mongokit/
  • 60. Mongoengine (1/2)
    MongoEngine is a Document-Object Mapper (think ORM, but for document databases) for working with MongoDB from Python. It uses a simple declarative API, similar to the Django ORM.
    http://mongoengine.org/
  • 61. Mongoengine (2/2)
    class BlogPost(Document):
        title = StringField(required=True)
        body = StringField()
        author = ReferenceField(User)
        created_at = DateTimeField(required=True)
        tags = ListField(StringField())
    blog_post = BlogPost(title='myblogpost', created_at=datetime.datetime.utcnow())
    blog_post.save()
  • 62. MongoAlchemy (1/2)
    MongoAlchemy is a layer on top of the Python MongoDB driver which adds client-side schema definitions, an easier to work with and programmatic query language, and a Document-Object mapper which allows Python objects to be saved and loaded into the database in a type-safe way.
    An explicit goal of this project is to be able to perform as many operations as possible without having to perform a load/save cycle since doing so is both significantly slower and more likely to cause data loss.
    http://mongoalchemy.org/
  • 63. MongoAlchemy (2/2)
    from mongoalchemy.document import Document, DocumentField
    from mongoalchemy.fields import *
    from datetime import datetime
    from pprint import pprint
    class Event(Document):
        name = StringField()
        children = ListField(DocumentField('Event'))
        begin = DateTimeField()
        end = DateTimeField()
        def __init__(self, name, parent=None):
            Document.__init__(self, name=name)
            self.children = []
            if parent is not None:
                parent.children.append(self)
  • 64. From SQL to MongoDB
  • 65. The CAP theorem
    Consistency
    Availability
    Tolerance to network Partitions
    Pick two...
  • 66. ACID versus BASE
    Atomicity
    Consistency
    Isolation
    Durability
    Basically Available
    Soft state
    Eventually consistent