Python mongo db-training-europython-2011
Slides of my Python/MongoDB training given at EuroPython 2011 in Florence.

    Presentation Transcript

    • Python and MongoDB: The perfect Match
      Andreas Jung, www.zopyx.com
    • Trainer Andreas Jung
      Python developer since 1993
      Python, Zope & Plone development
      Specialized in Electronic Publishing
      Director of the Zope Foundation
      Author of dozens of add-ons for Python, Zope and Plone
      Co-Founder of the German Zope User Group (DZUG)
      Member of the Plone Foundation
      Using MongoDB since 2009
    • Agenda (45 minutes per slot)
      Introduction to MongoDB
      Using MongoDB
      Using MongoDB from Python with PyMongo
      (PyMongo extensions/ORM-ish layers or Q/A)
    • Things not covered in this tutorial
      Geospatial indexing
      Map-reduce
      Details on scaling (Sharding, Replica sets)
    • Part 1/4
      Introduction to MongoDB:
      Concepts of MongoDB
      Architecture
      How MongoDB compares with relational databases
      Scalability
    • MongoDB is...
      an open-source,
      high-performance,
      schema-less,
      document-oriented
      database
    • Let's agree on the following or leave...
      MongoDB is cool
      MongoDB is not the multi-purpose, one-size-fits-all database
      MongoDB is another additional tool for the software developer
      MongoDB is not a replacement for RDBMS in general
      Use the right tool for each task
    • And.....
      Don't ask me about how to do JOINs in MongoDB
    • Oh, SQL – let's have some fun first
      A SQL statement walks into a bar and sees two tables. He walks over and says: "Hello, may I join you?"
      A SQL injection walks into a bar and starts to quote something but suddenly stops, drops a table and dashes out.
    • The history of MongoDB
      10gen founded in 2007
      Started as a cloud alternative to GAE
      App-engine: ed
      Database: p
      JavaScript as implementation language
      2008: focusing on the database part: MongoDB
      2009: first MongoDB release
      2011: MongoDB 1.8:
      Major deployments
      A fast-growing community
      Fast adoption for large projects
      10gen growing
    • Major MongoDB deployments
    • MongoDB is schema-less
      JSON-style data store
      Each document can have its own schema
      Documents inside a collection usually share a common schema by convention
      {'name' : 'kate', 'age' : 12}
      {'name' : 'adam', 'height' : 180}
      {'q' : 1234, 'x' : ['foo', 'bar']}
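The three sample documents can be modelled in Python as plain dicts, which is also how PyMongo represents documents. The short sketch below only illustrates that each document carries its own set of fields; `people` and `field_names` are names made up for this example.

```python
# Three documents of one hypothetical collection, modelled as plain dicts.
people = [
    {"name": "kate", "age": 12},
    {"name": "adam", "height": 180},
    {"q": 1234, "x": ["foo", "bar"]},
]


def field_names(documents):
    """Return the sorted field names of every document."""
    return [sorted(doc) for doc in documents]


shapes = field_names(people)
for doc, shape in zip(people, shapes):
    print(shape, "<-", doc)
```

Nothing forces the documents to agree on their fields; a shared shape is purely a convention of the application.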
    • Terminology: RDBMS vs. MongoDB
    • Characteristics of MongoDB (I)
      High-performance
      Rich query language (similar to SQL)
      Map-Reduce (if you really need it)
      Secondary indexes
      Geospatial indexing
      Replication
      Auto-sharding (partitioning of data)
      Many platforms, drivers for many languages
    • Characteristics of MongoDB (II)
      No transaction support, only atomic operations
      Default: "fire-and-forget" mode for high throughput
      "Safe mode": wait for server confirmation, checking for errors
    • Typical performance characteristics
      Decent commodity hardware:
      Up to 100,000 reads/writes per second (fire-and-forget)
      Up to 50,000 reads/writes per second (safe mode)
      Your mileage may vary – depending on
      RAM
      Speed of the IO system
      CPU
      Client-side driver & application
    • Functionality vs. Scalability
    • MongoDB: Pros & Cons
    • Durability
      Default: fire-and-forget (use safe mode)
      Changes are kept in RAM (!)
      Fsync to disk every 60 seconds (default)
      Deployment options:
      Standalone installation: use journaling (V 1.8+)
      Replicated: use replica set(s)
    • Differences from Typical RDBMS
      Memory mapped data
      All data in memory (if it fits), synced to disk periodically
      No joins
      Reads have greater data locality
      No joins between servers
      No transactions
      Improves performance of various operations
      No transactions between servers
    • Replica Sets
      Cluster of N servers
      Only one node is ‘primary’ at a time
      This is equivalent to master
      The node where writes go
      Primary is elected by consensus
      Automatic failover
      Automatic recovery of failed nodes
    • Replica Sets - Writes
      A write is only ‘committed’ once it has been replicated to a majority of nodes in the set
      Before this happens, reads to the set may or may not see the write
      On failover, data which is not ‘committed’ may be dropped (but not necessarily)
      If dropped, it will be rolled back from all servers which wrote it
      For improved durability, use getLastError/w
      Other criteria – block writes when nodes go down or slaves get too far behind
      Or, to reduce latency, reduce getLastError/w
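The majority rule on this slide can be made concrete with a few lines of Python. This is only a sketch: `majority` and `is_committed` are hypothetical helper names, not part of MongoDB or PyMongo, and the node counts are invented for illustration.

```python
def majority(set_size):
    """Smallest number of nodes that forms a majority of the replica set."""
    return set_size // 2 + 1


def is_committed(acknowledged_by, set_size):
    """A write counts as committed once a majority of the nodes hold it."""
    return acknowledged_by >= majority(set_size)


# In a three-node set, a write held only by the primary is not yet committed.
print(is_committed(1, 3))
# Once one secondary has replicated it, a majority (2 of 3) holds the write.
print(is_committed(2, 3))
```

Waiting for more nodes before returning to the client improves durability, while waiting for fewer reduces latency, which is the trade-off the getLastError/w bullets describe.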
    • Replica Sets - Nodes
      Nodes monitor each other’s heartbeats
      If primary can’t see a majority of nodes, it relinquishes primary status
      If a majority of nodes notice there is no primary, they elect a primary using criteria
      Node priority
      Node data’s freshness
    • Replica Sets - Nodes
      Member 1
      Member 2
      Member 3
    • Replica Sets - Nodes
      Member 1 (SECONDARY): {a:1}
      Member 2 (SECONDARY): {a:1} {b:2}
      Member 3 (PRIMARY): {a:1} {b:2} {c:3}
    • Replica Sets - Nodes
      Member 1 (SECONDARY): {a:1}
      Member 2 (PRIMARY): {a:1} {b:2}
      Member 3 (DOWN): {a:1} {b:2} {c:3}
    • Replica Sets - Nodes
      Member 1 (SECONDARY): {a:1} {b:2}
      Member 2 (PRIMARY): {a:1} {b:2}
      Member 3 (RECOVERING): {a:1} {b:2} {c:3}
    • Replica Sets - Nodes
      Member 1 (SECONDARY): {a:1} {b:2}
      Member 2 (PRIMARY): {a:1} {b:2}
      Member 3 (SECONDARY): {a:1} {b:2}
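The failover sequence above (Member 3 goes down, Member 2 takes over) can be mimicked with a toy election. The sketch below is not MongoDB's actual election protocol, which is more involved; it only illustrates the two criteria named earlier, node priority and data freshness. The `Node` class, the `elect_primary` function, and ranking priority before freshness are all assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int      # higher priority is preferred
    last_op: int       # number of operations the node has applied
    up: bool = True


def elect_primary(nodes):
    """Pick a primary among reachable nodes, or None without a majority."""
    reachable = [node for node in nodes if node.up]
    if len(reachable) <= len(nodes) // 2:
        return None  # no majority visible: nobody may become primary
    # Toy ordering (an assumption): priority first, then data freshness.
    return max(reachable, key=lambda node: (node.priority, node.last_op))


members = [
    Node("Member 1", priority=1, last_op=1),             # holds {a:1}
    Node("Member 2", priority=1, last_op=2),             # holds {a:1} {b:2}
    Node("Member 3", priority=1, last_op=3, up=False),   # former primary, down
]
winner = elect_primary(members)
print(winner.name)
```

With Member 3 unreachable, the two remaining members still form a majority, and the one with the fresher data wins, matching the second diagram.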
    • Replica Sets – Node Types
      Standard – can be primary or secondary
      Passive – will be secondary but never primary
      Arbiter – will vote on primary, but won’t replicate data
    • SlaveOk
      db.getMongo().setSlaveOk();
      Syntax varies by driver
      Writes to master, reads to slave
      Slave will be picked arbitrarily
    • Sharding Architecture
    • Shard
      A replica set
      Manages a well defined range of shard keys
    • Shard
      Distribute data across machines
      Reduce data per machine
      Better able to fit in RAM
      Distribute write load across shards
      Distribute read load across shards, and across nodes within shards
    • Shard Key
      { user_id: 1 }
      { lastname: 1, firstname: 1 }
      { tag: 1, timestamp: -1 }
      { _id: 1 }
      This is the default
    • Mongos
      Routes data to/from shards
      db.users.find( { user_id: 5000 } )
      db.users.find( { user_id: { $gt: 4000, $lt: 6000 } } )
      db.users.find( { hometown: 'Seattle' } )
      db.users.find( { hometown: 'Seattle' } ).sort( { user_id: 1 } )
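The four queries differ in how many shards the router has to contact. The sketch below assumes that `user_id` is the shard key and that three hypothetical shards own the ranges listed in `SHARD_RANGES`; real routing works on chunk metadata kept by the config servers, so treat `target_shards` as an illustration only.

```python
# Hypothetical layout: every shard owns a half-open range [start, end) of user_id.
SHARD_RANGES = {
    "shard0": (0, 4500),
    "shard1": (4500, 9000),
    "shard2": (9000, 13500),
}


def target_shards(query, shard_key="user_id"):
    """Return the names of the shards that may hold matching documents."""
    if shard_key not in query:
        # Without the shard key the router has to ask every shard.
        return sorted(SHARD_RANGES)
    condition = query[shard_key]
    if isinstance(condition, dict):
        # Range query: keep every shard whose range overlaps (low, high).
        low = condition.get("$gt", float("-inf"))
        high = condition.get("$lt", float("inf"))
        return sorted(name for name, (start, end) in SHARD_RANGES.items()
                      if low < end and start < high)
    # Exact match: only one range can contain the value.
    return sorted(name for name, (start, end) in SHARD_RANGES.items()
                  if start <= condition < end)


print(target_shards({"user_id": 5000}))
print(target_shards({"user_id": {"$gt": 4000, "$lt": 6000}}))
print(target_shards({"hometown": "Seattle"}))
```

Queries that include the shard key can be sent to one or a few shards; queries on other fields, such as `hometown`, have to be sent to all shards and merged by the router.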
    • Differences from Typical RDBMS
      Memory mapped data
      All data in memory (if it fits), synced to disk periodically
      No joins
      Reads have greater data locality
      No joins between servers
      No transactions
      Improves performance of various operations
      No transactions between servers
      A weak authentication and authorization model
    • Part 2/4
      Using MongoDB
      Starting MongoDB
      Using the interactive Mongo console
      Basic database operations
    • Getting started... the server
      wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.8.1.tgz
      tar xfz mongodb-osx-x86_64-1.8.1.tgz
      cd mongodb-osx-x86_64-1.8.1
      mkdir /tmp/db
      bin/mongod --dbpath /tmp/db
      Pick up your OS-specific package from http://www.mongodb.org/downloads
      Take care of 32 bit vs. 64 bit version
    • Getting started... the console
      bin/mongo
      mongod listens to port 27017 by default
      HTTP interface on port 28017
      > help
      > db.help()
      > db.some_collection.help()
    • Datatypes...
      Remember: MongoDB is schema-less
      MongoDB supports JSON + some extra types
    • A small address database
      Person:
      firstname
      lastname
      birthday
      city
      phone
    • Inserting
      > db.foo.insert(document)
      > db.foo.insert({'firstname' : 'Ben'})
      Every document has an "_id" field
      "_id" inserted automatically if not present
    • Querying
      > db.foo.find(query_expression)
      > db.foo.find({'firstname' : 'Ben'})
      Queries are expressed using JSON notation with JSON/BSON objects
      Query expressions combined using AND (by default)
      http://www.mongodb.org/display/DOCS/Querying
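The implicit AND between query conditions can be illustrated without a server. The sketch below covers only exact-equality conditions on top-level fields; `matches` and `find` are hypothetical helpers written for this example and do not reproduce the full query semantics of MongoDB.

```python
people = [
    {"firstname": "Ben", "city": "Florence"},
    {"firstname": "Ben", "city": "Berlin"},
    {"firstname": "Kate", "city": "Florence"},
]


def matches(document, query):
    """True if every key/value pair of the query equals the document's value."""
    return all(key in document and document[key] == value
               for key, value in query.items())


def find(documents, query):
    """Return all documents that satisfy every condition of the query."""
    return [doc for doc in documents if matches(doc, query)]


print(find(people, {"firstname": "Ben"}))
print(find(people, {"firstname": "Ben", "city": "Berlin"}))
```

Adding a second key to the query narrows the result, because a document has to satisfy all conditions at once. An empty query matches every document.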
    • Querying with sorting
      > db.foo.find({}).sort({'firstname' : 1, 'age' : -1})
      Sorting specification in JSON notation
      1 = ascending, -1 = descending
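The effect of a two-key sort specification can be reproduced on a list of dicts. In this sketch the specification is a list of (field, direction) pairs, which keeps the key order explicit; `sort_documents` and the sample data are made up for this example.

```python
people = [
    {"firstname": "Ben", "age": 30},
    {"firstname": "Adam", "age": 25},
    {"firstname": "Ben", "age": 41},
]


def sort_documents(documents, spec):
    """Sort by (field, direction) pairs; 1 = ascending, -1 = descending."""
    result = list(documents)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for field, direction in reversed(spec):
        result.sort(key=lambda doc: doc[field], reverse=(direction == -1))
    return result


ordered = sort_documents(people, [("firstname", 1), ("age", -1)])
for doc in ordered:
    print(doc["firstname"], doc["age"])
```

Documents are ordered by `firstname` first; `age` only decides the order among documents with the same `firstname`.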
    • Advanced querying
      $all
      $exists
      $mod
      $ne
      $in
      $nin
      $nor
      $or
      $size
      $type
      http://www.mongodb.org/display/DOCS/Advanced+Queries
    • Updating
      > db.foo.update(criteria, obj, upsert, multi)
      update() updates only one document by default (specify multi=1)
      upsert=1: if document does not exist, insert it
    • Updating – modifier operations
      $inc
      $set
      $unset
      $push
      $pushAll
      $addToSet
      $pop
      $pull
      $pullAll
      $rename
      $bit
      http://www.mongodb.org/display/DOCS/Updating
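To give a feel for what some of these modifiers do, the sketch below applies `$inc`, `$set`, `$unset`, `$push` and `$addToSet` to a flat Python dict. `apply_update` is a hypothetical helper for illustration; it ignores dotted field names, error cases and the remaining operators, and it is not how the server implements updates.

```python
def apply_update(document, update):
    """Return a modified copy of a flat document."""
    result = dict(document)
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount      # add to a number
    for field, value in update.get("$set", {}).items():
        result[field] = value                               # overwrite or create
    for field in update.get("$unset", {}):
        result.pop(field, None)                             # drop the field
    for field, value in update.get("$push", {}).items():
        result[field] = list(result.get(field, [])) + [value]   # append
    for field, value in update.get("$addToSet", {}).items():
        current = list(result.get(field, []))
        if value not in current:                            # append only if new
            current.append(value)
        result[field] = current
    return result


ben = {"firstname": "Ben", "age": 41, "tags": ["python"]}
updated = apply_update(ben, {"$inc": {"age": 1},
                             "$set": {"city": "Florence"},
                             "$push": {"tags": "mongodb"}})
again = apply_update(updated, {"$addToSet": {"tags": "python"},
                               "$unset": {"city": 1}})
print(updated)
print(again)
```

The point of the modifiers is that only the named fields change, so the client does not have to load the document, modify it and save it back as a whole.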
    • Removing
      db.foo.remove({}) // remove all
      db.foo.remove({'firstname' : 'Ben'}) // remove by key
      db.foo.remove({'_id' : ObjectId(...)}) // remove by _id
      Atomic removal (locks the database)
      db.foo.remove( { age: 42, $atomic : true } )
      http://www.mongodb.org/display/DOCS/Removing
    • Indexes
      Work similarly to indexes in relational databases
      db.foo.ensureIndex({age: 1}, {background: true})
      One query – one index
      Compound indexes
      db.foo.ensureIndex({age: 1, firstname: -1})
      Ordering of query parameters matters
      http://www.mongodb.org/display/DOCS/Indexes
    • Embedded documents
      MongoDB docs = JSON/BSON-like
      Embedded documents are similar to nested dicts in Python
      db.foo.insert({firstname: 'Ben', data: {a: 1, b: 2, c: 3}})
      db.foo.find({'data.a' : 1})
      Dotted notation for reaching into embedded documents
      Use quotes around dotted names
      Indexes work on embedded documents
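Since embedded documents behave like nested dicts, the dotted notation can be pictured as a walk through the nesting. The sketch below resolves a dotted name against a dict; `get_path` is a hypothetical helper, not a PyMongo function, and it does not handle arrays.

```python
def get_path(document, dotted_name):
    """Follow a dotted field name into nested dicts; return None if missing."""
    current = document
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


ben = {"firstname": "Ben", "data": {"a": 1, "b": 2, "c": 3}}
print(get_path(ben, "data.a"))
print(get_path(ben, "data.z"))
```

A query such as `{'data.a' : 1}` matches the document above because the value reached through the path `data.a` equals 1.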
    • Arrays (1/2)
      Like (nested) lists in Python
      db.foo.insert({colors: ['green', 'blue', 'red']})
      db.foo.find({colors: 'red'})
      Use indexes
    • Arrays (2/2) – matching arrays
      db.bar.insert({users: [ {name: 'Hans', age: 42}, {name: 'Jim', age: 30} ]})
      db.bar.find({users : {'$elemMatch' : {age : {$gt: 42}}}})
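The idea behind `$elemMatch` is that at least one element of the array has to satisfy the condition. The sketch below reproduces this for a single "greater than" condition; `elem_match` is a hypothetical helper that covers only this one case.

```python
def elem_match(document, array_field, field, greater_than):
    """True if at least one array element has element[field] > greater_than."""
    return any(
        isinstance(element, dict)
        and field in element
        and element[field] > greater_than
        for element in document.get(array_field, [])
    )


doc = {"users": [{"name": "Hans", "age": 42}, {"name": "Jim", "age": 30}]}
print(elem_match(doc, "users", "age", 40))
print(elem_match(doc, "users", "age", 42))
```

Because `$gt` is a strict comparison, a threshold of 42 does not match the sample document, whose oldest user is exactly 42; a lower threshold such as 40 does.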
    • Part 3/4
      Using MongoDB from Python
      PyMongo
      Installing PyMongo
      Using PyMongo
    • Installing and testing PyMongo
      Install pymongo
      virtualenv --no-site-packages pymongo
      bin/easy_install pymongo
      Start MongoDB
      mkdir /tmp/db
      mongod --dbpath /tmp/db
      Start Python
      bin/python
      >>> import pymongo
      >>> conn = pymongo.Connection('localhost', 27017)
    • Part 4/4
      ? High-level PyMongo frameworks
      Mongokit
      Mongoengine
      MongoAlchemy
      ? Migration SQL to MongoDB
      ? Q/A
      ? Looking at a real world project done with Pyramid and MongoDB?
      ? Let's talk about..
    • Mongokit (1/3)
      schema validation (which uses simple Python types for the declaration)
      dotted notation
      nested and complex schema declaration
      untyped field support
      required fields validation
      default values
      custom validators
      cross database document reference
      random query support (which returns a random document from the database)
      inheritance and polymorphism support
      versionized document support (in beta stage)
      partial auth support (it brings a simple User model)
      operator for validation (currently: OR, NOT and IS)
      simple web framework integration
      import/export to JSON
      i18n support
      GridFS support
      document migration support
    • Mongokit (2/3)
      import datetime
      import pymongo
      from mongokit import Document
      class BlogPost(Document):
          # field name -> expected Python type
          structure = {
              'title': unicode,
              'body': unicode,
              'author': pymongo.objectid.ObjectId,
              'created_at': datetime.datetime,
              'tags': [unicode],
          }
          # every required field must also appear in the structure
          required_fields = ['title', 'author', 'created_at']
      blog_post = BlogPost()
      blog_post['title'] = u'myblogpost'
      blog_post['created_at'] = datetime.datetime.utcnow()
      blog_post.save()
    • Mongokit (3/3)
      Speed and performance impact
      Mongokit is always behind the most current pymongo versions
      One-man developer show
      http://namlook.github.com/mongokit/
    • Mongoengine (1/2)
      MongoEngine is a Document-Object Mapper (think ORM, but for document databases) for working with MongoDB from Python. It uses a simple declarative API, similar to the Django ORM.
      http://mongoengine.org/
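    • Mongoengine (2/2)
      import datetime
      from mongoengine import Document, StringField, ReferenceField, DateTimeField, ListField
      class BlogPost(Document):
          title = StringField(required=True)
          body = StringField()
          author = ReferenceField(User)  # User: another Document class, defined elsewhere
          created_at = DateTimeField(required=True)
          tags = ListField(StringField())
      blog_post = BlogPost(title='myblogpost', created_at=datetime.datetime.utcnow())
      blog_post.save()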
    • MongoAlchemy (1/2)
      MongoAlchemy is a layer on top of the Python MongoDB driver which adds client-side schema definitions, an easier to work with and programmatic query language, and a Document-Object mapper which allows Python objects to be saved and loaded into the database in a type-safe way.
      An explicit goal of this project is to be able to perform as many operations as possible without having to perform a load/save cycle since doing so is both significantly slower and more likely to cause data loss.
      http://mongoalchemy.org/
    • MongoAlchemy (2/2)
      from mongoalchemy.document import Document, DocumentField
      from mongoalchemy.fields import *
      from datetime import datetime
      from pprint import pprint
      class Event(Document):
          name = StringField()
          children = ListField(DocumentField('Event'))
          begin = DateTimeField()
          end = DateTimeField()
          def __init__(self, name, parent=None):
              Document.__init__(self, name=name)
              self.children = []
              if parent != None:
                  parent.children.append(self)
    • From SQL to MongoDB
    • The CAP theorem
      Consistency
      Availability
      Tolerance to network Partitions
      Pick two...
    • ACID versus BASE
      Atomicity
      Consistency
      Isolation
      Durability
      Basically Available
      Soft state
      Eventually consistent