MongoDB SF Python

MongoDB presentation for SF Python's June meetup. Review of non-relational databases for the first half, focus on MongoDB for the second.

  1. open-source, high-performance, schema-free, document-oriented database
  2. RDBMS • Great for many applications • Shortcomings: scalability, flexibility
  3. CAP Theorem • Consistency • Availability • Tolerance to network Partitions • Pick two • http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
  4. ACID vs BASE • ACID: Atomicity, Consistency, Isolation, Durability • BASE: Basically Available, Soft state, Eventually consistent
  5. Schema-free • Loosening constraints - added flexibility • Dynamically typed languages • Migrations
  6. BigTable • Single master node • Row / Column hybrid • Versioned
  7. BigTable • Open-source clones: HBase, Hypertable
  8. Dynamo • Simple Key/Value store • No master node • Write to any (many) nodes • Read from one or more nodes (balance speed vs. consistency) • Read repair
  9. Dynamo • Open-source clones: Project Voldemort, Cassandra (data model more like BigTable), Dynomite
  10. memcached • Used as a caching layer • Essentially a key/value store • RAM only - fast • Does away with ACID
  11. Redis • Like memcached • Different: values can be strings, lists, sets; non-volatile
  12. Tokyo Cabinet + Tyrant • Key/value store with focus on speed • Some more advanced queries: sorting, range or prefix matching • Multiple storage engines: hash, B-tree, fixed-length, and table
  13. CouchDB • A lot in common with MongoDB: document-oriented, schema-free, JSON-style documents
  14. CouchDB • Differences: MVCC based; replication as path to scalability; query through predefined views; ACID; REST
  15. MongoDB • Focus on performance • Rich dynamic queries • Secondary indexes • Replication / failover • Auto-sharding • Many platforms / languages supported
  16. Good at • The web • Caching • High volume / low value • Scalability
  17. Less good at • Highly transactional • Ad-hoc business intelligence • Problems that require SQL
  18. PyMongo • Python driver for MongoDB • Pure Python, with optional C extension • Installation • Use setuptools: easy_install pymongo
  19. Basics • Connect: • Insert: • Query:
  20. Document values • None • bool • int • float • string / unicode • dict • datetime.datetime • compiled re • Special values: SON, Binary, ObjectId, DBRef
  21. • Schema-free: • Query with selectors:
  22. Advanced queries • Sorting: • Limit: • Range:
  23. • $gt, $lt, $gte, $lte, $ne, $all, $in, $nin • where() • count() • group()
  24. GridFS • File storage in MongoDB • File-like API for Python • Initialize: • Write: • Read:
  25. Other cool stuff • Capped collections • Upserts • Multikeys
  26. Demo
  27. • Download MongoDB http://www.mongodb.org • Install PyMongo • Try it out!
  28. • http://www.mongodb.org • irc.freenode.net#mongodb • mongodb-user on Google Groups • @mongodb • mike@10gen.com
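
The Dynamo slide lists writes to any (many) nodes, reads from one or more nodes, and read repair. A minimal sketch of that flow follows, as a toy in-memory simulation rather than Dynamo's actual design. The `Replica` and `Cluster` classes, the `n`/`w`/`r` parameters, and the single integer counter used as a version are all assumptions made for illustration (real systems use richer versioning such as vector clocks).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One storage node: maps key -> (version, value). Hypothetical helper."""
    data: dict = field(default_factory=dict)

    def put(self, key, version, value):
        # Keep only the newest version seen for a key.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key)


class Cluster:
    """Toy masterless cluster: n replicas, w write targets, r read targets."""

    def __init__(self, n=3, w=2, r=2):
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.clock = 0  # stand-in for real versioning

    def write(self, key, value, targets=None):
        # No master: any subset of replicas may take the write.
        # By default the first w replicas store it.
        self.clock += 1
        targets = self.replicas[: self.w] if targets is None else targets
        for replica in targets:
            replica.put(key, self.clock, value)

    def read(self, key, targets=None):
        # Ask the chosen replicas (by default the last r), keep the newest reply.
        targets = self.replicas[-self.r:] if targets is None else targets
        found = [reply for reply in (rep.get(key) for rep in targets)
                 if reply is not None]
        if not found:
            return None
        version, value = max(found, key=lambda reply: reply[0])
        # Read repair: push the newest version back to the replicas we asked.
        for rep in targets:
            rep.put(key, version, value)
        return value
```

With `n=3, w=2, r=2` the write set and the read set always share at least one replica, so a read sees the latest write. Reading from fewer replicas is faster but can return a stale value, which is the speed vs. consistency trade-off the slide mentions.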

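The selector operators listed on slide 23 ($gt, $lt, $gte, $lte, $ne, $all, $in, $nin) can be illustrated without a running server. The sketch below is a plain-Python approximation of how such selectors filter documents. It is not MongoDB's or PyMongo's implementation, and `matches` and `find` are hypothetical helper names. It is deliberately simplified: top-level fields only, no dotted paths, and no multikey matching on plain equality.

```python
import operator

# Operators from the slide, mapped to plain Python checks (illustrative only).
OPERATORS = {
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
    "$all": lambda value, required: isinstance(value, list)
    and all(item in value for item in required),
}

_MISSING = object()  # marks a field that the document does not have


def _check(op, value, operand):
    if value is _MISSING:
        # A missing field can only satisfy the negative operators.
        return op in ("$ne", "$nin")
    try:
        return bool(OPERATORS[op](value, operand))
    except TypeError:
        # Values of incomparable types do not match in this sketch.
        return False


def matches(document, selector):
    """Return True if every condition in the selector holds for the document."""
    for name, condition in selector.items():
        value = document.get(name, _MISSING)
        is_operator_dict = (
            isinstance(condition, dict)
            and condition
            and all(key in OPERATORS for key in condition)
        )
        if is_operator_dict:
            if not all(_check(op, value, operand)
                       for op, operand in condition.items()):
                return False
        elif value != condition:
            # Plain value: exact equality on the field.
            return False
    return True


def find(documents, selector):
    """Filter a list of dict documents with a selector dict."""
    return [doc for doc in documents if matches(doc, selector)]
```

For example, `find(docs, {"age": {"$gt": 18, "$lt": 30}})` keeps only the documents whose `age` lies strictly between 18 and 30, and an empty selector `{}` matches every document.
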