MongoDB
Justin Smestad


@jsmestad
justin@mongomachine.com
whoami

Software Engineer (Ruby, JavaScript, and Clojure)
User Experience & Design on the side
Passion for DevOps
Chicagoland native
Background
Java Engineer @ Orbitz Worldwide
  Brought in Ruby and RSpec/Cucumber to replace
  FitNesse testing
Application Developer @ Factory Design Labs
  Brand & Campaign sites for Audi, Oakley, and TNF
Independent Contractor
  Development team for TechStars alumni
mongo machine
    hosted MongoDB wherever you need it
What’s mongo machine?
mongo machine
         hosted MongoDB + data management


 http://mongomachine.com
Our Goals
    Educate and promote best practices
    Increase customer efficiency
    Promote transparency
Our Product
Hosted MongoDB
Provide the best managed MongoDB
experience, on any platform.
Managed MongoDB Infrastructure
   Our infrastructure on AWS or Rackspace
   Automated Deployments to your own infrastructure

Management Console
   Instantly create new databases
   Track database trends
   Scale up & down on-demand
Management Console
Analytics data about what your DB is doing.
“NoSQL”
hasn’t this been tried before?
why should I care now?
So what is MongoDB? (10,000 ft view)
  "MongoDB (from "humongous") is a scalable, high-performance, open source,
  schema-free, document-oriented database."
                                 -- mongodb.org
Who is using MongoDB?
Fortune 500 <=> Startups
  GitHub: Bug Tracking & Analytics
  Foursquare: Check-in System
  New York Times: Photo Submissions
The list is large and growing fast




http://www.mongodb.org/display/DOCS/Production+Deployments
Philosophy
  “One size DOESN’T fit all”
  Non-relational (de-normalized) DBs are easier to scale, especially horizontally
  DBs should be an on-demand commodity (cloud-like)
  Focus on performance, flexibility and scalability (CA)
  Not concerned with transactional stuff or relational semantics
  Mongo aims for the performance of key-value stores while maintaining the
  functionality of a traditional RDBMS
Features
Standard database stuff
  Indexing
  Traditional master-slave replication
  Database References (no concept of JOINs)
Features: Speed
  In-memory dataset with fsync to disk
Features: Durability
  Solved with replication, or by using journaling
Features: Document Storage
  Document-oriented database
  Documents are stored in BSON (binary JSON)
  BSON is a binary serialization of JSON-like objects
  Because BSON mirrors JSON, MongoDB understands JSON natively, which is
  extremely powerful
  Any valid JSON can be easily imported and queried
  Large files are split into chunks and stored as separate documents (GridFS)
  Documents can contain embedded documents (nested hashes) without losing
  any indexing capabilities
Features: Schema-less
  Very flexible
  No more blocking ALTER TABLE or NULL debates
Features: Horizontal Scaling
  Replica Sets and Sharding make horizontal scaling easy
Features: Replica Set
  Traditional master-slave, but with automatic failover
  Each server holds all data (CA)
  One member is elected master at any given time
  Arbitration process detects failure and triggers failover
  New election within seconds
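The failover rule can be sketched in a few lines. This is a toy model under stated assumptions, not MongoDB's election protocol: it only captures the requirement that a member needs votes from a strict majority of the set before it can become master. The function name `canElectMaster` and the member counts are hypothetical.

```javascript
// Toy model (assumption): a replica set can elect a master only while a
// strict majority of its voting members can still reach each other.
function canElectMaster(totalVoters, reachableVoters) {
  return reachableVoters > totalVoters / 2;
}

// Two data nodes plus one arbiter: losing one data node leaves 2 of 3 voters.
const withArbiter = canElectMaster(3, 2);

// Two data nodes and no arbiter: losing one leaves 1 of 2, which is no majority.
const withoutArbiter = canElectMaster(2, 1);

console.log(withArbiter, withoutArbiter); // true false
```

This is why a small set is usually given an odd number of voters: an arbiter holds no data but still counts toward the majority.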
Features: Map/Reduce
  Map/Reduce operations are written in JavaScript
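To make the JavaScript point concrete, here is a minimal sketch that counts orders per carrier. The `mapByCarrier` and `sumCounts` functions follow the shape MongoDB expects (the map function sees each document as `this` and calls `emit`). The `runMapReduce` helper and the sample data are hypothetical stand-ins for the server so the example can run on its own.

```javascript
// Map function: runs once per document, with `this` bound to the document.
function mapByCarrier() {
  emit(this.shipping.carrier, 1);
}

// Reduce function: must return the same shape as the values it receives,
// because the real server may call it repeatedly on partial results.
function sumCounts(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

// Stand-in for the server (hypothetical helper, not a MongoDB API):
// groups emitted values by key, then reduces each group.
let emit; // assigned by runMapReduce while the map function runs

function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc));
  const results = {};
  for (const [key, values] of groups) {
    results[key] = reduce(key, values);
  }
  return results;
}

const orders = [
  { shipping: { carrier: "usps" } },
  { shipping: { carrier: "ups" } },
  { shipping: { carrier: "usps" } },
];

const counts = runMapReduce(orders, mapByCarrier, sumCounts);
console.log(counts); // { usps: 2, ups: 1 }
```

Against a real database the same two functions would be handed to the collection's `mapReduce` method instead of the stand-in helper.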
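Features: Querying
Rich, JavaScript-based query syntax
  Allows us to do deep, nested queries

  db.order.find( { shipping: { carrier: "usps" } } );

  shipping is an embedded document (object); this form matches orders whose
  whole shipping document equals { carrier: "usps" }. To match on one nested
  field regardless of the others, use dot notation: { "shipping.carrier": "usps" }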
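The two ways of querying an embedded document behave differently, which a small model on plain objects can show. This is a sketch only: `matchesExact` and `matchesPath` are hypothetical helpers that imitate the matching rules, not MongoDB's query engine, and the sample orders are made up.

```javascript
// { shipping: { carrier: "usps" } } compares the whole embedded document,
// including field order, so extra fields make the match fail.
function matchesExact(doc, field, expected) {
  return JSON.stringify(doc[field]) === JSON.stringify(expected);
}

// { "shipping.carrier": "usps" } walks the dotted path to one nested field.
function matchesPath(doc, path, expected) {
  const value = path
    .split(".")
    .reduce((current, key) => (current == null ? undefined : current[key]), doc);
  return value === expected;
}

const sampleOrders = [
  { _id: 1, shipping: { carrier: "usps" } },
  { _id: 2, shipping: { carrier: "usps", method: "priority" } },
];

const exactIds = sampleOrders
  .filter((o) => matchesExact(o, "shipping", { carrier: "usps" }))
  .map((o) => o._id);

const pathIds = sampleOrders
  .filter((o) => matchesPath(o, "shipping.carrier", "usps"))
  .map((o) => o._id);

console.log(exactIds, pathIds); // [ 1 ] [ 1, 2 ]
```

Order 2 is missed by the exact form because its shipping document has an extra field; the dotted path finds both.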
Features: Atomic Operations
  Upserts, $set, $inc, $push, $pull, $pop, $addToSet, ...
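What a few of these operators do to a document can be sketched on a plain object. `applyUpdate` is a hypothetical helper that only handles top-level fields and primitive array values; it illustrates the effect of each operator and says nothing about how the server makes the change atomic.

```javascript
// Toy model (hypothetical helper): applies $set, $inc, $push and $addToSet
// to a copy of the document and returns the copy.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // leave the input untouched

  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value; // overwrite or create the field
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // missing field starts at 0
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), value]; // always appends
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = result[field] || [];
    // appends only when the value is not already present
    result[field] = current.includes(value) ? current : [...current, value];
  }
  return result;
}

const order = { _id: 1, status: "new", views: 1, tags: ["gift"] };

const updated = applyUpdate(order, {
  $set: { status: "shipped" },
  $inc: { views: 1 },
  $push: { events: "label-printed" },
  $addToSet: { tags: "gift" },
});

console.log(updated);
```

The point of the real operators is that the read-modify-write happens inside the server in one step, so two clients incrementing the same counter cannot overwrite each other.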
Features: Official Drivers
  .NET, Java, JavaScript, Ruby, Node.js, PHP, Haskell, C/C++, Perl
Features: Community & Market Growth
  MongoDB works alongside your existing technologies
Concepts
Concepts: Document-oriented
  Think of “documents” as objects / database records
  Documents are basically just JSON in binary
  Ability to store information all together
Concept Mapping
  RDBMS (MySQL, Postgres)     MongoDB
  Tables                      Collections
  Records/rows                Documents/objects
  Queries return record(s)    Queries return a cursor
Concepts: Cursors
Queries return “cursors” instead of collections
  A cursor allows you to iterate through the result set
  A big reason for this is performance
  Much more efficient than loading all objects into
  memory
Concepts: Cursors
The find() function returns a cursor object

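 var cursor = db.logged_requests.find({ 'status_code' : 200 });

 cursor.hasNext(); // true, as long as at least one request matched

 // Visit each remaining document and print it as JSON
 cursor.forEach(
    function(item) {
      print(tojson(item));
    }
 );

 cursor.hasNext(); // false: the cursor is now exhausted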
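The performance argument for cursors is laziness: documents are produced as the caller asks for them. A rough sketch of that idea in plain JavaScript follows. `ToyCursor` and `findRequests` are hypothetical names, and the generator stands in for the server; real cursors fetch documents in batches over the network.

```javascript
// Toy cursor (hypothetical class) over a generator: documents are produced
// one at a time on demand, not loaded up front.
class ToyCursor {
  constructor(generator) {
    this.iterator = generator;
    this.buffered = null; // holds one looked-ahead result, if any
  }
  hasNext() {
    if (this.buffered === null) this.buffered = this.iterator.next();
    return !this.buffered.done;
  }
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    const value = this.buffered.value;
    this.buffered = null;
    return value;
  }
  forEach(callback) {
    while (this.hasNext()) callback(this.next());
  }
}

let produced = 0; // counts how many documents were actually materialized

function* findRequests(statusCode, total) {
  for (let i = 0; i < total; i++) {
    produced++;
    yield { _id: i, status_code: statusCode };
  }
}

// A result set of a million documents, of which only the first is built.
const cursor = new ToyCursor(findRequests(200, 1000000));
const first = cursor.next();
console.log(first._id, produced); // 0 1
```

Only one document exists in memory after the first `next()`, even though the result set is a million documents long.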
Cool Features
Capped collections
  Fixed-size collections with limited operations and automatic age-out of
  the oldest documents (FIFO)
  Fixed insertion order
  Extremely performant
  Ideal for logging and caching
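The age-out behaviour of a capped collection can be sketched as a small fixed-capacity buffer. `ToyCappedCollection` is a hypothetical class that caps by document count; real capped collections are sized in bytes when they are created, so treat this as an illustration of insertion-order age-out only.

```javascript
// Toy capped collection (hypothetical class): holds at most `max` documents
// and drops the oldest one, in insertion order, when a new one arrives.
class ToyCappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) this.docs.shift(); // age out the oldest
  }
  find() {
    return [...this.docs]; // natural order is insertion order
  }
}

const log = new ToyCappedCollection(3);
["a", "b", "c", "d", "e"].forEach((msg) => log.insert({ msg }));

const remaining = log.find().map((d) => d.msg);
console.log(remaining); // [ 'c', 'd', 'e' ]
```

A log built this way never needs a cleanup job: writing new entries is what removes the old ones.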
Use Cases
Data Warehouse
 Mongo understands JSON natively
 Very powerful map-reduce for analytics
 Nested hashes make roll-up (RRDtool-style) systems natural
Use Cases
Audit Trails
  Nested documents allow you to store audit trails in
  context.
  {
      "name": "Justin Smestad",
      "address": "1441 Central St",
      "history": [
        {
            "name": "Justin Smestad",
            "address": "350 S Jackson St"
        },
        ...
      ]
  }
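One way to maintain a document like this is to snapshot the current fields into `history` on every change. The sketch below assumes that approach; `updateWithHistory` is a hypothetical helper working on plain objects, not part of any driver.

```javascript
// Hypothetical helper: returns a new record with `changes` applied and a
// snapshot of the previous fields appended to its `history` array.
function updateWithHistory(record, changes) {
  const { history = [], ...current } = record;
  return { ...current, ...changes, history: [...history, current] };
}

const before = { name: "Justin Smestad", address: "350 S Jackson St" };
const after = updateWithHistory(before, { address: "1441 Central St" });

console.log(JSON.stringify(after, null, 2));
```

Because the trail lives inside the record, reading a customer and their past addresses is a single document fetch. Against a real collection the same effect would likely be a single update that combines `$set` for the new values with `$push` for the snapshot.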
Limitations
 Transaction support
 Relational integrity (normalization)
 Partition balancing
 Map/Reduce is single-threaded
 Data compaction
Why not `other db`?
 CouchDB, Cassandra, Riak, Membase, Redis, ...
 FLEXIBILITY
 MongoDB’s design choices empower developers and
 administrators with features that let them
 implement their own solutions.
Questions?
@jsmestad / @mongo_machine

Introduction to MongoDB