SlideShare a Scribd company logo
1 of 28
What is MongoDB?
          What Makes Mongo Special?
                         MapReduce




                           MongoDB
Or how I learned to stop worrying and love the database


                          Mathias Stearn

                                   10gen


    Bay Area Hadoop Meetup – February 17, 2010




                      Mathias Stearn   MongoDB
What is MongoDB?
              What Makes Mongo Special?
                             MapReduce




    The resulting [MongoDB] application has literally
    changed the way the pharma company conducts
    business. Whereas in the past, patient queries could
    take minutes to hours, results are now essentially
    real-time.
http://news.cnet.com/8301-13846_3-10451248-62.html
    Why bother [with] memcached for caching HTTP sessions
    when you have an authoritative MongoDB? The
    performance is there.

Carl Byström, @cgbystrom on twitter




                          Mathias Stearn   MongoDB
What is MongoDB?
              What Makes Mongo Special?
                             MapReduce




    Compared to hadoop, Mongo’s speed and startup time
    make developing new queries much easier; what took
    us two weeks to get working on hadoop was done in
    two days on mongo.
Emmett Shear, CTO (and developer) at Justin.tv
    It took me half a day to go from not touching MongoDB to
    writing some fairly good functionality against it. It makes
    setting up, configuring, and interfacing with MySQL look
    archaic – ridiculously archaic.

http://geekaustin.org/2010/01/31/
mongodb-day-geek-austin-data-series



                          Mathias Stearn   MongoDB
What is MongoDB?
              What Makes Mongo Special?
                             MapReduce



1   What is MongoDB?
     Document Oriented
     JavaScript Enabled
     Fast, Scalable, Available, and Reliable
2   What Makes Mongo Special?
     Native Language Integration
     Rich Data Types
     Atomic Modifiers
     Dynamic Queries
3   MapReduce
      Built-In MapReduce
      Easy Hadoop-Mongo Integration
      Better Hadoop-Mongo Integration


                          Mathias Stearn   MongoDB
What is MongoDB?     Document Oriented
              What Makes Mongo Special?    JavaScript Enabled
                             MapReduce     Fast, Scalable, Available, and Reliable




1   What is MongoDB?
     Document Oriented
     JavaScript Enabled
     Fast, Scalable, Available, and Reliable
2   What Makes Mongo Special?
     Native Language Integration
     Rich Data Types
     Atomic Modifiers
     Dynamic Queries
3   MapReduce
      Built-In MapReduce
      Easy Hadoop-Mongo Integration
      Better Hadoop-Mongo Integration

                          Mathias Stearn   MongoDB
What is MongoDB?     Document Oriented
            What Makes Mongo Special?    JavaScript Enabled
                           MapReduce     Fast, Scalable, Available, and Reliable


   Document Oriented
   Organized into Databases and Collections (like Tables)
   JSON-like (BSON)
   Schemaless
   Dynamic, Strong Typing
   Database can “reach into” objects
db.people.insert({
  _id: "mstearn",
  name: "Mathias Stearn",
  karma: 42,
  active: true,
  birthdate: new Date(517896000000),
  interests: ["MongoDB", "Python", "Üñíçø¯˘"],
                                         de
  subobject: {foo: "bar"}
});


                        Mathias Stearn   MongoDB
What is MongoDB?     Document Oriented
            What Makes Mongo Special?    JavaScript Enabled
                           MapReduce     Fast, Scalable, Available, and Reliable

JavaScript used for:
    Shell and Documentation
    (Very) Advanced Queries
    “Group By” Queries
    MapReduce
db.users.find({$where: "this.a + this.b >= 42"});
db.posts.group(
  { key: "user"
  , initial: {count:0, comments:0}
  , reduce: function(doc,out){
      out.count++;
      out.comments += doc.comments.length; }
  , finalize: function(out){
      out.avg = out.comments / out.count; }
  });

                        Mathias Stearn   MongoDB
What is MongoDB?     Document Oriented
                 What Makes Mongo Special?    JavaScript Enabled
                                MapReduce     Fast, Scalable, Available, and Reliable


Fast, Scalable, Available, and Reliable




       Master-Slave replication for Availability and Reliability
            Replica-Pairs support auto-negotiation for master
       Auto-Sharding for Horizontal Scalability
            Distributes based on specified field
            Currently alpha
       MMAP database files to automatically use available RAM
       Asynchronous modifications




                             Mathias Stearn   MongoDB
What is MongoDB?     Document Oriented
What Makes Mongo Special?    JavaScript Enabled
               MapReduce     Fast, Scalable, Available, and Reliable




            Mathias Stearn   MongoDB
Native Language Integration
                      What is MongoDB?
                                           Rich Data Types
              What Makes Mongo Special?
                                           Atomic Modifiers
                             MapReduce
                                           Dynamic Queries



1   What is MongoDB?
     Document Oriented
     JavaScript Enabled
     Fast, Scalable, Available, and Reliable
2   What Makes Mongo Special?
     Native Language Integration
     Rich Data Types
     Atomic Modifiers
     Dynamic Queries
3   MapReduce
      Built-In MapReduce
      Easy Hadoop-Mongo Integration
      Better Hadoop-Mongo Integration

                          Mathias Stearn   MongoDB
Native Language Integration
                     What is MongoDB?
                                          Rich Data Types
             What Makes Mongo Special?
                                          Atomic Modifiers
                            MapReduce
                                          Dynamic Queries




Official
    Java/JVM
    Python
    Ruby
    C/C++
    Perl
    PHP

Community Supported
Closure, Scala, C#, Haskell, Erlang, and More



                         Mathias Stearn   MongoDB
Native Language Integration
                     What is MongoDB?
                                          Rich Data Types
             What Makes Mongo Special?
                                          Atomic Modifiers
                            MapReduce
                                          Dynamic Queries



JSON
   String (UTF8)
   Double
   Object (hash/map/dict)
   Array
   Bool
   Null / Undefined

Extras
    Date
   Int32 / Int64
   ObjectID (12 bytes: timestamp + host + pid + counter)
   Binary (with type byte)

                         Mathias Stearn   MongoDB
Native Language Integration
                     What is MongoDB?
                                          Rich Data Types
             What Makes Mongo Special?
                                          Atomic Modifiers
                            MapReduce
                                          Dynamic Queries




   $set
   $inc
   $multiply (soon)
   $push / $pushAll
   $pull / $pullAll

db.posts.update({_id:SOMEID}, {$push:{tags:"mongodb"}})
db.tags.update({_id:"mongodb"}, {$inc:{count:1}},
                      {upsert:true}})




                         Mathias Stearn   MongoDB
Native Language Integration
                     What is MongoDB?
                                          Rich Data Types
             What Makes Mongo Special?
                                          Atomic Modifiers
                            MapReduce
                                          Dynamic Queries

Simple




     db.posts.findOne({ user: "mstearn" });

     var cursor = db.posts.find({ user: "mstearn" });
     cursor.forEach(function(){
       doSomething(this.text);
     });




                         Mathias Stearn   MongoDB
Native Language Integration
                     What is MongoDB?
                                          Rich Data Types
             What Makes Mongo Special?
                                          Atomic Modifiers
                            MapReduce
                                          Dynamic Queries

Sorted




     db.posts.find(
       { user: "mstearn" }
     ).sort({timestamp:-1})




                         Mathias Stearn   MongoDB
Native Language Integration
                     What is MongoDB?
                                          Rich Data Types
             What Makes Mongo Special?
                                          Atomic Modifiers
                            MapReduce
                                          Dynamic Queries

Paginated




     db.posts.find(
       { user: "mstearn" }
     ).sort({timestamp:-1}).skip(10).limit(10);




                         Mathias Stearn   MongoDB
Native Language Integration
                      What is MongoDB?
                                           Rich Data Types
              What Makes Mongo Special?
                                           Atomic Modifiers
                             MapReduce
                                           Dynamic Queries

Simple Tag Search




     db.posts.find(
       { user: "mstearn"
       , tags: "mongo"
       }
     ).sort({timestamp:-1}).skip(10).limit(10);




                          Mathias Stearn   MongoDB
Native Language Integration
                      What is MongoDB?
                                           Rich Data Types
              What Makes Mongo Special?
                                           Atomic Modifiers
                             MapReduce
                                           Dynamic Queries

Complex Tag Search




     db.posts.find(
       { user: "mstearn"
       , tags: {$in: ["mongo", "mongodb"]}
       }
     ).sort({timestamp:-1}).skip(10).limit(10);




                          Mathias Stearn   MongoDB
Native Language Integration
                         What is MongoDB?
                                              Rich Data Types
                 What Makes Mongo Special?
                                              Atomic Modifiers
                                MapReduce
                                              Dynamic Queries

Nested Objects




     db.posts.find(
       { user: "mstearn"
       , tags: {$in: ["mongo", "mongodb"]}
       , comments.user: "mdirolf"
       }
     ).sort({timestamp:-1}).skip(10).limit(10);




                             Mathias Stearn   MongoDB
Native Language Integration
                      What is MongoDB?
                                           Rich Data Types
              What Makes Mongo Special?
                                           Atomic Modifiers
                             MapReduce
                                           Dynamic Queries

Regular Expressions




     db.posts.find(
       { user: "mstearn"
       , tags: {$in: ["mongo", "mongodb"]}
       , comments.user: "mdirolf"
       , text: /windows/i
       }
     ).sort({timestamp:-1}).skip(10).limit(10);




                          Mathias Stearn   MongoDB
Native Language Integration
                    What is MongoDB?
                                         Rich Data Types
            What Makes Mongo Special?
                                         Atomic Modifiers
                           MapReduce
                                         Dynamic Queries

Ranges



    db.posts.find(
      { user: "mstearn"
      , tags: {$in: ["mongo", "mongodb"]}
      , comments.user: "mdirolf"
      , text: /windows/i
      , points: {$gt: 10, $lt: 100}
      }
    ).sort({timestamp:-1}).skip(10).limit(10);




                        Mathias Stearn   MongoDB
Native Language Integration
                        What is MongoDB?
                                             Rich Data Types
                What Makes Mongo Special?
                                             Atomic Modifiers
                               MapReduce
                                             Dynamic Queries

Arbitrary JavaScript



     db.posts.find(
       { user: "mstearn"
       , tags: {$in: ["mongo", "mongodb"]}
       , comments.user: "mdirolf"
       , text: /windows/i
       , points: {$gt: 10, $lt 100}
       , $where: "this.a + this.b >= 42"
       }
     ).sort({timestamp:-1}).skip(10).limit(10);




                            Mathias Stearn   MongoDB
What is MongoDB?     Built-In MapReduce
              What Makes Mongo Special?    Easy Hadoop-Mongo Integration
                             MapReduce     Better Hadoop-Mongo Integration




1   What is MongoDB?
     Document Oriented
     JavaScript Enabled
     Fast, Scalable, Available, and Reliable
2   What Makes Mongo Special?
     Native Language Integration
     Rich Data Types
     Atomic Modifiers
     Dynamic Queries
3   MapReduce
      Built-In MapReduce
      Easy Hadoop-Mongo Integration
      Better Hadoop-Mongo Integration

                          Mathias Stearn   MongoDB
What is MongoDB?     Built-In MapReduce
           What Makes Mongo Special?    Easy Hadoop-Mongo Integration
                          MapReduce     Better Hadoop-Mongo Integration


db.posts.mapReduce(
   function() {
     this.comments.forEach(c){
       emit(c.user,
            {count:1, words:c.text.split().length; } }
 , function(key, values){
     for (var i=1; i<values.length; i++){
       values[0].count += values[i].count;
       values[0].words += values[i].words; }
     return values[0]; }
 , { finalize: function(out){
       out.avg = out.words / out.count;
       return out; }
   , query: {posted: {$gt: new Date(2010,0,1)}}
   , out: ’posts.comment_stats’
   }
 });

                       Mathias Stearn   MongoDB
What is MongoDB?     Built-In MapReduce
               What Makes Mongo Special?    Easy Hadoop-Mongo Integration
                              MapReduce     Better Hadoop-Mongo Integration


Easy Hadoop-Mongo Integration




      mongoexport can export to JSON/CSV/TSV
          Can also easily use a custom script
      Process in Hadoop
      Use mongoimport to get data back into MongoDB




                           Mathias Stearn   MongoDB
What is MongoDB?     Built-In MapReduce
               What Makes Mongo Special?    Easy Hadoop-Mongo Integration
                              MapReduce     Better Hadoop-Mongo Integration


Better Hadoop-Mongo Integration



      mongodump writes a stream of BSON to a file
      Write an InputFilter and RecordReader to read BSON
      Write a BSONWriter class to directly use the data
          Just added two methods to driver to make this easier
      Process the data with the Java/Scala/Closure driver
      Write a custom RecordWriter to either:
          Dump to a file and use mongorestore
          Dump the output directly to MongoDB
      Optional: use renameCollection to mimic our MapReduce




                           Mathias Stearn   MongoDB
What is MongoDB?
              What Makes Mongo Special?
                             MapReduce


Upcoming events




      NoSQL Live! from Boston (March 11)
      MongoDB Training in San Francisco (March 25)
      San Fransisco MySQL Meetup (April 12)




                          Mathias Stearn   MongoDB
What is MongoDB?
                What Makes Mongo Special?
                               MapReduce


Links




        http://mongo.kylebanker.com (Try mongo in your browser)
        http://www.mongodb.org
        #mongodb on irc.freenode.net
        mongodb-user on google groups

        mathias@10gen.com
        @mathias_mongo on twitter




                            Mathias Stearn   MongoDB

More Related Content

Viewers also liked

Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduceHadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupHadoop User Group
 
Karmasphere Studio for Hadoop
Karmasphere Studio for HadoopKarmasphere Studio for Hadoop
Karmasphere Studio for HadoopHadoop User Group
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21Hadoop User Group
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21Hadoop User Group
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practicesHadoop User Group
 
The Bixo Web Mining Toolkit
The Bixo Web Mining ToolkitThe Bixo Web Mining Toolkit
The Bixo Web Mining ToolkitTom Croucher
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataYahoo Developer Network
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce APITom Croucher
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Hadoop User Group
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroOwen O'Malley
 

Viewers also liked (19)

Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
 
Karmasphere Studio for Hadoop
Karmasphere Studio for HadoopKarmasphere Studio for Hadoop
Karmasphere Studio for Hadoop
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
Mumak
MumakMumak
Mumak
 
The Bixo Web Mining Toolkit
The Bixo Web Mining ToolkitThe Bixo Web Mining Toolkit
The Bixo Web Mining Toolkit
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big Data
 
Nov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.HNov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.H
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce API
 
HUG Nov 2010: HDFS Raid - Facebook
HUG Nov 2010: HDFS Raid - FacebookHUG Nov 2010: HDFS Raid - Facebook
HUG Nov 2010: HDFS Raid - Facebook
 
Cloudera Desktop
Cloudera DesktopCloudera Desktop
Cloudera Desktop
 
3 avro hug-2010-07-21
3 avro hug-2010-07-213 avro hug-2010-07-21
3 avro hug-2010-07-21
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
 
January 2011 HUG: Howl Presentation
January 2011 HUG: Howl PresentationJanuary 2011 HUG: Howl Presentation
January 2011 HUG: Howl Presentation
 
January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 Intro
 

More from Hadoop User Group

Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsHadoop User Group
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Hadoop User Group
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation HadoopHadoop User Group
 

More from Hadoop User Group (16)

Common crawlpresentation
Common crawlpresentationCommon crawlpresentation
Common crawlpresentation
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Cascalog internal dsl_preso
Cascalog internal dsl_presoCascalog internal dsl_preso
Cascalog internal dsl_preso
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Pig at Linkedin
Pig at LinkedinPig at Linkedin
Pig at Linkedin
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 

10 Gen 20100217 Hadoop Bay Area

  • 1. What is MongoDB? What Makes Mongo Special? MapReduce MongoDB Or how I learned to stop worrying and love the database Mathias Stearn 10gen Bay Area Hadoop Meetup – February 17, 2010 Mathias Stearn MongoDB
  • 2. What is MongoDB? What Makes Mongo Special? MapReduce The resulting [MongoDB] application has literally changed the way the pharma company conducts business. Whereas in the past, patient queries could take minutes to hours, results are now essentially real-time. http://news.cnet.com/8301-13846_3-10451248-62.html Why bother [with] memcached for caching HTTP sessions when you have an authoritative MongoDB? The performance is there. Carl Byström, @cgbystrom on twitter Mathias Stearn MongoDB
  • 3. What is MongoDB? What Makes Mongo Special? MapReduce Compared to hadoop, Mongo’s speed and startup time make developing new queries much easier; what took us two weeks to get working on hadoop was done in two days on mongo. Emmett Shear, CTO (and developer) at Justin.tv It took me half a day to go from not touching MongoDB to writing some fairly good functionality against it. It makes setting up, configuring, and interfacing with MySQL look archaic – ridiculously archaic. http://geekaustin.org/2010/01/31/ mongodb-day-geek-austin-data-series Mathias Stearn MongoDB
  • 4. What is MongoDB? What Makes Mongo Special? MapReduce 1 What is MongoDB? Document Oriented JavaScript Enabled Fast, Scalable, Available, and Reliable 2 What Makes Mongo Special? Native Language Integration Rich Data Types Atomic Modifiers Dynamic Queries 3 MapReduce Built-In MapReduce Easy Hadoop-Mongo Integration Better Hadoop-Mongo Integration Mathias Stearn MongoDB
  • 5. What is MongoDB? Document Oriented What Makes Mongo Special? JavaScript Enabled MapReduce Fast, Scalable, Available, and Reliable 1 What is MongoDB? Document Oriented JavaScript Enabled Fast, Scalable, Available, and Reliable 2 What Makes Mongo Special? Native Language Integration Rich Data Types Atomic Modifiers Dynamic Queries 3 MapReduce Built-In MapReduce Easy Hadoop-Mongo Integration Better Hadoop-Mongo Integration Mathias Stearn MongoDB
  • 6. What is MongoDB? Document Oriented What Makes Mongo Special? JavaScript Enabled MapReduce Fast, Scalable, Available, and Reliable Document Oriented Organized into Databases and Collections (like Tables) JSON-like (BSON) Schemaless Dynamic, Strong Typing Database can “reach into” objects db.people.insert({ _id: "mstearn", name: "Mathias Stearn", karma: 42, active: true, birthdate: new Date(517896000000), interests: ["MongoDB", "Python", "Üñíçø¯˘"], de subobject: {foo: "bar"} }); Mathias Stearn MongoDB
  • 7. What is MongoDB? Document Oriented What Makes Mongo Special? JavaScript Enabled MapReduce Fast, Scalable, Available, and Reliable JavaScript used for: Shell and Documentation (Very) Advanced Queries “Group By” Queries MapReduce db.users.find({$where: "this.a + this.b >= 42"}); db.posts.group( { key: "user" , initial: {count:0, comments:0} , reduce: function(doc,out){ out.count++; out.comments += doc.comments.length; } , finalize: function(out){ out.avg = out.comments / out.count; } }); Mathias Stearn MongoDB
  • 8. What is MongoDB? Document Oriented What Makes Mongo Special? JavaScript Enabled MapReduce Fast, Scalable, Available, and Reliable Fast, Scalable, Available, and Reliable Master-Slave replication for Availability and Reliability Replica-Pairs support auto-negotiation for master Auto-Sharding for Horizontal Scalability Distributes based on specified field Currently alpha MMAP database files to automatically use available RAM Asynchronous modifications Mathias Stearn MongoDB
  • 9. What is MongoDB? Document Oriented What Makes Mongo Special? JavaScript Enabled MapReduce Fast, Scalable, Available, and Reliable Mathias Stearn MongoDB
  • 10. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries 1 What is MongoDB? Document Oriented JavaScript Enabled Fast, Scalable, Available, and Reliable 2 What Makes Mongo Special? Native Language Integration Rich Data Types Atomic Modifiers Dynamic Queries 3 MapReduce Built-In MapReduce Easy Hadoop-Mongo Integration Better Hadoop-Mongo Integration Mathias Stearn MongoDB
  • 11. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Official Java/JVM Python Ruby C/C++ Perl PHP Community Supported Closure, Scala, C#, Haskell, Erlang, and More Mathias Stearn MongoDB
  • 12. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries JSON String (UTF8) Double Object (hash/map/dict) Array Bool Null / Undefined Extras Date Int32 / Int64 ObjectID (12 bytes: timestamp + host + pid + counter) Binary (with type byte) Mathias Stearn MongoDB
  • 13. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries $set $inc $multiply (soon) $push / $pushAll $pull / $pullAll db.posts.update({_id:SOMEID}, {$push:{tags:"mongodb"}}) db.tags.update({_id:"mongodb"}, {$inc:{count:1}}, {upsert:true}}) Mathias Stearn MongoDB
  • 14. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Simple db.posts.findOne({ user: "mstearn" }); var cursor = db.posts.find({ user: "mstearn" }); cursor.forEach(function(){ doSomething(this.text); }); Mathias Stearn MongoDB
  • 15. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Sorted db.posts.find( { user: "mstearn" } ).sort({timestamp:-1}) Mathias Stearn MongoDB
  • 16. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Paginated db.posts.find( { user: "mstearn" } ).sort({timestamp:-1}).skip(10).limit(10); Mathias Stearn MongoDB
  • 17. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Simple Tag Search db.posts.find( { user: "mstearn" , tags: "mongo" } ).sort({timestamp:-1}).skip(10).limit(10); Mathias Stearn MongoDB
  • 18. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Complex Tag Search db.posts.find( { user: "mstearn" , tags: {$in: ["mongo", "mongodb"]} } ).sort({timestamp:-1}).skip(10).limit(10); Mathias Stearn MongoDB
  • 19. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Nested Objects db.posts.find( { user: "mstearn" , tags: {$in: ["mongo", "mongodb"]} , comments.user: "mdirolf" } ).sort({timestamp:-1}).skip(10).limit(10); Mathias Stearn MongoDB
  • 20. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Regular Expressions db.posts.find( { user: "mstearn" , tags: {$in: ["mongo", "mongodb"]} , comments.user: "mdirolf" , text: /windows/i } ).sort({timestamp:-1}).skip(10).limit(10); Mathias Stearn MongoDB
  • 21. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Ranges db.posts.find( { user: "mstearn" , tags: {$in: ["mongo", "mongodb"]} , comments.user: "mdirolf" , text: /windows/i , points: {$gt: 10, $lt: 100} } ).sort({timestamp:-1}).skip(10).limit(10); Mathias Stearn MongoDB
  • 22. Native Language Integration What is MongoDB? Rich Data Types What Makes Mongo Special? Atomic Modifiers MapReduce Dynamic Queries Arbitrary JavaScript db.posts.find( { user: "mstearn" , tags: {$in: ["mongo", "mongodb"]} , comments.user: "mdirolf" , text: /windows/i , points: {$gt: 10, $lt 100} , $where: "this.a + this.b >= 42" } ).sort({timestamp:-1}).skip(10).limit(10); Mathias Stearn MongoDB
  • 23. What is MongoDB? Built-In MapReduce What Makes Mongo Special? Easy Hadoop-Mongo Integration MapReduce Better Hadoop-Mongo Integration 1 What is MongoDB? Document Oriented JavaScript Enabled Fast, Scalable, Available, and Reliable 2 What Makes Mongo Special? Native Language Integration Rich Data Types Atomic Modifiers Dynamic Queries 3 MapReduce Built-In MapReduce Easy Hadoop-Mongo Integration Better Hadoop-Mongo Integration Mathias Stearn MongoDB
  • 24. What is MongoDB? Built-In MapReduce What Makes Mongo Special? Easy Hadoop-Mongo Integration MapReduce Better Hadoop-Mongo Integration db.posts.mapReduce( function() { this.comments.forEach(c){ emit(c.user, {count:1, words:c.text.split().length; } } , function(key, values){ for (var i=1; i<values.length; i++){ values[0].count += values[i].count; values[0].words += values[i].words; } return values[0]; } , { finalize: function(out){ out.avg = out.words / out.count; return out; } , query: {posted: {$gt: new Date(2010,0,1)}} , out: ’posts.comment_stats’ } }); Mathias Stearn MongoDB
  • 25. What is MongoDB? Built-In MapReduce What Makes Mongo Special? Easy Hadoop-Mongo Integration MapReduce Better Hadoop-Mongo Integration Easy Hadoop-Mongo Integration mongoexport can export to JSON/CSV/TSV Can also easily use a custom script Process in Hadoop Use mongoimport to get data back into MongoDB Mathias Stearn MongoDB
  • 26. What is MongoDB? Built-In MapReduce What Makes Mongo Special? Easy Hadoop-Mongo Integration MapReduce Better Hadoop-Mongo Integration Better Hadoop-Mongo Integration mongodump writes a stream of BSON to a file Write an InputFilter and RecordReader to read BSON Write a BSONWriter class to directly use the data Just added two methods to driver to make this easier Process the data with the Java/Scala/Closure driver Write a custom RecordWriter to either: Dump to a file and use mongorestore Dump the output directly to MongoDB Optional: use renameCollection to mimic our MapReduce Mathias Stearn MongoDB
  • 27. What is MongoDB? What Makes Mongo Special? MapReduce Upcoming events NoSQL Live! from Boston (March 11) MongoDB Training in San Francisco (March 25) San Fransisco MySQL Meetup (April 12) Mathias Stearn MongoDB
  • 28. What is MongoDB? What Makes Mongo Special? MapReduce Links http://mongo.kylebanker.com (Try mongo in your browser) http://www.mongodb.org #mongodb on irc.freenode.net mongodb-user on google groups mathias@10gen.com @mathias_mongo on twitter Mathias Stearn MongoDB