NOSQL 101
Or: How I Learned To Stop Worrying And Love The Mongo
Who Am I?
• Daniel Cousineau
• Sr. Software Applications Developer at
  Texas A&M University
• dcousineau@gmail.com
• Twitter: @dcousineau
• dcousineau.com
In the beginning... there was Accounting.
Row Based, Fixed Schema
The RDBMS was created to address this usage.
RDBMS Ideology
• ACID
 • Atomicity
 • Consistency
 • Isolation
 • Durability
• All or nothing, no corruption, no mistakes
• Accounting errors are EXPENSIVE!
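To make "all or nothing" concrete, here is a minimal in-memory sketch in JavaScript, the language of the MongoDB shell used later. It is not how an RDBMS implements transactions. It only shows the atomicity contract: a transfer applies both the debit and the credit, or neither. The ledger shape and the `transfer()` helper are hypothetical.

```javascript
// Hypothetical sketch of atomicity: work on a draft copy, and only hand the
// draft back ("commit") if every step succeeded. A failure part-way through
// throws, and the caller's original ledger is never touched.
function transfer(ledger, from, to, amount) {
  const draft = { ...ledger };
  if (!(from in draft) || !(to in draft)) {
    throw new Error("unknown account");
  }
  draft[from] -= amount;
  if (draft[from] < 0) {
    throw new Error("insufficient funds"); // abort: nothing was applied
  }
  draft[to] += amount;
  return draft; // both halves applied together
}

const books = { cash: 100, payables: 0 };
console.log(transfer(books, "cash", "payables", 40)); // { cash: 60, payables: 40 }
```

Copy-then-commit is just the simplest way to show the guarantee; real databases use logs and locks to get the same effect without copying everything.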
RDBMS Ideology


• Pessimistic
• Consistency at the end of EVERY step!
Moore’s Law happened.
Computers took on more complex tasks...
Problems became... Dynamic
NOSQL attempts to address the Dynamic.
What is NOSQL?
What Is NOSQL?


• Any data storage engine that does not use
  a SQL interface and does not use relational
  algebra
NOSQL Ideology
• BASE
 • Basically Available
 • Soft state
 • Eventually consistent
• Zen-like
• Content loss isn’t that big of a deal
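"Soft state" and "eventually consistent" are easiest to see with two replicas. The sketch below is illustrative only: the names are hypothetical, and last-write-wins by timestamp is just one of several possible merge rules, not a claim about how any particular store resolves conflicts. A read can miss a recent write, but once the replicas sync they agree.

```javascript
// Each replica maps key -> { value, ts }. Writes land on one replica;
// sync() exchanges entries and keeps the newest write per key.
function write(replica, key, value, ts) {
  replica[key] = { value, ts };
}

function read(replica, key) {
  return replica[key] ? replica[key].value : undefined;
}

function sync(a, b) {
  for (const key of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[key];
    const y = b[key];
    const newest = !x ? y : !y ? x : x.ts >= y.ts ? x : y; // last write wins
    a[key] = newest;
    b[key] = newest;
  }
}

const east = {};
const west = {};
write(east, "price", 92, 1);
console.log(read(west, "price")); // undefined: west has not seen the write yet
write(west, "price", 93, 2);
sync(east, west);
console.log(read(east, "price"), read(west, "price")); // 93 93: converged
```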
NOSQL Ideology


• Optimistic
• State will be in flux, just accept it
NOSQL Ideology


• Don’t Do More Work Than You Have To
• Don’t Unnecessarily Duplicate Effort
NOSQL is diverse...
Types of NOSQL

• Key-Value Stores
 • memcache/memcachedb
 • riak
 • tokyo cabinet/tyrant
 • dynamo
Types of NOSQL

• Column-oriented
 • bigtable
 • cassandra
Types of NOSQL


• Graph
 • neo4j
Types of NOSQL

• Document-oriented
 • couchdb
 • MongoDB
Let's focus on MongoDB...
Why MongoDB?
• Because it’s what I need
• Because I understand it
• Because I’ve used it
• Because it’s easy
• Because it has superior driver support
• Because I said so
Support?
• Operating Systems
 • OSX 32/64bit
 • Linux 32/64bit
 • Windows 32/64bit
 • Solaris i86pc
 • Solaris 64
• Official Drivers
 • C, C++, Java, JavaScript, Perl, PHP, Python, Ruby
• Community Drivers
 • REST, C#, Clojure, ColdFusion, Delphi, Erlang, Factor, Fantom, F#,
    Go, Groovy, Haskell, Lua, Obj-C, PowerShell, Scala, Scheme, Smalltalk
What is MongoDB?
• A document-based storage system
• Databases contain collections, collections
  contain documents
• Documents are arbitrary BSON objects (a binary,
  JSON-like format with extra types such as dates and ObjectIds)
• No schema is enforced
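The database → collection → document hierarchy, and the lack of an enforced schema, can be pictured with plain JavaScript objects. This is an illustration only, not a driver API: three documents in the same collection use different fields and even different types for the same field.

```javascript
// Plain-object model of the layout: a database holds named collections,
// a collection holds documents, and documents need not share fields.
const db = {
  foo: [
    { _id: 1, a: 1 },
    { _id: 2, a: "one", tags: ["x", "y"] }, // 'a' is a string here, plus an extra field
    { _id: 3, nested: { b: true } },        // no 'a' at all
  ],
};

// List the distinct top-level field names used anywhere in a collection.
function fieldNames(collection) {
  const names = new Set();
  for (const doc of collection) {
    Object.keys(doc).forEach((k) => names.add(k));
  }
  return [...names].sort();
}

console.log(fieldNames(db.foo)); // [ '_id', 'a', 'nested', 'tags' ]
```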
What is MongoDB?

• Drivers expose the MongoDB query API in a form
  that feels familiar and native to each language
• Drivers usually handle serialization
 • You always work with native language objects;
    BSON is really only used internally
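A rough sketch of what "drivers handle serialization" saves you from. Real drivers encode to BSON, which is binary and has a native date type; JSON is used here only as a stand-in because it ships with JavaScript. The hand-written reviver is exactly the kind of chore a driver does for you: without it, a `Date` would come back as a plain string.

```javascript
// Stand-in encoder/decoder: JSON instead of BSON, for illustration only.
function encode(doc) {
  return JSON.stringify(doc); // Date objects become ISO-8601 strings
}

function decode(text) {
  // Turn ISO-8601 timestamps back into Date objects on the way out.
  const iso = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/;
  return JSON.parse(text, (key, value) =>
    typeof value === "string" && iso.test(value) ? new Date(value) : value
  );
}

const original = { symbol: "aapl", when: new Date(Date.UTC(2008, 11, 11, 11, 15)) };
const copy = decode(encode(original));
console.log(copy.when instanceof Date); // true
```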
Install MongoDB

$ wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.6.0.tgz
$ tar -xzvf ./mongodb-osx-x86_64-1.6.0.tgz
$ ./mongodb-osx-x86_64-1.6.0/bin/mongod --dbpath=/path/to/save/db
Manage MongoDB
$ ./mongodb-osx-x86_64-1.6.0/bin/mongo
MongoDB shell version: 1.6.0
connecting to: test
> db.foo.save( {a:1} )
> db.foo.find()
{ "_id" : ObjectId("4c60d0143cd09f6d17a18094"), "a" : 1 }
>
Simple Queries
$ ./mongodb-osx-x86_64-1.6.0/bin/mongo
MongoDB shell version: 1.6.0
connecting to: test
> db.foo.find()
Get All Records
> db.foo.find( {a: 1} )
Get All Records Where Property ‘a’ Is 1
> db.foo.find( {a: 1}, {_id: 1} )
Get The ‘_id’ Property Of All Records Where ‘a’ Is 1
> db.foo.find().limit( 1 )
Get 1 Record
> db.foo.find().sort( {a: -1} )
Get All Records Sorted By Property ‘a’ In Reverse
>
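To show what each of the shell calls above returns, here are toy in-memory versions that support equality filters, inclusion projections and a single-key sort. The helper names are hypothetical; the real server supports far richer operators and can use indexes.

```javascript
// find(docs, filter, projection): equality-only filter plus an inclusion
// projection that, like MongoDB, keeps _id unless it is explicitly set to 0.
function find(docs, filter = {}, projection = null) {
  const matches = docs.filter((doc) =>
    Object.entries(filter).every(([k, v]) => doc[k] === v)
  );
  if (!projection) return matches;
  return matches.map((doc) => {
    const out = {};
    if (projection._id !== 0 && "_id" in doc) out._id = doc._id;
    for (const [k, on] of Object.entries(projection)) {
      if (on && k in doc) out[k] = doc[k];
    }
    return out;
  });
}

// sortBy(docs, {field: 1 | -1}): single-key sort; -1 means descending.
function sortBy(docs, spec) {
  const [[key, dir]] = Object.entries(spec);
  return [...docs].sort((x, y) => (x[key] < y[key] ? -dir : x[key] > y[key] ? dir : 0));
}

const foo = [{ _id: 1, a: 1, b: "x" }, { _id: 2, a: 2 }, { _id: 3, a: 1 }];
console.log(find(foo, { a: 1 }, { _id: 1 }));     // [ { _id: 1 }, { _id: 3 } ]
console.log(find(foo).slice(0, 1));               // like .limit( 1 )
console.log(sortBy(find(foo), { a: -1 })[0]._id); // 2
```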
Some Common Questions
So I should choose MongoDB over MySQL?

• Bad Question!
• 90% of the time you’ll probably implement
  a hybrid system.
When should I use MongoDB?
• When you would otherwise need an ORM
 • It’s in the name: Object-Relational Mapper; if
    your data is really objects, store objects
• When you use a metric ton of 1:n and n:m
  tables to generate 1 object
  • And you rarely if ever use them for
    reporting
MongoDB performance is better?
• Too simple of a question
• Performance is comparable; MySQL usually
  wins on raw per-query speed
 • In a sterile lab test
• MongoDB usually wins overall because it needs
  fewer queries and no object reassembly
 • In the real world
Can MongoDB enforce a schema?
• You can add indexes (including unique indexes)
  on arbitrary key patterns
• Otherwise, why?
 • The application is responsible for correctness
    and error handling; no need to duplicate that work
Can I trust eventual consistency?

• No, but you shouldn’t trust ACID either
• Build your application to be flexible and to
  handle consistency issues
  • Stale data is a fact of life
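One common way to "handle consistency issues" in the application is optimistic concurrency: every document carries a version number, and an update only succeeds if the version the client read is still current. This is a hypothetical sketch with a `Map` standing in for the collection; with a real driver the same idea is usually an update whose query also matches the expected version.

```javascript
// updateIfCurrent: apply changes only if nobody else updated since our read.
function updateIfCurrent(store, id, expectedVersion, changes) {
  const doc = store.get(id);
  if (!doc || doc.version !== expectedVersion) {
    return false; // stale read: re-read and retry (or report a conflict)
  }
  store.set(id, { ...doc, ...changes, version: doc.version + 1 });
  return true;
}

const students = new Map([["735383393", { classification: "G2", version: 1 }]]);
const seenByA = students.get("735383393").version; // both clients read version 1
const seenByB = students.get("735383393").version;
console.log(updateIfCurrent(students, "735383393", seenByA, { classification: "G3" })); // true
console.log(updateIfCurrent(students, "735383393", seenByB, { classification: "G4" })); // false
```

The second client is not silently overwritten; it learns its data was stale and can decide what to do.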
Can MongoDB Handle Large Deployments?
• 1.2 TB over 5 billion records
• 600+ million documents
• Migrating ENTIRE app from Postgres
  http://www.mongodb.org/display/DOCS/Production+Deployments
Can MongoDB Handle Large Deployments?
• huMONGOusDB
• 32-bit builds only support ~2.5GB of data
 • Data files are memory-mapped, so the 32-bit
    address space is the ceiling
• Individual documents limited to 4MB
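Because of the per-document limit, an application that builds large documents (like the chart data later on) may want a rough guard before inserting. This sketch assumes the limit is 4 MiB (4 × 1024 × 1024 bytes) and uses JSON byte length as a proxy for BSON size; the real BSON encoding differs, which is why it keeps some headroom instead of testing the exact limit. The function names are hypothetical.

```javascript
// Rough pre-insert size check; JSON length only approximates BSON size.
const MAX_DOC_BYTES = 4 * 1024 * 1024; // 4194304, assumed limit

function approxSize(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

function fitsInOneDocument(doc, headroom = 0.9) {
  return approxSize(doc) <= MAX_DOC_BYTES * headroom;
}

const small = { timestamp: 1228994100, data: ["0", "92", "0"] };
const huge = { blob: "x".repeat(5 * 1024 * 1024) };
console.log(fitsInOneDocument(small), fitsInOneDocument(huge)); // true false
```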
Why waste time with theory?
2 Case Studies
The ‘Have Done’




  http://orthochronos.com
Very Simple

• Daemon does insertion in the background
• Front end just does simple grabs
 • Grab 1, Grab Many, etc.
Data Model
• System has Stocks
• Each Stock has Daily (once per day) and
  IntraDaily (every 15 minutes) data
  • Limited to trading hours
• Each set of data (daily or intradaily) has 4
  Graphs
• Each graph has upwards of 6 Lines
• Each line has between 300 and 800 Points
Data Model
• With each data point representing 1
  minute, each 15-minute IntraDay graph will
  have about 785 overlapping points with the
  preceding graph
• Why not consolidate into a single running
  table, and just SELECT ... LIMIT 800
  points from X timestamp?
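The 785 figure follows from simple arithmetic, assuming each IntraDay graph is an 800-point window of 1-minute points and consecutive graphs start 15 minutes apart: the window slides forward by 15 points, so 800 − 15 = 785 points are shared.

```javascript
// Points shared by two consecutive sliding windows of the same length.
function overlappingPoints(windowPoints, stepMinutes, minutesPerPoint = 1) {
  const shift = stepMinutes / minutesPerPoint; // points that slide out
  return Math.max(0, windowPoints - shift);
}

console.log(overlappingPoints(800, 15)); // 785
```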
Data Model

• The Algorithm will cause past points to
  change
• But each graph should be preserved so one
  can see the historical evolution of a given
  curve
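The reason a single running table falls short can be sketched in a few lines: instead of overwriting points in place, each interval saves its own complete copy of the curve, so a later revision never destroys the earlier version. This is a hypothetical illustration; points are keyed by minute, and an array stands in for an intraday collection (the two timestamps are the ones from the sample documents, 15 minutes apart).

```javascript
// Each snapshot is an independent copy keyed by its interval timestamp.
const snapshots = [];

function saveSnapshot(timestamp, points) {
  snapshots.push({ timestamp, points: { ...points } }); // copy, don't share
}

// Every value a given minute has had, oldest snapshot first.
function historyOf(minute) {
  return snapshots
    .filter((s) => minute in s.points)
    .map((s) => s.points[minute]);
}

const curve = { "10:45": 11, "10:46": 12 };
saveSnapshot(1228994100, curve);
curve["10:45"] = 15; // the algorithm revises a past point
curve["11:00"] = 13; // and the window moves on
saveSnapshot(1228995000, curve);
console.log(historyOf("10:45")); // [ 11, 15 ]
```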
Data Model


• Now imagine implementing these
  requirements in a traditional RDBMS
Data Model


• Instead, let's look at my MongoDB
  implementation
Database Layout
~/MongoDB/bin $ ./mongo
MongoDB shell version: 1.6.0
connecting to: test
> use DBNAME
switched to db DBNAME
> show collections
aapl
aapl.daily
aapl.intraday
acas
acas.daily
acas.intraday
...
wfr
wfr.daily
wfr.intraday
x
x.daily
x.intraday
Stock ‘Metadata’


{ "_id" : ObjectId("4c5a15038ead0eec04000000"), "timestamp" : 1228995000,
"data" : [ "0", "92", "0" ] }
Interval Data
{ "_id" : ObjectId("4bb901cc8ead0e041a0d0000"), "timestamp" : 1228994100, "number" : 3, "charts" : [
    {
        "number" : "1",
        "data" : [
            -99999,
            -99999,
            -99999,
            -99999,
            -99999
        ],
        "domainLength" : 300,
        "domainDates" : [
            "Tue Nov 25 10:45:00 GMT-0500 2008",
                  ...
        ],
        "lines" : [
            {
                "first" : 76,
                "last" : 300,
                "points" : [
                      {
                          "index" : 1,
                          "value" : 0
                      },
                      { ... }
                ]
            }, { ... }, { ... }, { ... }, { ... }, { ... }
        ]
    }, { ... }, { ... }, { ... }
] }
Connect
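/**
 * Connect lazily and reuse the same handle for every later call.
 *
 * @return MongoDB
 */
protected static function _getDb()
{
    static $db; // persists between calls, so we only connect once

    if( !isset($db) )
    {
        $mongo = new Mongo(); // defaults to localhost:27017
        $db = $mongo->selectDB('...db name...');
    }

    return $db;
}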
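The `static $db` trick is lazy initialisation: connect on first use, then hand back the cached handle. The same idea in JavaScript, with a hypothetical `connect()` factory standing in for `new Mongo()` plus `selectDB()`:

```javascript
// makeGetDb wraps a connect factory so it runs at most once.
function makeGetDb(connect) {
  let db; // survives between calls, like PHP's `static $db`
  return function getDb() {
    if (db === undefined) {
      db = connect();
    }
    return db;
  };
}

let connections = 0;
const getDb = makeGetDb(() => ({ id: ++connections })); // fake handle
console.log(getDb() === getDb(), connections); // true 1
```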
SELECT TOP 1...
//...

$db = self::_getDb();

$collection = $db->selectCollection(strtolower($symbol));

$dti = $collection->find()
                  ->sort(array(
                      'timestamp' => -1
                  ))
                  ->limit(1)
                  ->getNext();

//...
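`sort({timestamp: -1})` followed by `limit(1)` means "the newest document". The in-memory equivalent is a single pass that keeps the largest timestamp; on the server, an index on `timestamp` would typically be what keeps the real query cheap. The sample documents are made up.

```javascript
// newest(docs): the document with the largest timestamp, or null if empty.
function newest(docs) {
  let best = null;
  for (const doc of docs) {
    if (best === null || doc.timestamp > best.timestamp) best = doc;
  }
  return best;
}

const intraday = [
  { _id: "a", timestamp: 1228994100 },
  { _id: "b", timestamp: 1228995000 },
];
console.log(newest(intraday).timestamp); // 1228995000
```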
Get Specific Timestamp
//...

$tstamp = strtotime($lastTimestamp);

$cur = $collection->find(array(
                      'timestamp' => $tstamp
                  ))
                  ->limit(1);

//...
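Since the query compares the stored `timestamp` against `strtotime()` output, the stored values are evidently Unix epoch seconds. JavaScript's `Date.parse` returns milliseconds, so a JavaScript lookup key has to be divided by 1000. This sketch assumes inputs carry an explicit UTC offset; strings without one are parsed in local time and would shift.

```javascript
// Convert a date string to Unix seconds, the unit strtotime() returns.
function toUnixSeconds(dateString) {
  const ms = Date.parse(dateString);
  if (Number.isNaN(ms)) throw new Error(`unparseable date: ${dateString}`);
  return Math.floor(ms / 1000);
}

console.log(toUnixSeconds("2008-12-11T11:15:00Z")); // 1228994100
```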
Only Get Timestamps
//...

$dailyCollection = $db->selectCollection(strtolower($symbol).'.daily');

// Return only the 'timestamp' field (plus _id, which is included by default)
$dailyCur = $dailyCollection->find(array(), array('timestamp' => true))
                            ->sort(array(
                                'timestamp' => 1
                            ));

foreach( $dailyCur as $doc ) {
    $timestamp = $doc['timestamp'];
    //...
}

//...
Utilizing Collections
//...

$db = self::_getDb();

$stocks = array();
foreach( $db->listCollections() as $collection )
{
    $collection_name = strtolower($collection->getName());

    // Bare names like "aapl" are stocks; "aapl.daily", "aapl.intraday"
    // and any "system.*" names contain a '.' and are skipped
    if( preg_match('#^[a-z0-9]+$#', $collection_name) )
    {
        $stocks[] = strtoupper($collection_name);
    }
}

sort($stocks);

//...
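The collection-name filter is the whole trick behind this layout: the regular expression accepts only letters and digits, so names containing a dot are rejected. A JavaScript port of the same filter:

```javascript
// Keep bare stock symbols, drop "<symbol>.daily", "<symbol>.intraday", "system.*".
function stockSymbols(collectionNames) {
  return collectionNames
    .filter((name) => /^[a-z0-9]+$/i.test(name))
    .map((name) => name.toUpperCase())
    .sort();
}

console.log(stockSymbols(["x.daily", "aapl", "system.indexes", "wfr", "aapl.intraday", "x"]));
// [ 'AAPL', 'WFR', 'X' ]
```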
The ‘Wish I Did’
Not Too Terrible
• Keep track of Student Cases
 • A case keeps track of demographics,
    diagnoses, disabilities, notes, schedule, etc.
• Also tracks Exams
 • Schedule multiple exams per course
• Finally, students can log into a portal, and
  counselors can log in
  • Basic user management
Mostly Static


• Most information display only
• Even with reporting
So I used good old-fashioned RDBMS design...
lolwut?
Instead...

• A collection for Student Cases
• A collection for Courses
• etc...
• Denormalize!
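Denormalizing means folding rows that would live in separate tables into one document per entity, so a single read returns the whole case. This is a hypothetical sketch: the input rows and the helper are made up, while the output field names follow the Student Document example in this deck.

```javascript
// Fold a student's disability rows and case-note rows into one document.
function buildStudentDoc(student, disabilityRows, noteRows) {
  return {
    ...student,
    disabilities: disabilityRows
      .filter((r) => r.uin === student.uin)
      .map((r) => r.disability),
    casenotes: noteRows
      .filter((r) => r.uin === student.uin)
      .map(({ counselor, note }) => ({ counselor, note })),
  };
}

const doc = buildStudentDoc(
  { uin: "485596916", firstname: "Zach" },
  [{ uin: "485596916", disability: "Blind" }, { uin: "485596916", disability: "Asthma" }],
  [{ uin: "485596916", counselor: "Zander King", note: "lorem ipsum" }]
);
console.log(doc.disabilities); // [ 'Blind', 'Asthma' ]
```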
Boyce-Codd Who?
A Student Document
{
    "_id":ObjectId("4c5f572e8ead0ed00d0f0000"),
    "uin":"485596916",
    "firstname":"Zach",
    "middleinitial":"I",
    "lastname":"Hill",
    "major":"ACCT",
    "classification":"G4",
    "registrationdate":"2008-03-09",
    "renewaldates":[
       ...
       {
           "semester":"201021",
           "date":"2010-05-15"
       }
    ],
    "localaddress":{
       "street":"9116 View",
       "city":"College Station",
       "state":"TX",
       "zip":"77840"
    },
    "permanentaddress":{
       "street":"3960 Lake",
       "city":"Los Angeles",
       "state":"CA",
       "zip":"90001"
    },
    "disabilities":[
       "Blind",
       "Asthma"
    ],
    "casenotes":[
       ...
       {
           "counselor":"Zander King",
           "note":"lorem ipsum bibendum enim ..."
       }
    ],
    "schedules":[
       {
           "semester":"201021",
           "courses":[
              ...
              {
                  "$ref":"courses",
                  "$id":ObjectId(...)
              }
           ]
       }
    ]
}
A Course Document

  {
      "_id" : ObjectId("4c5f572e8ead0ed00d000000"),
      "subject" : "MECH",
      "course" : 175,
      "section" : 506,
      "faculty" : "Dr. Quagmire Hill"
  }
I lied...
MongoDB DBRef

• Similar to FK
• Requires driver support
• Not query-able (you can match the reference itself,
  but not fields of the referenced document; no joins)
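A DBRef is just a small embedded document of the form `{ $ref: <collection>, $id: <id> }`. Following it is a second lookup done by the driver or the application, not a server-side join. The resolver below is a hypothetical in-memory sketch; strings stand in for ObjectIds.

```javascript
// deref(database, ref): look up the referenced document, or null.
function deref(database, ref) {
  const collection = database[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const database = {
  courses: [{ _id: "4c5f572e8ead0ed00d000000", subject: "MECH", course: 175, section: 506 }],
};
const ref = { $ref: "courses", $id: "4c5f572e8ead0ed00d000000" };
console.log(deref(database, ref).subject); // MECH
```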
But can we do everything we need?
Paginate?
>   db.students.find( {}, {_id:1} ).skip( 20 ).limit( 10 )
{   "_id" : ObjectId("4c5f572e8ead0ed00d230000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d240000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d250000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d260000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d270000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d280000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d290000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d2a0000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d2b0000") }
{   "_id" : ObjectId("4c5f572e8ead0ed00d2c0000") }
>
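`skip( 20 ).limit( 10 )` is page 3 at 10 results per page. A small hypothetical helper for turning a 1-based page number into those arguments:

```javascript
// pageArgs(page, perPage) -> { skip, limit } for a 1-based page number.
function pageArgs(page, perPage) {
  if (!Number.isInteger(page) || page < 1) throw new RangeError("page must be >= 1");
  return { skip: (page - 1) * perPage, limit: perPage };
}

console.log(pageArgs(3, 10)); // { skip: 20, limit: 10 }
```

Without a `sort()`, result order is not guaranteed, so adding something like `.sort( {_id: 1} )` is a reasonable way to keep pages stable between requests. Large skips also get slower, since the server still walks past the skipped documents.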
Only Renewed Once?

> db.students.find( {renewaldates: {$size: 1}}, {_id:1, renewaldates:1} )
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d150000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d190000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d440000"),   "renewaldates"   :   [   {   "semester"   :   "201021",   "date"   :   "2010-05-15"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d460000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d4e0000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d660000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d6f0000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d720000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d750000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d800000"),   "renewaldates"   :   [   {   "semester"   :   "201021",   "date"   :   "2010-05-15"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d880000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d8d0000"),   "renewaldates"   :   [   {   "semester"   :   "201021",   "date"   :   "2010-05-15"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d8f0000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
{   "_id"   :   ObjectId("4c5f572e8ead0ed00d940000"),   "renewaldates"   :   [   {   "semester"   :   "201031",   "date"   :   "2010-08-16"   }   ]   }
>
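`{renewaldates: {$size: 1}}` matches documents whose field is an array of exactly that length; missing fields and non-arrays never match. `$size` takes a single number, not a range. A toy matcher with the same semantics (hypothetical helper):

```javascript
// matchesSize(doc, field, n): true only for arrays of exactly n elements.
function matchesSize(doc, field, n) {
  return Array.isArray(doc[field]) && doc[field].length === n;
}

const once = { renewaldates: [{ semester: "201031", date: "2010-08-16" }] };
console.log(matchesSize(once, "renewaldates", 1)); // true
```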
How Many Are Blind Or Asthmatic?

> db.students.find( {disabilities: {$in: ['Blind', 'Asthma']}} ).count()
76
>
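The operator choice matters here: `$in` matches if the array contains any of the listed values (Blind OR Asthma), while `$all` requires every listed value (Blind AND Asthma). Plain-JavaScript versions of the two tests:

```javascript
// $in-style: at least one value present; $all-style: every value present.
const matchesIn = (arr, values) => values.some((v) => arr.includes(v));
const matchesAll = (arr, values) => values.every((v) => arr.includes(v));

const wanted = ["Blind", "Asthma"];
console.log(matchesIn(["Blind"], wanted), matchesAll(["Blind"], wanted)); // true false
```

So counting students who are both blind and asthmatic would use `{disabilities: {$all: ['Blind', 'Asthma']}}` instead.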
Who Lives On Elm?

>   db.students.find( {'localaddress.street': /.*Elm/}, {_id:1, 'localaddress.street':1} )
{   "_id" : ObjectId("4c5f572e8ead0ed00d220000"), "localaddress" : { "street" : "2807 Elm"   }   }
{   "_id" : ObjectId("4c5f572e8ead0ed00d290000"), "localaddress" : { "street" : "5762 Elm"   }   }
{   "_id" : ObjectId("4c5f572e8ead0ed00d400000"), "localaddress" : { "street" : "6261 Elm"   }   }
{   "_id" : ObjectId("4c5f572e8ead0ed00d610000"), "localaddress" : { "street" : "7099 Elm"   }   }
{   "_id" : ObjectId("4c5f572e8ead0ed00d930000"), "localaddress" : { "street" : "4994 Elm"   }   }
{   "_id" : ObjectId("4c5f572e8ead0ed00d960000"), "localaddress" : { "street" : "3456 Elm"   }   }
>
Number Registered On Or Before 2010/06/20?
> db.students
    .find( {$where: "new Date(this.registrationdate) <= new Date('2010/06/20')"} )
    .count()
136
>
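`$where` runs JavaScript against every document. Because `registrationdate` is stored as zero-padded `YYYY-MM-DD` text, plain string comparison orders the same way the dates do, so a range query such as `{registrationdate: {$lte: '2010-06-20'}}` could likely replace it. The check below shows the ordering property; it assumes every stored date uses that exact format.

```javascript
// For zero-padded YYYY-MM-DD strings, lexicographic order == chronological order.
function onOrBefore(isoDate, cutoff) {
  return isoDate <= cutoff;
}

console.log(onOrBefore("2008-03-09", "2010-06-20")); // true
```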
Update Classification?

> db.students.find( {uin:'735383393'}, {_id: 1, uin: 1, classification: 1} )
{ "_id" : ObjectId("4c60dae48ead0e143e0f0000"), "uin" : "735383393", "classification" : "G2" }

> db.students.update( {uin:'735383393'}, {$set: {classification: 'G3'}} )

> db.students.find( {uin:'735383393'}, {_id: 1, uin: 1, classification: 1} )
{ "_id" : ObjectId("4c60dae48ead0e143e0f0000"), "uin" : "735383393", "classification" : "G3" }
>
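`$set` changes only the named fields and leaves the rest of the document alone; an update document without operators would replace the whole document (keeping `_id`). Toy versions of both behaviours, top-level fields only, with hypothetical helper names:

```javascript
// applySet merges fields in; applyReplace keeps only _id plus the replacement.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

function applyReplace(doc, replacement) {
  return { _id: doc._id, ...replacement };
}

const before = { _id: 1, uin: "735383393", classification: "G2", major: "ACCT" };
console.log(applySet(before, { classification: "G3" }).major);     // ACCT
console.log(applyReplace(before, { classification: "G3" }).major); // undefined
```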
Home Towns?
> db.students.distinct('permanentaddress.city')
[
    "Austin",
    "Chicago",
    "Dallas",
    "Denver",
    "Houston",
    "Los Angeles",
    "Lubbock",
    "New York"
]
>
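`distinct` follows a dotted path into embedded documents and returns each value once. A simplified sketch (hypothetical helpers; the real command also unwinds array values, which this skips):

```javascript
// getPath walks "a.b.c" through nested objects; missing steps give undefined.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

function distinct(docs, path) {
  const seen = new Set();
  for (const doc of docs) {
    const v = getPath(doc, path);
    if (v !== undefined) seen.add(v);
  }
  return [...seen];
}

const people = [
  { permanentaddress: { city: "Austin" } },
  { permanentaddress: { city: "Dallas" } },
  { permanentaddress: { city: "Austin" } },
  { localaddress: { city: "College Station" } }, // no permanentaddress: skipped
];
console.log(distinct(people, "permanentaddress.city")); // [ 'Austin', 'Dallas' ]
```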
Number of students by major?
  > db.students
      .group({
          key: {major:true},
          cond: {major: {$exists: true}},
          reduce: function(obj, prev) { prev.count += 1; },
          initial: { count: 0 }
      })
  [
      {"major" : "CPSC", "count" : 12},
      {"major" : "MECH", "count" : 16},
      {"major" : "ACCT", "count" : 18},
      {"major" : "MGMT", "count" : 18},
      {"major" : "FINC", "count" : 16},
      {"major" : "ENDS", "count" : 15},
      {"major" : "ARCH", "count" : 18},
      {"major" : "ENGL", "count" : 15},
      {"major" : "POLS", "count" : 22}
  ]
  >
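The `group()` call above, restated as a plain JavaScript loop so each argument's role is visible: `key` picks the bucket, `cond` skips documents without a major, `initial` seeds each bucket with `{ count: 0 }`, and `reduce` adds 1 per document. The sample data is made up.

```javascript
// countByMajor mirrors key / cond / initial / reduce from the shell call.
function countByMajor(students) {
  const buckets = new Map();
  for (const s of students) {
    if (s.major === undefined) continue;                              // cond
    const prev = buckets.get(s.major) || { major: s.major, count: 0 }; // key + initial
    prev.count += 1;                                                  // reduce
    buckets.set(s.major, prev);
  }
  return [...buckets.values()];
}

console.log(countByMajor([{ major: "ACCT" }, { major: "MECH" }, { major: "ACCT" }, {}]));
// [ { major: 'ACCT', count: 2 }, { major: 'MECH', count: 1 } ]
```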
How Many Students In A Course?
> db.students
    .find({'schedules.courses': {$in: [
         new DBRef('courses', new ObjectId('4c60dae48ead0e143e000000'))
    ]}})
    .count()
25
No Time To Cover...

• Map-Reduce
 • MongoDB has it, and it is extremely
    powerful
• GridFS
 • Store files/blobs
• Sharding/Replica Pairs/Master-Slave
Q&A
Resources
•   BASE: An Acid Alternative
    http://queue.acm.org/detail.cfm?id=1394128

•   PHP Mongo Driver Reference
    http://php.net/mongo

•   MongoDB Advanced Query Reference
    http://www.mongodb.org/display/DOCS/Advanced+Queries

•   MongoDB Query Cheat Sheet
    http://www.10gen.com/reference

•   myNoSQL Blog
    http://nosql.mypopescu.com/
