NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
 

Let's learn the philosophy NOSQL takes (from a developer's standpoint), the changes you'll (not) have to make, discuss Mongo, and see some practical examples!

This is my first revision of this talk, and I will be making some organizational improvements later.

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

Presentation Transcript

  • NOSQL 101 Or: How I Learned To Stop Worrying And Love The Mongo
  • Who Am I? • Daniel Cousineau • Sr. Software Applications Developer at Texas A&M University • dcousineau@gmail.com • Twitter: @dcousineau • dcousineau.com
  • In the beginning... there was Accounting.
  • Row Based, Fixed Schema
  • The RDBMS was created to address this usage.
  • RDBMS Ideology • ACID • Atomicity • Consistency • Isolation • Durability • All or nothing, no corruption, no mistakes • Accounting errors are EXPENSIVE!
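The "all or nothing" part of ACID can be sketched in a few lines. This is only an in-memory illustration, not a database API: the `transfer` function and the account names are hypothetical.

```javascript
// Hypothetical in-memory ledger; not a real database API.
function transfer(accounts, from, to, amount) {
  // Work on a copy so a failure leaves the original untouched (all or nothing).
  const next = { ...accounts };
  next[from] -= amount;
  next[to] += amount;
  if (next[from] < 0) {
    throw new Error('insufficient funds');
  }
  return next;
}

const ledger = { checking: 100, savings: 50 };
const afterOk = transfer(ledger, 'checking', 'savings', 30);

let failed = false;
try {
  transfer(ledger, 'checking', 'savings', 500);
} catch (err) {
  failed = true; // the ledger was never partially updated
}
```

Either both balances change or neither does; a failed transfer never leaves the ledger half-updated.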
  • RDBMS Ideology • Pessimistic • Consistency at the end of EVERY step!
  • Moore’s Law happened.
  • Computers took on more complex tasks...
  • Problems became... Dynamic
  • NOSQL attempts to address the Dynamic.
  • What is NOSQL?
  • What Is NOSQL? • Any data storage engine that does not use a SQL interface and does not use relational algebra
  • NOSQL Ideology • BASE • Basically Available • Soft state • Eventually consistent • Zen-like • Content loss isn’t that big of a deal
  • NOSQL Ideology • Optimistic • State will be in flux, just accept it
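To make "eventually consistent" concrete, here is a toy sketch with two in-memory maps standing in for a primary and a replica. The `write` and `replicate` helpers are hypothetical, and replication is triggered by hand so the lag is visible.

```javascript
// Hypothetical two-node store; replication is applied manually to make the lag visible.
const primary = new Map();
const replica = new Map();
const pending = []; // writes not yet copied to the replica

function write(key, value) {
  primary.set(key, value);
  pending.push([key, value]);
}

function replicate() {
  while (pending.length > 0) {
    const [key, value] = pending.shift();
    replica.set(key, value);
  }
}

write('price', 92);
const staleRead = replica.get('price'); // the replica has not caught up yet
replicate();
const freshRead = replica.get('price'); // the replica now agrees with the primary
```

A reader hitting the replica between the write and the replication step sees old state. That window is the "flux" the application has to accept.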
  • NOSQL Ideology • Don’t Do More Work Than You Have To • Don’t Unnecessarily Duplicate Effort
  • NOSQL is diverse...
  • Types of NOSQL • Key-Value Stores • memcache/memcachedb • riak • tokyo cabinet/tyrant
  • Types of NOSQL • Column-oriented • dynamo • bigtable • cassandra
  • Types of NOSQL • Graph • neo4j
  • Types of NOSQL • Document-oriented • couchdb • MongoDB
  • Let's focus on MongoDB...
  • Why MongoDB? • Because it’s what I need • Because I understand it • Because I’ve used it • Because it’s easy • Because it has superior driver support • Because I said so
  • Support? • Operating Systems • OSX 32/64bit • Linux 32/64bit • Windows 32/64bit • Solaris i86pc • Solaris 64 • Official Drivers • C, C++, Java, JavaScript, Perl, PHP, Python, Ruby • Community Drivers • REST, C#, Clojure, ColdFusion, Delphi, Erlang, Factor, Fantom, F#, Go, Groovy, Haskell, Lua, Obj-C, PowerShell, Scala, Scheme, Smalltalk
  • What is MongoDB? • A document-based storage system • Databases contain collections, collections contain documents • Documents are arbitrary BSON (extension of JSON) objects • No schema is enforced
  • What is MongoDB? • Drivers expose MongoDB query API to languages in a form familiar and native • Drivers usually handle serialization • You always work in native system objects, BSON is really only used internally
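A rough way to picture "databases contain collections, collections contain documents" with no enforced schema is with plain JavaScript objects. Nothing here talks to MongoDB; the `database` object and its contents are made up for illustration.

```javascript
// Plain objects standing in for a database -> collection -> document hierarchy.
const database = {
  foo: [
    { _id: 1, a: 1 },
    { _id: 2, a: 2, tags: ['x', 'y'] },      // extra field, no schema change needed
    { _id: 3, name: 'no "a" field at all' }, // a missing field is also fine
  ],
};

// Collect the distinct sets of keys to show that documents differ in shape.
const shapes = new Set(
  database.foo.map((doc) => Object.keys(doc).sort().join(','))
);
```

Three documents in the same collection, three different shapes, and nothing had to be declared up front.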
  • Install MongoDB
      $ wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.6.0.tgz
      $ tar -xzvf ./mongodb-osx-x86_64-1.6.0.tgz
      $ ./mongodb-osx-x86_64-1.6.0/bin/mongod --dbpath=/path/to/save/db
  • Manage MongoDB
      $ ./mongodb-osx-x86_64-1.6.0/bin/mongo
      MongoDB shell version: 1.6.0
      connecting to: test
      > db.foo.save( {a:1} )
      > db.foo.find()
      { "_id" : ObjectId("4c60d0143cd09f6d17a18094"), "a" : 1 }
      >
  • Simple Queries
      $ ./mongodb-osx-x86_64-1.6.0/bin/mongo
      MongoDB shell version: 1.6.0
      connecting to: test
      > db.foo.find()                      // Get all records
      > db.foo.find( {a: 1} )              // Get all records where property 'a' is 1
      > db.foo.find( {a: 1}, {_id: 1} )    // Get the '_id' property of all records where 'a' is 1
      > db.foo.find().limit( 1 )           // Get 1 record
      > db.foo.find().sort( {a: -1} )      // Get all records sorted by property 'a' in reverse
      >
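If the query shapes above feel unfamiliar, it may help to see what each one means in terms of ordinary array operations. This sketch runs against an in-memory array, not a MongoDB server, and the sample documents are made up.

```javascript
// In-memory stand-in for the `foo` collection; the values are made up for illustration.
const foo = [
  { _id: 'id1', a: 1 },
  { _id: 'id2', a: 3 },
  { _id: 'id3', a: 1 },
  { _id: 'id4', a: 2 },
];

// find({a: 1}): keep documents whose `a` equals 1.
const whereA1 = foo.filter((doc) => doc.a === 1);

// find({a: 1}, {_id: 1}): same filter, then project only `_id`.
const idsOnly = whereA1.map((doc) => ({ _id: doc._id }));

// find().limit(1): take at most one document.
const firstOne = foo.slice(0, 1);

// find().sort({a: -1}): copy, then order by `a` descending.
const byADesc = [...foo].sort((x, y) => y.a - x.a);
```

The real server does this work for you, using indexes where it can, but the meaning of filter, projection, limit and sort is the same.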
  • Some Common Questions
  • So I should choose MongoDB over MySQL? • Bad Question! • 90% of the time you’ll probably implement a hybrid system.
  • When should I use MongoDB? • When an ORM is necessary • It’s in the name, Object-Relational Mapper • When you use a metric ton of 1:n and n:m tables to generate 1 object • And you rarely if ever use them for reporting
  • MongoDB performance is better? • Too simple of a question • Sterilized lab test: performance is comparable, MySQL usually wins on sheer query speed • The real world: MongoDB usually wins due to fewer queries required and no object reassembly
  • Can MongoDB enforce a schema? • You can add indexes on arbitrary key patterns • Otherwise, why? • Application is responsible for correctness and error handling, no need to duplicate
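As a sketch of what "the application is responsible for correctness" could look like, here is a small validation function that would run before a save. The `validateStudent` name and its rules are hypothetical; the field names are borrowed from the student example later in the talk.

```javascript
// Hypothetical validator; the rules below are assumptions chosen for illustration.
function validateStudent(doc) {
  const errors = [];
  if (typeof doc.uin !== 'string' || !/^\d{9}$/.test(doc.uin)) {
    errors.push('uin must be a 9-digit string');
  }
  if (typeof doc.lastname !== 'string' || doc.lastname.length === 0) {
    errors.push('lastname is required');
  }
  if (doc.disabilities !== undefined && !Array.isArray(doc.disabilities)) {
    errors.push('disabilities must be an array when present');
  }
  return errors;
}

const good = validateStudent({ uin: '485596916', lastname: 'Hill', disabilities: ['Blind'] });
const bad = validateStudent({ uin: '42', disabilities: 'Blind' });
```

An empty error list means the document is safe to save; anything else is handled by the application, which would have needed those checks anyway.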
  • Can I trust eventual consistency? • No, but you shouldn’t trust ACID either • Build your application to be flexible and to handle consistency issues • Stale data is a fact of life
  • Can MongoDB Handle Large Deployments? 1.2 TB over 5 billion records 600+ million documents Migrating ENTIRE app from Postgres http://www.mongodb.org/display/DOCS/Production+Deployments
  • Can MongoDB Handle Large Deployments? • huMONGOusDB • 32-bit only supports ~2.5GB • Memory-mapped files • Individual documents limited to 4MB
  • Why waste time with theory?
  • 2 Case Studies
  • The ‘Have Done’ http://orthochronos.com
  • Very Simple • Daemon does insertion in the background • Front end just does simple grabs • Grab 1, Grab Many, etc.
  • Data Model • System has Stocks • Each Stock has Daily (once per day) and IntraDaily (every 15 minutes) data • Limited to trading hours • Each set of data (daily or intradaily) has 4 Graphs • Each graph has upwards of 6 Lines • Each line has between 300 to 800 Points
  • Data Model • With each data point representing 1 minute, each 15-minute IntraDay graph will have about 785 overlapping points with the preceding graph • Why not consolidate into a single running table, and just SELECT ... LIMIT 800 points from X timestamp?
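The overlap figure is simple window arithmetic. Assuming a graph holds 800 one-minute points (the upper bound mentioned earlier) and a new graph is produced every 15 minutes, the sketch below counts the shared points directly.

```javascript
// Assumed numbers: 800 one-minute points per graph, a new graph every 15 minutes.
const pointsPerGraph = 800;
const minutesBetweenGraphs = 15;

// Represent each graph as the list of minute indexes it covers.
const graphAt = (startMinute) =>
  Array.from({ length: pointsPerGraph }, (_, i) => startMinute + i);

const previous = graphAt(0);
const current = new Set(graphAt(minutesBetweenGraphs));

// Points from the previous graph that also appear in the current one.
const overlap = previous.filter((minute) => current.has(minute)).length;
```

With these assumptions the overlap is 800 - 15 = 785 points, which is why a single running table looks so tempting at first.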
  • Data Model • The Algorithm will cause past points to change • But each graph should be preserved so one can see the historical evolution of a given curve
  • Data Model • Now imagine implementing these requirements in a traditional RDBMS
  • Data Model • Instead, let's see my MongoDB implementation
  • Database Layout
      ~/MongoDB/bin $ ./mongo
      MongoDB shell version: 1.6.0
      connecting to: test
      > use DBNAME
      switched to db DBNAME
      > show collections
      aapl
      aapl.daily
      aapl.intraday
      acas
      acas.daily
      acas.intraday
      ...
      wfr
      wfr.daily
      wfr.intraday
      x
      x.daily
      x.intraday
  • Stock ‘Metadata’
      {
        "_id" : ObjectId("4c5a15038ead0eec04000000"),
        "timestamp" : 1228995000,
        "data" : [ "0", "92", "0" ]
      }
  • Interval Data
      {
        "_id" : ObjectId("4bb901cc8ead0e041a0d0000"),
        "timestamp" : 1228994100,
        "number" : 3,
        "charts" : [
          {
            "number" : "1",
            "data" : [ -99999, -99999, -99999, -99999, -99999 ],
            "domainLength" : 300,
            "domainDates" : [ "Tue Nov 25 10:45:00 GMT-0500 2008", ... ],
            "lines" : [
              {
                "first" : 76,
                "last" : 300,
                "points" : [ { "index" : 1, "value" : 0 }, { ... } ]
              },
              { ... }, { ... }, { ... }, { ... }, { ... }
            ]
          },
          { ... }, { ... }, { ... }
        ]
      }
  • Connect
      /**
       * @return MongoDB
       */
      protected static function _getDb()
      {
          static $db;
          if( !isset($db) ) {
              $mongo = new Mongo();
              $db = $mongo->selectDb('...db name...');
          }
          return $db;
      }
  • SELECT TOP 1...
      //...
      $db = self::_getDb();
      $collection = $db->selectCollection(strtolower($symbol));
      $dti = $collection->find()
                        ->sort(array( 'timestamp' => -1 ))
                        ->limit(1)
                        ->getNext();
      //...
  • Get Specific Timestamp
      //...
      $tstamp = strtotime($lastTimestamp);
      $cur = $collection->find(array( 'timestamp' => $tstamp ))
                        ->limit(1);
      //...
  • Only Get Timestamps
      //...
      $dailyCollection = $db->selectCollection(strtolower($symbol).'.daily');
      $dailyCur = $dailyCollection->find(array(), array('timestamp'))
                                  ->sort(array( 'timestamp' => 1 ));
      foreach( $dailyCur as $timestamp ) {
          //...
      }
      //...
  • Utilizing Collections
      //...
      $db = self::_getDb();
      $stocks = array();
      foreach( $db->listCollections() as $collection ) {
          $collection_name = strtolower($collection->getName());
          if( preg_match('#^[a-z0-9]+$#i', $collection_name) ) {
              $collection_name = strtoupper($collection_name);
              $stocks[] = $collection_name;
          }
      }
      sort($stocks);
      //...
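The trick in the slide above is that collection names double as the stock list: names without a dot are the per-stock collections, while `.daily` and `.intraday` are the data collections. Here is the same filtering logic as a sketch over a hard-coded, shortened list of names (no database involved).

```javascript
// Collection names as they appear in `show collections`; the list is shortened.
const collectionNames = [
  'x.daily', 'aapl', 'aapl.daily', 'aapl.intraday', 'x', 'wfr', 'wfr.intraday',
];

// Keep names made only of letters and digits (no dot), then upper-case and sort.
const stocks = collectionNames
  .map((name) => name.toLowerCase())
  .filter((name) => /^[a-z0-9]+$/.test(name))
  .map((name) => name.toUpperCase())
  .sort();
```

No separate "stocks" table is needed; the layout of the database is the index.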
  • The ‘Wish I Did’
  • Not Too Terrible • Keep track of Student Cases • A case keeps track of demographics, diagnoses, disabilities, notes, schedule, etc. • Also tracks Exams • Schedule multiple exams per course • Finally, students can log into a portal, counselors can log in • Basic user management
  • Mostly Static • Most information display only • Even with reporting
  • So I used good old fashioned RDBMS design...
  • lolwut?
  • Instead... • A collection for Student Cases • A collection for Courses • etc... • Denormalize!
  • Boyce-Codd Who?
  • A Student Document
      {
        "_id":ObjectId("4c5f572e8ead0ed00d0f0000"),
        "uin":"485596916",
        "firstname":"Zach",
        "middleinitial":"I",
        "lastname":"Hill",
        "major":"ACCT",
        "classification":"G4",
        "registrationdate":"2008-03-09",
        "renewaldates":[
          ...
          {
            "semester":"201021",
            "date":"2010-05-15"
          }
        ],
        "localaddress":{
          "street":"9116 View",
          "city":"College Station",
          "state":"TX",
          "zip":"77840"
        },
        "permanentaddress":{
          "street":"3960 Lake",
          "city":"Los Angeles",
          "state":"CA",
          "zip":"90001"
        },
        "disabilities":[
          "Blind",
          "Asthma"
        ],
        "casenotes":[
          ...
          {
            "counselor":"Zander King",
            "note":"lorem ipsum bibendum enim ..."
          }
        ],
        "schedules":[
          {
            "semester":"201021",
            "courses":[
              ...
              {
                "$ref":"courses",
                "$id":ObjectId(...)
              }
            ]
          }
        ]
      }
  • A Course Document
      {
        "_id" : ObjectId("4c5f572e8ead0ed00d000000"),
        "subject" : "MECH",
        "course" : 175,
        "section" : 506,
        "faculty" : "Dr. Quagmire Hill"
      }
  • I lied...
  • MongoDB DBRef • Similar to FK • Requires driver support • Not query-able
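A DBRef is just a small sub-document naming a collection and an `_id`; following it takes a second lookup. The sketch below shows that idea with in-memory arrays. The `resolveRef` helper, the shortened ids, and the second course are made up for illustration, and this is not any driver's actual API.

```javascript
// In-memory stand-ins for two collections; ids are shortened, made-up strings.
const db = {
  courses: [
    { _id: 'c1', subject: 'MECH', course: 175, section: 506 },
    { _id: 'c2', subject: 'ACCT', course: 229, section: 501 },
  ],
};

// A reference stores only the target collection name and the target _id.
const student = {
  uin: '485596916',
  schedules: [{ semester: '201021', courses: [{ $ref: 'courses', $id: 'c1' }] }],
};

// Hypothetical helper: follow a reference with a second lookup.
function resolveRef(database, ref) {
  const collection = database[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const resolved = student.schedules[0].courses.map((ref) => resolveRef(db, ref));
```

Unlike a foreign key, nothing stops a reference from pointing at a document that no longer exists, so the helper returns `null` in that case.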
  • But can we do everything we need?
  • Paginate?
      > db.students.find( {}, {_id:1} ).skip( 20 ).limit( 10 )
      { "_id" : ObjectId("4c5f572e8ead0ed00d230000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d240000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d250000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d260000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d270000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d280000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d290000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d2a0000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d2b0000") }
      { "_id" : ObjectId("4c5f572e8ead0ed00d2c0000") }
      >
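In an application you usually start from a page number rather than a raw offset. A minimal sketch of that translation follows, assuming 1-based page numbers; the `pageToSkipLimit` helper is hypothetical.

```javascript
// Hypothetical helper: translate a 1-based page number into skip/limit values.
function pageToSkipLimit(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be a positive integer');
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Page 3 with 10 per page gives the skip( 20 ).limit( 10 ) pair used above.
const { skip, limit } = pageToSkipLimit(3, 10);

// The same window applied to an in-memory array of 35 fake ids.
const ids = Array.from({ length: 35 }, (_, i) => i + 1);
const pageItems = ids.slice(skip, skip + limit);
```

The resulting `skip` and `limit` values would be passed straight to the cursor methods shown in the slide.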
  • Only Renewed Once?
      > db.students.find( {renewaldates: {$size: 1}}, {_id:1, renewaldates:1} )
      { "_id" : ObjectId("4c5f572e8ead0ed00d150000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d190000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d440000"), "renewaldates" : [ { "semester" : "201021", "date" : "2010-05-15" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d460000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d4e0000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d660000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d6f0000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d720000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d750000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d800000"), "renewaldates" : [ { "semester" : "201021", "date" : "2010-05-15" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d880000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d8d0000"), "renewaldates" : [ { "semester" : "201021", "date" : "2010-05-15" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d8f0000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      { "_id" : ObjectId("4c5f572e8ead0ed00d940000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] }
      >
  • How Many Blind Or Asthmatic Students?
      > db.students.find( {disabilities: {$in: ['Blind', 'Asthma']}} ).count()
      76
      >
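One thing worth spelling out: `$in` matches a student whose `disabilities` array contains at least one of the listed values, whereas `$all` requires every listed value. The sketch below mimics both semantics on a made-up in-memory sample.

```javascript
// Made-up sample; only the `disabilities` field matters here.
const students = [
  { uin: '1', disabilities: ['Blind'] },
  { uin: '2', disabilities: ['Asthma'] },
  { uin: '3', disabilities: ['Blind', 'Asthma'] },
  { uin: '4', disabilities: [] },
];
const wanted = ['Blind', 'Asthma'];

// $in semantics: the array contains at least one of the wanted values.
const anyOf = students.filter((s) => wanted.some((w) => s.disabilities.includes(w)));

// $all semantics: the array contains every wanted value.
const allOf = students.filter((s) => wanted.every((w) => s.disabilities.includes(w)));
```

So if the question is strictly "students who are both blind and asthmatic", `$all` is the operator to reach for; `$in` answers "blind or asthmatic".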
  • Who Lives On Elm?
      > db.students.find( {'localaddress.street': /.*Elm/}, {_id:1, 'localaddress.street':1} )
      { "_id" : ObjectId("4c5f572e8ead0ed00d220000"), "localaddress" : { "street" : "2807 Elm" } }
      { "_id" : ObjectId("4c5f572e8ead0ed00d290000"), "localaddress" : { "street" : "5762 Elm" } }
      { "_id" : ObjectId("4c5f572e8ead0ed00d400000"), "localaddress" : { "street" : "6261 Elm" } }
      { "_id" : ObjectId("4c5f572e8ead0ed00d610000"), "localaddress" : { "street" : "7099 Elm" } }
      { "_id" : ObjectId("4c5f572e8ead0ed00d930000"), "localaddress" : { "street" : "4994 Elm" } }
      { "_id" : ObjectId("4c5f572e8ead0ed00d960000"), "localaddress" : { "street" : "3456 Elm" } }
      >
  • Number Registered On Or Before 2010/06/20?
      > db.students.find( {$where: "new Date(this.registrationdate) <= new Date('2010/06/20')"} ).count()
      136
      >
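A side note on the date comparison: when every `registrationdate` is stored as a zero-padded "YYYY-MM-DD" string (an assumption based on the sample document), the strings sort in the same order as the dates they represent. That suggests a plain range condition on the string could stand in for the `$where` expression. The sketch below checks the equivalence on made-up data.

```javascript
// Made-up sample using the same "YYYY-MM-DD" string format as `registrationdate`.
const students = [
  { uin: '1', registrationdate: '2008-03-09' },
  { uin: '2', registrationdate: '2010-06-20' },
  { uin: '3', registrationdate: '2010-08-16' },
];
const cutoff = '2010-06-20';

// Zero-padded ISO dates sort the same way as the dates they represent,
// so a plain string comparison is enough.
const byString = students.filter((s) => s.registrationdate <= cutoff);

// Equivalent check with Date objects, both parsed from the same ISO format.
const byDate = students.filter(
  (s) => new Date(s.registrationdate) <= new Date(cutoff)
);
```

Both filters select the same students here. The string form only holds up if the format is applied consistently, which is again the application's job.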
  • Update Classification?
      > db.students.find( {uin:'735383393'}, {_id: 1, uin: 1, classification: 1} )
      { "_id" : ObjectId("4c60dae48ead0e143e0f0000"), "uin" : "735383393", "classification" : "G2" }
      > db.students.update( {uin:'735383393'}, {$set: {classification: 'G3'}} )
      > db.students.find( {uin:'735383393'}, {_id: 1, uin: 1, classification: 1} )
      { "_id" : ObjectId("4c60dae48ead0e143e0f0000"), "uin" : "735383393", "classification" : "G3" }
      >
  • Home Towns?
      > db.students.distinct('permanentaddress.city')
      [
        "Austin",
        "Chicago",
        "Dallas",
        "Denver",
        "Houston",
        "Los Angeles",
        "Lubbock",
        "New York"
      ]
      >
  • Number of students by major?
      > db.students.group({
          key: {major:true},
          cond: {major: {$exists: true}},
          reduce: function(obj, prev) { prev.count += 1; },
          initial: { count: 0 }
        })
      [
        {"major" : "CPSC", "count" : 12},
        {"major" : "MECH", "count" : 16},
        {"major" : "ACCT", "count" : 18},
        {"major" : "MGMT", "count" : 18},
        {"major" : "FINC", "count" : 16},
        {"major" : "ENDS", "count" : 15},
        {"major" : "ARCH", "count" : 18},
        {"major" : "ENGL", "count" : 15},
        {"major" : "POLS", "count" : 22}
      ]
      >
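The `group` call above has three moving parts: a condition, a key, and a reduce function that bumps a counter. Here is roughly the same computation on a made-up in-memory sample, using ordinary array methods rather than the MongoDB API.

```javascript
// Made-up sample; one document has no `major`, mirroring the $exists condition.
const students = [
  { uin: '1', major: 'CPSC' },
  { uin: '2', major: 'ACCT' },
  { uin: '3', major: 'CPSC' },
  { uin: '4' },
];

const counts = students
  .filter((s) => s.major !== undefined) // plays the role of cond: {major: {$exists: true}}
  .reduce((acc, s) => {
    // Each key starts at 0 and is incremented once per matching document.
    acc[s.major] = (acc[s.major] || 0) + 1;
    return acc;
  }, {});
```

The result is one counter per distinct major, which is the same shape of answer the slide's output shows.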
  • How Many Students In A Course?
      > db.students.find({'schedules.courses': {$in: [ new DBRef('courses', new ObjectId('4c60dae48ead0e143e000000')) ]}}).count()
      25
  • No Time To Cover... • Map-Reduce • MongoDB has it, and it is extremely powerful • GridFS • Store files/blobs • Sharding/Replica Pairs/Master-Slave
  • Q&A
  • Resources • BASE: An Acid Alternative http://queue.acm.org/detail.cfm?id=1394128 • PHP Mongo Driver Reference http://php.net/mongo • MongoDB Advanced Query Reference http://www.mongodb.org/display/DOCS/Advanced+Queries • MongoDB Query Cheat Sheet http://www.10gen.com/reference • myNoSQL Blog http://nosql.mypopescu.com/