NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!

Let's learn the philosophy NOSQL takes (from a developer's standpoint), the changes you'll (not) have to make, discuss Mongo, and see some practical examples!

This is my first revision of this talk, and I will be making some organizational improvements later.

Published in: Technology

  • NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!

    1. 1. NOSQL 101 Or: How I Learned To Stop Worrying And Love The Mongo
    2. 2. Who Am I? • Daniel Cousineau • Sr. Software Applications Developer at Texas A&M University • dcousineau@gmail.com • Twitter: @dcousineau • dcousineau.com
    3. 3. In the beginning... there was Accounting.
    4. 4. Row Based, Fixed Schema
    5. 5. The RDBMS was created to address this usage.
    6. 6. RDBMS Ideology • ACID • Atomicity • Consistency • Isolation • Durability • All or nothing, no corruption, no mistakes • Accounting errors are EXPENSIVE!
    7. 7. RDBMS Ideology • Pessimistic • Consistency at the end of EVERY step!
    8. 8. Moore’s Law happened.
    9. 9. Computers took on more complex tasks...
    10. 10. Problems became... Dynamic
    11. 11. NOSQL attempts to address the Dynamic.
    12. 12. What is NOSQL?
    13. 13. What Is NOSQL? • Any data storage engine that does not use a SQL interface and does not use relational algebra
    14. 14. NOSQL Ideology • BASE • Basically Available • Soft state • Eventually consistent • Zen-like • Content loss isn’t that big of a deal
    15. 15. NOSQL Ideology • Optimistic • State will be in flux, just accept it
    16. 16. NOSQL Ideology • Don’t Do More Work Than You Have To • Don’t Unnecessarily Duplicate Effort
    17. 17. NOSQL is diverse...
    18. 18. Types of NOSQL • Key-Value Stores • memcache/memcachedb • riak • tokyo cabinet/tyrant • dynamo
    19. 19. Types of NOSQL • Column-oriented • bigtable • cassandra
    20. 20. Types of NOSQL • Graph • neo4j
    21. 21. Types of NOSQL • Document-oriented • couchdb • mongodb
    22. 22. Let's focus on MongoDB...
    23. 23. Why MongoDB? • Because it’s what I need • Because I understand it • Because I’ve used it • Because it’s easy • Because it has superior driver support • Because I said so
    24. 24. Support? • Operating Systems • OSX 32/64bit • Linux 32/64bit • Windows 32/64bit • Solaris i86pc • Solaris 64 • Official Drivers • C, C++, Java, JavaScript, Perl, PHP, Python, Ruby • Community Drivers • REST, C#, Clojure, ColdFusion, Delphi, Erlang, Factor, Fantom, F#, Go, Groovy, Haskell, Lua, Obj-C, PowerShell, Scala, Scheme, Smalltalk
    25. 25. What is MongoDB? • A document-based storage system • Databases contain collections, collections contain documents • Documents are arbitrary BSON (extension of JSON) objects • No schema is enforced
    26. 26. What is MongoDB? • Drivers expose MongoDB query API to languages in a form familiar and native • Drivers usually handle serialization • You always work in native system objects, BSON is really only used internally
    27. 27. Install MongoDB $ wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.6.0.tgz $ tar -xzvf ./mongodb-osx-x86_64-1.6.0.tgz $ ./mongodb-osx-x86_64-1.6.0/bin/mongod --dbpath=/path/to/save/db
    28. 28. Manage MongoDB $ ./mongodb-osx-x86_64-1.6.0/bin/mongo MongoDB shell version: 1.6.0 connecting to: test > db.foo.save( {a:1} ) > db.foo.find() { "_id" : ObjectId("4c60d0143cd09f6d17a18094"), "a" : 1 } >
    29. 29. Simple Queries $ ./mongodb-osx-x86_64-1.6.0/bin/mongo MongoDB shell version: 1.6.0 connecting to: test > db.foo.find() // Get All Records > db.foo.find( {a: 1} ) // Get All Records Where Property ‘a’ Is 1 > db.foo.find( {a: 1}, {_id: 1} ) // Get The ‘_id’ Property Of All Records Where ‘a’ Is 1 > db.foo.find().limit( 1 ) // Get 1 Record > db.foo.find().sort( {a: -1} ) // Get All Records Sorted By Property ‘a’ In Reverse >
    30. 30. Some Common Questions
    31. 31. So I should choose MongoDB over MySQL? • Bad Question! • 90% of the time you’ll probably implement a hybrid system.
    32. 32. When should I use MongoDB? • When an ORM is necessary • It’s in the name, Object-Relational Mapper • When you use a metric ton of 1:n and n:m tables to generate 1 object • And you rarely if ever use them for reporting
    33. 33. MongoDB performance is better? • Too simple of a question • In a sterilized lab test: performance is comparable, and MySQL usually wins on sheer query speed • In the real world: MongoDB usually wins due to fewer queries required and no object reassembly
    34. 34. Can MongoDB enforce a schema? • You can add indexes on arbitrary keypatterns • Otherwise, why? • Application is responsible for correctness and error handling, no need to duplicate
    35. 35. Can I trust eventual consistency? • No, but you shouldn’t trust ACID either • Build your application to be flexible and to handle consistency issues • Stale data is a fact of life
    36. 36. Can MongoDB Handle Large Deployments? • 1.2 TB over 5 billion records • 600+ million documents • Migrating ENTIRE app from Postgres • http://www.mongodb.org/display/DOCS/Production+Deployments
    37. 37. Can MongoDB Handle Large Deployments? • huMONGOusDB • 32-bit only supports ~2.5GB • Memory-mapped files • Individual documents limited to 4MB
    38. 38. Why waste time with theory?
    39. 39. 2 Case Studies
    40. 40. The ‘Have Done’ http://orthochronos.com
    41. 41. Very Simple • Daemon does insertion in the background • Front end just does simple grabs • Grab 1, Grab Many, etc.
    42. 42. Data Model • System has Stocks • Each Stock has Daily (once per day) and IntraDaily (every 15 minutes) data • Limited to trading hours • Each set of data (daily or intradaily) has 4 Graphs • Each graph has upwards of 6 Lines • Each line has between 300 and 800 Points
    43. 43. Data Model • With each data point representing 1 minute, each 15-minute IntraDay graph will have about 785 overlapping points with the preceding graph • Why not consolidate into a single running table, and just SELECT ... LIMIT 800 points from X timestamp?
    44. 44. Data Model • The Algorithm will cause past points to change • But each graph should be preserved so one can see the historical evolution of a given curve
    45. 45. Data Model • Now imagine implementing these requirements in a traditional RDBMS
    46. 46. Data Model • Instead, let's see my MongoDB implementation
    47. 47. Database Layout ~/MongoDB/bin $ ./mongo MongoDB shell version: 1.6.0 connecting to: test > use DBNAME switched to db DBNAME > show collections aapl aapl.daily aapl.intraday acas acas.daily acas.intraday ... wfr wfr.daily wfr.intraday x x.daily x.intraday
    48. 48. Stock ‘Metadata’ { "_id" : ObjectId("4c5a15038ead0eec04000000"), "timestamp" : 1228995000, "data" : [ "0", "92", "0" ] }
    49. 49. Interval Data { "_id" : ObjectId("4bb901cc8ead0e041a0d0000"), "timestamp" : 1228994100, "number" : 3, "charts" : [ { "number" : "1", "data" : [ -99999, -99999, -99999, -99999, -99999 ], "domainLength" : 300, "domainDates" : [ "Tue Nov 25 10:45:00 GMT-0500 2008", ... ], "lines" : [ { "first" : 76, "last" : 300, "points" : [ { "index" : 1, "value" : 0 }, { ... } ] }, { ... }, { ... }, { ... }, { ... }, { ... } ] }, { ... }, { ... }, { ... } ] }
    50. 50. Connect /** * @return MongoDB */ protected static function _getDb() { static $db; if( !isset($db) ) { $mongo = new Mongo(); $db = $mongo->selectDb('...db name...'); } return $db; }
    51. 51. SELECT TOP 1... //... $db = self::_getDb(); $collection = $db->selectCollection(strtolower($symbol)); $dti = $collection->find() ->sort(array( 'timestamp' => -1 )) ->limit(1) ->getNext(); //...
    52. 52. Get Specific Timestamp //... $tstamp = strtotime($lastTimestamp); $cur = $collection->find(array( 'timestamp' => $tstamp )) ->limit(1); //...
    53. 53. Only Get Timestamps //... $dailyCollection = $db->selectCollection(strtolower($symbol).'.daily'); $dailyCur = $dailyCollection->find(array(), array('timestamp')) ->sort(array( 'timestamp' => 1 )); foreach( $dailyCur as $timestamp ) { //... } //...
    54. 54. Utilizing Collections //... $db = self::_getDb(); $stocks = array(); foreach( $db->listCollections() as $collection ) { $collection_name = strtolower($collection->getName()); if( preg_match('#^[a-z0-9]+$#i', $collection_name) ) { $collection_name = strtoupper($collection_name); $stocks[] = $collection_name; } } sort($stocks); //...
    55. 55. The ‘Wish I Did’
    56. 56. Not Too Terrible • Keep track of Student Cases • A case keeps track of demographics, diagnoses, disabilities, notes, schedule, etc. • Also tracks Exams • Schedule multiple exams per course • Finally, students can log into a portal, counselors can log in • Basic user management
    57. 57. Mostly Static • Most information display only • Even with reporting
    58. 58. So I used good old fashioned RDBMS design...
    59. 59. lolwut?
    60. 60. Instead... • A collection for Student Cases • A collection for Courses • etc... • Denormalize!
    61. 61. Boyce-Codd Who?
    62. 62. A Student Document { "_id":ObjectId("4c5f572e8ead0ed00d0f0000"), "uin":"485596916", "firstname":"Zach", "middleinitial":"I", "lastname":"Hill", "major":"ACCT", "classification":"G4", "registrationdate":"2008-03-09", "renewaldates":[ ... { "semester":"201021", "date":"2010-05-15" } ], "localaddress":{ "street":"9116 View", "city":"College Station", "state":"TX", "zip":"77840" }, "permanentaddress":{ "street":"3960 Lake", "city":"Los Angeles", "state":"CA", "zip":"90001" }, "disabilities":[ "Blind", "Asthma" ], "casenotes":[ ... { "counselor":"Zander King", "note":"lorem ipsum bibendum enim ..." } ], "schedules":[ { "semester":"201021", "courses":[ ... { "$ref":"courses", "$id":ObjectId(...) } ] } ] }
    63. 63. A Course Document { "_id" : ObjectId("4c5f572e8ead0ed00d000000"), "subject" : "MECH", "course" : 175, "section" : 506, "faculty" : "Dr. Quagmire Hill" }
    64. 64. I lied...
    65. 65. MongoDB DBRef • Similar to FK • Requires driver support • Not query-able
    66. 66. But can we do everything we need?
    67. 67. Paginate? > db.students.find( {}, {_id:1} ).skip( 20 ).limit( 10 ) { "_id" : ObjectId("4c5f572e8ead0ed00d230000") } { "_id" : ObjectId("4c5f572e8ead0ed00d240000") } { "_id" : ObjectId("4c5f572e8ead0ed00d250000") } { "_id" : ObjectId("4c5f572e8ead0ed00d260000") } { "_id" : ObjectId("4c5f572e8ead0ed00d270000") } { "_id" : ObjectId("4c5f572e8ead0ed00d280000") } { "_id" : ObjectId("4c5f572e8ead0ed00d290000") } { "_id" : ObjectId("4c5f572e8ead0ed00d2a0000") } { "_id" : ObjectId("4c5f572e8ead0ed00d2b0000") } { "_id" : ObjectId("4c5f572e8ead0ed00d2c0000") } >
    68. 68. Only Renewed Once? > db.students.find( {renewaldates: {$size: 1}}, {_id:1, renewaldates:1} ) { "_id" : ObjectId("4c5f572e8ead0ed00d150000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d190000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d440000"), "renewaldates" : [ { "semester" : "201021", "date" : "2010-05-15" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d460000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d4e0000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d660000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d6f0000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d720000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d750000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d800000"), "renewaldates" : [ { "semester" : "201021", "date" : "2010-05-15" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d880000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d8d0000"), "renewaldates" : [ { "semester" : "201021", "date" : "2010-05-15" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d8f0000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } { "_id" : ObjectId("4c5f572e8ead0ed00d940000"), "renewaldates" : [ { "semester" : "201031", "date" : "2010-08-16" } ] } >
    69. 69. How Many Blind Or Asthmatic Students? > db.students.find( {disabilities: {$in: ['Blind', 'Asthma']}} ).count() 76 >
    70. 70. Who Lives On Elm? > db.students.find( {'localaddress.street': /.*Elm/}, {_id:1, 'localaddress.street':1} ) { "_id" : ObjectId("4c5f572e8ead0ed00d220000"), "localaddress" : { "street" : "2807 Elm" } } { "_id" : ObjectId("4c5f572e8ead0ed00d290000"), "localaddress" : { "street" : "5762 Elm" } } { "_id" : ObjectId("4c5f572e8ead0ed00d400000"), "localaddress" : { "street" : "6261 Elm" } } { "_id" : ObjectId("4c5f572e8ead0ed00d610000"), "localaddress" : { "street" : "7099 Elm" } } { "_id" : ObjectId("4c5f572e8ead0ed00d930000"), "localaddress" : { "street" : "4994 Elm" } } { "_id" : ObjectId("4c5f572e8ead0ed00d960000"), "localaddress" : { "street" : "3456 Elm" } } >
    71. 71. Number Registered On Or Before 2010/06/20? > db.students .find( {$where: "new Date(this.registrationdate) <= new Date('2010/06/20')"} ) .count() 136 >
    72. 72. Update Classification? > db.students.find( {uin:'735383393'}, {_id: 1, uin: 1, classification: 1} ) { "_id" : ObjectId("4c60dae48ead0e143e0f0000"), "uin" : "735383393", "classification" : "G2" } > db.students.update( {uin:'735383393'}, {$set: {classification: 'G3'}} ) > db.students.find( {uin:'735383393'}, {_id: 1, uin: 1, classification: 1} ) { "_id" : ObjectId("4c60dae48ead0e143e0f0000"), "uin" : "735383393", "classification" : "G3" } >
    73. 73. Home Towns? > db.students.distinct('permanentaddress.city') [ "Austin", "Chicago", "Dallas", "Denver", "Houston", "Los Angeles", "Lubbock", "New York" ] >
    74. 74. Number of students by major? > db.students .group({ key: {major:true}, cond: {major: {$exists: true}}, reduce: function(obj, prev) { prev.count += 1; }, initial: { count: 0 } }) [ {"major" : "CPSC", "count" : 12}, {"major" : "MECH", "count" : 16}, {"major" : "ACCT", "count" : 18}, {"major" : "MGMT", "count" : 18}, {"major" : "FINC", "count" : 16}, {"major" : "ENDS", "count" : 15}, {"major" : "ARCH", "count" : 18}, {"major" : "ENGL", "count" : 15}, {"major" : "POLS", "count" : 22} ] >
    75. 75. How Many Students In A Course? > db.students .find({'schedules.courses': {$in: [ new DBRef('courses', new ObjectId('4c60dae48ead0e143e000000')) ]}}) .count() 25
    76. 76. No Time To Cover... • Map-Reduce • MongoDB has it, and it is extremely powerful • GridFS • Store files/blobs • Sharding/Replica Pairs/Master-Slave
    77. 77. Q&A
    78. 78. Resources • BASE: An Acid Alternative http://queue.acm.org/detail.cfm?id=1394128 • PHP Mongo Driver Reference http://php.net/mongo • MongoDB Advanced Queries Reference http://www.mongodb.org/display/DOCS/Advanced+Queries • MongoDB Query Cheat Sheet http://www.10gen.com/reference • myNoSQL Blog http://nosql.mypopescu.com/
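
A note on the disabilities query from slide 69: `$in` matches a document when the array field contains at least one of the listed values, whereas `$all` requires every listed value to be present. The sketch below emulates both operators on in-memory arrays in plain JavaScript so the difference can be seen without a running server. The helper names `matchesIn` and `matchesAll` and the four sample students are hypothetical illustrations, not MongoDB APIs or data from the talk.

```javascript
// Plain-JavaScript emulation of two MongoDB array query operators.
// matchesIn, matchesAll and the sample students are hypothetical,
// written only to illustrate the matching rules.

// $in: true when the field array contains at least one wanted value.
function matchesIn(fieldValues, wanted) {
  return wanted.some((value) => fieldValues.includes(value));
}

// $all: true when the field array contains every wanted value.
function matchesAll(fieldValues, wanted) {
  return wanted.every((value) => fieldValues.includes(value));
}

// Made-up sample documents shaped like the student document on slide 62.
const students = [
  { uin: "1", disabilities: ["Blind", "Asthma"] },
  { uin: "2", disabilities: ["Blind"] },
  { uin: "3", disabilities: ["Asthma"] },
  { uin: "4", disabilities: [] },
];

const wanted = ["Blind", "Asthma"];

// Students with either condition ($in semantics).
const eitherCount = students.filter((s) => matchesIn(s.disabilities, wanted)).length;

// Students with both conditions ($all semantics).
const bothCount = students.filter((s) => matchesAll(s.disabilities, wanted)).length;

console.log(eitherCount, bothCount); // prints: 3 1
```

In the mongo shell, counting only students who have both conditions would look like `db.students.find( {disabilities: {$all: ['Blind', 'Asthma']}} ).count()`.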
