Data as Documents: Overview and intro to MongoDB

This is from my talk at BigDive in Turin, Italy 2013. The talk is generally about databases and how we evolved to where we are. There is a lot of command line stuff that is not shown here though - this is mostly for attendees for reference.

  1. DATA AS DOCUMENTS. Mitch Pirtle, BigDive 2013, Turin, Italy
  2. ABOUT ME • Moved from NYC to TO in 2011 • Recovering Joomla! founder • CTO @soundaymusic • Use primarily PHP (Lithium), Node.js • MongoDB Master
  3. ABOUT THIS TALK • Background on database history • Impact from the Web • Emerging solutions and technologies • Hands-on session • Close with Q&A
  4. Are you done with lunch?
  5. IN THE BEGINNING • Data was simple. • Performance was simpler. • Scale was a rare need.
  6. BIRTH OF RELATIONAL DATA • Applications got more complex. • Many apps, one database pushed logic into the data tier. • “Business rules” was the king buzzword.
  7. BIRTH OF WEB • Very complex architecture • Very high scale requirements • Rapid application development
  8. WRONG TOOL RIGHT JOB? • Was great for data consistency and features, but... • Impossible to scale • Impedance mismatch with modern apps
  9. ALTERNATIVES • Key / Value • Documents • Memory-only*
  10. KEY VALUE • EXAMPLES: Memcache, Voldemort, Cassandra, Dynamo, Hibari, Riak • No schema needed • Blazing fast • Minimal features
  11. DOCUMENT • EXAMPLES: MongoDB, SimpleDB, ElasticSearch, OrientDB • Rich datatypes matching modern apps • More features • Mostly JSON based
  12. EXAMPLE PLATFORMS
  13. MONGODB • Document database, uses JSON • Many user/developer features • Many deployment features • Designed specifically for modern scale challenges and programming languages
  14. REDIS • Key-value database • Extended data types • Many features • Similar facilities for scale and performance
  15. VOLDEMORT • Key-value • Extreme scale
  16. HADOOP • Framework, not really a database • Born from Google’s map reduce and distributed file system efforts
  17. DEPLOYMENT OVERVIEW
  18. DEDICATED SYSTEMS • Low cost, simple to set up • Great performance • Difficult to scale • Require constant management
  19. TETHERED CLOUD • Takes dedicated environment and extends with cloud infrastructure for scale • Extremely flexible • Even more management and administration
  20. FULL CLOUD • High initial effort • Much simpler to manage long term • Extreme scale • Possibility for equally extreme cost savings*
  21. DEVELOPERS!
  22. (hang on a minute)
  23. DEVELOPERS!
  24. (much better)
  25. (ok now to get serious)
  26. WORKING WITH SQL • Crap, now I need an ORM! • Disconnect between relational data and object languages • Tons of debugging fun!
  27. WORKING WITH MONGODB • Simplifies data access • Simplifies code • Fewer execution steps make faster and lighter apps
  28. COMMON TERMS • database <-> database • table <-> collection • result <-> document • column <-> property
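
The "disconnect between relational data and object languages" can be illustrated with a small in-memory sketch. Everything here is hypothetical (the `peopleRows` and `phoneRows` arrays, the `loadPerson` helper, the sample data); it is plain JavaScript rather than a real SQL or MongoDB call, and only shows why a document can map more directly onto an application object.

```javascript
// Hypothetical relational layout: one row per person, one row per phone number.
const peopleRows = [{ id: 1, name: "Ada" }];
const phoneRows = [
  { personId: 1, kind: "home", number: "555-0100" },
  { personId: 1, kind: "work", number: "555-0101" },
];

// With rows, the application (or an ORM) must join them back into an object.
function loadPerson(id) {
  const person = peopleRows.find((row) => row.id === id);
  if (!person) return null;
  const phones = phoneRows
    .filter((row) => row.personId === id)
    .map(({ kind, number }) => ({ kind, number }));
  return { name: person.name, phones };
}

// The same data as one document: the stored shape already matches the object.
const personDocument = {
  name: "Ada",
  phones: [
    { kind: "home", number: "555-0100" },
    { kind: "work", number: "555-0101" },
  ],
};

const sameShape =
  JSON.stringify(loadPerson(1)) === JSON.stringify(personDocument);
console.log(sameShape);
```

The join step in `loadPerson` is the part a document model lets you skip for data that is always read together.
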
  29. WHAT IS JSON?
  30. DOCUMENT DESIGN. MongoDB documents are BSON: • strings • integers • arrays • objects • dates • boolean • regex • symbol • javascript • ObjectID • timestamps • GridFS
  32. DATATYPE: OBJECTID • MongoDB’s ObjectID is a 12-byte BSON type, composed of Unix seconds since the epoch (4 bytes), a machine identifier (3 bytes), a process id (2 bytes), and a counter that starts at a random value (3 bytes).
  33. DATATYPE: OBJECTID
      ObjectId("4ee75a9c318b9d2c640001a6")
  34. DATATYPE: OBJECTID • An ObjectID is not a string. Always reference one as ObjectId("..."); a comparison against the plain string will not match.
  35. DATATYPE: OBJECTID
      > x = ObjectId()
      ObjectId("51b73dff884498553b746046")
      > x.getTimestamp()
      ISODate("2013-06-11T15:10:55Z")
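
As a rough illustration of the byte layout described on the slides, the sketch below splits an ObjectID's 24-character hex form into its four parts. `decodeObjectId` is a hypothetical helper written for this example, not a MongoDB API, and it assumes the 4/3/2/3 layout from the slide (later MongoDB versions define the middle bytes differently).

```javascript
// Hypothetical helper: decode the hex form of an ObjectID using the layout
// from the slide (4-byte timestamp, 3-byte machine, 2-byte pid, 3-byte counter).
function decodeObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  return {
    // Seconds since the Unix epoch, converted to milliseconds for Date.
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
    machine: hex.slice(8, 14),
    processId: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = decodeObjectId("51b73dff884498553b746046");
console.log(parts.timestamp.toISOString()); // 2013-06-11T15:10:55.000Z
console.log(parts.machine, parts.processId, parts.counter);
```

The decoded timestamp matches what `x.getTimestamp()` printed in the shell session above, which is why an ObjectID can double as a creation time.
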
  36. DATATYPE: DATE • MongoDB’s Date is a signed 64-bit integer holding the number of milliseconds since the Unix epoch; negative values represent dates before 1970.
  37. DATATYPE: DATE
      > when = new Date()
      ISODate("2013-06-11T15:18:30.241Z")
      > when.toString()
      Tue Jun 11 2013 17:18:30 GMT+0200 (CEST)
      > when.getMonth()
      5
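
The millisecond representation can be checked with a plain JavaScript `Date`, which uses the same epoch-milliseconds convention. This is a minimal sketch using the timestamp from the slide; it uses `getUTCMonth()` instead of `getMonth()` only so the result does not depend on the local time zone.

```javascript
// The instant shown on the slide, expressed in UTC.
const when = new Date("2013-06-11T15:18:30.241Z");

// Milliseconds since 1970-01-01T00:00:00Z.
const millis = when.getTime();
console.log(millis); // 1370963910241

// A negative value is a moment before the epoch.
const beforeEpoch = new Date(-1);
console.log(beforeEpoch.toISOString()); // 1969-12-31T23:59:59.999Z

// Months are zero-based, so June is 5.
console.log(when.getUTCMonth()); // 5
```
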
  38. DATATYPE: GRIDFS • MongoDB’s GridFS is a facility that lets you store binary files within the database and attach JSON metadata to them.
  39. (ok this part is easier on the command line. more on this later in this class.)
  40. COMMON TASKS • find(), findOne() • findAndModify() • ensureIndex() • drop() • insert() • update() • upsert (update() with { upsert: true }) • save() • remove() • stats()
  41. INDEXES • MongoDB’s indexes support a variety of types and needs • Indexing overview
  42. INDEX TYPES • Standard (_id) • Secondary • Subdocuments • Embedded fields • Compound • ASC and DESC keys • Multikeys • Unique • Sparse • Hashed
  48. INDEX CREATION
      • Standard:
        db.people.ensureIndex( { zipcode: 1 } )
      • Background:
        db.people.ensureIndex( { zipcode: 1 }, { background: true } )
      • Background Sparse:
        db.people.ensureIndex( { zipcode: 1 }, { background: true, sparse: true } )
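
To make the `sparse` option more concrete, here is a toy in-memory model. It is not how MongoDB stores indexes (real indexes are tree structures on disk; the `Map` here is only a stand-in), and `buildIndex` and the `people` array are hypothetical. It only shows the difference in which documents get an index entry.

```javascript
// Toy model: map each field value to the positions of the documents holding it.
function buildIndex(docs, field, { sparse = false } = {}) {
  const index = new Map();
  docs.forEach((doc, position) => {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    // A sparse index skips documents that lack the field entirely.
    if (!present && sparse) return;
    // A non-sparse index records a missing field under the key null.
    const key = present ? doc[field] : null;
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

const people = [
  { name: "Ada", zipcode: "10121" },
  { name: "Bob" }, // no zipcode
  { name: "Cy", zipcode: "10121" },
];

const standardIndex = buildIndex(people, "zipcode");
const sparseIndex = buildIndex(people, "zipcode", { sparse: true });

console.log([...standardIndex.keys()]); // includes null for Bob
console.log([...sparseIndex.keys()]); // Bob is not indexed at all
```

A sparse index is therefore smaller when many documents lack the field, at the cost of not being able to find those documents through it.
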
  49. GRIDFS • Drivers support GridFS with helper methods, as well as the mongofiles command line tool that is distributed with MongoDB. • Crazy, whack-daddy fast. • Dead simple to use.
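
GridFS works by splitting a file into fixed-size chunks and keeping a separate document that describes the file. The sketch below imitates that idea in memory; `splitIntoChunks`, the field names, and the tiny chunk size are simplified assumptions for illustration and do not reproduce a driver's GridFS helpers.

```javascript
// Hypothetical helper: split raw bytes into numbered chunks plus one
// descriptive document, loosely mirroring GridFS's files/chunks split.
function splitIntoChunks(filename, bytes, chunkSize) {
  const fileDoc = { filename, length: bytes.length, chunkSize, metadata: {} };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n += 1) {
    chunkDocs.push({ n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  return { fileDoc, chunkDocs };
}

// Ten bytes with a 4-byte chunk size gives chunks of 4, 4, and 2 bytes.
const payload = new Uint8Array(10).fill(7);
const stored = splitIntoChunks("demo.bin", payload, 4);

// Reading the file back means concatenating the chunks in order of n.
const restored = new Uint8Array(stored.fileDoc.length);
let writeAt = 0;
for (const chunk of stored.chunkDocs) {
  restored.set(chunk.data, writeAt);
  writeAt += chunk.data.length;
}
console.log(stored.chunkDocs.length, restored.length);
```

The `metadata` object on the file document is where the JSON metadata mentioned earlier would live.
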
  50. (drop to console)
  51. NOTE: MongoDB provides many command line tools to work with your database. They are listed and documented in great detail online.
  52. HOW MONGODB SCALES • Vertically: Replication • Horizontally: Sharding
  53. REPLICATION • MongoDB’s Replica Sets keep copies of your data on several members: one primary accepts writes, and secondaries replicate it for redundancy and can serve reads • Many tutorials and procedures
  54. REPLICATION
      M1 M2 M3
      H1 D1 D2
      (M)ember, (H)idden, (D)elayed
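
A replica set elects its primary by majority vote, so the number of voting members determines how many failures it can survive. The arithmetic is sketched below; the helper names are hypothetical, and the example assumes every member of the set has a vote.

```javascript
// Votes needed to elect a primary: strictly more than half.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Members that can be lost while a majority is still reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 5, 6]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} down`);
}
```

Note that going from 5 to 6 voting members raises the majority without raising the fault tolerance, which is why odd member counts are the usual recommendation.
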
  55. AGGREGATION FRAMEWORK • Aggregation Framework provides GROUP BY like functionality without map reduce • Many examples • Detailed reference
  56. {
        "_id" : ObjectId("51b833cd884498553b746047"),
        "title" : "Book 1",
        "author" : "Ima Writer",
        "tags" : [ "awesome", "ok", "lousy", "ok", "meh", "meh" ]
      }
      {
        "_id" : ObjectId("51b833ee884498553b746048"),
        "title" : "Book 2",
        "author" : "Heesan Author",
        "tags" : [ "awesome", "ok", "lousy", "awesome", "good", "good" ]
      }
  57. db.articles.aggregate(
        { $project : { author : 1, tags : 1 } },
        { $unwind : "$tags" },
        { $group : {
            _id : { tags : "$tags" },
            authors : { $addToSet : "$author" }
        } }
      );
  58. {
        "result" : [
          {
            "_id" : { "tags" : "good" },
            "authors" : [ "Heesan Author" ]
          },
          {
            "_id" : { "tags" : "meh" },
            "authors" : [ "Heesan Author", "Ima Writer" ]
          }
        ],
        "ok" : 1
      }
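
To see what each pipeline stage does, the three stages can be imitated on plain arrays. This is an in-memory approximation written for illustration, not MongoDB's implementation; it reuses the two sample books from the slides (without their `_id` fields), and the order of the groups and of the names inside each `authors` array may differ from what a server returns.

```javascript
const articles = [
  { title: "Book 1", author: "Ima Writer",
    tags: ["awesome", "ok", "lousy", "ok", "meh", "meh"] },
  { title: "Book 2", author: "Heesan Author",
    tags: ["awesome", "ok", "lousy", "awesome", "good", "good"] },
];

// $project: keep only the author and tags fields.
const projected = articles.map(({ author, tags }) => ({ author, tags }));

// $unwind: emit one document per element of the tags array.
const unwound = projected.flatMap(({ author, tags }) =>
  tags.map((tag) => ({ author, tags: tag }))
);

// $group with $addToSet: collect the distinct authors seen for each tag.
const groups = new Map();
for (const { author, tags } of unwound) {
  if (!groups.has(tags)) groups.set(tags, new Set());
  groups.get(tags).add(author);
}

const result = [...groups].map(([tag, authors]) => ({
  _id: { tags: tag },
  authors: [...authors],
}));

console.log(JSON.stringify(result, null, 2));
```

Working through it by hand shows that "good" belongs only to Heesan Author and "meh" only to Ima Writer, while "awesome", "ok", and "lousy" are shared by both.
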
  59. SHARDING • MongoDB’s Sharding allows you to scale your data beyond one physical machine: - need more RAM - need more CPU - need more disk
  60. SHARDING DEPLOYMENT
      S1 S2 S3
      M1 M2 M3
      C1
      (C)onfig, (S)hard server (mongos), (M)ongo shard node (mongod)
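
In a sharded deployment, the mongos router uses chunk ranges recorded by the config server to decide which shard holds a given shard key value. The sketch below imitates that lookup with a hard-coded table; the split points, the string shard key, and `routeToShard` are all hypothetical and chosen only to match the three shard nodes in the diagram.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
const chunkTable = [
  { min: "", max: "h", shard: "M1" },
  { min: "h", max: "q", shard: "M2" },
  { min: "q", max: "\uffff", shard: "M3" },
];

// Find the chunk whose range contains the key and return its shard.
function routeToShard(shardKeyValue) {
  const chunk = chunkTable.find(
    ({ min, max }) => shardKeyValue >= min && shardKeyValue < max
  );
  if (!chunk) throw new Error(`no chunk covers key: ${shardKeyValue}`);
  return chunk.shard;
}

for (const username of ["alice", "mitch", "zoe"]) {
  console.log(username, "->", routeToShard(username));
}
```

A query that includes the shard key can be sent to one shard this way; a query without it has to be sent to all of them.
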
  61. MAP REDUCE • MongoDB’s mapReduce performs complex aggregation operations • Many examples • Even more fun than regex!
  62. Map Reduce is covered in detail in a later class at BIGDIVE
  63. QUESTIONS AND ANSWERS
  64. THANK YOU
