Timestamps
  • 64-bit integers that represent different versions of a cell value
  • Value assigned by either the datastore or the c...

API
  • Read operations for lookup, selection, etc.
  • Write operations for creation, update, and deletion of values
  • Write operation...

Architecture
  • Implemented atop GFS
  • Multiple tablet servers and a single master

HBase
  • Open source clone of BigTable in Java
  • Implemented atop HDFS
  • HBase can be the source and/or the sink of Hadoop jobs
  • Facebook...

Section 4: Cassandra

Introduction
  • Borrows concepts from both Dynamo and BigTable
  • Originally developed by Facebook but now an Apache open source pr...

Design Goals
  • Processing of a large amount of data
  • Highly scalable
  • Reliability at a massive scale
  • High throughput writes withou...

Data Model
  • A table is a distributed multidimensional map indexed by a key
  • Rows are identified by a string-key and operations ...

Topic 12: NoSQL in Action

Cloud Computing Workshop 2013, ITU

12: NoSQL in Action
Zubair Nabi
zubair.nabi@itu.edu.pk
April 20, 2013

Outline
  1. Amazon’s Dynamo
  2. MongoDB
  3. Google BigTable
  4. Cassandra

Section 1: Amazon’s Dynamo

Introduction
  • Dynamo was at the forefront of the NoSQL movement and has influenced the design of many subsequent systems
  • Design considerations are two-fold: 1) Infrastructure and 2) Business

Infrastructure Considerations
  • Tens of thousands of servers and network elements distributed across the globe
  • Commodity off-the-shelf hardware
  • Failure is normal
  • Hundreds of services, all decentralized and loosely coupled

Business Considerations
  • Strict, internal SLAs regarding performance, reliability, and efficiency
  • Reliability is of paramount importance because an outage means loss in revenue and customer trust
  • The platform needs to be highly scalable to support continuous growth
  • Most services, such as bestseller lists and shopping carts, only store and retrieve data by primary key
  • No need for the complex querying and management afforded by an RDBMS

Design
  1. Implemented as a partitioned system with replication and consistency windows
  2. Targets applications that can operate with weaker consistency
  3. Gives high availability
  4. Write operations remain possible even when replicas are partitioned from one another
  5. Always writeable, so conflict resolution needs to happen during reads

Conflict Resolution
  • A datastore can only perform simple conflict resolution
  • Dynamo therefore passes the buck to the application
  • The application is aware of the data schema and hence better suited to choose a conflict resolution mechanism
  • If the application does not want to implement conflict resolution, simple mechanisms such as “last write wins” are provided by the framework

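As a rough illustration of the two options above, the sketch below reconciles two conflicting versions of a shopping cart. The `Version` record, its `written_at` field, and both resolver functions are hypothetical names chosen for this example; they are not Dynamo's API, and comparable wall-clock timestamps are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One conflicting version of a value (hypothetical structure)."""
    items: frozenset     # cart contents
    written_at: float    # write time, assumed comparable across nodes


def last_write_wins(versions):
    """Datastore-level fallback: keep only the most recent write."""
    return max(versions, key=lambda v: v.written_at)


def merge_carts(versions):
    """Application-level resolution: union the carts so no added item is lost."""
    merged = frozenset().union(*(v.items for v in versions))
    newest = max(v.written_at for v in versions)
    return Version(items=merged, written_at=newest)


a = Version(frozenset({"book"}), written_at=1.0)
b = Version(frozenset({"pen"}), written_at=2.0)
lww = last_write_wins([a, b])     # keeps only the later cart
merged = merge_carts([a, b])      # keeps items from both carts
```

The generic rule silently drops the earlier write, while the application-specific merge uses its knowledge that a cart is a set of items.
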
Interface
  1. Simple key/value interface storing values as BLOBs
  2. Operations limited to one key/value pair at a time
  3. No support for hierarchical namespaces (like those in filesystems)

Node Assignment
  • Completely decentralized, so all nodes have equal responsibilities
  • As nodes can be heterogeneous, work is distributed in proportion to the capabilities of a node

Operations
Provides two operations:
  1. get(key), which returns a list of objects and a context
  2. put(key, context, object)
  • get can return more than one object if there is more than one conflicting version
  • The context contains system metadata such as the object version
  • Keys and values are stored as arrays of bytes, and only interpreted by the application

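The interplay between get, put, and the context can be sketched with a toy, single-process store. `ToyStore` is a hypothetical stand-in: here the context is simply the set of version ids the client has seen, which is an assumption made for brevity and not how Dynamo encodes its version metadata.

```python
import itertools


class ToyStore:
    """In-memory stand-in for a Dynamo-like store (hypothetical, single process)."""

    def __init__(self):
        self._data = {}                  # key -> {version_id: value bytes}
        self._ids = itertools.count(1)   # source of fresh version ids

    def get(self, key):
        """Return (list of values, context); the context records versions seen."""
        versions = self._data.get(key, {})
        return list(versions.values()), frozenset(versions)

    def put(self, key, context, value):
        """Write value, superseding only the versions named in context."""
        versions = self._data.setdefault(key, {})
        for seen in context:
            versions.pop(seen, None)
        versions[next(self._ids)] = value


store = ToyStore()
_, ctx = store.get(b"cart")            # both clients read the same empty state
store.put(b"cart", ctx, b"book")       # client A writes
store.put(b"cart", ctx, b"pen")        # client B writes with the same stale context
values, ctx = store.get(b"cart")       # two conflicting versions come back
store.put(b"cart", ctx, b"book,pen")   # the application reconciles and writes back
resolved, _ = store.get(b"cart")       # a single version remains
```

Because neither writer's context covered the other's write, both versions survive until a later put carries a context that names them both.
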
Partitioning
  • MD5 hash of keys determines their storage nodes
  • Consistent hashing to provide incremental scalability
  • Partitioning done across virtual nodes instead of physical ones to take hardware heterogeneity into account

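A minimal sketch of the partitioning scheme described above: keys and virtual nodes are placed on a ring by their MD5 hash, and a key belongs to the first virtual node at or after its position. The `Ring` class, the `node#i` naming of virtual nodes, and the capacity numbers are assumptions for illustration only.

```python
import bisect
import hashlib


def md5_position(data: bytes) -> int:
    """Map bytes to a ring position using the 128-bit MD5 digest."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class Ring:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, capacities):
        # capacities: physical node name -> number of virtual nodes it hosts
        self._points = sorted(
            (md5_position(f"{node}#{i}".encode()), node)
            for node, count in capacities.items()
            for i in range(count)
        )
        self._positions = [position for position, _ in self._points]

    def node_for(self, key: bytes) -> str:
        """Return the physical node owning the first virtual node after the key."""
        idx = bisect.bisect_right(self._positions, md5_position(key))
        return self._points[idx % len(self._points)][1]   # wrap around the ring


# A more capable machine is given more virtual nodes, hence more keys.
ring = Ring({"small": 8, "big": 32})
owner = ring.node_for(b"cart:42")
```

Adding a physical node only inserts new points on the ring, so a key either stays where it was or moves to the new node; this is the sense in which scaling is incremental.
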
Section 2: MongoDB

Introduction
  • Schemaless document database written in C++
  • Used by a large number of organizations including SourceForge.net, foursquare, the New York Times, bit.ly, Craigslist, SAP, MTV, EA Sports, GitHub, etc.
  • Databases are distributed over multiple servers

Databases and Collections
  • Databases contain collections (“named groupings”) of documents
  • Documents within a collection might be heterogeneous
  • But a good strategy is to create a database collection for each object type
  • A collection is created automatically when the first document is inserted into it

Hierarchical Namespaces
  • Documents can be organized into a hierarchical structure using a dot-notation
  • For instance, the collections wiki.articles, wiki.categories and wiki.authors exist within the namespace wiki
  • The collection namespace itself is flat; the hierarchical structure exists only for the user

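The collection and namespace behaviour above can be modelled in a few lines of plain Python. `ToyDatabase` is a hypothetical in-memory model, not the MongoDB driver API: collections live in one flat dictionary keyed by their dotted name, appear on first insert, and accept documents of any shape.

```python
from collections import defaultdict


class ToyDatabase:
    """Flat map of collection name -> documents (hypothetical model)."""

    def __init__(self):
        self._collections = defaultdict(list)

    def insert(self, collection, document):
        """Store a document; the collection is created on first insert."""
        self._collections[collection].append(dict(document))

    def collection_names(self, namespace=None):
        """List collections, optionally only those under a dotted prefix."""
        names = sorted(self._collections)
        if namespace is None:
            return names
        return [name for name in names if name.startswith(namespace + ".")]


db = ToyDatabase()
db.insert("wiki.articles", {"title": "NoSQL", "tags": ["db"]})
db.insert("wiki.articles", {"title": "BSON", "size": 3})   # different fields are fine
db.insert("wiki.authors", {"name": "Example Author"})
db.insert("blog.posts", {"title": "Hello"})
wiki = db.collection_names("wiki")
```

The "hierarchy" is nothing more than a string prefix filter over flat names, which mirrors the last bullet above.
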
Documents
  • Unit of data storage
  • Conceptually similar to an XML document, JSON document, etc.
  • Documents are persisted in Binary JSON (BSON)
  • Easy to convert between BSON and JSON and between BSON and other programming language structures
  • Documents can be inserted (insert), searched (find), and updated (save)

Datatypes
  • Scalar: boolean, integer, double
  • Character sequence: string, code, etc.
  • BSON-objects: object
  • Object ID: To identify documents within a collection
  • Misc: null, array, date

References
  • No mechanism for foreign keys
  • References between documents need to be resolved by client applications

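What resolving references in the client means can be shown with plain dictionaries standing in for two collections. The field name `author_id`, the sample documents, and the `resolve_author` helper are all hypothetical.

```python
# Two "collections", each keyed by document id (illustrative data only).
articles = {
    1: {"_id": 1, "title": "NoSQL in Action", "author_id": 10},
}
authors = {
    10: {"_id": 10, "name": "Example Author"},
}


def resolve_author(article, authors_by_id):
    """Follow author_id by hand and return a copy with the author embedded."""
    resolved = dict(article)
    # A missing target yields None: nothing enforces referential integrity.
    resolved["author"] = authors_by_id.get(article["author_id"])
    return resolved


article = resolve_author(articles[1], authors)
dangling = resolve_author({"_id": 2, "title": "Orphan", "author_id": 99}, authors)
```

The second call shows the cost of having no foreign keys: a reference to a non-existent document is only noticed when the client tries to follow it.
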
Transaction Properties
  • Atomicity only for update and delete operations
  • Allows code to be executed locally on database nodes (server-side code execution)
  • Three different strategies for server-side execution:
    1. Execution of arbitrary code on a single node via the eval operator
    2. Aggregation via count, group, and distinct
    3. MapReduce code execution on multiple nodes

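The third strategy can be illustrated with a small, sequential sketch of the MapReduce pattern. This is plain Python, not MongoDB's server-side interface; the function names and the idea of passing shards as lists are assumptions, with each inner list standing in for the documents held by one node.

```python
from collections import defaultdict


def map_tags(document):
    """Map step: emit (tag, 1) for every tag in a document."""
    for tag in document.get("tags", []):
        yield tag, 1


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return sum(values)


def map_reduce(shards, mapper, reducer):
    """Run mapper over every document on every shard, then reduce per key."""
    grouped = defaultdict(list)
    for shard in shards:                 # each shard stands in for one node
        for document in shard:
            for key, value in mapper(document):
                grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


shards = [
    [{"title": "A", "tags": ["db", "nosql"]}],
    [{"title": "B", "tags": ["db"]}, {"title": "C"}],
]
counts = map_reduce(shards, map_tags, reduce_counts)
```

In a real deployment the map step would run where the data lives and only the emitted pairs would travel; the sketch keeps everything in one process to show the data flow.
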
Section 3: Google BigTable

Introduction
  • Supports a relaxed relational model that is dynamically controlled by the clients
  • Clients can reason about the locality properties of the data
  • Data indexing can be row-wise as well as column-wise
  • Data can be delivered either out of memory or from disk
  • Used internally by Google for more than 60 projects including Google Earth, Google Analytics, Orkut, and Google Docs

  79. Data Model
      - Values are stored as arrays of bytes, which need to be interpreted by the clients
      - Values are addressed by a 3-tuple (row-key, column-key, timestamp)
      - Row keys are strings of up to 64 KB
      - Rows are maintained in lexicographic order and are dynamically partitioned into tablets
          - Tablets are the unit of distribution and load balancing
      - Reads can be made efficient (only having to access a small number of servers) by wisely choosing row keys
          - Row ranges with small lexicographic distances are partitioned into fewer tablets
          - For instance, storing URLs with the hostname components reversed: com.cnn.blogs, com.cnn.www, etc.
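The row-key advice above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BigTable's API: `row_key_for_url` and `tablet_for` are hypothetical helpers, and the split points are made up to model tablets as contiguous ranges of the sorted key space.

```python
from bisect import bisect_right
from urllib.parse import urlsplit


def row_key_for_url(url):
    """Build a row key with the hostname components reversed (hypothetical helper)."""
    parts = urlsplit(url)
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    return reversed_host + parts.path


def tablet_for(row_key, split_points):
    """Return the index of the tablet whose row range contains row_key.

    Toy model of range partitioning: tablet i holds the keys that sort
    between split_points[i - 1] (inclusive) and split_points[i] (exclusive).
    """
    return bisect_right(split_points, row_key)


urls = [
    "http://www.cnn.com/world",
    "http://blogs.cnn.com/tech",
    "http://www.example.org/",
    "http://edition.cnn.com/",
]

# Rows are kept in lexicographic order, so pages of one domain become neighbours.
keys = sorted(row_key_for_url(u) for u in urls)

# Assumed split points, giving three tablets.
split_points = ["com.d", "org"]
tablets = {key: tablet_for(key, split_points) for key in keys}

for key in keys:
    print(tablets[key], key)
```

With these assumed split points, every cnn.com row lands in the same tablet, so a scan over that domain touches a single range rather than one range per subdomain.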
  84. Columns
      - No limit on the number of columns per table
      - Columns are grouped into sets called column families based on their key prefix
      - Column families:
          - Are the basic unit of access control
          - Are expected to store the same or a similar type of data so that it can be compressed
          - Need to be created before data can be stored in a column
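The column-family rules above can be sketched as a toy in-memory table. The `Table` class and its methods are hypothetical, and the `family:qualifier` column-key syntax is an assumption taken from the BigTable paper rather than from these slides.

```python
class Table:
    """Toy table that enforces the column-family rules (illustrative only)."""

    def __init__(self):
        self.families = set()
        self.rows = {}  # row key -> {column key -> value}

    def create_family(self, name):
        """Column families must exist before any column in them is written."""
        self.families.add(name)

    def put(self, row_key, column_key, value):
        # The family is the prefix of the column key, up to the first colon.
        family = column_key.partition(":")[0]
        if family not in self.families:
            raise KeyError(f"column family {family!r} has not been created")
        self.rows.setdefault(row_key, {})[column_key] = value

    def columns_in_family(self, row_key, family):
        """Return all columns of one family for a row, selected by key prefix."""
        prefix = family + ":"
        row = self.rows.get(row_key, {})
        return {col: val for col, val in row.items() if col.startswith(prefix)}


table = Table()
table.create_family("anchor")
table.put("com.cnn.www", "anchor:cnnsi.com", b"CNN")
table.put("com.cnn.www", "anchor:my.look.ca", b"CNN.com")

try:
    # Rejected: the "contents" family was never created.
    table.put("com.cnn.www", "contents:html", b"<html>...</html>")
except KeyError as err:
    rejected = str(err)
```

Columns inside an existing family need no declaration, which is why the number of columns per table is effectively unbounded while the set of families stays small.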
  88. Timestamps
      - 64-bit integers that represent different versions of a cell value
      - Value assigned by either the datastore or the client
      - Cells ordered in decreasing order of their timestamp
      - Automatic garbage collection can be used to remove revisions
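A minimal sketch of versioned cells follows. The `Cell` class is hypothetical, and the "keep only the newest `max_versions` revisions" policy is one assumed garbage-collection setting chosen for illustration.

```python
import time


class Cell:
    """Toy versioned cell: revisions are kept newest first (illustrative only)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), decreasing timestamp

    def put(self, value, timestamp=None):
        if timestamp is None:
            # Datastore-assigned timestamp; microseconds is an assumed unit.
            timestamp = time.time_ns() // 1000
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        # Garbage collection: drop everything beyond the newest max_versions.
        del self.versions[self.max_versions:]

    def get(self, timestamp=None):
        """Return the newest value, or the newest one at or before timestamp."""
        for ts, value in self.versions:
            if timestamp is None or ts <= timestamp:
                return value
        return None


cell = Cell(max_versions=2)
cell.put(b"v1", timestamp=1)  # client-assigned timestamps
cell.put(b"v3", timestamp=3)
cell.put(b"v2", timestamp=2)
latest = cell.get()
as_of_2 = cell.get(timestamp=2)
as_of_1 = cell.get(timestamp=1)  # the revision at timestamp 1 was collected
```

Keeping the list in decreasing timestamp order means the most recent revision is always read first, which matches the ordering described above.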
  94. API
      - Read operations for lookup, selection, etc.
      - Write operations for creation, update, and deletion of values
      - Write operations for creation and deletion of tables and column families
      - Administrative operations to modify store configuration and metadata
      - MapReduce hooks
      - Transactions are atomic at the single-row level
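Single-row atomicity can be pictured with a per-row lock. This is a single-process sketch under assumptions: `RowStore`, `mutate_row`, and `read_row` are hypothetical names, and a real store would enforce the guarantee on the server that owns the row rather than with in-process locks.

```python
import threading


class RowStore:
    """Toy store in which each row mutation is atomic (illustrative only)."""

    def __init__(self):
        self._rows = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects creation of rows and locks

    def _row_and_lock(self, row_key):
        with self._guard:
            row = self._rows.setdefault(row_key, {})
            lock = self._locks.setdefault(row_key, threading.Lock())
        return row, lock

    def mutate_row(self, row_key, mutate):
        """Apply mutate(row) as one atomic read-modify-write on a single row."""
        row, lock = self._row_and_lock(row_key)
        with lock:
            mutate(row)

    def read_row(self, row_key):
        """Return a consistent snapshot of one row."""
        row, lock = self._row_and_lock(row_key)
        with lock:
            return dict(row)


def increment_hits(row):
    row["stats:hits"] = row.get("stats:hits", 0) + 1


def worker(store, row_key, times):
    for _ in range(times):
        store.mutate_row(row_key, increment_hits)


store = RowStore()
threads = [threading.Thread(target=worker, args=(store, "com.cnn.www", 500))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
hits = store.read_row("com.cnn.www")["stats:hits"]
```

Nothing here coordinates changes across two different rows, which mirrors the limitation stated above: atomicity stops at the row boundary.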
  96. Architecture
      - Implemented atop GFS
      - Multiple tablet servers and a single master
  100. HBase
      - Open source clone of BigTable in Java
      - Implemented atop HDFS
      - HBase can be the source and/or the sink of Hadoop jobs
      - Facebook Chat is implemented using HBase
  101. Outline
      1 Amazon’s Dynamo
      2 MongoDB
      3 Google BigTable
      4 Cassandra
  104. Introduction (Cassandra)
      - Borrows concepts from both Dynamo and BigTable
      - Originally developed by Facebook but now an Apache open source project
      - Designed for Facebook's Inbox Search feature, to efficiently store, index, and search messages
  108. Design Goals
      - Processing of a large amount of data
      - Highly scalable
      - Reliability at a massive scale
      - High-throughput writes without sacrificing read efficiency
  114. Data Model
      - A table is a distributed multidimensional map indexed by a key
      - Rows are identified by a string key, and operations over them are atomic per replica regardless of the number of columns
      - Column families encapsulate columns and super columns
      - Columns have a name and store a number of values per row, each with a timestamp
      - Super columns are columns with sub columns
      - Only three operations: get, insert, and delete
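The nested-map view of this data model can be sketched as follows. `ColumnFamilyStore` is a hypothetical single-node toy, not Cassandra's client API, and as a simplification it keeps only the latest value per column instead of several timestamped values.

```python
import time


class ColumnFamilyStore:
    """Toy map: row key -> column family -> [super column ->] column -> (value, timestamp)."""

    def __init__(self):
        self._rows = {}

    def _columns(self, key, family, super_column, create):
        """Walk down the nested map, creating levels only when asked to."""
        step = dict.setdefault if create else dict.get
        columns = step(step(self._rows, key, {}), family, {})
        if super_column is not None:
            columns = step(columns, super_column, {})
        return columns

    def insert(self, key, family, column, value, super_column=None, timestamp=None):
        ts = time.time_ns() if timestamp is None else timestamp
        self._columns(key, family, super_column, create=True)[column] = (value, ts)

    def get(self, key, family, column, super_column=None):
        return self._columns(key, family, super_column, create=False).get(column)

    def delete(self, key, family, column, super_column=None):
        self._columns(key, family, super_column, create=False).pop(column, None)


store = ColumnFamilyStore()

# A plain column inside a column family.
store.insert("user42", "profile", "name", "Alice", timestamp=1)

# A super column ("msg-1001") holding sub columns.
store.insert("user42", "inbox", "from", "bob", super_column="msg-1001", timestamp=2)
store.insert("user42", "inbox", "subject", "hello", super_column="msg-1001", timestamp=2)

name = store.get("user42", "profile", "name")
subject = store.get("user42", "inbox", "subject", super_column="msg-1001")
store.delete("user42", "inbox", "subject", super_column="msg-1001")
deleted = store.get("user42", "inbox", "subject", super_column="msg-1001")
```

The whole interface is the three operations named above; anything richer, such as a search over messages, has to be modelled in how keys, column families, and super columns are laid out.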
  115. References
      1 NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf