12: NoSQL in Action
Zubair Nabi
zubair.nabi@itu.edu.pk
April 20, 2013
Zubair Nabi 12: NoSQL in Action April 20, 2013 1 / 33
Outline
1 Amazon’s Dynamo
2 MongoDB
3 Google BigTable
4 Cassandra
Amazon’s Dynamo
Introduction
Dynamo was at the forefront of the NoSQL movement and has influenced the
design of many subsequent systems
Design considerations are two-fold: 1) Infrastructure and 2) Business
Infrastructure Considerations
Tens of thousands of servers and network elements distributed across
the globe
Commodity off-the-shelf hardware
Failure is normal
Hundreds of services, all decentralized and loosely coupled
Business Considerations
Strict, internal SLAs regarding performance, reliability, and efficiency
Reliability is of paramount importance because an outage means a loss
of revenue and customer trust
The platform needs to be highly scalable to support continuous growth
Most services, such as best-seller lists and shopping carts, only store
and retrieve data by primary key
No need for the complex querying and management features of an RDBMS
Design
1 Implemented as a partitioned system with replication and consistency
windows
2 Targets applications that can operate with weaker consistency
3 Gives high availability in return
4 Writes remain possible even when replicas are partitioned from one
another
5 Always writeable, so conflict resolution needs to happen during reads
Conflict Resolution
The datastore itself can only perform simple conflict resolution
Dynamo therefore passes the buck to the application
The application is aware of the data schema and hence better suited to
choose a conflict resolution mechanism
If the application does not want to implement conflict resolution, simple
mechanisms such as “last write wins” are provided by the framework
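The two options above can be sketched as follows. This is only an illustration, assuming each conflicting version of a shopping cart is a (timestamp, items) pair; the tuple layout and both function names are hypothetical and not part of Dynamo's API.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical layout: (write timestamp, items in the cart).
Version = Tuple[float, FrozenSet[str]]


def last_write_wins(versions: List[Version]) -> Version:
    """Framework-style resolution: keep only the newest version."""
    return max(versions, key=lambda v: v[0])


def merge_carts(versions: List[Version]) -> Version:
    """Application-level resolution: union all items so no addition is lost."""
    newest = max(ts for ts, _ in versions)
    items = frozenset().union(*(cart for _, cart in versions))
    return (newest, items)


conflicting = [
    (10.0, frozenset({"book", "pen"})),
    (12.0, frozenset({"book", "lamp"})),
]
lww = last_write_wins(conflicting)
merged = merge_carts(conflicting)
```

Here "last write wins" silently drops the pen, while the schema-aware merge keeps it. The price of the merge is that an item deleted in one version can reappear in the merged cart.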
Interface
1 Simple key/value interface storing values as BLOBs
2 Operations limited to one key/value pair at a time
3 No support for hierarchical namespaces (like those in filesystems)
Node Assignment
Completely decentralized so all nodes have equal responsibilities
As nodes can be heterogeneous, work is distributed in proportion to the
capabilities of each node
Operations
Provides two operations:
1 get(key), returns a list of objects and a context
2 put(key, context, object)
get returns more than one object when conflicting versions of the value
exist
The context contains system metadata such as the object version
Keys and values are stored as an array of bytes, and only interpreted
by the application
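A toy, single-process sketch of the get/put contract follows. It assumes the context is simply the set of version ids the caller has read; a real Dynamo context carries richer version metadata, and the class name ToyStore and its internals are hypothetical.

```python
import itertools
from typing import Dict, FrozenSet, List, Tuple


class ToyStore:
    """In-memory stand-in for the get/put interface (not Dynamo itself)."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[int, bytes]] = {}  # key -> {version id: value}
        self._ids = itertools.count(1)

    def get(self, key: str) -> Tuple[List[bytes], FrozenSet[int]]:
        """Return every live version plus a context naming those versions."""
        versions = self._versions.get(key, {})
        return list(versions.values()), frozenset(versions)

    def put(self, key: str, context: FrozenSet[int], value: bytes) -> None:
        """Supersede the versions named in the context; others stay as siblings."""
        versions = self._versions.setdefault(key, {})
        for version_id in context:
            versions.pop(version_id, None)
        versions[next(self._ids)] = value


store = ToyStore()
store.put("cart", frozenset(), b"book")
_, ctx = store.get("cart")
# Two writers update the same version concurrently, using the same context.
store.put("cart", ctx, b"book,pen")
store.put("cart", ctx, b"book,lamp")
siblings, ctx2 = store.get("cart")
# The application reconciles the siblings and writes the result back.
store.put("cart", ctx2, b"book,lamp,pen")
resolved, _ = store.get("cart")
```

Because the second writer's context does not mention the first writer's new version, both values survive until a later put that has seen both replaces them.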
Partitioning
MD5 hash of keys determines their storage nodes
Consistent hashing to provide incremental scalability
Partitioning done across virtual nodes instead of physical ones to take
hardware heterogeneity into account
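A minimal sketch of consistent hashing with virtual nodes, assuming that a more capable machine is simply given more positions on the ring. The class name HashRing, the "node#i" labelling of virtual nodes, and the vnode counts are illustrative assumptions, not Dynamo's actual implementation.

```python
import bisect
import hashlib
from typing import Dict, List


def md5_position(label: str) -> int:
    """Map a string to a position on the ring using its 128-bit MD5 hash."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self) -> None:
        self._positions: List[int] = []   # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> physical node

    def add_node(self, node: str, vnodes: int) -> None:
        """Place `vnodes` virtual nodes for one physical node on the ring."""
        for i in range(vnodes):
            pos = md5_position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next virtual node."""
        pos = md5_position(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = HashRing()
ring.add_node("A", vnodes=8)
ring.add_node("B", vnodes=16)  # assumed to be twice as capable as A
keys = [f"user-{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("C", vnodes=8)
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Adding node C only takes keys over from its ring neighbours; every key that moves ends up on C and no key is shuffled between A and B, which is what makes scaling incremental.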
MongoDB
Introduction
Schemaless document database written in C++
Used by a large number of organizations including SourceForge.net,
foursquare, the New York Times, bit.ly, Craigslist, SAP, MTV, EA
Sports, GitHub, etc.
Databases are distributed over multiple servers
Databases and Collections
Databases contain collections (“named groupings”) of documents
Documents within a collection might be heterogeneous
But a good strategy is to create a database collection for each object
type
A collection is created automatically when the first document is
inserted into it
Hierarchical Namespaces
Collections can be organized into a hierarchical structure using
dot-notation
For instance, the collections wiki.articles, wiki.categories
and wiki.authors exist within the namespace wiki
The collection namespace itself is flat; the hierarchy is only an
organizational aid for the user
Documents
Unit of data storage
Conceptually similar to an XML document, JSON document, etc.
Documents are persisted in Binary JSON (BSON)
Easy to convert between BSON and JSON and between BSON and
other programming language structures
Possible to insert (insert), search (find), and update a document
(save)
Datatypes
Scalar: boolean, integer, double
Character sequence: string, code, etc.
BSON-objects: object
Object ID: To identify documents within a collection
Misc: null, array, date
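To make the datatypes concrete, here is a sketch of one document written as a plain Python dictionary and shown in its JSON view. It uses only the standard library, so it does not produce BSON; the field names and the "_id" value are made up for illustration, and a real Object ID would be generated by the database driver.

```python
import json
from datetime import datetime, timezone

article = {
    "_id": "hypothetical-id-1",        # stand-in for an Object ID
    "title": "NoSQL in Action",        # string
    "views": 42,                       # integer
    "rating": 4.5,                     # double
    "published": True,                 # boolean
    "subtitle": None,                  # null
    "tags": ["nosql", "mongodb"],      # array
    "author": {"name": "A. Writer"},   # embedded object
    "created": datetime(2013, 4, 20, tzinfo=timezone.utc),  # date
}


def to_json(doc: dict) -> str:
    """Serialize a document; JSON has no date type, so dates become ISO strings."""
    return json.dumps(doc, default=lambda d: d.isoformat(), sort_keys=True)


round_tripped = json.loads(to_json(article))
```

The round trip shows one reason a binary format is useful: in plain JSON the date comes back as a string and its type is lost.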
References
No mechanism for foreign keys
References between documents need to be resolved by client
applications
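A sketch of what client-side resolution looks like, with plain dictionaries standing in for two collections. The collection contents, the "author_id" field, and the function name are assumptions made for the example.

```python
from typing import Dict, Optional

# Plain dictionaries stand in for two collections, keyed by document id.
authors: Dict[str, dict] = {"a1": {"_id": "a1", "name": "Ada"}}
articles: Dict[str, dict] = {
    "p1": {"_id": "p1", "title": "Consistent hashing", "author_id": "a1"},
    "p2": {"_id": "p2", "title": "Orphaned post", "author_id": "a9"},
}


def resolve_author(article: dict) -> Optional[dict]:
    """Follow the manual reference with a second lookup done by the client."""
    return authors.get(article["author_id"])


known = resolve_author(articles["p1"])
dangling = resolve_author(articles["p2"])
```

Since nothing enforces the reference, a dangling id such as "a9" is possible and the client has to handle the missing author itself.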
Transaction Properties
Atomicity only for update and delete operations
Allows code to be executed locally on database nodes (server-side
code execution)
Three different strategies for server-side execution:
1 Execution of arbitrary code on a single node via eval operator
2 Aggregation via count, group, and distinct
3 MapReduce code execution on multiple nodes
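The third strategy can be illustrated with a local simulation of MapReduce in which each inner list plays the role of the documents stored on one node. This is a sketch of the programming model only; it does not use MongoDB's own API, and the documents and function names are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each inner list stands in for the documents held by one node.
shards: List[List[dict]] = [
    [{"tags": ["nosql", "mongodb"]}, {"tags": ["nosql"]}],
    [{"tags": ["mongodb", "bson"]}],
]


def map_doc(doc: dict) -> Iterable[Tuple[str, int]]:
    """Emit one (tag, 1) pair per tag in the document."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_counts(key: str, values: List[int]) -> int:
    """Combine all values emitted for one key."""
    return sum(values)


def map_reduce(nodes: List[List[dict]]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for node_docs in nodes:          # the map phase would run on each node
        for doc in node_docs:
            for key, value in map_doc(doc):
                grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


tag_counts = map_reduce(shards)
```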
Google BigTable
Introduction
Supports a relaxed relational model that is dynamically controlled by
the clients
Clients can reason about the locality properties of the data
Data indexing can be row-wise as well as column-wise
Data can be delivered either out of memory or from disk
Used internally by Google for more than 60 projects including Google
Earth, Google Analytics, Orkut, and Google Docs
Data Model
Values stored as arrays of bytes which need to be interpreted by the
clients
Values are addressed by a 3-tuple (row-key, column-key,
timestamp)
Row keys are strings of up to 64KB
Rows are maintained in lexicographic order and are dynamically
partitioned into tablets
Tablets are the unit of distribution and load balancing
Reads can be made efficient (only having to access a small number of
servers) by wisely choosing row keys
Row ranges with small lexicographic distances are partitioned into fewer
tablets
For instance, storing URLs with their hostname components reversed:
com.cnn.blogs, com.cnn.www, etc.
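The effect of reversed hostnames on row ordering can be seen in a small sketch. The helper name row_key and the sample URLs are assumptions; appending the path to the reversed host is one possible key layout, chosen here for illustration.

```python
from urllib.parse import urlsplit


def row_key(url: str) -> str:
    """Build a row key with the hostname components reversed, then the path."""
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + parts.path


urls = [
    "http://www.cnn.com/world",
    "http://www.bbc.co.uk/news",
    "http://blogs.cnn.com/tech",
    "http://www.cnn.com/",
]
# Rows are kept in lexicographic order, so sorting mimics the on-disk layout.
sorted_keys = sorted(row_key(u) for u in urls)
```

All cnn.com pages now sort next to each other, so a scan over that domain touches one contiguous row range instead of rows scattered across the table.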
Columns
No limit on the number of columns per table
Columns are grouped into sets called column families based on their key
prefix
Column families are the basic unit of access control
A family is expected to store the same or similar type of data so that
it can be compressed
A family needs to be created before data can be stored in any of its
columns
Timestamps
64-bit integers that represent different versions of a cell value
Value assigned by either the datastore or the client
Versions of a cell are ordered by decreasing timestamp, so the most
recent one is read first
Automatic garbage collection can be used to remove old revisions
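A toy sketch of versioned cells addressed by (row key, column key, timestamp), assuming a "keep only the newest n versions" garbage-collection policy. The class name ToyTable and its methods are hypothetical and unrelated to any real BigTable client API.

```python
from typing import Dict, List, Optional, Tuple


class ToyTable:
    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        # (row key, column key) -> [(timestamp, value)], newest first
        self._cells: Dict[Tuple[str, str], List[Tuple[int, bytes]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # decreasing timestamp
        del versions[self.max_versions:]                 # drop the oldest revisions

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest value, or the newest one at or before `timestamp`."""
        for ts, value in self._cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = ToyTable(max_versions=2)
table.put("com.cnn.www", "contents:html", 1, b"v1")
table.put("com.cnn.www", "contents:html", 3, b"v3")
table.put("com.cnn.www", "contents:html", 2, b"v2")
latest = table.get("com.cnn.www", "contents:html")
as_of_2 = table.get("com.cnn.www", "contents:html", timestamp=2)
as_of_1 = table.get("com.cnn.www", "contents:html", timestamp=1)
```

With two versions retained, the revision written at timestamp 1 has been garbage-collected, so a read as of timestamp 1 finds nothing.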
API
Read operations for lookup, selection, etc.
Write operations for creation, update, and deletion of values
Operations for the creation and deletion of tables and column families
Administrative operations to modify store configuration and metadata
MapReduce hooks
Transactions are atomic at the single-row level
Architecture
Implemented atop GFS
Multiple tablet servers and a single master
HBase
Open source clone of BigTable in Java
Implemented atop HDFS
HBase can be the source and/or the sink of Hadoop jobs
Facebook Chat implemented using HBase
Cassandra
Introduction
Borrows concepts from both Dynamo and BigTable
Originally developed by Facebook but now an Apache open source
project
Designed for Facebook’s Inbox Search to efficiently store, index, and
search messages
Design Goals
Processing of a large amount of data
Highly scalable
Reliability at a massive scale
High throughput writes without sacrificing read efficiency
Data Model
A table is a distributed multidimensional map indexed by a key
Rows are identified by a string-key and operations over them are
atomic per replica regardless of the number of columns
Column families encapsulate columns and super columns
Columns have a name and store a number of values per row, each with
a timestamp
Super columns are columns with sub columns
Only three operations: get, insert, and delete
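The data model can be pictured as nested maps with the three operations on top. The sketch below is a single-process simplification: it keeps only the newest value per column, leaves out super columns and replication, and its class and method names are hypothetical rather than Cassandra's client API.

```python
import time
from typing import Dict, Optional, Tuple

Cell = Tuple[bytes, float]  # (value, timestamp)


class ToyCassandraTable:
    def __init__(self) -> None:
        # row key -> column family -> column name -> (value, timestamp)
        self._rows: Dict[str, Dict[str, Dict[str, Cell]]] = {}

    def insert(self, key: str, family: str, column: str, value: bytes,
               timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        columns = self._rows.setdefault(key, {}).setdefault(family, {})
        current = columns.get(column)
        if current is None or ts >= current[1]:  # an older write never wins
            columns[column] = (value, ts)

    def get(self, key: str, family: str, column: str) -> Optional[bytes]:
        cell = self._rows.get(key, {}).get(family, {}).get(column)
        return None if cell is None else cell[0]

    def delete(self, key: str, family: str, column: str) -> None:
        self._rows.get(key, {}).get(family, {}).pop(column, None)


inbox = ToyCassandraTable()
inbox.insert("user42", "messages", "subject", b"hello", timestamp=2.0)
inbox.insert("user42", "messages", "subject", b"stale", timestamp=1.0)
subject = inbox.get("user42", "messages", "subject")
inbox.delete("user42", "messages", "subject")
after_delete = inbox.get("user42", "messages", "subject")
```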
References
1 NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf
Zubair Nabi 12: NoSQL in Action April 20, 2013 33 / 33

More Related Content

What's hot

Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...
Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...
Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...
Codemotion
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 Presentations
Ana Rebelo
 
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...
Databricks
 
NOSQL Overview
NOSQL OverviewNOSQL Overview
NOSQL Overview
Tobias Lindaaker
 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph Database
Tobias Lindaaker
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
Itai Yaffe
 
Gobblin @ NerdWallet (Nov 2015)
Gobblin @ NerdWallet (Nov 2015)Gobblin @ NerdWallet (Nov 2015)
Gobblin @ NerdWallet (Nov 2015)
NerdWalletHQ
 
Ensuring Quality in Data Lakes (D&D Meetup Feb 22)
Ensuring Quality in Data Lakes  (D&D Meetup Feb 22)Ensuring Quality in Data Lakes  (D&D Meetup Feb 22)
Ensuring Quality in Data Lakes (D&D Meetup Feb 22)
lakeFS
 
Introduction to SQL++ for Big Data: Same Language, More Power
Introduction to SQL++ for Big Data: Same Language, More PowerIntroduction to SQL++ for Big Data: Same Language, More Power
Introduction to SQL++ for Big Data: Same Language, More Power
All Things Open
 
R2DBC JEEConf 2019 by Igor Lozynskyi
R2DBC JEEConf 2019 by Igor LozynskyiR2DBC JEEConf 2019 by Igor Lozynskyi
R2DBC JEEConf 2019 by Igor Lozynskyi
Igor Lozynskyi
 
CERN’s Next Generation Data Analysis Platform with Apache Spark with Enric Te...
CERN’s Next Generation Data Analysis Platform with Apache Spark with Enric Te...CERN’s Next Generation Data Analysis Platform with Apache Spark with Enric Te...
CERN’s Next Generation Data Analysis Platform with Apache Spark with Enric Te...
Databricks
 
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy NguyenGrokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Huy Nguyen
 
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
Lucidworks
 
Choosing the right NOSQL database
Choosing the right NOSQL databaseChoosing the right NOSQL database
Choosing the right NOSQL database
Tobias Lindaaker
 
Flink Community Update December 2015: Year in Review
Flink Community Update December 2015: Year in ReviewFlink Community Update December 2015: Year in Review
Flink Community Update December 2015: Year in Review
Robert Metzger
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Streaming ETL to Elastic with Apache Kafka and KSQL
Streaming ETL to Elastic with Apache Kafka and KSQLStreaming ETL to Elastic with Apache Kafka and KSQL
Streaming ETL to Elastic with Apache Kafka and KSQL
confluent
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and cons
Fabio Fumarola
 
MongoDB Europe 2016 - Using MongoDB to Build a Fast and Scalable Content Repo...
MongoDB Europe 2016 - Using MongoDB to Build a Fast and Scalable Content Repo...MongoDB Europe 2016 - Using MongoDB to Build a Fast and Scalable Content Repo...
MongoDB Europe 2016 - Using MongoDB to Build a Fast and Scalable Content Repo...
MongoDB
 

Topic 12: NoSQL in Action

  • 1. 12: NoSQL in Action Zubair Nabi zubair.nabi@itu.edu.pk April 20, 2013 Zubair Nabi 12: NoSQL in Action April 20, 2013 1 / 33
  • 2. Outline
      1. Amazon’s Dynamo
      2. MongoDB
      3. Google BigTable
      4. Cassandra
  • 5. Introduction
      ◦ Dynamo was at the forefront of the NoSQL movement and has influenced the design of many subsequent systems
      ◦ Its design considerations are two-fold: 1) infrastructure and 2) business
  • 8. Infrastructure Considerations
      ◦ Tens of thousands of servers and network elements distributed across the globe
      ◦ Commodity off-the-shelf hardware
      ◦ Failure is normal
      ◦ Hundreds of services, all decentralized and loosely coupled
  • 13. Business Considerations
      ◦ Strict internal SLAs regarding performance, reliability, and efficiency
      ◦ Reliability is of paramount importance because an outage means a loss of revenue and customer trust
      ◦ The platform needs to be highly scalable to support continuous growth
      ◦ Most services, such as best-seller lists and shopping carts, only store and retrieve data by primary key
      ◦ No need for the complex querying and management functionality afforded by an RDBMS
  • 18. Design 1 Implemented as a partitioned system with replication and consistency windows 2 Targets applications that require weaker consistency 3 Gives high availability 4 Possibility for write operations even in the presence of partitioning amongst replicas 5 Always writeable so conflict resolution needs to happen during reads Zubair Nabi 12: NoSQL in Action April 20, 2013 7 / 33
  • 22. Conflict Resolution: A datastore can only perform simple conflict resolution, so Dynamo passes the buck to the application. The application is aware of the data schema and is hence better suited to choose a conflict resolution mechanism. If the application does not want to implement conflict resolution, simple mechanisms such as “last write wins” are provided by the framework.
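The difference between a datastore-level policy and an application-level policy can be sketched for a shopping cart. This is only an illustration: the function names `last_write_wins` and `merge_carts` and the `(timestamp, items)` version shape are assumptions, not part of Dynamo's interface.

```python
from typing import FrozenSet, List, Tuple

# A version is (timestamp, items); this shape is assumed for illustration.
Version = Tuple[int, FrozenSet[str]]


def last_write_wins(versions: List[Version]) -> FrozenSet[str]:
    """Datastore-style resolution: keep only the newest version."""
    return max(versions, key=lambda v: v[0])[1]


def merge_carts(versions: List[Version]) -> FrozenSet[str]:
    """Application-style resolution: union the carts so no added item is lost."""
    merged: FrozenSet[str] = frozenset()
    for _, items in versions:
        merged |= items
    return merged


conflicting = [
    (1, frozenset({"book", "pen"})),
    (2, frozenset({"book", "lamp"})),
]
print(sorted(last_write_wins(conflicting)))  # ['book', 'lamp']
print(sorted(merge_carts(conflicting)))      # ['book', 'lamp', 'pen']
```

Only the application knows that a cart is a set of items for which a union is a sensible merge; the generic policy silently drops the pen.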
  • 25. Interface: (1) Simple key/value interface storing values as BLOBs. (2) Operations limited to one key/value pair at a time. (3) No support for hierarchical namespaces (like those in filesystems).
  • 27. Node Assignment: Completely decentralized, so all nodes have equal responsibilities. As nodes can be heterogeneous, work is distributed in proportion to the capabilities of each node.
  • 33. Operations: Provides two operations: (1) get(key), which returns a list of objects and a context; (2) put(key, context, object). get can return more than one object if there are conflicting versions. The context contains system metadata such as the object version. Keys and values are stored as arrays of bytes and are only interpreted by the application.
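A minimal single-process sketch of how get and put interact through the context. The class `ToyStore`, the integer version ids, and the use of a frozenset as the context are all assumptions made for illustration; they simplify whatever version metadata the real system carries.

```python
from typing import Dict, FrozenSet, List, Tuple


class ToyStore:
    """In-memory stand-in for a Dynamo-like store (hypothetical, single process)."""

    def __init__(self) -> None:
        self._data: Dict[bytes, Dict[int, bytes]] = {}  # key -> {version id: value}
        self._next_version = 1

    def get(self, key: bytes) -> Tuple[List[bytes], FrozenSet[int]]:
        """Return all current versions plus an opaque context (the ids seen)."""
        versions = self._data.get(key, {})
        return list(versions.values()), frozenset(versions)

    def put(self, key: bytes, context: FrozenSet[int], obj: bytes) -> None:
        """Store obj, superseding only the versions named in the context."""
        versions = self._data.setdefault(key, {})
        for seen in context:
            versions.pop(seen, None)
        versions[self._next_version] = obj
        self._next_version += 1


store = ToyStore()
store.put(b"cart", frozenset(), b"book")
_, ctx = store.get(b"cart")
# Two clients write using the same context, so their writes conflict.
store.put(b"cart", ctx, b"book,pen")
store.put(b"cart", ctx, b"book,lamp")
objects, ctx = store.get(b"cart")
print(objects)  # [b'book,pen', b'book,lamp']
# The application reconciles and writes back with the latest context.
store.put(b"cart", ctx, b"book,lamp,pen")
print(store.get(b"cart")[0])  # [b'book,lamp,pen']
```

The store never inspects the byte values; it only uses the context to decide which stored versions a write replaces.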
  • 36. Partitioning: The MD5 hash of a key determines its storage nodes. Consistent hashing provides incremental scalability. Partitioning is done across virtual nodes instead of physical ones to take hardware heterogeneity into account.
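The partitioning scheme can be sketched as a hash ring in which each physical node owns several virtual nodes. The class `HashRing`, the node names, and the `node#i` labelling of virtual nodes are assumptions for illustration, and the sketch returns a single owner per key rather than a replica list.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def md5_position(label: str) -> int:
    """Map a string to a position on the ring using its MD5 digest."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, capacities: Dict[str, int]) -> None:
        # capacities maps a physical node to its number of virtual nodes
        ring: List[Tuple[int, str]] = []
        for node, vnodes in capacities.items():
            for i in range(vnodes):
                ring.append((md5_position(f"{node}#{i}"), node))
        ring.sort()
        self._positions = [pos for pos, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next virtual node."""
        index = bisect.bisect_right(self._positions, md5_position(key))
        return self._owners[index % len(self._owners)]


ring = HashRing({"small-node": 8, "big-node": 24})
counts = {"small-node": 0, "big-node": 0}
for i in range(10_000):
    counts[ring.node_for(f"key-{i}")] += 1
print(counts)  # big-node is expected to own roughly three times as many keys
```

Giving a more capable machine more virtual nodes is how the ring accounts for heterogeneity. Adding a node is incremental because a key either stays with its previous owner or moves to the new node; no keys are shuffled between existing nodes.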
  • 37. Outline: (1) Amazon’s Dynamo (2) MongoDB (3) Google BigTable (4) Cassandra
  • 40. Introduction: Schemaless document database written in C++. Used by a large number of organizations, including SourceForge.net, foursquare, the New York Times, bit.ly, Craigslist, SAP, MTV, EA Sports, and GitHub. Databases are distributed over multiple servers.
  • 44. Databases and Collections: Databases contain collections (“named groupings”) of documents. Documents within a collection may be heterogeneous, but a good strategy is to create a collection for each object type. A collection is created automatically when its first document is inserted.
  • 47. Hierarchical Namespaces: Collections can be organized into a hierarchical structure using dot notation. For instance, the collections wiki.articles, wiki.categories, and wiki.authors exist within the namespace wiki. The collection namespace itself is flat; the hierarchical structure exists only for the user.
  • 52. Documents: The unit of data storage. Conceptually similar to an XML or JSON document. Documents are persisted in Binary JSON (BSON). It is easy to convert between BSON and JSON, and between BSON and the native data structures of programming languages. Documents can be inserted (insert), searched (find), and updated (save).
  • 57. Datatypes: Scalar: boolean, integer, double. Character sequence: string, code, etc. BSON objects: object. Object ID: identifies documents within a collection. Misc: null, array, date.
  • 59. References: No mechanism for foreign keys. References between documents need to be resolved by client applications.
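What client-side resolution means in practice can be sketched with plain dictionaries standing in for two collections. The field name `author_id`, the sample data, and the helper `resolve_author` are hypothetical; no database driver is involved.

```python
from typing import Any, Dict, List

# Two "collections" modelled as plain Python data (hypothetical sample data).
authors: Dict[int, Dict[str, Any]] = {
    1: {"_id": 1, "name": "Ada"},
    2: {"_id": 2, "name": "Grace"},
}
articles: List[Dict[str, Any]] = [
    {"_id": 10, "title": "NoSQL basics", "author_id": 1},
    {"_id": 11, "title": "Column stores", "author_id": 2},
    {"_id": 12, "title": "Orphaned draft", "author_id": 99},
]


def resolve_author(article: Dict[str, Any]) -> Dict[str, Any]:
    """Follow the manual reference; the database enforces nothing here."""
    resolved = dict(article)
    resolved["author"] = authors.get(article["author_id"])  # None if dangling
    return resolved


for article in articles:
    doc = resolve_author(article)
    name = doc["author"]["name"] if doc["author"] else "<missing>"
    print(f'{doc["title"]}: {name}')
```

Because nothing like a foreign-key constraint exists, the client also has to decide what to do with dangling references such as the third article.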
  • 65. Transaction Properties: Atomicity only for update and delete operations. Allows code to be executed locally on database nodes (server-side code execution). Three different strategies for server-side execution: (1) execution of arbitrary code on a single node via the eval operator; (2) aggregation via count, group, and distinct; (3) MapReduce code execution on multiple nodes.
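The MapReduce strategy can be illustrated with a toy tag count over documents spread across several shards. This is a pure-Python sketch of the map and reduce steps, not the database's own MapReduce interface; the functions `map_tags`, `reduce_counts`, and `map_reduce` and the sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Document = Dict[str, Any]


def map_tags(doc: Document) -> Iterator[Tuple[str, int]]:
    """Map step: emit (tag, 1) for every tag in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(values: Iterable[int]) -> int:
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(shards: List[List[Document]]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for shard in shards:  # each shard stands in for one database node
        for doc in shard:
            for key, value in map_tags(doc):
                grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


shards = [
    [{"title": "a", "tags": ["nosql", "mongodb"]}],
    [{"title": "b", "tags": ["nosql"]}, {"title": "c"}],
]
print(map_reduce(shards))  # {'nosql': 2, 'mongodb': 1}
```

Document "c" has no tags field at all, which is fine in a schemaless collection; the map step simply emits nothing for it.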
  • 66. Outline: (1) Amazon’s Dynamo (2) MongoDB (3) Google BigTable (4) Cassandra
  • 71. Introduction: Supports a relaxed relational model that is dynamically controlled by the clients. Clients can reason about the locality properties of the data. Data indexing can be row-wise as well as column-wise. Data can be delivered either out of memory or from disk. Used internally by Google for more than 60 projects, including Google Earth, Google Analytics, Orkut, and Google Docs.
  • 79. Data Model: Values are stored as arrays of bytes, which need to be interpreted by the clients. Values are addressed by a 3-tuple (row-key, column-key, timestamp). Row keys are strings of up to 64 KB. Rows are maintained in lexicographic order and are dynamically partitioned into tablets, the unit of distribution and load balancing. Reads can be made efficient (only having to access a small number of servers) by choosing row keys wisely, because row ranges with small lexicographic distances are partitioned into fewer tablets. For instance, URLs can be stored with their hostname components reversed: com.cnn.blogs, com.cnn.www, etc.
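The effect of reversing hostname components can be seen by sorting a handful of keys. The helper `row_key` and the sample hostnames are illustrative assumptions.

```python
from typing import List


def row_key(hostname: str) -> str:
    """Reverse the hostname components: 'www.cnn.com' -> 'com.cnn.www'."""
    return ".".join(reversed(hostname.split(".")))


hosts: List[str] = ["www.cnn.com", "www.bbc.co.uk", "blogs.cnn.com", "news.bbc.co.uk"]

# Plain hostnames: pages of the same site are scattered in sort order.
print(sorted(hosts))
# ['blogs.cnn.com', 'news.bbc.co.uk', 'www.bbc.co.uk', 'www.cnn.com']

# Reversed keys: pages of the same site sort next to each other.
print(sorted(row_key(h) for h in hosts))
# ['com.cnn.blogs', 'com.cnn.www', 'uk.co.bbc.news', 'uk.co.bbc.www']
```

Since rows are split into tablets by lexicographic range, adjacent keys are likely to sit in the same tablet, so a scan over one site touches few servers.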
  • 84. Columns: No limit on the number of columns per table. Columns are grouped into sets called column families based on their key prefix. A column family is the basic unit of access control, is expected to store the same or a similar type of data so that it can be compressed, and needs to be created before data can be stored in any of its columns.
  • 88. Timestamps: 64-bit integers that represent different versions of a cell value. The value is assigned by either the datastore or the client. Versions of a cell are kept in decreasing order of timestamp. Automatic garbage collection can be used to remove old revisions.
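Versioned cells addressed by (row-key, column-key, timestamp) can be sketched with a small in-memory table. The class `ToyTable`, the `max_versions` parameter, and the "keep the newest N" garbage-collection policy are assumptions chosen for illustration; the column key `contents:` is just an example name.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (row key, column key)


class ToyTable:
    def __init__(self, max_versions: int = 2) -> None:
        self._cells: Dict[Cell, List[Tuple[int, bytes]]] = {}
        self._max_versions = max_versions

    def write(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        del versions[self._max_versions:]  # garbage-collect older revisions

    def read(self, row: str, column: str) -> List[Tuple[int, bytes]]:
        """Return the retained versions of a cell, newest first."""
        return list(self._cells.get((row, column), []))


table = ToyTable(max_versions=2)
for ts, html in [(1, b"v1"), (3, b"v3"), (2, b"v2")]:
    table.write("com.cnn.www", "contents:", ts, html)
print(table.read("com.cnn.www", "contents:"))  # [(3, b'v3'), (2, b'v2')]
```

Keeping versions newest-first means the most recent value is always the first one read, regardless of the order in which writes arrive.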
  • 94. API: Read operations for lookup, selection, etc. Write operations for creation, update, and deletion of values. Write operations for creating and deleting tables and column families. Administrative operations to modify store configuration and metadata. MapReduce hooks. Transactions are atomic at the single-row level.
  • 96. Architecture: Implemented atop GFS. Multiple tablet servers and a single master.
  • 100. HBase: Open source clone of BigTable in Java. Implemented atop HDFS. HBase can be the source and/or the sink of Hadoop jobs. Facebook Chat is implemented using HBase.
  • 101. Outline: (1) Amazon’s Dynamo (2) MongoDB (3) Google BigTable (4) Cassandra
  • 104. Introduction: Borrows concepts from both Dynamo and BigTable. Originally developed by Facebook but now an Apache open source project. Designed for Facebook's Inbox Search, to efficiently store, index, and search messages.
  • 108. Design Goals: Processing of a large amount of data. Highly scalable. Reliability at a massive scale. High-throughput writes without sacrificing read efficiency.
  • 114. Data Model: A table is a distributed multidimensional map indexed by a key. Rows are identified by a string key, and operations over them are atomic per replica regardless of the number of columns. Column families encapsulate columns and super columns. Columns have a name and store a number of values per row, each with a timestamp. Super columns are columns with sub-columns. Only three operations: get, insert, and delete.
  • 115. References: (1) NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf