10: Taxonomy of Data and Storage
Zubair Nabi
zubair.nabi@itu.edu.pk
April 20, 2013
Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 1 / 27
Outline
1 Datasets
2 Storage
3 Beyond RDBMS
4 NoSQL Taxonomy
5 NewSQL
Introduction
Data is everywhere and is the driving force behind our lives
The address book on your phone is data
So is the newspaper that you read every morning
Everything you see around you is a potential source of data which
might be useful for a certain application
We use this data to share information and make a more informed
decision about different events
Datasets can easily be classified on the basis of their structure
1 Structured
2 Unstructured
3 Semi-structured
Structured Data
Formatted in a universally understandable and identifiable way
In most cases, structured data is formally specified by a schema
Your phone's address book is structured because it has a schema
consisting of name, phone number, address, email address, etc.
Most traditional databases contain structured data, laid out
across columns and rows
Each field also has an associated type
Possible to search for items based on their data types
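The address-book example above can be sketched as a typed table. This is a minimal illustration, assuming an in-memory SQLite database; the `contacts` table, its column names, the `age` field, and the sample rows are hypothetical and not part of the original slides.

```python
import sqlite3

# Hypothetical schema for the address-book example: every field is named
# and has a declared type.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE contacts (
        name    TEXT NOT NULL,
        phone   TEXT NOT NULL,
        address TEXT,
        email   TEXT,
        age     INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?)",
    [
        ("Ayesha", "555-0101", "Lahore", "ayesha@example.com", 31),
        ("Bilal", "555-0102", "Karachi", None, 27),
    ],
)

# Because each field is named and typed, a search can target one field
# and compare it as a number rather than as text.
older_than_30 = [
    row[0] for row in conn.execute("SELECT name FROM contacts WHERE age > 30")
]
print(older_than_30)
```

The schema is what makes the numeric comparison on `age` meaningful; without it the value would just be a run of characters.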
Unstructured Data
Data without any conceptual definition or type
Can vary from raw text to binary data
Processing unstructured data requires parsing and tagging on the fly
In most cases, consists of simple log files
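Parsing and tagging on the fly might look roughly like the sketch below. The log lines, the line format, and the field names (`date`, `time`, `level`, `message`) are assumptions made for illustration only.

```python
import re

# Hypothetical raw log lines; nothing in the data declares fields or types.
RAW_LOG = """\
2013-04-20 10:15:02 ERROR disk full on node7
2013-04-20 10:15:09 INFO retrying write
garbage line without the expected shape
"""

# The structure lives in the parser, not in the data.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_lines(text):
    """Yield a tagged dict for each line that matches; skip the rest."""
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            yield match.groupdict()


records = list(parse_lines(RAW_LOG))
errors = [r["message"] for r in records if r["level"] == "ERROR"]
print(errors)
```

Lines that do not match the assumed pattern are silently dropped here; a real pipeline would have to decide what to do with them.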
Semi-structured Data
Occupies the space between the structured and unstructured ends of the
data spectrum
For instance, while binary data has no structure, audio and video files
have metadata which has structure, such as author, time of creation,
etc.
Can also be labelled as self-describing structure
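A small sketch of the "self-describing" idea, assuming the metadata of a media file is available as JSON. The field names, values, and payload bytes are hypothetical.

```python
import json

# Hypothetical media record: the payload is opaque bytes, while the metadata
# is self-describing (field names travel together with the values).
metadata_json = '{"author": "A. Author", "created": "2013-04-20", "codec": "mp3"}'
payload = bytes([0x49, 0x44, 0x33, 0x03])  # stand-in for binary audio data

metadata = json.loads(metadata_json)

# No external schema is needed to discover which fields are present.
fields = sorted(metadata.keys())
print(fields)
```

The payload stays uninterpreted, which is what places such a record between the structured and unstructured cases.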
Database Management Systems (DBMS)
Used to store and manage data
Support for large amounts of data
Ensure concurrency, sharing, and locking
Security is useful too, enabling fine-grained access control
Ability to keep working in the face of failure
Relational Database Management Systems (RDBMS)
The most popular and predominant storage system in use
Data in different files is connected by using a key field
Data is laid out in different tables, with a key field that identifies each
row
The same key field is used to connect one table to another
For instance, a relation might have customer ID as key and her details
as data; another table might have the same key but different data, say
her purchases; yet another table with the same key might have a
breakdown of her preferences
Examples include Oracle Database, MS SQL Server, MySQL, IBM
DB2, and Teradata
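The customer example above can be sketched with three tables that share one key. This is an illustration using an in-memory SQLite database; the table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases   (customer_id INTEGER, item TEXT);
    CREATE TABLE preferences (customer_id INTEGER, category TEXT);

    INSERT INTO customers   VALUES (1, 'Sara'), (2, 'Omar');
    INSERT INTO purchases   VALUES (1, 'laptop'), (1, 'mouse'), (2, 'desk');
    INSERT INTO preferences VALUES (1, 'electronics'), (2, 'furniture');
    """
)

# The shared key connects a customer's details, purchases, and preferences.
rows = conn.execute(
    """
    SELECT c.name, p.item, f.category
    FROM customers c
    JOIN purchases   p ON p.customer_id = c.customer_id
    JOIN preferences f ON f.customer_id = c.customer_id
    WHERE c.customer_id = 1
    ORDER BY p.item
    """
).fetchall()
print(rows)
```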
Structured Query Language (SQL)
Non-procedural language used for data retrieval and manipulation in
RDBMS
Adds a layer of abstraction over relational algebra, which enables set
operations, selections, etc.
Due to its declarative nature, users operate in terms of their expected
output while the underlying system decides the actual query execution
plan
Instructions consist of a specific SQL statement and additional
parameters and operands
For instance, the SELECT statement retrieves certain records, INSERT
adds a record, and so on
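A short sketch of the declarative style, assuming an in-memory SQLite database with a hypothetical `books` table. The query states which rows are wanted; `EXPLAIN QUERY PLAN` (a SQLite feature) shows the plan the engine picked for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)

# INSERT: the statement plus its operands, passed as parameters.
conn.execute("INSERT INTO books (title, year) VALUES (?, ?)", ("Book A", 1999))
conn.execute("INSERT INTO books (title, year) VALUES (?, ?)", ("Book B", 2010))

# SELECT: describe the wanted rows; the engine chooses how to find them.
query = "SELECT title FROM books WHERE year > ?"
titles = [row[0] for row in conn.execute(query, (2000,))]

# SQLite can report the plan it chose for the same declarative query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (2000,)).fetchall()
print(titles)
print(plan)
```

The plan output format is specific to SQLite and varies between versions, so it is printed here rather than interpreted.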
RDBMS and Structured Data
As structured data follows a predefined schema, it naturally maps on to
a relational database system
The schema defines the type and structure of the data and its relations
Schema design is an arduous process and needs to be done before
the database can be populated
Another consequence of a strict schema is that it is non-trivial to
extend it
For instance, adding a new attribute to an existing row necessitates
adding a new column to the entire table
Extremely suboptimal in tables with millions of rows
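The schema-extension point can be illustrated as follows, assuming SQLite and a hypothetical `loyalty_tier` attribute that only one customer actually has. How expensive such a change is depends on the engine; the sketch only shows that every row gains the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Sara",), ("Omar",)])

# Only one customer has a loyalty tier, yet the column is added table-wide.
conn.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
conn.execute("UPDATE customers SET loyalty_tier = 'gold' WHERE name = 'Sara'")

rows = conn.execute(
    "SELECT name, loyalty_tier FROM customers ORDER BY customer_id"
).fetchall()
print(rows)
```

Rows that have no value for the new attribute still carry the column, filled with NULL.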
RDBMS and Semi- and Un-structured Data
Unstructured data has no notion of schema while semi-structured data
only has a weak one
Data within such datasets also has an associated type
In fact, types are application-centric: it might be possible to interpret a
field as a float in one application and as a string in another
While it is possible, with human intervention, to glean structure from
unstructured data, it is an extremely expensive task
Structureless data generated by real-time sources can change the
number of attributes and their types on the fly
An RDBMS would require a schema change, or even a new table, each time
such a change takes place
Therefore, unstructured and semi-structured data do not fit the
relational model
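The points about application-centric types and changing attribute sets can be sketched as below. The `key=value` record format and the sensor fields are hypothetical.

```python
# Hypothetical sensor readings arriving as raw key=value text; the set of
# attributes changes from one record to the next.
raw_records = [
    "id=7 temp=21.5",
    "id=8 temp=19.0 humidity=40",
]


def parse(record):
    """Split 'k=v' pairs into a dict of strings; no types are imposed here."""
    return dict(pair.split("=", 1) for pair in record.split())


parsed = [parse(r) for r in raw_records]

# One application reads 'temp' as a float to compute an average ...
average_temp = sum(float(r["temp"]) for r in parsed) / len(parsed)

# ... while another treats the very same field as an opaque string label.
labels = [r["temp"] + "C" for r in parsed]

# The attribute set differs per record, which a fixed table can only
# accommodate through a schema change.
attribute_sets = [sorted(r) for r in parsed]
print(average_temp, labels, attribute_sets)
```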
Motivation
Different semantics:
RDBMS provide ACID semantics:
1 Atomic: The entire transaction either succeeds or fails
2 Consistent: Data within the database remains consistent after each
transaction
3 Isolated: Transactions are sandboxed from each other
4 Durable: Transactions are persistent across failures and restarts
Overkill in the case of most user-facing applications
Most applications are more interested in availability and are willing to
sacrifice consistency, leading to eventual consistency
This basically available, soft state, eventually consistent (BASE) model
enables applications to function even in the face of partial failure
High Throughput: Most NoSQL databases sacrifice consistency for
availability, leading to higher throughput (in some cases by an order of
magnitude)
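As an illustration of the "Atomic" property listed above, here is a minimal sketch using Python's `sqlite3` module. The `accounts` table, the `CHECK` constraint, and the `transfer` helper are hypothetical; the connection is used as a context manager, which commits on success and rolls back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit to `bob` is applied first and then undone when the debit violates the constraint, so no partial transfer is left behind.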
Motivation (2)
Horizontal Scalability: To cater for more data, NoSQL stores can be
scaled out by just adding more machines and the underlying system
automatically re-distributes the data
Commodity Hardware: A large number of RDBMS require specialized
and proprietary hardware for operation. In contrast, NoSQL databases
function over commodity off-the-shelf hardware
Programming Language Support: Over the years programming
languages have started providing abstractions for database support
(LINQ, etc.) while bypassing SQL. NoSQL databases provide
abstractions that directly map onto the language abstractions leading
to tighter coupling
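The horizontal-scalability point above can be sketched with a toy placement function. This is an assumption-laden illustration, not how any particular store works: keys are placed by hash modulo the number of nodes, and the node names are hypothetical.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node by hashing the key (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def place(keys, nodes):
    """Return {node: [keys...]} for the given cluster membership."""
    layout = {node: [] for node in nodes}
    for key in keys:
        layout[node_for(key, nodes)].append(key)
    return layout


keys = [f"user:{i}" for i in range(1000)]
old_nodes = ["node-a", "node-b"]
new_nodes = ["node-a", "node-b", "node-c"]  # one machine added

before = place(keys, old_nodes)
after = place(keys, new_nodes)

# Keys whose owner changed must be shipped to another machine.
moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
print({n: len(v) for n, v in after.items()}, "moved:", moved)
```

Modulo placement tends to relocate a large share of the keys whenever membership changes; schemes such as consistent hashing exist to reduce that movement.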
Motivation (3)
The Rise of Cloud Computing: Cloud Computing applications require
horizontal scalability and low administration overhead. Both
requirements are naturally satisfied by NoSQL stores
Introduction
NoSQL databases can be classified on the basis of:
1 Data Model: How data is represented
2 Scalability: How scalable the system is
3 Query Model: What type of API it exposes
4 Persistence: How persistent the data is
Classification by Data Model
Based on the data model, NoSQL databases can roughly be categorized
into three categories:
1 Key/value Stores: A map/dictionary allowing put/get semantics per
key
2 Document Stores: Complex data structures to encapsulate document
key/value pairs
3 Column-Oriented Stores: Data laid out by column
Key/value Stores
Data is stored within a large hash map
Simple get/put API
Favour scalability over consistency
Limit on the size of the key
Examples include Amazon's Dynamo, LinkedIn's Voldemort, Redis,
and Memcached
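The get/put model described above can be sketched as a toy in-memory class. It is not the API of any of the systems named; the class name, the `MAX_KEY_BYTES` limit, and the sample key are hypothetical.

```python
class KeyValueStore:
    """Toy in-memory key/value store with a put/get API and a key-size limit."""

    MAX_KEY_BYTES = 64  # hypothetical limit; real systems choose their own

    def __init__(self):
        self._data = {}  # a Python dict is itself a hash map

    def put(self, key, value):
        if len(key.encode("utf-8")) > self.MAX_KEY_BYTES:
            raise ValueError("key too large")
        self._data[key] = value  # the value is opaque to the store

    def get(self, key, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
store.put("session:42", b'{"user": "sara"}')
value = store.get("session:42")
missing = store.get("session:99")
print(value, missing)
```

The store never looks inside the value, which is why lookups are possible by key only.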
Document Stores
Key/value semantics but based on documents
A document encapsulates data in a standard format, such as JSON,
XML, PDF, etc.
Documents themselves can be heterogeneous
Documents can also be retrieved based on their content
Examples include Apache CouchDB and MongoDB
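A toy sketch of the document model described above, assuming documents are plain Python dicts. It does not mirror the CouchDB or MongoDB APIs; the class, its `find` method, and the sample documents are hypothetical.

```python
class DocumentStore:
    """Toy document store: documents are dicts addressed by an id."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        self._docs[doc_id] = document

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Return ids of documents whose fields equal all given criteria."""
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(field) == wanted for field, wanted in criteria.items())
        )


store = DocumentStore()
# Heterogeneous documents: the records share only some fields.
store.put("u1", {"name": "Sara", "city": "Lahore", "tags": ["admin"]})
store.put("u2", {"name": "Omar", "city": "Lahore"})
store.put("p1", {"title": "Desk", "price": 120})

in_lahore = store.find(city="Lahore")
print(in_lahore)
```

Unlike a plain key/value store, the store can look inside the value, which is what makes retrieval by content possible.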
Column-Oriented Stores
Data is stored and processed by column
Useful for read-mostly and read-intensive data
Data within the same column is of the same type, enabling
opportunities for efficient compression
Columns are stored separately so they can be loaded in parallel
Examples include Google's Bigtable (Apache HBase is its open source
clone) and Cassandra (originally developed at Facebook)
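The layout and compression points above can be sketched as follows. The table contents are hypothetical, and run-length encoding is used here only as one simple example of a compression scheme that benefits from same-typed, repetitive column values.

```python
from itertools import groupby

# The same table in two layouts (hypothetical data).
rows = [
    ("PK", "Lahore", 10),
    ("PK", "Karachi", 20),
    ("PK", "Multan", 5),
    ("US", "Austin", 7),
]

# Column-oriented layout: one homogeneous list per column.
columns = {
    "country": [r[0] for r in rows],
    "city": [r[1] for r in rows],
    "sales": [r[2] for r in rows],
}


def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# An aggregate touches a single column instead of every full row.
total_sales = sum(columns["sales"])

# Same-typed, repetitive column values compress well.
encoded_country = run_length_encode(columns["country"])
print(total_sales, encoded_country)
```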
Introduction
NewSQL: a hybrid of traditional RDBMS and NoSQL
Scalability and performance of NoSQL and ACID guarantees of RDBMS
Use SQL as the primary language
Ability to scale out and run over commodity hardware
Classified into:
1 New Databases: Designed from scratch
2 New MySQL Storage Engines: Keep MySQL as interface but replace
the storage engine
3 Transparent Clustering: Add pluggable features to existing databases
to ensure scalability
Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 25 / 27
New Databases
1 Query Distribution:
Each node holds a subset of the data
Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 26 / 27
New Databases
1 Query Distribution:
Each node holds a subset of the data
Queries are split and shipped to nodes that own the data
Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 26 / 27
New Databases
1 Query Distribution:
Each node holds a subset of the data
Queries are split and shipped to nodes that own the data
Examples include Googleā€™s Spanner and NuoDB
2 Pull Data:
A central node (possibly replicated) holds all data
A set of processing nodes receives queries and pulls in required data
from the central node
Examples include VMware's SQLFire
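The two designs can be contrasted with a toy in-process sketch. It is not how Spanner, NuoDB, or SQLFire are implemented; the class names, the modulo placement rule, and the sum_column query are all assumptions made for illustration, and plain Python objects stand in for networked nodes.

```python
from collections import defaultdict


class StorageNode:
    """Holds one partition of the rows and answers queries over it locally."""

    def __init__(self):
        self.rows = {}  # primary key -> row dict

    def local_sum(self, column, keys):
        # Runs where the data lives; only the partial result leaves the node.
        return sum(self.rows[k][column] for k in keys if k in self.rows)


class QueryDistributionCluster:
    """Design 1: each node owns a subset; queries are split and shipped."""

    def __init__(self, num_nodes):
        self.nodes = [StorageNode() for _ in range(num_nodes)]

    def _owner(self, key):
        # Assumed placement rule: integer key modulo the node count.
        return key % len(self.nodes)

    def insert(self, key, row):
        self.nodes[self._owner(key)].rows[key] = row

    def sum_column(self, column, keys):
        # Split the key set by owning node, ship each sub-query to its owner,
        # then merge the partial sums.
        keys_by_node = defaultdict(list)
        for k in keys:
            keys_by_node[self._owner(k)].append(k)
        return sum(
            self.nodes[n].local_sum(column, ks) for n, ks in keys_by_node.items()
        )


class PullDataCluster:
    """Design 2: a central store holds all rows; processing nodes pull them."""

    def __init__(self):
        # Stands in for the (possibly replicated) central node.
        self.central = {}

    def insert(self, key, row):
        self.central[key] = row

    def sum_column(self, column, keys):
        # A processing node first fetches the rows it needs, then computes.
        pulled = [self.central[k] for k in keys if k in self.central]
        return sum(row[column] for row in pulled)


rows = {i: {"amount": 10 * i} for i in range(1, 7)}
sharded, central = QueryDistributionCluster(3), PullDataCluster()
for key, row in rows.items():
    sharded.insert(key, row)
    central.insert(key, row)

wanted = [1, 2, 3, 4]
distributed_total = sharded.sum_column("amount", wanted)
pulled_total = central.sum_column("amount", wanted)
```

Both clusters return the same total. The difference is what moves: in the first design the sub-queries travel to the data, while in the second the rows travel to the processing node.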
References
1 NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf
2 NewSQL – The New Way to Handle Big Data: http://www.linuxforu.com/2012/01/newsql-handle-big-data/
Topic 10: Taxonomy of Data and Storage

  • 1. 10: Taxonomy of Data and Storage Zubair Nabi zubair.nabi@itu.edu.pk April 20, 2013 Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 1 / 27
  • 2. Outline 1 Datasets 2 Storage 3 Beyond RDBMS 4 NoSQL Taxonomy 5 NewSQL Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 2 / 27
  • 3. Outline 1 Datasets 2 Storage 3 Beyond RDBMS 4 NoSQL Taxonomy 5 NewSQL Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 3 / 27
  • 4. Introduction Data is everywhere and is the driving force behind our lives Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 5. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 6. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 7. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 8. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 9. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Datasets can easily be classiļ¬ed on the basis of their structure Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 10. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Datasets can easily be classiļ¬ed on the basis of their structure 1 Structured 2 Unstructured 3 Semi-structured Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 11. Structured Data Formatted in a universally understandable and identiļ¬able way Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 12. Structured Data Formatted in a universally understandable and identiļ¬able way In most cases, structured data is formally speciļ¬ed by a schema Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 13. Structured Data Formatted in a universally understandable and identiļ¬able way In most cases, structured data is formally speciļ¬ed by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 14. Structured Data Formatted in a universally understandable and identiļ¬able way In most cases, structured data is formally speciļ¬ed by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 15. Structured Data Formatted in a universally understandable and identiļ¬able way In most cases, structured data is formally speciļ¬ed by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Each ļ¬eld also has an associated type Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 16. Structured Data Formatted in a universally understandable and identiļ¬able way In most cases, structured data is formally speciļ¬ed by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Each ļ¬eld also has an associated type Possible to search for items based on their data types Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 17. Unstructured Data Data without any conceptual deļ¬nition or type Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 18. Unstructured Data Data without any conceptual deļ¬nition or type Can vary from raw text to binary data Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 19. Unstructured Data Data without any conceptual deļ¬nition or type Can vary from raw text to binary data Processing unstructured data requires parsing and tagging on the ļ¬‚y Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 20. Unstructured Data Data without any conceptual deļ¬nition or type Can vary from raw text to binary data Processing unstructured data requires parsing and tagging on the ļ¬‚y In most cases, consists of simple log ļ¬les Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 21. Semi-structured Data Occupies the space between the structured and unstructured data spectrum Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 7 / 27
  • 22. Semi-structured Data Occupies the space between the structured and unstructured data spectrum For instance, while binary data has no structure, audio and video ļ¬les have meta-data which has structure, such as author, time of creation, etc. Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 7 / 27
  • 23. Semi-structured Data Occupies the space between the structured and unstructured data spectrum For instance, while binary data has no structure, audio and video ļ¬les have meta-data which has structure, such as author, time of creation, etc. Can also be labelled as self-describing structure Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 7 / 27
  • 29. Database Management Systems (DBMS)
      • Used to store and manage data
      • Support for large amounts of data
      • Ensure concurrency, sharing, and locking
      • Provide security, to enable fine-grained access control
      • Ability to keep working in the face of failure
  • 35. Relational Database Management Systems (RDBMS)
      • The most popular and predominant storage system in use
      • Data in different files is connected by using a key field
      • Data is laid out in different tables, with a key field that identifies each row
      • The same key field is used to connect one table to another
      • For instance, a relation might have customer ID as key and her details as data; another table might have the same key but different data, say her purchases; yet another table with the same key might have a breakdown of her preferences
      • Examples include Oracle Database, MS SQL Server, MySQL, IBM DB2, and Teradata
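The customer example can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it ships with Python; the table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchases (
    customer_id INTEGER REFERENCES customers(customer_id),
    item TEXT
);
INSERT INTO customers VALUES (1, 'Sara');
INSERT INTO purchases VALUES (1, 'laptop');
INSERT INTO purchases VALUES (1, 'mouse');
""")

# The shared key field (customer_id) connects one table to the other.
rows = conn.execute(
    "SELECT c.name, p.item "
    "FROM customers c JOIN purchases p ON c.customer_id = p.customer_id "
    "ORDER BY p.item"
).fetchall()
print(rows)
```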
  • 40. Structured Query Language (SQL)
      • Non-procedural language used for data retrieval and manipulation in RDBMS
      • Adds a layer of abstraction over relational algebra, which enables set operations, selections, etc.
      • Due to its declarative nature, users operate in terms of their expected output while the underlying system decides the actual query execution plan
      • Instructions consist of a specific SQL statement and additional parameters and operands
      • For instance, the SELECT operator retrieves certain records, INSERT adds a record, and so on
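A small sketch of INSERT and SELECT with parameters, again using the bundled `sqlite3` module; the `contacts` table and its contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT)")

# INSERT adds records; the "?" placeholders are the statement's parameters.
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Ali", "Lahore"), ("Sara", "Karachi"), ("Omar", "Lahore")],
)

# SELECT states *what* is wanted; the engine decides how to find it.
in_lahore = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM contacts WHERE city = ? ORDER BY name", ("Lahore",)
    )
]
print(in_lahore)
```

Nothing in the query says whether to scan the table or use an index; that choice belongs to the query planner.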
  • 46. RDBMS and Structured Data
      • As structured data follows a predefined schema, it naturally maps onto a relational database system
      • The schema defines the type and structure of the data and its relations
      • Schema design is an arduous process and needs to be done before the database can be populated
      • Another consequence of a strict schema is that it is non-trivial to extend it
      • For instance, adding a new attribute to a single existing row necessitates adding a new column to the entire table
      • This is extremely suboptimal in tables with millions of rows
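The cost of extending a strict schema can be seen in miniature below: one customer gains an attribute, and every row gains a column. The `twitter_handle` attribute is hypothetical, and how expensive `ALTER TABLE` is in practice depends on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)", [("Ali",), ("Sara",)]
)

# Only Sara has the new attribute, but the column is added for the whole table.
conn.execute("ALTER TABLE customers ADD COLUMN twitter_handle TEXT")
conn.execute(
    "UPDATE customers SET twitter_handle = ? WHERE name = ?", ("@sara", "Sara")
)

rows = conn.execute(
    "SELECT name, twitter_handle FROM customers ORDER BY customer_id"
).fetchall()
print(rows)  # every other row carries a NULL for the new column
```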
  • 53. RDBMS and Semi- and Un-structured Data
      • Unstructured data has no notion of schema while semi-structured data only has a weak one
      • Data within such datasets also has an associated type
      • In fact, types are application-centric: it might be possible to interpret a field as a float in one application and as a string in another
      • While it is possible, with human intervention, to glean structure from unstructured data, it is an extremely expensive task
      • Structureless data generated by real-time sources can change the number of attributes and their types on the fly
      • An RDBMS would require the creation of a new table each time such a change takes place
      • Therefore, unstructured and semi-structured data do not fit the relational model
  • 62. Motivation
      • Different semantics: RDBMS provide ACID semantics:
          1. Atomic: The entire transaction either succeeds or fails
          2. Consistent: Data within the database remains consistent after each transaction
          3. Isolated: Transactions are sandboxed from each other
          4. Durable: Transactions are persistent across failures and restarts
      • These guarantees are overkill for most user-facing applications
      • Most applications are more interested in availability and are willing to sacrifice consistency, leading to eventual consistency
      • This basically available, soft state, eventually consistent (BASE) model enables applications to function even in the face of partial failure
      • High Throughput: Most NoSQL databases sacrifice consistency for availability, leading to higher throughput (in some cases by an order of magnitude)
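Atomicity, the first of the ACID properties, can be sketched with `sqlite3`: a transfer that fails halfway leaves no trace. The `accounts` table, the `CHECK` constraint, and the `transfer` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


overdraft_ok = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK
after_failure = balances(conn)
small_ok = transfer(conn, "alice", "bob", 40)
after_success = balances(conn)
print(overdraft_ok, after_failure, small_ok, after_success)
```

In the failed transfer the credit to `bob` had already been applied when the debit was rejected; the rollback undoes it, which is the all-or-nothing behaviour the slide describes.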
  • 65. Motivation (2)
      • Horizontal Scalability: To cater for more data, NoSQL stores can be scaled out by just adding more machines, and the underlying system automatically redistributes the data
      • Commodity Hardware: A large number of RDBMS require specialized and proprietary hardware for operation. In contrast, NoSQL databases function over commodity off-the-shelf hardware
      • Programming Language Support: Over the years, programming languages have started providing abstractions for database support (LINQ, etc.) while bypassing SQL. NoSQL databases provide abstractions that map directly onto the language abstractions, leading to tighter coupling
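Automatic redistribution can be sketched with simple hash partitioning. This is a deliberately naive scheme chosen for brevity, not what any particular store does; production systems often use consistent hashing so that adding a node moves fewer keys. The node names and helpers are hypothetical.

```python
import hashlib


def owner(key, nodes):
    """Pick the node responsible for a key from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Assign every key to exactly one node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[owner(key, nodes)].append(key)
    return placement


keys = [f"user{i}" for i in range(20)]
before = distribute(keys, ["n1", "n2"])
after = distribute(keys, ["n1", "n2", "n3"])  # scale out by adding a machine

for node, held in after.items():
    print(node, len(held))
```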
  • 66. Motivation (3)
      • The Rise of Cloud Computing: Cloud computing applications require horizontal scalability and low administration overhead. Both requirements are naturally satisfied by NoSQL stores
  • 71. Introduction
      • NoSQL databases can be classified on the basis of:
          1. Data Model: How data is represented
          2. Scalability: How scalable the system is
          3. Query Model: What type of API it exposes
          4. Persistence: How persistent the data is
  • 74. Classification by Data Model
      • Based on the data model, NoSQL databases can roughly be grouped into three categories:
          1. Key/value Stores: A map/dictionary allowing put/get semantics per key
          2. Document Stores: Complex data structures to encapsulate document key/value pairs
          3. Column-Oriented Stores: Data laid out by column
  • 79. Key/value Stores
      • Data is stored within a large hash map
      • Simple get/put API
      • Favour scalability over consistency
      • Limit on the size of the key
      • Examples include Amazon's Dynamo, LinkedIn's Voldemort, Redis, and Memcached
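The get/put interface over a hash map can be sketched in a few lines. The `KeyValueStore` class and its 256-byte key limit are illustrative assumptions, not the API or limits of any of the systems named above.

```python
class KeyValueStore:
    """A toy in-memory key/value store with a bounded key size."""

    def __init__(self, max_key_bytes=256):
        self._data = {}  # the "large hash map"
        self._max_key_bytes = max_key_bytes

    def put(self, key, value):
        if len(key.encode("utf-8")) > self._max_key_bytes:
            raise ValueError("key exceeds the size limit")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
store.put("user:42", {"name": "Sara"})
print(store.get("user:42"))
print(store.get("user:43"))  # missing keys return the default
```

The value is opaque to the store: there is no way to ask for "all users named Sara" without knowing the key.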
  • 84. Document Stores
      • Key/value semantics but based on documents
      • A document encapsulates data in a standard format, such as JSON, XML, PDF, etc.
      • Documents themselves can be heterogeneous
      • Documents can also be retrieved based on their content
      • Examples include Apache CouchDB and MongoDB
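Heterogeneous documents and content-based retrieval can be sketched with plain dictionaries. The `find` helper is a stand-in written for this example; it is not the query API of CouchDB or MongoDB.

```python
# Documents keyed by ID; each one has its own set of fields.
documents = {
    "u1": {"name": "Ali", "city": "Lahore"},
    "u2": {"name": "Sara", "city": "Karachi", "tags": ["admin"]},
    "inv7": {"title": "Invoice 7", "total": 120.5},
}


def find(docs, **criteria):
    """Return the IDs of documents whose fields match every given criterion."""
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )


print(find(documents, city="Lahore"))
print(find(documents, title="Invoice 7"))
```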
  • 89. Column-Oriented Stores
      • Data is stored and processed by column
      • Useful for read-mostly and read-intensive data
      • Data within the same column is of the same type, enabling opportunities for efficient compression
      • Columns are stored separately so they can be loaded in parallel
      • Examples include Google's BigTable (Apache HBase is its open source clone) and Facebook's Cassandra
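The difference between row and column layout, and why same-typed columns compress well, can be sketched as follows. Run-length encoding is used as one simple example of a compression scheme; the data and helper names are invented.

```python
from itertools import groupby

# Row layout: one tuple per record.
rows = [("Ali", "Lahore", 30), ("Sara", "Lahore", 25), ("Omar", "Karachi", 41)]


def to_columns(rows, names):
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["name", "city", "age"])

# An aggregate touches only the column it needs.
total_age = sum(columns["age"])
encoded_city = run_length_encode(columns["city"])
print(total_age, encoded_city)
```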
  • 97. Introduction
      • A hybrid of traditional RDBMS and NoSQL
      • Scalability and performance of NoSQL and ACID guarantees of RDBMS
      • Use SQL as the primary language
      • Ability to scale out and run over commodity hardware
      • Classified into:
          1. New Databases: Designed from scratch
          2. New MySQL Storage Engines: Keep MySQL as interface but replace the storage engine
          3. Transparent Clustering: Add pluggable features to existing databases to ensure scalability
  • 103. New Databases
      1. Query Distribution: Each node holds a subset of the data
          • Queries are split and shipped to nodes that own the data
          • Examples include Google's Spanner and NuoDB
      2. Pull Data: A central node (possibly replicated) holds all data
          • A set of processing nodes receives queries and pulls in required data from the central node
          • Examples include VMware's SQLFire
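The query distribution approach can be sketched as a scatter-gather count: each node evaluates the query over its own subset, and only the partial results are combined. The shard names, data, and helper functions are hypothetical and do not reflect how Spanner or NuoDB are implemented.

```python
# Each node holds a subset of the (name, age) records.
shards = {
    "node-a": [("Ali", 30), ("Sara", 25)],
    "node-b": [("Omar", 41)],
}


def local_count(rows, min_age):
    """Runs on the node that owns the rows; returns only a partial result."""
    return sum(1 for _, age in rows if age >= min_age)


def distributed_count(shards, min_age):
    """Ship the query to every shard and combine the partial counts."""
    return sum(local_count(rows, min_age) for rows in shards.values())


print(distributed_count(shards, 30))
```

In the pull-data approach the direction is reversed: the processing node would fetch the rows from the central node and evaluate the predicate itself.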