CS8091 / Big Data Analytics
III Year / VI Semester
UNIT V NOSQL DATA MANAGEMENT FOR
BIG DATA AND VISUALIZATION
NoSQL Databases: Schema-less Models: Increasing
Flexibility for Data Manipulation - Key-Value Stores -
Document Stores - Tabular Stores - Object Data
Stores - Graph Databases - Hive - Sharding - HBase -
Analyzing Big Data with Twitter - Big Data for
E-Commerce - Big Data for Blogs - Review of Basic
Data Analytic Methods using R.
NoSQL
 Most hardware and software appliances support
standard, SQL-based relational database
management systems (RDBMSs).
 Software appliances often bundle their execution
engines with the RDBMS and utilities for creating
the database structures and for bulk data loading.
NoSQL
 The availability of a high-performance, elastic
distributed data environment enables creative
algorithms to exploit variant modes of data
management in different ways.
 Many of these alternative data management
frameworks are grouped under the term “NoSQL
databases”.
NoSQL
 “NoSQL” is commonly read as “Not only SQL”.
 NoSQL systems may combine traditional SQL (or
SQL-like query languages) with alternative means
of querying and access.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 NoSQL data systems hold out the promise of
greater flexibility in database management
while reducing the dependence on more formal
database administration.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types:
 Key-value stores: align to certain big data
programming models.
 Graph Database: a graph abstraction is
implemented to embed both semantics and
connectivity within its structure.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 NoSQL databases also provide for integrated
data caching that helps reduce data access
latency and improve performance.
 The loosening of the relational structure is
intended to allow different models to be
adapted to specific types of analyses.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores:
 Values (or sets of values, or even more complex
entity objects) are associated with distinct
character strings called keys.
 Programmers may see similarity with the data
structure known as a hash table.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores:
Key: BMW
Value: {“1-Series”, “3-Series”, “5-Series”, “5-Series GT”,
“7-Series”, “X3”, “X5”, “X6”, “Z4”}
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores:
 The key is the name of the automobile make,
while the value is a list of names of models
associated with that automobile make.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores - Operations:
 Get(key), which returns the value associated with the
provided key.
 Put(key, value), which associates the value with the key.
 Multi-get(key1, key2, ..., keyN), which returns the list of
values associated with the list of keys.
 Delete(key), which removes the entry for the key from the
data store.
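The four operations above can be sketched in a few lines of Python. This is only a toy, in-memory illustration backed by a dictionary; the class name `KeyValueStore` and its method names are assumptions made for this example and do not correspond to any particular NoSQL product.

```python
from typing import Any, Dict, Iterable, List, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        # Return the value for the exact key, or None if the key is absent.
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        # Associate the value with the key, replacing any earlier value.
        self._data[key] = value

    def multi_get(self, keys: Iterable[str]) -> List[Optional[Any]]:
        # Return the values for several keys, in the order the keys were given.
        return [self.get(key) for key in keys]

    def delete(self, key: str) -> None:
        # Remove the entry for the key; deleting a missing key does nothing here.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("BMW", ["1-Series", "3-Series", "5-Series", "5-Series GT",
                  "7-Series", "X3", "X5", "X6", "Z4"])
models = store.get("BMW")                   # the list of BMW models
both = store.multi_get(["BMW", "Mini"])     # "Mini" was never stored, so None
```

Note that the value is opaque to the store: it hands back whatever was put under the key and leaves interpretation to the application.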
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores – Characteristics:
 Uniqueness of the key - to find the values you are
looking for, you must use the exact key.
 In this data management approach, if you want to
associate multiple values with a single key, you need
to consider the representations of the objects and how
they are associated with the key.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores – Characteristics:
 Key-value stores are essentially very long and likely thin
tables: many rows, with few columns per row.
 The table’s rows can be sorted by the key value to simplify
finding the key during a query.
 The keys can be hashed using a hash function that maps
the key to a particular location (sometimes called a
“bucket”) in the table.
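A minimal sketch of hashing keys into buckets, assuming a fixed table of eight buckets and using MD5 purely as a stable spreading function (not for security). The names `NUM_BUCKETS`, `bucket_for`, `put` and `get` are illustrative assumptions, not part of any specific store.

```python
import hashlib

NUM_BUCKETS = 8  # assumed table size, chosen only for illustration


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Hash the key with a stable hash so the same key always maps to the same bucket.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket index.
    return int.from_bytes(digest[:8], "big") % num_buckets


# Each bucket holds the (key, value) pairs whose keys hash to it.
buckets = [[] for _ in range(NUM_BUCKETS)]


def put(key, value):
    bucket = buckets[bucket_for(key)]
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:
            bucket[i] = (key, value)  # keys are unique: overwrite the old value
            return
    bucket.append((key, value))


def get(key):
    # Only the one bucket the key hashes to has to be searched.
    for existing_key, value in buckets[bucket_for(key)]:
        if existing_key == key:
            return value
    return None


put("BMW", ["1-Series", "3-Series"])
```

Because the hash decides the location, a lookup inspects one bucket rather than the whole table, which is why the exact key is required.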
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores – Characteristics:
 The representation can grow indefinitely, which
makes it good for storing large amounts of data that
can be accessed relatively quickly, as well as
environments requiring incremental appends of data.
 Examples include capturing system transaction logs
and managing profile data about individuals.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores – Characteristics:
 The simplicity of the representation allows massive
amounts of indexed data values to be appended to the
same key-value table, which can then be sharded, or
distributed across the storage nodes.
 Under the right conditions, the table is distributed in a
way that is aligned with the way the keys are
organized.
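One way to picture a distribution that is aligned with the key organization is range-based sharding over sorted keys. The sketch below is an assumption-laden illustration: the split points, node names and helper functions are all hypothetical, and real systems choose and rebalance split points automatically.

```python
from bisect import bisect_right

# Hypothetical split points over lower-cased keys:
#   key < "h"        -> node-0
#   "h" <= key < "q" -> node-1
#   key >= "q"       -> node-2
SPLIT_POINTS = ["h", "q"]
NODES = ["node-0", "node-1", "node-2"]

# One dictionary per node stands in for that node's local storage.
shards = {node: {} for node in NODES}


def node_for(key: str) -> str:
    # bisect_right counts how many split points are <= the key,
    # which is exactly the index of the node owning that key range.
    return NODES[bisect_right(SPLIT_POINTS, key.lower())]


def put(key, value):
    shards[node_for(key)][key] = value


def get(key):
    return shards[node_for(key)].get(key)


put("BMW", ["X3", "X5"])
put("Honda", ["Civic"])      # hypothetical entries used only to show placement
put("Toyota", ["Corolla"])
```

With range-based placement, keys that sort next to each other live on the same node, so a scan over a key range touches few nodes; hash-based placement spreads load more evenly but loses that locality.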
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores:
 While key-value pairs are very useful for both
storing the results of analytical algorithms (such as
phrase counts among massive numbers of
documents) and for producing those results for
reports, the model does pose some potential
drawbacks.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Key-Value Stores – Drawbacks:
 One is that the model does not inherently provide any
kind of traditional database capabilities (such as atomicity
of transactions, or consistency when multiple transactions
are executed simultaneously); those capabilities must
be provided by the application itself.
 Another is that as the model grows, maintaining
unique values as keys may become more difficult.
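To illustrate what "provided by the application itself" can mean, the sketch below guards a read-modify-write update of a phrase count with an application-level lock. It assumes a single process with several threads and uses a plain dictionary as a stand-in for the store; a distributed store would need a different coordination mechanism, which is not shown here.

```python
import threading

counts = {}                     # stands in for a key-value store of phrase counts
counts_lock = threading.Lock()  # application-level guard; the store itself offers none


def increment(key: str, amount: int = 1) -> None:
    # Read, modify and write back under the lock. Without it, two threads could
    # both read the same old value and one of the updates would be lost.
    with counts_lock:
        counts[key] = counts.get(key, 0) + amount


def worker() -> None:
    for _ in range(1000):
        increment("big data")


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

total = counts["big data"]  # 4 threads x 1000 increments each
```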
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Document Stores:
 A document store is similar to a key-value store in that
stored objects are associated with (and therefore accessed
via) character string keys.
 The difference is that the values being stored, which
are referred to as “documents,” provide some structure
and encoding of the managed data.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Document Stores:
 Common encodings include XML, JSON, and BSON.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Document Stores – Example:
{StoreName:“Retail Store #34”, {Street:“1203 O ST”,
City:“Lincoln”, State:“NE”, ZIP:“68508”} }
{StoreName:“Retail Store #65”, {MallLocation:“Westfield
Wheaton”, City:“Wheaton”, State:“IL”} }
{StoreName:“Retail Store #102”, {Latitude:“40.748328”,
Longitude:“-73.985560”} }
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Document Stores:
 The document representation embeds the model so
that the meanings of the document values can be
inferred by the application.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Document Stores:
 One of the differences between a key-value store
and a document store is that while the former
requires the use of a key to retrieve data, the latter
often provides a means (either through a
programming API or using a query language) for
querying the data based on the contents.
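The contrast between key-based and content-based retrieval can be sketched as follows. The documents reuse two of the retail store records from the earlier example; the document keys (`store-34`, `store-65`), the field name `Location` for the nested object, and the helper `find_by_location_field` are assumptions introduced for this illustration, and the linear scan stands in for whatever indexing a real document store would use.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]

# Documents need not share the same fields: store #65 has no Street or ZIP.
stores: Dict[str, Document] = {
    "store-34": {
        "StoreName": "Retail Store #34",
        "Location": {"Street": "1203 O ST", "City": "Lincoln",
                     "State": "NE", "ZIP": "68508"},
    },
    "store-65": {
        "StoreName": "Retail Store #65",
        "Location": {"MallLocation": "Westfield Wheaton",
                     "City": "Wheaton", "State": "IL"},
    },
}


def get_by_key(key: str) -> Document:
    # Key-value style access: the exact key is required.
    return stores[key]


def find_by_location_field(field: str, value: Any) -> List[Document]:
    # Document style access: match on content. Documents that lack the
    # field are simply skipped instead of causing an error.
    return [doc for doc in stores.values()
            if doc.get("Location", {}).get(field) == value]


in_illinois = find_by_location_field("State", "IL")
```

The query works because the structure is embedded in each document, so the application can look inside the value rather than treating it as an opaque blob.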
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Tabular Stores:
 Tabular, or table-based stores are largely derived
from Google’s original Bigtable design to manage
structured data.
 One example is HBase, a Hadoop-related NoSQL data
management system that evolved from Bigtable.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Tabular Stores:
 The Bigtable NoSQL model allows sparse data to
be stored in a three-dimensional table that is
indexed by a row key, a column key that indicates
the specific attribute for which a data value is
stored, and a timestamp that may refer to the time
at which the row’s column value was stored.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Tabular Stores:
 As an example, various attributes of a web page
can be associated with the web page’s URL:
   the HTML content of the page,
   URLs of other web pages that link to this web page, and
   the author of the content.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Tabular Stores:
 Columns in a Bigtable model are grouped together as
“families,” and the timestamps enable management of
multiple versions of an object.
 The timestamp can be used to maintain history—each
time the content changes, new column attachments can
be created with the timestamp of when the content was
downloaded.
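The three-dimensional indexing (row key, column key, timestamp) can be sketched with nested dictionaries. This is a simplified illustration, not the HBase or Bigtable API: the row key `com.example.www`, the `family:qualifier` column names, the integer timestamps and the `put`/`get` helpers are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# table[row_key][column_key] -> list of (timestamp, value), oldest first.
# Only cells that were actually written exist, so the table stays sparse.
Table = Dict[str, Dict[str, List[Tuple[int, str]]]]
table: Table = defaultdict(lambda: defaultdict(list))


def put(row_key: str, column_key: str, timestamp: int, value: str) -> None:
    cells = table[row_key][column_key]
    cells.append((timestamp, value))
    cells.sort(key=lambda cell: cell[0])  # keep versions ordered by timestamp


def get(row_key: str, column_key: str,
        timestamp: Optional[int] = None) -> Optional[str]:
    # Return the newest version at or before `timestamp`,
    # or the newest version overall when no timestamp is given.
    cells = table.get(row_key, {}).get(column_key, [])
    for cell_time, value in reversed(cells):
        if timestamp is None or cell_time <= timestamp:
            return value
    return None


row = "com.example.www"  # hypothetical row key derived from a URL
put(row, "contents:html", 1, "<html>version 1</html>")
put(row, "contents:html", 2, "<html>version 2</html>")
put(row, "meta:author", 1, "A. Author")

latest = get(row, "contents:html")
as_of_1 = get(row, "contents:html", timestamp=1)
```

Each change to the page adds a new timestamped cell instead of overwriting the old one, which is how the history of a column is retained.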
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Object Data Stores:
 Object databases can be similar to document
stores, except that document stores explicitly
serialize the object so the data values are stored as
strings, while object databases maintain the object
structures as they are bound to object-oriented
programming languages.
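The difference can be sketched in Python under some simplifying assumptions: JSON stands in for the document encoding, and a plain dictionary stands in for an object database (a real one would also persist the objects). The `Store` class and its fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Store:
    name: str
    city: str
    state: str

    def label(self) -> str:
        return f"{self.name} ({self.city}, {self.state})"


store = Store("Retail Store #34", "Lincoln", "NE")

# Document-store style: the object is serialized, so what is stored is a string.
document = json.dumps(asdict(store))
# The application has to rebuild the object before it can call its methods.
restored = Store(**json.loads(document))

# Object-store style: the object structure itself is kept, methods included.
object_store = {"store-34": store}
label = object_store["store-34"].label()
```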
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Object Data Stores:
 Object database management systems are more likely
to provide traditional ACID (atomicity, consistency,
isolation, and durability) compliance—characteristics
that are bound to database reliability. Object databases
are not relational databases and are not queried using
SQL.
Schema-Less Models: Increasing
Flexibility for Data Manipulation
 Types: Graph Databases:
 Graph databases provide a model for representing
individual entities and the numerous kinds of relationships
that connect those entities.
 A graph consists of a collection of vertices that represent
the modeled entities, connected by edges that capture
the way that two entities are related.
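A minimal sketch of the vertex-and-edge model, using an adjacency list in which every edge carries the kind of relationship it represents. The entities and relationships below (Alice, Bob, Carol, Acme, `follows`, `works_at`) are made up purely for illustration, as are the helper names.

```python
from collections import defaultdict

# edges[vertex] -> list of (relationship, other_vertex) pairs
edges = defaultdict(list)


def add_edge(source, relationship, target):
    # A directed, labelled edge: the label records how the two entities relate.
    edges[source].append((relationship, target))


def neighbors(vertex, relationship=None):
    # Follow the edges leaving a vertex, optionally restricted to one relationship.
    return [target for label, target in edges.get(vertex, [])
            if relationship is None or label == relationship]


# Hypothetical data
add_edge("Alice", "follows", "Bob")
add_edge("Alice", "works_at", "Acme")
add_edge("Bob", "follows", "Carol")

followed_by_alice = neighbors("Alice", "follows")
```

Because connectivity is stored directly with each vertex, following a relationship is a local lookup rather than a join across tables.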
