CCS334 BIG DATA ANALYTICS
(R-21 III (I Sem))
Department of Artificial Intelligence and Data Science
Session 2
by
Asst. Prof. M. Gokilavani
NIET
9/19/2023 Department of AI & DS 1
TEXT BOOKS
• Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data,
Big Analytics: Emerging Business Intelligence and Analytic Trends for
Today's Businesses", Wiley, 2013.
• Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
• Pramod J. Sadalage, "NoSQL Distilled", 2013.
REFERENCES
• E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive",
O'Reilly, 2012.
• Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
• Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010.
Topics covered in Unit 2 session
UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL – aggregate data models – key-value and
document data models – relationships – graph databases –
schema less databases – materialized views – distribution models –
master-slave replication – consistency – Cassandra – Cassandra data
model – Cassandra examples – Cassandra clients.
Summary of session 1
• Database: an organized collection of data, stored in table format.
• DBMS: Database Management System
• RDBMS characteristics
• ACID properties
• Abstraction over the physical layer
• Structured Query Language (SQL)
• NoSQL: why, what, and when?
• What is NoSQL?
• Characteristics of NoSQL databases
• Differences between SQL and NoSQL
CAP Theorem
• CAP stands for:
• Consistency
• Availability
• Partition tolerance
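Definition: The CAP theorem states that a distributed database can
guarantee at most two of the three properties (consistency,
availability, and partition tolerance) at the same time. As a result,
a distributed database system has to prioritize two of them. Because
network partitions cannot be ruled out in practice, the real choice
during a partition is between consistency and availability.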
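
The trade-off can be made concrete with a toy model. The sketch below is only an illustration, not how any real database implements replication: `TwoNodeStore`, its `mode` flag, and the `partitioned` switch are hypothetical names invented for this example. It shows a "CP" store refusing a write during a partition (replicas stay consistent, availability is lost) and an "AP" store accepting it (the store stays available, one replica becomes stale).

```python
class TwoNodeStore:
    """Hypothetical two-replica store, used only to illustrate the CAP trade-off."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica_a = "v1"     # value held by node A
        self.replica_b = "v1"     # value held by node B
        self.partitioned = False  # True when A and B cannot reach each other

    def write(self, value):
        """A write arrives at node A and should be copied to node B."""
        if not self.partitioned:
            self.replica_a = value
            self.replica_b = value
            return "ok"
        if self.mode == "CP":
            # Prefer consistency: refuse the write so the replicas never diverge.
            return "rejected"
        # Prefer availability: accept the write locally; node B is now stale.
        self.replica_a = value
        return "ok"

    def is_consistent(self):
        return self.replica_a == self.replica_b


cp_store = TwoNodeStore("CP")
cp_store.partitioned = True
cp_result = cp_store.write("v2")   # refused, replicas still agree

ap_store = TwoNodeStore("AP")
ap_store.partitioned = True
ap_result = ap_store.write("v2")   # accepted, replicas now disagree
```

In both cases the partition itself is a given; what differs is which of the other two properties the store gives up while the partition lasts.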
CAP Theorem related to SQL and NoSQL
NoSQL Database Types
Discussing NoSQL databases is complicated because there are several types:
• Sorted, ordered column stores: optimized for queries over large datasets;
they store columns of data together, instead of rows.
• Document databases: pair each key with a complex data structure known as
a document.
• Key-value stores: the simplest NoSQL databases. Every single item in the
database is stored as an attribute name (or 'key'), together with its value.
• Graph databases: used to store information about networks of data, such as
social connections.
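
To compare the four types side by side, the following sketch lays out the same small dataset in each shape using plain Python structures. It is an illustration of the data models only, not of any particular product; the names and cities are made-up sample data.

```python
import json

# Key-value store: the value is an opaque blob that is only reachable by its key.
key_value = {
    "user:1": json.dumps({"name": "Asha", "city": "Chennai"}),
    "user:2": json.dumps({"name": "Ravi", "city": "Mumbai"}),
}

# Document database: the structure of each value is visible, so fields can be queried.
documents = [
    {"_id": 1, "name": "Asha", "city": "Chennai"},
    {"_id": 2, "name": "Ravi", "city": "Mumbai"},
]
names_in_chennai = [d["name"] for d in documents if d["city"] == "Chennai"]

# Column store: all values of one column are kept together instead of row by row.
columns = {
    "_id": [1, 2],
    "name": ["Asha", "Ravi"],
    "city": ["Chennai", "Mumbai"],
}
all_cities = columns["city"]  # reads one column without touching the others

# Graph database: nodes plus labelled edges between them.
edges = [(1, "FOLLOWS", 2)]
followed_by_user_1 = [dst for src, label, dst in edges if src == 1 and label == "FOLLOWS"]
```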
Document Databases (Document Store)
• Documents
• Loosely structured sets of key/value pairs, e.g., XML, JSON, BSON
• Encapsulate and encode data in a standard format or encoding
• Are addressed in the database via a unique key
• Are treated as a whole, rather than being split into their
constituent name/value pairs
• Can be retrieved by key or by content
• Notable examples:
• MongoDB (used by Foursquare, GitHub, and more)
• CouchDB (used by Apple, BBC, Canonical, CERN, and more)
Document Databases, JSON
{
_id: ObjectId("51156a1e056d6f966f268f81"),
type: "Article",
author: "Derick Rethans",
title: "Introduction to Document Databases with MongoDB",
date: ISODate("2013-04-24T16:26:31.911Z"),
body: "This arti…"
},
{
_id: ObjectId("51156a1e056d6f966f268f82"),
type: "Book",
author: "Derick Rethans",
title: "php|architect's Guide to Date and Time Programming with PHP",
isbn: "978-0-9738621-5-7"
}
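
The two documents above can be used to sketch retrieval "by key or by content". This is a minimal in-memory illustration in plain Python, not MongoDB's API: `find_by_id` and `find` are hypothetical helper names, and the elided `body` and the `date` field are left out.

```python
# In-memory stand-in for a collection: unique key -> whole document.
collection = {
    "51156a1e056d6f966f268f81": {
        "type": "Article",
        "author": "Derick Rethans",
        "title": "Introduction to Document Databases with MongoDB",
    },
    "51156a1e056d6f966f268f82": {
        "type": "Book",
        "author": "Derick Rethans",
        "title": "php|architect's Guide to Date and Time Programming with PHP",
        "isbn": "978-0-9738621-5-7",
    },
}


def find_by_id(doc_id):
    """Retrieve by key: returns the whole document, or None if the key is unknown."""
    return collection.get(doc_id)


def find(criteria):
    """Retrieve by content: every document whose fields match all the criteria."""
    return [
        doc
        for doc in collection.values()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


book = find_by_id("51156a1e056d6f966f268f82")
by_author = find({"author": "Derick Rethans"})
books_only = find({"type": "Book"})
```

Note that the two documents do not share the same set of fields (`isbn` exists only on the book), and the content query still works across both.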
Key/Value stores
• Store data in a schema-less way
• Store data as maps
• Hash maps or associative arrays
• Provide very efficient average-case access time for looking up data by key
• Notable examples:
• Couchbase (Zynga, Vimeo, NAVTEQ, ...)
• Redis (Craigslist, Instagram, Stack Overflow, Flickr, ...)
• Amazon Dynamo (Amazon, Elsevier, IMDb, ...)
• Apache Cassandra (Facebook, Digg, Reddit, Twitter, ...)
• Voldemort (LinkedIn, eBay, …)
• Riak (GitHub, Comcast, Mochi, ...)
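
The map-based idea can be sketched with Python's built-in `dict`, which is a hash map. The `KeyValueStore` class below is a hypothetical, in-memory illustration of the put/get/delete interface such stores typically expose; it does not reproduce the API of any product listed above.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store backed by a hash map (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not inspect the value; any object is accepted (schema-less).
        self._data[key] = value

    def get(self, key, default=None):
        # Hash lookup: constant time on average, regardless of how many keys exist.
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key and report whether it was present."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "guest", "cart": ["book"]})
store.put("counter:visits", 17)

session = store.get("session:42")
removed = store.delete("counter:visits")
missing = store.get("counter:visits")
```

The only way to reach a value is through its key; finding all sessions for a given user would require scanning every entry, which is the main trade-off against document databases.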
Schema-less Database
What are schema-less databases?
• A schema-less database is a type of NoSQL database that does not
require a predefined schema to store data.
• Instead, it allows data to be stored in flexible and dynamic
formats, such as JSON documents, key-value pairs, graphs, or
columns.
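
The difference from a fixed schema can be sketched by trying to store a record with an extra field in both styles. The example uses Python's standard `sqlite3` module for the relational side and a plain list of dictionaries as a stand-in for a schema-less collection; the table name, field names, and values are made-up sample data.

```python
import sqlite3

# Relational side: the table schema is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Asha')")

try:
    # 'twitter' is not part of the declared schema, so this insert is rejected.
    conn.execute("INSERT INTO person (id, name, twitter) VALUES (2, 'Ravi', '@ravi')")
    relational_error = None
except sqlite3.OperationalError as exc:
    relational_error = str(exc)

# Schema-less side: each record carries its own structure.
people = []
people.append({"id": 1, "name": "Asha"})
people.append({"id": 2, "name": "Ravi", "twitter": "@ravi"})  # extra field is accepted

with_twitter = [p["name"] for p in people if "twitter" in p]
```

The flexibility has a cost: because nothing guarantees that a field exists, the reading code has to check for it, as the `"twitter" in p` test does here.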
SQL QUERIES
Schema-less Database
Materialized View
• A materialized view takes a regular view (a stored query over one or
more relations) and materializes it: the results are computed ahead of
time and stored in a table, so reads do not have to re-run the query.
• Materialized View definition: A view can be “materialized” by storing
the tuples of the view in the database.
• Index structures can be built on the materialized view.
• A database system uses one of three strategies to keep the
materialized view up to date:
• Update the materialized view as soon as the relation on which it is
defined is updated.
• Update the materialized view every time the view is accessed.
• Update the materialized view periodically.
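
A minimal sketch of the idea, using Python's standard `sqlite3` module. SQLite has no built-in materialized views, so the view is emulated here by a table that a hypothetical helper, `refresh_sales_by_city`, rebuilds from the base relation; the `orders` table and its rows are made-up sample data. When the helper is called determines which of the three strategies is in use: right after each change to `orders`, before each read, or on a timer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Chennai", 120), ("Mumbai", 300), ("Chennai", 80)],
)


def refresh_sales_by_city():
    """Recompute the view's tuples and store them in a real table."""
    conn.execute("DROP TABLE IF EXISTS sales_by_city")
    conn.execute(
        "CREATE TABLE sales_by_city AS "
        "SELECT city, SUM(amount) AS total FROM orders GROUP BY city"
    )
    # An index can be built on the materialized result like on any other table.
    conn.execute("CREATE INDEX idx_sales_by_city ON sales_by_city (city)")


def total_for(city):
    """Read from the stored result instead of re-running the aggregation."""
    row = conn.execute(
        "SELECT total FROM sales_by_city WHERE city = ?", (city,)
    ).fetchone()
    return row[0] if row else None


refresh_sales_by_city()
before = total_for("Chennai")

conn.execute("INSERT INTO orders VALUES ('Chennai', 50)")
stale = total_for("Chennai")   # the base relation changed, the stored result did not

refresh_sales_by_city()
after = total_for("Chennai")   # the refresh brings the stored result up to date
```

The stale read in the middle is exactly what the refresh strategies manage: immediate refresh avoids it at the cost of slower writes, while periodic refresh accepts it for a bounded time.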
Summary
Topics to be covered in the next session (session 3)
• Distribution models
Thank you!!!
