SlideShare a Scribd company logo
1 of 43
Module -7
Overview of No-SQL Database
Management
Outline
1. Motivation
2.CAP Theorem
3. Category
4. Typical NoSQL API
1 Motivation
• Database Management System (DBMS)
provides our application:
efficient, reliable, convenient, and safe
multi-user storage of and access to
massive
amounts of persistent data.
• Not every data management/analysis
problem is best solved using a
traditional DBMS
Data Management
Why NoSQL
• Big data
• Scalability
• Data format
• Manageability
Big Data
• Collect
• Store
• Organize
• Analyze
• Share
Data growth outruns the ability to manage it so
we need scalable solutions.
Scalability
• Scale up, Vertical scalability.
–Increasing server capacity.
–Adding more CPU, RAM.
–Managing is hard.
–Possible down times
Horizontal scaling means that you scale by adding more machines into your pool
of resources whereas Vertical scaling means that you scale by adding more
power (CPU, RAM) to an existing machine.
Scalability
• Scale out, Horizontal scalability.
– Adding servers to existing system with little effort,
aka Elastically scalable
•Bugs, hardware errors, things fail all the time.
•It should become cheaper. Cost efficiency.
– Shared nothing
– Use of commodity/cheap hardware
– Heterogeneous systems
 “SQL” = Traditional relational DBMS
 Recognition over past decade or so:
Not every data management/analysis
problem is best solved using a traditional
relational DBMS
“NoSQL” = “No SQL” =
Not using traditional relational DBMS
“No SQL”  Don’t use SQL language
 “NoSQL” =? “Not Only SQL” => NOSQL
NoSQL: The Name
NoSQL databases
• The name stands for Not Only SQL
• Common features:
– non-relational
– usually do not require a fixed table schema
– horizontal scalable
– mostly open source
• More characteristics
– relax one or more of the ACID properties (see CAP
theorem)
– replication support
– easy API (if SQL, then only its very restricted variant)
• Do not fully support relational features
– no join operations (except within partitions),
– no referential integrity constraints across partitions.
More Programming and Less Database
Design
Alternative to traditional relational DBMS
+ Flexible schema
+ Quicker/cheaper to set up
+ Massive scalability
+ Relaxed consistency  higher performance &
availability
– No declarative query language  more
programming
– Relaxed consistency  fewer guarantees
Challenge: Coordination
• The solution to availability and scalability is to
decentralize and replicate functions and data…but how do
we coordinate the nodes?
– data consistency
– update propagation
– mutual exclusion
CAP Theorem
 Conjectured by Prof. Eric Brewer at
PODC (Principle of Distributed
Computing) 2000 keynote talk
 Described the trade-offs involved in
distributed system
 It is impossible for a web service to
provide following three guarantees at
the same time:
 Consistency
 Availability
 Partition-tolerance
CAP Theorem
• Consistency:
– All nodes should see the same data at the same time
• Availability:
– Node failures do not prevent survivors from
continuing to operate
• Partition-tolerance:
– The system continues to operate despite network
partitions
• A distributed system can satisfy any two of these
guarantees at the same time but not all three
CAP Theorem
C A
P
17
Relaxing ACID properties
• ACID is hard to achieve, moreover, it is not always required,
e.g. for blogs, status updates, product listings, etc.
• Availability
– Traditionally, thought of as the server/process available 99.999
% of time
– For a large-scale node system, there is a high probability that a
node is either down or that there is a network partitioning
• Partition tolerance
– ensures that write and read operations are redirected to
available replicas when segments of the network become
disconnected
18
Eventual Consistency
• Eventual Consistency
– When no updates occur for a long period of time, eventually all
updates will propagate through the system and all the nodes
will be consistent
– For a given accepted update and a given node, eventually
either the update reaches the node or the node is removed
from service
• BASE (Basically Available, Soft state, Eventual
consistency) properties, as opposed to ACID
•Soft state: copies of a data item may be inconsistent
•Eventually Consistent – copies becomes consistent at some
later time if there are no more updates to that data item
•Basically Available – possibilities of faults but not a fault of the
whole system
CAP Theorem
• Suppose three properties of a system
– Consistency (all copies have same value)
– Availability (system can run even if parts have failed)
– Partitions (network can break into two or more parts, each with active
systems that can not influence other parts)
• Brewer’s CAP “Theorem”: for any system sharing data it is
impossible to guarantee simultaneously all of these three
properties
• Very large systems will partition at some point
– it is necessary to decide between C and A
– traditional DBMS prefer C over A and P
– most Web applications choose A (except in specific applications such
as order processing)
CAP Theorem
• Drop A or C of ACID
–relaxing C makes replication easy,
facilitates fault tolerance,
–relaxing A reduces (or eliminates) need
for distributed concurrency control.
4 Category
Categories of NoSQL databases
• Key-value stores
• Column NoSQL databases
• Document-based
• Graph database (neo4j, InfoGrid)
Key-Value Stores
Extremely simple interface
 Data model: (key, value) pairs
 Operations: Insert(key,value), Fetch(key),
Update(key), Delete(key)
Implementation: efficiency, scalability, fault-
tolerance
 Records distributed to nodes based on key
 Replication
 Single-record transactions, “eventual consistency
Key-Value Data Stores
• Example: SimpleDB
– Based on Amazon’s Single Storage Service
(S3)
– items (represent objects) having one or more
pairs (name, value), where name denotes an
attribute.
– An attribute can have multiple values.
– items are combined into domains.
Key-Value Stores
Extremely simple interface
 Data model: (key, value) pairs
 Operations: Insert(key,value), Fetch(key),
Update(key), Delete(key)
Implementation: efficiency, scalability, fault-
tolerance
 Records distributed to nodes based on key
 Replication
 Single-record transactions, “eventual consistency”
Suitable Use Cases
• Storing Session Information
• User Profiles, Preferences: Almost every
user has a unique userID as well as
preferences such as language, color,
timezone, which products the user has
access to , and so on.
Shopping Cart Data
• As we want the shopping carts to be available
all the time, across browsers, machines, and
sessions, all the shopping information can be
put into value where the key is the userID
Not to Use
• Relationships among data
• Multi-operation Transactions
• Query by Data
• Operations by Sets: since operations are
limited to one key at a time, there is no
way to operate upon multiple keys at the
same time. If you need to operate upon
multiple keys, you have to handle this
from the client side
Document Stores
Like Key-Value Stores except value is document
 Data model: (key, document) pairs
 Document: JSON, XML, other semistructured
formats
 Basic operations: Insert(key,document), Fetch(key),
Update(key), Delete(key)
 Also Fetch based on document contents
Example systems
 CouchDB, MongoDB, SimpleDB etc
Document-Based
• based on JSON format: a data model which supports
lists, maps, dates, Boolean with nesting
• Really: indexed semistructured documents
• Example: Mongo
– {Name:"Jaroslav",
Address:"Malostranske nám. 25, 118 00 Praha 1“
Grandchildren: [Claire: "7", Barbara: "6", "Magda: "3", "Kirsten:
"1", "Otis: "3", Richard: "1"]
}
MongoDB CRUD operations
• CRUD stands for create, read, update, and
delete
• MongoDB stores data in the form of
documents, which are JSON-like field and
value pairs.
Suitable Use Cases
• Event Logging
• Content Management Systems
• Web Analytics or Real time Analysis
• E-commerce Applications
Not to use
• Complex Transaction Spanning Different
Operations
• Queries against Varying Aggregate Structure
Column-oriented
• Store data in column order
• Allow key-value pairs to be stored (and retrieved on
key) in a massively parallel system
– data model: families of attributes defined in a schema,
new attributes can be added
– storing principle: big hashed distributed tables
– properties: partitioning (horizontally and/or vertically),
high availability etc. completely transparent to application
Cassandra
Cassandra
– keyspace: Usually the name of the application;
e.g., 'Twitter', 'Wordpress‘.
– column family: structure containing an
unlimited number of rows
– column: a tuple with name, value and time
stamp
– key: name of record
– super column: contains more columns
Cassandra
Suitable Use Cases
• Event Logging
• Content management Systems, blogging
platforms
Not to use
• There are problems for which column-family
databases are not best solutions, such as
systems that require ACID transactions for
writes and reads.
• If you need the database to aggregate the
data using queries (such as SUM or AVG), you
have to do this on the client side using data
retrieved by the client from all the rows.
Graph Database
 Data model: nodes and edges
 Nodes may have properties (including ID)
 Edges may have labels or roles
Graph Database Systems
 Interfaces and query languages vary
 Single-step versus “path expressions” versus
full recursion
 Example systems
Neo4j, FlockDB, Pregel, …
 RDF “triple stores” can map to graph
databases
5 Summary

More Related Content

Similar to Master.pptx

no sql presentation
no sql presentationno sql presentation
no sql presentationchandanm2
 
No sql databases
No sql databases No sql databases
No sql databases Ankit Dubey
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...DataStax Academy
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture OverviewChristopher Foot
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataDebajani Mohanty
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptxbetalab
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesEditor Jacotech
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra Nikiforos Botis
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsFirat Atagun
 

Similar to Master.pptx (20)

no sql presentation
no sql presentationno sql presentation
no sql presentation
 
No sql databases
No sql databases No sql databases
No sql databases
 
NoSQL and Couchbase
NoSQL and CouchbaseNoSQL and Couchbase
NoSQL and Couchbase
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture Overview
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
 
No sql (not only sql)
No sql                 (not only sql)No sql                 (not only sql)
No sql (not only sql)
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptx
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
Data engineering
Data engineeringData engineering
Data engineering
 
Revision
RevisionRevision
Revision
 

Recently uploaded

Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxMuhammadAsimMuhammad6
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...jabtakhaidam7
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxNANDHAKUMARA10
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxkalpana413121
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesChandrakantDivate1
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 

Recently uploaded (20)

Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 

Master.pptx

  • 1. Module -7 Overview of No-SQL Database Management
  • 2. Outline 1. Motivation 2.CAP Theorem 3. Category 4. Typical NoSQL API
  • 4. • Database Management System (DBMS) provides our application: efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data. • Not every data management/analysis problem is best solved using a traditional DBMS Data Management
  • 5. Why NoSQL • Big data • Scalability • Data format • Manageability
  • 6. Big Data • Collect • Store • Organize • Analyze • Share Data growth outruns the ability to manage it so we need scalable solutions.
  • 7. Scalability • Scale up, Vertical scalability. –Increasing server capacity. –Adding more CPU, RAM. –Managing is hard. –Possible down times
  • 8. Horizontal scaling means that you scale by adding more machines into your pool of resources whereas Vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine.
  • 9. Scalability • Scale out, Horizontal scalability. – Adding servers to existing system with little effort, aka Elastically scalable •Bugs, hardware errors, things fail all the time. •It should become cheaper. Cost efficiency. – Shared nothing – Use of commodity/cheap hardware – Heterogeneous systems
  • 10.  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not every data management/analysis problem is best solved using a traditional relational DBMS “NoSQL” = “No SQL” = Not using traditional relational DBMS “No SQL”  Don’t use SQL language  “NoSQL” =? “Not Only SQL” => NOSQL NoSQL: The Name
  • 11. NoSQL databases • The name stands for Not Only SQL • Common features: – non-relational – usually do not require a fixed table schema – horizontal scalable – mostly open source • More characteristics – relax one or more of the ACID properties (see CAP theorem) – replication support – easy API (if SQL, then only its very restricted variant) • Do not fully support relational features – no join operations (except within partitions), – no referential integrity constraints across partitions.
  • 12. More Programming and Less Database Design Alternative to traditional relational DBMS + Flexible schema + Quicker/cheaper to set up + Massive scalability + Relaxed consistency  higher performance & availability – No declarative query language  more programming – Relaxed consistency  fewer guarantees
  • 13. Challenge: Coordination • The solution to availability and scalability is to decentralize and replicate functions and data…but how do we coordinate the nodes? – data consistency – update propagation – mutual exclusion
  • 14. CAP Theorem  Conjectured by Prof. Eric Brewer at PODC (Principle of Distributed Computing) 2000 keynote talk  Described the trade-offs involved in distributed system  It is impossible for a web service to provide following three guarantees at the same time:  Consistency  Availability  Partition-tolerance
  • 15. CAP Theorem • Consistency: – All nodes should see the same data at the same time • Availability: – Node failures do not prevent survivors from continuing to operate • Partition-tolerance: – The system continues to operate despite network partitions • A distributed system can satisfy any two of these guarantees at the same time but not all three
  • 17. 17 Relaxing ACID properties • ACID is hard to achieve, moreover, it is not always required, e.g. for blogs, status updates, product listings, etc. • Availability – Traditionally, thought of as the server/process available 99.999 % of time – For a large-scale node system, there is a high probability that a node is either down or that there is a network partitioning • Partition tolerance – ensures that write and read operations are redirected to available replicas when segments of the network become disconnected
  • 18. 18 Eventual Consistency • Eventual Consistency – When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent – For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service • BASE (Basically Available, Soft state, Eventual consistency) properties, as opposed to ACID •Soft state: copies of a data item may be inconsistent •Eventually Consistent – copies becomes consistent at some later time if there are no more updates to that data item •Basically Available – possibilities of faults but not a fault of the whole system
  • 19. CAP Theorem • Suppose three properties of a system – Consistency (all copies have same value) – Availability (system can run even if parts have failed) – Partitions (network can break into two or more parts, each with active systems that can not influence other parts) • Brewer’s CAP “Theorem”: for any system sharing data it is impossible to guarantee simultaneously all of these three properties • Very large systems will partition at some point – it is necessary to decide between C and A – traditional DBMS prefer C over A and P – most Web applications choose A (except in specific applications such as order processing)
  • 20. CAP Theorem • Drop A or C of ACID –relaxing C makes replication easy, facilitates fault tolerance, –relaxing A reduces (or eliminates) need for distributed concurrency control.
  • 22. Categories of NoSQL databases • Key-value stores • Column NoSQL databases • Document-based • Graph database (neo4j, InfoGrid)
  • 23.
  • 24. Key-Value Stores Extremely simple interface  Data model: (key, value) pairs  Operations: Insert(key,value), Fetch(key), Update(key), Delete(key) Implementation: efficiency, scalability, fault- tolerance  Records distributed to nodes based on key  Replication  Single-record transactions, “eventual consistency
  • 25. Key-Value Data Stores • Example: SimpleDB – Based on Amazon’s Single Storage Service (S3) – items (represent objects) having one or more pairs (name, value), where name denotes an attribute. – An attribute can have multiple values. – items are combined into domains.
  • 26. Key-Value Stores Extremely simple interface  Data model: (key, value) pairs  Operations: Insert(key,value), Fetch(key), Update(key), Delete(key) Implementation: efficiency, scalability, fault- tolerance  Records distributed to nodes based on key  Replication  Single-record transactions, “eventual consistency”
  • 27. Suitable Use Cases • Storing Session Information • User Profiles, Preferences: Almost every user has a unique userID as well as preferences such as language, color, timezone, which products the user has access to , and so on.
  • 28. Shopping Cart Data • As we want the shopping carts to be available all the time, across browsers, machines, and sessions, all the shopping information can be put into value where the key is the userID
  • 29. Not to Use • Relationships among data • Multi-operation Transactions • Query by Data • Operations by Sets: since operations are limited to one key at a time, there is no way to operate upon multiple keys at the same time. If you need to operate upon multiple keys, you have to handle this from the client side
  • 30. Document Stores Like Key-Value Stores except value is document  Data model: (key, document) pairs  Document: JSON, XML, other semistructured formats  Basic operations: Insert(key,document), Fetch(key), Update(key), Delete(key)  Also Fetch based on document contents Example systems  CouchDB, MongoDB, SimpleDB etc
  • 31. Document-Based • based on JSON format: a data model which supports lists, maps, dates, Boolean with nesting • Really: indexed semistructured documents • Example: Mongo – {Name:"Jaroslav", Address:"Malostranske nám. 25, 118 00 Praha 1“ Grandchildren: [Claire: "7", Barbara: "6", "Magda: "3", "Kirsten: "1", "Otis: "3", Richard: "1"] }
  • 32. MongoDB CRUD operations • CRUD stands for create, read, update, and delete • MongoDB stores data in the form of documents, which are JSON-like field and value pairs.
  • 33. Suitable Use Cases • Event Logging • Content Management Systems • Web Analytics or Real time Analysis • E-commerce Applications
  • 34. Not to use • Complex Transaction Spanning Different Operations • Queries against Varying Aggregate Structure
  • 35. Column-oriented • Store data in column order • Allow key-value pairs to be stored (and retrieved on key) in a massively parallel system – data model: families of attributes defined in a schema, new attributes can be added – storing principle: big hashed distributed tables – properties: partitioning (horizontally and/or vertically), high availability etc. completely transparent to application
  • 37. Cassandra – keyspace: Usually the name of the application; e.g., 'Twitter', 'Wordpress‘. – column family: structure containing an unlimited number of rows – column: a tuple with name, value and time stamp – key: name of record – super column: contains more columns
  • 39. Suitable Use Cases • Event Logging • Content management Systems, blogging platforms
  • 40. Not to use • There are problems for which column-family databases are not best solutions, such as systems that require ACID transactions for writes and reads. • If you need the database to aggregate the data using queries (such as SUM or AVG), you have to do this on the client side using data retrieved by the client from all the rows.
  • 41. Graph Database  Data model: nodes and edges  Nodes may have properties (including ID)  Edges may have labels or roles
  • 42. Graph Database Systems  Interfaces and query languages vary  Single-step versus “path expressions” versus full recursion  Example systems Neo4j, FlockDB, Pregel, …  RDF “triple stores” can map to graph databases