To SQL or
NoSQL?
That is the
question
Krishnakumar S
D E V C O N Kochi
28th March 2015
History of Database Systems
 1960’s : Hierarchical and Network (IMS, CODASYL etc.)
 1970’s : Beginning of theory of relational model of database
 1980’s : Rise of RDBMS and SQL
 1990’s : Spreadsheets and MySQL; evolution of web
 2000’s : Large enterprise & open source; Google & Amazon
 2010’s : Emergence of NoSQL systems
 2020’s : NewSQL?
CAP
theorem
RDBMS
 Strong foundation – Relational Model
 Highly Structured – rows, columns, data types
 Structured Query Language - standardized
 ACID properties – all or nothing
 Joins – new views from relationships
RDBMS – Weakness
 Joins – Not scalable
 Transactions – Read & write operations will
be slow because of locking resources
 Fixed definitions – Difficult to work with
highly variable data
 Document integration – difficult create reports
based on structured & unstructured data
Changing scenarios...
Changing scenarios...
Changing scenarios...
Changing scenarios...
What drives to NoSQL?
Velocity Agility
Volume
Variability
Any existing solution?
• Data partitioning
• Replication
• Clustering
• Query distribution
• Load balancing
• Consistency/Syncing
• Latency/Concurrency
• Network bottle neck
• Multiple data centers
• Distributed backups
• Node failures
• Voting algorithms for failure detection
• Administration of many systems
• Monitoring
RDBMS is scalable only if designed & administered correctly (Period)
NoSQL! What is in a name?
1998 :
• Carlo Strozzi developed a open-source relational database “Strozzi NoSQL”
• Database stores tables as ASCII files; tuples as tab separated values
• It doesn’t use SQL as query language – so given the name “NoSQL”
• Instead it used UNIX shell script and pipeline to retrieve data
Irony! A relational database is named as NoSQL!
2009 :
• Johan Oskarsson organized a meetup of people developing open-source,
distributed, non relational databases on June 11, 2009
• He wanted a simple twitter hash tag for the meetup; quick, memorable, & helps
Google search
• Eric Evans come up with the name NoSQL, for the single meetup
NoSQL! What is in a name?
• The name is negative
• The name does not describe the purpose of their meet up
• The name does not define the new database system
• But; the name just satisfied the twitter tag! And caught on like wildfire
What does it stands for!
• “No to SQL”? Not exactly
• “Not Only SQL”? Then what about SQL Server, Oracle etc.?
The answer is “You don’t worry about what it stands for!
NoSQL
• The NoSQL is a movement
• The NoSQL is an ecosystem for future database technology
• NoSQL is an accidental neologism. There is no prescriptive definition
Characteristics of NoSQL
• Not using the relational model
• Running well in clusters
• Open-source
• Built for 21st century web estates
• Schemaless
The most important result of NoSQL movement is; Polyglot Persistence
Theorems Ahead!
Brewer’s CAP theorem
• In 2000, Eric Brewer presented the CAP principle as conjuncture
• In 2002, Seth Gilbert & Nancy Lynch published a formal proof and rendered
the principle as CAP theorem
There are three essential system requirements necessary for the successful
design, implementation, and deployment of applications in distributed
computing
1. Consistency
2. Availability
3. Partition Tolerance
In majority of instances, a distributed system can only guarantee any two, not
all three
Brewer’s CAP theorem
Consistency refers to whether a system operates fully or not. Do all nodes
within a cluster see all the data they are supposed to? This is the same
idea presented in ACID
Availability means just as it sounds. Is the given service or system available
when requested? Does each request get a response outside of failure or
success?
Partition Tolerance represents the fact that a given system continues to
operate even under circumstances of data loss or system failure. A single
node failure should not cause the entire system to collapse.
In large scale, distributed, non relational systems, they need availability and
partition tolerance, so consistency suffers and ACID collapses
Brewer’s CAP theorem
Pick any two
CA AP
CP
RDBMS’s
SQL Server
Oracle
MySQL etc.
Availability
Each client can always read and
write
Consistency
All clients always have he same
view
of data
Partition
Tolerance
The system works well despite
physical
Network partitions
Bigtable, MongoDB, BerkleyDB, MemcacheDB, Hbase etc
Cassandra
CouchDB
Dynamo
Voldemort
BASE
Basically Available : states that the system does guarantee the availability
of the data as regards CAP Theorem; there will be a response to any
request. But, that response could still be ‘failure’ to obtain the requested
data or the data may be in an inconsistent or changing state
Soft state : The state of the system could change over time, so even during
times without input there may be changes going on due to ‘eventual
consistency,’ thus the state of the system is always ‘soft.’
Eventual Consistency : The system will eventually become consistent once
it stops receiving input. The data will propagate to everywhere it should
sooner or later, but the system will continue to receive input and is not
checking the consistency of every transaction before it moves onto the
next one
It’s OK to use stale data; it’s OK to give approximate answers.
NoSQL Data Architecture Patterns
Key-Value
key value
key value
key value
key value
Column-Family
Graph Document
Key-Value
Key-Value
key value
key value
key value
key value
 Keys used to access opaque
blobs of data
 Values can contain any type of
data (images, video)
Pros: scalable, simple API (put,
get, delete)
Cons: no way to query based on
the content of the value
Column family
Column-Family
 Key includes a row, column
family and column name
 Store versioned blobs in one
large table
 Queries can be done on rows,
column families and column
names
 Pros: Good scale out
 Cons: Can not query blob
content, row and column designs
are critical
Graph Store
Graph  Data is stored in a series of nodes
and properties
 Queries are really graph traversals
 Ideal when relationships between
data is key:
 e.g. social networks
 Pros: fast network search, works
with public linked data sets
 Cons: Poor scalability when graphs
don't fit into RAM, specialized query
language
Document Store
Document  Data stored in nested hierarchies
 Logical data remains stored
together as a unit
 Any item in the document can be
queried
 Pros: No object-relational
mapping layer, ideal for search
 Cons: Complex to implement,
incompatible with SQL
NoSQL & Functional Programming
NoSQL & Functional Programming
Polyglot Persistence
Different database systems are designed to solve different problems
Using single database engine for all the requirements leads to non-
performant solutions
The solution is polyglot persistence; a hybrid approach to data
persistence
NoSQL - Evolution
© Natalino Busa
References
• Making Sense of NoSQL – Dan McCreary and Ann Kelly
• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot
Persistence - Pramod J. Sadalage and Martin Fowler
• Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and
Polyglot Persistence - John Sharp, Douglas McMurtry, Andrew
Oakley, Mani Subramanian, Hanzhong Zhang
To SQL or NoSQL, that is the question
To SQL or NoSQL, that is the question

To SQL or NoSQL, that is the question

  • 1.
    To SQL or NoSQL? Thatis the question Krishnakumar S D E V C O N Kochi 28th March 2015
  • 2.
    History of DatabaseSystems  1960’s : Hierarchical and Network (IMS, CODASYL etc.)  1970’s : Beginning of theory of relational model of database  1980’s : Rise of RDBMS and SQL  1990’s : Spreadsheets and MySQL; evolution of web  2000’s : Large enterprise & open source; Google & Amazon  2010’s : Emergence of NoSQL systems  2020’s : NewSQL? CAP theorem
  • 3.
    RDBMS  Strong foundation– Relational Model  Highly Structured – rows, columns, data types  Structured Query Language - standardized  ACID properties – all or nothing  Joins – new views from relationships
  • 4.
    RDBMS – Weakness Joins – Not scalable  Transactions – Read & write operations will be slow because of locking resources  Fixed definitions – Difficult to work with highly variable data  Document integration – difficult create reports based on structured & unstructured data
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
    What drives toNoSQL? Velocity Agility Volume Variability
  • 10.
    Any existing solution? •Data partitioning • Replication • Clustering • Query distribution • Load balancing • Consistency/Syncing • Latency/Concurrency • Network bottle neck • Multiple data centers • Distributed backups • Node failures • Voting algorithms for failure detection • Administration of many systems • Monitoring RDBMS is scalable only if designed & administered correctly (Period)
  • 11.
    NoSQL! What isin a name? 1998 : • Carlo Strozzi developed a open-source relational database “Strozzi NoSQL” • Database stores tables as ASCII files; tuples as tab separated values • It doesn’t use SQL as query language – so given the name “NoSQL” • Instead it used UNIX shell script and pipeline to retrieve data Irony! A relational database is named as NoSQL! 2009 : • Johan Oskarsson organized a meetup of people developing open-source, distributed, non relational databases on June 11, 2009 • He wanted a simple twitter hash tag for the meetup; quick, memorable, & helps Google search • Eric Evans come up with the name NoSQL, for the single meetup
  • 12.
    NoSQL! What isin a name? • The name is negative • The name does not describe the purpose of their meet up • The name does not define the new database system • But; the name just satisfied the twitter tag! And caught on like wildfire What does it stands for! • “No to SQL”? Not exactly • “Not Only SQL”? Then what about SQL Server, Oracle etc.? The answer is “You don’t worry about what it stands for!
  • 13.
    NoSQL • The NoSQLis a movement • The NoSQL is an ecosystem for future database technology • NoSQL is an accidental neologism. There is no prescriptive definition Characteristics of NoSQL • Not using the relational model • Running well in clusters • Open-source • Built for 21st century web estates • Schemaless The most important result of NoSQL movement is; Polyglot Persistence Theorems Ahead!
  • 14.
    Brewer’s CAP theorem •In 2000, Eric Brewer presented the CAP principle as conjuncture • In 2002, Seth Gilbert & Nancy Lynch published a formal proof and rendered the principle as CAP theorem There are three essential system requirements necessary for the successful design, implementation, and deployment of applications in distributed computing 1. Consistency 2. Availability 3. Partition Tolerance In majority of instances, a distributed system can only guarantee any two, not all three
  • 15.
    Brewer’s CAP theorem Consistencyrefers to whether a system operates fully or not. Do all nodes within a cluster see all the data they are supposed to? This is the same idea presented in ACID Availability means just as it sounds. Is the given service or system available when requested? Does each request get a response outside of failure or success? Partition Tolerance represents the fact that a given system continues to operate even under circumstances of data loss or system failure. A single node failure should not cause the entire system to collapse. In large scale, distributed, non relational systems, they need availability and partition tolerance, so consistency suffers and ACID collapses
  • 16.
    Brewer’s CAP theorem Pickany two CA AP CP RDBMS’s SQL Server Oracle MySQL etc. Availability Each client can always read and write Consistency All clients always have he same view of data Partition Tolerance The system works well despite physical Network partitions Bigtable, MongoDB, BerkleyDB, MemcacheDB, Hbase etc Cassandra CouchDB Dynamo Voldemort
  • 17.
    BASE Basically Available :states that the system does guarantee the availability of the data as regards CAP Theorem; there will be a response to any request. But, that response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state Soft state : The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’ Eventual Consistency : The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves onto the next one It’s OK to use stale data; it’s OK to give approximate answers.
  • 18.
    NoSQL Data ArchitecturePatterns Key-Value key value key value key value key value Column-Family Graph Document
  • 19.
    Key-Value Key-Value key value key value keyvalue key value  Keys used to access opaque blobs of data  Values can contain any type of data (images, video) Pros: scalable, simple API (put, get, delete) Cons: no way to query based on the content of the value
  • 20.
    Column family Column-Family  Keyincludes a row, column family and column name  Store versioned blobs in one large table  Queries can be done on rows, column families and column names  Pros: Good scale out  Cons: Can not query blob content, row and column designs are critical
  • 21.
    Graph Store Graph Data is stored in a series of nodes and properties  Queries are really graph traversals  Ideal when relationships between data is key:  e.g. social networks  Pros: fast network search, works with public linked data sets  Cons: Poor scalability when graphs don't fit into RAM, specialized query language
  • 22.
    Document Store Document Data stored in nested hierarchies  Logical data remains stored together as a unit  Any item in the document can be queried  Pros: No object-relational mapping layer, ideal for search  Cons: Complex to implement, incompatible with SQL
  • 23.
    NoSQL & FunctionalProgramming
  • 24.
    NoSQL & FunctionalProgramming
  • 25.
    Polyglot Persistence Different databasesystems are designed to solve different problems Using single database engine for all the requirements leads to non- performant solutions The solution is polyglot persistence; a hybrid approach to data persistence
  • 26.
    NoSQL - Evolution ©Natalino Busa
  • 27.
    References • Making Senseof NoSQL – Dan McCreary and Ann Kelly • NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence - Pramod J. Sadalage and Martin Fowler • Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence - John Sharp, Douglas McMurtry, Andrew Oakley, Mani Subramanian, Hanzhong Zhang