(No)SQL
Radu Vunvulea
vunvulear@gmail.com
http://vunvulearadu.blogspot.com
{
"name" : "Radu Vunvulea",
"company" : "iQuest",
"userType" : "enthusiastic",
"technologies" : [ ".NET", "JS", "Azure", "Web",
"Mobile", "SL" ],
"blog" : "vunvulearadu.blogspot.com",
"email" : "vunvulear@gmail.com",
"socialMedia" :
{
"twitter" : "@RaduVunvulea",
"fb" : "radu.vunvulea"
}
}
Who am I?
SQL
Relational databases began to be defined in the
1970s. One of the proponents of relational
database theory was Edgar F. Codd, who in 1985
published 13 rules (numbered 0 to 12) that set
out to define a relational database. These rules
formalized the scientific groundwork that lays
down what makes a database relational.
Source: http://www.ehow.com
Relevant rules
• Relational facilities
• Information is represented only in one way
• All data must be accessible
• All views that are theoretically updatable
must be updatable by the system
• Insert, Update and Delete for any retrievable set
• Where does this name come from?
• Non-relational
• Web-scale database
NoSQL
Any database that is not a Relational
Database!
It is as simple as that.
What is NoSQL?
• Non-Relational Database
• But it is too long
• It is not so cool
• This name would not have caught on
…so we are back to
NoSQL
A better name would be
• More and more connections between data
• Everything is linked to something more… and
more… and so on
• Hyperlinks
• Tags
• RSS
• RDF
• Attributes
• User content
Database trends – 1 Connections
• From a flat architecture (one App on one DB)
• To a coupled one (several Apps sharing one DB)
• And now to a decoupled one based on services
(each App with its own DB)
Database trends – 2 Architecture
• Since Web 2.0, data no longer has such a
fixed structure (it is more flexible)
• How many phone numbers could a person have
in 1970? And now…
Database trends – 3 No fixed structure
• 2006 – 160
• 2008 – 390
• 2010 – 998
• 2012 – 2000+
• First column is in years
• Second column is in … ?
Database trends – 4 Data Size
Chart: what we need vs. relational database (RDBMS) performance
So, we end up with
Non-Relational Database
1. Key-Value
2. Document
3. Big Table
4. Graph DB
Categories of NoSQL Database
• Designed to handle massive load
• Can scale to massive amounts of data
• Based on Key-Value collections
• Dynamic ring partition
• Dynamic replication
• Ex.: Dynomite
Key-Value
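The "dynamic ring partition" and "dynamic replication" bullets can be illustrated with a small consistent-hashing sketch. This is a simplified illustration, not the implementation of any particular product: the class `HashRing`, the node names and the replica count are all hypothetical, and real stores add virtual nodes and membership protocols on top.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a stable position on the ring (md5 only spreads keys here)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each key is stored on the next `replicas` nodes clockwise."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node):
        # Adding a node only takes over the key range just before its position.
        bisect.insort(self._ring, (ring_position(node), node))

    def nodes_for(self, key):
        """Return the primary node for `key` followed by its replica nodes."""
        positions = [position for position, _ in self._ring]
        start = bisect.bisect_right(positions, ring_position(key)) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c"])
owners = ring.nodes_for("user:42")  # primary node first, then one replica
```

When a node joins, only the keys that fall just before its position on the ring move to it; every other key keeps its owner, which is what makes the partitioning "dynamic".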
• Like a column-oriented relational database,
but with a twist
• Tables similar to an RDBMS, but they handle
semi-structured data
• Based on Google’s BigTable paper
• Data model:
• Columns – column families -> ACL
• Datums keyed by row, column, time, index
• Row-range – table -> distribution
• Ex.: Cassandra
Big Table
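The data model above (datums keyed by row, column and time, with row ranges as the unit of distribution) can be sketched as a nested map. This is a toy illustration under stated assumptions: `TinyBigTable`, its methods and the sample rows are hypothetical names, not the API of Cassandra or of any BigTable implementation.

```python
from collections import defaultdict


class TinyBigTable:
    """Hypothetical sparse table: row -> "family:qualifier" -> versioned cells."""

    def __init__(self):
        # Each cell list holds (timestamp, value) pairs, newest first.
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        for cell_time, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or cell_time <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Return the row keys in [start_row, end_row), in sorted order."""
        return [row for row in sorted(self._rows) if start_row <= row < end_row]


table = TinyBigTable()
table.put("com.example/index", "anchor:home", "Home", timestamp=1)
table.put("com.example/index", "anchor:home", "Homepage", timestamp=2)
table.put("org.sample/about", "contents:html", "<html>", timestamp=1)

latest = table.get("com.example/index", "anchor:home")
older = table.get("com.example/index", "anchor:home", timestamp=1)
com_rows = table.scan("com", "d")
```

Rows do not need to share the same columns, which is how this model handles semi-structured data; a sorted row range such as the one returned by `scan` is what would be assigned to a server.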
• Similar to a Key-Value store, but
• The DB knows what the Value is
• Inspired by Lotus Notes
• Data model:
• Collections of Key-Value collections
• Documents are often versioned
• Ex.: MongoDB
Document Database
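Because the database "knows what the Value is", it can query on fields inside a document instead of only looking up by key. The sketch below is a hypothetical in-memory collection, not MongoDB's API; `DocumentCollection`, the `_rev` field and the sample phone numbers are placeholders chosen for illustration.

```python
import copy
import itertools


class DocumentCollection:
    """Hypothetical collection of schema-free documents with simple versioning."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, document):
        doc_id = next(self._ids)
        # Store a private copy and start the revision counter at 1.
        self._docs[doc_id] = {"_rev": 1, **copy.deepcopy(document)}
        return doc_id

    def update(self, doc_id, changes):
        doc = self._docs[doc_id]
        doc.update(copy.deepcopy(changes))
        doc["_rev"] += 1  # every update produces a new revision number

    def find(self, **criteria):
        """Return copies of all documents whose fields match every criterion."""
        return [
            copy.deepcopy(doc)
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


people = DocumentCollection()
radu = people.insert(
    {"name": "Radu", "company": "iQuest", "phones": ["0700-000-001", "0700-000-002"]}
)
people.insert({"name": "Ana", "company": "iQuest"})  # no "phones" field at all
people.update(radu, {"blog": "vunvulearadu.blogspot.com"})

iquest_people = people.find(company="iQuest")
```

The two documents have different fields, yet both match the query on `company`; a plain Key-Value store could only return them by their key.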
• Focuses on modeling the structure of data
• The interconnectivity
• Scales with the complexity of data
• Inspired by mathematical Graph Theory
• Data model:
• Property Graph -> Nodes
• Relationships/Edges between Nodes
• Key-Value pair on both
• Possible Edge Labels and/or Node/Edge Types
• Ex.: Neo4j
Graph Database
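A property graph, as described above, is a set of nodes and labelled edges that both carry Key-Value pairs. The following is a minimal sketch of that model, not Neo4j's API: `PropertyGraph`, the `FOLLOWS` label and the sample people and years are hypothetical.

```python
class PropertyGraph:
    """Hypothetical property graph: nodes and labelled edges, both with properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (source id, label, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target, **properties):
        self.edges.append((source, label, target, properties))

    def neighbours(self, node_id, label):
        """Return the targets of all outgoing edges with the given label."""
        return {
            target
            for source, edge_label, target, _ in self.edges
            if source == node_id and edge_label == label
        }

    def two_hops(self, node_id, label):
        """Return nodes reachable in exactly two hops that are not already direct."""
        direct = self.neighbours(node_id, label)
        reachable = set()
        for neighbour in direct:
            reachable |= self.neighbours(neighbour, label)
        return reachable - direct - {node_id}


graph = PropertyGraph()
graph.add_node("radu", name="Radu")
graph.add_node("ana", name="Ana")
graph.add_node("dan", name="Dan")
graph.add_edge("radu", "FOLLOWS", "ana", since=2011)
graph.add_edge("ana", "FOLLOWS", "dan", since=2012)
graph.add_edge("ana", "FOLLOWS", "radu", since=2012)

suggestions = graph.two_hops("radu", "FOLLOWS")
```

The "who do my contacts follow" question is a traversal over relationships; in a relational schema the same question would need a self-join for every hop.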
• Not part of the NoSQL community
• Still a good solution for a lot of problems
• Focuses on matching the OOP paradigm
• Easy to use
• Simple to integrate
• Neither gaining nor losing traction
Object Database
• Easy to deploy
• No OS management
• Scaling
• Monitoring
• Publish from different source controls
• Support different technologies (PHP, node.js,
.NET)
• Low cost support – shared mode
• Reserved mode – dedicated instance
• Each site runs in an isolated environment
Web Sites
Scaling (chart: Size vs. Complexity for Key-Value, Big Table, Document and Graph databases)
• REST
• GQL (SQL Like)
• SPARQL
• Gremlin
• APIs
How to query it?
• Replication
• Write to many
• Master/Slave replication
• Master re-election
• Failover
• Either by another machine taking over
• Or by the client knowing where to go next
Availability
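The replication and failover ideas above can be sketched in a few lines. This is a deliberately simplified model under stated assumptions: `Replica` and `ReplicaSet` are hypothetical names, writes are copied synchronously to every live replica, and the "election" just picks the first live replica, whereas real systems use a proper election protocol.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class ReplicaSet:
    """Hypothetical master/slave group with write-to-many and master re-election."""

    def __init__(self, names):
        self.replicas = [Replica(name) for name in names]
        self.master = self.replicas[0]

    def elect_master(self):
        # Simplification: the first live replica becomes the new master.
        candidates = [replica for replica in self.replicas if replica.alive]
        if not candidates:
            raise RuntimeError("no replica available")
        self.master = candidates[0]

    def write(self, key, value):
        if not self.master.alive:
            self.elect_master()
        # Write to many: every live replica receives the value.
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = value

    def read(self, key):
        if not self.master.alive:
            self.elect_master()  # failover: another machine takes over
        return self.master.data.get(key)


cluster = ReplicaSet(["db-1", "db-2", "db-3"])
cluster.write("price:ACME", 101)
cluster.replicas[0].alive = False  # simulate the master going down
value_after_failover = cluster.read("price:ACME")
new_master = cluster.master.name
```

The read still succeeds after the first machine fails because the value was already copied to the other replicas before the failure.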
• Most NoSQL databases sacrifice consistency
• Some NoSQL databases don’t have transactions
• Only single operations are atomic
• Because of this, some operations are
impossible to implement
Correctness
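To show what "only single operations are atomic" allows and what it rules out, here is a small sketch of a store whose only atomic primitive is a single-key compare-and-set. `KeyValueStore`, `compare_and_set` and `increment` are hypothetical names used for illustration, not the API of a specific database.

```python
import threading


class KeyValueStore:
    """Hypothetical store: each call is atomic, but there are no transactions."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key, expected, new_value):
        """Set `key` to `new_value` only if it still holds `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new_value
            return True


def increment(store, key):
    # Retry loop: a counter can be built from the single-key atomic operation.
    while True:
        current = store.get(key)
        if store.compare_and_set(key, current, (current or 0) + 1):
            return


store = KeyValueStore()
workers = [
    threading.Thread(target=lambda: [increment(store, "votes") for _ in range(100)])
    for _ in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
total_votes = store.get("votes")
```

A counter on one key works because it needs only one atomic operation. Moving an amount from one key to another would need two writes to succeed or fail together, and with this store there is no way to guarantee that.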
• NoSQL is the Batman
• Durability is sacrificed
• On-disk durability
• Multiple-replicas durability
Performance
One solution for all our problems?
• Why
• Dynamic queries
• Content is stored as documents
• Big databases that need to be very fast
• Where
• Where properties need to be queried and indexed
• Can be used for voting systems, CMS or comment
storage
MongoDB
• Why
• When you make a lot of updates and inserts
• Reading data is not the main purpose of the database
(writes are faster than reads)
• Content is stored in columns
• High availability
• Where
• Can be used successfully for logging
• The financial industry or any place where a lot of
data needs to be written
• The basket of an e-commerce application
Cassandra
• Why
• For data that doesn’t change very often (insert
and read, NOT update)
• When we have a lot of predefined queries and
need versioning support
• Where
• A great database for CMS and CRM
CouchDB
• Why
• When you do data analysis
• Where
• Works great in combination with Hadoop
HBase
• Why
• When we need high concurrency
• When we need latency to be as low as
possible
• Where
• Backend of a game or a system that offers
data in real time
Membase
• Why
• When we need to make a lot of updates
• When the database is not too big and can be
kept in memory
• Where
• Can be used for real-time communication,
for example a stock market with changing
prices
Redis
• Facebook
• HBase – Facebook messages
• Scribe – Real-time click logs
• Hive – SQL queries -> MapReduce jobs
• Hadoop
• Web analytics warehouse
• Distributed datastore
• MySQL backup
Examples
• Twitter
• Hadoop – Analytics
• HBase – People search
• Scribe – Log collection framework
• FlockDB – Social graph analysis
Examples
Question
Answers
THE END
Radu Vunvulea
vunvulear@gmail.com
http://vunvulearadu.blogspot.com
