NoSQL – Rise of the Clusters

September 19, 2013
Speaker: David Wolfe

Topics
What is SQL? What is NoSQL?
Why have relational databases been successful?
Why did NoSQL databases emerge?
How are their data models different?


SQL & relational databases
Relational databases are software applications that store data
Data is stored in tables that have rows and columns: think Excel spreadsheets


FirstName  LastName   Age  Zipcode  Gender
Bob        Smith      45   38444    M
Jane       Happy      23   15122    F
Fred       Jones      55   92102    M
Johnny     Appleseed  26   90025    M



Relational databases typically have many tables that are “related” to one another
Relational databases support access to data in tables through a language called “SQL” (Structured Query Language)
SQL supports “set”-based operations on tables: selection, projection, joining




SQL is based on relational algebra
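
As a rough illustration of the three set operations named above, here is a minimal sketch using Python's built-in sqlite3 module. The person rows come from the sample table on the earlier slide; the purchase table, its contents, and all column names are assumptions made up for this example.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
        age INTEGER, zipcode TEXT, gender TEXT
    );
    -- Hypothetical second table, "related" to person through person_id.
    CREATE TABLE purchase (person_id INTEGER REFERENCES person(id), item TEXT);
    INSERT INTO person VALUES
        (1, 'Bob', 'Smith', 45, '38444', 'M'),
        (2, 'Jane', 'Happy', 23, '15122', 'F'),
        (3, 'Fred', 'Jones', 55, '92102', 'M'),
        (4, 'Johnny', 'Appleseed', 26, '90025', 'M');
    INSERT INTO purchase VALUES (1, 'book'), (2, 'lamp'), (2, 'desk');
""")

# Selection: keep only the rows that satisfy a predicate.
selection = conn.execute(
    "SELECT * FROM person WHERE age > 40 ORDER BY id").fetchall()

# Projection: keep only some of the columns.
projection = conn.execute(
    "SELECT first_name, age FROM person ORDER BY id").fetchall()

# Join: combine rows from two related tables on a shared key.
joined = conn.execute("""
    SELECT p.first_name, b.item
    FROM person AS p JOIN purchase AS b ON b.person_id = p.id
    ORDER BY p.id, b.item
""").fetchall()
```

Each query returns a set of rows, which is the sense in which SQL is “set”-based: the application says what rows it wants, not how to loop over them.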

Relational databases were developed in the late 1970s at IBM
They were the dominant approach to data management in the enterprise through the early 2000s
Examples include
  Oracle
  Sybase
  MySQL
  PostgreSQL

NoSQL databases
NoSQL databases are software applications that store data
Not surprisingly, they do not use SQL or the relational model (interrelated tables)
They are “less strict” about data definition
They were developed in a “big-data” world for applications needing massive scalability (clustering)


NoSQL databases


There are many types of NoSQL
databases

We will review the differences later

RDBMS value - persistence
During the 1990s and 2000s, as PCs became ubiquitous, distributed computing took off.
In the 1990s, client-server and n-tier architectures dominated enterprise development
The late 1990s and 2000s saw the dominance of the web and of distributed applications that broke out of the enterprise


RDBMS value - persistence


In this distributed world, applications needed to keep data around for
  Many users
  Extended periods
RDBMS emerged as the de facto choice for persisting data.

RDBMS value - concurrency


Another challenge that distributed applications presented was concurrency:
  many users viewing and potentially updating the same data at the same time
Concurrency is notoriously difficult to get right, even for the best engineers.
Relational databases “helped” by controlling data access with transactions
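
A minimal sketch of what a transaction buys the application, using Python's built-in sqlite3 module. The account table, the names, and the balances are invented for illustration. It shows only the all-or-nothing side of a transaction; it does not try to demonstrate isolation between truly concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two rows; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False

ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards: the database, not the application code, undoes the partial work.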


RDBMS value - integration


Enterprise application ecosystems require multiple integrated software applications. Examples:
  Customer service app
  Business intelligence app
  E-commerce app
  Inventory management apps
A common approach was to integrate these applications through a shared RDBMS.

RDBMS value – SQL
RDBMS providers all supported a core SQL standard
In theory, this would allow developers to switch between RDBMS providers without problems
In practice, different providers (Oracle, Sybase, Microsoft) developed different “dialects” or SQL extensions (PL/SQL vs. T-SQL)


Crack #1– impedance mismatch


Impedance mismatch is the difference between the relational model and in-memory data structures
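
To make the mismatch concrete, here is a small sketch in Python. The Order and LineItem classes and the two row layouts are hypothetical; the point is only that one nested in-memory object has to be taken apart into flat rows and put back together again.

```python
from dataclasses import dataclass, field

@dataclass
class LineItem:
    product: str
    quantity: int

@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)  # nested list

def to_rows(order):
    """Flatten one nested object into flat rows for two hypothetical tables."""
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, i.product, i.quantity) for i in order.items]
    return order_row, item_rows

def from_rows(order_row, item_rows):
    """Reassemble the object, as a join plus mapping code would have to."""
    order_id, customer = order_row
    items = [LineItem(p, q) for (oid, p, q) in item_rows if oid == order_id]
    return Order(order_id, customer, items)

order = Order(99, "Bob", [LineItem("book", 2), LineItem("lamp", 1)])
order_row, item_rows = to_rows(order)
rebuilt = from_rows(order_row, item_rows)
```

The to_rows and from_rows functions are the translation layer that every application on a relational database needs in some form, whether hand-written or generated by a mapping framework.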

In the late 1990s, people believed that impedance mismatch would lead to RDBMS being replaced by databases that replicated in-memory structures to disk (OODBMS)
While the 1990s saw the rise of OO programming languages, OODBMS never gained real traction




OODBMS didn’t gain traction because
  Impedance mismatch had been made easier to deal with by Object-Relational (OR) mapping frameworks like Hibernate, iBatis, & Cocoon
  There was a growing professional divide between application developers and database administrators
  The value of RDBMS as an app integration mechanism was large

Crack #2– SOA
The 2000s saw a shift in how enterprise applications interacted
Historically, many applications interacted through a shared RDBMS.
This approach (a shared integration RDBMS) has serious problems
  Overly complex schema
  Can’t change tables or add indices easily
  Database has to preserve integrity

Crack #2– SOA
Interactions between applications shifted to web services
Web services are built on protocols for moving documents (XML, JSON) over HTTP using SOAP- or REST-based approaches
SOA allowed applications to encapsulate data and expose it through services


The Final Crack #3– Clusters
The Internet saw several large web properties dramatically increase in scale
Websites started tracking activity and structure in a very detailed way
  Social gestures
  Social links
  Log data
  Purchase gestures
Increasing numbers of users appeared, using more devices

The Final Crack #3– Clusters
The problem with scaling out (clustering) is that RDBMS are not designed to run on clusters.
Oracle RAC & MS SQL Server both use the concept of a shared disk subsystem
  Still a single point of failure and a scaling limitation
The final crack: the mismatch between RDBMS & clusters

NoSQL Emergence


The emergence of NoSQL was really
about needing databases that run on
clusters




One exception is Graph databases

Though problems with shared database
integration and impedance mismatch
existed, it was the need for scale that
drove the emergence of NoSQL
databases

Aggregate Data Models
A key characteristic of NoSQL
databases is that they do not use the
Relational data metamodel (relations &
tuples)
 There are four types of data
metamodels in the NoSQL eco-system







Key-value
Document
Column-family
Graph

Aggregate Data Models


Key-value, document, and column-family NoSQL databases share a common characteristic of their data models called “aggregate orientation”

We will not cover graph-based data metamodels in this presentation

Aggregates







The relational model takes the information you want to store and divides it into rows.
In an RDBMS, rows are lists of simple data values.
In an RDBMS, rows are the unit of data operation
Aggregate orientation recognizes that data units are often more complex and can have nested lists and record structures
With aggregate orientation, aggregates are the unit of data operation

Consequences of Aggregate
Orientation
Relations capture data elements and their relationships, but not aggregates.
Aggregates are really “chunks” of data that are typically retrieved and operated on as an interaction unit.
Aggregates are about how the data is being used.
RDBMS have no knowledge of aggregate structure and can’t use it to store and distribute data


Consequences of Aggregate Orientation
So, RDBMS are aggregate-ignorant. Is that a bad or a good thing? It’s both
It’s good if you need to access and use the data in many different ways, that is, if you don’t have a primary structure for manipulating your data
It’s bad if you want to run on a cluster.
Aggregates are great on clusters because you can distribute them across nodes
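
One way to picture distributing aggregates across nodes is to hash each aggregate's key and let the hash pick the node. The sketch below is an assumption-laden toy: the node names are made up, plain dicts stand in for servers, and simple modulo placement is used where a real cluster database would typically use something more elaborate (for example, a scheme that copes with nodes joining and leaving).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def node_for(key, nodes=NODES):
    """Choose a node from a stable hash of the aggregate's key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Each node holds its own key -> aggregate map (a dict stands in for a server).
cluster = {node: {} for node in NODES}

def put(key, aggregate):
    """Store the whole aggregate on exactly one node."""
    cluster[node_for(key)][key] = aggregate

def get(key):
    """Fetch the whole aggregate back with a single-node lookup."""
    return cluster[node_for(key)].get(key)

for order_id in range(1, 7):
    put(f"order:{order_id}", {"id": order_id, "items": ["book"] * order_id})
```

Because everything that belongs to one order travels together, a read or write touches a single node. A relational layout would spread the same order over several tables with nothing telling the database that those rows belong together.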


Consequences of Aggregate Orientation
Aggregate orientation allows you to operate on many logical data items (in the aggregate) by updating the aggregate atomically
Aggregate-oriented NoSQL databases can be said to support transactions on single aggregates, but not across aggregates
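
A toy illustration of atomic updates to a single aggregate. AggregateStore is a hypothetical in-memory class written for this sketch, not the API of any real NoSQL product: it applies changes to a private copy and swaps the copy in only if every change succeeded.

```python
import copy
import threading

class AggregateStore:
    """Toy in-memory store: each update to ONE aggregate is all-or-nothing."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, mutate):
        """Apply mutate() to a private copy; swap it in only if it succeeds."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            mutate(draft)            # may raise; the stored aggregate is untouched
            self._data[key] = draft  # one swap makes all changes visible together

store = AggregateStore()
store.put("order:99", {"status": "open", "items": [{"product": "book", "qty": 2}]})

def add_lamp_and_close(order):
    # Two logical changes inside one aggregate, applied as a unit.
    order["items"].append({"product": "lamp", "qty": 1})
    order["status"] = "closed"

store.update("order:99", add_lamp_and_close)

def broken(order):
    order["items"].clear()
    raise ValueError("simulated failure halfway through")

try:
    store.update("order:99", broken)
except ValueError:
    pass  # the half-applied change was discarded

result = store.get("order:99")
```

Nothing in this store ties two update() calls on different keys together, which mirrors the slide's point: changes that span aggregates have to be coordinated by the application.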


Key-Value & Document Data
Models
Both types of databases have a key or ID that is mapped to an aggregate data structure in a virtual table
With key-value NoSQL DBs, we can only access the aggregate by looking up its key
With document databases, we can also look up aggregates by fields in the aggregate
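
The difference in access paths can be sketched with two toy classes. Both are hypothetical stand-ins written for this example, not the client API of any product: the key-value store treats the aggregate as opaque bytes, while the document store can see its fields.

```python
import json

class KeyValueStore:
    """The value is an opaque blob to the store: lookup by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key) -> bytes:
        return self._data[key]

class DocumentStore:
    """The store can see inside each aggregate, so it can query by field."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc: dict):
        self._docs[doc_id] = doc

    def get(self, doc_id) -> dict:
        return self._docs[doc_id]

    def find(self, **criteria):
        """Return every document whose fields match all given values."""
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

kv = KeyValueStore()
docs = DocumentStore()

order = {"id": 99, "customer": "Bob", "status": "open"}
kv.put("order:99", json.dumps(order).encode("utf-8"))  # store sees only bytes
docs.put("order:99", order)
docs.put("order:100", {"id": 100, "customer": "Jane", "status": "open"})

by_key = json.loads(kv.get("order:99"))   # the application decodes the blob
bobs_orders = docs.find(customer="Bob")   # the store itself filters on a field
```

With the key-value store, finding Bob's orders without knowing their keys would mean fetching and decoding every value in the application.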


Key-Value & Document Data
Models


Examples of key-value NoSQL DBs are
  Redis
Examples of document NoSQL DBs are
  MongoDB
  Couchbase
  SimpleDB

Column-Family Data Models
These NoSQL databases were influenced by Google’s Bigtable
The column-family model is a two-level aggregate structure
  There is a key (row identifier) that maps to the aggregate of interest
  The aggregate is a map of more detailed values; these are referred to as columns

Column-family DBs organize columns into families
The data is row-oriented
  Each row is an aggregate (e.g. customer with ID 1234)
The data is column-oriented
  Each column family defines a record type (customer profile)
But columns can also be dynamic and unique (to model lists)
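
The two-level structure can be pictured as nested maps: row key, then column family, then column. The sketch below uses plain Python dicts. The customer with ID 1234 and the profile family come from the slides; the orders family, the column names, and the values are assumptions made up to show dynamic columns modeling a list.

```python
# row key -> column family -> column name -> value
customers = {
    "1234": {
        # Fixed columns: this family acts like a record type.
        "profile": {"name": "Bob Smith", "zipcode": "38444"},
        # Dynamic columns: one column per order, so the family models a list.
        "orders": {"order:99": "book x2", "order:100": "lamp x1"},
    },
}

def get_family(table, row_key, family):
    """Read one column family of one row (columns used together, read together)."""
    return table[row_key][family]

def add_column(table, row_key, family, column, value):
    """Add or overwrite a single column; new column names need no schema change."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

add_column(customers, "1234", "orders", "order:101", "desk x1")

profile = get_family(customers, "1234", "profile")
order_columns = list(get_family(customers, "1234", "orders"))
```

Seen by row, customer 1234 is one aggregate. Seen by column family, profile and orders are separate groups of columns that tend to be accessed together.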



Examples of column-family NoSQL DBs are
  HBase
  Cassandra

Polyglot Persistence
The future?
 Only NoSQL?
 Only SQL?


Probably both – Polyglot Persistence
