NoSQL

RDBMS?

Prateek Jain
12-Jul-2012

Common Architecture

Tiers shown in the diagram, top to bottom:

 Web servers (three)
 App servers (two)
 Cache server
 RDBMS
 CMS and data feeds

SQL - Story till now…

 Stable environment.
 No more discussions on Data stores.
 Easy to train and employ people.
 SQL running effectively at core.

SQL - Story till now…

 For dealing with lists (as tables) it’s a great
language: dynamic and relatively fast.
• Sure, it has a few problems, but give me a language that
doesn’t.

What Next…?

We need to be fast, scale,
and be part of the web

ORM - OMG!

 ORM is the effort of trying to convert something inherently
hierarchical into something relational.
 It is probably the biggest waste of programming time and
lines of code, and the biggest source of bugs and latency.
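A minimal sketch of the mismatch described above, using plain Python structures rather than any real ORM. The `portfolio` object and the `to_rows` / `from_rows` helpers are hypothetical names made up for illustration.

```python
import json

# Hypothetical object: naturally hierarchical (a portfolio owns its orders).
portfolio = {
    "id": 1,
    "owner": "alice",
    "orders": [
        {"symbol": "ABC", "side": "buy", "price": 10.5},
        {"symbol": "XYZ", "side": "sell", "price": 99.0},
    ],
}


def to_rows(p):
    """Flatten the object into two flat 'tables', as a mapping layer must."""
    portfolio_row = (p["id"], p["owner"])
    order_rows = [
        (p["id"], o["symbol"], o["side"], o["price"]) for o in p["orders"]
    ]
    return portfolio_row, order_rows


def from_rows(portfolio_row, order_rows):
    """Rebuild the object, which is effectively a join on the portfolio id."""
    pid, owner = portfolio_row
    return {
        "id": pid,
        "owner": owner,
        "orders": [
            {"symbol": symbol, "side": side, "price": price}
            for (fk, symbol, side, price) in order_rows
            if fk == pid
        ],
    }


# Relational route: flatten on write, join on read.
row, order_rows = to_rows(portfolio)
rebuilt = from_rows(row, order_rows)

# Document route: the hierarchy is stored and read back as one unit.
stored = json.dumps(portfolio)
loaded = json.loads(stored)
```

Both routes return the same object; the relational one needs two extra translation functions to do it, and real mappings grow with every nested level.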

Challenges

 Data grows exponentially.
 Data is unstructured.
 Data is huge and spread across hundreds or thousands
of nodes.
 SQL is useful - when things are flat

Lots of data

 In the banking world we have a lot of data
 Today 50,000 to 100,000 quotes a second isn’t
unusual
 It gets more complex...
• 10,000 portfolios, each with 1,000 buy/sell orders at specific
prices
• We now have 100,000 prices coming in every second and 10
million orders to watch
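The arithmetic behind those figures, plus a rough sketch of why a naive scan does not work. The per-symbol index and the trigger rule (a buy fires at or below its limit, a sell at or above) are assumptions made for illustration, not something the slides specify.

```python
from collections import defaultdict

PORTFOLIOS = 10_000
ORDERS_PER_PORTFOLIO = 1_000
PRICES_PER_SECOND = 100_000

total_orders = PORTFOLIOS * ORDERS_PER_PORTFOLIO  # 10 million orders
# Checking every incoming price against every order:
naive_checks_per_second = total_orders * PRICES_PER_SECOND  # 10**12


def build_index(orders):
    """Group orders by symbol so a price only touches relevant orders."""
    index = defaultdict(list)
    for order in orders:
        index[order["symbol"]].append(order)
    return index


def triggered(index, symbol, price):
    """Return orders on `symbol` that the new price would trigger.

    Assumed rule: buy fires when price <= limit, sell when price >= limit.
    """
    hits = []
    for order in index.get(symbol, []):
        if order["side"] == "buy" and price <= order["limit"]:
            hits.append(order)
        elif order["side"] == "sell" and price >= order["limit"]:
            hits.append(order)
    return hits


# Tiny hypothetical order book.
orders = [
    {"id": 1, "symbol": "ABC", "side": "buy", "limit": 10.0},
    {"id": 2, "symbol": "ABC", "side": "sell", "limit": 12.0},
    {"id": 3, "symbol": "XYZ", "side": "buy", "limit": 50.0},
]
index = build_index(orders)
hits = triggered(index, "ABC", 9.5)
```

A single machine scanning all orders for every price would need on the order of a trillion comparisons per second, which is why the data and the computation both end up distributed.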

Time is critical

 In the world of trading only the first one gets
the deal; there is no second place.
 Being first to have the order is what makes the
money, but banks now have a “new” problem:
“RISK”

Lots of data, lots of calculations

 There are two main flavors of distributed computing
• Data
• Computation

 Often they are closely related but not always.
 To achieve either we usually need lots of memory and CPUs
 We don’t stack them or put them in clusters these days; we
distribute them

Why not RDBMS?

 Not designed to scale out.
 Strongly ACID compliant.
 Slower running queries (especially with joins).
 Schema based.
 Not suited for changing data structure.

CAP Theorem

 C – consistency
 A – availability
 P – partition tolerance

** You must make trade-offs and sacrifice at least one in favor of
the other two.

Categories

 Document based
 Graph based
 Column based
 Key/value based
 Data structure based
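To make the categories concrete, here is a rough sketch of how one hypothetical user record might be shaped in each. These are plain in-memory Python stand-ins, not the API of any real product, and the key names are made up.

```python
import json

# Key/value: an opaque value looked up by a single key.
key_value = {"user:42": b'{"name": "Ada", "follows": [7]}'}

# Document: the value is a structured document the store can look inside.
document = {"_id": 42, "name": "Ada", "follows": [7]}

# Column based: row key -> column family -> column name -> value.
column_family = {"42": {"profile": {"name": "Ada"}, "follows": {"7": ""}}}

# Graph: nodes plus explicit, typed edges between them.
graph = {
    "nodes": {42: {"name": "Ada"}, 7: {"name": "Bob"}},
    "edges": [(42, "FOLLOWS", 7)],
}

# Data structure based: values are sets, lists, hashes and so on.
data_structure = {"user:42:name": "Ada", "user:42:follows": {7}}


def follows_key_value(store, user_id):
    # The application must fetch and decode the whole value itself.
    return json.loads(store[f"user:{user_id}"])["follows"]


def follows_document(doc):
    return doc["follows"]


def follows_column_family(store, user_id):
    # Column names carry the data; values can be empty.
    return sorted(int(col) for col in store[str(user_id)]["follows"])


def follows_graph(g, user_id):
    return [dst for (src, kind, dst) in g["edges"]
            if src == user_id and kind == "FOLLOWS"]


def follows_data_structure(store, user_id):
    return sorted(store[f"user:{user_id}:follows"])
```

The same question ("whom does user 42 follow?") is answered in every shape, but how much of the work falls on the application differs from one category to the next.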

Eventual Consistency

 Given a sufficiently long period of time, over
which no updates are sent, one can expect
that all updates will, eventually, propagate
through the system and all the replicas will
be consistent.
 In the presence of continuing updates, an
accepted update eventually either reaches a
replica or the replica retires from service.
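A toy simulation of the first point, assuming last-write-wins by timestamp and a simple ring-shaped exchange between replicas. Both are illustrative choices; real stores use their own reconciliation and gossip schemes. `Replica` and `anti_entropy_round` are hypothetical names.

```python
class Replica:
    """In-memory replica that resolves conflicts by newest timestamp."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def merge_from(self, other):
        # Pull every entry the other replica holds; newer timestamps win.
        for key, (ts, value) in other.data.items():
            self.write(key, value, ts)


def anti_entropy_round(replicas):
    """Each replica exchanges state with its right-hand neighbour."""
    n = len(replicas)
    for i in range(n):
        a, b = replicas[i], replicas[(i + 1) % n]
        a.merge_from(b)
        b.merge_from(a)


def converged(replicas):
    return all(r.data == replicas[0].data for r in replicas)


replicas = [Replica() for _ in range(5)]
# Two conflicting updates accepted by different replicas.
replicas[0].write("price", 100, ts=1)
replicas[3].write("price", 101, ts=2)

# No further updates are sent; let the replicas exchange state.
rounds = 0
while not converged(replicas) and rounds < 10:
    anti_entropy_round(replicas)
    rounds += 1
```

Between the write and the last exchange, different replicas can return different values for `price`; once updates stop, they all settle on the newest one.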

Scalability

 Scalability is the ability of a system to
increase throughput with the addition of
resources to address load increases.
 Scalability can be achieved by:
– Provisioning a large and powerful resource to meet the additional
demands (scaling up).
– Relying on a cluster of ordinary machines that
work as a unit (scaling out).
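A small sketch of the second option: spreading keys over a cluster by hashing them. The node names are hypothetical, and plain modulo placement is used only for brevity. It moves most keys whenever the node count changes, which is why distributed stores tend to use consistent hashing or similar schemes instead.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def partition(keys, nodes):
    """Return a mapping of node -> list of keys placed on it."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(10_000)]

three_nodes = partition(keys, ["node-a", "node-b", "node-c"])
six_nodes = partition(
    keys, ["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"]
)

# Heaviest per-node load before and after adding machines.
max_load_three = max(len(v) for v in three_nodes.values())
max_load_six = max(len(v) for v in six_nodes.values())
```

Doubling the number of ordinary machines roughly halves the load each one carries, without any single machine having to grow.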

How to choose ?

 Scalability
 Transactional integrity and consistency
 Data modeling
 Query support
 Access and interface availability

Scalability

 Column-family-centric NoSQL databases are
a good choice if extreme scalability is a
requirement.
 They are not well suited for real-time transaction
processing (an RDBMS is best for that).
 Eventually consistent NoSQL options, like
Cassandra or Riak, may be workable.

Transactional Integrity and Consistency

 Batch-centric analytics on warehoused data
is also not subject to transactional
requirements.
 The same holds for data sets that are written
once, e.g., web traffic log files, social networking
status updates, advertisement click-through imprints,
road-traffic data, stock market tick data, game
scores, etc.

Transactional Integrity and Consistency

 If range operations are common and integrity
of updates is required, an RDBMS is the best
choice.
 If atomicity at an individual item level is
sufficient, then column-family databases and
document databases can provide it.

Data Modeling

 RDBMS offers a consistent way of modeling
data. Relational algebra underlies the data
model.
 In the NoSQL world there is no such
standardized and well-defined data model.

Data Modeling

 If a relaxed schema is your primary reason for
using NoSQL, then MongoDB is a great
option for getting started.
 MongoDB is used by many web-centric
businesses.

Querying Support

 An RDBMS thrives on SQL support, which
makes accessing and querying data easy.
 Among document databases, MongoDB
provides the best querying capabilities.
 For key/value pairs and in-memory stores,
nothing is more feature-rich than Redis as far
as querying capabilities go.

Querying Support

 Column-family stores like HBase have little to
offer as far as rich querying capabilities go.
 A project called Hive makes it possible to
query HBase using SQL-like syntax and
semantics.

Access and Interface Availability

 MongoDB has the notion of drivers.
 CouchDB always has the RESTful HTTP
interface available.
 Redis, Membase, Riak, HBase, Hypertable,
Cassandra, and Voldemort have support for
language bindings to connect from most
mainstream languages.

50/50 Read and Update

 Results show that under this test case
Apache Cassandra outperforms the
competition on both read and update
latencies.
 HBase comes close but stays behind
Cassandra.

95/5 Read and Update

 Sorted, ordered column-family stores
perform best for contiguous range reads.
 HBase seems to deliver consistent
performance for reads, irrespective of the
number of operations per second.
 MySQL delivers the best performance for
read-only cases.
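One way to see why sorted, ordered stores do well on contiguous range reads: when row keys are kept in order, a range read is two binary searches and one contiguous slice. The sketch below is an in-memory illustration only. `SortedStore` and the `ticks#<date>` key layout are hypothetical, not the API of HBase or any other product.

```python
import bisect


class SortedStore:
    """Toy store that keeps its row keys in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of row keys
        self._values = {}  # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys ordered on insert
        self._values[key] = value

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedStore()
# Rows arrive out of order; the store keeps them sorted by key.
for day in ("2012-07-12", "2012-07-10", "2012-07-11", "2012-08-01"):
    store.put(f"ticks#{day}", day)

# All July rows sit next to each other, so one slice returns them.
july = store.scan("ticks#2012-07", "ticks#2012-08")
```

A store that places keys by hash alone would have to visit every key to answer the same query, since neighbouring keys land in unrelated places.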

Future

 Getting ready for polyglot persistence.
 Understanding the database technologies
suitable for immutable data sets.
 Choosing the right database to facilitate ease
of application development.

Examples

 LinkedIn uses Hadoop for many large-scale
analytics jobs, like probabilistically predicting people
you may know.
 Facebook (MySQL + HBase, Cassandra, ZooKeeper)
 Twitter (MySQL + Cassandra + FlockDB)

Feedback

trainer.prateek@gmail.com
