NoSQL

RDBMS?

Prateek Jain
12-Jul-2012

Common Architecture

[Figure: layers, top to bottom: three web servers, two app servers, a cache server, the RDBMS, and a CMS and data feeds.]
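The diagram does not say how the cache server is used, but it commonly sits between the app servers and the RDBMS in a cache-aside (lazy loading) pattern. Here is a minimal Python sketch. It assumes an in-memory `dict` stands in for the cache server and SQLite stands in for the RDBMS. The `article` table and the `CachedArticles` class are hypothetical.

```python
import sqlite3

class CachedArticles:
    """Cache-aside reads: check the cache first, go to the database only on a miss."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        self.cache = {}      # stand-in for the cache server
        self.db_reads = 0    # how often the RDBMS tier is actually hit

    def get(self, article_id: int):
        if article_id in self.cache:
            return self.cache[article_id]
        self.db_reads += 1
        row = self.db.execute(
            "SELECT title FROM article WHERE id = ?", (article_id,)
        ).fetchone()
        title = row[0] if row else None
        self.cache[article_id] = title  # populate so later reads skip the database
        return title

db = sqlite3.connect(":memory:")  # stand-in for the RDBMS
db.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO article VALUES (1, 'Hello')")

articles = CachedArticles(db)
for _ in range(3):
    articles.get(1)
print(articles.db_reads)  # 1
```

Only the first of the three reads reaches the database. The sketch leaves out cache invalidation when the underlying data changes.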

SQL - Story till now…

• A stable environment.
• No more debates about which data store to use.
• Easy to train and hire people.
• SQL runs effectively at the core.

SQL - Story till now…

• For dealing with lists (as tables) it’s a great language: dynamic and relatively fast.
  – Sure, it has a few problems, but name a language that doesn’t.

What Next…?

We need to be fast, scale, and be part of the web

ORM - OMG!

• The effort of trying to convert something inherently hierarchical into something relational.
• ORM is probably the biggest waste of programming time and lines of code, and the biggest source of bugs and latency.
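The following small Python sketch shows the mismatch concretely. It uses hypothetical data and helper names to illustrate what an ORM layer has to do: flatten a hierarchical portfolio into two relational tables linked by a foreign key, then join them back to rebuild the tree. A document store could keep the nested structure as it is.

```python
# Hypothetical portfolio: naturally a tree (a portfolio owns its orders).
portfolio = {
    "id": 1,
    "owner": "alice",
    "orders": [
        {"side": "buy", "symbol": "IBM", "price": 195.0, "qty": 100},
        {"side": "sell", "symbol": "MSFT", "price": 30.5, "qty": 50},
    ],
}

def to_rows(p):
    """Flatten the tree into a portfolio table and an order table, as an ORM must for an RDBMS."""
    portfolio_rows = [(p["id"], p["owner"])]
    order_rows = [
        # (order id, portfolio id as foreign key, side, symbol, price, qty)
        (i, p["id"], o["side"], o["symbol"], o["price"], o["qty"])
        for i, o in enumerate(p["orders"], start=1)
    ]
    return portfolio_rows, order_rows

def from_rows(portfolio_rows, order_rows):
    """Rebuild the tree: in SQL terms, a join on the foreign key."""
    pid, owner = portfolio_rows[0]
    orders = [
        {"side": side, "symbol": sym, "price": price, "qty": qty}
        for (_oid, fk, side, sym, price, qty) in order_rows
        if fk == pid
    ]
    return {"id": pid, "owner": owner, "orders": orders}

p_rows, o_rows = to_rows(portfolio)
print(o_rows)
```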

Challenges

• Data grows exponentially.
• Data is unstructured.
• Data is huge and spread across hundreds or thousands of nodes.
• SQL is useful when things are flat.

Lots of data

• In the banking world we have a lot of data.
• Today 50,000 to 100,000 quotes a second isn’t unusual.
• It gets more complex...
  – 10,000 portfolios, each with 1,000 buy/sell orders at specific prices
  – We now have 100,000 prices coming in every second and 10 million orders to watch
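The slide's figures multiply out quickly. This back-of-the-envelope Python sketch assumes a hypothetical naive design that re-checks every order against every incoming price:

```python
portfolios = 10_000
orders_per_portfolio = 1_000
prices_per_second = 100_000

orders_to_watch = portfolios * orders_per_portfolio  # the slide's 10 million
# Hypothetical naive design: every incoming price is compared with every order.
naive_checks_per_second = orders_to_watch * prices_per_second

print(f"{orders_to_watch:,} orders, {naive_checks_per_second:,} naive checks per second")
# 10,000,000 orders, 1,000,000,000,000 naive checks per second
```

No real system would use the naive all-pairs check. The numbers simply show the scale involved.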

Time is critical

• In the world of trading only the first one gets the deal; there is no second place.
• While being first with the order is what makes the money, banks now have a “new” problem: “RISK”.

Lots of data, lots of calculations

• There are two main flavors of distributed computing:
  – Data
  – Computation
• Often they are closely related, but not always.
• To achieve either, we usually need lots of memory and CPUs.
• These days we don’t stack them or put them in clusters; we distribute them.
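Here is a minimal Python sketch of both flavors working together, with a hypothetical four-node cluster simulated in one process. Data is distributed by hashing each key to a node. The computation (here a sum) runs on each node's own shard, and the partial results are then gathered. The sketch uses `zlib.crc32` rather than Python's built-in `hash()` because string hashing is randomized per process, which would make placements differ between runs.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size

def node_for(key: str) -> int:
    """Distribute data: map each key to a node by hashing it."""
    return crc32(key.encode()) % NUM_NODES

def partition(records):
    shards = defaultdict(list)
    for key, value in records:
        shards[node_for(key)].append((key, value))
    return shards

def local_sum(shard):
    """Distribute computation: each node works only on the data it already holds."""
    return sum(value for _key, value in shard)

records = [(f"order-{i}", i) for i in range(1, 101)]
shards = partition(records)
partials = {node: local_sum(shard) for node, shard in shards.items()}  # parallel on real nodes
total = sum(partials.values())  # gather step
print(total)  # 5050
```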

Why not RDBMS?

• Not designed to scale out.
• Strictly ACID compliant.
• Slower-running queries (especially with joins).
• Schema based.
• Not suited to changing data structures.

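CAP Theorem

• C – Consistency
• A – Availability
• P – Partition tolerance

** You cannot have all three at once: you must make trade-offs and sacrifice at least one in favor of the other two. In practice, when a network partition occurs, the choice is between consistency and availability.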
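The following toy Python model (not a real database) shows the choice a replicated system faces when a partition cuts two replicas off from each other. The `TinyCluster` class and its `mode` flag are hypothetical. In "CP" mode the cluster refuses the write and gives up availability. In "AP" mode it accepts the write and lets the replicas disagree.

```python
class TinyCluster:
    """Two replicas, A and B; writes go to A, which replicates to B when it can reach it."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = {}, {}
        self.partitioned = False  # True means A cannot reach B

    def write(self, key, value):
        if not self.partitioned:
            self.a[key] = value
            self.b[key] = value
        elif self.mode == "CP":
            # Keep A and B consistent by refusing the write: availability is lost.
            raise RuntimeError("unavailable during partition")
        else:
            # Stay available by accepting the write on A only: B is now stale.
            self.a[key] = value

    def read_from_b(self, key):
        return self.b.get(key)

cp, ap = TinyCluster("CP"), TinyCluster("AP")
for cluster in (cp, ap):
    cluster.write("x", 1)
    cluster.partitioned = True

try:
    cp.write("x", 2)
except RuntimeError as err:
    print("CP:", err)                               # CP: unavailable during partition
ap.write("x", 2)
print("AP, read from B:", ap.read_from_b("x"))      # AP, read from B: 1
```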

Categories

• Document based
• Graph based
• Column based
• Key/Value based
• Data structure based
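One way to compare the categories is to store the same small fact in each shape. The Python literals below are illustrative only. They model a hypothetical user u1 who follows u2, laid out as each category typically would, not in any particular product's format.

```python
import json

# Key/value: the value is an opaque blob, reachable only through its key.
key_value = {"user:u1": b'{"name": "Ann", "follows": ["u2"]}'}

# Document: the value has structure the database can index and query by field.
document = {"_id": "u1", "name": "Ann", "follows": ["u2"]}

# Column family: row key -> column family -> column name -> value.
column_family = {"u1": {"profile": {"name": "Ann"}, "follows": {"u2": ""}}}

# Graph: nodes plus explicit relationships that can be traversed.
graph = {
    "nodes": {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}},
    "edges": [("u1", "FOLLOWS", "u2")],
}

# Data structure server (for example Redis): values are typed lists, sets, hashes.
data_structures = {"user:u1": {"name": "Ann"}, "following:u1": {"u2"}}

# The same question, "whom does u1 follow?", asked of each shape:
answers = [
    json.loads(key_value["user:u1"])["follows"],  # the client must decode the blob itself
    document["follows"],
    list(column_family["u1"]["follows"]),
    [dst for src, rel, dst in graph["edges"] if src == "u1" and rel == "FOLLOWS"],
    sorted(data_structures["following:u1"]),
]
print(answers)
```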

Eventual Consistency

• Given a sufficiently long period of time over which no updates are sent, one can expect that all updates will eventually propagate through the system and all the replicas will be consistent.
• In the presence of continuing updates, an accepted update eventually either reaches a replica or the replica retires from service.
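The following small Python simulation illustrates the first statement under stated assumptions: five replicas, hypothetical last-writer-wins timestamps to resolve conflicting writes, and random pairwise anti-entropy exchanges standing in for real replication. Once updates stop, the replicas keep exchanging state until they all hold the same data.

```python
import random

class Replica:
    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def put(self, key, value, ts):
        """Keep an update only if it is newer than what we have (last-writer-wins)."""
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def merge_from(self, other):
        """Anti-entropy: apply every entry the other replica holds."""
        for key, (ts, value) in other.store.items():
            self.put(key, value, ts)

def converged(replicas):
    return len({tuple(sorted(r.store.items())) for r in replicas}) == 1

replicas = [Replica() for _ in range(5)]
# Each update is accepted by a single replica only.
replicas[0].put("x", "a", ts=1)
replicas[3].put("x", "b", ts=2)
replicas[4].put("y", "c", ts=1)

# No further updates arrive, so random pairwise exchanges eventually make all replicas agree.
rng = random.Random(42)  # seeded so the run is repeatable
rounds = 0
while not converged(replicas):
    a, b = rng.sample(replicas, 2)
    a.merge_from(b)
    b.merge_from(a)
    rounds += 1

print(f"converged after {rounds} exchanges:", replicas[1].store)
```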

Scalability

• Scalability is the ability of a system to increase throughput as resources are added to handle increasing load.
• Scalability can be achieved by:
  – Provisioning a larger, more powerful resource to meet the additional demand (scaling up).
  – Relying on a cluster of ordinary machines working as a unit (scaling out).
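Scaling out only helps if data can be spread onto a new machine cheaply. One common technique is consistent hashing, which Dynamo-style stores such as Cassandra and Riak use. The Python sketch below is a simplified illustration with hypothetical node names, not any product's implementation. It counts how many keys change owner when a fourth node joins and compares that with plain `hash % N` placement.

```python
import bisect
from zlib import crc32

def h(s: str) -> int:
    return crc32(s.encode())

class HashRing:
    """Nodes and keys share one hash space; a key belongs to the next node point clockwise."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes   # points per node, to even out each node's share of keys
        self.points = []       # sorted hash positions on the ring
        self.owner_of = {}     # position -> node
        for node in nodes:
            self.add(node)

    def add(self, node: str):
        for i in range(self.vnodes):
            p = h(f"{node}#{i}")
            if p not in self.owner_of:   # ignore the rare hash collision
                bisect.insort(self.points, p)
                self.owner_of[p] = node

    def owner(self, key: str) -> str:
        i = bisect.bisect_left(self.points, h(key)) % len(self.points)  # wrap around
        return self.owner_of[self.points[i]]

keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}
ring.add("node-d")  # scale out by one ordinary machine
after = {k: ring.owner(k) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
moved_modulo = sum(h(k) % 3 != h(k) % 4 for k in keys)  # naive placement, 3 -> 4 nodes
print(f"ring: {moved / len(keys):.0%} moved, modulo: {moved_modulo / len(keys):.0%} moved")
```

With the ring, only keys that now fall to node-d move (roughly a quarter). With modulo placement, most keys move.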

How to choose?

• Scalability
• Transactional integrity and consistency
• Data modeling
• Query support
• Access and interface availability

Scalability

• Column-family-centric NoSQL databases are a good choice if extreme scalability is a requirement.
• They are not well suited to real-time transaction processing, where an RDBMS is the best choice.
• Eventually consistent NoSQL options, like Cassandra or Riak, may be workable.

Transactional Integrity and Consistency

• Batch-centric analytics on warehoused data is also not subject to transactional requirements.
• Neither are data sets that are written once, e.g., web traffic log files, social networking status updates, ad click-through impressions, road-traffic data, stock market tick data, game scores, etc.
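Write-once data sets suit an append-only model. Records are added and read but never modified, so no update exists whose integrity a transaction would need to protect. Here is a minimal in-memory Python sketch with a hypothetical `Tick` record and `AppendOnlyLog` class (the prices are made up):

```python
from dataclasses import dataclass
from typing import Iterator, List

@dataclass(frozen=True)  # frozen: a record cannot be changed after it is written
class Tick:
    seq: int
    symbol: str
    price: float

class AppendOnlyLog:
    def __init__(self):
        self._records: List[Tick] = []

    def append(self, symbol: str, price: float) -> int:
        tick = Tick(len(self._records), symbol, price)
        self._records.append(tick)
        return tick.seq

    def scan(self, start: int = 0) -> Iterator[Tick]:
        """Batch analytics read forward from any offset."""
        yield from self._records[start:]

log = AppendOnlyLog()
for price in (100.0, 102.0, 101.0):
    log.append("IBM", price)

prices = [t.price for t in log.scan()]
print(sum(prices) / len(prices))  # 101.0
```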

Transactional Integrity and Consistency

• If range operations are common and integrity of updates is required, an RDBMS is the best choice.
• If atomicity at the level of an individual item is sufficient, then column-family databases or document databases are viable options.
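An RDBMS transaction helps most when an update must stay intact across several items. This minimal sketch uses Python's built-in `sqlite3` module as the RDBMS, with a hypothetical `account` table. The transfer either changes both rows or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit and debit in one transaction: both rows change, or neither does."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))

transfer(conn, "A", "B", 30)
try:
    transfer(conn, "A", "B", 500)  # the debit would make A negative, so the CHECK fails
except sqlite3.IntegrityError:
    pass  # the credit to B was rolled back too

print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [('A', 70), ('B', 30)]
```

The credit to B runs first, so the failed debit demonstrates the rollback: B does not keep the 500. A store that offers only single-item atomicity would leave this cross-item coordination to the application.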

Data Modeling

• An RDBMS offers a consistent way of modeling data; relational algebra underlies the data model.
• In the NoSQL world there is no such standardized, well-defined data model.
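As a small illustration of the relational algebra underneath SQL, here are three core operators (selection σ, projection π and natural join ⋈) written over plain Python lists of dicts. The `users` and `orders` relations are hypothetical, and real engines implement these operators far more efficiently.

```python
# Relations as lists of dicts (rows with named attributes); the data is hypothetical.
users = [{"uid": 1, "name": "Ann"}, {"uid": 2, "name": "Bob"}]
orders = [{"oid": 10, "uid": 1, "qty": 5}, {"oid": 11, "uid": 2, "qty": 1},
          {"oid": 12, "uid": 1, "qty": 2}]

def select(rel, pred):      # σ: keep rows that satisfy a predicate
    return [r for r in rel if pred(r)]

def project(rel, attrs):    # π: keep only some attributes, dropping duplicate rows
    seen, out = set(), []
    for r in rel:
        row = tuple(r[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

def natural_join(r, s):     # ⋈: combine rows that agree on every shared attribute
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

# SQL equivalent: SELECT DISTINCT name FROM users NATURAL JOIN orders WHERE qty > 1
result = project(select(natural_join(users, orders), lambda row: row["qty"] > 1), ["name"])
print(result)  # [{'name': 'Ann'}]
```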

Data Modeling

• If a relaxed schema is your primary reason for using NoSQL, then MongoDB is a great option for getting started.
• MongoDB is used by many web-centric businesses.

Querying Support

• An RDBMS thrives on SQL support, which makes accessing and querying data easy.
• Among document databases, MongoDB provides the best querying capabilities.
• For key/value pairs and in-memory stores, nothing is more feature-rich than Redis as far as querying capabilities go.

Querying Support

• Column-family stores like HBase have little to offer as far as rich querying capabilities go.
• A project called Hive makes it possible to query HBase using SQL-like syntax and semantics.

Access and Interface Availability

• MongoDB uses the notion of drivers (client libraries for each language).
• CouchDB always has its RESTful HTTP interface available.
• Redis, Membase, Riak, HBase, Hypertable, Cassandra, and Voldemort offer language bindings for connecting from most mainstream languages.

50/50 Read and Update

• Results show that under this workload Apache Cassandra outperforms the competition on both read and update latencies.
• HBase comes close but stays behind Cassandra.
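The "50/50" and "95/5" titles describe the proportion of reads to updates in the benchmark workload. The hypothetical Python generator below only illustrates what those ratios mean. It is not the benchmark tool used for these results, and the key names and operation counts are placeholders.

```python
import random
from collections import Counter

def workload(read_fraction: float, n_ops: int, n_keys: int, seed: int = 0):
    """Yield (operation, key) pairs; each operation is a read with probability read_fraction."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    for _ in range(n_ops):
        op = "read" if rng.random() < read_fraction else "update"
        yield op, f"user{rng.randrange(n_keys)}"

mixes = {}
for name, fraction in [("50/50", 0.50), ("95/5", 0.95)]:
    mixes[name] = Counter(op for op, _key in workload(fraction, 10_000, 1_000))
    print(name, dict(mixes[name]))
```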

95/5 Read and Update

• The sorted, ordered column-family stores perform best for contiguous range reads.
• HBase seems to deliver consistent performance for reads, irrespective of the number of operations per second.
• MySQL delivers the best performance for read-only cases.
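Range reads are cheap in sorted stores because neighboring keys are stored next to each other: a binary search finds where the range starts, and the rest is read in order. Below is a simplified in-memory Python sketch of that idea, using `bisect` with hypothetical `symbol#date` row keys and made-up prices.

```python
import bisect

class SortedStore:
    """Rows kept sorted by key: a simplified in-memory stand-in for a sorted column-family store."""

    def __init__(self):
        self._keys = []     # row keys in sorted order
        self._rows = {}     # row key -> value

    def put(self, key: str, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start: str, stop: str):
        """Return rows with start <= key < stop: binary-search the bounds, then slice."""
        i = bisect.bisect_left(self._keys, start)
        j = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[i:j]]

store = SortedStore()
for day, price in [("2012-07-12", 101.0), ("2012-07-10", 99.0),
                   ("2012-07-11", 100.0), ("2012-07-13", 102.0)]:
    store.put(f"IBM#{day}", price)
store.put("MSFT#2012-07-11", 30.0)

print(store.scan("IBM#2012-07-11", "IBM#2012-07-13"))
# [('IBM#2012-07-11', 100.0), ('IBM#2012-07-12', 101.0)]
```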

Future

• Getting ready for polyglot persistence.
• Understanding the database technologies suitable for immutable data sets.
• Choosing the right database to ease application development.

Examples

• LinkedIn uses Hadoop for many large-scale analytics jobs, like probabilistically predicting people you may know.
• Facebook (MySQL + HBase, Cassandra, ZooKeeper)
• Twitter (MySQL + Cassandra + FlockDB)

Feedback

trainer.prateek@gmail.com
