NoSQL-Overview

Ranjeet Kumar Jha
Reachable:
• ranjeet@policybazaar.com
• Cell: +91 9811006657
Exp:
• Java JEE: 13+ years
• NoSQL/BigData: 4+ years
LinkedIn: https://in.linkedin.com/in/jharanjeet
(Oracle Certified Enterprise Architect)
PolicyBazaar.com

Agenda
• Before SQL and After SQL
• NoSQL universe
• Trends in NoSQL
• Characteristics of BigData: 3V
• Where to use NoSQL
• What NoSQL must deliver
• Classification of NoSQL databases
• Size vs. Complexity
• Visual Guide to the CAP Theorem
• Overview of Key/Value Store
• Overview of Document Store
• Overview of Column Family Store
• Overview of Graph Store
• Use Case: Twitter

Three Eras of Databases
Note: The era of using RDBMSes for all problems is over. Instead
we should use the database most suited for the problem at hand.

Before NoSQL DB Selection Was Easy!

Big Data Definition
• Volumes & volumes of data
• Unstructured
• Semi-structured
• Not suited for Relational Databases
• Often utilizes MapReduce frameworks

Databases Universe
Source: http://arxiv.org/ftp/arxiv/papers/1307/1307.0191.pdf

The NO-SQL Universe

Before NoSQL

Pressures on Single Node RDBMS Architectures

After NoSQL

RDBMS vs. NoSQL
Source: http://www.google.com/trends/explore#q=nosql%2C%20rdbms&date=1%2F2009%2051m&cmpt=q

NoSQL or SQL?
• Wrong question
• What is your problem?
– Transactions
– Amount of data
– Data structure
– Scale-out vs. scale-up
– OLTP or OLAP

What is your problem…
• Key Evaluation Requirements
– Transactional, Durability & Consistency
– Response time
– Functionality
– Data characteristics
– Scalability, Clustering
– Failover
– Maintenance, Online changes, Node Management
– Maturity
– Community, Support
– Hosted or Managed
– Cost, open source

Why NoSQL Now?
• Trend 1: Size
• Trend 2: Connectedness
• Trend 3: Semi-structure
• Trend 4: Architecture

Characteristics of Big Data: 3V
• Volume: large volumes of data
– Today, Facebook ingests 500 terabytes of new data every day; a Boeing 737 will generate 240 terabytes of flight data during a single flight across the US
• Velocity: the rate at which data arrives and must be processed
– E.g. clickstreams and ad impressions capture user behavior at millions of events per second
• Variety: structured, semi-structured and unstructured data, images, etc.
– Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media
Source: http://www-01.ibm.com/software/data/bigdata/

Many Uses of Data
• Transactions (OLTP)
• Analysis (OLAP)
• Search and Findability
• Enterprise Agility
• Speed and Reliability
• Consistency and Availability
• Or anything else…

Where to use NoSQL?
• Social data
• Data processing (Hadoop)
• Search (Lucene)
• Caching (Memcache, ...)
• Data Warehousing
• Logging
• ...

What NoSQL must deliver
• Massive scalability
– No application-level sharding
• Performance
• High Availability/Fault Tolerance
• Ease of use
– Simple operations/administration
– No application-level sharding
– Simple APIs
– Quickly evolve application & schema

Classification of NoSQL Databases
• Key-Value
– Very popular for simple key-value lookup (on disk or in memory), e.g. Dynamo, Redis, Voldemort, MemcacheDB, Berkeley DB, Hazelcast etc.
• Document
– Popular for document-type storage, e.g. MongoDB, OrientDB, CouchDB, Riak etc.
• Column Family
– Key-value with fixed column families; allows dynamic columns within a column family, e.g. Cassandra, BigTable, HBase, Hypertable etc.
• Graph
– Connected graph of entities and relationships, e.g. Titan, Neo4j, InfiniteGraph

NoSQL Store
• Key-Value Stores
– Dynamo Clones
• Redis
• Membase
• Riak
• Tokyo Cabinet
• Voldemort
• Document Stores
– MongoDB
– CouchDB
– SimpleDB
• Column Family
– BigTable Clones
• Cassandra
• HBase
• Hypertable
• Graph Databases
– Neo4J
– Titan
– InfoGrid
– AllegroGraph

NoSQL: Size vs. Complexity
Source: http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html

Visual Guide to NoSQL
Source: http://blog.nahurst.com/visual-guide-to-nosql-systems

Key-Value Store
• Focus on scaling to huge amounts of data
• Designed to handle massive load
• Based on Amazon’s Dynamo paper
• Data model: (global) collection of key-value pairs
• Dynamo ring partitioning and replication
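The Dynamo-style ring in the last bullet can be sketched as consistent hashing. This is a minimal sketch that hashes keys and nodes onto a ring with MD5, as the Dynamo paper describes. It deliberately leaves out virtual nodes, vector clocks and hinted handoff. The class name, method names and node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on a 128-bit ring using MD5."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hashing ring (no virtual nodes)."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # The "ring": (position, node) pairs sorted by position.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [p for p, _ in self.ring]

    def preference_list(self, key):
        """Walk clockwise from the key's position: the first node past it owns
        the key, and the next replicas-1 nodes hold the copies."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
owners = ring.preference_list("user:42")  # three distinct nodes; the first is the key's home node
```

When a node is added, only the keys between it and its predecessor on the ring change owner. That property is why the ring suits incremental scale-out.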

Types of Key-Value Stores
• Eventually-consistent key-value store
• Hierarchical key-value stores
• Key-Value stores in RAM
• Key-Value stores on disk
• High availability key-value store
• Ordered key-value stores
• Values that allow simple list operations
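Of the types above, an ordered key-value store keeps its keys sorted, so a run of adjacent keys can be read in a single range scan. The sketch below is in memory only and assumes string keys and Python's `bisect` module. Real ordered stores keep keys in on-disk structures such as B-trees or LSM trees. The class name and the key layout are hypothetical.

```python
import bisect


class OrderedKVStore:
    """Toy ordered key-value store: keys are kept sorted to make range scans cheap."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping the list sorted
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = OrderedKVStore()
for k, v in [("log:2014-03-02", "b"), ("log:2014-03-01", "a"),
             ("log:2014-04-01", "c"), ("user:1", "x")]:
    store.put(k, v)
march = list(store.scan("log:2014-03", "log:2014-04"))  # every March entry, in order
```

Choosing keys so that related items sort next to each other (here, a date inside the key) is what makes the range scan useful.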

Key / value stores (Opaque)
• Keys are mapped to values
• Values are treated as BLOBs (opaque data)
• No type information is stored
• Values can be heterogeneous
• Example values:
{ "name": "ranjeet", "age": 35, "city": "DL" } => JSON, but the store does not interpret it
\xde\xad\xb0\x0b => binary, and the store does not interpret it either
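The opaque model can be shown in a few lines. The store accepts and returns bytes, and only the client knows that one value is JSON and another is raw binary. This is a minimal sketch, and the class name and keys are hypothetical.

```python
from __future__ import annotations

import json


class OpaqueKVStore:
    """Toy key-value store: every value is an opaque byte string (BLOB)."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store only checks that it got bytes; it never parses them.
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("values must be bytes; serialization is the client's job")
        self._data[key] = bytes(value)

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)


store = OpaqueKVStore()
# A JSON value: the client serializes it, and the store just keeps the bytes.
store.put("user:ranjeet", json.dumps({"name": "ranjeet", "age": 35, "city": "DL"}).encode("utf-8"))
# A binary value: stored exactly as given.
store.put("blob:1", b"\xde\xad\xb0\x0b")
# Only the client can turn the bytes back into something meaningful.
profile = json.loads(store.get("user:ranjeet"))
```

Because the store cannot see inside values, a question like "all users in DL" cannot be answered by the store itself. That gap is what document stores fill.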

Redis
• Open source in-memory key-value store with optional durability
• Focus on high-speed reads and writes of common data structures in RAM
• Allows simple lists, sets and hashes to be stored within the value and manipulated
• Many features that developers like
– expiration, transactions, pub/sub, partitioning
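Two of these features can be illustrated together: a key that expires, and a list value that grows at its head. The real Redis commands (for example EXPIRE, LPUSH and LRANGE) run on a Redis server. The toy below only imitates their semantics in process and is not the Redis client API. The class name is hypothetical, and the clock is injectable so that expiry can be demonstrated without sleeping.

```python
import time


class TinyCache:
    """Toy in-memory store with per-key expiration and list values (not Redis itself)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}     # key -> value
        self._expires = {}  # key -> absolute expiry time

    def _alive(self, key):
        deadline = self._expires.get(key)
        if deadline is not None and self._clock() >= deadline:
            # Lazy expiration: drop the key the first time it is touched after its deadline.
            self._data.pop(key, None)
            self._expires.pop(key, None)
        return key in self._data

    def set(self, key, value, ttl=None):
        self._data[key] = value
        if ttl is None:
            self._expires.pop(key, None)
        else:
            self._expires[key] = self._clock() + ttl

    def get(self, key):
        return self._data[key] if self._alive(key) else None

    def lpush(self, key, *values):
        """Insert each value at the head of the list at key, in order; return the new length."""
        if not self._alive(key):
            self._data[key] = []
        items = self._data[key]
        for v in values:
            items.insert(0, v)
        return len(items)

    def lrange(self, key, start, stop):
        """Elements start..stop inclusive (start >= 0; a negative stop counts from the end)."""
        if not self._alive(key):
            return []
        items = self._data[key]
        stop = len(items) + stop if stop < 0 else stop
        return items[start:stop + 1]


now = [0.0]  # fake clock for the example
cache = TinyCache(clock=lambda: now[0])
cache.set("session:1", "abc", ttl=10)
cache.lpush("timeline", "t1", "t2")
cache.lpush("timeline", "t3")
```

Real Redis also expires keys actively in the background. This sketch only expires them lazily, when they are next read.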

BigTable clones
• Like column-oriented relational databases, but with a twist
• Tables similar to an RDBMS, but they handle semi-structured data
• Based on Google's BigTable paper

Document Store
• Data stored in nested hierarchies
• Logical data remains stored together as a unit
• Any item in the document can be queried
• Similar to Key-Value stores, but the DB knows
what the Value is
• Inspired by Lotus Notes
• Documents are often versioned

Document Store …
• Data model: collections of key-value collections
• Pros: no object-relational mapping layer, ideal for search, schema-less
• Cons: complex to implement, incompatible with SQL
• Examples: MongoDB, Couchbase, CouchDB
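"The DB knows what the value is" means that a query can reach into a nested field and return the whole document. The sketch below imitates the equality-only dot-notation matching that MongoDB uses in queries such as `db.customers.find({"address.city": "DL"})`. It is plain Python over a list of dicts, not a MongoDB driver. The function names and the sample documents are hypothetical.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'address.city' into nested dicts; None if any step is missing."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, criteria):
    """Return documents whose fields equal every (dotted path -> value) pair in criteria."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == value for path, value in criteria.items())
    ]


customers = [
    {"_id": 1, "name": "ranjeet", "age": 35, "address": {"city": "DL"}},
    {"_id": 2, "name": "asha", "age": 29, "address": {"city": "MUM"}},
]

# Query a field nested inside the document; the whole document comes back as one unit.
in_delhi = find(customers, {"address.city": "DL"})
```

Real document stores add indexes on such paths, so these lookups do not need to scan every document as this sketch does.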

MongoDB (DocumentDB)
• Open source JSON data store created by 10gen
• Master-slave scale-out model
• Strong developer community
• Sharding built in, automatic
• Implemented in C++ with many APIs (C++, JavaScript, Java, .NET, Perl, Python etc.)

Column-Family
• Key includes a row, column family and column name
• Stores versioned blobs in one large table
• Queries can be done on rows, column families and column names
• Pros: great scale-out, performant, versioning
• Cons: cannot query blob content; row and column designs are critical
• Examples: Cassandra, Bigtable, HBase, Hypertable, Apache Accumulo
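The key structure described above, (row, column family, column) pointing to versioned values, can be pictured as nested maps. The sketch below is in memory only. It assumes timestamps serve as version numbers and uses a hypothetical `max_versions` limit per cell; Bigtable and HBase have a comparable per-family setting, but this is not their API. Families are fixed when the table is created, while columns inside a family are created on the fly.

```python
import time
from collections import defaultdict


class ColumnFamilyTable:
    """Toy BigTable-style table: (row key, column family, column) -> versioned values."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)  # column families are fixed up front
        self.max_versions = max_versions
        # row -> family -> column -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, column, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time() if ts is None else ts
        versions = self.rows[row][family].setdefault(column, [])  # columns are dynamic
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                   # keep only the newest N

    def get(self, row, family, column):
        """Latest value of one cell, or None."""
        versions = self.rows.get(row, {}).get(family, {}).get(column)
        return versions[0][1] if versions else None

    def get_family(self, row, family):
        """Latest value of every column in one family for a row."""
        columns = self.rows.get(row, {}).get(family, {})
        return {col: versions[0][1] for col, versions in columns.items()}


table = ColumnFamilyTable(["profile", "activity"], max_versions=2)
table.put("user#42", "profile", "city", "DL", ts=1)
table.put("user#42", "profile", "city", "MUM", ts=2)
table.put("user#42", "profile", "city", "BLR", ts=3)
table.put("user#42", "activity", "last_login", "2014-03-01", ts=4)
```

The sketch also shows why the slide calls row and column design critical: only the key parts are queryable, while values remain blobs.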

The Evolution of Cassandra

Cassandra
• Apache open source column family database supported by DataStax
• Peer-to-peer distribution model
• Strong reputation for linear scale-out (millions of writes/second)
• Database-side security
• Written in Java and works well with HDFS and MapReduce

Cassandra: Feature Headlines
• Elastic
– Read and write throughput increases linearly as new machines are added
• Decentralized
– Fault tolerant with no single point of failure; no "master" node
• Rich data model
– Column based, range slices, column slices, secondary indexes, counters, expiring columns
Source: http://cassandra.apache.org/

Hadoop Origins
• Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each providing computation and storage.
• Hadoop is an open-source implementation of Google's MapReduce and GFS (distributed file system).
• Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library.
• Hadoop fulfills the need for common infrastructure
– Efficient, reliable, easy to use
– Open source, Apache License
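The "simple programming model" is map, then shuffle (group by key), then reduce. The sketch below runs word count, the usual first example, in a single process to show the data flow only. In Hadoop itself, jobs are written against Java `Mapper`/`Reducer` classes, and the framework handles distribution, the network shuffle and fault tolerance. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (the framework's job between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key, here by summing the counts."""
    return key, sum(values)


def word_count(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


counts = word_count(["big data big", "Data"])
```

Each mapper call depends only on its own record, and each reducer call only on one key's values. That independence is what lets Hadoop spread both phases across many machines.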

HBase/Hadoop
• Open source implementation of the MapReduce algorithm, written in Java
• Initially created by Yahoo
• Column-oriented data store
• Java interface
• HBase designed specifically to work with Hadoop
• High-level query language (Pig)
• Strong support by many vendors

Graph Store
• Focus on modeling the structure of data (interconnectivity)
• Scales to the complexity of the data
• Inspired by mathematical graph theory (G = (V, E))
• Queries are really graph traversals
• Data is stored in a series of nodes, relationships and properties
• Ideal when relationships between data are key:
– e.g. social networks

Graph Store (cont..)
• Data model: "Property Graph"
– Nodes
– Relationships/edges between nodes
– Key-value pairs on both
– Possibly edge labels and/or node/edge types
• Pros: fast network search, works with public linked data sets
• Cons: specialized query languages (e.g. SPARQL for RDF, Gremlin, Cypher)
• Examples: Neo4j, Titan, AllegroGraph, InfiniteGraph

Graph Stores (cont..)
• Used when the relationships and relationship types between items are critical
• Used for
– Social networking queries: "friends of my friends"
– Inference and rules engines
– Pattern recognition
– Working with open linked data
• Automate "joins" of public data
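"Friends of my friends" is a two-hop traversal that starts from one node. An RDBMS would need a self-join on a friendship table for the same query. The sketch below is plain Python that assumes an undirected friendship graph stored as a dict of adjacency sets. The function name and people are hypothetical, and it does not use any graph database's query language.

```python
def friends_of_friends(graph, person):
    """People two hops from person who are not already direct friends (or person)."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


social = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
suggestions = friends_of_friends(social, "alice")  # candidates for "people you may know"
```

The work is proportional to the neighborhood actually visited, not to the total number of people. This is why the slides say graph stores scale to the complexity of the data.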

Property Graph model
• Nodes (i.e. vertices)
• Relationships between nodes (i.e. edges)
• Relationships have labels
• Relationships are directed, but traversed at equal speed in both directions
• The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
• Nodes have key-value properties
• Relationships have key-value properties
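The bullets above can be mapped onto a small in-memory structure. Each relationship is stored once with a direction, a label and properties, and it is indexed at both of its ends. That lets it be followed outward or inward at the same cost, and the application decides whether direction matters. This is a simplified sketch, not Neo4j's storage format or API. All class names, methods and data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    end: str
    label: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.out_rels = {}  # node id -> outgoing relationships
        self.in_rels = {}   # node id -> incoming relationships

    def add_node(self, node_id, **props):
        self.nodes[node_id] = Node(node_id, props)
        self.out_rels.setdefault(node_id, [])
        self.in_rels.setdefault(node_id, [])

    def add_rel(self, start, end, label, **props):
        rel = Relationship(start, end, label, props)
        # Index the relationship at both ends so either direction is one lookup.
        self.out_rels[start].append(rel)
        self.in_rels[end].append(rel)
        return rel

    def neighbors(self, node_id, label, direction="out"):
        """Follow relationships with this label: 'out', 'in', or 'both' (ignore direction)."""
        result = []
        if direction in ("out", "both"):
            result += [r.end for r in self.out_rels[node_id] if r.label == label]
        if direction in ("in", "both"):
            result += [r.start for r in self.in_rels[node_id] if r.label == label]
        return result


g = PropertyGraph()
g.add_node("alice", age=30)
g.add_node("bob", age=32)
home = g.add_rel("alice", "bob", "LIVES_WITH", since=2012)
g.add_rel("alice", "bob", "LOVES")
```

LIVES_WITH is symmetric, so the application queries it with `direction="both"`. For LOVES, the direction carries meaning: `g.neighbors("bob", "LOVES", "out")` is empty, while `"in"` returns alice.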

Neo4j
• Graph database designed to be easy to use by Java developers
• Dual license (community edition is GPL)
• Works as an embedded Java library in your application
• Disk-based (not just RAM)
• Full ACID

Decide what you need
• SQL
– Relational, transactional processing
• NoSQL
– Non-relational, distributed, high-performance and highly scalable
• Analytics, Warehouse, BigData
– Data warehousing, analytics, data science, and reporting
• Combination of all 3
– Begin with SQL and NoSQL, and eventually add a BigData/analytics platform

Finally… in one line…
• SQL
– Works great, but can't easily scale.
• NoSQL
– Works great, but isn't a fit for everything.
• Analytics, BigData
– Every business needs it.

Use Case: Twitter
• Twitter challenges
– Needs to store many graphs
• Who you are following
• Who’s following you
• Who you receive phone notifications from etc
– To deliver a tweet requires rapid paging of followers
– Heavy write load as followers are added and removed
– Set arithmetic for @mentions (intersection of users).
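Two of the challenges listed, paging quickly through a follower list and the set arithmetic behind @mentions, can be sketched with plain lists and sets. The sketch makes two illustrative assumptions: follower lists are kept in a stable order and paged with an integer cursor, and the audience for a mention is the set of users who follow both accounts. The function names and data are hypothetical. This is not Twitter's actual implementation.

```python
def page_followers(followers, cursor, page_size):
    """Return (page, next_cursor); next_cursor is None once the list is exhausted."""
    end = cursor + page_size
    page = followers[cursor:end]
    next_cursor = end if end < len(followers) else None
    return page, next_cursor


def mention_audience(followers_of, sender, mentioned):
    """Users who follow both the sender and the mentioned account (set intersection)."""
    return followers_of.get(sender, set()) & followers_of.get(mentioned, set())


# Deliver to every follower, one page at a time.
followers = ["u1", "u2", "u3", "u4", "u5"]
pages, cursor = [], 0
while cursor is not None:
    page, cursor = page_followers(followers, cursor, page_size=2)
    pages.append(page)

followers_of = {
    "alice": {"bob", "carol", "dave"},
    "erin": {"carol", "dave", "frank"},
}
audience = mention_audience(followers_of, "alice", "erin")
```

With millions of followers, and with followers constantly added and removed, both operations must stay fast under a heavy write load. The next slide describes why that was hard to achieve with the storage approaches Twitter tried.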

Use Case: Twitter …
• What did they try?
• Started with Relational Databases
• Tried Key-Value storage of denormalized lists
• Did it work?
– Nope
– Either good at handling the write load or at paging large amounts of data, but not both

Open source implementations to play with!
• MongoDB - http://www.mongodb.org/
• Cassandra - http://cassandra.apache.org/
• Neo4j - http://neo4j.org/
• Hadoop + HBase - http://hadoop.apache.org/
• Redis - http://code.google.com/p/redis/
• Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db/
• … and many more…

Thank You
For any queries or feedback, write to me:
ranjeet@policyBazaar.com
ranjeet.kr@gmail.com