PolicyBazaar.com
Ranjeet Kumar Jha
Reachable:
• ranjeet@policybazaar.com
• Cell: +91 9811006657
Exp:
• Java JEE: 13+
• NoSQL/BigData: 4+
LinkedIn: https://in.linkedin.com/in/jharanjeet
(Oracle Certified Enterprise Architect)
Agenda
• Before NoSQL and After NoSQL
• NoSQL universe
• Trend of NoSQL
• Characteristics of Big Data: 3V
• Where to use NoSQL
• What NoSQL must deliver
• Classification of NoSQL databases
• Size vs. Complexity
• Visual Guide to the CAP Theorem
• Overview of Key-Value Store
• Overview of Document Store
• Overview of Column Family Store
• Overview of Graph Store
• Use Case: Twitter
Three Eras of Databases
Note: The era of using RDBMSes for all problems is over. Instead, we should use the database best suited to the problem at hand.
Before NoSQL DB Selection Was Easy!
Big Data Definition
• Volumes & volumes of data
• Unstructured
• Semi-structured
• Not suited for Relational Databases
• Often utilizes MapReduce frameworks
Databases Universe
Source: http://arxiv.org/ftp/arxiv/papers/1307/1307.0191.pdf
The NoSQL Universe
Before NoSQL
Pressures on Single Node RDBMS Architectures
After NoSQL
RDBMS vs. NoSQL
Source: http://www.google.com/trends/explore#q=nosql%2C%20rdbms&date=1%2F2009%2051m&cmpt=q
NoSQL or SQL?
• Wrong question
• What is your problem?
– Transactions
– Amount of data
– Data structure
– Scale-out vs. scale-up
– OLTP or OLAP
What is your problem…
• Key Evaluation Requirements
– Transactions, durability & consistency
– Response time
– Functionality
– Data characteristics
– Scalability, Clustering
– Failover
– Maintenance, Online changes, Node Management
– Maturity
– Community, Support
– Hosted or Managed
– Cost, open source
Why NoSQL Now?
• Trend 1: Size
• Trend 2: Connectedness
• Trend 3: Semi-structure
• Trend 4: Architecture
Characteristics of Big Data: 3V
• Volume: large volumes of data
– Today, Facebook ingests 500 terabytes of new data every day; a Boeing 737 will generate 240 terabytes of flight data during a single flight across the US
• Velocity: the rate at which data moves
– E.g. clickstreams and ad impressions capture user behavior at millions of events per second
• Variety: structured, semi-structured, unstructured, images, etc.
– Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media
Source: http://www-01.ibm.com/software/data/bigdata/
Many Uses of Data
• Transactions (OLTP)
• Analysis (OLAP)
• Search and Findability
• Enterprise Agility
• Speed and Reliability
• Consistency and Availability
• Or anything else…
Where to use NoSQL?
• Social data
• Data processing (Hadoop)
• Search (Lucene)
• Caching (Memcached, ...)
• Data Warehousing
• Logging
• ...
What NoSQL must deliver
• Massive scalability
– No application-level sharding
• Performance
• High Availability/Fault Tolerance
• Ease of use
– Simple operations/administration
– No application-level sharding
– Simple APIs
– Quickly evolve application & schema
Classification of NoSQL Databases
• Key-Value
– Very popular for simple key-value lookup, on disk or in memory, e.g. Dynamo, Redis, Voldemort, MemcacheDB, Berkeley DB, Hazelcast, etc.
• Document
– Popular for document-type storage, e.g. MongoDB, OrientDB, CouchDB, Riak, etc.
• Column Family
– Key-value with fixed column families; allows dynamic columns within a column family, e.g. Cassandra, BigTable, HBase, Hypertable, etc.
• Graph
– Connected graph of entities and relationships, e.g. Titan, Neo4j, InfiniteGraph
NoSQL Store
• Key-Value Stores
– Dynamo Clones
• Redis
• Membase
• Riak
• Tokyo Cabinet
• Voldemort
• Document Stores
– MongoDB
– CouchDB
– SimpleDB
• Column Family
– BigTable Clones
• Cassandra
• HBase
• Hypertable
• Graph Databases
– Neo4j
– Titan
– InfoGrid
– AllegroGraph
NoSQL: Size vs. Complexity
Source: http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html
Visual Guide to NoSQL
Source: http://blog.nahurst.com/visual-guide-to-nosql-systems
Key-Value Store
• Focus on scaling to huge amounts of data
• Designed to handle massive load
• Based on Amazon’s Dynamo paper
• Data model: (global) collection of key-value pairs
• Dynamo ring partitioning and replication
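Ring partitioning and replication can be illustrated with a small consistent-hashing sketch. This is not Dynamo's actual implementation: the class name HashRing, the node names, the use of MD5, and the defaults of 64 virtual nodes and 3 replicas are all assumptions made for the example. Each key is hashed to a point on the ring, and its replicas live on the next distinct nodes found walking clockwise.

```python
import bisect
import hashlib


def ring_position(text):
    """Map a string to a point on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes and N-way replication."""

    def __init__(self, nodes, vnodes=64, replicas=3):
        nodes = list(nodes)
        self.replicas = replicas
        self._node_count = len(set(nodes))
        # Each physical node appears at `vnodes` points to even out the load.
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._ring]

    def preference_list(self, key):
        """Walk clockwise from the key and collect the first distinct nodes."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        wanted = min(self.replicas, self._node_count)
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == wanted:
                break
        return owners


# Hypothetical four-node cluster; the first owner acts as the primary replica.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.preference_list("user:42")
```

The useful property is that adding a node only moves the keys that the new node takes over; every other key keeps its owner, which is what lets such a store grow without reshuffling all the data.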
Types of Key-Value Stores
• Eventually-consistent key-value store
• Hierarchical key-value stores
• Key-Value stores in RAM
• Key-Value stores on disk
• High availability key-value store
• Ordered key-value stores
• Values that allow simple list operations
Key / value stores (Opaque)
• Keys are mapped to values
• Values are treated as BLOBs (opaque data)
• No type information is stored
• Values can be heterogeneous
• Example values:
{ name: "ranjeet", age: 35, city: "DL" } => JSON, but the store will not care about it
\xde\xad\xb0\x0b => binary, but the store will not care about it
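What "opaque" means in practice can be shown with a minimal sketch. The class OpaqueKVStore and the key names are hypothetical and not taken from any real product: the store accepts only bytes, never parses them, and leaves all interpretation to the application.

```python
import json


class OpaqueKVStore:
    """Hypothetical minimal key-value store: values are uninterpreted bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("the store only accepts bytes; serialize first")
        self._data[key] = value

    def get(self, key):
        return self._data[key]


store = OpaqueKVStore()

# The application serializes JSON itself; the store never parses it.
profile = {"name": "ranjeet", "age": 35, "city": "DL"}
store.put("user:1", json.dumps(profile).encode("utf-8"))

# Arbitrary binary is stored in exactly the same way.
store.put("blob:1", b"\xde\xad\xb0\x0b")

# Interpretation happens on the client side, after the read.
user = json.loads(store.get("user:1").decode("utf-8"))
```

Because the store holds no type information, it cannot filter or index on a field such as age; any query beyond a lookup by key has to be done by the application.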
Redis
• Open source in-memory key-value store with optional durability
• Focus on high-speed reads and writes of common data structures in RAM
• Allows simple lists, sets and hashes to be stored within the value and manipulated
• Many features that developers like
– expiration, transactions, pub/sub, partitioning
BigTable clones
• Like column-oriented relational databases, but with a twist
• Tables are similar to those in an RDBMS, but handle semi-structured data
• Based on Google’s BigTable paper
Document Store
• Data stored in nested hierarchies
• Logical data remains stored together as a unit
• Any item in the document can be queried
• Similar to key-value stores, but the DB knows what the value is
• Inspired by Lotus Notes
• Documents are often versioned
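The difference from an opaque key-value store is that any field inside the nested document can be used in a query. The sketch below imitates that with plain Python dictionaries; the helper names get_path and find, the dotted-path syntax, and the sample customers are invented for illustration and are not any particular database's API.

```python
def get_path(document, dotted_path):
    """Return the value at a dotted path such as 'address.city', or None."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, dotted_path, expected):
    """Return every document whose value at dotted_path equals expected."""
    return [doc for doc in collection if get_path(doc, dotted_path) == expected]


# Documents in one collection need not share a structure.
customers = [
    {"_id": 1, "name": "ranjeet", "address": {"city": "DL", "pin": "110001"}},
    {"_id": 2, "name": "asha", "address": {"city": "MUM"}, "tags": ["vip"]},
    {"_id": 3, "name": "vikram"},  # no address at all
]

in_delhi = find(customers, "address.city", "DL")
```

A real document store would answer the same question from an index on address.city rather than by scanning every document, but the query shape is the same.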
Document Store …
• Data model: collections of key-value collections
• Pros: no object-relational mapping layer, ideal for search, schemaless
• Cons: complex to implement, incompatible with SQL
• Examples: MongoDB, Couchbase, CouchDB
MongoDB (DocumentDB)
• Open source JSON data store created by 10gen
• Master-slave scale-out model
• Strong developer community
• Sharding built-in, automatic
• Implemented in C++ with many APIs (C++, JavaScript, Java, .NET, Perl, Python, etc.)
Column-Family
• Key includes a row, column family and column name
• Stores versioned blobs in one large table
• Queries can be done on rows, column families and column names
• Pros: great scale-out, performant, versioning
• Cons: cannot query blob content; row and column designs are critical
• Examples: Cassandra, Bigtable, HBase, Hypertable, Apache Accumulo
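The column-family data model can be sketched as a map from (row, column family, column name) to a list of timestamped blobs. The class ColumnFamilyTable and its methods are assumptions made for the example and do not mirror the API of Cassandra or HBase: families are fixed up front, columns inside a family are created on write, and reads address cells by key parts only, never by blob content.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy table: (row, family, column) -> versions as (timestamp, blob)."""

    def __init__(self, families):
        self._families = set(families)    # fixed when the table is created
        self._cells = defaultdict(list)   # columns in a family are dynamic

    def put(self, row, family, column, blob, timestamp):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        versions = self._cells[(row, family, column)]
        versions.append((timestamp, blob))
        versions.sort(key=lambda version: version[0], reverse=True)  # newest first

    def get(self, row, family, column, timestamp=None):
        """Newest blob, or the newest one written at or before `timestamp`."""
        for written_at, blob in self._cells.get((row, family, column), []):
            if timestamp is None or written_at <= timestamp:
                return blob
        return None

    def columns(self, row, family):
        """Query by row and family: the column names present, sorted."""
        return sorted(c for (r, f, c) in self._cells if r == row and f == family)


table = ColumnFamilyTable(families=["profile", "activity"])
table.put("user:1", "profile", "city", b"DL", timestamp=1)
table.put("user:1", "profile", "city", b"MUM", timestamp=2)
table.put("user:1", "profile", "name", b"ranjeet", timestamp=1)

latest_city = table.get("user:1", "profile", "city")
city_at_t1 = table.get("user:1", "profile", "city", timestamp=1)
```

Since lookups are possible only through the key parts, the choice of row key and column names decides which queries are cheap, which is why the slide calls row and column design critical.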
The Evolution of Cassandra
Cassandra
• Apache open source column family database supported by DataStax
• Peer-to-peer distribution model
• Strong reputation for linear scale-out (millions of writes/second)
• Database-side security
• Written in Java and works well with HDFS and MapReduce
Cassandra: Feature Headlines
• Elastic
– Read and write throughput increases linearly as new machines are added
• Decentralized
– Fault tolerant with no single point of failure; no “master” node
• Rich data model
– Column based, range slices, column slices, secondary indexes, counters, expiring columns
Source: http://cassandra.apache.org/
Hadoop origins
• Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each providing computation and storage.
• Hadoop is an open-source implementation of Google's MapReduce and GFS (distributed file system).
• Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library.
• Hadoop fulfills the need for a common infrastructure
– Efficient, reliable, easy to use
– Open source, Apache License
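The "simple programming model" is MapReduce: a map step emits key-value pairs, the framework groups them by key, and a reduce step combines each group. The sketch below runs the classic word count in a single Python process to show the shape of the model; it does not use Hadoop's API, and the function names and input lines are invented for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["NoSQL scales out", "SQL scales up", "nosql and sql"])
```

On a cluster, each mapper would run on the machine holding its slice of the input and each reducer would receive one partition of the keys; the two user-written functions stay the same.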
HBase / Hadoop
• Hadoop: open source implementation of the MapReduce algorithm, written in Java
• Hadoop's early development was driven largely by Yahoo
• HBase: column-oriented data store
• Java interface
• HBase is designed specifically to work with Hadoop
• High-level query language (Pig)
• Strong support from many vendors
Graph Store
• Focus on modeling the structure of data: interconnectivity
• Scales to the complexity of the data
• Inspired by mathematical graph theory (G = (V, E))
• Queries are really graph traversals
• Data is stored in a series of nodes, relationships and properties
• Ideal when relationships between data are key:
– e.g. social networks
Graph Store (cont..)
• Ideal when relationships between data are key:
– e.g. social networks
• Data model: “Property Graph”
– Nodes
– Relationships/edges between nodes
– Key-value pairs on both
– Possibly edge labels and/or node/edge types
• Pros: fast network search, works with public linked data sets
• Cons: specialized query languages (SPARQL for RDF, Gremlin, Cypher)
• Examples: Neo4j, Titan, AllegroGraph, InfiniteGraph
Graph Stores (cont..)
• Used when the relationships and relationship types between items are critical
• Used for
– Social networking queries: "friends of my friends"
– Inference and rules engines
– Pattern recognition
– Working with open linked data
• Automate "joins" of public data
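A "friends of my friends" query is a two-hop traversal. The sketch below runs it over an adjacency map in plain Python; the function name and the people in the sample graph are made up, and a graph database would express the same traversal in its own query language rather than in application code.

```python
def friends_of_friends(graph, person):
    """People exactly two hops away: friends' friends who are not already friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


# Undirected friendship graph stored as an adjacency map.
friends = {
    "asha": {"bala", "chitra"},
    "bala": {"asha", "deepak"},
    "chitra": {"asha", "deepak", "esha"},
    "deepak": {"bala", "chitra"},
    "esha": {"chitra"},
}

suggestions = friends_of_friends(friends, "asha")
```

The cost depends on how many neighbours the starting node has, not on the total size of the graph, whereas the relational equivalent is a self-join on the friendship table.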
Property Graph model
• Nodes, i.e. vertices
• Relationships between nodes, i.e. edges
• Relationships have labels
• Relationships are directed, but traversed at equal speed in both directions
• The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
• Nodes have key-value properties
• Relationships have key-value properties
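The model above maps onto a small data structure. This is a sketch under assumptions, not Neo4j's storage format: the class and method names are invented, and the sample nodes are made up. Relationships are stored once but indexed from both endpoints, which is one way to make traversal equally cheap in either direction.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int    # id of the node the relationship points from
    end: int      # id of the node it points to
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self._outgoing = {}   # node id -> relationships starting there
        self._incoming = {}   # node id -> relationships ending there

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        self._outgoing.setdefault(node_id, [])
        self._incoming.setdefault(node_id, [])

    def relate(self, start, end, label, **properties):
        rel = Relationship(start, end, label, properties)
        self._outgoing[start].append(rel)
        self._incoming[end].append(rel)
        return rel

    def neighbours(self, node_id, label, direction="out"):
        """Follow relationships with `label`; direction is 'out', 'in' or 'both'."""
        found = []
        if direction in ("out", "both"):
            found += [r.end for r in self._outgoing[node_id] if r.label == label]
        if direction in ("in", "both"):
            found += [r.start for r in self._incoming[node_id] if r.label == label]
        return found


graph = PropertyGraph()
graph.add_node(1, name="asha")
graph.add_node(2, name="bala")
loves = graph.relate(1, 2, "LOVES", since=2014)
graph.relate(1, 2, "LIVES_WITH")
```

Whether a relationship is read one way or both ways is the caller's choice: LOVES would be followed with direction "out", while LIVES_WITH would be queried with direction "both".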
Neo4j
• Graph database designed to be easy to use by Java developers
• Dual license (community edition is GPL)
• Works as an embedded Java library in your application
• Disk-based (not just RAM)
• Full ACID
Decide what you need
• SQL
– Relational, transactional processing
• NoSQL
– Non-relational, distributed, high performance and highly scalable
• Analytics, Warehouse, BigData
– Data warehousing, analytics, data science, and reporting
• Combination of all 3
– Begin with SQL, then NoSQL, and eventually you need a BigData/analytics platform
Finally… in one line…
• SQL
– Works great, but can’t easily scale.
• NoSQL
– Works great, but doesn’t fit every problem.
• Analytics, BigData
– Every business needs it.
Use Case: Twitter
• Twitter challenges
– Needs to store many graphs
• Who you are following
• Who’s following you
• Who you receive phone notifications from, etc.
– Delivering a tweet requires rapid paging through followers
– Heavy write load as followers are added and removed
– Set arithmetic for @mentions (intersection of users)
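The set arithmetic can be shown with Python sets. The reading assumed here is that a tweet directed at a mentioned user is shown to people who follow both the author and the mentioned user; that rule, the function name, and the account names are assumptions for illustration, not a description of Twitter's implementation.

```python
def mention_audience(author, mentioned, followers):
    """Users who follow both accounts: the intersection of two follower sets."""
    return followers.get(author, set()) & followers.get(mentioned, set())


# Hypothetical follower sets, keyed by the account being followed.
followers = {
    "alice": {"carol", "dave", "erin"},
    "bob": {"dave", "erin", "frank"},
}

audience = mention_audience("alice", "bob", followers)
```

With millions of followers per account, the cost of this intersection and of keeping both sets current under constant follows and unfollows is what makes the workload hard.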
Use Case: Twitter …
• What did they try?
– Started with relational databases
– Tried key-value storage of denormalized lists
• Did it work?
– Nope
– Each approach was good at either handling the write load or paging through large amounts of data, but not both
Open source implementations to play with!
• MongoDB - http://www.mongodb.org/
• Cassandra - http://cassandra.apache.org/
• Neo4j - http://neo4j.org/
• Hadoop + HBase - http://hadoop.apache.org/
• Redis - http://code.google.com/p/redis/
• Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db/
• … and many more…
Thank You
For any query or feedback, write to me:
ranjeet@policyBazaar.com
ranjeet.kr@gmail.com

NoSQL-Overview
