1
June, 2016
NoSQL – A Quick Tour
2
Objectives
History of Database
Statistics
Power of 4V
What is NoSQL
How is data growing ?
Challenges
What’s the solution ?
NoSQL Features and Types
Eventual Consistency
High-level Overview of some popular NoSQL DBs
NoSQL – Not mandatory
Aadhaar – An example of Polyglot persistence
JPA 2.0 support for NoSQL
What’s Next ?
Basic implementation concepts
Useful Links
3
History of Database
Timeline (1960 to 2000 and beyond):
1960s – Computerized database: network model (CODASYL), hierarchical model (IMS)
1970s – Relational database: evolution of RDBMS (Codd's model), introduction of the E-R diagram
1980s – Query language: evolution of SQL as the standard language; RDBMS as the primary choice of database
1990s – Object database: introduction of ODBMS, growth of client-server applications, increased use of the internet
2000s – A new era: evolution of social networking, Big Data, Internet of Things, NoSQL
4
History of Database to Big Data
Relational Database (1980)
"Impedance Mismatch" (as discussed by Martin Fowler)
Object Database (1990) – Not very successful, because RDBMS was already too deeply entrenched in existing systems
And then... something able to process humongous amounts of data (like 500+ TB every day), as at:
• Facebook
• LinkedIn
• Twitter
5
History of Database to Big Data…continues
6
Statistics
The McKinsey Global Institute estimates that data volume is growing 40% per year and will grow 44x between 2009 and 2020
Twitter processes ~8 TB/day
~277K tweets/min
2 million search queries/min
YouTube receives 72 hours of uploaded video/min
100 million emails sent/min
Facebook processes 350 GB/min
Big Numbers
1,000 bytes = one kilobyte (kB)
1,000 kB = one megabyte (MB)
1,000 MB = one gigabyte (GB)
1,000 GB = one terabyte (TB)
1,000 TB = one petabyte (PB)
1,000 PB = one exabyte (EB)
1,000 EB = one zettabyte (ZB)
1,000 ZB = one yottabyte (YB)
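As a quick sanity check of the numbers on this slide, here is a small Python sketch that walks the decimal unit ladder above (each step is x1000) and compounds the McKinsey growth rate. The helper names are illustrative only; they simply restate the arithmetic.

```python
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between decimal (SI) byte units; each step on the ladder is x1000."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


def annual_growth_rate(total_factor: float, years: int) -> float:
    """Constant yearly rate r such that (1 + r) ** years == total_factor."""
    return total_factor ** (1 / years) - 1


# IBM's 2.5 exabytes per day, expressed in gigabytes (2.5 billion GB).
print(convert(2.5, "EB", "GB"))
# 40% a year compounded over 2009-2020 (11 years) is roughly a 40x increase...
print(round(1.40 ** 11, 1))
# ...while 44x over the same 11 years corresponds to about 41% a year.
print(round(annual_growth_rate(44, 11), 3))
```

In other words, the "40% per year" and "44x by 2020" figures are consistent to within rounding.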
7
Power of 4V
Volume
Very high volume of data
Different kinds of transactions (read/write)
Many use cases where read is substantially more than write and vice versa
Velocity
Very high frequency transactions
Need low latency
High throughput
Variety
Structured vs. unstructured data
Normalized vs. denormalized data
Relational vs. aggregated data
Value
Data can be a very useful piece of information
Data can be junk
Value proposition, analytics (user behavior, buying patterns, market information, economic prediction, success/failure of government policies)
According to computer giant IBM, 2.5 exabytes (that's 2.5 billion gigabytes) of data were generated every day in 2012. That's big by anyone's
standards. "About 75% of data is unstructured, coming from sources such as text, voice and video," says Mr Miles, head of analytics at big data
specialist SAS.
8
Current Picture
What do you think all your favorite sites are running with ?
Facebook – Cassandra, HBase, Hadoop/Hive
LinkedIn – Voldemort
Twitter – Hadoop/Pig/FlockDB, Cassandra
Google – BigTable
Amazon – AWS, DynamoDB
WhatsApp ??
9
What is NoSQL
Short for "Not only SQL": any data source that does not fall exclusively under the typical SQL category
The term is largely attributed to Eric Evans of Rackspace, who suggested it in 2009 when Johan Oskarsson of Last.fm
organized an event on open-source distributed databases.
A new way to store and retrieve data (especially sparse and unstructured data) for modern high-volume, real-time web traffic,
batch processing, and analytics
Not an alternative to RDBMS but a parallel concept
A "shared nothing" architecture, as opposed to the shared architecture of an RDBMS. A shared nothing (SN) architecture is a
distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of
contention across the system; more specifically, none of the nodes share memory or disk storage. The term was coined by
Michael Stonebraker at the University of California, Berkeley, in his 1986 paper "The Case for Shared Nothing."
Covers technologies that do not use SQL and relational mapping of data
Supports huge data storage and efficient retrieval, along with normal CRUD operations
Does not strictly follow ACID, because of its distributed nature
Works with “Big Data” for supporting high volume processing of data
10
What is NoSQL Continued..
Comes in many flavors, often born out of research at social networking giants like Twitter, Facebook,
LinkedIn, Google, Yahoo, etc.
Data can be stored in four basic kinds of database: key/value, column-family, document, and graph
Most of these stores do not require predefined schemas, normalization, or strict consistency
The primary purpose of NoSQL is to store and process constantly growing data quickly and efficiently, without the
constraints of relational databases, and to provide a scalable architecture that supports future data growth without
compromising performance
First generation and second generation of NoSQL
Mantra of NoSQL : “Getting an answer quickly is more important than getting a
correct answer”
If you can’t split it, you can’t scale it
—Randy Shoup, Distinguished Architect, eBay
scale-first database
—Nati Shalom, CTO and founder of GigaSpaces
11
How is data growing ?
12
How is data growing ? Continued..
There are two concepts, Big Users and Big Data, along with Cloud Computing
Big Users – The number of users accessing the web is growing rapidly, and they access many kinds of data: personal information; social
data such as tweets, likes, blogs, click streams, comments, follows, or geolocation data; log files; and system-generated, user-generated,
and sensor-generated data. The growth in users cannot be predicted, especially with the advance of the mobile space
Big Data – The number of data sources has increased tremendously, and the data itself has grown exponentially. Data is measured not in
gigabytes but in far larger units (tera/peta/exa/zetta/yottabytes)
• Transactional data
• Machine data
• Social networking data
Cloud Computing – More and more data is stored in the cloud, and access to it must be fast
• The number of concurrent users skyrocketed as applications became increasingly accessible via the web (and later on mobile
devices)
• The amount of data collected and processed soared as it became easier and increasingly valuable to capture all kinds of data
• The amount of unstructured or semi-structured data exploded, and its use became integral to the value and richness of
applications
13
Challenges
[Figure: scaling up vs. scaling out. Vertical: a shared application server layer on one server with a shared disk, where adding CPU and RAM only goes so far ("Oh No! I am loaded"). Horizontal: the application server layer spread across many commodity servers.]
14
Challenges Continued..
Need to scale out (i.e. sharding) without compromising latency and performance (keeping latency low and throughput
high)
How to handle social sites ?
Do we really need strong consistency or reliability for a status update on Facebook or a tweet on
Twitter ?
Need to avoid scaling up with costly and complex high-end servers
Should be easily scaled out with low-cost commodity servers without any application downtime
Data is not only structured; the majority of social networking data is unstructured. So there is a need to support
schema-less data structures
There are many cases where transactions are not of prime concern and costly, heavyweight writes are not needed
Should support easy and quick replication as well as failover
RDBMS is mainly a centralized system with a single point of failure (SPoF); replication is expensive and complex
because of the latency of transaction management and coordination through two-phase commit, etc.
Traditional big appliances like IBM racks or the Oracle infrastructure stack, with their associated disks and storage, are costly to
maintain
15
What’s the solution ?
Real-time and batch processing of analytical and operational data by second-generation NoSQL databases like Couchbase
The picture depicts a scenario where both batch processing and real-time data are handled by a combination of Hadoop
and Couchbase.
16
NoSQL Features and Types
Early adopter: Google's Bigtable is used for everything from throughput-sensitive batch processing to latency-sensitive online queries, in
Google Earth, Finance, Orkut, Analytics, etc. It is based on the concept of a column-family data store.
There are mainly 4 types of NoSQL databases, as listed below.
Document Database
MongoDB and Couchbase (a merger of CouchDB and Membase)
Couchbase grew out of CouchDB, which started as an Apache incubator project, and is developed by Couchbase Inc.
Implemented in Erlang and C, with a JavaScript execution environment
Used by Apple, BBC, and many others.
17
NoSQL Features and Types Continued..
Key/value and eventually consistent datastores
Redis, Membase, Voldemort, and Cassandra
Cassandra supports key/value pairs and eventual consistency (based on Amazon's Dynamo).
Developed by Facebook and implemented in Java
Clients are available in Java, PHP, Python, Grails, Ruby, .NET.
Used by Facebook, Twitter, Paddy Power, GitHub, and many others.
Redis is a key-value store that works as a distributed in-memory as well as persistent storage system
Started by Salvatore Sanfilippo in 2009 as an independent project
Implemented in C
Clients are available in PHP, Java, Ruby, Python, C++ etc.
Used by Craigslist
Sorted Column Family datastore
HBase (developed based on Google's Bigtable)
Created by Powerset and donated to Apache. Implemented in Java. Used by Facebook, Yahoo! and many others.
Access methods include JRuby, Java, Thrift, REST, ProtoBuf, etc.
Graph Database
Neo4J, FlockDB
Neo4J was developed in 2003 and implemented in Java
Accessed through REST and Gremlin interfaces. Used by Box.com, ThoughtWorks
There are other types of NoSQL databases available as well.
XML Database
Object Database
Grid and Cloud Database
Multimodel Database
18
Eventual Consistency
Consistency – Data should be consistent across user transactions, and every client has the same view of the data
Availability – The system should always be available, so users can always read as well as write
Partition tolerance – The system should keep working in a distributed environment even when the network is partitioned
Eventual consistency is the basis of BASE (Basically Available, Soft state, Eventual consistency)
Brewer’s CAP Theorem
Succinctly put, Brewer’s Theorem states that in systems that are distributed or scaled out it’s impossible to achieve
all three (Consistency, Availability, and Partition Tolerance) at the same time. You must make trade-offs and
sacrifice at least one in favor of the other two.
ACID                                  | BASE
Strong consistency                    | Weak consistency – stale data OK
Isolation                             | Availability first
Focus on "commit"                     | Best effort
Nested transactions                   | Approximate answers OK
Availability?                         | Aggressive (optimistic)
Conservative (pessimistic)            | Simpler!
Difficult evolution (e.g. schema)     | Faster
                                      | Easier evolution
19
Eventual Consistency – HAPPY GO LUCKY
An example of booking flights for two close friends attending a conference
There is only one ticket left
[Figure: good friends Anand and Scott book on bookflighttickets.com, served by two data centers (Asia and US); the data sync-up between the data centers is down. Will the tickets be booked? "Booking done."]
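The scenario can be sketched in a few lines of Python. This is a hypothetical model, not how any particular booking site or database works: each data center accepts a booking using only its local inventory (an AP choice while the link is down), and when the link returns the two sites converge on the union of their bookings.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    """One data center's local copy of the flight inventory (hypothetical model)."""
    name: str
    seats_left: int
    bookings: list = field(default_factory=list)

    def book(self, passenger: str) -> str:
        # AP choice: decide using local state only, without waiting for the other site.
        if self.seats_left > 0:
            self.seats_left -= 1
            self.bookings.append(passenger)
            return "Booking done"
        return "Sold out"


def sync_up(a: DataCenter, b: DataCenter) -> list:
    """When the link comes back, both sites converge on the union of bookings."""
    merged = sorted(set(a.bookings) | set(b.bookings))
    a.bookings = list(merged)
    b.bookings = list(merged)
    return merged


asia = DataCenter("Asia", seats_left=1)
us = DataCenter("US", seats_left=1)

# The sync between the data centers is down, so each site accepts its own booking.
anand_result = asia.book("Anand")
scott_result = us.book("Scott")

sold = sync_up(asia, us)
print(anand_result, "/", scott_result)
print(len(sold), "tickets sold for one seat:", sold)
```

Both sites stayed available and eventually agreed, but the agreed state holds two bookings for one seat; the application has to resolve that conflict afterwards. A CP design would instead have rejected one booking while the sync was down.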
20
Brewer's Theorem was conjectured by Eric Brewer and presented by him (www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf)
as a keynote address at the ACM Symposium on the Principles of Distributed Computing (PODC) in 2000.
Brewer’s ideas on CAP developed as a part of his work at UC Berkeley and at Inktomi.
A look at Distributed Update and Replication
Eventual Consistency – Distributed Environment
[Figure: writes update node A from version V0 to V1; replication later brings node B from V0 to V1, while reads may be served by either node.]
21
Theorem of Two Options and Three Alternatives
SYNCHRONOUS SYNCHRONIZATION
A single transaction; consistency is of prime importance
ASYNCHRONOUS SYNCHRONIZATION
Can be achieved, but if synchronization fails there is no way to know when it happened
Option 1: Consistency takes precedence and availability may be compromised, given that the system must be
partition-tolerant (CP)
Option 2: Availability takes precedence and consistency may be compromised, given that the system must be
partition-tolerant (AP)
Option 3: Both consistency and availability take precedence, and the system is not partition-tolerant (CA)
22
A SMALL TALK
COUCHBASE (DOCUMENT)
REDIS (KEY-VALUE PAIR)
CASSANDRA (COLUMN-FAMILY AND EVENTUAL CONSISTENCY)
NEO4J (GRAPH)
HOW WILL WE GO THROUGH THEM ?
• A SMALL OVERVIEW
• BASIC FEATURES
23
Features of Couchbase
A high level overview of Couchbase is mentioned below.
1. Stores data in JSON or binary format in the data store (each item is called a document)
2. Supports basic CRUD operations such as get, set, and delete. Uses MVCC to provide non-blocking I/O for reads and writes.
3. Provides a strong in-memory caching layer and automatically persists data to the file system to support a strong failover
mechanism
4. Uses Buckets to group the physical resources of a cluster logically, with per-bucket options such as the memory quota
and replication rules
5. Each bucket is divided into 1,024 logical partitions called vBuckets, and a cluster map is used to locate documents in the cluster
6. vBuckets are the lowest-level unit for locating a document in a cluster, using a hash of each document's ID
7. Stores data to disk asynchronously and replicates it to other servers in the cluster, as well as across data centers through the
XDCR feature
8. Very efficient and easy management of a distributed cluster (horizontal scaling), also known as "scale out"
9. Seamlessly supports the Memcached protocol
10. Rebalances data (documents) when the cluster changes (adding or removing nodes and updating the
cluster map with the new document locations)
11. Database-index-like feature through Views for faster access to indexed data
12. Highly secure access to Couchbase Server through the SASL mechanism
13. Supports both optimistic locking (through the compare-and-swap, CAS, mechanism) and pessimistic locking
(through explicit locks)
14. An asynchronous, listener-based approach for manipulating data through the Future interface
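Item 13's optimistic locking can be illustrated with a toy compare-and-swap store. This is a hypothetical in-memory class written for illustration, not the Couchbase SDK: every successful write produces a new CAS token, and a conditional write carrying a stale token is rejected, so the client re-reads and retries instead of silently overwriting someone else's change.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Item:
    value: object
    cas: int  # version token that changes on every successful write


class CasStore:
    """Toy in-memory key-value store with compare-and-swap (hypothetical; not the Couchbase SDK)."""

    def __init__(self):
        self._data = {}
        self._next_cas = itertools.count(1)

    def set(self, key, value) -> int:
        cas = next(self._next_cas)
        self._data[key] = Item(value, cas)
        return cas

    def gets(self, key):
        """Return (value, cas) so the caller can write back conditionally."""
        item = self._data[key]
        return item.value, item.cas

    def cas(self, key, value, expected_cas):
        """Write only if nobody changed the key since `expected_cas`; else return None."""
        item = self._data.get(key)
        if item is None or item.cas != expected_cas:
            return None
        return self.set(key, value)


def update_with_retry(store, key, fn, attempts=5) -> bool:
    """Optimistic locking loop: read, modify, conditional write, retry on conflict."""
    for _ in range(attempts):
        value, token = store.gets(key)
        if store.cas(key, fn(value), token) is not None:
            return True
    return False


store = CasStore()
store.set("doc::1", {"views": 0})

# Two clients read the same version...
v1, c1 = store.gets("doc::1")
v2, c2 = store.gets("doc::1")
first = store.cas("doc::1", {"views": v1["views"] + 1}, c1)   # accepted
second = store.cas("doc::1", {"views": v2["views"] + 1}, c2)  # stale token, rejected

# ...so the second client retries with a fresh read instead of overwriting.
update_with_retry(store, "doc::1", lambda doc: {"views": doc["views"] + 1})
print(store.gets("doc::1"))
```

No lock is held between the read and the write; conflicts are detected at write time, which is why this suits workloads where contention is rare.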
24
How does Couchbase work ?
25
More Overview
Replication Process
The smart client writes the data into the server's object-managed cache
The document is submitted to the intra-cluster replication queue for replication to other servers
The document is also placed in the disk write queue and written to disk asynchronously, once the disk queue
is flushed
Once the data is persisted to disk, it is replicated to other clusters through XDCR and eventually indexed for
searching.
Major Components
Data Manager
Object Managed Cache
• Server warm-up, checkpoints, TAP replicator, backfill, resident item ratio, NRU, ejection, item pager
Storage Engine
• Compaction
Query Engine
• Indexes can be created and queried for JSON documents
• Secondary indexes are created through Views and design documents
Cluster Manager (orchestration node)
• The heartbeat watchdog
• The process manager
• The configuration manager
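The "smart client" step can be sketched as a two-stage lookup: document ID to vBucket, then vBucket to server via the cluster map. The hashing below assumes the CRC32-based scheme commonly described for Couchbase/Memcached clients (bits 16 to 30 of the CRC, masked to the vBucket count); treat that detail, the 1,024 count, and the round-robin cluster-map layout as assumptions for illustration only.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption: typical default; must be a power of two here


def vbucket_for(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash the document ID with CRC32 and fold it onto a vBucket id."""
    digest = (zlib.crc32(key.encode("utf-8")) >> 16) & 0x7FFF
    return digest & (num_vbuckets - 1)


def build_cluster_map(servers, num_vbuckets=NUM_VBUCKETS, replicas=1):
    """Hypothetical cluster map: vBucket id -> [active server, replica server(s)]."""
    return [
        [servers[(vb + i) % len(servers)] for i in range(replicas + 1)]
        for vb in range(num_vbuckets)
    ]


def locate(key, cluster_map):
    """What a smart client does: key -> vBucket -> owning servers, with no extra hop."""
    vb = vbucket_for(key, len(cluster_map))
    active, *replica_servers = cluster_map[vb]
    return vb, active, replica_servers


servers = ["node-a", "node-b", "node-c"]
cmap = build_cluster_map(servers)
print(locate("user::42", cmap))
```

The key-to-vBucket mapping never changes; rebalancing only rewrites entries of the cluster map, which clients then refresh.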
26
Features of REDIS
A high level overview of Redis is mentioned below.
1. Started in 2009, Redis (REmote DIctionary Server) is a distributed key-value database. It is an in-memory system
for very fast reads and writes; fundamentally, an advanced version of Memcached.
2. Its creator, Salvatore Sanfilippo, calls it a "data structure store": apart from plain strings, it can store complex data
structures such as sets, lists, hashes, sorted sets, and bitmaps as values.
3. Apart from being a data structure store, it also works as a blocking queue (or stack) and a publish-subscribe system
4. A powerful command-line interface (CLI) and a rich API for clients.
5. An expiry (time to live) can be set for each key so that the data set does not grow unbounded
6. Provides an option to save data to disk, unusual for a key-value system that primarily operates in memory.
Snapshots can be taken at intervals based on criteria such as the number of changed keys.
7. Additional protection through an append-only file (AOF) that records each write, so data survives a server crash
8. By default, Redis provides little security of its own, so it is better to protect sensitive data with a firewall or
SSH.
9. Supports master-slave replication, but not multi-master scaling or an intelligent failover system
10. Client-managed cluster support through consistent hashing rather than a server-side mechanism
11. Bloom filters (compact arrays of bits) can be used to determine probabilistically that data does not exist
12. Internally uses a special data structure called "Simple Dynamic Strings" (SDS) to store strings
13. Uses its own virtual memory management to locate data on disk
27
Replication in REDIS
[Figure: Redis replication. A master node replicates to slave nodes (labelled read-write, read-write, and read-only) using non-blocking synchronization; the Redis server periodically saves to disk, and a Redis client/smart client connects to it.]
Redis uses hash slots to bucket data across nodes, so that data is sharded across nodes for fault tolerance. When a
node is added to or removed from the cluster, Redis moves the affected data from the old node to the new one
through its internal node-to-node communication (ping-pong, in Redis terms), which is based on a binary protocol. Under the hood,
Redis also uses a gossip-based protocol among the nodes to track the status of each node and take action if
a node goes down or stops responding. A Redis smart client can work out which node in the cluster holds the data and
connect to it directly, instead of connecting to an arbitrary node.
[Figure: nodes A, B, C, D, and E exchanging gossip messages.]
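The hash-slot mapping described above can be sketched in Python. The Redis Cluster specification maps each key to one of 16,384 slots using CRC16 (XMODEM variant) of the key modulo 16384, hashing only the part inside `{...}` when a hash tag is present so related keys land together. The `SlotRouter` class, with its even, contiguous slot ranges, is a hypothetical stand-in for the slot table a smart client would actually fetch from the cluster.

```python
import bisect

NUM_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Map a key to a slot; only the {hash tag} part is hashed when one is present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % NUM_SLOTS


class SlotRouter:
    """Hypothetical smart-client routing table: contiguous slot ranges per node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        step = NUM_SLOTS // len(self.nodes)
        self.starts = [i * step for i in range(len(self.nodes))]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect_right(self.starts, hash_slot(key)) - 1
        return self.nodes[idx]


router = SlotRouter(["A", "B", "C"])
print(hash_slot("user:1000"), router.node_for("user:1000"))
```

Because the slot count is fixed, adding or removing a node means moving whole slots between nodes rather than rehashing every key.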
28
An Example of REDIS
Redis can be used to manage data that needs simple key-value caching together with richer querying based on
the keys.
Typical roles: analytics, caching, search engine, messaging broker
1. Get a list of cities under a zip code around the world
2. Get a list of books by ISBN code, where each book is associated with multiple "tagging" words
3. Build a sub-system that browses a catalog system to find product data
4. Use a broker to collect data from multiple sources, e.g. to manage centralized log content
Not so good use cases
1. Every bit of data is very precious
2. Multiple master-master setup and failover needed
3. ACID transaction is highly desired
4. Relational data is of prime importance
29
Features of APACHE CASSANDRA
A high level overview of Cassandra is mentioned below.
If I had asked people what they wanted, they would have said faster horses. – Henry Ford
1. Influenced by Amazon's Dynamo for its distributed design and Google's Bigtable for its data model, Cassandra is a hybrid
datastore supporting both column-family and key-value data with eventual consistency
2. Cassandra was developed at Facebook; it is a sparse, multi-dimensional hash table
3. Supports secondary indexes in addition to the index on the row key
4. Supports a powerful command-line interface (CLI) as well as Thrift-based, multi-language drivers
for clients.
5. Runs in a decentralized mode in which every node is identical, unlike a master-slave topology
6. Offers tunable consistency rather than only eventual consistency (an always-writable system)
7. Uses a gossip protocol for peer-to-peer communication across nodes, together with hinted handoff
8. Uses anti-entropy to synchronize data (replicas) across nodes to the latest version
9. Uses compaction to merge large data files for better space management, and compresses data
10. Uses Bloom filters to check whether an element may be present in a data file (SSTable)
11. Uses a concept called "tombstones" for soft deletes; the data is physically deleted during compaction.
12. Uses a "Staged Event-Driven Architecture" (SEDA) for highly efficient parallel processing
13. Uses three structures (commit log, memtable, and SSTable) to store and manage data during a write
14. Uses a concept called "Read Repair" to update outdated values on any node
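Item 10 can be illustrated with a minimal Bloom filter. This is a generic sketch, not Cassandra's implementation (which uses its own hashing and sizing): each key sets k bits in an m-bit array, a lookup that finds any of its bits unset means "definitely absent", and all bits set means only "maybe present". Deriving the k positions from one SHA-256 digest via double hashing is a simplifying choice made here.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means 'maybe present'."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# One filter per data file: a read can skip files whose filter says "definitely absent".
sstable_filter = BloomFilter()
for row_key in ["user:1", "user:2", "user:3"]:
    sstable_filter.add(row_key)

print(sstable_filter.might_contain("user:2"))
print(sstable_filter.might_contain("user:999"))
```

There are no false negatives, only occasional false positives, which is why the filter can safely decide which data files a read does not need to touch.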
30
CASSANDRA – Column Family
Suppose a customer has personal information as well as an address. Two column families can be created, one for personal
data and the other for the address.
In a column-family store, data is identified by the row key. Unlike an RDBMS, each row can have its own set of
columns. Where some columns have null values, nothing is stored for them, unlike RDBMS tables, where nulls
consume additional space.
Cassandra uses a SQL-like query language called CQL (Cassandra Query Language).
SELECT USERS FROM STATE WHERE STATE = 'TX';
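The sparse row layout described above can be modeled with nested dictionaries. This is a hypothetical illustration, not Cassandra's storage format or client API: each column family maps a row key to only the columns that row actually has, so a null column simply does not exist. The `select_where` helper scans every row; real Cassandra would serve such a query only through the partition key or a secondary index.

```python
from collections import defaultdict

# Hypothetical keyspace: column family -> row key -> {column name: value}
keyspace = defaultdict(dict)


def insert(column_family: str, row_key: str, **columns) -> None:
    """Upsert columns into a row; None values are simply not stored."""
    row = keyspace[column_family].setdefault(row_key, {})
    row.update({name: value for name, value in columns.items() if value is not None})


def select_where(column_family: str, column: str, value):
    """Full scan for illustration only; Cassandra routes by partition key or index."""
    return sorted(
        row_key
        for row_key, row in keyspace[column_family].items()
        if row.get(column) == value
    )


insert("personal", "cust-1", name="Anand", email="anand@example.com", phone=None)
insert("personal", "cust-2", name="Scott", twitter="@scott")
insert("address", "cust-1", city="Austin", state="TX")
insert("address", "cust-2", city="Pune", state="MH")

print(keyspace["personal"])
print(select_where("address", "state", "TX"))
```

Note how the two rows in `personal` carry different columns and the missing phone number takes no space.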
31
CASSANDRA – An Example
Cassandra is a popular datastore for many large-scale web applications. The following criteria describe some of the
important use cases for Cassandra.
High-volume writes, like tweets on Twitter or comments on Facebook
Strong consistency is not needed
High write throughput
Consistency can be tuned
Denormalized data without the need for secondary indexes
32
Features of Neo4J
[Figure: the graph space, split into graph databases (OLTP) and graph compute engines (OLAP); a property graph.]
Leveraging complex and dynamic relationships in highly-connected data to generate insight and competitive advantage.
33
Features of Neo4J
A high level overview of Neo4J is mentioned below.
1. A graph database (mathematical modeling based on Leonhard Euler's graph theory) for storing relationships
across multiple entities. Developed by Neo Technology.
2. Built on the concepts of nodes, relationships, properties (key/value pairs), and labels
3. Its own query language, "Cypher", for performing CRUD operations
4. A very high-performing NoSQL database for storing and retrieving connected data
5. A graph is a way to maintain multi-dimensional relations among entities
6. Highly applicable in social networking applications such as social graphs and recommendations
7. The Neo4J site provides a rich REPL (read-eval-print loop) web interface for running queries as well as performing
administrative work.
8. It is ACID compliant, and also provides high availability and master-slave replication across multiple nodes
9. Provides an easy client interface through REST and Gremlin
10. Provides fast lookups through Lucene
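[Figure: a small property graph connecting Sachin, Gaurav, and Bikram through "friend" relationships, and to Java and Grapes through "likes" and "eats" relationships.]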
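To show why graphs suit "friend of a friend" questions, here is a tiny in-memory property graph in Python. It is a hypothetical sketch, not Neo4J's API, and the exact edges (who is friends with whom, who likes what) are assumed, since the slide's figure does not survive. The traversal answers roughly the question behind a Cypher pattern such as `(p)-[:FRIEND]-()-[:FRIEND]-(fof)`.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: labelled nodes, typed relationships."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., **properties}
        self.edges = defaultdict(list)  # node id -> [(relationship type, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def relate(self, src, rel_type, dst, both_ways=False):
        self.edges[src].append((rel_type, dst))
        if both_ways:
            self.edges[dst].append((rel_type, src))

    def neighbours(self, node_id, rel_type):
        return {dst for rel, dst in self.edges[node_id] if rel == rel_type}

    def friends_of_friends(self, person):
        """People two FRIEND hops away who are not already direct friends."""
        direct = self.neighbours(person, "FRIEND")
        two_hops = set()
        for friend in direct:
            two_hops |= self.neighbours(friend, "FRIEND")
        return two_hops - direct - {person}


g = PropertyGraph()
for name in ["Sachin", "Gaurav", "Bikram"]:
    g.add_node(name, "Person")
g.add_node("Java", "Language")
g.add_node("Grapes", "Food")
# Assumed edges, for illustration only.
g.relate("Sachin", "FRIEND", "Gaurav", both_ways=True)
g.relate("Gaurav", "FRIEND", "Bikram", both_ways=True)
g.relate("Sachin", "LIKES", "Java")
g.relate("Bikram", "LIKES", "Java")
g.relate("Gaurav", "EATS", "Grapes")

print(g.friends_of_friends("Sachin"))
```

Each hop follows stored adjacency directly instead of joining tables, which is the core advantage a graph database offers for connected data.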
34
NoSQL – Not Mandatory
35
NoSQL – Not Mandatory
NoSQL is not a replacement for SQL
Generally, NoSQL is not a good fit for applications that need strong consistency
Correctness of data is more important than availability of data
Transactional context is more important than analytical processing
Data is structured and maintained through an object-relational hierarchy
Legacy databases must be supported
Future of Databases
The future of databases lies in combining relational and NoSQL databases as needed.
Pramod J. Sadalage and Martin Fowler described the concept of Polyglot Persistence in their well-known book "NoSQL
Distilled".
36
Polyglot Persistence – Classic Example
37
Polyglot Persistence – Classic Example Contd..
38
JPA 2.0 Support for NoSQL
Kundera by Impetus Technologies
A JPA 2.0-compliant object-datastore mapping library for NoSQL datastores
Supports multiple NoSQL DBs such as HBase, Cassandra, Redis, MongoDB, CouchDB, Neo4j, etc.
An easy interface for working with polyglot persistence
Spares developers the complexity of the individual NoSQL databases
39
What’s Next ?
NewSQL DB
The definition of NewSQL according to Wikipedia is “NewSQL is a class of modern relational database management systems
that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write
workloads while still maintaining the ACID guarantees of a traditional database system”.
MySQL was at its peak as an open-source, high-performing database
Sun took over MySQL
2009: Oracle announced its acquisition of Sun, making it the owner of MySQL
A number of additional players added other products to make MySQL more efficient; Memcached, for example, was added for caching
Still not sufficient from a performance and scalability point of view. The term NewSQL became popular as different
supporting database technologies and products were added to MySQL
In early 2009, several NoSQL databases were on the rise
NuoDB is one of the popular NewSQL DBs
NewSQL Language
A new breed of database query language
No need for a separate JDBC driver for each database
LDBC (Liberty DataBase Connectivity) is a JDBC driver that provides vendor-independent database access
The grammar is not finalized yet; it can be used as Jdb (Java-database) or S2 (SQL 2)
40
What’s Next ? Continued…..
A. New Set of very high performing NoSQL – At early stage
Game of Benchmarking
FoundationDB's 14.4 million writes/sec vs. Cassandra's 1.1 million writes/sec (used by Netflix) – "NoSQL, YesACID" motto
Aerospike – speed of Redis, at the scale of Cassandra, but with the economics of flash (flash-optimized)
MemSQL – a super-fast, read-optimized in-memory database; scanned 134 billion rows/sec to find popular terms in Wikipedia
search trends
B. MPP Databases – Massively Parallel Processing (Greenplum, Vertica DB) vs Hadoop/SQL (Hive)
C. Connected Thinking - SMAC
Social, Mobility, Analytics, and Cloud computing
D. Internet of Things - IoT
41
Basic Implementation Concepts
The following terms are some of the most common and important implementation concepts on which NoSQL products are built.
Sharding or Horizontal Scaling – Data Partitioning
Sharding splits (partitions) the data across multiple nodes, so that each node holds only part of it. Scaling data horizontally this way is what makes "scaling out" possible.
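A minimal sketch of hash-based sharding, assuming the simplest possible scheme (hash the key, take it modulo the number of shards; the function names are illustrative). It also measures how many keys change shard when one node is added, which is the rebalancing problem that the consistent hashing concept later in this section addresses.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def shard_for(key: str, num_shards: int) -> int:
    """Naive hash-mod-N sharding: each key lives on exactly one shard."""
    return stable_hash(key) % num_shards


keys = [f"user:{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}  # scale out from 4 to 5 nodes

moved_fraction = sum(before[k] != after[k] for k in keys) / len(keys)
print(f"{moved_fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

With hash-mod-N, a key stays put only when its hash gives the same remainder for both 4 and 5, so roughly 80% of the data has to move on every resize.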
Quorum – Data Replication
In a replicated datastore, a quorum is the minimum number of replicas that must take part in (vote for) a read or write for the operation to
succeed in the distributed environment.
Wikipedia definition: "A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be
allowed to perform an operation in a distributed system. A quorum-based technique is implemented to enforce consistent
operation in a distributed system."
Used for distributed commit and for replica control
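A common Dynamo-style use of quorums is the rule R + W > N: with N replicas, a write acknowledged by W of them and a read that consults R of them must overlap in at least one replica, so the read sees the latest version. The sketch below is a hypothetical toy model (random replica selection, integer versions), not any product's replication code.

```python
import random


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees every read quorum shares a replica with every write quorum."""
    return r + w > n


class QuorumStore:
    """N replicas, each mapping key -> (version, value); a hypothetical toy model."""

    def __init__(self, n: int, rng: random.Random):
        self.replicas = [{} for _ in range(n)]
        self.rng = rng

    def write(self, key, value, version: int, w: int) -> None:
        # The write succeeds once W randomly chosen replicas have stored it.
        for i in self.rng.sample(range(len(self.replicas)), w):
            self.replicas[i][key] = (version, value)

    def read(self, key, r: int):
        # Ask R replicas and keep the answer with the highest version.
        answers = [self.replicas[i].get(key, (0, None))
                   for i in self.rng.sample(range(len(self.replicas)), r)]
        return max(answers, key=lambda answer: answer[0])[1]


store = QuorumStore(n=3, rng=random.Random(7))
store.write("seat-12A", "free", version=1, w=2)
store.write("seat-12A", "booked", version=2, w=2)
print(quorums_overlap(3, 2, 2), store.read("seat-12A", r=2))
```

Lowering R or W (for example R = W = 1 with N = 3) makes operations faster and more available but allows stale reads; this is the knob behind Cassandra's tunable consistency.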
Gossip Protocol – Co-ordination, Consistency, and detection of failures
It works like human gossip: each node learns who its peers and neighbors are and exchanges information with them.
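A small simulation shows why gossip spreads cluster state quickly. In this simplified push-pull model (an assumption, not Cassandra's actual gossiper), every node starts knowing only its own heartbeat; each round, every node picks one random peer and both keep the newest heartbeat seen for each node. Knowledge roughly doubles per round, so full membership is known after a handful of rounds.

```python
import random


def gossip_round(views, rng):
    """Every node picks one random peer; both keep the newest heartbeat seen per node."""
    n = len(views)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # any peer except itself
        merged = {
            node: max(views[i].get(node, 0), views[j].get(node, 0))
            for node in views[i].keys() | views[j].keys()
        }
        views[i] = merged
        views[j] = dict(merged)


def simulate(n=32, seed=42, max_rounds=100):
    """Each node starts knowing only itself; count rounds until everyone knows everyone."""
    rng = random.Random(seed)
    views = [{i: 1} for i in range(n)]  # node id -> heartbeat version
    for rnd in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == n for view in views):
            return rnd, views
    return None, views


rounds, views = simulate()
print(f"all 32 nodes know the full membership after {rounds} rounds")
```

The same merge rule also spreads newer heartbeats, so a node whose heartbeat stops advancing can be suspected as failed without any central coordinator.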
Read Repair – Data Repair
Cassandra uses a concept called "Read Repair" to keep read operations consistent for the requested rows across all the replica
nodes. It uses two types of read request: direct reads and background read requests. The number of direct reads is set by the
read consistency level; background read requests go to all the other replicas that did not receive a direct read request. It is an
optional feature and can be configured at the column family (table) level.
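The core of read repair can be sketched as "read from several replicas, return the newest cell, and write it back to any replica that was stale or missing it". This is a simplified, hypothetical model: it repairs every queried replica synchronously and resolves versions by timestamp only, whereas Cassandra distinguishes direct and background reads as described above.

```python
def read_with_repair(replicas, key):
    """Return the newest value for `key` and fix replicas that returned stale or no data.

    Each replica maps key -> (timestamp, value); the newest timestamp wins.
    """
    cells = [replica.get(key) for replica in replicas]
    known = [cell for cell in cells if cell is not None]
    if not known:
        return None, []
    newest = max(known, key=lambda cell: cell[0])
    repaired = []
    for index, cell in enumerate(cells):
        if cell != newest:
            replicas[index][key] = newest  # push the winning cell back to the stale replica
            repaired.append(index)
    return newest[1], repaired


replicas = [
    {"user:7": (2, "new@example.com")},
    {"user:7": (1, "old@example.com")},
    {},  # this replica missed the write entirely
]
value, repaired = read_with_repair(replicas, "user:7")
print(value, repaired)
```

Each read thus doubles as a small anti-entropy pass, so frequently read rows converge quickly.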
42
Basic Implementation Concepts
• Hinted Handoff – Co-ordination, Consistency, and detection of failures
Hinted handoff is a Cassandra feature that optimizes the cluster consistency process and anti-entropy when a replica-owning
node is unavailable (due to network issues or other problems) to accept a replica from a successful write operation.
It helps keep the cluster consistent: when a node is unavailable, the write is kept in a local hint table and replayed
to the node once it becomes available again.
Hinted Handoff is used to handle transient failures.
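A toy model of hinted handoff: the coordinator writes to every live replica, stores a hint for each replica that is down, and replays the hints when that node comes back. The `Node` and `Coordinator` classes are hypothetical illustrations; real Cassandra also expires hints after a time window and falls back to repair for longer outages.

```python
from collections import defaultdict


class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    """Toy coordinator that stores hints for replicas that are down (hypothetical model)."""

    def __init__(self):
        self.hints = defaultdict(list)  # target node name -> [(key, value), ...]

    def write(self, replicas, key, value) -> int:
        acks = 0
        for node in replicas:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                # Keep a hint locally instead of failing the write.
                self.hints[node.name].append((key, value))
        return acks

    def node_recovered(self, node: Node) -> None:
        """Replay stored hints, in order, once the node is reachable again."""
        node.up = True
        for key, value in self.hints.pop(node.name, []):
            node.data[key] = value


a, b, c = Node("A"), Node("B"), Node("C")
coordinator = Coordinator()
c.up = False  # transient failure
acks = coordinator.write([a, b, c], "tweet:1", "hello")
print(acks, c.data)  # C has not seen the write yet
coordinator.node_recovered(c)
print(c.data)
```

The write stays available during the outage, and the recovering node catches up without a full repair.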
• Vector Clocks – Conflict detection
Dynamo (Amazon) and Voldemort (LinkedIn) use vector clocks to determine the order of multiple versions of a record in the system.
Suppose Amit, Ajit, Atul, and Ankush are meeting somewhere. They initially decided to meet on Tuesday. Later, Amit
and Ajit agreed to move it to Wednesday. Similarly, Ankush and Atul decided on Friday. Ajit and
Ankush then discussed whether it could happen on Thursday. Now everyone has a different version, and conflicts arise. All of
them want to settle the date, but they are not all reachable at the same time. How can they resolve this conflict ?
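Here is one way to model the story with vector clocks. The event assignment is an assumption (the story does not say who recorded each change): every decision bumps the counter of the person recording it, and Ajit and Ankush's Thursday plan is built on a merge of both histories. Comparing clocks element-wise shows that Wednesday and Friday are concurrent (a genuine conflict), while Thursday descends from both and can supersede them.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of `clock` with `actor`'s counter bumped (a new local event)."""
    new_clock = dict(clock)
    new_clock[actor] = new_clock.get(actor, 0) + 1
    return new_clock


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum: the clock of someone who has seen both histories."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


def compare(a: dict, b: dict) -> str:
    actors = a.keys() | b.keys()
    a_le_b = all(a.get(x, 0) <= b.get(x, 0) for x in actors)
    b_le_a = all(b.get(x, 0) <= a.get(x, 0) for x in actors)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither saw the other: a real conflict


tuesday = increment({}, "Amit")                           # original plan
wednesday = increment(tuesday, "Ajit")                    # Amit and Ajit move it
friday = increment(tuesday, "Atul")                       # Ankush and Atul move it
thursday = increment(merge(wednesday, friday), "Ankush")  # Ajit and Ankush reconcile

print(compare(wednesday, friday))
print(compare(thursday, wednesday), compare(thursday, friday))
```

Vector clocks only detect the conflict; choosing between concurrent versions is left to the application (or to a later write that merges them, as Thursday does here).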
43
Basic Implementation Concepts
• Consistent Hashing – Rebalancing
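Consistent hashing places both nodes and keys on a hash ring; each key belongs to the first node point found clockwise from its hash. Below is a minimal sketch with virtual nodes (several points per node) for smoother balance; the SHA-256 ring hash and the count of 100 virtual nodes are arbitrary choices for illustration. Adding a node only takes over the keys that now fall just before its new points, so every moved key goes to the new node and all other keys stay put.

```python
import bisect
import hashlib


def ring_hash(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring with virtual nodes (vnodes)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owners = []  # node owning the point at the same index
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            idx = bisect.bisect(self._points, point)
            self._points.insert(idx, point)
            self._owners.insert(idx, node)

    def remove(self, node: str) -> None:
        kept = [(p, n) for p, n in zip(self._points, self._owners) if n != node]
        self._points = [p for p, _ in kept]
        self._owners = [n for _, n in kept]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point after the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owners[idx]


ring = HashRing([f"node-{i}" for i in range(4)])
keys = [f"user:{i}" for i in range(2000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-4")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Compared with hash-mod-N placement, where most keys move when a node is added, here only about one fifth of the keys move when going from four to five nodes.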
• Merkle Tree – Detecting inconsistency
Dynamo, Riak, and Cassandra use this algorithm to minimize the amount of data to be synchronized when replicas are
inconsistent.
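A sketch of how a Merkle tree narrows down what needs syncing. The layout is an assumption for illustration: keys are hashed into a fixed, power-of-two number of buckets (the same on every replica), each leaf is the hash of one bucket's contents, and parents hash their two children. Two replicas compare roots first and descend only into subtrees whose hashes differ, so only the mismatching buckets need to be exchanged.

```python
import hashlib

NUM_BUCKETS = 8  # assumption: power of two, identical on every replica


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str) -> int:
    return int.from_bytes(_h(key.encode("utf-8"))[:4], "big") % NUM_BUCKETS


def build_tree(replica: dict) -> list:
    """Return the tree as levels: levels[0] are leaf hashes, levels[-1] == [root]."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key, value in replica.items():
        buckets[bucket_of(key)].append((key, value))
    level = [_h(repr(sorted(bucket)).encode("utf-8")) for bucket in buckets]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_buckets(a: list, b: list) -> list:
    """Walk down from the root, following only the children whose hashes differ."""
    suspects = [0] if a[-1][0] != b[-1][0] else []
    for depth in range(len(a) - 2, -1, -1):
        suspects = [child
                    for parent in suspects
                    for child in (2 * parent, 2 * parent + 1)
                    if a[depth][child] != b[depth][child]]
    return suspects


replica_a = {f"key{i}": f"val{i}" for i in range(50)}
replica_b = dict(replica_a)
replica_b["key7"] = "stale"
print(differing_buckets(build_tree(replica_a), build_tree(replica_b)), bucket_of("key7"))
```

Identical replicas are confirmed with a single root comparison, and a divergence costs only about log2(buckets) hash comparisons to locate.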
• Multi-version Concurrency Control – Consistency and resolution of conflicts
CouchDB uses Multi-Version Concurrency Control (MVCC) to avoid locking the database file during writes. Conflicts are
left to the application to resolve at write time. Older document versions (called revisions) may be lost when the append-
only database file is compacted.
• LSM Tree (Log-Structured Merge Tree)
The LSM tree is a technique for storing data that overcomes some of the pitfalls of B-trees. LevelDB is probably
the most popular implementation; other products such as Cassandra and HBase also use
it. An LSM tree uses immutable storage segments. It also avoids read-before-write, reduces fragmentation, and may
avoid the B-tree "write cliff". For more information, please check this link:
http://www.xaprb.com/blog/2015/04/02/state-of-the-storage-engine/
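A toy LSM tree makes the ideas above concrete, and mirrors Cassandra's memtable, SSTable, tombstone, and compaction concepts mentioned earlier. This is a hypothetical, heavily simplified sketch (no commit log, no Bloom filters, a full merge on compaction): writes go to an in-memory table without reading first, full memtables are flushed as immutable sorted segments, reads check the newest data first, deletes write tombstones, and compaction merges segments and drops deleted keys.

```python
import bisect

TOMBSTONE = object()  # marker for a deleted key


class TinyLSM:
    """Toy LSM tree: mutable memtable plus immutable sorted segments (SSTable-like)."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}
        self.segments = []  # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value) -> None:
        self.memtable[key] = value  # blind write: no read-before-write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key) -> None:
        self.put(key, TOMBSTONE)  # deletes are just writes of a tombstone

    def flush(self) -> None:
        """Write the memtable out as a new immutable sorted segment."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
        else:
            value = None
            for segment in reversed(self.segments):  # newest segment wins
                keys = [k for k, _ in segment]
                i = bisect.bisect_left(keys, key)
                if i < len(keys) and keys[i] == key:
                    value = segment[i][1]
                    break
        return None if value is TOMBSTONE else value

    def compact(self) -> None:
        """Merge all segments into one, keeping the newest value and dropping tombstones."""
        merged = {}
        for segment in self.segments:  # oldest first, so newer values overwrite
            merged.update(segment)
        self.segments = [sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)]


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable full: flushed as segment 1
db.put("a", 10)
db.delete("b")      # flushed as segment 2 (newer "a", tombstone for "b")
print(db.get("a"), db.get("b"), len(db.segments))
db.compact()
print(db.segments)
```

Segments are never modified in place, which keeps writes sequential; the price is that reads may consult several segments until compaction merges them.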
44
Useful Links
http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Professional NoSQL by Shashank Tiwari
NoSQL Databases by Christof Strauch
Couchbase Server Under the Hood from Couchbase Inc.
http://www.slideshare.net/Muratakal/rdbms-vs-nosql-15797058
http://www.nosql-database.org/
http://www.youtube.com/watch?v=MmL9Lq6WbSY
http://incubator.apache.org/thrift/
http://www.youtube.com/watch?v=uMxZ4RI6sCQ
http://planetcassandra.org/apache-cassandra-use-cases/
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_hh_c.html
http://www.slideshare.net/benoitperroud/nosql-overview-implementation-free
http://java.dzone.com/articles/simple-magic-consistent
https://github.com/impetus-opensource/Kundera
http://newsql.sourceforge.net/
http://cs.brown.edu/courses/cs227/archives/2012/papers/newsql/aslett-newsql.pdf
http://thenewstack.io/databases-high-volume-transactions-scale-part-two/
45
Useful Links continued….
http://www.oracle.com/us/products/database/big-data-for-enterprise-519135.pdf
Managing Online Risk: Apps, Mobile, and Social Media Security by Deborah Gonzalez
http://highscalability.com/blog/2014/3/31/how-whatsapp-grew-to-nearly-500-million-users-11000-cores-an.html
http://db-engines.com/en/ranking_osvsc
https://www.youtube.com/watch?v=jznJKL0CrxM
https://highlyscalable.wordpress.com/2012/09/18/distributed-algorithms-in-nosql-databases/
http://basho.com/why-vector-clocks-are-easy/
http://cloudacademy.com/blog/how-to-handle-failures-in-dynamodb-an-inside-look-into-nosql-part-6/
http://www.oracle.com/technetwork/database/database-technologies/nosqldb/documentation/nosql-vs-couchdb-1961720.pdf
http://bsonspec.org/
https://code.google.com/p/protobuf/
https://thrift.apache.org/
http://www.vertica.com/2014/04/18/facebook-and-vertica-a-case-for-mpp-databases/
http://prod2.aerospike.com/wp-content/uploads/2014/02/snapdeal_casestudy.pdf
http://www.slideshare.net/regunathbalasubramanian/oss-as-a-competitive-advantage
http://www.thoughtworks.com/insights/blog/nosql-databases-overview
http://techcrunch.com/2015/03/24/apple-acquires-durable-database-company-foundationdb/

NoSQL Basics - a quick tour

  • 1. 1 June, 2016 NoSQL – A Quick Tour
  • 2. 2 Objectives History of Database Statistics Power of 4V What is NoSQL How is data growing ? Challenges What’s the solution ? NoSQL Features and Types Eventual Consistency High-level Overview of some popular No-SQL DB NoSQL – Not mandatory Aadhaar – An example of Polygot persistency A support for JPA 2.0 in NoSQL What’s Next ? Basic implementation concepts Useful Links
  • 3. 3 History of Database 1960 1970 200019901980 Computerized Database Network model called CODASY Hierarchical model IMS Relational Database Evolution of RDBMS (Codd’s model) Introduction of E-R Diagram A New Era RDBMS as the primary choice of database Evolution of Social Networking Big Data Internet of Things NoSQL Object Database Introduction of ODMS Growth of Client- Server applications Increase use of internet Query Language Evolution of SQL as the standard language
  • 4. 4 History of Database to Big Data Relational Database (1980) “Impedance Mismatch” (quoted by Martin Fowler) Object Database (1990) – Not very successful because RDBMS was too close to the systems And then ……something which is able to process humungous data (like 500+ TB every day)  Facebook  LinkedIn  Twitter
  • 5. 5 History of Database to Big Data…continues
  • 6. 6 Statistics The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020 Twitters processes ~8 TB/day ~277K tweets/min 2 million search queries/min Youtube processes 72 hours of video uploaded/min 100 million emails sent/min Facebook processes 350 GB/min Big Numbers 1,000 bytes = one kilobyte (kB) 1,000 kB = one megabyte (MB) 1,000 MB = one gigabyte (GB) 1,000 GB = one terabyte (TB) 1,000 TB = one petabyte (PB) 1,000 PB = one exabyte (EB) 1,000 EB = one zettabyte (ZB) 1,000 ZB = one yottabyte (YB)
  • 7. 7 Power of 4V Volume Very high volume of data Different kind of transactions (read/write) Many use cases where read is substantially more than write and vice versa Velocity Very high frequency transactions Need low latency High throughput Variety Structured vs un-structure data Normalized vs. De-normalized data Relational vs Aggregated data Value Data is very useful peace of information Data can be junk Value proposition, Analytics (user behavior, buying pattern, market information, economic prediction, success/failure of the government) According to computer giant IBM, 2.5 exabytes - that's 2.5 billion gigabytes (GB) - of data was generated every day in 2012. That's big by anyone's standards. "About 75% of data is unstructured, coming from sources such as text, voice and video," says Mr Miles (Head of Analytics of Big Data Specialist SAS).
  • 8. 8 Current Picture What do you think all your favorite sites are running with ? Facebook – Cassandra, Hbase, Hadoop/Hive LinkedIn – Voldermort Twitter – Hadoop/Pig/FlockDB, Cassandra Google – BigTable Amazon – AWS, DynamoDB WhatsApp ??
  • 9. 9 What is NoSQL Abbreviation of “Not only SQL”, any data source which doesn’t come under typical SQL category exclusively The term is attributed more or less to Eric Evans of Rackspace. The name was coined in 2009 during a discussion with Joan Oskarsson of Last.fm for open-source distributed databases. A new way to store and retrieve data (specially sparse and unstructured) in modern day high-volume real-time web traffic, batch processing, and analytics Not an alternative for RDBMS but a parallel concept A “Shared Nothing” architecture as opposed to “Shared Architecture” in RDBMS (Philosophy of Shared Nothing (NA) architecture - A shared nothing architecture (SN) is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage.). The term was first coined by Michael Stonebraker at University of California at Berkeley in his 1986 paper “The Case for Shared Nothing.” Can be tagged with technologies which doesn’t use SQL and relational mapping of data Promotes huge storage of data and efficient retrieval, supports normal CRUD operations Not a stringent follower of ACID due to its inherent nature Works with “Big Data” for supporting high volume processing of data
  • 10. 10 What is NoSQL Continued.. Comes with varied flavor of products often coined due to research at social networking giants like Twitter, Facebook, LinkedIn, Google, Yahoo etc. Data can be stored in four basic databases like key/value pair, column-family data, document, and graph Mostly these stores or databases don’t expect pre-defined schemas or normalization and consistency Primary purpose of NoSQL is to have fast and efficient storing and processing of constantly growing data without the constraint of relation database and provide the scalable architecture to support future growth of data without compromising the performance First generation and second generation of NoSQL Mantra of NoSQL : “Getting an answer quickly is more important than getting a correct answer” If you can’t split it, you can’t scale it —Randy Shoup, Distinguished Architect, eBay scale-first database —Nati Shalom, CTO and founder of GigaSpaces
  • 11. 11 How is data growing ?
  • 12. 12 How is data growing ? Continued.. There are two concepts Big User and Big Data along with Cloud Computing Big User – Number of users accessing web is growing rapidly accessing several kind of data such Personal information, social data like tweets, likes, blogs, click streams, comments, follows, or geo location data, log files, system generated data, user generated data, sensor-generated data etc. The growing number of users can’t be predicted specially the advancement of mobile space Big Data – Source of data has increased tremendously but the actual data increased exponentially. Data is not in-terms of gigabyte but in more higher number (tera/penta/exa/zetta/yotta bytes)  Transactional Data  Machine Data  Social networking data Cloud Computing – More and more Data is stored in the cloud and access to data should be fast The number of concurrent users skyrocketed as applications increasingly became accessible via the web (and later on mobile devices)  The amount of data collected and processed soared as it became easier and increasingly valuable to capture all kinds of data  The amount of unstructured or semi-structured data exploded and its use became integral to the value and richness of applications
  • 13. 13 Challenges Shared Application Server Layer Add CPU Add RAM V e r t i c a l Shared Disk Oh No! I am loaded Application Server Layer Commodity Server Commodity Server Commodity Server Commodity Server Commodity Server
  • 14. 14 Challenges Continued.. Need to scale out (i.e. sharding) and without compromising latency and performance (keeping low latency and high throughout) How to handle social sites ? Do we really need strong consistency or reliability for the status update at Facebook or a tweet at Tweeter ? Need to avoid scale up with costly and complex high-end servers Should be easily scaled out with low cost commodity servers without any application downtime Data is not only structured but majority of the social networking data is unstructured. So, there is a need to support schema less data structures There are many cases where transaction is not of prime concern and costly and heavy writes are not needed Should support easy and quick replication as well as failover RDBMS is mainly a centralized system with single point of failure (SpoF) ; the replication is very expensive and complex due to latency attributed for transaction management and co-ordination through two-phase commits etc. Traditional big appliances like IBM rack or Oracle infrastructure stack with related hard disk and storage are costly to maintain
  • 15. 15 What’s the solution ? Real time and batch processing for analytical and operational data by second generation of NoSQL like Couchbase The picture depict a scenario where both batch processing as well as real-time data is being handled by a combination of Hadoop and Couchbase.
  • 16. 16 NoSQL Features and Types Early adopter: Google’s Bigtable is used for throughput sensitive batch processing to latency based online queries. Used in Google Earth, Finance, Orkut, Analytics etc. This is based on the concept of column family data store. There are mainly 3 types of NoSQL databases as mentioned below. Document Database MongoDB and Couchbase (An amalgamated version of CouchDB and Membase) Couchebase started as Apache incubator project and then continued by Couchbase Inc Implemented in Erlang and C, and Javascript execution environment Used by Apple, BBC, and many others.
  • 17. 17 NoSQL Features and Types Continued.. Key/Value pair and Eventual Consistency datastore Redis, Membase, Voldemort, and Cassandra Cassandra supports key/value pair and eventual consistency (based on Amazon’s DynamoDB). Developed by Facebook and implemented in Java Clients are available in Java, PHP, Python, Grails, Ruby, .NET. Used by Facebook, Twitter, Paddy Power, GitHub, and many others. Redis supports key-value pair and a distributed in-memory as well as persistent storage system Started by Salvatore Sanfilippo in 2009 as an independent project Implemented in C Clients are available in PHP, Java, Ruby, Python, C++ etc. Used by Craigslist Sorted Column Family datastore HBase (developed based on Google's bigtable) Created by Powerset and donated to Apache. Implemented in Java. Used by Facebook, Yahoo! and many others. Access method is JRuby, Java, Thrift, REST, ProtoBuf etc. Graph Database Neo4J, FlockDB Neo4J was developed in 2003 and implemented in Java Accessed through REST and Gremlin interfaces. Used by Box.com, ThoughtWorks There are other types of NOSQL databases available as well. XML Database Object Database Grid and Cloud Database Multimodel Database
  • 18. 18 Eventual Consistency Consistency – Data should be consistent across user transactions and each client can have the same view of the data Availability – System should be available always and user can read as well as write always Partition – System should be partition-tolerant and work in a distributed environment Eventual Consistency implies BASE (Basically Available, Soft state, Eventual Consistency) Brewer’s CAP Theorem Succinctly put, Brewer’s Theorem states that in systems that are distributed or scaled out it’s impossible to achieve all three (Consistency, Availability, and Partition Tolerance) at the same time. You must make trade-offs and sacrifice at least one in favor of the other two. ACID BASE Strong consistency Weak consistency – stale data OK Isolation Availability first Focus on “commit” Best effort Nested transactions Approximate answers OK Availability? Aggressive (optimistic) Conservative (pessimistic) Simpler! Difficult evolution (e. g. schema) Faster Easier evolution
  • 19. Eventual Consistency – Happy-Go-Lucky
    An example: booking flights for two close friends, Anand and Scott, who are attending a conference. Only one ticket is left.
    [Diagram: the two good friends book through bookflighttickets.com, served by data centers in Asia and the US; the data sync-up between the data centers is down, and each side reports "Booking done".]
    Will both tickets be booked?
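The booking scenario can be sketched as a toy model (not any real booking system): two hypothetical replicas, `asia` and `us`, each start with one seat, accept bookings using only local state while the sync link is down (availability first), and only discover the oversell when they reconcile.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """A toy data-center replica holding a seat count and its local bookings."""
    name: str
    seats_left: int
    bookings: List[str] = field(default_factory=list)

    def book(self, passenger: str) -> bool:
        # Availability first: decide using only local state, no coordination.
        if self.seats_left > 0:
            self.seats_left -= 1
            self.bookings.append(passenger)
            return True
        return False


def reconcile(a: Replica, b: Replica, capacity: int) -> List[str]:
    """Merge bookings once the link is back; return passengers over capacity."""
    merged = a.bookings + [p for p in b.bookings if p not in a.bookings]
    return merged[capacity:]


asia = Replica("asia", seats_left=1)
us = Replica("us", seats_left=1)

# The sync link is down, so each data center sells the "last" seat locally.
print(asia.book("Anand"))               # True
print(us.book("Scott"))                 # True
print(reconcile(asia, us, capacity=1))  # ['Scott']
```

Which booking loses after reconciliation is an application policy; this sketch simply keeps the Asia booking because it is merged first.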
  • 20. Brewer's Theorem was conjectured by Eric Brewer and presented by him (www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf) as the keynote address at the ACM Symposium on the Principles of Distributed Computing (PODC) in 2000. Brewer's ideas on CAP developed as part of his work at UC Berkeley and at Inktomi.
    A Look at Distributed Update and Replication
    Eventual Consistency – Distributed Environment
    [Diagram: nodes A and B both hold value V0; a write updates A to V1, and replication later propagates V1 to B. Until it does, reads from B still return V0.]
  • 21. Theorem of Two Options and Three Alternatives
    Synchronous synchronization: a single transaction updates all replicas; consistency is of prime importance.
    Asynchronous synchronization: replicas are updated later; this works, but if synchronization fails there is no way to know when the replicas became consistent.
    Option 1: Consistency takes precedence and availability may be compromised, with the system remaining partition-tolerant (CP).
    Option 2: Availability takes precedence and consistency may be compromised, with the system remaining partition-tolerant (AP).
    Option 3: Both consistency and availability take precedence, and the system is not partition-tolerant (CA).
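A minimal sketch of the two synchronization options, assuming a hypothetical two-node setup (primary `a`, replica `b`) and a `link_up` flag standing in for the network. Synchronous mode refuses a write it cannot replicate (CP); asynchronous mode accepts it and lets B serve the stale V0 until replication catches up (AP).

```python
from typing import List


class Node:
    def __init__(self, value: str) -> None:
        self.value = value


class ReplicatedPair:
    """Toy primary (a) and replica (b) with a pending-replication queue."""

    def __init__(self, initial: str, synchronous: bool) -> None:
        self.a = Node(initial)
        self.b = Node(initial)
        self.synchronous = synchronous
        self.link_up = True
        self.pending: List[str] = []

    def write(self, value: str) -> bool:
        if self.synchronous:
            # CP: the write commits on both nodes or not at all.
            if not self.link_up:
                return False
            self.a.value = self.b.value = value
            return True
        # AP: acknowledge after the local write and replicate later.
        self.a.value = value
        self.pending.append(value)
        return True

    def replicate(self) -> None:
        # Drain queued updates to B once the link is available.
        if self.link_up:
            for value in self.pending:
                self.b.value = value
            self.pending.clear()


cp = ReplicatedPair("V0", synchronous=True)
cp.link_up = False
print(cp.write("V1"), cp.a.value, cp.b.value)  # False V0 V0

ap = ReplicatedPair("V0", synchronous=False)
ap.link_up = False
print(ap.write("V1"), ap.b.value)  # True V0  (stale read from B)
ap.link_up = True
ap.replicate()
print(ap.b.value)                  # V1
```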
  • 22. A Small Talk
    Couchbase (document), Redis (key-value pair), Cassandra (column family and eventual consistency), Neo4j (graph)
    How will we go through each one?
    A short overview
    Basic features
  • 23. Features of Couchbase
    A high-level overview of Couchbase:
    1. Stores data in JSON or binary format; each stored item is called a document.
    2. Supports basic CRUD operations such as get, set, and delete, and uses MVCC so that reads and writes do not block each other.
    3. Provides a strong in-memory caching layer and automatically persists data to the file system to support failover.
    4. Uses buckets to group the physical resources of a cluster logically, with per-bucket settings such as memory quota and replication rules.
    5. Divides each bucket into 1024 logical partitions called vBuckets and uses a cluster map to locate a document in the cluster.
    6. The vBucket is the lowest level of indirection for locating a document: a hash of each document's ID identifies its vBucket.
    7. Writes data to disk asynchronously and replicates it to other servers in the cluster, as well as across data centers through the XDCR (cross data center replication) feature.
    8. Offers efficient, easy management of a distributed cluster with horizontal scaling ("scale out").
    9. Supports the memcached protocol seamlessly.
    10. Rebalances data (documents) when the cluster changes: nodes are added or removed, and the cluster map is updated with the new document locations.
    11. Provides index-like functionality through Views for faster access to indexed data.
    12. Secures access to Couchbase Server through SASL authentication.
    13. Supports both optimistic locking (through the compare-and-swap, CAS, mechanism) and pessimistic locking (through explicit locks).
    14. Offers an asynchronous, listener-based approach to manipulating data through the Future interface.
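The vBucket lookup in items 5 and 6 can be sketched as follows. This is a simplified model: it assumes CRC32 hashing of the document ID taken modulo 1024 vBuckets, and a hypothetical round-robin cluster map; real Couchbase clients apply their own bit manipulation to the CRC32 value and receive the actual map from the server.

```python
import zlib
from typing import List

NUM_VBUCKETS = 1024


def vbucket_id(doc_id: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document ID to a vBucket (simplified: CRC32 modulo the count)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_vbuckets


def build_cluster_map(servers: List[str], num_vbuckets: int = NUM_VBUCKETS) -> List[str]:
    """Hypothetical cluster map: vBucket i is active on servers[i % len(servers)]."""
    return [servers[i % len(servers)] for i in range(num_vbuckets)]


def locate(doc_id: str, cluster_map: List[str]) -> str:
    """Find the server holding the active copy of a document."""
    return cluster_map[vbucket_id(doc_id, len(cluster_map))]


cluster_map = build_cluster_map(["node1", "node2", "node3"])
print(vbucket_id("user::42"), locate("user::42", cluster_map))
```

Because a document's vBucket never changes, rebalancing only has to move whole vBuckets between servers and publish an updated map; the hashing step stays the same.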
  • 25. More Overview
    Replication process
    1. The smart client writes the data into the server's object-managed cache.
    2. The document is submitted to the intra-cluster replication queue for replication to other servers.
    3. The document is placed on the disk write queue and written to disk asynchronously, once the disk queue is flushed.
    4. Once the data is persisted to disk, it is replicated to other clusters through XDCR and eventually indexed for searching.
    Major components
    Data Manager
    Object-managed cache (server warm-up, checkpoint, TAP replicator, backfill, resident item ratio, NRU, ejection, item pager)
    Storage engine (compaction)
    Query engine: indexes can be created and queried for JSON documents; secondary indexes are created through Views and design documents
    Cluster Manager (orchestration node): the heartbeat watchdog, the process manager, and the configuration manager
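The replication process above can be modeled as a toy pipeline. The queue names mirror the slide, but the class and its methods are hypothetical; in Couchbase these stages run concurrently inside the server, whereas here they are explicit method calls so the ordering is easy to follow.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Item = Tuple[str, str]  # (key, JSON document)


class ToyDataManager:
    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}              # object-managed cache
        self.replication_queue: Deque[Item] = deque()
        self.disk_write_queue: Deque[Item] = deque()
        self.disk: Dict[str, str] = {}
        self.xdcr_outbox: List[Item] = []

    def set(self, key: str, value: str) -> None:
        # 1. The write lands in memory and can be acknowledged immediately.
        self.cache[key] = value
        # 2-3. The item is queued for intra-cluster replication and for disk.
        self.replication_queue.append((key, value))
        self.disk_write_queue.append((key, value))

    def flush_disk_queue(self) -> None:
        # 4. The disk queue is drained asynchronously; persisted items
        #    become eligible for XDCR (and, in the real server, indexing).
        while self.disk_write_queue:
            key, value = self.disk_write_queue.popleft()
            self.disk[key] = value
            self.xdcr_outbox.append((key, value))


dm = ToyDataManager()
dm.set("doc1", '{"name": "Anand"}')
print("doc1" in dm.cache, "doc1" in dm.disk)  # True False
dm.flush_disk_queue()
print("doc1" in dm.disk)                      # True
```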
  • 26. Features of Redis
    A high-level overview of Redis:
    1. Started in 2009, Redis (REmote DIctionary Server) is a distributed key-value database that keeps data in memory for very fast reads and writes. It is often described as an advanced version of memcached.
    2. Its creator, Salvatore Sanfilippo, calls it a "data structure store": apart from plain strings, values can be complex data structures such as sets, lists, hashes, sorted sets, and bitmaps.
    3. Besides being a data structure store, it can work as a blocking queue (lists can also serve as stacks) and as a publish-subscribe system.
    4. It offers a powerful command-line interface (CLI) and a rich client API.
    5. An expiry (time to live) can be set on each key so that the data set does not grow unbounded.
    6. It can save data to disk, which is unusual for a key-value system that primarily operates in memory. Snapshots can be taken at intervals based on criteria such as the number of changed keys.
    7. An append-only file (AOF) that logs every write gives additional protection against server crashes.
    8. Redis has only minimal built-in security, so it is better to protect it with a firewall or reach it through an SSH tunnel.
    9. It supports master-slave replication, but not multi-master scaling or intelligent failover.
    10. Clustering can be managed on the client side through consistent hashing rather than on the server side.
    11. Bloom filters (bit arrays that give a probabilistic test for the non-existence of data) can be built on top of Redis bitmaps.
    12. Internally, strings are stored in a structure called "Simple Dynamic Strings" (SDS).
    13. Early versions had their own virtual memory layer for swapping values to disk; it was deprecated in Redis 2.4 and removed in 2.6.
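Item 5 (per-key expiry) can be illustrated with a minimal in-memory store. This is a sketch, not Redis's implementation: it models only passive expiry, where an expired key is deleted when it is next accessed, and it takes an injectable clock so the behavior is easy to test. In Redis itself, expiry is set with commands such as `EXPIRE` or `SET key value EX seconds`.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringStore:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time, or None for "never expires")
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # passive expiry: evict on access
            return None
        return value

    def __len__(self) -> int:
        return len(self._data)


now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.set("session:1", "anand", ttl=30)
store.set("config", "v1")  # no expiry
now[0] = 31.0
print(store.get("session:1"), store.get("config"))  # None v1
```

Passive expiry alone would leave keys that are never read again; Redis complements it by periodically sampling keys that have a TTL and deleting the expired ones.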
  • 27. Replication in Redis
    [Diagram: a master node replicates to three slave nodes (two read-write, one read-only) using non-blocking synchronization; the Redis server saves to disk periodically, and Redis clients (smart clients) connect to the servers.]
    Redis Cluster uses hash slots to bucket keys across nodes, so that data is sharded across the cluster. When a node is added to or removed from the cluster, Redis keeps track of which node now holds the data through its internal node-to-node communication (ping-pong in Redis terminology), which uses a binary protocol. Under the hood, the nodes also use a gossip-based protocol to track the status of each node and take the necessary action when a node goes down or stops responding. A smart client can connect directly to the node that holds the data instead of connecting to an arbitrary node.
    [Diagram: nodes A, B, C, D, and E exchanging gossip messages.]
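The hash-slot mapping can be sketched in a few lines. Per the Redis Cluster specification, a key's slot is the CRC16 (XMODEM variant) of the key modulo 16384; if the key contains a non-empty `{...}` hash tag, only the tag is hashed, which lets related keys land on the same node. The function names are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Redis Cluster slot: hash only a non-empty {...} tag if the key has one."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # closing brace found and the tag is non-empty
            key = key[start + 1:end]
    return crc16_xmodem(key.encode("utf-8")) % 16384


print(hash_slot("123456789"))  # 12739 (its CRC16 is 0x31C3)
print(hash_slot("{user1000}.following") == hash_slot("{user1000}.followers"))  # True
```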
  • 28. An Example of Redis
    Redis suits data that needs simple key-value caching together with richer querying based on the keys.
    Typical uses: analytics, caching, search engine, message broker
    1. Get a list of cities under a zip code around the world.
    2. Get a list of books by ISBN, where each book is associated with multiple "tagging" words.
    3. Build a subsystem that browses a catalog to find product data.
    4. Use a broker to collect data from multiple sources, for example to centralize log content.
    Not-so-good use cases
    1. Every bit of data is very precious.
    2. A multi-master setup with failover is needed.
    3. ACID transactions are highly desired.
    4. Relational data is of prime importance.
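Use case 2 (books tagged with several words) maps naturally onto Redis sets: one set of tags per ISBN and one set of ISBNs per tag, queried by intersection. The sketch models that layout with plain Python sets so it runs without a server; the key names (`book:<isbn>:tags`, `tag:<word>`) and the ISBNs are hypothetical, and against a real server the same operations would be `SADD` and `SINTER`.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set

# key -> set of members, standing in for Redis set values
sets: DefaultDict[str, Set[str]] = defaultdict(set)


def sadd(key: str, *members: str) -> None:
    sets[key].update(members)


def sinter(*keys: str) -> Set[str]:
    """Intersection of the sets at the given keys (empty if any is missing)."""
    result = set(sets.get(keys[0], set()))
    for key in keys[1:]:
        result &= sets.get(key, set())
    return result


def tag_book(isbn: str, tags: Iterable[str]) -> None:
    for tag in tags:
        sadd(f"book:{isbn}:tags", tag)
        sadd(f"tag:{tag}", isbn)  # inverted index: tag -> ISBNs


tag_book("isbn-1", ["nosql", "databases"])
tag_book("isbn-2", ["nosql", "databases", "polyglot"])
print(sorted(sinter("tag:nosql", "tag:polyglot")))  # ['isbn-2']
```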
  • 29. Features of Apache Cassandra
    A high-level overview of Cassandra:
    "If I had asked people what they wanted, they would have said faster horses." – Henry Ford
    1. Influenced by Amazon's Dynamo for its distributed design and by Google's Bigtable for its data model, Cassandra is a hybrid datastore supporting both column-family and key-value data with eventual consistency.
    2. Cassandra was developed by Facebook; its data model is a sparse, multi-dimensional hash table.
    3. Supports secondary indexes in addition to the index on the row key.
    4. Offers a powerful command-line interface (CLI) as well as Thrift-based drivers for clients in many languages.
    5. Runs in a decentralized mode in which every node is identical, rather than in a master-slave topology.
    6. Consistency is tunable rather than fixed at eventual consistency, and the system is designed to be always writable.
    7. Uses a gossip protocol for peer-to-peer communication across nodes, together with hinted handoff.
    8. Uses anti-entropy to synchronize replicas across nodes with the latest version of the data.
    9. Uses compaction to merge large data files for better space management, and supports compression of data files (e.g. LZ4, Snappy, or Deflate).
    10. Uses Bloom filters to check quickly whether a data file (SSTable) may contain a given row key.
    11. Uses markers called "tombstones" for soft deletes; the data is physically removed during compaction.
    12. Uses a "Staged Event-Driven Architecture" (SEDA) for highly efficient parallel processing.
    13. Uses three separate structures (commit log, memtable, and SSTable) to store and manage data during a write operation.
    14. Uses "read repair" to update outdated values on a node.
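Item 10 refers to Bloom filters. A minimal Bloom filter follows; deriving the k bit positions from two halves of one SHA-256 digest (double hashing) is an assumption of this sketch, not Cassandra's own hash choice. A negative answer is definite, while a positive answer only means "possibly present", which is why a store can safely skip data files whose filter says a row key is absent.

```python
import hashlib
from typing import List


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        # Double hashing: position_i = (h1 + i * h2) mod num_bits
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # All k bits set -> "possibly present"; any bit clear -> definitely absent.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


sstable_filter = BloomFilter()
for row_key in ["user:1", "user:2", "user:3"]:
    sstable_filter.add(row_key)

print(sstable_filter.might_contain("user:2"))   # True (always, once added)
print(sstable_filter.might_contain("user:99"))  # very likely False
```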
  • 30. Cassandra – Column Family
    Suppose a customer has personal information as well as an address. Two column families can be created: one for personal data and the other for the address.
    In a column-family store, data is identified by the row key. Unlike in an RDBMS, each row can have its own set of columns. If a column has no value for a row, nothing is stored for it, whereas RDBMS tables may still consume space for NULL columns.
    Cassandra uses a query language very similar to SQL, called CQL (Cassandra Query Language), for example:
    SELECT * FROM users WHERE state = 'TX';
    (Filtering on a non-key column such as state requires a secondary index on that column, or ALLOW FILTERING.)
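The customer example can be pictured as nested maps: row key → column family → columns. This is a conceptual sketch of the data model described above, not Cassandra's storage format; the row keys, names, and column values are hypothetical. The second customer simply has no `phone` column, so nothing is stored for it.

```python
from typing import Dict, List, Optional

ColumnFamily = Dict[str, str]   # column name -> value
Row = Dict[str, ColumnFamily]   # column-family name -> columns
Store = Dict[str, Row]          # row key -> row

customers: Store = {
    "cust-001": {
        "personal": {"name": "Anand", "phone": "555-0100"},
        "address": {"city": "Austin", "state": "TX"},
    },
    "cust-002": {
        "personal": {"name": "Scott"},  # no phone: the column is simply absent
        "address": {"city": "Pune", "state": "MH"},
    },
}


def get_column(store: Store, row_key: str, family: str, column: str) -> Optional[str]:
    """Return a value, or None when the sparse row has no such column."""
    return store.get(row_key, {}).get(family, {}).get(column)


def rows_where(store: Store, family: str, column: str, value: str) -> List[str]:
    """Full scan on a column value (the work a secondary index would avoid)."""
    return [key for key, row in store.items() if row.get(family, {}).get(column) == value]


print(get_column(customers, "cust-002", "personal", "phone"))  # None
print(rows_where(customers, "address", "state", "TX"))         # ['cust-001']
```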
  • 31. Cassandra – An Example
    Cassandra is a popular datastore for many large-scale web applications. The following criteria describe some of its important use cases:
    High-volume writes, such as tweets on Twitter or comments on Facebook
    Strong consistency is not needed
    High write throughput
    Consistency can be controlled (tuned)
    Denormalized data without the need for secondary indexes
  • 32. Features of Neo4j
    [Diagram: the graph space, divided into graph databases (OLTP) and graph compute engines (OLAP), both built on the property graph model.]
    Leveraging complex and dynamic relationships in highly connected data to generate insight and competitive advantage.
  • 33. Features of Neo4j
    A high-level overview of Neo4j:
    1. A graph database, based on the mathematics of graph theory (Leonhard Euler), for storing relationships across multiple entities. Developed by Neo Technology.
    2. Built on the concepts of nodes, relationships, properties (key/value pairs), and labels.
    3. Uses its own query language, Cypher, for performing CRUD operations.
    4. A very high-performing NoSQL database for storing and retrieving connected data.
    5. A graph is a way to maintain multi-dimensional relationships among entities.
    6. Highly applicable to social networking applications such as social graphs and recommendations.
    7. Provides a rich web-based REPL (read-eval-print loop) for running queries and performing administrative work.
    8. ACID compliant; also provides high availability and master-slave replication across multiple nodes.
    9. Provides easy client interfaces through REST and Gremlin.
    10. Provides fast lookups through Lucene indexes.
    [Diagram: a small graph with the nodes Sachin, Gaurav, Bikram, Java, and Grapes connected by "friend", "likes", and "eats" relationships.]
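The property-graph model (nodes with labels and properties, plus typed relationships) can be sketched with plain Python structures. The exact edges of the slide's diagram are not recoverable from the text, so the graph below is a hypothetical reading of it; the function answers a typical social-graph question, "what do a person's friends like?", as a two-hop traversal. In Neo4j this would be expressed as a Cypher `MATCH` query; the sketch only shows the traversal idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Graph:
    # node id -> (label, properties)
    nodes: Dict[str, Tuple[str, Dict[str, str]]] = field(default_factory=dict)
    # (start node, relationship type, end node)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, label: str, **props: str) -> None:
        self.nodes[node_id] = (label, props)

    def relate(self, start: str, rel_type: str, end: str) -> None:
        self.edges.append((start, rel_type, end))

    def neighbors(self, node_id: str, rel_type: str) -> Set[str]:
        return {e for s, t, e in self.edges if s == node_id and t == rel_type}


def liked_by_friends(g: Graph, person: str) -> Set[str]:
    """Two-hop traversal: person -[FRIEND]-> friend -[LIKES]-> thing."""
    return {thing for friend in g.neighbors(person, "FRIEND")
            for thing in g.neighbors(friend, "LIKES")}


g = Graph()
for name in ["Sachin", "Gaurav", "Bikram"]:
    g.add_node(name, "Person", name=name)
g.add_node("Java", "Language")
g.add_node("Grapes", "Food")
g.relate("Sachin", "FRIEND", "Gaurav")
g.relate("Gaurav", "LIKES", "Java")
g.relate("Sachin", "FRIEND", "Bikram")
g.relate("Bikram", "LIKES", "Grapes")
g.relate("Bikram", "EATS", "Grapes")
print(sorted(liked_by_friends(g, "Sachin")))  # ['Grapes', 'Java']
```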
  • 34. NoSQL – Not Mandatory
  • 35. NoSQL – Not Mandatory
    NoSQL is not a replacement for SQL. NoSQL is generally not a good fit for applications in which:
    strong consistency is needed;
    correctness of data is more important than availability of data;
    transactional processing is more important than analytical processing;
    data is structured and maintained through an object-relational hierarchy;
    legacy databases must be supported.
    Future of Databases
    The future of databases lies in combining relational and NoSQL databases according to need. Pramod J. Sadalage and Martin Fowler describe this concept, Polyglot Persistence, in their well-known book "NoSQL Distilled".
  • 36. Polyglot Persistence – Classic Example
  • 37. Polyglot Persistence – Classic Example (continued)
  • 38. JPA 2.0 Support for NoSQL
    Kundera by Impetus Technologies
    A JPA 2.0-compliant object-datastore mapping library for NoSQL datastores
    Supports multiple NoSQL databases such as HBase, Cassandra, Redis, MongoDB, CouchDB, and Neo4j
    An easy interface for working with polyglot persistence
    Spares developers the complexity of the individual NoSQL databases
  • 39. What's Next? NewSQL DB
    Wikipedia defines NewSQL as "a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system".
    MySQL rose to the top as a high-performing open source database.
    Sun acquired MySQL (MySQL AB) in 2008.
    Oracle acquired Sun (completed in 2010) and thereby became the owner of MySQL.
    A number of additional players added other products to make MySQL more efficient; for example, memcached was added for caching.
    This was still not sufficient from a performance and scalability point of view.
    The term NewSQL became popular as different supporting database technologies and products were added to MySQL.
    In early 2009, several NoSQL databases were on the rise.
    NuoDB is one of the popular NewSQL databases.
    NewSQL Language
    A new breed of database query language
    No need for a separate JDBC driver for each database
    LDBC (Liberty DataBase Connectivity) is a JDBC driver that provides vendor-independent database access
    The grammar is not finalized yet; it can be used as Jdb (Java database) or S2 (SQL 2)
  • 40. What's Next? (continued)
    A. A new set of very high-performing NoSQL databases (at an early stage)
    The game of benchmarking: FoundationDB's 14.4 million writes/sec vs. Cassandra's 1.1 million writes/sec (as used by Netflix); FoundationDB's motto is "NoSQL, YesACID"
    Aerospike: the speed of Redis at the scale of Cassandra, but with the economies of flash (flash-optimized)
    MemSQL: a very fast, read-optimized in-memory database that scanned 134 billion rows/sec to find popular terms in Wikipedia search trends
    B. MPP (Massively Parallel Processing) databases (Greenplum, Vertica) vs. Hadoop/SQL (Hive)
    C. Connected thinking: SMAC (Social, Mobility, Analytics, and Cloud computing)
    D. Internet of Things (IoT)
  • 41. Basic Implementation Concepts
    The following terms are some of the most common and important implementation concepts on which NoSQL products are built.
    Sharding or Horizontal Scaling – Data Partitioning
    Sharding splits data across nodes so that each node holds only a subset of it. Scaling data horizontally in this way makes "scaling out" possible: capacity grows by adding nodes.
    Quorum – Data Replication
    Wikipedia definition: "A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system. A quorum-based technique is implemented to enforce consistent operation in a distributed system." Quorums are used for distributed commit and for replica control.
    Gossip Protocol – Coordination, Consistency, and Failure Detection
    Like people gossiping, each node periodically exchanges information with its peers and neighbors, so knowledge of the cluster's state spreads to every node.
    Read Repair – Data Repair
    Cassandra uses "read repair" to keep read operations consistent for the requested rows across all replica nodes. It uses two types of read request: direct reads and background read requests. Direct reads are governed by the read consistency level; background read requests go to all the other replicas that did not receive a direct read request. Read repair is optional and can be configured at the column-family level.
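The quorum and read-repair ideas can be combined in one sketch. Assuming N replicas, a write waits for W acknowledgements and a read consults R replicas; when R + W > N, every read set overlaps every write set, so the newest version is always seen. Here the read also pushes that newest version back to stale replicas it contacted, a simplified form of read repair. Versions are plain integers and the explicit `targets` lists are hypothetical; real systems use timestamps or vector clocks and choose replicas dynamically.

```python
from typing import List, Sequence, Tuple

Versioned = Tuple[int, str]  # (version, value)


class QuorumStore:
    def __init__(self, n: int, w: int, r: int) -> None:
        if r + w <= n:
            raise ValueError("R + W must exceed N for read/write overlap")
        self.w, self.r = w, r
        self.replicas: List[Versioned] = [(0, "")] * n

    def write(self, value: str, version: int, targets: Sequence[int]) -> None:
        # The write succeeds once W replicas have stored it.
        if len(targets) < self.w:
            raise RuntimeError("not enough replicas for a write quorum")
        for i in targets[: self.w]:
            self.replicas[i] = (version, value)

    def read(self, targets: Sequence[int]) -> str:
        if len(targets) < self.r:
            raise RuntimeError("not enough replicas for a read quorum")
        contacted = list(targets[: self.r])
        newest = max((self.replicas[i] for i in contacted), key=lambda vv: vv[0])
        # Read repair: bring stale contacted replicas up to date.
        for i in contacted:
            if self.replicas[i][0] < newest[0]:
                self.replicas[i] = newest
        return newest[1]


store = QuorumStore(n=3, w=2, r=2)
store.write("Wednesday", version=1, targets=[0, 1])
print(store.read(targets=[1, 2]))  # Wednesday (replica 1 overlaps the write)
print(store.replicas[2])           # (1, 'Wednesday') after read repair
```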
  • 42. Basic Implementation Concepts
    Hinted Handoff – Coordination, Consistency, and Failure Detection
    Hinted handoff is a Cassandra feature that optimizes cluster consistency and anti-entropy when a replica-owning node is unavailable (because of network issues or other problems) to accept a replica from a successful write operation. When a node is unavailable, the coordinator keeps the write as a hint in its local hint table and writes it to the node once the node becomes available again. Hinted handoff is used to handle transient failures.
    Vector Clocks – Conflict Detection
    Dynamo (Amazon) and Voldemort (LinkedIn) use vector clocks to determine the order of multi-version records in the system.
    Suppose Amit, Ajit, Atul, and Ankush are meeting somewhere. They initially agree to meet on Tuesday. Later, Amit and Ajit decide to move the meeting to Wednesday. Meanwhile, Ankush and Atul decide to hold it on Friday. Then Ajit and Ankush discuss whether it can happen on Thursday. Now everyone has a different version, and conflicts arise. They all want to settle the date, but not everyone is reachable at the same time. How can they resolve this conflict?
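The meeting story maps onto vector clocks as sketched below. Each version carries one counter per participant; one version supersedes another only if every counter is greater than or equal, otherwise the versions are concurrent and the conflict must be resolved (here, by a merged Thursday version). Mapping the people in the story to clock entries is a hypothetical modeling choice, not Dynamo's or Voldemort's code.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with this node's counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock of a version that has seen both."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if version a has seen everything version b has seen."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"  # a genuine conflict to resolve


tuesday = increment({}, "Amit")         # everyone starts from Tuesday
wednesday = increment(tuesday, "Ajit")  # Amit and Ajit move it
friday = increment(tuesday, "Ankush")   # Ankush and Atul, unaware of Wednesday
print(compare(wednesday, friday))       # concurrent

thursday = increment(merge(wednesday, friday), "Ajit")  # Ajit and Ankush reconcile
print(compare(thursday, wednesday), compare(thursday, friday))  # after after
```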
  • 43. Basic Implementation Concepts
    Consistent Hashing – Rebalancing
    Merkle Tree – Detecting Inconsistency
    Dynamo, Riak, and Cassandra use Merkle trees to minimize the amount of data that must be synchronized when replicas are inconsistent.
    Multi-Version Concurrency Control – Consistency and Conflict Resolution
    CouchDB uses Multi-Version Concurrency Control (MVCC) to avoid locking the database file during writes. Conflicts are left to the application to resolve at write time. Older document versions (called revisions) may be lost when the append-only database file is compacted.
    LSM Tree (Log-Structured Merge-Tree)
    The LSM tree is a technique for storing data that overcomes some of the pitfalls of the B-tree. LevelDB is probably its most popular implementation; other products such as Cassandra and HBase also use it. An LSM tree uses immutable storage segments. It also avoids read-before-write, reduces fragmentation, and avoids the B-tree "write cliff". For more information, see: http://www.xaprb.com/blog/2015/04/02/state-of-the-storage-engine/
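Consistent hashing is listed without an explanation, so here is a minimal sketch: nodes and keys are hashed onto the same ring, and each key belongs to the first node point clockwise from it. Each node is placed at several points (virtual nodes) to spread the load. When a node joins, only the keys on the arcs it takes over move, and they all move to the new node. MD5 and 100 virtual nodes per server are arbitrary choices for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise KeyError("the ring has no nodes")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["A", "B", "C"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("D")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(len(moved), {ring.get_node(k) for k in moved})  # roughly a quarter of the keys, all on D
```

With a plain `hash(key) % number_of_nodes` scheme, adding a fourth node would instead remap most keys, which is exactly the rebalancing cost consistent hashing avoids.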
  • 44. Useful Links
    http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
    http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
    Professional NoSQL by Shashank Tiwari
    NoSQL Databases by Christof Strauch
    Couchbase Server Under the Hood, from Couchbase Inc.
    http://www.slideshare.net/Muratakal/rdbms-vs-nosql-15797058
    http://www.nosql-database.org/
    http://www.youtube.com/watch?v=MmL9Lq6WbSY
    http://incubator.apache.org/thrift/
    http://www.youtube.com/watch?v=uMxZ4RI6sCQ
    http://planetcassandra.org/apache-cassandra-use-cases/
    http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
    Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson
    http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_hh_c.html
    http://www.slideshare.net/benoitperroud/nosql-overview-implementation-free
    http://java.dzone.com/articles/simple-magic-consistent
    https://github.com/impetus-opensource/Kundera
    http://newsql.sourceforge.net/
    http://cs.brown.edu/courses/cs227/archives/2012/papers/newsql/aslett-newsql.pdf
    http://thenewstack.io/databases-high-volume-transactions-scale-part-two/
  • 45. Useful Links (continued)
    http://www.oracle.com/us/products/database/big-data-for-enterprise-519135.pdf
    Managing Online Risk: Apps, Mobile, and Social Media Security by Deborah Gonzalez
    http://highscalability.com/blog/2014/3/31/how-whatsapp-grew-to-nearly-500-million-users-11000-cores-an.html
    http://db-engines.com/en/ranking_osvsc
    https://www.youtube.com/watch?v=jznJKL0CrxM
    https://highlyscalable.wordpress.com/2012/09/18/distributed-algorithms-in-nosql-databases/
    http://basho.com/why-vector-clocks-are-easy/
    http://cloudacademy.com/blog/how-to-handle-failures-in-dynamodb-an-inside-look-into-nosql-part-6/
    http://www.oracle.com/technetwork/database/database-technologies/nosqldb/documentation/nosql-vs-couchdb-1961720.pdf
    http://bsonspec.org/
    https://code.google.com/p/protobuf/
    https://thrift.apache.org/
    http://www.vertica.com/2014/04/18/facebook-and-vertica-a-case-for-mpp-databases/
    http://prod2.aerospike.com/wp-content/uploads/2014/02/snapdeal_casestudy.pdf
    http://www.slideshare.net/regunathbalasubramanian/oss-as-a-competitive-advantage
    http://www.thoughtworks.com/insights/blog/nosql-databases-overview
    http://techcrunch.com/2015/03/24/apple-acquires-durable-database-company-foundationdb/

Editor's Notes

  1. SDS is made up of three fields: buf, a character array that stores the string; len, a long that stores the length of the buf array; and free, the number of additional bytes available for use. Redis uses different techniques from MongoDB, which uses memory-mapped files.
  2. Slave nodes can be read-write or read-only.