NoSQL: Overview of the main features and approaches
This presentation has been developed in the context of the Databases course at the DISIM Department of the University of L’Aquila (Italy).
http://www.di.univaq.it/malavolta
NoSQL
1. Ivano Malavolta
ivano.malavolta@univaq.it
http://www.di.univaq.it/malavolta
DISIM - University of L’Aquila
2. Why, When, Who NOSQL (now)?
The CAP Theorem
NOSQL Approaches
Case Study 1: Instagram
Case Study 2: Twitter
Case Study 3: tumblr
Summary
References
3. ACID
Atomicity
Consistency
Isolation
Durability
Based on Relational Algebra
Select, Projection, Set Operators, Renaming, Joins
Concept of Schema
Standard
4. The term was reintroduced in 2009 by Eric Evans,
as the name of a meetup on open-source, distributed, non-relational databases
Class of non-relational data storage systems
Usually do not require a fixed schema
Many NoSQL offerings relax one or more of the ACID properties
6. NoSQL does not mean "No to SQL"
…we are not against SQL!
It means "Not only SQL"
It’s about recognizing that for some problems other storage solutions are better suited!
http://goo.gl/gWIoy
7. Each NOSQL approach addresses some limitations of relational databases, like:
• horizontal scalability
• read/write performance (reason about sharding and master-slave replicas)
• schema limitations
• difficult query patterns
• parallel data processing
• etc.
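To make the sharding remark concrete, here is a minimal sketch that routes every key to one of a fixed number of shards by hashing it. The shard count, the `shard_for` helper and the use of plain dicts as stand-ins for database nodes are assumptions made only for illustration.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of nodes, chosen only for this sketch


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Each shard is a plain dict standing in for one database node.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key: str, value) -> None:
    """Store the value on the shard that owns the key."""
    shards[shard_for(key)][key] = value


def get(key: str):
    """Read the value from the shard that owns the key (None if absent)."""
    return shards[shard_for(key)].get(key)


put("user:1", {"name": "alice"})
put("user:2", {"name": "bob"})
```

Modulo hashing is the simplest possible routing rule: changing the number of shards moves most keys, which is why real systems tend to use schemes such as consistent hashing instead.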
8. Massive read/write performance
usually fast key-value access
High Availability
Data can be stored in multiple nodes (data can be partitioned)
Helps in avoiding a single point of failure (fault-tolerance)
http://goo.gl/DAxmN
http://goo.gl/PVpoh
9. Flexible schema and data types
easy to develop the application layer
(JSON, HTTP access, JS functions, etc.)
Ease of maintenance, administration
many vendors are spending a lot of effort on ease of use,
minimal administration, and automated operations
Promotes parallel computing
tremendously performant!
see Map-Reduce
http://goo.gl/PVpoh http://goo.gl/DAxmN
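The Map-Reduce idea mentioned on this slide can be sketched in a few lines as a word count. This is a single-process toy, assuming hypothetical `map_phase`, `shuffle` and `reduce_phase` helpers; in a real deployment the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["NoSQL is not only SQL", "SQL is not dead"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
word_counts = reduce_phase(shuffle(mapped))
```

Each `map_phase` call only needs its own document, and each key can be reduced independently, which is what makes the model easy to parallelize.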
10. Supporting large data sets with room to grow
thanks to partitioning, data structures and dedicated
algorithms
Tunable for deployment size or functionality
can be used for medium to large datasets, in
terms of both size and complexity
CHEAP (open-source)
http://goo.gl/DAxmN
http://goo.gl/PVpoh
11. What are we giving up?
(some NOSQL approaches provide some, but not all, of the features listed here)
• joins
• group by
• order by
• indexes
• ACID transactions
• complex relationships
• powerful and standard query language (SQL)
• data independence (mainly for data integrity)
• maturity
http://goo.gl/PVpoh
12. Do you have a large set of
uncontrolled, unstructured data
that you are trying to fit into an RDBMS?
– Storage of large amounts of non-transactional data
• log analysis, web statistics, etc.
– Caching results from slower databases (see Twitter)
– Data denormalization of expensive join queries
– Managing data that is not easily analyzed in an RDBMS, such
as time- or location-based data
– Real-time systems
• games, financial data, chats, etc.
13-21. Slides courtesy of Tobias Lindaaker http://www.thobe.org/
23. CAP Theorem
formulated by computer scientist Eric Brewer in 2000
It is impossible for a distributed computer system to
simultaneously provide all three of the following guarantees:
• Consistency: every client always has the same view of the data
• Availability: every request received by a working node must result in a response
• Partition Tolerance: the system keeps operating, even though some
messages between the nodes may be lost
25. [Figure: Venn diagram of Consistency, Availability and Partition Tolerance; the pairwise overlaps are CA, CP and AP, and the overlap of all three is empty (∅)]
To scale out, you have to partition
you have to choose between consistency and availability
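The choice between consistency and availability can be illustrated with a toy cluster of two replicas. The `Cluster` class, its `mode` flag and the synchronous replication are simplifying assumptions for this sketch, not a description of any real product.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas; `mode` is "CP" or "AP" (a simplification for this sketch)."""

    def __init__(self, mode: str):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica: Replica, key, value) -> bool:
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                return False  # refuse the write: consistent but unavailable
            replica.data[key] = value  # AP: accept locally, replicas diverge
            return True
        replica.data[key] = value
        other.data[key] = value  # no partition: replicate synchronously
        return True


cp, ap = Cluster("CP"), Cluster("AP")
for cluster in (cp, ap):
    cluster.write(cluster.a, "x", 1)   # replicated to both nodes
    cluster.partitioned = True         # the network link between them breaks

cp_ok = cp.write(cp.a, "x", 2)  # rejected, both replicas still agree on x = 1
ap_ok = ap.write(ap.a, "x", 2)  # accepted, replica b still holds x = 1
```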
26. Consistency model weaker than ACID (Atomicity, Consistency, Isolation, Durability)
BASE = Basically Available, Soft state, Eventual consistency
• Basically Available: if a node fails, part of the data will not be available, but the entire data layer stays operational
• Soft state: the state of the system may change over time, even without input
• Eventual consistency: the system becomes consistent at some later time
http://queue.acm.org/detail.cfm?id=1394128
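A minimal sketch of eventual consistency, assuming a hypothetical `Node` class that tags every value with a version number and reconciles with a last-write-wins rule. Real systems use richer mechanisms (vector clocks, read repair, and so on); this only shows a stale read followed by convergence.

```python
class Node:
    def __init__(self):
        self.store = {}  # key -> (version, value)

    def write(self, key, value, version: int) -> None:
        self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def sync_from(self, other: "Node") -> None:
        """Anti-entropy step: keep the entry with the highest version."""
        for key, (version, value) in other.store.items():
            if key not in self.store or self.store[key][0] < version:
                self.store[key] = (version, value)


n1, n2 = Node(), Node()
n1.write("likes", 10, version=1)
n2.sync_from(n1)
n1.write("likes", 11, version=2)  # n2 has not seen this write yet
stale = n2.read("likes")          # n2 still answers with the old value
n2.sync_from(n1)                  # later, the replicas exchange state
n1.sync_from(n2)
```

After the two `sync_from` calls both nodes return the same value, which is the "consistent at some later time" part of BASE.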
29. Four genres of NOSQL databases:
• Key-value
• Columnar
• Document
• Graph
30. Key-value stores
Implementations: Riak, Redis, Voldemort, Dynamo
Here the focus is on SCALABILITY
designed to handle massive load
stores a collection of Key-Value pairs
think about maps (or associative arrays) in classical programming languages
KEY = string value
VALUE = any kind of element such as strings, videos, XML files, etc.
Key namespaces are used to avoid collisions
http://goo.gl/LfG1N
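The model above maps directly onto a dictionary. The sketch below is an in-memory toy, assuming a hypothetical `KeyValueStore` class that prefixes every key with its namespace (a common `namespace:key` convention); it is not the API of Riak, Redis or any other product.

```python
class KeyValueStore:
    """In-memory key-value store; keys are strings, values are opaque."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _full_key(namespace: str, key: str) -> str:
        # Assumes namespaces never contain ":" themselves.
        return f"{namespace}:{key}"

    def put(self, namespace: str, key: str, value) -> None:
        self._data[self._full_key(namespace, key)] = value

    def get(self, namespace: str, key: str, default=None):
        return self._data.get(self._full_key(namespace, key), default)

    def delete(self, namespace: str, key: str) -> None:
        self._data.pop(self._full_key(namespace, key), None)


store = KeyValueStore()
# The same key "42" lives in two namespaces without colliding.
store.put("session", "42", {"user": "alice"})
store.put("avatar", "42", b"\x89PNG")
```

The store never looks inside a value, so the only possible lookup is by key.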
31. PROS
• easy to use
• extreme performance
• no need to maintain indices
• scales horizontally to large data sets
CONS
• no complex queries (no SQL)
• no transactions
– actually Redis has transactions
• many data structures cannot be easily modeled as key-value pairs
• for in-memory stores such as Redis, the data set must fit in memory
http://goo.gl/PGfjU
32. • Stock prices
• Analytics
• Real-time data collection
• Real-time communication
• User sessions storage
• Caching Data from other DBs
SEE CASE STUDIES LATER IN THIS LECTURE
33. Columnar stores
Implementations: HBase, BigTable, Cassandra, Vertica
Midway between relational and KV stores
Values are queried by matching keys
Like relational DBs, their values are groups of zero or more columns
Unlike relational DBs, data from a given column is
stored together
adding columns is quite inexpensive
Each row can have a different set of columns, or none at all
this allows tables to remain sparse without additional storage cost for null values
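The "stored together" and "sparse" points can be sketched with one dictionary per column. The `ColumnStore` class is a toy model of the idea on this slide, assumed for illustration; it is not how HBase or Cassandra actually lay out data on disk.

```python
from collections import defaultdict


class ColumnStore:
    """Toy column-oriented table: each column maps row key -> value."""

    def __init__(self):
        self.columns = defaultdict(dict)

    def put(self, row_key: str, column: str, value) -> None:
        # Writing to a new column name creates the column on the fly.
        self.columns[column][row_key] = value

    def get_row(self, row_key: str) -> dict:
        """Rebuild a row from the columns that hold a value for it."""
        return {
            name: cells[row_key]
            for name, cells in self.columns.items()
            if row_key in cells
        }

    def scan_column(self, column: str) -> dict:
        """Read one column without touching any other column."""
        return dict(self.columns.get(column, {}))


table = ColumnStore()
table.put("u1", "name", "alice")
table.put("u1", "email", "alice@example.org")
table.put("u2", "name", "bob")  # u2 has no email: nothing is stored for it
```

Missing cells cost nothing because they are simply absent from the column's dictionary, and scanning `name` never reads the `email` values.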
34. PROS
• Easy to Distribute Tasks
• Solving ‘Big Data’ issues
• High Availability
• Garbage collection for expired data
• Scanning is very easy
CONS
• De-normalization
• Expensive to insert
• Requires heavy pre-planning of queries
35. • Search engines
• Logging
• Analysing log data
• When you need to scan huge, two-dimensional, join-less tables
• Banking (consistency enforcement)
• Many implementations provide versioning facilities
• in Cassandra writing is faster than reading values (!)
SEE CASE STUDIES LATER IN THIS LECTURE
36. Document stores
Implementations: MongoDB, CouchDB, RavenDB
A superset of key-value DBs: you can also query on the value part
the document portion is structured
Think of documents as tuples with any number of fields (JSON)
Documents can contain nested structures
Documents are often versioned
Different document databases take different approaches for
indexing, querying, replication, consistency, etc.
choose wisely!
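What "query also on the value part" means can be sketched with nested dictionaries standing in for JSON documents. The `DocumentStore` class and its dotted-path `find` method are hypothetical and only illustrate the idea; they are not the query API of MongoDB, CouchDB or RavenDB.

```python
class DocumentStore:
    """Toy document store: id -> JSON-like dict, queryable by field."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id: str, document: dict) -> None:
        self._docs[doc_id] = document

    def get(self, doc_id: str):
        return self._docs.get(doc_id)

    @staticmethod
    def _lookup(document: dict, path: str):
        """Follow a dotted path such as "address.city" into nested dicts."""
        current = document
        for part in path.split("."):
            if not isinstance(current, dict) or part not in current:
                return None
            current = current[part]
        return current

    def find(self, path: str, expected) -> list:
        """Return the ids of documents whose field at `path` equals `expected`."""
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if self._lookup(doc, path) == expected
        ]


db = DocumentStore()
# Documents in the same store do not need to share a schema.
db.insert("p1", {"name": "alice", "address": {"city": "L'Aquila"}})
db.insert("p2", {"name": "bob", "tags": ["nosql"]})
db.insert("p3", {"name": "carol", "address": {"city": "Rome"}})
```

A plain key-value store could only answer `get("p1")`; here `find("address.city", "Rome")` inspects the structured value itself.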
37. PROS
• Variable data
• Object Oriented Paradigms
• Concurrency
• Works well with de-normalized data
CONS
• Hard to do complex queries
• No Joins
• Enforcing Structured Data
38. • When you don’t know in advance what exactly your data will
look like
• They map well to object-oriented programming models
• For accumulating, occasionally changing data, on which
predefined queries are to be run
• Places where versioning is important
• Services that handle age difference, geographic location,
tastes and dislikes, etc.
• A leaderboard system that depends on many variables
SEE CASE STUDIES LATER IN THIS LECTURE
39. Graph databases
Implementations: Neo4J, OrientDB, FlockDB, Trinity
Focus on modeling the structure of data & interconnectivity
Inspired by mathematical Graph Theory ( G=(V,E) )
Data model is the Property Graph:
• Entities are nodes
• Relationships are edges between nodes
• Key-Value pairs on both
[Figure: example property graph with nodes A to E and edges a to e]
Excels in dealing with highly interconnected data
Relational DBs can model graphs, but an edge requires a join, which is expensive
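The property graph model can be sketched as follows, with key-value properties on both nodes and edges. The `PropertyGraph` class and the sample data are assumptions made for illustration; they do not mirror the API of Neo4J or the other products listed.

```python
from collections import deque


class PropertyGraph:
    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (source, target, properties)

    def add_node(self, node_id: str, **properties) -> None:
        self.nodes[node_id] = properties

    def add_edge(self, source: str, target: str, **properties) -> None:
        self.edges.append((source, target, properties))

    def neighbours(self, node_id: str, label=None) -> list:
        """Targets of outgoing edges, optionally filtered by edge label."""
        return [
            target
            for source, target, props in self.edges
            if source == node_id and (label is None or props.get("label") == label)
        ]

    def reachable(self, start: str) -> set:
        """Breadth-first traversal along outgoing edges."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in self.neighbours(queue.popleft()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}


g = PropertyGraph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, kind="person")
g.add_edge("alice", "bob", label="follows", since=2011)
g.add_edge("bob", "carol", label="follows", since=2012)
g.add_edge("dave", "alice", label="blocks")
```

Finding everyone reachable from a node is a traversal here, whereas a relational model would need one join per hop.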
41. PROS
• Easy match with the problem domain
– with relational, you have to create an ER diagram, then normalize, etc.
• ability to quickly traverse nodes and relationships to find relevant
data
– e.g., you can apply Dijkstra's algorithm when querying the DB
• Fit well with object-oriented concepts
• Neo4J has full ACID conformity
CONS
• generally not suitable for network partitioning
– due to the high interconnectedness
• No Joins
• Enforcing Structured Data
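Since Dijkstra's algorithm is mentioned above, here is a minimal sketch of it over a weighted adjacency map. The `dijkstra` function and the `roads` sample graph are hypothetical and run on plain dictionaries; a graph database would expose shortest-path search through its own query interface.

```python
import heapq


def dijkstra(graph: dict, source: str) -> dict:
    """Shortest distance from `source` to every reachable node.

    `graph` maps a node to a dict {neighbour: non-negative edge weight}.
    """
    distances = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return distances


# Hypothetical road map: edge weights are distances in km.
roads = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 7},
    "B": {"D": 3},
    "D": {},
}
shortest = dijkstra(roads, "A")
```

In this sample the direct road from A to B (4 km) loses to the detour through C (1 + 2 km), which the algorithm discovers when it relaxes the edges of C.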
42. • Social networks
• Recommendation engines
• Geographic data
• Public transport links
• Road maps
• Network topologies
SEE CASE STUDIES LATER IN THIS LECTURE
52. [Figure labels: relational, columnar, key-value]
http://goo.gl/CrC0P
54. SCALABILITY - SCALABILITY - SCALABILITY
with respect to both size and complexity
...usually at the cost of consistency
NOSQL is not a silver bullet for everything
Polyglot persistence (mixing several kinds of data stores) is the new main trend...
...in 10 years the majority of IT solutions will still be based
on RDBMS
56. simply drop a line to
ivano.malavolta@univaq.it
57. http://nosql-database.org/
http://goo.gl/ThO63
check out my blog for these slides
www.ivanomalavolta.com
Chapters 1 and 9