NoSQL

NoSQL: Overview of the main features and approaches

This presentation has been developed in the context of the Databases course at the DISIM Department of the University of L’Aquila (Italy).

http://www.di.univaq.it/malavolta

  1. Ivano Malavolta, ivano.malavolta@univaq.it, http://www.di.univaq.it/malavolta
     DISIM - University of L’Aquila
  2. • Why, When, Who NOSQL (now)?
     • The CAP Theorem
     • NOSQL Approaches
     • Case Study 1: Instagram
     • Case Study 2: Twitter
     • Case Study 3: tumblr
     • Summary
     • References
  3. • ACID: Atomicity, Consistency, Isolation, Durability
     • Based on Relational Algebra: Select, Projection, Set Operators, Renaming, Joins
     • Concept of Schema
     • Standard query language (SQL)
  4. The term, in its current sense, was coined in 2009 by Eric Evans, a software developer at the Apache Software Foundation.
     • Class of non-relational data storage systems
     • They usually do not require a fixed schema
     • Many NoSQL offerings relax one or more of the ACID properties
  6. No to SQL? …we are not against SQL!
     Not only SQL: it’s about recognizing that for some problems other storage solutions are better suited! (http://goo.gl/gWIoy)
  7. Each NOSQL approach addresses some limitations of relational databases, like:
     • horizontal scalability and read/write performance (with an RDBMS, you have to reason about sharding and master-slave replicas)
     • schema limitations
     • difficult query patterns
     • parallel data processing
     • etc.
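Sharding means splitting the key space across several servers. A minimal sketch, assuming simple hash-modulo placement; `ShardedStore` and `shard_for` are hypothetical names, and each "shard" is just a Python dict standing in for a separate server:

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for str, so use MD5 instead.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy horizontally partitioned store: one dict per (simulated) server."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
print([len(shard) for shard in store.shards])  # how many keys each shard holds
```

With modulo placement, changing `num_shards` moves most keys to a different shard; schemes such as consistent hashing are commonly used to limit that reshuffling.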
  8. • Massive read/write performance: usually fast key-value access
     • High Availability
     • Data can be stored in multiple nodes: data can be partitioned
     • Helps in avoiding a single point of failure: fault tolerance
     http://goo.gl/DAxmN, http://goo.gl/PVpoh
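Storing each item on several nodes is what removes the single point of failure. A minimal sketch, assuming each key is copied to `replicas` consecutive nodes starting from a hash-chosen one; `Node` and `ReplicatedStore` are hypothetical, and repairing replicas that missed a write while down is not modelled:

```python
import hashlib


class Node:
    """A storage node; setting `alive = False` simulates a crash."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    def __init__(self, nodes, replicas: int = 3):
        self.nodes = nodes
        self.replicas = replicas

    def replica_nodes(self, key: str):
        # Start at a hash-chosen node and take the next `replicas` nodes, wrapping around.
        start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key: str, value) -> int:
        written = 0
        for node in self.replica_nodes(key):
            if node.alive:
                node.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError(f"no live replica for {key!r}")
        return written  # number of copies actually stored

    def get(self, key: str):
        # Any live replica holding the key can answer.
        for node in self.replica_nodes(key):
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node(f"n{i}") for i in range(5)]
store = ReplicatedStore(nodes, replicas=3)
store.put("cart:7", ["book"])
store.replica_nodes("cart:7")[0].alive = False  # crash one replica
print(store.get("cart:7"))  # ['book'], served by a surviving replica
```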
  9. • Flexible schema and data types: easy to develop the application layer (JSON, HTTP access, JS functions, etc.)
     • Ease of maintenance and administration: many vendors are spending a lot of effort on ease of use, minimal administration, and automated operations
     • Promotes parallel computing: work can be split across many nodes and processed concurrently (see Map-Reduce)
     http://goo.gl/PVpoh, http://goo.gl/DAxmN
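The Map-Reduce idea can be sketched in a few lines. This is a single-process illustration, not Hadoop's or any database's API: `map_phase`, `shuffle` and `reduce_phase` are hypothetical names, and a thread pool stands in for worker machines (in CPython it shows the structure of the computation, not a real speed-up):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(document: str):
    # Emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key.
    return key, sum(values)


def word_count(documents):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Map tasks are independent, so they can run in parallel.
        mapped = pool.map(map_phase, documents)
        grouped = shuffle(chain.from_iterable(mapped))
        # Reduce tasks are independent per key, so they can run in parallel too.
        return dict(pool.map(lambda item: reduce_phase(*item), grouped.items()))


docs = ["NoSQL is not only SQL", "SQL is still useful", "not only NoSQL"]
print(word_count(docs))
```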
  10. • Supporting large data sets with room to grow, thanks to partitioning, data structures and dedicated algorithms
      • Tunable for deployment size or functionality: can be used for medium to large datasets, both in terms of size and complexity
      • CHEAP (open-source)
      http://goo.gl/DAxmN, http://goo.gl/PVpoh
  11. What are we giving up? Some NOSQL approaches provide some (but not all) of the features listed here:
      • joins
      • group by
      • order by
      • indexes
      • ACID transactions
      • complex relationships
      • a powerful and standard query language (SQL)
      • data independence (mainly for data integrity)
      • maturity
      http://goo.gl/PVpoh
  12. Do you have a large set of uncontrolled, unstructured data somewhere that you are trying to fit into an RDBMS?
      – Storage of large amounts of non-transactional data (log analysis, web statistics, etc.)
      – Caching results from slower databases (see Twitter)
      – Denormalizing data from expensive join queries
      – Managing data that is not easily analyzed in an RDBMS, such as time- or location-based data
      – Real-time systems (games, financial data, chats, etc.)
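Caching results from a slower database is often done with the cache-aside pattern. A minimal sketch, assuming a hypothetical `SlowDatabase` and an in-process dict playing the role of the key-value cache, with a time-to-live on each entry (`CacheAside` is also a hypothetical name):

```python
import time


class SlowDatabase:
    """Hypothetical stand-in for a slow backing store such as an RDBMS."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        time.sleep(0.01)  # simulate query latency
        return self.rows.get(key)


class CacheAside:
    """Check the cache first, fall back to the database on a miss, then fill the cache."""

    def __init__(self, db, ttl_seconds: float = 60.0):
        self.db = db
        self.ttl = ttl_seconds
        self.cache = {}  # key -> (value, expiry time)

    def get(self, key):
        now = time.monotonic()
        entry = self.cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit
        value = self.db.fetch(key)  # cache miss: query the slow store
        self.cache[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        # Call after updating `key` in the database so readers do not see stale data.
        self.cache.pop(key, None)


db = SlowDatabase({"timeline:alice": ["t1", "t2"]})
cache = CacheAside(db)
cache.get("timeline:alice")
cache.get("timeline:alice")
print(db.queries)  # 1: the second read is served from the cache
```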
  13-21. Slides courtesy of Tobias Lindaaker, http://www.thobe.org/
  23. CAP Theorem, formulated by computer scientist Eric Brewer in 2000: it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
      • Consistency: each client always has the same view of the data
      • Availability: every received request must result in a response
      • Partition Tolerance: the system keeps working even though some messages between the nodes may be lost
  24. Demonstration...
  25. [Venn diagram: Consistency, Availability, Partition Tolerance; the pairwise overlaps are CA, CP and AP; the three-way overlap is empty (∅)]
      To scale out, you have to partition, so you have to choose between consistency and availability.
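The trade-off can be made concrete with a toy model of one value replicated on two nodes. A sketch under simplifying assumptions (`TwoReplicaRegister` is hypothetical; real systems use quorums, leases and more): during a partition, a CP configuration refuses writes, while an AP configuration accepts them and lets the replicas diverge.

```python
class TwoReplicaRegister:
    """Toy model: one value replicated on two nodes, "a" and "b"."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}
        self.partitioned = False  # True while the two nodes cannot exchange messages

    def write(self, node: str, value) -> bool:
        if not self.partitioned:
            self.values = {"a": value, "b": value}  # replicate synchronously to both nodes
            return True
        if self.mode == "CP":
            return False  # refuse the write: stay consistent, give up availability
        self.values[node] = value  # AP: accept locally; the replicas now disagree
        return True

    def read(self, node: str):
        return self.values[node]


for mode in ("CP", "AP"):
    reg = TwoReplicaRegister(mode)
    reg.write("a", 1)
    reg.partitioned = True  # the network splits
    accepted = reg.write("a", 2)
    print(mode, "write accepted:", accepted, "| a =", reg.read("a"), "b =", reg.read("b"))
# CP write accepted: False | a = 1 b = 1
# AP write accepted: True | a = 2 b = 1
```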
  26. BASE = Basically Available, Soft state, Eventual consistency: a consistency model weaker than ACID (Atomicity, Consistency, Isolation, Durability)
      • Basically Available: if a node fails, part of the data will not be available, but the entire data layer stays operational
      • Soft state: the state of the system may change over time, even without input
      • Eventual consistency: the system becomes consistent at some later time
      http://queue.acm.org/detail.cfm?id=1394128
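Eventual consistency can be illustrated with replicas that accept writes independently and later exchange state. A minimal sketch using last-writer-wins merging, assuming unique timestamps (real systems also need a tie-breaker and may use vector clocks instead); `LwwReplica` is a hypothetical name:

```python
from dataclasses import dataclass, field


@dataclass
class LwwReplica:
    """A replica storing key -> (timestamp, value); on conflict the newest write wins."""

    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp: int) -> None:
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other: "LwwReplica") -> None:
        # Anti-entropy step: adopt every entry that is newer on the other replica.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


r1, r2 = LwwReplica(), LwwReplica()
r1.write("status", "online", timestamp=1)  # accepted by one replica...
r2.write("status", "away", timestamp=2)    # ...while another accepts a newer write
print(r1.read("status"), r2.read("status"))  # online away  (soft state: replicas disagree)
r1.merge_from(r2)
r2.merge_from(r1)
print(r1.read("status"), r2.read("status"))  # away away  (replicas have converged)
```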
  27. BASE example
  29. Four genres of NOSQL databases: Key-value, Columnar, Document, Graph
  30. Implementations: Riak, Redis, Voldemort, Dynamo
      • Here the focus is on SCALABILITY: designed to handle massive load
      • Stores a collection of key-value pairs: think about maps (or associative arrays) in classical programming languages
      • KEY = string; VALUE = any kind of element, such as strings, videos, XML files, etc.
      • Key namespaces to avoid collisions
      http://goo.gl/LfG1N
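A key-value store behaves much like a map. A minimal in-memory sketch, assuming namespaces are implemented as part of the key (a `(namespace, key)` tuple here; many stores use a string-prefix convention such as `user:1` instead); `NamespacedKVStore` is hypothetical:

```python
class NamespacedKVStore:
    """In-memory key-value store: the store never looks inside values."""

    def __init__(self):
        self._data = {}

    def put(self, namespace: str, key: str, value) -> None:
        # The namespace is part of the key, so ("user", "1") and ("order", "1") never collide.
        self._data[(namespace, key)] = value

    def get(self, namespace: str, key: str, default=None):
        return self._data.get((namespace, key), default)

    def delete(self, namespace: str, key: str) -> None:
        self._data.pop((namespace, key), None)


kv = NamespacedKVStore()
kv.put("user", "1", {"name": "Alice"})
kv.put("order", "1", b"<order><item>book</item></order>")  # values are opaque, e.g. XML bytes
print(kv.get("user", "1"))  # {'name': 'Alice'}
```

Because values are opaque, every lookup goes through the key; this is why complex queries appear among the CONS of key-value stores.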
  31. PROS
      • easy to use
      • extreme performance
      • no need to maintain indices
      • scales horizontally to large data sets
      CONS
      • no complex queries (no SQL)
      • no transactions (actually, Redis has transactions)
      • many data structures cannot be easily modeled as key-value pairs
      • data must fit in memory (for in-memory stores such as Redis)
      http://goo.gl/PGfjU
  32. • Stock prices
      • Analytics
      • Real-time data collection
      • Real-time communication
      • User sessions storage
      • Caching data from other DBs
      SEE CASE STUDIES LATER IN THIS LECTURE
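User session storage typically relies on keys that expire. A minimal sketch, assuming lazy expiry on read and an injectable clock so the example is deterministic; `SessionStore` is hypothetical (stores such as Redis provide key expiry natively):

```python
import time
import uuid


class SessionStore:
    """Key-value session storage where every entry has a time-to-live (TTL)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the example below is deterministic
        self._data = {}  # session id -> (expires_at, payload)

    def create(self, payload: dict) -> str:
        session_id = uuid.uuid4().hex
        self._data[session_id] = (self.clock() + self.ttl, payload)
        return session_id

    def get(self, session_id: str):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # lazily evict expired sessions on access
            return None
        return payload


now = [0.0]  # fake clock value, in seconds
store = SessionStore(ttl_seconds=30, clock=lambda: now[0])
sid = store.create({"user": "alice"})
print(store.get(sid))  # {'user': 'alice'}
now[0] = 31.0
print(store.get(sid))  # None: the session has expired
```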
  33. Implementations: HBase, BigTable, Cassandra, Vertica
      • Midway between relational and KV stores
      • Values are queried by matching keys; as in relational DBs, values are groups of zero or more columns
      • Differently from relational DBs, data from a given column is stored together, so adding columns is quite inexpensive
      • Each row can have a different set of columns, or none at all: this allows tables to remain sparse without additional storage cost for null values
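A simplified sketch of the column-oriented idea, assuming one in-memory map per column; `ColumnarTable` is hypothetical, and real systems add on-disk layouts, compression and much more. It shows why sparse rows cost nothing extra and why adding a column is cheap:

```python
from collections import defaultdict


class ColumnarTable:
    """Column-oriented layout: each column maps row_key -> value."""

    def __init__(self):
        self.columns = defaultdict(dict)  # column name -> {row_key: value}

    def put(self, row_key: str, **cells) -> None:
        # Only the cells actually supplied are stored; missing columns cost nothing.
        for column, value in cells.items():
            self.columns[column][row_key] = value

    def get_row(self, row_key: str) -> dict:
        # Reassemble a row from whichever columns hold a value for it.
        return {col: values[row_key] for col, values in self.columns.items() if row_key in values}

    def scan_column(self, column: str) -> dict:
        # A column's values are kept together, so a scan touches no other column.
        return dict(self.columns.get(column, {}))


t = ColumnarTable()
t.put("alice", name="Alice", email="alice@example.org")
t.put("bob", name="Bob")  # no email for bob: nothing is stored
t.put("carol", name="Carol", twitter="@carol")  # a brand-new column is just a new dict
print(t.get_row("bob"))  # {'name': 'Bob'}
print(t.scan_column("name"))  # {'alice': 'Alice', 'bob': 'Bob', 'carol': 'Carol'}
```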
  34. PROS
      • Easy to distribute tasks
      • Solves ‘Big Data’ issues
      • High availability
      • Garbage collection for expired data
      • Scanning is very easy
      CONS
      • De-normalization
      • Expensive to insert
      • Requires heavy pre-planning of queries
  35. • Search engines
      • Logging
      • Analysing log data
      • When you need to scan huge, two-dimensional, join-less tables
      • Banking (consistency enforcement)
      • Many implementations provide versioning facilities
      • In Cassandra, writing is faster than reading values (!)
      SEE CASE STUDIES LATER IN THIS LECTURE
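Versioning facilities usually mean that a cell keeps several timestamped values. A minimal sketch, assuming a fixed maximum number of versions per cell with the oldest evicted first; `VersionedCell` is hypothetical, and the eviction step also hints at the "garbage collection for expired data" mentioned among the PROS:

```python
class VersionedCell:
    """Keeps up to `max_versions` timestamped values for a single cell."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        self.versions = {}  # timestamp -> value

    def put(self, timestamp: int, value) -> None:
        self.versions[timestamp] = value
        if len(self.versions) > self.max_versions:
            del self.versions[min(self.versions)]  # evict the oldest version

    def get(self, as_of=None):
        # Latest value, or the latest one written at or before `as_of`.
        candidates = [ts for ts in self.versions if as_of is None or ts <= as_of]
        return self.versions[max(candidates)] if candidates else None


cell = VersionedCell(max_versions=2)
cell.put(100, "v1")
cell.put(200, "v2")
cell.put(300, "v3")  # exceeds max_versions, so "v1" is evicted
print(cell.get())  # v3
print(cell.get(as_of=250))  # v2
print(cell.get(as_of=150))  # None: the version at 100 was evicted
```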
  36. Implementations: MongoDB, CouchDB, RavenDB
      • Super-set of key-value DBs: you can also query on the value part, since the document portion is structured
      • Think about documents as tuples with any number of fields (JSON)
      • Documents can contain nested structures
      • Documents are often versioned
      • Different document databases take different approaches to indexing, querying, replication, consistency, etc.: choose wisely!
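Querying on the value part is what separates a document store from a plain key-value store. A minimal sketch over in-memory dicts, assuming equality matches on dotted paths (in the spirit of the dot notation that MongoDB, for one, uses for nested fields); `find` and `get_path` are hypothetical helpers, not any product's API:

```python
def get_path(doc: dict, path: str):
    """Follow a dotted path such as 'address.city' into nested dicts; None if absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, criteria: dict):
    """Return the documents whose fields equal every value in `criteria`."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == value for path, value in criteria.items())
    ]


users = [
    {"_id": 1, "name": "Alice", "address": {"city": "L'Aquila"}, "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Rome"}},
    {"_id": 3, "name": "Carol"},  # documents need not share the same fields
]
print([doc["name"] for doc in find(users, {"address.city": "Rome"})])  # ['Bob']
```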
  37. PROS
      • Variable data
      • Object-oriented paradigms
      • Concurrency
      • Works well with de-normalized data
      CONS
      • Hard to do complex queries
      • No joins
      • Enforcing structured data
  38. • When you don’t know in advance exactly what your data will look like
      • They map well to object-oriented programming models
      • For accumulating, occasionally changing data, on which pre-defined queries are to be run
      • Places where versioning is important
      • Services that handle age difference, geographic location, tastes and dislikes, etc.
      • A leaderboard system that depends on many variables
      SEE CASE STUDIES LATER IN THIS LECTURE
  39. Implementations: Neo4J, OrientDB, FlockDB, Trinity
      • Focus on modeling the structure of data & interconnectivity
      • Inspired by mathematical Graph Theory (G = (V, E))
      • Data model is the Property Graph:
        – Entities are nodes
        – Relationships are edges between nodes
        – Key-value pairs on both
      [figure: example graph, labels A-E and a-e]
      • Excels in dealing with highly interconnected data: relational DBs can model graphs, but an edge requires a join, which is expensive
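The property graph model and the join-free traversal it enables can be sketched as follows. `PropertyGraph` and `within_hops` are hypothetical names, the adjacency-list layout is an assumption rather than how any particular graph database stores data, and the people and years in the example are made up:

```python
from collections import deque


class PropertyGraph:
    """Nodes and edges both carry key-value properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # node id -> list of (target id, relationship type, properties)

    def add_node(self, node_id: str, **props) -> None:
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, source: str, target: str, rel_type: str, **props) -> None:
        self.edges.setdefault(source, []).append((target, rel_type, props))

    def neighbours(self, node_id: str, rel_type=None):
        return [t for t, r, _ in self.edges.get(node_id, []) if rel_type is None or r == rel_type]

    def within_hops(self, start: str, max_hops: int, rel_type=None) -> dict:
        # Breadth-first traversal: each hop follows stored edges directly, with no join.
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_hops:
                continue
            for nxt in self.neighbours(node, rel_type):
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        return {n: d for n, d in depth.items() if n != start}


g = PropertyGraph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, kind="person")
g.add_edge("alice", "bob", "FOLLOWS", since=2011)
g.add_edge("bob", "carol", "FOLLOWS", since=2012)
g.add_edge("carol", "dave", "FOLLOWS", since=2013)
print(g.within_hops("alice", 2, "FOLLOWS"))  # {'bob': 1, 'carol': 2}
```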
  41. PROS
      • Easy match with the problem domain: with relational DBs, you have to create an ER diagram, then normalize, etc.
      • Ability to quickly traverse nodes and relationships to find relevant data: e.g., you can apply Dijkstra’s algorithm when querying the DB
      • Fit well with object-oriented concepts
      • Neo4J has full ACID conformity
      CONS
      • Generally not suitable for network partitioning (i.e., splitting a graph across machines), due to the high interconnectedness
      • No joins
      • Enforcing structured data
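As an illustration of the traversal point above, here is Dijkstra's shortest-path algorithm over an adjacency list with a binary heap. It is a plain in-memory sketch, not a graph database query API; the `roads` graph and its weights are made up:

```python
import heapq


def dijkstra(graph: dict, source: str) -> dict:
    """Shortest distances from `source`; graph maps node -> list of (neighbour, weight >= 0)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to `node` was already found
        for nxt, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return dist


# Made-up road map: node -> [(neighbour, distance in km)].
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```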
  42. • Social networks
      • Recommendation engines
      • Geographic data
      • Public transport links
      • Road maps
      • Network topologies
      SEE CASE STUDIES LATER IN THIS LECTURE
  45. http://goo.gl/0JoW8
  47. http://goo.gl/xpPac
  48. relational; key-value (in the cloud); key-value
      http://goo.gl/mkfQN, http://goo.gl/xpPac
  50. columnar; graph; key-value; plus Blobstore!
      http://goo.gl/2kdvm
  51. http://goo.gl/CrC0P
  52. relational; columnar; key-value
      http://goo.gl/CrC0P
  54. SCALABILITY - SCALABILITY - SCALABILITY, both in terms of size and complexity...
      ...usually at the cost of consistency
      • NOSQL is not the silver bullet for everything
      • Polyglot data is the new main trend...
      ...in 10 years, the majority of IT solutions will still be based on RDBMS
  56. Simply drop a line to ivano.malavolta@univaq.it
  57. http://nosql-database.org/
      http://goo.gl/ThO63
      Check out my blog for these slides: www.ivanomalavolta.com
      Chapters 1 and 9
  58. • Neo4j - http://neo4j.org
      • OrientDB - http://www.orientdb.org
      • VoltDB - http://www.voltdb.com
      • CouchDB - http://couchdb.apache.org
      • Cassandra - http://cassandra.apache.org
      • Riak - http://www.basho.com
      • HBase - http://hbase.apache.org
      • MongoDB - http://www.mongodb.org
      • Redis - http://code.google.com/p/redis
      • Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db
      • FlockDB - http://github.com/twitter/flockdb