NoSQL
NoSQL: Overview of the main features and approaches

This presentation has been developed in the context of the Databases course at the DISIM Department of the University of L’Aquila (Italy).

http://www.di.univaq.it/malavolta

Transcript

  • 1. Ivano Malavolta, ivano.malavolta@univaq.it, http://www.di.univaq.it/malavolta, DISIM - University of L’Aquila
  • 2.
      • Why, When, Who NOSQL (now)?
      • The CAP Theorem
      • NOSQL Approaches
      • Case Study 1: Instagram
      • Case Study 2: Twitter
      • Case Study 3: tumblr
      • Summary
      • References
  • 3.
      • ACID: Atomicity, Consistency, Isolation, Durability
      • Based on Relational Algebra: Select, Projection, Set Operators, Renaming, Joins
      • Concept of Schema
      • Standard
  • 4.
      • The term took on its current meaning in 2009, when Eric Evans, an Apache Cassandra committer, suggested it as the name for a meetup on non-relational databases (Carlo Strozzi had used it in 1998 for an unrelated relational database)
      • Class of non-relational data storage systems
      • Usually do not require a fixed schema
      • Many NoSQL offerings relax one or more of the ACID properties
  • 6.
      • "No to SQL"? We are not against SQL!
      • "Not only SQL": it’s about recognizing that for some problems other storage solutions are better suited
      • http://goo.gl/gWIoy
  • 7. Each NOSQL approach addresses some limitations of relational databases, such as:
      • horizontal scalability
      • read/write performance
      • schema limitations
      • difficult query patterns
      • parallel data processing
      • etc.
      (side note on the slide: reason about sharding and master-slave replicas)
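The sharding and master-slave replication mentioned on slide 7 can be illustrated with a small sketch. This is a toy in-memory model, not how any particular product works: the class name `ShardedStore`, the SHA-256 based routing and the synchronous copy to a single replica are all assumptions made for illustration.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over several shards (hypothetical)."""

    def __init__(self, num_shards):
        # Each shard has one master dict and one replica dict.
        self.masters = [{} for _ in range(num_shards)]
        self.replicas = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash keeps the key-to-shard mapping identical across runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.masters)

    def put(self, key, value):
        shard = self.shard_for(key)
        self.masters[shard][key] = value   # writes go to the master
        self.replicas[shard][key] = value  # copied synchronously, for simplicity

    def get(self, key):
        # Reads are served by the replica to offload the master.
        return self.replicas[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for user_id in ("alice", "bob", "carol", "dave"):
    store.put(user_id, {"name": user_id.title()})

print(store.get("carol"))
print([len(master) for master in store.masters])  # keys per shard
```

Adding capacity here means adding shards, which is the horizontal scalability the slide refers to; a real system would also have to move keys when the number of shards changes.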
  • 8.
      • Massive read/write performance: usually fast key-value access
      • High availability
      • Data can be stored in multiple nodes: data can be partitioned
      • Helps in avoiding a single point of failure: fault tolerance
      • http://goo.gl/DAxmN, http://goo.gl/PVpoh
  • 9.
      • Flexible schema and data types: easy to develop the application layer (JSON, HTTP access, JS functions, etc.)
      • Ease of maintenance and administration: many vendors are spending a lot of effort on ease of use, minimal administration, and automated operations
      • Promotes parallel computing: can be very fast (see Map-Reduce)
      • http://goo.gl/PVpoh, http://goo.gl/DAxmN
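Slide 9 points to Map-Reduce as the way parallel computing is promoted. A minimal word-count sketch of the idea follows; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a thread pool stands in for the cluster of machines a real framework would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for word, count in chunk:
            groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    # Reduce: combine the values of one key into a single result.
    return word, sum(counts)


def word_count(lines):
    # Each line is mapped independently, so the map step can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, lines))
    groups = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in groups.items())


print(word_count(["NoSQL is not only SQL", "SQL is not dead"]))
```

Because the map calls share no state, the same structure works when the lines live on different nodes, which is why the model fits partitioned data stores.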
  • 10.
      • Supporting large data sets with room to grow: thanks to partitioning, data structures and dedicated algorithms
      • Tunable for deployment size or functionality: can be used for medium to large datasets, in terms of both size and complexity
      • CHEAP (open-source)
      • http://goo.gl/DAxmN, http://goo.gl/PVpoh
  • 11. What are we giving up? (Some NOSQL approaches provide some, but not all, of the features listed here.)
      • joins
      • group by
      • order by
      • indexes
      • ACID transactions
      • complex relationships
      • powerful and standard query language (SQL)
      • data independence (mainly for data integrity)
      • maturity
      • http://goo.gl/PVpoh
  • 12. Do you have a large set of uncontrolled, unstructured data that you are trying to fit into an RDBMS?
      • Storage of large amounts of non-transactional data (log analysis, web statistics, etc.)
      • Caching results from slower databases (see Twitter)
      • Data denormalization of expensive join queries
      • Managing data that is not easily analyzed in an RDBMS, such as time-based or location-based data
      • Real-time systems (games, financial data, chats, etc.)
  • 13-21. Slides courtesy of Tobias Lindaaker, http://www.thobe.org/
  • 23. CAP Theorem, formulated by computer scientist Eric Brewer in 2000: it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
      • Consistency: each client always has the same view of the data
      • Availability: every received request must result in a response
      • Partition Tolerance: the system keeps working even though some messages between the nodes may be lost
  • 24. Demonstration...
  • 25. To scale out, you have to partition, so you have to choose between consistency and availability
      [Diagram: Consistency, Availability and Partition Tolerance as three overlapping sets; the pairwise overlaps are labelled CA, CP and AP, and the overlap of all three is empty (∅)]
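The choice between consistency and availability under a partition can be made concrete with a toy two-node model. Everything here is an assumption for illustration: the class `TwoNodeCluster`, the `Unavailable` exception and the `partitioned` flag are hypothetical, and reads are always served locally to keep the sketch short.

```python
class Unavailable(Exception):
    """Raised when the CP variant refuses a request during a partition."""


class TwoNodeCluster:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = {"n1": {}, "n2": {}}
        self.partitioned = False  # True while the nodes cannot talk to each other

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than let nodes diverge.
                raise Unavailable("cannot replicate during a partition")
            # Availability first: accept the write on the reachable node only.
            self.nodes[node][key] = value
            return
        # No partition: replicate to every node.
        for data in self.nodes.values():
            data[key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)


cp = TwoNodeCluster("CP")
ap = TwoNodeCluster("AP")
for cluster in (cp, ap):
    cluster.write("n1", "x", 1)
    cluster.partitioned = True

try:
    cp.write("n1", "x", 2)
    cp_write_accepted = True
except Unavailable:
    cp_write_accepted = False

ap.write("n1", "x", 2)

print(cp_write_accepted, cp.read("n1", "x"), cp.read("n2", "x"))
print(ap.read("n1", "x"), ap.read("n2", "x"))
```

The CP cluster stays consistent but rejects the request; the AP cluster answers every request but its two nodes now return different values for the same key.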
  • 26. Consistency model weaker than ACID (Atomicity, Consistency, Isolation, Durability): BASE = Basically Available, Soft state, Eventual consistency
      • Basically Available: if a node fails, part of the data will not be available, but the entire data layer stays operational
      • Soft state: the state of the system may change over time, even without input
      • Eventual consistency: the system becomes consistent at some later time
      • http://queue.acm.org/detail.cfm?id=1394128
  • 27. BASE example
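The example shown on the original slide did not survive in the transcript. As a stand-in, here is a minimal sketch of eventual consistency, assuming last-write-wins merging on an explicit logical timestamp; the `Replica` class and the `anti_entropy` function are hypothetical names, not taken from the slides.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # A write is acknowledged locally, without waiting for other replicas.
        self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Last write wins: keep whichever version has the larger timestamp.
        for key, (timestamp, value) in other.data.items():
            if key not in self.data or self.data[key][0] < timestamp:
                self.data[key] = (timestamp, value)


def anti_entropy(replicas):
    # Background synchronization: every replica pulls from every other one.
    for target in replicas:
        for source in replicas:
            if source is not target:
                target.merge_from(source)


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
r1.write("likes", 10, timestamp=1)
r2.write("likes", 11, timestamp=2)

before = [r.read("likes") for r in (r1, r2, r3)]  # replicas disagree
anti_entropy([r1, r2, r3])
after = [r.read("likes") for r in (r1, r2, r3)]   # replicas converge

print(before)
print(after)
```

Between the writes and the synchronization the replicas return different answers (soft state); once synchronization has run they agree (eventual consistency).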
  • 29. Four genres of NOSQL databases: Key-value, Columnar, Document, Graph
  • 30.
      • Implementations: Riak, Redis, Voldemort, Dynamo
      • Here the focus is on SCALABILITY: designed to handle massive load
      • Stores a collection of key-value pairs: think about maps (or associative arrays) in classical programming languages
      • KEY = string value
      • VALUE = any kind of element, such as strings, videos, XML files, etc.
      • Key namespaces to avoid collisions
      • http://goo.gl/LfG1N
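The key-value model of slide 30 maps directly onto an associative array. The sketch below is a toy in-memory version, assuming a `namespace:key` naming convention for the namespaces; the class `KeyValueStore` and its methods are hypothetical and do not mirror the API of Riak, Redis, Voldemort or Dynamo.

```python
class KeyValueStore:
    """Toy store: string keys, opaque values, namespaces to avoid collisions."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _full_key(namespace, key):
        # The separator must not appear in the namespace, otherwise
        # ("a:b", "c") and ("a", "b:c") would collide.
        if ":" in namespace:
            raise ValueError("namespace must not contain ':'")
        return f"{namespace}:{key}"

    def put(self, namespace, key, value):
        self._data[self._full_key(namespace, key)] = value

    def get(self, namespace, key, default=None):
        return self._data.get(self._full_key(namespace, key), default)

    def delete(self, namespace, key):
        self._data.pop(self._full_key(namespace, key), None)


kv = KeyValueStore()
kv.put("users", "42", "Ivano")                        # a string value
kv.put("videos", "42", b"\x00\x01\x02")               # raw bytes, e.g. a video chunk
kv.put("configs", "42", "<config><a>1</a></config>")  # an XML document

print(kv.get("users", "42"))
print(kv.get("videos", "42"))
```

The same key "42" lives in three namespaces without clashing, and the store never inspects the values, which is what makes lookups fast and complex queries impossible.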
  • 31.
      PROS
        • easy to use
        • extreme performance
        • no need to maintain indices
        • large horizontal data
      CONS
        • no complex queries (no SQL)
        • no transactions (although Redis does have transactions)
        • many data structures cannot be easily modeled as key-value pairs
        • must fit in memory
      http://goo.gl/PGfjU
  • 32.
      • Stock prices
      • Analytics
      • Real-time data collection
      • Real-time communication
      • User sessions storage
      • Caching data from other DBs
      SEE CASE STUDIES LATER IN THIS LECTURE
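"Caching data from other DBs" is commonly done with the cache-aside pattern, sketched below under stated assumptions: `SlowDatabase` stands in for any slower backing store, `CacheAside` is a hypothetical wrapper, and entries expire after a fixed time-to-live.

```python
import time


class SlowDatabase:
    """Stand-in for a slower backing database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        return self.rows.get(key)


class CacheAside:
    def __init__(self, database, ttl_seconds, clock=time.monotonic):
        self.database = database
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # key -> (expiry time, value)

    def get(self, key):
        now = self.clock()
        hit = self.cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # served from the key-value cache
        # Miss or expired entry: ask the slow database, then remember the answer.
        value = self.database.fetch(key)
        self.cache[key] = (now + self.ttl, value)
        return value


db = SlowDatabase({"user:1": {"name": "Ada"}})
cached = CacheAside(db, ttl_seconds=60)

first = cached.get("user:1")   # goes to the database
second = cached.get("user:1")  # answered by the cache
print(first, second, db.queries)
```

The time-to-live bounds how stale a cached answer can get, which is the usual trade-off when a key-value store fronts another database.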
  • 33.
      • Implementations: HBase, BigTable, Cassandra, Vertica
      • Midway between relational and KV stores
      • Values are queried by matching keys
      • As in relational DBs, values are groups of zero or more columns
      • Unlike relational DBs, data from a given column is stored together: adding columns is quite inexpensive
      • Each row can have a different set of columns, or none at all: this allows tables to remain sparse without additional storage cost for null values
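The column-oriented layout of slide 33 can be sketched with one dictionary per column. This is a simplified model, not the storage format of HBase, BigTable, Cassandra or Vertica; `ColumnStore` and its method names are hypothetical.

```python
from collections import defaultdict


class ColumnStore:
    """Toy columnar table: the cells of one column are stored together."""

    def __init__(self):
        self.columns = defaultdict(dict)  # column name -> {row key -> value}

    def put(self, row_key, column, value):
        # Writing to a column that does not exist yet creates it on the fly.
        self.columns[column][row_key] = value

    def get_row(self, row_key):
        # A row is assembled from whichever columns hold a cell for this key.
        return {
            name: cells[row_key]
            for name, cells in self.columns.items()
            if row_key in cells
        }

    def scan_column(self, column):
        # Scanning one column touches only that column's data.
        return dict(self.columns.get(column, {}))


table = ColumnStore()
table.put("row1", "name", "Ada")
table.put("row1", "email", "ada@example.org")
table.put("row2", "name", "Alan")      # row2 has no email: nothing is stored
table.put("row2", "twitter", "@alan")  # a new column, added without a schema change

print(table.get_row("row1"))
print(table.get_row("row2"))
print(table.scan_column("name"))
```

Missing cells occupy no space because they are simply absent from the column's dictionary, which is the sparse-table property described on the slide.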
  • 34.
      PROS
        • Easy to distribute tasks
        • Solving "Big Data" issues
        • High availability
        • Garbage collection for expired data
        • Scanning is very easy
      CONS
        • De-normalization
        • Expensive to insert
        • Requires heavy pre-planning of queries
  • 35.
      • Search engines
      • Logging
      • Analysing log data
      • When you need to scan huge, two-dimensional, join-less tables
      • Banking (consistency enforcement)
      • Many implementations provide versioning facilities
      • In Cassandra, writing is faster than reading values (!)
      SEE CASE STUDIES LATER IN THIS LECTURE
  • 36.
      • Implementations: MongoDB, CouchDB, RavenDB
      • Superset of key-value DBs: you can also query on the value part, because the document portion is structured
      • Think about documents as tuples with any number of fields (JSON)
      • Documents can contain nested structures
      • Documents are often versioned
      • Different document databases take different approaches for indexing, querying, replication, consistency, etc.: choose wisely!
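Querying "on the value part" of a document can be illustrated with plain nested dictionaries. The helper names `get_path` and `find`, the dotted-path syntax and the sample documents are assumptions made for this sketch; they are not the query API of MongoDB, CouchDB or RavenDB.

```python
def get_path(document, path):
    # Follow a dotted path such as "address.city" through nested dicts.
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, expected):
    # Return every document whose value at the given path equals `expected`.
    return [doc for doc in collection if get_path(doc, path) == expected]


# Documents in one collection do not have to share the same fields.
people = [
    {"_id": 1, "name": "Ada", "address": {"city": "L'Aquila", "country": "IT"}},
    {"_id": 2, "name": "Alan", "address": {"city": "London"}, "tags": ["math"]},
    {"_id": 3, "name": "Grace"},
]

print(find(people, "address.city", "L'Aquila"))
print(find(people, "name", "Grace"))
```

A pure key-value store could only fetch these documents by `_id`; because the value is structured, the store can look inside it, at the cost of scanning or indexing the fields being queried.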
  • 37.
      PROS
        • Variable data
        • Object-oriented paradigms
        • Concurrency
        • Works well with de-normalized data
      CONS
        • Hard to do complex queries
        • No joins
        • Enforcing structured data
  • 38.
      • When you don’t know in advance what exactly your data will look like
      • They map well to object-oriented programming models
      • For accumulating, occasionally changing data, on which predefined queries are to be run
      • Places where versioning is important
      • Services that handle age difference, geographic location, tastes and dislikes, etc.
      • A leaderboard system that depends on many variables
      SEE CASE STUDIES LATER IN THIS LECTURE
  • 39.
      • Implementations: Neo4J, OrientDB, FlockDB, Trinity
      • Focus on modeling the structure of data and its interconnectivity
      • Inspired by mathematical Graph Theory (G = (V, E))
      • Data model is the Property Graph:
          • Entities are nodes
          • Relationships are edges between nodes
          • Key-value pairs on both
      • Excels in dealing with highly interconnected data
      • Relational DBs can model graphs, but an edge requires a join, which is expensive
      [Figure: example graph with nodes A to E connected by edges a to e]
  • 41.
      PROS
        • Easy match with the problem domain (with relational, you have to create an ER diagram, then normalize, etc.)
        • Ability to quickly traverse nodes and relationships to find relevant data (for example, you can apply Dijkstra's algorithm when querying the DB)
        • Fit well with object-oriented concepts
        • Neo4J has full ACID conformity
      CONS
        • Generally not suitable for network partitioning, due to the high interconnectedness
        • No joins
        • Enforcing structured data
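The property graph model and the Dijkstra traversal mentioned on slides 39 and 41 can be sketched in a few lines. The station names, the `minutes` edge property and the function names are hypothetical; a graph database would expose this through its own query language or traversal API rather than this code.

```python
import heapq

# Property graph: nodes and edges both carry key-value pairs.
nodes = {
    "A": {"kind": "station"},
    "B": {"kind": "station"},
    "C": {"kind": "station"},
    "D": {"kind": "station"},
}
edges = [
    ("A", "B", {"minutes": 4}),
    ("A", "C", {"minutes": 1}),
    ("C", "B", {"minutes": 2}),
    ("B", "D", {"minutes": 5}),
    ("C", "D", {"minutes": 8}),
]


def build_adjacency(edge_list, weight_property):
    # Directed adjacency list: node -> [(neighbor, weight), ...]
    adjacency = {}
    for source, target, properties in edge_list:
        adjacency.setdefault(source, []).append((target, properties[weight_property]))
        adjacency.setdefault(target, [])
    return adjacency


def shortest_path(adjacency, source, target):
    # Dijkstra's algorithm with a binary heap; weights must be non-negative.
    distance = {source: 0}
    previous = {}
    visited = set()
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in adjacency.get(node, []):
            candidate = dist + weight
            if candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in distance:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return distance[target], path


adjacency = build_adjacency(edges, "minutes")
print(shortest_path(adjacency, "A", "D"))
```

Each hop follows a stored edge directly, whereas the relational equivalent would need one join per hop, which is the cost the slides point out.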
  • 42.
      • Social networks
      • Recommendation engines
      • Geographic data
      • Public transport links
      • Road maps
      • Network topologies
      SEE CASE STUDIES LATER IN THIS LECTURE
  • 45. http://goo.gl/0JoW8
  • 47. http://goo.gl/xpPac
  • 48. [Diagram labels: relational; key-value (in the cloud); key-value] http://goo.gl/mkfQN, http://goo.gl/xpPac
  • 50. [Diagram labels: columnar; graph; key-value; plus Blobstore!] http://goo.gl/2kdvm
  • 51. http://goo.gl/CrC0P
  • 52. [Diagram labels: relational; columnar; key-value] http://goo.gl/CrC0P
  • 54.
      • SCALABILITY, SCALABILITY, SCALABILITY, with respect to both size and complexity...
      • ...usually at the cost of consistency
      • NOSQL is not a silver bullet
      • Polyglot data is the new main trend...
      • ...in 10 years the majority of IT solutions will still be based on RDBMS
  • 56. Simply drop a line to ivano.malavolta@univaq.it
  • 57.
      • http://nosql-database.org/
      • http://goo.gl/ThO63
      • Chapters 1 and 9
      • Check out my blog for these slides: www.ivanomalavolta.com
  • 58.
      • Neo4j - http://neo4j.org
      • OrientDB - http://www.orientdb.org
      • VoltDB - http://www.voltdb.com
      • CouchDB - http://couchdb.apache.org
      • Cassandra - http://cassandra.apache.org
      • Riak - http://www.basho.com
      • HBase - http://hbase.apache.org
      • MongoDB - http://www.mongodb.org
      • Redis - http://code.google.com/p/redis
      • Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db
      • FlockDB - http://github.com/twitter/flockdb