Seminar.2010.NoSql
Transcript

  • 1. July 11th, 2010
  • 2. Apples, Oranges and NOSQL Roi Aldaag Architect & Consultant Nadav Wiener Architect & Consultant
  • 3. Agenda Introduction » What is NoSQL? » What’s “wrong” with RDBMS? » Why now?
  • 4. Agenda RDBMS vs. NoSQL » Scaling » CAP Theorem » ACID vs. BASE
  • 5. Agenda NoSQL Taxonomy » Key / Value » Column » Document » Graph
  • 6. Agenda How to choose? » Comparing Apples to Oranges » Polyglot Persistence
  • 7. Introduction
  • 8. Introduction Question: What do they all have in common?
  • 9. Introduction Before we answer – some facts:
  • 10. Introduction Before we answer – some facts (five sites, one value per site): » Daily Page Views: 7.8×10^9, 7.1×10^9, 550×10^6, 350×10^6, 82×10^6 » Daily Visitors: 620×10^6, 500×10^6, 56×10^6, 37×10^6, 12×10^6 » Data size: Petabytes, Petabytes, Petabytes, Terabytes, Terabytes » Source: http://www.alexa.com, July 2010
  • 11. Introduction Answer: They use NoSQL data stores
  • 12. Introduction Why!?
  • 13. Introduction Relational DBs Have Scaling Limitations » ACID doesn’t scale well horizontally › Sharding breaks relations › Joins are inefficient » Transaction overhead » Schema is not flexible › Predefined › Hard to evolve
  • 14. Introduction What is NoSQL? » NO SQL / Not Only SQL » A collective description of open source, non-relational data stores › Highly distributed › Highly scalable › Not ACID and... doesn’t use SQL » Term coined at a 2009 meetup called “NoSQL” (Eric Evans) » Started a movement that is gaining momentum
  • 15. Introduction
  • 16. Introduction Why now? » NoSQL data stores predate RDBMS (1970) › But remained a niche » RDBMS – most popular and generic option » Web 2.0 introduced new requirements: › Exponential increase in data › Information connectivity › Semi-structured data » NoSQL data stores had answers › When the time was right › When RDBMSs didn’t
  • 17. Introduction It’s theory time:
  • 18. Scaling
  • 19. Scaling Scaling Up » Adding resources to a single node in a system » Add more CPUs or memory » Move system to a larger machine » Pros: › Quick and simple » Cons: › Outgrowing the capacity of the largest system available (Moore’s law) › Expensive › Creates vendor lock-in
  • 20. Scaling Scaling Out » Add more nodes to a system » Functional Scaling (vertical) › Grouping data by function and spreading functional groups across databases » Sharding (horizontal) › Splitting the same functional data across multiple databases » Pros: More flexible » Cons: More complex
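As a rough illustration of horizontal sharding, the sketch below routes each key to one of several stores by hashing it. The three in-memory dictionaries stand in for separate databases, and the names `shard_for`, `put` and `get` are hypothetical helpers, not part of any product mentioned in the slides.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

# Hypothetical cluster: three databases holding the same kind of data.
shards = [dict() for _ in range(3)]

def put(key, value):
    """Store the value on the single shard that owns the key."""
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    """Look the key up on the shard that owns it."""
    return shards[shard_for(key, len(shards))].get(key)

for user in ("alice", "bob", "carol", "dave"):
    put(user, {"name": user})
```

Simple modulo routing like this moves most keys when the shard count changes, which is one reason the "more complex" con applies.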
  • 21. Distributed Databases
  • 22. Distributed Databases » Many nodes » Same database [Diagram: Node 1, Node 2, Node 3]
  • 23. Distributed Databases What are the requirements of distributed databases? » Consistency › All clients can see the same data » Availability › All clients can always access data » Partition tolerance › The ability to continue working when the network topology is broken › The ability to recover once the network is healed
  • 24. Distributed Databases CAP Theorem (E. Brewer, N. Lynch) » You can fully satisfy at most 2 out of 3 › Compromise on the 3rd » Not “all or nothing” › Choose various levels of consistency, availability or partition tolerance » Recognize which of the CAP rules your business needs for the task
  • 25. Distributed Databases CA: Consistency & Availability » Partition Tolerance is compromised » Single site clusters (easier to ensure all nodes are always in contact) » When a network partition occurs, the system blocks » e.g. Two Phase Commit (2PC)
  • 26. Distributed Databases CP: Consistency & Partition Tolerance » Availability is compromised » Access to some data may be temporarily limited » The rest is still consistent/accurate » e.g. Sharded database » TBD sample
  • 27. Distributed Databases AP: Availability & Partition Tolerance » Consistency is compromised » System is still available under partitioning » Some data returned may be temporarily not up-to-date » Requires conflict resolution strategy » e.g. DNS, caches, Master/Slave replication » TBD sample
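The CP and AP trade-offs can be made concrete with a toy model. The sketch below is an assumption-laden simplification: two in-memory replicas, a manual `partitioned` flag, a logical clock, and last-write-wins reconciliation on heal. The `Cluster` and `Replica` classes are hypothetical and do not correspond to any real database.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

class Cluster:
    def __init__(self, mode):
        self.mode = mode              # "CP" or "AP"
        self.replicas = [Replica(), Replica()]
        self.partitioned = False      # True while replicas cannot reach each other
        self.clock = 0                # logical timestamp, unique per write

    def write(self, replica_index, key, value):
        self.clock += 1
        if self.partitioned:
            if self.mode == "CP":
                # CP: give up availability rather than let replicas diverge.
                raise RuntimeError("unavailable: cannot reach all replicas")
            targets = [self.replicas[replica_index]]  # AP: accept the write locally
        else:
            targets = self.replicas
        for replica in targets:
            replica.data[key] = (self.clock, value)

    def read(self, replica_index, key):
        return self.replicas[replica_index].data[key][1]

    def heal(self):
        """End the partition and reconcile: the newest timestamp wins."""
        self.partitioned = False
        keys = set().union(*(r.data for r in self.replicas))
        for key in keys:
            newest = max(r.data[key] for r in self.replicas if key in r.data)
            for replica in self.replicas:
                replica.data[key] = newest

ap = Cluster("AP")
ap.write(0, "x", "v1")
ap.partitioned = True
ap.write(0, "x", "v2")       # accepted by replica 0 only
stale = ap.read(1, "x")      # replica 1 still serves the old value
ap.heal()                    # after healing, both replicas converge
```

In "CP" mode the same write during a partition raises instead of being accepted, which is the "access may be temporarily limited" behaviour described above.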
  • 28. ACID vs. BASE
  • 29. ACID vs. BASE ACID – a quick recap » Atomicity › When a part of the transaction fails, the entire transaction fails; database state is left unchanged » Consistency › A transaction takes the database from one consistent state to another » Isolation › A transaction can't see dirty state from other transactions » Durability › Commit means commit.
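Atomicity and consistency can be seen in a few lines with Python's built-in `sqlite3` module. This is a minimal sketch: the `accounts` table, the `CHECK` constraint and the `transfer` helper are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no negative balances
)
conn.execute("INSERT INTO accounts VALUES ('a', 100)")
conn.execute("INSERT INTO accounts VALUES ('b', 0)")
conn.commit()

def transfer(amount):
    """Move `amount` from account a to account b as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'b'", (amount,))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'a'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to b was rolled back together with the failed debit

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(500)` fails on the second statement, and the first statement's credit is undone as well, so the database state is left unchanged.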
  • 30. ACID vs. BASE BASE » The CAP complement of ACID › Just had to be called BASE › Backronym: » Basically Available » Soft State » Eventually Consistent
  • 31. ACID vs. BASE RDBMS & ACID / NoSQL & BASE » RDBMSs strive to provide ACID guarantees › ACID forces consistency » NoSQL solutions often scale through BASE › BASE accepts that conflicts will happen
  • 32. Taxonomy
  • 33. Taxonomy Key / Value, Column, Document (XML, TXT, BIN), Graph
  • 34. Taxonomy Key / Value Databases
  • 35. Taxonomy Key/Value Stores » Simple Key / Value lookups (DHT) » Value is opaque » Focus on scaling to huge amounts of data » Designed to handle massive load » E.g. › Riak and Project Voldemort (based on Amazon’s Dynamo paper) › Redis
  • 36. Taxonomy Key/Value e.g.: Riak » No single point of failure » No machines are special or central » MapReduce queries (Erlang / JavaScript) » HTTP/JSON API » Ring cluster with automatic replication » Elastic / partition rebalancing » Written in: Erlang, C, JavaScript » Developed by: Basho Technologies » Java client: (jonjlee / riak-java-client)
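The "ring cluster with automatic replication" idea is commonly built on consistent hashing. The sketch below shows the general technique only, under simplifying assumptions (one position per node, no virtual nodes); it is not Riak's actual implementation, and `HashRing` is a hypothetical class.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Place a string on the ring using a stable hash."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Each node gets one position on the ring, sorted clockwise.
        self.ring = sorted((_hash(node), node) for node in nodes)

    def nodes_for(self, key):
        """Return the `replicas` distinct nodes that follow the key clockwise."""
        start = bisect.bisect(self.ring, (_hash(key), ""))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

ring = HashRing(["node1", "node2", "node3"])
owners = ring.nodes_for("user:42")  # primary node plus one replica
```

No node is special here: any client that knows the node list can compute the owners of a key on its own.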
  • 37. Key/Value e.g.: Riak Data Model » Key / Value pairs are stored in a Bucket » A Bucket ~ a namespace Versioning » Each update is tracked by a Vector Clock › An algorithm for determining ordering and detecting conflicts » When in conflict › Last write wins / manual resolution
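A vector clock can be sketched as a dictionary of per-node counters. The functions below show the generic algorithm for ordering versions and detecting conflicts; the names `increment` and `compare` are hypothetical and not taken from Riak's API.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def compare(a, b):
    """Return 'equal', 'before', 'after' or 'conflict' for clock a relative to clock b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"    # a is an ancestor of b
    if b_le_a:
        return "after"     # a descends from b
    return "conflict"      # concurrent updates: neither descends from the other

v1 = increment({}, "A")    # {"A": 1}
v2 = increment(v1, "A")    # {"A": 2}: node A updated again
v3 = increment(v1, "B")    # {"A": 1, "B": 1}: node B updated from v1
```

Here `v2` and `v3` both descend from `v1` but not from each other, so a store would have to fall back to last-write-wins or hand both versions to the application.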
  • 38. Key/Value e.g.: Riak Example: REST API » Read an object: GET /riak/bucket/key » Store a new object: POST /riak/bucket » Store an object with existing key (update): PUT /riak/bucket/key
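The three calls above can be expressed with Python's standard `urllib`. The sketch only builds the requests and does not send them; the base URL (host and port) is an assumption about a local test node, the JSON content type is an illustrative choice, and the helper names are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8098"  # assumption: a node reachable locally on this port

def read_object(bucket, key):
    """GET /riak/bucket/key"""
    return Request(f"{BASE_URL}/riak/{bucket}/{key}", method="GET")

def store_new_object(bucket, value):
    """POST /riak/bucket (no key in the URL)"""
    body = json.dumps(value).encode("utf-8")
    return Request(f"{BASE_URL}/riak/{bucket}", data=body, method="POST",
                   headers={"Content-Type": "application/json"})

def update_object(bucket, key, value):
    """PUT /riak/bucket/key (store under an existing key)"""
    body = json.dumps(value).encode("utf-8")
    return Request(f"{BASE_URL}/riak/{bucket}/{key}", data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

req = update_object("users", "john", {"nick": "dude1934"})
```

Passing a request to `urllib.request.urlopen` would actually send it, which requires a running server.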
  • 39. Key/Value e.g.: Riak MapReduce » A framework supporting distributed computing on large data sets on clusters of machines » Leverages parallel processing power » Introduced by Google » Inspired by map / reduce functions in functional programming » Map step » Reduce step
  • 40. Key/Value e.g.: Riak MapReduce example: Inverted Index » Map » Parse each document » Emit a sequence of <word, doc_id> pairs » Input <doc_id, doc_text> → output <word, doc_id>: › Node 1: <100, TXT1> → <word1, 100>, <word2, 100> › Node 2: <200, TXT2> → <word2, 200> › Node 3: <300, TXT3> → <word3, 300>
  • 41. Key/Value e.g.: Riak MapReduce example: Inverted Index » Reduce » Accept all pairs for a given word » Sort the corresponding document IDs » Emit a <word, list(document ID)> pair » Output <word, list(document_id)>: <word1, (100)>, <word2, (100, 200)>, <word3, (300)>
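The inverted-index example can be sketched in single-process Python. This is only the shape of the map and reduce steps, run sequentially; a real framework would distribute the map calls across nodes and group the pairs over the network. The function names and the sample documents are made up for the illustration.

```python
from collections import defaultdict

def map_document(doc_id, text):
    """Map step: emit one <word, doc_id> pair per distinct word in the document."""
    return [(word, doc_id) for word in set(text.lower().split())]

def reduce_word(word, doc_ids):
    """Reduce step: accept all pairs for one word and emit <word, sorted doc ids>."""
    return word, sorted(doc_ids)

def inverted_index(documents):
    grouped = defaultdict(list)
    for doc_id, text in documents.items():       # each call could run on a different node
        for word, emitted_id in map_document(doc_id, text):
            grouped[word].append(emitted_id)     # group pairs by word between the two steps
    return dict(reduce_word(word, ids) for word, ids in grouped.items())

documents = {100: "word1 word2", 200: "word2", 300: "word3"}
index = inverted_index(documents)
```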
  • 42. Taxonomy BigTable and Column Oriented Databases
  • 43. Taxonomy Column Stores – BigTable derivatives » Conceptually a single, infinitely large table » Each row can have a different number of columns » Table is sparse: |rows|*|columns| > |values| » Based on Google’s BigTable paper » E.g. › Cassandra › HBase › Hypertable
  • 44. Taxonomy Use Case: Manage products with diverse attributes » RDBMS: › Create a central table with common attributes › Create a table per product with unique attributes › Use a join query › Alternatively create a table that holds metadata on products » NoSQL: › Column oriented database › Use arbitrary columns
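The product use case can be pictured as a sparse table in which each row carries only the columns it needs. The sketch below uses plain dictionaries to show the idea; the product rows and the `select` helper are invented for the illustration and say nothing about how a real column store lays data out on disk.

```python
# Row key -> {column name: value}; every row may use a different set of columns.
products = {
    "book-1": {"name": "NoSQL Primer", "price": "30", "isbn": "000-0"},
    "tv-7": {"name": "Flat TV", "price": "500", "screen_inches": "42", "hdmi_ports": "3"},
    "shirt-3": {"name": "T-shirt", "size": "M"},
}

# The conceptual table has one column per distinct name used by any row.
all_columns = sorted({column for row in products.values() for column in row})
cell_count = len(products) * len(all_columns)           # |rows| * |columns|
value_count = sum(len(row) for row in products.values())  # |values| actually stored

def select(column):
    """Return the value of one column for every row that has it."""
    return {key: row[column] for key, row in products.items() if column in row}
```

With these three rows the conceptual table has 18 cells but only 9 stored values, and no join or per-product table is needed to add a new attribute.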
  • 45. Taxonomy Column Store e.g.: Cassandra » Data model: Google’s BigTable » Infrastructure: Amazon Dynamo » Incremental scalability » Flexible schema » No single point of failure (Distributed P2P) » Optimistic replication (Gossip protocol) » Written in: Java » Developed by: Facebook » Java client: e.g. Hector / Thrift
  • 46. Column e.g.: Cassandra Data Model » Column › Smallest increment of data: tuple of name, value, timestamp
      {
        name: "emailAddress",
        value: "nosql@alphacsp.com",
        timestamp: 123456789
      }
  • 47. Column e.g.: Cassandra » SuperColumn › A sorted, associative, unbounded array of columns
      { // this is a SuperColumn
        name: "homeAddress",
        // with an unbounded array of Columns
        value: {
          // each key is the name of a Column
          street: {name: "street", value: "s", timestamp:...},
          city: {name: "city", value: "c", timestamp:...},
          zip: {name: "zip", value: "z", timestamp:...}
        }
      }
  • 48. Column e.g.: Cassandra » ColumnFamily › A container (~Table) for columns sorted by their names › Rows in a Column Family are referenced and sorted by row keys
      Users = { // ColumnFamily
        john: { // key to row in CF
          "role" : "admin",
          "status" : "offline",
          "nick" : "dude1934"
        }, // end row
        fred: { // another row
          "nick" : "freddy",
          "email" : "fred@example.com",
          "age" : "25",
          "gender" : "male",…
        },… // more rows
      }
  • 49. Column e.g.: Cassandra » Keyspace › The outermost grouping of data (~DB Schema) › Contains ColumnFamilies › There is no imposed relationship between ColumnFamilies
  • 50. Column e.g.: Cassandra » Example: a Keyspace containing a Tweets CF and a Timeline CF [Diagram]
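The nesting of Keyspace, ColumnFamily, row and Column can be mimicked with nested dictionaries. The sketch below is a simplified in-memory model, not Cassandra's API: `insert` and `get_row` are hypothetical helpers, and the rule that the newest timestamp wins is stated here as an assumption of the sketch.

```python
# Keyspace: ColumnFamily name -> row key -> column name -> (value, timestamp)
keyspace = {}

def insert(column_family, row_key, name, value, timestamp):
    """Write one column; an older timestamp never overwrites a newer one."""
    row = keyspace.setdefault(column_family, {}).setdefault(row_key, {})
    current = row.get(name)
    if current is None or timestamp >= current[1]:
        row[name] = (value, timestamp)

def get_row(column_family, row_key):
    """Return the row's columns sorted by column name, without timestamps."""
    row = keyspace.get(column_family, {}).get(row_key, {})
    return {name: value for name, (value, _) in sorted(row.items())}

insert("Users", "john", "status", "offline", timestamp=1)
insert("Users", "john", "role", "admin", timestamp=1)
insert("Users", "john", "status", "online", timestamp=2)   # newer write replaces the value
insert("Users", "john", "status", "stale", timestamp=1)    # older write is ignored
insert("Users", "fred", "nick", "freddy", timestamp=1)     # rows need not share columns
```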
  • 51. Taxonomy Document Oriented Databases (XML, TXT, BIN)
  • 52. Taxonomy Document Store » Store semi-structured documents (think JSON) » Document versioning » Map/Reduce based queries, sorting, aggregation, etc. » DB is aware of internal structure » E.g. › MongoDB › CouchDB › Jackrabbit (JCR, JSR 170)
  • 53. Taxonomy Use Case: Blog with tagged posts and comments » RDBMS: › Table for each: posts, comments, tags › Foreign relations » NoSQL: › Document storage › Store post + tags + comments as a document
  • 54. Taxonomy Document Store e.g.: MongoDB » MongoDB (from "humongous") » Manages collections of JSON-like documents (BSON) » Queries can return specific fields of documents » Supports secondary indexes » Atomic operations on single documents » Developed by: 10gen » Written in: C++ » Clients: Java, Scala and more
  • 55. Document e.g.: MongoDB Example: Blog posts » Suppose you host a blog, where each post is tagged:
      db.posts.save({
        _id : 3,
        author : "john",
        title : "Apples, Oranges and NOSQL",
        text : "This article will…",
        tags : [ "database", "nosql" ]
      });
    » Notice how posts have an array of tags
  • 56. Document e.g.: MongoDB » MongoDB supports secondary indexes and a query optimizer › Compound indexes are also supported
      db.posts.ensureIndex({ tags: 1 });
      db.posts.ensureIndex({ author: 1 });
      db.posts.find({ author: "john", tags: "nosql" });
      // Result:
      { "_id" : 3, "author" : "john",
        "title" : "Apples, Oranges and NOSQL",
        "text" : "This article will…",
        "tags" : ["database", "nosql"] }
  • 57. Document e.g.: MongoDB » Let's update our post to include some comments:
      db.posts.update({ _id: 3 }, {
        $inc: { comments_count: 4 },
        $pushAll : { comments: [
          { text: "Comment 1" },
          { text: "Comment 2", author: "Mr. T" },
          { text: "Comment 3" },
          { text: "Comment 4" }
        ] }
      });
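To show what the query and update above mean without needing a server, here is a small in-memory imitation in Python. It is not MongoDB's API or a driver: `matches`, `find` and `update` are hypothetical helpers that mimic only equality matching (including "array contains value") and the increment and push-all behaviour used in the slides.

```python
posts = [
    {"_id": 3, "author": "john", "title": "Apples, Oranges and NOSQL",
     "tags": ["database", "nosql"]},
    {"_id": 4, "author": "fred", "title": "Another post", "tags": ["nosql"]},
]

def matches(doc, query):
    """Equality match; if the stored field is a list, match when it contains the value."""
    for field, wanted in query.items():
        actual = doc.get(field)
        if isinstance(actual, list):
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True

def find(query):
    return [doc for doc in posts if matches(doc, query)]

def update(query, inc=None, push_all=None):
    """Modify the first matching document in place, as one step on a single document."""
    for doc in find(query)[:1]:
        for field, amount in (inc or {}).items():
            doc[field] = doc.get(field, 0) + amount      # like $inc
        for field, items in (push_all or {}).items():
            doc.setdefault(field, []).extend(items)      # like $pushAll

found = find({"author": "john", "tags": "nosql"})
update({"_id": 3},
       inc={"comments_count": 2},
       push_all={"comments": [{"text": "Comment 1"},
                              {"text": "Comment 2", "author": "Mr. T"}]})
```

The post, its tags and its comments live in one document, so reading a post with its comments needs no join.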
  • 58. Taxonomy Graph Databases
  • 59. Taxonomy Graph databases » Inspired by mathematical graph theory G=(V,E) » Models the structure of data » Navigational data model » Scalability / data complexity » Data model: Key-Value pairs on Edges / Nodes » Relationships: Edges between Nodes » E.g. › Neo4j › Pregel (Google’s graph processing system, used for PageRank) › AllegroGraph
  • 60. Taxonomy Use Case: Connected data - deep relationship links between users in a social network » RDBMS › Complex recursive algorithm › Multiple self joins › Round trips to DB / bulk read and resolve in RAM » NoSQL: › Graph storage › Network traversal
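A network traversal of the kind described above can be sketched as a breadth-first search over an adjacency list. The tiny `friends` graph and the `within` function are invented for the illustration; a graph database would run this kind of traversal against stored nodes and relationships instead of a Python dictionary.

```python
from collections import deque

# Hypothetical social network: person -> direct friends.
friends = {
    "ann": ["bob", "cid"],
    "bob": ["ann", "dee"],
    "cid": ["ann"],
    "dee": ["bob", "eve"],
    "eve": ["dee"],
}

def within(start, max_depth):
    """Return {person: distance} for everyone reachable within max_depth hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in friends.get(person, []):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    del distances[start]
    return distances

friends_of_friends = within("ann", 2)
```

In a relational schema each extra hop would typically mean another self join or another round trip to the database.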
  • 61. Taxonomy Graph e.g.: Neo4j » High-performance graph engine » Embedded / disk based » Works with an OO model: nodes, relationships, properties » ACID Transactions › JTA support – participate in 2PC with your RDBMS » Developed by: Neo Technology » Written in: Java » Clients: Java, client libraries in other platforms
  • 62. Graph e.g.: Neo4j http://neo4j.org/
  • 63. Comparing Apples to Oranges
  • 64. Comparing Apples to Oranges Comparing Data Structures » RDBMS › Databases contain tables, columns and rows › All rows share the same structure › Inherent ORM mismatch » NoSQL › Choose your data structure › Data is stored in its natural structure (e.g. Documents, Graphs, Objects)
  • 65. Comparing Apples to Oranges Comparing Schema Flexibility » RDBMS › Strict schema, difficult to evolve › Maintains relations and forces data integrity » NoSQL › Structure of data can be changed dynamically • e.g. Column stores – Cassandra › Data can sometimes be completely opaque • e.g. Key/Value – Project Voldemort
  • 66. Comparing Apples to Oranges Comparing Normalization & Relations » RDBMS › The data model is normalized to remove data duplication › Normalization establishes table relations » NoSQL › Denormalization is not a dirty word › Relations are not explicitly defined › Related data is usually grouped and stored as one unit • E.g. document, column
  • 67. Comparing Apples to Oranges Comparing Data Access » RDBMS › CRUD operations using SQL › Access data from multiple tables using SQL joins › Generic API such as JDBC » NoSQL › Proprietary APIs and DSLs (e.g. Pig / Hive / Gremlin) › MapReduce, graph traversals › REST APIs, portable serialization formats • BSON, JSON, Apache Thrift, Memcached
  • 68. Comparing Apples to Oranges Comparing Reporting Capabilities » RDBMS › Slice and dice data, then reassemble any way you like » NoSQL › Hard to repurpose data for ad-hoc usage • Plan ahead › Think in advance • How and what you store • Data access patterns
  • 69. Summary
  • 70. Summary Why NOSQL / BASE » ACID has ruled exclusively for the last 40 years › Doesn’t compromise on consistency » Database industry neglected distributed DBs with availability » Vacuum was filled with “NoSQL” BASE architectures › Strict A and P, minimize C compromise » Relational databases are now trying to catch up
  • 71. Summary NoSQL Limitations » Missing some query capabilities › Joins / composite transactions » Eventual consistency: not for every problem » Not a drop-in replacement for RDBMS “on ACID” » No standardization -> product lock-in » Relatively immature (support, bugs, community)
  • 72. Summary Choose the right tool for the job » Relational databases and NoSQL databases are designed to meet different needs » RDBMS-only should not be a default » NOSQL databases outperform RDBMSs in their particular niche » No one size fits all / silver bullet ...but you don’t have to choose one
  • 73. Summary Polyglot Persistence » Poly: many, Glot: language » Combining persistence mechanisms to best meet requirements » Good integration stories: › E.g. Neo4j + JDBC using JTA
