NoSQL Introduction

825 views
662 views

Published on

Understand what NoSQL is and what it is not. Why would you want to use NoSQL within your project and which NoSQL database would you utilize. Explore the relationships between NoSQL and RDBMS. Understand how to select between an RDBMs (MySQL and PostgreSQL), Document Database(MongoDB), Key-Value Store, Graph Database, and Columnar databases or combinations of the above.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
825
On SlideShare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
Downloads
24
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

NoSQL Introduction

  1. 1. NoSQL Introduction Slides and Feedback at: http://joind.in/11012
  2. 2. NoSQL Introduction • Understand what NoSQL is and what it is not. • Why would you want to use NoSQL within your project and which NoSQL database would you utilize? • Explore the relationships between NoSQL and RDBMS. • Understand how to select between an RDBMs (MySQL and PostgreSQL), Document Database (MongoDB), Key- Value Store, Graph Database, and Columnar databases or combinations of the above. Thursday May 8th 2014, 3:00pm-3:50pm SB 139 Slides and Feedback at: http://joind.in/11012 2
  3. 3. NoSQL • History • Popular NoSQL Databases • NoSQL Database Comparisons • Terminology • Consistency, Replication, Performance • NoSQL Implementation CRUD Operations 3 Slides and Feedback at: http://joind.in/11012
  4. 4. NoSQL Introduction • NoSQL is a commonly adopted misnomer • Typically does not use ANSI SQL – SQL = Structured Query Language – Structure exists but is more Flexible – Queries are performed – Language is closer to Programming Languages 4 Slides and Feedback at: http://joind.in/11012
  5. 5. NoSQL History 5 http://www.w3resource.com/mongodb/nosql.php
  6. 6. NoSQL History • 1998 Carlo Strozzi Command Line Database • June 11, 2009 Meetup – Open Source, Distributed, Non-Relational DB – Eric Evans (Rackspace) – Johan Oskarsson (Last.fm) 6
  7. 7. NoSQL History 7
  8. 8. NoSQL History • Bad name, but it stuck! • Not a definitive term • Generally, Newer databases solving new and different problems • Not Only SQL http://blog.sym- link.com/2009/10/30/nosql_whats_in_a_name.html 8 Slides and Feedback at: http://joind.in/11012
  9. 9. NoSQL Origination • Problems not solved by RDBMs • Limitations of RDBMs, not SQL 9 Slides and Feedback at: http://joind.in/11012
  10. 10. Most Popular Databases http://db-engines.com/en/ranking Ranking by: Web Content, Web Searches, Technical Discussion, Jobs, Resumes 10
  11. 11. Most Popular NoSQL • MongoDB - Document Store • Cassandra – Wide Column Store • Solr – Search Engine • Redis – Key-value store • Hbase – Wide Column Store • Memcached – Key-value Store • CouchDB – Document Store • Neo4j – Graph Database • Riak – Key-value Store • SimpleDB – Key-value Store within Amazon Cloud 11 Slides and Feedback at: http://joind.in/11012
  12. 12. 12 NoSQL vs RDBMs Image Reference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/
  13. 13. Reading Recommendations Great Overview of NoSQL: Seven Databases in Seven Weeks Eric Redmond and Jim Wilson 13
  14. 14. NoSQL “Bleeding Edge” • Several solutions are mature and stable enough to run large scale production environments • Not all permutations have been considered • Several (but not all) optimization strategies have been published • Crucial elements such as Security may be a secondary add-on in favor of performance. 14
  15. 15. NoSQL “Bleeding Edge” Sun Microsystems csh man page: “Although robust enough for general use, adventures into the esoteric periphery of the C shell may reveal unexpected quirks.” 15 Slides and Feedback at: http://joind.in/11012
  16. 16. NoSQL Comparison 16 Take note of patterns: Recent Release, Open Source, Utilized at High-Volume sites Variety of Formats: Key-Value, Wide-Column, Document, Graph http://db-engines.com/en/ranking
  17. 17. NoSQL Database Types 17 • Key-Value • Column Oriented Databases (Columnar) • Graph • Document • Search Database - Solr • Key-Value Web Optimization - Memcached
  18. 18. Key-Value Stores 18 Key Value code:java 17.316% Lowest rank on Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html Codebucketdrinkbucket
  19. 19. Column Oriented Database 19
  20. 20. Neo4j 20
  21. 21. Document Oriented Database 21 { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }
  22. 22. Document Oriented Database 22 { "facutly" : [ { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }, { "_id" : 2, "name" : { "first" : "David", "last" : "Williams" }, "contribs" : [ "C#", "Java", "PHP" ], "awards" : [ { "award" : "Sherman Peabody Award II", "year" : 2095, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IX", "year" : 2090, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IV", "year" : 2093, "location" : "Paris", "by" : "Intergalactic Continuum" } ] } ] }
  23. 23. Document Oriented Database 23 http://chris.photobooks.com/json/ Slides and Feedback at: http://joind.in/11012
  24. 24. 24 http://visualizer.json2html.com/
  25. 25. NoSQL Comparison 25 No ANSI SQL Standards, No Predefined Schemas, Replication, Eventual Consistency, Rarely Foreign Keys, Data Types not required Newer Concepts: Sharding, REST API, JSON, MapReduce
  26. 26. NoSQL Characteristics 26 No Predefined Schemas • May insert data without creating a table • Schema Versions (v1.5, v1.6, v1.7,…) Rarely Foreign Keys • No JOIN operations • Relationships are not automatically maintained Eventual Consistency • Old copies being replaced by new records • Inconsistent data until all replacements are complete
  27. 27. Download NoSQL v95141.3 27 Released 4/1/2014 http://www.nosql.org/downloads/ymbkm.zip
  28. 28. NoSQL Terminology and Concepts 28 Slides and Feedback at: http://joind.in/11012
  29. 29. Sharding 29 Partitions – Data distributed across disks Sharding – Data distributed across servers
  30. 30. Map Reduce 30 Divides work across distributed systems Parallel processing of large data sets Divide – Conquer – Consolidate Often Implement by defining Map and Reduce classes or functions 1+2+3+6+7+8+9=? Google’s MapReduce Programming Model – Revisited Ralf Lammel, Microsoft, 2008 http://www.sciencedirect.com/science/article/pii/S0167642307001281 2 6 8 1 7 3 9 16 20 36
  31. 31. JSON 31 Subset of JavaScript Object Notation Similarities to XML method for representing data Syntax Name : Value pairs “salary” : “125000” Values are: number, string, Boolean, array, object, or NULL Objects can store Objects, Arrays can store Arrays Separate pairs by commas “salary” : “125000”, “gender” : “male” Curly braces denote objects { “salary” : “125000”, “gender” : “male” } Square brackets denote arrays “phone” : [”555-1212”, ”555-3344”] “phone” : [ {“office” : ”555-1212”}, {“mobile” : ”555-3344”} ]
  32. 32. JSON Example 32 { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] } http://www.mongodb.com/json-and-bson
  33. 33. REST API 33 CRUD (Create, Read, Update, Delete) operations through the web HTTP Methods GET (List/Read) POST(Update) PUT(Create) DELETE(Delete) EXAMPLE API http://www.blinksale.com/api/ List/Read Data via HTTP GET to http://www.blinksale.com/invoices http://www.blinksale.com/invoices/invoice_id/payments http://www.blinksale.com/invoices/?start=2006&end=2008 Returns XML results
  34. 34. REST API 34 Update data via HTTP POST to http://www.blinksale.com/invoices/invoice_id/payments <?xml version="1.0" encoding="UTF-8"?> <payment xmlns="http://www.blinksale.com/api"> <amount>1000.00</amount> <date>2006-09-27</date> </payment> REST = REpresentational State Transfer Twitter Example: https://dev.twitter.com/docs/api/1.1 (GET and POST only)
  35. 35. Database SELECT Statements 35 Oracle SELECT * FROM relationships MongoDB db.relationships.find() Cassandra (CQL) SELECT * FROM relationships Slides and Feedback at: http://joind.in/11012
  36. 36. Database SELECT Statements 36 Redis – Key-Value Store SMEMBERS relationships Riak – Key-Value Store with REST API (+ proprietary drivers) http://localhost:8091/riak/relationships/likes Neo4j (Cypher) MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Slides and Feedback at: http://joind.in/11012
  37. 37. JOINS without Foreign Keys 37 original_id = ObjectId() db.employer.insert({ "_id": original_id, "name": "Broadway Tech", "url": "bc.example.net" }) db.people.insert({ "name": "Erin", “employer_id": original_id, "url": "bc.example.net/Erin" }) “Erin” works at “Broadway Tech” One of the employees at “Broadway Tech” is “Erin” http://docs.mongodb.org/manual/reference/database-references/#document-references
  38. 38. Replication Challenge is Write Consistency 38
  39. 39. ACID, BASE, CAP, CPR 1979 Gray, 1983 Reuter & Härder - ACID Atomic, Consistent, Isolated, Durable Rollback: All or Nothing, Follows Rules, Simultaneous, No Drops 1997 Brewer - BASE Basically Available, Soft-state, Eventually consistent 2000 Brewer – CAP (Pick Two) Consistency, Availability, Partition Tolerance CPR (Pick Two) Consistency, Performance, Replication/Redundancy Contrived - Stretch Definitions 39
  40. 40. CPR 40 Consistency Redundancy Performance Pick Two
  41. 41. CPR 41 Consistency Performance A DCB Spread data across storage or computer Redundancy
  42. 42. 42 Redundancy Performance Consistency ABCE ABCDABCDABCE Updates may be inconsistent across devices
  43. 43. 43 Consistency Redundancy Performance ABCD ABCDABCDABCD One Update Locks all Nodes
  44. 44. CRUD 44 Create Read Update Delete Slides and Feedback at: http://joind.in/11012
  45. 45. SQL CRUD 45 Create INSERT INTO table (column1, column2) VALUES (9, 'string'); Read SELECT column1, column2 FROM table; Update UPDATE table SET column2 = 'text' WHERE column1= 9 Delete DELETE FROM table WHERE column2='text'
  46. 46. Key-Value Stores 46 Key Value code:java 17.316% Lowest rank on Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html codebucketdrinkbucket
  47. 47. Redis CRUD 47 http://redis.io/commands http://redis.io/topics/data-types-intro http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/ Redis is an in-memory Key-Value Store which stores: Strings, Hashes, Lists, Sets, or Ordered sets Strings: values of strings are concrete and can not be altered SET user:jim {lastname: ‘Mathews’, salary: 125000} GET user:jim Hashes: allows modification and retrieval of individual values HSET user:jim salary 125000 HSET user:jim lastname Mathews HGET jim salary
  48. 48. Redis CRUD 48 Lists: One-dimensional array with insert, append, pop, and push Redis.lpush(‘users:employees’, ‘user:jim’) redis.mget(redis.lrange(‘users:employess’,0,5)) Sets: lists with no duplicate values (SADD = Set Add) SADD users:employees jim SADD users:employees krishna SMEMBERS employees Sorted Sets: are sets with an added sorting value ZADD users:employees 125000 jim ZADD users:employees 157000 Krishna ZRANGEBYSCORE users:employees 100000 180000
  49. 49. Riak CRUD 49 Easy to install and configure test cluster REST Queries Create/PUT a “course:CIS2120” row curl –v –X PUT http://localhost:8091/riak/course/CIS2120 -H “Content-Type: application/json” -d ‘{“name”:”Database Coding”, “days”:”MWF”}’ Read/GET the value for “course:CIS2120” curl –X GET http://localhost:8091/riak/course/CIS2120 curl http://localhost:8091/riak/course/CIS2120 Key Value course:CIS2120 {“name”:”Database Coding”, “days”:”MWF”}
  50. 50. Riak Links 50 Riak can link on value to key:value to another with a relationship curl –v –X PUT http://localhost:8091/riak/student/sorensen -H “Content-Type: application/json” -H “Link: </riak/course/CIS2120>; riaktag=”enrolled”” -d ‘{“firstname”:”Conner”}’ This does not automatically create a link from “sorensen“ to “CIS2120”
  51. 51. Neo4j 51
  52. 52. Neo4j – Graph Database 52 http://www.neo4j.org/learn/try http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125
  53. 53. Neo4j CRUD 53 Must try dragging nodes at: http://www.neo4j.org/learn/try MATCH (user {name:“Bill"})-[:KNOWS]->(colleague) WHERE colleague.employer=“LinkedIn” RETURN user,colleague ORDER BY colleague.name LIMIT 10 http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” MATCH (n)-[r]->(m) RETURN n,r,m Matches any relationship between “n” and “m” http://www.neo4j.org/learn/cypher
  54. 54. Neo4j(LUKE {name:"Luke Skywalker"}), (HAN {name:"Han Solo"}), (LEIA {name:"Princess Leia Organa"}), (OBI_WAN {name:"Obi Wan Kenobi"}), (YODA {name : "Yoda"}), (VADER {name:"Darth Vader"}), (C3PO {name:"C3PO", droid:true}), (R2D2 {name:"R2D2", droid:true}), (CHEWBACCA {name:"Chewbacca"}), (TATOOINE {name:"Tatooine", distance:13184}), (DAGOBAH {name:"Dagobah", distance:15407}), (JEDI {name:"Jedi"}), (SITH {name:"Sith"}), (REBELLION {name:"Rebellion"}), (EMPIRE {name:"Empire"}), (DARK_SIDE {name:"Dark Side"}), (LIGHT_SIDE {name:"Light Side"}), … (LUKE)-[:FRIENDS_WITH]->(HAN), (LUKE)-[:FRIENDS_WITH]->(LEIA), (HAN)-[:FRIENDS_WITH]->(CHEWBACCA), (YODA)-[:TEACHES]->(OBI_WAN), (YODA)-[:TEACHES]->(LUKE), (OBI_WAN)-[:TEACHES]->(LUKE), (OBI_WAN)-[:KNOWS]->(VADER), (LUKE)-[:KNOWS]->(R2D2), (R2D2)-[:KNOWS]->(C3PO), (LUKE)-[:LIVED_ON]->(TATOOINE), (HAN)-[:LIVED_ON]->(CORELLIA), (LEIA)-[:LIVED_ON]->(ALDERAAN), (YODA)-[:LIVED_ON]->(DAGOBAH), (LUKE)-[:DEVOTED_TO]->(JEDI), (LUKE)-[:DEVOTED_TO]->(REBELLION), (LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE), (VADER)-[:DEVOTED_TO]->(SITH), (VADER)-[:DEVOTED_TO]->(EMPIRE), (VADER)-[:DEVOTED_TO]->(DARK_SIDE), (LEIA)-[:DEVOTED_TO]->(REBELLION), (HAN)-[:DEVOTED_TO]->(REBELLION) … https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125 MATCH y-[r]-other WHERE y.name='Yoda' return y.name, type(r), other.name 54
  55. 55. Google BigTable • White Paper published in 2006 • Many databases based upon BigTable • 13 pages, readable for many non-techies • Insightful into the early days of NoSQL http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf 55
  56. 56. Hbase 56 Large-Scale, Column-oriented database Consistency, Performance, Fault-Tolerant, ACID via Locking Tables are created before initial data is added Tables have row keys are indexed row identifier strings column families – contain one or more columns timestamp for version control
  57. 57. Hbase 57 Row key is a unifier for column families. If row does insert values in a column family no disk space is utilized within the column family. Keys are identified by column_family:column_name text: revision:author revision:comment Write-Ahead Logging (WAL) similar to file system journaling
  58. 58. Hbase CRUD 58 create ‘wiki_table’, ‘text_column_family’, ‘revision_column_family’ create ‘wiki’, ‘text’, ‘revision’ put ‘wiki’, ‘first page’, ‘text:’, ‘…’ put ‘wiki’, ‘first page’, ‘revision:author’, ‘…’ get ‘wiki’, ‘first page’, [‘revision:author’, ‘revision:comment’] delete ‘wiki’, ‘first page’, ‘revision:author’ scan ‘wiki’ = SELECT * FROM wiki Seven Databases in Seven Weeks, Redmond & Wilson 2012
  59. 59. MongoDB Document Store 59 db.courses.insert({ name: “CIS2120”, description: “Database Coding”, instructor: { name: “David Williams”, email: “david.williams@usu.edu” } instructor2: { name: “John Kerley-Weeks”, office: “JQL230” } subjects: [“Python”, “MongoDB”, “3NF”, “ETL”, “Star Schema”] })
  60. 60. MongoDB vs SQL 60 http://docs.mongodb.org/manual/reference/sql-comparison/ MongoDB -> SQL Terminology Collection -> Table Document -> Row Column -> Field db.courses.find() = SELECT * FROM courses db.courses.count() = SELECT COUNT(*) FROM courses db.courses.find({name: “CIS2120”})
  61. 61. MongoDB Simple Database 61 http://media.mongodb.org/zips.json {"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"} {"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"} {"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"} {"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"} {"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"} {"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"} {"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"} {"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"} {"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"} {"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"} {"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"} {"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"} {"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"} {"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"} {“city”: “Logan, UT”, “additionally”: [“Nibley, UT”, “River Heights, UT”], “state”: “UT”, “version”: “2.1”, “_id”: “84321”} {“city”: “Olivehurst, CA”, “additionally”: [“Arboga, CA”, “Plumas Lake, CA”, “West Linda, CA”], “state”: “CA”, “version”: “2.1”, “_id”: “95961”}
  62. 62. Cassandra Characteristics 62 Scalable, High-availability Wide-columnar datastore Peer-to-peer rather than master-slave clusters Tunable consistency can read/write to a single node, quorum of nodes or all nodes Recommends static and dynamic column families Static column families have contain pre-defined columns Contact Info: phone, address, email, web Dynamic families have variable numbers of similar columns Students enrolled in a course
  63. 63. Cassandra CRUD 63 http://www.datastax.com/docs/0.8/references/cql http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt CREATE TABLE course ( name text PRIMARY KEY, instructor text, maxstudents int ) INSERT INTO course (name, instructor, maxstudents) VALUES (‘CIS2120’, ‘Williams’, 28) UPDATE course SET maxstudents=26 WHERE name=‘CIS2120’ SELECT name, instructor FROM course WHERE maxstudents > 20
  64. 64. Cassandra CRUD 64 No JOIN operations or FOREIGN KEYS CREATE TABLE people ( name text PRIMARY KEY, email text, phones map<text, text> ) INSERT INTO people (name, email, phones) VALUES (‘John Weeks’, ‘john.weeks@usu.edu’, {‘mobile’ : ‘555-1212’, ‘office’ : ‘797-7133’, ‘fax’ : ‘555-1212’}) UPDATE people SET phones[‘office’] = ‘555-1212’ WHERE email = ‘john.weeks@usu.edu’
  65. 65. Questions ??? 65 Slides and Feedback at: http://joind.in/11012

×