NoSQL Introduction
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

NoSQL Introduction

on

  • 64 views

Understand what NoSQL is and what it is not. Why would you want to use NoSQL within your project and which NoSQL database would you utilize. Explore the relationships between NoSQL and RDBMS. ...

Understand what NoSQL is and what it is not. Why would you want to use NoSQL within your project and which NoSQL database would you utilize. Explore the relationships between NoSQL and RDBMS. Understand how to select between an RDBMs (MySQL and PostgreSQL), Document Database(MongoDB), Key-Value Store, Graph Database, and Columnar databases or combinations of the above.

Statistics

Views

Total Views
64
Views on SlideShare
61
Embed Views
3

Actions

Likes
1
Downloads
3
Comments
0

1 Embed 3

https://www.linkedin.com 3

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

NoSQL Introduction Presentation Transcript

  • 1. NoSQL Introduction Slides and Feedback at: http://joind.in/11012
  • 2. NoSQL Introduction • Understand what NoSQL is and what it is not. • Why would you want to use NoSQL within your project and which NoSQL database would you utilize? • Explore the relationships between NoSQL and RDBMS. • Understand how to select between an RDBMs (MySQL and PostgreSQL), Document Database (MongoDB), Key- Value Store, Graph Database, and Columnar databases or combinations of the above. Thursday May 8th 2014, 3:00pm-3:50pm SB 139 Slides and Feedback at: http://joind.in/11012 2
  • 3. NoSQL • History • Popular NoSQL Databases • NoSQL Database Comparisons • Terminology • Consistency, Replication, Performance • NoSQL Implementation CRUD Operations 3 Slides and Feedback at: http://joind.in/11012
  • 4. NoSQL Introduction • NoSQL is a commonly adopted misnomer • Typically does not use ANSI SQL – SQL = Structured Query Language – Structure exists but is more Flexible – Queries are performed – Language is closer to Programming Languages 4 Slides and Feedback at: http://joind.in/11012
  • 5. NoSQL History 5 http://www.w3resource.com/mongodb/nosql.php
  • 6. NoSQL History • 1998 Carlo Strozzi Command Line Database • June 11, 2009 Meetup – Open Source, Distributed, Non-Relational DB – Eric Evans (Rackspace) – Johan Oskarsson (Last.fm) 6
  • 7. NoSQL History 7
  • 8. NoSQL History • Bad name, but it stuck! • Not a definitive term • Generally, Newer databases solving new and different problems • Not Only SQL http://blog.sym- link.com/2009/10/30/nosql_whats_in_a_name.html 8 Slides and Feedback at: http://joind.in/11012
  • 9. NoSQL Origination • Problems not solved by RDBMs • Limitations of RDBMs, not SQL 9 Slides and Feedback at: http://joind.in/11012
  • 10. Most Popular Databases http://db-engines.com/en/ranking Ranking by: Web Content, Web Searches, Technical Discussion, Jobs, Resumes 10
  • 11. Most Popular NoSQL • MongoDB - Document Store • Cassandra – Wide Column Store • Solr – Search Engine • Redis – Key-value store • Hbase – Wide Column Store • Memcached – Key-value Store • CouchDB – Document Store • Neo4j – Graph Database • Riak – Key-value Store • SimpleDB – Key-value Store within Amazon Cloud 11 Slides and Feedback at: http://joind.in/11012
  • 12. 12 NoSQL vs RDBMs Image Reference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/
  • 13. Reading Recommendations Great Overview of NoSQL: Seven Databases in Seven Weeks Eric Redmond and Jim Wilson 13
  • 14. NoSQL “Bleeding Edge” • Several solutions are mature and stable enough to run large scale production environments • Not all permutations have been considered • Several (but not all) optimization strategies have been published • Crucial elements such as Security may be a secondary add-on in favor of performance. 14
  • 15. NoSQL “Bleeding Edge” Sun Microsystems csh man page: “Although robust enough for general use, adventures into the esoteric periphery of the C shell may reveal unexpected quirks.” 15 Slides and Feedback at: http://joind.in/11012
  • 16. NoSQL Comparison 16 Take note of patterns: Recent Release, Open Source, Utilized at High-Volume sites Variety of Formats: Key-Value, Wide-Column, Document, Graph http://db-engines.com/en/ranking
  • 17. NoSQL Database Types 17 • Key-Value • Column Oriented Databases (Columnar) • Graph • Document • Search Database - Solr • Key-Value Web Optimization - Memcached
  • 18. Key-Value Stores 18 Key Value code:java 17.316% Lowest rank on Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html Codebucketdrinkbucket
  • 19. Column Oriented Database 19
  • 20. Neo4j 20
  • 21. Document Oriented Database 21 { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }
  • 22. Document Oriented Database 22 { "facutly" : [ { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }, { "_id" : 2, "name" : { "first" : "David", "last" : "Williams" }, "contribs" : [ "C#", "Java", "PHP" ], "awards" : [ { "award" : "Sherman Peabody Award II", "year" : 2095, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IX", "year" : 2090, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IV", "year" : 2093, "location" : "Paris", "by" : "Intergalactic Continuum" } ] } ] }
  • 23. Document Oriented Database 23 http://chris.photobooks.com/json/ Slides and Feedback at: http://joind.in/11012
  • 24. 24 http://visualizer.json2html.com/
  • 25. NoSQL Comparison 25 No ANSI SQL Standards, No Predefined Schemas, Replication, Eventual Consistency, Rarely Foreign Keys, Data Types not required Newer Concepts: Sharding, REST API, JSON, MapReduce
  • 26. NoSQL Characteristics 26 No Predefined Schemas • May insert data without creating a table • Schema Versions (v1.5, v1.6, v1.7,…) Rarely Foreign Keys • No JOIN operations • Relationships are not automatically maintained Eventual Consistency • Old copies being replaced by new records • Inconsistent data until all replacements are complete
  • 27. Download NoSQL v95141.3 27 Released 4/1/2014 http://www.nosql.org/downloads/ymbkm.zip
  • 28. NoSQL Terminology and Concepts 28 Slides and Feedback at: http://joind.in/11012
  • 29. Sharding 29 Partitions – Data distributed across disks Sharding – Data distributed across servers
  • 30. Map Reduce 30 Divides work across distributed systems Parallel processing of large data sets Divide – Conquer – Consolidate Often Implement by defining Map and Reduce classes or functions 1+2+3+6+7+8+9=? Google’s MapReduce Programming Model – Revisited Ralf Lammel, Microsoft, 2008 http://www.sciencedirect.com/science/article/pii/S0167642307001281 2 6 8 1 7 3 9 16 20 36
  • 31. JSON 31 Subset of JavaScript Object Notation Similarities to XML method for representing data Syntax Name : Value pairs “salary” : “125000” Values are: number, string, Boolean, array, object, or NULL Objects can store Objects, Arrays can store Arrays Separate pairs by commas “salary” : “125000”, “gender” : “male” Curly braces denote objects { “salary” : “125000”, “gender” : “male” } Square brackets denote arrays “phone” : [”555-1212”, ”555-3344”] “phone” : [ {“office” : ”555-1212”}, {“mobile” : ”555-3344”} ]
  • 32. JSON Example 32 { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] } http://www.mongodb.com/json-and-bson
  • 33. REST API 33 CRUD (Create, Read, Update, Delete) operations through the web HTTP Methods GET (List/Read) POST(Update) PUT(Create) DELETE(Delete) EXAMPLE API http://www.blinksale.com/api/ List/Read Data via HTTP GET to http://www.blinksale.com/invoices http://www.blinksale.com/invoices/invoice_id/payments http://www.blinksale.com/invoices/?start=2006&end=2008 Returns XML results
  • 34. REST API 34 Update data via HTTP POST to http://www.blinksale.com/invoices/invoice_id/payments <?xml version="1.0" encoding="UTF-8"?> <payment xmlns="http://www.blinksale.com/api"> <amount>1000.00</amount> <date>2006-09-27</date> </payment> REST = REpresentational State Transfer Twitter Example: https://dev.twitter.com/docs/api/1.1 (GET and POST only)
  • 35. Database SELECT Statements 35 Oracle SELECT * FROM relationships MongoDB db.relationships.find() Cassandra (CQL) SELECT * FROM relationships Slides and Feedback at: http://joind.in/11012
  • 36. Database SELECT Statements 36 Redis – Key-Value Store SMEMBERS relationships Riak – Key-Value Store with REST API (+ proprietary drivers) http://localhost:8091/riak/relationships/likes Neo4j (Cypher) MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Slides and Feedback at: http://joind.in/11012
  • 37. JOINS without Foreign Keys 37 original_id = ObjectId() db.employer.insert({ "_id": original_id, "name": "Broadway Tech", "url": "bc.example.net" }) db.people.insert({ "name": "Erin", “employer_id": original_id, "url": "bc.example.net/Erin" }) “Erin” works at “Broadway Tech” One of the employees at “Broadway Tech” is “Erin” http://docs.mongodb.org/manual/reference/database-references/#document-references
  • 38. Replication Challenge is Write Consistency 38
  • 39. ACID, BASE, CAP, CPR 1979 Gray, 1983 Reuter & Härder - ACID Atomic, Consistent, Isolated, Durable Rollback: All or Nothing, Follows Rules, Simultaneous, No Drops 1997 Brewer - BASE Basically Available, Soft-state, Eventually consistent 2000 Brewer – CAP (Pick Two) Consistency, Availability, Partition Tolerance CPR (Pick Two) Consistency, Performance, Replication/Redundancy Contrived - Stretch Definitions 39
  • 40. CPR 40 Consistency Redundancy Performance Pick Two
  • 41. CPR 41 Consistency Performance A DCB Spread data across storage or computer Redundancy
  • 42. 42 Redundancy Performance Consistency ABCE ABCDABCDABCE Updates may be inconsistent across devices
  • 43. 43 Consistency Redundancy Performance ABCD ABCDABCDABCD One Update Locks all Nodes
  • 44. CRUD 44 Create Read Update Delete Slides and Feedback at: http://joind.in/11012
  • 45. SQL CRUD 45 Create INSERT INTO table (column1, column2) VALUES (9, 'string'); Read SELECT column1, column2 FROM table; Update UPDATE table SET column2 = 'text' WHERE column1= 9 Delete DELETE FROM table WHERE column2='text'
  • 46. Key-Value Stores 46 Key Value code:java 17.316% Lowest rank on Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html codebucketdrinkbucket
  • 47. Redis CRUD 47 http://redis.io/commands http://redis.io/topics/data-types-intro http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/ Redis is an in-memory Key-Value Store which stores: Strings, Hashes, Lists, Sets, or Ordered sets Strings: values of strings are concrete and can not be altered SET user:jim {lastname: ‘Mathews’, salary: 125000} GET user:jim Hashes: allows modification and retrieval of individual values HSET user:jim salary 125000 HSET user:jim lastname Mathews HGET jim salary
  • 48. Redis CRUD 48 Lists: One-dimensional array with insert, append, pop, and push Redis.lpush(‘users:employees’, ‘user:jim’) redis.mget(redis.lrange(‘users:employess’,0,5)) Sets: lists with no duplicate values (SADD = Set Add) SADD users:employees jim SADD users:employees krishna SMEMBERS employees Sorted Sets: are sets with an added sorting value ZADD users:employees 125000 jim ZADD users:employees 157000 Krishna ZRANGEBYSCORE users:employees 100000 180000
  • 49. Riak CRUD 49 Easy to install and configure test cluster REST Queries Create/PUT a “course:CIS2120” row curl –v –X PUT http://localhost:8091/riak/course/CIS2120 -H “Content-Type: application/json” -d ‘{“name”:”Database Coding”, “days”:”MWF”}’ Read/GET the value for “course:CIS2120” curl –X GET http://localhost:8091/riak/course/CIS2120 curl http://localhost:8091/riak/course/CIS2120 Key Value course:CIS2120 {“name”:”Database Coding”, “days”:”MWF”}
  • 50. Riak Links 50 Riak can link on value to key:value to another with a relationship curl –v –X PUT http://localhost:8091/riak/student/sorensen -H “Content-Type: application/json” -H “Link: </riak/course/CIS2120>; riaktag=”enrolled”” -d ‘{“firstname”:”Conner”}’ This does not automatically create a link from “sorensen“ to “CIS2120”
  • 51. Neo4j 51
  • 52. Neo4j – Graph Database 52 http://www.neo4j.org/learn/try http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125
  • 53. Neo4j CRUD 53 Must try dragging nodes at: http://www.neo4j.org/learn/try MATCH (user {name:“Bill"})-[:KNOWS]->(colleague) WHERE colleague.employer=“LinkedIn” RETURN user,colleague ORDER BY colleague.name LIMIT 10 http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” MATCH (n)-[r]->(m) RETURN n,r,m Matches any relationship between “n” and “m” http://www.neo4j.org/learn/cypher
  • 54. Neo4j(LUKE {name:"Luke Skywalker"}), (HAN {name:"Han Solo"}), (LEIA {name:"Princess Leia Organa"}), (OBI_WAN {name:"Obi Wan Kenobi"}), (YODA {name : "Yoda"}), (VADER {name:"Darth Vader"}), (C3PO {name:"C3PO", droid:true}), (R2D2 {name:"R2D2", droid:true}), (CHEWBACCA {name:"Chewbacca"}), (TATOOINE {name:"Tatooine", distance:13184}), (DAGOBAH {name:"Dagobah", distance:15407}), (JEDI {name:"Jedi"}), (SITH {name:"Sith"}), (REBELLION {name:"Rebellion"}), (EMPIRE {name:"Empire"}), (DARK_SIDE {name:"Dark Side"}), (LIGHT_SIDE {name:"Light Side"}), … (LUKE)-[:FRIENDS_WITH]->(HAN), (LUKE)-[:FRIENDS_WITH]->(LEIA), (HAN)-[:FRIENDS_WITH]->(CHEWBACCA), (YODA)-[:TEACHES]->(OBI_WAN), (YODA)-[:TEACHES]->(LUKE), (OBI_WAN)-[:TEACHES]->(LUKE), (OBI_WAN)-[:KNOWS]->(VADER), (LUKE)-[:KNOWS]->(R2D2), (R2D2)-[:KNOWS]->(C3PO), (LUKE)-[:LIVED_ON]->(TATOOINE), (HAN)-[:LIVED_ON]->(CORELLIA), (LEIA)-[:LIVED_ON]->(ALDERAAN), (YODA)-[:LIVED_ON]->(DAGOBAH), (LUKE)-[:DEVOTED_TO]->(JEDI), (LUKE)-[:DEVOTED_TO]->(REBELLION), (LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE), (VADER)-[:DEVOTED_TO]->(SITH), (VADER)-[:DEVOTED_TO]->(EMPIRE), (VADER)-[:DEVOTED_TO]->(DARK_SIDE), (LEIA)-[:DEVOTED_TO]->(REBELLION), (HAN)-[:DEVOTED_TO]->(REBELLION) … https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125 MATCH y-[r]-other WHERE y.name='Yoda' return y.name, type(r), other.name 54
  • 55. Google BigTable • White Paper published in 2006 • Many databases based upon BigTable • 13 pages, readable for many non-techies • Insightful into the early days of NoSQL http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf 55
  • 56. Hbase 56 Large-Scale, Column-oriented database Consistency, Performance, Fault-Tolerant, ACID via Locking Tables are created before initial data is added Tables have row keys are indexed row identifier strings column families – contain one or more columns timestamp for version control
  • 57. Hbase 57 Row key is a unifier for column families. If row does insert values in a column family no disk space is utilized within the column family. Keys are identified by column_family:column_name text: revision:author revision:comment Write-Ahead Logging (WAL) similar to file system journaling
  • 58. Hbase CRUD 58 create ‘wiki_table’, ‘text_column_family’, ‘revision_column_family’ create ‘wiki’, ‘text’, ‘revision’ put ‘wiki’, ‘first page’, ‘text:’, ‘…’ put ‘wiki’, ‘first page’, ‘revision:author’, ‘…’ get ‘wiki’, ‘first page’, [‘revision:author’, ‘revision:comment’] delete ‘wiki’, ‘first page’, ‘revision:author’ scan ‘wiki’ = SELECT * FROM wiki Seven Databases in Seven Weeks, Redmond & Wilson 2012
  • 59. MongoDB Document Store 59 db.courses.insert({ name: “CIS2120”, description: “Database Coding”, instructor: { name: “David Williams”, email: “david.williams@usu.edu” } instructor2: { name: “John Kerley-Weeks”, office: “JQL230” } subjects: [“Python”, “MongoDB”, “3NF”, “ETL”, “Star Schema”] })
  • 60. MongoDB vs SQL 60 http://docs.mongodb.org/manual/reference/sql-comparison/ MongoDB -> SQL Terminology Collection -> Table Document -> Row Column -> Field db.courses.find() = SELECT * FROM courses db.courses.count() = SELECT COUNT(*) FROM courses db.courses.find({name: “CIS2120”})
  • 61. MongoDB Simple Database 61 http://media.mongodb.org/zips.json {"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"} {"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"} {"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"} {"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"} {"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"} {"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"} {"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"} {"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"} {"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"} {"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"} {"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"} {"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"} {"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"} {"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"} {“city”: “Logan, UT”, “additionally”: [“Nibley, UT”, “River Heights, UT”], “state”: “UT”, “version”: “2.1”, “_id”: “84321”} {“city”: “Olivehurst, CA”, “additionally”: [“Arboga, CA”, “Plumas Lake, CA”, “West Linda, CA”], “state”: “CA”, “version”: “2.1”, “_id”: “95961”}
  • 62. Cassandra Characteristics 62 Scalable, High-availability Wide-columnar datastore Peer-to-peer rather than master-slave clusters Tunable consistency can read/write to a single node, quorum of nodes or all nodes Recommends static and dynamic column families Static column families have contain pre-defined columns Contact Info: phone, address, email, web Dynamic families have variable numbers of similar columns Students enrolled in a course
  • 63. Cassandra CRUD 63 http://www.datastax.com/docs/0.8/references/cql http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt CREATE TABLE course ( name text PRIMARY KEY, instructor text, maxstudents int ) INSERT INTO course (name, instructor, maxstudents) VALUES (‘CIS2120’, ‘Williams’, 28) UPDATE course SET maxstudents=26 WHERE name=‘CIS2120’ SELECT name, instructor FROM course WHERE maxstudents > 20
  • 64. Cassandra CRUD 64 No JOIN operations or FOREIGN KEYS CREATE TABLE people ( name text PRIMARY KEY, email text, phones map<text, text> ) INSERT INTO people (name, email, phones) VALUES (‘John Weeks’, ‘john.weeks@usu.edu’, {‘mobile’ : ‘555-1212’, ‘office’ : ‘797-7133’, ‘fax’ : ‘555-1212’}) UPDATE people SET phones[‘office’] = ‘555-1212’ WHERE email = ‘john.weeks@usu.edu’
  • 65. Questions ??? 65 Slides and Feedback at: http://joind.in/11012