NoSQL Introduction
Slides and Feedback at: http://joind.in/11012
NoSQL Introduction
• Understand what NoSQL is and what it is not.
• Why would you want to use NoSQL within your project,
and which NoSQL database would you utilize?
• Explore the relationships between NoSQL and RDBMS.
• Understand how to select between an RDBMS (MySQL
and PostgreSQL), Document Database (MongoDB),
Key-Value Store, Graph Database, and Columnar database, or
combinations of the above.
Thursday May 8th 2014, 3:00pm-3:50pm SB 139
NoSQL
• History
• Popular NoSQL Databases
• NoSQL Database Comparisons
• Terminology
• Consistency, Replication, Performance
• NoSQL Implementation CRUD Operations
NoSQL Introduction
• NoSQL is a commonly adopted misnomer
• Typically does not use ANSI SQL
– SQL = Structured Query Language
– Structure exists but is more Flexible
– Queries are performed
– Language is closer to Programming Languages
NoSQL History
http://www.w3resource.com/mongodb/nosql.php
NoSQL History
• 1998 Carlo Strozzi Command Line Database
• June 11, 2009 Meetup
– Open Source, Distributed, Non-Relational DB
– Eric Evans (Rackspace)
– Johan Oskarsson (Last.fm)
NoSQL History
• Bad name, but it stuck!
• Not a definitive term
• Generally, newer databases solving new
and different problems
• Not Only SQL
http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html
NoSQL Origination
• Problems not solved by RDBMS
• Limitations of RDBMS, not SQL
Most Popular Databases
http://db-engines.com/en/ranking
Ranking by: Web Content, Web Searches, Technical Discussion, Jobs, Resumes
Most Popular NoSQL
• MongoDB - Document Store
• Cassandra - Wide Column Store
• Solr - Search Engine
• Redis - Key-Value Store
• HBase - Wide Column Store
• Memcached - Key-Value Store
• CouchDB - Document Store
• Neo4j - Graph Database
• Riak - Key-Value Store
• SimpleDB - Key-Value Store within Amazon Cloud
NoSQL vs RDBMS
Image Reference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/
Reading Recommendations
Great Overview of NoSQL:
Seven Databases in Seven Weeks
Eric Redmond and Jim Wilson
NoSQL “Bleeding Edge”
• Several solutions are mature and stable
enough to run large scale production
environments
• Not all permutations have been considered
• Several (but not all) optimization strategies
have been published
• Crucial elements such as Security may be a
secondary add-on in favor of performance.
NoSQL “Bleeding Edge”
Sun Microsystems csh man page:
“Although robust enough for
general use, adventures into the
esoteric periphery of the C shell
may reveal unexpected quirks.”
NoSQL Comparison
Take note of patterns:
Recent Release, Open Source, Utilized at High-Volume sites
Variety of Formats:
Key-Value, Wide-Column, Document, Graph
http://db-engines.com/en/ranking
NoSQL Database Types
• Key-Value
• Column Oriented Databases (Columnar)
• Graph
• Document
• Search Database - Solr
• Key-Value Web Optimization - Memcached
Key-Value Stores
Code bucket
Key                 Value
code:java           17.316% Lowest rank on Feb 2014
code:C              18.334% Lowest rank on August 2013
code:Objective-C    Lowest rank on Dec 2007 11.341%
code:C++            {"score": "6.892%", "low rank": "Feb 2008"}
Drink bucket
Key                 Value
drink:java          coffee
drink:punch         Sprite + pineapple juice
drink:pop           Carbonated Soda
http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
Column Oriented Database
Neo4j
Document Oriented Database
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
Document Oriented Database
{ "faculty" :
[
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
},
{
"_id" : 2,
"name" : { "first" : "David", "last" : "Williams" },
"contribs" : [ "C#", "Java", "PHP" ],
"awards" : [
{ "award" : "Sherman Peabody Award II",
"year" : 2095,
"location" : "Paris",
"by" : "Intergalactic Continuum" },
{ "award" : "Sherman Peabody Award IX",
"year" : 2090,
"location" : "Paris",
"by" : "Intergalactic Continuum" },
{ "award" : "Sherman Peabody Award IV",
"year" : 2093,
"location" : "Paris",
"by" : "Intergalactic Continuum" }
]
}
]
}
Document Oriented Database
http://chris.photobooks.com/json/
http://visualizer.json2html.com/
NoSQL Comparison
No ANSI SQL Standards, No Predefined Schemas, Replication,
Eventual Consistency, Rarely Foreign Keys, Data Types not required
Newer Concepts: Sharding, REST API, JSON, MapReduce
NoSQL Characteristics
No Predefined Schemas
• May insert data without creating a table
• Schema Versions (v1.5, v1.6, v1.7,…)
Rarely Foreign Keys
• No JOIN operations
• Relationships are not automatically maintained
Eventual Consistency
• Old copies being replaced by new records
• Inconsistent data until all replacements are complete
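The eventual-consistency bullets above can be illustrated with a small in-memory simulation. This is only a sketch: the Replica and Cluster classes, the three-replica setup, and the "salary" key are assumptions made for illustration, not the behavior of any particular NoSQL product.

```python
from collections import deque


class Replica:
    """One copy of the data; it only changes when an update is applied to it."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Hypothetical cluster: a write lands on one replica and is queued for the rest."""

    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        # The first replica accepts the write immediately.
        self.replicas[0].data[key] = value
        # The other replicas receive the new record later.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Apply every queued replacement; afterwards all copies agree.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index].data[key] = value


cluster = Cluster(replica_count=3)
cluster.write("salary", 125000)
stale = cluster.read(2, "salary")   # replica 2 has not seen the write yet
cluster.propagate()
fresh = cluster.read(2, "salary")   # all replacements are complete
```

Until propagate() runs, a read from replica 2 returns nothing for the key, which is the window of inconsistent data the slide describes.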
Download NoSQL v95141.3
Released 4/1/2014
http://www.nosql.org/downloads/ymbkm.zip
NoSQL Terminology and Concepts
Sharding
Partitions – Data distributed across disks
Sharding – Data distributed across servers
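One common way to decide which server holds a record is to hash its key. The sketch below assumes three made-up server names and a simple hash-modulo rule; real databases typically use more elaborate schemes (for example consistent hashing or range-based shards).

```python
import hashlib

# Hypothetical server names, used only for illustration.
SERVERS = ["db-server-0", "db-server-1", "db-server-2"]


def shard_for(key, servers=SERVERS):
    """Pick a server by hashing the key, so the same key always maps to the same shard."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


keys = ["user:jim", "user:krishna", "course:CIS2120"]
placement = {key: shard_for(key) for key in keys}
```

A drawback of plain modulo hashing is that adding or removing a server changes the placement of most keys, which is one reason production systems prefer other schemes.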
Map Reduce
Divides work across distributed systems
Parallel processing of large data sets
Divide - Conquer - Consolidate
Often implemented by defining Map and Reduce classes or functions
1+2+3+6+7+8+9=?
Divide: (2, 6, 8) and (1, 7, 3, 9)
Conquer: 2+6+8=16 and 1+7+3+9=20
Consolidate: 16+20=36
Google's MapReduce Programming Model - Revisited, Ralf Lämmel, Microsoft, 2008
http://www.sciencedirect.com/science/article/pii/S0167642307001281
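The same sum can be sketched in a few lines of Python. The split into two "workers" and the function names are assumptions for illustration; a real MapReduce framework would run the map step on separate machines.

```python
from functools import reduce

numbers = [1, 2, 3, 6, 7, 8, 9]


def split(values, workers):
    """Divide: deal the values out into one chunk per worker."""
    return [values[i::workers] for i in range(workers)]


def map_chunk(chunk):
    """Conquer: each worker sums only its own chunk."""
    return sum(chunk)


def reduce_partials(left, right):
    """Consolidate: combine two partial sums."""
    return left + right


chunks = split(numbers, workers=2)           # [[1, 3, 7, 9], [2, 6, 8]]
partials = [map_chunk(chunk) for chunk in chunks]
total = reduce(reduce_partials, partials)    # 20 + 16 = 36
```

Because addition is associative, the partial sums can be combined in any order, which is what makes the work safe to distribute.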
JSON
JavaScript Object Notation, a subset of JavaScript syntax
Similar to XML as a method for representing data
Syntax
Name : Value pairs
"salary" : "125000"
Values are: number, string, Boolean, array, object, or null
Objects can store Objects, Arrays can store Arrays
Separate pairs by commas
"salary" : "125000", "gender" : "male"
Curly braces denote objects
{ "salary" : "125000", "gender" : "male" }
Square brackets denote arrays
"phone" : ["555-1212", "555-3344"]
"phone" : [ {"office" : "555-1212"}, {"mobile" : "555-3344"} ]
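As a quick check of these syntax rules, the sketch below parses a small JSON document with Python's standard json module. The "manager" and "active" fields are hypothetical additions, included only to show the null and Boolean value types.

```python
import json

# "manager" and "active" are made-up fields that demonstrate null and Boolean values.
text = """
{
  "salary": 125000,
  "gender": "male",
  "manager": null,
  "active": true,
  "phone": [ {"office": "555-1212"}, {"mobile": "555-3344"} ]
}
"""

record = json.loads(text)                 # JSON object -> Python dict
office = record["phone"][0]["office"]     # arrays become lists, nested objects become dicts
round_trip = json.dumps(record, sort_keys=True)
```

JSON null maps to Python None and true maps to True, so the parsed record can be used like any ordinary dictionary.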
JSON Example
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
http://www.mongodb.com/json-and-bson
REST API
CRUD (Create, Read, Update, Delete) operations through the web
HTTP Methods
GET (List/Read)
POST (Create; some APIs also use it for Update)
PUT (Create or Replace at a known URL)
DELETE (Delete)
EXAMPLE API http://www.blinksale.com/api/
List/Read Data via HTTP GET to
http://www.blinksale.com/invoices
http://www.blinksale.com/invoices/invoice_id/payments
http://www.blinksale.com/invoices/?start=2006&end=2008
Returns XML results
REST API
Update data via HTTP POST to
http://www.blinksale.com/invoices/invoice_id/payments
<?xml version="1.0" encoding="UTF-8"?>
<payment xmlns="http://www.blinksale.com/api">
  <amount>1000.00</amount>
  <date>2006-09-27</date>
</payment>
REST = REpresentational State Transfer
Twitter Example:
https://dev.twitter.com/docs/api/1.1 (GET and POST only)
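The requests described above can be sketched with Python's standard urllib. The code only builds the request objects and does not send them; the invoice id 42 and the build_request helper are assumptions for illustration, and a real call would also need the credentials the API requires.

```python
import urllib.request

BASE = "http://www.blinksale.com"   # host from the slide; nothing is sent in this sketch


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the given method and path."""
    data = body.encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/xml")
    return request


payment_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<payment xmlns="http://www.blinksale.com/api">'
    "<amount>1000.00</amount><date>2006-09-27</date>"
    "</payment>"
)

# 42 is a hypothetical invoice id standing in for invoice_id.
list_invoices = build_request("GET", "/invoices")
add_payment = build_request("POST", "/invoices/42/payments", payment_xml)
```

Passing either object to urllib.request.urlopen would perform the call; that step is left out here so the sketch runs without network access.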
Database SELECT Statements
Oracle
SELECT * FROM relationships
MongoDB
db.relationships.find()
Cassandra (CQL)
SELECT * FROM relationships
Database SELECT Statements
Redis – Key-Value Store
SMEMBERS relationships
Riak – Key-Value Store with REST API (+ proprietary drivers)
http://localhost:8091/riak/relationships/likes
Neo4j (Cypher)
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
JOINS without Foreign Keys
original_id = ObjectId()
db.employer.insert({
"_id": original_id,
"name": "Broadway Tech",
"url": "bc.example.net" })
db.people.insert({
"name": "Erin",
"employer_id": original_id,
"url": "bc.example.net/Erin" })
“Erin” works at “Broadway Tech”
One of the employees at “Broadway Tech” is “Erin”
http://docs.mongodb.org/manual/reference/database-references/#document-references
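Without foreign keys, the application follows the stored reference itself. The sketch below imitates the two collections with plain Python structures; the uuid-based id stands in for ObjectId(), and the employer_of and employees_of helpers are hypothetical names, not MongoDB API calls.

```python
import uuid

employers = {}   # stands in for the db.employer collection, keyed by _id
people = []      # stands in for the db.people collection

original_id = uuid.uuid4().hex   # stand-in for ObjectId()
employers[original_id] = {
    "_id": original_id,
    "name": "Broadway Tech",
    "url": "bc.example.net",
}
people.append({
    "name": "Erin",
    "employer_id": original_id,   # manual reference, not an enforced foreign key
    "url": "bc.example.net/Erin",
})


def employer_of(person):
    """Second lookup: follow the stored reference to the employer document."""
    return employers[person["employer_id"]]


def employees_of(employer_id):
    """Reverse direction: scan people for a matching reference."""
    return [person for person in people if person["employer_id"] == employer_id]


erin = people[0]
```

Nothing stops an employer document from being deleted while people still reference it, which is what "relationships are not automatically maintained" means in practice.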
Replication Challenge is Write Consistency
ACID, BASE, CAP, CPR
1979 Gray, 1983 Reuter & Härder - ACID
Atomic, Consistent, Isolated, Durable
Rollback: All or Nothing, Follows Rules, Simultaneous, No Drops
1997 Brewer - BASE
Basically Available, Soft-state, Eventually consistent
2000 Brewer – CAP (Pick Two)
Consistency, Availability, Partition Tolerance
CPR (Pick Two)
Consistency, Performance, Replication/Redundancy
Contrived - Stretch Definitions
CPR
Consistency, Redundancy, Performance: Pick Two
CPR: Consistency + Performance
Nodes: A | D | C | B
Spread data across storage or computers
CPR: Redundancy + Performance
Nodes: ABCE | ABCD | ABCD | ABCE
Updates may be inconsistent across devices
CPR: Consistency + Redundancy
Nodes: ABCD | ABCD | ABCD | ABCD
One Update Locks all Nodes
CRUD
Create
Read
Update
Delete
SQL CRUD
Create
INSERT INTO table (column1, column2) VALUES (9, 'string');
Read
SELECT column1, column2 FROM table;
Update
UPDATE table SET column2 = 'text' WHERE column1 = 9;
Delete
DELETE FROM table WHERE column2 = 'text';
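The four statements can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the table is named items (since "table" itself is a reserved word) and the column types are guesses.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (column1 INTEGER, column2 TEXT)")

# Create
connection.execute("INSERT INTO items (column1, column2) VALUES (?, ?)", (9, "string"))
# Read
created = connection.execute("SELECT column1, column2 FROM items").fetchall()
# Update
connection.execute("UPDATE items SET column2 = ? WHERE column1 = ?", ("text", 9))
updated = connection.execute("SELECT column1, column2 FROM items").fetchall()
# Delete
connection.execute("DELETE FROM items WHERE column2 = ?", ("text",))
remaining = connection.execute("SELECT COUNT(*) FROM items").fetchone()[0]

connection.close()
```

Unlike the NoSQL examples that follow, the table and its columns must exist before the first INSERT.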
Key-Value Stores
Redis CRUD
http://redis.io/commands
http://redis.io/topics/data-types-intro
http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/
Redis is an in-memory Key-Value Store which stores:
Strings, Hashes, Lists, Sets, or Sorted Sets
Strings: the value is a single opaque string; individual fields inside it cannot be altered
SET user:jim '{"lastname": "Mathews", "salary": 125000}'
GET user:jim
Hashes: allow modification and retrieval of individual values
HSET user:jim salary 125000
HSET user:jim lastname Mathews
HGET user:jim salary
Redis CRUD
Lists: One-dimensional array with insert, append, pop, and
push
redis.lpush('users:employees', 'user:jim')
redis.mget(redis.lrange('users:employees', 0, 5))
Sets: lists with no duplicate values (SADD = Set Add)
SADD users:employees jim
SADD users:employees krishna
SMEMBERS users:employees
Sorted Sets: sets with an added sorting value
ZADD users:employees 125000 jim
ZADD users:employees 157000 Krishna
ZRANGEBYSCORE users:employees 100000 180000
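To make the semantics of these data types concrete without a running server, the sketch below models them with Python built-ins. It is an analogy only: the variable names and the zrangebyscore helper are assumptions, not the Redis client API.

```python
# String: one opaque value; changing the salary means rewriting the whole string.
user_string = '{"lastname": "Mathews", "salary": 125000}'

# Hash: fields can be set and read individually (compare HSET / HGET).
user_hash = {}
user_hash["salary"] = "125000"
user_hash["lastname"] = "Mathews"
salary = user_hash["salary"]

# List: ordered, duplicates allowed; LPUSH adds at the head.
employees_list = []
employees_list.insert(0, "user:jim")

# Set: no duplicates; adding "jim" twice leaves a single member.
employees_set = set()
for member in ["jim", "krishna", "jim"]:
    employees_set.add(member)

# Sorted set: each member carries a score (compare ZADD).
scores = {"jim": 125000, "Krishna": 157000}


def zrangebyscore(scored_members, low, high):
    """Return members with low <= score <= high, ordered by ascending score."""
    matches = [(score, member) for member, score in scored_members.items()
               if low <= score <= high]
    return [member for score, member in sorted(matches)]


in_range = zrangebyscore(scores, 100000, 180000)
```

In a real Redis instance each of these structures would live under its own key, since one key holds exactly one data type.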
Riak CRUD
Easy to install and configure test cluster
REST Queries
Create/PUT a "course:CIS2120" row
curl -v -X PUT http://localhost:8091/riak/course/CIS2120 \
-H "Content-Type: application/json" \
-d '{"name":"Database Coding", "days":"MWF"}'
Read/GET the value for "course:CIS2120"
curl -X GET http://localhost:8091/riak/course/CIS2120
curl http://localhost:8091/riak/course/CIS2120
Key Value
course:CIS2120 {"name":"Database Coding", "days":"MWF"}
Riak Links
Riak can link one key:value pair to another with a relationship
curl -v -X PUT http://localhost:8091/riak/student/sorensen \
-H "Content-Type: application/json" \
-H "Link: </riak/course/CIS2120>; riaktag=\"enrolled\"" \
-d '{"firstname":"Conner"}'
This does not automatically create a link back from "CIS2120" to
"sorensen"
Neo4j
Neo4j – Graph Database
http://www.neo4j.org/learn/try
http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125
Neo4j CRUD
Must try dragging nodes at: http://www.neo4j.org/learn/try
MATCH (user {name:"Bill"})-[:KNOWS]->(colleague)
WHERE colleague.employer="LinkedIn"
RETURN user,colleague
ORDER BY colleague.name LIMIT 10
http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
MATCH (n)-[r]->(m) RETURN n,r,m
Matches any relationship between “n” and “m”
http://www.neo4j.org/learn/cypher
Neo4j
(LUKE {name:"Luke Skywalker"}),
(HAN {name:"Han Solo"}),
(LEIA {name:"Princess Leia Organa"}),
(OBI_WAN {name:"Obi Wan Kenobi"}),
(YODA {name : "Yoda"}),
(VADER {name:"Darth Vader"}),
(C3PO {name:"C3PO", droid:true}),
(R2D2 {name:"R2D2", droid:true}),
(CHEWBACCA {name:"Chewbacca"}),
(TATOOINE {name:"Tatooine", distance:13184}),
(DAGOBAH {name:"Dagobah", distance:15407}),
(JEDI {name:"Jedi"}),
(SITH {name:"Sith"}),
(REBELLION {name:"Rebellion"}),
(EMPIRE {name:"Empire"}),
(DARK_SIDE {name:"Dark Side"}),
(LIGHT_SIDE {name:"Light Side"}),
…
(LUKE)-[:FRIENDS_WITH]->(HAN),
(LUKE)-[:FRIENDS_WITH]->(LEIA),
(HAN)-[:FRIENDS_WITH]->(CHEWBACCA),
(YODA)-[:TEACHES]->(OBI_WAN),
(YODA)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:KNOWS]->(VADER),
(LUKE)-[:KNOWS]->(R2D2),
(R2D2)-[:KNOWS]->(C3PO),
(LUKE)-[:LIVED_ON]->(TATOOINE),
(HAN)-[:LIVED_ON]->(CORELLIA),
(LEIA)-[:LIVED_ON]->(ALDERAAN),
(YODA)-[:LIVED_ON]->(DAGOBAH),
(LUKE)-[:DEVOTED_TO]->(JEDI),
(LUKE)-[:DEVOTED_TO]->(REBELLION),
(LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE),
(VADER)-[:DEVOTED_TO]->(SITH),
(VADER)-[:DEVOTED_TO]->(EMPIRE),
(VADER)-[:DEVOTED_TO]->(DARK_SIDE),
(LEIA)-[:DEVOTED_TO]->(REBELLION),
(HAN)-[:DEVOTED_TO]->(REBELLION)
…
https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125
MATCH (y)-[r]-(other)
WHERE y.name='Yoda'
RETURN y.name, type(r), other.name
Google BigTable
• White Paper published in 2006
• Many databases based upon BigTable
• 13 pages, readable for many non-techies
• Insightful into the early days of NoSQL
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
HBase
Large-Scale, Column-oriented database
Consistency, Performance, Fault-Tolerance, ACID via Locking
Tables are created before initial data is added
Tables have:
row keys - indexed row identifier strings
column families - contain one or more columns
timestamps - for version control
HBase
Row key is a unifier for column families.
If a row does not insert values in a column family, no disk space
is utilized within that column family.
Column keys are identified by column_family:column_name
text:
revision:author
revision:comment
Write-Ahead Logging (WAL): similar to file system journaling
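The row key, column family and timestamp model can be pictured as nested dictionaries. The sketch below is a toy model, not the HBase client API; the put and get helpers and the values "draft", "final" and "jim" are made up for illustration.

```python
# row key -> {"family:qualifier" -> {timestamp -> value}}
table = {}


def put(row_key, column, value, timestamp):
    """Store a new version of a cell; earlier versions are kept."""
    versions = table.setdefault(row_key, {}).setdefault(column, {})
    versions[timestamp] = value


def get(row_key, column):
    """Return the newest version of a cell, or None if the row never wrote it."""
    versions = table.get(row_key, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


put("first page", "text:", "draft", timestamp=1)
put("first page", "text:", "final", timestamp=2)
put("first page", "revision:author", "jim", timestamp=2)

latest_text = get("first page", "text:")
missing = get("first page", "revision:comment")
```

The row never wrote revision:comment, so no entry exists for it at all, which mirrors the point that unused columns take no space.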
HBase CRUD
create 'wiki_table', 'text_column_family', 'revision_column_family'
create 'wiki', 'text', 'revision'
put 'wiki', 'first page', 'text:', '…'
put 'wiki', 'first page', 'revision:author', '…'
get 'wiki', 'first page', ['revision:author', 'revision:comment']
delete 'wiki', 'first page', 'revision:author'
scan 'wiki' = SELECT * FROM wiki
Seven Databases in Seven Weeks, Redmond & Wilson 2012
MongoDB Document Store
db.courses.insert({
  name: "CIS2120",
  description: "Database Coding",
  instructor: {
    name: "David Williams",
    email: "david.williams@usu.edu"
  },
  instructor2: {
    name: "John Kerley-Weeks",
    office: "JQL230"
  },
  subjects: ["Python", "MongoDB", "3NF", "ETL", "Star Schema"]
})
MongoDB vs SQL
http://docs.mongodb.org/manual/reference/sql-comparison/
MongoDB -> SQL Terminology
Collection -> Table
Document -> Row
Field -> Column
db.courses.find() = SELECT * FROM courses
db.courses.count() = SELECT COUNT(*) FROM courses
db.courses.find({name: "CIS2120"}) = SELECT * FROM courses WHERE name = 'CIS2120'
MongoDB Simple Database
http://media.mongodb.org/zips.json
{"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"}
{"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"}
{"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"}
{"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"}
{"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"}
{"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"}
{"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"}
{"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"}
{"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"}
{"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"}
{"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"}
{"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"}
{"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"}
{"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"}
{"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"], "state": "UT", "version": "2.1", "_id": "84321"}
{"city": "Olivehurst, CA", "additionally": ["Arboga, CA", "Plumas Lake, CA", "West Linda, CA"], "state": "CA", "version": "2.1", "_id": "95961"}
Cassandra Characteristics
Scalable, High-availability Wide-column datastore
Peer-to-peer rather than master-slave clusters
Tunable consistency: can read/write to a single node,
a quorum of nodes, or all nodes
Recommends static and dynamic column families
Static column families contain pre-defined columns
Contact Info: phone, address, email, web
Dynamic column families have variable numbers of similar columns
Students enrolled in a course
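The arithmetic behind "single node, quorum, or all nodes" can be sketched as follows. The replica count of 3 and the helper names are assumptions for illustration. The rule that a read is guaranteed to see the latest write when write nodes plus read nodes exceed the replica count is the usual quorum argument, not something stated on the slide.

```python
def quorum(replicas):
    """Smallest majority of the replicas."""
    return replicas // 2 + 1


def overlaps(replicas, write_nodes, read_nodes):
    """True when every read set must share at least one node with every write set."""
    return write_nodes + read_nodes > replicas


N = 3   # hypothetical number of replicas for each piece of data
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

quorum_both = overlaps(N, levels["QUORUM"], levels["QUORUM"])   # 2 + 2 > 3
one_both = overlaps(N, levels["ONE"], levels["ONE"])            # 1 + 1 is not > 3
```

Reading and writing at ONE is the fastest combination but gives only eventual consistency, while QUORUM on both sides trades some latency for reads that overlap the latest write.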
Cassandra CRUD
http://www.datastax.com/docs/0.8/references/cql
http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt
CREATE TABLE course (
  name text PRIMARY KEY,
  instructor text,
  maxstudents int
);
INSERT INTO course (name, instructor, maxstudents) VALUES
('CIS2120', 'Williams', 28);
UPDATE course SET maxstudents=26 WHERE name='CIS2120';
SELECT name, instructor FROM course WHERE maxstudents > 20 ALLOW FILTERING;
Cassandra CRUD
No JOIN operations or FOREIGN KEYS
CREATE TABLE people (
  name text PRIMARY KEY,
  email text,
  phones map<text, text>
);
INSERT INTO people (name, email, phones)
VALUES ('John Weeks', 'john.weeks@usu.edu',
{'mobile' : '555-1212', 'office' : '797-7133', 'fax' : '555-1212'});
UPDATE people SET phones['office'] = '555-1212'
WHERE name = 'John Weeks';
Questions
???
65 Slides and Feedback at: http://joind.in/11012

NoSQL Introduction

  • 1.
    NoSQL Introduction Slides andFeedback at: http://joind.in/11012
  • 2.
    NoSQL Introduction • Understandwhat NoSQL is and what it is not. • Why would you want to use NoSQL within your project and which NoSQL database would you utilize? • Explore the relationships between NoSQL and RDBMS. • Understand how to select between an RDBMs (MySQL and PostgreSQL), Document Database (MongoDB), Key- Value Store, Graph Database, and Columnar databases or combinations of the above. Thursday May 8th 2014, 3:00pm-3:50pm SB 139 Slides and Feedback at: http://joind.in/11012 2
  • 3.
    NoSQL • History • PopularNoSQL Databases • NoSQL Database Comparisons • Terminology • Consistency, Replication, Performance • NoSQL Implementation CRUD Operations 3 Slides and Feedback at: http://joind.in/11012
  • 4.
    NoSQL Introduction • NoSQLis a commonly adopted misnomer • Typically does not use ANSI SQL – SQL = Structured Query Language – Structure exists but is more Flexible – Queries are performed – Language is closer to Programming Languages 4 Slides and Feedback at: http://joind.in/11012
  • 5.
  • 6.
    NoSQL History • 1998Carlo Strozzi Command Line Database • June 11, 2009 Meetup – Open Source, Distributed, Non-Relational DB – Eric Evans (Rackspace) – Johan Oskarsson (Last.fm) 6
  • 7.
  • 8.
    NoSQL History • Badname, but it stuck! • Not a definitive term • Generally, Newer databases solving new and different problems • Not Only SQL http://blog.sym- link.com/2009/10/30/nosql_whats_in_a_name.html 8 Slides and Feedback at: http://joind.in/11012
  • 9.
    NoSQL Origination • Problemsnot solved by RDBMs • Limitations of RDBMs, not SQL 9 Slides and Feedback at: http://joind.in/11012
  • 10.
    Most Popular Databases http://db-engines.com/en/ranking Rankingby: Web Content, Web Searches, Technical Discussion, Jobs, Resumes 10
  • 11.
    Most Popular NoSQL •MongoDB - Document Store • Cassandra – Wide Column Store • Solr – Search Engine • Redis – Key-value store • Hbase – Wide Column Store • Memcached – Key-value Store • CouchDB – Document Store • Neo4j – Graph Database • Riak – Key-value Store • SimpleDB – Key-value Store within Amazon Cloud 11 Slides and Feedback at: http://joind.in/11012
  • 12.
    12 NoSQL vs RDBMs ImageReference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/
  • 13.
    Reading Recommendations Great Overviewof NoSQL: Seven Databases in Seven Weeks Eric Redmond and Jim Wilson 13
  • 14.
    NoSQL “Bleeding Edge” •Several solutions are mature and stable enough to run large scale production environments • Not all permutations have been considered • Several (but not all) optimization strategies have been published • Crucial elements such as Security may be a secondary add-on in favor of performance. 14
  • 15.
    NoSQL “Bleeding Edge” SunMicrosystems csh man page: “Although robust enough for general use, adventures into the esoteric periphery of the C shell may reveal unexpected quirks.” 15 Slides and Feedback at: http://joind.in/11012
  • 16.
    NoSQL Comparison 16 Take noteof patterns: Recent Release, Open Source, Utilized at High-Volume sites Variety of Formats: Key-Value, Wide-Column, Document, Graph http://db-engines.com/en/ranking
  • 17.
    NoSQL Database Types 17 •Key-Value • Column Oriented Databases (Columnar) • Graph • Document • Search Database - Solr • Key-Value Web Optimization - Memcached
  • 18.
    Key-Value Stores 18 Key Value code:java17.316% Lowest rank on Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html Codebucketdrinkbucket
  • 19.
  • 20.
  • 21.
    Document Oriented Database 21 { "_id": 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }
  • 22.
    Document Oriented Database 22 {"facutly" : [ { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }, { "_id" : 2, "name" : { "first" : "David", "last" : "Williams" }, "contribs" : [ "C#", "Java", "PHP" ], "awards" : [ { "award" : "Sherman Peabody Award II", "year" : 2095, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IX", "year" : 2090, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IV", "year" : 2093, "location" : "Paris", "by" : "Intergalactic Continuum" } ] } ] }
  • 23.
  • 24.
  • 25.
    NoSQL Comparison 25 No ANSISQL Standards, No Predefined Schemas, Replication, Eventual Consistency, Rarely Foreign Keys, Data Types not required Newer Concepts: Sharding, REST API, JSON, MapReduce
  • 26.
    NoSQL Characteristics 26 No PredefinedSchemas • May insert data without creating a table • Schema Versions (v1.5, v1.6, v1.7,…) Rarely Foreign Keys • No JOIN operations • Relationships are not automatically maintained Eventual Consistency • Old copies being replaced by new records • Inconsistent data until all replacements are complete
  • 27.
    Download NoSQL v95141.3 27 Released4/1/2014 http://www.nosql.org/downloads/ymbkm.zip
  • 28.
    NoSQL Terminology and Concepts 28 Slides andFeedback at: http://joind.in/11012
  • 29.
    Sharding 29 Partitions – Datadistributed across disks Sharding – Data distributed across servers
  • 30.
    Map Reduce 30 Divides workacross distributed systems Parallel processing of large data sets Divide – Conquer – Consolidate Often Implement by defining Map and Reduce classes or functions 1+2+3+6+7+8+9=? Google’s MapReduce Programming Model – Revisited Ralf Lammel, Microsoft, 2008 http://www.sciencedirect.com/science/article/pii/S0167642307001281 2 6 8 1 7 3 9 16 20 36
  • 31.
    JSON 31 Subset of JavaScriptObject Notation Similarities to XML method for representing data Syntax Name : Value pairs “salary” : “125000” Values are: number, string, Boolean, array, object, or NULL Objects can store Objects, Arrays can store Arrays Separate pairs by commas “salary” : “125000”, “gender” : “male” Curly braces denote objects { “salary” : “125000”, “gender” : “male” } Square brackets denote arrays “phone” : [”555-1212”, ”555-3344”] “phone” : [ {“office” : ”555-1212”}, {“mobile” : ”555-3344”} ]
  • 32.
    JSON Example 32 { "_id" :1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] } http://www.mongodb.com/json-and-bson
  • 33.
    REST API 33 CRUD (Create,Read, Update, Delete) operations through the web HTTP Methods GET (List/Read) POST(Update) PUT(Create) DELETE(Delete) EXAMPLE API http://www.blinksale.com/api/ List/Read Data via HTTP GET to http://www.blinksale.com/invoices http://www.blinksale.com/invoices/invoice_id/payments http://www.blinksale.com/invoices/?start=2006&end=2008 Returns XML results
  • 34.
    REST API 34 Update datavia HTTP POST to http://www.blinksale.com/invoices/invoice_id/payments <?xml version="1.0" encoding="UTF-8"?> <payment xmlns="http://www.blinksale.com/api"> <amount>1000.00</amount> <date>2006-09-27</date> </payment> REST = REpresentational State Transfer Twitter Example: https://dev.twitter.com/docs/api/1.1 (GET and POST only)
  • 35.
    Database SELECT Statements 35 Oracle SELECT* FROM relationships MongoDB db.relationships.find() Cassandra (CQL) SELECT * FROM relationships Slides and Feedback at: http://joind.in/11012
  • 36.
    Database SELECT Statements 36 Redis– Key-Value Store SMEMBERS relationships Riak – Key-Value Store with REST API (+ proprietary drivers) http://localhost:8091/riak/relationships/likes Neo4j (Cypher) MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Slides and Feedback at: http://joind.in/11012
  • 37.
    JOINS without ForeignKeys 37 original_id = ObjectId() db.employer.insert({ "_id": original_id, "name": "Broadway Tech", "url": "bc.example.net" }) db.people.insert({ "name": "Erin", “employer_id": original_id, "url": "bc.example.net/Erin" }) “Erin” works at “Broadway Tech” One of the employees at “Broadway Tech” is “Erin” http://docs.mongodb.org/manual/reference/database-references/#document-references
  • 38.
  • 39.
    ACID, BASE, CAP,CPR 1979 Gray, 1983 Reuter & Härder - ACID Atomic, Consistent, Isolated, Durable Rollback: All or Nothing, Follows Rules, Simultaneous, No Drops 1997 Brewer - BASE Basically Available, Soft-state, Eventually consistent 2000 Brewer – CAP (Pick Two) Consistency, Availability, Partition Tolerance CPR (Pick Two) Consistency, Performance, Replication/Redundancy Contrived - Stretch Definitions 39
  • 40.
  • 41.
    CPR 41 Consistency Performance A DCB Spreaddata across storage or computer Redundancy
  • 42.
  • 43.
  • 44.
  • 45.
    SQL CRUD 45 Create INSERT INTOtable (column1, column2) VALUES (9, 'string'); Read SELECT column1, column2 FROM table; Update UPDATE table SET column2 = 'text' WHERE column1= 9 Delete DELETE FROM table WHERE column2='text'
  • 46.
    Key-Value Stores 46 Key Value code:java17.316% Lowest rank on Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html codebucketdrinkbucket
  • 47.
    Redis CRUD 47 http://redis.io/commands http://redis.io/topics/data-types-intro http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/ Redis isan in-memory Key-Value Store which stores: Strings, Hashes, Lists, Sets, or Ordered sets Strings: values of strings are concrete and can not be altered SET user:jim {lastname: ‘Mathews’, salary: 125000} GET user:jim Hashes: allows modification and retrieval of individual values HSET user:jim salary 125000 HSET user:jim lastname Mathews HGET jim salary
  • 48.
    Redis CRUD 48 Lists: One-dimensionalarray with insert, append, pop, and push Redis.lpush(‘users:employees’, ‘user:jim’) redis.mget(redis.lrange(‘users:employess’,0,5)) Sets: lists with no duplicate values (SADD = Set Add) SADD users:employees jim SADD users:employees krishna SMEMBERS employees Sorted Sets: are sets with an added sorting value ZADD users:employees 125000 jim ZADD users:employees 157000 Krishna ZRANGEBYSCORE users:employees 100000 180000
  • 49.
    Riak CRUD 49 Easy toinstall and configure test cluster REST Queries Create/PUT a “course:CIS2120” row curl –v –X PUT http://localhost:8091/riak/course/CIS2120 -H “Content-Type: application/json” -d ‘{“name”:”Database Coding”, “days”:”MWF”}’ Read/GET the value for “course:CIS2120” curl –X GET http://localhost:8091/riak/course/CIS2120 curl http://localhost:8091/riak/course/CIS2120 Key Value course:CIS2120 {“name”:”Database Coding”, “days”:”MWF”}
  • 50.
    Riak Links 50 Riak canlink on value to key:value to another with a relationship curl –v –X PUT http://localhost:8091/riak/student/sorensen -H “Content-Type: application/json” -H “Link: </riak/course/CIS2120>; riaktag=”enrolled”” -d ‘{“firstname”:”Conner”}’ This does not automatically create a link from “sorensen“ to “CIS2120”
  • 51.
  • 52.
    Neo4j – GraphDatabase 52 http://www.neo4j.org/learn/try http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125
  • 53.
    Neo4j CRUD 53 Must trydragging nodes at: http://www.neo4j.org/learn/try MATCH (user {name:“Bill"})-[:KNOWS]->(colleague) WHERE colleague.employer=“LinkedIn” RETURN user,colleague ORDER BY colleague.name LIMIT 10 http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” MATCH (n)-[r]->(m) RETURN n,r,m Matches any relationship between “n” and “m” http://www.neo4j.org/learn/cypher
  • 54.
    Neo4j(LUKE {name:"Luke Skywalker"}), (HAN{name:"Han Solo"}), (LEIA {name:"Princess Leia Organa"}), (OBI_WAN {name:"Obi Wan Kenobi"}), (YODA {name : "Yoda"}), (VADER {name:"Darth Vader"}), (C3PO {name:"C3PO", droid:true}), (R2D2 {name:"R2D2", droid:true}), (CHEWBACCA {name:"Chewbacca"}), (TATOOINE {name:"Tatooine", distance:13184}), (DAGOBAH {name:"Dagobah", distance:15407}), (JEDI {name:"Jedi"}), (SITH {name:"Sith"}), (REBELLION {name:"Rebellion"}), (EMPIRE {name:"Empire"}), (DARK_SIDE {name:"Dark Side"}), (LIGHT_SIDE {name:"Light Side"}), … (LUKE)-[:FRIENDS_WITH]->(HAN), (LUKE)-[:FRIENDS_WITH]->(LEIA), (HAN)-[:FRIENDS_WITH]->(CHEWBACCA), (YODA)-[:TEACHES]->(OBI_WAN), (YODA)-[:TEACHES]->(LUKE), (OBI_WAN)-[:TEACHES]->(LUKE), (OBI_WAN)-[:KNOWS]->(VADER), (LUKE)-[:KNOWS]->(R2D2), (R2D2)-[:KNOWS]->(C3PO), (LUKE)-[:LIVED_ON]->(TATOOINE), (HAN)-[:LIVED_ON]->(CORELLIA), (LEIA)-[:LIVED_ON]->(ALDERAAN), (YODA)-[:LIVED_ON]->(DAGOBAH), (LUKE)-[:DEVOTED_TO]->(JEDI), (LUKE)-[:DEVOTED_TO]->(REBELLION), (LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE), (VADER)-[:DEVOTED_TO]->(SITH), (VADER)-[:DEVOTED_TO]->(EMPIRE), (VADER)-[:DEVOTED_TO]->(DARK_SIDE), (LEIA)-[:DEVOTED_TO]->(REBELLION), (HAN)-[:DEVOTED_TO]->(REBELLION) … https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125 MATCH y-[r]-other WHERE y.name='Yoda' return y.name, type(r), other.name 54
  • 55.
    Google BigTable
    • White paper published in 2006
    • Many databases are based upon BigTable
    • 13 pages, readable for many non-techies
    • Offers insight into the early days of NoSQL
    http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
  • 56.
    HBase
    Large-scale, column-oriented database
    Consistency, performance, fault tolerance, ACID via locking
    Tables are created before initial data is added
    Tables have:
    • row keys – indexed row identifier strings
    • column families – contain one or more columns
    • timestamps – for version control
  • 57.
    HBase
    The row key is a unifier for column families.
    If a row does not insert values in a column family, no disk space is used within that column family.
    Column keys are identified by column_family:column_name
    • text:
    • revision:author
    • revision:comment
    Write-Ahead Logging (WAL) is similar to file system journaling
  • 58.
    HBase CRUD
    create 'wiki_table', 'text_column_family', 'revision_column_family'
    create 'wiki', 'text', 'revision'
    put 'wiki', 'first page', 'text:', '…'
    put 'wiki', 'first page', 'revision:author', '…'
    get 'wiki', 'first page', ['revision:author', 'revision:comment']
    delete 'wiki', 'first page', 'revision:author'
    scan 'wiki' = SELECT * FROM wiki
    Seven Databases in Seven Weeks, Redmond & Wilson 2012
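The HBase slides describe a table as row keys mapped to column families, with each cell versioned by timestamp. A toy in-memory model in Python can make that shape concrete. `ToyTable`, its methods, the logical clock, and the sample values ("draft", "final", "jkw") are hypothetical stand-ins, not the HBase client API. Real HBase also marks deletes with tombstones rather than removing cells immediately.

```python
import itertools

# Logical clock standing in for HBase's timestamps (an assumption of this sketch).
_clock = itertools.count(1)


class ToyTable:
    """In-memory stand-in: row key -> 'family:qualifier' -> list of (timestamp, value)."""

    def __init__(self, *families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows.setdefault(row_key, {})
        # Each write appends a new timestamped version instead of overwriting.
        cells.setdefault(column, []).append((next(_clock), value))

    def get(self, row_key, columns=None):
        cells = self.rows.get(row_key, {})
        wanted = cells if columns is None else [c for c in columns if c in cells]
        # Return only the newest version of each requested column.
        return {c: max(cells[c])[1] for c in wanted}

    def delete(self, row_key, column):
        self.rows.get(row_key, {}).pop(column, None)

    def scan(self):
        # Rows come back ordered by row key.
        return {key: self.get(key) for key in sorted(self.rows)}


wiki = ToyTable("text", "revision")
wiki.put("first page", "text:", "draft")
wiki.put("first page", "text:", "final")
wiki.put("first page", "revision:author", "jkw")
latest = wiki.get("first page")
wiki.delete("first page", "revision:author")
print(latest, wiki.scan())
```

Columns that were never written simply have no entry in the row's dictionary, which mirrors the point that an unused column family consumes no space for that row.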
  • 59.
    MongoDB Document Store
    db.courses.insert({
      name: "CIS2120",
      description: "Database Coding",
      instructor: { name: "David Williams", email: "david.williams@usu.edu" },
      instructor2: { name: "John Kerley-Weeks", office: "JQL230" },
      subjects: ["Python", "MongoDB", "3NF", "ETL", "Star Schema"]
    })
  • 60.
    MongoDB vs SQL
    http://docs.mongodb.org/manual/reference/sql-comparison/
    MongoDB -> SQL Terminology
    Collection -> Table
    Document -> Row
    Field -> Column
    db.courses.find() = SELECT * FROM courses
    db.courses.count() = SELECT COUNT(*) FROM courses
    db.courses.find({name: "CIS2120"}) = SELECT * FROM courses WHERE name = 'CIS2120'
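The mapping above treats a query document as a set of field-equals-value conditions, much like a SQL WHERE clause joined with AND. A minimal Python sketch of that idea over a list of dictionaries follows. The `find` helper and the second course are hypothetical, and the sketch covers exact equality only, not MongoDB's query operators.

```python
# A collection modeled as a list of documents; the second course is hypothetical.
courses = [
    {"name": "CIS2120", "description": "Database Coding"},
    {"name": "CIS1010", "description": "Hypothetical second course"},
]


def find(collection, query=None):
    """Return documents whose fields equal every key/value pair in the query."""
    query = query or {}
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


everything = find(courses)                   # like SELECT * FROM courses
matches = find(courses, {"name": "CIS2120"})  # like ... WHERE name = 'CIS2120'
print(len(everything), matches)
```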
  • 61.
    MongoDB Simple Database
    http://media.mongodb.org/zips.json
    {"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"}
    {"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"}
    {"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"}
    {"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"}
    {"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"}
    {"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"}
    {"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"}
    {"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"}
    {"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"}
    {"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"}
    {"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"}
    {"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"}
    {"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"}
    {"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"}
    {"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"], "state": "UT", "version": "2.1", "_id": "84321"}
    {"city": "Olivehurst, CA", "additionally": ["Arboga, CA", "Plumas Lake, CA", "West Linda, CA"], "state": "CA", "version": "2.1", "_id": "95961"}
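The last two records on this slide carry different fields from the rest, which a document store accepts within one collection. A short Python sketch shows how client code might cope with that flexibility. It uses three records copied from the slide; treating a missing "pop" field as 0 is an assumption made for illustration.

```python
import json
from collections import defaultdict

# Three records from the slide; the Logan record has no "pop" or "loc" field.
LINES = [
    '{"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"}',
    '{"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"}',
    '{"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"], "state": "UT", "version": "2.1", "_id": "84321"}',
]

docs = [json.loads(line) for line in LINES]

pop_by_state = defaultdict(int)
for doc in docs:
    # .get() tolerates documents that lack the field entirely.
    pop_by_state[doc["state"]] += doc.get("pop", 0)

totals = dict(pop_by_state)
print(totals)
```

With a flexible schema, the burden of handling absent fields shifts from the database to the application, as the `.get()` call illustrates.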
  • 62.
    Cassandra Characteristics
    Scalable, high-availability, wide-column datastore
    Peer-to-peer rather than master-slave clusters
    Tunable consistency: can read/write to a single node, a quorum of nodes, or all nodes
    Recommends static and dynamic column families
    • Static column families contain pre-defined columns
      Contact Info: phone, address, email, web
    • Dynamic column families have variable numbers of similar columns
      Students enrolled in a course
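Tunable consistency comes down to arithmetic on replica counts. With N replicas, a read of R replicas is guaranteed to see the latest write to W replicas when R + W > N. A small Python sketch of that rule follows. The helper names are hypothetical; ONE, QUORUM, and ALL mirror the single-node, quorum, and all-nodes options on the slide.

```python
def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Map a consistency level name to the number of replicas that must respond."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def overlaps(write_level, read_level, replication_factor):
    """True when every read set must intersect every write set (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, overlaps(write_level, read_level, 3))
```

Writing and reading at ONE is the fastest combination but allows stale reads; QUORUM on both sides trades some latency for overlap.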
  • 63.
    Cassandra CRUD
    http://www.datastax.com/docs/0.8/references/cql
    http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt
    CREATE TABLE course ( name text PRIMARY KEY, instructor text, maxstudents int )
    INSERT INTO course (name, instructor, maxstudents) VALUES ('CIS2120', 'Williams', 28)
    UPDATE course SET maxstudents=26 WHERE name='CIS2120'
    SELECT name, instructor FROM course WHERE maxstudents > 20
  • 64.
    Cassandra CRUD
    No JOIN operations or FOREIGN KEYS
    CREATE TABLE people ( name text PRIMARY KEY, email text, phones map<text, text> )
    INSERT INTO people (name, email, phones) VALUES ('John Weeks', 'john.weeks@usu.edu', {'mobile' : '555-1212', 'office' : '797-7133', 'fax' : '555-1212'})
    UPDATE people SET phones['office'] = '555-1212' WHERE name = 'John Weeks'
  • 65.
    Questions?
    Slides and Feedback at: http://joind.in/11012