Dropping ACID: 
Wrapping Your Mind Around NoSQL Databases 
Kyle Banerjee 
Digital Services Program Manager 
Orbis Cascade Alliance
Why should anyone care? 
Great for the Web 
• No schema – easy to store data that are really 
awkward to work with in RDBMS 
• Much easier horizontal scalability than RDBMS 
• Works great with huge amounts of data 
• High fault tolerance 
• Integration of both RESTful and cloud computing 
technologies
Examples of sites using NoSQL
There is no magic 
• Databases are fast because they physically structure 
data so it can be accessed efficiently 
• NoSQL achieves performance through tradeoffs that 
make sense in a Web environment 
• RDBMS can be used in high performance applications 
• Compromises (e.g. denormalization, sharding) that kill the 
advantage of having an RDBMS are often necessary 
• Technically more complex (i.e. expen$ive) to 
implement/maintain
What is a NoSQL database? 
A nonrelational data store 
–Document Store 
–Wide Column Store 
–Key Value Store 
–Graph 
–XML 
NoSQL databases differ significantly in what 
they are good for
What’s best depends on your data 
[Figure: NoSQL database types plotted by data size against data complexity. Labels shown: key/value stores, wide column stores, document databases, graph databases.]
Your priorities 
• What types of queries do you need to 
support? 
• How much data? 
• Optimized for reads, writes, or updates? 
• Versioning 
• How separate is data from app? Will other 
applications need to access it in future?
And how you want to interact with it 
• RESTful interface 
• Query API 
• Non-SQL query languages 
• Via indexed values, keys, nodes 
• File access
Key value stores 
• Basically a hash 
• Focus on scaling to huge amounts of data 
• Examples: Amazon SimpleDB, Voldemort, 
Dynomite, BerkeleyDB, Riak
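To make "basically a hash" concrete, here is a minimal sketch that models a key-value store with a plain Python dict. The class name `TinyKVStore`, its methods, and the key naming scheme are hypothetical and are not the API of any product listed above. The point is only that lookups go by key and the store treats each value as an opaque blob.

```python
class TinyKVStore:
    """Hypothetical in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Overwrites any existing value stored under the same key.
        self._data[key] = value

    def get(self, key, default=None):
        # The only access path is the key; the store cannot search inside values.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = TinyKVStore()
store.put("pet:charley", b'{"name": "Charley", "animal_type": "dog"}')
charley_blob = store.get("pet:charley")
missing = store.get("pet:nobody")
```

Finding "all dogs" here would mean fetching and decoding every value in the application, which is the tradeoff the slide hints at.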
Wide column stores 
• Somewhat like column oriented relational 
databases 
• Same elements don’t have to have same 
columns 
• Examples: Hadoop, Cassandra, HBase
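The idea that rows need not share the same columns can be sketched with nested dicts. This is an illustration only, not how any of the listed systems store data; the `family:qualifier` column names and the helper `get_column` are assumptions made for the example.

```python
# Each row key maps to its own set of columns; rows need not share columns.
rows = {
    "charley": {
        "info:name": "Charley",
        "info:animal_type": "dog",
        "likes:powder": "dog",
        "likes:bo": "dog",
    },
    "abby": {
        "info:name": "Abby",
        "info:animal_type": "cat",
    },
}


def get_column(table, row_key, column, default=None):
    """Return one cell, or a default when the row has no such column."""
    return table.get(row_key, {}).get(column, default)


charley_likes_bo = get_column(rows, "charley", "likes:bo")
abby_likes_bo = get_column(rows, "abby", "likes:bo")

# Columns present anywhere in the table; no row is required to have them all.
all_columns = sorted({column for cells in rows.values() for column in cells})
```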
Document databases 
• Like key-value stores, but values have meaning 
to database 
• Examples: CouchDB, MongoDB
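"Values have meaning to the database" means the store can look inside a document and query on its fields. A minimal sketch of that idea follows; the `find` helper is hypothetical and does not reproduce the query API of CouchDB or MongoDB.

```python
# Documents keyed by id, as in a key-value store, but stored as structured data.
documents = {
    "charley": {"name": "Charley", "animal_type": "dog"},
    "abby": {"name": "Abby", "animal_type": "cat"},
    "bo": {"name": "Bo", "animal_type": "dog"},
}


def find(docs, **criteria):
    """Return ids of documents whose fields match every given criterion."""
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )


dog_ids = find(documents, animal_type="dog")
```

A real document database would typically answer this from an index rather than by scanning every document as this sketch does.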
Graph databases 
• Uses nodes, relationships between nodes and 
key-value properties 
• Recursive structures in relational DBs require 
expensive joins 
• Examples: Neo4j, VertexDB, AllegroGraph
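As a rough sketch of the model, the snippet below stores nodes with key-value properties and typed relationships, then follows one relationship type recursively. The pet "Rex" and the function `reachable` are invented for illustration, and this is plain Python rather than the query interface of any graph database named above.

```python
from collections import deque

# Nodes carry key-value properties.
nodes = {
    "Charley": {"animal_type": "dog"},
    "Powder": {"animal_type": "dog"},
    "Bo": {"animal_type": "dog"},
    "Rex": {"animal_type": "dog"},  # hypothetical pet added for the example
    "Abby": {"animal_type": "cat"},
}

# Relationships are (source, type, target) triples.
edges = [
    ("Charley", "likes", "Powder"),
    ("Charley", "likes", "Bo"),
    ("Powder", "likes", "Rex"),
    ("Charley", "hates", "Abby"),
]


def reachable(start, relationship):
    """Breadth-first walk along one relationship type, to any depth."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source, rel, target in edges:
            if source == current and rel == relationship:
                if target != start and target not in seen:
                    seen.add(target)
                    queue.append(target)
    return seen


friends_of_friends = reachable("Charley", "likes")
```

In a relational schema, each extra hop in this walk would be another self-join on the `likes` table.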
Things that simplify life 
• JSON 
• RESTful interface or easy API 
• Multiversion Concurrency Control (MVCC)
Traditional RDBMS 
Tables:
animal_type (animal_id: integer, description: varchar)
pet (pet_id: integer, animal_id: integer, name: varchar)
likes (pet_id: integer, friend_id: integer)
hates (pet_id: integer, animal_id: integer)

Joined results:
pet      animal_type  likes   animal_type
Charley  dog          Powder  dog
Charley  dog          Bo      dog

pet      animal_type  hates   animal_type
Charley  dog          Abby    cat
Charley  dog          Spidey  tarantula
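To show how much joining the relational version takes, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names follow the slide; the specific id values are assumptions made for the example, and only the `likes` relationship is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE animal_type (animal_id INTEGER PRIMARY KEY, description VARCHAR);
    CREATE TABLE pet (pet_id INTEGER PRIMARY KEY, animal_id INTEGER, name VARCHAR);
    CREATE TABLE likes (pet_id INTEGER, friend_id INTEGER);
    """
)
# The id values below are made up for illustration.
conn.execute("INSERT INTO animal_type VALUES (1, 'dog')")
conn.executemany(
    "INSERT INTO pet VALUES (?, ?, ?)",
    [(1, 1, "Charley"), (2, 1, "Powder"), (3, 1, "Bo")],
)
conn.executemany("INSERT INTO likes VALUES (?, ?)", [(1, 2), (1, 3)])

# Four joins are needed to turn ids back into names and animal types.
likes_rows = conn.execute(
    """
    SELECT p.name, pt.description, f.name, ft.description
    FROM likes AS l
    JOIN pet AS p ON p.pet_id = l.pet_id
    JOIN animal_type AS pt ON pt.animal_id = p.animal_id
    JOIN pet AS f ON f.pet_id = l.friend_id
    JOIN animal_type AS ft ON ft.animal_id = f.animal_id
    ORDER BY f.pet_id
    """
).fetchall()
conn.close()
```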
JSON Example 
{ 
"name": "Charley", 
"animal_type": "dog", 
"likes": [ 
{"name": "Powder", "animal_type": "dog"}, 
{"name": "Bo", "animal_type": "dog"} 
], 
"hates": [ 
{"name": "Abby", "animal_type": "cat"}, 
{"name": "Spidey", "animal_type": "tarantula"} 
] 
}
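A short sketch of reading such a document with Python's standard `json` module follows. The document is repeated inline so the snippet is self-contained; the variable names are arbitrary.

```python
import json

raw = """
{
  "name": "Charley",
  "animal_type": "dog",
  "likes": [
    {"name": "Powder", "animal_type": "dog"},
    {"name": "Bo", "animal_type": "dog"}
  ],
  "hates": [
    {"name": "Abby", "animal_type": "cat"},
    {"name": "Spidey", "animal_type": "tarantula"}
  ]
}
"""

# One parse yields the pet together with its related records; no joins needed.
pet = json.loads(raw)
liked_names = [friend["name"] for friend in pet["likes"]]
hated_types = [enemy["animal_type"] for enemy in pet["hates"]]
```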
Why JSON? 
• Lightweight, interoperable and open 
• Can be composed in any text editor 
• Syntax is crazy easy 
• With RESTful API, can be used with any 
software that supports HTTP (even the user’s 
browser can make direct DB calls) 
• Allows you to send and receive data as it is 
used
How easy can REST be? 
Create: HTTP PUT /db/docid 
Read: HTTP GET /db/docid 
Update: HTTP POST /db/docid 
Delete: HTTP DELETE /db/docid
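The mapping above can be sketched with Python's standard `urllib.request`. The snippet only builds the requests and does not send them. The base URL, database name, and document id are placeholders, and `build_request` is a hypothetical helper. Individual products may assign the verbs differently (some use PUT for updates as well), so check the documentation of the store you choose.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # placeholder; point this at your own server

# Verb mapping as given on the slide.
VERBS = {"create": "PUT", "read": "GET", "update": "POST", "delete": "DELETE"}


def build_request(action, db, doc_id, document=None):
    """Build (but do not send) the HTTP request for one CRUD action."""
    body = None
    headers = {}
    if document is not None:
        body = json.dumps(document).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = f"{BASE_URL}/{db}/{doc_id}"
    return Request(url, data=body, headers=headers, method=VERBS[action])


create = build_request(
    "create", "pets", "charley", {"name": "Charley", "animal_type": "dog"}
)
read = build_request("read", "pets", "charley")
delete = build_request("delete", "pets", "charley")
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a running server.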
MVCC in a nutshell 
• Creates new version each time an update is 
made 
• Timestamps used to prevent conflicts 
• Reads are always possible
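The three points above can be illustrated with a toy versioned store. This is a sketch under stated assumptions, not the mechanism of any particular database: an integer revision counter stands in for the timestamp, and `VersionedStore` and `ConflictError` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when a write is based on a revision that is no longer current."""


class VersionedStore:
    def __init__(self):
        self._versions = {}  # doc_id -> list of (revision, value), oldest first

    def read(self, doc_id):
        # Reads never block: they return the latest committed version.
        return self._versions[doc_id][-1]

    def write(self, doc_id, value, based_on=0):
        history = self._versions.get(doc_id, [])
        current = history[-1][0] if history else 0
        if based_on != current:
            raise ConflictError(f"{doc_id}: based on {based_on}, current is {current}")
        # Append a new version instead of overwriting the old one.
        self._versions[doc_id] = history + [(current + 1, value)]
        return current + 1

    def history(self, doc_id):
        return list(self._versions.get(doc_id, []))


store = VersionedStore()
first_rev = store.write("charley", {"name": "Charley"})
second_rev = store.write(
    "charley", {"name": "Charley", "animal_type": "dog"}, based_on=first_rev
)

try:
    # A second writer still holding revision 1 is rejected instead of
    # silently overwriting revision 2.
    store.write("charley", {"name": "Chuck"}, based_on=first_rev)
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True
```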
Disadvantages of NoSQL 
• Performance and scalability achieved at the 
expense of feature support 
• No joins. Grouping and ordering become more 
problematic 
• No SQL 
• No transactions 
• Eventual consistency vs strict consistency 
• Tools are often lacking
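When the store offers no joins, grouping, or ordering, that work moves into the application. A minimal sketch of what this can look like follows; the sample documents reuse the pets from the earlier slides and the variable names are arbitrary.

```python
from collections import defaultdict

# Documents as they might come back from a store, in no particular order.
pets = [
    {"name": "Powder", "animal_type": "dog"},
    {"name": "Abby", "animal_type": "cat"},
    {"name": "Charley", "animal_type": "dog"},
    {"name": "Spidey", "animal_type": "tarantula"},
]

# Equivalent of GROUP BY animal_type, done in application code.
names_by_type = defaultdict(list)
for pet in pets:
    names_by_type[pet["animal_type"]].append(pet["name"])

# Equivalent of ORDER BY name within each group.
grouped = {kind: sorted(names) for kind, names in sorted(names_by_type.items())}
```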
The bottom line 
• In a library context, NoSQL is appropriate 
when flexible schema or fast displays that 
contain related data are needed 
• Understand the problem at hand as well as 
the pros/cons of your options before deciding 
on a solution 
• Don’t ditch your RDBMS
Questions? 
Kyle Banerjee 
Orbis Cascade Alliance 
banerjek@uoregon.edu
