NO SQL: What, Why, How

Igor Moochnick
Director, Cloud Platforms
BlueMetal Architects
igorm@bluemetal.com
Blog: igorshare.wordpress.com

What is wrong with SQL?
 Is it answering your needs?
 Does it fit your solution?
 Do you reap the benefits of the
relational storage?
 Will it support the needs of your
projects in the future?
 More users?
 More data?

Gov't data stored in the US (2009):
more than 800 petabytes

Assumptions
 The data doesn’t fit on one node
 The data may not fit one rack
 Machines operate independently, with minimal
coordination among them
 Conclusion:
 There is a need to partition data across lots of machines

There is a limit to RDBMS scale
 Scaling up doesn't work
 Scaling out with traditional RDBMSs isn't so hot either
 Sharding scales, but you lose all the features that make RDBMSs
useful!
 Sharding and Table partitioning are operationally heavy
 If we don't need relational features, we want a distributed
non-relational DBMS (NRDBMS).

Fallacies of Distributed Computing
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn’t change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

The magic CAP
 C (Consistency)
 A (Availability)
 P (Partition tolerance)
[Diagram: C, A and P, with the AP combination labeled]

Commodity Hardware
Image: Calxeda's EnergyCard atop an HP Redstone server
prototype. Source: Jon Snyder
CNET: Google uncloaks once-secret server
RAM is new Disk, Disk is new Tape
- Jim Gray, (former) manager of Microsoft Research’s eScience Group

Why it is important
 New levels of scalability
 Rapid development
 Cloud ready
 Distributed by nature
 There’s no need for a DBA or for complicated SQL
queries, and it is fast. Hooray, freedom for the people!
 WORLD, PEACE!

Beware!
 Data models are still important
 Data duplication
 Interfaces and interoperability - nonexistent
 Understand limitations of the technology
 OPS are screwed

Advantages of NOSQL
 Cheap, easy to implement
 Removes impedance mismatch between objects and tables
 Quickly process large amounts of data
 Data Modeling Flexibility (including schema evolution)

Disadvantages of NOSQL
 New Technology
 Data is generally duplicated, potential for inconsistency
 No standard language or format for queries
 Depends on the application layer to enforce data integrity

NOSQL categories
 Document Databases
 Based loosely on documents / POCO
 Data model – collections of documents
 Graph Databases
 Based on Graph theory
 Data model – graph, nodes, edges, properties

NOSQL categories
 Key Value Stores
 Based on DHT (Distributed Hash Table),
Amazon’s Dynamo design
 Data model – collection of key value pairs
 Column Stores
 Based on Google’s BigTable design
 Data model - big table, column families

Types of NOSQL Databases
 Document (examples: MongoDB, CouchDB, RavenDB)
 Graph (examples: Neo4j, Sones, TinkerGraph)
 Key/Value (examples: Cassandra, SimpleDB, Dynamo,
Voldemort, Riak, Redis)
 Tabular/Wide Column (examples: BigTable, HBase, Cassandra)
 Search (example: Lucene)
http://NOSQL-databases.org

Consistency Models
 Full Consistency
 Read-what-I-wrote
 Session Consistency
 Monotonic Read Consistency
 Eventual Consistency

Write collision resolutions
 Timestamps
 Vector Clocks
 HiLo algorithm
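
To make the vector clock option concrete, here is a minimal sketch in Python. It is not taken from any particular database; the function names (`increment`, `compare`, `merge`) and the node labels are assumptions made for illustration. It shows how two versions of a record can be classified as ordered or concurrent, which is the point where a store has to resolve a write collision.

```python
from typing import Dict

# A vector clock maps a node name to the number of updates that node has made.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two versions as 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither version descends from the other


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock for a reconciled version: the element-wise maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


base = increment({}, "A")     # first write, handled by node A
left = increment(base, "A")   # second write through node A
right = increment(base, "B")  # competing write through node B

print(compare(base, left))    # left descends from base
print(compare(left, right))   # a genuine collision
print(merge(left, right))
```

A "concurrent" result is the case that plain timestamps silently hide: with timestamps the later write simply wins, while with vector clocks the conflict is detected and can be handed to the application to reconcile.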

Sharding
 Is NOT
 Replication
 Clustering
 Backup
 It is a smart way of splitting
data across databases
 Requires aggregation
 Enables parallelization
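
The "requires aggregation, enables parallelization" points can be sketched as follows. This is an illustrative in-memory model, not a real database client: `ShardedStore`, its methods, and the modulo-based routing are all assumptions. Single-key operations touch one shard, while a query over all data fans out to every shard in parallel and combines the partial results.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class ShardedStore:
    """Toy store that splits keys across several in-memory 'databases'."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[Dict[str, int]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> Dict[str, int]:
        # A stable hash decides which shard owns the key.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key: str, value: int) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str) -> int:
        return self._shard_for(key)[key]

    def total(self) -> int:
        """Scatter the query to every shard, then aggregate the partial sums."""
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partial_sums = pool.map(lambda shard: sum(shard.values()), self.shards)
            return sum(partial_sums)


store = ShardedStore(shard_count=3)
for i in range(100):
    store.put(f"order/{i}", i)

print(store.get("order/7"))
print(store.total())
```

Note that the simple modulo routing used here remaps most keys whenever the shard count changes, which is one reason consistent hashing is usually preferred for locating data.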

Where is my data?
 Lookup tables
 (Consistent) Hash functions
[Diagram: Node A, Node B, Node C]
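
A consistent hash function for locating data can be sketched like this. The `HashRing` class, the `replicas` parameter (virtual points per node) and the key names are hypothetical, chosen only for illustration; real stores differ in detail. Each node owns many points on a ring, and a key belongs to the first node point found clockwise from the key's hash.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent hash ring with virtual points per node."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("the ring has no nodes")
        # First point at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["Node A", "Node B", "Node C"])
keys = [f"user/{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.remove_node("Node C")
after = {key: ring.node_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only the keys that lived on the removed node change owner; everything else stays where it was, which is what makes adding or removing machines cheap compared with rehashing the whole data set.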

Gossip
[Diagram sequence: eight nodes, shown before gossip starts and after rounds 1, 2, 3 and 4]
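
The round-by-round spread of information through gossip can be simulated in a few lines. This is a simplified push-style model under stated assumptions: one node starts with the news, and in every round each informed node tells one randomly chosen peer. The function name `gossip_rounds` and its parameters are made up for this sketch, and the exact number of rounds depends on the random choices.

```python
import random
from typing import List, Set


def gossip_rounds(node_count: int, seed: int = 0, max_rounds: int = 100) -> List[int]:
    """Return how many nodes are informed before round 1 and after each round."""
    rng = random.Random(seed)
    informed: Set[int] = {0}  # node 0 starts with the news
    history = [len(informed)]
    while len(informed) < node_count and len(history) <= max_rounds:
        newly_told: Set[int] = set()
        for node in informed:
            # Each informed node pushes the news to one random peer.
            peer = rng.choice([n for n in range(node_count) if n != node])
            newly_told.add(peer)
        informed |= newly_told
        history.append(len(informed))
    return history


history = gossip_rounds(node_count=8)
for round_number, count in enumerate(history):
    print(f"round {round_number}: {count} of 8 nodes informed")
```

Because the informed set can at most double per round, the number of rounds grows roughly with the logarithm of the cluster size, which is why gossip works without any central coordinator.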

Don’t forget backups!!!
Replication ≠ Backups

Modeling
 Stop thinking relational
 Start thinking about how your data will be used
 Usage scenarios
 Optimize for reads? Writes?
 Think about your domain objects and business logic in native .Net
(POCO) classes
 Denormalize if needed
 Let entities reference other entities, or collections of them
 Identify aggregate root(s)
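
As an illustration of these modeling points, here is a small sketch using Python dataclasses in place of .Net POCO classes. The `Order` and `OrderLine` types, their fields and the id format are hypothetical. The order is the aggregate root, its lines are embedded in the same document, the product name is denormalized into each line, and the customer is referenced by id only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class OrderLine:
    product_id: str    # reference to a separate product document
    product_name: str  # denormalized copy, so reads need no extra lookup
    unit_price: float
    quantity: int


@dataclass
class Order:
    """Aggregate root: loaded and saved as one document, lines included."""
    id: str
    customer_id: str   # reference to another aggregate, by id only
    lines: List[OrderLine] = field(default_factory=list)

    def total(self) -> float:
        return sum(line.unit_price * line.quantity for line in self.lines)


order = Order(
    id="orders/1",
    customer_id="customers/42",
    lines=[
        OrderLine("products/7", "Keyboard", 10.0, 2),
        OrderLine("products/9", "Cable", 5.0, 1),
    ],
)

# The whole aggregate serializes to a single document.
document = json.dumps(asdict(order), indent=2)
print(document)
print(order.total())
```

The trade-off is the one the slides warn about: the denormalized `product_name` makes reads cheap, but the application has to decide what happens to existing orders when a product is renamed.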

Map-Reduce
 First impressions, but it gets better over time
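
For readers new to the idea, the shape of a map-reduce job can be sketched in plain Python. This is a single-process illustration, not the API of any specific NOSQL product; the function names and the word-count task are assumptions. The map step emits key/value pairs, the pairs are grouped by key, and the reduce step folds each group into one result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Fold all values for one key into a single count."""
    return key, sum(values)


def map_reduce(documents: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = map_reduce(["no sql", "sql scales", "no limits"])
print(counts)
```

Since each document is mapped independently and each key is reduced independently, both phases can be spread across shards, which is why the pattern fits partitioned data stores.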

Play Nice
 Know and use the tools you need for the job at hand

Extra Links and References
 NOSQL debrief
 Distributed Storage -
http://horicky.blogspot.com/2008/08/distributed-storage.html
 Images from here
