NoSQL – rise of the clusters

What is NoSQL? NoSQL describes a family of approaches to managing data at an enterprise level that have key similarities, but - at the same time - are very different from classic SQL based relational databases.

NoSQL has emerged as a 'movement' over the last 5 years, and many specific NoSQL datastores - MongoDB, Redis, HBase, Cassandra, Neo4j - are being used for mission-critical systems by many organizations including Facebook, LinkedIn, Dropbox, American Express, the NSA, & the CIA. Does NoSQL spell the end of SQL-based relational datastores like Oracle, MySQL, SQL Server, & Sybase? Definitely not, but the world is moving in the direction of "Polyglot Persistence" and away from the "Relational Persistence" hegemony. In my presentation I will explain why this shift is occurring and will speculate about what the future will hold.

Presentation Transcript

  • September 19, 2013 Speaker: David Wolfe
  • Topics
    - What is SQL? What is NoSQL?
    - Why have relational databases been successful?
    - Why did NoSQL databases emerge?
    - How are their data models different?
  • SQL & relational databases
    - Relational databases are software applications that store data
    - Data is stored in tables that have rows & columns: think Excel spreadsheets

      FirstName  LastName   Age  Zipcode  Gender
      Bob        Smith      45   38444    M
      Jane       Happy      23   15122    F
      Fred       Jones      55   92102    M
      Johnny     Appleseed  26   90025    M
  • SQL & relational databases
    - Relational databases typically have many tables that are “related” to one another
  • SQL & relational databases
    - Relational databases support access to data in tables through a language called “SQL” (Structured Query Language)
    - SQL supports “set” based operations on tables: selection, projection, joining
    - SQL is based on relational algebra
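The three set-based operations named above can be tried with Python's built-in `sqlite3` module. This is only an illustrative sketch: the `person` rows come from the sample table earlier in the deck, while the `purchase` table and its contents are made up for the example, and SQLite simply stands in for any relational database.

```python
import sqlite3

# In-memory database; the purchase table is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, age INTEGER
    );
    CREATE TABLE purchase (
        person_id INTEGER REFERENCES person(id), item TEXT
    );
    INSERT INTO person VALUES
        (1, 'Bob', 'Smith', 45), (2, 'Jane', 'Happy', 23),
        (3, 'Fred', 'Jones', 55), (4, 'Johnny', 'Appleseed', 26);
    INSERT INTO purchase VALUES (1, 'book'), (1, 'lamp'), (3, 'chair');
""")

# Selection: keep only the rows that satisfy a predicate.
selected = conn.execute(
    "SELECT * FROM person WHERE age > 40 ORDER BY id"
).fetchall()

# Projection: keep only some of the columns.
projected = conn.execute(
    "SELECT first_name, age FROM person ORDER BY id"
).fetchall()

# Join: combine rows from two tables that share a key.
joined = conn.execute(
    "SELECT p.first_name, b.item FROM person AS p "
    "JOIN purchase AS b ON b.person_id = p.id ORDER BY p.id, b.item"
).fetchall()

print(selected)
print(projected)
print(joined)
```

Each query returns a set of rows rather than a single record, which is what "set based" refers to.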
  • SQL & relational databases
    - Relational databases grew out of research at IBM in the 1970s
    - They were the dominant approach to data management in the enterprise through the early 2000s
    - Examples include
        - Oracle
        - Sybase
        - MySQL
        - PostgreSQL
  • NoSQL databases
    - NoSQL databases are software applications that store data
    - Not surprisingly, they do not use SQL or the relational model (interrelated tables)
    - They are “less strict” about data definition
    - They were developed in a “big-data” world for applications needing massive scalability (clustering)
  • NoSQL databases
    - There are many types of NoSQL databases
    - We will review the differences later
  • RDBMS value - persistence
    - During the 1990s and 2000s, as PCs became ubiquitous, distributed computing took off
    - In the 1990s, client-server and n-tier architectures dominated enterprise development
    - The late 1990s and 2000s saw the dominance of the web and of distributed applications that broke out of the enterprise
  • RDBMS value - persistence
    - In this distributed world, applications needed to keep data around for
        - Many users
        - Extended periods
    - RDBMS emerged as the de facto choice for persisting data
  • RDBMS value - concurrency
    - Another challenge that distributed applications presented was concurrency: many users viewing and potentially updating the same data at the same time
    - Concurrency is notoriously difficult to get right, even for the best engineers
    - Relational databases “helped” by controlling data access with transactions
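One ingredient of that help is the all-or-nothing behavior of a transaction. The sketch below uses Python's built-in `sqlite3` module to show it; the `account` table, the names, and the balances are invented for illustration, and the example covers atomicity only, not the locking or isolation a multi-user server also provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
print(ok, failed, balances(conn))
```

In the failing transfer the credit to `bob` has already been applied when the debit violates the constraint, and the rollback undoes it, so no reader ever sees money created out of nothing.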
  • RDBMS value - integration
    - Enterprise application ecosystems require multiple integrated software applications. Examples:
        - Customer service app
        - Business intelligence app
        - E-commerce app
        - Inventory management apps
    - The common approach was to integrate them through a shared RDBMS
  • RDBMS value – SQL
    - RDBMS providers all supported a core SQL standard
    - In theory this would allow developers to switch between RDBMS providers without problems
    - In fact, different providers (Oracle, Sybase, Microsoft) developed different “dialects” or SQL extensions (PL/SQL vs. T-SQL)
  • Crack #1 – impedance mismatch
    - Impedance mismatch is the difference between the relational model and in-memory data structures
  • Crack #1 – impedance mismatch
    - In the late 1990s people believed that impedance mismatch would lead to RDBMS being replaced by databases that replicated in-memory structures to disk (OODBMS)
    - While the 1990s saw the rise of OO programming languages, OODBMS never gained real traction
  • Crack #1 – impedance mismatch
    - OODBMS didn’t gain traction because
        - Impedance mismatch had been made easier to deal with by Object-Relational (OR) mapping frameworks like Hibernate, iBatis, & Cocoon
        - There was a growing professional divide between application developers and database administrators
        - The value of RDBMS as an app integration mechanism was large
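To make the mismatch concrete, here is a minimal hand-written mapping between one nested in-memory object and the flat rows a relational schema would hold. The `Customer` class, its fields, and the two implied tables are hypothetical; an OR mapping framework automates this kind of translation, and this sketch is not how any particular framework works internally.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Customer:
    """In-memory shape: one object that contains a nested list."""
    customer_id: int
    name: str
    phones: List[str] = field(default_factory=list)


def to_rows(c: Customer) -> Tuple[tuple, List[tuple]]:
    """Flatten one object into rows for two hypothetical tables."""
    customer_row = (c.customer_id, c.name)
    phone_rows = [(c.customer_id, phone) for phone in c.phones]
    return customer_row, phone_rows


def from_rows(customer_row: tuple, phone_rows: List[tuple]) -> Customer:
    """Rebuild the nested object from the flat rows."""
    customer_id, name = customer_row
    phones = [phone for (cid, phone) in phone_rows if cid == customer_id]
    return Customer(customer_id, name, phones)


bob = Customer(1, "Bob Smith", ["555-0100", "555-0101"])
customer_row, phone_rows = to_rows(bob)
print(customer_row, phone_rows)
print(from_rows(customer_row, phone_rows) == bob)
```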
  • Crack #2 – SOA
    - The 2000s saw a shift in how enterprise applications interacted
    - Historically, many applications interacted through a shared RDBMS
    - This approach (shared integration RDBMS) has serious problems
        - Overly complex schema
        - Can’t change tables or add indices easily
        - Database has to preserve integrity
  • Crack #2 – SOA
    - Interactions between applications shifted to web services
    - Web services are protocols for moving documents (XML, JSON) over HTTP using SOAP- or REST-based approaches
    - SOA allowed applications to encapsulate data and expose it through services
  • The Final Crack #3 – Clusters
    - The internet saw several large web properties dramatically increase in scale
    - Websites started tracking activity and structure in a very detailed way
        - Social gestures
        - Social links
        - Log data
        - Purchase gestures
    - Increasing numbers of users appeared, using more devices
  • The Final Crack #3 – Clusters
    - The problem with scaling out (clustering) is that RDBMS are not designed to run on clusters
    - Oracle RAC & MS SQL Server both use the concept of a shared disk sub-system
        - This is still a single point of failure and a scaling limitation
    - The final crack: the mismatch between RDBMS & clusters
  • NoSQL Emergence
    - The emergence of NoSQL was really about needing databases that run on clusters
        - One exception is graph databases
    - Though problems with shared database integration and impedance mismatch existed, it was the need for scale that drove the emergence of NoSQL databases
  • Aggregate Data Models
    - A key characteristic of NoSQL databases is that they do not use the relational data metamodel (relations & tuples)
    - There are four types of data metamodels in the NoSQL ecosystem
        - Key-value
        - Document
        - Column-family
        - Graph
  • Aggregate Data Models
    - Key-value, document, and column-family NoSQL databases share a common characteristic of their data models called “aggregate orientation”
    - We will not cover graph-based data metamodels in this presentation
  • Aggregates
    - The relational model takes information you want to store and divides it into rows
    - In an RDBMS, rows are lists of simple data values
    - In an RDBMS, rows are the unit of data operation
    - Aggregate orientation recognizes that data units are often more complex and can have nested lists and record structures
    - With aggregate orientation, aggregates are the unit of data operation
  • Relational Data Example
  • Aggregate Data Example
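The figures for the two example slides above did not survive in this transcript. As a stand-in, the sketch below shows the same small data set in both shapes: three flat, id-linked tables, and one nested order aggregate assembled from them. All table names, field names, and values are hypothetical and are not taken from the original slides.

```python
import json

# Relational form: flat tables whose rows refer to each other by id.
customers = [{"id": 1, "name": "Bob Smith"}]
orders = [{"id": 99, "customer_id": 1}]
order_lines = [
    {"order_id": 99, "product": "book", "quantity": 2},
    {"order_id": 99, "product": "lamp", "quantity": 1},
]


def order_aggregate(order_id):
    """Assemble one nested aggregate from the flat tables above."""
    order = next(o for o in orders if o["id"] == order_id)
    customer = next(c for c in customers if c["id"] == order["customer_id"])
    return {
        "id": order["id"],
        "customer": {"id": customer["id"], "name": customer["name"]},
        "lines": [
            {"product": line["product"], "quantity": line["quantity"]}
            for line in order_lines
            if line["order_id"] == order_id
        ],
    }


# Aggregate form: the whole order travels as a single nested unit.
aggregate = order_aggregate(99)
print(json.dumps(aggregate, indent=2))
```

In the relational form, reading one order touches three tables; in the aggregate form, it is a single lookup of one nested value.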
  • Consequences of Aggregate Orientation
    - Relations capture data elements and relations, but not aggregates
    - Aggregates are really “chunks” of data that are typically retrieved and operated on as an interaction unit
    - Aggregates are about how the data is being used
    - RDBMS have no knowledge of aggregate structure and can’t use it to store and distribute data
  • Consequences of Aggregate Orientation
    - So, RDBMS are aggregate-ignorant. Is that a bad or a good thing? It’s both
    - It’s good if you need to access and use the data in many different ways, that is, if you don’t have a primary structure for manipulating your data
    - It’s bad if you want to run on a cluster
    - Aggregates are great on clusters because you can distribute them across nodes
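A rough sketch of why aggregates distribute well: because each aggregate is self-contained and has a key, a node can be chosen from the key alone. The node names are hypothetical, and the simple hash-modulo placement used here is an assumption made for illustration; production systems generally use more elaborate partitioning schemes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for(key, nodes=NODES):
    """Pick a node from a stable hash of the aggregate's key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def place(keys, nodes=NODES):
    """Group aggregate keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"customer:{i}" for i in range(100)]
placement = place(keys)
for node, stored in placement.items():
    print(node, len(stored))
```

Every aggregate lands on exactly one node, so reading or writing it never requires coordinating several machines.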
  • Consequences of Aggregate Orientation
    - Aggregate orientation allows you to operate on many logical data items (in the aggregate) by updating the aggregate atomically
    - Aggregate-oriented NoSQL databases can be said to support transactions on single aggregates, but not across aggregates
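The single-aggregate guarantee can be illustrated with a toy in-memory store. `AggregateStore` and its methods are invented for this sketch and do not correspond to the API of any real NoSQL product; the point is only that a change to one aggregate is applied completely or not at all, while nothing ties two aggregates together.

```python
import copy
import threading


class AggregateStore:
    """Toy in-memory store; each update touches exactly one aggregate."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, mutate):
        """Apply mutate() to a copy; swap it in only if mutate() succeeds."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            mutate(draft)  # may raise; the stored value is then untouched
            self._data[key] = draft


store = AggregateStore()
store.put("order:99", {"lines": [], "total": 0})


def add_book(order):
    # Two logical changes applied together as one aggregate update.
    order["lines"].append({"product": "book", "price": 10})
    order["total"] += 10


def broken(order):
    order["lines"].append({"product": "lamp", "price": 25})
    raise ValueError("fails before the total is updated")


store.update("order:99", add_book)
try:
    store.update("order:99", broken)
except ValueError:
    pass  # the half-finished change was discarded

print(store.get("order:99"))
```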
  • Key-Value & Document Data Models
    - Both types of databases have a key or ID that is mapped to an aggregate data structure in a virtual table
    - With key-value NoSQL databases, we can only access the aggregate by looking up its key
    - With document databases we can also look up aggregates by fields in the aggregate
  • Key-Value & Document Data Models
    - Examples of key-value NoSQL databases are
        - Redis
    - Examples of document NoSQL databases are
        - MongoDB
        - Couchbase
        - SimpleDB
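The difference in access paths between the two models can be sketched with two toy classes. `KeyValueStore` and `DocumentStore` are hypothetical names for this illustration and do not mirror the API of Redis, MongoDB, or any other product listed above.

```python
class KeyValueStore:
    """The aggregate is an opaque value: lookup is by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class DocumentStore(KeyValueStore):
    """The aggregate's structure is visible, so fields can be queried."""

    def find(self, **criteria):
        return [
            doc for doc in self._data.values()
            if all(doc.get(name) == value for name, value in criteria.items())
        ]


docs = DocumentStore()
docs.put("customer:1", {"name": "Bob Smith", "zipcode": "38444"})
docs.put("customer:2", {"name": "Jane Happy", "zipcode": "15122"})

by_key = docs.get("customer:1")          # works in both models
by_field = docs.find(zipcode="15122")    # only the document model offers this
print(by_key)
print(by_field)
```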
  • Column-Family Data Models
    - These NoSQL databases were influenced by Google’s Bigtable
    - The column-family model is a two-level aggregate structure
        - There is a key (row identifier) that maps to the aggregate of interest
        - The aggregate is a map of more detailed values, which are referred to as columns
  • Column-Family Data Models
  • Column-Family Data Models
    - Column-family databases organize columns into families
    - The data is row-oriented
        - Each row is an aggregate (e.g. customer with id 1234)
    - The data is column-oriented
        - Each column family defines a record type (e.g. customer profile)
        - But columns can also be dynamic and unique (to model lists)
  • Column-Family Data Models
    - Examples of column-family NoSQL databases are
        - HBase
        - Cassandra
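The two-level structure described above (row key, then column family, then columns) can be modeled as nested maps. `ColumnFamilyStore` is a hypothetical toy class, not the HBase or Cassandra API, and the customer data is invented; the `orders` family shows the dynamic, per-row columns used to model a list.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy model: row key -> column family -> column name -> value."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        self._rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Read one column family of one row."""
        return dict(self._rows.get(row_key, {}).get(family, {}))

    def get_row(self, row_key):
        """Read the whole row, i.e. the whole aggregate."""
        row = self._rows.get(row_key, {})
        return {family: dict(columns) for family, columns in row.items()}


store = ColumnFamilyStore()
# "profile" behaves like a fixed record type.
store.put("1234", "profile", "name", "Bob Smith")
store.put("1234", "profile", "zipcode", "38444")
# "orders" uses dynamic column names, one per order, to model a list.
store.put("1234", "orders", "order:99", "book x2")
store.put("1234", "orders", "order:100", "lamp x1")

profile = store.get_family("1234", "profile")
row = store.get_row("1234")
print(profile)
print(sorted(row["orders"]))
```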
  • Polyglot Persistence
    - The future?
        - Only NoSQL?
        - Only SQL?
        - Probably both: Polyglot Persistence