Overview of no sql
 

    Overview of no sql Presentation Transcript

    • Overview of NoSQL... motivation, technologies, should you care?
    • Overview
      ● Evolution of/motivation for NoSQL databases
      ● Characterization of NoSQL databases
      ● Classification of NoSQL databases
      ● Popularity/usage of NoSQL systems
    • A brief history of NoSQL
      ● Originally coined in 1998 by Strozzi for a specific non-relational database
        ○ easy to use, free, text-based data storage, easy manipulation of the contents of the db
      ● Reintroduced by Evans (Rackspace) in 2009 for a conference on open source distributed databases
        ○ in response to increased interest in non-RDBMS solutions
          ■ bringing together Cassandra, Mongo, Couch, etc.
      ● Has grown as a movement over the last 3 years
    • Current status
      ● Significant buzz within community in 2010
        ○ initial development of technology
        ○ pioneer deployments
        ○ lots of meetups/conferences/birds of a feather sessions
      ● Many key technologies evolved later, in 2010 and 2011
        ○ more large deployments for some technologies
        ○ small companies with no legacy basing operations on NoSQL
    • Current status
      ● 2012
        ○ buzz/hype is fading
        ○ technology continues to mature
        ○ increased number of deployments
        ○ skills sought in job market
    • NoSQL - a negative definition
      ● NoSQL is simply defined by being non-relational
        ○ a diverse set of technologies falls into the NoSQL camp
      ● Motivations are mixed
        ○ open source
        ○ scale - TB, PB - particularly for read/write latency
        ○ increased flexibility over RDBMS systems
        ○ ability to work with raw data
        ○ ACID is not always the most appropriate design choice
          ■ analytics data is an excellent example
      ● Results in many different NoSQL technologies
    • Typical characteristics
      ● Don't use SQL!
      ● Open source
      ● Intended to deliver performance
        ○ in some dimension
      ● Typically JOIN is not supported
        ○ performance hit
      ● Consistency often relaxed
        ○ eventual consistency
      ● More flexibility in schema
        ○ if a schema is used at all!
    • Diversity of NoSQL databases
      ● 122 separate technologies listed on http://nosql-database.org/
        ○ mix of commercial, open source and some in between
      ● Vary in many dimensions:
        ○ architecture
        ○ interfaces
          ■ API/languages
        ○ internal data storage
        ○ distribution mechanisms
          ■ redundancy, reliability
        ○ usage - deployments & support community
        ○ maturity
    • Classification of NoSQL systems
      ● Column based solutions
      ● Document store solutions
      ● Key/value solutions
      ● Graph based solutions
      ● Less significantly:
        ○ XML databases
        ○ Object databases
        ○ Multivalue databases
    • Column based solutions
      ● Structured data
        ○ similar to classical tables
      ● Generally much more flexible
        ○ no rigorous schema necessary
        ○ can typically add columns in ad hoc fashion
          ■ often without explicitly declaring the column
      ● However, can result in very different usage
        ○ e.g. can have millions of columns associated with a given row
      ● Examples: Hadoop/HBase, Cassandra, Hypertable, SimpleDB
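The ad hoc column idea above can be illustrated with a toy in-memory model. This is a minimal sketch only: the `ColumnFamily` class and its methods are hypothetical names invented for illustration and do not correspond to the API of HBase, Cassandra, or any other system listed on the slide.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column-family table (hypothetical, for illustration only)."""

    def __init__(self):
        # row key -> {column name -> value}; rows need not share columns
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Writing to a column that was never declared simply creates it.
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # .get() avoids creating an empty row as a side effect of reading.
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        # Each row reports only the columns it actually holds.
        return sorted(self._rows.get(row_key, {}))


users = ColumnFamily()
users.put("alice", "email", "alice@example.com")
users.put("alice", "last_login", "2012-03-01")
users.put("bob", "email", "bob@example.com")
users.put("bob", "twitter", "@bob")  # ad hoc column, present only on this row

print(users.columns("alice"))
print(users.columns("bob"))
```

Because every row carries its own column map, a row with millions of columns costs nothing for the rows that do not use them, which is the usage pattern the slide hints at.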
    • Document based solutions
      ● Less structured data
        ○ DB composed of documents containing arbitrary data
          ■ usually containing longer form content, e.g. CMS
      ● Documents contain some structure to support query/search/filter, etc.
      ● Somewhat less emphasis on a key
        ○ can be autogenerated
      ● Quite unlike classical databases
      ● Examples: MongoDB, CouchDB
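A document store in this sense can be sketched in a few lines. The `DocumentStore` class, its `insert` and `find` methods, and the sample documents below are assumptions made up for illustration; they are not the MongoDB or CouchDB API.

```python
import uuid


class DocumentStore:
    """Toy document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # autogenerated id -> document (a plain dict)

    def insert(self, document):
        # The key is generated by the store; the caller never has to pick one.
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = dict(document)
        return doc_id

    def find(self, **criteria):
        # Return documents whose fields match every criterion.
        # Documents lacking a field simply do not match.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


pages = DocumentStore()
pages.insert({"type": "article", "title": "Intro to NoSQL", "tags": ["db"]})
pages.insert({"type": "article", "title": "Cassandra notes", "author": "sam"})
pages.insert({"type": "comment", "body": "Nice overview"})

articles = pages.find(type="article")
print(len(articles))
```

The three documents share no fixed schema, yet the small amount of structure they do have (the `type` field) is enough to support filtering.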
    • Key/value stores
      ● DBs inspired by memcached
        ○ simple, fast key/value stores
      ● Attempt to retain most of the DB in memory
        ○ fast response times
      ● Different designs for scalability
        ○ single node/multi node
      ● Much emphasis on the keys in this type of DB
      ● A write usually overwrites the entire previous entry
      ● Examples: Redis, Couchbase/Membase, DynamoDB, Riak
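The whole-entry overwrite behaviour is easy to see in a toy version. This is a sketch under stated assumptions: `KeyValueStore` is a hypothetical in-memory class, not the client API of Redis, Riak, or any other store named above.

```python
class KeyValueStore:
    """Toy in-memory key/value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}  # everything lives in memory, as in a cache

    def set(self, key, value):
        # The value is opaque to the store: a write replaces it wholesale.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Report whether the key existed before removal.
        existed = key in self._data
        self._data.pop(key, None)
        return existed


cache = KeyValueStore()
cache.set("session:42", {"user": "alice", "cart": ["book"]})
cache.set("session:42", {"user": "alice"})  # replaces the whole entry

print(cache.get("session:42"))
```

After the second write the `cart` field is gone: the store has no notion of updating part of a value, so the application must read, modify, and rewrite the entry itself.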
    • Graph based solutions
      ● Obviously different from previous categories
        ○ focus specifically on graphs
      ● Queries supported are graph-specific
        ○ e.g. get nodes related to a specified node
      ● Typically support solving standard graph problems
        ○ e.g. shortest path, general graph traversal
      ● Can deliver very significant performance gains over non-graph-specific solutions
        ○ for graph problems!
      ● Examples: Neo4j
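The two kinds of query mentioned above (related nodes and shortest path) can be sketched over a plain adjacency list. The graph data and the function names `neighbours` and `shortest_path` are invented for illustration; a real graph database such as Neo4j exposes its own query language and storage engine, which this does not attempt to reproduce.

```python
from collections import deque

# Hypothetical unweighted social graph stored as an adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def neighbours(graph, node):
    """Nodes directly related to the given node."""
    return graph.get(node, [])


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest path by hop count, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # visited set plus back-pointers
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in previous:
                continue
            previous[nxt] = node
            if nxt == goal:
                # Walk the back-pointers to rebuild the path.
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


print(neighbours(graph, "dave"))
print(shortest_path(graph, "alice", "erin"))
```

Breadth-first search is used here because the example graph is unweighted; weighted graphs would need a different algorithm such as Dijkstra's.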
    • It's a noisy space...
      ● Very many candidate technologies
      ● Relatively small number of real-world solutions
      ● The difference between the classifications above is one of emphasis...
        ○ column based and document based arrive at the semi-structured sweet spot from opposite ends of the spectrum
      ● ...although this results in different preferred use cases...
        ○ document based solutions are better for document problems, e.g. CMS
    • Common techniques used
      ● Hashing techniques used to map data to nodes in cluster
      ● Internode communication via Gossip
      ● Common replication techniques
      ● Thrift is used in a few cases
      ● MapReduce often used to search over distributed system
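One widely used way to map data to nodes by hashing is a consistent hash ring. The slide does not say which hashing scheme any particular system uses, so the sketch below is an assumption chosen for illustration; the `HashRing` class, the `vnodes` parameter, and the node names are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring (hypothetical, for illustration only)."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at several points ("virtual nodes")
        # so that keys spread more evenly. Requires at least one node.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # Any stable hash works; SHA-256 is used here for determinism.
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # The owner is the first ring point clockwise from the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])

keys = [f"user:{i}" for i in range(1000)]
moved = [k for k in keys if ring.node_for(k) != bigger.node_for(k)]
print(len(moved), "of", len(keys), "keys change owner when node-d joins")
```

The design point is that adding a node only takes keys away from existing nodes and hands them to the new one; a plain `hash(key) % number_of_nodes` scheme would instead reshuffle most keys whenever the cluster size changes.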
    • Comparison (oldish)... [three slides; content not captured in the transcript]
    • Horses for courses...
      ● SQL is a perfectly good solution for many problems
        ○ tried and tested
      ● Some problems require an alternative solution
        ○ typically driven by scale and/or flexibility
      ● NoSQL offers (many) alternatives
        ○ although it is relatively easy to identify the realistic options
      ● Column based approaches are good for mostly structured data with enhanced flexibility
      ● Document based approaches are good for document oriented problems
    • ...so let's dive into one NoSQL database...
      ● Cassandra...