
Overview of NoSQL


Published in: Technology

  1. 1. Overview of NoSQL... motivation, technologies, should you care?
  2. 2. Overview
     ● Evolution of/motivation for NoSQL databases
     ● Characterization of NoSQL databases
     ● Classification of NoSQL databases
     ● Popularity/usage of NoSQL systems
  3. 3. A brief history of NoSQL
     ● Originally coined in 1998 by Strozzi for a specific non-relational database
       ○ easy to use, free, text-based data storage, easy manipulation of contents of db
     ● Reintroduced by Evans (Rackspace) in 2009 for a conference on open source distributed databases
       ○ in response to increased interest in non-RDBMS solutions
         ■ bringing together Cassandra, Mongo, Couch, etc
     ● Has grown as a movement over the last 3 years
  4. 4. Current status
     ● Significant buzz within community in 2010
       ○ initial development of technology
       ○ pioneer deployments
       ○ lots of meetups/conferences/birds-of-a-feather sessions
     ● Many key technologies evolved later 2010, 2011
       ○ more large deployments for some technologies
       ○ small companies with no legacy basing operations on NoSQL
  5. 5. Current status
     ● 2012
       ○ buzz/hype is fading
       ○ technology continues to mature
       ○ increased number of deployments
       ○ skills sought in job market
  6. 6. NoSQL - a negative definition
     ● NoSQL simply defined by being non-relational
       ○ diverse set of technologies fall into NoSQL camp
     ● Motivations mixed
       ○ open source
       ○ scale - TB, PB - particularly for read/write latency
       ○ increased flexibility over RDBMS systems
       ○ ability to work with raw data
       ○ ACID not always the most appropriate design choice
         ■ analytics data is an excellent example
     ● Results in many different NoSQL technologies
  7. 7. Typical characteristics
     ● Don't use SQL!
     ● Open source
     ● Intended to deliver performance
       ○ in some dimension
     ● Typically JOIN not supported
       ○ performance hit
     ● Consistency often relaxed
       ○ eventual consistency
     ● More flexibility in schema
       ○ if schema used at all!
  8. 8. Diversity of NoSQL databases
     ● 122 separate technologies listed on http: //
       ○ mix of commercial, open source and some in between
     ● Vary in many dimensions:
       ○ architecture
       ○ interfaces
         ■ api/languages
       ○ internal data storage
       ○ distribution mechanisms
         ■ redundancy, reliability
       ○ usage - deployments & support community
       ○ maturity
  9. 9. Classification of NoSQL systems
     ● Column based solutions
     ● Document store solutions
     ● Key/Value solutions
     ● Graph based solutions
     ● Less significantly:
       ○ XML databases
       ○ Object databases
       ○ Multivalue databases
  10. 10. Column based solutions
      ● Structured data
        ○ similar to classical tables
      ● Generally much more flexible
        ○ no rigorous schema necessary
        ○ can typically add columns in ad hoc fashion
          ■ often without explicitly declaring column
      ● However, can result in very different usage
        ○ eg can have millions of columns associated with given row
      ● Examples: Hadoop/HBase, Cassandra, Hypertable, SimpleDB
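
The "add columns in ad hoc fashion" idea can be sketched as a sparse table in which every row carries its own set of columns. This is a toy in-memory model, not how HBase or Cassandra store data; the class name `WideColumnTable` and its methods are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy sparse table: each row key maps to its own dict of column -> value.

    Hypothetical illustration only; real column stores add persistence,
    column families, timestamps and distribution across nodes.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # No schema: a column comes into existence the first time it is written.
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        # Return a copy so callers cannot mutate the stored row by accident.
        return dict(self.rows.get(row_key, {}))


table = WideColumnTable()
table.put("user:1", "name", "Ada")
table.put("user:1", "email", "ada@example.org")
table.put("user:2", "name", "Grace")
table.put("user:2", "last_login", "2012-03-01")  # column never declared anywhere
```

Rows `user:1` and `user:2` end up with different column sets, which is the flexibility the slide describes. Nothing in this model limits the number of columns per row.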
  11. 11. Document based solutions
      ● Less structured data
        ○ DB composed of documents containing arbitrary data
          ■ usually containing longer form content, eg CMS
      ● Documents contain some structure to support query/search/filter, etc
      ● Somewhat less emphasis on a key
        ○ can be autogenerated
      ● Quite unlike classical databases
      ● Examples: MongoDB, CouchDB
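
A minimal sketch of the document model, assuming documents are plain Python dicts: the key is autogenerated when the caller does not supply one, and queries filter on whatever fields a document happens to contain. `DocumentStore`, `insert` and `find` are hypothetical names and are not the MongoDB or CouchDB API.

```python
import uuid


class DocumentStore:
    """Toy document store holding arbitrary dicts (illustrative only)."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        # The key matters less than in a key/value store, so generate one if absent.
        doc_id = doc.get("_id") or uuid.uuid4().hex
        self.docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def find(self, **criteria):
        # Match documents whose fields equal every given criterion.
        # A document that lacks a field simply does not match.
        return [
            doc for doc in self.docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
article_id = store.insert({"type": "article", "title": "Overview of NoSQL",
                           "body": "Longer form content..."})
store.insert({"type": "comment", "author": "ada", "text": "Nice overview"})
articles = store.find(type="article")
```

The two documents share no fixed schema, yet both can be filtered by the `type` field because that piece of structure is present in each.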
  12. 12. Key/value stores
      ● DBs inspired by memcache
        ○ simple, fast key/value stores
      ● Attempt to retain most of DB in memory
        ○ fast response times
      ● Different designs for scalability
        ○ single node/multi node
      ● Much emphasis on the keys in this type of DB
      ● Write usually overwrites entire previous entry
      ● Examples: Redis, Couchbase/Membase, DynamoDB, Riak
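
The "write overwrites entire previous entry" behaviour can be sketched with a single-node, in-memory store. This is a toy under stated assumptions: `KeyValueStore` is a hypothetical class, the value is opaque to the store, and there is no persistence, expiry or clustering.

```python
class KeyValueStore:
    """Toy in-memory key/value store (illustrative only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # The store does not look inside the value: a write replaces it wholesale.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


kv = KeyValueStore()
kv.set("session:42", {"user": "ada", "cart": ["book"]})
kv.set("session:42", {"user": "ada"})  # replaces the whole entry, cart is gone
session = kv.get("session:42")
```

Because there is no partial update, a client that wants to change one field has to read the value, modify it and write the whole value back.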
  13. 13. Graph based solutions
      ● Obviously different from previous categories
        ○ focus specifically on graphs
      ● Queries supported are graph-specific
        ○ eg get nodes related to specified node
      ● Typically support solving standard graph problems
        ○ eg shortest path, general graph traversal
      ● Can deliver very significant performance gains over non-graph-specific solutions
        ○ for graph problems!
      ● Examples: Neo4j
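
The two query styles mentioned above (nodes related to a given node, and shortest path) can be sketched on a plain adjacency dict using breadth-first search. This is generic Python, not Neo4j's query language; the graph data and the function name `shortest_path` are made up for illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return the fewest-hops path from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by an equally short or shorter route
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the back-pointers to rebuild the path, then reverse it.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical directed "follows" graph.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}
related = graph["alice"]                        # nodes related to a specified node
path = shortest_path(graph, "alice", "erin")    # standard graph problem
```

A graph database keeps relationships as first-class data, so traversals like this avoid the repeated self-joins a relational schema would need.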
  14. 14. It's a noisy space...
      ● Very many candidate technologies
      ● Relatively small number of real-world solutions
      ● Difference between the classifications above is one of emphasis...
        ○ column based and document based arrive at semi-structured sweet spot from opposite ends of spectrum
      ● ...although this results in different preferred use cases...
        ○ document based solution better for document problems, eg CMS
  15. 15. Common techniques used
      ● Hashing techniques used to map data to nodes in cluster
      ● Internode communication via Gossip
      ● Common replication techniques
      ● Thrift is used in a few cases
      ● MapReduce often used to search over distributed system
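
One widely used way to map data to nodes is consistent hashing; the slide does not say which hashing scheme each system uses, so treat the following as a sketch of that one technique. The names `HashRing`, `node_for` and `vnodes`, and the choice of SHA-256, are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key):
    # Any stable hash works; SHA-256 is used here only because it is in the stdlib.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed at several points on the ring
        # to spread keys more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # A key belongs to the first ring point clockwise from its hash,
        # wrapping around at the end of the ring.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if ring.node_for(k) != bigger.node_for(k)]
```

The useful property is that adding `node-d` only moves keys onto `node-d`; every other key stays on the node it already had, so a growing cluster does not reshuffle all of its data.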
  16. 16. Comparison (oldish)...
  17. 17. Comparison (oldish)
  18. 18. Comparison (oldish)
  19. 19. Horses for courses...
      ● SQL is a perfectly good solution for many problems
        ○ tried and tested
      ● Some problems require an alternative solution
        ○ typically driven by scale and/or flexibility
      ● NoSQL offers (many) alternatives
        ○ although relatively easy to identify realistic options
      ● Column based approaches good for mostly structured data with enhanced flexibility
      ● Document based approaches good for document oriented problems
  20. 20. Let's dive into one NoSQL database...
      ● Cassandra...