Overview of NoSQL


  1. 1. Overview of NoSQL... motivation, technologies, should you care?
  2. 2. Overview
     ● Evolution of/motivation for NoSQL databases
     ● Characterization of NoSQL databases
     ● Classification of NoSQL databases
     ● Popularity/usage of NoSQL systems
  3. 3. A brief history of NoSQL
     ● Originally coined in 1998 by Strozzi for a specific non-relational database
       ○ easy to use, free, text-based data storage, easy manipulation of database contents
     ● Reintroduced by Evans (Rackspace) in 2009 for a conference on open source distributed databases
       ○ in response to growing interest in non-RDBMS solutions
         ■ bringing together Cassandra, Mongo, Couch, etc.
     ● Has grown as a movement over the last 3 years
  4. 4. Current status
     ● Significant buzz within the community in 2010
       ○ initial development of technology
       ○ pioneer deployments
       ○ lots of meetups/conferences/birds-of-a-feather sessions
     ● Many key technologies evolved in late 2010 and 2011
       ○ more large deployments for some technologies
       ○ small companies with no legacy systems basing operations on NoSQL
  5. 5. Current status
     ● 2012
       ○ buzz/hype is fading
       ○ technology continues to mature
       ○ increased number of deployments
       ○ skills sought in job market
  6. 6. NoSQL - a negative definition
     ● NoSQL is simply defined by being non-relational
       ○ a diverse set of technologies falls into the NoSQL camp
     ● Motivations are mixed
       ○ open source
       ○ scale - TB, PB - particularly for read/write latency
       ○ increased flexibility over RDBMS systems
       ○ ability to work with raw data
       ○ ACID is not always the most appropriate design choice
         ■ analytics data is an excellent example
     ● Results in many different NoSQL technologies
  7. 7. Typical characteristics
     ● Don't use SQL!
     ● Open source
     ● Intended to deliver performance
       ○ in some dimension
     ● JOIN typically not supported
       ○ JOINs carry a performance hit
     ● Consistency often relaxed
       ○ eventual consistency
     ● More flexibility in schema
       ○ if a schema is used at all!
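
Eventual consistency can be sketched as follows. This is a minimal illustration, not how any particular NoSQL system implements it: the `Replica` class, the integer version numbers, the last-write-wins rule and the `anti_entropy` helper are all assumptions made for the example.

```python
class Replica:
    """One copy of the data; stores (version, value) per key (hypothetical model)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last-write-wins: keep only the entry with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("user:1", "alice", version=1)  # the write lands on one replica only
stale = r2.read("user:1")               # the other replica has not seen it yet
anti_entropy(r1, r2)                    # background synchronisation
fresh = r2.read("user:1")               # now both replicas agree
print(stale, fresh)
```

A read between the write and the synchronisation returns stale data. That is the trade-off accepted in exchange for not coordinating every write across nodes.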
  8. 8. Diversity of NoSQL databases
     ● 122 separate technologies listed on http: //
       ○ mix of commercial, open source and some in between
     ● Vary in many dimensions:
       ○ architecture
       ○ interfaces
         ■ API/languages
       ○ internal data storage
       ○ distribution mechanisms
         ■ redundancy, reliability
       ○ usage - deployments & support community
       ○ maturity
  9. 9. Classification of NoSQL systems
     ● Column based solutions
     ● Document store solutions
     ● Key/value solutions
     ● Graph based solutions
     ● Less significantly:
       ○ XML databases
       ○ Object databases
       ○ Multivalue databases
  10. 10. Column based solutions
     ● Structured data
       ○ similar to classical tables
     ● Generally much more flexible
       ○ no rigorous schema necessary
       ○ can typically add columns in ad hoc fashion
         ■ often without explicitly declaring the column
     ● However, can result in very different usage
       ○ e.g. can have millions of columns associated with a given row
     ● Examples: Hadoop/HBase, Cassandra, Hypertable, SimpleDB
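
The "add columns in ad hoc fashion" idea can be sketched with nested dictionaries. This is a toy model under stated assumptions: `ColumnFamily` and its methods are hypothetical names for this example and do not mirror the API of HBase, Cassandra or any other product.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy wide-column table: row key -> {column name -> value} (hypothetical)."""

    def __init__(self):
        # Rows are sparse: each row holds only the columns written to it.
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # No schema check: writing to an unseen column simply creates it.
        self.rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self.rows.get(row_key, {}))


users = ColumnFamily()
users.put("alice", "email", "alice@example.com")
users.put("alice", "city", "Dublin")
users.put("bob", "email", "bob@example.com")
users.put("bob", "last_login", "2012-03-01")  # new column, no declaration needed

print(users.columns("alice"))
print(users.columns("bob"))
```

Each row carries its own set of columns, so one row could in principle hold millions of them while its neighbour holds two.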
  11. 11. Document based solutions
     ● Less structured data
       ○ DB composed of documents containing arbitrary data
         ■ usually containing longer-form content, e.g. CMS
     ● Documents contain some structure to support query/search/filter, etc.
     ● Somewhat less emphasis on a key
       ○ can be autogenerated
     ● Quite unlike classical databases
     ● Examples: MongoDB, CouchDB
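
A document store with autogenerated keys and field-based filtering might look roughly like this. The `DocumentStore` class and its `insert`/`find` methods are assumptions for illustration only; they are not the MongoDB or CouchDB API.

```python
import uuid


class DocumentStore:
    """Toy document store: free-form dicts addressed by generated ids (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        # The key is autogenerated; callers rarely need to choose one.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria):
        # Filter on whatever fields a document happens to contain.
        return [
            doc for doc in self.docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


cms = DocumentStore()
post_id = cms.insert({"type": "post", "title": "Hello", "body": "Welcome", "tags": ["intro"]})
cms.insert({"type": "page", "title": "About", "body": "Who we are"})  # no "tags" field

posts = cms.find(type="post")
print(post_id, posts)
```

The two documents have different fields, and the query relies on structure inside the document rather than on the key.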
  12. 12. Key/value stores
     ● DBs inspired by memcached
       ○ simple, fast key/value stores
     ● Attempt to retain most of the DB in memory
       ○ fast response times
     ● Different designs for scalability
       ○ single node/multi node
     ● Much emphasis on the keys in this type of DB
     ● A write usually overwrites the entire previous entry
     ● Examples: Redis, Couchbase/Membase, DynamoDB, Riak
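
The point that a write overwrites the entire previous entry can be sketched with an in-memory store that treats values as opaque blobs. `KeyValueStore` is a hypothetical class for this example, and JSON serialisation is an assumption used to make the opacity visible; real systems such as Redis offer richer value types.

```python
import json


class KeyValueStore:
    """Toy in-memory key/value store with opaque values (hypothetical)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # The store sees only a serialised blob, never individual fields.
        self._data[key] = json.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


kv = KeyValueStore()
kv.set("session:42", {"user": "alice", "cart": ["book"]})
kv.set("session:42", {"user": "alice"})  # replaces the whole entry; the cart is gone
after_overwrite = kv.get("session:42")

# Changing one field means read, modify, then write the whole value back.
session = kv.get("session:42")
session["cart"] = ["book", "pen"]
kv.set("session:42", session)

print(after_overwrite, kv.get("session:42"))
```

Because the store cannot look inside a value, key design (here the `session:42` naming pattern) carries most of the modelling effort.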
  13. 13. Graph based solutions
     ● Obviously different from previous categories
       ○ focus specifically on graphs
     ● Queries supported are graph-specific
       ○ e.g. get nodes related to a specified node
     ● Typically support solving standard graph problems
       ○ e.g. shortest path, general graph traversal
     ● Can deliver very significant performance gains over non-graph-specific solutions
       ○ for graph problems!
     ● Examples: Neo4j
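
The two query types mentioned above (related nodes, shortest path) can be sketched over a plain adjacency list. The sample graph, `neighbours` and `shortest_path` are assumptions for this example; a graph database such as Neo4j exposes these operations through its own query language and storage engine, which this sketch does not attempt to reproduce.

```python
from collections import deque

# Hypothetical sample data: an undirected "knows" graph as an adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def neighbours(graph, node):
    """Nodes directly related to the given node."""
    return graph.get(node, [])


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest path by edge count, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                if nxt == goal:
                    # Walk the predecessor links back to the start.
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None


print(neighbours(graph, "dave"))
print(shortest_path(graph, "alice", "erin"))
```

Doing the same traversal in a relational database would typically need one self-join per hop, which is where graph-specific storage tends to pay off.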
  14. 14. It's a noisy space...
     ● Very many candidate technologies
     ● Relatively small number of real-world solutions
     ● The difference between the classifications above is one of emphasis...
       ○ column based and document based arrive at the semi-structured sweet spot from opposite ends of the spectrum
     ● ...although this results in different preferred use cases...
       ○ a document based solution is better for document problems, e.g. CMS
  15. 15. Common techniques used
     ● Hashing techniques used to map data to nodes in a cluster
     ● Internode communication via gossip
     ● Common replication techniques
     ● Thrift is used in a few cases
     ● MapReduce often used to search over a distributed system
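
One common way to map data to nodes by hashing is a consistent hash ring, sketched below. The slide does not say which hashing scheme each system uses, so the choice of consistent hashing is an assumption, as are the `HashRing` class, the SHA-256 hash and the number of virtual nodes.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring with virtual nodes (hypothetical)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several points to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # A key belongs to the first point clockwise from its hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]

ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
after = {key: bigger.node_for(key) for key in keys}

moved = sum(1 for key in keys if before[key] != after[key])
print(f"{moved} of {len(keys)} keys moved after adding node-d")
```

Adding a node moves only the keys that the new node takes over. With a plain `hash(key) % node_count` scheme, most keys would change owner.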
  16. 16. Comparison (oldish)...
  17. 17. Comparison (oldish)
  18. 18. Comparison (oldish)
  19. 19. Horses for courses...
     ● SQL is a perfectly good solution for many problems
       ○ tried and tested
     ● Some problems require an alternative solution
       ○ typically driven by scale and/or flexibility
     ● NoSQL offers (many) alternatives
       ○ although it is relatively easy to identify the realistic options
     ● Column based approaches are good for mostly structured data needing enhanced flexibility
     ● Document based approaches are good for document-oriented problems
  20. 20. Let's dive into one NoSQL database...
     ● Cassandra...