NoSQL seminar


Published in: Education

  1. 1. NOSQL
  2. 2. Agenda Introduction to NOSQL Objective Examples of NOSQL databases NOSQL vs SQL Conclusion
  3. 3. Basic Concepts. Database: an organized collection of data. Database Management System (DBMS): a software package that controls the creation, maintenance, and use of a database. A structured language is used to interact with a DBMS. Examples: Oracle, IBM DB2, MS Access, MySQL, FoxPro. Relational DBMS: a relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model, and the software that manages it is called a relational database management system (RDBMS).
  4. 4. SQL Stuctured Query Language Special purpose programming language designed for managing data in RDBMS. Origininally based upon relational algebra & tuple relation calculas. SQl’s scope include data insert,upadte & delete, schema creation and modification , data access control. It is static and strong used in database. Most used widely used database language. Query is the most important operation in SQL. Ex. SELECT * FROM Book WHERE price > 100.00 ORDER BY title;
  5. 5. NOSQL Stands for Not Only SQL Class of non-relational data storage systems Usually do not require a fixed table schema nor do they use the concept of joins All NOSQL offerings relax one or more of the ACID properties .  Atomicity , Consistancy , Isolation , Durability ( ACID ) “NOSQL” = “Not Only SQL” = Not Only using traditional relational DBMS
  6. 6. NOSQL: An alternative to a traditional relational DBMS. Flexible schema. Quicker/cheaper to set up. Massive scalability. Relaxed consistency, giving higher performance and availability. On the other hand: no declarative query language means more programming, and relaxed consistency means fewer guarantees.
  7. 7. Why NOSQL? Not every problem can be solved by a traditional relational database system alone. Handles huge databases. Redundancy: data is pretty safe on commodity hardware. Super flexible queries using map/reduce. Rapid development (no fixed schema, yeah!). Very fast for common use cases.
  8. 8. Contd.. Inspired by distributed data storage problems. Scales easily by adding servers. Not suited to all problem types, but super-suited to certain large problem types, such as high-write situations (e.g. activity tracking or timeline rendering for millions of users). A lot of relational uses are really dumbed down (e.g. fetch by PK with update).
  9. 9. Architecture
  10. 10. How does it work? Clients know how to: send items to servers (consistent hashing); react when a server fails; fetch keys from servers; weight servers by their capacity. Servers know how to: store the items they receive; expire them from the cache. No inter-server communication: servers are unaware of each other.
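The client-side logic described on this slide can be sketched with a small consistent-hash ring. This is an illustrative assumption, not the code of any particular product: the class name HashRing, the server names, and the choice of MD5 with 100 ring points per unit of weight are all made up for the example.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a large integer that is identical on every client."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical client-side ring; servers never talk to each other."""

    def __init__(self, servers, points_per_weight=100):
        # servers maps a server name to an integer weight (relative capacity).
        self._points = []  # sorted list of (ring position, server name)
        for name, weight in servers.items():
            self.add(name, weight, points_per_weight)

    def add(self, name, weight=1, points_per_weight=100):
        # A heavier server gets more points, so it receives more keys.
        for i in range(weight * points_per_weight):
            bisect.insort(self._points, (stable_hash(f"{name}#{i}"), name))

    def remove(self, name):
        # Called when a server fails: only its own points disappear.
        self._points = [p for p in self._points if p[1] != name]

    def server_for(self, key):
        if not self._points:
            raise LookupError("no servers available")
        # First ring point at or after the key's hash, wrapping around.
        index = bisect.bisect_right(self._points, (stable_hash(key), ""))
        return self._points[index % len(self._points)][1]


ring = HashRing({"cache-a": 1, "cache-b": 1, "cache-c": 2})
print(ring.server_for("user:42"))
ring.remove("cache-b")  # simulate a failed server
print(ring.server_for("user:42"))
```

When a server is removed, only the keys that lived on it move to another server; every other key keeps its placement, which is why clients can handle failures without any coordination between servers.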
  11. 11. Performance: An RDBMS uses buffers to ensure ACID properties. NoSQL does not guarantee ACID and is therefore much faster. We don't need ACID everywhere! Ex. Data processing (every minute) is 4x faster with MongoDB, despite being a lot more detailed (due to much simpler development).
  12. 12. Why is NOSQL faster than SQL? Scaling. A simple web application with not much traffic: application server and database server all on one machine.
  13. 13. Scaling contd.. More traffic comes in: application server, database server. Even more traffic comes in: load balancer, application server x2, database server.
  14. 14. Scaling contd.. Even more traffic comes in: load balancer x N (easy), application server x N (easy), database server x N (hard for SQL databases).
  15. 15. SQL Slowdown: not linear!
  16. 16. Scaling contd.. NoSQL scaling: Need more storage? Add more servers! Need higher performance? Add more servers! Need better reliability? Add more servers!
  17. 17. Scaling Summary. You can scale SQL databases (Oracle, MySQL, SQL Server…): this will cost you dearly, and if you don't have a lot of money, you will reach limits quickly. You can scale NoSQL databases: very easy horizontal scaling; lots of open-source solutions; scaling is one of the basic design incentives, so it is well handled; scaling is also the cause of the trade-offs that force you to use map/reduce.
  18. 18. Characteristics: Almost infinite horizontal scaling. Very fast. Performance doesn't deteriorate (much) with growth. No fixed table schemas. No join operations. Ad-hoc queries difficult or impossible. Structured storage. Almost everything happens in RAM.
  19. 19. NOSQL Types: Wide Column Store / Column Families; Document Store; Key Value / Tuple Store; Graph Databases; Object Databases; XML Databases; Multivalue Databases
  20. 20. Main types: Key-Value Stores; Map Reduce Framework; Document Databases; Graph Databases
  21. 21. Key Value Stores. Lineage: Amazon's Dynamo paper and distributed hash tables. Data model: a global collection of key-value pairs. Example systems: Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, … Implementation: efficiency, scalability, fault tolerance. Records are distributed to nodes based on key. Replication. Single-record transactions, "eventual consistency".
  22. 22. Document Databases. Lineage: inspired by Lotus Notes. Data model: collections of documents, where each document is a collection of key-value pairs. Example: CouchDB, MongoDB, Riak
  23. 23. Graph Database. Lineage: draws from Euler and graph theory. Data model: nodes and relationships, both of which can hold key-value pairs. Example: AllegroGraph, InfoGrid, Neo4j
  24. 24. Map Reduce Framework. Google's framework for processing highly distributable problems across huge datasets using a large number of computers. Let's define "large number of computers": a Cluster if all of them have the same hardware; a Grid otherwise ("if !Cluster" for old-style programmers). The process is split into two phases. Map: take the input, partition it, and delegate the parts to other machines. Other machines can repeat the process, leading to a tree structure. Each machine returns its results to the machine that gave it the task.
  25. 25. Map Reduce Framework contd.. Reduce: collect the results from the machines you gave tasks to, combine the results, and return them to the requester. Slower than sequential data processing, but massively parallel. Can sort a petabyte of data in a few hours. Stages: Input, Map, Shuffle, Reduce, Output
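The Input, Map, Shuffle, Reduce, Output stages can be illustrated with a single-process word count. This is a minimal sketch under stated assumptions: the function names are invented for the example, everything runs on one machine, and a real framework would run the map and reduce calls on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = []
    for document in documents:  # each document could go to a different machine
        mapped.extend(map_phase(document))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["NoSQL scales out", "SQL scales up", "NoSQL is not only SQL"]))
```

Because each map call looks at one document and each reduce call looks at one key, the work can be split across machines without the pieces needing to communicate.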
  26. 26. Popular NoSQL: Hadoop / HBase; Cassandra; Amazon SimpleDB; MongoDB; CouchDB; Redis; MemcacheDB; Voldemort; Hypertable; Cloudata; IBM Lotus/Domino
  27. 27. Real World Use. Cassandra: Facebook (original developer, used it until late 2010), Twitter, Digg, Reddit, Rackspace, Cisco. BigTable: Google (open-source version is HBase). MongoDB: Foursquare, Craigslist, SourceForge, GitHub
  28. 28. MONGODB: Document store. Basic support for dynamic (ad hoc) queries. Query by example (nice!). Conditional operators: <, <=, >, >=, $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $and, $size, $type
  29. 29. MONGODB: Data is stored as BSON (binary JSON), which makes it very well suited for languages with native JSON support. Map/reduce is written in JavaScript. Slow! There is one single thread of execution in JavaScript. Master/slave replication (auto failover with replica sets). Sharding built in. Uses memory-mapped files for data storage. Performance over features. On 32-bit systems, limited to ~2.5 GB. An empty database takes up 192 MB. GridFS to store big data + metadata (not actually an FS).
  30. 30. CASSANDRA. Written in: Java. Protocol: custom, binary (Thrift). Tunable trade-offs for distribution and replication (N, R, W). Querying by column, range of keys. BigTable-like features: columns, column families. Writes are much faster than reads (!). Constant write time regardless of database size. Map/reduce possible with Apache Hadoop.
  31. 31. Some more info about Cassandra at Facebook. Cassandra is an open-source DBMS from the Apache Software Foundation. Cassandra provides a structured key-value store with tunable consistency. Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010.
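The "(N, R, W)" trade-off can be made concrete with a small brute-force check. The sketch assumes the usual Dynamo-style reading of the letters, which the slides do not spell out: N is the number of replicas of each record, W is how many replicas must acknowledge a write, and R is how many replicas are consulted on a read. The function name is made up for the example.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Check every possible write set and read set among n replicas.

    Returns True only if each read set shares at least one replica with
    each write set, so a read always reaches a replica holding the latest
    acknowledged write.
    """
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# With 3 replicas: R=2, W=2 overlaps; R=1, W=1 is faster but may read stale data.
print(quorums_always_overlap(3, 2, 2))
print(quorums_always_overlap(3, 1, 1))
```

The brute-force result agrees with the simple rule R + W > N. Choosing smaller R and W gives lower latency and higher availability at the price of eventual rather than immediate consistency.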
  32. 32. HBASE. Written in: Java. Main point: billions of rows x millions of columns. Modeled after BigTable. Map/reduce with Hadoop. Query predicate push-down via server-side scan and get filters. Optimizations for real-time queries. A high-performance Thrift gateway. HTTP supports XML, Protobuf, and binary. Cascading, Hive, and Pig source and sink modules. No single point of failure. While Hadoop streams data efficiently, it has overhead for starting map/reduce jobs. HBase is a column-oriented key/value store and allows for low-latency reads and writes. Random access performance is like MySQL.
  33. 33. COUCHDB. Written in: Erlang. Main point: DB consistency, ease of use. Bi-directional (!) replication, continuous or ad hoc, with conflict detection, thus master-master replication (!). MVCC: write operations do not block reads. Previous versions of documents are available. Crash-only (reliable) design. Needs compacting from time to time. Views: embedded map/reduce. Formatting views: lists and shows. Server-side document validation possible. Authentication possible. Real-time updates via _changes (!). Attachment handling. CouchApps (standalone JS apps).
  34. 34. HADOOP Apache project A framework that allows for the distributed processing of large data sets across clusters of computers Designed to scale up from single servers to thousands of machines Designed to detect and handle failures at the application layer, instead of relying on hardware for it Created by Doug Cutting, who named it after his sons toy elephant Hadoop subprojects  Cassandra  HBase  Pig Hive was a Hadoop subproject, but is now a top-level Apache project
  35. 35. HADOOP contd.. Scales to hundreds or thousands of computers, each with several processor cores. Designed to efficiently distribute large amounts of work across a set of machines. Hundreds of gigabytes of data constitute the low end of Hadoop scale. Built to process "web-scale" data on the order of hundreds of gigabytes to terabytes or petabytes. Uses Java, but allows streaming so that other languages can easily send data items to and accept them from Hadoop.
  36. 36. HADOOP contd.. Uses a distributed file system (HDFS). Designed to hold very large amounts of data (terabytes or even petabytes). Files are stored redundantly across multiple machines to ensure their durability against failure and high availability to highly parallel applications. Data is organized into directories and files. Files are divided into blocks (64 MB by default) and distributed across nodes. The design of HDFS is based on the design of the Google File System.
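The block arithmetic on the HDFS slide can be worked through in a few lines. The 64 MB block size comes from the slide; the replication factor of 3, the node names, and the round-robin placement are assumptions for illustration only, and real HDFS placement also takes racks and free space into account.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size, as on the slide


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file of file_size bytes (rounded up)."""
    return -(-file_size // block_size)


def place_blocks(file_size, nodes, replication=3, block_size=BLOCK_SIZE):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy, used only to show how redundancy
    spreads copies of every block over several machines.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = []
    for block in range(block_count(file_size, block_size)):
        placement.append(
            [nodes[(block + i) % len(nodes)] for i in range(replication)]
        )
    return placement


file_size = 200 * 1024 * 1024  # a 200 MB file
for index, replicas in enumerate(place_blocks(file_size, ["n1", "n2", "n3", "n4"])):
    print(index, replicas)
```

A 200 MB file needs four blocks (three full 64 MB blocks plus one 8 MB block), and with three copies of each block the loss of any single node leaves every block readable.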
  37. 37. HIVE. A petabyte-scale data warehouse system for Hadoop. Easy data summarization and ad-hoc queries. Query the data using a SQL-like language called HiveQL. The Hive compiler generates map-reduce jobs for most queries.
  38. 38. Conclusion. NoSQL is a great problem solver if you need it. Choose your NoSQL platform carefully, as each is designed for a specific purpose. Get used to Map/Reduce. It's not a sin to use NoSQL alongside a (yes)SQL database.
  39. 39. Reference 138919
  40. 40. THANK YOU..!!