NOSQL
Agenda
 Introduction to NOSQL
 Objective
 Examples of NOSQL databases
 NOSQL vs SQL
 Conclusion
Basic Concepts

 Database – an organized collection of data.
 Database Management System (DBMS) - a software
  package that controls the creation, maintenance
  & use of a database.
     We interact with a DBMS through a structured query language
     Ex. Oracle, IBM DB2, MS Access, MySQL, FoxPro etc.
 Relational DBMS - A relational database is a
  collection of data items organized as a set of formally
  described tables from which data can be accessed easily.
  A relational database is created using the relational
  model. The software used to manage a relational database is
  called a relational database management
  system (RDBMS).
SQL

 Structured Query Language
 Special-purpose programming language designed for
    managing data in an RDBMS.
 Originally based upon relational algebra & tuple relational
    calculus.
 SQL’s scope includes data insert, update & delete, schema
    creation and modification, and data access control.
 It is statically and strongly typed.
 The most widely used database language.
 The query is the most important operation in SQL.
 Ex. SELECT *
         FROM Book
         WHERE price > 100.00
         ORDER BY title;
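
To try the example query without installing a database server, here is a minimal sketch using Python's built-in sqlite3 module. The Book table layout (title, price) and the sample rows are assumptions made for illustration; the slide only gives the query.

```python
import sqlite3

def expensive_books(rows, min_price=100.00):
    """Load (title, price) rows into an in-memory Book table and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: the slide does not say which columns Book has.
    conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
    conn.executemany("INSERT INTO Book (title, price) VALUES (?, ?)", rows)
    cur = conn.execute(
        "SELECT * FROM Book WHERE price > ? ORDER BY title", (min_price,)
    )
    result = cur.fetchall()
    conn.close()
    return result

# Hypothetical sample data.
sample = [("SQL Basics", 80.0), ("Databases", 150.0), ("Algorithms", 120.5)]
print(expensive_books(sample))
```

Only the rows priced above 100.00 come back, sorted by title.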
NOSQL

 Stands for Not Only SQL
 Class of non-relational data storage systems
 Usually do not require a fixed table schema, nor do
  they use the concept of joins
 All NOSQL offerings relax one or more of the ACID
  properties.
     Atomicity, Consistency, Isolation, Durability (ACID)
 “NOSQL” = “Not Only SQL” =
       Not Only using traditional relational DBMS
NOSQL

•   Alternative to traditional relational DBMS
    •   Flexible schema
    •   Quicker/cheaper to set up
    •   Massive scalability
    •   Relaxed consistency -> higher performance &
        availability

    * No declarative query language -> more programming
    * Relaxed consistency -> fewer guarantees
Why NOSQL?

 Not every problem can be solved by a traditional
    relational database system alone.
 Handles huge databases.
 Redundancy: data is pretty safe on commodity
    hardware
 Super flexible queries using map/reduce
 Rapid development (no fixed schema, yeah!)
 Very fast for common use cases
Contd..


 Inspired by Distributed Data Storage problems
 Scale easily by adding servers
 Not suited to all problem types, but super-suited to
  certain large problem types
 High-write situations (eg activity tracking or timeline
  rendering for millions of users)
 A lot of relational uses are really dumbed down (eg
  fetch by PK with update)
Architecture
How does it work?

 Clients know how to:
  Send items to servers (consistent hashing)
  Handle a server failure
  Fetch keys from servers
  “Weight” servers by capacity

 Servers know how to:
  Store items they receive
  Expire them from the cache
  No inter-server comms – servers are unaware of each other
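
The client-side behaviour listed above can be sketched as a small consistent-hash ring. This is an illustrative sketch, not the code of any particular client library: the class name HashRing, the server names and the choice of 100 ring points per unit of weight are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: each server owns several points on a circle."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # ring points per unit of weight (assumed value)
        self._points = []         # sorted hash positions
        self._owner = {}          # hash position -> server name
        for name, weight in servers.items():
            self.add(name, weight)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, name, weight=1):
        # A heavier server gets more points, so it receives more keys.
        for i in range(self.replicas * weight):
            point = self._hash(f"{name}#{i}")
            self._owner[point] = name
            bisect.insort(self._points, point)

    def remove(self, name):
        # Called by the client when it decides a server has failed.
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: s for p, s in self._owner.items() if s != name}

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

# Hypothetical servers; "cache-c" has twice the capacity of the others.
ring = HashRing({"cache-a": 1, "cache-b": 1, "cache-c": 2})
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.remove("cache-b")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When "cache-b" is removed, only the keys it used to hold are reassigned; every other key stays on the server it already had. The servers themselves never need to talk to each other for this to work.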
Performance

 RDBMS uses buffering to ensure ACID properties
 NoSQL does not guarantee ACID and is therefore
  often much faster
 We don’t need ACID everywhere!
 Ex. Data processing (every minute) is 4x faster with
  MongoDB, despite being a lot more detailed (due to
  much simpler development)
Why is NOSQL faster than SQL? - Scaling

 Simple web application with not much traffic
   Application server, database server all on one machine
Scaling contd..

 More traffic comes in
   Application server
   Database server

 Even more traffic comes in
   Load balancer
   Application server x2
   Database server
Scaling contd..


 Even more traffic comes in
     Load balancer x N
       easy
     Application server x N
       easy
     Database server xN
       hard for SQL databases
SQL Slowdown




 Not linear!
Scaling contd..


 NoSQL Scaling -
 Need more storage?
   Add more servers!

 Need higher performance?
   Add more servers!

 Need better reliability?
   Add more servers!
Scaling Summary

 You can scale SQL databases (Oracle, MySQL, SQL
  Server…)
     This will cost you dearly
     If you don’t have a lot of money, you will reach limits quickly
 You can scale NoSQL databases
   Very easy horizontal scaling
   Lots of open-source solutions
   Scaling is one of the basic design incentives, so it is well
    handled
   Scaling forces trade-offs, which is why you end up using
    map/reduce
Characteristics

 Almost infinite horizontal scaling
 Very fast
 Performance doesn’t deteriorate with growth (much)
 No fixed table schemas
 No join operations
 Ad-hoc queries difficult or impossible
 Structured storage
 Almost everything happens in RAM
NOSQL Types


 Wide Column Store / Column Families
 Document Store
 Key Value / Tuple Store
 Graph Databases
 Object Databases
 XML Databases
 Multivalue Databases
Main types -

 Key-Value Stores
 Map Reduce Framework
 Document Databases
 Graph Databases
Key Value Stores

 Lineage: Amazon's Dynamo paper and distributed
  hash tables.
 Data model: A global collection of key-value pairs
 Example systems
   Google BigTable, Amazon Dynamo, Cassandra,
     Voldemort, HBase, …
 Implementation: efficiency, scalability, fault-tolerance
   Records distributed to nodes based on key
   Replication
   Single-record transactions, “eventual consistency”
Document Databases

 Lineage: Inspired by Lotus Notes.
 Data model: Collections of documents, each of which
  is itself a collection of key-value pairs.
 Example: CouchDB, MongoDB, Riak
Graph Database

 Lineage: Draws from Euler and graph theory.
 Data model: Nodes & relationships, both of which can
  hold key-value pairs
 Example: AllegroGraph, InfoGrid, Neo4j
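
The data model above (nodes and relationships that both carry key-value pairs) can be sketched in a few lines. This is a toy in-memory structure for illustration only; the class PropertyGraph and its method names are hypothetical and are not the API of Neo4j or any other product.

```python
from collections import deque

class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.edges = []   # (source id, target id, dict of properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, **props):
        self.edges.append((src, dst, props))

    def neighbours(self, node_id, **match):
        """Targets of outgoing edges whose properties include all of `match`."""
        return [dst for src, dst, props in self.edges
                if src == node_id
                and all(props.get(k) == v for k, v in match.items())]

    def reachable(self, start, **match):
        """Breadth-first traversal following only matching relationships."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in self.neighbours(queue.popleft(), **match):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}

# Hypothetical sample data.
g = PropertyGraph()
g.add_node("alice", age=30)
g.add_node("bob")
g.add_node("carol")
g.add_edge("alice", "bob", type="FOLLOWS", since=2010)
g.add_edge("bob", "carol", type="FOLLOWS")
g.add_edge("alice", "carol", type="BLOCKS")
print(g.reachable("alice", type="FOLLOWS"))
```

Traversals like "everyone alice follows, directly or indirectly" are the kind of query a graph database answers without joins.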
Map Reduce Framework

 Google’s framework for processing highly
  distributable problems across huge datasets
  using a large number of computers
 Let’s define large number of computers
    Cluster if all of them have the same hardware
    Grid unless Cluster (if !Cluster for old-style programmers)
 Process split into two phases
   Map
      Take the input, partition it and delegate the parts to other machines
      Other machines can repeat the process, leading to a tree structure
      Each machine returns results to the machine that gave it the task
Map Reduce Framework contd..

   Reduce
     Collect results from the machines you gave tasks to
     Combine the results and return them to the requester

   Slower than sequential data processing, but massively parallel
   Sort a petabyte of data in a few hours
   Input, Map, Shuffle, Reduce, Output
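
The Input, Map, Shuffle, Reduce, Output pipeline can be illustrated with the classic word count. This is a single-process sketch: in a real framework each chunk would be mapped on a different machine. The function names and sample text are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # Each chunk is mapped independently, which is what makes the work parallel.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# Hypothetical input, already partitioned into three chunks.
chunks = ["NoSQL scales out", "SQL scales up", "nosql and sql"]
counts = word_count(chunks)
print(counts)
```

Because map_phase looks at one chunk at a time and reduce_phase looks at one key at a time, both phases can be spread over many machines.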
Popular NoSQL

 Hadoop / HBase       MemcacheDB
 Cassandra            Voldemort
 Amazon SimpleDB      Hypertable
 MongoDB              Cloudata
 CouchDB              IBM Lotus/Domino
 Redis
Real World Use

 Cassandra
   Facebook (original developer, used it till late 2010)
   Twitter
   Digg
   Reddit
   Rackspace
   Cisco

 BigTable
   Google (open-source version is HBase)

 MongoDB
   Foursquare
   Craigslist
   Bit.ly
   SourceForge
   GitHub
MONGODB

  Document store
  Basic support for dynamic (ad hoc) queries
  Query by example (nice!)

 Conditional Operators
    <, <=, >, >=
    $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $and, $size,
     $type
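
To show what query by example with conditional operators means, here is a minimal pure-Python sketch that filters a list of dictionaries. It is not MongoDB and does not use its driver; it only borrows the operator spellings ($gt, $gte, $lt, $lte, $ne, $in, $nin, $exists) and implements a small subset. One simplification to note: a missing field never matches any operator except $exists. The functions matches and find and the sample documents are hypothetical.

```python
import operator

# Comparison operators, keyed by MongoDB-style names.
OPERATORS = {
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}

def matches(document, query):
    """True if `document` satisfies every field condition in `query`."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            # Operator form, e.g. {"price": {"$gt": 100}}.
            for op, operand in condition.items():
                if op == "$exists":
                    if (field in document) != operand:
                        return False
                elif field not in document or not OPERATORS[op](document[field], operand):
                    return False
        elif document.get(field) != condition:
            # Plain value: the field must equal the example.
            return False
    return True

def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]

# Hypothetical documents; note that they do not share a fixed schema.
books = [
    {"title": "A", "price": 120, "tags": ["db"]},
    {"title": "B", "price": 80},
    {"title": "C", "price": 150, "lang": "en"},
]
print(find(books, {"price": {"$gt": 100}}))
```

The query itself is just another document, which is why this style fits languages with native JSON-like structures.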
MONGODB

 Data is stored as BSON (binary JSON)
     Makes it very well suited for languages with native JSON support
 Map/Reduce written in JavaScript
     Slow! There is one single thread of execution in JavaScript
 Master/slave replication (auto failover with replica sets)
 Sharding built-in
 Uses memory mapped files for data storage
 Performance over features
 On 32-bit systems, limited to ~2.5 GB
 An empty database takes up 192 MB
 GridFS to store big data + metadata (not actually an FS)
CASSANDRA

 Written in: Java
 Protocol: Custom, binary (Thrift)
 Tunable trade-offs for distribution and replication
  (N, R, W)
 Querying by column, range of keys
 BigTable-like features: columns, column families
 Writes are much faster than reads (!)
    Constant write time regardless of database size
 Map/reduce possible with Apache Hadoop
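
The (N, R, W) trade-off mentioned above follows the Dynamo-style convention: N replicas hold each record, a read waits for R of them and a write waits for W of them. A minimal sketch of the arithmetic, with hypothetical function names:

```python
def quorum_overlaps(n, r, w):
    """True when every read set must share at least one replica with every write set."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n

def tolerated_failures(n, r, w):
    """Replicas that can be down while reads and writes can still reach their quorum."""
    return n - max(r, w)

for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print(n, r, w, quorum_overlaps(n, r, w), tolerated_failures(n, r, w))
```

With N=3, choosing R=2 and W=2 means a read always touches a replica that saw the latest write and one node may be down. R=1 and W=1 is faster but a read can miss the latest write, which is the "eventual consistency" end of the dial.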
Some more info about Cassandra in Facebook

 Cassandra is an open-source DBMS from the Apache
  Software Foundation.
 Cassandra provides a structured key-value
  store with tunable consistency
 Cassandra is a distributed storage system for
  managing structured data that is designed to scale to
  a very large size across many commodity
  servers, with no single point of failure
 It is a NoSQL solution that was initially developed
  by Facebook and powered their Inbox Search feature
  until late 2010
HBASE

 Written in: Java
 Main point: Billions of rows X millions of columns
 Modeled after BigTable
 Map/reduce with Hadoop
 Query predicate push down via server side scan and get filters
 Optimizations for real time queries
 A high performance Thrift gateway
 HTTP supports XML, Protobuf, and binary
 Cascading, hive, and pig source and sink modules
 No single point of failure
 While Hadoop streams data efficiently, it has overhead for
  starting map/reduce jobs. HBase is a column-oriented
  key/value store and allows for low-latency reads and writes.
 Random access performance is like MySQL
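
A rough picture of the BigTable-style data model: rows are kept sorted by key, and each row holds an open-ended set of columns grouped into column families. The sketch below is an in-memory toy, not the HBase client API; the class WideColumnTable, the "family:qualifier" column naming and the sample keys are assumptions for illustration.

```python
from bisect import bisect_left, insort

class WideColumnTable:
    """Rows sorted by key; each row maps 'family:qualifier' columns to values."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self._keys = []                # row keys kept in sorted order
        self._rows = {}                # row key -> {column name: value}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        """Yield (row key, columns) for start <= row key < stop, in key order."""
        lo, hi = bisect_left(self._keys, start), bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])

# Hypothetical rows; different rows may carry different columns.
t = WideColumnTable(["info", "stats"])
t.put("user#002", "info:name", "Bob")
t.put("user#001", "info:name", "Ann")
t.put("user#001", "stats:logins", 7)
t.put("video#001", "info:title", "Intro")
print(list(t.scan("user#", "user$")))
```

Keeping rows sorted by key is what makes range scans over a key prefix cheap, while a get by exact key stays a simple lookup.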
COUCHDB

 Written in: Erlang
 Main point: DB consistency, ease of use
 Bi-directional (!) replication, continuous or ad-hoc, with conflict
    detection, thus, master-master replication. (!)
   MVCC - write operations do not block reads
   Previous versions of documents are available
   Crash-only (reliable) design
   Needs compacting from time to time
   Views: embedded map/reduce
   Formatting views: lists & shows
   Server-side document validation possible
   Authentication possible
   Real-time updates via _changes (!)
   Attachment handling
   CouchApps (standalone JS apps)
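
The MVCC and conflict-detection points can be illustrated with a toy revision store: a writer must say which revision it last read, and an update based on a stale revision is rejected instead of silently overwriting. This is a simplified sketch, not CouchDB's implementation; it uses plain integer revision counters (real CouchDB revision identifiers look different), and the names RevisionStore and Conflict are hypothetical.

```python
class Conflict(Exception):
    """Raised when a write is based on an out-of-date revision."""

class RevisionStore:
    def __init__(self):
        self._history = {}  # doc id -> list of document bodies, oldest first

    def get(self, doc_id):
        """Return (revision number, body) for the latest revision."""
        versions = self._history[doc_id]
        return len(versions), dict(versions[-1])

    def put(self, doc_id, body, rev=0):
        """Append a new revision; `rev` must be the revision the writer last read."""
        versions = self._history.get(doc_id, [])
        if rev != len(versions):
            raise Conflict(f"{doc_id}: expected rev {len(versions)}, got {rev}")
        # Old versions are kept, so readers are never blocked by a writer.
        self._history[doc_id] = versions + [dict(body)]
        return len(versions) + 1

    def get_revision(self, doc_id, rev):
        return dict(self._history[doc_id][rev - 1])

store = RevisionStore()
first = store.put("doc", {"n": 1})            # new document, revision 1
second = store.put("doc", {"n": 2}, rev=1)    # update based on revision 1
try:
    store.put("doc", {"n": 3}, rev=1)         # stale: someone else already wrote rev 2
    stale_rejected = False
except Conflict:
    stale_rejected = True
```

The second writer has to re-read the document and retry, which is how conflicts surface to the application rather than being hidden by locks.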
HADOOP

 Apache project
 A framework that allows for the distributed processing of
    large data sets across clusters of computers
   Designed to scale up from single servers to thousands of
    machines
   Designed to detect and handle failures at the application
    layer, instead of relying on hardware for it
   Created by Doug Cutting, who named it after his son's toy
    elephant
   Hadoop-related Apache projects
       Cassandra
       HBase
       Pig
   Hive was a Hadoop subproject, but is now a top-level Apache project
HADOOP contd..

 Scales to hundreds or thousands of computers, each with several
    processor cores
   Designed to efficiently distribute large amounts of work across a
    set of machines
   Hundreds of gigabytes of data constitute the low end of Hadoop-
    scale
   Built to process "web-scale" data on the order of hundreds of
    gigabytes to terabytes or petabytes
   Uses Java, but allows streaming so other languages can easily
    send and accept data items to/from Hadoop
HADOOP contd..

 Uses distributed file system (HDFS)
   Designed to hold very large amounts of data (terabytes or even
    petabytes)
   Files are stored in a redundant fashion across multiple
    machines to ensure their durability to failure and high
    availability to very parallel applications
   Data organized into directories and files

   Files are divided into blocks (64 MB by default) and distributed
    across nodes
 Design of HDFS is based on the design of the Google
  File System
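
The block arithmetic behind HDFS storage can be sketched as follows. The 64 MB block size comes from the slide; the replication factor of 3 and the round-robin placement are assumptions for illustration (real HDFS placement also takes racks into account), and plan_blocks and the node names are hypothetical.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default, as stated on the slide

def plan_blocks(file_size, nodes, replication=3, block_size=BLOCK_SIZE):
    """Split a file into blocks and assign each block's replicas round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size / block_size)
    placement = []
    for b in range(n_blocks):
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - b * block_size)
        holders = [nodes[(b + i) % len(nodes)] for i in range(replication)]
        placement.append((b, length, holders))
    return placement

# Hypothetical 200 MB file on a four-node cluster.
plan = plan_blocks(200 * 1024 * 1024, ["n1", "n2", "n3", "n4"])
for block, length, holders in plan:
    print(block, length // (1024 * 1024), "MB", holders)
```

A 200 MB file becomes three full 64 MB blocks plus one 8 MB block, and each block lives on three different machines, so losing one machine loses no data.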
HIVE

 A petabyte-scale data warehouse system for Hadoop
 Easy data summarization, ad-hoc queries
 Query the data using a SQL-like language called
  HiveQL
 Hive compiler generates map-reduce jobs for most
  queries
Conclusion

 NoSQL is a great problem solver if you need it
 Choose your NoSQL platform carefully, as each is
  designed for a specific purpose
 Get used to Map/Reduce
 It’s not a sin to use NoSQL alongside a (yes)SQL
  database
References

 http://www.facebook.com/note.php?note_id=24413138919
 http://en.wikipedia.org/wiki/Apache_Cassandra
 http://en.wikipedia.org/wiki/SQL
 http://en.wikipedia.org/wiki/NoSQL
 www.slideshare.com
THANK
YOU..!!

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 

Nosql seminar

  • 2. Agenda
      • Introduction to NOSQL
      • Objective
      • Examples of NOSQL databases
      • NOSQL vs SQL
      • Conclusion
  • 3. Basic Concepts
      • Database: an organized collection of data.
      • Database Management System (DBMS): a software package whose programs control the creation, maintenance, and use of a database.
          • We interact with a DBMS through a structured language.
          • Ex. Oracle, IBM DB2, MS Access, MySQL, FoxPro, etc.
      • Relational DBMS: A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model. The software used to manage a relational database is called a relational database management system (RDBMS).
  • 4. SQL
      • Structured Query Language
      • Special-purpose programming language designed for managing data in an RDBMS.
      • Originally based on relational algebra and tuple relational calculus.
      • SQL's scope includes data insert, update, and delete; schema creation and modification; and data access control.
      • Its typing discipline is static and strong.
      • The most widely used database language.
      • The query is the most important operation in SQL.
      • Ex. SELECT *
            FROM Book
            WHERE price > 100.00
            ORDER BY title;
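The query on this slide can be tried end to end with Python's built-in sqlite3 module. This is only a minimal sketch: the slide does not define the Book table, so its two columns, the helper name `expensive_books`, and the sample rows are assumptions made for illustration.

```python
import sqlite3

def expensive_books(rows, min_price=100.00):
    """Load (title, price) rows into an in-memory table and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: the slide only names the table and two of its columns.
    conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
    conn.executemany("INSERT INTO Book (title, price) VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT * FROM Book WHERE price > ? ORDER BY title", (min_price,)
    )
    result = cursor.fetchall()
    conn.close()
    return result

# Made-up sample data.
sample_books = [("SQL Basics", 120.0), ("Cheap Reads", 15.5), ("Databases", 250.0)]
print(expensive_books(sample_books))
```

Only the two books priced above 100.00 come back, ordered by title.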
  • 5. NOSQL
      • Stands for Not Only SQL
      • Class of non-relational data storage systems
      • Usually do not require a fixed table schema, nor do they use the concept of joins
      • All NOSQL offerings relax one or more of the ACID properties.
          • Atomicity, Consistency, Isolation, Durability (ACID)
      • “NOSQL” = “Not Only SQL” = Not Only using traditional relational DBMS
  • 6. NOSQL
      • Alternative to traditional relational DBMS
          • Flexible schema
          • Quicker/cheaper to set up
          • Massive scalability
          • Relaxed consistency → higher performance & availability
      * No declarative query language → more programming
      * Relaxed consistency → fewer guarantees
  • 7. Why NOSQL?
      • Not every problem can be solved by a traditional relational database system alone.
      • Handles huge databases.
      • Redundancy: data is fairly safe on commodity hardware
      • Super flexible queries using map/reduce
      • Rapid development (no fixed schema, yeah!)
      • Very fast for common use cases
  • 8. Contd..
      • Inspired by distributed data storage problems
      • Scale easily by adding servers
      • Not suited to all problem types, but super-suited to certain large problem types
      • High-write situations (e.g. activity tracking or timeline rendering for millions of users)
      • A lot of relational uses are really dumbed down (e.g. fetch by PK with update)
  • 10. How does it work?
      • Clients know how to:
          • Send items to servers (consistent hashing)
          • React when a server fails
          • Fetch keys from servers
          • Weight servers by their capacities
      • Servers know how to:
          • Store items they receive
          • Expire them from the cache
      • No inter-server communication: servers are unaware of each other
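The client-side behaviour on this slide (pick a server by consistent hashing, cope with a failed server, weight servers by capacity) can be sketched as a small hash ring. This is an illustrative sketch, not the code of any particular client library: the class name `HashRing`, the server names, the use of MD5, and the figure of 100 ring points per unit of weight are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Client-side consistent hashing over a set of cache servers."""

    def __init__(self, points_per_weight=100):
        self.points_per_weight = points_per_weight
        self._hashes = []   # sorted ring positions
        self._owner = {}    # ring position -> server name

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_server(self, name, weight=1):
        # A heavier server gets more points on the ring, hence more keys.
        for i in range(self.points_per_weight * weight):
            h = self._hash(f"{name}#{i}")
            self._owner[h] = name
            bisect.insort(self._hashes, h)

    def remove_server(self, name):
        # Called by the client when a server fails; only that server's keys move.
        self._hashes = [h for h in self._hashes if self._owner[h] != name]
        self._owner = {h: s for h, s in self._owner.items() if s != name}

    def server_for(self, key):
        if not self._hashes:
            raise LookupError("no servers available")
        # First ring position clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owner[self._hashes[i]]

# Hypothetical servers and keys.
ring = HashRing()
for name in ("cache-a", "cache-b", "cache-c"):
    ring.add_server(name)
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.remove_server("cache-b")          # simulate a failure
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

After the simulated failure, the only keys in `moved` are those that lived on the failed server; everything else stays where it was, which is the point of hashing onto a ring instead of taking `hash(key) % server_count`.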
  • 11. Performance
      • An RDBMS uses buffers to ensure ACID properties
      • NoSQL does not guarantee ACID and can therefore be much faster
      • We don't need ACID everywhere!
      • Ex. Data processing (every minute) is 4x faster with MongoDB, despite being a lot more detailed (due to much simpler development)
  • 12. Why is NOSQL faster than SQL? - Scaling
      • Simple web application with not much traffic
      • Application server and database server all on one machine
  • 13. Scaling contd..
      • More traffic comes in
          • Application server
          • Database server
      • Even more traffic comes in
          • Load balancer
          • Application server x2
          • Database server
  • 14. Scaling contd..
      • Even more traffic comes in
          • Load balancer x N: easy
          • Application server x N: easy
          • Database server x N: hard for SQL databases
  • 16. Scaling contd..
      • NoSQL scaling
          • Need more storage? Add more servers!
          • Need higher performance? Add more servers!
          • Need better reliability? Add more servers!
  • 17. Scaling Summary
      • You can scale SQL databases (Oracle, MySQL, SQL Server…)
          • This will cost you dearly
          • If you don't have a lot of money, you will reach limits quickly
      • You can scale NoSQL databases
          • Very easy horizontal scaling
          • Lots of open-source solutions
          • Scaling is one of the basic design incentives, so it is well handled
          • Scaling is the source of the trade-offs that force you to use map/reduce
  • 18. Characteristics
      • Almost infinite horizontal scaling
      • Very fast
      • Performance doesn't deteriorate with growth (much)
      • No fixed table schemas
      • No join operations
      • Ad-hoc queries difficult or impossible
      • Structured storage
      • Almost everything happens in RAM
  • 19. NOSQL Types
      • Wide Column Store / Column Families
      • Document Store
      • Key Value / Tuple Store
      • Graph Databases
      • Object Databases
      • XML Databases
      • Multivalue Databases
  • 20. Main types
      • Key-Value Stores
      • Map Reduce Framework
      • Document Databases
      • Graph Databases
  • 21. Key Value Stores
      • Lineage: Amazon's Dynamo paper and distributed hash tables.
      • Data model: A global collection of key-value pairs
      • Example systems
          • Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, …
      • Implementation: efficiency, scalability, fault-tolerance
          • Records distributed to nodes based on key
          • Replication
          • Single-record transactions, “eventual consistency”
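The implementation points on this slide (records placed on nodes by key, replication, eventual consistency) can be illustrated with a toy in-process store. This is a deliberately simplified sketch and not how Dynamo, Cassandra, or any other listed system is implemented; the class `ReplicatedKV`, the node and replica counts, and the explicit `sync()` step standing in for background replication are all assumptions.

```python
import hashlib

class ReplicatedKV:
    """Toy key-value store: each key lives on `replicas` of the nodes."""

    def __init__(self, node_count=4, replicas=2):
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas
        self.pending = []  # replica writes that have not been applied yet

    def _home_nodes(self, key):
        # The key alone decides which nodes hold the record.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        first, *rest = self._home_nodes(key)
        self.nodes[first][key] = value   # acknowledged after a single node
        self.pending.extend((node, key, value) for node in rest)

    def get(self, key, replica=0):
        node = self._home_nodes(key)[replica]
        return self.nodes[node].get(key)

    def sync(self):
        # Stand-in for background replication: the replicas converge.
        for node, key, value in self.pending:
            self.nodes[node][key] = value
        self.pending.clear()

store = ReplicatedKV()
store.put("user:42", "Ada")
stale = store.get("user:42", replica=1)   # replica has not caught up yet
store.sync()
fresh = store.get("user:42", replica=1)   # now it has
```

Between `put` and `sync` a read from the second replica misses the write; once replication catches up, every replica returns the same value. That window is what “eventual consistency” refers to.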
  • 22. Document Databases
      • Lineage: Inspired by Lotus Notes.
      • Data model: Collections of documents, each of which is a collection of key-value pairs.
      • Example: CouchDB, MongoDB, Riak
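As a rough illustration of this data model, a collection can be pictured as a dictionary of JSON-like documents that do not have to share the same fields. The sketch below uses plain Python rather than any real document database; the document ids, field names, and values are made up.

```python
import json

collection = {}  # document id -> document (a nested key-value structure)

def insert(doc_id, document):
    # Round-trip through JSON so only plain key-value data is stored and
    # the stored copy is independent of the caller's object.
    collection[doc_id] = json.loads(json.dumps(document))

def fields_used(docs):
    """Every top-level field name that appears in at least one document."""
    return sorted({field for doc in docs.values() for field in doc})

# Two documents in the same collection with different shapes.
insert("book-1", {"title": "Dune", "tags": ["sf", "classic"]})
insert("book-2", {"title": "SQL Basics", "price": 120.0,
                  "publisher": {"name": "Example Press", "city": "Pune"}})

print(fields_used(collection))
```

No schema had to be declared or migrated for the second document to carry a price and a nested publisher that the first one lacks.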
  • 23. Graph Database
      • Lineage: Draws from Euler and graph theory.
      • Data model: Nodes and relationships, both of which can hold key-value pairs
      • Example: AllegroGraph, InfoGrid, Neo4j
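The data model on this slide, nodes and relationships that both carry key-value pairs, can be sketched in a few lines. This is a plain-Python illustration and not the API of any of the listed products; the class `Graph`, the `KNOWS` relationship type, and the sample people are invented for the example.

```python
from collections import deque

class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> properties (key-value pairs)
        self.edges = []   # (source, relationship type, target, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, source, rel_type, target, **props):
        self.edges.append((source, rel_type, target, props))

    def neighbours(self, node_id, rel_type):
        return [t for s, r, t, _ in self.edges if s == node_id and r == rel_type]

    def reachable(self, start, rel_type):
        """Breadth-first traversal along one relationship type."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in self.neighbours(queue.popleft(), rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen

g = Graph()
g.add_node("alice", age=30)
g.add_node("bob", age=41)
g.add_node("carol", age=25)
g.relate("alice", "KNOWS", "bob", since=2009)   # the relationship has properties too
g.relate("bob", "KNOWS", "carol", since=2011)
network = g.reachable("alice", "KNOWS")
```

Following relationships from node to node replaces the self-joins a relational schema would need for the same "friends of friends" question.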
  • 24. Map Reduce Framework
      • Google's framework for processing highly distributable problems across huge datasets using a large number of computers
      • Let's define “large number of computers”:
          • Cluster if all of them have the same hardware
          • Grid otherwise (if !Cluster, for old-style programmers)
      • Process split into two phases
      • Map
          • Take the input, partition it, and delegate the parts to other machines
          • Other machines can repeat the process, leading to a tree structure
          • Each machine returns results to the machine that gave it the task
  • 25. Map Reduce Framework contd..
      • Reduce
          • Collect results from the machines you gave the tasks to
          • Combine the results and return them to the requester
      • Slower than sequential data processing, but massively parallel
      • Sort a petabyte of data in a few hours
      • Input, Map, Shuffle, Reduce, Output
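The Input, Map, Shuffle, Reduce, Output pipeline can be walked through on one machine with the usual word-count example. This single-process sketch only shows the shape of the phases; it does not distribute any work, and the function names and sample sentences are chosen for illustration.

```python
from collections import defaultdict

def map_phase(document):
    # Map: turn one input record into (key, value) pairs.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: bring all values for the same key together.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine the values collected for one key.
    return key, sum(values)

def word_count(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["NoSQL scales out", "SQL scales up", "nosql and sql"])
```

In a real framework each call to `map_phase` and `reduce_phase` could run on a different machine, because neither depends on any record or key other than its own.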
  • 26. Popular NoSQL
      • Hadoop / HBase
      • Cassandra
      • Amazon SimpleDB
      • MongoDB
      • CouchDB
      • Redis
      • MemcacheDB
      • Voldemort
      • Hypertable
      • Cloudata
      • IBM Lotus/Domino
  • 27. Real World Use
      • Cassandra
          • Facebook (original developer, used it until late 2010)
          • Twitter
          • Digg
          • Reddit
          • Rackspace
          • Cisco
      • BigTable
          • Google (the open-source counterpart is HBase)
      • MongoDB
          • Foursquare
          • Craigslist
          • Bit.ly
          • SourceForge
          • GitHub
  • 28. MONGODB
      • Document store
      • Basic support for dynamic (ad hoc) queries
          • Query by example (nice!)
      • Conditional operators
          • <, <=, >, >=
          • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $and, $size, $type
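Query by example means the query is itself a document: plain values must match exactly, and a nested document of operators expresses conditions, with <, <=, >, >= spelled `$lt`, `$lte`, `$gt`, `$gte`. The sketch below imitates a small subset of this in pure Python to show the idea. It is not the MongoDB driver, it covers only six operators, and real MongoDB semantics differ in edge cases such as arrays; the sample books are made up.

```python
import operator

COMPARISONS = {
    "$lt": operator.lt, "$lte": operator.le,
    "$gt": operator.gt, "$gte": operator.ge,
}
_MISSING = object()  # marks a field the document does not have

def matches(document, query):
    """True if `document` satisfies every field condition in `query`."""
    for field, condition in query.items():
        value = document.get(field, _MISSING)
        if not isinstance(condition, dict):
            if value != condition:          # plain value: exact match
                return False
            continue
        for op, arg in condition.items():   # operator document
            if op == "$exists":
                ok = (value is not _MISSING) == arg
            elif value is _MISSING:
                ok = False
            elif op == "$in":
                ok = value in arg
            else:
                ok = COMPARISONS[op](value, arg)
            if not ok:
                return False
    return True

def find(documents, query):
    return [doc for doc in documents if matches(doc, query)]

books = [
    {"title": "Dune", "price": 9.5},
    {"title": "SQL Basics", "price": 120.0},
    {"title": "Databases", "price": 250.0, "edition": 2},
]
pricey = find(books, {"price": {"$gt": 100}})
mid_range = find(books, {"price": {"$gte": 100, "$lt": 200}})
with_edition = find(books, {"edition": {"$exists": True}})
```

The first query is the document-store counterpart of the earlier SQL example, `WHERE price > 100.00`.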
  • 29. MONGODB
      • Data is stored as BSON (binary JSON)
          • Makes it very well suited for languages with native JSON support
      • Map/Reduce written in JavaScript
          • Slow! There is a single thread of execution in JavaScript
      • Master/slave replication (auto failover with replica sets)
      • Sharding built in
      • Uses memory-mapped files for data storage
      • Performance over features
      • On 32-bit systems, limited to ~2.5 GB
      • An empty database takes up 192 MB
      • GridFS to store big data + metadata (not actually an FS)
  • 30. CASSANDRA
      • Written in: Java
      • Protocol: Custom, binary (Thrift)
      • Tunable trade-offs for distribution and replication (N, R, W)
      • Querying by column, range of keys
      • BigTable-like features: columns, column families
      • Writes are much faster than reads (!)
      • Constant write time regardless of database size
      • Map/reduce possible with Apache Hadoop
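In Dynamo-style systems, (N, R, W) usually stand for the number of replicas of each record, the number of replicas a read waits for, and the number a write waits for. When R + W > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write. The sketch below checks that rule by brute force over small replica sets. It illustrates the arithmetic only and is not Cassandra code; both function names are made up.

```python
from itertools import combinations

def overlap_guaranteed(n, r, w):
    """Quorum rule: reads and writes must overlap when R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n

def always_overlap(n, r, w):
    # Brute force: does every possible write set share at least one
    # replica with every possible read set?
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

print(overlap_guaranteed(3, 2, 2), overlap_guaranteed(3, 1, 1))
```

With N = 3, choosing R = W = 2 keeps reads consistent with writes, while R = W = 1 is faster but allows stale reads. That is the tunable trade-off the slide refers to.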
  • 31. Some more info about Cassandra at Facebook
      • Cassandra is an open-source DBMS from the Apache Software Foundation.
      • Cassandra provides a structured key-value store with tunable consistency
      • Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure
      • It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010
  • 32. HBASE
      • Written in: Java
      • Main point: Billions of rows x millions of columns
      • Modeled after BigTable
      • Map/reduce with Hadoop
      • Query predicate push-down via server-side scan and get filters
      • Optimizations for real-time queries
      • A high-performance Thrift gateway
      • HTTP supports XML, Protobuf, and binary
      • Cascading, Hive, and Pig source and sink modules
      • No single point of failure
      • While Hadoop streams data efficiently, it has overhead for starting map/reduce jobs. HBase is a column-oriented key/value store and allows for low-latency reads and writes.
      • Random access performance is like MySQL
  • 33. COUCHDB
      • Written in: Erlang
      • Main point: DB consistency, ease of use
      • Bi-directional (!) replication, continuous or ad hoc, with conflict detection, and thus master-master replication (!)
      • MVCC: write operations do not block reads
      • Previous versions of documents are available
      • Crash-only (reliable) design
      • Needs compacting from time to time
      • Views: embedded map/reduce
      • Formatting views: lists & shows
      • Server-side document validation possible
      • Authentication possible
      • Real-time updates via _changes (!)
      • Attachment handling
      • CouchApps (standalone JS apps)
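The MVCC and conflict-detection points can be pictured with an append-only store in which every update must say which revision it was based on. This is a simplified sketch of the idea and not CouchDB's implementation or API: the class names are invented, and plain integers stand in for revision identifiers, which look different in the real system.

```python
class ConflictError(Exception):
    pass

class VersionedStore:
    def __init__(self):
        self._history = {}  # doc id -> list of document bodies, oldest first

    def put(self, doc_id, body, based_on=0):
        versions = self._history.get(doc_id, [])
        if based_on != len(versions):
            # Someone else updated the document first: report it, don't overwrite.
            raise ConflictError(
                f"{doc_id}: current revision is {len(versions)}, got {based_on}"
            )
        # Append only: readers of older revisions are never blocked or changed.
        self._history[doc_id] = versions + [dict(body)]
        return len(versions) + 1

    def get(self, doc_id, rev=None):
        versions = self._history[doc_id]
        return versions[-1] if rev is None else versions[rev - 1]

store = VersionedStore()
rev1 = store.put("note", {"text": "draft"})
rev2 = store.put("note", {"text": "final"}, based_on=rev1)
try:
    store.put("note", {"text": "late edit"}, based_on=rev1)  # stale revision
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

The stale writer gets an error instead of silently overwriting the newer version, and the earlier revision stays readable until the database is compacted.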
  • 34. HADOOP
      • Apache project
      • A framework that allows for the distributed processing of large data sets across clusters of computers
      • Designed to scale up from single servers to thousands of machines
      • Designed to detect and handle failures at the application layer, instead of relying on hardware for it
      • Created by Doug Cutting, who named it after his son's toy elephant
      • Hadoop subprojects and related Apache projects
          • Cassandra
          • HBase
          • Pig
          • Hive was a Hadoop subproject, but is now a top-level Apache project
  • 35. HADOOP contd..
      • Scales to hundreds or thousands of computers, each with several processor cores
      • Designed to efficiently distribute large amounts of work across a set of machines
      • Hundreds of gigabytes of data constitute the low end of Hadoop-scale
      • Built to process "web-scale" data on the order of hundreds of gigabytes to terabytes or petabytes
      • Uses Java, but allows streaming so other languages can easily send and accept data items to/from Hadoop
  • 36. HADOOP contd..
      • Uses a distributed file system (HDFS)
      • Designed to hold very large amounts of data (terabytes or even petabytes)
      • Files are stored redundantly across multiple machines to ensure their durability against failure and high availability to highly parallel applications
      • Data organized into directories and files
      • Files are divided into blocks (64 MB by default) and distributed across nodes
      • Design of HDFS is based on the design of the Google File System
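The block arithmetic on this slide can be worked through in a few lines. The 64 MB block size comes from the slide; everything else in this sketch is an assumption made for illustration, including the replication factor of 3, the node names, and the round-robin placement. Real HDFS placement follows its own policy.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default, as stated on the slide

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file would occupy."""
    if file_size < 0:
        raise ValueError("file size cannot be negative")
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

def place_replicas(block_count, nodes, replication=3):
    # Hypothetical round-robin placement, used only to show that each
    # block ends up on several different machines.
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return [
        [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(block_count)
    ]

MB = 1024 * 1024
blocks = split_into_blocks(200 * MB)            # a 200 MB file
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
print(len(blocks), [size // MB for size in blocks])
```

A 200 MB file becomes three full 64 MB blocks plus one 8 MB block, and losing any single node still leaves copies of every block on other nodes.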
  • 37. HIVE
      • A petabyte-scale data warehouse system for Hadoop
      • Easy data summarization, ad-hoc queries
      • Query the data using a SQL-like language called HiveQL
      • Hive compiler generates map-reduce jobs for most queries
  • 38. Conclusion
      • NoSQL is a great problem solver if you need it
      • Choose your NoSQL platform carefully, as each is designed for a specific purpose
      • Get used to Map/Reduce
      • It's not a sin to use NoSQL alongside a (yes)SQL database
  • 39. References
      • http://www.facebook.com/note.php?note_id=24413138919
      • http://en.wikipedia.org/wiki/Apache_Cassandra
      • http://en.wikipedia.org/wiki/SQL
      • http://en.wikipedia.org/wiki/NoSQL
      • www.slideshare.com