NoSQL
My presentation in the Architects forum

  • SQL databases approach data in the form of sets and tables. Incidentally, this strength soon becomes their weakness.
    Assumptions made:
    Data is represented in the form of tables (rows and columns).
    Data in each table can be related to data in another.
    Data can, and has to, be searchable through all columns.
    Strengths:
    Data manipulation through set theory.
    Relational constraints enforced by the management system.
    Weaknesses:
    Relatedness becomes an overhead once the data grows really large.
    A large volume of writes in a SQL database puts a heavy burden on the DBMS as well as on the storage disk.
  • NoSQL is a collection of databases that sidestep the drawbacks of RDBMS without completely giving up on relational models. They are not strict about certain core RDBMS concepts such as ACID compliance and other integrity constraints.
    The priority is to support high levels of scalability through easy partitioning across multiple cheap, simple machines, giving up some of the consistency, and some of the relatedness of data, that SQL databases aim to deliver.
  • The CAP theorem states that any shared-data system can achieve only two of these three properties:
    Consistency (all database clients see the same data, even with concurrent updates)
    Availability (all database clients are able to access some version of the data)
    Partition tolerance (the database can be split over multiple servers)
    http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
    http://devblog.streamy.com/2009/08/24/cap-theorem/
    http://www.royans.net/arch/brewers-cap-theorem-on-distributed-systems/

Transcript

  • 1. Data, data, data. I cannot make bricks without clay.
    Sherlock Holmes, Sherlock Holmes [2009]
  • 2. Data
    Qualitative or Quantitative attributes of a variable or set of variables
    Lowest level of abstraction from which information and then knowledge are derived.
    Representation of a fact, figure and idea.
  • 3. A well organized newspaper or a clumsy, cluttered one?
  • 4. Data explosion
    From Gigabytes to Terabytes to Petabytes to perhaps (I’m out of nomenclature)-bytes
  • 5. NoSQL
    = Not Only SQL
    != No to SQL
    != Never SQL
  • 6. Open Source
    Abridged version of this presentation and notes will be available for everyone.
    Distributed under no License
    FREE AS IN SPEECH AND BEER
  • 7. WEB 2.0
    DDBMS
    RDBMS performance
    OODB
    RnD
    Cloud Computing
    Multiple Solutions
    Necessity is the mother of Invention
  • 8. SQL Databases, the ‘Hammer’
    It’s a wonderful tool
  • 9. Commercial SQL Databases
    Even Gods use it
    Design
    Power
    Ergonomics
    Ease of use
    Features
    Warranty
    Upgrades
    Apart from
    Hole in the Pocket
  • 10. Nail is a nail, Screw is a screw
    Hammering a screw or Screw driving a nail is FOOLISHNESS!
  • 11. Non-relational next generation operational data stores and databases
    What?
    NoSQL is a new look at data to deliver:
    • High Performance
    • Unlimited horizontal scalability
    • Economic, common, unreliable hardware
    • Auto Sharding
    • Support for wide range of data
    • Recursive, Hierarchical
    • Non-Rigid
    • High Availability
  • What? (Continued…)
    Partly or completely independent of RDBMS concepts
    No specific implementation
    Breakthrough Approaches
    Key:
    Non-relational approach
    Non-ACIDness
    A STEP BACKWARDS, THEN MANY STEPS FORWARD
  • 19. NoSQL, the ‘screwdriver’
    Yet another tool in our repository to go along with the hammer
  • 20. NoSQL is about choice
    Not all problems are nails.
    Not all screws are same.
    GOOD PROGRAMMING PRACTICE:
    Know your tools and use them appropriately
  • 21. SQL Databases
    Data
    Relational
    Tabular – Rows/Columns
    Interface
    SQL
    Basic Design Inspiration
    Set Theory
    ACID Design
    Scale Up Design
    And many more
  • 26. Why?
    • Is all data really relational?
    • If consistency is already ensured, do we have to enforce/check it again at the database level?
    • Are RDBMS ready for challenges of the future like:
    • Dynamic schema/metadata
    • Huge amounts of data
    • Through horizontal auto scaling
    • Ability to handle complex data types
    • Images, Videos, Audios and much more
    Not Really!
  • 34. Why? (Continued…)
    RDBMS drawbacks:
    Scalability
    CRUD
    Performance
    Write Overhead
    Limited by single disk architecture
    Lack of In Memory design
    Rigid schema design
    And more …..
  • 35. HAMMERS
    Are under some
    Hammering
  • 36. DRAWBACKS
    DEEP DIVE
  • 37. Scalability
    True Scalability
    Horizontal Scaling
    Transparency to the application
    No single point of failure
    Problems with SQL databases
    Vertical Scaling
    Partitioning aka Sharding
    Read Slaves
    Anti Patterns
    Normalized Data
    Joins
    ACID Transactions
  • 38. No Breadcrumbs
    CRUD is crude
    Delete/Update strategy is improper
    CRA!
    Create, Read, Archive – way to go ahead
    Audit information is lost in CRUD but not in the case of CRA
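
The Create, Read, Archive idea above can be sketched as an append-only store. This is only an illustration under assumptions: the class name `ArchiveStore` and its methods are hypothetical and do not come from the presentation or from any particular database.

```python
class ArchiveStore:
    """Append-only store: nothing is updated or deleted in place."""

    def __init__(self):
        # key -> list of versions, oldest first
        self._versions = {}

    def create(self, key, value):
        # A "change" is simply a new version appended after the old ones.
        self._versions.setdefault(key, []).append(
            {"value": value, "archived": False}
        )

    def read(self, key):
        # Return the newest version unless the record has been archived.
        history = self._versions.get(key, [])
        if not history or history[-1]["archived"]:
            return None
        return history[-1]["value"]

    def archive(self, key):
        # Instead of deleting, append an "archived" marker; history survives.
        if key in self._versions:
            self._versions[key].append({"value": None, "archived": True})

    def history(self, key):
        # The audit trail: every value the record ever held.
        return [v["value"] for v in self._versions.get(key, [])
                if not v["archived"]]
```

After a record is archived, `read` no longer returns it, but `history` still lists every earlier value, which is the audit information that an in-place update or delete would have lost.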
  • 39. Naive Data Support
    Not designed for
    Complex Data Structures
    Recursive
    Hierarchical
    Ordered List
    Circular
    Dynamic Metadata
  • 40. Logical/Physical separation concerns
    Relational model -> Logical Model
    RDBMS implement it at physical level
    Using Multiple indices
    Artificial overhead in managing the database
    Frequent drop and create index to make DB perform
  • 41. Spinning Disk Storage
    Design flaw for most RDBMS systems
    With cheaper memory, Memory based approach should also be included in the design
    Defiance of Moore’s law
    Disk read speed grew only 12.5 times in about 50 years
    Disk write speed grew even less.
    Disk write is expensive.
    RDBMS make things worse by writing more.
    ACID rains are UNHEALTHY
  • 42. Think ‘Out of the ROM’
  • 43. At Snail’s pace
    RDBMS engine growth – SLOW
    Optimizations have been minor since initial days
    Majority of growth due to Moore’s law
    Faster hardware
    Slightly faster storage
    Faster memory
    What happens when Moore’s law slows down, thanks to external factors such as the heat generated?
  • 44. Database size limits
    RDBMS are too slow
    Over multiterabyte and petabyte databases
    Purpose-designed parallel processing would be needed to handle such volumes of data in an RDBMS.
  • 45. RDBMS
    has been around for years
    and is proven technology
    What about NoSQL?
  • 46. RDBMS
    grew fast but
    growth slowed down over time and
    might eventually reach a plateau
    NoSQL
    undeniably a new, immature tool,
    has been growing faster than RDBMS ever did
    and is being supported by the Big Players
  • 47. Did you say
    BIG PLAYERS!
    WHO?
  • 48. NoSQL Real World Implementations
    • Google – BigTable
    • Facebook – HBase
    • Digg – Cassandra
    • Amazon – Dynamo
    • Trend Micro – HBase
    • Netflix – Amazon SimpleDB
    • Shutterfly – MongoDB
    • LinkedIn – Voldemort
    and more
    Microsoft is considering NoSQL for Azure services as well, and so is Twitter
    Are we next?
    Major IT companies have implemented, or even better, created their own NoSQL databases to manage huge data stores that SQL databases could not handle.
  • 56. We are used to
    SQL and relatedness,
    why can’t they just fix RDBMS
    to handle Big Data?
    STORAGE SEEK RATES
    Large writes and ACID being a huge limitation
    Big Data can be handled via
    Scale Out/Partitionability across Multiple Nodes
  • 57. CAP Theorem
    Applies to distributed shared data system
  • 58. CAP THEOREM
  • 59. A Deeper look
    Consistency: The system is in a consistent state after an operation
    All clients see the same data
    Strong Consistency (ACID) vs. Eventual Consistency (BASE)
    Availability: ‘Always On’ mode, no downtime
    All clients can find some available replica
    Software/hardware upgrade tolerance
    Partition Tolerance: The system continues to function even when split into disconnected subsets (by a network disruption)
    Reads and Writes combined
  • 60. CP
    • Some data may be inaccessible, but the rest is accurate/consistent
    • Sharded database
    • TERADATA comes here
    CA
    • Single Site Clusters
    RDBMS
    Paxos
    NoSQL
    AP
    • System is still available under partitioning but some of the data returned may be inaccurate
  • Atomicity: All of the operations in the transaction will complete, or none will.
    Consistency: The database will be in a consistent state when the transaction begins and ends.
    Isolation: The transaction will behave as if it is the only operation being performed upon the database.
    Durability: Upon completion of the transaction, the operation will not be reversed.
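
As a small illustration of atomicity, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the names `alice` and `bob`, and the helper functions are assumptions made for the example; they are not part of the presentation.

```python
import sqlite3


def make_db():
    # In-memory database with a constraint that forbids negative balances.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.execute("INSERT INTO accounts VALUES ('bob', 0)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If `alice` tries to send more than she has, the credit to `bob` has already been executed inside the transaction, yet it is undone together with the failed debit, so the balances are unchanged afterwards.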
  • 63. Basically Available, Soft state, Eventually consistent
    When Availability and Partitionability are prioritized over Consistency, think in terms of BASE
  • 64. Eventual Consistency
    If no new updates are made to the object, eventually all accesses will return the last updated value.
    Ex: Domain Name System (DNS)
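
The definition above can be simulated in a few lines. This is a toy model, not how any specific database replicates: the `Replica` and `Cluster` classes, the last-writer-wins rule, and the explicit `replicate()` step standing in for background replication are all assumptions for illustration.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Last-writer-wins: keep only the highest version seen so far.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # undelivered updates: (replica index, key, version, value)
        self.clock = 0

    def write(self, key, value):
        # Acknowledge after updating one replica; queue the update for the rest.
        self.clock += 1
        self.replicas[0].apply(key, self.clock, value)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, self.clock, value))

    def replicate(self):
        # Deliver every queued update (a stand-in for asynchronous replication).
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()
```

Right after a write, a read from another replica may still return the old value (or nothing). Once no new updates arrive and `replicate()` has run, every replica returns the last written value.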
  • 65. Types of Eventual Consistency
    Read-your-write consistency
    Session consistency
    Monotonic read consistency
    Monotonic write consistency
    Causal consistency
    Practically, Read-your-write consistency and monotonic read consistency are desirable in an eventually consistent system
  • 66. Hash()
    Different Apps – Different CAP requirement
    Prioritize among
    Consistency – Availability
    Availability – Partitionability
    Consistency - Partitionability
  • 67. WHERE?
    So will NoSQL eventually replace RDBMSs everywhere?
    No, RDBMS are there to stay.
    NoSQL is here to help.
  • 68. Wherever you want to take
    Advantage
    of
    NoSQL
  • 69. Big Data
    Denormalize
    Shard
    Scale Out
    And look no further than NoSQL
  • 70. Write Intensive Applications
    IOPS of the best storage device <<< n × IOPS of relatively cheaper storage devices
    In simple terms: ‘HARNESS THE POWER OF YOUR CLOUD’
  • 71. Fast Key-Value Access
    NoSQL – ‘User, you are looking for $value’
    RDBMS – ‘Query executing ….’
    An O(1) hash operation versus O(log n) B/B+ tree traversals
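
The difference between the two lookup styles can be sketched with the standard library. A Python `dict` plays the hash table, and a sorted list searched with `bisect` stands in for a B-tree index (a simplification, since a real B-tree is a paged on-disk structure). The data set and function names are made up for the example.

```python
import bisect

# Hypothetical data set: 1,000 records keyed by a user name.
items = sorted((f"user{i:04d}", i) for i in range(1000))
keys = [k for k, _ in items]
values = [v for _, v in items]
hash_index = dict(items)


def hash_lookup(key):
    # Average O(1): the key is hashed straight to its slot.
    return hash_index.get(key)


def tree_lookup(key):
    # O(log n): binary search over the sorted keys.
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return values[pos]
    return None


def range_scan(low, high):
    # What the ordered index offers in exchange: cheap range queries.
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return values[start:end]
```

The hash lookup wins for single-key access, while the ordered index still pays off when a query asks for a range of keys, which a plain hash table cannot answer without scanning everything.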
  • 72. Flexible Schema and Data types
    ‘I once was an integer, then a string, then a date; what am I?’ - Field
    RDBMS – ‘WTH! Whatever you are, you are beyond my scope’
  • 73. Transient Data
    Data – ‘I’m here only for a while and want to get my work done fast’
    RDBMS – ‘You are data and you shall be treated like the rest’
    NoSQL – ‘Okay, I’ll allot you space in RAM using Memcached if it is available; otherwise you still have my cloud’
  • 74. High Write Availability
    Warning - Incoming data ….
    NoSQL – ‘Anytime you like, user’
    RDBMS – ‘This is insane, I’m already busy with other things’
  • 75. ECONOMICS
    RDBMS – ‘I’m powered by a wonderful, beautiful rabbit’
    NoSQL – ‘I’m powered by many cute little hamsters’
  • 76. No Single Point of Failure
    Designed to run over
    Economic
    Commonly Available
    Unreliable hardware
  • 77. Full table scan operations
    MapReduce:
    Map:
    Break the problem into independent subproblems that can be computed in parallel and reduced later
    Reduce:
    Merge the partial solutions into the final result
    Divide and Conquer your way to Victory
    Powered by MapReduce! Or something similar
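
A single-process word count shows the shape of the map and reduce steps described above. It is a sketch only: in a real deployment the chunks would be spread across machines, and the function names here are invented for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    # Map: turn one chunk of input into (key, value) pairs, independently
    # of every other chunk, so chunks can be processed in parallel.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all values that share a key so one reducer sees them together.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: merge the partial results for each key into a final answer.
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))
```

For example, `word_count(["a b a", "b c"])` counts each word across both chunks, even though each chunk was mapped without knowing about the other.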
  • 78. Ability to restore, maintain, repair itself
    No DBA required Design
  • 79. HOW?
    Let us welcome
    Keys, Values, Collections, Data Structures, Objects, Documents, Graphs
  • 80. NoSQL View
    The basic approach at data:
    Key/Value store
    Run on multiple machines
    Partitions and Replication across these machines
    Relax consistency
    Aim at Eventual Consistency
    Asynchronous replication
    But not all NoSQL take the same path.
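
Partitioning and replication across machines can be sketched with a hash of the key. The scheme below (hash modulo node count, with copies on the following nodes) is a deliberately simple assumption for illustration; the function names are hypothetical.

```python
import hashlib


def shard_for(key, num_nodes):
    # Hash the key so that placement is deterministic and evenly spread.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def replicas_for(key, num_nodes, replication_factor=2):
    # Store copies on the primary node and on the nodes that follow it.
    first = shard_for(key, num_nodes)
    copies = min(replication_factor, num_nodes)
    return [(first + i) % num_nodes for i in range(copies)]
```

Plain modulo placement moves most keys whenever the number of nodes changes, which is why Dynamo-style systems use consistent hashing instead; the sketch only shows the basic idea of routing a key to a node.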
  • 81. Document Store
    Key-Value Store
    Object
    NoSQL
    Multivalue
    Graph Stores
    BigTable Clones
    Tuple Store
  • 82. Key-Value Stores
    One key, one value, no duplicates and crazy fast
    Distributed hash tables
    The value is stored as binary object – BLOB
    The DB doesn’t understand it and doesn’t want to
    Ex: Amazon Dynamo, MemcacheDB
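
A minimal sketch of the key-value idea, assuming an in-memory dictionary and a hypothetical `KeyValueStore` class. The store only ever sees bytes; encoding and decoding (JSON here) is entirely the client's business.

```python
import json


class KeyValueStore:
    """One key, one opaque value. The store never inspects the bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, blob):
        if not isinstance(blob, bytes):
            raise TypeError("values must be bytes; serialize them first")
        self._data[key] = blob  # a second put on the same key overwrites

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def save_profile(store, user_id, profile):
    # The client chooses the serialization format, not the database.
    store.put(f"user:{user_id}", json.dumps(profile).encode("utf-8"))


def load_profile(store, user_id):
    blob = store.get(f"user:{user_id}")
    return json.loads(blob.decode("utf-8")) if blob is not None else None
```

Because the value is opaque, the only possible query is "give me the value for this key"; finding all users in a given city would require reading and decoding every value on the client side.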
  • 83. Key4
    Key3
    Key2
    Key1
    Key/Value store doesn’t know what is in here
  • 84. Document Store
    Key-value store, but the value is structured and understood by the DB
    Querying data is possible
    On not just the key
    Ex: MongoDB, CouchDB, Riak, etc.
  • 85. Each database has collections
    Each collection has a set of documents
    They are well-designed for access through applications
    Suitable for web applications
    A few document databases now provide a SQL-like query interface
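
What "the value is structured and understood by the DB" buys can be sketched as follows. The `Collection` class and its `find` method are hypothetical and only loosely modeled on document-store APIs; they are not any product's real interface.

```python
class Collection:
    """A toy collection: documents are dicts the store can look inside."""

    def __init__(self):
        self._docs = {}  # key -> document

    def insert(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        return self._docs.get(key)

    def find(self, **criteria):
        # Query on any field, not just the key, because the store
        # understands the structure of the value.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]
```

With documents such as `{"name": "Matt", "city": "Pune"}` inserted, `find(city="Pune")` returns every matching document without the application having to decode each value itself.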
  • 86. Key4
    Key3
    Key2
    Key1
    Name: $Name
    Value: $Value
    Version: $Version
    Type: $Type
    Emb Object1
    Objects inside Objects
    CRAZY!
    Emb Object2
  • 87. BigTable & its Clones
    Database, tables, rows, columns and ‘SuperColumn’
    Row consists of columns and SuperColumns
    A few supercolumns can be made mandatory
    Each supercolumn – arbitrary set of columns
    Rows are typically versioned by a system assigned timestamp.
    Intended for tables with a huge number of columns
    Millions can also be supported very easily
    ‘a sparse, distributed multi-dimensional sorted map’
    Also referred to as Wide Column stores
    Ex: Google BigTable, Cassandra, HBase, Azure Tables
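
The phrase "sparse, distributed multi-dimensional sorted map" can be made concrete with a toy, single-machine model in which a cell is addressed by row, column and timestamp. The `WideColumnTable` class is an assumption for illustration and leaves out distribution entirely.

```python
class WideColumnTable:
    """Toy model: (row, column) -> versions of a value, keyed by timestamp."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value)

    def put(self, row, column, value, timestamp):
        # Sparse: only cells that were actually written take up space.
        self._cells.setdefault((row, column), []).append((timestamp, value))

    def get(self, row, column, timestamp=None):
        # Return the newest version, or the newest one at or before `timestamp`.
        versions = self._cells.get((row, column), [])
        candidates = [
            (ts, value) for ts, value in versions
            if timestamp is None or ts <= timestamp
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]

    def scan_row(self, row):
        # Sorted: columns of a row come back in key order.
        return {
            column: self.get(row, column)
            for (r, column) in sorted(self._cells)
            if r == row
        }
```

Each row can have a completely different set of columns, and older versions of a cell stay readable by asking for an earlier timestamp.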
  • 89. Key1
    Key2
    Key3
  • 90. Graph Databases
    Nodes, Edges, Properties
    Replace traditional tables, columns, rows
    Graph database can be implemented in different ways
    Key/value store, columnar, bigtable clone or even combination of these
    Fields are used to directly store the id of another entity forming the edge
  • 91. Graph database is a multi-relational graph
    No need for secondary indexes
    Relationships in RDBMS are ‘weak’
    Relationships in Graphs are ‘strong’
    The rest don’t really care about relations at db level
  • 92. Address
    Age: 32
    Matt
    Mobile
    April
    Is related to
    SSN
    Spouse
    owns
    Drives
    Honda
    Model
    City
    registration
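
The nodes, edges and properties from the graph slides can be sketched as below. The `Graph` class is hypothetical, and the way Matt, April and the Honda are connected in the usage note is an assumed reading of the diagram labels above, not something the transcript states.

```python
class Graph:
    """Toy property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of properties
        self.edges = []  # (source id, label, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def neighbors(self, node_id, label=None):
        # Follow outgoing edges, optionally restricted to one relationship type.
        return [
            target for source, edge_label, target in self.edges
            if source == node_id and (label is None or edge_label == label)
        ]


def build_example():
    # Assumed arrangement of the labels shown on the diagram slide.
    g = Graph()
    g.add_node("Matt", age=32)
    g.add_node("April")
    g.add_node("Honda")
    g.add_edge("Matt", "spouse", "April")
    g.add_edge("Matt", "owns", "Honda")
    g.add_edge("April", "drives", "Honda")
    return g
```

A question such as "what does Matt own?" becomes `g.neighbors("Matt", "owns")`, a direct walk along an edge rather than a join between tables.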
  • 93. Key-Value Store
    Size
    Document Store
    BigTable Clone
    Graph Databases
    Complexity
  • 94. Too Many Cooks and Recipes
    No specific recipe!
    Major implementations:
    Graph
    Document store
    Tabular
    Key value store
    Eventually consistent
    Hierarchical
    Ordered
    Other Known Recipes:
    Multivalue
    Object
    Tuple Store
  • 95. The Menu
    On Disk
    BigTable
    Membase
    Tokyo Cabinet
    In RAM
    Memcached
    Velocity
    Eventually Consistent
    Cassandra
    Dynamo
    Riak
    Hierarchical
    GT.M
    Ordered
    Berkeley DB
    NMDB
    C-ISAM
    Multivalue
    eXe
    OpenQM
    Document Store
    CouchDB
    Lotus Notes
    MongoDB
    Graph
    AllegroGraph
    Neo4j
    DEX
    Tabular
    BigTable
    HBase
    HyperTable
    The list isn’t even a quarter of the whole
  • 96. _theOpenSourceIssue
    Most of them are open source
    Thus forkable, like Linux
    The first of the lot
    Google’s BigTable
    Amazon’s Dynamo
    All in all, there are about 10 roots with 4 major ones.
  • 97. No single database to rule them all
  • 98. Real World Implementations
    Digg’s 3TB for Green Badges [CASSANDRA]
    Facebook’s 50TB for Inbox Search [CASSANDRA]
    eBay’s 2PB overall data
    Google’s
  • 99. Naïve Recipe
  • 100. MongoDB
    Document Store
    JSON Storage
    REST ….. Not out of the box
    Map/Reduce
    Master slave replication
    Strong suite of query APIs
    Good support for SQL-like queries
    Work in Progress:
    Autosharding based scalability
    Failover support
    Open Source
    Non Relational
    Scalable
    Schemaless
    Queryable
  • 101. Document Oriented
    Mongo stores documents in collections
    Documents are slightly enhanced JSON Objects
    Complex data structures are very much possible
    Data Modelling is a more natural process
  • 102. Embeddable Objects
    Complexity.begin()
    Embed objects within a single document
    A document is an enhanced form of object, as mentioned earlier
    The same thing can be achieved in an RDBMS by using multiple tables and joining them together
    Suppose the requirement is to store a blog post with this information:
    Post Content
    Post Title
    Post Author
    Comments
    Comment order
    Comment content
    Comment author
  • 103. RDBMS solution
  • 104. MongoDB Solution
    Documents …. Each one of them is a post
    {
      Name: $name,
      Author: $author,
      Comment: [
        { Author: $author1, Comment: $comment1 },
        { Author: $author2, Comment: $comment2,
          Replies: [
            { Author: $author3, Comment: $comment3 }
          ]
        }
      ]
    }
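
The post document above maps directly onto nested dictionaries and lists. The sketch below uses plain Python and JSON rather than a MongoDB driver, and the sample titles, authors and comment texts are placeholders invented for the example.

```python
import json

# The same shape as the document on the slide, with placeholder values.
post = {
    "Name": "Why NoSQL?",
    "Author": "alice",
    "Comment": [
        {"Author": "bob", "Comment": "Nice post"},
        {
            "Author": "carol",
            "Comment": "I disagree",
            "Replies": [{"Author": "alice", "Comment": "Why?"}],
        },
    ],
}


def count_comments(comments):
    # Walk comments and their nested replies without any join.
    return sum(1 + count_comments(c.get("Replies", [])) for c in comments)


def to_json(document):
    # A document store would persist something close to this text.
    return json.dumps(document, indent=2)
```

The whole post, including its comment thread, is read or written as one unit; the relational version would need a posts table, a comments table and a self-referencing key for replies.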
  • 105. RDBMS Viewpoint
  • 106. ODF
    MongoDB’ed
  • 107.
  • 108. Schema-less
    No database enforced Schema
    Adding and deleting columns is simple
    It’s about how the application uses the APIs
    The data definition need not be specified up front.
  • 109. Other Features
    Data Tagging
    Caching
    Real Time Analytics
    Image Storage
    Dynamic Queries
    Binary Storage
  • 110. MongoDB - Why Not?
    Lacks transactions
    Doesn’t completely support SQL
    Lacks built-in revisioning system like CouchDB
    Lacks full text searching features
  • 111. Try MongoDB @
    http://try.mongodb.org/
  • 112. n
    EOL
  • 113. Calm down!
    Eventually Answered System
    All your questions will be answered eventually