NoSQL

1. ‘Data, data, data. I cannot make bricks without clay.’ Sherlock Holmes, in Sherlock Holmes [2009]

2. Data: Qualitative or quantitative attributes of a variable or set of variables. The lowest level of abstraction, from which information and then knowledge are derived. A representation of a fact, figure, or idea.

3. A well organized newspaper or a clumsy, cluttered one?

4. Data explosion: from gigabytes to terabytes to petabytes to perhaps (I’m out of nomenclature)-bytes

5. NoSQL = Not Only SQL != No to SQL != Never SQL

6. Open Source: An abridged version of this presentation and its notes will be available to everyone. Distributed under no license. FREE AS IN SPEECH AND BEER

7. Web 2.0, DDBMS, RDBMS performance, OODB, R&D, cloud computing, multiple solutions. Necessity is the mother of invention.

8. SQL Databases, the ‘Hammer’ It’s a wonderful tool

9. Commercial SQL databases: even the gods use them. Design, power, ergonomics, ease of use, features, warranty, upgrades. The catch: a hole in the pocket.

10. A nail is a nail, a screw is a screw. Hammering a screw or screwdriving a nail is FOOLISHNESS!

12. Unlimited horizontal scalability

13. Economic, common, unreliable hardware

14. Auto Sharding

15. Support for wide range of data

16. Recursive, Hierarchical

17. Non-Rigid

19. NoSQL, the ‘screwdriver’ Yet another tool in our repository to go along with the hammer

20. NoSQL is about choice Not all problems are nails. Not all screws are same. GOOD PROGRAMMING PRACTICE: Know your tools and use them appropriately

22. MySQL

23. Teradata

24. SQLite

25. SQL Server. And many more.

27. If consistency is already ensured, do we have to enforce/check it again at the database level?

28. Are RDBMS ready for challenges of the future like:

29. Dynamic schema/metadata

30. Huge amounts of data

31. Through horizontal auto scaling

32. Ability to handle complex data types

33. Images, videos, audio, and much more. Not really!

34. Why? (Continued…) RDBMS drawbacks: scalability; CRUD; performance; write overhead; limited by single-disk architecture; lack of in-memory design; rigid schema design; and more.

35. HAMMERS Are under some Hammering

36. DRAWBACKS: DEEP DIVE

37. Scalability. True scalability: horizontal scaling; transparency to the application; no single point of failure. Problems with SQL databases: vertical scaling; partitioning, aka sharding; read slaves. Anti-patterns: normalized data; joins; ACID transactions.

38. No breadcrumbs: CRUD is crude. The delete/update strategy is improper. CRA (Create, Read, Archive) is the way to go: audit information is lost in CRUD but not in CRA.
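
The Create/Read/Archive idea can be sketched as an append-only store. This is a minimal illustration, not taken from any particular database: the class name `ArchiveStore`, its methods, and the `_ARCHIVED` marker are all hypothetical, and the sketch assumes that "archive" means appending a marker rather than erasing anything.

```python
# Hypothetical sketch of Create/Read/Archive: nothing is overwritten or erased.
_ARCHIVED = object()  # marker appended when a key is archived (assumption)


class ArchiveStore:
    def __init__(self):
        # key -> list of versions, oldest first
        self._history = {}

    def create(self, key, value):
        """Append a new version instead of updating in place."""
        self._history.setdefault(key, []).append(value)

    def archive(self, key):
        """Hide the key from reads while keeping every earlier version."""
        if key in self._history:
            self._history[key].append(_ARCHIVED)

    def read(self, key):
        """Return the latest version, or None if missing or archived."""
        versions = self._history.get(key, [])
        if not versions or versions[-1] is _ARCHIVED:
            return None
        return versions[-1]

    def audit(self, key):
        """Return every value the key ever held, oldest first."""
        return [v for v in self._history.get(key, []) if v is not _ARCHIVED]


store = ArchiveStore()
store.create("user:1", "matt@old.example")
store.create("user:1", "matt@new.example")
store.archive("user:1")
```

After the `archive` call, `store.read("user:1")` returns `None`, yet `store.audit("user:1")` still lists both addresses, which is the audit trail a plain update or delete would have lost.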

39. Naive data support: not designed for complex data structures (recursive, hierarchical, ordered list, circular) or dynamic metadata.

40. Logical/physical separation concerns: the relational model is a logical model, but RDBMSs implement it at the physical level using multiple indices. This adds artificial overhead in managing the database, such as frequently dropping and recreating indexes to make the DB perform.

41. Spinning disk storage: a design flaw of most RDBMSs. With cheaper memory, a memory-based approach should also be part of the design. Disks defy Moore’s law: disk reads grew only 12.5 times in about 50 years, disk writes much less. Disk writes are expensive, and RDBMSs make things worse by writing more. ACID rains are UNHEALTHY.

42. Think ‘Out of the ROM’

43. At snail’s pace: RDBMS engine growth is SLOW. Optimizations have been minor since the early days; most of the growth is due to Moore’s law (faster hardware, slightly faster storage, faster memory). What happens when Moore’s law diminishes thanks to external factors such as the heat generated?

44. Database size limits: RDBMSs are too slow over multi-terabyte and petabyte databases. Purpose-designed parallel processing would be needed to handle such volumes of data in an RDBMS.

45. RDBMS has been around for years and is a proven technology. What about NoSQL?

46. RDBMS grew fast, but its growth slowed over time and might eventually stagnate. NoSQL, unarguably a new and immature tool, has been growing faster than RDBMS ever did and is supported by the big players.

47. Did you say BIG PLAYERS! WHO?

49. Facebook – HBase

50. Digg – Cassandra

51. Amazon – Dynamo

52. Trend Micro – HBase

53. Netflix – Amazon SimpleDB

54. Shutterfly – MongoDB

55. LinkedIn – Voldemort, and more. Microsoft is considering NoSQL for Azure services as well, and so is Twitter. Are we next? Major IT companies have implemented, or even better created, their own NoSQL systems to manage huge data stores that couldn’t be managed by SQL databases.

56. We are used to SQL and relatedness; why can’t they just fix RDBMS to handle Big Data? STORAGE SEEK RATES: large writes and ACID are a huge limitation. Big Data can be handled via scale-out/partitionability across multiple nodes.

57. CAP Theorem: applies to distributed shared-data systems

58. CAP THEOREM

59. A deeper look. Consistency: the system is in a consistent state after an operation; all clients see the same data; strong consistency (ACID) vs. eventual (BASE). Availability: ‘always on’ mode, no downtime; all clients can find some available replica; software/hardware upgrade tolerance. Partition tolerance: the system continues to function even when split into disconnected subsets (by a network disruption); reads and writes combined.

61. Sharded database
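
One common way to shard, shown here only as a sketch, is to hash each key and use the hash to pick a node. The names `shard_for` and `ShardedStore` are hypothetical, the "nodes" are plain in-memory dictionaries, and real systems typically use more elaborate schemes (for example consistent hashing) so that adding a node does not move most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Hypothetical store that spreads keys across several 'nodes'."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate machine.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", i)
```

The application only calls `put` and `get`; which shard holds a key is decided inside the store, which is the "transparency to the application" the scalability slide asks for.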

63. BASE: Basically Available, Soft state, Eventually consistent. When availability and partitionability are prioritized over consistency, think in terms of BASE.

64. Eventual consistency: if no new updates are made to the object, eventually all accesses will return the last updated value. Ex: Domain Name System (DNS)
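
A toy model of this definition, under loud assumptions: the `Replica` and `Cluster` classes are hypothetical, every write lands on replica 0 first, replication to the other replicas is deferred until `propagate()` is called, and a simple counter stands in for version numbers.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        """Keep the write only if it is newer than what we already hold."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    def __init__(self, num_replicas):
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.pending = []  # queued (replica_index, key, version, value)
        self.clock = 0     # stand-in for a version counter (assumption)

    def write(self, key, value):
        """Apply to replica 0 now; queue the update for the others."""
        self.clock += 1
        self.replicas[0].apply(key, self.clock, value)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, self.clock, value))

    def propagate(self):
        """Deliver all queued updates (asynchronous replication catching up)."""
        while self.pending:
            i, key, version, value = self.pending.pop(0)
            self.replicas[i].apply(key, version, value)


cluster = Cluster(num_replicas=3)
cluster.write("x", 1)
stale = cluster.replicas[1].read("x")  # None: the update has not arrived yet
cluster.propagate()
fresh = cluster.replicas[1].read("x")  # 1: replicas have converged
```

Between `write` and `propagate`, different replicas can return different answers; once updates stop and propagation finishes, every replica returns the last written value.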

65. Types of eventual consistency: read-your-writes consistency, session consistency, monotonic read consistency, monotonic write consistency, causal consistency. In practice, read-your-writes consistency and monotonic read consistency are desirable in an eventually consistent system.

66. Hash(). Different apps, different CAP requirements. Prioritize among: consistency and availability; availability and partitionability; consistency and partitionability.

67. WHERE? So will NoSQL eventually replace RDBMSs everywhere? No, RDBMSs are here to stay. NoSQL is here to help.

68. Wherever you want to take Advantage of NoSQL

69. Big Data: denormalize, shard, scale out, and look no further than NoSQL

70. Write-intensive applications: IOPS of the best storage device <<< n * IOPS of relatively cheaper storage devices. In simple terms: ‘HARNESS THE POWER OF YOUR CLOUD’

71. Fast key-value access. NoSQL – ‘User, you are looking for $value’. RDBMS – ‘Query executing ….’ An O(1) hash operation versus O(log n) B+/B-tree traversals.
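
The two lookup styles can be contrasted in a few lines. This is only an illustration: a Python `dict` plays the hash index, and binary search over a sorted list (via the standard `bisect` module) stands in for a B-tree traversal, since both take a logarithmic number of comparisons. The key format and the helper `tree_like_lookup` are made up for the example.

```python
import bisect

# Zero-padded keys so that string order matches numeric order.
keys = [f"user{i:05d}" for i in range(10000)]
values = [i * 2 for i in range(10000)]

# Hash index: one hash computation, then a direct bucket access (O(1) on average).
hash_index = dict(zip(keys, values))


def tree_like_lookup(key):
    """Binary search over sorted keys: O(log n) comparisons, like a tree descent."""
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return values[pos]
    return None


by_hash = hash_index["user00042"]        # 84
by_tree = tree_like_lookup("user00042")  # 84
```

The sorted structure costs more per point lookup, but unlike the hash index it can also answer range queries such as "all keys between user00040 and user00050".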

72. Flexible schema and data types. Field – ‘I once was an integer, then a string, then a date; what am I?’ RDBMS – ‘WTH! Whatever you are, you are beyond my scope’

73. Transient data. Data – ‘I’m here only for a while and want to get my work done fast’. RDBMS – ‘You are data and you shall be treated like the rest’. NoSQL – ‘Okay, I’ll allot you space in RAM using Memcached if available; otherwise you still have my cloud’

74. High write availability. Warning: incoming data …. NoSQL – ‘Anytime you like, user’. RDBMS – ‘This is insane, I’m already busy with other things’

75. ECONOMICS RDBMS – ‘I’m powered by a wonderful, beautiful rabbit’ NoSQL – ‘I’m powered by many cute little hamsters’

76. No single point of failure: designed to run on economical, commonly available, unreliable hardware

77. Full table scan operations. MapReduce. Map: break your problem into sub-problems that can be computed in parallel and reduced later. Reduce: merge the sub-problem solutions into the result. Divide and conquer your way to victory. Powered by MapReduce! Or something similar.
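
The classic word-count example shows the two phases. This is a single-process sketch with hypothetical function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`); in a real MapReduce system the map calls would run in parallel on the nodes that hold the data, and the framework would do the grouping step.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one document into independent (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values emitted for the same key (done by the framework)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: merge the partial results for one key."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["NoSQL is a tool", "SQL is a tool too"])
```

Because each `map_phase` call depends only on its own document, the scan over the whole data set splits cleanly across machines; only the grouped pairs have to travel to the reducers.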

78. Design: ability to restore, maintain, and repair itself. No DBA required.

79. HOW? Let us welcome keys, values, collections, data structures, objects, documents, graphs

80. NoSQL view. The basic approach to data: key/value store; run on multiple machines; partition and replicate across these machines; relax consistency; aim at eventual consistency; asynchronous replication. But not all NoSQL systems take the same path.

81. Document Store, Key-Value Store, Object NoSQL, Multivalue, Graph Stores, BigTable Clones, Tuple Store

82. Key-value stores: one key, one value, no duplicates, and crazy fast. Distributed hash tables. The value is stored as a binary object (BLOB); the DB doesn’t understand it and doesn’t want to. Ex: Amazon Dynamo, MemcacheDB

83. Key4 Key3 Key2 Key1 Key/Value store doesn’t know what is in here
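
A minimal sketch of that opacity, with a hypothetical `KeyValueStore` class: the store accepts and returns raw bytes, and only the application knows that the bytes happen to be JSON.

```python
import json


class KeyValueStore:
    """Hypothetical in-memory key-value store that treats values as opaque bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, key: str, blob: bytes) -> None:
        if not isinstance(blob, bytes):
            raise TypeError("values must be bytes; the store does not interpret them")
        self._blobs[key] = blob

    def get(self, key: str):
        return self._blobs.get(key)


store = KeyValueStore()

# The application serializes before writing ...
store.put("user:1", json.dumps({"name": "Matt", "age": 32}).encode("utf-8"))

# ... and deserializes after reading. The store never looks inside.
user = json.loads(store.get("user:1").decode("utf-8"))
```

The only question this store can answer is "what is stored under this key?". Finding every user aged 32 would mean fetching and decoding every value in the application.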

84. Document store: a key-value store, but the value is structured and understood by the DB. Querying data is possible, and not just on the key. Ex: MongoDB, CouchDB, Riak, etc.

85. Each database has collections; each collection has a set of documents. They are well designed for access through applications and suitable for web applications. A few document databases now provide an SQL-like query interface.
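
Because a document store understands the structure of its values, it can filter on fields rather than only on keys. The sketch below uses a hypothetical `Collection` class with a linear scan; it is not the API of MongoDB, CouchDB, or any other product, and real document databases use indexes instead of scanning.

```python
class Collection:
    """Hypothetical collection of documents, each stored under a key."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, doc):
        self._docs[key] = dict(doc)

    def get(self, key):
        return self._docs.get(key)

    def find(self, **criteria):
        """Return every document whose fields match all given criteria."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


posts = Collection()
posts.insert("p1", {"title": "Why NoSQL?", "author": "matt", "tags": ["nosql"]})
posts.insert("p2", {"title": "CAP in practice", "author": "april"})
posts.insert("p3", {"title": "Sharding 101", "author": "matt"})

by_matt = posts.find(author="matt")
```

Note that the three documents do not share the same fields (`tags` appears only once); the query simply skips documents that lack a field it asks about.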

86. Key4 Key3 Key2 Key1 Name: $Name Value: $Value Version: $Version Type: $Type Emb Object1 Objects inside Objects CRAZY! Emb Object2

87. BigTable & its clones: database, tables, rows, columns, and ‘SuperColumn’. A row consists of columns and SuperColumns. A few SuperColumns can be made mandatory. Each SuperColumn is an arbitrary set of columns. Rows are typically versioned by a system-assigned timestamp.

88. Intended for tables with a huge number of columns; millions can be supported very easily. ‘A sparse, distributed multi-dimensional sorted map’. Also referred to as wide column stores. Ex: Google BigTable, Cassandra, HBase, Azure Tables
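
The "sparse, multi-dimensional sorted map" can be pictured as a map from (row, column) to a list of timestamped versions. The sketch below is a single-machine toy with a hypothetical `WideColumnTable` class; it leaves out SuperColumns and distribution entirely, and uses a counter in place of a real system-assigned timestamp.

```python
class WideColumnTable:
    """Hypothetical sparse table: only cells that were written take up space."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value)
        self._clock = 0   # stand-in for a system-assigned timestamp

    def put(self, row, column, value):
        """Add a new version of the cell; older versions are kept."""
        self._clock += 1
        self._cells.setdefault((row, column), []).append((self._clock, value))

    def get(self, row, column):
        """Return the most recent version of a cell, or None if never written."""
        versions = self._cells.get((row, column))
        return versions[-1][1] if versions else None

    def history(self, row, column):
        """Return all versions of a cell, oldest first."""
        return list(self._cells.get((row, column), []))

    def scan_row(self, row):
        """Return the latest value of every column present in the row, sorted by column."""
        return {
            col: versions[-1][1]
            for (r, col), versions in sorted(self._cells.items())
            if r == row
        }


table = WideColumnTable()
table.put("user1", "name", "Matt")
table.put("user1", "city", "Pune")
table.put("user1", "city", "Mumbai")   # new version of the same cell
table.put("user2", "spouse", "April")  # a column user1 does not have
```

Rows here need not share columns: `user2` has a `spouse` column and nothing else, which is what makes tables with millions of possible columns affordable.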

89. Key1 Key2 Key3

90. Graph databases: nodes, edges, and properties replace traditional tables, columns, and rows. A graph database can be implemented in different ways: key/value store, columnar, BigTable clone, or even a combination of these. Fields are used to directly store the id of another entity, forming the edge.

91. A graph database is a multi-relational graph. No need for secondary indexes. Relationships in RDBMS are ‘weak’; relationships in graphs are ‘strong’. The rest don’t really care about relations at the DB level.

92. Address Age: 32 Matt Mobile April Is related to SSN Spouse owns Drives Honda Model City registration
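
The labels on this slide can be turned into a small property graph. The sketch is illustrative only: the `Graph` class is hypothetical, the exact shape of the original diagram is not recoverable from the transcript, and the edges chosen below (Matt is the spouse of April, Matt owns and drives a Honda) are an assumed reading of it.

```python
class Graph:
    """Hypothetical property graph: nodes with properties, labelled edges."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (source id, label, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        # The edge stores the ids of the entities it connects.
        self.edges.append((source, label, target))

    def neighbours(self, source, label):
        """Follow all edges with the given label out of a node."""
        return [t for s, l, t in self.edges if s == source and l == label]


g = Graph()
g.add_node("matt", age=32)
g.add_node("april")
g.add_node("honda", kind="car")
g.add_edge("matt", "spouse", "april")
g.add_edge("matt", "owns", "honda")
g.add_edge("matt", "drives", "honda")

owned = g.neighbours("matt", "owns")
```

The relationship is a first-class record here, so "what does Matt own?" is a direct traversal rather than a join across tables.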

93. Key-Value Store Size Document Store BigTable Clone Graph Databases Complexity

94. Too many cooks and recipes. No specific recipe! Major implementations: graph, document store, tabular, key-value store, eventually consistent, hierarchical, ordered. Other known recipes: multivalue, object, tuple store.

95. The Menu. On disk: BigTable, Membase, Tokyo Cabinet. In RAM: Memcached, Velocity. Eventually consistent: Cassandra, Dynamo, Riak. Hierarchical: GT.M. Ordered: Berkeley DB, NMDB, C-ISAM. Multivalue: eXe, OpenQM. Document store: CouchDB, Lotus Notes, MongoDB. Graph: AllegroGraph, Neo4j, DEX. Tabular: BigTable, HBase, Hypertable. The list isn’t even a quarter of the whole.

96. _theOpenSourceIssue: most of them are open source and thus forkable, like Linux. The first of the lot: Google’s BigTable, Amazon’s Dynamo. All in all, there are about 10 roots, with 4 major ones.

97. No single database to rule them all

98. Real-world implementations: Digg’s 3TB for Green Badges [CASSANDRA]; Facebook’s 50TB for Inbox Search [HBASE]; eBay’s 2PB overall data; Google’s

99. Naïve Recipe

100. MongoDB: document store; JSON storage; REST (not out of the box); Map/Reduce; master-slave replication; strong suite of query APIs; good support for SQL. Work in progress: autosharding-based scalability; failover support. Open source, non-relational, scalable, schemaless, queryable.

101. Document oriented: Mongo stores documents in collections. Documents are slightly enhanced JSON objects. Complex data structures are very much possible. Data modelling is a more natural process.

102. Embeddable objects. Complexity.begin(): embed objects within a single document. A document is an enhanced form of object, as mentioned earlier. The same thing can be achieved in an RDBMS using multiple tables and joining them together. Suppose our requirement is to store a blog post with this information: post content, post title, post author, comments, comment order, comment content, comment author.

103. RDBMS solution

104. MongoDB Solution Documents …. Each one of them is a post { Name: $name, Author: $author, Comment: [ { Author: $author1, Comment: $comment1} , { Author: $author2, Comment: $comment2, Replies: [ { Author: $author3, Comment: $comment3} ] } ] }
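
The same document shape can be written as a plain Python dictionary to see how the embedding works. This does not use any MongoDB driver; the field names follow the slide, while the concrete titles, authors, and comment texts, and the helper `count_comments`, are made up for illustration.

```python
# One post, with its comments and their replies embedded in the same document.
post = {
    "Name": "Why NoSQL?",
    "Author": "alice",
    "Comment": [
        {"Author": "bob", "Comment": "Nice post"},
        {
            "Author": "carol",
            "Comment": "I disagree",
            "Replies": [
                {"Author": "alice", "Comment": "Why?"},
            ],
        },
    ],
}


def count_comments(comments):
    """Count comments including nested replies, at any depth."""
    return sum(1 + count_comments(c.get("Replies", [])) for c in comments)


total = count_comments(post["Comment"])  # 3
```

Comment order is simply the order of the list, and the whole post comes back in one read; the relational version would need separate tables for posts, comments, and replies, plus joins to reassemble them.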

105. RDBMS Viewpoint

106. ODF Mongodb’ed

108. Schema-less: no database-enforced schema. Adding and deleting columns is simple. It’s about how the application uses the APIs. The data definition need not be specified up front.

109. Other features: data tagging, caching, real-time analytics, image storage, dynamic queries, binary storage

110. MongoDB – why not? Lacks transactions. Doesn’t completely support SQL. Lacks a built-in revisioning system like CouchDB’s. Lacks full-text search features.

111. Try MongoDB @ http://try.mongodb.org/

112. EOL

113. Calm down! Eventually Answered System All your questions will be answered eventually
