Nonrelational Databases
My improvised/copied preso for some short talk I gave.

Published in: Technology

Transcript

  • 1. Non-relational Databases: a new kind of database for handling web scale
  • 2. Agenda
  • 7. The problem
    • The Web introduces a new scale for applications, in terms of:
      • Concurrent users (millions of requests/second)
      • Data (petabytes generated daily)
      • Processing (all this data needs processing)
      • Exponential growth (surging, unpredictable demand)
  • 11. The problem (contd.)
    • Websites with very high traffic cannot handle this scale with existing RDBMS solutions
    • Not even with their high-end clustering solutions
  • 16. The problem (contd.)
    • Why?
      • Applications using a normalized database schema require joins, which don't perform well with large amounts of data and/or many nodes
      • Existing RDBMS clustering solutions require scale-up, which is limited & not really scalable when dealing with exponential growth
      • Machines have upper limits on capacity, & sharding the data & processing across machines is very complex & app-specific
  • 19. The problem (contd.)
    • Why not just use sharding?
      • Very problematic when adding/removing nodes
      • Basically, you end up denormalizing everything & losing all the benefits of relational databases
  • 21. Who faced this problem?
    • Web applications dealing with high traffic, massive data, large user-base & user-generated content, such as:
  • 28. One difference, though
    • Compared to traditional large applications (telco, financial, etc.), these web applications are usually free & therefore:
      • can sacrifice data integrity / consistency
        • No user will sue them for not receiving the most current:
          • Status of their friends (Facebook/Twitter)
          • Web search result (Google/Yahoo!)
          • Item added to cart (Amazon)
  • 31. The solution
    • These companies had to come up with a new kind of DBMS, capable of handling web scale
      • Possibly sacrificing some level of consistency or some other feature
  • 32. Must we sacrifice something?
    • In 2000, Eric Brewer (co-founder of Inktomi) formulated the CAP theorem, stating that a distributed system can fully provide only 2 of these 3:
      • Consistency
      • Availability
      • Partition-tolerance
    • The theorem was formally proved in 2002 by MIT researchers (Seth Gilbert & Nancy Lynch)
  • 35. Simple example
    • When you have a lot of data which needs to be highly available, you'll usually need to partition it across machines & also replicate it to be more fault-tolerant
    • This means that when writing a record, all replicas must be updated too
    • Now you need to choose between:
      • Lock all relevant replicas during update => be less available
      • Don't lock the replicas => be less consistent
  • 39. The consequence
    • You need to either:
      • Drop partition tolerance (CA)
      • Drop availability (CP)
      • Drop consistency (AP)
    • “Drop” here is usually not meant as binary, but rather tunable
  • 42. Non-relational databases
    • The solution these companies came up with is a family of databases for handling web scale:
      • BigTable (developed at Google)
      • HBase (open-source, modeled on BigTable; started at Powerset)
      • Dynamo (developed at Amazon)
      • Cassandra (developed at Facebook)
      • Voldemort (developed at LinkedIn)
      • & a few more:
        • Riak, Redis, CouchDB, MongoDB, Hypertable
  • 48. Benefits
    • Massively scalable
    • Extremely fast
    • Highly available, decentralized & fault tolerant (no single-point-of-failure)
    • Transparent sharding (consistent hashing)
    • Elasticity
    • Parallel processing
    • Dynamic schema
    • Automatic conflict resolution
  • 56. Consistent hashing
  • 57. Replication
  • 58. Replication – node joining
  • 59. Replication – node leaving
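The diagrams for the four slides above did not survive in the transcript. As a rough illustration of the idea, here is a minimal sketch of a consistent-hash ring with replication. The class `HashRing`, the helper `ring_position`, the use of MD5, and the node names are all assumptions made for this example; real systems typically add virtual nodes, which are left out here.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring: one position per node, no virtual nodes."""

    def __init__(self, nodes=(), replicas=3):
        self.replicas = replicas
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Node joining: it takes over only the arc just before its position."""
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node):
        """Node leaving: its arc falls to the next node clockwise."""
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def preference_list(self, key):
        """Walk clockwise from the key's position and collect the replica nodes."""
        if not self._positions:
            return []
        start = bisect.bisect_right(self._positions, ring_position(key))
        count = min(self.replicas, len(self._positions))
        size = len(self._positions)
        return [self._owner[self._positions[(start + i) % size]] for i in range(count)]
```

The first node in `preference_list(key)` acts as the primary and the following ones hold the replicas. When a node joins, only keys on the arc it takes over change their primary; every other key stays where it was.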
  • 60. Scale-out / elasticity?
    • O(1) Distributed Hashtable
    • Runs on a large number of cheap commodity machines
    • Replication
    • Gossip protocol
    • Transparently handles adding/removing nodes
  • 65. Tunable consistency?
    • Levels of consistency:
      • Strict consistency
      • Read-your-writes consistency
      • Session consistency
      • Monotonic read consistency
      • Eventual consistency
    • Tunable means: how many replicas to lock on write
      • N, R, W parameters
      • Quorum
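The N, R, W parameters can be made concrete with a small check. In the Dynamo-style scheme, N is the number of replicas per key, W the number of replicas that must acknowledge a write, and R the number that must answer a read; reads are guaranteed to see the latest acknowledged write when R + W > N. The function name below is an assumption for illustration.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum intersects every write quorum.

    n: replicas per key, r: replicas consulted per read,
    w: replicas that must acknowledge a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


# A few typical settings for N = 3:
for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
    label = "overlapping quorums" if quorums_overlap(3, r, w) else "eventual consistency only"
    print(f"N=3 R={r} W={w}: {label}")
```

Lower R and W values reduce latency at the price of possibly stale reads, which is the trade-off "tunable" refers to.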
  • 71. Dealing with inconsistency
    • Read-repair (when encountering inconsistency)
    • Vector clock conflict resolution
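A minimal sketch of how vector clocks detect conflicting versions, assuming each clock is a plain dict mapping a node name to a counter. The function names and node names are made up for this example; what to do with two concurrent versions (merge them, or hand both to the client) is application-specific and not shown.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks given as {node_name: counter} dicts."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a is an ancestor of b; b supersedes it
    if b_le_a:
        return "after"       # b is an ancestor of a; a supersedes it
    return "concurrent"      # neither descends from the other: a conflict


def merge(a: dict, b: dict) -> dict:
    """Clock for a reconciled version: the element-wise maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

For example, `compare({"x": 2, "y": 1}, {"x": 1, "y": 2})` reports `"concurrent"`: the two versions were written independently and need reconciling, after which the reconciled version carries `merge(...)` of both clocks.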
  • 73. Dynamic schema
    • Column families (basically a sparse table)
  • 74. Dynamic schema (contd.)
    • “Supercolumn” is a collection of columns
    • Record can have several “supercolumns”
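One way to picture this data model is as nested maps. The sketch below uses plain Python dicts; the table names, row keys, and values are invented for illustration and do not come from the talk.

```python
from collections import defaultdict

# Column family as a sparse table: row key -> column name -> value.
# Rows need not share the same set of columns.
users = defaultdict(dict)
users["alice"]["email"] = "alice@example.com"
users["alice"]["city"] = "Haifa"
users["bob"]["email"] = "bob@example.com"  # bob has no "city" column at all

# Supercolumns add one level: row key -> supercolumn name -> column name -> value.
addresses = defaultdict(lambda: defaultdict(dict))
addresses["alice"]["home"]["street"] = "1 Main St"
addresses["alice"]["work"]["street"] = "9 Office Rd"
addresses["alice"]["work"]["floor"] = "3"


def columns_of(table, row_key):
    """Sorted column (or supercolumn) names actually stored for a row."""
    return sorted(table.get(row_key, {}))
```

Here `columns_of(users, "bob")` lists only the `email` column; a missing column simply is not stored, which is what makes the table sparse.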
  • 76. Data processing
    • Map/Reduce: an API exposed by non-relational databases to process data
      • A functional programming pattern for parallelizing work
      • Brings the workers to the data – excellent fit for non-relational databases
      • Minimizes the programming to 2 simple functions (map & reduce)
      • Example: count appearances of a word in a giant table of large texts
  • 80. Map/Reduce (contd.)
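The word-count example can be sketched in a single process as follows. In a real deployment the map calls would run on the nodes holding the data and the grouping step would be handled by the framework; the function names here are assumptions for illustration.

```python
from collections import defaultdict


def map_words(text):
    """Map step: emit a (word, 1) pair for every word in one text."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def map_reduce(texts):
    """Run map over every text, group the pairs by word, then reduce each group."""
    grouped = defaultdict(list)
    for text in texts:
        for word, count in map_words(text):
            grouped[word].append(count)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())
```

Calling `map_reduce(["to be or not to be", "To do"])` counts `to` three times across both texts, since the map step lowercases every word.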
  • 81. Storage
  • 82. Cost
    • Sacrifices consistency (ACID) in certain circumstances (though this can be dealt with)
    • Non-standard new API model
    • Non-standard new schema model
    • New knowledge required to tune/optimize
    • Less mature
  • 87. API model
    • Usually, similar to Key-Value map:
      • Get(key)
      • Put(key, value)
      • Delete(key)
      • Execute(operation, key_list)
    • “value” can be
      • an opaque serialized object
      • a record (list of “columns”: <name, value, timestamp>)
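A minimal in-memory sketch of such an API. The class name is hypothetical, and the semantics of `execute` (apply a caller-supplied function to each listed key and its value) are an assumption, since the slide does not define them.

```python
import time


class InMemoryStore:
    """Hypothetical single-node stand-in for a key-value API."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None when the key is absent

    def delete(self, key):
        self._data.pop(key, None)

    def execute(self, operation, key_list):
        """Assumed semantics: call operation(key, value) for every listed key."""
        return [operation(key, self._data.get(key)) for key in key_list]


store = InMemoryStore()
# A value may be an opaque serialized object ...
store.put("session:42", b"\x00\x01opaque-bytes")
# ... or a record: a list of (name, value, timestamp) columns.
now = time.time()
store.put("user:1", [("name", "Ada", now), ("email", "ada@example.com", now)])
```

The store never inspects the opaque value, whereas the record form lets the database index or slice individual columns.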
  • 92. Schema model
    • Kind of sparse table
    • No schema
  • 94. Example: Cassandra
    • Features:
      • O(1) DHT
      • Eventual consistency
        • tunable: consistency vs. latency
      • Values are structured, indexed
      • Columns / column families
      • Slicing with predicates (queries)
      • PartitionOrderer
  • 99. Cassandra performance
    • Benchmark against MySQL (50GB)
      • MySQL:
      • Cassandra:
    • How come writes are so fast?
      • Writes involve no reads/seeks
      • Use any node (closest to you)
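One common way to get writes that need no reads or seeks is to append every write to a sequential log and update an in-memory table. The sketch below assumes that design; the class name is hypothetical, a Python list stands in for the on-disk log, and this is not Cassandra's actual code.

```python
class AppendOnlyStore:
    """Toy write path: append to a log, update an in-memory table."""

    def __init__(self):
        self.log = []        # stands in for a sequential on-disk commit log
        self.memtable = {}   # latest value per key, held in memory

    def write(self, key, value):
        # No lookup of the old value and no in-place update on disk:
        # the write is one append plus one in-memory assignment.
        self.log.append((key, value))
        self.memtable[key] = value

    def read(self, key):
        return self.memtable.get(key)
```

Overwriting a key just appends another log entry; older entries are never touched at write time.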
  • 103. Cassandra API
  • 104. Cassandra API (contd.)
  • 105. Example: Cassandra (contd.)
    • Java API
      • Simple DAO
      • Simple client
  • 107. Cassandra usage
  • 110. Further information
    • The Dynamo paper:
      • http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
    • NoSQL patterns:
      • http://horicky.blogspot.com/2009/11/nosql-patterns.html
    • NoSQL conference videos:
      • https://nosqleast.com/2009/
    • Hebrew podcast covering NoSQL & Cassandra (episodes 56, 57 & more):
      • http://www.reversim.com/
  • 111. Further information (contd.)
    • Ran Tavori's lecture (video + slides):
      • http://prettyprint.me/2010/01/09/introduction-to-nosql-and-cassandra-part-1/
      • http://prettyprint.me/2010/01/20/introduction-to-nosql-and-cassandra-part-2/
