Scalability: RDBMS vs. Other Data Stores

  • May have to discuss Queuing Systems, Idempotency and so on
  • Transcript of "Scalability: RDBMS vs. Other Data Stores"

    1. RDBMS vs. Other Data Stores for Scalability
       ramki.g@directi.com
       TechTalk 2009, IIIT Hyderabad
    2. Scalability
       • Increase Resources → Increase Performance (Linearly)
       • Performance? Latency, Capacity, Throughput
       • Vertical Scalability (Scaling Up): divide the functionality
       • Horizontal Scalability (Scaling Out): divide the data
    3. Relational Database
       • Table, Row, Column
       • Set, Item, Property
    4. Relational Theory
       • Projection: SELECT
       • Selection (filter): WHERE
       • Join: JOIN, LEFT JOIN, RIGHT JOIN
       • Correlation: SELECT a FROM A WHERE A.b IN (SELECT b FROM B WHERE B.a > A.a)
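The operations on this slide can be tried out with a small in-memory database. The following is a minimal sketch using Python's built-in `sqlite3` module; the `customer` and `orders` tables, their columns, and the sample rows are assumptions made up for illustration and do not come from the talk.

```python
import sqlite3

# Hypothetical schema and data, used only to illustrate the operations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT,
                           credit_limit REAL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Asha', 'Hyderabad', 200),
                                (2, 'Ravi', 'Mumbai', 50),
                                (3, 'Meera', 'Hyderabad', 500);
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 90.0), (12, 2, 40.0);
""")

# The SELECT list picks columns; the WHERE clause filters rows.
in_hyderabad = conn.execute(
    "SELECT name FROM customer WHERE city = 'Hyderabad' ORDER BY id"
).fetchall()

# LEFT JOIN keeps customers that have no matching order (total is NULL).
joined = conn.execute("""
    SELECT c.name, o.total
    FROM customer c LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# Correlated subquery: the inner query refers to the outer row (c.credit_limit).
over_limit = conn.execute("""
    SELECT c.name FROM customer c
    WHERE c.id IN (SELECT o.customer_id FROM orders o
                   WHERE o.total > c.credit_limit)
""").fetchall()
```

The correlated subquery is the costly case: conceptually the inner query is re-evaluated for each outer row, which is one reason such queries are hard to spread across machines.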
    5. Relational Theory
       • Aggregation
       • Set Operators: Union, Intersection, Minus
       • Group By: MAX, MIN, SUM, AVG
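A short sketch of the set operators and grouped aggregates, again with Python's `sqlite3`. The table names and rows are hypothetical. SQLite spells the "Minus" operator as `EXCEPT` (Oracle uses `MINUS`).

```python
import sqlite3

# Hypothetical tables, used only to illustrate set operators and GROUP BY.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_buyers (name TEXT);
    CREATE TABLE store_buyers (name TEXT);
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO online_buyers VALUES ('Asha'), ('Ravi');
    INSERT INTO store_buyers VALUES ('Ravi'), ('Meera');
    INSERT INTO sales VALUES ('south', 10), ('south', 30), ('north', 5);
""")


def names(sql):
    """Run a single-column query and return the values as a list."""
    return [row[0] for row in conn.execute(sql)]


union = names(
    "SELECT name FROM online_buyers UNION SELECT name FROM store_buyers ORDER BY name"
)
intersection = names(
    "SELECT name FROM online_buyers INTERSECT SELECT name FROM store_buyers"
)
minus = names(
    "SELECT name FROM online_buyers EXCEPT SELECT name FROM store_buyers"
)

# One output row per region, with the four aggregates named on the slide.
per_region = conn.execute("""
    SELECT region, MAX(amount), MIN(amount), SUM(amount), AVG(amount)
    FROM sales GROUP BY region ORDER BY region
""").fetchall()
```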
    6. Transactions: Atomicity
       • Transaction level: an entire logical operation is one transaction (multiple statements)
       • Statement level: each statement either succeeds or fails, with no partial success (multiple records)
       • Record level: all modifications to a record either succeed or fail together
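Transaction-level atomicity can be illustrated with a two-statement transfer in which the second statement fails. This is a minimal sketch with Python's `sqlite3`; the `account` table, the `CHECK` constraint, and the `transfer` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint makes an overdraft fail.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


# The credit succeeds, the debit violates the CHECK, so both are undone.
failed = transfer(conn, 1, 2, 500)
after_failed = balances(conn)

succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)
```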
    7. Transactions: Consistency
       • Integrity Constraints
       • Referential Integrity
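As a rough illustration of both kinds of constraint, the sketch below uses Python's `sqlite3` with a hypothetical `invoice` / `invoice_item` schema. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
# Hypothetical schema: every item must point at an existing invoice.
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE invoice_item ("
    " id INTEGER PRIMARY KEY,"
    " invoice_id INTEGER NOT NULL REFERENCES invoice(id),"  # referential integrity
    " quantity INTEGER CHECK (quantity > 0))"                # integrity constraint
)
conn.execute("INSERT INTO invoice VALUES (1)")
conn.execute("INSERT INTO invoice_item VALUES (1, 1, 3)")
conn.commit()


def try_insert(row):
    """Attempt to insert an invoice item; report whether the database accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO invoice_item VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


orphan = try_insert((2, 99, 1))        # invoice 99 does not exist
bad_quantity = try_insert((3, 1, 0))   # violates CHECK (quantity > 0)
valid = try_insert((4, 1, 2))
item_count = conn.execute("SELECT COUNT(*) FROM invoice_item").fetchone()[0]
```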
    8. Transactions: Isolation Levels
       • Serializable: a definite order of mutations/transactions exists that leads from state A to state B
       • Repeatable Read: any data read by a transaction stays the same until the transaction completes
       • Non-Repeatable Read, aka Read Committed: two reads within a transaction may give different results
       • Dirty Read: a transaction might read data that is later rolled back
    9. RDBMS Luxuries
       • Multiple Indexes
       • Auto Increments/Sequences
       • Triggers
    10. Scalability in RDBMS
        • Replication
          • Read Replication (Master-Slave)
          • Read-Write Replication (Master-Master)
        • Cluster
          • Distributed Transaction
          • Two-phase commits
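The two-phase commit named on this slide can be sketched as a coordinator that first collects votes and only then commits. This is a deliberately simplified, in-process illustration; the `Participant` class and `two_phase_commit` function are hypothetical names, and real implementations also need durable logs, timeouts, and recovery.

```python
class Participant:
    """A resource manager that can vote on, then commit or roll back, a change."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: promise to commit if asked, or vote "no".
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1 (voting): every participant must vote "yes".
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit everywhere or roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


healthy = [Participant("db1"), Participant("db2")]
all_committed = two_phase_commit(healthy)

mixed = [Participant("db1"), Participant("db2", can_commit=False)]
any_committed = two_phase_commit(mixed)
```

Every participant has to answer before anyone can commit, so one slow or unreachable node blocks the whole transaction. That coupling is the scalability cost of distributed transactions.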
    11. Scalability Impediments
        • Performance
          • Sub-Queries/Correlation, Joins, Aggregates
          • Referential Integrity constraints
        • Basic Guarantee
          • Consistency
          • Availability
    12. CAP?
        • Conjecture: distributed systems cannot ensure all three of the following properties at once
        • Consistency: the client perceives that a set of operations has occurred all at once.
        • Availability: every operation must terminate in an intended response.
        • Partition tolerance: operations will complete, even if individual components are unavailable.
    13. ACID to BASE
        • Basically Available: the system seems to work all the time
        • Soft State: it doesn't have to be consistent all the time
        • Eventually Consistent: it becomes consistent at some later time
    14. BASE: An Example
            BEGIN Transaction
            INSERT INTO ORDER (oid, timestamp, customer)
            FOREACH item IN itemList
                INSERT INTO ORDER_ITEM (oid, item.id, item.quantity, item.unitprice)
                //UPDATE INVENTORY SET quantity = quantity - item.quantity WHERE item = item.id
            COMMIT
            END Transaction
        • Assume each statement is queued for execution
        • You will get COMMIT success
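One way to read this slide, together with the speaker note about queuing systems and idempotency, is sketched below. The writer is acknowledged as soon as its statements are enqueued; a worker applies them later, and each message carries an id so that a redelivered message is applied only once. `QueuedOrderStore` and everything inside it are hypothetical. The sketch also applies the inventory update that the slide leaves commented out, which is an assumption.

```python
from collections import deque


class QueuedOrderStore:
    """Toy store where writes are queued first and applied later."""

    def __init__(self, inventory):
        self.inventory = dict(inventory)
        self.orders = {}
        self.order_items = []
        self.queue = deque()
        self.applied_ids = set()  # ids of messages already applied

    def place_order(self, oid, customer, items):
        """Enqueue the statements and acknowledge before they are applied."""
        self.queue.append((f"order:{oid}", "order", (oid, customer)))
        for item_id, quantity in items:
            self.queue.append(
                (f"item:{oid}:{item_id}", "item", (oid, item_id, quantity))
            )
        return "COMMIT success"

    def apply(self, message):
        """Apply one message; a repeated delivery of the same id is a no-op."""
        message_id, kind, payload = message
        if message_id in self.applied_ids:
            return
        if kind == "order":
            oid, customer = payload
            self.orders[oid] = customer
        else:
            oid, item_id, quantity = payload
            self.order_items.append((oid, item_id, quantity))
            self.inventory[item_id] -= quantity
        self.applied_ids.add(message_id)

    def drain(self, deliver_twice=False):
        """Process the queue; optionally simulate at-least-once delivery."""
        while self.queue:
            message = self.queue.popleft()
            self.apply(message)
            if deliver_twice:
                self.apply(message)


store = QueuedOrderStore({"pen": 10})
ack = store.place_order(1, "asha", [("pen", 3)])
stock_before_drain = store.inventory["pen"]  # soft state: not yet updated
store.drain(deliver_twice=True)
stock_after_drain = store.inventory["pen"]   # eventually consistent
```

Between the acknowledgement and the drain, a reader sees the old stock level. That window is the "soft state" that BASE accepts in exchange for availability.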
    15. Alternate Implementations
        • BigTable (Google): CP
        • HBase (Apache): CP
        • Hypertable (community): CP
        • Dynamo (Amazon): AP
        • SimpleDB (Amazon): AP
        • Voldemort (LinkedIn): AP
        • Cassandra (Facebook): AP
        • MemcacheDB (community): CP/AP
    16. Data Models
        • Key/Value Pairs: Dynamo, MemcacheDB, Voldemort
        • Row-Column: BigTable, Cassandra, SimpleDB, Hypertable, HBase
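The difference between the two data models can be sketched with plain Python dictionaries. This is only an in-memory illustration, not any store's actual API: `put`, `get_latest`, and `columns_in_family` are hypothetical helpers, and the timestamped versions follow the BigTable style, which not every row-column store shares.

```python
# Key/value model: the store sees one opaque value per key.
kv_store = {}
kv_store["cart:42"] = b'{"items": ["pen", "ink"]}'

# Row-column model: row key -> "family:qualifier" column -> {timestamp: value}.
table = {}


def put(row_key, column, value, timestamp):
    """Write one cell version."""
    table.setdefault(row_key, {}).setdefault(column, {})[timestamp] = value


def get_latest(row_key, column):
    """Return the newest version of a cell, or None if the cell is absent."""
    versions = table.get(row_key, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


def columns_in_family(row_key, family):
    """List the columns of one row that belong to a column family."""
    return sorted(c for c in table.get(row_key, {}) if c.startswith(family + ":"))


put("com.cnn.www", "anchor:www.c-span.org", "CNN", timestamp=1)
put("com.cnn.www", "anchor:www.c-span.org", "CNN.com", timestamp=2)
put("com.cnn.www", "contents:", "<html>...</html>", timestamp=2)

latest_anchor = get_latest("com.cnn.www", "anchor:www.c-span.org")
anchor_columns = columns_in_family("com.cnn.www", "anchor")
```

With key/value pairs the application must read and rewrite the whole value to change one field. The row-column model lets it address individual cells within a row.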
    17. Programming Models
            // Open the table
            Table *T = OpenOrDie("/bigtable/web/webtable");
            // Write a new anchor and delete an old anchor
            RowMutation r1(T, "com.cnn.www");
            r1.Set("anchor:www.c-span.org", "CNN");
            r1.Delete("anchor:www.abc.com");
            Operation op;
            Apply(&op, &r1);
    18. BigTable: Consistent yet Infinitely Scalable
        • Single Master
        • B+ tree based data distribution
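The idea behind range-based data distribution can be sketched as a lookup over sorted split points: rows are kept in key order, contiguous ranges are assigned to servers, and finding the owner of a row is a search over the range boundaries. The sketch below flattens this to a single level with `bisect`; the end keys and server names are made up, and the real system layers such lookups in a multi-level index, which is what the B+ tree comparison refers to.

```python
import bisect

# Hypothetical split points: each range covers keys up to and including its end key.
# The last server takes every key after the final split point.
range_end_keys = ["com.cnn.www", "com.google.www", "org.apache.www"]
range_servers = ["server-a", "server-b", "server-c", "server-d"]


def locate(row_key):
    """Return the server responsible for the range containing row_key."""
    index = bisect.bisect_left(range_end_keys, row_key)
    return range_servers[index]


owner_cnn = locate("com.cnn.www")
owner_example = locate("com.example.www")
owner_last = locate("org.zzz.www")
```

Because each row lives in exactly one range served by one server at a time, reads and writes to a row have a single authority. That is how this design keeps consistency while still spreading data across machines.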
    19. BigTable: Transactions
        • Entities and Entity Groups
          • Invoice
          • Invoice Item
          • Delivery Note
    20. Dynamo: Highly Available and Infinitely Scalable
        • Consistent Hashing
        • Peer-to-Peer Distributed
        • Gossip-based member discovery
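Consistent hashing, the first item on this slide, can be sketched in a few lines. Nodes and keys are hashed onto the same ring, and a key belongs to the first node point at or after its own hash. The `HashRing` class, the choice of SHA-256, and the 64 virtual points per node are assumptions for illustration and are not taken from Dynamo itself.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        """Place several points for the node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        """Return the node owning key: first point clockwise from the key's hash."""
        # (h,) sorts before any (h, node), so this finds the first point >= h.
        index = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}

ring.remove("node-d")
restored = {k: ring.lookup(k) for k in keys}
```

Adding a node only moves keys onto that new node, and every other key keeps its owner. With naive `hash(key) % node_count` placement, most keys would move whenever the node count changes.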
    21. RDBMS or Other?
        • Nature of Business
        • Maturity of the Product
        • Cost of Adoption
        • Maturity of the alternative data stores
    22. Q&A