Retso hotdep-2011

744 views

Published on

Talk at the 7th Workshop on Hot Topics in Systems Dependability (HotDep 2011)

0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
744
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
16
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Retso hotdep-2011

  1. 1. Lock-free transactional support for large-scale storage systemsFlavio Junqueira, Benjamin Reed, Maysam Yabandeh Yahoo! Research June 2011
  2. 2. Big data• Large data sets ✓ Unstructured, semi-structured data ✓ Critical for business logic• Examples of such data ✓ Web logs, server logs, social media, etc June 2011 2
  3. 3. Big data+160% clicks +43% clicksvs. one-size fits all vs. editor selected Eric Baldeschwieler @IBM Big Data, May 2011 June 2011 3
  4. 4. Big data: Hadoop Eric Baldeschwieler @IBM Big Data, May 2011 June 2011 4
  5. 5. Background • Database generations in batchesInput e.g., Input Hours of Output Hours of Output Hbase,Input DB MapReduce DB MapReduce DB HDFS • Online concurrent updates Input txn Output Require transactional Input txn DB support June 2011 5
  6. 6. Examples• Mutable tables• Various indexes: Web, news, shopping, coupons• User and content models• Characteristics ✓ Concurrency ✓ Losing updates is undesirable ✓ There are concurrent reads and they must be consistent June 2011 6
  7. 7. Semantics• Read only previously committed values Txn w(x,v) r(x) = v w(x,v’) Time June 2011 7
  8. 8. Semantics• No concurrent writes to the same row Txn w(x,v) w(x,v’) Time At least one must abort June 2011 8
  9. 9. Snapshot Isolation• Known in the database realm• Conflicting transactions ✓ Write to the same element (e.g., row) ✓ Time range between start and commit overlap• Efficient implementation by versioning June 2011 9
  10. 10. Locks?• Previous approaches: Lock data to modify ✓ Convoy effect [Percolator, OSDI’10] ✓ Delays of several seconds ✓ Higher overhead on data servers• Our approach ✓ Lock-free, centralized transaction manager ✓ Single point of failure, potential bottleneck? June 2011 10
  11. 11. Transaction Status Oracle• Single process ✓ Processes client inquiries about transactions ✓ Includes a timestamp oracle Keeps state TO TSO about committed rowsDB1 Client DB2 June 2011 11
  12. 12. Transaction Status Oracle• Single process ✓ Processes client inquiries about transactions ✓ Includes a timestamp oracle Keeps state TO TSO about committed rows TsDB1 Client DB2 June 2011 11
  13. 13. Transaction Status Oracle• Single process ✓ Processes client inquiries about transactions ✓ Includes a timestamp oracle Keeps state TO TSO about committed rows Ts w(r2, v2, Ts(txnr)) r(r1)DB1 ACK Client v1, Ts(txnw), ⊥ DB2 Ts(txnw) < Ts(txnr) June 2011 11
  14. 14. Transaction Status Oracle• Single process ✓ Processes client inquiries about transactions ✓ Includes a timestamp oracle Keeps state TO TSO about committed rows Ts Tc(txnw) < Ts(txnr)? w(r2, v2, Ts(txnr)) r(r1)DB1 ACK Client v1, Ts(txnw), ⊥ DB2 Ts(txnw) < Ts(txnr) June 2011 11
  15. 15. Transaction Status Oracle• Single process ✓ Processes client inquiries about transactions ✓ Includes a timestamp oracle Keeps state TO TSO about committed rows Ts Commit r2 Tc(txnw) < Ts(txnr)? w(r2, v2, Ts(txnr)) r(r1)DB1 ACK Client v1, Ts(txnw), ⊥ DB2 Ts(txnw) < Ts(txnr) June 2011 11
  16. 16. Transaction Status Oracle• Single process ✓ Processes client inquiries about transactions ✓ Includes a timestamp oracle Keeps state TO TSO about committed rows Ts Commit r2 Tc(txnw) < Ts(txnr)? w(r2, v2, Ts(txnr)) r(r1)DB1 ACK Client v1, Ts(txnw), ⊥ DB2 Ts(txnw) < Ts(txnr) Cleanup(r2, txnr) June 2011 11
  17. 17. ReTSO: Design choices• TSO ✓ Keeps state of modified rows• In-memory state ✓ Highest commit timestamp of all garbage-collected rows• Auto-GC Hash map ✓ Lazy garbage-collection ✓ Upon a hit June 2011 12
  18. 18. ReTSO: Increasing dependability • Remote write-ahead log Inquiries Backup ReTSO ReTSO Warm or cold Updates Writes to WALare synchronous but do WAL e.g., NFS, BookKeeper not block other txns [http://zookeeper.apache.org/bookkeeper] June 2011 13
  19. 19. Preliminary results• Coded in Java ✓ Except for hash map (C++ with JNI interface)• Uses BookKeeper for WAL• 10 identical servers ✓ 2.13 Dual Core Intel Xeon ✓ 4GB of RAM ✓ 1 Gigabit interfaces June 2011 14
  20. 20. Preliminary results• Average throughput observed ✓ 3 clients, 1,000 concurrent transactions ✓ 81k TPS• Average latency ✓ 1 client, 1 txn ✓ 0.87 ms (with WAL) ✓ 0.17 ms (without WAL) June 2011 15
  21. 21. Preliminary results• Increasing the load of the system ✓ 1 to 16 clients 18 ReTSO 16 WAL-disabled ✓ Max is 72k TPS 14 12 Latency in ms 10 8 6 4 2 0 20000 40000 60000 80000 100000 120000 Throughput in TPS June 2011 16
  22. 22. What’s baking?• Integration ✓ HBase ✓ Query engine• Real workloads June 2011 17
  23. 23. Summary• Transaction management for large-scale data repositories• Lock-based vs. Lock-free ✓ ReTSO is lock-free and dependable ✓ Reduced load on storage nodes ✓ Low latency despite faults• Performance sufficient for realistic applications June 2011 18

×