Hw09: Low Latency, Random Reads from HDFS

1. RadFS: Random Access DFS
2. DFS in a slide
   - Locates the DataNode, opens a socket, says hi
   - DataNode allocates a thread to stream block contents from the opened position
   - Client reads the data in as it processes
   - Great for streaming entire blocks
   - Positioned reads cut across the grain
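
A minimal sketch of the two read patterns this slide contrasts, using the stock Hadoop FileSystem API; the path, offset, and buffer size are illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: streaming read vs. positioned read against stock DFS.
    public class ReadPatterns {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataInputStream in = fs.open(new Path("/data/big.seq"));  // illustrative path
            byte[] buf = new byte[2048];

            // Streaming read: the DataNode dedicates a thread to pump block
            // contents from the current position -- great for whole-block scans.
            in.read(buf, 0, buf.length);

            // Positioned read ("pread"): an arbitrary-offset read. Stock DFS sets
            // up fresh block-reader state per call -- the cost RadFS targets.
            in.read(1000000L, buf, 0, buf.length);

            in.close();
        }
    }
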
3. A Different Approach
   - Everything is a positioned read
   - All interactions with the DataNode are stateless
   - Wrapped ByteService interfaces
     - Caching, Network, Checksum
     - Each configurable, and can be optimized for the task
   - Connections pooled and re-used often
   - Fewer threads on the server
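
The patch's actual interfaces live in HDFS-516; the sketch below is a hypothetical reconstruction of the wrapped-ByteService idea, and every name in it is illustrative.

    import java.io.IOException;

    // Hypothetical reconstruction of the layered ByteService design.
    interface ByteService {
        /** Stateless positioned read: fill buf[off..off+len) from the given file offset. */
        int read(long position, byte[] buf, int off, int len) throws IOException;
    }

    // Each concern wraps the next, so a client-side stack might be assembled as:
    //   ByteService net     = new NetworkByteService(blockLocations, socketPool);
    //   ByteService checked = new ChecksumByteService(net);
    //   ByteService cached  = new CachingByteService(checked, cacheBytes);
    // Every layer is independently configurable and swappable for the task.
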
4. DFS Anatomy – seek + read()
5. DFS seek+read, cont'd
   - Locates the preferred block, caches DataNode locations
   - Opens a new Socket and BlockReader for each random read
   - Reads from the Socket
   - Object creation means GC debt
   - No optimizations for repeated reads of the same data
   - Threads consume server resources after client hangup – files aren't automatically closed
6. RadFS Overview
7. RadFS seek+read, cont'd
   - Transparently caches frequently read data
   - Automatically pools/manages file handles
   - Reduces network congestion (in theory)
   - Lower DataNode workload
     - 3 threads total instead of 1 per Xceiver
   - Configurable on the client side for the task at hand
   - Network-latency penalty on long-running reads
   - Checksum implementation means 2 reads per random read if caching is disabled
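
A toy illustration of the transparent read cache mentioned above: an LRU map from page index to page bytes. The patch's CachingByteService certainly differs in detail; the page size and capacity here are assumptions.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy LRU page cache in the spirit of CachingByteService.
    class PageCache {
        static final int PAGE_SIZE = 64 * 1024;   // assumed page granularity

        private final Map<Long, byte[]> pages;    // page index -> page bytes

        PageCache(final int maxPages) {
            // An access-ordered LinkedHashMap gives LRU eviction almost for free.
            this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                    return size() > maxPages;
                }
            };
        }

        synchronized byte[] get(long pageIndex) { return pages.get(pageIndex); }
        synchronized void put(long pageIndex, byte[] page) { pages.put(pageIndex, page); }
    }
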
8. Implementation Notes
   - Checksums are currently generated by wrapping ChecksumFileSystem around RadFileSystem
     - Inefficient: reads 2 files over DFS
     - Improper: what if the checksum block is corrupt?
   - CachingByteService implements lookahead (good) by copying bytes twice (bad)
   - Permissions happen “by accident” at the NameNode
     - Attackable by searching the block-ID space on DataNodes
     - Could exchange a UserAccessToken on each request
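
For reference, the wrapping criticized above looks roughly like this. ChecksumFileSystem is abstract in Hadoop, so a thin subclass is assumed here; RadFileSystem is the class from the patch.

    import org.apache.hadoop.fs.ChecksumFileSystem;
    import org.apache.hadoop.fs.FileSystem;

    // Rough shape of the current checksum arrangement: Hadoop's generic
    // ChecksumFileSystem wrapped around RadFileSystem, which is why every
    // checked random read costs two DFS reads (data file + .crc file).
    class RadChecksumFileSystem extends ChecksumFileSystem {
        RadChecksumFileSystem(FileSystem radFs) {  // radFs: a RadFileSystem instance
            super(radFs);
        }
    }
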
9. Benchmark Environment
   - EC2 “Medium” – 2 × 2 GHz cores, 1.7 GB RAM, shared I/O
   - Operations against a 20 GB SequenceFile
   - All tests run single-threaded from the lightly loaded NameNode
   - Fast internal network; adequate memory, but not enough to page in the entire file
   - All benchmarks in a given set were run on the same instance; middle value of 3 runs reported
10. Random Reads – 2 KB
   - 10,000 random reads of 2 KB each over the length of a 20 GB file
   - DFS averaged 7.8 ms per read, while RadFS with no cache averaged 4.4 ms
   - Caching added a full 2 ms – the hardcoded lookahead was no help, and there was lots of unnecessary byte copying
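
The benchmark's shape, reconstructed from the description above; the file path, RNG seed, and timing details are assumptions.

    import java.util.Random;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // 10,000 positioned reads of 2 KB at uniformly random offsets.
    public class RandomReadBench {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/bench/20gb.seq");        // illustrative path
            long fileLen = fs.getFileStatus(file).getLen();
            byte[] buf = new byte[2048];
            Random rnd = new Random(42);

            FSDataInputStream in = fs.open(file);
            long start = System.currentTimeMillis();
            for (int i = 0; i < 10000; i++) {
                long pos = (long) (rnd.nextDouble() * (fileLen - buf.length));
                in.readFully(pos, buf);                     // positioned read
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("avg ms/read: " + (elapsed / 10000.0));
            in.close();
        }
    }
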
11. Random Reads – 2 KB (avg in ms)
12. SequenceFile Search
   - Binary search over a 10 GB SequenceFile
   - DFS vs. RadFS with various cache settings
   - Indicative of potential filesystem uses:
     - Lucene
     - Large numbers of secondary indices
     - Ease of development
     - Read-only RDBMS-like systems built from ETLs or other long-running processes
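
One plausible shape for the search, using SequenceFile.Reader.sync() to land on a record boundary after each probe; the Text key/value types and the final-window scan are assumptions, not the benchmark's actual code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Binary search over a SequenceFile sorted by key.
    public class SeqFileSearch {
        static boolean contains(FileSystem fs, Path file, Text target, Configuration conf)
                throws Exception {
            long lo = 0, hi = fs.getFileStatus(file).getLen();
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
            try {
                Text key = new Text(), val = new Text();
                // Halve the byte range until it is about one sync interval wide.
                while (hi - lo > SequenceFile.SYNC_INTERVAL) {
                    long mid = lo + (hi - lo) / 2;
                    reader.sync(mid);                   // next sync marker after mid
                    if (!reader.next(key, val)) break;  // probed past the last record
                    int cmp = key.compareTo(target);
                    if (cmp == 0) return true;
                    if (cmp < 0) lo = mid; else hi = mid;
                }
                // Linear scan of the small remaining window.
                reader.sync(lo);
                while (reader.next(key, val)) {
                    int cmp = key.compareTo(target);
                    if (cmp == 0) return true;
                    if (cmp > 0) break;                 // keys are sorted; overshot
                }
                return false;
            } finally {
                reader.close();
            }
        }
    }
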
13. SequenceFile Binary Search – 5,000 searches, avg ms per search
14. Streaming
   - DFS is inherently faster for streaming due to its dedicated server thread
   - Checksumming is expensive!
   - Early RadFS builds beat DFS at 1-byte read()s because they didn't checksum
   - Streaming jobs will require a PipeliningByteService that makes requests to the DataNode, then streams in and checksums in a separate client-side thread
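
A hypothetical sketch of that PipeliningByteService: a daemon thread prefetches pages through the underlying ByteService (as sketched earlier) into a bounded queue while the consumer drains them in order. All names and sizes are illustrative.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Hypothetical prefetching pipeline for streaming reads.
    class PipeliningByteService {
        private static final int PAGE = 64 * 1024;  // illustrative prefetch size
        private final BlockingQueue<byte[]> pages = new ArrayBlockingQueue<byte[]>(4);

        PipeliningByteService(final ByteService source, final long startPos, final long endPos) {
            Thread prefetcher = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (long pos = startPos; pos < endPos; pos += PAGE) {
                            int len = (int) Math.min(PAGE, endPos - pos);
                            byte[] page = new byte[len];
                            source.read(pos, page, 0, len); // sketch: assumes a full read;
                                                            // checksumming happens here too
                            pages.put(page);                // blocks when the queue is full
                        }
                    } catch (Exception e) {
                        // A real implementation would hand the error to the consumer,
                        // e.g. via a poison-pill entry.
                    }
                }
            });
            prefetcher.setDaemon(true);
            prefetcher.start();
        }

        /** Returns the next prefetched page in order; blocks until one is ready. */
        byte[] nextPage() throws InterruptedException { return pages.take(); }
    }
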
15. Streaming – 1 GB in 1-byte reads, time in seconds
16. Streaming – 1 GB in 2 KB reads, time in seconds
17. Going forward – modular reader
18. Going forward – Applications
   - Could improve HBase: solves the file-handle problem and improves latency
   - Could be used to create low-latency lookup formats accessible from scripting languages
     - The cache is automatic, simplifying development
     - A “table” directory with a main store file and several secondary index files generated by ETL
     - Lucene indices? Can be built with MapReduce
19. Going forward – Development
   - Copy the existing HDFS method of interleaving checksums directly from the DataNode – one read
     - Audit the checksumming code for CPU efficiency – reading can be CPU-bound
     - Implement it as a ByteService instead of a clumsy wrapper around FileSystem; make it configurable
   - Implement PipeliningByteService to improve streaming by pre-fetching pages
   - Exchange a UserAccessToken at each read; could possibly use it to encrypt the block ID
20. Contribute!
   - Patch is at Apache JIRA issue HDFS-516
   - Will be on GitHub momentarily
   - Goals:
     - Streaming performance equivalent to DFS
     - Faster random reads, with a caching option
     - Lower resource consumption on the server
   - 3 doable tasks above
   - Large configuration space to explore
   - Email me: [email_address]