HW09: Low-Latency Random Reads from HDFS


  1. RadFS – Random Access DFS
  2. DFS in a slide
     - Locates DataNode, opens socket, says hi
     - DataNode allocates a thread to stream block contents from the opened position
     - Client reads in as it processes
     - Great for streaming entire blocks
     - Positioned reads cut across the grain
  3. A Different Approach
     - Everything is a positioned read
     - All interactions with the DataNode are stateless
     - Wrapped ByteService interfaces (see the sketch below)
       - Caching, Network, Checksum
       - Each configurable, can be optimized for the task
     - Connections pooled and re-used often
     - Fewer threads on the server
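
A minimal sketch of what layered ByteService interfaces could look like, assuming a simple stateless positioned-read contract. The interface, class names, and the page-granular cache are illustrative, not the actual HDFS-516 code:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Positioned-read contract: every layer exposes the same stateless read. */
interface ByteService {
  int read(long position, byte[] buf, int offset, int len) throws IOException;
  long length() throws IOException;
}

/** Stand-in for the network layer (real code would issue a pooled DataNode request). */
class InMemoryByteService implements ByteService {
  private final byte[] data;
  InMemoryByteService(byte[] data) { this.data = data; }
  public long length() { return data.length; }
  public int read(long position, byte[] buf, int offset, int len) {
    if (position >= data.length) return -1;
    int n = (int) Math.min(len, data.length - position);
    System.arraycopy(data, (int) position, buf, offset, n);
    return n;
  }
}

/** Caching decorator: repeated reads of a hot page are served from client memory. */
class CachingByteService implements ByteService {
  private static final int PAGE = 4096;
  private final ByteService delegate;
  private final Map<Long, byte[]> pages = new HashMap<Long, byte[]>();

  CachingByteService(ByteService delegate) { this.delegate = delegate; }
  public long length() throws IOException { return delegate.length(); }

  public int read(long position, byte[] buf, int offset, int len) throws IOException {
    long pageNo = position / PAGE;
    byte[] page = pages.get(pageNo);
    if (page == null) {                           // miss: pull a whole page from below
      page = new byte[PAGE];
      if (delegate.read(pageNo * PAGE, page, 0, PAGE) <= 0) return -1;
      pages.put(pageNo, page);
    }
    int inPage = (int) (position - pageNo * PAGE);
    int n = Math.min(len, PAGE - inPage);
    System.arraycopy(page, inPage, buf, offset, n);
    return n;                                      // may be a short read at a page boundary
  }
}
```

Because each layer only sees read(position, ...), a client can stack, drop, or tune layers (cache size, lookahead, checksumming) per workload without touching the network code.
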
  4. DFS Anatomy – seek + read()
  5. DFS seek + read, cont'd
     - Locates the preferred block, caches DataNode locations
     - Opens a new Socket and BlockReader for each random read
     - Reads from the Socket
     - Object creation means GC debt
     - No optimizations for repeated reads of the same data
     - Threads consume resources on the server after client hangup – files aren't automatically closed
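
For context, this is roughly what a positioned read looks like from the client side of the stock DFS API; each such call walks the path above (block lookup, new socket/BlockReader, read, teardown). The path and offset are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/data/20gb.seq");        // illustrative path
    byte[] buf = new byte[2048];
    FSDataInputStream in = fs.open(path);
    try {
      // Positioned read: does not move the stream's own file pointer,
      // but DFSClient still sets up a fresh BlockReader/socket to serve it.
      int n = in.read(1234567890L, buf, 0, buf.length);
      System.out.println("read " + n + " bytes");
    } finally {
      in.close();
    }
  }
}
```
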
  6. RadFS Overview
  7. RadFS seek + read, cont'd
     - Transparently caches frequently read data
     - Automatically pools/manages file handles
     - Reduces network congestion (in theory)
     - Lower DataNode workload
       - 3 threads total instead of 1 per Xceiver
     - Configurable on the client side for the task at hand
     - Network latency penalty on long-running reads
     - Checksum implementation means 2 reads per random read if caching is disabled
  8. Implementation Notes
     - Checksum is currently generated by wrapping ChecksumFileSystem around RadFileSystem
       - Inefficient: reads 2 files over DFS
       - Improper: what if the checksum block is corrupt?
     - CachingByteService implements lookahead (good) by copying bytes twice (bad)
     - Permissions happen “by accident” at the NameNode
       - Attackable by searching the block-id space on DataNodes
       - Could exchange a UserAccessToken on each request
  9. Benchmark Environment
     - EC2 “Medium” – 2 × 2 GHz, 1.7 GB RAM, shared I/O
     - Operations against a 20 GB sequence file
     - All tests run single-threaded from the lightly loaded NameNode
     - Fast internal network; adequate memory, but not enough to page the entire file
     - All benchmarks in a given set were run on the same instance; middle value from 3 runs
  10. Random Reads – 2 KB
      - 10,000 random reads of 2 KB each over the length of a 20 GB file
      - DFS averaged 7.8 ms, while RadFS with no cache averaged 4.4 ms
      - Caching added a full 2 ms – the hard-coded lookahead was no help and caused a lot of unnecessary byte copying
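
The shape of this benchmark, sketched against the standard FileSystem API (this is not the actual harness from the talk); the file path, seed, and read count are illustrative:

```java
import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RandomReadBench {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path(args.length > 0 ? args[0] : "/bench/20gb.seq");
    long fileLen = fs.getFileStatus(path).getLen();
    byte[] buf = new byte[2048];                   // 2 KB per read
    Random rnd = new Random(42);                   // fixed seed for repeatability
    FSDataInputStream in = fs.open(path);
    try {
      int reads = 10000;
      long start = System.nanoTime();
      for (int i = 0; i < reads; i++) {
        long pos = (long) (rnd.nextDouble() * (fileLen - buf.length));
        in.readFully(pos, buf, 0, buf.length);     // positioned read, fully filled
      }
      double avgMs = (System.nanoTime() - start) / 1e6 / reads;
      System.out.printf("avg latency: %.2f ms per 2 KB read%n", avgMs);
    } finally {
      in.close();
    }
  }
}
```
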
  11. Random Reads – 2 KB (avg in ms)
  12. SequenceFile Search
      - Binary search over a 10 GB sequence file (access pattern sketched below)
      - DFS vs. RadFS with various cache settings
      - Indicative of potential filesystem uses
        - Lucene
        - Large numbers of secondary indices
        - Ease of development
        - Read-only RDBMS-like systems built from ETLs or other long-running processes
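
To make the access pattern concrete, here is a simplified binary search over a sorted file of fixed-width records using only positioned reads. Real SequenceFiles hold variable-length records and would need an index (as a MapFile does), so the key and record sizes here are purely hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

public class FixedRecordSearch {
  static final int KEY_LEN = 16;      // bytes of key at the front of each record (assumed)
  static final int RECORD_LEN = 128;  // fixed record width (assumed)

  /** Returns the index of the record whose key equals `key`, or -1; one pread per probe. */
  static long search(FSDataInputStream in, long numRecords, byte[] key) throws IOException {
    byte[] probe = new byte[KEY_LEN];
    long lo = 0, hi = numRecords - 1;
    while (lo <= hi) {
      long mid = (lo + hi) >>> 1;
      in.readFully(mid * RECORD_LEN, probe, 0, KEY_LEN);   // scattered positioned read
      int cmp = compare(probe, key);
      if (cmp == 0) return mid;
      if (cmp < 0) lo = mid + 1;
      else hi = mid - 1;
    }
    return -1;
  }

  private static int compare(byte[] a, byte[] b) {
    for (int i = 0; i < KEY_LEN; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) return d;
    }
    return 0;
  }
}
```

Each search costs roughly log2 of the record count in probes (on the order of 25–30 for a file this size at these assumed record sizes), so per-read latency and whether the hot probes near the root of the search hit the cache dominate the numbers on the next slide.
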
  13. Sequence File Binary Search – 5,000 searches, avg ms per search
  14. Streaming
      - DFS is inherently faster for streaming due to the dedicated server thread
      - Checksumming is expensive!
      - Early RadFS builds beat DFS at 1-byte read()s because they didn't have checksumming
      - Requires a PipeliningByteService for streaming jobs: make requests to the DataNode, then stream in and checksum in a separate client-side thread (sketched below)
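
A hedged sketch of what such a PipeliningByteService could look like: while the caller consumes page N, a background thread fetches page N+1, hiding network latency for sequential access. It builds on the illustrative ByteService interface from the earlier sketch and is not the HDFS-516 implementation:

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Prefetching decorator over the illustrative ByteService from the earlier sketch. */
class PipeliningByteService implements ByteService {
  private static final int PAGE = 64 * 1024;
  private final ByteService delegate;
  private final ExecutorService prefetcher = Executors.newSingleThreadExecutor();
  private long nextPageNo = -1;
  private Future<byte[]> nextPage;

  PipeliningByteService(ByteService delegate) { this.delegate = delegate; }

  public long length() throws IOException { return delegate.length(); }

  public int read(long position, byte[] buf, int offset, int len) throws IOException {
    long pageNo = position / PAGE;
    byte[] page;
    try {
      // If the caller kept reading sequentially, this page is already in flight.
      page = (nextPage != null && pageNo == nextPageNo) ? nextPage.get() : fetch(pageNo);
    } catch (Exception e) {
      throw new IOException("prefetch failed", e);
    }
    // Start fetching the following page before copying this one out to the caller.
    final long toFetch = pageNo + 1;
    nextPageNo = toFetch;
    nextPage = prefetcher.submit(new Callable<byte[]>() {
      public byte[] call() throws IOException { return fetch(toFetch); }
    });
    int inPage = (int) (position - pageNo * PAGE);
    int n = Math.min(len, PAGE - inPage);
    System.arraycopy(page, inPage, buf, offset, n);
    return n;
  }

  private byte[] fetch(long pageNo) throws IOException {
    byte[] page = new byte[PAGE];
    delegate.read(pageNo * PAGE, page, 0, PAGE);
    return page;
  }
}
```

A real implementation would also bound memory, handle short reads and EOF, and shut the executor down; the point is only that network waits and checksumming can move off the caller's thread.
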
  15. Streaming – 1 GB, 1-byte reads, time in seconds
  16. Streaming – 1 GB, 2 KB reads, time in seconds
  17. Going forward – modular reader
  18. Going forward – Applications
      - Could improve HBase: solves the file-handle problem and improves latency
      - Could be used to create low-latency lookup formats accessible from scripting languages
        - Cache is automatic, simplifying development
        - “Table” directory with a main store file and several secondary index files generated by ETL
        - Lucene indices? Can be built with MapReduce
  19. Going forward – Development
      - Copy the existing HDFS method of interleaving checksums directly from the DataNode – one read
        - Audit checksumming code for CPU efficiency – reading can be CPU-bound
        - Implement as a ByteService instead of a clumsy wrapper around FileSystem; make it configurable (see the sketch below)
      - Implement a PipeliningByteService to improve streaming by pre-fetching pages
      - Exchange a UserAccessToken at each read; could possibly be used for encryption of the block id
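
One way the checksum layer could look as a ByteService, assuming per-chunk CRC32s are delivered alongside the data in a single request (mirroring how stock HDFS interleaves checksums with block data). The interface, chunk size, and ChecksumSource are illustrative, not the HDFS-516 code:

```java
import java.io.IOException;
import java.util.zip.CRC32;

/** Checksum-verifying decorator over the illustrative ByteService interface. */
class ChecksumByteService implements ByteService {
  private static final int CHUNK = 512;            // bytes covered by one CRC32 (assumed)
  private final ByteService data;                  // lower layer returning raw bytes
  private final ChecksumSource checksums;          // hypothetical per-chunk CRC source

  interface ChecksumSource {
    long crcForChunk(long chunkNo) throws IOException;
  }

  ChecksumByteService(ByteService data, ChecksumSource checksums) {
    this.data = data;
    this.checksums = checksums;
  }

  public long length() throws IOException { return data.length(); }

  public int read(long position, byte[] buf, int offset, int len) throws IOException {
    // Widen the request to chunk boundaries, verify each chunk's CRC,
    // then copy out only the bytes the caller asked for.
    long firstChunk = position / CHUNK;
    long lastChunk = (position + len - 1) / CHUNK;
    int span = (int) ((lastChunk - firstChunk + 1) * CHUNK);
    byte[] raw = new byte[span];
    int got = data.read(firstChunk * CHUNK, raw, 0, span);
    if (got <= 0) return -1;
    CRC32 crc = new CRC32();
    for (long c = firstChunk; c <= lastChunk && (c - firstChunk) * CHUNK < got; c++) {
      int start = (int) ((c - firstChunk) * CHUNK);
      crc.reset();
      crc.update(raw, start, Math.min(CHUNK, got - start));
      if (crc.getValue() != checksums.crcForChunk(c)) {
        throw new IOException("checksum mismatch in chunk " + c);
      }
    }
    int inRaw = (int) (position - firstChunk * CHUNK);
    int n = Math.min(len, got - inRaw);
    if (n <= 0) return -1;
    System.arraycopy(raw, inRaw, buf, offset, n);
    return n;
  }
}
```
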
  20. Contribute!
      - Patch is at Apache JIRA issue HDFS-516
      - Will be on GitHub momentarily
      - Goals:
        - Equivalent streaming performance to DFS
        - Faster random reads, with a caching option
        - Lower resource consumption on the server
      - 3 doable tasks above
      - Large configuration space to explore
      - Email me: [email_address]
