
Hw09 Low Latency, Random Reads From HDFS



Published in: Technology


  • 1. RadFS Random Access DFS
  • 2. DFS in a slide
    • Locates DataNode, opens socket, says hi
    • DataNode allocates a thread to stream block contents from opened position
    • Client reads in as it processes
    • Great for streaming entire blocks
    • Positioned reads cut across the grain
  • 3. A Different Approach
    • Everything is a positioned read
    • All interactions with DN are stateless
    • Wrapped ByteService interfaces
      • Caching, Network, Checksum
      • Each configurable, can be optimized for task
    • Connections pooled and re-used often
    • Fewer threads on server
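The layered, stateless design on this slide can be sketched as a small decorator chain. This is only an illustration under assumptions: the `read(pos, dst, off, len)` signature, `ArrayByteService` (an in-memory stand-in for the network layer), the page size, and the LRU policy are all hypothetical and are not taken from the RadFS patch. The sketch also assumes the wrapped service returns a whole page in one call.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stateless contract: every call carries its own absolute offset. */
interface ByteService {
    /** Copies up to len bytes starting at pos into dst; returns bytes copied, or -1 at end of data. */
    int read(long pos, byte[] dst, int off, int len);
}

/** Stand-in for the network layer: serves bytes from memory and counts calls. */
class ArrayByteService implements ByteService {
    private final byte[] data;
    int calls = 0;

    ArrayByteService(byte[] data) { this.data = data; }

    public int read(long pos, byte[] dst, int off, int len) {
        calls++;
        if (pos >= data.length) return -1;
        int n = (int) Math.min(len, data.length - pos);
        System.arraycopy(data, (int) pos, dst, off, n);
        return n;
    }
}

/** Wraps any ByteService and keeps recently used fixed-size pages in an LRU map. */
class CachingByteService implements ByteService {
    private final ByteService inner;
    private final int pageSize;
    private final Map<Long, byte[]> pages;

    CachingByteService(ByteService inner, int pageSize, final int maxPages) {
        this.inner = inner;
        this.pageSize = pageSize;
        // Access-ordered map that evicts the least recently used page.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    public int read(long pos, byte[] dst, int off, int len) {
        int copied = 0;
        while (copied < len) {
            long p = pos + copied;
            long pageNo = p / pageSize;
            byte[] page = pages.get(pageNo);
            if (page == null) {
                // Cache miss: fetch the whole page from the wrapped service.
                byte[] buf = new byte[pageSize];
                int n = inner.read(pageNo * pageSize, buf, 0, pageSize);
                if (n <= 0) break;
                page = Arrays.copyOf(buf, n);
                pages.put(pageNo, page);
            }
            int inPage = (int) (p % pageSize);
            if (inPage >= page.length) break; // short final page: end of data
            int n = Math.min(len - copied, page.length - inPage);
            System.arraycopy(page, inPage, dst, off + copied, n);
            copied += n;
        }
        return (copied == 0 && len > 0) ? -1 : copied;
    }
}
```

Because each layer exposes the same interface, a client could stack or omit layers per task, for example `new CachingByteService(network, 4096, 256)` for index lookups and the bare network service for one-off reads.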
  • 4. DFS Anatomy – seek + read()
  • 5. DFS seek+read, cont'd
    • Locates preferred block, caches DN locations
    • Opens new Socket and BlockReader for each random read
    • Reads from the Socket
    • Object creation means GC debt
    • No optimizations for repeated reads of the same data
    • Threads will consume resources on server after client hangup – files aren't automatically closed
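The difference between a stateful seek followed by a read and a stateless positioned read can be shown with a local-file analogy. This sketch uses only the Java standard library and a temporary file; it is not HDFS code, and the class and method names are hypothetical. Hadoop's `FSDataInputStream` exposes the same two styles (`seek` plus `read`, and a positioned `read(long, byte[], int, int)`), but nothing below calls Hadoop.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class PositionedReadDemo {
    /** Stateful style: move the shared file pointer, then read from it. */
    static byte[] seekThenRead(RandomAccessFile file, long pos, int len) throws IOException {
        byte[] buf = new byte[len];
        file.seek(pos);
        file.readFully(buf);
        return buf;
    }

    /** Stateless style: the offset travels with the request; the channel position is untouched. */
    static byte[] positionedRead(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + buf.position());
            if (n < 0) throw new EOFException("read past end of file");
        }
        return buf.array();
    }

    /** Runs both styles on a temp file; true if the bytes match and only the stateful handle moved. */
    static boolean stylesAgree(long pos, int len) {
        try {
            Path tmp = Files.createTempFile("positioned-read-sketch", ".bin");
            try {
                byte[] data = new byte[8192];
                for (int i = 0; i < data.length; i++) data[i] = (byte) (i * 31);
                Files.write(tmp, data);
                try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
                     FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    byte[] a = seekThenRead(raf, pos, len);
                    byte[] b = positionedRead(ch, pos, len);
                    return Arrays.equals(a, b)
                            && ch.position() == 0
                            && raf.getFilePointer() == pos + len;
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the positioned call leaves no per-handle state behind, one open handle can be shared and pooled across many readers, which is the property the stateless design relies on.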
  • 6. RadFS Overview
  • 7. RadFS seek+read, cont'd
    • Transparently caches frequently read data
    • Automatically pools/manages file handles
    • Reduces network congestion (in theory)
    • Lower DataNode workload
      • 3 threads total instead of 1 per Xceiver
    • Configurable on client side for the task at hand
    • Network latency penalty on long-running reads
    • Checksum implementation means 2 reads per random read if caching is disabled
  • 8. Implementation Notes
    • Checksum is currently generated by wrapping ChecksumFileSystem around RadFileSystem
      • Inefficient, reads 2 files over DFS
      • Improper – what if checksum block is corrupt?
    • CachingByteService implements lookahead (good) by copying bytes twice (bad)
    • Permissions happen “by accident” at namenode
      • Attackable by searching blockid space on DNs
      • Could exchange UserAccessToken on request
  • 9. Benchmark Environment
    • EC2 “Medium” - 2x2GHz, 1.7GB, shared I/O
    • Operations against 20GB sequence file
    • All tests run single-threaded from the lightly loaded namenode
    • Fast internal network; adequate memory, but not enough to page in the entire file
    • All benchmarks in a given set were run on the same instance; each figure is the middle value from 3 runs
  • 10. Random Reads - 2k
    • 10,000 random reads of 2k each over the length of a 20GB file
    • DFS averaged 7.8ms while Rad with no cache averaged 4.4ms
    • Caching added a full 2ms: the hardcoded lookahead was no help, and there was a lot of unnecessary byte copying
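A harness for this kind of measurement might look like the sketch below. It is an assumption about method, not the code behind the numbers on this slide: `PositionedReader` is a hypothetical minimal contract that any client under test would be adapted to, and the seed parameter is there only to make runs repeatable.

```java
import java.util.Arrays;
import java.util.Random;

class RandomReadBench {
    /** Hypothetical minimal reader contract so the harness can wrap any client. */
    interface PositionedReader {
        int read(long pos, byte[] dst, int off, int len);
    }

    /** Average milliseconds per read over `reads` uniformly random offsets. */
    static double avgMillisPerRead(PositionedReader reader, long fileLen,
                                   int readSize, int reads, long seed) {
        Random rnd = new Random(seed);
        byte[] buf = new byte[readSize];
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            // Offset chosen so the whole read fits inside the file.
            long pos = (long) (rnd.nextDouble() * (fileLen - readSize));
            reader.read(pos, buf, 0, readSize);
        }
        return (System.nanoTime() - start) / 1e6 / reads;
    }

    /** Middle value of three runs, matching the methodology on the environment slide. */
    static double medianOfThree(double a, double b, double c) {
        double[] v = {a, b, c};
        Arrays.sort(v);
        return v[1];
    }
}
```

For the test described here, the call would be roughly `avgMillisPerRead(reader, 20L << 30, 2048, 10_000, seed)`, repeated three times and reduced with `medianOfThree`.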
  • 11. Random Reads – 2k (avg in ms)
  • 12. SequenceFile Search
    • Binary search over 10GB sequence file
    • DFS, RadFS with various cache settings
    • Indicative of potential filesystem uses
      • Lucene
      • Large numbers of secondary indices
      • Ease of development
      • Read-only RDBMS-like systems built from ETLs or other long-running process
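To show why cheap random reads matter for this workload, here is a minimal binary search in which every probe is one positioned read. It assumes a hypothetical layout of sorted, fixed-width 16-byte records (8-byte key, 8-byte value). A real SequenceFile has variable-length records, so the actual benchmark must have located record boundaries some other way that the slides do not describe.

```java
import java.nio.ByteBuffer;

class RecordSearch {
    /** Hypothetical minimal reader contract: one stateless positioned read per call. */
    interface PositionedReader {
        int read(long pos, byte[] dst, int off, int len);
    }

    /** Assumed layout: big-endian 8-byte key followed by 8-byte value, sorted by key. */
    static final int RECORD_SIZE = 16;

    /** Returns the value stored with key, or null if the key is absent. */
    static Long find(PositionedReader reader, long recordCount, long key) {
        byte[] rec = new byte[RECORD_SIZE];
        long lo = 0;
        long hi = recordCount - 1;
        while (lo <= hi) {
            long mid = lo + (hi - lo) / 2;
            // Each probe lands somewhere unrelated to the last one: a pure random read.
            reader.read(mid * RECORD_SIZE, rec, 0, RECORD_SIZE);
            ByteBuffer b = ByteBuffer.wrap(rec);
            long k = b.getLong();
            if (k == key) return b.getLong();
            if (k < key) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }
}
```

The first few probes of every search hit the same offsets (the middle of the file, then the quarter points), which is one reason a client-side cache can help this access pattern more than it helps uniformly random reads.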
  • 13. Sequence File Binary Search: 5,000 searches, avg ms per search
  • 14. Streaming
    • DFS is inherently faster for streaming due to the dedicated server thread
    • Checksumming is expensive!
    • Early RadFS builds beat DFS at 1-byte read()s because they didn't have checksumming
    • Streaming jobs require a PipeliningByteService that makes requests to the DataNode, then streams in and checksums the data in a separate client-side thread
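The PipeliningByteService is described only as a requirement, so the following is a guess at its core idea rather than the planned design: while the caller processes page N, a background thread is already fetching page N+1. `PrefetchingReader` and `PageSource` are hypothetical names, and the sketch leaves out checksumming entirely.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PrefetchingReader implements AutoCloseable {
    /** Hypothetical source of pages; returns null once pageNo is past the end of the data. */
    interface PageSource {
        byte[] fetchPage(long pageNo);
    }

    private final PageSource source;
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private long nextPageNo = 0;
    private Future<byte[]> pending;

    PrefetchingReader(PageSource source) {
        this.source = source;
        this.pending = submit(0); // start fetching the first page immediately
    }

    private Future<byte[]> submit(final long pageNo) {
        Callable<byte[]> task = () -> source.fetchPage(pageNo);
        return pool.submit(task);
    }

    /** Returns the next page in order, or null at end of data, and kicks off the fetch after it. */
    byte[] nextPage() {
        try {
            byte[] page = pending.get();
            if (page != null) {
                nextPageNo++;
                pending = submit(nextPageNo);
            }
            return page;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

With one page in flight, fetch latency overlaps with client-side processing. Whether that closes the streaming gap with DFS would depend on page size and on where checksumming runs, neither of which this sketch addresses.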
  • 15. Streaming – 1GB, 1-byte reads, time in seconds
  • 16. Streaming – 1GB, 2k reads, time in seconds
  • 17. Going forward – modular reader
  • 18. Going forward - Applications
    • Could improve HBase: solves the file handle problem and improves latency
    • Could be used to create low-latency lookup formats accessible from scripting languages
      • Cache is automatic, simplifying development
      • “Table” directory with main store file and several secondary index files generated by ETL
      • Lucene indices? Can be built with MapReduce
  • 19. Going forward - Development
    • Copy existing HDFS method of interleaving checksums directly from datanode – one read
      • Audit checksumming code for CPU efficiency – reading can be CPU bound
      • Implement as a ByteService instead of a clumsy wrapper around FileSystem, and make it configurable
    • Implement PipeliningByteService to improve streaming by pre-fetching pages
    • Exchange UserAccessToken at each read, could possibly use for encryption of blockid
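The "one read" idea, where checksums travel interleaved with the data instead of living in a second file, can be illustrated with a toy framing. The layout below, a 4-byte CRC32 ahead of each chunk, is an assumption made for illustration. It is not the HDFS wire format, and `InterleavedChecksum` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class InterleavedChecksum {
    /** Encodes data as repeated [4-byte CRC32][chunk] frames (hypothetical layout). */
    static byte[] encode(byte[] data, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int p = 0; p < data.length; p += chunkSize) {
            int n = Math.min(chunkSize, data.length - p);
            CRC32 crc = new CRC32();
            crc.update(data, p, n);
            out.write(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array(), 0, 4);
            out.write(data, p, n);
        }
        return out.toByteArray();
    }

    /** Decodes frames in a single pass, throwing if any chunk fails verification. */
    static byte[] decode(byte[] framed, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer in = ByteBuffer.wrap(framed);
        while (in.hasRemaining()) {
            int expected = in.getInt();
            int n = Math.min(chunkSize, in.remaining());
            byte[] chunk = new byte[n];
            in.get(chunk);
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, n);
            if ((int) crc.getValue() != expected) {
                throw new IllegalStateException(
                        "checksum mismatch in chunk ending at framed offset " + in.position());
            }
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

A single response carries both payload and verification data, so a random read costs one round trip instead of two. The per-chunk CRC work is also where the CPU-efficiency audit mentioned on this slide would apply.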
  • 20. Contribute!
    • Patch is at Apache JIRA issue HDFS-516
    • Will be on GitHub momentarily
    • Goals:
      • Equivalent streaming performance to DFS
      • Faster random read, caching option
      • Lower resource consumption on server
    • 3 doable tasks above
    • Large configuration space to explore
    • Email me: [email_address]