Hw09   Low Latency, Random Reads From Hdfs

Like this? Share it with your network


Hw09 Low Latency, Random Reads From Hdfs






Total Views
Views on SlideShare
Embed Views



1 Embed 15

http://www.slideshare.net 15



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

Hw09 Low Latency, Random Reads From Hdfs Presentation Transcript

  • 1. RadFS Random Access DFS
  • 2. DFS in a slide
    • Locates DataNode, opens socket, says hi
    • DataNode allocates a thread to stream block contents from opened position
    • Client reads in as it processes
    • Great for streaming entire blocks
    • Positioned reads cut across the grain
  • 3. A Different Approach
    • Everything is a positioned read
    • All interactions with DN are stateless
    • Wrapped ByteService interfaces
      • Caching, Network, Checksum
      • Each configurable, can be optimized for task
    • Connections pooled and re-used often
    • Fewer threads on server
  • 4. DFS Anatomy – seek + read()
  • 5. DFS seek+read con't
    • Locates preferred block, caches DN locations
    • Opens new Socket and BlockReader for each random read
    • Reads from the Socket
    • Object creation means GC debt
    • No optimizations for repeated reads of the same data
    • Threads will consume resources on server after client hangup – files aren't automatically closed
  • 6. RadFS Overview
  • 7. RadFS seek+read, con't
    • Transparently caches frequently read data
    • Automatically pools/manages file handles
    • Reduces network congestion (in theory)
    • Lower DataNode workload
      • 3 threads total instead of 1 per Xceiver
    • Configurable on client side for the task at hand
    • Network latency penalty on long-running reads
    • Checksum implementation means 2 reads per random read if caching is disabled
  • 8. Implementation Notes
    • Checksum is currently generated by wrapping CheckSumFileSystem around RadFileSystem
      • Inefficient, reads 2 files over dfs
      • Improper – what if checksum block is corrupt?
    • CachingByteService implements lookahead (good) by copying bytes twice (bad)
    • Permissions happen “by accident” at namenode
      • Attackable by searching blockid space on DNs
      • Could exchange UserAccessToken on request
  • 9. Benchmark Environment
    • EC2 “Medium” - 2x2GHz, 1.7GB, shared I/O
    • Operations against 20GB sequence file
    • All tests run singlethreaded from the lightly loaded namenode
    • Fast internal network, adequate memory but not enough to page entire file
    • All benchmarks in a given set were run on the same instance, middle value from 3 runs
  • 10. Random Reads - 2k
    • 10,000 random reads of 2k each over the length of a 20GB file
    • DFS averaged 7.8ms while Rad with no cache averaged 4.4ms
    • Caching added a full 2ms – hardcoded lookahead was no help and lots of unnecessary byte copying
  • 11. Random Reads – 2kb (avg in ms)
  • 12. SequenceFile Search
    • Binary search over 10gb sequence file
    • DFS, RadFS with various cache settings
    • Indicative of potential filesystem uses
      • Lucene
      • Large numbers of secondary indices
      • Ease of development
      • Read-only RDBMS-like systems built from ETLs or other long-running process
  • 13. Sequence File Binary Search 5000 searches, avg ms per search
  • 14. Streaming
    • DFS is inherently faster for streaming due to the dedicated server thread
    • Checksumming is expensive!
    • Early radfs builds beat dfs at 1-byte read()s because they didn't have checksumming
    • Require a PipeliningByteService for use in streaming jobs that would make requests to Datanode, stream in and checksum in a separate client-side thread
  • 15. Streaming – 1GB 1b reads, time in seconds
  • 16. Streaming 1GB 2k reads, time in seconds
  • 17. Going forward – modular reader
  • 18. Going forward - Applications
    • Could improve Hbase, solves file handle problem and improves latency
    • Could be used to create low-latency lookup formats accessible from scripting languages
      • Cache is automatic, simplifying development
      • “ Table” directory with main store file and several secondary index files generated by ETL
      • Lucene indices? Can be built with MapReduce
  • 19. Going forward - Development
    • Copy existing HDFS method of interleaving checksums directly from datanode – one read
      • Audit checksumming code for CPU efficiency – reading can be CPU bound
      • Implement as a ByteService instead of clumsy wrapper around FileSystem. Make configurable
    • Implement PipeliningByteService to improve streaming by pre-fetching pages
    • Exchange UserAccessToken at each read, could possibly use for encryption of blockid
  • 20. Contribute!
    • Patch is at Apache JIRA issue HDFS-516
    • Will be on GitHub momentarily
    • Goals:
      • Equivalent streaming performance to DFS
      • Faster random read, caching option
      • Lower resource consumption on server
    • 3 doable tasks above
    • Large configuration space to explore
    • Email me: [email_address]