Enterprises are using Hadoop for interactive real-time data processing via projects such as the Stinger Initiative. We describe two new HDFS features – Centralized Cache Management and Heterogeneous Storage – that allow applications to effectively use low latency storage media such as Solid State Disks and RAM. In the first part of this talk, we discuss Centralized Cache Management to coordinate caching important datasets and place tasks for memory locality. HDFS deployments today rely on the OS buffer cache to keep data in RAM for faster access. However, the user has no direct control over what data is held in RAM or how long it?s going to stay there. Centralized Cache Management allows users to specify which data to lock into RAM. Next, we describe Heterogeneous Storage support for applications to choose storage media based on their performance and durability requirements. Perhaps the most interesting of the newer storage media are Solid State Drives which provide improved random IO performance over spinning disks. We also discuss memory as a storage tier which can be useful for temporary files and intermediate data for latency sensitive real-time applications. In the last part of the talk we describe how administrators can use quota mechanism extensions to manage fair distribution of scarce storage resources across users and applications.
Clipping is a handy way to collect important slides you want to go back to later.