Presentation on 2013-06-27, Workshop on the future of Big Data management, discussing hadoop for a science audience that are either HPC/grid users or people suddenly discovering that their data is accruing towards PB.
The other talks were on GPFS, LustreFS and Ceph, so rather than just do beauty-contest slides, I decided to raise the question of "what is a filesystem?", whether the constraints imposed by the Unix metaphor and API are becoming limits on scale and parallelism (both technically and, for GPFS and Lustre Enterprise in cost).
Then: HDFS as the foundation for the Hadoop stack.
All the other FS talks did emphasise their Hadoop integration, with the Intel talk doing the most to assert performance improvements of LustreFS over HDFSv1 in dfsIO and Terasort (no gridmix?), which showed something important: Hadoop is the application that add DFS developers have to have a story for