HadoopWorld 2011 Presentation by Daniel Abadi
As Hadoop rapidly becomes the universal standard for scalable data analysis and processing, it is increasingly important to understand its strengths and weaknesses for particular application scenarios in order to avoid inefficiency pitfalls. For example, Hadoop has great potential to perform scalable graph analysis if it is used correctly. Recent benchmarking has shown that simple implementations can be 1300 times less efficient than a more optimal Hadoop-centered implementation. In this talk, Daniel Abadi gives an overview of a recent research project at Yale University that investigates how to perform sub-graph pattern matching within a Hadoop-centered system that is three orders of magnitude faster than a more simple approach. This talk highlights how the cleaning, transforming, and parallel processing strengths of Hadoop are combined with storage optimized for graph data analysis. It then discusses further changes that are needed in the core Hadoop framework to take performance to the next level.