This document discusses using machine learning and MapReduce with Hadoop to perform predictive analysis on health insurance claims data. It proposes extracting data from online forums, searching for correlations between medical keywords and claims, and using support vector regression to build a predictive model. The analysis would be run on Amazon Elastic MapReduce for scalability and cost efficiency. Future work may include additional data sources and model enhancements.
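The keyword-to-claims correlation step can be illustrated with a minimal, single-process sketch written in map/reduce style. It is not the document's implementation. The keyword list, the record layout (forum text paired with a claim amount), and all function names are assumptions made for illustration. On Hadoop or Elastic MapReduce the map and reduce functions would run as distributed tasks, whereas here they run in plain Python.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical keyword list; the source does not name specific keywords.
KEYWORDS = ("asthma", "diabetes")


def map_record(record):
    """Map step: emit (keyword, (mention_count, claim_amount)) for one record.

    Assumes each record is a (text, claim_amount) pair, which is an
    illustrative layout and not one given in the source.
    """
    text, claim_amount = record
    words = text.lower().split()
    for keyword in KEYWORDS:
        yield keyword, (words.count(keyword), claim_amount)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_pearson(points):
    """Reduce step: Pearson correlation between mention counts and claim amounts."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series; report 0.0
    return cov / sqrt(var_x * var_y)


def keyword_claim_correlations(records):
    """Run map, shuffle, and reduce over all records in one process."""
    pairs = (pair for record in records for pair in map_record(record))
    return {kw: reduce_pearson(points) for kw, points in shuffle(pairs).items()}


if __name__ == "__main__":
    # Toy records invented for this sketch, not real claims data.
    toy_records = [
        ("asthma asthma flare", 300.0),
        ("asthma cough", 200.0),
        ("mild cold", 100.0),
    ]
    print(keyword_claim_correlations(toy_records))
```

Keywords that correlate strongly with claim amounts could then serve as input features for the support vector regression model. That step is omitted here because the source does not specify the feature set or the model parameters.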