This document discusses big data and the challenges of working with large datasets. It notes that as much data is now created every two days as was created from the beginning of civilization until 2003. The Hadoop ecosystem, including MapReduce and machine learning tools, is proposed as a solution for analyzing large and diverse datasets, but challenges remain around usability, speed of analysis, and finding new applications beyond web logs.