This document provides an overview of Hadoop and its ecosystem. It describes Hadoop as a framework for distributed storage and processing of large datasets across clusters of commodity hardware. The key components of Hadoop are the Hadoop Distributed File System (HDFS) for storage, and MapReduce as a programming model for distributed computation across large datasets. A variety of related projects form the Hadoop ecosystem, providing capabilities like data integration, analytics, workflow scheduling and more.