This document provides an overview of the Hadoop architecture and its two main components: the Hadoop Distributed File System (HDFS) and the MapReduce engine. It describes how HDFS splits files into blocks and replicates them for reliability, how files are written to HDFS, and how they are accessed with command-line tools. It also outlines the node types in Hadoop: the NameNode, DataNode, JobTracker, and TaskTracker. Finally, it discusses topology awareness and how available bandwidth decreases as communicating processes are separated across nodes, racks, and data centers.
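
The topology-awareness point can be illustrated with a small sketch. It models the network as a tree (data center, rack, node) and measures the distance between two processes as the number of hops from each one up to their closest common ancestor, which is the usual way Hadoop's network distance is described. The function name `network_distance` and the tuple representation of a location are assumptions made for illustration; they are not Hadoop APIs.

```python
def network_distance(a, b):
    """Return the tree distance between two locations.

    Each location is a (data_center, rack, node) tuple. This layout is a
    hypothetical representation chosen for the sketch, not a Hadoop type.
    """
    # Count how many leading levels (data center, rack, node) the two share.
    common = 0
    for level_a, level_b in zip(a, b):
        if level_a != level_b:
            break
        common += 1
    # Hops from each location up to the closest common ancestor.
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    here = ("d1", "r1", "n1")
    for label, other in [
        ("same node", ("d1", "r1", "n1")),
        ("same rack, different node", ("d1", "r1", "n2")),
        ("same data center, different rack", ("d1", "r2", "n3")),
        ("different data center", ("d2", "r3", "n4")),
    ]:
        print(label, network_distance(here, other))
```

Under these assumptions the distances come out as 0, 2, 4, and 6 for the four cases, and a larger distance corresponds to less available bandwidth between the two processes.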