Hadoop is an open-source software framework for reliable, scalable distributed computing. It processes very large data sets efficiently across clusters of commodity hardware. Hadoop distributes the data across the cluster and lets developers express distributed processing with a simple programming model, MapReduce. Rather than relying on the hardware for high availability, it detects and handles failures at the application layer. HDFS, Hadoop's distributed file system, stores data across DataNodes in a fault-tolerant way by replicating data blocks. It uses a master-slave architecture in which a single NameNode manages the file system metadata and the DataNodes store the data blocks.
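
The MapReduce model mentioned above can be illustrated with a minimal, single-process sketch of a word count. This is not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster the map and reduce steps would run in parallel on different nodes, and the framework would perform the grouping step.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence within one process."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own input line, and each call to `reduce_phase` depends only on one key, both steps can be spread across many machines without coordination, which is what makes the model simple to program.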