HDFS is a distributed file system designed for storing large files across clusters of commodity hardware. It provides high-throughput access to application data reliably, even in the event of hardware failures, through data replication across multiple nodes. The master node (namenode) manages metadata like file paths and block locations, while slave nodes (datanodes) store file blocks. Files are split into large blocks which are replicated to provide fault tolerance.