This document discusses the problem of storing and processing many small files in HDFS and Hadoop. It introduces the concept of "harballing" where Hadoop uses an archiving technique called Hadoop Archive (HAR) to collect many small files into a single large file to reduce overhead on the namenode and improve performance. HAR packs small files into an archive file with a .har extension so the original files can still be accessed efficiently and in parallel. This reduction of small files through harballing increases scalability by reducing namespace usage and load on the namenode.