This document discusses a research project on scheduling schemes for Hadoop clusters. It begins with an introduction to Hadoop and its two main components, MapReduce and HDFS. It then reviews existing scheduling systems, including the default FIFO scheduler, Facebook's Fair Scheduler, and Yahoo's Capacity Scheduler. The proposed system aims to address the CPU and I/O underutilization observed in these existing schedulers by combining a predictive scheduler with a prefetching mechanism. The predictive scheduler would assign tasks to appropriate task trackers and trigger prefetching of the data blocks those tasks will read, while the prefetching module would help avoid I/O stalls and keep the CPU busy. Compared to the existing systems, the proposed system is expected to provide higher I/O performance and better CPU utilization.
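To make the idea concrete, the following is a minimal Java sketch of how predictive task assignment combined with block prefetching might look. The names PredictivePrefetchScheduler, TaskTrackerInfo, MapTask, and BlockCache are hypothetical stand-ins, not Hadoop classes, and the sketch only illustrates the general pattern of overlapping I/O with computation, not the authors' actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch of a predictive scheduler with block prefetching.
 * All classes here are illustrative stand-ins, not Hadoop APIs.
 */
public class PredictivePrefetchScheduler {

    // Illustrative snapshots of cluster state.
    record TaskTrackerInfo(String host, int freeSlots) {}
    record MapTask(String id, String inputBlock, String preferredHost) {}

    static class BlockCache {
        private final ExecutorService io = Executors.newFixedThreadPool(2);

        // Start reading a block into memory before the task needs it,
        // so the map task does not stall on disk or network I/O.
        CompletableFuture<byte[]> prefetch(String blockId) {
            return CompletableFuture.supplyAsync(() -> {
                // Placeholder for an HDFS read of blockId.
                return new byte[0];
            }, io);
        }

        void shutdown() { io.shutdown(); }
    }

    private final Queue<MapTask> pending = new ArrayDeque<>();
    private final BlockCache cache = new BlockCache();

    void submit(MapTask task) { pending.add(task); }

    /**
     * Pick the next task for a tracker, preferring one whose input block
     * is local to that tracker, then prefetch the block the following
     * task will need so I/O overlaps with computation.
     */
    MapTask assignNext(TaskTrackerInfo tracker) {
        if (pending.isEmpty() || tracker.freeSlots() == 0) {
            return null;
        }
        MapTask next = pending.stream()
                .filter(t -> t.preferredHost().equals(tracker.host()))
                .findFirst()
                .orElse(pending.peek());
        pending.remove(next);

        // Prefetch the input block of the task expected to run next.
        MapTask lookahead = pending.peek();
        if (lookahead != null) {
            cache.prefetch(lookahead.inputBlock());
        }
        return next;
    }

    public static void main(String[] args) {
        PredictivePrefetchScheduler sched = new PredictivePrefetchScheduler();
        sched.submit(new MapTask("m1", "blk_001", "node-a"));
        sched.submit(new MapTask("m2", "blk_002", "node-b"));

        TaskTrackerInfo trackerA = new TaskTrackerInfo("node-a", 2);
        System.out.println("Assigned: " + sched.assignNext(trackerA).id());
        sched.cache.shutdown();
    }
}
```

The design point illustrated is that the scheduler, because it decides ahead of time which task a tracker will run, can also tell the cache which block to load next, which is what allows I/O stalls to be hidden behind ongoing computation.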