Scheduling Scheme for Hadoop Clusters



…a prefetching mechanism into the MapReduce model while retaining compatibility with native Hadoop.

Published in: Technology
  • Existing system vs. proposed system: 1. Low I/O performance vs. high I/O performance. 2. Underutilised CPU workload vs. proper utilisation of the CPU workload. 3. No overhead on the master vs. additional prefetching overhead on the master. 4. Suited to real-time solutions vs. not suited to real-time solutions.

    1. A RESEARCH ON SCHEDULING SCHEME FOR HADOOP CLUSTERS. Guided by Neetha K N, Dept. of CSE. Presented by Amjith B, S7 CSE.
    2. AREAS OF SEMINAR: Hadoop MapReduce and HDFS
    3. TERMINOLOGY REVIEW: Hadoop cluster [diagram: Rack 1, Rack 2, …, Rack n, each containing Node 1, Node 2, …, Node n]
    4. INTRODUCTION • Hadoop is an open-source software framework for the distributed processing of large datasets across large clusters of computers. • It has two components: the MapReduce engine and a distributed file system (HDFS).
    5. 5. • Mapreduce engine Programming model developed by Google  Computation component of Hadoop  Consists of Map and Reduce functions • HDFS  Storage component of Hadoop  Splits the data into blocks and distributes them Fault tolerant and self-healing COMPONENTS
    6. • MapReduce nodes: JobTracker, TaskTracker • HDFS nodes: NameNode, DataNode
    7. • HDFS nodes • NameNode: maintains the file system metadata (1 per cluster). • DataNode: stores the data blocks and performs block creation, deletion, and replication on instruction from the NameNode; installed on each slave node (1 to many per cluster). • MapReduce nodes • JobTracker: schedules job execution and keeps track of cluster-wide job status (1 per cluster). • TaskTracker: receives tasks from the JobTracker and runs on compute nodes alongside a DataNode (1 to many per cluster).
    8. LITERATURE SURVEY • Hadoop FIFO scheduling. Features: jobs are run on the FIFO principle. Disadvantage: cannot assign priorities to jobs. Reference: [6] • Facebook's Fair Scheduler. Features: even allocation of resources. Disadvantage: no preemption support for large tasks. Reference: [4] • Yahoo's Capacity Scheduler. Features: FIFO scheduling based on priority. Disadvantage: problems in assigning priorities. Reference: [6]
    10. EXISTING SYSTEM (disadvantages) • Underutilization of the CPU • Not flexible • Interaction between the master node and the slave nodes
    11. PROPOSED SYSTEM • Analyze the system for CPU and I/O underutilization • Use a predictive scheduler to predict the most appropriate TaskTracker • Couple the scheduler with a prefetching mechanism to improve system performance
    12. PREDICTIVE SCHEDULER • A flexible task scheduler • Predicts the most appropriate TaskTrackers to which future tasks should be assigned • Allows DataNodes to exploit underutilized disk bandwidth • Seeks out stragglers and predicts candidate data blocks
    13. PREFETCHING MODULE • Integrates with the predictive scheduler • Uses multiple worker threads • Monitors the status of the worker threads and coordinates the prefetching process
    14. STEPS FOR LAUNCHING TASKS • Copy the job from HDFS to the TaskTracker • Create a local working directory for the task and un-jar the job into it • Create a TaskRunner instance to run the task
    15. ISSUES IN THE PREFETCHING MODULE • When to prefetch • What to prefetch • How much to prefetch
    16. ADVANTAGES • Avoids I/O stalls • Maximises CPU utilisation • Helps Hadoop run smoothly • Flexible
    17. COMPARISON • Existing system: low I/O performance; CPU underutilised; less flexible • Proposed system: high I/O performance; proper CPU utilisation; additional prefetching overhead on the master
    18. FUTURE SCOPE • Hadoop On Demand (HOD) • A scheduler for heterogeneous environments
    19. REFERENCES
        [1] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI '04, pages 137–150, 2004.
        [2] M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In OSDI '08: 8th USENIX Symposium on Operating Systems Design and Implementation, October 2008.
        [3] R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. SIGOPS Oper. Syst. Rev., 29:79–95, December 1995.
        [4] S. Seo, I. Jang, K. Woo, I. Kim, et al. HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment. In Proceedings of the 11th IEEE International Conference on Cluster Computing, pages 16–20. ACM, 2009.
        [5] T. White. Hadoop: The Definitive Guide. O'Reilly, 2009.
        [6] M. Yong, N. Garegrat, and S. Mohan. Towards a resource aware scheduler in Hadoop.
    20. THANK YOU!
    21. QUESTIONS?
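The MapReduce engine on slide 5 consists of Map and Reduce functions [1]. The minimal sketch below simulates a word-count job inside a single Python process to show the data flow: map each record, group intermediate values by key (the shuffle), then reduce each group. The names `map_fn`, `reduce_fn`, and `run_job` are hypothetical; this is not Hadoop's Java API.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: sum all partial counts for one key."""
    return word, sum(counts)


def run_job(records: Iterable[str]) -> dict[str, int]:
    # Shuffle: group every intermediate value under its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce: one call per distinct key, in sorted key order
    # (Hadoop also sorts keys before the reduce phase).
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog", "The end"]))
```

In a real cluster the map calls run in parallel on the TaskTrackers holding the input blocks, and the shuffle moves data over the network; here both happen in one loop.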
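Slides 5 and 7 state that HDFS splits data into blocks, that the NameNode keeps the metadata, and that storage is fault tolerant and self-healing. The sketch below models this with plain dictionaries. The 128 MB block size and replication factor of 3 are assumed defaults, the round-robin placement is a simplification (real HDFS places replicas with a rack-aware policy), and `place_replicas` and `re_replicate` are hypothetical helpers.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumption: 128 MB (Hadoop 2.x default; Hadoop 1.x used 64 MB)
REPLICATION = 3                 # assumption: the default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the byte size of each block; only the last block may be smaller."""
    n_blocks = -(-file_size // block_size)  # integer ceiling division
    return [min(block_size, file_size - i * block_size) for i in range(n_blocks)]


def place_replicas(n_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Toy NameNode metadata: block id -> DataNodes holding a replica.

    Hypothetical round-robin placement, not HDFS's rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(n_blocks)
    }


def re_replicate(block_map: dict[int, list[str]], dead: str,
                 datanodes: list[str]) -> dict[int, list[str]]:
    """Self-healing: forget a failed DataNode and copy its blocks to other live nodes."""
    live = [dn for dn in datanodes if dn != dead]
    for replicas in block_map.values():
        if dead in replicas:
            replicas.remove(dead)
            # Choose the first live node that does not already hold this block.
            spare = next((dn for dn in live if dn not in replicas), None)
            if spare is not None:
                replicas.append(spare)
    return block_map


if __name__ == "__main__":
    sizes = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    block_map = place_replicas(len(sizes), nodes)
    print(re_replicate(block_map, "dn1", nodes))
```

A 300 MB file becomes two full 128 MB blocks and one 44 MB block; after `dn1` fails, every block is back to three replicas on the surviving nodes.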
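Slide 8 contrasts FIFO ordering, priority-based FIFO (as the slide describes the Capacity scheduler), and the Fair scheduler's even allocation of resources. The sketch below is a simplified model of those three descriptions. The `Job` fields are hypothetical, and `fair_shares` implements plain max-min fairness over task slots, which is the core idea of fair sharing but not the Fair Scheduler's full policy (pools, minimum shares, weights).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submit_time: int
    demand: int        # task slots the job could use right now
    priority: int = 0  # higher value = more important


def fifo_order(jobs: list[Job]) -> list[Job]:
    """FIFO as described on slide 8: strictly by submission time."""
    return sorted(jobs, key=lambda j: j.submit_time)


def priority_fifo_order(jobs: list[Job]) -> list[Job]:
    """FIFO within each priority level, higher priority first."""
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_time))


def fair_shares(jobs: list[Job], total_slots: int) -> dict[str, int]:
    """Max-min fair split: no job exceeds its demand, and slots a job
    cannot use are redistributed among the jobs that still want more."""
    shares = {j.name: 0 for j in jobs}
    hungry = [j for j in jobs if j.demand > 0]
    remaining = total_slots
    while hungry and remaining > 0:
        per_job = remaining // len(hungry)
        if per_job == 0:
            # Fewer slots than hungry jobs: give single slots in FIFO order.
            for j in fifo_order(hungry)[:remaining]:
                shares[j.name] += 1
            break
        for j in list(hungry):
            grant = min(per_job, j.demand - shares[j.name])
            shares[j.name] += grant
            remaining -= grant
            if shares[j.name] == j.demand:
                hungry.remove(j)
    return shares


if __name__ == "__main__":
    jobs = [Job("A", 0, 10), Job("B", 1, 2), Job("C", 2, 10, priority=1)]
    print([j.name for j in fifo_order(jobs)])
    print([j.name for j in priority_fifo_order(jobs)])
    print(fair_shares(jobs, 12))
```

With 12 slots and demands of 10, 2, and 10, the small job B is fully served and the two large jobs split the rest (5 slots each), whereas FIFO would let job A take its 10 slots first.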
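Slide 12 describes a predictive scheduler that picks the most appropriate TaskTracker for future tasks, but the deck does not give its prediction model. As an assumption, the sketch below predicts each tracker's completion time for the next task as its next free time, plus a fixed penalty when the input block is not stored locally, plus an exponentially weighted moving average of its recent task durations. `TrackerStats`, `remote_penalty`, and `alpha` are hypothetical names and parameters.

```python
from dataclasses import dataclass, field


@dataclass
class TrackerStats:
    """Hypothetical per-TaskTracker state kept by the predictive scheduler."""
    name: str
    free_at: float        # predicted time (s) when the tracker's next slot frees up
    avg_task_time: float  # moving average of observed task durations (s)
    local_blocks: set[str] = field(default_factory=set)

    def observe(self, duration: float, alpha: float = 0.3) -> None:
        """Update the estimate with a finished task's duration (EWMA)."""
        self.avg_task_time = alpha * duration + (1 - alpha) * self.avg_task_time


def predict_finish(t: TrackerStats, block: str, now: float, remote_penalty: float) -> float:
    """Predicted completion time if the task reading `block` runs on tracker `t`."""
    start = max(now, t.free_at)
    fetch = 0.0 if block in t.local_blocks else remote_penalty
    return start + fetch + t.avg_task_time


def pick_tracker(trackers: list[TrackerStats], block: str,
                 now: float = 0.0, remote_penalty: float = 5.0) -> TrackerStats:
    """Assign the next task to the tracker with the earliest predicted finish."""
    return min(trackers, key=lambda t: predict_finish(t, block, now, remote_penalty))


if __name__ == "__main__":
    tt1 = TrackerStats("tt1", free_at=0.0, avg_task_time=10.0, local_blocks={"b1"})
    tt2 = TrackerStats("tt2", free_at=2.0, avg_task_time=4.0)
    print(pick_tracker([tt1, tt2], "b1").name)  # local data outweighs the slower tracker
    print(pick_tracker([tt1, tt2], "b2").name)  # no locality: the faster tracker wins
```

The moving average lets the prediction adapt when a node slows down, which is what allows the scheduler to look ahead instead of only reacting to free slots.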
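Slide 12 also says the scheduler seeks out stragglers. One common heuristic, loosely following the LATE approach in [2], compares each running task's progress rate (progress score divided by elapsed time) with that of its peers and ranks the slow ones by estimated time left. This is a simplified illustration rather than the deck's method; the 0.5 threshold and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    task_id: str
    progress: float  # progress score in [0, 1]
    elapsed: float   # seconds since the task started


def progress_rate(t: RunningTask) -> float:
    return t.progress / t.elapsed if t.elapsed > 0 else 0.0


def time_left(t: RunningTask) -> float:
    """Estimated seconds to completion at the task's current rate."""
    rate = progress_rate(t)
    return float("inf") if rate == 0 else (1.0 - t.progress) / rate


def find_stragglers(tasks: list[RunningTask], slow_fraction: float = 0.5) -> list[RunningTask]:
    """Flag tasks whose progress rate is below `slow_fraction` of the mean rate,
    ordered so the task expected to finish last comes first."""
    if not tasks:
        return []
    mean_rate = sum(progress_rate(t) for t in tasks) / len(tasks)
    slow = [t for t in tasks if progress_rate(t) < slow_fraction * mean_rate]
    return sorted(slow, key=time_left, reverse=True)


if __name__ == "__main__":
    tasks = [RunningTask("a", 0.8, 40), RunningTask("b", 0.6, 30),
             RunningTask("c", 0.1, 40), RunningTask("d", 0.2, 20)]
    for t in find_stragglers(tasks):
        print(t.task_id, round(time_left(t)))
```

The flagged tasks are natural candidates for speculative re-execution, and their input blocks are natural candidates for prefetching on the tracker that will run the backup copy.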
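Slide 13 describes a prefetching module that uses multiple worker threads, monitors their status, and coordinates the prefetching. A minimal sketch using Python's `concurrent.futures.ThreadPoolExecutor` follows. `fetch_block` is a hypothetical stand-in for reading a block from a DataNode, and the cache is an in-memory dictionary rather than local disk.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


def fetch_block(block_id: str) -> bytes:
    """Hypothetical stand-in for reading one block from a DataNode."""
    time.sleep(0.01)  # simulate disk/network latency
    return f"data-of-{block_id}".encode()


class Prefetcher:
    """Coordinates worker threads that load blocks before tasks ask for them."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: dict[str, Future] = {}  # block id -> in-flight fetch
        self._cache: dict[str, bytes] = {}     # block id -> prefetched data
        self._lock = threading.Lock()

    def prefetch(self, block_ids: list[str]) -> None:
        """Queue blocks the scheduler predicts will be needed soon."""
        with self._lock:
            for b in block_ids:
                if b not in self._cache and b not in self._pending:
                    self._pending[b] = self._pool.submit(fetch_block, b)

    def status(self) -> dict[str, str]:
        """Monitor the workers: report each block as cached, done, running, or queued."""
        with self._lock:
            report = {b: "cached" for b in self._cache}
            for b, fut in self._pending.items():
                if fut.done():
                    report[b] = "done"
                elif fut.running():
                    report[b] = "running"
                else:
                    report[b] = "queued"
            return report

    def get(self, block_id: str) -> bytes:
        """Return a block for a task, preferring the prefetched copy."""
        with self._lock:
            if block_id in self._cache:
                return self._cache[block_id]
            fut = self._pending.get(block_id)
        # Wait for the in-flight fetch, or read synchronously if it was never prefetched.
        data = fut.result() if fut is not None else fetch_block(block_id)
        with self._lock:
            self._cache[block_id] = data
            self._pending.pop(block_id, None)
        return data

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    p = Prefetcher(workers=2)
    p.prefetch(["b1", "b2", "b3"])  # blocks predicted by the scheduler
    print(p.status())
    print(p.get("b2"))
    p.shutdown()
```

A task that calls `get` on a prefetched block waits at most for the remainder of the in-flight read instead of the whole read, which is where the avoided I/O stalls on slide 16 would come from.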
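Slide 14 lists the steps a TaskTracker takes to launch a task. In Hadoop 1.x these are: localize the job JAR by copying it from the shared filesystem, create a local working directory for the task and un-jar the job into it, then create a TaskRunner to run the task [5]. The sketch mirrors those steps with the Python standard library. A local directory plays the role of HDFS, and the directory layout, the task id, and the `TaskRunner` class here are hypothetical stand-ins (Hadoop's real TaskRunner launches a child JVM).

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


class TaskRunner:
    """Hypothetical stand-in for the object that runs one task."""

    def __init__(self, task_id: str, work_dir: Path) -> None:
        self.task_id = task_id
        self.work_dir = work_dir

    def run(self) -> list[str]:
        # A real runner would execute the map or reduce code; we list the unpacked job files.
        return sorted(p.name for p in self.work_dir.iterdir())


def launch_task(shared_job_jar: Path, tracker_root: Path, task_id: str) -> TaskRunner:
    # Step 1: localize the job by copying its JAR from the shared filesystem.
    job_dir = tracker_root / "jobcache"
    job_dir.mkdir(parents=True, exist_ok=True)
    local_jar = Path(shutil.copy(shared_job_jar, job_dir / shared_job_jar.name))
    # Step 2: create a local working directory for the task and un-jar the job into it.
    work_dir = tracker_root / task_id / "work"
    work_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(local_jar) as jar:
        jar.extractall(work_dir)
    # Step 3: create a TaskRunner instance to run the task.
    return TaskRunner(task_id, work_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        jar_path = root / "job.jar"  # a JAR is a ZIP archive
        with zipfile.ZipFile(jar_path, "w") as jar:
            jar.writestr("Mapper.class", b"")
            jar.writestr("job.xml", "<configuration/>")
        runner = launch_task(jar_path, root / "tracker", "task_0001")
        print(runner.run())
```

Because the per-task working directory is private, several tasks of the same job can run side by side on one TaskTracker without interfering with each other's files.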
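Slide 15 names three open issues: when, what, and how much to prefetch. One simple answer, offered here as an assumption rather than the deck's design, treats prefetching as pipelining. If fetching a block takes T_fetch, processing one takes T_compute, and fetches can overlap (for example on the prefetching module's worker threads), a fetch started d blocks ahead has d × T_compute to finish, so d = ceil(T_fetch / T_compute) blocks in flight hide the I/O behind computation, capped by the cache size. Prefetching is triggered when fewer than d blocks are buffered, and the candidates are the task's next unread input blocks. The function names are hypothetical.

```python
import math


def prefetch_depth(fetch_time: float, compute_time: float, cache_blocks: int) -> int:
    """How much: the number of blocks to keep in flight so I/O hides behind compute.

    A fetch started d blocks ahead has d * compute_time to finish, so we need
    d * compute_time >= fetch_time, i.e. d = ceil(fetch_time / compute_time),
    capped by the local cache capacity and never below 1.
    Assumes fetches can proceed in parallel.
    """
    if compute_time <= 0:
        return cache_blocks
    return max(1, min(cache_blocks, math.ceil(fetch_time / compute_time)))


def blocks_to_prefetch(input_blocks: list[str], next_index: int,
                       buffered: set[str], depth: int) -> list[str]:
    """When and what: if fewer than `depth` blocks are buffered, return the
    next unread, unbuffered blocks of the task's input, in order."""
    if len(buffered) >= depth:
        return []  # enough data is already staged; wait
    wanted = depth - len(buffered)
    upcoming = [b for b in input_blocks[next_index:] if b not in buffered]
    return upcoming[:wanted]


if __name__ == "__main__":
    depth = prefetch_depth(fetch_time=300, compute_time=100, cache_blocks=8)  # milliseconds
    print(depth)
    print(blocks_to_prefetch(["b0", "b1", "b2", "b3", "b4"], 1, {"b1"}, depth))
```

The cap matters in the other direction too: when I/O is much slower than compute, a deep prefetch would fill the cache and add to the extra load on the master noted in the comparison slide, so bounding d keeps that overhead predictable.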