Scheduling scheme for hadoop clusters

a prefetching mechanism into MapReduce model while retaining compatibility with the native Hadoop.

Usage Rights

© All Rights Reserved

  • Existing system / proposed system
    1. Low I/O performance / high I/O performance
    2. CPU workload underutilised / proper utilisation of CPU workload
    3. No overhead on the master / additional prefetching overhead on the master
    4. Suited for real-time solutions / not suited for real-time solutions

Presentation Transcript

  • A RESEARCH ON SCHEDULING SCHEME FOR HADOOP CLUSTERS
    Guided by: Neetha K N, Dept of CSE
    Presented by: Amjith B, S7 CSE
  • AREAS OF SEMINAR
    • Hadoop MapReduce and HDFS
  • TERMINOLOGY REVIEW
    [Figure: a Hadoop cluster made up of racks (Rack 1, Rack 2, ..., Rack n), each containing nodes (Node 1, Node 2, ..., Node n)]
  • INTRODUCTION
    • Hadoop is an open-source software framework for distributed processing of large datasets across large clusters of computers
    • 2 components: the MapReduce engine and the distributed file system
  • COMPONENTS
    • MapReduce engine
      • Programming model developed by Google
      • Computation component of Hadoop
      • Consists of Map and Reduce functions
    • HDFS
      • Storage component of Hadoop
      • Splits the data into blocks and distributes them
      • Fault tolerant and self-healing
  • MapReduce node: JobTracker, TaskTracker. HDFS node: NameNode, DataNode.
  • HDFS node
    • NameNode: maintains metadata information about files (1 per cluster).
    • DataNode: handles all data allocation and replication and is installed on each slave node (1 to many per cluster).
    MapReduce node
    • JobTracker: schedules job execution and keeps track of cluster-wide job status (1 per cluster).
    • TaskTracker: receives tasks from the JobTracker. Runs on compute nodes in conjunction with the DataNode (1 to many per cluster).
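To make the "Map and Reduce functions" of the COMPONENTS slide concrete, here is a minimal single-process sketch of the programming model in plain Python. It does not use the Hadoop API; the names map_fn, reduce_fn and run_job are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)

def run_job(lines):
    """Run map, group values by key (the shuffle), then reduce.

    A real cluster spreads these phases over many nodes; here they
    run in one process purely to show the data flow.
    """
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())

word_counts = run_job(["hadoop splits data", "hadoop distributes data blocks"])
```

In this toy run, word_counts maps "hadoop" and "data" to 2 and every other word to 1.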
  • LITERATURE SURVEY
    • Hadoop FIFO scheduling. Features: implemented on the FIFO principle. Disadvantages: cannot assign priorities to jobs. Reference: REF [6]
    • Facebook’s Fair scheduler. Features: even allocation of resources. Disadvantages: no preemption support for large tasks. Reference: REF [4]
    • Yahoo’s Capacity scheduler. Features: FIFO scheduler based on priority. Disadvantages: problem in assigning priorities. Reference: REF [6]
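The difference between plain FIFO ordering and the "FIFO scheduler based on priority" mentioned in the survey can be sketched as follows. This is only an illustration of the two orderings, not the actual Hadoop, Fair or Capacity scheduler code; the job names and priority values are made up.

```python
import heapq
from collections import deque
from itertools import count

def fifo_order(jobs):
    """Plain FIFO: jobs run strictly in arrival order; priority is ignored."""
    queue = deque(jobs)
    order = []
    while queue:
        name, _priority = queue.popleft()
        order.append(name)
    return order

def priority_order(jobs):
    """Priority first, arrival order second (FIFO among equal priorities)."""
    arrival = count()
    # heapq is a min-heap, so the priority is negated to pop the highest first.
    heap = [(-priority, next(arrival), name) for name, priority in jobs]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap)[2])
    return order

# (job name, priority) in arrival order; a higher number means more urgent.
jobs = [("report", 1), ("backup", 0), ("alert", 5)]
fifo_result = fifo_order(jobs)
priority_result = priority_order(jobs)
```

With these inputs the FIFO order is report, backup, alert, while the priority order moves alert to the front. The sketch also hints at the listed disadvantage: someone still has to choose the priority numbers.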
  • EXISTING SYSTEM
  • EXISTING SYSTEM (disadvantages)
    • Underutilization of CPU processes
    • Not flexible
    • Interaction between the master node and the slave nodes
  • PROPOSED SYSTEM
    • Analyze the system for CPU and I/O underutilization
    • Use a predictive scheduler to predict the appropriate TaskTracker
    • Couple the scheduler with a prefetching mechanism to improve system performance
  • PREDICTIVE SCHEDULER
    • Flexible task scheduler
    • Predicts the most appropriate TaskTrackers to assign future tasks to
    • Allows DataNodes to explore underutilization of disk bandwidth
    • Seeks stragglers and predicts candidate data blocks
  • PREFETCHING MODULE
    • Integrates with the predictive scheduler
    • Uses multiple worker threads
    • Monitors the status of the worker threads and coordinates the prefetching process
  • STEPS FOR LAUNCHING TASKS
    • Copying the job from HDFS to the TaskTracker
    • Creation of a local working directory for the task
    • Creation of a TaskTracker instance
  • ISSUES IN PREFETCHING MODULE
    • When to prefetch
    • What to prefetch
    • How much to prefetch
  • ADVANTAGES
    • Avoidance of I/O stalls
    • Maximising CPU utilisation
    • Helps the smooth functioning of Hadoop
    • Flexible
  • COMPARISON
    • Existing system: low I/O performance. Proposed system: high I/O performance.
    • Existing system: CPU underutilised. Proposed system: proper utilisation.
    • Existing system: less flexible. Proposed system: additional overhead of prefetching to master.
  • FUTURE SCOPE
    • Hadoop on demand (HOD)
    • A scheduler in heterogeneous environment
  • REFERENCES
    1. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI ’04, pages 137–150, 2004.
    2. M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In OSDI ’08: 8th USENIX Symposium on Operating Systems Design and Implementation, October 2008.
    3. R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. SIGOPS Oper. Syst. Rev., 29:79–95, December 1995.
    4. Sangwon Seo, Ingook Jang, Kyungchang Woo, Inkyo Kim, et al. HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment. In Proceedings of the 11th IEEE International Conference on Cluster Computing, pages 16–20. ACM, 2009.
    5. Tom White. Hadoop: The Definitive Guide. O’Reilly, 2009.
    6. Mark Yong, Nitin Garegrat, and Shiwali Mohan. Towards a Resource Aware Scheduler in Hadoop.
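The slides do not say how the predictive scheduler picks a TaskTracker, so the following is only one possible sketch under stated assumptions: each tracker reports an estimated time until a slot frees up, and reading a non-local block costs a fixed penalty. The TaskTracker class here, its fields, predict_tracker and remote_penalty_secs are all hypothetical and are not Hadoop APIs.

```python
from dataclasses import dataclass, field

@dataclass
class TaskTracker:
    name: str
    remaining_secs: float  # assumed estimate of time until a slot is free
    local_blocks: set = field(default_factory=set)  # blocks stored on this node

def predict_tracker(trackers, block_id, remote_penalty_secs=5.0):
    """Pick the tracker expected to start work on block_id soonest.

    The prediction is deliberately naive: time until a slot frees up,
    plus a fixed penalty when the block is not stored locally.
    """
    def predicted_cost(tracker):
        is_local = block_id in tracker.local_blocks
        return tracker.remaining_secs + (0.0 if is_local else remote_penalty_secs)
    return min(trackers, key=predicted_cost)

trackers = [
    TaskTracker("tt1", 2.0, {"b1"}),
    TaskTracker("tt2", 1.0, {"b2"}),
    TaskTracker("tt3", 8.0, {"b3"}),
]
chosen = predict_tracker(trackers, "b1")
```

Here tt1 is chosen for block b1: it frees up slightly later than tt2 but already holds the block. For block b3 the sketch picks tt2, because tt3 is busy for too long; that block would then be a natural candidate for prefetching to tt2.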
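The idea of several worker threads whose status is monitored by a coordinator can be sketched with Python's standard thread pool. This is a simplified stand-in, not the module described in the talk: load_block is a hypothetical placeholder for reading a data block from disk, and the cache is an ordinary dictionary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def load_block(block_id):
    """Hypothetical stand-in for reading one data block from disk."""
    return f"contents of {block_id}"

def prefetch(block_ids, max_workers=2):
    """Load blocks on worker threads and record the outcome of each load."""
    cache = {}   # block id -> prefetched data
    status = {}  # block id -> "done" or "failed"
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Coordinator: hand each block to a worker and remember which is which.
        futures = {pool.submit(load_block, block_id): block_id for block_id in block_ids}
        # Monitor: collect results as the workers finish, in any order.
        for future in as_completed(futures):
            block_id = futures[future]
            try:
                cache[block_id] = future.result()
                status[block_id] = "done"
            except Exception:
                status[block_id] = "failed"
    return cache, status

cache, status = prefetch(["b1", "b2", "b3"])
```

The sketch leaves open the questions of when, what and how much to prefetch; in this toy version the caller simply passes the list of block ids.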
  • THANK YOU!!!!!!
  • QUESTIONS??