Your SlideShare is downloading. ×
YARN Under The Hood
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

YARN Under The Hood


Published on

About Hadoop YARN internals. Beware it has code snipped from JobTracker :)

About Hadoop YARN internals. Beware it has code snipped from JobTracker :)

Published in: Technology, Education

  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide
  • Does this look fragile. No one dares to touch this piece
  • Does this look fragile. No one dares to touch this piece
  • New Features and their impact in State Transitions can be articulated, discussed and designed
  • Transcript

    • 1. Apache HadoopYARN - Under the Hood Sharad Agarwal
    • 2. About me• Apache Foundation – Hadoop Committer and PMC member – Hadoop MR contributor for over 4 years – Author of Hadoop Nextgen/YARN core and MR Application Master – Organizer of Hadoop Bangalore Meetups• Head of Technology Platforms @InMobi – Formerly Architect @Yahoo!
    • 3. • Component ArchitectureYARN CORE • Concurrency • State Management
    • 4. Concurrent Systems
    • 5. Concurrent System ?// Loop through all expired items in the queue//// Need to lock the JobTracker here since we are// manipulating its data-structures via// -> JobTracker.lostTaskTracker ->// JobInProgress.failedTask -> JobTracker.markCompleteTaskAttempt// Also need to lock JobTracker before locking taskTracker &// trackerExpiryQueue to prevent deadlock// This method is synchronized to make sure that the locking order// "taskTrackers lock followed by faultyTrackers.potentiallyFaultyTrackers// lock" is under JobTracker lock to avoid deadlocks.
    • 6. JobTracker JIP TIP Scheduler Heartbeat RequestHeartbeatResponse JobTracker Global Lock
    • 7. Highly Concurrent Systems• scales much better (if done right)• makes effective use of multi-core hardwares• even more important in master slave architectures• overall job latencies of the systems can come down drastically• managing eventual consistency of states hard• need for a systemic framework to manage this
    • 8. Event Queue Event Dispatcher Component Component Component A B N• Mutations only via events• Components only expose Read APIs• Use Re-entrant locks• Components follow clear lifecycle Event Model
    • 9. Heartbeat Listener Event Q NM Info Heartbeat Request Get commandsHeartbeatResponse Asynchronous Heartbeat Handling
    • 10. RM NM Heartbeats MR Scheduler Container Launcher Event QueueClient Handler Job Task Attempt Task Listener HeartbeatsJob Client MR AM Task
    • 11. StateManagement
    • 12. Looks Familiar ?public synchronized void updateTaskStatus(TaskInProgress tip, TaskStatus status) { double oldProgress = tip.getProgress(); // save old progress boolean wasRunning = tip.isRunning(); Very Hard to Maintain boolean wasComplete = tip.isComplete(); boolean wasPending = tip.isOnlyCommitPending(); Debugging even harder TaskAttemptID taskid = status.getTaskID(); boolean wasAttemptRunning = tip.isAttemptRunning(taskid); // If the TIP is already completed and the task reports as SUCCEEDED then // mark the task as KILLED. // In case of task with no promotion the task tracker will mark the task // as SUCCEEDED. // User has requested to kill the task, but TT reported SUCCEEDED, // mark the task KILLED. if ((wasComplete || tip.wasKilled(taskid)) && (status.getRunState() == TaskStatus.State.SUCCEEDED)) { status.setRunState(TaskStatus.State.KILLED); } // If the job is complete and a task has just reported its // state as FAILED_UNCLEAN/KILLED_UNCLEAN, // make the tasks state FAILED/KILLED without launching cleanup attempt. // Note that if task is already a cleanup attempt, // we dont change the state to make sure the task gets a killTaskAction if ((this.isComplete() || jobFailed || jobKilled) && !tip.isCleanupAttempt(taskid)) { if (status.getRunState() == TaskStatus.State.FAILED_UNCLEAN) { status.setRunState(TaskStatus.State.FAILED); } else if (status.getRunState() == TaskStatus.State.KILLED_UNCLEAN) { status.setRunState(TaskStatus.State.KILLED); } }
    • 13. Complex State Management• Light weight State Machines Library – Declarative way of specifying the state Transitions – Invalid transitions are handled automatically – Fits nicely with the event model – Debuggability is drastically improved. Lineage of object states can easily be determined – Handy while recovering the state
    • 14. Declarative State MachineStateMachineFactory<JobImpl, JobState, JobEventType, JobEvent> stateMachineFactory = new StateMachineFactory<JobImpl, JobState, JobEventType, JobEvent>(JobState.NEW) // Transitions from NEW state .addTransition(JobState.NEW, JobState.NEW, JobEventType.JOB_DIAGNOSTIC_UPDATE, DIAGNOSTIC_UPDATE_TRANSITION) .addTransition(JobState.NEW, JobState.NEW, JobEventType.JOB_COUNTER_UPDATE, COUNTER_UPDATE_TRANSITION) .addTransition (JobState.NEW, EnumSet.of(JobState.INITED, JobState.FAILED), JobEventType.JOB_INIT, new InitTransition()) .addTransition(JobState.NEW, JobState.KILLED, JobEventType.JOB_KILL, new KillNewJobTransition())
    • 15. YARN: New Possibilities• Open MPI - MR-2911• Master-Worker – MR-3315• Distributed Shell• Graph processing – Giraph-13• BSP – HAMA-431• CEP – Storm •• Iterative processing - Spark •
    • 16. Thank You!@twitter: sharad_ag