YARN - Under the Hood
Sharad Agarwal discusses the architecture of YARN and how it handles concurrency. YARN uses an event-driven model with asynchronous processing to improve scalability. It employs a lightweight state machine library to better manage state transitions and make debugging easier. This declarative approach to state management improves on Hadoop 1's approach which was difficult to maintain and debug.
2. About me
• Apache Foundation
– Hadoop Committer and PMC member
– Hadoop MR contributor for over 4 years
– Author of Hadoop Nextgen/YARN core and MR
Application Master
– Organizer of Hadoop Bangalore Meetups
• Head of Technology Platforms @InMobi
– Formerly Architect @Yahoo!
5. Concurrent System ?
// Loop through all expired items in the queue
//
// Need to lock the JobTracker here since we are
// manipulating it's data-structures via
// ExpireTrackers.run -> JobTracker.lostTaskTracker ->
// JobInProgress.failedTask -> JobTracker.markCompleteTaskAttempt
// Also need to lock JobTracker before locking 'taskTracker' &
// 'trackerExpiryQueue' to prevent deadlock
// This method is synchronized to make sure that the locking order
// "taskTrackers lock followed by faultyTrackers.potentiallyFaultyTrackers
// lock" is under JobTracker lock to avoid deadlocks.
6. JobTracker JIP TIP Scheduler
Heartbeat
Request
Heartbeat
Response
JobTracker Global Lock
7. Highly Concurrent Systems
• scales much better (if done right)
• makes effective use of multi-core hardwares
• even more important in master slave
architectures
• overall job latencies of the systems can come
down drastically
• managing eventual consistency of states hard
• need for a systemic framework to manage this
8. Event Queue Event
Dispatcher
Component Component Component
A B N
• Mutations only via events
• Components only expose Read APIs
• Use Re-entrant locks
• Components follow clear lifecycle
Event Model
9. Heartbeat
Listener Event Q NM Info
Heartbeat
Request
Get commands
Heartbeat
Response
Asynchronous
Heartbeat Handling
12. Looks Familiar ?
public synchronized void updateTaskStatus(TaskInProgress tip,
TaskStatus status) {
double oldProgress = tip.getProgress(); // save old progress
boolean wasRunning = tip.isRunning();
Very Hard to Maintain
boolean wasComplete = tip.isComplete();
boolean wasPending = tip.isOnlyCommitPending(); Debugging even harder
TaskAttemptID taskid = status.getTaskID();
boolean wasAttemptRunning = tip.isAttemptRunning(taskid);
// If the TIP is already completed and the task reports as SUCCEEDED then
// mark the task as KILLED.
// In case of task with no promotion the task tracker will mark the task
// as SUCCEEDED.
// User has requested to kill the task, but TT reported SUCCEEDED,
// mark the task KILLED.
if ((wasComplete || tip.wasKilled(taskid)) &&
(status.getRunState() == TaskStatus.State.SUCCEEDED)) {
status.setRunState(TaskStatus.State.KILLED);
}
// If the job is complete and a task has just reported its
// state as FAILED_UNCLEAN/KILLED_UNCLEAN,
// make the task's state FAILED/KILLED without launching cleanup attempt.
// Note that if task is already a cleanup attempt,
// we don't change the state to make sure the task gets a killTaskAction
if ((this.isComplete() || jobFailed || jobKilled) &&
!tip.isCleanupAttempt(taskid)) {
if (status.getRunState() == TaskStatus.State.FAILED_UNCLEAN) {
status.setRunState(TaskStatus.State.FAILED);
} else if (status.getRunState() == TaskStatus.State.KILLED_UNCLEAN) {
status.setRunState(TaskStatus.State.KILLED);
}
}
13.
14. Complex State Management
• Light weight State Machines Library
– Declarative way of specifying the state Transitions
– Invalid transitions are handled automatically
– Fits nicely with the event model
– Debuggability is drastically improved. Lineage of
object states can easily be determined
– Handy while recovering the state
15. Declarative State Machine
StateMachineFactory<JobImpl, JobState, JobEventType, JobEvent> stateMachineFactory
= new StateMachineFactory<JobImpl, JobState, JobEventType, JobEvent>(JobState.NEW)
// Transitions from NEW state
.addTransition(JobState.NEW, JobState.NEW,
JobEventType.JOB_DIAGNOSTIC_UPDATE,
DIAGNOSTIC_UPDATE_TRANSITION)
.addTransition(JobState.NEW, JobState.NEW,
JobEventType.JOB_COUNTER_UPDATE, COUNTER_UPDATE_TRANSITION)
.addTransition
(JobState.NEW,
EnumSet.of(JobState.INITED, JobState.FAILED),
JobEventType.JOB_INIT,
new InitTransition())
.addTransition(JobState.NEW, JobState.KILLED,
JobEventType.JOB_KILL,
new KillNewJobTransition())