YARN Under The Hood

Apache Hadoop
YARN - Under the Hood

Sharad Agarwal
sharad@apache.org

About me
• Apache Foundation
– Hadoop Committer and PMC member
– Hadoop MR contributor for over 4 years
– Author of Hadoop Nextgen/YARN core and MR
Application Master
– Organizer of Hadoop Bangalore Meetups

• Head of Technology Platforms @InMobi
– Formerly Architect @Yahoo!

• Component
Architecture

YARN CORE • Concurrency

• State
Management

Concurrent System ?
// Loop through all expired items in the queue
//
// Need to lock the JobTracker here since we are
// manipulating it's data-structures via
// ExpireTrackers.run -> JobTracker.lostTaskTracker ->
// JobInProgress.failedTask -> JobTracker.markCompleteTaskAttempt
// Also need to lock JobTracker before locking 'taskTracker' &
// 'trackerExpiryQueue' to prevent deadlock

// This method is synchronized to make sure that the locking order
// "taskTrackers lock followed by faultyTrackers.potentiallyFaultyTrackers
// lock" is under JobTracker lock to avoid deadlocks.

JobTracker JIP TIP Scheduler

Heartbeat
Request

Heartbeat
Response

JobTracker Global Lock

Highly Concurrent Systems
• scales much better (if done right)
• makes effective use of multi-core hardwares
• even more important in master slave
architectures
• overall job latencies of the systems can come
down drastically
• managing eventual consistency of states hard
• need for a systemic framework to manage this

Event Queue Event
Dispatcher

Component Component Component
A B N

• Mutations only via events
• Components only expose Read APIs
• Use Re-entrant locks
• Components follow clear lifecycle

Event Model

Heartbeat
Listener Event Q NM Info

Heartbeat
Request

Get commands

Heartbeat
Response

Asynchronous
Heartbeat Handling

RM NM

Heartbeats

MR Scheduler Container Launcher

Event Queue

Client Handler Job Task Attempt Task Listener

Heartbeats

Job Client MR AM Task

Looks Familiar ?
public synchronized void updateTaskStatus(TaskInProgress tip,
TaskStatus status) {

double oldProgress = tip.getProgress(); // save old progress
boolean wasRunning = tip.isRunning();
Very Hard to Maintain
boolean wasComplete = tip.isComplete();
boolean wasPending = tip.isOnlyCommitPending(); Debugging even harder
TaskAttemptID taskid = status.getTaskID();
boolean wasAttemptRunning = tip.isAttemptRunning(taskid);

// If the TIP is already completed and the task reports as SUCCEEDED then
// mark the task as KILLED.
// In case of task with no promotion the task tracker will mark the task
// as SUCCEEDED.
// User has requested to kill the task, but TT reported SUCCEEDED,
// mark the task KILLED.
if ((wasComplete || tip.wasKilled(taskid)) &&
(status.getRunState() == TaskStatus.State.SUCCEEDED)) {
status.setRunState(TaskStatus.State.KILLED);
}

// If the job is complete and a task has just reported its
// state as FAILED_UNCLEAN/KILLED_UNCLEAN,
// make the task's state FAILED/KILLED without launching cleanup attempt.
// Note that if task is already a cleanup attempt,
// we don't change the state to make sure the task gets a killTaskAction
if ((this.isComplete() || jobFailed || jobKilled) &&
!tip.isCleanupAttempt(taskid)) {
if (status.getRunState() == TaskStatus.State.FAILED_UNCLEAN) {
status.setRunState(TaskStatus.State.FAILED);
} else if (status.getRunState() == TaskStatus.State.KILLED_UNCLEAN) {
status.setRunState(TaskStatus.State.KILLED);
}
}

Complex State Management
• Light weight State Machines Library
– Declarative way of specifying the state Transitions
– Invalid transitions are handled automatically
– Fits nicely with the event model
– Debuggability is drastically improved. Lineage of
object states can easily be determined
– Handy while recovering the state

Declarative State Machine
StateMachineFactory<JobImpl, JobState, JobEventType, JobEvent> stateMachineFactory
= new StateMachineFactory<JobImpl, JobState, JobEventType, JobEvent>(JobState.NEW)

// Transitions from NEW state
.addTransition(JobState.NEW, JobState.NEW,
JobEventType.JOB_DIAGNOSTIC_UPDATE,
DIAGNOSTIC_UPDATE_TRANSITION)
.addTransition(JobState.NEW, JobState.NEW,
JobEventType.JOB_COUNTER_UPDATE, COUNTER_UPDATE_TRANSITION)
.addTransition
(JobState.NEW,
EnumSet.of(JobState.INITED, JobState.FAILED),
JobEventType.JOB_INIT,
new InitTransition())
.addTransition(JobState.NEW, JobState.KILLED,
JobEventType.JOB_KILL,
new KillNewJobTransition())

YARN: New Possibilities
• Open MPI - MR-2911
• Master-Worker – MR-3315
• Distributed Shell
• Graph processing – Giraph-13
• BSP – HAMA-431
• CEP – Storm
• https://github.com/nathanmarz/storm/issues/74
• Iterative processing - Spark
• https://github.com/mesos/spark-yarn/

Thank You!
@twitter: sharad_ag

YARN Under The Hood

More Related Content

Viewers also liked

Similar to YARN Under The Hood

Recently uploaded

YARN Under The Hood

Editor's Notes