Building Applications on YARN
Chris Riccomini
10/11/2012
Staff Software Engineer at LinkedIn http://riccomini.name @criccomini

What I Want to Talk About
- Anatomy of a YARN application
- Things to consider when building your application
  - Architecture
  - Operations

Anatomy of a YARN App
- Client
- Application Master
- Container Code
- Resource Manager
- Node Manager

[Diagram, simplified: Client, Resource Manager (RM), Node Manager (NM), Application Master (AM), Container Code (CC)]

A Lot to Consider
- Deployment
- Metrics
- Configuration
- Security
- Language
- Logging
- Fault Tolerance
- Isolation
- Dashboard
- State

Deployment
- HDFS
- HTTP
- File (NFS)
- DDoS'ing your servers
What we do: Tarball over HTTP. Life is easier with HDFS, but operational overhead is too high.

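The "tarball over HTTP" approach can be sketched roughly as follows. This is only a minimal illustration, not the deployment code described in the talk: the function name `fetch_and_unpack` and the gzip-compressed tarball format are assumptions.

```python
import tarfile
import urllib.request


def fetch_and_unpack(package_url, dest_dir):
    """Download a gzipped tarball and unpack it into dest_dir.

    Hypothetical helper: the URL layout and package format are assumptions.
    Only unpack tarballs from a source you trust.
    """
    with urllib.request.urlopen(package_url) as response:
        # "r|gz" reads the archive as a forward-only stream, so the tarball
        # is unpacked while it downloads instead of being saved first.
        with tarfile.open(fileobj=response, mode="r|gz") as archive:
            archive.extractall(dest_dir)
```

Every container that starts will call something like this against the same web server, which is where the "DDoS'ing your servers" concern comes from.
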
Metrics
- Application-level metrics
- YARN-level metrics
- metrics2
- Containers are transient
What we do: Both app-level and framework-level metrics use the same metrics framework. Pipe to an in-house metrics dashboard. We don't use metrics2 since we don't want a dependency on Hadoop in our core jar.

Configuration
- YARN config (yarn-site.xml, core-site.xml, etc.)
- Application configuration
- Transporting configuration
What we do: Config is fully resolved at client execution time. No admin-override/locked config protection yet. Config is passed from client to AM to containers via environment variables.

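Passing a fully resolved config through environment variables could look something like the sketch below. The variable name `APP_CONFIG` and the JSON encoding are assumptions made for illustration; the talk only says that environment variables carry the config from client to AM to containers.

```python
import json
import os

# Hypothetical variable name; the talk does not say which names are used.
CONFIG_ENV_VAR = "APP_CONFIG"


def config_to_env(config, base_env=None):
    """Return a copy of base_env with the resolved config added as one variable."""
    env = dict(os.environ if base_env is None else base_env)
    # JSON is an assumed encoding: one string that survives the env boundary.
    env[CONFIG_ENV_VAR] = json.dumps(config, sort_keys=True)
    return env


def config_from_env(env=None):
    """Recover the config dict on the receiving side (AM or container)."""
    env = os.environ if env is None else env
    return json.loads(env[CONFIG_ENV_VAR])
```

The client would build the environment with `config_to_env` when launching the AM, and the AM would repeat the same step when launching containers, so every process sees the config exactly as the client resolved it.
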
Security
- Kerberos?
- Firewalls are your friend
- Gateway machine
- Dashboard
What we do: Firewall all YARN machines so they can only talk to each other. All users go through an LDAP-controlled dashboard.

Language
- Favor complexity in the Application Master, and make container logic thin
- Talk to RM via REST
- Potential to talk to RM via Protobuf RPC
What we do: The application's AM is Java. The task side of the application has Python and Java implementations.

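As an illustration of talking to the RM over REST from a non-Java language, the sketch below builds the URL for the ResourceManager's cluster applications resource (`/ws/v1/cluster/apps`) and picks the running application IDs out of a response body. The helper names are hypothetical, and the response shape assumed here (`apps` containing an `app` list with `id` and `state` fields) should be checked against the REST documentation for your Hadoop version.

```python
import json


def cluster_apps_url(rm_http_address):
    """Build the RM web-services URL for listing applications.

    rm_http_address is "host:port" of the RM web UI (an assumed input format).
    """
    return "http://%s/ws/v1/cluster/apps" % rm_http_address


def running_app_ids(response_body):
    """Return the IDs of applications whose state is RUNNING."""
    payload = json.loads(response_body)
    # "apps" may be null when the cluster has no applications.
    apps = (payload.get("apps") or {}).get("app", [])
    return [app["id"] for app in apps if app.get("state") == "RUNNING"]
```

The body passed to `running_app_ids` would come from an HTTP GET on `cluster_apps_url(...)` with an `Accept: application/json` header.
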
Logging
- Local storage (application is running)
- HDFS storage (application has stopped for a while)
- Be careful with STDOUT/STDERR (rollover)
What we do: No HDFS. Logs sit for 7 days, then disappear. Not ideal.

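A 7-day retention policy for local logs can be approximated with a sketch like the one below. It is an assumption-laden illustration: the talk does not say how logs are expired, and using file modification time as the age is a choice made here.

```python
import os
import time

RETENTION_SECONDS = 7 * 24 * 60 * 60  # the 7 days mentioned in the talk


def expired_logs(log_dir, now=None, retention=RETENTION_SECONDS):
    """Return the sorted paths under log_dir last modified more than `retention` seconds ago."""
    now = time.time() if now is None else now
    expired = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            # Assumption: a file's mtime is a good enough proxy for its age.
            if now - os.path.getmtime(path) > retention:
                expired.append(path)
    return sorted(expired)
```

A cleanup job could call `os.remove` on each returned path, or copy the file somewhere durable first if losing logs after a week is not acceptable.
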
Fault Tolerance
- Failure matrix
- HA RM/NM
- Orphaned processes
- Pay attention to process trees
What we do: No HA. Manual failover when RM dies. Orphaned process monitor (proc start time < RM start time).
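
The orphan check (proc start time < RM start time) reduces to a simple comparison once the start times are known. The sketch below shows only that comparison; `ProcInfo` and `find_orphans` are hypothetical names, and collecting the start times of container processes and of the RM is left out because the talk does not describe how that is done.

```python
from typing import Iterable, List, NamedTuple


class ProcInfo(NamedTuple):
    pid: int
    start_time: float  # seconds since the epoch


def find_orphans(procs: Iterable[ProcInfo], rm_start_time: float) -> List[int]:
    """Return PIDs of processes that started before the current RM did.

    A container process older than the running RM must have been launched
    under a previous RM instance, so the current RM is not tracking it.
    """
    return [p.pid for p in procs if p.start_time < rm_start_time]
```

For example, with an RM that started at time 200, a process started at 100 is flagged while one started at 300 is not.
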