Building Applications on YARN



On Friday, I presented "Building Applications on YARN" at the Apache YARN meet-up at Hortonworks. This post contains my slides.


Building Applications on YARN

  1. Building Applications on YARN. Chris Riccomini, 10/11/2012
  2. Staff Software Engineer at LinkedIn. @criccomini
  3. What I Want to Talk About
     - Anatomy of a YARN application
     - Things to consider when building your application: architecture, operations
  4. Anatomy of a YARN App
     - Client
     - Application Master
     - Container code
     - Resource Manager
     - Node Manager
  5. Anatomy of a YARN App (architecture diagram, simplified: clients submit applications to the Resource Manager; Application Masters and container code run under Node Managers)
  6. A Lot to Consider
     - Deployment
     - Logging
     - Metrics
     - Fault tolerance
     - Configuration
     - Isolation
     - Security
     - Dashboard
     - Language
     - State
  7. Deployment
     - HDFS
     - HTTP
     - File (NFS)
     - DDOS'ing your servers
     - What we do: tarball over HTTP. Life is easier with HDFS, but the operational overhead is too high.
  8. Metrics
     - Application-level metrics
     - YARN-level metrics
     - metrics2
     - Containers are transient
     - What we do: both app-level and framework-level metrics use the same metrics framework, piped to our in-house metrics dashboard. We don't use metrics2, since we don't want a dependency on Hadoop in our core jar.
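A minimal sketch of the idea in the metrics slide: a tiny Hadoop-free registry that both framework code and application code report through, so everything can be piped to an external dashboard without pulling metrics2 into the core jar. The class and metric names here are illustrative, not from the talk.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hadoop-free metrics registry: app-level and framework-level code
// share one API, and a reporter can ship the counters anywhere.
public class MetricsRegistry {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Increment a named counter, creating it on first use.
    public void inc(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    // Read a counter's current value (0 if never incremented).
    public long get(String name) {
        LongAdder a = counters.get(name);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        MetricsRegistry metrics = new MetricsRegistry();
        metrics.inc("container.restarts");   // framework-level metric
        metrics.inc("messages.processed");   // app-level metric
        metrics.inc("messages.processed");
        System.out.println(metrics.get("messages.processed")); // prints 2
    }
}
```

Because containers are transient, a registry like this would be flushed to the remote dashboard periodically rather than only at shutdown.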
  9. Metrics
  10. Configuration
     - YARN config (yarn-site.xml, core-site.xml, etc.)
     - Application configuration
     - Transporting configuration
     - What we do: config is fully resolved at client execution time. No admin-override/locked config protection yet. Config is passed from client to AM to containers via environment variables.
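A sketch of transporting fully resolved config through the container environment, as the configuration slide describes. The `CFG_` prefix and the dot-to-underscore key mangling are my own illustrative assumptions, not a YARN convention, and the round trip is lossy for keys that already contain underscores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Ship resolved config from client to AM to containers as
// environment variables on the launch context.
public class EnvConfig {
    static final String PREFIX = "CFG_"; // assumed naming scheme

    // Client side: flatten config into environment variables.
    public static Map<String, String> toEnv(Map<String, String> config) {
        Map<String, String> env = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : config.entrySet()) {
            env.put(PREFIX + e.getKey().toUpperCase().replace('.', '_'), e.getValue());
        }
        return env;
    }

    // Container side: recover config keys from the environment.
    public static Map<String, String> fromEnv(Map<String, String> env) {
        Map<String, String> config = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                String key = e.getKey().substring(PREFIX.length())
                              .toLowerCase().replace('_', '.');
                config.put(key, e.getValue());
            }
        }
        return config;
    }
}
```

Resolving everything on the client keeps the AM and containers dumb; the trade-off, as noted above, is that nothing stops a client from overriding admin settings.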
  11. Security
     - Kerberos?
     - Firewalls are your friend
     - Gateway machine
     - Dashboard
     - What we do: firewall all YARN machines so they can only talk to each other. All users go through an LDAP-controlled dashboard.
  12. Language
     - Favor complexity in the Application Master, and keep container logic thin
     - Talk to the RM via REST
     - Potential to talk to the RM via Protobuf RPC
     - What we do: the application's AM is Java. The task side of the application has Python and Java implementations.
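The REST route is what keeps non-Java task code possible: any language that can speak HTTP can query the cluster. A minimal sketch against the ResourceManager's `/ws/v1/cluster/apps` REST endpoint, assuming the RM's default web port (8088); host and port are placeholders for your cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Query the ResourceManager over REST instead of linking against
// Hadoop's Java client libraries.
public class RmRestClient {
    // Build the URI for the RM's application-list endpoint.
    public static URI appsUri(String rmHost, int rmPort) {
        return URI.create("http://" + rmHost + ":" + rmPort + "/ws/v1/cluster/apps");
    }

    // Fetch the JSON list of applications from the RM.
    public static String fetchApps(String rmHost, int rmPort) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(appsUri(rmHost, rmPort)).GET().build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Assumes an RM running locally on the default web port.
        System.out.println(fetchApps("localhost", 8088));
    }
}
```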
  13. Logging
     - Local storage (while the application is running)
     - HDFS storage (after the application has stopped for a while)
     - Be careful with STDOUT/STDERR (rollover)
     - What we do: no HDFS. Logs sit for 7 days, then disappear. Not ideal.
  14. Fault Tolerance
     - Failure matrix
     - HA RM/NM
     - Orphaned processes
     - Pay attention to process trees
     - What we do: no HA. Manual failover when the RM dies. Orphaned-process monitor (process start time < RM start time).
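The orphan check from the fault-tolerance slide boils down to one comparison: a container process that started before the current ResourceManager cannot have been launched by it, so it is a candidate for cleanup. A sketch of that predicate; reading real process start times (e.g. from `/proc/<pid>/stat` on Linux) is platform-specific and omitted.

```java
// Orphaned-process monitor predicate: processes predating the
// current RM's start time were launched by a previous RM incarnation.
public class OrphanMonitor {
    public static boolean isOrphan(long procStartMillis, long rmStartMillis) {
        return procStartMillis < rmStartMillis;
    }

    public static void main(String[] args) {
        long rmStart = 1_700_000_000_000L;  // hypothetical RM start time
        System.out.println(isOrphan(rmStart - 60_000, rmStart)); // started before RM: true
        System.out.println(isOrphan(rmStart + 60_000, rmStart)); // started after RM: false
    }
}
```

In practice the monitor would walk the process tree on each NodeManager host, which is why the slide warns to pay attention to process trees: killing only the parent can leave grandchildren running.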
  15. Fault Tolerance
  16. Isolation
     - Memory
     - Disk
     - CPU
     - Network
     - What we do: nothing, right now. Hoping YARN will solve this before we need it (cgroups?).
  17. Dashboard
     - Application-specific information
     - Integrate with YARN
     - Application Master or standalone?
     - What we do: the dashboard enforces security, and talks to the RM/AM via HTTP/JSON to get information about jobs.
  18. Dashboard
  19. State
     - HDFS
     - Deployed with the application
     - Remote data store
     - What we do: nothing, right now.
  20. Takeaways
     - There's a lot more than just the YARN API
     - Look for examples (Spark, Storm, Map-Reduce)
     - Decide your level of Hadoop integration: metrics2, HDFS, config, Kerberos and doAs
  21. Questions?