
Building Applications on YARN

On Friday, I presented "Building Applications on YARN" at the Apache YARN meet-up at Hortonworks. This post contains my slides.



  1. Building Applications on YARN. Chris Riccomini, 10/11/2012
  2. Staff Software Engineer at LinkedIn. @criccomini
  3. What I Want to Talk About: anatomy of a YARN application; things to consider when building your application: architecture and operations.
  4. Anatomy of a YARN App: Client, Application Master, Container Code, Resource Manager, Node Manager.
  5. Anatomy of a YARN App (simplified diagram): multiple clients submit to the Resource Manager (RM); Node Managers (NM) on each machine host the Application Master (AM) and the container code (CC).
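The flow on this slide can be sketched as a toy simulation (pure Python, no Hadoop involved; all class and method names here are illustrative, not YARN's actual API): the client submits an application to the RM, the RM asks an NM to launch the AM, and the AM then requests worker containers back from the RM.

```python
# Toy model of the YARN startup sequence: client -> RM -> NM -> AM -> containers.
# This only mirrors the roles on the slide; real code would use Hadoop's
# YarnClient/AMRMClient Java APIs instead.

class Process:
    def __init__(self, name):
        self.name = name

class NodeManager:
    """Launches processes (the AM or container code) on its machine."""
    def launch(self, name):
        return Process(name)

class ResourceManager:
    """Accepts application submissions and hands out containers."""
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def submit_application(self, app_name):
        # Pick a node to host the Application Master.
        nm = self.node_managers[0]
        return nm.launch(app_name + "-AM")

    def allocate(self, am, n):
        # The AM asks the RM for n containers; each lands on some NM.
        return [nm.launch(am.name + "-container")
                for nm in self.node_managers[:n]]

rm = ResourceManager([NodeManager(), NodeManager()])
am = rm.submit_application("my-app")
containers = rm.allocate(am, 2)
```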
  6. A Lot to Consider: Deployment, Logging, Metrics, Fault Tolerance, Configuration, Isolation, Security, Dashboard, Language, State.
  7. Deployment: HDFS; HTTP; file (NFS); DDOS'ing your servers. What we do: tarball over HTTP. Life is easier with HDFS, but the operational overhead is too high.
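The "DDOS'ing your servers" bullet refers to every container fetching the tarball at once when a large job starts. A common mitigation is to add random jitter before each fetch; here is a minimal sketch (the `fetch` callable and the jitter bound are hypothetical, not from the talk):

```python
import random
import time

def fetch_with_jitter(fetch, max_jitter_secs=30.0, rng=random.random):
    """Sleep a random interval before downloading, so hundreds of
    containers starting simultaneously don't hammer the HTTP server
    at the same instant. `fetch` is whatever callable actually pulls
    the tarball (e.g. a urllib download)."""
    time.sleep(rng() * max_jitter_secs)
    return fetch()

# Demo with zero jitter so it returns immediately:
result = fetch_with_jitter(lambda: "tarball-bytes", max_jitter_secs=0.0)
```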
  8. Metrics: application-level metrics; YARN-level metrics; metrics2; containers are transient. What we do: both app-level and framework-level metrics use the same metrics framework, piped to an in-house metrics dashboard. We don't use metrics2, since we don't want a dependency on Hadoop in our core jar.
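The idea of one shared, Hadoop-free metrics framework for both app-level and framework-level code can be sketched like this (names are illustrative; this is not LinkedIn's actual metrics library):

```python
import json
import threading

class MetricsRegistry:
    """Minimal thread-safe counter registry shared by AM and container
    code. A periodic reporter thread could serialize snapshot() and POST
    it to an in-house dashboard; because containers are transient, the
    dashboard, not the container, should own long-term history."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def inc(self, name, delta=1):
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + delta

    def snapshot(self):
        """JSON payload a reporter would ship to the dashboard."""
        with self._lock:
            return json.dumps(dict(self._counters))

registry = MetricsRegistry()
registry.inc("container.launched")
registry.inc("container.launched")
registry.inc("task.failed", 0)
```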
  9. Metrics
  10. Configuration: YARN config (yarn-site.xml, core-site.xml, etc.); application configuration; transporting configuration. What we do: config is fully resolved at client execution time. No admin-override/locked-config protection yet. Config is passed from client to AM to containers via environment variables.
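Passing fully-resolved config from client to AM to containers via environment variables can be sketched as a simple serialize/deserialize pair (the variable name is hypothetical; note that environment variables have size limits, so very large configs need another transport):

```python
import json

CONFIG_ENV_VAR = "APP_CONFIG_JSON"  # illustrative name, not from the talk

def pack_config(env, config):
    """Client side: stash the fully-resolved config into the environment
    dict that will be handed to the AM (and, transitively, to each
    container it launches)."""
    env[CONFIG_ENV_VAR] = json.dumps(config)
    return env

def unpack_config(env):
    """AM/container side: recover the config from the environment."""
    return json.loads(env[CONFIG_ENV_VAR])

env = pack_config({}, {"task.count": 4, "input.path": "/data/events"})
config = unpack_config(env)
```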
  11. Security: Kerberos? Firewalls are your friend; gateway machine; dashboard. What we do: firewall all YARN machines so they can only talk to each other. All users go through an LDAP-controlled dashboard.
  12. Language: favor complexity in the Application Master, and make container logic thin; talk to the RM via REST; potential to talk to the RM via protobuf RPC. What we do: the application's AM is Java. The task side of the application has Python and Java implementations.
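Talking to the RM via REST keeps non-Java code out of the Hadoop dependency tree. A sketch of consuming the RM's cluster-apps endpoint follows; in a live cluster the JSON would come from an HTTP GET of something like `http://<rm-host>:8088/ws/v1/cluster/apps`, but here a canned sample is parsed instead of making a network call, and the exact response shape should be checked against your Hadoop version's RM REST API docs:

```python
import json

def running_app_ids(apps_response):
    """Pull the IDs of RUNNING applications out of the RM's
    /ws/v1/cluster/apps JSON payload (structure assumed from the
    Hadoop 2.x-era v1 REST API: {"apps": {"app": [...]}})."""
    body = json.loads(apps_response)
    apps = (body.get("apps") or {}).get("app") or []
    return [a["id"] for a in apps if a.get("state") == "RUNNING"]

# Canned sample standing in for a real RM response:
sample = json.dumps({
    "apps": {"app": [
        {"id": "application_1349900000000_0001", "state": "RUNNING"},
        {"id": "application_1349900000000_0002", "state": "FINISHED"},
    ]}
})
```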
  13. Logging: local storage (while the application is running); HDFS storage (after the application has stopped for a while); be careful with STDOUT/STDERR (rollover). What we do: no HDFS. Logs sit for 7 days, then disappear. Not ideal.
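The rollover warning is about STDOUT/STDERR files growing without bound inside a long-lived container. One standard fix on the Python task side is a size-capped rotating log; a minimal sketch (directory, sizes, and names are illustrative):

```python
import logging
import logging.handlers
import os
import tempfile

def make_container_logger(log_dir, name="container"):
    """Route container output through a rotating file instead of letting
    STDOUT/STDERR grow forever: roll at 10 MB, keep 5 old files."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, name + ".log"),
        maxBytes=10 * 1024 * 1024,
        backupCount=5,
    )
    logger.addHandler(handler)
    return logger

log_dir = tempfile.mkdtemp()
logger = make_container_logger(log_dir)
logger.info("container started")
```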
  14. Fault Tolerance: failure matrix; HA RM/NM; orphaned processes; pay attention to process trees. What we do: no HA; manual failover when the RM dies. Orphaned-process monitor (proc start time < RM start time).
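The orphaned-process check on this slide reduces to one comparison: a container process that started before the current RM came up belongs to a previous RM incarnation and has nothing managing it. A sketch of just that predicate (on Linux, a process's start time can be derived from /proc/&lt;pid&gt;/stat plus boot time, omitted here):

```python
def is_orphan(proc_start_time, rm_start_time):
    """True if the process predates the current ResourceManager and is
    therefore unmanaged. Both arguments are epoch seconds."""
    return proc_start_time < rm_start_time

rm_start = 200.0
assert is_orphan(100.0, rm_start)       # survived an RM restart: orphan
assert not is_orphan(250.0, rm_start)   # launched by the current RM
```

A monitor would run this over every container process's start time and kill the matches.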
  15. Fault Tolerance
  16. Isolation: memory, disk, CPU, network. What we do: nothing, right now. Hoping YARN will solve this before we need it (cgroups?).
  17. Dashboard: application-specific information; integrate with YARN; Application Master or standalone? What we do: the dashboard enforces security and talks to the RM/AM via HTTP/JSON to get information about jobs.
  18. Dashboard
  19. State: HDFS; deployed with the application; remote data store. What we do: nothing, right now.
  20. Takeaways: there's a lot more than just the YARN API. Look for examples (Spark, Storm, Map-Reduce). Decide your level of Hadoop integration: metrics2, HDFS, config, Kerberos and doAs.
  21. Questions?