Building Applications on YARN

  1. Building Applications on YARN. Chris Riccomini, 10/11/2012.
  2. Staff Software Engineer at LinkedIn. @criccomini
  3. What I Want to Talk About: Anatomy of a YARN application; Things to consider when building your application; Architecture; Operations.
  4. Anatomy of a YARN App: Client; Application Master; Container Code; Resource Manager; Node Manager.
  5. Anatomy of a YARN App (diagram, simplified): Clients, Resource Manager (RM), Node Managers (NM), Application Masters (AM), Container Code (CC).
  6. A lot to consider: Deployment; Logging; Metrics; Fault Tolerance; Configuration; Isolation; Security; Dashboard; Language; State.
  7. Deployment: HDFS; HTTP; File (NFS); DDoS’ing your servers. What we do: Tarball over HTTP. Life is easier with HDFS, but operational overhead is too high.
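The "tarball over HTTP" approach on the Deployment slide can be sketched as a fetch-and-unpack step that runs before the application code starts. This is a minimal illustration only: `fetch_and_unpack`, the `package.tgz` file name, and the URL are hypothetical, and this is not YARN's own resource localization mechanism.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_unpack(package_url: str, work_dir: str) -> Path:
    """Download a gzipped tarball and unpack it under work_dir (hypothetical helper)."""
    dest = Path(work_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "package.tgz"  # assumed local file name

    # Stream the package to disk instead of holding it in memory.
    with urllib.request.urlopen(package_url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)

    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where the runtime supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

Every container that starts performs this download, which is where the slide's warning about overloading the HTTP servers comes from.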
  8. Metrics: Application-level metrics; YARN-level metrics; metrics2; Containers are transient. What we do: Both app-level and framework-level metrics use the same metrics framework. Pipe to in-house metrics dashboard. We don’t use metrics2 since we don’t want a dependency on Hadoop in our core jar.
  9. Metrics
  10. Configuration: YARN config (yarn-site.xml, core-site.xml, etc.); Application configuration; Transporting configuration. What we do: Config is fully resolved at client execution time. No admin-override/locked config protection yet. Config is passed from client to AM to containers via environment variables.
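Passing fully resolved config from client to AM to containers via environment variables could look roughly like the sketch below. The variable name `APP_CONFIG` and the JSON encoding are assumptions made for illustration; the slides do not say how the config is encoded.

```python
import json
import os

# Hypothetical environment variable name; the slides do not specify one.
CONFIG_ENV_VAR = "APP_CONFIG"


def config_to_env(config: dict) -> dict:
    """Client/AM side: serialize the resolved config into a launch environment."""
    return {CONFIG_ENV_VAR: json.dumps(config, sort_keys=True)}


def config_from_env(environ=None) -> dict:
    """AM/container side: read the config back from the process environment."""
    environ = os.environ if environ is None else environ
    return json.loads(environ[CONFIG_ENV_VAR])
```

The same environment mapping can be forwarded unchanged at each hop, so the containers see exactly what the client resolved.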
  11. Security: Kerberos?; Firewalls are your friend; Gateway machine; Dashboard. What we do: Firewall all YARN machines so they can only talk to each other. All users go through an LDAP-controlled dashboard.
  12. Language: Favor complexity in the Application Master, and keep container logic thin; Talk to RM via REST; Potential to talk to RM via Protobuf RPC. What we do: The application's AM is Java. The task side of the application has Python and Java implementations.
  13. Logging: Local storage (application is running); HDFS storage (application has stopped for a while); Be careful with STDOUT/STDERR (rollover). What we do: No HDFS. Logs sit for 7 days, then disappear. Not ideal.
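The "logs sit for 7 days, then disappear" policy on the Logging slide amounts to an age check on local log files. A minimal sketch, assuming file modification time is the age signal and using a hypothetical `expired_logs` helper (the slides do not describe the actual cleanup mechanism):

```python
import time
from pathlib import Path

RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7 days, per the slide


def expired_logs(log_dir, now=None, retention=RETENTION_SECONDS):
    """Return log files under log_dir whose mtime is older than the retention window."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in Path(log_dir).rglob("*")
        if path.is_file() and now - path.stat().st_mtime > retention
    )
```

A cleanup job would delete the returned paths; keeping selection separate from deletion makes the policy easy to dry-run.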
  14. Fault Tolerance: Failure matrix; HA RM/NM; Orphaned processes; Pay attention to process trees. What we do: No HA. Manual failover when the RM dies. Orphaned process monitor (proc start time < RM start time).
  15. Fault Tolerance
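The orphaned process rule from the Fault Tolerance slides (proc start time < RM start time) can be sketched as a simple filter. `ProcInfo` and `find_orphans` are hypothetical names, and the sketch assumes process start times have already been collected as epoch seconds by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcInfo:
    pid: int
    start_time: float  # epoch seconds (assumed unit)


def find_orphans(processes, rm_start_time):
    """Return PIDs of processes that started before the current RM did.

    After a manual failover, anything older than the new RM cannot have
    been launched by it, so it is treated as orphaned.
    """
    return [p.pid for p in processes if p.start_time < rm_start_time]
```

For example, with an RM started at time 200, a process started at 100 is flagged and one started at 250 is not.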
  16. Isolation: Memory; Disk; CPU; Network. What we do: Nothing, right now. Hoping YARN will solve this before we need it (cgroups?).
  17. Dashboard: Application-specific information; Integrate with YARN; Application Master or standalone? What we do: Dashboard enforces security, talks to RM/AM via HTTP/JSON to get information about jobs.
  18. Dashboard
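A dashboard that talks to the RM over HTTP/JSON mostly needs to turn the RM's application listing into rows. The sketch below assumes the response shape of the ResourceManager REST endpoint `/ws/v1/cluster/apps` (an `apps` object wrapping an `app` list); `summarize_apps` is a hypothetical helper and the payload in the usage note is made-up sample data.

```python
import json


def summarize_apps(payload: str):
    """Extract id, name, and state for each application in an RM apps response."""
    body = json.loads(payload)
    # The "apps" value may be null when no applications exist.
    apps = (body.get("apps") or {}).get("app", [])
    return [
        {"id": app.get("id"), "name": app.get("name"), "state": app.get("state")}
        for app in apps
    ]
```

Given a body such as `{"apps": {"app": [{"id": "application_1_0001", "name": "demo", "state": "RUNNING"}]}}`, the helper returns one row with those three fields; fetching the body (and enforcing access control in front of it) is left to the dashboard.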
  19. State: HDFS; Deployed with application; Remote data store. What we do: Nothing, right now.
  20. Takeaways: There’s a lot more than just the YARN API; Look for examples (Spark, Storm, Map-Reduce); Decide your level of Hadoop integration (Metrics2, HDFS, Config, Kerberos and doAs).
  21. Questions?