
Timeline Service v.2 (Hadoop Summit 2016)



Hadoop Summit talk on YARN Timeline Service v.2

Published in: Software


  1. 1. (Big Data)²: How YARN Timeline Service v.2 Unlocks 360-Degree Platform Insights at Scale. Sangjin Lee @sjlee (Twitter), Li Lu (Hortonworks), Vrushali Channapattan @vrushalivc (Twitter)
  2. 2. Outline • Why v.2? • Highlights • Developing for Timeline Service v.2 • Setting up Timeline Service v.2 • Milestones • Demo
  3. 3. Why v.2? • YARN Timeline Service v 1.x • Gained good adoption: Tez, Hive, Pig, etc. • Keeps improving with v 1.5 APIs and storage implementation • Still facing some fundamental challenges...
  4. 4. Why v.2? • Scalability and reliability challenges • Single instance of Timeline Server • Storage (single local LevelDB instance) • Usability • Flow • Metrics and configuration as first-class citizens • Metrics aggregation up the entity hierarchy
  5. 5. Highlights (v.1 vs. v.2) • v.1: Single writer/reader Timeline Server; v.2: Distributed writer/collector architecture • v.1: Single local LevelDB storage*; v.2: Scalable storage (HBase) • v.1: v.1 entity model; v.2: New v.2 entity model • v.1: No aggregation; v.2: Metrics aggregation • v.1: REST API; v.2: Richer query REST API
  6. 6. Architecture • Separation of writers (“collectors”) and readers • Distributed collectors: one collector for each app • Dedicated RM collector for RM-generated data • Collector discovery via RM • Pluggable storage with HBase as default storage
  7. 7. Distributed collectors & readers [Diagram. Write flow: on the worker node running the AM, the AM sends app metrics/events to the timeline collector; on worker nodes running containers, the NM sends container events/metrics; the RM timeline collector records app/container events; collectors write to storage. Read flow: a timeline reader pool serves user queries from storage.]
  8. 8. Collector discovery [Diagram. (1) The RM starts the AM container on a worker node, where the NM hosts the timeline collector. (2) The app id => address mapping reaches the RM via node heartbeat. (3) The RM returns the collector address to the AM's timeline client in the allocate response.]
  9. 9. New entity model • Flows and flow runs as parents of YARN application entities • First-class configuration (key-value pairs) • First-class metrics (single-value or time series) • Designed to handle multi-cluster environment out of the box
  10. 10. What is a flow? • A flow is a group of YARN applications that are launched as parts of a logical app • Oozie, Scalding, Pig, etc. • name: “frequent_visitor_stat” • run id: 1466097809000 • version: “b9b9068”
  11. 11. Configuration and metrics • Now explicit top-level attributes of entities • Fine-grained updates and queries made possible • “update metric A to value x” • “query entities where config A = B” [Diagram: container 1_1; metric: A = 10; metric: B = 100; config: "Foo" = "bar"]
  12. 12. Configuration and metrics • Now explicit top-level attributes of entities • Fine-grained updates and queries made possible • “update metric A to value x” • “query entities where config A = B” [Diagram: container 1_1; metric: A = 50; metric: B = 100; config: "Foo" = "bar"]
  13. 13. HBase Storage • Scalable backend • Row Key structure • efficient range scans • KeyPrefixRegionSplitPolicy • Filter pushdown • Coprocessors for flow aggregation (“readless” aggregation) • Cell tags for metadata (application id, aggregation operation) • Cell timestamps generated during put • left shifted with app id added to avoid overwrites
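The before/after pair on slides 11 and 12 can be read as one fine-grained update: only metric A changes (10 to 50), while metric B and the config are left alone. A minimal Java sketch of that idea follows. `EntitySketch` and its methods are hypothetical names invented for illustration and are not the Hadoop `TimelineEntity` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an entity with first-class metrics and configs.
// It is not the Hadoop TimelineEntity class; it only illustrates the idea.
class EntitySketch {
    final String id;
    final Map<String, Long> metrics = new HashMap<>();
    final Map<String, String> configs = new HashMap<>();

    EntitySketch(String id) {
        this.id = id;
    }

    // "update metric A to value x": overwrites one metric and nothing else.
    void updateMetric(String name, long value) {
        metrics.put(name, value);
    }

    // "query entities where config A = B", expressed as a per-entity check.
    boolean configEquals(String key, String expected) {
        return expected.equals(configs.get(key));
    }

    // Builds the container shown on the first of the two slides.
    static EntitySketch containerFromSlide() {
        EntitySketch c = new EntitySketch("container 1_1");
        c.metrics.put("A", 10L);
        c.metrics.put("B", 100L);
        c.configs.put("Foo", "bar");
        return c;
    }

    public static void main(String[] args) {
        EntitySketch c = containerFromSlide();
        c.updateMetric("A", 50L);
        System.out.println("A=" + c.metrics.get("A") + ", B=" + c.metrics.get("B"));
        System.out.println("Foo is bar: " + c.configEquals("Foo", "bar"));
    }
}
```

Because metrics and configs are separate keyed maps rather than one opaque blob, an update or a filter can address a single key without reading or rewriting the rest of the entity.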
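The last bullet on the HBase Storage slide says cell timestamps are left shifted with the app id added so that writes from different apps at the same instant do not overwrite each other. One way this could work is sketched below. The decimal multiplier of 1,000,000 and the use of a numeric app sequence number as the suffix are assumptions for illustration, not details given in the slides.

```java
// Sketch of a "left-shifted timestamp plus app id" scheme.
// MULTIPLIER and the suffix choice are assumptions, not taken from the slides.
class CellTimestampSketch {
    // Assumed multiplier: leaves six decimal digits for an app-specific suffix.
    static final long MULTIPLIER = 1_000_000L;

    // Shifts the millisecond timestamp left and fills the low digits
    // with a value derived from the app's sequence number.
    static long supplement(long timestampMillis, long appSequenceNumber) {
        return timestampMillis * MULTIPLIER + (appSequenceNumber % MULTIPLIER);
    }

    // Recovers the original millisecond timestamp.
    static long original(long supplemented) {
        return supplemented / MULTIPLIER;
    }

    public static void main(String[] args) {
        long now = 1466097809000L; // run id value used elsewhere in the deck
        long fromApp1 = supplement(now, 1L);
        long fromApp2 = supplement(now, 2L);
        // Same wall-clock instant, different cell timestamps.
        System.out.println(fromApp1 != fromApp2);
        System.out.println(original(fromApp1) == now);
    }
}
```

With distinct cell timestamps, two apps writing the same metric column at the same millisecond produce two cell versions instead of one overwriting the other, which is what lets a coprocessor later sum them.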
  14. 14. Tables in HBase • flow run • application • entity • flow activity • app to flow
  15. 15. table: flow run • Row key: clusterId!userName!flowName!inverted(flowRunId) • most recent flow run stored first • coprocessor enabled
  16. 16. table: application • Row key: clusterId!userName!flowName!inverted(flowRunId)!AppId • applications within a flow run stored together • most recent flow run stored first
  17. 17. table: entity • Row key: userName!clusterId!flowName!inverted(flowRunId)!AppId!entityType!entityId • entities within an application within a flow run stored together per type • for example, all containers within a YARN application will be stored together • pre-split table • stores information per entity run like info, relatesTo, relatedTo, events, metrics, config
  18. 18. table: flow activity • Row key: clusterId!inverted(TopOfTheDay)!userName!flowName • shows the flows that ran on that day • stores information per flow like number of runs, the run ids, versions
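The flow run and application row keys above rely on inverting the flow run id so that the most recent run sorts first in an ascending scan. A minimal sketch of that ordering trick follows. It assumes inversion means `Long.MAX_VALUE - value` and uses zero-padded strings in place of HBase byte encoding; the cluster and user names are made up for illustration.

```java
import java.util.TreeSet;

// Sketch of row keys with an inverted flow run id.
// String keys stand in for HBase byte arrays; this is an assumption.
class RowKeySketch {
    // Assumed inversion: larger (newer) ids become smaller values.
    static long invert(long value) {
        return Long.MAX_VALUE - value;
    }

    // clusterId!userName!flowName!inverted(flowRunId)
    static String flowRunKey(String cluster, String user, String flow, long runId) {
        // Zero-pad to 19 digits so string order matches numeric order.
        return String.join("!", cluster, user, flow,
                String.format("%019d", invert(runId)));
    }

    // clusterId!userName!flowName!inverted(flowRunId)!AppId
    static String applicationKey(String cluster, String user, String flow,
                                 long runId, String appId) {
        return flowRunKey(cluster, user, flow, runId) + "!" + appId;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>(); // sorted like an HBase table
        table.add(flowRunKey("c1", "alice", "frequent_visitor_stat", 1466097809000L));
        table.add(flowRunKey("c1", "alice", "frequent_visitor_stat", 1466184209000L));
        // The first key in sort order belongs to the later run.
        System.out.println(table.first());
    }
}
```

Since the application key extends the flow run key as a prefix, all applications of one run are contiguous, and a prefix scan over `cluster!user!flow!` returns runs newest first.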
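The flow activity key buckets flows by day, so every run of a flow on the same day lands in the same row. The sketch below shows one way to compute such a bucket. It assumes "top of the day" means midnight UTC and, as before, that inversion is `Long.MAX_VALUE - value`; neither detail is spelled out in the slides.

```java
// Sketch of the flow activity row key with a per-day bucket.
// Midnight-UTC truncation and string keys are assumptions.
class FlowActivityKeySketch {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Truncates a millisecond timestamp to the start of its UTC day.
    static long topOfTheDay(long timestampMillis) {
        return timestampMillis - (timestampMillis % MILLIS_PER_DAY);
    }

    static long invert(long value) {
        return Long.MAX_VALUE - value;
    }

    // clusterId!inverted(TopOfTheDay)!userName!flowName
    static String key(String cluster, long timestampMillis, String user, String flow) {
        return String.join("!", cluster,
                String.format("%019d", invert(topOfTheDay(timestampMillis))),
                user, flow);
    }

    public static void main(String[] args) {
        long run = 1466097809000L;
        long oneHourLater = run + 60L * 60 * 1000;
        // Two runs on the same day share one row key.
        System.out.println(key("c1", run, "alice", "frequent_visitor_stat")
                .equals(key("c1", oneHourLater, "alice", "frequent_visitor_stat")));
    }
}
```

Placing the inverted day right after the cluster id means a scan from the start of a cluster's key range walks days from newest to oldest, which suits a "what ran recently" view.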
  19. 19. table: appToFlow Row key: clusterId!appId - stores mapping of appId to flowName and flowRunId
  20. 20. Metrics aggregation • Application level • Rolls up sub-application metrics • Performed in real time in the collectors in memory • Flow run level • Rolls up app level metrics • Performed in HBase region servers via coprocessors • Offline aggregation (TBD) • Rolls up on user, queue, and flow offline periodically • Phoenix tables [Diagram: Container 1_1 “bytes”: 23; Container 1_2 “bytes”: 135; Container 2_1 “bytes”: 50; Container 3_1 “bytes”: 64. App aggregation in collector: App1 “bytes”: 158; App2 “bytes”: 50; App3 “bytes”: 64. Flow aggregation in HBase: flow1 “bytes”: 208; flow2 “bytes”: 64. Offline aggregation: user1 “bytes”: 272; queue1 “bytes”: 272.]
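The numbers in the aggregation diagram can be reproduced with a plain sum at each level. The sketch below is an in-memory illustration only; it does not model the collector or the HBase coprocessor. The child-to-parent mappings are inferred from the entity names and the totals on the slide, and `RollupSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory illustration of rolling a metric up the entity hierarchy.
class RollupSketch {
    // Sums each child's value into the parent that the child maps to.
    static Map<String, Long> rollUp(Map<String, Long> childValues,
                                    Map<String, String> parentOf) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : childValues.entrySet()) {
            totals.merge(parentOf.get(e.getKey()), e.getValue(), Long::sum);
        }
        return totals;
    }

    // "bytes" per container, as shown on the slide.
    static Map<String, Long> containerBytes() {
        return Map.of("1_1", 23L, "1_2", 135L, "2_1", 50L, "3_1", 64L);
    }

    // Mappings inferred from the names and totals on the slide.
    static Map<String, String> containerToApp() {
        return Map.of("1_1", "App1", "1_2", "App1", "2_1", "App2", "3_1", "App3");
    }

    static Map<String, String> appToFlow() {
        return Map.of("App1", "flow1", "App2", "flow1", "App3", "flow2");
    }

    static Map<String, String> flowToUser() {
        return Map.of("flow1", "user1", "flow2", "user1");
    }

    public static void main(String[] args) {
        Map<String, Long> apps = rollUp(containerBytes(), containerToApp());
        Map<String, Long> flows = rollUp(apps, appToFlow());
        Map<String, Long> users = rollUp(flows, flowToUser());
        System.out.println("App1=" + apps.get("App1"));
        System.out.println("flow1=" + flows.get("flow1"));
        System.out.println("user1=" + users.get("user1"));
    }
}
```

Each level consumes only the output of the level below it, which is why the three stages can run in different places (collector memory, region servers, offline jobs) as the slide describes.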
  21. 21. FlowRun Aggregation via the HBase Coprocessor [Diagram: App Metrics Cells in HBase; FlowRun Metric Sum]
  22. 22. FlowRun Aggregation via the HBase Coprocessor [Diagram: App Metrics Cells in HBase; FlowRun Metric Sum]
  23. 23. Reader REST API: paths • URLs under /ws/v2/timeline • Canonical REST style URLs: /ws/v2/timeline/clusters/cluster_name/users/user_name/flows/flow_name/runs/run_id • Path elements may be omitted if they can be inferred • flow context can be inferred by app id • default cluster is assumed if cluster is omitted
  24. 24. Reader REST API: query params • limit, createdTimeStart, createdTimeEnd: constrain the entities • fields (ALL | EVENTS | INFO | CONFIGS | METRICS | RELATES_TO | IS_RELATED_TO): limit the contents to return • metricsToRetrieve, confsToRetrieve: further limit the contents to return • metricsLimit: limits the number of values in a time series
  25. 25. Reader REST API: query params • relatesTo, isRelatedTo: filters by association • *Filters: filters by info, config, metric, event, … • Supports complex filters including operators • metricFilter=(((metric1 eq 50) AND (metric2 gt 40)) OR (metric1 lt 20))
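The canonical path on the slide is a fixed sequence of name/value segments, with the cluster segment optional. A small sketch of assembling it is shown below. `ReaderPathSketch` is a hypothetical helper, the sample names are made up, and URL encoding of the segments is left out for brevity.

```java
// Hypothetical helper that assembles a flow run path for the reader REST API.
class ReaderPathSketch {
    static final String BASE = "/ws/v2/timeline";

    // Pass null for cluster to omit it; per the slide, the default
    // cluster is then assumed by the reader.
    static String flowRunPath(String cluster, String user, String flow, long runId) {
        StringBuilder sb = new StringBuilder(BASE);
        if (cluster != null) {
            sb.append("/clusters/").append(cluster);
        }
        sb.append("/users/").append(user)
          .append("/flows/").append(flow)
          .append("/runs/").append(runId);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(flowRunPath("c1", "alice", "frequent_visitor_stat", 1466097809000L));
        System.out.println(flowRunPath(null, "alice", "frequent_visitor_stat", 1466097809000L));
    }
}
```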
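To make the semantics of the example filter concrete, the sketch below evaluates the same boolean expression against an in-memory map of metric values. This only illustrates what the expression means; it assumes a missing metric never matches, and it says nothing about how the reader parses the filter or pushes it down to storage. All names are hypothetical.

```java
import java.util.Map;
import java.util.function.Predicate;

// In-memory evaluation of the filter expression from the slide.
class MetricFilterSketch {
    // Assumption: an entity without the metric does not match.
    static Predicate<Map<String, Long>> eq(String metric, long value) {
        return m -> m.containsKey(metric) && m.get(metric) == value;
    }

    static Predicate<Map<String, Long>> gt(String metric, long value) {
        return m -> m.containsKey(metric) && m.get(metric) > value;
    }

    static Predicate<Map<String, Long>> lt(String metric, long value) {
        return m -> m.containsKey(metric) && m.get(metric) < value;
    }

    // (((metric1 eq 50) AND (metric2 gt 40)) OR (metric1 lt 20))
    static Predicate<Map<String, Long>> slideFilter() {
        return eq("metric1", 50L).and(gt("metric2", 40L)).or(lt("metric1", 20L));
    }

    public static void main(String[] args) {
        Predicate<Map<String, Long>> filter = slideFilter();
        System.out.println(filter.test(Map.of("metric1", 50L, "metric2", 41L)));
        System.out.println(filter.test(Map.of("metric1", 30L, "metric2", 100L)));
    }
}
```

An entity passes either by satisfying both conditions in the AND branch or by having a small `metric1` on its own; a large `metric2` alone is not enough.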
  26. 26. Developing: TimelineClient In your application master:
        // create TimelineClient v.2 style
        TimelineClient client = TimelineClient.createTimelineClient(appId);
        client.init(conf);
        client.start();
        // bind it to AM/RM client to receive the collector address
        amRMClient.registerTimelineClient(client);
        // create and write timeline entities
        TimelineEntity entity = new TimelineEntity();
        client.putEntities(entity);
        // when the app is complete, stop the timeline client
        client.stop();
  27. 27. Developing: Flow context In your app submitter:
        ApplicationSubmissionContext appContext =
            app.getApplicationSubmissionContext();
        // set the flow context as YARN application tags
        Set<String> tags = new HashSet<>();
        tags.add(TimelineUtils.generateFlowNameTag("distributed grep"));
        tags.add(TimelineUtils.generateFlowVersionTag(
            "3df8b0d6100530080d2e0decf9e528e57c42a90a"));
        tags.add(TimelineUtils.generateFlowRunIdTag(System.currentTimeMillis()));
        appContext.setApplicationTags(tags);
  28. 28. Setting up Timeline Service v.2 • Set up the HBase cluster (1.1.x) • Add the timeline service jar to HBase • Install the flow run coprocessor • Create tables via TimelineSchemaCreator utility • Configure the YARN cluster • Enable Timeline Service v.2 • Add hbase-site.xml for the timeline collector and readers • Start the timeline reader daemon
  29. 29. Milestone 1 ("Alpha 1") • Merge discussion (YARN-2928) in progress as we speak! ✓ Complete end-to-end read/write flow ✓ Real time application and flow aggregation ✓ New entity model ✓ HBase Storage ✓ Rich REST API ✓ Integration with Distributed Shell and MapReduce ✓ YARN generic events and system metrics
  30. 30. Milestones - Future • Milestone 2 (“Alpha 2”) • Integration with new YARN UI • Integration with more frameworks • Beta • Freeze API and storage schema • Security • Collectors as containers • Storage fault tolerance • Production-ready • Migration-ready
  31. 31. Demo
  32. 32. Contributors • Li Lu, Junping Du, Vinod Kumar Vavilapalli (Hortonworks) • Varun Saxena, Naganarasimha G. R. (Huawei) • Sangjin Lee, Vrushali Channapattan, Joep Rottinghuis (Twitter) • Zhijie Shen (now at Facebook) • The HBase and Phoenix community!
  33. 33. Thank you!