
Apache Hadoop 3.0 What's new in YARN and MapReduce


  1. Apache Hadoop 3.0: What’s new in YARN & MapReduce
     Tokyo, Oct. 26, 2016
     Junping Du, junping_du@apache.org
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. About Speakers
     ⬢ Junping Du
       – Apache Hadoop Committer & PMC member
       – Lead Software Engineer @ Hortonworks YARN Core Team
       – 10+ years developing enterprise software (5+ years as a “Hadooper”)
  3. Agenda
     ⬢ Evolutions in YARN & MR (Done and In Progress)
     ⬢ Timeline Estimation for Apache Hadoop 3.0 Release
  4. First, a Bit of Vision…
     ⬢ The evolution of Hadoop starts with YARN
     ⬢ YARN's evolution will continue to drive Hadoop forward
     Diagram label: Hadoop 3
  5. Several important trends in the age of Hadoop 3.0+
     ⬢ YARN and Other Platform Services: Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts
     ⬢ Other diagram labels: IOT Assembly, Kafka, Storm, HBase, Solr, Governance, MR, Tez, Spark, …
     ⬢ Innovating frameworks: Flink, DL (TensorFlow), etc.
     ⬢ Various Environments: On Premise, Private Cloud, Public Cloud
  6. Evolutions in YARN & MR
     ⬢ Re-architecture for YARN Timeline Service - ATS v2
     ⬢ Native Service Support in YARN
     ⬢ YARN Scheduling Enhancements
     ⬢ More Cloud Friendly
     ⬢ Better User Experiences
     ⬢ Other Enhancements
  7. Timeline Service Revolution – ATS v2
     ⬢ Why ATS v2?
       – Scalability & Performance: remove the v1 limitations
         • Single global instance of writer/reader
         • Local-disk-based LevelDB storage
       – Usability
         • Handle flows as first-class concepts and model aggregation
         • Add configuration and metrics as first-class members
         • Better support for queries
       – Reliability: v1 limitations
         • Data is stored on a local disk
         • Single point of failure (SPOF) for the timeline server
       – Flexibility
         • The data model is more descriptive
         • Can be extended with more app-specific info
  8. Core Design for ATS v2
     ⬢ Distributed write path
       – Logical per-app collector + physical per-node writer
       – Collector/Writer launched as an auxiliary service in the NM
       – Standalone writers will be added later
     ⬢ Pluggable backend storage
       – Built in with a scalable and reliable implementation (HBase)
     ⬢ Enhanced data model
       – Entity (bi-directional relation) with flow, queue, etc.
       – Configuration, Metric, Event, etc.
     ⬢ Separate reader instances
     ⬢ Aggregation & Accumulation
       – Aggregation: rolling up the metric values to the parent
         • Online aggregation for apps and flow runs
         • Offline aggregation for users, flows and queues
       – Accumulation: rolling up the metric values across a time interval
         • Accumulated resource consumption for app, flow, etc.
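The difference between aggregation and accumulation on the slide above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ATS v2 code: the function names, the metric names (`memory_mb`, `vcores`), the use of a sum as the roll-up, and the "value holds until the next sample" rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def aggregate_to_parent(app_metrics, app_to_flow_run):
    """Aggregation: roll each app's metric values up to its parent flow run.

    Assumption: the roll-up is a plain sum per metric name.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for app_id, metrics in app_metrics.items():
        parent = app_to_flow_run[app_id]
        for name, value in metrics.items():
            totals[parent][name] += value
    return {parent: dict(metrics) for parent, metrics in totals.items()}


def accumulate_over_time(samples):
    """Accumulation: roll one metric up across a time interval.

    `samples` is a list of (timestamp_sec, value) pairs sorted by time.
    Assumption: each value holds until the next timestamp, so the result
    is in value-seconds (for example MB-seconds).
    """
    total = 0.0
    for (t0, v0), (t1, _) in zip(samples, samples[1:]):
        total += v0 * (t1 - t0)
    return total


# Two hypothetical apps that belong to the same flow run.
app_metrics = {
    "app_1": {"memory_mb": 2048, "vcores": 2},
    "app_2": {"memory_mb": 1024, "vcores": 1},
}
app_to_flow_run = {"app_1": "flow_run_A", "app_2": "flow_run_A"}
flow_totals = aggregate_to_parent(app_metrics, app_to_flow_run)

# Memory samples for a single app over 30 seconds.
memory_series = [(0, 1024), (10, 2048), (30, 2048)]
mb_seconds = accumulate_over_time(memory_series)
```

Aggregation answers "how much is this flow run using right now", while accumulation answers "how much has this app consumed so far".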
  9. ATS v2 Architecture
     [Architecture diagram. Recoverable labels: Resource Manager, RMApp, NodeManager, Info of Collectors { app_1, app_2, … }, app_1 AM, Sync, app_1 Collector, app_n Collector, Aux Service, AM timeline info, Timeline Writer, RM app Events, NM Collector Service, NM_1 … NM_n, app_1 container, Container Monitor, Timeline Reader, User Queries, Container metric info, HBase, container info (to be added)]
  10. Data Model in ATS v2
      ⬢ Entity
        – ID + Type
        – Configurations
        – Metadata (Info)
        – Parent-Child Relationships
        – Metrics
        – Events
      ⬢ Metric
        – ID
        – Metadata
        – Single Value or Time Series (with timestamps)
      ⬢ Event
        – ID
        – Metadata
        – Timestamp
      ⬢ Entities of first class citizens
        – Cluster: Type, Cluster Attributes
        – Flow: Type, User, Flow Runs, Flow Attributes
        – Flow Run: Type, User, Running apps, Flow Run Attributes
        – Application: Type, User, Flow + Run, Queue, Attempts
        – Attempt: Type, Application, Queue, Containers
        – Container: Type, Attempt, Attributes
      ⬢ Aggregation
        – User: Username (ID), Aggregated metrics
        – Queue: Queue (ID), Sub queues, Aggregated metrics
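The entity model listed on the slide above can be sketched as a few plain data classes. This is an illustrative model only, assuming the field groupings read off the slide; the class and field names are hypothetical and are not the actual ATS v2 Java classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Metric:
    id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    # timestamp -> value; one entry is a single value, several form a time series
    values: Dict[int, float] = field(default_factory=dict)

    def is_time_series(self) -> bool:
        return len(self.values) > 1


@dataclass
class Event:
    id: str
    timestamp: int
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Entity:
    id: str
    type: str
    configs: Dict[str, str] = field(default_factory=dict)
    info: Dict[str, str] = field(default_factory=dict)
    metrics: List[Metric] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    # Relations are stored as (type, id) pairs on both sides (bi-directional).
    parents: Set[Tuple[str, str]] = field(default_factory=set)
    children: Set[Tuple[str, str]] = field(default_factory=set)

    def add_child(self, child: "Entity") -> None:
        """Record the relation on the parent and on the child."""
        self.children.add((child.type, child.id))
        child.parents.add((self.type, self.id))


# Hypothetical IDs, loosely following the names used in the architecture slide.
app = Entity(id="app_1", type="Application", configs={"queue": "default"})
attempt = Entity(id="attempt_1", type="Attempt")
app.add_child(attempt)

memory = Metric(id="memory_mb", values={0: 1024.0, 10: 2048.0})
app.metrics.append(memory)
app.events.append(Event(id="app_started", timestamp=0))
```

Storing the relation on both entities mirrors the "bi-directional relation" wording of the core design slide: a reader can walk from an application down to its attempts or from an attempt back up to its application.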
  11. Status for ATS v2
      ⬢ For other details, such as:
        – Aggregations (app/flow/user/queue level, offline or online)
        – HBase table schema for EntityTable, ApplicationTable, FlowRunTable, etc.
        – Reader APIs (RESTful)
        please refer to the previous talk at Hadoop Summit 2016 San Jose: https://www.youtube.com/watch?v=adV-DFa-8us&index=6&list=PLKnYDs_-dq16K1NH83Bke2dGGUO3YKZ5b
      ⬢ Status
        – Phase I (YARN-2928): already released as an alpha feature in 3.0.0-alpha1
        – Phase II (YARN-5355): In progress
  12. Native Service Support in YARN
      ⬢ A native YARN framework: YARN-4692
        – Abstract a common framework (similar to Slider) to support long-running services
        – More simplified API
      ⬢ Better support for long-running services
        – Recognition of long-running services
          • Affects the policy for preemption, container reservation, etc.
        – Auto-restart of containers
          • Containers in long-running services are more stateful
        – Service/application upgrade support
          • More services are expected to run long enough to span versions
        – Dynamic container configuration
          • Reserve resources only when they are needed
  13. API Simplification - REST
      ⬢ Existing APIs are too low level and not easy to work with.
      ⬢ Simple REST API layer fronting YARN
        – YARN-4793: Simplified API layer for services and beyond
      ⬢ Create and manage the lifecycle of YARN services.
      Example: ZooKeeper App
  14. Discovery services in YARN
      ⬢ YARN Service Discovery via DNS: YARN-4757
        – Expose existing service information in the YARN registry via DNS
          • Current YARN service registry records will be converted into DNS entries
        – Enable container-to-IP mappings, so container IPs can be discovered via standard DNS lookups
          • Application – zkapp1.user1.yarncluster.com -> 192.168.10.11:8080
          • Container – container-1454001598828-0001-01-00004.yarncluster.com -> 192.168.10.18
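The two DNS examples on the slide above suggest a simple naming scheme, sketched here as a toy registry-to-DNS conversion. The patterns `<app>.<user>.<domain>` and `<container-id>.<domain>` are inferred from those two examples only, and the function names, the record layout, and the underscore-to-hyphen conversion are assumptions for illustration, not the YARN-4757 implementation.

```python
def application_dns_name(app_name, user, domain):
    """Assumed pattern, inferred from the slide: <app>.<user>.<domain>."""
    return f"{app_name}.{user}.{domain}".lower()


def container_dns_name(container_id, domain):
    """Assumed pattern: <container-id>.<domain>.

    Assumption: underscores in the ID become hyphens, since underscores
    are not valid in host names and the slide's example uses hyphens.
    """
    return f"{container_id.replace('_', '-')}.{domain}".lower()


def build_dns_table(registry_records, domain):
    """Convert hypothetical registry records into a name -> address table."""
    table = {}
    for record in registry_records:
        if record["kind"] == "application":
            name = application_dns_name(record["name"], record["user"], domain)
        else:
            name = container_dns_name(record["name"], domain)
        table[name] = record["address"]
    return table


# Records mirroring the two examples on the slide.
records = [
    {"kind": "application", "name": "zkapp1", "user": "user1",
     "address": "192.168.10.11:8080"},
    {"kind": "container", "name": "container_1454001598828_0001_01_00004",
     "address": "192.168.10.18"},
]
dns_table = build_dns_table(records, "yarncluster.com")
```

With entries like these served by a DNS server, a client would need only a standard resolver to find a service, with no YARN client library involved.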
  15. More Cloud Friendly
      ⬢ Elastic
        – Dynamic Resource Configuration
          • YARN-291
          • Allows tuning an NM's resources up or down at runtime
        – Graceful decommissioning of NodeManagers
          • YARN-914
          • Drains a node that’s being decommissioned to allow running containers to finish
      ⬢ Efficient
        – Support for container resizing
          • YARN-1197
          • Allows applications to change the size of an existing container
        – Task level native optimization
          • MAPREDUCE-2841
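Graceful decommissioning, as described on the slide above, amounts to "stop accepting new work, then wait for running containers to finish". The toy simulation below sketches that idea. The `Node` class, the `drain` function, and the timeout after which remaining containers are stopped are all assumptions made for this sketch; the slide itself only states that the node is drained.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    # container id -> seconds of work remaining (simulated)
    running: Dict[str, int] = field(default_factory=dict)
    draining: bool = False

    def can_accept_containers(self) -> bool:
        # A draining node must not receive new containers.
        return not self.draining


def drain(node: Node, timeout_sec: int, step_sec: int = 1) -> List[str]:
    """Drain `node`; return the IDs of containers still running at timeout.

    Assumption: containers that outlive the timeout are stopped forcibly.
    """
    node.draining = True
    elapsed = 0
    while node.running and elapsed < timeout_sec:
        elapsed += step_sec
        # Advance simulated time; containers with no work left have finished.
        node.running = {
            cid: left - step_sec
            for cid, left in node.running.items()
            if left - step_sec > 0
        }
    stopped = sorted(node.running)
    node.running.clear()
    return stopped


node = Node(name="nm_1", running={"c1": 3, "c2": 10})
stopped_at_timeout = drain(node, timeout_sec=5)
```

Here `c1` finishes within the drain window while `c2` is still running when the timeout expires, which is the trade-off an operator tunes when picking a drain period.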
  16. More Cloud Friendly (Contd.)
      ⬢ Isolation
        – Embrace container technology to achieve better isolation
        – Resource isolation support for disk and network
          • YARN-2619 (disk), YARN-2140 (network)
          • Containers get a fair share of disk and network resources using Cgroups
        – Docker support in LinuxContainerExecutor
          • YARN-3611
          • Support launching Docker containers alongside processes
          • Packaging and resource isolation
      ⬢ Operation
        – Container upgrades (YARN-4726)
          • “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
        – Work-preserving AM restart
          • MAPREDUCE-6608
  17. Scheduling Enhancements
      ⬢ Application priorities: YARN-1963
        – Intra-queue priority support
      ⬢ Affinity / anti-affinity: YARN-1042
        – More constraints on container placement
      ⬢ Global Scheduling: YARN-5139
        – Get rid of the per-node scheduling model
        – Enhance container scheduling throughput
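Two of the scheduling ideas on the slide above, priorities within a queue and anti-affinity, can be illustrated with a few lines of toy code. This is a sketch rather than scheduler code: the function names, the dictionary layout, the "larger number means higher priority" convention, and the tag-based anti-affinity check are assumptions chosen for illustration.

```python
def order_by_priority(apps):
    """Order apps within one queue.

    Assumptions: a larger `priority` number is served first, and apps
    with equal priority are served in submission order (FIFO).
    """
    return sorted(apps, key=lambda app: (-app["priority"], app["submitted"]))


def place_with_anti_affinity(nodes, hosted_tags, tag):
    """Return the first node not already hosting a container with `tag`.

    Returns None when every node already hosts one, in which case the
    request would have to wait.
    """
    for node in nodes:
        if tag not in hosted_tags.get(node, set()):
            return node
    return None


queue = [
    {"id": "app_1", "priority": 1, "submitted": 100},
    {"id": "app_2", "priority": 5, "submitted": 200},
    {"id": "app_3", "priority": 5, "submitted": 150},
]
ordered_ids = [app["id"] for app in order_by_priority(queue)]

# node_1 already runs a container tagged "hbase-regionserver" (hypothetical tag).
hosted_tags = {"node_1": {"hbase-regionserver"}, "node_2": set()}
chosen = place_with_anti_affinity(
    ["node_1", "node_2"], hosted_tags, "hbase-regionserver"
)
```

Anti-affinity of this kind is what keeps two replicas of the same service off one machine, so a single node failure does not take both down.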
  18. Operational and User Experience Enhancements (YARN-3368)
  19. Other YARN work that could be released in Hadoop 3.x
      ⬢ Resource profiles
        – YARN-3926
        – Users can specify a resource profile name instead of individual resources
        – Resource types are read from a config file
      ⬢ YARN federation
        – YARN-2915
        – Allows YARN to scale out to tens of thousands of nodes
        – A cluster of clusters that appears as a single cluster to the end user
      ⬢ Gang Scheduling
        – YARN-624
      More details in tomorrow's noon session, “Apache Hadoop YARN: Past, Present and Future”, by Junping Du and Jian He
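The resource-profile idea on the slide above (request a named profile instead of listing each resource) can be sketched as a lookup against a configuration document. The JSON layout, the profile names (`small`, `medium`, `large`), the resource names, and both helper functions are hypothetical and chosen only for illustration; the actual YARN-3926 configuration format is not shown in the slides.

```python
import json

# Hypothetical profile definitions, standing in for a config file.
PROFILE_CONFIG = """
{
  "small":  {"memory_mb": 1024,  "vcores": 1},
  "medium": {"memory_mb": 4096,  "vcores": 2},
  "large":  {"memory_mb": 16384, "vcores": 8}
}
"""


def load_profiles(config_text):
    """Parse the profile definitions (assumed here to be JSON)."""
    return json.loads(config_text)


def resolve_request(profiles, profile_name):
    """Expand a profile name into the individual resources it stands for."""
    if profile_name not in profiles:
        raise KeyError(f"unknown resource profile: {profile_name}")
    # Return a copy so callers cannot mutate the shared definitions.
    return dict(profiles[profile_name])


profiles = load_profiles(PROFILE_CONFIG)
request = resolve_request(profiles, "medium")
```

Because resource types come from configuration, adding a new type would mean editing the profile definitions rather than changing every application's request.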
  20. Release Timeline for Apache Hadoop 3.0
      ⬢ 3.0.0-alpha1 was released on Sep. 3, 2016
      ⬢ alpha2 in Q4 2016 (Estimated)
      ⬢ beta1 in early Q1 2017 (Estimated)
      ⬢ GA in Q1/Q2 2017 (Estimated)
  21. HDP Evolution with Apache Hadoop and YARN
  22. Thank you!
