Vinod Kumar Vavilapalli
Apache Hadoop PMC, Co-founder of YARN project
Hortonworks Inc
A Multi-Colored YARN
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About.html
• Apache Hadoop PMC, ASF Member
• 9 years of only Hadoop
– Finally the job-adverts asking for “10 years of Hadoop experience” have validity
• ‘Rewrote’ the Hadoop processing side, which became Apache Hadoop YARN
• With me today
– Billie Rinaldi: VP Apache Accumulo, Apache Slider PMC, ASF Member
– Jayush Luniya: Apache Ambari PMC
– Vadim Vaks: Kickass field guy (Sr. Solutions Architect)
Hadoop Compute Platform Today
• Layers that enable applications and higher-order frameworks
• It’s all about data!
• Still a single-colored yarn
• Apache Hadoop YARN is pretty good at jobs, queries and short-running apps
– We will continue doing this
• Admins and admin tools (Ambari) take care of statically provisioned services
Hadoop Compute Platform Today
(Diagram: Platform Services [Storage, Resource Management, Security, Management, Monitoring, Alerts, Governance]; MR, Tez, Spark …)
• Run everything in a single secure, multi-tenant, elastic Hadoop YARN cluster
– An ongoing journey
• Adding new ‘stuff’ to this stack is an involved effort
Evolution of user focus
• A need for reuse, composition and continually building ‘upwards’
• Applications, services and more complex combinations of them: Assemblies
(Diagram: IOT Applications, Apache Metron)
Emerging needs of the platform
(Diagram: IOT Applications, Apache Metron)
• Simplified deployment of an assembly
– Ready-to-go packages
– Discovery
– Resource/capacity planning
• Management / monitoring / metrics of assemblies!
– “Start / stop my business app end-to-end”
– “Tell me what’s happening with my business application”
– “I don’t care whether an HBase RegionServer is down or not; is my assembly healthy?”
• Scale the entire app up or down!
– “I’ve got more input coming in. I don’t care how you scale the individual pieces, but do scale the entire machinery”
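A minimal sketch of the two assembly-level questions above: “is my assembly healthy?” and “scale the entire machinery”. The `Component` model, the per-component `min_healthy` floor and the proportional round-up policy are hypothetical choices for illustration. They are not a YARN or Slider API.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One service inside an assembly (hypothetical model, not a YARN API)."""
    name: str
    running: int      # containers currently up
    desired: int      # containers the assembly asks for
    min_healthy: int  # fewest running containers this component tolerates


def assembly_healthy(components: list[Component]) -> bool:
    """Healthy if every component is at or above its floor, even when
    individual instances (say, one HBase RegionServer) are down."""
    return all(c.running >= c.min_healthy for c in components)


def scale_assembly(components: list[Component], factor: float) -> dict[str, int]:
    """Scale the whole assembly by one factor: each component's new target
    is rounded up and never drops below its health floor."""
    return {c.name: max(c.min_healthy, math.ceil(c.desired * factor))
            for c in components}


iot = [
    Component("kafka", running=3, desired=3, min_healthy=2),
    Component("storm", running=4, desired=4, min_healthy=2),
    Component("hbase-regionserver", running=4, desired=5, min_healthy=3),
]
print(assembly_healthy(iot))     # True: one RegionServer is down, still above the floor
print(scale_assembly(iot, 1.5))  # {'kafka': 5, 'storm': 6, 'hbase-regionserver': 8}
```

The health rule stays per component, so the operator asks one question about the whole assembly. Each service still decides what “degraded but serving” means for it.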
Why on YARN?
• Manual plumbing is tiresome and not repeatable
• Assemblies are similar to apps & services, but N times harder (because there are N services to grapple with)
• Why not static allocations?
– Machines die
– Jobs (MapReduce, Spark) are tolerant of faults, but static services aren’t!
– Requires upfront capacity planning
– Cannot react to hardware or utilization changes without manual intervention
– Elasticity is a manual operation
• This is fundamentally the same resource-management problem that YARN is built to address!
Why on YARN? Contd.
• The Apache Hadoop ecosystem knows data services best – YARN is data-first!
• Big Data use-cases don’t stop at Hadoop services and apps
– Hive for all the data, with summaries in a traditional on-demand DB that drives analysts’ work
– Extracting results from HDP and hosting report servers and interactive UIs like Apache Zeppelin
• Users don’t care about this separation
– Big Data is already a huge cluster on one side
– Asking for another infrastructure, and separately managing this other stuff, is burdensome
– Unified solution >> Silos
Hadoop Compute Platform Next
• A colorful, multi-threaded yarn
• For use-cases of various colors
• Run today’s applications better
• Simplified long-running applications
• Bring your app easily
Image: https://www.flickr.com/photos/happyskrappy/15699919424
What is happening now?
Packaging
• Containers
– Lightweight mechanism for packaging and resource isolation
– Popularized and made accessible by Docker
– Can replace VMs in some cases
– Or more accurately, VMs got used in places where they didn’t need to be
• Native integration ++ in YARN
– Support for “Container Runtimes” in the LinuxContainerExecutor (LCE): YARN-3611
– Process runtime
– Docker runtime
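A sketch of how an application could ask for the Docker runtime for a container. The environment variable names `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` come from the Hadoop Docker-container documentation for releases that include YARN-3611. The `container_launch_env` helper is hypothetical. The sketch also assumes the cluster admin has already allowed the Docker runtime on the NodeManagers.

```python
from __future__ import annotations


def container_launch_env(runtime: str = "default",
                         image: str | None = None,
                         base_env: dict[str, str] | None = None) -> dict[str, str]:
    """Hypothetical helper: build a container's launch environment.

    'default' leaves the plain process runtime in place; 'docker' asks the
    LinuxContainerExecutor to run the container inside `image`.
    The caller's base_env is copied, never modified.
    """
    env = dict(base_env or {})
    if runtime == "default":
        return env
    if runtime != "docker":
        raise ValueError(f"unsupported runtime: {runtime!r}")
    if not image:
        raise ValueError("the docker runtime needs an image")
    env["YARN_CONTAINER_RUNTIME_TYPE"] = "docker"
    env["YARN_CONTAINER_RUNTIME_DOCKER_IMAGE"] = image
    return env


print(container_launch_env("docker", "centos:7", {"JAVA_HOME": "/usr/lib/jvm/java"}))
```

Selecting the runtime per container is what lets process-based and Docker-based workloads share one cluster.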
APIs
• Applications need simple APIs
• Need to be deployable “easily”
• Simple REST API layer fronting YARN
– https://issues.apache.org/jira/browse/YARN-4793
– [Umbrella] Simplified API layer for services and beyond
• Spawn services & manage them
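To make “spawn services” concrete, here is a sketch that builds a declarative service request without sending it. The JSON field names (`number_of_containers`, `launch_command`, `resource`) and the `/app/v1/services` path follow the YARN Service API that later shipped in Apache Hadoop 3.x. Relative to this proposal, treat them as assumptions, along with the ResourceManager address and the component name.

```python
import json
import urllib.request


def service_spec(name: str, command: str, instances: int,
                 cpus: int = 1, memory_mb: int = 256) -> dict:
    """Declarative description of a long-running service with one component."""
    return {
        "name": name,
        "version": "1.0",
        "components": [{
            "name": name + "-worker",  # hypothetical component name
            "number_of_containers": instances,
            "launch_command": command,
            "resource": {"cpus": cpus, "memory": str(memory_mb)},
        }],
    }


def create_service_request(rm_address: str, spec: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST that would ask YARN to spawn the service."""
    return urllib.request.Request(
        url=f"http://{rm_address}/app/v1/services",  # path assumed from later Hadoop 3.x docs
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# rm.example.com:8088 is a placeholder ResourceManager address.
req = create_service_request("rm.example.com:8088",
                             service_spec("sleeper", "sleep 900000", 2))
print(req.get_method(), req.full_url)  # POST http://rm.example.com:8088/app/v1/services
```

The user only states *what* should run and how many instances. Placing containers and keeping them alive is left to the platform.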
Platform++
• YARN itself is evolving to support services and complex apps
– https://issues.apache.org/jira/browse/YARN-4692
– [Umbrella] Simplified and first-class support for services in YARN
• Scheduling
– Application priorities: YARN-1963
– Affinity / anti-affinity: YARN-1042
– Services as first-class citizens: preemption, reservations, etc.
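A toy illustration of how application priorities (YARN-1963) and anti-affinity (YARN-1042) interact when capacity is scarce. This is not how the YARN schedulers are implemented. The slot-per-node model, the `schedule` function and the “most free slots first” preference are all assumptions made for the sketch.

```python
import heapq


def schedule(apps: list[tuple[str, int, int]],
             free_slots: dict[str, int]) -> dict[str, list[str]]:
    """Toy placement with application priorities and anti-affinity.

    apps:       (name, priority, containers); a higher priority is placed first,
                equal priorities keep submission order.
    free_slots: free container slots per node (copied, not modified).
    Anti-affinity: an app gets at most one container per node, so it may end
    up with fewer containers than it asked for.
    """
    slots = dict(free_slots)
    # heapq is a min-heap: negate priority so the highest pops first.
    heap = [(-prio, order, name, count)
            for order, (name, prio, count) in enumerate(apps)]
    heapq.heapify(heap)
    placement: dict[str, list[str]] = {}
    while heap:
        _, _, name, count = heapq.heappop(heap)
        chosen: list[str] = []
        # Visit each node at most once (anti-affinity), most free slots first.
        for node in sorted(slots, key=lambda n: -slots[n]):
            if len(chosen) == count:
                break
            if slots[node] > 0:
                slots[node] -= 1
                chosen.append(node)
        placement[name] = chosen
    return placement


print(schedule([("batch-etl", 1, 3), ("hbase", 10, 3)], {"n1": 2, "n2": 1, "n3": 1}))
# {'hbase': ['n1', 'n2', 'n3'], 'batch-etl': ['n1']}
```

The service (`hbase`) was submitted later but is placed first and spread over three nodes. The lower-priority job gets only what is left, and anti-affinity keeps it from doubling up on a node.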
Platform++ Contd.
• Application & service upgrades
– “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
– YARN-4726
• Simplified discovery of services via DNS mechanisms: YARN-4757
• YARN Federation – to infinity and beyond: YARN-2915
• Easier container sizing models via resource profiles: YARN-3926
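With DNS-based discovery (YARN-4757), clients resolve a stable name instead of tracking which host a container landed on. The sketch assumes a `<component-instance>.<service>.<user>.<domain>` naming scheme, similar to the one the YARN registry DNS documented later. The example names and the in-memory zone are illustrative only.

```python
def instance_fqdn(instance: str, service: str, user: str, domain: str) -> str:
    """DNS name of one component instance (naming scheme is an assumption)."""
    labels = [instance, service, user, domain]
    if not all(labels):
        raise ValueError("every DNS label must be non-empty")
    return ".".join(label.lower() for label in labels)


class ServiceZone:
    """Toy in-memory zone: the name stays fixed while the container moves."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}

    def register(self, fqdn: str, ip: str) -> None:
        # Re-registering after a relaunch simply replaces the old address.
        self._records[fqdn] = ip

    def resolve(self, fqdn: str) -> str:
        return self._records[fqdn]


zone = ServiceZone()
name = instance_fqdn("regionserver-0", "hbase-app-3", "hbase", "ycluster")
zone.register(name, "10.0.0.12")
zone.register(name, "10.0.0.47")  # container relaunched on another host
print(name, zone.resolve(name))   # regionserver-0.hbase-app-3.hbase.ycluster 10.0.0.47
```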
Framework++
• Platform is only as good as the tools
• A native YARN framework
– https://issues.apache.org/jira/browse/YARN-4692
– [Umbrella] Native YARN framework layer for services and beyond
• Slider supporting a DAG of apps:
– https://issues.apache.org/jira/browse/SLIDER-875
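One piece of managing a DAG of apps as a unit is starting them in dependency order. The sketch below groups apps into start “waves” using Python’s standard `graphlib`. The dependency edges are hypothetical, built from the IOT assembly’s components (Kafka, Storm, HBase, Solr), and the `start_order` function is not part of Slider.

```python
from graphlib import TopologicalSorter


def start_order(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group an assembly's apps into start 'waves'.

    depends_on maps each app to the apps that must be running before it.
    Apps in one wave do not depend on each other and could start in parallel.
    graphlib.CycleError is raised for circular dependencies.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical dependencies for the IOT assembly's pieces.
iot_assembly = {"storm": {"kafka", "hbase", "solr"}}
print(start_order(iot_assembly))  # [['hbase', 'kafka', 'solr'], ['storm']]
```

Stopping the assembly end-to-end would walk the same waves in reverse, so that no app loses a dependency while it is still running.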
User-facing and operational experience
• Modern YARN web UI: YARN-3368
• Enhanced shell interfaces
• Metrics: Timeline Service v2, YARN-2928
• Application & service monitoring, integration with other systems
• First-class support for YARN-hosted services in Ambari
– https://issues.apache.org/jira/browse/AMBARI-17353
Use-cases… Assemble!
(Diagram: Platform Services [Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts, Governance]; MR, Tez, Spark …; Holiday Assembly: HBase, Web Server; IOT Assembly: Kafka, Storm, HBase, Solr)
Take away..
Thank You
(Rest of) the demo team
• Gour Saha
• Sidhartha Seethana
• Varun Vasudev
• Shane Kumpf
• Jaimin Jetly
• Yusaku Sako
• Yu Liu