Build Enterprise Grade Applications
in YARN with Apache Twill
Poorna Chandra
poorna@cask.co
Big Data App Meetup
July 27, 2016
Agenda
● Hadoop YARN
● Challenges in building enterprise applications
● Apache Twill
● Architecture
● Features
● Real World Enterprise Use Case - CDAP
● Roadmap
● Q & A
2
First: The NEWS
to the Apache Twill Community!!!
Apache Twill is now a Top-Level Project of the ASF
Announcement: https://s.apache.org/Rzsf
3
Apache Hadoop® YARN
● MapReduce NextGen aka MRv2
● Separates resource management from job scheduling/monitoring
● New ResourceManager manages the global assignment of compute resources to applications
● Introduces the concept of a per-application ApplicationMaster that communicates with the ResourceManager for compute resource management
● Enables more than MR jobs on the cluster, such as Apache Spark
4
How a YARN Application Works
5
YARN is powerful, but...
● Every application needs to write boilerplate code
○ Negotiate resources from RM
○ Talk to NM to run jobs
○ Monitor running jobs
● Every application needs to handle
○ High availability
○ Long running applications
■ Security aspects - delegation token expiry
○ Easy scalability
6
Simplification with Apache Twill
● Provides an abstraction over YARN that reduces the complexity of developing large-scale distributed applications
● Adds simplicity to the power of YARN
○ Java thread-like programming model
● Reduces boilerplate code
● Covers common needs of distributed enterprise-grade application development
○ Lifecycle management
○ High Availability
○ Scalability
○ Service discovery
7
Hello World in Twill
Define a TwillRunnable
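import org.apache.twill.api.AbstractTwillRunnable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorldRunnable extends AbstractTwillRunnable {
  private static final Logger LOG = LoggerFactory.getLogger(HelloWorldRunnable.class);

  @Override
  public void run() {
    LOG.info("Hello World. My first distributed application.");
  }
}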
8
Hello World in Twill
Launch it!
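import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.twill.api.TwillController;
import org.apache.twill.api.TwillRunnerService;
import org.apache.twill.yarn.YarnTwillRunnerService;

public class HelloWorld {
  public static void main(String[] args) throws Exception {
    // "localhost:2181" is the ZooKeeper connection string Twill uses for coordination
    TwillRunnerService twillRunner =
        new YarnTwillRunnerService(new YarnConfiguration(), "localhost:2181");
    // Lifecycle method names are kept as on the slide; they may differ between Twill releases
    twillRunner.startAndWait();
    // prepare() returns a TwillPreparer; calling start() on it launches the app and yields the controller
    TwillController controller = twillRunner.prepare(new HelloWorldRunnable()).start();
    controller.awaitTermination();
    //...
  }
}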
9
Major Features
● Service Discovery
● Placement Policy
● Elastic Scaling
● Command Messages
● State Recovery
10
Service Discovery
11
Placement Policy
● Placement policy can be used to address
○ Performance
○ Availability
○ Resource conflict
● Exposes container placement policy from YARN
● Lets Twill allocate containers on specific racks and hosts, based on the DISTRIBUTED deployment mode
12
Elastic Scaling
● Ability to add or remove YARN containers running the application
● Scale your application based on load
● No need to restart the application
● The Twill API TwillController.changeInstances accomplishes this
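A minimal sketch of what changing the instance count at runtime means. It uses a hypothetical in-memory InstancePool as a stand-in for the running containers and does not call Twill; in a real application the request would go through TwillController.changeInstances, as named above. The class and method names below are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the set of running containers; not part of Twill.
class InstancePool {
  private final List<Integer> instanceIds = new ArrayList<>();

  InstancePool(int initial) {
    changeInstances(initial);
  }

  // Grows or shrinks the pool in place; instances that stay are never restarted.
  void changeInstances(int target) {
    if (target < 0) {
      throw new IllegalArgumentException("target must be >= 0");
    }
    while (instanceIds.size() < target) {
      instanceIds.add(instanceIds.size());          // new instance takes the next free ID
    }
    while (instanceIds.size() > target) {
      instanceIds.remove(instanceIds.size() - 1);   // retire the highest ID first
    }
  }

  // Returns a copy so callers cannot modify the pool directly.
  List<Integer> instanceIds() {
    return new ArrayList<>(instanceIds);
  }
}
```

Retiring the highest ID first keeps the remaining IDs contiguous from 0, which is what makes ID-based partitioning straightforward after a scale-down.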
13
Command Messages
14
State Recovery
15
Real World Enterprise Use Case - CDAP
● Cask Data Application Platform (CDAP) - http://cdap.io
○ Open source application and integration framework for big data
○ Simplifies and enhances data application development and management
■ APIs for simplification, portability and standardization
● Works across a wide range of Hadoop versions and all common distros
■ Built-in system services, such as metrics and log aggregation, dataset management, and a distributed transaction service, for common big data application needs
○ Extensions to enhance user experience
■ Hydrator - Interactive data pipeline construction
■ Tracker - Metadata discovery and data lineage
16
Apache Twill in CDAP
● CDAP runs different types of processes on YARN
○ Long running daemons
○ REST services
○ Real-time transactional streaming framework
○ Workflow execution
● CDAP only interacts with Twill
○ Greatly simplifies the CDAP code base
○ Takes just minutes to add support for a new type of work to run on YARN
● Twill supports common needs
○ Service discovery
○ Leader election
○ Elastic scaling
○ Security
17
CDAP Architecture
18
Service Discovery
● CDAP exposes all functionality through REST
● Almost all CDAP HTTP services run in YARN
○ No fixed host and port
○ Bind to an ephemeral port
○ Announce the host and port through Twill
■ Unique service name for a given service type
● Router inspects the request URI to derive a service name
○ Uses the Twill discovery service client to locate the actual host and port
○ Proxies the request and response
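The announce-then-route flow above can be sketched with an in-memory registry. This is an illustration under stated assumptions, not CDAP's router or Twill's discovery client: DiscoveryRegistry, Router, and the /v1/<service>/... URI convention are all hypothetical names chosen for the example.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory registry standing in for a discovery service.
class DiscoveryRegistry {
  private final Map<String, List<InetSocketAddress>> endpoints = new ConcurrentHashMap<>();

  // Called by a service once it has bound to an ephemeral port.
  void announce(String serviceName, InetSocketAddress address) {
    endpoints.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(address);
  }

  // Called when a container stops or is killed.
  void withdraw(String serviceName, InetSocketAddress address) {
    List<InetSocketAddress> list = endpoints.get(serviceName);
    if (list != null) {
      list.remove(address);
    }
  }

  // Returns the first announced endpoint for the service, if any.
  Optional<InetSocketAddress> discover(String serviceName) {
    List<InetSocketAddress> list = endpoints.get(serviceName);
    return list == null ? Optional.empty() : list.stream().findFirst();
  }
}

// Hypothetical router: derives a service name from the path, then looks it up.
class Router {
  private final DiscoveryRegistry registry;

  Router(DiscoveryRegistry registry) {
    this.registry = registry;
  }

  // Assumed URI convention for this sketch: /v1/<service>/...
  static String serviceNameFor(String path) {
    String[] parts = path.split("/");
    if (parts.length < 3 || parts[2].isEmpty()) {
      throw new IllegalArgumentException("no service in path: " + path);
    }
    return parts[2];
  }

  // A real router would proxy the request to the returned address.
  Optional<InetSocketAddress> route(String path) {
    return registry.discover(serviceNameFor(path));
  }
}
```

Because services register by name rather than by address, the router needs no configuration change when a container moves to another host or port.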
19
Long Running Applications
● All CDAP services on YARN are long running
○ Transaction server, metrics and log processing, real-time data ingestion, …
● Many user applications are long running too
○ Real-time streaming, HTTP service, application daemon
● Secure clusters, specifically Kerberos-enabled clusters
○ All Hadoop services use delegation tokens
■ NN, RM, HBase Master, Hive, KMS, ...
○ YARN containers don’t have the keytab, hence can’t update the tokens
20
Long Running Applications in Twill
● Twill provides support for updating delegation tokens
○ TwillRunner.scheduleSecureStoreUpdate
● Update delegation tokens from the launcher process (kinit process)
○ Acquires new delegation tokens periodically
○ Serializes tokens to HDFS
○ Notifies all running applications about the update
■ Through command message
○ Each runnable refreshes delegation tokens by reading from HDFS
■ Requires a non-expired HDFS delegation token
● New launcher process will discover all Twill apps from ZK
○ Can run HA launcher processes using leader election support from Twill
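The update cycle above (acquire, serialize, notify, re-read) can be sketched as follows. This is not Twill's implementation: TokenStore stands in for the HDFS location, onTokensUpdated() stands in for the command message, and tokens are plain strings. All names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for the shared HDFS location holding serialized tokens.
class TokenStore {
  private final AtomicReference<String> tokens = new AtomicReference<>("");

  void write(String serialized) { tokens.set(serialized); }

  String read() { return tokens.get(); }
}

// Hypothetical stand-in for a runnable inside a container (no keytab available).
class RunnableInstance {
  private final TokenStore store;
  private volatile String currentTokens = "";

  RunnableInstance(TokenStore store) { this.store = store; }

  // Invoked on notification; re-reads the tokens from the shared store.
  void onTokensUpdated() { currentTokens = store.read(); }

  String currentTokens() { return currentTokens; }
}

// Hypothetical launcher process, the only party able to acquire new tokens.
class Launcher {
  private final TokenStore store;
  private final Supplier<String> tokenSource;
  private final List<RunnableInstance> running = new CopyOnWriteArrayList<>();

  Launcher(TokenStore store, Supplier<String> tokenSource) {
    this.store = store;
    this.tokenSource = tokenSource;
  }

  void register(RunnableInstance instance) { running.add(instance); }

  // One update cycle: acquire, serialize to the store, then notify every runnable.
  void updateOnce() {
    store.write(tokenSource.get());
    for (RunnableInstance instance : running) {
      instance.onTokensUpdated();
    }
  }

  // Repeats the cycle on a fixed period; the caller shuts the executor down.
  ScheduledExecutorService schedule(long period, TimeUnit unit) {
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    exec.scheduleAtFixedRate(this::updateOnce, 0, period, unit);
    return exec;
  }
}
```

The period passed to schedule should be comfortably shorter than the token lifetime, since a runnable can only read the refreshed tokens while its current HDFS token is still valid.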
21
Scalability
● Many components in CDAP are linearly scalable, such as
○ Streaming data ingestion (through REST endpoint)
○ Log processing
■ Reads from Kafka, writes to HDFS
○ Metrics processing
■ Reads from Kafka, writes to timeseries table
○ User real-time streaming DAG
○ User HTTP service
● Twill supports adding/removing YARN containers for a given TwillRunnable
○ No need to restart application
○ Guarantees a unique instance ID is assigned
■ Application can use it for partitioning
● Dynamic scaling using service discovery
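One way an application might use its unique instance ID for partitioning is a simple modulo assignment, sketched below. PartitionAssignment is a hypothetical helper, and the instance ID and instance count are assumed to be supplied by the runtime (in Twill they would typically come from the runnable's context).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Twill or CDAP.
class PartitionAssignment {
  // Returns the partitions owned by one instance: every partition p with p % instanceCount == instanceId.
  static List<Integer> partitionsFor(int instanceId, int instanceCount, int totalPartitions) {
    if (instanceCount <= 0 || instanceId < 0 || instanceId >= instanceCount) {
      throw new IllegalArgumentException("invalid instance " + instanceId + " of " + instanceCount);
    }
    List<Integer> owned = new ArrayList<>();
    for (int p = instanceId; p < totalPartitions; p += instanceCount) {
      owned.add(p);
    }
    return owned;
  }
}
```

With 8 partitions and 3 instances, instance 1 owns partitions 1, 4 and 7. Because IDs are unique, every partition has exactly one owner; after the instance count changes, each instance recomputes its share.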
22
High Availability
● In a production environment, high availability is important
● Twill provides several ways to achieve it
○ Running multiple instances of the same TwillRunnable
○ Using dynamic service discovery to route requests
○ Automatic restart of a TwillRunnable container if it is killed or exits abnormally
■ A killed container is removed from service discovery
■ A restarted container is added back to service discovery
○ Built-in leader election support for active-passive redundancy
■ The Tephra service uses this, as it requires exactly one active server
○ Placement policy to make sure that instances run on different hosts
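Active-passive redundancy through leader election can be illustrated with a small in-memory model in which the participant holding the lowest ticket is active and the rest stand by. This mirrors the general idea of sequence-based election; it is not Twill's leader election API, and the Election class and its methods are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory election; a real one would be backed by a coordination service.
class Election {
  private final AtomicLong nextTicket = new AtomicLong();
  private final ConcurrentSkipListMap<Long, String> candidates = new ConcurrentSkipListMap<>();

  // Joins the election and returns the ticket needed to leave later.
  long join(String participant) {
    long ticket = nextTicket.getAndIncrement();
    candidates.put(ticket, participant);
    return ticket;
  }

  // Models a participant stopping or its container being killed.
  void leave(long ticket) {
    candidates.remove(ticket);
  }

  // The participant with the lowest ticket is the active one; all others are passive.
  Optional<String> leader() {
    Map.Entry<Long, String> first = candidates.firstEntry();
    return first == null ? Optional.empty() : Optional.of(first.getValue());
  }
}
```

When the active participant leaves, the next-lowest ticket holder becomes leader without any reconfiguration, which is the failover behavior a single-active service relies on.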
23
Apache Twill in Enterprise
● CDAP, which uses Twill, is being used by large enterprises in production
● Apache Twill runs on different cluster types
○ AWS, Azure, bare metal, VMs
● Compatible with a wide range of Hadoop versions
○ Vanilla Hadoop 2.0 - 2.7
○ HDP 2.1 - 2.3
○ CDH 5
○ MapR 4.1 - 5.1
24
Roadmap
● Generalize to run on more frameworks
○ Apache Mesos, Kubernetes
● Smarter container management
○ Run simple runnable in AM
○ Multiple runnables in one container
● Fine-grained control of container lifecycle
○ When to start, stop and restart on failure
● Smaller footprint
○ Optional Kafka, optional ZooKeeper
25
Thank you!
● Apache Twill is Open Source
○ http://twill.apache.org
○ dev@twill.apache.org
○ @ApacheTwill
● Contributions are welcome!
26
