OpenShift scheduling Docker containers in YARN via Kubernetes
Page 1 © Hortonworks Inc. 2014
Docker: a shipping container system for code
Multiplicity of Stacks
• Static website
• User DB
• Web frontend
• Queue
• Analytics DB
Multiplicity of hardware environments
• Development VM
• QA server
• Public Cloud
• Contributor’s laptop
• Production Cluster
• Customer Data Center
Do services and apps interact appropriately?
Can I migrate smoothly and quickly?
An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container…
…that can be manipulated using standard operations and run consistently on virtually any hardware platform
Why are Docker containers lightweight? 
HDP 2.1 
Hortonworks Data Platform 
I/O performance comparable to Bare Metal 
Kubernetes – Container Orchestrator
• Service for container cluster management
• Deploys and manages applications running on multiple hosts using Docker
• Open sourced by Google
• Supports GCE, CoreOS, Azure, vSphere
• Manages Docker containers as its default implementation
• Master – maintains the state of the Kubernetes server runtime
  • Scheduler, API server, registries, storage
• Minions – represent the hosts where containers are created
  • Kubelet – manages pod and container lifecycle
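The master/minion split can be illustrated with a toy model. The following is a minimal Python sketch, not Kubernetes code: the `Pod`, `Minion`, and `schedule` names, the per-host `capacity` limit, and the least-loaded placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    image: str  # Docker image the pod's container would run


@dataclass
class Minion:
    host: str
    capacity: int  # hypothetical limit on the number of pods per host
    pods: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.pods) < self.capacity


def schedule(pod: Pod, minions: list) -> Minion:
    """Place the pod on the least-loaded minion that still has room.

    This placement rule is an assumption for the sketch, not the
    policy the real Kubernetes scheduler applies.
    """
    candidates = [m for m in minions if m.has_room()]
    if not candidates:
        raise RuntimeError(f"no minion has room for pod {pod.name}")
    target = min(candidates, key=lambda m: len(m.pods))
    target.pods.append(pod)
    return target
```

In this model the scheduler role belongs to the master, while each `Minion` stands in for a host whose Kubelet would then start the pod's containers.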
OpenShift 
• Red Hat’s platform as a service for
applications in the cloud, supporting
both public and private clouds
• Provides a high-level abstraction for applications on top
of containers, allowing easy scaling, service discovery,
and deployment
• Enables Docker image authors to easily deliver reusable
application components, including highly available
databases, monitoring and log aggregation tools,
service discovery platforms, and prepackaged web
applications
• Allows developers to deeply customize their runtime
environments while preserving operational support at
scale for those applications
Kubernetes/YARN/Docker Integration 
YARN Node Manager 
YARN Node Manager 
Kubernetes 
AppMaster 
Understanding Storm via a Real-World Use Case 
A large truck fleet company wants to capture, in real time,
events from drivers in its trucks on the road across the US.
Sensor devices on the trucks capture all kinds of events,
ranging from vehicle diagnostics to driver infractions.
• E.g.: excessive braking/acceleration, speeding, start/stop, etc.
Initial Business Requirement:
• Stream these events in, filter on violations, and do real-time alerting
on “lots” of erratic behavior over a short period of time.
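The business requirement amounts to a filter followed by a per-driver count over a time window. The following is a minimal Python sketch of that logic, independent of Storm: the event names in `VIOLATIONS`, the `ErraticDriverDetector` class, and the default threshold and window length are assumptions for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque

# Hypothetical event type names; the slides do not define a schema.
VIOLATIONS = {"excessive_braking", "excessive_acceleration", "speeding"}


class ErraticDriverDetector:
    def __init__(self, threshold=3, window_seconds=60):
        self.threshold = threshold            # how many violations count as "lots"
        self.window_seconds = window_seconds  # length of the "short period of time"
        self._recent = defaultdict(deque)     # driver_id -> timestamps of recent violations

    def process(self, driver_id, event_type, timestamp):
        """Return True if this event puts the driver at or over the alert threshold."""
        if event_type not in VIOLATIONS:
            return False  # filter step: normal events never trigger alerts
        window = self._recent[driver_id]
        window.append(timestamp)
        # Drop violations that have fallen out of the time window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold
```

A caller would feed each incoming truck event to `process` and raise an alert whenever it returns `True`.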
High Level Architecture
• Truck Streaming Data: T(1), T(2), … T(N)
• High Speed Ingestion
  • Messaging Grid (WMQ, ActiveMQ, Kafka)
  • truck events TOPIC
• Stream Processing with Storm
  • Kafka Spout
  • HBase Bolt: Write to HBase
  • Monitoring Bolt: Create Alerts (Email Alerts, ActiveMQ Alert Topic)
  • HDFS Bolt: Write to HDFS
• Distributed Storage: HDFS
• Real-time Serving with HBase
  • driver dangerous events
  • driver dangerous events count
• Interactive Query: TEZ
  • Perform ad hoc queries on driver/truck events and other related data sources
• Batch Analytics: MR2
  • Do batch analysis/models and update HBase with the right thresholds for alerts
  • Update Alert Thresholds
• Real-Time Streaming Driver Monitoring App
  • Spring WebApp with SockJS WebSockets
  • Query driver events in real-time
  • Consume alerts in real-time
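The stream-processing stage of this architecture can be pictured as one source feeding several consumers. The following is a plain-Python sketch of that data flow, not the Storm API: `RecordingBolt`, `MonitoringBolt`, and `run_topology` are hypothetical names, and it assumes every bolt receives every event from the spout.

```python
class RecordingBolt:
    """Stand-in for a bolt that writes every event to a store (HDFS or HBase in the diagram)."""

    def __init__(self, name):
        self.name = name
        self.written = []  # in-memory list instead of a real store

    def execute(self, event):
        self.written.append(event)


class MonitoringBolt:
    """Stand-in for the bolt that creates alerts for violation events."""

    def __init__(self, violations):
        self.violations = set(violations)
        self.alerts = []

    def execute(self, event):
        if event["type"] in self.violations:
            self.alerts.append(f"ALERT driver={event['driver']} type={event['type']}")


def run_topology(spout, bolts):
    """Deliver every event emitted by the spout to every bolt (assumed fan-out)."""
    for event in spout:
        for bolt in bolts:
            bolt.execute(event)
```

Here `spout` is any iterable of event dictionaries with `driver` and `type` keys, standing in for the Kafka Spout that reads the truck events topic.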
HDP Provides a Single Data Platform
[Diagram: the High Level Architecture components (truck streaming data, messaging grid, Storm stream processing, HDFS distributed storage, HBase real-time serving, Tez interactive query, MR2 batch analytics, and the driver monitoring web app), with two added labels:]
• HDP Data Lake
• YARN enables 4 different apps/workloads on a single cluster
Demo: OpenShift scheduling Docker container in YARN 
Running in Docker

OpenShift YARN - Strata 2014
