Bobby from Yahoo presents on running Apache Storm as a service on and off Hadoop. Storm provides low-latency data processing through streaming data flows defined by topologies of spouts and bolts. Yahoo runs Storm as a service and also maintains Spark. Bobby discusses securing standalone Storm, running Storm on YARN for security, reduced overhead and elasticity, and future work including Nimbus high availability and running Storm topologies as unmanaged applications in YARN.
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARN - DataWorks Summit
This document discusses using Apache Helix for managing multi-tenant data and applications on YARN. Helix is a generic cluster management framework that handles task and container assignment, failure handling, and workload balancing in a decoupled manner from the core application logic. It provides a high-level overview of key Helix concepts like resources, partitions, and states. The document also outlines how Helix integrates with YARN by using components like the TargetProvider to determine container requirements, Provisioner to acquire/release containers from YARN, and Rebalancer to assign tasks to containers based on constraints. This allows building fault-tolerant applications that can scale efficiently based on workload without having to handle complex cluster management code.
Storm is a distributed and fault-tolerant realtime computation system. It was created at BackType/Twitter to analyze tweets, links, and users on Twitter in realtime. Storm provides scalability, reliability, and ease of programming. It uses components like Zookeeper, ØMQ, and Thrift. A Storm topology defines the flow of data between spouts that read data and bolts that process data. Storm guarantees processing of all data through its reliability APIs and guarantees no data loss even during failures.
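The spout-and-bolt data flow described above can be sketched in plain Python. This is a toy simulation of the model, not the actual Storm API; the word-count pipeline and all names in it are illustrative:

```python
from collections import Counter

# Toy spout: emits a stream of tuples (here, sentences).
def sentence_spout():
    for line in ["storm processes streams", "spouts feed bolts", "bolts process streams"]:
        yield line

# Toy bolt: consumes the sentence stream and emits individual words.
def split_bolt(stream):
    for sentence in stream:
        for word in sentence.split():
            yield word

# Toy terminal bolt: keeps state by counting words.
def count_bolt(stream):
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

# Wiring the "topology": spout -> split bolt -> count bolt.
counts = count_bolt(split_bolt(sentence_spout()))
print(counts["streams"])  # "streams" appears twice in the input
```

In real Storm, each component would run as parallel tasks across a cluster and the wiring would be declared with a TopologyBuilder; the generator chain above only mimics the logical flow.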
PHP Backends for Real-Time User Interaction using Apache Storm - DECK36
Engaging users in real time is the topic of our times. Whether it’s a game, a shop, or a content network, the aim remains the same: providing a personalized experience. In this workshop we will look under the hood of Apache Storm and lay a firm foundation for using it with PHP. That way, you can leverage your existing codebase and PHP expertise for an entirely new world: real-time analytics and business logic working on message streams. During the course of the workshop, we will introduce Apache Storm and take a look at all of its components. We will then skyrocket the applicability of Storm by showing you how to implement its components in PHP. All exercises will be conducted using an example project, the infamous and most exhilarating lolcat kitten game ever conceived: Plan 9 From Outer Kitten. In order to follow the hands-on exercises, you will need a development VM prepared by us with all relevant system components and our project repositories. To make the workshop experience as smooth as possible for all participants, please bring a prepared computer to the workshop, as there will be no time to deal with installation and setup issues. Please download all prerequisites and install them as described: VM, Plan 9 webapp, Plan 9 storm backend (tutorial: https://github.com/DECK36/plan9_workshop_tutorial ).
This document compares the batch and streaming capabilities of Spark and Storm. Spark supports both batch and micro-batch processing, while Storm supports micro-batch and real-time stream processing. Spark has been in production use since 2013 and is implemented in Scala, while Storm has been in use since 2011 and is implemented in Clojure and Java. Spark includes libraries for SQL, streaming, and machine learning, while Storm uses spouts to read data streams and bolts to filter and join data in topologies. Both integrate with Hadoop and support fault tolerance, though Spark has improved reliability when used with YARN. Performance tests show Spark Streaming can process more records per second than Storm.
Realtime Statistics based on Apache Storm and RocketMQ - Xin Wang
This document discusses using Apache Storm and RocketMQ for real-time statistics. It begins with an overview of the streaming ecosystem and components. It then describes challenges with stateful statistics and introduces Alien, an open-source middleware for handling stateful event counting. The document concludes with best practices for Storm performance and data hot points.
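Stateful event counting of the kind Alien addresses can be illustrated with a minimal tumbling-window counter in Python. This is an illustrative sketch of the concept, not Alien's actual API; the class and method names are assumptions:

```python
from collections import defaultdict

class WindowedCounter:
    """Counts events per key within fixed-size time windows (tumbling windows)."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.counts = defaultdict(int)  # (window_start, key) -> count

    def _window_start(self, timestamp):
        # Align the timestamp to the start of its window.
        return int(timestamp // self.window) * self.window

    def record(self, key, timestamp):
        self.counts[(self._window_start(timestamp), key)] += 1

    def count(self, key, timestamp):
        return self.counts[(self._window_start(timestamp), key)]

wc = WindowedCounter(window_seconds=60)
wc.record("clicks", timestamp=5)
wc.record("clicks", timestamp=30)
wc.record("clicks", timestamp=65)        # falls into the next window
print(wc.count("clicks", timestamp=10))  # 2 events in the first window
```

A production middleware additionally has to persist this state and survive worker restarts, which is exactly the hard part the summary alludes to.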
Storm: distributed and fault-tolerant realtime computation - nathanmarz
Storm is a distributed real-time computation system that provides guaranteed message processing, horizontal scalability, and fault tolerance. It allows users to define data processing topologies and submit them to a Storm cluster for distributed execution. Spouts emit streams of tuples that are processed by bolts. Storm tracks processing to ensure reliability and replays failed tasks. It provides tools for deployment, monitoring, and optimization of real-time data processing.
Some of the biggest issues at the center of analyzing large amounts of data are query flexibility, latency, and fault tolerance. Modern technologies that build upon the success of “big data” platforms, such as Apache Hadoop, have made it possible to spread the load of data analysis to commodity machines, but these analyses can still take hours to run and do not respond well to rapidly-changing data sets.
A new generation of data processing platforms -- which we call “stream architectures” -- have converted data sources into streams of data that can be processed and analyzed in real-time. This has led to the development of various distributed real-time computation frameworks (e.g. Apache Storm) and multi-consumer data integration technologies (e.g. Apache Kafka). Together, they offer a way to do predictable computation on real-time data streams.
In this talk, we will give an overview of these technologies and how they fit into the Python ecosystem. As part of this presentation, we are also releasing streamparse, a new Python library that makes it easy to debug and run large Storm clusters.
Links:
* http://parse.ly/code
* https://github.com/Parsely/streamparse
* https://github.com/getsamsa/samsa
Storm-on-YARN: Convergence of Low-Latency and Big-Data - DataWorks Summit
Hadoop plays a central role for Yahoo! in providing personalized experiences for our users and creating value for our advertisers. In this talk, we will discuss the convergence of low-latency processing and the Hadoop platform. To enable this convergence, we have developed Storm-on-YARN, so that Storm streaming/microbatch applications and Hadoop batch applications can be hosted in a single cluster. Storm applications can leverage YARN for resource management, and apply Hadoop-style security to Hadoop datasets on HDFS and HBase. In Storm-on-YARN, YARN is used to launch the Storm application master (Nimbus) and to enable Nimbus to request resources for Storm workers (Supervisors). The YARN resource manager and the Storm scheduler work together to support multi-tenancy and high availability. HDFS enables Storm to achieve higher availability of Nimbus itself. We are introducing Hadoop-style security into Storm through JAAS authentication (Kerberos and Digest). Storm servers (Nimbus and DRPC) will be configured with authorization plugins for access control and audit. The security context enables Storm applications to access only authorized datasets (including those created by Hadoop applications). Yahoo! is making our contributions to Storm and YARN available as open source. We will work with industry partners to foster the convergence of low-latency processing and big data.
Slides from talk given at the NYC Cassandra Meetup. Discussing how Storm works and how it integrates well with Apache Cassandra.
There is also a segue into an example project that uses Storm and Cassandra to implement a scalable reactive web crawler.
http://github.com/tjake/stormscraper
This document provides an introduction to Storm, an open source distributed real-time processing system. It discusses the types of data processing in Storm as either batch or real-time. The key components of a Storm cluster are the Nimbus master node, supervisor worker nodes, and ZooKeeper coordination service. A Storm topology defines the computation as a directed acyclic graph of spouts emitting streams and bolts processing the streams.
Real-Time Big Data at In-Memory Speed, Using Storm - Nati Shalom
Storm, a popular framework from Twitter, is used for real-time event processing. The challenge presented is how to manage the state of your real-time data processing at all times. In addition, you need Storm to integrate with your batch processing system (such as Hadoop) in a consistent manner.
This session will demonstrate how to integrate Storm with an in-memory database/grid, and explore various strategies for integrating the data grid with Hadoop and Cassandra, seamlessly. By achieving smooth integration with consistent management, you will be able to easily manage all the tiers of your Big Data stack in a consistent and effective way.
- See more at: http://nosql2013.dataversity.net/sessionPop.cfm?confid=74&proposalid=5526
The document discusses different technologies for real-time data collection and analysis, including Kafka for collecting and distributing streaming data, Storm for distributed real-time computation, and using PHP and FastCGI to parse real-time logs from Kafka in a Storm topology. It provides an overview of these technologies and their features, and proposes an architecture to collect logs with Kafka, process them with Storm and PHP, and output results through FastCGI.
This document discusses AcuityAds' use of Apache Kafka and Storm for processing over 10 billion daily ad impressions. It describes their architecture with Kafka used to ingest bid request data from multiple sources into partitions. Storm topologies read from Kafka and processed the data to calculate metrics like daily impressions by site. Initial issues included unbalanced Kafka partitions and low Storm uptime due to exceptions. Future improvements involved upgrading versions and adding monitoring capabilities.
Storm is a scalable distributed real-time computation system. It provides a simple programming model through topologies containing spouts that emit streams and bolts that process streams. Storm guarantees processing of all messages through anchoring and tracking tuples in distributed worker processes. It offers fault tolerance through mechanisms like acking tuples and replaying failed tasks. Exactly-once processing can be achieved through techniques like transaction IDs.
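The at-least-once mechanism summarized above (track each tuple, ack it on success, replay it on failure) can be simulated in a few lines of Python. This is a simplified model, not Storm's internals; the class, the flaky bolt, and the failure rate are all illustrative assumptions:

```python
import random

class ReliableSpout:
    """Tracks in-flight tuples by id and replays any that fail."""
    def __init__(self, data):
        self.pending = dict(enumerate(data))  # tuple id -> payload, still unacked
        self.processed = []

    def run(self, bolt):
        # Keep replaying until every tuple has been acked.
        while self.pending:
            for tup_id, payload in list(self.pending.items()):
                try:
                    result = bolt(payload)
                except RuntimeError:
                    continue               # failure: tuple stays pending, will be replayed
                self.processed.append(result)
                del self.pending[tup_id]   # ack: tuple fully processed
        return self.processed

def flaky_bolt(payload):
    if random.random() < 0.3:              # simulate transient worker failures
        raise RuntimeError("worker died")
    return payload.upper()

random.seed(42)
spout = ReliableSpout(["a", "b", "c"])
print(sorted(spout.run(flaky_bolt)))  # every tuple eventually processed: ['A', 'B', 'C']
```

Note this gives at-least-once semantics: a tuple that fails after a partial side effect would be applied again on replay, which is why the summary mentions transaction IDs as the extra ingredient for exactly-once processing.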
Storm: Distributed and fault tolerant realtime computation - Ferran Galí Reniu
Storm is a distributed realtime computation system that provides primitives for doing realtime computation. It uses a master-worker architecture with Zookeeper for coordination. Topologies in Storm contain spouts that emit streams of tuples and bolts that consume streams to process the tuples. Storm provides guarantees of processing every tuple and fault tolerance through mechanisms like supervisor restarts and Nimbus task reassignment. It is used by many companies for realtime analytics on data streams.
Apache Storm is an open-source distributed real-time processing system. It allows for processing large amounts of streaming data reliably. Storm consists of spouts that intake data streams and bolts that perform processing. Spouts and bolts are connected in topologies to represent processing workflows. Storm distributes the workload of topologies across computer clusters for fault tolerance and high throughput. It uses ZooKeeper for coordination between Storm components like the master Nimbus node and worker Supervisor nodes.
Storm is a distributed real-time computation framework created by Nathan Marz at BackType/Twitter to analyze tweets, links, and users on Twitter in real-time. It provides scalability, fault tolerance, and guarantees of data processing. Storm addresses shortcomings of Hadoop, such as its lack of real-time processing, long latency, and tedious coding, through its stream processing capabilities and largely stateless design. It offers scalability, fault tolerance through Zookeeper, and at-least-once processing guarantees.
This document provides an overview of resource aware scheduling in Apache Storm. It discusses the challenges of scheduling Storm topologies at Yahoo scale, including increasing heterogeneous clusters, low cluster utilization, and unbalanced resource usage. It then introduces the Resource Aware Scheduler (RAS) built for Storm, which allows fine-grained resource control and isolation for topologies through APIs and cgroups. Key features of RAS include pluggable scheduling strategies, per user resource guarantees, and topology priorities. Experimental results from Yahoo Storm clusters show significant improvements to throughput and resource utilization with RAS. Future work may include improved scheduling strategies and real-time resource monitoring.
Storm is an open-source distributed real-time computation system. It provides a framework for processing unbounded streams of data reliably and fault-tolerantly. Storm allows data to be analyzed in real-time using spouts, bolts, and topologies. It is scalable, fault-tolerant, guarantees processing, and is easy to code. Storm powers many real-time systems at Twitter and is useful for applications like analytics, personalization, and ETL.
Apache Storm is a distributed real-time computation system for processing large amounts of data in real-time. It is fault-tolerant and guarantees message processing. Storm topologies consist of spouts that emit streams of data and bolts that consume and process the streams. Storm provides a simple programming model and is scalable, fault-tolerant, and guarantees processing.
Learning Stream Processing with Apache Storm - Eugene Dvorkin
Over the last couple of years, Apache Storm has become a de-facto standard for developing real-time analytics and complex event processing applications. Storm makes it possible to tackle real-time data processing challenges the same way Hadoop enables batch processing of Big Data, letting companies have "Fast Data" alongside "Big Data". Typical use cases include Fraud Detection, Operational Intelligence, Machine Learning, ETL, and Analytics.
In this meetup, Eugene Dvorkin, Architect @WebMD and NYC Storm User Group organizer will teach Apache Storm and Stream Processing fundamentals. While this meeting is geared toward new Storm users, experienced users may find something interesting as well.
Following topics will be covered:
• Why use Apache Storm?
• Common use cases
• Storm Architecture - components, concepts, topology
• Building simple Storm topology with Java and Groovy
• Trident and micro-batch processing
• Fault tolerance and guaranteed message delivery
• Running and monitoring Storm in production
• Kafka
• Storm at WebMD
• Resources
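Among the topics above, Trident's micro-batch model is the easiest to illustrate: instead of handling tuples one at a time, the stream is cut into small batches that are processed (and acked) as a unit. A minimal Python sketch of the batching step, not the Trident API itself:

```python
def micro_batches(stream, batch_size):
    """Group an unbounded input stream into fixed-size micro-batches."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:          # emit the final partial batch, if any
        yield batch

# Each batch is processed, and would be acked, as one unit.
events = range(7)
batches = [sum(b) for b in micro_batches(events, batch_size=3)]
print(batches)  # [3, 12, 6] -> sums of [0,1,2], [3,4,5], [6]
```

Batching amortizes the per-tuple acking overhead and makes exactly-once state updates tractable (one transaction ID per batch), at the cost of slightly higher latency than tuple-at-a-time processing.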
Bobby Evans presented on scaling Apache Storm to support larger topologies and clusters. Currently, the largest Storm cluster at Yahoo contains 2300 nodes and supports topologies with 1500 workers and 4000 executors. The main scalability limitations are the use of Zookeeper for state storage, which is disk-bound, and the processing required in Nimbus for scheduling and collecting metrics. Future work focuses on using an in-memory store like Pacemaker to replace Zookeeper heartbeats, distributing Nimbus processing and data, and implementing topology-aware and load-aware routing to improve scheduling efficiency and network utilization at large scale. The goal is to scale Storm to support 4000 node clusters.
Murakumo is an open-source IaaS cloud controller and API orchestrator developed in 2012 to manage virtual machines, storage, and networks. It uses a thin controller and rich node agent architecture with asynchronous job queue processing. It supports Linux KVM and uses a simple design intended for easy operation and maintenance.
Observability: Beyond the Three Pillars with Spring - VMware Tanzu
In this presentation, we’ll explore the basics of the three pillars and what Spring has to offer to implement them for logging (SLF4J), metrics (Micrometer), and distributed tracing (Spring Cloud Sleuth, Zipkin/Brave, OpenTelemetry).
I’ll also talk about how to take your system to the next level, and what else you can find in Spring and related technologies to look under the hood of your running system (Spring Boot Actuator, Logbook, Eureka, Spring Boot Admin, Swagger, Spring HATEOAS) and what our future plans are.
Terracotta is Java infrastructure software that allows applications to scale across multiple computers without custom coding. It provides transparent clustering at the JVM level through a shared memory space called Network Attached Memory (NAM). Applications using Terracotta are unaware that it is installed and function the same with or without it. This allows state to be shared across instances.
This document summarizes Steve Loughran's research into deploying applications across distributed cloud resources like Amazon EC2 and S3. It discusses moving from single server installations to server farms and cloud computing. Key benefits include scaling easily without large capital costs, but challenges include lack of persistent storage, dynamic IP addresses, and single points of failure. The document provides examples of using EC2 and S3 programmatically through the SmartFrog framework.
The document introduces JStorm, an open source distributed real-time computation framework. It was created by Alibaba to address issues with Apache Storm and improve performance for real-time applications. JStorm has been used by Alibaba to process over 3 trillion messages per day across 3000+ servers. Key features discussed include high throughput, fault tolerance, horizontal scalability, and more powerful scheduling capabilities compared to Storm.
This document summarizes Packet's bare metal cloud platform. It highlights that Packet provides fully dedicated servers without co-tenancy or virtualization. It then describes the challenges of automating provisioning without a hypervisor and how Packet addressed this by building core infrastructure components as microservices. Several of these services like Kant, Tinkerbell, Narwhal and Soren are then summarized in more detail explaining their purpose and benefits. The document concludes by reviewing Packet's current server configurations and integration capabilities.
Matt Tucker discusses how XMPP (Jabber) can be used for cloud services and architectures. Some key benefits of XMPP over traditional web services include its support for real-time bidirectional communication, presence, and easier firewall traversal. Open source XMPP servers like Openfire and client libraries provide tools to build scalable cloud components and services. Examples like Twitter's use of XMPP for its firehose API demonstrate how XMPP can enable new types of cloud applications.
This document provides an overview of stream processing technologies. It defines stream processing as processing events in the order they occur. Common patterns like lambda architecture and kappa architecture are described. Specific stream processing technologies are then outlined, including Apache Storm, Apache Kafka, Apache Samza, Apache Spark Streaming, LinkedIn Databus, AgilData and Hailstorm. The document promotes stream processing by noting how databases already process transactions in order through replication logs.
Apache Traffic Server is a high performance caching proxy that can improve performance and uptime. It is open source software originally created by Yahoo and used widely at Yahoo. It can be used as a content delivery network, reverse proxy, forward proxy, and general proxy. Configuration primarily involves files like remap.config, records.config, and storage.config. Plugins can also be created to extend its functionality.
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
"This is a technical architect's case study of how Loggly has employed the latest social-media-scale technologies as the backbone ingestion processing for our multi-tenant, geo-distributed, and real-time log management system. This presentation describes design details of how we built a second-generation system fully leveraging AWS services including Amazon Route 53 DNS with heartbeat and latency-based routing, multi-region VPCs, Elastic Load Balancing, Amazon Relational Database Service, and a number of pro-active and re-active approaches to scaling computational and indexing capacity.
The talk includes lessons learned in our first generation release, validated by thousands of customers; speed bumps and the mistakes we made along the way; various data models and architectures previously considered; and success at scale: speeds, feeds, and an unmeltable log processing engine."
How to Configure the CA Workload Automation System Agent agentparm.txt FileCA Technologies
Unlock the mystery and power of CA Workload Automation System Agent by understanding how to configure its agentparm.txt file.
For more information on Mainframe solutions from CA Technologies, please visit: http://bit.ly/1wbiPkl
The document proposes a secure and high-performance web server system called Hi-sap. Hi-sap divides web objects into partitions and runs server processes under different user privileges for each partition. This achieves security by preventing scripts in one partition from accessing others. It also improves performance by pooling server processes to fully utilize embedded interpreters, unlike prior systems. The document outlines Hi-sap's design, implementation on Linux with SELinux, and evaluation showing its high performance and scalability compared to alternative approaches.
The document discusses Rohit Yadav and his work with Apache CloudStack. It provides an agenda for understanding CloudStack internals, including getting started as a user or developer, a guided tour of the codebase, common development patterns, and deep dives into key areas like system VMs, networking implementation, and plugins. The document outlines ways to join the CloudStack community and how to contribute code through GitHub pull requests.
Creating pools of Virtual Machines - ApacheCon NA 2013Andrei Savu
My slides on creating pools of virtual machines for ApacheCon NA 2013 in Portland.
Provisionr Source code:
https://github.com/axemblr/axemblr-provisionr
Apache Incubator proposal:
https://github.com/axemblr/axemblr-provisionr/wiki/Provisionr-Proposal
2. Hi I’m Bobby (evans@yahoo-inc.com)
Low Latency Data Processing Architect
› My team and I provide Apache Storm as a service.
› We also maintain Spark, but that is another talk.
› And we get to play around with deep learning and online machine learning too.
Committer and PMC/PPMC member for
› Apache Storm (incubating)
› Apache Hadoop
› Apache Spark
› Apache Tez
4. Storm Concepts
1. Streams
› Unbounded sequence of tuples
2. Spout
› Source of Stream
› E.g. Read from Twitter streaming API
3. Bolts
› Processes input streams and produces new streams
› E.g. Functions, Filters, Aggregation, Joins
4. Topologies
› Network of spouts and bolts
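The four concepts above can be sketched with a toy model. This is plain Python, not the actual Storm API (which is Java); names like `WordSpout` and `CountBolt` are made up for illustration:

```python
# Toy model of Storm's dataflow concepts: a spout emits a stream of tuples,
# a bolt transforms it into a new stream, and a topology wires them together.
# This only mirrors the shapes of the concepts, not the real Storm API.

class WordSpout:
    """Spout: a source of tuples (a fixed list standing in for, say, Twitter)."""
    def stream(self):
        for word in ["storm", "yarn", "storm", "hadoop"]:
            yield (word,)

class CountBolt:
    """Bolt: consumes an input stream and emits a new stream of running counts."""
    def __init__(self):
        self.counts = {}
    def process(self, tup):
        word = tup[0]
        self.counts[word] = self.counts.get(word, 0) + 1
        yield (word, self.counts[word])

def run_topology(spout, bolt):
    """Topology: a network connecting the spout's stream into the bolt."""
    out = []
    for tup in spout.stream():
        out.extend(bolt.process(tup))
    return out

print(run_topology(WordSpout(), CountBolt()))
# [('storm', 1), ('yarn', 1), ('storm', 2), ('hadoop', 1)]
```

In real Storm the stream is unbounded and the spout/bolt instances run as parallel tasks across workers; the point here is only the spout → bolt → new-stream shape.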
11. Authentication By Type
10/5/2015
HTTP – Using HTTP Authentication or with a Custom Java Servlet Filter.
Thrift – Kerberos (Possibly through a forwarded TGT)
ZooKeeper
› Kerberos for system processes (because there is a keytab available)
› A shared secret for worker processes, with its MD5SUM in ZK.
File System – OS user/group + FS permissions.
Worker to Worker – Can use encryption with a shared secret, but we really need to add in SASL Auth.
External Services (like HBase) – Sorry it is up to you (Sort of …)
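As a rough sketch, the layers above get wired together in storm.yaml roughly like this. The key names follow the Storm security documentation of that era; the paths, principals, and exact plugin class names are illustrative and should be checked against your Storm version:

```yaml
# Thrift (Nimbus/DRPC) over Kerberos via SASL
storm.thrift.transport: "backtype.storm.security.auth.kerberos.KerberosSaslTransportPlugin"
java.security.auth.login.config: "/etc/storm/storm_jaas.conf"

# HTTP (UI/Logviewer) behind an authentication servlet filter
ui.filter: "org.apache.hadoop.security.authentication.server.AuthenticationFilter"
ui.filter.params:
    "type": "kerberos"
    "kerberos.principal": "HTTP/_HOST@EXAMPLE.COM"
    "kerberos.keytab": "/etc/storm/http.keytab"
```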
13. Credentials Push
(Authenticating with External Services)
APIs to deliver credentials to a Topology.
ICredentialsListener – informed of credentials updates.
IAutoCredentials – automatically include credentials to push.
ICredentialsRenewer – renew credentials.
Push new Credentials
› storm upload_credentials
› StormSubmitter.pushCredentials
AutoTGT – push forwardable TGT to topology.
› Also logs you into Hadoop/HBase if needed
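The credentials-push flow above is essentially a callback pattern. A minimal sketch in plain Python, standing in for the Java interfaces — the class and method names here are illustrative, not the real Storm signatures:

```python
# Sketch of the credentials-push pattern: new credentials are pushed to a
# running topology, and components that registered interest (the analogue
# of ICredentialsListener) get a callback with the fresh values.
# Names are illustrative; the real APIs are Java interfaces in Storm.

class CredAwareBolt:
    """Analogue of a bolt implementing ICredentialsListener."""
    def __init__(self):
        self.creds = {}
    def set_credentials(self, creds):
        # Called whenever new credentials are pushed to the topology.
        self.creds = dict(creds)

class Topology:
    def __init__(self, components):
        self.components = components
    def push_credentials(self, creds):
        """Analogue of `storm upload_credentials` / StormSubmitter.pushCredentials."""
        for c in self.components:
            if hasattr(c, "set_credentials"):
                c.set_credentials(creds)

bolt = CredAwareBolt()
topo = Topology([bolt])
topo.push_credentials({"hbase.token": "t1"})   # initial delivery
topo.push_credentials({"hbase.token": "t2"})   # renewal before expiry
print(bolt.creds["hbase.token"])               # t2
```

An ICredentialsRenewer plays the pusher's role automatically: it renews credentials before they expire and triggers the same delivery path.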
14. Authorization
IAuthorizer plugin allows you to decide what is and isn’t allowed
SimpleACLAuthorizer for Nimbus.
Different roles for users
› Administrators can do anything.
› Supervisors
› Users
Topology can configure access to itself as well (rebalance).
DRPCSimpleACLAuthorizer for DRPC.
Can configure client and topology users per function.
Can default open or closed.
Topology can also whitelist users to view info through UI and Logviewer
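A sketch of what the authorizer setup looks like in storm.yaml. SimpleACLAuthorizer and DRPCSimpleACLAuthorizer are the plugin classes named above; the user names are illustrative, and the package prefix (backtype vs. org.apache) depends on your Storm version:

```yaml
# Nimbus-side ACLs: who may do what
nimbus.authorizer: "backtype.storm.security.auth.authorizer.SimpleACLAuthorizer"
nimbus.admins:
    - "storm_admin"        # administrators can do anything
nimbus.supervisor.users:
    - "storm_supervisor"   # supervisor daemons

# Per-topology access (e.g. let another user rebalance or view logs/UI)
topology.users:
    - "alice"

# DRPC ACLs, configurable per function
drpc.authorizer: "backtype.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer"
drpc.authorizer.acl.strict: false    # default open when no ACL matches
```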
16. Multi-tenant Scheduler
Provides admin resource allotments per user instead of per topology
› Users decide how to divide up their resources per topology
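A hedged sketch of how this looks in configuration. The scheduler class and `multitenant.scheduler.user.pools` key match the Yahoo-contributed multi-tenant scheduler; the user names and node counts are illustrative:

```yaml
# Admin side (storm.yaml): give each user a pool of nodes
storm.scheduler: "backtype.storm.scheduler.multitenant.MultitenantScheduler"
multitenant.scheduler.user.pools:
    "alice": 20
    "bob": 10

# User side (topology config): how many of my pooled nodes this topology may use
topology.isolate.machines: 4
```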
21. Storm on YARN
Currently
A standalone Storm cluster running on YARN
Has some hacks to avoid port conflicts
No security
No recovery if the AM goes down
24. What’s Next?
(If you see anything you like we are hiring…)
Nimbus HA/Recovery.
Long-lived secure processes in YARN.
Ephemeral ports for Storm.
Combine the AM and Nimbus.
Do we need a Supervisor if we have a Node Manager?
Possibly run as Unmanaged AMs and Proxy Users.
Elasticity for Storm topologies.
Resource-aware scheduling/requests in Storm.
Network-aware scheduling in YARN and Storm.
Automatic fetching of delegation tokens, like Oozie.
27. Why Not…
No need for a religious war; there are lots of good options out there and we picked one.
Apache Spark Streaming
We started before Spark Streaming was a possibility.
Storm is currently more advanced in many areas, but not in all.
› Fault Tolerance (I can turn it off in storm)
S4
The community for Storm was more active
Fault Tolerance (I can turn it on in storm)