Sharing resources with non-Hadoop workloads

Matthew Farrellee <matt@redhat.com>
Principal Software Engineer
Red Hat

Abstract - what you hoped to hear?
Enterprise data centers house numerous workloads. With Hadoop growing in
these data centers, IT departments need tools to avoid creating silos, while
maintaining SLAs, reporting and chargeback requirements. We present a
completely open source reference architecture including Apache Hadoop,
Linux cgroups and namespace isolation, Gluster and HTCondor. Topics to
be covered -
•  Augmenting existing HDFS and MapReduce infrastructure with
dynamically provisioned resources
•  On-demand creating, growing and shrinking MapReduce infrastructure for
user workload
•  Isolating workloads to enable multi-tenant access to resources
•  Publishing of resource utilization and accounting information for ingest into
chargeback systems

Agenda
•  Use cases
•  High level architecture diagram
•  Demonstrations
•  cgroups and namespaces
•  Lessons learned
Feel free to ask questions along the way

Use cases
1. Augmenting infrastructure by elastically
extending Hadoop clusters
2. User self-service clusters
3. Consolidating many small clusters onto
hardware with existing services
a.  Managing, upgrading, or testing multiple Hadoop
versions on shared infrastructure

Architecture
[Diagram: machines hosting DataNode, TaskTracker, NameNode, JobTracker and Gluster FS services. Key: Infra. Service, Dynamic Service, Scheduler, Machine.]

Scheduler
•  SLA
o  Quota, Requirements (won't run w/o), Rank (order),
global and local limits (won't exceed)
•  Reporting
o  Resource usage by time & group / user
o  Audit log
•  Performance
o  Requirements - minimum physical resources
o  Local limits - available spindle or co-processor or
railgun
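
The Requirements and Rank ideas on this slide can be sketched roughly as a filter followed by an ordering. This is only an illustration, not HTCondor's matchmaking: the function name pick_machine and the machine attributes (memory_mb, free_spindles) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Machine = Dict[str, Any]


def pick_machine(
    machines: List[Machine],
    requirements: Callable[[Machine], bool],
    rank: Callable[[Machine], float],
) -> Optional[Machine]:
    """Return the highest-ranked machine that meets the requirements."""
    # Requirements are hard constraints: the job won't run without them.
    candidates = [m for m in machines if requirements(m)]
    if not candidates:
        return None
    # Rank only orders the machines that already qualify.
    return max(candidates, key=rank)


# Hypothetical machine descriptions, for illustration only.
machines = [
    {"name": "node1", "memory_mb": 4096, "free_spindles": 0},
    {"name": "node2", "memory_mb": 16384, "free_spindles": 2},
    {"name": "node3", "memory_mb": 32768, "free_spindles": 1},
]

chosen = pick_machine(
    machines,
    requirements=lambda m: m["memory_mb"] >= 8192 and m["free_spindles"] >= 1,
    rank=lambda m: m["free_spindles"],
)
```

Here node1 is excluded by the requirements, and rank prefers node2 over node3 because it has more free spindles.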

System
•  SLA
o  cgroups (memory, cpu, cpuacct, blkio)
•  Isolation
o  namespaces
o  virtualization
•  Reporting
o  Resource usage per process and group
•  Performance
o  cpuset, numactl, numad

System
[Diagram: a Machine with cgroups around Gluster FS, TaskTracker and DataNode. Mounts shown: /tmp, /tmp, /glustervol.]

Use case one - augmenting infra.
[Diagram: the architecture diagram with a Cluster outline added. Key: Infra. Service, Dynamic Service, Scheduler, Machine, Cluster.]

Use case two - self-service cluster
[Diagram: the architecture diagram with a Cluster outline added. Key: Infra. Service, Dynamic Service, Scheduler, Machine, Cluster.]

Control Groups (cgroups)
•  https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
•  Why not virtualization?
o  Virt is not for SLA, it's for isolation
•  All processes must be in a group, use
systemd or roll your own or use systemd
o  Keep a close eye on systemd changes
o  http://lwn.net/Articles/555920/
•  Group depth and width
•  Share / weight isn't %
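
A small sketch of why a share or weight is not a percentage: with the cgroup v1 cpu controller, cpu.shares is a relative weight among sibling groups that are competing for CPU, so the fraction a group gets depends on who else is busy. The group names and values below are made up, and the model ignores nesting and hard limits.

```python
from typing import Dict


def cpu_fractions(busy_shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of contended CPU for each busy sibling group.

    busy_shares maps group name to its cpu.shares value and should
    list only groups that currently have runnable work.
    """
    total = sum(busy_shares.values())
    if total <= 0:
        raise ValueError("at least one busy group with positive shares is needed")
    return {name: value / total for name, value in busy_shares.items()}


# All three hypothetical groups are busy: hadoop gets half the CPU.
all_busy = cpu_fractions({"hadoop": 1024, "batch": 512, "web": 512})

# The web group goes idle: the same 1024 shares are now worth two thirds.
web_idle = cpu_fractions({"hadoop": 1024, "batch": 512})
```

The same cpu.shares value yields a different fraction in each case, which is why shares cannot be read as a fixed percentage.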

Namespaces
•  Mount
•  PID
•  Network
•  Others,
o  UTS - uname() - nodename and domainname
o  IPC - SysV IPC and POSIX message queues
o  User
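
One way to see which namespaces two processes share is to compare the symlinks under /proc/<pid>/ns, whose targets look like mnt:[4026531840]. The sketch below assumes a Linux /proc; the helper names are hypothetical.

```python
import os
import re
from typing import Dict, List, Tuple

# Link targets under /proc/<pid>/ns look like "mnt:[4026531840]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError("not a namespace link: %r" % target)
    return match.group(1), int(match.group(2))


def read_namespaces(ns_dir: str) -> Dict[str, int]:
    """Map each entry in a /proc/<pid>/ns directory to its inode number."""
    result = {}
    for entry in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


def shared_namespaces(a: Dict[str, int], b: Dict[str, int]) -> List[str]:
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a if name in b and a[name] == b[name])
```

On a Linux host, shared_namespaces(read_namespaces("/proc/self/ns"), read_namespaces("/proc/1/ns")) lists the namespaces the current process shares with PID 1, given permission to read the latter.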

Lessons learned
•  Play nice
•  Be flexible
•  Cleanup is important and hard
•  Resource tracking is hard

Play nice
•  Don't assume you are the only scheduler on
the system, don't claim ownership of nodes,
cohabitate
•  System integration helps (cgroups &
systemd)

Be flexible
•  Use extensible data structures
o  Obvious: CPU, Memory, Disk, Network
o  Less obvious: GPU, co-processor, cache, spindle,
running services, licensing
•  Might end up with an expression language to
evaluate policy
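
To make the last two bullets concrete, here is a minimal sketch of an extensible attribute dictionary plus a tiny policy expression evaluator. It borrows Python's own expression syntax through the ast module rather than any real scheduler's language, and the attribute names are hypothetical.

```python
import ast
import operator
from typing import Any, Dict

_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate(expression: str, attrs: Dict[str, Any]) -> Any:
    """Evaluate a policy expression against a dictionary of attributes."""

    def walk(node: ast.AST) -> Any:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            # Unknown attributes are allowed and evaluate to None.
            return attrs.get(node.id)
        if isinstance(node, ast.BoolOp):
            values = [walk(value) for value in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if isinstance(node, ast.Compare):
            left = walk(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                compare = _COMPARE.get(type(op))
                if compare is None:
                    raise ValueError("unsupported comparison operator")
                right = walk(comparator)
                # A comparison involving an undefined attribute is false.
                if left is None or right is None or not compare(left, right):
                    return False
                left = right
            return True
        # Calls, attribute access and anything else are rejected.
        raise ValueError("unsupported syntax: " + type(node).__name__)

    return walk(ast.parse(expression, mode="eval"))


# New attribute kinds are just new dictionary keys.
machine = {"cpus": 16, "memory_mb": 65536, "gpus": 2, "services": "gluster"}

has_gpu = evaluate("cpus >= 8 and gpus >= 1", machine)
has_license = evaluate("licenses >= 1", machine)
```

Because attributes live in a plain dictionary, adding a less obvious resource such as a licence count needs no schema change, and a policy that names an attribute the machine lacks simply does not match.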

Cleanup is important and hard
•  You need to deallocate the resources you
allocate
o  Kill all processes you spawned
o  Clean up disk space you used
•  Tracking processes used to be hard
o  Processes can escape your watchful eye
o  By uid / gid, by env cookies
o  Now cgroups
•  Tracking disk usage used to be nearly
impossible (inefficient)
o  Now mount namespaces
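
A rough sketch of "kill all processes you spawned" using cgroups: every process in a group is listed in that group's cgroup.procs file, so cleanup can loop until the file is empty. The function names, the retry count and the delay are assumptions, and the kill function is injectable so the logic can be exercised without a real cgroup.

```python
import os
import signal
import time
from typing import Callable, List


def pids_in_cgroup(cgroup_dir: str) -> List[int]:
    """Read the process IDs currently listed in a cgroup directory."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as procs:
        return [int(line) for line in procs if line.strip()]


def kill_cgroup(
    cgroup_dir: str,
    kill: Callable[[int, int], None] = os.kill,
    max_rounds: int = 10,
    delay: float = 0.1,
) -> bool:
    """Send SIGKILL to every process in the group until it is empty.

    Returns True once the group is empty, False if processes remain
    after max_rounds attempts.
    """
    for _ in range(max_rounds):
        pids = pids_in_cgroup(cgroup_dir)
        if not pids:
            return True
        for pid in pids:
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # exited between reading the file and the kill
        # Re-read on the next round to catch children forked meanwhile.
        time.sleep(delay)
    return not pids_in_cgroup(cgroup_dir)
```

Re-reading the file on each round is what makes this more robust than tracking by uid or environment cookie: a process that forks cannot leave the group just by forking.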

Resource tracking is hard
•  Similar to resource cleanup
•  Keeping track of resources meant walking /proc
and merging with getrusage()
•  Far easier with cgroups
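
As an illustration of "far easier with cgroups": with cgroup v1, a group's accumulated CPU time and current memory use can be read from two files instead of summing over /proc. The file names cpuacct.usage (nanoseconds) and memory.usage_in_bytes are the v1 ones; the directory layout and function names are assumptions.

```python
import os
from typing import Dict


def read_counter(path: str) -> int:
    """Read a single integer counter from a cgroup control file."""
    with open(path) as handle:
        return int(handle.read().strip())


def cgroup_usage(cpuacct_dir: str, memory_dir: str) -> Dict[str, float]:
    """Summarise CPU and memory use for one group (cgroup v1 layout)."""
    cpu_ns = read_counter(os.path.join(cpuacct_dir, "cpuacct.usage"))
    mem_bytes = read_counter(os.path.join(memory_dir, "memory.usage_in_bytes"))
    return {
        "cpu_seconds": cpu_ns / 1e9,  # cpuacct.usage is in nanoseconds
        "memory_bytes": mem_bytes,
    }
```

A reporting job could sample cgroup_usage() periodically per group and publish the deltas to a chargeback system.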
