Containers and
Hadoop
Hadoop virtualization, done right!
Dinesh Subhraveti - dineshs@altiscale.com
Altiscale Inc.
“Brief History of Containers”
2001 2002 2003 2004 2005
First implementation of
containers based on syscall
interposition — ...
“Brief History of Containers”
2001 2002 2003 2004 2005
First research paper on
Linux Containers —
OSDI’02
First container-b...
“Brief History of Containers”
2001 2002 2003 2004 2005
Enterprise Linux
Container solution -
Meiosys
First research paper ...
Container Renaissance
“Datacenter is the Computer”
“The new computer needs an OS!”
Computer
OS
Mesos Kubernetes YARN
Mesos Kubernetes YARN
Containers: Enabler of the Datacenter OS
Computer
OS
Processes
Containers: isolated abstractions
Why not Virtual Machines?
Application — Hardware misalignment
Hypervisor
Container Host
Application
Application
Applicatio...
Why not Virtual Machines?
Application — Hardware misalignment
Hypervisor
Container Host
Application
Applications have roun...
Host
iSCSI, NFS
Image Format Interpreter
Virtual Device
VM Exit (Context Switch)
Guest Driver
Guest File System
Host
Appli...
Why not Virtual Machines?
The Unwelcome Guest OS
Slow startup time
Guest OS licensing and maintenance burden
Poor scalabil...
Hadoop
Resource Manager
Map Reduce
YARN
Map
Reduce
Spark HBase ...
Evolution of Hadoop from Map Reduce to
YARN
Isolati...
Hadoop
Resource Manager
Map Reduce
YARN
Map
Reduce
Spark HBase ...
Containers on YARN
Containers provide a simple and ...
Node Manager
Customer A
Task 1
Customer B
Task 1
Containers on YARN
Node Manager Spawned Tasks as Containers
Container V...
Containers on YARN
Advantages
Secure multitenancy
Performance Isolation
Utilization via coscheduling IO and CPU tasks
Cons...
❏ Recent addition to the kernel

❏ Superuser in container maps to a
regular user on the host

❏ Docker support for UID vir...
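
The UID virtualization idea on this slide can be illustrated with a minimal sketch. It assumes the three-column mapping format the Linux kernel exposes in /proc/[pid]/uid_map (first UID inside the namespace, first UID outside, range length). The mapping string below is hypothetical and simply mirrors the slide's diagram, where container root (UID 0) corresponds to regular host user UID 100. The helper names are illustrative and are not taken from Docker or YARN.

```python
from typing import List, NamedTuple, Optional


class UidMapEntry(NamedTuple):
    inside_start: int   # first UID as seen inside the container
    outside_start: int  # first UID as seen on the host
    length: int         # number of consecutive UIDs covered by this entry


def parse_uid_map(text: str) -> List[UidMapEntry]:
    """Parse text in the three-column uid_map format into entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        entries.append(UidMapEntry(*(int(f) for f in fields)))
    return entries


def to_host_uid(container_uid: int, entries: List[UidMapEntry]) -> Optional[int]:
    """Translate a container UID to a host UID, or None if it is unmapped."""
    for e in entries:
        if e.inside_start <= container_uid < e.inside_start + e.length:
            return e.outside_start + (container_uid - e.inside_start)
    return None


# Hypothetical mapping mirroring the slide: container UID 0 -> host UID 100.
EXAMPLE_UID_MAP = "0 100 1\n"
entries = parse_uid_map(EXAMPLE_UID_MAP)
host_uid_for_container_root = to_host_uid(0, entries)
```

With this mapping, a process that believes it is root inside the container holds only the privileges of host UID 100, and any container UID outside the mapped range has no host identity at all.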
References
❏ Blog post describing UID virtualization support in Docker
❏ https://www.altiscale.com/making-docker-work-ya...
Backup
Containers on Hadoop or
Hadoop on Containers?
Hadoop on Separate Physical Clusters
Awesomely Secure!
Everybody gets private hardware running private
services
Customer ...
Hadoop on Separate Physical Clusters
Customer 1 Customer 2 Customer 3
Cannot scale the business this way!
Poor utilization...
Container Clusters to Decouple Host from Customer
Each customer gets a container image
❖ Encapsulates customer specific so...
Global Pool of Resources
Global Utilization: 11
Spare: 16
Unused: 0
Container Clusters to Drive Utilization
Each customer ...
Global Pool of Resources
Containers with Fine-grain Resources
❖ Container resource levels adjusted dynamically per
custome...
Global Pool of Resources
Disaggregated Compute and Storage
DNNM
❖ Add more storage to Customer 1 cluster from a storage ri...
July 2014 HUG : Privilege Isolation in Docker Containers

  • Loss of locality etc. doesn’t make material difference
    Suboptimal scheduling
    No sharing (IA usecase: universities sharing data over a common HDFS)
  • July 2014 HUG : Privilege Isolation in Docker Containers

    1. Containers and Hadoop Hadoop virtualization, done right! Dinesh Subhraveti - dineshs@altiscale.com Altiscale Inc.
    2. “Brief History of Containers” 2001 2002 2003 2004 2005 First implementation of containers based on syscall interposition - Columbia
    3. “Brief History of Containers” 2001 2002 2003 2004 2005 First implementation of containers based on syscall interposition - Columbia First research paper on Linux Containers - OSDI’02
    4. “Brief History of Containers” 2001 2002 2003 2004 2005 First research paper on Linux Containers - OSDI’02 First container-based distributed checkpointing - HP Labs First implementation of containers based on syscall interposition - Columbia
    5. “Brief History of Containers” 2001 2002 2003 2004 2005 Enterprise Linux Container solution - Meiosys First research paper on Linux Containers - OSDI’02 First container-based distributed checkpointing - HP Labs First implementation of containers based on syscall interposition - Columbia
    6. “Brief History of Containers” 2001 2002 2003 2004 2005 Enterprise Linux Container solution - Meiosys First research paper on Linux Containers - OSDI’02 IBM acquires Meiosys - Focus shifted to AIX First container-based distributed checkpointing - HP Labs First implementation of containers based on syscall interposition - Columbia
    7. “Brief History of Containers” 2001 2002 2003 2004 2005 Enterprise Linux Container solution - Meiosys First research paper on Linux Containers - OSDI’02 IBM acquires Meiosys - Focus shifted to AIX First container-based distributed checkpointing - HP Labs First implementation of containers based on syscall interposition - Columbia
    8. “Brief History of Containers” 2001 2002 2003 2004 2005 Enterprise Linux Container solution - Meiosys First research paper on Linux Containers - OSDI’02 IBM acquires Meiosys - Focus shifted to AIX First container-based distributed checkpointing - HP Labs First implementation of containers based on syscall interposition - Columbia Most core kernel changes finally made into Linux mainline
    9. Container Renaissance “Datacenter is the Computer”
    10. “The new computer needs an OS!” Computer OS Mesos Kubernetes YARN
    11. Mesos Kubernetes YARN Containers: Enabler of the Datacenter OS Computer OS Processes Containers: isolated abstractions
    12. Why not Virtual Machines? Application - Hardware misalignment Hypervisor Container Host Application Application Applications have round edges - system call interface Hypervisors expose square holes - hardware interface Lightweight abstraction without IO overhead or startup latency
    13. Why not Virtual Machines? Application - Hardware misalignment Hypervisor Container Host Application Applications have round edges - system call interface Hypervisors expose square holes - hardware interface Lightweight abstraction without IO overhead or startup latency The unwelcome Guest OS Application
    14. Host iSCSI, NFS Image Format Interpreter Virtual Device VM Exit (Context Switch) Guest Driver Guest File System Host Application Why not Virtual Machines? Layers of Intermediate Software VMs Containers Application High IO overhead due to many intermediate layers
    15. Why not Virtual Machines? The Unwelcome Guest OS Slow startup time Guest OS licensing and maintenance burden Poor scalability High resource consumption due to duplication Obfuscated network / storage / compute topologies Application semantic information is lost
    16. Hadoop Resource Manager Map Reduce YARN Map Reduce Spark HBase ... Evolution of Hadoop from Map Reduce to YARN Isolation is an immediate challenge
    17. Hadoop Resource Manager Map Reduce YARN Map Reduce Spark HBase ... Containers on YARN Containers provide a simple and elegant solution Container Virtualization
    18. Node Manager Customer A Task 1 Customer B Task 1 Containers on YARN Node Manager Spawned Tasks as Containers Container Virtualization Customer A Task 2 Customer C Task 1 Tasks representing the same job share the same container
    19. Containers on YARN Advantages Secure multitenancy Performance Isolation Utilization via coscheduling IO and CPU tasks Consistent cluster environment Isolation of software dependencies / configuration Reproducible way to define app environment Rapid provisioning
    20. ❏ Recent addition to the kernel ❏ Superuser in container maps to a regular user on the host ❏ Docker support for UID virtualization Privilege Isolation through UID namespaces Host Container Container root UID 0 Regular user UID 100 UID Virtualization U Host root UID 0
    21. References ❏ Blog post describing UID virtualization support in Docker ❏ https://www.altiscale.com/making-docker-work-yarn/ ❏ Apache wiki page tracking work status across Docker and YARN projects ❏ https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers ❏ JIRA tracking Docker integration into YARN ❏ https://issues.apache.org/jira/browse/YARN-1964 ❏ Related Docker tickets ❏ Several tickets linked from: https://github.com/dotcloud/docker/pull/4572 dineshs@altiscale.com Questions?
    22. Backup Containers on Hadoop or Hadoop on Containers?
    23. Hadoop on Separate Physical Clusters Awesomely Secure! Everybody gets private hardware running private services Customer 1 Customer 2 Customer 3
    24. Hadoop on Separate Physical Clusters Customer 1 Customer 2 Customer 3 Cannot scale the business this way! Poor utilization Host platform is a huge maintenance burden ❖ Customer 1 needs R ❖ Customer 2 needs Matlab ❖ Customer 3 needs ß∂ø… Utilization: 6 Spare: 0 Unused: 3 Utilization: 1 Spare: 6 Unused: 2 Utilization: 4 Spare: 3 Unused: 2
    25. Container Clusters to Decouple Host from Customer Each customer gets a container image ❖ Encapsulates customer specific software and configuration ❖ Host platform remains lean and simple Utilization: 6 Spare: 0 Unused: 3 Utilization: 1 Spare: 6 Unused: 2 Utilization: 4 Spare: 3 Unused: 2 Poor utilization Customer 1 Customer 2 Customer 3
    26. Global Pool of Resources Global Utilization: 11 Spare: 16 Unused: 0 Container Clusters to Drive Utilization Each customer gets a container image ❖ Encapsulates customer specific software and configuration ❖ Host platform remains lean and simple Densely pack containers together
    27. Global Pool of Resources Containers with Fine-grain Resources ❖ Container resource levels adjusted dynamically per customer ➢ As dictated by business policy ❖ Fractional resource allocation
    28. Global Pool of Resources Disaggregated Compute and Storage DNNM ❖ Add more storage to Customer 1 cluster from a storage rich node ➢ While a compute intensive job from Customer 2 utilizes the available compute capacity on the same node Independently scale compute and storage
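
The per-cluster figures on slides 24 and 25 and the pooled figures on slide 26 can be reconciled with a short worked example. The slides do not say how the global numbers are derived. The sketch below assumes one reading that is consistent with them: utilization simply adds up, and capacity that sat unused in a private cluster becomes spare capacity once it joins the global pool. The `pool` function and the dictionary layout are illustrative only.

```python
# Per-customer figures as shown on slides 24 and 25.
clusters = {
    "Customer 1": {"utilization": 6, "spare": 0, "unused": 3},
    "Customer 2": {"utilization": 1, "spare": 6, "unused": 2},
    "Customer 3": {"utilization": 4, "spare": 3, "unused": 2},
}


def pool(per_cluster):
    """Combine private clusters into one global pool.

    Assumption (not stated on the slides): unused capacity is
    reclassified as spare once it is available to every customer.
    """
    utilization = sum(c["utilization"] for c in per_cluster.values())
    spare = sum(c["spare"] for c in per_cluster.values())
    unused = sum(c["unused"] for c in per_cluster.values())
    return {"utilization": utilization, "spare": spare + unused, "unused": 0}


global_pool = pool(clusters)
print(global_pool)
```

Under that assumption the totals are 6 + 1 + 4 = 11 utilized and (0 + 6 + 3) + (3 + 2 + 2) = 16 spare, with nothing left unused, which matches the figures on slide 26.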