Secure Containers with EPT Isolation
Chunyan Liu
liuchunyan9@huawei.com
Jixing Gu
jixing.gu@intel.com
Presenters
• Jixing Gu:
Software Architect, from Intel CIG SW Team, working
on secure container solution architecture design and
KVM support.
• Chunyan Liu:
Software Engineer, from Huawei Central Software
Institution, working on secure container solution
design, guest OS support and Docker tooling
integration.
Agenda
• Background
• Secure Container Solution
• Architecture Design
• Work Flow
• Security Cases & Demo
• Benchmark
Container Stack & Security
Containers stack (bottom to top): Infrastructure, Host OS, Docker Engine, containers (Bin/Libs + App1, Bin/Libs + App2)
• Ease of development/deployment
• High performance, low overhead
• Shared kernel
• Namespace isolation
Security?
• Attack surface is large
• Namespace isolation is weak
• Bugs in the Linux kernel can allow escape to the host and harm other containers
Security Features Supported by Docker
• Capability: restrict the capabilities of processes in a container
• Seccomp: filter access to system calls
• SELinux: customize privileges for processes, users and files
• User namespace: map 'root' in the container to a non-root user on the host
• FUSE: isolate "/proc", useful for container monitoring systems
However, it's not enough...
CAN: restrict container capabilities and privileges, reducing the chance of a container attacking the kernel
CANNOT: prevent container privilege escalation through exploited kernel vulnerabilities …
Security Problems We Address
Problem-1: Privilege Escalation
By exploiting a kernel vulnerability such as CVE-2017-6074, an attacker app in a container is able to escalate to root privilege with ret2user.
Problem-2: Memory Data Peek
By exploiting kernel modules, such as the /dev/mem device, an attacker app with root privilege is able to peek at the memory data of a victim app in another container.
Our Solution: Namespace-like Memory Isolation w/ EPT
Protection-1: Defeat Privilege Escalation
Protection-2: Defend Against Memory Data Peek
KVM creates an EPT (Extended Page Table) memory region to isolate container user-space memory from the kernel, and captures illegitimate cross-EPT code execution or data access as EPT violations.
Typical User Model of Containers
[Diagram: Infrastructure; Host OS (Hypervisor); VM1 and VM2, each with a Guest OS, a Docker Engine and two containers (Bin/Libs + App); the memory of each VM is mapped by its own EPT table.]
Our Ideas To Secure Containers
[Diagram: Infrastructure; Host OS (Hypervisor); VM1 and VM2, each with a Guest OS, a Docker Engine and two containers (Bin/Libs + App); one EPT table per VM, plus a new EPT table for container memory.]
• Extract container memory into a new EPT table, separate from the VM EPT table
• Strong EPT isolation between the container and the VM kernel
Secure Container Solution Includes:
• Secure Container KVM Patch
  - Extend existing KVM interfaces for secure containers
  - Create an EPT for a new container
  - Add a memory page into an EPT view
  - Delete a memory page from an EPT view
• Secure Container Guest Kernel Patch
  - Handle secure container creation w/ the extended interfaces from the underlying KVM
  - Handle data exchange w/ the extended interfaces from KVM
• Secure Container QEMU Patch
  - Support VM management interfaces
• Tiny Changes To Docker Tools
  - Differentiate Secure Container from Common Container
Secure Container SW Stack Up
[Diagram, bottom to top: Server HW System (Intel VT Hardware); Hypervisor (KVM/QEMU & Host OS) with the KVM Patch and QEMU Patch; VM running the Guest OS with the Guest Kernel Patch; Docker Tools; Container 1 on EPT1 and Container 2 on EPT2. Legend: existing components vs. SC (secure container) components.]
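The three KVM-side operations listed for this slide (create an EPT for a new container, add a page to an EPT view, delete a page from an EPT view) can be pictured with a toy model. This is only a sketch under stated assumptions: the names `sc_vm`, `sc_create_view`, `sc_add_page` and `sc_delete_page` are hypothetical and are not the interfaces of the actual patch, a 64-bit mask stands in for a real EPT, view 0 is assumed to be the guest kernel view, and a deleted page is assumed to return to view 0. Shared pages are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SC_MAX_VIEWS 8   /* hypothetical limit; view 0 is the guest kernel view */
#define SC_MAX_PAGES 64  /* toy guest with 64 page frames, one bit each */

struct sc_vm {
    uint64_t view_pages[SC_MAX_VIEWS];  /* bit n set: frame n is accessible in that view */
    bool     view_in_use[SC_MAX_VIEWS];
};

/* Start with every frame owned by the kernel view and no container views. */
void sc_vm_init(struct sc_vm *vm)
{
    assert(vm != NULL);
    for (int v = 0; v < SC_MAX_VIEWS; v++) {
        vm->view_pages[v] = 0;
        vm->view_in_use[v] = false;
    }
    vm->view_in_use[0] = true;
    vm->view_pages[0] = UINT64_MAX;
}

/* "Create EPT for new container": returns the new view id, or -1 if none is free. */
int sc_create_view(struct sc_vm *vm)
{
    for (int v = 1; v < SC_MAX_VIEWS; v++) {
        if (!vm->view_in_use[v]) {
            vm->view_in_use[v] = true;
            vm->view_pages[v] = 0;
            return v;
        }
    }
    return -1;
}

static bool sc_valid(const struct sc_vm *vm, int view, unsigned gfn)
{
    return view > 0 && view < SC_MAX_VIEWS && vm->view_in_use[view] &&
           gfn < SC_MAX_PAGES;
}

/* "Add mem page into EPT view": move the frame out of the kernel view. */
bool sc_add_page(struct sc_vm *vm, int view, unsigned gfn)
{
    if (!sc_valid(vm, view, gfn))
        return false;
    uint64_t bit = UINT64_C(1) << gfn;
    if (!(vm->view_pages[0] & bit))  /* already handed to some container view */
        return false;
    vm->view_pages[0] &= ~bit;
    vm->view_pages[view] |= bit;
    return true;
}

/* "Delete mem page from EPT view": hand the frame back to the kernel view. */
bool sc_delete_page(struct sc_vm *vm, int view, unsigned gfn)
{
    if (!sc_valid(vm, view, gfn))
        return false;
    uint64_t bit = UINT64_C(1) << gfn;
    if (!(vm->view_pages[view] & bit))
        return false;
    vm->view_pages[view] &= ~bit;
    vm->view_pages[0] |= bit;
    return true;
}
```

In this model a frame is visible in exactly one view at a time, which is the property the isolation relies on: once a page belongs to container 1, neither the kernel view nor container 2 can map it until it is deleted.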
Typical Usage Scenarios
Private Cloud: isolate containers in a single shared guest OS
[Diagram: Infrastructure; Hypervisor (KVM & Host OS); one Guest OS running containers, each in its own EPT view (EPT-1 to EPT-N).]
Public Cloud: isolate containers from the same tenant & leverage VMs to isolate between tenants
[Diagram: Infrastructure; Hypervisor (KVM & Host OS); VM-1 to VM-N, one per tenant, each with a Guest OS running containers in their own EPT views (EPT-1 to EPT-N).]
Secure Container Arch and Data Flow
Secure Container Interfaces Calling Flow
EPT Violation Flow – Switch View & Add Page
Kernel to User Condition
- Violation happens under view 0
- VM's CPL = 3
- RIP from user space
- GVA from user space
- Page is present in the specific user view
User to Kernel Condition
- Violation happens under user view
- VM's CPL = 0
- RIP from kernel space
- GVA from kernel space
New page in current user view Condition
- Requested page is not used by other user views
- Requested page GVA is in user space
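The three condition lists of this slide can be read as one decision function in the EPT violation handler. The sketch below is illustrative only: `ept_violation`, `sc_classify` and the `SC_*` actions are hypothetical names, the kernel/user address split at `KERNEL_SPACE_START` is an assumption, and the "new page" case is assumed to apply only while a user view is active. Anything that matches none of the listed conditions is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption of this sketch: the upper canonical half is kernel space. */
#define KERNEL_SPACE_START 0xffff800000000000ULL

enum sc_action {
    SC_SWITCH_TO_USER_VIEW,    /* kernel to user */
    SC_SWITCH_TO_KERNEL_VIEW,  /* user to kernel */
    SC_ADD_PAGE_TO_USER_VIEW,  /* new page in current user view */
    SC_REJECT                  /* none of the listed conditions hold */
};

struct ept_violation {
    unsigned view;              /* EPT view active at the fault; 0 is the kernel view */
    unsigned cpl;               /* guest privilege level: 0 = kernel, 3 = user */
    uint64_t rip;               /* guest instruction pointer */
    uint64_t gva;               /* faulting guest virtual address */
    bool present_in_user_view;  /* page is mapped in the target user view */
    bool used_by_other_view;    /* page belongs to a different user view */
};

static bool in_user_space(uint64_t addr)
{
    return addr < KERNEL_SPACE_START;
}

enum sc_action sc_classify(const struct ept_violation *v)
{
    assert(v != NULL);
    bool rip_user = in_user_space(v->rip);
    bool gva_user = in_user_space(v->gva);

    /* Kernel to user: fault under view 0 with user CPL, RIP and GVA. */
    if (v->view == 0 && v->cpl == 3 && rip_user && gva_user &&
        v->present_in_user_view)
        return SC_SWITCH_TO_USER_VIEW;

    /* User to kernel: fault under a user view with kernel CPL, RIP and GVA. */
    if (v->view != 0 && v->cpl == 0 && !rip_user && !gva_user)
        return SC_SWITCH_TO_KERNEL_VIEW;

    /* New page: a user-space GVA that no other user view owns. */
    if (v->view != 0 && gva_user && !v->used_by_other_view)
        return SC_ADD_PAGE_TO_USER_VIEW;

    return SC_REJECT;
}
```

With these rules, a fault taken under view 0 at CPL 0 with a user-space RIP matches no condition and is rejected, which is the situation a ret2user attempt produces.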
EPT Violation Flow – Delete Page
• Delete Page removes all access permissions for a specific page from an EPT view.
• It follows a page free in the guest kernel.
• The EPT violation handler performs the delayed page deletion in the hypervisor.
• The hypervisor erases all data in a page before deleting it.
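The delayed deletion and the erase-before-delete rule can be sketched as two steps: the guest's page free only marks the page, and a later pass through the hypervisor's violation path finishes the job. The names `sc_page`, `sc_mark_deleted` and `sc_finish_delete` are hypothetical, and a plain byte array stands in for the guest page; the real patch may order or trigger these steps differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SC_PAGE_SIZE 4096

struct sc_page {
    unsigned char data[SC_PAGE_SIZE];  /* stand-in for the guest page contents */
    bool mapped;                       /* page has access permissions in the view */
    bool delete_pending;               /* guest kernel freed it; deletion is deferred */
};

/* Guest kernel freed the page: record the request, do nothing else yet. */
void sc_mark_deleted(struct sc_page *p)
{
    assert(p != NULL);
    if (p->mapped)
        p->delete_pending = true;
}

/*
 * Called from the (hypothetical) EPT violation path. Erases the page
 * contents first, then drops it from the view. Returns true if a
 * deferred deletion was completed.
 */
bool sc_finish_delete(struct sc_page *p)
{
    assert(p != NULL);
    if (!p->mapped || !p->delete_pending)
        return false;
    memset(p->data, 0, sizeof p->data);  /* erase before the page can be reused */
    p->mapped = false;
    p->delete_pending = false;
    return true;
}
```

Erasing before unmapping means that whoever receives the frame next, including the guest kernel, never sees the container's old data.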
EPT Violation Flow – Share Page
• Pages in the file cache may be shared across containers.
• The hypervisor records the shared-page status.
• Different containers shall be granted access to shared pages.
• Shared pages should be created per EPT view.
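One way to picture the shared-page status the hypervisor records is a per-page bitmask of the EPT views that map it. This is a sketch, not the patch's data structure: `sc_shared_page`, `sc_grant` and `sc_is_shared` are hypothetical names, and the 32-view limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SC_MAX_VIEWS 32  /* hypothetical limit: one mask bit per EPT view */

struct sc_shared_page {
    bool     shareable;  /* e.g. a file-cache page that may be shared */
    uint32_t view_mask;  /* bit n set: the page is mapped in EPT view n */
};

/* Map the page into one more view; private pages stay with their first view. */
bool sc_grant(struct sc_shared_page *p, unsigned view)
{
    assert(p != NULL);
    if (view >= SC_MAX_VIEWS)
        return false;
    uint32_t bit = UINT32_C(1) << view;
    if (p->view_mask & bit)
        return true;                       /* already mapped in this view */
    if (p->view_mask != 0 && !p->shareable)
        return false;                      /* owned by another view, not shareable */
    p->view_mask |= bit;                   /* a separate mapping per EPT view */
    return true;
}

/* True when more than one view currently maps the page. */
bool sc_is_shared(const struct sc_shared_page *p)
{
    assert(p != NULL);
    return (p->view_mask & (p->view_mask - 1)) != 0;
}
```

Keeping the record per page lets the violation handler tell a legitimate access to a shared file-cache page from an attempt to reach another container's private memory.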
Data Exchange between user/kernel space
Pseudo Code
struct task_struct {
        ……
        bool sc_flag;   /* true if the task runs in a secure container */
        ……
};

int data_exchange(struct data_ex_cfg *cfg, uint64_t size)
{
        /* Tasks outside a secure container return early. */
        if (!current->sc_flag)
                return 1;
        ……
        /* Transfer struct data_ex_cfg to the hypervisor through a hypercall. */
        HyperCall(KVM_HC_SC, HC_DATA_EXCHANGE, cfg, sizeof(struct data_ex_cfg));
        ……
}
Optimization: VMFUNC Switch EPT View
• Use VMFUNC, which lets the guest switch EPT views without a VM exit, in places that need to switch views (e.g. syscall)
Security Case: CVE-2017-6074 ret2user Attack
Note (without secure container):
1. The kernel is tricked into executing user-space code;
2. CPL (current privilege level) is ring 0;
3. Privilege escalation succeeds.
Security Case: CVE-2017-6074 ret2user Attack
Note (with secure container):
1. An EPT violation happens when the kernel tries to execute user-space code, because kernel space and user space are isolated in different EPT views;
2. KVM checks the CPL (current privilege level), finds ring 0, concludes this is not a legitimate system call return, and rejects the operation (CPL should be ring 3 for a normal syscall return);
3. Even with SMAP and SMEP both disabled, the secure container prevents the privilege escalation.
Demo Time
Benchmark
• Current data (SC = secure container, CC = common container):
SC vs CC            Overhead                   Notes
Boot time           555 ms vs 481 ms (~15%)
Memory footprint    ≈                          Host memory has little overhead to manage more EPT tables
CPU performance     ≈
Memory performance  ≈
Storage IO          18%~26%                    Optimization focus (VMFUNC result not updated)
Network IO          28%~30%                    Optimization focus (VMFUNC result not updated)
• Next Step:
1. Use VMFUNC to reduce the VMEXIT/VMENTRY overhead of switching views
2. Improve DataExchange performance by reducing the number of VMCALLs
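The boot-time overhead quoted in the benchmark follows from the two measurements, taking the common container as the baseline. A minimal check, with `overhead_percent` as a hypothetical helper name:

```c
#include <assert.h>

/* Relative overhead of the secure container (sc) over the common container (cc), in percent. */
double overhead_percent(double sc, double cc)
{
    assert(cc > 0.0);
    return (sc - cc) / cc * 100.0;
}
```

For the boot times above, `overhead_percent(555.0, 481.0)` gives about 15.4, which matches the ~15% on the slide.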
Thank you!
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 

Secure Containers with EPT Isolation

  • 1. Secure Containers with EPT Isolation
      Chunyan Liu (liuchunyan9@huawei.com), Jixing Gu (jixing.gu@intel.com)
  • 2. Presenters
      • Jixing Gu: Software Architect, from Intel CIG SW Team, working on secure container solution architecture design and KVM support.
      • Chunyan Liu: Software Engineer, from Huawei Central Software Institution, working on secure container solution design, guest OS support and Docker tooling integration.
  • 3. Agenda
      • Background
      • Secure Container Solution
      • Architecture Design
      • Work Flow
      • Security Cases & Demo
      • Benchmark
  • 4. Container Stack & Security
      Container stack (diagram, bottom to top): Infrastructure, Host OS, Docker Engine, then Bin/Libs + App1 and Bin/Libs + App2.
      • Ease of development/deployment
      • High performance, low overhead
      • Shared kernel
      • Namespace isolation
      Security?
      • Attack surface is large
      • Namespace isolation is weak
      • Bugs in the Linux kernel can allow escape to the host and harm other containers
  • 5. Security Features Supported by Docker
       Capability: restrict the capabilities of processes in a container
       Seccomp: filter access to syscalls
       SELinux: customize privileges for processes, users and files
       User namespace: map 'root' in the container to a non-root user on the host
       Fuse: isolate "/proc", useful for container monitoring systems
      However, it's not enough:
      • CAN: restrict container capabilities and privileges, reducing the chances of a container attacking the kernel
      • CANNOT: prevent container privilege escalation that exploits kernel vulnerabilities
  • 6. Security Problems We Address
      • Problem-1: Privilege Escalation (diagram: Attacker App in a container, on top of the kernel). By exploiting a kernel vulnerability such as CVE-2017-6074, attackers are able to escalate to root privilege with ret2user.
      • Problem-2: Memory Data Peek (diagram: Victim App and Attacker App in containers, on top of the kernel). By exploiting kernel modules such as the /dev/mem device, attackers with root privilege are able to peek at other containers' memory data.
  • 7. Our Solution: Namespace-alike Memory Isolation w/ EPT
      • Protection-1: Defeat Privilege Escalation (diagram: Attacker App, container, kernel, KVM)
      • Protection-2: Defend Memory Data Peek (diagram: Victim App, Attacker App, containers, kernel, KVM)
      KVM creates an EPT memory region to isolate container user-space memory from the kernel, and uses EPT violations to capture illegitimate cross-EPT code execution or data access.
  • 8. Typical User Model of Containers
      Diagram: Infrastructure, Host OS (Hypervisor), and two VMs (VM1, VM2). Each VM runs a Guest OS and a Docker Engine with two containers (Bin/Libs + App). Container memory in each VM is mapped through that VM's single EPT Table.
  • 9. Our Ideas To Secure Containers
      Diagram: same stack as the previous slide, with an additional EPT Table for container memory.
       Extract container memory to a new EPT table, separated from the VM EPT table
       Strong EPT isolation between container and VM kernel
  • 10. Secure Container SW Stack Up
      Secure Container Solution Includes:
      • Secure Container KVM Patch
           Extend existing KVM interfaces for secure containers
           Create an EPT for a new container
           Add a memory page into an EPT view
           Delete a memory page from an EPT view
      • Secure Container Guest Kernel Patch
           Handle secure container creation w/ extended interfaces from the underlying KVM
           Handle data exchange w/ extended interfaces from KVM
      • Secure Container QEMU Patch
           Support VM management interfaces
      • Tiny Changes To Docker Tools
           Differentiate Secure Container from Common Container
      Diagram (Legend: Existing Components / SC Components), bottom to top: Server HW System with Intel VT Hardware; Hypervisor (KVM/QEMU & Host OS) with KVM Patch and QEMU Patch; VM with Guest OS and Guest Kernel Patch; Docker Tools; Container 1 (EPT1) and Container 2 (EPT2).
  • 11. Typical Usage Scenarios
      • Private Cloud: isolate containers in a single shared guest OS. Diagram: Infrastructure, Hypervisor (KVM & Host OS), one Guest OS, Container EPT-1 … Container EPT-N.
      • Public Cloud: isolate containers from the same tenant and leverage VMs to isolate between tenants. Diagram: Infrastructure, Hypervisor (KVM & Host OS), VM-1 … VM-N (one VM per tenant), each with a Guest OS and Container EPT-1 … Container EPT-N.
  • 12. Secure Container Arch and Data Flow
  • 14. EPT Violation Flow – Switch View & Add Page
      • Kernel to User, conditions:
          - Violation happens under view 0
          - VM's CPL = 3
          - RIP from user space
          - GVA from user space
          - Page is present in the specific user view
      • User to Kernel, conditions:
          - Violation happens under a user view
          - VM's CPL = 0
          - RIP from kernel space
          - GVA from kernel space
      • New page in current user view, conditions:
          - Requested page is not used by other user views
          - Requested page GVA is in user space
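
The three condition sets on this slide can be read as a decision function over the state captured at an EPT violation. The following is a minimal sketch under stated assumptions, not the KVM patch itself: every name in it (`sc_violation`, `sc_classify`, `is_user_addr`) is hypothetical, user space is assumed to be the lower half of the x86-64 address space, and the order in which the user-view checks are tried is a guess, since the slide does not state one.

```c
#include <stdbool.h>
#include <stdint.h>

#define SC_KERNEL_VIEW 0   /* "view 0" on the slide */

enum sc_action {
    SC_SWITCH_TO_USER,     /* kernel-to-user view switch */
    SC_SWITCH_TO_KERNEL,   /* user-to-kernel view switch */
    SC_ADD_PAGE,           /* add a new page to the current user view */
    SC_REJECT              /* none of the slide's conditions hold */
};

/* Hypothetical snapshot of guest state at the time of the EPT violation. */
struct sc_violation {
    int      view;                 /* EPT view active when the violation hit */
    int      cpl;                  /* guest current privilege level (0 or 3) */
    uint64_t rip;                  /* faulting instruction pointer */
    uint64_t gva;                  /* guest virtual address being accessed */
    bool     page_in_target_view;  /* page present in the user view to enter */
    bool     page_in_other_view;   /* page already used by another user view */
};

/* Assumption: user addresses have bit 63 clear, kernel addresses have it set. */
static bool is_user_addr(uint64_t addr)
{
    return (addr >> 63) == 0;
}

enum sc_action sc_classify(const struct sc_violation *v)
{
    bool rip_user = is_user_addr(v->rip);
    bool gva_user = is_user_addr(v->gva);

    if (v->view == SC_KERNEL_VIEW) {
        /* Kernel to User: CPL 3, RIP and GVA in user space, page present. */
        if (v->cpl == 3 && rip_user && gva_user && v->page_in_target_view)
            return SC_SWITCH_TO_USER;
        return SC_REJECT;
    }

    /* User to Kernel: CPL 0, RIP and GVA in kernel space. */
    if (v->cpl == 0 && !rip_user && !gva_user)
        return SC_SWITCH_TO_KERNEL;

    /* New page: GVA in user space and page not used by another user view. */
    if (gva_user && !v->page_in_other_view)
        return SC_ADD_PAGE;

    return SC_REJECT;
}
```

With this reading, a violation raised under view 0 with CPL 0 and a user-space RIP, which is the ret2user pattern from the security case, falls through to `SC_REJECT`.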
  • 15. EPT Violation Flow – Delete Page
      • Delete Page removes all access permissions for a specific page from an EPT view.
      • It follows a page free in the guest kernel.
      • Delayed page deletion is handled in the hypervisor's EPT violation handler.
      • The hypervisor erases all data in the page before deleting it.
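
The erase-before-delete rule on this slide can be illustrated with a small model. This is a sketch under assumptions, not hypervisor code: the page is modelled as a plain buffer, and `sc_page`, `sc_mark_for_delete` and `sc_finish_delete` are hypothetical names. The guest's page free only marks the page; the deferred step scrubs the contents and then drops every permission.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SC_PAGE_SIZE 4096

enum { SC_PERM_NONE = 0, SC_PERM_R = 1, SC_PERM_W = 2, SC_PERM_X = 4 };

/* Hypothetical model of one page mapped in an EPT view. */
struct sc_page {
    uint8_t data[SC_PAGE_SIZE];
    uint8_t perms;            /* access permissions in the view */
    bool    pending_delete;   /* set when the guest kernel frees the page */
};

/* Guest kernel freed the page: record the request, do nothing else yet. */
void sc_mark_for_delete(struct sc_page *p)
{
    p->pending_delete = true;
}

/*
 * Deferred step, assumed to run from the violation handler: erase the
 * data first, then remove all permissions. Returns true if a deletion
 * was actually carried out.
 */
bool sc_finish_delete(struct sc_page *p)
{
    if (!p->pending_delete)
        return false;
    memset(p->data, 0, SC_PAGE_SIZE);   /* erase before deleting */
    p->perms = SC_PERM_NONE;
    p->pending_delete = false;
    return true;
}
```

Scrubbing before the permissions are dropped means the old contents are already gone by the time the page can be handed to any other view.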
  • 16. EPT Violation Flow – Share Page
      • Pages in the file cache may be shared across containers.
      • The hypervisor records the shared status of each page.
      • Different containers shall be granted access to shared pages.
      • A shared page should be mapped separately in each EPT view that uses it.
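
One way to picture the bookkeeping described on this slide is a per-page status record that holds a shared flag and the set of views mapping the page. The sketch below is an assumption-laden illustration: `sc_page_status`, `sc_grant` and the 32-view limit are hypothetical and do not come from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

#define SC_MAX_VIEWS 32   /* assumption: at most 32 EPT views per VM */

/* Hypothetical per-page record kept by the hypervisor. */
struct sc_page_status {
    bool     shared;      /* page is file cache shared across containers */
    uint32_t view_mask;   /* bit i set: page is mapped in EPT view i */
};

/*
 * Try to map the page into `view`. A private page may belong to a single
 * view only; a shared page may be mapped into several views, one mapping
 * per view. Returns true if the view has access afterwards.
 */
bool sc_grant(struct sc_page_status *st, unsigned view)
{
    if (view >= SC_MAX_VIEWS)
        return false;

    uint32_t bit = UINT32_C(1) << view;

    if (st->view_mask & bit)
        return true;                    /* already mapped in this view */
    if (st->view_mask != 0 && !st->shared)
        return false;                   /* private page used by another view */

    st->view_mask |= bit;
    return true;
}
```

The refusal for private pages mirrors the add-page condition that a requested page must not be used by other user views.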
  • 17. Data Exchange between user/kernel space (Pseudo Code)

      struct task_struct {
          ……
          bool sc_flag;   // true if the task runs inside a secure container
          ……
      };

      int data_exchange(struct data_ex_cfg *cfg, uint64_t size)
      {
          // Tasks outside a secure container return early.
          if (!current->sc_flag)
              return 1;
          ……
          // Transfer struct data_ex_cfg to the hypervisor through a hypercall.
          HyperCall(KVM_HC_SC, HC_DATA_EXCHANGE, cfg, sizeof(struct data_ex_cfg));
          ……
      }
  • 18. Optimization: VMFUNC Switch EPT View
      • Use VMFUNC in places that need to switch views (e.g. syscall).
  • 19. Security Case: CVE-2017-6074 ret2user Attack
      Note:
      1. The kernel is made to execute user-space code;
      2. CPL (current privilege level) is ring 0;
      3. Privilege escalation succeeds.
  • 20. Security Case: CVE-2017-6074 ret2user Attack
      Note:
      1. An EPT violation happens when the kernel tries to execute user-space code, because kernel space and user space are isolated in different EPT views;
      2. KVM sees that CPL (current privilege level) is ring 0, concludes that this is not a legitimate system call return, and rejects the operation (CPL should be ring 3 for a normal syscall_return);
      3. Even if SMAP and SMEP are both disabled, the secure container can prevent the privilege escalation.
  • 22. Benchmark
      • Current data (SC = Secure Container, CC = Common Container):
          Item               | SC vs CC Overhead        | Notes
          Boot time          | 555 ms vs 481 ms (~15%)  |
          Memory footprint   | ≈                        | Managing more EPT tables adds little host memory overhead
          CPU Performance    | ≈                        |
          Memory Performance | ≈                        |
          Storage IO         | 18%~26%                  | Optimization focus (VMFUNC result not updated)
          Network IO         | 28%~30%                  | Optimization focus (VMFUNC result not updated)
      • Next Step:
          1. Use VMFUNC to reduce the VMEXIT/VMENTRY overhead of view switches
          2. Improve DataExchange performance by reducing the number of VMCALLs
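
The boot-time overhead quoted on this slide can be checked as a relative difference against the common-container baseline. The helper below is only an illustration of that arithmetic; `sc_overhead_percent` is a hypothetical name.

```c
/* Relative overhead of a secure-container (SC) measurement over the
 * common-container (CC) baseline, in percent. */
double sc_overhead_percent(double sc, double cc)
{
    return (sc - cc) / cc * 100.0;
}
```

For the boot times on the slide, `sc_overhead_percent(555.0, 481.0)` works out to (555 − 481) / 481 × 100, a little over 15%, which agrees with the "~15%" shown.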