Summary of Linux kernel
Security Protections and its
associated attacks
Shubham Dubey
About me
Shubham Dubey
Security Researcher @ Microsoft
nixhacker.com
/in/shubham0d
AGENDA
• Introduction to kernel configuration and Memory mapping
• Kernel Self protection techniques – Software
• Kernel Self protection/mitigation techniques - Hardware
• Bonus kernel security projects
• Plan: cover the maximum number of protections rather than going in
depth on any single one
CONFIGURATION support for linux kernel
• The Linux kernel can be configured at compile time using configuration
parameters that enable or disable features.
• They usually follow the CONFIG_* naming convention.
• Once the kernel is built, these parameters cannot be modified.
• To check the status of a parameter in the running kernel:
• zcat /proc/config.gz | grep CONFIG_DEBUG_RODATA
• grep CONFIG_DEBUG_RODATA /boot/config-`uname -r`
• Each security protection discussed here has its own specific configuration parameter.
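The checks above can be wrapped in a small helper. A minimal sketch (POSIX sh, names are my own): the classifier only looks at config text on stdin, so it works on `zcat /proc/config.gz` output or a `/boot/config-*` file alike.

```shell
#!/bin/sh
# Sketch: classify a CONFIG_* parameter from kernel config text on stdin.
# Example: zcat /proc/config.gz | config_status CONFIG_STRICT_KERNEL_RWX
config_status() {
    # $1: parameter name, e.g. CONFIG_STRICT_KERNEL_RWX
    line=$(grep "^$1=" || true)          # reads the config text from stdin
    case "$line" in
        "$1=y") echo "built-in" ;;       # compiled into the kernel image
        "$1=m") echo "module"   ;;       # built as a loadable module
        "")     echo "not set"  ;;       # disabled or absent
        *)      echo "${line#*=}" ;;     # string/number-valued parameter
    esac
}
```

On a running system the same function can consume `/boot/config-$(uname -r)` via redirection.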
Linux kernel Memory layout
Linux kernel Memory layout – Low Memory
• This is linearly (1:1) mapped memory.
• The Linux kernel image resides here.
• Usually 896MB on 32-bit architectures; on 64-bit the direct map can
cover far more (terabytes of RAM).
• The virtual address where lowmem is mapped is defined by
PAGE_OFFSET
• Memory allocated by kmalloc() resides in lowmem and it is physically
contiguous.
Linux kernel Memory layout – High Memory
• This is an arbitrary mapped memory.
• Mostly introduced for 32-bit systems: due to the virtual address
space limitation, not everything can be mapped into lowmem all the time.
• The virtual base address where high memory is defined is high_memory.
• There are multiple types of mappings in the highmem area:
• Multi-page permanent mappings (vmalloc, ioremap)
• Temporary 1 page mappings (atomic_kmap)
• Permanent 1 page mappings (kmap, fix-mapped linear addresses)
Strict memory
permission
Strict kernel memory permissions
2006
CONFIG_DEBUG_RODATA
CONFIG_ARM_KERNMEMPERMS
2016
CONFIG_DEBUG_RODATA
CONFIG_DEBUG_ALIGN_RODATA
2017
CONFIG_STRICT_KERNEL_RWX
CONFIG_STRICT_MODULE_RWX
Makes kernel and module rodata read-only: non-writable and non-executable.
Makes kernel and module text executable but non-writable (W^X).
This protects against the rare chance that attackers might find and use ROP gadgets that exist in the rodata
section.
Adds an additional section-aligned split of rodata from kernel text so it can be made explicitly non-executable. This padding may waste memory space to gain the additional protection.
Major changes in memory permission
2006 (v2.6)
Introduced CONFIG_DEBUG_RODATA
link
2006(v2.6)
Added file_operation structs
link
2006(v2.6)
Included kernel_params structure
link
2006(v2.6)
Included kallsyms data
link
2009(v2.6)
Made text section writable for hooks
link
2010(v2.6)
Added NX protection
link
2010(v2.6)
Included module permission
CONFIG_DEBUG_SET_MODULE_RONX
link
2014(v3.19)
Introduced for ARM architecture
link
Major changes in memory permission -
contd
2015 (v4.0)
Introduced for ARM64
link
2016(v4.6)
Introduced
DEBUG_ALIGN_RODATA
link
2016(v4.7)
Added fault_info table
link
2017(v4.11)
Renaming to STRICT_*_RWX
link
2017(v4.14)
Introduced for PPC32
link
2020(v5.7)
Removed
DEBUG_ALIGN_RODATA
link
2020(v5.8)
Refuse to load modules that
don’t enforce W^X
link
Limitation of Strict memory permission
• A kernel module/component can modify page permissions using helper
functions available in the Linux kernel.
• One such function is set_memory_rw, part of the set_memory_*
function family.
• It is not exported for direct use, but can still be called manually (e.g. by resolving its address).
Randomization
at Kernel space
Requirement of KASLR - Background
• An attacker can use a kernel vulnerability to insert malicious code into the
kernel address space by various means and redirect the kernel's execution
to that code.
• One method used to get root privilege:
commit_creds(prepare_creds());
• These attacks rely on knowing where symbols of interest live in the kernel's
address space.
• Those locations change between kernel versions and distribution builds, but
are known (or can be figured out) for a particular kernel.
• ASLR disrupts that process and adds another layer of difficulty to an attack.
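Why these leaks matter is easy to see from how simply a symbol address is recovered. A hypothetical sketch of the lookup an exploit performs against /proc/kallsyms-format data (the addresses in the test are made up):

```shell
#!/bin/sh
# Sketch: resolve a kernel symbol from kallsyms-format text on stdin.
# Lines look like: "<address> <type> <name>". With KASLR the address
# changes every boot; with kptr_restrict an unprivileged read shows zeros.
ksym_addr() {
    awk -v sym="$1" '$3 == sym { print $1; exit }'
}
```

Real use would be `ksym_addr commit_creds < /proc/kallsyms` as a privileged user.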
Kernel Address Randomization Timeline
2006 (v2.6)
Introduced
CONFIG_RELOCATABLE
link
2013(v3.10)
Introduced KASLR for x86/64
link
2013(v3.14)
Introduced
CONFIG_RANDOMIZE_BASE
link
2014(v3.15)
Randomization for modules
link
2016(v4.6)
Introduced for ARM64
link
2016(v4.7)
Introduced for MIPS
link
2016(v4.8)
Randomize kernel memory
range
link
2021(v5.13)
Randomization of kernel stack
link
Consequences of KASLR in linux kernel
• The kernel previously used to be at the very start of lowmem, but now any
placement relative to memory ranges is possible.
• The kernel can be separated from the lowmem area. In theory, KASLR can put the
kernel anywhere in the range [16M, MAXMEM) on 64-bit, and [16M,
KERNEL_IMAGE_SIZE) on 32-bit.
• Load addresses of modules are randomized in the kernel to make KASLR
effective for modules.
• Both physical and virtual addresses are randomized.
• Other kernel memory regions are also randomized, like the physical mapping,
vmalloc and vmemmap regions, using CONFIG_RANDOMIZE_MEMORY.
• Later, the randomized kernel stack offset (randstack) feature was introduced link
CONFIG_RELOCATABLE
• This builds a kernel image that retains relocation information, enabling a
kernel binary to be loaded and run from a physical address different from
the one it was compiled for.
• This involves processing the dynamic relocations in the image in the early
stages of booting.
• Works by building the kernel as a Position Independent Executable (PIE),
which retains all relocation metadata required to relocate the kernel binary
at runtime to a different virtual address.
• Runtime relocation is possible since relocation metadata are embedded
into the kernel.
• Can read more about the internals here
CONFIG_RANDOMIZE_BASE
• Depends on CONFIG_RELOCATABLE
• With CONFIG_RANDOMIZE_BASE set, it randomizes the address at
which the kernel is decompressed at boot.
• It deters exploit attempts relying on knowledge of the location of
kernel internals.
• Entropy is generated using the RDRAND instruction; if it is not supported
by the CPU, RDTSC is used.
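Whether the boot stub can use RDRAND is visible in the CPU feature flags. A small sketch that parses /proc/cpuinfo-style text (the flags string in the test is illustrative):

```shell
#!/bin/sh
# Sketch: test for a CPU feature flag in /proc/cpuinfo-format text on stdin.
# Real use:  cpu_has_flag rdrand < /proc/cpuinfo
cpu_has_flag() {
    grep '^flags' | head -n 1 | tr ' ' '\n' | grep -qx "$1"
}
```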
CONFIG_RANDOMIZE_MEMORY
• Introduced in 2016 kernel v4.8
• When enabled the direct mapping of all physical memory,
vmalloc/ioremap space and virtual memory map are randomized.
• Works by randomizing base address of each sections.
• Order is preserved but their base offset differ.
• This makes exploits relying on predictable memory locations less
reliable.
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT – Background
• Introduced in 2021 kernel v5.13.
• This feature is based on the original idea from GRSecurity PaX’s
RANDKSTACK feature.
• Linux assigns two pages of kernel stack to every task. This stack is
used whenever the task enters the kernel (system call, device
interrupt, CPU exception, etc).
• By the time the task returns to userland, the kernel stack pointer
will be back at the point of the initial entry to the kernel thread stack.
• This means that a userland-originating attack against a kernel bug
would always find itself at the same place on the task's kernel stack.
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT – Overview
• This feature aims to make various stack-based attacks that rely on
deterministic stack structure harder.
• The goal of the randomize_kstack_offset feature is to add a random offset
after pt_regs has been pushed to the stack and before the rest of
the thread stack is used during the transition.
• If the stack offset is randomized on each system call, it is harder for an
attacker to reliably land in any particular place on the thread stack.
• Even if an address is exposed, the stack offset will change on the next
syscall.
PAX RANDSTACK vs randomize_kstack_offset
KASLR limitation and Bypasses - Infoleaks
• The most common way kernel vulnerabilities are exploited with KASLR on
is via info leaks: either a pointer leak (a pointer to a struct or to a
heap/stack area) or a content leak.
• Raw kernel pointers were frequently printed to the kernel debug log.
• Bugs which trigger a kernel oops can be used to leak kernel pointers.
• Leaks can happen due to uninitialized stack variables. Reference
• Examples: CVE-2019-10639 (remote kernel pointer leak), CVE-2017-14954
KASLR limitation and Bypasses
• Low entropy - there are only so many locations the kernel can fit in,
so an attacker could guess without too much trouble.
• Arbitrary read/write - CVE-2017-18344
• Heap spraying using msgsnd()->msg_msg struct - CVE-2021-26708,
CVE-2021-43267, CVE-2021-22555
• Hardware attacks and side channels – BlindSide attack
• Each vulnerability exploit has its own story.
More on kernel address leaks protection
kptr_restrict - This indicates whether restrictions are placed on exposing
kernel addresses via /proc and other interfaces.
• When kptr_restrict is set to (1), kernel pointers printed using the %pK
format specifier will be replaced with 0’s unless the user has CAP_SYSLOG.
dmesg_restrict - This indicates whether unprivileged users are prevented
from using dmesg to view messages from the kernel's log buffer.
• When dmesg_restrict is set, users must have CAP_SYSLOG to use dmesg.
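The kptr_restrict values can be summarized in a small helper; a sketch (value 2, not mentioned above, hides pointers from everyone, including root):

```shell
#!/bin/sh
# Sketch: interpret the kernel.kptr_restrict sysctl value.
# Real use:  kptr_meaning "$(cat /proc/sys/kernel/kptr_restrict)"
kptr_meaning() {
    case "$1" in
        0) echo "no restriction" ;;             # %pK prints real addresses
        1) echo "hidden without CAP_SYSLOG" ;;  # zeros for unprivileged users
        2) echo "always hidden" ;;              # zeros regardless of privilege
        *) echo "unknown" ;;
    esac
}
```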
Kernel Stack
protection
Canary-based protection in stack
• This feature puts a canary value on the stack just before the return
address and validates the value before returning.
• Stack-based buffer overflows (which need to overwrite the return address)
also overwrite the canary, which gets detected and neutralized via a
kernel panic.
• CONFIG_CC_STACKPROTECTOR – only puts canaries at the start of
vulnerable functions (e.g. those with character buffers). Equivalent GCC flag: -fstack-protector
• CONFIG_CC_STACKPROTECTOR_STRONG - adds the canary for a
wider set of functions, i.e. more functions end up with a
canary. Equivalent GCC flag: -fstack-protector-strong
Canary-based protection in stack
CONFIG_HAVE_CC_STACKPROTECTOR
CONFIG_CC_HAS_STACKPROTECTOR_NONE
CONFIG_CC_HAS_SANE_STACKPROTECTOR
where the "CC_" versions are about internal compiler infrastructure.
Canary-based protection limitations
• Issue with canaries: if a stack overflow is detected at all on a
production system, it is often well after the actual event, after an
unknown amount of damage has been done.
• Function-recursion-based exploitation is still possible.
Virtual mapped stack – CONFIG_VMAP_STACK
• Earlier, the stack lived in directly-mapped kernel memory, so it had to be
physically contiguous.
• VMAP_STACK enables support for kernel stacks in non-physically-contiguous
memory (the vmalloc area).
• This adds a guard page to each thread stack area.
• Guard pages are pages mapped with no access permissions in the page tables;
touching one causes a page-fault (#PF) exception.
• This feature causes reliable faults when the stack overflows, so the kernel
can produce a usable stack trace and respond to the overflow.
Bypassing VMAP guard pages – Stack hopping
• Background – the kernel places one guard page at the start and
end of each stack area.
• A thread that wanders off the bottom of its stack into the guard page
will be rewarded with a segmentation-fault signal.
• The fundamental problem with the guard page is that it is too small.
• There are a number of ways in which the stack can be expanded by
more than one page at a time.
Stack hopping - contd
Stack expansion working:
• If the stack pointer (esp on i386, rsp on x86_64) reaches the start of the
stack and there are unmapped memory pages below,
• then a page-fault exception is raised and caught by the handler;
• the page-fault handler transparently expands the stack of the process,
• or it terminates the thread/process if the stack expansion
fails (THREAD_SIZE is reached).
Unfortunately, this stack expansion mechanism is implicit and fragile: it relies
on page-fault exceptions, but if another memory region is mapped directly
below the stack, then the stack-pointer can move from the stack into the
other memory region without raising a page-fault.
Stack hopping – Steps overview
• "Clashing" the stack with another memory region: allocate memory
until the stack reaches another memory region.
• "Jumping" over the stack guard page: move the stack pointer from the
stack into the other memory region, without accessing the stack
guard page.
• "Smashing" the stack, or the other memory region: i.e. overwrite the
stack with the other memory region.
[Diagram: kernel thread 1 stack, 4KB guard page, thread 2 stack/memory]
Stack hopping – Illustration
• Step 1: Allocate memory until the start of the stack reaches the end
of another memory region
• Through megabytes of arguments passed to the thread function.
• Through recursive function calls. Project Zero reference
• Step 2: Consume the unused stack memory that
separates the stack pointer from the start of the
stack.
[Diagram: guard page, thread 2 stack, thread 1 filling up its stack]
Stack hopping – Illustration
• Step 3: Jump over the stack guard page, into another memory region
• Move the stack pointer from the stack into the memory region that
clashed with the stack, without accessing the guard page. This can be
done using large memory allocations:
• it must be larger than the guard page;
• it must end in the stack, below the guard page;
• it must start in the memory region above the stack guard page;
• it must not be fully written to (a full write would access
the guard page and raise a page-fault exception).
• Step 4: Either smash the stack with another
memory region, or smash another memory region
with the stack.
[Diagram: guard page, filled thread 1 stack, thread 1's allocation overlapping thread 2's region, which thread 2 starts filling]
Kernel Page table isolation -
CONFIG_PAGE_TABLE_ISOLATION
• Introduced in 2017 (v4.15) as a countermeasure to the famous Meltdown
attack.
• Earlier, the whole kernel page table used to be mapped into every
user-space process's memory.
• To mitigate Meltdown-like side channels, Linux creates an independent
set of page tables for use only when running userspace applications.
• The userspace page tables contain only a minimal amount of kernel
data: only what is needed to enter/exit the kernel, such as the
entry/exit functions and the interrupt descriptor table (IDT).
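Whether KPTI is active on a given system is exported under /sys/devices/system/cpu/vulnerabilities/. A sketch that classifies the contents of one such file (the sample strings in the test are typical values, not taken from the slides):

```shell
#!/bin/sh
# Sketch: classify a /sys/devices/system/cpu/vulnerabilities/* file.
# Real use:  mitigation_of < /sys/devices/system/cpu/vulnerabilities/meltdown
mitigation_of() {
    read -r line
    case "$line" in
        "Not affected")  echo "not-affected" ;;
        Mitigation:*)    echo "${line#Mitigation: }" ;;  # e.g. "PTI"
        Vulnerable*)     echo "vulnerable" ;;
        *)               echo "$line" ;;
    esac
}
```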
Kernel page table isolation
ret2usr exploitation
• ret2usr (return-to-user) is based on the fact that code in kernel mode
can execute code in user mode.
• Even with a protection like KPTI, attackers can still execute shellcode
with kernel rights by hijacking a privileged execution path in kernel
mode and redirecting it to user space.
• In a ret2usr attack, kernel data is overwritten with user-space
addresses, typically after exploitation of memory corruption bugs in
kernel code.
• Example associated CVE: CVE-2017-7308
ret2usr exploitation – source Blackhat 2014
CONFIG_RETPOLINE
• Introduced in 2018 (v4.15) as a countermeasure to the famous Spectre attack.
• It guards against kernel-to-user data leaks by avoiding speculative indirect
branches.
• Spectre background:
• Modern CPUs have a branch predictor to optimize their performance.
• It works by referencing the Branch Target Buffer (BTB), a store of key (PC)
to value (target PC) pairs.
• But its size limitation causes BTB collisions that lead to a new side-channel attack.
• Using this primitive, an attacker can inject an indirect branch target into
the BTB, and consequently run some code in a speculative context. This can
leak sensitive data across boundaries (e.g. between VMs, processes,
kernel/user mode).
CONFIG_RETPOLINE - Contd
• In simple terms, it replaces all indirect jmps and calls with
return-instruction-based trampolines.
CONFIG_MODULE_SIG – Module signing
• Introduced in 2012 (v3.7); when set, allows loading only signed modules
with a valid key.
• The kernel module signing facility cryptographically signs modules during
installation and then checks the signature upon loading the module.
• This increases kernel security by disallowing the loading of
unsigned and malicious modules.
• It uses RSA public-key encryption and hashes up to SHA-512.
• A private key is used to generate a signature and the corresponding public
key is used to check it.
• Under normal conditions, the kernel build will automatically generate a
new keypair using openssl if one does not exist in the file
cert/signing_key.pem. More details here
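Whether a module on disk carries a signature shows up in its modinfo output, which gains signer/sig_key/sig_hashalgo fields when signed. A sketch that checks modinfo-format text (the sample fields in the test are illustrative, not from a real module):

```shell
#!/bin/sh
# Sketch: detect signature fields in `modinfo <module>`-style output on stdin.
# Real use:  modinfo e1000 | module_is_signed && echo signed
module_is_signed() {
    grep -Eq '^(signer|sig_key|sig_hashalgo):'
}
```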
Hardware
assisted
protection
SMEP/SMAP – Supervisor mode
execution/access prevention
• SMEP prevents the CPU in kernel mode from jumping to an executable
page that has the user flag set in the PTE.
• This prevents the kernel from executing user-space code accidentally
or maliciously; for example, it prevents the kernel from jumping to
specially prepared user-mode shellcode (ret2usr).
• Can be enabled via CR4.SMEP (bit 20) and CR4.SMAP (bit 21)
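The two CR4 bits can be decoded from a raw register value (as seen in a kernel debugger or crash dump). A sketch using shell arithmetic; the helper name is my own:

```shell
#!/bin/sh
# Sketch: decode the SMEP (bit 20) and SMAP (bit 21) bits of a CR4 value.
cr4_bits() {
    v=$(( $1 ))                       # accepts hex with a 0x prefix
    echo "smep=$(( (v >> 20) & 1 )) smap=$(( (v >> 21) & 1 ))"
}
```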
SMEP/SMAP – Contd.
• SMAP extends the protection of SMEP to reads and writes.
• SMAP can be temporarily disabled for explicit memory accesses by
setting the EFLAGS.AC (Alignment Check) flag.
• ARM has an equivalent of SMEP, named PXN (Privileged Execute-Never);
the SMAP equivalent is PAN (Privileged Access Never).
• A standalone bundle, kGuard (cross-platform), can be used in case
hardware support is not present.
• It injects CFAs (control-flow assertions) that perform a small runtime check
before every branch to verify that the target address is located in kernel
space or loaded from kernel-mapped memory.
Bypassing SMEP/SMAP – ret2dir
• Return to direct-mapped memory.
• Each address allocated in userspace also has a physical address
directly mapped in kernel address space (called address aliasing)
if it falls in lowmem.
• To bypass SMEP/SMAP, the attacker can provide the kernel synonym
address rather than the userspace-mapped address during ret2usr
exploitation.
• The kernel will execute it without any issues.
Ret2dir – contd.
Ret2dir – process overview
• Step 1: Allocate memory in userspace that will get mapped into the
lowmem of kernel space.
• This can be done by allocating big chunks of memory from different processes,
filling up highmem; this forces the kernel to allocate memory from lowmem.
• Step 2: Guess the kernel-space address in lowmem for the directly
mapped memory.
• /proc/<pid>/pagemap can be used to get the page frame number, i.e. the offset in lowmem.
• Knowledge of lowmem_base will help determine the base.
• Step 3: Use a ret2usr vulnerability to execute the shellcode. (Disclaimer:
the memory area needs to have WX permissions set in the kernel mapping.)
CONFIG_X86_KERNEL_IBT - Indirect branch
tracking
• Merged in March 2022 (v5.18)
• If an attacker can corrupt a variable that is used for indirect branches,
they may be able to redirect the kernel's execution flow to an
arbitrary location.
• Exploit techniques like return-oriented programming and
jump-oriented programming depend on this kind of redirection.
• IBT’s purpose is to prevent an attacker from causing an indirect
branch (a function call via a pointer variable, for example) to go to an
unintended place.
CONFIG_X86_KERNEL_IBT - Indirect branch
tracking – Compiler version
• It works by trying to ensure that the target of every indirect branch is, in
fact, intended to be reached that way.
• In the Linux implementation, indirect branches go through a "jump table",
ensuring that the target is not only meant to be reached by indirect
branches, but that the prototype of the called function matches what the
caller is expecting.
• Whenever an indirect function call is made, control goes to a special function called
__cfi_check().
• It verifies that the target address is, indeed, an address within the expected jump
table, extracts the real function address from the table, and jumps to that address.
• If the target address is not within the jump table, the default action is to
assume that an attack is in progress and immediately panic the system.
CONFIG_X86_KERNEL_IBT - Indirect branch
tracking – Intel CET version
• If IBT is enabled, the CPU will ensure that every indirect branch lands
on a special instruction (endbr32 or endbr64). If anything else is
found, the processor will raise a control-protection (#CP) exception.
• The processor implements a state machine that tracks indirect JMP
and CALL instructions.
• When one of these instructions is seen, the state machine moves from the IDLE
to the WAIT_FOR_ENDBRANCH state.
• In the WAIT_FOR_ENDBRANCH state, the next instruction in the program stream
must be an ENDBRANCH.
• If an ENDBRANCH is not seen, the processor causes a control-protection fault
(#CP); otherwise the state machine moves back to the IDLE state.
Intel CET – Indirect Branch Tracking illustration
CONFIG_ARM64_MTE – Memory tagging
extension
• Merged in Linux kernel v5.10, this mechanism enables the automated
detection of a wide range of memory-safety issues (user-space only).
• Context: arm64 only uses 48 bits for addressing; the remaining bits
support the "top byte ignore" feature, which allows software to store
arbitrary data in the uppermost byte of a virtual address.
• MTE allows the storage of a four-bit "key" value in bits 59-56 of a
virtual address that is associated with one or more 16-byte ranges of
memory.
• When a pointer is dereferenced, the key stored in the pointer itself is
compared to the one associated with the memory the pointer references;
if the two do not match, a trap may be raised.
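Extracting the tag from a tagged pointer is a simple bit-field operation. A sketch over a raw 64-bit pointer value, taking bits 59-56 as described above (the example pointers in the test are made up):

```shell
#!/bin/sh
# Sketch: extract the 4-bit MTE tag stored in bits 59-56 of a pointer value.
mte_tag() {
    printf '%x\n' $(( ($1 >> 56) & 0xf ))
}
```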
Memory tagging extension - contd
• Each memory granule has a tag (aka color)
• Every pointer has a tag
• On allocation, both memory and pointer get a matching random tag
• On pointer dereference, pointer tag must match memory tag
CONFIG_AMD_MEM_ENCRYPT - AMD Secure
Memory Encryption
• SME is a hardware feature present on AMD CPUs allowing system RAM
to be encrypted and decrypted (mostly) transparently by the CPU,
with a little help from the kernel to transition to/from encrypted
RAM.
• Such RAM should be more secure against various physical attacks like
RAM access via the memory bus and should make the radio signature
of memory bus traffic harder to intercept (and decrypt) as well.
• It works by marking individual pages of memory as encrypted using
the standard x86 page tables. A page that is marked encrypted will
be automatically decrypted when read from DRAM and encrypted
when written to DRAM.
SME - contd
AMD SME -contd
• Support for SME can be determined
through the CPUID instruction.
CPUID function 0x8000001f[eax]
reports whether SME is supported.
• If support for SME is present,
MSR 0xc0010010 (MSR_K8_SYSCFG)
bit 23 can be used to determine if
memory encryption is enabled.
• CPUID 0x8000001f[ebx] bits [5:0] give the page-table bit number used
to activate memory encryption (the C-bit).
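These fields can be decoded from the raw register values. A sketch, assuming (per the AMD64 manual and Linux's MSR_AMD64_SYSCFG definitions) that the enable bit is SYSCFG[23] and that CPUID 0x8000001f EBX[5:0] gives the C-bit position; the sample values in the test are illustrative:

```shell
#!/bin/sh
# Sketch: decode SME information from raw register values.
# $1: CPUID 0x8000001f EBX value -> bits [5:0] = C-bit page-table position
# $2: SYSCFG MSR (0xc0010010) value -> bit 23 = memory encryption enable
sme_info() {
    ebx=$(( $1 )); msr=$(( $2 ))
    echo "cbit=$(( ebx & 0x3f )) enabled=$(( (msr >> 23) & 1 ))"
}
```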
Linux kernel
security
projects
Honorable mentions
grSecurity and PAX project
• Grsecurity ( www.grsecurity.net) is the only drop-in Linux kernel
replacement offering high-performance, state-of-the-art exploit
prevention against both known and unknown threats.
• PaX is a separate project that is included in Grsecurity as part of its
security strategy. The PaX project researches various defences against
the exploitation of software bugs (e.g., buffer overflows and user-
supplied format string bugs).
• PaX does not focus on finding and fixing bugs, but rather on the
prevention and containment of exploit techniques.
grSecurity and PAX project
Linux Kernel Runtime Guard(LKRG)
• It’s an independent project, an equivalent of Windows PatchGuard.
• It performs runtime integrity checking of the Linux kernel and detection
of security vulnerability exploits against the kernel.
• LKRG is a kernel module, so it can be built for and loaded on top of a
wide range of mainline and distro kernels, without needing to patch
them.
• It uses kprobes for hooking various Linux kernel APIs.
• The amount of protection provided is based on a profile set by the user.
Allowed values are 0 (log and accept), 1 (selective), 2 (strict), and 3
(paranoid).
Linux Kernel Runtime Guard(LKRG) - Features
• Exploit detection
• Tracking processes important data structures and metadata, pointers and capability,
namespace, cred struct modification.
• Ptrace access, Keyring access
• SeLinux state modification
• Checking kernel modules integrity
• Module removed from module list or KOBJ.
• Gathers information about loaded kernel modules and tries to protect them via calculating
hashes from their core_text section.
• Kernel Components validation
• SMEP, MSRs, pint, kint, umh and Profiles validation
• Periodically check critical system hashes using timer
• (Un)Hide itself from the module system activity components
LSM – Linux security module framework
• The LSM kernel patch provides a general kernel framework to support
security modules.
• By itself, the framework does not provide any additional security; it
merely provides the infrastructure to support security modules
• The LSM kernel patch adds security fields to kernel data structures
and inserts calls to hook functions at critical points in the kernel code
to manage the security fields and to perform access control.
• It also adds functions for registering and unregistering security
modules, and adds a general security system call to support new
system calls for security-aware applications.
Kernel lockdown
• Introduced in 2019 (v5.4), kernel lockdown is a Linux security module built on LSM.
• The Kernel Lockdown feature is designed to prevent both direct and
indirect access to a running kernel image
• Attempting to protect against unauthorized modification of the kernel image and
• Prevent access to security and cryptographic data located in kernel memory.
• If a prohibited or restricted feature is accessed or used, the kernel will emit
a message that looks like:
Lockdown: X: Y is restricted, see man kernel_lockdown.7
where X indicates the process name and Y indicates what is restricted.
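The current state is exposed via /sys/kernel/security/lockdown, which brackets the active mode, e.g. `none [integrity] confidentiality`. A parsing sketch:

```shell
#!/bin/sh
# Sketch: extract the active mode from /sys/kernel/security/lockdown output.
# Real use:  lockdown_mode < /sys/kernel/security/lockdown
lockdown_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}
```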
Thank you
Uncovered Protections
• Retbleed mitigation:
• CONFIG_CPU_IBPB_ENTRY
• CONFIG_CPU_UNRET_ENTRY
• CONFIG_RETPOLINE
• CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
• Spectre and Meltdown - ARM
• CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
• CONFIG_UNMAP_KERNEL_AT_EL0
• CONFIG_ARM64_PTR_AUTH_KERNEL
• CONFIG_ARM64_BTI
• CONFIG_ARM64_EPAN
Uncovered Protections
• CONFIG_DEBUG_STACKOVERFLOW
• KFENCE
• KASAN/KMSAN/KCSAN
• CONFIG_DEBUG_KMEMLEAK
• refcount_t API
• FORTIFY_SOURCE
• PAX RAP
• L1TF mitigation - PTE inversion
Q & A

Summary of linux kernel security protections

  • 1.
    Summary of Linuxkernel Security Protections and it’s associated attacks Shubham Dubey
  • 2.
    About me Shubham Dubey SecurityResearcher @ Microsoft nixhacker.com /in/shubham0d
  • 3.
    AGENDA • Introduction tokernel configuration and Memory mapping • Kernel Self protection techniques – Software • Kernel Self protection/mitigation techniques - Hardware • Bonus kernel security projects • Planning: To cover maximum number of protections rather than going in depth of one
  • 4.
    CONFIGURATION support forlinux kernel • Linux kernel can be configured during compilation using configure parameters. Used to enable/disable a feature • Usually have naming convention of CONFIG_*. • Once kernel is built, these parameters cannot be modified. • To check the status of a parameter in running kernel • zcat /proc/config.gz | grep CONFIG_DEBUG_RODATA • grep CONFIG_DEBUG_RODATA /boot/config-`uname -r` - configuration parameter specific to a security protection.
  • 5.
  • 6.
    Linux kernel Memorylayout – Low Memory • This is a linear(1to1) map memory. • Linux kernel image resides here. • Usually, 896MB in 32 bits architecture and around 5096 max in 64 bits. • The virtual address where lowmem is mapped is defined by PAGE_OFFSET • Memory allocated by kmalloc() resides in lowmem and it is physically contiguous.
  • 7.
    Linux kernel Memorylayout – High Memory • This is an arbitrary mapped memory. • Mostly introduced for 32 bit system since, in 32 bit due to virtual address space limitation, not everything can be mapped to lowmem all the time. • The virtual base address where high memory is defined is high_memory. • There are multiple types of mappings in the highmem area: • Multi-page permanent mappings (vmalloc, ioremap) • Temporary 1 page mappings (atomic_kmap) • Permanent 1 page mappings (kmap, fix-mapped linear addresses)
  • 8.
  • 9.
    Strict kernel memorypermissions 2006 CONFIG_DEBUG_RODATA CONFIG_ARM_KERNMEMPERMS 2016 CONFIG_DEBUG_RODATA CONFIG_DEBUG_ALIGN_RODATA 2017 CONFIG_STRICT_KERNEL_RWX CONFIG_STRICT_MODULE_RWX Makes kernel rodata and kernel module – R^XW. Makes kernel text and kernel module text RX^W. This protect against rare chance that attackers might find and use ROP gadgets that exist in the rodata section. Adds an additional section-aligned split of rodata from kernel text so it can be made explicitly non- executable. This padding may waste memory space to gain the additional protection.
  • 10.
    Major changes inmemory permission 2006 (v2.6) Introduced CONFIG_DEBUG_RODATA link 2006(v2.6) Added file_operation structs link 2006(v2.6) Included kernel_params structure link 2006(v2.6) Included kallsyms data link 2009(v2.6) Made text section writable for hooks link 2010(v2.6) Added NX protection link 2010(v2.6) Included module permission CONFIG_DEBUG_SET_MODULE_RONX link 2014(v3.19) Introduced for ARM architecture link
  • 11.
    Major changes inmemory permission - contd 2015 (v4.0) Introduced for ARM64 link 2016(v4.6) Introduced DEBUG_ALIGN_RODATA link 2016(v4.7) Added fault_info table link 2017(v4.11) Renaming to STRICT_*_RWX link 2017(v4.14) Introduced for PPC32 link 2020(v5.7) Removed DEBUG_ALIGN_RODATA link 2020(v5.8) Refuse loading module that don’t enforce W^X link
  • 12.
    Limitation of Strictmemory permission • A kernel module/component can modify the page permission using some default functions available in linux kernel. • One of such function is set_memory_rw part of set_memory_* function set. • It’s not exported to use directly. But can be called manually.
  • 13.
  • 14.
    Requirement of KASLR- Background • The attacker can use kernel vulnerability to insert malicious code into the kernel address space by various means and redirect the kernel's execution to that that code. • One method used to get root privilege: commit_creds(prepare_creds()); • These attacks rely on knowing where symbols of interest live in the kernel's address space. • Those locations change between kernel versions and distribution build, but are known (or can be figured out) for a particular kernel. • ASLR disrupts that process and adds another layer of difficulty to an attack.
  • 15.
    Kernel Address RandomizationTimeline 2006 (v2.6) Introduced CONFIG_RELOCATABLE link 2013(v3.10) Introduced KASLR for x86/64 link 2013(v3.14) Introduced CONFIG_RANDOMIZE_BASE link 2014(v3.15) Randomization for modules link 2016(v4.6) Introduced for ARM64 link 2016(v4.7) Introduced for MIPS link 2016(v4.8) Randomize kernel memory range link 2021(v5.13) Randomization of kernel stack link
  • 16.
    Consequences of KASLRin linux kernel • The kernel previously used to be at very start of lowmem, but now any placement relative to memory ranges is possible. • Kernel can be separate from lowmem area. In theory, KASLR can put the kernel anywhere in the range of [16M, MAXMEM) on 64-bit, and [16M, KERNEL_IMAGE_SIZE) on 32-bit. • Load address of modules are randomized in the kernel to make KASLR effective for modules. • Both physical and virtual addresses are randomized. • Other Kernel memory regions are also randomized like physical mapping, vmalloc and vmemmap regions using CONFIG_RANDOMIZE_MEMORY. • Introduced randstack feature link
  • 17.
    CONFIG_RELOCATABLE • This builds a kernel image that retains relocation information, to enable loading and running a kernel binary from a different physical address than the one it was compiled for. • This involves processing the dynamic relocations in the image in the early stages of booting. • Works by building the kernel as a Position Independent Executable (PIE), which retains all relocation metadata required to relocate the kernel binary at runtime to a different virtual address. • Runtime relocation is possible since the relocation metadata is embedded into the kernel. • Can read more about the internals here
  • 18.
    CONFIG_RANDOMIZE_BASE • Depends on CONFIG_RELOCATABLE • With CONFIG_RANDOMIZE_BASE set, the address at which the kernel is decompressed at boot is randomized. • It deters exploit attempts relying on knowledge of the location of kernel internals. • Entropy is generated using the RDRAND instruction. If it is not supported by the CPU, RDTSC is used instead.
  • 19.
    CONFIG_RANDOMIZE_MEMORY • Introduced in 2016, kernel v4.8 • When enabled, the direct mapping of all physical memory, the vmalloc/ioremap space and the virtual memory map are randomized. • Works by randomizing the base address of each section. • Order is preserved but the base offsets differ. • This makes exploits relying on predictable memory locations less reliable.
  • 20.
    CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT – Background • Introduced in 2021, kernel v5.13. • This feature is based on the original idea from grsecurity PaX's RANDKSTACK feature. • Linux assigns two pages of kernel stack to every task. This stack is used whenever the task enters the kernel (system call, device interrupt, CPU exception, etc.). • By the time the task returns to userland, the kernel stack pointer will be back at the point of the initial entry to the kernel thread stack. • This means that a userland-originating attack against a kernel bug would find itself always at the same place on the task's kernel stack.
  • 21.
    CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT – Overview • This feature aims to make various stack-based attacks that rely on a deterministic stack structure harder. • The goal of the randomize_kstack_offset feature is to add a random offset after pt_regs has been pushed to the stack and before the rest of the thread stack is used during the transition. • If the stack offset is randomized on each system call, it is harder for an attacker to reliably land in any particular place on the thread stack. • Even if an address is exposed, the stack offset will change on the next syscall.
  • 22.
    PaX RANDSTACK vs randomize_kstack_offset
  • 23.
    KASLR limitations and bypasses – Infoleaks • The most common way kernel vulnerabilities are exploited with KASLR on is via info leaks. A leak can be of pointers (to a struct or a heap/stack area) or of content. • Raw kernel pointers were frequently printed to the kernel debug log • Bugs which trigger a kernel oops can be used to leak kernel pointers • Leaks can happen due to uninitialized stack variables. Reference • Examples – CVE-2019-10639 (remote kernel pointer leak), CVE-2017-14954
  • 24.
    KASLR limitations and bypasses • Low entropy – there are only so many locations the kernel can fit in. This means an attacker could guess without too much trouble. • Arbitrary read/write – CVE-2017-18344 • Heap spraying using the msgsnd()->msg_msg struct – CVE-2021-26708, CVE-2021-43267, CVE-2021-22555 • Hardware attacks and side channels – BlindSide attack • Each vulnerability exploit has its own story.
  • 25.
    More on kernel address leak protection kptr_restrict – This indicates whether restrictions are placed on exposing kernel addresses via /proc and other interfaces. • When kptr_restrict is set to (1), kernel pointers printed using the %pK format specifier will be replaced with 0s unless the user has CAP_SYSLOG. dmesg_restrict – This indicates whether unprivileged users are prevented from using dmesg to view messages from the kernel's log buffer. • When dmesg_restrict is set, users must have CAP_SYSLOG to use dmesg.
  • 26.
  • 27.
    Canary-based protection in the stack • This feature puts a canary value on the stack just before the return address and validates the value before returning. • Stack-based buffer overflows (that need to overwrite the return address) also overwrite the canary, which gets detected and neutralized via a kernel panic. • CONFIG_CC_STACKPROTECTOR – Only puts canaries in critical functions. Equivalent GCC flag: -fstack-protector • CONFIG_CC_STACKPROTECTOR_STRONG – Adds the canary for a wider set of functions, i.e. with this, more functions end up with a canary. Equivalent GCC flag: -fstack-protector-strong
  • 28.
    Canary-based protection in the stack • CONFIG_HAVE_CC_STACKPROTECTOR • CONFIG_CC_HAS_STACKPROTECTOR_NONE • CONFIG_CC_HAS_SANE_STACKPROTECTOR where the "CC_" versions are about internal compiler infrastructure.
  • 29.
    Canary-based protection limitations • Issue with canaries: if a stack overflow is detected at all on a production system, it is often well after the actual event, and after an unknown amount of damage has been done. • Function-recursion-based exploitation is still possible.
  • 30.
    Virtually mapped stack – CONFIG_VMAP_STACK • Earlier, the stack lived in directly mapped kernel memory, so it had to be physically contiguous. • VMAP_STACK enables support for the kernel stack to live in non-physically-contiguous memory (the vmalloc area). • This adds a guard page to each thread stack area. • Guard pages are pages made inaccessible in the page table; touching one causes a page-fault exception. • This feature causes reliable faults when the stack overflows, so the kernel can print a usable stack trace and respond to the overflow.
  • 31.
    Bypassing VMAP guard pages – Stack hopping • Background – The kernel places one guard page at the start and end of each stack area. • A thread that wanders off the bottom of a stack into the guard page will be rewarded with a segmentation-fault signal. • The fundamental problem with the guard page is that it is too small. • There are a number of ways in which the stack can be expanded by more than one page at a time.
  • 32.
    Stack hopping – contd Stack expansion working: • if the stack pointer (esp on i386, rsp on x86-64) reaches the start of the stack and there are unmapped memory pages below • then a page-fault exception is raised and caught by the handler • the page-fault handler transparently expands the stack of the process • or it terminates the thread/process if the stack expansion fails (THREAD_SIZE is reached) Unfortunately, this stack expansion mechanism is implicit and fragile: it relies on page-fault exceptions, but if another memory region is mapped directly below the stack, the stack pointer can move from the stack into the other memory region without raising a page fault.
  • 33.
    Stack hopping – Steps overview • "Clashing" the stack with another memory region: allocate memory until the stack reaches another memory region. • "Jumping" over the stack guard-page: move the stack pointer from the stack into the other memory region, without accessing the stack guard-page. • "Smashing" the stack, or the other memory region: i.e. overwrite one with the other. [Diagram: Thread 1 stack | Guard page (4 KB) | Thread 2 stack/memory]
  • 34.
    Stack hopping – Illustration • Step 1: Allocate memory until the start of the stack reaches the end of another memory region • Through megabytes of arguments passed to the thread function. • Through recursive function calls. Project Zero reference [Diagram: Guard page | Thread 2 stack | Fill up the stack]
  • 35.
    Stack hopping – Illustration • Step 1: Allocate memory until the start of the stack reaches the end of another memory region • Through megabytes of arguments passed to the thread function. • Through recursive function calls. Reference • Step 2: Consume the unused stack memory that separates the stack pointer from the start of the stack. [Diagram: Guard page | Thread 2 stack | Fill up the stack]
  • 36.
    Stack hopping – Illustration • Step 3: Jump over the stack guard-page, into another memory region • Move the stack pointer from the stack into the memory region that clashed with the stack, without accessing the guard-page. Can be done using large memory allocations: • it must be larger than the guard-page; • it must end in the stack, below the guard-page; • it must start in the memory region above the stack guard-page; • it must not be fully written to (a full write would access the guard-page and raise a page-fault exception). [Diagram: Guard page | Thread 2 stack | Fill up the stack | Th1 stack allocation]
  • 37.
    Stack hopping – Illustration • Step 3: Jump over the stack guard-page, into another memory region • Move the stack pointer from the stack into the memory region that clashed with the stack, without accessing the guard-page. • it must be larger than the guard-page; • it must end in the stack, below the guard-page; • it must start in the memory region above the stack guard-page; • it must not be fully written to (a full write would access the guard-page and raise a page-fault exception). • Step 4: Either smash the stack with another memory region, or smash another memory region with the stack. [Diagram: Guard page | Fill up the stack | Th1 stack allocation | Th2 starts filling]
  • 38.
    Kernel page table isolation – CONFIG_PAGE_TABLE_ISOLATION • Introduced in 2017 (v4.15) as a countermeasure to the famous Meltdown attack. • Earlier, the whole kernel page table used to be mapped into user-space process memory. • To mitigate Meltdown-like side channels, Linux creates an independent set of page tables for use only when running user-space applications. • The user-space page tables contain only a minimal amount of kernel data: only what is needed to enter/exit the kernel, such as the entry/exit functions and the interrupt descriptor table (IDT).
  • 39.
  • 40.
    ret2usr exploitation • ret2usr (return-to-user) is based on the fact that code in kernel mode can execute code in user mode. • Even with protections like KPTI, attackers can easily execute shellcode with kernel rights by hijacking a privileged execution path in kernel mode and redirecting it to user space. • In a ret2usr attack, kernel data is overwritten with user-space addresses, typically after exploitation of memory corruption bugs in kernel code. • Example associated CVE: CVE-2017-7308
  • 41.
    ret2usr exploitation – source: Black Hat 2014
  • 42.
    CONFIG_RETPOLINE • Introduced in 2018 (v4.15) as a countermeasure to the famous Spectre attack. • It guards against kernel-to-user data leaks by avoiding speculative indirect branches. • Spectre background: • Modern CPUs have a branch predictor to optimize their performance. • It works by referencing the Branch Target Buffer (BTB), a storage for key (PC) – value (target PC) pairs. • Its limited size causes BTB collisions, which lead to a new side-channel attack. • Using this primitive, an attacker can inject an indirect branch target into the BTB and consequently run code in a speculative context, which can leak sensitive data across boundaries (e.g. between VMs, processes, kernel/user mode).
  • 43.
    CONFIG_RETPOLINE – contd • In simple terms, it replaces all indirect jmp and call instructions with a construct built on return instructions.
  • 44.
    CONFIG_MODULE_SIG – Module signing • Introduced in 2012 (v3.7); when set, it allows loading only signed modules with a valid key. • The kernel module signing facility cryptographically signs modules during installation and then checks the signature upon loading the module. • This increases kernel security by disallowing the loading of unsigned or maliciously modified modules. • It uses RSA public-key cryptography and hashes up to SHA-512. • A private key is used to generate a signature and the corresponding public key is used to check it. • Under normal conditions, the kernel build will automatically generate a new keypair using openssl if one does not exist in the file certs/signing_key.pem. More details here
  • 45.
  • 46.
    SMEP/SMAP – Supervisor mode execution/access prevention • SMEP prevents the CPU in kernel mode from jumping to an executable page that has the user flag set in the PTE. • This prevents the kernel from executing user-space code accidentally or maliciously, e.g. it prevents the kernel from jumping to specially prepared user-mode shellcode (ret2usr). • Can be enabled via CR4.SMEP (bit 20) and CR4.SMAP (bit 21)
  • 47.
    SMEP/SMAP – contd • SMAP extends the protection of SMEP to reads and writes. • SMAP can be temporarily disabled for explicit memory accesses by setting the EFLAGS.AC (Alignment Check) flag. • ARM has an equivalent of SMEP, named PXN (Privileged Execute-Never). • A standalone solution, kGuard (cross-platform), can be used in case hardware support is not present. • It injects control-flow assertions (CFAs) that perform a small runtime check before every branch to verify that the target address is located in kernel space or loaded from kernel-mapped memory.
  • 48.
    Bypassing SMEP/SMAP – ret2dir • Return to direct-mapped memory • Each address allocated in userspace has a physical address that is directly mapped into the kernel address space as well (called address aliasing), if it is in lowmem. • To bypass SMEP/SMAP, an attacker can provide the kernel synonym address rather than the user-space mapped address during ret2usr exploitation. • The kernel will execute it without any issues.
  • 49.
  • 50.
    Ret2dir – Process overview • Step 1: Allocate memory in userspace that will get mapped into the lowmem of kernel space • This can be done by allocating big chunks of memory from different processes and filling up highmem, which forces the kernel to allocate memory from lowmem. • Step 2: Guess the kernel-space address in lowmem for the directly mapped memory • /proc/<pid>/pagemap can be used to get the page frame number. • Knowledge of the lowmem base will help determine the address. • Step 3: Use a ret2usr vulnerability to execute the shellcode. (Disclaimer: the memory area needs to have the WX permission set in the kernel mapping)
  • 51.
    CONFIG_X86_KERNEL_IBT – Indirect branch tracking • Merged in March 2022 (v5.18) • If an attacker can corrupt a variable that is used for indirect branches, they may be able to redirect the kernel's execution flow to an arbitrary location. • Exploit techniques like return-oriented programming and jump-oriented programming depend on this kind of redirection. • IBT's purpose is to prevent an attacker from causing an indirect branch (a function call via a pointer variable, for example) to go to an unintended place.
  • 52.
    CONFIG_X86_KERNEL_IBT – Indirect branch tracking – Compiler version • It works by trying to ensure that the target of every indirect branch is, in fact, intended to be reached that way. • In the compiler-based implementation, an indirect branch goes through a "jump table", ensuring that the target is not only meant to be reached by indirect branches, but also that the prototype of the called function matches what the caller is expecting. • Whenever an indirect function call is made, control goes to a special function called __cfi_check(). • It will verify that the target address is, indeed, an address within the expected jump table, extract the real function address from the table, and jump to that address. • If the target address is not within the jump table, the default action is instead to assume that an attack is in progress and immediately panic the system.
  • 53.
    CONFIG_X86_KERNEL_IBT – Indirect branch tracking – Intel CET version • If IBT is enabled, the CPU will ensure that every indirect branch lands on a special instruction (endbr32 or endbr64). If anything else is found, the processor will raise a control-protection (#CP) exception. • The processor implements a state machine that tracks indirect JMP and CALL instructions. • When one of these instructions is seen, the state machine moves from the IDLE to the WAIT_FOR_ENDBRANCH state. • In the WAIT_FOR_ENDBRANCH state, the next instruction in the program stream must be an ENDBRANCH. • If an ENDBRANCH is not seen, the processor raises a control-protection fault (#CP); otherwise the state machine moves back to the IDLE state.
  • 54.
    Intel CET – Indirect Branch Tracking illustration
  • 55.
    CONFIG_ARM64_MTE – Memory tagging extension • Merged in Linux kernel v5.10, this mechanism enables the automated detection of a wide range of memory-safety issues (user-space only). • Context: arm64 only uses 48 bits for addressing; the remaining bits support the "top byte ignore" feature, which allows software to store arbitrary data in the uppermost byte of a virtual address. • MTE allows the storage of a four-bit "key" value in bits 59-56 of a virtual address that is associated with one or more 16-byte ranges of memory. • When a pointer is dereferenced, the key stored in the pointer itself is compared to that associated with the memory the pointer references; if the two do not match, a trap may be raised.
  • 56.
    Memory tagging extension – contd • Each memory granule has a tag (aka color) • Every pointer has a tag • On allocation, both memory and pointer get a matching random tag
  • 57.
    Memory tagging extension – contd • Each memory granule has a tag (aka color) • Every pointer has a tag • On allocation, both memory and pointer get a matching random tag • On pointer dereference, pointer tag must match memory tag
  • 58.
    Memory tagging extension – contd • Each memory granule has a tag (aka color) • Every pointer has a tag • On allocation, both memory and pointer get a matching random tag • On pointer dereference, pointer tag must match memory tag
  • 59.
    CONFIG_AMD_MEM_ENCRYPT – AMD Secure Memory Encryption • SME is a hardware feature present on AMD CPUs allowing system RAM to be encrypted and decrypted (mostly) transparently by the CPU, with a little help from the kernel to transition to/from encrypted RAM. • Such RAM should be more secure against various physical attacks, like RAM access via the memory bus, and should make the radio signature of memory bus traffic harder to intercept (and decrypt) as well. • It works by marking individual pages of memory as encrypted using the standard x86 page tables. A page that is marked encrypted will be automatically decrypted when read from DRAM and encrypted when written to DRAM.
  • 60.
  • 61.
    AMD SME – contd • Support for SME can be determined through the CPUID instruction. CPUID function 0x8000001f[eax] reports if SME is supported. • If support for SME is present, MSR 0xc0010010 (MSR_K8_SYSCFG) bit[21] can be used to determine if memory encryption is enabled. • CPUID 0x8000001f[ebx] bits[5:0] give the page-table bit position used to activate memory encryption (the C-bit).
  • 62.
  • 63.
    grsecurity and the PaX project • grsecurity (www.grsecurity.net) is the only drop-in Linux kernel replacement offering high-performance, state-of-the-art exploit prevention against both known and unknown threats. • PaX is a separate project that is included in grsecurity as part of its security strategy. The PaX project researches various defences against the exploitation of software bugs (e.g., buffer overflows and user-supplied format string bugs). • PaX does not focus on finding and fixing the bugs, but rather on the prevention and containment of exploit techniques.
  • 64.
  • 65.
    Linux Kernel Runtime Guard (LKRG) • It's an independent project, an equivalent of Windows PatchGuard. • Performs runtime integrity checking of the Linux kernel and detection of security vulnerability exploits against the kernel. • LKRG is a kernel module, so it can be built for and loaded on top of a wide range of mainline and distro kernels, without needing to patch them. • It uses kprobes for hooking various Linux kernel APIs. • The amount of protection provided is based on the profile set by the user. Allowed values are 0 (log and accept), 1 (selective), 2 (strict), and 3 (paranoid).
  • 66.
    Linux Kernel Runtime Guard (LKRG) – Features • Exploit detection • Tracking processes' important data structures and metadata: pointer and capability, namespace, and cred struct modification. • ptrace access, keyring access • SELinux state modification • Checking kernel module integrity • Module removed from the module list or KOBJ. • Gathers information about loaded kernel modules and tries to protect them by calculating hashes of their core text section. • Kernel component validation • SMEP, MSRs, pint, kint, umh and profile validation • Periodically checks critical system hashes using a timer • Can (un)hide itself from the module system activity components
  • 67.
    LSM – Linuxsecurity module framework • The LSM kernel patch provides a general kernel framework to support security modules. • By itself, the framework does not provide any additional security; it merely provides the infrastructure to support security modules • The LSM kernel patch adds security fields to kernel data structures and inserts calls to hook functions at critical points in the kernel code to manage the security fields and to perform access control. • It also adds functions for registering and unregistering security modules, and adds a general security system call to support new system calls for security-aware applications.
  • 68.
    Kernel lockdown • Introduced in 2019 (v5.4), a Linux kernel security module that uses the LSM framework. • The kernel lockdown feature is designed to prevent both direct and indirect access to a running kernel image, • attempting to protect against unauthorized modification of the kernel image and • preventing access to security and cryptographic data located in kernel memory. • If a prohibited or restricted feature is accessed or used, the kernel will emit a message that looks like: Lockdown: X: Y is restricted, see man kernel_lockdown.7 where X indicates the process name and Y indicates what is restricted.
  • 70.
  • 72.
    Uncovered protections • Retbleed mitigation: • CONFIG_CPU_IBPB_ENTRY • CONFIG_CPU_UNRET_ENTRY • CONFIG_RETPOLINE • CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS • Spectre and Meltdown – ARM: • CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY • CONFIG_UNMAP_KERNEL_AT_EL0 • CONFIG_ARM64_PTR_AUTH_KERNEL • CONFIG_ARM64_BTI • CONFIG_ARM64_EPAN
  • 73.
    Uncovered protections • CONFIG_DEBUG_STACKOVERFLOW • KFENCE • KASAN/KMSAN/KCSAN • CONFIG_DEBUG_KMEMLEAK • refcount_t API • FORTIFY_SOURCE • PaX RAP • L1TF mitigation – PTE inversion
  • 74.

Editor's Notes

  • #6 [sudo] password for parallels: [21548.411904] Hello world. [21548.411906] Page offset value is 0xffff898740000000. [21548.411908] PHYS_OFFSET value is 0x0 or 1000000. [21548.411909] Task size is 0x7ffffffff000. [21548.411909] High memory address is 0xffff898890000000 and has physical address of 0x150000000. [21548.411910] Vmalloc start address is 0xffffa89600000000. [21548.411911] Physical address where kernel is loaded is 0x1000000. [21548.411911] Module memory start address is 0xffffffffc0000000 and ends at 0xffffffffff000000. [21548.411912] Data segment address is 0xffffffffc08e707b. [21548.411912] Stack location is 0xffffa89606d03b8c.
  • #8 https://linux-kernel-labs.github.io/refs/heads/master/lectures/address-space.html
  • #25 Source - https://github.com/bcoles/kasld
  • #35 Reference: https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
  • #39 https://github.com/torvalds/linux/commit/5aa90a84589282b87666f92b6c3c917c8080a9bf https://github.com/torvalds/linux/blob/master/Documentation/x86/pti.rst
  • #40 Can be bypassed using swapgs attack
  • #41 https://www.blackhat.com/docs/eu-14/materials/eu-14-Kemerlis-Ret2dir-Deconstructing-Kernel-Isolation-wp.pdf
  • #42 https://www.blackhat.com/docs/eu-14/materials/eu-14-Kemerlis-Ret2dir-Deconstructing-Kernel-Isolation-wp.pdf
  • #44 https://support.google.com/faqs/answer/7625886
  • #59 Source : https://docs.google.com/presentation/d/1IpICtHR1T3oHka858cx1dSNRu2XcT79-RCRPgzCuiRk/edit
  • #62 AES engine https://github.com/torvalds/linux/blob/master/Documentation/x86/amd-memory-encryption.rst