vmlinux: Anatomy of bzimage and how
x86_64 processor is booted
Adrian Huang | May, 2021
* Based on kernel 5.11 (x86_64) – QEMU
* Legacy BIOS
Agenda
• bzImage: high-level overview
• Layout of bzImage
• ELF layout
• setup.bin and compressed vmlinux
• Physical memory layout
• Entry point of Linux – ‘start_of_setup’@0x10200 (physical memory)
• From viewpoint of GRUB and QEMU loader
• Initialization flow
• Compressed vmlinux
• ELF layout
• Physical memory layout
• Initialization flow
Requisite Knowledge
• CPU architecture knowledge
✓ Near call and far call
✓ Near jump and far jump
✓ Instruction opcode
• CPU Operation Mode
✓ Real mode, protected mode and long mode (64-bit mode)
➢ Memory addressing
• ELF
✓ Relocation, program header,…
• GNU assembly
bzImage: High-level Overview
bzImage consists of three parts:
• setup.bin: boot code (real mode -> protected mode)
• Compressed vmlinux (protected-mode kernel): boot code (protected mode + paging -> long mode) followed by vmlinux.bin.gz
• CRC
Layout of bzImage – setup.bin
bzImage: setup.bin, compressed vmlinux (protected-mode kernel), CRC
Kernel Boot Section: 512 bytes (MBR), offsets 0x0 to 0x200 of setup.bin
Source : arch/x86/boot/header.S
ELF sections of setup.bin, in layout order: .bstext, .bsdata, Part 1 of ‘.header’ (from 0x1F1), Part 2 of ‘.header’, .entrytext, .inittext, .initdata, .text, .text32, .rodata, .videocards, .data, .signature (4-byte signature), .bss
Sources: arch/x86/boot/header.S (.bstext, .bsdata, ‘.header’, .entrytext), arch/x86/boot/tty.c, arch/x86/boot/*.c, arch/x86/boot/video-*.c (.videocards), arch/x86/boot/bioscall.S, arch/x86/boot/copy.S, arch/x86/boot/pmjump.S (.text32)
".header": Real-mode kernel header
Offset/Size  Name          Description
0x1F1/1      setup_sects   The size of the setup in sectors
0x1FE/2      boot_flag     Magic number: 0xAA55
0x200/2      jump          Jump instruction
0x214/4      code32_start  Boot loader hook: the address to jump to in protected mode. Default: 0x100000
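The header fields listed above can be read straight out of an image at their fixed offsets. The following is a minimal sketch, not kernel or boot loader code: `parse_setup_header` and `make_fake_image` are hypothetical helper names, and the test image is synthetic (the value 30 for setup_sects is made up).

```python
import struct

def parse_setup_header(image: bytes) -> dict:
    """Read four real-mode header fields at the offsets from the table."""
    if len(image) < 0x218:
        raise ValueError("image too short to hold the header fields")
    setup_sects = image[0x1F1]                                # 1 byte
    (boot_flag,) = struct.unpack_from("<H", image, 0x1FE)     # 2 bytes, little endian
    jump = image[0x200:0x202]                                 # 2 bytes of machine code
    (code32_start,) = struct.unpack_from("<I", image, 0x214)  # 4 bytes, little endian
    if boot_flag != 0xAA55:
        raise ValueError("boot_flag magic number 0xAA55 not found")
    return {
        "setup_sects": setup_sects,
        "boot_flag": boot_flag,
        "jump": jump,
        "code32_start": code32_start,
    }

def make_fake_image() -> bytes:
    """Build a synthetic 1 KiB buffer with only the four fields filled in."""
    buf = bytearray(0x400)
    buf[0x1F1] = 30                                 # hypothetical sector count
    struct.pack_into("<H", buf, 0x1FE, 0xAA55)
    buf[0x200:0x202] = bytes([0xEB, 0x6A])          # short jump, displacement 0x6a
    struct.pack_into("<I", buf, 0x214, 0x100000)
    return bytes(buf)

header = parse_setup_header(make_fake_image())
print({k: (v.hex() if isinstance(v, bytes) else hex(v)) for k, v in header.items()})
```

The same function could be pointed at the first kilobyte of a real bzImage file, assuming the file follows the layout described on this slide.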
Layout of bzImage – setup.bin: control transfers
1. short jump: CPU Real Mode (16 bits)
2. near call: CPU Real Mode
3. long jump: CPU Real Mode -> CPU Protected Mode (32 bits)
Layout of bzImage – compressed vmlinux
bzImage: setup.bin (arch/x86/boot/setup.bin), compressed vmlinux (protected-mode kernel), CRC
Note: the compressed vmlinux exists in two forms
• ELF: arch/x86/boot/compressed/vmlinux
• Binary: arch/x86/boot/vmlinux.bin
How to pack vmlinux.bin.gz? vmlinux.bin -> vmlinux.bin.gz (arch/x86/boot/compressed)
ELF sections of the compressed vmlinux, starting at 0x0:
• .head.text <- arch/x86/boot/compressed/head_64.S
• .rodata..compressed (vmlinux.bin.gz) <- arch/x86/boot/compressed/piggy.S created by arch/x86/boot/compressed/mkpiggy.c; payload: arch/x86/boot/compressed/vmlinux.bin.gz
• .text, .rodata, .data, .bss <- arch/x86/boot/compressed/*.c, arch/x86/boot/compressed/head_64.S, arch/x86/boot/compressed/efi_thunk_64.S
• .pgtable <- arch/x86/boot/compressed/head_64.S
Layout of bzImage – compressed vmlinux.bin
* Symbol: Equivalent to using ‘.set’ directive
* https://sourceware.org/binutils/docs/as/Setting-Symbols.html
Why z_input_len/input and z_output_len/output_len?
* BFD: Binary File Descriptor library - https://www.gnu.org/software/binutils/
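The length symbols on this slide describe the compressed payload and its decompressed size. As an illustration only (this does not reproduce mkpiggy.c), the sketch below shows one way a build helper could obtain both numbers for a gzip payload: the compressed length is the blob size, and the gzip trailer stores the uncompressed size (ISIZE, modulo 2^32) in its last four bytes. `payload_lengths` is a hypothetical name and the payload is a stand-in.

```python
import gzip
import struct

def payload_lengths(gz_blob: bytes) -> tuple:
    """Return (compressed length, uncompressed length) of a gzip blob."""
    input_len = len(gz_blob)
    # The gzip trailer ends with ISIZE: uncompressed size, little endian, 32 bits.
    (output_len,) = struct.unpack("<I", gz_blob[-4:])
    return input_len, output_len

raw = b"\x00" * 4096 + b"vmlinux.bin stand-in"   # synthetic payload
blob = gzip.compress(raw)
input_len, output_len = payload_lengths(blob)
print(f"input_len={input_len} output_len={output_len}")
```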
Memory layout of bzImage – Entry Point Address
Where is ‘X’?
Physical Memory (Reference: Documentation/x86/boot.rst)
Start address  Region                                      Note
0x100000       Protected-mode kernel (Compressed vmlinux)
0xA0000        I/O memory hole
X+0x10000      Command line, Reserved for BIOS
X+0x08000      stack/heap                                  For use by the kernel real-mode/protected mode code
X              Kernel boot section, Kernel setup code      The kernel legacy boot sector, followed by the kernel real-mode/protected mode code
0x01000        Boot loader                                 Boot sector entry point 0000:7C00
0x00800        Reserved for MBR/BIOS
0x00600        Typically used by MBR
0x00000        BIOS use only
Entry Point of Linux - GRUB
Memory addressing in real mode: physical address = (segment << 4) + offset
[GRUB] Get the memory address for real mode code
GRUB loads ‘setup.bin’ at address 0x10000
Registers configured by GRUB
1. gs = fs = es = ds = ss = 0x1000
2. sp = GRUB_LINUX_SETUP_STACK = 0x9000
3. cs = 0x1020, ip = 0
[Figure: physical memory. Kernel boot section from 0x10000 (ds = es = fs = gs = ss), kernel setup code (real mode and protected mode parts) from 0x10200 (cs), stack at ss:sp]
1. Both the QEMU loader and GRUB load ‘setup.bin’ at address 0x10000
2. The QEMU loader sets SS:SP = 1000:FFF0, while GRUB sets SS:SP = 1000:9000
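The register values above turn into physical addresses through real-mode addressing, where a segment is shifted left by four bits and the offset is added. A minimal sketch (`real_mode_phys` is a hypothetical helper name, and wrap-around at 1 MB is ignored):

```python
def real_mode_phys(segment: int, offset: int) -> int:
    """Real-mode address translation: physical = segment * 16 + offset."""
    return (segment << 4) + offset

entry = real_mode_phys(0x1020, 0x0000)       # cs:ip set by both loaders
grub_stack = real_mode_phys(0x1000, 0x9000)  # ss:sp set by GRUB
qemu_stack = real_mode_phys(0x1000, 0xFFF0)  # ss:sp set by the QEMU loader

print(hex(entry), hex(grub_stack), hex(qemu_stack))
```

With these inputs the entry point lands at 0x10200, 0x200 bytes past the load address of setup.bin, which is the end of the 512-byte kernel boot section.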
Entry Point of Linux: QEMU loader
QEMU loader loads ‘setup.bin’ at address 0x10000
Registers configured by QEMU loader
• ds = es = fs = gs = ss = segment_addr = 0x1000
• esp = stack_addr = cmdline_addr - setup_addr – 16 = 0x20000 – 0x10000 – 16 = 0x10000 – 16 = 0xfff0
• cs = 0x1020, ip = 0
Prepare for far return, then far return: change ‘cs’ by means of the CPU architecture itself
[Figure: physical memory. Kernel boot section from 0x10000 (ds = es = fs = gs = ss), kernel setup code (real mode and protected mode parts) from 0x10200 (cs), stack top at ss:sp = 0x1FFF0]
Entry Point of Linux: QEMU loader – Near and Far calls
Prepare for far return, then far return: change ‘cs’ by means of the CPU architecture itself
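Code cannot write ‘cs’ directly with a move, so a loader can push the wanted cs and ip and execute a far return, which pops ip first and cs second. The toy model below illustrates only that stack discipline. `RealModeCpu` is a hypothetical class, the starting cs:ip of the loader is made up, and this is not the QEMU loader's actual code.

```python
class RealModeCpu:
    """Toy model of cs:ip plus a stack of 16-bit words."""

    def __init__(self, cs: int, ip: int):
        self.cs = cs
        self.ip = ip
        self.stack = []

    def push(self, value: int) -> None:
        self.stack.append(value & 0xFFFF)

    def far_return(self) -> None:
        # A 16-bit far return pops the offset first, then the segment.
        self.ip = self.stack.pop()
        self.cs = self.stack.pop()

    def phys_pc(self) -> int:
        return (self.cs << 4) + self.ip

cpu = RealModeCpu(cs=0xCB00, ip=0x0040)  # hypothetical loader location
cpu.push(0x1020)  # prepare for far return: target cs
cpu.push(0x0000)  # prepare for far return: target ip
cpu.far_return()
print(hex(cpu.cs), hex(cpu.ip), hex(cpu.phys_pc()))
```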
Entry Point of Linux: QEMU loader
• Make sure setup.bin is loaded at 0x10000 (address of setup.bin)
• Make sure vmlinux.bin is loaded at 0x100000 (address of vmlinux.bin)
Sources: arch/x86/boot/setup.ld, arch/x86/boot/header.S
[Figure: physical memory. QEMU loader loads ‘setup.bin’ at address 0x10000: kernel boot section from 0x10000 (ds = es = fs = gs = ss), kernel setup code (real mode and protected mode parts) from 0x10200 (cs), stack top at sp = 0x1FFF0 (ss:0xFFF0)]
Entry Point of Linux: GNU Linker
[GNU Linker] ENTRY() command
* The first executable instruction in an output file is the entry point
* ENTRY() is one of several ways of choosing the entry point. The linker tries the following in order and stops at the first one that succeeds:
-- the `-e' entry command-line option
-- the ENTRY(symbol) command in a linker control script
-- the value of the symbol start, if present
-- the address of the first byte of the .text section, if present
-- the address 0
Source: arch/x86/boot/setup.ld
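The order of precedence listed above can be written as a small selection function. This is a simplified model of the rule, not GNU ld's implementation: `choose_entry` is a hypothetical name, the symbol table is invented for illustration, and the use of `_start` as the ENTRY() symbol is an assumption about setup.ld.

```python
from typing import Dict, Optional

def choose_entry(symbols: Dict[str, int],
                 e_option: Optional[str] = None,
                 script_entry: Optional[str] = None,
                 text_start: Optional[int] = None) -> int:
    """Pick an entry address by trying each source in order."""
    # 1. the -e option, 2. ENTRY(symbol), 3. the symbol 'start'
    for name in (e_option, script_entry, "start"):
        if name is not None and name in symbols:
            return symbols[name]
    # 4. first byte of .text, if present
    if text_start is not None:
        return text_start
    # 5. address 0
    return 0

# Hypothetical symbol table, offsets relative to the start of setup.bin.
symbols = {"_start": 0x200, "start_of_setup": 0x26C}

from_script = choose_entry(symbols, script_entry="_start", text_start=0x300)
from_option = choose_entry(symbols, e_option="start_of_setup", script_entry="_start")
fallback = choose_entry({}, text_start=None)
print(hex(from_script), hex(from_option), hex(fallback))
```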
Entry Point of Linux: start_of_setup - GDB
[Figure: physical memory. Boot loader loads ‘setup.bin’ at address 0x10000: kernel boot section from 0x10000 (gs = fs = es = ds = ss), kernel setup code (real mode and protected mode parts) from 0x10200 (cs), stack top at sp = 0x1FFF0 (ss:0xFFF0)]
Entry Point of Linux: start_of_setup – short jump
The 2-byte ‘jump’ field at offset 0x200 of the real-mode kernel header is a short jump to start_of_setup.
Jump displacement: 0x26c – 0x202 = 0x6a (0x26c: offset of start_of_setup; 0x202: offset of the instruction that follows the 2-byte jump)
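The subtraction above is how an x86 short jump encodes its target: opcode 0xEB followed by a signed 8-bit displacement relative to the next instruction. A minimal sketch of the decoding (`short_jump_target` is a hypothetical helper name; the byte pair EB 6A is derived from the displacement on this slide):

```python
def short_jump_target(code: bytes, addr: int) -> int:
    """Decode 'jmp rel8' located at addr and return its target address."""
    if len(code) < 2 or code[0] != 0xEB:
        raise ValueError("not a short jump (opcode 0xEB)")
    rel8 = int.from_bytes(code[1:2], "little", signed=True)
    # The displacement is relative to the end of the 2-byte instruction.
    return addr + 2 + rel8

target_offset = short_jump_target(bytes([0xEB, 0x6A]), 0x200)
target_phys = 0x10000 + target_offset   # setup.bin is loaded at 0x10000
print(hex(target_offset), hex(target_phys))
```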
Entry Point of Linux: start_of_setup
Call Path
lretw instruction: Far Return Operation
‘l’ prefix: far control transfer
‘w’ suffix: word (16 bits)
[Figure: physical memory. Boot loader loads ‘setup.bin’ at address 0x10000. Before lretw, cs points at 0x10200; after lretw, gs = fs = es = ds = ss = cs, all based at 0x10000. Stack top at sp = 0x1FFF0 (ss:0xFFF0)]
Entry Point of Linux: start_of_setup – Why align CS?
Call Path
If cs is not aligned with ds, ds and es are incorrect after returning from ‘intcall’.
Entry Point of Linux: start_of_setup – data & bss section
Call Path
[Figure: physical memory. Boot loader loads ‘setup.bin’ at address 0x10000 (gs = fs = es = ds = ss = cs). From low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200, real mode and protected mode parts), Data Section, BSS Section (__bss_start to __bss_end), _end, stack with top at sp = 0x1FFF0]
Entry Point of Linux: start_of_setup -> main()
Call Path
Entry Point of Linux: start_of_setup -> main() -> copy_boot_params()
Call Path
• Copy the setup header into the boot parameter block (struct boot_params: arch/x86/include/uapi/asm/bootparam.h)
o `struct setup_header hdr` in boot_params
▪ Contains the same fields as defined in the Linux boot protocol. Those fields are set by the boot loader and at kernel build time
Entry Point of Linux: start_of_setup -> main() -> console_init()
Call Path
• console_init()
o Initialize the corresponding serial port if the command line has the ‘earlyprintk’ parameter
[Figure: physical memory (gs = fs = es = ds = ss = cs). From low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200, real mode and protected mode parts), Data Section, BSS Section (__bss_start to __bss_end), _end, stack with top at sp = 0x1FFF0, Kernel Command Line at 0x20000 (placed by the QEMU Loader)]
Entry Point of Linux: start_of_setup -> main() -> validate_cpu() & detect_memory()
Call Path
• init_heap()
o Discussed in the next few slides
• validate_cpu()
o Check CPU flags
o Check if long mode (x86_64) is available
o [AMD – K7 Processor] Turn SSE+SSE2 on if they are missing in CPU flags
• detect_memory()
o Use different BIOS interfaces (0xe820, 0xe801 and 0x88) for memory detection
o 0xe820
▪ Fill boot_params.e820_table based on the e820 map
Entry Point of Linux: start_of_setup -> main() -> init_heap()
Call Path
• init_heap()
o Set up the heap space if the ‘CAN_USE_HEAP’ flag (0x80) is set in loadflags of the kernel setup header.
[Figure: two physical memory layouts, both with gs = fs = es = ds = ss = cs and a stack of STACK_SIZE = 0x400 below sp. Common part, from low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200, real mode and protected mode parts), Data Section, BSS Section (__bss_start to __bss_end).
No heap: HEAP = heap_end = _end, and the area between _end and the stack is unused.
Heap allocated (‘CAN_USE_HEAP’ flag (0x80) is set): HEAP = _end, and the heap extends from _end up to heap_end, below the stack.]
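The two layouts above can be modeled as a small calculation. The sketch below is a Python model written from memory of arch/x86/boot/main.c, so treat the `+ 0x200` adjustment of heap_end_ptr and the clamping against the stack as assumptions rather than quoted code. All input values (`end`, `esp`, `heap_end_ptr`) are made-up offsets within the setup segment.

```python
CAN_USE_HEAP = 0x80   # loadflags bit from the slide
STACK_SIZE = 0x400    # stack size from the slide

def init_heap(loadflags: int, heap_end_ptr: int, esp: int, end: int) -> tuple:
    """Return (HEAP, heap_end) as offsets within the setup segment."""
    heap_start = end                      # HEAP = _end in both layouts
    if not loadflags & CAN_USE_HEAP:
        return heap_start, heap_start     # no heap: heap_end = _end
    stack_end = esp - STACK_SIZE          # lowest address reserved for the stack
    heap_end = heap_end_ptr + 0x200       # assumed boot-protocol adjustment
    if heap_end > stack_end:
        heap_end = stack_end              # never let the heap run into the stack
    return heap_start, heap_end

with_heap = init_heap(loadflags=0x81, heap_end_ptr=0xFE00, esp=0xFF80, end=0x4000)
no_heap = init_heap(loadflags=0x01, heap_end_ptr=0xFE00, esp=0xFF80, end=0x4000)
print([hex(v) for v in with_heap], [hex(v) for v in no_heap])
```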
go_to_protected_mode
Call Path
setup_gdt(): Setup 4G memory space for CS/DS
Descriptor Table: boot_gdt
Index  Entry
0      NULL
1      NULL
2      GDT_ENTRY_BOOT_CS
3      GDT_ENTRY_BOOT_DS
4      GDT_ENTRY_BOOT_TSS
x86 Segmentation: Address Translation
[Figure: GDTR holds the base address and limit of boot_gdt; the CS and DS descriptors cover system memory from 0 to 0xFFFFFFFF]
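How a descriptor comes to cover 0 to 0xFFFFFFFF is easier to see by packing and unpacking one. The sketch below mirrors the bit layout of an x86 segment descriptor. The flag values 0xc09b (code) and 0xc093 (data) with limit 0xfffff are recalled from arch/x86/boot/pm.c and should be read as assumptions; `gdt_entry` and `decode` are hypothetical helper names.

```python
def gdt_entry(flags: int, base: int, limit: int) -> int:
    """Pack flags, base and limit into a 64-bit segment descriptor."""
    return (((base & 0xFF000000) << 32)     # base bits 31..24 -> descriptor bits 63..56
            | ((flags & 0x0000F0FF) << 40)  # access byte and G/D/L/AVL flags
            | ((limit & 0x000F0000) << 32)  # limit bits 19..16 -> descriptor bits 51..48
            | ((base & 0x00FFFFFF) << 16)   # base bits 23..0
            | (limit & 0x0000FFFF))         # limit bits 15..0

def decode(desc: int) -> tuple:
    """Return (base, last addressable byte offset) of a descriptor."""
    base = ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    if (desc >> 55) & 1:                    # G bit: limit counts 4 KiB pages
        limit = (limit << 12) | 0xFFF
    return base, limit

boot_cs = gdt_entry(0xC09B, 0, 0xFFFFF)  # assumed flags for GDT_ENTRY_BOOT_CS
boot_ds = gdt_entry(0xC093, 0, 0xFFFFF)  # assumed flags for GDT_ENTRY_BOOT_DS
print(hex(boot_cs), hex(boot_ds), [hex(v) for v in decode(boot_cs)])
```

Both descriptors decode to base 0 and limit 0xFFFFFFFF, which is the flat 4G space named on the slide.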
protected_mode_jump (1/6)
protected_mode_jump – ljmpl instruction: ignore ‘.Lin_pm32’ relocation (2/6)
setup.bin generation: 0x30cc. Without relocation, ljmpl jumps (absolute address) to the wrong location
protected_mode_jump – ljmpl instruction - relocation (3/6)
Relocation for absolute address of ‘ljmpl’
protected_mode_jump – ljmpl instruction (4/6)
[Figure, repeated on slides 2/6 to 4/6: physical memory (gs = fs = es = ds = ss = cs). From low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200, real mode and protected mode parts, with the ljmpl instruction marked), Data Section, BSS Section (__bss_start to __bss_end), Heap (HEAP = _end to heap_end), stack (STACK_SIZE = 0x400) below sp]
protected_mode_jump – ljmpl instruction: instruction format (5/6)
protected_mode_jump – ljmpl instruction: instruction format (6/6)
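The relocation and the instruction format can be illustrated together. In protected mode the code segment has base 0, so the 32-bit offset inside ljmpl must be a linear address: the link-time offset plus the real-mode segment base. The sketch below assumes the encoding 0x66 0xEA, a 32-bit offset, then a 16-bit selector; it treats 0x30cc as the link-time offset of ‘.Lin_pm32’ and 0x10 as the boot code selector (GDT index 2 times 8). `build_ljmpl` and `patch_ljmpl` are hypothetical names.

```python
import struct

BOOT_CS_SELECTOR = 0x10   # assumed: GDT_ENTRY_BOOT_CS (index 2) * 8

def build_ljmpl(offset32: int, selector: int) -> bytes:
    """Encode a far jump with a 32-bit offset as emitted in 16-bit code."""
    return bytes([0x66, 0xEA]) + struct.pack("<IH", offset32, selector)

def patch_ljmpl(insn: bytes, real_mode_cs: int) -> bytes:
    """Add the real-mode segment base (cs << 4) to the jump offset."""
    (offset32,) = struct.unpack_from("<I", insn, 2)
    fixed = (offset32 + (real_mode_cs << 4)) & 0xFFFFFFFF
    return insn[:2] + struct.pack("<I", fixed) + insn[6:]

unpatched = build_ljmpl(0x30CC, BOOT_CS_SELECTOR)  # would jump to linear 0x30cc
patched = patch_ljmpl(unpatched, real_mode_cs=0x1000)
(target,) = struct.unpack_from("<I", patched, 2)
print(unpatched.hex(), patched.hex(), hex(target))
```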
Protected mode: ‘.Lin_pm32’ (1/2)
[real mode] SP configuration: 0x1FF80 (SS:SP = 0x1000:0xFF80)
[protected mode] SP configuration: `addl %ebx, %esp` in label “.Lin_pm32”, with ebx = 0x10000, gives esp = 0x1FF80
[Figure: the same physical memory layout in real mode and in protected mode. From low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200, real mode and protected mode parts), Data Section, BSS Section (__bss_start to __bss_end), Heap (HEAP = _end to heap_end), stack]
Protected mode: ‘.Lin_pm32’ (2/2)
Call Path
`jmpl *%eax` jumps to the protected-mode kernel code (compressed vmlinux) at 0x100000
[Figure: physical memory with X = 0x10000 and esp = 0x1FF80. From low to high addresses: kernel boot section (X), kernel setup code (0x10200, real mode and protected mode parts), Data Section, BSS Section (__bss_start to __bss_end), Heap (HEAP = _end to heap_end), stack, command line and Reserved for BIOS (from X+0x10000), I/O memory hole (0xA0000), protected-mode kernel code (0x100000)]
Compressed vmlinux: memory layout (1/10)
Compressed vmlinux: boot_stack & boot_heap in .bss (2/10)
Memory Layout
• Compressed vmlinux as loaded, from 0x100000 (ebp register) to 0x100000 + _end
o .head.text – startup_32 at 0x100000 (32-bit entry point)
o .head.text – startup_64 at 0x100200
o vmlinux.bin.gz (input_data to input_data_end)
o .text, .rodata, .data
o .bss (starts at _bss): boot_heap (size: 0x10000), boot_stack (size: 0x4000), …
o .pgtable, then _end
• decompressed vmlinux.bin.bz at 0x1000000
• compressed vmlinux (Relocation): from 0x1000000 + boot_param.init_size - _end (rbx register) to 0x1000000 + boot_param.init_size
Compressed vmlinux: High-level Overview (3/10)
Why relocation
• Base address of 32-bit Linux kernel entry point: 0x100000
• Default base address of Linux kernel: CONFIG_PHYSICAL_START=0x1000000
• Use Case
• kdump: a rescue kernel is loaded to a different address
• PIE (Position Independent Executable) and PIC (Position Independent Code)
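The addresses above combine into the place where the compressed image is copied before decompression. The model below is a sketch based on a reading of startup_32 in head_64.S and is an assumption, not quoted code: the load address is aligned up to the kernel alignment, raised to the default base if lower, and then the image is placed so that it ends at base + init_size. The values of `init_size`, `end_offset` and the 2 MB alignment are made up for illustration.

```python
LOAD_PHYSICAL_ADDR = 0x1000000   # CONFIG_PHYSICAL_START from the slide

def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def decompress_base(load_addr: int, kernel_alignment: int) -> int:
    """Base address for the decompressed kernel (relocatable case)."""
    base = align_up(load_addr, kernel_alignment)
    return base if base >= LOAD_PHYSICAL_ADDR else LOAD_PHYSICAL_ADDR

def relocation_target(load_addr: int, kernel_alignment: int,
                      init_size: int, end_offset: int) -> int:
    """Where the compressed image is copied: base + init_size - _end."""
    return decompress_base(load_addr, kernel_alignment) + init_size - end_offset

ALIGN = 0x200000        # hypothetical kernel_alignment (2 MB)
INIT_SIZE = 0x2000000   # hypothetical init_size
END = 0x900000          # hypothetical size of the compressed image (_end)

normal_boot = relocation_target(0x100000, ALIGN, INIT_SIZE, END)
kdump_boot = relocation_target(0x2F000000, ALIGN, INIT_SIZE, END)
print(hex(normal_boot), hex(kdump_boot))
```

In the normal case the base falls back to 0x1000000, which matches the layout on the previous slides; in the hypothetical kdump case the whole layout moves with the load address.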
Compressed vmlinux: startup_32: 32-bit entry point (4/10)
Compressed vmlinux: startup_32 (5/10)
Get the loading address
Compressed vmlinux: startup_32 (6/10)
Compressed vmlinux: startup_32: Init 4-level page table (7/10)
[Paging] Identity mapping for 0-4GB memory space
Linear Address (2-MByte pages)
Bits   Field
63-48  Sign-extend
47-39  Page Map Level-4 Offset (9 bits)
38-30  Page Directory Pointer Offset (9 bits)
29-21  Page Directory Offset (9 bits)
20-0   Physical Page Offset
Tables
• Page Map Level-4 Table (located through CR3): PML4E #0
• Page Directory Pointer Table: PDPTE #0 to PDPTE #3
• Page Directory Tables: PDE #0 to PDE #511, PDE #512 to PDE #1023, PDE #1024 to PDE #1535, PDE #1536 to PDE #2047
• Each PDE points to a 2-MByte Physical Page
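The table structure above can be checked with a small model: 4 page directories of 512 entries each, where entry n maps the n-th 2 MB page onto itself. This is an illustrative sketch, not the assembly in head_64.S. The flag bits are the architectural Present, Read/Write and Page Size bits, and the choice to set exactly these three is an assumption.

```python
PRESENT, WRITABLE, PAGE_SIZE_2M = 0x1, 0x2, 0x80
TWO_MB = 1 << 21

def build_identity_tables() -> list:
    """Return 4 page directories (one per PDPTE), 512 PDEs each."""
    flags = PRESENT | WRITABLE | PAGE_SIZE_2M
    return [[((pdpte * 512 + i) * TWO_MB) | flags for i in range(512)]
            for pdpte in range(4)]

def split(linear: int) -> tuple:
    """Split a linear address into (PML4, PDPT, PD index, page offset)."""
    return ((linear >> 39) & 0x1FF, (linear >> 30) & 0x1FF,
            (linear >> 21) & 0x1FF, linear & (TWO_MB - 1))

def translate(page_dirs: list, linear: int) -> int:
    """Walk the model tables and return the physical address."""
    pml4_i, pdpt_i, pd_i, offset = split(linear)
    if pml4_i != 0 or pdpt_i >= len(page_dirs):
        raise LookupError("address not mapped (only 0-4GB is covered)")
    pde = page_dirs[pdpt_i][pd_i]
    return (pde & ~(TWO_MB - 1)) + offset   # strip flag bits, add page offset

tables = build_identity_tables()
print(split(0xC0000000), hex(translate(tables, 0x100200)))
```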
Compressed vmlinux: startup_32: Init 4-level page table (8/10)
Reference: Section 4.1 “PAGING MODES AND CONTROL BITS”, Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3 (3A, 3B, 3C & 3D): System Programming Guide
Compressed vmlinux: startup_32: Init 4-level page table (9/10)
Compressed vmlinux: far return to startup_64 (10/10)
rva(startup_64) = 0x200
ebp = 0x100000
eax = 0x100000 + 0x200 = 0x100200
Compressed vmlinux: startup_64
Why reload CS? (Commit “34bb49229f19”)
When the pre-decompression code loads its first GDT in startup_64, it is still
running on the CS value of the previous GDT. In the case of SEV-ES this is the EFI
GDT. It can be anything depending on what has loaded the kernel (EFI, legacy boot
code, container runtime, etc.)
Compressed vmlinux: [.text] .Lrelocated (1/5)
Why call initialize_identity_maps()?
Compressed vmlinux: [.text] .Lrelocated (2/5)
Why map boot_params and command line?
Compressed vmlinux: parse_elf (3/5)
[Figure: two physical memory layouts.
Before parse_elf: decompressed vmlinux.bin.bz (vmlinux.bin – ELF format) at 0x1000000, holding the ELF Header, the program headers, and the contents described by program header #0 (.text, .rodata, .pci_fixup….), program header #1 (.data .vvar), program header #2 (.init.text .altinstr_aux …) and program header #3 (.notes). Addresses marked: 0x1200000, 0x1a00000, 0x1ac2000, 0x18886b0.
After parse_elf: the contents of program header #0, #1 and #2 placed from 0x1000000. Addresses marked: 0x1000000, 0x1800000, 0x18c2000.]
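The move shown in the figure can be sketched as an address calculation per loadable program header. The model below assumes the relocatable case, where each PT_LOAD segment is moved from output + p_offset to output + (p_paddr - LOAD_PHYSICAL_ADDR). The p_offset and p_paddr values are inferred from the addresses in the figure, the p_filesz values are made up, and `plan_moves` is a hypothetical name.

```python
LOAD_PHYSICAL_ADDR = 0x1000000
PT_LOAD, PT_NOTE = 1, 4   # ELF program header types

def plan_moves(output: int, phdrs: list) -> list:
    """Return (source, destination, size) for every PT_LOAD segment."""
    moves = []
    for ph in phdrs:
        if ph["p_type"] != PT_LOAD:
            continue                       # e.g. the .notes header is skipped
        src = output + ph["p_offset"]      # where the segment sits in the ELF image
        dest = output + (ph["p_paddr"] - LOAD_PHYSICAL_ADDR)
        moves.append((src, dest, ph["p_filesz"]))
    return moves

# Offsets inferred from the figure; sizes are hypothetical.
phdrs = [
    {"p_type": PT_LOAD, "p_offset": 0x200000, "p_paddr": 0x1000000, "p_filesz": 0x700000},
    {"p_type": PT_LOAD, "p_offset": 0xA00000, "p_paddr": 0x1800000, "p_filesz": 0x0C0000},
    {"p_type": PT_LOAD, "p_offset": 0xAC2000, "p_paddr": 0x18C2000, "p_filesz": 0x100000},
    {"p_type": PT_NOTE, "p_offset": 0x0, "p_paddr": 0x0, "p_filesz": 0x0},
]

moves = plan_moves(0x1000000, phdrs)
for src, dest, size in moves:
    print(f"{src:#x} -> {dest:#x} ({size:#x} bytes)")
```

Every destination lies below its source, so copying the segments in header order does not overwrite data that still has to be moved in this example.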
Compressed vmlinux: handle_relocations (4/5)
CONFIG_RELOCATABLE
• Retain relocation information (generate .rel.* or .rela.* sections) when building a kernel image, so it can be loaded somewhere other than the default address (CONFIG_PHYSICAL_START = 16MB).
• Use case: kdump kernel (recovery kernel)
handle_relocations() - Perform relocation if CONFIG_X86_NEED_RELOCS is set
• CONFIG_X86_NEED_RELOCS depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
• Scan relocation tables (.rel.* or .rela.* sections) for symbol relocation
Compressed vmlinux: handle_relocations (5/5)
vmlinux.bin.bz: vmlinux.bin followed by vmlinux.relocs
handle_relocations(): Perform relocation backwards from the end of the decompressed vmlinux
[Figure: vmlinux.relocs holds a list of 64-bit relocation addresses and a list of 32-bit relocation addresses, each list terminated by a 0 entry]
objcopy options
-R section_name: Remove any section matching section_name
-S or --strip-all: Do not copy relocation and symbol information from the source file
Recap
bzImage: setup.bin (arch/x86/boot/setup.bin), compressed vmlinux (protected-mode kernel), CRC
Note: the compressed vmlinux exists in two forms
• ELF: arch/x86/boot/compressed/vmlinux
• Binary: arch/x86/boot/vmlinux.bin
[More info] bzImage = vmlinuz
On a physical machine
Source code: arch/x86/boot/Makefile, arch/x86/boot/install.sh
Reference
• The Linux/x86 Boot Protocol, Documentation/x86/boot.rst
• Intel® 64 and IA-32 Architectures Software Developer’s Manual
• https://wdv4758h.github.io/notes/blog/linux-kernel-boot.html
• Linux insides, https://0xax.gitbooks.io/linux-insides/content/
Appendix
gdb: Preparation for debugging real-mode of Linux kernel (1/2)
Github: https://github.com/AdrianHuang/gdb-linux-real-mode
gdb: Preparation for debugging real-mode of Linux kernel (2/2)
initialize_identity_maps
struct x86_mapping_info
• void *(*alloc_pgt_page)(void *)
• void *context
• unsigned long page_flag
• unsigned long offset
• bool direct_gbpages
• unsigned long kernpg_flag
struct alloc_pgt_data
• unsigned char *pgt_buf
• unsigned long pgt_buf_size
• unsigned long pgt_buf_offset
UEFI booting flow – EFI boot stub: Entry point
AddressOfEntryPoint (efi_pe_entry): 0x18d84a
ImageBase = 0x1000000
Physical address of AddressOfEntryPoint = 0x1000000 + 0x18d84a = 0x118d84a
UEFI booting flow – EFI Handover protocol
Where is the address of bzImage loaded by the boot loader?
UEFI booting: call path

%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 

vmlinux: Anatomy of bzImage and how x86_64 processor is booted

  • 1. vmlinux: Anatomy of bzimage and how x86_64 processor is booted Adrian Huang | May, 2021 * Based on kernel 5.11 (x86_64) – QEMU * Legacy BIOS
  • 2. Agenda • bzimage: high-level overview • Layout of bzImage • ELF layout • setup.bin and compressed vmlinux • Physical memory layout • Entry point of Linux – ‘start_of_setup’@0x10200 (physical memory) • From viewpoint of GRUB and QEMU loader • Initialization flow • Compressed vmlinux • ELF layout • Physical memory layout • Initialization flow
  • 3. Agenda • Layout of bzImage • ELF layout • setup.bin and compressed vmlinux • Physical memory layout • Entry point of Linux – ‘start_of_setup’@0x10200 (physical memory) • From viewpoint of GRUB and QEMU loader • Initialization flow • Compressed vmlinux • ELF layout • Physical memory layout • Initialization flow • CPU architecture knowledge ✓ Near call and far call ✓ Near jump and far jump ✓ Instruction opcode • CPU Operation Mode ✓ Real mode, protected mode and long mode (64-bit mode) ➢ Memory addressing • ELF ✓ Relocation, program header,… • GNU assembly Requisite Knowledge
  • 4. bzImage: High-level Overview (1/2) Boot code (Real mode -> protected mode) Compressed vmlinux Boot code (protected mode + paging -> long mode) vmlinux.bin.gz bzImage
  • 5. bzImage: High-level Overview (2/2) Boot code (Real mode -> protected mode) Compressed vmlinux Boot code (protected mode + paging -> long mode) vmlinux.bin.gz bzImage setup.bin Compressed vmlinux (Protected-mode kernel) CRC bzImage
  • 6. Layout of bzImage – setup.bin setup.bin Compressed vmlinux (Protected-mode kernel) .bstext CRC .bsdata Part 1 of ‘.header’ Part 2 of ‘.header’ .entrytext Kernel Boot Section: 512 bytes (MBR) Source : arch/x86/boot/header.S <- arch/x86/boot/header.S 0x0 0x200 Offset/Size Name Description 0x1F1/1 setup_sects The size of the setup in sectors 0x01FE/2 boot_flag magic number: 0xAA55 0x200/2 jump Jump instruction 0x214/4 code32_start Boot loader hook: The address to jump to in protected mode. Default: 0x100000 ".header": Real-mode kernel header 0x1F1 Part 1 of ‘header’ Part 2 of ‘header’ .inittext .initdata .text .text32 .rodata .videocards .data .signature .bss <- arch/x86/boot/header.S <- arch/x86/boot/header.S <- arch/x86/boot/tty.c <- arch/x86/boot/*.c arch/x86/boot/bioscall.S arch/x86/boot/copy.S arch/x86/boot/ pmjump.S <- arch/x86/boot/ pmjump.S <- arch/x86/boot/*.c <- arch/x86/boot/video-*.c <- arch/x86/boot/*.c <- 4-byte signature <- arch/x86/boot/*.c bzImage ELF sections
  • 7. Layout of bzImage – setup.bin setup.bin Compressed vmlinux (Protected-mode kernel) .bstext CRC .bsdata Part 1 of ‘.header’ Part 2 of ‘.header’ .entrytext Kernel Boot Section: 512 bytes (MBR) Source : arch/x86/boot/header.S <- arch/x86/boot/header.S 0x0 0x200 Offset/Size Name Description 0x1F1/1 setup_sects The size of the setup in sectors 0x01FE/2 boot_flag magic number: 0xAA55 0x200/2 jump Jump instruction 0x214/4 code32_start Boot loader hook: The address to jump to in protected mode. Default: 0x100000 ".header": Real-mode kernel header 0x1F1 Part 1 of ‘header’ Part 2 of ‘header’ short jump .inittext .initdata .text .text32 .rodata .videocards .data .signature .bss <- arch/x86/boot/header.S <- arch/x86/boot/header.S <- arch/x86/boot/tty.c <- arch/x86/boot/*.c arch/x86/boot/bioscall.S arch/x86/boot/copy.S arch/x86/boot/ pmjump.S <- arch/x86/boot/ pmjump.S <- arch/x86/boot/*.c <- arch/x86/boot/video-*.c <- arch/x86/boot/*.c <- 4-byte signature <- arch/x86/boot/*.c bzImage near call long jump 1 2 3 1 CPU Real Mode (16 bits) 2 CPU Real Mode 3 CPU Real Mode -> CPU Protected Mode (32 bits) ELF sections
  • 8. Layout of bzImage – compressed vmlinux setup.bin (arch/x86/boot/setup.bin) Compressed vmlinux (Protected-mode kernel) Note ELF: arch/x86/boot/compressed/vmlinux Binary: arch/x86/boot/vmlinux.bin CRC bzImage vmlinux.bin vmlinux.bin.gz How to pack vmlinux.bin.gz? arch/x86/boot/compressed .head.text .rodata..compressed (vmlinux.bin.gz) .text .rodata .data arch/x86/boot/compressed/head_64.S 0x0 .bss .pgtable arch/x86/boot/compressed/piggy.S created by arch/x86/boot/compressed/mkpiggy.c arch/x86/boot/compressed/vmlinux.bin.gz arch/x86/boot/compressed/*.c arch/x86/boot/compressed/head_64.S arch/x86/boot/compressed/efi_thunk_64.S arch/x86/boot/compressed/head_64.S ELF Sections
  • 9. Layout of bzImage – compressed vmlinux Compressed vmlinux setup.bin (arch/x86/boot/setup.bin) Compressed vmlinux (Protected-mode kernel) Note ELF: arch/x86/boot/compressed/vmlinux Binary: arch/x86/boot/vmlinux.bin CRC bzImage .head.text .rodata..compressed (vmlinux.bin.gz) .text .rodata .data arch/x86/boot/compressed/head_64.S 0x0 .bss .pgtable arch/x86/boot/compressed/piggy.S created by arch/x86/boot/compressed/mkpiggy.c arch/x86/boot/compressed/vmlinux.bin.gz arch/x86/boot/compressed/*.c arch/x86/boot/compressed/head_64.S arch/x86/boot/compressed/efi_thunk_64.S arch/x86/boot/compressed/head_64.S ELF Sections
  • 10. Layout of bzImage – compressed vmlinux.bin * Symbol: Equivalent to using ‘.set’ directive * https://sourceware.org/binutils/docs/as/Setting-Symbols.html Why z_input_len/input and z_output_len/output_len? * BFD: Binary File Descriptor library - https://www.gnu.org/software/binutils/
  • 11. Memory layout of bzImage – Entry Point Address Where is ‘X’? BIOS use only Typically used by MBR Reserved for MBR/BIOS Boot loader 0x00000 0x00600 0x00800 0x01000 Kernel boot section stack/heap X X+0x08000 Reserved for BIOS Command line I/O memory hole Protected-mode kernel (Compressed vmlinux) X+0x10000 0x100000 0xA0000 Boot sector entry point 0000:7C00 The kernel legacy boot sector The kernel real-mode/protected mode code For use by the kernel real-mode/protected mode code Physical Memory Kernel setup code Reference: Documentation/x86/boot.rst
  • 12. Entry Point of Linux - GRUB Memory addressing in real mode [GRUB] Get the memory address for real mode code 1. gs = fs = es = ds = ss = 0x1000 2. sp = GRUB_LINUX_SETUP_STACK = 0x9000 3. cs = 0x1020, ip = 0 Registers configured by GRUB Kernel boot section 0x10000 0x10200 Physical Memory GRUB loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack ss:sp = 0x1FFF0 protected mode real mode Kernel setup code
  • 13. Entry Point of Linux - GRUB Memory addressing in real mode [GRUB] Get the memory address for real mode code 1. gs = fs = es = ds = ss = 0x1000 2. sp = GRUB_LINUX_SETUP_STACK = 0x9000 3. cs = 0x1020, ip = 0 Registers configured by GRUB Kernel boot section 0x10000 0x10200 Physical Memory GRUB loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack ss:sp = 0x1FFF0 protected mode real mode Kernel setup code 1. QEMU loader and GRUB load ‘setup.bin’ at address 0x10000 2. QEMU loader sets SS:SP = 1000:FFF0 while GRUB sets SS:SP = 1000:9000
  • 14. Entry Point of Linux: QEMU loader Kernel boot section 0x10000 0x10200 Physical Memory QEMU loader loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack ss:sp = 0x1FFF0 protected mode real mode Kernel setup code 1 2 3 4 5 6 7 ds = es = fs = gs = ss = segment_addr = 0x1000 esp = stack_addr = cmdline_addr - setup_addr – 16 = 0x20000 – 0x10000 – 16 = 0x10000 – 16 = 0xfff0 cs = 0x1020, ip = 0 Registers configured by QEMU loader 5 6 7 Prepare for far return 8 far return: change ‘cs’ by means of CPU arch itself
  • 15. Entry Point of Linux: QEMU loader – Near and Far calls 3 4 5 6 7 Prepare for far return 8 far return: change ‘cs’ by means of CPU arch itself
  • 16. Entry Point of Linux: QEMU loader Kernel boot section 0x10000 0x10200 Physical Memory QEMU loader loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Make sure setup.bin is loaded at 0x10000 Make sure vmlinux.bin is loaded at 0x100000 Address of setup.bin Address of vmlinux.bin
  • 17. arch/x86/boot/setup.ld arch/x86/boot/header.S 1 2 Entry Point of Linux: GNU Linker [GNU Linker] ENTRY() command * First executable instruction in an output file → entry point * ENTRY() is one of several ways of choosing the entry point -- the `-e' entry command-line option -- the ENTRY(symbol) command in a linker control script -- the value of the symbol start, if present -- the address of the first byte of the .text section, if present; -- the address 0
  • 18. arch/x86/boot/setup.ld 1 Entry Point of Linux: GNU Linker [GNU Linker] ENTRY() command * First executable instruction in an output file → entry point * ENTRY() is one of several ways of choosing the entry point -- the `-e' entry command-line option -- the ENTRY(symbol) command in a linker control script -- the value of the symbol start, if present -- the address of the first byte of the .text section, if present; -- the address 0 Kernel boot section 0x10000 0x10200 Physical Memory QEMU loader loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code
  • 19. Entry Point of Linux: start_of_setup - GDB Kernel boot section 0x10000 0x10200 Physical Memory Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code
  • 20. Entry Point of Linux: start_of_setup - GDB Kernel boot section 0x10000 0x10200 Physical Memory Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code
  • 21. Entry Point of Linux: start_of_setup – short jump Kernel boot section 0x10000 0x10200 Physical Memory Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Offset/Size Name Description 0x1F1/1 setup_sects The size of the setup in sectors 0x01FE/2 boot_flag magic number: 0xAA55 0x200/2 jump Jump instruction 0x214/4 code32_start Boot loader hook: The address to jump to in protected mode. Default: 0x100000 ".header": Real-mode kernel header
  • 22. Entry Point of Linux: start_of_setup – short jump 0x26c – 0x202 = 0x6a
  • 23. Entry Point of Linux: start_of_setup Call Path Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss = cs cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Physical Memory lretw instruction: Far Return Operation ‘l’ prefix: far control transfer ‘w’ suffix: word (16 bits)
  • 24. Entry Point of Linux: start_of_setup Call Path Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss = cs cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Physical Memory lretw instruction: Far Return Operation ‘l’ prefix: far control transfer ‘w’ suffix: word (16 bits) 1 1 2 2 3 3
  • 25. Entry Point of Linux: start_of_setup Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss = cs cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Physical Memory Call Path lretw instruction: Far Return Operation ‘l’ prefix: far control transfer ‘w’ suffix: word (16 bits)
  • 26. Entry Point of Linux: start_of_setup – Why align CS? Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss = cs cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Physical Memory Call Path lretw instruction: Far Return Operation ‘l’ prefix: far control transfer ‘w’ suffix: word (16 bits) If cs is not aligned with ds, ds and es are incorrect after returning from ‘intcall’.
  • 27. Entry Point of Linux: start_of_setup – data & bss section Call Path Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss= cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code Kernel boot section 0x10000 0x10200 0 gs = fs = es = ds = ss = cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code BSS Section _end __bss_start __bss_end Data Section Physical Memory
  • 28. Entry Point of Linux: start_of_setup -> main() Call Path
  • 29. Entry Point of Linux: start_of_setup -> main()
  • 30. Entry Point of Linux: start_of_setup -> main()
  • 31. Entry Point of Linux: start_of_setup -> main() -> copy_boot_params() Call Path • Copy the setup header into the boot parameter block (struct boot_params: arch/x86/include/uapi/asm/bootparam.h) o `struct setup_header hdr` in boot_params ▪ Contains the same fields defined in the Linux boot protocol. Those fields are set by the boot loader or at kernel build time
  • 32. Call Path • console_init() o Initialize the corresponding serial port if command line has ‘earlyprintk’ parameter Entry Point of Linux: start_of_setup -> main() -> console_init() – (1/2) Kernel boot section 0x10000 0x10200 0 gs = fs = es = ds = ss = cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code BSS Section _end __bss_start __bss_end Data Section Kernel Command Line 0x20000 QEMU Loader Physical Memory
  • 33. Call Path • console_init() o Initialize the corresponding serial port if command line has ‘earlyprintk’ parameter Entry Point of Linux: start_of_setup -> main() -> console_init() – (2/2) Kernel boot section 0x10000 0x10200 0 gs = fs = es = ds = ss = cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code BSS Section _end __bss_start __bss_end Data Section Kernel Command Line 0x20000 Physical Memory
  • 34. Call Path • init_heap() • Discussion in the next few slides • validate_cpu() o Check CPU flags o Check if long mode (x86_64) is available o [AMD – K7 Processor] Turn SSE+SSE2 on if they are missing in CPU flags • detect_memory() o Use different program interfaces (0xe820, 0xe801 and 0x88) for memory detection o 0xe820 ▪ Fill boot_params.e820_table based on e820 map Entry Point of Linux: start_of_setup -> main() -> validate_cpu() & detect_memory() Kernel boot section 0x10000 0x10200 0 gs = fs = es = ds = ss = cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code BSS Section _end __bss_start __bss_end Data Section Kernel Command Line 0x20000 Physical Memory
  • 35. Call Path • init_heap o Set up the heap space if the ‘CAN_USE_HEAP’ flag (0x80) is set in loadflags of the kernel setup header. Entry Point of Linux: start_of_setup -> main() -> init_heap() (1/2)
  • 36. Call Path Entry Point of Linux: start_of_setup -> main() -> init_heap() (2/2) heap: allocate heap if ‘CAN_USE_HEAP’ flag (0x80) is set No heap sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Unused Area __bss_start __bss_end HEAP = heap_end = _end Data Section sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section gs = fs = es = ds = ss = cs gs = fs = es = ds = ss = cs
  • 37. go_to_protected_mode GDT_ENTRY_BOOT_DS GDT_ENTRY_BOOT_CS NULL NULL 0 1 2 3 GDT_ENTRY_BOOT_TSS 4 Descriptor Table: boot_gdt System Memory 0 0xFFFFFFFF limit Base Address GDTR x86 Segmentation: Address Translation setup_gdt(): Setup 4G memory space for CS/DS Call Path
  • 39. protected_mode_jump – ljmpl instruction: ignore ‘.Lin_pm32’ relocation (2/6) 0x30cc Jump (absolute address) to the wrong location sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section gs = fs = es = ds = ss = cs setup.bin generation Physical Memory
  • 40. protected_mode_jump – ljmpl instruction - relocation (3/6) sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section gs = fs = es = ds = ss = cs Relocation for absolute address of ‘ljmpl’ ljmpl Physical Memory
  • 41. Relocation for absolute address of ‘ljmpl’ protected_mode_jump – ljmpl instruction (4/6) sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section gs = fs = es = ds = ss = cs ljmpl Physical Memory
  • 42. protected_mode_jump – ljmpl instruction: instruction format (5/6)
  • 43. protected_mode_jump – ljmpl instruction: instruction format (6/6)
  • 44. Protected mode: ‘.Lin_pm32’ (1/2) [real mode] SP configuration [protected mode] SP configuration `addl %ebx, %esp` in label “.Lin_pm32” 0x1FF80 (SS:SP = 0x1000:0xFF80) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end esp = 0x1FF80 Kernel boot section 0x10000 (ebx) 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section Data Section
  • 45. Data Section 1 2 4 3 Protected mode: ‘.Lin_pm32’ (2/2) X = 0x10000 esp = 0x1FF80 Kernel boot section 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Reserved for BIOS command line I/O memory hole Protected-mode kernel code (compressed vmlinux) X+0x10000 0xA0000 0x100000 jmpl *%eax 5 Physical Memory Call Path
  • 46. Compressed vmlinux: memory layout (1/10) .head.text – startup_32 0x100000 (ebp register) 0x100200 decompressed vmlinux.bin.bz .head.text – startup_64 0x1000000 compressed vmlinux (Relocation) 0x1000000 + boot_param.init_size 0x1000000 + boot_param.init_size - _end (rbx register) vmlinux.bin.gz .text .rodata .data .bss .pgtable _end 0x100000 + _end boot_heap (size: 0x10000) boot_stack (size: 0x4000) … input_data input_data_end Memory Layout 32-bit entry point _bss
  • 47. Compressed vmlinux: boot_stack & boot_heap in .bss (2/10) .head.text – startup_32 0x100000 (ebp register) 0x100200 decompressed vmlinux.bin.bz .head.text – startup_64 0x1000000 compressed vmlinux (Relocation) 0x1000000 + boot_param.init_size 0x1000000 + boot_param.init_size - _end (rbx register) vmlinux.bin.gz .text .rodata .data .bss .pgtable _end 0x100000 + _end boot_heap (size: 0x10000) boot_stack (size: 0x4000) … input_data input_data_end Memory Layout 32-bit entry point _bss
  • 48. Compressed vmlinux: High-level Overview (3/10) Why relocation • Base address of 32-bit Linux kernel entry point: 0x100000 • Default base address of Linux kernel: CONFIG_PHYSICAL_START=0x1000000 • Use Case • kdump: a rescue kernel is loaded to a different address • PIE (Position Independent Executable) and PIC (Position Independent Code)
  • 49. Compressed vmlinux: startup_32: 32-bit entry point (4/10) 1 1
  • 50. Compressed vmlinux: startup_32 (5/10) 1 1 Get the loading address
  • 52. Compressed vmlinux: startup_32: Init 4-level page table (7/10) Sign-extend Page Map Level-4 Offset Page Directory Pointer Offset Page Directory Offset Physical Page Offset 0 30 21 39 20 38 29 47 48 63 PML4E #0 PDPTE #3 Data Page Map Level-4 Table Page Directory Pointer Table Page Directory Table 40 9 9 9 Linear Address CR3 PDPTE #2 PDPTE #1 PDPTE #0 PDE #1535 PDE #1024 . . PDE #2047 PDE #1536 . . PDE #511 PDE #0 . . PDE #1023 PDE #512 . . 2-MByte Physical Page 40 40 31 21 [Paging] Identity mapping for 0-4GB memory space
  • 53. Compressed vmlinux: startup_32: Init 4-level page table (8/10) Reference: Section 4.1 “PAGING MODES AND CONTROL BITS”, Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C & 3D): System Programming Guide
  • 54. Compressed vmlinux: startup_32: Init 4-level page table (9/10)
  • 55. Compressed vmlinux: far return to startup_64 (10/10) rva(startup_64) = 0x200 ebp = 0x100000 eax = 0x100000 + 0x200 = 0x100200
  • 56. Compressed vmlinux: startup_64 2 3 Why to reload CS? (Commit “34bb49229f19”) When the pre-decompression code loads its first GDT in startup_64, it is still running on the CS value of the previous GDT. In the case of SEV-ES this is the EFI GDT. It can be anything depending on what has loaded the kernel (EFI, legacy boot code, container runtime, etc.)
  • 57. Compressed vmlinux: [.text] .Lrelocated (1/5) 4 5 Why to call initialize_identity_maps()?
  • 58. Compressed vmlinux: [.text] .Lrelocated (2/5) 4 5 Why to map boot_params and command line?
  • 59. Compressed vmlinux: parse_elf (3/5) 4 ELF Header 0x1000000 decompressed vmlinux.bin.bz (vmlinux.bin – ELF format) program headers program header #0 (.text, .rodata, .pci_fixup….) 0x1200000 program header #1 (.data .vvar) program header #2 (.init.text .altinstr_aux …) 0x1a00000 0x1ac2000 program header #3 (.notes) 0x18886b0 0x1000000 program header #0 (.text, .rodata, .pci_fixup….) 0x1800000 program header #1 (.data .vvar) program header #2 (.init.text .altinstr_aux …) 0x18c2000 Physical memory Physical memory
  • 60. Compressed vmlinux: handle_relocations (4/5) 4 CONFIG_RELOCATABLE • Retain relocation information (generate .rel.* or .rela.* sections) when building a kernel image, so it can be loaded someplace besides the default address (CONFIG_PHYSICAL_START = 16MB). • Use case: kdump kernel (recovery kernel) handle_relocations() - Relocation if CONFIG_X86_NEED_RELOCS is set • Depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE) • Scan relocation tables (.rel.* or .rela.* sections) for symbol relocation
  • 61. Compressed vmlinux: handle_relocations (5/5) 4 vmlinux.bin.bz vmlinux.bin vmlinux.relocs handle_relocations(): Perform relocation backwards from the end of the decompressed vmlinux 64-bit relocation address 0 32-bit relocation address 0 -R section_name: Remove any section matching section_name -S or --strip-all: Do not copy relocation and symbol information from the source file objcopy options
  • 62. Recap setup.bin (arch/x86/boot/setup.bin) Compressed vmlinux (Protected-mode kernel) Note ELF: arch/x86/boot/compressed/vmlinux Binary: arch/x86/boot/vmlinux.bin CRC bzImage
  • 63. [More info] bzImage = vmlinuz On a physical machine Source code: arch/x86/boot/Makefile, arch/x86/boot/install.sh
  • 64. Reference • The Linux/x86 Boot Protocol, Documentation/x86/boot.rst • Intel® 64 and IA-32 Architectures Software Developer’s Manual • https://wdv4758h.github.io/notes/blog/linux-kernel-boot.html • Linux insides, https://0xax.gitbooks.io/linux-insides/content/
  • 66. gdb: Preparation for debugging real-mode of Linux kernel (1/2) Github: https://github.com/AdrianHuang/gdb-linux-real-mode
  • 67. gdb: Preparation for debugging real-mode of Linux kernel (2/2) Github: https://github.com/AdrianHuang/gdb-linux-real-mode
  • 68. initialize_identity_maps x86_mapping_info void *(*alloc_pgt_page)(void *) void *context unsigned long page_flag unsigned long offset alloc_pgt_data unsigned char *pgt_buf unsigned long pgt_buf_size unsigned long pgt_buf_offset bool direct_gbpages unsigned long kernpg_flag
  • 69. UEFI booting flow – EFI boot stub: Entry point AddressOfEntryPoint (efi_pe_entry): 0x18d84a ImageBase = 0x1000000 Physical address of AddressOfEntryPoint = 0x1000000 + 0x18d84a = 0x118d84a
  • 70. UEFI booting flow – EFI Handover protocol
  • 71. UEFI booting flow – EFI Handover protocol
  • 72. UEFI booting flow – EFI Handover protocol Where is the address of bzimage loaded by boot loader?
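
The address arithmetic quoted in the legacy-BIOS slides (12 to 22 and 55) can be checked with a few lines of Python. This is only a sketch: the helper `phys` and the constant names are made up for illustration, and the numeric values are the ones the slides quote for kernel 5.11 under QEMU and GRUB. It assumes the usual real-mode rule that a physical address is the segment shifted left by 4 bits plus the offset, and that a short jump's displacement is counted from the end of the 2-byte jump instruction.

```python
def phys(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset (A20 wrap ignored)."""
    return (segment << 4) + offset

# Values quoted in the slides (illustrative constant names).
SETUP_ADDR = 0x10000    # where the boot loader places setup.bin
CMDLINE_ADDR = 0x20000  # kernel command line address used by the QEMU loader

# Slide 12: cs = 0x1020, ip = 0 lands at offset 0x200 of setup.bin.
entry = phys(0x1020, 0x0000)

# Slide 14: QEMU loader stack pointer = cmdline_addr - setup_addr - 16.
qemu_sp = CMDLINE_ADDR - SETUP_ADDR - 16
qemu_stack = phys(0x1000, qemu_sp)

# Slide 13: GRUB uses ss:sp = 0x1000:0x9000 instead.
grub_stack = phys(0x1000, 0x9000)

# Slide 22: the 2-byte short jump at 0x200 ends at 0x202 and targets 0x26c.
short_jump_disp = 0x26C - 0x202

# Slide 55: startup_64 = load address (ebp) + rva(startup_64).
startup_64 = 0x100000 + 0x200

for name, value in [
    ("entry", entry),
    ("qemu_sp", qemu_sp),
    ("qemu_stack", qemu_stack),
    ("grub_stack", grub_stack),
    ("short_jump_disp", short_jump_disp),
    ("startup_64", startup_64),
]:
    print(f"{name:16s} = {value:#x}")
```

The same `phys` helper also shows why the two loaders end up with different stack tops even though both load setup.bin at 0x10000: only the offset part of SS:SP differs.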