
Linux Kernel Booting Process (1) - For NLKB

Describes the bootstrapping part in Linux and some related technologies.
This is part one of the slides; the succeeding slides will contain the errata for this part.

Published in: Engineering


  1. 1. Booting Process (1) Taku Shimosawa For the new Linux kernel book 1
  2. 2. References • Wikipedia (!) • Wikipedia knows everything  • Wiktionary • I wanted to use OED if I had one… • Source Files • Linux 3.15 • U-boot 2014.04 • ELILO 3.16 • GRUB 2.00 • Manual • Intel® 64 and IA-32 Architectures Software Developer’s Manual • ARM® Architecture Reference Manual 2
  3. 3. 1. Booting “what’s boot” 3
  4. 4. What is “boot”? • boot (n.) 4 [1] http://en.wikipedia.org/wiki/Boot
  5. 5. Brief etymology[2] • Phrase “pull oneself up by one’s bootstraps” • Misattributed (at latest in 1901!) to “The Surprising Adventures of Baron Munchausen” (1781, Rudolf Erich Raspe) : The baron pulls himself out of a swamp by his hair (pigtail). • The use of this phrase is found in 1834 in the U.S. • “[S]omeone is attempting or has claimed some ludicrously far-fetched or impossible task” • In the 20th century, the “possible task” meaning has appeared • “To begin an enterprise or recover from a setback without any outside help; to succeed only on one's own effort or abilities” 5 bootstrap[3] [2] http://en.wiktionary.org/wiki/pull_oneself_up_by_one%27s_bootstraps [3] http://en.wikipedia.org/wiki/Bootstrapping
  6. 6. Bootstrapping (in Computer) • The process of loading the basic software (typically, operating systems) into the main memory from persistent memory (HDD, flash ROM, etc.) • “Boot” is an abbreviation for “bootstrap(ping)” 6 Bootstrapping Code
  7. 7. Boot loader • “It is responsible for loading and transferring control to the operating system kernel software (such as the Hurd or Linux).”[4] • Boot loader • BIOS (PC) • UEFI (Unified Extensible Firmware Interface) (PC) • “Secure Boot” issue • Das U-Boot (Universal bootloader) (for embedded systems) • Second-stage boot loader • LILO (Linux Loader, Ver. 24.0, Released on Jun 7, 2013) • Supports GPT and RAID (!?) • GRUB2 (Ver. 2.00, Jun 26, 2012) • Supports BIOS and UEFI boot • GRUB Legacy (Grand Unified Boot Loader, Ver. 0.97, May 8, 2005) • ELILO (EFI Linux Boot Loader, Ver 3.16, Mar 29, 2013) • Originally for EFI and Itanium; currently bug fix only • SYSLINUX (Ver. 6.02, Oct 13, 2013) • NTLDR, BOOTMGR (beginning from Windows Vista) 7 [4] http://www.gnu.org/software/grub/
  8. 8. What loads and what is loaded 8 Power On BIOS GRUB2 Linux HDD (MBR) HDD PXELINUX (a part of SYSLINUX) BIOS/NIC Option ROM Network (tftp) Network (tftp) bzImage bzImage U-Boot Flash ROM HDD Network SD Card etc… uImage
  9. 9. 2. Prerequisites How to say “Hello” in x86? 9
  10. 10. Miscellanea • Architecture and GNU Assembly Language • Very briefly • x86 • ARMv7 • Things left.. • Linker Script 10
  11. 11. x86 Architecture : Mode • Too complicated to explain • 3 Modes • Real mode • 16 bit mode • No mode switch (always privileged) • No virtual memory (Segmentation only) • Protected mode • 32 bit mode • Segmentation / Virtual memory • (Virtual 8086 mode) • Compatibility for executing 16-bit code in 32-bit mode • Long mode • 64 bit mode • Virtual memory only • (Another sub-mode, “compatibility mode,” for executing 32-bit code) • What is this bit? • Size of the virtual address • Default size of the operand registers (*) 11 (*) Of course, you can use %al in 32-bit mode, %ax in 64-bit mode…
  12. 12. x86 Architecture : Registers • Registers before 64-bit • 8 general-purpose registers • Some instructions use a fixed set of registers for their input and output… • In particular, sp is used only as the stack pointer • Each register has names for certain parts (the lower 8-bit, for example) • Example: eax register 12 eax (32-bit) contains ax (lower 16-bit), which is split into ah (upper 8-bit) and al (lower 8-bit)
  13. 13. x86 Architecture : Registers • In 64-bit mode, the registers are extended to 64-bit and new names for them are introduced (r**) • Eight new registers (r8 ~ r15) are also introduced 13
      64-bit | Lower 32-bit | Lower 16-bit | Higher/Lower 8-bit in lower 16-bit
      rax | eax | ax | ah/al
      rcx | ecx | cx | ch/cl
      rdx | edx | dx | dh/dl
      rbx | ebx | bx | bh/bl
      rsp | esp | sp | --/spl
      rbp | ebp | bp | --/bpl
      rsi | esi | si | --/sil
      rdi | edi | di | --/dil
      r8 | r8d | r8w | --/r8l
  14. 14. x86 Architecture : Segmentation • 6 Segment Registers (16-bit registers) • Code Segment Register: CS • Data Segment Register: DS, ES, FS, GS • Stack Segment Register: SS • Real mode : 20-bit address space • Linear address = Physical address • The size of each segment is 64K (16-bit offset) • The segment register holds the upper 16 bits of the segment's 20-bit base address (base = segment register * 16) • Protected mode : 32-bit/36-bit physical address space • Logical (segment:offset) –(Segmentation)-> Linear –(Paging)-> Physical • The base and limit are stored in the descriptor table • The segment register points to an entry in the table • Long mode : 48-bit virtual address space • For CS, DS, ES, and SS, the base is always 0 and the limit is ignored. • For FS and GS, the base can be set by the descriptor or through MSR (for > 32-bit addresses) 14
  15. 15. x86 Architecture : Segmentation • Default segment register • For code accesses, CS is used (CS:IP) • For data accesses, DS is used (DS:xx) • For string instructions, ES is used for destination (ES:(E)DI) • For stack accesses, SS is used (SS:SP) • Anyway, in real-mode: • When CS = 0x0700 and IP = 0x0c00, the instruction at 0x7c00 is executed. • Of course, there are many ways to point to this address • CS : 0x0000, IP : 0x7c00 • CS : 0x07c0, IP : 0x0000 • DS, ES, and SS are similar • movw $3, 0(%bx) means store 3 to the address DS * 16 + BX 15 Code segment: CS * 16 = 0x7000, IP = 0xc00
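The real-mode address rule on this slide (segment * 16 + offset) can be checked with a few lines of Python. This is only an illustrative sketch: the function name `real_mode_address` is made up for this example, and the 20-bit wrap-around models a machine with the A20 line disabled, which is an assumption and not something the slide states.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the linear (= physical) address for segment:offset in real mode.

    Assumption: the result wraps at 20 bits, as on a machine with A20 disabled.
    """
    return ((segment << 4) + offset) & 0xFFFFF


# The three CS:IP pairs from the slide all name the same byte.
for seg, off in [(0x0700, 0x0C00), (0x0000, 0x7C00), (0x07C0, 0x0000)]:
    print(f"{seg:04x}:{off:04x} -> {real_mode_address(seg, off):#x}")
```

The same helper applies to data accesses: `movw $3, 0(%bx)` stores to `real_mode_address(DS, BX)`.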
  16. 16. x86 architecture – Misc. • Ring (privilege mode) • Ring 0, that’s all • Descriptor Table • Paging, Interrupts • Left to the next • Basic Instructions • MOV • PUSH/POP • ADD/SUB… • LEA • JMP • Jcc 16
  17. 17. x86 Architecture : Assembly Lang. • Two types of syntaxes • AT&T Style (gcc (gas)) : Of course, Linux uses this style • Intel Style (MASM, NASM) 17
      Item | AT&T | Intel
      Sample | movb $0xff, %al | MOV AL, 0FFh
      Sample | addl $8, 0(%rax, %rcx, 4) | ADD DWORD PTR [RAX + RCX * 4], 8
      Operand order | Source, Destination | Destination, Source
      Symbol | Immediate : prefixed with $; Register : prefixed with % | No prefix
      Suffix | b for 8-bit operation, w for 16-bit, l for 32-bit, and q for 64-bit | No suffix (inferred from the operands)
      Addressing | displacement(base, index, scale) | (width) ptr [base + index * scale + displacement]
  18. 18. ARM Architecture (ARMv7-A) • Also too complicated • ARM Instruction Set (32-bit) • Thumb Instruction Set (16-bit, more code density) • ARMv6T2 introduced Thumb2 • Thumb2 has almost the same functionality as ARM • Jazelle, ThumbEE • The execution state register indicates which instruction set the processor is executing. • Registers • 16 “general-purpose registers” • 13 General Purpose Registers (r0 to r12) • r8 to r14 (r8 to r12 in FIQ mode, plus SP and LR) are banked • 3 Special Purpose Registers (SP, LR, and PC / or r13 to r15) • Reading PC returns the current inst + 8 (in ARM), + 4 (in Thumb) 18
  19. 19. Instruction Sets • How to switch the instruction set? • BX, BLX instructions • If the least significant bit of the target address is ‘1’, it switches to Thumb. If the two least significant bits are ‘00’, it switches to ARM. [interworking address] • LDR, LDM to PC (r15) • Also in ARMv7, ALU instructions (ADD, MOV, etc.) that write to the PC register in the ARM instruction set without setting the condition flags • Exception entries and returns 19
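The interworking rule for BX/BLX targets can be written down as a tiny decoder. This is a sketch for illustration only: `interworking_target` is a made-up name, and the third case (low bits `10`) is labelled here as unpredictable on the basis of the ARM architecture manual's description of interworking branches, which the slide does not cover.

```python
def interworking_target(address: int):
    """Decode an interworking branch target into (instruction set, branch address)."""
    if address & 0b1:
        # Bit 0 set: switch to Thumb; bit 0 itself is not part of the address.
        return ("Thumb", address & ~0b1)
    if address & 0b11 == 0b00:
        # Word-aligned target: switch to (or stay in) ARM state.
        return ("ARM", address)
    # Bits [1:0] == 0b10: not a valid interworking address.
    return ("UNPREDICTABLE", address)


print(interworking_target(0x8001))
print(interworking_target(0x8000))
```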
  20. 20. ARM architecture – Misc. • Mode • User, Supervisor, FIQ, etc. • Paging, Interrupts • Conditional Instructions • Etc. 20
  21. 21. ARM Assembler • UAL (unified assembler language) • Canonical form for ARM and Thumb instructions • ADC (Thumb) => ADCS • Instruction Example • MOV{S}<c> <Rd>, #<const> • Load the immediate (8-bit) to the register • MOV{S}<c><q> <Rd>, <Rm> • Copy the contents of <Rm> to <Rd> • <c> : condition • <q> : encoding (16-bit/32-bit) qualifier • When not specified and both are available, the 16-bit encoding is selected 21
  22. 22. 3. Booting in x86 GRUB and bzImage 22
  23. 23. Boot Sequence in This Presentation • Typical boot sequence in PC (x86_64) 23 Power On BIOS GRUB2(boot.img) HDD (MBR/VBR) boot.img GRUB2(core.img) (1 sector = 512 byte) HDD (MBR~1st part.) core.img (up to 62 sectors = approx. 32KB) HDD (/boot part.) grub.cfg bzImage *.mod Entrypoint in Linux
  24. 24. BIOS • BIOS (Basic Input/Output System) • Executed right after the machine turns on • Initializes CPUs, and hardware • Provides basic I/O services • Used by boot loaders (in real mode) • E.g.) Load from Hard Disk Drive (INT 13H), Memory Information (INT 0x15, AX 0xe820) etc. • Builds up various data structures for machine information • ACPI Tables • Starts up bootloaders • Loads at CS:IP = 0x00:0x7c00 • Provides user interface for boot, Hardware settings, various managements • To be replaced by UEFI… 24
  25. 25. BIOS Call • Uses “INT” instruction • It executes an interrupt handler • BIOS sets the address for its service code in the interrupt vector table. • Some operating systems also use this for system calls • INT 0x21 for MS-DOS • INT 0x80 for Linux (in the past) • Parameters are specified by the registers • AH / AX : Function number • Other registers : Parameters • Example • INT 0x13 (Disk access) • AH = 0x02 (Read by CHS), 0x03 (Write by CHS)… • AL = Number of sectors • CH = Cylinder Number (lower 8 bits) • CL = Sector Number (Bits 0-5), Higher bits in Cylinder Number (Bits 6-7) • DH = Head Number • DL = Drive Number • ES:BX = Buffer 25
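The register packing for an INT 0x13 CHS read is easy to get wrong because the cylinder number is split across CH and CL. The sketch below only computes the register values listed on the slide; it does not call the BIOS. The function name and the dictionary layout are hypothetical, and the range checks are assumptions derived from the field widths (10-bit cylinder, 6-bit sector, 8-bit head).

```python
def int13_read_chs_registers(cylinder: int, head: int, sector: int,
                             count: int, drive: int) -> dict:
    """Return the register values for INT 0x13, AH = 0x02 (read by CHS)."""
    if not 0 <= cylinder <= 0x3FF:
        raise ValueError("cylinder must fit in 10 bits")
    if not 1 <= sector <= 0x3F:
        raise ValueError("sector must be 1..63")
    if not 0 <= head <= 0xFF:
        raise ValueError("head must fit in 8 bits")

    ah = 0x02                                   # function: read sectors by CHS
    ch = cylinder & 0xFF                        # lower 8 bits of the cylinder
    cl = sector | (((cylinder >> 8) & 0x3) << 6)  # sector in bits 0-5, cylinder bits 8-9 in bits 6-7
    return {
        "AX": (ah << 8) | (count & 0xFF),       # AH = function, AL = number of sectors
        "CH": ch,
        "CL": cl,
        "DH": head,
        "DL": drive,                            # e.g. 0x80 for the first hard disk
    }


regs = int13_read_chs_registers(cylinder=0x123, head=2, sector=5, count=1, drive=0x80)
print({name: hex(value) for name, value in regs.items()})
```

ES:BX (the destination buffer) is left out because it is an address chosen by the caller, not something derived from the CHS tuple.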
  26. 26. GRUB • boot.img (512 byte) • Usually located in the first sector (MBR) in HDD • Loaded at 0x7c00 by BIOS • Real-mode • Loads the next sector from HDD • The position is embedded by the GRUB installer (in sector, blue part) • Typically, at Sector 1 (the next sector) • core.img • Located at the gap sectors between MBR and the first partition • The first partition begins at the 63rd sector (traditionally) or at 1MB (recently, as seen in right) • The first sector in core.img loads the remaining part of core.img from HDD 26
      # dd if=/dev/vda count=1 bs=512 2> /dev/null | od -t x1 -A x
      000000 eb 63 90 00 00 00 00 00 00 00 00 00 00 00 00 00
      000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      *
      000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 02
      000040 ff 00 00 20 01 00 00 00 00 02 fa 90 90 f6 c2 80
      000050 75 02 b2 80 ea 59 7c 00 00 31 00 80 01 00 00 00
      000060 00 00 00 00 ff fa 90 90 f6 c2 80 74 05 f6 c2 70
      000070 74 02 b2 80 ea 79 7c 00 00 31 c0 8e d8 8e d0 bc
      000080 00 20 fb a0 64 7c 3c ff 74 02 88 c2 52 be 80 7d
      000090 e8 17 01 be 05 7c b4 41 bb aa 55 cd 13 5a 52 72
      0000a0 3d 81 fb 55 aa 75 37 83 e1 01 74 32 31 c0 89 44
      0000b0 04 40 88 44 ff 89 44 02 c7 04 10 00 66 8b 1e 5c
      0000c0 7c 66 89 5c 08 66 8b 1e 60 7c 66 89 5c 0c c7 44
      0000d0 06 00 70 b4 42 cd 13 72 05 bb 00 70 eb 76 b4 08
      0000e0 cd 13 73 0d f6 c2 80 0f 84 d8 00 be 8b 7d e9 82
      0000f0 00 66 0f b6 c6 88 64 ff 40 66 89 44 04 0f b6 d1
      000100 c1 e2 02 88 e8 88 f4 40 89 44 08 0f b6 c2 c0 e8
      000110 02 66 89 04 66 a1 60 7c 66 09 c0 75 4e 66 a1 5c
      000120 7c 66 31 d2 66 f7 34 88 d1 31 d2 66 f7 74 04 3b
      000130 44 08 7d 37 fe c1 88 c5 30 c0 c1 e8 02 08 c1 88
      000140 d0 5a 88 c6 bb 00 70 8e c3 31 db b8 01 02 cd 13
      000150 72 1e 8c c3 60 1e b9 00 01 8e db 31 f6 bf 00 80
      000160 8e c6 fc f3 a5 1f 61 ff 26 5a 7c be 86 7d eb 03
      000170 be 95 7d e8 34 00 be 9a 7d e8 2e 00 cd 18 eb fe
      000180 47 52 55 42 20 00 47 65 6f 6d 00 48 61 72 64 20
      000190 44 69 73 6b 00 52 65 61 64 00 20 45 72 72 6f 72
      0001a0 0d 0a 00 bb 01 00 b4 0e cd 10 ac 3c 00 75 f4 c3
      0001b0 00 00 00 00 00 00 00 00 0e 14 50 70 00 00 00 20
      0001c0 21 00 83 35 37 3e 00 08 00 00 00 38 0f 00 00 35
      0001d0 38 3e 82 51 60 31 00 40 0f 00 00 98 3b 00 00 51
      0001e0 61 31 83 fe ff ff 00 d8 4a 00 00 10 7e 03 00 fe
      0001f0 ff ff 05 fe ff ff fe ef c8 03 02 08 37 15 55 aa
      000200
      Labels on the dump: “Jump!” (the eb 63 at offset 0) and “Boot sector signature” (the trailing 55 aa)
  27. 27. GRUB (2) • core.img • Includes the modules required to boot operating systems • Menu facilities • e.g.) vga.mod • File system modules to access the configuration file (grub.cfg) • e.g.) ext2.mod • OS Loader modules • e.g.) linux.mod • Modularized to fit in the gap sectors 27
  28. 28. Linux image • Linux boot image • vmlinux [+ compression + setup code + headers] • Various types of boot images • bzImage • “big zImage” • Mainly used in x86 • uImage • Used in systems booted by U-Boot • ARM, SPARC, PPC, SH, … • treeImage • Used by OpenBIOS (ppc) • Includes DeviceTree blob • simpleImage • Used by OpenFirmware (ppc) • xipImage • “eXecute-In-Place” image 28
  29. 29. bzImage • What you’ve got in /boot in your PC • Usually named as /boot/vmlinuz-(version) • What format is this? • Originally bootable from FDD • bzImage is written in the first sector in FDD • Deprecated in 2.5.xx? (not verified) 29
      000000 ea 05 00 c0 07 8c c8 8e d8 8e c0 8e d0 31 e4 fb
      000010 fc be 2d 00 ac 20 c0 74 09 b4 0e bb 07 00 cd 10
      000020 eb f2 31 c0 cd 16 cd 19 ea f0 ff 00 f0 44 69 72
      000030 65 63 74 20 66 6c 6f 70 70 79 20 62 6f 6f 74 20
      000040 69 73 20 6e 6f 74 20 73 75 70 70 6f 72 74 65 64
      000050 2e 20 55 73 65 20 61 20 62 6f 6f 74 20 6c 6f 61
      000060 64 65 72 20 70 72 6f 67 72 61 6d 20 69 6e 73 74
      000070 65 61 64 2e 0d 0a 0a 52 65 6d 6f 76 65 20 64 69
      000080 73 6b 20 61 6e 64 20 70 72 65 73 73 20 61 6e 79
      000090 20 6b 65 79 20 74 6f 20 72 65 62 6f 6f 74 20 2e
      0000a0 2e 2e 0d 0a 00 00 00 00 00 00 00 00 00 00 00 00
      0000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      *
      0001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff
      0001f0 ff 1d 01 00 5c db 00 00 00 00 ff ff 00 00 55 aa
      ...
      Labels on the dump: “Far jump” (the ea at offset 0) and “Boot sector signature Again” (the trailing 55 aa)
      bugger_off_msg:
          .ascii "Direct floppy boot is not supported. "
          .ascii "Use a boot loader program instead.\r\n"
          .ascii "\n"
          .ascii "Remove disk and press any key to reboot ...\r\n"
          .byte 0
  30. 30. How bzImage is created? • Magical ceremonies in arch/x86/boot • After “vmlinux” is ready, the following sequence runs. 30
      LD      vmlinux
      SORTEX  vmlinux
      SYSMAP  System.map
      CC      arch/x86/boot/a20.o
      AS      arch/x86/boot/bioscall.o
      ...
      LDS     arch/x86/boot/compressed/vmlinux.lds
      AS      arch/x86/boot/compressed/head_32.o
      ...
      CC      arch/x86/boot/compressed/early_serial_console.o
      OBJCOPY arch/x86/boot/compressed/vmlinux.bin
      GZIP    arch/x86/boot/compressed/vmlinux.bin.gz
      HOSTCC  arch/x86/boot/compressed/mkpiggy
      MKPIGGY arch/x86/boot/compressed/piggy.S
      AS      arch/x86/boot/compressed/piggy.o
      ...
      LD      arch/x86/boot/compressed/vmlinux
      ZOFFSET arch/x86/boot/zoffset.h
      ...
      LD      arch/x86/boot/setup.elf
      OBJCOPY arch/x86/boot/setup.bin
      OBJCOPY arch/x86/boot/vmlinux.bin
      ...
      BUILD   arch/x86/boot/bzImage
  31. 31. So what? 31
      (1) Strip symbols: vmlinux -> boot/compressed/vmlinux.bin
      (2) Compress (gzip, bzip2, lzma, lzo, lz4): -> vmlinux.bin.gz
      (3) mkpiggy (piggy-back), make an object that contains the compressed image: -> piggy.o
      (4) Link with the other objects in boot/compressed (decompressing code): piggy.o + *.o -> boot/compressed/vmlinux
      (5) Transform it into a simple binary: -> boot/vmlinux.bin
      (6) Concatenate with real-mode setup code, headers, and CRC32: boot/setup.bin + boot/vmlinux.bin + CRC -> boot/bzImage
      Note on the diagram: “The deprecated FDD boot code is (was) here!”
  32. 32. Column: How to embed a binary in your executable? • Many ways to do that • Convert it to the “hex” text, and #include • Use the “.incbin” directive in the assembler • mkpiggy automatically generates this assembler file. 32
      (binary_file.hex)
      0xeb, 0xfe, 0x90, 0x90, …
      (C file)
      unsigned char binary[] = {
      #include "binary_file.hex"
      };
      (assembler file)
      .section .rodata
      .globl input_data, input_data_end
      input_data:
      .incbin "binary_file.bin"
      input_data_end:
  33. 33. Boot Protocol • Documentation/x86/boot.txt • Build-time parameters (size of the setup code, etc.) and parameters filled by bootloaders (the address for command-line parameters, initrd, etc.) are located at the end of the boot sector and in the header of the setup code. 33 Layout: boot/setup.bin (real-mode kernel: boot sector (header.S) + setup code) | boot/vmlinux.bin (protected-mode kernel) | CRC. Entry points: real-mode (16-bit) entry point in the setup code; 32-bit entry point (+0x0) and 64-bit entry point (+0x200) in the protected-mode kernel
  34. 34. Boot Protocol in bzImage • 4 entry points (1) 16-bit entry point (Real mode) (2) 32-bit entry point (Protected mode) (3) 64-bit entry point (Long mode) (4) The true entry point (in vmlinux) 34 Layout: boot/setup.bin (real-mode kernel: boot sector (header.S) + setup code) | boot/vmlinux.bin (protected-mode kernel) | CRC. (1) Real-mode (16-bit) entry point in the setup code; (2) 32-bit entry point (+0x0) and (3) 64-bit entry point (+0x200) in the protected-mode kernel; (4) entry point in vmlinux, reached after decompression
  35. 35. Boot Protocol in bzImage • To avoid excess mode transition • The modern bootloader/firmware runs in the protected mode or long mode • For later entry points, the bootloader/firmware should provide the information that had been collected in the prior stage of the Linux kernel • In most cases, such information is already retrieved in a bootloader as the bootloader also needs it. 35
  36. 36. Fast Backward 36
      (1) bzImage is loaded by a boot loader
      (2) The real-mode kernel runs and switches the CPU to protected mode.
      (3) The protected-mode kernel runs
      (4) It switches the CPU to long mode. (In x86_64 only)
      (5) It decompresses the compressed vmlinux (moves the decompressing code if necessary)
      (6) Jumps to the entry point in vmlinux (the startup_32/startup_64 function)
      Diagram labels: RM Kernel, Protected-mode kernel, Decompress Code, vmlinux; modes: 16-bit mode, 32-bit mode, 32-bit/64-bit mode; address axis marked at 0x100000, with higher addresses above
  37. 37. 32-bit or 64-bit in x86 • Originally, the “arch” directories for the 32-bit kernel and the 64-bit kernel were different (i386 and x86_64). • In Linux 2.6.24, they were merged into a single directory (x86). • At first it was almost just merging the directories and renaming the 32-bit source files to xxx_32.c, and the 64-bit source files to xxx_64.c • Then merged to a single file with #ifdef’s • Now, the duplication of the code is minimized • (non-suffixed) and xxxx_64.c/h • xxxx_32.c/h and xxxx_64.c/h • CONFIG_X86_32 and CONFIG_X86_64 37
  38. 38. 3-1. Real Mode “640 k ought to be enough for anybody” 38
  39. 39. Real-mode protocol • Used with the “linux16” module in GRUB2 • Starts with the transition from real-mode to protected-mode, and jumps into the protected-mode kernel (32-bit entry point) • Suggested Memory Layout: 39 (from address 0 upward) BS (boot sector) | Setup code | Heap/stack | BIOS Resv. | 0xA0000 (640KB): I/O Mem Hole | 0x100000 (1MB): Protected Mode Kernel (jump target) | Higher Address
  40. 40. Headers (1) • Boot sector • The code itself is entirely useless • setup_sects denotes the size of the setup code in sectors • Thus, the protected-mode kernel begins at the offset (1 + setup_sects) * 512 40
      .globl hdr
      hdr:
      setup_sects: .byte 0            /* Filled in by build.c */
      root_flags:  .word ROOT_RDONLY
      syssize:     .long 0            /* Filled in by build.c */
      ram_size:    .word 0            /* Obsolete */
      vid_mode:    .word SVGA_MODE
      root_dev:    .word 0            /* Filled in by build.c */
      boot_flag:   .word 0xAA55
      (arch/x86/boot/header.S)
  41. 41. Headers (2) • Setup code • The top of the setup code contains parameters • struct setup_header • The parameters in the boot sector and the header of the setup code are defined as one struct in C 41
      struct setup_header {
          __u8  setup_sects;
          __u16 root_flags;
          __u32 syssize;
          __u16 ram_size;
          __u16 vid_mode;
          __u16 root_dev;
          __u16 boot_flag;
          __u16 jump;
          __u32 header;
          __u16 version;
          __u32 realmode_swtch;
          ...
      (arch/x86/include/uapi/asm/bootparam.h)
      Layout: Boot sector from 0x0000, setup code from 0x0200; struct setup_header starts at 0x1f1
  42. 42. struct setup_header (1) 42
      Member | Sz | Description
      setup_sects | 1 | Number of sectors for setup code
      syssize | 4 | Size of protected-mode kernel in 16-byte units
      header | 4 | “HdrS”
      version | 2 | Header version. Latest = 0x020d (Protocol 2.13)
      type_of_loader | 1 | Type of bootloader + ver. 0xTV (T: 0 = LILO, 7 = GRUB…)
      code32_start | 4 | Address where the protected-mode kernel is loaded. Default: 0x100000. Used to hook/load at another address.
      ramdisk_image | 4 | Address of initial ramdisk/ramfs. 0 = None
      ramdisk_size | 4 | Size of initial ramdisk/ramfs. 0 = None
      heap_end_ptr | 2 | Offset of the end of the heap/stack minus 0x200
      cmd_line_ptr | 4 | Address of the command line parameter. Somewhere between the heap/stack end and 0xA0000. If zero, loader is assumed not to support 2.02 protocol. For an empty parameter, point to “auto” or empty string.
  43. 43. struct setup_header (2) 43
      Member | Sz | Description
      relocatable_kernel | 1 | Indicates whether the protected-mode kernel is relocatable.
      payload_offset | 4 | Offset to the payload from the protected-mode code
      payload_length | 4 | Length of the payload
      setup_data | 8 | Pointer to the singly linked list of additional setup_data entries
      realmode_swtch | 4 | Hook called just before switching to the protected mode
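To make the header layout concrete, here is a minimal Python sketch that reads the first fields of struct setup_header from a bzImage-like buffer. The offsets follow the struct order shown earlier (the header starts at 0x1f1, so `jump` lands at 0x200, `header` at 0x202, and `version` at 0x206). The sample image is synthetic: bytes 0x1f1 to 0x1ff are copied from the hex dump on the bzImage slide, while the `jump` value, the "HdrS" magic placement, and version 0x020d are filled in by hand for illustration. The function name `parse_setup_header` is hypothetical, and the rule that a zero setup_sects means 4 comes from the boot protocol document, not from the slides.

```python
import struct


def parse_setup_header(image: bytes) -> dict:
    """Read the leading fields of struct setup_header (little-endian, packed)."""
    (setup_sects, root_flags, syssize, ram_size,
     vid_mode, root_dev, boot_flag) = struct.unpack_from("<BHIHHHH", image, 0x1F1)
    jump, magic, version = struct.unpack_from("<H4sH", image, 0x200)

    if boot_flag != 0xAA55 or magic != b"HdrS":
        raise ValueError("not a bzImage with a modern setup header")
    if setup_sects == 0:
        setup_sects = 4  # backward-compatibility rule in the boot protocol document

    return {
        "setup_sects": setup_sects,
        "syssize": syssize,
        "vid_mode": vid_mode,
        "version": version,
        # The protected-mode kernel follows the boot sector and the setup code.
        "pm_kernel_offset": (1 + setup_sects) * 512,
        "pm_kernel_bytes": syssize * 16,
    }


# Synthetic sample: boot-sector tail taken from the slide's dump, rest hand-made.
sample = bytearray(0x400)
sample[0x1F1:0x200] = bytes.fromhex("1d 01 00 5c db 00 00 00 00 ff ff 00 00 55 aa")
struct.pack_into("<H4sH", sample, 0x200, 0x0000, b"HdrS", 0x020D)  # jump is a placeholder

info = parse_setup_header(bytes(sample))
print(info)
```

With the bytes from the slide, setup_sects is 0x1d (29), so the protected-mode kernel would start at byte offset 30 * 512 in that image.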
  44. 44. What does the setup code set up? • header.S • Contains setup_header • Prepares stack and BSS to run C programs • Jumps into the C program (main.c) • main.c • Copies setup_header into “zeropage” • Sets up the early console • Initializes the heap • Checks the CPUs (64-bit capable for 64-bit kernel?) • Collects HW information by querying the BIOS, and stores the results in “zeropage” • Finally transitions to protected mode, and jumps into the “protected-mode kernel” 44
  45. 45. struct boot_params • Traditionally called “zeropage” • A page that contains additional boot information for 32-bit mode • Statically allocated in main.c • Including the whole struct setup_header. • main.c first copies the contents of struct setup_header into &boot_params.hdr 45
      struct boot_params {
          struct screen_info screen_info;        /* 0x000 */
          …
          __u8 e820_entries;                     /* 0x1e8 */
          …
          struct setup_header hdr;               /* setup header */ /* 0x1f1 */
          …
          struct e820entry e820_map[E820MAX];    /* 0x2d0 */
          …
      } __attribute__((packed));
  46. 46. Collecting HW Information • Memory size [arch/x86/boot/memory.c] • Try the methods in the following order: • AX = 0xe820, INT 0x15 • AX = 0xe801, INT 0x15 • AH = 0x88, INT 0x15 • IST (Intel SpeedStep Technology) Information 46
  47. 47. Memory Information • AX = 0xe820, INT 0x15 [detect_memory_e820()] • INPUT • AX = 0xe820 • CX = size of the buffer • EDX = “SMAP” (0x534d4150 / Signature) • EBX = Continuation value • ES:DI = address for the buffer • OUTPUT • CF = 0 if successful, 1 otherwise • CX = Number of bytes returned • EBX = Continuation value • Each call returns information for one range • To get information for the next range, give the continuation value returned in the previous call • The range information is returned in the following structure • Stored in boot_params.e820_map (struct e820entry[128]) 47
      struct e820entry {
          __u64 addr; /* start of memory segment */
          __u64 size; /* size of memory segment */
          __u32 type; /* type of memory segment */
      } __attribute__((packed));
      (arch/x86/include/uapi/asm/e820.h)
      Type | Value
      E820_RAM | 1
      E820_RESERVED | 2
      E820_ACPI | 3
      E820_NVS | 4
      E820_UNUSABLE | 5
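Since struct e820entry is packed, each entry is exactly 20 bytes (8 + 8 + 4) with no padding. The following sketch decodes a raw e820 map with Python's struct module. The helper name `parse_e820_map` and the sample buffer are made up for illustration; the two sample ranges are chosen to match the first two lines of a typical BIOS-e820 listing (0x0 to 0x9ebff usable, 0x9ec00 to 0x9ffff reserved).

```python
import struct

E820_TYPES = {1: "E820_RAM", 2: "E820_RESERVED", 3: "E820_ACPI",
              4: "E820_NVS", 5: "E820_UNUSABLE"}

# "<" disables alignment padding, mirroring __attribute__((packed)).
E820_ENTRY = struct.Struct("<QQI")


def parse_e820_map(raw: bytes, entries: int) -> list:
    """Decode `entries` packed e820entry records from `raw`."""
    result = []
    for i in range(entries):
        addr, size, kind = E820_ENTRY.unpack_from(raw, i * E820_ENTRY.size)
        result.append({"start": addr, "end": addr + size - 1,
                       "type": E820_TYPES.get(kind, f"unknown({kind})")})
    return result


# Hand-built sample map with two ranges.
raw = E820_ENTRY.pack(0x0, 0x9EC00, 1) + E820_ENTRY.pack(0x9EC00, 0x1400, 2)
for entry in parse_e820_map(raw, 2):
    print(f"[mem {entry['start']:#018x}-{entry['end']:#018x}] {entry['type']}")
```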
  48. 48. (Example) 48
      BIOS-e820: [mem 0x0000000000000000-0x000000000009ebff] usable
      BIOS-e820: [mem 0x000000000009ec00-0x000000000009ffff] reserved
      BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
      BIOS-e820: [mem 0x0000000000100000-0x00000000668c2fff] usable
      BIOS-e820: [mem 0x00000000668c3000-0x000000006690afff] ACPI NVS
      BIOS-e820: [mem 0x000000006690b000-0x0000000066913fff] ACPI data
      BIOS-e820: [mem 0x0000000066914000-0x0000000066916fff] ACPI NVS
      BIOS-e820: [mem 0x0000000066917000-0x0000000066918fff] usable
      BIOS-e820: [mem 0x0000000066919000-0x0000000066919fff] reserved
      BIOS-e820: [mem 0x000000006691a000-0x000000006691afff] ACPI NVS
      BIOS-e820: [mem 0x000000006691b000-0x000000006693dfff] reserved
      BIOS-e820: [mem 0x000000006693e000-0x0000000066945fff] ACPI NVS
      BIOS-e820: [mem 0x0000000066946000-0x000000006699ffff] reserved
      BIOS-e820: [mem 0x00000000669a0000-0x00000000669a3fff] ACPI NVS
      BIOS-e820: [mem 0x00000000669a4000-0x0000000066d95fff] usable
      BIOS-e820: [mem 0x0000000066d96000-0x0000000066ef2fff] reserved
      BIOS-e820: [mem 0x0000000066ef3000-0x0000000066efffff] usable
      BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
      BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
      BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
      BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
      BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
      BIOS-e820: [mem 0x00000000fef00000-0x00000000ffffffff] reserved
  49. 49. Memory Information : Old Age • AH = 0x88, INT 0x15 [detect_memory_88()] • INPUT • AH = 0x88 • OUTPUT • AX = Size of memory above 1MB [in KB] • CF = 1 if error, 0 otherwise • Can return up to 64MB • Stored in boot_params.ext_mem_k • AX = 0xe801, INT 0x15 [detect_memory_e801()] • INPUT • AX = 0xe801 • OUTPUT • AX = Size of memory between 1MB ~ 16MB [in KB] • BX = Size of memory above 16MB [in 64KB] • CX, DX = Unknown (Same as AX and BX, respectively) • Can return up to 4GB • Currently, Linux ignores the area > 16MB if AX != 15MB. • Stored in boot_params.alt_mem_k 49 Gradually, converted to e820 map in arch/x86/kernel/e820.c
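The e801 rule on this slide (the area above 16MB only counts when AX reports a full 15MB below it) can be sketched as follows. This is an illustration, not the kernel's code: the function name is hypothetical, and two details are assumptions based on my reading of detect_memory_e801() rather than on the slide, namely that a non-zero CX/DX pair is preferred over AX/BX and that an AX value above 15MB is treated as bogus.

```python
def alt_mem_k_from_e801(ax: int, bx: int, cx: int = 0, dx: int = 0):
    """Return the extended memory size in KB from an INT 0x15, AX=0xe801 reply,
    or None if the reply looks bogus."""
    # Assumption: when the BIOS fills CX/DX, use them instead of AX/BX.
    if cx or dx:
        ax, bx = cx, dx

    if ax > 15 * 1024:
        return None                  # more than 15MB between 1MB and 16MB is impossible
    if ax == 15 * 1024:
        return (bx << 6) + ax        # BX is in 64KB units, so shift by 6 to get KB
    return ax                        # memory hole below 16MB: ignore the area above it


print(alt_mem_k_from_e801(ax=15 * 1024, bx=0x100))  # 15MB + 256 * 64KB
print(alt_mem_k_from_e801(ax=8 * 1024, bx=0x100))   # only the low area counts
```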
  50. 50. Goes into the protected mode • go_to_protected_mode() in pm.c • Calls realmode_switch_hook • Enables A20 • Deasserts IGNNE# in x87 • Disables interrupts in PICs • Sets up IDT and GDT • protected_mode_jump in pmjump.S • Enables PE in CR0 • And ljmp (0x66 0xea) • Jumps into 32-bit entry point 50
  51. 51. 3-2. Protected Mode Now it’s 32 bit. 51
  52. 52. Protected-Mode Protocol • Starts at the top of the protected mode kernel • Usually loaded at 0x100000 (1MB) • Can be at any position if compiled as relocatable • Should be at the same position as specified in the compile time if compiled as not relocatable • Used in “linux” module in GRUB2 • [Protocol] At the entry point, • The loaded GDT must have __BOOT_CS (0x10 / execute and read) and __BOOT_DS(0x18 / read and write) • %cs must be __BOOT_CS • %ds, %es, and %ss must be __BOOT_DS • Interrupts must be disabled • %esi must be the address for struct boot_params • %ebp, %edi, and %ebx must be zero. 52
  53. 53. Protected-Mode Kernel • arch/x86/boot/compressed/head_{32,64}.S • Goal: Decompresses the kernel (vmlinux.gz/.bz2/.xz…) and starts the kernel • Relocates the decompressing code (if relocatable and loaded at a different address) • Enables paging and enters the long-mode (in head_64.S) • Clears the BSS, and prepares the heap and stack • Decompresses the kernel • Relocates if required • RANDOMIZE_BASE or RELOCATABLE (in 32-bit) 53
  54. 54. Relocation of the Kernel (1) • When is it required? • When a program is loaded at a different address from the expected one in the compile-time. • When an instruction in the program uses an absolute address for operand(s) • When is it? • In 32-bit mode, kernel data address = kernel code address • There is no simple way to do so • No RIP-relative! • Address randomization (RANDOMIZE_BASE) • To randomize kernel physical/virtual address 54
  55. 55. Relocation of the Kernel (2) • How is it done? • Create tables of the positions of all the absolute symbols • At runtime, rewrite the addresses by adding the delta between the expected address and the actual address. • Done! • How is the table created? • The object files have the table to link with the other objects • LD’s option (-q/--emit-relocs) leaves the table in ELF 55
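The "table of positions plus delta" idea can be shown on a toy image. This sketch is not the kernel's relocation code: it assumes a hypothetical table format (a plain list of byte offsets, each naming a 32-bit little-endian absolute address inside the image), and the link and load addresses are invented for the example.

```python
import struct


def apply_relocations(image: bytearray, reloc_offsets, link_addr: int, load_addr: int) -> None:
    """Add (load_addr - link_addr) to every 32-bit absolute address listed in the table."""
    delta = load_addr - link_addr
    for offset in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (value + delta) & 0xFFFFFFFF)


# Toy image: one absolute pointer at byte offset 4, linked for 0x1000000.
image = bytearray(16)
struct.pack_into("<I", image, 4, 0x01000010)

# Pretend the image was loaded 2MB higher than expected.
apply_relocations(image, [4], link_addr=0x01000000, load_addr=0x01200000)
print(hex(struct.unpack_from("<I", image, 4)[0]))
```

Words that are not listed in the table stay untouched, which is why the table has to record every absolute reference and nothing else.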
  56. 56. Kernel memory map in x86(_32) 56 PAGE_OFFSET (0xc0000000) 0xf8000000 lowmem User space 0x00000000 Virtual Address Physical Address Linux Kernel Linux Kernel
  57. 57. Kernel memory map in x86_64 57 PAGE_OFFSET (0xFFFF880000000000) lowmem User space 0x0000000000000000 Virtual Address Physical Address __START_KERNEL_map (0xFFFFFFFF80000000) Linux Kernel text & data
  58. 58. Decompressing the Kernel • Heap and stack are taken from the static area in head_{32/64}.S • If the output and the decompressing code may overlap, first relocate the decompressing code • When all done, it parses the ELF header, and loads the sections to appropriate addresses • Now, jumping!! to the entry point in ELF! 58
  59. 59. Welcome to Linux kernel! • Now we are at arch/x86/kernel/head_{32,64}.S! • The details from here on are in the next presentation 59
  60. 60. 3-3. Modern World 60
  61. 61. GPT • GPT (GUID Partition Table) • Resolves MBR’s issues • The offset and size of a partition in MBR’s table are expressed by 32-bit wide LBA (Logical Block Addressing) • Cannot point to sectors beyond 2TB • 32 bit (2^32) * 512 byte/sector (2^9) = 2 TB (2^41) • Part of UEFI • Sectors used • LBA 0 = MBR (compatible partition table) • MBR’s partition table is set so that the whole disk area is reserved for one partition (System ID = 0xee) • LBA 1 = Header • LBA 2 ~ 33 = Partition Information (128 Partitions) • Partition type is expressed by GUID (Globally Unique Identifier) • EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 for basic data partitions (historically also used for Linux file systems) • LBA -33 ~ -1 = Backup • The copy of LBA 1 ~ 33 61
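The arithmetic behind the 2TB limit and the GPT sector layout can be verified in a few lines. This is a sketch under stated assumptions: 512-byte sectors as on the slide, and 128 partition entries of 128 bytes each (the common default, not given on the slide), which makes the entry array 32 sectors long, so header plus entries take 33 sectors at each end of the disk. The function names are invented for this example.

```python
SECTOR_SIZE = 512        # bytes per sector, as on the slide
GPT_ENTRY_COUNT = 128    # partitions, as on the slide
GPT_ENTRY_SIZE = 128     # bytes per entry (assumed common default)


def mbr_addressable_bytes(lba_bits: int = 32) -> int:
    """Largest byte offset range an MBR partition entry can describe."""
    return (1 << lba_bits) * SECTOR_SIZE


def gpt_layout(total_sectors: int) -> dict:
    """Return the LBAs used by GPT metadata on a disk of `total_sectors` sectors."""
    entry_sectors = GPT_ENTRY_COUNT * GPT_ENTRY_SIZE // SECTOR_SIZE
    last = total_sectors - 1
    return {
        "protective_mbr": 0,
        "primary_header": 1,
        "primary_entries": (2, 2 + entry_sectors - 1),
        "backup_entries": (last - entry_sectors, last - 1),
        "backup_header": last,
    }


print(mbr_addressable_bytes() == 2 ** 41)   # 2^32 sectors * 2^9 bytes
print(gpt_layout(total_sectors=1000))
```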
  62. 62. GPT and GRUB • Hey, where should “core.img” be?? • In MBR partitions, there is a “gap” between MBR and the first partition… • In GPT, it should be allocated as a partition • “BIOS Boot Partition” • Created right after GPT, and before 1MB • All the other partitions are aligned to 1MB • Thus, there is also a gap between GPT and the next partition. • GUID: 21686148-6449-6E6F-744E-656564454649 • “Hah!IdontNeedEFI” • Yes. If you use UEFI, you don’t need such a partition! 62
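The "Hah!IdontNeedEFI" reading of the BIOS Boot Partition GUID falls out of the on-disk byte order: the first three GUID fields are stored little-endian, the rest as-is. Python's standard `uuid` module exposes exactly that layout as `bytes_le`, so the claim can be checked directly. The variable name below is made up for the example.

```python
import uuid

# GUID of the BIOS Boot Partition, in its usual textual form.
BIOS_BOOT_GUID = uuid.UUID("21686148-6449-6E6F-744E-656564454649")

# bytes_le stores the first three fields little-endian, as GPT does on disk.
on_disk = BIOS_BOOT_GUID.bytes_le
print(on_disk.decode("ascii"))
```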
  63. 63. UEFI • Unified Extensible Firmware Interface • Its origin, EFI, was developed by HP and Intel. • Developed for the Itanium systems (2000) • Addresses the 16-bit mode limitation in BIOS • Advantages • CPU-independent architecture and drivers • Device drivers can be written in EFI Byte Code (EBC) • Modular design • Rich functionality like GUI, file systems, and network boot (not only by PXE, but also SAN boot, iSCSI, etc.), cryptography, etc. • GPT (Can boot from >2TB disks) 63
  64. 64. Boot loaders in UEFI • OS Loaders • A kind of UEFI application • Loaded from the EFI System Partition • GRUB2 supports the UEFI boot • Linux kernel itself can be directly executed from UEFI • “EFI Stub” (CONFIG_EFI_STUB) • Makes bzImage a UEFI application (PE format) • Very simple functionality (no boot menu…), thus GRUB2 is recommended 64
          .global bootsect_start
      bootsect_start:
      #ifdef CONFIG_EFI_STUB
          # "MZ", MS-DOS header
          .byte 0x4d
          .byte 0x5a
      #endif

          # Normalize the start address
          ljmp $BOOTSEG, $start2
      ...
      #ifdef CONFIG_EFI_STUB
          .org 0x3c
          #
          # Offset to the PE header.
          #
          .long pe_header
      #endif /* CONFIG_EFI_STUB */
  65. 65. “Secure Boot” • A mechanism that allows only “signed” binaries (OSes) to be executed • To protect PCs from malware (like the kind that infects MBRs) • The trusted keys are pre-stored in the firmware • Microsoft can (practically) force the PC vendors to include its key • Otherwise, Windows cannot be booted. • Then, how about the other OSes? • Several approaches • Having users put the keys of distributors into the trusted list • The open-source foundation makes the PC vendors include their keys • Bootloader projects (or distributions) have their bootloaders signed by Microsoft 65
  66. 66. shim • A simple EFI bootloader • It just chainloads the GRUB2 UEFI bootloader. • The path to the next bootloader is hardcoded in the program • “grubx64.efi” • “fallback.efi” • In Ubuntu, “shim-signed” package contains the signed version for “shim” 66
  67. 67. 4. Booting in ARM U-Boot and uImage 67
  68. 68. U-Boot and uImage • ARM Case 68
      (1) Strip symbols and transform into a simple binary: vmlinux -> boot/Image
      (2) Compress (gzip, xzkern, lzma, lzo, lz4): -> piggy.gzip
      (3) Make an object file piggyback the compressed image: -> piggy.gzip.o
      (4) Link with the other objects in boot/compressed (decompressing code): piggy.gzip.o + *.o -> boot/compressed/vmlinux
      (5) Transform it into a simple binary: -> boot/zImage
      (6) Convert zImage to uImage by mkimage (U-Boot’s utility): -> boot/uImage
  69. 69. ARM Boot Protocol • Documentation/arm/Booting • The entry point (in compressed/head.S) is called with two arguments • r1: machine type number • r2: boot data • Either the pointer to ATAGs or to a DTB (device tree blob) • When r2 points to a DTB, r1 is ignored. • No BIOS-like things, thus hardware information should be provided by the boot loader (and also hard-coded in the kernel itself) • ATAG List • An array of ATAGs; each element is of variable length • ATAG_CORE : the mandatory first tag, ATAG_MEM : memory size and start address, etc. 69
  70. 70. ATAGs • ATAG • Tagged information for hardware • Used by the “bootm” command in U-Boot • Converted to FDT if CONFIG_ARM_ATAG_DTB_COMPAT 70
      #define ATAG_NONE 0x00000000

      struct tag_header {
          __u32 size;
          __u32 tag;
      };
      ...
      #define ATAG_MEM 0x54410002

      struct tag_mem32 {
          __u32 size;
          __u32 start; /* physical start address */
      };
      (arch/arm/include/uapi/asm/setup.h)
      List layout: [Header: ATAG_CORE | Contents for ATAG_CORE] [Header: ATAG_MEM | Contents for ATAG_MEM] [Header: ATAG_INITRD2 | Contents for ATAG_INITRD2] [Header: ATAG_CMDLINE | Contents for ATAG_CMDLINE] [Header: ATAG_NONE]
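A short sketch of how an ATAG list is laid out and walked may help here. It builds a minimal list (ATAG_CORE, ATAG_MEM, ATAG_NONE) and iterates over it using the size field in each header. Several things are assumptions not shown on the slide: the ATAG_CORE value 0x54410001 and its three-word payload (flags, page size, root device) come from the kernel's setup.h as I recall it, sizes are counted in 32-bit words including the header, and the list is little-endian. The function names are hypothetical.

```python
import struct

ATAG_NONE = 0x00000000
ATAG_CORE = 0x54410001   # assumption: value from the kernel's setup.h
ATAG_MEM = 0x54410002


def build_atags(mem_start: int, mem_size: int) -> bytes:
    """Build a minimal ATAG list; tag sizes are in 32-bit words, header included."""
    core = struct.pack("<5I", 5, ATAG_CORE, 0, 4096, 0)       # flags, pagesize, rootdev
    mem = struct.pack("<4I", 4, ATAG_MEM, mem_size, mem_start)
    none = struct.pack("<2I", 0, ATAG_NONE)                   # terminator has size 0
    return core + mem + none


def walk_atags(blob: bytes):
    """Yield (tag, payload) pairs until the terminating tag is reached."""
    offset = 0
    while True:
        size, tag = struct.unpack_from("<2I", blob, offset)
        if size == 0 or tag == ATAG_NONE:
            return
        yield tag, blob[offset + 8: offset + size * 4]
        offset += size * 4


atags = build_atags(mem_start=0x80000000, mem_size=0x10000000)
for tag, payload in walk_atags(atags):
    print(hex(tag), payload.hex())
```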
  71. 71. Flattened Device Tree (FDT) Blobs • Binary form of flattened device tree • Device Tree (described in the next slide) laid out sequentially in memory in binary form • Used by the “fdt” command 71
      struct boot_param_header {
          __be32 magic;              /* magic word OF_DT_HEADER */
          __be32 totalsize;          /* total size of DT block */
          __be32 off_dt_struct;      /* offset to structure */
          __be32 off_dt_strings;     /* offset to strings */
          __be32 off_mem_rsvmap;     /* offset to memory reserve map */
          __be32 version;            /* format version */
          __be32 last_comp_version;  /* last compatible version */
          /* version 2 fields below */
          __be32 boot_cpuid_phys;    /* Physical CPU id we're booting on */
          /* version 3 fields below */
          __be32 dt_strings_size;    /* size of the DT strings block */
          /* version 17 fields below */
          __be32 dt_struct_size;     /* size of the DT structure block */
      };
      (include/linux/of_fdt.h)
  72. 72. Device Tree [5][6] • Device Tree • Describes hardware • Simple tree of named nodes and properties • A property is a pair of a name and a value • “chosen” node • Not representing the real hardware • Information between the firmware (bootloader) and OS kernel • Can include initrd information, command line parameters (bootargs) 72 [5] http://www.devicetree.org/Main_Page [6] http://www.devicetree.org/Device_Tree_Usage
  73. 73. FDT 73 r2 points to struct boot_param_header (__be32 off_dt_struct; __be32 off_dt_strings;). off_dt_struct leads to the structure block: OF_DT_BEGIN_NODE (0x01), Path name (“chosen”), OF_DT_PROP (0x03), Size, Offset in the string block, Contents, OF_DT_END_NODE (0x02), …, OF_DT_END (0x09). off_dt_strings leads to the strings block: “bootargs”, … (drivers/of/fdt.c)
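The FDT header is ten big-endian 32-bit words, so it can be decoded with a single struct format. The sketch below parses struct boot_param_header and rejects blobs with the wrong magic. The magic value 0xd00dfeed for OF_DT_HEADER is not shown on the slides and is stated here from the device tree format as I know it; the function name and the hand-built sample header (with invented offsets and sizes) are purely illustrative.

```python
import struct

OF_DT_HEADER = 0xD00DFEED   # assumption: FDT magic word, not shown on the slides

FDT_HEADER = struct.Struct(">10I")   # all fields are __be32 (big-endian)
FDT_FIELDS = ("magic", "totalsize", "off_dt_struct", "off_dt_strings",
              "off_mem_rsvmap", "version", "last_comp_version",
              "boot_cpuid_phys", "dt_strings_size", "dt_struct_size")


def parse_fdt_header(blob: bytes) -> dict:
    """Decode struct boot_param_header from the start of a flattened device tree."""
    header = dict(zip(FDT_FIELDS, FDT_HEADER.unpack_from(blob, 0)))
    if header["magic"] != OF_DT_HEADER:
        raise ValueError("not a flattened device tree blob")
    return header


# Hand-built sample header with made-up offsets and sizes.
sample_header = FDT_HEADER.pack(OF_DT_HEADER, 0x100, 0x38, 0xC0, 0x28,
                                17, 16, 0, 0x40, 0x88)
hdr = parse_fdt_header(sample_header)
print(hdr["version"], hex(hdr["off_dt_struct"]), hex(hdr["off_dt_strings"]))
```

From there, a walker would start at `off_dt_struct` and read tokens such as OF_DT_BEGIN_NODE and OF_DT_PROP, resolving property names through `off_dt_strings`.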
