Initialization (1)
Taku Shimosawa
For the new Linux kernel book
Agenda
• Initialization Phase of the Linux Kernel
• Turning on the paging feature
• Calling *init functions
• And miscellaneous things related to initialization
1. vmlinux
This is the Linux kernel
vmlinux
• Main kernel binary
• Runs with the final CPU state
• Protected Mode in x86_32 (i386)
• Long Mode in x86_64
• And so on…
• Runs in the virtual memory space
• Above PAGE_OFFSET (default: 0xc0000000) (32-bit)
• Above __START_KERNEL_map (default: 0xff…f80000000) (64-bit)
• i.e. All the absolute addresses in the binary are virtual ones
• Entry points
Architecture  Name        Location                        Name (secondary)
x86_32        startup_32  arch/x86/kernel/head_32.S       startup_32_smp
x86_64        startup_64  arch/x86/kernel/head_64.S       secondary_startup_64
ARM           stext       arch/arm/kernel/head[_nommu].S  secondary_startup
ARM64         stext       arch/arm64/kernel/head.S        secondary_holding_pen, secondary_entry
PPC           _stext      arch/powerpc/kernel/head_32.S*  (__secondary_start)
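Virtual memory mapping
[Figure: physical memory and the i386 / x86_64 virtual address spaces]
• i386 virtual (0x00000000 ~ 0xFFFFFFFF): LOWMEM (up to ~896 MB of physical memory, including text/data) is mapped at PAGE_OFFSET (0xC0000000)
• x86_64 virtual (0x0000000000000000 ~ 0xFFFFFFFFFFFFFFFF): PAGE_OFFSET is 0xFFFF880000000000; __START_KERNEL_map (0xFFFFFFFF80000000) covers the last 2GB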
Why different mapping in 64-bit?
• The kernel code, data, and BSS reside in the last 2 GB of the virtual address space
=> Addressable with sign-extended 32-bit values!
• -mcmodel option in GCC
• Specifies the assumptions for the size of code/data
sections
-mcmodel option (x86)
Model   text         data
small   within 2GB (text and data)
kernel  within -2GB (text and data)
medium  within 2GB   Can be > 2GB
large   Anywhere in 64bit (text and data)
Column: -mcmodel in gcc
int g_data = 4;
int main(void)
{
g_data += 7;
...
}
8b 05 c6 0b 20 00 mov 0x200bc6(%rip),%eax # 601040 <g_data>
...
bf 01 00 00 00 mov $0x1,%edi
8d 50 07 lea 0x7(%rax),%edx
48 b8 40 10 60 00 00 movabs $0x601040,%rax
00 00 00
bf 01 00 00 00 mov $0x1,%edi
8b 30 mov (%rax),%esi
...
8d 56 07 lea 0x7(%rsi),%edx
(above: -mcmodel=large)
#define SZ (1 << 30)
int buf[SZ] = {1};
int main(void)
{
buf[0] += 3;
}
$ gcc -O3 -o ba -mcmodel=small bigarray.c
/usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o: In function
`deregister_tm_clones':
crtstuff.c:(.text+0x1): relocation truncated to fit:
R_X86_64_32 against symbol `__TMC_END__' defined in .data
section in ba
(above: -mcmodel=small and kernel)
48 b8 60 10 a0 00 00 movabs $0xa01060,%rax
00 00 00
8b 08 mov (%rax),%ecx
8d 51 03 lea 0x3(%rcx),%edx
(above: -mcmodel=medium and large)
*The offset of RIP-relative addressing is 32-bit
Column: -mcmodel in gcc (2)
• Code?
void nop(void)
{
asm volatile(".fill (2 << 30), 1, 0x90");
}
$ gcc -O3 -o ba -mcmodel=small supernop.c
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-
gnu/crt1.o: In function `_start':
(.text+0x12): relocation truncated to fit: R_X86_64_32S
against symbol `__libc_csu_fini' defined in .text section in
/usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
$ gcc -O3 -o ba -mcmodel=large supernop.c
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-
gnu/crt1.o: In function `_start':
(.text+0x12): relocation truncated to fit: R_X86_64_32S
against symbol `__libc_csu_fini' defined in .text section in
/usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
small
medium
kernel
large
Initialization Overview
• Booting Code: arch/*/boot/
(Preparing CPU states, Gathering HW information, Decompressing vmlinux etc.)
• Low-level Initialization: arch/*/kernel/head*.S, head*.c (from here on, inside vmlinux)
(Switching to virtual memory world, Getting prepared for C programs)
• Initialization: init/main.c (start_kernel)
(Initializing all the kernel features including architecture-dependent parts; calls into arch/*/kernel, arch/*/mm, …)
• Creating the “init” process, and letting it do the rest of the initialization: init/main.c (rest_init)
(Setting up multiprocessing, scheduling)
• “Swapper” (PID=0) now sleeps: kernel/sched/idle.c (cpu_idle_loop)
• Performing final initialization and “Exec”ing the “init” user program (“init”, PID=1): init/main.c (kernel_init)
2. Towards Virtual Memory
Enabling paging
• The early part is executed with paging off.
• Physical address space
• vmlinux is assumed to be executed with paging on.
• The addresses in the binary are not physical addresses.
• The first big job in vmlinux is enabling paging
• Creating a (transitional) page table
• Setting the CPU to use the page table, and to enable
paging
• Jumping to the entry point in C (compiled in the virtual
address space)
Identity Map
• At first, the goal page table cannot be used
• Since changing PC and enabling paging are (at least, in
x86) separate instructions.
[Figure: the PC still holds a physical address when paging is enabled; with only the goal mapping in place, the next instruction fetch causes a page fault.]
Identity Map
• Therefore, identity map is created in addition to the
(goal) map.
[Figure: the PC moves from the physical (identity-mapped) address to the virtual address]
(1) Create an initial page table
(2) Enable paging, and jump to a virtual address
(3) Zap the low mapping
Addresses in the transitional phase
• x86_64
• The decompressing routine enables paging and creates
an identity page table (only for first 4GB)
• Paging is required for CPUs to switch to 64-bit mode
• Located in 6 pages (pgtable) in the decompressing routine
• Symbols in vmlinux are accessed with RIP-relative addressing
• No trick is necessary for using the symbols
leaq _text(%rip), %rbp
subq $_text - __START_KERNEL_map, %rbp
...
leaq early_level4_pgt(%rip), %rbx
...
movq $(early_level4_pgt - __START_KERNEL_map), %rax
addq phys_base(%rip), %rax
movq %rax, %cr3
movq $1f, %rax
jmp *%rax
1:
(arch/x86/kernel/head_64.S)
Addresses in the transitional phase
• i386
• Symbols in vmlinux are accessed with absolute
addresses
• Before paging is enabled, PAGE_OFFSET is always subtracted
from addresses
movl $pa(__bss_start),%edi
movl $pa(__bss_stop),%ecx
subl %edi,%ecx
shrl $2,%ecx
rep ; stosl
...
movl $pa(initial_page_table), %eax
movl %eax,%cr3 /* set the page table pointer.. */
movl $CR0_STATE,%eax
movl %eax,%cr0 /* ..and set paging (PG) bit */
ljmp $__BOOT_CS,$1f /* Clear prefetch and normalize %eip */
1:
...
lgdt early_gdt_descr
lidt idt_descr
#define pa(X) ((X) - __PAGE_OFFSET)
(arch/x86/kernel/head_32.S)
3. Initialization
At last, we have come here!
Initialization (start_kernel)
• A lot of *_init functions!
• Furthermore, some init functions call other init functions.
• At least 80 functions are called in this function.
• These slides pick up some topics from the initialization functions
2.9. Before Initialization
A few more tricks
Special directives
• What are these?
• “I’m curious!”.
asmlinkage __visible void __init start_kernel(void) {
…
}
asmlinkage
• asmlinkage
• Ensures the symbol is not mangled
• (in x86_32) Ensures all the parameters are passed on the stack
#ifdef CONFIG_X86_32
#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
#endif
arch/x86/include/asm/linkage.h
#ifdef __cplusplus
#define CPP_ASMLINKAGE extern "C"
#else
#define CPP_ASMLINKAGE
#endif
#ifndef asmlinkage
#define asmlinkage CPP_ASMLINKAGE
#endif
include/linux/linkage.h
__visible
• (Effective in gcc >=4.6)
#if GCC_VERSION >= 40600
/*
* Tell the optimizer that something else uses this function or variable.
*/
#define __visible __attribute__((externally_visible))
#endif
include/linux/compiler-gcc4.h
commit 9a858dc7cebce01a7bb616bebb85087fa2b40871
author Andi Kleen <ak@linux.intel.com> Mon Sep 17 21:09:15 2012
committer Linus Torvalds <torvalds@linux-foundation.org> Mon Sep 17 22:00:38 2012
compiler.h: add __visible
gcc 4.6+ has support for a externally_visible attribute that prevents the
optimizer from optimizing unused symbols away. Add a __visible macro to
use it with that compiler version or later.
This is used (at least) by the "Link Time Optimization" patchset.
__init (1)
• Marks code (text) and data as needed only during initialization
#define __init __section(.init.text) __cold notrace
#define __initdata __section(.init.data)
#define __initconst __constsection(.init.rodata)
#define __exitdata __section(.exit.data)
#define __exit_call __used __section(.exitcall.exit)
(include/linux/init.h)
#ifndef __cold
#define __cold __attribute__((__cold__))
#endif
(include/linux/compiler-gcc4.h)
#ifndef __section
# define __section(S) __attribute__ ((__section__(#S)))
#endif
...
#define notrace __attribute__((no_instrument_function))
(include/linux/compiler.h)
__init (2)
• The init* sections are gathered into a contiguous memory area
. = ALIGN(PAGE_SIZE);
.init.begin : AT(ADDR(.init.begin) - LOAD_OFFSET) {
__init_begin = .; /* paired with __init_end */
}
...
INIT_TEXT_SECTION(PAGE_SIZE)
#ifdef CONFIG_X86_64
:init
#endif
INIT_DATA_SECTION(16)
....
. = ALIGN(PAGE_SIZE);
...
.init.end : AT(ADDR(.init.end) - LOAD_OFFSET) {
__init_end = .;
}
arch/x86/kernel/vmlinux.lds.S
[Figure: init.text, init.data, … laid out contiguously between __init_begin and __init_end]
__init (3)
• And they are discarded (freed) after initialization
• free_initmem is called from kernel_init
void free_initmem(void)
{
free_init_pages("unused kernel",
(unsigned long)(&__init_begin),
(unsigned long)(&__init_end));
}
arch/x86/mm/init.c
void free_initmem(void)
{
...
poison_init_mem(__init_begin, __init_end - __init_begin);
if (!machine_is_integrator() && !machine_is_cintegrator())
free_initmem_default(-1);
}
arch/arm/mm/init.c
head32.c, head64.c
• Before calling start_kernel, i386_start_kernel or
x86_64_start_kernel is called in x86
• Located in arch/x86/kernel/head{32,64}.c
• No underscore between head and 32!
• x86 (32-bit)
• Reserve BIOS memory (in conventional memory)
• x86 (64-bit)
• Erase the identity map
• Clear BSS, copy boot information from the low memory
• And reserve BIOS memory
Reserve? But how?
• This is a very early stage. No complicated memory management is working yet.
• memblock (Logical memory blocks) is working!
• memblock simply manages memory blocks
• And in some architectures, the information is handed over to another mechanism, and memblock is discarded after initialization
#define BIOS_LOWMEM_KILOBYTES 0x413
lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
lowmem <<= 10;
...
memblock_reserve(lowmem, 0x100000 - lowmem);
arch/x86/kernel/head.c
#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
#define __init_memblock __meminit
#define __initdata_memblock __meminitdata
#else
...
#endif
include/linux/memblock.h
Note: CONFIG_ARCH_DISCARD_MEMBLOCK is set in S+Core, IA64, S390, SH, MIPS and x86.
Note: Without memory hotplug, __meminit is __init.
memblock
• Data Structure (include/linux/memblock.h)
• Initially the arrays are allocated statically
• memblock (struct memblock, a global variable)
• memory (memblock_type): array of memblock_region
• reserved (memblock_type): array of memblock_region
• memblock_region: base, size, flags[, nid]
static struct memblock_region
memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
static struct memblock_region
memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
*INIT_MEMBLOCK_REGIONS = 128
Reserving in memblock
• Reserving adds the region to the region array in the “reserved” type
• The function for adding an available region is memblock_add
static int __init_memblock memblock_reserve_region(phys_addr_t base,
phys_addr_t size,
int nid,
unsigned long flags)
{
struct memblock_type *_rgn = &memblock.reserved;
...
return memblock_add_region(_rgn, base, size, nid, flags);
}
int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
{
return memblock_reserve_region(base, size, MAX_NUMNODES, 0);
}
When is the available memory added?
• x86
• memblock_x86_fill
• called by setup_arch (8/80)
• ARM
• arm_memblock_init
• Also called by setup_arch (8/80)
void __init memblock_x86_fill(void)
{
...
memblock_allow_resize();
for (i = 0; i < e820.nr_map; i++) {
... memblock_add(ei->addr, ei->size);
}
memblock_trim_memory(PAGE_SIZE);
...
}
BTW, what’s memblock_allow_resize()?
Resizing, or reallocation.
• Memblock uses slab for resizing if available
• # of e820 entries may be more than 128
• However, slab becomes available only at kmem_cache_init called by mm_init (25/80), so not at this time.
• Memblock therefore tries to allocate by itself, by finding an area that is in memory && !reserved.
static int __init_memblock memblock_double_array(struct memblock_type *type,
phys_addr_t new_area_start,
phys_addr_t new_area_size)
{
…
addr = memblock_find_in_range(new_area_start + new_area_size,
memblock.current_limit,
new_alloc_size, PAGE_SIZE);
memblock: Debug options
• “memblock=debug”
static int __init early_memblock(char *p)
{
if (p && strstr(p, "debug"))
memblock_debug = 1;
return 0;
}
early_param("memblock", early_memblock);
static int __init_memblock memblock_reserve_region(...)
{
...
memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pF\n",
(unsigned long long)base,
(unsigned long long)base + size - 1,
flags, (void *)_RET_IP_);
3. Initialization
Okay, okay.
start_kernel
• What’s the first initialization function called?
smp_setup_processor_id() (at least 2.6.18 ~ 3.2)
lockdep_init() (3.3 ~)
commit 73839c5b2eacc15cb0aa79c69b285fc659fa8851
Author: Ming Lei <tom.leiming@gmail.com>
Date: Thu Nov 17 13:34:31 2011 +0800
init/main.c: Execute lockdep_init() as early as possible
This patch fixes a lockdep warning on ARM platforms:
[ 0.000000] WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough?
[ 0.000000] Call stack leading to lockdep invocation was:
[ 0.000000] [<c00164bc>] save_stack_trace_tsk+0x0/0x90
[ 0.000000] [<ffffffff>] 0xffffffff
The warning is caused by printk inside smp_setup_processor_id().
init (1/80) : lockdep_init
• Initializes lockdep (lock validator)
• “Runtime locking correctness validator”
• Detects
• Lock inversion
• Circular lock dependencies
• When enabled, lockdep is called when any spinlock or
mutex is acquired.
• Thus, the initialization for lockdep must be first.
• Initialization is simple (just initializing list_head’s of hashes)
void lockdep_init(void)
{...
for (i = 0; i < CLASSHASH_SIZE; i++)
INIT_LIST_HEAD(classhash_table + i);
for (i = 0; i < CHAINHASH_SIZE; i++)
INIT_LIST_HEAD(chainhash_table + i);
...}
kernel/locking/lockdep.c
Config: CONFIG_LOCKDEP (selected by PROVE_LOCKING, DEBUG_LOCK_ALLOC or LOCK_STAT)
init (2/80) : smp_setup_processor_id
• Only effective in some architectures
• ARM, s390, SPARC
u32 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = MPIDR_INVALID };
void __init smp_setup_processor_id(void)
{
int i;
u32 mpidr = is_smp() ? read_cpuid_mpidr() & MPIDR_HWID_BITMASK : 0;
u32 cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
cpu_logical_map(0) = cpu;
for (i = 1; i < nr_cpu_ids; ++i)
cpu_logical_map(i) = i == cpu ? 0 : i;
set_my_cpu_offset(0);
pr_info("Booting Linux on physical CPU 0x%x\n", mpidr);
}
arch/arm/kernel/setup.c
Callouts: mpidr holds the hardware CPU (core) ID. The loop exchanges the logical ID for the boot CPU and the logical ID for CPU 0.
[Figure: cpu_logical_map after the exchange]
init (3/80) : debug_objects_early_init
• Initializes debugobjects
• Lifetime debugging facility for objects
• Seems to be used by timer, hrtimer, workqueue,
per_cpu_counter and rcu
• Again, this function initializes locks and list heads
Config: CONFIG_DEBUG_OBJECTS
void __init debug_objects_early_init(void)
{
int i;
for (i = 0; i < ODEBUG_HASH_SIZE; i++)
raw_spin_lock_init(&obj_hash[i].lock);
for (i = 0; i < ODEBUG_POOL_SIZE; i++)
hlist_add_head(&obj_static_pool[i].node, &obj_pool);
}
lib/debugobjects.c
init (4/80): boot_init_stack_canary
• Sets up the stackprotector
• include/asm/stackprotector.h
• Decides the canary value based on a random value and the TSC
static __always_inline void boot_init_stack_canary(void)
{
u64 canary;
u64 tsc;
#ifdef CONFIG_X86_64
BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
#endif
get_random_bytes(&canary, sizeof(canary));
tsc = __native_read_tsc();
canary += tsc + (tsc << 32UL);
current->stack_canary = canary;
#ifdef CONFIG_X86_64
this_cpu_write(irq_stack_union.stack_canary, canary);
#else
this_cpu_write(stack_canary.canary, canary);
#endif
}
init (5/80): cgroup_init_early
• Initializes cgroups
• For subsystems that have early_init set, initialize the
subsystem.
• cpu, cpuacct, cpuset
• The rest of subsystems are initialized in cgroup_init (71/80)
• Initializes the structure, and the names for the
subsystems
init (6/80): boot_cpu_init
• Initializes various cpumasks for the boot CPU
• online : available to scheduler
• active : available to migration
• present : cpu is populated
• possible : cpu is populatable
• set_cpu_online adds the cpu to active
• set_cpu_present does not add the cpu to possible
static void __init boot_cpu_init(void)
{
int cpu = smp_processor_id();
/* Mark the boot cpu "present", "online" etc for SMP and UP case */
set_cpu_online(cpu, true);
set_cpu_active(cpu, true);
set_cpu_present(cpu, true);
set_cpu_possible(cpu, true);
}
init/main.c
!HOTPLUG_CPU => online and active are the same
!HOTPLUG_CPU => present and possible are the same
cpumask
• A bit map
typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
include/linux/cpumask.h
#define DECLARE_BITMAP(name,bits) \
unsigned long name[BITS_TO_LONGS(bits)]
include/linux/types.h
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
include/linux/bitops.h
[Figure: bits is an array of long (4 / 8 bytes each) holding NR_CPUS bits]
Set bit! (x86)
• The register bit-offset operand for bts ranges over
• -2^31 ~ 2^31-1 or -2^63 ~ 2^63-1
#define IS_IMMEDIATE(nr) (__builtin_constant_p(nr))
...
static __always_inline void
set_bit(long nr, volatile unsigned long *addr)
{
if (IS_IMMEDIATE(nr)) {
asm volatile(LOCK_PREFIX "orb %1,%0"
: CONST_MASK_ADDR(nr, addr)
: "iq" ((u8)CONST_MASK(nr))
: "memory");
} else {
asm volatile(LOCK_PREFIX "bts %1,%0"
: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
}
}
arch/x86/include/asm/bitops.h
Set bit! (ARM)
#if __LINUX_ARM_ARCH__ >= 6
.macro bitop, name, instr
ENTRY( \name )
UNWIND( .fnstart)
ands ip, r1, #3
strneb r1, [ip] @ assert word-aligned
mov r2, #1
and r3, r0, #31 @ Get bit offset
mov r0, r0, lsr #5
add r1, r1, r0, lsl #2 @ Get word offset
...
mov r3, r2, lsl r3
1: ldrex r2, [r1]
\instr r2, r2, r3
strex r0, r2, [r1]
cmp r0, #0
bne 1b
bx lr
UNWIND( .fnend )
ENDPROC(\name )
.endm
bitop _set_bit, orr
smp_processor_id
• Returns the core ID (in the kernel)
• In ARM (and old days in x86)
• Located in “current”
• Located at the bottom (lowest address) of the current kernel stack
• In x86
• Located in the per-cpu area.
#define raw_smp_processor_id() (this_cpu_read(cpu_number))
arch/x86/include/asm/smp.h
#define raw_smp_processor_id() (current_thread_info()->cpu)
arch/arm/include/asm/smp.h
static inline struct thread_info *current_thread_info(void)
{
register unsigned long sp asm ("sp");
return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}
arch/arm/include/asm/thread_info.h
Next
• Topics and the rest of initialization
• Setup parameters (early_param() etc.)
• Initcalls
• Multiprocessor support
• Per-cpus
• SMP boot (secondary boot)
• SMP alternatives
• And other alternatives
• And Others?
• Modules?

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 

Recently uploaded (20)

Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 

Linux Initialization Process (1)

  • 1. Initialization (1) Taku Shimosawa Pour le livre nouveau du Linux noyau 1
  • 2. Agenda • Initialization Phase of the Linux Kernel • Turning on the paging feature • Calling *init functions • And miscellaneous things related to initialization 2
  • 3. 1. vmlinux This is the linux kernel 3
  • 4. vmlinux • Main kernel binary • Runs with the final CPU state • Protected Mode in x86_32 (i386) • Long Mode in x86_64 • And so on… • Runs in the virtual memory space • Above PAGE_OFFSET (default: 0xc0000000) (32-bit) • Above __START_KERNEL_map (default: 0xff…f80000000) • i.e. All the absolute addresses in the binary are virtual ones • Entry points 4 Architecture Name Location Name (secondary) x86_32 startup_32 arch/x86/kernel/head_32.S startup_32_smp x86_64 startup_64 arch/x86/kernel/head_64.S secondary_startup_64 ARM stext arch/arm/kernel/head[_nommu].S secondary_startup ARM64 stext arch/arm64/kernel/head.S secondary_holding_pen secondary_entry PPC _stext arch/powerpc/kernel/head_32.S* (__secondary_start)
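  • 5. Virtual memory mapping 5 [Figure: physical memory compared with the i386 and x86_64 virtual address spaces. i386 virtual (0x00000000 to 0xFFFFFFFF): LOWMEM, up to ~896 MB, is mapped at PAGE_OFFSET (0xC0000000). x86_64 virtual (0x0000000000000000 to 0xFFFFFFFFFFFFFFFF): physical memory is mapped at PAGE_OFFSET (0xFFFF880000000000), and the kernel text/data is mapped at __START_KERNEL_map (0xFFFFFFFF80000000), within the last 2GB.]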
  • 6. Why different mapping in 64-bit? • The kernel code, data, and BSS reside in the last 2 GB of the memory => Addressable by 32-bit! • -mcmodel option in GCC • Specifies the assumptions for the size of code/data sections 6 -mcmodel option (x86): small = text and data within 2GB; kernel = text and data within -2GB; medium = text within 2GB, data can be > 2GB; large = anywhere in 64bit
  • 7. Column: -mcmodel in gcc 7 int g_data = 4; int main(void) { g_data += 7; ... } 8b 05 c6 0b 20 00 mov 0x200bc6(%rip),%eax # 601040 <g_data> ... bf 01 00 00 00 mov $0x1,%edi 8d 50 07 lea 0x7(%rax),%edx 48 b8 40 10 60 00 00 movabs $0x601040,%rax 00 00 00 bf 01 00 00 00 mov $0x1,%edi 8b 30 mov (%rax),%esi ... 8d 56 07 lea 0x7(%rsi),%edx large #define SZ (1 << 30) int buf[SZ] = {1}; int main(void) { buf[0] += 3; } $ gcc -O3 -o ba -mcmodel=small bigarray.c /usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o: In function `deregister_tm_clones': crtstuff.c:(.text+0x1): relocation truncated to fit: R_X86_64_32 against symbol `__TMC_END__' defined in .data section in ba small kernel 48 b8 60 10 a0 00 00 movabs $0xa01060,%rax 00 00 00 8b 08 mov (%rax),%ecx 8d 51 03 lea 0x3(%rcx),%edx medium large *The offset of RIP-relative addressing is 32-bit
  • 8. Column: -mcmodel in gcc (2) • Code? 8 void nop(void) { asm volatile(".fill (2 << 30), 1, 0x90"); } $ gcc -O3 -o ba -mcmodel=small supernop.c /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start': (.text+0x12): relocation truncated to fit: R_X86_64_32S against symbol `__libc_csu_fini' defined in .text section in /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS) $ gcc -O3 -o ba -mcmodel=large supernop.c /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start': (.text+0x12): relocation truncated to fit: R_X86_64_32S against symbol `__libc_csu_fini' defined in .text section in /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS) small medium kernel large
  • 9. Initialization Overview 9 Booting Code (Preparing CPU states, Gathering HW information, Decompressing vmlinux etc.) arch/*/boot/ arch/*/kernel/head*.S, head*.c Low-level Initialization (Switching to virtual memory world, Getting prepared for C programs) init/main.c (start_kernel) Initialization (Initializing all the kernel features including architecture-dependent parts) init/main.c (rest_init) Creating the “init” process, and letting it do the rest of the initialization (Setting up multiprocessing, scheduling) kernel/sched/idle.c (cpu_idle_loop) “Swapper” (PID=0) now sleeps init/main.c (kernel_init) Performing final initialization and “Exec”ing the “init” user “init” (PID=1) arch/*/kernel, arch/*/mm, … Call vmlinux
  • 11. Enabling paging • The early part is executed with paging off. • Physical address space • vmlinux is assumed to be executed with paging on. • The addresses in the binary are not physical addresses. • The first big job in vmlinux is enabling paging • Creating a (transitional) page table • Setting the CPU to use the page table, and to enable paging • Jumping to the entry point in C (compiled in the virtual address space) 11
  • 12. Identity Map • At first, the goal page table cannot be used • Since changing PC and enabling paging are (at least, in x86) separate instructions. 12 PC Physical Virtual Enable Paging Physical Virtual Page Fault!
  • 13. Identity Map • Therefore, identity map is created in addition to the (goal) map. 13 PC Physical Virtual Jump (1) Create an initial page table (2) Enable paging, and Jump to a virtual address. (3) Zap the low mapping
  • 14. Addresses in the transitional phase • x86_64 • The decompressing routine enables paging and creates an identity page table (only for first 4GB) • Paging is required for CPUs to switch to 64-bit mode • Located in 6 pages (pgtable) in the decompressing routine • Symbols in vmlinux are accessed with RIP-relative • No trick is necessary for using the symbols 14 leaq _text(%rip), %rbp subq $_text - __START_KERNEL_map, %rbp ... leaq early_level4_pgt(%rip), %rbx ... movq $(early_level4_pgt - __START_KERNEL_map), %rax addq phys_base(%rip), %rax movq %rax, %cr3 movq $1f, %rax jmp *%rax 1: (arch/x86/kernel/head_64.S)
  • 15. Addresses in the transitional phase • i386 • Symbols in vmlinux are accessed with absolute addresses • Before paging is enabled, PAGE_OFFSET is always subtracted from addresses 15 movl $pa(__bss_start),%edi movl $pa(__bss_stop),%ecx subl %edi,%ecx shrl $2,%ecx rep ; stosl ... movl $pa(initial_page_table), %eax movl %eax,%cr3 /* set the page table pointer.. */ movl $CR0_STATE,%eax movl %eax,%cr0 /* ..and set paging (PG) bit */ ljmp $__BOOT_CS,$1f /* Clear prefetch and normalize %eip */ 1: ... lgdt early_gdt_descr lidt idt_descr #define pa(X) ((X) - __PAGE_OFFSET) (arch/x86/kernel/head_32.S)
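The pa() macro on the i386 slide is plain offset arithmetic. A minimal user-space sketch of that arithmetic follows, assuming the default PAGE_OFFSET of 0xC0000000 from the earlier slides. The names DEMO_PAGE_OFFSET, demo_pa and demo_va are made up for this illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default i386 PAGE_OFFSET, as quoted on the vmlinux slide. */
#define DEMO_PAGE_OFFSET 0xC0000000u

/* Link-time (virtual) address -> physical address, usable before paging is on. */
static uint32_t demo_pa(uint32_t vaddr)
{
    assert(vaddr >= DEMO_PAGE_OFFSET);  /* only kernel-space addresses make sense here */
    return vaddr - DEMO_PAGE_OFFSET;
}

/* Physical address -> virtual address in the kernel's linear mapping. */
static uint32_t demo_va(uint32_t paddr)
{
    return paddr + DEMO_PAGE_OFFSET;
}
```

With these definitions, a symbol linked at 0xC0100000 is reached at physical 0x00100000 while paging is still off, which is the subtraction that head_32.S applies to every symbol it touches before setting the PG bit.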
  • 16. 3. Initialization At last, we have come here! 16
  • 17. Initialization (start_kernel) • A lot of *_init functions! • Furthermore, some init functions call other init functions. • At least 80 functions are called in this function. • These slides pick up some topics from the initialization functions 17
  • 18. 2.9. Before Initialization A little more tricks 18
  • 19. Special directives • What are these? • “I’m curious!”. 19 asmlinkage __visible void __init start_kernel(void) { … }
  • 20. asmlinkage • asmlinkage • Ensures the symbol is not mangled • (in x86_32) Ensures all the parameters are passed by the stack 20 #ifdef CONFIG_X86_32 #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0))) arch/x86/include/asm/linkage.h #ifdef __cplusplus #define CPP_ASMLINKAGE extern "C" #else #define CPP_ASMLINKAGE #endif #ifndef asmlinkage #define asmlinkage CPP_ASMLINKAGE #endif include/linux/linkage.h
  • 21. __visible • (Effective in gcc >=4.6) 21 #if GCC_VERSION >= 40600 /* * Tell the optimizer that something else uses this function or variable. */ #define __visible __attribute__((externally_visible)) #endif include/linux/compiler-gcc4.h commit 9a858dc7cebce01a7bb616bebb85087fa2b40871 author Andi Kleen <ak@linux.intel.com> Mon Sep 17 21:09:15 2012 committer Linus Torvalds <torvalds@linux-foundation.org> Mon Sep 17 22:00:38 2012 compiler.h: add __visible gcc 4.6+ has support for a externally_visible attribute that prevents the optimizer from optimizing unused symbols away. Add a __visible macro to use it with that compiler version or later. This is used (at least) by the "Link Time Optimization" patchset.
  • 22. __init (1) • To mark code(text) and data as only necessary during initialization 22 #define __init __section(.init.text) __cold notrace #define __initdata __section(.init.data) #define __initconst __constsection(.init.rodata) #define __exitdata __section(.exit.data) #define __exit_call __used __section(.exitcall.exit) (include/linux/init.h) #ifndef __cold #define __cold __attribute__((__cold__)) #endif (include/linux/compiler-gcc4.h) #ifndef __section # define __section(S) __attribute__ ((__section__(#S))) #endif ... #define notrace __attribute__((no_instrument_function)) (include/linux/compiler.h)
  • 23. __init (2) • The init* sections are concentrated to a contiguous memory area 23 . = ALIGN(PAGE_SIZE); .init.begin : AT(ADDR(.init.begin) - LOAD_OFFSET) { __init_begin = .; /* paired with __init_end */ } ... INIT_TEXT_SECTION(PAGE_SIZE) #ifdef CONFIG_X86_64 :init #endif INIT_DATA_SECTION(16) .... . = ALIGN(PAGE_SIZE); ... .init.end : AT(ADDR(.init.end) - LOAD_OFFSET) { __init_end = .; } arch/x86/kernel/vmlinux.lds.S init.text init.data … __init_begin __init_end
  • 24. __init (3) • And, they are discarded (free’d) after initialization • Called from kernel_init 24 void free_initmem(void) { free_init_pages("unused kernel", (unsigned long)(&__init_begin), (unsigned long)(&__init_end)); } arch/x86/mm/init.c void free_initmem(void) { ... poison_init_mem(__init_begin, __init_end - __init_begin); if (!machine_is_integrator() && !machine_is_cintegrator()) free_initmem_default(-1); } arch/arm/mm/init.c
  • 25. head32.c, head64.c • Before calling start_kernel, i386_start_kernel or x86_64_start_kernel is called in x86 • Located in arch/x86/kernel/head{32,64}.c • No underscore between head and 32! • x86 (32-bit) • Reserve BIOS memory (in conventional memory) • x86 (64-bit) • Erase the identity map • Clear BSS, copy boot information from the low memory • And reserve BIOS memory 25
  • 26. Reserve? But how? • This is a very early stage. No complicated memory management is working yet. • memblock (Logical memory blocks) is working! • memblock simply manages memory blocks • And in some architectures, its information is handed over to another mechanism, and memblock is discarded after initialization 26 #define BIOS_LOWMEM_KILOBYTES 0x413 lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); lowmem <<= 10; ... memblock_reserve(lowmem, 0x100000 - lowmem); arch/x86/kernel/head.c #ifdef CONFIG_ARCH_DISCARD_MEMBLOCK #define __init_memblock __meminit #define __initdata_memblock __meminitdata #else ... #endif include/linux/memblock.h Set in S+Core, IA64, S390, SH, MIPS and x86 Without memory hotplug, __meminit is __init.
  • 27. memblock • Data Structure (include/linux/memblock.h) • Initially the arrays are allocated statically 27 memblock (memblock) memory (memblock_type) reserved (memblock_type) memblock_region • base, size, flags[, nid] memblock_region memblock_region memblock_region Array of memblock_region Array of memblock_region static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock; static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock; *INIT_MEMBLOCK_REGIONS = 128 (memblock: Global variable)
  • 28. Reserving in memblock • Reserving adds the region to the region array in the “reserved” type • The function for adding an available region is memblock_add 28 static int __init_memblock memblock_reserve_region(phys_addr_t base, phys_addr_t size, int nid, unsigned long flags) { struct memblock_type *_rgn = &memblock.reserved; ... return memblock_add_region(_rgn, base, size, nid, flags); } int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size) { return memblock_reserve_region(base, size, MAX_NUMNODES, 0); }
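The two-array idea behind memblock (one array for memory that exists, one for memory that must not be handed out) can be sketched in user space as below. This is a toy model under stated assumptions, not the kernel implementation: every demo_* name is hypothetical, regions are only appended, and the extra bookkeeping done by the real memblock_add_region is left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 128  /* same count as INIT_MEMBLOCK_REGIONS on the slide */

struct demo_region { uint64_t base, size; };

struct demo_type {
    unsigned int cnt;
    struct demo_region regions[DEMO_MAX_REGIONS];
};

struct demo_memblock {
    struct demo_type memory;    /* memory that exists */
    struct demo_type reserved;  /* memory that must not be handed out */
};

/* Append one region to a type; fails when the static array is full. */
static int demo_add_region(struct demo_type *t, uint64_t base, uint64_t size)
{
    assert(size > 0);
    if (t->cnt == DEMO_MAX_REGIONS)
        return -1;
    t->regions[t->cnt].base = base;
    t->regions[t->cnt].size = size;
    t->cnt++;
    return 0;
}

static int demo_add(struct demo_memblock *mb, uint64_t base, uint64_t size)
{
    return demo_add_region(&mb->memory, base, size);
}

static int demo_reserve(struct demo_memblock *mb, uint64_t base, uint64_t size)
{
    return demo_add_region(&mb->reserved, base, size);
}

/* A range is free if one memory region contains it and no reserved region overlaps it. */
static int demo_is_free(const struct demo_memblock *mb, uint64_t base, uint64_t size)
{
    unsigned int i;
    int inside = 0;

    for (i = 0; i < mb->memory.cnt; i++) {
        const struct demo_region *r = &mb->memory.regions[i];
        if (base >= r->base && base + size <= r->base + r->size)
            inside = 1;
    }
    if (!inside)
        return 0;
    for (i = 0; i < mb->reserved.cnt; i++) {
        const struct demo_region *r = &mb->reserved.regions[i];
        if (base < r->base + r->size && r->base < base + size)
            return 0;
    }
    return 1;
}
```

Reserving the BIOS area then amounts to demo_reserve(&mb, lowmem, 0x100000 - lowmem) for whatever lowmem value was read, after which any range touching that area no longer counts as free.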
  • 29. When is the available memory added? • x86 • memblock_x86_fill • called by setup_arch (8/80) • ARM • arm_memblock_init • Also called by setup_arch (8/80) 29 void __init memblock_x86_fill(void) { ... memblock_allow_resize(); for (i = 0; i < e820.nr_map; i++) { ... memblock_add(ei->addr, ei->size); } memblock_trim_memory(PAGE_SIZE); ... } BTW, what’s this?
  • 30. Resizing, or reallocation. • Memblock uses slab for resizing if available • # of e820 entries may be more than 128 • However, slab is available at kmem_cache_init called by mm_init (25/80), so not at this time. • Memblock tries to allocate by itself by finding an area in memory && !reserved. 30 static int __init_memblock memblock_double_array(struct memblock_type *type, phys_addr_t new_area_start, phys_addr_t new_area_size) { … addr = memblock_find_in_range(new_area_start + new_area_size, memblock.current_limit, new_alloc_size, PAGE_SIZE);
  • 31. memblock: Debug options • “memblock=debug” 31 static int __init early_memblock(char *p) { if (p && strstr(p, "debug")) memblock_debug = 1; return 0; } early_param("memblock", early_memblock); static int __init_memblock memblock_reserve_region(...) { ... memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pFn", (unsigned long long)base, (unsigned long long)base + size - 1, flags, (void *)_RET_IP_);
  • 33. start_kernel • What’s the first initialization function called? 33 smp_setup_processor_id() ((at least 2.6.18) ~ 3.2) lockdep_init () (3.3 ~) commit 73839c5b2eacc15cb0aa79c69b285fc659fa8851 Author: Ming Lei <tom.leiming@gmail.com> Date: Thu Nov 17 13:34:31 2011 +0800 init/main.c: Execute lockdep_init() as early as possible This patch fixes a lockdep warning on ARM platforms: [ 0.000000] WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough? [ 0.000000] Call stack leading to lockdep invocation was: [ 0.000000] [<c00164bc>] save_stack_trace_tsk+0x0/0x90 [ 0.000000] [<ffffffff>] 0xffffffff The warning is caused by printk inside smp_setup_processor_id().
  • 34. init (1/80) : lockdep_init • Initializes lockdep (lock validator) • “Runtime locking correctness validator” • Detects • Lock inversion • Circular lock dependencies • When enabled, lockdep is called when any spinlock or mutex is acquired. • Thus, the initialization for lockdep must be first. • Initialization is simple (just initializing list_head’s of hashes) 34 void lockdep_init(void) {... for (i = 0; i < CLASSHASH_SIZE; i++) INIT_LIST_HEAD(classhash_table + i); for (i = 0; i < CHAINHASH_SIZE; i++) INIT_LIST_HEAD(chainhash_table + i); ...} kernel/locking/lockdep.c Config: CONFIG_LOCKDEP selected by PROVE_LOCKING or DEBUG_LOCK_ALLOC or LOCK_STAT
  • 35. init (2/80) : smp_setup_processor_id • Only effective in some architectures • ARM, s390, SPARC 35 u32 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = MPIDR_INVALID }; void __init smp_setup_processor_id(void) { int i; u32 mpidr = is_smp() ? read_cpuid_mpidr() & MPIDR_HWID_BITMASK : 0; u32 cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0); cpu_logical_map(0) = cpu; for (i = 1; i < nr_cpu_ids; ++i) cpu_logical_map(i) = i == cpu ? 0 : i; set_my_cpu_offset(0); pr_info("Booting Linux on physical CPU 0x%x\n", mpidr); } arch/arm/kernel/setup.c Hardware CPU (core) ID Exchange the logical ID for the boot CPU and the logical ID for the CPU 0. 12 0 3cpu_logical_map:
  • 36. init (3/80) : debug_objects_early_init • Initializes debugobjects • Lifetime debugging facility for objects • Seems to be used by timer, hrtimer, workqueue, per_cpu_counter and rcu • Again, this function initializes locks and listheads 36 Config: CONFIG_DEBUG_OBJECTS void __init debug_objects_early_init(void) { int i; for (i = 0; i < ODEBUG_HASH_SIZE; i++) raw_spin_lock_init(&obj_hash[i].lock); for (i = 0; i < ODEBUG_POOL_SIZE; i++) hlist_add_head(&obj_static_pool[i].node, &obj_pool); } lib/debugobjects.c
  • 37. init (4/80): boot_init_stack_canary • Setup the stackprotector • include/asm/stackprotector.h • Decide the canary value based on random value and TSC 37 static __always_inline void boot_init_stack_canary(void) { u64 canary; u64 tsc; #ifdef CONFIG_X86_64 BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40); #endif get_random_bytes(&canary, sizeof(canary)); tsc = __native_read_tsc(); canary += tsc + (tsc << 32UL); current->stack_canary = canary; #ifdef CONFIG_X86_64 this_cpu_write(irq_stack_union.stack_canary, canary); #else this_cpu_write(stack_canary.canary, canary); #endif }
  • 38. init (5/80): cgroup_init_early • Initializes cgroups • For subsystems that have early_init set, initialize the subsystem. • cpu, cpuacct, cpuset • The rest of subsystems are initialized in cgroup_init (71/80) • Initializes the structure, and the names for the subsystems 38
  • 39. init (6/80): boot_cpu_init • Initializes various cpumasks for the boot CPU • online : available to scheduler • active : available to migration • present : cpu is populated • possible : cpu is populatable • set_cpu_online adds the cpu to active • set_cpu_present does not add the cpu to possible 39 static void __init boot_cpu_init(void) { int cpu = smp_processor_id(); /* Mark the boot cpu "present", "online" etc for SMP and UP case */ set_cpu_online(cpu, true); set_cpu_active(cpu, true); set_cpu_present(cpu, true); set_cpu_possible(cpu, true); } init/main.c !HOTPLUG_CPU => same !HOTPLUG_CPU => same
  • 40. cpumask • A bit map 40 typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t; include/linux/cpumask.h #define DECLARE_BITMAP(name,bits) unsigned long name[BITS_TO_LONGS(bits)] include/linux/types.h #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) include/linux/bitops.h NR_CPU bits bits : array of long (4 / 8 bytes)
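The bitmap layout on the cpumask slide can be tried out in user space with the sketch below. It rebuilds the BITS_TO_LONGS rounding locally. DEMO_NR_CPUS and the demo_cpumask_* helpers are assumptions made for this example, and the set operation here is a plain non-atomic OR, unlike the locked instructions the kernel's set_bit uses.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Same rounding as the macros quoted on the slide, rebuilt for user space. */
#define BITS_PER_LONG     (CHAR_BIT * sizeof(long))
#define BITS_TO_LONGS(nr) (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)

#define DEMO_NR_CPUS 70  /* hypothetical CPU count; needs more than one long */

struct demo_cpumask {
    unsigned long bits[BITS_TO_LONGS(DEMO_NR_CPUS)];
};

/* Non-atomic set: select the word, then OR in the bit inside that word. */
static void demo_cpumask_set(struct demo_cpumask *m, unsigned int cpu)
{
    assert(cpu < DEMO_NR_CPUS);
    m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

/* Returns 1 if the bit for this CPU is set, 0 otherwise. */
static int demo_cpumask_test(const struct demo_cpumask *m, unsigned int cpu)
{
    assert(cpu < DEMO_NR_CPUS);
    return (int)((m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
}
```

Marking the boot CPU in a mask is then a single demo_cpumask_set(&mask, cpu) call, which is roughly what each of the set_cpu_* helpers on the previous slide boils down to for its own mask.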
  • 41. Set bit! (x86) • The register bitoffset operand for bts is • -2^31 ~ 2^31-1 or -2^63 ~ 2^63-1 41 #define IS_IMMEDIATE(nr) (__builtin_constant_p(nr)) ... static __always_inline void set_bit(long nr, volatile unsigned long *addr) { if (IS_IMMEDIATE(nr)) { asm volatile(LOCK_PREFIX "orb %1,%0" : CONST_MASK_ADDR(nr, addr) : "iq" ((u8)CONST_MASK(nr)) : "memory"); } else { asm volatile(LOCK_PREFIX "bts %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); } } arch/x86/include/asm/bitops.h
  • 42. Set bit! (ARM) 42 #if __LINUX_ARM_ARCH__ >= 6 .macro bitop, name, instr ENTRY( \name ) UNWIND( .fnstart) ands ip, r1, #3 strneb r1, [ip] @ assert word-aligned mov r2, #1 and r3, r0, #31 @ Get bit offset mov r0, r0, lsr #5 add r1, r1, r0, lsl #2 @ Get word offset ... mov r3, r2, lsl r3 1: ldrex r2, [r1] \instr r2, r2, r3 strex r0, r2, [r1] cmp r0, #0 bne 1b bx lr UNWIND( .fnend ) ENDPROC(\name ) .endm bitop _set_bit, orr
  • 43. smp_processor_id • Returns the core ID (in the kernel) • In ARM (and old days in x86) • Located in “current” • Located in the top of the current stack • In x86 • Located in the per-cpu area. 43 #define raw_smp_processor_id() (this_cpu_read(cpu_number)) arch/x86/include/asm/smp.h #define raw_smp_processor_id() (current_thread_info()->cpu) arch/arm/include/asm/smp.h static inline struct thread_info *current_thread_info(void) { register unsigned long sp asm ("sp"); return (struct thread_info *)(sp & ~(THREAD_SIZE - 1)); } arch/arm/include/asm/thread_info.h
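The ARM current_thread_info() shown above works because the kernel stack is aligned to its own size, so masking the stack pointer yields the base of the stack, where thread_info lives. A small sketch of just the masking step, assuming an 8 KB stack; DEMO_THREAD_SIZE and demo_stack_base are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_THREAD_SIZE 8192u  /* assumed stack size; must be a power of two */

/* Round any address inside a size-aligned stack down to the stack's base. */
static uintptr_t demo_stack_base(uintptr_t sp)
{
    /* The mask trick only works for power-of-two sizes. */
    assert((DEMO_THREAD_SIZE & (DEMO_THREAD_SIZE - 1)) == 0);
    return sp & ~((uintptr_t)DEMO_THREAD_SIZE - 1);
}
```

Every stack pointer value within the same 8 KB block maps to the same base, so the lookup needs no memory access beyond reading the structure itself.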
  • 44. Next • Topics and the rest of initialization • Setup parameters (early_param() etc.) • Initcalls • Multiprocessor support • Per-cpus • SMP boot (secondary boot) • SMP alternatives • And other alternatives • And Others? • Modules? 44