Trap Handling in Linux
focusing on system calls
Yongrae Jo
2017. 2. 16
2
CONTENTS
Background
Function Call Flow from start_kernel()
IDT initialization & Its Data
Structure(gate, idt_table, MSR)
syscall entry, fast vs slow path,
sys_call_table Initialization
system call procedure from user
application and glibc
3
Background
Interrupt
External Interrupt (Asynchronous Interrupt, Hardware Interrupt): IRQ
Internal Interrupt (Synchronous Interrupt, Software Interrupt): Trap, Exception, Fault, System Call
But in Linux, software interrupts are all regarded as traps
An interrupt is a signal from a device attached to a
computer or from a program within the computer
that requires the operating system to stop and
figure out what to do next (from whatis.techtarget.com/)
4
Execution Flow of Interrupt Service
Normal Execution
Interrupt Triggered, Non-Maskable Interrupt (NMI)
1. Save current State
2. Call Handler Routine
But it is masked
Execute Requested Handler Routine
1. Restore state
2. Return from ISR
Source : Image from http://studymake.tistory.com/341
5
(External)Interrupt Controller
Source : http://embien.com/blog/interrupt-handling-in-embedded-software/
6
CONTENTS
Background
Function Call Flow from start_kernel()
IDT initialization & Its Data
Structure(gate, idt_table, MSR)
syscall entry & sys_call_table
Initialization &
system call procedure from user
application and glibc
7
Function Call Flow from
start_kernel()
8
Function Call Flow from
start_kernel()
9
CONTENTS
Background
Function Call Flow from start_kernel()
IDT initialization & Its Data
Structure(gate, idt_table, MSR)
syscall entry, fast vs slow path,
sys_call_table Initialization
system call procedure from user
application and glibc
10
trap_init() from
/usr/src/linux-4.9.6/arch/x86/kernel/traps.c
11
trap_init()
/usr/src/linux-4.9.6/arch/x86/kernel/traps.c
12
List of interrupts from
/usr/src/linux-4.9.6/arch/x86/include/asm/traps.h
13
x86’s Interrupt Descriptor Table
Source : Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1
14
x86’s Interrupt Descriptor Table(cont’d)
Source : Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1
15
x86’s Interrupt Descriptor Table(cont’d)
Where are the other interrupts defined?
16
List of interrupts from
: /usr/src/linux-4.9.6/arch/x86/include/asm/irq_vectors.h
In this file, we can see the other interrupt vector names and their numbers
17
Let’s follow set_intr_gate()
function call flow
trap_init()
18
Let’s follow set_intr_gate()’s
function call flow
This function is the real deal!
trap_init()
19
What is a gate?
…
The architecture also defines a set of special descriptors
called gates (call gates, interrupt gates, trap gates, and
task gates). These provide protected gateways to system
procedures and handlers that may operate at a different
privilege level than application programs and most
procedures. For example, a CALL to a call gate can
provide access to a procedure in a code segment that is at
the same or a numerically lower privilege level (more
privileged) than the current code segment.
Source : Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1
20
_set_gate() from
: /usr/src/linux-4.9.6/arch/x86/include/asm/desc.h
We need to know the meaning of the following terms:
“gate_desc”, “type”, “dpl”, “ist”, “seg” and “idt_table”
21
gate_desc from
: /usr/src/linux-4.9.6/arch/x86/include/asm/desc_defs.h
Bit fields
gate_desc in 64 bits
22
gate_struct64 and its connection to
x86’s feature
Source : Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1
23
type field in gate_struct64
Source : Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1
24
write_idt_entry() from
: /usr/src/linux-4.9.6/arch/x86/include/asm/desc.h
25
Let’s see cpu_init()
26
cpu_init()
: /usr/src/linux-4.9.6/arch/x86/kernel/cpu/common.c
27
load_current_idt() function call flow
Inline assembly: the lidt (load IDT) instruction
28
GCC’s Inline Assembly for x86 in
Linux
Load IDT
nth parameter
Input operands: memory constraints
C expression: memory address
Source : https://www.ibm.com/developerworks/library/l-ia/
29
syscall_init() from
: /usr/src/linux-4.9.6/arch/x86/kernel/cpu/common.c
30
MSR Flags from
: /usr/src/linux-4.9.6/arch/x86/include/asm/msr-index.h
Where do these addresses come from?
31
What is an MSR?
A model-specific register (MSR) is any of various control
registers in the x86 instruction set used for debugging,
program execution tracing, computer performance
monitoring, and toggling certain CPU features. (Wikipedia)
Model
Source : Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C: System Programming Guide, Part 3
32
Some MSRs
Source : Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C: System Programming Guide, Part 3
MSR table columns: Register Address (Hex/Decimal) | Architectural MSR Name and bit fields | MSR/Bit Description
33
Register syscall entry to MSRs
: /usr/src/linux-4.9.6/arch/x86/kernel/cpu/common.c
segment
x86 -Assembly
procedure
34
CONTENTS
Background
Function Call Flow from start_kernel()
IDT initialization & Its Data
Structure(gate, idt_table, MSR)
syscall entry, fast vs slow path,
sys_call_table Initialization
system call procedure from user
application and glibc
35
syscall_init() has external
references
System.map is a symbol table that lists each symbol's memory address, type and name.
Here the type “t” or “T” means the symbol is in the code (text) section
Register the entry address in the MSR
36
More details on entry_SYSCALL_64
: /usr/src/linux-4.9.6/arch/x86/entry/entry_64.S
37
More details on entry_SYSCALL_64
: /usr/src/linux-4.9.6/arch/x86/entry/entry_64.S
38
A system call takes one of two paths
: fast vs slow (in entry_64.S)
39
A system call takes one of two paths
: fast vs slow (in entry_64.S)
Invoke the appropriate system call
40
Fast vs slow system call
A fast syscall is one that is known to be able to complete without
blocking or waiting. When the kernel encounters a fast syscall, it
knows it can execute the syscall immediately and keep the same
process scheduled (e.g. getuid(), getpid(), gettimeofday(), ...)
A slow syscall potentially requires waiting for another task to
complete, so the kernel must prepare to pause the calling process
and run another task.(e.g. sleep(), possibly read())
Source : http://unix.stackexchange.com/questions/14293/difference-between-slow-system-calls-and-fast-system-calls
Note: this blocking-based distinction is separate from the fast and slow paths in entry_64.S. There, the fast path calls the sys_call_table entry directly from assembly, while the slow path (taken, for example, when syscall tracing or auditing work is pending) goes through do_syscall_64() in C.
41
sys_call_table
: array where system calls are listed
sys_call_table is an array of function pointers of type
sys_call_ptr_t. Each entry points to a system call
function that takes 6 arguments and returns a long
value
42
Initializing sys_call_table
: /usr/src/linux-4.9.6/arch/x86/entry/syscall_64.c
Initialize every entry with a do-nothing function
Declaration of system call functions
Assign each system call function’s address to the
sys_call_table array, using nr as the index
43
Wait a second
44
Wait a second
45
Designated Initializers
Standard C90 requires the elements of an initializer to appear in a fixed order,
the same as the order of the elements in the array or structure being initialized.
In ISO C99 you can give the elements in any order, specifying the array
indices or structure field names they apply to, ...
To specify an array index, write ‘[index] =’ before the element value. For
example,
int a[6] = { [4] = 29, [2] = 15 };
is equivalent to
int a[6] = { 0, 0, 15, 0, 29, 0 };
To initialize a range of elements to the same value, write ‘[first ... last] =
value’. This is a GNU extension. For example,
int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2,
[100] = 3 };
Source : https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
46
C Preprocessor’s # and ## Operators
#define STRING(x) #x means “x” : stringify x into “x”
#define X(n) x##n means xn : concatenate x and n
So let me unroll the initialization code of the sys_call_table array
For example, index 0
→
[0] = __SYSCALL_64_QUAL_##qual(sys_read)
= __SYSCALL_64_QUAL_(sys_read) : (## is concatenation and qual is empty)
= sys_read
47
do_syscall_64() from
: /usr/src/linux-4.9.6/arch/x86/entry/common.c
48
do_syscall_64() from
: /usr/src/linux-4.9.6/arch/x86/entry/common.c
It invokes the system call with its arguments
548~0
These registers are set up in entry_64.S
49
do_syscall_64() uses the registers
saved at entry_SYSCALL_64
50
CONTENTS
Background
Function Call Flow from start_kernel()
IDT initialization & Its Data
Structure(gate, idt_table, MSR)
syscall entry, fast vs slow path,
sys_call_table Initialization
system call procedure from user
application and glibc
51
syscall from Linux Programmer’s
Manual
syscall() is a small library function that invokes the system
call whose assembly language interface has the specified
number with the specified arguments.
Architecture calling convention
Old!
New!
52
System call from user application
Compile to assembly : gcc -S sys_mult.c
syscall in glibc
Intel x86-64 Instruction
53
More on syscall instruction in x86
Intel x86-64 Instruction
SYSCALL invokes an OS system-call handler at privilege level 0. It does so by
loading RIP from the IA32_LSTAR MSR (after saving the address of the
instruction following SYSCALL into RCX). (The WRMSR instruction ensures that
the IA32_LSTAR MSR always contain a canonical address.)
syscall_init() from page 29
MSRs from page 32
Source : Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2: Instruction Set Reference
54
System Call Architecture with glibc
Source : https://ko.wikipedia.org/wiki/GNU_C_라이브러리
To understand the actual process of a system call from application level to kernel level, you have to know additional functions in glibc (https://www.gnu.org/s/libc/)
Many other functions ...
I’ll cover these later if possible
55
Q & A
