Kernel Module
Taku Shimosawa
Feb. 21, 2015 For the new Linux kernel book
Notes
• Linux kernel version: 3.19
• Quoted source code comes from kernel/module.c
unless otherwise noted.
Kernel Module
• A feature for dynamically adding/removing kernel
features while the kernel is running
• Benefits
• To update the kernel features while running
• To reduce memory consumption (and CPU overhead) by
loading only necessary kernel modules
• Avoiding the GPL (a module is not required to comply with the
GPL; e.g. proprietary drivers)
• Many kernel features can be compiled either statically linked
into the kernel or as independent modules
• File systems, device drivers, etc.
• “TRISTATE” in Kconfig (y, m, or n)
Where is the kernel module?
• Linux kernel modules are ELF binaries with an
extension “.ko”
• Many distributions locate the kernel modules
under /lib/modules
• e.g. /lib/modules/3.13.0-44-generic/kernel (Ubuntu
14.04)
• “depmod” scans the kernel modules located under the
directory and creates the module dependency map
(modules.dep)
• “modprobe” loads a kernel module together with the modules
it depends on by looking them up in the modules.dep file
• However, a module located anywhere can be loaded into
the kernel if its path is specified explicitly.
What is a “dependency”?
• A kernel module can export “symbols” that may be
used by another kernel module
• A symbol: a name for a location in memory; a global
variable or a function in C
• If a module (B) uses a symbol exported by another module
(A), then module B depends on module A
• Thus, module A must be loaded before module B is
loaded
• (There seems to be no way to load modules that have circular
dependencies (e.g. A depends on B; B also depends on A))
Kernel module A:
void f(void) {
}
EXPORT_SYMBOL(f);
Kernel module B (depends on A):
void g(void) {
	f();
}
Exported Symbols
• The symbols explicitly marked as “export” can be
accessed by other kernel modules
• The Linux kernel itself has “export”-ed symbols.
• Kernel modules are allowed to use only the exported symbols
in the kernel
• Not all the global functions are available for the modules!
• The symbols to be exported are declared with the
EXPORT_SYMBOL and EXPORT_SYMBOL_GPL macros.
• The latter makes the symbol available only for GPL modules.
struct task_struct *pid_task(struct pid *pid, enum pid_type type)
{ ... }
EXPORT_SYMBOL(pid_task);
...
struct task_struct *get_pid_task(struct pid *pid, enum pid_type type)
{ ... }
EXPORT_SYMBOL_GPL(get_pid_task);
(kernel/pid.c)
(By the way)
• What is the difference between the two functions?
struct task_struct *pid_task(struct pid *pid, enum pid_type type)
{
...
}
EXPORT_SYMBOL(pid_task);
struct task_struct *get_pid_task(struct pid *pid, enum pid_type type)
{
struct task_struct *result;
rcu_read_lock();
result = pid_task(pid, type);
if (result)
get_task_struct(result);
rcu_read_unlock();
return result;
}
EXPORT_SYMBOL_GPL(get_pid_task);
(kernel/pid.c)
Make a kernel module!
• Out-of-tree module
• The only necessary files are
• Makefile
• C source file(s)
• Example for Makefile
obj-m += hello.o
KERN_BUILD = /lib/modules/$(shell uname -r)/build

all:
	$(MAKE) -C $(KERN_BUILD) M=$(PWD) modules

clean:
	$(MAKE) -C $(KERN_BUILD) M=$(PWD) clean

cf. an in-tree Makefile entry:
obj-$(CONFIG_SHIMOS) += shimos.o
Inside the kernel module
• What sections are inside a kernel module?
$ readelf -a hello.ko
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
…
[ 2] .text PROGBITS 0000000000000000 00000064
0000000000000000 0000000000000000 AX 0 0 1
[ 3] .init.text PROGBITS 0000000000000000 00000064
0000000000000016 0000000000000000 AX 0 0 1
[ 4] .rela.init.text RELA 0000000000000000 000009c0
0000000000000030 0000000000000018 16 3 8
[ 5] .exit.text PROGBITS 0000000000000000 0000007a
0000000000000006 0000000000000000 AX 0 0 1
…
[ 7] .modinfo PROGBITS 0000000000000000 00000091
00000000000000c1 0000000000000000 A 0 0 1
[ 8] __versions PROGBITS 0000000000000000 00000160
0000000000000080 0000000000000000 A 0 0 32
…
[18] .gnu.linkonce.thi PROGBITS 0000000000000000 00000280
0000000000000260 0000000000000000 WA 0 0 32
Sections
Section name                  Description
.gnu.linkonce.this_module     Module structure
.modinfo                      String-style module information (licenses, etc.)
__versions                    Expected (compile-time) versions (CRCs) of the symbols that this module depends on
__ksymtab*                    Table of symbols that this module exports
__kcrctab*                    Table of versions of the symbols that this module exports
.init.text, .init.data, etc.  Sections used only during initialization (__init)
.text, .data, etc.            The code and data
* : (none), _gpl, _gpl_future, _unused, _unused_gpl
(license restriction / attribute of the symbols)
Module load and unload
• The simplest way : “insmod” and “rmmod” commands
• A more sophisticated way is “modprobe” and “modprobe -r”
• The former also tries to load the modules that the specified
module depends on
• The latter also tries to unload the modules that the specified
module depends on
# insmod (file name) [parameters…]
(e.g.) # insmod helloworld.ko msg=hoge
# rmmod (module name)
(e.g.) # rmmod helloworld
How does insmod call the kernel?
• Source: kmod-19
KMOD_EXPORT int kmod_module_insert_module(struct kmod_module *mod,
					  unsigned int flags,
					  const char *options)
{
	...
	if (kmod_file_get_direct(mod->file)) {
		unsigned int kernel_flags = 0;

		if (flags & KMOD_INSERT_FORCE_VERMAGIC)
			kernel_flags |= MODULE_INIT_IGNORE_VERMAGIC;
		if (flags & KMOD_INSERT_FORCE_MODVERSION)
			kernel_flags |= MODULE_INIT_IGNORE_MODVERSIONS;
		err = finit_module(kmod_file_get_fd(mod->file), args, kernel_flags);
		if (err == 0 || errno != ENOSYS)
			goto init_finished;
	}
	...
(libkmod/libkmod-module.c)
System calls
• 3 Module-related System Calls
• init_module
• finit_module
• To load a module
• delete_module
• To unload a module
int init_module(void *module_image, unsigned long len,
const char *param_values);
int finit_module(int fd, const char *param_values,
int flags);
int delete_module(const char *name, int flags);
(from man pages)
init_module / finit_module
• Load a kernel module
• How to specify the module?
• init_module : by user memory buffer that contains the
kernel module image
• finit_module : by file descriptor for the kernel module
file
• By using finit_module, some flags can be specified
flags
MODULE_INIT_IGNORE_MODVERSIONS Ignore symbol version hashes
MODULE_INIT_IGNORE_VERMAGIC Ignore kernel version magic
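A minimal user-space sketch of loading a module file with finit_module, in the spirit of what insmod does. The helper name load_module_file is hypothetical (it is not part of kmod), and the system call is invoked through syscall(2) here. Actually loading a module requires CAP_SYS_MODULE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: load the module file at `path` via finit_module.
 * `params` is the parameter string (e.g. "msg=hoge"), `flags` is 0 or a
 * combination of the MODULE_INIT_IGNORE_* flags.
 * Returns 0 on success, -1 on failure with errno set.
 */
int load_module_file(const char *path, const char *params, int flags)
{
	int fd, saved_errno;
	long ret;

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Pass the file descriptor, not a memory image, to the kernel. */
	ret = syscall(SYS_finit_module, fd, params, flags);

	/* Preserve the errno of the system call across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret == 0 ? 0 : -1;
}
```

Usage: load_module_file("helloworld.ko", "msg=hoge", 0) corresponds to the insmod example above.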
delete_module
• Unload a kernel module
• Specifies a module to be unloaded by its “name”
• Some flags can be specified
• Why different policy from finit_module…?
flags                     Description
O_NONBLOCK | O_TRUNC      Forcefully unload the module (even when the ref count is not zero; taints the kernel)
O_NONBLOCK                Returns immediately with an error (EWOULDBLOCK)
O_NONBLOCK not set        Stops the module, and waits until the ref count reaches zero (UNINTERRUPTIBLE)
Data structures for modules
• struct load_info
• Used while initializing a module
• Most members are ELF-related.
struct load_info {
Elf_Ehdr *hdr;
unsigned long len;
Elf_Shdr *sechdrs;
char *secstrings, *strtab;
unsigned long symoffs, stroffs;
struct _ddebug *debug;
unsigned int num_debug;
bool sig_ok;
struct {
unsigned int sym, str, mod, vers, info, pcpu;
} index;
};
(include/linux/module.h)
Data structures for modules
• struct module (too large..)
struct module {
enum module_state state;
/* Member of list of modules */
struct list_head list;
/* Unique handle for this module */
char name[MODULE_NAME_LEN];
/* Sysfs stuff. */
struct module_kobject mkobj;
...
/* Exported symbols */
const struct kernel_symbol *syms;
const unsigned long *crcs;
unsigned int num_syms;
/* Kernel parameters. */
struct kernel_param *kp;
unsigned int num_kp;
(Annotations: list links the module into the “modules” list; syms and num_syms are the exported symbols; crcs holds the symbol CRCs)
/* GPL-only exported symbols. */
unsigned int num_gpl_syms;
const struct kernel_symbol *gpl_syms;
const unsigned long *gpl_crcs;
...
#ifdef CONFIG_MODULE_SIG
/* Signature was verified. */
bool sig_ok;
#endif
...
/* Exception table */
unsigned int num_exentries;
struct exception_table_entry *extable;
/* Startup function. */
int (*init)(void);
/* If this is non-NULL, vfree after init() returns */
void *module_init;
...
/* Here is the actual code + data, vfree'd on unload. */
void *module_core;
(Annotations: gpl_syms and gpl_crcs are the GPL symbols; init is the “init” function; module_init holds the “init” sections; module_core holds the other (core) sections)
/* Here are the sizes of the init and core sections */
unsigned int init_size, core_size;
/* The size of the executable code in each section. */
unsigned int init_text_size, core_text_size;
/* Size of RO sections of the module (text+rodata) */
unsigned int init_ro_size, core_ro_size;
/* Arch-specific module values */
struct mod_arch_specific arch;
...
/* The command line arguments (may be mangled). People like
keeping pointers to this stuff */
char *args;
...
#ifdef CONFIG_SMP
/* Per-cpu data. */
void __percpu *percpu;
unsigned int percpu_size;
#endif
(Annotations: init_size, core_size, etc. are the sizes of sections; args holds the command-line parameters; percpu and percpu_size describe the per-CPU data)
...
#ifdef CONFIG_MODULE_UNLOAD
/* What modules depend on me? */
struct list_head source_list;
/* What modules do I depend on? */
struct list_head target_list;
/* Destruction function. */
void (*exit)(void);
struct module_ref __percpu *refptr;
#endif
#ifdef CONFIG_CONSTRUCTORS
/* Constructor functions. */
ctor_fn_t *ctors;
unsigned int num_ctors;
#endif
};
(include/linux/module.h)
(Annotation: source_list and target_list manage dependencies; they exist only when module unloading is enabled)
Module state
• state in struct module
• During its load, state becomes
(created) -> UNFORMED -> COMING -> LIVE.
• During its unload, state becomes
LIVE -> GOING -> (removed)
State                     Description
MODULE_STATE_UNFORMED     Added to the modules list, but still being set up
MODULE_STATE_COMING       Fully formed. Running module_init.
MODULE_STATE_LIVE         Normal state.
MODULE_STATE_GOING        Being unloaded.
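The state transitions above can be sketched as a small checker. The enum mirrors the four states; the function module_state_transition_ok is purely illustrative and does not exist in the kernel. The COMING -> GOING edge (a failing init function) is an addition based on my reading of do_init_module, not something stated on the slide.

```c
#include <stdbool.h>

/* The four module states named above. */
enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

/* Illustrative only: is `from` -> `to` one of the expected transitions? */
bool module_state_transition_ok(enum module_state from, enum module_state to)
{
	switch (from) {
	case MODULE_STATE_UNFORMED:
		return to == MODULE_STATE_COMING;
	case MODULE_STATE_COMING:
		/* init succeeded -> LIVE; init failed -> GOING (assumption) */
		return to == MODULE_STATE_LIVE || to == MODULE_STATE_GOING;
	case MODULE_STATE_LIVE:
		return to == MODULE_STATE_GOING;
	case MODULE_STATE_GOING:
		return false;	/* the module is removed afterwards */
	}
	return false;
}
```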
Global module information
Variable                      Description
LIST_HEAD(modules)            List of modules that are in the kernel.
DEFINE_MUTEX(module_mutex)    Protects the “modules” list, etc.
• Add: RCU list operations
• Remove: stop_machine (up to 3.18)
/*
* Mutex protects:
* 1) List of modules (also safely readable with preempt_disable),
* 2) module_use links,
* 3) module_addr_min/module_addr_max.
* (delete uses stop_machine/add uses RCU list operations). */
DEFINE_MUTEX(module_mutex);
EXPORT_SYMBOL_GPL(module_mutex);
Loading a Module
• Load the whole module file into memory
• Parse the ELF and module information
• Check the module information to
determine whether the module is
loadable or not
• Lay out the sections and copy them to the final
location
• Add the module to the kernel
• Resolve the symbols and apply
relocations
• Copy module parameters
• Call the init function
Call flow (module state in brackets):
System calls (init_module / finit_module)
	load_module
		layout_and_allocate
			setup_load_info
			check_modinfo
			layout_sections
			layout_symtab
			move_module
		add_unformed_module [UNFORMED]
		simplify_symbols
		apply_relocations
		do_init_module [COMING -> LIVE]
Unloading a Module
• Check if the reference count of the
module is zero
• If it is zero, or if the unload is forced, then set
the state to GOING
• If it is not zero (and the unload is not forced), the unload fails
• Call the “exit” function
• Free and clean up everything
Call flow:
sys_delete_module
	try_stop_module
		__try_stop_module
	free_module
stop_machine (up to 3.18)
• Until Linux 3.18, the reference count check and the
removal of the module from the list during unloading were
implemented with stop_machine.
static int try_stop_module(struct module *mod, int flags, int *forced)
{
struct stopref sref = { mod, flags, forced };
return stop_machine(__try_stop_module, &sref, NULL);
}
static void free_module(struct module *mod)
{
...
mutex_lock(&module_mutex);
stop_machine(__unlink_module, mod, NULL);
mutex_unlock(&module_mutex);
...
}
Now (3.19)
• Reference count is now atomic_t (was per-cpu int
before) and checked without stop_machine
• (thanks to a mysterious guy)
static int try_stop_module(struct module *mod, int flags, int *forced)
{
/* If it's not unused, quit unless we're forcing. */
if (try_release_module_ref(mod) != 0) {
*forced = try_force_unload(flags);
if (!(*forced))
return -EWOULDBLOCK;
}
/* Mark it as dying. */
mod->state = MODULE_STATE_GOING;
return 0;
}
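A user-space sketch of the idea behind try_release_module_ref, using C11 atomics instead of the kernel's atomic_t. The struct mod_ref type and the function name try_release_ref are hypothetical stand-ins; MODULE_REF_BASE is assumed to be 1. This illustrates the technique and is not the kernel's code.

```c
#include <stdatomic.h>

#define MODULE_REF_BASE 1	/* assumed value of the base reference */

/* Simplified stand-in for the reference count in struct module. */
struct mod_ref {
	atomic_int refcnt;
};

/*
 * Try to drop the base reference taken at load time.
 * Returns 0 if the module is now unused, nonzero if it is still in use
 * (in which case the base reference has been restored).
 */
int try_release_ref(struct mod_ref *m)
{
	int cur;

	/* atomic_fetch_sub returns the value before the subtraction. */
	if (atomic_fetch_sub(&m->refcnt, MODULE_REF_BASE) == MODULE_REF_BASE)
		return 0;	/* the count reached zero */

	/*
	 * Still in use: put the base reference back, unless the last user
	 * dropped its reference in the meantime and the count is now zero.
	 */
	cur = atomic_load(&m->refcnt);
	while (cur != 0) {
		if (atomic_compare_exchange_weak(&m->refcnt, &cur,
						 cur + MODULE_REF_BASE))
			return 1;
	}
	return 0;
}
```

No other CPU has to be stopped: a single atomic read-modify-write decides whether the module is unused.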
Now (3.19)
• stop_machine is also gone from the module removal path
static void free_module(struct module *mod)
{
...
/* Now we can delete it from the lists */
mutex_lock(&module_mutex);
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
/* Remove this module from bug list, this uses list_del_rcu */
module_bug_cleanup(mod);
/* Wait for RCU synchronizing before releasing mod->list and
buglist. */
synchronize_rcu();
mutex_unlock(&module_mutex);
...
}
Details (1)
Loading
sys_init_module/sys_finit_module
• Initialize a load_info structure
• Check whether module load is permitted or not.
(may_init_module function)
• [finit only] Flags check
• [init only] Copy module data in user memory to
kernel memory (copy_module_from_user function)
• [finit only] Read from the fd into kernel memory
(copy_module_from_fd function)
• Call the load_module function
may_init_module
• Capability: CAP_SYS_MODULE
• “modules_disabled” parameter
• Blocks loading and unloading of modules
/* Block module loading/unloading? */
int modules_disabled = 0;
core_param(nomodule, modules_disabled, bint, 0);
...
static int may_init_module(void)
{
if (!capable(CAP_SYS_MODULE) || modules_disabled)
return -EPERM;
return 0;
}
(kernel/module.c)
# sysctl kernel.modules_disabled
kernel.modules_disabled = 0
copy_module_from_fd
• Pass the file struct to the security module
• vmalloc an area for the module data
• Load the whole module file into the area
• Set the pointer to info->hdr
static int copy_module_from_fd(int fd, struct load_info *info)
{
...
err = security_kernel_module_from_file(f.file);
if (err)
goto out;
...
info->hdr = vmalloc(stat.size);
if (!info->hdr) {
err = -ENOMEM;
goto out;
}
...
while (pos < stat.size) {
bytes = kernel_read(f.file, pos, (char *)(info->hdr) + pos,
stat.size - pos);
...
}
info->len = pos;
copy_module_from_user
• Differences:
• Pass “NULL” pointer to the security module
• Just copy_from_user instead of kernel_read
static int copy_module_from_user(const void __user *umod, unsigned long len,
struct load_info *info)
{...
info->len = len;
...
err = security_kernel_module_from_file(NULL);
if (err)
return err;
...
/* Suck in entire file: we'll want most of it. */
info->hdr = vmalloc(info->len);
if (!info->hdr)
return -ENOMEM;
...
if (copy_from_user(info->hdr, umod, info->len) != 0) {
vfree(info->hdr);
return -EFAULT;
}
return 0;
load_module function (1)
• Signature check (module_sig_check)
• ELF header check (elf_header_check)
• Layout and allocate the final location for the module
(layout_and_allocate)
• Add the module to the “modules” list
(add_unformed_module)
• Allocate per-cpu areas used in the module
(percpu_modalloc)
• Initialize link lists used for dependency management and
unloading features (module_unload_init)
• Find optional sections (find_module_sections)
• License and version dirty hack
(check_module_license_and_versions)
• Setup MODINFO_ATTR fields (setup_modinfo)
load_module function (2)
• Resolve the symbols (simplify_symbols)
• Fix up the addresses in the module (apply_relocations)
• Extable and per-cpu initialization (post_relocation)
• Flush I-cache for the module area
(flush_module_icache)
• Copy the module parameters to mod->args.
• Check duplication of symbols, and setup NX attributes.
(complete_formation)
• Parse the module parameters (parse_args)
• sysfs setup (mod_sysfs_setup)
• Free the copy in the load_info structure (free_copy)
• Call the init function of the module (do_init_module)
module_sig_check
• Check the signature in the module (if
CONFIG_MODULE_SIG=y)
• If a module is signed, the “signature” and the “marker” reside at the
tail of the module file.
• If the signature is OK, module->sig_ok is set to true.
• If no signature is found (-ENOKEY) and signature is not
enforced, it returns success(0).
• Signature is enforced either
• When CONFIG_MODULE_SIG_FORCE is Y
• When “sig_enforce” parameter is set
File layout: Module (ELF) | Signature | Marker
Marker: “~Module signature appended~\n”
$ hd /lib/modules/3.13.0-45-generic/kernel/fs/btrfs/btrfs.ko
0014b470 f8 a6 b7 74 01 06 01 1e 14 00 00 00 00 00 02 02 |...t............|
0014b480 7e 4d 6f 64 75 6c 65 20 73 69 67 6e 61 74 75 72 |~Module signatur|
0014b490 65 20 61 70 70 65 6e 64 65 64 7e 0a |e appended~.|
0014b49c
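A small sketch of detecting the marker at the tail of a module image, as seen in the hex dump above. The function name has_signature_marker is hypothetical; it only checks for the marker string and does not verify the signature itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The 28-byte marker that ends a signed module file. */
#define MODULE_SIG_STRING "~Module signature appended~\n"

/* Does the image of `len` bytes end with the signature marker? */
bool has_signature_marker(const void *image, size_t len)
{
	const size_t markerlen = sizeof(MODULE_SIG_STRING) - 1;

	if (len < markerlen)
		return false;
	return memcmp((const char *)image + len - markerlen,
		      MODULE_SIG_STRING, markerlen) == 0;
}
```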
elf_header_check
• Sanity check for the ELF header
• The magic number is correct
• The architecture is correct
• The length is large enough to contain all the section headers,
etc.
static int elf_header_check(struct load_info *info)
{
if (info->len < sizeof(*(info->hdr)))
return -ENOEXEC;
if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0
|| info->hdr->e_type != ET_REL
|| !elf_check_arch(info->hdr)
|| info->hdr->e_shentsize != sizeof(Elf_Shdr))
return -ENOEXEC;
if (info->hdr->e_shoff >= info->len
|| (info->hdr->e_shnum * sizeof(Elf_Shdr) >
info->len - info->hdr->e_shoff))
return -ENOEXEC;
return 0;
}
ELF (.ko)
ELF header (Elf_Ehdr)
• load_info.hdr is a pointer to the Elf_Ehdr, i.e. to the head of the
kernel module file (= the head of the ELF)
• e_ident: magic ('\x7fELF'), 32/64-bit,
etc. (16 bytes in total incl. padding)
• e_type: ET_REL / ET_EXEC / ET_DYN
• e_shoff: file offset of the section header table (the array of Elf_Shdr)
• e_shentsize: size of one Elf_Shdr entry
• e_shnum: number of Elf_Shdr entries
• e_shstrndx
layout_and_allocate
• Fill the section information of the load_info, and
create a module structure pointing to the
temporary location (setup_load_info)
• Check the module information and report if the
module taints the kernel (check_modinfo)
• Calculate the size required for the final location of
the module (layout_sections / layout_symtab)
• Allocate the memory of the calculated size, and
copy the contents of the module, and move the
pointer of the module structure there
(move_module).
setup_load_info
• Set the following members according to the ELF header
and section headers.
• sechdrs (pointer to the section headers)
• secstrings (pointer to the string section that contains section
names)
• index.info, index.vers (section indices of modinfo, versions)
• index.sym, index.str (section indices of symbols, strings)
• strtab (pointer to the string section)
• index.mod (section index of the module section)
• “.gnu.linkonce.this_module” section
• Set the module pointer to this section (temporarily)
• index.pcpu (section index of the per-cpu section)
• “.data..percpu” section (if it exists)
• Return a pointer to a (temporary) module structure
setup_load_info
• info->sechdrs
• info->secstrings
• info->strtab
• Each section’s offset is
stored in Elf_Shdr.sh_offset
• info->index.info = 12
• info->index.vers = 16
• info->index.sym = 24
• info->index.str = 25
• info->index.mod = 18
• struct module *mod
• info->index.pcpu = 0
• No per-cpu data in this example.
Example layout (headers, then section contents):
Headers:
Elf_Ehdr
Elf_Shdr (0)
…
Elf_Shdr (12) : .modinfo
Elf_Shdr (16) : __versions
Elf_Shdr (18) : .gnu.linkonce.this_module
Elf_Shdr (23) : .shstrtab
Elf_Shdr (24) : .symtab
Elf_Shdr (25) : .strtab
Section contents:
.shstrtab section
.strtab section
.gnu.linkonce.this_module section
check_modinfo (1)
• Check “modinfo” in the module, and check if the
version magic is identical to the current kernel, and
mark “tainted” if it taints the kernel.
• “Modinfo” resides in the “.modinfo” section, and is
composed of zero-terminated strings of key-value
pairs connected by “=”.
description=Hello world kernel module\0
author=Taku Shimosawa <shimos@shimos.net>\0
license=GPL v2\0
srcversion=8D5BACDC1EA9421ABFF79DD\0
depends=\0
vermagic=3.13.0-44-generic SMP mod_unload modversions
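A user-space sketch of looking up a key in such a section, i.e. in a buffer of zero-terminated “key=value” strings. The function name modinfo_lookup is hypothetical; it illustrates the idea behind get_modinfo and is not the kernel's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Search `size` bytes at `sec` (a sequence of NUL-terminated "key=value"
 * strings) for `key`. Returns a pointer to the value, or NULL if the key
 * is absent or the data is not properly terminated.
 */
const char *modinfo_lookup(const char *sec, size_t size, const char *key)
{
	size_t keylen = strlen(key);
	size_t pos = 0;

	while (pos < size) {
		const char *entry = sec + pos;
		const char *end = memchr(entry, '\0', size - pos);
		size_t entlen;

		if (!end)
			return NULL;	/* unterminated trailing string */
		entlen = (size_t)(end - entry);

		/* Match "key=" exactly, so "license" does not match "licenses". */
		if (entlen > keylen && entry[keylen] == '=' &&
		    strncmp(entry, key, keylen) == 0)
			return entry + keylen + 1;

		pos += entlen + 1;	/* skip the string and its terminator */
	}
	return NULL;
}
```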
check_modinfo (2)
• First, check the version magic in the module
static int check_modinfo(struct module *mod, struct load_info *info,
			 int flags)
{
	const char *modmagic = get_modinfo(info, "vermagic");
	...
	if (flags & MODULE_INIT_IGNORE_VERMAGIC)
		modmagic = NULL;
	...
	if (!modmagic) {
		err = try_to_force_load(mod, "bad vermagic");
		if (err)
			return err;
	} else if (!same_magic(modmagic, vermagic, info->index.vers)) {
		pr_err("%s: version magic '%s' should be '%s'\n",
		       mod->name, modmagic, vermagic);
		return -ENOEXEC;
	}
check_modinfo (3)
• Version magic
• Example:
• same_magic function
• Compares the vermagic strings; if the module has symbol CRCs
(versions), the leading kernel release part is skipped.
#define VERMAGIC_STRING \
	UTS_RELEASE " " \
	MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \
	MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS \
	MODULE_ARCH_VERMAGIC
(include/linux/vermagic.h)
3.13.0-44-generic SMP mod_unload modversions
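A user-space sketch of the comparison, following my reading of same_magic when CONFIG_MODVERSIONS is enabled: if the module carries symbol CRCs, everything up to the first space (the kernel release) is skipped on both sides, and only the remaining flags are compared. The name same_magic_sketch is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Compare the module's vermagic with the kernel's.
 * With has_crcs set, the release string is ignored, because the
 * per-symbol CRCs are checked instead.
 */
bool same_magic_sketch(const char *modmagic, const char *kernmagic,
		       bool has_crcs)
{
	if (has_crcs) {
		/* Skip up to (not including) the first space. */
		modmagic += strcspn(modmagic, " ");
		kernmagic += strcspn(kernmagic, " ");
	}
	return strcmp(modmagic, kernmagic) == 0;
}
```

For example, a module built for 3.13.0-44-generic would pass this check on 3.13.0-45-generic if it has CRCs and the remaining flags (SMP, mod_unload, modversions) are identical.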
check_modinfo (4)
• …And mark the kernel tainted if necessary
	if (!get_modinfo(info, "intree"))
		add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);
	if (get_modinfo(info, "staging")) {
		add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
		pr_warn("%s: module is from the staging directory, the quality "
			"is unknown, you have been warned.\n", mod->name);
	}
	/* Set up license info based on the info section */
	set_license(mod, get_modinfo(info, "license"));
check_modinfo (5)
• License information is also important
static void set_license(struct module *mod, const char *license)
{
	if (!license)
		license = "unspecified";
	if (!license_is_gpl_compatible(license)) {
		if (!test_taint(TAINT_PROPRIETARY_MODULE))
			pr_warn("%s: module license '%s' taints kernel.\n",
				mod->name, license);
		add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
				 LOCKDEP_NOW_UNRELIABLE);
	}
}
check_modinfo (6)
• GPL compatible?
• See the “GPL\0…” case
static inline int license_is_gpl_compatible(const char *license)
{
return (strcmp(license, "GPL") == 0
|| strcmp(license, "GPL v2") == 0
|| strcmp(license, "GPL and additional rights") == 0
|| strcmp(license, "Dual BSD/GPL") == 0
|| strcmp(license, "Dual MIT/GPL") == 0
|| strcmp(license, "Dual MPL/GPL") == 0);
}
(include/linux/license.h)
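A minimal sketch of why an embedded NUL byte matters for this check, assuming the note above refers to a license string of the form "GPL\0…". strcmp stops at the first NUL, so any text after it is never compared. The function name and the trailing text in the usage example are hypothetical.

```c
#include <string.h>

/*
 * Only the first NUL-terminated string is compared; bytes that follow
 * an embedded NUL are invisible to strcmp.
 */
int looks_like_plain_gpl(const char *license)
{
	return strcmp(license, "GPL") == 0;
}
```

With a buffer such as "GPL\0 (hypothetical trailing text)", the function returns nonzero even though the buffer holds more than "GPL". This is presumably why the kernel carries name-based special cases such as the one for driverloader.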
check_modinfo (7)
• Also, the kernel is marked tainted when the module
is loaded forcefully
static int try_to_force_load(struct module *mod, const char *reason)
{
#ifdef CONFIG_MODULE_FORCE_LOAD
	if (!test_taint(TAINT_FORCED_MODULE))
		pr_warn("%s: %s: kernel tainted.\n", mod->name, reason);
	add_taint_module(mod, TAINT_FORCED_MODULE, LOCKDEP_NOW_UNRELIABLE);
	return 0;
#else
	return -ENOEXEC;
#endif
}
Taints!
• The tainted mask is composed of several flags that
identify the reason for tainting
• Lockdep is disabled if it can no longer work reliably
• When ignoring the version magic, loading proprietary drivers, or forcing an unload
void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
{
	if (lockdep_ok == LOCKDEP_NOW_UNRELIABLE && __debug_locks_off())
		pr_warn("Disabling lock debugging due to kernel taint\n");
	set_bit(flag, &tainted_mask);
}
(kernel/panic.c)
static inline void add_taint_module(struct module *mod, unsigned flag,
				    enum lockdep_ok lockdep_ok)
{
	add_taint(flag, lockdep_ok);
	mod->taints |= (1U << flag);
}
(kernel/module.c)
(Annotations: tainted_mask holds the kernel global flags; mod->taints holds the per-module flags)
Taints!
• 15 reasons are defined
#define TAINT_PROPRIETARY_MODULE 0
#define TAINT_FORCED_MODULE 1
#define TAINT_CPU_OUT_OF_SPEC 2
#define TAINT_FORCED_RMMOD 3
#define TAINT_MACHINE_CHECK 4
#define TAINT_BAD_PAGE 5
#define TAINT_USER 6
#define TAINT_DIE 7
#define TAINT_OVERRIDDEN_ACPI_TABLE 8
#define TAINT_WARN 9
#define TAINT_CRAP 10
#define TAINT_FIRMWARE_WORKAROUND 11
#define TAINT_OOT_MODULE 12
#define TAINT_UNSIGNED_MODULE 13
#define TAINT_SOFTLOCKUP 14
(include/linux/kernel.h)
$ sysctl kernel.tainted
kernel.tainted = 12288
12288 = 0x3000
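A small sketch that decodes a kernel.tainted value into the flag numbers listed above. The table of names is copied from the defines on this slide (with the TAINT_ prefix dropped); the function names decode_taint and taint_name are hypothetical.

```c
#include <stddef.h>

#define NUM_TAINT_FLAGS 15

/* Flag names indexed by bit number, as defined in 3.19. */
static const char *const taint_names[NUM_TAINT_FLAGS] = {
	"PROPRIETARY_MODULE", "FORCED_MODULE", "CPU_OUT_OF_SPEC",
	"FORCED_RMMOD", "MACHINE_CHECK", "BAD_PAGE", "USER", "DIE",
	"OVERRIDDEN_ACPI_TABLE", "WARN", "CRAP", "FIRMWARE_WORKAROUND",
	"OOT_MODULE", "UNSIGNED_MODULE", "SOFTLOCKUP",
};

/* Name of a taint bit, or NULL if the bit is not in the table. */
const char *taint_name(unsigned int bit)
{
	return bit < NUM_TAINT_FLAGS ? taint_names[bit] : NULL;
}

/* Store the set bit numbers of `mask` in `bits`; return how many. */
size_t decode_taint(unsigned long mask, unsigned int *bits, size_t max)
{
	size_t n = 0;

	for (unsigned int bit = 0; bit < NUM_TAINT_FLAGS && n < max; bit++)
		if (mask & (1UL << bit))
			bits[n++] = bit;
	return n;
}
```

For the value above, 12288 = 0x3000 has bits 12 and 13 set, i.e. TAINT_OOT_MODULE and TAINT_UNSIGNED_MODULE.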
layout_sections
• Calculate the size of final memory to load the module
• Load only sections with “SHF_ALLOC” flags set
• Calculate sizes for “core” and “init”
• “init” sections are determined when the section name starts with
“.init”
• Sets the following member of module
• core_size : sum of the sizes of the “core” sections to be
loaded
• core_text_size, core_ro_size : sum of the sizes of the text and
R/O “core” sections
• init_size : sum of the sizes of the “init” sections to be loaded
• init_text_size, init_ro_size : … of “init” sections
• sh_entsize in Elf_Shdr is used to hold the offset in the
allocated memory where the section will be loaded.
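A simplified sketch of the size calculation: each section is placed at the current total size rounded up to its alignment, and the total then grows by the section size. The struct sec_layout type and the function name are hypothetical; here the offset goes into a separate field, whereas the kernel reuses sh_entsize for it. Alignments are assumed to be powers of two.

```c
#include <stddef.h>

/* Hypothetical, reduced view of a section header. */
struct sec_layout {
	size_t size;	/* like sh_size */
	size_t align;	/* like sh_addralign (0 is treated as 1) */
	size_t offset;	/* result: offset inside the allocated area */
};

/* Round v up to a multiple of a, where a is a power of two. */
static size_t align_up(size_t v, size_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Assign an offset to each section in order; return the total size. */
size_t layout_sections_sketch(struct sec_layout *secs, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		size_t a = secs[i].align ? secs[i].align : 1;

		secs[i].offset = align_up(total, a);
		total = secs[i].offset + secs[i].size;
	}
	return total;
}
```

In the real function this pass runs separately for the “core” and “init” parts and per section type (text, R/O, R/W), which yields core_size, core_text_size, and so on.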
layout_sections
• The sections in the example “hello.ko” are
categorized as follows:
Part    Type    Sections
Core    Text    .text, .exit.text
Core    R/O     __ksymtab, __kcrctab, .rodata.str1.1, __ksymtab_strings, __mcount_loc
Core    R/W     .data, .gnu.linkonce.this_module, .bss
Init    Text    .init.text
Init    R/O     (none)
Init    R/W     (none)
(Others) Not loaded: .rela.text, .rela.init.text, .rela__ksymtab, .rela__kcrctab,
.rela__mcount_loc, .rela.gnu.linkonce.this_module,
.comment, .note.GNU-stack, .shstrtab, .symtab, .strtab,
.modinfo, __versions (*)
(*) These two sections originally have SHF_ALLOC, but the flag is
dropped by rewrite_section_headers
layout_symtab
• Put the symtab and strtab at the end of the init part
• (Actually, this function does not copy anything; it only increases
init_size by the size of the symtab)
• Put the symtab and strtab for the core symbols at
the end of the core part.
move_module
• Allocate the final memory of the module, and
update the boundary addresses for the modules
(module_alloc_update_bounds)
• Copy the section contents and update sh_addr’s
static void *module_alloc_update_bounds(unsigned long size)
{
void *ret = module_alloc(size);
if (ret) {
mutex_lock(&module_mutex);
if ((unsigned long)ret < module_addr_min)
module_addr_min = (unsigned long)ret;
if ((unsigned long)ret + size > module_addr_max)
module_addr_max = (unsigned long)ret + size;
mutex_unlock(&module_mutex);
}
return ret;
}
module_alloc : x86
• x86
• get_module_load_offset() chooses the load offset as
a random value on its first call if KASLR is enabled
#define MODULES_VADDR VMALLOC_START
#define MODULES_END VMALLOC_END
(arch/x86/include/asm/pgtable_32_types.h)
#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
#define MODULES_END _AC(0xffffffffff000000, UL)
(arch/x86/include/asm/pgtable_64_types.h)
void *module_alloc(unsigned long size)
{
if (PAGE_ALIGN(size) > MODULES_LEN)
return NULL;
return __vmalloc_node_range(size, 1,
MODULES_VADDR + get_module_load_offset(),
MODULES_END, GFP_KERNEL | __GFP_HIGHMEM,
PAGE_KERNEL_EXEC, NUMA_NO_NODE,
__builtin_return_address(0));
}
(arch/x86/kernel/module.c)
module_alloc : ARM
• ARM
#ifndef CONFIG_THUMB2_KERNEL
#define MODULES_VADDR (PAGE_OFFSET - SZ_16M)
#else
/* smaller range for Thumb-2 symbols relocation (2^24)*/
#define MODULES_VADDR (PAGE_OFFSET - SZ_8M)
#endif
(arch/arm/include/asm/memory.h)
#define MODULES_END (PAGE_OFFSET)
#define MODULES_VADDR (MODULES_END - SZ_64M)
(arch/arm64/include/asm/memory.h)
#ifdef CONFIG_MMU
void *module_alloc(unsigned long size)
{
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
__builtin_return_address(0));
}
#endif
(arch/arm/kernel/module.c)
Moving the module to its final place
• The struct module of the module being loaded pointed
into the temporary module image memory
• Now that the image has been copied to its final location,
the pointer is also changed to point to the final location
/* Module has been copied to its final place now: return it. */
mod = (void *)info->sechdrs[info->index.mod].sh_addr;
load_module function (1) [RE]
• Signature check (module_sig_check)
• ELF header check (elf_header_check)
• Layout and allocate the final location for the module
(layout_and_allocate)
• Add the module to the “modules” list
(add_unformed_module)
• Allocate per-cpu areas used in the module
(percpu_modalloc)
• Initialize link lists used for dependency management and
unloading features (module_unload_init)
• Find optional sections (find_module_sections)
• License and version dirty hack
(check_module_license_and_versions)
• Setup MODINFO_ATTR fields (setup_modinfo)
add_unformed_module
• Add the module to the “modules” list
• Checking the duplicated loading of the same module
• If the same module is still being loaded, this waits for
that load to complete, and then tries again
• This covers the case where the earlier load fails
static int add_unformed_module(struct module *mod)
{
mod->state = MODULE_STATE_UNFORMED;
...
again:
mutex_lock(&module_mutex);
old = find_module_all(mod->name, strlen(mod->name), true);
if (old != NULL) {
if (old->state == MODULE_STATE_COMING
|| old->state == MODULE_STATE_UNFORMED) {
mutex_unlock(&module_mutex);
err = wait_finished_loading(mod);
if (err)
goto out_unlocked;
goto again;
}
err = -EEXIST;
goto out;
}
list_add_rcu(&mod->list, &modules);
err = 0;
...
When loading occurs concurrently
[Figure: timeline of concurrent loads. Module A: UNFORMED -> COMING -> LIVE. A second load of module A: UNFORMED -> (fail). Module B (depends on A): UNFORMED -> Resolve -> Resolve -> LIVE. Waiters are woken by wake_up_all (@do_init_module).]
percpu_modalloc
• Allocate per-cpu area for the size of the per-cpu
section
static int percpu_modalloc(struct module *mod, struct load_info *info)
{
	Elf_Shdr *pcpusec = &info->sechdrs[info->index.pcpu];
	unsigned long align = pcpusec->sh_addralign;

	if (!pcpusec->sh_size)
		return 0;
	...
	mod->percpu = __alloc_reserved_percpu(pcpusec->sh_size, align);
	if (!mod->percpu) {
		pr_warn("%s: Could not allocate %lu bytes percpu data\n",
			mod->name, (unsigned long)pcpusec->sh_size);
		return -ENOMEM;
	}
	mod->percpu_size = pcpusec->sh_size;
	return 0;
}
module_unload_init
• Initialize a reference counter for the module
• After this function, it becomes 2.
• Initialize the lists that manage dependencies
• source_list: list of “usages” whose “source” is a module that
uses the symbols of this module (= the modules that depend on
this module)
• target_list: list of “usages” whose “target” is a module whose
symbols this module uses (= the modules that this module
depends on)
static int module_unload_init(struct module *mod)
{
atomic_set(&mod->refcnt, MODULE_REF_BASE);
INIT_LIST_HEAD(&mod->source_list);
INIT_LIST_HEAD(&mod->target_list);
atomic_inc(&mod->refcnt);
return 0;
}
find_module_sections
• Find additional sections in the module
• Mostly related to symbol tables, and tracers
Sections (parameters and symbol tables):
__param
__ksymtab
__kcrctab
__ksymtab_gpl
__kcrctab_gpl
__ksymtab_gpl_future
__kcrctab_gpl_future
__ksymtab_unused
__kcrctab_unused
__ksymtab_unused_gpl
__kcrctab_unused_gpl
Sections (constructors, tracing, and others):
.ctors / .init_array
__tracepoints_ptrs
__jump_table
_ftrace_events
__trace_printk_fmt
__mcount_loc
__ex_table
__verbose
check_module_license_and_versions
• Some hacks on specific modules
• e.g.) ndiswrapper driver may be GPL (it needs symbols
exported only to GPL modules), but the driver it loads
will not be GPL, so mark tainted
static int check_module_license_and_versions(struct module *mod)
{
if (strcmp(mod->name, "ndiswrapper") == 0)
add_taint(TAINT_PROPRIETARY_MODULE, LOCKDEP_NOW_UNRELIABLE);
/* driverloader was caught wrongly pretending to be under GPL */
if (strcmp(mod->name, "driverloader") == 0)
add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
LOCKDEP_NOW_UNRELIABLE);
/* lve claims to be GPL but upstream won't provide source */
if (strcmp(mod->name, "lve") == 0)
add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
LOCKDEP_NOW_UNRELIABLE);
check_module_license_and_versions
• Checks whether the symbols have CRCs (versions)
#ifdef CONFIG_MODVERSIONS
	if ((mod->num_syms && !mod->crcs)
	    || (mod->num_gpl_syms && !mod->gpl_crcs)
	    || (mod->num_gpl_future_syms && !mod->gpl_future_crcs)
#ifdef CONFIG_UNUSED_SYMBOLS
	    || (mod->num_unused_syms && !mod->unused_crcs)
	    || (mod->num_unused_gpl_syms && !mod->unused_gpl_crcs)
#endif
		) {
		return try_to_force_load(mod,
					 "no versions for exported symbols");
	}
#endif
	return 0;
setup_modinfo
• Call “setup” for module attributes
• Only “version” and “srcversion” have “setup” callback.
• Module attributes
• version, srcversion
• uevent
• initstate
• coresize, initsize
• taint
• refcnt
#define MODINFO_ATTR(field) \
static void setup_modinfo_##field(struct module *mod, const char *s) \
{ \
	mod->field = kstrdup(s, GFP_KERNEL); \
}
load_module function (2) [Re]
• Resolve the symbols (simplify_symbols)
• Fix up the addresses in the module (apply_relocations)
• Extable and per-cpu initialization (post_relocation)
• Flush I-cache for the module area
(flush_module_icache)
• Copy the module parameters to mod->args.
• Check duplication of symbols, and setup NX attributes.
(complete_formation)
• Parse the module parameters (parse_args)
• sysfs setup (mod_sysfs_setup)
• Free the copy in the load_info structure (free_copy)
• Call the init function of the module (do_init_module)
66
simplify_symbols
• Change the address of the unresolved symbols in
the “symtab” section to the actual addresses
67
static int simplify_symbols(struct module *mod, const struct load_info *info)
{
Elf_Shdr *symsec = &info->sechdrs[info->index.sym];
Elf_Sym *sym = (void *)symsec->sh_addr;
...
for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
const char *name = info->strtab + sym[i].st_name;
...
case SHN_UNDEF:
ksym = resolve_symbol_wait(mod, info, name);
/* Ok if resolved. */
if (ksym && !IS_ERR(ksym)) {
sym[i].st_value = ksym->value;
break;
}
/* Ok if weak. */
if (!ksym && ELF_ST_BIND(sym[i].st_info) == STB_WEAK)
break;
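The resolution loop can be illustrated with a toy symbol table. This is a user-space sketch under stated assumptions: `struct toy_sym`, `struct toy_export`, `toy_lookup` and `toy_simplify_symbols` are hypothetical, the flags `undefined` and `weak` replace the ELF checks (`SHN_UNDEF`, `STB_WEAK`), and a linear lookup replaces `resolve_symbol_wait`.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exported-symbol entry (name and address). */
struct toy_export {
    const char *name;
    unsigned long value;
};

/* Hypothetical simplified symbol-table entry. */
struct toy_sym {
    const char *name;
    int undefined;        /* plays the role of st_shndx == SHN_UNDEF */
    int weak;             /* plays the role of STB_WEAK binding */
    unsigned long value;  /* plays the role of st_value */
};

/* Linear lookup; stands in for resolve_symbol_wait() in this sketch. */
static const struct toy_export *toy_lookup(const struct toy_export *tab,
                                           size_t ntab, const char *name)
{
    for (size_t i = 0; i < ntab; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Patch every undefined symbol with the address of its definition. */
static int toy_simplify_symbols(struct toy_sym *syms, size_t nsyms,
                                const struct toy_export *tab, size_t ntab)
{
    for (size_t i = 0; i < nsyms; i++) {
        const struct toy_export *e;

        if (!syms[i].undefined)
            continue;               /* already defined in the module */
        e = toy_lookup(tab, ntab, syms[i].name);
        if (e) {
            syms[i].value = e->value;   /* ok if resolved */
            continue;
        }
        if (syms[i].weak)
            continue;               /* ok if weak: stays unresolved */
        return -ENOENT;             /* unknown, non-weak symbol */
    }
    return 0;
}
```

Unlike the kernel loop, the sketch starts at index 0 because the toy table has no reserved null entry.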
resolve_symbol_wait
• Waits (up to 30 seconds) if the symbol belongs to a module
that is still under initialization.
68
static const struct kernel_symbol *
resolve_symbol_wait(struct module *mod,
const struct load_info *info,
const char *name)
{
const struct kernel_symbol *ksym;
char owner[MODULE_NAME_LEN];
if (wait_event_interruptible_timeout(module_wq,
!IS_ERR(ksym = resolve_symbol(mod, info, name, owner))
|| PTR_ERR(ksym) != -EBUSY,
30 * HZ) <= 0) {
pr_warn("%s: gave up waiting for init of module %s.\n",
mod->name, owner);
}
return ksym;
}
resolve_symbol
• Find the symbol in the kernel’s symbol tables
and in other modules’ symbol tables (find_symbol)
• If found, check that the version (CRC) of the symbol
matches the one that the module expects
(check_versions)
• Then record that the module being loaded depends on the
module that owns the symbol (ref_module)
69
find_symbol (1)
• Well, try to find it from the kernel
70
bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
struct module *owner,
void *data),
void *data)
{
struct module *mod;
static const struct symsearch arr[] = {
{ __start___ksymtab, __stop___ksymtab, __start___kcrctab,
NOT_GPL_ONLY, false },
{ __start___ksymtab_gpl, __stop___ksymtab_gpl,
__start___kcrctab_gpl,
GPL_ONLY, false },
{ __start___ksymtab_gpl_future, __stop___ksymtab_gpl_future,
__start___kcrctab_gpl_future,
WILL_BE_GPL_ONLY, false },
...
};
if (each_symbol_in_section(arr, ARRAY_SIZE(arr), NULL, fn, data))
return true;
find_symbol (2)
• Then, try to find it in the modules (skipping those still UNFORMED)
71
list_for_each_entry_rcu(mod, &modules, list) {
struct symsearch arr[] = {
{ mod->syms, mod->syms + mod->num_syms, mod->crcs,
NOT_GPL_ONLY, false },
{ mod->gpl_syms, mod->gpl_syms + mod->num_gpl_syms,
mod->gpl_crcs,
GPL_ONLY, false },
{ mod->gpl_future_syms,
mod->gpl_future_syms + mod->num_gpl_future_syms,
mod->gpl_future_crcs,
WILL_BE_GPL_ONLY, false },
...
};
if (mod->state == MODULE_STATE_UNFORMED)
continue;
if (each_symbol_in_section(arr, ARRAY_SIZE(arr), mod, fn,
data))
return true;
}
return false;
}
find_symbol (3)
• Binary search in the section!
72
static int cmp_name(const void *va, const void *vb)
{
const char *a;
const struct kernel_symbol *b;
a = va; b = vb;
return strcmp(a, b->name);
}
static bool find_symbol_in_section(const struct symsearch *syms,
struct module *owner,
void *data)
{
struct find_symbol_arg *fsa = data;
struct kernel_symbol *sym;
sym = bsearch(fsa->name, syms->start, syms->stop - syms->start,
sizeof(struct kernel_symbol), cmp_name);
if (sym != NULL && check_symbol(syms, owner, sym - syms->start, data))
return true;
return false;
}
Checks the found symbol’s target license
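The lookup relies on the table being sorted by name, so the standard `bsearch` works with a key of a different type than the elements. A minimal user-space sketch, assuming a hypothetical `struct toy_kernel_symbol` and helper `toy_find`:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space mirror of struct kernel_symbol. */
struct toy_kernel_symbol {
    unsigned long value;
    const char *name;
};

/* Key is a plain string; element is a symbol entry (as in cmp_name). */
static int cmp_name(const void *va, const void *vb)
{
    const char *a = va;
    const struct toy_kernel_symbol *b = vb;

    return strcmp(a, b->name);
}

/* Binary search over [start, stop); the range must be sorted by name. */
static const struct toy_kernel_symbol *
toy_find(const struct toy_kernel_symbol *start,
         const struct toy_kernel_symbol *stop, const char *name)
{
    return bsearch(name, start, (size_t)(stop - start),
                   sizeof(*start), cmp_name);
}
```

If the table were not sorted in `strcmp` order, the search could miss symbols that are present, so the sort order is part of the table's contract.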
ref_module
• If the target module is NULL (= the symbol is in the
kernel) or the module already uses the target module,
it returns immediately.
• Increment the reference counter of the target module
(if the target module is in the middle of initialization,
return -EBUSY)
• Add usage
• Source : the module
• Target : the target module
73
static int add_module_usage(struct module *a, struct module *b)
{
    struct module_use *use;

    use = kmalloc(sizeof(*use), GFP_ATOMIC);
    if (!use)
        return -ENOMEM;
    use->source = a;
    use->target = b;
    list_add(&use->source_list, &b->source_list);
    list_add(&use->target_list, &a->target_list);
    return 0;
}
Usage example
74
[Figure: kernel module B calls f(), which is defined in kernel module A, so B
depends on A. struct module A has refcnt 2; struct module B has refcnt 1.
One struct module_use (source: &B, target: &A) is linked into A's source_list
and into B's target_list.]
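The bookkeeping in the usage example above can be reproduced in user space. This is a sketch only: `toy_module`, `toy_use`, `toy_already_uses` and `toy_ref_module` are hypothetical, singly linked chains replace the kernel's `list_head`, the idle reference count is assumed to be 1 as in the example, and the -EBUSY case for a target that is still initializing is left out.

```c
#include <errno.h>
#include <stdlib.h>

struct toy_module;

/* One record per "source uses target" edge, like struct module_use. */
struct toy_use {
    struct toy_module *source;
    struct toy_module *target;
    struct toy_use *next_in_source_list; /* chain off target->source_list */
    struct toy_use *next_in_target_list; /* chain off source->target_list */
};

struct toy_module {
    const char *name;
    int refcnt;                  /* assumed to start at 1 */
    struct toy_use *source_list; /* which modules depend on me? */
    struct toy_use *target_list; /* which modules do I depend on? */
};

/* Does module a already use module b? */
static int toy_already_uses(const struct toy_module *a,
                            const struct toy_module *b)
{
    const struct toy_use *use;

    for (use = b->source_list; use; use = use->next_in_source_list)
        if (use->source == a)
            return 1;
    return 0;
}

/* Module a uses b: take a reference on b and record the edge once. */
static int toy_ref_module(struct toy_module *a, struct toy_module *b)
{
    struct toy_use *use;

    if (b == NULL || toy_already_uses(a, b))
        return 0;               /* kernel symbol, or edge already known */
    use = malloc(sizeof(*use));
    if (!use)
        return -ENOMEM;
    b->refcnt++;
    use->source = a;
    use->target = b;
    use->next_in_source_list = b->source_list;
    b->source_list = use;
    use->next_in_target_list = a->target_list;
    a->target_list = use;
    return 0;
}
```

Calling `toy_ref_module(&B, &A)` twice leaves A's count at 2, because the second call finds the existing edge and returns early.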
apply_relocations
• Apply relocations for each “rel” section
• “rel” sections
• Section Type : SHT_REL or SHT_RELA
75
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 2] .text PROGBITS 0000000000000000 00000070
0000000000000019 0000000000000000 AX 0 0 16
[ 3] .rela.text RELA 0000000000000000 00000ca8
0000000000000048 0000000000000018 24 2 8
[ 4] .init.text PROGBITS 0000000000000000 00000089
0000000000000016 0000000000000000 AX 0 0 1
[ 5] .rela.init.text RELA 0000000000000000 00000cf0
0000000000000030 0000000000000018 24 4 8
[24] .symtab SYMTAB 0000000000000000 00000db0
00000000000003c0 0000000000000018 25 32 8
[25] .strtab STRTAB 0000000000000000 00001170
000000000000014a 0000000000000000 0 0 1
Relocation
• Example
• This function uses
the “printk” symbol
outside the module.
(And also __fentry__)
76
0000000000000000 <say_hello>:
0: e8 00 00 00 00 callq 5 <say_hello+0x5>
1: R_X86_64_PC32 __fentry__-0x4
5: 55 push %rbp
6: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
9: R_X86_64_32S .rodata.str1.1
d: 31 c0 xor %eax,%eax
f: 48 89 e5 mov %rsp,%rbp
12: e8 00 00 00 00 callq 17 <say_hello+0x17>
13: R_X86_64_PC32 printk-0x4
17: 5d pop %rbp
18: c3 retq
void say_hello(void)
{
printk(KERN_INFO
"Hello, World.\n");
}
RIP-relative addressing is based on
the address of the next instruction
apply_relocate[_add]
• Addressing is architecture-dependent, so the
relocation is also architecture-dependent
• x86_64 (RELA)
• A RELA section is an array of Elf64_Rela
• In the “printk” example
• r_offset = 0x13
• r_info = R_X86_64_PC32 (RIP-relative in x86_64)
• r_addend = -0x04
77
typedef struct elf64_rela {
Elf64_Addr r_offset; /* Location at which to apply the action */
Elf64_Xword r_info; /* index and type of relocation */
Elf64_Sxword r_addend; /* Constant addend used to compute value */
} Elf64_Rela;
apply_relocate_add in x86_64
78
int apply_relocate_add(Elf64_Shdr *sechdrs,
const char *strtab,
unsigned int symindex,
unsigned int relsec,
struct module *me)
{
...
for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
/* This is where to make the change */
loc = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
+ rel[i].r_offset;
/* This is the symbol it is referring to. Note that
all undefined symbols have been resolved. */
sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
+ ELF64_R_SYM(rel[i].r_info);
...
val = sym->st_value + rel[i].r_addend;
apply_relocate_add in x86_64
79
switch (ELF64_R_TYPE(rel[i].r_info)) {
...
case R_X86_64_64:
*(u64 *)loc = val;
break;
...
case R_X86_64_32S:
*(s32 *)loc = val;
if ((s64)val != *(s32 *)loc)
goto overflow;
break;
case R_X86_64_PC32:
val -= (u64)loc;
*(u32 *)loc = val;
#if 0
if ((s64)val != *(s32 *)loc)
goto overflow;
#endif
break;
Calculate the delta between
the current address and the
target address
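The R_X86_64_PC32 case can be checked with plain integer arithmetic. A sketch, assuming hypothetical helper names `toy_reloc_pc32` and `toy_call_target`; any addresses passed in are made-up values, not taken from a real system.

```c
#include <stdint.h>

/*
 * Value stored for R_X86_64_PC32: symbol address + addend - patch location.
 * Mirrors "val = sym->st_value + r_addend; val -= (u64)loc;".
 */
static uint32_t toy_reloc_pc32(uint64_t sym_value, int64_t addend,
                               uint64_t loc)
{
    uint64_t val = sym_value + (uint64_t)addend;

    val -= loc;
    return (uint32_t)val;       /* only the low 32 bits are written */
}

/*
 * Address the CPU jumps to for "call rel32": the 4-byte displacement is
 * sign-extended and added to the address of the next instruction, which
 * begins right after the displacement field (loc + 4).
 */
static uint64_t toy_call_target(uint64_t loc, uint32_t disp32)
{
    return loc + 4 + (uint64_t)(int64_t)(int32_t)disp32;
}
```

With an addend of -4, as in the printk example, the "+ 4" of the next-instruction address cancels out, so the call lands exactly on the symbol as long as the distance fits in a signed 32-bit value.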
post_relocation
• Sort the exception table (sort_extable)
• Exception table: the addresses of the instructions whose
page faults the page fault handler treats specially.
• get_user etc.
• Copy the per-cpu section contents for all the possible
cpus. (percpu_modcopy)
• Set kallsyms-related members to the final location, and
copy core symtab from the whole symtab.
(add_kallsyms)
• Call architecture-dependent finalizing function of
loading (module_finalize)
80
for_each_possible_cpu(cpu)
memcpy(per_cpu_ptr(mod->percpu, cpu), from, size);
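The effect of the copy loop is that every CPU starts with its own private copy of the initial per-cpu data. A user-space sketch, assuming a hypothetical `struct toy_percpu` that lays the copies out back to back in one allocation (the kernel's real per-cpu layout is different):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-cpu area: ncpus copies of size bytes, back to back. */
struct toy_percpu {
    size_t size;
    unsigned int ncpus;
    unsigned char *area;
};

/* Address of the copy that belongs to the given cpu. */
static void *toy_per_cpu_ptr(struct toy_percpu *pc, unsigned int cpu)
{
    return pc->area + (size_t)cpu * pc->size;
}

/* Allocate the area and copy the initial image once per cpu. */
static int toy_percpu_modcopy(struct toy_percpu *pc, const void *from,
                              size_t size, unsigned int ncpus)
{
    unsigned int cpu;

    pc->area = malloc(size * ncpus);
    if (!pc->area)
        return -1;
    pc->size = size;
    pc->ncpus = ncpus;
    for (cpu = 0; cpu < ncpus; cpu++)
        memcpy(toy_per_cpu_ptr(pc, cpu), from, size);
    return 0;
}
```

After the copy, a write through one CPU's pointer does not affect the other copies.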
module_finalize in x86_64
• Alternatives, paravirt and so on.
81
int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *me)
{
const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
*para = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
if (!strcmp(".text", secstrings + s->sh_name))
text = s;
if (!strcmp(".altinstructions", secstrings + s->sh_name))
alt = s;
if (!strcmp(".smp_locks", secstrings + s->sh_name))
locks = s;
if (!strcmp(".parainstructions", secstrings + s->sh_name))
para = s;
}
if (alt) {
/* patch .altinstructions */
void *aseg = (void *)alt->sh_addr;
apply_alternatives(aseg, aseg + alt->sh_size);
}
...
flush_module_icache
• Flush the instruction cache for the text area so that the
code is executed correctly
82
static void flush_module_icache(const struct module *mod)
{
mm_segment_t old_fs;
/* flush the icache in correct context */
old_fs = get_fs();
set_fs(KERNEL_DS);
if (mod->module_init)
flush_icache_range((unsigned long)mod->module_init,
(unsigned long)mod->module_init
+ mod->init_size);
flush_icache_range((unsigned long)mod->module_core,
(unsigned long)mod->module_core + mod->core_size);
set_fs(old_fs);
}
complete_formation
• Check if the exported symbols are already exported
by another module (verify_export_symbols)
• Add section information of symbols for BUG report
(module_bug_finalize)
• Set NX and RO for core and init area.
• Set the module state to MODULE_STATE_COMING
83
mod->state = MODULE_STATE_COMING;
load_module function (2) [Re]
• Resolve the symbols (simplify_symbols)
• Fix up the addresses in the module (apply_relocations)
• Extable and per-cpu initialization (post_relocation)
• Flush I-cache for the module area
(flush_module_icache)
• Copy the module parameters to mod->args.
• Check duplication of symbols, and setup NX attributes.
(complete_formation)
• Parse the module parameters (parse_args)
• sysfs setup (mod_sysfs_setup)
• Free the copy in the load_info structure (free_copy)
• Call the init function of the module (do_init_module)
84
do_init_module (1)
• Make a structure for call_rcu to free init area
• And call the init function in the module
• Set the module state to MODULE_STATE_LIVE
85
struct mod_initfree *freeinit;
freeinit = kmalloc(sizeof(*freeinit), GFP_KERNEL);
...
freeinit->module_init = mod->module_init;
do_mod_ctors(mod);
/* Start the module */
if (mod->init != NULL)
ret = do_one_initcall(mod->init);
mod->state = MODULE_STATE_LIVE;
do_init_module (2)
• Wait for pending async work (async_synchronize_full), but
only if the init function queued any, to avoid deadlock
• Drop the initial reference
• And clear the init-related data
86
if (current->flags & PF_USED_ASYNC)
async_synchronize_full();
mutex_lock(&module_mutex);
/* Drop initial reference. */
module_put(mod);
trim_init_extable(mod);
#ifdef CONFIG_KALLSYMS
mod->num_symtab = mod->core_num_syms;
mod->symtab = mod->core_symtab;
mod->strtab = mod->core_strtab;
#endif
unset_module_init_ro_nx(mod);
module_arch_freeing_init(mod);
do_init_module (3)
• Finally, free the init area (deferred through call_rcu)
• Wake up anyone waiting for the initialization to
complete.
87
call_rcu(&freeinit->rcu, do_free_init);
mutex_unlock(&module_mutex);
wake_up_all(&module_wq);
Details (2)
Unloading
88
sys_delete_module
• Check capability and module blocking parameter
• Find the specified module by name
• If the module has an init function but no exit function,
and the unload is not forced, fail with -EBUSY
• Try to stop the module (try_stop_module)
• Call the exit function
• Free the module
89
Now (3.19) [RE]
• Reference count is now atomic_t (was per-cpu int
before) and checked without stop_machine
• (thanks to a mysterious guy)
90
static int try_stop_module(struct module *mod, int flags, int *forced)
{
/* If it's not unused, quit unless we're forcing. */
if (try_release_module_ref(mod) != 0) {
*forced = try_force_unload(flags);
if (!(*forced))
return -EWOULDBLOCK;
}
/* Mark it as dying. */
mod->state = MODULE_STATE_GOING;
return 0;
}
try_release_module_ref
• Decrements the reference counter and checks whether it
reaches zero (= can be unloaded).
91
static int try_release_module_ref(struct module *mod)
{
int ret;
/* Try to decrement refcnt which we set at loading */
ret = atomic_sub_return(MODULE_REF_BASE, &mod->refcnt);
BUG_ON(ret < 0);
if (ret)
/* Someone can put this right now, recover with
checking */
ret = atomic_add_unless(&mod->refcnt, MODULE_REF_BASE,
0);
return ret;
}
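The same "subtract the base, then put it back unless the count is zero" logic can be tried with C11 atomics. This is a user-space sketch: `TOY_REF_BASE`, `toy_atomic_add_unless` and `toy_try_release_ref` are hypothetical names, and the compare-and-swap loop is an assumed equivalent of the kernel's `atomic_add_unless`.

```c
#include <stdatomic.h>

#define TOY_REF_BASE 1  /* stands in for MODULE_REF_BASE */

/* Add a to *v unless *v == u; returns nonzero if the add happened. */
static int toy_atomic_add_unless(atomic_int *v, int a, int u)
{
    int cur = atomic_load(v);

    while (cur != u) {
        /* On failure, cur is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &cur, cur + a))
            return 1;
    }
    return 0;
}

/* Returns 0 if the count dropped to zero (unload may proceed). */
static int toy_try_release_ref(atomic_int *refcnt)
{
    /* Try to drop the base reference taken at load time. */
    int ret = atomic_fetch_sub(refcnt, TOY_REF_BASE) - TOY_REF_BASE;

    if (ret)
        /* Still in use: restore the base, unless a concurrent put
         * already brought the count to zero in the meantime. */
        ret = toy_atomic_add_unless(refcnt, TOY_REF_BASE, 0);
    return ret;
}
```

A module that only holds the base reference ends at 0 and can be unloaded. A module with another user gets its base reference restored and the call reports failure.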
Details (3)
Building an out-of-tree kernel module
92
Build steps (1) : .c -> .o
• make .tmp_versions, create .tmp_versions/<module>.mod
• The file contains the names of the final .ko file and source .o files
• Compile .tmp_[name].o from [name].c
• Calculate the CRCs (version) for the exported symbols
• Find a __ksymtab section in .tmp_[name].o
• objdump -h (obj) | grep -q __ksymtab
• Calculate CRC for exported symbols in the source file by genksyms
(Output is LD Script format)
• Compile the CRC values into the object file.
93
cmd_modversions = \
    if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then \
        $(call cmd_gensymtypes,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
            > $(@D)/.tmp_$(@F:.o=.ver); \
        \
        $(LD) $(LDFLAGS) -r -o $@ $(@D)/.tmp_$(@F) \
            -T $(@D)/.tmp_$(@F:.o=.ver); \
        rm -f $(@D)/.tmp_$(@F) $(@D)/.tmp_$(@F:.o=.ver); \
    else \
        mv -f $(@D)/.tmp_$(@F) $@; \
    fi;
__crc_say_hello = 0xb37b83db ;
Exported Symbols
• Each exported symbol has a struct in __ksymtab*
section.
94
#define __EXPORT_SYMBOL(sym, sec) \
    extern typeof(sym) sym; \
    __CRC_SYMBOL(sym, sec) \
    static const char __kstrtab_##sym[] \
    __attribute__((section("__ksymtab_strings"), aligned(1))) \
    = VMLINUX_SYMBOL_STR(sym); \
    extern const struct kernel_symbol __ksymtab_##sym; \
    __visible const struct kernel_symbol __ksymtab_##sym \
    __used \
    __attribute__((section("___ksymtab" sec "+" #sym), unused)) \
    = { (unsigned long)&sym, __kstrtab_##sym }
#define EXPORT_SYMBOL(sym) \
    __EXPORT_SYMBOL(sym, "")
#define EXPORT_SYMBOL_GPL(sym) \
    __EXPORT_SYMBOL(sym, "_gpl")
#define EXPORT_SYMBOL_GPL_FUTURE(sym) \
    __EXPORT_SYMBOL(sym, "_gpl_future")
(include/linux/export.h)
CRC sections
• Declare CRC symbols in CRC sections with the weak
attribute.
95
#ifndef __GENKSYMS__
#ifdef CONFIG_MODVERSIONS
/* Mark the CRC weak since genksyms apparently decides not to
* generate a checksums for some symbols */
#define __CRC_SYMBOL(sym, sec) \
    extern __visible void *__crc_##sym __attribute__((weak)); \
    static const unsigned long __kcrctab_##sym \
    __used \
    __attribute__((section("___kcrctab" sec "+" #sym), unused)) \
    = (unsigned long) &__crc_##sym;
#else
#define __CRC_SYMBOL(sym, sec)
#endif
(include/linux/export.h)
Build Steps (2) : .c -> .o
• Create __mcount_loc list (if -pg is enabled)
• The list of pointers where “mcount” is called
• Fix up the dep file
• Link into a single object file (<module>.o) if the
module is composed of multiple object files
96
Build Steps (3) – Stage 2
• Create <module>.mod.c and <module>.symvers by modpost
command
• Compile the <module>.mod.c
• Link the <module>.mod.o and <module>.o into a module
<module>.ko
97
modpost = scripts/mod/modpost \
    $(if $(CONFIG_MODVERSIONS),-m) \
    $(if $(CONFIG_MODULE_SRCVERSION_ALL),-a,) \
    $(if $(KBUILD_EXTMOD),-i,-o) $(kernelsymfile) \
    $(if $(KBUILD_EXTMOD),-I $(modulesymfile)) \
    $(if $(KBUILD_EXTRA_SYMBOLS), $(patsubst %, -e %,$(KBUILD_EXTRA_SYMBOLS))) \
    $(if $(KBUILD_EXTMOD),-o $(modulesymfile)) \
    $(if $(CONFIG_DEBUG_SECTION_MISMATCH),,-S) \
    $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w)
MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
# We can go over command line length here, so be careful.
quiet_cmd_modpost = MODPOST $(words $(filter-out vmlinux FORCE, $^)) modules
cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/.o/' | $(modpost) $(MODPOST_OPT) -s -T -
modpost (1)
• Collects module information, symbol information and
versions from the kernel symbols and the object files, and
generates the module source file and the symvers file.
• Arguments
$ modpost [Options...] [(Module object files...)]
• Options
98
Option             Description
-m                 CONFIG_MODVERSIONS (symbol versions)
-a                 CONFIG_MODULE_SRCVERSION_ALL (“srcversion” in modinfo):
                   MD4 of the source files that made the module
-i (symvers file)  Input symbol versions (kernel symbols)
-e (symvers file)  Input extra symbol versions
-o (symvers file)  Output symbol versions (for exported symbols of the module)
-T (files)         Source (object) file list
modpost (2)
• Generate the source file
99
for (mod = modules; mod; mod = mod->next) {
char fname[PATH_MAX];
...
buf.pos = 0;
add_header(&buf, mod);
add_intree_flag(&buf, !external_module);
add_staging_flag(&buf, mod->name);
err |= add_versions(&buf, mod);
add_depends(&buf, mod, modules);
add_moddevtable(&buf, mod);
add_srcversion(&buf, mod);
sprintf(fname, "%s.mod.c", mod->name);
write_if_changed(&buf, fname);
}
(scripts/mod/modpost.c)
modpost (3)
• Dump the symbol versions
100
static void write_dump(const char *fname)
{
struct buffer buf = { };
struct symbol *symbol;
int n;
for (n = 0; n < SYMBOL_HASH_SIZE ; n++) {
symbol = symbolhash[n];
while (symbol) {
if (dump_sym(symbol))
buf_printf(&buf, "0x%08x\t%s\t%s\t%s\n",
symbol->crc, symbol->name,
symbol->module->name,
export_str(symbol->export));
symbol = symbol->next;
}
}
write_if_changed(&buf, fname);
}
(scripts/mod/modpost.c)
0xb37b83db say_hello /home/shimos/test_module/hello EXPORT_SYMBOL
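The symvers line is four tab-separated fields: CRC, symbol name, module path and export type. A small sketch that writes and reads one such line with the same format string as write_dump; `toy_format_symvers` and `toy_parse_symvers` are hypothetical helpers, and the parser assumes no field contains whitespace.

```c
#include <stdio.h>

/* Format one symvers line: CRC, symbol, module, export type. */
static int toy_format_symvers(char *out, size_t len, unsigned int crc,
                              const char *name, const char *module,
                              const char *export_type)
{
    return snprintf(out, len, "0x%08x\t%s\t%s\t%s\n",
                    crc, name, module, export_type);
}

/* Parse one line back into its four fields; returns 0 on success. */
static int toy_parse_symvers(const char *line, unsigned int *crc,
                             char name[64], char module[256],
                             char export_type[32])
{
    /* %x accepts the leading "0x"; widths keep the buffers in bounds. */
    if (sscanf(line, "%x %63s %255s %31s",
               crc, name, module, export_type) != 4)
        return -1;
    return 0;
}
```

Formatting the values from the example line (0xb37b83db, say_hello, the module path, EXPORT_SYMBOL) and parsing the result gives the same four fields back.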
Generated <module>.mod.c (1)
• Example
101
#include <linux/module.h>
#include <linux/vermagic.h>
#include <linux/compiler.h>
MODULE_INFO(vermagic, VERMAGIC_STRING);
__visible struct module __this_module
__attribute__((section(".gnu.linkonce.this_module"))) = {
.name = KBUILD_MODNAME,
.init = init_module,
#ifdef CONFIG_MODULE_UNLOAD
.exit = cleanup_module,
#endif
.arch = MODULE_ARCH_INIT,
};
static const struct modversion_info ____versions[]
__used
__attribute__((section("__versions"))) = {
{ 0x9412fa01, __VMLINUX_SYMBOL_STR(module_layout) },
{ 0x27e1a049, __VMLINUX_SYMBOL_STR(printk) },
{ 0xbdfb6dbb, __VMLINUX_SYMBOL_STR(__fentry__) },
}; ...
Additional modinfo is included
Base of struct module
Symbols and (expected) versions
which this module depends on.
Generated <module>.mod.c (2)
• Example
102
static const char __module_depends[]
__used
__attribute__((section(".modinfo"))) =
"depends=";
MODULE_INFO(srcversion, "8D5BACDC1EA9421ABFF79DD");
Modinfo about dependency
(but the kernel does not use this)
Modinfo “srcversion”
modinfo
• Each modinfo string is created by a macro; the strings end up
concatenated because they are all placed in a single section (.modinfo)
103
#define __MODULE_INFO(tag, name, info) \
static const char __UNIQUE_ID(name)[] \
    __used __attribute__((section(".modinfo"), unused, aligned(1))) \
    = __stringify(tag) "=" info
(include/linux/moduleparam.h)
#define MODULE_INFO(tag, info) __MODULE_INFO(tag, tag, info)
...
#define MODULE_LICENSE(_license) MODULE_INFO(license, _license)
...
#define MODULE_AUTHOR(_author) MODULE_INFO(author, _author)
...
(include/linux/module.h)
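Because every entry is a NUL-terminated "tag=value" string in one section, a reader can walk the section string by string. A user-space sketch of such a lookup, assuming a hypothetical `toy_get_modinfo` that works on an in-memory copy of the section contents:

```c
#include <stddef.h>
#include <string.h>

/*
 * Find "tag=value" in a blob of NUL-separated strings and return a
 * pointer to value, or NULL if the tag is absent. Extra NUL bytes
 * between entries (padding) are skipped as empty strings.
 */
static const char *toy_get_modinfo(const char *sec, size_t size,
                                   const char *tag)
{
    size_t taglen = strlen(tag);
    size_t pos = 0;

    while (pos < size) {
        const char *p = sec + pos;
        const char *end = memchr(p, '\0', size - pos);
        size_t len = end ? (size_t)(end - p) : size - pos;

        /* Only match properly terminated entries of the form tag=... */
        if (end && len > taglen && p[taglen] == '=' &&
            strncmp(p, tag, taglen) == 0)
            return p + taglen + 1;
        pos += len + 1;     /* step over this string and its NUL */
    }
    return NULL;
}
```

Checking for the "=" right after the tag keeps a lookup for "lic" from matching a "license=..." entry.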
UNIQUE_ID
104
#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
(include/linux/compiler-gcc4.h)

Linux Kernel Module - For NLKB

  • 1.
    Kernel Module Taku Shimosawa 0 Feb.21, 2015 Pour le livre nouveau du Linux noyau
  • 2.
    Notes • Linux kernelversion: 3.19 • Quoted source codes come from kernel/module.c unless otherwise noted. 1
  • 3.
    Kernel Module • Afeature for dynamically adding/removing kernel features while the kernel is running • Benefits • To update the kernel features while running • To reduce memory consumption (and CPU overhead) by loading only necessary kernel modules • Avoiding GPL (Not required to compliant with GPL; proprietary drivers) • Many kernel features can be compiled either linked to the kernel statically or independent modules • File systems, device drivers, etc. • “TRISTATE” in Kconfig (y, m, or n) 2
  • 4.
    Where is thekernel module? • Linux kernel modules are ELF binaries with an extension “.ko” • Many distributions locate the kernel modules under /lib/modules • e.g. /lib/modules/3.13.0-44-generic/kernel (Ubuntu 14.10) • “depmod” finds the kernel modules located under the directory to create module dependency map (modules.dep) • “modprobe” utility loads a kernel module with its dependent modules by looking up the modules.dep file • However, a module located in any place can be loaded to the kernel if specified explicitly. 3
  • 5.
    What is the“dependency?” • A kernel module can export “symbols” that may be used by another kernel module • A symbol : a name for a location in the memory; a global variable or a function in C • If a module (B) uses a symbol exported by another module (A), then the module B has dependency for the module A • Thus, the module A should be loaded before the module B is loaded • (There seems to be no way to load modules that have circular dependencies (e.g. A depends on B; B also depends on A)) 4 Kernel module A Kernel module B function f() { } EXPORT_SYMBOL(f); function g() { f(); } DEP
  • 6.
    Exported Symbols • Thesymbols explicitly marked as “export” can be accessed by other kernel modules • The Linux kernel itself has “export”-ed symbols. • Kernel modules are allowed to use only the exported symbols in the kernel • Not all the global functions are available for the modules! • The symbols to be exported are declared with the EXPORT_SYMBOL and EXPORT_SYMBOL_GPL macros. • The latter makes the symbol available only for GPL modules. 5 struct task_struct *pid_task(struct pid *pid, enum pid_type type) { ... } EXPORT_SYMBOL(pid_task); ... struct task_struct *get_pid_task(struct pid *pid, enum pid_type type) { ... } EXPORT_SYMBOL_GPL(get_pid_task); (kernel/pid.c)
  • 7.
    (BTW) • What makesdifference? 6 struct task_struct *pid_task(struct pid *pid, enum pid_type type) { ... } EXPORT_SYMBOL(pid_task); struct task_struct *get_pid_task(struct pid *pid, enum pid_type type) { struct task_struct *result; rcu_read_lock(); result = pid_task(pid, type); if (result) get_task_struct(result); rcu_read_unlock(); return result; } EXPORT_SYMBOL_GPL(get_pid_task); (kernel/pid.c)
  • 8.
    Make a kernelmodule! • Out-of-tree module • The only necessary files are • Makefile • C source file(s) • Example for Makefile 7 obj-m += hello.o KERN_BUILD=/lib/modules/$(shell uname -r)/build all: make -C $(KERN_BUILD) M=$(PWD) modules clean: make -C $(KERN_BUILD) M=$(PWD) clean cf. obj-$(CONFIG_SHIMOS) = shimos.o
  • 9.
    Inside the kernelmodule • What sections are inside a kernel module? 8 $ readelf –a hello.ko Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align … [ 2] .text PROGBITS 0000000000000000 00000064 0000000000000000 0000000000000000 AX 0 0 1 [ 3] .init.text PROGBITS 0000000000000000 00000064 0000000000000016 0000000000000000 AX 0 0 1 [ 4] .rela.init.text RELA 0000000000000000 000009c0 0000000000000030 0000000000000018 16 3 8 [ 5] .exit.text PROGBITS 0000000000000000 0000007a 0000000000000006 0000000000000000 AX 0 0 1 … [ 7] .modinfo PROGBITS 0000000000000000 00000091 00000000000000c1 0000000000000000 A 0 0 1 [ 8] __versions PROGBITS 0000000000000000 00000160 0000000000000080 0000000000000000 A 0 0 32 … [18] .gnu.linkonce.thi PROGBITS 0000000000000000 00000280 0000000000000260 0000000000000000 WA 0 0 32
  • 10.
    Sections 9 Section Name Description .gnu.linkonce.this_moduleModule structure .modinfo String-style module information (Licenses, etc.) __versions Expected (compile-time) versions (CRC) of the symbols that this module depends on. __ksymtab* Table of symbols which this module exports. __kcrctab* Table of versions of symbols which this module exports. *.init Sections used while initialization (__init) .text, .data, etc. The code and data * : (none), _gpl, _gpl_future, _unused, unused_gpl (License restriction / attribute of the symbols)
  • 11.
    Module load andunload • The simplest way : “insmod” and “rmmod” commands • More sophisticated way is “modprobe” and “modprobe –r” • The former tries to load modules which the specified module depends on • The latter tries to unload modules which the specified module depends on 10 # insmod (file name) [parameters…] (e.g.) # insmod helloworld.ko msg=hoge # rmmod (module name) (e.g.) # rmmod helloworld
  • 12.
    How insmod callsthe kernel? • Source: kmod-19 11 KMOD_EXPORT int kmod_module_insert_module(struct kmod_module *mod, unsigned int flags, const char *options) { ... if (kmod_file_get_direct(mod->file)) { unsigned int kernel_flags = 0; if (flags & KMOD_INSERT_FORCE_VERMAGIC) kernel_flags |= MODULE_INIT_IGNORE_VERMAGIC; if (flags & KMOD_INSERT_FORCE_MODVERSION) kernel_flags |= MODULE_INIT_IGNORE_MODVERSIONS; err = finit_module(kmod_file_get_fd(mod->file), args, kernel_flags); if (err == 0 || errno != ENOSYS) goto init_finished; } ... (libkmod/libkmod-module.c)
  • 13.
    System calls • 3Module-related System Calls • init_module • finit_module • To load a module • delete_module • To unload a module 12 int init_module(void *module_image, unsigned long len, const char *param_values); int finit_module(int fd, const char *param_values, int flags); int delete_module(const char *name, int flags); (from man pages)
  • 14.
    init_module / finit_module •Load a kernel module • How to specify the module? • init_module : by user memory buffer that contains the kernel module image • finit_module : by file descriptor for the kernel module file • By using finit_module, some flags can be specified 13 flags MODULE_INIT_IGNORE_MODVERSIONS Ignore symbol version hashes MODULE_INIT_IGNORE_VERMAGIC Ignore kernel version magic
  • 15.
    delete_module • Unload akernel module • Specifies a module to be unloaded by its “name” • Some flags can be specified • Why different policy from finit_module…? 14 flags O_NONBLOCK | O_TRUNC Forcefully unload the module (even when the ref count is not zero; taints the kernel) O_NONBLOCK Returns immediately with an error (EWOULDBLOCK) O_NONBLOCK not set Stops the module, and waits until the ref count reaches zero. (UNINTERRUPTIBLE)
  • 16.
    Data structures formodules • struct load_info • Used while initializing a module • Most members are ELF-related. 15 struct load_info { Elf_Ehdr *hdr; unsigned long len; Elf_Shdr *sechdrs; char *secstrings, *strtab; unsigned long symoffs, stroffs; struct _ddebug *debug; unsigned int num_debug; bool sig_ok; struct { unsigned int sym, str, mod, vers, info, pcpu; } index; }; (include/linux/module.h)
  • 17.
    Data structures formodules • struct module (too large..) 16 struct module { enum module_state state; /* Member of list of modules */ struct list_head list; /* Unique handle for this module */ char name[MODULE_NAME_LEN]; /* Sysfs stuff. */ struct module_kobject mkobj; ... /* Exported symbols */ const struct kernel_symbol *syms; const unsigned long *crcs; unsigned int num_syms; /* Kernel parameters. */ struct kernel_param *kp; unsigned int num_kp; “modules” list Exported symbols Symbol CRC
  • 18.
    Data structures formodules 17 /* GPL-only exported symbols. */ unsigned int num_gpl_syms; const struct kernel_symbol *gpl_syms; const unsigned long *gpl_crcs; ... #ifdef CONFIG_MODULE_SIG /* Signature was verified. */ bool sig_ok; #endif ... /* Exception table */ unsigned int num_exentries; struct exception_table_entry *extable; /* Startup function. */ int (*init)(void); /* If this is non-NULL, vfree after init() returns */ void *module_init; ... /* Here is the actual code + data, vfree'd on unload. */ void *module_core; GPL Symbols “init” function “init” sections Other (core) sections
  • 19.
    Data structures formodules 18 /* Here are the sizes of the init and core sections */ unsigned int init_size, core_size; /* The size of the executable code in each section. */ unsigned int init_text_size, core_text_size; /* Size of RO sections of the module (text+rodata) */ unsigned int init_ro_size, core_ro_size; /* Arch-specific module values */ struct mod_arch_specific arch; ... /* The command line arguments (may be mangled). People like keeping pointers to this stuff */ char *args; ... #ifdef CONFIG_SMP /* Per-cpu data. */ void __percpu *percpu; unsigned int percpu_size; #endifz Sizes of sections Command line parameters Per-CPU Datas
  • 20.
    Data structures formodules 19 ... #ifdef CONFIG_MODULE_UNLOAD /* What modules depend on me? */ struct list_head source_list; /* What modules do I depend on? */ struct list_head target_list; /* Destruction function. */ void (*exit)(void); struct module_ref __percpu *refptr; #endif #ifdef CONFIG_CONSTRUCTORS /* Constructor functions. */ ctor_fn_t *ctors; unsigned int num_ctors; #endif }; (include/linux/module.h) Lists to manage dependencies (only unload is enabled)
  • 21.
    Module state • statein struct module • During its load, state becomes (created) -> UNFORMED -> COMING -> LIVE. • During its unload, state becomes LIVE -> GOING -> (removed) 20 state description MODULE_STATE_UNFORMED Appeared in the modules list, but still during set up MODULE_STATE_COMING Fully formed. Running module_init. MODULE_STATE_LIVE Normal state. MODULE_STATE_GOING Being unloaded.
  • 22.
    Global module information VariablesDescription LIST_HEAD(modules) List of modules that are in the kernel. DEFINE_MUTEX(module_mutex) Protection against “modules,” etc. • Add : RCU list operations • Remove : stop_machine(~3.18) 21 /* * Mutex protects: * 1) List of modules (also safely readable with preempt_disable), * 2) module_use links, * 3) module_addr_min/module_addr_max. * (delete uses stop_machine/add uses RCU list operations). */ DEFINE_MUTEX(module_mutex); EXPORT_SYMBOL_GPL(module_mutex);
  • 23.
    Loading a Module •Load the whole module file onto memory • Parse the ELF and module information • Check the module information to determine whether the module is loadable or not • Layout the sections and copy to the final location • Add the module to the kernel • Resolve the symbols and apply relocations • Copy module parameters • Call the init function 22 System Calls load_module layout_and_allocate setup_load_info check_mod_info layout_sections layout_symtabs move_module add_unformed_mo dule simply_symbols apply_relocations do_init_module UNFORMED COMING LIVE
  • 24.
    Unloading a Module •Check if the reference count of the module is zero • If zero or it is forced unloading, then set the state to GOING • If not zero, it fails • Call the “exit” function • Free and cleanup everything 23 sys_delete_module try_stop_module __try_stop_module free_module
  • 25.
    stop_machine (-3.18) • UntilLinux 3.18, the reference count check and module remove in module unloading is implemented with stop_machine. 24 static int try_stop_module(struct module *mod, int flags, int *forced) { struct stopref sref = { mod, flags, forced }; return stop_machine(__try_stop_module, &sref, NULL); } static void free_module(struct module *mod) { ... mutex_lock(&module_mutex); stop_machine(__unlink_module, mod, NULL); mutex_unlock(&module_mutex); ... }
  • 26.
    Now (3.19) • Referencecount is now atomic_t (was per-cpu int before) and checked without stop_machine • (thanks to a mysterious guy) 25 static int try_stop_module(struct module *mod, int flags, int *forced) { /* If it's not unused, quit unless we're forcing. */ if (try_release_module_ref(mod) != 0) { *forced = try_force_unload(flags); if (!(*forced)) return -EWOULDBLOCK; } /* Mark it as dying. */ mod->state = MODULE_STATE_GOING; return 0; }
  • 27.
    Now (3.19) • Stop_machinealso goes away from removing 26 static void free_module(struct module *mod) { ... /* Now we can delete it from the lists */ mutex_lock(&module_mutex); /* Unlink carefully: kallsyms could be walking list. */ list_del_rcu(&mod->list); /* Remove this module from bug list, this uses list_del_rcu */ module_bug_cleanup(mod); /* Wait for RCU synchronizing before releasing mod->list and buglist. */ synchronize_rcu(); mutex_unlock(&module_mutex); ... }
  • 28.
  • 29.
    sys_init_module/sys_finit_module • Initialize aload_info structure • Check whether module load is permitted or not. (may_init_module function) • [finit only] Flags check • [init only] Copy module data in user memory to kernel memory (copy_module_from_user function) • [finit only] Read from the fd into kernel memory (copy_module_from_fd function) • Call the load_module function 28
  • 30.
    may_init_module • Capability: CAP_SYS_MODULE •“module_disabled” parameter • Blocks loading and unloading of modules 29 /* Block module loading/unloading? */ int modules_disabled = 0; core_param(nomodule, modules_disabled, bint, 0); ... static int may_init_module(void) { if (!capable(CAP_SYS_MODULE) || modules_disabled) return -EPERM; return 0; } (kernel/module.c) # sysctl kernel.modules_disabled kernel.modules_disabled = 0
  • 31.
    copy_module_from_fd
    • Pass the file struct to the security module
    • vmalloc an area for the module data
    • Load the whole module file into the area
    • Set info->hdr to point to the area

        static int copy_module_from_fd(int fd, struct load_info *info)
        {
            ...
            err = security_kernel_module_from_file(f.file);
            if (err)
                goto out;
            ...
            info->hdr = vmalloc(stat.size);
            if (!info->hdr) {
                err = -ENOMEM;
                goto out;
            }
            ...
            while (pos < stat.size) {
                bytes = kernel_read(f.file, pos, (char *)(info->hdr) + pos,
                                    stat.size - pos);
                ...
            }
            info->len = pos;
            ...
  • 32.
    copy_module_from_user
    • Differences from copy_module_from_fd:
        • Passes a “NULL” pointer to the security module
        • Uses copy_from_user instead of kernel_read

        static int copy_module_from_user(const void __user *umod, unsigned long len,
                                         struct load_info *info)
        {
            ...
            info->len = len;
            ...
            err = security_kernel_module_from_file(NULL);
            if (err)
                return err;
            ...
            /* Suck in entire file: we'll want most of it. */
            info->hdr = vmalloc(info->len);
            if (!info->hdr)
                return -ENOMEM;
            ...
            if (copy_from_user(info->hdr, umod, info->len) != 0) {
                vfree(info->hdr);
                return -EFAULT;
            }

            return 0;
  • 33.
    load_module function (1)
    • Signature check (module_sig_check)
    • ELF header check (elf_header_check)
    • Lay out and allocate the final location for the module (layout_and_allocate)
    • Add the module to the “modules” list (add_unformed_module)
    • Allocate per-cpu areas used in the module (percpu_modalloc)
    • Initialize the linked lists used for dependency management and unloading (module_unload_init)
    • Find optional sections (find_module_sections)
    • License and version dirty hack (check_module_license_and_versions)
    • Set up MODINFO_ATTR fields (setup_modinfo)
  • 34.
    load_module function (2)
    • Resolve the symbols (simplify_symbols)
    • Fix up the addresses in the module (apply_relocations)
    • Extable and per-cpu initialization (post_relocation)
    • Flush the I-cache for the module area (flush_module_icache)
    • Copy the module parameters to mod->args
    • Check for duplicated symbols, and set up NX attributes (complete_formation)
    • Parse the module parameters (parse_args)
    • sysfs setup (mod_sysfs_setup)
    • Free the copy in the load_info structure (free_copy)
    • Call the init function of the module (do_init_module)
  • 35.
    module_sig_check
    • Check the signature in the module (if CONFIG_MODULE_SIG=y)
    • If a module is signed, the “signature” and a “marker” reside at the tail of the module file:
        [ Module (ELF) ][ Signature ][ Marker “~Module signature appended~\n” ]
    • If the signature is OK, module->sig_ok is set to true.
    • If no signature is found (-ENOKEY) and signatures are not enforced, it returns success (0).
    • Signatures are enforced either
        • when CONFIG_MODULE_SIG_FORCE is y, or
        • when the “sig_enforce” parameter is set

        $ hd /lib/modules/3.13.0-45-generic/kernel/fs/btrfs/btrfs.ko
        0014b470  f8 a6 b7 74 01 06 01 1e 14 00 00 00 00 00 02 02  |...t............|
        0014b480  7e 4d 6f 64 75 6c 65 20 73 69 67 6e 61 74 75 72  |~Module signatur|
        0014b490  65 20 61 70 70 65 6e 64 65 64 7e 0a              |e appended~.|
        0014b49c
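    The marker test at the tail of the file can be sketched in user-space C as follows. This is only an illustration of the idea, not the kernel's code: the helper name strip_sig_marker is hypothetical, and the actual signature verification that follows the marker check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Marker string found at the very end of a signed module file. */
#define SIG_MARKER "~Module signature appended~\n"

/*
 * Hypothetical helper: return true if the buffer ends with the marker,
 * and shrink *len so that it no longer covers the marker.
 * Returns false (and leaves *len alone) for unsigned images.
 */
bool strip_sig_marker(const unsigned char *buf, size_t *len)
{
    const size_t marker_len = sizeof(SIG_MARKER) - 1;

    assert(buf != NULL && len != NULL);
    if (*len <= marker_len)
        return false;
    if (memcmp(buf + *len - marker_len, SIG_MARKER, marker_len) != 0)
        return false;
    *len -= marker_len;
    return true;
}
```

    After the marker is stripped, the bytes just before it would hold the signature and its descriptor, which is what the real check goes on to verify.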
  • 36.
    elf_header_check
    • Sanity check for the ELF header
        • The magic number is correct
        • The architecture is correct
        • The length is large enough to contain all the section headers, etc.

        static int elf_header_check(struct load_info *info)
        {
            if (info->len < sizeof(*(info->hdr)))
                return -ENOEXEC;

            if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0
                || info->hdr->e_type != ET_REL
                || !elf_check_arch(info->hdr)
                || info->hdr->e_shentsize != sizeof(Elf_Shdr))
                return -ENOEXEC;

            if (info->hdr->e_shoff >= info->len
                || (info->hdr->e_shnum * sizeof(Elf_Shdr) >
                    info->len - info->hdr->e_shoff))
                return -ENOEXEC;

            return 0;
        }
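    The same checks can be tried outside the kernel on any buffer that holds a .ko file. The sketch below assumes a Linux host with <elf.h> and a 64-bit module; check_elf64_rel is a hypothetical name, and the architecture check (elf_check_arch) is deliberately omitted.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return 0 if buf looks like a relocatable ELF64 object, -1 otherwise. */
int check_elf64_rel(const void *buf, size_t len)
{
    Elf64_Ehdr hdr;

    assert(buf != NULL);
    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));     /* avoids unaligned access */

    /* Magic number, object type, and section header entry size. */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr.e_type != ET_REL ||
        hdr.e_shentsize != sizeof(Elf64_Shdr))
        return -1;

    /* The whole section header table must lie inside the buffer. */
    if (hdr.e_shoff >= len ||
        (size_t)hdr.e_shnum * sizeof(Elf64_Shdr) > len - hdr.e_shoff)
        return -1;

    return 0;
}
```

    Note that the length test subtracts e_shoff from len only after confirming e_shoff < len, so the subtraction cannot wrap around.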
  • 37.
    ELF Header
    • load_info.hdr (Elf_Ehdr *) = the head of the kernel module file (.ko) = the head of the ELF image
    • Elf_Ehdr members: e_ident, e_type, e_shoff, e_shentsize, e_shnum, e_shstrndx, …
        • e_ident: magic ('\x7fELF'), 32/64-bit, etc. (16 bytes in total incl. padding)
        • e_type: ET_REL / ET_EXEC / ET_DYN
    • The section header table starts e_shoff bytes from the head of the file and holds e_shnum Elf_Shdr entries of e_shentsize bytes each
  • 38.
    layout_and_allocate
    • Fill in the section information of the load_info, and create a module structure pointing to the temporary location (setup_load_info)
    • Check the module information and report if the module taints the kernel (check_modinfo)
    • Calculate the size required for the final location of the module (layout_sections / layout_symtab)
    • Allocate memory of the calculated size, copy the contents of the module, and move the pointer of the module structure there (move_module)
  • 39.
    setup_load_info
    • Set the following members according to the ELF header and section headers.
        • sechdrs (pointer to the section headers)
        • secstrings (pointer to the string section that contains section names)
        • index.info, index.vers (section indices of modinfo, versions)
        • index.sym, index.str (section indices of symbols, strings)
        • strtab (pointer to the string section)
        • index.mod (section index of the module section)
            • “.gnu.linkonce.this_module” section
            • Set the module pointer to this section (temporarily)
        • index.pcpu (section index of the per-cpu section)
            • “.data..percpu” section (if it exists)
    • Return a pointer to a (temporary) module structure
  • 40.
    setup_load_info (example)
    • info->sechdrs: the section header table (Elf_Shdr (0), …) that follows the Elf_Ehdr
    • info->secstrings: contents of the .shstrtab section (Elf_Shdr (23))
    • info->strtab: contents of the .strtab section (Elf_Shdr (25))
    • Each section’s offset is stored in Elf_Shdr.sh_offset
    • info->index.info = 12 (.modinfo)
    • info->index.vers = 16 (__versions)
    • info->index.sym = 24 (.symtab)
    • info->index.str = 25 (.strtab)
    • info->index.mod = 18 (.gnu.linkonce.this_module)
        • struct module *mod points to the contents of this section
    • info->index.pcpu = 0
        • No per-cpu data in this example.
  • 41.
    check_modinfo (1)
    • Check the “modinfo” in the module: verify that the version magic is identical to that of the current kernel, and mark the kernel “tainted” if the module taints it.
    • “Modinfo” resides in the “.modinfo” section, and is composed of zero-terminated strings of key-value pairs joined by “=”.

        description=Hello world kernel module\0
        author=Taku Shimosawa <shimos@shimos.net>\0
        license=GPL v2\0
        srcversion=8D5BACDC1EA9421ABFF79DD\0
        depends=\0
        vermagic=3.13.0-44-generic SMP mod_unload modversions
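    A lookup over such a buffer of NUL-terminated “key=value” strings might look like the sketch below. It only illustrates the format: modinfo_lookup is a hypothetical user-space helper, not the kernel's get_modinfo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find "key" in a buffer of NUL-terminated
 * "key=value" strings and return a pointer to the value, or NULL
 * if the key is not present.
 */
const char *modinfo_lookup(const char *buf, size_t size, const char *key)
{
    assert(buf != NULL && key != NULL);

    const size_t keylen = strlen(key);
    const char *p = buf;
    const char *end = buf + size;

    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));

        if (nul == NULL)
            break;                      /* unterminated tail: stop */
        if ((size_t)(nul - p) > keylen &&
            strncmp(p, key, keylen) == 0 && p[keylen] == '=')
            return p + keylen + 1;      /* value starts after '=' */
        p = nul + 1;                    /* next string */
    }
    return NULL;
}
```

    With the example section above, looking up “license” would yield “GPL v2”, and looking up “depends” would yield an empty string rather than NULL.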
  • 42.
    check_modinfo (2)
    • First, check the version magic in the module

        static int check_modinfo(struct module *mod, struct load_info *info, int flags)
        {
            const char *modmagic = get_modinfo(info, "vermagic");
            ...
            if (flags & MODULE_INIT_IGNORE_VERMAGIC)
                modmagic = NULL;
            ...
            if (!modmagic) {
                err = try_to_force_load(mod, "bad vermagic");
                if (err)
                    return err;
            } else if (!same_magic(modmagic, vermagic, info->index.vers)) {
                pr_err("%s: version magic '%s' should be '%s'\n",
                       mod->name, modmagic, vermagic);
                return -ENOEXEC;
            }
  • 43.
    check_modinfo (3)
    • Version magic

        #define VERMAGIC_STRING \
            UTS_RELEASE " " \
            MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \
            MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS \
            MODULE_ARCH_VERMAGIC
        (include/linux/vermagic.h)

    • Example:
        3.13.0-44-generic SMP mod_unload modversions
    • same_magic function
        • Compares the two vermagic strings. If the module has CRCs (a versions section), the leading kernel release string is skipped and only the remaining flags are compared.
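    The comparison can be sketched as below. This is modelled on my reading of same_magic in kernel/module.c with CONFIG_MODVERSIONS enabled, so treat the details as an assumption: when the module carries CRCs, everything before the first space (the release string) is skipped on both sides. The name vermagic_matches is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch of a vermagic comparison.
 * has_crcs: the module carries per-symbol CRCs (a versions section).
 */
bool vermagic_matches(const char *mod_magic, const char *kernel_magic,
                      bool has_crcs)
{
    assert(mod_magic != NULL && kernel_magic != NULL);
    if (has_crcs) {
        /* Skip the release string, i.e. everything before the first space. */
        mod_magic += strcspn(mod_magic, " ");
        kernel_magic += strcspn(kernel_magic, " ");
    }
    return strcmp(mod_magic, kernel_magic) == 0;
}
```

    Under this scheme a module built for 3.13.0-44-generic could pass the vermagic test on 3.13.0-45-generic when CRCs are present, leaving the per-symbol CRC check to catch real ABI differences.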
  • 44.
    check_modinfo (4)
    • …and mark the kernel tainted if necessary

        if (!get_modinfo(info, "intree"))
            add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);

        if (get_modinfo(info, "staging")) {
            add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
            pr_warn("%s: module is from the staging directory, the quality "
                    "is unknown, you have been warned.\n", mod->name);
        }

        /* Set up license info based on the info section */
        set_license(mod, get_modinfo(info, "license"));
  • 45.
    check_modinfo (5)
    • License information is also important

        static void set_license(struct module *mod, const char *license)
        {
            if (!license)
                license = "unspecified";

            if (!license_is_gpl_compatible(license)) {
                if (!test_taint(TAINT_PROPRIETARY_MODULE))
                    pr_warn("%s: module license '%s' taints kernel.\n",
                            mod->name, license);
                add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
                                 LOCKDEP_NOW_UNRELIABLE);
            }
        }
  • 46.
    check_modinfo (6)
    • GPL compatible?
        • See the “GPL\0…” case

        static inline int license_is_gpl_compatible(const char *license)
        {
            return (strcmp(license, "GPL") == 0
                    || strcmp(license, "GPL v2") == 0
                    || strcmp(license, "GPL and additional rights") == 0
                    || strcmp(license, "Dual BSD/GPL") == 0
                    || strcmp(license, "Dual MIT/GPL") == 0
                    || strcmp(license, "Dual MPL/GPL") == 0);
        }
        (include/linux/license.h)
  • 47.
    check_modinfo (7)
    • Also, the kernel is marked tainted when the module is loaded forcefully

        static int try_to_force_load(struct module *mod, const char *reason)
        {
        #ifdef CONFIG_MODULE_FORCE_LOAD
            if (!test_taint(TAINT_FORCED_MODULE))
                pr_warn("%s: %s: kernel tainted.\n", mod->name, reason);
            add_taint_module(mod, TAINT_FORCED_MODULE, LOCKDEP_NOW_UNRELIABLE);
            return 0;
        #else
            return -ENOEXEC;
        #endif
        }
  • 48.
    Taints!
    • The tainted mask is composed of several flags, each of which identifies a reason for tainting
    • Lockdep is disabled if it can no longer work reliably
        • e.g. ignoring the version magic, proprietary drivers, forced unload
    • add_taint sets the kernel-global flags; add_taint_module additionally sets the per-module flags

        void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
        {
            if (lockdep_ok == LOCKDEP_NOW_UNRELIABLE && __debug_locks_off())
                pr_warn("Disabling lock debugging due to kernel taint\n");

            set_bit(flag, &tainted_mask);    /* kernel global flags */
        }
        (kernel/panic.c)

        static inline void add_taint_module(struct module *mod, unsigned flag,
                                            enum lockdep_ok lockdep_ok)
        {
            add_taint(flag, lockdep_ok);
            mod->taints |= (1U << flag);     /* per-module flags */
        }
        (kernel/module.c)
  • 49.
    Taints!
    • 15 reasons are defined

        #define TAINT_PROPRIETARY_MODULE        0
        #define TAINT_FORCED_MODULE             1
        #define TAINT_CPU_OUT_OF_SPEC           2
        #define TAINT_FORCED_RMMOD              3
        #define TAINT_MACHINE_CHECK             4
        #define TAINT_BAD_PAGE                  5
        #define TAINT_USER                      6
        #define TAINT_DIE                       7
        #define TAINT_OVERRIDDEN_ACPI_TABLE     8
        #define TAINT_WARN                      9
        #define TAINT_CRAP                      10
        #define TAINT_FIRMWARE_WORKAROUND       11
        #define TAINT_OOT_MODULE                12
        #define TAINT_UNSIGNED_MODULE           13
        #define TAINT_SOFTLOCKUP                14
        (include/linux/kernel.h)

        $ sysctl kernel.tainted
        kernel.tainted = 12288

    • 12288 = 0x3000, i.e. bits 12 and 13 are set (TAINT_OOT_MODULE and TAINT_UNSIGNED_MODULE)
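    Decoding a kernel.tainted value by hand is error-prone, so here is a small user-space sketch that maps set bits back to the names listed above. The function name decode_taint and the name table are made up for this illustration; only the bit numbers come from the list.

```c
#include <assert.h>
#include <stddef.h>

/* Bit number -> name, in the order of the TAINT_* definitions. */
static const char *const taint_names[] = {
    "PROPRIETARY_MODULE", "FORCED_MODULE", "CPU_OUT_OF_SPEC",
    "FORCED_RMMOD", "MACHINE_CHECK", "BAD_PAGE", "USER", "DIE",
    "OVERRIDDEN_ACPI_TABLE", "WARN", "CRAP", "FIRMWARE_WORKAROUND",
    "OOT_MODULE", "UNSIGNED_MODULE", "SOFTLOCKUP",
};

/*
 * Hypothetical helper: store the names of the bits set in mask into
 * out[] (at most max entries) and return how many were stored.
 */
size_t decode_taint(unsigned long mask, const char *out[], size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t bit = 0; bit < sizeof(taint_names) / sizeof(taint_names[0]); bit++) {
        if ((mask & (1UL << bit)) && n < max)
            out[n++] = taint_names[bit];
    }
    return n;
}
```

    For the value 12288 shown above, this yields two entries: OOT_MODULE (bit 12) and UNSIGNED_MODULE (bit 13).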
  • 50.
    layout_sections
    • Calculate the size of the final memory into which the module is loaded
    • Only sections with the “SHF_ALLOC” flag set are loaded
    • Sizes are calculated separately for “core” and “init”
        • A section is an “init” section if its name starts with “.init”
    • Sets the following members of the module structure
        • core_size : sum of the sizes of the “core” sections to be loaded
        • core_text_size, core_ro_size : sum of the sizes of the text and R/O “core” sections
        • init_size : sum of the sizes of the “init” sections to be loaded
        • init_text_size, init_ro_size : the same for the “init” sections
    • sh_entsize in Elf_Shdr is (re)used to hold the offset within the final memory at which the section will be loaded.
  • 51.
    layout_sections
    • The sections in the example “hello.ko” are categorized as follows:

        Region      Kind    Sections
        Core        Text    .text, .exit.text
        Core        R/O     __ksymtab, __kcrctab, .rodata.str1.1, __ksymtab_strings, __mcount_loc
        Core        R/W     .data, .gnu.linkonce.this_module, .bss
        Init        Text    .init.text
        Init        R/O     (none)
        Init        R/W     (none)
        (Others)    Not loaded
                            .rela.text, .rela.init.text, .rela__ksymtab, .rela__kcrctab,
                            .rela__mcount_loc, .rela.gnu.linkonce.this_module,
                            .comment, .note.GNU-stack, .shstrtab, .symtab, .strtab,
                            .modinfo, __versions (*)

    (*) These two sections originally have SHF_ALLOC, but the flag is dropped by rewrite_section_headers
  • 52.
    layout_symtab
    • Put the symtab and strtab at the end of the init part
        • (Strictly speaking, this function does not place them; it only increases init_size by the size of the symtab)
    • Put the symtab and strtab for the core symbols at the end of the core part.
  • 53.
    move_module
    • Allocate the final memory for the module, and update the boundary addresses of the module area (module_alloc_update_bounds)
    • Copy the section contents and update each section's sh_addr

        static void *module_alloc_update_bounds(unsigned long size)
        {
            void *ret = module_alloc(size);

            if (ret) {
                mutex_lock(&module_mutex);
                if ((unsigned long)ret < module_addr_min)
                    module_addr_min = (unsigned long)ret;
                if ((unsigned long)ret + size > module_addr_max)
                    module_addr_max = (unsigned long)ret + size;
                mutex_unlock(&module_mutex);
            }
            return ret;
        }
  • 54.
    module_alloc : x86
    • x86
        • get_module_load_offset() picks a random load offset on its first call if KASLR is enabled

        #define MODULES_VADDR   VMALLOC_START
        #define MODULES_END     VMALLOC_END
        (arch/x86/include/asm/pgtable_32_types.h)

        #define MODULES_VADDR   (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
        #define MODULES_END     _AC(0xffffffffff000000, UL)
        (arch/x86/include/asm/pgtable_64_types.h)

        void *module_alloc(unsigned long size)
        {
            if (PAGE_ALIGN(size) > MODULES_LEN)
                return NULL;
            return __vmalloc_node_range(size, 1,
                                        MODULES_VADDR + get_module_load_offset(),
                                        MODULES_END, GFP_KERNEL | __GFP_HIGHMEM,
                                        PAGE_KERNEL_EXEC, NUMA_NO_NODE,
                                        __builtin_return_address(0));
        }
        (arch/x86/kernel/module.c)
  • 55.
    module_alloc : ARM
    • ARM

        #ifndef CONFIG_THUMB2_KERNEL
        #define MODULES_VADDR   (PAGE_OFFSET - SZ_16M)
        #else
        /* smaller range for Thumb-2 symbols relocation (2^24)*/
        #define MODULES_VADDR   (PAGE_OFFSET - SZ_8M)
        #endif
        (arch/arm/include/asm/memory.h)

        #define MODULES_END     (PAGE_OFFSET)
        #define MODULES_VADDR   (MODULES_END - SZ_64M)
        (arch/arm64/include/asm/memory.h)

        #ifdef CONFIG_MMU
        void *module_alloc(unsigned long size)
        {
            return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
                                        GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE,
                                        __builtin_return_address(0));
        }
        #endif
        (arch/arm/kernel/module.c)
  • 56.
    Module to its final place
    • Until now, the struct module of the module being loaded pointed into the temporary module image
    • The contents have now been copied to the final location, so the pointer is also changed to point there

        /* Module has been copied to its final place now: return it. */
        mod = (void *)info->sechdrs[info->index.mod].sh_addr;
  • 57.
    load_module function (1) [Re]
    • Signature check (module_sig_check)
    • ELF header check (elf_header_check)
    • Lay out and allocate the final location for the module (layout_and_allocate)
    • Add the module to the “modules” list (add_unformed_module)
    • Allocate per-cpu areas used in the module (percpu_modalloc)
    • Initialize the linked lists used for dependency management and unloading (module_unload_init)
    • Find optional sections (find_module_sections)
    • License and version dirty hack (check_module_license_and_versions)
    • Set up MODINFO_ATTR fields (setup_modinfo)
  • 58.
    add_unformed_module
    • Add the module to the “modules” list
    • Check whether the same module is being loaded twice
        • If a module with the same name is still being loaded, wait for that load to complete, then try again
        • The retry covers the case in which the earlier load fails
  • 59.
    add_unformed_module

        static int add_unformed_module(struct module *mod)
        {
            mod->state = MODULE_STATE_UNFORMED;
            ...
        again:
            mutex_lock(&module_mutex);
            old = find_module_all(mod->name, strlen(mod->name), true);
            if (old != NULL) {
                if (old->state == MODULE_STATE_COMING
                    || old->state == MODULE_STATE_UNFORMED) {
                    mutex_unlock(&module_mutex);
                    err = wait_finished_loading(mod);
                    if (err)
                        goto out_unlocked;
                    goto again;
                }
                err = -EEXIST;
                goto out;
            }
            list_add_rcu(&mod->list, &modules);
            err = 0;
            ...
  • 60.
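    When loading occurs concurrently
    [Figure: timeline of concurrent loads. The first load of module A goes UNFORMED → COMING → LIVE. A second, concurrent load of module A stays UNFORMED and then fails. Module B (depends on A) is UNFORMED, tries to resolve A's symbols, waits until wake_up_all (in do_init_module) is called, resolves again, and becomes LIVE.]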
  • 61.
    percpu_modalloc
    • Allocate a per-cpu area of the size of the per-cpu section

        static int percpu_modalloc(struct module *mod, struct load_info *info)
        {
            Elf_Shdr *pcpusec = &info->sechdrs[info->index.pcpu];
            unsigned long align = pcpusec->sh_addralign;

            if (!pcpusec->sh_size)
                return 0;
            ...
            mod->percpu = __alloc_reserved_percpu(pcpusec->sh_size, align);
            if (!mod->percpu) {
                pr_warn("%s: Could not allocate %lu bytes percpu data\n",
                        mod->name, (unsigned long)pcpusec->sh_size);
                return -ENOMEM;
            }
            mod->percpu_size = pcpusec->sh_size;
            return 0;
        }
  • 62.
    module_unload_init
    • Initialize the reference counter of the module
        • After this function, it is 2.
    • Initialize the lists that manage dependencies
        • source_list : list of “usages” in which the module appears as the “source” (= the list of modules that use symbols of this module)
        • target_list : list of “usages” in which the module appears as the “target” (= the list of modules whose symbols this module uses)

        static int module_unload_init(struct module *mod)
        {
            atomic_set(&mod->refcnt, MODULE_REF_BASE);

            INIT_LIST_HEAD(&mod->source_list);
            INIT_LIST_HEAD(&mod->target_list);

            atomic_inc(&mod->refcnt);

            return 0;
        }
  • 63.
    find_module_sections
    • Find additional sections in the module
    • Mostly related to symbol tables and tracers
    • Symbol-related sections
        • __param
        • __ksymtab, __kcrctab
        • __ksymtab_gpl, __kcrctab_gpl
        • __ksymtab_gpl_future, __kcrctab_gpl_future
        • __ksymtab_unused, __kcrctab_unused
        • __ksymtab_unused_gpl, __kcrctab_unused_gpl
    • Other sections
        • .ctors / .init_array
        • __tracepoints_ptrs
        • __jump_table
        • _ftrace_events
        • __trace_printk_fmt
        • __mcount_loc
        • __ex_table
        • __verbose
  • 64.
    check_module_license_and_versions
    • Some hacks for specific modules
        • e.g. the ndiswrapper driver itself may be GPL (it needs symbols exported only to GPL modules), but the drivers it loads will not be GPL, so the kernel is marked tainted

        static int check_module_license_and_versions(struct module *mod)
        {
            if (strcmp(mod->name, "ndiswrapper") == 0)
                add_taint(TAINT_PROPRIETARY_MODULE, LOCKDEP_NOW_UNRELIABLE);

            /* driverloader was caught wrongly pretending to be under GPL */
            if (strcmp(mod->name, "driverloader") == 0)
                add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
                                 LOCKDEP_NOW_UNRELIABLE);

            /* lve claims to be GPL but upstream won't provide source */
            if (strcmp(mod->name, "lve") == 0)
                add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
                                 LOCKDEP_NOW_UNRELIABLE);
            ...
  • 65.
    check_module_license_and_versions
    • Check whether the exported symbols have CRCs (versions)

        #ifdef CONFIG_MODVERSIONS
            if ((mod->num_syms && !mod->crcs)
                || (mod->num_gpl_syms && !mod->gpl_crcs)
                || (mod->num_gpl_future_syms && !mod->gpl_future_crcs)
        #ifdef CONFIG_UNUSED_SYMBOLS
                || (mod->num_unused_syms && !mod->unused_crcs)
                || (mod->num_unused_gpl_syms && !mod->unused_gpl_crcs)
        #endif
                ) {
                return try_to_force_load(mod,
                                         "no versions for exported symbols");
            }
        #endif
            return 0;
  • 66.
    setup_modinfo
    • Call the “setup” callback of each module attribute
        • Only “version” and “srcversion” have a “setup” callback.
    • Module attributes
        • version, srcversion
        • uevent
        • initstate
        • coresize, initsize
        • taint
        • refcnt

        #define MODINFO_ATTR(field) \
        static void setup_modinfo_##field(struct module *mod, const char *s) \
        { \
            mod->field = kstrdup(s, GFP_KERNEL); \
        }
  • 67.
    load_module function (2) [Re]
    • Resolve the symbols (simplify_symbols)
    • Fix up the addresses in the module (apply_relocations)
    • Extable and per-cpu initialization (post_relocation)
    • Flush the I-cache for the module area (flush_module_icache)
    • Copy the module parameters to mod->args
    • Check for duplicated symbols, and set up NX attributes (complete_formation)
    • Parse the module parameters (parse_args)
    • sysfs setup (mod_sysfs_setup)
    • Free the copy in the load_info structure (free_copy)
    • Call the init function of the module (do_init_module)
  • 68.
    simplify_symbols
    • Replace the values of the unresolved symbols in the “symtab” section with their actual addresses

        static int simplify_symbols(struct module *mod, const struct load_info *info)
        {
            Elf_Shdr *symsec = &info->sechdrs[info->index.sym];
            Elf_Sym *sym = (void *)symsec->sh_addr;
            ...
            for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
                const char *name = info->strtab + sym[i].st_name;
                ...
                case SHN_UNDEF:
                    ksym = resolve_symbol_wait(mod, info, name);
                    /* Ok if resolved. */
                    if (ksym && !IS_ERR(ksym)) {
                        sym[i].st_value = ksym->value;
                        break;
                    }

                    /* Ok if weak. */
                    if (!ksym && ELF_ST_BIND(sym[i].st_info) == STB_WEAK)
                        break;
                    ...
  • 69.
    resolve_symbol_wait
    • Wait if the resolved symbol belongs to a module that is still being initialized.

        static const struct kernel_symbol *
        resolve_symbol_wait(struct module *mod,
                            const struct load_info *info,
                            const char *name)
        {
            const struct kernel_symbol *ksym;
            char owner[MODULE_NAME_LEN];

            if (wait_event_interruptible_timeout(module_wq,
                    !IS_ERR(ksym = resolve_symbol(mod, info, name, owner))
                    || PTR_ERR(ksym) != -EBUSY,
                    30 * HZ) <= 0) {
                pr_warn("%s: gave up waiting for init of module %s.\n",
                        mod->name, owner);
            }
            return ksym;
        }
  • 70.
    resolve_symbol
    • Find the symbol in the kernel’s symbol tables and in other modules’ symbol tables (find_symbol)
    • If found, check whether the version (CRC) of the symbol matches the one that the module expects (check_versions)
    • Record the dependency between the module being loaded and the module that owns the symbol (ref_module)
  • 71.
    find_symbol (1)
    • First, try to find it in the kernel

        bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
                                            struct module *owner,
                                            void *data),
                                 void *data)
        {
            struct module *mod;
            static const struct symsearch arr[] = {
                { __start___ksymtab, __stop___ksymtab, __start___kcrctab,
                  NOT_GPL_ONLY, false },
                { __start___ksymtab_gpl, __stop___ksymtab_gpl,
                  __start___kcrctab_gpl,
                  GPL_ONLY, false },
                { __start___ksymtab_gpl_future, __stop___ksymtab_gpl_future,
                  __start___kcrctab_gpl_future,
                  WILL_BE_GPL_ONLY, false },
                ...
            };

            if (each_symbol_in_section(arr, ARRAY_SIZE(arr), NULL, fn, data))
                return true;
  • 72.
    find_symbol (2)
    • Then, try to find it in the modules (those past the UNFORMED state)

            list_for_each_entry_rcu(mod, &modules, list) {
                struct symsearch arr[] = {
                    { mod->syms, mod->syms + mod->num_syms, mod->crcs,
                      NOT_GPL_ONLY, false },
                    { mod->gpl_syms, mod->gpl_syms + mod->num_gpl_syms,
                      mod->gpl_crcs,
                      GPL_ONLY, false },
                    { mod->gpl_future_syms,
                      mod->gpl_future_syms + mod->num_gpl_future_syms,
                      mod->gpl_future_crcs,
                      WILL_BE_GPL_ONLY, false },
                    ...
                };

                if (mod->state == MODULE_STATE_UNFORMED)
                    continue;

                if (each_symbol_in_section(arr, ARRAY_SIZE(arr), mod, fn, data))
                    return true;
            }
            return false;
        }
  • 73.
    find_symbol (3)
    • Binary search in the section!
    • check_symbol checks the found symbol’s target license

        static int cmp_name(const void *va, const void *vb)
        {
            const char *a;
            const struct kernel_symbol *b;
            a = va; b = vb;
            return strcmp(a, b->name);
        }

        static bool find_symbol_in_section(const struct symsearch *syms,
                                           struct module *owner,
                                           void *data)
        {
            struct find_symbol_arg *fsa = data;
            ...
            sym = bsearch(fsa->name, syms->start, syms->stop - syms->start,
                          sizeof(struct kernel_symbol), cmp_name);

            if (sym != NULL && check_symbol(syms, owner, sym - syms->start, data))
                return true;

            return false;
        }
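    The lookup pattern works the same way in user space with the standard bsearch. The sketch below uses a made-up struct ksym and table; the one property it relies on is that the table is sorted by name, which is what makes a binary search valid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct kernel_symbol (illustrative only). */
struct ksym {
    unsigned long value;
    const char *name;
};

/* bsearch comparator: the key is a name, the element is a struct ksym. */
static int cmp_name(const void *key, const void *elem)
{
    const struct ksym *sym = elem;

    return strcmp(key, sym->name);
}

/* Return the entry named "name" in a name-sorted table, or NULL. */
const struct ksym *lookup_symbol(const struct ksym *tab, size_t count,
                                 const char *name)
{
    assert(tab != NULL && name != NULL);
    return bsearch(name, tab, count, sizeof(*tab), cmp_name);
}
```

    If the table were not sorted by name, bsearch could miss entries that are present, so the sort order is part of the table's contract.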
  • 74.
    ref_module
    • If the target module is NULL (= the symbol is in the kernel) or the module already uses the target module, it returns immediately.
    • Increment the reference counter of the target module (if the target module is in the middle of initialization, return -EBUSY)
    • Add a usage
        • Source : the module
        • Target : the target module

        static int add_module_usage(struct module *a, struct module *b)
        {
            struct module_use *use;

            use = kmalloc(sizeof(*use), GFP_ATOMIC);
            ...
            use->source = a;
            use->target = b;
            list_add(&use->source_list, &b->source_list);
            list_add(&use->target_list, &a->target_list);
            return 0;
        }
  • 75.
    Usage example
    • Kernel module A defines function f(); kernel module B defines function g(), which calls f(), so B depends on A
    • struct module A: refcnt : 2
    • struct module B: refcnt : 1
    • struct module_use: source: &B, target: &A
        • linked into A’s source_list and into B’s target_list
  • 76.
    apply_relocations
    • Apply relocations for each “rel” section
    • “rel” sections
        • Section type : SHT_REL or SHT_RELA

        [Nr] Name             Type      Address           Offset    Size              EntSize           Flags  Link  Info  Align
        [ 2] .text            PROGBITS  0000000000000000  00000070  0000000000000019  0000000000000000  AX        0     0     16
        [ 3] .rela.text       RELA      0000000000000000  00000ca8  0000000000000048  0000000000000018           24     2      8
        [ 4] .init.text       PROGBITS  0000000000000000  00000089  0000000000000016  0000000000000000  AX        0     0      1
        [ 5] .rela.init.text  RELA      0000000000000000  00000cf0  0000000000000030  0000000000000018           24     4      8
        [24] .symtab          SYMTAB    0000000000000000  00000db0  00000000000003c0  0000000000000018           25    32      8
        [25] .strtab          STRTAB    0000000000000000  00001170  000000000000014a  0000000000000000            0     0      1
  • 77.
    Relocation
    • Example
        • This function uses the “printk” symbol, which lives outside the module (and also __fentry__)
        • RIP-relative addressing is based on the address of the next instruction

        void say_hello(void)
        {
            printk(KERN_INFO "Hello, World.\n");
        }

        0000000000000000 <say_hello>:
           0:   e8 00 00 00 00          callq  5 <say_hello+0x5>
                                1: R_X86_64_PC32        __fentry__-0x4
           5:   55                      push   %rbp
           6:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                                9: R_X86_64_32S .rodata.str1.1
           d:   31 c0                   xor    %eax,%eax
           f:   48 89 e5                mov    %rsp,%rbp
          12:   e8 00 00 00 00          callq  17 <say_hello+0x17>
                                13: R_X86_64_PC32       printk-0x4
          17:   5d                      pop    %rbp
          18:   c3                      retq
  • 78.
    apply_relocate[_add]
    • Addressing is architecture-dependent, so relocation is also architecture-dependent
    • x86_64 (RELA)
        • A RELA section is an array of Elf64_Rela
        • In the “printk” example
            • r_offset = 0x13
            • r_info = R_X86_64_PC32 (RIP-relative on x86_64)
            • r_addend = -0x04

        typedef struct elf64_rela {
            Elf64_Addr r_offset;    /* Location at which to apply the action */
            Elf64_Xword r_info;     /* index and type of relocation */
            Elf64_Sxword r_addend;  /* Constant addend used to compute value */
        } Elf64_Rela;
  • 79.
    apply_relocate_add in x86_64

        int apply_relocate_add(Elf64_Shdr *sechdrs,
                               const char *strtab,
                               unsigned int symindex,
                               unsigned int relsec,
                               struct module *me)
        {
            ...
            for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
                /* This is where to make the change */
                loc = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
                      + rel[i].r_offset;

                /* This is the symbol it is referring to. Note that all
                   undefined symbols have been resolved. */
                sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
                      + ELF64_R_SYM(rel[i].r_info);
                ...
                val = sym->st_value + rel[i].r_addend;
  • 80.
    apply_relocate_add in x86_64
    • For R_X86_64_PC32, calculate the delta between the current address and the target address

                switch (ELF64_R_TYPE(rel[i].r_info)) {
                ...
                case R_X86_64_64:
                    *(u64 *)loc = val;
                    break;
                ...
                case R_X86_64_32S:
                    *(s32 *)loc = val;
                    if ((s64)val != *(s32 *)loc)
                        goto overflow;
                    break;
                case R_X86_64_PC32:
                    val -= (u64)loc;
                    *(u32 *)loc = val;
        #if 0
                    if ((s64)val != *(s32 *)loc)
                        goto overflow;
        #endif
                    break;
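    A worked example may make the PC32 arithmetic concrete: the stored value is symbol + addend - place, and because the CPU adds the displacement to the address of the next instruction (place + 4 for a 4-byte field at the end of the instruction), the addend of -4 makes the call land exactly on the symbol. The sketch below is user-space C with made-up addresses and a hypothetical helper name, reloc_pc32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: apply an R_X86_64_PC32-style relocation to a
 * code buffer that will run at address code_addr.
 *   place = code_addr + r_offset
 *   value = sym_value + r_addend - place   (truncated to 32 bits)
 * The value is stored at code + r_offset and also returned.
 */
uint32_t reloc_pc32(uint8_t *code, uint64_t code_addr, uint64_t r_offset,
                    uint64_t sym_value, int64_t r_addend)
{
    uint64_t place = code_addr + r_offset;
    uint32_t field = (uint32_t)(sym_value + (uint64_t)r_addend - place);

    assert(code != NULL);
    memcpy(code + r_offset, &field, sizeof(field));
    return field;
}
```

    With code loaded at 0x1000, r_offset 0x13, r_addend -4 and a symbol at 0x2000, the field becomes 0x2000 - 4 - 0x1013 = 0xfe9; the next instruction is at 0x1017, and 0x1017 + 0xfe9 = 0x2000.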
  • 81.
    post_relocation
    • Sort the exception table (sort_extable)
        • Exception table: the addresses of instructions whose page faults the page fault handler treats specially
        • get_user etc.
    • Copy the per-cpu section contents for all the possible cpus (percpu_modcopy)

        for_each_possible_cpu(cpu)
            memcpy(per_cpu_ptr(mod->percpu, cpu), from, size);

    • Set the kallsyms-related members to the final location, and copy the core symtab out of the whole symtab (add_kallsyms)
    • Call the architecture-dependent function that finalizes loading (module_finalize)
  • 82.
    module_finalize in x86_64
    • Alternatives, paravirt and so on.

        int module_finalize(const Elf_Ehdr *hdr,
                            const Elf_Shdr *sechdrs,
                            struct module *me)
        {
            const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
                *para = NULL;
            char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;

            for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
                if (!strcmp(".text", secstrings + s->sh_name))
                    text = s;
                if (!strcmp(".altinstructions", secstrings + s->sh_name))
                    alt = s;
                if (!strcmp(".smp_locks", secstrings + s->sh_name))
                    locks = s;
                if (!strcmp(".parainstructions", secstrings + s->sh_name))
                    para = s;
            }

            if (alt) {
                /* patch .altinstructions */
                void *aseg = (void *)alt->sh_addr;
                apply_alternatives(aseg, aseg + alt->sh_size);
            }
            ...
  • 83.
    flush_module_icache
    • Flush the instruction cache for the text area so that the code is executed correctly

        static void flush_module_icache(const struct module *mod)
        {
            mm_segment_t old_fs;

            /* flush the icache in correct context */
            old_fs = get_fs();
            set_fs(KERNEL_DS);

            if (mod->module_init)
                flush_icache_range((unsigned long)mod->module_init,
                                   (unsigned long)mod->module_init
                                   + mod->init_size);
            flush_icache_range((unsigned long)mod->module_core,
                               (unsigned long)mod->module_core + mod->core_size);

            set_fs(old_fs);
        }
  • 84.
    complete_formation
    • Check whether the exported symbols are already exported by another module (verify_export_symbols)
    • Add section information of symbols for BUG reports (module_bug_finalize)
    • Set NX and RO for the core and init areas.
    • Set the module state to MODULE_STATE_COMING

        mod->state = MODULE_STATE_COMING;
  • 85.
    load_module function (2) [Re]
    • Resolve the symbols (simplify_symbols)
    • Fix up the addresses in the module (apply_relocations)
    • Extable and per-cpu initialization (post_relocation)
    • Flush the I-cache for the module area (flush_module_icache)
    • Copy the module parameters to mod->args
    • Check for duplicated symbols, and set up NX attributes (complete_formation)
    • Parse the module parameters (parse_args)
    • sysfs setup (mod_sysfs_setup)
    • Free the copy in the load_info structure (free_copy)
    • Call the init function of the module (do_init_module)
  • 86.
    do_init_module (1)
    • Prepare a structure that call_rcu will later use to free the init area
    • Call the init function of the module
    • Set the module state to MODULE_STATE_LIVE

        struct mod_initfree *freeinit;

        freeinit = kmalloc(sizeof(*freeinit), GFP_KERNEL);
        ...
        freeinit->module_init = mod->module_init;

        do_mod_ctors(mod);
        /* Start the module */
        if (mod->init != NULL)
            ret = do_one_initcall(mod->init);

        mod->state = MODULE_STATE_LIVE;
  • 87.
    do_init_module (2)
    • To avoid a deadlock, wait for outstanding async work to finish (async_synchronize_full)
    • Drop the initial reference
    • Clear the init-related data

        if (current->flags & PF_USED_ASYNC)
            async_synchronize_full();

        mutex_lock(&module_mutex);
        /* Drop initial reference. */
        module_put(mod);
        trim_init_extable(mod);
        #ifdef CONFIG_KALLSYMS
        mod->num_symtab = mod->core_num_syms;
        mod->symtab = mod->core_symtab;
        mod->strtab = mod->core_strtab;
        #endif
        unset_module_init_ro_nx(mod);
        module_arch_freeing_init(mod);
  • 88.
    do_init_module (3)
    • Finally, free the init area
    • Wake up anyone waiting for the initialization to complete

        call_rcu(&freeinit->rcu, do_free_init);
        mutex_unlock(&module_mutex);
        wake_up_all(&module_wq);
  • 89.
  • 90.
    sys_delete_module
    • Check the capability and the module blocking parameter
    • Find the specified module by name
    • If the module has an init function but no exit function, and the unload is not forced, fail with -EBUSY
    • Try to stop the module (try_stop_module)
    • Call the exit function
    • Free the module
  • 91.
    Now (3.19) [Re]
    • The reference count is now an atomic_t (it was a per-cpu int before) and is checked without stop_machine
        • (thanks to a mysterious guy)

        static int try_stop_module(struct module *mod, int flags, int *forced)
        {
            /* If it's not unused, quit unless we're forcing. */
            if (try_release_module_ref(mod) != 0) {
                *forced = try_force_unload(flags);
                if (!(*forced))
                    return -EWOULDBLOCK;
            }

            /* Mark it as dying. */
            mod->state = MODULE_STATE_GOING;

            return 0;
        }
  • 92.
    try_release_module_ref
    • Decrement the reference counter and check whether it has reached zero (= the module can be unloaded).

        static int try_release_module_ref(struct module *mod)
        {
            int ret;

            /* Try to decrement refcnt which we set at loading */
            ret = atomic_sub_return(MODULE_REF_BASE, &mod->refcnt);
            BUG_ON(ret < 0);
            if (ret)
                /* Someone can put this right now, recover with checking */
                ret = atomic_add_unless(&mod->refcnt, MODULE_REF_BASE, 0);

            return ret;
        }
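    The counting scheme can be simulated in user space with C11 atomics. In this sketch the base value of 1 and all names (fake_module, fake_try_release and so on) are assumptions made for illustration: the counter starts at base + 1 during loading, the extra reference is dropped when init finishes, and unloading succeeds only if subtracting the base brings the counter to zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REF_BASE 1      /* assumed base value, for illustration only */

struct fake_module {
    atomic_int refcnt;
};

/* Mirrors the init step: base reference plus one held during init. */
void fake_unload_init(struct fake_module *m)
{
    atomic_init(&m->refcnt, REF_BASE);
    atomic_fetch_add(&m->refcnt, 1);
}

/* Drop one reference (e.g. when init has finished). */
void fake_put(struct fake_module *m)
{
    atomic_fetch_sub(&m->refcnt, 1);
}

/* Add a to *v unless *v == u; return true if the addition happened. */
static bool add_unless(atomic_int *v, int a, int u)
{
    int cur = atomic_load(v);

    while (cur != u) {
        /* On failure, cur is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &cur, cur + a))
            return true;
    }
    return false;
}

/* Return 0 if the module may be unloaded, nonzero if it is still in use. */
int fake_try_release(struct fake_module *m)
{
    int ret = atomic_fetch_sub(&m->refcnt, REF_BASE) - REF_BASE;

    assert(ret >= 0);
    if (ret)
        /* Still referenced: give the base back unless it just hit zero. */
        ret = add_unless(&m->refcnt, REF_BASE, 0);
    return ret;
}
```

    The add-unless-zero step handles the race in which the last user drops its reference between the subtraction and the recovery: in that case the counter is already zero, nothing is added back, and the release succeeds.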
  • 93.
    Details (3) Building an out-of-tree kernel module
  • 94.
    Build steps (1): .c -> .o
    • Make the .tmp_versions directory and create .tmp_versions/<module>.mod
        • The file contains the names of the final .ko file and of the source .o files
    • Compile .tmp_[name].o from [name].c
    • Calculate the CRCs (versions) for the exported symbols
        • Look for a __ksymtab section in .tmp_[name].o
            • objdump -h (obj) | grep -q __ksymtab
        • Calculate the CRCs of the exported symbols in the source file with genksyms (the output is in linker script format)
        • Link the CRC values into the object file.

        cmd_modversions = \
            if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then \
                $(call cmd_gensymtypes,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
                    > $(@D)/.tmp_$(@F:.o=.ver); \
                $(LD) $(LDFLAGS) -r -o $@ $(@D)/.tmp_$(@F) \
                    -T $(@D)/.tmp_$(@F:.o=.ver); \
                rm -f $(@D)/.tmp_$(@F) $(@D)/.tmp_$(@F:.o=.ver); \
            else \
                mv -f $(@D)/.tmp_$(@F) $@; \
            fi;

    • Example genksyms output:
        _crc_say_hello = 0xb37b83db ;
  • 95.
    Exported Symbols
    • Each exported symbol has a struct in a __ksymtab* section.
        #define __EXPORT_SYMBOL(sym, sec)                               \
                extern typeof(sym) sym;                                 \
                __CRC_SYMBOL(sym, sec)                                  \
                static const char __kstrtab_##sym[]                     \
                __attribute__((section("__ksymtab_strings"), aligned(1))) \
                = VMLINUX_SYMBOL_STR(sym);                              \
                extern const struct kernel_symbol __ksymtab_##sym;      \
                __visible const struct kernel_symbol __ksymtab_##sym    \
                __used                                                  \
                __attribute__((section("___ksymtab" sec "+" #sym), unused)) \
                = { (unsigned long)&sym, __kstrtab_##sym }
        #define EXPORT_SYMBOL(sym) __EXPORT_SYMBOL(sym, "")
        #define EXPORT_SYMBOL_GPL(sym) __EXPORT_SYMBOL(sym, "_gpl")
        #define EXPORT_SYMBOL_GPL_FUTURE(sym) __EXPORT_SYMBOL(sym, "_gpl_future")
        (include/linux/export.h)
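The idea of "one struct per exported symbol, collected in a dedicated section" can be reproduced in user space with GCC or Clang on an ELF platform. The sketch below is a simplification, not the kernel's mechanism: it uses a single section named `demo_ksymtab` (a made-up name) and relies on the `__start_`/`__stop_` symbols that GNU-style linkers generate for sections whose names are valid C identifiers, whereas the kernel uses per-symbol sections and its own linker script. `DEMO_EXPORT`, `struct demo_symbol`, and `demo_find_symbol` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified counterpart of struct kernel_symbol: address plus name. */
struct demo_symbol {
    unsigned long value;
    const char *name;
};

/* Place one table entry for `sym` into the "demo_ksymtab" section. */
#define DEMO_EXPORT(sym)                                              \
    static const struct demo_symbol demo_ksymtab_##sym                \
    __attribute__((section("demo_ksymtab"), used,                     \
                   aligned(__alignof__(struct demo_symbol)))) =       \
        { (unsigned long)&sym, #sym }

/* Section bounds provided by the linker (GNU ld / lld behaviour). */
extern const struct demo_symbol __start_demo_ksymtab[];
extern const struct demo_symbol __stop_demo_ksymtab[];

/* Linear search of the table; returns NULL if the name is not exported. */
static const struct demo_symbol *demo_find_symbol(const char *name)
{
    const struct demo_symbol *s;

    for (s = __start_demo_ksymtab; s < __stop_demo_ksymtab; s++)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

static int say_hello(void) { return 42; }
static int say_bye(void) { return 7; }

DEMO_EXPORT(say_hello);
DEMO_EXPORT(say_bye);
```

A caller can resolve a name with `demo_find_symbol("say_hello")` and cast the stored `value` back to a function pointer. The order of entries inside the section is up to the toolchain, so the lookup does not depend on it.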
  • 96.
    CRC sections
    • Declare CRC symbols in CRC sections with the weak attribute.
        #ifndef __GENKSYMS__
        #ifdef CONFIG_MODVERSIONS
        /* Mark the CRC weak since genksyms apparently decides not to
         * generate a checksums for some symbols */
        #define __CRC_SYMBOL(sym, sec)                                  \
                extern __visible void *__crc_##sym __attribute__((weak)); \
                static const unsigned long __kcrctab_##sym              \
                __used                                                  \
                __attribute__((section("___kcrctab" sec "+" #sym), unused)) \
                = (unsigned long) &__crc_##sym;
        #else
        #define __CRC_SYMBOL(sym, sec)
        #endif
        (include/linux/export.h)
  • 97.
    Build Steps (2): .c -> .o
    • Create the __mcount_loc list (if -pg is enabled)
      • The list of pointers to the locations where “mcount” is called
    • Fix up the dep file
    • Link into a single object file (<module>.o) if the module is composed of multiple object files
  • 98.
    Build Steps (3): Stage 2
    • Create <module>.mod.c and <module>.symvers with the modpost command
    • Compile <module>.mod.c
    • Link <module>.mod.o and <module>.o into the module <module>.ko
        modpost = scripts/mod/modpost                    \
         $(if $(CONFIG_MODVERSIONS),-m)                  \
         $(if $(CONFIG_MODULE_SRCVERSION_ALL),-a,)       \
         $(if $(KBUILD_EXTMOD),-i,-o) $(kernelsymfile)   \
         $(if $(KBUILD_EXTMOD),-I $(modulesymfile))      \
         $(if $(KBUILD_EXTRA_SYMBOLS), $(patsubst %, -e %,$(KBUILD_EXTRA_SYMBOLS))) \
         $(if $(KBUILD_EXTMOD),-o $(modulesymfile))      \
         $(if $(CONFIG_DEBUG_SECTION_MISMATCH),,-S)      \
         $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w)
        MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
        # We can go over command line length here, so be careful.
        quiet_cmd_modpost = MODPOST $(words $(filter-out vmlinux FORCE, $^)) modules
              cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/.o/' | $(modpost) $(MODPOST_OPT) -s -T -
  • 99.
    modpost (1)
    • Collects module information, symbol information, and versions from the kernel symbols and object files, then generates the module source file and the symvers file.
    • Arguments
        $ modpost [Options...] [(Module object files...)]
    • Options
        Option              Description
        -m                  CONFIG_MODVERSIONS (symbol version)
        -a                  CONFIG_MODULE_SRCVERSION_ALL (“srcversion” in modinfo); MD4 of the source files that made the module
        -I (symvers file)   Input symbol versions (kernel symbols)
        -e (symvers file)   Input extra symbol versions
        -o (symvers file)   Output symbol versions (for exported symbols of the module)
        -T (files)          Source (object) file list
  • 100.
    modpost (2)
    • Generate the source file
        for (mod = modules; mod; mod = mod->next) {
                char fname[PATH_MAX];
                ...
                buf.pos = 0;
                add_header(&buf, mod);
                add_intree_flag(&buf, !external_module);
                add_staging_flag(&buf, mod->name);
                err |= add_versions(&buf, mod);
                add_depends(&buf, mod, modules);
                add_moddevtable(&buf, mod);
                add_srcversion(&buf, mod);
                sprintf(fname, "%s.mod.c", mod->name);
                write_if_changed(&buf, fname);
        }
        (scripts/mod/modpost.c)
  • 101.
    modpost (3)
    • Dump the symbol versions
        static void write_dump(const char *fname)
        {
                struct buffer buf = { };
                struct symbol *symbol;
                int n;
                for (n = 0; n < SYMBOL_HASH_SIZE ; n++) {
                        symbol = symbolhash[n];
                        while (symbol) {
                                if (dump_sym(symbol))
                                        buf_printf(&buf, "0x%08x\t%s\t%s\t%s\n",
                                                symbol->crc, symbol->name,
                                                symbol->module->name,
                                                export_str(symbol->export));
                                symbol = symbol->next;
                        }
                }
                write_if_changed(&buf, fname);
        }
        (scripts/mod/modpost.c)
    • Example output line (fields are separated by tabs):
        0xb37b83db say_hello /home/shimos/test_module/hello EXPORT_SYMBOL
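The symvers line layout (CRC, symbol name, module name, and export type, separated by tabs) can be checked with a small user-space round trip. The helper names and buffer sizes below are assumptions made for this sketch; they are not part of modpost.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed symvers record; field sizes are arbitrary for this sketch. */
struct demo_symvers {
    unsigned int crc;
    char sym[128];
    char mod[256];
    char export_type[64];
};

/* Format one record with the same layout that write_dump prints. */
static int demo_format_symvers(char *out, size_t len, unsigned int crc,
                               const char *sym, const char *mod,
                               const char *export_type)
{
    return snprintf(out, len, "0x%08x\t%s\t%s\t%s\n",
                    crc, sym, mod, export_type);
}

/* Parse one record; returns 1 on success, 0 on a malformed line. */
static int demo_parse_symvers(const char *line, struct demo_symvers *rec)
{
    /* %x accepts the 0x prefix; fields must not contain whitespace. */
    return sscanf(line, "%x %127s %255s %63s",
                  &rec->crc, rec->sym, rec->mod, rec->export_type) == 4;
}
```

Formatting the values from the example output and parsing the result back yields the same four fields, which is a quick way to sanity-check hand-written symvers entries.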
  • 102.
    Generated <module>.mod.c (1)
    • Example
        #include <linux/module.h>
        #include <linux/vermagic.h>
        #include <linux/compiler.h>
        MODULE_INFO(vermagic, VERMAGIC_STRING);
        __visible struct module __this_module
        __attribute__((section(".gnu.linkonce.this_module"))) = {
                .name = KBUILD_MODNAME,
                .init = init_module,
        #ifdef CONFIG_MODULE_UNLOAD
                .exit = cleanup_module,
        #endif
                .arch = MODULE_ARCH_INIT,
        };
        static const struct modversion_info ____versions[]
        __used
        __attribute__((section("__versions"))) = {
                { 0x9412fa01, __VMLINUX_SYMBOL_STR(module_layout) },
                { 0x27e1a049, __VMLINUX_SYMBOL_STR(printk) },
                { 0xbdfb6dbb, __VMLINUX_SYMBOL_STR(__fentry__) },
        };
        ...
    • MODULE_INFO(vermagic, ...): additional modinfo is included
    • __this_module: the base of struct module
    • ____versions[]: symbols and (expected) versions which this module depends on
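How such a table of (CRC, name) pairs might be consulted can be sketched in user space. The lookup below is an illustrative assumption, not the kernel's actual checking code; `demo_check_version`, the enum, and the struct are hypothetical, and the name field length is chosen so that one entry occupies 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_LEN (64 - sizeof(unsigned long))

/* Simplified counterpart of one ____versions[] entry. */
struct demo_modversion_info {
    unsigned long crc;
    char name[DEMO_NAME_LEN];
};

enum demo_verdict { DEMO_MATCH, DEMO_MISMATCH, DEMO_NOT_LISTED };

/* Compare the CRC the module expects with the CRC the provider exports. */
static enum demo_verdict
demo_check_version(const struct demo_modversion_info *versions, size_t count,
                   const char *symname, unsigned long provider_crc)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(versions[i].name, symname) != 0)
            continue;
        return versions[i].crc == provider_crc ? DEMO_MATCH : DEMO_MISMATCH;
    }
    return DEMO_NOT_LISTED;
}

/* Same values as the generated example above. */
static const struct demo_modversion_info demo_versions[] = {
    { 0x9412fa01, "module_layout" },
    { 0x27e1a049, "printk" },
    { 0xbdfb6dbb, "__fentry__" },
};
```

With this table, a provider exporting printk with CRC 0x27e1a049 matches, while any other CRC for printk is reported as a mismatch.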
  • 103.
    Generated <module>.mod.c (2)
    • Example
        static const char __module_depends[]
        __used
        __attribute__((section(".modinfo"))) =
        "depends=";
        MODULE_INFO(srcversion, "8D5BACDC1EA9421ABFF79DD");
    • __module_depends: modinfo about dependency (but the kernel does not use this)
    • MODULE_INFO(srcversion, ...): modinfo “srcversion”
  • 104.
    modinfo
    • The modinfo strings are created by macros and concatenated by collecting them into a single section
        #define __MODULE_INFO(tag, name, info)                                    \
        static const char __UNIQUE_ID(name)[]                                     \
          __used __attribute__((section(".modinfo"), unused, aligned(1)))         \
          = __stringify(tag) "=" info
        (include/linux/moduleparam.h)
        #define MODULE_INFO(tag, info) __MODULE_INFO(tag, tag, info)
        ...
        #define MODULE_LICENSE(_license) MODULE_INFO(license, _license)
        ...
        #define MODULE_AUTHOR(_author) MODULE_INFO(author, _author)
        ...
        (include/linux/module.h)
  • 105.
    UNIQUE_ID
        #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
        (include/linux/compiler-gcc4.h)
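The combination of "tag=value" strings collected in one section and `__COUNTER__`-based unique names can be tried in user space with GCC or Clang on an ELF platform. Everything named `DEMO_*` or `demo_*` below is hypothetical, the section is called `demo_modinfo` rather than `.modinfo`, and the sketch relies on the linker-generated `__start_`/`__stop_` symbols instead of parsing an ELF file as the kernel loader does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two-level paste so that __COUNTER__ is expanded before pasting. */
#define DEMO_PASTE2(a, b) a##b
#define DEMO_PASTE(a, b) DEMO_PASTE2(a, b)
#define DEMO_UNIQUE_ID(prefix) \
    DEMO_PASTE(DEMO_PASTE(demo_id_, prefix), __COUNTER__)

/* Emit "tag=info" as a NUL-terminated string into the demo_modinfo section. */
#define DEMO_INFO(tag, info)                                           \
    static const char DEMO_UNIQUE_ID(tag)[]                            \
    __attribute__((section("demo_modinfo"), used, aligned(1))) =       \
        #tag "=" info

/* Section bounds provided by the linker (GNU ld / lld behaviour). */
extern const char __start_demo_modinfo[];
extern const char __stop_demo_modinfo[];

/* Return the value of the first entry with the given tag, or NULL. */
static const char *demo_get_modinfo(const char *tag)
{
    size_t taglen = strlen(tag);
    const char *p = __start_demo_modinfo;

    while (p < __stop_demo_modinfo) {
        if (*p == '\0') {           /* skip padding between strings */
            p++;
            continue;
        }
        if (strncmp(p, tag, taglen) == 0 && p[taglen] == '=')
            return p + taglen + 1;
        p += strlen(p) + 1;         /* advance to the next string */
    }
    return NULL;
}

DEMO_INFO(license, "GPL");
DEMO_INFO(author, "A. Hacker");
DEMO_INFO(author, "B. Hacker");     /* same tag twice: names stay unique */
```

Using the same tag twice compiles because each expansion gets a different counter value in its variable name. Which of the two author entries is found first depends on how the toolchain orders the section.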