Linux Kernel Tour
By:
Samrat Das
AED600
Tour Map Starting From
To – Full functionality working of OS
Topics to be covered
o Introduction
o Kernel Source Organization
o Compilation Process
o Booting Process
o Loading of Kernel
o Initialization Process
o Working of Kernel
o Subsystems of the Kernel
o Introduction to common Kernel APIs
o Kernel Symbols usage
o Introduction to mailing lists & how to contribute to the kernel tree
- (creating a patch and submitting)
Introduction
Introduction - Kernel Map
Let's see how the Linux kernel
source is organized.
Next: Kernel Source Organization
You can get the source from kernel.org or via git.
Kernel Source Organization
Browsing source code
cscope - Tool to browse source code
http://lxr.free-electrons.com – Online Source Browser
Compilation Process
After configuration, when the user types 'make zImage' or 'make
bzImage', the resulting bootable kernel image is stored as
arch/i386/boot/zImage or arch/i386/boot/bzImage, respectively.
Here is how the image is built:
I. C and assembly source files are compiled into ELF relocatable object format
(.o) and some of them are grouped logically into archives (.a) using ar(1).
II. Using ld(1), the above .o and .a are linked into vmlinux which is a statically
linked, non-stripped ELF 32-bit LSB 80386 executable file.
III. System.map is produced by nm vmlinux, irrelevant or uninteresting symbols
are grepped out.
IV. Enter directory arch/i386/boot.
V. Bootsector asm code bootsect.S is preprocessed either with or without
-D__BIG_KERNEL__, depending on whether the target is bzImage or zImage,
into bbootsect.s or bootsect.s respectively.
VI. bbootsect.s is assembled and then converted into 'raw binary' form called
bbootsect (or bootsect.s assembled and raw-converted into bootsect for
zImage).
VII. Setup code setup.S (setup.S includes video.S) is preprocessed into bsetup.s for
bzImage or setup.s for zImage. In the same way as the bootsector code, the
difference is marked by -D__BIG_KERNEL__ present for bzImage. The result is
then converted into 'raw binary' form called bsetup.
VIII. Enter directory arch/i386/boot/compressed and convert
/usr/src/linux/vmlinux to $tmppiggy (tmp filename) in raw binary format,
removing .note and .comment ELF sections.
IX. gzip -9 < $tmppiggy > $tmppiggy.gz
X. Link $tmppiggy.gz into ELF relocatable (ld -r) piggy.o.
XI. Compile compression routines head.S and misc.c (still in
arch/i386/boot/compressed directory) into ELF objects head.o and misc.o.
XII. Link together head.o, misc.o and piggy.o into bvmlinux (or vmlinux for
zImage, don't mistake this for /usr/src/linux/vmlinux!). Note the difference
between -Ttext 0x1000 used for vmlinux and -Ttext 0x100000 for bvmlinux,
i.e. for bzImage compression loader is high-loaded.
XIII. Convert bvmlinux to 'raw binary' bvmlinux.out removing .note and .comment
ELF sections.
XIV. Go back to arch/i386/boot directory and, using the program tools/build, cat
together bbootsect, bsetup and compressed/bvmlinux.out into bzImage
(delete extra 'b' above for zImage). This writes important variables like
setup_sects and root_dev at the end of the bootsector.
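As a rough illustration of the last step, the fields that tools/build writes can be read back from the first 512 bytes of the image. The sketch below assumes the offsets given in the kernel's x86 boot protocol documentation (setup_sects at 0x1F1, root_dev at 0x1FC, the 0xAA55 signature at 0x1FE). The names struct boot_fields and parse_boot_sector() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets within the 512-byte boot sector (x86 boot protocol). */
#define SETUP_SECTS_OFFSET 0x1F1 /* 1 byte: setup code size in 512-byte sectors */
#define ROOT_DEV_OFFSET    0x1FC /* 2 bytes, little-endian: default root device */
#define BOOT_FLAG_OFFSET   0x1FE /* 2 bytes, little-endian: must be 0xAA55 */

struct boot_fields {             /* hypothetical helper type, not a kernel struct */
    unsigned setup_sects;
    unsigned root_dev;
};

/* Returns 0 on success, -1 if the buffer is too short or the signature is missing. */
int parse_boot_sector(const uint8_t *sector, size_t len, struct boot_fields *out)
{
    assert(sector != NULL && out != NULL);
    if (len < 512)
        return -1;

    unsigned flag = sector[BOOT_FLAG_OFFSET] | (sector[BOOT_FLAG_OFFSET + 1] << 8);
    if (flag != 0xAA55)
        return -1;

    out->setup_sects = sector[SETUP_SECTS_OFFSET];
    if (out->setup_sects == 0)
        out->setup_sects = 4;    /* boot protocol: 0 means the historical default of 4 */
    out->root_dev = sector[ROOT_DEV_OFFSET] | (sector[ROOT_DEV_OFFSET + 1] << 8);
    return 0;
}
```

To try it on a real image, read the first 512 bytes of bzImage into a buffer and pass that buffer to parse_boot_sector().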
Result after compilation - bzImage
What's inside:
objdump -D bzImage
Let us see how this kernel works.
Let's start with the boot process.
Booting Process
I. BIOS selects the boot device.
II. BIOS loads the bootsector from the boot device.
III. Bootsector loads setup, decompression routines and
compressed kernel image.
IV. The kernel is uncompressed in protected mode.
V. Low-level initialization is performed by asm code.
VI. High-level C initialization.
Mapping of Kernel and
other peripherals
Initializations – asm
I. Initialize segment values.
II. Initialize page tables.
III. Enable paging by setting PG bit in %cr0.
IV. Zero-clean BSS (on SMP, only first CPU does this).
V. Copy the first 2k of bootup parameters (kernel command
line).
VI. Check CPU type using EFLAGS and, if possible, cpuid, able
to detect 386 and higher.
VII. The first CPU calls start_kernel(), all others call
arch/i386/kernel/smpboot.c:initialize_secondary() if
ready=1, which just reloads esp/eip and doesn't return.
Initializations – high level
I. Take a global kernel lock (it is needed so that only one
CPU goes through initialization).
II. Perform arch-specific setup (memory layout analysis,
copying boot command line again, etc.).
III. Print Linux kernel "banner" containing the version.
IV. Initialize traps.
V. Initialize irqs.
VI. Initialize data required for scheduler.
VII. Initialize time keeping data.
VIII. Initialize softirq subsystem.
IX. Parse boot commandline options.
X. Initialize console.
XI. If module support was compiled into the kernel, initialize
dynamical module loading facility.
XII. If "profile=" command line was supplied, initialize
profiling buffers.
XIII. kmem_cache_init(), initialize most of slab allocator.
XIV. Enable interrupts.
XV. Calculate BogoMips value for this CPU.
XVI. Call mem_init() which calculates max_mapnr,
totalram_pages and high_memory and prints out the
"Memory: ..." line.
XVII. kmem_cache_sizes_init(), finish slab allocator
initialization.
XVIII. Initialize data structures used by procfs.
XIX. fork_init(), create uid_cache, initialise max_threads
based on the amount of memory available and configure
RLIMIT_NPROC for init_task to be max_threads/2.
XX. Create various slab caches needed for VFS, VM, buffer
cache, etc.
XXI. If System V IPC support is compiled in, initialise the IPC
subsystem. Note that for System V shm, this includes
mounting an internal (in-kernel) instance of shmfs
filesystem.
XXII. If quota support is compiled into the kernel, create and
initialise a special slab cache for it.
XXIII. Perform arch-specific "check for bugs" and, whenever
possible, activate workarounds for processor/bus/etc.
bugs. Comparing various architectures reveals that "ia64
has no bugs" and "ia32 has quite a few bugs"; a good
example is the "f00f bug", which is only checked for if the kernel is
compiled for less than 686 and worked around
accordingly.
Finally the kernel is ready to move_to_user_mode()
XXIV. Set a flag to indicate that a schedule should be invoked
at "next opportunity" and create a kernel thread init()
which execs execute_command if supplied via "init=" boot
parameter, or tries to exec /sbin/init, /etc/init,
/bin/init, /bin/sh in this order; if all these fail, panic
with "suggestion" to use "init=" parameter.
XXV. Go into the idle loop, this is an idle thread with pid=0.
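The fallback order in step XXIV can be modelled in user space. This is only a sketch: choose_init() and the two example predicates are hypothetical, and the can_exec callback stands in for "exec succeeded", whereas the real kernel simply attempts each exec in turn and panics if all of them return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fallback order taken from step XXIV above. */
static const char *const init_candidates[] = {
    "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
};

/*
 * Returns the first program that can_exec() accepts, trying the "init="
 * value first when one was supplied. NULL marks the case where the
 * kernel would panic.
 */
const char *choose_init(const char *execute_command,
                        int (*can_exec)(const char *path))
{
    assert(can_exec != NULL);
    if (execute_command && can_exec(execute_command))
        return execute_command;
    for (size_t i = 0; i < sizeof init_candidates / sizeof init_candidates[0]; i++)
        if (can_exec(init_candidates[i]))
            return init_candidates[i];
    return NULL;
}

/* Example predicate: pretends that only /bin/sh exists. */
int only_bin_sh(const char *path)
{
    return strcmp(path, "/bin/sh") == 0;
}

/* Example predicate: pretends that nothing can be executed. */
int nothing_exists(const char *path)
{
    (void)path;
    return 0;
}
```

With only_bin_sh as the predicate, a missing "init=" program and the three missing init paths are all skipped and /bin/sh is chosen.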
Working of Kernel
After exec()ing the init program from one of the
standard places, the kernel has no direct control over
the program flow.
Its role, from now on, is to provide processes with
system calls and to service asynchronous
events.
Multitasking has been set up, and it is now init that
manages multiuser access by fork()ing system
daemons and login processes.
Whenever a program needs to use a system resource,
it makes a system call.
System Call Implementation
• The mechanism to signal the kernel is a software interrupt.
• The interrupt raises an exception; the system then switches to kernel mode and
executes the exception handler, which in this case is the system call handler.
• The defined software interrupt on x86 is the int $0x80 instruction.
• It triggers a switch to kernel mode and the execution of exception
vector 128, which is the system call handler.
• The system call handler is the aptly named function system_call(). It is
architecture dependent and typically implemented in assembly in
entry.S.
• Later x86 processors added a feature known as sysenter. This feature
provides a faster, more specialized way of trapping into the kernel to
execute a system call than using the int instruction.
Denoting the Correct System Call
• On x86, the syscall number is fed to the kernel via the eax register.
• Before causing the trap into the kernel, user-space sticks in eax the
number corresponding to the desired system call.
• The system call handler then reads the value from eax.
• The system_call() function checks the validity of the given system call
number by comparing it to NR_syscalls.
• If it is larger than or equal to NR_syscalls, the function returns
-ENOSYS. Otherwise, the specified system call is invoked:
  call *sys_call_table(,%eax,4)
Because each element in the system call table is 32 bits (four bytes), the
kernel multiplies the given system call number by four to arrive at its
location in the system call table.
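The bounds check and table lookup described above can be sketched in plain C. This is a user-space model, not kernel code: the two entries sys_double and sys_negate and the dispatch() function are hypothetical, and only the names sys_call_table, NR_syscalls and ENOSYS come from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

/* Two hypothetical "system calls" that fill the demo table. */
static long sys_double(long x) { return 2 * x; }
static long sys_negate(long x) { return -x; }

static const syscall_fn sys_call_table[] = { sys_double, sys_negate };
#define NR_syscalls (sizeof sys_call_table / sizeof sys_call_table[0])

/* Mirrors the check described above: numbers >= NR_syscalls yield -ENOSYS. */
long dispatch(unsigned long nr, long arg)
{
    if (nr >= NR_syscalls)
        return -ENOSYS;
    assert(sys_call_table[nr] != NULL);
    /* Array indexing scales nr by the entry size, as (,%eax,4) does. */
    return sys_call_table[nr](arg);
}
```

In C the compiler performs the multiplication by the entry size, which is why the scale factor of 4 only appears explicitly in the assembly version.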
Parameter Passing
In addition to the system call number, most syscalls require that
one or more parameters be passed to them. The easiest way to
do this is via the same means that the syscall number is passed:
• The parameters are stored in registers. On x86, the registers
ebx, ecx, edx, esi, and edi contain, in order, the first five
arguments.
• In the unlikely case of six or more arguments, a single register
is used to hold a pointer to user-space where all the
parameters are stored.
The return value is sent to user-space also via register. On x86,
it is written into the eax register.
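From user space, the number-plus-arguments convention can be observed through the C library's generic syscall() wrapper, which loads the registers and traps into the kernel on the caller's behalf. The sketch below assumes a Linux system with glibc; raw_getpid() and raw_write() are hypothetical helper names. Note that the wrapper reports errors as -1 with errno set, rather than returning the kernel's negative error code directly.

```c
#define _DEFAULT_SOURCE   /* ask glibc to declare syscall() */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* getpid takes no arguments: only the system call number is passed. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* write takes three arguments: file descriptor, buffer and length. */
long raw_write(int fd, const char *msg)
{
    assert(msg != NULL);
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling raw_write(1, "hello\n") prints to standard output without going through the write() library function, and the return value is the number of bytes written.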
We have seen how system calls are
implemented. But what do the
system calls themselves do?
System calls are calls into the subsystems of the kernel.
Now let us look at the subsystems of the kernel.
Subsystems of the Kernel
• Human Interface
• System Interface
• Process Management
• Memory Management
• Storage Handling
• Networking
Human Interface
The kernel subsystem that handles the system's input and
output devices.
It controls the functionality of:
• Keyboard
• Console screen
• Mouse
• Etc.
System Interface
Device drivers are part of the system interface,
which is responsible for interfacing the system with
peripherals and system hardware components.
Types of drivers:
• Character Drivers
• Block Drivers
• USB Drivers
• Network Drivers
Process Management
From the kernel point of view, a process is an entry in the process table.
Nothing more.
The process table, then, is one of the most important data structures
within the system, together with the memory-management tables and the
buffer cache. The individual item in the process table is the task_struct
structure, defined in include/linux/sched.h.
The process table is both an array and a doubly linked list, as well as a
tree. The physical implementation is a static array of pointers, whose
length is NR_TASKS, a constant defined in include/linux/tasks.h, and each
structure resides in a reserved memory page. The list structure is
achieved through the pointers next_task and prev_task.
After booting is over, the kernel is always working on behalf of one of the
processes, and the global variable current, a pointer to a task_struct
item, is used to record the running one. current is only changed by the
scheduler, in kernel/sched.c. When, however, all processes must be
looked at, the macro for_each_task is used. It is considerably faster than
a sequential scan of the array, when the system is lightly loaded.
A process is always running in either "user mode" or "kernel mode". The
main body of a user program is executed in user mode and system calls
are executed in kernel mode.
System calls, within the kernel, exist as C language functions, their
'official' name being prefixed by 'sys_'. A system call named, for
example, burnout invokes the kernel function sys_burnout().
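The list view of the process table and the for_each_task macro mentioned above can be sketched with a heavily reduced task_struct. Only next_task, prev_task, init_task and for_each_task are names from the kernel; the single pid field, link_task() and count_tasks() are simplifications assumed for this example. Walking the list visits only live tasks, which is why it beats scanning every slot of the array on a lightly loaded system.

```c
#include <assert.h>
#include <stddef.h>

/* Heavily reduced stand-in for the kernel's task_struct. */
struct task_struct {
    int pid;
    struct task_struct *next_task, *prev_task;
};

/* init_task heads a circular list; at first it points to itself. */
struct task_struct init_task = { 0, &init_task, &init_task };

/* Same shape as the macro in old kernels: visits every task except init_task. */
#define for_each_task(p) \
    for ((p) = &init_task; ((p) = (p)->next_task) != &init_task; )

/* Hypothetical helper: link a task just before init_task (the list tail). */
void link_task(struct task_struct *t)
{
    assert(t != NULL);
    t->next_task = &init_task;
    t->prev_task = init_task.prev_task;
    init_task.prev_task->next_task = t;
    init_task.prev_task = t;
}

/* Hypothetical helper: count the tasks reachable through the list. */
int count_tasks(void)
{
    struct task_struct *p;
    int n = 0;

    for_each_task(p)
        n++;
    return n;
}
```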
Creating processes
A Unix system creates a process through the fork() system call, and process
termination is performed either by exit() or by receiving a signal.
The Linux implementation for them resides in kernel/fork.c and
kernel/exit.c.
Fork's main task is filling the data structure for the new process. Relevant
steps, apart from filling fields, are:
• getting a free page to hold the task_struct
• finding an empty process slot (find_empty_process())
• getting another free page for the kernel_stack_page
• copying the parent's LDT to the child
• duplicating mmap information of the parent
sys_fork() also manages file descriptors and inodes.
Destroying processes
Exiting from a process is trickier, because the parent process must be
notified about any child who exits.
Moreover, a process can exit by being kill()ed by another process (these
are Unix features).
The file exit.c is therefore the home of sys_kill() and the various flavors
of sys_wait(), in addition to sys_exit().
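Seen from user space, creation and destruction fit together as follows: the parent fork()s, the child exits, and the parent learns about it through one of the wait() calls. The sketch below uses only standard POSIX calls; spawn_and_reap() is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once with `code`; the parent waits for it.
 * Returns the child's exit status, or -1 on failure.
 */
int spawn_and_reap(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child: terminate immediately */

    int status;
    if (waitpid(pid, &status, 0) != pid)  /* parent: collect the child's exit */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Until the parent calls waitpid(), the exited child stays in the process table so that its status can still be delivered.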
Executing programs
• After fork()ing, two copies of the same program are running. One of them
usually exec()s another program.
• The exec() system call must locate the binary image of the executable file,
load and run it.
• The Linux implementation of exec() supports different binary formats. This is
accomplished through the linux_binfmt structure.
• Loading of shared libraries is implemented in the same source file as exec() is,
but let's stick to exec() itself.
• Unix systems provide the programmer with six flavors of the exec()
function. All but one of them can be implemented as library functions, and the
Linux kernel implements sys_execve() alone.
It performs quite a simple task: loading the head of the executable and trying to
execute it. If the first two bytes are "#!", then the first line is parsed and an
interpreter is invoked; otherwise the registered binary formats are tried
sequentially.
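The "#!" check on the head of the executable can be illustrated with a small user-space parser. This is a simplified sketch, not the kernel's script loader: is_script() and script_interpreter() are hypothetical names, and any interpreter arguments after the path are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if the file header starts with the two bytes "#!". */
int is_script(const unsigned char *head, size_t len)
{
    return len >= 2 && head[0] == '#' && head[1] == '!';
}

/*
 * Copy the interpreter path from the first line into `out`.
 * Returns 0 on success, -1 if the header is not a script or `out` is too small.
 */
int script_interpreter(const char *head, size_t len, char *out, size_t outsz)
{
    assert(head != NULL && out != NULL);
    if (!is_script((const unsigned char *)head, len))
        return -1;

    size_t i = 2;
    while (i < len && (head[i] == ' ' || head[i] == '\t'))
        i++;                              /* skip blanks after "#!" */

    size_t start = i;
    while (i < len && head[i] != ' ' && head[i] != '\t' && head[i] != '\n')
        i++;                              /* the path ends at a blank or newline */

    size_t n = i - start;
    if (n == 0 || n >= outsz)
        return -1;
    memcpy(out, head + start, n);
    out[n] = '\0';
    return 0;
}
```

A header that fails this check would, in the kernel, be handed to each registered binary format in turn.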
State
As a process executes it changes state according to its circumstances.
Linux processes have the following states:
• Running: The process is either running or it is ready to run
• Waiting: The process is waiting for an event or for a resource. Linux
differentiates between two types of waiting process; interruptible and
uninterruptible.
• Stopped: The process has been stopped, usually by receiving a signal.
A process that is being debugged can be in a stopped state.
• Zombie: This is a terminated process whose task_struct data structure
still remains in the task vector, typically because its parent has not yet
collected its exit status. It is what it sounds like, a dead process.
The scheduler needs this information in order to fairly decide which process in
the system most deserves to run.
Process Handling - Schedulers
History of Schedulers
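• O(n) - before 2.6
• O(1) - Ingo Molnar - 2.6.0 to 2.6.22
• Rotating Staircase Deadline Scheduler (RSDL) - Con Kolivas - out-of-tree patch, never merged
• Completely Fair Scheduler (CFS) - Ingo Molnar - 2.6.23 onward (still the scheduler in 3.18)
• Brain Fuck Scheduler (BFS) - Con Kolivas - out-of-tree patch set, e.g. against 3.18.1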
Processes System Calls
Scheduler
Memory Management
Linux uses segmentation + paging, which simplifies notation.
Linux uses only 4 segments:
2 segments (code and data/stack) for KERNEL SPACE from (3 GB) to (4 GB)
2 segments (code and data/stack) for USER SPACE from (0 GB) to (3 GB)
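The 3 GB boundary between user space and kernel space corresponds to the constant PAGE_OFFSET (0xC0000000) on 32-bit x86 with the default split. The sketch below models that boundary and the fixed offset used for directly mapped kernel memory; is_kernel_address() and virt_to_phys_direct() are hypothetical helper names, and the arithmetic holds only for the directly mapped region, not for every kernel address.

```c
#include <assert.h>
#include <stdint.h>

/* Start of kernel space in the classic 32-bit x86 3 GB / 1 GB split. */
#define PAGE_OFFSET 0xC0000000u

/* User space covers 0 GB to 3 GB; everything from PAGE_OFFSET up is kernel space. */
int is_kernel_address(uint32_t vaddr)
{
    return vaddr >= PAGE_OFFSET;
}

/* For directly mapped kernel memory: physical = virtual - PAGE_OFFSET. */
uint32_t virt_to_phys_direct(uint32_t vaddr)
{
    assert(is_kernel_address(vaddr));
    return vaddr - PAGE_OFFSET;
}
```

For example, the kernel virtual address 0xC0100000 corresponds to physical address 0x00100000, the 1 MB mark where a bzImage kernel is loaded.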
Storage Handling
The Virtual Filesystem (sometimes called the Virtual File Switch or more
commonly simply the VFS) is the subsystem of the kernel that implements
the file and filesystem-related interfaces provided to user-space programs.
The VFS is the glue that enables system calls such as open(), read(), and
write() to work regardless of the filesystem or underlying physical
medium.
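One way to picture this "glue" is a table of function pointers that each filesystem fills in, with a generic entry point that dispatches through it. The sketch below is a toy model under that assumption: struct file_ops, struct file, toy_read() and the two in-memory "filesystems" are hypothetical and far simpler than the kernel's real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct file;

/* Hypothetical, reduced operations table supplied by each filesystem. */
struct file_ops {
    long (*read)(struct file *f, char *buf, size_t len);
};

struct file {
    const struct file_ops *ops;  /* set by the filesystem that owns the file */
    const char *data;            /* backing bytes for the in-memory demo */
    size_t pos;                  /* current read offset */
};

/* Toy filesystem A: returns the stored bytes and advances the offset. */
static long mem_read(struct file *f, char *buf, size_t len)
{
    size_t left = strlen(f->data) - f->pos;

    if (len > left)
        len = left;
    memcpy(buf, f->data + f->pos, len);
    f->pos += len;
    return (long)len;
}

/* Toy filesystem B: always returns zero bytes, like /dev/zero. */
static long zero_read(struct file *f, char *buf, size_t len)
{
    (void)f;
    memset(buf, 0, len);
    return (long)len;
}

const struct file_ops mem_ops = { mem_read };
const struct file_ops zero_ops = { zero_read };

/* Generic entry point: the caller never sees which filesystem is underneath. */
long toy_read(struct file *f, char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL);
    if (!f->ops->read)
        return -1;
    return f->ops->read(f, buf, len);
}
```

The caller uses toy_read() in the same way for both files; only the ops pointer differs.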
Networking
This layer is responsible for handling network packets.
The required protocol stacks are implemented here.
It can also encrypt and decrypt network
packets (for example, for IPsec).
How To Program
How to use the features of the kernel or change existing kernel code.
Kernel Common APIs
Kernel APIs are documented here:
https://www.kernel.org/doc/htmldocs/kernel-api/
• Data Types
• Basic C Library Functions
• Basic Kernel Library Functions
• Memory Management in Linux
• Kernel IPC facilities
• FIFO Buffer
• relay interface support
• Module Support
• Hardware Interfaces
• Firmware Interfaces
• etc.
Kernel Symbol Usage
When modules are loaded, they are dynamically linked into the kernel. As with
user-space, dynamically linked binaries can call only into external functions that
are explicitly exported for use. In the kernel, this is handled via special directives
called EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL().
Functions that are exported are available for use by modules. Functions that are
not exported cannot be invoked from modules.
The set of kernel symbols that are exported is known as the exported kernel
interfaces or even the kernel API.
Exporting a symbol is easy. After the function is defined, it is usually followed by
an EXPORT_SYMBOL(). For example,
/* pirate is assumed to be a valid global pointer defined elsewhere */
int get_pirate_beard_color(void)
{
        return pirate->beard->color;
}
EXPORT_SYMBOL(get_pirate_beard_color);
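The effect of exporting can be modelled in user space as a lookup in a table of exported names. The sketch below is only an analogy for how a module's references get resolved: struct ksym, resolve_symbol() and both demo functions are hypothetical. It also models the one difference between the two directives, namely that symbols exported with EXPORT_SYMBOL_GPL() are available only to modules that declare a GPL-compatible license.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*ksym_fn)(void);

struct ksym {                /* hypothetical model of one export entry */
    const char *name;
    ksym_fn fn;
    int gpl_only;            /* 1 if exported with EXPORT_SYMBOL_GPL() */
};

static int beard_color(void) { return 3; }
static int gpl_feature(void) { return 1; }

/* Only functions listed here are visible to "modules". */
static const struct ksym exported[] = {
    { "beard_color", beard_color, 0 },
    { "gpl_feature", gpl_feature, 1 },
};

/* Returns the function for `name`, or NULL if the module may not use it. */
ksym_fn resolve_symbol(const char *name, int module_is_gpl)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof exported / sizeof exported[0]; i++) {
        if (strcmp(exported[i].name, name) != 0)
            continue;
        if (exported[i].gpl_only && !module_is_gpl)
            return NULL;     /* GPL-only symbol requested by a non-GPL module */
        return exported[i].fn;
    }
    return NULL;             /* not exported at all */
}
```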
Introduction to Mailing Lists & How to Contribute
git diff
git commit
git show
git format-patch
git send-email
References
The Linux Documentation Project – TLDP
http://www.tldp.org/LDP/lki/lki.html
Kernelnewbies.org
http://kernelnewbies.org/Documentation/Subsystems
Free-electrons
http://free-electrons.com
http://lxr.free-electrons.com
Kernel Map
http://www.makelinux.net/kernel_map/
Thank you
Samrat Das
samrat48@hotmail.com

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 

Recently uploaded (20)

Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineering
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 

Linux Kernel Tour

  • 4. To – Full functionality working of OS
  • 5. Topics to be covered o Introduction o Kernel Source Organization o Compilation Process o Booting Process o Loading of Kernel o Initialization Process o Working of Kernel o Subsystems of Kernel o Introduction to common Kernel APIs o Kernel Symbols usage o Introduction to mailing lists and how to contribute to the kernel tree (creating a patch and submitting it)
  • 8. Let's see how the Linux kernel source is organized. Next: Kernel Source Organization. You can get the source from kernel.org or with git.
  • 14. Browsing source code: cscope is a tool to browse source code; http://lxr.free-electrons.com is an online source browser.
  • 15. Compilation Process After configuration, when the user types 'make zImage' or 'make bzImage', the resulting bootable kernel image is stored as arch/i386/boot/zImage or arch/i386/boot/bzImage respectively. Here is how the image is built.
  • 16. Compilation Process I. C and assembly source files are compiled into ELF relocatable object format (.o) and some of them are grouped logically into archives (.a) using ar(1). II. Using ld(1), the above .o and .a are linked into vmlinux, which is a statically linked, non-stripped ELF 32-bit LSB 80386 executable file. III. System.map is produced by nm vmlinux; irrelevant or uninteresting symbols are grepped out. IV. Enter directory arch/i386/boot. V. Bootsector asm code bootsect.S is preprocessed either with or without -D__BIG_KERNEL__, depending on whether the target is bzImage or zImage, into bbootsect.s or bootsect.s respectively. VI. bbootsect.s is assembled and then converted into 'raw binary' form called bbootsect (or bootsect.s assembled and raw-converted into bootsect for zImage). VII. Setup code setup.S (setup.S includes video.S) is preprocessed into bsetup.s for bzImage or setup.s for zImage. In the same way as the bootsector code, the difference is marked by -D__BIG_KERNEL__ present for bzImage. The result is then converted into 'raw binary' form called bsetup.
  • 17. Compilation Process cont. VIII. Enter directory arch/i386/boot/compressed and convert /usr/src/linux/vmlinux to $tmppiggy (tmp filename) in raw binary format, removing .note and .comment ELF sections. IX. gzip -9 < $tmppiggy > $tmppiggy.gz X. Link $tmppiggy.gz into ELF relocatable (ld -r) piggy.o. XI. Compile compression routines head.S and misc.c (still in arch/i386/boot/compressed directory) into ELF objects head.o and misc.o. XII. Link together head.o, misc.o and piggy.o into bvmlinux (or vmlinux for zImage; don't mistake this for /usr/src/linux/vmlinux!). Note the difference between -Ttext 0x1000 used for vmlinux and -Ttext 0x100000 for bvmlinux, i.e. for bzImage the compression loader is high-loaded. XIII. Convert bvmlinux to 'raw binary' bvmlinux.out, removing .note and .comment ELF sections. XIV. Go back to arch/i386/boot directory and, using the program tools/build, cat together bbootsect, bsetup and compressed/bvmlinux.out into bzImage (delete extra 'b' above for zImage). This writes important variables like setup_sects and root_dev at the end of the bootsector.
  • 18. Result after compilation: bzImage. What is inside? objdump -D bzImage
  • 20. Let us see how this kernel works. Let's start with the boot process.
  • 21. Booting Process I. BIOS selects the boot device. II. BIOS loads the bootsector from the boot device. III. Bootsector loads setup, decompression routines and compressed kernel image. IV. The kernel is uncompressed in protected mode. V. Low-level initialization is performed by asm code. VI. High-level C initialization.
  • 22. Mapping of Kernel and other peripherals
  • 23. Initializations – asm I. Initialize segment values. II. Initialize page tables. III. Enable paging by setting PG bit in %cr0. IV. Zero-clean BSS (on SMP, only first CPU does this). V. Copy the first 2k of bootup parameters (kernel command line). VI. Check CPU type using EFLAGS and, if possible, cpuid, able to detect 386 and higher. VII. The first CPU calls start_kernel(), all others call arch/i386/kernel/smpboot.c:initialize_secondary() if ready=1, which just reloads esp/eip and doesn't return.
  • 24. Initializations – high level I. Take a global kernel lock (it is needed so that only one CPU goes through initialization). II. Perform arch-specific setup (memory layout analysis, copying boot command line again, etc.). III. Print Linux kernel "banner" containing the version. IV. Initialize traps. V. Initialize irqs.
  • 25. Initializations – high level VI. Initialize data required for scheduler. VII. Initialize time keeping data. VIII. Initialize softirq subsystem. IX. Parse boot commandline options. X. Initialize console. XI. If module support was compiled into the kernel, initialize dynamical module loading facility. XII. If "profile=" command line was supplied, initialize profiling buffers. XIII. kmem_cache_init(), initialize most of slab allocator. XIV. Enable interrupts.
  • 26. Initializations – high level XV. Calculate BogoMips value for this CPU. XVI. Call mem_init() which calculates max_mapnr, totalram_pages and high_memory and prints out the "Memory: ..." line. XVII. kmem_cache_sizes_init(), finish slab allocator initialization. XVIII. Initialize data structures used by procfs. XIX. fork_init(), create uid_cache, initialise max_threads based on the amount of memory available and configure RLIMIT_NPROC for init_task to be max_threads/2. XX. Create various slab caches needed for VFS, VM, buffer cache, etc.
  • 27. Initializations – high level XXI. If System V IPC support is compiled in, initialise the IPC subsystem. Note that for System V shm, this includes mounting an internal (in-kernel) instance of shmfs filesystem. XXII. If quota support is compiled into the kernel, create and initialise a special slab cache for it. XXIII. Perform arch-specific "check for bugs" and, whenever possible, activate workaround for processor/bus/etc bugs. Comparing various architectures reveals that "ia64 has no bugs" and "ia32 has quite a few bugs"; a good example is the "f00f bug", which is only checked if the kernel is compiled for less than 686 and worked around accordingly.
  • 28. Initializations – high level Finally the kernel is ready to move_to_user_mode() XXIV. Set a flag to indicate that a schedule should be invoked at "next opportunity" and create a kernel thread init() which execs execute_command if supplied via "init=" boot parameter, or tries to exec /sbin/init, /etc/init, /bin/init, /bin/sh in this order; if all these fail, panic with "suggestion" to use "init=" parameter. XXV. Go into the idle loop, this is an idle thread with pid=0.
  • 29. Working of Kernel After exec()ing the init program from one of the standard places, the kernel has no direct control over the program flow. Its role, from now on, is to provide processes with system calls, as well as servicing asynchronous events. Multitasking has been set up, and it is now init which manages multiuser access by fork()ing system daemons and login processes.
  • 30. Working of Kernel Whenever a program needs a system resource, it makes a system call.
  • 31. System Call Implementation • The mechanism to signal the kernel is a software interrupt. • The interrupt raises an exception, and the system switches to kernel mode and executes the exception handler, which here is the system call handler. • The defined software interrupt on x86 is the int $0x80 instruction. • It triggers a switch to kernel mode and the execution of exception vector 128, which is the system call handler. • The system call handler is the aptly named function system_call(). It is architecture dependent and typically implemented in assembly in entry.S. • Later x86 processors added a feature known as sysenter. This feature provides a faster, more specialized way of trapping into the kernel to execute a system call than using the int instruction.
  • 32. System Call Implementation Denoting the Correct System Call • On x86, the syscall number is fed to the kernel via the eax register. • Before causing the trap into the kernel, user-space sticks in eax the number corresponding to the desired system call. • The system call handler then reads the value from eax. • The system_call() function checks the validity of the given system call number by comparing it to NR_syscalls. • If it is larger than or equal to NR_syscalls, the function returns -ENOSYS. Otherwise, the specified system call is invoked: • call *sys_call_table(,%eax,4) Because each element in the system call table is 32 bits (four bytes), the kernel multiplies the given system call number by four to arrive at its location in the system call table.
  • 33. System Call Implementation Parameter Passing In addition to the system call number, most syscalls require that one or more parameters be passed to them. The easiest way to do this is via the same means that the syscall number is passed: • The parameters are stored in registers. On x86, the registers ebx, ecx, edx, esi, and edi contain, in order, the first five arguments. • In the unlikely case of six or more arguments, a single register is used to hold a pointer to user-space where all the parameters are stored. The return value is sent to user-space also via register. On x86, it is written into the eax register.
  • 34. We have seen how system calls are implemented. But what do the system calls themselves do? System calls are calls into the subsystems of the kernel. Now let us look at the subsystems of the kernel.
  • 35. Subsystems of Kernel • Human Interface • System Interface • Process Management • Memory Management • Storage Handling • Networking
  • 36. Human Interface The human interface subsystem of the kernel handles the system's input and output. It controls: • Keyboard • Console screen • Mouse • Etc.
  • 37. System Interface Device drivers are part of the system interface, which is responsible for interfacing the system with peripherals and hardware components. Types of drivers: • Character drivers • Block drivers • USB drivers • Network drivers
  • 38. Process Management From the kernel point of view, a process is an entry in the process table. Nothing more. The process table, then, is one of the most important data structures within the system, together with the memory-management tables and the buffer cache. The individual item in the process table is the task_struct structure, defined in include/linux/sched.h. The process table is both an array and a double-linked list, as well as a tree. The physical implementation is a static array of pointers, whose length is NR_TASKS, a constant defined in include/linux/tasks.h, and each structure resides in a reserved memory page. The list structure is achieved through the pointers next_task and prev_task.
  • 39. Process Management Cont. After booting is over, the kernel is always working on behalf of one of the processes, and the global variable current, a pointer to a task_struct item, is used to record the running one. current is only changed by the scheduler, in kernel/sched.c. When, however, all processes must be looked at, the macro for_each_task is used. It is considerably faster than a sequential scan of the array, when the system is lightly loaded. A process is always running in either ``user mode'' or ``kernel mode''. The main body of a user program is executed in user mode and system calls are executed in kernel mode. System calls, within the kernel, exist as C language functions, their `official' name being prefixed by `sys_'. A system call named, for example, burnout invokes the kernel function sys_burnout().
  • 40. Process Management Creating processes A Unix system creates a process through the fork() system call, and process termination is performed either by exit() or by receiving a signal. The Linux implementation for them resides in kernel/fork.c and kernel/exit.c. Fork's main task is filling the data structure for the new process. Relevant steps, apart from filling fields, are: • getting a free page to hold the task_struct • finding an empty process slot (find_empty_process()) • getting another free page for the kernel_stack_page • copying the father's LDT to the child • duplicating mmap information of the father sys_fork() also manages file descriptors and inodes.
  • 41. Process Management Destroying processes Exiting from a process is trickier, because the parent process must be notified about any child who exits. Moreover, a process can exit by being kill()ed by another process (these are Unix features). The file exit.c is therefore the home of sys_kill() and the various flavors of sys_wait(), in addition to sys_exit().
  • 42. Process Management Executing programs • After fork()ing, two copies of the same program are running. One of them usually exec()s another program. • The exec() system call must locate the binary image of the executable file, load and run it. • The Linux implementation of exec() supports different binary formats. This is accomplished through the linux_binfmt structure. • Loading of shared libraries is implemented in the same source file as exec() is, but let's stick to exec() itself. • The Unix systems provide the programmer with six flavors of the exec() function. All but one of them can be implemented as library functions, and the Linux kernel implements sys_execve() alone. It performs quite a simple task: loading the head of the executable, and trying to execute it. If the first two bytes are ``#!'', then the first line is parsed and an interpreter is invoked, otherwise the registered binary formats are sequentially tried.
  • 43. Process Management State As a process executes it changes state according to its circumstances. Linux processes have the following states: • Running: The process is either running or it is ready to run. • Waiting: The process is waiting for an event or for a resource. Linux differentiates between two types of waiting processes: interruptible and uninterruptible. • Stopped: The process has been stopped, usually by receiving a signal. A process that is being debugged can be in a stopped state. • Zombie: This is a halted process which, for some reason, still has a task_struct data structure in the task vector. It is what it sounds like, a dead process. The scheduler needs this state information in order to fairly decide which process in the system most deserves to run.
  • 44. Process Management Process Handling - Schedulers History of Schedulers • O(n) scheduler - kernels before 2.6 • O(1) scheduler - Ingo Molnar - 2.6.0 to 2.6.22 • Rotating Staircase Deadline (RSDL) scheduler - Con Kolivas - proposed as a patch set, not merged into mainline • Completely Fair Scheduler (CFS) - Ingo Molnar - merged in 2.6.23 • Brain Fuck Scheduler (BFS) - Con Kolivas - maintained out of tree, not merged into mainline (Diagram: Processes, System Calls, Scheduler)
  • 45. Memory Management Linux uses segmentation plus paging. Linux uses only 4 segments: 2 segments (code and data/stack) for kernel space, from 3 GB to 4 GB, and 2 segments (code and data/stack) for user space, from 0 GB to 3 GB.
  • 48. Storage Handling The Virtual Filesystem (sometimes called the Virtual File Switch or more commonly simply the VFS) is the subsystem of the kernel that implements the file and filesystem-related interfaces provided to user-space programs. The VFS is the glue that enables system calls such as open(), read(), and write() to work regardless of the filesystem or underlying physical medium.
  • 49. Networking This layer is responsible for handling network packets. The required protocol stacks are implemented here. It is also responsible for decrypting and encrypting network packets.
  • 51. How To Program How to use the features of the kernel or change existing things in the kernel.
  • 52. Kernel Common APIs Kernel APIs are documented at https://www.kernel.org/doc/htmldocs/kernel-api/ • Data Types • Basic C Library Functions • Basic Kernel Library Functions • Memory Management in Linux • Kernel IPC facilities • FIFO Buffer • relay interface support • Module Support • Hardware Interfaces • Firmware Interfaces • etc.
  • 53. Kernel Symbol Usage When modules are loaded, they are dynamically linked into the kernel. As with user-space, dynamically linked binaries can call only into external functions that are explicitly exported for use. In the kernel, this is handled via special directives called EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL(). Functions that are exported are available for use by modules. Functions that are not exported cannot be invoked from modules. The set of kernel symbols that are exported is known as the exported kernel interfaces or even the kernel API. Exporting a symbol is easy. After the function is defined, it is usually followed by an EXPORT_SYMBOL(). For example, int get_pirate_beard_color(void) { return pirate->beard->color; } EXPORT_SYMBOL(get_pirate_beard_color);
  • 54. Introduction to mailing lists and how to contribute • git diff • git commit • git show • git format-patch • git send-email
  • 55. References • The Linux Documentation Project (TLDP): http://www.tldp.org/LDP/lki/lki.html • Kernelnewbies.org: http://kernelnewbies.org/Documentation/Subsystems • Free Electrons: http://free-electrons.com and http://lxr.free-electrons.com • Kernel Map: http://www.makelinux.net/kernel_map/

Editor's Notes

  1. Control registers of x86: http://en.wikipedia.org/wiki/Control_register ; http://www.eecg.toronto.edu/~amza/www.mindsec.com/files/x86regs.html ; BSS: Block Started by Symbol. General registers: EAX, EBX, ECX, EDX. Segment registers: CS, DS, ES, FS, GS, SS. Index and pointer registers: ESI, EDI, EBP, EIP, ESP. Indicator: EFLAGS.
  2. http://man7.org/linux/man-pages/man7/bootparam.7.html It is possible to enable a kernel profiling function, if one wishes to find out where the kernel is spending its CPU cycles.
  3. It is not possible for user-space applications to execute kernel code directly. They cannot simply make a function call to a method existing in kernel-space because the kernel exists in a protected memory space. If applications could directly read and write to the kernel's address space, system security and stability would go out the window. Instead, user-space applications must somehow signal the kernel that they want to execute a system call and have the system switch to kernel mode, where the system call can be executed in kernel-space by the kernel on behalf of the application.
  4. Simply entering kernel-space alone is not sufficient because there are multiple system calls, all of which enter the kernel in the same manner. Thus, the system call number must be passed into the kernel.
  5. The stack used by the process in the two execution modes is different--a conventional stack segment is used for user mode, while a fixed-size stack (one page, owned by the process) is used in kernel mode. The kernel stack page is never swapped out, because it must be available whenever a system call is entered.
  6. The Local Descriptor Table (LDT) is a memory table used in the x86 architecture in protected mode and containing memory segment descriptors: start in linear memory, size, executability, writability, access privilege, actual presence in memory, etc.
  7. Interruptible waiting processes can be interrupted by signals whereas uninterruptible waiting processes are waiting directly on hardware conditions and cannot be interrupted under any circumstances.
  8. The main idea behind the CFS is to maintain balance (fairness) in providing processor time to tasks.