The document discusses process communication and program execution in Linux. It describes various inter-process communication mechanisms like pipes, FIFOs, semaphores, shared memory, and sockets. It also explains how the kernel sets up the execution context for a new process by loading the executable file and any shared libraries. Key steps include resolving executable format, loading program code and data, and setting up memory segments for the text, data, bss, and stack.
2. PROCESS COMMUNICATION
A form of synchronization among User Mode processes can be
achieved by creating a (possibly empty) file and using suitable VFS
system calls to lock and unlock it.
While processes can similarly share data via temporary files
protected by locks, this approach is costly because it requires access
to the disk filesystem.
For this reason, all Unix kernels include a set of system calls that
support process communication without interacting with the
filesystem; furthermore, several wrapper functions were developed
and inserted in suitable libraries to expedite how processes issue their
synchronization requests to the kernel.
3. Pipes and FIFOs (named pipes): Best suited to implement
producer/consumer interactions among processes. Some processes fill
the pipe with data, while others extract data from the pipe.
Semaphores: Represent, as the name implies, the User Mode
version of the kernel semaphores.
Shared memory regions: Allow processes to exchange information
via a shared block of memory. In applications that must share large
amounts of data, this can be the most efficient form of process
communication.
Sockets: Allow processes on different computers to exchange data
through a network. Sockets can also be used as a communication tool
for processes located on the same host computer; the X Window System
graphic interface, for instance, uses a socket to allow client programs to
exchange data with the X server.
4. PIPES
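Consider the shell command:
$ ls | more
The standard output of the first process, which executes the ls program, is
redirected to the pipe; the second process, which executes the more program,
reads its input from the pipe.
Similar results can be obtained with two commands and a temporary file:
$ ls > temp
$ more < temp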
Pipes may be considered open files that have no corresponding image in the
mounted filesystems. A process creates a new pipe by means of the pipe( )
system call, which returns a pair of file descriptors; the process may then pass
these descriptors to its descendants through fork( ), thus sharing the pipe with
them. The processes can read from the pipe by using the read( ) system call
with the first file descriptor; likewise, they can write into the pipe by using
the write( ) system call with the second file descriptor.
5. The first child process, which must execute the ls program, performs
the following operations (here the pipe’s read and write channels are
file descriptors 3 and 4, respectively):
1. Invokes dup2(4,1) to copy file descriptor 4 to file descriptor 1.
From now on, file descriptor 1 refers to the pipe’s write channel.
2. Invokes the close( ) system call twice to release file descriptors 3
and 4.
3. Invokes the execve( ) system call to execute the ls program. The
program writes its output to the file that has file descriptor 1 (the
standard output); i.e., it writes into the pipe.
6. The second child process must execute the more program; therefore, it
performs the following operations:
1. Invokes dup2(3,0) to copy file descriptor 3 to file descriptor 0.
From now on, file descriptor 0 refers to the pipe’s read channel.
2. Invokes the close( ) system call twice to release file descriptors 3
and 4.
3. Invokes the execve( ) system call to execute more. By default, that
program reads its input from the file that has file descriptor 0 (the
standard input); i.e., it reads from the pipe.
7. READING AND WRITING INTO A PIPE
● A process wishing to get data from a pipe issues a read( ) system
call, specifying the file descriptor associated with the pipe’s reading end.
● As described in Section 12.6.2, the kernel ends up invoking
the read method found in the file operation table associated
with the proper file object.
● In the case of a pipe, the entry for the read method in the
read_pipe_fops table points to the pipe_read( ) function.
8. PROGRAM EXECUTION
• We specifically describe how the kernel sets up the execution context for a
process according to the contents of the program file. While it may not seem
like a big problem to load a bunch of instructions into memory and point the
CPU to them, the kernel has to deal with flexibility in several areas:
• Different executable formats: Linux is distinguished by its ability to run
binaries that were compiled for other operating systems.
• Shared libraries: Many executable files don’t contain all the code required to
run the program but expect the kernel to load in functions from a library at
runtime.
• Other information in the execution context: This includes the
command-line arguments and environment variables familiar to programmers.
9. By execution context we mean the collection of information needed to carry on a
specific computation; it includes the pages accessed, the open files,
the hardware register contents, and so on. An executable file is a
regular file that describes how to initialize a new execution context
(i.e., how to start a new computation).
The sys_execve( ) service routine finds the corresponding file, checks
the executable format, and modifies the execution context of the
current process according to the information stored in it. As a result,
when the system call terminates, the process starts executing the code
stored in the executable file, which performs the directory listing.
10. PROCESS COMMANDS
The conventions for passing the command-line arguments depend on the
high-level language used. In the C language, the main( ) function of a
program may receive as parameters an integer specifying how many
arguments have been passed to the program and the address of an array of
pointers to strings.
The following prototype formalizes this standard:
int main(int argc, char *argv[])
Going back to the previous example, when the /bin/ls program is invoked as
ls -l /usr/bin, argc has the value 3, argv[0] points to the ls string, argv[1]
points to the -l string, and argv[2] points to the /usr/bin string. The end of
the argv array is always marked by a null pointer, so argv[3] contains NULL.
11. The assigning, or resolution, of the addresses of symbols defined
outside a source file (such as library functions) is performed by the
linker, which collects all the object files of the program and
constructs the executable file.
The linker also analyzes the library’s functions used by the program
and glues them into the executable file in a manner described later in
this chapter. Most programs, even the most trivial ones, use libraries.
Consider, for instance, the following one-line C program:
void main(void) { }
12. LIBRARIES
• Many other libraries of functions, besides the C library, are included
in Unix systems. A generic Linux system could easily have 50
different libraries. Just to mention a couple of them: the math library
libm includes advanced functions for floating point operations, while
the X11 library libX11 collects together the basic low-level functions
for the X11 Window System graphics interface.
• All executable files in traditional Unix systems were based on static
libraries. This means that the executable file produced by the linker
includes not only the code of the original program but also the code of
the library functions that the program refers to.
13. • Modern Unix systems use shared libraries. The executable file does not
contain the library object code, but only a reference to the library name.
• Shared libraries are especially convenient on systems that provide file
memory mapping, since they reduce the amount of main memory requested
for executing a program.
• When the program interpreter must link some shared library to a process, it
does not copy the object code, but just performs a memory mapping of the
relevant portion of the library file into the process’s address space. This
allows the page frames containing the machine code of the library to be
shared among all processes that are using the same code.
14. PROGRAM SEGMENTS
The linear address space of a Unix program is traditionally partitioned, from
a logical point of view, into several linear address intervals called segments:
Text segment: Includes the executable code.
Initialized data segment: Contains the initialized data, that is, the static
variables and the global variables whose initial values are stored in the
executable file (because the program must know their values at startup).
Uninitialized data segment (bss): Contains the uninitialized data, that is, all
global variables whose initial values are not stored in the executable file
(because the program sets the values before referencing them). It is
historically called a bss segment.
Stack segment: Contains the program
stack, w
15. • start_code, end_code: Store the initial and final linear addresses of the
memory region that includes the native code of the program, that is, the
code in the executable file.
• Since the text segment includes shared libraries but the executable file
does not, the memory region demarcated by these fields is a subset of
the text segment.
• start_data, end_data: Store the initial and final linear addresses of the
memory region that includes the native initialized data of the
program, as specified in the executable file.
16. • The fields identify a memory region that roughly corresponds to the data
segment. Actually, start_data should almost always be set to the address of the
first page right after end_code, and thus the field is unused. The end_data
field is used, though.
• start_brk, brk: Store the initial and final linear addresses of the memory
region that includes the dynamically allocated memory areas of the process
(see Section 8.6). This memory region is sometimes called the heap.
• start_stack: Stores the address right above that of main( )’s return address;
higher addresses are reserved (recall that stacks grow toward lower
addresses).
• arg_start, arg_end: Store the initial and final addresses of the stack
portion containing the command-line arguments.
17. PROCESS MEMORY REGIONS
• The memory region starting from 0x804d000 is a memory mapping
associated with another portion of /sbin/init ranging from byte 16384
(corresponding to offset 0x4000 shown in Table 20-4) to 20,479. Since the
permissions specify that the private region may be written, we can conclude
that it maps the data segment of the program.
• The next one-page memory region starting from 0x0804e000 is anonymous,
that is, it is not associated with any file and includes the bss segment of init.
• Similarly, the next three memory regions starting from 0x40000000,
0x40015000, and 0x40016000 correspond to the text segment, the data
segment, and the bss segment, respectively, of the /lib/ld.2.2.3.so library,
which is the program interpreter for the ELF shared libraries.
18. On this system, the C library happens to be stored in the
/lib/libc.2.2.3.so file. The text segment, data segment, and bss
segment of the C library are mapped into the next three memory
regions, starting from address 0x40020000.
Remember that page frames included in private regions can be shared
among several processes with the Copy On Write mechanism, as long
as they are not modified.
Thus, since the text segment is read-only, the page frames containing
the executable code of the C library are shared among almost all
currently executing processes.
19. EXECUTION TRACING
Execution tracing is a technique that allows a program to monitor the
execution of another program. The traced program can be executed
step by step, until a signal is received, or until a system call is
invoked.
Execution tracing is widely used by debuggers, together with other
techniques like the insertion of breakpoints in the debugged program
and run-time access to its variables. We focus on how the kernel
supports execution tracing rather than discussing how debuggers
work.
20. Processes having the CAP_SYS_PTRACE capability flag set are allowed to
trace any process in the system except init. Conversely, a process P with no
CAP_SYS_PTRACE capability is allowed to trace only processes having the
same owner as P. Moreover, a process cannot be traced by two processes at
the same time.
The ptrace( ) system call modifies the p_pptr field in the descriptor of the
traced process so that it points to the tracing process; therefore, the tracing
process becomes the effective parent of the traced one. When execution
tracing terminates—i.e., when ptrace( ) is invoked with the
PTRACE_DETACH command—the system call sets p_pptr to the value of
p_opptr, thus restoring the original parent of the traced process.
21. EXECUTABLE FORMATS
The standard Linux executable format is named Executable and
Linking Format (ELF). It was developed by Unix System
Laboratories and is now the most widely used format in the Unix
world. Several well-known Unix operating systems, such as System V
Release 4 and Sun’s Solaris 2, have adopted ELF as their main
executable format.
Older Linux versions supported another format named Assembler
OUTput Format (a.out); actually, there were several versions of that
format floating around the Unix world. It is seldom used now, since
ELF is much more practical.
22. An executable format is described by an object of type linux_binfmt,
which essentially provides three methods:
load_binary: Sets up a new execution environment for the current
process by reading the information stored in an executable file.
load_shlib: Dynamically binds a shared library to an already running
process; it is activated by the uselib( ) system call.
core_dump: Stores the execution context of the current process in a
file named core. This file, whose format depends on the type of
executable of the program being executed, is usually created when a
process receives a signal whose default action is “dump”.
23. All linux_binfmt objects are included in a singly linked list, and the
address of the first element is stored in the formats variable. Elements
can be inserted and removed in the list by invoking the
register_binfmt( ) and unregister_binfmt( ) functions.
The register_binfmt( ) function is executed during system startup for
each executable format compiled into the kernel.
This function is also executed when a module implementing a new
executable format is being loaded, while the unregister_binfmt( )
function is invoked when the module is unloaded.
24. EXECUTION DOMAIN
Two kinds of support are offered for “foreign” programs, that is, programs
compiled for other operating systems:
Emulated execution: necessary to execute programs that include system calls
that are not POSIX-compliant.
Native execution: valid for programs whose system calls are totally
POSIX-compliant.
Microsoft MS-DOS and Windows programs are emulated: they
cannot be natively executed, since they include APIs that are not recognized
by Linux.
An emulator like DOSemu or Wine (which appeared in the example at the end
of the previous section) is invoked to translate each API call into an emulating
wrapper function call, which in turn uses the existing Linux system calls. Since
emulators are mostly implemented as User Mode applications, we don’t
discuss them further.
25. On the other hand, POSIX-compliant programs compiled on
operating systems other than Linux can be executed without too much
trouble, since POSIX operating systems offer similar APIs. (Actually,
the APIs should be identical, although this is not always the case.)
Minor differences that the kernel must iron out usually refer to how
system calls are invoked or how the various signals are numbered.
This information is stored in execution domain descriptors of type
exec_domain.
A process can change its personality by issuing a suitable system call
named personality( ); typical values assumed by the system call’s
parameter
27. EXEC FUNCTIONS
Unix systems provide a family of functions that replace the execution
context of a process with a new context described by an executable
file. The names of these functions start with the prefix exec, followed
by one or two letters; therefore, a generic function in the family is
usually referred to as an exec function.
Besides the first parameter, the execl( ), execlp( ), and execle( )
functions include a variable number of additional parameters. Each
points to a string describing a command-line argument for the new
program; as the "l" character in the function names suggests, the
parameters are organized in a list terminated by a NULL value.
28. Usually, the first command-line argument duplicates the executable
filename. Conversely, the execv( ), execvp( ), and execve( ) functions
specify the command-line arguments with a single parameter; as the
"v" character in the function names suggests, the parameter is the
address of a vector of pointers to command-line argument strings. The
last component of the array must be NULL.
The execle( ) and execve( ) functions receive as their last parameter
the address of an array of pointers to environment strings; as usual,
the last component of the array must be NULL. The other functions
may access the environment for the new program from the external
environ global variable, which is defined in the C library.
29. The sys_execve( ) service routine receives the following parameters:
• The address of the executable file pathname (in the User Mode
address space).
• The address of a NULL-terminated array (in the User Mode address
space) of pointers to strings (again in the User Mode address space);
each string represents a command-line argument.
• The address of a NULL-terminated array (in the User Mode address
space) of pointers to strings (again in the User Mode address space);
each string represents an environment variable in the NAME=value
format.
30. In turn, do_execve( ) performs the following operations:
1. Statically allocates a linux_binprm data structure, which will be filled with
data concerning the new executable file.
2. Invokes path_init( ), path_walk( ), and dentry_open( ) to get the dentry
object, the file object, and the inode object associated with the executable
file. On failure, returns the proper error code.
3. Verifies that the executable file is not being written by checking the
i_writecount field of the inode; stores -1 in that field to forbid further write
accesses.
4. Invokes the prepare_binprm( ) function to fill the linux_binprm data
structure.