The first child process, which must execute the ls program, performs the following operations:
1. Invokes dup2(4,1) to copy file descriptor 4 to file descriptor 1. From now on, file descriptor 1 refers to the pipe's write channel.
2. Invokes the close( ) system call twice to release file descriptors 3 and 4.
3. Invokes the execve( ) system call to execute the /bin/ls program. By default, such a program writes its output to the file having file descriptor 1 (the standard output); that is, it writes into the pipe.
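The three steps above can be sketched from User Mode (a minimal sketch with error handling mostly omitted; the helper name run_through_pipe is illustrative, and /bin/echo in the usage below is just a deterministic stand-in for /bin/ls):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with the pipe's write channel as its standard
 * output, and collect its output into `buf` in the parent.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t run_through_pipe(const char *path, char *const argv[],
                         char *buf, size_t len)
{
    int fd[2];                     /* fd[0]: read channel, fd[1]: write channel */
    if (pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {                /* child: the three steps listed above */
        dup2(fd[1], 1);            /* 1. fd 1 now refers to the write channel */
        close(fd[0]);              /* 2. release both pipe descriptors */
        close(fd[1]);
        execve(path, argv, NULL);  /* 3. the program writes to fd 1, i.e. the pipe */
        _exit(127);                /* reached only if execve() fails */
    }

    close(fd[1]);                  /* parent keeps only the read channel */
    ssize_t n = read(fd[0], buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

For instance, `run_through_pipe("/bin/ls", argv, buf, sizeof(buf))` leaves the directory listing in `buf`.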
The popen( ) function receives two parameters: filename, the pathname of an executable file, and type, a string specifying the direction of the data transfer. It returns a pointer to a FILE data structure. The popen( ) function essentially performs the following operations:
1. Creates a new pipe by making use of the pipe( ) system call
2. Forks a new process, which in turn executes the following operations:
a. If type is r , duplicates the file descriptor associated with the pipe's write channel as file descriptor 1 (standard output); otherwise, if type is w , duplicates the file descriptor associated with the pipe's read channel as file descriptor 0 (standard input)
b. Closes the file descriptors returned by pipe( )
c. Invokes the execve( ) system call to execute the program specified by filename
3. If type is r , closes the file descriptor associated with the pipe's write channel; otherwise, if type is w , closes the file descriptor associated with the pipe's read channel
4. Returns the address of the FILE data structure that refers to whichever file descriptor for the pipe is still open
The pclose( ) function, which receives the file pointer returned by popen( ) as its parameter, simply invokes the wait4( ) system call and waits for the termination of the process created by popen( ).
The pipe( ) system call is serviced by the sys_pipe( ) function, which in turn invokes the do_pipe( ) function. In order to create a new pipe, do_pipe( ) performs the following operations:
1. Allocates a file object and a file descriptor for the read channel of the pipe, sets the f_flags field of the file object to O_RDONLY, and initializes the f_op field with the address of the read_pipe_fops table.
2. Allocates a file object and a file descriptor for the write channel of the pipe, sets the f_flags field of the file object to O_WRONLY, and initializes the f_op field with the address of the write_pipe_fops table.
3. Invokes the get_pipe_inode( ) function, which allocates and initializes an inode object for the pipe. This function also allocates a page frame for the pipe buffer and stores its address in the base field of the pipe_inode_info structure.
4. Allocates a dentry object and uses it to link together the two file objects and the inode object.
5. Returns the two file descriptors to the User Mode process.
Whenever a process invokes the close( ) system call on a file descriptor associated with a pipe, the kernel executes the fput( ) function on the corresponding file object, which decrements the usage counter. If the counter becomes 0, the function invokes the release method of the file operations.
The pipe_read_release( ) and pipe_write_release( ) functions set the readers and writers fields, respectively, of the pipe_inode_info structure to 0. Each function then invokes the pipe_release( ) function, which wakes up any processes sleeping in the pipe's wait queue so that they can recognize the change in the pipe state.
Moreover, the function checks whether both the readers and writers fields are equal to 0; in this case, it releases the page frame containing the pipe buffer.
A process wishing to get data from a pipe issues a read( ) system call, specifying as its file descriptor the descriptor associated with the pipe's read channel. The kernel ends up invoking the read method found in the file operation table associated with the proper file object. In the case of a pipe, the entry for the read method in the read_pipe_fops table points to the pipe_read( ) function, which performs the following operations:
1. Determines whether the pipe size, which is stored in the inode's i_size field, is 0. In this case, determines whether the function must return or the process must be blocked while waiting until another process writes some data into the pipe. The type of I/O operation (blocking or nonblocking) is specified by the O_NONBLOCK flag in the f_flags field of the file object. If necessary, invokes the interruptible_sleep_on( ) function to suspend the current process after having inserted it in the wait queue to which the wait field of the pipe_inode_info data structure points.
2. Checks the lock field of the pipe_inode_info data structure. If it is not null, another process is currently accessing the pipe; in this case, either suspends the current process or immediately terminates the system call, depending on the type of read operation (blocking or nonblocking).
3. Increments the lock field.
4. Copies the requested number of bytes (or the number of available bytes, if the buffer size is too small) from the pipe's buffer to the user address space.
5. Decrements the lock field.
6. Invokes wake_up_interruptible( ) to wake up all processes sleeping on the pipe's wait queue.
7. Returns the number of bytes copied into the user address space.
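The blocking decision in step 1 can be observed from User Mode: with O_NONBLOCK set on the read channel, reading an empty pipe fails with EAGAIN instead of suspending the process. A small demo (the function name is illustrative, not kernel code):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read from an empty pipe whose read channel is nonblocking; return the
 * resulting errno value (expected to be EAGAIN). */
int read_empty_pipe_nonblocking(void)
{
    int fd[2];
    char buf[8];
    pipe(fd);
    fcntl(fd[0], F_SETFL, O_NONBLOCK);  /* mark the read channel nonblocking */
    ssize_t n = read(fd[0], buf, sizeof(buf));  /* pipe size is 0 */
    int err = (n == -1) ? errno : 0;
    close(fd[0]);
    close(fd[1]);
    return err;
}
```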
A process wishing to put data into a pipe issues a write( ) system call, specifying as its file descriptor the descriptor associated with the pipe's write channel. The kernel satisfies this request by invoking the write method of the proper file object; the corresponding entry in the write_pipe_fops table points to the pipe_write( ) function, which performs the following operations:
1. Checks whether the pipe has at least one reading process. If not, sends a SIGPIPE signal to the current process and returns the -EPIPE error code.
2. Releases the i_sem semaphore of the pipe's inode, which was acquired by the sys_write( ) function, and acquires the i_atomic_write semaphore of the same inode. The i_sem semaphore prevents multiple processes from starting write operations on a file, and thus on the pipe.
3. Checks whether the number of bytes to be written is within the pipe's buffer size:
a. If so, the write operation must be atomic. Therefore, checks whether the buffer has enough free space to store all bytes to be written.
b. If the number of bytes is greater than the buffer size, the operation can start as long as there is any free space at all. Therefore, checks for at least 1 free byte.
4. If the buffer does not have enough free space and the write operation is blocking, inserts the current process into the pipe's wait queue and suspends it until some data is read from the pipe. Notice that the i_atomic_write semaphore is not released, so no other process can start a write operation on the buffer. If the write operation is nonblocking, returns the -EAGAIN error code.
5. Checks the lock field of the pipe_inode_info data structure. If it is not null, another process is currently reading the pipe, so either suspends the current process or immediately terminates the write depending on whether the write operation is blocking or nonblocking.
6. Increments the lock field.
7. Copies the requested number of bytes (or the number of free bytes if the pipe size is too small) from the user address space to the pipe's buffer.
8. If there are bytes yet to be written, goes to step 4.
9. After all requested data is written, decrements the lock field.
10. Invokes wake_up_interruptible( ) to wake up all processes sleeping on the pipe's wait queue.
11. Releases the i_atomic_write semaphore and acquires the i_sem semaphore (so that sys_write( ) can safely release the latter).
12. Returns the number of bytes written into the pipe's buffer.
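Step 1 of pipe_write( ) is easy to demonstrate from User Mode: writing to a pipe with no reading process delivers SIGPIPE and makes write( ) fail with EPIPE. In this sketch the signal is ignored so that the process survives long enough to inspect errno:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write to a pipe whose read channel has been closed; return the resulting
 * errno value (expected to be EPIPE). */
int write_with_no_reader(void)
{
    int fd[2];
    char c = 'x';
    signal(SIGPIPE, SIG_IGN);   /* otherwise the default action kills the process */
    pipe(fd);
    close(fd[0]);               /* now the pipe has no reading process */
    ssize_t n = write(fd[1], &c, 1);
    int err = (n == -1) ? errno : 0;
    close(fd[1]);
    return err;
}
```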
A process creates a FIFO by issuing a mknod( ) system call, passing to it as parameters the pathname of the new FIFO and the value S_IFIFO (0x1000) logically ORed with the permission bit mask of the new file.
POSIX introduces a function named mkfifo( ) specifically to create a FIFO. This call is implemented in Linux, as in System V Release 4, as a C library function that invokes mknod( ).
Once created, a FIFO can be accessed through the usual open( ), read( ), write( ), and close( ) system calls.
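A minimal round trip through a FIFO within a single process (the pathname passed in is an arbitrary demo choice; opening the read channel first, with O_NONBLOCK, keeps the writer's open( ) from blocking):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a FIFO at `path`, write `msg` into it, read it back into `buf`.
 * Returns the number of bytes read back, or -1 on failure. */
int fifo_roundtrip(const char *path, const char *msg, char *buf, size_t len)
{
    unlink(path);                 /* discard any leftover from a previous run */
    if (mkfifo(path, 0666) == -1) /* same effect as mknod(path, S_IFIFO|0666, 0) */
        return -1;

    int rfd = open(path, O_RDONLY | O_NONBLOCK);  /* reader first */
    int wfd = open(path, O_WRONLY);               /* succeeds: a reader exists */
    write(wfd, msg, strlen(msg));
    ssize_t n = read(rfd, buf, len - 1);          /* data is already buffered */
    if (n >= 0)
        buf[n] = '\0';
    close(rfd);
    close(wfd);
    unlink(path);
    return (int)n;
}
```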
2. The function further determines specialized behavior for the set of file operations to be used by setting the f_op field of the file object to the address of some predefined tables shown in Table 18-5.
3. Finally, the function checks whether the base field of the pipe_inode_info data structure is NULL ; in this case, it gets a free page frame for the FIFO's kernel buffer and stores its address in base .
It denotes a set of system calls that allows a User Mode process to:
Synchronize itself with other processes by means of semaphores
Send messages to other processes or receive messages from them
Share a memory area with other processes
IPC data structures are created dynamically when a process requests an IPC resource (a semaphore, a message queue, or a shared memory segment).
Each IPC resource is persistent: unless explicitly released by a process, it is kept in memory.
An IPC resource may be used by any process, including those that do not share the ancestor that created the resource.
Each new resource is identified by a 32-bit IPC key, which is similar to the file pathname in the system's directory tree.
Each IPC resource also has a 32-bit IPC identifier , which is somewhat similar to the file descriptor associated with an open file. When two or more processes wish to communicate through an IPC resource, they all refer to the IPC identifier of the resource.
The ftok( ) function attempts to create a new key from a file pathname and an 8-bit project identifier passed as parameters. It does not guarantee, however, that the key is unique.
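A quick ftok( ) sketch (the pathname "/tmp" and project id 'A' in the usage below are arbitrary demo choices; the pathname must refer to an existing file):

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/types.h>

/* Derive an IPC key from an existing pathname plus an 8-bit project id.
 * Returns (key_t)-1 if the pathname does not exist. Two different
 * pathnames can still yield the same key, so uniqueness is not guaranteed. */
key_t make_key(const char *path, int proj)
{
    return ftok(path, proj);
}
```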
One process issues a semget( ), msgget( ), or shmget( ) function by specifying IPC_PRIVATE as its IPC key. A new IPC resource is thus allocated, and the process can either communicate its IPC identifier to the other process in the application or fork the other process itself.
The IPC_CREAT flag specifies that the IPC resource must be created, if it does not already exist.
The IPC_EXCL flag specifies that the function must fail if the resource already exists and the IPC_CREAT flag is set.
Each IPC identifier is computed by combining the slot usage sequence number s relative to the resource type, an arbitrary slot index i for the allocated resource, and the value chosen in the kernel for the maximum number of allocatable resources M, where 0 ≤ i < M:
IPC identifier = s × M + i
At every allocation, the slot index i is incremented.
At every deallocation, the slot usage sequence number s, which is initialized to 0, is incremented by 1, while i decreases.
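The formula above is plain arithmetic; a helper makes the role of each term explicit (the function name is illustrative, and M depends on the resource type, e.g. SEMMNI for semaphores; 128 in the usage below is just an example value):

```c
#include <assert.h>

/* IPC identifier = s * M + i, where s is the slot usage sequence number,
 * i is the slot index (0 <= i < max), and max is the kernel's maximum
 * number of allocatable resources of this type. */
static int build_ipc_id(int seq, int index, int max)
{
    return seq * max + index;
}
```

With s = 2, i = 5, and M = 128, the identifier is 2 × 128 + 5 = 261; reusing a slot after a deallocation bumps s and therefore yields a different identifier for the same slot index.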
Each IPC semaphore is a set of one or more semaphore values , called primitive semaphores , not just a single value as for kernel semaphores.
The number of primitive semaphores in each IPC semaphore must be specified as a parameter of the semget( ) function when the resource is being allocated, but it cannot be greater than SEMMSL (usually 32).
When a process chooses to use this mechanism, the resulting operations are called undoable semaphore operations. When the process dies, all of its IPC semaphores can revert to the values they would have had if the process had never started its operations.
To access one or more resources protected by an IPC semaphore, a process:
1. Invokes the semget( ) wrapper function to get the IPC semaphore identifier, specifying as the parameter the IPC key of the IPC semaphore that protects the shared resources. If the process wants to create a new IPC semaphore, it also specifies the IPC_CREAT or IPC_PRIVATE flag and the number of primitive semaphores required.
2. Invokes the semop( ) wrapper function to test and decrement all primitive semaphore values involved. If all the tests succeed, the decrements are performed, the function terminates, and the process is allowed to access the protected resources. If some semaphores are in use, the process is usually suspended until some other process releases the resources. The function receives as parameters the IPC semaphore identifier, an array of numbers specifying the operations to be atomically performed on the primitive semaphores, and the number of such operations. Optionally, the process may specify the SEM_UNDO flag, which instructs the kernel to reverse the operations should the process exit without releasing the primitive semaphores.
3. When relinquishing the protected resources, invokes the semop( ) function again to atomically increment all primitive semaphores involved.
4. Optionally, invokes the semctl( ) wrapper function, specifying in its parameter the IPC_RMID flag, to remove the IPC semaphore from the system.
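The four steps above, for a single primitive semaphore used as a mutex (a sketch with minimal error handling; on Linux the caller must declare union semun itself, and SEM_UNDO is used so the kernel would undo the operations if the process died while holding the semaphore):

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };

/* Allocate one primitive semaphore, take it, release it, remove it.
 * Returns busy*10 + freed, i.e. the semaphore value while held (0) and
 * after release (1): 1 on success, -1 if allocation failed. */
int semaphore_roundtrip(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);   /* step 1 */
    if (id == -1)
        return -1;
    union semun arg = { .val = 1 };
    semctl(id, 0, SETVAL, arg);                          /* start "unlocked" */

    struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    struct sembuf up   = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };

    semop(id, &down, 1);                      /* step 2: test and decrement */
    int busy = semctl(id, 0, GETVAL, arg);    /* value while held: 0 */
    semop(id, &up, 1);                        /* step 3: increment on release */
    int freed = semctl(id, 0, GETVAL, arg);   /* back to 1 */

    semctl(id, 0, IPC_RMID, arg);             /* step 4: remove the resource */
    return busy * 10 + freed;
}
```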
Linux is a multitasking kernel that allows more than one process to exist at any given time.
A kernel’s scheduler enforces a thread scheduling policy, including when, for how long, and in some cases where (on Symmetric Multiprocessing (SMP) systems) threads can execute. Normally the scheduler runs in its own thread, which is woken up by a timer interrupt. Otherwise it is invoked via a system call or another kernel thread that wishes to yield the CPU.
A thread will be allowed to execute for a certain amount of time , then a context switch to the scheduler thread will occur, followed by another context switch to a thread of the scheduler’s choice.
Runqueue keeps track of all runnable tasks assigned to a particular CPU. As such, one runqueue is created and maintained for each CPU in a system.
Each runqueue contains two priority arrays. Tasks on a CPU begin in one priority array, the active one, and as they run out of their timeslices they are moved to the expired priority array. During the move, a new timeslice is calculated. When there are no more runnable tasks in the active priority array, it is simply swapped with the expired priority array (only two pointers are updated).
The runqueue also keeps track of a CPU's special thread information (the idle thread and the migration thread).
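The O(1) array switch can be sketched in a few lines (the struct layouts here are simplified illustrations, not the kernel's actual definitions):

```c
#include <assert.h>

struct prio_array {
    unsigned int nr_active;          /* number of runnable tasks in this array */
    /* ... per-priority task lists and the priority bitmap ... */
};

struct runqueue {
    struct prio_array *active;       /* tasks that still have timeslice left */
    struct prio_array *expired;      /* tasks that exhausted their timeslice */
    struct prio_array  arrays[2];    /* the two arrays the pointers refer to */
};

/* When the active array empties, swapping the two pointers makes all the
 * expired tasks (which already received fresh timeslices as they were moved)
 * runnable again, in constant time. */
static void swap_arrays(struct runqueue *rq)
{
    struct prio_array *tmp = rq->active;
    rq->active  = rq->expired;
    rq->expired = tmp;
}
```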
unsigned int nr_active
The number of active tasks in the priority array.
unsigned long bitmap[BITMAP_SIZE]
The bitmap representing the priorities for which active tasks exist in the priority array. It is stored as five 32-bit words.
For example, if there are three active tasks, two at priority 0 and one at priority 5, then bits 0 and 5 are set in this bitmap. Because the number of priorities is constant, the time needed to search for the first set bit is constant as well.
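A sketch of this bitmap: 140 priorities packed into five 32-bit words, with "find the highest-priority runnable task" reduced to a first-set-bit search over at most five words (the names are illustrative, not the kernel's, and __builtin_ctz is a GCC/Clang builtin standing in for the kernel's find-first-bit helpers):

```c
#include <assert.h>

#define MAX_PRIO     140
#define BITMAP_WORDS 5           /* 5 x 32 bits >= 140 priorities */

/* Record that at least one runnable task exists at `prio`. */
static void mark_prio(unsigned int bitmap[], int prio)
{
    bitmap[prio / 32] |= 1u << (prio % 32);
}

/* Return the lowest-numbered (i.e. highest) priority with a runnable task,
 * or MAX_PRIO if the bitmap is empty. At most 5 words are examined, so the
 * search takes constant time regardless of how many tasks are runnable. */
static int first_set_prio(const unsigned int bitmap[])
{
    for (int w = 0; w < BITMAP_WORDS; w++)
        if (bitmap[w])
            return w * 32 + __builtin_ctz(bitmap[w]);
    return MAX_PRIO;
}
```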
All tasks have a static priority, often called a nice value. On Linux, nice values range from -20 to 19, with higher values indicating lower priority. By default, tasks start with a static priority of 0, but that priority can be changed via the nice() system call. Apart from its initial value and modifications via the nice() system call, the scheduler never changes a task's static priority. Static priority is the mechanism through which users can modify a task's priority, and the scheduler respects the user's input.
A task’s static priority is stored in its static_prio variable.
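Static priority as seen from User Mode (a small demo; the function name is illustrative, and the increment of 2 in the usage below is arbitrary). nice() shifts the calling process's nice value, and getpriority() reads it back:

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Raise the calling process's nice value by `inc` (lowering its priority,
 * which never requires privileges) and return the resulting nice value. */
int raise_nice_by(int inc)
{
    errno = 0;               /* getpriority() can legitimately return -1 */
    nice(inc);               /* higher nice value = lower priority */
    return getpriority(PRIO_PROCESS, 0);
}
```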
An interactive task's timeslice may be broken up into chunks, based on the TIMESLICE_GRANULARITY value in the scheduler.
The function scheduler_tick() (called after every 1 msec) checks to see if the currently running task has been taking the CPU from other tasks of the same dynamic priority for too long (TIMESLICE_GRANULARITY).
If a task has been running for TIMESLICE_GRANULARITY and another task of the same dynamic priority exists, a round-robin switch to one of those tasks is made.
If a task is sufficiently interactive, then when it exhausts its timeslice it is not inserted into the expired array, but is instead reinserted back into the active array (as long as nothing is starving in the expired priority array, which is checked by EXPIRED_STARVING(), called from scheduler_tick()).
The scheduler is invoked whenever a task wishes to give up the CPU voluntarily (often through the sys_sched_yield() system call), and whenever scheduler_tick() sets the TIF_NEED_RESCHED flag on a task because it has run out of timeslice.
scheduler_tick() is a function called during every system time tick, via a clock interrupt. It checks the state of the currently running task and other tasks in a CPU's runqueue to see if scheduling is necessary.