Basic Concepts




CS 167         IV–1   Copyright © 2006 Thomas W. Doeppner. All rights reserved.




Outline

                   •   Subroutine linkage
                   •   Thread linkage
                   •   Input/output
                   •   Dynamic storage allocation








  In this lecture we go over some basic concepts important to the study of operating
systems. We look at the low-level details of subroutine calling and see how they relate to
the implementation of threads. We then cover the basics of I/O architectures. Finally, we
look at dynamic storage allocation.




Subroutines

                   main( ) {
                     int i;
                     int a;

                     ...

                     i = sub(a, 1);
                     ...
                   }

                   int sub(int x, int y) {
                     return(x+y);
                   }








  Subroutines are (or should be) a well-understood programming concept: one procedure
calls another, passing it arguments and possibly expecting a return value. We examine how
the linkage between caller and callee is implemented, first on the Intel x86 and then on a
SPARC.




Intel x86 (32-Bit):
                              Subroutine Linkage

                                     args
                                    eip
                                    ebp                              stack frame
                               saved registers
                               local variables
                                     args
                                    eip
                                    ebp                                          ebp
                               saved registers
                               local variables
                                                                                 esp






  Subroutine linkage on an Intel x86 is fairly straightforward. (We are discussing the 32-bit
version of the architecture.) Associated with each incarnation of a subroutine is a stack
frame that contains the arguments to the subroutine, the instruction pointer (in register eip)
of the caller (i.e. the address to which control should return when the subroutine
completes), a copy of the caller’s frame pointer (in register ebp), which links the stack frame
to the previous frame, space to save any registers modified by the subroutine, and space
for local variables used by the subroutine. Note that these frames are of variable size—the
size of the space reserved for local data depends on the subroutine, as does the size of the
space reserved for registers.
  The frame pointer register (ebp) points into the stack frame at a fixed position, just after
the saved copy of the caller’s instruction pointer (note that lower-addressed memory is
towards the bottom of the picture). The value of the frame pointer is not changed by the
subroutine, other than setting it on entry to the subroutine and restoring it on exit. The
stack pointer (esp) always points to the last item on the stack—new allocations (e.g. for
arguments to be passed to the next procedure) are performed here.
  This picture is idealized: not all portions of the stack frame are always used. For example,
registers are not saved if the subroutine doesn’t modify them. The frame pointer is not
saved if it’s not used, etc.
  Note: for linked-list fans, a stack is nothing more than a singly linked list of stack frames.
  The Intel Pentium IV architecture manuals can be found at
http://developer.intel.com/design/pentium4/manuals/.
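  The "singly linked list of stack frames" view can be made concrete. The sketch below (a simulation, with invented names; a real debugger would start from the live ebp register) models the two fixed words at the top of every x86 frame and walks the chain of saved frame pointers:

```c
#include <assert.h>
#include <stddef.h>

/* Simulated stack-frame header: the two words at a fixed position in
   every x86 frame. On a real stack, the saved_ebp field of one frame
   points at the saved_ebp field of the caller's frame. */
struct frame {
    struct frame *saved_ebp;   /* caller's frame pointer (the link) */
    void         *return_addr; /* caller's saved eip */
};

/* Walk the chain from the current frame to the outermost one,
   returning the call depth. A NULL link marks the base frame. */
int backtrace_depth(const struct frame *ebp) {
    int depth = 0;
    while (ebp != NULL) {
        depth++;
        ebp = ebp->saved_ebp;
    }
    return depth;
}
```

Walking this list is exactly what a debugger does to produce a stack trace.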




Intel x86:
                                 Subroutine Code
         _main    PROC NEAR
          push    ebp          ; push frame ptr
          mov     ebp, esp     ; set frame ptr
          sub     esp, 8       ; space for locals
          push    1            ; push arg 2
          mov     eax, -4[ebp] ; get a
          push    eax          ; push a
          call    sub
          add     esp, 8       ; pop args
          mov     -8[ebp], eax ; store in i
          xor     eax, eax     ; return 0
          mov     esp, ebp     ; restore stk ptr
          pop     ebp          ; pop f ptr
          ret     0            ; return

         _sub     PROC NEAR
          push    ebp          ; push f ptr
          mov     ebp, esp     ; set f ptr
          mov     eax, 8[ebp]  ; get x
          add     eax, 12[ebp] ; add y
          pop     ebp          ; pop f ptr
          ret     0            ; return





  Here we see assembler code based on the win32 calling sequence produced by the
Microsoft Visual C++ compiler (with no optimization). In the main routine, first the frame
pointer is pushed on the stack (following the arguments and instruction pointer (return
address) which had been pushed by the caller). Next the current stack pointer is copied
into the frame pointer register (ebp), thereby establishing a fixed reference into the stack
frame. Space is now allocated on the stack for two local variables (occupying a total of eight
bytes) by subtracting eight from the stack pointer.
  At this point the entry code for the main routine is complete and we now get ready to call
the subroutine. First the arguments are pushed onto the stack, in reverse order. Note that
a is referred to as four below the frame pointer (i.e., the first of the local variables). The
subroutine is called. On return, the two arguments are popped off the stack by adding their
size to the stack pointer. The return value of sub, in register eax, is stored into i.
  Now the main routine is ready to return to its caller. It clears the return register (eax), so
as to return a zero, restores the stack pointer’s value to what was earlier copied into the
frame pointer (thereby popping the local variables from the stack), restores the frame
pointer by popping it off the stack, and finally returns to the caller.
  The action in the subroutine sub is similar. First the frame pointer (ebp) is pushed onto
the stack, then the current stack pointer (esp) is copied into the frame pointer register.
With the stack frame’s location established by the frame pointer, the code accesses the two
parameters as 8 and 12 above the position pointed to by the frame pointer, respectively.
The sum of the two parameters is stored in the result register (eax), the old frame pointer is
popped from the stack, and finally a ret instruction is executed to pop the return address
off the stack and return to it.




SPARC Architecture
                   return address   i7   r31                                               o7      r15
                    frame pointer   i6   r30            stack pointer                      o6      r14
                                    i5   r29                                               o5      r13
                                    i4   r28                                               o4      r12
                                    i3   r27                                               o3      r11
                                    i2   r26                                               o2      r10
                                    i1   r25                                               o1      r9
                                    i0   r24                                               o0      r8
               Input Registers                        Output Registers

                                    l7   r23                                               g7      r7
                                    l6   r22                                               g6      r6
                                    l5   r21                                               g5      r5
                                    l4   r20                                               g4      r4
                                    l3   r19                                               g3      r3
                                    l2   r18                                               g2      r2
                                    l1   r17                                               g1      r1
                                    l0   r16                     0                         g0      r0
               Local Registers                        Global Registers




  The SPARC (Scalable Processor ARChitecture) is an example of a RISC (Reduced-
Instruction-Set Computer). We won’t go into all of the details of its architecture, but we do
cover what is relevant from the point of view of subroutine calling conventions. There are
nominally 32 registers on the SPARC, arranged as four groups of eight—input registers,
local registers, output registers, and global registers. Two of the input registers serve the
special purposes of a return address register and a frame pointer, analogous to the return
address and frame pointer on the x86 (though there they are kept on the stack rather than
in registers). One of the output registers is the stack pointer.
Register 0 (of the global registers) is very special—when read it always reads 0 and when
written it acts as a sink.
  The SPARC architecture manual can be found at http://www.sparc.com/standards/V8.pdf.




SPARC Architecture:
                               Register Windows

                              input
            window 1          local

                             output                      input

                                                          local                  window 2

                              input                     output
            window 3          local
                             output






  As its subroutine-calling technique the SPARC uses sliding windows: when one calls a
subroutine, the caller’s output registers become the callee’s input registers. Thus the
register sets of successive subroutines overlap, as shown in the picture.
  Any particular implementation of the SPARC has a fixed number of register sets (of eight
registers apiece)—seven in the picture. As long as we do not exceed the number of register
sets, subroutine entry and exit is very efficient—the input and local registers are effectively
saved (and made unavailable to the callee) on subroutine entry, and arguments (up to six)
can be efficiently passed to the callee. The caller just puts outgoing arguments in the
output registers and the callee finds them in its input registers. Returning from a
subroutine involves first putting the return value in a designated input register (i0). In a
single action, control transfers to the location contained in i7, the return address register,
and the register windows are shifted so that the caller’s registers are in place again.
  However, if the nesting of subroutine calls exceeds the available number of register sets,
then subroutine entry and exit is not so efficient—the register windows must be copied to
an x86-like stack. As implemented on the SPARC, when an attempt is made to nest
subroutines deeper than can be handled by the register windows, a trap occurs and the
operating system is called upon to copy the registers to the program’s stack and reset the
windows. Similarly, when a subroutine return encounters the end of the register windows,
a trap again occurs and the operating system loads a new set of registers from the values
stored on the program’s stack.
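  The bookkeeping behind overflow and underflow can be sketched as a simulation in C (the window count, function names, and counters are all invented for illustration; on real hardware the traps and copying are done by the processor and the operating system):

```c
#include <assert.h>

#define NWINDOWS 7     /* hypothetical fixed number of register sets */

int rw_depth = 0;      /* current call-nesting depth */
int rw_spilled = 0;    /* windows copied out to the memory stack */

/* Subroutine entry (the "save"): if every window is in use, the
   oldest one must first be spilled to the stack (the overflow trap). */
void window_save(void) {
    if (rw_depth - rw_spilled == NWINDOWS)
        rw_spilled++;  /* OS copies the oldest window to the stack */
    rw_depth++;
}

/* Subroutine return (the "restore"): if the caller's window was
   spilled, it must be reloaded from the stack (the underflow trap). */
void window_restore(void) {
    rw_depth--;
    if (rw_depth > 0 && rw_depth == rw_spilled)
        rw_spilled--;  /* OS reloads a window from the stack */
}
```

Nesting nine calls deep with seven windows forces two spills; unwinding back out forces two reloads.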




SPARC Architecture:
                                           Stack
                                                                                FP, old SP
                       storage for local variables

                   dynamically allocated stack space

                    space for compiler temporaries
                   and saved floating point registers

                   outgoing parameters beyond 6th
                      save area for callee to store
                          register arguments
                     one-word “hidden” parameter

                   16 words to save in and local regs
                                                                                SP





  The form of the SPARC stack is shown in the picture. Space is always allocated for the
stack on entry to a subroutine. The space for saving the in and local registers is not used
unless necessary because of a window overflow. The “hidden” parameter supports
programs that return something larger than 32 bits—this field within the stack points to
the parameter (which is located in separately allocated storage off the stack).
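  The hidden-parameter convention corresponds to what C compilers do for functions that return a struct. A sketch (the names are invented; the transformation is done invisibly by the compiler):

```c
#include <assert.h>

/* A value too large to return in a 32-bit register. */
struct pair { long a, b; };

/* What the compiler effectively generates for
   "struct pair make_pair(long a, long b)": the caller allocates
   storage for the result and passes a hidden pointer to it. */
void make_pair_hidden(struct pair *hidden_result, long a, long b) {
    hidden_result->a = a;
    hidden_result->b = b;
}
```

The caller's `struct pair p = make_pair(1, 2);` thus becomes, under the covers, `make_pair_hidden(&p, 1, 2);`.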




SPARC Architecture:
                               Subroutine Code
             ld   [%fp-8], %o0    ! put local var (a) into out register
             mov  1, %o1          ! deal with 2nd parameter
             call sub
             nop
             st   %o0, [%fp-4]    ! store result into local var (i)
             ...

          sub:
             save %sp, -64, %sp   ! push a new stack frame
             add  %i0, %i1, %i0   ! compute sum
             ret                  ! return to caller
             restore              ! pop frame off stack (in delay slot)




  Here we see the assembler code produced by a compiler for the SPARC. The first step, in
preparation for a subroutine call, is to put the outgoing parameters into the output
registers. The first parameter, a from our original C program, is a local variable and is
found in the stack frame. The second parameter is a constant. The call instruction merely
saves the program counter in o7 and then transfers control to the indicated address. In the
subroutine, the save instruction creates a new stack frame and advances the register
windows. It creates the new stack frame by taking the old value of the stack pointer (in the
caller’s o6), subtracting from it the amount of space that is needed (64 bytes in this
example), and storing the result into the callee’s stack pointer (o6 of the callee). At the
same time, it also advances the register windows, so that the caller’s output registers
become the callee’s input registers. If there is a window overflow, then the operating system
takes over.
  Inside the subroutine, the return value is computed and stored into the callee’s i0. The
restore instruction pops the stack and backs down the register windows. Thus what the
callee left in i0 is found by the caller in o0.




Representing Threads


              Thread A                                                  Thread B
            Control Block                                             Control Block


                         fp
                                   Stack             Stack                     fp

                                                                        sp
                          sp







  We now consider what happens with multiple threads of control. Each thread must have
its own context, represented by a control block and a stack. Together these represent what
needs to be known about a thread within a particular address space. We are at the moment
concerned about aspects of a thread pertaining to its flow of control. Thus we need to keep
track of those components of a thread that affect its flow of control, in particular, the entire
contents of each thread’s stack and the registers containing the status of the stack—the
stack pointer and the frame pointer.
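  For the purposes of this lecture, a control block can be sketched as a C struct holding just those two registers (real control blocks carry much more: scheduling state, priority, saved general registers, and so on):

```c
#include <stddef.h>

/* Minimal per-thread context: the two registers that locate the
   thread's stack. A sketch; a real thread_t holds far more state. */
typedef struct thread {
    void *sp;   /* saved stack pointer */
    void *fp;   /* saved frame pointer */
} thread_t;
```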




Switching Between Threads

                   void switch(thread_t next_thread) {
                      save current_thread’s SP and FP;
                      restore next_thread’s SP and FP;
                      return;
                   }








   Switching between thread contexts turns out to be very straightforward (though not
expressible in most programming languages). We have an ordinary-looking subroutine,
switch. A thread calls it, passing the address of the control block of the thread to whose
context we wish to switch. On entry to the subroutine the caller’s registers are saved. The
caller then saves its own stack pointer (SP) and frame pointer (FP) in its own control block.
It then fetches the target thread’s stack and frame pointers from its control block and loads
them into the actual stack and frame pointers. At this point, we have effectively switched
threads, since we are now executing on the target thread’s stack. All that has to be done is
to return—the return takes place on the target thread’s stack.
   This may be easier to follow if you now work through what happens when some thread
switches to our original thread: it will switch to the original thread’s stack and execute a
return, in the context (on the stack) of the original thread. So, from the point of view of the
original thread, it made a call to switch, which didn’t appear to do very much, but it took a
long time to do it.
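  Though switch itself cannot be written in portable C, the POSIX ucontext routines perform exactly this save-and-restore of stack state, so the round trip can be demonstrated. A sketch (Linux/glibc; the trace encoding and names are invented for illustration):

```c
#include <ucontext.h>

static ucontext_t main_ctx, other_ctx;
static char other_stack[64 * 1024];   /* the other thread's own stack */
static int trace[3], ntrace;

/* The other thread: runs briefly, then switches back to the original
   thread by restoring its saved context (swapcontext saves ours first). */
static void other_thread(void) {
    trace[ntrace++] = 2;
    swapcontext(&other_ctx, &main_ctx);   /* switch back */
}

/* Returns the order of execution encoded as a 3-digit number. */
int run_switch_demo(void) {
    getcontext(&other_ctx);
    other_ctx.uc_stack.ss_sp = other_stack;
    other_ctx.uc_stack.ss_size = sizeof other_stack;
    other_ctx.uc_link = &main_ctx;
    makecontext(&other_ctx, other_thread, 0);

    trace[ntrace++] = 1;
    swapcontext(&main_ctx, &other_ctx);   /* save our SP/FP, load theirs */
    trace[ntrace++] = 3;                  /* back on our own stack */
    return 100 * trace[0] + 10 * trace[1] + trace[2];
}
```

From run_switch_demo's point of view, swapcontext looks like an ordinary call that "didn't do much"; in fact another context ran in between, exactly as described above.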




System Calls
            prog( ) {                    write( ) {
                                                                              prog Frame
              ...                          ...
              write(fd, buffer, size);     trap(write_code);                  write Frame
              ...                          ...
            }                            }
                                                                            User Stack
         User
         Kernel        trap_handler(code) {                                 trap_handler
                         ...                                                    Frame
                         if (code == write_code)                            write_handler
                                                                                Frame
                           write_handler( );
                         ...
                       }
                                                                          Kernel Stack




  System calls involve the transfer of control from user code to system (or kernel) code and
back again. However, keep in mind that this does not necessarily involve a switch between
different threads—the original thread executing in user mode just changes its execution
mode to kernel (privileged) mode.
  For an example, consider a C program, running on a Unix system, that calls write. From
the programmer’s perspective, write is a system call, but a bit more work needs to be done
before we enter the kernel. Write is actually a routine supplied in a special user-level
library, the C library. Write is probably written in assembler language; the heart of
it is some instruction that causes a trap to occur, thereby making control enter the
operating system. Prior to this point, the thread had been using the thread’s user stack.
After the trap, as part of entering kernel mode, the thread switches to using the thread’s
kernel stack. (This notion of two stacks is used by most common architectures.) Within the
kernel our thread enters a fault-handler routine that determines the nature of the fault and
then calls the handler for the write system call.
  Note that if we have multiple threads of control, then each thread has its own pair of
stacks.
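  On Linux, the essence of the C library's write wrapper can be sketched with syscall(2), which loads the system-call number and arguments and executes the trap instruction for us (my_write is an invented name; the real wrapper also handles errno and restarting):

```c
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* A sketch of what the C library's write() does on Linux: put the
   system-call code and arguments where the kernel expects them, then
   trap into the kernel. syscall(2) performs exactly that trap. */
long my_write(int fd, const void *buf, size_t count) {
    return syscall(SYS_write, fd, buf, count);
}
```

Inside the kernel, the trap handler uses the code (here SYS_write) to dispatch to the write handler, as in the picture.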




Interrupts
                              Thread A
                                User                                            Interrupt-
           Thread A            Stack                                             Handler
             Code
                                                                                  Code

                                                    Processor

                              Thread A
                               Kernel
                               Stack








  When an interrupt occurs, the processor puts aside the execution of the current thread
and switches to executing the interrupt handler. When the interrupt handler is finished,
the processor resumes execution of the original thread. A very important question is: what
does the interrupt handler use for its stack? There are a number of possibilities: we could
allocate a new stack each time an interrupt occurs, we could have one stack that is shared
by all interrupt handlers, or the interrupt handler could borrow a stack from the thread it
is interrupting.
  The first technique, allocating a stack, is ruled out for a number of reasons, not the least
of which is that it is too time-consuming. The latter two approaches are both used. A single
system-wide interrupt stack was used on DEC’s VAX computers; in most other
architectures the interrupt handler borrows a stack (the kernel stack) from the thread that
was interrupted.
  It is very significant that the interrupt handler uses a stack borrowed from a thread. It
means that the interrupt handler executes in a context that is conceptually different from
that of a typical thread. This interrupt context cannot be put aside and resumed as thread
contexts can. For a single, shared interrupt stack, only one interrupt handler can use it at
a time (or, more precisely, in the case of nested interrupts, only one interrupt handler can
be both running and at the top of the stack at a time); thus we cannot put one interrupt
context aside and resume another. If the interrupt handler borrows a thread’s kernel stack,
we now have two contexts using the same stack; thus, at the very least, the interrupted
thread cannot be resumed until the interrupt handler completes, which means that we
cannot put the interrupt handler aside and resume normal execution, since normal
execution would involve resuming the interrupted thread!




Input/Output

                   • Architectural concerns
                      – memory-mapped I/O
                          - programmed I/O (PIO)
                          - direct memory access (DMA)
                      – I/O processors (channels)
                   • Software concerns
                      – device drivers
                      – concurrency of I/O and computation








  In this section we address the area of input and output (I/O). We discuss two basic I/O
architectures and talk about the fundamental I/O-related portion of an operating system—
the device driver.




Simple I/O Architecture

                                            Bus

                               Controller        Controller                Controller


            Processor

                                 Memory                                           Disk








  A very simple I/O architecture is the memory-mapped architecture. Each device is
controlled by a controller and each controller contains a set of registers for monitoring and
controlling its operation. In the memory-mapped approach, these registers appear to the
processor as if they occupied physical memory locations. In reality, each of the controllers
is connected to a bus. When the processor wants to access or modify a particular location,
it broadcasts the address on the bus. Each controller listens for a fixed set of addresses
and, if it finds that one of its addresses has been broadcast, then it pays attention to what
the processor would like to have done, e.g., read the data at a particular location or modify
the data at a particular location. The memory controller is a special case. It passes the bus
requests to the actual primary memory. The other controllers respond to far fewer
addresses, and the effect of reading and writing is to access and modify the various
controller registers.
  There are two categories of devices, programmed I/O (PIO) devices and direct memory
access (DMA) devices. In the former, I/O is performed by reading or writing data in the
controller registers a byte or word at a time. In the latter, the controller itself performs the
I/O: the processor puts a description of the desired I/O operation into the controller’s
registers, then the controller takes over and transfers data between a device and primary
memory.
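  From the processor's point of view, memory-mapped device registers are accessed with ordinary loads and stores. A sketch (the register layout and address are hypothetical; here the struct overlays ordinary memory so the code is runnable, where on real hardware it would overlay a fixed bus address):

```c
#include <stdint.h>

/* Hypothetical layout of a controller's registers. On real hardware
   this struct would be placed at a fixed physical address, e.g.
   volatile struct dev_regs *dev = (volatile struct dev_regs *)0xFFFF1000;
   "volatile" tells the compiler every access really touches the device. */
struct dev_regs {
    uint32_t control;
    uint32_t status;
    uint32_t read_reg;
    uint32_t write_reg;
};

/* With memory-mapped I/O, commanding the device is just an
   assignment through the pointer: the bus carries it to the controller. */
void dev_start(volatile struct dev_regs *dev, uint32_t cmd) {
    dev->control = cmd;
}
```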




Brown Simulator:
                                  I/O Registers
               DMA                                         SIM_dev_ctl(
                             Control Register                dev, new_val)
              & PIO
               DMA                                         sts =SIM_dev_sts(
                             Status Register                  dev)
              & PIO
                                                           val = SIM_dev_rreg(
                   PIO        Read Register                   dev)
                                                           SIM_dev_wreg(
                   PIO        Write Register                 dev, new_val)
                                                           SIM_dev_maddr(
                   DMA   Memory Address Register             dev, new_val)
                                                           SIM_dev_daddr(
                   DMA   Device Address Register             dev, new_val)





  The Brown Simulator supports both PIO and DMA devices. The default configuration has
one PIO device (a terminal) and one DMA device (a disk). Each device is identified by a
handle, as described in the simulator documentation. For each PIO device there are four
registers: Control, Status, Read, and Write. For each DMA device there are also four
registers: Control, Status, Memory Address, and Device Address. In the simulator, rather
than reading or writing particular locations to access these registers, procedures for
register access are provided, as shown in the picture.
  Note that the title of the slide contains a hypertext link to the Brown Simulator manual.




Programmed I/O

                   • E.g.: Terminal controller (in the simulator)
                   • Procedure (write)
                      – write a byte into the write register
                      – set the WGO bit in the control register
                      – wait for WREADY bit (in status register) to be
                        set (if interrupts have been enabled, an
                        interrupt occurs when this happens)








  The sequence of operations necessary for performing PIO is outlined in the picture. If
one chooses to perform I/O with interrupts disabled, one must check whether the I/O has
completed by testing the ready bit. If I/O is performed with interrupts enabled, then an
interrupt occurs when the operation is complete. The primary disadvantage of the former
technique is that the ready bit is typically checked many times before it is discovered to be
set.
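  The interrupts-disabled (polling) version of the slide's procedure can be sketched against a simulated controller (the bit positions are invented; the WGO and WREADY names come from the slide, and simulate stands in for the hardware acting on its own):

```c
#include <stdint.h>

#define WGO    0x1u   /* control: start a write (bit position invented) */
#define WREADY 0x2u   /* status: write complete (bit position invented) */

struct pio_dev { uint32_t control, status, write_reg; };

/* Simulated hardware: completes the write the moment WGO is set. */
static void simulate(struct pio_dev *d) {
    if (d->control & WGO)
        d->status |= WREADY;
}

/* The PIO write procedure from the slide, polling version:
   byte into the write register, set WGO, spin on WREADY. */
void pio_write_byte(struct pio_dev *d, uint8_t byte) {
    d->write_reg = byte;            /* 1: write a byte */
    d->status &= ~WREADY;
    d->control |= WGO;              /* 2: set the WGO bit */
    while (!(d->status & WREADY))   /* 3: poll the ready bit */
        simulate(d);                /* (real hardware acts by itself) */
}
```

The wasted iterations of that while loop are exactly the disadvantage of polling mentioned above.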




Direct Memory Access

                   • E.g.: Disk controller (in the simulator)
                   • Procedure
                      – set the disk address in the device address
                        register (only relevant for a seek request)
                      – set the buffer address in the memory address
                        register
                      – set the op code (SEEK, READ or WRITE), the
                        GO bit and, if desired, the interrupt ENABLE bit
                        in the control register
                      – wait for interrupt or for READY bit to be set








  For I/O to a DMA device, one must put a description of the desired operation into the
controller registers. A disk request on the simulator typically requires two operations: one
must first perform a seek to establish the location on disk from or to which the transfer will
take place. The second step is the actual transfer, which specifies the location in primary
memory to or from which the transfer will take place.
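  The contrast with PIO is that the processor merely describes the transfer and the controller does the rest. A sketch of filling in the DMA registers (register layout and bit values invented for illustration):

```c
#include <stdint.h>

#define DMA_READ 0x1u     /* op code (value invented) */
#define DMA_GO   0x100u   /* GO bit (value invented) */

struct dma_dev { uint32_t control, status, mem_addr, dev_addr; };

/* Describe an entire read in the controller's registers, then set GO;
   the controller itself moves the data to primary memory. */
void dma_start_read(struct dma_dev *d, uint32_t disk_addr, uint32_t buf_addr) {
    d->dev_addr = disk_addr;          /* where on the device */
    d->mem_addr = buf_addr;           /* where in primary memory */
    d->control  = DMA_READ | DMA_GO;  /* op code + GO (interrupts not enabled) */
}
```

After dma_start_read returns, the processor is free to compute while the transfer proceeds, waiting later on the interrupt or the READY bit.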




Device Drivers


                                                read

                                                                         Common
                                                write
                                                                           Data

                                             interrupt

                   Device                            Device Driver



          CS 167                             IV–19       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  A device driver is a software module responsible for a particular device or class of devices.
It resides in the lowest layers of an operating system and provides an interface to other
layers that is device-independent. That is, the device driver is the only piece of software that
is concerned about the details of particular devices. The higher layers of the operating
system need only pass on read and write requests, leaving the details to the driver. The
driver is also responsible for dealing with interrupts that come from its devices.
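One common way (not necessarily this course's) of expressing such a device-independent interface is a table of entry points that the upper layers call without knowing anything about the device. The names below are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Device-independent interface: a table of driver entry points. */
struct device;

struct driver_ops {
    int  (*read)(struct device *dev, void *buf, int nbytes);
    int  (*write)(struct device *dev, const void *buf, int nbytes);
    void (*interrupt)(struct device *dev);  /* invoked on a device interrupt */
};

struct device {
    const struct driver_ops *ops;  /* filled in by the driver */
    void *private_data;            /* per-device state the driver keeps */
};

/* A trivial "null device" driver implementing the interface. */
int null_read(struct device *dev, void *buf, int nbytes) {
    (void)dev; (void)buf; (void)nbytes;
    return 0;                      /* nothing to read */
}
int null_write(struct device *dev, const void *buf, int nbytes) {
    (void)dev; (void)buf;
    return nbytes;                 /* discard everything "written" */
}
void null_interrupt(struct device *dev) { (void)dev; }

const struct driver_ops null_ops = { null_read, null_write, null_interrupt };
```

The higher layers call `dev->ops->read(...)` and the like; only the driver behind the table knows which registers to poke.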




                                                                                                                     IV–19
I/O Processors: Channels


                                                  Channel                    Controller




                              Memory              Channel                    Controller
           Processor


                                                  Channel                    Controller


          CS 167                          IV–20       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Not all architectures employ the memory-mapped I/O model. Another common approach,
found mainly on "mainframes" intended for data processing, is the use of specialized I/O
processors called channels. Instead of containing a set of registers into which the central
processor writes a description of its requests, channels execute programs that have been
prepared for them in primary memory. The advantages of this approach are less
central-processor involvement in I/O and higher throughput.




                                                                                                                  IV–20
Dynamic Storage Allocation

                   • Goal: allow dynamic creation and destruction
                     of data structures
                   • Concerns:
                      – efficient use of storage
                      – efficient use of processor time
                   • Example:
                      – first-fit vs. best-fit allocation




          CS 167                              IV–21         Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Storage allocation is a very important concern in an operating system. Whenever a thread
is created, its stacks and control block and other data structures must be allocated, and
whenever a thread terminates, these data structures must be freed. As there are numerous
other such dynamic data structures, this allocation and liberation of storage must be done
as quickly as possible.
  One plausible technique for allocating fixed-size objects is to maintain a linked list of
available (free) objects of the appropriate size, and then allocate from this list and return
items to the list when they are freed. This technique is very time-efficient, but not
necessarily space-efficient—one must determine ahead of time exactly how much space to
allocate for each size of object.
  We discuss in this section space-efficient techniques for the management of storage. We
later discuss compromise techniques that also save time. Much of the material in this
section is taken from The Art of Computer Programming, Vol. 1: Fundamental Algorithms, by
D. Knuth.
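The fixed-size technique just described can be sketched as follows: carve a static arena into equal-sized objects, chain them into a free list, and allocate and free by popping and pushing the list head. The counts and object size here are arbitrary:

```c
#include <assert.h>
#include <stddef.h>

#define NOBJ    64
#define OBJSIZE 128

union obj {
    union obj *next;    /* link field, valid only while the object is free */
    char space[OBJSIZE];
};

union obj  arena[NOBJ];
union obj *freelist;

void pool_init(void) {
    /* chain every object in the arena onto the free list */
    for (int i = 0; i < NOBJ - 1; i++)
        arena[i].next = &arena[i + 1];
    arena[NOBJ - 1].next = NULL;
    freelist = &arena[0];
}

void *pool_alloc(void) {        /* O(1): pop the head of the free list */
    union obj *p = freelist;
    if (p != NULL)
        freelist = p->next;
    return p;
}

void pool_free(void *mem) {     /* O(1): push back onto the head */
    union obj *p = mem;
    p->next = freelist;
    freelist = p;
}
```

Both operations are constant-time, but the NOBJ objects are reserved whether or not they are ever used, which is exactly the space inefficiency noted above.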




                                                                                                                        IV–21
Allocation
                                                          1300
                                                          1200
                   First Fit                                     Best Fit
                                 300
                                        1000 bytes                                               1300
                                 1200                                                            200

                                 300                                                             200
                                 100    1100 bytes                                               200

                                 50
                                        250 bytes                Stuck!
                                 100

          CS 167                            IV–22       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Consider the situation in which we have one large pool of memory from which we will
allocate (and to which we will liberate) variable-sized pieces of memory. Assume that we are
currently in the situation shown at the top of the picture: two unallocated areas of memory
are left in the pool—one of size 1300 bytes, the other of size 1200 bytes. We wish to process
a series of allocation requests, and will try out two different algorithms. The first is known
as first fit—an allocation request is taken from the first area of memory that is large enough
to satisfy the request. The second is known as best fit—the request is taken from the
smallest area of memory that is large enough to satisfy the request. On the principle that
whatever requires the most work must work the best, one might think that best fit would
be the algorithm of choice.
  The picture illustrates a case in which first fit behaves better than best fit. We first
allocate 1000 bytes. Under the first-fit approach (shown on the left side), this allocation is
taken from the topmost region of free memory, leaving behind a region of 300 bytes of still
unallocated memory. With the best-fit approach (shown on the right side), this allocation is
taken from the bottommost region of free memory, leaving behind a region of 200 bytes of
still-unallocated memory. The next allocation is for 1100 bytes. Under first fit, we now have
two regions of 300 bytes and 100 bytes. Under best fit, we have two regions of 200 bytes.
Finally, there is an allocation of 250 bytes. Under first fit this leaves behind two regions of
50 bytes and 100 bytes, but the allocation cannot be handled under best fit—neither
remaining region is large enough.
  Clearly, one could come up with examples in which best fit performs better. However,
simulation studies performed by Knuth have shown that, on the average, first fit works
best. Intuitively, the reason for this is that best fit tends to leave behind a large number of
regions of memory that are too small to be of any use.
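The example can be replayed in a few lines of C. The model is deliberately minimal: each free region simply shrinks in place, and headers and addresses are ignored, so only the arithmetic of the picture is captured:

```c
#include <assert.h>

/* First fit: allocate from the first region large enough. */
int firstfit(int region[], int n, int size) {
    for (int i = 0; i < n; i++)
        if (region[i] >= size) {
            region[i] -= size;
            return 1;       /* success */
        }
    return 0;               /* no region large enough */
}

/* Best fit: allocate from the smallest region large enough. */
int bestfit(int region[], int n, int size) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (region[i] >= size && (best < 0 || region[i] < region[best]))
            best = i;
    if (best < 0)
        return 0;
    region[best] -= size;
    return 1;
}
```

Running the slide's request sequence (1000, 1100, 250 bytes against regions of 1300 and 1200) through both functions reproduces the picture: first fit ends with regions of 50 and 100 bytes, while best fit gets stuck on the final request.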




                                                                                                                    IV–22
Implementing First Fit:
                                   Data Structures
                    size
                    link                      size
                                              link                                  size
            struct fblock                                                           link




                                       struct fblock
                                                                        struct fblock



           CS 167                             IV–23       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   We now look at an implementation of the first-fit allocation algorithm. We need a data
structure—struct fblock—to represent an unallocated region of memory. Since these regions
are of variable size, the data structure has a size field. We need to link the unallocated
regions together, and thus the data structure has a link field. Conceptually, the data
structure represents the entire region of unallocated memory, but, since C has no natural
ability to represent variable-sized structures, we define names for only the size and link
fields.
   All of the fblocks are singly linked into a free list or avail list. The header for this list is
also a struct fblock.




                                                                                                                      IV–23
Implementing First Fit:
                                            Code
           char *firstfit(int size) {
             struct fblock *current, *next;
             int rem;

             current = &avail;
             next = current->link;
             while (next != &avail) {
               if (next->size >= size)
                 goto found;
               current = next;
               next = next->link;
             }
             return(NULL);
               // error: no space

           found:
             rem = next->size - size;
             if (rem < sizeof(struct fblock)) {
               // leave enough space for header
               current->link = next->link;
               return((char *)next);
             } else {
               next->size = rem - sizeof(struct fblock);
                 // must account for the space
                 // occupied by the header
               return((char *)((int)next + rem));
             }
           }

           CS 167                             IV–24           Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   The C code for the first-fit algorithm is shown in the slide. It searches the avail list for the
first fblock that represents a large enough region of free memory. If it finds no such region,
it returns NULL. Otherwise it determines how much space will be left over after the
allocation (it must make certain that any leftover space has at least enough room for a
header—i.e. the size and link fields of struct fblock). It then returns a pointer to the
beginning of the allocated space.




                                                                                                                          IV–24
Liberation of Storage




                        A
                                          free(A)




          CS 167                            IV–25      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  The liberation of storage is more difficult than its allocation, for the reason shown in the
picture. Here the shaded regions are unallocated memory. The region of storage, A,
separating the two unallocated regions is about to be liberated. The effect of doing this
should be to produce one large region of unallocated storage rather than three adjacent
smaller regions. Thus the liberation algorithm must be able to handle this sort of situation.




                                                                                                                   IV–25
Boundary Tags

                        size                                               -size




                                                                          flink
                                                                          blink
                        size                                               -size

                   Allocated Block                                 Free Block

          CS 167                            IV–26       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   A simple method for implementing storage liberation is to use a technique known as
boundary tags. The idea is that each region of memory, whether allocated or unallocated,
has a boundary tag at each end indicating its size and whether it is allocated or not. (A
positive size means allocated, a negative size means unallocated.) Thus, when we liberate a
region of memory, we can quickly check the adjacent regions to determine if they too are
free. Free regions are linked into a doubly linked list; thus free blocks also contain two link
fields—a forward link (flink) and a backward link (blink). We call the structure representing
a free block a struct block. (In the picture, storage addresses increase towards the top of the
page, so that a pointer to a struct block points to the bottom of the free block.)




                                                                                                                    IV–26
Boundary Tags: Code (1)

              #define PREV(x) (((int *)x)[-1])
              struct block avail;
                // assume that avail is initialized to refer
                // to list of available storage

              void free(struct block *b) {
               struct block *t0, *t1, *t2;

                    b = (struct block *)&PREV(b);
                      // b, as provided by the caller (who is not aware of the
                      // tags), points to the memory just after the boundary tag
                    b->size = -b->size;
                      // adjust the tag to indicate that the storage is “free”


           CS 167                              IV–27      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   This slide and the next three present the C code implementing liberation with boundary tags.
   We define the macro PREV which, given the address of a struct block, returns the size
field of the preceding block.
   The algorithm proceeds as follows. We first mark the beginning tag field of the block
being liberated to indicate that it is free. We then check to see if the previous adjacent
block is also free. If it is, we pull this block out of the free list and combine it with the block
being liberated. We then check to see if the block following the one being liberated is free. If
it is, we pull it out of the list and combine it with the block being liberated (which, of
course, may have already been combined with a previous block). Finally, after adjusting the
size fields in the tags, we insert the possibly combined block at the beginning of the free
list.




                                                                                                                      IV–27
Boundary Tags: Code (2)

         // check if block just before b is free:
         if (PREV(b) < 0) {
           // it’s free, so remove from free list and combine with b
           t0 = (struct block *) ((int)b - (-PREV(b)));
           // t0 now points to preceding block
           t1 = t0->flink;     // get free block after t0
           t2 = t0->blink;     // get free block before t0
           t1->blink = t2;     // link together
           t2->flink = t1;     // thereby eliminate t0 from free list
           t0->size += b->size; // combine sizes of t0 and b
           b = t0;             // b now refers to combined block
         }
         t0 = (struct block *)((int)b + (-b->size));
         // t0 now points to block beyond b
CS 167                               IV–28      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




                                                                                                            IV–28
Boundary Tags: Code (3)
         // check if the block just beyond b is free
         if (t0->size < 0) {
              // it’s free, so remove it from the free list
              // and combine it with b
           t1 = t0->flink; // get the free block after t0
           t2 = t0->blink; // get the free block before t0
           t1->blink = t2; // combine them together
           t2->flink = t1; // thereby remove t0
              // from the free list
           b->size += t0->size; // b now refers to
              // the combined block
            t0 = (struct block *)((int)t0 + (-t0->size));
              // t0 again refers to the block beyond b
         }


CS 167                             IV–29      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




                                                                                                          IV–29
Boundary Tags: (Code 4)

               // connect the possibly combined blocks to
               // the beginning of the free list
             PREV(t0) = b->size; // fix up b’s trailing size field
             b->flink = avail.flink; // link b into the
               // beginning of the free list
             b->blink = &avail;
             avail.flink->blink = b;
             avail.flink = b;
         }




CS 167                                 IV–30      Copyright © 2006 Thomas W. Doeppner. All rights reserved.




                                                                                                              IV–30
Garbage Collection
                                        root




              CS 167                            IV–31       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




   Garbage collection is the accepted name for a class of techniques for liberating storage. The
general idea is that one does not liberate storage explicitly; rather, it is somehow automatically
determined that a particular item is no longer useful and thus should be liberated.
   Consider an application in which nodes are linked into a graph structure and assume that one
such node has been designated the root. Any node on a path that starts from the root is
considered accessible and hence useful. Any node not on a path that starts from the root is
inaccessible and hence not useful (it’s not attached to any data structure that is currently being
used). These not-useful nodes are called garbage. The problem is to determine which nodes are
garbage. In some cases, this can be done quite simply: we associate with each node a reference
count that contains the count of the number of pointers from other nodes to this node. Thus
when we point a pointer at a node, we increment the node’s reference count by one, and when
we remove such a pointer, we decrement the reference count by one. Then, if the reference count
is zero, the node cannot be on any path that emanates from the root and is hence garbage. As
soon as the reference count becomes zero, we can put the node on the free list.
   It is clear that all nodes whose reference counts are zero are garbage, but is the converse true?
I.e., do all garbage nodes have a reference count that is zero? In the bottom of the picture are two
nodes, one pointing to the other. The first has a reference count of zero, but the second has a
reference count of one, yet both nodes are garbage. But we can deal with this when we put the
node whose reference count is zero on the free list: we remove each of its pointers, decrementing
the reference counts of the nodes pointed to.
   There is one more problem situation, however. Consider the node in the middle of the picture
that has three nodes pointing to it (two from above, one from below). If the top two pointers are
removed, then the node has a reference count of one, but it is not on a path that starts from the
root and hence is garbage. Thus reference counts are of no use at all in determining that this
node (and those it points to) are garbage. The problem is that the graph has a cycle. If we don’t
have cycles, then reference counts are sufficient for detecting garbage, but if we do have cycles,
then we must use some other technique.
   General garbage-collection techniques use a two-phase approach: first, all nodes that are not
garbage are somehow “marked.” Then all unmarked nodes are collected and placed on the free
list.
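Reference counting as described above can be sketched as follows, with two children per node as in the picture; the function names are illustrative. Exactly as argued in the text, this scheme cannot reclaim nodes that form a cycle:

```c
#include <assert.h>
#include <stdlib.h>

struct rcnode {
    int refcount;
    struct rcnode *lchild, *rchild;
};

struct rcnode *rc_new(void) {
    struct rcnode *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->refcount = 1;            /* the creator holds one reference */
    return n;
}

void rc_retain(struct rcnode *n) {  /* a new pointer now refers to n */
    if (n != NULL)
        n->refcount++;
}

void rc_release(struct rcnode *n) { /* a pointer to n has been removed */
    if (n != NULL && --n->refcount == 0) {
        /* n is unreachable from the root: drop its outgoing pointers
           (recursively releasing what they point to), then free it */
        rc_release(n->lchild);
        rc_release(n->rchild);
        free(n);
    }
}
```

Releasing the last reference to a node automatically decrements the counts of the nodes it points to, handling the chain of garbage at the bottom of the picture; the cycle in the middle of the picture, however, keeps its counts above zero forever.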




                                                                                                                        IV–31
Garbage Collection:
                                Simple Algorithm
               void GC( ) {
                 MARK(&nil) = MARK(&root) = 1;
                 traverse(&root);
                 collect( );
               }

               void traverse(struct node *node) {
                 if (node->lchild && !MARK(node->lchild)) {
                   // l child has not been visited
                   MARK(node->lchild) = 1;
                   traverse(node->lchild);
                 }
                 if (node->rchild && !MARK(node->rchild)) {
                   // r child has not been visited
                   MARK(node->rchild) = 1;
                   traverse(node->rchild);
                 }
               }

               void collect( ) {
                 for (all nodes) {
                   if (!MARK(node))
                     AddToFreeList(node);
                   else
                     MARK(node) = 0;
                 }
               }

           CS 167                            IV–32       Copyright © 2006 Thomas W. Doeppner. All rights reserved.




  Our garbage-collection algorithm is quite simple (in fact, too simple). Using a recursive
algorithm for its marking phase, it makes a preorder traversal of the graph: it traverses a
tree (or subtree) by first marking its root, then traversing the left subtree, and then the
right subtree.
  The collection phase simply examines every node in memory, appends unmarked nodes
to the free list, and clears all mark bits.
  Why is this algorithm too simple? I.e., what’s wrong with it?




                                                                                                                     IV–32

Creating a Fibonacci Generator in Assembly - by Willem van KetwichCreating a Fibonacci Generator in Assembly - by Willem van Ketwich
Creating a Fibonacci Generator in Assembly - by Willem van Ketwich
 
ROPInjector-Slides - Using-Return-Oriented-Programming-For-Polymorphism-And-A...
ROPInjector-Slides - Using-Return-Oriented-Programming-For-Polymorphism-And-A...ROPInjector-Slides - Using-Return-Oriented-Programming-For-Polymorphism-And-A...
ROPInjector-Slides - Using-Return-Oriented-Programming-For-Polymorphism-And-A...
 
CNIT 127: Ch 2: Stack overflows on Linux
CNIT 127: Ch 2: Stack overflows on LinuxCNIT 127: Ch 2: Stack overflows on Linux
CNIT 127: Ch 2: Stack overflows on Linux
 
Reversing & Malware Analysis Training Part 4 - Assembly Programming Basics
Reversing & Malware Analysis Training Part 4 - Assembly Programming BasicsReversing & Malware Analysis Training Part 4 - Assembly Programming Basics
Reversing & Malware Analysis Training Part 4 - Assembly Programming Basics
 
An introduction to ROP
An introduction to ROPAn introduction to ROP
An introduction to ROP
 
ROP 輕鬆談
ROP 輕鬆談ROP 輕鬆談
ROP 輕鬆談
 
Return Oriented Programming - ROP
Return Oriented Programming - ROPReturn Oriented Programming - ROP
Return Oriented Programming - ROP
 
Assembly language
Assembly languageAssembly language
Assembly language
 

Recently uploaded

The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

04basic Concepts

  • 1. Basic Concepts CS 167 IV–1 Copyright © 2006 Thomas W. Doeppner. All rights reserved. IV–1
  • 2. Outline • Subroutine linkage • Thread linkage • Input/output • Dynamic storage allocation CS 167 IV–2 Copyright © 2006 Thomas W. Doeppner. All rights reserved. In this lecture we go over some basic concepts important to the study of operating systems. We look at the low-level details of subroutine calling and see how they relate to the implementation of threads. We then cover the basics of I/O architectures. Finally, we look at dynamic storage allocation. IV–2
  • 3. Subroutines main( ) { int sub(int x, int y) { int i; return(x+y); int a; } ... i = sub(a, 1); ... } CS 167 IV–3 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Subroutines are (or should be) a well understood programming concept: one procedure calls another, passing it arguments and possibly expecting a return value. We examine how the linkage between caller and callee is implemented, first on the Intel x86 and then on a SPARC. IV–3
  • 4. Intel x86 (32-Bit): Subroutine Linkage args eip ebp stack frame saved registers local variables args eip ebp ebp saved registers local variables esp CS 167 IV–4 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Subroutine linkage on an Intel x86 is fairly straightforward. (We are discussing the 32-bit version of the architecture.) Associated with each incarnation of a subroutine is a stack frame that contains the arguments to the subroutine, the instruction pointer (in register eip) of the caller (i.e. the address to which control should return when the subroutine completes), a copy of the caller’s frame pointer (in register ebp), which links the stack frame to the previous frame, space to save any registers modified by the subroutine, and space for local variables used by the subroutine. Note that these frames are of variable size—the size of the space reserved for local data depends on the subroutine, as does the size of the space reserved for registers. The frame pointer register (ebp) points into the stack frame at a fixed position, just after the saved copy of the caller’s instruction pointer (note that lower-addressed memory is towards the bottom of the picture). The value of the frame pointer is not changed by the subroutine, other than setting it on entry to the subroutine and restoring it on exit. The stack pointer (esp) always points to the last item on the stack—new allocations (e.g. for arguments to be passed to the next procedure) are performed here. This picture is idealized: not all portions of the stack frame are always used. For example, registers are not saved if the subroutine doesn’t modify them. The frame pointer is not saved if it’s not used, etc. Note: for linked-list fans, a stack is nothing more than a singly linked list of stack frames. The Intel Pentium IV architecture manuals can be found at http://developer.intel.com/design/pentium4/manuals/. IV–4
  • 5. Intel x86: Subroutine Code _main PROC NEAR _sub PROC NEAR push ebp ; push frame ptr push ebp ; push f ptr mov ebp, esp ; set frame ptr mov ebp, esp ; set f ptr sub esp, 8 ; space for locals mov eax, 8[ebp] ; get x push 1 ; push arg 2 add eax, 12[ebp] ; add y mov eax, -4[ebp] ; get a pop ebp ; pop f ptr push eax ; push a ret 0 ; return call sub add esp, 8 ; pop args mov -8[ebp], eax ; store in i xor eax, eax ; return 0 mov esp, ebp ; restore stk ptr pop ebp ; pop f ptr ret 0 ; return CS 167 IV–5 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here we see assembler code based on the win32 calling sequence produced by the Microsoft Visual C++ compiler (with no optimization). In the main routine, first the frame pointer is pushed on the stack (following the arguments and instruction pointer (return address) which had been pushed by the caller). Next the current stack pointer is copied into the frame pointer register (ebp), thereby establishing a fixed reference into the stack frame. Space is now allocated on the stack for two local variables (occupying a total of eight bytes) by subtracting eight from the stack pointer. At this point the entry code for the main routine is complete and we now get ready to call the subroutine. First the arguments are pushed onto the stack, in reverse order. Note that a is referred to as four below the frame pointer (i.e., the first of the local variables). The subroutine is called. On return, the two arguments are popped off the stack by adding their size to the stack pointer. The return value of sub, in register eax, is stored into i. Now the main routine is ready to return to its caller. It clears the return register (eax), so as to return a zero, restores the stack pointer’s value to what was earlier copied into the frame pointer (thereby popping the local variables from the stack), restores the frame pointer by popping it off the stack, and finally returns to the caller. The action in the subroutine sub is similar. 
First the frame pointer (ebp) is pushed onto the stack, then the current stack pointer (esp) is copied into the frame pointer register. With the stack frame’s location established by the frame pointer, the code accesses the two parameters as 8 and 12 above the position pointed to by the frame pointer, respectively. The sum of the two parameters is stored in the result register (eax), the old frame pointer is popped from the stack, and finally a ret instruction is executed to pop the return address off the stack and return to it. IV–5
  • 6. SPARC Architecture return address i7 r31 o7 r15 frame pointer i6 r30 stack pointer o6 r14 i5 r29 o5 r13 i4 r28 o4 r12 i3 r27 o3 r11 i2 r26 o2 r10 i1 r25 o1 r9 i0 r24 o0 r8 Input Registers Output Registers l7 r23 g7 r7 l6 r22 g6 r6 l5 r21 g5 r5 l4 r20 g4 r4 l3 r19 g3 r3 l2 r18 g2 r2 l1 r17 g1 r1 l0 r16 0 g0 r0 Local Registers Global Registers CS 167 IV–6 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The SPARC (Scalable Processor ARChitecture) is an example of a RISC (Reduced-Instruction-Set Computer). We won’t go into all of the details of its architecture, but we do cover what is relevant from the point of view of subroutine calling conventions. There are nominally 32 registers on the SPARC, arranged as four groups of eight—input registers, local registers, output registers, and global registers. Two of the input registers serve the special purposes of a return address register and a frame pointer, much like the corresponding registers on the x86. One of the output registers is the stack pointer. Register 0 (of the global registers) is very special—when read it always reads 0 and when written it acts as a sink. The SPARC architecture manual can be found at http://www.sparc.com/standards/V8.pdf. IV–6
  • 7. SPARC Architecture: Register Windows input window 1 local output input local window 2 input output window 3 local output CS 167 IV–7 Copyright © 2006 Thomas W. Doeppner. All rights reserved. As its subroutine-calling technique the SPARC uses sliding windows: when one calls a subroutine, the caller’s output registers become the callee’s input registers. Thus the register sets of successive subroutines overlap, as shown in the picture. Any particular implementation of the SPARC has a fixed number of register sets (of eight registers apiece)—seven in the picture. As long as we do not exceed the number of register sets, subroutine entry and exit is very efficient—the input and local registers are effectively saved (and made unavailable to the callee) on subroutine entry, and arguments (up to six) can be efficiently passed to the callee. The caller just puts outgoing arguments in the output registers and the callee finds them in its input registers. Returning from a subroutine involves first putting the return value in a designated input register (i0). In a single action, control transfers to the location contained in i7, the return address register, and the register windows are shifted so that the caller’s registers are in place again. However, if the nesting of subroutine calls exceeds the available number of register sets, then subroutine entry and exit is not so efficient—the register windows must be copied to an x86-like stack. As implemented on the SPARC, when an attempt is made to nest subroutines deeper than can be handled by the register windows, a trap occurs and the operating system is called upon to copy the registers to the program’s stack and reset the windows. Similarly, when a subroutine return encounters the end of the register windows, a trap again occurs and the operating system loads a new set of registers from the values stored on the program’s stack. IV–7
  • 8. SPARC Architecture: Stack FP, old SP storage for local variables dynamically allocated stack space space for compiler temporaries and saved floating point registers outgoing parameters beyond 6th save area for callee to store register arguments one-word “hidden” parameter 16 words to save in and local regs SP CS 167 IV–8 Copyright © 2006 Thomas W. Doeppner. All rights reserved. The form of the SPARC stack is shown in the picture. Space is always allocated for the stack on entry to a subroutine. The space for saving the in and local registers is not used unless necessary because of a window overflow. The “hidden” parameter supports programs that return something larger than 32 bits—this field within the stack points to the parameter (which is located in separately allocated storage off the stack). IV–8
  • 9. SPARC Architecture: Subroutine Code ld [%fp-8], %o0 sub: ! put local var (a) save %sp, -64, %sp ! into out register ! push a new mov 1, %o1 ! stack frame add %i0, %i1, %i0 ! deal with 2nd ! compute sum ! parameter ret call sub ! return to caller nop restore st %o0, [%fp-4] ! pop frame off ! store result into ! stack (in delay slot) ! local var (i) ... CS 167 IV–9 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Here we see the assembler code produced by a compiler for the SPARC. The first step, in preparation for a subroutine call, is to put the outgoing parameters into the output registers. The first parameter, a from our original C program, is a local variable and is found in the stack frame. The second parameter is a constant. The call instruction merely saves the program counter in o7 and then transfers control to the indicated address. In the subroutine, the save instruction creates a new stack frame and advances the register windows. It creates the new stack frame by taking the old value of the stack pointer (in the caller’s o6), subtracting from it the amount of space that is needed (64 bytes in this example), and storing the result into the callee’s stack pointer (o6 of the callee). At the same time, it also advances the register windows, so that the caller’s output registers become the callee’s input registers. If there is a window overflow, then the operating system takes over. Inside the subroutine, the return value is computed and stored into the callee’s i0. The restore instruction pops the stack and backs down the register windows. Thus what the callee left in i0 is found by the caller in o0. IV–9
  • 10. Representing Threads Thread A Thread B Control Block Control Block fp Stack Stack fp sp sp CS 167 IV–10 Copyright © 2006 Thomas W. Doeppner. All rights reserved. We now consider what happens with multiple threads of control. Each thread must have its own context, represented by a control block and a stack. Together these represent what needs to be known about a thread within a particular address space. We are at the moment concerned about aspects of a thread pertaining to its flow of control. Thus we need to keep track of those components of a thread that affect its flow of control, in particular, the entire contents of each thread’s stack and the registers containing the status of the stack—the stack pointer and the frame pointer. IV–10
  • 11. Switching Between Threads void switch(thread_t next_thread) { save current_thread’s SP and FP; restore next_thread’s SP and FP; return; } CS 167 IV–11 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Switching between thread contexts turns out to be very straightforward (though not expressible in most programming languages). We have an ordinary-looking subroutine, switch. A thread calls it, passing the address of the control block of the thread to whose context we wish to switch. On entry to the subroutine the caller’s registers are saved. The caller then saves its own stack pointer (SP) and frame pointer (FP) in its own control block. It then fetches the target thread’s stack and frame pointers from its control block and loads them into the actual stack and frame pointers. At this point, we have effectively switched threads, since we are now executing on the target thread’s stack. All that has to be done is to return—the return takes place on the target thread’s stack. This may be easier to follow if you now work through what happens when some thread switches to our original thread: it will switch to the original thread’s stack and execute a return, in the context (on the stack) of the original thread. So, from the point of view of the original thread, it made a call to switch, which didn’t appear to do very much, but it took a long time to do it. IV–11
  • 12. System Calls prog( ) { write( ) { prog Frame ... ... write(fd, buffer, size); trap(write_code); write Frame ... ... } } User Stack User Kernel trap_handler(code) { trap_handler ... Frame if (code == write_code) write_handler Frame write_handler( ); ... } Kernel Stack CS 167 IV–12 Copyright © 2006 Thomas W. Doeppner. All rights reserved. System calls involve the transfer of control from user code to system (or kernel) code and back again. However, keep in mind that this does not necessarily involve a switch between different threads—the original thread executing in user mode just changes its execution mode to kernel (privileged) mode. For an example, consider a C program, running on a Unix system, that calls write. From the programmer’s perspective, write is a system call, but a bit more work needs to be done before we enter the kernel. Write is actually a routine supplied in a special library of (user-level) programs, the C library. Write is probably written in assembler language; the heart of it is some instruction that causes a trap to occur, thereby making control enter the operating system. Prior to this point, the thread had been using the thread’s user stack. After the trap, as part of entering kernel mode, the thread switches to using the thread’s kernel stack. (This notion of two stacks is used by most common architectures.) Within the kernel our thread enters a fault-handler routine that determines the nature of the fault and then calls the handler for the write system call. Note that if we have multiple threads of control, then each thread has its own pair of stacks. IV–12
  • 13. Interrupts Thread A User Interrupt- Thread A Stack Handler Code Code Processor Thread A Kernel Stack CS 167 IV–13 Copyright © 2006 Thomas W. Doeppner. All rights reserved. When an interrupt occurs, the processor puts aside the execution of the current thread and switches to executing the interrupt handler. When the interrupt handler is finished, the processor resumes execution of the original thread. A very important question is: what does the interrupt handler use for its stack? There are a number of possibilities: we could allocate a new stack each time an interrupt occurs, we could have one stack that is shared by all interrupt handlers, or the interrupt handler could borrow a stack from the thread it is interrupting. The first technique, allocating a stack, is ruled out for a number of reasons, not the least of which is that it is too time-consuming. The latter two approaches are both used. A single system-wide interrupt stack was used on DEC’s VAX computers; in most other architectures the interrupt handler borrows a stack (the kernel stack) from the thread that was interrupted. It is very significant that the interrupt handler uses a stack borrowed from a thread. It means that the interrupt handler executes in a context that is conceptually different from that of a typical thread. This interrupt context cannot be put aside and resumed as thread contexts can. For a single, shared interrupt stack, only one interrupt handler can use it at a time (or, more precisely, in the case of nested interrupts, only one interrupt handler can be both running and at the top of the stack at a time); thus we cannot put one interrupt context aside and resume another. 
If the interrupt handler borrows a thread’s kernel stack, we now have two contexts using the same stack; thus, at the very least, the interrupted thread cannot be resumed until the interrupt handler completes, which means that we cannot put the interrupt handler aside and resume normal execution, since normal execution would involve resuming the interrupted thread! IV–13
  • 14. Input/Output • Architectural concerns – memory-mapped I/O - programmed I/O (PIO) - direct memory access (DMA) – I/O processors (channels) • Software concerns – device drivers – concurrency of I/O and computation CS 167 IV–14 Copyright © 2006 Thomas W. Doeppner. All rights reserved. In this section we address the area of input and output (I/O). We discuss two basic I/O architectures and talk about the fundamental I/O-related portion of an operating system— the device driver. IV–14
  • 15. Simple I/O Architecture Bus Controller Controller Controller Processor Memory Disk CS 167 IV–15 Copyright © 2006 Thomas W. Doeppner. All rights reserved. A very simple I/O architecture is the memory-mapped architecture. Each device is controlled by a controller and each controller contains a set of registers for monitoring and controlling its operation. In the memory-mapped approach, these registers appear to the processor as if they occupied physical memory locations. In reality, each of the controllers is connected to a bus. When the processor wants to access or modify a particular location, it broadcasts the address on the bus. Each controller listens for a fixed set of addresses and, if it finds that one of its addresses has been broadcast, then it pays attention to what the processor would like to have done, e.g., read the data at a particular location or modify the data at a particular location. The memory controller is a special case. It passes the bus requests to the actual primary memory. The other controllers respond to far fewer addresses, and the effect of reading and writing is to access and modify the various controller registers. There are two categories of devices, programmed I/O (PIO) devices and direct memory access (DMA) devices. In the former, I/O is performed by reading or writing data in the controller registers a byte or word at a time. In the latter, the controller itself performs the I/O: the processor puts a description of the desired I/O operation into the controller’s registers, then the controller takes over and transfers data between a device and primary memory. IV–15
• 16. Brown Simulator: I/O Registers
  – Control Register (PIO & DMA): SIM_dev_ctl(dev, new_val)
  – Status Register (PIO & DMA): sts = SIM_dev_sts(dev)
  – Read Register (PIO): val = SIM_dev_rreg(dev)
  – Write Register (PIO): SIM_dev_wreg(dev, new_val)
  – Memory Address Register (DMA): SIM_dev_maddr(dev, new_val)
  – Device Address Register (DMA): SIM_dev_daddr(dev, new_val)
CS 167 IV–16 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
The Brown Simulator supports both PIO and DMA devices. The default configuration has one PIO device (a terminal) and one DMA device (a disk). Each device is identified by a handle, as described in the simulator documentation. For each PIO device there are four registers: Control, Status, Read, and Write. For each DMA device there are also four registers: Control, Status, Memory Address, and Device Address. In the simulator, rather than reading or writing particular locations to access these registers, procedures for register access are provided, as shown in the picture. Note that the title of the slide contains a hypertext link to the Brown Simulator manual. IV–16
• 17. Programmed I/O
• E.g.: terminal controller (in the simulator)
• Procedure (write)
  – write a byte into the write register
  – set the WGO bit in the control register
  – wait for the WREADY bit (in the status register) to be set (if interrupts have been enabled, an interrupt occurs when this happens)
CS 167 IV–17 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
The sequence of operations necessary for performing PIO is outlined in the picture. If you perform I/O with interrupts disabled, you must check whether the I/O has completed by testing the ready bit. If you perform I/O with interrupts enabled, an interrupt occurs when the operation is complete. The primary disadvantage of the former technique is that the ready bit is typically checked many times before it is discovered to be set. IV–17
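The write procedure above can be sketched in C against a toy model of the controller. The WGO and WREADY bit names come from the slide, but everything else (the struct layout, the device_step helper that plays the role of the hardware) is invented for the sketch; this is not the Brown Simulator's SIM_dev_* interface.

```c
#include <assert.h>

#define WGO    0x1   /* "go" bit in the control register (name from slide) */
#define WREADY 0x1   /* "write ready" bit in the status register           */

/* Hypothetical PIO terminal controller, modeled in software. */
struct pio_ctlr {
    unsigned char write_reg;    /* byte to transmit            */
    unsigned int  control;      /* WGO lives here              */
    unsigned int  status;       /* WREADY lives here           */
    char          output[64];   /* where the fake terminal echoes bytes */
    int           nout;
};

/* Fake device: reacts to WGO by consuming the write register.  A real
 * device would do this asynchronously and could raise an interrupt. */
static void device_step(struct pio_ctlr *c) {
    if (c->control & WGO) {
        c->output[c->nout++] = c->write_reg;
        c->control &= ~WGO;
        c->status  |= WREADY;
    }
}

/* The slide's procedure: byte into write register, set WGO, poll WREADY. */
void pio_putc(struct pio_ctlr *c, char byte) {
    c->write_reg = byte;
    c->status   &= ~WREADY;
    c->control  |= WGO;
    while (!(c->status & WREADY))
        device_step(c);   /* busy-wait: the cost of running with interrupts disabled */
}
```

The polling loop makes the slide's disadvantage concrete: the processor does nothing but test WREADY until the device finishes.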
• 18. Direct Memory Access
• E.g.: disk controller (in the simulator)
• Procedure
  – set the disk address in the device address register (only relevant for a seek request)
  – set the buffer address in the memory address register
  – set the op code (SEEK, READ, or WRITE), the GO bit and, if desired, the interrupt ENABLE bit in the control register
  – wait for an interrupt or for the READY bit to be set
CS 167 IV–18 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
For I/O to a DMA device, one must put a description of the desired operation into the controller registers. A disk request on the simulator typically requires two operations: one must first perform a seek to establish the location on disk from or to which the transfer will take place. The second step is the actual transfer, which specifies the location in primary memory to or from which the transfer will take place. IV–18
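The two-step seek-then-transfer request can be sketched the same way. The register names and the SEEK/READ/WRITE opcodes follow the slide; the in-memory "disk", block size, and dma_step helper are assumptions made so the sketch is self-contained and runnable.

```c
#include <assert.h>
#include <string.h>

enum { OP_SEEK = 1, OP_READ = 2, OP_WRITE = 3 };
#define GO    0x100   /* GO bit in the control register */
#define READY 0x1     /* READY bit in the status register */
#define BLKSZ 16      /* arbitrary block size for the sketch */

/* Hypothetical DMA disk controller, modeled in software. */
struct dma_ctlr {
    int   dev_addr;          /* device address register (block number)  */
    char *mem_addr;          /* memory address register (buffer)        */
    int   control;           /* op code + GO (interrupt ENABLE omitted) */
    int   status;
    char  disk[4][BLKSZ];    /* the fake disk: 4 blocks                 */
    int   head;              /* current head position, set by SEEK      */
};

/* Fake hardware: perform whatever operation GO requests, then set READY. */
static void dma_step(struct dma_ctlr *c) {
    if (!(c->control & GO)) return;
    switch (c->control & ~GO) {
    case OP_SEEK:  c->head = c->dev_addr;                      break;
    case OP_READ:  memcpy(c->mem_addr, c->disk[c->head], BLKSZ); break;
    case OP_WRITE: memcpy(c->disk[c->head], c->mem_addr, BLKSZ); break;
    }
    c->control &= ~GO;
    c->status |= READY;
}

/* The slide's two-step request: seek to the block, then transfer it. */
void disk_read(struct dma_ctlr *c, int block, char *buf) {
    c->dev_addr = block;            /* step 1: device address register */
    c->control = OP_SEEK | GO;
    c->status &= ~READY;
    while (!(c->status & READY)) dma_step(c);

    c->mem_addr = buf;              /* step 2: memory address register */
    c->control = OP_READ | GO;
    c->status &= ~READY;
    while (!(c->status & READY)) dma_step(c);
}
```

Note the contrast with PIO: the processor describes the whole transfer and the controller moves the data; the processor never touches the bytes one at a time.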
• 19. Device Drivers
[diagram: a device driver presenting a common read/write/data interface to the layers above it, and fielding interrupts from its device below]
CS 167 IV–19 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
A device driver is a software module responsible for a particular device or class of devices. It resides in the lowest layers of an operating system and provides an interface to other layers that is device-independent. That is, the device driver is the only piece of software that is concerned about the details of particular devices. The higher layers of the operating system need only pass on read and write requests, leaving the details to the driver. The driver is also responsible for dealing with interrupts that come from its devices. IV–19
• 20. I/O Processors: Channels
[diagram: the processor and memory connected to several channels, each of which controls one or more device controllers]
CS 167 IV–20 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
Not all architectures employ the memory-mapped I/O model. Another common approach (used mainly on “mainframes” used for data processing) is the use of specialized I/O processors called channels. Instead of containing a set of registers into which the central processor writes a description of its requests, channels execute programs that have been prepared for them in primary memory. The advantages of this approach are less central-processor involvement in I/O and higher throughput. IV–20
• 21. Dynamic Storage Allocation
• Goal: allow dynamic creation and destruction of data structures
• Concerns:
  – efficient use of storage
  – efficient use of processor time
• Example:
  – first-fit vs. best-fit allocation
CS 167 IV–21 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
Storage allocation is a very important concern in an operating system. Whenever a thread is created, its stacks and control block and other data structures must be allocated, and whenever a thread terminates, these data structures must be freed. As there are numerous other such dynamic data structures, this allocation and liberation of storage must be done as quickly as possible. One plausible technique for allocating fixed-size objects is to maintain a linked list of available (free) objects of the appropriate size, and then allocate from this list and return items to the list when they are freed. This technique is very time-efficient, but not necessarily space-efficient—one must determine ahead of time exactly how much space to allocate for each size of object. We discuss in this section space-efficient techniques for the management of storage. We later discuss compromise techniques that also save time. Much of the material in this section is taken from The Art of Computer Programming, Vol. 1: Fundamental Algorithms, by D. Knuth. IV–21
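The fixed-size free-list technique the notes mention can be sketched in a few lines of C: carve a static pool into objects once, keep the free ones on a singly linked list, and make both allocation and liberation O(1) list operations. The pool size and object layout here are arbitrary choices for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* A fixed-size object: when free, its first word doubles as the list link. */
struct obj { struct obj *next; char payload[24]; };

#define NOBJ 8
static struct obj pool[NOBJ];    /* space reserved ahead of time */
static struct obj *freelist;

/* Thread every object in the pool onto the free list. */
void pool_init(void) {
    for (int i = 0; i < NOBJ; i++) {
        pool[i].next = freelist;
        freelist = &pool[i];
    }
}

/* Pop the head of the free list; NULL when the pool is exhausted. */
struct obj *obj_alloc(void) {
    struct obj *o = freelist;
    if (o) freelist = o->next;
    return o;
}

/* Push a liberated object back on the head of the free list. */
void obj_free(struct obj *o) {
    o->next = freelist;
    freelist = o;
}
```

The NULL return when the pool runs dry is exactly the space-inefficiency the notes describe: the pool's size was fixed at compile time, whether or not it matches demand.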
• 22. Allocation
[diagram: two free regions of 1300 and 1200 bytes; requests of 1000, 1100, and 250 bytes are processed under first fit (left, which succeeds) and best fit (right, which gets stuck)]
CS 167 IV–22 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
Consider the situation in which we have one large pool of memory from which we will allocate (and to which we will liberate) variable-sized pieces of memory. Assume that we are currently in the situation shown at the top of the picture: two unallocated areas of memory are left in the pool—one of size 1300 bytes, the other of size 1200 bytes. We wish to process a series of allocation requests, and will try out two different algorithms. The first is known as first fit—an allocation request is taken from the first area of memory that is large enough to satisfy the request. The second is known as best fit—the request is taken from the smallest area of memory that is large enough to satisfy the request. On the principle that whatever requires the most work must work the best, one might think that best fit would be the algorithm of choice. The picture illustrates a case in which first fit behaves better than best fit. We first allocate 1000 bytes. Under the first-fit approach (shown on the left side), this allocation is taken from the topmost region of free memory, leaving behind a region of 300 bytes of still unallocated memory. With the best-fit approach (shown on the right side), this allocation is taken from the bottommost region of free memory, leaving behind a region of 200 bytes of still-unallocated memory. The next allocation is for 1100 bytes. Under first fit, we now have two regions of 300 bytes and 100 bytes. Under best fit, we have two regions of 200 bytes. Finally, there is an allocation of 250 bytes. Under first fit this leaves behind two regions of 50 bytes and 100 bytes, but the allocation cannot be handled under best fit—neither remaining region is large enough. Clearly, one could come up with examples in which best fit performs better. 
However, simulation studies performed by Knuth have shown that, on the average, first fit works best. Intuitively, the reason for this is that best fit tends to leave behind a large number of regions of memory that are too small to be of any use. IV–22
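The slide's arithmetic can be replayed with a small simulation. This is only a sketch of the two placement policies: headers, alignment, and coalescing are ignored so the numbers come out exactly as on the slide.

```c
#include <assert.h>

/* Allocate `request` bytes from the free regions in holes[0..n-1].
 * With first_fit nonzero, take the first hole that is large enough;
 * otherwise take the smallest hole that is large enough (best fit).
 * The remainder stays behind as a (possibly useless) smaller hole.
 * Returns 1 on success, 0 if no hole can satisfy the request. */
int allocate(int holes[], int n, int request, int first_fit) {
    int pick = -1;
    for (int i = 0; i < n; i++) {
        if (holes[i] < request) continue;       /* too small */
        if (first_fit) { pick = i; break; }     /* first fit: stop at first match */
        if (pick < 0 || holes[i] < holes[pick])
            pick = i;                           /* best fit: remember smallest match */
    }
    if (pick < 0) return 0;                     /* stuck: no hole large enough */
    holes[pick] -= request;
    return 1;
}
```

Running the slide's request sequence (1000, 1100, 250 against holes of 1300 and 1200) shows first fit ending with holes of 50 and 100 bytes while best fit gets stuck on the final request with two useless 200-byte holes, which is Knuth's intuition in miniature.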
• 23. Implementing First Fit: Data Structures
[diagram: the avail list—a header plus several struct fblocks of varying size, each with a size field and a link field, singly linked together]
CS 167 IV–23 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
We now look at an implementation of the first-fit allocation algorithm. We need a data structure—struct fblock—to represent an unallocated region of memory. Since these regions are of variable size, the data structure has a size field. We need to link the unallocated regions together, and thus the data structure has a link field. Conceptually, the data structure represents the entire region of unallocated memory, but, since C has no natural ability to represent variable-sized structures, we define names for only the size and link fields. All of the fblocks are singly linked into a free list or avail list. The header for this list is also a struct fblock. IV–23
• 24. Implementing First Fit: Code

    char *firstfit(int size) {
      struct fblock *current, *next;
      int rem;

      current = &avail;
      next = current->link;
      while (next != &avail) {
        if (next->size >= size)
          goto found;
        current = next;
        next = next->link;
      }
      return(NULL);  // error: no space

    found:
      rem = next->size - size;
      if (rem < sizeof(struct fblock)) {
        // leave enough space for header
        current->link = next->link;
        return((char *)next);
      } else {
        next->size = rem - sizeof(struct fblock);
        // must account for the space occupied by the header
        return((char *)((int)next + rem));
      }
    }

CS 167 IV–24 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
The C code for the first-fit algorithm is shown in the slide. It searches the avail list for the first fblock that represents a large enough region of free memory. If it finds no such region, it returns NULL. Otherwise it determines how much space will be left over after the allocation (it must make certain that any leftover space has at least enough room for a header—i.e. the size and link fields of struct fblock). It then returns a pointer to the beginning of the allocated space. IV–24
• 25. Liberation of Storage
[diagram: an allocated region A lying between two free regions; free(A) should yield one large free region]
CS 167 IV–25 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
The liberation of storage is more difficult than its allocation, for the reason shown in the picture. Here the shaded regions are unallocated memory. The region of storage, A, separating the two unallocated regions is about to be liberated. The effect of doing this should be to produce one large region of unallocated storage rather than three adjacent smaller regions. Thus the liberation algorithm must be able to handle this sort of situation. IV–25
• 26. Boundary Tags
[diagram: an allocated block with a positive size tag at each end; a free block with a negative size tag at each end, plus flink and blink fields just after the leading tag]
CS 167 IV–26 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
A simple method for implementing storage liberation is to use a technique known as boundary tags. The idea is that each region of memory, whether allocated or unallocated, has a boundary tag at each end indicating its size and whether it is allocated or not. (A positive size means allocated, a negative size means unallocated.) Thus, when we liberate a region of memory, we can quickly check the adjacent regions to determine if they too are free. Free regions are linked into a doubly linked list; thus free blocks also contain two link fields—a forward link (flink) and a backward link (blink). We call the structure representing a free block a struct block. (In the picture, storage addresses increase towards the top of the page, so that a pointer to a struct block points to the bottom of the free block.) IV–26
• 27. Boundary Tags: Code (1)

    #define PREV(x) (((int *)x)[-1])

    struct block avail;
    // assume that avail is initialized to refer
    // to list of available storage

    void free(struct block *b) {
      struct block *t0, *t1, *t2;

      b = (struct block *)&PREV(b);
      // b, as provided by the caller (who is not aware of the
      // tags), points to the memory just after the boundary tag

      b->size = -b->size;
      // adjust the tag to indicate that the storage is “free”

CS 167 IV–27 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
This slide and the following ones present the C code implementing liberation with boundary tags. We define the macro PREV which, given the address of a struct block, returns the size field of the preceding block. The algorithm proceeds as follows. We first mark the beginning tag field of the block being liberated to indicate that it is free. We then check to see if the previous adjacent block is also free. If it is, we pull this block out of the free list and combine it with the block being liberated. We then check to see if the block following the one being liberated is free. If it is, we pull it out of the list and combine it with the block being liberated (which, of course, may have already been combined with a previous block). Finally, after adjusting the size fields in the tags, we insert the possibly combined block into the beginning of the free list. IV–27
• 28. Boundary Tags: Code (2)

      // check if block just before b is free:
      if (PREV(b) < 0) {
        // it’s free, so remove from free list and combine with b
        t0 = (struct block *)((int)b - (-PREV(b)));
        // t0 now points to preceding block
        t1 = t0->flink;       // get free block after t0
        t2 = t0->blink;       // get free block before t0
        t1->blink = t2;       // link together,
        t2->flink = t1;       // thereby eliminating t0 from free list
        t0->size += b->size;  // combine sizes of t0 and b
        b = t0;               // b now refers to combined block
      }

      t0 = (struct block *)((int)b + (-b->size));
      // t0 now points to block beyond b

CS 167 IV–28 Copyright © 2006 Thomas W. Doeppner. All rights reserved. IV–28
• 29. Boundary Tags: Code (3)

      // check if the block just beyond b is free
      if (t0->size < 0) {
        // it’s free, so remove it from the free list
        // and combine it with b
        t1 = t0->flink;       // get the free block after t0
        t2 = t0->blink;       // get the free block before t0
        t1->blink = t2;       // link together,
        t2->flink = t1;       // thereby removing t0 from the free list
        b->size += t0->size;  // b now refers to the combined block
        t0 = (struct block *)((int)t0 + (-t0->size));
        // t0 again refers to the block beyond b
      }

CS 167 IV–29 Copyright © 2006 Thomas W. Doeppner. All rights reserved. IV–29
• 30. Boundary Tags: Code (4)

      // connect the possibly combined block to
      // the beginning of the free list
      PREV(t0) = b->size;      // fix up b’s trailing size field
      b->flink = avail.flink;  // link b into the
                               // beginning of the free list
      b->blink = &avail;
      avail.flink->blink = b;
      avail.flink = b;
    }

CS 167 IV–30 Copyright © 2006 Thomas W. Doeppner. All rights reserved. IV–30
• 31. Garbage Collection
[diagram: a graph of nodes reachable from a designated root, plus several unreachable (garbage) nodes, including a cycle]
CS 167 IV–31 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
Garbage collection is the accepted name for a class of techniques for liberating storage. The general idea is that one does not liberate storage explicitly; rather, it is somehow automatically determined that a particular item is no longer useful and thus should be liberated. Consider an application in which nodes are linked into a graph structure and assume that one such node has been designated the root. Any node on a path that starts from the root is considered accessible and hence useful. Any node not on a path that starts from the root is inaccessible and hence not useful (it’s not attached to any data structure that is currently being used). These not-useful nodes are called garbage. The problem is to determine which nodes are garbage.
In some cases, this can be done quite simply: we associate with each node a reference count that contains the count of the number of pointers from other nodes to this node. Thus when we point a pointer at a node, we increment the node’s reference count by one, and when we remove such a pointer, we decrement the reference count by one. Then, if the reference count is zero, the node cannot be on any path that emanates from the root and is hence garbage. As soon as the reference count becomes zero, we can put the node on the free list.
It is clear that all nodes whose reference counts are zero are garbage, but is the converse true? I.e., do all garbage nodes have a reference count that is zero? In the bottom of the picture are two nodes, one pointing to the other. The first has a reference count of zero, but the second has a reference count of one, yet both nodes are garbage. But we can deal with this when we put the node whose reference count is zero on the free list: we remove each of its pointers, decrementing the reference counts of the nodes pointed to. There is one more problem situation, however. 
Consider the node in the middle of the picture that has three nodes pointing to it (two from above, one from below). If the top two pointers are removed, then the node has a reference count of one, but it is not on a path that starts from the root and hence is garbage. Thus reference counts are of no use at all in determining that this node (and those it points to) are garbage. The problem is that the graph has a cycle. If we don’t have cycles, then reference counts are sufficient for detecting garbage, but if we do have cycles, then we must use some other technique. General garbage-collection techniques use a two-phase approach: first, all nodes that are not garbage are somehow “marked.” Then all unmarked nodes are collected and placed on the free list. IV–31
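Reference counting as just described can be sketched in a few lines of C, including the cycle problem. The node layout and the freed flag (a stand-in for "placed on the free list") are invented for the sketch.

```c
#include <assert.h>

#define MAXKIDS 2

struct node {
    int refcnt;                  /* number of pointers aimed at this node */
    int freed;                   /* stand-in for "on the free list"       */
    struct node *child[MAXKIDS];
};

/* A pointer is being aimed at n: bump its count. */
void ref(struct node *n) { if (n) n->refcnt++; }

/* A pointer to n is being removed.  When the last reference goes away,
 * free the node and recursively drop the references it was holding --
 * this is how the zero-count/one-count pair in the picture gets cleaned
 * up.  A cycle, however, never reaches count zero and is never freed. */
void unref(struct node *n) {
    if (!n || --n->refcnt > 0) return;
    for (int i = 0; i < MAXKIDS; i++)
        unref(n->child[i]);
    n->freed = 1;                /* the node would go on the free list here */
}
```

Dropping the only external reference to a two-node chain frees both nodes, but dropping the only external reference to a two-node cycle frees neither: each node still holds a reference to the other, which is exactly why the notes turn to mark-and-sweep techniques.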
• 32. Garbage Collection: Simple Algorithm

    void GC( ) {
      MARK(&nil) = MARK(&root) = 1;
      traverse(&root);
      collect( );
    }

    void traverse(struct node *node) {
      if (node->lchild && !MARK(node->lchild)) {
        // l child has not been visited
        MARK(node->lchild) = 1;
        traverse(node->lchild);
      }
      if (node->rchild && !MARK(node->rchild)) {
        // r child has not been visited
        MARK(node->rchild) = 1;
        traverse(node->rchild);
      }
    }

    void collect( ) {
      for (all nodes) {
        if (!MARK(node))
          AddToFreeList(node);
        else
          MARK(node) = 0;
      }
    }

CS 167 IV–32 Copyright © 2006 Thomas W. Doeppner. All rights reserved.
Our garbage-collection algorithm is quite simple (in fact, too simple). Using a recursive algorithm for its marking phase, it makes a preorder traversal of the graph: it traverses a tree (or subtree) by first marking its root, then traversing the left subtree, and then the right subtree. The collection phase simply examines every node in memory, appends unmarked nodes to the free list, and clears all mark bits. Why is this algorithm too simple? I.e., what’s wrong with it? IV–32