A survey of indexing techniques for sparse matrices


Transcript

  • 1. A Survey of Indexing Techniques for Sparse Matrices
UDO W. POOCH AND AL NIEDER
Texas A&M University,* College Station, Texas
* Department of Industrial Engineering.
Computing Surveys, Vol. 5, No. 2, June 1973
A sparse matrix is defined to be a matrix containing a high proportion of elements that are zeros. Sparse matrices of large order are of great interest and application in science and industry; for example, electrical networks, structural engineering, power distribution, reactor diffusion, and solutions to differential equations. While conclusions within this paper are primarily drawn considering orders of greater than 1000, much is applicable to sparse matrices of smaller orders in the hundreds. Because of increasing use of large order sparse matrices and the tendency to attempt to solve larger order problems, great attention must be focused on core storage and execution time. Every effort should be made to optimize both computer memory allocation and execution times, as these are the limiting factors that most often dictate the practicality of solving a given problem. Indexing algorithms are the subject of this paper, as they are generally recognized as the most important factor in fast and efficient processing of large order sparse matrices. Indexing schemes of main interest are the bit map, address map, row-column, and the threaded list. Major variations of the indexing techniques mentioned above are noted, as well as the particular indexing scheme inherent in diagonal or band matrices. The concluding section of the paper compares the types of methods, discusses their suitability for different types of processing, and makes suggestions concerning the adaptability and flexibility of the major existing methods of indexing algorithms for application to user problems.
Key Words and Phrases: Matrix, sparse matrix, matrix manipulation, indexing.
CR Categories: 5.14, 5.19
I. INTRODUCTION
Computations involving sparse matrices have been of widespread use since the 1950s, becoming increasingly popular with the advent of faster cycle times and larger computer memories. One cycle time is the time required for the central processing unit to send and to receive a data signal from main memory. Systems applications for sparse matrices include electrical networks and power distribution, structural engineering, reactor diffusion, and solutions to differential equations. A sparse matrix is a matrix having few nonzero elements. Matrix density is defined as the number of nonzero elements of the matrix divided by the total number of elements in the full matrix. Most available references utilizing sparse matrices for calculations [1-8] consider matrices of order 50, or more [9, 10], with densities ranging from 15% to 25% and decreasing steadily as the order increases. This paper will accept these boundary conditions as a strict definition of a sparse matrix. Brayton, Gustavson, and Willoughby [8] say that a typical large (implied to be in the hundreds) order sparse matrix has 2 to 10 nonzero entries per row. Hays [5] says that an average of 20 nonzero elements per row is not an unreasonably small number in quite large (implied to be around 100 and greater) order. Livesley [1] indicates that an average of 3 or 4 elements
  • 2. CONTENTS
I. Introduction
II. Bit Map Scheme
III. Address Map Scheme
IV. Row-Column Scheme
V. Threaded List Scheme
VI. Diagonal or Band Indexing Scheme
VII. Conclusion
Appendix A
Algorithm 1. Bit Map Scheme
Algorithm 2. Address Map Scheme
Algorithm 3. Address Map Scheme
Bibliography
Copyright © 1973, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.
per row in a large (implied to be around 1000) order structural problem is a good estimate. If the order I of the matrix is reasonably small, i.e., about order 50 or less, it would make little difference if the full matrix were kept in core. However, if the sparse matrix is of larger order than about 50, it becomes efficient in terms of execution time and core allocation to store only the nonzero entries of the matrix. The efficiency of retaining only the nonzero elements becomes obvious in the example of a 500 × 500 matrix with 10% density. With one word of storage allocated for each element, the full matrix requires 250,000 words, which is very often more than is physically available. Storing only the nonzero elements requires 25,000 words.
If the full matrix were multiplied by a similar full matrix, a minimum of $500 \times 500 \times 500 = 125 \times 10^6$ arithmetic operations is required, compared to a minimum of $(500 \times 10\%)^3 = 125 \times 10^3$ arithmetic operations when only the nonzero elements are retained. If both 500 × 500 matrices were to be retained in core as full matrices, core allocation and execution time would be prohibitive on many computers, and the problem would be abandoned as infeasible for computer solution. By storing the nonzero elements in some reasonable manner, and using logical operations to decide when arithmetic operations are necessary, Brayton, et al. [8] relate that both the storage requirements and the required amount of arithmetic can often, in practice, be decreased by a factor of I over the full matrix. Sparse matrices are classified generally by the arrangement of the nonzero elements. When the matrix is in random form, nonzero elements appear in no specific pattern. A matrix is said to be a band matrix, or in band form, if its elements $a_{ij} = 0$ for $|i - j| > m$ (where $m$ is a small integer, and usually $m \ll I$) and where the nonzero elements form a band along the main diagonal. The band width is the number of nonzero elements that appear in one row of a band matrix (i.e., $2m + 1$). A block-diagonal form occurs when submatrices of nonzero elements appear along the matrix diagonal. In block
  • 3. form, the matrix has submatrices of nonzero elements that occur in no specific pattern throughout the full matrix. The block dimension is the order of a submatrix in a block or block-diagonal matrix. In electrical network and power distribution problems, the matrix is generally in random, band, or block-diagonal form, with the elements representing circuit voltages, currents, impedances, power sources, or users [9-10]; in structural engineering applications, the sparse matrix is generally of band or block form, with the band width or block dimension representing the number of joints per floor [3, 11]; in reactor diffusion problems and differential equations, the band form of matrix is most common, with the band width being the number of points used in a point-difference formula [12-14]. This paper, while not concerned with the actual mathematical manipulations of sparse matrices, is primarily concerned with the indexing algorithms employed in such calculations. If the sparse matrix is stored in a haphazard manner, elements can only be retrieved by a search of all the data, which takes much time. If the sparse matrix is stored in some very convenient form, execution time will be much less. Conservation of execution time is of major importance in selecting an indexing algorithm. Another major consideration in selecting a particular indexing method is the amount of fast core the method requires in addition to that used for the storage of the nonzero data elements. For most applications, a small difference in core allocation between two methods is not a critical factor. In this case, the critical consideration is the execution time difference between the two methods. Since execution times vary greatly with the methods of indexing, an exact comparison of execution times must reflect the type of mathematical manipulation that is to be performed on the sparse matrix.
One last major aspect of indexing algorithm selection concerns the adaptability and flexibility of programming the selected scheme. This depends in great part on the type of machine, business or scientific; machine configuration; operating system capabilities; number of bits per word; access times for peripheral devices; average instruction times; availability of the required instructions; the maximum row or column size to be used; the expected matrix density; and the availability and size of buffers. As with most applications, the use of a high-level programming language may provide relative ease of implementation for a selected indexing scheme, but such use is frequently accompanied by penalties in execution time and storage requirements. However, on the positive side, use of high-level languages may well result in a minimum of elapsed time for problem solution with a given programming staff, as well as overall minimum cost, considering both personnel and computer usage. Problems involving large order sparse matrices focus their attention on core storage utilization and execution time minimization, and therefore all but eliminate the employment of high-level languages for indexing schemes. In subsequent sections of this paper, current indexing schemes will be examined in an attempt to isolate a "fast" indexing algorithm, with "fast" being defined as producing an optimization of execution time and core storage for sparse matrices of large order. Particular advantages and disadvantages of each major type of indexing discussed will be brought to the attention of the reader. Parts II through VI discuss aspects of particular indexing schemes, while Part VII compares the requirements and advantages of the various schemes. Part VII, in conclusion, also makes recommendations concerning the adaptability and flexibility of the major existing indexing algorithms for application to user problems.
The authors have attempted, as much as possible, to make their discussions machine independent. However, the authors made use of an IBM System 360/65 Model I in their research and certain basic aspects of this machine, such as the 32-bit word, are alluded to in the succeeding pages. The interested reader should have little difficulty in adapting the concepts presented to machines of differing architecture.
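The arithmetic behind the introduction's 500 × 500, 10%-dense example, and the band-form condition a(i, j) = 0 for |i − j| > m, can be checked with a short sketch. The helper name `is_band` and the small tridiagonal test matrix are illustrative assumptions, not part of the paper.

```python
# Worked arithmetic for the 500 x 500, 10%-dense example (values from the text).
order = 500
density = 0.10

full_words = order * order                    # one word per element
nonzero_words = round(full_words * density)   # storage for nonzero entries only

full_mult_ops = order ** 3                    # 500 x 500 x 500
sparse_mult_ops = round(order * density) ** 3  # (500 x 10%)^3, as in the text


def is_band(matrix, m):
    """True if matrix[i][j] == 0 whenever |i - j| > m (band width 2m + 1)."""
    return all(
        value == 0
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if abs(i - j) > m
    )


# Hypothetical 4 x 4 tridiagonal matrix (m = 1, band width 3).
tridiagonal = [
    [2, 1, 0, 0],
    [1, 2, 1, 0],
    [0, 1, 2, 1],
    [0, 0, 1, 2],
]

print(full_words, nonzero_words)        # 250000 25000
print(full_mult_ops, sparse_mult_ops)   # 125000000 125000
print(is_band(tridiagonal, 1))          # True
```

The operation counts are the minimum figures quoted in the text; they ignore the cost of the logical operations needed to locate the nonzero elements.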
  • 4. II. BIT MAP SCHEME
In a bit map scheme, a Boolean form of the matrix M is the basic indexing reference. Whenever a nonzero entry occurs in the sparse matrix, a 1 bit is placed in the bit map, with null entries remaining as zeros in the bit map. The position of each successive nonzero entry is found by counting over to the next 1 bit in the map. More rapid access to any element of a row is achieved by providing an additional row index vector, where each element of that vector is the address of the first nonzero element of each row [16]. An additional column index vector may also be applied for a more rapid column access, but this will also necessitate storing each nonzero entry twice. It should be noted, however, that any machine based on word, rather than bit, addressing techniques will give much slower access in one dimension of the matrix than in the other. As an example, the following matrix M and its associated bit map BM and reduced Z-vector are given.
$M = \begin{bmatrix} 0 & 3 & 0 & 0 \\ 2 & 0 & 5 & 0 \\ 4 & 0 & 0 & 7 \\ 0 & 1 & 0 & 8 \end{bmatrix}$
$BM = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}$
$Z = [3, 2, 5, 4, 7, 1, 8]$
Figure 1 demonstrates a sample bit map supplemented with the row index vector V; the Z elements are the nonzero elements of the matrix.
Row index    Bit map    Z-vector values
V(1) = 1     0100       Z(1)
V(2) = 2     1010       Z(2) Z(3)
V(3) = 4     1001       Z(4) Z(5)
V(4) = 6     0101       Z(6) Z(7)
FIG. 1. Sample bit map. The row index value indicates the first nonzero element for the row.
The bit map in Figure 1 is a matrix conception of the bit map. To conserve core, instead of using one word for each row of the bit map, all four rows (16 bits) are compacted into one word as shown in Figure 2 with byte (8 bits) boundaries marked.
Bit map:   | 0100 1010 | 1001 0101 | .... .... |
             byte 1      byte 2      byte 3
FIG. 2. Bit map of Figure 1 in core.
From Figure 2, it is simple to see that the bit map, being the Boolean form of the matrix, will, in fast core, require at least $W = I \cdot J / B$ words, where I and J are the dimensions of the matrix and B is the number of bits per word; W is rounded up to the nearest integer. The bit map uses at minimum $E_{\text{Bit Map}} = (100/B)\%$ of the storage requirements of the full matrix for indexing. The additional row index vector adds $W = I \cdot A / B$ more words, where A is the number of bits required for an address. Supplemented with the row index vector, $E_{\text{Bit Map + Row Index}} = (100/B)(1 + A/J)\%$ of the full matrix is required for the indexing. Now, if the sparse matrix has fewer than 65,536 nonzero elements, then A can be 16 bits in excess-32,768 notation. In a 32-bit-word machine, for example, 16 bits may be conveniently accessed if the instruction set has a complement of half-word instructions. Attention should be given to the number of bits required for an address to range through the maximum core size. If this number of bits is not conveniently manipulated, it will be necessary to use more than the minimum amount of core to gain an execution advantage. Execution times for full-word instructions are often less than execution times for half-word instructions. Therefore, when choosing a convenient number of bits for A, the number of bits used for an address, it is important to realize the tradeoff between core conservation and access time. Using B = 32 bits (word length) and A = 16 bits (half-word length), for a 500 × 500 matrix the bit map and row index vector require 8063 words (7813 for the bit map and 250 for the row index vector), or 3.225% of the 250,000 words for the full matrix; if the matrix is only 5% dense, another 12,500 words are required for the nonzero elements; the total is 20,563 words, or 8.225% of the full matrix.
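The scheme described above (a row-major bit map, a Z-vector of nonzero values, and a row index vector V) can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 1: the function names are hypothetical, a Python list of 0/1 values stands in for packed machine words, and the 4 × 4 matrix is an assumption chosen so that its bit map is 0100 1010 1001 0101 and its Z-vector is [3, 2, 5, 4, 7, 1, 8].

```python
import math


def build_bit_map(matrix):
    """Return (bits, z, v): the row-major bit map, the nonzero values, and the
    1-based position in z of the first nonzero element of each row."""
    bits, z, v = [], [], []
    for row in matrix:
        v.append(len(z) + 1)
        for value in row:
            bits.append(1 if value != 0 else 0)
            if value != 0:
                z.append(value)
    return bits, z, v


def get(bits, z, v, n_cols, i, j):
    """Element (i, j), 1-based, found by counting the 1 bits of row i that
    precede column j and offsetting from the row index entry."""
    start = (i - 1) * n_cols
    if bits[start + j - 1] == 0:
        return 0
    ones_before = sum(bits[start:start + j - 1])
    return z[v[i - 1] - 1 + ones_before]


def index_words(n_rows, n_cols, bits_per_word, address_bits):
    """Words for the bit map (I*J/B) and the row index vector (I*A/B),
    each rounded up to a whole word."""
    bit_map = math.ceil(n_rows * n_cols / bits_per_word)
    row_index = math.ceil(n_rows * address_bits / bits_per_word)
    return bit_map, row_index


# Illustrative matrix (assumed, see the lead-in).
M = [
    [0, 3, 0, 0],
    [2, 0, 5, 0],
    [4, 0, 0, 7],
    [0, 1, 0, 8],
]

bits, z, v = build_bit_map(M)
print("".join(map(str, bits)))        # 0100101010010101
print(z, v)                           # [3, 2, 5, 4, 7, 1, 8] [1, 2, 4, 6]
print(get(bits, z, v, 4, 3, 4))       # 7
print(index_words(4, 4, 32, 16))      # (1, 2)
```

Counting the preceding 1 bits is the step that makes access within a row proportional to the column position, which is why the row index vector only speeds up locating the start of a row.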
  • 5. In order to reference the $M_{ij}$ element, it is necessary to physically count across to the jth element in the ith "row" of the bit map. The correct bit will lie in the $S_w = ((i - 1) \cdot J + j + (B - 1))/B$ word of the bit map, with the quotient truncated to an integer. To isolate the required bit, it will be necessary to either shift the word the necessary number of bits or mask all the other bits by a logical operation. If a shift is used, then repeated shifts perform a row operation when the bit map is stored by rows. Algorithm 1 (see Appendix) isolates the correct beginning word of a row in the bit map; a segment of the code shifts through one entire row, in preparation for a mathematical manipulation of the row. Algorithm 1 with slight alteration will accommodate matrices up to order 100,000. The restriction occurs in statement 06, where the multiplication must not result in loss of significant bits due to exceeding word size. In practice, the algorithm is limited either by the index vector being half-words, as indexing is provided for only 65,536 nonzero elements; or by 4095 rows or columns, the maximum number used in the indexing in statement 02. When the bit map is stored by rows, as in the algorithm above, then to perform a column operation it is necessary to count to the correct j bit for all I rows. This means executing virtually the entire algorithm I times. If more than a few column operations are to be performed, then execution time will become an important factor. The execution time is dependent on the density of the sparse matrix, the order of the sparse matrix, and the number of column operations to be performed. The time factor is exemplified by the following: EXAMPLE 1. A 500 × 500 matrix exists, and it is necessary to perform 10 column operations when the matrix is 5% dense. The average column execution time will be that of the 250th column.
Assuming the entire algorithm is executed for each row, the execution time will be approximately:
500 rows × 10 column operations × [(time to locate beginning of each row)
+ 0.05 density × 500/2 columns × (time to process 1 bits)
+ (1 − 0.05 density) × 500/2 columns × (time to process 0 bits)
+ 500/2 columns × (time to locate bit in bit map)
+ 500/2 words × (time to locate word in bit map)]
which is about 10 seconds on the IBM 360/65, with additional microseconds incorporated for the mathematical operation not listed in the coding. Had the same procedure been carried out on the transpose of the bit map, that is, with the bit map column-oriented instead of row-oriented, then the execution time would have been cut by a factor of about 500, a considerable time savings. Not taken into consideration is any further computer processing, such as updating an index register after each 4095 characters or bytes, if necessary. If the bit map of the sparse matrix can be transposed and the data rearranged in less time than the difference between the column and row execution times, then the transpose operation will conserve execution time. In the above example, the difference between column and row execution times is about 9.7 seconds. For certain types of operations the bit map is ideal. Being in Boolean form, which means elements are either 1 or 0, true or false, or plus or minus, the bit map is the most compact form for logical operations, such as AND, OR, or EXCLUSIVE OR. Thus, if matrices MA and MB exist, and it is necessary to determine which elements are nonzero in both matrices, it is necessary only to AND each word of bit map MA with the corresponding word of bit map MB. If the result is zero, no position covered by that word holds a nonzero element in both matrices; if the result is nonzero, the elements marked by its 1 bits appear in both matrices.
An EXCLUSIVE OR determines which elements are present in either, but not both, of the matrices; an OR determines which elements appear in either or both of the matrices. Logical operations performed on the bit map require about 1/32 of the execution time for the same logical operation on the full scale matrix, because the bit map on a 32-bit-word machine condenses 32 pieces of data into 1 word. Additionally,
  • 6. and often most importantly, the bit map conserves core storage. To determine how many elements will be present in the sum of two rows, and their order, an OR is performed on the two rows of the bit map. Using similar techniques, the feasibility of rearranging the matrix in a form more convenient for the user, such as diagonal form, where nonzero elements appear all along the diagonal, is determined. Kettler and Weil [15] discuss some of the aspects of such a rearrangement algorithm. Many references are found to endorse or suggest the use of a bit map scheme for sparse matrices [7, 15-20], but it is particularly difficult to ascertain the exact algorithms utilized, as most authors do not include these in their papers. While a bit map scheme appears convenient and fast, it is restricted by the amount of fast core available for the bit map. In the case where the sparse matrix is less dense than the percentage of the full matrix that the bit map scheme occupies, core storage will be conserved by switching to an alternative method of indexing. Givens [21] has suggested that the bit map scheme would be more attractive to users if some special instructions were designed and implemented, to further decrease execution times. One such instruction Givens references is CLEAR TO ZERO, which would clear a large block of core, e.g., the bit map, from a first to a last address. Another instruction would be LOAD NEXT NONZERO, which would fetch the address of the next nonzero entry of the bit map, given the previous nonzero element, thereby eliminating the necessity of counting through all the zero bits. These special instructions would be implemented as microprogrammed subroutines [21].
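The logical operations on bit maps described in this section can be illustrated with a small sketch. Python integers stand in for machine words here, and the two 8-column rows, along with the helper names `row_bits` and `show`, are hypothetical.

```python
def row_bits(row):
    """Pack one matrix row into an integer bit map, leftmost column as the
    highest-order bit."""
    word = 0
    for value in row:
        word = (word << 1) | (1 if value != 0 else 0)
    return word


def show(word, width):
    """Render a bit map word as a fixed-width string of 0s and 1s."""
    return format(word, "0{}b".format(width))


# Two hypothetical 8-column rows.
row_a = [0, 3, 0, 0, 5, 0, 0, 1]
row_b = [2, 3, 0, 0, 0, 0, 4, 1]

a = row_bits(row_a)
b = row_bits(row_b)

both = a & b       # AND: nonzero in both rows
either = a | b     # OR: nonzero in either or both rows
only_one = a ^ b   # EXCLUSIVE OR: nonzero in exactly one row

# Positions that can be occupied in the sum of the two rows. Numerical
# cancellation (x + (-x) = 0) is not visible to the bit map, so this is a
# count of structural nonzeros.
nonzeros_in_sum = bin(either).count("1")

print(show(a, 8), show(b, 8))   # 01001001 11000011
print(show(both, 8))            # 01000001
print(show(either, 8))          # 11001011
print(show(only_one, 8))        # 10001010
print(nonzeros_in_sum)          # 5
```

One word-wide operation handles as many matrix positions as there are bits in the word, which is the source of the speed advantage the text attributes to the bit map for logical operations.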
To define a microprogram, it is necessary to understand that the execution of each assembly language instruction involves a specific sequence of transfers of information from one register in the processor to another; some of the transfers take place directly, and some through an adder or other logical circuit. Each of these steps defines a microinstruction, and the complete set of steps necessary to execute the assembly language instruction constitutes a microprogram [22].
III. ADDRESS MAP SCHEME
The address map is similar in form to the bit map, the main difference being that the address map stores an address or address displacement for each matrix element. If the matrix element is zero then a zero address is stored. The bit map requires only one bit for each matrix element. Since an address or address displacement requires more than one bit for each matrix element, the address map scheme will require N times as much core storage as the bit map scheme, where N is the number of bits used for an address or address displacement. If address displacements instead of full-length addresses are used, then the address map must be augmented by a row index vector, as with the bit map. Assuming there are fewer than 256 nonzero entries per row, for example, an address displacement would require only 8 bits (a common character size). If a particular computer allows character operations that are faster than the access time to an individual bit map entry, the improved column access time of the address map can warrant the increased core expenditure. On a system with 6-bit characters, up to 64 nonzero row entries can be accommodated. The overall percentage storage requirement of the full matrix required for the address map with the row index vector will be $E_{\text{Address Map}} = (100/B)(C + A/J)\%$, where B is the number of bits per word; C is the number of bits used for an address displacement; A is the number of bits used for an element of the row index vector; and J is the number of columns of the matrix. Using C = 8 bits; A = 32 bits; B = 32 bits; and J = 1000 columns, the address map and row index vector require 25.1% of the full matrix, that is 251,000 words compared to 1 million for the full matrix. In addition, if the matrix is 5% dense, an additional 50,000 words are required for the storage of the nonzero elements. In order to isolate the $M_{ij}$ element, it is necessary to access the $S_c = (i - 1) \cdot J + j$ character (or byte). In terms of words, $S_w = \{C[(i - 1) \cdot J + (j - 1)] + B\}/B$,
  • 7. where i and j are respectively the row and column of interest. If the $S_c$ character (byte) is zero, it is a null entry; otherwise, the content of the $S_c$ character (byte) is added to the row index element to give the address of the nonzero element. The address map scheme is subject to many of the same limitations as the bit map scheme, and requires a larger amount of core storage for indexing. A sample coding, Algorithm 2, which has the same characteristics as the example used in the bit map method (Algorithm 1), illustrates that fewer arithmetic operations than the bit map method are required when the computer is equipped with character addressing capabilities. If the computer used does not allow convenient arithmetic manipulation of individual characters, then the coding enclosed in brackets in Algorithm 2 must be added to overcome this difficulty. The bracketed coding requires much of the algorithm time, so if a computer has built-in arithmetic character manipulation, then the algorithm becomes increasingly faster. With an example similar to Example 1, we find that the execution time, with the bracketed coding included, is drastically different from the bit map time. This is primarily because of the easy access to any character. To access by column instead of by row, only the first row location of the correct column need be found. To find the correct location of the character in row 2, it is sufficient to add just the column dimension. This process is continued until the end of the matrix is encountered. For a column manipulation, then, we easily obtain Algorithm 3, similar to Algorithm 2. EXAMPLE 2. As in Example 1, a 500 × 500 matrix exists with 5% density, and it is necessary to perform 10 column operations.
It is therefore necessary to execute Algorithm 3 ten times, so the execution time will be approximately:
10 column operations × [(initialization time to locate beginning of each row)
+ 500 rows × (time to locate bit in bit map)
+ (1 − 0.05 density) × (time to process 0 bits)
+ 0.05 density × (time to process 1 bits)]
which is about 30 msec on the IBM 360/65, and has incorporated 2 additional μsec that were included for the mathematical operation not listed in the coding. As with Algorithm 1, the limitations are due to the use of half-words for the index vector, and to the use of an index register. Note that there is a considerable time savings, but at the expense of computer memory. Again, not taken into consideration is any further computer processing, other than the above coding, such as updating index registers, which may be necessary and require more time. Unlike the bit map scheme, where the entire row of the bit map up to the desired element must be scanned for nonzero entries before data manipulation can occur, the address map method requires only a reference to the desired element. Because the storage location of a data element is found independently of all except the desired address displacement, the address map method blends well with the concept of parallel processing. Parallel processing involves the simultaneous execution of a sequence of operations by dependent central processing units. Thus, using the address map method, 4 separate central processing units could simultaneously execute the required arithmetic on 4 different elements of the matrix; at best, using the bit map method, different steps in the execution of 1 matrix element would be shared by the 4 central processing units.
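The address map with 8-bit displacements and a row index vector, as described in this section, might be sketched as below. This is not the paper's Algorithm 2 or 3: the function names are hypothetical, a `bytearray` stands in for character-addressable core, the row index here holds the count of nonzero elements in earlier rows (an assumed convention), and the 4 × 4 matrix is illustrative.

```python
def build_address_map(matrix):
    """Return (amap, z, v). amap holds one byte per matrix element, by rows:
    0 for a null entry, otherwise the 1-based displacement of the element
    within its row's run of nonzero values. v[i] is the number of nonzero
    values stored before row i, so z[v[i] + displacement - 1] is the element."""
    amap = bytearray()
    z, v = [], []
    for row in matrix:
        v.append(len(z))
        displacement = 0
        for value in row:
            if value == 0:
                amap.append(0)
            else:
                displacement += 1
                if displacement > 255:
                    raise ValueError("more than 255 nonzero entries in a row")
                amap.append(displacement)
                z.append(value)
    return amap, z, v


def get(amap, z, v, n_cols, i, j):
    """Element (i, j), 1-based: a single byte fetch, with no scan of the row."""
    d = amap[(i - 1) * n_cols + (j - 1)]
    return 0 if d == 0 else z[v[i - 1] + d - 1]


def column(amap, z, v, n_rows, n_cols, j):
    """Walk column j by stepping the byte position by the row length J."""
    position = j - 1
    result = []
    for i in range(n_rows):
        d = amap[position]
        result.append(0 if d == 0 else z[v[i] + d - 1])
        position += n_cols
    return result


# Illustrative 4 x 4 matrix (an assumption).
M = [
    [0, 3, 0, 0],
    [2, 0, 5, 0],
    [4, 0, 0, 7],
    [0, 1, 0, 8],
]

amap, z, v = build_address_map(M)
print(list(amap))   # [0, 1, 0, 0, 1, 0, 2, 0, 1, 0, 0, 2, 0, 1, 0, 2]
print(get(amap, z, v, 4, 3, 4))      # 7
print(column(amap, z, v, 4, 4, 4))   # [0, 0, 7, 8]
```

Because each lookup depends only on one byte and one row index entry, the lookups for different elements are independent of one another, which is the property the text connects to parallel processing.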
Employing the address map method, the processing units could work independently, except for the final results; while the bit map method would require transfers of information from one processing unit to the other processing units to execute the shared steps, which introduces an additional time lag. While no references have been found to explicitly endorse or suggest this method, and comparatively large core requirements exist, the address map scheme may prove useful with some future computer that features both very fast core of a few million characters and a multitude of parallel processing units.
  • 8. Hoffman and McCormick [22] state that at present the value of parallel processing on a large scale is debatable as far as manipulating sparse matrices, as there are virtually no available computers with more than just a few parallel central processing units, and the field is quite unexplored.
IV. ROW-COLUMN SCHEME
Row-column indexing schemes refer to methods relying on paired vectors of some type; generally one vector contains the nonzero elements, which are most often ordered by rows or columns, and the other vector maintains the indexing information. Row-column indexing schemes are sometimes referred to as block index, row, or column packing schemes, depending on the author's description of how the indexing algorithm works [7, 15, 17, 20, 23-24]. In the simplest, but not the most core- and time-efficient form, each nonzero element of the matrix has a corresponding index word that contains a specified number of bits for the row designation and another specified number of bits for the column designation (Figure 4). If computations are to be performed in a row manner, it is highly practical and efficient to order the nonzero entries first by rows and then by columns. Ordering the entries by rows makes it unnecessary to maintain the row index for every nonzero element; only the row need be identified for the first nonzero element of each row, as it is known that all the following entries up to the next row indicator belong to the same row. In order to create the row marker, a check bit, such as a minus sign bit, can be set in the first column index word of each row (Figure 5), or as is usually done, an additional and separate row index vector can be created (Figure 6). The row index element generally contains the address or index number of the first column index for the row.
0 2 0 0
6 0 4 0
3 0 0 0
7 9 0 5
FIG. 3. Sample matrix.
Index   Row   Column   Value
V(1)    1     2        Z(1) = 2
V(2)    2     1        Z(2) = 6
V(3)    2     3        Z(3) = 4
V(4)    3     1        Z(4) = 3
V(5)    4     1        Z(5) = 7
V(6)    4     2        Z(6) = 9
V(7)    4     4        Z(7) = 5
FIG. 4. Indexing with row and column designators.
Index   Row indicator (sign bit)   Column   Value
V(1)    -                          2        Z(1) = 2
V(2)    -                          1        Z(2) = 6
V(3)    +                          3        Z(3) = 4
V(4)    -                          1        Z(4) = 3
V(5)    -                          1        Z(5) = 7
V(6)    +                          2        Z(6) = 9
V(7)    +                          4        Z(7) = 5
FIG. 5. Indexing with row indicator and column designation.
First column index for each row (halfword): VR(1) = 1, VR(2) = 2, VR(3) = 4, VR(4) = 5
Column (halfword): V(1) = 2, V(2) = 1, V(3) = 3, V(4) = 1, V(5) = 1, V(6) = 2, V(7) = 4
Values: Z(1) = 2, Z(2) = 6, Z(3) = 4, Z(4) = 3, Z(5) = 7, Z(6) = 9, Z(7) = 5
FIG. 6. Indexing with row vector and column index vector.
The same system may be applied to ordering the entries
  • 9. by columns if column operations are to be performed. Figures 3 through 6 depict sample vectors for the row-column schemes described above. The index vectors are V and VR; the nonzero entries are contained in vector Z. The data matrix used in Figures 4 through 6 is displayed in Figure 3. The nonzero entries of the data matrix are stored by rows, in order of increasing column number. All index vectors are full words unless otherwise noted. From the above figures it is evident that there exists a wide possibility of variation in the row-column scheme of indexing. Further variations and adaptations can occur as a result of optimizing peculiar computer characteristics, or as a result of making calculations on special forms of sparse matrices, such as block matrices. However, caution is advised, for such optimizations may result in a useless program whenever system changes occur, and should therefore only be used when they are critical economies of the calculations. In the instance of computer peculiarities, Smith [17] states that a particular type of second generation IBM computer did not utilize the bits of the second word in extended-precision floating-point calculations that were normally used as the exponent bits in single precision floating-point calculations. A sparse matrix row-column indexing algorithm was developed that employed these otherwise wasted 8 to 9 bits as the row or column indices, and could accommodate matrices up to order 255 and 511 respectively. For the case of a special sparse matrix, the row-column indexing scheme for a block-diagonal matrix could become a blocked indexing scheme. The blocked indexing scheme would be identical to the row-column method, except that the large sparse matrix is partitioned into several smaller submatrices (blocks). Then each submatrix is identified with a separate row-column scheme of some sort.
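The row-column variants described above (a separate row index vector VR paired with a column vector V, and the alternative of marking the first column index of each row with a sign bit) can be sketched as follows. The function names are hypothetical, the trailing sentinel entry in `vr` is an addition for convenience that the paper does not describe, and the 4 × 4 matrix is an assumed sample whose first row is 0 2 0 0 and whose nonzero values by rows are 2, 6, 4, 3, 7, 9, 5.

```python
def pack_rows(matrix):
    """Return (vr, v, z): vr[i] is the 1-based position in v and z of the first
    entry of row i + 1; v holds column numbers and z the nonzero values, both
    ordered by row and then by increasing column."""
    vr, v, z = [], [], []
    for row in matrix:
        vr.append(len(z) + 1)
        for col, value in enumerate(row, start=1):
            if value != 0:
                v.append(col)
                z.append(value)
    vr.append(len(z) + 1)  # sentinel: one past the last entry (an assumption)
    return vr, v, z


def row_entries(vr, v, z, i):
    """(column, value) pairs of row i (1-based)."""
    start, stop = vr[i - 1] - 1, vr[i] - 1
    return list(zip(v[start:stop], z[start:stop]))


def sign_marked(vr, v):
    """Check-bit variant: negate the first column index of each row.
    This sketch assumes every row has at least one nonzero entry."""
    firsts = {start - 1 for start in vr[:-1]}
    return [-col if k in firsts else col for k, col in enumerate(v)]


# Assumed sample matrix (see the lead-in).
M = [
    [0, 2, 0, 0],
    [6, 0, 4, 0],
    [3, 0, 0, 0],
    [7, 9, 0, 5],
]

vr, v, z = pack_rows(M)
print(vr)                         # [1, 2, 4, 5, 8]
print(v)                          # [2, 1, 3, 1, 1, 2, 4]
print(z)                          # [2, 6, 4, 3, 7, 9, 5]
print(row_entries(vr, v, z, 4))   # [(1, 7), (2, 9), (4, 5)]
print(sign_marked(vr, v))         # [-2, -1, 3, -1, -1, 2, 4]
```

The separate row index vector costs one extra entry per row but gives direct access to any row, whereas the sign-bit marker needs no extra vector but requires a scan from the start to find a given row.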
A blocked indexing scheme may also refer to combining several column indices into one block (word). For example, one 64-bit word would contain 4 column indices of 16 bits each. When a row operation is performed, then, 4 nonzero elements can be readied for processing at the expense of the loading time for only one block [17]. It should be noted that for many computers and algorithms more time is required to load a referenced word for arithmetic processing than is required to perform the arithmetic needed to isolate the required bits of the referenced word. Likewise, more time is required to load extended-precision words than ordinary words. Also, since most computers are geared to utilize arithmetic data primarily by words, more time is required to load a half-word for arithmetic processing than is required to load a full word. Another major variation, known as delta or displacement indexing, is also popular, and is somewhat similar to the address map form of indexing. In one particular example of a delta indexing scheme, one 64-bit extended-precision word contains one 16-bit index and six 8-bit displacements from that index. Therefore, the column indices of 7 elements can be referred to by loading and processing one extended-precision word, which can result in considerable savings of both time and core. For a delta of 8 bits, 2 successive nonzero entries of the same row can be at most 255 columns apart. If elements can appear farther apart than 255 columns, then a greater number of bits must be allocated for each delta or the method must be abandoned. To determine the column number of the first element paired with the 64-bit index word, the first 16 bits of the index word are used. To determine the column number of any other element paired with the 64-bit index word, the appropriate delta is added to the sum of the first 16 bits and all the deltas in between.
Smith [17] also states that delta indexing is more efficient than a blocked index form for large order (implied order about 250) sparse matrices. Figures 7 and 8 depict the blocked and delta indexed words mentioned above; the two words are equivalent. EXAMPLE 3. From Figure 7, column index 3 = 1078. From Figure 8, column index 3 = 1027 + 20 + 31 = 1078.
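The equivalence of the blocked and delta index words in Example 3 can be checked with a short sketch. The bit layout used here (first index in the high-order bits, a zero delta marking an unused slot) and all function names are assumptions made for illustration; the source does not specify them.

```python
MASK16 = 0xFFFF
MASK8 = 0xFF

def pack_blocked(cols):
    """Pack four 16-bit column indices into one 64-bit word (first index in the high bits)."""
    assert len(cols) == 4 and all(0 <= c <= MASK16 for c in cols)
    word = 0
    for c in cols:
        word = (word << 16) | c
    return word

def unpack_blocked(word):
    """Recover the four 16-bit column indices from a blocked index word."""
    return [(word >> shift) & MASK16 for shift in (48, 32, 16, 0)]

def pack_delta(cols):
    """Pack up to seven increasing column indices as a 16-bit base plus six 8-bit deltas."""
    assert 1 <= len(cols) <= 7 and 0 <= cols[0] <= MASK16
    deltas = [b - a for a, b in zip(cols, cols[1:])]
    assert all(0 < d <= MASK8 for d in deltas), "gap too large for an 8-bit delta"
    deltas += [0] * (6 - len(deltas))  # assumption: a zero delta marks an unused slot
    word = cols[0]
    for d in deltas:
        word = (word << 8) | d
    return word

def unpack_delta(word):
    """Rebuild column indices by adding each delta to the running column number."""
    cols = [(word >> 48) & MASK16]
    for k in range(6):
        d = (word >> (40 - 8 * k)) & MASK8
        if d == 0:
            break
        cols.append(cols[-1] + d)
    return cols

cols = [1027, 1047, 1078, 1095]          # values shown in Figures 7 and 8
blocked = pack_blocked(cols)
delta = pack_delta(cols)
```

Under these assumptions both words decode to the same column indices, and the delta word still has three unused delta slots, which is the core saving the text describes.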
  • 10. [Figure 7. Blocked index word: four column indices of 16 bits each (column index 1 = 1027, column index 2 = 1047, column index 3 = 1078, column index 4 = 1095).]
For the row-column indexing method, using a column index for each nonzero entry and a row index vector, the minimum requirement for indexing is $W = \frac{I}{B}(J \cdot T \cdot D + V)$ words, where I is the number of rows; J is the number of columns; T is the number of bits used for a column index element; D is the density of the matrix; V is the number of bits used for a row index element; and B is the number of bits per word. In practice, however, for matrices up to order 65,535 (in excess 32,768 notation), half-words may be most conveniently and efficiently used for all the row and column indices. Half-word indices are used to increase core savings at a generally tolerable increase in execution time; few if any matrices of order 30,000 or greater have been of notable use. Using half-word indices, then, the above-mentioned indexing scheme requires a minimum core storage of $E_{\text{Row-column}} = (1/2J + D)\,\%$ of the full matrix for indexing. To access an $M_{ij}$ element, it is necessary to refer to the ith row index, which points to the first nonzero element of the ith row. The column indices between the ith and (i + 1)st row indices are searched for j. If the column indices searched do not contain j, the $M_{ij}$ element is zero; otherwise the data element paired with the j column index is fetched and processed. For row operations, as long as the matrix remains ordered, execution time is very fast. For more than a few column operations, however, on a matrix of order greater than about 200, it is almost always more convenient and efficient to transpose the entire matrix and reorder all the data elements before performing the desired arithmetic.
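The element access just described (consult the row index, then search that row's column indices for j) can be sketched as follows. The small matrix, the 0-based numbering, and the extra final entry in the row index vector that marks the end of the last row are assumptions introduced for this illustration.

```python
def get_element(row_start, col_index, values, i, j):
    """Return M[i][j] from row-ordered storage; rows and columns are 0-based here."""
    lo, hi = row_start[i], row_start[i + 1]
    for k in range(lo, hi):
        if col_index[k] == j:
            return values[k]
        if col_index[k] > j:  # columns within a row are stored in increasing order
            break
    return 0.0                # j was not found, so the element is zero

# Hypothetical 4 x 5 matrix:
# [[0, 7, 0, 0, 2],
#  [0, 0, 0, 0, 0],
#  [5, 0, 0, 3, 0],
#  [0, 0, 9, 0, 0]]
row_start = [0, 2, 2, 4, 5]       # position of the first stored entry of each row
col_index = [1, 4, 0, 3, 2]       # column of each stored entry
values = [7.0, 2.0, 5.0, 3.0, 9.0]
```

A whole row is available as the slice `values[row_start[i]:row_start[i + 1]]`, which is why row operations are fast, whereas fetching a column requires visiting every row.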
Again, the same situation exists as with the bit map: if the data and indexing scheme can be transposed in less time than the difference between the column and row execution times, then the transpose operation will conserve execution time. Unlike the bit map and address map schemes, which have constant core requirements for indexing, the row-column method has a core requirement for indexing directly proportional to the matrix density. Since each nonzero element has a paired column index, only the number of elements in the row index vector is constant. For example, adding two 50 × 50 sparse matrices, MA and MB, does not in general produce a result whose number of nonzero elements is the sum of the nonzero counts of the two operands: if MA has 250 data elements and MB has 450, the sum of MA and MB will not, in most cases, have 700 elements. In the sum of matrices MA and MB, the only certainty is that there will still be 50 row index elements. A variable amount of core for indexing creates core allocation difficulties that may not be readily acceptable to the user. In comparison to the bit map method, the row-column indexing method is noted for its fast execution time, when data elements are properly ordered, and its ease of programming, even for matrices of very large order (in the thousands). A wide variety of references endorse (or imply an endorsement of) a row-column technique for indexing [15, 17, 25-30], or a block-diagonal method [31-34], especially for particular applications, as noted in the Introduction, or for special matrices, such as symmetric matrices.
[Figure 8. Delta index word: one 16-bit column index (1027) followed by six 8-bit deltas, the first three being 20, 31, and 17.]
It should be noted that a symmetric matrix
  • 11. decreases by almost 50% the core requirements in the row-column technique, both for the data elements and for the indexing elements. Two of the more general sets of algorithms encountered for processing random, and some special, sparse matrices and employing the row-column indexing technique are MATLAN [29], an IBM product, and Algorithm 408 [30], a more recent private effort. As these algorithms are readily available and are of general interest, a particular coding example is not given for the row-column indexing technique. Both these algorithms were intended for use on sparse matrices of order less than about 32,700, and are more efficient for orders less than (about) 1,000. MATLAN is a programming system, operating under the control of Operating System/360, and has a very wide applicability. MATLAN includes many supplementary features, such as different versions for an all-core problem and for a segmented problem, three overlay structures for core storage, and options on precision. A segmented problem exists when portions of the problem under consideration are stored in core and on tapes or disks; an all-core problem exists when the storage requirement is such that the entire problem is stored in fast memory. Because of the variable precision option and the all-core or segmented feature, it is difficult to assess execution times. Array dimensions are limited to 32,756, which indicates half-words are used for indexing purposes. Algorithm 408 uses a variation of the indexing algorithm depicted in Figure 6. Instead of having the row index vector contain the address or index number of the first column index for the row, the row index vector contains the number of stored elements in the row. In addition, the row index vector is appended to the column index vector by using the same array name, M.
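The difference between the two row index conventions can be illustrated briefly. This is only a sketch of the idea of storing per-row counts, not code from Algorithm 408; the function name is hypothetical. With counts, the position of a row's first stored entry is the running sum of the counts of the preceding rows.

```python
from itertools import accumulate

def counts_to_starts(row_counts):
    """Convert per-row counts of stored elements into first-entry positions.

    The returned list has one extra final entry equal to the total number of
    stored elements, so row i occupies positions starts[i] .. starts[i+1]-1.
    """
    return [0] + list(accumulate(row_counts))

row_counts = [2, 0, 2, 1]   # hypothetical matrix with four rows
starts = counts_to_starts(row_counts)
```

Counts are cheaper to keep current when a row gains or loses an element (only one entry changes), while start positions give direct access to a row without any summation.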
While the scope of Algorithm 408 is not as broad as that of MATLAN, Algorithm 408 has the distinct advantage of being readily alterable: a section of the reference is devoted to possible alterations, such as combining three or more indices into a word of the M array. Because of the great variation in coding, at present it is not considered economically worthwhile to compare actual core storage and execution times to determine which of the many different existing algorithms employing the row-column method is the most efficient or optimal. A good basis for examining some of the row-column indexing scheme characteristics rests on using half-word indices, with a row index vector, for calculations. At worst, the method (as typified by Algorithm 408) will utilize less core than the full matrix up to a density of slightly over 66%. Conservation of core allocation and execution time increases as the density decreases. It has been noted that the bit map method employs approximately 4% of the full matrix for indexing. Therefore, it can easily be seen that when the matrix density falls below about 4%, the row-column method will conserve more core than the bit map scheme. In addition, the advantage of the faster indexing into the data by the row-column method in this case almost excludes the use of the bit map, except for special cases, such as a Boolean problem.
V. THREADED LIST SCHEME
A threaded, or linked list, scheme contains one element of an array in core for each nonzero element of the sparse matrix. Each array element in a linked list method has at least three components: one component contains the row and column indices; another contains the matrix element (data); and the third contains the address of, or a pointer to, the next array element. If the third component of an array element were not present, the linked list scheme would have, at an absolute minimum, the same core requirement for indexing as the row-column method.
The third component adds $W = I \cdot J \cdot A \cdot D / B$ more words for indexing, which gives a minimum total of $W = \frac{I}{B}\left(J\,(T + A)\,D + V\right)$ words for indexing a threaded list scheme, where I is the number of rows; J is the number of columns; D is the density of the matrix; T is the number of bits used for a column index; V is the number
  • 12. of bits used for a row index; A is the number of bits required for an address to range through the entire amount of core used to contain the complete threaded list; and B is the number of bits per word. For any practical application, however, both the row and column indices must be retained with every element, which gives an overall minimum core allocation for indexing of $W = I \cdot J \cdot D\,(T + V + A)/B$ words. As in the previously discussed methods of indexing, half-words (16 bits) are used in practice for both the row and column indices, which gives the capability of handling a matrix of order 65,535 (in excess 32,768 notation). In addition, because of the great difficulty and time involved in manipulating addresses of less than full word size (refer to Bit Map Scheme), full words (32 bits) are conveniently used for addresses. With these choices, the overall minimum core storage for indexing becomes $W = 2 \cdot I \cdot J \cdot D$ words. As a percentage (E) of the full matrix, $E_{\text{Linked List}} = 2D\,\%$ is necessary for indexing. In order to reference an $M_{ij}$ element, the entire threaded list must be searched if the nonzero elements are stored in a random manner. Elements can be stored, except for updates, and accessed more efficiently by rows and columns, which can reduce access time to particular elements or rows of elements. Elements need not be stored contiguously for reasonably efficient processing. In one particular application of a threaded list scheme, data elements were initially stored by rows and columns, and a table of pointers was kept. Each pointer addressed the beginning element of a group of 8 elements. Any particular item, or row of items, could be found by a binary search on the list of pointers. Example 4 typifies the search for a particular matrix element in this application of linked list indexing. EXAMPLE 4. Matrix elements are stored by rows and columns.
The element to be found is in the middle row of the matrix, so the pointer in the middle of the pointer list is selected. The contents of the pointer word address an element of the linked list. The element is then examined, to compare the row and column components with the required row and column numbers. Three separate cases can now occur: (1) If the row and column numbers match, the correct element has been found. (2) The rest of the elements in the group of 8 are searched, and if the row matches, but not the column, it is known that the correct group can probably be found by a search on the next few pointers about the pointer last used. If the pointer indexed an element whose column number was greater than required, then the next lower pointer is used. (3) The rest of the elements in the group of 8 are searched, and if the row does not match, then a binary search on the pointers is continued. In a binary search, if the pointer indexed an element whose row number was greater than required, the next pointer to be selected is the one halfway between the last pointer (the upper bound pointer in this case) and the lower bound pointer (the first pointer in this case). When the procedure of (2) above has been iterated and the appropriate groups have been searched, but the correct row and column cannot be found, then it is known that the required matrix element is the null element. It should be noted that unless the data elements are in reasonable order, the binary search on the pointers is almost useless. The particular value of a linked list is that there is no longer the requirement that data elements be stored contiguously: updates, insertions, and deletions of matrix elements are performed by altering the address component of the appropriate linked elements. However, a linked list expansion or contraction results in some pointer groups having a greater number of link elements, and some other pointer groups having fewer link elements.
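The search of Example 4 can be approximated by the following sketch. It is a simplification, not the procedure of the cited application: it assumes the chain is in strict row-major order and that the pointer table has just been rebuilt (every group holds exactly 8 elements), and it replaces the three-case iteration with a single binary search on the (row, column) keys of the pointed-to elements followed by a walk through one group. All names are hypothetical.

```python
from bisect import bisect_right

GROUP = 8  # elements per pointer group, as in the application described above

class Node:
    def __init__(self, row, col, value):
        self.row, self.col, self.value = row, col, value
        self.next = None  # address component: link to the next element

def build(entries):
    """Link (row, col, value) entries in row-major order and build the pointer table."""
    head = prev = None
    pointers, keys = [], []
    for n, (r, c, v) in enumerate(sorted(entries)):
        node = Node(r, c, v)
        if prev is None:
            head = node
        else:
            prev.next = node
        if n % GROUP == 0:          # first element of each group of 8
            pointers.append(node)
            keys.append((r, c))
        prev = node
    return head, pointers, keys

def find(pointers, keys, row, col):
    """Binary search on the pointer table, then search the selected group."""
    g = bisect_right(keys, (row, col)) - 1
    if g < 0:
        return 0.0
    node = pointers[g]
    for _ in range(GROUP):
        if node is None or (node.row, node.col) > (row, col):
            break                    # passed the position: the element is null
        if (node.row, node.col) == (row, col):
            return node.value
        node = node.next
    return 0.0

entries = [(i, j, float(10 * i + j + 1)) for i in range(6) for j in (0, 3, 6, 9)]
head, pointers, keys = build(entries)
```

Once insertions and deletions make the groups uneven, the fixed walk of 8 elements is no longer sufficient, which is the reason the pointer table must be rebuilt periodically.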
The alterable number of link
  • 13. elements in each pointer group necessitates a periodic updating of the pointer table. A pointer table update is vital to the efficiency of the binary search, and may require a great amount of execution time. The amount of execution time required for a pointer table update depends directly on the number of link elements to be grouped, as each link element must be inspected in order to find each successive link element. For peak efficiency of the binary search, every group should have the same number of linked list elements. When the additional pointer table is used to combat the otherwise slow execution time of the linked list scheme, one pointer exists for each 8 nonzero matrix elements. Employing a full word for each pointer, which is an address, we now have a minimum indexing core requirement of $W = 2\tfrac{1}{8} \cdot I \cdot J \cdot D$ words, or $E_{\text{Linked List}} = 2.125\,D\,\%$ of the full matrix. This is a much greater core requirement than the row-column methods of the previous section require for any matrix of order greater than three. Figure 9 depicts a few elements of a linked list, and the correlation between elements. A pointer table is not included.
[Figure 9. Linked list elements: each element holds the address of the next element, the row and column indices, and the data element; sample elements are shown at addresses 1051, 1162, and 1273.]
Not previously mentioned is the practical necessity of maintaining a table of available addresses, so that core allocation remains conservative during the insertion and deletion of matrix elements. When matrix elements are deleted, the address of the deleted link element must be appended to the table of available addresses. Not only must the table be maintained in fast core, but the threaded list scheme additionally requires a buffer area to be used for the inserted and/or deleted link elements.
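A minimal sketch of a threaded list with a table of available addresses is given below. It assumes a fixed block of storage addressed by small integers, with -1 standing for "no next element"; the class and method names are hypothetical. Insertion and deletion only alter address components and the table of available addresses; no data elements are shifted.

```python
class ThreadedList:
    def __init__(self, capacity):
        self.row = [0] * capacity
        self.col = [0] * capacity
        self.val = [0.0] * capacity
        self.nxt = [-1] * capacity      # address of the next element, -1 at the end
        self.head = -1
        self.free = list(range(capacity - 1, -1, -1))  # table of available addresses

    def insert(self, r, c, v):
        """Insert (or overwrite) element (r, c), keeping row-major order."""
        prev, cur = -1, self.head
        while cur != -1 and (self.row[cur], self.col[cur]) < (r, c):
            prev, cur = cur, self.nxt[cur]
        if cur != -1 and (self.row[cur], self.col[cur]) == (r, c):
            self.val[cur] = v
            return
        slot = self.free.pop()          # raises IndexError when storage is exhausted
        self.row[slot], self.col[slot], self.val[slot] = r, c, v
        self.nxt[slot] = cur
        if prev == -1:
            self.head = slot
        else:
            self.nxt[prev] = slot

    def delete(self, r, c):
        """Unlink element (r, c) and return its address to the available table."""
        prev, cur = -1, self.head
        while cur != -1 and (self.row[cur], self.col[cur]) != (r, c):
            prev, cur = cur, self.nxt[cur]
        if cur == -1:
            return False
        if prev == -1:
            self.head = self.nxt[cur]
        else:
            self.nxt[prev] = self.nxt[cur]
        self.free.append(cur)
        return True

    def items(self):
        """Follow the chain and list (row, col, value) in stored order."""
        out, cur = [], self.head
        while cur != -1:
            out.append((self.row[cur], self.col[cur], self.val[cur]))
            cur = self.nxt[cur]
        return out

t = ThreadedList(4)
t.insert(1, 2, 5.0)
t.insert(0, 3, 1.0)
t.insert(1, 0, 2.0)
```

Each update here still walks the chain from the head to locate its position; the pointer table discussed above is what would shorten that walk.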
If such a buffer area is not used or kept, then core will not be conserved and the prime advantage of the threaded list will have been discarded. Few references endorse, or suggest endorsement of, the linked list scheme as a practical method for indexing sparse matrices [15, 34-37]. Only a few sources [15, 38-40] found in the literature survey actually utilized the threaded list scheme; while the actual algorithms were seldom described in great detail, the scheme basically followed the design of Example 4. Overall, the threaded list technique of indexing into sparse matrices requires a significant amount of execution time for processing indices, in addition to the core requirements of a buffer and two separate tables. Inherent in the method, then, are considerable execution times for processing and considerable core expenditure, in comparison with the bit map and row-column schemes for identical matrices. Offsetting these disadvantages, however, the linked list scheme has the distinct advantage of not requiring a significant amount of execution time to update the linked list by insertion or deletion of single matrix elements or series of matrix elements. All other previously discussed indexing techniques require a shifting of data when an update is performed, which will take a great amount of execution time when numerous matrix elements have to be shifted to make the appropriate word available for the update. The linked list scheme is slow for random processing of matrix elements; however, in many applications items are accessed sequentially by row or column. In these applications, proper chains of pointers speed up processing greatly. As with previous methods, a definite symmetry of the sparse matrix reduces proportionately the core requirements for indexing.
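The storage estimates quoted in this section for the threaded list scheme can be checked numerically. The sketch below evaluates the word count from the formula with both indices retained per element, optionally adding one address-sized pointer per group of elements; the function name, the parameter defaults (16-bit indices, 32-bit addresses and words), and the sample matrix size are assumptions chosen to match the text.

```python
def threaded_list_words(I, J, D, T=16, V=16, A=32, B=32, group=None):
    """Words of indexing storage for a threaded list.

    I, J: rows and columns; D: density as a fraction; T, V, A: bits per
    column index, row index, and address; B: bits per word.
    group: if given, one extra A-bit pointer is kept per `group` elements.
    """
    words = I * J * D * (T + V + A) / B
    if group:
        words += I * J * D * A / (B * group)
    return words

I = J = 1000
D = 0.01                                   # 1 percent density
plain = threaded_list_words(I, J, D)
with_table = threaded_list_words(I, J, D, group=8)
percent_plain = 100.0 * plain / (I * J)    # share of the full matrix, in percent
percent_table = 100.0 * with_table / (I * J)
```

For this hypothetical 1000 × 1000 matrix at 1% density, the two percentages come out near 2 and 2.125, that is, 2D% and 2.125D% with D expressed in percent.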
  • 14. VI. DIAGONAL OR BAND INDEXING SCHEME
Band and diagonal matrices are special types of matrices that occur frequently in electrical engineering, structural engineering, nuclear engineering and physics, solutions to differential equations, and a host of other fields, as mentioned in the Introduction. Band and diagonal matrices, while of frequent occurrence, should not be mistaken as a general case of sparse matrices. When band or diagonal matrices occur, a special effort on the part of the user should be made to adapt his processing and/or indexing algorithms to the case at hand. This adaptation should be made because of the inherent simplicity of processing, manipulating, and solving band matrices, and also because of the opportunity to minimize core allocation and execution time. In most cases, band or diagonal matrices are processed either wholly by rows or columns, and little or no processing of single elements occurs. For a band matrix, a common manipulation involves decreasing the band width. In such a manipulation, it is normal procedure for one entire row (column) to operate on the row (column) immediately above or below it (or to either side). With such a simple processing sequence, it is evident that only a few rows (columns) need be maintained in fast core for immediate use. If data transmission rates are comparable to the rate with which rows (columns) are manipulated, then rows (columns) not in immediate use can be stored on slower access devices, such as tapes or disks. Storing data on tapes or disks frees the more expensive fast core. In most machine configurations there is a much larger amount of memory available in the slower devices.
When slow devices can be used efficiently for processing band matrices, the capability of manipulating large order sparse matrices is limited by the maximum allowable execution time and the desired accuracy limits of the results, and not by the order of the matrix involved. To further conserve execution time, but at the expense of fast memory, the entire band matrix can be stored in fast core. Preserving the entire matrix in fast core eliminates the transmission times between fast core and auxiliary devices, as well as the time required to restore elements in fast core, which is done prior to data manipulation and processing. Another prime advantage directly involved with data transmission is the use of overlapping channels in burst or select mode. However, when the matrix is fully maintained in fast core, channels will then be available to other users on multi-user computers. If the band matrix has full bands, that is, no row has any zero elements within the band, then the total number of elements to be stored is the band width multiplied by the number of rows in the matrix. Figure 10 depicts a band matrix with full bands (a band width of 3 here).
[Figure 10. Band matrix: a 9 × 9 matrix of band width 3 with −199 on the main diagonal.]
EXAMPLE 5. Figure 10 is the resulting 9 × 9 matrix obtained by using a central difference approximation (3 points) to solve the boundary-value differential equation $2 + 3t^2 = y + y' + y''$ using 10 intervals between the points $y(t = 0) = 0$ and $y(t = 1) = 1$. A 5-point interpolation would yield a band width of 5; 50 intervals would result in a 49 × 49 matrix. Note that the augment column, a constant associated with each row of the matrix, is not considered here as an integral part of the sparse matrix.
Accuracy of results depends on the number of intervals, the number of points in the interpolation formula, and computer round off. In one particular application of processing a band matrix by rows (columns), it is convenient and efficient to store elements in full vectors, one vector for each super- or subdiagonal of the band matrix.
  • 15. Since the diagonal has the greatest number of elements, the vector for the diagonal will be the largest vector. To avoid double indexing, which takes greater execution time, an additional table of addresses is created. Each element of the address table contains the address of the first element of the respective vector. The indexing scheme in the algorithm used to arithmetically manipulate the band matrix is then altered to suit the storage scheme. If, for some reason, it is more convenient to store elements in a row or column form, e.g., because of a very difficult or time-consuming arithmetic manipulation, most of the advantage of employing a band scheme is lost, and other methods of indexing should be considered. Band matrices, as noted above, are unusual from an indexing standpoint because of the very slight core requirements for indexing. For the application described above, only $W = I \cdot V / B$ words are required for indexing, where I is the number of rows; V is the number of bits used for a row index element; and B is the number of bits per word. As a percentage (E) of the full matrix, this indexing requirement is $E_{\text{Band}} = 100/J\,\%$, where J is the number of columns in the matrix, when full words are used for the table of addresses. If half-words are adequate, this requirement decreases further by one-half. It should be brought to the attention of the user that in the instance where bands do contain zero elements, a decision should be made whether to employ a band scheme, which may not be very efficient in use of core if a large number of null entries exists, or some other particular scheme, such as a block-diagonal scheme, which may not conserve execution time. Many papers [4, 10, 34, 40-43] are concerned with band matrices, primarily, as said, because of the prevalence of band matrices in many specific fields of interest.
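The diagonal-vector storage with an address table described above can be sketched as follows. The class, its method names, the 0-based numbering, and the choice to concatenate all diagonal vectors into one array are assumptions made for illustration; the numeric values in the usage lines are arbitrary.

```python
class BandMatrix:
    """Square band matrix stored as one vector per diagonal, held in one array."""

    def __init__(self, n, lower, upper):
        self.n, self.lower, self.upper = n, lower, upper
        self.start = {}   # address table: diagonal offset k -> start of its vector
        pos = 0
        for k in range(-lower, upper + 1):
            self.start[k] = pos
            pos += n - abs(k)          # the main diagonal (k = 0) is the longest
        self.data = [0.0] * pos

    def _addr(self, i, j):
        k = j - i                      # k > 0: superdiagonal, k < 0: subdiagonal
        if not (-self.lower <= k <= self.upper):
            return None
        return self.start[k] + min(i, j)   # position along diagonal k

    def get(self, i, j):
        a = self._addr(i, j)
        return 0.0 if a is None else self.data[a]

    def set(self, i, j, v):
        a = self._addr(i, j)
        if a is None:
            raise IndexError("element lies outside the band")
        self.data[a] = v

b = BandMatrix(5, lower=1, upper=1)    # hypothetical 5 x 5 tridiagonal matrix
for i in range(5):
    b.set(i, i, -2.0)
b.set(3, 4, 7.0)
b.set(4, 3, 6.0)
```

Only the small address table is needed in addition to the data, and each access uses a single computed offset rather than separate row and column indices.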
Also, many algorithms are readily available for processing band matrices, FORTRAN M [44] being one of the more recent programming packages.
VII. CONCLUSION
In the previous sections four major types of indexing methods were discussed, three of which are in general use: the bit map scheme, the row-column scheme, and the threaded list scheme. Each major type, of course, has many variations (the address map method is not in general use at present, so no variations occur). The important special case of the band matrix is discussed as a separate entity, because it is not a general case of a sparse matrix, even though it has wide application. As stated in the Introduction, one of the major considerations in selecting a particular indexing method is the amount of fast core the method requires, in addition to the data elements. The indexing in the bit map method requires a fast core allocation of approximately 4% of the full matrix; in the address map method indexing requires about 25% of the full matrix. The row-column and threaded list schemes have no definite core requirements for indexing; the fast memory needed for indexing is directly proportional to the sparse matrix density. The percentage of the full matrix required for indexing a row-column scheme is about one times the matrix density, and about twice the density is required for a threaded list scheme. Previous discussion indicated that an exact comparison of execution times must reflect the type of mathematical manipulation being performed on the sparse matrix. For example, the bit map method is of particular use when the matrix is used to produce an "optimal" ordering, so that the matrix inverse will not have a greatly increased density. In contrast, the row-column method is faster than other methods when manipulations involve one row (column) acting on other rows (columns). The second important aspect of indexing scheme selection is the conservation of execution time.
If arithmetic operations are to be performed on the data, primary consideration should first be given to a row-column method; if Boolean arithmetic or
  • 16. reordering algorithms are to be performed, the bit map scheme should be considered first; and if a great number of data elements are to be reordered, created, or annihilated, a threaded list scheme deserves first consideration. The bit map scheme has a definite core allocation for indexing, offers a reasonable row access time, is quite fast in execution time when row operations are performed, is core efficient when the matrix density is greater than 4%, and allows very fast manipulation of logical (Boolean) operations. Logical operations can be conveniently used to determine when arithmetic operations are to be executed. As to its disadvantages, the bit map scheme has extremely poor column access time when elements are ordered by rows, which in most cases requires transposing the bit map and reordering the data elements; it makes poor use of parallel processing, requires considerable time to reorder data elements, and is not core efficient when matrix density falls below 4%. The address map proves advantageous when character addressing is available, makes very efficient use of parallel processors, provides ready access to any element, does not require an extensive amount of execution time (in comparison to the bit map scheme) to reorder data elements, and exhibits a reasonable row and column execution time. The primary disadvantages of the address map method are a large fast core requirement for indexing and the relatively large execution time, in comparison with the threaded list scheme, to reorder matrix elements. Both bit and address maps require significant execution times to transpose the matrix: the map must be transposed, and all the data elements must be reordered. Execution time to transpose the matrix is directly proportional to the order of the matrix and the matrix density.
Primary advantages of the row-column schemes are: a very fast row access time in comparison with the bit and address maps; a relatively fast column access time in comparison to all other methods; conservation of core with matrices of less than 4% density when compared to the bit map method; an increase in efficiency as the order of the matrix increases, as more complex variations become more efficient; and faster reordering than the bit map or address map methods. The main disadvantages of the row-column scheme are that column access time and the time required to reorder elements greatly increase as the matrix order and/or matrix density increases. The threaded list technique is the sole technique that allows a simple and fast executing method of reordering, adding, or annihilating data elements. The threaded list scheme exhibits a variety of disadvantages, the primary ones being a large core requirement for indexing in comparison with the row-column method, a slow access time for rows when elements are stored by rows, and an even slower access time for columns compared with the row-column method. The inclusion of orthogonal links, as discussed by Knuth [35], removes some of the column access difficulties, but only at the price of additional storage. For the special case of band matrices, a scheme similar to the one described in Part VI should be used unless either half or more of the elements within the band width are null, or the nature of the mathematical operations to be performed dictates otherwise (as described in Part VI). If the band matrix scheme cannot be utilized, the user must decide which characteristics of the other types of indexing are considered vital to the solution, and select a method on this basis. A final major aspect of indexing the user must consider concerns the adaptability and flexibility of programming the selected scheme, which depends upon the factors enumerated in the Introduction.
The following suggestions and comments concerning programming flexibility and adaptability are offered. None of the major types of indexing schemes requires double indexing. Double indexing involves using one register (adder) to index across the row, and another register to index down the column. Double indices have at least three drawbacks: they require
  • 17. more time than single indices; the computer may have a built-in limit on the number of characters or words that can be indexed by one or both of the registers before a new index (base) register must be designated; and registers are at a premium, because of the extremely fast register to register operation time, and should be used for more vital arithmetic. In the last analysis, the increased time involved in double indexing is the critical factor. In general, the larger the order of the matrix, the lower the matrix density. Because of this the row-column method is preferred for matrices with orders of 1000 or more, especially when arithmetic manipulations or operations are to be performed. As the order of the matrix increases, it becomes more efficient to employ more complex variations of the major types. For instance, the delta indexing scheme (as described in Part IV) conserves a considerable amount of fast core compared with the simpler row-column schemes, without a great increase in execution time, when the order approaches 1000. If the matrix requires more fast core than is available, the user must decide either to segment the matrix between fast and slow core, or to reduce the complexity of the problem. If the problem can be simplified, or the matrix condensed or partitioned (blocked), then it is not necessary to segment the matrix between fast and slow core. Simplifying the matrix involves the real consideration of whether or not it is economically feasible to reorder rows and/or columns to produce a new matrix that can be more efficiently processed. Many schemes have been developed [7, 16, 18, 27] to attempt such an optimal ordering of matrix elements. Condensing the matrix involves the elimination of data elements that produce insignificant or negligible change in the results. Such condensing can often be done with reasonable competence by somebody skilled in the nature of the problem to be solved.
If the matrix is of block-diagonal form, each block can be processed as a separate entity to produce a composite result.
The availability of a virtual memory processor might lead the user to the erroneous conclusion that the benefits of a proper indexing algorithm are negated. This is not so; at some time during the processing of a sparse matrix the matrix must reside in physical memory. It then follows that the fewer the pages occupied by the sparse matrix, the fewer the page faults generated, and therefore the less time spent moving the matrix to and from peripheral paging devices. In other words, the same benefits accruing from indexing in an ordinary processor apply in a virtual memory processor.
When updating of data files is anticipated, the user should designate buffer storage. When new matrix elements are introduced, they should be stored in the buffer area. When a considerable number of corrections to the data elements has accumulated (about 5%), the matrix is reordered. The threaded list scheme requires no separate buffer area, as a buffer is inherent in the indexing scheme.
The segments of coding that contain the actual indexing algorithm should be programmed in a low-level language, such as assembly language, to conserve execution time. High-level languages, such as FORTRAN, utilize a compiler, which may not produce the most efficient coding. For instance, if a division by 32,768 is necessary, the high-level language may simply create a division by 32,768 in assembly language. If the high-level compiler, however, recognized that a division by 32,768 (2^15) is identical to shifting an accumulator right 15 bits, the assembly language version would be a shift right logical or shift right double logical. The first version would require significantly more execution time than the shift.
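As a quick check on the shift-for-divide substitution: 32,768 is 2^15, so for a non-negative integer operand, dividing by 32,768 and discarding the remainder gives the same result as a right shift of 15 bit positions. The sketch below shows this in Python; the function name `divide_by_shift` is hypothetical, and restricting it to non-negative operands is an assumption made to sidestep the differing treatment of negative values by logical and arithmetic shifts.

```python
def divide_by_shift(n: int, power: int = 15) -> int:
    """Divide a non-negative integer by 2**power using a right shift."""
    if n < 0:
        raise ValueError("this sketch covers non-negative operands only")
    return n >> power


# 32,768 is exactly 2**15, so a 15-bit shift replaces the division.
assert 2 ** 15 == 32768

for n in (0, 1, 32767, 32768, 98304, 1_000_000):
    # Integer division and the shift agree for every non-negative operand.
    assert divide_by_shift(n) == n // 32768

# A 16-bit shift would divide by 65,536 instead.
assert (98304 >> 16) == 98304 // 65536
```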
A considerable saving is realized when the computation is performed perhaps as many as several million times in a program. The user should avoid coding the indexing algorithm as a subroutine, especially in a high-level language, because of the added linkage time during program execution.
While a "fast" algorithm for indexing into arbitrarily sparse matrices would allow very efficient core storage allocation and execution times for matrix manipulations, it is also evident that no such single algorithm exists, at least at present.
  • 18. The advent of array processors and pipeline computers may eliminate the desire to handle sparse matrices in any special manner whatsoever. However, it also appears that no matter how large, or how fast and sophisticated, computing machines become, users will continue to strive for core storage conservation and faster execution times. It remains to be seen if sufficiently sophisticated indexing algorithms will be developed to accomplish those goals in array or pipeline machines; or whether such machines will come into general use and provide an environment conducive to developing sparse matrix indexing schemes.
For the present, the choice of an indexing algorithm depends upon many considerations, with each major type of indexing discussed here having particular advantages and disadvantages. Careful selection of an algorithm can satisfactorily achieve the goals of conservation of core memory and execution time. In addition, whenever there exists some pattern to the nonzero entries, the possibility of reorganizing the calculations as a means to handle some sparse matrices should be carefully considered.
Computing Surveys, Vol. 5, No. 2, June 1973
  • 19. APPENDIX
ALGORITHM 1: BIT MAP SCHEME
No.  Label    Statement                    Meaning
01            ROW ← i                      i is the row number that will be manipulated
02            RINDEX ← v(i)                v is the row index vector
03            BITS ← b                     b = number of bits/word
04            ROW ← ROW - 1                (i - 1)
05            COLS ← J                     J is the number of columns in the matrix
06            ROW ← ROW * COLS             (i - 1) * J
07            SAVE ← ROW                   Save (i - 1) * J
08            ROW ← ROW + BITS - 1         ((i - 1) * J) + b - 1
09            ROW ← ROW / BITS             S_i = (((i - 1) * J) + b - 1) / b; word contains the first bit of required row
10            ROWEND ← COLS                End of row counter (J)
11            START ← ROW                  Starting word of the row
12            ROWEND ← ROWEND AND MASK     Determine correct number of displacement bits; MASK = mask for maximum displacement bits
13            START ← START * 2 ** SAVE    Shift to eliminate incorrect bits (from previous row)
14            ROWEND ← ROWEND - SAVE       Correct for eliminated bits
15            GO TO ROWSCAN                Branch to code to scan row in bit map for 1 bits
                                           Following code scans one entire row of a bit map.
16   COUNT    ROWEND ← BITS                After first word of row is scanned, the bit counter (ROWEND) = b
17   ROWSCAN  ROW ← ROW + 1                Increment bit map word address by one
18            WORD ← bit word from map     Word of bit map
19            WORDBIT ← bit from bit-word (WORD)   Pick up high order bit from bit word
20            COLNUM ← COLNUM + 1          Increment column number
                                           Following statements are branch controls.
21            WORDBIT = 1?                 Is the bit non-zero?
22            IF YES, GO TO MATH           Yes, an element exists.
23   ENDROW   COLNUM = COLS?               Is the column counter equal to the row counter?
24            IF YES, GO TO END1           Yes, end of row
25            COLNUM = ROWEND?             Have we shifted completely through bit map word?
26            IF YES, GO TO COUNT          Yes, fetch another word.
27            GO TO ROWSCAN                No, scan next bit in word
28   MATH     RINDEX ← RINDEX + 1          RINDEX = address of nonzero element; COLNUM = column number of non-zero element
                                           Perform required operation on element
29            GO TO ENDROW                 Return
30   END1     STOP                         End of operation on the row.
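The row scan of the bit map scheme can be sketched in a high-level form as follows. This is an illustrative reading rather than a transcription of Algorithm 1: it assumes one bit per matrix position stored row by row, high-order bit first within each word, with the nonzero values packed in row order and a row index vector giving the position of each row's first nonzero. The function names and the use of Python lists for words and values are hypothetical.

```python
def build_bit_map(dense, bits_per_word):
    """Pack a dense matrix into (bit-map words, packed nonzeros, row index vector)."""
    n_cols = len(dense[0])
    total_bits = len(dense) * n_cols
    words = [0] * ((total_bits + bits_per_word - 1) // bits_per_word)
    values = []      # nonzero elements, stored row by row
    row_index = []   # row_index[r] = position in `values` of row r's first nonzero
    for r, row in enumerate(dense):
        row_index.append(len(values))
        for c, x in enumerate(row):
            if x != 0:
                word, offset = divmod(r * n_cols + c, bits_per_word)
                # High-order bit of each word corresponds to the lowest position.
                words[word] |= 1 << (bits_per_word - 1 - offset)
                values.append(x)
    return words, values, row_index


def scan_bit_map_row(words, values, row_index, i, n_cols, bits_per_word):
    """Return [(column, value)] for the nonzero elements of row i (1-based)."""
    pos = (i - 1) * n_cols        # bit position of the row's first element
    rindex = row_index[i - 1]     # address of the row's first nonzero
    found = []
    for col in range(1, n_cols + 1):
        word, offset = divmod(pos, bits_per_word)
        bit = (words[word] >> (bits_per_word - 1 - offset)) & 1
        if bit:                   # a 1 bit means an element exists here
            found.append((col, values[rindex]))
            rindex += 1           # advance to the next stored nonzero
        pos += 1
    return found


dense = [[0, 5, 0, 0, 0],
         [7, 0, 0, 0, 9],
         [0, 0, 0, 3, 0]]
words, values, row_index = build_bit_map(dense, bits_per_word=4)
print(scan_bit_map_row(words, values, row_index, 2, 5, 4))
```

The sketch recomputes the word and bit offset for every column for clarity; the listing above instead carries a running bit counter so that a new word is fetched only when the current one is exhausted.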
  • 20. ALGORITHM 2: ADDRESS MAP SCHEME
No.  Label    Statement                          Meaning
01            ROW ← i                            i = row
02            RINDEX ← v(i)                      v = row index vector
03            ROW ← ROW - 1                      (i - 1)
04            COLS ← j                           j = # columns
05            ROW ← ROW * COLS                   j * (i - 1)
06            ROW ← ROW - 1                      (j * (i - 1)) - 1
07   START    ROW ← ROW + 1                      Increment across row
08            COLNUM ← COLNUM + 1                Increment column #
09            COLNUM > COLS?                     End of row
10            IF YES, GO TO ENDROW               Yes, done
11            BYTE ← byte from address map       Pick up partial word
12            BYTE = 0?                          Is byte zero?
13            IF YES, GO TO START                Reenter scan process
14            CHECK ← 0                          Zero work area
15            CHECK ← BYTE                       Byte to work area
16   MATH     CHECK ← CHECK + RINDEX             Points to non-zero element
                                                 Required operations performed here
17            GO TO START                        Reenter scan process
18   ENDROW   STOP                               Finish
19            END
ALGORITHM 3: ADDRESS MAP SCHEME
No.  Label    Statement                          Meaning
01            BEGIN ← Address of address map     Pointer
02            BEGIN ← BEGIN + J                  J = column #
03            BEGIN ← BEGIN - 1
04            COLS ← j                           j = # columns
05            ROWS ← i                           i = # rows
06            BEGIN ← BEGIN - COLS
07   START    BEGIN ← BEGIN + COLS               Increment address
08            RINDEX ← v(I)                      Row index vector
09            ROWCTR ← ROWCTR + 1                Increment row counter
10            ROWCTR > COLS?                     Passed end of matrix?
11            IF YES, GO TO ENDROW               Yes, passed end
12            BYTE ← byte from address map       Pick up partial word
13            BYTE = 0?                          Is byte zero?
14            IF YES, GO TO START                Reenter scan process
15            CHECK ← 0                          Zero work area
16            CHECK ← BYTE                       Byte to work area
17   MATH     CHECK ← CHECK + RINDEX             Points to non-zero element
                                                 Required operations performed here
18            GO TO START                        Reenter scan process
19   ENDROW   STOP                               Finish
20            END
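The two address map listings can be read as a row scan (Algorithm 2) and a column scan (Algorithm 3) over the same map. The sketch below is one possible high-level rendering under stated assumptions: each map entry is a small integer ("byte") that is zero for a zero element and otherwise a displacement to be added to the row index vector entry v(i), mirroring the CHECK ← BYTE + RINDEX step. The function names, the list-based storage, and the exact base convention for v(i) are hypothetical.

```python
def build_address_map(dense):
    """Build (address map, packed nonzeros, row index vector) from a dense matrix."""
    amap = []        # one entry per matrix position, stored row by row
    values = []      # nonzero elements, stored row by row
    row_index = []   # v(i): base such that base + byte indexes `values`
    for row in dense:
        base = len(values) - 1
        row_index.append(base)
        for x in row:
            if x != 0:
                values.append(x)
                # Displacement is >= 1; a real byte would cap it at 255.
                amap.append(len(values) - 1 - base)
            else:
                amap.append(0)   # zero byte marks a zero element
    return amap, values, row_index


def scan_map_row(amap, values, row_index, i, n_cols):
    """Nonzeros of row i (1-based) as [(column, value)], stepping across the row."""
    start = (i - 1) * n_cols
    rindex = row_index[i - 1]
    found = []
    for col in range(1, n_cols + 1):
        byte = amap[start + col - 1]
        if byte != 0:
            found.append((col, values[byte + rindex]))
    return found


def scan_map_column(amap, values, row_index, j, n_rows, n_cols):
    """Nonzeros of column j (1-based) as [(row, value)], stepping by n_cols."""
    found = []
    pos = j - 1
    for row in range(1, n_rows + 1):
        byte = amap[pos]
        if byte != 0:
            found.append((row, values[byte + row_index[row - 1]]))
        pos += n_cols            # same column, next row
    return found


dense = [[0, 5, 0, 0, 0],
         [7, 0, 0, 0, 9],
         [0, 0, 0, 3, 0]]
amap, values, row_index = build_address_map(dense)
print(scan_map_row(amap, values, row_index, 2, 5))
print(scan_map_column(amap, values, row_index, 5, 3, 5))
```

Because every position carries its own displacement, a column can be scanned without touching the other elements of each row, at the cost of a full map entry per matrix position instead of a single bit.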
  • 21. [Figure A1. Flowchart: Algorithm 1, bit map scheme.]
  • 22. [Figure A2. Flowchart: Algorithm 2, address map scheme.]
  • 23. [Figure A3. Flowchart: Algorithm 3, address map scheme.]
  • 24. BIBLIOGRAPHY
1. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." RC2332, IBM Watson Research Center (February 1969), 37-46.
2. LARSEN, L. "A modified inversion procedure for product form of the inverse-linear programming codes." Comm. ACM 5, 7 (July 1962), 382-383.
3. LIVESLEY, R. "An analysis of large structural system." Comp. J. 3 (1960), 34-39.
4. McCORMICK, C. W. "Application of partially banded matrix methods to structural analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 155-158.
5. ORCHARD-HAYS, W. Advanced Linear Programming Techniques. McGraw-Hill, New York, 1968, 73-82.
6. TEWARSON, R. "On the product form of inverse of sparse matrices." SIAM Review 8 (1966), 336-342.
7. TEWARSON, R. "Row column permutation of sparse matrices." Comp. J. 10 (1967/68), 300-305.
8. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." (Introduction), RC2332, IBM Watson Research Center (February 1969), 1-3.
9. BASHKOW, T. "Network analysis." Mathematical Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 280-290.
10. TINNEY, W. F. "Comments on using sparsity techniques for power system problems." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 25-34.
11. PALACOL, E. L. "The finite element method of structural analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 101-5.
12. RALSTON, A. "Numerical integration methods for the solution of ordinary differential equations." Mathematical Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 95-109.
13. ROMANELLI, M. "Runge-Kutta methods for the solution of ordinary differential equations." Mathematical Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 110-20.
14. WACHSPRESS, E. "The numerical solution of boundary value problems." Mathematical Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 121-7.
15. WEIL, R., JR., AND KETTLER, P. "An algorithm to provide structure for decomposition." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 11-24.
16. GUSTAVSON, F., LINIGER, W., AND WILLOUGHBY, R. "Symbolic generation of an optimal Crout algorithm for sparse systems of linear equations." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 1-10.
17. SMITH, D. M. "Data logistics for matrix inversion." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 127-32.
18. SPILLERS, W. R., AND HICKERSON, N. "Optimal elimination for sparse symmetric systems as a graph problem." Quar. Appl. Math. 26 (1968), 425-32.
19. STEWARD, D. V. "On an approach to techniques for the analysis of the structure of large systems of equations." SIAM Rev. 4 (1962), 321-42.
20. TEWARSON, R. P. "The Gaussian elimination and sparse systems." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 35-42.
21. GIVENS, W., McCORMICK, HOFFMAN, et al. "Panel discussion on new and needed work and open questions." (Chairman P. Wolfe), Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 159-80.
22. WILKES, M. V. "The growth of interest in microprogramming: a literature survey." Comp. Surveys 1, 3 (September 1969), 139-45.
23. ORCHARD-HAYS, W. "MP systems technology for large sparse matrices." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 59-64.
24. CHANG, A. "Application of sparse matrix methods in electric power system analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 113-122.
25. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." IBM Watson Research Center, RC2332 (February 1969), 21-22.
26. CHARTRES, B. A., AND GEUDER, J. C. "Computable error bounds for direct solution of linear equations." J. ACM 14, 1 (Jan. 1967), 63-71.
27. FORSYTHE, G. E. "Crout with pivoting." Comm. ACM 3 (1960), 507-8.
28. JENNINGS, A. "A compact storage scheme for the solution of symmetric linear simultaneous equations." Comput. J. 9 (1966/67), 281-5.
29. System 360 Matrix Language (MATLAN). Application Description, IBM H20-0479; Program Description Manual, IBM H20-0564.
30. McNAMEE, J. M. "Algorithm 408, a sparse matrix package." (Part I), Comm. ACM 14, 4 (April 1971), 265-273.
31. DULMAGE, A. L., AND MENDELSOHN, N. S. "On the inversion of sparse matrices." Math. Comp. 16 (1962), 494-496.
32. MAYOH, B. H. "A graph technique for inverting certain matrices." Math. Comp. 19 (1965), 644-646.
33. ROTH, J. P. "An application of algebraic topology: Kron's method of tearing." Quar. Appl. Math. 17 (1959), 1-24.
34. SWIFT, G. "A comment on matrix inversion by partition." SIAM Rev. 2 (1960), 132-33.
  • 25. 35. KNUTH, D. E. The Art of Computer Programming, Vol. I. Addison-Wesley, Reading, Mass., 1968, 299-304, 554-556.
36. BERZTISS, A. T. Data Structures: Theory and Practice. Academic Press, New York, 1971, 276-279.
37. LARCOMBE, M. "A list processing approach to the solution of large sparse sets of matrix equations and the factorization of the overall matrix." In Large Sparse Sets of Linear Equations, Reid, J. K., Ed., Academic Press, London, 1971.
38. WEIL, R. L., AND KETTLER, P. C. "Rearranging matrices to block-angular form for decomposition (and other) algorithms." Management Science 18, 1 (Sept. 1971), 98-108.
39. GUSTAVSON, F. G. "Some basic techniques for solving sparse systems of linear equations." In Sparse Matrices and Their Applications, Rose, D. J., and Willoughby, R. A., Eds., Plenum Press, New York, 1972, 41-52.
40. FIKE, C. T. PL/I for Scientific Programmers. Prentice-Hall, Englewood Cliffs, N. J., 1970, 108, 180.
41. WILLOUGHBY, R. A. "A survey of sparse matrix technology." IBM Watson Research Center, RC3872, May 1972.
42. CUTHILL, E. "Several strategies for reducing the band-width of matrices." In Sparse Matrices and Their Applications, Rose, D. J., and Willoughby, R. A., Eds., Plenum Press, New York, 1972, 34-38.
43. TEWARSON, R. P. "Computations with sparse matrices." SIAM Rev. 12, 4 (Oct. 1970), 527-543.
44. PETTY, J. S. "FORTRAN M: programming package for band matrices and vectors." Aerospace Research Labs., Wright-Patterson AFB, Ohio, ARL-69-0064 (April 1969).
45. SPILLERS, W. R. "On Diakoptics: Tearing an arbitrary system." Quar. Appl. Math. 23 (1965), 188-90.
46. IBM System/360 Model 65 Functional Characteristics, IBM A22-6884-3, File No. S360-01.