1. Arithmetic Logic Unit (ALU)
An arithmetic logic unit (ALU) is the part of a computer processor (CPU) that carries out
arithmetic and logic operations on the operands in computer instruction words. The ALU
performs operations such as addition, subtraction, and multiplication of integers, and
bitwise AND, OR, NOT, XOR, and other Boolean operations. The CPU's instruction-decode
logic determines which particular operation the ALU should perform, the source of the
operands, and the destination of the result.
The width in bits of the words the ALU handles is usually the same as that quoted
for the processor as a whole, although its external buses may be narrower. Floating-point
operations are usually done by a separate floating-point unit. Some processors use the
ALU for address calculations (e.g., incrementing the program counter); others have
separate logic for this.
In some processors, the ALU is divided into two units: an arithmetic unit (AU) and a
logic unit (LU). Some processors contain more than one AU, for example, one for fixed-point
operations and another for floating-point operations. (In personal computers,
floating-point operations are sometimes done by a floating-point unit on a separate chip
called a numeric coprocessor.)
Typically, the ALU has direct input and output access to the processor controller, main
memory (random access memory, or RAM, in a personal computer), and input/output
devices. Inputs and outputs flow along an electronic path called a bus. The input
consists of an instruction word (sometimes called a machine instruction word) that
contains an operation code (sometimes called an "op code"), one or more operands, and
sometimes a format code. The operation code tells the ALU what operation to perform,
and the operands are used in the operation. (For example, two operands might be added
together or compared logically.) The format code may be combined with the op code and tells,
for example, whether this is a fixed-point or a floating-point instruction. The output
consists of a result that is placed in a storage register and settings that indicate whether
3. Instructions including signed and unsigned data operations
4. Internal accumulators
5. Condition registers (ZF, CF, SF, OF)
6. Instructions with 1 or 2 operands with an internal stack
7. Carry-lookahead algorithm to implement the ADD operation
8. Wallace tree algorithm to implement the MUL operation
2.2. Set of functions
The input signals of the ALU module are rst, clk, opcode, and data.
rst: reset signal
clk: system clock
opcode: the instruction operation code; [13:6] is the instruction code, [5:3] the register 1
address, and [2:0] the register 2 address
data: data or addresses required by the instruction for the ALU module
The output signals of the ALU module are output_address, output_data, and
pc_counter_address.
output_address: for external memory store instructions, outputs the external memory
address
output_data: for external memory store instructions, outputs the data to be stored in
external memory
pc_counter_address: for jump instructions, outputs the jump target address to inform the
external PC
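The interface above can be sketched as a Verilog port declaration. The opcode field layout follows the text; the 16-bit widths of data and the three output buses are assumptions, since the document does not state them.

```verilog
// Hedged sketch of the ALU top-level interface described above.
// The 16-bit widths of data and the output buses are assumptions.
module ALU (
    input             rst,                // reset signal
    input             clk,                // system clock
    input      [13:0] opcode,             // instruction operation code
    input      [15:0] data,               // instruction data or addresses (width assumed)
    output reg [15:0] output_address,     // external memory address for store instructions
    output reg [15:0] output_data,        // data to be stored in external memory
    output reg [15:0] pc_counter_address  // jump target reported to the external PC
);
    // Field extraction as described in the text.
    wire [7:0] instr_code = opcode[13:6];
    wire [2:0] reg1_addr  = opcode[5:3];
    wire [2:0] reg2_addr  = opcode[2:0];
    // ... instruction decode and datapath omitted ...
endmodule
```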
The internal registers are explained as follows.
system_stack: ALU internal system stack, used for the PUSH and POP operations
ZF, CF, SF, OF: four flags for arithmetic instruction results
accum: ALU accumulator A
bccum: ALU accumulator B
accum_for_out: to resolve the read-after-write pipeline hazard, this internal register is
always @(posedge Clk) Operand0 <= Operand0 & Operand1;
endmodule
3.2. OR:
(1) Instruction Format: OR Operand1 Operand2
(2) Function: Operand1 and Operand2 are two 16-bit register operands. This instruction
performs the bitwise OR operation and writes the result back to the first operand's register.
The truth table defines the behavior of each bit operation, shown in figure 2(a). The
circuit symbol is shown in figure 2(b).
(3) Implementing OR with Verilog HDL:
module ALU_OR(Clk, Operand0, Operand1, Result);
input Clk;
input [15:0] Operand0;
input [15:0] Operand1;
// A port cannot legally be both an input and a registered result, so the
// result is registered on a separate output port.
output reg [15:0] Result;
always @(posedge Clk) Result <= Operand0 | Operand1;
endmodule
3.3. XOR:
(1) Instruction Format: XOR Operand1 Operand2
(2) Function: Operand1 and Operand2 are two 16-bit register operands. This instruction
performs the bitwise XOR operation and writes the result back to the first operand's register.
The truth table defines the behavior of each bit operation, shown in figure 3(a). The
circuit symbol is shown in figure 3(b).
(3) Implementing XOR with Verilog HDL:
module ALU_XOR(Clk, Operand0, Operand1, Result);
input Clk;
input [15:0] Operand0;
input [15:0] Operand1;
// As with ALU_OR, the result is registered on a separate output port.
output reg [15:0] Result;
always @(posedge Clk) Result <= Operand0 ^ Operand1;
endmodule
3.4. NOT:
(1) Instruction Format: NOT Operand1
(2) Function: Operand1 is a 16-bit register operand. This instruction performs the
bitwise NOT operation on Operand1.
The truth table defines the behavior of each bit operation, shown in figure 4(a). The
circuit symbol is shown in figure 4(b).
(3) Implementing NOT with Verilog HDL:
module ALU_NOT(Clk, Operand0, Result);
input Clk;
input [15:0] Operand0;
output reg [15:0] Result;
always @(posedge Clk) Result <= ~Operand0;
endmodule
3.5. NEG:
(1) Three different ways of representing signed numbers:
a) Signed-magnitude representation: an extra sign bit is placed in front of the
number; to negate a number, flip the sign bit. By convention:
– A 0 sign bit represents a positive number.
– A 1 sign bit represents a negative number.
For example: 01101B = +13D (a positive number in 5-bit signed magnitude)
11101B = −13D (a negative number in 5-bit signed magnitude)
b) One's complement representation: to negate a number, complement
each bit of the number.
For example: 01101B = +13D (a positive number in 5-bit one's complement)
10010B = −13D (a negative number in 5-bit one's complement)
c) Two's complement representation: to negate a number, complement each
bit (just as for one's complement) and then add 1.
For example: 01101B = +13D (a positive number in 5-bit two's complement)
10011B = −13D (a negative number in 5-bit two's complement)
In this project, we use the third, two's complement representation, to represent
signed numbers.
(2) Instruction Format: NEG Operand1
(3) Function: Operand1 is a 16-bit register operand. This instruction negates
Operand1 in two's complement and writes the result back to Operand1's register.
(4) Implementing NEG with Verilog HDL:
module ALU_NEG(Clk, Operand0, Result);
input Clk;
input [15:0] Operand0;
output reg [15:0] Result;
// Two's complement negation: complement each bit, then add 1.
always @(posedge Clk) Result <= ~Operand0 + 1;
endmodule
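A minimal self-checking testbench can exercise the NEG operation. This sketch assumes a variant of ALU_NEG with a separate registered Result output port (an operand cannot legally be declared both an input and a registered result in Verilog); the clock period and delays are arbitrary simulation choices.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench for a Result-port variant of ALU_NEG.
module ALU_NEG_tb;
    reg         Clk = 0;
    reg  [15:0] Operand0;
    wire [15:0] Result;

    ALU_NEG dut (.Clk(Clk), .Operand0(Operand0), .Result(Result));

    always #5 Clk = ~Clk;   // free-running clock, period assumed

    initial begin
        Operand0 = 16'd13;  // +13
        @(posedge Clk); #1;
        // Two's complement of 13 in 16 bits is 16'hFFF3 (-13).
        if (Result !== 16'hFFF3) $display("FAIL: %h", Result);
        else                     $display("PASS");
        $finish;
    end
endmodule
```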
4. Carry Look Ahead Adder
The sum of two single-bit binary numbers can be formed using logic gates. If both
bits are zero, the sum is zero; the sum of a one and a zero is one; but the sum of two ones
is two, which is represented in binary notation by the two bits '10'. An adder for two
single-bit inputs must therefore have two output bits: a SUM bit with the same weight as
the input bits and a CARRY bit which has twice that weight. An adder for N-bit binary
numbers can be constructed from single-bit adders, but all bits except the first may have
to accept a carry input from the next lower stage. Each bit of the adder produces a sum
and a carry-out from the inputs and the carry-in.
The 4-bit ripple-carry adder is so called because the result of an addition of two
bits depends on the carry generated by the addition of the previous two bits. Thus, the
sum of the most significant bit is only available after the carry signal has rippled through
the adder from the least significant stage to the most significant stage. This can be easily
are called the Generate and Propagate terms, respectively.
From (2)–(4) it is clear that both the Propagate and Generate terms depend only on the
input bits and thus will be valid after one gate delay. If we use the above expressions to
calculate the carry signals, we do not need to wait for the carry to ripple through all the
previous stages to find its proper value. Applying this to a 4-bit adder:
C1 = G0 + P0.C0 (5)
C2 = G1 + P1.C1 = G1 + P1.G0 + P1.P0.C0 (6)
C3 = G2 + P2.G1 + P2.P1.G0 + P2.P1.P0.C0 (7)
C4 = G3 + P3.G2 + P3.P2.G1 + P3.P2.P1.G0 + P3.P2.P1.P0.C0 (8)
Now it is clear that the carry-out bit of the last stage, C4, will be available after three
gate delays (one delay to calculate the Propagate signal and two delays for the AND-OR
logic). The Sum signal can be calculated as follows:
Si = Ai ⊕ Bi ⊕ Ci = Pi ⊕ Ci (9)
The Sum bit will thus be available after one additional gate delay. The advantage is that
these delays are the same regardless of the number of bits one needs to add, in
contrast to the ripple-carry adder.
The carry-lookahead adder can be broken up into two modules: (A) the Partial Full Adder,
PFA, which generates Si, Pi, and Gi as defined by equations 3, 4, and 9 above; and (B) the
Carry Lookahead Logic, which generates the carry-out bits according to equations 5 to
8. The 4-bit adder can then be built by using 4 PFAs and the Carry Lookahead Logic
block as shown in Fig. 7.
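The structure above can be sketched directly in Verilog. Since equations (2)-(4) fall on a page missing from this excerpt, the standard definitions Gi = Ai·Bi and Pi = Ai ⊕ Bi are assumed here; the carry expressions follow equations (5)-(8) and the sum follows equation (9).

```verilog
// Sketch of a 4-bit carry-lookahead adder, assuming the standard
// Generate/Propagate definitions Gi = Ai & Bi and Pi = Ai ^ Bi.
module CLA4 (
    input  [3:0] A, B,
    input        C0,       // carry-in
    output [3:0] S,        // sum bits, equation (9): Si = Pi ^ Ci
    output       C4        // carry-out, equation (8)
);
    wire [3:0] G = A & B;  // Generate terms (from the PFAs)
    wire [3:0] P = A ^ B;  // Propagate terms (from the PFAs)

    // Carry Lookahead Logic, equations (5)-(8): each carry is a two-level
    // AND-OR function of the inputs, so no carry ripples between stages.
    wire C1 = G[0] | (P[0] & C0);
    wire C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & C0);
    wire C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
                   | (P[2] & P[1] & P[0] & C0);
    assign C4 = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
                     | (P[3] & P[2] & P[1] & G[0])
                     | (P[3] & P[2] & P[1] & P[0] & C0);

    assign S = P ^ {C3, C2, C1, C0};  // equation (9), bit by bit
endmodule
```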
The disadvantage of the carry-lookahead adder is that the carry logic becomes quite
complicated for more than 4 bits. For that reason, carry-lookahead adders are usually
implemented as 4-bit modules and used in a hierarchical structure to realize adders
that have multiples of 4 bits. Fig. 8 shows the block diagram for a 16-bit CLA adder.
Figure 8. The block diagram of a 16-bit CLA Adder
5. Wallace Tree Multiplier
At the most basic level, digital multiplication can be seen as a series of bit shifts and bit
additions, where two numbers, the multiplier and the multiplicand, are combined into the
final result. Consider the multiplication of two numbers: the multiplier P and the
multiplicand C, where P is an n-bit number with bit representation {pn−1, pn−2, ..., p0}, the
most significant bit being pn−1 and the least significant bit being p0; C has a similar bit
representation {cn−1, cn−2, ..., c0}. For unsigned multiplication, up to n shifted copies of the
multiplicand are added to form the result. The entire procedure is divided into three steps:
partial product (PP) generation, partial product reduction, and final addition. This is
illustrated conceptually in Figure 9. In order to find a convenient and fast structure for
our multiplier, we should consider various multiplier structures.
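The three steps can be illustrated with a small behavioral sketch for a hypothetical 4-bit unsigned multiply. A true Wallace tree would reduce the partial products with layers of carry-save adders; here the reduction and final addition are written as a plain sum for brevity, so only the partial-product generation step is explicit.

```verilog
// Behavioral sketch of the three multiplication steps for 4-bit unsigned
// operands: PP generation, PP reduction, and final addition.
module MUL4 (
    input  [3:0] P,   // multiplier
    input  [3:0] C,   // multiplicand
    output [7:0] R    // product
);
    // Step 1: partial-product generation - one shifted copy of the
    // multiplicand per multiplier bit p0..p3.
    wire [7:0] pp0 = P[0] ? {4'b0, C}       : 8'b0;
    wire [7:0] pp1 = P[1] ? {3'b0, C, 1'b0} : 8'b0;
    wire [7:0] pp2 = P[2] ? {2'b0, C, 2'b0} : 8'b0;
    wire [7:0] pp3 = P[3] ? {1'b0, C, 3'b0} : 8'b0;

    // Steps 2-3: reduction and final addition (a Wallace tree would use
    // carry-save adders here; the synthesizer is left to choose).
    assign R = pp0 + pp1 + pp2 + pp3;
endmodule
```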