SlideShare a Scribd company logo
1 of 59
How computers do math
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Number Format Considerations
 Type of numbers (integer, fraction, real, complex)
 Range of values
 between smallest and largest values
 wider in floating-point formats
 Precision of values (max. accuracy)
 usually related to number of bits allocated
n bits => represent 2n values/levels
 Value/weight of least-significant bit
 Cost of hardware to store & process numbers (some
formats more difficult to add, multiply/divide, etc.)
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Taxonomy of Computer Information
Information
Instructions Addresses
Data
Numeric Non-numeric
Fixed-point Floating-point Character
(ASCII)
Boolean
Unsigned
(ordinal)
Signed
Sign-magnitude 2’s complement
Single-
precision
Other
Double-
precision
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Unsigned Integers
 Positional number system:
an-1an-2 …. a2a1a0 =
an-1x2n-1 + an-2x2n-2 + … + a2x22 + a1x21 + a0x20
 Range = [(2n -1) … 0]
 Carry out of MSB has weight 2n
 Fixed-point fraction:
0.a-1a-2 …. a-n = a-1x2-1 + a-2x2-2 + … + a-nx2-n
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Signed Integers
 Sign-magnitude format (n-bit values):
A = San-2 …. a2a1a0 (S = sign bit)
 Range = [-(2n-1-1) … +(2n-1-1)]
 Addition/subtraction difficult (multiply easy)
 Redundant representations of 0
 2’s complement format (n-bit values):
-A represented by 2n – A
 Range = [-(2n-1) … +(2n-1-1)]
 Addition/subtraction easier (multiply harder)
 Single representation of 0
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Computing the 2’s Complement
To compute the 2’s complement of A:
 Let A = an-1an-2 …. a2a1a0
 2n - A = 2n -1 + 1 – A = (2n -1) - A + 1
1 1 … 1 1 1
- an-1 an-2 …. a2 a1 a0 + 1
-------------------------------------
a’n-1a’n-2 …. a’2a’1a’0 + 1 (one’s complement + 1)
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
2’s Complement Arithmetic
 Let (2n-1-1) ≥ A ≥ 0 and (2n-1-1) ≥ B ≥ 0
 Case 1: A + B
 (2n-2) ≥ (A + B) ≥ 0
 Since result < 2n, there is no carry out of the MSB
 Valid result if (A + B) < 2n-1
 MSB (sign bit) = 0
 Overflow if (A + B) ≥ 2n-1
 MSB (sign bit) = 1 if result ≥ 2n-1
 Carry into MSB
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
2’s Complement Arithmetic
 Case 2: A - B
 Compute by adding: A + (-B)
 2’s complement: A + (2n - B)
 -2n-1 < result < 2n-1 (no overflow possible)
 If A ≥ B: 2n + (A - B) ≥ 2n
 Weight of adder carry output = 2n
 Discard carry (2n), keeping (A-B), which is ≥ 0
 If A < B: 2n + (A - B) < 2n
 Adder carry output = 0
 Result is 2n - (B - A) =
2’s complement representation of -(B-A)
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
2’s Complement Arithmetic
 Case 3: -A - B
 Compute by adding: (-A) + (-B)
 2’s complement: (2n - A) + (2n - B) = 2n + 2n - (A + B)
 Discard carry (2n), making result 2n - (A + B)
= 2’s complement representation of -(A + B)
 0 ≥ result ≥ -2n
 Overflow if -(A + B) < -2n-1
 MSB (sign bit) = 0 if 2n - (A + B) < 2n-1
 no carry into MSB
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Relational Operators
Compute A-B & test ALU “flags” to compare A vs. B
ZF = result zero OF = 2’s complement overflow
SF = sign bit of result CF = adder carry output
Signed Unsigned
A = B ZF = 1 ZF = 1
A <> B ZF = 0 ZF = 0
A ≥ B (SF  OF) = 0 CF = 1 (no borrow)
A > B (SF  OF) + ZF = 0 CF  ZF’ = 1
A ≤ B (SF  OF) + ZF = 1 CF  ZF’ = 0
A < B (SF  OF) = 1 CF = 0 (borrow)
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
MIPS Overflow Detection
 An exception (interrupt) occurs when overflow
detected for add,addi,sub
 Control jumps to predefined address for exception
 Interrupted address is saved for possible resumption
 Details based on software system / language
 example: flight control vs. homework assignment
 Don't always want to detect overflow
— new MIPS instructions: addu, addiu, subu
note: addiu still sign-extends!
note: sltu, sltiu for unsigned comparisons
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Designing the Arithmetic & Logic
Unit (ALU)
 Provide arithmetic and logical functions as needed by
the instruction set
 Consider tradeoffs of area vs. performance
(Material from Appendix B)
32
32
32
operation
result
a
b
ALU
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
 Not easy to decide the “best” way to build something
 Don't want too many inputs to a single gate (fan in)
 Don’t want to have to go through too many gates (delay)
 For our purposes, ease of comprehension is important
 Let's look at a 1-bit ALU for addition:
 How could we build a 1-bit ALU for add, and, and or?
 How could we build a 32-bit ALU?
Different Implementations
cout = a b + a cin + b cin
sum = a xor b xor cin
Sum
CarryIn
CarryOut
a
b
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Building a 32 bit ALU
b
0
2
Result
Operation
a
1
CarryIn
CarryOut
Result31
a31
b31
Result0
CarryIn
a0
b0
Result1
a1
b1
Result2
a2
b2
Operation
ALU0
CarryIn
CarryOut
ALU1
CarryIn
CarryOut
ALU2
CarryIn
CarryOut
ALU31
CarryIn
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
 Two's complement approach: just negate b and add.
What about subtraction (a – b) ?
0
2
Result
Operation
a
1
CarryIn
CarryOut
0
1
Binvert
b
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Adding a NOR function
 Can also choose to invert a. How do we get “a NOR b” ?
Binvert
a
b
CarryIn
CarryOut
Operation
1
0
2
+
Result
1
0
Ainvert
1
0
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
 Need to support the set-on-less-than instruction (slt)
 remember: slt is an arithmetic instruction
 produces a 1 if rs < rt and 0 otherwise
 use subtraction: (a-b) < 0 implies a < b
 Need to support test for equality (beq $t5, $t6, $t7)
 use subtraction: (a-b) = 0 implies a = b
Tailoring the ALU to the MIPS
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Supporting slt
Use this ALU for most significant bit
all other bits
Binvert
a
b
CarryIn
Operation
1
0
2
+
Result
1
0
3
Less
Overflow
detection
Set
Overflow
Ainvert
1
0
Binvert
a
b
CarryIn
CarryOut
Operation
1
0
2
+
Result
1
0
Ainvert
1
0
3
Less
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Supporting slt
a0
Operation
CarryIn
ALU0
Less
CarryOut
b0
CarryIn
a1 CarryIn
ALU1
Less
CarryOut
b1
Result0
Result1
a2 CarryIn
ALU2
Less
CarryOut
b2
a31 CarryIn
ALU31
Less
b31
Result2
Result31
.
.
.
.
.
.
.
.
.
Binvert
Ainvert
0
0
0 Overflow
Set
CarryIn
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Test for equality
 Notice control lines:
0000 = and
0001 = or
0010 = add
0110 = subtract
0111 = slt
1100 = NOR
•Note: zero is a 1 when the result is zero!
a0
Operation
CarryIn
ALU0
Less
CarryOut
b0
a1 CarryIn
ALU1
Less
CarryOut
b1
Result0
Result1
a2 CarryIn
ALU2
Less
CarryOut
b2
a31 CarryIn
ALU31
Less
b31
Result2
Result31
.
.
.
.
.
.
.
.
.
Bnegate
Ainvert
0
0
0 Overflow
Set
CarryIn
.
.
.
.
.
.
Zero
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Conclusion
 We can build an ALU to support the MIPS instruction set
 key idea: use multiplexor to select the output we want
 we can efficiently perform subtraction using two’s complement
 we can replicate a 1-bit ALU to produce a 32-bit ALU
 Important points about hardware
 all of the gates are always working
 the speed of a gate is affected by the number of inputs to the gate
 the speed of a circuit is affected by the number of gates in series
(on the “critical path” or the “deepest level of logic”)
 Our primary focus: comprehension, however,
 Clever changes to organization can improve performance
(similar to using better algorithms in software)
 We saw this in multiplication, let’s look at addition now
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
 Is a 32-bit ALU as fast as a 1-bit ALU?
 Is there more than one way to do addition?
 two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
c1 = b0c0 + a0c0 + a0b0
c2 = b1c1 + a1c1 + a1b1 c2 =
c3 = b2c2 + a2c2 + a2b2 c3 =
c4 = b3c3 + a3c3 + a3b3 c4 =
Not feasible! Why?
Problem: ripple carry adder is
slow
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
One-bit Full-Adder Circuit
ai
bi
XOR
AND
XOR
AND
OR
ci
sumi
Ci+1
FAi
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
32-bit Ripple-Carry Adder
FA0
FA1
FA2
FA31
c0
a0
b0
a1
b1
a2
b2
a31
b31
sum0
sum1
sum2
sum31
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
How Fast is Ripple-Carry Adder?
 Longest delay path (critical path) runs from cin to
sum31.
 Suppose delay of full-adder is 100ps.
 Critical path delay = 3,200ps
 Clock rate cannot be higher than 1012/3,200 =
312MHz.
 Must use more efficient ways to handle carry.
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Fast Adders
 In general, any output of a 32-bit adder can be evaluated
as a logic expression in terms of all 65 inputs.
 Levels of logic in the circuit can be reduced to log2N for
N-bit adder. Ripple-carry has N levels.
 More gates are needed, about log2N times that of ripple-
carry design.
 Fastest design is known as carry lookahead adder.
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
N-bit Adder Design Options
Type of adder Time complexity
(delay)
Space complexity
(size)
Ripple-carry O(N) O(N)
Carry-lookahead O(log2N) O(N log2N)
Carry-skip O(√N) O(N)
Carry-select O(√N) O(N)
Reference: J. L. Hennessy and D. A. Patterson, Computer Architecture:
A Quantitative Approach, Second Edition, San Francisco, California,
1990.
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
 An approach in-between our two extremes
 Motivation:
 If we didn't know the value of carry-in, what could we do?
 When would we always generate a carry? gi = aibi
 When would we propagate the carry? pi = ai + bi
 Did we get rid of the ripple?
c1 = g0 + p0c0
c2 = g1 + p1c1 c2 = g1 + p1 g0 + p1p0c0
c3 = g2 + p2c2 c3 = …
c4 = g3 + p3c3 c4 = …
Feasible! Why?
Carry-lookahead adder
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
 Can’t build a 16 bit adder this way... (too big)
 Could use ripple carry of 4-bit CLA adders
 Better: use the CLA principle again!
Use principle to build bigger
adders
a4 CarryIn
ALU1
P1
G1
b4
a5
b5
a6
b6
a7
b7
a0 CarryIn
ALU0
P0
G0
b0
Carry-lookahead unit
a1
b1
a2
b2
a3
b3
CarryIn
Result0–3
pi
gi
ci + 1
pi + 1
gi + 1
C1
Result4–7
a8 CarryIn
ALU2
P2
G2
b8
a9
b9
a10
b10
a11
b11
ci + 2
pi + 2
gi + 2
C2
Result8–11
a12 CarryIn
ALU3
P3
G3
b12
a13
b13
a14
b14
a15
b15
ci + 3
pi + 3
gi + 3
C3
Result12–15
ci + 4
C4
CarryOut
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Carry-Select Adder
16-bit
ripple
carry
adder
a0-a15
b0-b15
cin
sum0-sum15
16-bit
ripple
carry
adder
a16-a31
b16-b31
0
16-bit
ripple
carry
adder
a16-a31
b16-b31
1
Multiplexer
sum16-sum31
0
1
This is known as
carry-select adder
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
ALU Summary
 We can build an ALU to support MIPS addition
 Our focus is on comprehension, not performance
 Real processors use more sophisticated techniques for arithmetic
 Where performance is not critical, hardware description languages
allow designers to completely automate the creation of hardware!
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Multiplication
 More complicated than addition
 accomplished via shifting and addition
 More time and more area
 Let's look at 3 versions based on a gradeschool
algorithm
0010 (multiplicand)
__x_1011 (multiplier)
 Negative numbers: convert and multiply
 there are better techniques, we won’t look at them
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Multiplication: Implementation
Datapath
Control
Multiplicand
Shift left
64 bits
64-bit ALU
Product
Write
64 bits
Control test
Multiplier
Shift right
32 bits
32nd repetition?
1a. Add multiplicand to product and
place the result in Product register
Multiplier0 = 0
1. Test
Multiplier0
Start
Multiplier0 = 1
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
No: < 32 repetitions
Yes: 32 repetitions
Done
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Final Version
Multiplicand
32 bits
32-bit ALU
Product
Write
64 bits
Control
test
Shift right
32nd repetition?
Product0 = 0
1. Test
Product0
Start
Product0 = 1
3. Shift the Product register right 1 bit
No: < 32 repetitions
Yes: 32 repetitions
Done
What goes here?
•Multiplier starts in right half of product
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Multiplying Signed Numbers with
Boothe’s Algorithm
 Consider A x B where A and B are signed integers
(2’s complemented format)
 Decompose B into the sum B1 + B2 + … + Bn
A x B = A x (B1 + B2 + … + Bn)
= (A x B1) + (A x B2) + … (A x Bn)
 Let each Bi be a single string of 1’s embedded in 0’s:
…0011…1100…
 Example:
0110010011100 = 0110000000000
+ 0000010000000
+ 0000000011100
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Boothe’s Algorithm
 Scanning from right to left, bit number u is the first 1
bit of the string and bit v is the first 0 left of the string:
v u
Bi = 0 … 0 1 … 1 0 … 0
= 0 … 0 1 … 1 1 … 1 (2v – 1)
- 0 … 0 0 … 0 1 … 1 (2u – 1)
= (2v – 1) - (2u – 1)
= 2v – 2u
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Boothe’s Algorithm
 Decomposing B:
A x B = A x (B1 + B2 + … )
= A x [(2v1 – 2u1) + (2v2 – 2u2) + …]
= (A x 2v1) – (A x 2u1) + (A x 2v2) – (A x 2u2) …
 A x B can be computed by adding and subtracted
shifted values of A:
 Scan bits right to left, shifting A once per bit
 When the bit string changes from 0 to 1, subtract
shifted A from the current product P – (A x 2u)
 When the bit string changes from 1 to 0, add shifted A
to the current product P + (A x 2v)
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Floating Point Numbers
 We need a way to represent a wide range of numbers
 numbers with fractions, e.g., 3.1416
 large number:
976,000,000,000,000 = 9.76 × 1014
 small number:
0.0000000000000976 = 9.76 × 10-14
 Representation:
 sign, exponent, significand:
(–1)sign × significand × 2exponent
 more bits for significand gives more accuracy
 more bits for exponent increases range
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Scientific Notation
 Scientific notation:
 0.525×105 = 5.25×104 = 52.5×103
 5.25 ×104 is in normalized scientific notation.
 position of decimal point fixed
 leading digit non-zero
 Binary numbers
 5.25 = 101.01 = 1.0101×22
 Binary point
 multiplication by 2 moves the point to the left
 division by 2 moves the point to the right
 Known as floating point format.
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Binary to Decimal Conversion
Binary (-1)S (1.b1b2b3b4) × 2E
Decimal (-1)S × (1 + b1×2-1 + b2×2-2 + b3×2-3 + b4×2-4) × 2E
Example: -1.1100 × 2-2 (binary) = - (1 + 2-1 + 2-2) ×2-2
= - (1 + 0.5 + 0.25)/4
= - 1.75/4
= - 0.4375 (decimal)
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
IEEE Std. 754 Floating-Point Format
S E: 8-bit Exponent F: 23-bit Fraction
bits 0-22
bits 23-30
bit 31
S E: 11-bit Exponent F: 52-bit Fraction +
bits 0-19
bits 20-30
bit 31
Continuation of 52-bit Fraction
bits 0-31
Single-Precision
Double-Precision
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
IEEE 754 floating-point standard
 Represented value = (–1)sign × (1+F) × 2exponent – bias
 Exponent is “biased” (excess-K format) to make sorting easier
 bias of 127 for single precision and 1023 for double precision
 E values in [1 .. 254] (0 and 255 reserved)
 Range = 2-126 to 2+127 (1038 to 10+38)
 Significand in sign-magnitude, normalized form
 Significand = (1 + F) = 1. b-1b-1b-1…b-23
 Suppress storage of leading 1
 Overflow: Exponent requiring more than 8 bits. Number can be
positive or negative.
 Underflow: Fraction requiring more than 23 bits. Number can be
positive or negative.
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
IEEE 754 floating-point standard
 Example:
 Decimal: -5.75 = - ( 4 + 1 + ½ + ¼ )
 Binary: -101.11 = -1.0111 x 22
 Floating point: exponent = 129 = 10000001
 IEEE single precision:
110000001011100000000000000000000
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Examples
1.1010001 × 210100 = 0 10010011 10100010000000000000000 = 1.6328125 × 220
-1.1010001 × 210100 = 1 10010011 10100010000000000000000 = -1.6328125 × 220
1.1010001 × 2-10100 = 0 01101011 10100010000000000000000 = 1.6328125 × 2-20
-1.1010001 × 2-10100 = 1 01101011 10100010000000000000000 = -1.6328125 × 2-20
Biased exponent (0-255), bias 127 (01111111) to be subtracted
0.5
0.125
0.0078125
0.6328125
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Negative
Overflow
Positive
Overflow
Expressible numbers
Numbers in 32-bit Formats
 Two’s complement integers
 Floating point numbers
 Ref: W. Stallings, Computer Organization and Architecture, Sixth Edition, Upper Saddle River, NJ: Prentice-Hall.
-231 231-1
0
Expressible
negative
numbers
Expressible
positive
numbers
0
-2-127 2-127
Positive underflow
Negative underflow
(2 – 2-23)×2127
- (2 – 2-23)×2127
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
IEEE 754 Special Codes
 ± 1.0 × 2-127
 Smallest positive number in single-precision IEEE 754 standard.
 Interpreted as positive/negative zero.
 Exponent less than -127 is positive underflow (regard as zero).
S 00000000 00000000000000000000000
S 11111111 00000000000000000000000
Zero
Infinity
 ± 1.0 × 2128
 Largest positive number in single-precision IEEE 754 standard.
 Interpreted as ± ∞
 If true exponent = 128 and fraction ≠ 0, then the number is greater
than ∞. It is called “not a number” or NaN (interpret as ∞).
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Addition and Subtraction
 Addition/subtraction of two floating-point numbers:
Example: 2 x 103 Align mantissas: 0.2 x 104
+ 3 x 104 + 3 x 104
3.2 x 104
 General Case:
m1 x 2e1 ± m2 x 2e2 = (m1 ± m2 x 2e2-e1) x 2e1 for e1 > e2
= (m1 x 2e1-e2 ± m2) x 2e2 for e2 > e1
 Shift smaller mantissa right by | e1 - e2| bits to align the mantissas.
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Addition/Subtraction Algorithm
0. Zero check
- Change the sign of subtrahend
- If either operand is 0, the other is the result
1. Significand alignment: right shift smaller significand until two
exponents are identical.
2. Addition: add significands and report exception if overflow occurs.
3. Normalization
- Shift significand bits to normalize.
- report overflow or underflow if exponent goes out of range.
4. Rounding
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
FP Add/Subtract (PH Text Figs. 3.16/17)
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Example
 Subtraction: 0.5ten- 0.4375ten
 Step 0: Floating point numbers to be added
1.000two×2-1 and -1.110two×2-2
 Step 1: Significand of lesser exponent is shifted
right until exponents match
-1.110two×2-2 → - 0.111two×2-1
 Step 2: Add significands, 1.000two + (- 0.111two)
Result is 0.001two ×2-1
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Example (Continued)
 Step 3: Normalize, 1.000two× 2-4
No overflow/underflow since
127 ≥ exponent ≥ -126
 Step 4: Rounding, no change since the sum fits
in 4 bits.
1.000two ×2-4 = (1+0)/16 = 0.0625ten
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
FP Multiplication: Basic Idea
(m1 x 2e1) x (m2 x 2e2) = (m1 x m2) x 2e1+e2
 Separate signs
 Add exponents
 Multiply significands
 Normalize, round, check overflow
 Replace sign
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
FP Multiplication Algorithm
P-H Figure 3.18
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
FP Mult. Illustration
 Multiply 0.5ten and -0.4375ten (answer = - 0.21875ten) or
 Multiply 1.000two×2-1 and -1.110two×2-2
 Step 1: Add exponents
-1 + (-2) = -3
 Step 2: Multiply significands
1.000
×1.110
0000
1000
1000
1000
1110000 Product is 1.110000
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
FP Mult. Illustration (Cont.)
 Step 3:
 Normalization: If necessary, shift significand right and increment
exponent.
Normalized product is 1.110000 × 2-3
 Check overflow/underflow: 127 ≥ exponent ≥ -126
 Step 4: Rounding: 1.110 × 2-3
 Step 5: Sign: Operands have opposite signs,
Product is -1.110 × 2-3
Decimal value = - (1+0.5+0.25)/8 = - 0.21875ten
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
FP Division: Basic Idea
 Separate sign.
 Check for zeros and infinity.
 Subtract exponents.
 Divide significands.
 Normalize/overflow/underflow.
 Rounding.
 Replace sign.
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
MIPS Floating Point
 32 floating point registers, $f0, . . . , $f31
 FP registers used in pairs for double precision; $f0
denotes double precision content of $f0,$f1
 Data transfer instructions:
 lwc1 $f1, 100($s2) # $f1←Mem[$s1+100]
 swc1 $f1, 100($s2) # Mem[$s2+100]←$f1
 Arithmetic instructions: (xxx=add, sub, mul, div)
 xxx.s single precision
 xxx.d double precision
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Floating Point Complexities
 Operations are somewhat more complicated (see text)
 In addition to overflow we can have “underflow”
 Accuracy can be a big problem
 IEEE 754 keeps two extra bits, guard and round
 four rounding modes
 positive divided by zero yields “infinity”
 zero divide by zero yields “not a number”
 other complexities
 Implementing the standard can be tricky
 Not using the standard can be even worse
 see text for description of 80x86 and Pentium bug!
Spring 2005 ELEC 5200/6200
From Patterson/Hennessey Slides
Chapter Three Summary
 Computer arithmetic is constrained by limited precision
 Bit patterns have no inherent meaning but standards do exist
 two’s complement
 IEEE 754 floating point
 Computer instructions determine “meaning” of the bit patterns
 Performance and accuracy are important so there are many
complexities in real machines
 Algorithm choice is important and may lead to hardware
optimizations for both space and time (e.g., multiplication)
 You may want to look back (Section 3.10 is great reading!)

More Related Content

Similar to how computers do math from patterson and hennessey book

Speeding Up Distributed Machine Learning Using Codes
Speeding Up Distributed Machine Learning Using CodesSpeeding Up Distributed Machine Learning Using Codes
Speeding Up Distributed Machine Learning Using CodesNAVER Engineering
 
ArithmeticCircuits.Ivy Nile vs. Rhea Ripley
ArithmeticCircuits.Ivy Nile vs. Rhea RipleyArithmeticCircuits.Ivy Nile vs. Rhea Ripley
ArithmeticCircuits.Ivy Nile vs. Rhea RipleyRudraBhai3
 
Combinational and sequential logic
Combinational and sequential logicCombinational and sequential logic
Combinational and sequential logicDeepak John
 
Module 4_Digital Electronics till complements.pdf
Module 4_Digital Electronics till complements.pdfModule 4_Digital Electronics till complements.pdf
Module 4_Digital Electronics till complements.pdfmanjunath V Gudur
 
Lecture11 combinational logic dynamics
Lecture11 combinational logic dynamicsLecture11 combinational logic dynamics
Lecture11 combinational logic dynamicsvidhya DS
 
Digital-Logic40124sequential circuits logic gatepptx
Digital-Logic40124sequential circuits logic gatepptxDigital-Logic40124sequential circuits logic gatepptx
Digital-Logic40124sequential circuits logic gatepptxssuser6feece1
 
unit-i-number-systems.pdf
unit-i-number-systems.pdfunit-i-number-systems.pdf
unit-i-number-systems.pdfRameshK531901
 
FYBSC IT Digital Electronics Unit III Chapter II Arithmetic Circuits
FYBSC IT Digital Electronics Unit III Chapter II Arithmetic CircuitsFYBSC IT Digital Electronics Unit III Chapter II Arithmetic Circuits
FYBSC IT Digital Electronics Unit III Chapter II Arithmetic CircuitsArti Parab Academics
 
Digital Comprator
Digital CompratorDigital Comprator
Digital Compratorsuraj829
 
combinational-circuit (1).ppt
combinational-circuit (1).pptcombinational-circuit (1).ppt
combinational-circuit (1).pptThanmayiKumar
 
DPCO-Unit 2-Combinational Circuit.pdf
DPCO-Unit 2-Combinational Circuit.pdfDPCO-Unit 2-Combinational Circuit.pdf
DPCO-Unit 2-Combinational Circuit.pdfssuser08e250
 

Similar to how computers do math from patterson and hennessey book (20)

Speeding Up Distributed Machine Learning Using Codes
Speeding Up Distributed Machine Learning Using CodesSpeeding Up Distributed Machine Learning Using Codes
Speeding Up Distributed Machine Learning Using Codes
 
Digital logic
Digital logicDigital logic
Digital logic
 
Multipliers in VLSI
Multipliers in VLSIMultipliers in VLSI
Multipliers in VLSI
 
9525.ppt
9525.ppt9525.ppt
9525.ppt
 
ArithmeticCircuits.Ivy Nile vs. Rhea Ripley
ArithmeticCircuits.Ivy Nile vs. Rhea RipleyArithmeticCircuits.Ivy Nile vs. Rhea Ripley
ArithmeticCircuits.Ivy Nile vs. Rhea Ripley
 
Unit 4 dica
Unit 4 dicaUnit 4 dica
Unit 4 dica
 
Combinational and sequential logic
Combinational and sequential logicCombinational and sequential logic
Combinational and sequential logic
 
Module 4_Digital Electronics till complements.pdf
Module 4_Digital Electronics till complements.pdfModule 4_Digital Electronics till complements.pdf
Module 4_Digital Electronics till complements.pdf
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
ALU.ppt
ALU.pptALU.ppt
ALU.ppt
 
Vlsi presentation final
Vlsi presentation finalVlsi presentation final
Vlsi presentation final
 
Lecture11 combinational logic dynamics
Lecture11 combinational logic dynamicsLecture11 combinational logic dynamics
Lecture11 combinational logic dynamics
 
Digital-Logic40124sequential circuits logic gatepptx
Digital-Logic40124sequential circuits logic gatepptxDigital-Logic40124sequential circuits logic gatepptx
Digital-Logic40124sequential circuits logic gatepptx
 
unit-i-number-systems.pdf
unit-i-number-systems.pdfunit-i-number-systems.pdf
unit-i-number-systems.pdf
 
14 Lec11 2003
14 Lec11 200314 Lec11 2003
14 Lec11 2003
 
FYBSC IT Digital Electronics Unit III Chapter II Arithmetic Circuits
FYBSC IT Digital Electronics Unit III Chapter II Arithmetic CircuitsFYBSC IT Digital Electronics Unit III Chapter II Arithmetic Circuits
FYBSC IT Digital Electronics Unit III Chapter II Arithmetic Circuits
 
Digital Comprator
Digital CompratorDigital Comprator
Digital Comprator
 
combinational-circuit (1).ppt
combinational-circuit (1).pptcombinational-circuit (1).ppt
combinational-circuit (1).ppt
 
DPCO-Unit 2-Combinational Circuit.pdf
DPCO-Unit 2-Combinational Circuit.pdfDPCO-Unit 2-Combinational Circuit.pdf
DPCO-Unit 2-Combinational Circuit.pdf
 
Digital Logic Design
Digital Logic Design Digital Logic Design
Digital Logic Design
 

Recently uploaded

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 

Recently uploaded (20)

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 

how computers do math from patterson and hennessey book

  • 2. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Number Format Considerations  Type of numbers (integer, fraction, real, complex)  Range of values  between smallest and largest values  wider in floating-point formats  Precision of values (max. accuracy)  usually related to number of bits allocated n bits => represent 2n values/levels  Value/weight of least-significant bit  Cost of hardware to store & process numbers (some formats more difficult to add, multiply/divide, etc.)
  • 3. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Taxonomy of Computer Information Information Instructions Addresses Data Numeric Non-numeric Fixed-point Floating-point Character (ASCII) Boolean Unsigned (ordinal) Signed Sign-magnitude 2’s complement Single- precision Other Double- precision
  • 4. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Unsigned Integers  Positional number system: an-1an-2 …. a2a1a0 = an-1x2n-1 + an-2x2n-2 + … + a2x22 + a1x21 + a0x20  Range = [(2n -1) … 0]  Carry out of MSB has weight 2n  Fixed-point fraction: 0.a-1a-2 …. a-n = a-1x2-1 + a-2x2-2 + … + a-nx2-n
  • 5. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Signed Integers  Sign-magnitude format (n-bit values): A = San-2 …. a2a1a0 (S = sign bit)  Range = [-(2n-1-1) … +(2n-1-1)]  Addition/subtraction difficult (multiply easy)  Redundant representations of 0  2’s complement format (n-bit values): -A represented by 2n – A  Range = [-(2n-1) … +(2n-1-1)]  Addition/subtraction easier (multiply harder)  Single representation of 0
  • 6. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Computing the 2’s Complement To compute the 2’s complement of A:  Let A = an-1an-2 …. a2a1a0  2n - A = 2n -1 + 1 – A = (2n -1) - A + 1 1 1 … 1 1 1 - an-1 an-2 …. a2 a1 a0 + 1 ------------------------------------- a’n-1a’n-2 …. a’2a’1a’0 + 1 (one’s complement + 1)
  • 7. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides 2’s Complement Arithmetic  Let (2n-1-1) ≥ A ≥ 0 and (2n-1-1) ≥ B ≥ 0  Case 1: A + B  (2n-2) ≥ (A + B) ≥ 0  Since result < 2n, there is no carry out of the MSB  Valid result if (A + B) < 2n-1  MSB (sign bit) = 0  Overflow if (A + B) ≥ 2n-1  MSB (sign bit) = 1 if result ≥ 2n-1  Carry into MSB
  • 8. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides 2’s Complement Arithmetic  Case 2: A - B  Compute by adding: A + (-B)  2’s complement: A + (2n - B)  -2n-1 < result < 2n-1 (no overflow possible)  If A ≥ B: 2n + (A - B) ≥ 2n  Weight of adder carry output = 2n  Discard carry (2n), keeping (A-B), which is ≥ 0  If A < B: 2n + (A - B) < 2n  Adder carry output = 0  Result is 2n - (B - A) = 2’s complement representation of -(B-A)
  • 9. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides 2’s Complement Arithmetic  Case 3: -A - B  Compute by adding: (-A) + (-B)  2’s complement: (2n - A) + (2n - B) = 2n + 2n - (A + B)  Discard carry (2n), making result 2n - (A + B) = 2’s complement representation of -(A + B)  0 ≥ result ≥ -2n  Overflow if -(A + B) < -2n-1  MSB (sign bit) = 0 if 2n - (A + B) < 2n-1  no carry into MSB
  • 10. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Relational Operators Compute A-B & test ALU “flags” to compare A vs. B ZF = result zero OF = 2’s complement overflow SF = sign bit of result CF = adder carry output Signed Unsigned A = B ZF = 1 ZF = 1 A <> B ZF = 0 ZF = 0 A ≥ B (SF  OF) = 0 CF = 1 (no borrow) A > B (SF  OF) + ZF = 0 CF  ZF’ = 1 A ≤ B (SF  OF) + ZF = 1 CF  ZF’ = 0 A < B (SF  OF) = 1 CF = 0 (borrow)
  • 11. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides MIPS Overflow Detection  An exception (interrupt) occurs when overflow detected for add,addi,sub  Control jumps to predefined address for exception  Interrupted address is saved for possible resumption  Details based on software system / language  example: flight control vs. homework assignment  Don't always want to detect overflow — new MIPS instructions: addu, addiu, subu note: addiu still sign-extends! note: sltu, sltiu for unsigned comparisons
  • 12. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Designing the Arithmetic & Logic Unit (ALU)  Provide arithmetic and logical functions as needed by the instruction set  Consider tradeoffs of area vs. performance (Material from Appendix B) 32 32 32 operation result a b ALU
  • 13. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides  Not easy to decide the “best” way to build something  Don't want too many inputs to a single gate (fan in)  Don’t want to have to go through too many gates (delay)  For our purposes, ease of comprehension is important  Let's look at a 1-bit ALU for addition:  How could we build a 1-bit ALU for add, and, and or?  How could we build a 32-bit ALU? Different Implementations cout = a b + a cin + b cin sum = a xor b xor cin Sum CarryIn CarryOut a b
  • 14. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Building a 32 bit ALU b 0 2 Result Operation a 1 CarryIn CarryOut Result31 a31 b31 Result0 CarryIn a0 b0 Result1 a1 b1 Result2 a2 b2 Operation ALU0 CarryIn CarryOut ALU1 CarryIn CarryOut ALU2 CarryIn CarryOut ALU31 CarryIn
  • 15. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides  Two's complement approach: just negate b and add. What about subtraction (a – b) ? 0 2 Result Operation a 1 CarryIn CarryOut 0 1 Binvert b
  • 16. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Adding a NOR function  Can also choose to invert a. How do we get “a NOR b” ? Binvert a b CarryIn CarryOut Operation 1 0 2 + Result 1 0 Ainvert 1 0
  • 17. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides  Need to support the set-on-less-than instruction (slt)  remember: slt is an arithmetic instruction  produces a 1 if rs < rt and 0 otherwise  use subtraction: (a-b) < 0 implies a < b  Need to support test for equality (beq $t5, $t6, $t7)  use subtraction: (a-b) = 0 implies a = b Tailoring the ALU to the MIPS
  • 18. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Supporting slt Use this ALU for most significant bit all other bits Binvert a b CarryIn Operation 1 0 2 + Result 1 0 3 Less Overflow detection Set Overflow Ainvert 1 0 Binvert a b CarryIn CarryOut Operation 1 0 2 + Result 1 0 Ainvert 1 0 3 Less
  • 19. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Supporting slt a0 Operation CarryIn ALU0 Less CarryOut b0 CarryIn a1 CarryIn ALU1 Less CarryOut b1 Result0 Result1 a2 CarryIn ALU2 Less CarryOut b2 a31 CarryIn ALU31 Less b31 Result2 Result31 . . . . . . . . . Binvert Ainvert 0 0 0 Overflow Set CarryIn
  • 20. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Test for equality  Notice control lines: 0000 = and 0001 = or 0010 = add 0110 = subtract 0111 = slt 1100 = NOR •Note: zero is a 1 when the result is zero! a0 Operation CarryIn ALU0 Less CarryOut b0 a1 CarryIn ALU1 Less CarryOut b1 Result0 Result1 a2 CarryIn ALU2 Less CarryOut b2 a31 CarryIn ALU31 Less b31 Result2 Result31 . . . . . . . . . Bnegate Ainvert 0 0 0 Overflow Set CarryIn . . . . . . Zero
  • 21. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Conclusion  We can build an ALU to support the MIPS instruction set  key idea: use multiplexor to select the output we want  we can efficiently perform subtraction using two’s complement  we can replicate a 1-bit ALU to produce a 32-bit ALU  Important points about hardware  all of the gates are always working  the speed of a gate is affected by the number of inputs to the gate  the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)  Our primary focus: comprehension, however,  Clever changes to organization can improve performance (similar to using better algorithms in software)  We saw this in multiplication, let’s look at addition now
  • 22. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides  Is a 32-bit ALU as fast as a 1-bit ALU?  Is there more than one way to do addition?  two extremes: ripple carry and sum-of-products Can you see the ripple? How could you get rid of it? c1 = b0c0 + a0c0 + a0b0 c2 = b1c1 + a1c1 + a1b1 c2 = c3 = b2c2 + a2c2 + a2b2 c3 = c4 = b3c3 + a3c3 + a3b3 c4 = Not feasible! Why? Problem: ripple carry adder is slow
  • 23. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides One-bit Full-Adder Circuit ai bi XOR AND XOR AND OR ci sumi Ci+1 FAi
  • 24. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides 32-bit Ripple-Carry Adder FA0 FA1 FA2 FA31 c0 a0 b0 a1 b1 a2 b2 a31 b31 sum0 sum1 sum2 sum31
  • 25. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides How Fast is Ripple-Carry Adder?  Longest delay path (critical path) runs from cin to sum31.  Suppose delay of full-adder is 100ps.  Critical path delay = 3,200ps  Clock rate cannot be higher than 1012/3,200 = 312MHz.  Must use more efficient ways to handle carry.
  • 26. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Fast Adders  In general, any output of a 32-bit adder can be evaluated as a logic expression in terms of all 65 inputs.  Levels of logic in the circuit can be reduced to log2N for N-bit adder. Ripple-carry has N levels.  More gates are needed, about log2N times that of ripple- carry design.  Fastest design is known as carry lookahead adder.
  • 27. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides N-bit Adder Design Options Type of adder Time complexity (delay) Space complexity (size) Ripple-carry O(N) O(N) Carry-lookahead O(log2N) O(N log2N) Carry-skip O(√N) O(N) Carry-select O(√N) O(N) Reference: J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Second Edition, San Francisco, California, 1990.
  • 28. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides  An approach in-between our two extremes  Motivation:  If we didn't know the value of carry-in, what could we do?  When would we always generate a carry? gi = aibi  When would we propagate the carry? pi = ai + bi  Did we get rid of the ripple? c1 = g0 + p0c0 c2 = g1 + p1c1 c2 = g1 + p1 g0 + p1p0c0 c3 = g2 + p2c2 c3 = … c4 = g3 + p3c3 c4 = … Feasible! Why? Carry-lookahead adder
  • 29. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides  Can’t build a 16 bit adder this way... (too big)  Could use ripple carry of 4-bit CLA adders  Better: use the CLA principle again! Use principle to build bigger adders a4 CarryIn ALU1 P1 G1 b4 a5 b5 a6 b6 a7 b7 a0 CarryIn ALU0 P0 G0 b0 Carry-lookahead unit a1 b1 a2 b2 a3 b3 CarryIn Result0–3 pi gi ci + 1 pi + 1 gi + 1 C1 Result4–7 a8 CarryIn ALU2 P2 G2 b8 a9 b9 a10 b10 a11 b11 ci + 2 pi + 2 gi + 2 C2 Result8–11 a12 CarryIn ALU3 P3 G3 b12 a13 b13 a14 b14 a15 b15 ci + 3 pi + 3 gi + 3 C3 Result12–15 ci + 4 C4 CarryOut
  • 30. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Carry-Select Adder 16-bit ripple carry adder a0-a15 b0-b15 cin sum0-sum15 16-bit ripple carry adder a16-a31 b16-b31 0 16-bit ripple carry adder a16-a31 b16-b31 1 Multiplexer sum16-sum31 0 1 This is known as carry-select adder
  • 31. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides ALU Summary  We can build an ALU to support MIPS addition  Our focus is on comprehension, not performance  Real processors use more sophisticated techniques for arithmetic  Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!
  • 32. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Multiplication  More complicated than addition  accomplished via shifting and addition  More time and more area  Let's look at 3 versions based on a gradeschool algorithm 0010 (multiplicand) __x_1011 (multiplier)  Negative numbers: convert and multiply  there are better techniques, we won’t look at them
  • 33. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Multiplication: Implementation Datapath Control Multiplicand Shift left 64 bits 64-bit ALU Product Write 64 bits Control test Multiplier Shift right 32 bits 32nd repetition? 1a. Add multiplicand to product and place the result in Product register Multiplier0 = 0 1. Test Multiplier0 Start Multiplier0 = 1 2. Shift the Multiplicand register left 1 bit 3. Shift the Multiplier register right 1 bit No: < 32 repetitions Yes: 32 repetitions Done
  • 34. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Final Version Multiplicand 32 bits 32-bit ALU Product Write 64 bits Control test Shift right 32nd repetition? Product0 = 0 1. Test Product0 Start Product0 = 1 3. Shift the Product register right 1 bit No: < 32 repetitions Yes: 32 repetitions Done What goes here? •Multiplier starts in right half of product
  • 35. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Multiplying Signed Numbers with Boothe’s Algorithm  Consider A x B where A and B are signed integers (2’s complemented format)  Decompose B into the sum B1 + B2 + … + Bn A x B = A x (B1 + B2 + … + Bn) = (A x B1) + (A x B2) + … (A x Bn)  Let each Bi be a single string of 1’s embedded in 0’s: …0011…1100…  Example: 0110010011100 = 0110000000000 + 0000010000000 + 0000000011100
  • 36. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Boothe’s Algorithm  Scanning from right to left, bit number u is the first 1 bit of the string and bit v is the first 0 left of the string: v u Bi = 0 … 0 1 … 1 0 … 0 = 0 … 0 1 … 1 1 … 1 (2v – 1) - 0 … 0 0 … 0 1 … 1 (2u – 1) = (2v – 1) - (2u – 1) = 2v – 2u
  • 37. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Boothe’s Algorithm  Decomposing B: A x B = A x (B1 + B2 + … ) = A x [(2v1 – 2u1) + (2v2 – 2u2) + …] = (A x 2v1) – (A x 2u1) + (A x 2v2) – (A x 2u2) …  A x B can be computed by adding and subtracted shifted values of A:  Scan bits right to left, shifting A once per bit  When the bit string changes from 0 to 1, subtract shifted A from the current product P – (A x 2u)  When the bit string changes from 1 to 0, add shifted A to the current product P + (A x 2v)
  • 38. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Floating Point Numbers  We need a way to represent a wide range of numbers  numbers with fractions, e.g., 3.1416  large number: 976,000,000,000,000 = 9.76 × 1014  small number: 0.0000000000000976 = 9.76 × 10-14  Representation:  sign, exponent, significand: (–1)sign × significand × 2exponent  more bits for significand gives more accuracy  more bits for exponent increases range
  • 39. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Scientific Notation  Scientific notation:  0.525×105 = 5.25×104 = 52.5×103  5.25 ×104 is in normalized scientific notation.  position of decimal point fixed  leading digit non-zero  Binary numbers  5.25 = 101.01 = 1.0101×22  Binary point  multiplication by 2 moves the point to the left  division by 2 moves the point to the right  Known as floating point format.
  • 40. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Binary to Decimal Conversion Binary (-1)S (1.b1b2b3b4) × 2E Decimal (-1)S × (1 + b1×2-1 + b2×2-2 + b3×2-3 + b4×2-4) × 2E Example: -1.1100 × 2-2 (binary) = - (1 + 2-1 + 2-2) ×2-2 = - (1 + 0.5 + 0.25)/4 = - 1.75/4 = - 0.4375 (decimal)
  • 41. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides IEEE Std. 754 Floating-Point Format S E: 8-bit Exponent F: 23-bit Fraction bits 0-22 bits 23-30 bit 31 S E: 11-bit Exponent F: 52-bit Fraction + bits 0-19 bits 20-30 bit 31 Continuation of 52-bit Fraction bits 0-31 Single-Precision Double-Precision
  • 42. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides IEEE 754 floating-point standard  Represented value = (–1)sign × (1+F) × 2exponent – bias  Exponent is “biased” (excess-K format) to make sorting easier  bias of 127 for single precision and 1023 for double precision  E values in [1 .. 254] (0 and 255 reserved)  Range = 2-126 to 2+127 (1038 to 10+38)  Significand in sign-magnitude, normalized form  Significand = (1 + F) = 1. b-1b-1b-1…b-23  Suppress storage of leading 1  Overflow: Exponent requiring more than 8 bits. Number can be positive or negative.  Underflow: Fraction requiring more than 23 bits. Number can be positive or negative.
  • 43. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides IEEE 754 floating-point standard  Example:  Decimal: -5.75 = - ( 4 + 1 + ½ + ¼ )  Binary: -101.11 = -1.0111 x 22  Floating point: exponent = 129 = 10000001  IEEE single precision: 110000001011100000000000000000000
  • 44. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Examples 1.1010001 × 210100 = 0 10010011 10100010000000000000000 = 1.6328125 × 220 -1.1010001 × 210100 = 1 10010011 10100010000000000000000 = -1.6328125 × 220 1.1010001 × 2-10100 = 0 01101011 10100010000000000000000 = 1.6328125 × 2-20 -1.1010001 × 2-10100 = 1 01101011 10100010000000000000000 = -1.6328125 × 2-20 Biased exponent (0-255), bias 127 (01111111) to be subtracted 0.5 0.125 0.0078125 0.6328125
  • 45. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Negative Overflow Positive Overflow Expressible numbers Numbers in 32-bit Formats  Two’s complement integers  Floating point numbers  Ref: W. Stallings, Computer Organization and Architecture, Sixth Edition, Upper Saddle River, NJ: Prentice-Hall. -231 231-1 0 Expressible negative numbers Expressible positive numbers 0 -2-127 2-127 Positive underflow Negative underflow (2 – 2-23)×2127 - (2 – 2-23)×2127
  • 46. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides IEEE 754 Special Codes  ± 1.0 × 2-127  Smallest positive number in single-precision IEEE 754 standard.  Interpreted as positive/negative zero.  Exponent less than -127 is positive underflow (regard as zero). S 00000000 00000000000000000000000 S 11111111 00000000000000000000000 Zero Infinity  ± 1.0 × 2128  Largest positive number in single-precision IEEE 754 standard.  Interpreted as ± ∞  If true exponent = 128 and fraction ≠ 0, then the number is greater than ∞. It is called “not a number” or NaN (interpret as ∞).
  • 47. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Addition and Subtraction  Addition/subtraction of two floating-point numbers: Example: 2 x 103 Align mantissas: 0.2 x 104 + 3 x 104 + 3 x 104 3.2 x 104  General Case: m1 x 2e1 ± m2 x 2e2 = (m1 ± m2 x 2e2-e1) x 2e1 for e1 > e2 = (m1 x 2e1-e2 ± m2) x 2e2 for e2 > e1  Shift smaller mantissa right by | e1 - e2| bits to align the mantissas.
  • 48. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Addition/Subtraction Algorithm 0. Zero check - Change the sign of subtrahend - If either operand is 0, the other is the result 1. Significand alignment: right shift smaller significand until two exponents are identical. 2. Addition: add significands and report exception if overflow occurs. 3. Normalization - Shift significand bits to normalize. - report overflow or underflow if exponent goes out of range. 4. Rounding
  • 49. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides FP Add/Subtract (PH Text Figs. 3.16/17)
  • 50. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Example  Subtraction: 0.5ten- 0.4375ten  Step 0: Floating point numbers to be added 1.000two×2-1 and -1.110two×2-2  Step 1: Significand of lesser exponent is shifted right until exponents match -1.110two×2-2 → - 0.111two×2-1  Step 2: Add significands, 1.000two + (- 0.111two) Result is 0.001two ×2-1
  • 51. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Example (Continued)  Step 3: Normalize, 1.000two× 2-4 No overflow/underflow since 127 ≥ exponent ≥ -126  Step 4: Rounding, no change since the sum fits in 4 bits. 1.000two ×2-4 = (1+0)/16 = 0.0625ten
  • 52. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides FP Multiplication: Basic Idea (m1 x 2e1) x (m2 x 2e2) = (m1 x m2) x 2e1+e2  Separate signs  Add exponents  Multiply significands  Normalize, round, check overflow  Replace sign
  • 53. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides FP Multiplication Algorithm P-H Figure 3.18
  • 54. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides FP Mult. Illustration  Multiply 0.5ten and -0.4375ten (answer = - 0.21875ten) or  Multiply 1.000two×2-1 and -1.110two×2-2  Step 1: Add exponents -1 + (-2) = -3  Step 2: Multiply significands 1.000 ×1.110 0000 1000 1000 1000 1110000 Product is 1.110000
  • 55. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides FP Mult. Illustration (Cont.)  Step 3:  Normalization: If necessary, shift significand right and increment exponent. Normalized product is 1.110000 × 2-3  Check overflow/underflow: 127 ≥ exponent ≥ -126  Step 4: Rounding: 1.110 × 2-3  Step 5: Sign: Operands have opposite signs, Product is -1.110 × 2-3 Decimal value = - (1+0.5+0.25)/8 = - 0.21875ten
  • 56. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides FP Division: Basic Idea  Separate sign.  Check for zeros and infinity.  Subtract exponents.  Divide significands.  Normalize/overflow/underflow.  Rounding.  Replace sign.
  • 57. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides MIPS Floating Point  32 floating point registers, $f0, . . . , $f31  FP registers used in pairs for double precision; $f0 denotes double precision content of $f0,$f1  Data transfer instructions:  lwc1 $f1, 100($s2) # $f1←Mem[$s1+100]  swc1 $f1, 100($s2) # Mem[$s2+100]←$f1  Arithmetic instructions: (xxx=add, sub, mul, div)  xxx.s single precision  xxx.d double precision
  • 58. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Floating Point Complexities  Operations are somewhat more complicated (see text)  In addition to overflow we can have “underflow”  Accuracy can be a big problem  IEEE 754 keeps two extra bits, guard and round  four rounding modes  positive divided by zero yields “infinity”  zero divide by zero yields “not a number”  other complexities  Implementing the standard can be tricky  Not using the standard can be even worse  see text for description of 80x86 and Pentium bug!
  • 59. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Chapter Three Summary  Computer arithmetic is constrained by limited precision  Bit patterns have no inherent meaning but standards do exist  two’s complement  IEEE 754 floating point  Computer instructions determine “meaning” of the bit patterns  Performance and accuracy are important so there are many complexities in real machines  Algorithm choice is important and may lead to hardware optimizations for both space and time (e.g., multiplication)  You may want to look back (Section 3.10 is great reading!)