Lecture 1 - Numerical Methods

Numerical Methods
Lecture 1
Representation of Numerical Values in the Computer

Overview
– Number Systems
– Number Conversion
– Representation of Numbers
– Computer Arithmetic
– Errors in Arithmetic
Tashreef Muhammad 2

Number Systems
We will mainly discuss positional number systems

Number Systems
– Non-positional Number Systems
– Positional Number Systems
– Decimal
– Binary
– Hexadecimal

Number Conversion
Converting numbers from one representation to another

Number Conversion
– Decimal to Non-decimal Number
– Non-decimal to Decimal Number
– Binary to Hexadecimal Number
– Hexadecimal to Binary Number
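The four conversions listed above can be sketched in a few lines of Python. This is only an illustrative sketch for non-negative integers: the function names (`to_base`, `from_base`, `bin_to_hex`, `hex_to_bin`) are hypothetical and not taken from the lecture. It uses repeated division for decimal to non-decimal, positional expansion for non-decimal to decimal, and groups of four bits for binary and hexadecimal.

```python
DIGIT_CHARS = "0123456789ABCDEF"

def to_base(n: int, base: int) -> str:
    """Decimal to non-decimal: repeated division, remainders become digits."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(DIGIT_CHARS[r])
    return "".join(reversed(out))

def from_base(s: str, base: int) -> int:
    """Non-decimal to decimal: accumulate digit * base**position."""
    value = 0
    for ch in s.upper():
        value = value * base + DIGIT_CHARS.index(ch)
    return value

def bin_to_hex(bits: str) -> str:
    """Pad to a multiple of 4 bits, then map each 4-bit group to one hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(to_base(from_base(g, 2), 16) for g in groups)

def hex_to_bin(hx: str) -> str:
    """Expand each hex digit into its 4-bit binary group."""
    return "".join(to_base(from_base(ch, 16), 2).zfill(4) for ch in hx)

print(to_base(37, 2), to_base(300, 16), bin_to_hex("100101"), hex_to_bin("25"))
```

Fractional parts would need a separate repeated-multiplication step, which this sketch leaves out.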

Representation of
Numbers
How modern computers store and represent
numbers

Representation of Numbers
– Representation depends on the machine word
– A word is a fixed number of bits
– Word size varies from machine to machine
– Computers process using binary
– Humans easily understand decimal
– Hexadecimal expresses more in fewer digits

– Integer Representation
– For word size n, the maximum possible unsigned number to store is $2^n - 1$
– Negative numbers are stored as 2’s complement
– The use of a sign bit makes the highest possible number $2^{n-1} - 1$
– To form the 2’s complement of a number:
– First toggle all the bits of the number
– Add 1 to the whole number
– Use the sign bit as it is
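The toggle-and-add-one procedure above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the function name `twos_complement` is hypothetical, and the lower bound of $-2^{n-1}$ for an n-bit word is a standard property of 2's complement that the slide does not state.

```python
def twos_complement(value: int, n: int) -> str:
    """Return the n-bit 2's complement bit pattern of value as a string."""
    lowest, highest = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lowest <= value <= highest:
        raise OverflowError(f"{value} does not fit in {n} bits")
    if value >= 0:
        return format(value, f"0{n}b")
    magnitude = format(-value, f"0{n}b")
    # Step 1: toggle all the bits of the magnitude.
    toggled = "".join("1" if bit == "0" else "0" for bit in magnitude)
    # Step 2: add 1 to the toggled pattern.
    return format(int(toggled, 2) + 1, f"0{n}b")

print(twos_complement(5, 8))    # pattern for +5
print(twos_complement(-5, 8))   # pattern for -5
```

For an 8-bit word the leading bit of the result is 0 for non-negative values and 1 for negative ones, so it acts as the sign bit.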

– Floating Point Representation
– Exponential form, $x = f \times 10^{E}$
– $f$ is the mantissa and $E$ is the exponent
– The entire memory location is divided to three parts
– Sign, mantissa and exponent
– Typically 24 bits for mantissa and 7 bits for exponent are used on
a 32 bit word representation system

Representation of
Numbers
A sample floating point representation from
Numerical Methods textbook (Figure 3.1)
Field:          SIGN    EXPONENT    MANTISSA
Width:          1 bit   7 bits      24 bits
Bit positions:  31      30 to 24    23 to 0
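To make the 1 + 7 + 24 bit layout concrete, the three fields can be packed into and extracted from a 32-bit word with shifts and masks. This is a minimal sketch: the names `pack_word` and `unpack_word` are hypothetical, and the fields are treated as raw unsigned integers because the slides do not say how the exponent itself is encoded.

```python
def pack_word(sign: int, exponent: int, mantissa: int) -> int:
    """Place sign in bit 31, exponent in bits 30..24, mantissa in bits 23..0."""
    if sign not in (0, 1) or not 0 <= exponent < 2**7 or not 0 <= mantissa < 2**24:
        raise ValueError("field out of range for the 1/7/24 layout")
    return (sign << 31) | (exponent << 24) | mantissa

def unpack_word(word: int) -> tuple[int, int, int]:
    """Recover (sign, exponent, mantissa) from a 32-bit word."""
    return (word >> 31) & 0x1, (word >> 24) & 0x7F, word & 0xFFFFFF

word = pack_word(1, 5, 3)
print(f"{word:032b}", unpack_word(word))
```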

– Floating Point Representation
– Shifting of the decimal point → Normalisation
– Numbers in normalised form → Normalised Floating Point Numbers
– Two possible notations: $0.596 \times 10^{-2}$ or .596E-2
– Conditions to be satisfied by mantissa, 𝑓
– For positive numbers: 0.1 ≤ 𝑓 < 1.0
– For negative numbers: −1 < 𝑓 ≤ −0.1
– In general: 0.1 ≤ |𝑓| < 1.0
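Normalisation can be sketched with Python's `decimal` module, which keeps decimal digits exactly. The function name `normalise` is hypothetical; it returns a mantissa and exponent satisfying the condition above, with zero handled as a special case that the slide does not cover.

```python
from decimal import Decimal

def normalise(x: Decimal) -> tuple[Decimal, int]:
    """Return (f, E) such that x = f * 10**E and 0.1 <= |f| < 1.0."""
    if x == 0:
        return Decimal(0), 0          # assumption: zero gets mantissa 0, exponent 0
    E = x.adjusted() + 1              # adjusted() is the power of ten of the leading digit
    f = x.scaleb(-E)                  # shift the decimal point by E places
    return f, E

print(normalise(Decimal("0.00596")))  # mantissa 0.596, exponent -2
print(normalise(Decimal("-37.5")))    # mantissa -0.375, exponent 2
```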

Computer Arithmetic
How computers process arithmetic operations

Computer Arithmetic
– Integer Arithmetic
– Integer arithmetic always results in an integer
– Exception:
– Cannot represent arbitrarily large numbers; the range is bounded above and below
– Integer division provides two results: quotient and remainder
– Sample integer arithmetic:
– 25 + 12 = 37, 25 - 12 = 13, 12 - 25 = -13
– 25 × 12 = 300, 25 ÷ 12 = 2, 12 ÷ 25 = 0
– Show the following is incorrect for integer arithmetic
– $\frac{a+b}{c} = \frac{a}{c} + \frac{b}{c}$
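One way to approach the exercise is to try small values. The sketch below uses Python's `//` operator for integer division; the values of `a`, `b` and `c` are hypothetical, chosen only for illustration, and are non-negative because `//` rounds towards negative infinity rather than towards zero for negative operands.

```python
a, b, c = 5, 7, 4                 # hypothetical values chosen for illustration

combined = (a + b) // c           # divide the sum: 12 // 4
separate = a // c + b // c        # divide each term first: 1 + 1

# Integer division yields two results, a quotient and a remainder.
quotient, remainder = divmod(25, 12)

print(combined, separate, quotient, remainder)
```

The two sides differ because each separate division discards its remainder before the addition takes place.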

Computer Arithmetic
–Floating Point Arithmetic: Addition
– Let $x$ and $y$ be added to give $z$, where $x \ge y$
– The fractional parts: $f_x$, $f_y$, $f_z$
– The exponent parts: $E_x$, $E_y$, $E_z$, where $E_x \ge E_y$
– Algorithm:
– Set $E_z = \max(E_x, E_y)$, that is, $E_z = E_x$
– Shift $f_y$ to the right by $E_x - E_y$ places
– Set $f_z = f_x + f_y$
– Normalise if necessary

Computer Arithmetic
–Floating Point Arithmetic: Subtraction
– Addition with different sign bits
– May leave leading zeros in the mantissa (mantissa underflow)
– Left shift $f_z$ and decrease $E_z$ accordingly
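The addition algorithm and the subtraction note can be sketched with integer mantissas. This is an illustrative sketch under stated assumptions, not the lecture's implementation: a number is a hypothetical pair `(m, e)` meaning $0.m \times 10^{e}$ with a six-digit mantissa, excess digits are chopped rather than rounded, and the names `normalise` and `fp_add` are made up for the example.

```python
DIGITS = 6  # assumed mantissa length, matching six-digit decimal examples

def normalise(m: int, e: int) -> tuple[int, int]:
    """Shift the mantissa until it has exactly DIGITS digits, adjusting the exponent."""
    if m == 0:
        return 0, 0
    sign = -1 if m < 0 else 1
    m = abs(m)
    while m >= 10 ** DIGITS:          # mantissa overflow: shift right, raise exponent
        m //= 10
        e += 1
    while m < 10 ** (DIGITS - 1):     # leading zeros: shift left, lower exponent
        m *= 10
        e -= 1
    return sign * m, e

def fp_add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    (mx, ex), (my, ey) = x, y
    if ex < ey:                       # make sure the first operand has the larger exponent
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    shift = ex - ey
    sign = -1 if my < 0 else 1
    my = sign * (abs(my) // 10 ** shift)   # shift right, chopping digits that fall off
    return normalise(mx + my, ex)          # exponent of the result starts as the larger one

print(fp_add((964572, 2), (586351, 5)))     # 0.964572 E2 + 0.586351 E5
print(fp_add((999658, -3), (-994576, -3)))  # 0.999658 E-3 - 0.994576 E-3
```

Subtraction is expressed here as addition of a negative mantissa, so the same routine covers both cases.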

Computer Arithmetic
–Floating Point Arithmetic: Multiplication
– Multiply the mantissas, $f_z = f_x \times f_y$
– Add the exponents, $E_z = E_x + E_y$
– Normalise if necessary

Computer Arithmetic
–Floating Point Arithmetic: Division
– Divide the mantissas, $f_z = f_x \div f_y$
– Subtract the exponents, $E_z = E_x - E_y$
– Normalise if necessary
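Multiplication and division follow the same pattern and can be sketched together. As before this is only an illustration under assumptions: a number is a hypothetical pair `(m, e)` meaning $0.m \times 10^{e}$ with a six-digit integer mantissa, results are chopped to six digits, and the names `normalise`, `fp_mul` and `fp_div` are invented for the example.

```python
DIGITS = 6  # assumed mantissa length

def normalise(m: int, e: int) -> tuple[int, int]:
    """Shift the mantissa until it has exactly DIGITS digits, adjusting the exponent."""
    if m == 0:
        return 0, 0
    sign = -1 if m < 0 else 1
    m = abs(m)
    while m >= 10 ** DIGITS:          # too many digits: shift right, raise exponent
        m //= 10
        e += 1
    while m < 10 ** (DIGITS - 1):     # leading zeros: shift left, lower exponent
        m *= 10
        e -= 1
    return sign * m, e

def fp_mul(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    (mx, ex), (my, ey) = x, y
    # Multiply mantissas and add exponents. The integer product carries an extra
    # factor of 10**DIGITS, which is folded into the exponent before normalising.
    return normalise(mx * my, ex + ey - DIGITS)

def fp_div(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    (mx, ex), (my, ey) = x, y
    sign = -1 if (mx < 0) != (my < 0) else 1
    # Divide mantissas (scaled up so the integer quotient keeps enough digits)
    # and subtract exponents.
    quotient = (abs(mx) * 10 ** DIGITS) // abs(my)
    return normalise(sign * quotient, ex - ey)

print(fp_mul((200000, 4), (400000, -2)))   # 0.20 E4 * 0.40 E-2
print(fp_div((876543, 0), (200000, -3)))   # 0.876543 / 0.200000 E-3
```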

Computer Arithmetic
–Floating Point Arithmetic: Practice
– Add 0.964572 E2 and 0.586351 E5
– Add 0.735816 E4 and 0.635742 E4
– Subtract 0.994576 E-3 from 0.999658 E-3
– Multiply 0.20 E4 and 0.40 E-2
– Divide 0.876543 by 0.200000 E-3

Computer Arithmetic
–Floating Point Arithmetic: Practice Answers
– Add 0.964572 E2 and 0.586351 E5 → 0.587315 E5
– Add 0.735816 E4 and 0.635742 E4 → 0.137155 E5 (mantissa chopped to six digits)
– Subtract 0.994576 E-3 from 0.999658 E-3 → 0.508200 E-5
– Multiply 0.20 E4 and 0.40 E-2 → 0.80 E1
– Divide 0.876543 by 0.200000 E-3 → 0.438271 E4

Errors in Arithmetic
Errors that are seen in arithmetic operations

–Floating Point Arithmetic
– Inexact binary representation of decimal numbers: decimal 0.1 becomes binary 0.0001100110011…
– Errors due to rounding-off a number: Add the numbers
0.500000 E1 and 0.100000 E-7
– Subtractive cancellation: Subtract 0.499998 from 0.500000
– Overflow or underflow of numbers
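Each of the four error sources above can be observed directly in Python. This sketch makes two assumptions beyond the slide: the decimal examples are simulated with the `decimal` module using a six-digit context that chops extra digits, and the overflow and underflow lines rely on Python floats being IEEE 754 double precision, which is the case on common platforms.

```python
import math
from decimal import Decimal, Context, ROUND_DOWN

six_digits = Context(prec=6, rounding=ROUND_DOWN)   # assumed six-digit, chopping arithmetic

# Inexact binary representation: the stored double is not exactly 0.1.
inexact = Decimal(0.1) != Decimal("0.1")

# Round-off: the small addend is lost entirely in six digits.
lost = six_digits.add(Decimal("0.500000E1"), Decimal("0.100000E-7"))

# Subtractive cancellation: only one significant digit survives.
cancel = six_digits.subtract(Decimal("0.500000"), Decimal("0.499998"))

# Overflow and underflow with double-precision floats.
overflow = 1e308 * 10
underflow = 1e-308 / 1e100

print(inexact, lost, cancel, overflow, underflow)
```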

– Laws of Arithmetic
– Errors in arithmetic can cause the associative and distributive laws to fail
– $(x + y) + z \neq x + (y + z)$ and $(x \times y) \times z \neq x \times (y \times z)$
– $x \times (y + z) \neq x \times y + x \times z$
– Provide an example where the associative law of addition fails due to a floating point arithmetic error
– $x = 0.456732 \times 10^{-2}$, $y = 0.243451$, $z = -0.248000$
– $(x + y) + z \neq x + (y + z)$
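The example values can be checked with a short simulation. This sketch assumes six-digit decimal arithmetic that chops extra digits, modelled with a `decimal` context. The module rounds the exact sum rather than chopping the shifted operand first, which gives the same result for these particular values.

```python
from decimal import Decimal, Context, ROUND_DOWN

six_digits = Context(prec=6, rounding=ROUND_DOWN)   # assumed six-digit, chopping arithmetic

x = Decimal("0.456732E-2")
y = Decimal("0.243451")
z = Decimal("-0.248000")

left = six_digits.add(six_digits.add(x, y), z)    # (x + y) + z
right = six_digits.add(x, six_digits.add(y, z))   # x + (y + z)

print(left, right, left == right)
```

Adding x to y first drops the trailing digits of x, so the left grouping gives 0.180000 E-4 while the right grouping gives 0.183200 E-4.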

Thank You
