Numerical Methods
Lecture 1
Representation of Numerical Values in the Computer
Overview
– Number Systems
– Number Conversion
– Representation of Numbers
– Computer Arithmetic
– Errors in Arithmetic
Tashreef Muhammad 2
Number Systems
This lecture mainly discusses “Positional Number Systems”
Number Systems
– Non-positional Number Systems
– Positional Number Systems
– Decimal
– Binary
– Hexadecimal
Number Conversion
Converting a number represented in one base into another
Number Conversion
– Decimal to Non-decimal Number
– Non-decimal to Decimal Number
– Binary to Hexadecimal Number
– Hexadecimal to Binary Number
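The four conversions listed above can be sketched in Python as follows. This is a minimal illustration for non-negative integers only, and the helper names (`to_base`, `from_base`, `bin_to_hex`, `hex_to_bin`) are hypothetical. Decimal to another base uses repeated division by the base, another base to decimal uses Horner's rule, and binary/hexadecimal conversion works four bits at a time.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n: int, base: int) -> str:
    """Decimal -> base 2..16 by repeated division; remainders read in reverse."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(DIGITS[r])
    return "".join(reversed(digits))


def from_base(s: str, base: int) -> int:
    """Base 2..16 -> decimal using Horner's rule: value = value * base + digit."""
    value = 0
    for ch in s.upper():
        value = value * base + DIGITS.index(ch)
    return value


def bin_to_hex(bits: str) -> str:
    """Pad to a multiple of 4 bits, then map each 4-bit group to one hex digit."""
    width = (len(bits) + 3) // 4 * 4
    bits = bits.zfill(width)
    return "".join(DIGITS[int(bits[i:i + 4], 2)] for i in range(0, width, 4))


def hex_to_bin(hex_str: str) -> str:
    """Expand each hex digit into exactly four bits."""
    return "".join(format(DIGITS.index(ch), "04b") for ch in hex_str.upper())


print(to_base(156, 2))           # 10011100
print(to_base(156, 16))          # 9C
print(from_base("10011100", 2))  # 156
print(bin_to_hex("10011100"))    # 9C
print(hex_to_bin("9C"))          # 10011100
```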
Representation of Numbers
How modern computers store and represent numbers
Representation of Numbers
– Representation depends on the machine's word
– A word is a fixed number of bits
– Word size varies from machine to machine
– Computers process numbers in binary
– Humans understand decimal more easily
– Hexadecimal expresses the same value in fewer digits (one hex digit per four bits)
Representation of Numbers
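– Integer Representation
– For word size n, the maximum unsigned number that can be stored is $2^n - 1$
– Negative numbers are stored in 2’s complement form
– Using the leftmost bit as a sign bit reduces the largest positive number to $2^{n-1} - 1$
– To form the 2’s complement of a number:
– First toggle (invert) all the bits of the number
– Then add 1 to the whole number
– The leftmost bit of the result is used as the sign bit as it is (1 means negative)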
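A small sketch of the toggle-then-add-1 rule for an n-bit word. The function name `twos_complement` is hypothetical, and Python integers are unbounded, so a mask is used to keep only the lowest n bits.

```python
def twos_complement(value: int, n: int) -> str:
    """Return the n-bit two's complement bit pattern of value as a string."""
    lo, hi = -(2 ** (n - 1)), 2 ** (n - 1) - 1    # signed range for n bits
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {n} signed bits")
    mask = 2 ** n - 1                              # keeps results to n bits
    if value >= 0:
        return format(value, f"0{n}b")
    magnitude = -value
    toggled = ~magnitude & mask                    # step 1: toggle every bit
    return format((toggled + 1) & mask, f"0{n}b")  # step 2: add 1


n = 8
print(2 ** n - 1)              # 255, largest unsigned 8-bit value
print(2 ** (n - 1) - 1)        # 127, largest positive value with a sign bit
print(twos_complement(5, n))   # 00000101
print(twos_complement(-5, n))  # 11111011 (leading 1 = negative)
```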
Representation of Numbers
– Floating Point Representation
– Exponential form, $x = f \times 10^{E}$
– 𝑓 is the mantissa and 𝐸 is the exponent
– The memory location is divided into three parts
– Sign, mantissa and exponent
– A typical 32-bit word uses 1 bit for the sign, 7 bits for the exponent and 24 bits for the mantissa
Representation of Numbers
A sample floating point representation from the Numerical Methods textbook (Figure 3.1)
[Figure 3.1: 32-bit word. Bit 31: SIGN (1 bit) | Bits 30 to 24: EXPONENT (7 bits) | Bits 23 to 0: MANTISSA (24 bits)]
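The bit layout of Figure 3.1 can be illustrated with shifts and masks. The figure does not say how the 7 exponent bits encode a signed exponent or how the 24 mantissa bits are scaled, so this sketch only moves raw field values in and out of a 32-bit word; `pack_word` and `unpack_word` are hypothetical names.

```python
SIGN_SHIFT, EXP_SHIFT = 31, 24
EXP_MASK = 0x7F        # 7 exponent bits (bits 30 to 24)
MANT_MASK = 0xFFFFFF   # 24 mantissa bits (bits 23 to 0)


def pack_word(sign: int, exponent: int, mantissa: int) -> int:
    """Place raw field values into one 32-bit word."""
    assert sign in (0, 1)
    assert 0 <= exponent <= EXP_MASK and 0 <= mantissa <= MANT_MASK
    return (sign << SIGN_SHIFT) | (exponent << EXP_SHIFT) | mantissa


def unpack_word(word: int) -> tuple[int, int, int]:
    """Extract the raw (sign, exponent, mantissa) fields from a 32-bit word."""
    sign = (word >> SIGN_SHIFT) & 0x1
    exponent = (word >> EXP_SHIFT) & EXP_MASK
    mantissa = word & MANT_MASK
    return sign, exponent, mantissa


w = pack_word(1, 0x45, 0xABCDEF)
print(hex(w))          # 0xc5abcdef
print(unpack_word(w))  # (1, 69, 11259375)
```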
Representation of Numbers
– Floating Point Representation
– Shifting the decimal point and adjusting the exponent accordingly → Normalisation
– Numbers in normalised form → Normalised Floating Point Numbers
– Two possible notations: $0.596 \times 10^{-2}$ or .596E-2
– Conditions to be satisfied by mantissa, 𝑓
– For positive numbers: 0.1 ≤ 𝑓 < 1.0
– For negative numbers: −1 < 𝑓 ≤ −0.1
– In general: 0.1 ≤ |𝑓| < 1.0
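A sketch of normalisation using Python's `decimal` module, which keeps decimal digits exact. `Decimal.adjusted()` gives the power of ten of the leading digit, so choosing E one larger puts the mantissa in the range 0.1 ≤ |f| < 1.0. The function name `normalise` is hypothetical.

```python
from decimal import Decimal


def normalise(x: str) -> tuple[Decimal, int]:
    """Return (f, E) with x = f * 10**E and 0.1 <= |f| < 1.0."""
    d = Decimal(x)
    if d == 0:
        raise ValueError("zero has no normalised form")
    E = d.adjusted() + 1   # adjusted() is the power of ten of the leading digit
    f = d.scaleb(-E)       # move the decimal point E places
    return f, E


print(normalise("0.00596"))  # (Decimal('0.596'), -2)
print(normalise("-123.45"))  # (Decimal('-0.12345'), 3)
```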
Computer Arithmetic
How computers process arithmetic operations
Computer Arithmetic
– Integer Arithmetic
– Integer arithmetic always produces an integer result
– Limitation:
– Integers cannot be arbitrarily large; the representable range is bounded above and below
– Integer division produces two results: a quotient and a remainder
– Sample integer arithmetic:
– 25 + 12 = 37, 25 − 12 = 13, 12 − 25 = −13
– 25 × 12 = 300, 25 ÷ 12 = 2 (remainder 1), 12 ÷ 25 = 0 (remainder 12)
– Show that the following identity does not always hold in integer arithmetic:
– $\frac{a+b}{c} = \frac{a}{c} + \frac{b}{c}$
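A quick check in Python, using `divmod` and `//` for the integer quotient. Python's `//` rounds toward negative infinity, which matches truncating integer division only for non-negative operands, so the sketch sticks to those. The values a = 1, b = 1, c = 2 are just one illustrative counterexample.

```python
# Quotient and remainder for the sample integer divisions above
print(divmod(25, 12))  # (2, 1)
print(divmod(12, 25))  # (0, 12)

# One counterexample to (a + b) / c = a/c + b/c under integer division
a, b, c = 1, 1, 2
lhs = (a + b) // c      # 2 // 2 -> 1
rhs = a // c + b // c   # 0 + 0  -> 0
print(lhs, rhs, lhs == rhs)  # 1 0 False
```

Each term a/c and b/c is truncated separately, so the fractional parts (here 0.5 + 0.5) are lost before they can add up to a whole.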
Computer Arithmetic
– Floating Point Arithmetic: Addition
– Let $x$ and $y$ be added to give $z$, with the operands ordered so that $E_x \ge E_y$
– The fractional parts (mantissas): $f_x$, $f_y$, $f_z$
– The exponent parts: $E_x$, $E_y$, $E_z$
– Algorithm:
– Set $E_z = \max(E_x, E_y) = E_x$
– Shift $f_y$ to the right by $E_x - E_y$ places
– Set $f_z = f_x + f_y$
– Normalise if necessary: if $|f_z| \ge 1$, shift $f_z$ right one place and increase $E_z$ by 1
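The addition steps can be sketched with a simple decimal model, assuming a six-digit mantissa and chopping (truncation), which is what the practice answers later in this lecture imply. A number is stored as a hypothetical pair (m, E) meaning $(m / 10^6) \times 10^{E}$, so 0.964572 E2 is `(964572, 2)`; integers are used so Python's own binary floats do not interfere.

```python
P = 6  # assumed mantissa length in decimal digits


def chop(m: int, k: int) -> int:
    """Shift mantissa m right by k digits, truncating (chopping) toward zero."""
    sign = -1 if m < 0 else 1
    return sign * (abs(m) // 10 ** k)


def normalise(m: int, e: int) -> tuple[int, int]:
    """Bring m back to exactly P digits, adjusting the exponent e."""
    if m == 0:
        return 0, 0
    while abs(m) >= 10 ** P:        # mantissa overflow: shift right, E_z += 1
        m, e = chop(m, 1), e + 1
    while abs(m) < 10 ** (P - 1):   # leading zeros: shift left, E_z -= 1
        m, e = m * 10, e - 1
    return m, e


def fp_add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """z = x + y for (mantissa, exponent) pairs, following the slide's steps."""
    (mx, ex), (my, ey) = x, y
    if ex < ey:                     # order operands so that E_x >= E_y
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    my = chop(my, ex - ey)          # shift f_y right by E_x - E_y places
    return normalise(mx + my, ex)   # E_z = E_x, f_z = f_x + f_y, normalise


print(fp_add((964572, 2), (586351, 5)))  # (587315, 5), i.e. 0.587315 E5
print(fp_add((735816, 4), (635742, 4)))  # (137155, 5), i.e. 0.137155 E5
```

In the first call the shifted operand becomes 0.000964 E5; the discarded digits 572 are the chopping error.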
Computer Arithmetic
– Floating Point Arithmetic: Subtraction
– Performed as addition of operands with different sign bits
– Subtracting numbers of similar magnitude may leave leading zeros in the mantissa ($|f_z| < 0.1$)
– Left shift $f_z$ and decrease $E_z$ accordingly
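Subtraction can reuse the addition steps by flipping the sign of the second operand. This sketch assumes the same hypothetical (m, E) model as before (six-digit mantissa, chopping); the left-shift loop in `normalise` is where the leading zeros are removed.

```python
P = 6  # assumed mantissa length in decimal digits


def chop(m: int, k: int) -> int:
    """Shift mantissa m right by k digits, truncating toward zero."""
    sign = -1 if m < 0 else 1
    return sign * (abs(m) // 10 ** k)


def normalise(m: int, e: int) -> tuple[int, int]:
    """Bring m back to exactly P digits, adjusting the exponent e."""
    if m == 0:
        return 0, 0
    while abs(m) >= 10 ** P:        # overflow: shift right, E_z += 1
        m, e = chop(m, 1), e + 1
    while abs(m) < 10 ** (P - 1):   # leading zeros: shift left, E_z -= 1
        m, e = m * 10, e - 1
    return m, e


def fp_sub(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """x - y, computed as x + (-y): flip the sign of y, then add."""
    (mx, ex), (my, ey) = x, (-y[0], y[1])
    if ex < ey:                     # order operands so that E_x >= E_y
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    return normalise(mx + chop(my, ex - ey), ex)


# 0.999658 E-3 - 0.994576 E-3 = 0.005082 E-3 -> two left shifts
print(fp_sub((999658, -3), (994576, -3)))  # (508200, -5), i.e. 0.508200 E-5
```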
Computer Arithmetic
– Floating Point Arithmetic: Multiplication
– Multiply the mantissas, $f_z = f_x \times f_y$
– Add the exponents, $E_z = E_x + E_y$
– Normalise if necessary (since $0.01 \le |f_z| < 1$, at most one left shift is needed)
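A sketch of multiplication in the same hypothetical (m, E) model (six-digit mantissa, chopping). The integer product `mx * my` carries twice as many digits, so `normalise` chops the low-order digits back to six; the slide's two-digit practice operands 0.20 and 0.40 are written here as 0.200000 and 0.400000.

```python
P = 6  # assumed mantissa length in decimal digits


def chop(m: int, k: int) -> int:
    """Shift mantissa m right by k digits, truncating toward zero."""
    sign = -1 if m < 0 else 1
    return sign * (abs(m) // 10 ** k)


def normalise(m: int, e: int) -> tuple[int, int]:
    """Bring m back to exactly P digits, adjusting the exponent e."""
    if m == 0:
        return 0, 0
    while abs(m) >= 10 ** P:        # extra digits: shift right, exponent += 1
        m, e = chop(m, 1), e + 1
    while abs(m) < 10 ** (P - 1):   # leading zeros: shift left, exponent -= 1
        m, e = m * 10, e - 1
    return m, e


def fp_mul(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Multiply mantissas and add exponents, then normalise the product."""
    (mx, ex), (my, ey) = x, y
    # mx * my has 2P digits: its value is mx*my * 10**(ex + ey - 2P),
    # which is exponent ex + ey - P in the (m, e) convention used here.
    return normalise(mx * my, ex + ey - P)


# 0.20 E4 x 0.40 E-2 = 0.08 E2 -> normalised to 0.80 E1
print(fp_mul((200000, 4), (400000, -2)))  # (800000, 1)
```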
Computer Arithmetic
– Floating Point Arithmetic: Division
– Divide the mantissas, $f_z = f_x \div f_y$
– Subtract the exponents, $E_z = E_x - E_y$
– Normalise if necessary (since $0.1 < |f_z| < 10$, at most one right shift is needed)
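A sketch of division in the same hypothetical (m, E) model (six-digit mantissa, chopping). Scaling the dividend by $10^6$ before integer division keeps one extra quotient digit, which `normalise` then chops off if the quotient of the mantissas is 1 or more.

```python
P = 6  # assumed mantissa length in decimal digits


def chop(m: int, k: int) -> int:
    """Shift mantissa m right by k digits, truncating toward zero."""
    sign = -1 if m < 0 else 1
    return sign * (abs(m) // 10 ** k)


def normalise(m: int, e: int) -> tuple[int, int]:
    """Bring m back to exactly P digits, adjusting the exponent e."""
    if m == 0:
        return 0, 0
    while abs(m) >= 10 ** P:        # quotient >= 1: shift right, E_z += 1
        m, e = chop(m, 1), e + 1
    while abs(m) < 10 ** (P - 1):   # leading zeros: shift left, E_z -= 1
        m, e = m * 10, e - 1
    return m, e


def fp_div(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Divide mantissas and subtract exponents, then normalise the quotient."""
    (mx, ex), (my, ey) = x, y
    if my == 0:
        raise ZeroDivisionError("division by a zero mantissa")
    sign = -1 if (mx < 0) != (my < 0) else 1
    q = abs(mx) * 10 ** P // abs(my)     # f_x / f_y scaled by 10**P, truncated
    return normalise(sign * q, ex - ey)  # E_z = E_x - E_y


# 0.876543 / 0.200000 E-3: f = 4.382715, E = 3 -> 0.438271 E4
print(fp_div((876543, 0), (200000, -3)))  # (438271, 4)
```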
Computer Arithmetic
– Floating Point Arithmetic: Practice
– Add 0.964572 E2 and 0.586351 E5
– Add 0.735816 E4 and 0.635742 E4
– Subtract 0.994576 E-3 from 0.999658 E-3
– Multiply 0.20 E4 and 0.40 E-2
– Divide 0.876543 by 0.200000 E-3
Computer Arithmetic
– Floating Point Arithmetic: Practice Answers (results chopped to the mantissa length of the operands)
– Add 0.964572 E2 and 0.586351 E5 → 0.587315 E5
– Add 0.735816 E4 and 0.635742 E4 → 0.137155 E5 (exact sum 1.371558 E4, chopped after normalising)
– Subtract 0.994576 E-3 from 0.999658 E-3 → 0.508200 E-5
– Multiply 0.20 E4 and 0.40 E-2 → 0.80 E1
– Divide 0.876543 by 0.200000 E-3 → 0.438271 E4
Errors in Arithmetic
Errors that are seen in arithmetic operations
Errors in Arithmetic
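– Floating Point Arithmetic
– Inexact binary representation of decimal numbers: 0.1 becomes the non-terminating binary fraction 0.0001100110011…
– Errors due to rounding off a number: adding 0.500000 E1 and 0.100000 E-7 with a six-digit mantissa gives 0.500000 E1, so the smaller number is lost entirely
– Subtractive cancellation: subtracting 0.499998 from 0.500000 gives 0.200000 E-5, which has only one significant digit
– Overflow or underflow of numbers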
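The binary expansion of 0.1 can be generated exactly with Python's `fractions.Fraction` by repeated doubling: the integer part after each doubling is the next binary digit. The helper name `binary_digits` is hypothetical. The last line uses Python's built-in float, which is an IEEE 754 binary double rather than the decimal model used in this lecture, to show the same representation error surfacing in practice.

```python
from fractions import Fraction


def binary_digits(x: Fraction, n: int) -> str:
    """First n binary digits of a fraction 0 <= x < 1, by repeated doubling."""
    bits = []
    for _ in range(n):
        x *= 2
        bit = 1 if x >= 1 else 0   # the integer part is the next binary digit
        bits.append(str(bit))
        x -= bit
    return "0." + "".join(bits)


print(binary_digits(Fraction(1, 10), 16))  # 0.0001100110011001 (0011 repeats forever)
print(0.1 + 0.2 == 0.3)                    # False with binary double precision floats
```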
Errors in Arithmetic
– Laws of Arithmetic
– Arithmetic errors can cause the associative and distributive laws to fail
– $(x + y) + z \ne x + (y + z)$ and $(x \times y) \times z \ne x \times (y \times z)$
– $x \times (y + z) \ne x \times y + x \times z$
– Provide an example where the associative law of addition fails due to a floating point arithmetic error
– $x = 0.456732 \times 10^{-2}$, $y = 0.243451$, $z = -0.248000$
– $(x + y) + z \ne x + (y + z)$
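The example can be checked with the same hypothetical (m, E) decimal model used for the addition sketch, assuming a six-digit mantissa and chopping. Adding x to y first forces x to be shifted right two places, which chops its last two digits; adding y and z first avoids that loss.

```python
P = 6  # assumed mantissa length in decimal digits


def chop(m: int, k: int) -> int:
    """Shift mantissa m right by k digits, truncating toward zero."""
    sign = -1 if m < 0 else 1
    return sign * (abs(m) // 10 ** k)


def normalise(m: int, e: int) -> tuple[int, int]:
    """Bring m back to exactly P digits, adjusting the exponent e."""
    if m == 0:
        return 0, 0
    while abs(m) >= 10 ** P:
        m, e = chop(m, 1), e + 1
    while abs(m) < 10 ** (P - 1):
        m, e = m * 10, e - 1
    return m, e


def fp_add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Align exponents, chop the smaller operand, add, then normalise."""
    (mx, ex), (my, ey) = x, y
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    return normalise(mx + chop(my, ex - ey), ex)


x = (456732, -2)   # 0.456732 E-2
y = (243451, 0)    # 0.243451
z = (-248000, 0)   # -0.248000

left = fp_add(fp_add(x, y), z)    # (x + y) + z
right = fp_add(x, fp_add(y, z))   # x + (y + z)
print(left)   # (180000, -4), i.e. 0.180000 E-4
print(right)  # (183200, -4), i.e. 0.183200 E-4
```

The exact sum is 0.1832 E-4, so in this model the grouping x + (y + z) happens to give the exact result while (x + y) + z keeps only two correct digits.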
Thank You