College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
Neumerical Methods.pptx
1. NUMERICAL
METHODS
NAME – PAYEL DALAL
ROLL NO –
14400120030
STREAM – CSE
SUBJECT –
NUMERICAL METHODS
SUBJECT CODE –
OEC-IT601A
2. Presentation title 2
TOPIC:-
APPROXIMATION IN NUMERICAL
COMPUTATION
Truncation and rounding errors
Fixed and floating point arithmetic
Propagation of errors
4. 4
Error Definitions:
True error (Et ): the difference between the true value and
the approximation.
Absolute error (|Et |): the absolute difference between the
true value and the approximation.
True fractional relative error: the true error divided by
the true value.
Relative error (et): the true fractional relative error
expressed as a percentage.
• True fractional relative error = (true value – approximation)/true value
• Et = True value – approximation
• et= true fractional relative error * 100%
5. 5
We don’t know the true value, so we can’t calculate
the true error - approximate the error
Relative approximate error – an approximation of
the error relative to the approximation itself
o Relative Approximate Error:
ea =
Approximate error
Approximation
*100%
o Iterative Approach:
ea =
Current Approximation – Previous Approximation
Current Approximation
6. 6
For iterative approximations, continue to iterate
until the relative approximate error magnitude is
less than a specified stopping criterion
For accuracy to at least n significant figures set the
stopping criterion to
o Stopping Criterion:
|ea | = es
es = (0.5*10(2-n))%
7. 7
ROUND-OFF ERROR:
Roundoff errors arise because digital computers cannot represent some quantities
exactly.
There are two major facets of roundoff errors involved in numerical calculations:
• Digital computers have size and precision limits on their
ability to represent numbers.
• Certain numerical manipulations are highly sensitive to
roundoff errors.
Errors
Truncation Errors Round-off Errors
8. 8
Base-10 computer with a 5 bit word
S1d1.d2*10S0d0
2 -5= 0.03125 3.1 x 10 -2
roundoff error =
(0.03125-0.031)/0.03125 = 0.008 = 0.8%
Because of the limited number of bits for significand and exponent,
Roundoff errors is occur.
π= 3.141593for 16-bit word computer
π= 3.14159265358979 for 32-bit word computer
Although adding significand digits can improve the approximation, such
quantities will always have some roundoff error when stored in a computer
9. 9
Truncation errors are those that result from using an approximation in place of an
exact mathematical procedure.
Example 1: approximation to a derivative using a finite-difference equation:
~
Example 2: The Taylor Series
TRUNCATION ERROR:
=
dv v v(ti+1) – v(ti)
dt t ti+1 - ti
=
f(x)= f(x0)+ f' (x0)+ f'' (x0) + ….. + f(n) (x0) + Rn
x – x0
1!
(x – x0)2 (x – x0)n
n!
2!
11. 11
In general, the nth order Taylor series expansion will be exact for an nth order polynomial.
In other cases, the remainder term Rn is of the order of h n+1, meaning:
o Truncation Error:
• The more terms are used, the smaller the error, and
• The smaller the spacing, the smaller the error for a given number of terms.
The total numerical error is the summation of the
truncation and roundoff errors.
The truncation error generally increases as the
step size increases, while the roundoff error
decreases as the step size increases - this leads to a
point of diminishing returns for step size.
o Total Numerical Error:
12. 12
Figure: Parts of fixed-point representation
Sign bit:- The fixed-point number representation in binary uses a sign bit. The
negative number has a sign bit 1, while a positive number has a bit 0.
Integral Part:- The integral part in fixed-point numbers is of different lengths at
different places. It depends on the register's size; for an 8-bit register, the
integral part is 4 bits.
Fractional part:- The Fractional part is of different lengths at different places. It
depends on the registers; for an 8-bit register, the fractional part is 3 bits.
There are three parts of the fixed-point number representation: Sign bit, Integral
part, and Fractional part.
Fixed Point Representation:
13. 13
Register Sign Bit Integer Part Fraction Part
8-bit register 1 bit 4 bits 3 bits
16-bit register 1 bit 9 bits 6 bits
32-bit register 1 bit 15 bits 9 bits
Size of Sign Bit, Integer Part, and Fractional Part for different registers are displayed
below:
Now that we have learned about fixed-point number representation, let's see how to represent
it.
The number considered is 4.5
Step 1: We will convert the number 4.5 to binary form. 4.5 = 100.1
Step 2: Represent the binary number in fixed-point notation with the following format.
Figure: Fixed Point Notation of 4.5
o How to write numbers in Fixed-point notation:
14. 14
Floating Point Representation:
A floating-point representation has three parts: Sign bit, Exponent Part, and Mantissa.
Figure: Parts of floating-point representation
Sign bit:- The floating-point numbers in binary uses a sign bit. A negative
number has a sign bit 1, while a positive number has a sign bit 0. The sign of
any number depends on mantissa, not on exponent.
Mantissa Part:- The mantissa part is of different lengths at different places. It
depends on registers like for a 16-bit register, and mantissa part is of 8 bits.
Exponent Part:- It is the power of the number. It depends on the size of the
register. For example, in the 16-bit register, the exponent part is of 7 bits.
15. 15
The number considered is 53.5
Step 1: We will convert the number 53.5 to binary form. 53.5 =
110101.1
Step 2: Normalize the number ( base is 2) = (1.101011) * 25.
Step 2: Represent the binary number in floating-point notation with the
following format.
Figure: Floating Point Notation of 53.5
o How to write numbers in Floating-point notation:
16. 16
Error Propagation:
Let x refer to the floating point representation of the real number x.
Since computer has fixed word length, there is a difference between x and x (round-off
error)
and we would like to estimate the error in the calculation of f(x):
f(xfl) = |f(x) - f(xfl)|
f(x) - f(xfl) = f'(xfl)(x - xfl)
f(xfl) = f'(xfl) * x
Result: If f'(x) and x are known, then we can estimate the error using this formula
~
• Both x and f(x) are unknown.
• If x, is close to x, then we can use first order Taylor expansion and compute:
~