Significant figures are the digits that carry meaning about the precision of a measurement.
Consider three measurements of the length of a table:
L1: 3.2 m
L2: 3.27 m
L3: 3.270 m
The number of significant figures is two for L1, three for L2, and four for L3.
The first digit is the most significant figure, and the last digit is the least
significant figure in the measurement.
We can assign an error to each measurement:
L1: 3.2 ± 0.2 m
L2: 3.27 ± 0.01 m
L3: 3.270 ± 0.003 m
Any digit beyond the error-carrying digits is meaningless.
Leading zeros are not significant; they are only used to show the location of
the decimal point. E.g., 0.00052 has only two significant digits. To avoid
confusion, scientists prefer scientific notation (e.g., 5.2 x 10^-4).
Accuracy & Precision
Accuracy refers to how closely a computed or measured value agrees with the
true value. Inaccuracy (also called bias) is a systematic deviation from the
true value.
Precision refers to how closely individual computed or measured values agree
with each other. Imprecision (also called uncertainty) refers to the magnitude
of the scatter.
Accuracy and precision are independent of each other.
In numerical methods, both accuracy and precision are required for
a particular problem. We will use the collective term error to
represent both inaccuracy and imprecision in our predictions.
Numerical errors arise from the use of approximations to represent exact
mathematical operations or quantities. Consider the approximation we made in
the falling-object-in-air problem: we observed some error between the exact
(true) and numerical solutions. The relationship between them:

True value = approximation + error
Et = true value - approximation

Note that in this equation we included all factors contributing to the error,
so we used the subscript t to designate that this is the true error.
To take into account different magnitudes in different measurements, we prefer
to normalize the error. We then define the fractional relative error:

Fractional relative error = (true value - approximation) / (true value)

or the percent relative error:

εt = (true value - approximation) / (true value) x 100%

Most of the time, we just say "error" to mean the percent relative error.
So, we define the true error as:

εt = (true value - approximation) / (true value) x 100%

In most cases we do not know the true value, so we define the approximate
error as:

εa = (approximate error) / (approximation) x 100%

The approximate error can be defined in different ways depending on the
problem. For example, in iterative methods, the error is defined with respect
to the previous calculation:

εa = (current approx. - previous approx.) / (current approx.) x 100%
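As a sketch of the iterative error definition above, the small helper below (my own illustration, not from the lecture) sums the Maclaurin series of e^x and stops once the percent approximate error εa falls below a tolerance:

```python
import math

def exp_series(x, tol_percent=0.05):
    """Sum the Maclaurin series of e**x until the percent approximate
    error (change relative to the current estimate) drops below tol_percent."""
    total, term, n = 1.0, 1.0, 0
    eps_a = 100.0
    while eps_a > tol_percent:
        n += 1
        term *= x / n                               # next term x**n / n!
        prev, total = total, total + term
        eps_a = abs((total - prev) / total) * 100   # approximate error, %

    return total, eps_a

approx, eps_a = exp_series(0.5)
# the true error can only be computed because math.exp supplies a "true value"
eps_t = abs((math.exp(0.5) - approx) / math.exp(0.5)) * 100
```

Note that the true error εt turns out smaller than the approximate error εa here, which is the usual (conservative) behavior of this stopping criterion.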
Round-off errors result from the omission of significant figures.
Base-10 (decimal) versus base-2 (binary) system:

In base 10, the digit string "a b c d" represents
(abcd)_10 = a x 10^3 + b x 10^2 + c x 10^1 + d x 10^0

In base 2, the digit string "a b c d" represents
(abcd)_2 = a x 2^3 + b x 2^2 + c x 2^1 + d x 2^0
Computers know only two states (on/off), so computers can only store numbers
in the binary (base-2) system. Each binary digit is called a bit;
1 byte = 8 bits.
Suppose the computer uses 6 bits to store a number: the first bit stores the
sign (0 for "+" and 1 for "-"), and the remaining bits store the magnitude.
In integer representation, numbers can be stored exactly, but only a limited
range of numbers fits in a limited memory. Also, fractional quantities cannot
be represented.
Ex: How is -3 stored in a computer in integer representation?
Ex: Find the range of numbers that you can store in a 16-bit computer in
integer representation. (-32767 to 32767)
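A quick check of that range. The slide's layout is sign-magnitude (1 sign bit + 15 magnitude bits); the two's-complement comparison is background knowledge, not from the slide:

```python
bits = 16

# Sign-magnitude layout described above: 1 sign bit + 15 magnitude bits,
# so the largest magnitude is 0111...1 = 2**15 - 1 = 32767
largest = 2 ** (bits - 1) - 1
range_sign_magnitude = (-largest, largest)

# Real hardware uses two's complement, which gains one extra negative value
range_twos_complement = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
```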
Floating Point Representation:
FPR allows a much wider range of numbers than integer representation, and it
allows storing fractional quantities. It is similar to scientific notation:

0.015678 = 1.5678 x 10^-2
Ex: Assume you have a hypothetical base-10 computer with a 5-digit word size
(one digit for the sign, two for the exponent with its sign, two for the
mantissa). a) Find the range of values that can be represented. b) Calculate
the error of representing 2^-5 using this representation.
IEEE floating point representation standards:
32-bit (single precision) word format: 1 sign bit, 8 exponent bits, 23 mantissa bits
64-bit (double precision) word format: 1 sign bit, 11 exponent bits, 52 mantissa bits
The mantissa holds only a limited number of significant digits. Increasing the
number of digits (64-bit versus 32-bit) decreases the round-off error.
In FPR there is still a limit on the representable numbers, but the range is
much bigger.
In 64-bit representation in IEEE format:
Max value = +1.111...1 x 2^+1023 ≈ 1.7977 x 10^+308
Min (normalized) value = 1.000...0 x 2^-1022 ≈ 2.2251 x 10^-308
Numbers larger than the max value cannot be represented by the computer
(overflow error); any value bigger than this is set to infinity.
Numbers smaller than the min value cannot be represented: there is a "hole"
at zero (underflow error); any value smaller than this is set to zero.
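These limits are easy to probe from Python, whose float is an IEEE 64-bit double (a minimal sketch):

```python
import sys

big = sys.float_info.max      # largest double, ~1.7977e+308
tiny = sys.float_info.min     # smallest normalized double, ~2.2251e-308

overflow = big * 10           # exceeds the max value -> set to infinity
underflow = tiny * 1e-20      # far below the min value -> flushed to zero
```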
The 52 bits used for the mantissa in the 64-bit representation (double
precision) correspond to about 15-16 base-10 significant digits; the 23
mantissa bits in the 32-bit representation (single precision) correspond to
about 7.
Ex: Find the smallest possible positive floating point number for a
hypothetical base-2 machine that stores information using 7-bit words (first
bit for the sign of the number, next three for the sign and magnitude of the
exponent, and the last three for the magnitude of the mantissa). (1 x 2^-3)
Chopping versus Rounding:
Assume a computer that can store 7 significant digits. Rounding is a better
choice, since the sign of the error can be either positive or negative,
leading to a smaller total numerical error, whereas the error in chopping
always has the same sign and adds up. However, rounding costs the computer
extra processing, so many computers simply chop off the extra digits.
The error associated with rounding/chopping is called quantization error.
As a result of the quantization of numbers, there is a finite interval
between two consecutive numbers in floating point representation.
Machine epsilon (or machine precision) is the upper bound on the relative
error due to chopping/rounding in floating point representation:

ε = b^(1-t)

where b = the base and t = the number of digits in the mantissa. For a 64-bit
representation, b = 2 and t = 53 (52 stored bits plus the implied leading
bit), so the machine epsilon is ε = 2^-52 = 2.22044... x 10^-16.
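The same value can be recovered experimentally with a standard textbook loop (not specific to this lecture): halve a candidate epsilon until adding it to 1 no longer changes the stored result.

```python
import sys

eps = 1.0
while 1.0 + eps / 2 > 1.0:    # stop when 1 + eps/2 rounds back to exactly 1
    eps /= 2

# eps is now 2**-52 = 2.220446...e-16, matching the b=2, t=53 formula
library_eps = sys.float_info.epsilon
```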
Besides the limitations of the computer in storing numbers, arithmetic
operations on these numbers also contribute to the round-off error.
Consider a hypothetical base-10 computer with a 4-digit mantissa and a 1-digit
exponent:

1.345 + 0.03406 = 0.1345 x 10^1 + 0.003406 x 10^1 = 0.137906 x 10^1 ≈ 0.1379 x 10^1

In arithmetic operations, the numbers are first converted to forms with the
same exponent; the result is then chopped back to the available mantissa digits.
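Python's decimal module can mimic such a limited-mantissa machine; here is the addition above replayed with a 4-digit mantissa and chopping (ROUND_DOWN):

```python
from decimal import Decimal, Context, ROUND_DOWN

ctx = Context(prec=4, rounding=ROUND_DOWN)   # 4 significant digits, chopping

a = Decimal("1.345")
b = Decimal("0.03406")
stored = ctx.add(a, b)     # exact sum 1.37906 is chopped to 1.379
```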
Ex: a) Evaluate the polynomial
y = x^3 - 5x^2 + 6x + 0.55
at x = 1.73. Use 3-digit arithmetic with chopping. Evaluate the error.
b) If the function is expressed in the nested form
y = ((x - 5)x + 6)x + 0.55
what is the percent relative error? Compare with part a.
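The exercise can be replayed with the same decimal-context trick. The polynomial is taken as y = x^3 - 5x^2 + 6x + 0.55 and its nested form as ((x - 5)x + 6)x + 0.55, which is my reading of the original formulas, so treat the signs as an assumption:

```python
from decimal import Decimal, Context, ROUND_DOWN

ctx = Context(prec=3, rounding=ROUND_DOWN)   # 3-digit arithmetic with chopping
x = Decimal("1.73")

# a) direct form y = x^3 - 5x^2 + 6x + 0.55, chopping every intermediate result
x2 = ctx.multiply(x, x)
x3 = ctx.multiply(x2, x)
y_direct = ctx.add(
    ctx.add(ctx.subtract(x3, ctx.multiply(Decimal(5), x2)),
            ctx.multiply(Decimal(6), x)),
    Decimal("0.55"))

# b) nested (Horner) form y = ((x - 5)x + 6)x + 0.55
t = ctx.multiply(ctx.subtract(x, Decimal(5)), x)
t = ctx.multiply(ctx.add(t, Decimal(6)), x)
y_nested = ctx.add(t, Decimal("0.55"))

y_true = 1.73 ** 3 - 5 * 1.73 ** 2 + 6 * 1.73 + 0.55
err_direct = abs((y_true - float(y_direct)) / y_true) * 100
err_nested = abs((y_true - float(y_nested)) / y_true) * 100
```

The nested form uses fewer multiplications and loses less precision, which is the point of part b.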
Subtractive cancellation occurs when subtracting two nearly equal numbers:
0.7549 x 10^3 - 0.7548 x 10^3 = 0.0001 x 10^3
which causes a loss of significance.
Many problems in numerical analysis are prone to subtractive cancellation
error. They can be mitigated by manipulations in the formulation of the
problem or by increasing the precision.
Consider finding the roots of a 2nd-order polynomial:

x = (-b ± √(b^2 - 4ac)) / (2a)

When b^2 >> 4ac, the term -b + √(b^2 - 4ac) suffers from subtractive
cancellation. To mitigate it:
- Can use double precision, or
- Can use an alternative formulation:

x = -2c / (b ± √(b^2 - 4ac))
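A sketch of both formulas in floating point; b^2 >> 4ac is the bad case for the naive form, and the test values are my own choice:

```python
import math

def roots_naive(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))

def roots_stable(a, b, c):
    """Compute the cancellation-free root first, then use x1 * x2 = c / a."""
    d = math.sqrt(b * b - 4 * a * c)
    q = -(b + math.copysign(d, b)) / 2
    return (q / a, c / q)

a, b, c = 1.0, 1e8, 1.0                  # roots are ~ -1e-8 and ~ -1e8
small_naive = roots_naive(a, b, c)[0]    # -b + d cancels badly here
small_stable = roots_stable(a, b, c)[1]  # accurate, ~ -1e-8
```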
Truncation errors result from using an approximation in place of an exact
mathematical representation. Remember the approximation in the
falling-object-in-air problem:

dv/dt ≈ Δv/Δt = (v(t_{i+1}) - v(t_i)) / (t_{i+1} - t_i)

Taylor's theorem gives us insight for estimating the truncation error in the
numerical approximation.
Taylor's theorem states that if the function f and its n+1 derivatives are
continuous on an interval containing a and x, then the value of the function
at x is given by

f(x) = f(a) + f'(a)(x - a) + (f''(a)/2!)(x - a)^2 + ... + (f^(n)(a)/n!)(x - a)^n + Rn
In other words, any smooth function can be approximated as a polynomial of
order n within a given interval. The error gets smaller as n increases.
Ex: Use a second-order Taylor series expansion to approximate the function
f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2
at x = 1 from a = 0. Calculate the truncation error of this approximation.
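Worked out numerically. The polynomial is taken as f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2, which is my reading of the original, so treat the signs as an assumption:

```python
# assumed polynomial: f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2
def f(x):
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

a, x = 0.0, 1.0
f0 = f(a)        # f(0)   = 1.2
fp = -0.25       # f'(0)  from f'(x)  = -0.4x^3 - 0.45x^2 - x - 0.25
fpp = -1.0       # f''(0) from f''(x) = -1.2x^2 - 0.9x - 1

second_order = f0 + fp * (x - a) + fpp / 2 * (x - a) ** 2   # 0.45
trunc_error = f(x) - second_order                           # exact minus approx
```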
Suppose you have f(x_i) and want to evaluate f(x_{i+1}):

f(x_{i+1}) = f(x_i) + f'(x_i)h + (f''(x_i)/2!)h^2 + ... + (f^(n)(x_i)/n!)h^n + Rn,   where h = x_{i+1} - x_i

Rn = (f^(n+1)(ξ)/(n+1)!) h^(n+1),   ξ ∈ [x_i, x_{i+1}]

Here Rn represents the remainder (i.e., the error) of the n-th order
approximation of the function. It provides an exact determination of the error.
We can estimate the order of magnitude of the error in terms of the step
size (h):

Rn = O(h^(n+1))

so we can change h to control the magnitude of the error in the calculation!
Falling object in air problem:
We can evaluate the truncation error for the "falling object in air" problem.
Express v(t_{i+1}) as a Taylor series:

v(t_{i+1}) = v(t_i) + v'(t_i)h + (v''(t_i)/2!)h^2 + ... + (v^(n)(t_i)/n!)h^n + ...,   where h = t_{i+1} - t_i

Truncating the Taylor series at n = 1:

v(t_{i+1}) = v(t_i) + v'(t_i)h + R1,   R1 = O(h^2)

Solving for the derivative:

v'(t_i) = (v(t_{i+1}) - v(t_i))/h - R1/h

The first term is the finite difference approximation; the remaining term is
R1/h = O(h^2)/h = O(h). Then the error associated with the finite difference
approximation is of order h.
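This O(h) behavior is easy to verify: halving h should roughly halve the error of the forward difference. The sketch below uses sin as a stand-in for v (my choice, since its derivative is known exactly):

```python
import math

t = 1.0
exact = math.cos(t)                 # the true derivative of sin at t

errors = []
for h in (0.1, 0.05, 0.025):
    forward_diff = (math.sin(t + h) - math.sin(t)) / h
    errors.append(abs(forward_diff - exact))

# successive ratios should be close to 2 for a first-order, O(h) method
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
```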
Error propagation concerns how an error in x propagates to the function value
f(x). Let x_o be the approximate value of x:

Δx_o = |x - x_o|
Δf(x_o) = |f(x) - f(x_o)|

Taylor expansion can be used to estimate the error propagation. Let's evaluate
f(x) near x_o:

f(x) = f(x_o) + f'(x_o)(x - x_o) + ...

Dropping the 2nd- and higher-order terms:

f(x) - f(x_o) ≈ f'(x_o)(x - x_o)

so that

Δf(x_o) = |f'(x_o)| Δx_o
Ex 2.3: Given a measured value of x_o = 2.5 ± 0.01, estimate the resulting
error in the function f(x) = x^3.
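Worked out in a couple of lines using the first-order propagation rule above:

```python
x0, dx = 2.5, 0.01

f = lambda x: x ** 3
df_dx = lambda x: 3 * x ** 2

propagated = abs(df_dx(x0)) * dx   # |f'(x0)| * dx = 3 * 6.25 * 0.01 = 0.1875
value = f(x0)                      # 15.625, i.e. f = 15.625 +/- 0.1875
```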
Functions of more than one variable:
Error propagation for functions of more than one variable can be understood as
the generalization of the single-variable case, with one first-order term per
variable:

Δf(x_o, y_o, z_o, ...) = |∂f/∂x| Δx_o + |∂f/∂y| Δy_o + |∂f/∂z| Δz_o + ...

where the partial derivatives are evaluated at (x_o, y_o, z_o, ...).
Ex 2.3: The open channel flow (Manning) formula for a rectangular channel is
given by:

Q = (1/n) x (bh)^(5/3) / (b + 2h)^(2/3) x √s

(Q = flow rate, n = roughness coeff., b = width, h = depth, s = slope)
Assume that b = 20 m and h = 0.3 m for the channel. If you know that
n = 0.030 ± 0.002 and s = 0.05 ± 0.01, what is the resulting error in the
calculation of Q?
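A sketch of the first-order propagation for this exercise, using the analytic partials ∂Q/∂n = -Q/n and ∂Q/∂s = Q/(2s) (which follow from the formula since Q ∝ 1/n and Q ∝ √s):

```python
import math

def flow(n, s, b=20.0, h=0.3):
    """Manning formula for a rectangular channel (SI units)."""
    return (1 / n) * (b * h) ** (5 / 3) / (b + 2 * h) ** (2 / 3) * math.sqrt(s)

n0, dn = 0.030, 0.002
s0, ds = 0.05, 0.01

q0 = flow(n0, s0)
# first-order propagation: dQ = |dQ/dn| dn + |dQ/ds| ds
dq = abs(-q0 / n0) * dn + abs(q0 / (2 * s0)) * ds
# Q is roughly 19.7 +/- 3.3 m^3/s
```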
Condition and stability:
The condition of a mathematical computation is its sensitivity to the input
values. It is a measure of how much an uncertainty in the input is magnified
by the computation:

Condition number = (relative error in the output) / (relative error in the input)

Using a first-order Taylor expansion, f(x) ≈ f(x_o) + f'(x_o)(x - x_o), the
relative error in the output is

(f(x) - f(x_o)) / f(x_o) ≈ f'(x_o)(x - x_o) / f(x_o)

while the relative error in the input is (x - x_o) / x_o. Dividing the two
gives

Condition number = |x_o f'(x_o) / f(x_o)|

If the uncertainty in the input results in gross changes in the output (a
large condition number), we say that the problem is unstable or
ill-conditioned.
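A sketch of the condition number in code; the example functions are my choice, not from the lecture, with tan(x) near π/2 as a classic ill-conditioned case:

```python
import math

def condition_number(f, fprime, x0):
    """|x0 * f'(x0) / f(x0)|: how much relative input error is amplified."""
    return abs(x0 * fprime(x0) / f(x0))

# sin near x = 0.1 barely amplifies input error (condition number ~ 1) ...
well = condition_number(math.sin, math.cos, 0.1)

# ... while tan near pi/2 amplifies it strongly (ill-conditioned)
ill = condition_number(math.tan, lambda x: 1 / math.cos(x) ** 2, 1.7)
```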
Total numerical error:
Round-off errors can be minimized by increasing the number of significant
digits; however, subtractive cancellations and a growing number of
computations increase the round-off error.
Truncation errors can be reduced by decreasing the step size (h), but this may
result in subtractive cancellation error too.
So there is a trade-off between truncation error and round-off error in terms
of the step size (h).
Note that there is no systematic and general approach to evaluating numerical
errors for all problems.
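The trade-off can be seen directly for a forward difference: the truncation error shrinks like h while the round-off error grows like ε/h, so the total error has a minimum at an intermediate step size (sketch, again with sin as the test function):

```python
import math

t, exact = 1.0, math.cos(1.0)

errors = {}
for k in range(1, 16):
    h = 10.0 ** -k
    errors[k] = abs((math.sin(t + h) - math.sin(t)) / h - exact)

# the best h is neither the largest nor the smallest: the total error falls
# at first (less truncation error) and then rises again as round-off takes over
best_k = min(errors, key=errors.get)
```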
Formulation Errors & Data Uncertainty
These errors are totally independent of numerical errors, and are not directly
connected to most numerical methods. In other words: blunders. They can only
be mitigated by experience, or by consulting experienced people.
Formulation (model) errors:
Formulation (or model) errors are caused by an incomplete formulation of the
mathematical model (e.g., in the "falling object in air" problem, not taking
the effect of air friction into account).
Data uncertainty:
If your data contain large inaccuracies or imprecisions (perhaps due to
problems with the measurement device), this will directly affect the quality
of the results. Statistical analyses of the data help to minimize these errors.