This chapter belongs to the same series as the previously published Chapters 1 and 3 and is intended for undergraduate students in a physics department. It can also be used in mathematics and statistics courses and in experimental courses that involve data fitting.
1. Introduction to Computational Physics
Chapter 2
METHODS OF DATA FITTING
By Jifar R.
JU CNS Phys. Dept.
2. Data Fitting
• Data fitting is an art worthy of serious study. In this unit we
just scratch the surface.
• Data fitting is the process of fitting models to data and
analyzing the accuracy of the fit. Engineers and scientists
use data fitting techniques, including mathematical equations
and nonparametric methods, to model acquired data.
• We examine how to interpolate within a table of numbers
and how to do a least-squares fit for linear functions.
• If a least-squares fit is needed for nonlinear functions, then
search routines, either your own or those obtained from
scientific subroutine libraries, may be used.
3. Numerical interpolation and extrapolation
• Interpolation and extrapolation are perhaps among the most used tools in numerical applications to physics.
• We are given a function f at a set of points x1, …, xn for which an analytic form is missing.
• f may represent data points from an experiment, or the result of a lengthy large-scale computation of a physical quantity that cannot be cast into a simple analytical form.
• Interpolation: evaluating f at a point x that lies within the data set x1, …, xn but differs from the tabulated values.
• Extrapolation: evaluating f at a point x that lies outside the data set.
• Two methods of interpolation and extrapolation are considered: polynomial interpolation and extrapolation, and cubic spline interpolation.
4. • Polynomial interpolation is the interpolation of a given
data set by the polynomial of lowest possible degree that
passes through the points of the data set.
• Given a set of n + 1 data points $(x_0, y_0), \ldots, (x_n, y_n)$
with no two $x_j$ the same, a polynomial function $p(x)$ is
said to interpolate the data if $p(x_j) = y_j$ for each
$j \in \{0, 1, \ldots, n\}$.
• Two common explicit formulas for this polynomial are the
Lagrange polynomials and Newton polynomials.
• Interpolation is needed when we want to infer some local
information from a set of incomplete or discrete data.
• Overall approximation or fitting is needed when we want
to know the general or global behavior of the data.
5. Interpolation
• A measurement yields a discrete set of points that represents the experiment.
• Interpreting measured data or theoretical calculations is an important part of the work.
• Assume the experiment is represented by pairs of values x and y:
• x is the independent variable, which is varied,
• y is the measured value at point x, i.e. y = y(x).
• Consider a radioactive source and a detector that counts the number of decays.
• To determine the half-life of this source, count the number of decays $N_0, N_1, N_2, \ldots, N_k$ at times $t_0, t_1, t_2, \ldots, t_k$.
6. Interpolation
• t is the independent variable, and what you measure is a discrete set of pairs of numbers $(t_k, N_k)$ in the range $[t_0, t_k]$.
• To extract information from such an experiment, we would like an analytical function that relates N(t) and t.
• Sometimes finding such a function is impossible, or it is known but time-consuming to calculate, or we are only interested in a small local region of the independent variable.
• Assume the radioactive source is an α emitter with half-life $t_{1/2} = 430$ years, which is too long to be determined by measuring the decay directly.
• Because the source decays very slowly, you would probably measure the activity over a longer period, say every Monday for a couple of months.
• After 5 months you stop and look at the data.
• What was the activity A(t) on Wednesday of the 3rd week?
• This day lies inside your range $[t_0, t_k]$, so use interpolation techniques to determine this value.
7. Interpolation
• To know the activity A 8 months after the end of your measurement, extrapolate to this point from the previous series of measurements.
• The idea of interpolation is to select a function $g(x)$ such that $g(x_i) = f_i$ for each data point i, and such that this function is a good approximation for any other x lying between the original data points.
• Because the data points can be interpolated by an infinite number of functions, a criterion or guideline is needed to select a reasonable function that gives a good approximation.
• Mathematics offers many theorems on function analysis, including interpolation with error analysis.
• As a rule these methods rely on the “smoothness” of the interpolated function.
• This does not work for functions like $1/(1 + 25x^2)$ on the interval $[-1, +1]$, where a high-degree polynomial through equally spaced points oscillates strongly near the ends of the interval (Runge's phenomenon).
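The failure for $1/(1 + 25x^2)$ can be checked numerically. The following is a minimal sketch, assuming equally spaced nodes on $[-1, +1]$ and a direct evaluation of the interpolating polynomial; the function names `lagrange`, `runge` and `max_error` are illustrative choices, not part of the original slides.

```python
def lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def runge(x):
    """The test function 1 / (1 + 25 x^2)."""
    return 1.0 / (1.0 + 25.0 * x * x)


def max_error(n, samples=401):
    """Largest |p(x) - f(x)| on [-1, 1] for n + 1 equally spaced nodes."""
    nodes = [-1.0 + 2.0 * k / n for k in range(n + 1)]
    values = [runge(x) for x in nodes]
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(lagrange(nodes, values, x) - runge(x)) for x in grid)


# The error grows as more equally spaced points are added.
for n in (4, 8, 16):
    print(n, max_error(n))
```

Adding points makes the interpolation worse here, not better, which is why a smoothness criterion or a piecewise method such as a spline is preferred for this kind of function.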
8. Linear interpolation
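• Consider a function $f_i = f(x_i)$ with i = 0, 1, ..., n.
• The simplest approximation of f(x) for $x \in [x_i, x_{i+1}]$ is the straight line between $x_i$ and $x_{i+1}$, i.e.
$$f(x) = f_i + \frac{x - x_i}{x_{i+1} - x_i}\,(f_{i+1} - f_i) + \Delta f(x)$$
• Linear interpolation is not accurate in most cases, but it is a good starting point for understanding other interpolation schemes.
• Any value of f(x) in $[x_i, x_{i+1}]$ equals the sum of the linear interpolation in the equation above and a quadratic contribution that has a unique curvature and vanishes at $x_i$ and $x_{i+1}$.
• That is, the error Δf(x) in the linear interpolation is:
$$\Delta f(x) = \frac{\gamma}{2}\,(x - x_i)(x - x_{i+1})$$
where the curvature γ equals $f''(a)$ for some point a in $[x_i, x_{i+1}]$.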
9. Linear interpolation
For the straight line $y(x) = a + bx$ through two points, the coefficients a and b are:
$$a = y_1(x_1) - x_1\,\frac{y_2(x_2) - y_1(x_1)}{x_2 - x_1}$$
and
$$b = \frac{y_2(x_2) - y_1(x_1)}{x_2 - x_1}$$
Combining them:
$$y(x) = \frac{x - x_2}{x_1 - x_2}\,y_1(x_1) + \frac{x - x_1}{x_2 - x_1}\,y_2(x_2)$$
which is the equation of the line through $(x_1, y_1(x_1))$ and $(x_2, y_2(x_2))$.
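The two-point formula can be applied to a whole table by first locating the interval that contains x. The sketch below assumes the tabulated x values are strictly increasing; the function name `linear_interpolate` and the sample numbers are hypothetical and only serve as illustration.

```python
from bisect import bisect_right


def linear_interpolate(xs, ys, x):
    """Linear interpolation in a table with strictly increasing xs."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the table; that would be extrapolation")
    # Index i of the interval [xs[i], xs[i + 1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    x1, x2 = xs[i], xs[i + 1]
    y1, y2 = ys[i], ys[i + 1]
    # Symmetric two-point form of the straight line.
    return (x - x2) / (x1 - x2) * y1 + (x - x1) / (x2 - x1) * y2


# Hypothetical sample data, not taken from the slides.
times = [0.0, 1.0, 2.0, 4.0]
counts = [10.0, 8.0, 7.0, 4.0]
print(linear_interpolate(times, counts, 3.0))  # halfway between 7.0 and 4.0
```

Raising an error outside the table keeps interpolation and extrapolation separate, in line with the distinction made earlier in the chapter.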
11. Generalized interpolation
• Use 4 points to construct a 3rd-degree polynomial, 5 points for a 4th-degree polynomial, and so on.
• The generalized interpolation formula passing through n+1 data points can be derived from the linear interpolation equation written in the symmetric form:
$$f(x) = \frac{x - x_{i+1}}{x_i - x_{i+1}}\,f_i + \frac{x - x_i}{x_{i+1} - x_i}\,f_{i+1}$$
12. Example
The vapor pressure of 4He as a function of temperature is given in the table. Find the pressure at 3.0 K.

Temperature [K]    Vapor pressure [kPa]
2.3                6.38512
2.7                13.6218
2.9                18.676
3.2                28.2599
3.5                40.4082
3.7                49.9945

Using linear interpolation between 2.9 K and 3.2 K, p(3 K) = 21.871 kPa.
Using quadratic interpolation at the points 2.7, 2.9 and 3.2 K, p(3 K) = 21.604 kPa, which is close to the exact value of 21.595 kPa.
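The arithmetic behind the two estimates can be written out as a check. This is a worked sketch from the tabulated values only: the linear estimate uses the bracketing points 2.9 K and 3.2 K, and the quadratic estimate uses the three-point Lagrange weights for 2.7 K, 2.9 K and 3.2 K evaluated at 3.0 K, which come out as -0.2, 1.0 and 0.2.

```latex
p_{\text{lin}}(3.0) = 18.676 + \frac{3.0 - 2.9}{3.2 - 2.9}\,(28.2599 - 18.676)
                    \approx 21.871\ \text{kPa}

p_{\text{quad}}(3.0) = (-0.2)(13.6218) + (1.0)(18.676) + (0.2)(28.2599)
                     \approx 21.604\ \text{kPa}
```

The three weights sum to 1, as they must for any interpolation that reproduces a constant function.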
13. Lagrangian interpolation
• Assume we have a set of N+1 points:
$y_0 = f(x_0),\ y_1 = f(x_1),\ y_2 = f(x_2),\ \ldots,\ y_N = f(x_N)$
• We want a polynomial $P_N$ of degree N such that $P_N(x_i) = f(x_i) = y_i$ for i = 0, 1, 2, ..., N.
• Write $P_N$ in the form
$$P_N(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_N(x - x_0)\cdots(x - x_{N-1})$$
• The coefficients $a_i$ are determined recursively.
• The same polynomial is given explicitly by Lagrange's interpolation formula:
$$P_N(x) = \sum_{i=0}^{N} y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}$$
• For 2 points (a straight line) this gives linear interpolation, and for 3 points quadratic interpolation.
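The recursive determination of the coefficients $a_i$ can be sketched with divided differences, which is one common way to do it and is assumed here rather than taken from the slides. The names `newton_coefficients` and `newton_eval` are illustrative; the data are three rows of the 4He table from the example.

```python
def newton_coefficients(xs, ys):
    """Return a_0..a_N of P_N(x) = a_0 + a_1 (x - x_0) + ... by divided differences."""
    coeffs = list(ys)
    n = len(xs)
    for level in range(1, n):
        # Work from the end so lower-order differences are still available.
        for i in range(n - 1, level - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - level])
    return coeffs


def newton_eval(xs, coeffs, x):
    """Evaluate the nested product form of P_N at x."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result


# Three rows of the 4He vapor-pressure table (temperature in K, pressure in kPa).
temps = [2.7, 2.9, 3.2]
press = [13.6218, 18.676, 28.2599]
a = newton_coefficients(temps, press)
print(newton_eval(temps, a, 3.0))
```

Because the interpolating polynomial through a given set of points is unique, this form and the Lagrange formula give the same value at every x.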
14. The Aitken method
• One way to carry out the Lagrange interpolation efficiently is to perform a sequence of linear interpolations, a scheme first developed by Aitken (1932).
• First work out n linear interpolations, each constructed from a neighboring pair of the n+1 data points.
• Then use these n interpolated values to obtain another level of n−1 linear interpolations with the next neighboring points of $x_i$.
• Repeat this until the final result is obtained after n levels of consecutive linear interpolations.
• Use the equation:
$$f_{i\ldots j} = \frac{x - x_j}{x_i - x_j}\,f_{i\ldots j-1} + \frac{x - x_i}{x_j - x_i}\,f_{i+1\ldots j}$$
15. Hierarchy in the Aitken scheme for n + 1 data points
Using this for 5 points:

x0   f0
          f01
x1   f1         f012
          f12          f0123
x2   f2         f123           f01234
          f23          f1234
x3   f3         f234
          f34
x4   f4

The error in the Lagrange interpolation is estimated as:
$$\Delta f(x) \approx \frac{|f_{01234} - f_{0123}| + |f_{01234} - f_{1234}|}{2}$$
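The hierarchy can be built column by column, keeping only the current and the previous level. The sketch below is one possible arrangement under that assumption; the function name `aitken` is illustrative, and the error estimate is the average of the two differences between the final value and the two entries one level below it.

```python
def aitken(xs, fs, x):
    """Aitken scheme: return (value, error estimate) at x using all points."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    column = list(fs)  # level 0: the tabulated values f_i
    previous = column
    for level in range(1, n):
        previous = column
        # Entry i of this level is f_{i...j} with j = i + level.
        column = [
            (x - xs[i + level]) / (xs[i] - xs[i + level]) * previous[i]
            + (x - xs[i]) / (xs[i + level] - xs[i]) * previous[i + 1]
            for i in range(n - level)
        ]
    value = column[0]
    # previous now holds the two entries one level below the final result.
    error = (abs(value - previous[0]) + abs(value - previous[1])) / 2.0
    return value, error


# Full 4He vapor-pressure table (temperature in K, pressure in kPa).
temps = [2.3, 2.7, 2.9, 3.2, 3.5, 3.7]
press = [6.38512, 13.6218, 18.676, 28.2599, 40.4082, 49.9945]
value, error = aitken(temps, press, 3.0)
print(value, error)
```

The error value is only an estimate based on how much the last level changed the result, so it should be read as an indication rather than a bound.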
16. Least square fitting
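To fit the curve
$$y = a x^m + b x^{m-1} + c x^{m-2} + \cdots + g$$
to a given set of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:
For any $x_i$, the observed value is $y_i$ and the expected value is
$$\eta_i = a x_i^m + b x_i^{m-1} + c x_i^{m-2} + \cdots + g$$
The error is $e_i = y_i - \eta_i$.
The sum of the squares of these errors is:
$$E = e_1^2 + e_2^2 + e_3^2 + \cdots + e_n^2 = \sum_{i=1}^{n} e_i^2$$
For E to be a minimum:
$$\frac{\partial E}{\partial a} = 0,\quad \frac{\partial E}{\partial b} = 0,\quad \frac{\partial E}{\partial c} = 0,\ \ldots,\ \frac{\partial E}{\partial g} = 0$$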
17. Linear Least Square fitting
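Working procedure
To fit the straight line y = ax + b:
$$\eta_i = a x_i + b, \qquad e_i = y_i - \eta_i, \qquad E = \sum_{i=1}^{n} e_i^2$$
For E to be a minimum:
$$\frac{\partial E}{\partial a} = 0,\quad \frac{\partial E}{\partial b} = 0$$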
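Carrying out the two differentiations explicitly shows where the normal equations come from. This is a short worked derivation using only the definitions above, with all sums running over i = 1, ..., n.

```latex
E = \sum_{i} (y_i - a x_i - b)^2

\frac{\partial E}{\partial a} = -2 \sum_{i} x_i\,(y_i - a x_i - b) = 0
\;\Longrightarrow\;
a \sum_{i} x_i^2 + b \sum_{i} x_i = \sum_{i} x_i y_i

\frac{\partial E}{\partial b} = -2 \sum_{i} (y_i - a x_i - b) = 0
\;\Longrightarrow\;
a \sum_{i} x_i + n\,b = \sum_{i} y_i
```

These are two linear equations in the two unknowns a and b.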
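19. • The conditions ∂E/∂a = 0 and ∂E/∂b = 0 lead to the normal equations,
• which are solved simultaneously for a and b.
• Substituting a and b into y = ax + b gives the desired line of best fit:
$$a = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}$$
and
$$b = \frac{\sum_i x_i^2 \sum_i y_i - \sum_i x_i \sum_i x_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}$$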
20. Example
An experiment produces the following data sample pairs $(x_i, y_i)$. Fit y = ax + b and find a and b.
xi = 1.85, 2.72, 2.81, 3.06, 3.42, 3.76, 4.31, 4.47, 4.64, 4.99
yi = 2.26, 3.10, 3.80, 4.11, 4.74, 4.31, 5.24, 4.03, 5.69, 6.57
Step 1. Determine each sum, with n = 10:
$\sum x_i = 36.03,\ \sum y_i = 43.85,\ \sum x_i y_i = 168.0684,\ \sum x_i^2 = 138.9013$
Step 2. Use these values in the equations for a and b:
• the slope is a ≈ 1.10915
• the intercept is b ≈ 0.38874
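The two steps of the example can be reproduced with a few lines of code. This is a minimal sketch of the closed-form solution of the normal equations; the function name `fit_line` is an illustrative choice, and the data are the ten pairs listed above.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a*x + b; returns (a, b)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    return a, b


xi = [1.85, 2.72, 2.81, 3.06, 3.42, 3.76, 4.31, 4.47, 4.64, 4.99]
yi = [2.26, 3.10, 3.80, 4.11, 4.74, 4.31, 5.24, 4.03, 5.69, 6.57]
a, b = fit_line(xi, yi)
print(a, b)
```

The denominator is the same for a and b, so it is computed once; it vanishes only when every x value is the same, in which case no unique line exists.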
21. Cubic Spline fitting
• Over n intervals, the routine fits n cubic equations subject to the conditions imposed by the n+1 data points.
• The derivation assumes a functional form for the curve fit.
• This form is simplified and then solved for the curve-fit equation.
• The assumed form of the cubic polynomial for each segment is
$$y = a_i (x - x_i)^3 + b_i (x - x_i)^2 + c_i (x - x_i) + d_i$$
• The spacing between successive data points is
$$h_i = x_{i+1} - x_i$$
22. Cubic Spline fitting
• A cubic spline constrains the function value and the 1st and 2nd derivatives.
• The routine must ensure that y(x), y′(x) and y′′(x) are equal at the interior node points for adjacent segments.
• Substituting a variable S for the polynomial's 2nd derivative reduces the number of unknowns from a, b, c, d for each segment to only S for each segment.
• For the ith segment, the governing equation for S is
$$h_{i-1} S_{i-1} + 2(h_{i-1} + h_i) S_i + h_i S_{i+1} = 6\left[\frac{y_{i+1} - y_i}{h_i} - \frac{y_i - y_{i-1}}{h_{i-1}}\right]$$
23. Cubic Spline fitting
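• In matrix form, the governing equations reduce to a tridiagonal system.
• S = 0 at the first and last data points for the natural spline boundary condition.
• Substituting the solved S values gives the a, b, c and d of the polynomial definition for each segment:
$$a_i = \frac{S_{i+1} - S_i}{6 h_i},\quad b_i = \frac{S_i}{2},\quad c_i = \frac{y_{i+1} - y_i}{h_i} - \frac{2 h_i S_i + h_i S_{i+1}}{6},\quad d_i = y_i$$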
24. General program steps
• Problem Initialization: program initializes the
variables.
• Read in Data Values: data values are read and the
individual intervals are calculated.
• Determine S matrices: influence coefficient values for
S are determined. The constant matrix, C, is determined.
• Matrix Solver: Tri-Diagonal-Matrix-Algorithm
determines the S value at each interval.
• Calculate Cubic Parameters: cubic parameters a, b, c
and d are calculated at each interval from S and h.
• Write out: The program writes out the polynomial
specification terms a, b, c and d.
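The program steps above can also be sketched compactly in Python as a cross-check. This is an illustrative outline with a natural boundary condition, not a translation of any particular library routine; the names `natural_cubic_spline` and `spline_eval` are assumptions, and the test data are samples of y = x**3 - 8 at x = 0, 1, 2, 3, 4.

```python
def natural_cubic_spline(x, y):
    """Return per-segment coefficients (a, b, c, d) of a natural cubic spline.

    Segment i is a*t**3 + b*t**2 + c*t + d with t = xq - x[i].
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three data points")
    h = [x[i + 1] - x[i] for i in range(n - 1)]

    # Tridiagonal system for the interior second derivatives S[1..n-2].
    m = n - 2
    lower = [h[k] for k in range(m)]
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(m)]
    upper = [h[k + 1] for k in range(m)]
    rhs = [
        6.0 * ((y[k + 2] - y[k + 1]) / h[k + 1] - (y[k + 1] - y[k]) / h[k])
        for k in range(m)
    ]

    # Tridiagonal matrix algorithm: forward elimination.
    for k in range(1, m):
        r = lower[k] / diag[k - 1]
        diag[k] -= r * upper[k - 1]
        rhs[k] -= r * rhs[k - 1]

    # Back substitution; S = 0 at both ends (natural boundary condition).
    s = [0.0] * n
    for k in range(m - 1, -1, -1):
        s[k + 1] = (rhs[k] - upper[k] * s[k + 2]) / diag[k]

    coeffs = []
    for i in range(n - 1):
        a = (s[i + 1] - s[i]) / (6.0 * h[i])
        b = s[i] / 2.0
        c = (y[i + 1] - y[i]) / h[i] - (2.0 * h[i] * s[i] + h[i] * s[i + 1]) / 6.0
        coeffs.append((a, b, c, y[i]))
    return coeffs


def spline_eval(x, coeffs, xq):
    """Evaluate the spline at xq, using the segment that contains xq."""
    i = 0
    while i < len(coeffs) - 1 and xq > x[i + 1]:
        i += 1
    a, b, c, d = coeffs[i]
    t = xq - x[i]
    return ((a * t + b) * t + c) * t + d


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [v ** 3 - 8.0 for v in xs]
coeffs = natural_cubic_spline(xs, ys)
for xq in (0.5, 3.5):
    print(xq, spline_eval(xs, coeffs, xq), xq ** 3 - 8.0)
```

Comparing the spline with the exact cubic at a point in the first interval and a point in the last interval gives a quick indication of where the natural boundary condition costs accuracy.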
25. F90 program for cubic spline interpolation
Program cubic_spline
Implicit none
! Initialization
Integer, parameter :: m = 1000        ! maximum number of plotted points
Double precision, Dimension(10) :: x, y, h
Double precision, Dimension(10) :: S, B, D, A, C
Double precision :: R, xs, ys
Double precision :: xr(m), yr(m)
Integer :: norder, i, j, ntdma, nstep, step
! Number of data points
norder = 5
! Data points x = 0, 1, 2, 3, 4 sampled from y = x**3 - 8
Do i = 1, norder
x(i) = dble(i - 1)
y(i) = x(i)**3 - 8.0d0
End do
! width of the ith interval
Do i = 1, (norder - 1)
h(i) = x(i + 1) - x(i)
End do
! Set S matrix for natural spline
Do i = 2 , (norder - 1)
j = i - 1
D (j) = 2 * (h(i - 1) + h(i))
A(j) = h(i) !Ignore A(norder)
B(j) = h(i - 1) !Ignore B(0)
End do
26. !Set Constant Matrix C
Do i = 2 , (norder - 1)
j = i - 1
C(j) = 6*((y(i + 1) - y(i))/h(i) - (y(i) - y(i - 1))/h(i - 1))
End do
! Max tdma length
ntdma = norder - 2
!Upper Triangularization
Do i = 2 , ntdma
R = B(i) / D(i - 1)
D(i) = D(i) - R * A(i - 1)
C(i) = C(i) - R * C(i - 1)
End do
! Directly set the last C
C(ntdma) = C(ntdma)/D(ntdma)
! Back Substitute
Do i = (ntdma - 1) , 1, -1
C(i) = (C(i) - A(i) * C(i + 1)) / D(i)
End do
! Switch from C to S
Do i = 2 , (norder - 1)
j = i - 1 ! Shift from TDMA coordinate system
S(i) = C(j)
End do
S(1) = 0.0
S(norder) = 0.0
! Calculate cubic ai,bi,ci and di from S and h
Do i = 1 , (norder - 1)
A(i) = (S(i + 1) - S(i)) / (6 * h(i))
B(i) = S(i) / 2.
C(i) = (y(i + 1) - y(i))/h(i) - (2*h(i)*S(i) + h(i)*S(i + 1))/6.
D(i) = y(i)
End do
27. Plotting
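! Number of plotting steps per interval
nstep = 100
step = 0
! Output file for the plot data (the file name is an arbitrary choice)
Open(unit=10, file='spline.dat', status='replace')
Do i = 1 , (norder - 1) ! Discrete function
Do j = 1 , nstep
step = step + 1
xs = x(i) + (h(i) / nstep) * (j - 1)
ys = A(i)*(xs - x(i))**3 + B(i)*(xs - x(i))**2 + C(i)*(xs - x(i)) + D(i)
xr(step) = xs
yr(step) = ys
End do
End do
! Write the sampled curve, one (x, y) pair per line
Do i = 1, step
write(10,*) xr(i), yr(i)
End do
Close(10)
End program cubic_spline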
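• The fit is good except between x = 3 and x = 4. This is caused by the natural spline boundary condition: y′′ = 0 is correct at x = 0 but not at x = 4, where the second derivative of $x^3 - 8$ is 24.
• Changing the spline to reflect the correct 2nd derivative at x = 4 would improve the fit.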