Scientific Computing with Python Webinar 9/18/2009: Curve Fitting

This webinar will provide an overview of the tools that SciPy and NumPy provide for regression analysis, including linear and non-linear least squares, and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier, and give a quick overview of the new scikits package statsmodels, whose API is maturing in a separate package but should be incorporated into SciPy in the future.


Transcript

  1. Curve-fitting (regression) with Python (September 18, 2009)
  2. Enthought Consulting
  3. Enthought Training Courses: Python Basics, NumPy, SciPy, Matplotlib, Traits, TraitsUI, Chaco…
  4. Enthought Python Distribution (EPD): http://www.enthought.com/products/epd.php
  5. Data / Model

     Linear model:     y = m x + b               (m = 4.316, b = 2.763)
     Nonlinear model:  y = a / (b + c e^(-d x))  (a = 7.06, b = 2.52, c = 26.14, d = -5.57)
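     A quick sketch of the linear data/model pairing above: synthetic data generated from the slide's m and b (the noise level and sample grid are my own illustrative choices), then recovered with a degree-1 least-squares fit.

         import numpy as np

         # Generate noisy samples from y = m*x + b, then recover m and b.
         rng = np.random.RandomState(1)
         x = np.linspace(0.0, 5.0, 40)
         y = 4.316 * x + 2.763 + 0.5 * rng.randn(x.size)   # noise level is invented

         m_hat, b_hat = np.polyfit(x, y, 1)   # degree-1 least-squares fit
         print(m_hat, b_hat)                  # close to 4.316 and 2.763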
  6. Curve Fitting or Regression? Adrien-Marie Legendre, Carl Gauss, Francis Galton, R.A. Fisher
  7. or (my preferred) ... Bayesian Inference

     Bayes' rule ties the unknowns X to the data Y through the model (likelihood) and the prior:

     p(X|Y) = p(Y|X) p(X) / p(Y) = p(Y|X) p(X) / ∫ p(Y|X) p(X) dX

     Here p(Y|X) is the model, p(X) is the prior, X are the unknowns, and Y is the data.
     Key figures: Bayes, Laplace, Harold Jeffreys, Richard T. Cox, Edwin T. Jaynes.
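     To make the normalization integral concrete, here is a minimal grid sketch (not from the webinar; the data, flat prior, and noise level are all invented) that infers the slope of a line from noisy samples:

         import numpy as np

         # Infer the slope m of y = m*x from noisy data on a grid of candidates.
         rng = np.random.RandomState(0)
         x = np.linspace(0.0, 10.0, 20)
         y = 4.316 * x + rng.normal(scale=2.0, size=x.size)

         m_grid = np.linspace(0.0, 10.0, 1001)   # candidate slopes (the unknowns X)
         sigma = 2.0                             # noise level, assumed known here

         # p(Y|X): Gaussian log-likelihood of the data for each candidate slope.
         log_like = np.array([-0.5 * np.sum((y - m * x) ** 2) / sigma ** 2
                              for m in m_grid])
         prior = np.ones_like(m_grid)            # flat prior p(X)

         post = np.exp(log_like - log_like.max()) * prior
         post /= np.trapz(post, m_grid)          # divide by the evidence p(Y)

         print("posterior mean slope:", np.trapz(m_grid * post, m_grid))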
  8. More pedagogy: the same ideas go by different names, all under the Machine Learning umbrella:

     Curve Fitting                    Regression
     Parameter Estimation             Bayesian Inference
     Understated statistical model    Statistical model is more important
     Just want "best" fit to data     Post-estimation analysis of error and fit
  9. Pragmatic look at the methods
     • Because the concept is really at the heart of science, many practical methods have been developed.
     • SciPy contains the building blocks to implement basically any method.
     • SciPy should get high-level interfaces to all the methods in common use.
 10. Methods vary in...
     • The model used:
       - parametric (specific model): y = f(x; θ)
       - non-parametric (many unknowns): y = Σ_i θ_i φ_i(x) (see the sketch after this list)
     • The way error is modeled: ŷ = y + ε
       - few assumptions (e.g. zero-mean, homoscedastic)
       - full probabilistic model
     • What "best fit" means (i.e. what the distance is between the predicted and the measured):
       - traditional least squares
       - robust methods (e.g. absolute difference)
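     As referenced above, a hedged sketch of the non-parametric form y = Σ_i θ_i φ_i(x): the Gaussian-bump basis, its width, and the test data are all illustrative choices, not from the slides.

         import numpy as np

         # Fit coefficients theta_i for a fixed basis phi_i(x) by linear least squares.
         rng = np.random.RandomState(2)
         x = np.linspace(0.0, 4.0, 50)
         y = np.sin(2.0 * x) + 0.1 * rng.randn(x.size)

         centers = np.linspace(0.0, 4.0, 8)                         # bump locations
         Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / 0.5)  # columns are phi_i(x)

         theta, res, rank, sv = np.linalg.lstsq(Phi, y, rcond=None)
         y_hat = Phi.dot(theta)                                     # fitted curve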
 11. Parametric Least Squares

     ŷ = [y_0, y_1, ..., y_{N-1}]^T
     x = [x_0, x_1, ..., x_{N-1}]^T
     ŷ = f(x; β) + ε
     β = [β_0, β_1, ..., β_{K-1}]^T,  with K < N

     β̂ = argmin_β J(ŷ, x, β) = argmin_β (ŷ - f(x; β))^T W (ŷ - f(x; β))
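     The argmin can be attacked directly with a general-purpose minimizer; a sketch with W = I and a toy exponential model (both my choices, not the webinar's):

         import numpy as np
         from scipy.optimize import fmin

         rng = np.random.RandomState(3)
         x = np.linspace(0.0, 4.0, 30)
         y = 1.5 * np.exp(-0.2 * x) + 0.05 * rng.randn(x.size)

         def J(beta):
             # (y - f(x; beta))^T W (y - f(x; beta)) with W = I
             r = y - beta[0] * np.exp(-beta[1] * x)
             return r.dot(r)

         beta_hat = fmin(J, [1.0, 1.0], disp=False)   # Nelder-Mead on J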
 12. Linear Least Squares

     ŷ = H(x) β + ε
     β̂ = (H(x)^T W H(x))^{-1} H(x)^T W ŷ

     Quadratic example: y_i = a x_i^2 + b x_i + c, so H(x) has rows [x_i^2, x_i, 1] and β = [a, b, c]^T.
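     The closed form above, written out for the quadratic example (the data and the identity weighting are illustrative):

         import numpy as np

         # Solve beta_hat = (H^T W H)^{-1} H^T W y for y_i = a*x_i^2 + b*x_i + c.
         rng = np.random.RandomState(4)
         x = np.linspace(-3.0, 3.0, 30)
         y = 2.0 * x**2 - 1.0 * x + 0.5 + 0.3 * rng.randn(x.size)

         H = np.column_stack([x**2, x, np.ones_like(x)])   # rows [x_i^2, x_i, 1]
         W = np.eye(x.size)                                # W = I here

         beta_hat = np.linalg.solve(H.T.dot(W).dot(H), H.T.dot(W).dot(y))
         a, b, c = beta_hat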
 13. Non-linear Least Squares

     β̂ = argmin_β J(ŷ, x, β) = argmin_β (ŷ - f(x; β))^T W (ŷ - f(x; β))

     Logistic example: y_i = a / (b + c e^(-d x_i))

     This is an optimization problem!
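     This logistic model is exactly the kind of problem curve_fit (slide 21) handles; a hedged sketch with synthetic data and an invented starting guess (the sign of d is flipped relative to slide 5 so the curve rises):

         import numpy as np
         from scipy.optimize import curve_fit

         def model(x, a, b, c, d):
             # y = a / (b + c*exp(-d*x))
             return a / (b + c * np.exp(-d * x))

         rng = np.random.RandomState(5)
         x = np.linspace(0.0, 4.0, 60)
         y = model(x, 7.06, 2.52, 26.14, 5.57) + 0.05 * rng.randn(x.size)

         p0 = [5.0, 2.0, 20.0, 4.0]                  # rough initial guess
         params, cov = curve_fit(model, x, y, p0=p0)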
 14. Tools in NumPy / SciPy
     • polyfit (linear least squares)
     • curve_fit (non-linear least squares)
     • poly1d (polynomial object)
     • numpy.random (random number generators)
     • scipy.stats (distribution objects)
     • scipy.optimize (unconstrained and constrained optimization)
 15. Polynomials
     • p = poly1d(<coefficient array>) constructs a polynomial object
     • p.roots (p.r) are the roots
     • p.coefficients (p.c) are the coefficients
     • p.order is the order
     • p[n] is the coefficient of x^n
     • p(val) evaluates the polynomial at val
     • p.integ() integrates the polynomial
     • p.deriv() differentiates the polynomial
     • Basic numeric operations (+, -, /, *) work
     • Acts like p.c when used as an array
     • Fancy printing

     >>> p = poly1d([1,-2,4])
     >>> print p
        2
     x - 2 x + 4
     >>> g = p**3 + p*(3-2*p)
     >>> print g
        6     5      4      3      2
     x - 6 x + 25 x - 51 x + 81 x - 58 x + 44
     >>> print g.deriv(m=2)
         4       3       2
     30 x - 120 x + 300 x - 306 x + 162
     >>> print p.integ(m=2,k=[2,1])
              4          3     2
     0.08333 x - 0.3333 x + 2 x + 2 x + 1
     >>> print p.roots
     [ 1.+1.7321j  1.-1.7321j]
     >>> print p.coeffs
     [ 1 -2  4]
 16. Statistics: scipy.stats: CONTINUOUS DISTRIBUTIONS

     Over 80 continuous distributions!

     Methods: pdf, cdf, rvs, ppf, stats, sf, isf, fit, entropy, nnlf, moment, freeze
 17. Using stats objects: DISTRIBUTIONS

     >>> from numpy import linspace
     >>> from scipy.stats import norm
     # Sample normal dist. 100 times.
     >>> samp = norm.rvs(size=100)
     >>> x = linspace(-5, 5, 100)
     # Calculate probability dist.
     >>> pdf = norm.pdf(x)
     # Calculate cumulative dist.
     >>> cdf = norm.cdf(x)
     # Calculate percent point function (inverse CDF);
     # its argument must be a probability in (0, 1).
     >>> q = linspace(0.01, 0.99, 100)
     >>> ppf = norm.ppf(q)
 18. Setting location and scale: NORMAL DISTRIBUTION

     >>> from numpy import linspace
     >>> from scipy.stats import norm
     # Normal dist with mean=10 and std=2.
     >>> dist = norm(loc=10, scale=2)
     >>> x = linspace(-5, 15, 100)
     # Calculate probability dist.
     >>> pdf = dist.pdf(x)
     # Calculate cumulative dist.
     >>> cdf = dist.cdf(x)
     # Get 100 random samples from dist.
     >>> samp = dist.rvs(size=100)
     # Estimate parameters from data:
     # .fit returns the best shape + (loc, scale) that explains the data.
     >>> mu, sigma = norm.fit(samp)
     >>> print "%4.2f, %4.2f" % (mu, sigma)
     10.07, 1.95
 19. Fitting Polynomials (NumPy): POLYFIT(X, Y, DEGREE)

     >>> from numpy import polyfit, poly1d, linspace, exp
     >>> from scipy.stats import norm
     # Create clean data.
     >>> x = linspace(0, 4.0, 100)
     >>> y = 1.5 * exp(-0.2 * x) + 0.3
     # Add a bit of noise.
     >>> noise = 0.1 * norm.rvs(size=100)
     >>> noisy_y = y + noise
     # Fit noisy data with a linear model.
     >>> linear_coef = polyfit(x, noisy_y, 1)
     >>> linear_poly = poly1d(linear_coef)
     >>> linear_y = linear_poly(x)
     # Fit noisy data with a quadratic model.
     >>> quad_coef = polyfit(x, noisy_y, 2)
     >>> quad_poly = poly1d(quad_coef)
     >>> quad_y = quad_poly(x)
 20. Optimization: scipy.optimize: Unconstrained Minimization and Root Finding

     Unconstrained Optimization:
     • fmin (Nelder-Mead simplex)
     • fmin_powell (Powell's method)
     • fmin_bfgs (BFGS quasi-Newton method)
     • fmin_ncg (Newton conjugate gradient)
     • leastsq (Levenberg-Marquardt)
     • anneal (simulated annealing global minimizer)
     • brute (brute force global minimizer)
     • brent (excellent 1-D minimizer)
     • golden
     • bracket

     Constrained Optimization:
     • fmin_l_bfgs_b
     • fmin_tnc (truncated Newton code)
     • fmin_cobyla (constrained optimization by linear approximation)
     • fminbound (interval constrained 1-D minimizer)

     Root Finding:
     • fsolve (using MINPACK)
     • brentq
     • brenth
     • ridder
     • newton
     • bisect
     • fixed_point (fixed point equation solver)
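     A quick taste of the two families in this list; the objective and the root bracket are invented for illustration:

         import numpy as np
         from scipy.optimize import fmin, brentq

         def f(x):
             return (x[0] - 2.0) ** 2 + 1.0

         xmin = fmin(f, [0.0], disp=False)              # Nelder-Mead simplex minimizer
         root = brentq(lambda x: x**3 - 2.0, 0.0, 2.0)  # bracketed root of x^3 - 2

         print(xmin, root)   # ~2.0 and ~1.26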
 21. Optimization: Data Fitting: NONLINEAR LEAST SQUARES CURVE FITTING

     >>> from numpy import linspace, exp, sin, pi
     >>> from numpy.random import randn
     >>> from scipy.optimize import curve_fit
     # Define the function to fit.
     >>> def function(x, a, b, f, phi):
     ...     result = a * exp(-b * sin(f * x + phi))
     ...     return result
     # Create a noisy data set.
     >>> actual_params = [3, 2, 1, pi/4]
     >>> x = linspace(0, 2*pi, 25)
     >>> exact = function(x, *actual_params)
     >>> noisy = exact + 0.3 * randn(len(x))
     # Use curve_fit to estimate the function parameters from the noisy data.
     >>> initial_guess = [1, 1, 1, 1]
     >>> estimated_params, err_est = curve_fit(function, x, noisy, p0=initial_guess)
     >>> estimated_params
     array([ 3.1705,  1.9501,  1.0206,  0.7034])
     # err_est is an estimate of the covariance matrix of the estimates
     # (i.e. how good the fit is).
 22. StatsModels (by economists):
     Josef Perktold, Assistant Professor, University of Chicago, Chicago, IL
     Skipper Seabold, PhD Student, American University, Washington, D.C.
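     For context, a minimal ordinary-least-squares sketch with the present-day statsmodels import path (which postdates this 2009 webinar; the data is synthetic):

         import numpy as np
         import statsmodels.api as sm

         rng = np.random.RandomState(6)
         x = np.linspace(0.0, 10.0, 50)
         y = 4.316 * x + 2.763 + rng.randn(x.size)

         X = sm.add_constant(x)           # prepend an intercept column
         result = sm.OLS(y, X).fit()      # ordinary least squares

         print(result.params)             # [intercept, slope]
         print(result.summary())          # post-estimation error and fit diagnostics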
 23. GUI example: astropysics (with TraitsUI)
     Erik J. Tollerud, PhD Student, UC Irvine Center for Cosmology, Irvine, CA
     http://www.physics.uci.edu/~etolleru/
 24. Scientific Python Classes: http://www.enthought.com/training
     Sept 21-25: Austin
     Oct 19-22: Silicon Valley
     Nov 9-12: Chicago
     Dec 7-11: Austin