Scientific Computing with Python Webinar 9/18/2009: Curve Fitting

This webinar will provide an overview of the tools that SciPy and NumPy provide for regression analysis, including linear and non-linear least squares, and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier, and give a quick overview of the new scikits package statsmodels, whose API is maturing in a separate package but should be incorporated into SciPy in the future.


Transcript

  1. Curve-fitting (regression) with Python (September 18, 2009)
  2. Enthought Consulting
  3. Enthought Training Courses: Python Basics, NumPy, SciPy, Matplotlib, Traits, TraitsUI, Chaco…
  4. Enthought Python Distribution (EPD): http://www.enthought.com/products/epd.php
  5. Data / Model

     Linear model:     y = m x + b               (m = 4.316, b = 2.763)
     Nonlinear model:  y = a / (b + c e^(-d x))  (a = 7.06, b = 2.52, c = 26.14, d = -5.57)
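     A quick sketch of the linear data/model pairing above: synthetic data generated from the slide's m and b (the noise level and sample grid are my own illustrative choices), then recovered with a degree-1 least-squares fit.

         import numpy as np

         # Generate noisy samples from y = m*x + b, then recover m and b.
         rng = np.random.RandomState(1)
         x = np.linspace(0.0, 5.0, 40)
         y = 4.316 * x + 2.763 + 0.5 * rng.randn(x.size)   # noise level is invented

         m_hat, b_hat = np.polyfit(x, y, 1)   # degree-1 least-squares fit
         print(m_hat, b_hat)                  # close to 4.316 and 2.763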
  6. Curve Fitting or Regression? Adrien-Marie Legendre, Carl Gauss, Francis Galton, R.A. Fisher
  7. or (my preferred) ... Bayesian Inference

     Bayes' rule ties the unknowns X to the data Y through the model (likelihood) and the prior:

     p(X|Y) = p(Y|X) p(X) / p(Y) = p(Y|X) p(X) / ∫ p(Y|X) p(X) dX

     Here p(Y|X) is the model, p(X) is the prior, X are the unknowns, and Y is the data.
     Key figures: Bayes, Laplace, Harold Jeffreys, Richard T. Cox, Edwin T. Jaynes.
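     To make the normalization integral concrete, here is a minimal grid sketch (not from the webinar; the data, flat prior, and noise level are all invented) that infers the slope of a line from noisy samples:

         import numpy as np

         # Infer the slope m of y = m*x from noisy data on a grid of candidates.
         rng = np.random.RandomState(0)
         x = np.linspace(0.0, 10.0, 20)
         y = 4.316 * x + rng.normal(scale=2.0, size=x.size)

         m_grid = np.linspace(0.0, 10.0, 1001)   # candidate slopes (the unknowns X)
         sigma = 2.0                             # noise level, assumed known here

         # p(Y|X): Gaussian log-likelihood of the data for each candidate slope.
         log_like = np.array([-0.5 * np.sum((y - m * x) ** 2) / sigma ** 2
                              for m in m_grid])
         prior = np.ones_like(m_grid)            # flat prior p(X)

         post = np.exp(log_like - log_like.max()) * prior
         post /= np.trapz(post, m_grid)          # divide by the evidence p(Y)

         print("posterior mean slope:", np.trapz(m_grid * post, m_grid))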
  8. More pedagogy: the same ideas go by different names, all under the Machine Learning umbrella:

     Curve Fitting                    Regression
     Parameter Estimation             Bayesian Inference
     Understated statistical model    Statistical model is more important
     Just want "best" fit to data     Post-estimation analysis of error and fit
  9. Pragmatic look at the methods
     • Because the concept is really at the heart of science, many practical methods have been developed.
     • SciPy contains the building blocks to implement basically any method.
     • SciPy should get high-level interfaces to all the methods in common use.
 10. Methods vary in...
     • The model used:
       - parametric (specific model): y = f(x; θ)
       - non-parametric (many unknowns): y = Σ_i θ_i φ_i(x) (see the sketch after this list)
     • The way error is modeled: ŷ = y + ε
       - few assumptions (e.g. zero-mean, homoscedastic)
       - full probabilistic model
     • What "best fit" means (i.e. what the distance is between the predicted and the measured):
       - traditional least squares
       - robust methods (e.g. absolute difference)
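     As referenced above, a hedged sketch of the non-parametric form y = Σ_i θ_i φ_i(x): the Gaussian-bump basis, its width, and the test data are all illustrative choices, not from the slides.

         import numpy as np

         # Fit coefficients theta_i for a fixed basis phi_i(x) by linear least squares.
         rng = np.random.RandomState(2)
         x = np.linspace(0.0, 4.0, 50)
         y = np.sin(2.0 * x) + 0.1 * rng.randn(x.size)

         centers = np.linspace(0.0, 4.0, 8)                         # bump locations
         Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / 0.5)  # columns are phi_i(x)

         theta, res, rank, sv = np.linalg.lstsq(Phi, y, rcond=None)
         y_hat = Phi.dot(theta)                                     # fitted curve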
 11. Parametric Least Squares

     ŷ = [y_0, y_1, ..., y_{N-1}]^T
     x = [x_0, x_1, ..., x_{N-1}]^T
     ŷ = f(x; β) + ε
     β = [β_0, β_1, ..., β_{K-1}]^T,  with K < N

     β̂ = argmin_β J(ŷ, x, β) = argmin_β (ŷ - f(x; β))^T W (ŷ - f(x; β))
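     The argmin can be attacked directly with a general-purpose minimizer; a sketch with W = I and a toy exponential model (both my choices, not the webinar's):

         import numpy as np
         from scipy.optimize import fmin

         rng = np.random.RandomState(3)
         x = np.linspace(0.0, 4.0, 30)
         y = 1.5 * np.exp(-0.2 * x) + 0.05 * rng.randn(x.size)

         def J(beta):
             # (y - f(x; beta))^T W (y - f(x; beta)) with W = I
             r = y - beta[0] * np.exp(-beta[1] * x)
             return r.dot(r)

         beta_hat = fmin(J, [1.0, 1.0], disp=False)   # Nelder-Mead on J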
 12. Linear Least Squares

     ŷ = H(x) β + ε
     β̂ = (H(x)^T W H(x))^{-1} H(x)^T W ŷ

     Quadratic example: y_i = a x_i^2 + b x_i + c, so H(x) has rows [x_i^2, x_i, 1] and β = [a, b, c]^T.
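     The closed form above, written out for the quadratic example (the data and the identity weighting are illustrative):

         import numpy as np

         # Solve beta_hat = (H^T W H)^{-1} H^T W y for y_i = a*x_i^2 + b*x_i + c.
         rng = np.random.RandomState(4)
         x = np.linspace(-3.0, 3.0, 30)
         y = 2.0 * x**2 - 1.0 * x + 0.5 + 0.3 * rng.randn(x.size)

         H = np.column_stack([x**2, x, np.ones_like(x)])   # rows [x_i^2, x_i, 1]
         W = np.eye(x.size)                                # W = I here

         beta_hat = np.linalg.solve(H.T.dot(W).dot(H), H.T.dot(W).dot(y))
         a, b, c = beta_hat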
 13. Non-linear Least Squares

     β̂ = argmin_β J(ŷ, x, β) = argmin_β (ŷ - f(x; β))^T W (ŷ - f(x; β))

     Logistic example: y_i = a / (b + c e^(-d x_i))

     This is an optimization problem!
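     This logistic model is exactly the kind of problem curve_fit (slide 21) handles; a hedged sketch with synthetic data and an invented starting guess (the sign of d is flipped relative to slide 5 so the curve rises):

         import numpy as np
         from scipy.optimize import curve_fit

         def model(x, a, b, c, d):
             # y = a / (b + c*exp(-d*x))
             return a / (b + c * np.exp(-d * x))

         rng = np.random.RandomState(5)
         x = np.linspace(0.0, 4.0, 60)
         y = model(x, 7.06, 2.52, 26.14, 5.57) + 0.05 * rng.randn(x.size)

         p0 = [5.0, 2.0, 20.0, 4.0]                  # rough initial guess
         params, cov = curve_fit(model, x, y, p0=p0)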
 14. Tools in NumPy / SciPy
     • polyfit (linear least squares)
     • curve_fit (non-linear least squares)
     • poly1d (polynomial object)
     • numpy.random (random number generators)
     • scipy.stats (distribution objects)
     • scipy.optimize (unconstrained and constrained optimization)
 15. Polynomials
     • p = poly1d(<coefficient array>) constructs a polynomial object
     • p.roots (p.r) are the roots
     • p.coefficients (p.c) are the coefficients
     • p.order is the order
     • p[n] is the coefficient of x^n
     • p(val) evaluates the polynomial at val
     • p.integ() integrates the polynomial
     • p.deriv() differentiates the polynomial
     • Basic numeric operations (+, -, /, *) work
     • Acts like p.c when used as an array
     • Fancy printing

     >>> p = poly1d([1,-2,4])
     >>> print p
        2
     x - 2 x + 4
     >>> g = p**3 + p*(3-2*p)
     >>> print g
        6     5      4      3      2
     x - 6 x + 25 x - 51 x + 81 x - 58 x + 44
     >>> print g.deriv(m=2)
         4       3       2
     30 x - 120 x + 300 x - 306 x + 162
     >>> print p.integ(m=2,k=[2,1])
              4          3     2
     0.08333 x - 0.3333 x + 2 x + 2 x + 1
     >>> print p.roots
     [ 1.+1.7321j  1.-1.7321j]
     >>> print p.coeffs
     [ 1 -2  4]
 16. Statistics: scipy.stats: CONTINUOUS DISTRIBUTIONS

     Over 80 continuous distributions!

     Methods: pdf, cdf, rvs, ppf, stats, sf, isf, fit, entropy, nnlf, moment, freeze
 17. Using stats objects: DISTRIBUTIONS

     >>> from numpy import linspace
     >>> from scipy.stats import norm
     # Sample normal dist. 100 times.
     >>> samp = norm.rvs(size=100)
     >>> x = linspace(-5, 5, 100)
     # Calculate probability dist.
     >>> pdf = norm.pdf(x)
     # Calculate cumulative dist.
     >>> cdf = norm.cdf(x)
     # Calculate percent point function (inverse CDF);
     # its argument must be a probability in (0, 1).
     >>> q = linspace(0.01, 0.99, 100)
     >>> ppf = norm.ppf(q)
 18. Setting location and scale: NORMAL DISTRIBUTION

     >>> from numpy import linspace
     >>> from scipy.stats import norm
     # Normal dist with mean=10 and std=2.
     >>> dist = norm(loc=10, scale=2)
     >>> x = linspace(-5, 15, 100)
     # Calculate probability dist.
     >>> pdf = dist.pdf(x)
     # Calculate cumulative dist.
     >>> cdf = dist.cdf(x)
     # Get 100 random samples from dist.
     >>> samp = dist.rvs(size=100)
     # Estimate parameters from data:
     # .fit returns the best shape + (loc, scale) that explains the data.
     >>> mu, sigma = norm.fit(samp)
     >>> print "%4.2f, %4.2f" % (mu, sigma)
     10.07, 1.95
 19. Fitting Polynomials (NumPy): POLYFIT(X, Y, DEGREE)

     >>> from numpy import polyfit, poly1d, linspace, exp
     >>> from scipy.stats import norm
     # Create clean data.
     >>> x = linspace(0, 4.0, 100)
     >>> y = 1.5 * exp(-0.2 * x) + 0.3
     # Add a bit of noise.
     >>> noise = 0.1 * norm.rvs(size=100)
     >>> noisy_y = y + noise
     # Fit noisy data with a linear model.
     >>> linear_coef = polyfit(x, noisy_y, 1)
     >>> linear_poly = poly1d(linear_coef)
     >>> linear_y = linear_poly(x)
     # Fit noisy data with a quadratic model.
     >>> quad_coef = polyfit(x, noisy_y, 2)
     >>> quad_poly = poly1d(quad_coef)
     >>> quad_y = quad_poly(x)
 20. Optimization: scipy.optimize: Unconstrained Minimization and Root Finding

     Unconstrained Optimization:
     • fmin (Nelder-Mead simplex)
     • fmin_powell (Powell's method)
     • fmin_bfgs (BFGS quasi-Newton method)
     • fmin_ncg (Newton conjugate gradient)
     • leastsq (Levenberg-Marquardt)
     • anneal (simulated annealing global minimizer)
     • brute (brute force global minimizer)
     • brent (excellent 1-D minimizer)
     • golden
     • bracket

     Constrained Optimization:
     • fmin_l_bfgs_b
     • fmin_tnc (truncated Newton code)
     • fmin_cobyla (constrained optimization by linear approximation)
     • fminbound (interval constrained 1-D minimizer)

     Root Finding:
     • fsolve (using MINPACK)
     • brentq
     • brenth
     • ridder
     • newton
     • bisect
     • fixed_point (fixed point equation solver)
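     A quick taste of the two families in this list; the objective and the root bracket are invented for illustration:

         import numpy as np
         from scipy.optimize import fmin, brentq

         def f(x):
             return (x[0] - 2.0) ** 2 + 1.0

         xmin = fmin(f, [0.0], disp=False)              # Nelder-Mead simplex minimizer
         root = brentq(lambda x: x**3 - 2.0, 0.0, 2.0)  # bracketed root of x^3 - 2

         print(xmin, root)   # ~2.0 and ~1.26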
 21. Optimization: Data Fitting: NONLINEAR LEAST SQUARES CURVE FITTING

     >>> from numpy import linspace, exp, sin, pi
     >>> from numpy.random import randn
     >>> from scipy.optimize import curve_fit
     # Define the function to fit.
     >>> def function(x, a, b, f, phi):
     ...     result = a * exp(-b * sin(f * x + phi))
     ...     return result
     # Create a noisy data set.
     >>> actual_params = [3, 2, 1, pi/4]
     >>> x = linspace(0, 2*pi, 25)
     >>> exact = function(x, *actual_params)
     >>> noisy = exact + 0.3 * randn(len(x))
     # Use curve_fit to estimate the function parameters from the noisy data.
     >>> initial_guess = [1, 1, 1, 1]
     >>> estimated_params, err_est = curve_fit(function, x, noisy, p0=initial_guess)
     >>> estimated_params
     array([ 3.1705,  1.9501,  1.0206,  0.7034])
     # err_est is an estimate of the covariance matrix of the estimates
     # (i.e. how good the fit is).
 22. StatsModels (by economists):
     Josef Perktold, Assistant Professor, University of Chicago, Chicago, IL
     Skipper Seabold, PhD Student, American University, Washington, D.C.
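     For context, a minimal ordinary-least-squares sketch with the present-day statsmodels import path (which postdates this 2009 webinar; the data is synthetic):

         import numpy as np
         import statsmodels.api as sm

         rng = np.random.RandomState(6)
         x = np.linspace(0.0, 10.0, 50)
         y = 4.316 * x + 2.763 + rng.randn(x.size)

         X = sm.add_constant(x)           # prepend an intercept column
         result = sm.OLS(y, X).fit()      # ordinary least squares

         print(result.params)             # [intercept, slope]
         print(result.summary())          # post-estimation error and fit diagnostics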
 23. GUI example: astropysics (with TraitsUI)
     Erik J. Tollerud, PhD Student, UC Irvine Center for Cosmology, Irvine, CA
     http://www.physics.uci.edu/~etolleru/
 24. Scientific Python Classes: http://www.enthought.com/training
     Sept 21-25: Austin
     Oct 19-22: Silicon Valley
     Nov 9-12: Chicago
     Dec 7-11: Austin