
# Scientific Computing with Python Webinar 9/18/2009: Curve Fitting

This webinar will provide an overview of the tools that SciPy and NumPy provide for regression analysis including linear and non-linear least-squares and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier and provide a quick overview of the new Scikits package statsmodels whose API is maturing in a separate package but should be incorporated into SciPy in the future.


### Transcript

1. Curve-fitting (regression) with Python (September 18, 2009)
2. Enthought Consulting
3. Enthought Training Courses: Python Basics, NumPy, SciPy, Matplotlib, Traits, TraitsUI, Chaco…
4. Enthought Python Distribution (EPD): http://www.enthought.com/products/epd.php
5. Data Model. Linear: y = m·x + b, with fitted values m = 4.316, b = 2.763. Logistic: y = a / (b + c·e^(−d·x)), with fitted values a = 7.06, b = 2.52, c = 26.14, d = −5.57.
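Both models on this slide translate directly into Python. A minimal sketch (the function names are mine; the parameter values are the fitted values quoted on the slide):

```python
import numpy as np

def linear_model(x, m, b):
    """Straight-line model: y = m*x + b."""
    return m * x + b

def logistic_model(x, a, b, c, d):
    """Logistic-style model: y = a / (b + c * exp(-d*x))."""
    return a / (b + c * np.exp(-d * x))

x = np.linspace(0.0, 1.0, 5)
y_lin = linear_model(x, 4.316, 2.763)                 # slide's linear fit
y_log = logistic_model(x, 7.06, 2.52, 26.14, -5.57)   # slide's logistic fit
```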
6. Curve Fitting or Regression? (Adrien-Marie Legendre, Carl Gauss, Francis Galton, R. A. Fisher)
7. or (my preferred) ... Bayesian Inference. Model, prior, and inference combine via Bayes' rule: p(X|Y) = p(Y|X)·p(X) / p(Y), where p(Y) = ∫ p(Y|X)·p(X) dX, X are the unknowns, and Y is the data. (Bayes, Laplace, Harold Jeffreys, Richard T. Cox, Edwin T. Jaynes)
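For a discrete unknown, Bayes' rule is a one-liner. A toy sketch with made-up numbers (two hypotheses, one observation):

```python
import numpy as np

# Hypothetical example: prior belief over two hypotheses, and the
# likelihood of the observed data under each hypothesis.
prior = np.array([0.5, 0.5])        # p(X)
likelihood = np.array([0.8, 0.2])   # p(Y|X) for the observed Y

# Bayes' rule: p(X|Y) = p(Y|X) p(X) / sum over X of p(Y|X) p(X)
posterior = likelihood * prior / np.sum(likelihood * prior)
# posterior is [0.8, 0.2]: the data shift belief toward hypothesis 0.
```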
8. More pedagogy. Curve fitting, regression, machine learning: understated statistical model; you just want the "best" fit to the data. Parameter estimation, Bayesian inference: the statistical model is more important; post-estimation analysis of error and fit.
9. Pragmatic look at the methods
   - Because the concept is really at the heart of science, many practical methods have been developed.
   - SciPy contains the building blocks to implement basically any method.
   - SciPy should get high-level interfaces to all the methods in common use.
10. Methods vary in...
    - The model used: parametric (a specific model), y = f(x; θ), or non-parametric (many unknowns), y = Σᵢ θᵢ·φᵢ(x).
    - The way error is modeled, y = ŷ + ε: few assumptions (e.g. zero-mean, homoscedastic) or a full probabilistic model.
    - What "best fit" means (i.e. what the distance is between the predicted and the measured): traditional least squares, or robust methods (e.g. absolute difference).
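The difference a robust error metric makes can be sketched with `scipy.optimize.minimize`, comparing a squared-error cost against an absolute-difference cost on the same line fit (the data here are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data: a line of slope ~1 with one gross outlier at the end.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 20.0])

def l2_cost(beta):
    m, b = beta
    return np.sum((y - (m * x + b)) ** 2)    # traditional least squares

def l1_cost(beta):
    m, b = beta
    return np.sum(np.abs(y - (m * x + b)))   # robust absolute difference

fit_l2 = minimize(l2_cost, x0=[1.0, 0.0]).x
fit_l1 = minimize(l1_cost, x0=[1.0, 0.0], method="Nelder-Mead").x
# The outlier drags the least-squares slope far above 1,
# while the absolute-difference slope stays near 1.
```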
11. Parametric Least Squares
    ŷ = [y₀, y₁, …, y_{N−1}]ᵀ, x = [x₀, x₁, …, x_{N−1}]ᵀ
    ŷ = f(x; β) + ε, with β = [β₀, β₁, …, β_{K−1}]ᵀ and K < N
    β̂ = argminᵦ J(ŷ, x, β) = argminᵦ (ŷ − f(x; β))ᵀ W (ŷ − f(x; β))
12. Linear Least Squares
    ŷ = H(x)β + ε
    β̂ = (H(x)ᵀ W H(x))⁻¹ H(x)ᵀ W ŷ
    Quadratic example: yᵢ = a·xᵢ² + b·xᵢ + c, i.e. H(x) has rows [xᵢ², xᵢ, 1] and β = [a, b, c]ᵀ.
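The closed-form estimate above can be checked numerically for the quadratic example with W = I; the true coefficients and noise level below are made up for this sketch:

```python
import numpy as np

# Synthetic data from the slide's quadratic model y = a*x^2 + b*x + c
# (true values a=2, b=1, c=0.5 are invented for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = 2.0 * x**2 + 1.0 * x + 0.5 + 0.1 * rng.standard_normal(50)

# Design matrix H(x) with columns [x^2, x, 1], as on the slide.
H = np.column_stack([x**2, x, np.ones_like(x)])

# Normal equations with W = I: solve (H^T H) beta = H^T y.
beta = np.linalg.solve(H.T @ H, H.T @ y)

# np.linalg.lstsq solves the same problem more stably.
beta_lstsq, *_ = np.linalg.lstsq(H, y, rcond=None)
```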
13. Non-linear Least Squares
    β̂ = argminᵦ J(ŷ, x, β) = argminᵦ (ŷ − f(x; β))ᵀ W (ŷ − f(x; β))
    Logistic example: yᵢ = a / (b + c·e^(−d·xᵢ)). This is an optimization problem!
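This is exactly the kind of problem `scipy.optimize.curve_fit` solves (demonstrated on a later slide). A sketch for the logistic example, with b fixed to 1 for identifiability (a, b, c in the slide's form are only determined up to a common scale); all parameter values here are invented:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, c, d):
    # Slide's model a / (b + c*exp(-d*x)) with b fixed to 1.
    return a / (1.0 + c * np.exp(-d * x))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 80)
true = (7.0, 26.0, 2.0)
y = logistic(x, *true) + 0.02 * rng.standard_normal(80)

# Nonlinear least squares, starting from a rough initial guess.
popt, pcov = curve_fit(logistic, x, y, p0=[5.0, 10.0, 1.0])
```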
14. Tools in NumPy / SciPy
    - polyfit (linear least squares)
    - curve_fit (non-linear least squares)
    - poly1d (polynomial object)
    - numpy.random (random number generators)
    - scipy.stats (distribution objects)
    - scipy.optimize (unconstrained and constrained optimization)
15. Polynomials: p = poly1d(<coefficient array>)
    - p.roots (p.r) are the roots
    - p.coefficients (p.c) are the coefficients
    - p.order is the order
    - p[n] is the coefficient of x^n
    - p(val) evaluates the polynomial at val
    - p.integ() integrates the polynomial
    - p.deriv() differentiates the polynomial
    - Basic numeric operations (+, -, /, *) work
    - Acts like p.c when used as an array
    - Fancy printing

    ```python
    >>> p = poly1d([1, -2, 4])
    >>> print p
       2
    x - 2 x + 4
    >>> g = p**3 + p*(3-2*p)
    >>> print g
       6     5      4      3      2
    x - 6 x + 25 x - 51 x + 81 x - 58 x + 44
    >>> print g.deriv(m=2)
        4       3       2
    30 x - 120 x + 300 x - 306 x + 162
    >>> print p.integ(m=2, k=[2, 1])
            4          3     2
    0.08333 x - 0.3333 x + 2 x + 2 x + 1
    >>> print p.roots
    [ 1.+1.7321j  1.-1.7321j]
    >>> print p.coeffs
    [ 1 -2  4]
    ```
16. Statistics: scipy.stats provides over 80 continuous distributions! Methods on each distribution object: pdf, cdf, rvs, ppf, stats, sf, isf, entropy, nnlf, moment, fit, freeze.
17. Using stats objects

    ```python
    >>> from scipy.stats import norm
    # Sample the normal distribution 100 times.
    >>> samp = norm.rvs(size=100)
    >>> x = linspace(-5, 5, 100)
    # Calculate the probability density function.
    >>> pdf = norm.pdf(x)
    # Calculate the cumulative distribution function.
    >>> cdf = norm.cdf(x)
    # Calculate the percent point function.
    >>> ppf = norm.ppf(x)
    ```
18. Setting location and scale

    ```python
    >>> from scipy.stats import norm
    # Normal distribution with mean=10 and std=2.
    >>> dist = norm(loc=10, scale=2)
    >>> x = linspace(-5, 15, 100)
    # Calculate the probability density function.
    >>> pdf = dist.pdf(x)
    # Calculate the cumulative distribution function.
    >>> cdf = dist.cdf(x)
    # Get 100 random samples from the distribution.
    >>> samp = dist.rvs(size=100)
    # Estimate parameters from data.
    >>> mu, sigma = norm.fit(samp)
    >>> print "%4.2f, %4.2f" % (mu, sigma)
    10.07, 1.95
    ```

    .fit returns the best shape + (loc, scale) that explains the data.
19. Fitting Polynomials (NumPy): polyfit(x, y, degree)

    ```python
    >>> from numpy import polyfit, poly1d
    >>> from scipy.stats import norm
    # Create clean data.
    >>> x = linspace(0, 4.0, 100)
    >>> y = 1.5 * exp(-0.2 * x) + 0.3
    # Add a bit of noise.
    >>> noise = 0.1 * norm.rvs(size=100)
    >>> noisy_y = y + noise
    # Fit the noisy data with a linear model.
    >>> linear_coef = polyfit(x, noisy_y, 1)
    >>> linear_poly = poly1d(linear_coef)
    >>> linear_y = linear_poly(x)
    # Fit the noisy data with a quadratic model.
    >>> quad_coef = polyfit(x, noisy_y, 2)
    >>> quad_poly = poly1d(quad_coef)
    >>> quad_y = quad_poly(x)
    ```
20. Optimization: scipy.optimize (unconstrained minimization and root finding)
    - Unconstrained optimization: fmin (Nelder-Mead simplex), fmin_powell (Powell's method), fmin_bfgs (BFGS quasi-Newton method), fmin_ncg (Newton conjugate gradient), leastsq (Levenberg-Marquardt), anneal (simulated annealing global minimizer), brute (brute-force global minimizer), brent (excellent 1-D minimizer), golden, bracket
    - Constrained optimization: fmin_l_bfgs_b, fmin_tnc (truncated Newton code), fmin_cobyla (constrained optimization by linear approximation), fminbound (interval-constrained 1-D minimizer)
    - Root finding: fsolve (using MINPACK), brentq, brenth, ridder, newton, bisect, fixed_point (fixed-point equation solver)
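Of these, leastsq is the Levenberg-Marquardt routine that curve_fit wraps; it takes a residual function directly. A minimal sketch on made-up exponential-decay data:

```python
import numpy as np
from scipy.optimize import leastsq

# Made-up data: y = 3 * exp(-0.5 * x) plus a little noise.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
y = 3.0 * np.exp(-0.5 * x) + 0.01 * rng.standard_normal(40)

def residuals(params):
    a, b = params
    return y - a * np.exp(-b * x)   # leastsq minimizes sum(residuals**2)

params_est, ier = leastsq(residuals, x0=[1.0, 1.0])
```

Unlike curve_fit, leastsq returns the solution and an integer status flag (1 through 4 indicate success) rather than a covariance estimate by default.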
21. Optimization: Data Fitting (non-linear least-squares curve fitting)

    ```python
    >>> from scipy.optimize import curve_fit
    # Define the function to fit.
    >>> def function(x, a, b, f, phi):
    ...     result = a * exp(-b * sin(f * x + phi))
    ...     return result
    # Create a noisy data set.
    >>> actual_params = [3, 2, 1, pi/4]
    >>> x = linspace(0, 2*pi, 25)
    >>> exact = function(x, *actual_params)
    >>> noisy = exact + 0.3 * randn(len(x))
    # Use curve_fit to estimate the function parameters from the noisy data.
    >>> initial_guess = [1, 1, 1, 1]
    >>> estimated_params, err_est = curve_fit(function, x, noisy, p0=initial_guess)
    >>> estimated_params
    array([3.1705, 1.9501, 1.0206, 0.7034])
    # err_est is an estimate of the covariance matrix of the estimates
    # (i.e. how good of a fit it is).
    ```
22. StatsModels (economists): Josef Perktold, Assistant Professor, University of Chicago, Chicago, IL; Skipper Seabold, PhD Student, American University, Washington, D.C.
23. GUI example: astropysics (with TraitsUI). Erik J. Tollerud, PhD Student, UC Irvine Center for Cosmology, Irvine, CA. http://www.physics.uci.edu/~etolleru/
24. Scientific Python Classes (http://www.enthought.com/training): Sept 21-25, Austin; Oct 19-22, Silicon Valley; Nov 9-12, Chicago; Dec 7-11, Austin