  1. Data analysis and its applications on asteroseismology. Olga Moreira, April 2005. DEA en Sciences “Astérosismologie”, lectured by Anne Thoul.
  2. Outline
      Part I, Principles of data analysis:
      • Introduction
      • Merit functions and parameter fitting: Maximum Likelihood Estimator
      • Maximization/minimization problem: ordinary methods, exotic methods
      • Goodness-of-fit: chi-square test, K-S test
      • The beauty of synthetic data: Monte-Carlo simulations, Hare-and-Hounds game
      Part II, Introduction to spectral analysis:
      • Fourier analysis: Fourier transform, power spectrum estimation
      • Deconvolution analysis: CLEAN, all poles
      • Phase dispersion minimization: period search
      • Wavelet analysis: wavelet transform and its applications
  3. Part I: Principles of data analysis
  4. Introduction
  5. What do you think of when someone says “data”? (Roxbourg & Paternó, Eddington Workshop, Italy)
  10. What do all those definitions of data have in common? (Diagram labels: incomplete information, probability theory, inferences; data, tools, analysis.)
  11. Analysis method: merit function, best fit, goodness-of-fit.
  12. Analysis method. A complete analysis should provide:
      • Parameters;
      • Error estimates on the parameters;
      • A statistical measure of the goodness-of-fit.
      Ignoring the third step will have drastic consequences.
  13. Merit functions and parameter fitting
  14. Maximum Likelihood Estimators (MLE)
      $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$: set of parameters
      $x = (x_1, x_2, \ldots, x_N)$: set of random variables
      $f(x, \lambda)$: probability distribution characterized by $\lambda$ and $x$
      The a posteriori probability of a single measurement is given by
      $$P(x_i) = f(x_i, \lambda).$$
      If the $x_i$ are independent and identically distributed (i.i.d.), the joint probability function becomes
      $$P(x) = \prod_{i=1}^{N} f(x_i, \lambda),$$
      where $L(\lambda) = \prod_{i=1}^{N} f(x_i, \lambda)$ is defined as the likelihood.
      • The best fit of the parameters is the one that maximizes the likelihood.
  15. It is common to find $\ell$ referred to as the likelihood, but it is in fact just the logarithm of the likelihood, which is easier to work with:
      $$\ell = \ln L = \sum_{i=1}^{N} \ln f(x_i, \lambda) \quad\text{or}\quad -\ell = -\ln L = -\sum_{i=1}^{N} \ln f(x_i, \lambda)$$
      The a posteriori probability is the probability after the event. Under no circumstances should the likelihood be confused with a probability density.
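The maximization described above can be sketched numerically. The example below is only an illustration under stated assumptions: it takes an exponential distribution $f(x, \lambda) = e^{-x/\lambda}/\lambda$ (not used in the slides) and finds the $\lambda$ that minimizes $-\ln L$ by a plain grid scan. The names `neg_log_likelihood` and `mle_grid` are hypothetical.

```python
import math
import random


def neg_log_likelihood(lam, data):
    """-ln L for i.i.d. exponential data with f(x, lam) = exp(-x / lam) / lam."""
    return sum(math.log(lam) + x / lam for x in data)


def mle_grid(data, lo, hi, steps=2000):
    """Scan a uniform grid on [lo, hi] and return the lam with the smallest -ln L."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda lam: neg_log_likelihood(lam, data))


random.seed(1)  # fixed seed so the illustration is reproducible
data = [random.expovariate(1.0 / 3.0) for _ in range(500)]  # true mean is 3
lam_hat = mle_grid(data, 0.5, 10.0)
sample_mean = sum(data) / len(data)
print(lam_hat, sample_mean)
```

For this particular distribution the maximum of the likelihood is known analytically (the sample mean), so the grid result can be checked against it.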
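  16. Error estimate. Three shapes of the likelihood as a function of $\lambda$:
      • Gaussian shape: $\lambda = \hat\lambda \pm \Delta\lambda$, with $\Delta\lambda \approx \sigma$.
      • Non-Gaussian shape with a single extreme: no problem in determining the maximum, although estimating the error bars can be difficult.
      • Non-Gaussian shape with several local extremes: problems in determining the maximum and in estimating the error bars.
  17. Estimator: desirable properties
      • Unbiased: $b(\hat\lambda) = E[\hat\lambda] - \lambda = 0$
      • Minimum variance: $\sigma^2(\hat\lambda) \to \min$
      Information inequality (Cramér-Rao inequality), with $\ell = \ln L$ the log-likelihood:
      $$\sigma^2(\hat\lambda) \ge \frac{[1 + b'(\lambda)]^2}{I(\lambda)}, \qquad I(\lambda) = E\left[\ell'(\lambda)^2\right] = -E\left[\ell''(\lambda)\right]$$
      For an unbiased estimator $b'(\lambda) = 0$, so $\sigma^2(\hat\lambda) \ge 1/I(\lambda)$.
      • The larger the information, the smaller the variance.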
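  18. MLE asymptotically unbiased
      Expanding the derivative of the log-likelihood $\ell = \ln L$ around the estimate $\hat\lambda$:
      $$\ell'(\lambda) = \ell'(\hat\lambda) + (\lambda - \hat\lambda)\,\ell''(\hat\lambda) + \ldots$$
      Neglecting the higher orders and taking $N \to \infty$, with $\ell'(\hat\lambda) = 0$ and $\ell''(\hat\lambda) \to -I(\hat\lambda) = -1/\sigma^2(\hat\lambda)$:
      $$\ell'(\lambda) = -\frac{\lambda - \hat\lambda}{\sigma^2(\hat\lambda)}, \qquad \ell(\lambda) = -\frac{(\lambda - \hat\lambda)^2}{2\,\sigma^2(\hat\lambda)} + \text{const}, \qquad L(\lambda) \propto \exp\left[-\frac{(\lambda - \hat\lambda)^2}{2\,\sigma^2(\hat\lambda)}\right]$$
      The likelihood function has the form of a normal distribution with $\sigma^2(\hat\lambda) = 1/I(\hat\lambda)$, and $\lambda = \hat\lambda \pm \sigma(\hat\lambda)$.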
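As a rough numerical check of the relation between curvature and error bar, the sketch below estimates $\sigma(\hat\lambda) = [-\ell''(\hat\lambda)]^{-1/2}$ with a finite-difference second derivative. It assumes, purely for illustration, exponentially distributed data with mean $\lambda$, for which the analytic result is $\hat\lambda/\sqrt{N}$. The function names and the step size are hypothetical choices.

```python
import math
import random


def log_likelihood(lam, data):
    """ln L for i.i.d. exponential data with mean lam (illustrative model)."""
    return -sum(math.log(lam) + x / lam for x in data)


def sigma_from_curvature(data, lam_hat, rel_step=1e-3):
    """Error bar from the curvature of ln L at its maximum (central difference)."""
    h = rel_step * lam_hat
    second = (log_likelihood(lam_hat + h, data)
              - 2.0 * log_likelihood(lam_hat, data)
              + log_likelihood(lam_hat - h, data)) / h ** 2
    return 1.0 / math.sqrt(-second)


random.seed(2)
data = [random.expovariate(1.0 / 3.0) for _ in range(400)]
lam_hat = sum(data) / len(data)          # analytic MLE for this model
sigma_numeric = sigma_from_curvature(data, lam_hat)
sigma_analytic = lam_hat / math.sqrt(len(data))
print(sigma_numeric, sigma_analytic)
```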
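  19. In multi-dimensions: $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$, with log-likelihood
      $$\ell(\lambda) = \sum_{i=1}^{N} \ln f(x_i, \lambda).$$
      Expanding around the estimate $\hat\lambda$ (derivatives evaluated at $\hat\lambda$):
      $$\ell(\lambda) = \ell(\hat\lambda) + \sum_{k=1}^{p} \frac{\partial \ell}{\partial\lambda_k}(\lambda_k - \hat\lambda_k) + \frac{1}{2}\sum_{k=1}^{p}\sum_{l=1}^{p} \frac{\partial^2 \ell}{\partial\lambda_k\,\partial\lambda_l}(\lambda_k - \hat\lambda_k)(\lambda_l - \hat\lambda_l) + \ldots$$
      The first derivatives vanish at $\hat\lambda$, so for $N \to \infty$
      $$\ell(\lambda) = \ell(\hat\lambda) - \frac{1}{2}\sum_{k=1}^{p}\sum_{l=1}^{p} A_{kl}\,(\lambda_k - \hat\lambda_k)(\lambda_l - \hat\lambda_l) + \ldots, \qquad A_{kl} = -E\left[\frac{\partial^2 \ell}{\partial\lambda_k\,\partial\lambda_l}\right] \quad \text{(Hessian matrix)}$$
      $$L(\lambda) \propto \exp\left[-\frac{1}{2}(\lambda - \hat\lambda)^T A\,(\lambda - \hat\lambda)\right] \quad \text{(multivariate Gaussian distribution)}$$
      The covariance matrix is $C = A^{-1}$, with $\sigma^2(\lambda_k) = C_{kk}$, $\mathrm{cov}(\lambda_k, \lambda_l) = C_{kl}$ and correlation coefficient $\rho(\lambda_k, \lambda_l) = \mathrm{cov}(\lambda_k, \lambda_l) / [\sigma(\lambda_k)\,\sigma(\lambda_l)]$.
  20. If $\mathrm{cov}(\lambda_k, \lambda_l) = 0$ for $k \ne l$, then $\lambda_k \approx \hat\lambda_k \pm \sigma(\lambda_k)$ for $k = 1, \ldots, p$.
      If $\mathrm{cov}(\lambda_k, \lambda_l) \ne 0$ for $k \ne l$, there is an error region defined not only by $\sigma(\lambda_k)$ but by the complete covariance matrix. For instance, in 2D the error region is an ellipse.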
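  21. Least-squares and chi-square fit
      1. Consider measurements $y_i$ with errors that are independently and normally distributed around the true value $y(x_i)$.
      2. The standard deviations $\sigma$ are the same for all points.
      Then the joint probability for $i = 1, \ldots, N$ is given by
      $$P \propto \prod_{i=1}^{N} \exp\left[-\frac{1}{2}\left(\frac{y_i - y(x_i)}{\sigma}\right)^2\right] \Delta y$$
      Maximizing $P$ is the same as minimizing
      $$\sum_{i=1}^{N} \frac{[y_i - y(x_i)]^2}{2\sigma^2}$$
      The least-squares fit is the MLE of the fitted parameters if the measurement errors are independent and normally distributed.
      3. If the deviations are different ($\sigma_i \ne \sigma_j$), then one minimizes
      $$\chi^2 = \sum_{i=1}^{N} \left(\frac{y_i - y(x_i)}{\sigma_i}\right)^2$$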
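A minimal sketch of a chi-square fit, assuming the simplest possible model, a straight line $y(x) = a + b\,x$ (the slides do not fix a model). It returns the best-fit parameters, the $\chi^2$ value and the covariance matrix of $(a, b)$ from the standard closed-form normal equations. `fit_line` and the sample numbers are hypothetical.

```python
def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, chi2, cov)."""
    w = [1.0 / s ** 2 for s in sigma]            # weights 1 / sigma_i^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    chi2 = sum(((yi - a - b * xi) / si) ** 2 for xi, yi, si in zip(x, y, sigma))
    # Covariance matrix of (a, b): inverse of the curvature matrix.
    cov = [[Sxx / delta, -Sx / delta], [-Sx / delta, S / delta]]
    return a, b, chi2, cov


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
sigma = [0.2, 0.2, 0.3, 0.2, 0.3]
a, b, chi2, cov = fit_line(x, y, sigma)
print(a, b, chi2, cov)
```

The off-diagonal element of `cov` is the covariance between intercept and slope, which is what turns the error region into a tilted ellipse.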
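  22. Limitations:
      • Real data most of the time violate the i.i.d. condition.
      • Sometimes one has a limited sample.
      • In practice $L$ depends on the behaviour of $\lambda$.
      The MLE grants a unique solution. For a reparametrization $\alpha = \alpha(\lambda)$:
      $$\frac{\partial L}{\partial\lambda} = \frac{\partial L}{\partial\alpha}\cdot\frac{\partial\alpha}{\partial\lambda} = 0 \;\Rightarrow\; \frac{\partial L}{\partial\alpha} = 0$$
      But the uncertainty of an estimate depends on the specific choice of $\lambda$:
      $$P(\lambda_1 < \lambda < \lambda_2) = \frac{\int_{\lambda_1}^{\lambda_2} L(\lambda)\,d\lambda}{\int_{-\infty}^{+\infty} L(\lambda)\,d\lambda} = \frac{\int_{\alpha_1}^{\alpha_2} L(\alpha)\,\frac{\partial\lambda}{\partial\alpha}\,d\alpha}{\int_{-\infty}^{+\infty} L(\alpha)\,\frac{\partial\lambda}{\partial\alpha}\,d\alpha} \ne \frac{\int_{\alpha_1}^{\alpha_2} L(\alpha)\,d\alpha}{\int_{-\infty}^{+\infty} L(\alpha)\,d\alpha}$$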
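  23. Example: stochastically excited modes
      The observed power spectrum is the limit spectrum multiplied by a $\chi^2$ distribution: $P(\nu) = S(\nu)\,\chi^2(\nu)$.
      For a single mode, with parameters $\lambda = (A, \Gamma, \nu_0, B)$:
      $$S(\nu) = \frac{A\,\Gamma^2}{(\nu - \nu_0)^2 + \Gamma^2} + B$$
      $$f(P(\nu_i), \lambda) = \frac{1}{S(\nu_i, \lambda)} \exp\left[-\frac{P(\nu_i)}{S(\nu_i, \lambda)}\right]$$
      $$-\ln L = \sum_{i=1}^{N} \left[\ln S(\nu_i, \lambda) + \frac{P(\nu_i)}{S(\nu_i, \lambda)}\right]$$
      Minimization: $\lambda = (A, \Gamma, \nu_0, B) \to \hat\lambda = (\hat A, \hat\Gamma, \hat\nu_0, \hat B)$.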
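The merit function for a stochastically excited mode can be sketched as follows. This is an illustration only: the Lorentzian parametrization in `limit_spectrum`, the parameter values and the frequency grid are assumptions, and the synthetic power values are drawn with exponential statistics (a $\chi^2$ distribution with two degrees of freedom, scaled to unit mean). A real fit would pass `neg_log_likelihood` to a minimizer; here it is only compared at the true parameters and at a deliberately wrong set.

```python
import math
import random


def limit_spectrum(nu, amp, gamma, nu0, bg):
    """Assumed Lorentzian profile plus flat background."""
    return amp * gamma ** 2 / ((nu - nu0) ** 2 + gamma ** 2) + bg


def neg_log_likelihood(params, freqs, power):
    """-ln L = sum of ln S(nu_i) + P(nu_i) / S(nu_i)."""
    total = 0.0
    for nu, p in zip(freqs, power):
        s = limit_spectrum(nu, *params)
        total += math.log(s) + p / s
    return total


random.seed(3)
true_params = (10.0, 0.5, 50.0, 1.0)            # amp, gamma, nu0, background
freqs = [40.0 + 0.01 * k for k in range(2000)]
# Each bin: limit spectrum times an exponential deviate with unit mean.
power = [limit_spectrum(nu, *true_params) * random.expovariate(1.0) for nu in freqs]

s_true = neg_log_likelihood(true_params, freqs, power)
s_wrong = neg_log_likelihood((20.0, 0.5, 50.0, 2.0), freqs, power)
print(s_true, s_wrong)
```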
  24. Maximization/minimization problem
  25. Going “downhill” methods
      Finding a global extreme is in general very difficult.
      For one-dimensional minimization there are usually two types of methods:
      • Methods that bracket the minimum: golden section search and parabolic interpolation (Brent's method).
      • Methods that use first-derivative information.
      In multiple dimensions there are three kinds of methods:
      • Direction-set methods. Powell's method is the prototype.
      • Downhill simplex method.
      • Methods that use gradient information.
      (Figures: Press et al. (1992); adapted from Press et al. (1992).)
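The bracketing idea can be illustrated with a short golden section search. This is a minimal sketch, assuming a function with a single minimum inside the starting interval; `golden_section_min` and the tolerance are hypothetical choices, not the routine from Press et al. (1992).

```python
import math


def golden_section_min(f, a, b, tol=1e-8):
    """Shrink the bracket [a, b] around a minimum of a unimodal function f."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0        # 1 / golden ratio
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


x_min = golden_section_min(lambda x: (x - 2.0) ** 2 + 1.0, 0.0, 5.0)
print(x_min)
```

Each iteration reuses one of the two interior function values, so only one new evaluation is needed per step.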
  26. Falling in the wrong valley
      The downhill methods lack efficiency and robustness. For instance, the simplex method can be very fast for some functions and very slow for others.
      They depend on a priori knowledge of the overall structure of the vector space, and require repeated manual intervention.
      If the function to minimize is not well known, then, numerically speaking, even a smooth hill can become a headache.
      They also don't solve the famous combinatorial analysis problem: the traveling salesman problem.
  27. Exotic methods
      Solving “the traveling salesman problem”: a salesman has to visit each city on a given list; knowing the distances between all cities, he tries to minimize the length of his tour.
      Methods available:
      • Simulated annealing: based on an analogy with thermodynamics.
      • Genetic algorithms: based on an analogy with evolutionary selection rules.
      • Nearest neighbor.
      • Neural networks: based on the observation of biological neural networks (brains).
      • Knowledge-based systems, etc.
      (Adapted from Charbonneau (1995).)
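A small simulated annealing sketch for the traveling salesman problem follows. All design choices here are assumptions made for illustration: the segment-reversal move, the geometric cooling schedule, the number of steps and the toy set of cities on a circle. `anneal` and `tour_length` are hypothetical names.

```python
import math
import random


def tour_length(tour, cities):
    """Length of the closed tour visiting the cities in the given order."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = cities[tour[i]]
        x2, y2 = cities[tour[(i + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def anneal(cities, n_steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing with segment reversals; returns the best tour found."""
    rng = random.Random(seed)
    n = len(cities)
    tour = list(range(n))
    rng.shuffle(tour)
    length = tour_length(tour, cities)
    best_tour, best_len = tour[:], length
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / n_steps)
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = tour_length(cand, cities)
        delta = cand_len - length
        # Always accept improvements; accept worse tours with Boltzmann probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            tour, length = cand, cand_len
            if length < best_len:
                best_tour, best_len = tour[:], length
    return best_tour, best_len


cities = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]
best_tour, best_len = anneal(cities)
print(best_len)
```

Accepting some uphill moves at high temperature is what lets the method leave a wrong valley, at the cost of many more function evaluations than a downhill method.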
  28. Goodness-of-fit
  29. Chi-square test
      $$\chi^2 = \sum_{i} \frac{(N_i - n_i)^2}{n_i}$$
      • $N_i$ is the number of events observed in the $i$th bin.
      • $n_i$ is the number expected according to some known distribution.
      $H_0$: the data follow a specified distribution.
      The significance level is determined by $Q(\chi^2 \mid \nu)$, where $\nu$ is the number of degrees of freedom.
      Normally, acceptable models have $Q > 0.001$, but day in and day out we find accepted models with values of $Q$ outside this range.
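The test can be sketched in a few lines. The sketch assumes that $Q(\chi^2 \mid \nu)$ is the upper tail of the chi-square distribution, computed here from a series expansion of the regularized incomplete gamma function, and that the number of degrees of freedom is the number of bins minus one (expected counts normalized to the observed total). The bin counts are made up for the example.

```python
import math


def chi_square(observed, expected):
    """Chi-square statistic for binned counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def gamma_p(a, x, max_terms=500, eps=1e-12):
    """Regularized lower incomplete gamma function P(a, x), series expansion."""
    if x <= 0.0:
        return 0.0
    ap = a
    term = 1.0 / a
    total = term
    for _ in range(max_terms):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi_square_q(chi2, dof):
    """Probability that chi-square with dof degrees of freedom exceeds chi2."""
    return 1.0 - gamma_p(0.5 * dof, 0.5 * chi2)


observed = [18, 22, 20, 25, 15]
expected = [20.0] * 5
chi2 = chi_square(observed, expected)
q = chi_square_q(chi2, len(observed) - 1)   # assumed: dof = bins - 1
print(chi2, q)
```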
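  30. Kolmogorov-Smirnov (K-S) test
      • $S_N(x)$: cumulative distribution of the data
      • $P(x)$: known cumulative distribution
      • $D$: maximum absolute difference between the two cumulative functions,
      $$D = \max_{-\infty < x < +\infty} \left|S_N(x) - P(x)\right|$$
      The significance of an observed value of $D$ is given approximately by
      $$\mathrm{Probability}(D > \text{observed}) = Q_{KS}\left(\left[\sqrt{N} + 0.12 + \frac{0.11}{\sqrt{N}}\right] D\right), \qquad Q_{KS}(\lambda) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2}$$
      $Q_{KS}$ is a monotonic function with limiting values:
      • $Q_{KS}(0) = 1$: largest agreement
      • $Q_{KS}(\infty) = 0$: smallest agreement
      (Figure: Press et al. (1992).)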
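A compact sketch of the K-S statistic and its approximate significance, following the formulas above. The three-point sample and the uniform reference distribution are made up for illustration; the truncation of the series at 100 terms and the fallback value of 1 when it does not converge are assumptions of this sketch.

```python
import math


def ks_statistic(sample, cdf):
    """Maximum distance D between the empirical CDF and a known CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        p = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(i / n - p), abs((i + 1) / n - p))
    return d


def q_ks(lam, max_terms=100, eps=1e-10):
    """Q_KS(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)."""
    total = 0.0
    sign = 1.0
    for j in range(1, max_terms + 1):
        term = 2.0 * sign * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < eps:
            return total
        sign = -sign
    return 1.0  # series did not converge: lam is close to zero


def ks_significance(d, n):
    """Approximate probability of a D larger than the observed one."""
    rt = math.sqrt(n)
    return q_ks((rt + 0.12 + 0.11 / rt) * d)


sample = [0.1, 0.4, 0.7]
d = ks_statistic(sample, lambda x: min(max(x, 0.0), 1.0))  # uniform on [0, 1]
print(d, ks_significance(d, len(sample)))
```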
  31. Synthetic data
  32. Monte-Carlo simulations
      If one knows something about the process that generated the data, then, given an assumed set of parameters $\lambda$, one can figure out how to simulate sets of “synthetic” realizations of these parameters. The procedure is to draw random numbers from appropriate distributions so as to mimic our best understanding of the underlying processes and measurement errors.
      (Figure: Stello et al. (2004), ξ Hya.)
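The procedure can be sketched as follows, under simplifying assumptions chosen only for illustration: the “process” is an exponential distribution with an assumed mean, the estimator is the sample mean, and the scatter of the estimates over many synthetic data sets is taken as the error bar. `simulate_estimates` is a hypothetical name.

```python
import math
import random


def simulate_estimates(lam, n_points, n_sets, seed=0):
    """Draw n_sets synthetic data sets and return the estimate from each one."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sets):
        synthetic = [rng.expovariate(1.0 / lam) for _ in range(n_points)]
        estimates.append(sum(synthetic) / n_points)
    return estimates


def mean_and_std(values):
    """Sample mean and standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)


estimates = simulate_estimates(lam=3.0, n_points=100, n_sets=2000)
mc_mean, mc_std = mean_and_std(estimates)
expected_std = 3.0 / math.sqrt(100)   # analytic scatter for this toy model
print(mc_mean, mc_std, expected_std)
```

In a real application the analytic scatter is usually not available, which is exactly when the simulated distribution of estimates becomes useful.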
  33. Hare-and-Hounds game
      • Team A generates theoretical mode frequencies and synthetic time series.
      • Team B analyses the time series, performs the mode identification and fitting, and does the structure inversion.
      Rules: the teams only have access to the time series. Nothing else is allowed.
  34. End of Part I. Options available: • Questions • Coffee break • “Get on with it!!!”
  35. Part II: Introduction to spectral analysis
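  36. Fourier transform
      $$F(\nu) = \mathrm{FT}[f(t)] = \int_{-\infty}^{+\infty} f(t)\, e^{i 2\pi\nu t}\, dt$$
      Properties:
      • Linearity: $\mathrm{FT}[f(t) + g(t)] = F(\nu) + G(\nu)$
      • Convolution: $h(t) = \int_{-\infty}^{+\infty} f(\tau)\, g(t - \tau)\, d\tau \;\Leftrightarrow\; \mathrm{FT}[h(t)] = F(\nu)\cdot G(\nu)$, that is, $\mathrm{FT}[f(t) \otimes g(t)] = F(\nu)\cdot G(\nu)$
      Parseval's theorem: the power of a signal represented by $f(t)$ is the same whether computed in time space or in frequency space:
      $$\int_{-\infty}^{+\infty} |f(t)|^2\, dt = \int_{-\infty}^{+\infty} |F(\nu)|^2\, d\nu$$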
  37. Sampling theorem
      (Figure adapted from Bracewell (1986): a signal, the sampling function and their transforms.)
      For a band-limited signal $f(t)$, which has no components above the frequency $\nu_c$, the sampling theorem states that a real signal can be reconstructed without error from samples taken uniformly at $\nu > 2\nu_c$. The minimum sampling frequency, $2\nu_c$, is called the Nyquist rate; it corresponds to the sampling interval $\Delta t = 1/(2\nu_c)$ (where $t_n = n\,\Delta t$), and $\nu_c = 1/(2\Delta t)$ is then the Nyquist frequency.
  38. Undersampling
      The sampling theorem assumes that a signal is limited in frequency, but in practice the signal is limited in time. For $\Delta t > 1/(2\nu_c)$ the signal is undersampled. Overlapping tails appear in the spectrum: spectrum aliasing.
      (Figure adapted from Bracewell (1986): spectrum and alias spectrum.)
      Aliasing: examining the terms of the undersampled Fourier transform (FT) (Bracewell (1986)):
      • The undersampled FT is evener than the complete FT; as a consequence the sampling procedure discriminates the zero components at $\nu = \nu_c$.
      • There is a leakage of the high frequencies (aliasing).
  39. Discrete Fourier transform
      $$F(\nu_n) = \sum_{k=0}^{N-1} f(t_k)\, e^{i 2\pi \nu_n t_k}, \qquad t_k = k\,\delta, \qquad \nu_n = \frac{n}{N\delta}$$
      Discrete form of Parseval's theorem:
      $$\sum_{k=0}^{N-1} |f(t_k)|^2 = \frac{1}{N} \sum_{n=0}^{N-1} |F(\nu_n)|^2$$
      Fast Fourier Transform (FFT): the FFT is a discrete Fourier transform algorithm which reduces the number of computations needed for $N$ points from $2N^2$ to $2N \log_2 N$. This is done by means of the Danielson-Lanczos lemma, whose basic idea is to break a transform of length $N$ into 2 transforms of length $N/2$.
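The splitting into two half-length transforms can be sketched with a short recursive implementation. This is an illustrative version, not an optimized FFT: it assumes the input length is a power of two and uses the positive-exponent sign convention. `dft` is the direct sum, kept for comparison.

```python
import cmath


def dft(f):
    """Direct discrete Fourier transform, O(N^2) operations."""
    n = len(f)
    return [sum(f[k] * cmath.exp(2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def fft(f):
    """Recursive radix-2 FFT (Danielson-Lanczos); len(f) must be a power of two."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    even = fft(f[0::2])   # transform of the even-indexed samples
    odd = fft(f[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for m in range(n // 2):
        w = cmath.exp(2j * cmath.pi * m / n)
        out[m] = even[m] + w * odd[m]
        out[m + n // 2] = even[m] - w * odd[m]
    return out


f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.0]
F = fft(f)
power_time = sum(abs(v) ** 2 for v in f)
power_freq = sum(abs(v) ** 2 for v in F) / len(f)
print(power_time, power_freq)   # equal, by the discrete Parseval theorem
```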
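  40. Power spectrum estimation
      Periodogram:
      $$P(\nu) = \frac{1}{N}\,|F(\nu)|^2 = \frac{1}{N}\left|\sum_{k=1}^{N} f(t_k)\, e^{i 2\pi\nu t_k}\right|^2 = \frac{1}{N}\left[\left(\sum_{k=1}^{N} f(t_k)\cos 2\pi\nu t_k\right)^2 + \left(\sum_{k=1}^{N} f(t_k)\sin 2\pi\nu t_k\right)^2\right]$$
      If $f(t)$ contains a periodic signal, i.e. $f(t_k) = g(t_k) + n(t_k)$, where $n$ is random noise and $g(t) = A\cos(2\pi\nu_0 t + \varphi)$, then at $\nu = \nu_0$ there is a large contribution to the sum; for other values the terms in the sum will be randomly negative and positive, yielding a small contribution. Thus a peak in the periodogram reveals the existence of an embedded periodic signal.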
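A minimal sketch of this peak search, assuming evenly sampled data, a unit-amplitude cosine at a made-up frequency of 0.1, Gaussian noise of unit variance and a simple frequency grid (all of these are illustrative choices):

```python
import math
import random


def periodogram(times, values, nu):
    """Classical periodogram at frequency nu: (C^2 + S^2) / N."""
    c = sum(v * math.cos(2 * math.pi * nu * t) for t, v in zip(times, values))
    s = sum(v * math.sin(2 * math.pi * nu * t) for t, v in zip(times, values))
    return (c * c + s * s) / len(values)


random.seed(4)
n, nu0 = 200, 0.1
times = [float(k) for k in range(n)]
values = [math.cos(2 * math.pi * nu0 * t + 0.3) + random.gauss(0.0, 1.0)
          for t in times]

freqs = [k / 1000.0 for k in range(1, 500)]   # up to just below 0.5
powers = [periodogram(times, values, nu) for nu in freqs]
peak_nu = freqs[powers.index(max(powers))]
print(peak_nu)
```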
  41. Frequency leakage:
      • Leakage from nearby frequencies, usually described as a spectral window; it is primarily a product of the finite length of the data.
      • Leakage from high frequencies, due to data sampling: the aforementioned aliasing.
      Tapering functions, sometimes also called data windowing, try to smooth the leakage between frequencies by bringing the interference slowly back to zero. The main goal is to narrow the peak and make the side lobes vanish. In certain cases smoothing can mean a loss of information.
      (Figures: Press et al. (1992).)
  42. Further complications
      • Closely spaced frequencies: a direct contribution from the first aforementioned leakage.
      $$\left|\mathrm{FT}[f_1 + f_2]\right|^2 = |F_1(\nu) + F_2(\nu)|^2 = |F_1(\nu)|^2 + |F_2(\nu)|^2 + 2\,\mathrm{Re}\left[F_1(\nu)\,F_2^*(\nu)\right]$$
      • Damping: $f(t) = A\cos(2\pi\nu_0 t - \varphi)\,e^{-\eta t}$. The peak in the power spectrum will have a Lorentzian profile.
  43. Power spectrum of random noise
      For $f(t) = g(t) + n(t)$, the spectral density of the noise $n$ is estimated from its autocovariance $\gamma_n$:
      $$\rho(\nu) = \sum_{k} \gamma_n(k)\, e^{i 2\pi\nu k} = \gamma_n(0)$$
      Thus $P_n(\nu) = \sigma_n^2$: no matter how much one increases the number of points $N$, the signal-to-noise will tend to be constant.
      For unevenly spaced data (missing data) this result isn't always valid; indeed it is only valid for homogeneous white noise (independent and identically distributed normal random variables).
  44. Filling gaps
      The unevenly spaced data problem can be solved by (a few suggestions):
      • Finding a way to reduce the unevenly spaced sample to an evenly spaced one. Basic idea: interpolation of the missing points (problem: it doesn't work for long gaps).
      • Using the Lomb-Scargle periodogram.
      • Doing a deconvolution analysis (filters).
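  45. Lomb-Scargle periodogram
      $$P(\nu) = \frac{1}{2\sigma^2}\left\{\frac{\left[\sum_{k}(f_k - \bar f)\cos 2\pi\nu(t_k - \tau)\right]^2}{\sum_{k}\cos^2 2\pi\nu(t_k - \tau)} + \frac{\left[\sum_{k}(f_k - \bar f)\sin 2\pi\nu(t_k - \tau)\right]^2}{\sum_{k}\sin^2 2\pi\nu(t_k - \tau)}\right\}, \qquad \tan(4\pi\nu\tau) = \frac{\sum_{k}\sin 4\pi\nu t_k}{\sum_{k}\cos 4\pi\nu t_k}$$
      • It is like weighting the data on a “per point” basis instead of on a “per time interval” basis, which makes it independent of sampling irregularity.
      • It has an exponential probability distribution with unit mean, which means one can establish a false-alarm probability of the null hypothesis (significance level):
      $$P(> z) = 1 - \left(1 - e^{-z}\right)^M \approx M e^{-z}$$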
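The formula above translates almost line by line into code. The sketch below is a direct, unoptimized evaluation; the random observation times, the test signal at frequency 0.2 and the use of the number of trial frequencies as $M$ in the false-alarm probability are all assumptions made for the example.

```python
import math
import random


def lomb_scargle(times, values, nu):
    """Normalized Lomb-Scargle power at frequency nu (nu > 0)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    omega = 2.0 * math.pi * nu
    # Time offset tau defined by tan(2 omega tau) = sum sin / sum cos.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    yc = ys = cc = ss = 0.0
    for t, v in zip(times, values):
        c = math.cos(omega * (t - tau))
        s = math.sin(omega * (t - tau))
        yc += (v - mean) * c
        ys += (v - mean) * s
        cc += c * c
        ss += s * s
    return (yc * yc / cc + ys * ys / ss) / (2.0 * var)


def false_alarm(z, m):
    """Probability that the highest of m independent noise peaks exceeds z."""
    return 1.0 - (1.0 - math.exp(-z)) ** m


random.seed(5)
times = sorted(random.uniform(0.0, 100.0) for _ in range(100))  # uneven sampling
values = [math.sin(2 * math.pi * 0.2 * t) + random.gauss(0.0, 0.5) for t in times]

freqs = [0.001 * k for k in range(1, 501)]
powers = [lomb_scargle(times, values, nu) for nu in freqs]
z_max = max(powers)
peak_nu = freqs[powers.index(z_max)]
fap = false_alarm(z_max, len(freqs))   # M taken as the number of trial frequencies
print(peak_nu, z_max, fap)
```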
  46. Deconvolution analysis
  47. Deconvolution
      $$F(\nu) \otimes S(\nu) = G(\nu) + \varepsilon(\nu)$$
      where $G(\nu)$ is the signal and $\varepsilon(\nu)$ the noise.
      • Linear algorithms: inverse filtering or Wiener filtering. They are inapplicable to incomplete sampling (irregular sampling) of spatial frequency.
      • Non-linear algorithms: CLEAN, all poles.
      • Problem: the deconvolution usually does not have a unique solution.
  48. Hogbom CLEAN algorithm
      The first CLEAN method was developed by Hogbom (1974). It constructs discrete approximations $I_n$ of the clean map from the convolution equation
      $$B \otimes I = D,$$
      where $D$ is the dirty map and $B$ the dirty beam. Starting with $I_0 = 0$, it searches for the largest value in the residual map:
      $$R_n = D - B \otimes I_{n-1}$$
      After locating the largest residual of a given amplitude, it subtracts it from the current map to yield the next residual map. The iteration continues until the root-mean-square (RMS) decreases to some level. Each subtracted location is saved in the so-called CLEAN map. The resulting final residual map is assumed to be mainly noise.
  49. CLEAN algorithm
      The basic steps of the CLEAN algorithm used in asteroseismology are:
      1. Compute the power spectrum of the signal and identify the dominant period.
      2. Perform a least-squares fit to the data to obtain the amplitude and phase of the identified mode.
      3. Construct the time series corresponding to that single mode and subtract it from the original signal to obtain a new signal.
      4. Repeat all steps until all that is left is noise.
      Stello et al. (2004) proposed an improvement to this algorithm: after subtracting a frequency, it recalculates the amplitudes, phases and frequencies of the previously subtracted peaks while fixing the frequency of the latest extracted peak.
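The four basic steps can be sketched as an iterative sine-wave subtraction. This is a simplified illustration, not the Stello et al. (2004) variant: the loop stops after a fixed number of modes instead of testing whether the residual is noise, the frequency is taken from a fixed grid, and the two-mode test signal is made up. `clean`, `power` and `fit_sinusoid` are hypothetical names.

```python
import math
import random


def power(times, values, nu):
    """Periodogram power at frequency nu."""
    c = sum(v * math.cos(2 * math.pi * nu * t) for t, v in zip(times, values))
    s = sum(v * math.sin(2 * math.pi * nu * t) for t, v in zip(times, values))
    return (c * c + s * s) / len(values)


def fit_sinusoid(times, values, nu):
    """Linear least squares for a*cos(2 pi nu t) + b*sin(2 pi nu t)."""
    cc = cs = ss = yc = ys = 0.0
    for t, v in zip(times, values):
        c = math.cos(2 * math.pi * nu * t)
        s = math.sin(2 * math.pi * nu * t)
        cc += c * c
        cs += c * s
        ss += s * s
        yc += v * c
        ys += v * s
    det = cc * ss - cs * cs
    return (yc * ss - ys * cs) / det, (ys * cc - yc * cs) / det


def clean(times, values, freqs, n_modes):
    """Extract n_modes sinusoids one at a time; returns (modes, residual)."""
    residual = list(values)
    modes = []
    for _ in range(n_modes):
        nu = max(freqs, key=lambda f: power(times, residual, f))   # step 1
        a, b = fit_sinusoid(times, residual, nu)                   # step 2
        residual = [v - a * math.cos(2 * math.pi * nu * t)         # step 3
                    - b * math.sin(2 * math.pi * nu * t)
                    for t, v in zip(times, residual)]
        # a*cos(x) + b*sin(x) = A*cos(x + phi), A = hypot(a, b), phi = atan2(-b, a)
        modes.append((nu, math.hypot(a, b), math.atan2(-b, a)))
    return modes, residual


random.seed(7)
times = [float(k) for k in range(300)]
values = [1.0 * math.cos(2 * math.pi * 0.10 * t + 0.5)
          + 0.5 * math.cos(2 * math.pi * 0.23 * t + 1.0)
          + random.gauss(0.0, 0.1) for t in times]
freqs = [k / 1000.0 for k in range(1, 500)]
modes, residual = clean(times, values, freqs, n_modes=2)
print(modes)
```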
  50. All poles
      $$F(\nu_n) = \sum_{k=0}^{N-1} f(t_k)\, e^{i 2\pi\nu_n k\delta} = \sum_{k=0}^{N-1} f(t_k)\, z^k, \qquad z = e^{i 2\pi\nu\delta}$$
      The discrete FT is a particular case of the (unilateral) Z-transform:
      $$Z[f] = \sum_{k=0}^{+\infty} f(t_k)\, z^k$$
      It turns out that one can gain some advantages by making the following approximation (Press et al. (1992)):
      $$P(\nu) \approx \frac{a_0}{\left|1 + \sum_{k=1}^{M} a_k z^k\right|^2}$$
      The notable fact is that this equation is allowed to have poles, corresponding to infinite spectral power density, on the unit z-circle (at the real frequencies of the Nyquist interval), and such poles can provide an accurate representation for underlying power spectra that have sharp discrete “lines” or delta functions. $M$ is called the number of poles. This approximation goes under several names: all-poles model, maximum entropy method (MEM), autoregressive (AR) model.
  51. Phase dispersion minimization (PDM)
  52. Definitions
      A discrete set of observations can be represented by two vectors, the magnitudes $x_i$ and the observation times $t_i$ (with $i = 1, \ldots, N$). Thus the variance of $x$ is given by
      $$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \bar x)^2}{N - 1}, \qquad \bar x = \frac{1}{N}\sum_{i=1}^{N} x_i$$
      Suppose that one divides the initial set into several subsets (samples). If $M$ is the number of samples, having variances $s_j^2$ and containing $n_j$ data points, then the overall variance for all the samples is given by
      $$s^2 = \frac{\sum_{j=1}^{M} (n_j - 1)\, s_j^2}{\sum_{j=1}^{M} n_j - M}$$
  53. PDM as period search method
      Suppose that one wants to minimize the variance of a data set with respect to the mean light curve.
      The phase vector for a trial period $P$ is given by
      $$\varphi_i = \frac{t_i}{P} - \left[\frac{t_i}{P}\right],$$
      where the brackets denote the integer part.
      Considering $x$ as a function of the phase and dividing it into samples by phase bin, the variance of these samples gives the scatter around the mean light curve.
      Defining
      $$\Theta = \frac{s^2}{\sigma^2}:$$
      • If $P$ is not the true period, $s^2 \approx \sigma^2$ and $\Theta \approx 1$.
      • If $P$ is the true period, $\Theta$ will reach a local minimum.
      Mathematically, PDM is a least-squares fit, but rather than fitting a given curve it is a fit relative to the mean curve as defined by the means of each bin; one simultaneously obtains the best period.
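The $\Theta$ statistic can be sketched as follows, assuming equal-width phase bins (ten here) and skipping bins with fewer than two points; both are choices of this sketch rather than part of the definition above. The test light curve, a noisy sinusoid with period 2.5 observed at random times, is made up.

```python
import math
import random


def variance(values):
    """Sample variance with N - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def pdm_theta(times, values, period, n_bins=10):
    """Theta = pooled variance within phase bins / total variance."""
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        phase = (t / period) % 1.0
        bins[min(int(phase * n_bins), n_bins - 1)].append(v)
    numerator = 0.0
    dof = 0
    for b in bins:
        if len(b) >= 2:                      # need two points for a variance
            numerator += (len(b) - 1) * variance(b)
            dof += len(b) - 1
    return (numerator / dof) / variance(values)


random.seed(6)
times = sorted(random.uniform(0.0, 50.0) for _ in range(200))
values = [math.sin(2 * math.pi * t / 2.5) + random.gauss(0.0, 0.1) for t in times]

theta_true = pdm_theta(times, values, 2.5)
theta_wrong = pdm_theta(times, values, 3.7)
print(theta_true, theta_wrong)
```

Scanning `pdm_theta` over a grid of trial periods and taking the minimum gives the period search itself.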
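  54. Wavelets
  55. Wavelet transform
      Wavelets are a class of functions used to localize a given function in both space and scaling. A family of wavelets can be constructed from a function $\Psi(x)$, sometimes known as the “mother wavelet”, which is confined to a finite interval. The “daughter wavelets” $\Psi^{a,b}(x)$ are then formed by translation ($b$) and contraction ($a$).
      An individual wavelet can be written as
      $$\Psi^{a,b}(x) = |a|^{-1/2}\, \Psi\left(\frac{x - b}{a}\right)$$
      Then the wavelet transform is given by
      $$W_\Psi(f)(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \Psi\left(\frac{t - b}{a}\right) dt$$
      and the function can be recovered from
      $$f(t) = C_\Psi \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \langle f, \Psi^{a,b}\rangle\, \Psi^{a,b}(t)\, a^{-2}\, da\, db$$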
  56. Applications in variable stars: Szatmáry et al. (1994), fig. 17: double-mode oscillation.
  57. Conclusion
  58. Short overview
      • Data analysis results must never be subjective: an analysis should return the best-fitting parameters, the underlying errors and the accuracy of the fitted model. All the statistical information provided must be clear.
      • Because data are necessary in all scientific fields, there are a bunch of methods for optimization, merit functions, spectral analysis… Therefore it is sometimes not easy to decide which method is the ideal one. Most of the time the decision depends on the data to be analyzed.
      • All that has been considered here is the case of a deterministic signal (a fixed amplitude) added to random noise. Sometimes the signal itself is probabilistic.
  59. 59. Data analysis and Its applications on Asteroseismology Olga Moreira April 2005 DEA en Sciences “Astérosismologie” Lectured by Anne Thoul
  60. 60. OutlinePrinciples of data analysis Introduction to spectral analysisIntroduction Fourier analysis Fourier transform Power spectrum estimationMerit functions an parametersfitting Deconvolution analysisMaximum Likelihood Estimator CLEAN All polesMaximization/Minimization ProblemOrdinary methods Phase dispersion MinimizationExotic methods Period searchGoodness-of-fit Wavelet analysisChi-square test Wavelets transform and Its applicationsK-S testThe beauty of synthetic dataMonte-Carlo simulationsHare-and-Hounds game
  61. 61. Part IPrinciples of data analysis
  62. 62. Introduction
  63. 63. What do you think of when someone say “data”? Roxbourg & Paternó - Eddington Workshop (Italy)
  64. 64. What do you think of when someone say “data”?
  65. 65. What do you think of when someone say “data”?
  66. 66. What do you think of when someone say “data”?
  67. 67. What do you think of when someone say “data”?
  68. 68. What all those definitions of data have in common?Incomplete Probability Inferencesinformation theory Data Tools Analysis
  69. 69. Analysis MethodMerit function Best fit Goodness-of-fit
  70. 70. Analysis MethodA complete analysis should provide: Parameters; Error estimates on the parameters; A statistical measure of the goodness-of-fit Ignoring the 3rd step will bring drastical consequences
  71. 71. Merit functions andparameters fitting
  72. 72. Maximum Likelihood Estimators (MLE)λ = λ λ λ : Set of parameters = : Set of random variables = λ : Probability distribution characterized by λ andThe posteriori probability of a single measurement is given by: = λIf are a set of independents and identical distributed (i.i.d) then the joint probabilityfunction becomes: = ∏ = λWhere λ =∏ λ is defined as the Likelihood =• The best fit of parameters is the one that maximizes the likelihood.
  73. 73. It is common to find defined the as the likelihood., but in fact isjust the logarithm of the likelihood, which more easy to work with. = = λ or = − = Posteriori probability is the probability after the event under nocircuntances should the likelihood be confused with probabilitydensity.
  74. 74. Error Estimate λGaussian shape Non-guassian shape Non-guassian shape with a single with several local λ = λ ± ∆λ extreme: extremes: ∆λ ≈ σ • No problem on the • Problems on the determination of determination of maximum, although it maximum can represent some • Problems on the error difficulties to the error bars estimative. bars estimative.
  75. 75. Estimator: Desirable properties Unbiased: Minimum variance : λ = λ −λ = σ λ → Information inequality/Cramer-Rao inequality: + ′λσ λ ≥ λ = ′ =− λ ′ λ = σ λ ≥ λ • The larger the information the smaller the variance
  76. 76. MLE asymptotically unbiased ′ λ = λ + λ − λ λ +Neglecting the largers orders and →∞ λ =− λ − λ −λ λ =− λ −λ + λ ≈ σ λ λThe MLE function has the form of normal distribution with σ λ =and : λ λ = λ ± σ λ
  77. 77. In multi-dimensions: = λ= λ λ λ λ = λ = λ = ∂ λ ∂ λ λ = λ + λ −λ − λ −λ λ −λ + = ∂λ = = ∂λ ∂λ λ = λ + λ −λ λ −λ + ∂ λ →∞ = = Hessian matrix ∂λ ∂λ = λ −λ λ −λ Multivariate gaussian distribution λ λ λ λ = − ρ λ λ = σ λ = − σ λ σ λ
  78. 78. If λ λ = ↔ ≠ λ ≈ λ ± σ λ = If λ λ ≠ ↔ ≠Tere is an error region not only defined by σ λ but by the completecovariance matrix . For instance in 2D the error region defines an elipse.
  79. 79. Least-square and Chi-square fit1. Considering one measures with errors that are independently and normal distributed around the true value2. The standard deviations σ are the same for all points.Then joint probability for !"# is given by $ ## ∝∏ − − ∆ = σMaximazing is the same as minimazing − = σ The least-square fitting is the MLE of the fitted parameters if the measurementsare independent and normally distributed.3. If the deviations are different σ ! σ then : − =χ = σ
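  80. 80. Limitations: Real data violate the i.i.d. condition most of the time. Sometimes one has only a limited sample. In practice the result depends on the behaviour of $L(\lambda)$. The MLE grants a unique solution: for a reparametrization $\alpha = \alpha(\lambda)$, $\frac{\partial L}{\partial \lambda} = \frac{\partial L}{\partial \alpha} \cdot \frac{\partial \alpha}{\partial \lambda} = 0 \Rightarrow \frac{\partial L}{\partial \alpha} = 0$. But the uncertainty of an estimate depends on the specific choice of parameter: $P(\lambda_1 < \lambda < \lambda_2) = \frac{\int_{\lambda_1}^{\lambda_2} L(\lambda)\, d\lambda}{\int_{-\infty}^{+\infty} L(\lambda)\, d\lambda} = \frac{\int_{\alpha_1}^{\alpha_2} L(\alpha)\, \frac{\partial \lambda}{\partial \alpha}\, d\alpha}{\int_{-\infty}^{+\infty} L(\alpha)\, \frac{\partial \lambda}{\partial \alpha}\, d\alpha} \ne \frac{\int_{\alpha_1}^{\alpha_2} L(\alpha)\, d\alpha}{\int_{-\infty}^{+\infty} L(\alpha)\, d\alpha}$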
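  81. 81. Example: stochastically excited modes. The observed power spectrum is $P(\nu) = S(\nu)\, \chi^2_2$, where $S(\nu)$ is the limit spectrum and $\chi^2_2$ is a chi-square variable with 2 degrees of freedom (normalized to unit mean). For a single mode, $S(\nu)$ is a Lorentzian profile of width $\Gamma$ centred on $\nu_0$. The probability density of the observed power $P_i$ at frequency $\nu_i$, given the parameters $\lambda$, is $f(P_i \mid \lambda) = \frac{1}{S(\nu_i, \lambda)} \exp\left[-\frac{P_i}{S(\nu_i, \lambda)}\right]$, so $\ell(\lambda) = -\ln L = \sum_i \left[\ln S(\nu_i, \lambda) + \frac{P_i}{S(\nu_i, \lambda)}\right]$. Minimization over $\lambda = (A, \Gamma, \nu_0)$ → $\hat\lambda = (\hat A, \hat\Gamma, \hat\nu_0)$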
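The merit function for this example can be sketched in a few lines. The Lorentzian parametrization below (height, centre, full width, flat background) is an assumption chosen for illustration, since the exact form on the slide is not recoverable; `limit_spectrum` and `neg_log_likelihood` are hypothetical names.

```python
import math

def limit_spectrum(nu, height, nu0, width, background):
    """Assumed Lorentzian limit spectrum S(nu) plus a flat background."""
    return height / (1.0 + (2.0 * (nu - nu0) / width) ** 2) + background

def neg_log_likelihood(freqs, power, height, nu0, width, background):
    """-ln L = sum(ln S_i + P_i / S_i) for exponentially distributed power."""
    total = 0.0
    for nu, p in zip(freqs, power):
        s = limit_spectrum(nu, height, nu0, width, background)
        total += math.log(s) + p / s
    return total
```

In practice this function would be handed to one of the minimizers discussed in the following slides; each term is smallest when the model $S_i$ equals the observed $P_i$.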
  82. 82. Maximization/Minimization Problem
  83. 83. Going “Downhill” Methods. Finding a global extreme is, in general, very difficult. For one-dimensional minimization there are usually two types of methods: • Methods that bracket the minimum: golden section search and parabolic interpolation (Brent’s method) • Methods that use first-derivative information. For multidimensional minimization there are three kinds of methods: • Direction-set methods, of which Powell’s method is the prototype • The downhill simplex method • Methods that use gradient information. Figures adapted from Press et al. (1992).
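As an illustration of the bracketing family, here is a minimal golden section search. It assumes the function is unimodal on the initial bracket `[a, b]`; the function name and the tolerance default are illustrative choices, and Brent's parabolic refinement is not included.

```python
import math

def golden_section_minimize(f, a, b, tol=1e-8):
    """Locate the minimum of a unimodal function f on the bracket [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi, about 0.618
    c = b - inv_phi * (b - a)                # interior points, a < c < d < b
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]: old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)
```

Each iteration shrinks the bracket by the same factor of about 0.618 and costs one function evaluation, which is why the method is robust but only linearly convergent.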
  84. 84. Falling in the wrong valley. The downhill methods lack efficiency and robustness. For instance, the simplex method can be very fast for some functions and very slow for others. They depend on a priori knowledge of the overall structure of the vector space, and require repeated manual intervention. If the function to minimize is not well known, sometimes, numerically speaking, a smooth hill can become a headache. They also don’t solve the famous combinatorial analysis problem: the traveling salesman problem.
  85. 85. Exotic Methods. Solving “the traveling salesman problem”: a salesman has to visit each city on a given list and, knowing the distances between all cities, tries to minimize the length of his tour. Methods available: Simulated annealing: based on an analogy with thermodynamics. Genetic algorithms: based on an analogy with evolutionary selection rules. Nearest neighbor. Neural networks: based on the observation of biological neural networks (brains). Knowledge-based systems, etc. Figure adapted from Charbonneau (1995).
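A minimal sketch of simulated annealing applied to the traveling salesman problem. The move (reversing a random segment of the tour), the geometric cooling schedule, and all parameter defaults are assumptions made for illustration; they are not prescribed by the slides.

```python
import math
import random

def tour_length(tour, cities):
    """Length of the closed tour visiting cities in the given order."""
    n = len(tour)
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % n]]) for i in range(n))

def anneal_tsp(cities, n_steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Return (best_tour, best_length) found by simulated annealing."""
    rng = random.Random(seed)
    n = len(cities)
    tour = list(range(n))
    length = tour_length(tour, cities)
    best, best_length = tour[:], length
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / n_steps)
        i, j = sorted(rng.sample(range(n), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_length = tour_length(candidate, cities)
        # Metropolis rule: always accept improvements, sometimes accept
        # uphill moves so the search can escape a "wrong valley".
        if cand_length < length or rng.random() < math.exp((length - cand_length) / temp):
            tour, length = candidate, cand_length
            if length < best_length:
                best, best_length = tour[:], length
    return best, best_length
```

The occasional acceptance of longer tours at high temperature is what distinguishes this from the downhill methods of the previous slides.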
  86. 86. Goodness-of-fit
  87. 87. Chi-square test: $\chi^2 = \sum_i \frac{(N_i - n_i)^2}{n_i}$, where $N_i$ is the number of events observed in the $i$th bin and $n_i$ is the number expected according to some known distribution. H0: The data follow a specified distribution. The significance level is determined by $Q(\chi^2 \mid \nu)$, where $\nu$ is the number of degrees of freedom (the number of bins minus the number of constraints). Normally acceptable models have $Q > 0.001$, but day in and day out we find accepted models with $Q$ orders of magnitude smaller.
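The statistic and its significance can be sketched as follows. To stay within the standard library, the significance $Q(\chi^2 \mid \nu)$ is computed only for an even number of degrees of freedom, where it reduces to a finite sum; this restriction and the function names are assumptions of the sketch, and a general implementation would use the incomplete gamma function.

```python
import math

def chi_square_statistic(observed, expected):
    """chi^2 = sum((N_i - n_i)^2 / n_i) over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_q_even(chi2, dof):
    """Q(chi2 | dof) for even dof: exp(-x) * sum_{k < dof/2} x^k / k!, x = chi2 / 2."""
    if dof <= 0 or dof % 2:
        raise ValueError("this sketch only handles a positive, even dof")
    x = 0.5 * chi2
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= x / k          # builds x^k / k! incrementally
        total += term
    return math.exp(-x) * total
```

A small `Q` means that a chi-square as large as the observed one would rarely arise by chance if H0 were true.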
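  88. 88. Kolmogorov-Smirnov (K-S) test. $S_N(x)$: cumulative distribution of the data. $P(x)$: known cumulative distribution. $D$: maximum absolute difference between the two cumulative functions, $D = \max_{-\infty < x < +\infty} |S_N(x) - P(x)|$. The significance of an observed value of $D$ is given approximately by: $\mathrm{Prob}(D > \text{observed}) = Q_{KS}\left(\left[\sqrt{N} + 0.12 + 0.11/\sqrt{N}\right] D\right)$, with $Q_{KS}(\lambda) = 2 \sum_{j=1}^{+\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2}$. $Q_{KS}$ is a monotonic function with limiting values: $Q_{KS}(0) = 1$: largest agreement; $Q_{KS}(\infty) = 0$: smallest agreement. Figure: Press et al. (1992).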
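A minimal sketch of the one-sample K-S test, using the approximation quoted in Press et al. (1992). The cut-off below which `q_ks` simply returns 1 is a numerical shortcut assumed here (the alternating series converges poorly near zero, where the true value is indistinguishable from 1); the function names are illustrative.

```python
import math

def ks_statistic(sample, cdf):
    """Maximum absolute difference D between the empirical CDF and cdf(x)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        p = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(p - i / n), abs((i + 1) / n - p))
    return d

def q_ks(lam, terms=100):
    """Q_KS(lambda) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lambda^2)."""
    if lam < 0.2:
        return 1.0  # series converges slowly here; Q_KS is 1 to machine precision
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def ks_significance(d, n):
    """Approximate probability of exceeding D by chance for n data points."""
    root_n = math.sqrt(n)
    return q_ks((root_n + 0.12 + 0.11 / root_n) * d)
```

Unlike the chi-square test, no binning of the data is needed, which makes the result independent of an arbitrary choice of bin width.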
  89. 89. Synthetic data
  90. 90. Monte-Carlo simulations. If one knows something about the process that generated the data, then, given an assumed set of parameters $\lambda$, one can figure out how to simulate one's own sets of “synthetic” realizations of these parameters. The procedure is to draw random numbers from appropriate distributions so as to mimic our best understanding of the underlying processes and measurement errors. Figure: Stello et al. (2004), ξ Hya.
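The procedure can be sketched with a deliberately simple toy model: a sinusoid of known frequency and phase with Gaussian white noise, where only the amplitude is fitted. The model, the noise law, and all function names are assumptions for illustration; a realistic simulation would mimic the actual excitation process and noise sources.

```python
import math
import random
import statistics

def simulate_series(times, amplitude, freq, noise_sigma, rng):
    """One synthetic realization: sinusoid plus Gaussian white noise."""
    return [amplitude * math.sin(2.0 * math.pi * freq * t) + rng.gauss(0.0, noise_sigma)
            for t in times]

def fit_amplitude(times, values, freq):
    """Least-squares amplitude of sin(2 pi f t), frequency and phase held fixed."""
    s = [math.sin(2.0 * math.pi * freq * t) for t in times]
    return sum(si * vi for si, vi in zip(s, values)) / sum(si * si for si in s)

def monte_carlo_amplitude_error(times, amplitude, freq, noise_sigma,
                                n_trials=500, seed=1):
    """Mean and scatter of the fitted amplitude over many synthetic data sets."""
    rng = random.Random(seed)
    fits = [fit_amplitude(times, simulate_series(times, amplitude, freq, noise_sigma, rng), freq)
            for _ in range(n_trials)]
    return statistics.mean(fits), statistics.stdev(fits)
```

The scatter of the fitted values over the synthetic realizations serves as an empirical error estimate on the parameter, which is useful when no analytic error formula is available.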
  91. 91. Hare-and-Hounds game. Team A: generates theoretical mode frequencies and synthetic time series. Team B: analyses the time series, performs the mode identification and fitting, and does the structure inversion. Rules: The teams only have access to the time series. Nothing else is allowed.
  92. 92. End of Part I. Options available: • Questions • Coffee break • “Get on with it !!!”
  93. 93. Part II: Introduction to spectral analysis
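  94. 94. Fourier transform. Definition: $F(\nu) = \mathcal{F}[f(t)] = \int_{-\infty}^{+\infty} f(t)\, e^{-i 2\pi\nu t}\, dt$. Properties: linearity, $\mathcal{F}[f(t) + g(t)] = F(\nu) + G(\nu)$; convolution, $h(t) = \int_{-\infty}^{+\infty} f(\tau)\, g(t - \tau)\, d\tau \Leftrightarrow \mathcal{F}[h(t)] = F(\nu) \cdot G(\nu)$, i.e. $\mathcal{F}[f(t) \otimes g(t)] = F(\nu) \cdot G(\nu)$. Parseval’s theorem: the power of a signal represented by $f(t)$ is the same whether computed in time space or in frequency space: $\int_{-\infty}^{+\infty} |f(t)|^2\, dt = \int_{-\infty}^{+\infty} |F(\nu)|^2\, d\nu$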
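  95. 95. Sampling theorem. Figure (adapted from Bracewell 1986): multiplying the signal by a sampling comb in the time domain corresponds to convolving its transform $F(\nu)$ with a comb in the frequency domain. For a band-limited signal, which has no components above the frequency $\nu_c$, the sampling theorem states that a real signal can be reconstructed without error from samples taken uniformly at a rate $\nu_s > 2\nu_c$. The minimum sampling frequency, $\nu_s = 2\nu_c$, is called the Nyquist frequency, corresponding to the sampling interval $T = 1/(2\nu_c)$ (where $T = 1/\nu_s$).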
  96. 96. Undersampling. The sampling theorem assumes that a signal is limited in frequency, but in practice the signal is time limited. For $T > 1/(2\nu_c)$ the signal is undersampled: overlapping tails appear in the spectrum (alias spectrum). Figure adapted from Bracewell (1986). Aliasing: examining the terms of the undersampled Fourier transform (FT) (Bracewell 1986): • The undersampled FT is evener than the complete FT; as a consequence, the sampling procedure discriminates the zero components at $\nu = \nu_c$. • There is a leakage of the high frequencies (aliasing).
  97. 97. Discrete Fourier transform: $F(\nu_n) = \sum_{k=0}^{N-1} f(t_k)\, e^{-i 2\pi \nu_n t_k}$, with $t_k = k\delta$ and $\nu_n = n/(N\delta)$, $n = 0, \dots, N-1$. Discrete form of Parseval’s theorem: $\sum_{k=0}^{N-1} |f(t_k)|^2 = \frac{1}{N} \sum_{n=0}^{N-1} |F(\nu_n)|^2$. Fast Fourier Transform (FFT): the FFT is a discrete Fourier transform algorithm which reduces the number of computations for $N$ points from $O(N^2)$ to $O(N \log_2 N)$. This is done by means of the Danielson-Lanczos lemma, whose basic idea is to break a transform of length $N$ into 2 transforms of length $N/2$.
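The Danielson-Lanczos idea can be sketched as a short recursive routine, shown next to the direct $O(N^2)$ sum for comparison. The sign convention $e^{-2\pi i nk/N}$ and the power-of-two length restriction are assumptions of this sketch; production code would use an optimized library instead.

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform, O(N^2) operations."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def fft(x):
    """Recursive radix-2 FFT via the Danielson-Lanczos lemma, O(N log2 N)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of the even-indexed samples (length N/2)
    odd = fft(x[1::2])    # transform of the odd-indexed samples (length N/2)
    out = [0j] * n
    for m in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out
```

Both routines return the same coefficients; only the operation count differs, which is what makes long time series tractable.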
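  98. 98. Power spectrum estimation. Periodogram: $P(\nu) = \frac{1}{N} |F(\nu)|^2 = \frac{1}{N} \left|\sum_{k=0}^{N-1} x(t_k)\, e^{-i 2\pi\nu t_k}\right|^2 = \frac{1}{N} \left[\left(\sum_{k=0}^{N-1} x(t_k) \cos 2\pi\nu t_k\right)^2 + \left(\sum_{k=0}^{N-1} x(t_k) \sin 2\pi\nu t_k\right)^2\right]$. If $x(t)$ contains a periodic signal, i.e. $x(t) = g(t) + \varepsilon(t)$ with random noise $\varepsilon(t)$ and $g(t) = A \cos(2\pi\nu_0 t + \varphi)$, then at $\nu = \nu_0$ there is a large contribution to the sum; for other values the terms in the sum will be randomly negative and positive, yielding only a small contribution. Thus a peak in the periodogram reveals the existence of an embedded periodic signal.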
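A minimal sketch of the periodogram evaluated at a single trial frequency, written as the sum of squared cosine and sine projections. The $1/N$ normalization is an assumption (conventions differ between authors), and `periodogram` is an illustrative name.

```python
import math

def periodogram(times, values, freq):
    """P(nu) = (1/N) * [(sum x cos 2 pi nu t)^2 + (sum x sin 2 pi nu t)^2]."""
    omega = 2.0 * math.pi * freq
    c = sum(v * math.cos(omega * t) for t, v in zip(times, values))
    s = sum(v * math.sin(omega * t) for t, v in zip(times, values))
    return (c * c + s * s) / len(values)
```

Scanning `freq` over a grid and looking for the largest value is the simplest way to detect an embedded periodic signal; for a pure sinusoid of amplitude $A$ sampled over whole cycles, the peak height with this normalization is $N A^2 / 4$.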
  99. 99. Frequency leakage: • Leakage from nearby frequencies, which is usually described as a spectral window and is primarily a product of the finite length of the data. • Leakage from high frequencies, due to data sampling: the aforementioned aliasing. Tapering functions: sometimes also called data windowing. These functions try to smooth the leakage between frequencies, bringing the interference slowly back to zero. The main goal is to narrow the peak and suppress the side lobes. Smoothing can in certain cases represent a loss of information. Figures: Press et al. (1992).
  100. 100. Further complications. Closely spaced frequencies: a direct contribution to the first aforementioned leakage. $|F_1(\nu) + F_2(\nu)|^2 = |F_1(\nu)|^2 + |F_2(\nu)|^2 + 2\,\mathrm{Re}[F_1(\nu)\, F_2^*(\nu)]$. Damping: $x(t) = A \cos(2\pi\nu_0 t - \varphi)\, e^{-\eta t}$. The peak in the power spectrum will have a Lorentzian profile.
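  101. 101. Power spectrum of random noise. $x(t) = g(t) + \varepsilon(t)$, with $\varepsilon$ a zero-mean random noise of variance $\sigma_\varepsilon^2$. The estimation of the spectral density: $\rho_\varepsilon(\nu) = \sum_k \gamma_\varepsilon(k) \cos(2\pi\nu k) = \gamma_\varepsilon(0)$, where $\gamma_\varepsilon(k)$ is the autocovariance of the noise, which vanishes for $k \ne 0$. Thus: $P_\varepsilon(\nu) = \sigma_\varepsilon^2$ (1). No matter how much one increases the number of points, $N$, the signal-to-noise will tend to be constant. For unevenly spaced data (missing data) equation (1) isn’t always valid; indeed, it is only valid for homogeneous white noise (independent and identically distributed normal random variables).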
  102. 102. Filling gaps. The unevenly spaced data problem can be solved by (a few suggestions): • Finding a way to reduce the unevenly spaced sample to an evenly spaced one. Basic idea: interpolation of the missing points (problem: doesn’t work for long gaps). • Using the Lomb-Scargle periodogram. • Doing a deconvolution analysis (filters).
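  103. 103. Lomb-Scargle periodogram: $P_N(\omega) = \frac{1}{2\sigma^2} \left\{ \frac{\left[\sum_j (x_j - \bar x) \cos\omega(t_j - \tau)\right]^2}{\sum_j \cos^2\omega(t_j - \tau)} + \frac{\left[\sum_j (x_j - \bar x) \sin\omega(t_j - \tau)\right]^2}{\sum_j \sin^2\omega(t_j - \tau)} \right\}$, with $\omega = 2\pi\nu$ and $\tan(2\omega\tau) = \frac{\sum_j \sin 2\omega t_j}{\sum_j \cos 2\omega t_j}$. • It is like weighting the data on a “per point” basis instead of a “per time interval” basis, which makes it independent of sampling irregularity. • It has an exponential probability distribution with unit mean, which means one can establish a false-alarm probability of the null hypothesis (significance level): $P(> z) = 1 - (1 - e^{-z})^M \approx M e^{-z}$, where $M$ is the number of independent frequencies.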
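A minimal sketch of the Lomb-Scargle periodogram in the standard form given by Press et al. (1992), together with the false-alarm probability. The function names are illustrative, the trial frequency must be positive, and degenerate samplings (where one of the normalizing sums vanishes) are not handled.

```python
import math

def lomb_scargle(times, values, freq):
    """Normalized Lomb-Scargle power at one trial frequency (freq > 0)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    omega = 2.0 * math.pi * freq
    # The offset tau makes the sine and cosine terms mutually orthogonal.
    tau = math.atan2(sum(math.sin(2.0 * omega * t) for t in times),
                     sum(math.cos(2.0 * omega * t) for t in times)) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    c_proj = sum((v - mean) * c for v, c in zip(values, cos_terms))
    s_proj = sum((v - mean) * s for v, s in zip(values, sin_terms))
    c_norm = sum(c * c for c in cos_terms)
    s_norm = sum(s * s for s in sin_terms)
    return (c_proj ** 2 / c_norm + s_proj ** 2 / s_norm) / (2.0 * variance)

def false_alarm_probability(z, n_independent):
    """P(>z) = 1 - (1 - exp(-z))^M for M independent frequencies."""
    return 1.0 - (1.0 - math.exp(-z)) ** n_independent
```

Because the sums run over the actual observation times, no interpolation onto a regular grid is needed; choosing the number of independent frequencies `n_independent` remains a judgment call that depends on the sampling.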
  104. 104. Deconvolution analysis
  105. 105. Deconvolution: $F(\nu) \otimes S(\nu) = g(\nu) + \varepsilon(\nu)$ (signal $g$ plus noise $\varepsilon$). Linear algorithms: inverse filtering or Wiener filtering. They are inapplicable to incomplete (irregular) sampling of spatial frequency. Non-linear algorithms: CLEAN, all poles. Problem: the deconvolution usually does not have a unique solution.
  106. 106. Högbom CLEAN algorithm. The first CLEAN method was developed by Högbom (1974). It constructs discrete approximations $C_n$ of the clean map from the convolution equation $B \otimes C = D$, where $D$ is the dirty map and $B$ the dirty beam. Starting with $C_0 = 0$, it searches for the largest value in the residual map $R_n = D - B \otimes C_{n-1}$. After locating the largest residual of given amplitude, it subtracts it from $R_n$ to yield $R_{n+1}$. The iteration continues until the root-mean-square (RMS) decreases to some level. Each subtracted location is saved in the so-called CLEAN map. The resulting final residual map is assumed to be mainly noise.
  107. 107. CLEAN algorithm. The basic steps of the CLEAN algorithm used in asteroseismology are: 1. Compute the power spectrum of the signal and identify the dominant period. 2. Perform a least-square fit to the data to obtain the amplitude and phase of the identified mode. 3. Construct the time series corresponding to that single mode and subtract it from the original signal to obtain a new signal. 4. Repeat all steps until all that is left is noise. Stello et al. (2004) proposed an improvement to this algorithm: after subtracting a frequency, it recalculates the amplitudes, phases and frequencies of the previously subtracted peaks while fixing the frequency of the latest extracted peak.
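The four steps above can be sketched as an iterative sine-wave subtraction loop. Several simplifications are assumed here: the frequency is taken from a fixed trial grid rather than refined, the loop stops after a fixed number of modes instead of testing whether the residual is noise, and the Stello et al. (2004) refinement is not included. All function names are illustrative.

```python
import math

def power(times, values, freq):
    """Periodogram power at one trial frequency."""
    omega = 2.0 * math.pi * freq
    c = sum(v * math.cos(omega * t) for t, v in zip(times, values))
    s = sum(v * math.sin(omega * t) for t, v in zip(times, values))
    return (c * c + s * s) / len(values)

def fit_sinusoid(times, values, freq):
    """Linear least squares for a*cos(wt) + b*sin(wt) at a fixed frequency."""
    omega = 2.0 * math.pi * freq
    cs = [math.cos(omega * t) for t in times]
    sn = [math.sin(omega * t) for t in times]
    cc = sum(c * c for c in cs)
    ss = sum(s * s for s in sn)
    cross = sum(c * s for c, s in zip(cs, sn))
    yc = sum(v * c for v, c in zip(values, cs))
    ys = sum(v * s for v, s in zip(values, sn))
    det = cc * ss - cross * cross
    return (yc * ss - ys * cross) / det, (ys * cc - yc * cross) / det

def clean_sinusoids(times, values, trial_freqs, n_modes):
    """Extract n_modes sinusoids; returns ([(freq, amplitude, phase)], residual)."""
    residual = list(values)
    modes = []
    for _ in range(n_modes):
        # Step 1: dominant peak in the power spectrum of the current residual.
        freq = max(trial_freqs, key=lambda f: power(times, residual, f))
        # Step 2: least-squares amplitude and phase at that frequency.
        a, b = fit_sinusoid(times, residual, freq)
        # Step 3: subtract the fitted mode, A*cos(2 pi f t + phase).
        omega = 2.0 * math.pi * freq
        residual = [r - a * math.cos(omega * t) - b * math.sin(omega * t)
                    for t, r in zip(times, residual)]
        modes.append((freq, math.hypot(a, b), math.atan2(-b, a)))
    return modes, residual
```

In a real analysis the stopping rule would compare the highest remaining peak with the noise level, and the trial grid should exclude zero frequency and the Nyquist frequency, where the two-parameter fit becomes degenerate.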
  108. 108. All poles: with $z = e^{2\pi i \nu \delta}$, the power spectrum is $P(\nu) = |F(\nu)|^2 = \left|\sum_{k=0}^{N-1} c_k z^k\right|^2$. The discrete FT is a particular case of the (unilateral) Z-transform: $X(z) = \sum_{k=0}^{+\infty} c_k z^k$. It turns out that one can gain some advantages from the following approximation: $P(\nu) \approx \frac{a_0}{\left|1 + \sum_{k=1}^{M} a_k z^k\right|^2}$ (Press et al. 1992). The notable fact is that this equation is allowed to have poles, corresponding to infinite spectral power density, on the unit $z$-circle (at the real frequencies of the Nyquist interval), and such poles can provide an accurate representation of underlying power spectra that have sharp discrete “lines” or delta functions. $M$ is called the number of poles. This approximation goes under several names: all-poles model, maximum entropy method (MEM), autoregressive (AR) model.
  109. 109. Phase dispersion minimization PDM
  110. 110. Definitions. A discrete set of observations can be represented by two vectors, the magnitudes $x_i$ and the observation times $t_i$ (with $i = 1, \dots, N$). Thus the variance of $x$ is given by: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar x)^2}{N - 1}$. Suppose that one divides the initial set into several subsets/samples. If $M$ is the number of samples, having variances $s_j^2$ and containing $n_j$ data points, then the overall variance for all the samples is given by: $s^2 = \frac{\sum_{j=1}^{M} (n_j - 1)\, s_j^2}{\sum_{j=1}^{M} n_j - M}$
  111. 111. PDM as period search method. Suppose that one wants to minimize the variance of a data set with respect to the mean light curve. The phase vector is given by: $\varphi_i = \frac{t_i}{P} - \left[\frac{t_i}{P}\right]$, where the brackets denote the integer part. Considering $x$ as a function of the phase, the variance of these samples gives the scatter around the mean light curve. Defining: $\Theta = \frac{s^2}{\sigma^2}$. If $P$ is not the true period, $s^2 \approx \sigma^2$ and $\Theta \approx 1$. If $P$ is the true period, then $\Theta$ will reach a local minimum. Mathematically, PDM is a least-square fit, but rather than fitting a given curve, it is a fit relative to the mean curve as defined by the means of each bin; simultaneously one obtains the best period.
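A minimal sketch of the PDM statistic $\Theta$ for one trial period. It assumes simple, non-overlapping phase bins of equal width (the original method also allows overlapping bin covers), and `pdm_theta` with its `n_bins` default is an illustrative choice.

```python
def pdm_theta(times, values, period, n_bins=10):
    """Theta = s^2 / sigma^2: pooled within-bin variance over total variance."""
    n = len(values)
    mean = sum(values) / n
    total_var = sum((v - mean) ** 2 for v in values) / (n - 1)
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        phase = (t / period) % 1.0                      # fold the time onto [0, 1)
        bins[min(int(phase * n_bins), n_bins - 1)].append(v)
    numerator = 0.0   # sum over bins of (n_j - 1) * s_j^2
    dof = 0           # sum over bins of (n_j - 1)
    for members in bins:
        if len(members) > 1:
            bin_mean = sum(members) / len(members)
            numerator += sum((v - bin_mean) ** 2 for v in members)
            dof += len(members) - 1
    return (numerator / dof) / total_var
```

Evaluating `pdm_theta` over a grid of trial periods and taking the deepest minimum gives the period estimate; since no sinusoidal shape is assumed, the method suits strongly non-sinusoidal light curves.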
  112. 112. Wavelets
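  113. 113. Wavelet transform. Wavelets are a class of functions used to localize a given function in both space and scaling. A family of wavelets can be constructed from a function $\Psi(x)$, sometimes known as the “mother wavelet”, which is confined to a finite interval. The “daughter wavelets” $\Psi^{a,b}(x)$ are then formed by translation ($b$) and contraction ($a$). An individual wavelet can be written as: $\Psi^{a,b}(x) = |a|^{-1/2}\, \Psi\left(\frac{x - b}{a}\right)$. Then the wavelet transform is given by: $W_\Psi(f)(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \Psi\left(\frac{t - b}{a}\right) dt$, and the function can be recovered from $f(x) = C_\Psi \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \langle f, \Psi^{a,b} \rangle\, \Psi^{a,b}(x)\, a^{-2}\, da\, db$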
  114. 114. Applications in variable stars Szatmáry et al. (1994) - fig. 17: Double mode oscillation.
  115. 115. Conclusion
  116. 116. Short overview. • Data analysis results must never be subjective; the analysis should return the best-fitting parameters, the underlying errors, and the accuracy of the fitted model. All the statistical information provided must be clear. • Because data are necessary in all scientific fields, there is a bunch of methods for optimization, merit functions, spectral analysis… Therefore, it is sometimes not easy to decide which method is the ideal one. Most of the time the decision depends on the data to be analyzed. • All that has been considered here was the case of a deterministic signal (of fixed amplitude) added to random noise. Sometimes the signal itself is probabilistic.