This document discusses an adaptive step-size method for the Frank-Wolfe algorithm that eliminates the need for a manually selected step-size parameter. It presents the standard Frank-Wolfe algorithm and the Demyanov-Rubinov variant, which chooses its step size from a sufficient-decrease condition. It then proposes an adaptive Frank-Wolfe algorithm that replaces the global Lipschitz constant L with a local estimate L_t, allowing for potentially larger step sizes. This adaptive approach is shown to maintain sufficient decrease and extends to other Frank-Wolfe variants such as away-steps Frank-Wolfe.
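As a concrete illustration of the idea (not the paper's exact pseudocode), here is a minimal NumPy sketch: Frank-Wolfe on a hypothetical toy problem (least squares over an ℓ1 ball), where a backtracking loop grows a local smoothness estimate `L_t` until the sufficient-decrease condition holds, and shrinks it optimistically between iterations to permit larger steps. All names and the problem setup are illustrative.

```python
import numpy as np

# Hypothetical toy problem: minimize f(x) = 0.5 * ||A x - b||^2
# over the l1 ball of radius `alpha`.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
alpha = 1.0

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)

def lmo_l1(g, radius):
    """Linear minimization oracle over the l1 ball: a signed vertex."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

def adaptive_fw(x, n_iter=200, L=1.0):
    for _ in range(n_iter):
        g = grad(x)
        s = lmo_l1(g, alpha)
        d = s - x
        gap = -g @ d              # Frank-Wolfe gap along the direction d
        if gap <= 1e-10:
            break
        # Backtracking: grow the local estimate L_t until the
        # sufficient-decrease (quadratic upper bound) condition holds.
        L_t = 0.5 * L             # optimistic shrink allows larger steps
        while True:
            gamma = min(gap / (L_t * (d @ d)), 1.0)
            if f(x + gamma * d) <= f(x) - gamma * gap + 0.5 * L_t * gamma**2 * (d @ d):
                break
            L_t *= 2.0
        x = x + gamma * d
        L = L_t
    return x

x = adaptive_fw(np.zeros(10))
```

Each accepted step decreases f by at least half the step-size-weighted gap, so the iterates stay feasible (a convex combination of ball points) while the objective falls monotonically.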
Stochastic Frank-Wolfe for Constrained Finite Sum Minimization @ Montreal Opt... (Geoffrey Négiar)
We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse, low-rank, or other structured constraints. The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size. Furthermore, as a byproduct of the method we obtain a stochastic estimator of the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the setting, the proposed method matches or improves on the best computational guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several datasets highlight different regimes in which the proposed method exhibits a faster empirical convergence than related methods. Finally, we provide an implementation of all considered methods in an open-source package.
This document provides an overview of digital modulation and coding fundamentals. It introduces key concepts such as lowpass and bandpass signals, signal space concepts, and orthogonal expansion of signals. Modulation and demodulation of bandpass signals are discussed in terms of translating a baseband signal up to a higher-frequency bandpass signal and back.
The Fourier transform relates a signal in the time domain, x(t), to its frequency domain representation, X(jω). It represents the frequency content of the signal. The Fourier transform is a linear operation, and time shifts in the time domain result in phase shifts in the frequency domain. Differentiation in the time domain corresponds to multiplication by jω in the frequency domain. Convolution becomes simple multiplication in the frequency domain. These properties allow differential equations and systems with convolution to be solved using algebraic operations by working in the frequency domain.
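The convolution property can be checked numerically with the DFT (a NumPy illustration added here, not from the summarized document): circular convolution in the time domain equals pointwise multiplication of the transforms.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
h = rng.standard_normal(64)
n = len(x)

# Direct circular convolution: (x * h)[k] = sum_m x[m] h[(k - m) mod n].
direct = np.array([np.sum(x * h[(k - np.arange(n)) % n]) for k in range(n)])

# Frequency-domain route: multiply the DFTs, then invert.
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real
```

The two arrays agree to floating-point precision, which is exactly why convolution equations can be solved algebraically in the frequency domain.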
The document provides notes on signals and systems from an EECE 301 course. It includes:
- An overview of continuous-time (C-T) and discrete-time (D-T) signal and system models.
- Details on chapters covering differentials/differences, convolution, Fourier analysis (both C-T and D-T), Laplace transforms, and Z-transforms.
- Examples of calculating the Fourier transform of specific signals like a decaying exponential and rectangular pulse. These illustrate properties of the Fourier transform.
The Fourier transform decomposes a signal into its constituent frequencies, representing it in the frequency domain rather than the time or spatial domain, which can make certain operations and analyses easier to perform. It has both magnitude and phase components, which carry information about the frequency content and relative phases of the signal. The discrete Fourier transform (DFT) is a sampled version of the continuous Fourier transform that is useful for digital signal and image processing applications.
The document discusses frequency domain processing and the Fourier transform. It defines key concepts such as:
- The frequency domain represents how much of a signal lies within different frequency bands, while the time domain shows how a signal changes over time.
- The Fourier transform provides the frequency domain representation of a signal and is used to analyze signals with respect to frequency. Its inverse transform reconstructs the original signal.
- The Fourier transform decomposes a signal into orthogonal sine and cosine waves of different frequencies, showing the contribution of each frequency component. This representation is important for signal processing tasks like filtering.
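The orthogonality underlying that decomposition is easy to verify on a discrete grid (an added NumPy check, not part of the summarized document): distinct harmonics have zero inner product, so each Fourier coefficient isolates the contribution of a single frequency.

```python
import numpy as np

N = 256
t = 2 * np.pi * np.arange(N) / N   # one full period, N samples

inner = lambda f, g: np.sum(f * g)

s3, s5, c3 = np.sin(3 * t), np.sin(5 * t), np.cos(3 * t)

# inner(s3, s5) is ~0: different frequencies are orthogonal.
# inner(s3, c3) is ~0: sine and cosine at the same frequency are orthogonal.
# inner(s3, s3) is N/2: a harmonic correlated with itself.
```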
This document provides an overview of the continuous-time Fourier transform. It introduces the Fourier integral and defines the Fourier transform pair. It discusses properties of the Fourier transform including linearity, time scaling, time reversal, time shifting, frequency shifting, and properties for real functions. Examples are provided to illustrate these concepts and properties. The document also reviews the discrete Fourier transform and Fourier series to provide context and comparison to the continuous-time Fourier transform.
The document summarizes properties and examples of the Fourier transform. It discusses:
1) The Fourier transform represents the frequency content of a signal and relates a signal x(t) to its frequency domain representation X(jω).
2) The Fourier transform is a linear operator, so transforms of summed signals are the sums of the individual transforms. Additionally, a time shift in the signal results in a phase shift in the frequency domain representation.
3) Differentiation in the time domain corresponds to multiplication by jω in the frequency domain. Convolution in the time domain is represented by simple multiplication of the frequency domain representations. This allows solving differential equations using Fourier transforms.
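The differentiation property can also be demonstrated numerically (an added NumPy sketch): multiplying the DFT by jω and inverting gives a spectral derivative, which for a sampled Gaussian matches the known analytic derivative.

```python
import numpy as np

n = 1024
t = np.linspace(-10, 10, n, endpoint=False)
dt = t[1] - t[0]
x = np.exp(-t**2 / 2)              # Gaussian, effectively zero at the edges

# Differentiate by multiplying the spectrum by j*omega.
omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)
dx_spectral = np.fft.ifft(1j * omega * np.fft.fft(x)).real

dx_analytic = -t * np.exp(-t**2 / 2)
```

Because the Gaussian decays well inside the window, the spectral derivative agrees with the analytic one to high precision; this is the mechanism that turns differential equations into algebraic ones.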
This document provides definitions and examples related to Fourier series and Fourier transforms. It defines the Fourier transform and inverse Fourier transform of a function f(x). It gives the Fourier integral representation of a function and provides an example of finding the Fourier integral representation of a rectangle function. It also defines Fourier sine and cosine integrals. Finally, it outlines some properties of Fourier transforms, including the modulation theorem and convolution theorem.
The document discusses 11 properties of the Fourier transform: (1) Linearity and superposition, (2) Time scaling, (3) Time shifting, (4) Duality or symmetry, (5) Area under the time domain function equals the Fourier transform at f=0, (6) Area under the Fourier transform equals the time domain function at t=0, (7) Frequency shifting, (8) Differentiation in the time domain, (9) Integration in the time domain, (10) Multiplication in the time domain becomes convolution in the frequency domain, and (11) Convolution in the time domain becomes multiplication in the frequency domain. Each property is explained briefly.
Fourier representation of signal and systems (Sugeng Widodo)
This document provides an overview of Fourier analysis concepts including:
- The Fourier transform decomposes a signal into its constituent frequencies.
- Properties of the Fourier transform like linearity, time/frequency shifting, and modulation are discussed.
- The Fourier transform of a time derivative or integral is related to the original Fourier transform.
- Convolution and correlation theorems explain how time domain operations translate to the frequency domain.
The document discusses modeling dynamic systems and earthquake response. It covers basic concepts like Fourier transforms, single and multi-degree of freedom systems, modal analysis, and elastic response spectra. Numerical methods are presented for dynamic analysis in the frequency and time domains, including the finite element method and method of complex response. Examples of earthquake records and harmonic motion are shown.
The document discusses Fourier analysis and its applications. It introduces:
1) Fourier series and transforms, which express any function as a sum of sines and cosines, allowing the original function to be reconstructed without loss of information.
2) The discrete Fourier transform (DFT), which is used to process digital images and signals, representing them in the frequency domain.
3) An example calculating the 2D DFT of a sample image and recovering the original from the inverse DFT.
4) Common representations of DFT outputs and filters that emphasize different frequencies.
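A round trip like the 2D DFT example described above can be sketched in a few lines of NumPy (added illustration; the "image" here is random data, not the document's sample image):

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.random((8, 8))           # stand-in for a small grayscale image

F = np.fft.fft2(img)               # forward 2D DFT
recovered = np.fft.ifft2(F).real   # inverse DFT recovers the image

# A common display representation: log-magnitude with the zero
# frequency shifted to the center of the spectrum.
spectrum = np.log1p(np.abs(np.fft.fftshift(F)))
```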
The document discusses several key concepts related to the Fourier transform:
1) It introduces the Dirac delta function and explains how it relates to the Fourier transform of exponential and cosine functions.
2) It describes several theorems regarding how the Fourier transform is affected by scaling, shifting, summing and differentiating functions.
3) It explains that both the intensity and phase of a time domain function, and the spectral intensity and phase in the frequency domain, are needed to fully characterize the function and its Fourier transform.
The document discusses the Fourier transform, which relates a signal sampled in time or space to the same signal sampled in frequency. It explains the mathematical definition and provides an example of using a Fourier transform to convert time domain data to the frequency domain. Specifically, it uses a cosine wave as input data and calculates the Fourier transform to reveal a strong amplitude at the expected frequency component.
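A cosine-wave example of this kind is a few lines in NumPy (added here as an illustration, with an assumed 100 Hz sampling rate and 5 Hz tone): the magnitude spectrum peaks at the cosine's frequency.

```python
import numpy as np

fs = 100.0                         # sampling rate, Hz (assumed)
f0 = 5.0                           # cosine frequency, Hz (assumed)
t = np.arange(0, 1, 1 / fs)        # one second of samples
x = np.cos(2 * np.pi * f0 * t)

X = np.fft.rfft(x)                 # real-input DFT: bins 0 .. fs/2
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peak = freqs[np.argmax(np.abs(X))] # frequency of the strongest component
```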
- The document discusses the Fourier transform, which decomposes a function into its constituent frequencies.
- It provides the mathematical definition of the Fourier transform and an example calculation.
- Some applications of the Fourier transform mentioned include image processing, data analysis, and designing filters.
Toward an Improved Computational Strategy for Vibration-Proof Structures Equi... (Alessandro Palmeri)
This presentation was delivered at the 15th World Conference on Earthquake Engineering in Lisbon (Portugal) on 28th September 2012, and shows some preliminary results on the dynamic analysis of non-linear viscoelastic structures.
This presentation describes the Fourier Transform used in different mathematical and physical applications.
The presentation is at an undergraduate science (math, physics, engineering) level.
Please send comments and suggestions for improvements to solo.hermelin@gmail.com.
More presentations can be found at my website http://www.solohermelin.com.
1) The Fourier transform is useful for designing filters by allowing systems to be described in the frequency domain. Important properties include linearity, time shifts, differentiation, and convolution.
2) Convolution becomes simple multiplication in the frequency domain. To solve a differential/convolution equation using Fourier transforms, take the Fourier transform of the inputs, multiply them, and take the inverse Fourier transform of the result.
3) An example shows designing a low-pass filter by taking the inverse Fourier transform of a rectangular function, producing an ideal low-pass response without time-domain oscillations. Approximating this with a causal function provides some low-pass filtering characteristics.
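A frequency-domain low-pass filter of this flavor can be sketched in NumPy (added illustration with assumed frequencies, using a brick-wall cutoff rather than the document's causal approximation): keep the bins below the cutoff, zero the rest, and invert.

```python
import numpy as np

fs = 100.0
t = np.arange(0, 1, 1 / fs)
# Two-tone input: a 3 Hz component to keep, a 30 Hz component to remove.
x = np.sin(2 * np.pi * 3 * t) + np.sin(2 * np.pi * 30 * t)

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
X[freqs > 10.0] = 0.0              # brick-wall cutoff at 10 Hz
filtered = np.fft.irfft(X, n=len(x))

low_tone = np.sin(2 * np.pi * 3 * t)
```

Because both tones fall exactly on DFT bins here, the filtered output matches the low tone alone; the time-domain counterpart of this rectangular window is the sinc-shaped impulse response mentioned above.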
Signal Processing Introduction using Fourier Transforms (Arvind Devaraj)
1) The document introduces signal processing by discussing signals, systems, and transforms. It defines signals as functions of time or space and systems as maps that manipulate signals. Transforms represent signals in different domains like frequency to simplify operations.
2) Signals can be represented in the frequency domain using Fourier transforms. This makes operations like filtering easier. Low frequencies represent overall shape while high frequencies are details like noise or edges.
3) Linear and time-invariant systems can be characterized by their impulse response. The output is the convolution of the input and impulse response. Convolution is a mechanism that shapes signals to produce outputs.
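The impulse-response characterization in point 3 can be made concrete (added NumPy sketch, using a 3-point moving average as the assumed system): the system's response to an impulse is its impulse response, and its response to any input is the convolution of the two.

```python
import numpy as np

h = np.ones(3) / 3.0                     # impulse response: 3-point average
x = np.array([0.0, 3.0, 6.0, 3.0, 0.0])  # an arbitrary input signal

y = np.convolve(x, h)                    # output = x convolved with h

# Feeding a unit impulse through the system returns h itself.
impulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
impulse_response = np.convolve(impulse, h)[:3]
```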
The document discusses the Fourier transform, which represents signals in terms of their frequencies rather than as polynomials. It originated from Joseph Fourier's idea that periodic functions can be represented as a weighted sum of sines and cosines of different frequencies. The Fourier transform generalizes this idea and represents functions as a sum of waves with different amplitudes and phases. It allows representing signals in the frequency domain rather than the spatial domain, making filtering and solving differential equations easier. The Fourier transform and its inverse are defined mathematically. It has many applications in areas like physics, signal processing, and image analysis.
The document provides an overview of Fourier transforms. It begins by introducing Fourier series which deals with continuous-time periodic signals and results in discrete frequency spectra. It then discusses how the Fourier integral and continuous Fourier transform can deal with aperiodic signals by providing continuous spectra. The continuous Fourier transform represents a function as an integral of its frequencies, while the inverse transform uses this representation to recover the original function. The properties of the Fourier transform discussed include linearity, time scaling, time reversal, time shifting, and frequency shifting. Real functions have special symmetry properties: a real even function has a purely real transform, while a real odd function has a purely imaginary transform. Examples are provided to illustrate how to calculate the Fourier transform of simple functions.
This document discusses various operations on signals using MATLAB coding. It includes summaries of operations such as generating continuous and discrete signals like square waves, triangular waves, time shifting signals, time reversal, using the linearity property, checking for causality and stability, separating signals into even and odd components, multiplying two signals, and generating exponential and sinusoidal discrete signals. MATLAB code is provided for examples of each of these signal operations.
Fourier Series for Continuous Time & Discrete Time Signals (Jayanshu Gundaniya)
- Fourier introduced Fourier series in 1807 to solve the heat equation in a metal plate. The heat equation is a partial differential equation describing the distribution of heat in a body over time.
- Prior to Fourier's work, there was no known solution to the heat equation in the general case. Fourier's idea was to model a complicated heat source as a superposition of simple sine and cosine waves.
- This superposition or linear combination of sine and cosine waves is called the Fourier series. It allows any periodic function to be decomposed into the sum of simple oscillating functions. Although originally introduced for heat problems, Fourier series have wide applications in mathematics and physics.
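The square wave is the classic example of such a decomposition (added NumPy illustration): its Fourier series contains only odd harmonics with weights 4/(πn), and partial sums approach the wave away from its jumps.

```python
import numpy as np

# Evaluate partial sums of the square-wave series
#   sgn(sin t) = (4/pi) * sum over odd n of sin(n t) / n
# on a grid that stays away from the jump discontinuities.
t = np.linspace(0.1, np.pi - 0.1, 200)
square = np.ones_like(t)                 # sgn(sin t) = 1 on this interval

def partial_sum(t, n_terms):
    n = np.arange(1, 2 * n_terms, 2)     # odd harmonics 1, 3, 5, ...
    return (4 / np.pi) * np.sum(np.sin(np.outer(n, t)) / n[:, None], axis=0)

err_5 = np.max(np.abs(partial_sum(t, 5) - square))
err_50 = np.max(np.abs(partial_sum(t, 50) - square))
```

With more harmonics the worst-case error on this interval shrinks; only in ever-narrower neighborhoods of the jumps does the Gibbs overshoot persist.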
Full paper: https://arxiv.org/pdf/1804.02339.pdf
We propose and analyze a novel adaptive step size variant of the Davis-Yin three operator splitting, a method that can solve optimization problems composed of a sum of a smooth term for which we have access to its gradient and an arbitrary number of potentially non-smooth terms for which we have access to their proximal operator. The proposed method leverages local information of the objective function, allowing for larger step sizes while preserving the convergence properties of the original method. It only requires two extra function evaluations per iteration and does not depend on any step-size hyperparameter besides an initial estimate. We provide a convergence rate analysis of this method, showing a sublinear convergence rate for general convex functions and linear convergence under stronger assumptions, matching the best known rates of its non-adaptive variant. Finally, an empirical comparison with related methods on 6 different problems illustrates the computational advantage of the adaptive step size strategy.
This document introduces key concepts in calculus including limits, derivatives, and rates of change. It discusses:
1) How limits describe the behavior of functions as inputs get closer to a target value. Limits are used to define continuity, derivatives, and tangent lines.
2) How derivatives measure the instantaneous rate of change of a function and are defined as the limit of the average rate of change over infinitesimally small intervals.
3) Properties of continuous functions including that polynomials, rational functions, and basic operations preserve continuity. The intermediate value theorem and existence of zeros are introduced.
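The limit definition of the derivative in point 2 can be seen numerically (a small added Python example): the average rate of change of f(x) = x² at x = 3 over shrinking intervals approaches the instantaneous rate, 6.

```python
f = lambda x: x**2

def average_rate(f, x, h):
    """Average rate of change of f over [x, x + h]."""
    return (f(x + h) - f(x)) / h

# As h shrinks through 0.1, 0.01, ..., the rates approach f'(3) = 6:
# approximately 6.1, 6.01, 6.001, ...
rates = [average_rate(f, 3.0, 10.0**-k) for k in range(1, 7)]
```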
Mathematics (from Greek μάθημα máthēma, “knowledge, study, learning”) is the study of topics such as quantity (numbers), structure, space, and change. There is a range of views among mathematicians and philosophers as to the exact scope and definition of mathematics.
This document discusses modal analysis and parameter estimation. It introduces single degree of freedom (SDOF) and multi degree of freedom (MDOF) system theory, including equations of motion, transfer functions, frequency response functions, and impulse responses. Parameter estimation can be performed in the frequency domain using FRFs or the time domain using impulse response functions. The goal is to estimate modal parameters like natural frequencies, damping ratios, and mode shapes.
This document provides definitions and examples related to Fourier series and Fourier transforms. It defines the Fourier transform and inverse Fourier transform of a function f(x). It gives the Fourier integral representation of a function and provides an example of finding the Fourier integral representation of a rectangle function. It also defines Fourier sine and cosine integrals. Finally, it outlines some properties of Fourier transforms, including the modulation theorem and convolution theorem.
The document discusses 11 properties of the Fourier transform: (1) Linearity and superposition, (2) Time scaling, (3) Time shifting, (4) Duality or symmetry, (5) Area under the time domain function equals the Fourier transform at f=0, (6) Area under the Fourier transform equals the time domain function at t=0, (7) Frequency shifting, (8) Differentiation in the time domain, (9) Integration in the time domain, (10) Multiplication in the time domain becomes convolution in the frequency domain, and (11) Convolution in the time domain becomes multiplication in the frequency domain. Each property is explained briefly.
fourier representation of signal and systemsSugeng Widodo
This document provides an overview of Fourier analysis concepts including:
- The Fourier transform decomposes a signal into its constituent frequencies.
- Properties of the Fourier transform like linearity, time/frequency shifting, and modulation are discussed.
- The Fourier transform of a time derivative or integral is related to the original Fourier transform.
- Convolution and correlation theorems explain how time domain operations translate to the frequency domain.
The document discusses modeling dynamic systems and earthquake response. It covers basic concepts like Fourier transforms, single and multi-degree of freedom systems, modal analysis, and elastic response spectra. Numerical methods are presented for dynamic analysis in the frequency and time domains, including the finite element method and method of complex response. Examples of earthquake records and harmonic motion are shown.
The document discusses Fourier analysis and its applications. It introduces:
1) Fourier series and transforms, which express any function as a sum of sines and cosines, allowing the original function to be reconstructed without loss of information.
2) The discrete Fourier transform (DFT), which is used to process digital images and signals, representing them in the frequency domain.
3) An example calculating the 2D DFT of a sample image and recovering the original from the inverse DFT.
4) Common representations of DFT outputs and filters that emphasize different frequencies.
The document discusses several key concepts related to the Fourier transform:
1) It introduces the Dirac delta function and explains how it relates to the Fourier transform of exponential and cosine functions.
2) It describes several theorems regarding how the Fourier transform is affected by scaling, shifting, summing and differentiating functions.
3) It explains that both the intensity and phase of a time domain function, and the spectral intensity and phase in the frequency domain, are needed to fully characterize the function and its Fourier transform.
The document discusses the Fourier transform, which relates a signal sampled in time or space to the same signal sampled in frequency. It explains the mathematical definition and provides an example of using a Fourier transform to convert time domain data to the frequency domain. Specifically, it uses a cosine wave as input data and calculates the Fourier transform to reveal a strong amplitude at the expected frequency component.
- The document discusses the Fourier transform, which decomposes a function into its constituent frequencies.
- It provides the mathematical definition of the Fourier transform and an example calculation.
- Some applications of the Fourier transform mentioned include image processing, data analysis, and designing filters.
Toward an Improved Computational Strategy for Vibration-Proof Structures Equi...Alessandro Palmeri
This presentation has been delivered at the 15th World Conference on Earthquake Engineering in Lisbon (Portugal) on 28th September 2012, and shows some preliminary results on the dynamic analysis on non-linear viscoelastic structures.
This presentation describes the Fourier Transform used in different mathematical and physical applications.
The presentation is at an Undergraduate in Science (math, physics, engineering) level.
Please send comments and suggestions to improvements to solo.hermelin@gmail.com.
More presentations can be found at my website http://www.solohermelin.com.
1) The Fourier transform is useful for designing filters by allowing systems to be described in the frequency domain. Important properties include linearity, time shifts, differentiation, and convolution.
2) Convolution becomes simple multiplication in the frequency domain. To solve a differential/convolution equation using Fourier transforms, take the Fourier transform of the inputs, multiply them, and take the inverse Fourier transform of the result.
3) An example shows designing a low-pass filter by taking the inverse Fourier transform of a rectangular function, producing an ideal low-pass response without time-domain oscillations. Approximating this with a causal function provides some low-pass filtering characteristics.
Signal Processing Introduction using Fourier TransformsArvind Devaraj
1) The document introduces signal processing by discussing signals, systems, and transforms. It defines signals as functions of time or space and systems as maps that manipulate signals. Transforms represent signals in different domains like frequency to simplify operations.
2) Signals can be represented in the frequency domain using Fourier transforms. This makes operations like filtering easier. Low frequencies represent overall shape while high frequencies are details like noise or edges.
3) Linear and time-invariant systems can be characterized by their impulse response. The output is the convolution of the input and impulse response. Convolution is a mechanism that shapes signals to produce outputs.
The document discusses the Fourier transform, which represents signals in terms of their frequencies rather than polynomials. It originated from Jean Fourier's idea that periodic functions can be represented as a weighted sum of sines and cosines of different frequencies. The Fourier transform generalizes this idea and represents functions as a sum of waves with different amplitudes and phases. It allows representing signals in the frequency domain rather than the spatial domain, making filtering and solving differential equations easier. The Fourier transform and its inverse are defined mathematically. It has many applications in areas like physics, signal processing, and image analysis.
The document provides an overview of Fourier transforms. It begins by introducing Fourier series which deals with continuous-time periodic signals and results in discrete frequency spectra. It then discusses how the Fourier integral and continuous Fourier transform can deal with aperiodic signals by providing continuous spectra. The continuous Fourier transform represents a function as an integral of its frequencies, while the inverse transform uses this representation to recover the original function. The properties of the Fourier transform discussed include linearity, time scaling, time reversal, time shifting, and frequency shifting. Real functions have special properties where the Fourier transform is always real or pure imaginary. Examples are provided to illustrate how to calculate the Fourier transform of simple functions.
This document discusses various operations on signals using MATLAB coding. It includes summaries of operations such as generating continuous and discrete signals like square waves, triangular waves, time shifting signals, time reversal, using the linearity property, checking for causality and stability, separating signals into even and odd components, multiplying two signals, and generating exponential and sinusoidal discrete signals. MATLAB code is provided for examples of each of these signal operations.
Fourier Series for Continuous Time & Discrete Time SignalsJayanshu Gundaniya
- Fourier introduced Fourier series in 1807 to solve the heat equation in a metal plate. The heat equation is a partial differential equation describing the distribution of heat in a body over time.
- Prior to Fourier's work, there was no known solution to the heat equation in the general case. Fourier's idea was to model a complicated heat source as a superposition of simple sine and cosine waves.
- This superposition or linear combination of sine and cosine waves is called the Fourier series. It allows any periodic function to be decomposed into the sum of simple oscillating functions. Although originally introduced for heat problems, Fourier series have wide applications in mathematics and physics.
Full paper: https://arxiv.org/pdf/1804.02339.pdf
We propose and analyze a novel adaptive step size variant of the Davis-Yin three operator splitting, a method that can solve optimization problems composed of a sum of a smooth term for which we have access to its gradient and an arbitrary number of potentially non-smooth terms for which we have access to their proximal operator. The proposed method leverages local information of the objective function, allowing for larger step sizes while preserving the convergence properties of the original method. It only requires two extra function evaluations per iteration and does not depend on any step size hyperparameter besides an initial estimate. We provide a convergence rate analysis of this method, showing sublinear convergence rate for general convex functions and linear convergence under stronger assumptions, matching the best known rates of its non adaptive variant. Finally, an empirical comparison with related methods on 6 different problems illustrates the computational advantage of the adaptive step size strategy.
This document introduces key concepts in calculus including limits, derivatives, and rates of change. It discusses:
1) How limits describe the behavior of functions as inputs get closer to a target value. Limits are used to define continuity, derivatives, and tangent lines.
2) How derivatives measure the instantaneous rate of change of a function and are defined as the limit of the average rate of change over infinitesimally small intervals.
3) Properties of continuous functions including that polynomials, rational functions, and basic operations preserve continuity. The intermediate value theorem and existence of zeros are introduced.
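The definition of the derivative in point 2 can be checked numerically. This is an illustrative sketch (the function and point are assumptions, not from the document): the difference quotient of f(x) = x² at x = 3 approaches f'(3) = 6 as the interval shrinks.

```python
# Sketch: the derivative as a limit of average rates of change.
# For f(x) = x**2 at x = 3, the difference quotient
# (f(3 + h) - f(3)) / h approaches f'(3) = 6 as h -> 0.

def diff_quotient(f, x, h):
    """Average rate of change of f over [x, x + h]."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2
approximations = [diff_quotient(f, 3.0, h) for h in (1.0, 0.1, 0.001)]
# h = 1.0   -> 7.0 (exactly, since the quotient equals 6 + h)
# h = 0.1   -> approximately 6.1
# h = 0.001 -> approximately 6.001
```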
Mathematics (from Greek μάθημα máthēma, “knowledge, study, learning”) is the study of topics such as quantity (numbers), structure, space, and change. There is a range of views among mathematicians and philosophers as to the exact scope and definition of mathematics.
This document discusses modal analysis and parameter estimation. It introduces single degree of freedom (SDOF) and multi degree of freedom (MDOF) system theory, including equations of motion, transfer functions, frequency response functions, and impulse responses. Parameter estimation can be performed in the frequency domain using FRFs or the time domain using impulse response functions. The goal is to estimate modal parameters like natural frequencies, damping ratios, and mode shapes.
Quantitative norm convergence of some ergodic averagesVjekoslavKovac1
The document summarizes quantitative estimates for the convergence of multiple ergodic averages of commuting transformations. Specifically, it presents a theorem that provides an explicit bound on the number of jumps in the L^p norm for double averages over commuting A^ω actions on a probability space. The proof transfers the structure of the Cantor group A^Z to R_+ and establishes norm estimates for bilinear averages of functions on R_+^2. This allows bounding the variation of the double averages and proving the theorem.
The document discusses limits, continuity, and related concepts. Some key points:
1) It defines the concept of a limit and explains how to evaluate one-sided and two-sided limits. A limit exists only if the left and right-sided limits are equal.
2) Continuity is defined as a function being defined at a point, with the limit existing and being equal to the function value there. Functions like tan(x) are continuous only where their denominator (cos x, in the case of tan) is nonzero.
3) Theorems are presented for evaluating limits of polynomials, sums, products, and quotients of continuous functions, along with the squeeze theorem. Piecewise functions may or may not be continuous depending on their behavior at points of discontinuity.
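The one-sided-limit criterion in point 1 can be seen on a concrete piecewise function. This is an illustrative sketch (the function is an assumption, not from the document): the left and right limits at x = 1 disagree, so the two-sided limit does not exist there.

```python
# Sketch: one-sided limits of a piecewise function with a jump at x = 1.
# g has left limit 1 and right limit 2 at x = 1, so no two-sided limit.

def g(x):
    return x if x < 1 else x + 1

# Approach x = 1 from each side with shrinking offsets 10**-k.
left = [g(1 - 10 ** -k) for k in range(1, 6)]    # values approach 1
right = [g(1 + 10 ** -k) for k in range(1, 6)]   # values approach 2
```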
A very wide spectrum of optimisation problems can be efficiently solved with proximal gradient methods which hinge on the celebrated forward-backward splitting (FBS) schema. But such first-order methods are only effective when low or medium accuracy is required and are known to be rather slow or even impractical for badly conditioned problems. Moreover, the straightforward introduction of second-order (Hessian) information is beset with shortcomings as, typically, at every iteration we need to solve a non-separable optimisation problem. In this talk we will follow a different route to the solution of such optimisation problems. We will recast non-smooth optimisation problems as the minimisation of a real-valued, continuously differentiable function known as the forward-backward envelope. We will then employ a semismooth Newton method to solve the equivalent optimisation problem instead of the original one. We will then apply the proposed semismooth Newton method to L1-regularised least squares (LASSO) problems, which is motivated by an interesting application: recursive compressed sensing. Compressed sensing is a signal processing methodology for the reconstruction of sparsely sampled signals and it offers a new paradigm for sampling signals based on their innovation, that is, the minimum number of coefficients sufficient to accurately represent them in an appropriately selected basis. Compressed sensing leads to a lower sampling rate compared to theories using some fixed basis and has many applications in image processing, medical imaging and MRI, photography, holography, facial recognition, radio astronomy, radar technology and more. The traditional compressed sensing approach is naturally offline, in that it amounts to sparsely sampling and reconstructing a given dataset.
Recently, an online algorithm for performing compressed sensing on streaming data was proposed; the scheme uses recursive sampling of the input stream and recursive decompression to accurately estimate stream entries from the acquired noisy measurements. We will see how we can tailor the forward-backward Newton method to solve recursive compressed sensing problems at one tenth of the time required by other algorithms such as ISTA, FISTA, ADMM and interior-point methods (L1LS).
- The document discusses the convergence rate of gradient descent for minimizing strongly convex and smooth loss functions.
- It shows that under suitable step sizes, the gradient descent iterates converge geometrically fast to the optimal solution.
- It then applies this analysis to the generalized linear model, showing that gradient descent can estimate the true parameter within radius proportional to the statistical error, provided enough iterations.
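The geometric convergence claimed above can be demonstrated on a toy problem. This is an illustrative sketch, not the document's analysis: a diagonal quadratic with strong convexity μ = 1 and smoothness L = 10, minimized by gradient descent with step size 1/L.

```python
import numpy as np

# Sketch: geometric convergence of gradient descent on a strongly convex
# quadratic f(x) = 0.5 * x.T @ H @ x with H positive definite.

H = np.diag([1.0, 10.0])           # eigenvalues: mu = 1, L = 10
step = 1.0 / 10.0                  # classical step size 1/L
x = np.array([1.0, 1.0])

errors = []
for _ in range(50):
    x = x - step * (H @ x)         # gradient step: grad f(x) = H @ x
    errors.append(np.linalg.norm(x))
# The distance to the optimum (the origin) shrinks geometrically; along
# the slow eigendirection the contraction factor is exactly 1 - mu/L = 0.9.
```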
This document discusses probability density functions (pdfs) and how they relate to probability distribution functions. It provides examples of common pdfs like the uniform and Gaussian distributions. The Gaussian or normal distribution is described in more detail. The document also discusses how to determine the pdf of a random variable that is a function of another random variable, whether the function is monotonic or non-monotonic. Key aspects like changing of variables in integrals and combining probabilities for multiple values are addressed.
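The change-of-variables idea for a monotonic function can be verified by simulation. This is an illustrative sketch (the distributions are an assumption, not from the document): for X ~ Uniform(0, 1) and Y = X², the formula gives f_Y(y) = 1/(2√y), so P(Y ≤ 0.25) = √0.25 = 0.5.

```python
import numpy as np

# Sketch: pdf of a monotonic function of a random variable.
# If X ~ Uniform(0, 1) and Y = X**2, then with the inverse g^{-1}(y) = sqrt(y):
#   f_Y(y) = f_X(sqrt(y)) * |d sqrt(y) / dy| = 1 / (2 * sqrt(y)),  0 < y < 1,
# which integrates to P(Y <= 0.25) = sqrt(0.25) = 0.5.

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200_000)
y = x ** 2
empirical = np.mean(y <= 0.25)     # close to the analytic value 0.5
```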
Computational Tools and Techniques for Numerical Macro-Financial ModelingVictor Zhorin
A set of numerical tools used to create and analyze non-linear macroeconomic models with a financial sector is discussed. New methods and results for computing Hansen-Scheinkman-Borovička shock-price and shock-exposure elasticities for a variety of models are presented.
Spectral approximation technology (chebfun):
- numerical computation with Chebyshev functions; piece-wise smooth functions, breakpoint detection
- rootfinding; functions with singularities
- fast adaptive quadratures; continuous QR, SVD, least-squares; linear operators
- solution of linear and non-linear ODEs
- Fréchet derivatives via automatic differentiation; PDEs in one space variable plus time
Stochastic processes:
- (quasi) Monte-Carlo simulations, polynomial chaos expansion (gPC), finite differences (FD), non-linear IRF
- Borovička-Hansen-Scheinkman shock-exposure and shock-price elasticities
- Malliavin derivatives
Many states (curing the curse of dimensionality):
- low-rank tensor decomposition
- sparse Smolyak grids
This document discusses perturbed proximal gradient algorithms for minimizing composite functions involving both smooth and nonsmooth terms. It specifically focuses on cases where the gradient of the smooth term is intractable or can only be approximated using Markov chain Monte Carlo methods. The document outlines convergence results for stochastic proximal gradient descent with biased approximations and presents an accelerated Nesterov-based variant. It also discusses open questions regarding variance reduction techniques, averaging strategies, maximal achievable rates, and the potential benefits of faster increasing sequences.
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
This document discusses methods for estimating derivatives of the likelihood function in intractable models, such as hidden Markov models, where the likelihood does not have a closed form. It presents three key ideas:
1) Iterated filtering, which approximates the score by perturbing the parameter and tracking the evolution of the perturbation through sequential updates.
2) Proximity mapping, which relates the shift from the prior mode to the posterior mode, as the prior variance goes to zero, to the score via Moreau's approximation.
3) Posterior concentration induced by a normal prior concentrating at a point, where Taylor expansions show the shift in posterior moments is of the order of the prior variance, relating it to the derivatives of the log-likelihood.
This document provides an overview of functions, limits, and continuity. It defines key concepts such as domain and range of functions, and examples of standard real functions. It also covers even and odd functions, and how to calculate limits, including left and right hand limits. Methods for evaluating algebraic limits using substitution, factorization, and rationalization are presented. The objectives are to understand functions, domains, ranges, and how to evaluate limits of functions.
Stochastic Alternating Direction Method of MultipliersTaiji Suzuki
This document discusses stochastic optimization methods for solving regularized learning problems with structured regularization and large datasets. It proposes applying the alternating direction method of multipliers (ADMM) in a stochastic manner. Specifically, it introduces two stochastic ADMM methods for online data: RDA-ADMM, which extends regularized dual averaging with ADMM updates; and OPG-ADMM, which extends online proximal gradient descent with ADMM updates. These methods allow the regularization term to be optimized in batches, resolving computational difficulties, while the loss is optimized online using only a small number of samples per iteration.
The lattice Boltzmann equation: background, boundary conditions, and Burnett-...Tim Reis
An overview of the lattice Boltzmann equation and a discussion of moment-based boundary conditions. Includes applications to the slip flow regime and the Burnett stress. Some analysis sheds insight into the physical and numerical behaviour of the algorithm.
This document discusses functions, limits, and continuity. It begins by defining functions, domains, ranges, and some standard real functions like constant, identity, modulus, and greatest integer functions. It then covers limits of functions including one-sided limits and properties of limits. Examples are provided to illustrate evaluating limits using substitution and factorization methods. The overall objectives are to understand functions, domains, ranges, limits of functions and methods to evaluate limits.
The document outlines the steps to formulate an optimization problem including identifying design variables, formulating constraints, formulating the objective function, and setting variable bounds. It then provides an example of optimizing the cross-sectional areas of members in a seven bar truss structure to minimize weight while satisfying stress, stability, stiffness, and other constraints. Classical optimization techniques are summarized, such as single variable methods like bracketing and interval halving, and multi-variable methods including handling equality and inequality constraints.
The document discusses digital image processing and two-dimensional transforms. It provides an agenda that covers two-dimensional mathematical preliminaries and two transforms: the discrete Fourier transform (DFT) and discrete cosine transform (DCT). It then discusses the DFT and DCT in more detail over several pages, covering properties, examples, and applications such as image compression.
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Sean Meyn
Many machine learning and optimization algorithms solve hidden root-finding problems through the magic of stochastic approximation (SA). Unfortunately, these algorithms are slow to converge: the optimal convergence rate for the mean squared error (MSE) is of order O(n⁻¹) at iteration n.
Far faster convergence rates are possible by reconsidering the design of exploration signals used in these algorithms. In this lecture the focus is on quasi-stochastic approximation (QSA), in which a multi-dimensional clock process defines exploration. It is found that algorithms can be designed to achieve a MSE convergence rate approaching O(n⁻⁴).
Although the framework is entirely deterministic, this new theory leans heavily on concepts from the theory of Markov processes. Most critical is Poisson’s equation, used to transform the QSA equations into a mean flow with additive “noise” with attractive properties. Existence of solutions to Poisson’s equation is based on Baker’s Theorem from number theory; to the best of our knowledge, this is the first time this theorem has been applied to any topic in engineering!
The theory is illustrated with applications to gradient free optimization.
Joint research with Caio Lauand, current graduate student at UF.
References
[1] C. Kalil Lauand and S. Meyn. Approaching quartic convergence rates for quasi-stochastic approximation with application to gradient-free optimization. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 15743–15756. Curran Associates, Inc., 2022.
[2] C. K. Lauand and S. Meyn. Quasi-stochastic approximation: Design principles with applications to extremum seeking control. IEEE Control Systems Magazine, 43(5):111–136, Oct 2023.
[3] C. K. Lauand and S. Meyn. The curse of memory in stochastic approximation. In Proc. IEEE Conference on Decision and Control, pages 7803–7809, 2023. Extended version. arXiv 2309.02944, 2023.
Random Matrix Theory and Machine Learning - Part 4Fabian Pedregosa
Deep learning models with millions or billions of parameters should overfit according to classical theory, but they do not. The emerging theory of double descent seeks to explain why larger neural networks can generalize well. Random matrix theory provides a tractable framework to model double descent through random feature models, where the number of random features controls model capacity. In the high-dimensional limit, the test error of random feature regression exhibits a double descent shape that can be computed analytically.
Random Matrix Theory and Machine Learning - Part 3Fabian Pedregosa
ICML 2021 tutorial on random matrix theory and machine learning.
Part 3 covers: 1. Motivation: Average-case versus worst-case in high dimensions 2. Algorithm halting times (runtimes) 3. Outlook
Random Matrix Theory and Machine Learning - Part 1Fabian Pedregosa
This document provides an introduction to random matrix theory and its applications in machine learning. It discusses several classical random matrix ensembles like the Gaussian Orthogonal Ensemble (GOE) and Wishart ensemble. These ensembles are used to model phenomena in fields like number theory, physics, and machine learning. Specifically, the GOE is used to model Hamiltonians of heavy nuclei, while the Wishart ensemble relates to the Hessian of least squares problems. The tutorial will cover applications of random matrix theory to analyzing loss landscapes, numerical algorithms, and the generalization properties of machine learning models.
Average case acceleration through spectral density estimationFabian Pedregosa
We develop a framework for designing optimal quadratic optimization methods in terms of their average-case runtime. This yields a new class of methods that achieve acceleration through a model of the Hessian's expected spectral density. We develop explicit algorithms for the uniform, Marchenko-Pastur, and exponential distributions. These methods are momentum-based gradient algorithms whose hyper-parameters can be estimated without knowledge of the Hessian's smallest singular value, in contrast with classical accelerated methods like Nesterov acceleration and Polyak momentum. Empirical results on quadratic, logistic regression and neural networks show the proposed methods always match and in many cases significantly improve over classical accelerated methods.
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsFabian Pedregosa
This document provides an overview of asynchronous stochastic optimization methods and algorithms. It discusses asynchronous parallel stochastic gradient descent (SGD) and how it can minimize idle time. It also introduces asynchronous variance-reduced optimization methods like asynchronous SAGA that provide faster convergence than SGD. The document analyzes the convergence properties of asynchronous optimization methods and presents empirical results demonstrating the speedups achieved by asynchronous proximal SAGA (ProxASAGA) on large datasets.
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Opti...Fabian Pedregosa
The document proposes a new parallel method called Proximal Asynchronous Stochastic Gradient Average (ProxASAGA) for solving composite optimization problems. ProxASAGA extends SAGA to handle nonsmooth objectives using proximal operators, and runs asynchronously in parallel without locks. It is shown to converge at the same linear rate as the sequential algorithm theoretically, and achieves speedups of 6-12x on a 20-core machine in practice on large datasets, with greater speedups on sparser problems as predicted by theory.
Hyperparameter optimization with approximate gradientFabian Pedregosa
This document discusses hyperparameter optimization using approximate gradients. It introduces the problem of optimizing hyperparameters along with model parameters. While model parameters can be estimated from data, hyperparameters require methods like cross-validation. The document proposes using approximate gradients to optimize hyperparameters more efficiently than costly methods like grid search. It derives the gradient of the objective with respect to hyperparameters and presents an algorithm called HOAG that approximates this gradient using inexact solutions. The document analyzes HOAG's convergence and provides experimental results comparing it to other hyperparameter optimization methods.
Lightning: large scale machine learning in pythonFabian Pedregosa
Lightning is a Python library for large-scale machine learning that incorporates recent advances in optimization algorithms. It is compatible with scikit-learn and supports both dense and sparse data as well as structured sparsity penalties. Lightning scales to large datasets using stochastic optimization methods like SGD, SVRG, SDCA, and SAGA. It also efficiently handles large feature spaces using coordinate descent algorithms. The API is similar to scikit-learn but is based on optimization algorithms rather than machine learning models. Lightning is part of the scikit-learn-contrib project.
Profiling in Python provides concise summaries of key profiling tools:
cProfile and line_profiler profile execution time and identify slow lines of code. memory_profiler profiles memory usage with line-by-line or time-based outputs. YEP extends profiling to compiled C/C++ extensions like Cython modules, which are not covered by the standard Python profilers.
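A minimal cProfile usage sketch follows; it uses only the standard-library `cProfile`/`pstats` API, and the profiled function `slow_sum` is a made-up example.

```python
import cProfile
import io
import pstats

# Hypothetical workload to profile (not from the document).
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Collect timing data around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Render a text report of the hottest calls, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

`line_profiler` and `memory_profiler` follow a similar decorate-then-run workflow but report per-line statistics instead of per-function ones.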
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Or: Beyond linear.
Abstract: Equivariant neural networks are neural networks that incorporate symmetries. The nonlinear activation functions in these networks result in interesting nonlinear equivariant maps between simple representations, and motivate the key player of this talk: piecewise linear representation theory.
Disclaimer: No one is perfect, so please mind that there might be mistakes and typos.
dtubbenhauer@gmail.com
Corrected slides: dtubbenhauer.com/talks.html
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxMAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation, makes them the most convenient, least labor-intensive live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larvae. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
The debris of the ‘last major merger’ is dynamically youngSérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different from the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics is consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
Thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long-standing and ongoing scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...AbdullaAlAsif1
The pygmy halfbeak Dermogenys colletei, is known for its viviparous nature, this presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study delves into the examination of fecundity and the Gonadosomatic Index (GSI) in the Pygmy Halfbeak, D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that the Pygmy halfbeak, D. colletei, may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study utilizing 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study lends to a better understanding of viviparous fish in Borneo and contributes to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
Sufficient decrease is all you need
1. Sufficient decrease is all you need
A simple condition to forget about the step-size, with
applications to the Frank-Wolfe algorithm.
Fabian Pedregosa
June 4th, 2018. Google Brain Montreal
2. Where I Come From
ML/Optimization/Software Guy
Engineer (2010–2012): first contact with ML, developing an ML library (scikit-learn).
ML and Neuroscience (2012–2015): PhD applying ML to neuroscience.
ML and Optimization (2015–): stochastic, parallel, constrained, and hyperparameter optimization.
4. Outline
Motivation: eliminate the step-size parameter.
1. Frank-Wolfe: a method for constrained optimization.
2. Adaptive Frank-Wolfe: Frank-Wolfe without the step-size.
3. Perspectives: other applications, proximal splitting, stochastic optimization.
With a little help from my collaborators:
Armin Askari (UC Berkeley), Geoffrey Négiar (UC Berkeley), Martin Jaggi (EPFL), Gauthier Gidel (UdeM)
7. The Frank-Wolfe (FW) algorithm, aka conditional gradient
Problem: smooth f, compact D
arg min x∈D f(x)
Algorithm 1: Frank-Wolfe (FW)
1 for t = 0, 1, . . . do
2   st ∈ arg min s∈D ⟨∇f(xt), s⟩
3   Find γt by line-search: γt ∈ arg min γ∈[0,1] f((1 − γ)xt + γst)
4   xt+1 = (1 − γt)xt + γtst
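The loop above is short enough to sketch in code. The following is a minimal NumPy sketch (not from the talk): FW for a least-squares objective over the ℓ1 ball, where the linear minimization oracle returns a signed vertex of the ball and the line-search has a closed form because the objective is quadratic. The function names (`lmo_l1`, `frank_wolfe_quadratic`) are illustrative.

```python
import numpy as np

def lmo_l1(grad, radius=1.0):
    """Linear minimization oracle for the l1 ball:
    argmin over ||s||_1 <= radius of <grad, s> is a signed vertex."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe_quadratic(A, b, radius=1.0, max_iter=200):
    """FW for f(x) = 0.5 ||Ax - b||^2 over the l1 ball.
    For a quadratic objective, the line-search has a closed form."""
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)
        s = lmo_l1(grad, radius)
        d = s - x
        Ad = A @ d
        denom = Ad @ Ad
        if denom == 0:
            break  # d = 0 or flat direction: nothing to do
        # argmin over gamma in [0,1] of f(x + gamma*d), closed form
        gamma = np.clip(-(grad @ d) / denom, 0.0, 1.0)
        x = x + gamma * d
    return x
```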
12. Why people ♥ Frank-Wolfe
• Projection-free. Only linear subproblems arise, vs. quadratic ones for projection.
• The solution of the linear subproblem is always an extremal element of D.
• Iterates admit a sparse representation: xt is a convex combination of at most t elements.
15. Recent applications of Frank-Wolfe
• Learning the structure of a neural network.¹
• Attention mechanisms that enforce sparsity.²
• ℓ1-constrained problems with an extreme number of features.³
¹ Wei Ping, Qiang Liu, and Alexander T Ihler (2016). “Learning Infinite RBMs with Frank-Wolfe”. In: Advances in Neural Information Processing Systems.
² Vlad Niculae et al. (2018). “SparseMAP: Differentiable Sparse Structured Inference”. In: International Conference on Machine Learning.
³ Thomas Kerdreux, Fabian Pedregosa, and Alexandre d’Aspremont (2018). “Frank-Wolfe with Subsampling Oracle”. In: Proceedings of the 35th International Conference on Machine Learning.
16. A practical issue
• Line-search is only efficient when a closed form exists (quadratic objective).
• The step-size γt = 2/(t + 2) is convergent, but extremely slow.
Algorithm 2: Frank-Wolfe (FW)
1 for t = 0, 1, . . . do
2   st ∈ arg min s∈D ⟨∇f(xt), s⟩
3   Find γt by line-search: γt ∈ arg min γ∈[0,1] f((1 − γ)xt + γst)
4   xt+1 = (1 − γt)xt + γtst
Can we do better?
19. Down the citation rabbit hole
Vladimir Demyanov, Aleksandr Rubinov⁴
⁴ Vladimir Demyanov and Aleksandr Rubinov (1970). Approximate methods in optimization problems (translated from Russian).
21. The Demyanov-Rubinov (DR) Frank-Wolfe variant
Problem: smooth objective, compact domain
arg min x∈D f(x), where f is L-smooth
(L-smooth ≡ differentiable with L-Lipschitz gradient).
• The step-size depends on the correlation between −∇f(xt) and the descent direction st − xt.
Algorithm 3: FW, DR variant
1 for t = 0, 1, . . . do
2   st ∈ arg min s∈D ⟨∇f(xt), s⟩
3   γt = min{⟨−∇f(xt), st − xt⟩ / (L‖st − xt‖²), 1}
4   xt+1 = (1 − γt)xt + γtst
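Line 3 of the DR variant is a one-liner in code. A hypothetical helper (not from the slides), assuming the global constant L is known:

```python
import numpy as np

def dr_step_size(grad, x, s, L):
    """Demyanov-Rubinov step-size:
    gamma = min( <-grad, s - x> / (L * ||s - x||^2), 1 )."""
    d = s - x
    gap = -grad @ d       # FW gap along d; >= 0 when s solves the LMO
    sq_norm = d @ d
    if sq_norm == 0:
        return 0.0        # already at the LMO solution
    return min(gap / (L * sq_norm), 1.0)
```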
22. The Demyanov-Rubinov (DR) Frank-Wolfe variant
Where does γt = min{⟨−∇f(xt), st − xt⟩ / (L‖st − xt‖²), 1} come from?
L-smooth inequality
Any L-smooth function f verifies
f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/2)‖x − y‖² =: Qx(y)
for all x, y in the domain.
• The right-hand side is a quadratic upper bound: Qx(y) ≥ f(y).
25. Justification of the step-size
• The L-smooth inequality at y = xt+1(γ) = (1 − γ)xt + γst, x = xt gives
f(xt+1(γ)) ≤ f(xt) + γ⟨∇f(xt), st − xt⟩ + (γ²L/2)‖st − xt‖²
• Minimizing the right-hand side over γ ∈ [0, 1] gives
γ = min{⟨−∇f(xt), st − xt⟩ / (L‖st − xt‖²), 1},
the Demyanov-Rubinov step-size!
• This is exact line search on the quadratic upper bound.
28. Towards an Adaptive FW
Quadratic upper bound
The Demyanov-Rubinov variant makes use of a quadratic upper bound, but it is only evaluated at xt and xt+1.
Sufficient decrease is all you need
The L-smooth inequality can be replaced by
f(xt+1) ≤ f(xt) + γt⟨∇f(xt), st − xt⟩ + (γt²Lt/2)‖st − xt‖²
with γt = min{⟨−∇f(xt), st − xt⟩ / (Lt‖st − xt‖²), 1}.
Key difference with DR: L is replaced by Lt. Potentially Lt ≪ L.
31. The Adaptive FW algorithm
New FW variant with adaptive step-size.⁵
Algorithm 4: The Adaptive Frank-Wolfe algorithm (AdaFW)
1 for t = 0, 1, . . . do
2   st ∈ arg min s∈D ⟨∇f(xt), s⟩
3   Find Lt that verifies the sufficient decrease condition (1), with
4   γt = min{⟨−∇f(xt), st − xt⟩ / (Lt‖st − xt‖²), 1}
5   xt+1 = (1 − γt)xt + γtst

f(xt+1) ≤ f(xt) + γt⟨∇f(xt), st − xt⟩ + (γt²Lt/2)‖st − xt‖²   (1)

⁵ Fabian Pedregosa, Armin Askari, Geoffrey Negiar, and Martin Jaggi (2018). “Step-Size Adaptivity in Projection-Free Optimization”. In: Submitted.
32. The Adaptive FW algorithm
[Figure: on γ ∈ [0, 1], the quadratic model f(xt) + γ⟨∇f(xt), st − xt⟩ + (γ²Lt/2)‖st − xt‖² upper-bounds f((1 − γ)xt + γst); γt minimizes the model.]
• Worst case, Lt = L. Often Lt ≪ L =⇒ larger step-size.
• Adaptivity to the local geometry.
• Two extra function evaluations per iteration, often given as a byproduct of the gradient.
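The slides leave "Find Lt" abstract. A common realization, used here as an assumption (the paper's exact schedule may differ), is backtracking: start from a fraction of the previous estimate and double until the sufficient decrease condition holds. A sketch with illustrative names:

```python
import numpy as np

def adafw_step(f, grad_x, x, s, lipschitz_prev):
    """One adaptive FW step: find L_t by backtracking so that the
    sufficient decrease condition holds, then take the step.
    grad_x is the gradient of f at x, s the LMO output."""
    d = s - x
    gap = -grad_x @ d          # FW gap along d
    sq_norm = d @ d
    f_x = f(x)
    L_t = 0.5 * lipschitz_prev  # optimistically try a smaller constant
    for _ in range(50):
        gamma = min(gap / (L_t * sq_norm), 1.0)
        x_next = x + gamma * d
        # sufficient decrease:
        # f(x+) <= f(x) - gamma*gap + (gamma^2 * L_t / 2) * ||d||^2
        if f(x_next) <= f_x - gamma * gap + 0.5 * gamma**2 * L_t * sq_norm:
            return x_next, L_t
        L_t *= 2.0              # condition failed: increase the estimate
    return x_next, L_t
```

On a well-conditioned region the accepted L_t can be much smaller than the global L, which is exactly what yields the larger step-sizes mentioned above.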
36. Zig-zagging phenomenon in FW
The Frank-Wolfe algorithm zig-zags when the solution lies on a face of the boundary.
Some FW variants have been developed to address this issue.
37. Away-steps FW, informal
The Away-steps FW algorithm (Wolfe, 1970; Guélat and Marcotte, 1986) adds the possibility to move away from an active atom.
38. Away-steps FW algorithm
Keep an active set St: the active vertices that have been previously selected and have non-zero weight.
Algorithm 5: Away-Steps FW
1 for t = 0, 1, . . . do
2   st ∈ arg min s∈D ⟨∇f(xt), s⟩
3   vt ∈ arg max v∈St ⟨∇f(xt), v⟩
4   if ⟨−∇f(xt), st − xt⟩ ≥ ⟨−∇f(xt), xt − vt⟩ then
5     dt = st − xt   (FW step)
6   else
7     dt = xt − vt   (away step)
8   Find γt by line-search: γt ∈ arg min γ∈[0,γtmax] f(xt + γdt)
9   xt+1 = xt + γtdt
44. Pairwise FW
Key idea: move weight mass between two atoms in each step.
Proposed by (Lacoste-Julien and Jaggi, 2015), inspired by the MDM algorithm (Mitchell, Demyanov, and Malozemov, 1974).
Algorithm 6: Pairwise FW
1 for t = 0, 1, . . . do
2   st ∈ arg min s∈D ⟨∇f(xt), s⟩
3   vt ∈ arg max v∈St ⟨∇f(xt), v⟩
4   dt = st − vt
5   Find γt by line-search: γt ∈ arg min γ∈[0,γtmax] f(xt + γdt)
6   xt+1 = xt + γtdt
45. Away-steps FW and Pairwise FW
Convergence of Away-steps and Pairwise FW
• Linear convergence for strongly convex functions on polytopes (Lacoste-Julien and Jaggi, 2015).
• Can we design variants with sufficient decrease?
Introducing Adaptive Away-steps and Adaptive Pairwise
Choose Lt such that it verifies
f(xt + γtdt) ≤ f(xt) + γt⟨∇f(xt), dt⟩ + (γt²Lt/2)‖dt‖²
with γt = min{⟨−∇f(xt), dt⟩ / (Lt‖dt‖²), 1}.
48. Adaptive Pairwise FW
Algorithm 7: Adaptive Pairwise FW
1 for t = 0, 1, . . . do
2   st ∈ arg min s∈D ⟨∇f(xt), s⟩
3   vt ∈ arg max v∈St ⟨∇f(xt), v⟩
4   dt = st − vt
5   Find Lt that verifies the sufficient decrease condition (2), with
6   γt = min{⟨−∇f(xt), dt⟩ / (Lt‖dt‖²), 1}
7   xt+1 = xt + γtdt

f(xt + γtdt) ≤ f(xt) + γt⟨∇f(xt), dt⟩ + (γt²Lt/2)‖dt‖²   (2)
49. Theory for adaptive step-size variants
Strongly convex f
Pairwise and Away-steps converge linearly on a polytope. For each “good step” we have:
f(xt+1) − f(x*) ≤ (1 − ρ)(f(xt) − f(x*))
Convex f
For all FW variants, f(xt) − f(x*) ≤ O(1/t).
Non-convex f
For all FW variants, max s∈D ⟨∇f(xt), xt − s⟩ ≤ O(1/√t).
57. Proximal splitting
Building a quadratic upper bound is common in proximal gradient descent (Beck and Teboulle, 2009; Nesterov, 2013).
Recently extended to the Davis-Yin three operator splitting:⁶
minimize f(x) + g(x) + h(x)
with access to ∇f, prox_γg, prox_γh.
Key insight: verify a sufficient decrease condition of the form
f(xt+1) ≤ f(zt) + ⟨∇f(zt), xt+1 − zt⟩ + (1/(2γt))‖xt+1 − zt‖²
⁶ Fabian Pedregosa and Gauthier Gidel (2018). “Adaptive Three Operator Splitting”. In: Proceedings of the 35th International Conference on Machine Learning.
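For intuition, the same condition can be checked in the simpler two-term setting f + g of proximal gradient descent. The sketch below (an illustration, not the Davis-Yin three-operator scheme from the paper) uses g = λ‖·‖₁ and adapts γt by halving until the sufficient decrease condition holds:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def adaptive_prox_grad(f, grad_f, x0, lam, step=1.0, max_iter=100):
    """Proximal gradient for f(x) + lam*||x||_1 with an adaptive step:
    gamma is accepted only if the sufficient decrease condition
    f(x+) <= f(x) + <grad f(x), x+ - x> + ||x+ - x||^2 / (2*gamma) holds."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad_f(x)
        f_x = f(x)
        gamma = 2.0 * step        # optimistically grow the step
        while True:
            x_next = soft_threshold(x - gamma * g, gamma * lam)
            diff = x_next - x
            if f(x_next) <= f_x + g @ diff + (diff @ diff) / (2 * gamma):
                break             # sufficient decrease verified
            gamma /= 2.0          # shrink until the condition holds
        step = gamma
        x = x_next
    return x
```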
58. Nearly-isotonic penalty
Problem: arg min x loss(x) + λ Σ i=1..p−1 max{xi − xi+1, 0}
[Figure: estimated coefficients vs. ground truth, and objective minus optimum over time, for λ ∈ {10⁻⁶, 10⁻³, 0.01, 0.1}; methods compared: Adaptive TOS (variants 1 and 2), TOS (1/L), TOS (1.99/L), TOS-AOLS, PDHG, Adaptive PDHG.]
59. Overlapping group lasso penalty
Problem: arg min x loss(x) + λ Σ g∈G ‖[x]g‖₂
[Figure: estimated coefficients vs. ground truth, and objective minus optimum over time, for λ ∈ {10⁻⁶, 10⁻³, 0.01, 0.1}; same methods compared as in the nearly-isotonic benchmark.]
61. Stochastic optimization
Problem: arg min x∈Rᵈ (1/n) Σ i=1..n fi(x)
Heuristic from⁷ to estimate L by verifying at each iteration t
fi(xt − (1/L)∇fi(xt)) ≤ fi(xt) − (1/(2L))‖∇fi(xt)‖²
with i a random index sampled at iteration t.
This is the L-smooth inequality with y = xt − (1/L)∇fi(xt), x = xt.
⁷ Mark Schmidt, Nicolas Le Roux, and Francis Bach (2017). “Minimizing finite sums with the stochastic average gradient”. In: Mathematical Programming.
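The heuristic is easy to state in code. A sketch (function names are illustrative): double L until the sampled component fi satisfies the inequality above.

```python
import numpy as np

def estimate_lipschitz(f_i, grad_i, x, L=1.0, max_doublings=50):
    """Schmidt et al. heuristic: double L until the sampled f_i verifies
    f_i(x - grad/L) <= f_i(x) - ||grad||^2 / (2L)."""
    g = grad_i(x)
    sq_norm = g @ g
    for _ in range(max_doublings):
        if f_i(x - g / L) <= f_i(x) - sq_norm / (2 * L):
            return L            # inequality holds: keep this estimate
        L *= 2.0                # too optimistic: double and retry
    return L
```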
66. Conclusion
• A sufficient decrease condition to set the step-size in FW and variants.
• (Mostly) hyperparameter-free; adaptivity to the local geometry.
• Applications in proximal splitting and stochastic optimization.
Thanks for your attention
68. References
Beck, Amir and Marc Teboulle (2009). “Gradient-based algorithms with applications to signal recovery”. In: Convex optimization in signal processing and communications.
Demyanov, Vladimir and Aleksandr Rubinov (1970). Approximate methods in optimization problems (translated from Russian).
Guélat, Jacques and Patrice Marcotte (1986). “Some comments on Wolfe’s away step”. In: Mathematical Programming.
Kerdreux, Thomas, Fabian Pedregosa, and Alexandre d’Aspremont (2018). “Frank-Wolfe with Subsampling Oracle”. In: Proceedings of the 35th International Conference on Machine Learning.
Lacoste-Julien, Simon and Martin Jaggi (2015). “On the global linear convergence of Frank-Wolfe optimization variants”. In: Advances in Neural Information Processing Systems.
Mitchell, BF, Vladimir Fedorovich Demyanov, and VN Malozemov (1974). “Finding the point of a polyhedron closest to the origin”. In: SIAM Journal on Control.
Nesterov, Yu (2013). “Gradient methods for minimizing composite functions”. In: Mathematical Programming.
Niculae, Vlad et al. (2018). “SparseMAP: Differentiable Sparse Structured Inference”. In: International Conference on Machine Learning.
Pedregosa, Fabian et al. (2018). “Step-Size Adaptivity in Projection-Free Optimization”. In: Submitted.
Pedregosa, Fabian and Gauthier Gidel (2018). “Adaptive Three Operator Splitting”. In: Proceedings of the 35th International Conference on Machine Learning.
Ping, Wei, Qiang Liu, and Alexander T Ihler (2016). “Learning Infinite RBMs with Frank-Wolfe”. In: Advances in Neural Information Processing Systems.
Schmidt, Mark, Nicolas Le Roux, and Francis Bach (2017). “Minimizing finite sums with the stochastic average gradient”. In: Mathematical Programming.
Wolfe, Philip (1970). “Convergence theory in nonlinear programming”. In: Integer and nonlinear programming.