This document provides an overview and introduction to the course ELEN E4810: Digital Signal Processing. It discusses key topics that will be covered in the course including digital signal processing concepts and operations, basic signal processing blocks and diagrams, classes of sequences, and an introduction to MATLAB for programming assignments. The course will focus on modifying and processing signals using computers and will involve homework assignments, a midterm, final exam, and a course project involving hands-on DSP implementation.
1. ELEN E4810: Digital Signal Processing
Topic 1: Introduction
1. Course overview
2. Digital Signal Processing
3. Basic operations & block diagrams
4. Classes of sequences
2. 1. Course overview
Digital signal processing:
Modifying signals with computers
Web site:
http://www.ee.columbia.edu/~dpwe/e4810/
Book:
Mitra “Digital Signal Processing”
(3rd ed., 2005)
Instructor: dpwe@ee.columbia.edu
3. Grading structure
Homeworks: 20%
Mainly from Mitra
Wednesday-Wednesday schedule
Collaborate, don’t copy
Midterm: 20%
One session
Final exam: 30%
Project: 30%
4. Course project
Goal: hands-on experience with DSP
Practical implementation
Work in pairs or alone
Brief report, optional presentation
Recommend MATLAB
Ideas on website
Don’t copy! Cite your sources!
5. Example past projects
Solo Singing Detection
on web site
Guitar Chord Classifier
Speech/Music Discrimination
Room sonar
Construction equipment monitoring
DTMF decoder
Reverb algorithms
Compression algorithms
6. MATLAB
Interactive system for numerical
computation
Extensive signal processing library
Focus on algorithm, not implementation
Access:
Columbia Site License:
https://portal.seas.columbia.edu/matlab/
Student Version (need Sig. Proc. toolbox)
Engineering Terrace 251 computer lab
8. 2. Digital Signal Processing
Signals:
Information-bearing function
E.g. sound: air pressure variation at a
point as a function of time p(t)
Dimensionality:
Sound: 1-Dimension
Greyscale image i(x,y) : 2-D
Video: 3 x 3-D: {r(x,y,t) g(x,y,t) b(x,y,t)}
9. Example signals
Noise - all domains
Spread-spectrum phone - radio
ECG - biological
Music
Image/video - compression
….
10. Signal processing
Modify a signal to extract/enhance/
rearrange the information
Origin in analog electronics e.g. radar
Examples…
Noise reduction
Data compression
Representation for recognition/
classification…
11. Digital Signal Processing
DSP = signal processing on a computer
Two effects: discrete time, discrete level (continuous x(t) → sequence x[n])
12. DSP vs. analog SP
Conventional signal processing:
p(t) Processor q(t)
Digital SP system:
p[n] q[n]
p(t) A/D Processor D/A q(t)
13. Digital vs. analog
Pros
Noise performance - quantized signal
Use a general computer - flexibility, upgradability
Stability/duplicability
Novelty
Cons
Limitations of A/D & D/A
Baseline complexity / power consumption
14. DSP example
Speech time-scale modification:
extend duration without altering pitch
15. 3. Operations on signals
Discrete time signal often obtained by
sampling a continuous-time signal
Sequence {x[n]} = xa(nT), n=…-1,0,1,2…
T= samp. period; 1/T= samp. frequency
16. Sequences
Can write a sequence by listing values:
{x[n]} = {..., 0.2, 2.2, 1.1, 0.2, 3.7, 2.9, ...}
with an arrow (↑) marking the value where n = 0
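A minimal MATLAB sketch (the values and the position of n = 0 here are just illustrative) of how a finite segment of such a sequence can be held and displayed with an explicit index vector:
n = -2:3;                       % sample indices; n = 0 is the third entry
x = [0.2 2.2 1.1 0.2 3.7 2.9];  % corresponding sample values
stem(n, x); xlabel('n'); ylabel('x[n]');   % discrete-time plot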
17. Left- and right-sided
x[n] may be defined only for certain n:
N1 ≤ n ≤ N2: Finite length (length = …)
N1 ≤ n: Right-sided (Causal if N1 ≥ 0)
n ≤ N2: Left-sided (Anticausal)
Can always extend with zero-padding
Left-sided Right-sided
18. Operations on sequences
Addition operation (adder): y[n] = x[n] + w[n]
Multiplication operation (multiplier, gain A): y[n] = A·x[n]
19. More operations
Product (modulation) operation: y[n] = x[n]·w[n]
E.g. Windowing: multiplying an infinite-length sequence by a finite-length window sequence to extract a region
20. Time shifting
Time-shifting operation: y[n] = x[n − N], where N is an integer
If N > 0, it is a delaying operation (unit delay)
If N < 0, it is an advance operation (unit advance)
22. Up- and down-sampling
Certain operations change the effective
sampling rate of sequences by adding
or removing samples
Up-sampling = adding more samples
= interpolation
Down-sampling = discarding samples
= decimation
23. Down-sampling
In down-sampling by an integer factor M > 1, every M-th sample of the input sequence is kept and the M − 1 in-between samples are removed:
x_d[n] = x[nM]
24. Down-sampling
An example of down-sampling by M = 3: y[n] = x[3n]
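A minimal MATLAB sketch of down-sampling by a factor M (MATLAB vectors are 1-based, so x(1) plays the role of x[0]; the input is just an example):
M  = 3;
x  = 0:11;             % example input sequence
xd = x(1:M:end);       % keep every M-th sample: xd = [0 3 6 9], i.e. xd[n] = x[nM]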
25. Up-sampling
Up-sampling is the converse of down-sampling: L − 1 zero values are inserted between each pair of original values:
x_u[n] = x[n/L] for n = 0, ±L, ±2L, ...
x_u[n] = 0 otherwise
26. Up-sampling
An example of up-sampling by L = 3
Note: up-sampling is not the inverse of down-sampling!
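A matching MATLAB sketch of up-sampling by a factor L (again with 1-based vectors and an arbitrary example input):
L  = 3;
x  = [1 2 3 4];
xu = zeros(1, L*length(x));   % allocate room for L-1 zeros per sample
xu(1:L:end) = x;              % xu = [1 0 0 2 0 0 3 0 0 4 0 0]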
27. Complex numbers
.. a mathematical convenience that leads to simple expressions
A second "imaginary" dimension (j ≡ √−1) is added to all values.
Rectangular form: x = x_re + j·x_im
where magnitude |x| = √(x_re² + x_im²)
and phase θ = tan⁻¹(x_im/x_re)
Polar form: x = |x|·e^{jθ} = |x|·cosθ + j·|x|·sinθ
(Euler: e^{jθ} = cosθ + j·sinθ)
28. Complex math
When adding, real and imaginary parts add:
(a+jb) + (c+jd) = (a+c) + j(b+d)
When multiplying, magnitudes multiply and phases add:
r·e^{jθ} · s·e^{jφ} = rs·e^{j(θ+φ)}
Phases modulo 2π
29. Complex conjugate
Flips imaginary part / negates phase:
Conjugate x* = x_re − j·x_im = |x|·e^{−jθ}
Useful in resolving to real quantities:
x + x* = x_re + j·x_im + x_re − j·x_im = 2·x_re
x·x* = |x|·e^{jθ} · |x|·e^{−jθ} = |x|²
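A quick MATLAB check of these identities (the value of x is arbitrary):
x = 3 + 4j;
abs(x)          % magnitude |x| = 5
angle(x)        % phase in radians
x + conj(x)     % = 2*real(x) = 6
x * conj(x)     % = abs(x)^2 = 25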
30. Classes of sequences
Useful to define broad categories…
Finite/infinite (extent in n)
Real/complex:
x[n] = xre[n] + j·xim[n]
31. Classification by symmetry
Conjugate symmetric sequence:
if x[n] = x_re[n] + j·x_im[n]
then x_cs[n] = x_cs*[−n] = x_re[−n] − j·x_im[−n]
Conjugate antisymmetric:
x_ca[n] = −x_ca*[−n] = −x_re[−n] + j·x_im[−n]
32. Conjugate symmetric decomposition
Any sequence can be expressed as
conjugate symmetric (CS) /
antisymmetric (CA) parts:
x[n] = xcs[n] + xca[n]
where:
xcs[n] = 1/2(x[n] + x*[-n]) = xcs*[-n]
xca[n] = 1/2(x[n] – x*[-n]) = -xca*[-n]
When signals are real,
CS → Even (xre[n] = xre[-n]), CA → Odd
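A minimal MATLAB sketch of this decomposition for a sequence defined on a symmetric index range n = −N..N, so that x*[−n] is just the conjugated, flipped vector (the input is arbitrary):
n   = -3:3;
x   = randn(1,7) + 1j*randn(1,7);    % example complex sequence
xcs = (x + conj(fliplr(x)))/2;       % conjugate-symmetric part
xca = (x - conj(fliplr(x)))/2;       % conjugate-antisymmetric part
max(abs(x - (xcs + xca)))            % ~0: the parts sum back to x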
33. Basic sequences
Unit sample sequence: δ[n] = 1 at n = 0, and 0 for all other n
Shift in time: δ[n − k] = 1 at n = k, and 0 elsewhere
Can express any sequence with δ:
{α0, α1, α2, ...} = α0·δ[n] + α1·δ[n−1] + α2·δ[n−2] + ...
34. More basic sequences
Unit step sequence:
μ[n] = 1 for n ≥ 0, 0 for n < 0
Relate to unit sample:
δ[n] = μ[n] − μ[n−1]
μ[n] = Σ_{k=−∞}^{n} δ[k]
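A short MATLAB sketch generating δ[n] and μ[n] on a finite index range and checking the two relations (the range is arbitrary):
n     = -5:5;
delta = double(n == 0);               % unit sample
mu    = double(n >= 0);               % unit step
isequal(delta, mu - [0 mu(1:end-1)])  % true: delta[n] = mu[n] - mu[n-1]
isequal(mu, cumsum(delta))            % true: mu[n] is the running sum of delta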
35. Exponential sequences
Exponential sequences are eigenfunctions of LTI systems
General form: x[n] = A·α^n
If A and α are real (and positive), |α| > 1 gives a growing sequence and |α| < 1 a decaying one
36. Complex exponentials
x[n] = A·α^n
Constants A, α can be complex:
A = |A|·e^{jφ} ; α = e^{(σ + jω)}
→ x[n] = |A|·e^{σn}·e^{j(ωn + φ)}
i.e. a fixed scale, a varying magnitude, and a varying phase
37. Complex exponentials
Complex exponential sequence can 'project down' onto real & imaginary axes to give sinusoidal sequences
x[n] = exp{(−1/12 + jπ/6)·n}
x_re[n] = e^{−n/12}·cos(πn/6)   x_im[n] = e^{−n/12}·sin(πn/6)
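A MATLAB sketch of exactly this sequence, plotting the real and imaginary projections:
n = 0:40;
x = exp((-1/12 + 1j*pi/6)*n);     % complex exponential sequence
subplot(2,1,1); stem(n, real(x)); title('x_{re}[n] = e^{-n/12} cos(\pi n/6)');
subplot(2,1,2); stem(n, imag(x)); title('x_{im}[n] = e^{-n/12} sin(\pi n/6)');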
38. Periodic sequences
A sequence satisfying x[n] = x[n + kN] for all n is called a periodic sequence with a period N, where N is a positive integer and k is any integer.
The smallest value of N satisfying this is called the fundamental period.
39. Periodic exponentials
Sinusoidal sequence sin(ω0·n + φ) and complex exponential sequence e^{jω0·n} are periodic sequences of period N only if
ω0·N = 2πr  with N & r positive integers
The smallest value of N satisfying this is the fundamental period of the sequence
r = 1 → one sinusoid cycle per N samples
r > 1 → r cycles per N samples
40. Symmetry of periodic sequences
An N-point finite-length sequence x_f[n] defines a periodic sequence:
x[n] = x_f[⟨n⟩_N]
where ⟨n⟩_N means "n modulo N": ⟨n⟩_N = n + rN  s.t. 0 ≤ ⟨n⟩_N < N, r ∈ Z
Symmetry of x_f[n] is not defined, because x_f[n] is undefined for n < 0
Define Periodic Conjugate Symmetric:
x_pcs[n] = 1/2·(x[n] + x*[⟨−n⟩_N])
         = 1/2·(x_f[n] + x_f*[N − n]),  1 ≤ n < N
41. Sampling sinusoids
Sampling a sinusoid is ambiguous:
x1[n] = sin(ω0·n)
x2[n] = sin((ω0 + 2πr)·n) = sin(ω0·n) = x1[n]
42. Aliasing
E.g. for cos(ω·n), all ω = 2πr ± ω0 (integer r) appear the same after sampling
We say that a larger ω appears aliased to a lower frequency
Principal value for discrete-time frequency: 0 ≤ ω0 ≤ π
(i.e. less than 1/2 cycle per sample)
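A quick MATLAB check of this ambiguity (ω0 is arbitrary): sinusoids whose frequencies differ by 2π give identical samples.
n  = 0:20;
w0 = pi/5;
x1 = sin(w0*n);
x2 = sin((w0 + 2*pi)*n);
max(abs(x1 - x2))     % ~1e-15: indistinguishable after sampling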
43. ELEN E4810: Digital Signal Processing
Topic 2: Time domain
1. Discrete-time systems
2. Convolution
3. Linear Constant-Coefficient Difference
Equations (LCCDEs)
4. Correlation
44. 1. Discrete-time systems
A system converts input to output:
x[n] → DT System → y[n],  {y[n]} = f({x[n]})
E.g. Moving Average (MA):
y[n] = (1/M)·Σ_{k=0}^{M−1} x[n−k]
(block diagram: x[n], x[n−1], x[n−2] each scaled by 1/M and summed; M = 3)
45. Moving Average (MA)
(Figure: an example input x[n], its delayed copies x[n−1] and x[n−2], and the output y[n] = (1/M)·Σ_{k=0}^{M−1} x[n−k] of the M = 3 moving average)
46. MA Smoother
MA smoothes out rapid variations (e.g. "12 month moving average")
e.g. signal plus noise: x[n] = s[n] + d[n]
5-pt moving average: y[n] = (1/5)·Σ_{k=0}^{4} x[n−k]
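A minimal MATLAB sketch of such a smoother, applying a 5-point moving average to a noisy signal (the underlying signal here is just an illustrative sinusoid):
n = 0:99;
s = sin(2*pi*n/50);             % slow underlying signal
x = s + 0.3*randn(size(n));     % add noise: x[n] = s[n] + d[n]
M = 5;
y = filter(ones(1,M)/M, 1, x);  % y[n] = (1/M) * sum of x[n-k], k = 0..M-1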
47. Accumulator
Output accumulates all past inputs:
y[n] = Σ_{ℓ=−∞}^{n} x[ℓ]
     = Σ_{ℓ=−∞}^{n−1} x[ℓ] + x[n]
     = y[n−1] + x[n]
(block diagram: x[n] added to the delayed output y[n−1])
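Two equivalent MATLAB sketches of the accumulator for a finite-length input (zero initial conditions assumed):
x  = [1 2 0 -1 3];
y1 = cumsum(x);              % running sum
y2 = filter(1, [1 -1], x);   % y[n] = y[n-1] + x[n]
isequal(y1, y2)              % true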
48. Accumulator
(Figure: an example input x[n], the delayed output y[n−1], and the accumulator output y[n])
49. Classes of DT systems
Linear systems obey superposition:
x[n] → DT system → y[n]
if input x1[n] → output y1[n], x2 → y2 ...
then a linear combination of inputs x[n] = α·x1[n] + β·x2[n]
gives the same linear combination of outputs y[n] = α·y1[n] + β·y2[n]
for all α, β, x1, x2
50. Linearity: Example 1
Accumulator: y[n] = Σ_{ℓ=−∞}^{n} x[ℓ]
For x[n] = α·x1[n] + β·x2[n]:
y[n] = Σ_{ℓ=−∞}^{n} (α·x1[ℓ] + β·x2[ℓ])
     = α·Σ_{ℓ=−∞}^{n} x1[ℓ] + β·Σ_{ℓ=−∞}^{n} x2[ℓ]
     = α·y1[n] + β·y2[n]  → Linear
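A numerical MATLAB check of this superposition property for the accumulator (inputs and scale factors chosen arbitrarily):
x1 = randn(1,10);  x2 = randn(1,10);
a = 2;  b = -0.5;
lhs = cumsum(a*x1 + b*x2);            % accumulate the combined input
rhs = a*cumsum(x1) + b*cumsum(x2);    % combine the accumulated outputs
max(abs(lhs - rhs))                   % ~0 (up to round-off)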
52. Linearity Example 3:
'Offset' accumulator: y[n] = C + Σ_{ℓ=−∞}^{n} x[ℓ]
y1[n] = C + Σ_{ℓ=−∞}^{n} x1[ℓ]
but for x[n] = x1[n] + x2[n]:
y[n] = C + Σ_{ℓ=−∞}^{n} (x1[ℓ] + x2[ℓ]) ≠ y1[n] + y2[n]  → Nonlinear
.. unless C = 0
53. Property: Shift (time) invariance
Time-shift of input causes same shift in output
i.e. if x1[n] → y1[n]
then x[n] = x1[n − n0] → y[n] = y1[n − n0]
i.e. process doesn't depend on absolute value of n
54. Shift-invariance counterexample
Upsampler: x[n] → ↑L → y[n]
y[n] = x[n/L] for n = 0, ±L, ±2L, ...; 0 otherwise
so y1[n] = x1[n/L]  (at n = r·L)
For x[n] = x1[n − n0]:
y[n] = x[n/L] = x1[n/L − n0] = x1[(n − L·n0)/L] = y1[n − L·n0] ≠ y1[n − n0]
→ Not shift invariant
55. Another counterexample
y[n] = n·x[n]  (scaling by time index)
Hence y1[n − n0] = (n − n0)·x1[n − n0]
But if x[n] = x1[n − n0]
then y[n] = n·x1[n − n0] ≠ y1[n − n0]
→ Not shift invariant - parameters depend on n
56. Linear Shift Invariant (LSI)
Systems which are both linear and
shift invariant are easily manipulated
mathematically
This is still a wide and useful class of
systems
If discrete index corresponds to time,
called Linear Time Invariant (LTI)
57. Causality
If output depends only on past and current inputs (not future), system is called causal
Formally, if x1[n] → y1[n] & x2[n] → y2[n], causality means:
x1[n] = x2[n] for all n < N  ⇒  y1[n] = y2[n] for all n < N
58. Causality example
Moving average: y[n] = (1/M)·Σ_{k=0}^{M−1} x[n−k]
y[n] depends on x[n−k], k ≥ 0 → causal
'Centered' moving average:
y_c[n] = y[n + (M−1)/2]
       = (1/M)·( x[n] + Σ_{k=1}^{(M−1)/2} ( x[n−k] + x[n+k] ) )
.. looks forward in time → noncausal
.. Can make causal by delaying
59. Impulse response (IR)
Impulse (unit sample sequence): δ[n] = 1 at n = 0, 0 elsewhere
Given a system: x[n] → DT system → y[n]
if x[n] = δ[n] then y[n] ≜ h[n], the "impulse response"
LSI system completely specified by h[n]
62. Convolution sum
Hence, since x[n] = Σ_{k=-∞}^{∞} x[k]·δ[n-k]
For LSI, y[n] = Σ_{k=-∞}^{∞} x[k]·h[n-k]   (the convolution sum)
written as y[n] = x[n] * h[n]
Summation is symmetric in x and h
i.e. with l = n - k:
x[n] * h[n] = Σ_{l=-∞}^{∞} x[n-l]·h[l] = h[n] * x[n]
Dan Ellis 2012-09-12 20
68. Convolution notes
Total nonzero length of convolving N- and M-point sequences is N+M-1
Adding the indices of the terms within the summation gives n:
y[n] = Σ_{k=-∞}^{∞} h[k]·x[n-k],   k + (n-k) = n
i.e. summation indices move in opposite senses
Dan Ellis 2012-09-12 26
69. Convolution in MATLAB
The M-file conv implements the
convolution sum of two finite-length
sequences
If a = [0 3 1 2 -1]
b = [3 2 1]
then conv(a,b) yields
[0 9 9 11 2 0 -1]
Dan Ellis 2012-09-12 27
70. Connected systems
Cascade connection:
Impulse response h[n] of the cascade of two systems with impulse responses h1[n] and h2[n] is h[n] = h1[n] * h2[n]
By commutativity, h[n] = h2[n] * h1[n], so the blocks can be cascaded in either order
Dan Ellis 2012-09-12 28
71. Inverse systems
δ[n] is the identity for convolution
i.e. x[n] * δ[n] = x[n]
Consider the cascade x[n] → [h1] → y[n] → [h2] → z[n]:
z[n] = h2[n] * y[n] = h2[n] * h1[n] * x[n]
     = x[n] if h2[n] * h1[n] = δ[n]
h2[n] is the inverse system of h1[n]
Dan Ellis 2012-09-12 29
72. Inverse systems
Use inverse system to recover input x[n]
from output y[n] (e.g. to undo effects of
transmission channel)
Only sometimes possible - e.g. cannot
‘invert’ h1[n] = 0
In general, attempt to solve h2[n] * h1[n] = δ[n]
Dan Ellis 2012-09-12 30
73. Inverse system example
Accumulator: impulse response h1[n] = μ[n] (unit step)
‘Backwards difference’ h2[n] = δ[n] - δ[n-1]
.. has the desired property:
μ[n] - μ[n-1] = δ[n]
Thus, the ‘backwards difference’ is the inverse system of the accumulator.
Dan Ellis 2012-09-12 31
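A quick MATLAB check (sketch, arbitrary input values) that the backwards difference undoes the accumulator:
% Accumulator followed by backwards difference returns the input
x = [1 4 -2 0 3 7];
y = cumsum(x);                   % accumulator, h1[n] = mu[n]
z = filter([1 -1], 1, y);        % backwards difference, h2[n] = delta[n] - delta[n-1]
disp([x; z]);                    % z equals x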
74. Parallel connection
Impulse response of two systems connected in parallel (outputs added together) is h[n] = h1[n] + h2[n]
Dan Ellis 2012-09-12 32
75. 3. Linear Constant-Coefficient
Difference Equation (LCCDE)
General spec. of a DT, LSI, finite-dimensional system:
Σ_{k=0}^{N} d_k·y[n-k] = Σ_{k=0}^{M} p_k·x[n-k]
defined by {dk}, {pk};  order = max(N, M)
Rearrange for y[n] in causal form:
y[n] = -Σ_{k=1}^{N} (d_k/d_0)·y[n-k] + Σ_{k=0}^{M} (p_k/d_0)·x[n-k]
WLOG, can always scale so that d0 = 1
Dan Ellis 2012-09-12 33
76. Solving LCCDEs
“Total solution” y[n] = yc[n] + yp[n]
Complementary solution: satisfies Σ_{k=0}^{N} d_k·y[n-k] = 0
Particular solution: for the given forcing function x[n]
Dan Ellis 2012-09-12 34
77. Complementary Solution
General form of unforced oscillation, i.e. the system’s ‘natural modes’
Assume yc has form yc[n] = λ^n
Σ_{k=0}^{N} d_k·λ^{n-k} = 0
λ^{n-N}·(d_0·λ^N + d_1·λ^{N-1} + … + d_{N-1}·λ + d_N) = 0
Σ_{k=0}^{N} d_k·λ^{N-k} = 0   Characteristic polynomial of the system - depends only on {dk}
Dan Ellis 2012-09-12 35
78. Complementary Solution
Σ_{k=0}^{N} d_k·λ^{N-k} = 0 factors into roots λi, i.e.
(λ - λ1)(λ - λ2)... = 0
Each/any λi satisfies the eqn.
Thus, complementary solution:
yc[n] = α1·λ1^n + α2·λ2^n + α3·λ3^n + ...
Any linear combination will work
→ αi's are free to match initial conditions
Dan Ellis 2012-09-12 36
79. Complementary Solution
Repeated roots in chr. poly: (λ - λ1)^L·(λ - λ2)... = 0
yc[n] = α1·λ1^n + α2·n·λ1^n + α3·n^2·λ1^n + ... + αL·n^{L-1}·λ1^n + ...
Complex λi's → sinusoidal terms in yc[n] = Σ_i αi·λi^n
Dan Ellis 2012-09-12 37
80. Particular Solution
Recall: Total solution y[n] = yc [n] + y p [n]
Particular solution reflects input
‘Modes’ usually decay away for large n
leaving just yp[n]
Assume yp has the ‘form’ of x[n], scaled by β:
e.g. x[n] constant → yp[n] = β
x[n] = λ0^n → yp[n] = β·λ0^n   (λ0 ∉ {λi})
or yp[n] = β·n^L·λ0^n   (λ0 ∈ {λi})
Dan Ellis 2012-09-12 38
81. LCCDE example
y[n] + y[n-1] - 6y[n-2] = x[n]
Need input: x[n] = 8μ[n]
Need initial conditions: y[-1] = 1, y[-2] = -1
Dan Ellis 2012-09-12 39
82. LCCDE example
Complementary solution:
y[n] + y[n-1] - 6y[n-2] = 0;  try y[n] = λ^n
λ^{n-2}·(λ^2 + λ - 6) = 0
(λ + 3)(λ - 2) = 0 → roots λ1 = -3, λ2 = 2
yc[n] = α1·(-3)^n + α2·(2)^n
α1, α2 are unknown at this point
Dan Ellis 2012-09-12 40
84. LCCDE example
Total solution y[n] = yc[n] + yp[n]
             = α1·(-3)^n + α2·(2)^n + β,  with β = -2
(particular solution for the constant input: β + β - 6β = 8 → β = -2)
Solve for the unknown αi's by substituting initial conditions into the DE at n = 0, 1, ...
y[n] + y[n-1] - 6y[n-2] = x[n]
n = 0:  y[0] + y[-1] - 6y[-2] = x[0]   (y[-1], y[-2] from ICs)
        (α1 + α2 + β) + 1 + 6 = 8
        α1 + α2 = 3
Dan Ellis 2012-09-12 42
85. LCCDE example
n = 1:  y[1] + y[0] - 6y[-1] = x[1]
        (-3α1 + 2α2 + β) + (α1 + α2 + β) - 6·1 = 8
        -2α1 + 3α2 = 18
solve: α1 = -1.8, α2 = 4.8
Hence, system output:
y[n] = -1.8·(-3)^n + 4.8·(2)^n - 2,   n ≥ 0
Don’t find the αi's by solving with ICs at n = -1, -2
(ICs may not reflect the natural modes; Mitra example 2.37/38 is wrong)
Dan Ellis 2012-09-12 43
86. LCCDE solving summary
Difference Equation (DE):
A·y[n] + B·y[n-1] + ... = C·x[n] + D·x[n-1] + ...
Initial Conditions (ICs): y[-1] = ...
Set RHS = 0, try y[n] = λ^n → roots {λi}
give complementary soln yc[n] = Σ_i αi·λi^n
Particular soln: yp[n] ~ form of x[n]
solve for β·λ0^n “at large n”
Find the αi's by substituting into the DE at n = 0, 1, ...
using the ICs for y[-1], y[-2] and y_total = yc + yp for y[0], y[1]
Dan Ellis 2012-09-12 44
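A minimal MATLAB check of the example above: run the difference equation directly from the given ICs and compare against the closed-form total solution (values taken from the worked example):
% Direct recursion y[n] = -y[n-1] + 6*y[n-2] + x[n], checked against the closed form
N = 8;
x = 8*ones(1, N);                % x[n] = 8*mu[n] for n = 0..N-1
yprev = [1 -1];                  % [y(-1), y(-2)] initial conditions
y = zeros(1, N);
for n = 1:N                      % MATLAB index n corresponds to time n-1
    y(n) = -yprev(1) + 6*yprev(2) + x(n);
    yprev = [y(n) yprev(1)];
end
nn = 0:N-1;
yclosed = -1.8*(-3).^nn + 4.8*2.^nn - 2;
disp(max(abs(y - yclosed)));     % ~0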
87. LCCDEs: zero input/zero state
Alternative approach to solving
LCCDEs is to solve two subproblems:
yzi[n], response with zero input (just ICs)
yzs[n], response with zero state (just x[n])
Because of linearity, y[n] = yzi[n]+yzs[n]
Both subproblems are ‘real’
But, have to solve for αis twice
(then sum them)
Dan Ellis 2012-09-12 45
88. Impulse response of LCCDEs
Impulse response: δ[n] LCCDE h[n]
i.e. solve with x[n] = δ[n] → y[n] = h[n]
(zero ICs)
With x[n] = δ[n], ‘form’ of yp[n] = βδ[n]
→ solve y[n] for n = 0,1, 2... to find αis
Dan Ellis 2012-09-12 46
89. LCCDE IR example
e.g. y[n] + y[n-1] - 6y[n-2] = x[n]
(from before); x[n] = δ[n]; y[n] = 0 for n < 0
yc[n] = α1·(-3)^n + α2·(2)^n,   yp[n] = β·δ[n]
n = 0: y[0] + y[-1] - 6y[-2] = x[0] = 1
       ⇒ α1 + α2 + β = 1
n = 1: α1·(-3) + α2·(2) + 1 = 0
n = 2: α1·(9) + α2·(4) - 1 - 6 = 0
⇒ α1 = 0.6, α2 = 0.4, β = 0
thus h[n] = 0.6·(-3)^n + 0.4·(2)^n,  n ≥ 0   (infinite length)
Dan Ellis 2012-09-12 47
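A sketch in MATLAB: feed a unit impulse through the LCCDE with zero ICs (what filter does by default) and compare with the closed-form h[n]:
% Impulse response of y[n] + y[n-1] - 6y[n-2] = x[n] (zero ICs)
N = 8;
d = [1 1 -6];  p = 1;                   % {dk}, {pk}
h = filter(p, d, [1 zeros(1, N-1)]);    % response to delta[n]
n = 0:N-1;
hclosed = 0.6*(-3).^n + 0.4*2.^n;
disp(max(abs(h - hclosed)));            % ~0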
90. System property: Stability
Certain systems can be unstable, e.g.
[Block diagram: feedback through a delay z^-1 with gain 2, i.e. y[n] = x[n] + 2·y[n-1]; stem plot of the output growing without bound]
Output grows without limit in some conditions
Dan Ellis 2012-09-12 48
91. Stability
Several definitions for stability; we use
Bounded-input, bounded-output (BIBO) stable:
For every bounded input |x[n]| < Bx ∀n,
the output is also subject to a finite bound, |y[n]| < By ∀n
Dan Ellis 2012-09-12 49
92. Stability example
MA filter: y[n] = (1/M)·Σ_{k=0}^{M-1} x[n-k]
|y[n]| = |(1/M)·Σ_{k=0}^{M-1} x[n-k]|
       ≤ (1/M)·Σ_{k=0}^{M-1} |x[n-k]|
       ≤ (1/M)·M·Bx = Bx ≜ By → BIBO Stable
Dan Ellis 2012-09-12 50
93. Stability & LCCDEs
LCCDE output is of the form:
y[n] = α1·λ1^n + α2·λ2^n + ... + β·λ0^n + ...
α's and β's depend on the input & ICs,
but to be bounded for any input we need |λi| < 1
Dan Ellis 2012-09-12 51
94. 4. Correlation
Correlation ~ identifies similarity between sequences:
Cross correlation of x against y:
rxy[ℓ] = Σ_{n=-∞}^{∞} x[n]·y[n-ℓ]   (ℓ is the “lag”)
Note: ryx[ℓ] = Σ_{n=-∞}^{∞} y[n]·x[n-ℓ]
             = Σ_{m=-∞}^{∞} y[m+ℓ]·x[m] = rxy[-ℓ]   (call m = n - ℓ)
Dan Ellis 2012-09-12 52
95. Correlation and convolution
Correlation: rxy[n] = Σ_{k=-∞}^{∞} x[k]·y[k-n]
Convolution: x[n] * y[n] = Σ_{k=-∞}^{∞} x[k]·y[n-k]
Hence: rxy[n] = x[n] * y[-n]
Correlation may be calculated by convolving with a time-reversed sequence
Dan Ellis 2012-09-12 53
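A small MATLAB sketch of this identity; x and y below are made-up example sequences:
% Cross-correlation via convolution with a time-reversed sequence
x = [1 2 3 0 -1];
y = [2 1 0 1];
r = conv(x, fliplr(y));          % x[n] * y[-n]; lags run from -(length(y)-1) to length(x)-1
disp(r(length(y)));              % zero-lag value, equals sum(x(1:4).*y)
% (the Signal Processing Toolbox's xcorr(x,y) computes the same quantity; check its lag convention)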
96. Autocorrelation
Autocorrelation (AC) is the correlation of a signal with itself:
rxx[ℓ] = Σ_{n=-∞}^{∞} x[n]·x[n-ℓ] = rxx[-ℓ]
Note: rxx[0] = Σ_{n=-∞}^{∞} x^2[n] = ℰx, the energy of sequence x[n]
Dan Ellis 2012-09-12 54
97. Correlation maxima
Note: |rxx[ℓ]| ≤ rxx[0],  i.e.  |rxx[ℓ]| / rxx[0] ≤ 1
Similarly: |rxy[ℓ]| ≤ ‖x‖·‖y‖,  i.e.  |rxy[ℓ]| / √(rxx[0]·ryy[0]) ≤ 1
From geometry, x·y = Σ_i xi·yi = ‖x‖·‖y‖·cos θ   (θ = angle between x and y)
when x ∥ y, cos θ = 1, else cos θ < 1
Dan Ellis 2012-09-12 55
98. AC of a periodic sequence
Sequence of period N: x̃[n] = x̃[n + N]
Calculate AC over a finite window:
r_x̃x̃[ℓ] = (1/(2M+1))·Σ_{n=-M}^{M} x̃[n]·x̃[n-ℓ]
        = (1/N)·Σ_{n=0}^{N-1} x̃[n]·x̃[n-ℓ]   if M >> N
Dan Ellis 2012-09-12 56
99. AC of a periodic sequence
r_x̃x̃[0] = (1/N)·Σ_{n=0}^{N-1} x̃^2[n] = Px̃,  the average energy per sample, or power, of x̃
r_x̃x̃[ℓ + N] = (1/N)·Σ_{n=0}^{N-1} x̃[n]·x̃[n-ℓ-N] = r_x̃x̃[ℓ]
i.e. the AC of a periodic sequence is periodic
Dan Ellis 2012-09-12 57
100. What correlations look like
[Figure: rxx[ℓ], AC of any x[n]; r_x̃x̃[ℓ], AC of a periodic x̃[n]; rxy[ℓ], cross correlation]
Dan Ellis 2012-09-12 58
102. Correlation in action
Close mic vs.
video camera
mic
Short-time
cross-correlation
Dan Ellis 2012-09-12 60
103. ELEN E4810: Digital Signal Processing
Topic 3: Fourier domain
1. The Fourier domain
2. Discrete-Time Fourier Transform (DTFT)
3. Discrete Fourier Transform (DFT)
4. Convolution with the DFT
Dan Ellis 2012-09-24 1
104. 1. The Fourier Transform
Basic observation (continuous time):
A periodic signal can be decomposed into sinusoids at integer multiples of the fundamental frequency
i.e. if x̃(t) = x̃(t + T)
we can approach x̃ with
x̃(t) ≈ Σ_{k=0}^{M} a_k·cos(2πkt/T + φ_k)   (harmonics of the fundamental)
Dan Ellis 2012-09-24 2
105. Fourier Series
x̃(t) ≈ Σ_{k=0}^{M} a_k·cos(2πkt/T + φ_k)
For a square wave,
φ_k = 0;  a_k = (-1)^{(k-1)/2}·2/(kπ)  for k = 1, 3, 5, ...;  0 otherwise
i.e. x(t) = (2/π)·[ cos(2πt/T) - (1/3)·cos(2π·3t/T) + (1/5)·cos(2π·5t/T) - ... ]
[Figure: square wave and its truncated Fourier series approximation]
Dan Ellis 2012-09-24 3
106. Fourier domain
x̃ is equivalently described by its Fourier Series parameters:
a_k = (-1)^{(k-1)/2}·2/(kπ),  k = 1, 3, 5, ...
[Figure: stem plots of |ak| and phase vs. k]
A negative ak is equivalent to a phase of π
Complex form: x̃(t) ≈ Σ_{k=-M}^{M} c_k·e^{j·2πkt/T}
Dan Ellis 2012-09-24 4
107. Fourier analysis
x̃(t) ≈ Σ_{k=-M}^{M} c_k·e^{j·2πkt/T}
How to find {|ck|}, {arg{ck}} ?
Inner product with complex sinusoids:
c_k = (1/T)·∫_{-T/2}^{T/2} x(t)·e^{-j·2πkt/T} dt
but e^{jθ} = cos θ + j·sin θ, so
c_k = (1/T)·∫ x(t)·cos(2πkt/T) dt - (j/T)·∫ x(t)·sin(2πkt/T) dt
Dan Ellis 2012-09-24 5
108. Fourier analysis
x̃(t) ≈ Σ_{k=-M}^{M} c_k·e^{j·2πkt/T}
Consider x(t) = cos(2πlt/T)
.. so ck should = 0 except for k = ±l
Then
c_k = (1/T)·( ∫ x(t)·cos(2πkt/T) dt - j·∫ x(t)·sin(2πkt/T) dt )
    = (1/T)·( ∫ cos(2πlt/T)·cos(2πkt/T) dt - j·∫ cos(2πlt/T)·sin(2πkt/T) dt )
(the second integral is 0: an even·odd product integrated over a symmetric interval)
Dan Ellis 2012-09-24 6
109. Fourier analysis
Works if k, l are positive integers:
(1/T)·∫_T cos(kt)·cos(lt) dt = { 1/2  if k = ±l;  0 otherwise }   (say T = 2π)
since cos(kt)·cos(lt) = (1/2)·[ cos((k+l)t) + cos((k-l)t) ]
and the integral of each term gives a sinc:
(1/T)·∫_T cos(kt)·cos(lt) dt = (1/2)·( sinc(π(k+l)) + sinc(π(k-l)) )
[Figure: cos(1·t) and cos(2·t) over t/π ∈ [-1, 1]]
Dan Ellis 2012-09-24 7
110. sinc
sinc x ≜ sin(x)/x
[Figure: plot of sinc x = sin(x)/x]
= 1 when x = 0
= 0 when x = r·π, r ≠ 0, r = ±1, ±2, ±3, ...
Dan Ellis 2012-09-24 8
111. Fourier Analysis
Thus, c_k = (1/T)·∫_{-T/2}^{T/2} x(t)·e^{-j·2πkt/T} dt
because the real & imag sinusoids in e^{-j·2πkt/T}
pick out the corresponding sinusoidal components linearly combined
in x(t) = Σ_{k=-M}^{M} c_k·e^{j·2πkt/T}
Dan Ellis 2012-09-24 9
112. Fourier Transform
Fourier series for periodic signals extends naturally to the Fourier Transform for any (CT) signal (not just periodic):
X(jΩ) = ∫_{-∞}^{∞} x(t)·e^{-jΩt} dt    Fourier Transform (FT)
x(t) = (1/2π)·∫_{-∞}^{∞} X(jΩ)·e^{jΩt} dΩ    Inverse Fourier Transform (IFT)
Discrete index k → continuous freq. Ω
Dan Ellis 2012-09-24 10
114. Fourier Transform of a sine
Assume x(t) = e^{jΩ0·t}
Now, since x(t) = (1/2π)·∫_{-∞}^{∞} X(Ω)·e^{jΩt} dΩ
...we know X(Ω) = 2π·δ(Ω - Ω0)
...where δ(x) is the Dirac delta function (continuous time)
i.e. ∫ δ(x - x0)·f(x) dx = f(x0)
→ x(t) = A·e^{jΩ0·t}  ↔  X(Ω) = 2πA·δ(Ω - Ω0)
Dan Ellis 2012-09-24 12
115. Fourier Transforms
Transform                 | Time                           | Frequency
Fourier Series (FS)       | Continuous, periodic x̃(t)      | Discrete, infinite ck
Fourier Transform (FT)    | Continuous, infinite x(t)      | Continuous, infinite X(Ω)
Discrete-Time FT (DTFT)   | Discrete, infinite x[n]        | Continuous, periodic X(e^{jω})
Discrete FT (DFT)         | Discrete, finite/periodic x̃[n] | Discrete, finite/periodic X[k]
Dan Ellis 2012-09-24 13
116. 2. Discrete Time FT (DTFT)
FT defined for discrete sequences:
X(e^{jω}) = Σ_{n=-∞}^{∞} x[n]·e^{-jωn}    DTFT
Summation (not an integral)
Discrete (normalized) frequency variable ω
Argument is e^{jω}, not jω
Dan Ellis 2012-09-24 14
117. DTFT example
e.g. x[n] = α^n·μ[n],  |α| < 1
X(e^{jω}) = Σ_{n=-∞}^{∞} α^n·μ[n]·e^{-jωn} = Σ_{n=0}^{∞} (α·e^{-jω})^n
          = 1 / (1 - α·e^{-jω})
(geometric series: S = Σ_{n=0}^{∞} c^n,  c·S = Σ_{n=1}^{∞} c^n,
 S - c·S = c^0 = 1  →  S = 1/(1 - c)  for |c| < 1)
[Figure: |X(e^{jω})| and arg X(e^{jω}) vs. ω]
Dan Ellis 2012-09-24 15
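A small MATLAB sketch comparing the closed form with a numerically evaluated DTFT of a truncated α^n·μ[n] (α chosen arbitrarily; freqz is from the Signal Processing Toolbox):
% DTFT of alpha^n mu[n] vs. the closed form 1/(1 - alpha e^{-jw})
alpha = 0.8;
n = 0:199;                           % truncation is fine since alpha^n decays
x = alpha.^n;
w = linspace(0, 2*pi, 512);
Xnum = freqz(x, 1, w);               % sum_n x[n] e^{-jwn}
Xcf  = 1 ./ (1 - alpha*exp(-1j*w));
disp(max(abs(Xnum(:) - Xcf(:))));    % small (truncation error only)
plot(w, abs(Xnum), w, abs(Xcf));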
118. Periodicity of X(ej!)
X(e^{jω}) has periodicity 2π in ω:
X(e^{j(ω+2π)}) = Σ_n x[n]·e^{-j(ω+2π)n}
              = Σ_n x[n]·e^{-jωn}·e^{-j2πn}
              = X(e^{jω})
[Figure: |X(e^{jω})| and arg X(e^{jω}) repeating every 2π]
The 2π ambiguity of e^{jω} makes this implicit
Dan Ellis 2012-09-24 16
119. Inverse DTFT (IDTFT)
Same basic form as the other IFTs:
x[n] = (1/2π)·∫_{-π}^{π} X(e^{jω})·e^{jωn} dω    IDTFT
Note: continuous, periodic X(e^{jω}) ↔ discrete, infinite x[n]
IDTFT is actually a forward Fourier Series (except for the sign of ω)
Dan Ellis 2012-09-24 17
127. DTFT symmetry
X(e^{jω}) = Σ_{n=-∞}^{∞} x[n]·e^{-jωn}
If x[n] ↔ X(e^{jω}) then...
x[-n]      ↔  X(e^{-jω})     (from the summation)
x*[n]      ↔  X*(e^{-jω})    ((e^{-jω})* = e^{jω})
Re{x[n]}   ↔  XCS(e^{jω}) = (1/2)·[ X(e^{jω}) + X*(e^{-jω}) ]
              (conjugate symmetry cancels the Im parts in the IDTFT)
jIm{x[n]}  ↔  XCA(e^{jω}) = (1/2)·[ X(e^{jω}) - X*(e^{-jω}) ]
xcs[n]     ↔  Re{X(e^{jω})}
xca[n]     ↔  jIm{X(e^{jω})}
Dan Ellis 2012-09-24 25
128. DTFT of real x[n]
When x[n] is pure real, X(e^{jω}) = X*(e^{-jω})   (conjugate symmetric)
xcs[n] ≡ xev[n] = xev[-n]   ↔  XR(e^{jω}) = XR(e^{-jω})
xca[n] ≡ xod[n] = -xod[-n]  ↔  XI(e^{jω}) = -XI(e^{-jω})
x[n] real and even → X(e^{jω}) even and real
[Figure: real/imaginary parts of x[n] and X(e^{jω})]
Dan Ellis 2012-09-24 26
130. Convolution with DTFT
Since g[n] * h[n] ↔ G(e^{jω})·H(e^{jω}),
we can calculate a convolution by:
finding the DTFTs of g, h → G, H
multiplying them: G·H
IDTFT of the product is the result, g[n] * h[n]
[Block diagram: g[n], h[n] → DTFT → G(e^{jω})·H(e^{jω}) = Y(e^{jω}) → IDTFT → y[n]]
Dan Ellis 2012-09-24 28
131. DTFT convolution example
x[n] = α^n·μ[n]  ↔  X(e^{jω}) = 1/(1 - α·e^{-jω})
h[n] = δ[n] - α·δ[n-1]  ↔  H(e^{jω}) = 1 - α·(e^{jω})^{-1}
y[n] = x[n] * h[n]
Y(e^{jω}) = H(e^{jω})·X(e^{jω})
          = (1 - α·e^{-jω}) · 1/(1 - α·e^{-jω}) = 1
y[n] = δ[n],  i.e. h[n] is the inverse system of α^n·μ[n]
Dan Ellis 2012-09-24 29
132. DTFT modulation
Modulation: x[n] = g[n]·h[n]
Could solve if g[n] was just sinusoids...
X(e^{jω}) = Σ_n [ (1/2π)·∫ G(e^{jθ})·e^{jθn} dθ ]·h[n]·e^{-jωn}
          = (1/2π)·∫ G(e^{jθ})·[ Σ_n h[n]·e^{-j(ω-θ)n} ] dθ
i.e. g[n]·h[n]  ↔  (1/2π)·∫_{-π}^{π} G(e^{jθ})·H(e^{j(ω-θ)}) dθ
Dual of convolution in time
Dan Ellis 2012-09-24 30
133. Parseval’s relation
“Energy” in the time and frequency domains is equal:
Σ_n g[n]·h*[n] = (1/2π)·∫_{-π}^{π} G(e^{jω})·H*(e^{jω}) dω
If g = h, then g·g* = |g|^2 = energy...
Dan Ellis 2012-09-24 31
134. Energy density spectrum
Energy of a sequence: ℰg = Σ_n |g[n]|^2
By Parseval: ℰg = (1/2π)·∫_{-π}^{π} |G(e^{jω})|^2 dω
Define the Energy Density Spectrum (EDS):
Sgg(e^{jω}) = |G(e^{jω})|^2
Dan Ellis 2012-09-24 32
135. EDS and autocorrelation
Autocorrelation of g[n]:
rgg[ℓ] = Σ_{n=-∞}^{∞} g[n]·g[n-ℓ] = g[ℓ] * g[-ℓ]
DTFT{rgg[ℓ]} = G(e^{jω})·G(e^{-jω})
If g[n] is real, G(e^{-jω}) = G*(e^{jω}), so
DTFT{rgg[ℓ]} = |G(e^{jω})|^2 = Sgg(e^{jω})   (no phase info)
The mag-squared spectrum is the DTFT of the autocorrelation
Dan Ellis 2012-09-24 33
136. 3. Discrete FT (DFT)
Discrete FT (DFT): discrete, finite/periodic x[n] ↔ discrete, finite/periodic X[k]
A finite or periodic sequence has only N unique values, x[n] for 0 ≤ n < N
Spectrum is completely defined by N distinct frequency samples
Divide 0..2π into N equal steps, {ωk} = 2πk/N
Dan Ellis 2012-09-24 34
137. DFT and IDFT
Uniform sampling of the DTFT spectrum:
X[k] = X(e^{jω})|_{ω = 2πk/N} = Σ_{n=0}^{N-1} x[n]·e^{-j·2πkn/N}
DFT: X[k] = Σ_{n=0}^{N-1} x[n]·W_N^{kn}
where W_N = e^{-j·2π/N},  i.e. 1/Nth of a revolution
Dan Ellis 2012-09-24 35
138. IDFT
Inverse DFT (IDFT): x[n] = (1/N)·Σ_{k=0}^{N-1} X[k]·W_N^{-nk}
Check:
x[n] = (1/N)·Σ_k ( Σ_l x[l]·W_N^{kl} )·W_N^{-nk}
     = (1/N)·Σ_{l=0}^{N-1} x[l]·Σ_{k=0}^{N-1} W_N^{k(l-n)}
       (sum of a complete set of rotated vectors, or a finite geometric series (1 - W_N^{(l-n)N})/(1 - W_N^{(l-n)}): = 0 if l ≠ n; = N if l = n)
     = x[n],   0 ≤ n < N
Dan Ellis 2012-09-24 36
139. DFT examples
Finite impulse: x[n] = 1 for n = 0;  0 for n = 1..N-1
X[k] = Σ_{n=0}^{N-1} x[n]·W_N^{kn} = W_N^{0} = 1  ∀k
Periodic sinusoid (r ∈ Z):
x[n] = cos(2πrn/N) = (1/2)·(W_N^{-rn} + W_N^{rn})
X[k] = (1/2)·Σ_{n=0}^{N-1} (W_N^{-rn} + W_N^{rn})·W_N^{kn}
     = N/2 for k = r and k = N-r;  0 otherwise   (0 ≤ k < N)
Dan Ellis 2012-09-24 37
141. Matrix IDFT
If X = D_N·x then x = D_N^{-1}·X,
i.e. the inverse DFT is also just a matrix:
D_N^{-1} = (1/N) ·
[ 1   1              1               ...  1
  1   W_N^{-1}       W_N^{-2}        ...  W_N^{-(N-1)}
  1   W_N^{-2}       W_N^{-4}        ...  W_N^{-2(N-1)}
  ...
  1   W_N^{-(N-1)}   W_N^{-2(N-1)}   ...  W_N^{-(N-1)^2} ]
= (1/N)·D_N*
Dan Ellis 2012-09-24 39
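A sketch building D_N numerically and checking it against fft, and the inverse against (1/N)·D_N* (example size N chosen arbitrarily):
% DFT as a matrix, and its inverse as (1/N) times the conjugate matrix
N = 8;
n = 0:N-1;
DN = exp(-1j*2*pi/N * (n'*n));   % entries W_N^{nk}
x  = randn(N,1);
X  = DN*x;
disp(max(abs(X - fft(x))));      % matches fft
xr = (1/N)*conj(DN)*X;           % inverse DFT matrix = (1/N)*DN*
disp(max(abs(xr - x)));          % recovers x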
142. DFT and MATLAB
MATLAB is concerned with sequences, not continuous functions like X(e^{jω})
Instead, we use the DFT to sample X(e^{jω}) on an (arbitrarily fine) grid:
X = freqz(x,1,w); samples the DTFT of sequence x at the angular frequencies in w
X = fft(x); calculates the N-point DFT of an N-point sequence x
Dan Ellis 2012-09-24 40
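For instance (a sketch; freqz assumes the Signal Processing Toolbox), sampling the DTFT at ωk = 2πk/N reproduces fft:
% fft(x) equals the DTFT sampled at w_k = 2*pi*k/N
x = [3 1 4 1 5 9 2 6];           % arbitrary example sequence
N = length(x);
w = 2*pi*(0:N-1)/N;
Xdtft = freqz(x, 1, w);          % DTFT samples
Xdft  = fft(x);
disp(max(abs(Xdtft(:) - Xdft(:))));   % ~0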
144. DTFT from DFT
The N-point DFT completely specifies the continuous DTFT of the finite sequence:
X(e^{jω}) = Σ_{n=0}^{N-1} [ (1/N)·Σ_{k=0}^{N-1} X[k]·W_N^{-kn} ]·e^{-jωn}
          = (1/N)·Σ_{k=0}^{N-1} X[k]·Σ_{n=0}^{N-1} e^{-j(ω - 2πk/N)·n}     (“periodic sinc”)
          = (1/N)·Σ_{k=0}^{N-1} X[k]·[ sin(N·Δωk/2) / sin(Δωk/2) ]·e^{-j·((N-1)/2)·Δωk}
where Δωk = ω - 2πk/N   (interpolation)
Dan Ellis 2012-09-24 42
145. Periodic sinc
Σ_{n=0}^{N-1} e^{-j·Δωk·n} = (1 - e^{-jN·Δωk}) / (1 - e^{-j·Δωk})
  = [ e^{-jN·Δωk/2}·(e^{jN·Δωk/2} - e^{-jN·Δωk/2}) ] / [ e^{-j·Δωk/2}·(e^{j·Δωk/2} - e^{-j·Δωk/2}) ]
  = e^{-j·((N-1)/2)·Δωk} · sin(N·Δωk/2) / sin(Δωk/2)    (pure phase × pure real)
= N when Δωk = 0;  = ±N when Δωk/2 = π
= 0 when Δωk/2 = r·π/N, r = ±1, ±2, ...
other values in-between...
Dan Ellis 2012-09-24 43
146. Periodic sinc
[Figure: plot of sin(Nx)/sin(x), oscillating between ±N (N = 8)]
X[k] = X(e^{j·2πk/N})
X(e^{jω0}) = Σ_k X[k]·sin(N·Δωk/2)/(N·sin(Δωk/2)) · (linear-phase term from the previous slide)
DFT → DTFT = interpolation by the periodic sinc
[Figure: X[k] samples at ωk = 2πk/N (e.g. k = 1, 3, 4 → ω = 2π/N, 6π/N, 8π/N) and the interpolated X(e^{jω})]
Dan Ellis 2012-09-24 44
147. DFT from overlength DTFT
If x[n] has more than N points, we can still form X[k] = X(e^{jω})|_{ω = 2πk/N}
The IDFT of X[k] will give an N-point x̃[n]
How does x̃[n] relate to x[n]?
Dan Ellis 2012-09-24 45
148. DFT from overlength DTFT
x[n] (-A ≤ n < B) → DTFT → X(e^{jω}) → sample → X[k] → IDFT → x̃[n] (0 ≤ n < N)
x̃[n] = (1/N)·Σ_{k=0}^{N-1} [ Σ_ℓ x[ℓ]·W_N^{kℓ} ]·W_N^{-nk}
     = Σ_ℓ x[ℓ]·(1/N)·Σ_{k=0}^{N-1} W_N^{k(ℓ-n)}
       ( = 1 for ℓ - n = rN, r ∈ Z;  = 0 otherwise )
x̃[n] = Σ_{r=-∞}^{∞} x[n - rN],   0 ≤ n < N
i.e. all values shifted by exact multiples of N points to lie in 0 ≤ n < N
Dan Ellis 2012-09-24 46
149. DFT from DTFT example
If x[n] = {8, 5, 4, 3, 2, 2, 1, 1}   (8 point)
We form X[k] for k = 0, 1, 2, 3 by sampling X(e^{jω}) at ω = 0, π/2, π, 3π/2
IDFT of X[k] gives the 4-pt x̃[n] = Σ_{r=-∞}^{∞} x[n - rN]
Overlap only for r = -1   (N = 4):
x̃[n] = {8, 5, 4, 3} + {2, 2, 1, 1} = {10, 7, 5, 4}
Dan Ellis 2012-09-24 47
150. DFT from DTFT example
[Figure: x[n], its shifted copy x[n+N] (r = -1), and the 4-point x̃[n]]
x̃[n] is the time-aliased or ‘folded down’ version of x[n].
Dan Ellis 2012-09-24 48
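A MATLAB sketch of this example: sample the DTFT of the 8-point x[n] at only 4 frequencies, IDFT, and compare with the folded-down sum:
% Sampling the DTFT of an 8-pt x[n] at 4 frequencies time-aliases x
x = [8 5 4 3 2 2 1 1];
N = 4;
k = (0:N-1)';  n = 0:length(x)-1;
Xk = exp(-1j*2*pi*k*n/N) * x(:);     % X[k] = X(e^{jw}) at w = 2*pi*k/4
xt = real(ifft(Xk));                 % IDFT of the 4 samples
disp(xt.');                          % {10 7 5 4}
disp(x(1:4) + x(5:8));               % folded-down version, same values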
151. Properties: Circular time shift
DFT properties mirror DTFT, with twists:
A time shift must stay within the N-pt ‘window’:
g[⟨n - n0⟩_N]  ↔  W_N^{k·n0}·G[k]
Modulo-N indexing keeps the index between 0 and N-1:
g[⟨n - n0⟩_N] = g[n - n0]      for n ≥ n0
              = g[N + n - n0]  for n < n0      (0 ≤ n0 < N)
Dan Ellis 2012-09-24 49
152. Circular time shift
Points shifted out to the right don’t disappear - they come in from the left
[Figure: 5-pt sequence g[n] and g[⟨n-2⟩_5], a ‘delay’ by 2]
Like a ‘barrel shifter’: the origin pointer rotates around a circular buffer
Dan Ellis 2012-09-24 50
153. Circular time reversal
Time reversal is tricky in ‘modulo-N’ indexing - it is not simply reversing the sequence:
[Figure: x̃[n], a 5-pt sequence made periodic, and x̃[⟨-n⟩_N], the time-reversed periodic sequence]
The zero point stays fixed; the remainder flips
Dan Ellis 2012-09-24 51
154. Duality
DFT and IDFT are very similar - both map an N-pt vector to an N-pt vector
Duality:
if g[n] ↔ G[k]
then G[n] ↔ N·g[⟨-k⟩_N]   (circular time reversal)
i.e. if you treat the DFT sequence as a time sequence, the result is almost symmetric
Dan Ellis 2012-09-24 52
155. 4. Convolution with the DFT
IDTFT of the product of DTFTs of two N-pt sequences is their 2N-1 pt convolution
IDFT of the product of two N-pt DFTs can only give N points!
It is the 2N-1 pt result, time aliased:
i.e. yc[n] = Σ_{r=-∞}^{∞} y_l[n + rN]   (0 ≤ n < N)
must be, because G[k]·H[k] are exact samples of G(e^{jω})·H(e^{jω})
This is known as circular convolution
Dan Ellis 2012-09-24 53
156. Circular convolution
Can also do the entire convolution with modulo-N indexing
Hence, Circular Convolution:
Σ_{m=0}^{N-1} g[m]·h[⟨n - m⟩_N]  ↔  G[k]·H[k]
Written as g[n] ⊛_N h[n]
Dan Ellis 2012-09-24 54
157. Circular convolution example
4-pt sequences: g[n] = {1 2 0 1},  h[n] = {2 2 1 0}
Σ_{m=0}^{N-1} g[m]·h[⟨n - m⟩_N]:
g[n] ⊛_4 h[n] = {4 7 5 4}
[Figure: circularly shifted h[⟨n - 0⟩_4], h[⟨n - 1⟩_4], h[⟨n - 2⟩_4], h[⟨n - 3⟩_4]]
check: g[n] * h[n] = {2 6 5 4 2 1 0}   (linear convolution)
Dan Ellis 2012-09-24 55
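A MATLAB sketch of the same example, computing the circular convolution through the DFT and folding the linear result to match:
% Circular convolution of the 4-pt example via the DFT
g = [1 2 0 1];  h = [2 2 1 0];
yc = ifft(fft(g).*fft(h));       % IDFT{G[k] H[k]} -> {4 7 5 4}
yl = conv(g, h);                 % linear convolution -> {2 6 5 4 2 1 0}
disp(real(yc));
disp(yl(1:4) + [yl(5:7) 0]);     % folding the 7-pt linear result into 4 pts gives yc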
158. DFT properties summary
Circular convolution:  Σ_{m=0}^{N-1} g[m]·h[⟨n - m⟩_N]  ↔  G[k]·H[k]
Modulation:  g[n]·h[n]  ↔  (1/N)·Σ_{m=0}^{N-1} G[m]·H[⟨k - m⟩_N]
Duality:  G[n]  ↔  N·g[⟨-k⟩_N]
Parseval:  Σ_{n=0}^{N-1} |x[n]|^2 = (1/N)·Σ_{k=0}^{N-1} |X[k]|^2
Dan Ellis 2012-09-24 56
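The Parseval line is easy to check numerically (a sketch with an arbitrary random sequence):
% DFT Parseval check: energy in time = energy in frequency / N
x = randn(1, 16);
E_time = sum(abs(x).^2);
E_freq = sum(abs(fft(x)).^2) / length(x);
disp(E_time - E_freq);           % ~0 up to rounding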
159. Linear convolution w/ the DFT
DFT → fast circular convolution
.. but we need linear convolution
Circular conv. is time-aliased linear
conv.; can aliasing be avoided?
e.g. convolving L-pt g[n] with M-pt h[n]:
y[n] = g[n] * h[n] has L+M-1 nonzero pts
Set DFT size N ≥ L+M-1 → no aliasing
Dan Ellis 2012-09-24 57
160. Linear convolution w/ the DFT
Procedure (N = L + M - 1):
pad the L-pt g[n] with (at least) M-1 zeros → N-pt DFT G[k], k = 0..N-1
pad the M-pt h[n] with (at least) L-1 zeros → N-pt DFT H[k], k = 0..N-1
Y[k] = G[k]·H[k], k = 0..N-1
IDFT{Y[k]} = Σ_{r=-∞}^{∞} y_L[n + rN] = y_L[n]   (0 ≤ n < N, no aliasing)
[Figure: zero-padded g[n] and h[n], and the resulting yc[n]]
Dan Ellis 2012-09-24 58
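A minimal MATLAB sketch of the procedure (example sequences are arbitrary; fft(x, N) does the zero-padding):
% Linear convolution via zero-padded DFTs (N >= L+M-1)
g = [1 2 0 1 3];  h = [2 -1 1];
L = length(g);  M = length(h);
N = L + M - 1;
Y = fft(g, N) .* fft(h, N);      % fft(x, N) zero-pads x to length N
y = real(ifft(Y));
disp(max(abs(y - conv(g, h))));  % ~0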
161. Overlap-Add convolution
Very long g[n] → break it up into segments, convolve piecewise, overlap
→ bounds the size of the DFT and the processing delay
Make gi[n] = g[n] for i·N ≤ n < (i+1)·N;  0 otherwise
g[n] = Σ_i gi[n]
h[n] * g[n] = Σ_i h[n] * gi[n]
Called Overlap-Add (OLA) convolution...
Dan Ellis 2012-09-24 59
162. Overlap-Add convolution
[Figure: g[n] split into segments g0[n], g1[n], g2[n]; each gi[n] * h[n] overlaps its neighbours; the valid OLA sum reconstructs h[n] * g[n]]
Dan Ellis 2012-09-24 60
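A sketch of OLA in MATLAB, assuming an arbitrary long input, a short FIR h[n], and a block length B of my choosing (the Signal Processing Toolbox's fftfilt implements the same idea):
% Overlap-add convolution: block length B, FFT size N = B + M - 1
g = randn(1, 1000);  h = [1 0.5 0.25 0.125];
B = 128;  M = length(h);  N = B + M - 1;
H = fft(h, N);
nblocks = ceil(length(g)/B);
y = zeros(1, nblocks*B + M - 1);
for i = 0:nblocks-1
    seg = g(i*B+1 : min((i+1)*B, length(g)));   % i-th block g_i[n]
    yi  = real(ifft(fft(seg, N) .* H));         % g_i[n] * h[n], length N, no aliasing
    y(i*B+1 : i*B+N) = y(i*B+1 : i*B+N) + yi;   % overlap-add into the output
end
y = y(1:length(g)+M-1);
disp(max(abs(y - conv(g, h))));                 % ~0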
163. ELEN E4810: Digital Signal Processing
Topic 4: The Z Transform
1. The Z Transform
2. Inverse Z Transform
Dan Ellis 2012-10-03 1
164. 1. The Z Transform
Powerful tool for analyzing & designing DT systems
Generalization of the DTFT:
G(z) = Z{g[n]} = Σ_{n=-∞}^{∞} g[n]·z^{-n}    Z Transform
z is complex...
z = e^{jω} → DTFT
z = r·e^{jω} → Σ_n g[n]·r^{-n}·e^{-jωn},  the DTFT of r^{-n}·g[n]
Dan Ellis 2012-10-03 2
165. Region of Convergence (ROC)
Critical question:
Does the summation G(z) = Σ_{n=-∞}^{∞} x[n]·z^{-n} converge (to a finite value)?
In general, this depends on the value of z
→ Region of Convergence (ROC):
the portion of the complex z-plane for which a particular G(z) will converge
[Figure: z-plane with the ROC |z| > λ shaded outside a circle of radius λ]
Dan Ellis 2012-10-03 3
166. ROC Example
e.g. x[n] = λ^n·μ[n]
X(z) = Σ_{n=0}^{∞} λ^n·z^{-n} = 1/(1 - λ·z^{-1})
converges only for |λ·z^{-1}| < 1
i.e. the ROC is |z| > |λ|   (see previous slide)
|λ| < 1 (e.g. 0.8) - finite energy sequence
|λ| > 1 (e.g. 1.2) - divergent sequence, infinite energy, the DTFT does not exist
but it still has a ZT when |z| > 1.2 (in the ROC)
Dan Ellis 2012-10-03 4
167. About ROCs
ROCs are always defined in terms of |z|
→ circular regions on the z-plane (inside circles / outside circles / rings)
If the ROC includes the unit circle (|z| = 1),
→ g[n] has a DTFT (finite energy sequence)
[Figure: z-plane where the unit circle lies in the ROC → DTFT OK]
Dan Ellis 2012-10-03 5
168. Another ROC example
Anticausal (left-sided) sequence:
x[n] = -λ^n·μ[-n-1]
X(z) = Σ_n ( -λ^n·μ[-n-1] )·z^{-n}
     = -Σ_{n=-∞}^{-1} λ^n·z^{-n} = -Σ_{m=1}^{∞} λ^{-m}·z^{m}
     = 1 - 1/(1 - λ^{-1}·z) = 1/(1 - λ·z^{-1})
ROC: |λ^{-1}·z| < 1,  i.e. |z| < |λ|
Same ZT as λ^n·μ[n], different sequence?
Dan Ellis 2012-10-03 6
169. ROC is necessary!
To completely define a ZT, you must specify the ROC:
x[n] = λ^n·μ[n]:      X(z) = 1/(1 - λ·z^{-1}),  ROC |z| > |λ|
x[n] = -λ^n·μ[-n-1]:  X(z) = 1/(1 - λ·z^{-1}),  ROC |z| < |λ|
[Figure: the two sequences and their ROCs on the z-plane]
A single G(z) can describe several sequences with different ROCs
Dan Ellis 2012-10-03 7
170. Rational Z-transforms
G(z) can be any function; rational polynomials are an important class:
G(z) = P(z)/D(z) = (p0 + p1·z^{-1} + … + p_{M-1}·z^{-(M-1)} + p_M·z^{-M}) / (d0 + d1·z^{-1} + … + d_{N-1}·z^{-(N-1)} + d_N·z^{-N})
By convention, expressed in terms of z^{-1} - matches the ZT definition
(Reminiscent of the LCCDE expression...)
Dan Ellis 2012-10-03 8
171. Factored rational ZTs
Numerator and denominator can be factored:
G(z) = [ p0·∏_{ℓ=1}^{M} (1 - ξℓ·z^{-1}) ] / [ d0·∏_{ℓ=1}^{N} (1 - λℓ·z^{-1}) ]
     = [ z^{-M}·p0·∏_{ℓ=1}^{M} (z - ξℓ) ] / [ z^{-N}·d0·∏_{ℓ=1}^{N} (z - λℓ) ]
{ξℓ} are the roots of the numerator → G(z) = 0 → {ξℓ} are the zeros of G(z)
{λℓ} are the roots of the denominator → G(z) = ∞ → {λℓ} are the poles of G(z)
Dan Ellis 2012-10-03 9
172. Pole-zero diagram
Can plot the poles and zeros on the complex z-plane:
[Figure: z-plane with poles λ marked ×, zeros ξ marked o; complex values occur in conjugate pairs for real g[n]]
Dan Ellis 2012-10-03 10
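A MATLAB sketch of finding poles and zeros from the {pk}, {dk} coefficients (the coefficient values below are arbitrary examples):
% Poles and zeros of a rational G(z) from its coefficients in z^-1
p = [1 -0.5];                    % numerator coefficients (example values)
d = [1 -1.2 0.72];               % denominator coefficients (example values)
zeros_G = roots(p);              % roots of the numerator  -> zeros
poles_G = roots(d);              % roots of the denominator -> poles
disp(zeros_G.');  disp(poles_G.');
% zplane(p, d) in the Signal Processing Toolbox draws the pole-zero diagram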
173. Z-plane surface
G(z): a complex function of a complex variable
Can calculate its value over the entire z-plane (ROC not shown!)
[Figure: |G(z)| surface over the z-plane]
The slice between the surface and the unit cylinder (|z| = 1, z = e^{jω}) is G(e^{jω}), the DTFT
Dan Ellis 2012-10-03 11