A nice synopsis of what we can achieve with Matlab: manipulate, analyse and visualize data; pinpoint errors and correct them.
4 options: columns or rows, next to each other or below one another.
Solid line, dashed line, dotted line, etc
4 attributes or fields
Never again coredump
After you exhaust the 8000 built-in Matlab commands…
“Will consider finite (at any given time), although in streaming context, N grows”
“Note that the number of coefficients is still eight…”
…easier for interpretation, not for algebraic manipulation. But, algebraic, even easier with complex form (next slide)
Callouts: “bases are zero outside window boundaries”
Say about relationship (or lack thereof) between window “size” and filter length…
Export setup: 6 x 5 in (expand axes)
Explain “MRA” in words: reconstruction using the coefficients *only* from that level. Export setup: 6 x 5 in (expand axes)
PE plot export: 4x4in (expand) Inset export: in (expand)
Previous slide: more from signal-processing – this slide is DB-specific
Export setup for t.s. plot: 5x7 in (expand axes)
First: what exactly do we mean by “correlation”? (Answer: linear correlations)
So: all we have to do, is estimate the slope. Starting with the first two points, this is really very easy and fast.
We are “lucky” so far. Next: what happens when we have to update the slope.
Answer: rotate the slope to “fix” the error. [Unanswered question: rotate around *which* point?] This is a simple vector addition (and re-normalization) -> O(n) very simple operations
Mention that this converges assuming no “drifts” (technically: stationarity).
Done with intuition, now give real names.
Just point out very special case (but.. APCA more elaborate time segmentation…)
1. Why variable-length segmentation is good (if goal is piecewise-constant) 2. Also shows weakness of Haar… APCA-21: 24% RMS error Haar (lv 7): 44% RMS error
APCA-15: 27% RMS error DB3 (lv-7): 38% RMS error
Show case k=2 (for which the equivalence is exact). First, two clusters are always separable on the 1st PC (i.e., reduces to a 1-D problem, easy). Furthermore, the objectives are related: K-means minimizes the green length; PCA minimizes the red length (or, equivalently, the angle). For k > 2, things get more complicated; see reference.
Say: gray cells are prefix subsequences – we use only these in recursive definition/estimation
This is a sketch of the idea… Works like this under certain(?) “smoothness” conditions (may have to look at all four sub-rectangles separately, lb property does not have to guarantee “inclusion”…)
(+)No need to know anything about the distance. Just pairwise distances
Distance functions that are robust to outliers or to extremely noisy data will typically violate the triangle inequality. They achieve this robustness by not considering the most dissimilar parts of the objects. These functions are extremely useful because they model human perception well: when comparing any kind of data (images, time-series, etc.), we mostly focus on the portions that are similar and pay less attention to regions of great dissimilarity.
Nx1 vector. It does not show the compression, but it does show the quality of the approximation
Trajectory data in other applications too.
Each of these applications is a different twist on similarity measures and similarity matching.
Transcript of "Matlab tme series benni"
1.
Hands-On Time-Series Analysis with Matlab
Michalis Vlachos and Spiros Papadimitriou
IBM T.J. Watson Research Center
2.
Tutorial | Time-Series with Matlab Disclaimer
Feel free to use any of the following slides for educational purposes; however, kindly acknowledge the source. We would also like to know how you have used these slides, so please send us emails with comments or suggestions.
3.
Tutorial | Time-Series with Matlab About this tutorial
The goal of this tutorial is to show you that time-series research (or research in general) can be made fun when it involves visualizing ideas, which can be achieved with concise programming. Matlab enables us to do that.
“Will I be able to use this MATLAB right away after the tutorial?”
“I am definitely smarter than her, but I am not a time-series person, per se. I wonder what I gain from this tutorial…”
4.
Tutorial | Time-Series with Matlab Disclaimer We are not affiliated with Mathworks in any way … but we do like using Matlab a lot since it makes our lives easier Errors and bugs are most likely contained in this tutorial. We might be responsible for some of them.
5.
Tutorial | Time-Series with Matlab What this tutorial is NOT about Moving averages Autoregressive models Forecasting/Prediction Stationarity Seasonality
6.
Tutorial | Time-Series with Matlab Overview
PART A: The Matlab programming environment
PART B: Basic mathematics
– Introduction / geometric intuition
– Coordinates and transforms
– Quantized representations
– Non-Euclidean distances
PART C: Similarity Search and Applications
– Introduction
– Representations
– Distance Measures
– Lower Bounding
– Clustering/Classification/Visualization
– Applications
7.
Tutorial | Time-Series with Matlab PART A: Matlab Introduction
8.
Tutorial | Time-Series with Matlab Why does anyone need Matlab?
Matlab enables efficient Exploratory Data Analysis (EDA)
“Science progresses through observation” -- Isaac Newton
“The greatest value of a picture is when it forces us to notice what we never expected to see” -- John Tukey
9.
Tutorial | Time-Series with Matlab Matlab
Interpreted language
– Easy code maintenance (code is very compact)
– Very fast array/vector manipulation
– Support for OOP
Easy plotting and visualization
Easy integration with other languages/OSs
– Interact with C/C++, COM objects, DLLs
– Built-in Java support (and compiler)
– Ability to make executable files
– Multi-platform support (Windows, Mac, Linux)
Extensive number of toolboxes
– Image, Statistics, Bioinformatics, etc.
10.
Tutorial | Time-Series with Matlab History of Matlab (MATrix LABoratory)
“The most important thing in the programming language is the name. I have recently invented a very good name and now I am looking for a suitable language.” -- D. Knuth
Programmed by Cleve Moler as an interface for EISPACK & LINPACK
1957: Moler goes to Caltech. Studies numerical analysis.
1961: Goes to Stanford. Works with G. Forsythe on Laplacian eigenvalues.
1977: First edition of Matlab; 2000 lines of Fortran – 80 functions (now more than 8000 functions)
1979: Met with Jack Little at Stanford. Started working on porting it to C.
1984: Mathworks is founded
Video: http://www.mathworks.com/company/aboutus/founders/origins_of_matlab_wm.html
12.
Tutorial | Time-Series with Matlab Current State of Matlab/Mathworks
Matlab, Simulink, Stateflow
Matlab version 7.3, R2006b
Used in a variety of industries
– Aerospace, defense, computers, communication, biotech
Mathworks is still privately owned
Used in >3,500 universities, with >500,000 users worldwide
2005 revenue: >350 M
2005 employees: 1,400+
Pricing:
– starts from $1,900 (commercial use)
– ~$100 (Student Edition)
“Money is better than poverty, if only for financial reasons…”
13.
Tutorial | Time-Series with Matlab Matlab 7.3
R2006b, released on Sept 1 2006
– Distributed computing
– Better support for large files
– New Optimization Toolbox
– Matlab Builder for Java
 • create Java classes from Matlab
– Demos, webinars in Flash format (http://www.mathworks.com/products/matlab/demos.html)
14.
Tutorial | Time-Series with Matlab Who needs Matlab?
R&D companies for easy application deployment
Professors
– Lab assignments
– Matlab allows focus on algorithms, not on language features
Students
– Batch processing of files
 • No more incomprehensible Perl code!
– Great environment for testing ideas
 • Quick coding of ideas, then porting to C/Java etc.
– Easy visualization
– It’s cheap! (for students at least…)
15.
Tutorial | Time-Series with Matlab Starting up Matlab
“Personally I’m always ready to learn, although I do not always like being taught.” -- Sir Winston Churchill
DOS/Unix-like directory navigation
Commands like:
– cd
– pwd
– mkdir
For navigation it is easier to just copy/paste the path from Explorer
E.g.: cd 'c:\documents'
16.
Tutorial | Time-Series with Matlab Matlab Environment
Command Window:
– type commands
– load scripts
Workspace: loaded variables/types/size
17.
Tutorial | Time-Series with Matlab Matlab Environment
Command Window:
– type commands
– load scripts
Workspace: loaded variables/types/size
Help contains a comprehensive introduction to all functions
18.
Tutorial | Time-Series with Matlab Matlab Environment
Command Window:
– type commands
– load scripts
Workspace: loaded variables/types/size
Excellent demos and tutorials of the various features and toolboxes
19.
Tutorial | Time-Series with Matlab Starting with Matlab
Everything is an array
Manipulation of arrays is faster than regular manipulation with for-loops
a = [1 2 3 4 5 6 7 9 10] % define an array
20.
Tutorial | Time-Series with Matlab Populating arrays
Plot sinusoid function
a = [0:0.3:2*pi] % generate values from 0 to 2pi (with step of 0.3)
b = cos(a) % evaluate cos at every value contained in array a
plot(a,b) % plot a (x-axis) against b (y-axis)
Related:
linspace(-100,100,15); % generate 15 values between -100 and 100
21.
Tutorial | Time-Series with Matlab Array Access
Access array elements
>> a(1)
ans =
 0
>> a(1:3)
ans =
 0 0.3000 0.6000
Set array elements
>> a(1) = 100
>> a(1:3) = [100 100 100]
22.
Tutorial | Time-Series with Matlab 2D Arrays
Can access whole columns or rows
– Let’s define a 2D array
>> a = [1 2 3; 4 5 6]
a =
 1 2 3
 4 5 6
>> a(1,:) % row-wise access
ans =
 1 2 3
>> a(:,1) % column-wise access
ans =
 1
 4
>> a(2,2)
ans =
 5
“A good listener is not only popular everywhere, but after a while he gets to know something.” -- Wilson Mizner
23.
Tutorial | Time-Series with Matlab Column-wise computation
For arrays with more than one dimension, functions such as max, mean and sort operate column-by-column by default
>> a = [1 2 3; 3 2 1]
a =
 1 2 3
 3 2 1
>> max(a)
ans =
 3 2 3
>> mean(a)
ans =
 2.0000 2.0000 2.0000
>> sort(a)
ans =
 1 2 1
 3 2 3
24.
Tutorial | Time-Series with Matlab Concatenating arrays
Column-wise or row-wise
Row next to row:
>> a = [1 2 3];
>> b = [4 5 6];
>> c = [a b]
c =
 1 2 3 4 5 6
Column next to column:
>> a = [1;2];
>> b = [3;4];
>> c = [a b]
c =
 1 3
 2 4
Row below row:
>> a = [1 2 3];
>> b = [4 5 6];
>> c = [a; b]
c =
 1 2 3
 4 5 6
Column below column:
>> a = [1;2];
>> b = [3;4];
>> c = [a; b]
c =
 1
 2
 3
 4
25.
Tutorial | Time-Series with Matlab Initializing arrays
Create array of ones [ones]
>> a = ones(1,3)
a =
 1 1 1
>> a = ones(2,2)*5
a =
 5 5
 5 5
>> a = ones(1,3)*inf
a =
 Inf Inf Inf
Create array of zeroes [zeros]
– Good for initializing arrays
>> a = zeros(1,4)
a =
 0 0 0 0
>> a = zeros(3,1) + [1 2 3]'
a =
 1
 2
 3
26.
Tutorial | Time-Series with Matlab Reshaping and Replicating Arrays
Changing the array shape [reshape]
– (e.g., for easier column-wise computation)
– reshape(X,[M,N]): [M,N] matrix of the column-wise version of X
>> a = [1 2 3 4 5 6]'; % make it into a column
>> reshape(a,2,3)
ans =
 1 3 5
 2 4 6
Replicating an array [repmat]
– repmat(X,[M,N]): make [M,N] tiles of X
>> a = [1 2 3];
>> repmat(a,1,2)
ans =
 1 2 3 1 2 3
>> repmat(a,2,1)
ans =
 1 2 3
 1 2 3
27.
Tutorial | Time-Series with Matlab Useful Array functions
Last element of array [end]
>> a = [1 3 2 5];
>> a(end)
ans =
 5
>> a(end-1)
ans =
 2
Length of array [length]
>> length(a)
ans =
 4
Dimensions of array [size]
>> [rows, columns] = size(a)
rows =
 1
columns =
 4
28.
Tutorial | Time-Series with Matlab Useful Array functions
Find a specific element [find]
>> a = [1 3 2 5 10 5 2 3];
>> b = find(a==2)
b =
 3 7
Sorting [sort]
>> a = [1 3 2 5];
>> [s,i] = sort(a)
s =
 1 2 3 5
i =
 1 3 2 4
i indicates the index where each element came from
29.
Tutorial | Time-Series with Matlab Visualizing Data and Exporting Figures
Use Fisher’s Iris dataset
>> load fisheriris
– 4 dimensions, 3 species
– Petal length & width, sepal length & width
– Iris: virginica/versicolor/setosa
meas (150x4 array): holds the 4D measurements
species (150x1 cell array): holds the name of the species for the specific measurement (e.g., 'versicolor', 'virginica')
30.
Tutorial | Time-Series with Matlab Visualizing Data (2D) (strcmp, scatter, hold on)
>> idx_setosa = strcmp(species, 'setosa'); % rows of setosa data
>> idx_virginica = strcmp(species, 'virginica'); % rows of virginica
>> setosa = meas(idx_setosa,[1:2]);
>> virgin = meas(idx_virginica,[1:2]);
>> scatter(setosa(:,1), setosa(:,2)); % plot in blue circles by default
>> hold on;
>> scatter(virgin(:,1), virgin(:,2), 'rs'); % red[r] squares[s] for these
idx_setosa: an array of zeros and ones indicating the positions where the keyword 'setosa' was found (..., 1, 1, 1, 0, 0, 0, ...)
“The world is governed more by appearances rather than realities…” -- Daniel Webster
31.
Tutorial | Time-Series with Matlab Visualizing Data (3D) (scatter3)
>> idx_setosa = strcmp(species, 'setosa'); % rows of setosa data
>> idx_virginica = strcmp(species, 'virginica'); % rows of virginica
>> idx_versicolor = strcmp(species, 'versicolor'); % rows of versicolor
>> setosa = meas(idx_setosa,[1:3]);
>> virgin = meas(idx_virginica,[1:3]);
>> versi = meas(idx_versicolor,[1:3]);
>> scatter3(setosa(:,1), setosa(:,2), setosa(:,3)); % plot in blue circles by default
>> hold on;
>> scatter3(virgin(:,1), virgin(:,2), virgin(:,3), 'rs'); % red[r] squares[s] for these
>> scatter3(versi(:,1), versi(:,2), versi(:,3), 'gx'); % green x’s
>> grid on; % show grid on axis
>> rotate3d on; % rotate with mouse
[Figure: 3D scatter plot of the three species]
32.
Tutorial | Time-Series with Matlab Changing Plots Visually
Toolbar: zoom out, zoom in, create line, create arrow, select object, add text
“Computers are useless. They can only give you answers…”
33.
Tutorial | Time-Series with Matlab Changing Plots Visually
Add titles
Add labels on axis
Change tick labels
Add grids to axis
Change color of line
Change thickness/linestyle etc.
34.
Tutorial | Time-Series with Matlab Changing Plots Visually (Example)
Change color and width of a line
[Screenshots: steps A, B, C (right click)]
36.
Tutorial | Time-Series with Matlab Changing Figure Properties with Code
GUIs are easy, but sooner or later we realize that coding is faster
>> a = cumsum(randn(365,1)); % random walk of 365 values
If this represents a year’s worth of measurements of an imaginary quantity, we will change:
• x-axis annotation to months
• Axis labels
• Put title in the figure
• Include some Greek letters in the title just for fun
“Real men do it command-line…” -- Anonymous
37.
Tutorial | Time-Series with Matlab Changing Figure Properties with Code
Axis annotation to months
>> axis tight; % irrelevant but useful...
>> xx = [15:30:365];
>> set(gca, 'xtick', xx)
The result …
“Real men do it command-line…” -- Anonymous
38.
Tutorial | Time-Series with Matlab Changing Figure Properties with Code
Axis annotation to months
>> set(gca,'xticklabel',['Jan'; ...
 'Feb';'Mar'])
The result …
“Real men do it command-line…” -- Anonymous
39.
Tutorial | Time-Series with Matlab Changing Figure Properties with Code
Axis labels and title
>> title('My measurements (\epsilon/\pi)')
>> ylabel('Imaginary Quantity')
>> xlabel('Month of 2005')
Other LaTeX examples: \alpha, \beta, e^{-\alpha}, etc.
“Real men do it command-line…” -- Anonymous
40.
Tutorial | Time-Series with Matlab Saving Figures
Matlab lets you save figures (.fig) for later processing
A .fig file can later be opened through Matlab
“You can always put off for tomorrow what you can do today.” -- Anonymous
41.
Tutorial | Time-Series with Matlab Exporting Figures
Export to: emf, eps, jpg, etc.
42.
Tutorial | Time-Series with Matlab Exporting figures (code)
You can also achieve the same result with Matlab code
Matlab code:
% export to color eps
print -depsc myImage.eps; % from command-line
print(gcf,'-depsc','myImage') % using variable as name
45.
Tutorial | Time-Series with Matlab Visualizing Data - Surfaces
The value at position x-y of the array indicates the height of the surface
data = [1:10];
data = repmat(data,10,1); % create data
surface(data,'FaceColor',[1 1 1],'EdgeColor',[0 0 1]); % plot data
view(3); grid on; % change viewpoint and put axis lines
[Figure: the 10x10 data array (each row is 1 2 3 … 10) and the resulting surface plot]
46.
Tutorial | Time-Series with Matlab Creating .m files
Standard text files
– Script: a series of Matlab commands (no input/output arguments)
– Functions: programs that accept input and return output
Right click
47.
Tutorial | Time-Series with Matlab Creating .m files
M editor
Double click
48.
Tutorial | Time-Series with Matlab Creating .m files (cumsum, num2str, save)
The following script will create:
– An array with 10 random walk vectors
– Will save them under text files: 1.dat, …, 10.dat
Sample script (myScript.m):
a = cumsum(randn(100,10)); % 10 random walk data of length 100
for i=1:size(a,2), % number of columns
 data = a(:,i);
 fname = [num2str(i) '.dat']; % a string is a vector of characters!
 save(fname, 'data', '-ASCII'); % save each column in a text file
end
Write this in the M editor and execute by typing the name on the Matlab command line
cumsum example: A = 1 2 3 4 5 gives cumsum(A) = 1 3 6 10 15
[Figure: a random walk time-series]
49.
Tutorial | Time-Series with Matlab Functions in .m scripts
When we need to:
– Organize our code
– Frequently change parameters in our scripts
The first line holds the keyword (function), the output argument, the function name and the input argument; the comment lines that follow are the help text (shown by help function_name); the rest is the function body.
function dataN = zNorm(data)
% ZNORM zNormalization of vector
% subtract mean and divide by std
if (nargin<1), % check parameters
 error('Not enough arguments');
end
data = data - mean(data); % subtract mean
data = data/std(data); % divide by std
dataN = data;
function [a,b] = myFunc(data, x, y) % pass & return more arguments
See also: varargin, varargout
50.
Tutorial | Time-Series with Matlab Cell Arrays
Cells that hold other Matlab arrays
– Let’s read the files of a directory
>> f = dir('*.dat') % list the files of the directory
f =
15x1 struct array with fields:
 name
 date
 bytes
 isdir
for i=1:length(f),
 a{i} = load(f(i).name);
 N = length(a{i});
 plot3([1:N], a{i}(:,1), a{i}(:,2), ...
 'r-', 'Linewidth', 1.5);
 grid on;
 pause;
 cla;
end
[Figure: the struct array f (e.g., f(1).name), the cell array a with cells 1 to 5, and a 3D trajectory plot]
51.
Tutorial | Time-Series with Matlab Reading/Writing Files
Load/Save are faster than C style I/O operations
– But fscanf, fprintf can be useful for file formatting or reading non-Matlab files
fid = fopen('fischer.txt', 'wt');
for i=1:length(species),
 fprintf(fid, '%6.4f %6.4f %6.4f %6.4f %s\n', meas(i,:), species{i});
end
fclose(fid);
Elements are accessed column-wise (again…)
x = 0:.1:1; y = [x; exp(x)];
fid = fopen('exp.txt','w');
fprintf(fid,'%6.2f %12.8f\n',y);
fclose(fid);
y =
 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 …
 1 1.1052 1.2214 1.3499 1.4918 1.6487 1.8221 2.0138 …
52.
Tutorial | Time-Series with Matlab Flow Control/Loops
if (else/elseif), switch – Check logical conditions
while – Execute statements an indefinite number of times (as long as a condition holds)
for – Execute statements a fixed number of times
break, continue
return – Return execution to the invoking function
“Life is pleasant. Death is peaceful. It’s the transition that’s troublesome.” -- Isaac Asimov
53.
Tutorial | Time-Series with Matlab For-Loop or vectorization? (tic, toc, clear all)
Pre-allocate arrays that store output results
– No need for Matlab to resize every time
Functions are faster than scripts
– Compiled into pseudo-code
Load/Save faster than Matlab I/O functions
After v. 6.5 of Matlab there is for-loop vectorization (interpreter)
Vectorizations help, but many times it is not so obvious how to achieve them
No pre-allocation (elapsed_time = 5.0070):
clear all;
tic;
for i=1:50000
 a(i) = sin(i);
end
toc
With pre-allocation (elapsed_time = 0.1400):
clear all;
a = zeros(1,50000);
tic;
for i=1:50000
 a(i) = sin(i);
end
toc
Vectorized (elapsed_time = 0.0200):
clear all;
tic;
i = [1:50000];
a = sin(i);
toc;
“Time not important…only life important.” -- The Fifth Element
54.
Tutorial | Time-Series with Matlab Matlab Profiler Find which portions of code take up most of the execution time – Identify bottlenecks – Vectorize offending code Time not important…only life important. –The Fifth Element
55.
Tutorial | Time-Series with Matlab Hints & Tips
There is always an easier (and faster) way
– Typically there is a specialized function for what you want to achieve
Learn vectorization techniques by “peeking” at the actual Matlab files:
– edit [fname], e.g.,
– edit mean
– edit princomp
Matlab Help contains many vectorization examples
56.
Tutorial | Time-Series with Matlab Debugging
“Beware of bugs in the above code; I have only proved it correct, not tried it.” -- D. Knuth
Not as frequently required as in C/C++
– Set breakpoints, step, step in, check variable values
[Screenshot: set breakpoints]
57.
Tutorial | Time-Series with Matlab Debugging
“Either this man is dead or my watch has stopped.”
Full control over variables and execution path
– F10: step, F11: step in (visit functions, as well)
[Screenshots: steps A, B (F10), C]
58.
Tutorial | Time-Series with Matlab Advanced Features – 3D modeling/Volume Rendering
Very easy volume manipulation and rendering
59.
Tutorial | Time-Series with Matlab Advanced Features – Making Animations (Example)
Create animation by changing the camera viewpoint
azimuth = [50:100 99:-1:50]; % azimuth range of values
for k = 1:length(azimuth),
 plot3(1:length(a), a(:,1), a(:,2), 'r', 'Linewidth', 2);
 grid on;
 view(azimuth(k),30); % change to new viewpoint
 M(k) = getframe; % save the frame
end
movie(M,20); % play movie 20 times
See also: movie2avi
[Figure: three frames of the 3D plot seen from different azimuths]
60.
Tutorial | Time-Series with Matlab Advanced Features – GUIs
Built-in development environment
– Buttons, figures, menus, sliders, etc.
Several examples in Help
– Directory listing
– Address book reader
– GUI with multiple axes
61.
Tutorial | Time-Series with Matlab Advanced Features – Using Java
Matlab is shipped with a Java Virtual Machine (JVM)
Access the Java API (e.g., I/O or networking)
Import Java classes and construct objects
Pass data between Java objects and Matlab variables
62.
Tutorial | Time-Series with Matlab Advanced Features – Using Java (Example)
Stock Quote Query
– Connect to Yahoo server
– http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=4069&objectType=file
 disp('Contacting YAHOO server using ...');
 disp(['url = java.net.URL(' urlString ')']);
end;
url = java.net.URL(urlString);
try
 stream = openStream(url);
 ireader = java.io.InputStreamReader(stream);
 breader = java.io.BufferedReader(ireader);
 connect_query_data = 1; % connect made
catch
 connect_query_data = -1; % could not connect case
 disp(['URL: ' urlString]);
 error(['Could not connect to server. It may be unavailable. Try again later.']);
 stockdata = {};
 return;
end
63.
Tutorial | Time-Series with Matlab Matlab Toolboxes
You can buy many specialized toolboxes from Mathworks
– Image Processing, Statistics, Bio-Informatics, etc.
There are many equivalent free toolboxes too:
– SVM toolbox
 • http://theoval.sys.uea.ac.uk/~gcc/svm/toolbox/
– Wavelets
 • http://www.math.rutgers.edu/~ojanen/wavekit/
– Speech Processing
 • http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
– Bayesian Networks
 • http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
64.
Tutorial | Time-Series with Matlab In case I get stuck…
“I’ve had a wonderful evening. But this wasn’t it…”
help [command] (on the command line), e.g., help fft
Menu: help -> matlab help
– Excellent introduction on various topics
Matlab webinars
– http://www.mathworks.com/company/events/archived_webinars.html?fp
Google groups
– comp.soft-sys.matlab
– You can find *anything* here
– Someone else had the same problem before you!
65.
Tutorial | Time-Series with Matlab PART B: Mathematical notions
“Eighty percent of success is showing up.”
66.
Tutorial | Time-Series with Matlab Overview of Part B
1. Introduction and geometric intuition
2. Coordinates and transforms
– Fourier transform (DFT)
– Wavelet transform (DWT)
– Incremental DWT
– Principal components (PCA)
– Incremental PCA
3. Quantized representations
– Piecewise quantized / symbolic
– Vector quantization (VQ) / K-means
4. Non-Euclidean distances
– Dynamic time warping (DTW)
67.
Tutorial | Time-Series with Matlab What is a time-series
Definition: A sequence of measurements over time
Domains: Medicine, Stock Market, Meteorology, Geology, Astronomy, Chemistry, Biometrics, Robotics
[Figure: example series (ECG, sunspot, earthquake) over time, next to a column of sample values: 64.0, 62.8, 62.0, 66.0, 62.0, 32.0, 86.4, ..., 21.6, 45.2, 43.2, 53.0, 43.2, 42.8, 43.2, 36.4]
69.
Tutorial | Time-Series with Matlab Time Series
[Figure: values x1, …, x6 plotted against time]
70.
Tutorial | Time-Series with Matlab Time Series
x = (3, 8, 4, 1, 9, 6)
Sequence of numeric values
– Finite: N-dimensional vectors/points
– Infinite: infinite-dimensional vectors
[Figure: the values of x plotted against time]
71.
Tutorial | Time-Series with Matlab Mean
Definition:
From now on, we will generally assume zero mean (mean normalization):
72.
Tutorial | Time-Series with Matlab Variance
Definition:
or, if zero mean, then
From now on, we will generally assume unit variance (variance normalization):
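The formulas on the mean and variance slides did not survive in this transcript. A standard form, assuming a series $x_1, \dots, x_N$ and the $1/N$ normalization (an assumption; the slides may have used $1/(N-1)$ instead), would be:

```latex
\mu = \frac{1}{N}\sum_{t=1}^{N} x_t ,
\qquad
\sigma^2 = \frac{1}{N}\sum_{t=1}^{N} (x_t - \mu)^2
\quad\Bigl(\text{if } \mu = 0:\ \sigma^2 = \frac{1}{N}\sum_{t=1}^{N} x_t^2\Bigr)

% mean normalization, then variance normalization
x_t \leftarrow x_t - \mu ,
\qquad
x_t \leftarrow \frac{x_t}{\sigma}
```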
73.
Tutorial | Time-Series with Matlab Mean and variance
[Figure: a series annotated with its mean µ and variance σ]
74.
Tutorial | Time-Series with Matlab Why and when to normalize
Intuitively, the notion of “shape” is generally independent of
– Average level (mean)
– Magnitude (variance)
Unless otherwise specified, we normalize to zero mean and unit variance
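As a minimal sketch of the normalization described above, the following uses the example series from the earlier "Time Series" slide. The variable names are illustrative, and the choice of `std(x, 1)` (normalize by N rather than N-1) is an assumption made so that the result has exactly unit variance in the 1/N sense.

```matlab
% Minimal sketch: normalize a series to zero mean and unit variance.
x = [3 8 4 1 9 6];        % example series
mu = mean(x);             % average level
sigma = std(x, 1);        % flag 1: normalize by N (population standard deviation)
z = (x - mu) / sigma;     % normalized series: same "shape", no level or magnitude

check_mean = mean(z);     % approximately 0
check_var  = mean(z.^2);  % approximately 1
```

Note that the `zNorm` function shown in Part A calls `std(data)`, which normalizes by N-1; the two conventions differ only by a constant factor.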
75.
Tutorial | Time-Series with Matlab Variance “=” Length
Variance of zero-mean series:
Length of N-dimensional vector (L2-norm):
So that:
[Figure: vector x in the (x1, x2) plane, with length ||x||]
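The three formulas this slide refers to are missing from the transcript. Under the same $1/N$ assumption as before, they would plausibly read:

```latex
\sigma^2 = \frac{1}{N}\sum_{t=1}^{N} x_t^2 ,
\qquad
\|x\|^2 = \sum_{t=1}^{N} x_t^2
\quad\Longrightarrow\quad
\sigma^2 = \frac{\|x\|^2}{N}
```

That is, for a zero-mean series the variance is the squared length of the vector, up to the constant factor $1/N$.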
76.
Tutorial | Time-Series with Matlab Covariance and correlation
Definition
or, if zero mean and unit variance, then
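The definitions are missing from the transcript. A standard form consistent with the rest of Part B (the symbols $\mu_x, \mu_y, \sigma_x, \sigma_y$ and the $1/N$ normalization are assumptions) is:

```latex
\mathrm{cov}(x, y) = \frac{1}{N}\sum_{t=1}^{N} (x_t - \mu_x)(y_t - \mu_y) ,
\qquad
\rho(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}

% for zero-mean, unit-variance series
\rho(x, y) = \frac{1}{N}\sum_{t=1}^{N} x_t y_t
```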
77.
Tutorial | Time-Series with Matlab Correlation and similarity
How “strong” is the linear relationship between xt and yt?
For normalized series,
[Figure: two scatter plots annotated with slope and residual: CAD vs. FRF (ρ = -0.23) and BEF vs. FRF (ρ = 0.99)]
78.
Tutorial | Time-Series with Matlab Correlation “=” Angle
Correlation of normalized series:
Cosine law:
So that:
[Figure: vectors x and y with angle θ between them and projection x.y]
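The formulas on this slide are missing from the transcript. Assuming zero-mean, unit-variance series with the $1/N$ convention (so that $\|x\| = \|y\| = \sqrt{N}$), the argument would go roughly as follows:

```latex
\rho = \frac{1}{N}\sum_{t=1}^{N} x_t y_t = \frac{x \cdot y}{N} ,
\qquad
x \cdot y = \|x\|\,\|y\| \cos\theta
\quad\Longrightarrow\quad
\rho = \frac{\sqrt{N}\,\sqrt{N}\,\cos\theta}{N} = \cos\theta
```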
79.
Tutorial | Time-Series with Matlab Correlation and distance
For normalized series,
i.e., correlation and squared Euclidean distance are linearly related.
[Figure: vectors x and y with angle θ, the difference ||x - y|| and the projection x.y]
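The relation itself is missing from the transcript. Under the 1/N convention it would be ||x - y||^2 = 2N(1 - ρ), which follows from expanding ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y with ||x||^2 = ||y||^2 = N. The sketch below checks this numerically on synthetic data; the series and variable names are illustrative only.

```matlab
% Minimal sketch: for normalized series, squared Euclidean distance
% and correlation are linearly related: ||zx - zy||^2 = 2N(1 - rho).
N = 200;
x = cumsum(randn(N,1));            % a random walk
y = x + 5*randn(N,1);              % a noisy copy of it

zx = (x - mean(x)) / std(x, 1);    % zero mean, unit variance (1/N convention)
zy = (y - mean(y)) / std(y, 1);

rho = (zx' * zy) / N;              % correlation of the normalized series
d2  = sum((zx - zy).^2);           % squared Euclidean distance
err = abs(d2 - 2*N*(1 - rho));     % should be zero up to rounding

R = corrcoef(x, y);                % built-in correlation, for comparison
rho_builtin = R(1,2);
```

The built-in `corrcoef` returns the same ρ regardless of the 1/N versus 1/(N-1) convention, because the factor cancels between numerator and denominator.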
80.
Tutorial | Time-Series with Matlab Ergodicity: Example
Assume I eat chicken at the same restaurant every day and
Question: How often is the food good?
– Answer one:
– Answer two:
Answers are equal ⇒ ergodic
– “If the chicken is usually good, then my guests today can safely order other things.”
81.
Tutorial | Time-Series with Matlab Ergodicity: Example
Ergodicity is a common and fundamental assumption, but sometimes it can be wrong:
“Total number of murders this year is 5% of the population”
“If I live 100 years, then I will commit about 5 murders, and if I live 60 years, I will commit about 3 murders”
… non-ergodic!
Such ergodicity assumptions on population ensembles are commonly called “racism.”
82.
Tutorial | Time-Series with Matlab Stationarity: Example
Is the chicken quality consistent?
– Last week:
– Two weeks ago:
– Last month:
– Last year:
Answers are equal ⇒ stationary
83.
Tutorial | Time-Series with Matlab Autocorrelation
Definition:
Is well-defined if and only if the series is (weakly) stationary
Depends only on lag ℓ, not time t
86.
Tutorial | Time-Series with Matlab Orthonormal basis
Set of N vectors, { e1, e2, …, eN }
– Normal: ||ei|| = 1, for all 1 ≤ i ≤ N
– Orthogonal: ei·ej = 0, for i ≠ j
Describe a Cartesian coordinate system
– Preserve length (aka “Parseval theorem”)
– Preserve angles (inner-product, correlations)
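A small numerical illustration of the two preservation properties above. The basis here is an arbitrary one obtained from a QR factorization of a random matrix, which is just a convenient way to get orthonormal columns for the sketch, not a basis used in the tutorial.

```matlab
% Minimal sketch: an orthonormal basis preserves lengths and inner products.
N = 8;
[Q, R] = qr(randn(N));        % columns of Q: an (arbitrary) orthonormal basis
x = randn(N,1);
y = randn(N,1);

cx = Q' * x;                  % coefficients of x w.r.t. the basis
cy = Q' * y;                  % coefficients of y w.r.t. the basis

orthErr = norm(Q'*Q - eye(N));        % basis vectors are normal and orthogonal
lenErr  = abs(norm(cx) - norm(x));    % length is preserved (Parseval)
dotErr  = abs(cx'*cy - x'*y);         % inner products (angles) are preserved
xErr    = norm(Q*cx - x);             % x is recovered from its coefficients
```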
87.
Tutorial | Time-Series with Matlab Orthonormal basis
Note that the coefficients xi w.r.t. the basis { e1, …, eN } are the corresponding “similarities” of x to each basis vector/series:
[Figure: a series x written as a weighted sum of basis series, e.g. -0.5 e1 + 4 e2 + …]
88.
Tutorial | Time-Series with Matlab Orthonormal bases
The time-domain basis is a trivial tautology:
– Each coefficient is simply the value at one time instant
What other bases may be of interest? Coefficients may correspond to:
– Frequency (Fourier)
– Time/scale (wavelets)
– Features extracted from series collection (PCA)
93.
Tutorial | Time-Series with Matlab Frequency
One cycle every 20 time units (period)
94.
Tutorial | Time-Series with Matlab Frequency and time
Why is the period 20?
It’s not 8, because its “similarity” (projection) to a period-8 series (of the same length) is zero.
[Figure: series · period-8 series = 0]
95.
Tutorial | Time-Series with Matlab Frequency and time
Why is the period 20?
It’s not 10, because its “similarity” (projection) to a period-10 series (of the same length) is zero.
[Figure: series · period-10 series = 0]
96.
Tutorial | Time-Series with Matlab Frequency and time
Why is the period 20?
It’s not 40, because its “similarity” (projection) to a period-40 series (of the same length) is zero.
…and so on
[Figure: series · period-40 series = 0]
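The projections described on the last three slides can be reproduced in a few lines. This is a sketch under an assumption not stated on the slides: the series length is chosen as 40 so that every candidate period (8, 10, 20, 40) fits a whole number of cycles, which is what makes the sinusoids exactly orthogonal.

```matlab
% Minimal sketch: a period-20 sinusoid has zero projection onto sinusoids
% of period 8, 10 and 40 (same length), and nonzero projection onto itself.
N = 40;                          % assumed length: a multiple of every period below
t = (0:N-1)';
x = sin(2*pi*t/20);              % the series: one cycle every 20 time units

periods = [8 10 20 40];
proj = zeros(size(periods));
for k = 1:length(periods)
    s = sin(2*pi*t/periods(k));  % candidate series with the given period
    proj(k) = x' * s;            % "similarity" = inner product
end
% proj is approximately [0 0 20 0]: only the period-20 candidate matches
```

If the length is not a multiple of the periods, the projections are small but no longer exactly zero.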
97.
Tutorial | Time-Series with Matlab Frequency: Fourier transform - Intuition
To find the period, we compared the time series with sinusoids of many different periods
Therefore, a good “description” (or basis) would consist of all these sinusoids
This is precisely the idea behind the discrete Fourier transform
– The coefficients capture the similarity (in terms of amplitude and phase) of the series with sinusoids of different periods
98.
Tutorial | Time-Series with Matlab Frequency: Fourier transform - Intuition
Technical details:
– We have to ensure we get an orthonormal basis
– Real form: sines and cosines at N/2 different frequencies
– Complex form: exponentials at N different frequencies
99.
Tutorial | Time-Series with Matlab Fourier transform: Real form
For odd-length series,
The pair of bases at frequency fk are
plus the zero-frequency (mean) component
100.
Tutorial | Time-Series with Matlab Fourier transform: Real form, amplitude and phase
Observe that, for any fk, we can write
where
are the amplitude and phase, respectively.
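The identity on this slide is missing from the transcript. One common way to write it, with $a_k$ and $b_k$ denoting the cosine and sine coefficients at frequency $f_k$ (these symbols and the sign convention of the phase are assumptions), is:

```latex
a_k \cos(2\pi f_k t) + b_k \sin(2\pi f_k t) = r_k \cos(2\pi f_k t - \theta_k),
\qquad
r_k = \sqrt{a_k^2 + b_k^2},
\quad
\theta_k = \operatorname{atan2}(b_k, a_k)
```

This follows from $\cos(\alpha - \theta) = \cos\alpha\cos\theta + \sin\alpha\sin\theta$ with $a_k = r_k\cos\theta_k$ and $b_k = r_k\sin\theta_k$.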
101.
Tutorial | Time-Series with Matlab Fourier transform: Real form, amplitude and phase
It is often easier to think in terms of amplitude rk and phase θk
[Figure: an example sinusoid]
102.
Tutorial | Time-Series with Matlab Fourier transform: Complex form
The equations become easier to handle if we allow the series and the Fourier coefficients Xk to take complex values:
Matlab note: fft omits the scaling factor and is therefore not unitary; however, ifft includes a 1/N scaling factor, so ifft(fft(x)) always recovers x (up to floating-point rounding).
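The Matlab note above can be checked directly. In this sketch the random input and variable names are illustrative; dividing by sqrt(N) is one way to obtain a unitary (length-preserving) transform from `fft`.

```matlab
% Minimal sketch: scaling conventions of fft and ifft.
N = 64;
x = randn(N,1);
X = fft(x);                          % no scaling factor applied

ratio = norm(X) / norm(x);           % equals sqrt(N), so fft is not unitary
Xu = X / sqrt(N);                    % unitary version: preserves length
unitaryErr = abs(norm(Xu) - norm(x));

roundTrip = max(abs(ifft(X) - x));   % ifft applies 1/N, so x is recovered
```

Exact equality `ifft(fft(x)) == x` may fail elementwise because of rounding, so comparisons should use a tolerance.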
106.
Tutorial | Time-Series with Matlab Frequency and time
What is the period now?
No single period, because the series isn’t exactly similar to any series of the same length.
[Figure: e.g., series · period-20 series ≠ 0, series · period-10 series ≠ 0, etc…]
107.
Tutorial | Time-Series with Matlab Frequency and time
Fourier is successful for summarization of series with a few, stable periodic components
However, content is “smeared” across frequencies when there are
– Frequency shifts or jumps, e.g.,
– Discontinuities (jumps) in time, e.g.,
108.
Tutorial | Time-Series with Matlab Frequency and time
If there are discontinuities in time/frequency or frequency shifts, then we should seek an alternate “description” or basis
Main idea: Localize bases in time
– Short-time Fourier transform (STFT)
– Discrete wavelet transform (DWT)
109.
Tutorial | Time-Series with Matlab Frequency and time: Intuition
What if we examined, e.g., eight values at a time?
110.
Tutorial | Time-Series with Matlab Frequency and time: Intuition
What if we examined, e.g., eight values at a time?
Can only compare with periods up to eight.
– Results may be different for each group (window)
111.
Frequency and time: Intuition
Can “adapt” to localized phenomena
Fixed window: short-window Fourier (STFT)
– How to choose window size?
Variable windows: wavelets
112.
Wavelets: Intuition
Main idea
– Use small windows for small periods
  • Remove high-frequency component, then
– Use larger windows for larger periods
  • Twice as large
– Repeat recursively
Technical details
– Need to ensure we get an orthonormal basis
113.
Wavelets: Intuition
[Figure: two panels; axes: scale (frequency) vs. time, and frequency vs. time]
114.
Wavelets: Intuition, tiling time and frequency
[Figure: tilings of the time-frequency plane for Fourier, DCT, …; for STFT; and for wavelets. Axes: frequency (scale, for wavelets) vs. time]
115.
Wavelet transform: Pyramid algorithm
[Figure: high-pass and low-pass filters]
118.
Wavelet transform: Pyramid algorithm
[Figure: x ≡ w0 is split by a high-pass filter into w1 and by a low-pass filter into v1; v1 is split into w2 and v2; v2 is split into w3 and v3]
119.
Wavelet transforms: General form
A high-pass / low-pass filter pair
– Example: pairwise difference / average (Haar)
– In general: Quadrature Mirror Filter (QMF) pair
  • Orthogonal spans, which cover the entire space
– Additional requirements to ensure orthonormality of overall transform…
Use to recursively analyze into top / bottom half of frequency band
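The recursive analysis can be sketched with the Haar pair, using only base Matlab. The input series, the 1/sqrt(2) scaling (chosen so the transform is orthonormal), the sign of the difference, and the variable names v and w are assumptions for illustration; the series length is assumed to be a power of two.

```matlab
% Sketch: Haar pyramid algorithm (series length assumed a power of two).
x = [2 4 6 8 10 12 14 16]';          % illustrative series
v = x;                               % current low-pass ("smooth") part
w = {};                              % high-pass ("detail") parts, finest first

while numel(v) > 1
    a = v(1:2:end);  b = v(2:2:end); % neighbouring pairs
    w{end+1} = (a - b) / sqrt(2);    % high pass: scaled pairwise difference
    v        = (a + b) / sqrt(2);    % low pass: scaled pairwise average
end
coeffs = [v; vertcat(w{end:-1:1})];  % final smooth, then coarse-to-fine details

% Inverse: undo the levels from coarsest to finest
r = v;
for j = numel(w):-1:1
    a = (r + w{j}) / sqrt(2);
    b = (r - w{j}) / sqrt(2);
    r = reshape([a b]', [], 1);      % interleave a and b
end
reconErr = norm(r - x);              % close to 0
```

Each pass halves the length of the low-pass part, so the total number of coefficients stays equal to the length of the series.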
120.
Wavelet transforms: Other filters, examples
[Figure: filters for Haar (Daubechies-1), Daubechies-2, Daubechies-3, Daubechies-4; from Haar toward Daubechies-4: better frequency isolation, worse time localization]
Wavelet filter, or mother filter (high-pass)
Scaling filter, or father filter (low-pass)
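As a sketch of how a scaling/wavelet filter pair fits together, the snippet below takes the four-tap Daubechies scaling filter (assumed here to be what the slide calls Daubechies-2) and derives the wavelet filter by the usual QMF flip-and-alternate-signs rule. Ordering and sign conventions differ between texts and toolboxes, so treat these as one possible convention.

```matlab
% Sketch: four-tap Daubechies filter pair and its orthonormality conditions.
s3 = sqrt(3);
h = [1+s3, 3+s3, 3-s3, 1-s3] / (4*sqrt(2));   % scaling (low-pass) filter
L = numel(h);
g = fliplr(h) .* (-1).^(0:L-1);               % wavelet (high-pass) filter

unitNormH  = sum(h.^2);                 % 1: unit energy
unitNormG  = sum(g.^2);                 % 1: unit energy
crossHG    = sum(h .* g);               % 0: low pass orthogonal to high pass
shiftOrtho = h(1)*h(3) + h(2)*h(4);     % 0: orthogonal to its own shift by two
sumH       = sum(h);                    % sqrt(2): passes the mean
sumG       = sum(g);                    % 0: blocks the mean
```

The shift-by-two condition is one of the additional requirements that make the overall transform orthonormal, not just the filter pair.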
125.
Other wavelets
Only scratching the surface…
Wavelet packets
– All possible tilings (binary)
– Best-basis transform
Overcomplete wavelet transform (ODWT), aka maximum-overlap wavelets (MODWT), aka shift-invariant wavelets
Further reading:
1. Donald B. Percival, Andrew T. Walden, Wavelet Methods for Time Series Analysis, Cambridge Univ. Press, 2006.
2. Gilbert Strang, Truong Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, 1996.
3. Tao Li, Qi Li, Shenghuo Zhu, Mitsunori Ogihara, A Survey of Wavelet Applications in Data Mining, SIGKDD Explorations, 4(2), 2002.
126.
More on wavelets: Signal representation and compressibility
[Figure: partial energy for two series (GBP, Light); x-axis: compression (% coefficients); y-axis: quality (% energy); curves: Time, FFT, Haar, DB3]
127.
More wavelets
Keeping the largest-magnitude coefficients minimizes total error (L2 distance)
Other coefficient selection/thresholding schemes for different error metrics (e.g., maximum per-instant error, or L1 distance)
– Typically use Haar bases
Further reading:
1. Minos Garofalakis, Amit Kumar, Wavelet Synopses for General Error Metrics, ACM TODS, 30(4), 2005.
2. Panagiotis Karras, Nikos Mamoulis, One-pass Wavelet Synopses for Maximum-Error Metrics, VLDB 2005.
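The L2 claim holds for any orthonormal transform, and can be sketched with a unitary Fourier transform in place of wavelets (an assumption made here only to keep the example in base Matlab). The series, the value of k, and the variable names are illustrative.

```matlab
% Sketch: keeping the k largest coefficients of an orthonormal transform.
N = 64;
t = (0:N-1)';
x = sin(2*pi*3*t/N) + 0.5*sin(2*pi*10*t/N) + 0.1*(t/N);  % illustrative series
k = 6;

X = fft(x) / sqrt(N);                      % unitary transform
[~, order] = sort(abs(X), 'descend');      % largest magnitudes first

Xtop = zeros(N, 1);    Xtop(order(1:k)) = X(order(1:k)); % keep k largest
Xfirst = zeros(N, 1);  Xfirst(1:k) = X(1:k);             % keep first k instead

% Reconstructions may be complex if a conjugate pair is split; norm still applies
errTop   = norm(x - ifft(Xtop)   * sqrt(N));
errFirst = norm(x - ifft(Xfirst) * sqrt(N));
dropped  = sqrt(sum(abs(X(order(k+1:end))).^2));         % norm of what was left out
```

Because the transform preserves norms, the reconstruction error equals the norm of the dropped coefficients, so dropping the smallest ones gives the smallest L2 error.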
137.
Time series collections: Overview
Fourier and wavelets are the most prevalent and successful “descriptions” of time series.
Next, we will consider collections of M time series, each of length N.
– What is the series that is “most similar” to all series in the collection?
– What is the second “most similar”, and so on…
138.
Time series collections
Some notation:
– values at time t: x_t
– i-th series: x(i)
141.
Principal Component Analysis: Matrix notation, Singular Value Decomposition (SVD)
X = UΣV^T
– X = [x(1) x(2) … x(M)]: time series (columns)
– U = [u1 u2 … uk]: basis for time series
– ΣV^T = [υ1 υ2 υ3 … υM]: coefficients w.r.t. basis in U
142.
Principal Component Analysis: Matrix notation, Singular Value Decomposition (SVD)
X = UΣV^T
– X = [x(1) x(2) … x(M)]: time series (columns)
– U = [u1 u2 … uk]: basis for time series
– ΣV^T: its columns υ1, υ2, υ3, …, υM are the coefficients w.r.t. basis in U; its rows v'1, v'2, …, v'k are a basis for measurements (rows)
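A minimal sketch of this decomposition using Matlab's built-in svd. The collection of four synthetic series, the choice k = 1, and the variable names C, Xk, and errK are assumptions for illustration.

```matlab
% Sketch: SVD of a collection of M series of length N, stored as columns.
N = 50;  M = 4;
t = (0:N-1)';
X = [sin(2*pi*t/25), ...
     2*sin(2*pi*t/25) + 0.1*cos(2*pi*t/10), ...
     -sin(2*pi*t/25), ...
     0.5*sin(2*pi*t/25) + 0.2*cos(2*pi*t/10)];   % N-by-M

[U, S, V] = svd(X, 'econ');     % X = U*S*V'
C = S * V';                     % column i: coefficients of series i w.r.t. U

k = 1;
Xk = U(:, 1:k) * C(1:k, :);     % approximation using the first k basis series
s = diag(S);                    % singular values, largest first
errK = norm(X - Xk, 'fro');     % equals sqrt(sum(s(k+1:end).^2))
```

Here U(:, 1) plays the role of the series “most similar” to the whole collection, U(:, 2) the second, and so on.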