- 1. Hands-On Time-Series Analysis with Matlab. Michalis Vlachos and Spiros Papadimitriou, IBM T.J. Watson Research Center
- 2. Tutorial | Time-Series with Matlab. Disclaimer: Feel free to use any of the following slides for educational purposes, but kindly acknowledge the source. We would also like to know how you have used these slides, so please send us emails with comments or suggestions.
- 3. About this tutorial: The goal of this tutorial is to show you that time-series research (or research in general) can be made fun when it involves visualizing ideas, which can be achieved with concise programming. Matlab enables us to do that. (Cartoon, first speaker: “Will I be able to use this MATLAB right away after the tutorial?” Second speaker: “I am definitely smarter than her, but I am not a time-series person, per se. I wonder what I gain from this tutorial…”)
- 4. Disclaimer: We are not affiliated with Mathworks in any way, but we do like using Matlab a lot since it makes our lives easier. Errors and bugs are most likely contained in this tutorial. We might be responsible for some of them.
- 5. What this tutorial is NOT about: moving averages, autoregressive models, forecasting/prediction, stationarity, seasonality.
- 6. Overview
     - PART A: The Matlab programming environment
     - PART B: Basic mathematics
       - Introduction / geometric intuition
       - Coordinates and transforms
       - Quantized representations
       - Non-Euclidean distances
     - PART C: Similarity Search and Applications
       - Introduction
       - Representations
       - Distance Measures
       - Lower Bounding
       - Clustering/Classification/Visualization
       - Applications
- 7. PART A: Matlab Introduction
- 8. Why does anyone need Matlab? Matlab enables efficient Exploratory Data Analysis (EDA). “Science progresses through observation” -- Isaac Newton. “The greatest value of a picture is when it forces us to notice what we never expected to see” -- John Tukey
- 9. Matlab
     - Interpreted language
       - Easy code maintenance (code is very compact)
       - Very fast array/vector manipulation
       - Support for OOP
     - Easy plotting and visualization
     - Easy integration with other languages/OSs
       - Interact with C/C++, COM objects, DLLs
       - Built-in Java support (and compiler)
       - Ability to make executable files
       - Multi-platform support (Windows, Mac, Linux)
     - Extensive number of toolboxes
       - Image, Statistics, Bioinformatics, etc.
- 10. History of Matlab (MATrix LABoratory)
     - “The most important thing in the programming language is the name. I have recently invented a very good name and now I am looking for a suitable language.” -- D. Knuth
     - Programmed by Cleve Moler as an interface for EISPACK & LINPACK
     - 1957: Moler goes to Caltech. Studies numerical analysis
     - 1961: Goes to Stanford. Works with G. Forsythe on Laplacian eigenvalues
     - 1977: First edition of Matlab; 2000 lines of Fortran, 80 functions (now more than 8000 functions)
     - 1979: Met with Jack Little in Stanford. Started working on porting it to C
     - 1984: Mathworks is founded
     - Video: http://www.mathworks.com/company/aboutus/founders/origins_of_matlab_wm.html
- 11. Tutorial | Time-Series with Matlab
- 12. Current State of Matlab/Mathworks
     - Matlab, Simulink, Stateflow
     - Matlab version 7.3, R2006b
     - Used in a variety of industries: aerospace, defense, computers, communication, biotech
     - Mathworks is still privately owned
     - Used in >3,500 universities, with >500,000 users worldwide
     - 2005 revenue: >350 M
     - 2005 employees: 1,400+
     - Pricing: starts from $1,900 (commercial use), ~$100 (Student Edition)
     - “Money is better than poverty, if only for financial reasons…”
- 13. Matlab 7.3 (R2006b), released on Sept 1, 2006
     - Distributed computing
     - Better support for large files
     - New optimization Toolbox
     - Matlab Builder for Java: create Java classes from Matlab
     - Demos, webinars in Flash format (http://www.mathworks.com/products/matlab/demos.html)
- 14. Who needs Matlab?
     - R&D companies, for easy application deployment
     - Professors
       - Lab assignments
       - Matlab allows focus on algorithms, not on language features
     - Students
       - Batch processing of files (no more incomprehensible Perl code!)
       - Great environment for testing ideas: quick coding of ideas, then porting to C/Java etc.
       - Easy visualization
       - It’s cheap! (for students at least…)
- 15. Starting up Matlab
     - Dos/Unix-like directory navigation, with commands like cd, pwd, mkdir
     - For navigation it is easier to just copy/paste the path from Explorer, e.g.: cd 'c:\documents'
     - “Personally I'm always ready to learn, although I do not always like being taught.” -- Sir Winston Churchill
- 16. Matlab Environment. Command Window: type commands, load scripts. Workspace: loaded variables/types/size
- 17. Matlab Environment (continued). Help contains a comprehensive introduction to all functions
- 18. Matlab Environment (continued). Excellent demos and tutorials of the various features and toolboxes
- 19. Starting with Matlab
     - Everything is an array
     - Manipulating whole arrays is faster than element-by-element manipulation with for-loops

         a = [1 2 3 4 5 6 7 9 10]   % define an array

- 20. Populating arrays: plot a sinusoid function

         a = [0:0.3:2*pi]   % generate values from 0 to 2pi (with step of 0.3)
         b = cos(a)         % evaluate cos at the positions contained in array a
         plot(a,b)          % plot a (x-axis) against b (y-axis)

     Related:

         linspace(-100,100,15);   % generate 15 values between -100 and 100
- 21. Array Access
     - Access array elements

         >> a(1)
         ans =
              0
         >> a(1:3)
         ans =
              0    0.3000    0.6000

     - Set array elements

         >> a(1) = 100
         >> a(1:3) = [100 100 100]

- 22. 2D Arrays: can access whole columns or rows. Let’s define a 2D array:

         >> a = [1 2 3; 4 5 6]
         a =
              1     2     3
              4     5     6
         >> a(1,:)          % row-wise access
         ans =
              1     2     3
         >> a(:,1)          % column-wise access
         ans =
              1
              4
         >> a(2,2)
         ans =
              5

     - “A good listener is not only popular everywhere, but after a while he gets to know something.” -- Wilson Mizner
- 23. Column-wise computation: for arrays with more than one dimension, functions such as max, mean and sort operate column by column by default

         >> a = [1 2 3; 3 2 1]
         a =
              1     2     3
              3     2     1
         >> max(a)
         ans =
              3     2     3
         >> mean(a)
         ans =
              2.0000    2.0000    2.0000
         >> sort(a)
         ans =
              1     2     1
              3     2     3
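As a small sketch of the column-wise default described on this slide (the variable names are illustrative, not from the slides), the optional dimension argument of max, mean and sort switches the same functions to row-wise operation:

```matlab
a = [1 2 3; 3 2 1];

% Default: operate down each column (dimension 1).
colMax  = max(a);          % [3 2 3]
colMean = mean(a);         % [2 2 2]

% Pass the dimension explicitly to operate along each row instead.
rowMax  = max(a, [], 2);   % [3; 3]
rowMean = mean(a, 2);      % [2; 2]
rowSort = sort(a, 2);      % [1 2 3; 1 2 3]
```

Note that max takes the dimension as its third argument, because the second argument is reserved for comparing two arrays.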
- 24. Concatenating arrays: column-wise or row-wise
     - Row next to row

         >> a = [1 2 3];
         >> b = [4 5 6];
         >> c = [a b]
         c =
              1     2     3     4     5     6

     - Column next to column

         >> a = [1;2];
         >> b = [3;4];
         >> c = [a b]
         c =
              1     3
              2     4

     - Row below row

         >> a = [1 2 3];
         >> b = [4 5 6];
         >> c = [a; b]
         c =
              1     2     3
              4     5     6

     - Column below column

         >> a = [1;2];
         >> b = [3;4];
         >> c = [a; b]
         c =
              1
              2
              3
              4

- 25. Initializing arrays
     - Create array of ones [ones]

         >> a = ones(1,3)
         a =
              1     1     1
         >> a = ones(2,2)*5;
         a =
              5     5
              5     5
         >> a = ones(1,3)*inf
         a =
              Inf   Inf   Inf

     - Create array of zeroes [zeros]: good for initializing arrays

         >> a = zeros(1,4)
         a =
              0     0     0     0
         >> a = zeros(3,1) + [1 2 3]'
         a =
              1
              2
              3
- 26. Reshaping and Replicating Arrays
     - Changing the array shape [reshape] (e.g., for easier column-wise computation). reshape(X,[M,N]) gives an [M,N] matrix filled column-wise from X

         >> a = [1 2 3 4 5 6]';   % make it into a column
         >> reshape(a,2,3)
         ans =
              1     3     5
              2     4     6

     - Replicating an array [repmat]. repmat(X,[M,N]) makes [M,N] tiles of X

         >> a = [1 2 3];
         >> repmat(a,1,2)
         ans =
              1     2     3     1     2     3
         >> repmat(a,2,1)
         ans =
              1     2     3
              1     2     3
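One common use of repmat in time-series work is removing the mean of every column at once. The following is a minimal sketch with made-up data (the matrix X and the variable names are assumptions for illustration), followed by a reshape round trip:

```matlab
X  = [1 2 3; 4 5 6; 7 8 9; 10 11 12];   % 4 samples (rows) x 3 series (columns)
mu = mean(X);                           % 1x3 row of column means
Xc = X - repmat(mu, size(X,1), 1);      % tile mu into a 4x3 matrix, then subtract

v  = 1:6;
M  = reshape(v, 2, 3);                  % filled column by column: [1 3 5; 2 4 6]
v2 = M(:)';                             % M(:) stacks the columns back into one vector
```

After the subtraction every column of Xc has zero mean, and v2 equals the original v, which shows that reshape only changes the shape and never the column-wise order of the elements.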
- 27. Useful Array functions
     - Last element of array [end]

         >> a = [1 3 2 5];
         >> a(end)
         ans =
              5
         >> a(end-1)
         ans =
              2

     - Length of array [length]

         >> length(a)
         ans =
              4

     - Dimensions of array [size]

         >> [rows, columns] = size(a)
         rows =
              1
         columns =
              4

- 28. Useful Array functions
     - Find a specific element [find]

         >> a = [1 3 2 5 10 5 2 3];
         >> b = find(a==2)
         b =
              3     7

     - Sorting [sort]: the second output indicates the index where each element came from

         >> a = [1 3 2 5];
         >> [s,i] = sort(a)
         s =
              1     2     3     5
         i =
              1     3     2     4
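A short sketch of how find and the index output of sort are typically used. The labels string is a hypothetical companion array, introduced only to show reordering by the sort index:

```matlab
a = [1 3 2 5 10 5 2 3];

idx = find(a == 2);         % positions of the value 2
big = a(a > 4);             % logical indexing: the comparison result selects elements directly

[s, i] = sort(a);           % s equals a(i); i records where each element came from
labels = 'ABCDEFGH';        % hypothetical labels, one per element of a
sortedLabels = labels(i);   % reorder the labels the same way as the values
```

Logical indexing (the second statement) often makes find unnecessary when only the values, not their positions, are needed.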
- 29. Visualizing Data and Exporting Figures
     - Use Fisher’s Iris dataset

         >> load fisheriris

     - 4 dimensions, 3 species
     - Petal length & width, sepal length & width
     - Iris: virginica/versicolor/setosa
     - meas (150x4 array): holds the 4D measurements
     - species (150x1 cell array): holds the name of the species for the specific measurement
- 30. Visualizing Data (2D): strcmp, scatter, hold on

         >> idx_setosa = strcmp(species, 'setosa');         % rows of setosa data
         >> idx_virginica = strcmp(species, 'virginica');   % rows of virginica
         >> setosa = meas(idx_setosa,[1:2]);
         >> virgin = meas(idx_virginica,[1:2]);
         >> scatter(setosa(:,1), setosa(:,2));              % plot in blue circles by default
         >> hold on;
         >> scatter(virgin(:,1), virgin(:,2), 'rs');        % red[r] squares[s] for these

     - idx_setosa is an array of zeros and ones indicating the positions where the keyword 'setosa' was found
     - “The world is governed more by appearances rather than realities…” -- Daniel Webster
- 31. Visualizing Data (3D): scatter3

         >> idx_setosa = strcmp(species, 'setosa');           % rows of setosa data
         >> idx_virginica = strcmp(species, 'virginica');     % rows of virginica
         >> idx_versicolor = strcmp(species, 'versicolor');   % rows of versicolor
         >> setosa = meas(idx_setosa,[1:3]);
         >> virgin = meas(idx_virginica,[1:3]);
         >> versi = meas(idx_versicolor,[1:3]);
         >> scatter3(setosa(:,1), setosa(:,2), setosa(:,3));        % plot in blue circles by default
         >> hold on;
         >> scatter3(virgin(:,1), virgin(:,2), virgin(:,3), 'rs');  % red[r] squares[s] for these
         >> scatter3(versi(:,1), versi(:,2), versi(:,3), 'gx');     % green x’s
         >> grid on;       % show grid on axis
         >> rotate3d on;   % rotate with mouse
- 32. Changing Plots Visually. Toolbar buttons: zoom out, zoom in, create line, create arrow, select object, add text. “Computers are useless. They can only give you answers…”
- 33. Changing Plots Visually: add titles, add labels on axes, change tick labels, add grids to axes, change color of line, change thickness/line style, etc.
- 34. Changing Plots Visually (Example): change the color and width of a line (screenshot steps A, B, C: right click).
- 35. Changing Plots Visually (Example): the result … Other styles: (two example plots).
- 36. Changing Figure Properties with Code
     - GUIs are easy, but sooner or later we realize that coding is faster

         >> a = cumsum(randn(365,1));   % random walk of 365 values

     - If this represents a year’s worth of measurements of an imaginary quantity, we will change:
       - x-axis annotation to months
       - Axis labels
       - Put title in the figure
       - Include some Greek letters in the title just for fun
     - “Real men do it command-line…” -- Anonymous
- 37. Changing Figure Properties with Code: axis annotation to months

         >> axis tight;            % irrelevant but useful...
         >> xx = [15:30:365];
         >> set(gca, 'xtick', xx)

- 38. Changing Figure Properties with Code: axis annotation to months

         >> set(gca,'xticklabel',['Jan'; ...
                                  'Feb';'Mar'])

- 39. Changing Figure Properties with Code: axis labels and title

         >> title('My measurements (\epsilon/\pi)')
         >> ylabel('Imaginary Quantity')
         >> xlabel('Month of 2005')

     Other LaTeX examples: \alpha, \beta, e^{-\alpha} etc.
- 40. Saving Figures: Matlab can save figures (.fig) for later processing; a .fig file can be opened later through Matlab. “You can always put-off for tomorrow, what you can do today.” -- Anonymous
- 41. Exporting Figures: export to emf, eps, jpg, etc.
- 42. Exporting figures (code): you can also achieve the same result with Matlab code

         % export to color eps
         print -depsc myImage.eps;       % from command-line
         print(gcf,'-depsc','myImage')   % using variable as name
- 43. Visualizing Data - 2D Bars (four bars, one color per bar taken from the colormap)

         time = [100 120 80 70];                  % our data
         h = bar(time);                           % get handle
         cmap = [1 0 0; 0 1 0; 0 0 1; .5 0 1];    % colors
         colormap(cmap);                          % create colormap
         cdata = [1 2 3 4];                       % assign colors
         set(h,'CDataMapping','direct','CData',cdata);

- 44. Visualizing Data - 3D Bars

         data = [10 8 7; 9 6 5; 8 6 4; 6 5 4; 6 3 2; 3 2 1];
         bar3([1 2 3 5 6 7], data);
         c = colormap(gray);   % get colors of colormap
         c = c(20:55,:);       % get some colors
         colormap(c);          % new colormap

- 45. Visualizing Data - Surfaces: the value at position x-y of the array indicates the height of the surface

         data = [1:10];
         data = repmat(data,10,1);                                 % create data
         surface(data,'FaceColor',[1 1 1],'EdgeColor',[0 0 1]);    % plot data
         view(3); grid on;                                         % change viewpoint and put axis lines
- 46. Creating .m files: standard text files. Script: a series of Matlab commands (no input/output arguments). Functions: programs that accept input and return output. (Screenshot: right click.)
- 47. Creating .m files: M editor (screenshot: double click).
- 48. Creating .m files: cumsum, num2str, save
     - The following script (myScript.m) will create an array with 10 random walk vectors and save them under text files 1.dat, …, 10.dat

         a = cumsum(randn(100,10));        % 10 random walk data of length 100
         for i=1:size(a,2),                % number of columns
           data = a(:,i);
           fname = [num2str(i) '.dat'];    % a string is a vector of characters!
           save(fname, 'data','-ASCII');   % save each column in a text file
         end

     - cumsum example: for A = [1 2 3 4 5], cumsum(A) = [1 3 6 10 15]
     - Write this in the M editor and execute it by typing its name on the Matlab command line
     - (Figure: a random walk time-series)
- 49. Functions in .m scripts
     - When we need to organize our code, or frequently change parameters in our scripts
     - The first line has the form: keyword (function), output argument, function name, input argument. The comment lines that follow are the help text (shown by help function_name); the rest is the function body

         function dataN = zNorm(data)
         % ZNORM zNormalization of vector
         % subtract mean and divide by std
         if (nargin<1),                 % check parameters
           error('Not enough arguments');
         end
         data = data - mean(data);      % subtract mean
         data = data/std(data);         % divide by std
         dataN = data;

     - Pass and return more arguments:

         function [a,b] = myFunc(data, x, y)

     - See also: varargin, varargout
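As a usage sketch for the normalization above, the same two steps can be written as an anonymous function so that the example runs as a plain script (this is an illustrative variant, not the zNorm.m file from the slide). Note that std divides by N-1 by default:

```matlab
% Anonymous-function version of the zNorm idea, so the sketch runs as a script.
zNorm = @(x) (x - mean(x)) / std(x);

x  = cumsum(randn(100,1));   % a random walk, as in the earlier script
xn = zNorm(x);

m = mean(xn);                % approximately 0
s = std(xn);                 % approximately 1
```

With the function saved as zNorm.m on the path, the call `xn = zNorm(x)` would look exactly the same.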
- 50. Cell Arrays: cells that hold other Matlab arrays. Let’s read the files of a directory

         >> f = dir('*.dat')   % list the matching files
         f =
         15x1 struct array with fields:
             name
             date
             bytes
             isdir

     - f is a struct array; the name of the first file is f(1).name

         for i=1:length(f),
           a{i} = load(f(i).name);
           N = length(a{i});
           plot3([1:N], a{i}(:,1), a{i}(:,2), ...
                 'r-', 'Linewidth', 1.5);
           grid on;
           pause;
           cla;
         end

- 51. Reading/Writing Files
     - Load/Save are faster than C style I/O operations
     - But fscanf, fprintf can be useful for file formatting or reading non-Matlab files

         fid = fopen('fischer.txt', 'wt');
         for i=1:length(species),
           fprintf(fid, '%6.4f %6.4f %6.4f %6.4f %s\n', meas(i,:), species{i});
         end
         fclose(fid);

     - Elements are accessed column-wise (again…): in the example below, y has x in its first row and exp(x) in its second row, so each output line holds one column of y

         x = 0:.1:1;
         y = [x; exp(x)];
         fid = fopen('exp.txt','w');
         fprintf(fid,'%6.2f %12.8f\n',y);
         fclose(fid);
- 52. Flow Control/Loops
     - if (else/elseif), switch: check logical conditions
     - while: execute statements an indefinite number of times (as long as a condition holds)
     - for: execute statements a fixed number of times
     - break, continue
     - return: return execution to the invoking function
     - “Life is pleasant. Death is peaceful. It’s the transition that’s troublesome.” -- Isaac Asimov
- 53. For-Loop or vectorization? (tic, toc, clear all)
     - Pre-allocate arrays that store output results: no need for Matlab to resize every time
     - Functions are faster than scripts: compiled into pseudo-code
     - Load/Save are faster than Matlab I/O functions
     - After v. 6.5 of Matlab there is for-loop vectorization (interpreter)
     - Vectorization helps, but it is often not obvious how to achieve it
     - No pre-allocation (elapsed_time = 5.0070):

         clear all;
         tic;
         for i=1:50000
           a(i) = sin(i);
         end
         toc

     - With pre-allocation (elapsed_time = 0.1400):

         clear all;
         a = zeros(1,50000);
         tic;
         for i=1:50000
           a(i) = sin(i);
         end
         toc

     - Vectorized (elapsed_time = 0.0200):

         clear all;
         tic;
         i = [1:50000];
         a = sin(i);
         toc;

     - “Time not important…only life important.” -- The Fifth Element
- 54. Matlab Profiler: find which portions of code take up most of the execution time. Identify bottlenecks; vectorize offending code.
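The comparison on this slide can be reproduced in one script. The sketch below times a pre-allocated loop against the vectorized form and also checks that both give the same result; the timings printed will differ from the 2006 figures on the slide, and on recent releases the gap may be much smaller:

```matlab
n = 50000;

tic;                          % 1) loop with pre-allocation
a = zeros(1, n);
for i = 1:n
    a(i) = sin(i);
end
tLoop = toc;

tic;                          % 2) vectorized
b = sin(1:n);
tVec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
maxDiff = max(abs(a - b));    % both versions compute the same values
```

Checking equality of the two results is a useful habit when vectorizing: the fast version is only worth having if it computes the same thing.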
- 55. Hints & Tips
     - There is always an easier (and faster) way: typically there is a specialized function for what you want to achieve
     - Learn vectorization techniques by “peeking” at the actual Matlab files: edit [fname], e.g. edit mean, edit princomp
     - Matlab Help contains many vectorization examples
- 56. Debugging: not as frequently required as in C/C++. Set breakpoints, step, step in, check variable values. “Beware of bugs in the above code; I have only proved it correct, not tried it.” -- D. Knuth
- 57. Debugging: full control over variables and execution path. F10: step, F11: step in (visit functions as well). “Either this man is dead or my watch has stopped.”
- 58. Advanced Features – 3D modeling/Volume Rendering: very easy volume manipulation and rendering
- 59. Advanced Features – Making Animations (Example): create an animation by changing the camera viewpoint

         azimuth = [50:100 99:-1:50];       % azimuth range of values
         for k = 1:length(azimuth),
           plot3(1:length(a), a(:,1), a(:,2), 'r', 'Linewidth', 2);
           grid on;
           view(azimuth(k),30);             % change viewpoint
           M(k) = getframe;                 % save the frame
         end
         movie(M,20);                       % play movie 20 times

     See also: movie2avi
- 60. Advanced Features – GUIs
     - Built-in development environment: buttons, figures, menus, sliders, etc.
     - Several examples in Help: directory listing, address book reader, GUI with multiple axes
- 61. Advanced Features – Using Java
     - Matlab is shipped with a Java Virtual Machine (JVM)
     - Access the Java API (e.g., I/O or networking)
     - Import Java classes and construct objects
     - Pass data between Java objects and Matlab variables
- 62. Advanced Features – Using Java (Example)
     - Stock quote query: connect to the Yahoo server
     - http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=4069&objectType=file
     - Excerpt (starts inside an enclosing block):

           disp('Contacting YAHOO server using ...');
           disp(['url = java.net.URL(' urlString ')']);
         end;
         url = java.net.URL(urlString);
         try
           stream = openStream(url);
           ireader = java.io.InputStreamReader(stream);
           breader = java.io.BufferedReader(ireader);
           connect_query_data = 1;    % connect made
         catch
           connect_query_data = -1;   % could not connect case
           disp(['URL: ' urlString]);
           error(['Could not connect to server. It may be unavailable. Try again later.']);
           stockdata = {};
           return;
         end
- 63. Matlab Toolboxes
     - You can buy many specialized toolboxes from Mathworks: Image Processing, Statistics, Bio-Informatics, etc.
     - There are many equivalent free toolboxes too:
       - SVM toolbox: http://theoval.sys.uea.ac.uk/~gcc/svm/toolbox/
       - Wavelets: http://www.math.rutgers.edu/~ojanen/wavekit/
       - Speech Processing: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
       - Bayesian Networks: http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
- 64. In case I get stuck…
     - help [command] (on the command line), e.g. help fft
     - Menu: help -> matlab help (excellent introduction on various topics)
     - Matlab webinars: http://www.mathworks.com/company/events/archived_webinars.html?fp
     - Google groups: comp.soft-sys.matlab
       - You can find *anything* here
       - Someone else had the same problem before you!
     - “I’ve had a wonderful evening. But this wasn’t it…”
- 65. PART B: Mathematical notions. “Eighty percent of success is showing up.”
- 66. Overview of Part B
     1. Introduction and geometric intuition
     2. Coordinates and transforms
        - Fourier transform (DFT)
        - Wavelet transform (DWT)
        - Incremental DWT
        - Principal components (PCA)
        - Incremental PCA
     3. Quantized representations
        - Piecewise quantized / symbolic
        - Vector quantization (VQ) / K-means
     4. Non-Euclidean distances
        - Dynamic time warping (DTW)
- 67. What is a time-series? Definition: a sequence of measurements over time. Fields: medicine, stock market, meteorology, geology, astronomy, chemistry, biometrics, robotics. (Figure: example series, including ECG, sunspot and earthquake data, shown next to a column of raw values over time.)
- 68. Applications: images, shapes, motion capture, …more to come. (Figure: an image turned into a color histogram and then into a time-series; leaf shapes of Acer platanoides and Salix fragilis.)
- 69. Time Series. (Figure: values x1 … x6 plotted against time.)
- 70. Time Series: x = (3, 8, 4, 1, 9, 6). A sequence of numeric values. Finite: N-dimensional vectors/points. Infinite: infinite-dimensional vectors. (Figure: the six values plotted against time.)
- 71. Mean: definition, followed by the convention that from now on we will generally assume zero mean (mean normalization).
- 72. Variance: definition, including the simpler form for zero-mean series. From now on, we will generally assume unit variance (variance normalization).
- 73. Mean and variance. (Figure: a series annotated with its mean µ and variance σ.)
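The formulas on the mean and variance slides were images and are not part of this transcript. Assuming the standard definitions with 1/N normalization (an assumption; the slides may have used 1/(N-1) instead), they would read as follows for a series x_1, …, x_N:

```latex
% Mean and mean normalization
\mu = \frac{1}{N}\sum_{t=1}^{N} x_t ,
\qquad x_t \leftarrow x_t - \mu

% Variance; the second form holds when the series has zero mean
\sigma^2 = \frac{1}{N}\sum_{t=1}^{N} (x_t - \mu)^2 ,
\qquad \sigma^2 = \frac{1}{N}\sum_{t=1}^{N} x_t^2 \quad (\mu = 0)

% Variance normalization
x_t \leftarrow \frac{x_t}{\sigma}
```

Applying both normalizations in sequence gives a series with zero mean and unit variance, which is the convention the following slides rely on.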
- 74. Why and when to normalize: intuitively, the notion of “shape” is generally independent of the average level (mean) and the magnitude (variance). Unless otherwise specified, we normalize to zero mean and unit variance.
- 75. Variance “=” Length: the variance of a zero-mean series and the length (L2-norm) of the corresponding N-dimensional vector measure the same thing, so variance is proportional to squared length. (Figure: vector x with components x1, x2 and length ||x||.)
- 76. Covariance and correlation: definition, including the simpler form for zero-mean, unit-variance series.
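A quick numerical check of the variance/length relation, assuming the 1/N form of the variance (the normalization is an assumption, since the slide formula is not in the transcript):

```matlab
x = [3 8 4 1 9 6];          % the example series from the earlier slide
N = length(x);
x = x - mean(x);            % zero mean

sigma2 = var(x, 1);         % second argument 1 selects 1/N normalization
len2   = norm(x)^2;         % squared L2-norm, same as sum(x.^2)

% For a zero-mean series, variance is squared length divided by N.
gap = abs(sigma2 - len2 / N);
```

With the default `var(x)`, which divides by N-1, the same relation holds with N-1 in place of N.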
- 77. Correlation and similarity: how “strong” is the linear relationship between xt and yt? The slide illustrates this for normalized series. (Figure: two scatter plots with fitted line, labelled “slope” and “residual”: CAD against FRF with ρ = -0.23, and BEF against FRF with ρ = 0.99.)
- 78. Correlation “=” Angle: by the cosine law, the correlation of two normalized series equals the cosine of the angle θ between the vectors x and y. (Figure: vectors x and y, angle θ, inner product x·y.)
- 79. Correlation and distance: for normalized series, correlation and squared Euclidean distance are linearly related. (Figure: x, y, θ, x·y and ||x − y||.)
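The two relations above can be checked numerically. The sketch below assumes zero-mean series scaled to unit variance with 1/N normalization, so that each vector has squared length N; under that assumption the correlation is ρ = x·y / N = cos θ and the squared distance is ||x − y||² = 2N(1 − ρ). The data are synthetic:

```matlab
N = 200;
x = cumsum(randn(N,1));
y = x + 5*randn(N,1);                     % a noisy copy of x

znorm = @(v) (v - mean(v)) / std(v, 1);   % zero mean, unit variance (1/N form)
x = znorm(x);
y = znorm(y);

rho   = (x' * y) / N;                     % correlation as a scaled inner product
theta = acos(rho);                        % angle between the two vectors, in radians
d2    = norm(x - y)^2;                    % squared Euclidean distance

gapDist = abs(d2 - 2*N*(1 - rho));        % ~0: distance and correlation are linearly related
```

This is why, for normalized series, ranking by smallest Euclidean distance and ranking by largest correlation give the same order.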
- 80. Ergodicity (example)
     - Assume I eat chicken at the same restaurant every day.
     - Question: How often is the food good?
       - Answer one:
       - Answer two:
     - Answers are equal ⇒ ergodic
     - “If the chicken is usually good, then my guests today can safely order other things.”
- 81. Ergodicity (example): ergodicity is a common and fundamental assumption, but it can sometimes be wrong. From “Total number of murders this year is 5% of the population” one might conclude “If I live 100 years, then I will commit about 5 murders, and if I live 60 years, I will commit about 3 murders” … non-ergodic! Such ergodicity assumptions on population ensembles are commonly called “racism.”
- 82. Stationarity (example)
     - Is the chicken quality consistent?
       - Last week:
       - Two weeks ago:
       - Last month:
       - Last year:
     - Answers are equal ⇒ stationary
- 83. Autocorrelation: definition. It is well-defined if and only if the series is (weakly) stationary, and then depends only on the lag ℓ, not on time t.
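The definition itself is not in the transcript. As a hedged sketch, one common sample estimate for a normalized series averages the product of the series with a copy of itself shifted by the lag; the choice of dividing by N (rather than N minus the lag) and the synthetic test signal are assumptions made here:

```matlab
t = (0:199)';
x = sin(2*pi*t/20) + 0.1*randn(200,1);   % period-20 sinusoid plus a little noise
x = (x - mean(x)) / std(x, 1);           % zero mean, unit variance
N = length(x);

maxLag = 40;
r = zeros(maxLag+1, 1);
for lag = 0:maxLag
    % average product of the series with a copy of itself shifted by lag
    r(lag+1) = sum(x(1:N-lag) .* x(1+lag:N)) / N;
end
```

For this signal r is 1 at lag 0, strongly negative at lag 10 (half a period) and strongly positive again at lag 20 (a full period), which is how autocorrelation reveals periodicity.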
- 84. Time-domain “coordinates”: the series x = (-0.5, 4, 1.5, -2, 2, 6, 3.5, 1) is the sum of eight single-spike series, one per time instant: -0.5 + 4 + 1.5 + (-2) + 2 + 6 + 3.5 + 1, each term placed at its own time position.
- 85. Time-domain “coordinates”: x = x1×e1 + x2×e2 + … + x8×e8 = -0.5×e1 + 4×e2 + 1.5×e3 + (-2)×e4 + 2×e5 + 6×e6 + 3.5×e7 + 1×e8
- 86. Orthonormal basis
     - Set of N vectors, { e1, e2, …, eN }
       - Normal: ||ei|| = 1, for all 1 ≤ i ≤ N
       - Orthogonal: ei·ej = 0, for i ≠ j
     - Describe a Cartesian coordinate system
       - Preserve length (aka “Parseval theorem”)
       - Preserve angles (inner product, correlations)
- 87. Orthonormal basis: note that the coefficients xi w.r.t. the basis { e1, …, eN } are the corresponding “similarities” of x to each basis vector/series. (Figure: x = -0.5×e1 + 4×e2 + …, with x2 shown as the projection of x on e2.)
- 88. Orthonormal bases
     - The time-domain basis is a trivial tautology: each coefficient is simply the value at one time instant
     - What other bases may be of interest? Coefficients may correspond to:
       - Frequency (Fourier)
       - Time/scale (wavelets)
       - Features extracted from series collection (PCA)
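The properties of an orthonormal basis can be verified directly. This sketch builds an arbitrary orthonormal basis from the QR factorization of a random matrix (any orthonormal basis would do; this particular construction is only an illustration) and checks that coefficients are inner products, that length is preserved, and that the series is recovered exactly:

```matlab
x = [-0.5 4 1.5 -2 2 6 3.5 1]';     % the 8-point series from the slides

[E, R] = qr(randn(8));              % columns of E form a random orthonormal basis (R is unused)

c    = E' * x;                      % coefficient i is the inner product of e_i with x
xRec = E * c;                       % summing c_i * e_i recovers the series

orthoGap    = norm(E' * E - eye(8));     % ~0: unit length and mutually orthogonal
parsevalGap = abs(norm(c) - norm(x));    % ~0: length is preserved
recGap      = norm(xRec - x);            % ~0: exact reconstruction
```

Replacing E by eye(8) gives the time-domain basis, where the coefficients are just the values of the series.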
- 89. Frequency domain “coordinates” (preview): the same series expressed in a frequency basis, with coefficients 5.6, -2.2, 0, 2.8, -4.9, -3, 0, 0.05.
- 90. Time series geometry: summary. Basic concepts:
     - Series / vector
     - Mean: “average level”
     - Variance: “magnitude/length”
     - Correlation: “similarity”, “distance”, “angle”
     - Basis: “Cartesian coordinate system”
- 91. Time series geometry: preview of applications. The quest for the right basis…
     - Compression / pattern extraction
     - Filtering
     - Similarity / distance
     - Indexing
     - Clustering
     - Forecasting
     - Periodicity estimation
     - Correlation
- 93. Frequency: one cycle every 20 time units (period).
- 94. Frequency and time: why is the period 20? It’s not 8, because its “similarity” (projection) to a period-8 series (of the same length) is zero.
- 95. Frequency and time: why is the period 20? It’s not 10, because its “similarity” (projection) to a period-10 series (of the same length) is zero.
- 96. Frequency and time: why is the period 20? It’s not 40, because its “similarity” (projection) to a period-40 series (of the same length) is zero. …and so on
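The argument on these three slides can be tried out numerically. In this sketch the length N = 40 is an assumption, chosen so that each candidate period fits a whole number of times into the series; the inner product with the period-20 series is then zero for every candidate period except 20:

```matlab
N = 40;                          % length chosen so every period below fits a whole number of times
t = 0:N-1;
x = sin(2*pi*t/20);              % the series: one cycle every 20 time units

periods = [8 10 20 40];
proj = zeros(size(periods));
for k = 1:length(periods)
    s = sin(2*pi*t/periods(k));  % candidate sinusoid of the same length
    proj(k) = x * s';            % inner product ("similarity")
end
disp([periods; proj]);
```

Only the period-20 candidate gives a nonzero projection (N/2 = 20 here), which is the sense in which the series "has" period 20.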
- 97. Fourier transform: intuition
     - To find the period, we compared the time series with sinusoids of many different periods
     - Therefore, a good “description” (or basis) would consist of all these sinusoids
     - This is precisely the idea behind the discrete Fourier transform: the coefficients capture the similarity (in terms of amplitude and phase) of the series with sinusoids of different periods
- 98. Fourier transform: intuition. Technical details:
     - We have to ensure we get an orthonormal basis
     - Real form: sines and cosines at N/2 different frequencies
     - Complex form: exponentials at N different frequencies
- 99. Fourier transform, real form: for odd-length series, the basis consists of a pair of bases (a sine and a cosine) at each frequency fk, plus the zero-frequency (mean) component.
- 100. Fourier transform, real form (amplitude and phase): observe that, for any fk, the pair of terms at that frequency can be written as a single sinusoid with amplitude rk and phase θk.
- 101. Fourier transform, real form (amplitude and phase): it is often easier to think in terms of amplitude rk and phase θk. (Figure: example sinusoid.)
- 102. Fourier transform, complex form: the equations become easier to handle if we allow the series and the Fourier coefficients Xk to take complex values. Matlab note: fft omits the scaling factor and is not unitary; however, ifft includes a scaling factor, so ifft(fft(x)) == x always holds (up to floating-point round-off).
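A short check of the Matlab note above. In Matlab, fft applies no scaling and ifft divides by N, so dividing the fft output by sqrt(N) gives a unitary (length-preserving) transform. The variable names are illustrative:

```matlab
x = [-0.5 4 1.5 -2 2 6 3.5 1];
N = length(x);

X  = fft(x);                 % unscaled forward transform
Xu = X / sqrt(N);            % unitary version: length is preserved

roundTrip = max(abs(ifft(X) - x));     % ~0, since ifft applies the 1/N factor
energyGap = abs(norm(Xu) - norm(x));   % ~0 (Parseval)
meanGap   = abs(X(1) / N - mean(x));   % X(1) is the plain sum of the series
```

Because of round-off, comparisons such as `ifft(fft(x)) == x` should be made with a tolerance, as above, rather than with exact equality.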
- 103. Fourier transform (example). (Figure: the GBP series reconstructed from 1, 2, 3, 5, 10 and 20 frequencies.)
- 104. Other frequency-based transforms: Discrete Cosine Transform (DCT), Matlab: dct / idct; Modified Discrete Cosine Transform (MDCT)
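The idea behind this example can be sketched as follows. The exchange-rate data are not available here, so a synthetic random walk stands in for them, and the sketch keeps the largest-magnitude coefficients; whether the slide kept the largest or the lowest frequencies is not stated, so that choice is an assumption:

```matlab
N = 256;
x = cumsum(randn(N,1));            % synthetic stand-in for the exchange-rate series
x = (x - mean(x)) / std(x);

X = fft(x);
[mag, order] = sort(abs(X), 'descend');   % coefficients ordered by magnitude

ks  = [2 10 40 N];                 % how many coefficients to keep
err = zeros(size(ks));
for j = 1:length(ks)
    Xk = zeros(N,1);
    Xk(order(1:ks(j))) = X(order(1:ks(j)));   % keep only the largest coefficients
    xk = real(ifft(Xk));                      % approximate series
    err(j) = norm(x - xk);                    % reconstruction error
end
```

For a real series the coefficients come in conjugate pairs of equal magnitude, so keeping two coefficients corresponds roughly to one frequency. The error shrinks as more coefficients are kept and vanishes when all N are used.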
- 106. Tutorial | Time-Series with MatlabFrequency and timee.g., . period = 20 ≠ 0 . ≠ 0 period = 10 What is the cycle now? etc… No single cycle, because the series isn’t exactly similar with any series of the same length.
- 107. Tutorial | Time-Series with MatlabFrequency and time Fourier is successful for summarization of series with a few, stable periodic components However, content is “smeared” across frequencies when there are – Frequency shifts or jumps, e.g., – Discontinuities (jumps) in time, e.g.,
- 108. Tutorial | Time-Series with Matlab | Frequency and time: If there are discontinuities in time/frequency or frequency shifts, then we should seek an alternative “description” or basis. Main idea: localize bases in time, via the short-time Fourier transform (STFT) or the discrete wavelet transform (DWT).
- 109. Tutorial | Time-Series with Matlab | Frequency and time | Intuition: What if we examined, e.g., eight values at a time?
- 110. Tutorial | Time-Series with Matlab | Frequency and time | Intuition: What if we examined, e.g., eight values at a time? We can only compare with periods up to eight. Results may be different for each group (window).
- 111. Tutorial | Time-Series with Matlab | Frequency and time | Intuition: Can “adapt” to localized phenomena. Fixed window: short-time Fourier (STFT); how to choose the window size? Variable windows: wavelets.
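The "eight values at a time" idea can be sketched with non-overlapping windows and a plain fft per window. This is only a simplified stand-in for an STFT (no overlap, no tapering window), and the test series is an assumption.

```matlab
% Sketch: fixed-window Fourier analysis, eight values at a time.
N = 64;  w = 8;
t = (0:N-1)';
x = [sin(2*pi*t(1:N/2)/8); sin(2*pi*t(N/2+1:end)/4)];   % period 8, then period 4

blocks = reshape(x, w, []);          % one column per non-overlapping window
S = abs(fft(blocks));                % fft operates on each column separately
[~, peak] = max(S(2:w/2+1, :));      % strongest nonzero frequency in each window
peakPeriod = w ./ peak;              % period (in samples) of that frequency
```

Each window reports its own dominant period, so the change halfway through the series shows up as a change in peakPeriod rather than as smearing across the whole spectrum.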
- 112. Tutorial | Time-Series with Matlab | Wavelets | Intuition: Main idea: use small windows for small periods; remove the high-frequency component, then use larger windows (twice as large) for larger periods; repeat recursively. Technical details: need to ensure we get an orthonormal basis.
- 113. Tutorial | Time-Series with Matlab | Wavelets | Intuition: [Figure: two panels; axes: time vs. frequency, and time vs. scale (frequency)]
- 114. Tutorial | Time-Series with Matlab | Wavelets | Intuition: Tiling time and frequency. [Figure: time-frequency tilings for Fourier/DCT, STFT, and wavelets; axes: time vs. frequency (scale)]
- 115. Tutorial | Time-Series with Matlab | Wavelet transform | Pyramid algorithm: [Diagram: the series is split by a high-pass and a low-pass filter]
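- 118. Tutorial | Time-Series with Matlab | Wavelet transform | Pyramid algorithm: [Diagram: x ≡ w0 is split by a high-pass filter into w1 and by a low-pass filter into v1; v1 is split into w2 (high pass) and v2 (low pass); v2 is split into w3 (high pass) and v3 (low pass)]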
- 119. Tutorial | Time-Series with Matlab | Wavelet transforms | General form: A high-pass / low-pass filter pair. Example: pairwise difference / average (Haar). In general: a Quadrature Mirror Filter (QMF) pair, with orthogonal spans that cover the entire space. Additional requirements ensure orthonormality of the overall transform. Apply the pair recursively to split the signal into the top / bottom half of the frequency band.
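For the Haar pair, the recursion can be sketched in a few lines of base Matlab. The example series, the sign convention of the difference, and the variable names are assumptions; the 1/sqrt(2) scaling is what keeps the overall transform orthonormal.

```matlab
% Sketch: three-level Haar pyramid using pairwise averages and differences.
x = [4 6 10 12 8 6 5 5]';                % illustrative series of length 2^3
v = x;                                   % start from the series itself
w = cell(3, 1);
for level = 1:3
    odd  = v(1:2:end);
    even = v(2:2:end);
    w{level} = (odd - even) / sqrt(2);   % high pass: scaled pairwise difference
    v        = (odd + even) / sqrt(2);   % low pass: scaled pairwise average
end
coeffs = [v; w{3}; w{2}; w{1}];          % v3 followed by w3, w2, w1
```

Because every step is an orthonormal rotation of pairs, coeffs has the same length and the same L2 norm as x.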
- 120. Tutorial | Time-Series with Matlab | Wavelet transforms | Other filters, examples: Haar (Daubechies-1), Daubechies-2, Daubechies-3, Daubechies-4. Moving down this list gives better frequency isolation but worse time localization. Each has a wavelet filter, or mother filter (high-pass), and a scaling filter, or father filter (low-pass).
- 121. Tutorial | Time-Series with Matlab | Wavelets | Example: [Figure: wavelet coefficients of the GBP series, Haar and Daubechies-3; panels: GBP, W1 to W6, V6]
- 122. Tutorial | Time-Series with Matlab | Wavelets | Example: [Figure: multi-resolution analysis of the GBP series, Haar and Daubechies-3; panels: GBP, D1 to D6, A6]
- 123. Tutorial | Time-Series with Matlab | Wavelets | Example: [Figure: multi-resolution analysis of the GBP series, Haar and Daubechies-3; panels: GBP, D1 to D6, A6] Analysis levels are orthogonal, Di·Dj = 0, for i ≠ j. Haar analysis: simple, piecewise constant. Daubechies-3 analysis: less artifacting.
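For Haar, the multi-resolution pieces can be sketched directly from block averages, which makes the orthogonality easy to verify. The example series is an assumption, and only two levels are shown.

```matlab
% Sketch: two-level Haar multi-resolution analysis from block averages.
x = [4 6 10 12 8 6 5 5]';
avg = (x(1:2:end) + x(2:2:end)) / 2;                        % pairwise means
A1  = kron(avg, [1; 1]);                                    % smooth at level 1
D1  = x - A1;                                               % detail at level 1
A2  = kron((avg(1:2:end) + avg(2:2:end)) / 2, ones(4, 1));  % smooth at level 2
D2  = A1 - A2;                                              % detail at level 2

reconErr = norm(x - (A2 + D2 + D1));     % the pieces add back up to the series
dotD1D2  = D1' * D2;                     % details at different levels are orthogonal
```

The smooths A1 and A2 are piecewise constant, which is the blocky appearance of the Haar panels noted on the slide.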
- 124. Tutorial | Time-Series with Matlab | Wavelets | Matlab: Wavelet GUI: wavemenu. Single level: dwt / idwt. Multiple level: wavedec / waverec (maximum level: wmaxlev). Wavelet bases: wavefun.
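A short round trip through the functions listed above might look as follows. This sketch assumes the Wavelet Toolbox is installed; the synthetic signal and the choice of six levels are illustrative assumptions.

```matlab
% Sketch: single- and multi-level DWT with Wavelet Toolbox functions.
% Assumes the Wavelet Toolbox is available; the signal is synthetic.
t = 0:1023;
x = sin(2*pi*t/128) + (t > 512);          % smooth cycle plus a jump

maxLevel = wmaxlev(numel(x), 'db3');      % deepest sensible level for this length
level = min(6, maxLevel);
[c, l] = wavedec(x, level, 'db3');        % c: coefficients, l: bookkeeping lengths
xr = waverec(c, l, 'db3');                % multi-level inverse transform

[cA, cD] = dwt(x, 'haar');                % single level: approximation and detail
x1 = idwt(cA, cD, 'haar');                % single-level inverse transform

reconErr = max(abs(xr(:) - x(:)));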
- 125. Tutorial | Time-Series with Matlab | Other wavelets: Only scratching the surface. Wavelet packets: all possible tilings (binary); best-basis transform. Overcomplete wavelet transform (ODWT), a.k.a. maximum-overlap wavelets (MODWT), a.k.a. shift-invariant wavelets. Further reading: 1. Donald B. Percival, Andrew T. Walden, Wavelet Methods for Time Series Analysis, Cambridge Univ. Press, 2006. 2. Gilbert Strang, Truong Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, 1996. 3. Tao Li, Qi Li, Shenghuo Zhu, Mitsunori Ogihara, A Survey of Wavelet Applications in Data Mining, SIGKDD Explorations, 4(2), 2002.
- 126. Tutorial | Time-Series with Matlab | More on wavelets: Signal representation and compressibility. [Figure: partial energy for the GBP and Light series; quality (% energy) vs. compression (% coefficients); curves: Time, FFT, Haar, DB3]
- 127. Tutorial | Time-Series with Matlab | More wavelets: Keeping the largest-magnitude coefficients minimizes the total error (L2 distance). Other coefficient selection/thresholding schemes exist for different error metrics (e.g., maximum per-instant error, or L1 distance); these typically use Haar bases. Further reading: 1. Minos Garofalakis, Amit Kumar, Wavelet Synopses for General Error Metrics, ACM TODS, 30(4), 2005. 2. Panagiotis Karras, Nikos Mamoulis, One-pass Wavelet Synopses for Maximum-Error Metrics, VLDB 2005.
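The L2 claim holds for any orthonormal transform, because the reconstruction error equals the norm of the dropped coefficients. The sketch below builds an orthonormal Haar matrix recursively in base Matlab; the test series, N, and k are assumptions for illustration.

```matlab
% Sketch: top-k coefficient selection under an orthonormal Haar transform.
N = 16;
H = 1;
while size(H, 1) < N                       % recursive Haar matrix construction
    n = size(H, 1);
    H = [kron(H, [1 1]); kron(eye(n), [1 -1])] / sqrt(2);
end

x = cumsum(sin((1:N)' / 2));               % illustrative series
c = H * x;                                 % Haar coefficients
k = 4;
[~, order] = sort(abs(c), 'descend');

ck = zeros(N, 1);
ck(order(1:k)) = c(order(1:k));            % keep the k largest magnitudes
errTopK = norm(x - H' * ck);
droppedNorm = norm(c(order(k+1:end)));     % equals errTopK by orthonormality

cf = zeros(N, 1);
cf(1:k) = c(1:k);                          % alternative: keep the first k instead
errFirstK = norm(x - H' * cf);             % never smaller than errTopK
```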
- 128. Tutorial | Time-Series with Matlab | Overview: 1. Introduction and geometric intuition. 2. Coordinates and transforms: Fourier transform (DFT), Wavelet transform (DWT), Incremental DWT, Principal components (PCA), Incremental PCA. 3. Quantized representations: Piecewise quantized / symbolic, Vector quantization (VQ) / K-means. 4. Non-Euclidean distances: Dynamic time warping (DTW).
- 129. Tutorial | Time-Series with Matlab | Wavelets | Incremental estimation
- 134. Tutorial | Time-Series with Matlab | Wavelets | Incremental estimation: post-order traversal
- 135. Tutorial | Time-Series with Matlab | Wavelets | Incremental estimation: Forward transform: post-order traversal of the wavelet coefficient tree; O(1) amortized time per new value; O(log N) total buffer space; the constant factor is the filter length. Inverse transform: pre-order traversal of the wavelet coefficient tree; same complexity.
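For the Haar filter (length 2), the forward incremental transform can be sketched with one pending value per level. This is an illustrative sketch, not the authors' implementation; longer filters would need a longer buffer per level, and the variable names are assumptions.

```matlab
% Sketch: streaming Haar transform with one pending value per level.
x = [4 6 10 12 8 6 5 5];
L = log2(numel(x));
pending = nan(1, L);                  % at most one buffered smooth value per level
out = [];                             % detail coefficients in emission order
for t = 1:numel(x)
    v = x(t);
    for level = 1:L
        if isnan(pending(level))
            pending(level) = v;       % wait for the sibling at this level
            break
        end
        a = pending(level);
        pending(level) = NaN;
        out(end+1) = (a - v) / sqrt(2);   % emit detail: children before parent
        v = (a + v) / sqrt(2);            % smooth value moves up one level
    end
end
top = v;                              % final smooth coefficient
```

Details are emitted as soon as both children are available, which is the post-order traversal; the pending array is the O(log N) buffer.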
- 136. Tutorial | Time-Series with Matlab | Overview: 1. Introduction and geometric intuition. 2. Coordinates and transforms: Fourier transform (DFT), Wavelet transform (DWT), Incremental DWT, Principal components (PCA), Incremental PCA. 3. Quantized representations: Piecewise quantized / symbolic, Vector quantization (VQ) / K-means. 4. Non-Euclidean distances: Dynamic time warping (DTW).
- 137. Tutorial | Time-Series with Matlab | Time series collections | Overview: Fourier and wavelets are the most prevalent and successful “descriptions” of time series. Next, we will consider collections of M time series, each of length N. What is the series that is “most similar” to all series in the collection? What is the second “most similar”, and so on?
- 138. Tutorial | Time-Series with Matlab | Time series collections: Some notation: values at time t, $x_t$; i-th series, $x^{(i)}$.
- 139. Tutorial | Time-Series with Matlab | Principal Component Analysis | Example: [Figure: exchange rates (vs. USD) over time for AUD, BEF, CAD, FRF, DEM, JPY, NLG, NZL, ESP, SEK, CHF, GBP, next to principal components 1-4 (µ ≠ 0)] Energy captured: u1 = 48%; + u2 (33%) = 81%; + u3 (11%) = 92%; + u4 (4%) = 96%. “Best” basis: { u1, u2, u3, u4 }. Example: $x^{(2)} = 49.1u_1 + 8.1u_2 + 7.8u_3 + 3.6u_4 + \varepsilon$. [Figure: coefficients of each time series w.r.t. the basis { u1, u2, u3, u4 }]
- 140. Tutorial | Time-Series with Matlab | Principal component analysis: First two principal components. [Figure: scatter plot of the twelve currencies by their coefficients $\upsilon_{i,1}$ and $\upsilon_{i,2}$]
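- 141. Tutorial | Time-Series with Matlab | Principal Component Analysis | Matrix notation: Singular Value Decomposition (SVD): $X = U \Sigma V^T$, i.e., $[\,x^{(1)}\; x^{(2)}\; \cdots\; x^{(M)}\,] = [\,u_1\; u_2\; \cdots\; u_k\,] \cdot [\,\upsilon_1\; \upsilon_2\; \upsilon_3\; \cdots\; \upsilon_M\,]$. The columns of $X$ are the time series, the columns of $U$ are a basis for the time series, and the columns $\upsilon_i$ of $\Sigma V^T$ are the coefficients w.r.t. the basis in $U$.
- 142. Tutorial | Time-Series with Matlab | Principal Component Analysis | Matrix notation: Singular Value Decomposition (SVD): $X = U \Sigma V^T$, i.e., $[\,x^{(1)}\; x^{(2)}\; \cdots\; x^{(M)}\,] = [\,u_1\; u_2\; \cdots\; u_k\,] \cdot [\,\upsilon_1\; \upsilon_2\; \upsilon_3\; \cdots\; \upsilon_M\,]$. The columns of $U$ are a basis for the time series, and the columns $\upsilon_i$ of $\Sigma V^T$ are the coefficients w.r.t. the basis in $U$. The rows $v'_1, v'_2, \ldots, v'_k$ of $\Sigma V^T$ are a basis for the measurements (the rows of $X$).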
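The decomposition can be sketched with base Matlab's svd. The collection below is synthetic (the exchange-rate data is not part of this transcript), no mean is subtracted (matching the µ ≠ 0 note on the example slide), and the variable names are assumptions.

```matlab
% Sketch: PCA of a collection of series via the SVD (columns are series).
rng(1);
N = 200;  M = 6;
trend = cumsum(randn(N, 1));                            % shared underlying pattern
X = trend * (1 + 0.2*randn(1, M)) + 0.1*randn(N, M);    % N-by-M collection

[U, S, V] = svd(X, 'econ');          % X = U*S*V'
s = diag(S);                         % singular values, largest first

k = 2;
Uk = U(:, 1:k);                      % basis for the time series
C  = S(1:k, 1:k) * V(:, 1:k)';       % k-by-M coefficients w.r.t. the basis in Uk
Xk = Uk * C;                         % rank-k approximation of the collection

energy = cumsum(s.^2) / sum(s.^2);   % fraction of energy captured by 1..M components
approxErr = norm(X - Xk, 'fro')^2;   % equals the energy of the dropped components
```

Column i of C plays the role of $\upsilon_i$ on the slide, truncated to the first k components.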
