1. An adaptive modular approach to the mining of sensor network data
   G. Bontempi, Y. Le Borgne (1)
   {gbonte,yleborgn}@ulb.ac.be
   Machine Learning Group, Université Libre de Bruxelles, Belgium
   (1) Supported by the COMP²SYS project, sponsored by the HRM program of the European Community (MEST-CT-2004-505079)

2. Outline
   - Wireless sensor networks: overview
   - Machine learning in WSN
   - An adaptive two-layer architecture
   - Simulation and results
   - Conclusion and perspectives

3. Sensor networks: overview
   - Goal: allow for a sensing task over an environment
   - Desiderata for the nodes:
     - Autonomous power
     - Wireless communication
     - Computing capabilities

4. Smart dust project
   - Smart dust: get mote size down to 1 mm³
     - Berkeley "Deputy dust" (2001):
       - 6 mm³
       - Solar powered
       - Acceleration and light sensors
       - Optical communication
   - Low cost in large quantities

5. Currently available sensors
   - Crossbow: Mica / Mica dot
     - µProc: 4 MHz, 8-bit Atmel RISC
     - Radio: 40 kbit/s at 900/450/300 MHz, or 250 kbit/s at 2.4 GHz (MicaZ, 802.15.4)
     - Memory: 4 KB RAM / 128 KB program flash / 512 KB data flash
     - Power: 2 x AA or coin cell
   - Intel: iMote
     - µProc: 12 MHz, 16-bit ARM
     - Radio: Bluetooth
     - Memory: 64 KB SRAM / 512 KB data flash
     - Power: 2 x AA
   - MoteIV: Telos
     - µProc: 8 MHz, 16-bit TI RISC
     - Radio: 250 kbit/s at 2.4 GHz (802.15.4)
     - Memory: 2 KB RAM / 60 KB program flash / 512 KB data flash
     - Power: 2 x AA

6. Applications
   - Wildfire monitoring
   - Ecosystem monitoring
   - Earthquake monitoring
   - Precision agriculture
   - Object tracking
   - Intrusion detection
   - …

7. Challenges for…
   - Electronics
   - Networking
   - Systems
   - Databases
   - Statistics
   - Signal processing
   - …

8. Machine learning and WSN: local scale
   - Spatio-temporal correlations
     - Local predictive model identification
     - Can be used to:
       - Reduce sensor communication activity
       - Predict values for malfunctioning sensors

9. Machine learning and WSN: global scale
   - The network as a whole can achieve high-level tasks
   - Sensor network <-> image

10. Supervised learning and WSN
    - Classification (traffic type classification)
    - Prediction (pollution forecast)
    - Regression (wave intensity, population density)

11. A supervised learning scenario
    - S: a network of S sensors
    - x(t) = {s_1(t), s_2(t), …, s_S(t)}: snapshot of the network at time t
    - y(t) = f(x(t)) + ε(t): the value associated with S at time t (ε standing for noise)
    - Let D_N be a set of N observations (x(t), y(t))
    - Goal: find a model that predicts y for any new x

12. Centralized approach
    - High transmission overhead

13. Two-layer approach
    - Use of compression:
      - Reduces transmission overhead
      - Spatial correlation keeps the compression loss low
      - Reduces the dimensionality of the learning problem

14. Two-layer adaptive approach
    - PAST: online compression
    - Lazy learning: online learning

15. Compression: PCA
    - PCA transforms the set of n input variables x = (x_1, …, x_n) into a set of m variables z = (z_1, …, z_m), with m < n
    - Linear transformation: z = W^T x, with W an n x m matrix
    - W maximizes the variance preserved in the compressed representation
    - Solution: W is given by
      - the m first eigenvectors of the correlation matrix of x, or equivalently
      - minimization of the reconstruction error E[||x - W W^T x||²]
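
To make the projection step concrete, here is a minimal batch PCA sketch in NumPy following the correlation-matrix formulation of the slide; the function names (fit_pca, compress, reconstruct) are ours, and mean-centering is skipped since the slide works directly with the correlation matrix of x:

```python
import numpy as np

def fit_pca(X, m):
    """Return the m leading eigenvectors (n x m matrix W) of the
    correlation matrix of the snapshots in X (one snapshot per row)."""
    C = (X.T @ X) / X.shape[0]            # sample correlation matrix (n x n)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :m]        # keep the m leading eigenvectors

def compress(W, x):
    """Project a snapshot x (length n) onto the m principal directions."""
    return W.T @ x

def reconstruct(W, z):
    """Approximate the original snapshot from its compressed form z."""
    return W @ z
```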

16. PAST: recursive PCA
    - Projection approximation subspace tracking [YAN95]
    - Online formulation: at each new snapshot x(t), the subspace W(t) is updated to minimize the exponentially weighted criterion J(W) = Σ_{i=1..t} β^{t-i} ||x(i) - W y(i)||², with y(i) = W(i-1)^T x(i) and forgetting factor 0 < β ≤ 1
    - Low memory requirements and computational complexity: O(n·m) + O(m²)

17. PAST algorithm
    Recursive formulation [HYV01]: for each new snapshot x(t),
      y(t) = W(t-1)^T x(t)
      h(t) = P(t-1) y(t)
      g(t) = h(t) / (β + y(t)^T h(t))
      P(t) = (P(t-1) - g(t) h(t)^T) / β
      e(t) = x(t) - W(t-1) y(t)
      W(t) = W(t-1) + e(t) g(t)^T
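
A compact NumPy rendering of this recursion is sketched below, assuming P(0) = I and a forgetting factor β close to 1; the class and variable names are ours, not the authors' code:

```python
import numpy as np

class PAST:
    """Recursive PCA by projection approximation subspace tracking,
    following the standard recursion of [YAN95]; a sketch only."""
    def __init__(self, n, m, beta=0.99):
        self.W = np.eye(n, m)   # initial subspace estimate (n x m)
        self.P = np.eye(m)      # inverse correlation of the projections
        self.beta = beta        # forgetting factor, 0 < beta <= 1

    def update(self, x):
        """Process one snapshot x (length n); return its projection y."""
        y = self.W.T @ x
        h = self.P @ y
        g = h / (self.beta + y @ h)
        self.P = (self.P - np.outer(g, h)) / self.beta
        e = x - self.W @ y                  # reconstruction error
        self.W = self.W + np.outer(e, g)    # rank-one subspace update
        return y
```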

18. Learning algorithm
    - Lazy learning: a k-NN approach
      - Stores the observation set D_N = {(x(t), y(t)), t = 1, …, N}
      - When a query q is asked, takes the k nearest neighbours of q in D_N
      - Builds a local linear model ŷ = a^T x + b by least squares on these k neighbours
      - Computes the output at q by applying the local model: ŷ(q) = a^T q + b
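
The sketch below shows one plausible NumPy implementation of this lazy prediction step for a fixed k, with the intercept handled by an appended constant column; all names are ours:

```python
import numpy as np

def lazy_predict(X, y, q, k):
    """Local linear prediction at query q from stored observations
    (X: N x d inputs, y: N outputs). k should exceed d + 1 for a
    well-posed least-squares fit."""
    dist = np.linalg.norm(X - q, axis=1)
    idx = np.argsort(dist)[:k]                  # k nearest neighbours of q
    Xk = np.hstack([X[idx], np.ones((k, 1))])   # add intercept column
    coef, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)  # local least squares
    return np.append(q, 1.0) @ coef             # evaluate local model at q
```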

19. How many neighbours?
    - y = sin(x) + e
    - e: Gaussian noise with σ = 0.1
    - What is the y value at x = 1.5?

20. How many neighbours?
    - K=2: Overfitting

21. How many neighbours?
    - K=2: Overfitting
    - K=3: Overfitting

22. How many neighbours?
    - K=2: Overfitting
    - K=3: Overfitting
    - K=4: Overfitting

23. How many neighbours?
    - K=2: Overfitting
    - K=3: Overfitting
    - K=4: Overfitting
    - K=5: Good

24. How many neighbours?
    - K=2: Overfitting
    - K=3: Overfitting
    - K=4: Overfitting
    - K=5: Good
    - K=6: Underfitting

25. Automatic model selection ([BIR99], [BON99], [BON00])
    - Starting from a low k, local models are identified
    - Their quality is assessed by a leave-one-out procedure
    - The best model(s) are kept for computing the prediction
    - Low computational cost thanks to:
      - the PRESS statistic [ALL74]
      - recursive least squares [GOO84]
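
For illustration, here is a naive version of this selection loop that recomputes each leave-one-out fit explicitly; the actual scheme obtains these errors almost for free through the PRESS statistic [ALL74] and recursive least squares [GOO84], and all names below are ours:

```python
import numpy as np

def select_k(X, y, q, k_values):
    """Pick the number of neighbours minimizing the leave-one-out MSE
    of the local linear model at query q. Candidate k values should be
    at least 2 (naive restatement of the scheme in [BIR99])."""
    dist = np.linalg.norm(X - q, axis=1)
    order = np.argsort(dist)
    best_k, best_loo = None, np.inf
    for k in k_values:
        idx = order[:k]
        Xk = np.hstack([X[idx], np.ones((k, 1))])
        # Leave-one-out: refit k times, each time holding one point out.
        errs = []
        for j in range(k):
            keep = np.arange(k) != j
            coef, *_ = np.linalg.lstsq(Xk[keep], y[idx][keep], rcond=None)
            errs.append(y[idx][j] - Xk[j] @ coef)
        loo = np.mean(np.square(errs))
        if loo < best_loo:
            best_k, best_loo = k, loo
    return best_k
```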

26. Advantages of PAST and lazy learning
    - No assumption on the process underlying the data
    - Online learning capability
    - Adapts to non-stationarity
    - Low computational and memory costs

27. Simulation
    - Modeling a wave propagation phenomenon
    - Helmholtz equation: ∇²u + k²u = 0, where k is the wave number
    - 2372 sensors
    - 30 values of k between 1 and 146; 50 time instants
    - 1500 observations
    - The output k is noisy

28. Test procedure
    - Prediction error measured by the Normalized Mean Square Error (NMSE): NMSE = Σ_t (y(t) - ŷ(t))² / Σ_t (y(t) - ȳ)²
    - 10-fold cross-validation (1350 training / 150 test observations)
    [Figure: example of a learning curve]
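
For reference, the NMSE used throughout the experiments can be computed as follows (a value of 1 corresponds to always predicting the mean of y):

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean square error: MSE divided by the variance of y."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)
```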

29. Experiment 1
    - Centralized configuration
    - Comparison of PCA and PAST for the first 1 to 16 principal components

30. Results
    - Prediction accuracy is similar once the number m of principal components is large enough

    m            1      2      3      4      5      6      8      12     16
    NMSE (PCA)   0.621  0.266  0.181  0.144  0.138  0.134  0.133  0.124  0.116
    NMSE (PAST)  0.782  0.363  0.257  0.223  0.183  0.196  0.132  0.124  0.115

31. Clustering
    - The number of clusters involves a trade-off between:
      - the routing costs between the clusters and the gateway
      - the final prediction accuracy
      - the robustness of the architecture

32. Experiment 2
    - Partitioning into geographical clusters
    - The partition P varies from P(2) to P(7)
    - 2 principal components per cluster
    - 10-fold cross-validation on 1500 observations
    [Figure: example of a P(2) partitioning]

33. Results
    - Comparison of the P(2) (top) and P(5) (bottom) error curves
    - As the number of clusters increases:
      - better accuracy
      - faster convergence

    Partition   P(2)   P(3)   P(4)   P(5)   P(6)   P(7)
    NMSE        0.140  0.118  0.118  0.118  0.116  0.114

34. Experiment 3
    - Simulation: at each time instant,
      - probability 10% of a sensor failure
      - probability 1% of a supernode failure
    - Recursive PCA and lazy learning deal efficiently with variations of the input space dimension
      - Robust to random sensor malfunctions

35. Results
    - Comparison of the P(2) (top) and P(5) (bottom) error curves
    - Increasing the number of clusters improves robustness

    Partition   P(2)   P(3)   P(4)   P(5)   P(6)   P(7)
    NMSE        0.501  0.132  0.119  0.116  0.116  0.117

36. Experiment 4
    - Time-varying changes in the sensor measures:
      - 2700 time instants
      - the sensor response decreases linearly from a factor 1 to a factor 0.4
    - A temporal window: only the last 1500 measures are kept
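
A fixed-size buffer is one simple way to implement such a window on top of the lazy learner's stored observations, as in this sketch (names are ours):

```python
from collections import deque

class SlidingWindow:
    """Keep only the most recent `size` observations for the lazy learner,
    so that outdated pre-drift data gradually falls out of the model."""
    def __init__(self, size=1500):
        self.buffer = deque(maxlen=size)   # oldest items drop off automatically

    def add(self, x, y):
        self.buffer.append((x, y))

    def dataset(self):
        """Return the current window as (inputs, outputs) lists."""
        xs, ys = zip(*self.buffer)
        return list(xs), list(ys)
```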

37. Results
    - Due to the concept drift, the fixed model (in black) becomes outdated
    - The lazy character of the proposed architecture deals with this drift easily

38. Conclusion
    - The architecture:
      - yields results comparable to its batch equivalent
      - is computationally efficient
      - adapts to appearing and disappearing units
      - handles non-stationarity easily

39. Future work
    - Extension of the tests to real-world data
    - Improvement of the clustering strategy:
      - taking the costs (routing/accuracy) into consideration
      - making use of the ad hoc nature of the network
    - Test of other compression procedures:
      - robust PCA
      - ICA

40. References
    - Smart Dust project: http://www-bsac.eecs.berkeley.edu/archive/users/warneke-brett/SmartDust/
    - Crossbow: http://www.xbow.com/
    - [BON99] G. Bontempi. Local Techniques for Modeling, Prediction and Control. PhD thesis, IRIDIA, Université Libre de Bruxelles, 1999.
    - [YAN95] B. Yang. Projection approximation subspace tracking. IEEE Transactions on Signal Processing, 43(1):95-107, 1995.
    - [ALL74] D. M. Allen. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16:125-127, 1974.
    - [GOO84] G. C. Goodwin and K. S. Sin. Adaptive Filtering, Prediction and Control. Prentice-Hall, 1984.
    - [HYV01] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.

41. References on lazy learning
    - [BIR99] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets the recursive least squares algorithm. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, pages 375-381, Cambridge, MA, 1999. MIT Press.
    - [BON99] G. Bontempi, M. Birattari, and H. Bersini. Local learning for iterated time-series prediction. In I. Bratko and S. Dzeroski, editors, Machine Learning: Proceedings of the 16th International Conference, pages 32-38, San Francisco, CA, 1999. Morgan Kaufmann.
    - [BON00] G. Bontempi, M. Birattari, and H. Bersini. A model selection approach for local learning. Artificial Intelligence Communications, 121(1), 2000.

42. Thanks for your attention!
