9/15/2008 CTBTO Data Mining/Data Fusion Workshop


  1. Spatiotemporal Stream Mining Applied to Seismic+ Data
     Margaret H. Dunham, CSE Department, Southern Methodist University, Dallas, Texas 75275 USA, [email_address]
     9/15/2008 CTBTO Data Mining/Data Fusion Workshop
  2. Outline
     • CTBTO Data
     • CTBTO Modeling Requirements
     • EMM
  3. CTBTO Data
     • As a Data Miner I must first understand your DATA
     • Diverse: Seismic, Hydroacoustic, Infrasound, Radionuclide
     • Spatial (source and sensor)
     • Temporal
     • STREAM Data
  4. From Sensors to Streams
     • Stream data: data captured and sent by a set of sensors
     • Real-time sequence of encoded signals that contain the desired information
     • Continuous sequence of items, ordered implicitly by arrival time or explicitly by timestamp or geographic coordinates
     • Stream data is infinite: the data keeps coming
     11/26/07 – IRADSN’07
  5. CTBTO & Data Mining
     • Data mining techniques must be defined based on your data and applications
     • Can’t use predefined fixed models and prediction/classification techniques
     • Must not reinvent the large body of algorithms that already exist
  6. CTBTO + DM Requirements
     • Model:
         - Handle different data types (seismic, hydroacoustic, etc.)
         - Spatial + Temporal (Spatiotemporal)
         - Hierarchical
         - Scalable
         - Online
         - Dynamic
     • Anomaly Detection:
         - Not just specific wave type or data values
         - Relationships between arrival of waves/data
         - Combined values of data from all sensors
  7. EMM (Extensible Markov Model)
     • Time Varying Discrete First Order Markov Model
     • Nodes are clusters of real world states
     • Overlap of learning and validation phases
     • Learning:
         - Transition probabilities between nodes
         - Node labels (centroid or medoid of cluster)
         - Nodes are added and removed as data arrives
     • Applications: prediction, anomaly detection
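
The learning step described on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the threshold value, the running-mean centroid update, and all names (`EMM`, `learn`, `transition_prob`) are assumptions made for the example, and node removal is left out.

```python
import math
from collections import defaultdict


class EMM:
    """Toy extensible Markov model: nodes are clusters, edges count transitions."""

    def __init__(self, threshold):
        self.threshold = threshold           # assumed distance cutoff for creating a node
        self.centroids = []                  # node label: running mean of member vectors
        self.sizes = []                      # number of vectors absorbed by each node
        self.transitions = defaultdict(int)  # (from_node, to_node) -> count
        self.current = None                  # node of the most recent vector

    def _nearest(self, vec):
        """Return (index, distance) of the closest node, or (None, inf) if empty."""
        best, best_d = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(vec, centroid)     # Euclidean distance (an assumption)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def learn(self, vec):
        """Absorb one input vector and return the node it was assigned to."""
        node, d = self._nearest(vec)
        if node is None or d > self.threshold:
            # No node is close enough: extend the model with a new node.
            self.centroids.append(list(vec))
            self.sizes.append(1)
            node = len(self.centroids) - 1
        else:
            # Update the running-mean centroid of the matched node.
            n = self.sizes[node] + 1
            self.centroids[node] = [
                c + (x - c) / n for c, x in zip(self.centroids[node], vec)
            ]
            self.sizes[node] = n
        if self.current is not None:
            self.transitions[(self.current, node)] += 1
        self.current = node
        return node

    def transition_prob(self, src, dst):
        """Estimated probability of moving from node src to node dst."""
        out = sum(c for (s, _), c in self.transitions.items() if s == src)
        return self.transitions.get((src, dst), 0) / out if out else 0.0


# Hypothetical usage with a few made-up 2-D vectors and an arbitrary threshold.
model = EMM(threshold=2.0)
path = [model.learn(v) for v in [(1, 1), (1, 2), (9, 9), (1, 1)]]
```

Because learning and use overlap, the same `learn` call both updates the model and reports the current state, so the sequence of returned node indices can be inspected while the model is still being built.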
  8. Research Objectives
     • Apply a proven spatiotemporal modeling technique to seismic data
     • Construct EMM to model sensor data
         - Local EMM at a location or area
         - Hierarchical EMM to summarize lower-level models
         - Represent all data in one vector of values
         - EMM learns normal behavior
     • Develop new similarity metrics to include all sensor data types (Fusion)
     • Apply anomaly detection algorithms
  9. EMM Creation/Learning
     [Figure] Input vectors shown on the slide:
     <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, <18,10,3,3,1,1,0>
  10. Input Data Representation
      • Vector of sensor values (numeric) at precise time points or aggregated over time intervals
      • Need not come from same sensor types
      • Similarity/distance between vectors used to determine creation of new nodes in EMM
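
One way to build such vectors by aggregating over time intervals is sketched below. Everything here is an assumption for illustration: the `(timestamp, sensor_id, value)` reading format, the use of a per-interval mean, the `0.0` fill for sensors with no reading in an interval, and the function name `aggregate`.

```python
from collections import defaultdict


def aggregate(readings, interval, sensors):
    """Turn timestamped readings into one vector per time interval.

    readings: iterable of (timestamp, sensor_id, value) tuples (assumed format)
    interval: interval length, in the same units as the timestamps
    sensors:  ordered list of sensor ids; fixes the position of each vector entry
    Returns a list of (interval_start, vector) pairs in time order.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp, sensor, value in readings:
        start = (timestamp // interval) * interval
        buckets[start][sensor].append(value)

    vectors = []
    for start in sorted(buckets):
        row = buckets[start]
        vec = []
        for sensor in sensors:
            values = row.get(sensor, [])
            # Mean per interval; 0.0 when a sensor reported nothing (an assumption).
            vec.append(sum(values) / len(values) if values else 0.0)
        vectors.append((start, vec))
    return vectors


# Hypothetical readings from two different sensor types.
readings = [
    (0, "seismic", 2.0),
    (5, "seismic", 4.0),
    (3, "infrasound", 1.0),
    (12, "seismic", 6.0),
]
vectors = aggregate(readings, interval=10, sensors=["seismic", "infrasound"])
```

Fixing the sensor order up front keeps every vector the same length, which is what a distance function between vectors needs even when the entries come from different sensor types.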
  11. Anomaly Detection with EMM
      • Objective: detect rare (unusual, surprising) events
      • Advantages:
          - Dynamically learns what is normal
          - Based on this learning, can predict what is not normal
          - Normal behavior does not have to be specified a priori
      • Applications:
          - Network intrusion
          - Data: IP traffic data, automobile traffic data
      • Seismic:
          - Unusual seismic events
          - Automatically filter out normal events
      [Figure: Minnesota DOT Traffic Data, Weekdays and Weekend. Detected unusual weekend traffic pattern.]
      11/3/04
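
A simplified sketch of the idea of flagging rare events from learned transitions follows. It is not the detection algorithm used in the work above: it assumes the state sequence is already available, learns transition probabilities offline from a "normal" sequence rather than online, and uses a hypothetical probability cutoff `min_prob`; the function names are also made up.

```python
from collections import Counter


def learn_transitions(states):
    """Estimate first-order transition probabilities from a state sequence."""
    pair_counts = Counter(zip(states, states[1:]))
    out_counts = Counter(states[:-1])
    return {pair: n / out_counts[pair[0]] for pair, n in pair_counts.items()}


def rare_transitions(states, probs, min_prob):
    """Return positions i where the move states[i-1] -> states[i] is rare.

    A transition never seen during learning has probability 0.0 and is
    therefore always flagged when min_prob is positive.
    """
    return [
        i
        for i in range(1, len(states))
        if probs.get((states[i - 1], states[i]), 0.0) < min_prob
    ]


# Hypothetical state labels; "normal" behavior is learned from the first sequence.
normal = ["low", "low", "high", "low", "low", "high", "low"]
probs = learn_transitions(normal)
flagged = rare_transitions(["low", "high", "high", "low"], probs, min_prob=0.1)
```

Nothing in this sketch says what "normal" looks like in advance; the only input is observed data, which is the advantage the slide points to.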
  12. EMM with Seismic Data
      • Input: wave arrivals (all or one per sensor)
      • Identify states and changes of states in seismic data
      • The waveform must first be converted into a series of vectors representing the activity at various points in time
      • Initial testing with RDG data
      • Use amplitude, period, and wave type
  13. New Distance Measure
      • Data = <amplitude, period, wave type>
      • Different wave type = 100% difference
      • For events of same wave type:
          - 50% weight given to the difference in amplitude
          - 50% weight given to the difference in period
      • If the distance is greater than the threshold, a state change is required
      • Δamplitude = |amplitude_new − amplitude_average| / amplitude_average
      • Δperiod = |period_new − period_average| / period_average
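
The distance rule on this slide can be written out as a short Python sketch. The tuple layout `(amplitude, period, wave_type)`, the function names, and the example threshold are assumptions; the slide does not give the threshold value used.

```python
def relative_diff(new, average):
    """|new - average| / average, as in the amplitude and period formulas."""
    return abs(new - average) / average


def wave_distance(event, node):
    """Distance between a new event and a node's averages.

    event: (amplitude, period, wave_type) of the new arrival
    node:  (average amplitude, average period, wave_type) of an existing node
    """
    amplitude, period, wave_type = event
    avg_amplitude, avg_period, node_type = node
    if wave_type != node_type:
        return 1.0  # different wave type = 100% difference
    return 0.5 * relative_diff(amplitude, avg_amplitude) + 0.5 * relative_diff(
        period, avg_period
    )


def needs_state_change(event, node, threshold):
    """True when the event is too far from the node to stay in that state."""
    return wave_distance(event, node) > threshold


# Hypothetical node and event values with an arbitrary 25% threshold.
node = (10.0, 1.0, "P")
event = (12.0, 1.1, "P")
changed = needs_state_change(event, node, threshold=0.25)
```

Because each term is a relative difference, an event of the same wave type can still score above 1.0 when its amplitude or period is more than double the node average; the sketch leaves that uncapped since the slide does not mention a cap.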
  14. EMM with Seismic Data
      [Figure] States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.
  15. Preliminary Testing
      • RDG data, February 1, 1981: 6 earthquakes
      • Find transition times close to known earthquakes
      • 9 total nodes
      • 652 total transitions
      • Found all quakes
  16. EMM Nodes

      Node #   Average amplitude   Average period   Phase code
      1        1.649 μm            0.119 sec        P (primary wave)
      2        8.353 μm            0.803 sec        P (primary wave)
      3        23.237 μm           0.898 sec        P (primary wave)
      4        87.324 μm           0.997 sec        P (primary wave)
      5        253.333 μm          1.282 sec        P (primary wave)
      6        270.524 μm          0.96 sec         P (primary wave)
      7        7.719 μm            20.4 sec         P (primary wave)
      8        723.088 μm          1.962 sec        P (primary wave)
      9        1938.772 μm         1.2 sec          P (primary wave)
  17. Hierarchical EMM
  18. Now What?
      • DATA NEEDED
      • NOISE MAY NOT BE BAD
      • KDD CUP
      • Interest DM COMMUNITY
  19. References
      • Zhigang Li and Margaret H. Dunham, “STIFF: A Forecasting Framework for Spatio-Temporal Data,” Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp 1-9.
      • Zhigang Li, Liangang Liu, and Margaret H. Dunham, “Considering Correlation Between Variables to Improve Spatiotemporal Forecasting,” Proceedings of the PAKDD Conference, May 2003, pp 519-531.
      • Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
      • Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
      • Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
      • Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic Anomalies,” International Journal of Computer Science and Network Security, Vol 6, No 6, June 2006, pp 258-265.
      • Margaret H. Dunham and Vijay Kumar, “Stream Hierarchy Data Mining for Sensor Data,” Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, November 26, 2007, Shreveport, Louisiana.
