9/15/2008 CTBTO Data Mining/Data Fusion Workshop

  • 1. Spatiotemporal Stream Mining Applied to Seismic+ Data
    • Margaret H. Dunham, CSE Department, Southern Methodist University, Dallas, Texas 75275 USA
    • 9/15/2008 CTBTO Data Mining/Data Fusion Workshop
  • 2. Outline
    • CTBTO Data
    • CTBTO Modeling Requirements
    • EMM
  • 3. CTBTO Data
    • As a Data Miner I must first understand your DATA
    • Diverse – Seismic, Hydroacoustic, Infrasound, Radionuclide
    • Spatial (source and sensor)
    • Temporal
    • STREAM Data
  • 4. From Sensors to Streams
    • Stream Data - Data captured and sent by a set of sensors
    • Real-time sequence of encoded signals which contain desired information.
    • Continuous, ordered (implicitly by arrival time or explicitly by timestamp or by geographic coordinates) sequence of items
    • Stream data is infinite - the data keeps coming.
    11/26/07 – IRADSN’07
  • 5. CTBTO & Data Mining
    • Data Mining techniques must be defined based on your data and applications
    • Can’t use predefined fixed models and prediction/classification techniques.
    • Must not reinvent the many algorithms that already exist.
  • 6. CTBTO + DM Requirements
      • Model:
        • Handle different data types (seismic, hydroacoustic, etc.)
        • Spatial + Temporal (Spatiotemporal)
        • Hierarchical
        • Scalable
        • Online
        • Dynamic
      • Anomaly Detection:
        • Not just specific wave type or data values
        • Relationships between arrival of waves/data
        • Combined values of data from all sensors
  • 7. EMM (Extensible Markov Model)
    • Time-varying, discrete, first-order Markov model
    • Nodes are clusters of real world states.
    • Overlap of learning and validation phases
    • Learning:
      • Transition probabilities between nodes
      • Node labels (centroid or medoid of cluster)
      • Nodes are added and removed as data arrives
    • Applications: prediction, anomaly detection
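A minimal sketch of the learning step listed above, under stated assumptions: Euclidean distance, a fixed threshold for creating a node, and running-mean centroids as node labels. The class name `SimpleEMM`, its parameters, and the threshold value are hypothetical and not taken from the talk; node removal is left out. The input vectors are the ones shown on slide 9.

```python
import math
from collections import defaultdict


class SimpleEMM:
    """Toy extensible Markov model: each node is the centroid of a cluster."""

    def __init__(self, threshold):
        self.threshold = threshold            # assumed: max distance to join a node
        self.centroids = []                   # node label = running-mean centroid
        self.counts = []                      # vectors absorbed by each node
        self.transitions = defaultdict(int)   # (from_node, to_node) -> count
        self.current = None                   # index of the current node

    def _nearest(self, vec):
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(vec, centroid)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def learn(self, vec):
        """Absorb one input vector and return the node it was assigned to."""
        node, dist = self._nearest(vec)
        if node is None or dist > self.threshold:
            # No node is close enough: extend the model with a new node.
            self.centroids.append([float(v) for v in vec])
            self.counts.append(1)
            node = len(self.centroids) - 1
        else:
            # Update the running mean of the matching node.
            self.counts[node] += 1
            n = self.counts[node]
            old = self.centroids[node]
            self.centroids[node] = [c + (v - c) / n for c, v in zip(old, vec)]
        if self.current is not None:
            self.transitions[(self.current, node)] += 1
        self.current = node
        return node

    def transition_probability(self, src, dst):
        total = sum(c for (s, _), c in self.transitions.items() if s == src)
        return self.transitions.get((src, dst), 0) / total if total else 0.0


# Vectors from slide 9; the threshold of 2.0 is an arbitrary illustrative choice.
vectors = [
    (18, 10, 3, 3, 1, 0, 0),
    (17, 10, 2, 3, 1, 0, 0),
    (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 0, 0, 0),
    (18, 10, 3, 3, 1, 1, 0),
]
emm = SimpleEMM(threshold=2.0)
path = [emm.learn(v) for v in vectors]
```

Because learning and use overlap, `transition_probability` can be queried at any point while vectors are still arriving.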
  • 8. Research Objectives
    • Apply proven spatiotemporal modeling technique to seismic data
    • Construct EMM to model sensor data
      • Local EMM at location or area
      • Hierarchical EMM to summarize lower level models
      • Represent all data in one vector of values
      • EMM learns normal behavior
    • Develop new similarity metrics to include all sensor data types (Fusion)
    • Apply anomaly detection algorithms
  • 9. EMM Creation/Learning
    • [Figure: EMM creation from a sequence of input vectors]
    • <18,10,3,3,1,0,0>
    • <17,10,2,3,1,0,0>
    • <16,9,2,3,1,0,0>
    • <14,8,2,3,1,0,0>
    • <14,8,2,3,0,0,0>
    • <18,10,3,3,1,1,0>
  • 10. Input Data Representation
    • Vector of sensor values (numeric) at precise time points or aggregated over time intervals.
    • Need not come from same sensor types.
    • Similarity/distance between vectors used to determine creation of new nodes in EMM.
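One way the "aggregated over time intervals" representation could be built is sketched below. The function name `aggregate`, the `(timestamp, sensor, value)` reading format, averaging as the aggregate, and filling a silent sensor with 0.0 are all assumptions made for illustration; the talk does not specify them.

```python
from collections import defaultdict


def aggregate(readings, interval, sensor_order):
    """Average timestamped readings into one vector per time interval.

    readings:     iterable of (timestamp_seconds, sensor_name, value)
    interval:     bucket width in seconds
    sensor_order: fixes the position of each sensor in the output vector
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for t, sensor, value in readings:
        buckets[int(t // interval)][sensor].append(value)

    vectors = []
    for b in sorted(buckets):
        row = []
        for sensor in sensor_order:
            values = buckets[b].get(sensor)
            # Assumption: a sensor with no reading in the interval contributes 0.0.
            row.append(sum(values) / len(values) if values else 0.0)
        vectors.append((b * interval, row))
    return vectors


# Hypothetical readings from two different sensor types.
readings = [
    (0.0, "seismic", 1.0),
    (5.0, "seismic", 3.0),
    (7.0, "infrasound", 0.2),
    (12.0, "seismic", 4.0),
]
vectors = aggregate(readings, 10, ["seismic", "infrasound"])
```

Each output row is a single vector that mixes sensor types, which is the form the EMM consumes.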
  • 11. Anomaly Detection with EMM
    • Objective : Detect rare (unusual, surprising) events
    • Advantages:
      • Dynamically learns what is normal
      • Based on this learning, can predict what is not normal
      • No need to specify normal behavior a priori
    • Applications:
      • Network Intrusion
      • Data: IP traffic data, Automobile traffic data
    • Seismic:
      • Unusual Seismic Events
      • Automatically Filter out normal events
    • [Figure: Minnesota DOT Traffic Data, 11/3/04; panels: Weekdays, Weekend. Detected unusual weekend traffic pattern.]
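A rough sketch of "learn what is normal, then flag what is not": count the transitions seen so far and treat a transition as rare when its learned probability falls below a cutoff. The function names, the `min_prob` cutoff, and the state labels are hypothetical; the talk does not give the exact detection rule.

```python
from collections import Counter


def transition_counts(states):
    """Count consecutive (from_state, to_state) pairs in an observed sequence."""
    return Counter(zip(states, states[1:]))


def is_rare(counts, src, dst, min_prob=0.05):
    """Flag a transition whose learned probability is below min_prob.

    A source state that was never observed is also treated as rare.
    """
    total = sum(c for (s, _), c in counts.items() if s == src)
    if total == 0:
        return True
    return counts[(src, dst)] / total < min_prob


# Hypothetical history in which the stream alternates between two states.
history = ["quiet", "busy"] * 20
counts = transition_counts(history)

usual = is_rare(counts, "quiet", "busy")      # seen on every step out of "quiet"
unusual = is_rare(counts, "quiet", "quiet")   # never seen in the history
unknown = is_rare(counts, "storm", "quiet")   # source state never seen
```

Nothing about normal behavior is supplied in advance; the counts alone define it, so the same check filters out normal events automatically.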
  • 12. EMM with Seismic Data
    • Input – Wave arrivals (all or one per sensor)
    • Identify states and changes of states in seismic data
    • The waveform must first be converted into a series of vectors representing the activity at successive points in time.
    • Initial Testing with RDG data
    • Use amplitude, period, and wave type
  • 13. New Distance Measure
    • Data = <amplitude, period, wave type>
    • Different wave type = 100% difference
    • For events of same wave type:
      • 50% weight given to the difference in amplitude.
      • 50% weight given to the difference in period.
    • If the distance is greater than the threshold, a state change is required.
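    • $\Delta_{\text{amplitude}} = |\text{amplitude}_{\text{new}} - \text{amplitude}_{\text{average}}| \,/\, \text{amplitude}_{\text{average}}$
    • $\Delta_{\text{period}} = |\text{period}_{\text{new}} - \text{period}_{\text{average}}| \,/\, \text{period}_{\text{average}}$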
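The distance measure on this slide can be sketched as follows. The function names and the tuple layout are assumptions; "100% difference" is interpreted here as a distance of 1.0. Note that the relative differences are not capped, so two events of the same wave type can score above 1.0; the slide does not say whether that is intended.

```python
def wave_distance(new, node):
    """Distance between a new event and a node's running averages.

    new, node: (amplitude, period, wave_type) tuples; `node` holds the
    average amplitude and average period of the events in that state.
    """
    amp_new, per_new, type_new = new
    amp_avg, per_avg, type_avg = node
    if type_new != type_avg:
        return 1.0  # different wave type = 100% difference
    d_amp = abs(amp_new - amp_avg) / amp_avg
    d_per = abs(per_new - per_avg) / per_avg
    return 0.5 * d_amp + 0.5 * d_per  # equal weights for amplitude and period


def needs_state_change(new, node, threshold):
    """A state change is required when the distance exceeds the threshold."""
    return wave_distance(new, node) > threshold


# Hypothetical values chosen only to make the arithmetic easy to follow.
same_type = wave_distance((12.0, 1.5, "P"), (10.0, 1.0, "P"))
other_type = wave_distance((12.0, 1.5, "S"), (10.0, 1.0, "P"))
```

For the first call the amplitude term is 2/10 = 0.2 and the period term is 0.5/1.0 = 0.5, so the distance is 0.5 × 0.2 + 0.5 × 0.5 = 0.35.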
  • 14. EMM with Seismic Data
    • [Figure] States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.
  • 15. Preliminary Testing
    • RDG data February 1, 1981 – 6 earthquakes
    • Find transition times close to known earthquakes
    • 9 total nodes
    • 652 total transitions
    • Found all quakes
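Checking whether transition times fall close to known earthquakes could look like the sketch below. The function name `match_events`, the tolerance, and every time value are hypothetical placeholders, not the RDG results reported on this slide.

```python
from bisect import bisect_left


def match_events(transition_times, quake_times, tolerance):
    """Map each known quake time to the nearest transition time.

    Returns {quake_time: transition_time or None}; None means no transition
    was found within `tolerance` (same time unit as the inputs).
    """
    times = sorted(transition_times)
    matches = {}
    for q in quake_times:
        i = bisect_left(times, q)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = times[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda t: abs(t - q), default=None)
        if best is not None and abs(best - q) <= tolerance:
            matches[q] = best
        else:
            matches[q] = None
    return matches


# Hypothetical times in seconds since the start of the record.
matches = match_events([100, 460, 905, 2000], [450, 900, 1500], tolerance=30)
found_all = all(t is not None for t in matches.values())
```

In this made-up example the quake at 1500 has no transition within 30 seconds, so `found_all` is false; "found all quakes" corresponds to every entry being matched.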
  • 16. EMM Nodes
    Node # | Average amplitude | Average period | Phase code
    1      | 1.649 μm          | 0.119 sec      | P (primary wave)
    2      | 8.353 μm          | 0.803 sec      | P (primary wave)
    3      | 23.237 μm         | 0.898 sec      | P (primary wave)
    4      | 87.324 μm         | 0.997 sec      | P (primary wave)
    5      | 253.333 μm        | 1.282 sec      | P (primary wave)
    6      | 270.524 μm        | 0.96 sec       | P (primary wave)
    7      | 7.719 μm          | 20.4 sec       | P (primary wave)
    8      | 723.088 μm        | 1.962 sec      | P (primary wave)
    9      | 1938.772 μm       | 1.2 sec        | P (primary wave)
  • 17. Hierarchical EMM
  • 18. Now What?
    • DATA NEEDED
    • NOISE MAY NOT BE BAD
    • KDD CUP
    • Interest DM COMMUNITY