9/15/2008 CTBTO Data Mining/Data Fusion Workshop

Transcript

  • 1. Spatiotemporal Stream Mining Applied to Seismic+ Data
    • Margaret H. Dunham, CSE Department, Southern Methodist University, Dallas, Texas 75275 USA [email_address]
    • 9/15/2008 CTBTO Data Mining/Data Fusion Workshop
  • 2. Outline
    • CTBTO Data
    • CTBTO Modeling Requirements
    • EMM
  • 3. CTBTO Data
    • As a Data Miner I must first understand your DATA
    • Diverse – Seismic, Hydroacoustic, Infrasound, Radionuclide
    • Spatial (source and sensor)
    • Temporal
    • STREAM Data
  • 4. From Sensors to Streams
    • Stream Data - Data captured and sent by a set of sensors
    • Real-time sequence of encoded signals which contain desired information.
    • Continuous, ordered (implicitly by arrival time or explicitly by timestamp or by geographic coordinates) sequence of items
    • Stream data is infinite - the data keeps coming.
    11/26/07 – IRADSN’07
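Because a stream never ends, any processing has to work in a single pass with bounded memory. The following is a minimal sketch of that idea, not taken from the slides: `sensor_stream` is a hypothetical stand-in for a real sensor feed, and the running mean is just one example of a statistic that can be maintained incrementally.

```python
import itertools
import random
from typing import Iterable, Iterator, Tuple


def sensor_stream(seed: int = 0) -> Iterator[Tuple[int, float]]:
    """Hypothetical unbounded stream of (timestamp, value) items."""
    rng = random.Random(seed)
    for t in itertools.count():
        yield t, rng.gauss(0.0, 1.0)


def running_mean(stream: Iterable[Tuple[int, float]]) -> Iterator[Tuple[int, float]]:
    """Yield (timestamp, mean of all values so far) using O(1) memory."""
    count, mean = 0, 0.0
    for t, value in stream:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield t, mean


# The stream is infinite, so only a finite prefix is ever materialized.
first_five = list(itertools.islice(running_mean(sensor_stream()), 5))
```

The consumer never stores the stream itself; it keeps only a count and the current mean.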
  • 5. CTBTO & Data Mining
    • Data Mining techniques must be defined based on your data and applications
    • Can’t use predefined fixed models and prediction/classification techniques.
    • Must not reinvent the many algorithms that already exist.
  • 6. CTBTO + DM Requirements
      • Model:
        • Handle different data types (seismic, hydroacoustic, etc.)
        • Spatial + Temporal (Spatiotemporal)
        • Hierarchical
        • Scalable
        • Online
        • Dynamic
      • Anomaly Detection:
        • Not just specific wave type or data values
        • Relationships between arrival of waves/data
        • Combined values of data from all sensors
  • 7. EMM (Extensible Markov Model)
    • Time Varying Discrete First Order Markov Model
    • Nodes are clusters of real world states.
    • Overlap of learning and validation phases
    • Learning:
      • Transition probabilities between nodes
      • Node labels (centroid or medoid of cluster)
      • Nodes are added and removed as data arrives
    • Applications: prediction, anomaly detection
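The learning loop described on this slide can be sketched roughly as follows. This is a simplified illustration, not the published EMM algorithm: the class name `SimpleEMM`, the use of Euclidean distance, the nearest-centroid clustering rule, and the threshold value 2.0 are all assumptions, and node removal is omitted. The input vectors are the ones shown on the "EMM Creation/Learning" slide.

```python
import math


class SimpleEMM:
    """Simplified EMM-style model: clusters as nodes, counted transitions."""

    def __init__(self, threshold: float):
        self.threshold = threshold  # assumed distance threshold for new nodes
        self.centroids = []         # node labels (cluster centroids)
        self.counts = []            # vectors absorbed by each node
        self.transitions = {}       # (from_node, to_node) -> count
        self.current = None         # node of the previous input

    def update(self, x):
        """Assign x to a node (creating one if needed) and record the transition."""
        node, best = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(x, c)
            if d < best:
                node, best = i, d
        if node is None or best > self.threshold:
            # No existing node is close enough: add a new node.
            self.centroids.append([float(v) for v in x])
            self.counts.append(1)
            node = len(self.centroids) - 1
        else:
            # Fold x into the nearest node's centroid.
            self.counts[node] += 1
            n = self.counts[node]
            self.centroids[node] = [
                c + (v - c) / n for c, v in zip(self.centroids[node], x)
            ]
        if self.current is not None:
            key = (self.current, node)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self.current = node
        return node

    def transition_probability(self, src: int, dst: int) -> float:
        """Estimated probability of moving from src to dst."""
        total = sum(c for (s, _), c in self.transitions.items() if s == src)
        return self.transitions.get((src, dst), 0) / total if total else 0.0


vectors = [
    (18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0),
]
emm = SimpleEMM(threshold=2.0)
states = [emm.update(v) for v in vectors]
```

Learning and use overlap here as they do in the slide: every call to `update` both classifies the new vector and adjusts the model, so there is no separate training pass.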
  • 8. Research Objectives
    • Apply proven spatiotemporal modeling technique to seismic data
    • Construct EMM to model sensor data
      • Local EMM at location or area
      • Hierarchical EMM to summarize lower level models
      • Represent all data in one vector of values
      • EMM learns normal behavior
    • Develop new similarity metrics to include all sensor data types (Fusion)
    • Apply anomaly detection algorithms
  • 9. EMM Creation/Learning
    • [Figure: EMM built step by step from the input vectors <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, <18,10,3,3,1,1,0>]
  • 10. Input Data Representation
    • Vector of sensor values (numeric) at precise time points or aggregated over time intervals.
    • Need not come from same sensor types.
    • Similarity/distance between vectors used to determine creation of new nodes in EMM.
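One way to build such vectors from raw readings is to average each sensor's values over fixed time intervals. The sketch below assumes readings arrive as `(timestamp, sensor_id, value)` tuples; the function name `aggregate`, the sensor names, and the choice to fill a missing sensor with 0.0 are illustrative assumptions, not details from the slides.

```python
from collections import defaultdict


def aggregate(readings, interval, sensors):
    """Average readings per sensor over fixed-width time windows.

    readings: iterable of (timestamp, sensor_id, value)
    interval: window width, in the same units as the timestamps
    sensors:  ordered sensor ids; fixes the layout of each output vector
    Returns {window_index: [mean value per sensor]}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for t, sensor, value in readings:
        window = int(t // interval)
        sums[window][sensor] += value
        counts[window][sensor] += 1
    vectors = {}
    for window in sorted(sums):
        vectors[window] = [
            # Assumption: a sensor with no readings in the window contributes 0.0.
            sums[window][s] / counts[window][s] if counts[window][s] else 0.0
            for s in sensors
        ]
    return vectors


readings = [
    (0.5, "seismic", 2.0), (1.5, "seismic", 4.0),
    (1.0, "infrasound", 10.0), (12.0, "seismic", 6.0),
]
vectors = aggregate(readings, interval=10, sensors=["seismic", "infrasound"])
```

Each resulting vector mixes values from different sensor types in a fixed order, which is what lets a single distance function compare them.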
  • 11. Anomaly Detection with EMM
    • Objective : Detect rare (unusual, surprising) events
    • Advantages:
      • Dynamically learns what is normal
      • Based on this learning, can predict what is not normal
      • No need to specify normal behavior a priori
    • Applications:
      • Network Intrusion
      • Data: IP traffic data, Automobile traffic data
    • Seismic:
      • Unusual Seismic Events
      • Automatically Filter out normal events
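A simple way to turn learned transition statistics into an anomaly detector is to flag transitions that the model has rarely or never seen. The sketch below is a simplified illustration rather than the detection rule from the EMM papers: the function name and the `min_probability` and `min_support` parameters are assumptions. Probabilities are estimated online from the transitions seen so far, so no definition of normal behavior is supplied in advance.

```python
from collections import Counter


def flag_rare_transitions(states, min_probability=0.1, min_support=5):
    """Return indices i where the transition states[i-1] -> states[i] is rare.

    A transition is judged only after its source state has been left at
    least min_support times, so early observations are not flagged just
    because the model has not learned anything yet.
    """
    pair_counts = Counter()    # (src, dst) -> times seen
    source_counts = Counter()  # src -> transitions leaving src
    flagged = []
    for i in range(1, len(states)):
        src, dst = states[i - 1], states[i]
        seen = source_counts[src]
        if seen >= min_support:
            probability = pair_counts[(src, dst)] / seen
            if probability < min_probability:
                flagged.append(i)
        # Learn from the transition after judging it.
        pair_counts[(src, dst)] += 1
        source_counts[src] += 1
    return flagged


# Alternating 0/1 is the learned normal pattern; the final jump to 2 is new.
sequence = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 2]
anomalies = flag_rare_transitions(sequence)
```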
    • [Figure: Minnesota DOT traffic data, 11/3/04, weekdays vs. weekend; an unusual weekend traffic pattern was detected]
  • 12. EMM with Seismic Data
    • Input – Wave arrivals (all or one per sensor)
    • Identify states and changes of states in seismic data
    • The waveform would first have to be converted into a series of vectors representing the activity at various points in time.
    • Initial Testing with RDG data
    • Use amplitude, period, and wave type
  • 13. New Distance Measure
    • Data = <amplitude, period, wave type>
    • Different wave type = 100% difference
    • For events of same wave type:
      • 50% weight given to the difference in amplitude.
      • 50% weight given to the difference in period.
    • If the distance is greater than the threshold, a state change is required.
    • Δamplitude = | amplitude_new - amplitude_average | / amplitude_average
    • Δperiod = | period_new - period_average | / period_average
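The distance rule on this slide translates directly into a few lines of code. In this sketch the function names, the tuple layout, and the example threshold of 0.5 are assumptions; the slides do not give a threshold value, and they do not say whether the distance between events of the same wave type is capped at 100%, so no cap is applied here.

```python
def wave_distance(event, node):
    """Distance between a new event and a node, where 1.0 means 100%.

    event: (amplitude, period, wave_type) of the new arrival
    node:  (average_amplitude, average_period, wave_type) of the node
    """
    amplitude, period, wave_type = event
    avg_amplitude, avg_period, node_wave_type = node
    if wave_type != node_wave_type:
        return 1.0  # different wave type = 100% difference
    delta_amplitude = abs(amplitude - avg_amplitude) / avg_amplitude
    delta_period = abs(period - avg_period) / avg_period
    return 0.5 * delta_amplitude + 0.5 * delta_period


def needs_state_change(event, node, threshold=0.5):
    """True when the event is too far from the node (threshold is assumed)."""
    return wave_distance(event, node) > threshold


same_type = wave_distance((12.0, 1.5, "P"), (10.0, 1.0, "P"))
other_type = wave_distance((12.0, 1.5, "S"), (10.0, 1.0, "P"))
```

Because both deltas are relative to the node's averages, the measure is scale-free: a 20% change in amplitude counts the same for a weak node as for a strong one.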
  • 14. EMM with Seismic Data
    • [Figure] States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.
  • 15. Preliminary Testing
    • RDG data February 1, 1981 – 6 earthquakes
    • Find transition times close to known earthquakes
    • 9 total nodes
    • 652 total transitions
    • Found all quakes
  • 16. EMM Nodes
      Node #   Average amplitude   Average period   Phase code
      1        1.649 μm            0.119 sec        P (primary wave)
      2        8.353 μm            0.803 sec        P (primary wave)
      3        23.237 μm           0.898 sec        P (primary wave)
      4        87.324 μm           0.997 sec        P (primary wave)
      5        253.333 μm          1.282 sec        P (primary wave)
      6        270.524 μm          0.96 sec         P (primary wave)
      7        7.719 μm            20.4 sec         P (primary wave)
      8        723.088 μm          1.962 sec        P (primary wave)
      9        1938.772 μm         1.2 sec          P (primary wave)
  • 17. Hierarchical EMM
  • 18. Now What?
    • DATA NEEDED
    • NOISE MAY NOT BE BAD
    • KDD CUP
    • Interest DM COMMUNITY
  • 19. References
    • Zhigang Li and Margaret H. Dunham, “STIFF: A Forecasting Framework for Spatio-Temporal Data,” Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp 1-9.
    • Zhigang Li, Liangang Liu, and Margaret H. Dunham, “Considering Correlation Between Variables to Improve Spatiotemporal Forecasting,” Proceedings of the PAKDD Conference, May 2003, pp 519-531.
    • Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
    • Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
    • Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
    • Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic Anomalies,” International Journal of Computer Science and Network Security, Vol 6, No 6, June 2006, pp 258-265.
    • Margaret H. Dunham and Vijay Kumar, “Stream Hierarchy Data Mining for Sensor Data,” Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, November 26, 2007, Shreveport, Louisiana.
