Exploring The Hubness-Related Properties of Oceanographic Sensor Data

The talk was delivered at the Conference on Data Mining and Data Warehouses (SiKDD 2011) on the 10th of October 2011 in Ljubljana, Slovenia.

Publication: http://bit.ly/zrpZ35

Abstract:
In this paper we examine how the high dimensionality of oceanographic sensor data impacts the potential use of nearest-neighbor machine learning methods. We focus on one particular consequence of the curse of dimensionality: hubness. We examine the hubness of oceanographic data and show how it can be used to visualize and detect prototypical sensors/locations as well as ambiguous and potentially erroneous ones. We proceed to define an easy classification problem on the data, showing that the recently developed hubness-aware classification methods may help to overcome some of the hubness-related issues in sensor data.
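To make the central quantity concrete, here is a minimal Python sketch of how hubness is commonly measured: count how often each series occurs in the k-nearest-neighbor lists of the other series (its k-occurrence count), then take the skewness of those counts. This is an illustration under stated assumptions, not the authors' code: the function names and the toy data are hypothetical, and Manhattan distance is used only because the talk names it as one of the tested metrics.

```python
from typing import List, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance between two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_indices(series: List[Sequence[float]], k: int) -> List[List[int]]:
    """For every series, return the indices of its k nearest other series."""
    neighbors = []
    for i, s in enumerate(series):
        others = [j for j in range(len(series)) if j != i]
        # Stable sort: ties are broken by the lower index.
        others.sort(key=lambda j: manhattan(s, series[j]))
        neighbors.append(others[:k])
    return neighbors


def k_occurrences(neighbors: List[List[int]], n: int) -> List[int]:
    """Count how many k-NN lists each of the n points appears in."""
    counts = [0] * n
    for neigh in neighbors:
        for j in neigh:
            counts[j] += 1
    return counts


def skewness(values: Sequence[float]) -> float:
    """Third standardized moment (population form); 0.0 if there is no variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# Hypothetical toy data: one central point surrounded by four others.
toy = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]]
counts = k_occurrences(knn_indices(toy, k=1), len(toy))
print(counts, skewness(counts))
```

In this toy set the central point is the nearest neighbor of all four others, so it acts as a hub and the skewness of the counts is positive. Real sensor streams would be much longer vectors, and the skewness would be computed separately for each measurement type.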

  1. EXPLORING THE HUBNESS-RELATED PROPERTIES OF OCEANOGRAPHIC SENSOR DATA. Nenad Tomašev, Dunja Mladenić
  2. PRESENTATION OUTLINE: Hubness and why it matters. Oceanographic data: overview. Bad hubs in the measurements. Visualizing the problematic sensors.
  3. WHY IT MATTERS: Hubness is the skewness (asymmetry) in the distribution of k-occurrences: some points (hubs) become neighbors very, very often. This often happens in high-dimensional data. It is, however, a phenomenon of importance only for nearest-neighbor methods. So, why should we care, in general?
  4. WHY IT MATTERS: Sensor data = streams, time series. The state of the art for time series data: a 1-NN classifier coupled with an appropriate metric for comparing the time series. In other words: nearest-neighbor methods are not only occasionally used for time series classification, they are considered the state of the art! So, hubness matters.
  5. RELATED WORK: Radovanović, Nanopoulos, Ivanović: Time series classification in many intrinsic dimensions, SDM 2010. Due to the correlation between subsequent values, not all time series are inherently very high-dimensional. Some, however, are. These time series have been shown to exhibit hubness, and also bad hubness. It was shown that in such cases, bad-hubness-based weighting is helpful (the hw-kNN algorithm).
  6. ANALYSIS GOALS: Explore the k-nearest neighbor structure of the oceanographic sensor data. Explore the bad hubness in the data. Visualize the results.
  7. TEST CASE: OCEANOGRAPHIC DATA. Integrated Ocean Observing System data (http://www.ioos.gov/). Nodes spread across the Pacific, the Atlantic and the Great Lakes. Several sensors at each node, measuring various quantities: air temperature, barometric pressure, wind, water level observation, water level prediction, salinity, water temperature and conductivity.
  8. TEST CASE: OCEANOGRAPHIC DATA. 20 days' worth of measurements, 10.11.2010 to 30.11.2010. Sampled every 6 minutes (10 measurements an hour). 4801 measurements in total for each sensor. Missing values: replaced by the average of the closest known values.
  9. THE EXPERIMENTAL SETUP: Tested under two different metrics: Manhattan, and variance of between-series differences. Future work: perform the experiments with DTW (Dynamic Time Warping). Defined “Pacific”, “Atlantic” and “Lakes” as location-based labels = 3 categories.
  10. SKEWNESS, BAD HUBNESS
  11. CLASS-TO-CLASS HUBNESS MATRIX, K=3, WIND MEASUREMENTS (Atlantic = 1, Pacific = 2, Lakes = 3):
        0.772  0.186  0.042
        0.013  0.987  0.0
        0.027  0.014  0.959
  12. WOULD THE HUBNESS-AWARE METHODS HELP?
  13. WIND MEASUREMENTS: SENSOR HUBNESS MAP
  14. WIND MEASUREMENTS: SENSOR HUBNESS MAP
  15. WATER TEMPERATURE: SENSOR HUBNESS MAP
  16. WATER TEMPERATURE: SENSOR HUBNESS MAP
  17. BAROMETRIC PRESSURE: SENSOR HUBNESS MAP
  18. AIR TEMPERATURE: THE BERMUDA TRIANGLE
  19. CONCLUSIONS: Bad hubness may be useful for detecting potentially erroneous measurement devices. Some measurement-type streams apparently do exhibit hubness, so hubness is a phenomenon of interest when dealing with sensor data. Hubness-aware methods could potentially be helpful when working with sensor data.
  20. ACKNOWLEDGEMENTS: This work was supported by the ICT Programme of the EC under PlanetData (ICT-NoE-257641).
  21. THANK YOU FOR YOUR ATTENTION
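The bad hubness and class-to-class hubness matrix mentioned in the slides can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation. It assumes the usual definition of a bad k-occurrence (a point appearing in the k-NN list of a point with a different label). It also assumes a matrix orientation the slides do not state: row = class of the query point, column = class of its neighbor, with each row normalized to sum to 1, as the rows of the matrix on the slide do. The function names, neighbor lists and labels are hypothetical toy values.

```python
from typing import List


def bad_k_occurrences(neighbors: List[List[int]], labels: List[int]) -> List[int]:
    """Count, for each point, its occurrences in k-NN lists of differently labeled points."""
    bad = [0] * len(labels)
    for i, neigh in enumerate(neighbors):
        for j in neigh:
            if labels[j] != labels[i]:
                bad[j] += 1
    return bad


def class_hubness_matrix(
    neighbors: List[List[int]], labels: List[int], n_classes: int
) -> List[List[float]]:
    """Row-normalized counts of neighbor classes per query class.

    Assumed orientation: entry [a][b] is the fraction of k-NN slots of
    class-a points that are filled by class-b points.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for i, neigh in enumerate(neighbors):
        for j in neigh:
            counts[labels[i]][labels[j]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Leave an all-zero row as zeros rather than dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical k = 2 neighbor lists for five points in two classes.
toy_neighbors = [[1, 2], [0, 2], [1, 3], [4, 2], [3, 2]]
toy_labels = [0, 0, 0, 1, 1]

print(bad_k_occurrences(toy_neighbors, toy_labels))
print(class_hubness_matrix(toy_neighbors, toy_labels, n_classes=2))
```

A sensor with a high bad k-occurrence count is a candidate "bad hub" of the kind the conclusions suggest inspecting. Large off-diagonal entries in the matrix indicate classes whose points frequently have neighbors from another class.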
