Sensor Data Management

In this presentation we review some of the research problems we address at EPFL in the area of sensor data management. At the infrastructure level, we have developed a middleware to seamlessly integrate, aggregate and analyze heterogeneous sensor data streams in real time, a wiki-based repository supporting the cooperative management of the metadata associated with sensor deployments, and a cloud-based storage infrastructure. An important problem in managing sensor data is storing and transmitting it efficiently using compression techniques; to that end, we apply model-based compression methods. For analyzing sensor data, we have developed methods to dynamically estimate the variability of sensor readings, which can be readily used for outlier detection, and to extract semantic features from GPS sensor data streams. We also investigate techniques for trading off the accuracy of the sensor data obtained against the degree of privacy preservation that can be maintained.
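The abstract mentions dynamically estimating the variability of sensor readings for outlier detection but does not give the estimator. As a minimal sketch, assuming an exponentially weighted moving mean and variance (a common stand-in, not necessarily the method used at EPFL), a reading is flagged when it lies more than `k` standard deviations from the current mean. The names `ewma_outliers`, `alpha`, `k` and `warmup` are hypothetical.

```python
import math
from statistics import fmean, pvariance

def ewma_outliers(values, alpha=0.1, k=3.0, warmup=5):
    """Flag readings farther than k * sigma from an exponentially weighted
    mean. Mean and variance are initialised from the first `warmup`
    readings (never flagged) and then updated online."""
    head = values[:warmup]
    mean = fmean(head)
    var = pvariance(head)
    flags = [False] * len(head)
    for x in values[warmup:]:
        diff = x - mean
        is_outlier = abs(diff) > k * math.sqrt(var)
        flags.append(is_outlier)
        if not is_outlier:
            # Incremental EWMA update of mean and variance, using only
            # accepted readings so a spike does not inflate the estimate.
            incr = alpha * diff
            mean += incr
            var = (1 - alpha) * (var + diff * incr)
    return flags

# Hypothetical temperature series with one spike at index 7.
values = [20.0, 20.2, 19.8, 20.1, 19.9, 20.0, 20.1, 35.0, 19.9, 20.0]
print([i for i, f in enumerate(ewma_outliers(values)) if f])  # → [7]
```

A smaller `alpha` makes the variability estimate adapt more slowly but less nervously; `k` sets how many standard deviations count as anomalous.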
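The accuracy/privacy trade-off mentioned in the abstract can be illustrated with the Laplace mechanism from differential privacy. This is a swapped-in textbook technique, not the approach presented in the talk: each reading is perturbed with Laplace noise of scale `sensitivity / epsilon`, so a smaller `epsilon` gives stronger privacy and a larger expected error (the mean absolute error equals the noise scale). The function names, the sensitivity of 1.0 and the 21.5 °C reading are assumptions for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize(reading, sensitivity, epsilon, rng):
    """Laplace mechanism: add noise with scale sensitivity / epsilon.
    Smaller epsilon means stronger privacy and a noisier reading."""
    return reading + laplace_noise(sensitivity / epsilon, rng)

def mean_abs_error(epsilon, sensitivity=1.0, n=20000, seed=7):
    """Empirical accuracy cost of a privacy level (expected: sensitivity / epsilon)."""
    rng = random.Random(seed)
    true_value = 21.5  # hypothetical temperature reading in °C
    total = sum(abs(privatize(true_value, sensitivity, epsilon, rng) - true_value)
                for _ in range(n))
    return total / n

for eps in (0.5, 1.0, 2.0):
    print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(eps):.2f}")
```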

The Sensor Data Management presentation was given by Karl Aberer (École Polytechnique Fédérale de Lausanne) at the PlanetData project meeting held February 28 to March 4, 2011, in Innsbruck, Austria.

  1. Sensor Data Management @ EPFL, Karl Aberer
  2. Overview
     • Sensor Data Management
       – Global Sensor Networks
       – Swiss Experiment
       – Sensor Metadata Management
       – Time Series compression and retrieval
       – Sensor data analysis and quality
       – Economics-based resource allocation in distributed clouds
       – Cloud-based time series management system
     • Web Data Management
       – Large-scale Semantic Data Integration
       – Web Stream Data Analysis (Twitter)
  3. Global Sensor Networks (GSN)
     • GSN integrates different sensor networks:
       – Different abstractions, hard to share
       – Isolated networks, hard to republish
     • GSN server:
       – Goal: publishing streams generated by sensor networks
       – Storage, archive
       – Access to sensor network hardware
       – Easy setup, easy to change
     • Virtual sensor:
       – Processing, filtering, aggregation
       – Functional/non-functional properties
       – Described in an XML file
     [Figure: GSN reference implementation, with components Integrity Service, Access Control, GSN/Web/Web-Services, Notification Manager, Query Processor, Query Repository, Storage Manager, Virtual Sensor Manager, Life Cycle Manager, Input Stream Manager, Stream Quality Manager, Pool of Sensing Devices]
  4. Current GSN deployments [Figure: GSN deployments]
  5. Swiss Experiment infrastructure
  6. Sensor Metadata Management: “Effective Metadata Management in Federated Sensor Networks”
  7. Time Series Compression and Retrieval
     • A model M describes the dependency between two sets of variables X and Y
     • Models may capture data correlations, derive unknown values, and quantify and correct measurement errors
       – They are particularly useful for data compression, data completion and data cleaning
     • Our work focuses on:
       – Deriving lower bounds on the achievable compression ratio for a time series
       – Defining a suitable model-based storage and indexing scheme for fast retrieval
       – Defining innovative models for data cleaning and data quality estimation
     • Publications: ICDE’10, MDM’11, VLDB’11 (under preparation)
  8. Parameter Compression
  9. Data Compression: “Towards Multi-Model Approximation of Time-Series”, Thanasis Papaioannou, Mehdi Riahi, Karl Aberer [MDM 2011] (under review)
  10. Probabilistic Data Generation
  11. Sensor Context Extraction
     • SeMiTri: A Framework for Semantic Annotation of Heterogeneous Trajectories. Z. Yan, D. Chakraborty, C. Parent, S. Spaccapietra, K. Aberer [EDBT 2011]
     • Objective: a middleware for automatically annotating trajectories of different types of moving objects (cars, people)
     • Annotation layers: spatial join (regions), map matching (road network), hidden Markov model (points of interest)
     [Figure: GPS episodes e1 to e7 pass through the semantic annotation middleware (spatial join, map matching, HMM) to yield a semantic trajectory (home, office, market, home) with movement modes (bus, metro, walking)]
  12. Trusted Privacy-preserving Sensing
  13. Economic Cloud Resource Management
     • Objective: high availability and low response time in a cost-effective way in data clouds
       – Hardware (correlated) failures, highly irregular query rates, NP multi-constrained global optimization problem!
     • Solution: decentralized virtual economy (‘Skute’)
       – Partition data using consistent hashing
       – A virtual node is responsible for a key range
       – A virtual ring organizes virtual nodes per availability level and per application
       – Virtual nodes act as economic agents and independently migrate, replicate or delete themselves
       – Skute offers differentiated availability guarantees, as well as automated and balanced elasticity of cloud resources
     • Publications: ACDC’09, ICDE’09, SoCC’10, Cloud’10, CCGrid’11
     • Springer book on “Economic Cloud Resource Management”, under preparation
  14. TimeCloud: A Cloud System for Massive Time Series Management
     • Web-based time series management in the cloud
       – Storage cloud, various time-series visualizations, group-based data sharing, …
       – Potentially linked to third-party software, e.g. SensorMap, SwissEx Wiki
     • Storage-and-computing platform for massive time series processing
       – Built on Hadoop/HBase/GSN with the capability of handling data streams
       – Very efficient model-based parallel time-series data processing
     [Figure: third-party data streams feed time-series compression, efficient data processing based on model-based views, and distributed time-series processing]
  15. Overview (same as slide 2)
  16. “The Wisdom of the Network”
     • Problem:
       – Schema heterogeneity is an inherent problem for enterprise cooperation networks
       – Both manual and automated mapping are error-prone
       – Interoperability challenges evolve constantly
     • Emergent semantics:
       – Establishing semantic interoperability as a self-organizing process within a community or social network
       – Mappings are established in a localized, incremental manner
     • Create mappings in a pay-as-you-go fashion
     • Exploit the knowledge available in the network:
       – Available mappings in the network
       – Content features
       – Social structure of the network
       – User feedback
       – Economic incentives
     • Apply probabilistic reasoning techniques to improve mapping quality
  17. Web Data Stream Analysis
     • Classifying Twitter messages
       – Goal: given tweets containing a keyword (e.g. “apple”), decide whether each one is related to a given company
       – Won the WePS 2010 tweet classification task
  18. Thank you for your attention! For more information, please visit http://lsir.epfl.ch/
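Slide 3 describes virtual sensors that process, filter and aggregate input streams and are described in an XML file. The sketch below illustrates that idea under stated assumptions: the XML descriptor is a simplified, hypothetical format (GSN's real descriptors use a different schema, and GSN itself is a separate system), and `VirtualSensor`, `push` and the `window`/`filter`/`aggregate` elements are illustrative names.

```python
import xml.etree.ElementTree as ET
from collections import deque
from statistics import fmean

# Hypothetical, simplified descriptor; not GSN's actual virtual-sensor schema.
DESCRIPTOR = """
<virtual-sensor name="avg_temperature">
  <window size="3"/>
  <filter min="-40" max="60"/>
  <aggregate function="mean"/>
</virtual-sensor>
"""

AGGREGATES = {"mean": fmean, "max": max, "min": min}

class VirtualSensor:
    def __init__(self, xml_text):
        root = ET.fromstring(xml_text.strip())
        self.name = root.get("name")
        self.window = deque(maxlen=int(root.find("window").get("size")))
        flt = root.find("filter")
        self.lo, self.hi = float(flt.get("min")), float(flt.get("max"))
        self.agg = AGGREGATES[root.find("aggregate").get("function")]

    def push(self, value):
        """Filter one raw reading, add it to the sliding window and return
        the aggregate once the window is full (otherwise None)."""
        if not (self.lo <= value <= self.hi):
            return None  # drop physically implausible readings
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return None
        return self.agg(self.window)

vs = VirtualSensor(DESCRIPTOR)
outputs = [vs.push(v) for v in [20.0, 21.0, 999.0, 22.0, 23.0]]
print(outputs)  # → [None, None, None, 21.0, 22.0]
```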
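Slide 7 notes that a model relating variables X and Y can derive unknown values and correct measurement errors. A minimal sketch, assuming a linear dependency between a reference sensor X and a correlated sensor Y, fits the model with the Theil-Sen estimator (chosen here because it tolerates the very errors we want to detect; it is not taken from the cited papers), fills gaps from the model, and replaces readings whose residual exceeds a tolerance. All names and data are hypothetical.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Robust line fit y ≈ slope * x + intercept (Theil-Sen estimator):
    median of all pairwise slopes, then median of the implied intercepts."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

def complete_and_clean(xs, ys, tol):
    """xs: readings of a reference sensor X (all present).
    ys: readings of a correlated sensor Y, with None marking gaps.
    Gaps are filled from the model (completion); readings whose residual
    exceeds tol are replaced by the model estimate (cleaning)."""
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = theil_sen([x for x, _ in known], [y for _, y in known])
    cleaned = []
    for x, y in zip(xs, ys):
        estimate = slope * x + intercept
        cleaned.append(estimate if y is None or abs(y - estimate) > tol else y)
    return cleaned

# Hypothetical data: Y follows 2 * X + 1, with one gap and one corrupted reading.
xs = [10, 12, 14, 16, 18, 20]
ys = [21, 25, None, 33, 60, 41]
print(complete_and_clean(xs, ys, tol=2.0))  # → [21, 25, 29.0, 33, 37.0, 41]
```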
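Slide 9 refers to multi-model approximation of time series. The paper's algorithm is not reproduced here; as an illustrative sketch under a maximum-error bound `eps`, the code greedily chooses, at each position, whichever of two models (a constant, or a line anchored at the segment's first point) covers more points, and keeps only the model parameters. Function names and the sample series are hypothetical.

```python
def constant_run(values, start, eps):
    """Longest run from `start` that one constant (the midrange) fits within eps."""
    lo = hi = values[start]
    end = start + 1
    while end < len(values):
        new_lo, new_hi = min(lo, values[end]), max(hi, values[end])
        if new_hi - new_lo > 2 * eps:
            break
        lo, hi, end = new_lo, new_hi, end + 1
    return end - start, ("const", (lo + hi) / 2)

def linear_run(values, start, eps):
    """Longest run fitted within eps by a line through the first point: keep
    the interval of feasible slopes and stop as soon as it becomes empty."""
    v0 = values[start]
    lo, hi = float("-inf"), float("inf")
    end = start + 1
    while end < len(values):
        dt = end - start
        new_lo = max(lo, (values[end] - eps - v0) / dt)
        new_hi = min(hi, (values[end] + eps - v0) / dt)
        if new_lo > new_hi:
            break
        lo, hi, end = new_lo, new_hi, end + 1
    slope = 0.0 if end == start + 1 else (lo + hi) / 2
    return end - start, ("linear", v0, slope)

def compress(values, eps):
    """Greedy multi-model segmentation: try both models at each position and
    keep the one covering more points (ties go to the cheaper constant)."""
    segments, i = [], 0
    while i < len(values):
        n_c, const_model = constant_run(values, i, eps)
        n_l, linear_model = linear_run(values, i, eps)
        n, model = (n_c, const_model) if n_c >= n_l else (n_l, linear_model)
        segments.append((i, n, model))
        i += n
    return segments

def decompress(segments):
    out = []
    for _, n, model in segments:
        if model[0] == "const":
            out.extend([model[1]] * n)
        else:
            _, v0, slope = model
            out.extend(v0 + slope * k for k in range(n))
    return out

values = [5.0, 5.1, 4.9, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 10.0, 10.1]
segs = compress(values, eps=0.2)
print([(start, n, model[0]) for start, n, model in segs])
```

Each segment keeps a type tag and one or two parameters instead of its raw points, and the reconstruction stays within `eps` of every original value.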
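Slide 11 shows GPS fixes grouped into episodes before semantic annotation. SeMiTri's actual layers (spatial join, map matching, HMM-based point-of-interest inference) are beyond this sketch; the code only illustrates one simple way to cut a trajectory into stop and move episodes, assuming planar coordinates in metres and a hypothetical speed threshold of 0.5 m/s.

```python
import math

def episodes(points, speed_threshold=0.5):
    """points: time-ordered GPS fixes as (t_seconds, x_m, y_m) in a local planar
    frame. Each leg between consecutive fixes is labelled 'stop' or 'move' by
    its speed (m/s); consecutive legs with the same label are merged into one
    episode (label, first_fix_index, last_fix_index)."""
    result = []
    for i in range(1, len(points)):
        (t0, x0, y0), (t1, x1, y1) = points[i - 1], points[i]
        speed = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        label = "stop" if speed < speed_threshold else "move"
        if result and result[-1][0] == label:
            result[-1] = (label, result[-1][1], i)  # extend the current episode
        else:
            result.append((label, i - 1, i))        # start a new episode
    return result

# Hypothetical trace, one fix per minute: dwell, travel at 10 m/s, dwell again.
points = [(0, 0, 0), (60, 0, 0), (120, 5, 0), (180, 605, 0),
          (240, 1205, 0), (300, 1805, 0), (360, 1810, 0), (420, 1810, 0)]
print(episodes(points))  # → [('stop', 0, 2), ('move', 2, 5), ('stop', 5, 7)]
```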
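Slide 13 says Skute partitions data with consistent hashing and makes each virtual node responsible for a key range. The sketch below shows plain consistent hashing with virtual nodes; Skute's economic behaviour (virtual nodes migrating, replicating or deleting themselves) is not modelled. MD5 as the ring hash, eight virtual nodes per server, and the class and method names are assumptions.

```python
import bisect
import hashlib

def ring_position(key):
    # 64-bit ring position from MD5; any stable hash function would do.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class Ring:
    """Consistent-hashing ring. Each server contributes several virtual nodes;
    a virtual node owns the keys hashing between its predecessor and itself."""

    def __init__(self, vnodes_per_server=8):
        self.vnodes_per_server = vnodes_per_server
        self.positions = []  # sorted virtual-node positions
        self.owner = {}      # position -> server

    def add(self, server):
        for i in range(self.vnodes_per_server):
            pos = ring_position(f"{server}#{i}")
            bisect.insort(self.positions, pos)
            self.owner[pos] = server

    def remove(self, server):
        self.positions = [p for p in self.positions if self.owner[p] != server]
        self.owner = {p: s for p, s in self.owner.items() if s != server}

    def lookup(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        i = bisect.bisect_right(self.positions, ring_position(key)) % len(self.positions)
        return self.owner[self.positions[i]]

ring = Ring()
for server in ("a", "b", "c"):
    ring.add(server)
keys = [f"sensor-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("b")
after = {k: ring.lookup(k) for k in keys}
```

Removing a server only moves the keys it owned; every other key keeps its owner, which is what makes the scheme attractive for elastic clouds.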
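Slide 14 mentions data processing based on model-based views. As a hypothetical sketch, assuming a series is stored as piecewise-linear segments `(start_time, value_at_start, slope)`, point and range queries can be answered by locating the covering segment with binary search and evaluating its model, without reading raw samples. `ModelView` and its methods are illustrative names, not TimeCloud's API.

```python
import bisect

class ModelView:
    """Hypothetical model-based view over piecewise-linear segments
    (start_time, value_at_start, slope)."""

    def __init__(self, segments):
        self.segments = sorted(segments)
        self.starts = [start for start, _, _ in self.segments]

    def value_at(self, t):
        # The covering segment is the last one starting at or before t.
        i = bisect.bisect_right(self.starts, t) - 1
        if i < 0:
            raise ValueError(f"t={t} precedes the first segment")
        start, v0, slope = self.segments[i]
        return v0 + slope * (t - start)

    def range_query(self, t_from, t_to, step=1):
        """Reconstruct values on [t_from, t_to) at the given step."""
        return [self.value_at(t) for t in range(t_from, t_to, step)]

view = ModelView([(0, 10.0, 0.5), (10, 20.0, -1.0)])
print(view.value_at(4), view.value_at(12))  # → 12.0 18.0
```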
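Slide 16 proposes exploiting the mappings already available in the network to improve mapping quality. One simple, illustrative signal (an assumption here, not the full probabilistic reasoning from the slide) is cycle consistency: composing the mappings along a cycle of peers should bring each attribute back to itself. The peers, schemas and function names are hypothetical.

```python
def compose(*mappings):
    """Chain attribute mappings; an attribute that some hop cannot map yields None."""
    def apply(attr):
        for mapping in mappings:
            if attr is None:
                return None
            attr = mapping.get(attr)
        return attr
    return apply

def cycle_feedback(cycle, attributes):
    """cycle: mappings A->B, B->C, ..., Z->A. An attribute that returns to
    itself after the round trip is positive evidence for the mappings on its
    path; one that does not signals that at least one of them is wrong."""
    round_trip = compose(*cycle)
    return {attr: round_trip(attr) == attr for attr in attributes}

# Hypothetical schemas of three peers; c_to_a maps "published" incorrectly.
a_to_b = {"creator": "author", "title": "name", "date": "year"}
b_to_c = {"author": "writer", "name": "title", "year": "published"}
c_to_a = {"writer": "creator", "title": "title", "published": "title"}
feedback = cycle_feedback([a_to_b, b_to_c, c_to_a], ["creator", "title", "date"])
print(feedback)  # → {'creator': True, 'title': True, 'date': False}
```

In a probabilistic setting, each `True` or `False` would be treated as evidence for updating the belief that each mapping on the cycle is correct, rather than as a final verdict.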
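Slide 17 describes deciding whether a tweet containing a keyword such as “apple” refers to a given company. The winning WePS 2010 system is not described on the slide, so the sketch below uses a swapped-in baseline: a multinomial Naive Bayes classifier with add-one smoothing, trained on a hypothetical six-tweet toy set. The class, function and label names are illustrative.

```python
import math
from collections import Counter

PUNCT = ".,!?:;\"'"

def tokenize(text):
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_score(c):
            denom = self.total[c] + len(self.vocab)
            # Words never seen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokenize(doc) if w in self.vocab)
        return max(self.classes, key=log_score)

# Hypothetical toy training set: tweets mentioning "apple", labelled by hand.
train = [
    ("apple unveils new iphone and ipad", "company"),
    ("apple stock rises after earnings report", "company"),
    ("new macbook from apple announced", "company"),
    ("baked an apple pie with cinnamon", "other"),
    ("an apple a day keeps the doctor away", "other"),
    ("fresh apple juice and fruit salad", "other"),
]
clf = NaiveBayes().fit([t for t, _ in train], [label for _, label in train])
print(clf.predict("Apple iPhone sales beat earnings!"))  # → company
```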
