In this presentation we review some of the research problems we address at EPFL in the area of sensor data management. At the infrastructure level, we have developed a middleware that seamlessly integrates, aggregates and analyzes heterogeneous sensor data streams in real time, a wiki-based repository supporting the cooperative management of the metadata associated with sensor deployments, and a cloud-based storage infrastructure. An important problem in managing sensor data is its efficient storage and transmission, which calls for compression; to that end we apply model-based compression methods. For analyzing sensor data, we have developed methods to dynamically estimate the variability of the data, which can be readily used for outlier detection, and to extract semantic features from GPS sensor data streams. We also investigate techniques for trading off the accuracy of the sensor data obtained against the degree of privacy preservation that can be maintained.
The Sensor Data Management presentation was given by Karl Aberer (Ecole Polytechnique Federale de Lausanne) at the PlanetData project meeting, held February 28 to March 4, 2011 in Innsbruck, Austria.
2. Overview
Sensor Data Management
– Global Sensor Networks
– Swiss Experiment
– Sensor Metadata Management
– Time Series compression and retrieval
– Sensor data analysis and quality
– Economics-based resource allocation in distributed clouds
– Cloud-based time series management system
Web Data Management
– Large-scale Semantic Data Integration
– Web Stream Data Analysis (Twitter)
3. Global Sensor Networks (GSN)
Integrates different sensor networks
– Different abstractions, hard to share
– Isolated networks, hard to republish
GSN server:
– Goal: Publishing streams generated by sensor networks
– Storage, archive
– Access to sensor network hardware
– Easy setup, easy to change
Virtual Sensor:
– Processing, filtering, aggregation
– Functional/non-functional properties
– Described in an XML file
[Figure: GSN reference implementation. Components shown: Integrity Service, Access Control, GSN/Web/Web-Services, Notification Manager, Query Processor, Query Repository, Storage Manager, Virtual Sensor Manager, Input Stream Manager, Stream Quality Manager, Life Cycle Manager, Pool of Sensing Devices]
7. Time Series Compression and Retrieval
A model M describes the dependency between two sets of variables X and Y
Models may capture data correlations, derive unknown values, and quantify and correct measurement errors
– They are particularly useful for data compression, data completion and data cleaning
Our work is on
– Deriving lower bounds on the achievable compression ratio for a time series
– Defining a suitable model-based storage and indexing scheme for fast retrieval
– Defining innovative models for data cleaning and data quality estimation
Publications: ICDE’10, MDM’11, VLDB’11 (under preparation)
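As a rough illustration of what model-based compression can mean, the sketch below replaces runs of samples by a constant model (the midpoint of the run) as long as every sample stays within a maximum error epsilon of that constant. This is a generic piecewise-constant approximation chosen for illustration only; it is not claimed to be the method of the publications listed above, and the names compress, decompress and epsilon are assumptions.

```python
from typing import List, Tuple

# A segment is (start index, number of samples, constant model value).
Segment = Tuple[int, int, float]


def compress(values: List[float], epsilon: float) -> List[Segment]:
    """Greedy piecewise-constant approximation with maximum error epsilon.

    A segment is extended while the spread (max - min) of its samples is at
    most 2 * epsilon, so the midpoint is within epsilon of every sample.
    """
    segments: List[Segment] = []
    if not values:
        return segments
    start = 0
    lo = hi = values[0]
    for i in range(1, len(values)):
        new_lo, new_hi = min(lo, values[i]), max(hi, values[i])
        if new_hi - new_lo > 2 * epsilon:
            # The new sample does not fit: close the current segment.
            segments.append((start, i - start, (lo + hi) / 2))
            start, lo, hi = i, values[i], values[i]
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(values) - start, (lo + hi) / 2))
    return segments


def decompress(segments: List[Segment]) -> List[float]:
    """Rebuild an approximate series from the stored segments."""
    out: List[float] = []
    for _, length, value in segments:
        out.extend([value] * length)
    return out
```

A smaller epsilon yields more segments and a lower compression ratio; a retrieval scheme could index the segments rather than the raw samples.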
11. Sensor Context Extraction
SeMiTri: A Framework for Semantic Annotation of Heterogeneous Trajectories
Z. Yan, D. Chakraborty, C. Parent, S. Spaccapietra, K. Aberer [EDBT 2011]
Objective: A middleware for automatically annotating trajectories of different types of moving objects (cars, people)
[Figure: Semantic Annotation Middleware. GPS episodes (e1 to e7) pass through three annotation layers: spatial join (region), map-matching (road network), and a Hidden Markov Model, HMM (point of interest). Output: a semantic trajectory, e.g. home, office, market, home, with transport modes bus, metro, walking]
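To give a feel for how a Hidden Markov Model can assign a point-of-interest label to each episode, here is a minimal Viterbi decoder. It is a generic textbook sketch, not SeMiTri's actual model: the states, observation symbols and all probabilities in the usage note are hypothetical placeholders.

```python
import math
from typing import Dict, List, Sequence


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden state sequence for the observations."""

    def log(p: float) -> float:
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    if not observations:
        return []
    # best[t][s]: log-probability of the best path ending in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back: List[Dict[str, str]] = []
    for obs in observations[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans_p[r][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

In a hypothetical setting, the hidden states could be place types such as "home" and "office", and the observations could be the land-use category of the area in which each stop episode falls.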
13. Economic Cloud Resource Management
Objective: high availability and low response time in a cost-effective way in data clouds
– Hardware (correlated) failures, highly irregular query rates, NP multi-constrained global optimization problem!
Solution: decentralized virtual economy (‘Skute’)
– Partition data using consistent hashing
– A virtual node is responsible for a key range
– Virtual ring organizes virtual nodes per availability level and per application
– Virtual nodes act as economic agents and independently migrate, replicate or delete themselves
– Skute offers differentiated availability guarantees, as well as automated and balanced elasticity of cloud resources
Publications: ACDC’09, ICDE’09, SoCC’10, Cloud’10, CCGrid’11
Springer book on “Economic Cloud Resource Management”, under preparation
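The partitioning step can be sketched with a small consistent-hashing ring in which each virtual node owns the key range ending at its ring position. This is a minimal sketch under stated assumptions: the class HashRing, its method names and the use of SHA-256 are illustrative choices, and the economic behavior of Skute's virtual nodes (migration, replication, deletion, availability levels) is not modeled.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(name: str) -> int:
    """Map a string to a 64-bit position on the ring (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical ring: each virtual node owns the keys up to its position."""

    def __init__(self) -> None:
        self._positions: List[int] = []   # sorted ring positions
        self._owners: Dict[int, str] = {}  # position -> virtual node name

    def add_virtual_node(self, name: str) -> None:
        pos = ring_position(name)
        if pos in self._owners:
            raise ValueError(f"position collision for {name!r}")
        bisect.insort(self._positions, pos)
        self._owners[pos] = name

    def remove_virtual_node(self, name: str) -> None:
        pos = ring_position(name)
        self._positions.remove(pos)
        del self._owners[pos]

    def lookup(self, key: str) -> str:
        """Return the first virtual node at or after the key, wrapping around."""
        if not self._positions:
            raise LookupError("empty ring")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owners[self._positions[idx % len(self._positions)]]
```

When a virtual node leaves the ring, only the keys in its own range move to its successor; all other keys keep their owner, which is what makes migration and deletion of individual virtual nodes cheap.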
14. TimeCloud
A Cloud System for Massive Time Series Management
– Web-based time series management in the cloud
• Storage cloud, various time-series visualizations, group-based data sharing, …
• Potentially linked to third-party software, e.g. SensorMap, SwissEx Wiki
– Storage-and-computing platform for massive time series processing
• Built on Hadoop/HBase/GSN, with the capability of handling data streams
• Very efficient model-based parallel time-series data processing
[Figure: TimeCloud architecture. Labels: third parties, data streams, time-series compression, efficient data processing based on model-based views, distributed time-series processing]
16. “The Wisdom of the Network”
Problem
• Schema heterogeneity is an inherent problem for enterprise cooperation networks
• Both manual and automated mapping are error-prone
• Interoperability challenges evolve constantly
Emergent semantics
• Establishing semantic interoperability as a self-organizing process within a community or social network
• Mappings are established in a localized, incremental manner
• Create mappings in a pay-as-you-go fashion
• Exploit the knowledge available in the network:
  • Available mappings in the network
  • Content features
  • Social structure of the network
  • User feedback
  • Economic incentives
• Apply probabilistic reasoning techniques to improve mapping quality
17. Web Data Stream Analysis
Classifying Twitter messages
We would like to classify tweets containing a given keyword (e.g. “apple”) according to whether or not they are related to a given company
Won the WePS 2010 tweet classification task
18. Thank you for your attention!
For more information please visit
http://lsir.epfl.ch/