SlideShare a Scribd company logo
1 of 22
Download to read offline
Towards Enabling Probabilistic Databases for 
Participatory Sensing 
Nguyen Quoc Viet Hung 1, Saket Sathe 2, Duong Chi Thang 1, Karl Aberer1 
1 École Polytechnique Fédérale de Lausanne 
2 IBM Melbourne Research Laboratory
Participatory sensing 
 A concept of communities in which participants proactively report 
sensory information 
 Sensors might be humans or their mobile devices. 
 Enable harnessing the wisdom of the crowd to collect a huge amount of data for 
various applications: geo‐tagging, environmental monitoring, and public health. 
CollaborateCom | 2014 2 
 CrowdSense scenario: http://opensense.epfl.ch 
 Goal: collect high‐resolution urban air quality 
 Approach: community‐based sensing in which sensors are attached on vehicles and 
personal devices. 
 Advantages: real‐time data gathering, high resolution, and improving data quality 
CrowdSense
CollaborateCom | 2014 3 
Probabilistic Database 
 Traditional database: 
 Sensor data: a series of values at timestamps 
 Probabilistic database: 
 Sensor data: a series of distributions 
is a random variable of data value at timestamp 
reflects a probability distribution, e.g. 
݌ଵ ܴଵ 
݌ଶ ܴଶ 
݌ଷ ܴଷ 
1 2 3 
Traditional database 
Probabilistic database
Probabilistic Database for Participatory Sensing 
 Setting: 
CollaborateCom | 2014 4 
Aggregation 
 Motivation: data collected from individual participants and their devices 
are inherently uncertain due to noise factors: 
 Low sensor quality 
 Unstable communication channels 
 Challenges: 
 Sensor data irregularly depend on time  dynamic computation 
• Temperature changes dramatically around sunrise and sunset, but changes only 
slightly during the night. 
 Each sensor has different quality  trust model 
Aggregated 
probabilistic data 
Multiple sensor data
CollaborateCom | 2014 5 
Approach overview 
 Computing likelihoods: compute probabilistic sensor data from raw 
sensor data 
 Infer probability distribution at each timestamp 
 Aggregate: combines all probabilistic sensor data into single probabilistic 
sensor data 
 Trust assessment: compute trust scores for each sensor and its data 
 Trust‐based aggregation: sensors with high trust scores have high impact on 
aggregated data
CollaborateCom | 2014 6 
Outline 
 Model 
 Computing likelihoods of sensor data 
 Aggregating multiple sensor data 
 Experimental results 
 Conclusion and future work
CollaborateCom | 2014 7 
Model 
 Sensor data: 
is a reading collected by sensor at time 
 Probabilistic sensor data: 
 : random variable 
: probability density function 
 Problem statement: given a set of time series of 
sensors, compute an aggregated probabilistic sensor data 
. 
 is the probabilistic distribution at timestamp combined from 
.
CollaborateCom | 2014 8 
Computing likelihoods of single sensor data 
 Input: a sensor data 
 Output: a probabilistic sensor data 
 Requirements: 
 R1: The currently value dynamically depends on past values 
 R2: Uncertainty range varies over time 
 Solution: model by a Gaussian probability distribution 
 Estimate the expected value : using Auto Regressive Moving Average [1] – ARMA 
(p,q) with autoregressive terms and moving‐average terms: 
 Compute the variance : using Generalized AutoRegressive Conditional 
Heteroskedasticity (GARCH) [1] model: 
[*] R. H. Shumway and D. S. Stoffer, Time series analysis and its applications. Springer‐Verlag, 2000 
satisfy (R1) 
satisfy (R2)
CollaborateCom | 2014 9 
Aggregating multiple sensor data 
 Input: a set of probabilistic sensor data 
 Output: an aggregated sensor data 
 Method: use trust‐based approach 
 Trust assessment 
 Probability aggregation
CollaborateCom | 2014 10 
Outline 
 Model 
 Computing likelihoods of sensor data 
 Aggregating multiple sensor data 
 Trustworthiness assessment 
 Probability aggregation 
 Experimental results 
 Conclusion and future work
CollaborateCom | 2014 11 
Trustworthiness assessment 
 Assess the trustworthiness based on two factors: 
 Similarity of distributions: distributions which are similar to each other should have 
higher trust scores 
• The first distribution of sensor 1 is similar to the first distribution of sensor 2 
and sensor 3  should have high trust score 
 Reliability of sensors: a reliable sensor tends to provide correct information 
• If sensor 1 is reliable, even the third distribution of sensor 1 is different from 
the others  this distribution should nevertheless have high trust score. 
ߙ௧௜ 
ൌ 
೔,ೕ 
Σ ఉೕ௦೟ 
ೕసభ..೙,ೕಯ೔ 
Σೕసభ..೙ ఉೕ 
ߙ௧௜ 
: trust score of ݐ‐th distribution of sensor ݅ 
ߚ௝: reliability of sensor ݆ 
ݏ௧ 
௜,௝ : similarity between ݐ‐th distributions of sensor ݅ and ݆
CollaborateCom | 2014 12 
Trustworthiness assessment (cont’d) 
 Compute the reliability for each sensor: 
 A sensor that provides more correct data tends to be more reliable 
 Compute the reliability of each sensor based on the trust scores of its data: 
ߙ ൌ 1 
ߙ ൌ 1 
ߙ ൌ 0.5 
ߚ ൎ 0.83
Trustworthiness assessment (cont’d) 
 There is a mutually reinforcing relationship between sensors and data 
they provide. 
CollaborateCom | 2014 13 
 Trust score of sensor data: 
 Reliability of sensor: 
 Propose an iterative algorithm to concurrently update the reliability of 
sensor and trust scores of sensor data until convergence.
CollaborateCom | 2014 14 
Trustworthiness assessment (cont’d) 
 Initialize trust scores as 0.5 (maximum entropy principle) 
 Iterate until termination condition is satisfied: 
 Compute reliability of each sensor based on current trust scores 
 Compute the trust scores based on current reliability.
Probability aggregation 
 Compute the final random variable from multiple sensor data weighted 
by their trust scores: 
 where is timestamp, is the number of sensors, is the random variable 
representing the probability distributions of sensor ݅ at timestamp ݐ. 
CollaborateCom | 2014 15 
 Applying moment‐generating function [2]: 
[2] C. M. Grinstead and J. L. Snell, Introduction to probability. American Mathematical Soc., 1998
CollaborateCom | 2014 16 
Experiment – Dataset and Setting 
 Datasets: 
 Real data: 
• Campus: temperature readings collected from a real sensor network deployed 
on university campus 
• Moving‐object: GPS readings collected from 192 moving objects. 
 Synthetic data: 
• Fix a true distribution for each timestamp 
• Generate probability distributions from the true distribution by randomly 
adding differences to the mean value. 
 Evaluation measures: computation time, effects of outliers, effects of 
heterogeneity level
CollaborateCom | 2014 17 
Computation time 
 Setting: 
 Vary the length of time series 
 Metrics: computation time 
 Observations: 
 Computation time is reasonable w.r.t. data size 
 GARCH model is suitable for online and real‐time applications.
CollaborateCom | 2014 18 
Effects of outlier 
 Setting: 
 Vary the percentage of outliers (i.e. sensors whose data are completely different 
from normal sensors) in real data. 
 Measure the average trust scores of normal sensors vs. outliers 
 Observations: 
 The difference between normal sensors and outliers is clearly separated. 
 Our approach can distinguish between normal sensors and outliers
CollaborateCom | 2014 19 
Effects of heterogeneity level 
 Setting: 
 Generate synthetic data by randomly adding some differences (i.e. distance) to the 
mean value of true distribution 
 Compute the trust scores for the synthetic distributions 
 Metrics: compare the rank of data by distance to true distribution vs. by the trust 
scores 
 Observations: 
 The two ranking orders are similar 
 Our trust scores can reflect the correctness of data 
Effect of distance between distributions on the accuracy of computing trust 
(size of the circles reflects the number of data with the same ranks)
Conclusions 
 We built a systematic model to manage uncertain data from 
participatory sensing. 
 Our probabilistic model captures the dynamic and uncertain nature of 
sensor data. 
 We combined multiple probabilistic data by evaluating the trust scores 
of data and aggregate based on these trust scores. 
CollaborateCom | 2014 20
Applications 
 Information about uncertainty of sensor data can be used as/in: 
CollaborateCom | 2014 21 
 Guidance for data repair: 
• Sensor data is inherently uncertain  need human knowledge to repair data 
• Minimize the repair effort by suggesting the data with most information gain 
(i.e. the amount of uncertainty reduction of knowing the true value) 
 Adaptive data acquisition in resilience systems: 
• Abnormal readings may reflect unexpected situations: network loss, battery 
discharge, etc. 
• Detect such situations by analyzing the probability distributions of consecutive 
timestamps (e.g. sudden changes of variances and mean values).
CollaborateCom | 2014 22 
THANK YOU 
Q&A

More Related Content

What's hot

Earthquake shakes twitter users
Earthquake shakes twitter usersEarthquake shakes twitter users
Earthquake shakes twitter usersEshan Mudwel
 
Semantic Twitter Analyzing Tweets For Real Time Event Notification
Semantic Twitter Analyzing Tweets For Real Time Event NotificationSemantic Twitter Analyzing Tweets For Real Time Event Notification
Semantic Twitter Analyzing Tweets For Real Time Event Notificationokazaki117
 
Earthquake shakes twitter users real-time event detection by social sensors
Earthquake shakes twitter users  real-time event detection by social sensorsEarthquake shakes twitter users  real-time event detection by social sensors
Earthquake shakes twitter users real-time event detection by social sensorsMike Mayer
 
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...tksakaki
 
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITYSENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITYSangeetha Mam
 
REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...
REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...
REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...IJNSA Journal
 
TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...
TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...
TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...pijans
 
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...idescitation
 
Prediction Based Moving Object Tracking in Wireless Sensor Network
Prediction Based Moving Object Tracking in Wireless Sensor NetworkPrediction Based Moving Object Tracking in Wireless Sensor Network
Prediction Based Moving Object Tracking in Wireless Sensor NetworkIRJET Journal
 
Incremental clustering based object tracking in wireless sensor networks
Incremental clustering based object tracking in wireless sensor networksIncremental clustering based object tracking in wireless sensor networks
Incremental clustering based object tracking in wireless sensor networksSachin MS
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...EuroIoTa
 
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Neelabha Pant
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot netredpel dot com
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Pete Burnap
 
Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Pre...
Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Pre...Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Pre...
Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Pre...iammyr
 
Using Dempster-Shafer Theory and Real Options Theory
Using Dempster-Shafer Theory and Real Options TheoryUsing Dempster-Shafer Theory and Real Options Theory
Using Dempster-Shafer Theory and Real Options TheoryEric van Heck
 

What's hot (20)

Earthquake shakes twitter users
Earthquake shakes twitter usersEarthquake shakes twitter users
Earthquake shakes twitter users
 
inter net things
inter net thingsinter net things
inter net things
 
Semantic Twitter Analyzing Tweets For Real Time Event Notification
Semantic Twitter Analyzing Tweets For Real Time Event NotificationSemantic Twitter Analyzing Tweets For Real Time Event Notification
Semantic Twitter Analyzing Tweets For Real Time Event Notification
 
Earthquake shakes twitter users real-time event detection by social sensors
Earthquake shakes twitter users  real-time event detection by social sensorsEarthquake shakes twitter users  real-time event detection by social sensors
Earthquake shakes twitter users real-time event detection by social sensors
 
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
 
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITYSENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
 
REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...
REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...
REDUCING THE COGNITIVE LOAD ON ANALYSTS THROUGH HAMMING DISTANCE BASED ALERT ...
 
TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...
TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...
TWO LEVEL DATA FUSION MODEL FOR DATA MINIMIZATION AND EVENT DETECTION IN PERI...
 
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ...
 
Prediction Based Moving Object Tracking in Wireless Sensor Network
Prediction Based Moving Object Tracking in Wireless Sensor NetworkPrediction Based Moving Object Tracking in Wireless Sensor Network
Prediction Based Moving Object Tracking in Wireless Sensor Network
 
BPI@BPM2011
BPI@BPM2011BPI@BPM2011
BPI@BPM2011
 
Incremental clustering based object tracking in wireless sensor networks
Incremental clustering based object tracking in wireless sensor networksIncremental clustering based object tracking in wireless sensor networks
Incremental clustering based object tracking in wireless sensor networks
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
 
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot net
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
 
Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Pre...
Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Pre...Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Pre...
Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Pre...
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
 
Using Dempster-Shafer Theory and Real Options Theory
Using Dempster-Shafer Theory and Real Options TheoryUsing Dempster-Shafer Theory and Real Options Theory
Using Dempster-Shafer Theory and Real Options Theory
 
Byasian nural network BCPNN
Byasian nural network BCPNNByasian nural network BCPNN
Byasian nural network BCPNN
 

Viewers also liked

Four Steps to Creating an Effective Open Source Policy
Four Steps to Creating an Effective Open Source PolicyFour Steps to Creating an Effective Open Source Policy
Four Steps to Creating an Effective Open Source Policyiasaglobal
 
Austin Roundtable Discussion on REST
Austin Roundtable Discussion on RESTAustin Roundtable Discussion on REST
Austin Roundtable Discussion on RESTiasaglobal
 
Vst effects manipulation 3
Vst effects manipulation 3Vst effects manipulation 3
Vst effects manipulation 3Kalen612
 
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatchLinking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatchPlanetData Network of Excellence
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsPlanetData Network of Excellence
 

Viewers also liked (6)

Four Steps to Creating an Effective Open Source Policy
Four Steps to Creating an Effective Open Source PolicyFour Steps to Creating an Effective Open Source Policy
Four Steps to Creating an Effective Open Source Policy
 
Austin Roundtable Discussion on REST
Austin Roundtable Discussion on RESTAustin Roundtable Discussion on REST
Austin Roundtable Discussion on REST
 
Vst effects manipulation 3
Vst effects manipulation 3Vst effects manipulation 3
Vst effects manipulation 3
 
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatchLinking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of Endpoints
 
CLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data ArchitectureCLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data Architecture
 

Similar to Towards Enabling Probabilistic Databases for Participatory Sensing

An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...Wassim Derguech
 
Improvising Network life time of Wireless sensor networks using mobile data a...
Improvising Network life time of Wireless sensor networks using mobile data a...Improvising Network life time of Wireless sensor networks using mobile data a...
Improvising Network life time of Wireless sensor networks using mobile data a...Editor IJCATR
 
A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat...
A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat...A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat...
A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat...IJMER
 
MCP_ES_2012_Jie
MCP_ES_2012_JieMCP_ES_2012_Jie
MCP_ES_2012_JieMDO_Lab
 
Efficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
Efficient Data Gathering with Compressive Sensing in Wireless Sensor NetworksEfficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
Efficient Data Gathering with Compressive Sensing in Wireless Sensor NetworksIRJET Journal
 
Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate...
Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate...Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate...
Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate...IDES Editor
 
SENSOR SELECTION SCHEME IN WIRELESS SENSOR NETWORKS: A NEW ROUTING APPROACH
SENSOR SELECTION SCHEME IN WIRELESS SENSOR NETWORKS: A NEW ROUTING APPROACHSENSOR SELECTION SCHEME IN WIRELESS SENSOR NETWORKS: A NEW ROUTING APPROACH
SENSOR SELECTION SCHEME IN WIRELESS SENSOR NETWORKS: A NEW ROUTING APPROACHcsandit
 
SENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORK
SENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORKSENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORK
SENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORKijwmn
 
part-4-structural-equation-modelling-qr.pptx
part-4-structural-equation-modelling-qr.pptxpart-4-structural-equation-modelling-qr.pptx
part-4-structural-equation-modelling-qr.pptxKaRim295737
 
An Adaptive Monitoring Service exploiting Data Correlations in Fog Computing ...
An Adaptive Monitoring Service exploiting Data Correlations in Fog Computing ...An Adaptive Monitoring Service exploiting Data Correlations in Fog Computing ...
An Adaptive Monitoring Service exploiting Data Correlations in Fog Computing ...Monica Vitali
 
1 Object tracking using sensor network Orla Sahi
1       Object tracking using sensor network Orla Sahi1       Object tracking using sensor network Orla Sahi
1 Object tracking using sensor network Orla SahiSilvaGraf83
 
Input Data Collection and Analysis.pptx
Input Data Collection and Analysis.pptxInput Data Collection and Analysis.pptx
Input Data Collection and Analysis.pptxbitf20m550SenirJusti
 
information retrival evaluation.ppt
information retrival evaluation.pptinformation retrival evaluation.ppt
information retrival evaluation.pptBonnieKabiru
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Multi-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor NetworksMulti-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor NetworksPayamBarnaghi
 
Mobile and Web Applications for Sensing Hazardous Room Temperature using Wire...
Mobile and Web Applications for Sensing Hazardous Room Temperature using Wire...Mobile and Web Applications for Sensing Hazardous Room Temperature using Wire...
Mobile and Web Applications for Sensing Hazardous Room Temperature using Wire...Mysa Vijay
 
Information extraction from sensor networks using the Watershed transform alg...
Information extraction from sensor networks using the Watershed transform alg...Information extraction from sensor networks using the Watershed transform alg...
Information extraction from sensor networks using the Watershed transform alg...M H
 
Multi-resolution Data Communication in Wireless Sensor Networks
 Multi-resolution Data Communication in Wireless Sensor Networks Multi-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor NetworksCityPulse Project
 

Similar to Towards Enabling Probabilistic Databases for Participatory Sensing (20)

Pay-as-you-go Reconciliation in Schema Matching Networks
Pay-as-you-go Reconciliation in Schema Matching NetworksPay-as-you-go Reconciliation in Schema Matching Networks
Pay-as-you-go Reconciliation in Schema Matching Networks
 
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ...
 
Improvising Network life time of Wireless sensor networks using mobile data a...
Improvising Network life time of Wireless sensor networks using mobile data a...Improvising Network life time of Wireless sensor networks using mobile data a...
Improvising Network life time of Wireless sensor networks using mobile data a...
 
A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat...
A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat...A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat...
A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat...
 
MCP_ES_2012_Jie
MCP_ES_2012_JieMCP_ES_2012_Jie
MCP_ES_2012_Jie
 
Efficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
Efficient Data Gathering with Compressive Sensing in Wireless Sensor NetworksEfficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
Efficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
 
Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate...
Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate...Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate...
Data Accuracy Models under Spatio - Temporal Correlation with Adaptive Strate...
 
SENSOR SELECTION SCHEME IN WIRELESS SENSOR NETWORKS: A NEW ROUTING APPROACH
SENSOR SELECTION SCHEME IN WIRELESS SENSOR NETWORKS: A NEW ROUTING APPROACHSENSOR SELECTION SCHEME IN WIRELESS SENSOR NETWORKS: A NEW ROUTING APPROACH
SENSOR SELECTION SCHEME IN WIRELESS SENSOR NETWORKS: A NEW ROUTING APPROACH
 
SENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORK
SENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORKSENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORK
SENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORK
 
Bs4301396400
Bs4301396400Bs4301396400
Bs4301396400
 
part-4-structural-equation-modelling-qr.pptx
part-4-structural-equation-modelling-qr.pptxpart-4-structural-equation-modelling-qr.pptx
part-4-structural-equation-modelling-qr.pptx
 
An Adaptive Monitoring Service exploiting Data Correlations in Fog Computing ...
An Adaptive Monitoring Service exploiting Data Correlations in Fog Computing ...An Adaptive Monitoring Service exploiting Data Correlations in Fog Computing ...
An Adaptive Monitoring Service exploiting Data Correlations in Fog Computing ...
 
1 Object tracking using sensor network Orla Sahi
1       Object tracking using sensor network Orla Sahi1       Object tracking using sensor network Orla Sahi
1 Object tracking using sensor network Orla Sahi
 
Input Data Collection and Analysis.pptx
Input Data Collection and Analysis.pptxInput Data Collection and Analysis.pptx
Input Data Collection and Analysis.pptx
 
information retrival evaluation.ppt
information retrival evaluation.pptinformation retrival evaluation.ppt
information retrival evaluation.ppt
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Multi-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor NetworksMulti-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor Networks
 
Mobile and Web Applications for Sensing Hazardous Room Temperature using Wire...
Mobile and Web Applications for Sensing Hazardous Room Temperature using Wire...Mobile and Web Applications for Sensing Hazardous Room Temperature using Wire...
Mobile and Web Applications for Sensing Hazardous Room Temperature using Wire...
 
Information extraction from sensor networks using the Watershed transform alg...
Information extraction from sensor networks using the Watershed transform alg...Information extraction from sensor networks using the Watershed transform alg...
Information extraction from sensor networks using the Watershed transform alg...
 
Multi-resolution Data Communication in Wireless Sensor Networks
 Multi-resolution Data Communication in Wireless Sensor Networks Multi-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor Networks
 

More from PlanetData Network of Excellence

A Contextualized Knowledge Repository for Open Data about Trentino
A Contextualized Knowledge Repository for Open Data about TrentinoA Contextualized Knowledge Repository for Open Data about Trentino
A Contextualized Knowledge Repository for Open Data about TrentinoPlanetData Network of Excellence
 
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching NetworksOn Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching NetworksPlanetData Network of Excellence
 
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstreamDemo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstreamPlanetData Network of Excellence
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingPlanetData Network of Excellence
 
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...PlanetData Network of Excellence
 
SciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMSSciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMSPlanetData Network of Excellence
 
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduceScalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReducePlanetData Network of Excellence
 
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...PlanetData Network of Excellence
 
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsPlanetData Network of Excellence
 
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...PlanetData Network of Excellence
 
Exploring The Hubness-Related Properties of Oceanographic Sensor Data
Exploring The Hubness-Related Properties of Oceanographic Sensor DataExploring The Hubness-Related Properties of Oceanographic Sensor Data
Exploring The Hubness-Related Properties of Oceanographic Sensor DataPlanetData Network of Excellence
 

More from PlanetData Network of Excellence (20)

Dl2014 slides
Dl2014 slidesDl2014 slides
Dl2014 slides
 
A Contextualized Knowledge Repository for Open Data about Trentino
A Contextualized Knowledge Repository for Open Data about TrentinoA Contextualized Knowledge Repository for Open Data about Trentino
A Contextualized Knowledge Repository for Open Data about Trentino
 
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching NetworksOn Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
 
Privacy-Preserving Schema Reuse
Privacy-Preserving Schema ReusePrivacy-Preserving Schema Reuse
Privacy-Preserving Schema Reuse
 
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstreamDemo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
 
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
 
SciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMSSciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMS
 
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduceScalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
 
Data and Knowledge Evolution
Data and Knowledge Evolution  Data and Knowledge Evolution
Data and Knowledge Evolution
 
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
 
Access Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract ModelsAccess Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract Models
 
Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
 
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
 
Heuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQLHeuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQL
 
Building a Front End for a Sensor Data Cloud
Building a Front End for a Sensor Data CloudBuilding a Front End for a Sensor Data Cloud
Building a Front End for a Sensor Data Cloud
 
OntoGen Extension for Exploring Image Collections
OntoGen Extension for Exploring Image CollectionsOntoGen Extension for Exploring Image Collections
OntoGen Extension for Exploring Image Collections
 
Exploring The Hubness-Related Properties of Oceanographic Sensor Data
Exploring The Hubness-Related Properties of Oceanographic Sensor DataExploring The Hubness-Related Properties of Oceanographic Sensor Data
Exploring The Hubness-Related Properties of Oceanographic Sensor Data
 

Towards Enabling Probabilistic Databases for Participatory Sensing

  • 1. Towards Enabling Probabilistic Databases for Participatory Sensing Nguyen Quoc Viet Hung 1, Saket Sathe 2, Duong Chi Thang 1, Karl Aberer1 1 École Polytechnique Fédérale de Lausanne 2 IBM Melbourne Research Laboratory
  • 2. Participatory sensing  A concept of communities in which participants proactively report sensory information  Sensors might be humans or their mobile devices.  Enable harnessing the wisdom of the crowd to collect a huge amount of data for various applications: geo‐tagging, environmental monitoring, and public health. CollaborateCom | 2014 2  CrowdSense scenario: http://opensense.epfl.ch  Goal: collect high‐resolution urban air quality  Approach: community‐based sensing in which sensors are attached on vehicles and personal devices.  Advantages: real‐time data gathering, high resolution, and improving data quality CrowdSense
  • 3. CollaborateCom | 2014 3 Probabilistic Database  Traditional database:  Sensor data: a series of values at timestamps  Probabilistic database:  Sensor data: a series of distributions is a random variable of data value at timestamp reflects a probability distribution, e.g. ݌ଵ ܴଵ ݌ଶ ܴଶ ݌ଷ ܴଷ 1 2 3 Traditional database Probabilistic database
  • 4. Probabilistic Database for Participatory Sensing  Setting: CollaborateCom | 2014 4 Aggregation  Motivation: data collected from individual participants and their devices are inherently uncertain due to noise factors:  Low sensor quality  Unstable communication channels  Challenges:  Sensor data irregularly depend on time  dynamic computation • Temperature changes dramatically around sunrise and sunset, but changes only slightly during the night.  Each sensor has different quality  trust model Aggregated probabilistic data Multiple sensor data
  • 5. CollaborateCom | 2014 5 Approach overview  Computing likelihoods: compute probabilistic sensor data from raw sensor data  Infer probability distribution at each timestamp  Aggregate: combines all probabilistic sensor data into single probabilistic sensor data  Trust assessment: compute trust scores for each sensor and its data  Trust‐based aggregation: sensors with high trust scores have high impact on aggregated data
  • 6. CollaborateCom | 2014 6 Outline  Model  Computing likelihoods of sensor data  Aggregating multiple sensor data  Experimental results  Conclusion and future work
  • 7. CollaborateCom | 2014 7 Model  Sensor data: is a reading collected by sensor at time  Probabilistic sensor data:  : random variable : probability density function  Problem statement: given a set of time series of sensors, compute an aggregated probabilistic sensor data .  is the probabilistic distribution at timestamp combined from .
  • 8. CollaborateCom | 2014 8 Computing likelihoods of single sensor data  Input: a sensor data  Output: a probabilistic sensor data  Requirements:  R1: The currently value dynamically depends on past values  R2: Uncertainty range varies over time  Solution: model by a Gaussian probability distribution  Estimate the expected value : using Auto Regressive Moving Average [1] – ARMA (p,q) with autoregressive terms and moving‐average terms:  Compute the variance : using Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) [1] model: [*] R. H. Shumway and D. S. Stoffer, Time series analysis and its applications. Springer‐Verlag, 2000 satisfy (R1) satisfy (R2)
  • 9. CollaborateCom | 2014 9 Aggregating multiple sensor data  Input: a set of probabilistic sensor data  Output: an aggregated sensor data  Method: use trust‐based approach  Trust assessment  Probability aggregation
  • 10. CollaborateCom | 2014 10 Outline  Model  Computing likelihoods of sensor data  Aggregating multiple sensor data  Trustworthiness assessment  Probability aggregation  Experimental results  Conclusion and future work
  • 11. CollaborateCom | 2014 11 Trustworthiness assessment  Assess the trustworthiness based on two factors:  Similarity of distributions: distributions which are similar to each other should have higher trust scores • The first distribution of sensor 1 is similar to the first distribution of sensor 2 and sensor 3  should have high trust score  Reliability of sensors: a reliable sensor tends to provide correct information • If sensor 1 is reliable, even the third distribution of sensor 1 is different from the others  this distribution should nevertheless have high trust score. ߙ௧௜ ൌ ೔,ೕ Σ ఉೕ௦೟ ೕసభ..೙,ೕಯ೔ Σೕసభ..೙ ఉೕ ߙ௧௜ : trust score of ݐ‐th distribution of sensor ݅ ߚ௝: reliability of sensor ݆ ݏ௧ ௜,௝ : similarity between ݐ‐th distributions of sensor ݅ and ݆
  • 12. CollaborateCom | 2014 12 Trustworthiness assessment (cont’d)  Compute the reliability for each sensor:  A sensor that provides more correct data tends to be more reliable  Compute the reliability of each sensor based on the trust scores of its data: ߙ ൌ 1 ߙ ൌ 1 ߙ ൌ 0.5 ߚ ൎ 0.83
  • 13. Trustworthiness assessment (cont’d)  There is a mutually reinforcing relationship between sensors and data they provide. CollaborateCom | 2014 13  Trust score of sensor data:  Reliability of sensor:  Propose an iterative algorithm to concurrently update the reliability of sensor and trust scores of sensor data until convergence.
  • 14. CollaborateCom | 2014 14 Trustworthiness assessment (cont’d)  Initialize trust scores as 0.5 (maximum entropy principle)  Iterate until termination condition is satisfied:  Compute reliability of each sensor based on current trust scores  Compute the trust scores based on current reliability.
  • 15. Probability aggregation  Compute the final random variable from multiple sensor data weighted by their trust scores:  where is timestamp, is the number of sensors, is the random variable representing the probability distributions of sensor ݅ at timestamp ݐ. CollaborateCom | 2014 15  Applying moment‐generating function [2]: [2] C. M. Grinstead and J. L. Snell, Introduction to probability. American Mathematical Soc., 1998
  • 16. CollaborateCom | 2014 16 Experiment – Dataset and Setting  Datasets:  Real data: • Campus: temperature readings collected from a real sensor network deployed on university campus • Moving‐object: GPS readings collected from 192 moving objects.  Synthetic data: • Fix a true distribution for each timestamp • Generate probability distributions from the true distribution by randomly adding differences to the mean value.  Evaluation measures: computation time, effects of outliers, effects of heterogeneity level
  • 17. CollaborateCom | 2014 17 Computation time  Setting:  Vary the length of time series  Metrics: computation time  Observations:  Computation time is reasonable w.r.t. data size  GARCH model is suitable for online and real‐time applications.
  • 18. CollaborateCom | 2014 18 Effects of outlier  Setting:  Vary the percentage of outliers (i.e. sensors whose data are completely different from normal sensors) in real data.  Measure the average trust scores of normal sensors vs. outliers  Observations:  The difference between normal sensors and outliers is clearly separated.  Our approach can distinguish between normal sensors and outliers
  • 19. CollaborateCom | 2014 19 Effects of heterogeneity level  Setting:  Generate synthetic data by randomly adding some differences (i.e. distance) to the mean value of true distribution  Compute the trust scores for the synthetic distributions  Metrics: compare the rank of data by distance to true distribution vs. by the trust scores  Observations:  The two ranking orders are similar  Our trust scores can reflect the correctness of data Effect of distance between distributions on the accuracy of computing trust (size of the circles reflects the number of data with the same ranks)
  • 20. Conclusions  We built a systematic model to manage uncertain data from participatory sensing.  Our probabilistic model captures the dynamic and uncertain nature of sensor data.  We combined multiple probabilistic data by evaluating the trust scores of data and aggregate based on these trust scores. CollaborateCom | 2014 20
  • 21. Applications  Information about uncertainty of sensor data can be used as/in: CollaborateCom | 2014 21  Guidance for data repair: • Sensor data is inherently uncertain  need human knowledge to repair data • Minimize the repair effort by suggesting the data with most information gain (i.e. the amount of uncertainty reduction of knowing the true value)  Adaptive data acquisition in resilience systems: • Abnormal readings may reflect unexpected situations: network loss, battery discharge, etc. • Detect such situations by analyzing the probability distributions of consecutive timestamps (e.g. sudden changes of variances and mean values).
  • 22. CollaborateCom | 2014 22 THANK YOU Q&A