Search, Discovery and Analysis of Sensory Data Streams
1
Payam Barnaghi
Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey
Care Technology & Research Centre, The UK Dementia Research Institute (DRI)
SAW2019: 1st International Workshop on Sensors and Actuators on the Web
46 years ago on the 5th of November (submission day)
2
Source: https://www.cs.princeton.edu/courses/archive/fall06/cos561/papers/cerf74.pdf
• A 32-bit IP address was used, of which the first 8 bits signified the network and the remaining 24 bits designated the host on that network.
• The assumption was that 256 networks would be sufficient for the foreseeable future…
• Obviously this was before LANs (Ethernet was under development at Xerox PARC at that time).
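As an aside, the 8/24 split above is easy to show in code. A minimal Python sketch (not from the 1974 paper; the helper name and the example address are made up for illustration):

```python
# Minimal sketch: split a 32-bit address into the original
# 8-bit network / 24-bit host fields described in the 1974 design.

def split_address(addr: int) -> tuple[int, int]:
    """Return (network, host) for a 32-bit address."""
    network = (addr >> 24) & 0xFF   # first 8 bits: at most 256 networks
    host = addr & 0x00FFFFFF        # remaining 24 bits: host on that network
    return network, host

# Example (hypothetical address 10.1.2.3): network 10, host 0x010203
addr = (10 << 24) | (1 << 16) | (2 << 8) | 3
print(split_address(addr))          # (10, 66051)
```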
Around 20 years later…
3
Web search in the early days
4
And there came Google!
5
Google says the web now has over 30 trillion unique pages. The exact number is probably not even that relevant anymore; many resources are dynamic…
The Crawling problem
6
Source: https://www.bruceclay.com/seo/submit-website/
The Web content search lifecycle
− Creation
− Upload
− Crawling
− Indexing
− Delete/Update
− Query
− Search and discovery
− Processing
− Ranking
− Presentation
7
(On the slide, these stages are grouped under two headings: Content and Access.)
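To make the lifecycle concrete, a deliberately toy sketch of the content side (crawl and index) and the access side (query, rank, present) follows. The in-memory inverted index and the match-count ranking are simplifying assumptions for illustration, not how a production search engine works:

```python
# Toy content/access pipeline: crawl and index documents, then query and rank them.
from collections import defaultdict

index = defaultdict(set)     # term -> set of document ids
documents = {}               # document id -> text

def crawl_and_index(doc_id: str, text: str) -> None:
    """Content side: ingest a crawled page and index its terms."""
    documents[doc_id] = text
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query: str) -> list[str]:
    """Access side: query, discover matching documents and rank them."""
    hits = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return sorted(hits, key=hits.get, reverse=True)   # rank by number of matched terms

crawl_and_index("doc1", "sensors and actuators on the web")
crawl_and_index("doc2", "searching sensory data streams on the web")
print(search("sensory web"))   # ['doc2', 'doc1']
```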
However, pages are not the only things on the web…
8
Image source: Youmegeek.com
Internet of Things (IoT) Search
9
10
http://Thingful.net
11
http://Thingful.net
12
13
14
Image sources: Wolfram Alpha
Search and automation
15
Source: Passler.com
Sensory data
16
Sensor Data Flow on the Web
17
P. Barnaghi, A. Sheth, “On Searching the Internet of Things: Requirements and Challenges”, IEEE Intelligent Systems, 2016.
18
https://iotcrawler.eu
Searching for…
19
(Y. Fathy, P. Barnaghi, et al., 2018)
Searching for Sensory Devices
(i.e. Resources)
20
Semantic models
21
Semantic models
22
LSM : A Semantic Approach
23
(Danh Le-Phuoc et al., ISWC, 2011)
A discovery engine for the IoT
24
(Hosseini Tabatabaei, Barnaghi, et al., 2018)
A GMM model for indexing
25
Average success rates:
− First attempt: 92.3% (min)
− At first DS: 92.5% (min)
− At first DSL2: 98.5% (min)
[Figure: percentage of the total queries (log scale, 10^-4 to 10^0) vs. number of attempts (0–60), plotted for DSL2 capacity 1 to 4.]
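The cited indexing work is based on a Gaussian Mixture Model over resource descriptions; the sketch below only illustrates the general idea with scikit-learn on made-up attribute vectors (location and sampling rate are assumed features) and is not the method or data from the paper:

```python
# Rough illustration: fit a GMM over IoT resource attribute vectors and
# route a lookup to the mixture component the query most likely belongs to.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Fake attribute vectors (lat, lon, sampling rate) for 300 registered resources
resources = np.vstack([
    rng.normal([51.2, -0.6, 1.0], 0.05, size=(150, 3)),   # cluster A
    rng.normal([48.8,  2.3, 5.0], 0.05, size=(150, 3)),   # cluster B
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(resources)
labels = gmm.predict(resources)                    # index: resource -> component

def lookup(query_vector):
    """Return indices of resources in the component the query most likely falls in."""
    component = gmm.predict(np.asarray(query_vector).reshape(1, -1))[0]
    return np.where(labels == component)[0]

print(len(lookup([51.25, -0.58, 1.2])))            # ~150 resources from cluster A
```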
26
However, there are also other possible solutions:
(Y. Fathy, P. Barnaghi, et al., 2017)
(A. Hosseini Tabatabaei, P. Barnaghi, et al., 2019)
The Crawling and Update Issue
27
The Crawling Challenge
− Uniform policy: re-visiting all pages in the collection with the same frequency, regardless of their rates of change.
− Proportional policy: re-visiting more often the pages that change more frequently. The visiting frequency is directly proportional to the (estimated) change frequency.
28
Cho, Junghoo; Garcia-Molina, Hector (2003). "Effective page refresh policies for Web
crawlers". ACM Transactions on Database Systems. 28 (4): 390–426.
Web Crawling
− Cho and Garcia-Molina proved the surprising result that, in terms of average freshness, the uniform policy outperforms the proportional policy in both a simulated Web and a real Web crawl.
− The explanation is that the proportional policy allocates too many new crawls to rapidly changing pages at the expense of less frequently updated pages.
− A proportional policy allocates more resources to crawling frequently updated pages, but experiences less overall freshness time from them.
29
Source: Wikipedia
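A toy numerical check of this result is sketched below. It assumes Poisson page changes, equally spaced revisits and a fixed total crawl budget, and uses the standard closed form for expected freshness under those assumptions; it is not the simulation or crawl data from the cited study:

```python
# Compare average freshness of the uniform and proportional re-visit policies
# under a fixed crawl budget, assuming Poisson page changes (rate lam) and
# equally spaced revisits: expected freshness = (1 - exp(-lam*T)) / (lam*T).
import numpy as np

rng = np.random.default_rng(1)
lam = rng.lognormal(mean=0.0, sigma=1.5, size=1000)   # heavy-tailed change rates
budget = len(lam)                                     # total crawls per time unit

def avg_freshness(freq):
    """Average expected freshness given per-page revisit frequencies."""
    T = 1.0 / freq
    return np.mean((1.0 - np.exp(-lam * T)) / (lam * T))

uniform = np.full_like(lam, budget / len(lam))        # same frequency for every page
proportional = budget * lam / lam.sum()               # frequency proportional to change rate

print(f"uniform policy:      {avg_freshness(uniform):.3f}")
print(f"proportional policy: {avg_freshness(proportional):.3f}")
# The uniform policy comes out higher, in line with the result quoted above.
```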
Crawling and the Freshness Issue
− To improve freshness, the crawler should penalise the elements that change too often.
− The optimal re-visiting policy is neither the uniform policy nor the proportional policy.
− The optimal method for keeping average freshness high includes ignoring the pages that change too often, and the optimal for keeping average age low is to use access frequencies that monotonically (and sub-linearly) increase with the rate of change of each page.
30
Junghoo Cho; Hector Garcia-Molina (2003). "Estimating frequency of change". ACM
Transactions on Internet Technology. 3 (3): 256–290.
Source: Wikipedia
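To picture the "sub-linear" allocation mentioned above, the sketch below spreads a fixed crawl budget in proportion to the change rate raised to a power below one; the power-law form and the exponent are arbitrary illustrative choices, not the optimal schedule derived by Cho and Garcia-Molina:

```python
# Illustrative (not optimal) sub-linear allocation of a fixed crawl budget:
# revisit frequency grows like lam**alpha with alpha < 1, so fast-changing pages
# get more visits than under the uniform policy, but fewer than proportional.
import numpy as np

lam = np.array([0.1, 0.5, 1.0, 5.0, 20.0])   # estimated change rates (per day)
budget = 10.0                                # total crawls per day
alpha = 0.5                                  # sub-linear exponent (assumption)

raw = lam ** alpha
freq = budget * raw / raw.sum()              # normalise to the crawl budget

for rate, f in zip(lam, freq):
    print(f"change rate {rate:5.1f}/day -> revisit {f:5.2f} times/day")
```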
Searching the content of data streams
31
Patterns and segmentation of time-series data
32
But the data is often multidimensional and multivariate
33
Credit: Shirin Enshaeifar, CR&T Centre, UK Dementia Research Institute/CVSSP, Uni of Surrey
Creating patterns from streaming data
34
(Gonzalez-Vidal, Barnaghi, Skarmeta, IEEE TKDE, 2018)
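As a rough idea of what segmenting a stream into patterns can look like, here is a much simplified two-window change detector on synthetic data; it is neither BEATS nor the adaptive detector from the cited papers, and the window size and threshold are arbitrary illustrative choices:

```python
# Segment a univariate stream by comparing the means of two adjacent sliding windows.
import numpy as np

def segment(stream, window=20, threshold=1.5):
    """Return indices where a new quasi-stationary segment is detected."""
    boundaries = [0]
    for t in range(window, len(stream) - window):
        left = stream[t - window:t]
        right = stream[t:t + window]
        if abs(right.mean() - left.mean()) > threshold * left.std(ddof=1):
            if t - boundaries[-1] >= 2 * window:   # avoid re-firing on the same change
                boundaries.append(t)
    return boundaries

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200), rng.normal(-2, 1, 200)])
print(segment(x))   # detected boundaries fall near the true change points at 200 and 400
```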
IoTCrawler search engine
35
http://iot-crawler.ee.surrey.ac.uk/search-engine/
36
http://iot-crawler.ee.surrey.ac.uk/search-engine/
Pattern analysis
37
[Figure: two panels of aggregated daily patterns over two weeks (Time vs. Days).]
(Enshaeifar, Barnaghi, et al., PLoS ONE, 2018)
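One simple reading of the aggregated daily patterns above is an average over each hour-of-day slot across a two-week window. The pandas sketch below does exactly that on synthetic counts; the real study uses in-home sensor data, not this toy series:

```python
# Build an aggregated daily pattern from two weeks of hourly sensor event counts.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
idx = pd.date_range("2019-01-01", periods=14 * 24, freq="H")        # two weeks, hourly
activity = pd.Series(rng.poisson(lam=5, size=len(idx)), index=idx)  # fake event counts

daily_pattern = activity.groupby(activity.index.hour).mean()        # 24 values, one per hour
print(daily_pattern)
```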
Developing end-to-end solutions
38
(Enshaeifar, Barnaghi, et al., 2019)
Some of the Research Challenges
− Provenance monitoring and fact-checking algorithms and tools
− Dealing with noisy, incomplete and dynamic data
− Handling and processing large data streams; search and identification of patterns
− Crawling, search and query of changing data
− Multi-modal information analysis, and continual and adaptive learning algorithms
− Security, privacy, trust and accessibility
− Solutions to keep (and make) the Web a safe, open, inclusive and collaborative environment
39
Some (other) important issues
40
How representative is your data?
41
The issue of trust and reliability
42
How stable are the models that you learn from your data?
43
Credits: Roonak Rezvani, CR&T Centre, UK Dementia Research Institute/CVSSP, Uni of Surrey
Dynamicity and machine learning issue
44
− Noise and missing data
− Pattern and change representation
− Continual and adaptive learning
− Network and causation analysis
Avoid (unnecessary) complexity
45
Be ready for setbacks
46
References
− S. Enshaeifar et al., "Health management and pattern analysis of daily living activities of people with Dementia using in-home sensors and machine learning techniques", PLoS ONE 13(5): e0195605, 2018.
− A. González Vidal, P. Barnaghi, A. F. Skarmeta, "BEATS: Blocks of Eigenvalues Algorithm for Time series Segmentation", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
− Y. Fathy, P. Barnaghi, R. Tafazolli, "An Online Adaptive Algorithm for Change Detection in Streaming Sensory Data", IEEE Systems Journal, 2018.
− Y. Fathy, P. Barnaghi, R. Tafazolli, "Large-Scale Indexing, Discovery and Ranking for the Internet of Things (IoT)", ACM Computing Surveys, 2017.
− S. A. Hosseini Tabatabaei, Y. Fathy, P. Barnaghi, C. Wang, R. Tafazolli, "A Novel Indexing Method for Scalable IoT Source Lookup", IEEE Internet of Things Journal, 2018.
− Y. Fathy, P. Barnaghi, R. Tafazolli, "Distributed Spatial Indexing for the Internet of Things Data Management", Proc. of IFIP/IEEE International Symposium on Integrated Network Management, Lisbon, Portugal, May 2017.
47
Acknowledgments
48
Thank you!
http://personal.ee.surrey.ac.uk/Personal/P.Barnaghi/
@pbarnaghi
p.barnaghi@surrey.ac.uk
https://ukdri.ac.uk/team/payam-barnaghi

Editor's Notes

• #25: The entropy of the (x,y,z) triple on D, where D is the set of data items.