
Search, Discovery and Analysis of Sensory Data Streams

Keynote at SAW2019: 1st International Workshop on Sensors and Actuators on the Web, ISWC 2019, Auckland.

Published in: Education
  1. 1. Search, Discovery and Analysis of Sensory Data Streams. Payam Barnaghi; Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey; Care Technology & Research Centre, The UK Dementia Research Institute (DRI). SAW2019: 1st International Workshop on Sensors and Actuators on the Web
  2. 2. 46 years ago on the 5th of November (submission day)
     Source: https://www.cs.princeton.edu/courses/archive/fall06/cos561/papers/cerf74.pdf
     • A 32-bit IP address was used, of which the first 8 bits signified the network and the remaining 24 bits designated the host on that network.
     • The assumption was that 256 networks would be sufficient for the foreseeable future…
     • Obviously this was before LANs (Ethernet was under development at Xerox PARC at that time).
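
The 8/24 split described on this slide can be made concrete with a few bit operations. The sketch below is illustrative only: the helper name `split_original_address` and the sample address are assumptions, not taken from the cited paper.

```python
from typing import Tuple

NETWORK_BITS = 8   # first 8 bits identify the network (from the slide)
HOST_BITS = 24     # remaining 24 bits identify the host (from the slide)


def split_original_address(address: int) -> Tuple[int, int]:
    """Split a 32-bit address into (network, host) under the 8/24 scheme.

    Hypothetical helper for illustration; not an API from any library.
    """
    if not 0 <= address < 2 ** (NETWORK_BITS + HOST_BITS):
        raise ValueError("address must fit in 32 bits")
    network = address >> HOST_BITS              # top 8 bits
    host = address & ((1 << HOST_BITS) - 1)     # low 24 bits
    return network, host


# With 8 network bits there are only 2**8 = 256 possible networks.
MAX_NETWORKS = 2 ** NETWORK_BITS

# Sample (made-up) address: network 10, host 1.
network, host = split_original_address(0x0A000001)
```

The `MAX_NETWORKS` constant is the "256 networks" figure from the slide; the host field leaves room for 2**24 hosts on each of them.
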
  3. 3. Around 20 years later…
  4. 4. Web search in the early days
  5. 5. And there came Google! Google says that the web now has over 30 trillion unique individual pages. That number is probably not even that relevant anymore; lots of resources are dynamic…
  6. 6. The Crawling problem. Source: https://www.bruceclay.com/seo/submit-website/
  7. 7. The Web content search lifecycle − Creation − Upload − Crawling − Indexing − Delete/Update − Query − Search and discovery − Processing − Ranking − Presentation (diagram labels: Content, Access)
  8. 8. However, not only pages are on the web… Image source: Youmegeek.com
  9. 9. Internet of Things (IoT) Search
  10. 10. http://Thingful.net
  11. 11. http://Thingful.net
  12. 12.
  13. 13.
  14. 14. Image sources: Wolfram Alpha
  15. 15. Search and automation. Source: Passler.com
  16. 16. Sensory data
  17. 17. Sensor Data Flow on the Web. P. Barnaghi, A. Sheth, “On Searching the Internet of Things: Requirements and Challenges”, IEEE Intelligent Systems, 2016.
  18. 18. https://iotcrawler.eu
  19. 19. Searching for… (Y. Fathy, P. Barnaghi, et al., 2018)
  20. 20. Searching for Sensory Devices (i.e. Resources)
  21. 21. Semantic models
  22. 22. Semantic models
  23. 23. LSM: A Semantic Approach (Danh Le-Phuoc et al., ISWC, 2011)
  24. 24. A discovery engine for the IoT (HosseiniTabatabaie, Barnaghi et al., 2018)
  25. 25. A GMM model for indexing. Average success rates: first attempt 92.3% (min); at first DS 92.5% (min); at first DSL2 98.5% (min). [Plot: percentage of the total queries vs. number of attempts, for DSL2 capacity 1 to 4]
  26. 26. However, there are also other possible solutions: (Y. Fathy, P. Barnaghi, et al., 2017) (A. HosseiniTabatabaie, P. Barnaghi et al., 2019)
  27. 27. The Crawling and Update Issue
  28. 28. The Crawling Challenge
      − Uniform policy: re-visiting all pages in the collection with the same frequency, regardless of their rates of change.
      − Proportional policy: re-visiting more often the pages that change more frequently. The visiting frequency is directly proportional to the (estimated) change frequency.
      Cho, Junghoo; Garcia-Molina, Hector (2003). "Effective page refresh policies for Web crawlers". ACM Transactions on Database Systems. 28 (4): 390–426.
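
The two policies on this slide differ only in how a fixed crawl budget is divided among pages. A minimal sketch, assuming the change rates are already known (in practice they are estimated); the function names, rates and budget are made up for illustration:

```python
from typing import List


def uniform_policy(change_rates: List[float], budget: float) -> List[float]:
    """Every page gets the same revisit frequency, whatever its change rate."""
    share = budget / len(change_rates)
    return [share for _ in change_rates]


def proportional_policy(change_rates: List[float], budget: float) -> List[float]:
    """Revisit frequency is directly proportional to the change rate."""
    total_rate = sum(change_rates)
    return [budget * rate / total_rate for rate in change_rates]


# Made-up example: three pages changing 8, 1.5 and 0.5 times per day,
# and a crawler that can afford 20 visits per day in total.
rates = [8.0, 1.5, 0.5]
uniform = uniform_policy(rates, budget=20.0)
proportional = proportional_policy(rates, budget=20.0)
```

Both functions spend the same total budget; only the split changes.
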
  29. 29. Web Crawling
      − Cho and Garcia-Molina proved the surprising result that, in terms of average freshness, the uniform policy outperforms the proportional policy in both a simulated Web and a real Web crawl.
      − The proportional policy allocates too many crawls to rapidly changing pages at the expense of less frequently updated pages.
      − It spends more resources on crawling frequently updated pages, yet those pages stay fresh for a shorter time after each visit, so overall freshness is lower.
      Source: Wikipedia
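
The freshness comparison on this slide can be checked numerically under a simple model. The sketch below assumes each page changes as a Poisson process and is re-fetched at evenly spaced intervals; under that assumption the time-averaged probability that the local copy is current is (1 - exp(-r)) / r, with r the ratio of change rate to visit rate. The function names and the numbers are illustrative assumptions.

```python
import math
from typing import List


def expected_freshness(change_rate: float, visit_rate: float) -> float:
    """Time-averaged probability that the local copy is up to date.

    Assumes Poisson changes at `change_rate` and visits evenly spaced at
    `visit_rate`; then freshness = (1 - exp(-r)) / r with r = change/visit.
    """
    if visit_rate <= 0:
        return 0.0  # never crawled: treated as always stale in this sketch
    r = change_rate / visit_rate
    if r == 0:
        return 1.0  # a page that never changes is always fresh
    return (1.0 - math.exp(-r)) / r


def average_freshness(change_rates: List[float], visit_rates: List[float]) -> float:
    """Mean of the per-page expected freshness values."""
    pairs = zip(change_rates, visit_rates)
    return sum(expected_freshness(c, v) for c, v in pairs) / len(change_rates)


# Made-up example: two pages changing 9 and 1 times per day, 10 visits per day.
rates = [9.0, 1.0]
uniform_visits = [5.0, 5.0]        # same frequency for both pages
proportional_visits = [9.0, 1.0]   # frequency proportional to change rate

uniform_score = average_freshness(rates, uniform_visits)
proportional_score = average_freshness(rates, proportional_visits)
```

With these numbers the uniform split scores higher than the proportional one (roughly 0.69 versus 0.63), in line with the result quoted on the slide.
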
  30. 30. Crawling and the Freshness Issue
      − To improve freshness, the crawler should penalise the elements that change too often.
      − The optimal re-visiting policy is neither the uniform policy nor the proportional policy.
      − The optimal method for keeping average freshness high includes ignoring the pages that change too often, and the optimal method for keeping average age low is to use access frequencies that monotonically (and sub-linearly) increase with the rate of change of each page.
      Junghoo Cho; Hector Garcia-Molina (2003). "Estimating frequency of change". ACM Transactions on Internet Technology. 3 (3): 256–290. Source: Wikipedia
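
A small brute-force search illustrates why a freshness-optimal crawler may ignore pages that change too often. This is a sketch under the same Poisson-change, evenly-spaced-visit assumption as the freshness formula (1 - exp(-r)) / r; it covers freshness only, not age, and all rates, the budget and the grid resolution are made-up values.

```python
import math


def expected_freshness(change_rate: float, visit_rate: float) -> float:
    """(1 - exp(-r)) / r with r = change_rate / visit_rate (Poisson assumption)."""
    if visit_rate <= 0:
        return 0.0  # a page that is never visited is counted as stale
    r = change_rate / visit_rate
    return (1.0 - math.exp(-r)) / r


SLOW_RATE = 1.0     # changes per day (made-up)
FAST_RATE = 100.0   # changes per day (made-up)
BUDGET = 2.0        # total visits per day shared by the two pages
STEPS = 200         # grid resolution for the brute-force search


def total_freshness(slow_visits: float) -> float:
    """Summed freshness when the slow page gets `slow_visits` of the budget."""
    fast_visits = BUDGET - slow_visits
    return (expected_freshness(SLOW_RATE, slow_visits)
            + expected_freshness(FAST_RATE, fast_visits))


# Try every split of the budget on a grid and keep the best one.
candidates = [BUDGET * i / STEPS for i in range(STEPS + 1)]
best_slow = max(candidates, key=total_freshness)
best_fast = BUDGET - best_slow

uniform_total = total_freshness(BUDGET / 2)
proportional_total = total_freshness(BUDGET * SLOW_RATE / (SLOW_RATE + FAST_RATE))
best_total = total_freshness(best_slow)
```

In this setting the search hands the whole budget to the slow page: the fast page changes so often that a visit buys almost no freshness, so both the uniform and the proportional split score lower.
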
  31. 31. Searching the content of data streams
  32. 32. Patterns and segmentation of time-series data
  33. 33. But the data is often multidimensional and multivariate. Credit: Shirin Enshaeifar, CR&T Centre, UK Dementia Research Institute/CVSSP, Uni of Surrey
  34. 34. Creating patterns from streaming data (Gonzalez-Vidal, Barnaghi, Skarmeta, IEEE TKDE, 2018)
  35. 35. IoTCrawler search engine: http://iot-crawler.ee.surrey.ac.uk/search-engine/
  36. 36. http://iot-crawler.ee.surrey.ac.uk/search-engine/
  37. 37. Pattern analysis. [Two plots: aggregated daily pattern (2 weeks); axes: Days, Time] (Enshaeifar, Barnaghi, et al., PLoS ONE, 2018)
  38. 38. Developing end-to-end solutions (Enshaeifar, Barnaghi, et al., 2019)
  39. 39. Some of the Research Challenges
      − Provenance monitoring and fact checking algorithms and tools
      − Dealing with noisy, incomplete and dynamic data
      − Handling and processing large data streams, search and identification of patterns
      − Crawling, search and query of changing data
      − Multi-modal information analysis and continual and adaptive learning algorithms
      − Security, privacy, trust and accessibility
      − Solutions to keep (and make) the Web a safe, open, inclusive and collaborative environment
  40. 40. Some (other) important issues
  41. 41. How representative is your data?
  42. 42. The issue of trust and reliability
  43. 43. How stable are the models that you learn from your data? Credits: Roonak Rezvani, CR&T Centre, UK Dementia Research Institute/CVSSP, Uni of Surrey
  44. 44. Dynamicity and machine learning issue: noise and missing data; pattern and change representation; continual and adaptive learning; network and causation analysis
  45. 45. Avoid (unnecessary) complexity
  46. 46. Be ready for setbacks
  47. 47. References
      − S. Enshaeifar et al., "Health management and pattern analysis of daily living activities of people with Dementia using in-home sensors and machine learning techniques", PLoS ONE 13(5): e0195605, 2018.
      − A. González Vidal, P. Barnaghi, A. F. Skarmeta, "BEATS: Blocks of Eigenvalues Algorithm for Time series Segmentation", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
      − Y. Fathy, P. Barnaghi, R. Tafazolli, "An Online Adaptive Algorithm for Change Detection in Streaming Sensory Data", IEEE Systems Journal, 2018.
      − Y. Fathy, P. Barnaghi, R. Tafazolli, "Large-Scale Indexing, Discovery and Ranking for the Internet of Things (IoT)", ACM Computing Surveys, 2017.
      − S. A. Hosieni Tabatabaei, Y. Fathy, P. Barnaghi, C. Wang, R. Tafazolli, "A Novel Indexing Method for Scalable IoT Source Lookup", IEEE Internet of Things Journal, 2018.
      − Y. Fathy, P. Barnaghi, R. Tafazolli, "Distributed Spatial Indexing for the Internet of Things Data Management", Proc. of IFIP/IEEE International Symposium on Integrated Network Management, Lisbon, Portugal, May 2017.
  48. 48. Acknowledgments
  49. 49. Thank you! http://personal.ee.surrey.ac.uk/Personal/P.Barnaghi/ @pbarnaghi p.barnaghi@surrey.ac.uk https://ukdri.ac.uk/team/payam-barnaghi