Intelligent Data Processing for the Internet of Things
1. 1
Intelligent Data Processing for the Internet of
Things
Payam Barnaghi
Institute for Communication Systems (ICS)
University of Surrey
Guildford, United Kingdom
International “IoT 360” Summer School
October 29th – November 1st, 2014 – Rome, Italy
2. 2
Key characteristics of IoT devices
−Often inexpensive sensors (actuators) equipped with a radio
transceiver for various applications, typically low data rate ~
10-250 kbps.
−Deployed in large numbers
−The sensors should coordinate to perform the desired task.
−The acquired information (periodic or event-based) is
reported back to the information processing centre (or in some
cases in-network processing is required)
−Solutions are often application-dependent.
3. 3
Beyond conventional sensors
− Human as a sensor (citizen sensors)
− e.g. tweeting real world data and/or events
− Software sensors
− e.g. software agents/services generating/representing
data
[Figure: example tweets reporting “Road block, A3”, prompting the suggestion of a different route]
4. Internet of Things: The story so far
P. Barnaghi, A. Sheth, “The Internet of Things: The story so far”, IEEE IoT Newsletter, September 2014.
5. 5
The benefits of data processing in IoT
− Turn 12 terabytes of Tweets created each day into sentiment
analysis related to different events/occurrences or relate them to
products and services.
− Convert (billions of) smart meter readings to better predict and
balance power consumption.
− Analyze thousands of traffic, pollution, weather, congestion, public
transport and event sensory data to provide better traffic and
smart city management.
− Monitor patients, elderly care and much more…
− Requires: real-time, reliable, efficient (for low power and resource
limited nodes), and scalable solutions.
Partially adapted from: What is Big Data?, IBM
6. Not just Volume…
… but also Data Dynamicity:
How can we efficiently deal with:
- Large amounts of (heterogeneous/distributed) data?
- Both static and dynamic data?
- In a re-usable, modular, flexible way?
- Integrate different types of data
- Provide hypotheses and create more context-aware solutions
Adapted from: M. Hauswirth. A. Mileo, Insight, National University of Ireland, Galway.
7. Data Volume
[Figure: the IoT landscape — AnyThing, AnyPlace, AnyTime — built on networking and communication, and services and applications, with cross-cutting concerns of security, reliability, trust and privacy, and of societal impacts, economic values and viability]
11. Problem #1
Data: We seem to have lots of it…
Real World Data: it is always difficult to get
(silos, format, privacy, business interests or
lack of interest!...)
12. Problem #2
Data: interoperability and metadata
frameworks…
Real World Data: there are solutions for
service based (RESTful) access, meta-data/
semantic representation frameworks
(e.g. W3C SSN, HyperCat,…) but none of
them is yet widely adopted.
13. Problem #3
Data: quality, reliability…
Real World Data: data can be noisy, crowd-sourced
data can be inaccurate or
contradictory, and there can be delays in
accessing/processing the data…
14. Problem #4
Data: having too much data and using
analytics tools alone won’t solve the
problem…
Real World Data: in addition to the HPC
issues, we need new methods/solutions that
can provide real-time analysis of dynamic,
variable quality and multi-modal streams…
15. Problem #5
Data: abstraction, discovering the
associations…
Real World Data: co-occurrence vs.
causation; we need hypotheses, background
knowledge,…
After all, data is not what we are really
after…
17. Sometimes it’s even better if we have:
(near) real-time
linked open data
Streams
18. or even better than that if we have:
(near) real-time
linked open data
Streams
+
meta-data (semantic annotations)
+
Adaptable and scalable analytics tools
+
Sufficient background knowledge
20. Current focus on Big Data
− Emphasis on power of data and data mining
solutions
− Technology solutions to handle large volumes of
data; e.g. Hadoop, NoSQL, Graph Databases, …
− Trying to find patterns and trends from large
volumes of data…
21. Myths About Big Data
− Big Data is only about massive data volume
− Big Data means Hadoop
− Big Data means unstructured data
− If we have enough data we can draw conclusions
(enough here often means massive amounts)
− NoSQL means No SQL
− It is about increasing computational power and
taking more data and running data mining
algorithms.
Some of the items are adapted from: Brian Gentile, http://mashable.com/2012/06/19/big-data-myths/
22. What happens if we only focus on data
− Number of burgers consumed per day.
− Number of cats outside.
− Number of people checking their Facebook
account.
− What insight would you draw?
23. It is also important to note what type of
problems we expect to solve.
28. 101 Smart City Use-case Scenarios
http://www.ict-citypulse.eu/page/content/smart-city-use-cases-and-requirements
29. 29
Data alone is not enough
− Domain knowledge
− Machine interpretable meta-data
− Delivery, sharing and representation services
− Query, discovery, aggregation services
− Publish, subscribe, notification, and access
interfaces/services
− More open solutions for innovation and citizen participation
− Efficient feedback and control mechanisms
− Social network and social system analysis
− In cities, interactions with people and social systems are the
key.
33. 33
Technical Challenges
− Discovery: finding appropriate device and data sources
− Access: Availability and (open) access to data resources
and data
− Search: querying for data
− Integration: dealing with heterogeneous devices, networks
and data
− Large-scale data mining, adaptable learning and efficient
computing and processing
− Interpretation: translating data to knowledge that can be
used by people and applications
− Scalability: dealing with large numbers of devices and a
myriad of data and the computational complexity of
interpreting the data.
34. 34
IoT Data Access
− Publish/Subscribe (long-term/short-term)
− Ad-hoc query
− The typical types of data request for sensory data:
− Query based on
− ID (resource/service) – for known resources
− Location
− Type
− Time – requests for fresh data or historical data;
− One of the above + a range [+ Unit of Measurement]
− Type/Location/Time + A combination of Quality of Information
attributes
− An entity of interest (a feature of an entity of interest)
− Complex Data Types (e.g. pollution data could be a combination of
different types)
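The query patterns above can be sketched as a simple filter over sensor readings; the record fields and function names below are illustrative, not from any specific IoT platform:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    source_id: str   # resource/service ID (for known resources)
    rtype: str       # observation type, e.g. "temperature"
    geohash: str     # location encoded as a geohash string
    time_ms: int     # UTC timestamp in milliseconds
    value: float

def query(readings, rtype=None, area_prefix=None, t_from=None, t_to=None):
    """Ad-hoc query by type, location (geohash prefix) and time range."""
    out = []
    for r in readings:
        if rtype is not None and r.rtype != rtype:
            continue
        if area_prefix is not None and not r.geohash.startswith(area_prefix):
            continue
        if t_from is not None and r.time_ms < t_from:
            continue
        if t_to is not None and r.time_ms > t_to:
            continue
        out.append(r)
    return out

readings = [
    Reading("s1", "temperature", "gcpe6z", 1000, 21.0),
    Reading("s2", "humidity", "gcpe6z", 2000, 40.0),
    Reading("s3", "temperature", "u10hb", 1500, 19.0),
]
hits = query(readings, rtype="temperature", area_prefix="gcpe")
```

A publish/subscribe interface would evaluate the same predicate against each new reading instead of against an archive.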
35. 35
IoT Data in the Cloud
Image courtesy: http://images.mathrubhumi.com
http://www.anacostiaws.org/userfiles/image/Blog-Photos/river2.jpg
36. Comparing IoT data streams with
conventional multimedia streams
Source: P. Barnaghi, W. Wang, L. Dong, C. Wang, "A Linked-data Model for Semantic Sensor Streams", in the Proc. of
IEEE International Conference on Internet of Things (iThings 2013), August 2013.
37. 37
Describing IoT Data: An example
Time: UTC
Location: #GeoHash
Type: #Hash (an ontology for common types)
Value: [DataType, Value]
Link to QoI metadata: URI
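The scheme above can be read as a minimal annotated observation record; the field names and URIs below are illustrative, not a standard vocabulary:

```python
# A hypothetical annotated observation following the slide's scheme:
# time as UTC epoch milliseconds, location as a geohash, type as a
# URI from an ontology of common types, value as a [DataType, Value]
# pair, and a URI linking to quality-of-information (QoI) metadata.
observation = {
    "time": 1414576800000,                               # UTC, epoch ms
    "location": "gcpe6zjeffgp",                          # geohash
    "type": "http://example.org/types#AirTemperature",   # illustrative URI
    "value": ["xsd:double", 21.5],                       # [DataType, Value]
    "qoi": "http://example.org/qoi/sensor42",            # link to QoI metadata
}
```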
38. 38
Observation and Measurement Value
− Location: GeoHash
− Time: UTC time
− Value: standard XSD data type
UTC time (in Java): the time is represented as the number of milliseconds elapsed
since the epoch (00:00:00 GMT on January 1, 1970).
39. 39
GeoHashing
− For example Guildford: lat: 51.235401 and
long: -0.574600 can be hashed as: gcpe6zjeffgp
− It can be used as:
− A unique identifier
− represent point data as hash string
− It uses Base 32 encoding and bit interleaving
− It’s used for geo-tagging (and is a symmetric technique)
− Places close to each other will have similar prefixes (string similarity)
− Limitations:
− We could have Geohash codes with no common prefix
− Edge case (locations close to each other but on opposite sides of the
Equator)
− A meridian point (line of longitude)
40. 40
GeoHash Example
Sample locations on a Google Map and their
equivalent geohash strings;
- close locations have similar prefixes
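The encoding described above (interval bisection, bit interleaving, Base 32) can be sketched in a few lines; this is a minimal re-implementation for illustration, not production code:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat, lon, length=12):
    """Encode a lat/lon pair by repeatedly bisecting the longitude and
    latitude intervals and interleaving the resulting bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, code = [], []
    even = True  # the bit stream starts with a longitude bit
    while len(code) < length:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
        if len(bits) == 5:  # every 5 bits become one Base 32 symbol
            code.append(BASE32[int("".join(map(str, bits)), 2)])
            bits = []
    return "".join(code)

# Guildford (lat 51.235401, lon -0.574600) hashes to a string
# beginning with "gcpe6"; nearby points share a common prefix.
gh = geohash(51.235401, -0.574600, 12)
```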
41. IoT Data Processing
[Figure: multiple WSNs and network-enabled devices perform data collection and processing within the networks; network services/storage and processing units provide data/service access at the application level, supported by data discovery and service/resource discovery]
42. 42
In-network processing
− Mobile Ad-hoc Networks can be seen as a set of nodes that
deliver bits from one end to the other;
− WSNs, on the other end, are expected to provide
information, not necessarily original bits
− Gives additional options
− e.g., manipulate or process the data in the network
− Main example: aggregation
− Applying aggregation functions to obtain an average value of
measurement data
− Typical functions: minimum, maximum, average, sum, …
− Functions that are not amenable: median
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless
Sensor Networks, chapter 3, Wiley, 2005.
43. 43
In-network processing
− Depending on application, more sophisticated processing of
data can take place within the network
− Example edge detection: locally exchange raw data with
neighboring nodes, compute edges, only communicate edge
description to far away data sinks
− Example tracking/angle detection of signal source: Conceive of
sensor nodes as a distributed microphone array, use it to
compute the angle of a single source, only communicate this
angle, not all the raw data
− Exploit temporal and spatial correlation
− Observed signals might vary only slowly in time; so no need to
transmit all data at full rate all the time
− Signals of neighboring nodes are often quite similar; only try
to transmit differences (details a bit complicated, see later)
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless
Sensor Networks, chapter 3, Wiley, 2005.
44. Data Aggregation
− Computing a smaller representation of a number of data items (or
messages) that is extracted from all the individual data items.
− For example computing min/max or mean of sensor data.
− More advanced aggregation solutions could use approximation
techniques to transform high-dimensionality data to lower-dimensionality
abstractions/representations.
− The aggregated data can be smaller in size and represent
patterns/abstractions; so in multi-hop networks, nodes can
receive data from other nodes and aggregate it before
forwarding it to a sink or gateway.
− Or the aggregation can happen on a sink/gateway node.
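The aggregation described above can be sketched with composable partial aggregates: min, max, sum and count merge exactly across nodes, while a function such as the median does not (a sketch; the names are illustrative, not from the slides' source):

```python
def node_summary(values):
    """Partial aggregate a node can forward instead of raw readings."""
    return {"min": min(values), "max": max(values),
            "sum": sum(values), "count": len(values)}

def merge(a, b):
    """Combine two partial aggregates received from child nodes."""
    return {"min": min(a["min"], b["min"]),
            "max": max(a["max"], b["max"]),
            "sum": a["sum"] + b["sum"],
            "count": a["count"] + b["count"]}

# Two nodes aggregate locally; a parent node merges the summaries:
s1 = node_summary([3.0, 5.0, 4.0])
s2 = node_summary([10.0, 2.0])
total = merge(s1, s2)
mean = total["sum"] / total["count"]  # exact global mean
# The median is not amenable: knowing the medians of [3, 5, 4] and
# [10, 2] is not enough to recover the global median.
```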
45. Aggregation example
− Reduce number of transmitted bits/packets by applying an
aggregation function in the network
[Figure: a multi-hop tree in which each leaf node reports the value 1; intermediate nodes sum the values received from their children (e.g. 3, then 6) before forwarding, so far fewer bits reach the sink than if every reading were relayed]
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless
Sensor Networks, chapter 3, Wiley, 2005.
46. Efficacy of an aggregation mechanism
− Accuracy: difference between the resulting value or representation
and the original data
− Some solutions can be lossless or lossy depending on the
applied techniques.
− Completeness: the percentage of all the data items that are
included in the computation of the aggregated data.
− Latency: delay time to compute and report the aggregated data
− Computational foot-print; complexity;
− Overhead: the main advantage of the aggregation is reducing the
size of the data representation;
− Aggregation functions can trade-off between accuracy, latency and
overhead;
− Aggregation should happen close to the source.
47. Publish/Subscribe
− Achieved by publish/subscribe paradigm
− Idea: Entities can publish data under certain names
− Entities can subscribe to updates of such named data
− Conceptually: Implemented by a software bus
− Software bus stores subscriptions and published data; names are used
as filters; subscribers are notified when the values of named data
change
[Figure: Publisher 1 and Publisher 2 publish onto a software bus; Subscriber 1, Subscriber 2 and Subscriber 3 receive notifications from it]
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless
Sensor Networks, chapter 12, Wiley, 2005.
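The software-bus idea above can be sketched as a minimal in-memory publish/subscribe broker (the class and method names are illustrative; a real system such as MQTT adds QoS levels, retained messages and network transport):

```python
from collections import defaultdict

class SoftwareBus:
    """Minimal publish/subscribe bus: names act as filters, and all
    subscribers to a name are notified when a value is published."""
    def __init__(self):
        self._subscribers = defaultdict(list)  # name -> list of callbacks

    def subscribe(self, name, callback):
        self._subscribers[name].append(callback)

    def publish(self, name, value):
        for callback in self._subscribers[name]:
            callback(name, value)

# A sensor publishes under a name; two decoupled consumers subscribe.
bus = SoftwareBus()
received = []
bus.subscribe("room1/temperature", lambda n, v: received.append((n, v)))
bus.subscribe("room1/temperature", lambda n, v: received.append(("copy", v)))
bus.publish("room1/temperature", 21.5)
# received == [("room1/temperature", 21.5), ("copy", 21.5)]
```

Note that the publisher never learns who (if anyone) consumed the value; that decoupling is the point of the pattern.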
48. MQTT Pub/Sub Protocol
− MQ Telemetry Transport (MQTT) is a lightweight broker-based
publish/subscribe messaging protocol.
− MQTT is designed to be open, simple, lightweight and easy to
implement.
− These characteristics make MQTT ideal for use in constrained
environments, for example in IoT.
−Where the network is expensive, has low bandwidth or is
unreliable
−When run on an embedded device with limited processor or
memory resources;
− A small transport overhead (the fixed-length header is just 2
bytes), and protocol exchanges minimised to reduce network
traffic
− MQTT was developed by Andy Stanford-Clark of IBM, and Arlen
Nipper of Cirrus Link Solutions.
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
49. MQTT
− It supports publish/subscribe message pattern to provide one-to-many
message distribution and decoupling of applications
− A messaging transport that is agnostic to the content of the payload
− The use of TCP/IP to provide basic network connectivity
− Three qualities of service for message delivery:
− "At most once", where messages are delivered according to the best
efforts of the underlying TCP/IP network. Message loss or duplication
can occur.
− This level could be used, for example, with ambient sensor data
where it does not matter if an individual reading is lost as the next
one will be published soon after.
− "At least once", where messages are assured to arrive but duplicates
may occur.
− "Exactly once", where messages are assured to arrive exactly once. This
level could be used, for example, with billing systems where duplicate
or lost messages could lead to incorrect charges being applied.
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
50. MQTT Message Format
− The message header for each MQTT command message contains a
fixed header.
− Some messages also require a variable header and a payload.
− The format for each part of the message header:
— DUP: Duplicate delivery
— QoS: Quality of Service
— RETAIN: RETAIN flag
—This flag is only used on PUBLISH messages. When a client
sends a PUBLISH to a server, if the Retain flag is set (1), the
server should hold on to the message after it has been delivered
to the current subscribers.
—This allows new subscribers to instantly receive data with the
retained, or Last Known Good, value.
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
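Following the MQTT v3.1 specification cited above, the first header byte packs the message type with the DUP, QoS and RETAIN flags, and the remaining length uses a variable encoding with 7 bits per byte; a small sketch:

```python
PUBLISH = 3  # MQTT message type code for PUBLISH

def remaining_length(n):
    """Encode the remaining length: 7 bits per byte, with the top bit
    set while more length bytes follow."""
    out = []
    while True:
        digit, n = n % 128, n // 128
        if n > 0:
            digit |= 0x80  # continuation bit
        out.append(digit)
        if n == 0:
            return bytes(out)

def fixed_header(msg_type, dup, qos, retain, length):
    """First byte: message type (4 bits) | DUP | QoS (2 bits) | RETAIN."""
    byte1 = (msg_type << 4) | (dup << 3) | (qos << 1) | retain
    return bytes([byte1]) + remaining_length(length)

# A PUBLISH with QoS 1, RETAIN set, and 321 bytes still to come:
hdr = fixed_header(PUBLISH, dup=0, qos=1, retain=1, length=321)
# hdr == b'\x33\xc1\x02' (0x33 = PUBLISH | QoS 1 | RETAIN; 321 -> 0xC1 0x02)
```

For payloads under 128 bytes the header is exactly the two bytes the slide mentions: one flags byte plus one length byte.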
51. Sensor Data as time-series data
− The sensor data (or IoT data in general) can be seen as time-series
data.
− A sensor stream refers to a source that provides sensor data over
time.
− The data can be sampled/collected at a rate (can be also variable)
and is sent as a series of values.
− Over time, there will be a large number of data items collected.
− Using time-series processing techniques can help to reduce the
size of the data that is communicated;
− Let’s remember, communication can consume more energy
than computation;
52. Sensor Data as time-series data
− Different representation methods introduced for time-series
data can be applied.
− The goal is to reduce the dimensionality (and size) of the data, to
find patterns, detect anomalies, to query similar data;
− Dimensionality reduction techniques transform a data series with
n items to a representation with w items where w < n.
− These functions are often lossy in comparison with solutions like
normal compression that preserve all the data.
− One of these techniques is called Symbolic Aggregation
Approximation (SAX).
− SAX was originally proposed for symbolic representation of time-series
data; it can be also used for symbolic representation of
time-series sensor measurements.
− The computational foot-print of SAX is low; so it can also be used
as an in-network processing technique.
53. 53
In-network processing
Using Symbolic Aggregate Approximation (SAX)
SAX Pattern (blue) with word length of 20 and a vocabulary of 10 symbols
over the original sensor time-series data (green)
Source: P. Barnaghi, F. Ganz, C. Henson, A. Sheth, "Computing Perception from Sensor Data",
in Proc. of the IEEE Sensors 2012, Oct. 2012.
fggfffhfffffgjhghfff
jfhiggfffhfffffgjhgi
fggfffhfffffgjhghfff
54. Symbolic Aggregate Approximation
(SAX)
− SAX transforms time-series data into symbolic string
representations.
− Symbolic Aggregate approXimation was proposed by Jessica Lin et
al. at the University of California, Riverside;
− http://www.cs.ucr.edu/~eamonn/SAX.htm
− It extends Piecewise Aggregate Approximation (PAA) symbolic
representation approach.
− The SAX algorithm is interesting for in-network processing in WSN
because of its simplicity and low computational complexity.
− SAX provides reasonable sensitivity and selectivity in representing
the data.
− The use of a symbolic representation makes it possible to use
several other algorithms and techniques to process/utilise SAX
representations such as hashing, pattern matching, suffix trees
etc.
55. Processing Steps in SAX
− SAX transforms a time-series X of length n into a string of
arbitrary length w (typically w << n), using an alphabet A of size
a > 2.
− The SAX algorithm has two main steps:
− Transforming the original time-series into a PAA representation
− Converting the PAA intermediate representation into a string.
− The string representations can be used for pattern matching,
distance measurements, outlier detection, etc.
56. Piecewise Aggregate Approximation
− In PAA, to reduce the time series from n dimensions to w
dimensions, the data is divided into w equal sized “frames.”
− The mean value of the data falling within a frame is calculated and
a vector of these values becomes the data-reduced
representation.
− Before applying PAA, each time series needs to be
normalised to have a mean of zero and a standard deviation of
one.
− The reason is to avoid comparing time series with different
offsets and amplitudes;
Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
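The normalisation and frame-averaging steps above can be written directly; this sketch assumes, for simplicity, that w divides n (the general PAA algorithm handles frames that straddle sample boundaries):

```python
from statistics import mean, pstdev

def znormalise(series):
    """Normalise to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]

def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal
    frames becomes one element of the reduced representation."""
    frame = len(series) // w  # assumes w divides n, unlike general PAA
    return [mean(series[i * frame:(i + 1) * frame]) for i in range(w)]

data = [0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
z = znormalise(data)  # [-1, -1, -1, -1, 1, 1, 1, 1]
reduced = paa(z, 2)   # 8 points reduced to 2 frame means: [-1.0, 1.0]
```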
60. PAA to SAX Conversion
− Conversion of the PAA representation of a time-series into
SAX is based on producing symbols that correspond to the
time-series features with equal probability.
− The SAX developers have shown that time-series which are
normalised (zero mean and standard deviation of 1) follow
a Normal distribution (Gaussian distribution).
− The SAX method introduces breakpoints that divide the
PAA representation into equal-probability sections and assigns an
alphabet symbol to each section.
− The breakpoints are defined using the Normal inverse
cumulative distribution function.
61. Breakpoints in SAX
− “Breakpoints: breakpoints are a sorted list of numbers B =
β1, …, βa-1 such that the area under a N(0,1) Gaussian curve
from βi to βi+1 = 1/a”.
Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
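The breakpoints in this definition can be computed directly from the inverse CDF of the standard Normal distribution. A minimal sketch using Python's standard library (the function name is illustrative):

```python
from statistics import NormalDist

def sax_breakpoints(a):
    """Return the a-1 breakpoints that divide the area under a
    N(0,1) Gaussian curve into a equiprobable regions."""
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [nd.inv_cdf(i / a) for i in range(1, a)]
```

For an alphabet size of 4, `sax_breakpoints(4)` returns approximately `[-0.67, 0.0, 0.67]`, matching the cut lines used on the following slides.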
62. Alphabet representation in SAX
− Let’s assume that we will use a 4-symbol alphabet: a, b, c, d
− As shown in the table in the previous slide, the cut lines for
this alphabet (also shown as the thin red lines on the plot
below) will be { -0.67, 0, 0.67 }
Source: JMOTIF Time series mining, http://code.google.com/p/jmotif/wiki/SAX
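Assuming the cut lines { -0.67, 0, 0.67 } above, mapping each PAA value to its alphabet symbol is a simple binary search over the sorted breakpoints. A minimal sketch (function name is illustrative, not from the SAX paper):

```python
from bisect import bisect_left

def paa_to_sax(paa_values, breakpoints, alphabet="abcd"):
    """Map each PAA mean to the symbol of the region it falls in.

    breakpoints must be sorted; region i covers values between
    breakpoints[i-1] and breakpoints[i].
    """
    return "".join(alphabet[bisect_left(breakpoints, v)] for v in paa_values)
```

For example, `paa_to_sax([-1.0, -0.5, 0.5, 1.0], [-0.67, 0.0, 0.67])` produces the string `"abcd"`.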
64. Features of the SAX technique
− SAX divides time-series data into equal-sized segments and then
creates a string representation for each segment.
− The SAX patterns provide the low-level abstractions that are used
to create higher-level interpretations of the underlying data.
− The string representation produced by SAX makes it possible to
compare patterns using a specific type of string similarity
function.
65. A sample data processing framework
[Figure: a layered processing pipeline. Raw sensor data streams (or annotated data) from PIR, light, and temperature sensors are converted into SAX patterns (e.g. "fggfffhfffffgjhghfff"). Intelligent processing turns these observations into thematic data (low-level abstractions such as Attendance, Phone, Bright, Hot/Cold Temperature), together with spatial data (e.g. office room BA0121) and temporal data (day-time/night-time) extracted from the descriptions. Combining these with domain knowledge, intelligent processing/reasoning derives high-level abstractions and information/knowledge such as "ongoing meeting" or "window has been left open".]
72. A discovery method in the IoT
[Figure: queries are formulated as [#location || #type || time] and resolved to a discovery ID by a discovery/DHT server. Data hypercubes indexed by #location and #type sit behind gateways on the core network, backed by a data repository of archived data; logical connections, network connections, and data flows link the components.]
73. An example: a discovery method in the IoT
S. A. Hoseinitabatabaei, P. Barnaghi, C. Wang, R. Tafazolli, L. Dong, "A Distributed Data Discovery Mechanism for the Internet of Things", 2014.
74. An example: a discovery method in the IoT
77. Equilibrium in a transient and non-uniform world
[Figure: equilibrium diagram with states A, B, C, D.]
Image source for equilibrium diagram: John D. Hey, The University of York.
78. Social data analysis: A case study
Slides: Pramod Anantharam, Kno.e.sis Centre,
Wright State University
79. Social data analysis: A case study
− Are people talking about city infrastructure on
twitter?
− Can we extract city infrastructure related events
from twitter?
− How can we leverage event and location
knowledge bases for event extraction?
− How well can we extract city events?
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
80. Do people talk about city
infrastructure on Twitter?
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University. 80
81. Some challenges in extracting
events from Tweets
− There is no well-accepted definition of “events related to a
city”.
− Tweets are short (140 characters) and their informal nature
makes them hard to analyze
− Entity, location, time, and type of the event
− Multiple reports of the same event and sparse reports of
other events (biased sample)
− Numbers don’t necessarily indicate intensity
− Validation of the solution is hard due to the open
domain nature of the problem
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University. 81
82. Event extraction techniques
− N-grams + Regression
− Text analysis to extract uni- and bi-grams (event markers)
− Feature selection to select best possible event markers
− Apply regression to predict P(Y|X) where Y is the target (rainfall)
and X is the input (event marker).
− Clustering
− Create event clusters incrementally over time
− Identify clusters of interest based on their relevance (manual
inspection)
− Granularity remains at the tweet/cluster level (tweets are
assigned to clusters of interest)
− Sequence Labeling (CRFs)
− Text analysis to extract features such as named entities and
POS (part-of-speech) tags
− Each event indicator is modeled as a mixture of event types that
are latent variables
− Each type corresponds to a distribution over named entities
(labels assigned to event types by manual inspection) and other
features
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
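As an illustration of the first technique above (a hedged sketch, not the authors' implementation), extracting uni- and bi-gram event markers from tweets can be done with simple tokenisation and counting; feature selection and regression would then operate on these counts:

```python
import re
from collections import Counter

def extract_ngrams(tweets):
    """Count uni- and bi-grams across a collection of tweets as
    candidate event markers (illustrative sketch)."""
    counts = Counter()
    for tweet in tweets:
        # crude tokenisation keeping hashtags and mentions
        tokens = re.findall(r"[a-z#@]+", tweet.lower())
        counts.update(tokens)                   # uni-grams
        counts.update(zip(tokens, tokens[1:]))  # bi-grams
    return counts
```

The most frequent n-grams (after feature selection) become the input X for a regression model predicting P(Y|X), e.g. with rainfall as the target Y.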
83. City event extraction
− Event extraction should be open domain (no a
priori event types) with event metadata.
− Incorporate background knowledge related to city
related events e.g., 511.org hierarchy, SCRIBE
ontology, city location names.
− Assess the intensity of an event using content
and network cues.
− Robust to noise, informal nature, and variability
of data.
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
84. N-grams + Regression
− Open domain: works best when there is a reference
corpus to extract n-grams
− Event metadata: cannot distinguish between entities
and hence hard to extract event metadata
− Background knowledge: incorporating domain
vocabulary (e.g., subsumption) is not natural
− Event intensity: regression maps the event indicators to
some quantified values
− Robustness: quite robust if there is a reference corpus
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
85. Clustering
− Open domain: works well for domains with no a priori
knowledge of events (may need human inspection)
− Background knowledge: incorporating domain
vocabulary is not natural
− Event intensity: not captured
− Robustness: quite robust for twitter data with enough
data for each cluster
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
86. Sequence Labeling (CRFs)
− Open domain: works well for domains with no a priori
knowledge of events (may need human inspection)
− Event metadata: event metadata extraction is captured
naturally with the named entities
− Background knowledge: incorporating domain
vocabulary is quite natural
− Event intensity: part-of-speech tag may indirectly
capture intensity
− Robustness: with a deeper model for capturing context,
quite robust for twitter data
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
87. City event extraction solution architecture
[Figure: tweets from a city pass through a city event annotation stage (POS tagging, city infrastructure tagging, hybrid NER + event term extraction) followed by a city event extraction stage (impact assessment, temporal estimation, and aggregation via geohashing), producing extracted events. Background knowledge sources feeding the pipeline include OSM locations, the SCRIBE ontology, and the 511.org hierarchy.]
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
88. City event extraction - Key insights
a) Space: events reported within a grid cell (gi ∈ G,
where G is the set of all grid cells in a city) at a certain
time are most likely reporting the same event
b) Time: events reported within a time Δt in a grid
gi are most likely to be reporting the same event
c) Theme: events with similar entities within a grid
gi and time Δt are most likely reporting the
same event
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
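The three insights above suggest grouping reports by (grid cell, time window, entity). A minimal sketch of such an aggregation rule (data layout and function name are assumptions, not the authors' code):

```python
from collections import defaultdict

def aggregate_events(reports, dt=3600):
    """Group event reports sharing a geohash cell, a time window of
    dt seconds, and an entity -- the space/time/theme heuristic.

    'reports' is a list of (geohash_cell, timestamp, entity) tuples.
    """
    events = defaultdict(list)
    for cell, ts, entity in reports:
        # reports in the same cell, time bucket, and theme are
        # treated as one event
        events[(cell, int(ts // dt), entity)].append((cell, ts, entity))
    return events
```

Two "flood" reports in cell 5H34 within the same hour collapse into one event, while a report hours later starts a new one.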
89. City event extraction – Geohashing
[Figure: hierarchical spatial structure of geohash for representing locations with variable precision. A bounding box of about 0.6 × 0.38 miles, with corners at (37.7545166015625, -122.420654296875), (37.7545166015625, -122.40966796875), (37.7490234375, -122.420654296875), and (37.7490234375, -122.40966796875), contains the point (37.74933, -122.4106711); recursively subdividing each cell yields the location string 5H34.]
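The core of geohashing is alternating bisection of the longitude and latitude ranges; nearby points share a bit prefix. A simplified sketch (the real geohash packs these bits into base-32 characters, omitted here):

```python
def geohash_bits(lat, lon, precision_bits=20):
    """Interleave longitude/latitude bisection bits
    (even positions: longitude, odd positions: latitude)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    for i in range(precision_bits):
        if i % 2 == 0:  # even bit: refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:           # odd bit: refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
    return bits
```

Points within the same grid cell (such as tweets reported inside one bounding box) share the leading bits, so prefix length controls spatial precision.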
90. Implementation
− City event annotation
− Automated creation of training data
− Annotation task (our CRF model vs. baseline CRF model)
− City event extraction
− Use aggregation algorithm for event extraction
− Extracted events vs. ground truth
− Dataset (Aug – Nov 2013) ~ 8 GB of data on disk
− Over 8 million tweets
− Over 162 million sensor data points
− 311 active events and 170 scheduled events
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
92. Conclusions
− A primary goal of interconnecting devices and
collecting/processing data from them is to create situation
awareness and enable applications, machines, and human
users to better understand their surrounding environments.
− The understanding of a situation, or context, potentially
enables services and applications to make intelligent
decisions and to respond to the dynamics of their
environments.
− Dynamicity, energy efficiency, multi-modality,
heterogeneity and volume are among the key challenges.
− We need to design adaptable and scalable solutions that
combine knowledge and data engineering (semantics),
background knowledge (ontologies) and machine learning
techniques.
93. Acknowledgements
− Some parts of the content are adapted from:
− Holger Karl, Andreas Willig, Protocols and Architectures for
Wireless Sensor Networks, chapters 3 and 12, Wiley, 2005.
− Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu.
2003. A symbolic representation of time series, with
implications for streaming algorithms. In Proceedings of the
8th ACM SIGMOD workshop on Research issues in data mining
and knowledge discovery (DMKD '03). ACM, New York, NY,
USA, 2-11.
− JMOTIF Time series mining,
http://code.google.com/p/jmotif/wiki/SAX
94. Q&A
− Payam Barnaghi, University of
Surrey/EU FP7 CityPulse Project
http://www.ict-citypulse.eu/
@pbarnaghi
p.barnaghi@surrey.ac.uk
Editor's Notes
Publishing and uploading the data to Cloud platforms is not enough; the IoT data needs to be pre-processed and processed; the aim is to extract meaningful knowledge from the data and to use this knowledge and information for control and monitoring and/or for decision making (automated or human controlled).
Classification is another method – but it is not feasible for the open domain problem we are interested in.
CRFs – Conditional Random Fields
Sequence Labeling
Consider deeper features such as named entities, POS tags
Captures context of mention of entities within a tweet
Allows natural incorporation of domain knowledge
Intensity vs. density => tweets, page-rank analogous to importance of event, importance of events
Importance of event based on location + time
Contrasting density
Noise
Flood - people affected vs. people conversing
N-grams + Regression
Cannot provide event metadata extraction
May need reference corpus for creating n-grams
Needs good quality tweets if no reference corpus
Clustering
Does not capture event metadata
Too coarse grained (tweet level)
May not be able to identify location, time etc.
Sequence Labeling (CRFs)
can be captured as a latent variable conditioned on the occurrence of some entities
Distance computed using the formula:
dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2( sqrt(a), sqrt(1-a) )
d = R * c (where R is the radius of the Earth)
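The haversine formula in the note above translates directly to Python (a sketch; R here is the Earth's radius in miles, an assumed constant):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, following the
    formula in the notes: a, c, then d = R * c."""
    R = 3958.8  # mean Earth radius in miles (assumed value)
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c
```

Applied to the two top corners of the geohash bounding box on slide 89, it reproduces the ~0.6 mile width shown there.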
Found the box for the tweet!
37.7545166015625, -122.420654296875
37.7545166015625, -122.40966796875
37.7490234375, -122.40966796875
37.7490234375, -122.420654296875