Bridging Sensor Data Streams and Human Knowledge
MATTIA ZENI
Trento, 12th December 2017
Department of Information Engineering and Computer Science,
University of Trento
mattia.zeni.1@unitn.it
The work compiled in this thesis has been partially supported by the
European Union’s Horizon 2020 (H2020) research and innovation
programme under grant agreement n. 732194, QROWD - Because Big
Motivating Example
The user: the core is always the user, and, as it happens, his name is Fausto.
The Scenario
What? He has an appointment for lunch.
Where? The restaurant is in the city center.
With whom? The lunch is with an old friend.
Standard Solution
Event in the Agenda: the user can add the event to the agenda.
Notifications: a notification pops up to alert the user.
Manual Configuration: "Alert me 30 minutes before the event".
What is the role of the technology?
Smartphone: empower users with applications.
Motivating Example (part 2)
A more complicated and realistic one
External factors: adaptation to weather conditions.
Personality traits: Fausto is always late.
Personal goals: walking more than 10,000 steps per day.
Unexpected events: adaptation to unexpected events.
Adaptation
What should the role of technology be?
Smartphone
User state of affairs: the machine should be aware of the user's situation.
Adaptation and personalization: the machine should adapt to this state of affairs and react accordingly.
Personalized services: the machine helps the human through useful services.
Empower users with applications
The context
The context is the key to adaptation
A machine should be aware of the user's context at any moment in time, so that it can understand and use it.
The Problem
The two representations
Humans vs Machines
Human - Context: the user has a complete view of her contextual information. She can understand what she perceives, thanks to her intelligence, habits and routines.
Machine – Sensor data: the machine is great at dealing with huge amounts of information in a short period of time, but it cannot understand it.
Localization service
Where is the user?
GPS: the position is represented as coordinates, 46° 04’N, 11° 07’E.
Lookup: the user is in a building.
Smarter Lookup: the user is in a university.
Wrong Lookup: the user is at the toilet or in the forest.
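Why a coordinate does not identify a single place can be shown with a minimal sketch, assuming a small hypothetical gazetteer of places given by their centroids and a GPS fix with a known accuracy radius. Every place whose centroid lies within the accuracy radius of the fix is a plausible answer, so the lookup naturally returns a ranked set of candidates rather than one place. The place names, coordinates and radius are illustrative only.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class Place:
    name: str   # hypothetical place name
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_places(lat, lon, accuracy_m, gazetteer):
    """Return the names of all places compatible with the fix, nearest first."""
    scored = [(haversine_m(lat, lon, p.lat, p.lon), p) for p in gazetteer]
    scored.sort(key=lambda pair: pair[0])
    return [p.name for dist, p in scored if dist <= accuracy_m]


# Hypothetical gazetteer around 46° 04’N, 11° 07’E (about 46.0667, 11.1167).
GAZETTEER = [
    Place("building A", 46.0668, 11.1168),  # about 14 m from the fix
    Place("office B", 46.0667, 11.1170),    # about 23 m from the fix
    Place("park C", 46.0700, 11.1200),      # about 450 m from the fix
]

candidates = candidate_places(46.0667, 11.1167, 50.0, GAZETTEER)
```

With a 50 m accuracy radius the fix is compatible with both "building A" and "office B"; only context (who the user is, what she is doing) can pick the right one.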
Machine – Sensor data
Sensors to knowledge is not a one-to-one mapping.
[Figure: a scale from "Too bad" to "Too good", marked at 18%, 50% and 73%.]
Human - Context
How does the user tackle this situation?
I work in the University.
I’m in a meeting.
I’m in my office.
I work for the University of Trento (TN).
I’m in room OFEK.
I work in via Sommarive, 4.
The Solution
The Methodology
Based on an interdisciplinary approach
Semantic Web: the user knowledge must be represented so that the machine can use it.
Sociology: the user is the core of the system, so we must take her needs into account.
Machine Learning: the sensor streaming data need to be collected and analyzed to extract insights.
Semantics
Represent the user knowledge
What is an ontology? A set of representational primitives with which to model a domain of knowledge [Gruber, 1993].
What are the advantages? Ontologies provide structured data that are general, compositional, reusable and incremental, allowing us to operate in an open domain.
How can it be used? We modelled the user and the world using the Entity Centric approach.
[Gruber, 1993] T. R. Gruber, “A translation approach to portable ontology specifications”, Knowledge Acquisition, 5(2), 199–220, 1993.
Plugging domains incrementally
The way to adaptation and real time re-configuration
[Figure: an example entity graph centered on ME and organized by the questions Who, What and Where. Entities and attributes: Person (Name: Fausto, Role: Professor); Person (Name: Enrico, Role: PhD student); Meeting (Name: 9.30 meeting); Smartphone (Nr: 1234, Sensors: Wi-Fi, Bluetooth, GPS); Board (Nr: 567, Markers: 2); Thermostat (Model: XYZ, Temperature: 25 °C); OpenSpace (Name: OS2, Department: DISI); University (Name: University of Trento, Location: Trento). Relations: PartOf, In, Near, Attend, Own, HasActivity, With.]
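A minimal sketch of how such an entity graph could be held in memory, assuming entities are typed records with attribute dictionaries and relations are typed (subject, relation, object) edges. Entity types, attributes and relation names are taken from the figure; which entities each relation connects is an illustrative assumption, since the original edges are not recoverable. Plugging in a new domain then means adding entities and edges without touching the existing ones.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    eid: str
    etype: str
    attributes: dict = field(default_factory=dict)


@dataclass
class EntityGraph:
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (subject_id, relation, object_id)

    def add(self, entity):
        self.entities[entity.eid] = entity
        return entity

    def relate(self, subj, rel, obj):
        # Both endpoints must already exist: domains are plugged in incrementally.
        if subj not in self.entities or obj not in self.entities:
            raise KeyError(f"unknown entity in ({subj}, {rel}, {obj})")
        self.relations.append((subj, rel, obj))

    def objects(self, subj, rel):
        return [self.entities[o] for s, r, o in self.relations if s == subj and r == rel]

    def follow(self, subj, *rels):
        """Follow a chain of relations, e.g. Attend, then In, then PartOf."""
        frontier = [subj]
        for rel in rels:
            frontier = [e.eid for s in frontier for e in self.objects(s, rel)]
        return [self.entities[eid] for eid in frontier]


g = EntityGraph()
g.add(Entity("me", "Person", {"Name": "Fausto", "Role": "Professor"}))
g.add(Entity("enrico", "Person", {"Name": "Enrico", "Role": "PhD student"}))
g.add(Entity("meeting", "Meeting", {"Name": "9.30 meeting"}))
g.add(Entity("os2", "OpenSpace", {"Name": "OS2", "Department": "DISI"}))
g.add(Entity("unitn", "University", {"Name": "University of Trento", "Location": "Trento"}))
g.add(Entity("phone", "Smartphone", {"Nr": 1234, "Sensors": ["Wi-Fi", "Bluetooth", "GPS"]}))

# Hypothetical edges, chosen only for illustration.
g.relate("me", "Own", "phone")
g.relate("me", "Attend", "meeting")
g.relate("enrico", "Attend", "meeting")
g.relate("meeting", "In", "os2")
g.relate("os2", "PartOf", "unitn")

# "Where is ME?" answered by walking the graph.
where = g.follow("me", "Attend", "In", "PartOf")
```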
Machine Learning
Process the sensor streams according to the contextual information
The task of bridging the semantic gap can be tackled because it can be divided into multiple, less complex, modular and compositional micro-tasks.
Each micro-task uses streaming sensor data collected from the user’s mobile device and processes it in a context-aware manner.
The result of each micro-task is mapped to the values of the attributes in the entities composing the context.
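The decomposition into micro-tasks can be sketched as follows, assuming each micro-task is a function that takes a window of sensor samples plus the current context and returns a value for exactly one attribute of one context entity (or nothing, when it has no evidence). The two micro-tasks, the sensor field names (`wifi_bssids`, `steps`), the known access point table and the step threshold are all hypothetical; real micro-tasks would typically wrap trained models rather than hand-written rules.

```python
from typing import Callable, Dict, List, Optional

Sample = Dict[str, object]   # one sensor reading
Context = Dict[str, object]  # attribute values already known about the user
MicroTask = Callable[[List[Sample], Context], Optional[object]]


def where_from_wifi(window: List[Sample], context: Context) -> Optional[object]:
    """Map observed Wi-Fi access points to a known place (hypothetical table)."""
    known = context.get("known_access_points", {})
    for sample in window:
        for bssid in sample.get("wifi_bssids", []):
            if bssid in known:
                return known[bssid]
    return None  # abstain: no evidence in this window


def what_from_steps(window: List[Sample], context: Context) -> Optional[object]:
    """Crude activity guess from step counts (illustrative threshold)."""
    steps = sum(sample.get("steps", 0) for sample in window)
    return "walking" if steps > 60 else "still"


# Each micro-task fills exactly one (entity, attribute) slot of the context.
MICRO_TASKS: Dict[tuple, MicroTask] = {
    ("ME", "Where"): where_from_wifi,
    ("ME", "What"): what_from_steps,
}


def update_context(window: List[Sample], context: Context) -> Context:
    """Run every micro-task and map its result to an attribute value."""
    updated = dict(context)  # do not mutate the caller's context
    for (entity, attribute), task in MICRO_TASKS.items():
        value = task(window, context)
        if value is not None:
            updated[f"{entity}.{attribute}"] = value
    return updated
```

Because each micro-task only owns one attribute, tasks can be added, replaced or retrained independently, which is what makes the decomposition modular and compositional.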
Knowledge Generation
Micro-task
[Figure: the knowledge generation pipeline of a micro-task. Managing Sensor Data: Sensor Data Pre-processing of the sensor data items $1, 2, \dots, N$, kept in the Streaming Data Storage (SB). Inferring the Context: ML and C blocks for each of the dimensions WI, WE, WA and WO, yielding $\mathit{concept}_i^{WI}$, $\mathit{concept}_i^{WE}$, $\mathit{concept}_i^{WA}$ and $\mathit{concept}_i^{WO}$. Context Modelling: Context Aggregation with the Contextual Information (KB/EB). Involving the User: Real Time Annotations, i.e. the answer sets $\{T_i, \mathit{ans}_i^{Q1}, \mathit{ans}_i^{Q2}, \mathit{ans}_i^{Q3}, \mathit{ans}_i^{Q4}\}$ for $i = 1, \dots, N$. Addressing Human Reliability: Quantifying Biases through $(\Delta QA, \Delta A)$.]
The user must be kept in the loop
Human-centered computing
The user is the most important source of information.
The user needs to collaborate with and help the machine. To do so, we need to leverage the tools and approaches that Sociology uses to understand human behavior.
The most important of these tools are Time Diaries, used as annotations for the sensor data.
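Using time-diary answers as labels requires aligning them with the sensor streams. A minimal sketch, assuming each annotation is a tuple $(T_i, (\mathit{ans}_i^{Q1}, \dots, \mathit{ans}_i^{Q4}))$ as in the figure and, as a hypothetical convention, that the answers describe the slot of length `slot_s` ending at $T_i$ (30 minutes by default here). Each sensor sample gets the answers of the slot that covers it, found by binary search; uncovered samples stay unlabelled.

```python
import bisect
from typing import List, Optional, Tuple

Answers = Tuple[str, str, str, str]     # answers to Q1..Q4
Annotation = Tuple[float, Answers]      # (T_i in seconds, answers)


def label_samples(sample_times: List[float],
                  annotations: List[Annotation],
                  slot_s: float = 1800.0) -> List[Optional[Answers]]:
    """Attach to each sample the answers of the annotation whose slot covers it.

    Assumption: annotation i describes the interval (T_i - slot_s, T_i].
    Samples not covered by any slot get None (unlabelled).
    """
    annotations = sorted(annotations, key=lambda a: a[0])
    times = [t for t, _ in annotations]
    labels: List[Optional[Answers]] = []
    for ts in sample_times:
        i = bisect.bisect_left(times, ts)  # first annotation with T_i >= ts
        if i < len(times) and times[i] - ts < slot_s:
            labels.append(annotations[i][1])
        else:
            labels.append(None)
    return labels


# Hypothetical answers, for illustration only.
diary = [
    (1800.0, ("s1q1", "s1q2", "s1q3", "s1q4")),
    (3600.0, ("s2q1", "s2q2", "s2q3", "s2q4")),
]
labels = label_samples([0.0, 100.0, 1800.0, 1801.0, 5000.0], diary)
```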
The Reference Architecture
The Reference Architecture
The main subsystems
Data Acquisition and Management Subsystem: in charge of the acquisition and management of both sensor streaming data and user knowledge.
Knowledge Generation Subsystem: implements the procedures that analyze the streaming data and map it to the attribute values in the user knowledge.
Knowledge Exploitation Subsystem: exploits the generated knowledge to provide services to the user, so that she can improve her general quality of life.
Data Acquisition and Management Subsystem
How to collect, store, manage and access the data?
[Figure: two ingestion pipelines. Streaming data: API → Unzip → Processing → Schema Matching → Insert into the Streaming Data Storage. Entity data: API → Disambiguation → Linking → Schema Matching → Insert into the Entity Data Storage.]
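The streaming branch (API, Unzip, Processing, Schema Matching, Insert) can be sketched as a chain of small functions. This assumes, hypothetically, that the client uploads a zip archive with one CSV file per sensor stream, that schema matching means renaming client columns to the storage schema through a mapping table (`SCHEMA_MAP`), and that a plain list stands in for the Streaming Data Storage; none of these details come from the original system.

```python
import csv
import io
import zipfile

# Hypothetical mapping from client column names to the storage schema.
SCHEMA_MAP = {"ts": "timestamp", "val": "value", "uid": "user_id"}


def unzip(payload: bytes):
    """Yield (stream_name, text) for every CSV file in the uploaded archive."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for name in archive.namelist():
            if name.endswith(".csv"):
                yield name[:-4], archive.read(name).decode("utf-8")


def process(stream: str, text: str):
    """Parse CSV rows and tag each one with the stream it came from."""
    for row in csv.DictReader(io.StringIO(text)):
        row["stream"] = stream
        yield row


def schema_match(row: dict) -> dict:
    """Rename known columns to the storage schema; keep unknown ones as-is."""
    return {SCHEMA_MAP.get(k, k): v for k, v in row.items()}


def ingest(payload: bytes, storage: list) -> int:
    """API entry point: unzip -> process -> schema matching -> insert."""
    count = 0
    for stream, text in unzip(payload):
        for row in process(stream, text):
            storage.append(schema_match(row))
            count += 1
    return count


# Build a tiny upload in memory and ingest it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("accelerometer.csv", "ts,val,uid\n1,0.5,u1\n2,0.7,u1\n")
store: list = []
inserted = ingest(buf.getvalue(), store)
```

Keeping each stage a separate function mirrors the modular blocks in the figure: a stage can be swapped (for example a different schema matcher) without touching the others.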
Knowledge Generation Subsystem
How to exploit the different data sources to generate meaningful knowledge?
[Figure components: Internal Database Access, Micro-tasks Repository, Scheduler, Trigger, Data Scientist, User Feedback.]
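How the Scheduler and the Trigger might drive the Micro-tasks Repository can be sketched, assuming that each registered micro-task declares either a fixed period (scheduled execution) or a sensor stream whose new data triggers it. The repository API, task names and periods are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RegisteredTask:
    name: str
    run: Callable[[float], str]
    period_s: Optional[float] = None   # scheduled: run every period_s seconds
    on_stream: Optional[str] = None    # triggered: run when this stream gets data
    last_run: float = float("-inf")


class MicroTaskRepository:
    def __init__(self) -> None:
        self.tasks: List[RegisteredTask] = []

    def register(self, task: RegisteredTask) -> None:
        self.tasks.append(task)

    def tick(self, now: float) -> List[str]:
        """Scheduler: run every periodic task whose period has elapsed."""
        results = []
        for t in self.tasks:
            if t.period_s is not None and now - t.last_run >= t.period_s:
                t.last_run = now
                results.append(t.run(now))
        return results

    def on_new_data(self, stream: str, now: float) -> List[str]:
        """Trigger: run every task subscribed to the stream that received data."""
        results = []
        for t in self.tasks:
            if t.on_stream == stream:
                t.last_run = now
                results.append(t.run(now))
        return results


repo = MicroTaskRepository()
repo.register(RegisteredTask("daily_summary", lambda now: f"summary@{now}", period_s=86400.0))
repo.register(RegisteredTask("where_from_gps", lambda now: f"where@{now}", on_stream="gps"))
```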
Knowledge Exploitation Subsystem
How to use the generated knowledge to provide services to the user?
[Figure: Data reaches Internal Services and External Services through a Privacy Layer providing Anonymization, Access Control and Data Subscription.]
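The three functions of the Privacy Layer could fit together as in this sketch, assuming keyed pseudonymization of user identifiers (HMAC-SHA256) for anonymization, a per-subscriber allow-list of streams for access control, and subscriptions as the only channel through which services receive data. Note that pseudonymization is weaker than full anonymization. The key, subscriber names and stream names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Set

SECRET_KEY = b"replace-with-a-secret-key"  # hypothetical; keep out of source control


def pseudonymize(user_id: str) -> str:
    """Keyed hash: stable per user, not reversible without the key."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


class PrivacyLayer:
    def __init__(self) -> None:
        self.grants: Dict[str, Set[str]] = {}         # subscriber -> allowed streams
        self.subscriptions: Dict[str, Set[str]] = {}  # subscriber -> subscribed streams

    def grant(self, subscriber: str, stream: str) -> None:
        self.grants.setdefault(subscriber, set()).add(stream)

    def subscribe(self, subscriber: str, stream: str) -> None:
        """Access control: only granted streams can be subscribed to."""
        if stream not in self.grants.get(subscriber, set()):
            raise PermissionError(f"{subscriber} may not access {stream}")
        self.subscriptions.setdefault(subscriber, set()).add(stream)

    def deliver(self, record: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
        """Send a pseudonymized copy of the record to every subscriber of its stream."""
        safe = dict(record, user_id=pseudonymize(record["user_id"]))
        return {s: [safe] for s, streams in self.subscriptions.items()
                if record["stream"] in streams}


layer = PrivacyLayer()
layer.grant("mobility_service", "gps")
layer.subscribe("mobility_service", "gps")
out = layer.deliver({"user_id": "u1", "stream": "gps", "value": "46.07,11.12"})
```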
The Stream Base System
Requirements
Instantiation of the reference architecture
Performance: the system should operate in real time with a high number of users, processing huge amounts of data simultaneously.
Modularity: the system should be composed of multiple autonomous blocks that collaborate to provide complex functionality.
Scalability: the system load in such a scenario is unpredictable, so the system should scale up (and down) automatically as needed.
High Availability: the system must always be available, independently of its load.
Fast Deployment: the modularity of the system should not affect its deployment; having multiple components should not make them more difficult to deploy.
Technologies
State-of-the-art software solutions
Modularity: Docker is the world’s leading software containerization platform.
Scalability, Availability, Fast deployment: Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
Scalability: the Apache Cassandra database is scalable and highly available without compromising performance.
Performance: Apache Spark is a fast and general engine for large-scale data processing.
i-Log
How can we collect data from the users?
Configurability and adaptability: the application adapts to different use cases and to different smartphones.
Collect user annotations as Time Diaries: the user is the only one who can correctly annotate her daily-life situations, so that the machine can learn how to recognize them in the future.
Provide services: i-Log is also the platform that will provide services back to the users, generated from the knowledge extracted from their data.
Collect streaming sensor data: the application can collect up to 39 streams of sensor data from the user's smartphone, generating up to 1 GB per day.
Real Life Exploitation
Municipality of Trento
Help decision makers apply policies on sustainable mobility for the public good
Understanding the modal split, i.e. how people move around the city across transport modes, can help decision makers take decisions with positive consequences for society as a whole:
A law reducing cars by 1%
Moving parking lots outside the city to reduce congestion
Improving public transport
Varying parking fees
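The modal split itself is a simple share computation. A minimal sketch, assuming trips have already been inferred from the sensor data and labelled with a transport mode; the mode names and trips are hypothetical. Shares can be computed either by number of trips or by distance travelled, and the two can differ considerably.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Trip = Tuple[str, float]  # (transport mode, distance in km)


def modal_split(trips: List[Trip], by_distance: bool = False) -> Dict[str, float]:
    """Share of each transport mode, by trip count or by distance (sums to 1)."""
    totals: Dict[str, float] = defaultdict(float)
    for mode, km in trips:
        totals[mode] += km if by_distance else 1.0
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {mode: value / grand_total for mode, value in totals.items()}


trips = [("car", 5.0), ("bus", 3.0), ("walk", 1.0), ("car", 11.0)]
by_trips = modal_split(trips)                   # car 50%, bus 25%, walk 25%
by_km = modal_split(trips, by_distance=True)    # car 80%, bus 15%, walk 5%
```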
Not a one-man job
The team
Elisa Gobbi
Ph.D. Student @ Department of Sociology and Social Research
Sociology expert for participant sampling and behavioral analysis
Enrico Bignotti
Ph.D. Student @ Department of Information Engineering and
Computer Science
Philosophy expert for context modelling and annotation
University of Trento
SmartUnitn: the first real-life adoption, on Unitn students
Understand how students’ time allocation affects their academic performance
110 GB of data, 72 participants, 2 weeks
First real-life experiment
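As a rough sanity check on the data volume, the i-Log figure of up to 1 GB per user per day gives an upper bound for this experiment that the collected 110 GB stays well below. A back-of-the-envelope computation, assuming every participant recorded on all 14 days:

$$
72~\text{participants} \times 14~\text{days} \times 1~\text{GB/day} = 1008~\text{GB},
\qquad
\frac{110~\text{GB}}{72 \times 14~\text{participant-days}} \approx 0.11~\text{GB per participant-day}.
$$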
F. Giunchiglia, M. Zeni, E. Gobbi, E. Bignotti and I. Bison, “Mobile Social Media Usage and Academic Performance”, accepted in Computers in Human Behavior.
Students of scientific faculties are affected by social networks while they study.
Female humanities students are affected by instant messaging while they study.
Help students improve their academic performance
Distribution of Social Media Usage
Quantifying user interactions
[Figure panels: S̄ of social media apps while studying; D̄ of social media apps while studying; S̄ of social media apps while attending lessons; D̄ of social media apps while attending lessons.]
S̄: the average number of sessions of social media app usage;
D̄: the average time of social media app usage (in seconds);
Ī: the average time between consecutive app usages (in seconds).
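The three indicators can be computed from app usage logs. A minimal sketch, assuming each social media session is a (start, end) pair in seconds that falls inside one time-diary slot labelled with the activity of interest (e.g. studying or attending lessons), that S̄ averages the session count over those slots, D̄ averages session duration, and Ī averages the gap between the end of one session and the start of the next within a slot. The exact aggregation used in the original study may differ.

```python
from statistics import mean
from typing import Dict, List, Tuple

Session = Tuple[float, float]  # (start_s, end_s) of one social media app session


def usage_indicators(slots: List[List[Session]]) -> Dict[str, float]:
    """S (sessions per slot), D (session duration) and I (inter-session gap), averaged.

    slots: one list of sessions per time-diary slot of a given activity;
    a slot with no social media usage is an empty list.
    """
    counts = [len(s) for s in slots]
    durations = [end - start for s in slots for start, end in s]
    gaps: List[float] = []
    for s in slots:
        ordered = sorted(s)  # by start time
        gaps += [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    return {
        "S": mean(counts) if counts else 0.0,
        "D": mean(durations) if durations else 0.0,
        "I": mean(gaps) if gaps else 0.0,
    }


# Hypothetical "studying" slots: two sessions, one session, no usage.
studying = [[(0.0, 30.0), (90.0, 120.0)], [(10.0, 70.0)], []]
indicators = usage_indicators(studying)  # S = 1, D = 40 s, I = 60 s
```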
Quantifying biases in users’ answers
Inconsistencies in user answering behavior
[Figure: distribution of the number of clusters for “Home”, with the mean marked by a red dashed vertical line: 3.47 clusters; 2.81 clusters for answers with ∆QA lower than 30.4 minutes; 2.38 clusters for answers with DurationA greater than 8.8 s.]
Consistency
Quality of the data
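The filtering behind the last two panels can be sketched as follows. This assumes that ∆QA is the delay between a question being sent and being answered and that DurationA is the time spent answering, and it counts clusters of “Home” answer locations with a simple leader clustering at a fixed radius. Only the two thresholds (30.4 minutes and 8.8 s) come from the slide; the clustering method, the 100 m radius and the record fields are assumptions, not the procedure used in the study.

```python
import math
from typing import List, NamedTuple, Optional


class Answer(NamedTuple):
    label: str           # e.g. "Home"
    delta_qa_min: float  # ∆QA: minutes between question and answer (assumed meaning)
    duration_s: float    # DurationA: seconds spent answering (assumed meaning)
    lat: float
    lon: float


def distance_m(a: Answer, b: Answer) -> float:
    """Equirectangular approximation, adequate at city scale."""
    x = math.radians(b.lon - a.lon) * math.cos(math.radians((a.lat + b.lat) / 2))
    y = math.radians(b.lat - a.lat)
    return 6_371_000.0 * math.hypot(x, y)


def count_clusters(points: List[Answer], radius_m: float = 100.0) -> int:
    """Leader clustering: a point starts a new cluster unless a leader is within radius_m."""
    leaders: List[Answer] = []
    for p in points:
        if not any(distance_m(p, q) <= radius_m for q in leaders):
            leaders.append(p)
    return len(leaders)


def home_clusters(answers: List[Answer],
                  max_delta_qa_min: Optional[float] = None,
                  min_duration_s: Optional[float] = None,
                  radius_m: float = 100.0) -> int:
    """Number of distinct "Home" locations after the optional reliability filters."""
    kept = [a for a in answers
            if a.label == "Home"
            and (max_delta_qa_min is None or a.delta_qa_min < max_delta_qa_min)
            and (min_duration_s is None or a.duration_s > min_duration_s)]
    return count_clusters(kept, radius_m)


# Hypothetical answers of one participant.
answers = [
    Answer("Home", 5.0, 12.0, 46.0667, 11.1167),
    Answer("Home", 10.0, 15.0, 46.0668, 11.1167),  # ~11 m away: same place
    Answer("Home", 45.0, 20.0, 46.0800, 11.1300),  # late answer, far away
    Answer("Home", 3.0, 2.0, 46.0500, 11.1000),    # hasty answer, far away
    Answer("Work", 4.0, 10.0, 46.0667, 11.1500),
]
```

A reliable participant should have a single “Home” cluster; dropping late or hasty answers moves the count towards one, which is the consistency effect the figure reports.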
What’s next?
Already scheduled use cases
[Figure: timeline of already scheduled use cases over Dec, Jan, Mar and Apr on a 0–1000 scale, with labels February 2018, March 2018 and April 2018.]
Want a thesis on this?
Look no further
Android / iOS: develop the iOS version of our mobile application.
Machine Learning: define algorithms to process the huge amount of data we are collecting.
Software Developer: develop state-of-the-art solutions for our backend infrastructure.
Improve the current application we use to collect data and provide services.
mattia.zeni@disi.unitn.it
Thank you
