The document summarizes Mattia Zeni's thesis work on bridging sensor data streams from smartphones with human contextual knowledge to power personalized services. It describes a methodology that uses Semantic Web ontologies to represent user knowledge, machine learning to analyze sensor data and extract insights, and time diary annotations to involve users. A reference architecture is proposed with subsystems for data management, knowledge generation, and knowledge exploitation. The work has been applied to help the Municipality of Trento with transportation planning and to analyze university students' academic performance based on mobile app usage data.
1. Bridging Sensor Data Streams and Human Knowledge
MATTIA ZENI
Trento, 12th December 2017
Department of Information Engineering and Computer Science,
University of Trento
mattia.zeni.1@unitn.it
The work compiled in this thesis has been partially supported by the European Union’s Horizon 2020 (H2020) research and innovation programme under grant agreement n. 732194, QROWD - Because Big
3. Motivating Example
The Scenario
The user: the core is always the user, and by chance his name is Fausto
What? He has an appointment for lunch
Where? The restaurant is in the city center
With whom? The lunch is with an old friend
4. Standard Solution
Event in the Agenda: the user can add the event in the Agenda
Notifications: a notification will pop up to alert the user
Manual Configuration: "Alert me 30 minutes before the event"
What is the role of the technology?
Smartphone: empower users with applications
5. Motivating Example (part 2)
A more complicated and realistic one
External factors: adaptation to weather conditions
Personality traits: Fausto is always late
Personal goals: doing more than 10,000 steps per day
Unexpected events: adaptation to unexpected events
6. Adaptation
What should the role of technology be?
Smartphone
User state of affairs: the machine should be aware of the user's situation
Adaptation and personalization: the machine should adapt to this state of affairs and react accordingly
Personalized services: the machine helps the human through useful services
7. The context
The context is the key to adaptation
A machine should be aware of the user context at any moment in time so that it can understand and use it
9. The two representations
Humans vs Machines
Human - Context: the user has a complete view of her contextual information. She can understand what she perceives, thanks to her intelligence, habits and routines.
Machine – Sensor data: the machine is great at dealing with huge amounts of information in a short period of time, but it cannot understand it.
10. Localization service
Where is the user?
GPS: position represented as coordinates, 46° 04’N, 11° 07’E
Lookup: the user is in a building
Smarter lookup: the user is in a University
Wrong lookup: the user is at the toilet or in the forest
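The step from a GPS fix to a "lookup" can be made concrete with a minimal sketch. It assumes a small hand-made gazetteer (the place names, coordinates and the 200 m radius are hypothetical, not taken from the thesis), converts the slide's degrees-and-minutes coordinates to decimal degrees, and returns the nearest known place using the haversine great-circle distance.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def dms_to_decimal(degrees: int, minutes: int, hemisphere: str) -> float:
    """Convert degrees and minutes plus a hemisphere letter (N/S/E/W) to decimal degrees."""
    value = degrees + minutes / 60
    return -value if hemisphere in ("S", "W") else value


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Place:
    name: str
    kind: str  # coarse category, e.g. "university", "forest"
    lat: float
    lon: float


# Hypothetical gazetteer: names and coordinates are illustrative only.
GAZETTEER = [
    Place("Campus building (hypothetical)", "university", 46.0667, 11.1167),
    Place("Forest trail (hypothetical)", "forest", 46.0900, 11.1500),
]


def lookup(lat: float, lon: float, places: list, max_distance_m: float = 200.0) -> Optional[Place]:
    """Return the nearest known place within max_distance_m, or None if there is none."""
    nearest = min(places, key=lambda p: haversine_m(lat, lon, p.lat, p.lon), default=None)
    if nearest is None or haversine_m(lat, lon, nearest.lat, nearest.lon) > max_distance_m:
        return None
    return nearest
```

With `lat = dms_to_decimal(46, 4, "N")` and `lon = dms_to_decimal(11, 7, "E")`, `lookup(lat, lon, GAZETTEER)` returns the campus entry. A nearest-place lookup is only as good as its gazetteer and radius, which is one way a "wrong lookup" can arise.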
11. Machine – Sensor data
Sensors to knowledge is not a 1-to-1 mapping
[Figure: scale of example values from 73% ("Too good") through 50% to 18% ("Too bad")]
12. Human - Context
How is this situation tackled by the user?
I work in the University
I’m in a meeting
I’m in my office
I work for the University of Trento
TN
I’m in room OFEK
I work in via Sommarive, 4
14. The Methodology
Based on an interdisciplinary approach
Semantic Web: the user knowledge must be represented in a way that the machine can use
Sociology: the user is the core of the system, so we must take her needs into account
Machine Learning: the sensor streaming data need to be collected and analyzed to extract insights
15. Semantics
Represent the user knowledge
What is an ontology? A set of representational primitives with which to model a domain of knowledge [Gruber, 1993]
What are the advantages? Ontologies provide structured data that are general, compositional, reusable and incremental, allowing us to operate in an open domain.
How can it be used? We modelled the user and the world using the Entity Centric approach.
[Gruber, 1993] T. R. Gruber, “A translation approach to portable ontology specifications”, Knowledge Acquisition, 5(2), 199-220.
16. Plugging domains incrementally
The way to adaptation and real time re-configuration
[Figure: an entity graph. Nodes include ME and the context dimensions Where, What and Who. Entities and their attributes: Person (Name: Fausto, Role: Professor); Person (Name: Enrico, Role: PhD student); Meeting (Name: 9.30 meeting); OpenSpace (Name: OS2, Department: DISI); University (Name: University of Trento, Location: Trento); Smartphone (Nr: 1234, Sensors: Wi-Fi, Bluetooth, GPS); Board (Nr: 567, Markers: 2); Thermostat (Model: XYZ, Temperature: 25 °C). Relations: PartOf, In, Near, Attend, Own, HasActivity, With.]
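The entity graph on this slide can be sketched as typed entities with attribute dictionaries plus a set of (subject, relation, object) triples. This is a minimal in-memory illustration, not the thesis's data model: the class and method names are hypothetical, and which entity each relation connects is an illustrative reading of the diagram. The last two statements show "plugging in" a new domain (the thermostat) purely by adding entities and relations, without modifying anything already stored.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    etype: str        # entity type, e.g. "Person", "Meeting"
    attributes: dict  # attribute name -> value


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)  # entity id -> Entity
    relations: set = field(default_factory=set)   # (subject id, relation, object id)

    def add_entity(self, eid: str, etype: str, **attributes) -> None:
        self.entities[eid] = Entity(etype, attributes)

    def relate(self, subj: str, rel: str, obj: str) -> None:
        # Only connect entities that already exist in the graph.
        if subj not in self.entities or obj not in self.entities:
            raise KeyError("both endpoints must be known entities")
        self.relations.add((subj, rel, obj))

    def neighbours(self, eid: str, rel: str) -> list:
        """Objects reachable from eid through relation rel, sorted by id."""
        return sorted(o for s, r, o in self.relations if s == eid and r == rel)


kg = KnowledgeGraph()
kg.add_entity("fausto", "Person", Name="Fausto", Role="Professor")
kg.add_entity("enrico", "Person", Name="Enrico", Role="PhD student")
kg.add_entity("meeting", "Meeting", Name="9.30 meeting")
kg.add_entity("os2", "OpenSpace", Name="OS2", Department="DISI")
kg.add_entity("unitn", "University", Name="University of Trento", Location="Trento")
kg.relate("fausto", "Attend", "meeting")
kg.relate("enrico", "Attend", "meeting")
kg.relate("meeting", "In", "os2")
kg.relate("os2", "PartOf", "unitn")

# Plugging in a new domain later: only additions, existing entities are untouched.
kg.add_entity("thermostat", "Thermostat", Model="XYZ", Temperature="25 °C")
kg.relate("thermostat", "In", "os2")
```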
17. Machine Learning
Process the sensor streams according to the contextual information
The task of bridging the semantic gap can be divided into multiple, less complex, modular and compositional micro-tasks.
Each micro-task uses streaming sensor data collected from the user’s mobile device and processes it in a context-aware manner.
The result of each micro-task is mapped to the values of the attributes of the entities composing the context.
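A minimal sketch of the micro-task idea, under stated assumptions: a micro-task is modelled as a function that takes a window of sensor readings plus the current context and returns a value for one (entity, attribute) pair, or `None` to abstain. The example task `where_from_wifi`, the `SensorWindow`/`Context` types and the SSIDs are all hypothetical; a rule-based vote stands in for the trained models a real micro-task would typically use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class SensorWindow:
    start: float     # window start, seconds since epoch
    end: float       # window end, seconds since epoch
    wifi_ssids: list  # Wi-Fi network names seen in the window


@dataclass
class Context:
    known_places: dict  # Wi-Fi SSID -> place name, learned from the user
    attributes: dict    # (entity id, attribute) -> current value


# A micro-task maps (window, context) to a value for one (entity, attribute) pair.
MicroTask = Callable[[SensorWindow, Context], Optional[str]]


def where_from_wifi(window: SensorWindow, ctx: Context) -> Optional[str]:
    """Hypothetical micro-task: the known place seen most often among visible SSIDs."""
    votes: Dict[str, int] = {}
    for ssid in window.wifi_ssids:
        place = ctx.known_places.get(ssid)
        if place is not None:
            votes[place] = votes.get(place, 0) + 1
    return max(votes, key=votes.get) if votes else None


def run_micro_tasks(window: SensorWindow, ctx: Context,
                    tasks: Dict[Tuple[str, str], MicroTask]) -> None:
    """Run each micro-task and map its result onto the context attributes."""
    for (entity_id, attribute), task in tasks.items():
        value = task(window, ctx)
        if value is not None:  # keep the previous value if the task abstains
            ctx.attributes[(entity_id, attribute)] = value
```

Because each micro-task only reads a window and writes one attribute, new tasks can be added to the `tasks` dictionary without changing existing ones, which mirrors the modular, compositional decomposition described on the slide.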
18. Knowledge Generation
[Figure: knowledge generation through micro-tasks, organised in four stages.
Managing sensor data: sensor data pre-processing of streams 1, 2, …, N; streaming data storage (SB).
Inferring the context: concept_i for each dimension WI, WE, WA, WO, each with ML and C blocks.
Context modelling: context aggregation; contextual information (KB/EB).
Involving the user: real-time annotations $\{T_i, ans^{i}_{Q1}, ans^{i}_{Q2}, ans^{i}_{Q3}, ans^{i}_{Q4}\}$, $i = 1, \dots, N$, and sensor data; addressing human reliability by quantifying biases $(\Delta_{QA}, \Delta_{A})$.]
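Each annotation is a tuple of a timestamp $T_i$ and four answers. One common way to use such tuples as labels is to pair each with the sensor readings recorded shortly before it. The sketch below assumes readings arrive as time-sorted `(timestamp, value)` pairs and uses a 15-minute look-back window; both the window length and the function name `label_windows` are hypothetical choices, not the thesis's procedure.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Annotation:
    t: float        # T_i: time the user answered, in seconds
    answers: tuple  # (ans_Q1, ans_Q2, ans_Q3, ans_Q4)


def label_windows(readings: list, annotations: list, window_s: float = 900.0) -> list:
    """Pair each annotation with the sensor readings in [T_i - window_s, T_i].

    readings: list of (timestamp, value) pairs sorted by timestamp.
    Returns a list of (values_in_window, answers) training examples.
    """
    times = [ts for ts, _ in readings]
    examples = []
    for ann in annotations:
        lo = bisect_left(times, ann.t - window_s)  # first reading inside the window
        hi = bisect_right(times, ann.t)            # one past the last reading at or before T_i
        values = [v for _, v in readings[lo:hi]]
        if values:  # skip annotations with no sensor coverage
            examples.append((values, ann.answers))
    return examples
```

Binary search keeps each lookup logarithmic in the number of readings, which matters when a single device produces long streams.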
19. The user must be kept in the loop
Human centered computing
The user is the most important source of information
The user needs to collaborate with and help the machine
To do so, we need to leverage the tools and approaches used in Sociology to understand human behavior
The most important tools are Time Diaries, used as annotations for the sensor data
21. The Reference Architecture
The main subsystems
Data Acquisition and Management Subsystem: in charge of the acquisition and management of both sensor streaming data and user knowledge.
Knowledge Generation Subsystem: implements the procedures used to analyze the streaming data and map them to the attribute values in the user knowledge.
Knowledge Exploitation Subsystem: exploits the generated knowledge to provide services to the user, so that she can improve her general quality of life.
22. Data Acquisition and Management Subsystem
How to collect, store, manage and access the data?
[Figure: two ingestion pipelines.
Streaming data: API → UNZIP → PROCESSING → SCHEMA MATCHING → INSERT → Streaming Data Storage
Entity data: API → DISAMBIGUATION → LINKING → SCHEMA MATCHING → INSERT → Entity Data Storage]
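The streaming-data ingestion chain (unzip, processing, schema matching, insert) can be sketched as composable steps. Assumptions, all hypothetical: uploads are gzip-compressed JSON lines, `FIELD_MAP` stands in for the real schema mapping, and an in-memory list stands in for the streaming data storage.

```python
import gzip
import json
from functools import reduce


def compose(*steps):
    """Chain pipeline steps left to right: compose(f, g)(x) == g(f(x))."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


def unzip(payload: bytes) -> str:
    return gzip.decompress(payload).decode("utf-8")


def processing(text: str) -> list:
    # One JSON object per non-empty line (assumed upload format).
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical mapping from device field names to the storage schema.
FIELD_MAP = {"ts": "timestamp", "uid": "user_id", "val": "value"}


def schema_matching(records: list) -> list:
    # Rename known fields; unknown fields pass through unchanged.
    return [{FIELD_MAP.get(k, k): v for k, v in r.items()} for r in records]


STREAMING_STORAGE: list = []  # stand-in for the streaming data storage


def insert(records: list) -> int:
    STREAMING_STORAGE.extend(records)
    return len(records)


streaming_pipeline = compose(unzip, processing, schema_matching, insert)
```

An entity-data pipeline could be composed the same way by swapping `unzip` and `processing` for disambiguation and linking steps; keeping each stage a plain function makes stages easy to test and replace independently.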
23. Knowledge Generation Subsystem
How to exploit the different data sources to generate meaningful knowledge?
[Figure components: Internal Database Access, Micro-tasks repository, Scheduler, Trigger, Data Scientist, User Feedback]
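One possible reading of the scheduler, trigger and micro-task repository components is sketched below: micro-tasks are registered per data stream, and a trigger (here, the arrival of a new batch) makes the scheduler run every task registered for that stream. `MicroTaskRepository`, `Scheduler`, `on_new_data` and the example task `mean_speed` are hypothetical names; the thesis's actual scheduling policy may differ, for example by using periodic triggers.

```python
from collections import defaultdict
from typing import Callable, Optional


class MicroTaskRepository:
    """Hypothetical registry of which micro-tasks react to which data stream."""

    def __init__(self) -> None:
        self._by_stream = defaultdict(list)

    def register(self, stream: str, task: Callable[[list], object]) -> None:
        self._by_stream[stream].append(task)

    def tasks_for(self, stream: str) -> list:
        # .get avoids creating empty entries for unknown streams.
        return list(self._by_stream.get(stream, []))


class Scheduler:
    """Runs the registered micro-tasks when a trigger fires (a new batch arrives)."""

    def __init__(self, repo: MicroTaskRepository) -> None:
        self.repo = repo
        self.results = []  # (stream, task name, result) produced so far

    def on_new_data(self, stream: str, batch: list) -> None:
        for task in self.repo.tasks_for(stream):
            self.results.append((stream, task.__name__, task(batch)))


def mean_speed(batch: list) -> Optional[float]:
    """Example micro-task: average of a batch of speed readings."""
    return sum(batch) / len(batch) if batch else None
```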
24. Knowledge Exploitation Subsystem
How to use the generated knowledge to provide services to the user?
[Figure components: Data; Privacy Layer (Anonymization, Access Control, Data Subscription); Internal Services; External Services]
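A privacy layer between the data and internal or external services could combine the three listed functions roughly as follows. This is a sketch under assumptions: the key, the field names and the `PrivacyLayer` class are hypothetical, and a keyed hash (HMAC-SHA256) gives pseudonymization, which is weaker than full anonymization since other attributes may still identify a person.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-secret-key"  # hypothetical; keep real keys out of source code


def pseudonymize(user_id: str) -> str:
    """Keyed hash of the user id: stable for a given key, not reversible without it."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


class PrivacyLayer:
    def __init__(self) -> None:
        self.subscriptions = {}  # service name -> set of attributes it may read

    def subscribe(self, service: str, attributes: set) -> None:
        """Data subscription: declare which attributes a service may access."""
        self.subscriptions[service] = set(attributes)

    def read(self, service: str, record: dict) -> dict:
        """Access control plus pseudonymization for one record."""
        allowed = self.subscriptions.get(service)
        if allowed is None:
            raise PermissionError(f"{service} has no data subscription")
        out = {k: v for k, v in record.items() if k in allowed}
        out["user"] = pseudonymize(record["user_id"])
        return out
```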
26. Requirements
Instantiation of the reference architecture
Performance: the system should operate in real time with a high number of users, processing huge amounts of data simultaneously
Modularity: the system should be composed of multiple autonomous blocks that collaborate to provide complex functionalities
Scalability: the system load in such a scenario is unpredictable, so it should scale up (and down) automatically depending on the needs
High Availability: the system must always be available, independently of its load
Fast Deployment: the modularity of the system should not affect its deployment; having multiple components should not make them more difficult to deploy
27. Technologies
State-of-the-art software solutions
Modularity: Docker is the world’s leading software containerization platform
Scalability, Availability, Fast deployment: Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications
Scalability: the Apache Cassandra database is scalable and highly available without compromising performance
Performance: Apache Spark is a fast and general engine for large-scale data processing
28. i-Log
How can we collect data from the users?
Configurability and adaptability: the application adapts to different use cases and to different smartphones
Collect user annotations as Time Diaries: the user is the only one who can correctly annotate her daily life situations, so that the machine can learn how to recognize them in the future
Provide services: i-Log is also the platform that will provide services back to the users, generated from the knowledge extracted from their data
Collect streaming sensor data: the application can collect up to 39 streams of sensor data from the user’s smartphone, generating up to 1 GB/day
30. Municipality of Trento
Help decision makers apply policies on sustainable mobility for the public good
The modal split, which shows how people move around the city, can help decision makers take decisions with positive consequences for society as a whole, such as:
Law reducing cars by 1%
Moving parking lots outside the city to decongest it
Improve public transport
Vary parking fees
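The modal split is the share of trips made with each transport mode. A minimal sketch, assuming trips have already been labelled with a mode (for example by a transport-mode micro-task); the mode names in the usage are hypothetical.

```python
from collections import Counter


def modal_split(trip_modes: list) -> dict:
    """Percentage of trips made with each transport mode."""
    counts = Counter(trip_modes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mode: 100.0 * n / total for mode, n in counts.items()}
```

For example, `modal_split(["car", "bus", "car", "bike", "walk", "car", "bus", "walk"])` gives 37.5% car, 25% bus, 25% walk and 12.5% bike. Comparing the car share before and after a policy is one way to check a target such as reducing cars by 1%. Note that this sketch counts trips; a split weighted by distance travelled would need trip lengths as well.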
31. Not a one-man job
The team
Elisa Gobbi, Ph.D. Student @ Department of Sociology and Social Research: sociology expert for participant sampling and behavioral analysis
Enrico Bignotti, Ph.D. Student @ Department of Information Engineering and Computer Science: philosophy expert for context modelling and annotation
32. University of Trento
SmartUnitn - The first real-life adoption on Unitn students
Understand how students’ time allocation affects their academic performance
First real-life experiment: 72 participants, 2 weeks, 110 GB of data
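The SmartUnitn figures and the i-Log "up to 1 GB/day" figure can be put on the same scale with a little arithmetic. This sketch assumes that 110 GB is the total across all participants, that 2 weeks means 14 days, and that GB means decimal gigabytes. The slides state none of these explicitly.

```python
GB = 10**9  # assumes decimal gigabytes; the slides do not state the convention
SECONDS_PER_DAY = 86_400


def bytes_per_participant_day(total_bytes: float, participants: int, days: int) -> float:
    """Average data volume collected per participant per day."""
    return total_bytes / (participants * days)


def mean_rate_bytes_per_s(bytes_per_day: float) -> float:
    """Average upload rate implied by a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


# SmartUnitn: 110 GB in total, 72 participants, 2 weeks (taken as 14 days).
smartunitn_daily = bytes_per_participant_day(110 * GB, participants=72, days=14)
# i-Log upper bound: up to 1 GB/day per user.
ilog_max_rate = mean_rate_bytes_per_s(1 * GB)

print(f"{smartunitn_daily / 1e6:.1f} MB per participant per day")  # 109.1
print(f"{ilog_max_rate / 1e3:.1f} kB/s at 1 GB/day")               # 11.6
```

Under these assumptions, the experiment averaged about 109 MB per participant per day, roughly 11% of the 1 GB/day maximum, and even the maximum corresponds to a modest sustained rate of about 11.6 kB/s per device.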
33. University of Trento
Help students improve their academic performance
Students of scientific faculties are affected by social networks while they study
Female humanities students are affected by instant messaging while they study
F. Giunchiglia, M. Zeni, E. Gobbi, E. Bignotti and I. Bison, “Mobile Social Media Usage and Academic Performance”, accepted in Computers in Human Behavior.
34. Distribution of Social Media Usage
Quantifying user interactions
[Figure panels: S̄ of social media apps while studying; D̄ of social media apps while studying; S̄ of social media apps while attending lessons; D̄ of social media apps while attending lessons]
S̄: the average number of sessions of social media app usage
D̄: the average time of social media app usage (in seconds)
Ī: the average time between app usages (in seconds)
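The three quantities can be computed from app-usage sessions. A minimal sketch, assuming each session is a non-overlapping `(start_s, end_s)` pair for one user in one period (for example, one study episode). Averaging the returned values over users or periods to obtain S̄, D̄ and Ī is an assumed aggregation, since the slide does not say what the averages are taken over.

```python
from statistics import mean
from typing import List, Optional, Tuple


def usage_metrics(sessions: List[Tuple[float, float]]) -> Tuple[int, float, Optional[float]]:
    """Return (number of sessions, mean duration in s, mean gap between sessions in s).

    sessions: (start_s, end_s) pairs for one user and period, assumed non-overlapping.
    The gap is measured from the end of one session to the start of the next.
    """
    ordered = sorted(sessions)
    n = len(ordered)
    if n == 0:
        return 0, 0.0, None
    durations = [end - start for start, end in ordered]
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    return n, mean(durations), (mean(gaps) if gaps else None)
```

For sessions `(0, 60)`, `(100, 130)` and `(200, 290)`, this gives 3 sessions, a mean duration of 60 s and a mean gap of 55 s.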
35. Quantifying biases in users’ answers
Inconsistencies in user answering behavior
[Figure: distribution of the number of clusters for “Home”; in each panel the red dashed vertical line is the mean value]
Distribution of the number of clusters for “Home”: mean of 3.47 clusters
Answers with ∆QA lower than 30.4 minutes: mean of 2.81 clusters
Answers with DurationA greater than 8.8 s: mean of 2.38 clusters
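The two thresholds on this slide can be applied as answer filters before any further analysis. In this sketch, two meanings are assumptions rather than definitions from the slide: ∆QA is taken as the delay between the question being asked and the answer, and DurationA as the time the user spent answering. The filter names `timely` and `deliberate` are hypothetical. The clustering of “Home” locations itself is not shown.

```python
from dataclasses import dataclass

MAX_DELTA_QA_S = 30.4 * 60  # ∆QA threshold from the slide: 30.4 minutes
MIN_DURATION_A_S = 8.8      # DurationA threshold from the slide: 8.8 seconds


@dataclass
class Answer:
    asked_at: float     # when the question was notified, in seconds (assumed meaning)
    answered_at: float  # when the user answered, in seconds
    duration_s: float   # time spent answering (assumed meaning of DurationA)
    label: str


def delta_qa(a: Answer) -> float:
    """Delay between question and answer, in seconds."""
    return a.answered_at - a.asked_at


def timely(answers: list) -> list:
    """Keep answers given within the ∆QA threshold."""
    return [a for a in answers if delta_qa(a) < MAX_DELTA_QA_S]


def deliberate(answers: list) -> list:
    """Keep answers on which the user spent more than the DurationA threshold."""
    return [a for a in answers if a.duration_s > MIN_DURATION_A_S]
```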
37. Already scheduled use cases
What’s next?
[Figure: timeline of scheduled use cases from Dec to Apr (labels February 2018, March 2018, April 2018), with a numeric axis from 0 to 1000]
38. Want a thesis on this?
Look no further
Android: improve the current application we use to collect data and provide services
iOS: develop the iOS version of our mobile application
Machine Learning: define algorithms to process the huge amount of data we are collecting
Software Developer: develop state-of-the-art solutions for our backend infrastructure
mattia.zeni@disi.unitn.it