Invited talk delivered at Paris Descartes Univ., Seminars on Data Analytics, Paris, 15.10.2015. Link: http://www.mi.parisdescartes.fr/~themisp/seminars/2015-10-22-Theodoridis.html
Presented at: JIST2015, Yichang, China
Prototype: http://rc.lodac.nii.ac.jp/rdf4u/
Video: https://www.youtube.com/watch?v=z3roA9-Cp8g
Abstract: Semantic Web and Linked Open Data (LOD) are known to be powerful technologies for knowledge management, and explicit knowledge is expected to be represented in RDF (Resource Description Framework) format, but ordinary users stay away from RDF because of the technical skills it requires. Concept maps and node-link diagrams are known to enhance the learning ability of learners from beginner to advanced level, so RDF graph visualization can be a suitable tool for familiarizing users with Semantic technology. However, an RDF graph generated from an entire query result is hard to read, because it is highly connected, like a hairball, and poorly organized. To make a graph that presents knowledge more readable, this research introduces an approach to sparsify a graph by combining three main functions: graph simplification, triple ranking, and property selection. These functions are mostly based on interpreting RDF data as knowledge units, together with statistical analysis, in order to deliver an easily readable graph to users. A prototype is implemented to demonstrate the suitability and feasibility of the approach. It shows that the simple and flexible graph visualization is easy to read and makes an impression on users. In addition, the attractive tool helps inspire users to realize the advantageous role of linked data in knowledge management.
Entity Search: The Last Decade and the Next (krisztianbalog)
Keynote talk given at the 10th Russian Summer School in Information Retrieval (RuSSIR ’16), Saratov, Russia, August 2016.
Note: part of the work is still under review; those slides are not yet included.
This is Part II of the tutorial "Entity Linking and Retrieval" given at SIGIR 2013 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
Researchers have been interested recently in publishing and linking Humanities datasets following Linked Data principles. This has given rise to some issues that complicate the semantic modelling, comparison, combination and longitudinal analysis of these datasets. In this research proposal we discuss three of these issues: representation round-tripping, concept drift, and contextual knowledge. We advocate an integrated approach to solve them, and present some preliminary results.
Cities are composed of complex systems with physical, cyber, and social components. Current work on extracting and understanding city events mainly relies on technology-enabled infrastructure to observe and record events. In this work, we propose an approach to leverage citizen observations of various city systems and services such as traffic, public transport, water supply, weather, sewage, and public safety as a source of city events. We investigate the feasibility of using such textual streams for extracting city events from annotated text. We formalize the problem of annotating social streams such as microblogs as a sequence labeling problem. We present a novel training data creation process for training sequence labeling models. Our automatic training data creation process utilizes instance-level domain knowledge (e.g., locations in a city, possible event terms). We compare this automated annotation process to a state-of-the-art tool that needs manually created training data and show that it has comparable performance in annotation tasks. An aggregation algorithm is then presented for event extraction from annotated text. We carry out a comprehensive evaluation of the event annotation and event extraction on a real-world dataset consisting of event reports and tweets collected over four months from the San Francisco Bay Area. The evaluation results are promising and provide insights into the utility of social streams for extracting city events.
Euro30 2019 - Benchmarking tree approaches on street data (Fabion Kauker)
By examining the use of algorithms to solve the Prize-Collecting Steiner Tree (PCST) problem, we consider the facets that determine effectiveness, specifically by measuring a number of solution approaches and comparing them on a set of metrics. To understand a solution approach, we must assess why it is useful. Our goal is to determine the effectiveness of Mixed Integer Programming (MIP) and heuristic methods. Using freely available street and address data, a base graph representation is created and then computed on, such that a tree connects every address using the minimum total length of edges from the street network. This is the basis of many approaches to infrastructure problems, including telecommunications network design and costing. The analysis covers methods developed by Hegde et al. 2015, Ljubić et al. 2006, and Teitz et al. 1963. We present a data processing architecture, a concise set of results, and a framework for assessing the facets and trade-offs of a given approach. In this case, the heuristic approaches are shown to have advantages in the simple case but fail when more complex requirements are added. This is where the MIP approach is able to capitalize, albeit at the cost of flexibility due to the strictness and specificity of the modelling.
Mobile information collectors trajectory data warehouse design (IJMIT JOURNAL)
To analyze complex phenomena involving moving objects, a Trajectory Data Warehouse (TDW) seems to be an answer to many recent decision problems in various professions concerned with mobility (physicians, commercial representatives, transporters, ecologists, …). This work aims to make trajectories a first-class concept in the conceptual trajectory data model and to design a TDW that gathers the data resulting from the trajectories of mobile information collectors. These data will be analyzed according to trajectory characteristics for decision-making purposes, such as commercializing new products, opening new businesses, etc.
The talk was given at the InfinIT event "Big Data and Data-Intensive Systems in Denmark", held on 15 January 2014. Read more about the event here: http://infinit.dk/dk/arrangementer/tidligere_arrangementer/big_data_i_danmark.htm
Presentation I used while defending my thesis on MEILI: Multiple Day Travel Behaviour Data Collection, Automation and Analysis.
Thesis available at: http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A1204245&dswid=7962
While the Rio 2016 Olympics are winding down and the final medals are being handed out, we thought we would share a bit of work that was done recently by Rik Van Bruggen to explore a really interesting dataset in Neo4j.
Based on an original public dataset from the UK newspaper The Guardian, Rik extended the medallist dataset to cover over 30,000 Olympians between 1896 and 2012. He created a graph model, loaded the data, and wrote a number of example queries that yielded some very interesting results. Join us for this 30-minute webinar, where we'll take you through this great Olympian graph; you can then take the data for a spin yourself afterwards.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group (“MCG”) expects to see continued demand and an evolving supply, facilitated by the rotation of institutional investment out of offices and into work from home (“WFH”), and by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, allowing the industry to achieve strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, as illustrated by the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has made key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow more than 3.6-fold in value by 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. These are the key areas to focus on for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clearly assigned responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices (vertices with the same in-links) avoids duplicate computations and can thus reduce iteration time. Road networks often contain chains that can be short-circuited before the PageRank computation to improve performance; the final ranks of chain nodes can then be calculated easily. This can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This can reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
On the Management, Analysis and Simulation of our LifeSteps
1. On the management, analysis and simulation of our LifeSteps*
Yannis Theodoridis
Data Science Lab., Univ. Piraeus
www.datastories.org
(*) joint work with N. Pelekis, S. Sideridis, P. Tampakis
Paris Descartes Univ., 15.10.2015
2. Motivation (1/2)
• The field of Mobility Data Management and Exploration* has many success stories to tell
  • Data management: access methods, query processing techniques, DBMS extensions (the so-called Moving Object Databases)
  • Data exploration: trajectory data cubes, data mining techniques (clusters, flocks, convoys, T-patterns, etc.)
  • … all based on the sampled spatio-temporal coordinates (x-, y-, t-axis) of moving objects
(*) N. Pelekis, Y. Theodoridis (2014): Mobility data management and exploration. Springer, New York.
3. Motivation (2/2)
• The new era that is emerging revolves around two keywords: semantic trajectories* and BIG mobility data
  • Semantic trajectories: information about when, where, what, why
  • BIG mobility data: voluminous, streaming, dispersed information about the movement of objects (on land, sea, air)
(*) C. Parent, S. Spaccapietra, C. Renso, G. Andrienko, N. Andrienko, V. Bogorny, M. L. Damiani, A. Gkoulalas-Divanis, J. Macedo, N. Pelekis, Y. Theodoridis, Z. Yan (2013): Semantic trajectories modeling and analysis. ACM Computing Surveys, 45(4).
4. Talk outline
• Background – what is a LifeStep
• On the management and analysis of our LifeSteps
• On the simulation of our LifeSteps
• Relevant publications
6. From raw (GPS-based) to semantic trajectories (Parent et al., 2013)
• Raw mobility (x, y, t) data series, e.g., GPS feeds
• Mobility diaries: meaningful mobility tuples of type <when, where, how, what, why>
[Figure: example mobility diary: Home (breakfast) [~, 8am] → Road (bus) [8am, 9am] → Office (work) [9am, 6pm] → Train (metro) [6pm, 6:30pm] → Market (shopping) [6:30pm, 7:30pm] → Sidewalk (walk) [7:30pm, 8pm] → Home (relax) [8pm, ~]]
7. Semantic trajectories consist of our LifeSteps (Pelekis et al. 2013b)
[Figure: the same example mobility diary, from Home (breakfast) to Home (relax)]
• (Informal) definitions:
  • An Episode / LifeStep is a tuple modeling homogeneous movement behavior (Stop vs. Move)
  • A Semantic Trajectory / Mobility Timeline is a sequence of Episodes / LifeSteps
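One way to encode these definitions in Python is sketched below. The field names, the hour-of-day time unit and the rule that a well-formed timeline alternates STOP and MOVE without gaps are assumptions for illustration, not the data model of the cited work; the example encodes the day shown above, mapping "~" to 0:00 and 24:00.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional


@dataclass(frozen=True)
class LifeStep:
    """One episode of homogeneous movement behaviour."""
    kind: Literal["STOP", "MOVE"]
    start: float                 # hours since midnight (assumed unit)
    end: float
    where: str                   # place for a STOP, way/medium for a MOVE
    what: Optional[str] = None   # activity for a STOP, transport mode for a MOVE


def is_well_formed(timeline: List[LifeStep]) -> bool:
    """Assumed rules: every LifeStep has start <= end, consecutive LifeSteps
    touch in time, and STOPs and MOVEs alternate."""
    if any(step.start > step.end for step in timeline):
        return False
    return all(
        prev.end == cur.start and prev.kind != cur.kind
        for prev, cur in zip(timeline, timeline[1:])
    )


# The example day as a semantic trajectory (mobility timeline).
day = [
    LifeStep("STOP", 0.0, 8.0, "Home", "breakfast"),
    LifeStep("MOVE", 8.0, 9.0, "Road", "bus"),
    LifeStep("STOP", 9.0, 18.0, "Office", "work"),
    LifeStep("MOVE", 18.0, 18.5, "Train", "metro"),
    LifeStep("STOP", 18.5, 19.5, "Market", "shopping"),
    LifeStep("MOVE", 19.5, 20.0, "Sidewalk", "walk"),
    LifeStep("STOP", 20.0, 24.0, "Home", "relax"),
]
```

Dropping the bus LifeStep from `day` makes `is_well_formed` return `False`, since two STOPs would then be adjacent and separated by a time gap.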
9. Challenges
• Drawbacks:
  • A MOD system cannot be used as-is to support semantic trajectories
    • different models, querying and indexing requirements
    • different specs for data analytics
  • Real semantic trajectory data (of appropriate size) are not available nowadays
    • synthetic data generators should be developed (as usual)
• Questions (that motivate our work):
  • Q1: what would a semantic-aware MOD look like?
  • Q2: what would a semantic trajectory data generator look like?
10. Talk outline
• Background – what is a LifeStep
• On the management and analysis of our LifeSteps
• On the simulation of our LifeSteps
• Relevant publications
11. Motivation 1 – specs for a semantic MOD
• (Preliminary step) Activity inference issues
  • From spatial to activity information
  • Where did he/she stop, and for what purpose?
• Management issues
  • Querying semantic MODs
  • Raw vs. semantic layer
• Analytics issues
  • Similarity measure
  • Sampling, clustering, etc.
12. Activity inference
• Activity inference
  • Why did the object stop at a place? To perform which activity?
  • Open linked data are quite useful for this purpose
• Our Baquara methodology (Fileto et al. 2013, 2015)
[Figure: Baquara pipeline. Inputs: textually annotated movement data; ontologies & LOD. Data pre-processing: data cleansing & integration, data compressing, text & KB pre-processing. Linking: spatio-temporal matching, textual matching, refinement & disambiguation. Output: semantically enriched movement data.]
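As an illustration of the spatio-temporal matching step only, the sketch below annotates a stop with the category of the nearest point of interest that lies within a radius and is open during the whole stop. The record layout (`lat`, `lon`, `start_h`, `end_h`, `opens`, `closes`, `category`) and the 100 m radius are assumptions; Baquara itself links movement data to ontologies and LOD, which this sketch does not attempt.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def annotate_stop(stop, pois, radius_m=100.0):
    """Return the category of the nearest POI that lies within radius_m of the
    stop and is open for the whole stop interval, or None if there is none."""
    best_dist, best_cat = None, None
    for poi in pois:
        # Temporal filter: the stop must fall inside the POI's opening hours.
        if not (poi["opens"] <= stop["start_h"] and stop["end_h"] <= poi["closes"]):
            continue
        dist = haversine_m(stop["lat"], stop["lon"], poi["lat"], poi["lon"])
        # Spatial filter plus nearest-candidate selection.
        if dist <= radius_m and (best_dist is None or dist < best_dist):
            best_dist, best_cat = dist, poi["category"]
    return best_cat
```

The same stop location can thus be annotated differently at different times of day, which is the point of matching in space and time together.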
14. Querying Semantic MODs
• Q1 type: queries involving raw data
  • Spatio-temporal (range, NN, …) and trajectory-based queries
• Q2 type: queries involving semantically enriched data. Example:
  • Find those who follow the pattern "home – office – home" Mon-Fri
• Q3 type: cross-over queries. Examples:
  • Find those who cross the city center on their way from the office back home
  • How many of them make long trips (e.g., more than 20 km) on their way from home to the office? Exclude the trajectories that include intermediate stops.
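The last Q3 example mixes both layers: the semantic layer identifies a home-to-office move with no stop in between, and the raw layer supplies its length. A minimal sketch, assuming each object's timeline is a list of alternating STOP/MOVE dictionaries where MOVEs carry their raw points in projected metres (a hypothetical layout, not the query language of the cited system):

```python
import math


def path_length_m(points):
    """Length of a polyline given as (x, y) coordinates in metres."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def long_direct_commuters(timelines, min_km=20.0):
    """Objects having a home -> office MOVE longer than min_km with no
    intermediate STOP, i.e. STOP(home), MOVE, STOP(office) are adjacent."""
    hits = set()
    for obj_id, episodes in timelines.items():
        for a, m, b in zip(episodes, episodes[1:], episodes[2:]):
            if (a["kind"] == "STOP" and a["label"] == "home"
                    and m["kind"] == "MOVE"
                    and b["kind"] == "STOP" and b["label"] == "office"
                    # cross-over part: evaluated on the raw points of the move
                    and path_length_m(m["points"]) > min_km * 1000):
                hits.add(obj_id)
    return hits
```

"How many" is then simply `len(long_direct_commuters(timelines))`; a commute that stops at a gym on the way never produces an adjacent home, move, office triple and is therefore excluded.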
15. Indexing Semantic MODs
• Hybrid indexing of spatio-temporal-textual information: Sem3DR-tree vs. SemTB-tree (Pelekis et al. 2015b)
16. Querying Spatio-Temporal-Textual Patterns (ST2P)
• An ST2P is a (simplified) regular expression consisting of LifeStep objects. Formally:
  Q := <p* | p is either a LifeStep ls_i or a wildcard w ∈ {>, *}>
• Example: Q = [ls1 > ls2 * ls3 > ls4], i.e., timelines starting with ls1, immediately followed by ls2, then followed by any number (*) of LifeSteps, then followed by ls3, and then ending with ls4
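A minimal matcher for this pattern language is sketched below, assuming that `>` requires the next LifeStep to follow immediately, that `*` allows any number of LifeSteps (including none) in between, and that, as in the example, the whole timeline must start with the first and end with the last pattern LifeStep. LifeSteps are compared with a caller-supplied predicate; the function name and the flat-list encoding of the pattern are illustrative.

```python
from functools import lru_cache


def match_st2p(pattern, timeline, same=lambda p, s: p == s):
    """Anchored match of an ST2P pattern against a mobility timeline.

    pattern:  [ls1, op1, ls2, op2, ..., lsk] where each op is
              ">" (the next LifeStep follows immediately) or
              "*" (any number of LifeSteps may come in between).
    timeline: sequence of LifeSteps.
    same:     predicate deciding whether a pattern LifeStep matches a
              timeline LifeStep (plain equality by default).
    """
    steps, ops = pattern[0::2], pattern[1::2]
    if len(steps) != len(ops) + 1 or any(op not in (">", "*") for op in ops):
        raise ValueError("malformed ST2P pattern")
    n = len(timeline)

    @lru_cache(maxsize=None)
    def match_from(k, pos):
        # Pattern step k must match the timeline element at index pos.
        if pos >= n or not same(steps[k], timeline[pos]):
            return False
        if k == len(steps) - 1:
            return pos == n - 1  # the timeline must end with the last step
        if ops[k] == ">":
            return match_from(k + 1, pos + 1)
        # "*": try every later position for the next pattern step.
        return any(match_from(k + 1, nxt) for nxt in range(pos + 1, n))

    return match_from(0, 0)
```

For instance, with the timeline `["home", "bus", "office", "metro", "market", "walk", "home"]`, the pattern `["home", ">", "bus", "*", "walk", ">", "home"]` matches, while `["home", ">", "office"]` does not, because the bus ride sits between the two.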
18. Querying Semantic MODs (cont.)
• Q4 type: selection queries over an SMN
  • "Find Alice's mobility network for her movement during last week; restrict it inside region R; call the resulting network A"
• Q5 type: aggregate queries over an SMN
  • "Find Alice's Facebook friends' mobility network for the same period; roll it up at level 2; call the resulting network B"
  • "Given the above two networks A and B, extract the network where Alice and her friends perform the same activities by following, e.g., similar routes; call the resulting network C"
• Q6 type: cross-over queries using an SMN
  • "Find Alice's raw trajectories conforming to network C"
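SMN is not expanded on these slides; the sketch below assumes it stands for a semantic mobility network, modelled here (hypothetically) as a weighted directed graph whose nodes are places with coordinates and whose edge weights count trips. It illustrates a Q4-style selection restricted to a rectangular region R and a Q5-style roll-up along a place hierarchy; `Place`, `select_region` and `roll_up` are illustrative names, not the framework's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    x: float  # projected coordinates (hypothetical)
    y: float


def select_region(edges, region):
    """Q4-style selection: keep only edges whose endpoints both lie inside
    the rectangle region = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region

    def inside(p):
        return xmin <= p.x <= xmax and ymin <= p.y <= ymax

    return {(u, v): w for (u, v), w in edges.items() if inside(u) and inside(v)}


def roll_up(edges, parent):
    """Q5-style roll-up: replace every place by its parent in a place
    hierarchy (parent: name -> coarser label) and sum the edge weights."""
    out = defaultdict(int)
    for (u, v), w in edges.items():
        out[(parent[u.name], parent[v.name])] += w
    return dict(out)
```

Rolling up can merge both endpoints of an edge into the same coarser label; this sketch keeps such self-loops, which is one of several reasonable choices.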
21. [Figure: system architecture. Components:
• Application interface(s): queries (MD/SMD/OLAP) in; results out (e.g., mobility timeline/network visualization)
• Geodata sources (road network, land usage, POI/ROI, etc.)
• Moving Object Database (MOD): raw (e.g., GPS) mobility storage, MOD index (raw)
• Semantic Mobility Database (SMD): semantic mobility storage, SMD index, SMD cube
• ETL (Extract, Transform, Load) process:
  - raw trajectory cleansing, compression, map-matching, …
  - semantic mobility timeline reconstruction (segmentation into LifeSteps: stops/moves), annotation, …
• Construction / cross-over operators
• Primitive operators: attribute filtering, space / time / trajectory derivatives, semantic OD-matrix, etc.
• Advanced operators: semantic trajectory similarity search, compression, clustering, FP mining, etc.
• Advanced graph-based OLAP operations]
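The ETL step of reconstructing a semantic timeline starts by segmenting raw fixes into stops and moves. Below is a minimal sketch using a speed threshold plus a minimum stop duration; the thresholds (0.5 m/s, 300 s), the `Fix` record and projected metre coordinates are assumptions, and real pipelines typically use more robust stop-detection criteria.

```python
import math
from dataclasses import dataclass


@dataclass
class Fix:
    t: float  # seconds
    x: float  # metres (projected)
    y: float


def segment(fixes, speed_threshold=0.5, min_stop_duration=300.0):
    """Split a raw trajectory into (label, t_start, t_end) episodes,
    label being "STOP" or "MOVE"."""
    if len(fixes) < 2:
        return []
    # 1. Label each leg (fix i -> fix i+1) by its average speed;
    #    zero-duration legs are treated as stationary.
    legs = []
    for a, b in zip(fixes, fixes[1:]):
        dt = b.t - a.t
        speed = math.hypot(b.x - a.x, b.y - a.y) / dt if dt > 0 else 0.0
        legs.append(("STOP" if speed < speed_threshold else "MOVE", a.t, b.t))
    # 2. Merge consecutive legs that carry the same label.
    episodes = []
    for label, t0, t1 in legs:
        if episodes and episodes[-1][0] == label:
            episodes[-1] = (label, episodes[-1][1], t1)
        else:
            episodes.append((label, t0, t1))
    # 3. Relabel stops shorter than min_stop_duration as moves, merging again.
    merged = []
    for label, t0, t1 in episodes:
        if label == "STOP" and t1 - t0 < min_stop_duration:
            label = "MOVE"
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1], t1)
        else:
            merged.append((label, t0, t1))
    return merged
```

Step 3 keeps a brief halt, such as waiting at a traffic light, from becoming a LifeStep of its own; annotation (what happened at each stop) would follow as a separate step.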
22. Talk outline
• Background – what is a LifeStep
• On the management and analysis of our LifeSteps
• On the simulation of our LifeSteps
• Relevant publications
23. Motivation 2 – specs for a semantic trajectory generator
• Lack of a real BIG "synchronized" raw (i.e., GPS logs) and diaries (i.e., annotated trips) dataset
• Simulate different mobility profiles, i.e., popular behaviors of people, e.g.:
  • students on a campus vs. in a downtown building
  • 9-to-5 vs. workaholic employees, etc.
• Results:
  • Hermoupolis (Pelekis et al. 2013a)
  • Hermoupolis-by-example (Pelekis et al. 2015a; 2015c)
24. Hermoupolis - the big picture
• Motivation: lack of real semantic trajectories following various mobility patterns
• Methodology: generate movements w.r.t. mobility profiles
• INPUT: road network, POIs, mobility profiles (~ abstract semantic trajectories)
• OUTPUT: synchronized raw and semantic trajectories
[Figure panels: Flocks, Swarms, Meeting Points]
25. (Parenthesis: the Brinkhoff generator)
(Hermoupolis builds on the Brinkhoff generator for raw trajectories)
• Brinkhoff methodology:
  • generate starting points
  • generate the length of the route (depending on the object class)
  • generate a destination for each object
  • compute the route
  • compute the trajectory by generating a random speed every time unit
    • based on capacity, weather, edge class, etc.
source: www.fh-oow.de/institute/iapg/personen/brinkhoff/generator
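The five steps of the Brinkhoff methodology can be sketched in Python as below. This is a simplified illustration, not Brinkhoff's generator: the route length is drawn from an exponential distribution with a per-class mean, the destination is the node whose network distance is closest to that length, and the per-time-unit speed is a uniform fraction of the edge's maximum speed, standing in for capacity, weather and edge class. All names and parameters are hypothetical.

```python
import heapq
import random


def dijkstra(graph, src):
    """Shortest-path distances and predecessors from src.
    graph: node -> list of (neighbour, length_m, max_speed_m_per_unit)."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v, length, _ in graph[u]:
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return dist, prev


def generate_trip(graph, coords, obj_class, mean_len, rng):
    """Positions of one object, one per time unit, following the five steps."""
    start = rng.choice(sorted(graph))                      # 1. starting point
    target = rng.expovariate(1.0 / mean_len[obj_class])    # 2. route length
    dist, prev = dijkstra(graph, start)
    candidates = [n for n in dist if n != start]
    if not candidates:
        return [coords[start]]
    dest = min(candidates, key=lambda n: abs(dist[n] - target))  # 3. destination
    route = [dest]                                         # 4. shortest route
    while route[-1] != start:
        route.append(prev[route[-1]])
    route.reverse()
    legs = []
    for u, v in zip(route, route[1:]):
        length, vmax = min((ln, sp) for w, ln, sp in graph[u] if w == v)
        legs.append((u, v, length, vmax))
    # 5. trajectory: draw a random speed for every time unit
    trajectory = [coords[start]]
    i, offset = 0, 0.0
    while i < len(legs):
        offset += rng.uniform(0.3, 1.0) * legs[i][3]  # distance this time unit
        while i < len(legs) and offset >= legs[i][2]:
            offset -= legs[i][2]  # carry the remainder onto the next edge
            i += 1
        if i == len(legs):
            trajectory.append(coords[route[-1]])
        else:
            u, v, length, _ = legs[i]
            f = offset / length
            (x0, y0), (x1, y1) = coords[u], coords[v]
            trajectory.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return trajectory
```

Edge maximum speeds are assumed to be positive; otherwise step 5 would never advance.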
26. Hermoupolis input
• Spatial + temporal + semantic profiles
  • P1) Attending school: Home – School – Home
  • P2) Studying at university: Home – Campus – Leisure – Home
  • P3) Working and having fun: Home – Work – Leisure – Home
  • P4) Working and shopping: Home – Work – Mall – Home
  • P5) Working (only): Home – Work – Home
  • P6) Having fun (only): Home – Leisure – Home
27. Hermoupolis output
• Generate objects moving in Athens
  • ... of a certain population (e.g., 4 million)
  • ... during a period (e.g., 1 week)
  • ... belonging to a number of population profiles
[Figure: generated movements for profiles P1 to P6]
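A minimal sketch of how a population could be split among the six input profiles and each object's profile expanded into one day of STOP/MOVE episodes. The profile weights, departure window and stop/move durations are hypothetical, and the code produces only semantic episodes, not the raw positions that Hermoupolis generates on the road network.

```python
import random

# The six input profiles as sequences of stop types.
PROFILES = {
    "P1": ["Home", "School", "Home"],
    "P2": ["Home", "Campus", "Leisure", "Home"],
    "P3": ["Home", "Work", "Leisure", "Home"],
    "P4": ["Home", "Work", "Mall", "Home"],
    "P5": ["Home", "Work", "Home"],
    "P6": ["Home", "Leisure", "Home"],
}


def assign_profiles(population, weights, rng):
    """Draw a profile for each of `population` objects, proportionally to weights."""
    names = sorted(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=population)


def daily_timeline(profile, rng):
    """Expand a profile into (kind, label, start_h, end_h) episodes for one day.
    All durations are hypothetical uniform draws, in hours."""
    stops = PROFILES[profile]
    t = rng.uniform(7.0, 9.0)  # time of leaving the first stop
    episodes = [("STOP", stops[0], 0.0, t)]
    for idx, place in enumerate(stops[1:], start=1):
        move = rng.uniform(0.25, 1.0)
        episodes.append(("MOVE", "travel", t, t + move))
        t += move
        # The last stop of the day lasts until midnight.
        stay_end = 24.0 if idx == len(stops) - 1 else t + rng.uniform(1.0, 4.0)
        episodes.append(("STOP", place, t, stay_end))
        t = stay_end
    return episodes
```

With at most two intermediate stops per profile, the latest possible arrival home is 20:00, so every generated day stays within 24 hours.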
28. The next step …
• Generate-by-example
  • Given a small real dataset, produce a large synthetic one, as similar as possible to the initial one
  • The number, distribution, characteristics, etc. of population profiles should be discovered from the input dataset
Hermoupolis → Hermoupolis-by-example
34. Research issues addressed
• Step 1. Clustering
  • What is an appropriate spatial-temporal-textual semantic trajectory similarity measure?
  • Which clustering algorithm?
• Step 2. Clusters' generalization
  • How to create a generalized mobility profile for each cluster?
• Step 3. Clusters' classification
  • How to classify clusters into equivalence classes?
• Step 4. Hermoupolis
  • How to select PoIs?
  • How to generate artificial STOPs and MOVEs?
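For Steps 1 and 2, a deliberately simple stand-in: semantic trajectories are reduced to sequences of stop labels, compared with a normalized edit (Levenshtein) distance, grouped with leader (threshold) clustering, and each cluster is summarized by its medoid as a crude generalized profile. This is not the spatial-temporal-textual measure or the clustering algorithm the slides refer to; it ignores space and time and only shows the shape of the pipeline. The threshold and function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def distance(a, b):
    """Edit distance normalized to [0, 1] by the longer sequence."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def leader_clustering(seqs, threshold):
    """Single pass: join the first cluster whose leader is within threshold,
    otherwise start a new cluster led by this sequence."""
    leaders, clusters = [], []
    for s in seqs:
        for k, lead in enumerate(leaders):
            if distance(s, lead) <= threshold:
                clusters[k].append(s)
                break
        else:
            leaders.append(s)
            clusters.append([s])
    return clusters


def medoid(cluster):
    """Member with the smallest total distance to the other members."""
    return min(cluster, key=lambda s: sum(distance(s, o) for o in cluster))
```

Leader clustering depends on input order; a production pipeline would likely use an order-independent algorithm, which is exactly the open question of Step 1.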
35. Hermoupolis vs. related work (1/2)
Criteria: mobility features, obstacle avoidance, objects interaction, network-based, stop generation
• GSTD (and variations): ✔ ✔ ✔
• CENTRE: ✔ ✔
• G-TERD: ✔
• OPORTO: ✔
• Brinkhoff: ✔ ✔ ✔
• SUMO: ✔ ✔ ✔
• BerlinMOD: ✔ ✔ ✔
• ST-ACTS: (none)
• MWGen: ✔
• GAMMA: (none)
• Hermoupolis: ✔ ✔ ✔ ✔ ✔
36. Hermoupolis vs. related work (2/2)
Criteria: long time generation, pattern-aware, by-example, additional data, activities / semantics
• GSTD (and variations): ✔
• CENTRE: (none)
• G-TERD: (none)
• OPORTO: (none)
• Brinkhoff: (none)
• SUMO: (none)
• BerlinMOD: ✔ ✔ ✔
• ST-ACTS: ✔ ✔
• MWGen: (none)
• GAMMA: ✔ ✔
• Hermoupolis: ✔ ✔ ✔ ✔ ✔
37. Talk outline
• Background – what is a LifeStep
• On the management and analysis of our LifeSteps
• On the simulation of our LifeSteps
• Relevant publications
38. Relevant publications
• On activity inference (the Baquara ontology):
  • R. Fileto, M. Krüger, N. Pelekis, Y. Theodoridis, C. Renso (2013): Baquara: a holistic ontological framework for movement analysis using linked data. Proc. ER'13. (best paper award)
  • R. Fileto, C. May, C. Renso, N. Pelekis, D. Klein, Y. Theodoridis (2015): The Baquara2 knowledge-based framework for semantic enrichment and analysis of movement data. Data Knowl. Eng., 98.
• On the management of LifeSteps (semantic MODs):
  • N. Pelekis, Y. Theodoridis, D. Janssens (2013b): On the management and analysis of our LifeSteps. SIGKDD Explorations, 15(1).
  • N. Pelekis, S. Sideridis, Y. Theodoridis (2015b): Hermessem: a semantic-aware framework for the management and analysis of our LifeSteps. Proc. DSAA'15.
39. Relevant publications (cont.)
• On the simulation of semantic trajectories (Hermoupolis):
  • N. Pelekis, C. Ntrigkogias, P. Tampakis, S. Sideridis, Y. Theodoridis (2013a): Hermoupolis: a trajectory generator for simulating generalized mobility patterns. Proc. ECML/PKDD'13.
  • N. Pelekis, S. Sideridis, P. Tampakis, Y. Theodoridis (2015a): Hermoupolis: a semantic trajectory generator in the data science era. ACM SIGSPATIAL Special Newsletter, 7(1).
  • N. Pelekis, S. Sideridis, P. Tampakis, Y. Theodoridis (2015c): Simulating our LifeSteps by example. Submitted.