1st IEEE SmartNets @ Hammamet, Tunisia, 16th November 2018
The Quest for Scalable and Intelligent
Trajectory Data Analytics Systems: Status
Report and Future Directions
Rim Moussa LaTICE Lab. Univ. of Tunis and University of Carthage
Ahmed Haddad LaTICE Lab. Univ. of Tunis and University of Carthage
Tarek Bejaoui MEDIATRON Lab. University of Carthage
Scalable Analysis of Trip Records
●Characteristics of trip records
»Big volume
»Big velocity
●NYC cab dataset
»http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
»More than 200 GB
»Yellow and green taxi trip records from 2009 to the present
»CSV format
»Each record captures pick-up and drop-off dates/times, pick-up and
drop-off locations, fares, rate types, payment types, and
driver-reported passenger counts
Goals of Scalable Trip Record Analysis
●Turn trajectory data into knowledge
»Multi-dimensional analysis of trajectory data
●e.g. average fare, average trip duration, ... for a given
pick-up location and a given drop-off location between
9 pm and 10 pm
»Mining of trajectory patterns
●Hotspots and cold areas
●Frequent/infrequent trajectory patterns
●Turn knowledge into decisions
»Intelligent urban computing
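The example query above (average fare and average trip duration between two locations for pick-ups from 9 pm to 10 pm) can be sketched in plain Python. This is a minimal illustration only: it assumes a hypothetical `Trip` record whose pick-up and drop-off locations are already resolved to zone labels (not the real NYC TLC column names), and it reads "between 9 pm and 10 pm" as a pick-up time in [21:00, 22:00).

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Trip:
    # Hypothetical, simplified trip record; field names are illustrative only.
    pickup_zone: str
    dropoff_zone: str
    pickup_time: datetime
    dropoff_time: datetime
    fare: float

def avg_fare_and_duration(trips, pickup_zone, dropoff_zone, start_hour=21, end_hour=22):
    """Average fare and average duration (minutes) of trips between two zones
    whose pick-up hour lies in [start_hour, end_hour); None if no trip matches."""
    selected = [
        t for t in trips
        if t.pickup_zone == pickup_zone
        and t.dropoff_zone == dropoff_zone
        and start_hour <= t.pickup_time.hour < end_hour
    ]
    if not selected:
        return None
    avg_fare = mean(t.fare for t in selected)
    avg_minutes = mean((t.dropoff_time - t.pickup_time).total_seconds() / 60 for t in selected)
    return avg_fare, avg_minutes

trips = [
    Trip("A", "B", datetime(2014, 5, 1, 21, 10), datetime(2014, 5, 1, 21, 30), 12.0),
    Trip("A", "B", datetime(2014, 5, 2, 21, 40), datetime(2014, 5, 2, 21, 50), 8.0),
    Trip("A", "B", datetime(2014, 5, 2, 22, 5), datetime(2014, 5, 2, 22, 25), 15.0),  # outside 9-10 pm
    Trip("A", "C", datetime(2014, 5, 3, 21, 15), datetime(2014, 5, 3, 21, 45), 20.0),  # other drop-off
]
print(avg_fare_and_duration(trips, "A", "B"))  # (10.0, 15.0)
```

At scale the same filter-then-aggregate pattern would run inside the analytics engine rather than over an in-memory list.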
Outline
●Key Functional Requirements of Intelligent and Scalable
Trajectory Data Analysis
●Overview of state-of-the-art open-source technologies
»Elastic Stack: data shippers + search engine + visualization
»Geomondrian: spatial relational OLAP engine + relational DBMS
»Leaflet: JavaScript library for mobile-friendly interactive maps +
relational data store
»Neo4j: graph database
●Neo4j Extension
●Conclusions
●Future Directions
Key Functional Requirements
↬ Spatial On-Line Analytical Processing (SOLAP)
●OLAP tools enable users to analyze multidimensional data
interactively from multiple perspectives
●Multi-dimensional data analysis
»Spatial dimension: GPS data, area, ...
»Time dimension: time range, day/night
»Standard dimensions such as number of passengers, payment type, ...
»Measures: trip count, sum of trip fares, ...
●OLAP operations
»Drill-down: show trip details for a combination of dimensions
»Roll-up: aggregate trip data for a combination of dimensions
»Slice: filter along one dimension
»Dice: filter along more than one dimension
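The four OLAP operations can be illustrated on a toy fact table. The sketch below assumes a hypothetical in-memory table with two dimensions (pick-up area, payment type) and one measure (fare), with trip count and sum of fares as aggregated measures. A real SOLAP engine works on cubes backed by a DBMS, so this only shows the semantics: drill-down and roll-up move between finer and coarser groupings, slice fixes one dimension, dice fixes several.

```python
from collections import defaultdict

# Hypothetical fact table: one row per trip, two dimensions and one measure (fare).
FACTS = [
    {"area": "Manhattan", "payment": "card", "fare": 10.0},
    {"area": "Manhattan", "payment": "cash", "fare": 6.0},
    {"area": "Manhattan", "payment": "card", "fare": 14.0},
    {"area": "Brooklyn", "payment": "card", "fare": 20.0},
    {"area": "Brooklyn", "payment": "cash", "fare": 8.0},
]

def aggregate(rows, dims):
    """Group rows by `dims`; map each dimension-value tuple to (trip count, sum of fares)."""
    cells = defaultdict(lambda: [0, 0.0])
    for row in rows:
        cell = cells[tuple(row[d] for d in dims)]
        cell[0] += 1
        cell[1] += row["fare"]
    return {key: tuple(val) for key, val in cells.items()}

def slice_rows(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [row for row in rows if row[dim] == value]

def dice_rows(rows, **conditions):
    """Dice: fix several dimensions at once."""
    return [row for row in rows if all(row[d] == v for d, v in conditions.items())]

by_area_payment = aggregate(FACTS, ["area", "payment"])  # finer level (drill-down)
by_area = aggregate(FACTS, ["area"])                     # coarser level (roll-up)
print(by_area)  # {('Manhattan',): (3, 30.0), ('Brooklyn',): (2, 28.0)}
print(aggregate(slice_rows(FACTS, "payment", "card"), ["area"]))
print(dice_rows(FACTS, area="Brooklyn", payment="cash"))
```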
Key Functional Requirements
↬ Spatial Data Mining
●Algorithms for learning trajectory patterns from historical
data
»Path patterns (infrequent/frequent path patterns, triangle
patterns)
»Hotspots/cold areas
»Co-location patterns, e.g. weather conditions and trip
patterns
»Stay points, trip trajectory patterns, driving and speed
patterns
●Algorithms for predicting future events such as a car's
destination, future traffic congestion, trip cost, etc.
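One simple way to surface hotspots and cold areas from historical data is to count pick-ups per spatial cell and rank the cells. The sketch below assumes a fixed latitude/longitude grid of 0.01 degrees (roughly 1.1 km north-south) as the spatial unit; it is an illustrative density count, not a specific published hotspot algorithm, and "cold" here means least visited among cells that have at least one pick-up.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a GPS point to a square latitude/longitude grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def hot_and_cold_cells(points, k=1, cell_deg=0.01):
    """Return the k busiest and the k least busy non-empty cells with their pick-up counts."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
    ranked = counts.most_common()  # sorted by count, busiest first
    return ranked[:k], ranked[-k:]

pickups = [
    (40.7581, -73.9855), (40.7583, -73.9851), (40.7586, -73.9858),  # same cell
    (40.6413, -73.7781), (40.6415, -73.7785),                        # same cell
    (40.7060, -74.0088),                                             # isolated
]
hot, cold = hot_and_cold_cells(pickups)
print(hot, cold)
```

Replacing the fixed grid by geohash cells or by a clustering step would change only `grid_cell`; the ranking logic stays the same.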
Key Functional Requirements
↬ Geo-visualization
●Geo-visualization
»Interactive Maps
»change the visual appearance of the map (e.g. colors,
day/night theme)
Key Functional Requirements
↬ Processing mode
●Batch processing
»Capacity to process historical data
●Real-time processing
»Capacity to process real-time data
»Stream processing systems
●Lambda architecture
»Combines batch and real-time (stream) processing
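The Lambda architecture can be pictured as three cooperating parts: a batch layer that periodically recomputes views from all historical data, a speed layer that incrementally updates a view with fresh records, and a serving layer that merges both at query time. The toy class below counts trips per pick-up zone under that scheme; the zone names are hypothetical, everything runs in memory, and it merely stands in for real batch and stream engines.

```python
from collections import Counter

class LambdaTripCounter:
    """Toy Lambda architecture counting trips per pick-up zone."""

    def __init__(self):
        self.master = []             # append-only master data set (all raw records)
        self.batch_view = Counter()  # batch layer: recomputed from the master data set
        self.speed_view = Counter()  # speed layer: incremental counts since the last batch run

    def ingest(self, zone):
        self.master.append(zone)     # kept for the next batch recomputation
        self.speed_view[zone] += 1   # immediately visible through the speed layer

    def run_batch(self):
        self.batch_view = Counter(self.master)  # full recomputation over historical data
        self.speed_view.clear()                 # those records are now covered by the batch view

    def query(self, zone):
        # Serving layer: merge the batch view and the real-time view.
        return self.batch_view[zone] + self.speed_view[zone]

counter = LambdaTripCounter()
for z in ["JFK", "JFK", "Midtown"]:
    counter.ingest(z)
counter.run_batch()
counter.ingest("JFK")
print(counter.query("JFK"))  # 3: two from the batch view, one from the speed view
```

In a real deployment, discarding speed-layer results must be coordinated with exactly the records the finished batch run covered; the toy version sidesteps this because everything is synchronous.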
Overview of state-of-the-art Technologies
↬ Elastic Stack
●Elasticsearch
»Distributed search engine and document store
»Distributed inverted indices for querying free text
●Logstash and Beats for data ingestion
●Kibana for visualization
●Real-world users: NASA, Uber, Lyft, Tinder, Cisco, New York
Times, eBay, Groupon, Wikipedia, Stack Overflow, GitHub...
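An inverted index maps each term to the documents that contain it. Distributing it means that each shard indexes only its own documents, and a query is scattered to all shards before the partial results are gathered. The toy cluster below illustrates that idea with hypothetical trip descriptions and integer ids. Elasticsearch routes a document to a shard by hashing a routing value (the document id by default); a plain modulo on integer ids is used here for simplicity, and relevance scoring is omitted.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case word tokens (a stand-in for a real text analyzer)."""
    return re.findall(r"\w+", text.lower())

class Shard:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def index(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, terms):
        """Ids of documents on this shard that contain all query terms."""
        sets = [self.postings.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()

class ToyCluster:
    def __init__(self, n_shards=2):
        self.shards = [Shard() for _ in range(n_shards)]

    def index(self, doc_id, text):
        # Route each document to exactly one shard.
        self.shards[doc_id % len(self.shards)].index(doc_id, text)

    def search(self, query):
        terms = tokenize(query)
        hits = set()
        for shard in self.shards:         # scatter the query to every shard...
            hits |= shard.search(terms)   # ...and gather the partial results
        return sorted(hits)

cluster = ToyCluster()
cluster.index(1, "Taxi pick-up at JFK airport")
cluster.index(2, "Drop-off at JFK terminal 4")
cluster.index(3, "Pick-up in Brooklyn")
print(cluster.search("jfk"))             # [1, 2]
print(cluster.search("pick brooklyn"))   # [3]
```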
Overview of state-of-the-art Technologies
↬ NYC cabs' records exploration with the Elastic Stack
[Figures: NYC cab records explored with the Elastic Stack]
Overview of state-of-the-art Technologies
↬ Elastic Stack discussion
●Geo-visualization
»Interactive maps
●Spatial OLAP
»Query the Elasticsearch cluster with its Query Domain Specific
Language (DSL)
●Spatial data mining and trajectory patterns
»Not supported
●Processing mode
»Both batch and real-time
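A Query DSL request body can express a spatial OLAP-style dice followed by a roll-up. The sketch below builds one as a Python dictionary; it assumes a hypothetical index whose mapping declares `pickup_location` as `geo_point`, `pickup_datetime` as `date`, `payment_type` as `keyword` and `fare_amount` as a numeric field. The body would be sent to the index's `_search` endpoint; sending it is not shown.

```python
import json

def fare_by_payment_type(top_left, bottom_right, start, end):
    """Build a Query DSL body (hypothetical field names): filter on a pick-up
    bounding box and a time range (dice), then compute the average fare per
    payment type (roll-up)."""
    return {
        "size": 0,  # return aggregations only, no individual hits
        "query": {
            "bool": {
                "filter": [
                    {"geo_bounding_box": {"pickup_location": {
                        "top_left": {"lat": top_left[0], "lon": top_left[1]},
                        "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
                    }}},
                    {"range": {"pickup_datetime": {"gte": start, "lt": end}}},
                ]
            }
        },
        "aggs": {
            "by_payment_type": {
                "terms": {"field": "payment_type"},
                "aggs": {"avg_fare": {"avg": {"field": "fare_amount"}}},
            }
        },
    }

body = fare_by_payment_type((40.80, -74.02), (40.70, -73.93),
                            "2014-01-01T21:00:00", "2014-01-01T22:00:00")
print(json.dumps(body, indent=2))
```

Using `filter` rather than `must` clauses skips relevance scoring, which is irrelevant for aggregate queries of this kind.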
Overview of state-of-the-art Technologies
↬ Geomondrian discussion
●Geo-visualization
»No Interactive maps
»Need SOLAP client
●Spatial OLAP
»Use SQL to query the relational store
●Spatial data mining and trajectory patterns
»pgRouting: geospatial routing extension of PostgreSQL/PostGIS
●Processing mode
»Batch processing
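The relational side of SOLAP can be sketched with ordinary SQL aggregates. The example below uses Python's built-in `sqlite3` module purely as a stand-in for the relational store (the store behind Geomondrian would be a spatially enabled DBMS such as PostgreSQL/PostGIS), with a hypothetical `trips` table in which pick-up and drop-off areas are already resolved to names. The second query is a roll-up of the first: it drops the drop-off and hour dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trips (
    pickup_area TEXT, dropoff_area TEXT, pickup_time TEXT, fare REAL)""")
conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", [
    ("Manhattan", "Queens", "2014-03-01 21:15:00", 30.0),
    ("Manhattan", "Queens", "2014-03-01 21:45:00", 34.0),
    ("Manhattan", "Brooklyn", "2014-03-01 09:05:00", 18.0),
    ("Brooklyn", "Manhattan", "2014-03-02 21:20:00", 22.0),
])

# Fine-grained view: pick-up area x drop-off area x hour of day.
detail = conn.execute("""
    SELECT pickup_area, dropoff_area,
           CAST(strftime('%H', pickup_time) AS INTEGER) AS hour,
           COUNT(*) AS trips, AVG(fare) AS avg_fare
    FROM trips
    GROUP BY pickup_area, dropoff_area, hour
    ORDER BY pickup_area, dropoff_area, hour""").fetchall()

# Roll-up: aggregate over the pick-up area only.
rollup = conn.execute("""
    SELECT pickup_area, COUNT(*), SUM(fare)
    FROM trips GROUP BY pickup_area ORDER BY pickup_area""").fetchall()

print(detail)
print(rollup)  # [('Brooklyn', 1, 22.0), ('Manhattan', 3, 82.0)]
```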
Overview of state-of-the-art Technologies
↬ Leaflet + relational data store (MySQL)
Visualizing millions of NYC taxi pick-up locations for the year
2014 (spatial points are clustered using superclustering)
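Clustering points per zoom level is what keeps a map of millions of pick-ups responsive: at low zoom, nearby points are merged into a single counted marker. The sketch below is a deliberately simpler grid-based clustering in Web Mercator space, not the supercluster algorithm itself (which clusters greedily within a radius). The 256-pixel tile size matches common web maps, while the 64-pixel cell size is an arbitrary assumption.

```python
import math
from collections import defaultdict

def cluster_points(points, zoom, cell_px=64):
    """Grid clustering in Web Mercator space: points falling into the same
    cell of cell_px x cell_px screen pixels at the given zoom level (with
    256-pixel tiles) are merged into one marker placed at their centroid."""
    cells_across = 256 * 2 ** zoom / cell_px  # number of grid cells across the world map
    cells = defaultdict(list)
    for lat, lon in points:
        x = (lon + 180.0) / 360.0  # normalized Web Mercator x in [0, 1]
        s = math.sin(math.radians(lat))
        y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)  # normalized y in [0, 1]
        cells[(int(x * cells_across), int(y * cells_across))].append((lat, lon))
    return [
        {"lat": sum(p[0] for p in m) / len(m),
         "lon": sum(p[1] for p in m) / len(m),
         "count": len(m)}
        for m in cells.values()
    ]

pickups = [(40.7580, -73.9855), (40.7585, -73.9850), (40.6413, -73.7781)]
for zoom in (0, 8):
    counts = sorted(c["count"] for c in cluster_points(pickups, zoom))
    print(zoom, counts)
```

At zoom 0 the three points collapse into one marker; at zoom 8 the two Midtown points remain merged while the JFK point gets its own marker.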
Overview of state-of-the-art Technologies
↬ Leaflet discussion
●Geo-visualization
»Interactive maps
●Spatial OLAP
»Use SQL to query the relational store
●Spatial data mining and trajectory patterns
»Not supported
●Processing mode
»Batch processing
Graph-oriented data store solution
↬ Graph-oriented databases: Neo4j, GraphFrames/Spark
●Directed graph design
»Vertices: aggregated spatial locations
»Relationships: bags of trip data
●Extend the Neo4j Cypher query language to support OLAP
operations
»Roll-up graph
»Drill-down graph
●Use of Apache Spark for data preprocessing
»Processing spatial data
●Map each GPS pick-up/drop-off to a geohash (cell sizes below are at the equator)
●Geohash length 12 → cell width × height: 3.7 cm × 1.9 cm
●Geohash length 7 → cell width × height: 152.9 m × 152.4 m
●Geohash length 4 → cell width × height: 39.1 km × 19.5 km ...
»Processing time data
●Map each pick-up/drop-off date-time to a timehash
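The geohash mapping can be sketched with the standard encoding: bits alternate between longitude and latitude, each bit halves the current interval, and every 5 bits become one base-32 character. A shorter geohash is therefore a prefix of a longer one for the same point, which is what makes roll-up by truncation possible. The slides do not define the timehash, so `timehash` below is a hypothetical encoding chosen to share that prefix property: a `YYYYMMDDHHMM` string truncated at the requested granularity.

```python
from datetime import datetime

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=7):
    """Encode a GPS point as a geohash of `precision` characters."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Even bits refine longitude, odd bits refine latitude.
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # 5 bits -> one base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)

# Hypothetical timehash granularities (prefix lengths of "YYYYMMDDHHMM").
TIME_LEVELS = {"year": 4, "month": 6, "day": 8, "hour": 10, "minute": 12}

def timehash(dt, level="hour"):
    """Hypothetical timehash: a shorter prefix denotes a coarser time bucket."""
    return dt.strftime("%Y%m%d%H%M")[:TIME_LEVELS[level]]

print(geohash_encode(42.6, -5.6, 5))           # ezs42
print(timehash(datetime(2014, 5, 1, 21, 37)))  # 2014050121
```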
Graph-oriented data store solution
↬ Graph-oriented databases: Neo4j, GraphFrames/Spark
●Scalable data processing with CAPS (Cypher for Apache
Spark)
●Most trajectory patterns are either provided by or implemented
with Neo4j Cypher, Apache Spark GraphFrames
or MLlib
»Graph traversal algorithms: breadth-first search is
provided
»Depth-first search is implemented
»Frequent/infrequent trajectory patterns
»Hotspots and cold areas
»PageRank
»Connected components
»Clustering
●Work in progress
»Visualization on a world map
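The directed graph design, the graph roll-up and a breadth-first traversal can be sketched in plain Python, independently of Neo4j or GraphFrames. Assumptions: each trip is reduced to a hypothetical (pick-up geohash, drop-off geohash, fare) triple, the geohash strings below are made up, and the "bag of trips" on each edge is simply a list of fares. Rolling up truncates geohashes to a shorter prefix and merges the bags of edges that collapse together.

```python
from collections import defaultdict, deque

def build_graph(trips, precision):
    """Directed graph: vertices are geohash cells truncated to `precision`;
    each edge (src -> dst) holds the bag of fares of the trips between them."""
    graph = defaultdict(lambda: defaultdict(list))
    for pickup_gh, dropoff_gh, fare in trips:
        graph[pickup_gh[:precision]][dropoff_gh[:precision]].append(fare)
    return graph

def roll_up(graph, precision):
    """Roll-up: merge vertices sharing a shorter geohash prefix and concatenate their bags."""
    coarse = defaultdict(lambda: defaultdict(list))
    for src, edges in graph.items():
        for dst, bag in edges.items():
            coarse[src[:precision]][dst[:precision]].extend(bag)
    return coarse

def bfs(graph, start):
    """Cells reachable from `start` by following trips, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        cell = queue.popleft()
        order.append(cell)
        for nxt in graph.get(cell, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

# Illustrative trips: (pick-up geohash, drop-off geohash, fare); geohashes are made up.
trips = [
    ("dr5ru7a", "dr5rs1b", 12.0),
    ("dr5ru7c", "dr5rs1d", 9.5),
    ("dr5rs1b", "dr72h3x", 30.0),
]
fine = build_graph(trips, 7)
coarse = roll_up(fine, 5)
print(bfs(fine, "dr5ru7a"))   # ['dr5ru7a', 'dr5rs1b', 'dr72h3x']
print(dict(coarse["dr5ru"]))  # {'dr5rs': [12.0, 9.5]}
```

Drill-down goes the other way and cannot be derived from the coarse graph alone: it requires rebuilding from the trip data at a longer geohash precision.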
Conclusion & Future work
●Conduct experiments on an HPC platform
»Benchmark NoSQL graph databases, Neo4j vs. GraphFrames,
for each defined business query
●Extend graph capabilities in the Elastic Stack
●Combine multiple datasets
»e.g. trajectory data with open datasets such as NYC weather
and crime data
Thank you for your attention
Q & A