GPSInsights: Towards an efficient
framework for storing and mining
massive real-time vehicle location data
Linh-Truong Hoang, Duy-Khanh Bui, Viet-Trung Tran
Hanoi University of Science and Technology
  
Agenda
•  Motivation
•  System architecture
•  Scalable map-matching
•  Experimentation
•  Conclusion
  
Global Navigation Satellite
System (GNSS)
•  Autonomous geo-spatial positioning
– position
– velocity
– time
•  "Great" points about GNSS
– Free
– Real-time
– No local infrastructure required
  
GNSS as part of Intelligent
transport system (ITS)
•  "Precious" data for real-time traffic
management
– traffic dashboard
– speed control
– traffic jam monitoring
  
Need for collecting and mining massive GNSS data in REAL-TIME
  
GNSS data characteristics
•  Real-time
– reported every second
•  Massive in volume
– from millions of cars
•  "Bad" data
•  Needs to be processed within the digital map topology
  
GNSS data exhibits the 5 Vs of big data
  
SYSTEM ARCHITECTURE
Store massive GNSS data
Real-time mining 

  
Elasticity
High-throughput
Fault-tolerance
Scalable
First-class spatio-temporal API
High-throughput
Fault-tolerance
Online processing
Scalable
Fault-tolerance
Leverage open-source components
  
  
Apache Spark processing
•  Resilient Distributed Dataset (RDD)
– In-memory, backed by persistent storage (HDFS)
– Fault tolerance through lineage
– Supports interactive and iterative analysis
  
Spark Streaming
  
Apache Storm
  
MongoDB with geo-indexing
  
GeoMesa: Accumulo + geo-indexing
  
SCALABLE MAP-MATCHING
ALGORITHM
  
Map-matching
•  Online vs. Offline

•  OpenStreetMap (OSM) map
  
Algorithm
•  OSM map format
•  Filling intermediate points
– Millions more points 
– Massive data 
– but simple calculations
•  real-time, scalable
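
The step of filling intermediate points can be sketched roughly as follows. This is only an illustration under stated assumptions, not the GPSInsights implementation: a road is assumed to be an OSM-style way given as a list of (lat, lon) nodes, points are interpolated linearly between consecutive nodes, and the names `haversine_m`, `densify_way` and the `spacing_m` parameter are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used by the haversine formula


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def densify_way(nodes, spacing_m=10.0):
    """Return the way's nodes plus interpolated points roughly spacing_m apart or closer.

    Assumption: segments are short enough that linear interpolation in
    latitude/longitude is an acceptable approximation.
    """
    if not nodes:
        return []
    dense = [nodes[0]]
    for start, end in zip(nodes, nodes[1:]):
        # Number of equal pieces needed so that each piece is at most spacing_m long.
        pieces = max(1, math.ceil(haversine_m(start, end) / spacing_m))
        for i in range(1, pieces):
            t = i / pieces
            dense.append((start[0] + t * (end[0] - start[0]),
                          start[1] + t * (end[1] - start[1])))
        dense.append(end)  # keep the original node exactly
    return dense
```

Each segment is processed independently with a few arithmetic operations, which is why this kind of step produces many more points while remaining easy to parallelize.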
  
K-d tree for closest neighbours
•  Run on Apache Spark/Storm
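
A closest-neighbour lookup with a k-d tree might look like the sketch below. It is a minimal single-machine illustration, not the distributed Spark/Storm version: it assumes road points are (lat, lon) tuples, uses squared planar distance in degrees as a simplification of true geographic distance, and the names `build_kdtree`, `sq_dist` and `nearest` are hypothetical.

```python
from collections import namedtuple

# One tree node: the stored point, the axis it splits on, and its two subtrees.
Node = namedtuple("Node", "point axis left right")


def build_kdtree(points, depth=0):
    """Recursively build a 2-d tree by splitting on the median of alternating axes."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))


def sq_dist(p, q):
    """Squared planar distance (assumption: adequate for nearby points)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(node, target, best=None):
    """Return the tree point closest to target, or None for an empty tree."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best
```

For example, `nearest(build_kdtree(road_points), gps_fix)` would return the road point closest to one GPS fix, where `road_points` and `gps_fix` are placeholders for the densified map points and an incoming record.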
  
EXPERIMENTATION
  
Experiment setup
•  12 million GPS records collected in March 2014 by
vehicles equipped with GPS receivers
•  4-node cluster
– 8-core Intel Xeon 2.6 GHz CPU, 32 GB memory
  
Map-matching completion time
  
Latency
  
"Scalability"
  
Demonstration
  
Real-time traffic monitoring
  
Real-time shortest path
  
Conclusion
•  GPSInsights: Scalable framework for storing
and mining massive location data
– built on open-source scalable components
– scalable storage + real-time mining 
– Pluggable components
– Demonstration with scalable map-matching
algorithm
•  Future work
– Advanced map-matching algorithms
– Traffic jam prediction
  
Current state of the art
•  PostGIS
– Spatial object management over Postgres
– Small size
– No mining support
  

Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive transportation data