iBAT: Detecting Anomalous Taxi
Trajectories from GPS Traces
DAQING ZHANG, NAN LI, ZHI-HUA ZHOU, CHAO
CHEN, LIN SUN, SHIJIAN LI
Introduction
• Taxis have been equipped with GPS devices
• Gathering and analyzing large-scale GPS traces reveals hidden
facts about
- City dynamics
- Human behaviors
Potential Applications
1. Fraud detection
- Greedy taxi drivers overcharge passengers by taking unnecessary
detours
- Often verified manually, only after a passenger complains
- In most cases, the fraud is not even noticed by passengers
2. Road network change detection
- Urban road networks often change over time, e.g. new roads, blocked
roads
- Digital maps need to be updated to reflect the changes
- Done manually by digital map providers
Example
• Source : S
• Destination : D
• Three clusters of trajectories
• Four trajectories (t0, t1, t2, t3) are considered as anomalies
Challenges
• There may be several distinct sets of normal trajectories between a
source-destination pair, e.g. 3 clusters
• Traditional distance-based anomaly detection techniques are not
sufficient, e.g. for t3
• Need to detect an emerging cluster of anomalous trajectories and
incorporate the changes
• Traditional methods often require fixed-length feature vectors
Solution
• Novel anomalous driving trajectory detection approach
• Contribution
1. Transform the problem into an easy-to-solve problem
2. Propose an Isolation-Based Anomalous Trajectory (iBAT) detection
method
3. Evaluate iBAT with real-world GPS traces
4. Achieve a remarkable detection rate with low processing time
Related work
• Anomalous trajectory detection
1. Outlier detection on sub-trajectories using distance and density
[1]
2. Outlier detection using direction and density [2]
3. Clustering techniques [3]
4. Discover abnormal traffic change [4]
5. Learning based approaches [5]
Related work
• Anomaly detection methods not designed for trajectory data
1. Distance-based approach [6]
2. Density-based approach [7]
3. Isolation-based approach - iForest [8]
iForest
• Anomalies are few and different
• No distance or density measures
• State-of-the-art performance for outlier detection
• Lower run time and space complexity
• Fixed-length feature vectors
Problem Statement
• GPS Point
- latitude and longitude
- Timestamp
- Estimated speed
- Operation status
Problem Statement
• Taxi trajectory
Sequence of GPS points belonging to an occupied taxi ride
t : p1 -> p2 -> …… -> pn
p1 : source
pn : destination
• Problem
Given a set of trajectories T = {t1, t2, …, tn} between two locations S
and D, find those in T that are significantly different from the majority
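As a rough illustration of the two definitions above, the sketch below models a GPS point and splits one taxi's point stream into occupied rides. The class name `GpsPoint`, its field names, and the helper `occupied_rides` are hypothetical, and treating the operation status as a simple occupied/vacant flag is an assumption.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class GpsPoint:
    # Field names are illustrative; the slides only list the kinds of values.
    lat: float
    lon: float
    timestamp: float
    speed: float
    occupied: bool  # operation status, assumed to be occupied / vacant


def occupied_rides(points):
    """Split one taxi's GPS points into trajectories, one per occupied ride."""
    ordered = sorted(points, key=lambda p: p.timestamp)
    # Consecutive points with the same status form one run; keep occupied runs.
    return [list(run) for is_occupied, run in groupby(ordered, key=lambda p: p.occupied)
            if is_occupied]


if __name__ == "__main__":
    flags = [False, True, True, False, True]
    pts = [GpsPoint(30.0, 120.0, float(i), 10.0, f) for i, f in enumerate(flags)]
    for ride in occupied_rides(pts):
        print([p.timestamp for p in ride])
```

Each returned list is one trajectory t : p1 -> … -> pn, with the first point as source and the last as destination.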
Proposed approach
• Three-step procedure
Augmenting (preprocessing)
• Split the city map into grid-cells of equal size
• Map the taxi trajectory to the cell grid - sequence of traversed
cells
• Augment missing cells by inserting pseudo cells
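The augmenting step might look like the sketch below. It assumes coordinates already projected to metres, and it fills gaps by straight-line interpolation between consecutive cells; the slides do not say how pseudo cells are chosen, so that rule is an assumption. The names `to_cell` and `augment` are hypothetical.

```python
def to_cell(x_m, y_m, cell_size_m=250.0):
    """Map planar coordinates (metres) to a (column, row) grid cell."""
    return (int(x_m // cell_size_m), int(y_m // cell_size_m))


def augment(cells):
    """Insert pseudo cells so that consecutive cells are always neighbours.

    Assumption: missing cells lie on the straight line between the two
    observed cells. Repeated cells are collapsed.
    """
    if not cells:
        return []
    out = [cells[0]]
    for c1, r1 in cells[1:]:
        c0, r0 = out[-1]
        steps = max(abs(c1 - c0), abs(r1 - r0))
        for k in range(1, steps + 1):
            cell = (c0 + round(k * (c1 - c0) / steps),
                    r0 + round(k * (r1 - r0) / steps))
            if cell != out[-1]:
                out.append(cell)
    return out


if __name__ == "__main__":
    raw = [to_cell(10.0, 10.0), to_cell(760.0, 20.0)]
    print(augment(raw))
```

After augmentation every trajectory is a gap-free sequence of cells, so trajectories sampled at different GPS rates become comparable.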
Indexing (preprocessing)
• Given a source-destination cell pair, find all the related taxi
trajectories
• For a given period of time, there may be too few trajectories
between the source and the destination
• To solve this, add all the taxi trajectories that pass through the
source-destination cell pair
• Construct an inverted index for trajectory retrieval
Indexing (preprocessing)
• Example:
- Two trajectories
- Create the inverted index as follows
- Source p1 and destination p5: retrieve t1 and t2
- Source p1 and destination p3: retrieve t1
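A minimal sketch of such an inverted index follows. The cell sequences of t1 and t2 are made up for illustration (the slide's figure is not available), chosen only so that the two retrievals above come out the same. The function names `build_index` and `retrieve` are hypothetical.

```python
from collections import defaultdict


def build_index(trajectories):
    """Map each cell to {trajectory id: positions at which the cell occurs}."""
    index = defaultdict(dict)
    for tid, cells in trajectories.items():
        for pos, cell in enumerate(cells):
            index[cell].setdefault(tid, []).append(pos)
    return index


def retrieve(index, source, dest):
    """Ids of trajectories that pass through `source` and later through `dest`."""
    src = index.get(source, {})
    dst = index.get(dest, {})
    return sorted(tid for tid in src.keys() & dst.keys()
                  if min(src[tid]) < max(dst[tid]))


if __name__ == "__main__":
    # Hypothetical cell sequences, not taken from the slides.
    trajectories = {
        "t1": ["p1", "p2", "p3", "p4", "p5"],
        "t2": ["p1", "p6", "p7", "p5"],
    }
    index = build_index(trajectories)
    print(retrieve(index, "p1", "p5"))
    print(retrieve(index, "p1", "p3"))
```

Storing positions as well as ids lets the lookup check that the source cell is visited before the destination cell.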
Adapting iForest
• Begin with all the trajectories
• Pick one grid cell at a time
• Partition trajectories based on the grid cell
- Trajectory contains the grid cell : Left
- Otherwise : Right
• Continue until all the instances are isolated
- Either there is only one trajectory left to divide
- Or all the trajectories contain the same set of grid cells
• Produces short paths for anomalies and long paths for normal
instances
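The partitioning above can be sketched as a recursive function that returns the depth at which each trajectory stops being divided. This is an illustration, not the authors' code: picking only among cells that actually split the current group is an assumption made here so that every step makes progress, and `isolation_depths` is a hypothetical name.

```python
import random


def isolation_depths(trajs, rng, depth=0):
    """Return {trajectory id: depth at which it is no longer divided}."""
    cell_sets = {tid: frozenset(cells) for tid, cells in trajs.items()}
    # Stop: a single trajectory, or all trajectories cover the same cells.
    if len(trajs) <= 1 or len(set(cell_sets.values())) == 1:
        return {tid: depth for tid in trajs}
    common = frozenset.intersection(*cell_sets.values())
    union = frozenset.union(*cell_sets.values())
    # Assumption: choose a cell contained in some, but not all, trajectories.
    cell = rng.choice(sorted(union - common))
    left = {tid: c for tid, c in trajs.items() if cell in cell_sets[tid]}
    right = {tid: c for tid, c in trajs.items() if cell not in cell_sets[tid]}
    result = isolation_depths(left, rng, depth + 1)
    result.update(isolation_depths(right, rng, depth + 1))
    return result


if __name__ == "__main__":
    trajs = {
        "t1": ["a", "b", "c", "d"],
        "t2": ["a", "b", "c", "e"],
        "t3": ["a", "b", "f", "d"],
        "t4": ["x", "y", "z"],
    }
    print(isolation_depths(trajs, random.Random(0)))
```

Averaging the depths over many random trees gives the path length per trajectory, which tends to be shorter for trajectories that share few cells with the rest.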
Adapting iForest
• Example
iBAT
• Lazy learning
- Do not train a model until presented with a test sample
• Take one trajectory t as the test trajectory
• Randomly select one grid cell from t and remove the trajectories
that do not contain the selected cell from the training set
• Repeat until no trajectory is left, or all the remaining trajectories
contain the same cells as the test trajectory
• The number of cells required to isolate t, n(t), determines whether t
is anomalous
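One isolation run, as described in the bullets above, could be sketched as follows. It assumes cells are drawn from t without replacement, so the second stopping condition is reached once every cell of t has been used. The name `isolate_once` is hypothetical.

```python
import random


def isolate_once(test, others, rng):
    """Return n(t): how many random cells of `test` are drawn in one run.

    `test` is the cell sequence of the test trajectory and `others` is a
    list of cell sequences for the remaining trajectories.
    """
    remaining = [set(cells) for cells in others]
    unused = list(dict.fromkeys(test))  # distinct cells of t, stable order
    rng.shuffle(unused)
    n = 0
    # Stop when no trajectory is left, or when every cell of t has been
    # used (the trajectories still left then contain all cells of t).
    while remaining and unused:
        cell = unused.pop()
        remaining = [cells for cells in remaining if cell in cells]
        n += 1
    return n


if __name__ == "__main__":
    rng = random.Random(0)
    normal_routes = [["a", "b", "c", "d"]] * 5
    print(isolate_once(["a", "b", "c", "d"], normal_routes, rng))
    print(isolate_once(["a", "x", "y", "d"], normal_routes, rng))
```

A trajectory that follows a common route needs all of its cells, while a detour is isolated as soon as one of its unusual cells is drawn.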
iBAT
• Example
• Can separately identify loops e.g. t8
iBAT
• Real-world example
iBAT
• Since the isolation process is random, the average of n(t) over
many runs is used
• Example: random isolation process repeated 200 times
iBAT
• Calculate the anomaly score for t
s(t, N) = 2^{-E(n(t)) / c(N)}
N : Number of trajectories from which we separate t
E(n(t)) : Average number of cells used
c(N) = 2H(N-1) - 2(N-1)/N
H(i) : Harmonic number, estimated as ln(i) + 0.57721566
• E(n(t)) -> 0, s(t, N) -> 1 : Anomalous trajectory
• s(t, N) < 0.5 : t is categorized as normal trajectory
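The score can be evaluated directly from these definitions. The sketch below is a plain transcription of the formula; the function names are hypothetical, and N is assumed to be at least 2 so that ln(N-1) is defined.

```python
import math

EULER_GAMMA = 0.57721566  # constant used in the harmonic-number estimate


def harmonic(i):
    """Estimate of the harmonic number H(i) as ln(i) + 0.57721566."""
    return math.log(i) + EULER_GAMMA


def c(N):
    """Normalisation term c(N) = 2H(N-1) - 2(N-1)/N, for N >= 2."""
    return 2.0 * harmonic(N - 1) - 2.0 * (N - 1) / N


def anomaly_score(mean_cells, N):
    """s(t, N) = 2^(-E(n(t)) / c(N)), where mean_cells is E(n(t))."""
    return 2.0 ** (-mean_cells / c(N))


if __name__ == "__main__":
    N = 256
    print(round(c(N), 4))
    for mean_cells in (1.0, 5.0, 20.0):
        print(mean_cells, round(anomaly_score(mean_cells, N), 3))
```

When E(n(t)) equals c(N) the score is exactly 0.5, which is why 0.5 acts as the boundary: fewer cells than c(N) push the score toward 1, more cells push it toward 0.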
iBAT
• In practice, the training set will be large
• Use a sub-sample of the training data
• Two additional parameters
m : number of running trials (100)
ψ : sub-sample size (256)
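A sketch of how m and ψ might enter the procedure: each of the m trials draws a fresh sub-sample of at most ψ trajectories and runs one random isolation on it, and the cell counts are averaged. Drawing the sub-sample without replacement and drawing cells of t without replacement are both assumptions. The name `mean_isolation_cells` is hypothetical.

```python
import random


def mean_isolation_cells(test, others, m=100, psi=256, seed=0):
    """Average n(t) over m trials, each on a sub-sample of size <= psi."""
    rng = random.Random(seed)
    total = 0
    for _ in range(m):
        # Assumption: sub-sample without replacement for every trial.
        sample = rng.sample(others, min(psi, len(others)))
        remaining = [set(cells) for cells in sample]
        unused = list(dict.fromkeys(test))  # distinct cells of t
        rng.shuffle(unused)
        n = 0
        while remaining and unused:
            cell = unused.pop()
            remaining = [cells for cells in remaining if cell in cells]
            n += 1
        total += n
    return total / m


if __name__ == "__main__":
    others = [["a", "b", "c", "d"] for _ in range(1000)]
    print(mean_isolation_cells(["a", "b", "c", "d"], others))
    print(mean_isolation_cells(["a", "x", "y", "d"], others))
```

The returned average plays the role of E(n(t)), and the sub-sample size is the N to use when normalising the score.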
Evaluation
• Real-world taxi GPS dataset collected in Hangzhou, a large city in
China
• More than 7600 taxis
• One record contains:
- latitude
- longitude
- passenger status
- timestamp
• Area is discretized into 100 × 200 grid cells, each corresponding to
a 250 m × 250 m square
Evaluation
• Five source-destination cell pairs are picked
• Trajectories are manually labeled by three volunteers
• If one volunteer thinks a trajectory is anomalous, it is labeled as
anomalous
Evaluation
• A density-based method is taken as the baseline for comparison
- The density of a trajectory is the average density of all its cells
- The density of a cell is the number of trajectories that pass through
it
• Evaluation criteria
- Detection rate: the fraction of anomalous trajectories that are
successfully detected
- False alarm rate: the fraction of normal trajectories that are
wrongly detected as anomalous
- AUC: Area Under the ROC Curve
The probability that a randomly chosen anomalous trajectory
is ranked higher than a randomly chosen normal one
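The baseline and the AUC definition above can be sketched in a few lines. Two details are assumptions: each trajectory counts once per cell and density is averaged over its distinct cells, and tied scores count as half a win in the AUC. The function names are hypothetical and the toy data is made up.

```python
from collections import Counter


def density_scores(trajectories):
    """Average cell density per trajectory (low density suggests an anomaly)."""
    counts = Counter(cell for cells in trajectories.values() for cell in set(cells))
    return {tid: sum(counts[cell] for cell in set(cells)) / len(set(cells))
            for tid, cells in trajectories.items()}


def auc(scores, labels):
    """Probability that a random anomaly is scored above a random normal."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not from the evaluation.
    trajectories = {"t1": ["a", "b", "c"], "t2": ["a", "b", "c"], "t3": ["a", "x", "c"]}
    density = density_scores(trajectories)
    ids = sorted(trajectories)
    labels = [tid == "t3" for tid in ids]
    # Negate density so that a higher value means "more anomalous".
    print(auc([-density[tid] for tid in ids], labels))
```

The pairwise form is quadratic in the number of trajectories, which is fine for a handful of labeled source-destination pairs.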
Evaluation
• Visualization of the results
Evaluation
• ROC curve of iBAT
• High detection rate with a small false alarm rate
• 90% of anomalous trajectories are detected at a 2% false alarm rate
• For T-4, 100% detection rate with a false alarm rate below 1%
Evaluation
• AUC values compared with the density-based method
• iBAT AUC values are greater than 0.99 for all the datasets
• The density-based method achieves lower AUC values
- Less than 0.95 in four datasets
- 0.97 in dataset T-4, as there are fewer anomalous trajectories
that detour through high-density cells
Evaluation
• AUC value and processing time change with m (ψ = 256) and with ψ
(m = 100)
• Processing time is about 100 seconds, i.e. about 0.07 seconds per
trajectory, when m = 100
Taxi Driving Fraud
• Long-distance detours may correspond to taxi driving fraud
• Detecting anomalous taxi trajectories can help build taxi
driving fraud detection systems
• Challenges
- Some drivers may be genuinely unfamiliar with the routes
- Some may argue that the detour was caused by an accident
or traffic
Road Network Change
• If similar anomalous trajectories keep accumulating, it may
indicate a new or blocked road
• 160 trajectories similar to t0
Strength
• Simple approach
• Computationally efficient
• Captures the defining property of anomalies: “few and different”
Weakness
• The algorithm must be rerun for every trajectory
• The handling of temporal trajectories is not explained well
• Only a density-based approach is used for comparison in the evaluation
Future Work
• Implement a fraud detection system on top of this approach
• Use other information associated with a GPS trace such as
driving speed
References
[1] J.-G. Lee, J. Han, and X. Li. Trajectory outlier detection: A partition-and-detect framework. In
Proc. ICDE 2008, pages 140–149, 2008.
[2] Y. Ge, H. Xiong, Z.-H. Zhou, H. Ozdemir, J. Yu, and K. C. Lee. Top-Eye: Top-k evolving trajectory
outlier detection. In Proc. CIKM 2010, pages 1733–1736, 2010.
[3] Y. Bu, L. Chen, A. W.-C. Fu, and D. Liu. Efficient anomaly monitoring over moving object
trajectory streams. In Proc. KDD 2009, pages 159–168, 2009.
[4] B. Li, D. Zhang, L. Sun, C. Chen, S. Li, G. Qi, and Q. Yang. Hunting or waiting? Discovering
passenger-finding strategies from a large-scale real-world taxi dataset. In MUCS’11 in conjunction
with PerCom 2011, pages 63–68, 2011.
[5] Z. Liao, Y. Yu, and B. Chen. Anomaly detection in GPS data based on visual analytics. In Proc.
VAST 2010, pages 51–58, 2010.
Thank You

iBAT: Detecting Anomalous Taxi Trajectories from GPS Traces

  • 1. iBAT: Detecting Anomalous Taxi Trajectories from GPS Traces DAQING ZHANG, NAN LI, ZHI-HUA ZHOU, CHAO CHEN, LIN SUN, SHIJIAN LI
  • 2. Introduction • Taxis have been equipped with GPS devices • Gathering and analyzing large-scale GPS traces reveal the hidden facts - City dynamics - Human behaviors
  • 3. Potential Applications 1. Fraud detection - Greedy taxi drivers overcharge passenger by taking unnecessary retours - Often manually verified upon passenger complain - In most case frauds are not even noticed by passengers 2. Road network change detection - Urban road network often change with time. E.g. new road, blocked road - Need to update the changes in digital map - Done manually by digital map providers
  • 4. Example • Source : S • Destination : D • Three clusters of trajectories • Four trajectories (t0, t1, t2, t3) are considered as anomalies
  • 5. Challenges • There will be different set of normal trajectories between a pair of source and destination. E.g. 3 clusters • Traditional anomaly detection techniques based on distance are not sufficient. E.g. t3 • Detect an emerging cluster of anomalous trajectories and incorporate the changes • Traditional method often requires fixed length feature vectors
  • 6. Solution • Novel anomalous driving trajectory detection approach • Contribution 1. Transform the problem into an easy-to-solve problem 2. Propose an Isolation-Based Anomalous Trajectory (iBAT) detection method 3. Evaluate iBAT with real-world GPS traces 4. Achieves remarkable detection rate with low processing time
  • 7. Related work • Anomalous trajectory detection 1. Outlier detection using sub-trajectories using distance and density [1] 2. Outlier detection using direction and density [2] 3. Clustering techniques [3] 4. Discover abnormal traffic change [4] 5. Learning based approaches [5]
  • 8. Related work • Anomalous detection methods not designed for trajectory data 1. Distance-based approach [6] 2. Density-based approach [7] 3. Isolation-based approach - iForest [8]
  • 9. iForest • Anomalies are few and different • No distance or density measures • State-of-the-art performance for outlier detection • Lower run time and space complexity • Fixed-length feature vectors
  • 10. Problem Statement • GPS Point - latitude and longitude - Timestamp - Estimated speed - Operation status
  • 11. Problem Statement • Taxi trajectory Sequence of GPS points to an occupied taxi ride t : p1 -> p2 -> …… -> pn p1 : source pn : destination • Problem Given a set of Trajectories T = {t1, t2,….. , tn} between two locations S and D, find those in T that are significantly different from the major
  • 13. Augmenting(preprocessing) • Split the city map into grid-cells of equal size •Map the taxi trajectory to the cell grid - sequence of traversed cells •Augment missing cells by inserting pseudo cells
  • 14. Indexing(preprocessing) • Given a source-destination cell-pair, find all the related taxi trajectories • For given period of time, there may be insufficient trajectories between a source and destination • Add all the taxi trajectories which pass through the source- destination cell pair to solve the issue • Construct inverted index mechanism for trajectory retrieval
  • 15. Indexing(preprocessing) • Example: - Two trajectories - Create the inverted index as follows - Source p1 and destination p5: retrieve t1 and t2 - Source p1 and p3: retrieve t1
  • 16. Adapting iForest • Begin with all the trajectories • Pick one grid cell at a time • Partition trajectories based on the grid cell - Trajectory contains the grid cell : Left - Otherwise : Right • Continue until all the instances are isolated - Either there is only one trajectory to divide - Or all the trajectories contain same set of grid cells • Produces short paths for anomalies and long paths for normal instances
  • 18. iBAT • Lazy learning - Do not train a model until presented with a test sample • Keep one trajectory as test t • Select one grid cell from t randomly and remove trajectories from train which do not contain the selected cell • Repeat until no trajectory is left or all the trajectories left contains cells same as test trajectory • No of cells required to isolate t, n(t) decides anomalous
  • 19. iBAT • Example • Can separately identify loops e.g. t8
  • 21. iBAT • Since the process is completely random, average of n(t) is considered • Example: random isolation process for 200 times
  • 22. iBAT • Calculate the anomaly score for t s(t, N) = 2-E(n(t))/c(N) N : Number of trajectories from which we separate t E(n(t)) : Average number of cells used c(N): 2H(N-1) - 2(N-1)/N H(i): Harmonic number, estimated as ln(i) + 0.57721566 • E(n(t)) -> 0, s(t, N) -> 1 : Anomalous trajectory • s(t, N) < 0.5 : t is categorized as normal trajectory
  • 23. iBAT • In practice, the training set will be large • Use a sub-sample of the training data • Two additional parameters - m: number of running trials (100) - ψ: sub-sample size (256)
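Putting the pieces together, a sub-sampled version might look like the sketch below. It is a hedged illustration rather than the published algorithm: each of the m trials draws a fresh sub-sample of up to ψ trajectories, cells of t are drawn without replacement (an assumption), and the score uses the sub-sample size as N. `ibat_score` is a hypothetical name.

```python
import math
import random

def ibat_score(test, train, m=100, psi=256, seed=0):
    """Anomaly score of `test` against `train`, using m trials of size psi."""
    rng = random.Random(seed)
    test_cells = sorted(set(test))
    train_sets = [set(t) for t in train]
    size = min(psi, len(train_sets))
    if size < 2:
        raise ValueError("need at least two training trajectories")
    total = 0
    for _ in range(m):
        sample = rng.sample(train_sets, size)           # sub-sample of training data
        order = rng.sample(test_cells, len(test_cells))  # random cell order
        n = 0
        for cell in order:
            if not sample:
                break  # test trajectory is isolated
            sample = [s for s in sample if cell in s]
            n += 1
        total += n
    mean_n = total / m
    c = 2 * (math.log(size - 1) + 0.57721566) - 2 * (size - 1) / size
    return 2 ** (-mean_n / c)

# Toy data: 300 identical 20-cell routes, plus a test route that detours.
normal = list(range(20))
detour = [0] + list(range(100, 118)) + [19]
train = [normal] * 300
normal_score = ibat_score(normal, train)
detour_score = ibat_score(detour, train)
```

Because the cost of each trial depends on ψ rather than on the full training set, the processing time stays bounded as more trajectories accumulate.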
  • 24. Evaluation • Real-world taxi GPS dataset collected in Hangzhou, a large city in China • More than 7600 taxis • One record contains: - latitude - longitude - passenger status - timestamp • The area is discretized into 100 × 200 grid cells, each corresponding to a 250 m × 250 m square
  • 25. Evaluation • Five source-destination cell pairs are picked • Trajectories are manually labeled by three volunteers • If one volunteer thinks a trajectory is anomalous, it is labeled as anomalous
  • 26. Evaluation • A density-based method is taken as the baseline for comparison - density of a trajectory is the average density of all its cells - density of a cell is the number of trajectories that pass through it • Evaluation criteria - detection rate: the fraction of anomalous trajectories that are successfully detected - false alarm rate: the fraction of normal trajectories that are wrongly detected as anomalous - AUC: Area Under the ROC Curve, the probability that a randomly chosen anomalous trajectory is ranked higher than a randomly chosen normal one
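The baseline and the three criteria can be sketched as follows. The data are made up, and two details are assumptions: each trajectory counts a cell once however often it visits it, and the negated density is used as the anomaly score so that higher means more anomalous. The function names are illustrative.

```python
from collections import Counter

def density_scores(trajectories):
    """Anomaly score per trajectory: negated mean density of its cells."""
    counts = Counter(cell for t in trajectories for cell in set(t))
    return [-sum(counts[c] for c in set(t)) / len(set(t)) for t in trajectories]

def rates(scores, labels, threshold):
    """(detection rate, false alarm rate) when score >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    pos = sum(labels)
    neg = len(labels) - pos
    detected = sum(f and y for f, y in zip(flagged, labels))
    false_alarms = sum(f and not y for f, y in zip(flagged, labels))
    return detected / pos, false_alarms / neg

def auc(scores, labels):
    """P(random anomalous score > random normal score); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: four routes through cells 1-2-3 and one detour through cell 9.
trajectories = [[1, 2, 3]] * 4 + [[1, 9, 3]]
labels = [False, False, False, False, True]
scores = density_scores(trajectories)
detection_rate, false_alarm_rate = rates(scores, labels, threshold=-4.0)
area = auc(scores, labels)
```

Sweeping the threshold and plotting the detection rate against the false alarm rate gives the ROC curve. The pairwise count in `auc` is an equivalent way to get the area under it.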
  • 28. Evaluation • ROC curve of iBAT • High detection rate with a small false alarm rate • 90% of anomalous trajectories are detected at a 2% false alarm rate • For T-4, 100% detection rate with a false alarm rate of less than 1%
  • 29. Evaluation • AUC values compared with the density-based method • iBAT's AUC values are greater than 0.99 for all the datasets • The density-based method achieves lower AUC values - less than 0.95 in four datasets - 0.97 in dataset T-4, as there are fewer anomalous trajectories that detour through high-density cells
  • 30. Evaluation • AUC value and processing time as m varies (ψ = 256) and as ψ varies (m = 100) • Processing time is about 100 seconds, or about 0.07 seconds per trajectory, when m = 100
  • 31. Taxi Driving Fraud • Long-distance detours may correspond to taxi driving fraud • Detecting anomalous taxi trajectories can help build taxi driving fraud detection systems • Challenges - Some drivers may be genuinely unfamiliar with the routes - Some may argue that the detour was due to an accident or traffic
  • 32. Road Network Change • If many similar anomalous trajectories accumulate, it may indicate a new or blocked road • 160 trajectories similar to t0
  • 33. Strength • Simple approach • Computationally efficient • Captures the defining property of anomalies: they are “few and different”
  • 34. Weakness • The algorithm must be rerun for each test trajectory • The handling of temporal trajectories is not explained well • Only a density-based approach is compared in the evaluation
  • 35. Future Work • Implement a fraud detection system on top of this application • Use other information associated with a GPS trace, such as driving speed
  • 36. References [1] J.-G. Lee, J. Han, and X. Li. Trajectory outlier detection: A partition-and-detect framework. In Proc. ICDE 2008, pages 140–149, 2008. [2] Y. Ge, H. Xiong, Z.-H. Zhou, H. Ozdemir, J. Yu, and K. C. Lee. Top-Eye: Top-k evolving trajectory outlier detection. In Proc. CIKM 2010, pages 1733–1736, 2010. [3] Y. Bu, L. Chen, A. W.-C. Fu, and D. Liu. Efficient anomaly monitoring over moving object trajectory streams. In Proc. KDD 2009, pages 159–168, 2009. [4] B. Li, D. Zhang, L. Sun, C. Chen, S. Li, G. Qi, and Q. Yang. Hunting or waiting? Discovering passenger-finding strategies from a large-scale real-world taxi dataset. In MUCS’11 in conjunction with PerCom 2011, pages 63–68, 2011. [5] Z. Liao, Y. Yu, and B. Chen. Anomaly detection in GPS data based on visual analytics. In Proc. VAST 2010, pages 51–58, 2010.