SlideShare a Scribd company logo
1 of 34
CS548 Fall 2016 Anomaly Detection Showcase
by
Nichole Etienne, Rohitpal Singh, Suchithra Balakrishnan, Yousef Fadila
Showcasing work by Bowen Du, Chuaren Liu, Wenjun Zhou, Zhenshan Hou, Hui Xiong on “
Catch Me If You Can - Detecting Pickpocket Suspects from Large-Scale Transit
Records”
CATCH ME IF YOU CAN
References
[1] Bowen Du, Chuaren Liu, Wenjun Zhou, Zhenshan Hou, Hui Xion, “Catch Me If You
Can - Detecting Pickpocket Suspects from Large-Scale Transit Records”, KDD ‘16,
Aug 13- 17, 2016, San Francisco, USA
[2] Paul Bouman, Evelien Van der Hurk, Leo Kroon, Ting Li, Peter Vervest, “Detecting
activity patterns from smart card data”, In BNAIC, 2013
[3] Markus M. Breunig, Hans Peter Kriegel, Raymond T. Ng and Jorg Sander. LOF:
Identifying density-based local outliers. SIGMOD Rec., 29(2):93-104, May 2000
Motivation
- Passengers in the public transit systems have been the main target for
pickpockets. In many cities, thefts happen frequently in transit systems
- Passengers dissatisfaction and serious public safety concerns
- 2014 in Beijing
350 pickpockets on subway system
490 pickpockets on buses
Abnormal travel Behaviours
 Travelling for an extended length of time
 Making unnecessary transfers
 Wandering on certain routes while making random stops
 Making random stops
Trajectories of Passengers
Examples of outliers
● A -> C -> D -> B
instead of A -> B (shortest
time/distance)
● E -> B (fewer passengers take
that path)
Data and Framework
Steps in the Architecture
1. Partition the city area into
regions with functional
categories
2. Extraction of mobility
characteristics of passengers
3. Individual mobility database to
store the profile of each
passenger
4. Passenger filtering and suspect
detection
5. User feedback information, for
future model training
Data Description - Transit Records
 Public transit system from buses and subways
 Rechargeable smart card – swipe when board or exit the vehicle
 Automated fare collection system (AFC) calculates the fare according to stations boarding and exiting
AFC record contains,
● Smart card ID
● Route number
● Event (boarding or exiting)
● Station
● Time stamp
Passenger Activity Map
Points of Interest Public Transit Network Info
Incident Reports
Confirmed Reports are publicly announced via SINA WEIBO, a primary social networking site in China
Two types of Pickpocket reports,
Official Announcements - announced by police
Personal Complaints - posted by victims
Extraction of Features
Mobility Characteristics
.
Travel Time and Frequency
• The daily travel time is defined as the total duration spent by each passenger
in the public transit system
• The daily riding frequency is defined as the number of transit records traveled
by each passenger per day.
Travel Time and Frequency
A thief has to spend quite long time in the crowded buses, subways, or near the transit stations to find
potential victims and better their crime moments
Short Rides
 A short ride is a transit record tr with less than 3 stops. Regular passengers
normally prefer fewer transfers in each trip.
 A thief (pickpocket) travels between bus/subway stations in random ways without
specific destinations.
 80% passengers finish their travels in 2 hours and within 2 transit records per day.
 In comparison, the identified thieves often spend more than 3 hours of daily travel
time, and their daily riding frequency is also larger.
Functional Transaction
 A high-level view of the human mobility patterns can be summarized by
transition among regions, where each region covers multiple stations
 Example ‘shopping trips’ like residence → shopping facilities → residence, or
‘sightseeing trips’ like residence → scenic spot → scenic spot → residence.
 Observation showed that pickpockets tend not to follow the typical patterns, and
wander randomly among the functional regions.
Frequently Visited Regions
 Regular movements between a small set of locations that the passenger is
familiar with
 Pickpockets often spend a significant portion of the time within few routes or
regions if they intend for opportunities
 Once a thief has committed the crime or lost the target, he or she would likely
come back to a familiar station for the next target
 Wandering behaviors were measured by counting the maximum number of
times a route was taken, or the maximum number of visits made to a region
Deviation from the Social Norm
 The difference between the individual behaviors and the typical behaviors of the population, so we
call them social features
 Example :
 Most of the trips will be finished within a specific amount of time given the trip origin and
destination,
 pickpocket suspects may spend more time in the transit system during the trip
Historical Behavior
 The statistics (e.g., median and standard deviation) were computed as daily
features were observed in the last seven days for each passenger, to quantify
their historical behaviors
 The median is more robust in the presence of outliers, hence its use
The Algorithm :Two Steps Algo
Goal:
Distinguish pickpockets from the regular passengers with high accuracy and low false-positives.
Motivation for the two-step approach
Classification Methods : Non-trivial Method
❖ Majority of the passengers are regular ones and only negligible are pickpockets in
passenger population.
❖ Class imbalance so heuristic approach leads to over-sampling and undersampling
Anomaly Detection Methods : High False positive Results
❖ Generally unsupervised, not good to scale instances well.
❖ High false positives
Two-step approach:
❖ The first step uses unsupervised anomaly detection techniques to filter out regular passengers from
suspects.
❖ The second step uses supervised classification to identify real pickpockets from suspects.
{(xj,yj) | j = 1,...,N} where xj ∈ Rq is
the extracted features associated
with the j-th passenger .
yj ∈ {0,1} : 1 means pickpockets.
Model : f : x → y = f(x)
Step I
➢ Filter out regular passengers from suspicious passengers.
➢ Used one class SVM (high accuracy, computing efficiency and high flexibility)
➢ The function φ(·) maps the original feature into a high- dimensional kernel space
where the optimal decision boundary exists:
➢ Use Kernel : κ(x1, x2) = ⟨φ(x1), φ(x2)⟩
Step II
➢ To distinguish the real suspects and false positives
➢ C is a controlling parameter that reduces false positives
➢ Used Supervised Classification Method using confirmation on social media as target
attribute
➢ Optimal Decision hyperplane :
➢ To compute optimal w and p in h(.) , Optimized
EXPERIMENT & RESULT
Baselines ( for comparison) & evaluation
 Classification methods (CM). The classification methods, including logistic regression (LR), decision
trees (DT), and support vector machines (SVM)
 Anomaly detection (AD). Anomaly detection methods, such as one-class SVM (OCSVM) and local
outlier factor (LOF)
 Two-step (TS) methods
Evaluation metrics.
- precision
- recall
- F-score computed with test set
Experiment Settings
Datasets:
real-world datasets containing over 1.6 billion transit records
split the data into historical training set and evaluation test set
Platform:
Windows Server 2012 64-bit system (4-CPU, each with 2.6GHz with
Quad-Core, and 128G main memory)
All algorithms were implemented with Java
The Result
● precisions of all one-step
methods are very low
● two-step combinations
significantly improve the
precisions.
● Two-step approach can e
ectively reduce the false-
positives
DEPLOYMENT AND INSIGHTS
Goal: Develop a decision support system for the security personnel to easily spot
pickpocket and hotspots
Screenshot of the prototype system
Shows:
 Suspect List
 Statistics
 Passenger Flows
 Active Regions
 Selected Suspect
Related Work
 Passengers Activity Patterns
 assessing the performance of the transit network
 identifying and optimizing problematic or awed bus routes,
 making service adjustments that accommodate variations in ridership on
different days
 Abnormal Traveling Behavior Detection
 discovers black-hole or volcano patterns in human mobility data in a
city,which can quickly identify gathering events, such as football matches
and concerts
 making service adjustments to smooth the traffic flow
Conclusion
 suspect detection and tracking system by mining large-scale transit records
 novel two-step framework to distinguish regular passengers from pickpocket
suspects
 implement a prototype system for end users
 experimental results on real-world data showed the effectiveness of the approach
Any Questions?

More Related Content

Viewers also liked

MISION-VISION-FODA-PERSONAL Y EMPRESARIAL
MISION-VISION-FODA-PERSONAL Y EMPRESARIALMISION-VISION-FODA-PERSONAL Y EMPRESARIAL
MISION-VISION-FODA-PERSONAL Y EMPRESARIALJoseph Garcia
 
تمارين وألعاب من دار الإبداع
تمارين وألعاب من دار الإبداع تمارين وألعاب من دار الإبداع
تمارين وألعاب من دار الإبداع Ebdaa center
 
Amber pitch deck_f2
Amber pitch deck_f2Amber pitch deck_f2
Amber pitch deck_f2Kyle Byrd
 
أنواع التفكير
أنواع التفكيرأنواع التفكير
أنواع التفكيرmemo200
 
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1Yousef Fadila
 
تعليم التفكيـــر ( مفاهيم وتطبيقات ) لـــ أ.د . فتحي عبد الرحمن جروان .
تعليم التفكيـــر ( مفاهيم وتطبيقات ) لـــ أ.د . فتحي عبد الرحمن جروان .تعليم التفكيـــر ( مفاهيم وتطبيقات ) لـــ أ.د . فتحي عبد الرحمن جروان .
تعليم التفكيـــر ( مفاهيم وتطبيقات ) لـــ أ.د . فتحي عبد الرحمن جروان .Mohamed Saad Gelbana
 
شغل مخك 1
شغل مخك 1شغل مخك 1
شغل مخك 1gogo1994
 
المبادرة للإبداع
المبادرة للإبداعالمبادرة للإبداع
المبادرة للإبداعHani Al-Menaii
 
هندسة الأفكار
هندسة الأفكارهندسة الأفكار
هندسة الأفكارHani Al-Menaii
 
Inversionistas .i.
Inversionistas .i.Inversionistas .i.
Inversionistas .i.Arnold_Cross
 

Viewers also liked (14)

MISION-VISION-FODA-PERSONAL Y EMPRESARIAL
MISION-VISION-FODA-PERSONAL Y EMPRESARIALMISION-VISION-FODA-PERSONAL Y EMPRESARIAL
MISION-VISION-FODA-PERSONAL Y EMPRESARIAL
 
تمارين وألعاب من دار الإبداع
تمارين وألعاب من دار الإبداع تمارين وألعاب من دار الإبداع
تمارين وألعاب من دار الإبداع
 
Amber pitch deck_f2
Amber pitch deck_f2Amber pitch deck_f2
Amber pitch deck_f2
 
أنواع التفكير
أنواع التفكيرأنواع التفكير
أنواع التفكير
 
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1
 
صيـاغـة مقترحـات التمـويل الفعّـالة
صيـاغـة مقترحـات التمـويل الفعّـالةصيـاغـة مقترحـات التمـويل الفعّـالة
صيـاغـة مقترحـات التمـويل الفعّـالة
 
تعليم التفكيـــر ( مفاهيم وتطبيقات ) لـــ أ.د . فتحي عبد الرحمن جروان .
تعليم التفكيـــر ( مفاهيم وتطبيقات ) لـــ أ.د . فتحي عبد الرحمن جروان .تعليم التفكيـــر ( مفاهيم وتطبيقات ) لـــ أ.د . فتحي عبد الرحمن جروان .
تعليم التفكيـــر ( مفاهيم وتطبيقات ) لـــ أ.د . فتحي عبد الرحمن جروان .
 
أنا أُفكّر
أنا أُفكّرأنا أُفكّر
أنا أُفكّر
 
شغل مخك 1
شغل مخك 1شغل مخك 1
شغل مخك 1
 
كتابة مقترح مشروع
كتابة مقترح مشروعكتابة مقترح مشروع
كتابة مقترح مشروع
 
المبادرة للإبداع
المبادرة للإبداعالمبادرة للإبداع
المبادرة للإبداع
 
هندسة الأفكار
هندسة الأفكارهندسة الأفكار
هندسة الأفكار
 
Plan de estudio
Plan de estudioPlan de estudio
Plan de estudio
 
Inversionistas .i.
Inversionistas .i.Inversionistas .i.
Inversionistas .i.
 

Similar to Anomaly Detection - Catch me if you can

Detecting Pickpocket Suspects fromLarge-Scale Public Transit Records
Detecting Pickpocket Suspects fromLarge-Scale Public Transit RecordsDetecting Pickpocket Suspects fromLarge-Scale Public Transit Records
Detecting Pickpocket Suspects fromLarge-Scale Public Transit RecordsJAYAPRAKASH JPINFOTECH
 
Comparative framework for activity-travel diary collection systems
Comparative framework for activity-travel diary collection systemsComparative framework for activity-travel diary collection systems
Comparative framework for activity-travel diary collection systemsAdrian C. Prelipcean
 
Ultimate Guide to Walkability Assessment Tools
Ultimate Guide to Walkability Assessment ToolsUltimate Guide to Walkability Assessment Tools
Ultimate Guide to Walkability Assessment ToolsState of Place
 
[Seminar] 20210115 Hyeshin Chu
[Seminar] 20210115 Hyeshin Chu[Seminar] 20210115 Hyeshin Chu
[Seminar] 20210115 Hyeshin Chuivaderivader
 
SMART International Symposium for Next Generation Infrastructure:Agency in tr...
SMART International Symposium for Next Generation Infrastructure:Agency in tr...SMART International Symposium for Next Generation Infrastructure:Agency in tr...
SMART International Symposium for Next Generation Infrastructure:Agency in tr...SMART Infrastructure Facility
 
2015 Transportation Research Forum Webinar - Enabling Better Mobility Through...
2015 Transportation Research Forum Webinar - Enabling Better Mobility Through...2015 Transportation Research Forum Webinar - Enabling Better Mobility Through...
2015 Transportation Research Forum Webinar - Enabling Better Mobility Through...Sean Barbeau
 
Intelligent Transportation System
Intelligent Transportation SystemIntelligent Transportation System
Intelligent Transportation Systemguest6d72ec
 
IRJET- Fuzzy Logic based Route Choice Behaviour Modelling
IRJET-  	  Fuzzy Logic based Route Choice Behaviour ModellingIRJET-  	  Fuzzy Logic based Route Choice Behaviour Modelling
IRJET- Fuzzy Logic based Route Choice Behaviour ModellingIRJET Journal
 
E-Ticketing System for public transport
E-Ticketing System  for  public transportE-Ticketing System  for  public transport
E-Ticketing System for public transportIliyas Khan
 
Neural Network Based Parking via Google Map Guidance
Neural Network Based Parking via Google Map GuidanceNeural Network Based Parking via Google Map Guidance
Neural Network Based Parking via Google Map GuidanceIJERA Editor
 
Traffic studies (transportation engineering)
Traffic studies (transportation engineering)Traffic studies (transportation engineering)
Traffic studies (transportation engineering)Civil Zone
 
Crime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringCrime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringReuben George
 
Building and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsBuilding and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsEmiliano De Cristofaro
 
Cross-border e-commerce trade facilitation and simplication
Cross-border e-commerce trade facilitation and simplicationCross-border e-commerce trade facilitation and simplication
Cross-border e-commerce trade facilitation and simplicationJonathan285782
 
IRJET- Traffic Study on Mid-Block Section & Intersection
IRJET-  	  Traffic Study on Mid-Block Section & IntersectionIRJET-  	  Traffic Study on Mid-Block Section & Intersection
IRJET- Traffic Study on Mid-Block Section & IntersectionIRJET Journal
 
IRJET - Driving Safety Risk Analysis using Naturalistic Driving Data
IRJET - Driving Safety Risk Analysis using Naturalistic Driving DataIRJET - Driving Safety Risk Analysis using Naturalistic Driving Data
IRJET - Driving Safety Risk Analysis using Naturalistic Driving DataIRJET Journal
 
Barbeau enabling better mobility through innovations for mobile devices - o...
Barbeau   enabling better mobility through innovations for mobile devices - o...Barbeau   enabling better mobility through innovations for mobile devices - o...
Barbeau enabling better mobility through innovations for mobile devices - o...Sean Barbeau
 
Urban Transportation system - mass transit system
Urban Transportation system - mass transit systemUrban Transportation system - mass transit system
Urban Transportation system - mass transit systemEddy Ankit Gangani
 
TorkkolaZhangLiZhangSchreinerGardner(MIRW2007)
TorkkolaZhangLiZhangSchreinerGardner(MIRW2007)TorkkolaZhangLiZhangSchreinerGardner(MIRW2007)
TorkkolaZhangLiZhangSchreinerGardner(MIRW2007)Harry Zhang
 

Similar to Anomaly Detection - Catch me if you can (20)

Detecting Pickpocket Suspects fromLarge-Scale Public Transit Records
Detecting Pickpocket Suspects fromLarge-Scale Public Transit RecordsDetecting Pickpocket Suspects fromLarge-Scale Public Transit Records
Detecting Pickpocket Suspects fromLarge-Scale Public Transit Records
 
Comparative framework for activity-travel diary collection systems
Comparative framework for activity-travel diary collection systemsComparative framework for activity-travel diary collection systems
Comparative framework for activity-travel diary collection systems
 
Ultimate Guide to Walkability Assessment Tools
Ultimate Guide to Walkability Assessment ToolsUltimate Guide to Walkability Assessment Tools
Ultimate Guide to Walkability Assessment Tools
 
[Seminar] 20210115 Hyeshin Chu
[Seminar] 20210115 Hyeshin Chu[Seminar] 20210115 Hyeshin Chu
[Seminar] 20210115 Hyeshin Chu
 
SMART International Symposium for Next Generation Infrastructure:Agency in tr...
SMART International Symposium for Next Generation Infrastructure:Agency in tr...SMART International Symposium for Next Generation Infrastructure:Agency in tr...
SMART International Symposium for Next Generation Infrastructure:Agency in tr...
 
2015 Transportation Research Forum Webinar - Enabling Better Mobility Through...
2015 Transportation Research Forum Webinar - Enabling Better Mobility Through...2015 Transportation Research Forum Webinar - Enabling Better Mobility Through...
2015 Transportation Research Forum Webinar - Enabling Better Mobility Through...
 
Intelligent Transportation System
Intelligent Transportation SystemIntelligent Transportation System
Intelligent Transportation System
 
IRJET- Fuzzy Logic based Route Choice Behaviour Modelling
IRJET-  	  Fuzzy Logic based Route Choice Behaviour ModellingIRJET-  	  Fuzzy Logic based Route Choice Behaviour Modelling
IRJET- Fuzzy Logic based Route Choice Behaviour Modelling
 
E-Ticketing System for public transport
E-Ticketing System  for  public transportE-Ticketing System  for  public transport
E-Ticketing System for public transport
 
Neural Network Based Parking via Google Map Guidance
Neural Network Based Parking via Google Map GuidanceNeural Network Based Parking via Google Map Guidance
Neural Network Based Parking via Google Map Guidance
 
Traffic studies (transportation engineering)
Traffic studies (transportation engineering)Traffic studies (transportation engineering)
Traffic studies (transportation engineering)
 
Crime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringCrime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means Clustering
 
Building and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsBuilding and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility Analytics
 
Cross-border e-commerce trade facilitation and simplication
Cross-border e-commerce trade facilitation and simplicationCross-border e-commerce trade facilitation and simplication
Cross-border e-commerce trade facilitation and simplication
 
IRJET- Traffic Study on Mid-Block Section & Intersection
IRJET-  	  Traffic Study on Mid-Block Section & IntersectionIRJET-  	  Traffic Study on Mid-Block Section & Intersection
IRJET- Traffic Study on Mid-Block Section & Intersection
 
IRJET - Driving Safety Risk Analysis using Naturalistic Driving Data
IRJET - Driving Safety Risk Analysis using Naturalistic Driving DataIRJET - Driving Safety Risk Analysis using Naturalistic Driving Data
IRJET - Driving Safety Risk Analysis using Naturalistic Driving Data
 
Barbeau enabling better mobility through innovations for mobile devices - o...
Barbeau   enabling better mobility through innovations for mobile devices - o...Barbeau   enabling better mobility through innovations for mobile devices - o...
Barbeau enabling better mobility through innovations for mobile devices - o...
 
Sergio ICON project
Sergio ICON projectSergio ICON project
Sergio ICON project
 
Urban Transportation system - mass transit system
Urban Transportation system - mass transit systemUrban Transportation system - mass transit system
Urban Transportation system - mass transit system
 
TorkkolaZhangLiZhangSchreinerGardner(MIRW2007)
TorkkolaZhangLiZhangSchreinerGardner(MIRW2007)TorkkolaZhangLiZhangSchreinerGardner(MIRW2007)
TorkkolaZhangLiZhangSchreinerGardner(MIRW2007)
 

More from Yousef Fadila

Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterYousef Fadila
 
Synergy on the Blockchain! whitepaper
Synergy on the Blockchain!  whitepaperSynergy on the Blockchain!  whitepaper
Synergy on the Blockchain! whitepaperYousef Fadila
 
Synergy Platform Whitepaper alpha
Synergy Platform Whitepaper alphaSynergy Platform Whitepaper alpha
Synergy Platform Whitepaper alphaYousef Fadila
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems - Yousef Fadila
 
Analysis on steam platform
Analysis on steam platformAnalysis on steam platform
Analysis on steam platformYousef Fadila
 
interactive voting based map matching algorithm
interactive voting based map matching algorithminteractive voting based map matching algorithm
interactive voting based map matching algorithmYousef Fadila
 
Spot deceptive TripAdvisor Reviews
Spot deceptive TripAdvisor ReviewsSpot deceptive TripAdvisor Reviews
Spot deceptive TripAdvisor ReviewsYousef Fadila
 
Tweeting for Hillary - DS 501 case study 1
Tweeting for Hillary - DS 501 case study 1Tweeting for Hillary - DS 501 case study 1
Tweeting for Hillary - DS 501 case study 1Yousef Fadila
 

More from Yousef Fadila (8)

Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity Calorimeter
 
Synergy on the Blockchain! whitepaper
Synergy on the Blockchain!  whitepaperSynergy on the Blockchain!  whitepaper
Synergy on the Blockchain! whitepaper
 
Synergy Platform Whitepaper alpha
Synergy Platform Whitepaper alphaSynergy Platform Whitepaper alpha
Synergy Platform Whitepaper alpha
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems -
 
Analysis on steam platform
Analysis on steam platformAnalysis on steam platform
Analysis on steam platform
 
interactive voting based map matching algorithm
interactive voting based map matching algorithminteractive voting based map matching algorithm
interactive voting based map matching algorithm
 
Spot deceptive TripAdvisor Reviews
Spot deceptive TripAdvisor ReviewsSpot deceptive TripAdvisor Reviews
Spot deceptive TripAdvisor Reviews
 
Tweeting for Hillary - DS 501 case study 1
Tweeting for Hillary - DS 501 case study 1Tweeting for Hillary - DS 501 case study 1
Tweeting for Hillary - DS 501 case study 1
 

Recently uploaded

FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 

Recently uploaded (20)

FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 

Anomaly Detection - Catch me if you can

  • 1. CS548 Fall 2016 Anomaly Detection Showcase by Nichole Etienne, Rohitpal Singh, Suchithra Balakrishnan, Yousef Fadila Showcasing work by Bowen Du, Chuaren Liu, Wenjun Zhou, Zhenshan Hou, Hui Xiong on “ Catch Me If You Can - Detecting Pickpocket Suspects from Large-Scale Transit Records” CATCH ME IF YOU CAN
  • 2. References [1] Bowen Du, Chuaren Liu, Wenjun Zhou, Zhenshan Hou, Hui Xion, “Catch Me If You Can - Detecting Pickpocket Suspects from Large-Scale Transit Records”, KDD ‘16, Aug 13- 17, 2016, San Francisco, USA [2] Paul Bouman, Evelien Van der Hurk, Leo Kroon, Ting Li, Peter Vervest, “Detecting activity patterns from smart card data”, In BNAIC, 2013 [3] Markus M. Breunig, Hans Peter Kriegel, Raymond T. Ng and Jorg Sander. LOF: Identifying density-based local outliers. SIGMOD Rec., 29(2):93-104, May 2000
  • 3. Motivation - Passengers in the public transit systems have been the main target for pickpockets. In many cities, thefts happen frequently in transit systems - Passengers dissatisfaction and serious public safety concerns - 2014 in Beijing 350 pickpockets on subway system 490 pickpockets on buses
  • 4. Abnormal travel Behaviours  Travelling for an extended length of time  Making unnecessary transfers  Wandering on certain routes while making random stops  Making random stops
  • 5. Trajectories of Passengers Examples of outliers ● A -> C -> D -> B instead of A -> B (shortest time/distance) ● E -> B (fewer passengers take that path)
  • 7. Steps in the Architecture 1. Partition the city area into regions with functional categories 2. Extraction of mobility characteristics of passengers 3. Individual mobility database to store the profile of each passenger 4. Passenger filtering and suspect detection 5. User feedback information, for future model training
  • 8. Data Description - Transit Records  Public transit system from buses and subways  Rechargeable smart card – swipe when board or exit the vehicle  Automated fare collection system (AFC) calculates the fare according to stations boarding and exiting AFC record contains, ● Smart card ID ● Route number ● Event (boarding or exiting) ● Station ● Time stamp
  • 10. Points of Interest Public Transit Network Info
  • 11. Incident Reports Confirmed Reports are publicly announced via SINA WEIBO, a primary social networking site in China Two types of Pickpocket reports, Official Announcements - announced by police Personal Complaints - posted by victims
  • 14. Travel Time and Frequency • The daily travel time is defined as the total duration spent by each passenger in the public transit system • The daily riding frequency is defined as the number of transit records traveled by each passenger per day.
  • 15. Travel Time and Frequency A thief has to spend quite long time in the crowded buses, subways, or near the transit stations to find potential victims and better their crime moments
  • 16. Short Rides  A short ride is a transit record tr with less than 3 stops. Regular passengers normally prefer fewer transfers in each trip.  A thief (pickpocket) travels between bus/subway stations in random ways without specific destinations.  80% passengers finish their travels in 2 hours and within 2 transit records per day.  In comparison, the identified thieves often spend more than 3 hours of daily travel time, and their daily riding frequency is also larger.
  • 17. Functional Transaction  A high-level view of the human mobility patterns can be summarized by transition among regions, where each region covers multiple stations  Example ‘shopping trips’ like residence → shopping facilities → residence, or ‘sightseeing trips’ like residence → scenic spot → scenic spot → residence.  Observation showed that pickpockets tend not to follow the typical patterns, and wander randomly among the functional regions.
  • 18. Frequently Visited Regions  Regular movements between a small set of locations that the passenger is familiar with  Pickpockets often spend a significant portion of the time within few routes or regions if they intend for opportunities  Once a thief has committed the crime or lost the target, he or she would likely come back to a familiar station for the next target  Wandering behaviors were measured by counting the maximum number of times a route was taken, or the maximum number of visits made to a region
  • 19. Deviation from the Social Norm  The difference between the individual behaviors and the typical behaviors of the population, so we call them social features  Example :  Most of the trips will be finished within a specific amount of time given the trip origin and destination,  pickpocket suspects may spend more time in the transit system during the trip
  • 20. Historical Behavior  The statistics (e.g., median and standard deviation) were computed as daily features were observed in the last seven days for each passenger, to quantify their historical behaviors  The median is more robust in the presence of outliers, hence its use
  • 21. The Algorithm :Two Steps Algo Goal: Distinguish pickpockets from the regular passengers with high accuracy and low false-positives.
  • 22. Motivation for the two-step approach Classification Methods : Non-trivial Method ❖ Majority of the passengers are regular ones and only negligible are pickpockets in passenger population. ❖ Class imbalance so heuristic approach leads to over-sampling and undersampling Anomaly Detection Methods : High False positive Results ❖ Generally unsupervised, not good to scale instances well. ❖ High false positives
  • 23. Two-step approach: ❖ The first step uses unsupervised anomaly detection techniques to filter out regular passengers from suspects. ❖ The second step uses supervised classification to identify real pickpockets from suspects. {(xj,yj) | j = 1,...,N} where xj ∈ Rq is the extracted features associated with the j-th passenger . yj ∈ {0,1} : 1 means pickpockets. Model : f : x → y = f(x)
  • 24. Step I ➢ Filter out regular passengers from suspicious passengers. ➢ Used one class SVM (high accuracy, computing efficiency and high flexibility) ➢ The function φ(·) maps the original feature into a high- dimensional kernel space where the optimal decision boundary exists: ➢ Use Kernel : κ(x1, x2) = ⟨φ(x1), φ(x2)⟩
  • 25. Step II ➢ To distinguish the real suspects and false positives ➢ C is a controlling parameter that reduces false positives ➢ Used Supervised Classification Method using confirmation on social media as target attribute ➢ Optimal Decision hyperplane : ➢ To compute optimal w and p in h(.) , Optimized
  • 27. Baselines ( for comparison) & evaluation  Classification methods (CM). The classification methods, including logistic regression (LR), decision trees (DT), and support vector machines (SVM)  Anomaly detection (AD). Anomaly detection methods, such as one-class SVM (OCSVM) and local outlier factor (LOF)  Two-step (TS) methods Evaluation metrics. - precision - recall - F-score computed with test set
  • 28. Experiment Settings Datasets: real-world datasets containing over 1.6 billion transit records split the data into historical training set and evaluation test set Platform: Windows Server 2012 64-bit system (4-CPU, each with 2.6GHz with Quad-Core, and 128G main memory) All algorithms were implemented with Java
  • 29. The Result ● precisions of all one-step methods are very low ● two-step combinations significantly improve the precisions. ● Two-step approach can e ectively reduce the false- positives
  • 30. DEPLOYMENT AND INSIGHTS Goal: Develop a decision support system for the security personnel to easily spot pickpocket and hotspots
  • 31. Screenshot of the prototype system Shows:  Suspect List  Statistics  Passenger Flows  Active Regions  Selected Suspect
  • 32. Related Work  Passengers Activity Patterns  assessing the performance of the transit network  identifying and optimizing problematic or awed bus routes,  making service adjustments that accommodate variations in ridership on different days  Abnormal Traveling Behavior Detection  discovers black-hole or volcano patterns in human mobility data in a city,which can quickly identify gathering events, such as football matches and concerts  making service adjustments to smooth the traffic flow
  • 33. Conclusion  suspect detection and tracking system by mining large-scale transit records  novel two-step framework to distinguish regular passengers from pickpocket suspects  implement a prototype system for end users  experimental results on real-world data showed the effectiveness of the approach

Editor's Notes

  1. Put figure 1 in a slide, and give example, travel who instead of going directly from A to B, choose to travel A → C → D → B is suspected.! But travel from E-B (outlier) is not suspected as this passenger is likely just a regular passenger who originates from a less crowded areat’s
  2. Clarify the different between transit record and trip (Verbal, explain the figure, don’t add these text to slides) Part (a) is the actual trajectory on the city’s map; Part (b) splits the trajectory into three separate trips; Part (c) demonstrates the corresponding transit records in our data. Explain the 30 minutes empirical cutoff!
  3. The features are are grouped into three categories: daily behaviors, social comparisons, and historical behaviors
  4. A thief (pickpocket) tends to travel between bus/subway stations in random ways without specific destinations. Such abnormal behaviors lead to abnormal daily travel time and riding frequency. We can see that more than 80% passengers finish their travels in 2 hours and within 2 transit records per day. In comparison, the identified thieves often spend more than 3 hours of daily travel time, and their daily riding frequency is also larger.
  5. Figure 7(a), the distribution approximates Gaussian with mean around 3. In Figure 7(b), for the passengers with at least 19 transit records, the distribution peak is shifted to the right. It shows that the frequency of short rides increases with increasing number of transit records.
  6. such as the number of boarding stations and the number of boarding regions. Then, we count the transition frequency between any pair of function categories for the daily trip of each passenger.
  7. , the wandering behaviors lead to concentration of the passenger’s boarding or exiting locations. We use the number of clusters as another feature, which measures the wandering concentration
  8. given the same origin and destination, the trip variation of the majority of the population (i.e., regular passengers) is low Thus, for each pair of origin o and destination d, we find the travel time of all trips between this pair. We then convert the time gap significance of a given trip by the quantile of its travel time with respect to the population.
  9. In our case, we only used data from the past seven days, and median can effectively avoid the impact of non-routine passenger behaviors. The standard deviation indicates the degree of variation of the daily behaviors of individual passengers. Regular passengers following routine trajectories normally generate statistics with less variations.