SlideShare a Scribd company logo
1 of 31
recommendations for
urban & transport contexts
      neal lathia (@neal_lathia)
      ucl media futures seminar
             may 25, 2011
research: personalisation
 to aide mobility in cities

              i.e.,
   getting from a to b (habit)
     finding z (discovery)
sensing mobility:
 5%-sample, 2 x 83-days
time-stamped location (entry, exit), modality
      payments (top-ups, travel cards)
        card-types (e.g., student)
what tools can we design to help travellers?
previously:
(getting from a to b) N. Lathia, J. Froehlich, L. Capra. Mining Public Transport Usage for
Personalised Intelligent Transport Systems. In IEEE ICDM 2010, Sydney, Australia.

(discovering z) D. Quercia, N. Lathia, F. Calabrese, G. Di Lorenzo, J. Crowcroft. Recommending
Social Events from Mobile Phone Location Data. In IEEE ICDM 2010, Sydney, Australia.
there is more to urban mobility than just moving.
context: purpose/intent, events, disruptions, cost, social connections.

for example,
mobility vs. cost
who are you?                how?




         when?



where         cash or travel card?
to?                                  how long for?
what
route?
N. Lathia, L. Capra. Mining Mobility Data to Minimise Travellers'
Spending on Public Transport. In ACM KDD 2011, San Diego, USA.

questions
(1) what is the relation between how we travel & how we spend?
(2) do travellers make the correct decisions? (no)
(3) can we help them with recommendations? (yes)
(%)          pay as you go purchases
                                         49.8         < 5 GBP
                                         24.2         5 – 10 GBP
                                         15.5         10 – 20 GBP
`
                                         (%)          travel card purchases
                                         70.8         7-day travel card
                                         15.8         1-month travel card
                                         11.6         7-day bus/tram pass

                               Purchase Behaviour
                  30
                                                              Travel
                  25                                          Cards
                                                              PAYG
                  20
    % Purchases




                  15


                  10


                   5


                   0
                       Mon   Tue   Wed    Thu   Fri     Sat      Sun
Purchase Geography                                Mobility Flow
45
                                                                                   Zone 1
40
                                          PAYG                                     Zone 2
                                          Travel Cards                             Zone 3
35
                                                                                   Zone 4
30                                                                                 Zone 5
                                                                                   Zone 6
25
                                                         arrive
20

15

10

 5                                                        depart
 0
     1   2   3       4    5    6      7       8     9
the data shows that:
(a) there is a high regularity in travel & purchase behaviour
(b) travellers buy in small increments and short-terms
(c) most purchases happen upon refused entry
(2) do travellers make the correct decisions?
compare actual purchases to the optimal (per traveller)

how:
(a) clean data
(b) build & search on a tree ~ sequence of choices
data cleaning overview




                          83-days   83-days


the “arrow” of time
data cleaning overview


                                    origin = destination




                             83-days                       83-days


the “arrow” of time


                      no purchase observed                 no purchase observed




                                                               -20% of users
how: build a tree with each user's mobility data
where a node is a purchase (expire, cost)
that is expanded when it has expired
(reduced) example:

               PAYG,             7-day             30-day
               £aa.aa            £bb.bb            £cc.cc
how: build a tree with each user's mobility data
where a node is a purchase (expire, cost)
that is expanded when it has expired
(reduced) example:

                PAYG,                   7-day                  30-day
                £aa.aa                  £bb.bb                 £cc.cc




      PAYG,              30-day
      £aa.aa   7-day     £cc.cc
               £bb.bb

                                     we reduce the space-complexity of
                                  searching on this tree by implementing
                                     expansion rules, pruning heuristics
the cheapest sequence of fares can then be
compared to what the user actually spent


               PAYG,
               £aa.aa




     PAYG,
     £aa.aa




     7-day
     £bb.bb




              30-day
              £cc.cc
the cheapest sequence of fares can then be
compared to what the user actually spent


               PAYG,
               £aa.aa


                        in each 83-day dataset, the 5% sample of users
                                 where overspending by ~ £2.5 million
     PAYG,
     £aa.aa                  An estimate of how much everybody (100%)
                        is overspending during an entire year (365 days)
                                                   is thus £200 million

     7-day
     £bb.bb




              30-day
              £cc.cc
overspending comes from
(a) failing to predict one's own mobility needs
                ...but we have observed that mobility is predictable

(b) failing to match mobility with fares (in a complex fare system)
                       ...which is an easy problem for a computer

can we help travellers?
recommender systems
aim to match users to items that will be of interest to them
recommender systems
aim to match users mobility profiles to items fares that will be of
interest the cheapest for them
two prediction problems
(in the paper) first, predict how a person is going to travel
(focus) then, predict what the cheapest fare will be
key fact:
the factors that influence cost on public transport are not
determined by your actual movements (a to b) but by generic
features that are city-dependent (e.g., Zone 1 - 2)
three steps
    1. for a given set of travel histories, compute the cheapest
fare (by tree expansion)
    2. reduce each travel history into a set of generic features,
describing the mobility (next slide)
    3. train classifiers to predict the cheapest fare given the set
of features
we have a set of {d, f, b, r, pt, ot, N} = F

                   where

              d = number of trips
           f = average trips per day
 b / r = proportion of trips on the bus / rail
pt / ot = proportion of peak & off-peak trips
             N = zone O-D matrix
           F = cheapest fare (label)
two baselines, three algorithms:
0. baseline – everyone on pay as you go
1. naïve bayes – estimating probabilities
2. k-nearest neighbours – looking at similar profiles
3. decision trees (C4.5) – recursively partitions data to infer rules
4. oracle – perfect knowledge
Accuracy (%)                     Savings (GBP)
              Dataset 1         Dataset 2      Dataset 1          Dataset 2
Baseline           74.99             76.91       326,447.95         306,145.85
Naïve Bayes        77.46             80.71       393,585.81         369,232.24
k-NN (5)           96.74             97.09       465,822.17         426,375.85
C4.5               98.01             98.29       473,918.38         434,082.81
Oracle             100                   100     479,583.91         438,923.30
N. Lathia, L. Capra. Mining Mobility Data to Minimise Travellers'
Spending on Public Transport. In ACM KDD 2011, San Diego, USA.

questions
(1) what is the relation between how we travel & how we spend?
(2) do travellers make the correct decisions? (no)
(3) can we help them with recommendations? (yes)
recommendations for
urban & transport contexts
      neal lathia (@neal_lathia)
      ucl media futures seminar
             may 25, 2011

More Related Content

Similar to Mobility Mining for Fare Recommendation

Stefan Michalak - Portfolio - January 2016
Stefan Michalak - Portfolio - January 2016Stefan Michalak - Portfolio - January 2016
Stefan Michalak - Portfolio - January 2016
stefan michalak
 
A car sharing auction with temporal-spatial OD connection conditions
A car sharing auction with temporal-spatial OD connection conditionsA car sharing auction with temporal-spatial OD connection conditions
A car sharing auction with temporal-spatial OD connection conditions
harapon
 
Cities2.0 Ict2008 Daniel Kaplan
Cities2.0 Ict2008 Daniel KaplanCities2.0 Ict2008 Daniel Kaplan
Cities2.0 Ict2008 Daniel Kaplan
ACIDD
 
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
Harry Zhang
 

Similar to Mobility Mining for Fare Recommendation (20)

UCL Bite-Sized Lunch Lecture
UCL Bite-Sized Lunch LectureUCL Bite-Sized Lunch Lecture
UCL Bite-Sized Lunch Lecture
 
Volunteered Geographic Information and OpenStreetMap
Volunteered Geographic Information and OpenStreetMapVolunteered Geographic Information and OpenStreetMap
Volunteered Geographic Information and OpenStreetMap
 
Webinar: Using smart card and GPS data for policy and planning: the case of T...
Webinar: Using smart card and GPS data for policy and planning: the case of T...Webinar: Using smart card and GPS data for policy and planning: the case of T...
Webinar: Using smart card and GPS data for policy and planning: the case of T...
 
Rides Request Demand Forecast- OLA Bike
Rides Request Demand Forecast- OLA BikeRides Request Demand Forecast- OLA Bike
Rides Request Demand Forecast- OLA Bike
 
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...
 
IRJET - Cardless ATM
IRJET -  	  Cardless ATMIRJET -  	  Cardless ATM
IRJET - Cardless ATM
 
CREDIT CARD FRAUD DETECTION USING PREDICTIVE MODELLING
CREDIT CARD FRAUD DETECTION USING PREDICTIVE MODELLINGCREDIT CARD FRAUD DETECTION USING PREDICTIVE MODELLING
CREDIT CARD FRAUD DETECTION USING PREDICTIVE MODELLING
 
CNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITIONCNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITION
 
Safeguarding Abila: Spatio-Temporal Activity Modeling
Safeguarding Abila: Spatio-Temporal Activity ModelingSafeguarding Abila: Spatio-Temporal Activity Modeling
Safeguarding Abila: Spatio-Temporal Activity Modeling
 
Turning Oyster Cards into Information
Turning Oyster Cards into InformationTurning Oyster Cards into Information
Turning Oyster Cards into Information
 
Stefan Michalak - Portfolio - January 2016
Stefan Michalak - Portfolio - January 2016Stefan Michalak - Portfolio - January 2016
Stefan Michalak - Portfolio - January 2016
 
A car sharing auction with temporal-spatial OD connection conditions
A car sharing auction with temporal-spatial OD connection conditionsA car sharing auction with temporal-spatial OD connection conditions
A car sharing auction with temporal-spatial OD connection conditions
 
Le rôle de l’intelligence géospatiale dans la reprise économique
Le rôle de l’intelligence géospatiale dans la reprise économiqueLe rôle de l’intelligence géospatiale dans la reprise économique
Le rôle de l’intelligence géospatiale dans la reprise économique
 
Cities2.0 Ict2008 Daniel Kaplan
Cities2.0 Ict2008 Daniel KaplanCities2.0 Ict2008 Daniel Kaplan
Cities2.0 Ict2008 Daniel Kaplan
 
CARLI Usage Stats Keynote 20130325
CARLI Usage Stats Keynote 20130325CARLI Usage Stats Keynote 20130325
CARLI Usage Stats Keynote 20130325
 
SSRN-id2718694
SSRN-id2718694SSRN-id2718694
SSRN-id2718694
 
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
 
Shortest Path Search with pgRouting
Shortest Path Search with pgRoutingShortest Path Search with pgRouting
Shortest Path Search with pgRouting
 
Shortest Path search for real road networks with pgRouting
Shortest Path search for real road networks with pgRoutingShortest Path search for real road networks with pgRouting
Shortest Path search for real road networks with pgRouting
 
Using geobrowsers for thematic mapping
Using geobrowsers for thematic mappingUsing geobrowsers for thematic mapping
Using geobrowsers for thematic mapping
 

More from Neal Lathia

More from Neal Lathia (20)

Everything around the NLP (London.AI Feb 2021)
Everything around the NLP (London.AI Feb 2021)Everything around the NLP (London.AI Feb 2021)
Everything around the NLP (London.AI Feb 2021)
 
Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)
 
Using language models to supercharge Monzo’s customer support
 Using language models to supercharge Monzo’s customer support Using language models to supercharge Monzo’s customer support
Using language models to supercharge Monzo’s customer support
 
Making Better Decisions Faster
Making Better Decisions FasterMaking Better Decisions Faster
Making Better Decisions Faster
 
Machine Learning, Faster
Machine Learning, FasterMachine Learning, Faster
Machine Learning, Faster
 
AI & Personalised Experiences
AI & Personalised ExperiencesAI & Personalised Experiences
AI & Personalised Experiences
 
Opportunities & Challenges in Personalised Travel
Opportunities & Challenges in Personalised TravelOpportunities & Challenges in Personalised Travel
Opportunities & Challenges in Personalised Travel
 
Bootstrapping a Destination Recommendation Engine
Bootstrapping a Destination Recommendation EngineBootstrapping a Destination Recommendation Engine
Bootstrapping a Destination Recommendation Engine
 
Machine Learning for Product Managers
Machine Learning for Product ManagersMachine Learning for Product Managers
Machine Learning for Product Managers
 
Mining Smartphone Data (with Python)
Mining Smartphone Data (with Python)Mining Smartphone Data (with Python)
Mining Smartphone Data (with Python)
 
Data Science in Digital Health
Data Science in Digital HealthData Science in Digital Health
Data Science in Digital Health
 
Analysing Daily Behaviours with Large-Scale Smartphone Data
Analysing Daily Behaviours with Large-Scale Smartphone DataAnalysing Daily Behaviours with Large-Scale Smartphone Data
Analysing Daily Behaviours with Large-Scale Smartphone Data
 
Cambridge Quantified Self Meetup
Cambridge Quantified Self MeetupCambridge Quantified Self Meetup
Cambridge Quantified Self Meetup
 
Data Science in #mHealth
Data Science in #mHealthData Science in #mHealth
Data Science in #mHealth
 
Tube Star: Crowd-Sourced Experiences on Public Transport
Tube Star: Crowd-Sourced Experiences on Public Transport Tube Star: Crowd-Sourced Experiences on Public Transport
Tube Star: Crowd-Sourced Experiences on Public Transport
 
Emotion Sense: From Design to Deployment
Emotion Sense: From Design to DeploymentEmotion Sense: From Design to Deployment
Emotion Sense: From Design to Deployment
 
Opportunities and Challenges of Using Smartphones for Health Monitoring and I...
Opportunities and Challenges of Using Smartphones for Health Monitoring and I...Opportunities and Challenges of Using Smartphones for Health Monitoring and I...
Opportunities and Challenges of Using Smartphones for Health Monitoring and I...
 
The Ubhave Framework
The Ubhave FrameworkThe Ubhave Framework
The Ubhave Framework
 
Contextual Dissonance: Design Bias in Sensor-Based Experience Sampling Methods
Contextual Dissonance: Design Bias in Sensor-Based Experience Sampling MethodsContextual Dissonance: Design Bias in Sensor-Based Experience Sampling Methods
Contextual Dissonance: Design Bias in Sensor-Based Experience Sampling Methods
 
The Ubhave Project (Part 1/2)
The Ubhave Project (Part 1/2)The Ubhave Project (Part 1/2)
The Ubhave Project (Part 1/2)
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

Mobility Mining for Fare Recommendation

  • 1. recommendations for urban & transport contexts neal lathia (@neal_lathia) ucl media futures seminar may 25, 2011
  • 2. research: personalisation to aide mobility in cities i.e., getting from a to b (habit) finding z (discovery)
  • 3.
  • 4.
  • 5. sensing mobility: 5%-sample, 2 x 83-days time-stamped location (entry, exit), modality payments (top-ups, travel cards) card-types (e.g., student)
  • 6. what tools can we design to help travellers? previously: (getting from a to b) N. Lathia, J. Froehlich, L. Capra. Mining Public Transport Usage for Personalised Intelligent Transport Systems. In IEEE ICDM 2010, Sydney, Australia. (discovering z) D. Quercia, N. Lathia, F. Calabrese, G. Di Lorenzo, J. Crowcroft. Recommending Social Events from Mobile Phone Location Data. In IEEE ICDM 2010, Sydney, Australia.
  • 7. there is more to urban mobility than just moving. context: purpose/intent, events, disruptions, cost, social connections. for example,
  • 9. who are you? how? when? where cash or travel card? to? how long for? what route?
  • 10. N. Lathia, L. Capra. Mining Mobility Data to Minimise Travellers' Spending on Public Transport. In ACM KDD 2011, San Diego, USA. questions (1) what is the relation between how we travel & how we spend? (2) do travellers make the correct decisions? (no) (3) can we help them with recommendations? (yes)
  • 11. (%) pay as you go purchases 49.8 < 5 GBP 24.2 5 – 10 GBP 15.5 10 – 20 GBP ` (%) travel card purchases 70.8 7-day travel card 15.8 1-month travel card 11.6 7-day bus/tram pass Purchase Behaviour 30 Travel 25 Cards PAYG 20 % Purchases 15 10 5 0 Mon Tue Wed Thu Fri Sat Sun
  • 12. Purchase Geography Mobility Flow 45 Zone 1 40 PAYG Zone 2 Travel Cards Zone 3 35 Zone 4 30 Zone 5 Zone 6 25 arrive 20 15 10 5 depart 0 1 2 3 4 5 6 7 8 9
  • 13. the data shows that: (a) there is a high regularity in travel & purchase behaviour (b) travellers buy in small increments and short-terms (c) most purchases happen upon refused entry
  • 14. (2) do travellers make the correct decisions? compare actual purchases to the optimal (per traveller) how: (a) clean data (b) build & search on a tree ~ sequence of choices
  • 15. data cleaning overview 83-days 83-days the “arrow” of time
  • 16. data cleaning overview origin = destination 83-days 83-days the “arrow” of time no purchase observed no purchase observed -20% of users
  • 17. how: build a tree with each user's mobility data where a node is a purchase (expire, cost) that is expanded when it has expired (reduced) example: PAYG, 7-day 30-day £aa.aa £bb.bb £cc.cc
  • 18. how: build a tree with each user's mobility data where a node is a purchase (expire, cost) that is expanded when it has expired (reduced) example: PAYG, 7-day 30-day £aa.aa £bb.bb £cc.cc PAYG, 30-day £aa.aa 7-day £cc.cc £bb.bb we reduce the space-complexity of searching on this tree by implementing expansion rules, pruning heuristics
  • 19. the cheapest sequence of fares can then be compared to what the user actually spent PAYG, £aa.aa PAYG, £aa.aa 7-day £bb.bb 30-day £cc.cc
  • 20. the cheapest sequence of fares can then be compared to what the user actually spent PAYG, £aa.aa in each 83-day dataset, the 5% sample of users where overspending by ~ £2.5 million PAYG, £aa.aa An estimate of how much everybody (100%) is overspending during an entire year (365 days) is thus £200 million 7-day £bb.bb 30-day £cc.cc
  • 21. overspending comes from (a) failing to predict one's own mobility needs ...but we have observed that mobility is predictable (b) failing to match mobility with fares (in a complex fare system) ...which is an easy problem for a computer can we help travellers?
  • 22. recommender systems aim to match users to items that will be of interest to them
  • 23. recommender systems aim to match users mobility profiles to items fares that will be of interest the cheapest for them
  • 24. two prediction problems (in the paper) first, predict how a person is going to travel (focus) then, predict what the cheapest fare will be
  • 25. key fact: the factors that influence cost on public transport are not determined by your actual movements (a to b) but by generic features that are city-dependent (e.g., Zone 1 - 2)
  • 26. three steps 1. for a given set of travel histories, compute the cheapest fare (by tree expansion) 2. reduce each travel history into a set of generic features, describing the mobility (next slide) 3. train classifiers to predict the cheapest fare given the set of features
  • 27. we have a set of {d, f, b, r, pt, ot, N} = F where d = number of trips f = average trips per day b / r = proportion of trips on the bus / rail pt / ot = proportion of peak & off-peak trips N = zone O-D matrix F = cheapest fare (label)
  • 28. two baselines, three algorithms: 0. baseline – everyone on pay as you go 1. naïve bayes – estimating probabilities 2. k-nearest neighbours – looking at similar profiles 3. decision trees (C4.5) – recursively partitions data to infer rules 4. oracle – perfect knowledge
  • 29. Accuracy (%) Savings (GBP) Dataset 1 Dataset 2 Dataset 1 Dataset 2 Baseline 74.99 76.91 326,447.95 306,145.85 Naïve Bayes 77.46 80.71 393,585.81 369,232.24 k-NN (5) 96.74 97.09 465,822.17 426,375.85 C4.5 98.01 98.29 473,918.38 434,082.81 Oracle 100 100 479,583.91 438,923.30
  • 30. N. Lathia, L. Capra. Mining Mobility Data to Minimise Travellers' Spending on Public Transport. In ACM KDD 2011, San Diego, USA. questions (1) what is the relation between how we travel & how we spend? (2) do travellers make the correct decisions? (no) (3) can we help them with recommendations? (yes)
  • 31. recommendations for urban & transport contexts neal lathia (@neal_lathia) ucl media futures seminar may 25, 2011