Personalisation for Smarter Cities
               Neal Lathia
        UCL Computer Science
         n.lathia@cs.ucl.ac.uk
But first...
●   How many of you …
    ●   own a computer...?
    ●   use Google...?
    ●   have a Facebook account...?
    ●   buy things on Amazon...?
    ●   use Last.fm/Spotify...?
Web Companies
●   Offer you
    ●   Personalised services and advertisements
    ●   Who you know, what you are looking for,
        recommendations for what you may like
●   Make money by
    ●   Doing useful things with the data you create
    ●   Selling advertisements, giving recommendations
●   Everything is centred around “you”
Data Mining (web)
●   Using huge amounts of data
    ●   Clicks on links
    ●   Ratings for movies
    ●   Friendships
●   With data mining algorithms to
    ●   Understand (predict) how people behave
    ●   Build systems that will help them
Forget the web
●   40,000+ people die on European roads each
    year
●   Congestion costs an estimated 1% of EU GDP
    = 100 Billion Euros
●   Transport accounts for 30% of total energy
    consumption in the EU
●   It is expected that 70% of the world population
    will live in cities by 2050
Lots of problems to be solved
●   Road safety/traffic monitoring
●   Reducing congestion
●   Building sustainable transport networks
●   Urban navigation
Another question
●   How many of you...
    ●   have a smart phone?
    ●   have an Oyster card?
Oyster cards/Mobile Phones
●   These devices produce data that is very similar
    to the data you create online
    ●   Talking/texting friends
    ●   Checking in to/rating locations
    ●   Travelling around London with your Oyster card
Example
●   On last.fm, you only listen to rock music
●   On TfL, you only travel on buses

●   Both are implicit indications of your preferences
My research
●   Can we use the technologies that work so well
    for web companies to solve problems in cities?

●   Today's examples:
    ●   Personalised tube services
    ●   Ticket recommendations
Today's examples
●   A very brief introduction to techniques that
    people doing data mining commonly use:
    ●   Clustering
    ●   Regression
    ●   Ranking
    ●   Classification
Clustering
2 x ~300,000 travellers (5%)
~7,000,000 tube trips
...is this you?
Clustering
●   We are looking for the different habits that
    travellers may have
●   Clustering is a process of automatically
    organising data into groups, so that each group
    has very similar members
How does it work?
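The slides do not name the clustering algorithm that was used. As a minimal sketch, assuming k-means and two hypothetical features per traveller (share of trips in the morning peak, share of trips at weekends), grouping could look like this. The data and feature names are invented for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move the centroids."""
    # Deterministic start for the sketch: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical traveller profiles: (morning-peak share, weekend share).
travellers = [
    (0.90, 0.05), (0.10, 0.80), (0.85, 0.10),
    (0.15, 0.75), (0.80, 0.00), (0.05, 0.90),
]
labels, centres = kmeans(travellers, k=2)
print(labels)  # one group label per traveller
```

In this toy data the commuters and the weekend travellers end up in separate groups.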
Clustering
●   We can start seeing
    how different
    travellers move about
    the city
Regression
Predicting travel time
●   How long will it take me to get there?
●   Every time you travel, you create some data
    ●   From where + what time → to where + what time
●   Everyone is creating their own data

●   When you want to travel, can we give you a
    personalised travel time estimate?
How does it work?
●   We design algorithms that leverage this data:
    ●   Self-similarity: how long it took you before
    ●   Familiarity: people who are similar to you
    ●   Context: time you are travelling
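One way to picture how these three signals could be combined is a weighted blend. This is only a sketch under stated assumptions: the trip record fields, the `own_weight` parameter, and the use of "other travellers on the same route in the same hour" as a stand-in for similar people are all hypothetical, not the algorithm from the talk.

```python
from statistics import mean

def estimate_minutes(trips, user, origin, dest, hour, own_weight=0.7):
    """Blend the user's own history with other travellers' trips in the same hour."""
    same_route = [t for t in trips if t["origin"] == origin and t["dest"] == dest]
    # Self-similarity: how long this route took the user before.
    own = [t["minutes"] for t in same_route if t["user"] == user]
    # Familiarity + context (simplified): other people, same route, same hour.
    others = [t["minutes"] for t in same_route
              if t["user"] != user and t["hour"] == hour]
    if own and others:
        return own_weight * mean(own) + (1 - own_weight) * mean(others)
    if own:
        return mean(own)
    if others:
        return mean(others)
    return None  # no data for this route at all

# Hypothetical trip log.
trips = [
    {"user": "neal", "origin": "A", "dest": "B", "hour": 8, "minutes": 20},
    {"user": "neal", "origin": "A", "dest": "B", "hour": 9, "minutes": 22},
    {"user": "ann", "origin": "A", "dest": "B", "hour": 8, "minutes": 30},
    {"user": "bob", "origin": "A", "dest": "B", "hour": 8, "minutes": 26},
    {"user": "bob", "origin": "A", "dest": "B", "hour": 14, "minutes": 40},
    {"user": "ann", "origin": "A", "dest": "C", "hour": 8, "minutes": 55},
]
estimate = estimate_minutes(trips, "neal", "A", "B", hour=8)
print(round(estimate, 1))
```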
How well does it work?
●   On average, how much error is there in the predictions?
    ●   Using the mean trip time, 11.45 minutes
    ●   Using zone-zone mean time, 8.56 minutes
    ●   Using journey planner, ~6 minutes
    ●   Using our algorithms, < 3 minutes
    ●   Combined algorithm: 2.92 minutes
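The slides do not name the error measure. Assuming "average error" means mean absolute error in minutes, it could be computed as below. The trip times are made up for illustration and are not the figures reported above.

```python
def mean_absolute_error(predicted, actual):
    """Average size of the prediction error, ignoring its sign."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical observed trip times, in minutes.
actual = [22, 35, 18, 41]

# Baseline: predict the overall mean trip time for every trip.
overall_mean = sum(actual) / len(actual)
baseline = [overall_mean] * len(actual)

# Hypothetical personalised predictions for the same trips.
personalised = [24, 33, 19, 38]

baseline_error = mean_absolute_error(baseline, actual)
personalised_error = mean_absolute_error(personalised, actual)
print(baseline_error, personalised_error)
```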
Ranking
Station Alerts
●   How often do you get to a station and find that
    there is a problem?
●   Travel alerts/disruptions: you need to look
    manually for what is relevant to you
●   But every time you touch in at a station, you are
    showing that you are potentially interested in
    what happens there
Ranking
●   Is the process of making an ordered list
    ●   We can automatically make a unique list for each
        person
    ●   The stations will be sorted according to how
        relevant they are to your travels
How does it work?
●   Each station has a weight (a number) that we
    use to sort them
    ●   At first, the weight is just how popular the station is
    ●   We increase the weights of stations you visit often
    ●   We increase the weights of stations that are similar
        to the ones you visit often
         –   Similar, in this case, means that “people who travel
             to/from station X also travel to/from station Y”
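The weighting steps above can be sketched as follows. The boost sizes (`visit_boost`, `similar_boost`), the popularity scores, and the similarity table are hypothetical; the slides do not give the actual values or formula.

```python
def rank_stations(popularity, visits, similar, visit_boost=1.0, similar_boost=0.5):
    """Return stations sorted from most to least relevant for one traveller."""
    # Start from how popular each station is for everyone.
    weights = dict(popularity)
    for station, count in visits.items():
        # Raise the weight of stations this traveller visits often.
        weights[station] = weights.get(station, 0.0) + visit_boost * count
        # Raise the weight of stations similar to the ones they visit.
        for neighbour in similar.get(station, []):
            weights[neighbour] = weights.get(neighbour, 0.0) + similar_boost * count
    return sorted(weights, key=lambda s: weights[s], reverse=True)

# Hypothetical inputs.
popularity = {"Oxford Circus": 5.0, "Euston": 3.0,
              "Warren Street": 1.0, "Goodge Street": 0.5}
visits = {"Warren Street": 6}                    # this traveller's touch-ins
similar = {"Warren Street": ["Goodge Street"]}   # "people who use X also use Y"

ranking = rank_stations(popularity, visits, similar)
print(ranking)
```

A traveller with no history simply gets the popularity ordering, which matches the "at first" step.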
Does it work?
●   We use a metric
    called percentile
    ranking
●   Smaller values are
    better
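The exact definition used in the talk is not given. One common definition, assumed here, takes the position of each station the traveller really visited in their ranked list (0 = top, 1 = bottom) and averages those positions. Station names are placeholders.

```python
def percentile_ranking(ranked, visited):
    """Mean relative position of the visited stations; 0.0 is best, 1.0 is worst."""
    n = len(ranked)  # assumes the list has at least two stations
    positions = [ranked.index(station) / (n - 1) for station in visited]
    return sum(positions) / len(positions)

ranked = ["A", "B", "C", "D", "E"]  # hypothetical personalised list

good = percentile_ranking(ranked, ["A", "C"])  # visited stations near the top
bad = percentile_ranking(ranked, ["D", "E"])   # visited stations near the bottom
print(good, bad)
```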
Classification
Paying for travel
●   Is it cheaper to use pay as you go?
●   Which travel card is best?
●   … how do you decide?

●   The cheapest fare will depend on where you
    need to go, when you need to travel, and how
    you tend to go there (bus, train)
Wasting money?
●   The Oyster card data we have shows what
    ticket people were using
●   We can use their trips to compute what the
    cheapest fare would have been, and then see
    how much money they could have saved
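A minimal sketch of that comparison, assuming only two options (pay as you go versus one travel card) and made-up prices in pence. Real fares involve zones, caps, and peak pricing, none of which are modelled here.

```python
def cheapest_ticket(trip_fares, travelcard_price):
    """Compare paying per trip against a travel card covering the same period."""
    pay_as_you_go_total = sum(trip_fares)
    if pay_as_you_go_total <= travelcard_price:
        return "pay as you go", pay_as_you_go_total
    return "travel card", travelcard_price

def overspend(amount_paid, trip_fares, travelcard_price):
    """How much more the traveller paid than the cheapest option would have cost."""
    _, best_cost = cheapest_ticket(trip_fares, travelcard_price)
    return amount_paid - best_cost

# Hypothetical week: 20 trips at 230p each, travel card at 3200p.
week = [230] * 20
choice, cost = cheapest_ticket(week, travelcard_price=3200)
wasted = overspend(sum(week), week, travelcard_price=3200)
print(choice, cost, wasted)
```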
Wasting Money!
●   Based on the data, travellers could save about
                £200 million per year
●   If they were buying the cheapest tickets for their
    travel needs
●   Can we help them buy the best fare?
Classification
●   Is the process of assigning some data to a
    group. In our case,
    ●   Data = a person's travel habits
    ●   Group = the cheapest ticket
How does it work?
●   We used decision
    trees: an automatic
    way of recursively
    partitioning data and
    discovering rules to
    classify data
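To make "recursively partitioning data" concrete, here is a small decision tree learner written from scratch. It is a sketch, not the implementation used in this work: the Gini impurity criterion, the two features, the ticket labels, and the training rows are all assumptions made for illustration.

```python
from collections import Counter

def gini(labels):
    """Impurity of a group: 0.0 when every member has the same label."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3):
    # Stop when the group is pure or the tree is deep enough: predict the majority.
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if not left or not right:
                continue
            # Weighted impurity of the two halves; lower is a cleaner split.
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, threshold, left, right = best
    # Recurse on each half: this is the "recursive partitioning".
    return {
        "feature": f, "threshold": threshold,
        "low": build_tree([rows[i] for i in left], [labels[i] for i in left],
                          depth + 1, max_depth),
        "high": build_tree([rows[i] for i in right], [labels[i] for i in right],
                           depth + 1, max_depth),
    }

def classify(tree, row):
    """Follow the learned rules from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "low" if row[tree["feature"]] <= tree["threshold"] else "high"
        tree = tree[branch]
    return tree

# Hypothetical training data: (trips per day, share of trips on the tube).
rows = [(0.5, 0.2), (0.8, 0.9), (2.5, 0.1), (3.0, 0.0), (2.5, 0.85), (3.5, 0.9)]
labels = ["pay as you go", "pay as you go", "bus pass",
          "bus pass", "travel card", "travel card"]

tree = build_tree(rows, labels)
print(classify(tree, (2.5, 0.85)))
```

The learned tree is a nested set of "feature <= threshold" rules, which is why decision trees are easy to read back as plain-language advice.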
Example
●   Neal's travel habits:
    ●   2.5 average trips per day
    ●   85% trips on the tube / 15% trips on buses
    ●   75% of trips during peak hours
    ●   95% of trips between Zone 1 and Zone 2

    ●   Decision tree says: Neal should buy a Zone 1-2
        travel card
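Written out by hand, the path through such a tree might read like the rules below. The four habit values come from the slide; every threshold and every ticket name other than "Zone 1-2 travel card" is a hypothetical placeholder, since the learned tree itself is not shown.

```python
from dataclasses import dataclass

@dataclass
class Habits:
    trips_per_day: float
    tube_share: float       # fraction of trips on the tube
    peak_share: float       # fraction of trips during peak hours
    zone_1_2_share: float   # fraction of trips between Zone 1 and Zone 2

def recommend(h):
    """Toy rule set with made-up thresholds; peak_share is not used here."""
    if h.trips_per_day < 1.5:
        return "pay as you go"
    if h.tube_share < 0.5:
        return "bus pass"
    if h.zone_1_2_share >= 0.9:
        return "Zone 1-2 travel card"
    return "wider-zone travel card"

neal = Habits(trips_per_day=2.5, tube_share=0.85,
              peak_share=0.75, zone_1_2_share=0.95)
recommendation = recommend(neal)
print(recommendation)
```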
Does this work?
●   We can ask our algorithm to predict what the
    best ticket for a person will be, and see if it
    predicts correctly
●   We pick a group that could have saved
    £479,583.91
●   Our algorithm is > 98% accurate; if this group
    followed our recommendations, it would have
    saved £473,918.38
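The two amounts above also show how much of the possible saving the recommendations capture. Note that this share of money saved is a different quantity from the > 98% classification accuracy, although the two turn out to be close. The variable names are just for this sketch.

```python
potential_saving = 479_583.91   # what the group could have saved (from the slide)
achieved_saving = 473_918.38    # what following the recommendations saves

share = achieved_saving / potential_saving
missed = potential_saving - achieved_saving

print(f"{share:.2%} of the possible saving captured, £{missed:,.2f} missed")
```

So the recommendations recover nearly 99% of the money, leaving roughly £5,700 on the table for this group.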
Summary
●   Data mining for the city
    ●   People are already carrying around Oyster cards
        and mobile phones, and generating lots of useful
        data about their movements
    ●   There are a lot of problems that can be tackled
        using data mining
Summary
●   We have looked at examples of
    ●   Clustering: grouping people's behaviours
    ●   Regression: predicting travel times
    ●   Ranking: making an ordered list of stations
    ●   Classification: recommending the best ticket