But first...● How many of you … ● own a computer...? ● use Google...? ● have a Facebook account...? ● buy things on Amazon...? ● use Last.fm/Spotify...?
Web Companies● Offer you ● Personalised services and advertisements ● Who you know, what you are looking for, recommendations for what you may like● Make money by ● Doing useful things with the data you create ● Selling advertisements, giving recommendations● Everything is centred around “you”
Data Mining (web)● Using huge amounts of data ● Clicks on links ● Ratings for movies ● Friendships● With data mining algorithms to ● Understand (predict) how people behave ● Build systems that will help them
Forget the web● 40,000+ people die on European roads each year● Congestion costs an estimated 1% of EU GDP, or 100 billion euros● Transport accounts for 30% of total energy consumption in the EU● It is expected that 70% of the world population will live in cities by 2050
Lots of problems to be solved● Road safety/traffic monitoring● Reducing congestion● Building sustainable transport networks● Urban navigation
Another question● How many of you... ● have a smart phone? ● have an Oyster card?
Oyster cards/Mobile Phones● These devices produce data that is very similar to the data you create online ● Talking to/texting friends ● Checking in to/rating locations ● Travelling around London with your Oyster card
Example● On Last.fm, you only listen to rock music● On TfL, you only travel on buses● Both are implicit indications of your preferences
My research● Can we use the technologies that work so well for web companies to solve problems in cities?● Today's examples: ● Personalised tube services ● Ticket recommendations
Today's examples● Will give you a very brief introduction to the things that people doing data mining are interested in: ● Clustering ● Regression ● Ranking ● Classification
Predicting travel time● How long will it take me to get there?● Every time you travel, you create some data ● From where + what time → to where + what time● Everyone is creating their own data● When you want to travel, can we give you a personalised travel time estimate?
How does it work?● We design algorithms that leverage this data: ● Self-similarity: how long it took you before ● Familiarity: people who are similar to you ● Context: time you are travelling
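The slide lists three signals but not how they are combined, so the following is only a minimal sketch under stated assumptions. It blends a traveller's own past durations (self-similarity) with the durations of everyone travelling the same route in the same time context. The trip records, the `is_peak` definition, and the `own_weight` blending factor are all made up for illustration, and "people who are similar to you" is simplified here to "everyone on the same route".

```python
from collections import defaultdict
from statistics import mean

# Each trip: (user, origin, destination, start_hour, minutes). Made-up values.
TRIPS = [
    ("ann", "A", "B", 8, 22.0),
    ("ann", "A", "B", 9, 24.0),
    ("bob", "A", "B", 8, 30.0),
    ("bob", "A", "B", 14, 26.0),
    ("cat", "A", "B", 8, 28.0),
]

def is_peak(hour):
    # Assumed definition of peak hours, for illustration only.
    return 7 <= hour < 10 or 16 <= hour < 19

def build_index(trips):
    own = defaultdict(list)      # (user, origin, dest) -> past durations
    context = defaultdict(list)  # (origin, dest, peak?) -> everyone's durations
    for user, origin, dest, hour, minutes in trips:
        own[(user, origin, dest)].append(minutes)
        context[(origin, dest, is_peak(hour))].append(minutes)
    return own, context

def predict(user, origin, dest, hour, own, context, own_weight=0.7):
    """Blend the traveller's own history with the crowd in the same context."""
    personal = own.get((user, origin, dest))
    crowd = context.get((origin, dest, is_peak(hour)))
    if personal and crowd:
        return own_weight * mean(personal) + (1 - own_weight) * mean(crowd)
    if personal:
        return mean(personal)
    if crowd:
        return mean(crowd)
    return None  # nothing known about this route

own, context = build_index(TRIPS)
estimate = predict("ann", "A", "B", 8, own, context)
```

A traveller with no history on a route falls back to the crowd average for that context, which is one simple way to handle new users.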
How well does it work?● On average, how much error is there in the predictions? ● Using the mean trip time, 11.45 minutes ● Using zone-zone mean time, 8.56 minutes ● Using journey planner, ~6 minutes ● Using our algorithms, < 3 minutes ● Combined algorithm: 2.92 minutes
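To make the comparison above concrete, here is a small sketch of how two of the baselines could be scored. It assumes the error reported is the mean absolute error in minutes (the slide does not name the metric), and the trips below are made-up numbers, not the Oyster data.

```python
from collections import defaultdict
from statistics import mean

# (origin_zone, destination_zone, minutes); made-up values.
train = [(1, 1, 10.0), (1, 1, 14.0), (1, 3, 30.0), (1, 3, 34.0)]
test = [(1, 1, 11.0), (1, 3, 35.0)]

def mean_absolute_error(predictions, actuals):
    return mean(abs(p - a) for p, a in zip(predictions, actuals))

# Baseline 1: always predict the mean duration of all training trips.
global_mean = mean(minutes for _, _, minutes in train)

# Baseline 2: predict the mean duration for the same zone pair.
by_zone_pair = defaultdict(list)
for origin, dest, minutes in train:
    by_zone_pair[(origin, dest)].append(minutes)
zone_mean = {pair: mean(values) for pair, values in by_zone_pair.items()}

actual = [minutes for _, _, minutes in test]
mae_global = mean_absolute_error([global_mean] * len(test), actual)
mae_zone = mean_absolute_error(
    [zone_mean.get((o, d), global_mean) for o, d, _ in test], actual
)
```

On this toy data the zone-pair baseline has a much lower error than the global mean, which mirrors the direction (not the size) of the improvement on the slide.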
Station Alerts● How often do you get to a station and find that there is a problem?● Travel alerts/disruptions: you need to look manually for what is relevant to you● But every time you touch in at a station, you are showing that you are potentially interested in what happens there
Ranking● Is the process of making an ordered list ● We can automatically make a unique list for each person ● The stations will be sorted according to how relevant they are to your travels
How does it work?● Each station has a weight (a number) that we use to sort them ● At first, the weight is just how popular the station is ● We increase the weights of stations you visit often ● We increase the weights of stations that are similar to the ones you visit often – Similar, in this case, means that “people who travel to/from station X also travel to/from station Y”
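The weighting steps above can be sketched as follows. This is an illustrative reading of the slide, not the actual system: the visit data is invented, and the `personal_boost` and `similar_boost` parameters are hypothetical. Similarity between two stations is taken to be the number of travellers who use both.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Station visits per traveller; made-up data.
VISITS = {
    "ann": ["Euston", "Euston", "Angel"],
    "bob": ["Euston", "Angel", "Bank"],
    "cat": ["Bank", "Bank", "Oval"],
}

def rank_stations(user, visits, personal_boost=1.0, similar_boost=0.5):
    # Step 1: start from popularity (total visits by everyone).
    popularity = Counter()
    for stations in visits.values():
        popularity.update(stations)
    weights = {station: float(n) for station, n in popularity.items()}

    # Similarity: how many travellers use both station X and station Y.
    co_visits = defaultdict(int)
    for stations in visits.values():
        for x, y in combinations(sorted(set(stations)), 2):
            co_visits[(x, y)] += 1
            co_visits[(y, x)] += 1

    # Step 2: boost the stations this traveller visits often.
    own = Counter(visits.get(user, []))
    for station, n in own.items():
        weights[station] += personal_boost * n

    # Step 3: boost unvisited stations that are similar to visited ones.
    for station in own:
        for other in weights:
            if other not in own:
                weights[other] += similar_boost * co_visits.get((station, other), 0)

    # Highest weight first; ties broken alphabetically.
    return sorted(weights, key=lambda s: (-weights[s], s))

ranking_for_ann = rank_stations("ann", VISITS)
```

A traveller with no history simply gets the popularity ordering, which matches the "at first" case on the slide.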
Does it work?● We use a metric called percentile ranking● Smaller values are better
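The slide does not define the metric, so the sketch below assumes one common formulation: for each station the traveller actually goes on to visit, take its position in the ranked list as a percentage (0 = top, 100 = bottom) and average those values. Under this assumption a random ordering would score about 50 on average, and a good ranking scores close to 0. The station names are placeholders.

```python
def percentile_ranking(ranked, visited):
    """Mean percentile position of the visited stations in the ranked list."""
    if len(ranked) < 2:
        raise ValueError("need at least two ranked stations")
    position = {station: i for i, station in enumerate(ranked)}
    last = len(ranked) - 1
    scores = [100.0 * position[s] / last for s in visited if s in position]
    if not scores:
        raise ValueError("none of the visited stations appear in the ranking")
    return sum(scores) / len(scores)

ranked = ["Euston", "Bank", "Angel", "Oval", "Kew"]
good = percentile_ranking(ranked, ["Euston", "Bank"])  # visited stations near the top
bad = percentile_ranking(ranked, ["Kew"])              # visited station at the bottom
```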
Paying for travel● Is it cheaper to use pay as you go?● Which travel card is best?● … how do you decide?● The cheapest fare will depend on where you need to go, when you need to travel, and how you tend to go there (bus, train)
Wasting money?● The Oyster card data we have shows what ticket people were using● We can use their trips to compute what the cheapest fare would have been, and then see how much money they could have saved
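As a rough sketch of the computation described above, the code below prices one week of trips under each ticket option and reports the gap between what was paid and the cheapest option. All prices and ticket names are hypothetical and are not real TfL fares; zones, peak pricing, and daily caps are ignored to keep the example short.

```python
# Hypothetical prices in pounds; NOT real TfL fares.
PAYG_FARE = {"tube": 2.50, "bus": 1.50}
WEEKLY_PASSES = {"bus pass": 20.00, "zone 1-2 travelcard": 35.00}
PASS_COVERS = {"bus pass": {"bus"}, "zone 1-2 travelcard": {"bus", "tube"}}

def cost_with(ticket, trips):
    """Weekly cost when holding `ticket` (None means pure pay as you go)."""
    covered = PASS_COVERS.get(ticket, set())
    # Trips not covered by the pass are paid for individually.
    extra = sum(PAYG_FARE[mode] for mode in trips if mode not in covered)
    return WEEKLY_PASSES.get(ticket, 0.0) + extra

def cheapest_ticket(trips):
    options = [None, *WEEKLY_PASSES]
    return min(options, key=lambda ticket: cost_with(ticket, trips))

def possible_saving(actual_ticket, trips):
    best = cheapest_ticket(trips)
    return cost_with(actual_ticket, trips) - cost_with(best, trips)

light_week = ["tube"] * 4   # occasional traveller
heavy_week = ["tube"] * 20  # daily commuter
```

With these made-up prices, the occasional traveller is better off on pay as you go, while the commuter would save by buying the travelcard.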
Wasting Money!● Based on the data, travellers could save about £200 million per year● If they were buying the cheapest tickets for their travel needs● Can we help them buy the best fare?
Classification● Is the process of assigning some data to a group. In our case, ● Data = a person's travel habits ● Group = the cheapest ticket
How does it work?● We used decision trees: an automatic way of recursively partitioning data and discovering rules to classify data
Example● Neal's travel habits: ● 2.5 average trips per day ● 85% trips on the tube / 15% trips on buses ● 75% of trips during peak hours ● 95% of trips between Zone 1 and Zone 2 ● Decision tree says: Neal should buy a Zone 1-2 travel card
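To illustrate recursive partitioning, here is a tiny decision tree written from scratch. It is a sketch only: the talk does not say which tree algorithm or features were used, so Gini impurity is an assumption, and the training rows (trips per day, share of trips on the tube, share of trips at peak time) and their labels are invented.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[f] <= t]
            right = [l for row, l in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels):
    split = best_split(rows, labels)
    if split is None:  # no split improves purity: predict the majority label
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(row, l) for row, l in zip(rows, labels) if row[f] <= t]
    right = [(row, l) for row, l in zip(rows, labels) if row[f] > t]
    return (f, t,
            build_tree([row for row, _ in left], [l for _, l in left]),
            build_tree([row for row, _ in right], [l for _, l in right]))

def classify(tree, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Made-up training data: (trips per day, tube share, peak share).
ROWS = [(0.5, 0.2, 0.1), (0.8, 0.9, 0.3), (2.5, 0.9, 0.8),
        (3.0, 0.8, 0.7), (2.8, 0.1, 0.6), (2.2, 0.0, 0.9)]
LABELS = ["pay as you go", "pay as you go", "zone 1-2 travelcard",
          "zone 1-2 travelcard", "bus pass", "bus pass"]

tree = build_tree(ROWS, LABELS)
neal = classify(tree, (2.5, 0.85, 0.75))  # habits like those in the example above
```

Each internal node is a learned rule such as "more than 0.8 trips per day", so the tree can be read back as a set of human-readable ticket-buying rules.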
Does this work?● We can ask our algorithm to predict what the best ticket for a person will be, and see if it predicts correctly● We picked a group that could have saved £479,583.91● Our algorithm is > 98% accurate; if this group had followed our recommendations, it would have saved £473,918.38
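As a quick check on the two figures above, the share of the possible saving that the recommendations would have captured can be worked out directly. Note that this ratio is a separate quantity from the classification accuracy quoted on the slide; the variable names are just for this example.

```python
possible_saving = 479_583.91  # pounds the group could have saved
achieved_saving = 473_918.38  # pounds saved by following the recommendations

captured = achieved_saving / possible_saving
missed = possible_saving - achieved_saving

print(f"Savings captured: {captured:.1%}")  # Savings captured: 98.8%
print(f"Savings missed: £{missed:,.2f}")    # Savings missed: £5,665.53
```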
Summary● Data mining for the city ● People are already carrying around Oyster cards and mobile phones, and making lots of useful data about their movements ● There are a lot of problems that can be tackled using data mining
Summary● We have looked at examples of ● Clustering: grouping people's behaviours ● Regression: predicting travel times ● Ranking: making an ordered list of stations ● Classification: recommending the best ticket