2. But first...
● How many of you...
  ● own a computer?
  ● use Google?
  ● have a Facebook account?
  ● buy things on Amazon?
  ● use Last.fm/Spotify?
3. Web Companies
● Offer you
  ● Personalised services and advertisements
  ● Who you know, what you are looking for, recommendations for what you may like
● Make money by
  ● Doing useful things with the data you create
  ● Selling advertisements, giving recommendations
● Everything is centred around “you”
4. Data Mining (web)
● Using huge amounts of data
  ● Clicks on links
  ● Ratings for movies
  ● Friendships
● With data mining algorithms to
  ● Understand (predict) how people behave
  ● Build systems that will help them
5. Forget the web
● 40,000+ people die on European roads each year
● Congestion costs an estimated 1% of EU GDP (about €100 billion)
● Transport accounts for 30% of total energy consumption in the EU
● 70% of the world's population is expected to live in cities by 2050
6. Lots of problems to be solved
● Road safety/traffic monitoring
● Reducing congestion
● Building sustainable transport networks
● Urban navigation
7. Another question
● How many of you...
  ● have a smartphone?
  ● have an Oyster card?
8. Oyster cards/Mobile Phones
● These devices produce data that is very similar to the data you create online
  ● Talking to/texting friends
  ● Checking in to/rating locations
  ● Travelling around London with your Oyster card
9. Example
● On Last.fm, you only listen to rock music
● On TfL, you only travel on buses
● Both are implicit indications of your preferences
11. My research
● Can we use the technologies that work so well for web companies to solve problems in cities?
● Today's examples:
  ● Personalised tube services
  ● Ticket recommendations
12. Today's examples
● A very brief introduction to the kinds of problems that people doing data mining are interested in:
  ● Clustering
  ● Regression
  ● Ranking
  ● Classification
14. 2 × ~300,000 travellers (5%)
● ~7,000,000 tube trips
15. ...is this you?
16. Clustering
● We are looking for the different habits that travellers may have
● Clustering is the process of automatically organising data into groups, so that the members of each group are very similar to one another
17. How does it work?
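The slides do not say which clustering algorithm was used. As an illustration only, here is a minimal k-means sketch (a swapped-in, standard technique) that groups travellers by two hypothetical features: the share of their trips made in peak hours and their average trips per day. The feature choice, the `kmeans` function, and the data are all assumptions.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means on tuples of numeric features; returns (labels, centroids)."""
    # Farthest-first initialisation: start from the first traveller, then repeatedly
    # add the traveller farthest from every centroid chosen so far (deterministic).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: each traveller joins the group with the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty group where it was
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical features per traveller: (share of trips in peak hours, trips per day)
travellers = [(0.90, 2.1), (0.85, 2.0), (0.95, 2.2),   # look like commuters
              (0.10, 0.4), (0.15, 0.6), (0.05, 0.5)]   # look like occasional travellers
labels, centroids = kmeans(travellers, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Each centroid then describes a typical habit (for example, "peak-hour commuter" versus "occasional traveller"), which is the kind of grouping the clustering slides describe.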
18. Clustering
● We can start seeing how different travellers move about the city
20. Predicting travel time
● How long will it take me to get there?
● Every time you travel, you create some data
  ● From where + what time → to where + what time
● Everyone is creating their own data
● When you want to travel, can we give you a personalised travel time estimate?
21. How does it work?
● We design algorithms that leverage this data:
  ● Self-similarity: how long it took you before
  ● Familiarity: how long it took people who are similar to you
  ● Context: the time you are travelling
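The three ideas above can be combined in many ways, and the slides do not give the actual algorithms. The sketch below is a hypothetical weighted blend: it averages a traveller's own past times on a route, the times of travellers assumed to be similar (passed in as `similar_travellers`), and times recorded near the same hour of day. The record layout, the function name, and the weights are illustrative assumptions, not tuned values.

```python
from statistics import mean

# Hypothetical trip records: (traveller, origin, destination, hour of day, minutes taken)
trips = [
    ("alice", "Euston", "Bank", 8, 14),
    ("alice", "Euston", "Bank", 9, 16),
    ("bob", "Euston", "Bank", 8, 18),
    ("carol", "Euston", "Bank", 18, 22),
]

def predict_travel_time(trips, traveller, origin, destination, hour,
                        similar_travellers, weights=(0.5, 0.3, 0.2)):
    """Blend three per-route estimates; the weights are illustrative, not tuned."""
    route = [t for t in trips if t[1] == origin and t[2] == destination]
    if not route:
        return None  # no one has made this trip yet
    signals = (
        [t[4] for t in route if t[0] == traveller],            # self-similarity
        [t[4] for t in route if t[0] in similar_travellers],   # familiarity
        [t[4] for t in route if abs(t[3] - hour) <= 1],        # context: similar time of day
    )
    # Keep only the signals that have data, then renormalise their weights.
    available = [(w, mean(values)) for w, values in zip(weights, signals) if values]
    if not available:
        return mean(t[4] for t in route)  # fall back to the route's mean time
    total = sum(w for w, _ in available)
    return sum(w * estimate for w, estimate in available) / total

print(predict_travel_time(trips, "alice", "Euston", "Bank", 8, {"bob"}))
```

For Alice at 8am the estimate is 0.5 × 15 + 0.3 × 18 + 0.2 × 16 = 16.1 minutes. A new traveller with no history of their own still gets an estimate from the other two signals, which is one reason to combine them.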
22. How well does it work?
● Average prediction error:
  ● Using the mean trip time: 11.45 minutes
  ● Using the zone-to-zone mean time: 8.56 minutes
  ● Using the journey planner: ~6 minutes
  ● Using our algorithms: < 3 minutes
  ● Combined algorithm: 2.92 minutes
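The slide does not name the error measure. Assuming it is the mean absolute error in minutes (a common choice, but an assumption here), this sketch shows how a single network-wide mean compares with a per-route mean on a few hypothetical held-out trips.

```python
def mean_absolute_error(predicted, actual):
    """Average size of the prediction error, in the same units as the inputs (minutes)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical held-out trips (minutes actually taken) and two ways of predicting them
actual = [14, 16, 30]
global_mean = [20, 20, 20]    # one network-wide mean trip time for every trip
per_route = [15, 15, 29]      # a mean computed separately for each route
print(mean_absolute_error(global_mean, actual))  # 6.666666666666667
print(mean_absolute_error(per_route, actual))    # 1.0
```

The more specific the prediction (network, then zone pair, then route, then person), the smaller the average error tends to be, which matches the ordering of the results on the slide.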
24. Station Alerts
● How often do you get to a station and find that there is a problem?
● Travel alerts/disruptions: you need to look manually for what is relevant to you
● But every time you touch in at a station, you show that you are potentially interested in what happens there
25. Ranking
● Ranking is the process of making an ordered list
  ● We can automatically make a unique list for each person
  ● The stations are sorted according to how relevant they are to your travels
26. How does it work?
● Each station has a weight (a number) that we use to sort them
  ● At first, the weight is just how popular the station is
  ● We increase the weights of stations you visit often
  ● We increase the weights of stations that are similar to the ones you visit often
    – Similar, in this case, means that “people who travel to/from station X also travel to/from station Y”
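The weighting steps above can be sketched as follows. This is a hypothetical implementation: popularity is taken as each station's share of all touches, "similar" is measured by co-visitation (the fraction of travellers at station X who also used station Y), and the two boost factors are illustrative assumptions rather than values from the research.

```python
from collections import Counter
from itertools import combinations

def rank_stations(all_histories, user_history, visit_boost=1.0, similarity_boost=0.5):
    """Return station names sorted from most to least relevant for one traveller."""
    # 1. Start from popularity: each station's share of all touches in the data.
    touches = Counter(s for history in all_histories for s in history)
    total = sum(touches.values())
    weights = {station: count / total for station, count in touches.items()}

    # 2. Co-visitation: how many travellers used both station X and station Y.
    travellers_at = Counter(s for history in all_histories for s in set(history))
    co_visits = Counter()
    for history in all_histories:
        for x, y in combinations(sorted(set(history)), 2):
            co_visits[x, y] += 1
            co_visits[y, x] += 1

    visits = Counter(user_history)
    n_visits = sum(visits.values())
    for station, count in visits.items():
        share = count / n_visits
        # 3. Boost stations this traveller visits often.
        weights[station] = weights.get(station, 0.0) + visit_boost * share
        # 4. Boost stations similar to them: "people who use X also use Y".
        for other in weights:
            if other != station and co_visits[station, other]:
                similarity = co_visits[station, other] / travellers_at[station]
                weights[other] += similarity_boost * similarity * share
    # Highest weight first; ties broken alphabetically so the order is stable.
    return sorted(weights, key=lambda s: (-weights[s], s))

# Hypothetical touch histories, one list per traveller
histories = [
    ["Bank", "Euston", "Bank"],
    ["Bank", "Oxford Circus"],
    ["Euston", "Camden Town"],
    ["Oxford Circus", "Bank", "Oxford Circus"],
]
print(rank_stations(histories, ["Camden Town", "Camden Town"]))
# → ['Camden Town', 'Euston', 'Bank', 'Oxford Circus']
```

By popularity alone Bank would come first; for this traveller, Euston moves above it because the only other person seen at Camden Town also used Euston.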
27. Does it work?
● We use a metric called percentile ranking
● Smaller values are better: the stations you actually visit appear nearer the top of your list
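One common way to compute a percentile ranking, sketched here as an assumption since the slides give no formula, is to take the position of each station a person actually visited in their ranked list, scale it to 0.0 (top) through 1.0 (bottom), and average over those stations.

```python
def percentile_rank(ranking, station):
    """Position of a station in a ranked list: 0.0 = top of the list, 1.0 = bottom."""
    if len(ranking) < 2:
        return 0.0
    return ranking.index(station) / (len(ranking) - 1)

def mean_percentile_rank(ranking, visited):
    """Average percentile rank of the stations a traveller actually went on to visit."""
    return sum(percentile_rank(ranking, s) for s in visited) / len(visited)

ranking = ["Camden Town", "Euston", "Bank", "Oxford Circus"]
print(mean_percentile_rank(ranking, ["Camden Town", "Bank"]))  # (0/3 + 2/3) / 2
```

A list in random order would score about 0.5 on average, so values well below 0.5 indicate that the ranking puts relevant stations near the top.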
30. Paying for travel
● Is it cheaper to use pay as you go?
● Which travel card is best?
● … how do you decide?
● The cheapest fare will depend on where you need to go, when you need to travel, and how you tend to get there (bus, train)
31. Wasting money?
● The Oyster card data we have shows which ticket people were using
● We can use their trips to compute what the cheapest fare would have been, and then see how much money they could have saved
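The comparison described above can be sketched in a simplified form. Real fares depend on zones, times, and modes; here every fare, the daily cap, and the travelcard price are hypothetical numbers in pence, and the two functions are illustrative only.

```python
def cheapest_option(days, travelcards, daily_cap):
    """Compare pay as you go (with a daily cap) against fixed-price travelcards.

    days: one list per day of the single fares (in pence) that day's trips would cost.
    travelcards: name -> price in pence for a card covering the same period.
    """
    # Pay as you go: fares add up within a day but never exceed the daily cap.
    payg = sum(min(sum(fares), daily_cap) for fares in days)
    options = {"pay as you go": payg, **travelcards}
    best = min(options, key=options.get)
    return best, options[best], options

def money_wasted(amount_paid, days, travelcards, daily_cap):
    """How much more the traveller paid than the cheapest option would have cost."""
    _, best_cost, _ = cheapest_option(days, travelcards, daily_cap)
    return amount_paid - best_cost

# Hypothetical week: five days with two 240p trips, one day with four.
week = [[240, 240]] * 5 + [[240, 240, 240, 240]]
cards = {"7-day Zone 1-2 travelcard": 3000}
best, cost, options = cheapest_option(week, cards, daily_cap=700)
print(best, cost, options["pay as you go"])  # 7-day Zone 1-2 travelcard 3000 3100
print(money_wasted(options["pay as you go"], week, cards, daily_cap=700))  # 100
```

Summing this difference over every traveller and every period in the data gives an estimate of the total money that could have been saved.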
32. Wasting money!
● Based on the data, travellers could save about £200 million per year if they bought the cheapest tickets for their travel needs
● Can we help them buy the best fare?
33. Classification
● Classification is the process of assigning some data to a group. In our case:
  ● Data = a person's travel habits
  ● Group = the cheapest ticket
34. How does it work?
● We used decision trees: an automatic way of recursively partitioning data and discovering rules to classify it
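The slides confirm that decision trees were used but not how they were built. As a sketch under that caveat, here is a small CART-style tree that splits on the feature and threshold with the lowest Gini impurity. The features mirror Neal's example (trips per day, share on the tube, share in peak hours, share within Zones 1-2), but the training rows and ticket labels are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split rows on the (feature, threshold) that most reduces impurity.

    Returns a ticket name (leaf) or a tuple (feature, threshold, left, right).
    """
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
            right = [i for i, row in enumerate(rows) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, threshold, left, right)
    if best is None:  # every row looks the same: nothing left to split on
        return majority(labels)
    _, feature, threshold, left, right = best
    return (feature, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth))

def predict(tree, row):
    """Follow the rules down from the root until reaching a ticket name."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree

# Hypothetical training data. Features per traveller:
# (trips per day, share on the tube, share in peak hours, share within Zones 1-2)
rows = [(2.5, 0.90, 0.80, 0.95), (2.8, 0.85, 0.75, 0.90),
        (0.4, 0.60, 0.20, 0.90), (0.6, 0.50, 0.30, 0.85),
        (2.2, 0.10, 0.70, 0.40), (3.0, 0.05, 0.80, 0.30)]
cheapest = ["Zone 1-2 travelcard", "Zone 1-2 travelcard",
            "Pay as you go", "Pay as you go",
            "Bus pass", "Bus pass"]
tree = build_tree(rows, cheapest)
neal = (2.5, 0.85, 0.75, 0.95)  # Neal's habits from the example slide
print(predict(tree, neal))  # Zone 1-2 travelcard
```

On this toy data the tree learns two rules: fewer than about 0.6 trips per day means pay as you go; otherwise a very low tube share means a bus pass, and anything else means a Zone 1-2 travelcard.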
35. Example
● Neal's travel habits:
  ● 2.5 trips per day on average
  ● 85% of trips on the tube / 15% on buses
  ● 75% of trips during peak hours
  ● 95% of trips between Zone 1 and Zone 2
● Decision tree says: Neal should buy a Zone 1-2 travel card
36. Does this work?
● We can ask our algorithm to predict the best ticket for each person and check whether the prediction is correct
● We pick a group of travellers who could have saved £479,583.91
● Our algorithm is > 98% accurate; if this group had followed our recommendations, it would have saved £473,918.38
38. Summary
● Data mining for the city
  ● People already carry Oyster cards and mobile phones, and generate lots of useful data about their movements
  ● Many problems can be tackled using data mining
39. Summary
● We have looked at examples of
  ● Clustering: grouping people's behaviours
  ● Regression: predicting travel times
  ● Ranking: making an ordered list of stations
  ● Classification: recommending the best ticket