Tailoring Analytic Algorithms and Visualization to Address User Requirements - how do you get from a user with data to one acting on the basis of said data?
1. 15 July 2016 P2175-P-011 v0.2
Commercially Confidential
Generating Insight from Data
Tailoring Analytic Algorithms and Visualization to
Address User Requirements
The Challenge
How do you get from a user with data to one acting on the basis of said data?
We will elaborate by way of example(s)
[Diagram: Data + User, passing through Needs, Analysis and Visualisation, leading to Action!]
TfL Data Overview
The Transport for London (TfL) Tube travel data set is a large, open-access
data set that we can use to demonstrate the process
Over ½ million journeys logged giving location and time of the start and end of each.
– Needs cleaning (unstarted, unfinished, not applicable etc…)
While nominally about journeys, the data can be re-analysed to give information
about:
– Stations
– Lines
Other metadata, such as user type (elderly pass user, season ticket user etc…),
allows for potentially interesting analyses
User: Needs
Consider a potential user. What questions do they want answers to? What
information is of use to them?
We consider a user who wants to know about stations – not just in terms of usage
but in terms of which stations are similar to other stations. This may be for several
reasons:
– Interested in advertising based on likely users;
– Interested in appropriate staffing and rostering of stations;
– Interested in issue tracking, learning and applying lessons to similar stations;
Alternative users might be interested in the traffic flow on lines and how it is
affected by station closures:
– Emergency / contingency planning;
– Sophisticated travel advice apps;
Analysis: Station Profiles
We refocus the data set to give a usage profile for each station, recording
both arrival and departure rates across the working day
Users can already compare the total usage of stations easily, so we scale
each of these profiles to have a maximum value of 1. This leaves only the shape of the profile to compare.
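As a minimal sketch of this profiling step, the code below bins journeys into hourly departure and arrival counts per station and then scales a profile to a peak of 1. The record layout (start station, start minute of day, end station, end minute of day), the hourly bin width, the function names and the choice to scale arrivals and departures separately are all assumptions made for illustration; they are not taken from the TfL data set or from the original analysis.

```python
from collections import defaultdict

BINS_PER_DAY = 24  # assumption: one bin per hour of the day


def station_profiles(journeys, bins=BINS_PER_DAY):
    """Count departures and arrivals per station and time bin.

    journeys: iterable of (start_station, start_minute, end_station, end_minute),
    a hypothetical cleaned record layout with minutes counted from midnight.
    """
    departures = defaultdict(lambda: [0] * bins)
    arrivals = defaultdict(lambda: [0] * bins)
    minutes_per_bin = 24 * 60 // bins
    for start_station, start_minute, end_station, end_minute in journeys:
        departures[start_station][(start_minute // minutes_per_bin) % bins] += 1
        arrivals[end_station][(end_minute // minutes_per_bin) % bins] += 1
    return departures, arrivals


def scale_to_unit_peak(profile):
    """Scale a profile so its maximum is 1; an all-zero profile is returned unchanged."""
    peak = max(profile)
    return [count / peak for count in profile] if peak else list(profile)


# Three made-up journeys, purely to exercise the functions.
journeys = [
    ("Barnet", 8 * 60 + 5, "Canary Wharf", 8 * 60 + 50),
    ("Barnet", 8 * 60 + 20, "Canary Wharf", 9 * 60 + 5),
    ("Canary Wharf", 17 * 60 + 30, "Barnet", 18 * 60 + 15),
]
departures, arrivals = station_profiles(journeys)
barnet_departures = scale_to_unit_peak(departures["Barnet"])
```

After scaling, a quiet station and a busy station with the same daily rhythm produce the same profile, which is the property the comparison relies on.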
Analysis: Dissimilarity Metric
The user is interested in the type of station (e.g. commuter source) but not in
the precise timing of the commuter rushes
Stations close to the centre (e.g. Harrow) have later morning departure peaks than
stations further out (e.g. Chorleywood). The reverse is true for the evening arrivals rush
Dissimilarity between stations is determined by the minimum Euclidean distance between
arrival and departure profiles, allowing for small timeshifts
Timeshifts must be applied in opposite directions for arrivals and departures
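The metric described above might be sketched as follows. This is an illustration under stated assumptions, not the original implementation: profiles are equal-length lists, shifts wrap around midnight, the departure and arrival profiles are concatenated into one vector before taking the distance, and `max_shift` (in bins) is a hypothetical parameter.

```python
import math


def shift(profile, k):
    """Circularly shift a profile k bins later (negative k shifts it earlier)."""
    n = len(profile)
    return [profile[(i - k) % n] for i in range(n)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity(station_a, station_b, max_shift=2):
    """Minimum Euclidean distance over small timeshifts of station_b.

    Each station is a (departure_profile, arrival_profile) pair.
    Departures are shifted by k bins and arrivals by -k bins, so the
    two profiles always move in opposite directions.
    """
    dep_a, arr_a = station_a
    dep_b, arr_b = station_b
    best = math.inf
    for k in range(-max_shift, max_shift + 1):
        candidate = shift(dep_b, k) + shift(arr_b, -k)
        best = min(best, euclidean(dep_a + arr_a, candidate))
    return best


def single_peak(n, position):
    """Toy profile of length n with one peak of height 1."""
    return [1.0 if i == position else 0.0 for i in range(n)]


# Toy stations: the outer one departs one bin earlier and arrives one bin later.
inner = (single_peak(10, 3), single_peak(10, 7))
outer = (single_peak(10, 2), single_peak(10, 8))
print(dissimilarity(inner, outer, max_shift=0))  # no shift allowed: 2.0
print(dissimilarity(inner, outer, max_shift=1))  # one-bin shift aligns them: 0.0
```

With no shift allowed the two toy stations look different; allowing a one-bin shift in opposite directions makes them identical, which matches the intent of ignoring the precise timing of the rush.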
Analysis: Automatic Clustering
An agglomerative hierarchical clustering technique was used, with group average
linkage to merge clusters
The complete dendrogram is easy to calculate – deciding where to split it is not
Splitting into 6 clusters provided useful insight (more clusters are also insightful)
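To make the merging rule concrete, here is a small, unoptimised sketch of agglomerative clustering with group average linkage. It repeatedly merges the two clusters whose members have the smallest mean pairwise dissimilarity and stops at a requested number of clusters. The function name, the list-of-lists distance matrix and the toy data are assumptions for illustration; stopping at a fixed cluster count stands in for cutting the dendrogram.

```python
def average_linkage_clusters(dist, n_clusters):
    """Agglomerative clustering with group average linkage.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def group_average(a, b):
        # Mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest group-average distance.
        _, p, q = min(
            (group_average(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Toy data: five points on a line, with absolute difference as dissimilarity.
points = [0, 1, 10, 11, 50]
dist = [[abs(a - b) for b in points] for a in points]
print(average_linkage_clusters(dist, 3))  # [[0, 1], [2, 3], [4]]
```

For a data set of a few hundred stations a library routine would normally be preferable; for example, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` builds the full dendrogram, and `fcluster` with `criterion="maxclust"` cuts it into a chosen number of clusters.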
Analysis: 6 Clusters
Some insights can be gained just from looking at the clusters – e.g. the clusters
were labelled by observing their membership
Commuter Source: 168 stations, characterised by a morning departures peak
and an evening arrivals peak, mainly located in the suburbs (e.g. Barnet)
Commuter Destination: 44 stations, characterised by a morning arrivals peak
and an evening departures peak, mainly central London (e.g. Canary Wharf)
Transit: 44 stations, with peaks like a commuter destination but also high
usage throughout the day; includes most rail/tube interchanges (e.g. King's Cross)
Social: 3 stations, with peaks like a commuter destination, but with extra arrivals
in the early evening and many departures very late in the evening (e.g. Covent Garden)
Heathrow Terminal 4: Cluster of one whose behaviour is highly variable -
dependent upon flights rather than typical work patterns.
Heathrow Terminals 1, 2 & 3: Cluster of one whose behaviour is highly variable -
dependent upon flights rather than typical work patterns.
…Text is a poor way of displaying these
Geographic Visualisation
Further insight can be gained by using an interactive, web-based
visualisation tool to show the location, cluster and current usage of each station
Geographic Visualisation
Rush hour becomes startlingly clear, as the size of each station's marker is
proportional to how busy it is
Geographic Visualisation
Pan and zoom (inherited from Google Maps) allow a user to focus their interest
in an intuitive manner – clicking on a station brings up details on the right
Conclusions
To generate new insight you need to determine users' needs, apply appropriate
analytics and display the results with suitable visualisations.
Data analysis without an understanding of the goal may just be empty maths;
Data visualisation on its own may be very pretty, but not useful;
New insights can be generated from analytics + visualisation
Target these to address user needs and you have something useful