Hi, I’m Joerg and I want to talk with you about runFindr. As you can probably tell from the title, I’m passionate about running and
10 s
A few months ago, when I moved from Boston to the DC area and I realized a distinct change in my running behavior. I had a very defined set of routes I enjoyed (you can see a heatmap of those below, and so I spent a lot of time actually running along the river you see in the photo above). In DC however, my running route selection looked a lot more like a random walk, with a little bit of bias from looking at google maps or using routeloops.com. And it took several months for me to aquire the necessary domain knowledge to have routes that I would subjectively have scored similarly than the ones in Boston.
20 s
A few months ago, when I moved from Boston to the DC area and I realized a distinct change in my running behavior. I had a very defined set of routes I enjoyed (you can see a heatmap of those below, and so I spent a lot of time actually running along the river you see in the photo above). In DC however, my running route selection looked a lot more like a random walk, with a little bit of bias from looking at google maps or using routeloops.com. And it took several months for me to aquire the necessary domain knowledge to have routes that I would subjectively have scored similarly than the ones in Boston.
20 s
So how does this all work? The backbone of this data are two tables, that contain information about running routes and the points contained in those routes repectively. The routes table has about 100.000 tracks which I acquired through the mapmyfitness api, and based on that and information from a number of other APIs, we can calculate scores for all the features that you saw on the sliders in the app (and some extra ones not shown there).
20 s
When you enter an address and distance in the web app, we want to do computation on those routes, and doing that on all 100.000 routes is too expensive. So we cluster the routes, then find the 3 clusters closest to your location, then only select routes from these clusters that have the right length.
10 s
Calulate a total score for them, with means and standard deviation for the actual set of routes selected and with standard weights for now (that correspond to the average of all users), once we scored, we can rank and select the best ones and retrieve the points for those from the Points database.
10 s
Then we can display those routes and let the user refine the weights to close a feeback loop that allows the user to customize the routes till he finds one he likes.
10 s
(+ 20 s adv for data stories on website
With this huge database of routes and their properties, we can tell cool data stories and there is a tab on my website where you can look up some of them, If you’re like me, excited about that kind of stuff. But I want to use my last minute trying to investigate if you should believe what this website tells you. And so in order for this to work, we need to satisfy two main conditions.)
One, the features we selected for the sliders must be meaningful predictiors, something like fundamental dimensions of running route space. That is, if you liked routes with that heavily feature nature in one city, are you going to pick routes with lots of nature in the second city too. This plot shows exactly this data for people in my dataset that moved between several cities, and the color encodes which feature the points represent. If all features where perfect predictiors for every single user the correlation would be a straight line, and while there is some noise, the fit is really quite good. So the features encoded in the sliders are good.
30 s
And initially he struggled to find routes with the same subjective quality for him, and it takes almost a year to get to the same level (note that these are normalized locally, so the absolute quality might still be different, this also shows that normalizing locally works well).
30 s
And initially he struggled to find routes with the same subjective quality for him, and it takes almost a year to get to the same level (note that these are normalized locally, so the absolute quality might still be different, this also shows that normalizing locally works well).
30 s
And initially he struggled to find routes with the same subjective quality for him, and it takes almost a year to get to the same level (note that these are normalized locally, so the absolute quality might still be different, this also shows that normalizing locally works well).
30 s