2. Runners want to run faster
What goal should you set for a half marathon time?
What training programs do the fastest runners follow?
3. Data from Strava.com
[Figure panels; axis labels: elevation gain this month, number of rest days per week, pace, age, half marathon time (min)]
Time series, demographic, and aggregated running data on 10,000 runners,
of whom 1,000 have half marathon times and other features.
[Figure; axis labels: log half marathon time, log distance run this month (mi), pace]
4. Data from Strava.com
Distance past month
Time past month
Pace past month
Distance past 6 months
Gender
Weight range
Age range
Number of rest days/wk
Number of long days/wk
Standard deviation (SD) of pace
[Figure panels; axis labels: pace, sex, age range (years), weight range (lbs), half marathon time (min)]
5. Analysis
Model                                  Regression r2   RMSE
Benchmarking with a linear model       0.49            10 min
Regularized and nonlinear regression modeling:
1. Lasso regression                    0.48            10 min
2. Ridge regression                    0.48            10 min
3. Random forest regression            0.66            8.3 min
Validation: 179 runners
3-fold cross-validation                0.79            6.2 min
The gap between the cross-validation and test scores seems to be related
to a different distribution in the test set, possibly because of the
importance of outliers.
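The comparison on this slide can be sketched roughly as follows. This is a minimal illustration, not the analysis behind the reported numbers: it assumes scikit-learn, uses synthetic stand-in data rather than the Strava data, and the feature names, coefficients, and hyperparameters (`alpha`, `n_estimators`) are all hypothetical. It scores each model with 3-fold cross-validation and reports r2 and RMSE in minutes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (NOT the Strava data): 300 hypothetical runners.
rng = np.random.default_rng(0)
n = 300
pace = rng.normal(9.0, 1.2, n)        # min/mile over the past month (assumed)
distance = rng.gamma(4.0, 15.0, n)    # miles run in the past month (assumed)
rest_days = rng.integers(0, 7, n)     # rest days per week (assumed)
X = np.column_stack([pace, distance, rest_days])
# Hypothetical target: half marathon time in minutes, driven mostly by pace.
y = 13.1 * pace - 0.05 * distance + rng.normal(0.0, 5.0, n)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 3-fold cross-validation, scoring each held-out fold on r2 and RMSE.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X, y, cv=cv,
        scoring=("r2", "neg_root_mean_squared_error"),
    )
    results[name] = {
        "r2": scores["test_r2"].mean(),
        # scikit-learn reports negated RMSE so that larger is better; flip the sign.
        "rmse_min": -scores["test_neg_root_mean_squared_error"].mean(),
    }

for name, m in results.items():
    print(f"{name:14s} r2={m['r2']:.2f}  RMSE={m['rmse_min']:.1f} min")
```

Because the synthetic target is linear in the features, the linear models do well here; on real data the ranking can differ, as the table above shows.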
6. Results
Your average pace over the past month is the most important feature by far.
Variable importance (decrease in node impurities), features as listed on the chart:
Pace past month
Distance past month
Distance past 6 months
Elevation past month
Rest days
SD pace
Weight
Long days
Age
Gender
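An impurity-based importance ranking like the one on this slide can be sketched as below. This is an assumption-laden illustration: it uses scikit-learn's `feature_importances_` (mean decrease in impurity, normalized to sum to 1), which may not be the tool or exact measure behind the chart, and the data and feature names are synthetic and hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (NOT the Strava data); feature names are hypothetical.
rng = np.random.default_rng(1)
n = 500
feature_names = ["pace_past_month", "distance_past_month", "rest_days"]
X = np.column_stack([
    rng.normal(9.0, 1.2, n),     # pace, min/mile
    rng.gamma(4.0, 15.0, n),     # distance, miles
    rng.integers(0, 7, n),       # rest days per week
])
# Hypothetical target dominated by pace, so pace should rank first.
y = 13.1 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0.0, 5.0, n)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Rank features by impurity-based importance, largest first.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name:22s} {importance:.3f}")
```

Impurity-based importances can favor continuous or high-cardinality features, so a permutation-based check is a reasonable complement when the ranking matters.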
7. About me: Alexis Yelton, MIT postdoc
Genomics for understanding ecosystems:
Discovery of novel organisms, metabolisms, and ecosystem functions such as
large organic compound breakdown by marine cyanobacteria (Tara Oceans data
set, 7.2 terabases of DNA)
Wet lab science for validating discoveries:
Chitinase in marine Synechococcus
[Figure; axis label: chitinase activity]
My first half marathon: 1:56:30
Personal best: 1:47:56
Editor's Notes
Start by asking if anyone is a runner. Be excited about the problem.
Try elastic net regression (lasso and ridge combination)
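A possible starting point for the elastic net idea in the note above, assuming scikit-learn. `ElasticNetCV` blends the lasso (L1) and ridge (L2) penalties through `l1_ratio` and picks the penalty strength `alpha` by cross-validation. The data, feature names, and the candidate `l1_ratio` values are hypothetical, and none of this comes from the original analysis.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Strava data); names are hypothetical.
rng = np.random.default_rng(2)
n = 300
pace = rng.normal(9.0, 1.2, n)        # min/mile
distance = rng.gamma(4.0, 15.0, n)    # miles in the past month
time_run = pace * distance            # minutes run; correlated with distance
X = np.column_stack([pace, distance, time_run])
y = 13.1 * pace - 0.05 * distance + rng.normal(0.0, 5.0, n)

# Standardize first so the penalty treats all features on the same scale.
# l1_ratio near 1 behaves like lasso, near 0 like ridge.
pipeline = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=3),
)
pipeline.fit(X, y)

enet = pipeline.named_steps["elasticnetcv"]
print("chosen l1_ratio:", enet.l1_ratio_)
print("chosen alpha:", enet.alpha_)
print("coefficients (standardized features):", enet.coef_)
```

The mix of penalties can help when features are correlated, as distance and time run are here: pure lasso tends to keep one of a correlated pair arbitrarily, while the ridge component spreads weight across them.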