Can you run faster?
Alexis Yelton
Runners want to run faster
How fast can I run a half marathon?
How can I improve on that time?
Demo
Data from Strava.com
Time series, demographic, and aggregated running data on 10,000 runners; 1,000 have half-marathon times.
[Figure: half marathon time vs. rests per week]
[Figure: log half marathon time vs. log month distance; legend: Pace (clean5$mnth_pace)]
Data from Strava.com
[Figure: half marathon time vs. age (years)]
[Figure: half marathon time vs. weight (lbs)]
Analysis (5-fold cross-validation)

Model                                          Regression r2   RMSE
Benchmarking with a linear model               0.73            6.5 min
Reducing number of features:
  Ensemble partial least squares regression    0.73            6.4 min
Validation (72 runners)                        0.63            7.2 min
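The r2 and RMSE columns above come from k-fold cross-validation. As a rough illustration only, the sketch below runs 5-fold cross-validation for a one-feature linear model in plain Python. The data are synthetic, and the names `fit_line` and `k_fold_scores` are hypothetical helpers, not taken from the slides. It assumes r2 and RMSE are computed on pooled out-of-fold predictions, which is one common convention and may differ from what the author did.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_scores(xs, ys, k=5, seed=0):
    """Return (r2, rmse) computed on pooled out-of-fold predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]  # predict only for unseen runners
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(ys))


# Synthetic stand-in data (NOT the Strava data): month pace in min/mile and
# half marathon time in minutes, linked by a made-up linear relation plus noise.
rng = random.Random(42)
pace = [rng.uniform(7.0, 11.0) for _ in range(200)]
half = [12.4 * p + rng.gauss(0, 6.0) for p in pace]

r2, rmse = k_fold_scores(pace, half, k=5)
print(f"r2 = {r2:.2f}, RMSE = {rmse:.1f} min")
```

Scoring on a separate validation set, as in the last row of the table, would reuse the same two formulas on runners that were never part of any fold.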
About me: Alexis Yelton, MIT postdoc
Chitinase in marine cyanobacteria
[Figure: chitinase activity]
My first half marathon: 1:56:30
Personal best: 1:47:56
22 Features

Month Distance          Weight Range
Month Runs              Gender
Month Elevation         Rest Days / Week
Month Pace              Fast Days / Week
Month Time              Long Days / Week
6 Month Distance        5K Time
6 Month Runs            Marathon Time
6 Month Elevation       Minimum Pace
6 Month Pace            Minimum Pace > 2 mi
6 Month Time            Minimum Pace > 3 mi
Age Range               SD Pace
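The slides do not say how these features were derived from the raw run data. Purely as an illustration, the sketch below computes a few of the monthly aggregates from a list of runs. The input layout (distance in miles, duration in minutes), the helper name `month_features`, and the exact definitions (for example, month pace as total time over total distance) are all assumptions.

```python
import statistics


def month_features(runs):
    """Aggregate one month of runs given as (distance_mi, duration_min) pairs.

    The definitions here are illustrative guesses, not the author's.
    """
    total_distance = sum(d for d, _ in runs)
    total_time = sum(t for _, t in runs)
    paces = [t / d for d, t in runs]  # minutes per mile for each run
    return {
        "month_distance": total_distance,
        "month_runs": len(runs),
        "month_time": total_time,
        "month_pace": total_time / total_distance,
        "sd_pace": statistics.stdev(paces),
        "min_pace": min(paces),
        # Fastest pace among runs longer than 2 miles, or None if there are none.
        "min_pace_over_2mi": min(
            (p for (d, _), p in zip(runs, paces) if d > 2), default=None
        ),
    }


# Made-up example month: four runs.
runs = [(3.0, 27.0), (5.0, 47.5), (1.5, 12.0), (10.0, 100.0)]
features = month_features(runs)
print(features)
```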
Results
[Figure: absolute training error, sqrt(errorstrain^2), vs. half marathon time (train_sub$HALF.MARATHON)]
Errors vary with half marathon time. A larger data set would allow for better predictions for faster and slower runners.
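The error measure on this slide, sqrt(error^2), is simply the absolute error. A minimal sketch of how one might check whether errors vary with finishing time is to average the absolute error within time bins. All numbers below are invented for illustration, and `mean_error_by_bin` is a hypothetical helper.

```python
import math


def absolute_errors(actual, predicted):
    """sqrt(error^2) for each runner, i.e. the absolute error."""
    return [math.sqrt((a - p) ** 2) for a, p in zip(actual, predicted)]


def mean_error_by_bin(actual, errors, edges):
    """Mean error per half-open bin [edges[i], edges[i + 1]); None if empty."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for a, e in zip(actual, errors):
        for i in range(len(edges) - 1):
            if edges[i] <= a < edges[i + 1]:
                sums[i] += e
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up half marathon times in seconds (actual vs. predicted).
actual = [4200, 4800, 5400, 6000, 6600, 7800]
predicted = [4900, 5000, 5500, 5900, 6400, 6900]

errs = absolute_errors(actual, predicted)
binned = mean_error_by_bin(actual, errs, [4000, 5000, 7000, 8000])
print(binned)
```

In this toy data the fastest and slowest bins have larger mean errors than the middle bin, which is the kind of pattern the slide describes.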
Analysis (3-fold cross-validation)

Model                                            Regression r2   RMSE
Benchmarking with a linear model                 0.73            6.5 min
Dealing with collinear features (and reducing number of features):
  1. Ensemble partial least squares regression   0.72            6.6 min
  2. Linear model                                0.71            6.7 min
  3. Lasso regression                            0.69            6.8 min
  4. Ridge regression                            0.72            6.7 min
  5. Random forest regression                    0.67            7.1 min
Validation (69 runners)                          0.63            7.2 min
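Collinear features are ones that move together, such as pace over the past month and pace over the past six months. One simple way to spot them, sketched here under the assumption that a pairwise Pearson correlation above some cutoff counts as collinear, is shown below. The 0.9 threshold, the feature values, and the helper names are illustrative and are not from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(features, threshold=0.9):
    """List feature pairs whose absolute correlation reaches the threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Made-up values for six runners.
features = {
    "month_pace": [8.1, 8.9, 9.4, 10.2, 7.6, 9.0],
    "six_month_pace": [8.3, 9.0, 9.6, 10.1, 7.9, 9.1],
    "rest_days_per_week": [2, 4, 1, 3, 3, 2],
}
pairs = collinear_pairs(features)
print(pairs)
```

Only the two pace features are flagged here. Methods such as partial least squares, lasso, and ridge regression are common ways to cope with such pairs.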
Analysis (3-fold cross-validation)

Model                                            Regression r2   RMSE
Benchmarking with a linear model                 0.73            6.5 min
Reducing number of features:
  1. Ensemble partial least squares regression   0.72            6.6 min
  2. Linear model                                0.71            6.7 min
  3. Lasso regression                            0.69            6.8 min
Other models with these features:
  1. Ridge regression                            0.72            6.7 min
  2. Random forest regression                    0.67            7.1 min
Validation (69 runners)                          0.63            7.2 min
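To illustrate why lasso can reduce the number of features while ridge only shrinks coefficients, here is a one-feature sketch with closed-form solutions. It assumes centered data, a ridge objective of sum((y - b*x)^2) + lam*b^2, and a lasso objective of 0.5*sum((y - b*x)^2) + lam*|b|. The data and function names are made up for the example.

```python
def _sums(x, y):
    """Return (sxx, sxy) for already-centered x and y."""
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxx, sxy


def ols_coef(x, y):
    sxx, sxy = _sums(x, y)
    return sxy / sxx


def ridge_coef(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * b^2: shrinks but never hits zero."""
    sxx, sxy = _sums(x, y)
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + lam * |b| via soft-thresholding."""
    sxx, sxy = _sums(x, y)
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


# Made-up centered data.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.9, -1.2, 0.1, 0.9, 2.1]

print("OLS:  ", ols_coef(x, y))
print("ridge:", ridge_coef(x, y, lam=10.0))
print("lasso:", lasso_coef(x, y, lam=5.0))
print("lasso, large penalty:", lasso_coef(x, y, lam=20.0))
```

With a large enough penalty the lasso coefficient becomes exactly zero, which drops the feature, whereas the ridge coefficient stays nonzero for any finite penalty.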
Results: Variable importance
[Figure: random forest variable importance (rf$finalModel), measured as increase in node purity (IncNodePurity). Features shown: Pace past month, 5K time, Pace past month, Rest days, SD pace, Weight, Long days, Age, Gender]
Your average pace over the past month is the most important feature by far.
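For regression forests, increase in node purity is based on how much a split on a feature reduces the residual sum of squares of the target. The sketch below computes that reduction for a single split on toy data. A real random forest accumulates this over many splits and trees, so this shows the idea rather than reproducing the plotted values. The data and thresholds are invented.

```python
def rss(values):
    """Residual sum of squares around the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def purity_increase(feature, target, threshold):
    """Drop in RSS when splitting the node at feature <= threshold."""
    left = [t for f, t in zip(feature, target) if f <= threshold]
    right = [t for f, t in zip(feature, target) if f > threshold]
    return rss(target) - rss(left) - rss(right)


# Made-up data for six runners.
pace = [7.5, 8.0, 8.5, 9.5, 10.0, 10.5]  # min/mile over the past month
age = [25, 50, 30, 45, 35, 40]           # years
half = [95, 100, 105, 125, 130, 135]     # half marathon time in minutes

gain_pace = purity_increase(pace, half, threshold=9.0)
gain_age = purity_increase(age, half, threshold=37)
print(gain_pace, gain_age)
```

In this toy example a split on pace removes far more of the variation in finishing time than a split on age, mirroring the slide's conclusion about which feature matters most.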

Week 5 presentation 2


Editor's Notes

  1. Start with asking if anyone is a runner. Be excited about the problem. Try elastic net regression (lasso and ridge combination).
  2. Motivation: add what specifically works (main drivers of improving performance). Figure of errors vs. actual times.
  3. Motivation: add what specifically works (main drivers of improving performance). Figure of errors vs. actual times.
  4. Start with 20 features in LM. Feature selection (PLS regression, lasso). Compare the same features in different models. Improve usability. Focus on the model I used. Introduce motivations before talking about the models (simplicity/usability and results).
  5. Start with 20 features in LM. Feature selection (PLS regression, lasso). Compare the same features in different models. Improve usability.
  6. Start with 20 features in LM. Feature selection (PLS regression, lasso). Compare the same features in different models. Improve usability. Focus on the model I used. Introduce motivations before talking about the models (simplicity/usability and results).