The Goal
Predict an individual NHL player's
contribution to his team over the life
of his contract
The Data
• Private stats site (stats.hockeyanalysis.com)
• Each row corresponds to one player's stats for a
single season
• Data spread across multiple tables
– Joined tables using Pandas with name as key
– Be careful with Excel…
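The join described above can be sketched roughly as follows. The two tables, their column names, and the player names are hypothetical stand-ins, not the site's actual schema. The sketch assumes that a name key may differ cosmetically between tables (stray whitespace, capitalization), so it normalizes names before merging and flags rows that failed to match.

```python
import pandas as pd

# Hypothetical single-season tables; columns are illustrative only.
goals = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C"],
    "goals": [30, 12, 21],
})
shots = pd.DataFrame({
    "name": ["Player A ", "player b", "Player D"],  # stray space, different case
    "shots": [240, 150, 98],
})

def clean_name(names: pd.Series) -> pd.Series:
    # Normalize whitespace and case so cosmetic differences do not break the join.
    return names.str.strip().str.lower()

goals["name"] = clean_name(goals["name"])
shots["name"] = clean_name(shots["name"])

merged = goals.merge(
    shots,
    on="name",
    how="outer",            # keep unmatched rows so they can be inspected
    validate="one_to_one",  # raises MergeError if a name is duplicated in either table
    indicator=True,         # adds a _merge column: both / left_only / right_only
)

# Rows present in only one table deserve a manual look before modeling.
unmatched = merged[merged["_merge"] != "both"]
print(merged)
print(unmatched[["name", "_merge"]])
```

An outer join with `indicator=True` is a deliberate choice here: an inner join would silently drop players whose names did not line up.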
More about the data
• Advanced stats only go back to 2007
• Aggregated prior 3 seasons of data to predict
the following season
• Players show up as multiple rows
• Y’s in certain rows become X’s in other rows
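A minimal sketch of the three-season look-back, under stated assumptions: the `points` column is a hypothetical stand-in for the target stat, the aggregate is a mean (the slides do not say which aggregate was used), and each player's seasons are consecutive. It shows how the same season's value serves as Y in one row and feeds into X in a later row.

```python
import pandas as pd

# Hypothetical player-season table; "points" stands in for the target stat.
stats = pd.DataFrame({
    "name":   ["A"] * 5 + ["B"] * 4,
    "season": [2008, 2009, 2010, 2011, 2012, 2009, 2010, 2011, 2012],
    "points": [40, 50, 60, 70, 80, 10, 20, 30, 40],
}).sort_values(["name", "season"])

def prior_mean(values: pd.Series, window: int = 3) -> pd.Series:
    # shift(1) excludes the current season, so the feature only sees the past;
    # min_periods=window leaves NaN until a full three-season history exists.
    return values.shift(1).rolling(window, min_periods=window).mean()

# Compute the look-back per player so one player's history never leaks into another's.
stats["points_prev3"] = stats.groupby("name")["points"].transform(prior_mean)

# Each remaining row pairs X (mean of the prior three seasons) with Y (that season).
training = stats.dropna(subset=["points_prev3"]).reset_index(drop=True)
print(training)
```

Player A's 2011 value of 70 is the Y of the 2011 row and part of the X of the 2012 row, which is the overlap the last bullet describes.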
Models
• RF .326 (be sure to set oob_score = True)
• ElasticNetCV .359
• ElasticNetCV w/PCA .364
• RF w/PCA .285
• SVR(xVal) .314 (could be improved if I
normalized featureset)
• SVR(xVal) w/PCA .322
• GBR w/PCA .187
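Two of the notes above lend themselves to a short scikit-learn sketch: enabling `oob_score` on the random forest, and normalizing features before the SVR. The data here is synthetic (`make_regression`), and the hyperparameters are arbitrary placeholders, so the printed scores say nothing about the numbers on this slide.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data; the real feature set is not reproduced here.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# oob_score=True scores each sample using only the trees that did not see it
# during bootstrapping; for a regressor, oob_score_ is an R^2 estimate.
rf = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)
print("RF out-of-bag R^2:", round(rf.oob_score_, 3))

# SVR is sensitive to feature scale. Putting the scaler inside the pipeline
# means it is re-fit on each training fold, so no test-fold statistics leak in.
svr = make_pipeline(StandardScaler(), SVR(C=10.0))
svr_scores = cross_val_score(svr, X, y, cv=5, scoring="r2")
print("SVR cross-validated R^2:", round(svr_scores.mean(), 3))
```

The out-of-bag estimate comes for free from the fit, which is why it is worth turning on: it avoids a separate cross-validation loop for the forest.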
Improvements
• More data!!
– More rows
– Fewer noise variables (especially Y’s)
• Choose a ‘longer’ Y
• Get age data
• Include injury status