2. NYC Taxi Data - Algorithm/Methodology
Chris Whong's data set (http://chriswhong.com/open-data/foil_nyc_taxi/)
Information on over 100 million cab trips
~50 GB of data
Important features:
● pickup/drop-off times and locations
● lengths of the trips
● fare rates and amounts
Derived Features:
● half hours in a day
● days of week (work day or weekend day)
● NYC boroughs
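The two time-based derived features can be sketched as follows. This is a minimal illustration, not the pipeline used for the results here: the timestamp format and the function names are assumptions, public holidays are ignored when deciding what counts as a work day, and borough assignment is left out because it needs boundary geometry.

```python
from datetime import datetime

def half_hour_slot(ts: datetime) -> int:
    """Index of the half-hour bin within the day: 0 (00:00-00:29) to 47 (23:30-23:59)."""
    return ts.hour * 2 + ts.minute // 30

def is_weekend(ts: datetime) -> bool:
    """True for Saturday or Sunday (weekday() gives Monday=0 ... Sunday=6)."""
    return ts.weekday() >= 5

def derive_time_features(pickup: str) -> dict:
    """Derive both features from a pickup timestamp string (assumed format)."""
    ts = datetime.strptime(pickup, "%Y-%m-%d %H:%M:%S")
    return {"half_hour_slot": half_hour_slot(ts), "is_weekend": is_weekend(ts)}
```

Binning into 48 half-hour slots keeps the time-of-day feature coarse enough to have many trips per bin while still separating rush hours from off-peak periods.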
3. Time estimation: gradient boosting regression
25% std on 95% of the data
● path lengths
● current travel times
● subway info
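To make the idea behind gradient boosting regression concrete, here is a minimal pure-Python sketch for squared-error loss on a single feature. Everything in it is an assumption for illustration: the stump learner, the hyperparameters, and the tiny synthetic data set (path length in miles against trip time in minutes) are not taken from the actual model, which would use many features and a library implementation.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Best single-threshold split of xs for predicting residuals (squared error)."""
    best = None
    # Splitting at the largest x would leave the right side empty, so skip it.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]  # requires at least two distinct x values

def fit_boosting(xs, ys, n_rounds=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_value if x <= threshold else right_value
        for threshold, left_value, right_value in stumps
    )

def mse(model, xs, ys):
    return mean([(predict(model, x) - y) ** 2 for x, y in zip(xs, ys)])

# Synthetic illustration only, not real taxi data.
path_miles = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
trip_minutes = [5.0, 9.0, 14.0, 18.0, 22.0, 27.0]
model = fit_boosting(path_miles, trip_minutes)
```

Each round corrects a fraction of the remaining error, so the training error shrinks gradually; the learning rate trades off the number of rounds against the risk of overfitting.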
Fare estimation: gradient descent regression
10% std on 95% of the data
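One reading of "gradient descent regression" is a linear model fitted by gradient descent. The sketch below assumes that reading, with a single feature (trip length in miles), a mean-squared-error loss, and made-up synthetic fares; the learning rate, step count, and function name are illustrative choices rather than the settings behind the figure above.

```python
def fit_linear_gd(xs, ys, learning_rate=0.02, n_steps=5000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Synthetic illustration only: a flat charge plus a per-mile rate.
miles = [1.0, 2.0, 3.0, 4.0, 5.0]
fares = [2.5 + 2.0 * m for m in miles]
w, b = fit_linear_gd(miles, fares)
```

On this noise-free data the fit recovers a per-mile rate close to 2.0 and a flat charge close to 2.5. With unscaled real features the learning rate would need tuning, or the features standardizing, for the updates to converge.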
Extra features: weather data
(Weather Underground: http://www.wunderground.com/)
- freezing
- snow
- rain
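The three weather conditions could enter the model as binary indicator features. A minimal sketch, assuming a hypothetical observation record with a temperature in degrees Celsius and a free-text conditions field (these field names are not taken from the Weather Underground data):

```python
from dataclasses import dataclass

@dataclass
class WeatherObservation:
    """Hypothetical record; the field names are assumptions for illustration."""
    temperature_c: float
    conditions: str  # free text such as "Light Snow" or "Rain"

def weather_flags(obs: WeatherObservation) -> dict:
    """Encode freezing, snow, and rain as 0/1 indicator features."""
    text = obs.conditions.lower()
    return {
        "freezing": int(obs.temperature_c <= 0.0),
        "snow": int("snow" in text),
        "rain": int("rain" in text),
    }
```

Each trip would then be joined to the observation nearest its pickup time before the flags are added to the feature set.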