This document summarizes a Kaggle competition to forecast usage of a bike share system from historical rental data. It describes the data provided, which includes date/time details, weather factors, and hourly counts of casual, registered, and total rentals. It also lists the tools used for the analysis, including R for feature engineering and models such as random forests and neural networks.
3. About Bike Share
Competition: http://www.kaggle.com/c/bike-sharing-demand
Challenge:
Forecast use of a city’s bike share system
Data:
UCI Machine Learning Repository
4. Publication:
Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg.
5. Data
The goal is to predict the total rental count, either directly or as the sum of separate predictions for casual and registered rentals
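The two prediction strategies above can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy rows are made up, and the per-hour mean "model" (the hypothetical helper fit_hourly_mean) stands in for the real models in the deck, which are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy rows: (hour, casual, registered). Values are invented.
ROWS = [
    (8, 5, 40), (8, 7, 52),
    (17, 20, 90), (17, 30, 110),
]

def fit_hourly_mean(rows, target):
    """Return {hour: mean of target(row)}; a stand-in for a real model."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        hour = row[0]
        sums[hour] += target(row)
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}

# Strategy A: model casual and registered separately, then add the predictions.
casual_model = fit_hourly_mean(ROWS, lambda r: r[1])
registered_model = fit_hourly_mean(ROWS, lambda r: r[2])

def predict_sum(hour):
    return casual_model[hour] + registered_model[hour]

# Strategy B: model the total count (casual + registered) directly.
count_model = fit_hourly_mean(ROWS, lambda r: r[1] + r[2])

def predict_direct(hour):
    return count_model[hour]
```

With a plain mean the two strategies give the same number, because averaging is linear. They can differ once the model is nonlinear, which is when choosing between them starts to matter.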
6. Data Fields
datetime - hourly date + timestamp
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor holiday
weather -
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp - temperature in Celsius
atemp - "feels like" temperature in Celsius
humidity - relative humidity
windspeed - wind speed
casual - number of non-registered user rentals initiated
registered - number of registered user rentals initiated
count - number of total rentals
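As a rough sketch of how these fields might be read, the Python snippet below parses one row into typed values and derives hour and weekday from datetime. The sample row is made up, and the timestamp format, the 0/1 coding of holiday and workingday, and the helper name parse_row are assumptions, not something the slides specify.

```python
import csv
import io
from datetime import datetime

SEASONS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}

# Hypothetical one-row sample using the field names listed above.
SAMPLE_CSV = """datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 08:00:00,1,0,0,1,9.84,14.395,75,0.0,1,7,8
"""

def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    # Assumed timestamp format: YYYY-MM-DD HH:MM:SS
    ts = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
    parsed = {
        "datetime": ts,
        "hour": ts.hour,            # derived feature
        "weekday": ts.weekday(),    # derived feature, 0 = Monday
        "season": SEASONS[int(row["season"])],
        "holiday": row["holiday"] == "1",        # assumed 0/1 coding
        "workingday": row["workingday"] == "1",  # assumed 0/1 coding
        "weather": int(row["weather"]),
        "temp": float(row["temp"]),
        "atemp": float(row["atemp"]),
        "humidity": float(row["humidity"]),
        "windspeed": float(row["windspeed"]),
        "casual": int(row["casual"]),
        "registered": int(row["registered"]),
        "count": int(row["count"]),
    }
    # Sanity check: total rentals are casual plus registered rentals.
    if parsed["count"] != parsed["casual"] + parsed["registered"]:
        raise ValueError("count does not equal casual + registered")
    return parsed

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```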
8. Data
Datetime - hourly date + timestamp
Predefined Factors:
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor holiday
weather -
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
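One common way to feed predefined factors like these to a model is to expand each into 0/1 indicator columns. The Python sketch below shows that idea only. The deck does not say this encoding was used, the 0/1 levels for holiday and workingday are an assumption, and the names FACTOR_LEVELS and one_hot are hypothetical.

```python
# Levels for the predefined factors, following the field descriptions above.
# The 0/1 levels for holiday and workingday are assumed.
FACTOR_LEVELS = {
    "season": [1, 2, 3, 4],
    "holiday": [0, 1],
    "workingday": [0, 1],
    "weather": [1, 2, 3, 4],
}

def one_hot(record):
    """Expand each factor into 0/1 indicator columns, one per level."""
    encoded = {}
    for name, levels in FACTOR_LEVELS.items():
        value = record[name]
        if value not in levels:
            raise ValueError(f"unexpected level {value!r} for {name}")
        for level in levels:
            encoded[f"{name}_{level}"] = 1 if value == level else 0
    return encoded

# A fall working day with misty weather.
example = one_hot({"season": 3, "holiday": 0, "workingday": 1, "weather": 2})
```

Each record ends up with exactly one indicator set per factor, so the codes are treated as categories and not as ordered numbers.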
16. Citations
Feature-Weighted Linear Stacking
Joseph Sill, Gabor Takacs, Lester Mackey, and David Lin
Combining Predictions for Accurate Recommender Systems
Michael Jahrer, Andreas Töscher, Robert Legenstein
Editor's Notes
Would really love to build a framework that recognizes patterns based on the type of data and creates complex factors out of the box
Size
Sparse vs. Dense
Tall & skinny (more records) vs. wide (more columns)