The document discusses a study conducted by the authors to determine the relationship between weather and bike ridership in a city. It describes the methodology used, which included gathering datasets on bike rental counts, weather data from 2011-2012, and conducting linear regressions. The results showed weather has a significant impact on bike usage, with rental counts being most strongly correlated with maximum temperature, average wind, precipitation, visibility, and snow levels. Including the year as a variable also improved the accuracy of the regression model.
2. Does weather have a relationship
with bike ridership?
Can we predict bike usage based
on weather?
3. INTRODUCTION
• Our team
• Research questions
• Picking datasets
• Our audience
4. METHODOLOGY
• Why linear regression?
• How we manipulated the data
• MySQL engine aggregated the
3M-row table into sums of rental
counts and duration
• Mashed up with 731 rows of
weather data (2011, 2012)
• Added a Year field
• Tools: Excel, MySQL database,
R (Rattle)
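The data preparation above can be sketched in a few lines. This is a minimal illustration in Python, not the team's MySQL queries: it assumes the aggregation was per day (suggested by the 731 weather rows covering 2011 and 2012), and the trip records, weather fields, and function names are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Toy trip records: (start date, duration in seconds). Made-up values for illustration only.
trips = [
    (date(2011, 1, 1), 600),
    (date(2011, 1, 1), 900),
    (date(2012, 1, 1), 1200),
]

# Toy daily weather keyed by date. Field names are hypothetical.
weather = {
    date(2011, 1, 1): {"max_temp": 45.0, "avg_wind": 8.0},
    date(2012, 1, 1): {"max_temp": 50.0, "avg_wind": 5.0},
}


def aggregate_daily(trip_records):
    """Collapse individual trips into a per-day rental count and total duration."""
    daily = defaultdict(lambda: {"rental_count": 0, "total_duration": 0})
    for day, duration in trip_records:
        daily[day]["rental_count"] += 1
        daily[day]["total_duration"] += duration
    return dict(daily)


def join_with_weather(daily, weather_by_day):
    """Join daily totals to weather on the date and add a Year field."""
    rows = []
    for day in sorted(daily):
        if day not in weather_by_day:
            continue  # skip days that have no weather record
        rows.append({"date": day, "year": day.year, **daily[day], **weather_by_day[day]})
    return rows


rows = join_with_weather(aggregate_daily(trips), weather)
```

Each entry in `rows` is one day with both target candidates (count and duration), the weather fields, and the added year, which is the shape a regression tool would take as input.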
5. METHODOLOGY
• Picking our best configuration
• Categoric vs. numeric variables
• Must decide how to measure bike usage
• Must pick best variables
• Error analysis
6. PHASE I
• Began with a broad study of six regressions
• Two target variables (rental counts, duration)
• Three temperature measures
• Minimum, Average, Maximum
• Chunked the day into three time ranges to reflect
temperature during bike rides
• Evaluated multiple weather variables’ effect on
regressions
• Ignored Date field
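The six Phase I regressions come from crossing the two target variables with the three temperature measures. A small sketch of that grid, with hypothetical column names:

```python
from itertools import product

# Hypothetical column names for the two targets and three temperature measures.
targets = ["rental_count", "duration"]
temperature_measures = ["min_temp", "avg_temp", "max_temp"]

# One regression configuration per (target, temperature measure) pair.
configurations = [
    {"target": target, "temperature": measure}
    for target, measure in product(targets, temperature_measures)
]

for config in configurations:
    print(config["target"], "~", config["temperature"], "+ other weather variables")
```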
8. PHASE II
• Combining the data sets
• Picking best variables:
• Bike rental counts as sole target variable
• Maximum temperature
• Utilized date/year field
• Switched Snow to categoric variable
• Analyzed and refined our regression
• Higher accuracy – R-squared = .8374 or 83.74%
9. MSE and R-squared
• A measure of accuracy in one dataset
predicting another
• Relationship between R-squared and MSE
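One common way to state the relationship between the two measures: when both are computed on the same data, R-squared equals 1 minus MSE divided by the variance of the actual values. The sketch below checks this on made-up numbers; the function names and sample values are illustrative and not taken from the study.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def variance(values):
    """Population variance (divides by n, matching the MSE denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def r_squared(actual, predicted):
    """1 minus the ratio of residual sum of squares to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up daily rental counts and predictions, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
r2_from_mse = 1.0 - mse / variance(actual)  # same quantity, expressed through MSE
```

A lower MSE on the same data therefore means a higher R-squared, which is why the two can be read together during error analysis.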
11. FINAL MODEL
Weight       Variable
-4004.501    Intercept
62.118       Maximum Temperature
-132.741     Average Wind
93.162       Precipitation
416.818      Visibility
2063.069     Year
-161.038     Snow [0.0-1.2] inches
-4.945       Snow [1.2-2.0] inches
-588.349     Snow [2.0-3.1] inches
-5.390       Snow [3.1-3.9] inches
Y = Intercept + sum of (Weight × Variable)
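The weights above can be applied as a plain weighted sum. The sketch below is an illustration under assumptions the slides do not confirm: the units of each input, how Year was encoded (a small index such as 0 or 1 rather than the calendar year is a guess), and that each snow range acts as a 0/1 indicator with no weight added when no range applies. The function and parameter names are hypothetical.

```python
# Weights copied from the final model table.
WEIGHTS = {
    "intercept": -4004.501,
    "max_temp": 62.118,
    "avg_wind": -132.741,
    "precipitation": 93.162,
    "visibility": 416.818,
    "year": 2063.069,
}

# Categoric snow variable: one weight per range of snow inches.
SNOW_WEIGHTS = {
    "0.0-1.2": -161.038,
    "1.2-2.0": -4.945,
    "2.0-3.1": -588.349,
    "3.1-3.9": -5.390,
}


def predict_rentals(max_temp, avg_wind, precipitation, visibility, year, snow_bin=None):
    """Weighted sum of the inputs plus the intercept.

    `year` uses whatever encoding the model was trained with (assumed, not known).
    `snow_bin` is one of the SNOW_WEIGHTS keys, or None to add no snow weight.
    """
    y = WEIGHTS["intercept"]
    y += WEIGHTS["max_temp"] * max_temp
    y += WEIGHTS["avg_wind"] * avg_wind
    y += WEIGHTS["precipitation"] * precipitation
    y += WEIGHTS["visibility"] * visibility
    y += WEIGHTS["year"] * year
    if snow_bin is not None:
        y += SNOW_WEIGHTS[snow_bin]  # indicator variable: add the weight once
    return y
```

Read this way, each additional degree of maximum temperature adds about 62 rentals to the prediction, and a day in the 2.0-3.1 inch snow range subtracts about 588.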
12. LESSONS LEARNED
• Too many independent variables to incorporate
crime dataset in addition to weather dataset
• Mean Squared Error (MSE), R-squared
• Only two years’ worth of data was available due to
Bikeshare’s short history (2011, 2012)
• Final model would be even more accurate with
additional historical data
13. CONCLUSION
• Our hypotheses proved true: weather does affect
bike ridership
• Why is Maximum Temperature better?
• Why does the Year improve accuracy?
• The categorical range of snow inches
Does weather affect Bikeshare ridership, and how? Can we predict it? How accurate can we be?
Found the datasets on Capital Bikeshare and on the Farmer’s Almanac.
Who can use this study? Discuss what this could do for Bikeshare as a company
Linear regression was best suited. We were doing a comparison rather than classification. It was not a true/false research question.
We used charts in Excel to study the difference between predicted values and actual values.
Min, avg, max temperature – best variables?
Error analysis – used both MSE and R-squared. Kays will discuss in further detail later.
TOP LEFT: Minimum temperature
TOP RIGHT: Average temperature, date is numeric
LOWER LEFT: Maximum temperature, date is numeric
LOWER RIGHT: Best combination: Maximum temperature, Year variable – numeric with two years only