5. The problem
I want to go hiking at a time/day that
works for me, but that also minimizes
the size of the crowds.
I would like to predict the crowd size for
a specific location and a range of
future dates.
Then I can use that prediction to make
an intelligent choice about when to
take my trip.
8. How do we predict crowds right now?
Government data
Often aggregated
Not always immediately accessible
Check-ins
Sparse coverage
Prior knowledge/Intuition
Not always validated
12. CrowdSkippr: Inner workings
From flickr.com, extract
the total number of
photos taken at a given
time/place.
Extract data on
temperatures from
NOAA.gov for a
given time/place.
Using this information, create a prediction of how
heavy the crowds will be at a given future
time/place.
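The joining step described on this slide can be sketched as follows. This is a minimal illustration, not the CrowdSkippr code: the function name `build_rows`, the `HOLIDAYS` set, and all sample values are hypothetical, and the inputs are assumed to be per-day dictionaries already extracted from Flickr and NOAA.

```python
from datetime import date

# Hypothetical fixed-date holidays as (month, day); a real list would be longer.
HOLIDAYS = {(1, 1), (7, 4), (12, 25)}

def build_rows(photo_counts, temperatures):
    """Join per-day photo counts with per-day temperatures.

    Both arguments map a datetime.date to a number. Only days present
    in both sources are kept.
    """
    rows = []
    for day in sorted(photo_counts.keys() & temperatures.keys()):
        rows.append({
            "date": day,
            "day_of_week": day.weekday(),            # 0 = Monday
            "day_of_year": day.timetuple().tm_yday,  # 1..366
            "holiday": int((day.month, day.day) in HOLIDAYS),
            "temperature": temperatures[day],
            "photos": photo_counts[day],             # proxy for crowd size
        })
    return rows

# Made-up values, for illustration only.
photos = {date(2013, 7, 4): 120, date(2013, 7, 5): 80, date(2013, 7, 6): 95}
temps = {date(2013, 7, 4): 24.0, date(2013, 7, 5): 22.5}
rows = build_rows(photos, temps)
```

Days missing from either source are dropped here; whether to drop or impute them is a design choice the slide does not specify.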
13. Gradient Boosting Regression
Predictors
Day of week (Flickr)
Holiday flag (Flickr)
Day of year (Flickr)
Daily temperature (NOAA)
Response
Number of photos taken (Flickr)
(proxy for size of crowd)
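To make the idea of gradient boosting regression concrete, here is a small from-scratch sketch with squared-error loss and one-split trees (stumps). It is not the model behind these slides, which presumably used a library implementation; the function names, hyperparameters, and sample rows are all assumptions made for illustration.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None

def fit_gbr(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the residuals are the negative gradient.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= t else rm)
                      for j, t, lm, rm in stumps)

# Made-up rows: [day_of_week, holiday, day_of_year, temperature] -> photos
X = [[0, 0, 180, 20.0], [5, 0, 185, 25.0], [6, 0, 186, 26.0],
     [3, 1, 185, 24.0], [2, 0, 40, -2.0], [5, 0, 43, 0.0]]
y = [30, 90, 85, 120, 5, 15]
model = fit_gbr(X, y)
```

Each round adds a small correction fitted to what the current ensemble still gets wrong, which is the main difference from a random forest, where trees are fitted independently and averaged.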
18. For all 28-day windows in a given year,
the median difference between crowd size on predicted and
actual best days is 4.6%.
(On the days that are predicted to have the lowest crowds, the
crowd size is 29% of the worst possible crowds within that
window.)
Validation: Rocky Mountain National Park
[Figure: predicted crowd size vs. actual crowd size (test data)]
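One way to compute a best-day gap over sliding 28-day windows is sketched below. The slide does not say how the difference is normalized; dividing by the worst actual crowd in the window is an assumption, as are the function name `best_day_gaps` and the toy numbers.

```python
from statistics import median

def best_day_gaps(predicted, actual, window=28):
    """predicted, actual: equal-length lists of daily crowd sizes."""
    gaps = []
    for start in range(len(actual) - window + 1):
        pred_w = predicted[start:start + window]
        act_w = actual[start:start + window]
        worst = max(act_w)
        if worst == 0:
            continue  # no photos in this window, so the gap is undefined
        chosen = pred_w.index(min(pred_w))  # day the model would recommend
        # Assumed normalization: fraction of the window's worst actual crowd.
        gaps.append((act_w[chosen] - min(act_w)) / worst)
    return gaps

# Toy numbers with a 3-day window, for illustration only.
gaps = best_day_gaps([3, 1, 2, 5], [10, 4, 2, 8], window=3)
median_gap = median(gaps)
```

A gap of zero means the recommended day was in fact the least crowded day in that window.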
Editor's Notes
WHY. Explain what gradient boosting does, what random forest does, and why gradient boosting is better.
How does this improve upon just graphing the number of photos over a year? – have answer to this question.
Need 2014 data.
HOW GOOD is the correlation? What’s the R^2?
Photo values were calculated by finding the total number of all public photos taken at Yosemite National Park and posted on Flickr in each month from 2005 to 2013, then normalizing each month’s total by the grand total of all photos taken during this time.
Visitor values were calculated by taking the total number of all visitors to Yosemite National Park in each month from 1985 to 2007, then normalizing by the grand total of visitors during this time.
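The normalization described above, and an R^2 for the photo/visitor comparison, could be computed roughly as follows. The helper names and the three-value series are placeholders, not the Yosemite data, and the sketch assumes both series have already been aligned month by month.

```python
from math import sqrt

def normalize(totals):
    """Divide each monthly total by the grand total, so the shares sum to 1."""
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder monthly totals, for illustration only.
photo_share = normalize([10, 30, 60])
visitor_share = normalize([2000, 5000, 13000])
r_squared = pearson_r(photo_share, visitor_share) ** 2
```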
----- Meeting Notes (6/23/14 10:36) -----
have this before the gradient boosting. and after demo. THEN GBR.
point out that whether it’s a holiday is not as important.
maybe have a graphic describing gradient boosting.
mention you ran a battery of tests and the best one was ___.
----- Meeting Notes (6/23/14 10:36) -----
talk about your research. one sentence intro. crabs.