Stone Soup Data Collection w/ CycleTracks

Presentation given at the 2013 Transportation Research Board (TRB) Annual Meeting, illustrating the use of technology to aggregate data from an opt-in smartphone app, CycleTracks.

Published in: Technology

  1. Building the Technology Pot for the Stone Soup Method of Data Collection: Facilitating Cooperation in the Face of Scarcity. Elizabeth A. Sall, SAN FRANCISCO COUNTY TRANSPORTATION AUTHORITY. Transportation Research Board Annual Meeting in Washington, D.C., Tuesday, January 15th, 2013
  2. Caveat: I am not a data collection or surveying expert. I AM A TRAVEL MODELER. But travel models need…DATA.
  3. So what am I going to talk about? Story: We needed data for something we had never seen collected before. And we didn’t have much money or time. …so we built this app called CycleTracks.
  4. Route Choice Data Collection: Choices Considered (columns: cost per record, cost per respondent, respondent LOE, data precision, data quality, RP or SP)
     • Web-based stated preference: $, $, High, SP
     • CATI route recall: $$$, $$$, High, Low, Low, RP
     • Personal GPS: $, $$, Med, High, Med, RP
     • Bicycle GPS: $, $$, Med, High, High, RP
     • Smart Phone: $, $, Low, Med, Med, RP
  5. CycleTracks: from coder to cyclist. Publicity! Advertising! Stickers! iTunes Store, Android Market.
  6. CycleTracks Data: from cyclist to analyst. Amazon EC2 server running Apache; diagram labels: JSON, PHP, MySQL, PHP.
  7. Bay Area Participants (if they noted their home ZIP): CycleTracks (N=366) vs. BATS (N=153)
     • Age, mean: 34 vs. 33
     • Gender, female: 20% vs. 36%
     • Cycling frequency (CycleTracks only; N/A for BATS): daily 48%, several times/week 36%, several times/month 13%, less than once a month 3%
  8. Data Quality: some good, some bad
  9. Urban Canyon Effect: Haight Ashbury vs. Downtown
  10. GPS Signal at Beginning of Trip
  11. Not on a Bike
  12. Post Processing Warranted (Schüssler & Axhausen 2009)
     • Input: 5,178 traces from 497 users
     • Steps: Gaussian smoothing, activity & mode detection, map matching
     • Output: 3,034 bike stages from 366 users
     • ~60% of submitted data useful
  13. Unintended Benefit: Scalability
     • It works anywhere you can get a satellite signal
     • Database and cloud server are highly scalable
     • Web interface for data minimizes human resources
     • Data cleaning is open-source
     • Cost for data: keeping the server on, promotion
  14. Where do people use CycleTracks? Agencies using CycleTracks: 1. San Francisco 2. Monterey Bay, CA 3. Austin, TX 4. Seattle, WA 5. Fort Collins, CO 6. Twin Cities, MN 7. Raleigh, NC 8. Salt Lake City, UT. [Map of users.] *based on optional homeZIP field, NOT TRIP LOCATION
  15. Where? Where it’s advertised the most.
      Place           Users*   Trips*
      San Francisco   665**    11,458
      Austin          276      2,950
      Fort Collins    126      1,560
      Seattle         108      1,175
      Minneapolis     67       1,326
      Oakland         26       127
      Saint Paul      23       449
      San Jose        22       70
      Santa Cruz      17       254
      Berkeley        14       127
      *based on optional homeZIP field, NOT TRIP LOCATION
      **compared to 153 cyclists in the 2000 HH Travel Survey
  16. When did new users submit first trip? [Chart: CycleTracks New Users First Trip Submissions over time; labeled peaks: San Francisco, Monterey, Fort Collins, Austin, Twin Cities, Seattle.]
     • New user registrations correlate directly with publicity
  17. Many just try it out, but half use it for a while. [Chart: Users by Duration of Use; categories: One Day, Day - Week, Week - Month, 1 - 3 Months, 3 - 6 Months, 6 - 9 Months, 9 - 12 Months, Over a Year.]
  18. Broad Spectrum of Users
     • Half of users submitted > 5 trips
     • 10% of users submitted > 20 trips
     • 40 users submitted > 100 trips (Max = 685)
     [Chart: Users by Trips Submitted; bins: 1, 2-5, 6-10, 11-15, 16-20, 21-700.]
  19. Capturing Infrequent Cyclists
     • 20% of users (500+) are infrequent cyclists (10% of trips)
     [Chart: Trips and Users by Cycling Frequency; categories: Less than once a month, Several times per month, Several times per week, Daily.]
  20. All Open Source
     • GPL3 License
     • Code on GitHub
     • Fork us! www.github.com/sfcta
  21. Bonus Benefit: Transferability (+750 users, +8,500 trips). Rolling their own from our code: AggieTracks ~35 users; Cville Bike mApper ~120 users/1500 trips; Cycle Atlanta; NuStats PaceLogger ~400 users/4500 trips; Cycle Lane ~200 users/2500 trips
  22. Combined Reach: ~44,000 trips. http://goo.gl/maps/DuqGh
  23. Issues - Bias
     • Tradeoff between bias and quantity, but bias can be dealt with if quantity is high enough.
     • Which biases are acceptable and when? E.g., does income affect how averse you are to biking up hills (vs. biking around them)?
     • What biases can we undo with technology?
  24. Issues - Bias. Does… (People who answer surveys) x (People who have smartphones) = (People who answer surveys over their smartphones)? [Chart: Race/Ethnicity; series: NHTS Sample Rate, Smartphone Ownership, Effective Probable Response Rate; groups: White, Black, Hispanic.] …if so, this looks pretty good. Sources: 2009 NHTS (NHTS Sample/1,000 population); Pew 2011 Smartphone Survey; Census
  25. Issues - Bias …and this looks even better. [Chart: Age Group; series: NHTS Sample Rate, Smartphone Ownership, Effective Probable Response Rate; groups: 18-24, 25-34, 35-44, 45-54, 55-64, 65+.] Sources: 2009 NHTS (NHTS Sample/1,000 population); Pew 2011 Smartphone Survey; Census
  26. Issues - Bias …but it looks like income exacerbates the divide. [Chart: Total Household Income; series: NHTS Sample Rate, Smartphone Ownership, Effective Probable Response Rate; groups: <$10k, $10k-<$20k, $20k-<$30k, $30k-<$40k, $40k-<$50k, $50k-<$75k, $75k-<$100k, $100k+.] Sources: 2009 NHTS (NHTS Sample/1,000 population); Pew 2011 Smartphone Survey; Census
  27. Issues - Recruitment
     • Recruitment can be difficult
     • Small publicity campaigns --> small datasets
     • Areas most successful in recruiting users had large publicity campaigns
     • App needs to have value itself: monetary value; users feel like ‘they are helping’ something they care about; fun (or at least not painful) to use
  28. Cliffs Notes: All we did was build a little phone app
     • Very tiny investment (<$20,000 total) for CycleTracks
     • Yielded 35,000+ records
     • Open source policy has afforded 8,500 more and counting
  29. Lessons Learned
     • Think about ways to get data to come to you.
     • Reach far with small levels of investment.
     • Be open. Open-source works!
     • Set aside real money to: maintain and grow the app and associated scripts; advertise what we have done with it and develop a community
     • Develop the app under your own Apple Developer ID. Changing is painful.
     • Use an API interface rather than have the app hard-coded to a database. More flexible in case others want to contribute data.
  30. Thanks! Credits: Lisa Zorn, Billy Charlton, Matt Paul. elizabeth [at] sfcta [dot] org, www.sfcta.org/modeling, www.sfcta.org/cycletracks, http://github.com/sfcta
  31. Another story: We had a lot of unorganized data collected by a zillion projects, agencies, consultants…and wanted to make sense of it. …so we built this app called CountDracula.
  32. CountDracula: https://github.com/sfcta/CountDracula
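
The slide on getting data from cyclist to analyst names JSON as the format carried from the phone to the PHP/MySQL server. A minimal sketch of what such a trip upload could look like follows. The field names (`user`, `purpose`, `points`, and so on) and the helper functions are hypothetical illustrations, not the actual CycleTracks schema or server code.

```python
import json

# Hypothetical trip record; the field names are illustrative only and are
# NOT taken from the CycleTracks code base.
trip = {
    "user": {"age": 34, "home_zip": "94103", "cycling_frequency": "daily"},
    "purpose": "commute",
    "points": [
        {"t": "2013-01-15T08:00:00Z", "lat": 37.7749, "lon": -122.4194},
        {"t": "2013-01-15T08:00:05Z", "lat": 37.7751, "lon": -122.4189},
    ],
}


def encode_trip(record):
    """Serialize a trip to a compact JSON string, as a phone app might before upload."""
    return json.dumps(record, separators=(",", ":"))


def decode_trip(body):
    """Parse an uploaded JSON body and reject trips that carry no GPS points."""
    record = json.loads(body)
    if not record.get("points"):
        raise ValueError("trip has no GPS points")
    return record


body = encode_trip(trip)
received = decode_trip(body)
```

Rejecting empty traces at the door is one cheap way to keep obviously unusable submissions out of the database before any post-processing runs.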
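
The post-processing slide lists Gaussian smoothing as the first cleaning step for raw GPS traces. The sketch below shows one simple form of it, a Gaussian kernel average over time applied to a single coordinate series. The function name, the `sigma` value, and the sample numbers are assumptions for illustration; this is not the project's cleaning code and may differ from the procedure in Schüssler & Axhausen (2009).

```python
import math


def gaussian_smooth(times, values, sigma):
    """Kernel-smooth `values` sampled at `times` (seconds).

    Each output point is a weighted average of all input points, with
    weights from a Gaussian of width `sigma` seconds centered on that point.
    `sigma` must be positive.
    """
    smoothed = []
    for t0 in times:
        weights = [math.exp(-((t - t0) ** 2) / (2 * sigma ** 2)) for t in times]
        total = sum(weights)  # at least 1.0, because the point weights itself
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


# Hypothetical latitude series with one spike, the kind of jump an
# urban canyon might produce.
times = [0, 5, 10, 15, 20]
lats = [37.7700, 37.7701, 37.7750, 37.7703, 37.7704]
smooth_lats = gaussian_smooth(times, lats, sigma=5.0)
```

Applying the same function to the longitude series gives a smoothed trace. A larger `sigma` suppresses spikes more strongly but also rounds off real turns.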
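
The bias slides ask whether the rate at which a group answers surveys, multiplied by its smartphone ownership rate, approximates the rate at which it would answer a smartphone survey. That product can be sketched as below. The group labels and all numbers are made-up placeholders, not values from the NHTS, Pew, or Census sources cited on the slides.

```python
def effective_response_rate(survey_rate, smartphone_rate):
    """Multiply each group's survey response rate by its smartphone ownership rate.

    Both inputs map group label -> rate expressed as a fraction
    (1.0 == 100% of the reference level).
    """
    return {
        group: survey_rate[group] * smartphone_rate[group]
        for group in survey_rate
    }


# Placeholder rates for two hypothetical groups (NOT real survey data).
survey_rate = {"group_a": 1.0, "group_b": 0.5}
smartphone_rate = {"group_a": 0.8, "group_b": 1.2}

effective = effective_response_rate(survey_rate, smartphone_rate)
```

When a group that is under-sampled by traditional surveys has above-average smartphone ownership, the product moves closer to parity, which is the effect the race/ethnicity and age slides describe as looking good.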
