Building the Technology Pot for the Stone Soup Method of Data Collection:
Facilitating Cooperation in the Face of Scarcity

                      Elizabeth A. Sall




      SAN FRANCISCO COUNTY TRANSPORTATION AUTHORITY
 Transportation Research Board Annual Meeting in Washington, D.C.
                    Tuesday January 15th, 2013
Caveat: I am not a data collection or surveying expert

       I AM A TRAVEL MODELER.


But travel models need…

DATA.


So what am I going to talk about?



 Story: We needed data for something we
 had never seen collected before. And we
 didn’t have much money or time.
     …so we built this app called CycleTracks




Route Choice Data Collection
Choices Considered

Method                       Cost per  Cost per    Respondent  Data       Data     RP or SP
                             Record    Respondent  LOE         Precision  Quality
Web-based stated preference  $         $           High                            SP
CATI route recall            $$$       $$$         High        Low        Low      RP
Personal GPS                 $         $$          Med         High       Med      RP
Bicycle GPS                  $         $$          Med         High       High     RP
Smart Phone                  $         $           Low         Med        Med      RP

LOE = level of effort; CATI = computer-assisted telephone interviewing;
RP = revealed preference; SP = stated preference.




CycleTracks: from coder to cyclist


                      Publicity! Advertising! Stickers!




  iTunes Store
  Android Market




CycleTracks Data: from cyclist to analyst




[Diagram: phone app -> JSON -> Amazon EC2 server running Apache (PHP -> MySQL -> PHP) -> analyst]
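The slide shows only the shape of the pipeline. As a rough illustration of its middle step, the Python sketch below parses one uploaded trip and writes it to a relational database. It is not the CycleTracks server code, which is written in PHP and published on GitHub; the payload fields (`user_id`, `purpose`, `points` with `t`, `lat`, `lon`), the table layout, and the function name `ingest_trip` are all hypothetical, and Python's built-in `sqlite3` stands in for MySQL.

```python
import json
import sqlite3

# Hypothetical schema: one row per trip, one row per GPS fix.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trip (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL,
    purpose TEXT
);
CREATE TABLE IF NOT EXISTS coord (
    trip_id INTEGER NOT NULL REFERENCES trip(id),
    t REAL NOT NULL,    -- seconds since trip start (assumed unit)
    lat REAL NOT NULL,
    lon REAL NOT NULL
);
"""


def ingest_trip(conn: sqlite3.Connection, body: str) -> int:
    """Parse one JSON trip upload, store it, and return the new trip id."""
    payload = json.loads(body)
    points = payload["points"]
    if not points:
        raise ValueError("trip has no GPS points")
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO trip (user_id, purpose) VALUES (?, ?)",
            (payload["user_id"], payload.get("purpose")),
        )
        trip_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO coord (trip_id, t, lat, lon) VALUES (?, ?, ?, ?)",
            [(trip_id, p["t"], p["lat"], p["lon"]) for p in points],
        )
    return trip_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    upload = json.dumps({
        "user_id": "demo-user",
        "purpose": "Commute",
        "points": [
            {"t": 0, "lat": 37.7749, "lon": -122.4194},
            {"t": 5, "lat": 37.7751, "lon": -122.4190},
        ],
    })
    trip_id = ingest_trip(conn, upload)
    n = conn.execute(
        "SELECT COUNT(*) FROM coord WHERE trip_id = ?", (trip_id,)
    ).fetchone()[0]
    print(trip_id, n)  # 1 2
```

Wrapping both inserts in one transaction means an upload with a malformed point never leaves a trip row without its GPS points.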




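Bay Area Participants
(if they noted their home ZIP)

                             CycleTracks   BATS
                             (N=366)       (N=153)
Age, mean                    34            33
Gender, female               20%           36%
Cycling frequency
  Daily                      48%           N/A
  Several times/week         36%           N/A
  Several times/month        13%           N/A
  Less than once a month     3%            N/A

BATS = respondents to the 2000 Bay Area Travel Survey who reported a cycling trip in San Francisco.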




Data Quality: some good, some bad




Urban Canyon Effect

[Map comparison of GPS traces: Haight Ashbury vs. Downtown]




GPS Signal at Beginning of Trip
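Phones often report poor fixes for the first seconds of a trip, before the receiver has locked on. One simple mitigation, sketched below in Python as an assumption rather than the procedure SFCTA used, is to discard leading fixes until several consecutive fixes report acceptable accuracy. The `Fix` record, its `accuracy_m` field (a horizontal accuracy radius in meters, similar to what smartphone location APIs report), the helper `trim_cold_start`, and the 20 m / 3-fix defaults are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Fix:
    t: float           # seconds since trip start
    lat: float
    lon: float
    accuracy_m: float  # reported horizontal accuracy radius (hypothetical field)


def trim_cold_start(
    fixes: Sequence[Fix], max_accuracy_m: float = 20.0, settle: int = 3
) -> List[Fix]:
    """Drop fixes from the start of a trace until `settle` consecutive fixes
    report accuracy within `max_accuracy_m`; keep everything from that run on.

    Returns an empty list if the signal never settles.
    """
    if settle < 1:
        raise ValueError("settle must be at least 1")
    run = 0
    for i, fix in enumerate(fixes):
        # Count consecutive good fixes; any bad fix resets the run.
        run = run + 1 if fix.accuracy_m <= max_accuracy_m else 0
        if run == settle:
            return list(fixes[i - settle + 1:])
    return []
```

Only the leading stretch is trimmed; a later bad fix (for example, in an urban canyon) is left for the smoothing and map-matching steps to handle.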




Not on a Bike
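Some uploaded traces were recorded while the rider was not on a bike (the speaker notes mention bikes on transit in the Portland bicycle-GPS study). A crude speed screen, much simpler than the mode-detection step in Schüssler & Axhausen (2009), can flag the most obvious cases. The 45 km/h cutoff, the 10% share, and the helper names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds_kmh(points: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Speeds in km/h between consecutive (t_seconds, lat, lon) points."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(haversine_m(la0, lo0, la1, lo1) / dt * 3.6)
    return speeds


def likely_not_bike(
    points: Sequence[Tuple[float, float, float]],
    max_kmh: float = 45.0,
    max_share: float = 0.10,
) -> bool:
    """Flag a trace if more than `max_share` of its segments exceed `max_kmh`."""
    speeds = segment_speeds_kmh(points)
    if not speeds:
        return False
    fast = sum(1 for s in speeds if s > max_kmh)
    return fast / len(speeds) > max_share
```

Requiring a share of fast segments, rather than a single one, keeps a brief downhill sprint or one bad fix from discarding an otherwise valid ride.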




Post Processing Warranted

5,178 traces from 497 users
  -> Gaussian smoothing
  -> Activity & mode detection
  -> Map matching
  -> 3,034 bike stages from 366 users

~60% of submitted data useful
(Schüssler & Axhausen 2009)
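The first step above, Gaussian smoothing, replaces each GPS point with a weighted average of its neighbors in time, damping jitter while keeping the overall path. The Python sketch below is a generic Gaussian kernel smoother, not SFCTA's code; it assumes coordinates already projected to meters (x, y), and the kernel width `sigma` (in seconds) is an illustrative value, not the parameter used by SFCTA or by Schüssler & Axhausen.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_smooth(
    times: Sequence[float],
    xs: Sequence[float],
    ys: Sequence[float],
    sigma: float = 5.0,
) -> List[Tuple[float, float]]:
    """Return a smoothed (x, y) for every input point.

    Each output point is the average of all input points, weighted by
    exp(-(t_i - t_j)^2 / (2 * sigma^2)), so points close in time dominate.
    """
    if not (len(times) == len(xs) == len(ys)):
        raise ValueError("times, xs and ys must have the same length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    smoothed = []
    for t_i in times:
        w_sum = x_sum = y_sum = 0.0
        for t_j, x_j, y_j in zip(times, xs, ys):
            w = math.exp(-((t_i - t_j) ** 2) / (2.0 * sigma**2))
            w_sum += w
            x_sum += w * x_j
            y_sum += w * y_j
        smoothed.append((x_sum / w_sum, y_sum / w_sum))
    return smoothed


if __name__ == "__main__":
    # A 10 m spike at t=2 s in an otherwise stationary trace.
    t = [0, 1, 2, 3, 4]
    x = [0.0, 0.0, 10.0, 0.0, 0.0]
    y = [0.0] * 5
    print([round(px, 2) for px, _ in gaussian_smooth(t, x, y, sigma=1.0)])
    # [0.77, 2.57, 4.03, 2.57, 0.77]
```

The double loop is O(n²); for long traces one would normally sum only over points within about three sigma of each t_i.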

Unintended Benefit: Scalability

• It works anywhere you can get a satellite signal
• Database and cloud server highly scalable
• Web interface for data minimizes human resources
• Data cleaning open-source
• Cost for data:
    - Keeping server on
    - Promotion



Where do people use CycleTracks?*

Agencies using CycleTracks:
1. San Francisco
2. Monterey Bay, CA
3. Austin, TX
4. Seattle, WA
5. Fort Collins, CO
6. Twin Cities, MN
7. Raleigh, NC
8. Salt Lake City, UT

*User locations based on the optional home ZIP field, NOT trip location

Where?
Where it’s advertised the most.

Place                         Users*                    Trips*
San Francisco                 665**                     11,458
Austin                        276                       2,950
Fort Collins                  126                       1,560
Seattle                       108                       1,175
Minneapolis                   67                        1,326
Oakland                       26                        127
Saint Paul                    23                        449
San Jose                      22                        70
Santa Cruz                    17                        254
Berkeley                      14                        127
               *Based on the optional home ZIP field, NOT trip location
               **Compared to 153 cyclists in the 2000 household travel survey (BATS)
When did new users submit their first trip?

[Chart: CycleTracks New Users' First Trip Submissions over time (number of new users, axis 0-200), annotated San Francisco, Monterey, Austin, Seattle, Fort Collins, and Twin Cities]

•     New user registrations correlate directly with publicity



Many just try it out, but half use it for a while

[Chart: Users by Duration of Use (number of users, axis 0-1000). One Day 41%; Day - Week 11%; Week - Month 16%; 1 - 3 Months 15%; 3 - 6 Months 8%; 6 - 9 Months 3%; 9 - 12 Months 3%; Over a Year 3%]




Broad Spectrum of Users


•       Half of users submit > 5 trips
•       10% of users submitted > 20 trips
•       40 users submitted >100 trips (Max = 685)
[Chart: Users by Trips Submitted (number of users, axis 0-800). 1 trip 31%; 2-5 20%; 6-10 8%; 11-15 5%; 16-20 3%; 21-700 10%]




Capturing Infrequent Cyclists


•    20% (500+) of users are infrequent cyclists, accounting for 10% of trips

[Chart: Trips and Users by Cycling Frequency. Users / trips: Less than once a month 187 / 577; Several times per month 411 / 2,036; Several times per week 852 / 9,122; Daily 853 / 13,506]




All Open Source


                                     •   GPL3 License
                                     •   Code on GitHub
                                     •   Fork us!




     www.github.com/sfcta

Bonus Benefit: Transferability
+750 users, +8,500 trips

Rolling their own from our code:
• AggieTracks: ~35 users
• Cville Bike mApper: ~120 users / 1,500 trips
• Cycle Atlanta: ~400 users / 4,500 trips
• Cycle Lane: ~200 users / 2,500 trips
• NuStats PaceLogger


Combined Reach: ~44,000 trips




http://goo.gl/maps/DuqGh
Issues - Bias

• Tradeoff between bias and quantity
    - But bias can be dealt with if quantity is high enough.
• Which biases are acceptable, and when?
    - e.g., does income affect how averse you are to biking up hills (vs. biking around them)?
• What biases can we undo with technology?




Issues - Bias

Does…
(People who answer surveys) × (People who have smartphones) = (People who answer surveys over their smartphones)?

[Chart: Race/Ethnicity (White, Black, Hispanic): NHTS Sample Rate, Smartphone Ownership, and Effective Probable Response Rate, axis 0-140%]

…if so, this looks pretty good.

Sources: 2009 NHTS (NHTS Sample/1,000 population); Pew 2011 Smartphone Survey; Census
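The slide's question amounts to a per-group product: if answering surveys and owning a smartphone are independent within a demographic group, that group's expected smartphone-survey response rate is its survey sample rate times its smartphone ownership share. The Python sketch below makes that arithmetic explicit; the function names are hypothetical, and no NHTS or Pew figures are reproduced here. The charts' axes run above 100%, which suggests the plotted values are indexed to some reference rate, so an indexing helper is included, but the slides do not say which reference was used.

```python
from typing import Dict


def effective_response_rate(
    sample_rate: Dict[str, float], smartphone_share: Dict[str, float]
) -> Dict[str, float]:
    """Per-group survey sample rate x smartphone ownership share.

    Assumes survey response and smartphone ownership are independent within
    each group, which is exactly what the slide asks.
    """
    mismatched = set(sample_rate) ^ set(smartphone_share)
    if mismatched:
        raise ValueError(f"groups missing from one input: {sorted(mismatched)}")
    return {g: sample_rate[g] * smartphone_share[g] for g in sample_rate}


def as_index(rates: Dict[str, float], reference: float) -> Dict[str, float]:
    """Express each rate as a percentage of a reference rate (100 = parity)."""
    if reference <= 0:
        raise ValueError("reference rate must be positive")
    return {g: 100.0 * r / reference for g, r in rates.items()}


if __name__ == "__main__":
    # Placeholder inputs for two hypothetical groups, not survey data.
    eff = effective_response_rate(
        {"group A": 2.0, "group B": 1.0},
        {"group A": 0.5, "group B": 0.4},
    )
    print(eff)  # {'group A': 1.0, 'group B': 0.4}
    print(as_index(eff, 0.8))
```

If the independence assumption fails (for instance, if lower-income smartphone owners are also less likely to answer surveys), the product overstates that group's effective response rate.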


Issues - Bias

…and this looks even better.

[Chart: Age Group (18-24, 25-34, 35-44, 45-54, 55-64, 65+): NHTS Sample Rate, Smartphone Ownership, and Effective Probable Response Rate, axis 0-250%]

Sources: 2009 NHTS (NHTS Sample/1,000 population); Pew 2011 Smartphone Survey; Census

Issues - Bias

…but it looks like income exacerbates the divide.

[Chart: Total Household Income (<$10k, $10k-<$20k, $20k-<$30k, $30k-<$40k, $40k-<$50k, $50k-<$75k, $75k-<$100k, $100k+): NHTS Sample Rate, Smartphone Ownership, and Effective Probable Response Rate, axis 0-300%]

Sources: 2009 NHTS (NHTS Sample/1,000 population); Pew 2011 Smartphone Survey; Census

Issues - Recruitment

• Recruitment can be difficult
    - Small publicity campaigns -> small datasets
    - Areas most successful in recruiting users had large publicity campaigns
• The app needs to have value in itself:
    - Monetary value
    - Feeling like 'they are helping' something they care about
    - Fun (at least not painful) to use




Cliffs Notes

• All we did was build a little phone app:
    - Very small investment (<$20,000 total) for CycleTracks
    - Yielded 35,000+ records
    - Open-source policy has afforded 8,500 more, and counting




Lessons Learned

• Think about ways to get data to come to you.
• Reach far with small levels of investment.
• Be open. Open source works!
• Set aside real money to:
    - Maintain and grow the app and associated scripts
    - Advertise what we have done with it and develop a community
• Develop the app under your own Apple Developer ID
    - Changing it later is painful
• Use an API rather than hard-coding the app to a database
    - More flexible in case others want to contribute data
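The last lesson separates the app from the database with an API. A minimal Python sketch of the server side of that idea follows: uploads are validated against a small JSON contract and then handed to whatever storage backend is plugged in, so the database can change, or a partner app can send data, without touching the phone code. The `TripStore` interface, `InMemoryStore`, `handle_upload`, and the required fields are hypothetical, and wiring the handler to an actual HTTP server is left out.

```python
import json
from typing import Dict, List, Protocol, Tuple


class TripStore(Protocol):
    """Anything that can persist a validated trip (MySQL, SQLite, files, ...)."""

    def save(self, trip: dict) -> int: ...


class InMemoryStore:
    """Toy backend for testing; a real one would write to a database."""

    def __init__(self) -> None:
        self.trips: List[dict] = []

    def save(self, trip: dict) -> int:
        self.trips.append(trip)
        return len(self.trips)


REQUIRED_FIELDS = ("user_id", "points")  # hypothetical minimal contract


def handle_upload(body: bytes, store: TripStore) -> Tuple[int, Dict[str, object]]:
    """Validate a JSON upload, then hand it to the store.

    Returns (http_status, response_body); clients never see the database.
    """
    try:
        trip = json.loads(body)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(trip, dict):
        return 400, {"error": "expected a JSON object"}
    missing = [k for k in REQUIRED_FIELDS if k not in trip]
    if missing:
        return 400, {"error": f"missing fields: {missing}"}
    if not isinstance(trip["points"], list) or not trip["points"]:
        return 400, {"error": "points must be a non-empty list"}
    return 201, {"trip_id": store.save(trip)}
```

Because `handle_upload` depends only on the `TripStore` interface, swapping the in-memory store for a MySQL-backed one changes nothing that the app, or a project rolling its own client, relies on.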

Thanks!

Credits: Lisa Zorn, Billy Charlton, Matt Paul


        elizabeth [at] sfcta [dot] org
         www.sfcta.org/modeling
        www.sfcta.org/cycletracks
          http://github.com/sfcta




Another story: We had a lot of unorganized data collected by a zillion projects, agencies, and consultants… and wanted to make sense of it.
      …so we built this app called CountDracula




CountDracula




     https://github.com/sfcta/CountDracula


Editor's Notes

  • #5 (Route Choice Data Collection) Sources: Web-based stated preference: Sener, Eluru, and Bhat (2008). Route recall: McDonald and Burns (2001); non-GPS methods are costly on a per-record basis and prone to human error in description and translation. Personal GPS: special-purpose GPS units are small and light enough to carry on one's person and can be obtained for less than $100; Meghini et al. (2009) succeeded in gleaning 2,657 bike routes from a personal GPS dataset of 11,000 total trips by 2,434 people. Bicycle GPS: Dill and Gliebe used bicycle-mounted GPS devices to record the bicycle routes of 164 adults in Portland; this eliminated much of the data cleaning required by personal GPS devices, but bikes on transit still had to be cleaned out. Smartphone: Doherty (2009) provided smartphones.
  • #8 (Bay Area Participants) Here is a comparison of the demographics of the participants whose traces survived data processing with the subpopulation of the Bay Area Travel Survey (BATS) that reported a cycling trip in San Francisco. As you can see, the CycleTracks sample is over twice the size of the BATS subsample, even though BATS contained 50,000 households, which illustrates why a representative household sample is not a feasible way to study cycling. But our sample is biased. While the mean ages of the two samples are not significantly different, our sample includes a lower proportion of women (21%, compared to 36% in BATS). We have no population to compare cycling frequency against, but we also suspect that our sample is biased toward frequent cyclists. The bias is a limited problem because we were able to account for it with interaction variables in model estimation.
  • #13 (Post Processing Warranted) Schuessler, Nadine, and Kay W. Axhausen. "Processing Raw Data from Global Positioning Systems Without Additional Information." Transportation Research Record: Journal of the Transportation Research Board, No. 2105, Washington, D.C., 2009, pp. 28-35. http://trb.metapress.com/content/tv306m812140p330/
  • #21 Our code is open source, and there are a number of agencies who have tried their hand at modifying it to their own needs.