Heartbeat: Measuring Active User Base and Potential User Interest in FLOSS Projects
Andrea Wiggins, James Howison & Kevin Crowston
4 June, 2009

Introduction
- Success measures for FLOSS
  - Internal versus external - success according to whom?
  - Software usage is a desirable success measure, but difficult to obtain
- Goal: Develop an algorithm to estimate active user base and general interest based on download counts

Measuring Software Use
- Many ways to measure usage
  - Surveys
  - Usage reporting agents
  - Mining online data (downloads)
- Downloads provide a proxy for usage
  - Must get software before you can use it
  - Usually FLOSS software is downloaded, which can be counted

Problems with Downloads
- Downloads often used as direct proxy for usage, but…
  - Cannot indicate how many downloads “convert” to actual use
  - Regular users are counted multiple times due to release updates
  - Measures inflated by user experimentation
  - Only counts one distribution channel
  - Release rates vary, hard to compare

Hypothesis Development
- Experience-based theory
  - What is the experience of adopting FLOSS for end-user applications?
  - Try it out, adopt it, update it when notified
- H1: There is a relatively constant level of downloads by new users trying out the software
- H2: Regular users respond relatively quickly to new releases

Idealized Release/Download
- Grey area: potential user downloads
- White areas: active user downloads
- Ideally, we would expect that…
  - experimentation rate is nearly constant, growing over time
  - active user base updates after release, growing over time

Data & Analysis
- Daily time series data on package downloads
  - FLOSSmole http://ossmole.sourceforge.net
- Release data for each package
  - SRDA http://zerlot.cse.nd.edu
- Analysis with Taverna http://taverna.sourceforge.net

Descriptive Results - BibDesk
- Spikes following new releases
- Cyclic weekly effects
- “Flat” periods between releases
- Growth over time in both baseline and spikes

Descriptive Results - SkimApp
- Similar overall patterns
- Recently founded, less data
- More rapid release cycle than BibDesk
- In both projects, occasional non-release spikes appear - one-time marketing?

Quantifying User Base
- Calculations based on daily downloads for two one-week observation periods centered around release date
- Potential user base: sum of daily downloads before release
- Active user base: sum of daily downloads after release, less the baseline average download rate
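
The two measures can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `estimate_user_base`, the date-keyed dict input, counting the release day as the first day of the post-release week, and reading "less the baseline average download rate" as subtracting the pre-release daily average from each post-release day are all assumptions made for illustration.

```python
from datetime import date, timedelta


def estimate_user_base(daily_downloads, release_day, window=7):
    """Return (potential_user_base, active_user_base) for one release.

    daily_downloads: dict mapping datetime.date -> download count (assumed shape).
    release_day: datetime.date of the release.
    window: length in days of each observation period (one week in the slides).
    """
    # Pre-release period: the `window` days immediately before the release.
    before = [daily_downloads[release_day - timedelta(days=d)]
              for d in range(1, window + 1)]
    # Post-release period: the release day and the following days (assumption).
    after = [daily_downloads[release_day + timedelta(days=d)]
             for d in range(window)]

    # Potential user base: sum of daily downloads before the release.
    potential = sum(before)
    # Baseline: average daily download rate in the pre-release period.
    baseline_rate = potential / window
    # Active user base: post-release downloads minus the baseline for each day.
    active = sum(after) - baseline_rate * window
    return potential, active


# Synthetic example data (invented for illustration, not from the study).
release = date(2008, 4, 15)
series = {release - timedelta(days=d): 100 for d in range(1, 8)}
extra = [500, 300, 200, 100, 50, 20, 10]
series.update({release + timedelta(days=i): 100 + e for i, e in enumerate(extra)})

potential, active = estimate_user_base(series, release)
print(potential, active)
```

With equal-length windows this reading reduces to the post-release sum minus the pre-release sum, which is one reason the result is sensitive to the chosen window length and to release date accuracy.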

Numerical Results - BibDesk
- Consistent baseline experimentation rate
- Large variance for installed user base
- Further smoothing might help
- User base may be declining in BibDesk, due to small target audience and competition
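
One possible form of the smoothing mentioned above is a simple moving average over the sequence of per-release estimates. The slides do not say which smoothing the authors had in mind, so the technique, the function name `moving_average`, the window of three releases, and the sample values are all assumptions for illustration.

```python
def moving_average(values, window=3):
    """Return the simple moving average of `values`.

    Each output element is the mean of `window` consecutive inputs, so the
    result has len(values) - window + 1 elements.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical per-release active user base estimates (invented values).
estimates = [900, 300, 600, 1200, 300]
print(moving_average(estimates, window=3))
```

A wider window damps the variance more but also hides genuine changes in the user base, so the window size is itself a calibration choice.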

Numerical Results - Skim-app
- Stable baseline, but substantial variance in calculated installed base
- Big spike in April 2008: first release in 3 months
- Overall trends toward growth in both user base and baseline

Discussion - Limitations
- Download data are problematic for a number of reasons
- Calibrating the measures
  - Varying the duration of time periods leads to substantial changes
  - User response rate varies by project
  - Very sensitive to release date accuracy
- Also difficult to sample releases with sufficient time in between for baselines

Discussion - Uses
- Generalizability
  - Assumes swift user response
  - Different cases for end user versus enterprise software, varying market sizes
- Use with caution
  - Examine data for consistent release response patterns
- Either measure can serve as a dependent variable for project popularity

Future Work
- Compare these findings against more dynamically selected time ranges
  - e.g. time required to return to a rate close to the pre-release baseline
- Application to more projects, and comparison against other measures
- Statistical fitting for growth estimates
- May apply to other non-FLOSS downloaded software, e.g. iPhone apps
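
A dynamically selected time range could be sketched as follows: instead of a fixed one-week window, count the days until post-release downloads fall back to a rate close to the pre-release baseline. This is only an illustration of the idea. The function name `days_to_return_to_baseline`, the 10% tolerance used to define "close to", and the sample values are assumptions, not part of the presented work.

```python
def days_to_return_to_baseline(post_release_counts, baseline_rate, tolerance=0.1):
    """Return the index of the first post-release day whose downloads are
    within `tolerance` (as a fraction) above the baseline rate, or None if
    the series never returns to that level.

    post_release_counts: daily download counts, starting at the release day.
    baseline_rate: average daily downloads before the release.
    """
    threshold = baseline_rate * (1 + tolerance)
    for day, count in enumerate(post_release_counts):
        if count <= threshold:
            return day
    return None


# Hypothetical post-release series (invented values), baseline of 100 per day.
counts = [600, 400, 300, 200, 150, 120, 105, 100]
print(days_to_return_to_baseline(counts, baseline_rate=100))
```

The returned day count could then replace the fixed window length when summing post-release downloads for a given release.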

Conclusions
- Introduced a measure for estimating baseline user interest, and one for active user base in FLOSS projects
- Baseline measure shows good face validity in longitudinal time series
- Active user base measure shows surprising variance

Thanks! Questions?
- {awiggins|crowston}@syr.edu, [email_address]
- floss.syr.edu
- flosshub.org
- Background image derived from photo by Vincent Kaczmarek, http://www.flickr.com/photos/kaczmarekvincent/3263200507/
