Little eScience
Andrea Wiggins
June 18, 2009
Overview

• Background


• Exposition: Sociology of Science


  • Broad generalizations about science


• Example: FLOSS Research


  • Little science context for eScience research


• Expectations: What next?


                                           http://www.flickr.com/photos/pmtorrone/304696349/
My Background

• BA: Maths with economics


• Nonprofit & IT industry work


  • Adult literacy, nonprofit management support,
    professional theatre


  • Web analytics


• MSI: Human-computer interaction,
  complex systems & network science


• PhD: Information science & technology
Science

• Systematic investigation for the production of knowledge


  • Scientific method emphasizes reproducibility


  • Not all phenomena are reproducible...


• Many categories


  • Experimental, applied, social, etc.


  • Categories are not mutually exclusive


                                            http://www.flickr.com/photos/radiorover/419414206/
Paradigms & Revolutions

• Kuhn - paradigms are the laws, theories, applications & instrumentation
  that create coherent traditions of scientific research


• Paradigms help us direct
  our research, but limit
  our view of the world


• New technologies can
  lead to scientific revolutions
  by revealing anomalies




                                           http://www.flickr.com/photos/weichbrodt/644302381/
Normal Science

• Kuhn - “normal science” is research based on broadly accepted scientific
  paradigms


• Shared paradigms are based on rules
  and standards for scientific practice


• Key requirement: agreement on
  focus and conduct of research


  • ∃(Grand Challenges)|Discipline




                                         http://www.flickr.com/photos/themadlolscientist/2421152973/
Big Science

• de Solla Price - “Big Science” is...


   • Inherently paradigmatic


   • Always normal science


• Produces detailed insights into
  the minutiae of phenomena
  studied in the paradigm




                                         http://www.flickr.com/photos/31333486@N00/1883498062/
Pre-paradigmatic Science

• Paradigms require agreement on...


  • Epistemology


  • Ontology


  • Methodology


• Most social sciences are pre-paradigmatic


  • Primarily exploratory research


  • Very little replication

                                            http://www.flickr.com/photos/askpang/327577395/
Little Science

• de Solla Price - “Little Science” is a
  romanticized precursor to Big Science,
  featuring lone, long-haired geniuses
  misunderstood by society, etc.


• If it’s not Big Science, it’s Little Science


   • Pre-paradigmatic and fraught with ambiguity


   • Often fundamentally exploratory


   • Epistemological/theoretical/methodological
     divergence among researchers
                                                 http://www.flickr.com/photos/mrjoax/2548045246/
Social Science

• Social science is real science: the goal is systematic knowledge production


• Focuses on the study of the social life of human groups and individuals


• IMHO, fundamentally more difficult than
  “hard” sciences due to infinite
  complexity of social phenomena


• Replicability is a major challenge
  with respect to scientific method


• Not all social science can or should
  aspire to replicability
                                            http://www.flickr.com/photos/smiteme/2379629501/
Normalizing Science

• Becoming a normal science requires community and convergence


  • ∃(community) != ∃(agreement)


• Establishing grand challenges and
  agreeing on methods are primary
  tasks of normalizing


• Resistance to change is pervasive




                                      http://www.flickr.com/photos/9036026@N08/2949211479/
Scientific Collaboration

• Collaboration requires a common focus, and often a shared epistemology and ontology as well


• Challenging enough in normal sciences


• Harder in pre-paradigmatic research


• Economics: systemic disincentives to
  collaborate, versus potential benefits
  and ideals of science




                                          http://www.flickr.com/photos/richardsummers/542738965/
Big Science Collaboration

• LHC, CERN, etc.


  • Thousands of collaborators


  • Complex but coordinated,
    at least somewhat centralized


• Requires shared goals and resources,
  plus (lots of) communication


  • Only happens in normal sciences


                                         http://www.flickr.com/photos/8767020@N08/531355152/
Little Science Collaboration

• A professor & a grad student, give or take


   • Localized goals and resources


      • -> localized research practices


• Small research teams


   • Fundamentally difficult to achieve the
     consensus that larger groups require


   • Restricts the ability to obtain funding
     and undertake ambitious projects
                                               http://www.flickr.com/photos/lamazone/2735939345/
Scientific Collaboration Requirements

• Shared goals


  • Establishes focus of research


• Shared research resources


  • Both social and artifactual


  • Social aspects include
    training and community
    socialization

                                     we can has share?
                                    http://www.flickr.com/photos/ryanr/142455033/
Historical Research Artifacts

• Letters, Books, Journals, Lectures


• Also technologies: methods, instrumentation


• Sharing?


   • Recordkeeping is not always
     a researcher’s main priority


   • Without records, there’s not
     much to share except the
     research outputs

                                          http://www.flickr.com/photos/smailtronic/1535870363/
Today’s Research Artifacts

• Large scale datasets, scripts, software, workflows, papers, images, video,
  audio, annotations, ephemera, web sites...


   • “Research objects” -
     bundling all the pieces together


   • Hybrids of boundary objects
     and touchstones


• Technologies -> scientific revolution!


   • Open science

                                           http://www.flickr.com/photos/smiteme/2379630899/
Example: FLOSS Research

• Phenomenological & interdisciplinary


  • Software engineering,
    Information Systems,
    Anthropology,
    Sociology,
    CSCW,
    etc...


• Ethos


  • (Idealistic) combination
    of open source values
    and scientific values
                                     http://www.flickr.com/photos/themadlolscientist/2542236565/
FLOSS Phenomenon

• Free/Libre Open Source Software
  “Free as in speech, free as in beer” - liberty versus cost



  • Distributed collaboration
    to develop software


  • Volunteers and sponsored
    developers


  • Community-based model
    of development



                                                              http://www.flickr.com/photos/prawnwarp/541526661/
Typical FLOSS Research Topics

• Coordination and collaboration


• Growth and evolution (social and code)


• Code quality


• Business models and firm involvement


• Motivation, leadership, success


• Culture and community


• Intellectual property and copyright

                                          http://www.flickr.com/photos/eean/519258881/
What we study @ SU

• Social aspects of FLOSS


  • What practices make some distributed work teams more effective than
    others?


  • How are these practices developed?


  • What are the dynamics through which self-organizing distributed teams
    develop and work?
Sharing FLOSS Research Artifacts

• Community: Small but growing, maybe around 400 researchers worldwide,
  with lively face-to-face interaction but relatively low listserv activity


• Data: Lots of it, and readily available, though often difficult to use for several
  reasons


• Analyses and tools: Not quite as
  easy to get, but there if you can
  find them


• Papers: Repositories are as yet
  underdeveloped, but efforts are
  underway
                                          http://www.flickr.com/photos/12698507@N08/2762563631/
FLOSS Research Community

• Handful of small research groups, mostly in the UK & Europe


   • Most often found in Software Engineering departments


• International conferences
  targeted to academics,
  developers, or both


   • OSS, ICSE, FOSDEM, etc.


• IFIP WG 2.13 (Open Source Software)


                                          http://www.flickr.com/photos/steevithak/2883218362/
FLOSS Research Data

• Data sources include interviews, surveys, and ethnographic fieldwork


• Digital “trace” data: archival, secondary,
  by-product of work, easy to collect but hard to use


• Repositories


   • Hosting “forges” like SourceForge,
     Freshmeat, RubyForge, etc.


• RoRs: Repositories of Repositories


   • Data sources for research
We Built It...

• Motivations


  • Stop hammering forge servers, getting entire campus IPs blocked...


  • Stop reinventing the wheel!


• Adoption


  • Shared data sources
    seeing increasing use


  • Next step is harder:
    sharing tools and workflows
                                          http://www.flickr.com/photos/circulating/997909242/
RoRs: FLOSSmole

• Multiple PIs @ Syracuse, Elon, & Carnegie Mellon

  • One grad student @ SU (me), a couple of undergrads @ Elon



• Public access to 300+ GB data on




  • 300K+ projects from 8 repositories



  • Flat files & SQL datamarts



  • Released via SourceForge & Google Code




• 5 TB allotment on TeraGrid @ SDSC                        
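The bullets above mention releasing data as flat files and SQL datamarts. As a rough sketch of how a researcher might move a tab-delimited flat file into a local SQL table for analysis, the Python below uses only the standard library. The column names in the example (`proj_unixname`, `datasource_id`, `developer_count`) and the table name `projects` are hypothetical placeholders, not FLOSSmole's actual schema; every column is stored as TEXT to keep the sketch schema-agnostic.

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_flat_file(conn: sqlite3.Connection, table: str, stream: TextIO) -> int:
    """Load a tab-delimited flat file (header row first) into a new SQLite table.

    All columns are stored as TEXT. `table` and the header names are assumed
    to come from a trusted source, since they are interpolated into the SQL.
    Returns the number of data rows inserted.
    """
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip any blank lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hypothetical sample: column names and values are illustrative only.
sample = io.StringIO(
    "proj_unixname\tdatasource_id\tdeveloper_count\n"
    "project_a\t1\t3\n"
    "project_b\t1\t5\n"
)
conn = sqlite3.connect(":memory:")
inserted = load_flat_file(conn, "projects", sample)
print(inserted)  # 2
print(conn.execute(
    "SELECT proj_unixname FROM projects WHERE CAST(developer_count AS INTEGER) > 4"
).fetchall())  # [('project_b',)]
```

Storing everything as TEXT and casting at query time keeps the loader independent of any particular file's column types, at the cost of slightly clumsier queries.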
RoRs: FLOSSmetrics

• Produced by LibreSoft with academic and corporate partners


• Public access to data for 2800+ projects


• Analyzed & raw data from CVS, email, trackers


• Tools for:


   • calculating code metrics


   • parsing trackers


   • parsing email lists
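FLOSSmetrics provides its own tools for parsing email lists. The sketch below is not those tools; it is only a minimal standard-library illustration of the kind of processing involved: counting messages per sender in a mailing list archive stored in mbox format. The archive path and the choice to key on lower-cased email addresses are assumptions made for illustration.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def messages_per_sender(mbox_path: str) -> Counter:
    """Count messages per sender address in an mbox-format list archive."""
    counts: Counter = Counter()
    box = mailbox.mbox(mbox_path, create=False)  # raise if the file is missing
    try:
        for message in box:
            # parseaddr splits 'Name <addr>' into ('Name', 'addr')
            _, address = parseaddr(message.get("From", ""))
            if address:
                counts[address.lower()] += 1  # treat addresses case-insensitively
    finally:
        box.close()
    return counts
```

For example, `messages_per_sender("list-archive.mbox").most_common(10)` would list the ten most active posters in a (hypothetical) archive file. Real list archives often need extra cleaning, such as merging aliases for the same person, before counts like these are meaningful.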
RoRs: SRDA

• SourceForge Research Data Archive


  • One PI @ University of Notre Dame


  • One massive 300 GB+ SQL db of monthly dumps from SourceForge


     • Original obtuse structure,
       regular table deprecation,
       some documentation


  • Gated access: researchers only,
    as a condition of data release from SourceForge
RoRs: Emerging Sources

• Ultimate Debian Database (UDD)


  • 300 MB compressed Postgres DB,
    produced by Debian community


  • Planning to add to FLOSSmole
FLOSS Research Analyses

• When available...


   • Bespoke Scripts


   • Taverna workflows
FLOSS Research Papers

• First, there was opensource.mit.edu


   • They no longer maintain it, and gave us the data


• Work-in-progress working papers
  repository at FLOSSpapers.org


• Essential viability problem is that
  repositories require long-term
  stewardship...


   • ...which requires long-term
     commitments of funding and
     personnel, not just volunteers
FLOSS Research Collaboration

• Multiple partners involved in producing FLOSSmole & FLOSSmetrics


• Federated data sources by choice,
  starting to develop ontologies


• As yet, a Little Science domain


   • Cross-institutional collaboration
     poses many challenges


   • Usual difficulties magnified by
     general lack of resources, both
     financial and human
Latest Initiatives

• Resource-oriented


  • Expanding resources: data, research artifacts, and pedagogical materials


  • DOIs: 10.4118/*


  • Semantic data
    interoperability


• Community-oriented


  • FLOSShub.org
Evangelizing eScience

• Made presentations at OSS conferences: well received, but hard to make
  converts for several reasons


• Tried to get other research group members to use Taverna: learning overhead
  is too high for most


• Submitted a paper on eScience
  to an IS conference: rejected
  because eScience was too unfamiliar
  a topic for reviewers to evaluate
  adequately


• Currently just doing our work this
  way, as an exemplar
                                            http://www.flickr.com/photos/naezmi/2418745377/
Barriers to Uptake

• Lack of agreement in research focus, theory, methods; researcher isolation


• Bimodal distribution of requisite skills


   • “I can’t possibly do that! I can’t code!”


   • “Why bother? I can code my own.
     You should too; just use Python.”
     “Overheard” on Twitter:

     Friend #1: i HATE that openoffice automatically took
     over my "open with..." defaults.

     Friend #2: @Friend #1 <opensourcedeveloper> If you
     don't like it, then why don't you submit code to change
     the behavior!? </opensourcedeveloper>
                                                               http://www.flickr.com/photos/noner/1739876378/
What I had to learn to get this far

• Taverna


• A lot more Unix terminal & XML


• Relational DB management & SQL


• More R, plus packages and
  dependency management


• Java & Eclipse - just enough to
  write my own Beanshell scripts


• SVN & SSH


• A little bit of OWL, RDF, & SPARQL


• I would not have taken this on if I
  had known what was in store, but
  once I got started, I was hooked
                                        http://www.flickr.com/photos/sashala/292868436/
Sociotechnical Engineering

• Tools are part of the solution, thanks to brilliant CS and SE people


• Social elements are the true barrier


   • Awareness of methods and
     benefits


   • Incentive systems


   • Resistance to change
     (paradigms again)


   • Proof of concept is difficult
                                              http://www.flickr.com/photos/pinprick/3117108495/
Using Taverna for Little eScience

• Implementing analysis is usually easy


• Data handling is almost always hard


   • All data are in SQL databases, with consistent IDs


   • Lots of data manipulation is required


• Avoiding web services as much as possible


   • Infrastructure and resources are limited


   • Benefit is truly questionable: AFAIK, I am 50% of the user base...
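Because all of the data sit in SQL databases with consistent project IDs, much of the data handling reduces to joins and aggregations keyed on those IDs. A minimal sketch using Python's built-in `sqlite3` module follows; the table names (`projects`, `downloads`), their columns, and the rows are hypothetical placeholders, not the layout of any real FLOSS data source.

```python
import sqlite3


def total_downloads_by_project(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Join project metadata to daily download counts via the shared project ID."""
    query = """
        SELECT p.unixname, SUM(d.downloads) AS total
        FROM projects AS p
        JOIN downloads AS d ON d.proj_id = p.proj_id
        GROUP BY p.proj_id, p.unixname
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


# Hypothetical toy data: two projects and a few days of download counts.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (proj_id INTEGER PRIMARY KEY, unixname TEXT);
    CREATE TABLE downloads (proj_id INTEGER, day TEXT, downloads INTEGER);
    INSERT INTO projects VALUES (1, 'project_a'), (2, 'project_b');
    INSERT INTO downloads VALUES
        (1, '2007-01-01', 120), (1, '2007-01-02', 80),
        (2, '2007-01-01', 300), (2, '2007-01-02', 50);
    """
)
print(total_downloads_by_project(conn))  # [('project_b', 350), ('project_a', 200)]
```

Keeping this kind of step in SQL, rather than behind a web service, fits the constraint noted above that infrastructure and resources are limited.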
Example: Our Recent Research

• Estimating user base and potential user interest in FLOSS projects


   • Based on common release-and-download patterns


   • Proxy for project success, a common dependent variable

[Figure: schematic of downloads across Version 0.5, Version 0.6, and Version 0.7; area under curve is active users updating; annotations mark potential user experimentation growth (good publicity?) and active user base growth]
[Figure: downloads (1000 to 5000) from Oct-2005 to Apr-2007, plotting the measures user_base and baseline]
“Normal” Download-Release Patterns
[Figure: BibDesk downloads around releases 1.3.2-RC1 and 1.5.0, annotated “+2 presentations” and two “?” marks]
Taverna’s Download-Release Patterns
[Figure: Taverna downloads, annotated “External effects!”]
Taverna’s Estimated Baseline & User Base
[Figure: 14 day baseline & drop-off]
Taverna’s Estimated Baseline & User Base
[Figure: 7 day baseline & drop-off]
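These slides estimate a baseline and a user base from download spikes around releases, using 14-day and 7-day windows. The exact procedure is not spelled out here, so the Python below is only a sketch under stated assumptions: the baseline is the median daily download count over the `window` days before a release, and the user base is the area between the download curve and that baseline over the `window` days starting on the release day (the drop-off period), counting only days above baseline. The function name and these rules are illustrative, not the authors' published method.

```python
from statistics import median
from typing import Sequence, Tuple


def estimate_user_base(
    daily_downloads: Sequence[int], release_day: int, window: int = 14
) -> Tuple[float, float]:
    """Return (baseline, user_base) for one release.

    ASSUMPTIONS (illustrative only):
    - baseline = median daily downloads over the `window` days before release;
    - user_base = sum of downloads above baseline over the `window` days
      starting on the release day (the drop-off period). If the series ends
      early, only the available days are used.
    """
    if release_day < window:
        raise ValueError("need at least `window` days of history before the release")
    before = daily_downloads[release_day - window:release_day]
    after = daily_downloads[release_day:release_day + window]
    baseline = median(before)
    # Area under the curve above the baseline; days below baseline count as zero.
    user_base = sum(max(d - baseline, 0) for d in after)
    return baseline, user_base


# Toy series: a flat 10/day, then a release on day 14 followed by a decaying spike.
daily = [10] * 14 + [100, 60, 30, 15, 10, 10, 10]
print(estimate_user_base(daily, release_day=14, window=7))   # (10, 165)
print(estimate_user_base(daily, release_day=14, window=14))  # (10.0, 165.0)
```

Comparing the 7-day and 14-day results for the same release gives a rough sense of how sensitive such an estimate is to the window choice, which matters for projects like Taverna whose downloads also respond to external events such as presentations and tutorials.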
Interpretation

• Taverna is not a “normal” open source project


  • Speaking tours, tutorials, articles, and other events influence downloads


• What this demonstrates...


  • Care is needed with quantitative measures


  • Not all open source projects are the same


  • Taverna users are just as reactive as any


                                           http://www.flickr.com/photos/pagedooley/2121472112/
Where next?

• Adoption is a long-term agenda, as changing social practices doesn’t happen
  overnight


• For FLOSS research and our disciplinary communities


  • We will keep doing our work this way,
    and hope to draw in others

    “Won’t you come out and play?”




                                              http://www.flickr.com/photos/atiq/2658884520/
Thanks!

• Credits where they are due


  • Kevin Crowston, my advisor




  • James Howison, my collaborator




  • Everett Wiggins, my husband

Little eScience

  • 1.
  • 2.
    Overview • Background • Exposition:Sociology of Science • Broad generalizations about science • Example: FLOSS Research • Little science context for eScience research • Expectations: What next? http://www.flickr.com/photos/pmtorrone/304696349/
  • 3.
    My Background • BA:Maths with economics • Nonprofit & IT industry work • Adult literacy, nonprofit management support, professional theatre • Web analytics • MSI: Human-computer interaction, complex systems & network science • PhD: Information science & technology
  • 4.
    Science • Systematic investigationfor the production of knowledge • Scientific method emphasizes reproducibility • Not all phenomena are reproducible... • Many categories • Experimental, applied, social, etc. • Categories are not mutually exclusive http://www.flickr.com/photos/radiorover/419414206/
  • 5.
    Paradigms & Revolutions •Kuhn - Laws, theories, applications & instrumentation that create coherent traditions of scientific research • Paradigms help us direct our research, but limit our view of the world • New technologies can lead to scientific revolutions by revealing anomalies http://www.flickr.com/photos/weichbrodt/644302381/
  • 6.
    Normal Science • Kuhn- “normal science” is research based on broadly accepted scientific paradigms • Shared paradigms are based on rules and standards for scientific practice • Key requirement: agreement on focus and conduct of research • Ǝ(Grand Challenges)|Discipline http://www.flickr.com/photos/themadlolscientist/2421152973/
  • 7.
    Big Science • deSolla Price - “Big Science” is... • Inherently paradigmatic • Always normal science • Produces detailed insights into the minutiae of phenomena studied in the paradigm http://www.flickr.com/photos/31333486@N00/1883498062/
  • 8.
    Pre-paradigmatic Science • Paradigmsrequire agreement on... • Epistemology • Ontology • Methodology • Most social sciences are pre-paradigmatic • Primarily exploratory research • Very little replication http://www.flickr.com/photos/askpang/327577395/
  • 9.
    Little Science • deSolla Price - “Little Science” is a romanticized precursor to Big Science, featuring lone, long-haired geniuses misunderstood by society, etc. • If it’s not Big Science, it’s Little Science • Pre-paradigmatic and fraught with ambiguity • Often fundamentally exploratory • Epistemological/theoretical/methodological divergence among researchers http://www.flickr.com/photos/mrjoax/2548045246/
  • 10.
    Social Science • Socialscience is real science: the goal is systematic knowledge production • Focuses on the study of the social life of human groups and individuals • IMHO, fundamentally more difficult than “hard” sciences due to infinite complexity of social phenomena • Replicability is a major challenge with respect to scientific method • Not all social science can or should aspire to replicability http://www.flickr.com/photos/smiteme/2379629501/
  • 11.
    Normalizing Science • Becominga normal science requires community and convergence • Ǝ(community) != Ǝ(agreement) • Establishing grand challenges and methods are primary tasks of normalizing • Resistance to change is pervasive http://www.flickr.com/photos/9036026@N08/2949211479/
  • 12.
    Scientific Collaboration • Collaborationrequires common focus, if not also epistemology and ontology • Challenging enough in normal sciences • Harder in pre-paradigmatic research • Economics: systemic disincentives to collaborate, versus potential benefits and ideals of science http://www.flickr.com/photos/richardsummers/542738965/
  • 13.
    Big Science Collaboration •LHC, CERN, etc. • Thousands of collaborators • Complex but coordinated, at least somewhat centralized • Requires shared goals and resources, plus (lots of) communication • Only happens in normal sciences http://www.flickr.com/photos/8767020@N08/531355152/
  • 14.
    Little Science Collaboration •A Professor & a grad student, give or take • Localized goals and resources • -> localized research practices • Small research teams • Fundamentally difficult to achieve consensus that allows larger groups • Restricts the ability to obtain funding and undertake ambitious projects http://www.flickr.com/photos/lamazone/2735939345/
  • 15.
    Scientific Collaboration Requirements •Shared goals • Establishes focus of research • Shared research resources • Both social and artifactual • Social aspects include training and community socialization we can has share? http://www.flickr.com/photos/ryanr/142455033/
  • 16.
    Historical Research Artifacts •Letters, Books, Journals, Lectures • Also technologies: methods, instrumentation • Sharing? • Recordkeeping is not always a researcher’s main priority • Without records, there’s not much to share except the research outputs http://www.flickr.com/photos/smailtronic/1535870363/
  • 17.
    Today’s Research Artifacts •Large scale datasets, scripts, software, workflows, papers, images, video, audio, annotations, ephemera, web sites... • “Research objects” - bundling all the pieces together • Hybrids of boundary objects and touchstones • Technologies -> scientific revolution! • Open science http://www.flickr.com/photos/smiteme/2379630899/
  • 18.
    Example: FLOSS Research •Phenomenological & interdisciplinary • Software engineering, Information Systems, Anthropology, Sociology, CSCW, etc... • Ethos • (Idealistic) combination of open source values and scientific values http://www.flickr.com/photos/themadlolscientist/2542236565/
  • 19.
    FLOSS Phenomenon • Free/LibreOpen Source Software “Free as in speech, free as in beer” - liberty versus cost • Distributed collaboration to develop software • Volunteers and sponsored developers • Community-based model of development http://www.flickr.com/photos/prawnwarp/541526661/
  • 20.
    Typical FLOSS ResearchTopics • Coordination and collaboration • Growth and evolution (social and code) • Code quality • Business models and firm involvement • Motivation, leadership, success • Culture and community • Intellectual property and copyright http://www.flickr.com/photos/eean/519258881/
  • 21.
    What we study@ SU • Social aspects of FLOSS • What practices make some distributed work teams more effective than others? • How are these practices developed? • What are the dynamics through which self-organizing distributed teams develop and work?
  • 22.
    Sharing FLOSS ResearchArtifacts • Community: Small but growing, maybe around 400 researchers worldwide, with lively face-to-face interaction but relatively low listserv activity • Data: Lots of it, and readily available, though often difficult to use for several reasons • Analyses and tools: Not quite as easy to get, but there if you can find them • Papers: Repositories are as yet underdeveloped, but efforts are underway http://www.flickr.com/photos/12698507@N08/2762563631/
FLOSS Research Community

• A handful of small research groups, mostly in the UK & Europe
• Most often found in software engineering departments
• International conferences targeted at academics, developers, or both
  • OSS, ICSE, FOSDEM, etc.
• IFIP WG 2.13
                                        http://www.flickr.com/photos/steevithak/2883218362/
FLOSS Research Data

• Data sources include interviews, surveys, and ethnographic fieldwork
• Digital “trace” data: archival, secondary, a by-product of work; easy to get but hard to use
• Repositories
  • Hosting “forges” like SourceForge, FreshMeat, RubyForge, etc.
• RoRs: Repositories of Repositories
  • Data sources for research
We Built It...

• Motivations
  • Stop hammering forge servers and getting entire campus IPs blocked...
  • Stop reinventing the wheel!
• Adoption
  • Shared data sources are seeing increasing use
  • The next step is harder: sharing tools and workflows
                                        http://www.flickr.com/photos/circulating/997909242/
RoRs: FLOSSmole

• Multiple PIs @ Syracuse, Elon, & Carnegie Mellon
  • One grad student @ SU (me), a couple of undergrads @ Elon
• Public access to 300+ GB of data on 300K+ projects from 8 repositories
• Flat files & SQL datamarts
• Released via SF & GC
• 5 TB allotment on TeraGrid @ SDSC
RoRs: FLOSSmetrics

• Produced by LibreSoft with academic and corporate partners
• Public access to data for 2800+ projects
• Analyzed & raw data from CVS, email, trackers
• Tools for:
  • calculating code metrics
  • parsing trackers
  • parsing email lists
RoRs: SRDA

• SourceForge Research Data Archive
• One PI @ University of Notre Dame
• One massive 300+ GB SQL db of monthly dumps from SourceForge
• Obtuse original structure, regular table deprecation, some documentation
• Gated access: researchers only, a condition of data release from SF
RoRs: Emerging Sources

• Ultimate Debian Database (UDD)
  • 300 MB compressed Postgres DB, produced by the Debian community
  • Planning to add to FLOSSmole
FLOSS Research Analyses

• When available...
  • Bespoke scripts
  • Taverna workflows
FLOSS Research Papers

• First, there was opensource.mit.edu
  • It is no longer maintained there, and its maintainers gave us the data
• Work-in-progress working papers repository at FLOSSpapers.org
• Essential viability problem: repositories require long-term stewardship...
  • ...which requires long-term commitments of funding and personnel, not just volunteers
FLOSS Research Collaboration

• Multiple partners involved in producing FLOSSmole & FLOSSmetrics
• Federated data sources by choice; starting to develop ontologies
• As yet, a Little Science domain
• Cross-institutional collaboration poses many challenges
  • Usual difficulties are magnified by a general lack of resources, both financial and human
Latest Initiatives

• Resource-oriented
  • Expanding resources: data, research artifacts, and pedagogical materials
  • DOIs: 10.4118/*
  • Semantic data interoperability
• Community-oriented
  • FLOSShub.org
Evangelizing eScience

• Made presentations at OSS conferences: well received, but hard to win converts for several reasons
• Tried to get other research group members to use Taverna: the learning overhead is too high for most
• Submitted a paper on eScience to an IS conference: rejected because reviewers could not adequately evaluate eScience as a topic, as it was too unfamiliar
• Currently just doing our work this way, as an exemplar
                                        http://www.flickr.com/photos/naezmi/2418745377/
Barriers to Uptake

• Lack of agreement on research focus, theory, methods; researcher isolation
• Bimodal distribution of requisite skills
  • “I can’t possibly do that! I can’t code!”
  • “Why bother? I can code my own. You should too; just use Python.”

“Overheard” on Twitter:
Friend #1: i HATE that openoffice automatically took over my "open with..." defaults.
Friend #2: @Friend #1 <opensourcedeveloper> If you don't like it, then why don't you submit code to change the behavior!? </opensourcedeveloper>
                                        http://www.flickr.com/photos/noner/1739876378/
What I Had to Learn to Get This Far

• Taverna
• A little bit of OWL, RDF, & SPARQL
• A lot more Unix terminal & XML
• Relational DB management & SQL
• More R, plus packages and dependency management
• Java & Eclipse: just enough to write my own Beanshells
• SVN & SSH

I would not have taken this on if I had known what was in store, but once I got started, I was hooked.
                                        http://www.flickr.com/photos/sashala/292868436/
Sociotechnical Engineering

• Tools are part of the solution, thanks to brilliant CS and SE people
• Social elements are the true barrier
  • Awareness of methods and benefits
  • Incentive systems
  • Resistance to change (paradigms again)
  • Proof of concept is difficult
                                        http://www.flickr.com/photos/pinprick/3117108495/
Using Taverna for Little eScience

• Implementing analysis is usually easy
• Data handling is almost always hard
  • All data are in SQL databases, with consistent IDs
  • Lots of data manipulation is required
  • Avoiding web services as much as possible
  • Infrastructure and resources are limited
• Benefit is truly questionable: AFAIK, I am 50% of the user base...
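Because the data sources share consistent project IDs, much of the "data manipulation" reduces to SQL joins. A minimal sketch using Python's built-in sqlite3 module, assuming two hypothetical tables (`releases` and `downloads`, which are illustrative and not the actual FLOSSmole, FLOSSmetrics, or SRDA schemas): it joins the tables on `project_id` and totals each release's downloads over the three days starting on its release date.

```python
import sqlite3

# Hypothetical schema and toy rows for illustration only; real datamarts differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE releases  (project_id INTEGER, version TEXT, release_date TEXT);
CREATE TABLE downloads (project_id INTEGER, day TEXT, n_downloads INTEGER);
INSERT INTO releases VALUES
  (1, '0.5', '2009-01-05'),
  (1, '0.6', '2009-01-08');
INSERT INTO downloads VALUES
  (1, '2009-01-04', 10), (1, '2009-01-05', 40), (1, '2009-01-06', 25),
  (1, '2009-01-08', 50), (1, '2009-01-09', 20), (2, '2009-01-05', 7);
""")

# Join on the shared project ID, keeping only downloads in the three-day
# window that starts on each release date (ISO dates compare correctly as text).
query = """
SELECT r.project_id, r.version, SUM(d.n_downloads) AS window_downloads
FROM releases AS r
JOIN downloads AS d
  ON d.project_id = r.project_id
 AND d.day >= r.release_date
 AND d.day <  date(r.release_date, '+3 days')
GROUP BY r.project_id, r.version
ORDER BY r.project_id, r.version
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, '0.5', 65), (1, '0.6', 70)]
```

Project 2 drops out because it has no release rows; the inner join keeps only IDs present in both tables, which is exactly where consistent IDs across sources pay off.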
Example: Our Recent Research

• Estimating user base and potential user interest in FLOSS projects
  • Based on common release-and-download patterns
  • Proxy for project success, a common dependent variable

[Diagram: downloads across releases Version 0.5, 0.6, and 0.7, annotated with potential user experimentation, users updating, active user base growth (good publicity?), and the area under the curve as the active user base]
“Normal” Download-Release Patterns: BibDesk

[Figure: BibDesk downloads, user_base, and baseline measures, Oct 2005 to Apr 2007]
Taverna’s Download-Release Patterns

[Figure: Taverna downloads around releases 1.3.2-RC1 and 1.5.0, with spikes annotated “+2 presentations” and “?”: external effects!]
Taverna’s Estimated Baseline & User Base

[Figure: estimate using a 14-day baseline & drop-off]
Taverna’s Estimated Baseline & User Base

[Figure: estimate using a 7-day baseline & drop-off]
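The slides do not spell out the estimation procedure, so the following is only a sketch under stated assumptions: the baseline is the median daily download count over a fixed window before a release (14 or 7 days on the slides), and the active user base is approximated as the area under the download curve above that baseline, summed from the release day until downloads drop back to the baseline. The function name `estimate_user_base` and the toy series are hypothetical.

```python
from statistics import median


def estimate_user_base(downloads, release_day, window=14):
    """Hypothetical sketch: baseline and excess downloads for one release.

    downloads   -- daily download counts, indexed by day
    release_day -- index of the release day in `downloads`
    window      -- number of pre-release days used for the baseline
    """
    # Assumption: the baseline is the median of the pre-release window.
    pre_release = downloads[max(0, release_day - window):release_day]
    if not pre_release:
        raise ValueError("need at least one pre-release day for a baseline")
    baseline = median(pre_release)

    # Area above the baseline, from the release day until the drop-off,
    # i.e. the first day downloads fall back to the baseline or below.
    excess = 0.0
    for count in downloads[release_day:]:
        if count <= baseline:
            break
        excess += count - baseline
    return baseline, excess


# Toy series (not real data): steady downloads, then a release on day 5.
daily = [10, 12, 11, 9, 10, 60, 45, 30, 18, 11, 10]
print(estimate_user_base(daily, release_day=5, window=5))  # (10, 114.0)
print(estimate_user_base(daily, release_day=5, window=2))  # (9.5, 117.0)
```

Even on this toy series, shortening the baseline window shifts both the baseline and the estimate, which illustrates why the 14-day and 7-day versions need to be compared rather than taken at face value.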
Interpretation

• Taverna is not a “normal” open source project
  • Speaking tours, tutorials, articles, and other events influence downloads
• What this demonstrates...
  • Care is needed with quantitative measures
  • Not all open source projects are the same
  • Taverna users are just as reactive as any
                                        http://www.flickr.com/photos/pagedooley/2121472112/
Where Next?

• Adoption is a long-term agenda, as changing social practices doesn’t happen overnight
  • For FLOSS research and our disciplinary communities
• We will keep doing our work this way, and hope to draw in others

“Won’t you come out and play?”
                                        http://www.flickr.com/photos/atiq/2658884520/
Thanks!

• Credits where they are due
  • Kevin Crowston, my advisor
  • James Howison, my collaborator
  • Everett Wiggins, my husband