Tales of the Field: Building Small Science Cyberinfrastructure


Presentation for the Society for the Social Studies of Science cyberinfrastructure methods panel, covering experiences building small science cyberinfrastructure and reflecting on implications for other pre-paradigmatic domains.

Published in: Technology, Education

  1. Tales of the Field: Building Small Science Cyberinfrastructure
     Andrea Wiggins, iSchool @ Syracuse University, 31 October 2009
  2. Free/Libre Open Source Software
     • FLOSS development
       • Large-scale social phenomenon of “collaborative” software development
     • Observing FLOSS research
       • Reflexive examination of small scholarly community studying FLOSS development
       • Specifically working on building CI for FLOSS research
     http://www.flickr.com/photos/pmtorrone/304696349/
  3. eScience Proof of Concept
     • (some) FLOSS research is a good candidate for eScience approaches to doing the work
       • Lots of data due to scale of phenomenon
       • Research community ethos of sharing
         • Data repositories
         • Research paper archive
         • Analysis artifacts
  4. FLOSS Research Community
     • Little Science
       • Interdisciplinary: primarily software engineering, but also social sciences across a wide spectrum
       • Fairly small community: under 500 researchers worldwide
     http://www.flickr.com/photos/circulating/997909242/
  5. FLOSS Data
     • Many types of data, focus here on digital “trace” data
       • Archival, secondary
       • By-product of FLOSS work, easy to get but hard to use
     • Federated repositories of repositories (RoRs)
       • Data for research drawn from hosting “forges”
       • ~1 TB across 3 RoRs
     http://www.flickr.com/photos/smiteme/2379630899/
  6. Research Methods & Tools
     • Methods used with RoR data vary, but are generally quantitative
       • Correlational studies
       • Longitudinal analysis
       • Code metrics
     • Two main approaches
       • Bespoke scripts or tools
       • eScience workflow tools
  7. Barriers to Uptake
     • Little Science
       • Lack of agreement over epistemology, RQs, methods, tools
       • Researcher isolation, few incentives to collaborate
     • Bimodal distribution of skills
       • “I can’t possibly do that! I can’t write code!”
       • “Why bother? I just write my own Python script; you should too.”
     http://www.flickr.com/photos/noner/1739876378/
  8. Technology Skills Required
     • Taverna
     • SVN
     • (more) SSH, Unix terminal, XML
     • R, plus packages
     • SQL, relational DB management
     • Java & Eclipse (just enough)
     • OWL, RDF, SPARQL
     • Knowledge of opaque data sources
  9. Implications for Small Sciences
     • Critical mass
       • Need stewardship, dedicated resources
     • Skills gap
       • eScience tools require fairly high technology competency
     • Convergence of research
       • Common questions, modes of research
     • Motivations to contribute
       • Academic credit
     http://www.flickr.com/photos/askpang/327577395/
  10. Potential Solutions
     • $$$
       • Maintaining and developing resources is not free, even if they are freely shared
     • Curricular integration
       • Broaden contributor base by drawing on students through coursework
     • Deliberately cultivate a community
       • Train PhD students early in their studies
     • Mechanisms to incentivize contribution
  11. Conclusions
     • Without external imperatives, CI for little science seems unlikely to emerge unaided
     • CI requires standardization and movement toward normal science, which may be premature or simply inappropriate for many social sciences
     • Benefits for early adopters: tools support efficient collaboration, enable rigorous research provenance, permit analysis replication, and speed time to results