A Task-Centered Framework for Computationally Grounded Science Collaborations
Yolanda Gil1, Felix Michel1,2, Varun Ratnakar1, Matheus Hauder2, Christopher Duffy3, Hilary Dugan4, and Paul Hanson4
1 Information Sciences Institute, University of Southern California
2 Department of Software Engineering for Business Information Systems, Technical University of Munich
3 Department of Civil and Environmental Engineering, Penn State University
4 Center for Limnology, University of Wisconsin-Madison
11th IEEE International Conference on eScience
Organic Data Science
http://www.organicdatascience.org/
USC INFORMATION SCIENCES INSTITUTE 2
Motivation, Introduction, Approach, Evaluation, Conclusion
Evolution of the scientific enterprise
single-authorship → co-authorship → large number of co-authors → the community as author
Evolution of the scientific enterprise from [Barabasi, 2005], extended with the ATLAS Detector Project at the Large Hadron Collider [The ATLAS Collaboration, 2012].
USC INFORMATION SCIENCES INSTITUTE 3
Taxonomy of Science Communities
Collaboration types with resources and activities [Bos et al 2007]
Aggregating across distance (loose coupling, often asynchronously)
  Tools (instruments): Shared Instrument (NEON)
  Information (data): Community Data System (PDB)
  Knowledge (new findings): Virtual Learning Community, Virtual Community of Practice (GLEON, VIVO)
Co-creating across distance (requires tighter coupling, often synchronously)
  Tools (instruments): Infrastructure (CSDMS)
  Information (data): Open Community Contribution System (Zooniverse)
  Knowledge (new findings): Distributed Research Center (ENCODE)
USC INFORMATION SCIENCES INSTITUTE 4
Goal: Supporting Distributed Research Activities with Unanticipated Participants Joining Over Time
R1: Multi-disciplinary contributions
R2: Significant coordination
R3: Engaging unanticipated participants
USC INFORMATION SCIENCES INSTITUTE 5
Computationally Grounded Science Collaboration: Layers
1) Workflow creation activities
Implement computational data analysis
Supported by workflow systems
[Diagram: a software component shown as an algorithm black box with Input, Parameter, and Output ports (x1, x2, y1, y2, z, v, a, b). Its metadescription reads "This component uses the X model to generate …." with factor: 20, repeat: 16 times, min: 0.5 units, max: 11.5 units. Labels: Modeling, Analyze, Provenance. Execution records are listed for 2014, 2013, 2012, and 2011, each with Input and Results.]
USC INFORMATION SCIENCES INSTITUTE 6
Computationally Grounded Science Collaboration: Layers
2) Software development activities
Select/develop software
Supported by shared software repositories
[Diagram: an algorithm and its code, with Input, Parameter, and Output ports (x1, x2, y1, y2, z, v, a, b). Its metadescription reads "This component uses the X model to generate …." with factor: 20, repeat: 16 times, min: 0.5 units, max: 11.5 units.]
USC INFORMATION SCIENCES INSTITUTE 7
Computationally Grounded Science Collaboration: Layers
3) Meta-workflow design activities (our focus)
Select problems, strategies, data, models, methods, etc.
Supported by Organic Data Science
[Diagram: a computational workflow shown as a workflow black box with Data and Parameters ports (z, v, a, b) and Models. Its metadescription reads "Model X with data source Y indicates …".]
USC INFORMATION SCIENCES INSTITUTE 8
Computationally Grounded Science Collaboration: Layers
Meta-workflow design activities
Workflow creation activities
Software development activities
USC INFORMATION SCIENCES INSTITUTE 9
Focus of this work
Collaboration that occurs in distributed research activities with unanticipated participants joining over time
Meta-workflow design layer: scientists working together to agree on a problem to solve and a strategy to solve it
Reducing the coordination effort and lowering the barriers to growing the community
USC INFORMATION SCIENCES INSTITUTE 10
Social Design Principles
Selected social principles from [Kraut and Resnick 2012] for building successful online communities that can be applied to Organic Data Science.
A: Starting communities
A1: Carve a niche of interest, scoped in terms of topics, members, activities, and purpose
A2: Relate to competing sites, integrate content
A3: Organize content, people, and activities into subspaces once there is enough activity
A4: Highlight more active tasks
A5: Inactive tasks should have “expected active times”
A6: Create mechanisms to match people to activities
B: Encouraging contributions through motivation
B1: Make it easy to see and track needed contributions
B2: Ask specific people on tasks of interest to them
B3: Simple tasks with challenging goals are easier to comply with
B4: Specify deadlines for tasks, while leaving people in control
B5: Give frequent feedback specific to the goals
…
B10 …
C: Encouraging commitment
C1: Cluster members to help them identify with the community
C2: Give subgroups a name and a tagline
C3: Put subgroups in the context of a larger group
C4: Make community goals and purpose explicit
C5: Interdependent tasks increase commitment and reduce conflict
D: Attracting and engaging newcomers
D1: Members recruiting colleagues is most effective
D2: Appoint people responsible for immediate friendly interactions
D3: Introducing newcomers to members increases interactions
D4: Entry barriers for newcomers help screen for commitment
D5: When small, acknowledge each new member
…
D12 …
USC INFORMATION SCIENCES INSTITUTE 11
Best Practices from Polymath and ENCODE
Selected best practices from the Polymath [Nielsen 2012] project and lessons learned from ENCODE [Encode 2004].
E: Best practices from Polymath
E1: Permanent URLs for posts and comments, so others can refer to them
E2: Appoint a volunteer to summarize periodically
E3: Appoint a volunteer to answer questions from newcomers
E4: Low barrier of entry: make it VERY easy to comment
E5: Advance notice of tasks that are anticipated
E6: Keep few tasks active at any given time, which helps focus
F: Lessons learned from ENCODE
F1: Spine of leadership, including a few leading scientists and 1-2 operational project managers, that resolves complex scientific and social problems and has transparent decision making
F2: Written and publicly accessible rules to transfer work between groups, to assign credit when papers are published, and to present the work
F3: Quality inspection with visibility into intermediate steps
F4: Export of data and results, integration with existing standards
USC INFORMATION SCIENCES INSTITUTE 12
Self-Organization through Dynamic Task Decomposition
USC INFORMATION SCIENCES INSTITUTE 13
Organic Data Science: Contributors
https://github.com/IKCAP/organicdatascience
USC INFORMATION SCIENCES INSTITUTE 14
Organic Data Science
Task-oriented self-organizing on-line communities for open collaboration in science
Organic data science is a novel approach to on-line scientific collaboration that supports:
• Self-organization of communities by enabling any user to specify and decompose tasks
• On-line community support by incorporating social sciences principles and best practices
• An open science process by capturing new kinds of metadata about the collaboration that give necessary context to newcomers
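
The idea that any user can specify and decompose tasks can be pictured with a small tree of task records. The sketch below is illustrative only and is not the framework's data model: the `Task` class, its fields, the `decompose` method, and the user and task names are all assumptions made for this example. It also tallies tasks by their number of ancestors, which is one plausible way to summarize a task hierarchy.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative assumptions."""
    title: str
    owner: str
    participants: set[str] = field(default_factory=set)
    parent: Task | None = field(default=None, repr=False, compare=False)
    subtasks: list[Task] = field(default_factory=list)

    def decompose(self, title: str, owner: str) -> Task:
        """Let any user add a subtask under this task and own it."""
        child = Task(title=title, owner=owner, participants={owner}, parent=self)
        self.subtasks.append(child)
        return child

    def ancestor_count(self) -> int:
        """Count the tasks on the path from this task up to the root."""
        count, node = 0, self.parent
        while node is not None:
            count += 1
            node = node.parent
        return count

    def walk(self):
        """Yield this task and every task below it, depth first."""
        yield self
        for sub in self.subtasks:
            yield from sub.walk()


def tasks_per_ancestor_count(root: Task) -> Counter:
    """Tally how many tasks have 0, 1, 2, ... ancestors."""
    return Counter(task.ancestor_count() for task in root.walk())


# Hypothetical example: three users decompose one top-level task.
root = Task("Study the age of water", owner="alice", participants={"alice"})
data = root.decompose("Collect isotope data", owner="bob")
model = root.decompose("Set up the model", owner="alice")
data.decompose("Clean sensor readings", owner="carol")

print(tasks_per_ancestor_count(root))
```

Keeping a `parent` link on each task makes the ancestor count a simple upward walk, so newcomers can add subtasks anywhere without the tree having to be rebuilt.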
USC INFORMATION SCIENCES INSTITUTE 15
Ongoing Communities
Age of Water (AoW) is a community of hydrologists and limnologists studying the age of water in an ecosystem.
ENIGMA is a consortium for neuroimaging genetics that includes more than 70 institutions collaborating on joint neuroscience studies.
GPF (Geoscience Papers of the Future) is a group of geoscientists publishing a special issue of a journal. All articles include the datasets, software, and workflows used to generate the results in the paper.
ODST (Organic Data Science Training) assigns all new users a set of pre-defined tasks that involve learning aspects of the framework.
ODSF coordinates the development and improvement of the Organic Data Science Framework.
USC INFORMATION SCIENCES INSTITUTE 16
Evolution of the collaboration in the GPF community
❶ The GPF community was seeded with five organizers of the special issue
❷ One of the organizers served as the host for the authors
❸ The authors shared more and more tasks as the collaboration progressed
❹ The thickness of the lines is more pronounced in the final graph
USC INFORMATION SCIENCES INSTITUTE 17
Age of Water Community
[Figure: Task Hierarchy (axes: Number of Ancestors, Number of Tasks) and Social Task Network (node = participant, edge = tasks in common)]
USC INFORMATION SCIENCES INSTITUTE 18
ENIGMA Community
[Figure: Task Hierarchy (axes: Number of Ancestors, Number of Tasks) and Social Task Network (node = participant, edge = tasks in common)]
USC INFORMATION SCIENCES INSTITUTE 19
GPF Community
[Figure: Task Hierarchy (axes: Number of Ancestors, Number of Tasks) and Social Task Network (node = participant, edge = tasks in common)]
USC INFORMATION SCIENCES INSTITUTE 20
ODSF Community
[Figure: Task Hierarchy (axes: Number of Ancestors, Number of Tasks) and Social Task Network (node = participant, edge = tasks in common)]
USC INFORMATION SCIENCES INSTITUTE 21
ODST Community
[Figure: Task Hierarchy (axes: Number of Ancestors, Number of Tasks) and Social Task Network (node = participant, edge = tasks in common)]
No collaboration is needed to complete the ODS training, so only two users are shown.
USC INFORMATION SCIENCES INSTITUTE 22
Task metadata analysis
USC INFORMATION SCIENCES INSTITUTE 23
Conclusions
The Organic Data Science Framework supports collaborations that are distributed research activities with unanticipated participants joining over time:
• Meta-workflow design layer: scientists working together to agree on a problem to solve and a strategy to solve it
• Based on social design principles
• Preliminary data on use in different communities
Future work: evaluation to assess how the framework supports scientific collaboration and whether it increases productivity and community growth.
USC INFORMATION SCIENCES INSTITUTE 24
Thank You
Organic Data Science: http://www.organicdatascience.org/
Development: https://github.com/IKCAP/organicdatascience
Acknowledgments
We gratefully acknowledge funding from the US National Science Foundation under grant IIS-1344272.

Editor's Notes

  • #3 Single-authorship: Galileo, Newton, Darwin, and Einstein published fundamental work as single authors. Co-authorship: Watson and Crick made progress on unscrambling the structure of DNA. Large number of co-authors: the International Human Genome Sequencing Consortium. The community as author: the ATLAS Detector Project at the Large Hadron Collider at CERN.
  • #4 Shared Instruments: for example, NASA and the University of Hawaii share a telescope physically located in Hawaii. Example: NEON (National Ecological Observatory Network). Community Data Systems: based on a geographically distributed community that creates, modifies, and maintains data sets. Example: PDB (Protein Data Bank). Open Community Contribution Systems: the approach focuses on contributing work, not data. Example: Zooniverse. Virtual Communities of Practice: a network of interest, advice, and links to resources in a research area; members do not work on joint projects. Example: GLEON (Global Lake Ecological Observatory Network). Virtual Learning Communities: help educate their participants. Example: VIVO (the VIVO Researcher Network). Distributed Research Centers: profit from synergies through aggregation of resources, talent, and effort. Example: ENCODE (ENCyclopedia of DNA Elements). Community Infrastructure Projects: develop a common infrastructure for a certain domain. Example: CSDMS (Community Surface Dynamics Modeling System).
  • #10 Computationally grounded collaboration occurs at several levels: high-level meta-workflow design to determine what scientific problem to solve and how, workflow creation to select the data and analytic software to be used, and coding activities to implement the software needed. The focus of this work is the first of these, that is, the collaboration that occurs when scientists work together to agree on a problem to solve and a strategy to solve it.
  • #15 Computationally grounded collaboration occurs at several levels: high-level meta-workflow design to determine what scientific problem to solve and how, workflow creation to select the data and analytic software to be used, and coding activities to implement the software needed. The focus of this work is the first of these, that is, the collaboration that occurs when scientists work together to agree on a problem to solve and a strategy to solve it.
  • #16 Several communities are currently using the Organic Data Science framework. The major use of our framework is by a community of hydrologists and limnologists who are studying the age of water (AoW) in an ecosystem. This involves determining the concentrations of water isotopes at different locations as water flows over time. They are the main driver for the development of the Organic Data Science framework, and the AoW site initially included activities that spanned both topics (the science and the framework development). As the community evolved, it was eventually split into two separate sites (AoW and ODSF). Another community is the ENIGMA consortium for neuroimaging genetics. This consortium includes more than 70 institutions that collaborate to do joint neuroscience studies. The institutions keep their data locally, but they all agree on the method and software to be used to analyze their data. They organize themselves into working groups, and each group studies a particular disease (e.g., autism) and cohort (e.g., children). The leads of the consortium are interested in using the Organic Data Science framework to track which institutions participate in which study, the characteristics of their datasets, and the point person in each institution for each particular study. A requirement of this group is that some information needs to remain private to outsiders, and other information can only be shared between each institution and the lead organization. As a result, they have set up two separate sites: ENIGMA-LEADS and ENIGMA-ALL. Both sites are referred to here as ENIGMA. Another community is a group of geoscientists working together to publish a special issue of a journal composed of geoscience papers of the future (GPF). All the articles will follow a similar format in that they explicitly publish all datasets, software, and workflows used to generate the results in the paper. The site is being used to coordinate the activities involved in tracking the status of each paper, and to compare the approaches in different papers. We have also set up a site for training new users of the Organic Data Science framework regardless of their home community (ODST). All new users are given a pre-defined set of tasks, each involving learning about some aspect of the framework. This gives them the ability to use this new system in a practice setting, following one of the social principles described earlier for newcomers. As they practice, they can create their own tasks and add themselves as participants of other tasks.
  • #17 Figure 5 illustrates the evolution of one of the communities (GPF) by showing the social task network at four different points in time. The GPF community was seeded with five organizers of the special issue (3a). The organizers shared different sets of tasks involved in planning the special issue. One of the organizers served as the host for the authors (3b). The authors shared more and more tasks as the collaboration progressed (3c). Eventually, the members of the community shared different amounts of tasks (3d), so the thickness of the lines is more pronounced in the final graph. We describe in [Gil et al 2015b] the evolution of the AoW and ODSF communities, starting from a single site where two distinct subgraphs can be seen in the social task network and later two distinct (but overlapping) communities working in two separate sites.
  • #18 15 different users participating in tasks
  • #19 53 users
  • #20 22 users
  • #24 This paper presented the social aspects of the Organic Data Science framework to support computationally-grounded scientific collaboration focused on meta-workflow design that leads to computational workflows. We discussed the social design principles coming from studies of on-line collaboration that we found relevant to this kind of scientific collaboration. The paper also presented preliminary data on the different communities that are currently using the framework. These data show that the collaborations are active and the communities are growing over time. In future work, we plan to do a formal evaluation to assess how the framework supports scientific collaboration and whether it increases productivity and community growth. We continue to improve and extend the framework based on new requirements and feedback from the different communities.
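
The social task network in the evaluation slides is defined as node = participant and edge = tasks in common. A minimal sketch of how such a network could be derived from task records follows. It assumes that the weight of an edge (the line thickness in the plots) is the number of tasks two participants share; the function name, the input layout, and the sample data are hypothetical and are not taken from the framework.

```python
from collections import Counter
from itertools import combinations


def social_task_network(task_participants):
    """Map each pair of participants to the number of tasks they share.

    task_participants: dict of task name -> iterable of participant names
    (an assumed input layout, chosen for this example).
    """
    edges = Counter()
    for participants in task_participants.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(participants)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical example data: task name -> participants.
tasks = {
    "plan special issue": ["org1", "org2", "org3"],
    "host authors": ["org1", "author1"],
    "track paper status": ["org1", "author1", "author2"],
}

network = social_task_network(tasks)
for (a, b), shared in sorted(network.items()):
    print(f"{a} -- {b}: {shared} task(s) in common")
```

Recomputing the edge counts from snapshots of the task list at different dates would give a sequence of networks comparable to the four GPF snapshots described in the notes.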