
Metrics & Citation for Software (and Data)


A talk about why metrics and citation for software (and data) are important to NSF and the science & engineering community, and what a number of projects are trying to do to improve the situation. Presented at "Workshop on Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution", Washington, DC, 29 Jan 2015

Published in: Science

  1. 1. Metrics & Citation for Software (and Data) Daniel S. Katz & @danielskatz Program Director, Division of Advanced Cyberinfrastructure ( software-and-data) Workshop on Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution, Washington DC, 29 Jan 2015
  2. 2. National Science Foundation • Federal agency created in 1950 “to promote the progress of science; to advance the national health, prosperity, and welfare; to secure the national defense…” • Annual budget of $7.3 billion (FY 2015) • Funds 24 percent of all federally supported basic research at US colleges and universities • In many fields, such as mathematics, computer science, and the social sciences, NSF is the major source of federal funds
  3. 3. NSF organization chart (January 2015) • Office of the Director: France A. Córdova, Director; Deputy Director vacant; Richard Buckius, Chief Operating Officer • National Science Board: Dan E. Arvizu, Chair; Kelvin K. Droegemeier, Vice Chair • Research directorates and Assistant Directors: BIO (James L. Olds), CISE (James F. Kurose), EHR (Joan Ferrini-Mundy), ENG (Pramod P. Khargonekar), GEO (Roger Wakimoto), MPS (Fleming Crim), SBE (Fay L. Cook) • CISE divisions: ACI (Irene Qualters), CCF (Rao Kosaraju), CNS (Keith Marzullo), IIS (Lynne E. Parker) • Staff offices: OGC, ODI, OLPA, OIIA, OIG, BFA, OIRM • National Science Foundation, 4201 Wilson Boulevard, Arlington, Virginia 22230, TEL: 703.292.5111
  4. 4. Advanced Cyberinfrastructure (ACI) Division • Supports and coordinates the development, acquisition, and provision of state-of-the-art cyberinfrastructure resources, tools, and services • Supports forward-looking research and education to expand the future capabilities of cyberinfrastructure • Serves the growing community of scientists and engineers, across all disciplines, whose work relies on the power of advanced computation, data-handling, and networking
  5. 5. Cyberinfrastructure “Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks, to improve research productivity and enable breakthroughs not otherwise possible.” -- Craig Stewart
  6. 6. Software as Infrastructure (diagram: Science, Software, Computing Infrastructure) • Software (including services) is essential for the bulk of science - About half the papers in recent issues of Science were software-intensive projects - Research is becoming dependent upon advances in software - Significant software development is being conducted across NSF: NEON, OOI, NEES, NCN, iPlant, etc. • Wide range of software types: system, applications, modeling, gateways, analysis, algorithms, middleware, libraries • Software is not a one-time effort; it must be sustained • Development, production, and maintenance are people-intensive • Software lifetimes are long compared with hardware lifetimes • Software has under-appreciated value • For software to be sustainable, it must become infrastructure
  7. 7. NSF Software Infrastructure Projects • SSE & SSI (NSF 14-520): cross-NSF, all Directorates participating • 5 rounds of funding, 65 SSEs • 4 rounds of funding, 35 SSIs • 2 rounds of funding, 14 S2I2 conceptualizations • Next SSEs due Feb 2015; next SSIs due June 2015 • See for current projects
  8. 8. SI2 Solicitation and Decision Process • Proposal reviews well -> my role becomes matchmaking – I want to find program officers with funds, and convince them that they should spend their funds on the proposal • Unidisciplinary project (e.g. bioinformatics app) – Work with single program officer, either likes the proposal or not • Multidisciplinary project (e.g., molecular dynamics) – Work with multiple program officers, ... • Omnidisciplinary project (e.g. http, math library) – Try to work with all program officers, often am told “it’s your responsibility” To judge software, need to understand/forecast impact
  9. 9. Measuring Impact – Scenarios 1. Developer of open source physics simulation – Possible metrics • How many downloads? (easiest to measure, least value) • How many contributors? • How many uses? • How many papers cite it? • How many papers that cite it are cited? (hardest to measure, most value) 2. Developer of open source math library – Possible metrics are similar, but citations are less likely – What if users don’t download it? • It’s part of a distro • It’s pre-installed (and optimized) on an HPC system • It’s part of a cloud image • It’s a service • Future impacts – let proposers suggest
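The last two metrics in scenario 1 (papers that cite the software, and citations to those papers) can be sketched as counts over a citation graph. This is only an illustration: the graph, the identifiers `physics-sim` and `paper-1` through `paper-4`, and both function names are hypothetical, and a real measurement would first have to solve the harder problem the slide points at, namely that software is often not cited at all.

```python
from typing import Dict, Set

# Hypothetical citation graph: each key is a work, each value is the set of
# works that cite it. Every identifier here is made up for illustration.
cited_by: Dict[str, Set[str]] = {
    "physics-sim": {"paper-1", "paper-2"},
    "paper-1": {"paper-3", "paper-4"},
    "paper-2": {"paper-4"},
}


def direct_citations(work: str, graph: Dict[str, Set[str]]) -> int:
    """Count the works that cite `work` directly."""
    return len(graph.get(work, set()))


def second_order_citations(work: str, graph: Dict[str, Set[str]]) -> int:
    """Count distinct works that cite a work which itself cites `work`."""
    citing = graph.get(work, set())
    downstream: Set[str] = set()
    for paper in citing:
        downstream |= graph.get(paper, set())
    # Exclude the work itself and works already counted as direct citers.
    return len(downstream - citing - {work})


print(direct_citations("physics-sim", cited_by))        # 2
print(second_order_citations("physics-sim", cited_by))  # 2
```

Counting distinct downstream works, rather than summing per-paper citation counts, avoids double-counting a paper such as `paper-4` that cites two papers which both used the software; either choice is defensible, and this sketch simply picks one.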
  10. 10. ACI Software Cluster Programs • In these programs, ACI works with other NSF units to support projects that lead to software as an element of infrastructure • Issue: amount of software that is infrastructure grows over time, and grows faster than NSF funding Q: How can NSF ensure that software as infrastructure continues to appear, without funding all of it? A: Incentives • The devil is in the details
  11. 11. Other Software Discussions • Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE) – 3 workshops held • Lessons: – Many of the issues in developing sustainable software are social, not technical – Software work is inadequately visible in ways that “count” within the reputation system underlying science
  12. 12. Where We Are • To judge software, need to understand/forecast impact • Q: How can NSF ensure that software as infrastructure continues to appear, without funding all of it? • A: Incentives • Many of the issues in developing sustainable software are social, not technical • Software work is inadequately visible in ways that “count” within the reputation system underlying science Hypothesis: better measurement of contributions can lead to rewards (incentives), leading to career paths, willingness to join communities, leading to more sustainable software
  13. 13. A Problem Credit for finding: Amy Brand, Digital Science
  14. 14. Another Problem Credit for finding: Amy Brand, Digital Science
  15. 15. Last Problem Credit for finding: Amy Brand, Digital Science
  16. 16. Moving Forward - NSF • Recent CISE/ACI & SBE/SES Dear Colleague Letter: Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution (NSF 14-059) – Need well-developed metrics to assess the impact and quality of scientific software and data – Explore new norms and practices for software and data citation and attribution, so that data producers, software and tool developers, and data curators are credited • 6 projects and 3 collaborative workshops funded
  17. 17. Moving Forward - Dan • Products (software, paper, data set) are registered – Credit map (weighted list of contributors: people, products, etc.) is an input – DOI is an output • Figure: example credit map for a paper, with weighted links to Author A, Author B, Paper M, Software X, and Data K (weights shown include 0.2, 0.05, and 0.1)
  18. 18. Moving Forward - Dan – Enables transitive credit [1] • E.g., paper 1 provides 25% credit to software A, and software A provides 10% credit to library X -> library X gets 2.5% credit for paper 1 • Helps developer show: “my tools are important” – Issues: • Social: Trust in person who registers a product • Technological: How [2], Registration system • [1] D. S. Katz, "Transitive Credit as a Means to Address Social and Technological Concerns Stemming from Citation and Attribution of Digital Products," Journal of Open Research Software, v.2(1): e20, 2014. DOI: 10.5334/ • [2] D. S. Katz, A. M. Smith, "Implementing Transitive Credit with JSON-LD," 2nd Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2), 2014. URL: • Figure: two credit maps, one listing Author 1, Paper 4, and Software 12, the other listing Author A, Author B, Paper M, Software X, and Data K
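The 25% × 10% = 2.5% example can be worked through in a few lines. What follows is a minimal sketch, not the registration system or the JSON-LD implementation from the cited papers: the `registry` dictionary, the contributor names other than paper 1, software A, and library X, the shares assigned to them, and the rule that each credit map sums to 1 are all assumptions made for illustration.

```python
from typing import Dict, Tuple

CreditMap = Dict[str, float]

# Hypothetical registry: each registered product maps to its credit map, a
# weighted list of contributors (people or other products).
registry: Dict[str, CreditMap] = {
    "paper-1": {"author-1": 0.75, "software-A": 0.25},
    "software-A": {"developer-1": 0.90, "library-X": 0.10},
    "library-X": {"developer-2": 1.0},
}


def validate(credit_map: CreditMap, tol: float = 1e-9) -> None:
    """Assumed rule for this sketch: the weights in a credit map sum to 1."""
    total = sum(credit_map.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"credit map sums to {total}, expected 1.0")


def transitive_credit(
    product: str,
    reg: Dict[str, CreditMap],
    weight: float = 1.0,
    _path: Tuple[str, ...] = (),
) -> Dict[str, float]:
    """Return the credit for `product` that reaches every direct or indirect contributor."""
    if product in _path:
        raise ValueError(f"cycle in credit maps at {product!r}")
    totals: Dict[str, float] = {}
    for contributor, share in reg.get(product, {}).items():
        credit = weight * share
        totals[contributor] = totals.get(contributor, 0.0) + credit
        if contributor in reg:
            # The contributor is itself a registered product: pass its credit on.
            nested = transitive_credit(contributor, reg, credit, _path + (product,))
            for name, value in nested.items():
                totals[name] = totals.get(name, 0.0) + value
    return totals


for credit_map in registry.values():
    validate(credit_map)

credit = transitive_credit("paper-1", registry)
print(f"library-X share of paper-1: {credit['library-X']:.3f}")
```

Intermediate products keep their own entry in the result (software A still shows 0.25) so that a tool developer can read off a number directly; as a consequence the values in the returned dictionary add up to more than 1.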
  19. 19. Moving Forward – Project CRediT • Goal: develop a contributor role taxonomy to enable greater granularity & transparency around contributions to scholarly published output in science • Comments to & • Rationale, by stakeholder:
       • Publishers: Increase transparency; Reduce author disputes; Simplify process of chasing authors; Identifying peer reviewers
       • Research funders: Supporting grant applications; Understanding impact; Awarding credit; Identifying peer reviewers; Identifying new funding opportunities
       • Researchers: Gaining credit for true contribution; Credit for ‘new’/specific roles; Identify collaborators; Benefit junior reviewers; Reduce authorship politics?
       • Research institutions: Support tenure & appointment; New esteem & credit metrics for staff; Understanding impact
  20. 20. Moving Forward – Project CRediT: roles and descriptions
       • Study conception: Idea; formulation of research question; statement of hypothesis
       • Methodology: Development or design of methodology; creation of models
       • Computation: Programming, software development; designing computer programs; implementation of computer code and supporting algorithms
       • Formal analysis: Application of statistical, mathematical, or other formal techniques to analyze study data
       • Investigation; performed the experiments: Conducting the research and investigation process, specifically performing the experiments
       • Investigation; data/evidence collection: Conducting the research and investigation process, specifically data/evidence collection
       • Resources: Provision of study materials, reagents, patients, laboratory samples, animals, instrumentation, or other analysis tools
       • Data curation: Management activities to annotate (produce metadata) and maintain research data for initial use and later re-use
       • Writing/manuscript preparation; writing the initial draft: Preparation, creation, and/or presentation of published work, specifically writing the initial draft
       • Writing/manuscript preparation; critical review, commentary, or revision: Preparation, creation, and/or presentation of published work, specifically critical review, commentary, or revision
       • Writing/manuscript preparation; visualization/data presentation: Preparation, creation, and/or presentation of published work, specifically visualization/data presentation
       • Supervision: Responsibility for supervising research; project orchestration; principal investigator or other lead stakeholder
       • Project administration: Coordination or management of research activities leading to this publication
       • Funding acquisition: Acquisition of the financial support for the project leading to this publication
  21. 21. Moving Forward – Software Discovery Index • NIH workshop, May 2014, within Big Data to Knowledge (BD2K) initiative – • Explored challenges facing the biomedical research community in locating, citing, and reusing biomedical software • Identified fundamental prerequisite for success: an automated, broadly accessible system enabling comprehensive identification of biomedical software. • SDI Objectives: – to assign standard and unambiguous identifiers to reference all software – to track specific metadata features that describe that software – to enable robust querying of all relevant information for users
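The three SDI objectives (unambiguous identifiers, tracked metadata, querying) can be pictured with a toy in-memory index. This is a sketch under stated assumptions, not the system the NIH workshop proposed: the `SoftwareRecord` fields, the `sw:` identifier prefix, the use of random UUIDs, and the example tool names are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SoftwareRecord:
    """Metadata tracked for one registered piece of software (fields are illustrative)."""
    identifier: str
    name: str
    version: str
    keywords: List[str] = field(default_factory=list)


class SoftwareIndex:
    """Toy index: assigns identifiers, stores metadata, answers keyword queries."""

    def __init__(self) -> None:
        self._records: Dict[str, SoftwareRecord] = {}

    def register(self, name: str, version: str, keywords: List[str]) -> str:
        # Objective 1: a standard, unambiguous identifier for each entry.
        identifier = f"sw:{uuid.uuid4()}"
        # Objective 2: metadata describing the software is stored with it.
        self._records[identifier] = SoftwareRecord(
            identifier, name, version, [k.lower() for k in keywords]
        )
        return identifier

    def get(self, identifier: str) -> SoftwareRecord:
        return self._records[identifier]

    def search(self, keyword: str) -> List[SoftwareRecord]:
        # Objective 3: query the index by a metadata feature.
        wanted = keyword.lower()
        return [r for r in self._records.values() if wanted in r.keywords]


index = SoftwareIndex()
aligner_id = index.register("example-aligner", "1.2.0", ["alignment", "genomics"])
index.register("example-viewer", "0.9", ["visualization", "genomics"])

print(index.get(aligner_id).name)
print(sorted(r.name for r in index.search("genomics")))
```

Registering each version separately gives every release its own identifier, which is one way to approach the versioning and provenance questions raised later in the talk; a production index would also need persistence and de-duplication, which this sketch leaves out.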
  22. 22. Moving Forward – Software Discovery Index • Complementary with BD2K Data Discovery Index (DDI) • Research Resource Identifiers (RRIDs) as prototype? • Note strong biomedical focus of SDI and DDI: initial case or limiting? • Data vs. Software Characteristics (table comparing data and software on each issue): storage-limited; number of {datasets | software}; complex metadata; cited consistently and effectively; consistently accessible long-term; dependent on other data and software (Credit: Chris Wellington & Vivien Bonazzi, NIH)
  23. 23. Moving Forward - Scholarly Contributions Workshop & FORCE11 • FORCE11 – Open community aiming to improve future research communication and e-Scholarship • Scholarly Communications Workshop @ FORCE2015, Oxford, UK, Jan 11 2015 • Goals: 1. Develop a collaborative, interdisciplinary group to technically implement a scholarly contribution roles ontology in the context of VIVO-ISF 2. Build a skeleton of scholarly products and the contribution roles that people have towards each 3. Plan technical next steps and develop a proposal to fund this work • Interest led to the FORCE11 Attribution working group – Webpage: – Mailing List:
  24. 24. Moving Forward - Community • Lots of challenges remain – within and across projects • Career paths – Is there a role for non-tenure-track researchers who produce software, data, etc. in universities? – Assuming yes, do universities recognize and support this? If not, how to get them to? • What is needed to support reproducibility of science, in terms of data and software? • Versioning & provenance • Lots of entities with similar interests in both software and data, e.g. JISC, RCUK, NIH, DOE, Sloan & Moore, Mozilla, Apache – Identifier work from Zenodo/GitHub, DataCite, CrossRef, VIVO, ... • Need institutional buy-in, incorporation in researcher profiles • Publisher involvement is essential – Software papers vs software? • Future of Google Scholar? • Continued participation in WSSSPE invited, leading to actions • Other ideas and questions are welcome, now or later – or
  25. 25. Resources • NSF Software as Infrastructure Vision: • Implementation of NSF Software Vision: • Software Infrastructure for Sustained Innovation (SI2) Program – Scientific Software Elements (SSE) & Scientific Software Integration (SSI) solicitation: – 2013 PI meeting: – 2014 PI meeting: – Awards: • Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE) – Home: (includes links to all slides & papers) – 1st workshop paper: – 2nd workshop site: • NSF 14-059: “Dear Colleague Letter - Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution” – • Transitive Credit Papers – – • Project CRediT: • NIH Software Discovery Index: • FORCE11: – Attribution Working Group:
  26. 26. Credits: • SI2 Program: – Current program officers: Daniel S. Katz, Rudolf Eigenmann, William Y. B. Chang, John C. Cherniavsky, Almadena Y. Chtchelkanova, Cheryl L. Eavey, Evelyn Goldfield, Sol Greenspan, Daryl W. Hess, Peter H. McCartney, Bogdan Mihaila, Dimitrios V. Papavassiliou, Andrew D. Pollington, Barbara Ransom, Thomas Russell, Massimo Ruzzene, Nigel A. Sharp, Paul Werbos, Eva Zanzerkia – Formerly-involved program officers: Manish Parashar, Gabrielle Allen, Sumanta Acharya, Eduardo Misawa, Jean Cottam-Allen, Thomas Siegmund • WSSSPE: – Organizers: Daniel S. Katz, Gabrielle Allen, Neil Chue Hong, Karen Cranston, Manish Parashar, David Proctor, Matthew Turk, Colin C. Venters, Nancy Wilkins-Diehr – WSSSPE1 summary paper authors: Daniel S. Katz, Sou-Cheng T. Choi, Hilmar Lapp, Ketan Maheshwari, Frank Löffler, Matthew Turk, Marcus D. Hanwell, Nancy Wilkins-Diehr, James Hetherington, James Howison, Shel Swenson, Gabrielle D. Allen, Anne C. Elster, Bruce Berriman, Colin Venters – Keynote speakers: Phil Bourne, Arfon Smith, Kaitlin Thaney, Neil Chue Hong • Project CRediT – Leads: Liz Allen, Amy Brand, full group at • NIH Software Discovery Index – • Force11 community –