1. Best practices for DuraMat software dissemination
Anubhav Jain, Silvana Ovaitt, Cliff Hansen, Robert White
April 17, 2023
slides (already) posted to
https://hackingmaterials.lbl.gov
2. DuraMat funds many projects that produce software products –
but currently without many standards or guidance
Project Link
DuraMat data hub https://datahub.duramat.org
PV Analytics https://github.com/pvlib/pvanalytics
PV Ops https://github.com/sandialabs/pvOps
VocMax https://github.com/toddkarin/vocmax
PV Climate Zones https://github.com/toddkarin/pvcz
PVTools https://pvtools.lbl.gov/string-length-calculator
PV ARC thickness estimator https://github.com/DuraMAT/pvarc
PV-terms https://github.com/DuraMAT/pv-terms
Comparative LCOE calculator www.github.com/NREL/PVLCOE
PV-Pro SDM parameter estimation https://github.com/DuraMAT/pvpro
WhatsCracking https://datahub.duramat.org/dataset/whatscracking-application
3. Purpose of this discussion
• To help you share software products effectively, including:
– Sharing best practices in software dissemination
– Saving time and effort in the dissemination process
– Establishing some consistency across projects
– Getting you (and DuraMat) more credit for software products
• This is intended to be a discussion, so comments, questions, and improvements are welcome
– Software best practices often change quickly as new tools are introduced
4. The level of dissemination should depend on the purpose of the software
Code maturity and novelty
• Level 1: Code is mostly data analysis/plots, or using other already published packages. The code is largely intended to demonstrate usage or clarify an analysis. Novelty is low, implementing published ideas.
• Level 2: Code is structured into functions which are intended to serve as a general toolset for other analyses. Code may contain new algorithms that may require a disclosure.
• Level 3: Code is rationally and thoughtfully organized into packages, modules, classes and functions. It may serve as a framework for downstream analyses. Code may contain new algorithms that may require a disclosure.
Intended use and lifetime
• Level 1: Typically, used to support and document published analyses for enhanced reproducibility – e.g., something akin to supporting information for a journal publication.
• Level 2: Typically, may serve as documentation for the innovations of an entire project, e.g., for multiple publications. However, the project may no longer be actively maintained after project end.
• Level 3: The project is intended to be used and maintained long-term by the project team and a community of users; project lives on even if/when initial developers exit the project.
5. Level 1: e.g., “one-off” scripts that support a plot, table, etc. in another document
☐ Follow laboratory-specific guidelines for approval to release your code.
☐ Add inline code documentation. Each public function and class definition should have its own documentation (e.g., a docstring). Use a consistent format (an example is provided later in this document). A docstring should include:
– Function purpose
– Input parameters
– Return values
– References (if any)
☐ Add a README that covers:
– How to install and run the code, as well as its associated tests
– How to cite it (i.e., OSTI record, publication, or other)
– A clear description of the code’s purpose and its organization
☐ Add a LICENSE that conforms to lab and funding guidelines and includes copyright-specific wording (see example at end)
(Bullet points marked with arrows will be discussed in subsequent slides.)
6. Inline code documentation example (Python)
Some notes:
• The formatting of the docstring can depend on whether you are auto-converting the docstrings to HTML documentation
• Common docstring formats include reST (reStructuredText), Google style, Epydoc, etc.
• You can add type hints to further improve code readability and to enable static type checking tools
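A minimal sketch of what such a documented function might look like, using Google-style docstring sections and type hints. The function name, its parameters, and the PV-flavored calculation are hypothetical and chosen only to show the layout (purpose, inputs, return value, references); they are not taken from any DuraMat package.

```python
def performance_ratio(measured_energy_kwh: float, expected_energy_kwh: float) -> float:
    """Compute the ratio of measured to expected energy.

    Hypothetical example function, used only to illustrate docstring layout.

    Args:
        measured_energy_kwh: Energy actually produced over the period, in kWh.
        expected_energy_kwh: Energy predicted by a model for the same period,
            in kWh. Must be greater than zero.

    Returns:
        The dimensionless ratio measured_energy_kwh / expected_energy_kwh.

    Raises:
        ValueError: If expected_energy_kwh is not positive.

    References:
        None for this illustrative function. In real code, cite the
        publication or report that defines the method here.
    """
    if expected_energy_kwh <= 0:
        raise ValueError("expected_energy_kwh must be positive")
    return measured_energy_kwh / expected_energy_kwh


if __name__ == "__main__":
    # The docstring is also available at runtime, e.g. via help().
    print(performance_ratio(80.0, 100.0))
    print(performance_ratio.__doc__.splitlines()[0])
```

The type hints in the signature are what a static checker such as mypy would inspect; the docstring sections are what documentation generators typically convert to HTML.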
9. Choosing a license – some guidance
• Talk to your lab’s IP / IT departments for guidance
• BSD/MIT licenses are examples of very “open” (permissive) licenses that allow others to do what they’d like with the software
– BSD (3-clause) typically gives some more protection against others using your name to promote their product, e.g. “our commercial product uses LBL-approved software technology for its analysis” or “uses the same algorithms developed by the brilliant scientist <your_name_here>”
• Be careful about choosing copyleft licenses that require all downstream code to also use the same license, e.g. GPL (Apache 2.0, by contrast, is a permissive license and does not impose this requirement)
– e.g., if you leave DuraMat and work for a company, you may no longer be able to use your own code, as companies typically avoid any GPL code
10. Level 2: e.g., repository used for the lifetime of a project (software itself is a work product)
☐ All Level 1 items
☐ In addition to lab-specific guidelines, ensure that DOE requirements are being met. For example, this likely includes:
– Software Record (gets recorded in OSTI.gov and helps with reporting and credit)
– Lab-specific approval to release code
☐ Set up a public-facing GitHub repository. This could be hosted by the project organization, by your institution, or by your research lab. Examples include:
– github.com/DURAMAT
– github.com/NREL
☐ Additional README components
– Screenshot or visual aid of the project
– Current status of the project (testing use, production use, actively maintained, etc.)
– Funding information and institutional branding (logo, funding acknowledgement text)
☐ Add a contributor license agreement (CLA) for contributors
☐ Include any examples of use, Jupyter notebooks or scripts used for scientific publications that you want to make available, and data that can also be made available for testing/demonstration
☐ Use a standard layout for the repository (an example of a standard Python layout is provided at the end of this document)
☐ Add a consistent versioning scheme. Examples include semantic versioning (v0.0.1) and date-based versioning (v2023.01.25); tools like versioneer may help.
☐ Ensure your software is easy to install locally, including any necessary dependencies. For example, Python projects may include files such as setup.py or requirements.txt.
☐ Report your software to your funding program so it can be included in accomplishments
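One reason to pick a consistent versioning scheme is that release tags can then be compared mechanically. The sketch below assumes tags of the form shown above (an optional "v" followed by three dot-separated numbers), which covers both the semantic and the date-based examples. The helper names are hypothetical; real projects would more likely rely on a tool such as versioneer or an existing version-parsing library.

```python
import re
from typing import Iterable, Tuple

# Optional leading "v", then three dot-separated integer fields.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as 'v0.0.1' or 'v2023.01.25' into a tuple of ints."""
    match = _VERSION_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"Unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def latest(tags: Iterable[str]) -> str:
    """Return the newest tag, comparing numerically rather than as text."""
    return max(tags, key=parse_version)


if __name__ == "__main__":
    semantic_tags = ["v0.9.0", "v0.10.0", "v0.2.1"]
    # Plain string comparison would rank "v0.9.0" above "v0.10.0";
    # comparing the parsed integer tuples avoids that mistake.
    print("newest semantic tag:", latest(semantic_tags))

    date_tags = ["v2023.01.25", "v2022.12.31", "v2023.04.17"]
    print("newest date-based tag:", latest(date_tags))
```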
11. Example of a contributor license agreement
The following text is at the bottom of the LBL BSD-3 license (https://spdx.org/licenses/BSD-3-Clause-LBNL.html):
You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features,
functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to
make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory
or its contributors, without imposing a separate written license agreement for such Enhancements, then
you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use,
modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense
such enhancements or derivative works thereof, in binary and source code form.
12. Example of a standard project layout for a Python project
• You can look up standard project layouts for the programming language you are using
• Some details of the layout may depend on the tools you are using for other tasks such as code distribution or continuous integration
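As a rough illustration of one commonly seen Python layout, the sketch below creates an empty repository skeleton. Every file and directory name in it (including the package name my_pv_tool) is a placeholder assumption rather than a DuraMat requirement, and the exact layout should follow whatever your packaging and CI tools expect.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

# Hypothetical layout for a package called "my_pv_tool".
LAYOUT = (
    "README.md",               # purpose, install/run instructions, how to cite
    "LICENSE",                 # license text approved by your lab
    "setup.py",                # installation metadata and dependencies
    "requirements.txt",        # pinned dependencies
    "my_pv_tool/__init__.py",  # the importable package itself
    "my_pv_tool/core.py",
    "tests/test_core.py",      # unit tests, kept outside the package
    "docs/index.rst",          # documentation sources
    "examples/README.md",      # notebooks or scripts showing usage
)


def create_skeleton(root: Path, layout: Sequence[str] = LAYOUT) -> List[Path]:
    """Create empty placeholder files for each entry in the layout."""
    created = []
    for relative in layout:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch(exist_ok=True)
        created.append(target)
    return created


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was made.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_skeleton(Path(tmp) / "my_pv_tool_repo"):
            print(path.relative_to(tmp))
```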
13. Example of setup.py for easier installation of your Python-based software
Another example: https://github.com/NREL/PV_ICE/blob/main/setup.py
Such files can help users install the correct dependencies with the correct versions to ensure your software runs smoothly
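A minimal sketch of a setup.py, assuming setuptools is used for packaging. The package name, version, and dependency pins are all placeholders. Most real setup.py files call setup() unconditionally at the top level; the guard used here is a deliberate deviation so that the metadata can be inspected without triggering a build.

```python
import sys

# Every value below is a placeholder for a hypothetical package.
SETUP_KWARGS = dict(
    name="my_pv_tool",
    version="0.1.0",
    description="Hypothetical PV analysis toolkit",
    python_requires=">=3.8",
    # Lower bounds tell installers which dependency versions are acceptable.
    install_requires=[
        "numpy>=1.21",
        "pandas>=1.3",
    ],
    # Optional extras, installed with: pip install "my_pv_tool[test]"
    extras_require={"test": ["pytest"]},
)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only build when a command (e.g. "sdist") is supplied on the command line.
    # setuptools is imported here so the metadata above stays importable
    # even on machines where setuptools is not installed.
    from setuptools import find_packages, setup

    setup(packages=find_packages(exclude=["tests", "docs"]), **SETUP_KWARGS)
```

With a file like this at the repository root, users would typically install the package and its dependencies by running pip install . from that directory.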
14. Level 3: e.g., ongoing project
☐ All Level 1 items
☐ All Level 2 items
☐ Implement a release system. One option is to use GitHub tags and releases. You can obtain a DOI for each release via Zenodo:
– Link the GitHub repo to Zenodo
– Perform the release and tag it
– Update the “how to cite” section of the README to include the DOI that Zenodo provides
☐ Set up continuous integration (CI) tools (examples include GitHub Actions, CircleCI, Travis CI, etc.) to run tests against pull requests
– A code coverage tool (e.g., Coveralls) can help establish how much of the codebase the tests cover and publish test status (pass/fail, test coverage)
☐ Check for consistent code formatting; a linter or formatter (e.g., pylint or black) can be used to check the formatting of pull requests and/or automatically reformat code
☐ Add documentation pages (e.g., HTML documentation). Documentation can be deployed in several places (e.g., GitHub Pages, Read the Docs). Documentation pages should provide:
– Getting started. Provide simple instructions to install the code and run a sample problem. Link to the tutorials here.
– Examples / Tutorials. Link here to illustrations of using the code.
– API reference. Link here to the documentation of each public class, function and/or method. Note that this can typically be auto-generated.
– Release notes. Link here to logs of changes with each tagged release.
☐ Upload to PyPI, conda, or another easy-install package service.
☐ Consider submitting to a code-centric journal such as the Journal of Open Source Software
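To make the CI and coverage items concrete, here is a small sketch of the kind of test file a CI service would run on each pull request. It assumes a pytest-style runner, which discovers functions named test_*; the slides do not name a specific test runner. The function under test, clip_to_range, is hypothetical. The tests are written to exercise every branch of the function, which is what a coverage tool would report on.

```python
def clip_to_range(value: float, low: float, high: float) -> float:
    """Hypothetical library function: limit value to the interval [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return low
    if value > high:
        return high
    return value


def test_value_inside_range_is_unchanged():
    assert clip_to_range(0.5, 0.0, 1.0) == 0.5


def test_values_outside_range_are_clipped():
    assert clip_to_range(-2.0, 0.0, 1.0) == 0.0
    assert clip_to_range(3.0, 0.0, 1.0) == 1.0


def test_invalid_bounds_raise():
    # Covers the error branch, which is easy to leave untested.
    try:
        clip_to_range(0.5, 1.0, 0.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for low > high")


if __name__ == "__main__":
    # A test runner would normally discover and call these automatically.
    for test in (
        test_value_inside_range_is_unchanged,
        test_values_outside_range_are_clipped,
        test_invalid_bounds_raise,
    ):
        test()
    print("all tests passed")
```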
15. Release versions via GitHub, citable via Zenodo
GitHub release
https://github.com/NREL/PV_ICE
Citable DOI via Zenodo
19. Releasing large data sets with code
• Data sets should be formally released into a separate archival repository, such as a project-specific data hub (e.g., DuraMat Data Hub), Figshare, Dryad, etc.
• If there are smaller files that are needed for the code, for example for unit
tests, document and include these datasets in the repository, provided they
have been cleared for release and are not infringing copyright from other
sources or NDAs.
• Do not use links to local files on your computer!
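One way to avoid hard-coded local paths is to locate small bundled data files relative to the code itself. The sketch below assumes the files live in a data/ subdirectory next to the module. That directory name and the helper name are hypothetical.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_data_file(name: Union[str, Path], package_dir: Optional[Path] = None) -> Path:
    """Return the path of a bundled data file inside the package's data/ folder.

    Absolute paths are rejected because they only work on one machine.
    """
    relative = Path(name)
    if relative.is_absolute():
        raise ValueError(f"Use a path relative to the package, not {relative}")
    if package_dir is None:
        # Default: the directory containing this source file.
        package_dir = Path(__file__).resolve().parent
    return Path(package_dir) / "data" / relative
```

A test could then call something like resolve_data_file("sample.csv") instead of referring to a location such as a user's home directory, so the same code works on any machine where the repository is checked out.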