1. Best practices for DuraMat software dissemination
Anubhav Jain, Baojie Li, Silvana Ovaitt,
Cliff Hansen, Robert White
Sept 26, 2023
slides (already) posted to
https://hackingmaterials.lbl.gov
2. Purpose of this discussion
• To help you develop and share software products effectively:
– Sharing best practices in software dissemination
– Saving time and effort in the development and dissemination process
– Establishing some consistency across DuraMat projects
– Getting you (and DuraMat) more credit for software products
• This is intended to be a discussion, so comments/questions/improvements are welcome
– Software best practices often evolve quickly as new tools are introduced
3. DuraMat funds many projects that produce software products –
but currently without many standards or guidance
Project Link
DuraMat data hub https://datahub.duramat.org
PV Analytics https://github.com/pvlib/pvanalytics
PV Ops https://github.com/sandialabs/pvOps
VocMax https://github.com/toddkarin/vocmax
PV Climate Zones https://github.com/toddkarin/pvcz
PVTools https://pvtools.lbl.gov/string-length-calculator
PV ARC thickness estimator https://github.com/DuraMAT/pvarc
PV-terms https://github.com/DuraMAT/pv-terms
Comparative LCOE calculator www.github.com/NREL/PVLCOE
PV-Pro SDM parameter estimation https://github.com/DuraMAT/pvpro
WhatsCracking https://datahub.duramat.org/dataset/whatscracking-application
4. (Table of projects repeated from the previous slide)
+ new projects from recent calls
- Kempe/Ovaitt (Lifetime predictor)
- E. Young (Wind loading)
- Braid (cell crack models)
- Rahman (SIERRA/COMSOL convertor)
5. An online version of this presentation
• We have compiled an online resource for this presentation that you can skim through:
• https://github.com/DuraMAT/software_guide
• A few things in this presentation are new and not yet in the guide
• If you have suggestions, submit a PR to the guide!
6. The level of dissemination should depend on the purpose of the software

Code maturity and novelty
• Level 1: Code is mostly data analysis/plots, or uses other already-published packages. The code is largely intended to demonstrate usage or clarify an analysis. Novelty is low, implementing published ideas.
• Level 2: Code is structured into functions intended to serve as a general toolset for other analyses. Code may contain new algorithms that may require a disclosure.
• Level 3: Code is rationally and thoughtfully organized into packages, modules, classes, and functions. It may serve as a framework for downstream analyses. Code may contain new algorithms that may require a disclosure.

Intended use and lifetime
• Level 1: Typically used to support and document published analyses for enhanced reproducibility – e.g., something akin to supporting information for a journal publication.
• Level 2: May serve as documentation for the innovations of an entire project, e.g., for multiple publications. However, the project may no longer be actively maintained after project end.
• Level 3: The project is intended to be used and maintained long-term by the project team and a community of users; the project lives on even if/when the initial developers exit.
7. Level 1: e.g., "one-off" scripts that support a plot, table, etc. in another document
☐ Follow laboratory-specific guidelines for approval to release your code.
☐ Inline code documentation. Each public function and class definition should have its own documentation (e.g., docstring). Use a consistent format (an example is provided later in this document). A docstring should include:
  ☐ Function purpose
  ☐ Input parameters
  ☐ Return parameters
  ☐ References (if any)
☐ Add a README
  ☐ How to install/run the code as well as associated tests
  ☐ How to cite it (i.e., OSTI record, publication, or other)
  ☐ Does the README clearly describe the code's purpose and its organization?
☐ Add a LICENSE that conforms to lab and funding guidelines and includes copyright-specific wording (see example at end)
(Bullet points marked with arrows will be discussed in subsequent slides.)
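As a hedged illustration of the README items in the checklist above, a minimal skeleton might look like this (the project name and section contents are placeholders, not from the original slides):

```markdown
# my-duramat-tool  <!-- placeholder name -->

One- or two-sentence statement of what the code does and why.

## Installation
Instructions to install the code and its dependencies.

## Usage and tests
How to run an example and the associated test suite.

## How to cite
OSTI record, publication, or DOI for this software.

## License
See the LICENSE file (e.g., BSD-3-Clause).
```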
8. Inline code documentation example (Python)
Some notes:
• The formatting of the docstring can depend on whether you are auto-converting the docstrings to HTML documentation
• Common formats include reST (reStructuredText), Google style, Epydoc, etc.
• You can add type hinting to further improve code readability and enable static type-checking tools
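The docstring screenshot from this slide does not survive text extraction; below is a minimal sketch in NumPy/reST style with type hints. The `normalize_irradiance` function and its reference entry are illustrative, not from the original slide.

```python
def normalize_irradiance(values: list[float], reference: float = 1000.0) -> list[float]:
    """Normalize irradiance measurements to a reference level.

    Parameters
    ----------
    values : list of float
        Measured plane-of-array irradiance in W/m^2.
    reference : float, default 1000.0
        Reference irradiance in W/m^2 (standard test conditions).

    Returns
    -------
    list of float
        Irradiance expressed as a fraction of the reference level.

    References
    ----------
    .. [1] IEC 61724-1, "Photovoltaic system performance - Part 1: Monitoring".
    """
    # Simple element-wise normalization; the docstring format is the point here
    return [v / reference for v in values]
```

The type hints (`list[float]`, `float`) let tools such as mypy check callers statically, complementing the human-readable docstring.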
11. Choosing a license – some guidance
• Talk to your lab's IP/IT departments for guidance
• BSD/MIT licenses are examples of very "open" licenses that allow others to do what they'd like with the software
– BSD typically gives some more protection against others using your name to promote their product, e.g., it prevents a user from claiming "our commercial product uses LBL-approved software technology for its analysis" or "uses the same algorithms developed by the brilliant scientist <your_name_here>"
• Be careful about choosing licenses that require all downstream code to also use the same license, e.g., GPL/Apache
– e.g., if you leave DuraMat and work for a company, you may no longer be able to use your own code, as companies typically avoid any GPL code
– Some labs may actually discourage or ban versions of such licenses because they contain patent-granting language (which, candidly, I don't fully understand – e.g., Apache 2.0 and GPL 3.0 for LBL)
– If you really insist on these licenses, we suggest talking to the DuraMat program (for impact on industry adoption) as well as your lab's IPO
12. Level 2: e.g., repository used for the lifetime of a project (the software itself is a work product)
☐ All Level 1 items
☐ In addition to lab-specific guidelines, ensure that DOE requirements are being met. For example, this likely includes:
  ☐ Software record (recorded in OSTI.gov; helps with reporting purposes / credit)
  ☐ Lab-specific approval to release code
☐ Set up a public-facing GitHub repository. This could be hosted by the project organization, by your institution, or by your research lab. Examples include:
  ☐ github.com/DuraMAT
  ☐ github.com/NREL
☐ Additional README components:
  ☐ Screenshot or visual aid of the project
  ☐ Current status of the project (testing use, production use, actively maintained, etc.)
  ☐ Funding information and institutional branding (logo, funding acknowledgement text)
  ☐ Add a contributor license agreement (CLA) for contributors
☐ Include any examples of use, Jupyter notebooks or scripts used for scientific publications that you want to make available, and data that can also be made available for testing/demonstration
☐ Use a standard layout for the repository (an example of a standard Python layout is provided at the end of this document)
☐ Adopt a consistent versioning scheme. Examples include semantic versioning (v0.0.1) and date-based versioning (v2023.01.25); tools like versioneer may help.
☐ Ensure your software is easy to install locally, including any necessary dependencies. For example, Python projects may include files such as setup.py or requirements.txt.
☐ Report your software to your funding program so it can be included in accomplishments
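As one concrete (hedged) instance of the installability item above, a minimal `pyproject.toml` – the modern alternative to `setup.py` – might look like the following. The package name, version, and dependencies are placeholders, not from the original slides.

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "my-duramat-tool"            # placeholder name
version = "0.0.1"                   # or manage versions with versioneer / setuptools-scm
description = "Example packaging metadata for a DuraMat-style project"
requires-python = ">=3.9"
dependencies = ["numpy", "pandas"]  # placeholder dependencies
```

With this file in the repository root, `pip install .` (or `pip install -e .` for development) installs the package and its declared dependencies.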
13. Example of a contributor license agreement
The following text is at the bottom of the LBL BSD-3 license (https://spdx.org/licenses/BSD-3-Clause-LBNL.html):
"You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory or its contributors, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form."
14. Example of a standard project layout for a Python project
• You can look up standard project layouts for the programming language you are using
• Some details of the layout may depend on the tools you use for other tasks, such as code distribution or continuous integration
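The layout figure on this slide does not survive text extraction; one common Python "src" layout, sketched with illustrative names, is:

```text
my_package/
├── LICENSE
├── README.md
├── pyproject.toml
├── src/
│   └── my_package/
│       ├── __init__.py
│       └── core.py
├── tests/
│   └── test_core.py
├── docs/
└── examples/
```

Keeping the importable code under `src/` helps ensure tests run against the installed package rather than the working directory; a flat layout (package directory at the repository root) is also widely used.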
15. You can also use the cookiecutter package to help
• The cookiecutter package will set up different package structures depending on your usage
• The cruft package can help you keep things up to date as templates change
(Screenshots on the slide show templates for a Python Flask web site and an ML project.)
16. Two commands to get started w/cookiecutter
pip install cookiecutter
cookiecutter https://github.com/audreyfeldroy/cookiecutter-pypackage.git
17. Level 3: e.g., ongoing project
☐ All Level 1 items
☐ All Level 2 items
☐ Implement a release system. One option is to use GitHub tags and releases. You can obtain a DOI for each release via Zenodo:
  ☐ Link the GitHub repo to Zenodo
  ☐ Perform the release and tag it
  ☐ Update the README to include the DOI identifier Zenodo provides in the "how to cite" section
☐ Set up continuous integration (CI) tools (examples include GitHub Actions, CircleCI, Travis CI, etc., run against pull requests)
  ☐ A code coverage tool (e.g., Coveralls) can help establish that tests cover the entire codebase and publish test status (pass/fail, test coverage)
☐ Check for consistent code formatting; a format checker (e.g., pylint or Python's black) can be used to check the formatting of pull requests and/or automatically reformat code
☐ Add documentation pages (e.g., HTML documentation). Documentation can be deployed in several places (e.g., GitHub Pages, Read the Docs). Documentation pages should provide:
  ☐ Getting started: simple instructions to install the code and run a sample problem; links to tutorials
  ☐ Examples / tutorials: links to illustrations of using the code
  ☐ API reference: links to the documentation of each public class, function, and/or method; note that this can typically be auto-generated
  ☐ Release notes: links to logs of changes with each tagged release
☐ Upload to PyPI, conda, or another easy-install code service
☐ Consider submitting to a code-centric journal such as the Journal of Open Source Software
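As one hedged instance of the CI item above, a minimal GitHub Actions workflow that runs tests on pushes and pull requests might look like this (stored at `.github/workflows/ci.yml`; the Python versions and the `test` extra are illustrative assumptions):

```yaml
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      # Assumes the project defines an optional "test" extra with pytest etc.
      - run: pip install -e .[test]
      - run: pytest --cov
```

The matrix runs the test suite once per listed Python version, and the pass/fail status appears on each pull request.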
18. Release versions via GitHub, citable via Zenodo
(Screenshots show a GitHub release and a citable DOI via Zenodo for https://github.com/NREL/PV_ICE.)
20. Keeping your code clean: pre-commit hooks
• Pre-commit hooks run a series of checks and automated fixes against your code before you commit that code to git
• For example, pre-commit hooks can:
– Auto-fix indentation, trailing spaces, line endings, line length, etc. (e.g., via a tool like black). This essentially frees the project from spending energy on code formatting issues
– Warn about issues like unused imports, undefined variables, bare "except" clauses, and overly high code complexity (via a tool like flake8)
• If set up early on, this keeps your code on track toward clean code
– Hooks can also be installed and run later, but then you may get a long list of previous code issues to fix
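A minimal `.pre-commit-config.yaml` wiring up the kinds of hooks described above might look like the following; the `rev` pins are illustrative and should be set to current releases.

```yaml
# .pre-commit-config.yaml -- illustrative hook set
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0            # pin to a current release
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/psf/black
    rev: 24.3.0            # pin to a current release
    hooks:
      - id: black          # auto-formats Python code
  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0             # pin to a current release
    hooks:
      - id: flake8         # warns about unused imports, bare except, etc.
```

Install with `pip install pre-commit`, then run `pre-commit install` once in the repository so the hooks execute automatically on every `git commit`.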
25. Consider submitting a paper to a code-centric journal
• Reviews are conducted via a GitHub repo
• The length of a JOSS paper is 250–1000 words, i.e., the entire paper is about the length of a couple of abstracts
27. Auto-checking with Scientific Python's repo-review
• You can run a check on your repo using Scientific Python's repo-review tool
• https://learn.scientific-python.org/development/guides/repo-review/
• The web version didn't work for me, but the command-line version did
• Example shown at right for pvlib
28. Peer-checking with pyOpenSci
• I haven't done this before personally, but it may be a good exercise for larger libraries like pvlib
• Create an issue here; it will guide you through the process:
– https://github.com/pyOpenSci/software-submission/issues/new/choose
• If you are nervous or skeptical, one of the options is a "presubmission inquiry"
29. Releasing large data sets with code
• Data sets should be formally released into a separate archival repository (a project-specific data hub (e.g., the DuraMat Data Hub), Figshare, Dryad, etc.)
• Include in the code repository smaller files that are needed by the code – for example, for unit tests or examples – provided they have been cleared for release and do not infringe copyright from other sources or NDAs
• Remember not to use links to local files on your computer!