The document discusses software tools to facilitate materials science research, noting that the author's group works to standardize and automate computational methods for high-throughput calculations and discovery of new functional materials. It advocates for developing automated workflows and analysis frameworks to reduce errors, improve efficiency, and enable non-experts to easily conduct complex simulations and analyses through intuitive online interfaces. The goal is to make advanced computational materials science accessible to a wider audience.
Software tools to facilitate materials science research
1. Software tools to facilitate
materials science research
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
S2I2 Workshop, Feb 2017
Slides (already) posted to http://www.slideshare.net/anubhavster
2. What we work on
• We don’t develop or
debut the new and
fashionable
computational methods
• We adopt methods,
standardize the parts
that are ready for mass
reproduction, and
execute them over
thousands of materials
2
3. Our research interests as materials scientists
3
High-throughput calculations
(each point is a possible battery cathode)
Discovery of new functional materials
(e.g., new bulk thermoelectrics)
4. A user’s perspective of materials simulation
4
“something”
Results
PI
What is the
GGA-PBE elastic
tensor of GaAs?
5. A user’s perspective of materials simulation
5
“something”
= student/postdoc
Results
PI
What is the
GGA-PBE elastic
tensor of GaAs?
Input file flags
Queue format
how to fix ZPOTRF?
6. Why this system?
• It works!
• Many aspects of running
simulations seem tailor-
made for assigning to
students/postdocs
– requires specialized
knowledge
– labor intensive
– helpful to have a high pain
threshold
• But there are also
disadvantages…
6
Nicola Marzari’s “Middle
Age Workshop” analogy
7. Staff specialization can get out of control
Because of the steep learning curve of
computational methods, there is often a single
group member assigned to a technique
7
“Alice knows how to do charged defect calculations.”
“Bob is the one who can properly converge GW runs.”
“Olga has all the scripts for phonon calculations.”
8. Errors are all too common
Let’s take a look at two alternate universes:
Which universe are you in?
Are you sure?
8
Universe 1: student has coffee, copies files from previous simulation,
edits 5 lines, runs simulation, delivers report.
Universe 2: student forgets coffee, copies files from previous simulation,
edits 4 lines, forgets LHFCALC=F, delivers report; it looks fine at first,
and in a month you discover it was wrong.
9. Takes too long to get results
• Calculations are labor intensive!
– set up the structure coordinates
– write input files, double-check all the flags
– copy to supercomputer
– submit job to queue
– deal with supercomputer headaches
– monitor job
– fix error jobs, resubmit to queue, wait again
– repeat process for subsequent calculations in
workflow
– parse output files to obtain results
– copy and organize results, e.g., into Excel
9
10. There is a lot of back-and-forth in the analysis
• Student/postdoc presents Powerpoint/Excel of the
results
• PI wants to know certain details or follow up based
on the data, which are missing from the
Powerpoint/Excel
• Student/postdoc says “I will get back to you”, goes
back to office, re-processes the data, and prepares
a revised report within a few days
• Repeat…
10
11. What would be a better way?
11
“something”
= a computer
Results
PI
What is the
GGA-PBE elastic
tensor of GaAs?
12. All past and present knowledge, from
everyone in the group, everyone previously
in the group, and outside collaborators,
about how to run calculations
Reduce specialization
12
13. Reduce errors and improve efficiency
• Computers can’t forget to set an input flag
• Computers (in theory) can create, correct,
submit, parse, and deliver the results of
calculations much faster than even the fastest
student
13
14. Improve analytics / visualization
• Excel and Powerpoint
works for a curated view
of the results
• But online analytics
would allow you to do
things like:
– view crystal structures on
demand
– generate the plot you
want
14
15. So this is the vision we want – is it achievable?
15
“something”
= a computer
Results
PI
What is the
GGA-PBE elastic
tensor of GaAs?
16. Yes! – and it is available on Materials Project
16
Input generation (parameter choice)
Workflow mapping
Supercomputer submission / monitoring
Error handling
File transfer
File parsing / DB insertion
Custom material
Submit!
www.materialsproject.org
“Crystal Toolkit”
Anyone can find, edit,
and submit (suggest)
structures
Currently, this feature is available for:
• structure optimization
• band structures
• elastic tensors
17. Software technologies to enable automation
17
atomate (automatic materials science workflows)
custodian (calculation error recovery)
pymatgen (materials analysis framework)
FireWorks (workflow framework and supercomputer interface)
Base packages: pymatgen, custodian, FireWorks
Derived package: atomate
These are all open-source:
• pymatgen and custodian are led by Prof. Ong group (UC San Diego)
• Developed in coordination with the Materials Project and Persson group
18. pymatgen – object-oriented materials analysis
18
www.pymatgen.org
Ong, S. P., Richards, W. D., Jain, A., Hautier, G., Kocher, M., Cholia, S., Gunter, D., Chevrier, V. L., Persson, K. A. & Ceder, G. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
19. pymatgen – examples of analyses
19
phase diagrams
Pourbaix diagrams
diffusivity from MD
band structure analysis
20. pymatgen - many useful tools made accessible
20
Structure Matcher
analyzes if two periodic
structures are equivalent, even
if they are in different settings
or have minor distortions
= ?
Order-disorder
resolve partial or mixed
occupancies into a fully
ordered crystal structure
(e.g., mixed oxide-fluoride site
into separate oxygen/fluorine)
Many other tools, such as:
• Bond-valence sums to determine valence
• Voronoi coordination as well as 3D coordination polyhedron analysis
• Automatically find and insert interstitial sites
• Diffraction pattern modeling
• Simple cost and materials availability estimators
21. custodian – fixing job errors
• Custodian can wrap
around an executable
(e.g., VASP)
– i.e., run custodian instead of
directly running VASP
• During execution,
custodian will monitor
output files and detect
errors / problems
– If so, it can change input files
and rerun the job
– e.g., if ZPOTRF error
detected, rerun with ISYM=0
– ever-expanding library of
fixes
21
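The monitor-detect-correct-rerun loop described above can be sketched in plain Python. This is a toy illustration of the custodian idea, not custodian's actual `Job`/`ErrorHandler` API; `FakeVaspJob` and the ZPOTRF-to-ISYM=0 rule are illustrative stand-ins modeled on the slide's example.

```python
# Toy illustration of the custodian idea: run a job, scan its output for
# known error signatures, correct the inputs, and rerun. NOT the real
# custodian API; FakeVaspJob is a stand-in for wrapping VASP.

def run_with_recovery(job, handlers, max_restarts=3):
    """Run `job`; after each attempt, let each (detect, correct) handler
    inspect the output and fix the job's inputs before a rerun."""
    for _ in range(max_restarts + 1):
        output = job()                    # run the "calculation"
        fixed = False
        for detect, correct in handlers:
            if detect(output):            # error signature found in output
                correct(job)              # mutate the job's inputs
                fixed = True
        if not fixed:
            return output                 # clean run: no handler fired
    raise RuntimeError("unrecoverable after %d restarts" % max_restarts)

class FakeVaspJob:
    """Pretends a ZPOTRF error appears until symmetry is off (ISYM=0)."""
    def __init__(self):
        self.incar = {"ISYM": 2}
    def __call__(self):
        return "ZPOTRF error" if self.incar["ISYM"] != 0 else "converged"

job = FakeVaspJob()
handlers = [(lambda out: "ZPOTRF" in out,
             lambda j: j.incar.update(ISYM=0))]
result = run_with_recovery(job, handlers)  # -> "converged", with ISYM now 0
```

The real custodian package structures the same loop around `Job` and `ErrorHandler` classes with `check()`/`correct()` methods and ships a growing library of VASP fixes.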
22. FireWorks – scientific workflow software
• FireWorks is an open-source scientific
workflow software
• Materials Project, JCESR, and other
projects manage their runs with
FireWorks
– >1 million jobs
– >100 million CPU-hours
– multiple computing clusters
• You can write any workflow
– e.g., FireWorks is used for graphics
processing, machine learning, document
processing, and protein folding
– #1 Google hit for “Python workflow
software”, top 5 for general scientific
workflow software
• Detailed tutorials are available
22
Jain, A., Ong, S. P., Chen, W., Medasani, B., Qu, X., Kocher, M., Brafman, M., Petretto, G., Rignanese, G.-M., Hautier, G., Gunter, D. & Persson, K. A. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurr. Comput. Pract. Exp. 27, 5037–5059 (2015).
www.pythonhosted.org/FireWorks
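To make the workflow idea concrete, here is a toy dependency-driven executor in plain Python. It is not the real FireWorks API (which uses `Firework`, `Workflow`, and a MongoDB-backed LaunchPad); the relax/static/bands task names mirror a typical materials workflow.

```python
# Toy workflow executor illustrating the FireWorks idea: a workflow is a
# set of small tasks plus dependencies; a launcher runs whatever is ready.
# NOT the real FireWorks API.

def run_workflow(tasks, deps):
    """tasks: name -> callable(state); deps: name -> prerequisite names."""
    done, state = set(), {}
    while len(done) < len(tasks):
        ready = [n for n in tasks
                 if n not in done and all(d in done for d in deps.get(n, []))]
        if not ready:
            raise RuntimeError("cycle or unsatisfiable dependency")
        for name in ready:
            state[name] = tasks[name](state)  # run task, record its output
            done.add(name)
    return state

# A common materials pattern: one relaxation feeding two follow-up steps
wf_state = run_workflow(
    {"relax": lambda s: "relaxed structure",
     "static": lambda s: "energy from " + s["relax"],
     "bands": lambda s: "band structure from " + s["relax"]},
    {"static": ["relax"], "bands": ["relax"]})
```

FireWorks adds the pieces a sketch like this omits: persistent job state in a database, launching across queues on multiple clusters, and detailed provenance for every run.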
23. FireWorks – screenshot of jobs status
23
Live version at http://fireworks.dash.materialsproject.org
24. atomate – our newest code (redesigns our older codes)
24
translate PI-style (minimal) specifications into well-
defined FireWorks workflows
(FireWorks handles all the execution and
job management details)
What is the
GGA-PBE elastic
tensor of GaAs?
25. atomate – what’s available?
25
K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia
• band structure
• spin-orbit coupling
• hybrid functional calcs
• elastic tensor
• piezoelectric tensor
• Raman spectra
• GIBBS method
• QH thermal expansion
• AIMD
• FEFF method
• LAMMPS MD
All past and present knowledge, from
everyone in the group, everyone previously
in the group, and outside collaborators,
about how to run calculations
M. Aykol, S. P. Ong
26. Further resources
• The Github web sites
– www.github.com/materialsproject
– www.github.com/hackingmaterials
• Software carpentry
• https://software-carpentry.org
26
27. Needed: better way to learn methods
• It can take many months, and perhaps even an internship in a
group with relevant expertise, to learn to use a new method
• Workshops are one way to speed the process
• However, self-serve ways to learn new methods would be
wonderful
– e.g., web tutorials that mix together theory and practice
• Consider: what fraction of people could learn to correctly use
your code/method given only a single web link and no direct
communication with anyone? (they are allowed to find and
use other web resources based on the initial link)
– Example: https://www.youtube.com/user/MaterialsProject
27
28. Needed: curation of tools and methods
• A place to kick-start discovery and learning of
new codes and tools:
– “Too basic” example: http://materials.sh (Shyue Ping
Ong, UCSD)
– “Too complex/messy” example: Nanohub
28
29. Needed: standardizing data *containers*
• Different codes will have different inputs and
outputs, so obviously data organization will vary
• But the “container” of the data organization can be
consistent. e.g., you can represent arrays within:
– JSON
– YAML
– XML
– HDF5
– but don’t invent your own format to represent an array!
• Some of these container formats are human-
readable, i.e., easy to edit in a text editor
• No more “code parses custom input file format to
produce custom output file format”
29
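The "standard container, custom contents" point above can be shown with the JSON case (keys below are illustrative, not a Materials Project schema):

```python
import json

# Reuse a standard container (JSON) instead of inventing a file format.
# The keys are illustrative; the elastic tensor is truncated to 2x2.
record = {
    "material_id": "example-1",
    "elastic_tensor": [[250.0, 50.0], [50.0, 250.0]],  # GPa
}
text = json.dumps(record, indent=2)  # human-readable, editable in any editor
assert json.loads(text) == record    # round-trips with no custom parser
```

Any code that speaks JSON (or YAML, XML, HDF5) can read this record, which is exactly what a bespoke text format sacrifices.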
30. Needed: other ways to improve accuracy
30
DFT band gap = cheap lens
Some kind of super-accurate post-Bethe-Salpeter method
How to improve image quality? Strategy 1
31. Needed: other ways to improve accuracy
31
Computer algorithms
improve image
How to improve image quality? Strategy 2
Software corrects for the cheap lens: e.g., removing distortion, or
combining two images to create depth of field
32. Needed: other ways to improve accuracy
32
correct and mix
cheap/simple
calculations to
improve output
quality
Jain, A., Hautier, G., Ong, S. P., Moore, C. J., Fischer, C. C., Persson, K. A. & Ceder, G. Formation enthalpies by mixing GGA and GGA+U calculations. Phys. Rev. B 84, 045115 (2011).
33. Needed: other ways to improve accuracy
33
Correcting the DFT is necessary to get decent phase diagrams
Almost everyone who practices new materials design does some
flavor of post-correction (e.g., gas-phase energies)
More effort into comparing, developing, and
validating such methods is needed.
Jain, A., Hautier, G., Ong, S. P., Moore, C. J., Fischer, C. C., Persson, K. A. & Ceder, G. Formation enthalpies by mixing GGA and GGA+U calculations. Phys. Rev. B 84, 045115 (2011).
35. Some lessons learned (1)
• In the beginning, strong central coordination from
authority was needed to develop these
– require that people contribute to common code, e.g.
pymatgen, and not write their own detached scripts
• Once a code was “established”, less authority was
needed
– people voluntarily contributed improvements rather than
writing their own code because this benefited them
• Today the process is almost completely
decentralized
– culture has changed
– even for new codes, people rally around it rather than
build independent things
35
36. Some lessons learned (2)
• It is helpful to have a strong BDFL (benevolent
dictator for life) for each codebase
• Requirements for the BDFL:
– very detail-oriented
– cares about the code itself, not just the application
– cares more about the code quality than about offending
teammates, i.e., will not accept poor quality contributions
– at the same time, able to rally support from people and
convince them to contribute or clean up code
– willing to work overtime to do things like write detailed
docs, advocate for the code, review commits, etc.
– derives joy from building and deploying things!
36
37. Some lessons learned (3)
• Spending time to do things like improving code cleanliness, writing
unit tests, writing documentation, etc. is not as “noble” and
“self-sacrificing” an act as people make it out to be
– I’ve referred to my own documentation many times
– I’ve saved myself from a world of trouble by previously writing unit tests to
detect bugs
– I’ve been able to write and build large code much faster due to previous
commitments to code cleanliness (and been slowed down in my progress
when I’ve relaxed these constraints)
• We don’t like to admit this, but a lack of attention to detail in the
past has easily cost us tens of thousands of dollars in wasted
computing and countless labor hours – but some of this is inevitable
with large projects
37
38. Some lessons learned (4)
• Computer scientists are useful for staying up to
date in the fast-moving world of software
– 2006: I took a graduate class in databases at MIT; all SQL,
not a single mention of “NoSQL”
– 2011: We are designing the framework for Materials
Project; I have lots of experience with SQL; a computer
scientist casually mentions NoSQL, its growing
prominence, and its potential applicability to our problem
– 2017: We do almost everything in NoSQL
• Lesson: software moves fast! Much faster than
materials science knowledge or methods. Don’t use
data from 5 years ago to inform your decision.
38