
Software Citation, Reuse and Metadata Considerations: An Exploratory Study Examining LAMMPS

Slides for ASIS&T 2016 Data Curation session.

Published in: Science

  1. Software Citation, Reuse and Metadata Considerations: An Exploratory Study Examining LAMMPS
     Kai Li, Jane Greenberg and Xia Lin, College of Computing and Informatics, Drexel University
     10/17/2016, ASIS&T Conference 2016
  2. Acknowledgement
  3. The paper is available at: https://goo.gl/uEP9B7
  4. Research questions
     • How is the software LAMMPS described as being (re-)used, through citation or mention, in scientific studies?
       – What are the use types of LAMMPS?
       – What metadata standards can be found in the natural-language mentions of the software?
  5. Conceptual model (diagram)
     • The Paper describes the Study.
     • The Study uses the Software.
     • The Paper cites or mentions the use of software, the software itself, or other possibilities.
  6. Background
     • Research data should be properly cited in research outputs.
     • "Software as data":
       – Scientific software is a unique research data object in terms of its position and function in the research infrastructure/pipeline.
  7. Software reuse or software use?
  8. What is cited: dataset vs. data paper?
     • In established data repositories, datasets are normally accompanied by official citation instructions, including author, title, date, DOI, and other descriptive metadata elements.
     • A data paper is a "searchable metadata document, describing a particular dataset or a group of datasets, published in the form of a peer-reviewed article in a scholarly journal." (Chavan & Penev, 2011)
       – There is also an increasing number of software papers, which parallel the format of the data paper.
  9. LAMMPS
     • LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a molecular dynamics program created under an agreement between Sandia National Laboratories, Lawrence Livermore National Laboratory and three companies.
     • It was released as open source code in 2004. Since then, new features, including those developed by third parties, have been integrated into the package.
  10. Official citation of LAMMPS
  11. Research method
      • Sample: the 400 most-cited papers on Google Scholar that cite Plimpton's original paper (Plimpton, 1995).
      • All sentences about LAMMPS in the sampled papers were extracted and coded manually, following the content analysis method, for:
        – A classification scheme of reuse types
        – Metadata elements about LAMMPS and the reuse of LAMMPS
  12. Scheme of reuse types of LAMMPS
      Category | Definition
      Unspecified reuse | The paper reuses LAMMPS as a whole in the main study, or does not specify which other type of reuse applies.
      Modified reuse | The paper uses a modified version of LAMMPS in the main study. The modification may or may not be described in the paper.
      Benchmark | The paper uses LAMMPS (original or modified version) only in the background study.
      Cite (or non-use) | The paper does not use LAMMPS per se, but only cites the software or Plimpton's paper. This includes papers that only use the method presented in the original paper.
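The coding scheme above can be written down as a small data structure. The following is a minimal sketch, not the authors' tooling: the study coded sentences manually, and the names `ReuseType` and `coded_papers` are hypothetical. The labels assigned here are the ones the slides themselves give for their quoted example papers.

```python
from collections import Counter
from enum import Enum


class ReuseType(Enum):
    """The four categories of the reuse scheme (names are illustrative)."""
    UNSPECIFIED_REUSE = "Unspecified reuse"
    MODIFIED_REUSE = "Modified reuse"
    BENCHMARK = "Benchmark"
    CITE = "Cite (or non-use)"


# Manually assigned labels for the example papers quoted in the slides.
coded_papers = {
    "McMahon, Cheung & Troisi, 2011": ReuseType.UNSPECIFIED_REUSE,
    "Agrawal, Peng & Espinosa, 2009": ReuseType.UNSPECIFIED_REUSE,
    "Jang et al., 2004": ReuseType.MODIFIED_REUSE,
    "Barrat & Bocquet, 1999": ReuseType.MODIFIED_REUSE,
    "Anderson, Lorenz & Travesset, 2008": ReuseType.BENCHMARK,
    "Liu et al., 2008": ReuseType.BENCHMARK,
    "Kalé et al., 1999": ReuseType.CITE,
    "Mishin, Asta & Li, 2010": ReuseType.CITE,
}

# Tally how many example papers fall into each category.
tally = Counter(coded_papers.values())
for reuse_type in ReuseType:
    print(f"{reuse_type.value}: {tally[reuse_type]}")
```

Using an enumeration rather than free-text labels keeps the category names consistent across coders, which matters when the counts are later aggregated.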
  13. Examples of "Unspecified reuse" type
      • "LAMMPS was used for all MD simulations. [37]" (McMahon, Cheung & Troisi, 2011)
      • "LAMMPS [28,29] (Large-Scale Atomic/Molecular Massively Parallel Simulator), developed at Sandia National Laboratories, was used to model [0001] oriented ZnO NWs with diameters ranging from 5 to 20 nm." (Agrawal, Peng & Espinosa, 2009)
  14. Examples of "Modified reuse" type
      • "The annealing simulations were performed with LAMMPS (large-scale atomic/molecular massively parallel simulator) code from Plimpton at Sandia (modified to handle our force fields)." (Jang et al., 2004)
      • "We would like to thank E. Charlaix and P.-F. Gobin for introducing us to this subject, and Dr. S.J. Plimpton for making publicly available a parallel MD code, [25] a modified version of which was used in the present simulations." (Barrat & Bocquet, 1999)
  15. Examples of "Benchmark" type
      • "To demonstrate what one should expect of a precise MD trajectory, the same simulation run is performed using LAMMPS again, but this time on four processor cores in parallel. ... We have compared our GPU implementation against LAMMPS running on a fast parallel cluster, see Fig. 8, and we have shown that the GPU performs at the same level as up to 36 processor cores." (Anderson, Lorenz & Travesset, 2008)
      • "In order to compare our GPU version to a well-optimized sequential code, we have also compared our CUDA implementation to LAMMPS." (Liu et al., 2008)
  16. Examples of "Cite" type
      • "Research by Plimpton [38], Plimpton and Hendrickson [37], and Hwang et al. [19] shows that this method provides a better speedup than RD, and can be used with good speedups up to hundreds of processors." (Kalé et al., 1999)
      • "Software such as LAMMPS [212], IMD [213] and DL_POLY [214] are publicly available to perform large-scale MD simulations on parallel platforms." (Mishin, Asta & Li, 2010)
  17. Category occurrence
      Category | Papers
      Unspecified reuse | 305
      Modified reuse | 29
      Benchmark | 11
      Cite | 55
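The counts above are easier to compare as shares of the sample. A minimal sketch of that arithmetic follows; the counts come from the slide, and the variable names are illustrative.

```python
# Counts as reported on the "Category occurrence" slide.
counts = {
    "Unspecified reuse": 305,
    "Modified reuse": 29,
    "Benchmark": 11,
    "Cite": 55,
}

# The categories should add up to the 400 sampled papers.
total = sum(counts.values())

# Share of each category, as a percentage of the whole sample.
shares = {category: 100 * n / total for category, n in counts.items()}

for category, n in counts.items():
    print(f"{category:<18} {n:>3}  {shares[category]:6.2f}%")
print(f"{'Total':<18} {total:>3}")
```

Roughly three quarters of the sampled papers fall under "Unspecified reuse", so the remaining three categories together describe about a quarter of the sample.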
  18. Metadata elements in mentions
      • This study focuses on the following three metadata elements:
        – Version
        – Parallel/part code
        – Simulation model used
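One way to picture these three elements is as optional fields on a record for each coded mention. The sketch below is an assumption for illustration only: the class `CodedMention`, its field names, and the method `recorded_elements` are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodedMention:
    """One sentence about LAMMPS extracted from a paper (hypothetical record)."""
    paper: str
    sentence: str
    version: Optional[str] = None        # version string, if the paper gives one
    parallel_code: Optional[str] = None  # e.g. "WARP" or "ParaDyn", if mentioned
    simulation_models: List[str] = field(default_factory=list)

    def recorded_elements(self) -> List[str]:
        """Return the names of the metadata elements this mention records."""
        present = []
        if self.version:
            present.append("version")
        if self.parallel_code:
            present.append("parallel/part code")
        if self.simulation_models:
            present.append("simulation model")
        return present


# A mention quoted in the slides: it names the software but none of the elements.
mention = CodedMention(
    paper="McMahon, Cheung & Troisi, 2011",
    sentence="LAMMPS was used for all MD simulations. [37]",
)
print(mention.recorded_elements())
```

Making every element optional reflects how many mentions record none of them; an empty result from `recorded_elements` is itself a finding.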
  19. Version of LAMMPS
      • Because of the nature of the official citation of LAMMPS, version information is missing from most of the sampled papers: only five papers include any version information, and only two of those give it in an accurate and full form.
  20. Parallel/part code of LAMMPS
      • Of the three earliest parallel code packages, WARP and ParaDyn were mentioned in the sampled papers, but GranFlow was not mentioned in any of them.
      • All three packages were integrated into LAMMPS in 2001. ("LAMMPS history", n.d.)
  21. Summary of parallel code mentions
      Code package | Papers
      WARP (13) | Ji & Park, 2006; Park, 2006; Park, Gall & Zimmerman, 2005, 2006; Park & Zimmerman, 2005, 2006; Liang & Zhou, 2006; Tschopp & McDowell, 2008a, 2008b; Tschopp, Spearot & McDowell, 2007; Tschopp, Tucker & McDowell, 2007, 2008
      ParaDyn (5) | Cao & Ma, 2008; Cao & Wei, 2006, 2007a, 2007b; Cao, Wei & Ma, 2008
  22. Simulation model used
      • Software can be seen as a body of code in which research methods are implemented.
      • Research methods connected to scientific software, as a type of research object, should be traced and studied.
  23. Summary of the top simulation models mentioned
      Model | Occurrence
      Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential | 15
      Embedded atom method (EAM) potential | 15
      Nosé-Hoover thermostat | 10
      Reactive force field (ReaxFF) | 7
      Velocity-Verlet algorithm | 6
  24. Conclusions
      • There are different kinds of semantic elements (reuse type, version, software relationship, method) in the citation/mention of LAMMPS in research papers.
      • The current practice of recording such information is highly incomplete, inconsistent, and sometimes confusing.
  25. Implications
      • To represent software and software use in scientific studies, what elements and/or relationships should be included in future standards of software citation?
        – Metaphor matters! (Parsons & Fox, 2013)
  26. References
      • Agrawal, R., Peng, B., & Espinosa, H. D. (2009). Experimental-computational investigation of ZnO nanowires strength and fracture. Nano Letters, 9(12), 4177–4183.
      • Anderson, J. A., Lorenz, C. D., & Travesset, A. (2008). General purpose molecular dynamics simulations fully implemented on graphics processing units. Journal of Computational Physics, 227(10), 5342–5359.
      • Barrat, J.-L., & Bocquet, L. (1999). Influence of wetting properties on hydrodynamic boundary conditions at a fluid/solid interface. Faraday Discussions, 112, 119–128.
      • Chavan, V., & Penev, L. (2011). The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics, 12(15), 1.
      • Jang, S. S., Molinero, V., Cagin, T., & Goddard, W. A. (2004). Nanophase-segregation and transport in Nafion 117 from molecular dynamics simulations: effect of monomeric sequence. The Journal of Physical Chemistry B, 108(10), 3149–3157.
      • Kalé, L., Skeel, R., Bhandarkar, M., Brunner, R., Gursoy, A., Krawetz, N., … Schulten, K. (1999). NAMD2: greater scalability for parallel molecular dynamics. Journal of Computational Physics, 151(1), 283–312.
      • Liu, W., Schmidt, B., Voss, G., & Müller-Wittig, W. (2008). Accelerating molecular dynamics simulations using Graphics Processing Units with CUDA. Computer Physics Communications, 179(9), 634–641.
      • McMahon, D. P., Cheung, D. L., & Troisi, A. (2011). Why holes and electrons separate so well in polymer/fullerene photovoltaic cells. The Journal of Physical Chemistry Letters, 2(21), 2737–2741.
      • Mishin, Y., Asta, M., & Li, J. (2010). Atomistic modeling of interfaces and their impact on microstructure and properties. Acta Materialia, 58(4), 1117–1151.
      • Parsons, M. A., & Fox, P. A. (2013). Is data publication the right metaphor? Data Science Journal, 12(0), WDS32–WDS46.
      • Plimpton, S. (1995). Fast Parallel Algorithms for Short-Range Molecular Dynamics. Journal of Computational Physics, 117(1), 1–19.
  27. Question time
      Questions can also be sent to kl696@drexel.edu
