The document discusses 10 habits for effective research data: 1) Preserve, 2) Archive, 3) Access, 4) Comprehend, 5) Discover, 6) Reproduce, 7) Trust, 8) Cite, 9) Use, and 10) Putting it all together. It provides examples for each habit, including data rescue challenges, the Olive Project to preserve executable content, metadata tools to improve data comprehension and sharing, initiatives for data indexing and identifiers to improve discovery and reproducibility, and proposals for making data more usable and integrated. The overall message is that adopting standards and collaborating across organizations can help research data achieve its full potential.
1. Ten habits of highly effective data: How to help your dataset achieve its full potential
University of Illinois, Urbana-Champaign
May 7, 2014
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
http://researchdata.elsevier.com/
2. Who cares about Research Data?
Funding bodies:
• Demonstrate impact
• Guarantee permanence, discoverability
• Avoid fraud
• Avoid double funding
• Serve the general public
Research Management/Library:
• Generate, track outputs
• Comply with mandates
• Ensure availability
Phil Bourne, (then) Associate Vice Chancellor, UCSD, 4/13: “We need to think about the university as a digital enterprise.”
Mike Huerta, Associate Director, NLM: “Today, the major public product of science are concepts, written down in papers. But tomorrow, data will be the main product of science…. We will require scientists to track and share their data at least as well, if not better, than they are sharing their ideas today.”
Researchers:
• Derive credit
• Comply with mandates
• Discover and use
• Cite/acknowledge
Nathan Urban, PI Urban Lab, CMU, 3/13: “If we can share our data, we can write a paper that will knock everybody’s socks off!”
Barbara Ransom, NSF Program Director Earth Sciences: “We’re not going to spend any more money for you to go out and get more data! We want you first to show us how you’re going to use all the data we paid y’all to collect in the past!”
3. What’s the problem? One example: using antibodies and squishy bits
Grad students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides, and writes a paper. End of story.
4. Maslow’s Hierarchy of Needs for Research Data
1. Preserved (existing in some form)
2. Archived (long-term & format-independent)
3. Accessible (can be accessed by others)
4. Comprehensible (others can understand data & processes)
5. Discoverable (can be indexed by a system)
6. Reproducible (others can redo experiments)
7. Trusted (validated/checked by reviewers)
8. Citable (able to point & track citations)
9. Usable (allow tools to run on it)
5. 1. Preserve: Data Rescue Challenge
• With IEDA/Lamont: award successful data rescue attempts
• Awarded at AGU 2013
• 23 submissions of data that was digitized, preserved, made available
• Winner: NIMBUS Data Rescue:
– Recovery, reprocessing and digitization of the infrared and visible observations along with their navigation and formatting.
– Over 4,000 7-track tapes of global infrared satellite data were read and reprocessed.
– Nearly 200,000 visible light images were scanned, rectified and navigated.
– All the resultant data was converted to HDF-5 (NetCDF) format and freely distributed to users from NASA and NSIDC servers.
– This data was then used to calculate monthly sea ice extents for both the Arctic and the Antarctic.
• Conclusion: we (collectively) need to do more of this! How can we fund it?
6. 2. Archive: Olive Project
• CMU CS & Library: funded by a grant from the IMLS; Elsevier is a partner
• Goal: preservation of executable content – nowadays a large part of intellectual output, and very fragile
• Identified a series of software packages and prepared VMs to preserve them
• Does it work? Yes – see video (1:24)
7. 3. Access: Urban Legend
• Part 1: Metadata acquisition
• Step through the experimental process in a series of dropdown menus in a simple web UI
• Can be tailored to the workflow of an individual researcher
• Connected to shared ontologies through a lookup table, managed centrally in the lab
• Connect to data input console (Igor Pro)
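A lab-managed lookup table of the kind described can be as simple as a mapping from local free-text terms to shared ontology identifiers; a minimal sketch (the cell-type and method IDs are hypothetical placeholders, not a real ontology):

```python
# Sketch: a lab-managed lookup table mapping free-text terms from a
# metadata-entry UI to shared ontology identifiers. The cell-type and
# method IDs below are hypothetical placeholders.

LOOKUP = {
    "mouse": "NCBITaxon:10090",       # species -> taxonomy ID
    "pyramidal neuron": "CELL:0001",  # hypothetical cell-type ID
    "patch clamp": "METHOD:0042",     # hypothetical method ID
}

def normalize_term(free_text):
    """Resolve a free-text term to its ontology ID, or flag it for curation."""
    key = free_text.strip().lower()
    return LOOKUP.get(key, f"UNRESOLVED:{key}")
```

Unresolved terms surface to the lab's central curator, who extends the table, so the vocabulary grows with the lab's actual usage.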
8. 4. Comprehend: Urban Legend
• Part 2: Data Dashboard
• Access, select and manipulate data (calculate properties, sort and plot)
• Final goal: interactive figures linked to data
• Plan to expand to more neuroscience labs
• Plan to build for a geochemistry use case
9. 5. Discover: Data Indexing proposals
• Collaborated on a Data Discovery Index proposal with UCSD/Carnegie Mellon
• Also worked with UIUC!
• Interested in developing distributed infrastructures for making data easier to search: what is the ‘Goldilocks index’ where search is scalable, yet useful?
• Looking for academic/industry partners, use cases and platforms to address the next stage
• Discoverability is a key driver for metadata/data format structure!
10. 6. Reproduce: Resource Identifier Initiative
• Force11 Working Group to add data identifiers to articles that are:
– 1) Machine readable;
– 2) Free to generate and access;
– 3) Consistent across publishers and journals.
• Authors publishing in participating journals will be asked to provide RRIDs for their resources; these are added to the keyword field
• RRIDs will be drawn from:
– The Antibody Registry
– Model Organism Databases
– NIF Resource Registry
• So far, Springer, Wiley, Biomednet and Elsevier journals have signed up, with 11 journals and more to come
• Wide community adoption!
11. 7. Trust: Moonrocks
How can we scale up data curation?
Pilot project with IEDA:
• A database for lunar geochemistry: leapfrog & improve curation time
• 1-year pilot, funded by Elsevier
• Main conclusion: if spreadsheet columns/headers map to an RDB schema, we can scale curation cost!
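The scaling observation above can be sketched concretely: when a spreadsheet's column headers map one-to-one onto a relational schema, ingestion becomes mechanical. A minimal sqlite3 sketch (the `samples` table, the header-to-column mapping, and the CSV values are hypothetical illustrations):

```python
import csv, io, sqlite3

# Sketch: bulk-loading a spreadsheet into a relational table when its
# headers map 1:1 onto the schema. The 'samples' schema, the mapping,
# and the CSV content are hypothetical.
HEADER_TO_COLUMN = {"Sample ID": "sample_id", "SiO2 (%)": "sio2_pct", "TiO2 (%)": "tio2_pct"}

def load_spreadsheet(csv_text, conn):
    """Insert every spreadsheet row into the samples table via the header mapping."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples (sample_id TEXT, sio2_pct REAL, tio2_pct REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        mapped = {HEADER_TO_COLUMN[h]: v for h, v in row.items()}
        conn.execute(
            "INSERT INTO samples (sample_id, sio2_pct, tio2_pct) "
            "VALUES (:sample_id, :sio2_pct, :tio2_pct)",
            mapped,
        )

conn = sqlite3.connect(":memory:")
load_spreadsheet("Sample ID,SiO2 (%),TiO2 (%)\n10017,40.7,11.7\n", conn)
```

The curation cost then concentrates in defining the mapping once per spreadsheet layout, rather than in hand-entering each row, which is what makes the approach scale.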
12. 8. Cite: Force11 Data Citation Principles
• Another Force11 Working Group
• Defined 8 principles:
• Now seeking endorsement/working on implementation
1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence: Where a specific claim rests upon data, the corresponding data citation should be provided.
4. Unique identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence: Metadata describing the data, and unique identifiers, should persist, even beyond the lifespan of the data they describe.
7. Versioning and granularity: Data citations should facilitate identification of, and access to, different versions and/or subsets of data. Citations should include sufficient detail to verifiably link the citing work to the portion and version of data cited.
8. Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
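Principles 4 and 7 in particular translate directly into what a rendered data citation must contain: a persistent, machine-actionable identifier plus version and subset detail. A sketch of assembling one (the dataset, authors, and DOI below are hypothetical):

```python
# Sketch: assembling a data citation that carries a persistent identifier
# (principle 4) plus version and subset information (principle 7).
# All dataset fields below are hypothetical.

def format_data_citation(authors, year, title, repository, doi, version=None, subset=None):
    """Build a citation string; version and subset are included only when given."""
    parts = [f"{authors} ({year}). {title}"]
    if version:
        parts.append(f"(Version {version})")
    if subset:
        parts.append(f"[{subset}]")
    parts.append(f"{repository}. https://doi.org/{doi}")
    return " ".join(parts)

citation = format_data_citation(
    authors="Smith, J.; Lee, K.",
    year=2014,
    title="Lunar geochemistry compilation",
    repository="Example Repository",
    doi="10.0000/example.1234",
    version="2.1",
    subset="Apollo 11 samples",
)
```

Because the DOI resolves and the version/subset are spelled out, both a human reader and an indexing machine can verifiably reach exactly the data the citing work used.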
13. 9. Use: Executable Papers
• Result of a challenge to come up with cyberinfrastructure components to enable executable papers
• Pilot in Computer Science journals:
– See all code in the paper
– Save it, export it
– Change it and rerun on the data set
14. 10: Putting it all together:
Experimental Metadata: workflows, samples, settings, reagents, organisms, etc.
Record Metadata: DOI, date, author, institute, etc.
Processed Data: mathematically/computationally processed data: correlations, plots, etc.
Raw Data: direct outputs from equipment: images, traces, spectra, etc.
Methods and Equipment: reagents, settings, manufacturer’s details, etc.
Validation: approval, reproduction, selection, quality stamp
More curation → more usable
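The layered model above can be captured as one structured record per dataset, with "more curation → more usable" measured as how many layers are actually filled in; a sketch with hypothetical field values:

```python
# Sketch: one dataset described with the layered model from the slide.
# Every value here is a hypothetical placeholder.

dataset_record = {
    "record_metadata": {"doi": "10.0000/example.5678", "date": "2014-05-07",
                        "author": "J. Smith", "institute": "Example University"},
    "experimental_metadata": {"workflow": "patch-clamp recording",
                              "organism": "mouse", "sample": "slice-042"},
    "methods_and_equipment": {"reagent": "anti-GFAP", "amplifier": "ExampleAmp 700"},
    "raw_data": ["trace_001.dat", "image_001.tif"],
    "processed_data": ["correlations.csv", "summary_plot.png"],
    "validation": {"approved_by": "reviewer-1", "quality_stamp": True},
}

def curation_level(record):
    """Count how many layers are filled in: more curation -> more usable."""
    return sum(1 for layer in record.values() if layer)
```

A record with only raw data scores low and is hard to reuse; one with every layer populated, through validation, is the fully curated end of the slide's axis.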
15. So how can we help research data be happier and more productive?
• Group therapy: Force11, W3C, other fora – shared standards help everyone (we play well with others!)
• Financial therapy: we have a lot of content & IT skills to support data-driven processes for grant proposals; funders like us.
• Creative therapy: innovative collaboration projects that expand everyone’s mind – let’s put your data through its paces
• Relationship therapy: happy to address any issues or concerns!
16. Collaborations and discussions gratefully acknowledged:
– CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
– UCSD: Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky
– NIF: Maryann Martone, Anita Bandrowski
– Force11: Ed Hovy, Tim Clark, Ivan Herman, Paul Groth, Maryann Martone, Cameron Neylon, Stephanie Hagstrom
– OHSU: Melissa Haendel, Nicole Vasilevsky
– Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
– MIT: Micah Altman
Thank you!
http://researchdata.elsevier.com/
Anita de Waard
a.dewaard@elsevier.com