Meeting Federal Research Requirements for Data Management Plans, Public Acces... - ICPSR
These slides cover evolving federal research requirements for sharing scientific data. Provided are updates on federal agency responses to the 2013 OSTP memo, guidance on data management plans, resources for data management and curation training for staff/researchers, and tips for evaluating public data-sharing services. ICPSR's public data-sharing service, openICPSR, is also presented. Recording of this presentation is here: https://www.youtube.com/watch?v=2_erMkASSv4&feature=youtu.be
Digital transformation to enable a FAIR approach for health data science - Varsha Khodiyar
Invited talk for ConTech Pharma on 1st March 2022
Abstract
Health Data Research UK is the UK’s national institute for health data science, with a mission to unite the UK’s health data to enable discoveries that improve people’s lives. In this talk, Dr Varsha Khodiyar will outline how HDR UK is bringing together disparate health data from all four countries of the United Kingdom, creating the infrastructure to enable discovery of and access to health data, and the convening standards making bodies to improve data linkage and data reuse. Varsha will also discuss how HDR UK is moving beyond the traditional confines of FAIR data to also ensure that data sharing and data use is transparent and ‘fair’ for the patients and lay public who are the subjects of these datasets.
Blockchain in Health Research Overview - Manion - Sean Manion PhD
Blockchain in Health Research 2019 was the 2nd annual summit hosted at Georgetown University on 27 Apr 2019 by Sean Manion, Science Distributed and Gilles Hilary, Georgetown University.
Open science is yielding active efforts to make data from research available for broader use. But some data carry restrictions (privacy or sensitivity constraints, regulation by statute or otherwise) that limit how broadly they can be shared. In this talk we argue that there are intermediate points on the spectrum of data-sharing options that offer more control over data than full sharing yet are more contributory than no sharing at all. We offer the controlled compute environment, or capsule, as a viable new approach for computational analysis of data that have restrictions. The compute environment increases the range of possibilities for facilitating science through data reuse, an objective of open science. This talk frames the capsule, and provides experience based on one such capsule used in HathiTrust for research with copyrighted materials.
This is the introductory slide deck from the core curriculum from the Chain Event: Georgetown, a blockchain in health science research symposium held at Georgetown University on 12 May 2018
Agencies such as the NSF and NIH require data management plans as part of research proposals and the Office of Science and Technology Policy (OSTP) is requiring federal agencies to develop plans to increase public access to results of federally funded scientific research. These slides explore sustainable data sharing models, including models for sharing restricted-use data. Demos of these models and tips for accessing public data access services are provided as well as resources for creating data management plans for grant applications.
DataONE Education Module 10: Legal and Policy Issues - DataONE
Lesson 10 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
DataTags: Sharing Privacy Sensitive Data by Michael Bar-Sinai - datascienceiqss
The DataTags framework makes it easy for data producers to deposit, data publishers to store and distribute, and data users to access and use datasets containing confidential information, in a standardized and responsible way. The talk will first introduce the concepts and tools behind DataTags, and then focus on the user-facing component of the system, the Tagging Server (available today at datatags.org). We will conclude by describing how future versions of Dataverse will use DataTags to automatically handle sensitive datasets that can only be shared under some restrictions.
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu... - ICPSR
This is ICPSR's core workshop deck designed to introduce, remind, and refresh your knowledge of ICPSR. It contains four "tours" or sub-presentations describing ICPSR's general reason for being; its social and behavioral research data, complete with search strategies; its training, educational, and instructional resources; and its data management and curation services, data repository options, and support resources (content and budget estimates) for those writing grant proposals.
The Role of the FAIR Guiding Principles for an effective Learning Health System - Michel Dumontier
The learning health system (LHS) is an integrated social and technological system that embeds continuous improvement and innovation for the effective delivery of healthcare. A crucial part of the LHS lies in how the underlying information system will secure and take advantage of relevant knowledge assets towards supporting complex and unusual clinical decision making, facilitating public health surveillance, and aiding comparative effectiveness research. However, key knowledge assets remain difficult to obtain and reuse, particularly in a decentralized context. In this talk, I will discuss the role of the Findable, Accessible, Interoperable, and Reusable (FAIR) Guiding Principles towards the realization of the LHS, along with emerging technologies to publish and refine clinical research and knowledge derived therein.
Keynote given for 2021 Knowledge Representation for Health Care http://banzai-deim.urv.net/events/KR4HC-2021/
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014 - ICPSR
Presentation about using social science data in the classroom and creating (and finding) resources with which to do it. Addresses both substantive courses and research methods/statistics courses.
Why is the NIH investing $100M at the intersection of data science and health research? The NIH seeks to invest in ways to help researchers easily find, access, analyze, and curate research data. Researchers want visual analytics, and to build the database into a “social network” – being able to “friend” or “like” the data.
Research in the time of Covid: Surveying impacts on Early Career Researchers - Rebecca Grant
Based on a survey of over 4,500 researchers published in the white paper The State of Open Data 2020, this session will explore the impacts of the pandemic on early career researchers (ECRs), their research practice, and how they interact with open data. We will discuss the specific challenges reported by ECRs, as well as the gaps in training and support that they have identified that would encourage their sharing and reuse of research data.
Presentation at the E-ARMA conference 2021.
Data sharing promotes many goals of the NIH research endeavor. It is particularly important for unique data that cannot be readily replicated. Data sharing allows scientists to expedite the translation of research results into knowledge, products, and procedures to improve human health. Do you know what a data sharing plan should include? Are you aware of common practices and standards for data sharing? Do you know what services are available to help share your data responsibly? This workshop will begin to address these questions. Q&A will follow the presentation. Anyone interested in or planning to apply for NIH funding should attend. Note: The NIH data-sharing policy applies to applicants seeking $500,000 or more in direct costs in any year of the proposed research.
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09... - dkNET
Abstract
In this presentation, Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health, will share the NIH's vision for a modernized, integrated FAIR biomedical data ecosystem and the strategic roadmap that NIH is following to achieve this vision. Dr. Gregurick will highlight projects being implemented by team members across the NIH's 27 institutes and centers and will discuss ways that industry, academia, and other communities can help NIH enable a FAIR data ecosystem. Finally, she will weave in how this strategy is being leveraged to address the COVID-19 pandemic.
Presenter: Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health
dkNET Webinar Information: https://dknet.org/about/webinar
Workshop - finding and accessing data - Cambridge August 22 2016 - Fiona Nielsen
Finding and accessing human genomic data for research
University of Cambridge, United Kingdom | Seminar Room G
Monday, 22 August 2016 from 10:00 to 12:00 (BST)
Charlotte, Nadia and Fiona presented an overview of data sources around the world where you can find genomics data for your research and gave examples of the data access application for dbGaP and EGA with specific details relevant for University of Cambridge researchers.
dkNET Office Hours - "Are You Ready for 2023? New NIH Data Management and Sha..." - dkNET
For all proposals submitted on/after January 25 2023, NIH will require the sharing of data from all NIH funded studies. Do you have appropriate data management practices and sharing plans in place to meet these requirements? Have questions or need some help? Join the dkNET office hours to learn about NIH’s policy (NOT-OD-21-013) and resources (https://dknet.org/rin/research-data-management) that could help.
Upcoming Webinars Schedule: https://dknet.org/about/webinar
This is an update on the status of federal requirements for data sharing in 2015. These slides were presented at ACRL in Portland in March 2015, by Linda Detterman and Jared Lyle of ICPSR, based at the University of Michigan. The session includes overviews of federal requirements, data curation, data management plans, data sharing services, and lots of fun!
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha..." - dkNET
For all proposals submitted on/after January 25 2023, NIH will require the sharing of data from all NIH funded studies. Do you have appropriate data management practices and sharing plans in place to meet these requirements? Have questions or need some help? Join the dkNET office hours to learn about NIH’s policy (NOT-OD-21-013) and resources that could help.
*Previous Office Hours Slides and Recording: https://dknet.org/rin/research-data-management
Upcoming Webinars Schedule: https://dknet.org/about/webinar
Compliance: Data Management Plans and Public Access to Data - Margaret Henderson
Presented at The 8th Annual University of Massachusetts and New England Area Librarian e-Science Symposium, Wednesday, April 6, 2016
University of Massachusetts Medical School
On November 21st 2014 at the Tufts University Medford campus and November 25th 2014 at the campus of the University of Massachusetts Medical School in Worcester, the BLC and Digital Science hosted a workshop focused on better understanding the research information management landscape.
Mark Hahnel, CEO of Figshare discussed more specific aspects of the research data management landscape and various approaches to address the growing suite of mandates.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... - Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Richard's adventures in two entangled wonderlands - Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V... - Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity and then from this we derive the Poiseuille flow equation, the transition flow equation, and the turbulent flow equation. In the situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
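The kind of equation the abstract describes can be sketched as the steady-flow Bernoulli equation extended with a viscous head-loss term (symbols here are the standard ones, assumed rather than taken from the book):

```latex
% Steady incompressible flow between stations 1 and 2, with h_L the
% viscous (friction) head loss; h_L -> 0 recovers the classical
% Bernoulli equation.
\frac{p_1}{\rho g} + \frac{v_1^2}{2g} + z_1
  = \frac{p_2}{\rho g} + \frac{v_2^2}{2g} + z_2 + h_L

% For laminar (Poiseuille) pipe flow of length L, diameter D, mean
% velocity v, and dynamic viscosity \mu, the head loss takes the
% familiar Hagen--Poiseuille form:
h_L = \frac{32 \mu L v}{\rho g D^2}
```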
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making. They monitor common gases, weather parameters, and particulates.
Toxic effects of heavy metals: Lead and Arsenic - sanjana502982
Heavy metals are naturally occurring metallic chemical elements that have relatively high density and are toxic at even low concentrations. All toxic metals are termed heavy metals irrespective of their atomic mass and density, e.g. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol..." - Studia Poinsotiana
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... - Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Data Governance in two different data archives: When is a federal data repository useful
1. Data Governance in Two Different Data Archives: When is a Federal Data Repository Useful?
Greg Farber
Director, Office of Technology Development and Coordination
National Institute of Mental Health
National Institutes of Health
March 2018
2. Guiding Ideas
1) Most research subjects want their data to be used to understand disease broadly. They are not too concerned about how researchers use their data.
2) The diseases we are trying to understand today are complex, meaning that the same symptoms can have many different underlying biological causes. Except in the cases where a deeply penetrant point mutation uncovers a single biological pathway to a disease, understanding the “subgroups” for complex diseases requires data from large populations who have similar symptoms.
3) Differences in data sharing laws in different countries make it difficult or impossible to move data across international borders. Federating data archives that are storing data in a similar way provides an inelegant but workable solution to this problem.
4) Despite the urgent need to aggregate data to understand complex diseases, individual consents and local laws must be respected.
4. Roadmap
• Contrast two data archives that have built the infrastructure necessary to aggregate data on complex diseases.
• NIMH Data Archive (NDA) Overview
▪ Federal Data Repository where the data are owned by the US National Institutes of Health
▪ Infrastructure
▪ Policy Issues
• Human Connectome Program (HCP)
▪ Large NIH funded project
▪ Access to most data was by self certification
▪ Initial Data Distribution was through Washington University
5. NIMH Data Archive
• Stores data from experiments involving human subjects that are deposited by research laboratories.
▪ Federal data repository
▪ Originally contained data from human subjects related to mental illness (and control subjects), but that has expanded in a number of ways over the past 12 months. Most subjects have consented to broad data sharing.
▪ Data are available to the research community through a not-too-difficult application process.
▪ Both submission and access to subject-level data require approval of an institutional official.
▪ Summary data are available to everyone with a browser (https://data-archive.nimh.nih.gov/)
• Begun in late 2006; the first data were received in 2008
• The data types include demographic data, clinical assessments, imaging data, and –omic data. There are no formal limits to the types of data that can be stored in NDA.
7. NIMH Data Archive – Current Size and Scope
• The NDA currently makes data available to the research community from 200,000 subjects. Additional data are held by the NDA but are not yet ready for sharing because the grant is still active and/or has not published papers.
• Many subjects have longitudinal data.
• ~1.1 PB of imaging and –omic data is securely stored in the Amazon cloud.
• Currently, the NDA does not contain any personally identifiable information, but we expect to begin holding such data in the near future (data from mobile devices).
▪ This change will likely require that NDA verify that the use of the data has been approved by an Institutional Review Board.
8. NDA Structure – Rows and Columns are the Building Blocks
• It is best to think of NDA as a large (~182,000 data elements by ~200,000 people), sparse, two dimensional matrix.
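The sparse row-by-column view of the archive can be sketched as a dictionary-of-dictionaries store in which only observed cells cost memory. Everything here (class name, element names, values) is illustrative, not the NDA's actual implementation:

```python
from collections import defaultdict


class SparseArchive:
    """Toy sparse subject-by-element store: rows are subject GUIDs,
    columns are data elements, and only observed cells are kept."""

    def __init__(self):
        # {subject_guid: {data_element: value}} -- absent cells store nothing,
        # which is what makes a ~182,000 x ~200,000 matrix tractable.
        self._cells = defaultdict(dict)

    def put(self, guid: str, element: str, value):
        self._cells[guid][element] = value

    def get(self, guid: str, element: str, default=None):
        return self._cells.get(guid, {}).get(element, default)

    def n_observed(self) -> int:
        """Count of filled cells, not rows * columns."""
        return sum(len(row) for row in self._cells.values())


# Two subjects, one observed element each: two filled cells in total.
archive = SparseArchive()
archive.put("GUID-1", "interview_age", 300)
archive.put("GUID-2", "diagnosis", "control")
```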
9. Data Dictionary – The First Building Block
• The NDA data dictionary is one of the key building blocks for this repository. It provides a flexible and extensible framework for data definition by the research community.
• 2,000+ instruments, freely available to anyone
▪ 180,000+ unique data elements and growing
▪ Data dictionaries describing:
• Clinical
• Genomics/Proteomics
• MRI modalities
• Other complex data (EEG, eye tracking)
• Accommodates any data type and data structure
• Describes the data collected by the research community
10. • Curated by NDA (this takes a lot of time)
• Data held in different archives needs to use common data
dictionaries to allow deep federation.
• The associated validation tool allows investigators to
quickly perform quality control tests of their data without
submitting data anywhere.
• Data in archives that don’t have a similar QC step are
likely to have issues.
• Both to enhance the quality of the science and to ensure that the
time and effort that research subjects spend in our research
protocols are not wasted, the validation tool should be run
frequently (daily, weekly). This is common practice in many other
domains.
Data Dictionary – The First Building Block
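The range-and-format checking described above can be sketched as a small validator: each data element defines a type and an allowable range or value set, and a record is checked against those definitions before any data leave the lab. The element names and ranges below are illustrative, not taken from the real NDA data dictionary.

```python
# Minimal sketch of data-dictionary validation: each element defines a type
# and an allowable range, and submitted records are checked before upload.
# Element names and ranges are illustrative, not the real NDA dictionary.

DATA_DICTIONARY = {
    "age_months":    {"type": int, "min": 0,  "max": 1320},
    "iq_full_scale": {"type": int, "min": 40, "max": 200},
    "handedness":    {"type": str, "allowed": {"L", "R", "A"}},
}

def validate_record(record):
    """Return a list of error strings; an empty list means the record passes."""
    errors = []
    for field, value in record.items():
        spec = DATA_DICTIONARY.get(field)
        if spec is None:
            errors.append(f"{field}: not in data dictionary")
            continue
        if not isinstance(value, spec["type"]):
            errors.append(f"{field}: expected {spec['type'].__name__}")
            continue
        if "allowed" in spec and value not in spec["allowed"]:
            errors.append(f"{field}: {value!r} not in {sorted(spec['allowed'])}")
        if "min" in spec and value < spec["min"]:
            errors.append(f"{field}: {value} below minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{field}: {value} above maximum {spec['max']}")
    return errors

print(validate_record({"age_months": 132, "handedness": "R"}))  # []
print(validate_record({"age_months": -5, "handedness": "X"}))   # two errors
```

Running a check like this daily or weekly, as the slide recommends, catches out-of-range values while the subject can still be re-measured, rather than years later at deposition time.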
14. • The NDA GUID software allows any researcher
to generate a unique identifier using some
information from a birth certificate.
• If the same information is entered in different
laboratories, the same GUID will be generated.
• This strategy allows NDA to aggregate data on
the same subject collected in multiple
laboratories without holding any of the personally
identifiable information about that subject.
• The GUID is now being discussed in a number of
additional research communities. We think we
have a reasonable plan to prevent a GUID from
becoming something like a social security
number (which would be identifying in itself)
• External studies indicate that the GUID
implementation is robust to both false
positives and false negatives in large
populations.
Global Unique Identifier – the Other Building Block
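The deterministic-identifier idea behind the GUID can be illustrated with a one-way hash of normalized birth-certificate fields: if two labs enter the same information, they derive the same identifier, and the archive never needs to hold the raw fields. This is only a sketch of the concept; the real NDA GUID software uses its own algorithm, and nothing below matches its actual output.

```python
import hashlib

# Illustrative sketch of the deterministic-identifier idea behind the GUID:
# normalize the same birth-certificate fields everywhere, hash them one-way,
# and identical inputs yield identical identifiers without the archive ever
# seeing the raw fields. The real NDA GUID software uses its own (different)
# algorithm; nothing here matches its actual output.

def make_pseudo_guid(first_name, last_name, birth_date, city_of_birth):
    """Derive a stable identifier from normalized birth-certificate fields."""
    # Normalization is critical: " Smith " and "smith" must hash identically.
    fields = [first_name, last_name, birth_date, city_of_birth]
    canonical = "|".join(f.strip().upper() for f in fields)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "PSEUDOGUID_" + digest[:12].upper()

# Two labs entering the same subject independently get the same identifier.
a = make_pseudo_guid("Ada", "Lovelace", "1815-12-10", "London")
b = make_pseudo_guid(" ada ", "LOVELACE", "1815-12-10", "london")
print(a == b)  # True
```

The one-way property is what lets NDA aggregate a subject's data across laboratories without holding any personally identifiable information: the hash cannot be feasibly inverted back to the birth-certificate fields.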
16. At this point, data has been received from the laboratory
that measured the data. Each subject has a GUID or a
pseudo-GUID. A data dictionary has been defined,
and the submitted data have been validated against that
definition.
How does an outside user find data they are interested
in?
18. An Example of Data Associated with a Particular Laboratory
20. We are now assigning DOIs to each study, and we can track how
often a DOI link is clicked (the start of a data citation)
23. • Assertion: Any consent language that restricts the use of the data
for particular purposes (for autism research…) results in profound
negative consequences.
• For example, if a researcher is trying to aggregate data from
subjects with schizophrenia and autism to understand symptoms
common to the two diagnostic groups, a consent that limits a dataset
to understanding only one of those conditions would probably make
the data inaccessible for the comparison study.
• Restricted data are also probably off limits for those who are trying to
use data mining techniques to develop or substantiate a hypothesis.
• There are some cases where restrictive consents might be appropriate,
but this should be the rare exception.
Policy – Consents
23
24. • NIMH expects that
research we pay for
involving human subjects
will result in that data
being made available in
NDA.
• Journals can also have a
positive role to play in
requiring that data be
placed in a repository
prior to publication.
• Asking for data volunteers
probably isn’t good
enough right now.
Policy – Data Deposition
24
25. • Summary data are available to anyone via the web site, but
accessing subject level data requires a data access form.
• Similarly, a data submission agreement is required that
certifies that the data were consented for sharing.
• Both forms require signatures from the PI and an
institutional official. This means that the research
institution is formally responsible for ensuring that the
data are “treated with respect”.
• Although neither form is complicated, they do raise barriers
to accessing the data.
Policy – NDA Data Access and Data Submission
25
26. • The NIMH Data Archive does hold some data that were
collected outside the US.
• For those datasets, the institutional official has decided that
depositing data is allowed both by the terms of the informed
consent and by the laws in that country.
• When there are restrictions to allowing data to be moved, it
is still possible to make it easy for the research community
to find data by federating data archives.
Policy – Data from Institutions Outside the US
26
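The federation idea mentioned above can be sketched as metadata-only discovery: a query fans out across several archives' catalogs, so data that legally cannot move across borders can still be found. The archive names and catalog entries below are invented for illustration.

```python
# Hypothetical sketch of federated discovery: metadata queries fan out to
# several archives, so data that cannot move across borders can still be
# found. Archive names and their catalogs are invented for illustration.

ARCHIVES = {
    "nda_us":  [{"study": "autism_imaging", "modality": "MRI", "n": 450}],
    "site_eu": [{"study": "psychosis_eeg",  "modality": "EEG", "n": 120}],
}

def federated_search(modality):
    """Search every archive's metadata catalog; the data itself never moves."""
    hits = []
    for archive, catalog in ARCHIVES.items():
        for entry in catalog:
            if entry["modality"] == modality:
                hits.append({"archive": archive, **entry})
    return hits

print(federated_search("EEG"))  # the EU-held study is still discoverable
```

Only descriptive metadata crosses the federation boundary; access to subject-level data would still go through whatever process the holding archive's laws and consents require.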
27. • For NDA, submitting data is separate from sharing that data
with the research community.
• Data are shared when the grant is complete or when a
paper is published.
• Other sharing timelines are possible.
• No matter when the data are shared, data need to be
submitted on a regular basis. This ensures that the data
from a grant award have been submitted before funding is
exhausted. More importantly, periodic data submission
ensures that the data have undergone basic QC checks as
they are collected.
Policy – Timeline for Data Sharing
27
28. • Responding to a number of instances of high visibility/impact
experiments that were not thoughtfully designed, NIH (and
NIMH) has instituted a number of programs to enhance
rigor and reproducibility in research supported by NIH.
• These discussions with the community started in June 2012.
The new guidelines to increase rigor and reproducibility are
outlined in NOT-OD-15-103 and at a web site
(https://www.nih.gov/research-training/rigor-reproducibility).
• Data archives play an important role in improving the rigor
and reproducibility of NIMH-funded research.
Rigor and Reproducibility – Data Archives Help
28
29. • Data dictionaries are a key part of the NDA infrastructure. Each item in a data
dictionary has an allowable range of values. The NDA has a validation tool that
allows users to check a data set to see if it conforms with the allowable ranges and
formats in a data dictionary.
• Because of our mandated data deposition schedule, the validation tool allows labs
to find errors every 6 months when data are deposited (or more often if they
choose).
Rigor and Reproducibility - 2
29
30. • The NDA makes it easy to identify the data associated with a publication, and we
assign a DOI to that dataset to make it trivial for the research community to find the
data.
• Identifying the data from a publication allows researchers to look at all of the data
collected under an award and compare that to the data used in the publication.
Rigor and Reproducibility - 3
30
31. • There are “professional” research participants who seem to make a
living volunteering for clinical studies. Websites exist that make it
relatively easy for such participants to find out the right answers to
screening questions to be admitted to a study. Clearly, this can be
dangerous to the volunteer and can also put the rigor and reproducibility
of the study at risk.
• Recruitment in certain diagnostic categories may take place in a small
number of clinical centers. This means that papers from many different
research groups may be sampling from a smaller population than the
“independent” papers might suggest.
• The NDA GUID helps the research community understand the size of
these problems and deal with these issues.
• There are also commercial services that aid in the screening for
someone who is participating in multiple clinical trials.
GUIDs and Rigor/Reproducibility
31
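One way the GUID helps the community size up the "professional participant" and overlapping-recruitment problems above is simple cross-study counting: the same identifier appearing in several rosters flags a subject enrolled in more than one study. The study rosters below are invented for illustration.

```python
from collections import Counter

# Sketch of how a shared GUID exposes overlapping enrollment: the same
# identifier appearing in several study rosters flags a possibly re-used
# subject. Rosters here are invented for illustration.

ROSTERS = {
    "study_a": ["G001", "G002", "G003"],
    "study_b": ["G002", "G004"],
    "study_c": ["G002", "G003"],
}

def multiply_enrolled(rosters):
    """Return GUIDs that appear in more than one study roster."""
    counts = Counter(g for roster in rosters.values() for g in roster)
    return sorted(g for g, n in counts.items() if n > 1)

print(multiply_enrolled(ROSTERS))  # ['G002', 'G003']
```

A count like this also reveals when nominally independent papers are sampling the same small clinical population, which matters when judging how independent their replications really are.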
33. 1) NIMH Data Archive – very heterogeneous data collected in multiple
laboratories. NDA attempts to aggregate this data using a global
unique identifier system as well as data dictionaries to describe the
myriad experiments.
2) Human Connectome Program – heterogeneous data (clinical
assessments, imaging, MEG, genomics) collected using a common
protocol. The first phase of this project involved data collection from
typical research subjects at a single site. The project has recently
expanded to include data collected across the lifespan for control
subjects as well as from subjects with a diagnosis. Those datasets are
collected at multiple laboratories, but still use similar data collection
protocols.
Two Different Sorts of Data Archives
34. • The NIH Human Connectome Project (HCP) is supported by the NIH
Neuroscience Blueprint ICs
• The HCP is an ambitious effort to map the neural pathways that underlie
human brain function. The overarching purpose of the Project is to
acquire and share data about the structural and functional connectivity of
the human brain. It has greatly advanced the capabilities for imaging and
analyzing brain connections, resulting in improved sensitivity, resolution,
and utility, thereby accelerating progress in the emerging field of human
connectomics.
• Phase 1 of the HCP resulted in two awards
■ David Van Essen and Kamil Ugurbil, Wash U and U Minnesota
■ Bruce Rosen, MGH
Human Connectome Project
34
35. 1) Deliver advanced MRI scanners and
techniques with high spatial and
temporal resolution for functional
and diffusion MRI.
■ Both the MGH and the Wash U MRIs
worked as designed and were able
to collect data quickly. Siemens
learned a great deal from
collaborating on both instruments,
and their new family of 3T MRIs (the
Prisma) has operating characteristics
similar to the Wash U scanner.
▪ The supplements to port the pulse
sequences to other laboratories and
to other manufacturers have also
been successful.
Phase 1 Connectome Accomplishments
35
36. 2) Deliver high quality data to the research community
■ Wash U has released data from 1200 subjects. This includes
behavioral assessments, structural MRIs, rs fMRI, task fMRI, and
diffusion experiments. MEG data has also just been released. This is
the first time that a large imaging award adopted “genome speed” data
release.
■ Data from MGH are being made available on their web site as well as
at the Wash U web site.
■ The data are being widely used by the research community. More
than 100 papers cited the Wash U grant at the point where data
collection was only half complete.
■ High visibility papers have appeared. Researchers from outside the
WU-Minn collaboration have authored some of those papers.
Phase 1 Connectome Accomplishments
36
37. • Based on the results from the original connectome project (which
collected data on 22-34 year old healthy subjects), NIH decided to fund
awards for a lifespan connectome. Three awards have been made that
will cover the age range from birth to the oldest old (90+).
• In addition, NIH has funded 14 awards to measure connectomics on
groups that have some sort of diagnosis (Alzheimer's, low vision,
dementia, epilepsy, mood and anxiety disorders, psychosis, …).
• Over 8,000 subjects are participating and nearly 12,000 scans are
expected in the data infrastructure by 2021. Phenotypic and clinical
assessments as well as other non-MRI data are being collected and
made available.
• In addition, the Adolescent Brain Cognitive Development (ABCD) study
has chosen to use the connectome data collection protocol. That study
intends to enroll 10,000 children aged 9-10 and follow them into early
adulthood. This dataset requires a data access agreement.
Connectome Today
37
38. • A Connectome Coordination Facility has been created to hold all of the
data (https://www.humanconnectome.org/).
• The original HCP consents allowed almost unlimited access to the data
(clinical and phenotypic data as well as the MRI and MEG data).
• An individual who wants data simply enters a working web site into
the registration system and certifies that they will not attempt to
re-identify any of the research participants.
• Many of the original participants are part of the Missouri twin study. This
caused some of the measured data (family structure, substance use) to
be declared sensitive. The sensitive data had a more restrictive data
access protocol.
HCP Data
38
42. • Open access clearly resulted in a lot of data use – transfers, papers, …
• Open access probably helped the community to adopt HCP data
collection as the current standard
• Even in this open access data set, there is still some information that is
sensitive and requires approval. When the data were at Wash U, the
only penalty for misusing the data was loss of further access to the data.
• No penalties were ever imposed for mistreating data.
• ABCD early data availability (needs DAC) seems to be around the same
level as HCP – does this mean that researchers will do what it takes to
get good data?
• Probably the key question to think about when deciding between open
access and a more restrictive model is what penalties need to be
imposed if the data are not treated in accord with the data access
agreement.
HCP – Open Access
42
44. • Understanding complex diseases requires lots of different data
from a variety of sources.
• Informed consents, national laws concerning data sharing,
and investigator preferences can all restrict the aggregation
of data.
• All of those issues can be solved, with some effort.
• If you have the option, the decision whether to share data under a
very open access model or in a federal database should be based
on what needs to happen if the data are not treated appropriately.
• Even though it is easier to get data from an open repository,
early results from the ABCD project suggest that users will
take the steps needed to get access to high quality data.
Summary