This document discusses why journals should ask authors to include Research Resource Identifiers (RRIDs) in their manuscripts. RRIDs help answer questions about what antibodies, animals, cell lines, or software tools were used in a study and allow others to find papers that used the same resources. The document notes that RRIDs improve reproducibility by making materials and methods more transparent. It also discusses how RRIDs can help identify problematic resources like contaminated cell lines or antibodies that do not work or are no longer available. The document provides examples of journals that now require RRIDs and how compliance is implemented.
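One thing RRIDs buy journals is machine findability: a correctly formatted RRID can be located in methods text with a trivial pattern match. Below is a minimal, hedged Python sketch of that idea; the pattern and the example catalog numbers are illustrative assumptions, not an official RRID grammar.

```python
import re

# A simple pattern for spotting RRID-style citations in methods text.
# The authority prefixes implied here (e.g. AB_ for antibodies, SCR_ for
# software) are illustrative; this is not an official RRID grammar.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Z]+_[A-Za-z0-9_]+)")

methods_text = (
    "Sections were stained with anti-GFAP (Dako Cat# Z0334, RRID:AB_10013382) "
    "and images were analyzed in ImageJ (RRID:SCR_003070)."
)

# findall returns the captured identifier portion of each RRID citation.
found = RRID_PATTERN.findall(methods_text)
print(found)  # ['AB_10013382', 'SCR_003070']
```

Because the syntax is this regular, publishers and aggregators can link every occurrence of a resource across papers, which is what makes the tracking described above possible.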
Presentation on the Resource Identification Pilot Project, an initiative to develop a machine-processable citation system for key research resources used in scientific studies
This presentation was provided by Leslie McIntosh of Ripeta, during the NISO hot topic event "Preprints." The virtual conference was held on April 21, 2021.
Identifying and tracking research resources using RRIDs: a practical approach (dkNET)
In this presentation, you will learn (1) why you should use Research Resource Identifiers (RRIDs), (2) what the Resource Identification Initiative is, (3) how dkNET.org supports RRIDs, and (4) what you can do with RRIDs.
RSC|ChemSpider is one of the world’s largest online resources for chemistry-related data and services. Developed to deliver access to structure-based chemistry data via the internet, the ChemSpider platform hosts over 26 million unique chemical compounds aggregated from over 400 data sources and provides an environment for the community to annotate and curate these existing data as well as deposit new data to the system. The search system delivers flexible querying capabilities together with links to external sites for publication and patent data. This presentation will review the present capabilities of the ChemSpider system, providing direct examples of how to use the system to source high-quality data of value to chemists. We will discuss some of the challenges associated with validating data quality and examine how ChemSpider is part of the new “semantic web for chemistry”. ChemSpider has also spawned a number of additional projects, including ChemSpider SyntheticPages for hosting openly peer-reviewed chemical synthesis articles, the Learn Chemistry Wiki for students learning chemistry, and SpectraSchool for learning spectroscopy.
This presentation was provided by Alberto Pepe of Authorea, during the NISO hot topic event "Preprints." The virtual conference was held on April 21, 2021.
Using ADAGE for pathway-style analyses (Casey Greene)
This talk was given at the Simons Institute Network Biology workshop. A video of the talk is available online:
https://www.youtube.com/watch?v=HpXDoMi4YO8
No Boundary Thinking in Bioinformatics Workshop Keynote (Casey Greene)
"The bounty of the commons"
In this talk, we explore how public data can become more valuable with reuse. This reuse helps us get to the bottom of cases where we are certain and wrong and helps us ask better questions.
This presentation was provided by Kathryn Funk of the National Library of Medicine, during the NISO hot topic event "Preprints." The virtual conference was held on April 21, 2021.
Poster presentation about the Resource Identification Initiative (http://www.force11.org/Resource_identification_initiative) at the Research Data Alliance meeting in Dublin, Ireland in March 2014 (https://rd-alliance.org/rda-third-plenary-meeting.html).
Profiling systems have achieved notable adoption by research institutions [1]. Multi-site search of research profiling systems has substantially evolved since the first deployment of systems such as DIRECT2Experts [2]. CTSAsearch is a federated search engine using VIVO-compliant Linked Open Data (LOD) published by members of the NIH-funded Clinical and Translational Science (CTSA) consortium and other interested parties. Sixty-four institutions are currently included, spanning six distinct platforms and three continents (North America, Europe, and Australia). In aggregate, CTSAsearch has data on 150,000-300,000 unique researchers and their 10 million publications. The public interface is available at http://research.icts.uiowa.edu/polyglot.
Two Clinical Workflows - From Unfiltered Variants to a Clinical Report (Golden Helix Inc)
Clinical labs need to be able to process samples down to a short list of variants and publish a professional report. Two common clinical applications for genetic tests include Cancer Gene Panels and Whole Exome Trios. Using VarSeq and VSReports, we will demonstrate how easy it is to go from a variant file created by a secondary analysis pipeline containing unfiltered variants to a report containing information for variants of interest. Along the way we will discuss tips and tricks and answer frequently asked questions to help you get the most out of your data!
The Center for Expanded Data Annotation and Retrieval (CEDAR) has developed a suite of tools and services that allow scientists to create and publish metadata describing scientific experiments. Using these tools and services—referred to collectively as the CEDAR Workbench—scientists can collaboratively author metadata and submit them to public repositories. A key focus of our software is semantically enriching metadata with ontology terms. The system combines emerging technologies, such as JSON-LD and graph databases, with modern software development technologies, such as microservices and container platforms. The result is a suite of user-friendly, Web-based tools and REST APIs that provide a versatile end-to-end solution to the problems of metadata authoring and management. This talk presents the architecture of the CEDAR Workbench and focuses on the technology choices made to construct an easily usable, open system that allows users to create and publish semantically enriched metadata in standard Web formats.
Access to scientific information has changed in a manner that was likely never imagined by the early pioneers of the internet. The quantities of data, the array of tools available to search and analyze, the devices, and the shift in community participation continue to expand, while the pace of change does not appear to be slowing. ChemSpider is one of the chemistry community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves tens of thousands of chemists every day and serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts, and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of this eScience cheminformatics platform and the nature of the solutions that it helps to enable, including structure validation, text mining and semantic markup, the National Chemical Database Service for the United Kingdom, and the development of a chemistry data repository. We will also discuss the possibilities it offers in the domain of crowdsourcing and open data sharing. The future of scientific information and communication will be underpinned by these efforts and influenced by increasing participation from the scientific community and facilitated collaboration, ultimately accelerating scientific progress.
The metadata about scientific experiments are crucial for finding, reproducing, and reusing the data that the metadata describe. We present a study of the quality of the metadata stored in BioSample, a repository of metadata about samples used in biomedical experiments managed by the U.S. National Center for Biotechnology Information (NCBI). We tested whether 6.6 million BioSample metadata records are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the analyzed metadata. The BioSample metadata field names and their values are not standardized or controlled: 15% of the metadata fields use field names not specified in the BioSample data dictionary. Only 9 out of 452 BioSample-specified fields ordinarily require ontology terms as values, and the quality of these controlled fields is better than that of uncontrolled ones, as even simple binary or numeric fields are often populated with inadequate values of different data types (e.g., only 27% of Boolean values are valid). Overall, the metadata in BioSample reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. These aberrancies in the metadata are likely to impede search and secondary use of the associated datasets.
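The kind of value check this study describes can be sketched in a few lines. The field name and records below are hypothetical stand-ins, not actual BioSample data; the point is only to show what testing a nominally Boolean field against its requirement looks like.

```python
# Hypothetical metadata records; "is_tumor" stands in for a field whose
# requirement says the value must be Boolean.
VALID_BOOLEANS = {"true", "false", "yes", "no"}

records = [
    {"sample_id": "S1", "is_tumor": "true"},
    {"sample_id": "S2", "is_tumor": "not applicable"},  # free text, not Boolean
    {"sample_id": "S3", "is_tumor": "2"},               # wrong data type
]

# Keep only records whose value satisfies the stated Boolean requirement.
valid = [r for r in records if r["is_tumor"].lower() in VALID_BOOLEANS]
print(len(valid), "of", len(records), "records have a valid Boolean value")
```

Run at repository scale, checks like this are what surface figures such as "only 27% of Boolean values are valid."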
Model organisms such as budding yeast provide a common platform to interrogate and understand cellular and physiological processes. Knowledge about model organisms, whether generated during the course of scientific investigation or extracted from published articles, is made available by model organism databases (MODs) such as the Saccharomyces Genome Database (SGD) for powerful, data-driven bioinformatic analyses. Integrative platforms such as InterMine offer a standard platform for MOD data exploration and data mining. Yet today’s bioinformatic analyses also require access to a significantly broader set of structured biomedical data, such as what can be found in the emerging network of Linked Open Data (LOD). If MOD data could be provisioned as FAIR (Findable, Accessible, Interoperable, and Reusable), then scientists could leverage a greater amount of interoperable data in knowledge discovery.
The goal of this proposal is to increase the utility of MOD data by implementing standards-compliant data access interfaces that interoperate with Linked Data. We will focus our efforts on developing interfaces for data access, data retrieval, and query answering for SGD. Our software will publish InterMine data as LOD that are semantically annotated with ontologies and can be retrieved using standardized formats (e.g., JSON-LD, Turtle). We will facilitate the exploration of MOD data for hypothesis testing by implementing efficient query answering using Linked Data Fragments and by developing a set of graphical user interfaces to search for data of interest, explore connections, and answer questions that leverage the wider LOD network. Finally, we will develop a locally and cloud-deployable image to enable rapid deployment of the proposed infrastructure. Our efforts to increase interoperability and ease of deployment for biomedical data repositories will increase research productivity and reduce costs associated with data integration and warehouse maintenance.
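As a rough illustration of what "publishing a record as JSON-LD" means in practice, here is a minimal sketch. The @context terms, URIs, and the gene entry are made-up placeholders, not SGD's or InterMine's actual vocabulary.

```python
import json

# A single hypothetical gene record expressed as JSON-LD: the @context maps
# short keys to vocabulary URIs, and @id gives the record a resolvable identity.
record = {
    "@context": {
        "symbol": "http://example.org/vocab/symbol",
        "organism": "http://example.org/vocab/organism",
    },
    "@id": "http://example.org/gene/EXAMPLE1",
    "symbol": "EXAMPLE1",
    "organism": "Saccharomyces cerevisiae",
}

print(json.dumps(record, indent=2))
```

The value of the format is that any Linked Data client can interpret the keys via the @context, so the same record can be merged with data from other LOD sources without bespoke parsing.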
This is a presentation that I delivered at the ACS Division of Chemical Information meeting regarding "Reproducibility, Reporting, Sharing & Plagiarism".
I took the opportunity to set aside my hat as VP of Strategic Development at RSC and member of the cheminformatics group that built ChemSpider and works on other RSC projects related to it. Instead, I presented on how a LACK OF MANDATES from publishers regarding the submission of data accompanying articles I am involved with writing is actually weakening my scientific record, as data is not getting shared in the most useful forms possible for the benefit of the community. I think there would be benefits for publishers to start pushing me for MORE data, in fairly general standards, and allowing me (and others) to download the data in the form of molecules (and collections), spectral data, CSV files, etc.
Backbone taxonomies, data aggregation, and early career systematists: something's got to give
Entomological Collections Network presentation by M. Andrew Johnston and Nico M. Franz, Denver, Colorado Nov 4, 2017
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To... (dkNET)
Abstract
dkNET’s Resource Reports (https://dknet.org/rin/rrids) enable researchers to discover research resources that would be useful for their research. The Resource Report’s integrated data set and analytics platform combines Research Resource Identifiers (RRIDs), text mining, and data aggregation to help you identify key biomedical resources, track these resources, and compare their performance. Resource Reports offer a detailed overview of each resource along with citation metrics from the biomedical literature and even information about which resources have been used together. You'll gain insights about who is using particular resources and how the community views those resources, including usage in published protocols.
dkNET Co-PI Dr. Jeffrey Grethe will give live demos during this webinar, including:
- How to find and select a research resource such as an antibody or a cell line
- How to find Research Resource Identifiers (RRIDs) and proper citation of your resources
- How to register resources to obtain RRIDs if the resources do not exist in the system
We hope this short webinar shows you how to use this tool to shape your research activities.
Presenter: Jeffrey Grethe, PhD, dkNET Co-Principal Investigator, University of California San Diego
Upcoming webinars schedule: https://dknet.org/about/webinar
Given at the NIH stock center directors meeting, August 8, 2016. Author: Anita Bandrowski
Project: Resource Identification Initiative http://scicrunch.org/resources
Topic: How is model organism data being used in literature
Research Data Alliance (RDA) Webinar: What do you really know about that antibody? Ask dkNET (dkNET)
Research resources, defined here as the tools researchers use in their scientific studies, are a foundation of the biomedical enterprise. It is critical for researchers to be able to select the proper tools for their research, but also to be aware of any issues that may arise in their application. Software tools and datasets may have bugs, cell lines get contaminated, knockouts may be incomplete, and antibodies may have specificity problems. Such problematic resources can continue to be used in scientific studies, even after problems are detected. Many factors, including the inability to easily retrieve alerts about problematic resources, result in their continued use, wasting both time and money. To make it easy to find information about research resources and how they perform, dkNET (NIDDK Information Network, https://dknet.org), an online portal supported by the US National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), has developed a resource information network that utilizes Research Resource Identifiers (RRIDs) and natural language processing to aggregate information about individual antibodies, cell lines, organisms, digital tools, plasmids, and biosamples. This information is presented in a Resource Report that provides information such as which papers have been published using these resources, who is using them, and whether issues have been reported. Using this information, dkNET also provides tools to create authentication reports in support of the NIH rigor and reproducibility guidelines. The dkNET portal includes additional information to enable researchers to easily use and navigate large amounts of data and information about research resources in support of reproducible science.
By the end of this webinar, participants will be familiar with the services and tools provided at dkNET and will be able to create a detailed research resource report and produce an authentication report in support of NIH mandates and policies.
Presenter: Maryann Martone, PhD, FAIR Data Informatics Lab (FDI Lab), University of California, San Diego
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch, 06/02/2023 (dkNET)
Presenter: Jeffrey Grethe, PhD, dkNET Principal Investigator, University of California San Diego
Abstract
The dkNET (NIDDK Information Network) team is announcing an exciting new service, Biomed Resource Watch (BRW, https://scicrunch.org/ResourceWatch), a knowledge base for aggregating and disseminating known problems and performance information about research resources such as antibodies, cell lines, and tools. We aggregate trustworthy information from authorized sources such as Cellosaurus, the Antibody Registry, the Human Protein Atlas, ENCODE, and many more. In addition, BRW includes antibody specificity information extracted from the literature via natural language processing. BRW provides researchers and curators with an easy-to-use interface to report their claims about a specific resource. Researchers can check information about a resource before planning their experiments via BRW-enhanced Resource Reports. This new service aims to improve efficiency in selecting appropriate resources, enhance scientific rigor and reproducibility, and promote a FAIR (Findable, Accessible, Interoperable, Reusable) research resource ecosystem in the biomedical research community.
Join us for a webinar to introduce the following resources & topics:
1. An overview of dkNET
2. How Resource Reports benefit you
3. Biomed Resource Watch
3.1 Navigating Biomed Resource Watch
3.2 How to Submit a Claim
Upcoming webinars schedule: https://dknet.org/about/webinar
Recommendations for infrastructure and incentives for open science, presented to the Research Data Alliance 6th Plenary. Presenter: William Gunn, Director of Scholarly Communications for Mendeley.
How can we ensure research data is re-usable? The role of Publishers in Research Data Management, by Catriona MacCallum (LEARN Project). 2nd LEARN Workshop, Vienna, 6 April 2016.
Dr. Edward Kai-Hua Chow, JALA Associate Editor/Asia and National University of Singapore, shares his SLAS2013 JALA and JBS Authors Workshop presentation. Learn more about these leading peer-reviewed journals, and then see Ed's tips for publication beginning on slide 16.
ORCID Implementation in Open Access Repositories and Institutional Research I... (Simeon Warner)
Slides from presentation with Pablo de Castro at Open Repositories 2013 (http://or2012.net/)
ORCID provides individual researchers and scholars with a persistent unique identifier. Initial adoption has been rapid but the full benefit will be realized only if ORCID iDs are used by all stakeholder communities. ORCID iDs enable reuse of items in new contexts by making connections between items from the same author in different places. Through its author-focused approach ORCID will contribute to bridging the current divide between management of publications and research data, which are often carried out in independent ways through different, frequently disconnected kinds of repositories. We discuss procedures and strategies for ORCID iD implementation in two different contexts: Open Access repositories, and institutional research information management systems.
Crediting informatics and data folks in life science teams (Carole Goble)
Science Europe LEGS Committee: Career Pathways in Multidisciplinary Research: How to Assess the Contributions of Single Authors in Large Teams, 1-2 Dec 2015, Brussels
The People Behind Research Software: crediting from the informatics and technical point of view
RSC|ChemSpider is one of the world’s largest online resources for chemistry-related data and services. Developed with the intention of delivering access to structure-based chemistry data via the internet, the ChemSpider platform hosts over 26 million unique chemical compounds aggregated from over 400 data sources and provides an environment for the community both to annotate and curate these existing data and to deposit new data into the system. The search system delivers flexible querying capabilities together with links to external sites for publication and patent data. ChemSpider has spawned a number of projects, including ChemSpider SyntheticPages for hosting openly peer-reviewed chemical synthesis articles. This presentation will review the present capabilities of the ChemSpider system, providing direct examples of how to use the system to source high-quality data of value to pharmaceutical companies. We will discuss some of the challenges associated with validating data quality, examine how ChemSpider is part of the semantic web for chemistry, and investigate approaches to integrating ChemSpider with analytical instrumentation.
Equivalence is in the (ID) of the beholder (mhaendel)
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string-matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, prior rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight the discrepancies in conceptual definitions for manual review. This is especially useful for data sources whose concepts drift and differ across annotations, such as diseases. The syntactic issue is that many variations of the same identifier exist, making data joins difficult. We present a framework to reconcile and provide authoritative, integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes, and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for the infrastructure to handle identifiers in a consistent fashion. Finally, this architecture also allows resources to be self-describing "beacons" with respect to their identifiers.
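The prefix infrastructure described here can be sketched with a JSON-LD-style prefix map; the two prefixes and IRI bases below are illustrative assumptions, not an authoritative registry:

```python
# Minimal sketch of CURIE expansion/contraction against a JSON-LD-style
# prefix map. The prefix-to-IRI entries are illustrative assumptions.
PREFIX_MAP = {
    "RRID": "https://scicrunch.org/resolver/RRID:",
    "DOID": "http://purl.obolibrary.org/obo/DOID_",
}

def expand_curie(curie: str) -> str:
    """Expand a prefixed identifier (CURIE) into a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIX_MAP:
        raise KeyError(f"Unknown prefix: {prefix}")
    return PREFIX_MAP[prefix] + local

def contract_iri(iri: str) -> str:
    """Contract a full IRI back to its CURIE, if a known prefix matches."""
    for prefix, base in PREFIX_MAP.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri  # leave unmapped IRIs as-is
```

Consolidating such maps in one shared repository is what lets every consumer expand and contract identifiers the same way, so that joins across data sources succeed.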
Applied semantic technology and linked data (William Smith)
Mapping a human brain generates petabytes of gene listings and the corresponding locations of these genes throughout the human brain. Due to the size of this dataset, a prototype Semantic Web application was created with the unique ability to link new datasets from similar fields of research and present these new models to an online community. The resulting application presents a large set of gene-to-location mappings and provides new information about diseases, drugs, and side effects in relation to the genes and areas of the human brain.
In this presentation we will discuss the normalization processes and tools for adding new datasets, the user experience throughout the publishing process, the underlying technologies behind the application, and demonstrate the preliminary use cases of the project.
The Neuroscience Information Framework has over 100 big-data databases indexed, allowing us to ask big-data landscape questions. Anita Bandrowski presents an overview of the NIF system and provides insights into the addiction data landscape to JAX laboratories.
Anita Bandrowski explains how the uniform resource layer of the Neuroscience Information Framework allows several interesting questions about the state of scientific research to be answered.
Maryann Martone
Making Sense of Biological Systems: Using Knowledge Mining to Improve and Validate Models of Living Systems; NIH COBRE Center for the Analysis of Cellular Mechanisms and Systems Biology, Montana State University, Bozeman, MT
August 24, 2012
Cancer cell metabolism: special reference to the lactate pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
In Cancer Cells:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
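The 2-versus-36 comparison is plain arithmetic; a minimal sketch using the approximate textbook yields quoted above:

```python
# Approximate ATP yields per glucose molecule (the textbook figures above).
ATP_GLYCOLYSIS_ONLY = 2      # glycolysis alone, as in many cancer cells
ATP_FULL_RESPIRATION = 36    # glycolysis + Krebs cycle + oxidative phosphorylation

def glucose_needed(atp_required: int, atp_per_glucose: int) -> int:
    """Glucose molecules needed to produce a given amount of ATP (rounded up)."""
    return -(-atp_required // atp_per_glucose)  # ceiling division

# For the same 360 ATP, a cell relying on glycolysis alone must consume
# 18x more glucose than a cell running full respiration.
normal_cell = glucose_needed(360, ATP_FULL_RESPIRATION)
cancer_cell = glucose_needed(360, ATP_GLYCOLYSIS_ONLY)
```

This is why highly glycolytic tumor cells show the elevated glucose uptake described in the Warburg effect below.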
Introduction to the Warburg Phenomenon:
Warburg effect: cancer cells are usually highly glycolytic (glucose addiction) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his "discovery of the nature and mode of action of the respiratory enzyme."
Warburg effect: the tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg observed that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional foods, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10⁷–10⁸ M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr⁻¹. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Multi-source connectivity as the driver of solar wind variability in the heli... (Sérgio Sacani)
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous, plasma streams from coronal holes and slow-speed, highly variable, streams whose source regions are under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast dynamic biological processes such as immune cell tracking and cell-cell interaction, as well as vascularization and tumor metastasis, with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Comparative structure of adrenal gland in vertebrates
Why should Journals ask for RRIDs?
1. Why should Journals ask for RRIDs?
It answers two questions:
Which antibodies (or mice, cell lines) are used in this paper?
What other papers use this antibody?
RRIDs are in use for Antibodies, Animals, Cell Lines and Software tools.
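Because an RRID is a short, regular string in the text of a paper (e.g. RRID:AB_2298772 for an antibody or RRID:IMSR_JAX:000664 for a mouse line), both questions can be answered by plain text search. A minimal extraction sketch; the pattern below is an illustrative approximation, not the official RRID grammar:

```python
import re

# Illustrative pattern covering common RRID forms: antibodies (AB_),
# software (SCR_), cell lines (CVCL_), and stock-center mouse IDs
# (e.g. IMSR_JAX:000664). An approximation, not the official grammar.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Z]+_[A-Za-z0-9]+(?::[0-9]+)?)")

def find_rrids(text: str) -> list:
    """Return all RRIDs cited in a block of methods text."""
    return ["RRID:" + accession for accession in RRID_PATTERN.findall(text)]
```

Running the same extraction over a corpus answers the second question: every paper whose methods text yields a given RRID used that resource.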
Questions: Anita Bandrowski PhD.
abandrowski@ucsd.edu
2. Quality of published research is in question
Solving quality: mandates and solutions
NIH: NOT-OD-16-011
Journals: RRIDs
Societies: FASEB Guidelines
Non-Profits: TOP Guidelines
Reproducibility of science is important
Transparency in science is achievable
5. How common is this?
Papers are currently poor at identifying the simplest part of the paper: the materials used.
Vasilevsky 2013
6. Resource IDs from NIF aggregated databases
• A single portal for authors
• >25 authoritative databases (stock center IDs are RRIDs)
• One search interface
• Simple directions
• Prominent “Cite This” button
• Help desk (well used!); response time is 1 business day or less
One portal simplifies the authors’ job.
http://scicrunch.org/resources
Author searches for an antibody / mouse → author copies the ”Cite This” text into the manuscript → paper is published.
7. RRIDs = Better papers
Bandrowski et al, 2015
Vasilevsky’s analysis re-run by Vasilevsky for the RRID pilot; first 100 papers.
8. RRIDs are Persistent Unique IDentifiers
1) The data set itself and the sources for the data are covered by a CC0 license; that is, the data set is freely available for others to take up should any source fail. This practice is recommended by Geoffrey Bilder of Crossref to ensure that key infrastructure is portable.
2) Unlike DOIs or PURLs, which absolutely have to resolve to the entity that they point to, RRIDs reference metadata about physical things in the world. That is, RRIDs only point to metadata about the thing, not the thing itself (mouse, antibody). Because RRIDs appear in the text next to the snippet of information that identifies them, you would in the future be able to identify the resource and a full set of metadata about it from the literature itself, without any special infrastructure. As a whole, publishers and libraries are committed to preserving the scientific literature.
3) We are planning to ensure that copies of landing pages for research resources referenced in the literature are placed in the Internet Archive. Should any source disappear, the current resolvers (identifiers.org, CDL’s, and SciCrunch’s) can point to these pages.
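Because an RRID names metadata rather than one landing page, several resolvers can serve the same identifier interchangeably. A minimal sketch of building fallback resolver URLs; the URL templates are assumptions based on the resolvers named here (SciCrunch, identifiers.org, CDL’s N2T):

```python
# Multiple resolvers can serve the same RRID; if one disappears, another
# can take over. URL templates are assumptions based on the resolvers
# named in the slides (SciCrunch, identifiers.org, CDL's N2T).
RESOLVERS = [
    "https://scicrunch.org/resolver/{rrid}",
    "https://identifiers.org/{rrid}",
    "https://n2t.net/{rrid}",
]

def resolver_urls(rrid: str) -> list:
    """Build candidate landing-page URLs for an RRID, in fallback order."""
    if not rrid.startswith("RRID:"):
        rrid = "RRID:" + rrid  # normalize a bare accession
    return [template.format(rrid=rrid) for template in RESOLVERS]
```

The fallback order is a design choice: because each resolver only redirects to metadata, swapping or reordering resolvers never changes what the identifier means.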
9. When is RRID critical?
…78 widely used cell lines that turned out to be overgrown with other cells…
Slides presented at SfN: NIH-led Rigor and Transparency Session, Nov 2016
10. RRID shows problem resources
Is the cell line I am about to publish contaminated?
• Comment: “Problematic cell line: Contaminated”
• Data is derived from the ICLAC list of contaminated cell lines, and it is curated / updated by Cellosaurus
• Contamination noted in 714 common cell lines
Not all vendors report contamination on their website; note that ATCC & ECACC do for 1-5c-4.
11. Do we publish papers with antibodies that don’t work?
12. Instruction to authors: How to authenticate antibodies?
• Based on Uhlen et al, 2016 (Nat Methods)
• Check with the antibodyregistry.org
• Catalog number and clone ID may be different; if the RRID is the same, it is the same product
• Choose antibodies that have been validated by the manufacturer for your application
• Choose antibodies with independent validation:
  • ENCODE / ScienceExchange
  • RRID publications
  • Original manufacturers (in progress of adding this tag)
• If no validation data is available, require full validation
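The instructions above amount to a small decision procedure. A sketch with hypothetical field names (this is an illustration of the decision order, not a real antibodyregistry.org API; the real judgment is made by a curator reading the registry record):

```python
# Hypothetical record fields mirroring the authentication checklist above.
def validation_requirement(antibody: dict) -> str:
    """Decide what validation evidence is still needed for an antibody."""
    if antibody.get("independent_validation"):        # ENCODE / ScienceExchange, etc.
        return "ok: independently validated"
    if antibody.get("vendor_validated_for_application"):
        return "ok: manufacturer-validated for this application"
    return "require full validation"                  # no validation data available
```

Note the precedence: independent validation is preferred over the manufacturer’s own claim, and absence of any evidence triggers the full-validation requirement.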
Example comments on antibody records:
• “discontinued 2016 due to animal welfare concerns”
• “Originating Manufacturer of this product; Tested applications: WB / ELISA”
• “ENCODE PROJECT External validation DATA SET is released”
• “ISO 9001:2008 and ISO 13485:2003”
• “Validated by ScienceExchange”
• “no validation data or MDS available from vendor”
13. Who does this?
Over 200 journals have RRID-containing papers across all major publishers.
(Journal adoption dates shown on the slide: Jun 2016, Jul 2016, Aug 2016, Oct 2016.)
14. What does compliance look like in a journal?
*Typesetting instructions in eLife provide links.
Identifiers.org and CDL’s N2T resolvers back SciCrunch resolution services.