With its focus on investigating the basis for the sustained existence of living systems, modern biology has always been a fertile, if not challenging, domain for formal knowledge representation and automated reasoning. With thousands of databases and hundreds of ontologies now available, there is a salient opportunity to integrate these for discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, develop methods to intelligently retrieve content of interest, uncover significant biological associations, and pursue new avenues for drug discovery. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability, and an understanding of how to maximize their value, researchers will be strategically poised to pursue increasingly sophisticated KR projects aimed at improving our overall understanding of human health and disease.
bio: Dr. Michel Dumontier is an Associate Professor of Medicine (Biomedical Informatics) at Stanford University. His research aims to find new treatments for rare and complex diseases. His research interests lie in the publication, integration, and discovery of scientific knowledge. Dr. Dumontier serves as a co-chair for the World Wide Web Consortium Semantic Web in Health Care and Life Sciences Interest Group (W3C HCLSIG) and is the Scientific Director for Bio2RDF, a widely used open-source project to create and provide linked data for the life sciences.
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata - Michel Dumontier
Biomedical researchers will remain stymied in their ability to take full advantage of the Big Data revolution if they can never find the datasets that they need to analyze, if there is a lack of clarity about what particular datasets contain, and if data are insufficiently described.
CEDAR, an NIH BD2K Center of Excellence, aims to develop methods and tools to vastly ease the burden of authoring good experimental metadata, and to maximally use this information to zero in on datasets of interest.
Semantic Web technologies offer a potential mechanism for the representation and integration of thousands of biomedical databases. Many of these databases offer cross-references to other data sources, but these are generally incomplete and prone to error. In this paper, we conduct an empirical analysis of the link structure of life science Linked Data, obtained from the Bio2RDF project. Three different link graphs, for datasets, entities, and terms, are characterized by degree, connectivity, and clustering metrics, and their correlation is measured as well. Furthermore, we use the symmetry and transitivity of entity links to build a benchmark and evaluate several popular entity-matching approaches. Our findings indicate that the life science data network can help find hidden links, can be used to validate links, and may offer a mechanism to integrate a wider set of resources to support biomedical knowledge discovery.
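As a rough illustration of the link-graph analyses described above, the sketch below builds a toy cross-reference graph with networkx, computes degree and clustering metrics, and flags asymmetric and missing-transitive links of the kind the benchmark exploits. The prefixes and identifiers are illustrative placeholders, not actual Bio2RDF content.

```python
# A toy version of the link-structure analysis; not the paper's actual pipeline.
import networkx as nx

# Toy entity-link graph: directed edges are cross-references between records.
links = [
    ("drugbank:DB00316", "pubchem:CID1983"),
    ("pubchem:CID1983", "drugbank:DB00316"),  # a symmetric pair
    ("pubchem:CID1983", "chebi:46195"),       # reciprocal link is missing
]
G = nx.DiGraph(links)

# Degree and clustering metrics, as used to characterize the link graphs.
print("degree:", dict(G.degree()))
print("clustering:", nx.clustering(G.to_undirected()))

# Symmetry: a cross-reference x -> y should be matched by y -> x.
asymmetric = [(u, v) for u, v in G.edges() if not G.has_edge(v, u)]
print("missing reciprocal links:", asymmetric)

# Transitivity: if x -> y and y -> z, then x -> z should also exist;
# absent cases are candidate hidden links.
hidden = [(u, w) for u, v in G.edges() for w in G.successors(v)
          if u != w and not G.has_edge(u, w)]
print("candidate hidden links:", hidden)
```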
Bio2RDF is an open-source project that offers a large and connected knowledge graph of Life Science Linked Data. Each dataset is expressed using its own vocabulary, thereby hindering the ability to integrate, search, query, and browse data across similar or identical types of data. With growth and content changes in source data, a manual approach to maintaining mappings has proven untenable. The aim of this work is to develop a (semi-)automated procedure to generate high-quality mappings between Bio2RDF and SIO using BioPortal ontologies. Our preliminary results demonstrate that our approach is promising in that it can find new mappings using a transitive closure over ontology mappings. Further development of the methodology, coupled with improvements in the ontology, will offer a better-integrated view of the Life Science Linked Data.
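A minimal sketch of the transitive-closure idea behind this procedure: if a Bio2RDF term maps to an intermediate BioPortal ontology term that in turn maps to SIO, the composed mapping becomes a candidate. The mapping pairs below are invented placeholders, not actual BioPortal output.

```python
# Toy transitive closure over term mappings; not the project's actual code.
from collections import defaultdict

mappings = [
    ("bio2rdf:Gene", "obo:SO_0000704"),    # Bio2RDF term -> intermediate term
    ("obo:SO_0000704", "sio:SIO_010035"),  # intermediate term -> SIO term
    ("bio2rdf:Protein", "obo:PR_000000001"),
]

adj = defaultdict(set)
for s, t in mappings:
    adj[s].add(t)

def closure(term):
    """All terms reachable from `term` by following mapping edges."""
    seen, stack = set(), [term]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Propose Bio2RDF -> SIO mappings that are only reachable transitively.
for term in ("bio2rdf:Gene", "bio2rdf:Protein"):
    inferred = {t for t in closure(term) if t.startswith("sio:")}
    print(term, "->", inferred or "no SIO mapping found")
```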
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval.
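To make the flavor of such descriptions concrete, here is a minimal sketch that emits an RDF dataset description with rdflib, using DCAT and Dublin Core terms; the dataset URI and values are placeholders, and a description conforming to the full specification would carry more elements (provenance, distributions, versioning links) than shown.

```python
# A minimal, hedged example of an RDF dataset description; placeholder values.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

ds = URIRef("http://example.org/dataset/mydata")  # placeholder dataset URI
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Example life-science dataset", lang="en")))
g.add((ds, DCTERMS.description,
       Literal("Illustrates versioned, licensed dataset metadata.", lang="en")))
g.add((ds, DCTERMS.license,
       URIRef("https://creativecommons.org/licenses/by/4.0/")))
g.add((ds, DCTERMS.hasVersion, Literal("1.0")))

print(g.serialize(format="turtle"))
```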
Model organisms such as budding yeast provide a common platform to interrogate and understand cellular and physiological processes. Knowledge about model organisms, whether generated during the course of scientific investigation or extracted from published articles, is made available by model organism databases (MODs) such as the Saccharomyces Genome Database (SGD) for powerful, data-driven bioinformatic analyses. Integrative platforms such as InterMine offer a standard platform for MOD data exploration and data mining. Yet today’s bioinformatic analyses also require access to a significantly broader set of structured biomedical data, such as what can be found in the emerging network of Linked Open Data (LOD). If MOD data could be provisioned as FAIR (Findable, Accessible, Interoperable, and Reusable), then scientists could leverage a greater amount of interoperable data in knowledge discovery.
The goal of this proposal is to increase the utility of MOD data by implementing standards-compliant data access interfaces that interoperate with Linked Data. We will focus our efforts on developing interfaces for data access, data retrieval, and query answering for SGD. Our software will publish InterMine data as LOD that are semantically annotated with ontologies and can be retrieved using standardized formats (e.g., JSON-LD, Turtle). We will facilitate the exploration of MOD data for hypothesis testing by implementing efficient query answering using Linked Data Fragments, and by developing a set of graphical user interfaces to search for data of interest, explore connections, and answer questions that leverage the wider LOD network. Finally, we will develop a locally and cloud-deployable image to enable the rapid deployment of the proposed infrastructure. Our efforts to increase interoperability and ease of deployment for biomedical data repositories will increase research productivity and reduce costs associated with data integration and warehouse maintenance.
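As a sketch of the Linked Data Fragments access pattern mentioned above: a client resolves a query one triple pattern at a time against a lightweight HTTP interface. The endpoint URL and gene identifier below are hypothetical stand-ins for whatever the proposed SGD interface would publish.

```python
# A minimal Triple Pattern Fragments client sketch; hypothetical endpoint.
import requests

ENDPOINT = "http://example.org/ldf/sgd"  # hypothetical TPF endpoint

def fetch_fragment(subject=None, predicate=None, obj=None):
    """Request one page of triples matching a single triple pattern."""
    params = {}
    if subject:
        params["subject"] = subject
    if predicate:
        params["predicate"] = predicate
    if obj:
        params["object"] = obj
    resp = requests.get(ENDPOINT, params=params,
                        headers={"Accept": "text/turtle"})
    resp.raise_for_status()
    return resp.text  # Turtle: matching triples plus paging/count metadata

# Example: all triples describing a (hypothetical) yeast gene record.
print(fetch_fragment(subject="http://example.org/sgd/S000002429"))
```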
Powering Scientific Discovery with the Semantic Web (VanBUG 2014) - Michel Dumontier
In the quest to translate the results of biomedical research into effective clinical applications, many are now trying to make sense of the large and rapidly growing amount of public biomedical data. However, substantial challenges exist in traversing the currently fragmented data landscape. In this talk, I will discuss our efforts to use Semantic Web technologies to facilitate biomedical research through the formulation, publication, integration, and exploration of facts, expert knowledge, and web services.
Building a Network of Interoperable and Independently Produced Linked and Ope... - Michel Dumontier
Over 15 years ago, Sir Tim Berners-Lee proclaimed the founding of an exciting new future involving intelligent agents operating over smarter data in order to perform complex tasks at the behest of their human controllers. At the heart of this vision lies an uneasy alliance between tedious formal knowledge representations and powerful analytics over big, but often messy, data. Bio2RDF, our decade-old open-source project to create Linked Data for the life sciences, has woven emergent Semantic Web technologies such as ontologies and Linked Data to generate FAIR (Findable, Accessible, Interoperable, and Reusable) data in the form of billions of machine-accessible statements for use in downstream biomedical discovery.
This revolution in data publication has been strengthened by action from global bioinformatics institutions such as the NCBI, NCBO, EBI, and DBCLS. Notably, NCBI's PubChem has successfully coupled large-scale data integration with community-based standards to offer a remarkable biochemical knowledge resource amenable to data-hungry discovery tools. Yet, in the face of increasing pressure from researchers, funders, and publishers, will these approaches be sufficient for growing and maintaining a comprehensive knowledge graph that is inclusive of all biomedical research?
A presentation to the New Year's Event for Maastricht University's Knowledge Engineering @ Work Program. https://www.maastrichtuniversity.nl/news/kework-first-10-students-academic-workstudy-track-graduate
Generating Biomedical Hypotheses Using Semantic Web Technologies - Michel Dumontier
With its focus on investigating the nature and basis for the sustained existence of living systems, modern biology has always been a fertile, if not challenging, domain for formal knowledge representation and automated reasoning. Over the past 15 years, hundreds of projects have developed or leveraged ontologies for entity recognition and relation extraction, semantic annotation, data integration, query answering, consistency checking, association mining, and other forms of knowledge discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, discover significant biological associations across these data using a set of partially overlapping ontologies, and identify new avenues for drug discovery by applying measures of semantic similarity over phenotypic descriptions. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability, and an understanding of how to maximize their value, increasing numbers of biomedical researchers will be strategically poised to pursue increasingly sophisticated KR projects aimed at improving our overall understanding of the capability and behavior of biological systems.
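To ground the semantic-similarity step mentioned here, below is a toy sketch of one common family of measures: Jaccard similarity over the ontology-ancestor closures of two phenotype-term sets. The miniature is-a hierarchy and term labels are invented; real analyses use curated ontologies and often information-content-weighted measures.

```python
# A toy ancestor-closure Jaccard similarity; invented mini phenotype ontology.
is_a = {  # child -> parents
    "HP:decreased_heart_rate": {"HP:heart_rate_abnormality"},
    "HP:increased_heart_rate": {"HP:heart_rate_abnormality"},
    "HP:heart_rate_abnormality": {"HP:cardiac_abnormality"},
    "HP:cardiac_abnormality": set(),
}

def ancestors(term):
    """The term itself plus all of its transitive is-a ancestors."""
    out, stack = {term}, [term]
    while stack:
        for p in is_a.get(stack.pop(), ()):
            if p not in out:
                out.add(p)
                stack.append(p)
    return out

def closure(terms):
    return set().union(*(ancestors(t) for t in terms))

def jaccard(a, b):
    ca, cb = closure(a), closure(b)
    return len(ca & cb) / len(ca | cb)

# Two phenotype profiles share ancestors, so they score non-zero similarity.
drug_profile = {"HP:decreased_heart_rate"}
disease_profile = {"HP:increased_heart_rate"}
print(jaccard(drug_profile, disease_profile))  # 0.5 via shared ancestors
```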
Citing data in research articles: principles, implementation, challenges - an... - FAIRDOM
Prepared and presented by Jo McEntyre (EMBL-EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany, September 14-16, 2015.
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For... - dkNET
Abstract
The Microphysiology Systems Database Center (MPS-DbC) developed and implemented the Microphysiology Systems Database (MPS-Db, https://mps.csb.pitt.edu/) for the management, analysis, sharing, and integration of preclinical and clinical information, and for computational modeling of data, in one platform, enhancing the value of in vitro models and the user workflow. The MPS-Db supports data from a wide range of in vitro models, including static and microfluidic 2D and 3D microplates, and microfluidic MPS for single- and multiple-organ models. Aggregation of metadata, experimental data, and references provides for robust and relevant interpretation of the results, and having a central repository facilitates data sharing among user-specified collaborators and groups. Ready access to experimental data and metadata from any in vitro platform, along with reference data in a mineable format, provides a convenient platform for statistical analysis of performance, and for building computational models to predict PK, identify compound mechanisms of action, and infer pathways of disease progression. The MPS-DbC assists users in capturing and managing MPS data, and the MPS-Db is the central repository for the Tissue Chip Testing Centers, as well as the NCATS Tissue Chips programs. We continue to build the research and commercial value of the MPS-Db by: 1) supporting MPS users to build content; 2) implementing on-line preclinical/clinical concordance analysis capabilities; 3) enhancing the suite of data mining and computational modeling tools; and 4) augmenting methods for ensuring data quality and the secure, controlled release of data to user-specified groups.
The top 3 key questions that the Microphysiology Systems Database (MPS-Db) can answer:
1. What models are available, what are their characteristics, how reproducible are they, and how can they be used?
2. How does organ model A compare with organ model B? For example, where model A and model B are constructed in different laboratories, on different days, or with different cells, such as iPSCs vs. primary cells.
3. Which readouts from an organ model are predictive of a specific clinical outcome and how reliable is the prediction?
Presenter: Bert Gough, PhD, Associate Professor of Computational and Systems Biology, Group Leader Informatics, University of Pittsburgh Drug Discovery Institute
Upcoming webinars schedule: https://dknet.org/about/webinar
My talk at the last-ever Open PHACTS project meeting in Vienna in 2016, where I was asked to talk about the challenges we addressed in Open PHACTS with Semantic Web technology and what still needed to be done.
Marco Brandizi and Keywan Hassani-Pak, Rothamsted Research, Invited Presentation at SWAT4HCLS 2022.
The FAIR data principles are a driving force in the life sciences and other scientific domains, helping researchers to share their data and unlock its full potential for integrating information and making novel discoveries. Knowledge graphs are an ever more popular paradigm to model data according to such principles, and technologies such as graph databases are emerging as complementary to approaches like linked data. All of this includes the agronomy, farming, and food domains. How advanced is the adoption of sound data management policies in these domains? How does it compare to other life sciences? In this presentation, we will talk about our practical experience, focusing on KnetMiner, a gene and molecular biology discovery platform, which is based on building and publishing knowledge graphs according to the FAIR principles, as well as using a mix of linked data standards for life sciences and recent graph database and API technologies. We will welcome questions and discussion from the audience about similar experiences.
In recent years there has been a dramatic increase in the number of freely accessible online databases serving the chemistry community. The internet provides chemistry data that can be used for data mining, for computer models, and for integration into systems to aid drug discovery. There is, however, a responsibility to ensure that the data are of high quality, so that time is not wasted on erroneous searches, that models are underpinned by accurate data, and that the improved discoverability of online resources is not marred by incorrect data. In this article we provide an overview of some of the authors' experiences using online chemical compound databases, critique the approaches taken to assemble data, and suggest approaches to deliver definitive reference data sources.
Opportunities and challenges presented by Wikidata in the context of biocuration - Benjamin Good
Abstract: Wikidata is a knowledge base, readable and writable by anyone, maintained by the Wikimedia Foundation. It offers the opportunity to collaboratively construct a fully open-access knowledge graph spanning biology, medicine, and all other domains of knowledge. To meet this potential, social and technical challenges must be overcome, many of which are familiar to the biocuration community. These include community ontology building, high-precision information extraction, provenance, and license management. By working together with Wikidata now, we can help shape it into a trustworthy, unencumbered central node in the Semantic Web of biomedical data.
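For a concrete feel of programmatic access to Wikidata, the sketch below runs a small SPARQL query against the public query service; P31 ("instance of") and Q7187 ("gene") are real Wikidata identifiers, and the query is a generic example rather than a tool from the abstract.

```python
# A minimal query against Wikidata's public SPARQL endpoint.
import requests

SPARQL = """
SELECT ?gene ?geneLabel WHERE {
  ?gene wdt:P31 wd:Q7187 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "biocuration-demo/0.1 (example)"},
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["gene"]["value"], row["geneLabel"]["value"])
```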
Laboratories around the world continue to generate immense amounts of data that are non-proprietary and of value to the community. If available, these data could dramatically reduce costs by minimizing rework and ultimately facilitating faster research. High-quality reference data collections of chemical compound dictionaries, properties, and spectra have been generated over many decades. With the advent of social networking tools and platforms such as Wikipedia, the community has an opportunity to contribute. The ChemSpider platform hosted by the Royal Society of Chemistry is a compound-centric database with associated data. Already populated with almost 25 million unique compounds, it lets the community deposit and host their own data, and curate and annotate existing data, including those generated in Open Notebook Science efforts. This presentation will provide an overview of progress to date and outline the vision of this community platform for chemistry and for ensuring the longevity of chemistry reference data.
Exploring Chemical and Biological Knowledge Spaces with PubChem - Paul Thiessen
My presentation for the Drug Repurposing workshop at the upcoming Bio-IT World Expo.
http://www.bio-itworldexpo.com/Bio-It_Expo_Content.aspx?id=124256
Presentation abstract:
PubChem has a wealth of chemical structure and biological activity information. In conjunction with NCBI’s other resources such as PubMed and GenBank, PubChem is a vast source of information relevant to repurposing not only established drugs but any compounds with in vivo pharmacology and/or clinical results. The challenge is how to take advantage of this knowledge. The ability to explore not only chemical similarity but also relationships between diseases and disease targets has crucial value in repurposing. While focused investigations are already possible within the existing Entrez system, navigation across these linked information spaces can be difficult to do on a large scale with current tools. We are actively developing new infrastructure to support such analyses, and pursuing new methods of exploring inter- and intra-database relationships between chemicals, targets, diseases, and patents. Progress and some future directions in these areas will be presented.
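As one illustration of navigating these linked information spaces with existing public machinery, the snippet below uses NCBI's E-utilities ELink call to fetch PubMed records linked to a PubChem compound; the compound (CID 2244, aspirin) is an arbitrary example, and this is not the new infrastructure the abstract describes.

```python
# A minimal ELink example over NCBI E-utilities; parameters per the E-utilities docs.
import requests

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

resp = requests.get(ELINK, params={
    "dbfrom": "pccompound",  # source: PubChem Compound
    "db": "pubmed",          # target: PubMed
    "id": "2244",            # CID 2244 (aspirin), as an arbitrary example
    "retmode": "json",
})
resp.raise_for_status()
for lsdb in resp.json()["linksets"][0].get("linksetdbs", []):
    # Each linksetdb names a link type and lists linked target record IDs.
    print(lsdb["linkname"], "->", lsdb["links"][:5], "...")
```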
With the explosion of interest in both enhanced knowledge management and open science, the past few years have seen considerable discussion about making scientific data “FAIR”: findable, accessible, interoperable, and reusable. The problem is that most scientific datasets are not FAIR. When left to their own devices, scientists do an absolutely terrible job creating the metadata that describe the experimental datasets that make their way into online repositories. The lack of standardization makes it extremely difficult for other investigators to locate relevant datasets, to re-analyse them, and to integrate those datasets with other data. The Center for Expanded Data Annotation and Retrieval (CEDAR) has the goal of enhancing the authoring of experimental metadata to make online datasets more useful to the scientific community. The CEDAR workbench for metadata management will be presented in this webinar. CEDAR illustrates the importance of semantic technology to driving open science. It also demonstrates a means for simplifying access to scientific datasets and enhancing the reuse of the data to drive new discoveries.
Can machines understand the scientific literature? - Peter Murray-Rust
With over 5000 scientific articles published per day, we need machines to help us understand the content. This material is to be used at an interactive session for the Science Society at Trinity College, Cambridge, UK.
From Coordination to Semantic Self-Organisation: A Perspective on the Enginee... - Andrea Omicini
After briefly recapitulating the classical lines of the literature on coordination models, we discuss the new lines of research that aim at addressing the coordination of complex systems, then focus on mechanisms and patterns of coordination for self-organising systems. The notions of semantic coordination and self-organising coordination are defined and shortly discussed, then a vision of SOSC (self-organising semantic coordination) is presented, along with some insights over available technologies and possible scenarios for SOSC.
Lecture at the PhD Mini-school, 11th National Workshop "From Objects to Agents" (WOA 2010), 05/09/2010, Bologna, Italy.
Introduction to Natural Language Processing - Rohit Nayak
Natural Language Processing has matured a lot recently. With the availability of great open-source tools complementing the needs of the Semantic Web, we believe this field should be on the radar of all software engineering professionals.
Introduction to Natural Language Processing - Pranav Gupta
The presentation gives a gist of the major tasks and challenges involved in natural language processing. In the second part, it covers one technique each for part-of-speech tagging and automatic text summarization.
Using Healthcare Data for Research @ The Hyve - Campus Party 2016 - Kees van Bochove
In this presentation, Kees van Bochove, founder & CEO of The Hyve, a services company in biomedical open source software, presents a number of different types of healthcare data. As an example, he also provides details of a project in which The Hyve participates and which uses that kind of data. Covered are: translational medicine data using tranSMART and cBioPortal, population health data using OMOP and OHDSI, and personal health data processing using open mHealth Shimmer and Apache Kafka.
Pine Biotech conducts monthly informational workshops on topics related to high-throughput data analysis, interpretation, and integration. The workshops highlight our research tools and educational resources developed with collaborators in the US and across the world.
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca - Fiona Brinkman
Talk at the GenomeTrakr network meeting, Sept 23, 2015, in Washington, DC, on Canada's open-source Integrated Rapid Infectious Disease Analysis (IRIDA) bioinformatics platform, which aids genomic epidemiology analysis for public health agencies, with planned open data release and linkage to GenomeTrakr. Discusses perspectives, challenges, and solutions for increasing international GenomeTrakr participation.
FAIRness Assessment of the Library of Integrated Network-based Cellular Signa... - Kathleen Jagodnik
The FAIR Guiding Principles facilitate the Findability, Accessibility, Interoperability, and Reusability of digital resources. The Library of Integrated Network-based Cellular Signatures (LINCS) Project has sought to implement the FAIR principles in the provision of its resources in order to optimize usability. We have surveyed the FAIR principles and are implementing specific facets within the LINCS resources. Subsequently, with reference to the literature and other efforts to measure FAIRness, we are developing quantitative metrics to assess the FAIRness of each dataset and resource in order to provide users with objective measures of the characteristics of the LINCS project. Assessing and improving the FAIRness of LINCS is an ongoing effort by our team that will benefit from community input to ensure that all LINCS users are optimally engaged with this resource.
Slides contain information about why bioinformatics appeared, who bioinformaticians are, what they do, and what kinds of cool applications and challenges exist in bioinformatics.
Slides were prepared for the Bioinformatics seminar 2016, Institute of Computer Science, University of Tartu.
Notes on "Artificial Intelligence in Bioscience Symposium 2017"PetteriTeikariPhD
Including talks on drug discovery, drug target selection, scientific reproducibility, machine learning in omics and GWAS, network biology, functional connectome, endotype discovery, Bayesian causal networks, systems biology, brain decoding, place cells, personalized medicine, sepsis warning systems, knowledge engineering, CRISPR genome editing, data science stacks, feline gene sequencing, generative models for chemical compounds via variational autoencoders, and ethics in AI medicine.
https://www.bioscience.ai/ | #bioai2017 | Sept 14, 2017 | The British Library, London
Alternative download for slides if Slideshare download is acting up: https://www.dropbox.com/s/2wdfuqzifns7475/bioai2017.pdf?dl=0
20 years of evolution in data production in health and life sciences - slecrom
I share feedback as the scientific head of a genomics core facility and as a facilities manager over the last 20 years. I go through the evolution of the high-throughput sequencing field and discuss data storage and sharing.
dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ... - dkNET
Presenter: Pieter Dorrestein, PhD, Professor, Skaggs School of Pharmacy and Pharmaceutical Sciences, Department of Pharmacology and Pediatrics, University of California San Diego
Abstract
In the analysis of organs, the volatilome, or biofluids, the microbiome influences 15-70% of detectable mass spectrometry molecules. Typically, only 10% of human untargeted metabolomics data can be assigned a molecular structure, with merely 1-2% traceable to microbial origins. Human microbiomes contribute metabolites through the microbial metabolism of host-derived substances, digestion of food and beverage molecules, and de novo assembly using proteins encoded by genetic elements. Despite the significance of microbiome-derived metabolites to human health, there is no centralized knowledge base for community access. To address this, the "Collaborative Microbial Metabolite Center" (CMMC) leverages expertise in mass spectrometry, microbiome innovation, and the GNPS ecosystem to build a knowledge base. It aims to create a user-accessible microbiome resource, enrich bioactivity knowledge, and facilitate data deposition. The CMMC includes the construction of a knowledge base, the MicrobeMASST tool, and health phenotype enrichment workflows; their construction and use will be discussed in this presentation. The use of this ecosystem will be exemplified by the discovery of 20,000 bile acids, many of which were shown to be of microbial origin and linked to diet and IBD.
The top 3 key questions that this resource can answer:
1. How can we leverage the thousands of public metabolomics studies to discover microbial metabolites and their organ distributions, as well as their phenotypic (including health) associations?
2. If one has an unknown molecule, how can one assess which microbes make it, even without a known structure?
3. How can one contribute to the expansion of the knowledgebase on microbial metabolites?
Upcoming webinars schedule: https://dknet.org/about/webinar
Democratising biodiversity and genomics research: open and citizen science to... - GigaScience, BGI Hong Kong
Scott Edmunds at the China National GeneBank Youth Biodiversity MegaData Forum: Democratising biodiversity and genomics research: open and citizen science to build trust and fill the data gaps. 18th December 2018
Environmental Cheminformatics for Unknown ID, UC Davis, Nov 2018 - Emma Schymanski
Environmental Cheminformatics to Identify Unknown Chemicals and their Effects
Assoc. Prof. Dr. Emma L. Schymanski
FNR ATTRACT Fellow and PI: Environmental Cheminformatics, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg.
The Environmental Cheminformatics group at the Luxembourg Centre for Systems Biomedicine focuses on the comprehensive identification of known and unknown chemicals in our environment to investigate their effects on health and disease. The environment and the chemicals to which we are exposed are incredibly complex, with over 125 million chemicals registered in the largest chemical registry and over 70,000 in household use alone. Detectable molecules in complex samples can now be captured using high resolution mass spectrometry (HRMS), which provides a “snapshot” of all chemicals present in a sample and allows for retrospective data analysis through digital archiving. However, scientists cannot yet identify the vast majority of the tens of thousands of features in each sample, leading to critical bottlenecks in identification and data interpretation. For instance, recent studies indicate a strong connection between the gut microbiome and Parkinson’s disease, yet over 60 % of significant metabolites in microbiome experiments are unknown. Unknown identification remains extremely time consuming and, in many cases, a matter of luck. Prioritizing efforts to find significant metabolites or potentially toxic substances responsible for observed effects is key; this involves reconciling highly complex samples with expert knowledge and careful validation. This talk will cover European, US, and worldwide community initiatives to help connect knowledge on chemistry and toxicity with environmental observations - from compound databases to spectral libraries and retrospective screening. It will touch on the challenges of standardized structure representations, data curation, deposition, and communication between resources. Finally, it will show how interdisciplinary efforts and data sharing can facilitate research in metabolomics, exposomics, and beyond.
NOTE: some slides causing errors have been removed but can be accessed through the tinyurl on the front page.
Active hyperlinks can be retrieved using the tinyurl on the front page. Please cite this work if you use any of the contents.
Open Source Collaboration in Drug Discovery in Pharma - Kees van Bochove
How pre-competitive collaboration in the pharmaceutical sector through open source platforms enables joint innovation of academics, pharma, SMEs and non-profits.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple but effective semantic and latent representations, and to make these available through standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and that of others in the field, creates a baseline for building trustworthy and easy-to-deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Knowledge graphs are an emerging paradigm to represent information, yet their discovery and reuse are hampered by insufficient or inadequate metadata. Here, the COST Action on Distributed Knowledge Graphs held a first workshop to develop a KG metadata schema. In this presentation, the progress and plans are discussed with the W3C Community Group on Knowledge Graph Construction.
Data-Driven Discovery Science with FAIR Knowledge Graphs - Michel Dumontier
Despite the existence of vast amounts of biomedical data, these remain difficult to find and to productively reuse in machine learning and other Artificial Intelligence technologies. In this talk, I will discuss the role of the FAIR Guiding Principles in making biomedical data AI-ready, and how their representation as knowledge graphs not only enables powerful ontology-backed semantic queries, but can also be used to predict missing information and to check the quality of the knowledge collected.
The main idea of the talk is to introduce the FAIR principles (what they are and what they are not) and how their application with Semantic Web technologies (ontologies/linked data) creates improved possibilities for large-scale data integration, answering sophisticated questions using automated reasoners, and predicting new relations and validating data using graph embeddings. The audience will gain insight into the state of the art in a carefully presented manner that introduces principles, approaches, and outcomes relevant to Health AI.
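To make the graph-embedding point concrete, here is a toy TransE-style sketch (a stand-in for the family of methods implied, not the speaker's actual pipeline): a triple (h, r, t) is scored by how closely the head embedding plus the relation embedding lands on the tail embedding, and a margin-based update separates a known triple from a corrupted one. All entities, relations, and triples below are invented.

```python
# A minimal TransE-style sketch on invented toy data; production systems
# train over full knowledge graphs with batched sampling of negatives.
import numpy as np

rng = np.random.default_rng(0)
entities = ["drug:aspirin", "disease:pain", "gene:PTGS2"]
relations = ["treats"]
dim = 16

E = {e: rng.normal(size=dim) for e in entities}   # entity embeddings
R = {r: rng.normal(size=dim) for r in relations}  # relation embeddings

def score(h, r, t):
    """TransE plausibility: higher (less negative) = more plausible."""
    return -np.linalg.norm(E[h] + R[r] - E[t])

def step(pos, neg, lr=0.01, margin=1.0):
    """One margin-based update: pull the true triple together and push
    the corrupted triple apart, as in the usual TransE training loop."""
    if score(*pos) >= score(*neg) + margin:
        return  # margin already satisfied, nothing to do
    h, r, t = pos
    g_pos = E[h] + R[r] - E[t]
    E[h] -= lr * g_pos; E[t] += lr * g_pos; R[r] -= lr * g_pos
    h2, r2, t2 = neg
    g_neg = E[h2] + R[r2] - E[t2]
    E[h2] += lr * g_neg; E[t2] -= lr * g_neg; R[r2] += lr * g_neg

for _ in range(500):
    step(("drug:aspirin", "treats", "disease:pain"),   # known link
         ("drug:aspirin", "treats", "gene:PTGS2"))     # corrupted link

# The known link should now score higher than the corrupted one.
print(score("drug:aspirin", "treats", "disease:pain"),
      score("drug:aspirin", "treats", "gene:PTGS2"))
```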
The FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles light a path towards improving the discovery and reuse of digital objects (data, documents, software, web services, etc.) by machines. Machine reusability is a crucial strategic component in building robust digital infrastructure that strengthens scholarship and opens new pathways for innovation on a truly global scale. However, as the FAIR principles do not specify any particular implementation, communities are left with the homework of devising, standardizing, and implementing technical specifications to improve the ‘FAIRness’ of digital assets. In this seminar, I will focus on the history and state of the art in FAIRness assessment, including manual, semi-automated, and fully automated approaches, and how these can be used by developers and consumers alike. This seminar will serve as a springboard for community discussion and adoption of these services to incrementally and realistically improve the FAIRness of their resources.
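As a hedged illustration of what a fully automated FAIRness test can look like mechanically, the sketch below probes whether an identifier resolves and whether content negotiation returns machine-readable metadata; real assessment frameworks run much richer, community-defined tests, and the probed URI here is a placeholder.

```python
# A toy FAIRness probe, illustrating mechanics only; not a real framework.
import requests

def fair_probe(uri):
    results = {}
    # Findable/Accessible: does the identifier resolve over HTTP?
    try:
        head = requests.head(uri, allow_redirects=True, timeout=10)
        results["resolvable"] = head.status_code < 400
    except requests.RequestException:
        results["resolvable"] = False
        return results
    # Interoperable: does content negotiation yield structured metadata?
    for mime in ("application/ld+json", "text/turtle"):
        r = requests.get(uri, headers={"Accept": mime},
                         allow_redirects=True, timeout=10)
        results[mime] = r.ok and mime in r.headers.get("Content-Type", "")
    return results

print(fair_probe("https://example.org/dataset/123"))  # placeholder URI
```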
The Role of the FAIR Guiding Principles for an Effective Learning Health System - Michel Dumontier
The learning health system (LHS) is an integrated social and technological system that embeds continuous improvement and innovation for the effective delivery of healthcare. A crucial part of the LHS lies in how the underlying information system will secure and take advantage of relevant knowledge assets towards supporting complex and unusual clinical decision making, facilitating public health surveillance, and aiding comparative effectiveness research. However, key knowledge assets remain difficult to obtain and reuse, particularly in a decentralized context. In this talk, I will discuss the role of the Findable, Accessible, Interoperable, and Reusable (FAIR) Guiding Principles towards the realization of the LHS, along with emerging technologies to publish and refine clinical research and knowledge derived therein.
Keynote given at the 2021 Knowledge Representation for Health Care workshop: http://banzai-deim.urv.net/events/KR4HC-2021/
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR data and services - Michel Dumontier
Biomedicine has always been a fertile and challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies, offer exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services enable automated scientific discovery on a global scale.
bio:
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research focuses on the development of computational methods for scalable and responsible discovery science. Dr. Dumontier obtained his BSc (Biochemistry) in 1998 from the University of Manitoba, and his PhD (Bioinformatics) in 2005 from the University of Toronto. Previously a faculty member at Carleton University in Ottawa and Stanford University in Palo Alto, Dr. Dumontier founded and directs the interfaculty Institute of Data Science at Maastricht University to develop sociotechnological systems for responsible data science by design. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon 2020, the European Open Science Cloud, the US National Institutes of Health and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
This presentation was given on October 21, 2020 at CIKM2020.
The role of the FAIR Guiding Principles in a Learning Health System - Michel Dumontier
The learning health system (LHS) is a concept for a socio-technological system that continuously improves the delivery of health care by coupling biomedical research with practice- and evidence-based medicine. Key aspects of the LHS are collecting, integrating, and analyzing data from different sources. While the increased digitalisation of healthcare is creating new data sources, these remain hard to find and use, let alone make use of as part of intelligent systems for the benefit of patients, healthcare providers, and researchers. This talk will examine recent developments towards making key parts of the LHS, such as clinical practice guidelines, Findable, Accessible, Interoperable, and Reusable (FAIR).
Accelerating biomedical discovery with an internet of FAIR data and services... - Michel Dumontier
With its focus on improving the health and well being of people, biomedicine has always been a fertile, if not challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies, offer exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services, which is built on Semantic Web technologies, be well positioned to support automated scientific discovery on a global scale.
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and Services - Michel Dumontier
With its focus on improving the health and well being of people, biomedicine has always been a fertile, if not challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies, offer exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services enable automated scientific discovery on a global scale.
Are we FAIR yet? And will it be worth it?
The FAIR Principles propose essential characteristics that all digital resources (e.g. datasets, repositories, web services) should possess to be Findable, Accessible, Interoperable, and Reusable by both humans and machines. The Principles act as a guide to what researchers and data stewards should expect from contemporary digital resources and, in turn, to the requirements placed on them when publishing their own scholarly products. As interest in, and support for, the Principles has spread, the diversity of interpretations has also broadened, with some resources claiming to already “be FAIR”.
This talk will elaborate on what FAIR is, what it entails, and how we should evaluate FAIRness. I will describe new social and technological infrastructure to support the creation and evaluation of FAIR resources, and how FAIR fits into institutional, national and international efforts. Finally, I will discuss the merits of the FAIR principles (and what we ask of people) in the context of strengthening data-driven scientific inquiry.
Keynote given at NETTAB2018 - http://www.igst.it/nettab/2018/
The future of science and business - a UM Star Lecture - Michel Dumontier
I discuss how data science is affecting our way of life and how we at Maastricht University are preparing the next generation of leaders to address opportunities and challenges in a responsible manner.
The FAIR Principles propose key characteristics that all digital resources (e.g. datasets, repositories, web services) should possess to be Findable, Accessible, Interoperable, and Reusable by people and machines. The Principles act as a guide to what researchers should expect from contemporary digital resources and, in turn, to the requirements placed on them when publishing their own scholarly products. As interest in, and support for, the Principles has spread, the diversity of interpretations has also broadened, with some resources claiming to already “be FAIR”. This talk will elaborate on what FAIR is, why we need it, what it entails, and how we should evaluate FAIRness. I will describe new social and technological infrastructure to support the creation and evaluation of FAIR resources, and how FAIR fits into institutional, national and international efforts. Finally, I will discuss the merits of the FAIR principles (and what we ask of people) in the context of strengthening data-driven scientific inquiry.
A talk prepared for the workshop "Working on data stewardship? Meet your peers!"
Date: 3 October 2017
https://www.surf.nl/agenda/2017/10/workshop-working-on-data-stewardship-meet-your-peers/index.html
Towards metrics to assess and encourage FAIRness - Michel Dumontier
With increased interest in FAIR metrics, there is a need to develop tools and approaches that can assess the FAIRness of a digital resource. This talk begins to explore some ideas in this space and invites people to participate in a working group focused on the development, application, and evaluation of FAIR metrics.
Ontology has its roots as a field of philosophical study that is focused on the nature of existence. However, today's ontology (aka knowledge graph) can incorporate computable descriptions that bring insight to a wide set of compelling applications, including more precise knowledge capture, semantic data integration, sophisticated query answering, and powerful association mining, thereby delivering key value for health care and the life sciences. In this webinar, I will introduce the idea of computable ontologies and describe how they can be used with automated reasoners to perform classification, to reveal inconsistencies, and to precisely answer questions. Participants will learn about the tools of the trade to design, find, and reuse ontologies. Finally, I will discuss applications of ontologies in the fields of diagnosis and drug discovery.
Bio:
Dr. Michel Dumontier is an Associate Professor of Medicine (Biomedical Informatics) at Stanford University. His research focuses on the development of methods to integrate, mine, and make sense of large, complex, and heterogeneous biological and biomedical data. His current research interests include (1) using genetic, proteomic, and phenotypic data to find new uses for existing drugs, (2) elucidating the mechanism of single and multi-drug side effects, and (3) finding and optimizing combination drug therapies. Dr. Dumontier is the Stanford University Advisory Committee Representative for the World Wide Web Consortium, the co-Chair for the W3C Semantic Web for Health Care and the Life Sciences Interest Group, scientific advisor for the EBI-EMBL Chemistry Services Division, and the Scientific Director for Bio2RDF, an open source project to create Linked Data for the Life Sciences. He is also the founder and Editor-in-Chief of Data Science, a new IOS Press journal featuring open access, open review, and semantic publishing.
What are greenhouse gases and how many gases affect the Earth? - moosaasad1975
What are greenhouse gases, how do they affect the Earth and its environment, and what does the future hold for the environment, the Earth, the weather, and the climate?
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adaptive Optics at Visible Wavelengths - Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... - Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods applied to large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects of interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to elicit complex behavior composed of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Introduction:
RNA interference (RNAi), or Post-Transcriptional Gene Silencing (PTGS), is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that cause the silencing, through RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA
Length: 23-25 nt
Trans-acting
Binds the target mRNA with mismatches
Causes translational inhibition
siRNA
Length: 21 nt
Cis-acting
Binds the target mRNA with a perfectly complementary sequence
piRNA (Piwi-interacting RNA)
Length: 25-36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAi:
First, the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers degradation of the target mRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: an endonuclease of the RNase III family.
Argonaute: the central component of the RNA-Induced Silencing Complex (RISC).
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute.
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille) domain: recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis) domain: breaks the phosphodiester bond of the mRNA (RNase H activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Richard's adventures in two entangled wonderlands - Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Multi-source connectivity as the driver of solar wind variability in the heliosphere - Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... - University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... - Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^-1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
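For orientation on the quantity being constrained: the UV luminosity function is commonly parameterized with a Schechter form. The sketch below is the standard magnitude-space parameterization, given for context only and not necessarily the exact model adopted in this work:

\phi(M)\,\mathrm{d}M = 0.4\ln(10)\,\phi^{*}\,10^{-0.4(M-M^{*})(\alpha+1)}\,\exp\!\left[-10^{-0.4(M-M^{*})}\right]\mathrm{d}M

Here \phi^{*} sets the normalization (the quantity reported to decline by a factor of ∼2.5 from z = 12 to z = 14), M^{*} is the characteristic magnitude, and \alpha is the faint-end slope.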
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
1. Semantic Approaches for Biochemical Knowledge Discovery
Michel Dumontier, Ph.D., Associate Professor of Medicine (Biomedical Informatics), Stanford University
6. Reusing raw and curated data in thousands of databases is challenging: identifiers, formats, access methods, links
7. Various software tools are needed to analyze data (problems: OS, versioning, input/output formats)
8. Ultimately, scientists develop fairly sophisticated programs/workflows to test hypotheses
9. The absence of intelligent systems requires vast amounts of experience and technical expertise
10. How can we automatically find the evidence that supports or disputes a scientific hypothesis using the latest data, tools, and scientific knowledge?
11. So what do we need to achieve this?
1. Data Science Tools and Methods
– To identify, represent, interlink, integrate, and query data and services
– To identify and uncover support for known or novel associations
2. Community Standards to share and interrogate a massive, decentralized network of interconnected data and software
12. First, we need FAIR data (a metadata sketch follows this list)
Findable
– Globally unique identifiers for datasets and the data they contain
– Rich set of descriptors to search and filter with
– Indexed and searchable
Accessible
– Metadata is eternally available
– Identifiers are used to retrieve representations using standard protocols (e.g. HTTP)
Interoperable
– Data represented with formal knowledge representations
– Includes links to other datasets/vocabularies
Reusable
– Licensing, provenance, community standards
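One way to act on the Findable and Reusable bullets above is to publish machine-readable dataset metadata. A minimal sketch using SPARQL Update with the standard W3C DCAT and Dublin Core vocabularies; all IRIs below are hypothetical placeholders, not real resources.

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>

INSERT DATA {
  # hypothetical dataset IRI serving as a globally unique identifier
  <http://example.org/dataset/drug-targets> a dcat:Dataset ;
      dcterms:title "Example drug-target associations" ;
      dcterms:license <http://creativecommons.org/licenses/by/4.0/> ;
      dcterms:creator <https://orcid.org/0000-0000-0000-0000> ;   # placeholder ORCID
      dcat:downloadURL <http://example.org/dataset/drug-targets.nt> .
}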
“Numbers have no way of speaking for themselves. We need to imbue them with meaning.” - Nate Silver, The Signal and the Noise
14. The Semantic Web is the new global web of knowledge
Standards for publishing, sharing, and querying facts, expert knowledge, and services: a scalable approach for the discovery of independently formulated and distributed knowledge.
15. Linked Data is FAIR data
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
16. Linked Data for the Life Sciences
Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF: chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; animal models and phenotypes; diseases, genetic markers, treatments; terminologies & publications.
• 11B+ interlinked statements from 35 biomedical datasets
• Dataset description, provenance & statistics (see the query sketch below)
• A growing interoperable ecosystem with the EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
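Assuming an endpoint that exposes its dataset descriptions using the W3C VoID vocabulary, a sketch like the following could list datasets by size; the exact predicates available may differ per Bio2RDF release, so treat this as illustrative rather than a guaranteed query.

PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?dataset ?title ?triples
WHERE {
  ?dataset a void:Dataset ;
           dcterms:title ?title ;
           void:triples ?triples .   # declared number of RDF statements
}
ORDER BY DESC(?triples)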
18. Bio2RDF shows how datasets are connected together
19. Graph methods for data quality: finding mismatches and discovering new links
W Hu, H Qiu, M Dumontier. Link Analysis of Life Science Linked Data. International Semantic Web Conference (2) 2015: 446-462.
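A simple instance of such a quality check is testing link symmetry: equivalence links should normally be asserted in both directions. The sketch below uses owl:sameAs purely for illustration; the actual cross-reference predicates used in Bio2RDF differ.

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?a ?b
WHERE {
  ?a owl:sameAs ?b .
  # report pairs whose reciprocal link is missing
  FILTER NOT EXISTS { ?b owl:sameAs ?a }
}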
20. Federated Queries over public SPARQL endpoints
Get all protein catabolic processes (and more specific GO terms) in BioModels:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?reactions)
WHERE {
  # find 'protein catabolic process' and all of its more specific GO terms
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  # count the BioModels biochemical reactions mapped to each GO term
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label
21. EbolaKB: Using Linked Data and Software
Kamdar, Dumontier. An Ebola virus-centered knowledge base. Database. 2015 Jun 8;2015. doi: 10.1093/database/bav049.
28. smartAPI
The goal is to reduce the barrier for the discovery and reuse of web APIs through richer semantic metadata:
i) a coordinated facility for the intelligent annotation of smart APIs;
ii) a web application to discover smart APIs and how they connect to each other;
iii) the augmentation of existing APIs to provide FAIR data.
30. Evan’s Questions
• What should we be doing now?
– Encouraging researchers to publish FAIR data and services
• How should we be doing it?
– As Linked Data
– Institutional repositories, and available in Wikidata and other aggregators
• Where are things going in the future?
– Reproducible analyses over indexed, archived, and massively connected knowledge graphs