A presentation given at the University of Toronto on June 18, 2009, describing the current state of Bio2RDF with respect to biological knowledge representation on the Semantic Web as Linked Data, with services to describe entities and answer questions.
The Bio2RDF project aims to transform silos of bioinformatics data into a distributed platform for biological knowledge discovery. Initial work focused on building a public database of open linked data with web-resolvable identifiers that provides information about named entities. This involved a syntactic normalization to convert open data represented in a variety of formats (flat file, tab-delimited, XML, web services) into RDF-based linked data with normalized names (HTTP URIs) and basic typing from source databases. Bio2RDF entities also make reference to other open linked data networks (e.g. DBpedia), thus facilitating traversal across information spaces. However, a significant problem arises when attempting more sophisticated knowledge discovery approaches such as question answering or symbolic data mining, because knowledge is represented in fundamentally different ways across sources, requiring one to know each underlying data model and to reconcile artefactual differences when they arise. In this talk, we describe our data integration strategy, which uses both syntactic and semantic normalization to consistently marshal knowledge into a common data model while leveraging explicit logic-based mappings to community ontologies to further broaden the biological knowledge scope.
Background of the project and simple use cases of using the Open PHACTS API and KNIME to extract compound, target and indication entities from millions of patent documents and infer meaningful links among them. Open PHACTS Linked Data meeting in Vienna.
Overview of the SureChEMBL system and web interface.
https://www.surechembl.org/search/
SureChEMBL is a freely available web resource for chemistry patent searching. It is based on a fully automatic and dynamic text and image mining pipeline.
NCBI; Introduction, Homepage and about
Tools and database of NCBI
BLAST; Introduction, Homepage and types of BLAST
Some databases of NCBI
References
Acknowledgements
PubChem for drug discovery in the age of big data and artificial intelligence – Sunghwan Kim
Presented at the American Chemical Society Middle Atlantic Regional Meeting (MARM) 2021 (June 10, 2021).
==== Abstract ====
With the emergence of the age of big data and artificial intelligence, biomedical research communities have a great interest in exploiting the massive amount of chemical and biological data available in the public domain. PubChem (https://pubchem.ncbi.nlm.nih.gov) is one of the largest sources of publicly available chemical information, with more than 270 million substance descriptions, more than 110 million unique compounds, and more than 285 million bioactivity outcomes from over one million biological assay experiments. PubChem provides a wide range of chemical information, including structure, pharmacology, toxicology, drug targets, metabolism, chemical vendors, patents, regulations, clinical trials, and much more. These contents can be accessed interactively through web browsers as well as programmatically using computer scripts. They can also be downloaded in bulk through the PubChem File Transfer Protocol (FTP) site. PubChem data have been used in many studies for developing bioactivity and toxicity prediction models, discovering polypharmacologic (multi-target) ligands, and identifying new macromolecular targets of compounds (for drug repurposing or off-target side-effect prediction). This presentation provides an overview of PubChem data, tools, and services useful for drug discovery.
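The programmatic access mentioned above goes through PubChem's PUG REST interface. As a minimal sketch (the compound name and property list here are illustrative choices, not taken from the talk), one can compose a request URL like this:

```python
# Sketch: building a PubChem PUG REST request URL. The URL is composed but not
# fetched, so the example stays self-contained; sending it requires network access.

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def pug_rest_url(name, properties, fmt="JSON"):
    """Compose a PUG REST URL asking for computed properties of a compound by name."""
    prop_path = ",".join(properties)
    return f"{BASE}/compound/name/{name}/property/{prop_path}/{fmt}"

url = pug_rest_url("aspirin", ["MolecularFormula", "MolecularWeight"])
print(url)
# Passing this URL to urllib.request.urlopen would return a JSON record
# with the requested properties.
```

Consult the PUG REST documentation for the full path grammar (search by CID, SMILES, InChI key, and so on).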
Defrosting the Digital Library: A survey of bibliographic tools for the next ... – Duncan Hull
After centuries with little change, scientific libraries have recently experienced massive upheaval. From being almost entirely paper-based, most libraries are now almost completely digital. This information revolution has all happened in less than 20 years and has created many novel opportunities and threats for scientists, publishers and libraries.
Today, we are struggling with an embarrassing wealth of digital knowledge on the Web. Most scientists access this knowledge through some kind of digital library; however, these can be cold, impersonal, isolated, and inaccessible places. Many libraries are still clinging to obsolete models of identity, attribution, contribution, citation and publication.
Based on a review published in PLoS Computational Biology (http://pubmed.gov/18974831), this talk will discuss the current chilly state of digital libraries for biologists, chemists and informaticians, including PubMed and Google Scholar. We highlight problems and solutions in the coupling and decoupling of publication data and metadata, with a tool called CiteULike (http://www.citeulike.org). This software tool exploits the Web to make digital libraries “warmer”: more personal, sociable, integrated, and accessible places.
Finally issues that will help or hinder the continued warming of libraries in the future, particularly the accurate identity of authors and their publications, are briefly introduced. These are discussed in the context of the BBSRC funded REFINE project, at the National Centre for Text Mining (NaCTeM.ac.uk), which is linking biochemical pathway data with evidence for pathways from the PubMed database.
The DNA Data Bank of Japan (DDBJ) is a biological database that collects DNA sequences. It is located at the National Institute of Genetics (NIG) in the Shizuoka prefecture of Japan. It is also a member of the International Nucleotide Sequence Database Collaboration or INSDC.
A presentation to the New Year's Event for Maastricht University's Knowledge Engineering @ Work Program. https://www.maastrichtuniversity.nl/news/kework-first-10-students-academic-workstudy-track-graduate
Generating Biomedical Hypotheses Using Semantic Web Technologies – Michel Dumontier
With its focus on investigating the nature and basis for the sustained existence of living systems, modern biology has always been a fertile, if not challenging, domain for formal knowledge representation and automated reasoning. Over the past 15 years, hundreds of projects have developed or leveraged ontologies for entity recognition and relation extraction, semantic annotation, data integration, query answering, consistency checking, association mining and other forms of knowledge discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, discover significant biological associations across these data using a set of partially overlapping ontologies, and identify new avenues for drug discovery by applying measures of semantic similarity over phenotypic descriptions. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability and an understanding of how to maximize their value, increasing numbers of biomedical researchers will be strategically poised to pursue increasingly sophisticated KR projects aimed at improving our overall understanding of the capability and behavior of biological systems.
Bio2RDF is an open-source project that offers a large and connected knowledge graph of Life Science Linked Data. Each dataset is expressed using its own vocabulary, which hinders the ability to integrate, search, query, and browse across similar or identical types of data. With growth and content changes in the source data, a manual approach to maintaining mappings has proven untenable. The aim of this work is to develop a (semi-)automated procedure to generate high-quality mappings between Bio2RDF and SIO using BioPortal ontologies. Our preliminary results demonstrate that our approach is promising in that it can find new mappings using a transitive closure over ontology mappings. Further development of the methodology, coupled with improvements in the ontology, will offer a better-integrated view of the Life Science Linked Data.
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semant... – Xiaogang (Marshall) Ma
Knowledge evolves in geoscience, and this evolution is reflected in datasets. In a context with distributed data sources, the evolution of knowledge may cause considerable challenges for data management and re-use. For example, a short news item published in 2009 (Mascarelli, 2009) revealed the geoscience community’s concern that the International Commission on Stratigraphy’s change to the definition of the Quaternary might require heavy reworking of geologic maps. Now we are in the era of the World Wide Web, and geoscience knowledge is increasingly modeled and encoded in the form of ontologies and vocabularies using semantic technologies. Accordingly, knowledge evolution leads to a consequence called ontology dynamics. Flouris et al. (2008) summarized 10 topics of general ontology change/dynamics, such as ontology mapping, morphism, evolution, debugging and versioning. Ontology dynamics makes an impact at several stages of a data life cycle and causes challenges such as the need to rework extant data in a data center, semantic mismatch among data sources, divergent understanding of the same dataset between data providers and data users, and error propagation in cross-discipline data discovery and re-use (Ma et al., 2014). This presentation will analyze best practices in the geoscience community so far and summarize a few recommendations to reduce the negative impacts of ontology dynamics in a data life cycle, including communities of practice and collaboration on ontology and vocabulary building, linking data records to standardized terms, and methods for (semi-)automatic reworking of datasets using semantic technologies.
References:
Flouris, G., Manakanatas, D., Kondylakis, H., Plexousakis, D., Antoniou, G., 2008. Ontology change: classification and survey. The Knowledge Engineering Review 23 (2), 117-152.
Ma, X., Fox, P., Rozell, E., West, P., Zednik, S., 2014. Ontology dynamics in a data life cycle: Challenges and recommendations from a Geoscience Perspective. Journal of Earth Science 25 (2), 407-412.
Mascarelli, A.L., 2009. Quaternary geologists win timescale vote. Nature 459, 624.
Powering Scientific Discovery with the Semantic Web (VanBUG 2014) – Michel Dumontier
In the quest to translate the results of biomedical research into effective clinical applications, many are now trying to make sense of the large and rapidly growing amount of public biomedical data. However, substantial challenges exist in traversing the currently fragmented data landscape. In this talk, I will discuss our efforts to use Semantic Web technologies to facilitate biomedical research through the formulation, publication, integration, and exploration of facts, expert knowledge, and web services.
Prof William Kosar: Letters of Credit as a Payment Method – William Kosar
This is the 2nd lesson from a 5-day course on Letters of Credit (in English and Arabic) taught to Iraqi private commercial bankers both at the Banking and Finance Academy in Erbil and at the Banking Studies Center of the Central Bank of Iraq in Baghdad.
Connecting life sciences data at the European Bioinformatics Institute – Connected Data World
Tony Burdett's slides from his talk at Connected Data London. Tony is a Senior Software Engineer at the European Bioinformatics Institute. He presented the complexity of data at EMBL-EBI and their solution for making sense of all this data.
INTRODUCTION
WHAT IS DATA AND DATABASE?
WHAT IS BIOLOGICAL DATABASE?
TYPES OF BIOLOGICAL DATABASE
PRIMARY DATABASE
Nucleic acid sequence database
Protein sequence database
SECONDARY DATABASE
COMPOSITE DATABASE
TERTIARY DATABASE
WHY ARE THEY NEEDED?
CONCLUSION
REFERENCES
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica... – NextMove Software
Bioinformatics dogma asserts that all-atom representations, capable of encoding details such as disulfide bridging and post-translationally modified amino acids, are too unwieldy to be of practical use. In this presentation, we show how recent advances in computer power, software algorithms and storage technology require us to question this precept. We show how InChI, InChI keys and canonical SMILES can be generated for the largest known proteins, and even for nucleic acid sequences as large as viral and prokaryotic genomes. Indeed, unique identifiers derived from all-atom nucleic acid representations allow the capture of epigenetic methylation information and circular DNA, feats that are impossible with the one-letter codes used by bioinformaticians. These unique identifiers allow the linking of mature antibodies to the unique identifiers of the plasmids used to express them. Finally, we discuss the possibility of polymer-specific implementations/optimizations of standard InChI, by showing how InChIs and InChI keys may be generated efficiently for specific classes of polymer with over a million atoms.
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes – Monica Munoz-Torres
Precise elucidation of the many different biological features encoded in a genome requires a careful curation process that involves reviewing all available evidence to allow researchers to resolve discrepancies and validate automated gene models, protein alignments, and other biological elements. Genome annotation is an inherently collaborative task; researchers only rarely work in isolation, turning to colleagues for second opinions and insights from those with expertise in particular domains and gene families.
The i5k initiative seeks to sequence the genomes of 5,000 insect and related arthropod species. The selected species are known to be important to worldwide agriculture, food safety, medicine, and energy production as well as many used as models in biology, those most abundant in world ecosystems, and representatives in every branch of the insect phylogeny in an effort to better understand arthropod evolution and phylogeny. Because computational genome analysis remains an imperfect art, each of these new genomes sequenced will require visualization and curation.
Apollo is an instantaneous, collaborative genome annotation editor, and the new JavaScript-based version allows researchers real-time interactivity, breaking down large amounts of data into manageable portions to mobilize groups of researchers with shared interests. The i5K is a broad and inclusive effort that seeks to involve scientists from around the world in its genome curation process, and Apollo is serving as the platform to empower this community. Here we offer details about this collaboration.
ICAR 2015
Workshop 10 (TUESDAY, JULY 7, 2015, 4:30-6:00 PM)
The Arabidopsis information portal for users and developers
Nick Provart (University of Toronto)
A Community Collaborator Perspective: Case study 1 - BioAnalytic Resource
Ontologies and Semantic Web technologies play an important role in the life sciences to help make data more interoperable and reusable. There are now many publicly available ontologies that enable biologists to describe everything from gene function through to animal physiology and disease.
Various efforts such as the Open Biomedical Ontologies (OBO) foundry provide central registries for biomedical ontologies and ensure they remain interoperable through a set of common shared development principles.
At EMBL-EBI we contribute to the development of biomedical ontologies and make extensive use of them in the annotation of public datasets. Biological data typically comes with rich and often complex metadata, so the ontologies provide a standard way to capture “what the data is about” and gives us hooks to connect to more data about similar things.
These ontology annotations have been put to good use in a number of large-scale data integration efforts and there’s an increasing recognition of the need for ontologies in making data FAIR (Findable, Accessible, Interoperable and Reusable).
EMBL-EBI build a number of integrative data platforms where ontologies are at the core of our domain models. One example is the Open Targets platform, where data about disease from 18 different databases can be aggregated and grouped based on therapeutic areas in the ontology and used to identify potential drug targets.
The ontologies team at EMBL-EBI provide a suite of services that are aimed at making ontologies more accessible for both humans and machines. We work with scientific data curators and software developers to integrate ontologies and semantics into both the data generation and data presentation workflows. We provide:
– An ontology lookup service (OLS) that provides search and visualisation services over more than 200 ontologies
– Services for automating the annotation of metadata and learning from previous annotations (Zooma)
– An ontology mapping and alignment service (OXO)
– Tools for working with metadata and ontologies in spreadsheets (Webulous)
– Software for enriching documents in search engines to support “semantic” query expansion
I’ll present how we are using these services at EMBL-EBI to scale up the semantic annotation of metadata. I’ll talk about our open-source technology stack and describe how we utilise a polyglot persistence approach (graph databases, triple stores, document stores, etc.) to optimise how we deliver ontologies and semantics to our users.
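As a minimal sketch of how a client might talk to one of these services, the snippet below composes a search request against the OLS REST API. The endpoint path and parameter names reflect the OLS API as commonly documented; treat them as assumptions and verify against the current docs before relying on them.

```python
# Sketch: composing (not sending) an OLS search URL. The query term and the
# ontology code "efo" are illustrative; sending the request requires network access.
from urllib.parse import urlencode

OLS_SEARCH = "https://www.ebi.ac.uk/ols/api/search"

def ols_search_url(term, ontology=""):
    """Build an OLS search URL for a term, optionally restricted to one ontology."""
    params = {"q": term}
    if ontology:
        params["ontology"] = ontology  # e.g. "efo" for the Experimental Factor Ontology
    return f"{OLS_SEARCH}?{urlencode(params)}"

print(ols_search_url("diabetes mellitus", "efo"))
```

The response is JSON listing matching terms with their IRIs and source ontologies, which is the hook the annotation workflows above build on.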
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Knowledge graphs are an emerging paradigm for representing information, yet their discovery and reuse are hampered by insufficient or inadequate metadata. Here, the COST Action on Distributed Knowledge Graphs held a first workshop to develop a KG metadata schema. In this presentation, the progress and plans are discussed with the W3C Community Group on Knowledge Graph Construction.
Data-Driven Discovery Science with FAIR Knowledge Graphs – Michel Dumontier
Despite the existence of vast amounts of biomedical data, these remain difficult to find and to productively reuse in machine learning and other artificial intelligence technologies. In this talk, I will discuss the role of the FAIR Guiding Principles in making biomedical data AI-ready, and how representing these data as knowledge graphs not only enables powerful ontology-backed semantic queries, but can also be used to predict missing information and to check the quality of the knowledge collected.
The main idea of the talk is to introduce the FAIR principles (what they are and what they are not), and how their application with semantic web technologies (ontologies/linked data) creates improved possibilities for large scale data integration, answering sophisticated questions using automated reasoners, and predicting new relations/validating data using graph embeddings. The audience will gain insight into the state of the art in a carefully presented manner that introduces principles, approaches, and outcomes relevant to Health AI.
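To make the graph-embedding idea concrete, here is a toy sketch of link-prediction scoring in the TransE style, a standard technique rather than necessarily the speaker's specific method: a triple (head, relation, tail) is plausible when head + relation lands near tail in vector space. The tiny hand-made embeddings below are purely illustrative.

```python
# Toy TransE-style scoring: lower score = more plausible triple.
import numpy as np

def transe_score(h, r, t):
    """Distance between (head + relation) and tail; lower is better."""
    return float(np.linalg.norm(h + r - t))

# Hand-made 2-D embeddings chosen so "aspirin" + "treats" lands on "headache".
emb = {
    "aspirin":   np.array([1.0, 0.0]),
    "treats":    np.array([0.0, 1.0]),
    "headache":  np.array([1.0, 1.0]),
    "influenza": np.array([3.0, 2.0]),
}

good = transe_score(emb["aspirin"], emb["treats"], emb["headache"])
bad = transe_score(emb["aspirin"], emb["treats"], emb["influenza"])
print(good, bad)  # the "headache" triple scores strictly better (lower)
```

In a real system the embeddings are learned from the knowledge graph, and candidate triples are ranked by this score to propose missing links or flag implausible assertions.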
The FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles light a path towards improving the discovery and reuse of digital objects (data, documents, software, web services, etc) by machines. Machine reusability is a crucial strategic component in building robust digital infrastructure that strengthens scholarship and opens new pathways for innovation on a truly global scale. However, as the FAIR principles do not specify any particular implementation, communities are left with the homework of devising, standardizing and implementing technical specifications to improve the ‘FAIRness’ of digital assets. In this seminar, I will focus on the history and state of the art in FAIRness assessment, including manual, semi-automated and fully automated approaches, and how these can be used by developers and consumers alike. This seminar will serve as a springboard for community discussion and adoption of these services to incrementally and realistically improve the FAIRness of their resources.
The Role of the FAIR Guiding Principles for an effective Learning Health System - Michel Dumontier
The learning health system (LHS) is an integrated social and technological system that embeds continuous improvement and innovation for the effective delivery of healthcare. A crucial part of the LHS lies in how the underlying information system will secure and take advantage of relevant knowledge assets towards supporting complex and unusual clinical decision making, facilitating public health surveillance, and aiding comparative effectiveness research. However, key knowledge assets remain difficult to obtain and reuse, particularly in a decentralized context. In this talk, I will discuss the role of the Findable, Accessible, Interoperable, and Reusable (FAIR) Guiding Principles towards the realization of the LHS, along with emerging technologies to publish and refine clinical research and knowledge derived therein.
Keynote given for 2021 Knowledge Representation for Health Care http://banzai-deim.urv.net/events/KR4HC-2021/
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat... - Michel Dumontier
Biomedicine has always been a fertile and challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies, offer exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services enable automated scientific discovery on a global scale.
bio:
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research focuses on the development of computational methods for scalable and responsible discovery science. Dr. Dumontier obtained his BSc (Biochemistry) in 1998 from the University of Manitoba, and his PhD (Bioinformatics) in 2005 from the University of Toronto. Previously a faculty member at Carleton University in Ottawa and Stanford University in Palo Alto, Dr. Dumontier founded and directs the interfaculty Institute of Data Science at Maastricht University to develop sociotechnological systems for responsible data science by design. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon 2020, the European Open Science Cloud, the US National Institutes of Health and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
This presentation was given on October 21, 2020 at CIKM2020.
The role of the FAIR Guiding Principles in a Learning Health System - Michel Dumontier
The learning health system (LHS) is a concept for a socio-technological system that continuously improves the delivery of health care by coupling biomedical research with practice- and evidence- based medicine. Key aspects of the LHS are collecting, integrating, and analyzing data from different sources. While the increased digitalisation of healthcare is creating new data sources, these remain hard to find and use, let alone make use of as part of intelligent systems for the benefit of patients, healthcare providers, and researchers. This talk will examine recent developments towards making key parts of the LHS, such as clinical practice guidelines, Findable, Accessible, Interoperable, and Reusable (FAIR).
Accelerating biomedical discovery with an internet of FAIR data and services... - Michel Dumontier
With its focus on improving the health and well being of people, biomedicine has always been a fertile, if not challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies, offer exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services, which is built on Semantic Web technologies, be well positioned to support automated scientific discovery on a global scale.
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ... - Michel Dumontier
With its focus on improving the health and well being of people, biomedicine has always been a fertile, if not challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies, offer exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services enable automated scientific discovery on a global scale.
Are we FAIR yet? And will it be worth it?
The FAIR Principles propose essential characteristics that all digital resources (e.g. datasets, repositories, web services) should possess to be Findable, Accessible, Interoperable, and Reusable by both humans and machines. The Principles act as a guide that researchers and data stewards should expect from contemporary digital resources, and in turn, the requirements on them when publishing their own scholarly products. As interest in, and support for the Principles has spread, the diversity of interpretations has also broadened, with some resources claiming to already “be FAIR”.
This talk will elaborate on what FAIR is, what it entails, and how we should evaluate FAIRness. I will describe new social and technological infrastructure to support the creation and evaluation of FAIR resources, and how FAIR fits into institutional, national and international efforts. Finally, I will discuss the merits of the FAIR principles (and what we ask of people) in the context of strengthening data-driven scientific inquiry.
Keynote given at NETTAB2018 - http://www.igst.it/nettab/2018/
The future of science and business - a UM Star Lecture - Michel Dumontier
I discuss how data science is affecting our way of life and how we at Maastricht University are preparing the next generation of leaders to address opportunities and challenges in a responsible manner.
The FAIR Principles propose key characteristics that all digital resources (e.g. datasets, repositories, web services) should possess to be Findable, Accessible, Interoperable, and Reusable by people and machines. The Principles act as a guide that researchers should expect from contemporary digital resources, and in turn, the requirements on them when publishing their own scholarly products. As interest in, and support for the Principles has spread, the diversity of interpretations has also broadened, with some resources claiming to already “be FAIR”. This talk will elaborate on what FAIR is, why we need it, what it entails, and how we should evaluate FAIRness. I will describe new social and technological infrastructure to support the creation and evaluation of FAIR resources, and how FAIR fits into institutional, national and international efforts. Finally, I will discuss the merits of the FAIR principles (and what we ask of people) in the context of strengthening data-driven scientific inquiry.
A talk prepared for the workshop "Working on data stewardship? Meet your peers!"
Date: 3 October 2017
https://www.surf.nl/agenda/2017/10/workshop-working-on-data-stewardship-meet-your-peers/index.html
Towards metrics to assess and encourage FAIRness - Michel Dumontier
With increased interest in the FAIR metrics, there is a need to develop tools and approaches that can assess the FAIRness of a digital resource. This talk begins to explore some ideas in this space, and invites people to participate in a working group focused on the development, application, and evaluation of FAIR metrics.
Ontology has its roots as a field of philosophical study that is focused on the nature of existence. However, today's ontology (aka knowledge graph) can incorporate computable descriptions that can bring insight in a wide set of compelling applications including more precise knowledge capture, semantic data integration, sophisticated query answering, and powerful association mining - thereby delivering key value for health care and the life sciences. In this webinar, I will introduce the idea of computable ontologies and describe how they can be used with automated reasoners to perform classification, to reveal inconsistencies, and to precisely answer questions. Participants will learn about the tools of the trade to design, find, and reuse ontologies. Finally, I will discuss applications of ontologies in the fields of diagnosis and drug discovery.
Bio:
Dr. Michel Dumontier is an Associate Professor of Medicine (Biomedical Informatics) at Stanford University. His research focuses on the development of methods to integrate, mine, and make sense of large, complex, and heterogeneous biological and biomedical data. His current research interests include (1) using genetic, proteomic, and phenotypic data to find new uses for existing drugs, (2) elucidating the mechanism of single and multi-drug side effects, and (3) finding and optimizing combination drug therapies. Dr. Dumontier is the Stanford University Advisory Committee Representative for the World Wide Web Consortium, the co-Chair for the W3C Semantic Web for Health Care and the Life Sciences Interest Group, scientific advisor for the EBI-EMBL Chemistry Services Division, and the Scientific Director for Bio2RDF, an open source project to create Linked Data for the Life Sciences. He is also the founder and Editor-in-Chief of Data Science, a new IOS Press journal featuring open access, open review, and semantic publishing.
New Directions in Targeted Therapeutic Approaches for Older Adults With Mantl... - i3 Health
i3 Health is pleased to make the speaker slides from this activity available for use as a non-accredited self-study or teaching resource.
This slide deck, presented by Dr. Kami Maddocks, Professor-Clinical in the Division of Hematology and Associate Division Director for Ambulatory Operations at The Ohio State University Comprehensive Cancer Center, will provide insight into new directions in targeted therapeutic approaches for older adults with mantle cell lymphoma.
STATEMENT OF NEED
Mantle cell lymphoma (MCL) is a rare, aggressive B-cell non-Hodgkin lymphoma (NHL) accounting for 5% to 7% of all lymphomas. Its prognosis ranges from indolent disease that does not require treatment for years to very aggressive disease, which is associated with poor survival (Silkenstedt et al, 2021). Typically, MCL is diagnosed at advanced stage and in older patients who cannot tolerate intensive therapy (NCCN, 2022). Although recent advances have slightly increased remission rates, recurrence and relapse remain very common, leading to a median overall survival between 3 and 6 years (LLS, 2021). Though there are several effective options, progress is still needed towards establishing an accepted frontline approach for MCL (Castellino et al, 2022). Treatment selection and management of MCL are complicated by the heterogeneity of prognosis, advanced age and comorbidities of patients, and lack of an established standard approach for treatment, making it vital that clinicians be familiar with the latest research and advances in this area. In this activity chaired by Michael Wang, MD, Professor in the Department of Lymphoma & Myeloma at MD Anderson Cancer Center, expert faculty will discuss prognostic factors informing treatment, the promising results of recent trials in new therapeutic approaches, and the implications of treatment resistance in therapeutic selection for MCL.
Target Audience
Hematology/oncology fellows, attending faculty, and other health care professionals involved in the treatment of patients with mantle cell lymphoma (MCL).
Learning Objectives
1.) Identify clinical and biological prognostic factors that can guide treatment decision making for older adults with MCL
2.) Evaluate emerging data on targeted therapeutic approaches for treatment-naive and relapsed/refractory MCL and their applicability to older adults
3.) Assess mechanisms of resistance to targeted therapies for MCL and their implications for treatment selection
Ethanol (CH3CH2OH), or beverage alcohol, is a two-carbon alcohol that is rapidly distributed in the body and brain. Ethanol alters many neurochemical systems and has rewarding and addictive properties. It is the oldest recreational drug and likely contributes to more morbidity, mortality, and public health costs than all illicit drugs combined. The 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) integrates alcohol abuse and alcohol dependence into a single disorder called alcohol use disorder (AUD), with mild, moderate, and severe subclassifications (American Psychiatric Association, 2013). In the DSM-5, all types of substance abuse and dependence have been combined into a single substance use disorder (SUD) on a continuum from mild to severe. A diagnosis of AUD requires that at least two of the 11 DSM-5 behaviors be present within a 12-month period (mild AUD: 2–3 criteria; moderate AUD: 4–5 criteria; severe AUD: 6–11 criteria). The four main behavioral effects of AUD are impaired control over drinking, negative social consequences, risky use, and altered physiological effects (tolerance, withdrawal). This chapter presents an overview of the prevalence and harmful consequences of AUD in the U.S., the systemic nature of the disease, neurocircuitry and stages of AUD, comorbidities, fetal alcohol spectrum disorders, genetic risk factors, and pharmacotherapies for AUD.
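The DSM-5 thresholds above map directly from a criteria count to a severity label; an illustrative (non-clinical) sketch in Python, with the function name my own:

```python
def aud_severity(criteria_met: int) -> str:
    """Map a DSM-5 criteria count (0-11) to an AUD severity label,
    following the thresholds described in the text."""
    if not 0 <= criteria_met <= 11:
        raise ValueError("DSM-5 defines 11 criteria")
    if criteria_met < 2:
        return "no AUD diagnosis"   # fewer than 2 criteria: no diagnosis
    if criteria_met <= 3:
        return "mild"               # 2-3 criteria
    if criteria_met <= 5:
        return "moderate"           # 4-5 criteria
    return "severe"                 # 6-11 criteria

print(aud_severity(4))  # moderate
```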
Prix Galien International 2024 Forum Program - Levi Shapiro
June 20, 2024: Prix Galien International and Jerusalem Ethics Forum in Rome. Detailed agenda including panels:
- ADVANCES IN CARDIOLOGY: A NEW PARADIGM IS COMING
- WOMEN’S HEALTH: FERTILITY PRESERVATION
- WHAT’S NEW IN THE TREATMENT OF INFECTIOUS, ONCOLOGICAL AND INFLAMMATORY SKIN DISEASES?
- ARTIFICIAL INTELLIGENCE AND ETHICS
- GENE THERAPY
- BEYOND BORDERS: GLOBAL INITIATIVES FOR DEMOCRATIZING LIFE SCIENCE TECHNOLOGIES AND PROMOTING ACCESS TO HEALTHCARE
- ETHICAL CHALLENGES IN LIFE SCIENCES
- Prix Galien International Awards Ceremony
12. What is the semantic web?
The Semantic Web is a web of knowledge. It is about standard formats for representing and querying knowledge drawn from diverse sources and making statements about real objects.
13. Goals for the Semantic Web
• Provide a common knowledge representation
  • syntax & semantics
• Facilitate publishing, data integration and information retrieval
• Make possible semantically interoperable web applications and services
• Enable the answering of questions across global repositories of knowledge
14. Resource Description Framework (RDF)
• Allows one to express propositions, and reason about them
• Uniform Resource Identifiers (URIs) are entity names
  • e.g. http://purl.uniprot.org/uniprot/Q16665
• An RDF statement consists of:
  – Subject: a resource identified by a URI, e.g. u:Q16665
  – Predicate: a resource identified by a URI, e.g. rdf:type
  – Object: a resource or literal, e.g. Protein
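The triple model above is easy to sketch in plain Python. This is an illustration only: the tuple representation and the prefix constants are my own; the URI comes from the slide.

```python
# Represent an RDF statement as a (subject, predicate, object) tuple.
UNIPROT = "http://purl.uniprot.org/uniprot/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The statement "u:Q16665 rdf:type Protein":
triple = (UNIPROT + "Q16665", RDF_NS + "type", "Protein")

subject, predicate, obj = triple
print(subject)  # http://purl.uniprot.org/uniprot/Q16665
```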
15. Semantic Knowledge Base
fact
Q16665
rdf:type
Protein rdf:type
rdfs:subClassOf
Molecule
ontology
Knowledge base
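A minimal sketch, in plain Python, of how an automated reasoner combines a fact with an ontology axiom: only the two triples come from the slide; the rule implementation is my own.

```python
# Facts and ontology as triple sets; infer new types via rdfs:subClassOf.
facts = {("Q16665", "rdf:type", "Protein")}
ontology = {("Protein", "rdfs:subClassOf", "Molecule")}

def infer_types(facts, ontology):
    """Apply the RDFS subclass entailment rule until no new triples appear:
    if (s rdf:type C) and (C rdfs:subClassOf D), then (s rdf:type D)."""
    kb = set(facts) | set(ontology)
    changed = True
    while changed:
        changed = False
        new = {(s, "rdf:type", sup)
               for (s, p, c) in kb if p == "rdf:type"
               for (c2, p2, sup) in kb
               if p2 == "rdfs:subClassOf" and c2 == c}
        if not new <= kb:
            kb |= new
            changed = True
    return kb

kb = infer_types(facts, ontology)
print(("Q16665", "rdf:type", "Molecule") in kb)  # True: entailed, not asserted
```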
17. Syntactic Data Integration
Syntactic data integration depends on consistent naming:
• UniProt: u:Q16665 has name "HIF1-alpha"
• Gene Ontology: u:Q16665 located in go:nucleus
• BIND: u:Q16665 interacts with u:vhl
Because all three sources use the same URI (u:Q16665), merging them produces a unified view of the entity.
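Because the sources share a URI, the unified view is just a union of triple sets. A small sketch with the data taken from the slide and the set representation my own:

```python
# Three sources make statements about the same URI.
uniprot = {("u:Q16665", "has name", "HIF1-alpha")}
go      = {("u:Q16665", "located in", "go:nucleus")}
bind    = {("u:Q16665", "interacts with", "u:vhl")}

# Consistent naming makes the merge a simple set union.
unified = uniprot | go | bind

# Collect everything known about the entity in the unified view.
about_q16665 = {(p, o) for (s, p, o) in unified if s == "u:Q16665"}
print(sorted(about_q16665))
```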
31. Services
• Describe a resource
– http://bio2rdf.org/ns:id
• Global services over federated endpoints
– http://bio2rdf.org/links/ns:id
– http://bio2rdf.org/search/term
• Targeted services to a specific endpoint
– http://bio2rdf.org/linksns/ns/ns2:id
– http://bio2rdf.org/searchns/ns/term
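The URL patterns above can be composed mechanically; a hypothetical helper sketch (the function names are my own; only the URL patterns come from the slide):

```python
# Compose Bio2RDF service URLs from the patterns on the slide.
BASE = "http://bio2rdf.org"

def describe(ns, ident):
    """Describe a resource: /ns:id"""
    return f"{BASE}/{ns}:{ident}"

def links(ns, ident):
    """Global links service over federated endpoints: /links/ns:id"""
    return f"{BASE}/links/{ns}:{ident}"

def search(term):
    """Global search service: /search/term"""
    return f"{BASE}/search/{term}"

print(describe("uniprot", "Q16665"))  # http://bio2rdf.org/uniprot:Q16665
```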
46. Bioinformatics Discovery Registry
• Part of the SharedName initiative to provide stable URI patterns for data records.
• We add the relationship between entities and records.
Discovery Service
• Registry links entities to data records, their formats (RDF/XML, HTML, etc.) and provider (Bio2RDF, UniProt)
  http://registry.semanticscience.org/ns:id
Redirection Service
• Automatic redirection to data provider document
  http://registry.semanticscience.org/doc/provider/format/ns:id
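A hypothetical sketch of how a client might compose registry and redirection URLs from these patterns (function and parameter names are my own):

```python
# Compose registry discovery and redirection URLs from the slide's patterns.
REGISTRY = "http://registry.semanticscience.org"

def registry_uri(ns, ident):
    """Discovery service: look up an entity's records by /ns:id."""
    return f"{REGISTRY}/{ns}:{ident}"

def redirect_uri(provider, fmt, ns, ident):
    """Redirection service: /doc/provider/format/ns:id resolves to the
    provider's document in the requested format."""
    return f"{REGISTRY}/doc/{provider}/{fmt}/{ns}:{ident}"

print(redirect_uri("bio2rdf", "rdfxml", "uniprot", "Q16665"))
```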
50. The Knowledge Web
• Merging data & services
• Reasoning & question answering
• Persistent (RESTful)
• Trust & Security
Data consumers must be able to rely upon your data to use it as a foundation for their own applications.
51. 2009 Goals
• Add more data!
– Standardize RDFizers
– Enrichment from small producer data!
• Design more RESTful services (Workflow)
• Start using Virtuoso 6 cluster
• Add mirrors
• Approval from data providers to distribute RDF dumps and publish SPARQL endpoints
– Confirmed: UniProt, BioCyc, Pathway Commons, BIND
55. Thanks To:
• The Bio2RDF community
• Dumontier Lab
– Alex De Leon, Jose Cruz, Natalia Villanueva-Rosales
• Quebec Researchers
– Francois Belleau, Marc-Alexandre Nolin
• Australian Researchers
– Peter Ansell
• OpenLink Virtuoso Team