Over 15 years ago, Sir Tim Berners-Lee proclaimed the founding of an exciting new future involving intelligent agents operating over smarter data in order to perform complex tasks at the behest of their human controllers. At the heart of this vision lies an uneasy alliance between tedious formal knowledge representations and powerful analytics over big, but often messy, data. Bio2RDF, our decade-old open-source project to create Linked Data for the life sciences, has woven together emergent Semantic Web technologies such as ontologies and Linked Data to generate FAIR (Findable, Accessible, Interoperable, and Reusable) data in the form of billions of machine-accessible statements for use in downstream biomedical discovery.
This revolution in data publication has been strengthened by action from global bioinformatics institutions such as the NCBI, NCBO, EBI, and DBCLS. Notably, NCBI's PubChem has successfully coupled large-scale data integration with community-based standards to offer a remarkable biochemical knowledge resource amenable to data-hungry discovery tools. Yet, in the face of increasing pressure from researchers, funders, and publishers, will these approaches be sufficient for growing and maintaining a comprehensive knowledge graph that is inclusive of all biomedical research?
A presentation to the New Year's Event for Maastricht University's Knowledge Engineering @ Work Program. https://www.maastrichtuniversity.nl/news/kework-first-10-students-academic-workstudy-track-graduate
Semantic web technologies offer a potential mechanism for the representation and integration of thousands of biomedical databases. Many of these databases offer cross-references to other data sources, but these are generally incomplete and prone to error. In this paper, we conduct an empirical analysis of the link structure of life science Linked Data, obtained from the Bio2RDF project. Three different link graphs for datasets, entities and terms are characterized by degree, connectivity, and clustering metrics, and their correlation is measured as well. Furthermore, we utilize the symmetry and transitivity of entity links to build a benchmark and evaluate several popular entity matching approaches. Our findings indicate that the life science data network can help find hidden links, can be used to validate links, and may offer a mechanism to integrate a wider set of resources to support biomedical knowledge discovery.
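As a minimal sketch of how symmetry and transitivity can be exploited, entity links can be treated as an equivalence relation and grouped with a union-find structure. Any pair that lands in the same group without being directly linked is then a candidate hidden link. The identifiers below are made up for illustration and are not taken from Bio2RDF, and the function names are hypothetical; this is not the benchmark procedure used in the paper.

```python
from itertools import combinations


def find(parent, x):
    """Return the representative of x, shortening the path as we go."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def link_closure(links):
    """Group entities connected by links, treating links as symmetric and transitive."""
    parent = {}
    for a, b in links:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for node in list(parent):
        groups.setdefault(find(parent, node), set()).add(node)
    return list(groups.values())


def implied_links(links):
    """Return pairs implied by the closure but absent from the stated links."""
    stated = {frozenset(pair) for pair in links}
    implied = set()
    for group in link_closure(links):
        for a, b in combinations(sorted(group), 2):
            if frozenset((a, b)) not in stated:
                implied.add((a, b))
    return implied


# Hypothetical cross-references (illustrative identifiers only).
example_links = [
    ("drugbank:DB0001", "pubchem:1001"),
    ("pubchem:1001", "chebi:2002"),
    ("kegg:C9", "chebi:9"),
]
```

Calling `implied_links(example_links)` yields the single pair connecting the first and third identifiers of the first chain, which no source stated directly. In practice such inferred pairs would need validation, since a single wrong cross-reference propagates through the whole group.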
Towards metrics to assess and encourage FAIRness - Michel Dumontier
With increased interest in the FAIR metrics, there is a need to develop tools and approaches that can assess the FAIRness of a digital resource. This talk begins to explore some ideas in this space, and invites people to participate in a working group focused on the development, application, and evaluation of FAIR metric efforts.
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata - Michel Dumontier
Biomedical researchers will remain stymied in their ability to take full advantage of the Big Data revolution if they can never find the datasets that they need to analyze, if there is a lack of clarity about what particular datasets contain, and if data are insufficiently described.
CEDAR, an NIH BD2K Center of Excellence, aims to develop methods and tools to vastly ease the burden of authoring good experimental metadata, and to maximally use this information to zero in on datasets of interest.
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval.
A talk prepared for the workshop "Working on data stewardship? Meet your peers!"
Date: 3 October 2017
https://www.surf.nl/agenda/2017/10/workshop-working-on-data-stewardship-meet-your-peers/index.html
Making Data FAIR (Findable, Accessible, Interoperable, Reusable) - Tom Plasterer
What to do About FAIR…
In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on inconclusive. This session will outline specific case studies – real problems with real data, and address opportunities and real concerns.
Why making data Findable, Accessible, Interoperable and Reusable is important.
Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to ask novel questions of your own data and the world’s. To make this happen, we started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability).
This talk was presented at The Molecular Medicine Tri-Conference/Bio-IT West on March 11, 2019.
With its focus on investigating the basis for the sustained existence
of living systems, modern biology has always been a fertile, if not
challenging, domain for formal knowledge representation and automated
reasoning. With thousands of databases and hundreds of ontologies now
available, there is a salient opportunity to integrate these for
discovery. In this talk, I will discuss our efforts to build a rich
foundational network of ontology-annotated linked data, develop
methods to intelligently retrieve content of interest, uncover
significant biological associations, and pursue new avenues for drug
discovery. As the portfolio of Semantic Web technologies continues to
mature in terms of functionality, scalability, and an understanding of
how to maximize their value, researchers will be strategically poised
to pursue increasingly sophisticated KR projects aimed at improving
our overall understanding of human health and disease.
bio: Dr. Michel Dumontier is an Associate Professor of Medicine
(Biomedical Informatics) at Stanford University. His research aims to
find new treatments for rare and complex diseases. His research
interests lie in the publication, integration, and discovery of
scientific knowledge. Dr. Dumontier serves as a co-chair for the World
Wide Web Consortium Semantic Web in Health Care and Life Sciences
Interest Group (W3C HCLSIG) and is the Scientific Director for
Bio2RDF, a widely used open-source project to create and provide
linked data for life sciences.
FAIR Data Knowledge Graphs–from Theory to Practice - Tom Plasterer
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to ask novel questions of your own data and the world’s. To make this happen, we started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability). Our processes enable simple creation of dataset records and linking to source data, providing a seamless federated knowledge graph for novice and advanced users alike.
Presented May 7th, 2019 at the Knowledge Graph Conference, Columbia University.
FAIR Data Management and FAIR Data Sharing - Merce Crosas
Presentation at the Critical Perspectives on the Practice of Digital Archaeology symposium: http://archaeology.harvard.edu/critical-perspectives-practice-digital-archaeology
BioPharma and FAIR Data, a Collaborative Advantage - Tom Plasterer
The concept of FAIR (Findable, Accessible, Interoperable and Reusable) data is becoming a reality as stakeholders from industry, academia, funding agencies and publishers embrace this approach. For BioPharma, being able to effectively share and reuse data is a tremendous competitive advantage, whether within a company or with peer organizations, key opinion leaders and regulatory agencies. A few key drivers, success stories and preliminary results of an industry data stewardship survey are presented.
DataCite and its Members: Connecting Research and Identifying Knowledge - ETH-Bibliothek
PIDs and their metadata support scholarly research and its increasing amounts and
variety of scholarly output. DataCite provides services which enable the research community to identify, connect, cite and track these outputs, making content FAIR. New
services include data level metrics and the use of identifiers for organizations and new
types of content, e.g. software, repositories and instruments. As an open, collaborative
and community driven membership organization we rely on our members for their
input and experience to build services that are beneficial for the research community
as a whole. DataCite services as well as current and future initiatives will be described
and it will be shown how members can contribute and benefit. Over the course of the
years, our membership has grown and diversified and we are therefore refreshing and
clarifying our member model. The new member model will be presented and described.
The DataTags System: Sharing Sensitive Data with Confidence - Merce Crosas
This talk was part of a session at the Research Data Alliance (RDA) 8th Plenary on Privacy Implications of Research Data Sets, during International Data Week 2016:
https://rd-alliance.org/rda-8th-plenary-joint-meeting-ig-domain-repositories-wg-rdaniso-privacy-implications-research-data
Slides on Merce Crosas's site:
http://scholar.harvard.edu/mercecrosas/presentations/datatags-system-sharing-sensitive-data-confidence
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09... - dkNET
Abstract
In this presentation, Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health, will share the NIH’s vision for a modernized, integrated FAIR biomedical data ecosystem and the strategic roadmap that NIH is following to achieve this vision. Dr. Gregurick will highlight projects being implemented by team members across the NIH’s 27 institutes and centers and will discuss ways that industry, academia, and other communities can help NIH enable a FAIR data ecosystem. Finally, she will weave in how this strategy is being leveraged to address the COVID-19 pandemic.
Presenter: Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health
dkNET Webinar Information: https://dknet.org/about/webinar
The Nuclear Receptor Signaling Atlas (NURSA) is partnering with dkNET (NIDDK Information Network) to host a dataset challenge, and we invite you to join! Everyone is talking about Big Data. How can we ensure that the impact of individual scientists working on a myriad of small, focused studies that discover and probe new phenomena is not lost in the Big Data world? In fact, there is more than one way to generate big data, and we would like your help in creating and expanding “big data” for NIDDK! In this 30-minute webinar, the dkNET team will give an overview of the challenge task, explain how to use dkNET to find research resources, and share top tips!
An overview of FAIR Data and FAIR Data stewardship, and the roadmap for FAIR Data solutions coordinated by the Dutch Techcentre for Life Sciences. This presentation was given at the Netherlands eScience Center's "Essential skills in data-intensive research" course week.
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ... - Tom Plasterer
As scientists in the life sciences we are trained to pursue singular goals around a publication or a validated target or a drug submission. Our failure rates are exceedingly high especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. By using technical and social solutions together, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions to enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, these efforts foster more accurate, timely and inclusive decision-making.
Dataset Catalogs as a Foundation for FAIR* Data - Tom Plasterer
BioPharma and the broader research community are faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at summary, version and distribution levels. Further, we’ve described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.
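To make the three description levels concrete, here is a minimal sketch that emits a Turtle record for a dataset at summary, version and distribution levels using DCAT, Dublin Core Terms and VoID. The helper names, the `example.org` URLs and the choice of properties are assumptions made for illustration; they are not the catalogue schema used in the talk, and a real implementation would more likely use an RDF library than string formatting.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
}


def turtle_record(subject, rdf_types, properties):
    """Serialize one resource as a Turtle block.

    `properties` maps prefixed predicates to values that are already
    valid Turtle terms (quoted literals or <IRIs>); no escaping is done.
    """
    lines = [f"<{subject}> a {', '.join(rdf_types)}"]
    for predicate, value in properties.items():
        lines.append(f"    {predicate} {value}")
    return " ;\n".join(lines) + " .\n"


def catalog_entry(base, title, version, download_url):
    """Describe a dataset at summary, version and distribution levels."""
    summary = f"{base}/dataset"
    versioned = f"{base}/dataset/{version}"
    distribution = f"{base}/dataset/{version}/rdf"
    header = "".join(f"@prefix {p}: <{iri}> .\n" for p, iri in PREFIXES.items())
    body = [
        # Summary level: the abstract, version-independent dataset.
        turtle_record(summary, ["dcat:Dataset"], {"dct:title": f'"{title}"'}),
        # Version level: a specific release, pointing at its distribution.
        turtle_record(versioned, ["dcat:Dataset"], {
            "dct:isVersionOf": f"<{summary}>",
            "dcat:distribution": f"<{distribution}>",
        }),
        # Distribution level: a concrete file; void:Dataset marks it as RDF.
        turtle_record(distribution, ["dcat:Distribution", "void:Dataset"], {
            "dcat:downloadURL": f"<{download_url}>",
        }),
    ]
    return header + "\n" + "\n".join(body)


# Hypothetical values; example.org is a placeholder namespace.
doc = catalog_entry("http://example.org/catalog", "Example omics dataset",
                    "v1", "http://example.org/files/example-v1.nt")
print(doc)
```

Keeping the three levels as separate resources means a consumer can cite the stable summary record while still resolving the exact version and file that an analysis used.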
My talk at the last-ever Open PHACTS project meeting in Vienna in 2016, where I was asked to talk about the challenges we addressed in Open PHACTS with semantic web technology and what still needed to be done.
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ... - Tom Plasterer
Edge Informatics is an approach to accelerate collaboration in the BioPharma pipeline. By combining technical and social solutions, knowledge can be shared and leveraged across the multiple internal and external silos participating in the drug development process. This is accomplished by making data assets findable, accessible, interoperable and reusable (FAIR). Public consortia and internal efforts embracing FAIR data and Edge Informatics are highlighted, in both preclinical and clinical domains.
This talk was presented at the Molecular Medicine Tri-Conference in San Francisco, CA on February 20, 2017
Powering Scientific Discovery with the Semantic Web (VanBUG 2014) - Michel Dumontier
In the quest to translate the results of biomedical research into effective clinical applications, many are now trying to make sense of the large and rapidly growing amount of public biomedical data. However, substantial challenges exist in traversing the currently fragmented data landscape. In this talk, I will discuss our efforts to use Semantic Web technologies to facilitate biomedical research through the formulation, publication, integration, and exploration of facts, expert knowledge, and web services.
From Biological Data to Clinical Applications: Positioning a digital infrastr... - Michel Dumontier
In the quest to translate the results of life science research into effective clinical applications, many are now trying to make sense of the large and rapidly growing amount of biological and biomedical data. Indeed, keeping on top of the daily flood of new information, whether it be the latest clinical reviews, scientific reports, or raw data, is an ever-present and widely recognized challenge. Limited access to structured, integrated and citable data restricts our ability to exploit a rich source of scientific knowledge for clinical and translational research. While keeping the dual goals of increasing our understanding of how living systems respond to chemical agents and translating our combined knowledge into clinical applications, I will discuss our efforts to leverage Semantic Web technologies to facilitate the formulation, publication, integration, and discovery of biological facts, expert knowledge and services of value to pharmaceutical and clinical research, and more recently, with applications for the patient-centric delivery of health care.
Presentation delivered in the context of the Agricultural Data Interoperability WG meeting, during the RDA 3rd Plenary Meeting in Dublin, Ireland. 26/3/2014.
The presentation is mostly focused on the work done by the agINFRA project towards proposing a methodology for the definition of Germplasm descriptors as RDF, based on the existing work of experts in the field and making use of the existing effort in this direction.
Publishing Germplasm Vocabularies as Linked DataValeria Pesce
What has already been published?
What may still be needed?
How to do it?
This presentation is a part of the 3rd Session of the 1st International e-Conference on Germplasm Data Interoperability https://sites.google.com/site/germplasminteroperability/
European agrobiodiversity, ECPGR network meeting on EURISCO, Central Crop Da...Dag Endresen
Presentation on the Darwin Core standard for data exchange and the germplasm extension for genebanks during the 2014 workshop of the ECPGR Documentation and Information Working Group "Tailoring the Documentation of Plant Genetic Resources in Europe to the Needs of the User" (http://www.ecpgr.cgiar.org/working_groups/documentation_information/docinfo2014.html) in Prague-Ruzyně, Czech Republic, 20th May 2014.
Short URL: https://goo.gl/C5UEnU
DOI: http://doi.org/10.13140/RG.2.2.10865.28006
Strong second quarter
- Life Science and Healthcare post strong organic growth
- Organic sales growth in all regions
- Profitability rises thanks to growth and Sigma-Aldrich synergies
- Integration of Sigma-Aldrich on track
- Merck raises sales and earnings forecast for 2016 thanks to good business performance
Slides from the classes on Linked Data in the Master's in Bioinformatics at the Universidad de Murcia. For best effect, see http://biordf.org:8080/UM_LSLD/Clases/UM_Bioinformatics_LD.html
Pistoia Alliance European Conference, King's College London, April 19, 2016
Panel introduction to Big (Biomedical) Data and the challenges facing research in biomedical R&D with examples from genomics data around the world. #Pistoia2016
Event link:
https://www.eventbrite.co.uk/e/pistoia-alliance-european-conference-2016-tickets-19618953819
Read more about me and my work at:
http://dnadigest.org
http://repositive.io
https://uk.linkedin.com/in/fionanielsen
1h SPARQL tutorial given at the "Practical Cross-Dataset Queries on the Web of Data" tutorial at WWW2012. Supported by the LATC FP7 Project. http://latc-project.eu/
Motivation of LIFE🦅 An eagle lives for about 70 years ....
But by the 40th year of its life, it has to make an important decision.
At that stage,
3 major parts of its body begin to lose their effectiveness .....
Its talons grow long and flexible and become unable to grip prey.
Its beak bends forward
and begins to interfere with eating.
Its wings grow heavy and, sticking to its chest, cannot open fully, which limits its flight.
Finding food, catching food,
and eating food .. all three processes begin to lose their edge.
It is left with only three options....
1. Give up its life,
2. Abandon its nature and live on discarded food like a vulture !!
3. Or "re-establish itself" !!
As the undisputed sovereign of the sky.
While the first two options are simple and quick,
what remains in the end is the third, a long and extremely painful path.
The eagle chooses the third path ..
and re-establishes itself.
It goes to a high mountain and builds its nest in solitude ..
and then begins the process of re-establishing itself !!
First it breaks its beak by striking it against a rock again and again,
nothing is more painful for the king of birds than breaking its beak!
And it waits
for the beak to grow back.
After that it breaks its talons in the same way,
and waits ..
for the talons to grow back.
Once the new beak and talons have come in, it plucks out its heavy feathers one by one!
And waits ..
for the feathers to grow back.
After 150 days of pain and waiting ...
it regains the same grand, high flight as before....
After this renewal
it lives for 30 more years ....
with energy, respect, and dignity.
In the same way, desire, activity, and imagination, all three, begin to weaken in us humans too!
We too must let go of the heaviness of an existence
shackled to the past and take free flights of imagination.
If not 150 days.....
let at least 60 days be spent
on re-establishing oneself!
Breaking and plucking away what clings to body and mind
will certainly hurt !!
And then, when we are ready to fly like the eagle ..
this time the flights will be higher,
more experienced, and boundless.
Let some reflection be done every day,
and you are the very person
who can know yourself best.
My only request is to make small beginnings toward change.
Digital divide and technological underdevelopment: Romanian sociology on the inte...Eugen Glavan
Presentation at the SSR Conference "Provocări sociale: instituții, valori, tendințe" (Social Challenges: Institutions, Values, Trends), Iași, 17 May 2013
Eugen Glăvan
Institutul de Cercetare a Calităţii Vieţii (Research Institute for Quality of Life), Academia Română (Romanian Academy)
Abstract
The development of a scientific discipline does not depend exclusively on theoretical reflection and empirical research, but also on the technical means it can integrate into the process of producing knowledge. In the natural sciences this reality is taken for granted, and technological progress sustains efforts at innovation and discovery. In the social sciences, however, the impact of technology is less pronounced, and the development and use of technical tools is treated as secondary. In this presentation I will address one component of this reality: how Romanian sociology presents itself on the internet. Having become an essential communication tool, the internet provides a range of applications that can define a science, its ties to the international community, and its role in society. These include automated citation systems, preservation repositories for academic output (not only articles but also audio and video materials), data archives, and the integration or connection of web resources into specialized global networks. Using specialized tools from the field of online measurement, the presentation will focus on assessing how well specialists and institutions in Romanian sociology are prepared to meet the need to provide complete and comprehensible information about their own activity, and it notes the absence of bodies that centralize such data.
Keywords: digital sociology, webometrics, academic rankings, internet
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
Seminar for Dr. Min Zhang's Purdue Bioinformatics Seminar Series. Touched on learning health systems, the Gen3 Data Commons, the NCI Genomic Data Commons, Data Harmonization, FAIR, and open science.
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
Maximizing the value of data, computing, and data science in an academic medical center, or 'towards a molecularly informed Learning Health System'. Given in October at the University of Florida in Gainesville.
OpenTox - an open community and framework supporting predictive toxicology an...Barry Hardy
Presented at ACS Boston 2015 at a Session on the growing impact of Open Science chaired by Andy Lang and Tony Williams dedicated to the work, memory and legacy of JC Bradley and the work we carry forward!
One important goal of OpenTox is to support the development of an Open Standards-based predictive toxicology framework that provides unified access to toxicological data and models. OpenTox supports the development of tools for the integration of data, for the generation and validation of in silico models for toxic effects, libraries for the development and integration of modelling algorithms, and scientifically sound validation and reporting routines.
The OpenTox Application Programming Interface (API) is an important open standards development for software development purposes. It provides a specification against which development of global interoperable toxicology resources by the broader community can be carried out. The use of OpenTox API-compliant web services to communicate instructions between linked resources with URI addresses supports the use of a wide variety of commands to carry out operations such as data integration, algorithm use, model building and validation. The OpenTox Framework currently includes, with its APIs, services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, reporting, investigations, studies, assays, and authentication and authorisation, which may be combined into multiple applications satisfying a variety of different user needs. As OpenTox creates a semantic web for toxicology, it should be an ideal framework for incorporating toxicology data, ontology and modelling developments, thus supporting both a mechanistic framework for toxicology and best practices in statistical analysis and computational modelling.
In this presentation I will review the recent OpenTox-based development of applications including the ToxBank data infrastructure supporting integrated analysis across biochemical, functional and omics datasets supporting the safety assessment goals of the SEURAT-1 program which aims to develop alternatives to animal testing.
Finally, I will provide an overview of the working group activities of the newly formed OpenTox Association which aim to progress the development of open source, data, standards and tools in this area.
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Amit Sheth
Talk presented in Spain (WiMS 2013/UAM-Madrid, UMA-Malaga), June 2013.
Replaces earlier version at: http://www.slideshare.net/apsheth/semantic-technology-empowering-real-world-outcomes-in-biomedical-research-and-clinical-practices
Biomedical and translational research as well as clinical practice are increasingly data driven. Activities routinely involve large numbers of devices, data and people, resulting in the challenges associated with volume, velocity (change), variety (heterogeneity) and veracity (provenance, quality). Equally important is to realize the challenge of serving the needs of broader ecosystems of people and organizations, extending traditional stakeholders like drug makers, clinicians and policy makers, to increasingly technology savvy and information empowered patients. We believe that semantics is becoming the centerpiece of informatics solutions that convert data into meaningful, contextually relevant information and insights that lead to optimal decisions for translational research and 360 degree health, fitness and well-being.
In this talk, I will provide a series of snapshots of efforts in which semantic approaches and technologies are the key enabler. I will emphasize real-world and in-use projects, technologies and systems, involving significant collaborations between my team and biomedical researchers or practicing clinicians. Examples include:
• Active Semantic Electronic Medical Record
• Semantics and Services enabled Problem Solving Environment for T.cruzi (SPSE)
• Data Mining of Cardiology data
• Semantic Search, Browsing and Literature Based Discovery
• PREscription Drug abuse Online Surveillance and Epidemiology (PREDOSE)
• kHealth: development of knowledge-enhanced sensing and mobile computing applications (using low-cost sensors and smartphones), along with the ability to convert low-level observations into clinically relevant abstractions
Further details are at http://knoesis.org/amit/hcls
Using the Micropublications ontology and the Open Annotation Data Model to re...jodischneider
Presentation of a paper at the ISWC 2014 Workshop on Linked Science 2014: Making Sense Out of Data (LISC2014), held at ISWC 2014, Riva del Garda, Italy, October 19
“Using the Micropublications ontology and the Open Annotation Data Model to represent evidence within a drug-drug interaction knowledge base.” by Jodi Schneider, Paolo Ciccarese, Tim Clark and Richard D. Boyce.
Paper: http://jodischneider.com/pubs/lisc2014.pdf
Event: http://linkedscience.org/events/lisc2014/
Abstract:
Semantic web technologies can support the rapid and transparent validation of scientific claims by interconnecting the assumptions and evidence used to support or challenge assertions. One important application domain is medication safety, where more efficient acquisition, representation, and synthesis of evidence about potential drug-drug interactions is needed. Exposure to potential drug-drug interactions (PDDIs), defined as two or more drugs for which an interaction is known to be possible, is a significant source of preventable drug-related harm. The combination of poor quality evidence on PDDIs, and a general lack of PDDI knowledge by prescribers, results in many thousands of preventable medication errors each year. While many sources of PDDI evidence exist to help improve prescriber knowledge, they are not concordant in their coverage, accuracy, and agreement. The goal of this project is to research and develop core components of a new model that supports more efficient acquisition, representation, and synthesis of evidence about potential drug-drug interactions. Two Semantic Web models—the Micropublications Ontology and the Open Annotation Data Model—have great potential to provide linkages from PDDI assertions to their supporting evidence: statements in source documents that mention data, materials, and methods. In this paper, we describe the context and goals of our work, propose competency questions for a dynamic PDDI evidence base, outline our new knowledge representation model for PDDIs, and discuss the challenges and potential of our approach.
This is a presentation given at the Opal Events meeting "Drug Discovery Partnerships: Filling the Pipeline". I was speaking in a session with Jean-Claude Bradley regarding "Pre-competitive Collaboration: Sharing Data to Increase Predictability". This presentation discussed some of the work we are doing on Open PHACTS. My thanks especially to Carole Goble, Lee Harland and Sean Ekins for their comments.
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.cafionabrinkman
Talk at GenomeTrakr network meeting Sept 23 2015 in Washington DC. On Canada's open source Integrated Rapid Infectious Disease Analysis (IRIDA) bioinformatics platform - aiding genomic epidemiology analysis for public health agencies with planned open data release and linkage to GenomeTrakr. Discussed perspectives, challenges, solutions for getting more GenomeTrakr participation internationally.
Clinical Research Informatics Year-in-Review 2024Peter Embi
Peter Embi, MD's presentation of Clinical Research Informatics year-in-review presented at the 2024 AMIA Informatics Summit in Boston, MA on March 20, 2024.
CINECA webinar slides: Open science through fair health data networks dream o...CINECAProject
Since the FAIR data principles were published in 2016, many organizations including science funders and governments have adopted these principles to promote and foster true open science collaborations. However, to define a vision and create a video of a Personal Health Train that leverages worldwide FAIR health data in a federated manner is one step. To actually make this happen at scale and be able to show new scientific and medical insights from it is quite another!
In this webinar, we will dive into the basics of FAIR health data, but also take stock of the current situation in health data networks: after a year of frantic research and collaborations and many open datasets and hackathons on COVID-19, has the situation actually improved? Are we sharing health data on a global scale to improve medical practice, or is quality medical data still only accessible to researchers with the right credentials and deep pockets?
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 21st January 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready, which is the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available through standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and that of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Knowledge graphs are an emerging paradigm to represent information, yet their discovery and reuse are hampered by insufficient or inadequate metadata. To address this, the COST Action Distributed Knowledge Graphs held a first workshop to develop a KG metadata schema. In this presentation, the progress and plans are discussed with the W3C Community Group on Knowledge Graph Construction.
Data-Driven Discovery Science with FAIR Knowledge GraphsMichel Dumontier
Despite the existence of vast amounts of biomedical data, these remain difficult to find and to productively reuse in machine learning and other Artificial Intelligence technologies. In this talk, I will discuss the role of the FAIR Guiding Principles in making biomedical data AI-ready, and how representing such data as knowledge graphs not only enables powerful ontology-backed semantic queries, but can also be used to predict missing information and to check the quality of the knowledge collected.
The main idea of the talk is to introduce the FAIR principles (what they are and what they are not), and how their application with semantic web technologies (ontologies/linked data) creates improved possibilities for large scale data integration, answering sophisticated questions using automated reasoners, and predicting new relations/validating data using graph embeddings. The audience will gain insight into the state of the art in a carefully presented manner that introduces principles, approaches, and outcomes relevant to Health AI.
The FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles light a path towards improving the discovery and reuse of digital objects (data, documents, software, web services, etc) by machines. Machine reusability is a crucial strategic component in building robust digital infrastructure that strengthens scholarship and opens new pathways for innovation on a truly global scale. However, as the FAIR principles do not specify any particular implementation, communities are left to devise, standardize and implement technical specifications to improve the ‘FAIRness’ of digital assets. In this seminar, I will focus on the history and state of the art in FAIRness assessment, including manual, semi-automated and fully automated approaches, and how these can be used by developers and consumers alike. This seminar will serve as a springboard for community discussion and adoption of these services to incrementally and realistically improve the FAIRness of their resources.
The Role of the FAIR Guiding Principles for an effective Learning Health SystemMichel Dumontier
The learning health system (LHS) is an integrated social and technological system that embeds continuous improvement and innovation for the effective delivery of healthcare. A crucial part of the LHS lies in how the underlying information system will secure and take advantage of relevant knowledge assets towards supporting complex and unusual clinical decision making, facilitating public health surveillance, and aiding comparative effectiveness research. However, key knowledge assets remain difficult to obtain and reuse, particularly in a decentralized context. In this talk, I will discuss the role of the Findable, Accessible, Interoperable, and Reusable (FAIR) Guiding Principles towards the realization of the LHS, along with emerging technologies to publish and refine clinical research and knowledge derived therein.
Keynote given for 2021 Knowledge Representation for Health Care http://banzai-deim.urv.net/events/KR4HC-2021/
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...Michel Dumontier
Biomedicine has always been a fertile and challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies offers exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services enable automated scientific discovery on a global scale.
bio:
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research focuses on the development of computational methods for scalable and responsible discovery science. Dr. Dumontier obtained his BSc (Biochemistry) in 1998 from the University of Manitoba, and his PhD (Bioinformatics) in 2005 from the University of Toronto. Previously a faculty member at Carleton University in Ottawa and Stanford University in Palo Alto, Dr. Dumontier founded and directs the interfaculty Institute of Data Science at Maastricht University to develop sociotechnological systems for responsible data science by design. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon 2020, the European Open Science Cloud, the US National Institutes of Health and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
This presentation was given on October 21, 2020 at CIKM2020.
The role of the FAIR Guiding Principles in a Learning Health SystemMichel Dumontier
The learning health system (LHS) is a concept for a socio-technological system that continuously improves the delivery of health care by coupling biomedical research with practice- and evidence-based medicine. Key aspects of the LHS are collecting, integrating, and analyzing data from different sources. While the increased digitalisation of healthcare is creating new data sources, these remain hard to find and use, let alone incorporate into intelligent systems for the benefit of patients, healthcare providers, and researchers. This talk will examine recent developments towards making key parts of the LHS, such as clinical practice guidelines, Findable, Accessible, Interoperable, and Reusable (FAIR).
Accelerating biomedical discovery with an internet of FAIR data and services -...Michel Dumontier
With its focus on improving the health and well being of people, biomedicine has always been a fertile, if not challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies offers exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services, which is built on Semantic Web technologies, be well positioned to support automated scientific discovery on a global scale.
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Michel Dumontier
With its focus on improving the health and well being of people, biomedicine has always been a fertile, if not challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies offers exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services enable automated scientific discovery on a global scale.
Are we FAIR yet? And will it be worth it?
The FAIR Principles propose essential characteristics that all digital resources (e.g. datasets, repositories, web services) should possess to be Findable, Accessible, Interoperable, and Reusable by both humans and machines. The Principles act as a guide to what researchers and data stewards should expect from contemporary digital resources and, in turn, to the requirements placed on them when publishing their own scholarly products. As interest in, and support for the Principles has spread, the diversity of interpretations has also broadened, with some resources claiming to already “be FAIR”.
This talk will elaborate on what FAIR is, what it entails, and how we should evaluate FAIRness. I will describe new social and technological infrastructure to support the creation and evaluation of FAIR resources, and how FAIR fits into institutional, national and international efforts. Finally, I will discuss the merits of the FAIR principles (and what we ask of people) in the context of strengthening data-driven scientific inquiry.
Keynote given at NETTAB2018 - http://www.igst.it/nettab/2018/
The future of science and business - a UM Star LectureMichel Dumontier
I discuss how data science is affecting our way of life and how we at Maastricht University are preparing the next generation of leaders to address opportunities and challenges in a responsible manner.
The FAIR Principles propose key characteristics that all digital resources (e.g. datasets, repositories, web services) should possess to be Findable, Accessible, Interoperable, and Reusable by people and machines. The Principles act as a guide to what researchers should expect from contemporary digital resources and, in turn, to the requirements placed on them when publishing their own scholarly products. As interest in, and support for the Principles has spread, the diversity of interpretations has also broadened, with some resources claiming to already “be FAIR”. This talk will elaborate on what FAIR is, why we need it, what it entails, and how we should evaluate FAIRness. I will describe new social and technological infrastructure to support the creation and evaluation of FAIR resources, and how FAIR fits into institutional, national and international efforts. Finally, I will discuss the merits of the FAIR principles (and what we ask of people) in the context of strengthening data-driven scientific inquiry.
Bio2RDF is an open-source project that offers a large and connected knowledge graph of Life Science Linked Data. Each dataset is expressed using its own vocabulary, which hinders integration, search, querying, and browsing across similar or identical types of data. With growth and content changes in source data, a manual approach to maintaining mappings has proven untenable. The aim of this work is to develop a (semi)automated procedure to generate high quality mappings between Bio2RDF and SIO using BioPortal ontologies. Our preliminary results demonstrate that our approach is promising in that it can find new mappings using a transitive closure between ontology mappings. Further development of the methodology, coupled with improvements in the ontology, will offer a better-integrated view of the Life Science Linked Data.
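The transitive-closure idea can be illustrated with a minimal Python sketch. It is not the project's actual procedure: it assumes mappings are directed pairs, and every identifier below (`bio2rdf:TypeX`, `ontoA:X`, `ontoB:X`, `sio:X`) is a hypothetical placeholder.

```python
from collections import defaultdict, deque


def transitive_mappings(direct, sources, targets):
    """For each source term, list the target terms reachable through
    chains of direct mappings (breadth-first search)."""
    graph = defaultdict(set)
    for a, b in direct:
        graph[a].add(b)
    found = {}
    for source in sources:
        seen, queue = {source}, deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Keep only reachable terms that belong to the target ontology.
        found[source] = sorted((seen - {source}) & set(targets))
    return found


# Hypothetical mappings: a dataset type is mapped to an intermediate
# ontology term, which is in turn mapped towards a target class.
direct = [
    ("bio2rdf:TypeX", "ontoA:X"),
    ("ontoA:X", "ontoB:X"),
    ("ontoB:X", "sio:X"),
]
result = transitive_mappings(direct, ["bio2rdf:TypeX"], {"sio:X"})
```

Here no direct mapping exists between `bio2rdf:TypeX` and `sio:X`; the mapping is only found by following the chain through the two intermediate terms.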
Ontology has its roots as a field of philosophical study that is focused on the nature of existence. However, today's ontology (aka knowledge graph) can incorporate computable descriptions that can bring insight in a wide set of compelling applications including more precise knowledge capture, semantic data integration, sophisticated query answering, and powerful association mining - thereby delivering key value for health care and the life sciences. In this webinar, I will introduce the idea of computable ontologies and describe how they can be used with automated reasoners to perform classification, to reveal inconsistencies, and to precisely answer questions. Participants will learn about the tools of the trade to design, find, and reuse ontologies. Finally, I will discuss applications of ontologies in the fields of diagnosis and drug discovery.
Bio:
Dr. Michel Dumontier is an Associate Professor of Medicine (Biomedical Informatics) at Stanford University. His research focuses on the development of methods to integrate, mine, and make sense of large, complex, and heterogeneous biological and biomedical data. His current research interests include (1) using genetic, proteomic, and phenotypic data to find new uses for existing drugs, (2) elucidating the mechanism of single and multi-drug side effects, and (3) finding and optimizing combination drug therapies. Dr. Dumontier is the Stanford University Advisory Committee Representative for the World Wide Web Consortium, the co-Chair for the W3C Semantic Web for Health Care and the Life Sciences Interest Group, scientific advisor for the EBI-EMBL Chemistry Services Division, and the Scientific Director for Bio2RDF, an open source project to create Linked Data for the Life Sciences. He is also the founder and Editor-in-Chief of Data Science, a new IOS Press journal featuring open access, open review, and semantic publishing.
Model organisms such as budding yeast provide a common platform to interrogate and understand cellular and physiological processes. Knowledge about model organisms, whether generated during the course of scientific investigation or extracted from published articles, is made available by model organism databases (MODs) such as the Saccharomyces Genome Database (SGD) for powerful, data-driven bioinformatic analyses. Integrative platforms such as InterMine offer a standard platform for MOD data exploration and data mining. Yet, today’s bioinformatic analyses also require access to a significantly broader set of structured biomedical data, such as what can be found in the emerging network of Linked Open Data (LOD). If MOD data could be provisioned as FAIR (Findable, Accessible, Interoperable, and Reusable), then scientists could leverage a greater amount of interoperable data in knowledge discovery.
The goal of this proposal is to increase the utility of MOD data by implementing standards-compliant data access interfaces that interoperate with Linked Data. We will focus our efforts on developing interfaces for data access, data retrieval, and query answering for SGD. Our software will publish InterMine data as LOD that are semantically annotated with ontologies and can be retrieved in standardized formats (e.g. JSON-LD, Turtle). We will facilitate the exploration of MOD data for hypothesis testing by implementing efficient query answering using Linked Data Fragments, and by developing a set of graphical user interfaces to search for data of interest, explore connections, and answer questions that leverage the wider LOD network. Finally, we will develop a locally and cloud-deployable image to enable the rapid deployment of the proposed infrastructure. Our efforts to increase interoperability and ease of deployment for biomedical data repositories will increase research productivity and reduce costs associated with data integration and warehouse maintenance.
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Michel Dumontier
A phenotype is an observable characteristic of an individual and typically pertains to its morphology, function, and behavior. Phenotypes, whether observed at the bench or the bedside, are increasingly being used to gain insight into the diagnosis, mechanism, and treatment of disease. A key aspect of these approaches involves comparing phenotypes that are defined in multiple terminologies that often cater to altogether different organisms, such as mice and humans. In this seminar, I will discuss computational approaches for harmonizing and utilizing phenotypes for translational research. We will examine case studies which involve the computation of semantic similarity, including the use of phenotypes to inform clinical diagnosis of rare diseases, to identify human drug targets using mouse knock-out models, and to explore phenotype-based approaches for drug repositioning.
Despite the massive amount of biomedical literature, only a small amount is available in a form that is readily computable. The National Center for Biomedical Ontology (NCBO) is hosting the first hackathon to develop a comprehensive Network of BioThings (proteins, genes, pathways, mutations, drugs, diseases) extracted from scientific research articles and integrated with public biomedical data (see blog post http://goo.gl/i91ngK). During this hackathon, we will (1) identify motivating use cases, (2) define a shared, sustainable, multi-component infrastructure to build the NoB, and (3) implement common data representations, ontology-based programmatic interfaces, and develop cool applications. We will do this in an open, scalable, responsive manner so that it becomes a major asset for hackers and biomedical researchers worldwide.
Building a Network of Interoperable and Independently Produced Linked and Open Biomedical Data
1. Building a network of interoperable and
independently produced linked and open
biomedical data
Michel Dumontier, Ph.D.
Associate Professor of Medicine (Biomedical Informatics)
Stanford University
@micheldumontier::ACS:23-08-16
An invited talk in support of the 2016 Herman Skolnik Awardees
2. My research aims to
develop computational
methods for biomedical
knowledge discovery
We develop tools and
methods to represent,
store, publish, integrate,
query, and reuse
biomedical data,
software, and ontologies
5. Reproducible discovery
1. Data Science Tools and Methods
– Infrastructure: To identify, annotate, link, integrate,
search for and query data and services
– Tools: To identify and uncover support for known or
novel associations
2. Community Standards
to contribute to and interrogate a massive, decentralized
network of interconnected data and software
7. FAIR: Findable, Accessible,
Interoperable, Re-usable
Findable
– Globally unique identifiers for datasets and the data they contain
– Rich set of descriptors to search and filter with
– Indexed and searchable
Accessible
– Identifiers can be used to retrieve representations using standard protocols
(e.g. HTTP)
– Metadata is always available.
Interoperable
– Data represented with formal knowledge representations
– Include links to other datasets/vocabularies
Reusable
– Licensing, Provenance, Community standards
8. The Semantic Web
is the new global web of knowledge
standards for publishing, sharing and querying
facts, expert knowledge and services
scalable approach for the discovery
of independently formulated
and distributed knowledge
9. Linked Data
offers a solid foundation for FAIR data
• Entities (people, proteins, pathways, etc) are
identified using globally unique identifiers (URIs)
• Entity descriptions are represented with a
standardized language (RDF)
• Data can be retrieved using a universal protocol
(HTTP)
• Entities (concepts, data, resources) can be linked
together to increase interoperability
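The four points on the Linked Data slide can be made concrete with a small Python sketch that describes one entity as subject-predicate-object statements and serializes them in an N-Triples-like form. This is only an illustration: the `example.org` identifiers are hypothetical, and a production system would use an RDF library rather than this hand-written serializer.

```python
class URI(str):
    """Marks a string as a URI rather than a literal value."""


RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def term(value):
    """Render a URI in angle brackets and anything else as a quoted literal."""
    if isinstance(value, URI):
        return f"<{value}>"
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) statements, one per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


# Hypothetical entity: a protein with a label and a link to a pathway.
protein = URI("http://example.org/protein/P1")
pathway = URI("http://example.org/pathway/W1")
statements = [
    (protein, URI(RDFS + "label"), "example protein"),
    (protein, URI(RDFS + "seeAlso"), pathway),
]
document = to_ntriples(statements)
```

Because both the subject and the linked pathway are HTTP URIs, a client could in principle dereference either one to retrieve further statements, which is what makes the link useful for interoperability.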
10. Linked Data for the Life Sciences
Bio2RDF is an open source project to unify the
representation and interlinking of biological data using RDF.
chemicals/drugs/formulations,
genomes/genes/proteins, domains
Interactions, complexes & pathways
animal models and phenotypes
Disease, genetic markers, treatments
Terminologies & publications
• 11B+ interlinked statements from 35 biomedical
datasets and 400+ ontologies
• dataset description, provenance & statistics
• A growing interoperable ecosystem with the EBI,
NCBI, DBCLS, NCBO, OpenPHACTS, and
commercial tool providers
13. Bio2RDF shows how datasets are
connected together
14. Queries can be federated across
private and public SPARQL databases
Get all protein catabolic processes (and more specific GO terms) in biomodels
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?reactions)
WHERE {
  # GO terms that are subclasses of "protein catabolic process"
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  # BioModels reactions annotated with those GO terms
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label
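A federated query like the one on this slide is submitted to a single SPARQL endpoint, which then contacts the two SERVICE endpoints itself. The Python sketch below only builds such a request under the SPARQL protocol (a GET request with a `query` parameter) and does not send it. The coordinating endpoint `http://example.org/sparql` is a hypothetical placeholder, and the query is written in standard SPARQL 1.1 form with an explicit prefix and GROUP BY.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint that would coordinate the federated query.
EXAMPLE_ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?reactions)
WHERE {
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request asking for
    JSON-formatted results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(EXAMPLE_ENDPOINT, QUERY)
# Sending it would be: urllib.request.urlopen(request), given a live endpoint.
```

Keeping request construction separate from sending makes the sketch usable without network access; whether the named Bio2RDF endpoints are currently reachable is not something this example assumes.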
15. Graph-like representation amenable
to finding mismatches and discovering new links
W Hu, H Qiu, M Dumontier. Link Analysis of Life Science Linked Data.
International Semantic Web Conference (2) 2015: 446-462.
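Finding mismatches and new links in a link graph can be sketched in Python as follows. The sketch assumes, for illustration only, that cross-references behave like symmetric and transitive equivalence links, which real cross-references do not always do. The identifiers `x:1`, `y:1` and `z:1` are hypothetical.

```python
from itertools import combinations


def missing_reciprocal_links(links):
    """Return the reverse links that symmetry would predict but are absent."""
    link_set = set(links)
    return sorted((b, a) for (a, b) in link_set if (b, a) not in link_set)


def candidate_links(links):
    """Return pairs that sit in the same connected component but have no
    direct link in either direction (transitivity-based candidates)."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)

    direct = {frozenset(pair) for pair in links}
    return sorted(
        pair
        for members in groups.values()
        for pair in combinations(sorted(members), 2)
        if frozenset(pair) not in direct
    )


links = [("x:1", "y:1"), ("y:1", "x:1"), ("y:1", "z:1")]
to_validate = missing_reciprocal_links(links)
to_propose = candidate_links(links)
```

In this toy graph, `y:1` points to `z:1` without a link back, so the reverse link is flagged for validation, while `x:1` and `z:1` are proposed as a candidate link because both connect to `y:1`.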
16. EbolaKB
Using Linked Data and Software
Kamdar, Dumontier. An Ebola virus-centered knowledge base. Database. 2015 Jun 8;2015. doi: 10.1093/database/bav049.
18. Can we implement
an open version of
PREDICT using
Linked Data?
AUC 0.91 across all therapeutic indications
Drug-drug similarity
A. Chemical structure Similarity
B. Side Effect Similarity
C. Target Sequence Similarity
D. Target Functional Similarity
E. Network Distance
Disease-disease similarity
A. Phenotype Based
B. Text Extracted Concepts
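One way to combine drug-drug and disease-disease similarities into a score for a candidate drug-disease pair is sketched below in Python. The scoring rule, the maximum over known associations of the geometric mean of the two similarities, is an assumption about how such a method can be formulated and is not taken from the slide. All names and similarity values are hypothetical.

```python
from math import sqrt


def lookup(table):
    """Build a symmetric similarity function from a table keyed by
    unordered pairs; identical items score 1.0, unknown pairs 0.0."""
    def similarity(a, b):
        if a == b:
            return 1.0
        return table.get(frozenset((a, b)), 0.0)
    return similarity


def association_score(drug, disease, known, drug_sim, disease_sim):
    """Score a candidate (drug, disease) pair by its most similar known
    association, excluding the candidate itself."""
    best = 0.0
    for known_drug, known_disease in known:
        if (known_drug, known_disease) == (drug, disease):
            continue
        score = sqrt(
            drug_sim(drug, known_drug) * disease_sim(disease, known_disease)
        )
        best = max(best, score)
    return best


# Hypothetical inputs: one known indication and one drug-drug similarity.
known = [("drugA", "diseaseX")]
drug_sim = lookup({frozenset(("drugA", "drugB")): 0.25})
disease_sim = lookup({})
score = association_score("drugB", "diseaseX", known, drug_sim, disease_sim)
```

With several similarity measures per side, each drug-measure and disease-measure combination would yield one such score, and the resulting scores could serve as features for a classifier. That step is left out of this sketch.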
19. HyQue: Hypothesis Validation
• A platform for knowledge discovery that
uses data retrieval coupled with
automated reasoning to validate
scientific hypotheses
• Leverages semantic technologies to
provide access to linked data,
ontologies, and semantic web services
• Uses positive and negative findings,
captures provenance
• Weighs evidence according to context
• Used to find aging genes in worm,
assess cardiotoxicity of tyrosine kinase
inhibitors
HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete.
May 27-31, 2012.
20. What evidence might we gather?
• clinical: Are there cardiotoxic effects associated with the drug?
– Literature (studies) [curated db]
– Product labels (studies) [r3:sider]
– Clinical trials (studies) [r3:clinicaltrials]
– Adverse event reports [r2:pharmgkb/onesides]
– Electronic health records (observations)
• pre-clinical associations:
– genotype-phenotype (null/disease models) [r2:mgi, r2:sgd; r3:wormbase]
– in vitro assays (IC50) [r3:chembl]
– drug targets [r2:drugbank; r2:ctd; r3:stitch]
– drug-gene expression [r3:gxa]
– pathways [r2:kegg; r3:reactome]
– Drug-pathway, disease-pathway enrichments [aberrant pathways]
– Chemical properties [r2:pubchem; r2:drugbank]
– Toxicology [r1:toxkb/cebs]
24. Expansion across domains
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
25. A rapidly growing network of Linked Data
“Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/”
35. Chemical Information Ontology
(CHEMINF)
• Collaborative ontology
• Distinguishes algorithmic,
or procedural information
from declarative, or factual
information, and renders of
particular importance the
annotation of provenance
to calculated data.
36. Where are we going?
• Large scale publishing across
biomedical datatypes is possible on the web
• Hubs, such as NCBI and EBI now integrate data,
but there is need for global coordination on all
datatypes
• Standard Vocabularies must be open, freely
accessible, and demonstrably reused
• Use of worldwide data integration formats (RDF)
and improved linking of data
• Easier to deploy toolkits for providing standards-
compliant linked data
37. Linked Data Platform
Docker
• Data conversion scripts
• Query Editor
• Faceted Browser
• Relation Exploration
• API
• Data and data store
Model Organism Linked Data
MO-LD.org
38. In Summary
• We use semantic technologies such as ontologies
and linked data to make sense of and facilitate
access to biomedical data (FAIR)
• The intimate development and use of standards
by PubChem and others brings us closer to an
interoperability ideal
• Much more work is needed to support
(computational) discovery in a reproducible
manner.
39. Acknowledgements
Dumontier Lab
• Amrapali Zaveri
• Mary Panahiazar
• Shima Dastgheib
• Sandeep Ayyar
• Remzi Celebi
• David Odgers
• Wei Hu
• Ruben Verborgh
• Leo Chepelev
• Alison Callahan
• Jose Miguel Toledo Cruz
• Tanya Hiebert
• Beatriz Lujan
+ many more
Collaborators
• Mark Musen
• Nigam Shah
• Robert Hoehndorf
• Janna Hastings
• Christoph Steinbeck
• Egon Willighagen
• Nico Adams
• Colin Batchelor
• David Wild
• Evan Bolton
• Gang Fu
+ many more