This document summarizes IRIDA, a federated genomic epidemiology platform for Canada. IRIDA aims to (1) build a user-friendly analysis platform to process genomic data, (2) enable more efficient information sharing between public health agencies, and (3) standardize inconsistent information representation through the use of ontologies. The platform is a partnership between various public health and academic institutions to bridge gaps between genomic research and applications in public health outbreak investigations.
The Human Variome Database in Australia in 2014 - Graham Taylor, Human Variome Project
There are a number of genetics and genomics initiatives underway in Australia, including the Australian node of the Human Variome Project (HVPA), as well as many active research collaborations including familial cancer, endocrine disease, and developmental delay. Most of these projects work with disease-specific databases on a research basis, with the risk that such archives may be ephemeral. HVPA is the only database that is directly integrated with accredited clinical reporting of variants. As such it is designed to capture variants that have passed scrutiny as diagnostically robust, and have therefore already been curated by qualified staff. Registered users access the HVPA database via a secure Internet portal.
I will describe three recent developments of the HVPA database and portal: the upgraded search interface, linkage to other datasets via BioGrid using hash-based de-identified case matching, and the introduction of a genome wide database using LOVD3. Finally I will discuss the future direction of the HVPA and the questions of utility, quality control and sustainability of genetic variation databases.
Search interface
The search interface has to provide useful tools for clinicians and lab scientists so that the HVPA project offers them direct benefits and incentivises them to participate. Following a request for feedback from users, a series of improvements was implemented, initially on a demonstration server and then on the live server following review by the Steering Committee. The highest priorities were more information about the number of times particular variants were recorded, the ability to search by range, and the ability to filter by pathogenicity. There was also interest in enabling direct uploading of VCF files and the automated calculation of pathogenicity scores. Many of these features are now implemented and examples will be presented.
Linkage to other datasets
We have implemented the hash key algorithm and work is in progress with BioGrid to link variation data to clinical data sets.
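The hash-based, de-identified case matching mentioned above can be illustrated with a minimal sketch. The actual HVPA/BioGrid hash key algorithm is not described in this abstract, so everything below (field normalization, keyed SHA-256, the `site_salt` name) is an assumption chosen to show the general idea: sites derive the same key from the same patient details without exchanging identifying data.

```python
import hashlib
import hmac

def case_hash(identifiers: list[str], site_salt: str) -> str:
    """Derive a de-identified case key from stable identifying fields.

    Illustrative only: normalize the fields, concatenate them, and hash
    with a shared secret (keyed HMAC-SHA256) so that participating sites
    can match cases without revealing the underlying identifiers.
    """
    canonical = "|".join(s.strip().lower() for s in identifiers)
    return hmac.new(site_salt.encode(), canonical.encode(),
                    hashlib.sha256).hexdigest()

# Two sites hashing the same (differently formatted) details agree:
k1 = case_hash(["Jane Doe", "1970-01-01"], site_salt="shared-secret")
k2 = case_hash(["jane doe ", "1970-01-01"], site_salt="shared-secret")
assert k1 == k2  # case matching without exchanging identity
```

A keyed hash (rather than a plain one) means an outsider without the shared salt cannot run dictionary attacks against the keys.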
Genome wide database
We have established an HVPA LOVD3 database and are working with the Human Genetics Society of Australasia on a pilot study to sequence the exomes of two trios and review the data using this database.
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca - Fiona Brinkman
Talk at the GenomeTrakr network meeting, Sept 23, 2015, in Washington, DC, on Canada's open-source Integrated Rapid Infectious Disease Analysis (IRIDA) bioinformatics platform, which aids genomic epidemiology analysis for public health agencies, with planned open data release and linkage to GenomeTrakr. Discussed perspectives, challenges, and solutions for increasing international participation in GenomeTrakr.
Recommendations for infrastructure and incentives for open science, presented to the Research Data Alliance 6th Plenary. Presenter: William Gunn, Director of Scholarly Communications for Mendeley.
With its focus on investigating the basis for the sustained existence of living systems, modern biology has always been a fertile, if not challenging, domain for formal knowledge representation and automated reasoning. With thousands of databases and hundreds of ontologies now available, there is a salient opportunity to integrate these for discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, develop methods to intelligently retrieve content of interest, uncover significant biological associations, and pursue new avenues for drug discovery. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability, and an understanding of how to maximize their value, researchers will be strategically poised to pursue increasingly sophisticated knowledge representation (KR) projects aimed at improving our overall understanding of human health and disease.
Bio: Dr. Michel Dumontier is an Associate Professor of Medicine (Biomedical Informatics) at Stanford University. His research aims to find new treatments for rare and complex diseases. His research interests lie in the publication, integration, and discovery of scientific knowledge. Dr. Dumontier serves as a co-chair for the World Wide Web Consortium Semantic Web in Health Care and Life Sciences Interest Group (W3C HCLSIG) and is the Scientific Director for Bio2RDF, a widely used open-source project to create and provide linked data for the life sciences.
Bio2RDF is an open-source project that offers a large and connected knowledge graph of Life Science Linked Data. Each dataset is expressed using its own vocabulary, which hinders the integration, search, querying, and browsing of similar or identical types of data. With growth and content changes in the source data, a manual approach to maintaining mappings has proven untenable. The aim of this work is to develop a (semi-)automated procedure to generate high-quality mappings between Bio2RDF and SIO using BioPortal ontologies. Our preliminary results demonstrate that our approach is promising in that it can find new mappings using a transitive closure over ontology mappings. Further development of the methodology, coupled with improvements in the ontology, will offer a better-integrated view of the Life Science Linked Data.
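The transitive-closure step described above can be sketched in a few lines. This is not the project's actual pipeline; the term identifiers are made up, and the sketch performs only a single transitive step over symmetric mappings: if term A maps to B and B maps to C, propose A maps to C.

```python
# Sketch: propose new ontology-term mappings via transitivity,
# assuming existing pairwise mappings (e.g. from BioPortal).
from collections import defaultdict

def transitive_mappings(pairs):
    """Return candidate mappings implied by one transitive step."""
    graph = defaultdict(set)
    for a, b in pairs:            # mappings are treated as symmetric
        graph[a].add(b)
        graph[b].add(a)
    proposed = set()
    for a in graph:
        for b in graph[a]:
            for c in graph[b]:
                if c != a and c not in graph[a]:
                    proposed.add(tuple(sorted((a, c))))
    return proposed

# Hypothetical identifiers: a Bio2RDF term mapped to a Sequence
# Ontology term, which in turn is mapped to an SIO term.
known = [("bio2rdf:Gene", "so:gene"), ("so:gene", "sio:SIO_010035")]
print(transitive_mappings(known))
```

In practice such candidates would be reviewed before acceptance, since transitivity over imperfect mappings can propagate errors.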
Scientific research is increasingly dependent on publicly available information and data sharing. So far, the best practice to ensure that data is accessible and shareable has been to deposit it in public repositories. However, these repositories often fail to implement mechanisms that measure data quality, which could lead to improving the discoverability of existing data and contribute to its future integration. In light of this, we present Metadata Analyser, a tool that measures metadata quality. It assesses the quality of metadata by considering the proportion of terms actually linked to ontology concepts, as well as the specificity of the terms used in the metadata. We applied Metadata Analyser to MetaboLights, a real-world repository of metabolomics data, and the results show that the tool successfully implements the proposed measures, that there is indeed a lack of effort in the annotation task, and that our tool can be used to improve this situation. Metadata Analyser's frontend is available at http://masterweb-metadataanalyser.rhcloud.com.
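The two measures the abstract names, coverage of ontology-linked terms and specificity, can be sketched as follows. This is an illustration, not Metadata Analyser's implementation: the mapping, the depth values, and the normalization by maximum ontology depth are all assumptions.

```python
# Sketch of two metadata-quality measures: coverage (fraction of terms
# linked to ontology concepts) and specificity (how deep the linked
# concepts sit in the ontology hierarchy; deeper = more specific).
def metadata_quality(terms, term_to_concept, concept_depth, max_depth):
    linked = [t for t in terms if t in term_to_concept]
    coverage = len(linked) / len(terms) if terms else 0.0
    if linked:
        depths = [concept_depth[term_to_concept[t]] for t in linked]
        specificity = sum(depths) / (len(depths) * max_depth)
    else:
        specificity = 0.0
    return coverage, specificity

# Hypothetical annotations for a metabolomics study:
mapping = {"glucose": "CHEBI:17234", "blood plasma": "UBERON:0001969"}
depths = {"CHEBI:17234": 8, "UBERON:0001969": 6}
cov, spec = metadata_quality(
    ["glucose", "blood plasma", "misc"], mapping, depths, max_depth=10)
print(cov, spec)  # 2 of 3 terms linked; mean depth 7 of 10
```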
Semantic web technologies offer a potential mechanism for the representation and integration of thousands of biomedical databases. Many of these databases offer cross-references to other data sources, but these are generally incomplete and prone to error. In this paper, we conduct an empirical analysis of the link structure of life science Linked Data, obtained from the Bio2RDF project. Three different link graphs for datasets, entities and terms are characterized by degree, connectivity, and clustering metrics, and their correlation is measured as well. Furthermore, we utilize the symmetry and transitivity of entity links to build a benchmark and evaluate several popular entity matching approaches. Our findings indicate that the life science data network can help find hidden links, can be used to validate links, and may offer a mechanism to integrate a wider set of resources to support biomedical knowledge discovery.
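Two of the analyses described in that abstract, degree metrics and the symmetry of entity links, can be sketched on a toy cross-reference graph. The identifiers below are illustrative, not drawn from the paper's benchmark.

```python
# Sketch: characterize a directed graph of entity cross-references by
# degree, and check link symmetry (a -> b mirrored by b -> a), which
# the paper uses to build an entity-matching benchmark.
from collections import defaultdict

links = [("drugbank:DB00945", "kegg:D00109"),
         ("kegg:D00109", "drugbank:DB00945"),   # symmetric pair
         ("drugbank:DB00945", "chebi:15365")]   # no back-link

out_deg, in_deg = defaultdict(int), defaultdict(int)
for src, dst in links:
    out_deg[src] += 1
    in_deg[dst] += 1

link_set = set(links)
symmetric = [l for l in links if (l[1], l[0]) in link_set]
print(f"symmetric links: {len(symmetric)}/{len(links)}")
```

Asymmetric links like the third one are exactly the "hidden link" candidates the paper suggests the network can help find or validate.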
Step by step tutorial for conducting GO enrichment analysis and then creating a network from the results.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
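The core statistical step of a GO enrichment analysis like the one in that tutorial is typically a hypergeometric test: given N background genes of which K carry a GO term, how surprising is it that a study set of n genes contains k of them? A minimal stdlib sketch (the specific counts below are invented for illustration):

```python
# Upper-tail hypergeometric test for GO term enrichment.
from math import comb

def go_enrichment_pvalue(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# e.g. 20 of 10,000 background genes carry the term; a 100-gene hit
# list contains 5 of them (expected by chance: 0.2):
p = go_enrichment_pvalue(N=10_000, K=20, n=100, k=5)
print(f"p = {p:.2e}")  # a small p suggests the term is enriched
```

Real tools additionally correct for testing thousands of GO terms at once (e.g. Benjamini-Hochberg FDR).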
Improving surveillance and early detection of Foot-and-mouth And Similar Transboundary (FAST) animal diseases in the South-East European Neighbourhood (SEEN) countries - EuFMD
ANSES assessment of laboratory needs and capacities.
Improving surveillance and early detection of Foot-and-mouth And Similar Transboundary (FAST) animal diseases in the South-East European Neighbourhood (SEEN) countries / ANSES assessment of laboratory needs and capacities.
Virtual Workshop.
27-30 April 2020.
Genome sharing projects around the world - Nijmegen, Oct 29, 2015 - Fiona Nielsen
Genome sharing projects across the world
Did you ever wonder what happened to the exponential increase in genome sequencing data? It is out there around the world and a lot of it is consented for research use. This means that if you just know where to find the data, you can potentially analyse gigabytes of data to power your research.
In this talk Fiona will present community genome initiatives, the genome sharing projects across the world, how you can benefit from this wealth of data in your work, and how you can boost your academic career by sharing and collaboration.
by Fiona Nielsen, Founder and CEO of DNAdigest and Repositive
With a background in software development Fiona pursued her career in bioinformatics research at Radboud University Nijmegen. Now a scientist-turned-entrepreneur Fiona founded DNAdigest and its social enterprise spin-out Repositive Ltd. Both the charity and company focus on efficient and ethical sharing of genetics data for research to accelerate diagnostics and cures for genetic diseases.
BIOLINK 2008: Linking database submissions to primary citations with PubMed Central - Heather Piwowar
Abstract: Background: Dataset submissions are growing exponentially. Links between dataset submissions and primary literature that describe the data collection are useful for many reasons: rich documentation, proper attribution, improved information retrieval, and enhanced text/data integration for analysis. Unfortunately, many database submissions do not include primary citation links, as database submissions are often made prior to publication. We suggest that automated tools can be developed to help identify links between dataset submissions and the primary literature. These tools require full text to differentiate cases of data sharing from data reuse and other contexts. In this study, we explore the possibility that deep analysis of full text may not be necessary, thereby enabling the querying of all reports in PubMed Central. Methods: We trained machine learning tree and rule-based classifiers on full-text open-access article unigram vectors, with the existence of a primary citation link from NCBI’s Gene Expression Omnibus (GEO) database submission records as the binary output class. We manually combined and simplified the classifier trees and rules to create a query compatible with the interface for PubMed Central. Results: The query identified 40% of non-OA articles with dataset submission links from GEO (recall), and 65% of the returned articles without dataset submission links were manually judged to include statements of dataset deposit despite having no link from the database (applicable precision). Conclusion: We hope this work inspires future enhancements, and highlights the opportunities for simple full-text queries in PubMed Central given the mandated influx of NIH-funded research reports.
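The abstract's pipeline, unigram vectors with a binary class label, distilled into a full-text query, can be illustrated with a toy version. The documents and labels below are invented, and where the authors trained tree and rule-based classifiers, this sketch simply picks the single word that best separates articles with and without a GEO submission link.

```python
# Toy sketch: unigram-presence features + a one-word "rule" that could
# serve as a PubMed Central full-text query term. Not the paper's
# actual classifier; data and labels are made up.
docs = [
    ("the microarray data were deposited in geo accession gse123", 1),
    ("expression data are available in geo under gse456", 1),
    ("we reanalyzed published data from a prior study in the literature", 0),
    ("no new data in this review were generated", 0),
]

vocab = {w for text, _ in docs for w in text.split()}

def accuracy(word):
    """Fraction of docs where word presence predicts the label."""
    return sum((word in text.split()) == bool(label)
               for text, label in docs) / len(docs)

best = max(vocab, key=accuracy)
query = f'"{best}"'  # usable as a full-text query term
print(best, query)
```

The paper's point is that such simplified word-presence rules, manually distilled from the learned trees, recovered a useful fraction of submission links without deep full-text analysis.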
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in the Microbiological Testing & Traceability for Foodborne Pathogens - ExternalEvents
http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in the Microbiological Testing & Traceability for Foodborne Pathogens. Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management, 23-25 May 2016, Rome, Italy.
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: the reproducibility crisis, and the need for transparency - GigaScience, BGI Hong Kong
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: the reproducibility crisis, and the need for transparency. Melbourne University 19th September 2014
Using ADAGE for pathway-style analyses - Casey Greene
This talk was given at the Simons Institute Network Biology workshop. A video of the talk is available online:
https://www.youtube.com/watch?v=HpXDoMi4YO8
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infection Control in an Institutional Setting - ExternalEvents
http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infection Control in an Institutional Setting. Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management and GMI-9, 23-25 May 2016, Rome, Italy.
This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review.
OntoMaton: NCBO BioPortal ontology lookups in Google Spreadsheets, produced by the ISA Team at the University of Oxford e-Research Centre (Eamonn Maguire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra and Susanna Sansone) and NCBO (Trish Whetzel).
The work was presented during ICBO 2013 in Montreal by Trish Whetzel (Thanks Trish!)
Metagenomic Data Provenance and Management using the ISA infrastructure - overview, implementation patterns & software tools - Alejandra Gonzalez-Beltran
Metagenomic Data Provenance and Management using the ISA infrastructure - overview, implementation patterns & software tools
Slides presented at EBI Metagenomics Bioinformatics course: http://www.ebi.ac.uk/training/course/metagenomics2014
IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiolo... - William Hsiao
Introducing BCCDC and Public Health Microbiology (PHM)
Current State of PHM
Sequence Technology Advancement -> revolution of PHM
Genomic Epidemiology
Amount of Sequence Data Produced
Need to Process the data – Introduction to IRIDA
Need for Metadata and Ontologies
Software to improve data sharing
How research microbiology and PHM can join efforts
Workshop: finding and accessing data - Fiona, Nadia, Charlotte - Cambridge, Apr... - Fiona Nielsen
Workshop presentation on finding and accessing human genomics data for research.
Including statistics of publicly available data sources and tips on how to save time in your workflow of data access.
Organised in collaboration between DNAdigest and Open Data Cambridge.
Read more about our work:
http://DNAdigest.org
http://repositive.io
https://uk.linkedin.com/in/fionanielsen
http://www.data.cam.ac.uk
SciDataCon - How to increase accessibility and reuse for clinical and persona... - Fiona Nielsen
Presented in session 48 - Sharing of sensitive data - presented by Fiona Nielsen on September 12, 2016 at #SciDataCon http://scidatacon.org
We have addressed the most pressing problem for public genomic data, that of data discoverability, by indexing worldwide resources for genomic research data on an online platform (repositive.io) providing a single point of entry to find and access available genomic research data.
http://www.scidatacon.org/2016/sessions/48/paper/26/
http://www.scidatacon.org/2016/sessions/48/
International data week - #RDAPlenary #IDW2016
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present - Tim Williams
Abstract:
In recent years, the PhUSE organization has supported several Linked Data initiatives. The CDISC Foundational Standards as RDF is an early example of one such initiative. The results are available on the CDISC website. Subsequent proof of concept projects enjoyed marginal success at a time when pharma’s familiarity with the technology was still very limited. A recent surge in interest in F.A.I.R. data and Knowledge Graphs has sparked renewed interest in Linked Data within PhUSE and the industry at large. The recently completed “Clinical Trials Data as RDF (CTDasRDF)” spawned a new project, “Going Translational With Linked Data (GoTWLD).” GoTWLD extends the project scope of its predecessor beyond SDTM into the non-clinical domain.
Educational initiatives at PhUSE include an introductory, interactive workshop at the annual European conference (EU-Connect) and at the US Computational Science Symposium (CSS). A side-project of GoTWLD is investigating the potential use of URIs as study identifiers to promote adoption of Linked Data. Challenges remain, including the need for demonstrable return on investment and the development of user-friendly, intuitive interfaces for graph data. These challenges can be overcome if pharmaceutical companies cooperate in the pre-competitive space.
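The "URIs as study identifiers" idea mentioned above can be sketched without any RDF library: instead of a bare string identifier, a study gets a dereferenceable URI to which statements are attached as triples. The namespace and schema terms below are hypothetical, not PhUSE or CDISC URIs.

```python
# Sketch: mint a URI for a study and emit N-Triples-style statements
# about it. All namespaces here are illustrative placeholders.
STUDY_NS = "https://example.org/study/"   # assumed, not a PhUSE URI

def study_triples(study_id, sponsor, phase):
    s = f"<{STUDY_NS}{study_id}>"
    return [
        (s, "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
         "<https://example.org/schema#ClinicalStudy>"),
        (s, "<https://example.org/schema#sponsor>", f'"{sponsor}"'),
        (s, "<https://example.org/schema#phase>", f'"{phase}"'),
    ]

for s, p, o in study_triples("CDISCPILOT01", "ExamplePharma", "Phase 3"):
    print(f"{s} {p} {o} .")
```

Because the identifier is a URI, any Linked Data consumer can merge statements about the same study from different sources without a mapping table.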
Presented at Semantics@Roche, Basel 2019-04-04
CINECA webinar slides: Open science through FAIR health data networks - dream o... - CINECA Project
Since the FAIR data principles were published in 2016, many organizations, including science funders and governments, have adopted these principles to promote and foster true open science collaborations. However, defining a vision and creating a video of a Personal Health Train that leverages worldwide FAIR health data in a federated manner is one step. Actually making this happen at scale, and being able to show new scientific and medical insights from it, is quite another!
In this webinar, we will dive into the basics of FAIR health data, but also take stock of the current situation in health data networks: after a year of frantic research and collaborations and many open datasets and hackathons on COVID-19, has the situation actually improved? Are we sharing health data on a global scale to improve medical practice, or is quality medical data still only accessible to researchers with the right credentials and deep pockets?
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 21st January 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
Access the webinar: http://goo.gl/p08pTz
These slides were presented in a webinar by Denodo in collaboration with BioStorage Technologies and Indiana Clinical and Translational Sciences Institute and Regenstrief Institute.
BioStorage Technologies, Inc., Indiana Clinical and Translational Sciences Institute, and Regenstrief Institute (CTSI) have joined Denodo to talk about the important role of technological advancements, such as data virtualization, in advancing biospecimen research.
By watching this webinar, you can gain insight into best practices around the integration of biospecimen and research data as well as technology solutions that provide consolidated views and rapid conversions of this data into valuable business insights. You will also learn how data virtualization can assist with the integration of data residing in heterogeneous repositories and can securely deliver aggregated data in real-time.
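The data-virtualization idea described above, one query interface that joins heterogeneous repositories at query time instead of copying them into a warehouse, can be illustrated with a toy sketch. The sources and fields below are invented.

```python
# Toy sketch of a virtual, federated view over two "repositories":
# a biospecimen inventory and a clinical record store. A real data
# virtualization layer would push queries down to live sources.
biobank = [{"specimen_id": "S1", "subject": "P01", "type": "plasma"},
           {"specimen_id": "S2", "subject": "P02", "type": "serum"}]
ehr = {"P01": {"diagnosis": "T2D"}, "P02": {"diagnosis": "CKD"}}

def virtual_view():
    """Join specimens with clinical data at query time (no copy)."""
    for row in biobank:
        clinical = ehr.get(row["subject"], {})
        yield {**row, **clinical}

# Consolidated query across both sources:
print([r for r in virtual_view() if r.get("diagnosis") == "T2D"])
```

Because the join happens on demand, updates in either source appear in the next query with no synchronization step.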
This presentation was provided by Violeta Ilik of Northwestern University during the NISO Virtual Conference held on Feb 15, 2017, entitled Institutional Repositories: Ensuring Yours is Populated, Useful and Thriving. The DOI for this presentation is http://dx.doi.org/10.18131/G3VP6R
Internal NIH Seminar to the BISTI Team on some early thoughts from the Associate Director for Data Science (ADDS). These ideas are for discussion only and in no way reflect what might happen subsequently. Presented April 1, 2014 (the date is purely a coincidence).
SBM Open Science Committee report to the board - Bradford Hesse
In the spirit of transparency, I am uploading a mid-course presentation I made to the Board of Directors for the Society of Behavioral Medicine on the topic of Open Science. The report embodies the best thinking of some of the greatest thinkers in our field.
This PDF is about schizophrenia.
For more details, visit the SELF-EXPLANATORY channel on YouTube:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Nutraceutical market, scope and growth: Herbal drug technology - Lokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional foods, beverages, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people increasingly seek natural and preventative health solutions, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... - Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been achieved using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, responses to treatment, and developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction, vascularization, and tumor metastasis in exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Cancer cell metabolism: special reference to the lactate pathway - AADYARAJPANDEY1
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
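The ATP arithmetic above can be checked with a few lines of code. This is a sketch: the 2 and ~36 ATP-per-glucose figures are the approximate textbook values quoted in the text, and the helper function name is illustrative.

```python
# Approximate ATP yield per glucose molecule, per the figures above.
ATP_GLYCOLYSIS = 2         # net ATP from glycolysis alone
ATP_FULL_RESPIRATION = 36  # glycolysis + Krebs cycle + oxidative phosphorylation

def glucose_needed(atp_demand, atp_per_glucose):
    """Glucose molecules needed to meet an ATP demand (rounded up)."""
    return -(-atp_demand // atp_per_glucose)  # ceiling division

# To produce 360 ATP, a fully respiring cell needs 10 glucose molecules,
# while a glycolysis-only (Warburg-like) cell needs 180 - illustrating why
# cancer cells must take up far more sugar to survive.
print(glucose_needed(360, ATP_FULL_RESPIRATION))  # 10
print(glucose_needed(360, ATP_GLYCOLYSIS))        # 180
```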
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the WARBURG EFFECT:
Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his "discovery of the nature and mode of action of the respiratory enzyme."
WARBURG EFFECT: the propensity of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg observed that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Richard's entangled adventures in wonderland - Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
IRIDA: Canada’s federated platform for genomic epidemiology
1. IRIDA: Canada’s federated platform for genomic epidemiology
William Hsiao, Ph.D.
William.hsiao@bccdc.ca
@wlhsiao
BC Public Health Microbiology and Reference Laboratory
and University of British Columbia
ABPHM 2015
2. Genome Canada Bioinformatics Competition: Large-Scale Project
“A Federated Bioinformatics Platform for Public Health Microbial Genomics”
Our Goal: The IRIDA platform (Integrated Rapid Infectious Disease Analysis)
An open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations
www.IRIDA.ca
3. Each year, one in eight Canadians (or four million people) get sick with a domestically acquired food-borne illness.
4. Partnership among public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and application to real-life and real-time use cases in public health agencies
- Project Team has direct access to state of the art research in academia
- Project Team is directly embedded in user organization
(Diagram: National Public Health Agency, Provincial Public Health Agency, Academic/Public)
5. Interviews with key personnel to identify barriers to implementing genomic epidemiology in public health agencies
6. GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS
7. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
• A carefully designed and engineered software platform is just the starting point…
(Architecture diagram: User Interface, Security, File system, Metadata, Storage, Application logic, REST API, Workflow Execution Manager, Continuous Integration, Documentation)
8. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
• Easy to use interface hiding the technical details
9. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
10. Solution 1b: Build Portable and Transparent Pipelines
• Use Galaxy as workflow engine – large community support
• Retooled to address usability, security, and other limitations
• Version-controlled pipeline templates
• Input files, parameters, and workflow are sent to IRIDA-specific Galaxy for execution
• Results and provenance information are copied from Galaxy
(Diagram: 1. Input files are sent from the IRIDA UI/DB to Galaxy via the REST API and a shared file system; 2. Tools – assembly, variant calling, etc. – are executed on Galaxy workers; 3. Results are downloaded from Galaxy)
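The three-step IRIDA/Galaxy hand-off described above can be sketched as plain functions. This is an illustrative simulation only: the function names, the "snvphyl-v1.0" template label, and the data shapes are hypothetical, not IRIDA's actual API.

```python
# Hypothetical sketch of the three-step IRIDA <-> Galaxy hand-off.

def send_inputs_to_galaxy(files, parameters):
    """Step 1: register input files and parameters with Galaxy (simulated)."""
    return {"history_id": "h1", "inputs": files, "params": parameters}

def execute_workflow(history, workflow_template):
    """Step 2: Galaxy workers run the version-controlled pipeline template."""
    return {"history_id": history["history_id"],
            "workflow": workflow_template,
            "outputs": [f"result_of_{f}" for f in history["inputs"]]}

def download_results(invocation):
    """Step 3: copy results and provenance back into the IRIDA database."""
    return {"results": invocation["outputs"],
            "provenance": {"workflow": invocation["workflow"]}}

history = send_inputs_to_galaxy(["sample1.fastq"], {"min_coverage": 10})
invocation = execute_workflow(history, "snvphyl-v1.0")
record = download_results(invocation)
print(record["results"])  # ['result_of_sample1.fastq']
```

Keeping the workflow template name in the provenance record mirrors the slide's point that results come back with the version of the pipeline that produced them.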
11. Solution 1c: Start the training NOW!
• Canada’s National Microbiology Laboratory has hosted genomic workshops for partners and collaborators
• IRIDA Project has dedicated funding for hosting workshops in 4Q of 2015 and 2016
• We would like to hear about other training initiatives and share experience and training material
13. Many Players in surveillance and outbreak – ineffective information sharing
(Diagram: information flows from cases, physicians, and the frontline lab through the local public health dept., provincial laboratory, and provincial public health dept. to the national laboratory; bioinformatics and analytical capacities increase toward the centre)
Source: M. Taylor, BCCDC
14. Many Systems used in Reporting Diseases – require data re-entry and re-coding
(Diagram: cases, physicians, local laboratories, local and provincial public health depts., the provincial laboratory, the national laboratory, and the national Ministry of Health exchange information by fax, phone, electronic transfer, paper, and mailing of samples)
Source: M. Taylor, BCCDC
15. IRIDA is designed with these dilemmas in mind
• Solutions:
– 2a: Localized instances of federated databases
– 2b: Permission control – authentication/authorization for information sharing
– 2c: User role-based display of information
16. Solution 2a: Local/Cloud Instances and Data Federation
• Data processing capacity pushed to data-generating labs
• Allow data sharing securely for enhanced analysis
• Eventually cultivating a culture of openness of data sharing and collaborative development of tools
17. Solution 2b: Security (Authorization)
• Local authorization per instance
• Method-level authorization
• Object-level authorization
• Allow secure, fine-grained and flexible information sharing
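The layered checks above (method-level plus object-level authorization) can be illustrated with a short sketch. The decorator, role names, and project structure here are hypothetical, not IRIDA's actual security model.

```python
from functools import wraps

def requires_role(role):
    """Method-level check: the caller must hold the given role."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user["roles"]:
                raise PermissionError(f"{user['name']} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

def can_read(user, project):
    """Object-level check: the user must be listed on this specific project."""
    return user["name"] in project["members"]

@requires_role("lab_user")
def read_samples(user, project):
    if not can_read(user, project):
        raise PermissionError("not a member of this project")
    return project["samples"]

alice = {"name": "alice", "roles": ["lab_user"]}
outbreak = {"members": ["alice"], "samples": ["S1", "S2"]}
print(read_samples(alice, outbreak))  # ['S1', 'S2']
```

Combining both layers gives the fine-grained sharing the slide describes: a role grants access to an operation, but each object still carries its own membership list.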
18. Solution 2c: Role-based Dynamic Display driven by Ontology
• Ontologies often lack a content management system (CMS)
• An Interface Model Ontology (IFM) can define a CMS for an ontology
19. (Diagram: IFM interface view permissions – a detailed view vs. a restricted view; e.g. user role permissions control visibility and editing of content)
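The detailed-view/restricted-view idea can be sketched as role-driven field filtering. The role and field names below are invented examples, not terms from the actual Interface Model Ontology.

```python
# Hypothetical per-role view definitions (detailed vs. restricted).
VIEW_FIELDS = {
    "epidemiologist": ["sample_id", "collection_date", "patient_age", "region"],
    "external_partner": ["sample_id", "collection_date"],  # restricted view
}

def render_view(record, role):
    """Return only the fields the role's view permits."""
    allowed = VIEW_FIELDS[role]
    return {k: v for k, v in record.items() if k in allowed}

record = {"sample_id": "S1", "collection_date": "2015-04-01",
          "patient_age": 42, "region": "BC"}
print(render_view(record, "external_partner"))
# {'sample_id': 'S1', 'collection_date': '2015-04-01'}
```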
21. Solution 3a: Use Ontology
• Ontology: a way to describe types of entities and relations between them
• Why use ontology?
– Ontology is flexible and expandable
– Lower levels of expressivity (e.g. controlled vocabulary, data dictionary) are heavy-handed and show low levels of compliance and adoption
– Free text is used as an alternative but is not computing-friendly
– Ontology and semantic web technologies may be a solution
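The "types of entities and relations between them" idea can be shown with a minimal triple-store sketch. The terms below are illustrative inventions, not drawn from any published ontology.

```python
# A tiny subject-predicate-object triple store.
triples = set()

def add(s, p, o):
    triples.add((s, p, o))

# Classes and a relation between them (illustrative terms only)
add("Salmonella_isolate", "is_a", "bacterial_isolate")
add("outbreak_case", "has_specimen", "Salmonella_isolate")
# Ontologies are expandable: new terms slot in without a schema change
add("food_sample", "is_a", "environmental_sample")

def objects(subject, predicate):
    """Query: everything related to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects("outbreak_case", "has_specimen"))  # {'Salmonella_isolate'}
```

Unlike free text, every statement here is machine-queryable, which is the computing-friendliness the slide contrasts against.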
22. Many Domains of Knowledge are needed to describe an outbreak investigation
Build On, Work With: OBI; TypON; NGSOnto; NIAID-GSC-BRC core metadata; MIxS Ontology; NCBI BioSample etc.; TRANS – Pathogen Transmission; EPO; Exposure Ontology; Infectious Disease Ontology; CARD, ARO for AMR; USDA Nutrient DB; EFSA Comp. Food Consump. DB
Example gaps to be filled: expand food ontology; expand CARD AMR data with others.
23. Lab Checklist/Ontology
• Currently finishing a lab/genomics checklist and starting an epidemiology checklist
• Metadata Domains:
– Sample Collection
– Sample Source
– Environmental
– Lab Analytics
– Sequencing Process / QC
– Sequencing Run / QC
– Assembly Process / QC
– Others overlapping with Epi: Demographic / Geographic / etc.
24. GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING
25. Solution 4a: Use of QA/QC in IRIDA
• Software Engineering
– High quality software that meets regulatory guidelines
– Open Source product to ensure “white box” testing
– Ontology driven software development
– Follow proper software development cycle
• Data Quality
– Built-in modules to check for input data quality
– Warnings and feedback during pipeline execution to laboratory technologists
– Use of ontology to check (non-genomic) metadata quality
• Analytic Tool Quality
– Utilize validation datasets
– Use of abstract pipeline description – with version control
– Periodic analysis of exceptions and boundary cases to assess tool accuracy
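A built-in input-quality check like the ones described above can be sketched as follows. The Phred-score threshold and the warning format are illustrative assumptions, not IRIDA's actual QC module.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Sanger/Illumina 1.8+ encoding)."""
    scores = [ord(c) - offset for c in quality_string]
    return sum(scores) / len(scores)

def check_read_quality(quality_string, threshold=20):
    """Warn the technologist when a read's mean quality falls below threshold."""
    q = mean_phred(quality_string)
    if q < threshold:
        return f"WARNING: mean quality {q:.1f} below {threshold}"
    return "OK"

print(check_read_quality("IIIIIIIIII"))  # 'I' encodes Phred 40 -> OK
print(check_read_quality("$$$$$$$$$$"))  # '$' encodes Phred 3 -> warning
```

Surfacing such warnings during pipeline execution, rather than after, is what lets lab technologists catch bad runs early.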
26. Solution 4b: Generation of validation datasets
To participate, contact Rene Hendriksen (rshe@food.dtu.dk) or Errol Strain (Errol.Strain@fda.hhs.gov)
http://www.globalmicrobialidentifier.org/Workgroups#work-group-4
27. Solution 4c: Exploratory tools can access certain data via REST API securely
• IslandViewer – http://pathogenomics.sfu.ca/islandviewer (Dhillon and Laird et al. 2015, Nucleic Acids Research)
• GenGIS – http://kiwi.cs.dal.ca/GenGIS (Parks et al. 2013, PLoS One)
28. Availability
• Jun 1 2015: IRIDA 1.0 beta Internal Release
– Release to collaborators for installation and full test
• Jul 1 2015: IRIDA 1.0 beta1
– Announce beta release; download and documentation available on website – www.irida.ca
• Aug 1 2015: IRIDA 1.0 beta2
– Cloud installer, with documentation
– Additional pipelines as available
– Visualization as available
29. Acknowledgements
Project Leaders
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
University of Lisbon
João Carriço
National Microbiology Laboratory (NML)
Franklin Bristow
Aaron Petkau
Thomas Matthews
Josh Adam
Adam Olson
Tarah Lynch
Shaun Tyler
Philip Mabon
Philip Au
Celine Nadon
Matthew Stuart-Edwards
Morag Graham
Chrystal Berry
Lorelee Tschetter
Aleisha Reimer
Laboratory for Foodborne Zoonoses (LFZ)
Eduardo Taboada
Peter Kruczkiewicz
Chad Laing
Vic Gannon
Matthew Whiteside
Ross Duncan
Steven Mutschall
Simon Fraser University (SFU)
Melanie Courtot
Emma Griffiths
Geoff Winsor
Julie Shay
Matthew Laird
Bhav Dhillon
Raymond Lo
BC Public Health Microbiology &
Reference Laboratory (PHMRL) and BC
Centre for Disease Control (BCCDC)
Judy Isaac-Renton
Patrick Tang
Natalie Prystajecky
Jennifer Gardy
Damion Dooley
Linda Hoang
Kim MacDonald
Yin Chang
Eleni Galanis
Marsha Taylor
Cletus D’Souza
Ana Paccagnella
University of Maryland
Lynn Schriml
Canadian Food Inspection Agency (CFIA)
Burton Blais
Catherine Carrillo
Dominic Lambert
Dalhousie University
Rob Beiko
Alex Keddy
McMaster University
Andrew McArthur
Daim Sardar
European Nucleotide Archive
Guy Cochrane
Petra ten Hoopen
Clara Amid
European Food Safety Agency
Leibana Criado Ernesto
Vernazza Francesco
Rizzi Valentina
31. The IRIDA platform (Integrated Rapid Infectious Disease Analysis)
An open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations
Contacts:
William.hsiao@bccdc.ca
@wlhsiao
www.IRIDA.ca
Editor's Notes
Today, I’d like to tell you a bit about some of Canada’s effort on building a genomic epidemiology analysis platform
IRIDA was conceived about 2 years ago through a Genome Canada Bioinformatics Grant. It is an effort to build an open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time disease outbreak investigations, initially focused on food-borne illnesses
Despite our high standard in food safety, each year one in eight Canadians gets food poisoning, costing the economy $4 billion. It is important to track the source and spread of the disease to prevent further sickness
IRIDA is partnership among provincial public health agencies, national public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and real-life and real-time use cases in public health agencies
Project Team has direct access to state of the art research in academia
Project Team is directly embedded in user organization
Since we have access to the end users, we conducted interviews with these subject experts to identify the barriers to uptake of genomic epidemiology in public health agencies. We interviewed epidemiologists, lab scientists and technologists, medical microbiologists and lab administrators. So for the rest of the presentation, I’ll talk about some of the gaps we identified and how IRIDA can meet the requirements.
The first gap which should not be a surprise to this audience, is that public health workers are mostly unfamiliar with genomics and the bioinformatics analysis needed to process and interpret genomic data
While we believe that in the long run adequate training in genomics is needed to bridge this gap, and that in the short term experts such as yourselves are needed as stopgaps, having a high quality analysis platform that automates data processing with consistent analysis protocols will help ease the transition.
However, a carefully designed and engineered software platform is just the starting point, and there will no doubt be many similar platforms to choose from. So I will touch on some of the more interesting design philosophies we have for IRIDA.
We found that in the diagnostic testing world, complex procedures with lots of options lead to more human errors and more non-compliance. So, one design solution we stress is a simple user interface that hides the technical details. This solution of course can’t stand on its own, and I’ll describe measures to ensure that flexibility and scientific rigor can be maintained
We think a user interface should be like a joke: if you have to explain it, then it’s not good. That said, we do have extensive documentation for the administrators and accreditation auditors who don’t like jokes :P
The next solution is to leverage Galaxy, which has large community support and a large user base, as our pipeline engine. We had to retool Galaxy extensively to address usability, security and other limitations. To achieve this we built the IRIDA platform on top of the Galaxy engine, where input files, parameters and workflows are sent to Galaxy for execution, and results and pipeline provenance information are copied back into the IRIDA database.
To address the knowledge gaps in genomics, we have started training our public health lab workers on genomic analysis. We would like to hear about other training initiatives and will be happy to share our experience and training material
The second gap that we identified is that sharing of information within and between organizations is highly inefficient and often involves sharing of Excel files with deleted columns to hide sensitive information
There are many players involved in infectious disease surveillance and outbreak investigation. However, concerns with privacy and confidentiality (both founded and unfounded) mean that information tends to be aggregated and lost as we move from the frontline labs to public health and reference labs, even though bioinformatics and analytical capacities are most abundant in central labs and academia
Moreover, different institutions have different software, and often data is exported and printed, faxed, then re-imported into a new system by re-typing! This is a huge waste of time and a source of errors
IRIDA has a few designs to deal with these issues, and I’ll highlight 3 here.
First we propose that we should push the data processing capacity to the periphery where data is the richest by encouraging local or private cloud instances of the IRIDA platform. This way our partners would not be obligated to give up their data. The different instances are connected via a federated database schema. Data can then be shared securely and easily to allow enhanced analysis to be done by genomic experts located centrally. The more we share successfully, the more likely people will realize the benefit in sharing and this can lead to a new culture of openness
Second we have built-in mechanisms for authentication and authorization at different levels to allow secure and fine grained information sharing. This would allow parties to customize the data they share per material and data transfer agreements
Third, we realized we need to have a flexible user interface to present the data. Therefore, we are in the process of developing an interface model ontology which defines a content management system.
As an example, based on the user’s role, they will be able to see the content of the database displayed differently.
The third gap we identified is that information representation is inconsistent across organizations
Given the richness and complexity of genomic epidemiological data, we opt to use and develop ontologies compliant with OBO Foundry to describe the data; Currently, lower levels of expressivity such as controlled vocabularies and data dictionaries are used but they tend to be heavy handed and show low level of compliance and adoption. We believe ontology and semantic web technologies can make data sharing across heterogeneous systems and platforms more tractable.
There are many domains of knowledge needed to describe an outbreak investigation and we strive to re-use existing standards as much as possible
Currently we are finishing a lab/genomic checklist and will be starting an epidemiology checklist soon
Lastly, as Jon and others mentioned yesterday, genomic data interpretation is complex and the technology is still evolving, yet in the world of the diagnostic lab, accreditation means standardized protocols need to be developed
So we focus quite a bit of our energy on developing high quality software with built-in QA and QC components to assess data quality and analytic tool performance
I also want to highlight GMI’s WG4’s effort in developing proficiency tests for wet lab and analysis pipelines. To participate you can contact Rene or Errol.
To facilitate tool improvement and to allow exploratory analysis not part of IRIDA pipelines to be done, we would also allow pre-authorized tools to connect to IRIDA via a REST API securely. Currently we have two external tools for genomic island detection and phylogeography analysis.
The software will be released to a few international collaborators for full testing by Jun 1. Then on Jul 1, we plan to release the beta version publicly so people can try it out. Of course the software will be free, and we would love to collaborate with people on both the software and the ontology development.
Large Group of People who contributed to this work