Presentation on how to enable model reuse in systems biology. Presented as part of the series "Führende Köpfe in der IT - Wissenschaftlerinnen im Dialog" (ZB Med, Bonn, Germany)
This document discusses bioinformatics and its importance in biomedical imaging and image processing. It begins by defining bioinformatics as the method of storing, organizing, retrieving and analyzing biological data. Large amounts of biological data are now being produced and require sophisticated computing methods. The goals of bioinformatics include optimally organizing vast databases of biological information so it can be easily accessed and analyzed. Key approaches in bioinformatics involve comparing new genetic and protein sequences to existing databases to better understand biological processes.
This document provides a summary of recent publications related to research conducted at the WPI-ICReDD. It lists five publications from 2018-2019 related to catalysis and materials science. It then discusses the research projects and personnel involved in the JST CREST program that is funding this work. The document outlines the goals of using data-driven approaches and machine learning to optimize materials discovery and design. It proposes a multilevel framework that combines in-house and public data along with quality control and annotations to advance the field.
Friday, October 15th, 2021, Sapporo, Hokkaido, Japan.
Hokkaido University ICReDD - Faculty of Medicine Joint Symposium
https://www.icredd.hokudai.ac.jp/event/5993
ICReDD (Institute for Chemical Reaction Design and Discovery)
https://www.icredd.hokudai.ac.jp
The evolution of networking and computational paradigms has gone through a remarkable phase of
expansion and development, with a very steep growth curve in many major domains. The
advent of cloud computing and machine learning has strengthened their use in application areas such as
bioinformatics. With its broad application scope, cloud computing has emerged as an area of special
interest for many bioinformatics researchers. Research is being done on different aspects of cloud
computing in bioinformatics to identify areas of improvement and their respective remedies for
living beings. In particular, cloud computing is proving helpful for identifying the H1N1 virus in humans.
H1N1 is an infectious virus which, when it spreads, affects a large share of the population. It
spreads very easily and has a high death rate. Similarly, cloud computing is performing well in the detection of
hypertension, diabetes, cancer, and heart disease through software as a service, so the development of
healthcare support systems using cloud computing is emerging as an effective solution, with the
benefits of better quality of service and reduced costs. This paper provides a review of the important
contributions of cloud computing to the field of bioinformatics.
Anomaly Detection in Fruits using Hyper Spectral Images ijtsrd
One of the biggest problems in hyperspectral image analysis is wavelength selection, because of the immense amount of hypercube data. In this paper, we introduce an approach to finding the optimal wavelengths for predicting the quality of fruit. A hyperspectral imaging system covering the spectral region from 400 nm to 1000 nm was built for fruit defect detection. For image acquisition, we used fluorescent light as the light source. Analysis was performed in the visible region, from 413 nm to 642 nm, because of the low reflectance spectrum found in fluorescent light sources. The captured images in this experiment show irregular illumination, meaning that half of the fruit appears brighter. The hyperspectral image was analyzed in order to select diverse wavelengths that could be used in a multispectral imaging system. Each selected wavelength was used to create a separate image, and each image went through thresholding. The experiment shows a multispectral imaging system that is able to detect defects in fruits by selecting the most contributing wavelengths from the hyperspectral image. The algorithm presented in this paper could be improved with morphological operations so that the actual size of the defect can be obtained. Sandip Kumar | Parth Kapil | Yatika Bhardwaj | Uday Shankar Acharya | Charu Gupta "Anomaly Detection in Fruits using Hyper Spectral Images" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23753.pdf
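The per-wavelength thresholding step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the wavelengths, cutoff values, tiny reflectance images, the "dark pixel means defect" rule, and the choice to combine the per-band masks with a logical OR are all assumptions made for the example.

```python
from typing import Dict, List

Image = List[List[float]]  # one reflectance image (rows x cols) per wavelength


def threshold_band(image: Image, cutoff: float) -> List[List[bool]]:
    """Mark pixels whose reflectance falls below the cutoff as candidate defects."""
    return [[value < cutoff for value in row] for row in image]


def detect_defects(bands: Dict[int, Image], cutoffs: Dict[int, float]) -> List[List[bool]]:
    """Threshold each selected band, then flag a pixel if any band flags it."""
    masks = [threshold_band(bands[wavelength], cutoffs[wavelength]) for wavelength in cutoffs]
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[any(mask[r][c] for mask in masks) for c in range(cols)] for r in range(rows)]


# Hypothetical 2x3 reflectance images at two made-up wavelengths (in nm).
bands = {
    550: [[0.80, 0.82, 0.30], [0.79, 0.81, 0.78]],
    620: [[0.70, 0.72, 0.69], [0.20, 0.71, 0.73]],
}
cutoffs = {550: 0.5, 620: 0.5}  # assumed thresholds, one per selected band
mask = detect_defects(bands, cutoffs)
for row in mask:
    print(row)
```

Working on a handful of selected bands instead of the full hypercube is what makes a cheaper multispectral system plausible; the morphological clean-up the abstract proposes as future work would operate on a mask like this one.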
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23753/anomaly-detection-in-fruits-using-hyper-spectral-images/sandip-kumar
Role of Bioinformatics in Cancer Research Akash Arora
The document discusses the role of bioinformatics in cancer research. It explains that cancer is abnormal cell growth caused by chromosomal rearrangements, mutations, and errors in molecular machinery. Bioinformatics is the science of collecting and analyzing complex biological data like genetic codes, using tools to analyze data from databases of cancer information. This data can be used for cancer progression insight, drug target identification, early detection through biomarkers, and personalized medicine through risk analysis and bio-simulations. Software tools and packages like R-Project are used to analyze this genomic and molecular interaction data to further the understanding and treatment of cancer.
Machine Learning for Molecules: Lessons and Challenges of Data-Centric Chemistry Ichigaku Takigawa
Perspectives on Artificial Intelligence and Machine Learning in Materials Science
February 4, 2022 – February 6, 2022.
https://joint.imi.kyushu-u.ac.jp/post-2698/
This document discusses data and model management in systems biology. It covers topics such as data ownership, metadata, ontologies, standards for encoding models and analyses, and tools for working with systems biology models and data. Standards like SBML, SBGN, SED-ML and COMBINE Archive allow for structured representation, visualization, simulation, and sharing of models and data. Resources like SEEK enable curation, simulation and publication of models in a findable, accessible, interoperable and reusable (FAIR) manner.
This talk was part of the 2020 Disease Map Modeling Community meeting, covering the steps towards publishing reproducible simulation studies (based on a reused model). Links to different COMBINE guidelines, tutorials and efforts. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I... Devansh16
YouTube video: https://www.youtube.com/watch?v=Ao-19L0sLOI
SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation
Vajira Thambawita, Pegah Salehi, Sajad Amouei Sheshkal, Steven A. Hicks, Hugo L. Hammer, Sravanthi Parasa, Thomas de Lange, Pål Halvorsen, Michael A. Riegler
Processing medical data to find abnormalities is a time-consuming and costly task, requiring tremendous effort from medical experts. Therefore, AI has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. AI tools depend heavily on data for training the models. However, there are several constraints on access to large amounts of medical data to train machine learning algorithms in the medical domain, e.g., due to privacy concerns and the costly, time-consuming medical data annotation process. To address this, in this paper we present a novel synthetic data generation pipeline called SinGAN-Seg to produce synthetic medical data with the corresponding annotated ground truth masks. We show that these synthetic data generation pipelines can be used as an alternative to bypass privacy concerns and as an alternative way to produce artificial segmentation datasets with corresponding ground truth masks to avoid the tedious medical data annotation process. As a proof of concept, we used an open polyp segmentation dataset. By training UNet++ using both the real polyp segmentation dataset and the corresponding synthetic dataset generated from the SinGAN-Seg pipeline, we show that the synthetic data can achieve performance very close to that of the real data when the real segmentation datasets are large enough. In addition, we show that synthetic data generated from the SinGAN-Seg pipeline improves the performance of segmentation algorithms when the training dataset is very small. Since our SinGAN-Seg pipeline is applicable to any medical dataset, this pipeline can be used with any other segmentation datasets.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2107.00471 [eess.IV]
(or arXiv:2107.00471v1 [eess.IV] for this version)
An Innovative Deep Learning Framework Integrating Transfer- Learning And Extr... IRJET Journal
This paper proposes a deep learning framework that uses transfer learning and an XGBoost classifier to classify breast ultrasound images. It uses a VGG16 model pre-trained on general images to extract features from ultrasound images. These features are then classified using an XGBoost classifier. On a dataset of breast ultrasound images, the approach achieved 96.7% accuracy, and precision/recall/F-scores of 100%/96%/96% for benign images, 95%/97%/96% for malignant images, and 95%/98%/97% for normal images, outperforming other automatic image classification methods.
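For readers unfamiliar with the per-class precision/recall/F-score figures quoted in this summary, the sketch below shows how such numbers are computed from true and predicted labels. The label lists are invented toy data for illustration only and do not reproduce the paper's results; `per_class_scores` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def per_class_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return (precision, recall, F1) for every class seen in the labels."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # label missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy labels (hypothetical): one benign image is misclassified as malignant.
y_true = ["benign", "benign", "malignant", "malignant", "normal", "normal"]
y_pred = ["benign", "malignant", "malignant", "malignant", "normal", "normal"]
scores = per_class_scores(y_true, y_pred)
for label, (precision, recall, f1) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting the three values per class, as the summary does, shows whether errors are concentrated in one class, which a single overall accuracy figure would hide.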
This document summarizes Dagmar Waltemath's presentation on model management for systems biology projects. It discusses the need for effective data management strategies due to the large, complex, and heterogeneous nature of systems biology data. It recommends using a data management plan, dedicated model management systems like FAIRDOMHub, standards for sharing data, publishing models in repositories, ensuring model quality, and tracking provenance. The goal is to make studies reproducible, valuable, and sustainable.
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING IRJET Journal
This document summarizes a research paper that evaluates different machine learning algorithms for detecting blood diseases from laboratory test results. It first introduces the objective to classify and predict diseases like anemia and leukemia. It then evaluates three algorithms: Gaussian, Random Forest, and Support Vector Classification (SVC). SVC achieved the highest accuracy of 98% for anemia detection. The models are deployed using Streamlit so users can access them online or offline. Benefits include low hardware requirements and mobile access. Future work will add more disease predictions and integrate nutritional guidance.
The document presents work from the Department of Systems Biology and Bioinformatics at the University of Rostock on improving reproducibility in systems biology simulations. It discusses developing standards for representing simulations (SED-ML) and modeling provenance to better reproduce published results and enable model reuse. The goals are to specify simulation experiments, develop simulation management methods focusing on model provenance, establish links between model data, and promote reproducible science.
M2CAT: Extracting reproducible simulation studies from model repositories usi... Martin Scharm
The document discusses M2CAT, a workflow that extracts reproducible simulation studies from model repositories. It searches the model repository Masymos for relevant studies, retrieves the necessary data, and exports it as a COMBINE archive using the CombineArchive Toolkit. This packages all the files into a single container that can be shared, modified and explored using various CAT tools. The workflow aims to make simulation studies more reproducible and accessible by bundling related models, data and descriptions into standardized packages.
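To make the "single container" idea concrete: a COMBINE archive is, to my understanding, a ZIP file whose `manifest.xml` lists each bundled file together with a format identifier. The sketch below builds such a container with Python's standard library only. It does not use or imitate the CombineArchive Toolkit's API; `build_archive` is a hypothetical helper, and the file names and placeholder contents are assumptions (the placeholders are not valid SBML or SED-ML).

```python
import io
import zipfile
from typing import Dict, Tuple

OMEX_MANIFEST_NS = "http://identifiers.org/combine.specifications/omex-manifest"
OMEX_FORMAT = "http://identifiers.org/combine.specifications/omex"
SBML_FORMAT = "http://identifiers.org/combine.specifications/sbml"
SEDML_FORMAT = "http://identifiers.org/combine.specifications/sed-ml"


def build_archive(files: Dict[str, Tuple[str, bytes]]) -> bytes:
    """Zip the given files with a manifest; files maps path -> (format URI, content)."""
    entries = [
        f'  <content location="." format="{OMEX_FORMAT}"/>',
        f'  <content location="./manifest.xml" format="{OMEX_MANIFEST_NS}"/>',
    ]
    for path, (file_format, _) in files.items():
        entries.append(f'  <content location="./{path}" format="{file_format}"/>')
    manifest = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<omexManifest xmlns="{OMEX_MANIFEST_NS}">\n'
        + "\n".join(entries)
        + "\n</omexManifest>\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", manifest)
        for path, (_, content) in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Placeholder payloads (hypothetical); a real study would bundle actual model files.
data = build_archive({
    "model.xml": (SBML_FORMAT, b"<sbml/>"),
    "simulation.sedml": (SEDML_FORMAT, b"<sedML/>"),
})
print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

Because the manifest records a format for every entry, a receiving tool can decide which files are models and which are simulation descriptions without guessing from file extensions.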
A Delft3D model was implemented for the study of the hydrodynamics of San Quintin Bay. Calibration and validation have been successfully executed in previous research, but uncertainties propagated through simulation of future conditions are mostly unknown, and have not been tested in this region. Data Assimilation (DA) techniques play an important role, as their mathematical methods provide algorithms for combining observations of a dynamical system, computational models describing its evolution, and any relevant prior information. The aim of this study was to make a comparative analysis of calibration methods versus DA, as well as to evaluate the long-term predictive capability of a model using sea surface height and current measurements taken within the bay. Delft3D-OpenDA is considered an effective tool for delivering real-time forecasting by employing the ensemble Kalman filter algorithm, and this automatic procedure is expected to obtain an improved model forecast. We anticipate that an ensemble size of between 40 and 60 will provide the most accurately predicted water levels for San Quintin Bay by assimilating a single observation point located at the bay’s entry. New computational challenges will also be addressed, as well as means of reducing the computational costs of these implementations.
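The analysis step of an ensemble Kalman filter can be illustrated on a single scalar state, such as one water level. The sketch below is a textbook-style perturbed-observation update written with the standard library; it is not Delft3D-OpenDA code. The identity observation operator, the synthetic forecast ensemble, the observed value, and the error variance are all assumptions chosen for illustration, and `enkf_update` is a hypothetical name.

```python
import random
from statistics import mean, variance
from typing import List


def enkf_update(ensemble: List[float], observation: float, obs_variance: float,
                rng: random.Random) -> List[float]:
    """Perturbed-observation EnKF analysis for a directly observed scalar state."""
    forecast_variance = variance(ensemble)  # sample variance of the forecast ensemble
    gain = forecast_variance / (forecast_variance + obs_variance)  # scalar Kalman gain
    obs_std = obs_variance ** 0.5
    # Each member is nudged toward its own noisy copy of the observation.
    return [
        member + gain * (observation + rng.gauss(0.0, obs_std) - member)
        for member in ensemble
    ]


rng = random.Random(0)
# Hypothetical forecast ensemble of 50 members, within the 40 to 60 range mentioned above.
forecast = [rng.gauss(0.0, 1.0) for _ in range(50)]
analysis = enkf_update(forecast, observation=5.0, obs_variance=0.25, rng=rng)
print(f"forecast mean: {mean(forecast):.2f}, analysis mean: {mean(analysis):.2f}")
```

The gain weighs the ensemble spread against the observation error, so a confident observation pulls the ensemble strongly toward it. In a real hydrodynamic model the state is a large vector and the gain is built from ensemble covariances, which is where the computational cost noted in the abstract arises.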
This document provides an overview of standards and best practices for making computational models reusable through the use of model repositories and standard formats. It discusses the COMBINE initiative for standardizing the encoding of models and simulations. The document encourages authors to make their models and data FAIR (Findable, Accessible, Interoperable, Reusable) by using community standards for publishing, exchanging, and archiving models. Examples of open model repositories and standards-compliant tools and libraries are provided to demonstrate how authors can improve sharing and reuse of their models.
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine eventi-ITBbari
The Bioinformatics and Systems Biology Lab at the Institute of Intelligent Systems for Automation, National Research Council in Bari, Italy was established in the early 2000s. The multidisciplinary lab includes biotechnologists, physicists, engineers, and computer scientists who use computational approaches to address important life science issues. The lab analyzes and integrates large, heterogeneous omics data to identify genetic markers and molecular mechanisms underlying complex diseases. It has a high performance computing server with 512 cores, 1.5 TB RAM, and 14 TB storage for these analyses. The lab collaborates with several universities and research institutions in Italy and abroad on various projects focused on diseases such as cancer and kidney disease.
This document introduces BioPreDyn-bench, a suite of benchmark problems for dynamic modelling in systems biology. The suite contains 6 benchmark problems ranging from medium to large-scale kinetic models of organisms such as E. coli, S. cerevisiae, D. melanogaster, and human cells. For each benchmark, the document provides a description, implementations in various formats, computational results from specific solvers, and analysis. The suite aims to serve as reference test cases to evaluate and compare parameter estimation methods for dynamic models in systems biology.
This document discusses next generation sequencing (NGS) data and implications for data stewardship. It notes that NGS allows measuring the full-length transcriptome, including alternatively spliced transcripts specific to samples. This alters gene models and highlights the need to capture gene models and context in data commons for future reuse. The document also recommends that more metadata be captured about samples, experiments, and instruments to provide context and aid in data processing. It emphasizes making data FAIR (findable, accessible, interoperable, and reusable) according to W3C standards to improve data stewardship and enable both human and machine use of data.
Generalized deep learning model for Classification of Gastric, Colon and Rena... IRJET Journal
This document proposes developing a generalized deep learning model to classify gastric, colon, and renal cancer using a single model. The model would be trained on whole slide images of tissue samples fed through an EfficientNet model pre-trained on ImageNet. The model would be trained using transfer learning with partial transfusion to demonstrate the ability to classify pathology images from different sites. Previous studies have developed models to classify individual tissue types but not a unified model. The proposed model aims to address situations where the tissue site of origin is unknown.
The document discusses the creation and comparison of 3D-printed and finite element analysis (FEA) models of a porcine lumbar vertebra created using various open source software packages. Key findings include:
- 3D-printed models created with 3D-Slicer and MIMICS software were measured and found to be geometrically similar, with minor differences attributed to smoothing and measurement errors.
- FEA models were significantly stiffer than actual test results and published data. The model using equations from Morgan et al. was closest but still stiffer.
- Customizing material properties in the FEA model improved results, but equations require further validation.
- The study demonstrated a process for creating
Preliminary Lung Cancer Detection using Deep Neural Networks IRJET Journal
This document presents a study on using deep learning techniques for preliminary lung cancer detection. Specifically, it proposes using a convolutional neural network (CNN) model for classifying histopathological lung cancer tissue images. The study describes the dataset used, which contains labeled RGB images of cancerous and non-cancerous lung tissue. It then discusses the proposed CNN architecture, which includes convolutional, pooling, dropout and fully connected layers. The model is trained on the dataset for 30 epochs and achieves 96.43% accuracy on the training set and 97.10% accuracy on the validation set, indicating it generalizes well for lung cancer classification. In conclusion, the CNN model shows promising results for preliminary lung cancer detection from histopathological images.
PREDICTION OF COVID-19 USING MACHINE LEARNING APPROACHES IRJET Journal
This document summarizes a research paper that used machine learning models to predict the spread of COVID-19. The researchers used various machine learning algorithms like SVM, random forest, decision tree, and linear regression on COVID-19 case data. SVM had the highest error in predictions, while random forest and decision tree performed best with lowest error. The models were developed using Python and deployed on cloud platforms. The study aimed to accurately predict COVID-19 trends to help governments respond better to the pandemic.
Reproducibility of model-based results: standards, infrastructure, and recogn... FAIRDOM
Written and presented by Dagmar Waltemath (University of Rostock) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
This document provides an overview of the Computational Modeling in Biology Network (COMBINE) which coordinates the standardization of data and models in computational biology. It describes COMBINE's role in developing standards for encoding models (SBML), visualizing models (SBGN), and simulating models (SED-ML). The document also discusses COMBINE's guidance on publishing models according to FAIR principles, developing software tools and libraries to support the standards, and establishing best practices through documentation and training resources.
Introduction to FAIR principles in the context of computational biology models. Presented at a Workshop at the Basel Conference of Computational Biology. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
This document discusses data and model management in systems biology. It covers topics such as data ownership, metadata, ontologies, standards for encoding models and analyses, and tools for working with systems biology models and data. Standards like SBML, SBGN, SED-ML and COMBINE Archive allow for structured representation, visualization, simulation, and sharing of models and data. Resources like SEEK enable curation, simulation and publication of models in a findable, accessible, interoperable and reusable (FAIR) manner.
This talk was part of the 2020 Disease Map Modeling Community meeting, covering the steps towards publishing reproducible simulation studies (based on a reused model). Links to different COMBINE guidelines, tutorials and efforts. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...Devansh16
YouTube video: https://www.youtube.com/watch?v=Ao-19L0sLOI
SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation
Vajira Thambawita, Pegah Salehi, Sajad Amouei Sheshkal, Steven A. Hicks, Hugo L.Hammer, Sravanthi Parasa, Thomas de Lange, Pål Halvorsen, Michael A. Riegler
Processing medical data to find abnormalities is a time-consuming and costly task, requiring tremendous efforts from medical experts. Therefore, Ai has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. AI tools highly depend on data for training the models. However, there are several constraints to access to large amounts of medical data to train machine learning algorithms in the medical domain, e.g., due to privacy concerns and the costly, time-consuming medical data annotation process. To address this, in this paper we present a novel synthetic data generation pipeline called SinGAN-Seg to produce synthetic medical data with the corresponding annotated ground truth masks. We show that these synthetic data generation pipelines can be used as an alternative to bypass privacy concerns and as an alternative way to produce artificial segmentation datasets with corresponding ground truth masks to avoid the tedious medical data annotation process. As a proof of concept, we used an open polyp segmentation dataset. By training UNet++ using both the real polyp segmentation dataset and the corresponding synthetic dataset generated from the SinGAN-Seg pipeline, we show that the synthetic data can achieve a very close performance to the real data when the real segmentation datasets are large enough. In addition, we show that synthetic data generated from the SinGAN-Seg pipeline improving the performance of segmentation algorithms when the training dataset is very small. Since our SinGAN-Seg pipeline is applicable for any medical dataset, this pipeline can be used with any other segmentation datasets.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2107.00471 [eess.IV]
(or arXiv:2107.00471v1 [eess.IV] for this version)
Reach out to me:
Check out my other articles on Medium. : https://machine-learning-made-simple....
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn: https://www.linkedin.com/in/devansh-d...
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
My Substack: https://devanshacc.substack.com/
Live conversations at twitch here: https://rb.gy/zlhk9y
Get a free stock on Robinhood: https://join.robinhood.com/fnud75
An Innovative Deep Learning Framework Integrating Transfer- Learning And Extr...IRJET Journal
This paper proposes a deep learning framework that uses transfer learning and an XGBoost classifier to classify breast ultrasound images. It uses a VGG16 model pre-trained on general images to extract features from ultrasound images. These features are then classified using an XGBoost classifier. On a dataset of breast ultrasound images, the approach achieved 96.7% accuracy, and precision/recall/F-scores of 100%/96%/96% for benign images, 95%/97%/96% for malignant images, and 95%/98%/97% for normal images, outperforming other automatic image classification methods.
This document summarizes Dagmar Waltemath's presentation on model management for systems biology projects. It discusses the need for effective data management strategies due to the large, complex, and heterogeneous nature of systems biology data. It recommends using a data management plan, dedicated model management systems like FAIRDOMHub, standards for sharing data, publishing models in repositories, ensuring model quality, and tracking provenance. The goal is to make studies reproducible, valuable, and sustainable.
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNINGIRJET Journal
This document summarizes a research paper that evaluates different machine learning algorithms for detecting blood diseases from laboratory test results. It first introduces the objective to classify and predict diseases like anemia and leukemia. It then evaluates three algorithms: Gaussian, Random Forest, and Support Vector Classification (SVC). SVC achieved the highest accuracy of 98% for anemia detection. The models are deployed using Streamlit so users can access them online or offline. Benefits include low hardware requirements and mobile access. Future work will add more disease predictions and integrate nutritional guidance.
The document presents work from the Department of Systems Biology and Bioinformatics at the University of Rostock on improving reproducibility in systems biology simulations. It discusses developing standards for representing simulations (SED-ML) and modeling provenance to better reproduce published results and enable model reuse. The goals are to specify simulation experiments, develop simulation management methods focusing on model provenance, establish links between model data, and promote reproducible science.
M2CAT: Extracting reproducible simulation studies from model repositories usi...Martin Scharm
The document discusses M2CAT, a workflow that extracts reproducible simulation studies from model repositories. It searches the model repository Masymos for relevant studies, retrieves the necessary data, and exports it as a COMBINE archive using the CombineArchive Toolkit. This packages all the files into a single container that can be shared, modified and explored using various CAT tools. The workflow aims to make simulation studies more reproducible and accessible by bundling related models, data and descriptions into standardized packages.
A Delft3D model was implemented for the study of the hydrodynamics of San Quintin Bay. Calibration and validation have been successfully executed in previous research, but uncertainties propagated through simulation of future conditions are mostly unknown, and have not been tested in this region. Data Assimilation (DA) techniques play an important role, as their mathematical methods depict algorithms for combining dynamical system observations, implement computational models describing their evolution, and any relevant prior information. The aim of this study was to make a comparative analysis of calibration methods versus DA, as well as evaluate the long-term predictive capability of a model using sea surface height and current measurements taken within the bay. Delft3D-OpenDA is considered an effective tool for delivering real-time forecasting via the ensemble Kalman filter algorithm, and this automatic procedure is expected to yield an improved model forecast. We anticipate that an ensemble size of between 40 and 60 will provide the most accurately predicted water levels for San Quintin Bay by assimilating a single observation point located at the bay’s entry. New computational challenges will also be addressed, as well as means of reducing the computational costs of these implementations.
This document provides an overview of standards and best practices for making computational models reusable through the use of model repositories and standard formats. It discusses the COMBINE initiative for standardizing the encoding of models and simulations. The document encourages authors to make their models and data FAIR (Findable, Accessible, Interoperable, Reusable) by using community standards for publishing, exchanging, and archiving models. Examples of open model repositories and standards-compliant tools and libraries are provided to demonstrate how authors can improve sharing and reuse of their models.
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine (eventi-ITBbari)
The Bioinformatics and Systems Biology Lab at the Institute of Intelligent Systems for Automation, National Research Council in Bari, Italy was established in the early 2000s. The multidisciplinary lab includes biotechnologists, physicists, engineers, and computer scientists who use computational approaches to address important life science issues. The lab analyzes and integrates large, heterogeneous omics data to identify genetic markers and molecular mechanisms underlying complex diseases. It has a high performance computing server with 512 cores, 1.5 TB RAM, and 14 TB storage for these analyses. The lab collaborates with several universities and research institutions in Italy and abroad on various projects focused on diseases such as cancer and kidney disease.
This document introduces BioPreDyn-bench, a suite of benchmark problems for dynamic modelling in systems biology. The suite contains 6 benchmark problems ranging from medium to large-scale kinetic models of organisms such as E. coli, S. cerevisiae, D. melanogaster, and human cells. For each benchmark, the document provides a description, implementations in various formats, computational results from specific solvers, and analysis. The suite aims to serve as reference test cases to evaluate and compare parameter estimation methods for dynamic models in systems biology.
This document discusses next generation sequencing (NGS) data and implications for data stewardship. It notes that NGS allows measuring the full-length transcriptome, including alternatively spliced transcripts specific to samples. This alters gene models and highlights the need to capture gene models and context in data commons for future reuse. The document also recommends that more metadata be captured about samples, experiments, and instruments to provide context and aid in data processing. It emphasizes making data FAIR (findable, accessible, interoperable, and reusable) according to W3C standards to improve data stewardship and enable both human and machine use of data.
Generalized deep learning model for Classification of Gastric, Colon and Rena... (IRJET Journal)
This document proposes developing a generalized deep learning model to classify gastric, colon, and renal cancer using a single model. The model would be trained on whole slide images of tissue samples fed through an EfficientNet model pre-trained on ImageNet. The model would be trained using transfer learning with partial transfusion to demonstrate the ability to classify pathology images from different sites. Previous studies have developed models to classify individual tissue types but not a unified model. The proposed model aims to address situations where the tissue site of origin is unknown.
The document discusses the creation and comparison of 3D-printed and finite element analysis (FEA) models of a porcine lumbar vertebra created using various open source software packages. Key findings include:
- 3D-printed models created with 3D-Slicer and MIMICS software were measured and found to be geometrically similar, with minor differences attributed to smoothing and measurement errors.
- FEA models were significantly stiffer than actual test results and published data. The model using equations from Morgan et al. was closest but still stiffer.
- Customizing material properties in the FEA model improved results, but equations require further validation.
- The study demonstrated a process for creating
Preliminary Lung Cancer Detection using Deep Neural Networks (IRJET Journal)
This document presents a study on using deep learning techniques for preliminary lung cancer detection. Specifically, it proposes using a convolutional neural network (CNN) model for classifying histopathological lung cancer tissue images. The study describes the dataset used, which contains labeled RGB images of cancerous and non-cancerous lung tissue. It then discusses the proposed CNN architecture, which includes convolutional, pooling, dropout and fully connected layers. The model is trained on the dataset for 30 epochs and achieves 96.43% accuracy on the training set and 97.10% accuracy on the validation set, indicating it generalizes well for lung cancer classification. In conclusion, the CNN model shows promising results for preliminary lung cancer detection from histopathological images.
PREDICTION OF COVID-19 USING MACHINE LEARNING APPROACHES (IRJET Journal)
This document summarizes a research paper that used machine learning models to predict the spread of COVID-19. The researchers used various machine learning algorithms like SVM, random forest, decision tree, and linear regression on COVID-19 case data. SVM had the highest error in predictions, while random forest and decision tree performed best with lowest error. The models were developed using Python and deployed on cloud platforms. The study aimed to accurately predict COVID-19 trends to help governments respond better to the pandemic.
Reproducibility of model-based results: standards, infrastructure, and recogn... (FAIRDOM)
Written and presented by Dagmar Waltemath (University of Rostock) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
This document provides an overview of the Computational Modeling in Biology Network (COMBINE) which coordinates the standardization of data and models in computational biology. It describes COMBINE's role in developing standards for encoding models (SBML), visualizing models (SBGN), and simulating models (SED-ML). The document also discusses COMBINE's guidance on publishing models according to FAIR principles, developing software tools and libraries to support the standards, and establishing best practices through documentation and training resources.
Introduction to FAIR principles in the context of computational biology models. Presented at a Workshop at the Basel Conference of Computational Biology. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
This document summarizes work using Neo4j graph databases for computational systems biology models. It discusses:
1) Projects using Neo4j to integrate storage of models and simulation studies, enable ranked retrieval, and identify frequent patterns in models.
2) Tools developed including MASYMOS for linking models, simulations, annotations via graph structures, and STON for converting SBGN maps to Neo4j.
3) Applications including model repositories, analysis tools, and identifying common reaction motifs in models.
This document discusses challenges to reproducibility in systems biology and potential solutions. It notes a lack of data standards, quality, availability, and transparency make it difficult for researchers to reproduce results. Tools and initiatives discussed that aim to improve reproducibility include the COMBINE archive to bundle necessary files, graph databases to integrate model-related data, and version control systems to track model evolution over time. The overall goal is to better support scientists in sharing reproducible model-based studies.
This document discusses SED-ML (Simulation Experiment Description Markup Language), a standard for describing computational simulations. SED-ML files contain information like the models, data, simulation settings and algorithms used in an experiment. Using SED-ML allows experiments to be reproduced and shared. The document encourages adopting SED-ML to make research more reproducible and help curation of models in repositories. It also provides an overview of tools that support SED-ML and ways to get involved in its development.
Slides from the presentation at IDAMO 2016, Rostock. May 2016.
Most scientific discoveries rely on previous or other findings. A lack of transparency and openness led to what many consider the "reproducibility crisis" in systems biology and systems medicine. The crisis arose from missing standards and inappropriate support of standards in software tools. As a consequence, numerous results in low- and high-profile publications cannot be reproduced.
In my presentation, I summarise key challenges of reproducibility in systems biology and systems medicine, and I demonstrate available solutions to the related problems.
Introduction to the hands on session on "Standards and tools for model management" at the ICSB 2015.
Focus on COMBINE standards, tools for search, version control and archiving. Used management platform is SEEK.
These are the slides from COMBINE 2015. In this talk, I presented the different approaches we take to determine the similarity between simulation models encoded in SBML or CellML, namely: Information Retrieval based ranked model retrieval; annotation-based feature extraction for sets of models; and structure-based similarity search and clustering of model sets.
This document discusses improving reproducibility of simulation studies in computational biology through better management of simulation models and data. The SEMS project aims to develop standards and tools to link related data such as publications, models, simulations, results and more. This will be achieved by using graph databases and COMBINE standards to integrate data from various repositories. Tools will be created to search, compare, cluster and visualize models and their evolution over time to enable more reproducible and reusable simulation studies.
The document summarizes the work of the SBGN-ED+ project, which aims to further develop and integrate the Systems Biology Graphical Notation (SBGN) for modeling biological networks. Some key goals of the project include contributing to the SBGN specification and library, implementing SBGN support for model version control and merging in software tools like SBGN-ED, and using SBGN maps to display differences between model versions. The project also seeks to incorporate SBGN maps into model search, comparison and integration of model-related data. This would help address the need for standardized visual representations of biological networks to reduce ambiguity and enable sharing of computational models.
Some slides put together on analogies between biosamples and model samples. Prepared for the Biosamples workshop at The University of Manchester, 17th June 2015.
Talk in the research seminar of the Systems Biology group at the University of Rostock. The goal was to introduce the two new projects running in SEMS from summer 2015: The de.NBI-SYSBIO German Network for Bioinformatics infrastructure (focus: systems biology data management) and SBGN-ED (support and further development of SBGN-ED and libSBGN).
MaSyMoS is a tool for finding hidden treasures in model repositories by enabling semantic searches across models, annotations, and associated data. It addresses a common problem: researchers have difficulty managing and accessing their data. MaSyMoS allows users to query model repositories to find models associated with certain publications, genes, or behaviors. It also provides the files needed to run simulations from retrieved models. The tool aims to help researchers better discover, organize, and leverage existing computational models.
This document discusses challenges in modeling reproducibility, dissemination, and management. It notes that researchers struggle with data management. Standards are needed for reproducible modeling results, including models, annotations, and protocols. Models should be disseminated through public repositories for higher visibility, long-term availability, and quality checks. Management of models and related data can be improved through integration into graph databases linked to ontologies, as well as version control systems. The SEMS projects aim to address these issues to foster dissemination, ensure reproducibility, and improve management of computational models.
This document discusses three approaches to integrating model-related data in computational biology:
1) The COMBINE archive which bundles all model data into a single zip file for easy distribution.
2) Using a graph database (MORRE) to manage existing model data by representing it as a network of interrelated nodes that can be queried using information retrieval techniques.
3) Integrating model data into the semantic web and linked open data through BIO2RDF to enable automated reasoning and linking to other biological knowledge bases.
Ron Henkel's presentation of our Ranked Retrieval approach; 2012 PALs meeting of the Sysmo-SEEK project in Heidelberg, Germany. 28th-30th of November 2012.
ESR spectroscopy in liquid food and beverages.pptx (PRIYANKA PATEL)
With an increasing population, people need to rely on packaged foodstuffs, and packaging requires that the food be preserved. Irradiation is one of several preservation treatments. It is the most common and the most harmless method of food preservation, as it does not alter the necessary micronutrients of food materials. Although irradiated food does not harm human health, quality assessment is still required to provide consumers with the necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during its processing. The ESR spin trapping technique is useful for detecting highly unstable radicals in food. The antioxidant capability of liquid food and beverages is mainly assessed by the spin trapping technique.
Authoring a personal GPT for your research and practice: How we created the Q... (Leonel Morgado)
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
Thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati... (AbdullaAlAsif1)
The pygmy halfbeak, Dermogenys colletei, is known for its viviparous nature and presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study examines fecundity and the Gonadosomatic Index (GSI) in the pygmy halfbeak, D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that the pygmy halfbeak may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study utilizing 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study contributes to a better understanding of viviparous fish in Borneo and to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
This MS Word-generated PowerPoint presentation covers the major details of the micronuclei test, its significance, and the assays used to conduct it. The test is used to detect micronuclei formation inside the cells of nearly every multicellular organism. Micronuclei form during chromosomal separation at metaphase.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
Phenomics assisted breeding in crop improvement (IshaGoswami9)
The global population is increasing and will reach about 9 billion by 2050. Together with climate change, this makes it difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics of multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Basics of crystallography, crystal systems, classes and different forms
2019 07-04-model reuse-bonn
1. From paper-based model description to interactive simulation of disease progression
PROF. DR.-ING. DIPL.-INF. DAGMAR WALTEMATH
MEDICAL INFORMATICS | INSTITUTE FOR COMMUNITY MEDICINE
UNIVERSITY MEDICINE GREIFSWALD (GERMANY)
MODEL REUSE WITH JOY
2. About me
SEMS@University of Rostock, Germany (2015)
7/4/2019 DAGMAR WALTEMATH | MODEL REUSE WITH JOY 2
Projects. SEMS | de.NBI:SYSBIO | SBGN-ED+ | INCOME | MIRACUM
Community work. Standard development | COMBINE coordinator | SED-ML editor
Research interests. Data integration | Semantics | Reproducibility of scientific results | Sustainability of scientific outcomes
Further interests. Education of young scientists | Open Access & open data | Gender equality in science
@dagmarwaltemath
0000-0002-5886-5563
3. How this talk is organised
THE HISTORY THE SCIENCE
Disclaimer: All comic-style graphics in this presentation
were done either by Anna Zhukova or by Martin Peters.
Thank you very much! Images downloaded from pixabay.
4. Systems Biology is…
Systems biology is the science that studies how biological function emerges from the interactions between the components of living systems.
… and how these emergent properties enable or constrain the behavior of these components.
(Slide adapted from: Olaf Wolkenhauer)
5. Simulation models can take many forms.
MATHEMATICAL MODELS FURTHER APPROACHES
Figs.: https://doi.org/10.1371/journal.pcbi.1002815, https://doi.org/10.1371/journal.pcbi.1004591
6. Simulation models can be complex.
First in silico Whole Cell Model
Genome (525 genes), transcriptome, proteome and metabolome incorporated
Describes whole life cycle of a single cell on molecular level, predicts a wide range of cellular behaviors, and accounts for the specific function of every annotated gene product
Based on 900 publications
Consists of 116 MATLAB files
Incorporates over 1,900 experimentally observed parameters
WHOLE-CELL MODEL KEY FIGURES
Fig.: Karr et al. (2012), https://doi.org/10.1016/j.cell.2012.05.044
8. Publishing the model
PAPER AVAILABLE INFORMATION
1) (textual) description of work and related efforts (referencing other papers)
2) (textual and visual) description of (biochemical) network
3) (printed) model parameters
4) (printed) mathematical equations
5) resulting plots
Fig.: http://doi.org/10.1073/pnas.88.16.7328
9. What can you do with this model?
STUDY THE PAPER, BELIEVE
RE-IMPLEMENT BASED ON THE PAPER
11. Publishing the model
PAPER AVAILABLE INFORMATION
1) (textual) description of work and related efforts (referencing other papers)
2) (textual and visual) description of (biochemical) network
3) (printed) model parameters
4) (printed) mathematical equations
5) resulting plots
Fig.: http://doi.org/10.1073/pnas.94.17.9147
12. Publishing the model code
SIMULATION MODEL AVAILABLE INFORMATION
1) Description of (biochemical) network in computer-readable format (SBML)
2) Mathematical equations in computer-readable format (MathML)
3) Model parameters inside model code
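To illustrate what "computer-readable" buys over a printed paper, the following is a minimal sketch that reads species and parameters from an SBML document using only the Python standard library. The toy model, its identifiers, and the helper names are assumptions made up for this example; they are not taken from the talk, and a real project would more likely rely on a dedicated SBML library.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model, written by hand for this sketch (not from the talk).
TOY_SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_cell_cycle">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="C2" compartment="cell" initialConcentration="0"/>
      <species id="CP" compartment="cell" initialConcentration="0.75"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.015"/>
    </listOfParameters>
  </model>
</sbml>
""".strip()


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def summarise_sbml(xml_text):
    """Collect species ids and parameter id/value pairs from an SBML string."""
    root = ET.fromstring(xml_text)
    species = []
    parameters = {}
    for element in root.iter():
        name = local_name(element.tag)
        if name == "species":
            species.append(element.get("id"))
        elif name == "parameter" and element.get("value") is not None:
            # Picks up every <parameter> element, global or local to a reaction.
            parameters[element.get("id")] = float(element.get("value"))
    return {"species": species, "parameters": parameters}


summary = summarise_sbml(TOY_SBML)
print(summary)
```

Matching on the local tag name keeps the sketch independent of the SBML level and version, at the cost of ignoring everything the namespace would tell a stricter parser.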
13. What can you do with this data?
CHECK THE MODEL (REPRODUCIBILITY)
RE-USE THE CODE IN ANOTHER SOFTWARE (INTEROPERABILITY)
Fig. (left) JWS Online, http://jjj.mib.ac.uk/models. Fig. (right) courtesy M. Hucka (2016), https://www.slideshare.net/thehuck/recent-software-and-services-to-support-the-sbml-community
15. Publishing the model & code
PAPER SIMULATION MODEL
Fig.: https://doi.org/10.1038/msb4100171
16. Publishing the meta-data on repository, model, and entity level
Harmonised meta-data for simulation models in computational biology: Neal et al. (2018), Briefings in Bioinformatics (https://doi.org/10.1093/bib/bby087)
17. Publishing the simulation setups
COMBINE ARCHIVE
File | Format | Description
manifest.xml | Omex | Skeleton, automatically generated by WebCAT
metadata.rdf | Omex | Skeleton, automatically generated by WebCAT
README.md | Markdown | Human readable information for users stumbling upon the archive
model/
  BIOMD0000000144.xml | SBML L2V1 | origin: www.ebi.ac.uk/biomodels-main/download?mid=BIOMD0000000144
  calzone_2007.svg | SVG | origin: models.cellml.org/workspace/calzone_thieffry_tyson_novak_2007
  calzone_2007.ai | Illustrator | origin: models.cellml.org/workspace/calzone_thieffry_tyson_novak_2007
  calzone_2007.png | PNG | origin: models.cellml.org/workspace/calzone_thieffry_tyson_novak_2007
  calzone_thieffry_tyson_novak_2007.cellml | CellML 1.0 | origin: models.cellml.org/workspace/calzone_thieffry_tyson_novak_2007
  sbgn/Calzone2007.gml | GML | SBGN compliant figure generated using SBGN-ED
  sbgn/Calzone2007.graphml | GraphML | SBGN compliant figure generated using SBGN-ED
  sbgn/Calzone2007.pdf | PDF | SBGN compliant figure generated using SBGN-ED
  sbgn/Calzone2007.png | PNG | SBGN compliant figure generated using SBGN-ED
  sbgn/Calzone2007.sbgn | SBGN-ML | SBGN-ML encoded figure generated using SBGN-ED
experiment/
  Calzone2007-default-simulation.xml | SED-ML L1V1 | Simulation description generated using SED-ML Web Tools
  Calzone2007-simulation-figure-1B.xml | SED-ML L1V1 | Simulation description generated using SED-ML Web Tools based on Calzone2007-default-simulation.xml
documentation/
  Calzone2007.pdf | PDF | Scientific publication “Dynamical modeling of syncytial mitotic cycles in Drosophila embryos” obtained from msb.embopress.org/content/3/1/131
  Calzone2007-supplementary-material.pdf | PDF | Supplementary information for the publication obtained from msb.embopress.org/content/3/1/131
result/
  Fig1B-bottom-COPASI.svg | SVG | Image generated by executing Calzone2007-simulation-figure-1B.xml on BIOMD0000000144.xml in COPASI
  Fig1B-top-COPASI.svg | SVG | Image generated by executing Calzone2007-simulation-figure-1B.xml on BIOMD0000000144.xml in COPASI
  Fig1B-bottom-webtools.png | PNG | Image generated by executing Calzone2007-simulation-figure-1B.xml on BIOMD0000000144.xml in SED-ML Web Tools
  Fig1B-top-webtools.png | PNG | Image generated by executing Calzone2007-simulation-figure-1B.xml on BIOMD0000000144.xml in SED-ML Web Tools
AVAILABLE INFORMATION
1) Paper and additional information
2) Meta-data
3) Graphical representation of model (SBGN)
4) Alternative parametrisations (SED-ML)
5) Model versions
6) Simulation experiments (SED-ML)
Example archive available from: https://github.com/SemsProject/CombineArchiveShowCase/
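Because a COMBINE archive is a ZIP container whose manifest.xml declares every file and its format, its contents can be inspected with very little code. The sketch below assumes the OMEX manifest namespace and format identifiers as I understand them from the COMBINE archive specification. The in-memory archive is a hypothetical stand-in with placeholder file contents, not the showcase archive itself, and the function names are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OMEX manifest (assumed from the COMBINE archive specification).
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
FORMAT_BASE = "http://identifiers.org/combine.specifications/"


def list_archive_entries(archive):
    """Return the (location, format) pairs declared in manifest.xml."""
    with zipfile.ZipFile(archive) as zf:
        manifest = ET.fromstring(zf.read("manifest.xml"))
    return [
        (content.get("location"), content.get("format"))
        for content in manifest.iter(f"{{{OMEX_NS}}}content")
    ]


def build_toy_archive():
    """Create a tiny in-memory archive; file contents are placeholders only."""
    manifest = (
        f'<omexManifest xmlns="{OMEX_NS}">'
        f'<content location="." format="{FORMAT_BASE}omex"/>'
        f'<content location="./model/BIOMD0000000144.xml" format="{FORMAT_BASE}sbml"/>'
        '<content location="./experiment/Calzone2007-default-simulation.xml" '
        f'format="{FORMAT_BASE}sed-ml"/>'
        "</omexManifest>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("manifest.xml", manifest)
        zf.writestr("model/BIOMD0000000144.xml", "<sbml/>")
        zf.writestr("experiment/Calzone2007-default-simulation.xml", "<sedML/>")
    buffer.seek(0)
    return buffer


for location, file_format in list_archive_entries(build_toy_archive()):
    print(location, file_format)
```

The same `list_archive_entries` call should work on a downloaded `.omex` file path, since `zipfile.ZipFile` accepts both paths and file-like objects.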
18. What can you do with an archive?
Explore data and meta-data
Identify data set of interest
Run model online/offline
Save new versions and documentation in archive
Modify, merge, extend, combine...
Re-publish
Download archive
19. What can you do with an archive?
Example: Download archive from Github and run it in JWS Online
20. What does the (near) future bring?
21. Linking models and data simplifies verification of models and experimental data sets.
Integrating disease maps and biomedical data (e.g., https://pdmap.uni.lu/minerva/)
Linking models and experimental data sets (e.g., JWS Online)
22. Connecting pathways, ontologies and datasets leads to new means of data exploration.
Comprehensive knowledge of cancer signaling networks and linked data, working with interactive Pathway Maps, https://acsn.curie.fr/ACSN2/ACSN2.html
23. Easy access to patient-specific liver disease progression helps doctors choose a therapy.
Fig.: Koenig et al. (2016), ODLS, Halle (Saale), http://livermetabolism.com
24. The pillars of success
WHAT'S THE SECRET?
25. The research field develops and adheres to FAIR standards for modeling and simulation.
Data formats | Recommendations | Semantics / Ontologies
26. Data formats are interoperable and are being developed collaboratively.
Editorial Boards
Specifications
Software tool support
http://co.mbine.org/standards
Standard development Meetings
Annual special issue with list of latest specifications and errata
27. The community builds, feeds & uses open repositories for simulation studies.
28. The community actively develops open,
standard-compliant libraries & tools.
MODELING AND SIMULATION SOFTWARE REPOSITORIES & MANAGEMENT TOOLS
…
Full list available at: http://sbml.org/SBML_Software_Guide/
29. The (data) Science
DEVELOPMENT OF MODEL MANAGEMENT STRATEGIES
BY SEMS & FRIENDS (2011-2019)
30. Characteristics of the data
Heterogeneous
Big
Distributed
Complex
Highly connected
But
Good standards available to represent the data
Agreed-upon semantic annotation schemes & ontologies to enrich the data
Open data movement
Community spirit
31. Issues that SEMS investigated 2012-17
Handling the steadily increasing size & numbers of models and studies (database performance)
Increasing the quality of published models (semantic annotations, reproducibility of results)
Keeping track of model changes and relations (comprehensibility)
Identifying and handling similarities in model representations (reuse)
~300,000 models in BioModels Database, on average 5 versions per model.
XML, RDF, OWL
32. A graph-based approach keeps storage
and retrieval efficient.
Fig.: Graph linking a model document (Tyson1991, Cell Cycle 6 var), its SED-ML simulation description (Tyson_1991), and annotations (e.g., Pubmed:1831270, Uniprot:P04551, GO:0005623, Interpro:IPR006670, Kegg Pathway sce04111, EC-Code:3.1.3.16, SBO terms) through relations such as isDescribedBy, isVersionOf, hasPart, isContainedIn, and is_mapped_to. Layers: Models, Simulation, Annotation.
Example: Tyson 1991 (BIOM5), Source: Waltemath & Henkel, Neo4j Life & Health Sciences Day - Berlin, 21st June, 2017, adapted from Henkel et al. (2015) DATABASE (https://doi.org/10.1093/database/bau130)
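To illustrate why a graph layout keeps retrieval cheap, here is a minimal in-memory sketch, not the Neo4j-based system referenced on the slide. Node names and the relations isDescribedBy, isVersionOf, and hasPart are taken from the figure; the class ModelGraph, the relation name "references", the node id "SEDML:Tyson_1991", and the exact wiring of nodes are assumptions made for illustration.

```python
from collections import defaultdict


class ModelGraph:
    """Tiny labeled graph: nodes are strings, edges carry a relation name."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # node -> {(relation, target)}
        self.incoming = defaultdict(set)  # node -> {(relation, source)}

    def add_edge(self, source, relation, target):
        self.outgoing[source].add((relation, target))
        self.incoming[target].add((relation, source))

    def sources(self, target, relation):
        """All nodes that point to `target` via `relation`."""
        return {s for (r, s) in self.incoming[target] if r == relation}

    def documents_annotated_with(self, annotation):
        """Follow annotation -> entity -> owning document.

        An annotated entity that is not part of another node is
        treated as a document itself.
        """
        found = set()
        for _relation, entity in self.incoming[annotation]:
            owners = self.sources(entity, "hasPart")
            found |= owners if owners else {entity}
        return found


def build_example_graph():
    """Illustrative wiring only; not the stored BioModels content."""
    graph = ModelGraph()
    graph.add_edge("Tyson1991", "hasPart", "C2")
    graph.add_edge("Tyson1991", "hasPart", "CP")
    graph.add_edge("Tyson1991", "isDescribedBy", "Pubmed:1831270")
    graph.add_edge("C2", "isVersionOf", "Uniprot:P04551")
    graph.add_edge("SEDML:Tyson_1991", "isDescribedBy", "Pubmed:1831270")
    graph.add_edge("SEDML:Tyson_1991", "references", "Tyson1991")  # hypothetical relation
    return graph


if __name__ == "__main__":
    graph = build_example_graph()
    print(sorted(graph.documents_annotated_with("Pubmed:1831270")))
    print(sorted(graph.documents_annotated_with("Uniprot:P04551")))
```

Because every annotation is a node, a question such as "which documents share this publication?" is answered by following edges from that one node instead of scanning all XML files.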
33. The linking of data sets on graph-level
allows for complex queries.
2 experiments, 3 model versions, changes, meta-data
Fig.: Martin Peters, SEMS
Fig (right): Henkel et al. (2015) DATABASE, https://doi.org/10.1093/database/bau130
34. Lucene-based indices incorporate all relevant
information for later search & comparison.
Indices: Model, Publication, Annotation, Person, Simulation
Indexed fields:
• Id, Name, Title, Journal, Abstract, Authors, …
• Id, Name, Component, Variable, Species, Reaction, Compartment
• First name, Last name, Organization, Email
• URI, Description
Fig.: Henkel et al. (2015) DATABASE
35. A weighted ranked-retrieval method
returns only the most relevant models.
Show me models by Tyson describing the cell cycle and having cdc2
1. (0.859) Tyson1991 - Cell Cycle 6 var
2. (0.854) Tyson2001_Cell_Cycle_Regulation
3. (0.477) Chen2004 - Cell Cycle Regulation
Which are the most frequently used GO annotations in my model set?
Which models contain reactions with 'ATP' as reactant and 'ADP' as product?
Find good candidates for features describing my model set.
Which models are annotated with 'Ubiquitin'?
Give me all the files I need to run this simulation study.
Fig.: Henkel et al. (2015) DATABASE
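The idea of weighted ranked retrieval can be sketched as follows: each indexed field gets a weight, a query term contributes more when it is rare across the collection, and models that match nothing are not returned. This is a simplified stand-in, not Lucene's scoring formula and not the retrieval method of the cited work; the field weights, model ids, and metadata below are hypothetical toy values.

```python
import math
from collections import Counter

# Hypothetical weights: a hit in the model name counts more than one in the abstract.
FIELD_WEIGHTS = {"name": 3.0, "authors": 2.0, "species": 2.0, "abstract": 1.0}

# Toy metadata, invented for illustration only.
TOY_MODELS = {
    "toy_cell_cycle_a": {"name": "Cell Cycle minimal", "authors": "Tyson",
                         "species": "cdc2 cyclin",
                         "abstract": "minimal cell cycle oscillator"},
    "toy_cell_cycle_b": {"name": "Cell Cycle Regulation", "authors": "Chen",
                         "species": "cyclin",
                         "abstract": "cell cycle regulation in yeast"},
    "toy_glycolysis": {"name": "Glycolysis", "authors": "Smith",
                       "species": "ATP ADP", "abstract": "energy metabolism"},
}


def tokenize(text):
    """Lowercase and split on whitespace and underscores."""
    return text.lower().replace("_", " ").split()


def rank(query, documents, weights=FIELD_WEIGHTS):
    """Return [(score, doc_id)] sorted best first; zero-score documents are dropped."""
    terms = tokenize(query)
    tokens_by_doc = {}
    doc_freq = Counter()  # in how many documents does a term occur?
    for doc_id, fields in documents.items():
        tokens_by_doc[doc_id] = {f: Counter(tokenize(v)) for f, v in fields.items()}
        doc_freq.update(set().union(*tokens_by_doc[doc_id].values()))
    results = []
    for doc_id, field_tokens in tokens_by_doc.items():
        score = 0.0
        for term in terms:
            if doc_freq[term] == 0:
                continue  # term unknown to the index
            idf = math.log(1 + len(documents) / doc_freq[term])
            for field, counts in field_tokens.items():
                score += weights.get(field, 1.0) * counts[term] * idf
        if score > 0:
            results.append((score, doc_id))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    for score, doc_id in rank("Tyson cell cycle cdc2", TOY_MODELS):
        print(f"{score:.3f}  {doc_id}")
```

Raising the weight of a field such as authors or species changes which models surface first, which is the tuning knob a weighted scheme offers over plain keyword matching.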
36. A method to detect and track differences
in model versions ensures transparency.
How did my model change between version x and x+1?
“Sophisticated” XYDIFF & change ontology
How often did this model change, when and why?
Give me all versions of this model.
Figs.: Waltemath et al. (2015) Oxford Bioinformatics (https://doi.org/10.1093/bioinformatics/btt018); Implementation: M. Scharm, https://github.com/SemsProject/BiVeS
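As a rough illustration of difference detection between two model versions, the sketch below matches XML elements by their id attribute and classifies them as inserted, deleted, or updated. It is a deliberate simplification and not the XyDiff-based algorithm implemented in BiVeS: it ignores element order, moves, and nested content, and the two toy documents are simplified XML rather than valid SBML.

```python
import xml.etree.ElementTree as ET

# Toy versions of one model; simplified XML, not valid SBML.
VERSION_1 = """<model id="toy_model">
  <species id="C2" initialAmount="0"/>
  <species id="CP" initialAmount="0.75"/>
  <reaction id="Reaction3"/>
</model>"""

VERSION_2 = """<model id="toy_model">
  <species id="C2" initialAmount="0.1"/>
  <species id="CP" initialAmount="0.75"/>
  <species id="pM" initialAmount="0.25"/>
</model>"""


def index_by_id(xml_text):
    """Map each element's 'id' attribute to (tag, attributes)."""
    root = ET.fromstring(xml_text)
    return {
        element.get("id"): (element.tag, dict(element.attrib))
        for element in root.iter()
        if element.get("id") is not None
    }


def diff_versions(old_xml, new_xml):
    """Classify entities as inserted, deleted, or updated between two versions."""
    old, new = index_by_id(old_xml), index_by_id(new_xml)
    return {
        "inserted": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    for kind, ids in diff_versions(VERSION_1, VERSION_2).items():
        print(kind, ids)
```

Storing such a change record for every pair of consecutive versions is one way to answer "how often did this model change, and when?" without re-reading the full files.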
37. Identification of frequent patterns in network
graphs helps determine structural similarity.
Fig.: Size and number of reactions and participating species (left), and identified frequent patterns (right).
Implementation: Fabienne Lambusch. Figure: Lambusch et al. (2018) DATABASE (https://doi.org/10.1093/database/bay051)
38. Identification of frequent patterns in network
graphs helps determine structural similarity.
Fig.: Tyson BIOM5 (left), and identified patterns (right).
Implementation: Fabienne Lambusch. Figure: Lambusch et al. (2018) DATABASE (https://doi.org/10.1093/database/bay051)
How similar are these two models
with respect to structure?
Give me all models with
this particular sub-structure.
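The two questions above can be approximated with a very small sketch. Instead of mining subgraphs, it abstracts every reaction to its shape (number of reactants, number of products), counts in how many models each shape occurs, and compares two models with the Jaccard index over their shape sets. This is a much coarser stand-in than a graph-mining workflow and is not the method of the cited publication; the models and all function names are hypothetical.

```python
from collections import Counter

# Toy models: each reaction is (reactants, products). Invented for illustration.
TOY_MODELS = {
    "toy_a": [(("A",), ("B",)), (("B", "C"), ("D",))],
    "toy_b": [(("X",), ("Y",)), (("Y",), ("Z", "W"))],
    "toy_c": [(("S", "E"), ("P",))],
}


def reaction_shapes(model):
    """Abstract each reaction to (number of reactants, number of products)."""
    return {(len(reactants), len(products)) for reactants, products in model}


def frequent_shapes(models, min_support):
    """Shapes that occur in at least `min_support` models."""
    support = Counter()
    for model in models.values():
        support.update(reaction_shapes(model))
    return {shape for shape, count in support.items() if count >= min_support}


def structural_similarity(model_a, model_b):
    """Jaccard index over the two models' shape sets (1.0 = identical sets)."""
    a, b = reaction_shapes(model_a), reaction_shapes(model_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def models_with_shape(models, shape):
    """Ids of all models containing at least one reaction of the given shape."""
    return sorted(m for m, reactions in models.items() if shape in reaction_shapes(reactions))


if __name__ == "__main__":
    print(frequent_shapes(TOY_MODELS, min_support=2))
    print(structural_similarity(TOY_MODELS["toy_a"], TOY_MODELS["toy_b"]))
    print(models_with_shape(TOY_MODELS, (2, 1)))
```

A real workflow would replace the shape abstraction with labeled subgraphs, but the support threshold and the set-based similarity carry over unchanged.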
40. Implementing model version control in the FAIRDOMHub
Internal use of BIVES difference detection for SBML models
41. Change statistics for model versions
Internal use of BIVES difference detection for SBML and CellML models, Change ontology COMODI, SBGN Visualisation tool DiViL;
https://most.bio.informatik.uni-rostock.de, Scharm et al (2018), BMC SysBio (https://doi.org/10.1186/s12918-018-0553-2)
BIOM7
44. Ranked retrieval of reproducible simulation studies
Internal use of the COMBINEArchive-library, MORRE, MASYMOS, http://cellml.org/models
Internal use of the COMBINEArchive library, SEDMLlibrary, https://jjj.biochem.sun.ac.za/
45. If your work is standardised, documented, and open,
…we can help you manage it, so it can be retrieved and reused by others.
47. Standardisation and integration of data
improved model accessibility and reusability.
COPPICE FOREST (DECORTICATED)
Matlab logo: By Jarekt (Own work) [Public domain], via Wikimedia Commons; Python logo: By www.python.org [GPL, via Wikimedia Commons];
Java logo: By Cguevara94 (Own work) [CC BY-SA 4.0], via Wikimedia Commons, modified.
PATH (ACCESSIBLE)
48. Biological data is well-integrated with simulation
models, but biomedical/clinical data lags behind.
49. Thank you for
your attention
Dagmar Waltemath
University Medicine Greifswald
@dagmarwaltemath
0000-0002-5886-5563
Contact me to adopt a SEMS –
work in Greifswald or clone a github repository!