Alphafold2 - Protein Structural Bioinformatics After CASP14Purdue University
This presentation discusses Alphafold2 about the algorithm in comparison with others and the time course of the impact and development that happened after the paper and software release. This slide was presented at IIBMP2021 https://iibmp2021.hamadalab.com/program/, a virtual conference in Tokyo, Japan on 9/27, 28, 2021
InterPro is a database that classifies proteins into families, domains, and sequence features based on their structural and functional properties. It integrates predictive models from several member databases to annotate unknown protein sequences. Protein signatures like patterns, profiles, fingerprints and hidden Markov models are generated from multiple sequence alignments and used by InterPro for classification. AlphaFold is an artificial intelligence system that can predict protein three-dimensional structures directly from amino acid sequences, representing a major advance in solving the protein folding problem.
Systems Biology & Pharmacology from a Structural PerspectivePhilip Bourne
Philip Bourne gave a talk on his past research leading data science at the NIH and his current and future research. Some of his past work includes developing methods for determining binding site similarity across protein space using geometric potentials and sequence-order independent profile-profile alignment. His current research focuses on using these methods for applications like predicting drug efficacy, toxicity and repurposing old drugs for new uses. He discussed challenges like the large search space and flexibility of proteins. Going forward, he hopes cloud computing environments can help make data and tools more accessible and reusable to advance systems pharmacology research.
Structure Modeling of Disordered Protein InteractionsPurdue University
This document describes a new method called IDR-LZerD for modeling the interactions between intrinsically disordered proteins and structured proteins. IDR-LZerD first predicts multiple conformations for short windows of the disordered sequence, docks these fragments to the structured protein, combines the docked fragments, and refines the models. It was shown to successfully model interactions for disordered proteins up to 69 residues in length on a training and test set, outperforming existing methods. Examples of predicted models are provided along with their accuracy compared to experimental structures. The method represents the first successful approach for docking intrinsically disordered proteins and could provide insights into their interactions.
This document discusses recent trends in bioinformatics, including the analysis of cDNA microarray data, protein tertiary structure prediction using Ramachandran plots, and the Protein Data Bank (PDB) which contains experimentally determined protein structures. It also discusses protein structure prediction techniques like CASP and TMW which aim to predict protein structures theoretically based on sequence. Predictions start from an initial conformation and use internal coordinates and planar geometry to model the structure as a tree. Further proteomics research can study protein function once a structure is obtained.
This presentation discusses protein structure prediction using Rosetta. It begins with an overview of the Critical Assessment of Protein Structure Prediction (CASP) experiments and notes that Rosetta is one of the top performing free-modeling servers. The presentation then describes the basic ab initio protocol used by Rosetta, which involves fragment insertion, scoring, and refinement. It also discusses limitations and success rates. Key aspects of the Rosetta energy functions and sampling algorithms are presented. Examples of specific Rosetta applications including low-resolution modeling and refinement are provided.
Alphafold2 - Protein Structural Bioinformatics After CASP14Purdue University
This presentation discusses Alphafold2 about the algorithm in comparison with others and the time course of the impact and development that happened after the paper and software release. This slide was presented at IIBMP2021 https://iibmp2021.hamadalab.com/program/, a virtual conference in Tokyo, Japan on 9/27, 28, 2021
InterPro is a database that classifies proteins into families, domains, and sequence features based on their structural and functional properties. It integrates predictive models from several member databases to annotate unknown protein sequences. Protein signatures like patterns, profiles, fingerprints and hidden Markov models are generated from multiple sequence alignments and used by InterPro for classification. AlphaFold is an artificial intelligence system that can predict protein three-dimensional structures directly from amino acid sequences, representing a major advance in solving the protein folding problem.
Systems Biology & Pharmacology from a Structural PerspectivePhilip Bourne
Philip Bourne gave a talk on his past research leading data science at the NIH and his current and future research. Some of his past work includes developing methods for determining binding site similarity across protein space using geometric potentials and sequence-order independent profile-profile alignment. His current research focuses on using these methods for applications like predicting drug efficacy, toxicity and repurposing old drugs for new uses. He discussed challenges like the large search space and flexibility of proteins. Going forward, he hopes cloud computing environments can help make data and tools more accessible and reusable to advance systems pharmacology research.
Structure Modeling of Disordered Protein InteractionsPurdue University
This document describes a new method called IDR-LZerD for modeling the interactions between intrinsically disordered proteins and structured proteins. IDR-LZerD first predicts multiple conformations for short windows of the disordered sequence, docks these fragments to the structured protein, combines the docked fragments, and refines the models. It was shown to successfully model interactions for disordered proteins up to 69 residues in length on a training and test set, outperforming existing methods. Examples of predicted models are provided along with their accuracy compared to experimental structures. The method represents the first successful approach for docking intrinsically disordered proteins and could provide insights into their interactions.
This document discusses recent trends in bioinformatics, including the analysis of cDNA microarray data, protein tertiary structure prediction using Ramachandran plots, and the Protein Data Bank (PDB) which contains experimentally determined protein structures. It also discusses protein structure prediction techniques like CASP and TMW which aim to predict protein structures theoretically based on sequence. Predictions start from an initial conformation and use internal coordinates and planar geometry to model the structure as a tree. Further proteomics research can study protein function once a structure is obtained.
This presentation discusses protein structure prediction using Rosetta. It begins with an overview of the Critical Assessment of Protein Structure Prediction (CASP) experiments and notes that Rosetta is one of the top performing free-modeling servers. The presentation then describes the basic ab initio protocol used by Rosetta, which involves fragment insertion, scoring, and refinement. It also discusses limitations and success rates. Key aspects of the Rosetta energy functions and sampling algorithms are presented. Examples of specific Rosetta applications including low-resolution modeling and refinement are provided.
Materials Design in the Age of Deep Learning and Quantum ComputationKAMAL CHOUDHARY
The document discusses recent developments in materials design using deep learning (DL) and quantum computation. It introduces several DL and quantum computation models developed at NIST including:
- ALIGNN, a graph neural network model that predicts materials properties from crystal structure.
- AtomVision, a DL framework for analyzing scanning probe microscopy and electron microscopy images of materials.
- AtomQC, which uses variational quantum algorithms like VQE to perform quantum simulations of materials on quantum computers.
The models have been applied to datasets containing thousands of materials to predict properties and analyze experimental images. Future work aims to integrate DL and quantum computation for accelerated materials discovery and design.
This document describes a methodology for improving protein loop structure prediction using PyRosetta. The researchers found that introducing transient amino acid mutations via site-directed mutagenesis helped smooth the energy landscape during conformational searching, leading to lower energy and more accurate predicted structures compared to the wild type sequences. While this approach improved results, the predicted structures still differed somewhat from the actual structures. Future work aims to better understand and linearize the relationship between predicted structure accuracy (RMSD) and energy near the native structure.
Development of algorithm for identification of maligant growth in cancer usin...IJECEIAES
The precise identification and characterization of small pulmonary nodules at low-dose CT is a necessary requirement for the completion of valuable lung cancer screening. It is compulsory to develop some automated tool, in order to detect pulmonary nodules at low dose ct at the beginning stage itself. The various algorithms had been proposed earlier by many researchers within the past, but the accuracy of prediction is usually a challenging task. During this work, a man-made neural networ based methodology is proposed to seek out the irregular growth of lung tissues. Higher probability of detection is taken as a goal to urge an automatic tool, with great accuracy. The best feature sets derived from Haralick Gray level co occurrence Matrix and used because the dimension reduction way for feeding neural network. During this work, a binary Binary classifier neural network has been proposed to spot the traditional images out of all the images. The potential of the proposed neural network has been quantitatively computed using confusion matrix and located in terms of accuracy.
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...jaumebp
1. The document discusses a contact map prediction method that was one of the top performers in CASP8. It uses topological properties of proteins like recursive convex hulls along with machine learning to predict contacts.
2. The method was tested on CASP8 targets, achieving high accuracy scores. Analysis of the predictive rules provided insights into which features were most important.
3. While performance was good, there is still room for improvement, such as better ranking of predictions or using correlated mutations. The contact map prediction approach is valuable as in CASP8 it was competitive with methods using 3D structure prediction.
PROTEIN STRUCTURE PREDICTION USING SUPPORT VECTOR MACHINEijsc
Support Vector Machine (SVM) is used for predict the protein structural. Bioinformatics method use to protein structure prediction mostly depends on the amino acid sequence. In this paper, work predicted of 1-
D, 2-D, and 3-D protein structure prediction. Protein structure prediction is one of the most important problems in modern computation biology. Support Vector Machine haves shown strong generalization ability protein structure prediction. Binary classification techniques of Support Vector Machine are implemented and RBF kernel function is used in SVM. This Radial Basic Function (RBF) of SVM produces better accuracy in terms of classification and the learning results.
Protein Structure Prediction Using Support Vector Machine ijsc
Support Vector Machine (SVM) is used for predict the protein structural. Bioinformatics method use to protein structure prediction mostly depends on the amino acid sequence. In this paper, work predicted of 1-D, 2-D, and 3-D protein structure prediction. Protein structure prediction is one of the most important problems in modern computation biology. Support Vector Machine haves shown strong generalization ability protein structure prediction. Binary classification techniques of Support Vector Machine are implemented and RBF kernel function is used in SVM. This Radial Basic Function (RBF) of SVM produces better accuracy in terms of classification and the learning results.
This document provides an overview of protein structure analysis tools and techniques:
1) It describes exploring the Protein Data Bank (PDB) to view and analyze X-ray crystallography and NMR protein structures, comparing similar structures, and using tools like FoldX for in silico mutagenesis and homology modeling.
2) Key concepts covered include PDB file formats, atomic coordinates, B-factors, resolution, RMSD, and the principles of X-ray crystallography, NMR structure determination, and homology modeling.
3) Visualization software like YASARA, SwissPDBViewer and PyMOL are introduced for viewing protein structures from the PDB.
The Infobiotics Contact Map predictor at CASP9jaumebp
The document describes the Infobiotics Contact Map Predictor submitted to CASP9. It summarizes the steps taken to predict protein contact maps, which include predicting secondary structure, solvent accessibility, recursive convex hull, and coordination number. These predictions are integrated using BioHEL and based on local sequence and evolutionary information within windows of neighboring residues. The dataset used for training and testing contained over 32 million instances from 3262 protein chains with resolutions under 2 Angstroms and less than 30% sequence identity.
Protein docking by LZerD, KiharaLab at CAPRI meeting 2016Purdue University
This document summarizes Daisuke Kihara's protein docking prediction methods used in CAPRI Rounds 30 and 31. It describes the LZerD docking program, 3D Zernike descriptors for representation, and scoring functions like ITScore, GOAP and DFIRE. Interface prediction methods like BindML and cons-PPISP are also overviewed. Examples of targets T79, T91 and T96 are provided to illustrate model quality, interface prediction performance, and how different scoring functions evaluated models. The summary highlights that further improvement is needed for interface prediction and scoring functions' dependence on subunit model quality.
COMPRESSION BASED FACE RECOGNITION USING DWT AND SVMsipij
The biometric is used to identify a person effectively and employ in almost all applications of day to day
activities. In this paper, we propose compression based face recognition using Discrete Wavelet Transform
(DWT) and Support Vector Machine (SVM). The novel concept of converting many images of single person
into one image using averaging technique is introduced to reduce execution time and memory. The DWT is
applied on averaged face image to obtain approximation (LL) and detailed bands. The LL band coefficients
are given as input to SVM to obtain Support vectors (SV’s). The LL coefficients of DWT and SV’s are fused
based on arithmetic addition to extract final features. The Euclidean Distance (ED) is used to compare test
image features with database image features to compute performance parameters. It is observed that, the
proposed algorithm is better in terms of performance compared to existing algorithms.
Friday, October 15th, 2021, Sapporo, Hokkaido, Japan.
Hokkaido University ICReDD - Faculty of Medicine Joint Symposium
https://www.icredd.hokudai.ac.jp/event/5993
ICReDD (Institute for Chemical Reaction Design and Discovery)
https://www.icredd.hokudai.ac.jp
Protein secondary structure prediction by a neural network architecture with...IJECEIAES
Protein secondary structure is an immense achievement of bioinformatics. It's an amino acid residue in a polypeptide backbone. In this paper, an innovative method has been proposed for predicting protein secondary structures based on 3-state protein secondary structure by neural network architecture with simple positioning algorithm (SIMPA) technique. Q3 (3-state prediction of protein secondary structure) is a fundamental methodology for our approach. Initially, the prediction of the secondary structure of the protein using the Q3 prediction method has been done. For this, a model has been built from its primary structure. Then it will retrieve the percentage of amino acid sequences from the original sequence to the accuracy of the predicted sequence. Utilizing the SIMPA technique from the 3-state secondary structure predicted sequence, the percentage of dissimilar residues of the three types (α-helix, β-sheet and coil) of Q3 has been extracted. Then the verification of the Q3 predicted accuracy through the SIMPA technique was done. Finally using a new method of neural network, it is verified that the Q3 prediction method gives good results from the neural network approach.
Presentation 2007 Journal Club Azhar Ali Shahguest5de83e
The document summarizes rapid methods for comparing protein structures and scanning structure databases. It discusses various representations of protein structures used in rapid comparison methods, including strings, arrays, secondary structural elements, and backbone representations. It also reviews different algorithms that use these representations, such as TOPSCAN, PRIDE, SSM, DEJAVU, and VAST. Evaluation of these methods on large databases is needed to properly benchmark their speed and accuracy.
Structural Studies of Aspartic Endopeptidase pep2 from Neosartorya Fisherica ...ijbbjournal
- The document describes the structural study of the aspartic endopeptidase pep2 protein from Neosartorya fisherica using homology modeling techniques.
- A 3D structural model of pep2 was generated based on its sequence similarity to proteinase A from Saccharomyces cerevisiae. The pep2 model was evaluated and found to have good stereochemistry and energy values.
- Homology modeling is an effective technique for predicting the 3D structure of a protein when an experimentally determined structure of a suitable template is available. This study provides insights into the structure of the pep2 protein.
Towards the comparative analysis of genomic variants with JalviewJim Procter
Invited talk at the first 'Fantastic Forces' Computational Evolutionary Biology at the University of St. Andrews on 5th June, 2019. In 45 minutes, I introduced key concepts in multiple sequence alignment analysis, Jalview's major capabilities for working with DNA, RNA and Protein sequences, structures and phylogenetic trees, and how they could be applied to filter and visualise genomic variant data imported from VCF and added to alignments as positional annotation.
I also highlighted MacGowan and Barton's ongoing research program (published on biorxiv) into the inference of functional regions of proteins by analysis of the relationship between the distribution of missense variants and conservation in Pfam multiple sequence alignments involving paralogous domains from human and other organisms.
https://fantastic4cesworkshop.wordpress.com/programme/
The document summarizes a method for mining frequent subgraphs from linear graphs. It describes:
1) Representing data like proteins, RNA and texts as linear graphs and the need for algorithms to mine frequent patterns from such graphs.
2) A method called LGM that can efficiently enumerate and mine both connected and disconnected subgraphs from linear graphs using reverse search techniques.
3) Experiments applying LGM to mine motifs from protein structures and phrases from texts, achieving better performance than existing methods.
Mark Mackey, Cresset, 'Meet Molecular Architect, A new product for understand...Cresset
Dr. Mark Mackey is the Chief Scientific Officer at Molecular Architect. He discusses using molecular fields and field points to align molecules and generate 3D-QSAR models. He provides examples of building successful 3D-QSAR models for a small SARS dataset using a guided alignment and for a large NK3 receptor antagonist dataset after predicting the binding mode. Molecular Architect combines field-based alignment and QSAR modeling into an interactive tool to aid in drug design.
This document summarizes a study that uses coarse-grained molecular dynamics (MD) simulations to model the coordination between competing RNA base pairs. The study uses two RNA systems, each with two separated complementary strands, to test the coarse-grained model. For the first system, the simulation showed little conformational change, but the second system indicated a base pairing interaction as the RMSD sharply decreased partway through the simulation, suggesting the strands bent back together. The coarse-grained model reduces computational expenses compared to all-atom MD while optimizing accuracy and speed for simulating macromolecules like RNA, which could aid drug development efforts.
Machine learning techniques can help address several unsolved problems in structural bioinformatics, including predicting protein flexibility and binding sites. The document discusses using machine learning models like SVMs trained on structural data to predict flexibility regions and protein-protein interaction sites from sequence alone. It also presents challenges in defining protein domain boundaries and predicting other structural features from sequence.
These slides are based on our presentation at CASP14 on Day 3, Dec 3, 2020, in the Satellite session of the data-assisted modeling category. Modified to add some more explanation.
Predicting Assembly Order of Multimeric Protein ComplexesPurdue University
Here, we present a computational method that predicts the assembly order of protein complexes by building the complex structure. The method, named Path-LzerD, uses a multimeric protein docking algorithm that assembles a protein complex structure from individual subunit structures and predicts assembly order by observing the simulated assembly process of the complex. This method can be applied for protein complexes whose overall structure has not been solved yet.
This slides were presented at 3D-SIG at ISMB, Chicago, July 9, 2018.
Materials Design in the Age of Deep Learning and Quantum ComputationKAMAL CHOUDHARY
The document discusses recent developments in materials design using deep learning (DL) and quantum computation. It introduces several DL and quantum computation models developed at NIST including:
- ALIGNN, a graph neural network model that predicts materials properties from crystal structure.
- AtomVision, a DL framework for analyzing scanning probe microscopy and electron microscopy images of materials.
- AtomQC, which uses variational quantum algorithms like VQE to perform quantum simulations of materials on quantum computers.
The models have been applied to datasets containing thousands of materials to predict properties and analyze experimental images. Future work aims to integrate DL and quantum computation for accelerated materials discovery and design.
This document describes a methodology for improving protein loop structure prediction using PyRosetta. The researchers found that introducing transient amino acid mutations via site-directed mutagenesis helped smooth the energy landscape during conformational searching, leading to lower energy and more accurate predicted structures compared to the wild type sequences. While this approach improved results, the predicted structures still differed somewhat from the actual structures. Future work aims to better understand and linearize the relationship between predicted structure accuracy (RMSD) and energy near the native structure.
Development of algorithm for identification of maligant growth in cancer usin...IJECEIAES
The precise identification and characterization of small pulmonary nodules at low-dose CT is a necessary requirement for the completion of valuable lung cancer screening. It is compulsory to develop some automated tool, in order to detect pulmonary nodules at low dose ct at the beginning stage itself. The various algorithms had been proposed earlier by many researchers within the past, but the accuracy of prediction is usually a challenging task. During this work, a man-made neural networ based methodology is proposed to seek out the irregular growth of lung tissues. Higher probability of detection is taken as a goal to urge an automatic tool, with great accuracy. The best feature sets derived from Haralick Gray level co occurrence Matrix and used because the dimension reduction way for feeding neural network. During this work, a binary Binary classifier neural network has been proposed to spot the traditional images out of all the images. The potential of the proposed neural network has been quantitatively computed using confusion matrix and located in terms of accuracy.
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...jaumebp
1. The document discusses a contact map prediction method that was one of the top performers in CASP8. It uses topological properties of proteins like recursive convex hulls along with machine learning to predict contacts.
2. The method was tested on CASP8 targets, achieving high accuracy scores. Analysis of the predictive rules provided insights into which features were most important.
3. While performance was good, there is still room for improvement, such as better ranking of predictions or using correlated mutations. The contact map prediction approach is valuable as in CASP8 it was competitive with methods using 3D structure prediction.
PROTEIN STRUCTURE PREDICTION USING SUPPORT VECTOR MACHINEijsc
Support Vector Machine (SVM) is used for predict the protein structural. Bioinformatics method use to protein structure prediction mostly depends on the amino acid sequence. In this paper, work predicted of 1-
D, 2-D, and 3-D protein structure prediction. Protein structure prediction is one of the most important problems in modern computation biology. Support Vector Machine haves shown strong generalization ability protein structure prediction. Binary classification techniques of Support Vector Machine are implemented and RBF kernel function is used in SVM. This Radial Basic Function (RBF) of SVM produces better accuracy in terms of classification and the learning results.
Protein Structure Prediction Using Support Vector Machine ijsc
Support Vector Machine (SVM) is used for predict the protein structural. Bioinformatics method use to protein structure prediction mostly depends on the amino acid sequence. In this paper, work predicted of 1-D, 2-D, and 3-D protein structure prediction. Protein structure prediction is one of the most important problems in modern computation biology. Support Vector Machine haves shown strong generalization ability protein structure prediction. Binary classification techniques of Support Vector Machine are implemented and RBF kernel function is used in SVM. This Radial Basic Function (RBF) of SVM produces better accuracy in terms of classification and the learning results.
This document provides an overview of protein structure analysis tools and techniques:
1) It describes exploring the Protein Data Bank (PDB) to view and analyze X-ray crystallography and NMR protein structures, comparing similar structures, and using tools like FoldX for in silico mutagenesis and homology modeling.
2) Key concepts covered include PDB file formats, atomic coordinates, B-factors, resolution, RMSD, and the principles of X-ray crystallography, NMR structure determination, and homology modeling.
3) Visualization software like YASARA, SwissPDBViewer and PyMOL are introduced for viewing protein structures from the PDB.
The Infobiotics Contact Map predictor at CASP9jaumebp
The document describes the Infobiotics Contact Map Predictor submitted to CASP9. It summarizes the steps taken to predict protein contact maps, which include predicting secondary structure, solvent accessibility, recursive convex hull, and coordination number. These predictions are integrated using BioHEL and based on local sequence and evolutionary information within windows of neighboring residues. The dataset used for training and testing contained over 32 million instances from 3262 protein chains with resolutions under 2 Angstroms and less than 30% sequence identity.
Protein docking by LZerD, KiharaLab at CAPRI meeting 2016Purdue University
This document summarizes Daisuke Kihara's protein docking prediction methods used in CAPRI Rounds 30 and 31. It describes the LZerD docking program, 3D Zernike descriptors for representation, and scoring functions like ITScore, GOAP and DFIRE. Interface prediction methods like BindML and cons-PPISP are also overviewed. Examples of targets T79, T91 and T96 are provided to illustrate model quality, interface prediction performance, and how different scoring functions evaluated models. The summary highlights that further improvement is needed for interface prediction and scoring functions' dependence on subunit model quality.
COMPRESSION BASED FACE RECOGNITION USING DWT AND SVMsipij
The biometric is used to identify a person effectively and employ in almost all applications of day to day
activities. In this paper, we propose compression based face recognition using Discrete Wavelet Transform
(DWT) and Support Vector Machine (SVM). The novel concept of converting many images of single person
into one image using averaging technique is introduced to reduce execution time and memory. The DWT is
applied on averaged face image to obtain approximation (LL) and detailed bands. The LL band coefficients
are given as input to SVM to obtain Support vectors (SV’s). The LL coefficients of DWT and SV’s are fused
based on arithmetic addition to extract final features. The Euclidean Distance (ED) is used to compare test
image features with database image features to compute performance parameters. It is observed that, the
proposed algorithm is better in terms of performance compared to existing algorithms.
Friday, October 15th, 2021, Sapporo, Hokkaido, Japan.
Hokkaido University ICReDD - Faculty of Medicine Joint Symposium
https://www.icredd.hokudai.ac.jp/event/5993
ICReDD (Institute for Chemical Reaction Design and Discovery)
https://www.icredd.hokudai.ac.jp
Protein secondary structure prediction by a neural network architecture with...IJECEIAES
Protein secondary structure is an immense achievement of bioinformatics. It's an amino acid residue in a polypeptide backbone. In this paper, an innovative method has been proposed for predicting protein secondary structures based on 3-state protein secondary structure by neural network architecture with simple positioning algorithm (SIMPA) technique. Q3 (3-state prediction of protein secondary structure) is a fundamental methodology for our approach. Initially, the prediction of the secondary structure of the protein using the Q3 prediction method has been done. For this, a model has been built from its primary structure. Then it will retrieve the percentage of amino acid sequences from the original sequence to the accuracy of the predicted sequence. Utilizing the SIMPA technique from the 3-state secondary structure predicted sequence, the percentage of dissimilar residues of the three types (α-helix, β-sheet and coil) of Q3 has been extracted. Then the verification of the Q3 predicted accuracy through the SIMPA technique was done. Finally using a new method of neural network, it is verified that the Q3 prediction method gives good results from the neural network approach.
Presentation 2007 Journal Club Azhar Ali Shahguest5de83e
The document summarizes rapid methods for comparing protein structures and scanning structure databases. It discusses various representations of protein structures used in rapid comparison methods, including strings, arrays, secondary structural elements, and backbone representations. It also reviews different algorithms that use these representations, such as TOPSCAN, PRIDE, SSM, DEJAVU, and VAST. Evaluation of these methods on large databases is needed to properly benchmark their speed and accuracy.
Structural Studies of Aspartic Endopeptidase pep2 from Neosartorya Fisherica ...ijbbjournal
- The document describes the structural study of the aspartic endopeptidase pep2 protein from Neosartorya fisherica using homology modeling techniques.
- A 3D structural model of pep2 was generated based on its sequence similarity to proteinase A from Saccharomyces cerevisiae. The pep2 model was evaluated and found to have good stereochemistry and energy values.
- Homology modeling is an effective technique for predicting the 3D structure of a protein when an experimentally determined structure of a suitable template is available. This study provides insights into the structure of the pep2 protein.
Towards the comparative analysis of genomic variants with JalviewJim Procter
Invited talk at the first 'Fantastic Forces' Computational Evolutionary Biology at the University of St. Andrews on 5th June, 2019. In 45 minutes, I introduced key concepts in multiple sequence alignment analysis, Jalview's major capabilities for working with DNA, RNA and Protein sequences, structures and phylogenetic trees, and how they could be applied to filter and visualise genomic variant data imported from VCF and added to alignments as positional annotation.
I also highlighted MacGowan and Barton's ongoing research program (published on biorxiv) into the inference of functional regions of proteins by analysis of the relationship between the distribution of missense variants and conservation in Pfam multiple sequence alignments involving paralogous domains from human and other organisms.
https://fantastic4cesworkshop.wordpress.com/programme/
The document summarizes a method for mining frequent subgraphs from linear graphs. It describes:
1) Representing data like proteins, RNA and texts as linear graphs and the need for algorithms to mine frequent patterns from such graphs.
2) A method called LGM that can efficiently enumerate and mine both connected and disconnected subgraphs from linear graphs using reverse search techniques.
3) Experiments applying LGM to mine motifs from protein structures and phrases from texts, achieving better performance than existing methods.
Mark Mackey, Cresset, 'Meet Molecular Architect, A new product for understand...Cresset
Dr. Mark Mackey is the Chief Scientific Officer at Molecular Architect. He discusses using molecular fields and field points to align molecules and generate 3D-QSAR models. He provides examples of building successful 3D-QSAR models for a small SARS dataset using a guided alignment and for a large NK3 receptor antagonist dataset after predicting the binding mode. Molecular Architect combines field-based alignment and QSAR modeling into an interactive tool to aid in drug design.
This document summarizes a study that uses coarse-grained molecular dynamics (MD) simulations to model the coordination between competing RNA base pairs. The study uses two RNA systems, each with two separated complementary strands, to test the coarse-grained model. For the first system, the simulation showed little conformational change, but the second system indicated a base pairing interaction as the RMSD sharply decreased partway through the simulation, suggesting the strands bent back together. The coarse-grained model reduces computational expenses compared to all-atom MD while optimizing accuracy and speed for simulating macromolecules like RNA, which could aid drug development efforts.
Machine learning techniques can help address several unsolved problems in structural bioinformatics, including predicting protein flexibility and binding sites. The document discusses using machine learning models like SVMs trained on structural data to predict flexibility regions and protein-protein interaction sites from sequence alone. It also presents challenges in defining protein domain boundaries and predicting other structural features from sequence.
Similar to Kiharalab Bioinformatics Projects 2019 (20)
These slides are based on our presentation at CASP14 on Day 3, Dec 3, 2020, in the Satellite session of the data-assisted modeling category. Modified to add some more explanation.
Predicting Assembly Order of Multimeric Protein ComplexesPurdue University
Here, we present a computational method that predicts the assembly order of protein complexes by building the complex structure. The method, named Path-LzerD, uses a multimeric protein docking algorithm that assembles a protein complex structure from individual subunit structures and predicts assembly order by observing the simulated assembly process of the complex. This method can be applied for protein complexes whose overall structure has not been solved yet.
This slides were presented at 3D-SIG at ISMB, Chicago, July 9, 2018.
DextMP: Text mining for finding moonlighting proteinsPurdue University
Slides presented at ISMB 2017 in Prague on "DextMP: deep dive in text for predicting moonlighting proteins" by Ishita K. Khan, Mansurul Bhuiiyan, &. Daisuke Kihara. ISMB Proceeding talk, published on Bioinformatics: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btx231
Combined experimental and structural-bioinformatics approach is presented for proteome-scale protein-metabolite/drug interactions. Experimental methods combine pulse-proteolysis and mass spec. Bioinformatics part uses Patch-Surfer software.
Kihara Lab protein structure prediction performance in CASP11Purdue University
Invited talk given at CASP11 protein structure prediction meeting (2014) in the Free modeling category. http://www.predictioncenter.org/casp11/docs.cgi?view=presentations
Reference: https://www.ncbi.nlm.nih.gov/pubmed/26344195
The document describes research from the Kihara Lab at Purdue University on bioinformatics and computational biology topics, including PL-PatchSurfer2 which is a virtual screening method that performs better than existing methods at predicting ligand binding, particularly on computationally modeled protein structures. It also describes research on predicting protein function, detecting moonlighting proteins, and modeling protein folding as an information theoretic noisy channel.
Flexscore: Ensemble-based evaluation for protein Structure modelsPurdue University
Presentation at ISMB 2016 for the paper on Flexscore. Score for evaluating computational protein models by considering flexibility derived from NMR or molecular dynamics simluation. Paper published on Bioinformatics: http://www.ncbi.nlm.nih.gov/pubmed/27307633
by Kihara Lab http://kiharalab.org
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...Advanced-Concepts-Team
Presentation in the Science Coffee of the Advanced Concepts Team of the European Space Agency on the 07.06.2024.
Speaker: Diego Blas (IFAE/ICREA)
Title: Gravitational wave detection with orbital motion of Moon and artificial
Abstract:
In this talk I will describe some recent ideas to find gravitational waves from supermassive black holes or of primordial origin by studying their secular effect on the orbital motion of the Moon or satellites that are laser ranged.
The debris of the ‘last major merger’ is dynamically youngSérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the
‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor
collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the
MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space,
because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia
DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations
at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based
on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago.
We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative
measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data
1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’
did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within
the last few Gyr, consistent with the body of work surrounding the VRM.
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Sérgio Sacani
Context. The observation of several L-band emission sources in the S cluster has led to a rich discussion of their nature. However, a definitive answer to the classification of the dusty objects requires an explanation for the detection of compact Doppler-shifted Brγ emission. The ionized hydrogen in combination with the observation of mid-infrared L-band continuum emission suggests that most of these sources are embedded in a dusty envelope. These embedded sources are part of the S-cluster, and their relationship to the S-stars is still under debate. To date, the question of the origin of these two populations has been vague, although all explanations favor migration processes for the individual cluster members. Aims. This work revisits the S-cluster and its dusty members orbiting the supermassive black hole SgrA* on bound Keplerian orbits from a kinematic perspective. The aim is to explore the Keplerian parameters for patterns that might imply a nonrandom distribution of the sample. Additionally, various analytical aspects are considered to address the nature of the dusty sources. Methods. Based on the photometric analysis, we estimated the individual H−K and K−L colors for the source sample and compared the results to known cluster members. The classification revealed a noticeable contrast between the S-stars and the dusty sources. To fit the flux-density distribution, we utilized the radiative transfer code HYPERION and implemented a young stellar object Class I model. We obtained the position angle from the Keplerian fit results; additionally, we analyzed the distribution of the inclinations and the longitudes of the ascending node. Results. The colors of the dusty sources suggest a stellar nature consistent with the spectral energy distribution in the near and midinfrared domains. Furthermore, the evaporation timescales of dusty and gaseous clumps in the vicinity of SgrA* are much shorter ( 2yr) than the epochs covered by the observations (≈15yr). In addition to the strong evidence for the stellar classification of the D-sources, we also find a clear disk-like pattern following the arrangements of S-stars proposed in the literature. Furthermore, we find a global intrinsic inclination for all dusty sources of 60 ± 20◦, implying a common formation process. Conclusions. The pattern of the dusty sources manifested in the distribution of the position angles, inclinations, and longitudes of the ascending node strongly suggests two different scenarios: the main-sequence stars and the dusty stellar S-cluster sources share a common formation history or migrated with a similar formation channel in the vicinity of SgrA*. Alternatively, the gravitational influence of SgrA* in combination with a massive perturber, such as a putative intermediate mass black hole in the IRS 13 cluster, forces the dusty objects and S-stars to follow a particular orbital arrangement. Key words. stars: black holes– stars: formation– Galaxy: center– galaxies: star formation
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
PPT on Alternate Wetting and Drying presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills MN
By harnessing the power of High Flux Vacuum Membrane Distillation, Travis Hills from MN envisions a future where clean and safe drinking water is accessible to all, regardless of geographical location or economic status.
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
Kiharalab Bioinformatics Projects 2019
1. dkihara@purdue.edu 1
Bioinformatics Projects in
Kiharalab 2019
Daisuke Kihara
Professor
Department of Biological Sciences
Department of Computer Science
Purdue University, USA
dkihara@purdue.edu
http://kiharalab.org
2. Active Projects
Protein-protein docking prediction
Protein structure prediction
Protein structure modeling for cryo-
EM
Protein structure/EM map database
search
Virtual drug screening
Protein function prediction
Moonlighting proteins
2
4. 3D Zernike Descriptors (3DZD)
An extension of
spherical harmonics
based descriptors
A 3D object can be
represented by a
series of orthogonal
functions, thus
practically
represented by a
series of coefficients
as a feature vector
Compact
Rotation invariant
4
A surface representation of 1ew0A (A) is reconstructed from its 3D Zernike
invariants of the order 5, 10, 15, 20, and 25 (B-F). (Sael & Kihara, 2009)
),()(),,( m
lnl
m
nl YrRrZ
),( m
lY )(rRnl
),,( rZ m
nl
: Spherical harmonics, : radial functions
polynomials in Cartesian coordinates
14
3
.)()(
x
xxx dZf m
nl
m
nl Zernike moments:
Zernike Descriptor:
2
)( m
nl
lm
lm
nlF
7. Assembly Order Prediction for
Protein Complexes
7
Urease accessary complex:
Dimer of UreF/UreH dimer is formed first, which binds to G dimer
Peterson LX, Togawa Y, Esquivel-Rodriguez J, Terashi G, Christoffer C, et al. (2018)
Modeling the assembly order of multimeric heteroprotein complexes. PLOS Computational
8. The number of votes for assembly pathways of
4hi0 across generations of the genetic
algorithm
8
Peterson LX, Togawa Y, Esquivel-Rodriguez J, Terashi G, Christoffer C, et al. (2018) Modeling the assembly order of
multimeric heteroprotein complexes. PLOS Computational Biology 14(1): e1005937.
https://doi.org/10.1371/journal.pcbi.1005937
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005937
10. Combining Docked Fragments
4,500 4,500 4,500
Seq.
window A
Seq.
window B
Seq.
window C
Docked
fragments
10
Eliminate pairs based on
distance and angle cutoffs:
11. Example of Unbound Predictions
11
PDB ID 1jya
69 AA
Rank 2
L-RMSD 9.9 Å
I-RMSD 5.4 Å
PDB ID 4ah2
20 AA
Rank 1
L-RMSD 5.8 Å
I-RMSD 2.4 Å
PDB ID 1ijj
25 AA
Rank 9
L-RMSD 5.1 Å
I-RMSD 2.5 Å
PDB ID 1l3e
44 AA
Rank 3
L-RMSD 7.4 Å
I-RMSD 6.3 Å
Model
Native
12. MAINMAST: De novo Structure Modeling for
medium (~4 Å) Resolution Maps
EM Map
Tracing
C⍺ model
EMD-6555 Porcine circovirus 2
Resolution: 2.9 Å
De-novo modeling: does not require known protein
structures.
Fully automated: No need for visual inspection and human
intervention.
12
(G. Terashi & D. Kihara, Nature Communications, 2018)
13. MAINMAST: (MAINchain Model trAcing from
Spanning Tree)
The longest path (red) does not
always cover the whole protein
structure.
13
EM map
Find local dense points
(LDPs) by Mean Shift
Connect All points by
Minimum Spanning
Tree
Refine Tree Structure
Thread sequence on
the longest path
C⍺ models ranked by threading score
22. 22
Features Considered:
• GO annotations (GO)
• PPI network (PPI)
• gene expression profiles
(GI)
• phylogenetic profiles (PE)
• genetic interactions (GI)
• disordered protein regions
(DOR)
• graph properties of PPI
23. DextMP: Finding Moonlighting Proteins
from Literature Text Information
23
Text Level
Prediction
Protein Level
Prediction
(Khan, Bhuiyan, & Kihara, Bioinformatics 2017)
24. iGFP: Group Function Prediction from
Functional Relevance Networks with
Conditional Random Field
24(Khan, Jain, Rawi, Bensmail, & Kihara, Bioinformatics 2018)