The document summarizes a bioinformatics summer camp, including:
1. The camp will cover basic molecular biology and bioinformatics topics like DNA, proteins, gene expression and the genetic code.
2. Students will work on computational analysis projects involving whole genome sequencing, gene expression profiling, and functional and comparative genomics.
3. The camp will teach techniques for analyzing protein structures and interactions, gene expression data, and identifying pockets on protein surfaces.
History and development of bioinformatics.ppt (1) by Madan Kumar Ca
Dear Sir, Madam
Name: Madan Kumar C A
Topic: History and Development of Bioinformatics
Guide: Dr. Ramesh C K
Associate Professor
Dept of Biotechnology
Sahyadri Science College
Shivamogga
This document discusses the sequencing of the human genome and the role of bioinformatics. It notes that in 2000, the human genome was sequenced through a joint British and American effort, which it describes as a major event that changed human history. The document then discusses how bioinformatics uses computational techniques to analyze and manage biological data, for example comparing the genetic material of viruses to design medicines. Overall, the document provides a high-level overview of the sequencing of the human genome and an introduction to the field of bioinformatics.
Computational biology involves using computational techniques like data analysis, modeling and simulation to study biological systems. Bioinformatics specifically develops tools to analyze biological data. Other computational biology fields include computational anatomy, genomics, neuroscience, pharmacology, and evolutionary biology, which apply computational methods to study anatomical structures, genomes, the brain, drug effects, and evolution, respectively. Cancer computational biology aims to predict cancer mutations by analyzing large biological datasets.
INTRODUCTION
WHAT IS DATA AND DATABASE?
WHAT IS BIOLOGICAL DATABASE?
TYPES OF BIOLOGICAL DATABASE
PRIMARY DATABASE
Nucleic acid sequence database
Protein sequence database
SECONDARY DATABASE
COMPOSITE DATABASE
TERTIARY DATABASE
WHY NEED?
CONCLUSION
REFERENCES
Bioinformatics combines computer science, statistics, mathematics, and biology to study and process biological data on a large scale. The document discusses several applications of bioinformatics including information search and retrieval, sequence comparison for genetics, phylogenetic analysis, genome annotation, proteomics, pharmacogenomics, and drug discovery. Tools are provided for various applications such as linkage analysis, phylogenetic analysis, genome annotation, and protein identification.
Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), clinical effects of mutations as well as similarities of biological sequences and structures. Biological databases can be broadly classified into sequence, structure and functional databases.
GenBank, EMBL, and DDBJ are primary nucleotide sequence databases that collaborate to store publicly available DNA sequences. NCBI's GenBank is one of the largest primary sequence databases, containing sequences from over 240,000 organisms submitted by laboratories. PubMed and Entrez are literature and biomedical databases maintained by NCBI that allow users to search biomedical research articles and integrate related data from multiple sources. SRS is a sequence retrieval system developed by EBI that integrates over 250 molecular biology databases and allows complex queries across data sources.
Bioinformatics is the use of computers for storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins. It involves developing computational tools and databases to analyze biological data. Key areas include sequence analysis, structural analysis, functional analysis, biological databases, sequence alignment, protein structure prediction, molecular phylogenetics, and genomics. The goals are to better understand living systems at the molecular level through computational analysis of biological data.
The Molecular Modeling Database (MMDB) is a database hosted by the National Center for Biotechnology Information that contains over 28,000 experimentally determined 3D structures of biomolecules including proteins and nucleic acids derived from the Protein Data Bank, excluding theoretical models. It facilitates computation and links structures to other data types. Each record cross-references its source PDB file. The database contains molecular structures, biological activity data, experimental data, chemical properties, and annotations to aid researchers. Examples of widely used molecular modeling databases discussed are the Protein Data Bank, PubChem, and RCSB Ligand Explorer.
Protein Sequence, Structure, and Functional Databases: UniProtKB, Swiss-Prot, TrEMBL, PIR, MIPS, PROSITE, PRINTS, BLOCKS, Pfam, NDRB, OWL, PDB, SCOP, CATH, NDB, PQS, SYSTERS, and Motif. Presented at UGC Sponsored National Workshop on Bioinformatics and Sequence Analysis conducted by Nesamony Memorial Christian College, Marthandam on 9th and 10th October, 2017 by Prof. T. Ashok Kumar
Introduction
Overview
Reductionist approach
Holistic approach
What is systems biology?
○ Advantages of Systems Biology
Tools of holistic approach
○ Proteomics, Transcriptomics and Metabolomics
Conclusion
References
The document discusses bioinformatics, defining it as the application of information technology to the field of molecular biology. It describes how bioinformatics uses biology, mathematics, and computer science to analyze and manage biological data. Some key applications of bioinformatics mentioned are sequence analysis, prediction of protein structure, genome annotation, comparative genomics, and health/drug discovery. Several important bioinformatics resources are also outlined, including NCBI, PubMed, EMBL, and OMIM.
The document describes several key databases within the KEGG resource, including:
- The PATHWAY database containing molecular network maps of metabolic and genetic pathways.
- The BRITE database providing hierarchical classifications of biological systems beyond what is shown in pathways.
- The LIGAND database consisting of chemical compounds, carbohydrates, reactions, and enzyme information.
KEGG aims to comprehensively capture biological knowledge through integrated databases covering genomes, pathways, diseases and drugs.
Rashi Srivastava presented on the KEGG database in biotechnology. KEGG is a database that contains genomic, chemical, and systems information to understand biological functions from the molecular level up. It includes pathways, genes, compounds, diseases, drugs, and organisms. KEGG can be searched through its flat file format using DBGET or through its relational database format for more complex queries. It also contains the KEGG MEDICUS search tool and direct SQL searches of its relational database.
The document discusses different text-based database retrieval systems for accessing biological data, including Entrez, SRS, and DBGET/LinkDB. It describes their key features and how each system allows users to search text databases using queries, with Entrez providing linked related data across multiple databases. An example shows how each system can be used to retrieve and view related information for a SwissProt protein entry.
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data.
Bioinformatics is an interdisciplinary field that combines biology, computer science, and information technology. It enables the discovery of new biological insights and unifying principles in biology through the merging of these disciplines. There are three main sub-disciplines: developing algorithms and statistics for analyzing large datasets, analyzing various types of biological data like sequences and structures, and developing tools for accessing and managing information.
This document provides an overview of bioinformatics and related topics across 7 parts:
Part I introduces bioinformatics and its areas including genomics, proteomics, computational biology, and databases.
Part II discusses the history of bioinformatics from Darwin's theory of evolution to the human genome project.
Part III focuses on the human genome project, its goals of identifying genes and sequencing DNA, and its benefits like improved medicine.
Part IV explains how the internet plays an important role in bioinformatics for retrieving biological information and resources like databases, tools, and software.
Part V describes different types of biological databases including primary, secondary, and composite databases that combine different sources.
Part VI discusses knowledge discovery
The Protein Data Bank (PDB) is an open database that archives 3D structural data of biological macromolecules. It was established in 1971 and currently holds over 150,000 structures determined by X-ray crystallography or NMR spectroscopy. The PDB is overseen by the Worldwide Protein Data Bank and freely accessible online. It serves as a key resource for structural biology and many other databases rely on protein structures deposited in the PDB.
Data mining involves using machine learning and statistical methods to discover patterns in large datasets and is useful in bioinformatics for analyzing biological data. Bioinformatics analyzes data from sequences, molecules, gene expressions, and pathways. Data mining can help understand these rapidly growing biological datasets. Common data mining tools in bioinformatics include BLAST for sequence comparisons, Entrez for integrated database searching, and ORF Finder for identifying open reading frames. Data mining approaches are well-suited to the enormous volumes of data in bioinformatics databases.
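The open-reading-frame search mentioned above can be illustrated with a minimal sketch. This is not NCBI's ORF Finder implementation: the function name `find_orfs` and the `min_codons` parameter are hypothetical, only the three forward frames are scanned (no reverse complement), and the standard stop codons TAA, TAG and TGA are assumed.

```python
# Simplified ORF scan: forward strand only, standard stop codons assumed.
# find_orfs and min_codons are illustrative names, not part of any real tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end, orf) tuples for ATG...stop runs.

    start is the 0-based index of the A in ATG, end is one past the stop
    codon, and min_codons counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # close this ORF and look for the next ATG
    return sorted(orfs)


# Toy sequence with a single ORF in the third forward frame.
print(find_orfs("CCATGAAATAGGG"))
```

A production tool would typically also scan the reverse complement, support alternative genetic codes, and apply a much larger minimum length.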
The document provides an overview of the history and scope of bioinformatics. It discusses how bioinformatics emerged from the fields of computer science and biology. The history section outlines major developments from Mendel's work in 1865 to the sequencing of the human genome in 2001. Bioinformatics has various applications in areas like drug development, personalized medicine, and biotechnology. It also has significant scope in India, with growing job opportunities in both the public and private sectors.
Structural databases like PDB, CSD, and CATH contain 3D structural information of proteins, small molecules, and macromolecules determined through techniques like X-ray crystallography and NMR spectroscopy. These databases provide bibliographic data, atomic coordinates, and other details for each entry. PDB contains protein structures, CSD contains organic and metal-organic structures, and CATH classifies protein domains hierarchically. Structural databases have wide applications in structure prediction, analysis, mining, comparison, classification, structure refinement, and database annotation.
PubChem is a key chemical information resource at the National Center for Biotechnology Information that contains 247.3 million substance descriptions, 96.5 million unique chemical structures, and 237 million bioactivity test results. It organizes data into the Substance, Compound, and BioAssay databases. PubChem provides search and analysis tools for its extensive and growing collection of chemical and biological data.
B.sc biochem i bobi u-1 introduction to bioinformatics by Rai University
This document provides an introduction to the field of bioinformatics. It defines bioinformatics as using computer science and software tools to store, retrieve, organize and analyze biological data. The history of bioinformatics began in the 1970s with early work to create protein sequence databases. Today, bioinformatics has many applications including drug design, DNA analysis, and agricultural biotechnology. It also covers several key areas including genomics, proteomics, and systems biology. Necessary skills for bioinformatics include knowledge of molecular biology, mathematics, programming, and computer proficiency.
This document summarizes different types of biological data and biological databases. It discusses primary databases like GenBank, EMBL and DDBJ that contain raw nucleotide sequence data. Secondary databases like KEGG and Pfam analyze and annotate primary database content. Composite databases like NCBI aggregate data from multiple primary sources. Protein databases discussed include Swiss-Prot, TrEMBL, PDB, and Pfam. Structural databases such as SCOP, CATH and PDB organize protein structures.
Computational Biology and Bioinformatics by Sharif Shuvo
Computational Biology and Bioinformatics is a rapidly developing multi-disciplinary field. The systematic acquisition of data made possible by genomics and proteomics technologies has created a tremendous gap between available data and their biological interpretation.
This document provides an introduction and overview of data mining. It discusses how data mining extracts knowledge from large amounts of data to discover hidden patterns and predict future trends. It notes that for effective data mining, data sets need to be extremely large. The document outlines some key techniques of data mining including associative learning, artificial neural networks, clustering, genetic algorithms, and hidden Markov models. It also discusses applications of data mining in bioinformatics such as gene finding, protein function prediction, and disease diagnosis. Finally, it acknowledges that while bioinformatics data is rich, developing comprehensive theories remains challenging but creates opportunities for novel knowledge discovery methods.
The document discusses the chemical shift index (CSI) technique, which uses protein nuclear magnetic resonance spectroscopy data to identify and locate secondary protein structures like beta strands and helices. CSI analyzes backbone chemical shift values to detect differences from random coil values that indicate structure. It is a widely used and easy to implement method, though newer techniques have slightly better accuracy. CSI remains popular due to its simplicity and ability to identify structures without specialized software. The document also discusses limitations and examples of CSI's applications and implementation in analysis programs.
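The comparison against random coil values described above can be sketched as follows. This is a simplified illustration rather than the published procedure in full: it assumes alpha-proton shifts, a tolerance of 0.1 ppm, and strictly uninterrupted runs (four or more -1 values for a helix, three or more +1 values for a strand). The function names and the numbers in the example are made up for illustration, and the random coil reference values must be supplied by the caller.

```python
# Simplified chemical shift index (CSI) sketch for alpha-proton shifts.
# Function names, thresholds as defaults, and the example data are
# illustrative assumptions, not taken from any analysis program.

def csi_indices(observed, random_coil, tolerance=0.1):
    """Map each residue to -1, 0 or +1 by comparing to its random coil shift."""
    indices = []
    for obs, ref in zip(observed, random_coil):
        diff = obs - ref
        if diff > tolerance:
            indices.append(1)    # downfield of random coil
        elif diff < -tolerance:
            indices.append(-1)   # upfield of random coil
        else:
            indices.append(0)    # within tolerance of random coil
    return indices


def assign_secondary_structure(indices, helix_run=4, strand_run=3):
    """Label uninterrupted runs: H (helix), E (strand), C (coil/other)."""
    labels = ["C"] * len(indices)
    i = 0
    while i < len(indices):
        j = i
        while j < len(indices) and indices[j] == indices[i]:
            j += 1  # extend the run of identical index values
        run = j - i
        if indices[i] == -1 and run >= helix_run:
            labels[i:j] = ["H"] * run
        elif indices[i] == 1 and run >= strand_run:
            labels[i:j] = ["E"] * run
        i = j
    return "".join(labels)


# Made-up shifts (ppm) for eight residues sharing one reference value.
observed = [4.0, 4.0, 4.0, 4.0, 4.3, 4.7, 4.7, 4.7]
reference = [4.3] * 8
print(assign_secondary_structure(csi_indices(observed, reference)))
```

Requiring strictly identical consecutive values is a deliberate simplification; it keeps the run detection short at the cost of missing regions that contain occasional zeros.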
A biodiversity hotspot is a biogeographic region with significant biodiversity that is threatened by human activity. To qualify as a hotspot, a region must contain at least 1,500 endemic plant species or 0.5% of the world's plant species, and have lost at least 70% of its primary vegetation. There are 34 biodiversity hotspots globally supporting over 60% of the world's species with high endemism. Examples discussed are the Eastern Himalayas, Lake Biwa in Japan, and Madagascar which is home to many lemur and plant species.
This document provides an overview of bioinformatics, including its history, major areas of research, databases, tools, and applications. Bioinformatics is defined as the use of computer science and information technology to analyze and interpret biological data. The document traces the history of bioinformatics from early genetics experiments in the 1860s to advances in computing and molecular biology in the 1970s that enabled the field. It outlines major research areas like sequence analysis, genome annotation, and computational evolutionary biology. It also discusses biological databases, common bioinformatics tools, and applications of bioinformatics in fields like medicine, agriculture, and comparative genomics.
The document discusses biodiversity hotspots, which are defined as regions that contain at least 1,500 endemic vascular plant species and have lost at least 70% of their original habitat. These 34 hotspots account for only 2.3% of the Earth's land but are home to over 50% of plant and 40% of land vertebrate species. The hotspots are threatened by unsustainable consumption in northern countries and extreme poverty in tropical regions. The document lists the major biodiversity hotspots around the world.
This document outlines proposals for supporting agricultural biotechnology in the Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir (SKUASTK) through funding from the Department of Biotechnology (DBT). It describes meetings held to formulate projects in areas like societal development, field crops, horticulture, and animal disease diagnosis. If funded, the proposals aim to address challenges to agriculture in Jammu and Kashmir like climate change, reduce chemical use, and improve livelihoods. Specific projects focus on developing biotech solutions for crops like saffron and rice, establishing a School of Biotechnology and disease diagnosis center, and strengthening rural livelihoods through livestock, poultry and flor
Bioinformatics issues and challenges presentation at s p college by SKUASTKashmir
This document provides an overview of bioinformatics and some key concepts:
- It discusses the exponential growth of biological data from technologies like PCR and microarrays, and how bioinformatics is needed to analyze this data.
- Bioinformatics is defined as integrating biology and computer science to collect, analyze, and interpret large amounts of molecular-level information. It uses databases and tools to study genomes, proteins, and biological processes.
- Major databases like GenBank, EMBL, and SwissProt store DNA, RNA, protein sequences and provide access to researchers. Tools like BLAST are used to search databases and analyze sequences.
- Benefits of bioinformatics include advances in medicine, agriculture, forensics
The document discusses biodiversity hotspots, which are regions with high levels of endemic species that have lost at least 70% of their original habitat. It identifies 34 biodiversity hotspots around the world that meet these criteria. The main threats to biodiversity include habitat destruction, climate change, habitat fragmentation, pollution, overexploitation, and disease. Conservation efforts aim to prioritize protecting threatened regions through establishing reserves and protected areas, providing incentives for conservation, using regulations and market-based tools, developing new conservation professionals and projects, and promoting ecotourism.
This document provides a summary of computational organic chemistry methods for investigating molecular structures, properties, reactivities, and selectivities. It describes the basic concepts of molecular mechanics and quantum chemistry methods, including their advantages and limitations. Applications include determining molecular geometries and conformations, absolute configurations, electron distributions, acidities, and frontier molecular orbital energies to examine reactivity and selectivity. Combining computational methods with experiments allows more reliable investigations.
Target identification in drug discoverySwati Kumari
The document discusses target identification in drug discovery. It begins by defining a target and explaining that target identification is the first step in drug discovery. It then discusses various approaches to target identification, including direct biochemical methods, genetic interaction methods, and computational inference methods. The document also discusses characteristics of drug targets and how drugs interact with targets at the molecular level. It provides examples of tools that can be used for target identification and validation, such as microarrays, antisense technology, and proteomics. In summary, the document outlines the process of target identification in drug discovery and various methods that can be used to identify and validate potential drug targets.
Bioinformatics plays a key role in drug discovery by enabling researchers to efficiently analyze large amounts of biological data and computationally simulate drug-target interactions. Some important applications of bioinformatics in drug discovery include virtual high-throughput screening of compound libraries against protein targets to identify potential drug leads, analyzing genetic and protein sequences to infer evolutionary relationships and identify drug targets, and using homology modeling to predict the 3D structures of targets to aid in drug design when experimental structures are unknown.
Biodiversity hotspots are biogeographic regions with significant biodiversity that is threatened by humans. There are 25 hotspots identified worldwide based on having many endemic species and facing severe threats. Two of the hotspots are in India: the Western Ghats and Himalayan regions of northeast India and Myanmar. These hotspots are rich in endemic plant and animal species like reptiles, amphibians, insects and mammals. However, only a small percentage of the total land in biodiversity hotspots is currently protected.
This document discusses databases in bioinformatics. It begins by noting the rapid increase in biological data from sources like gene sequences, protein sequences, structural data, and gene expression data. It then defines biological databases as structured, searchable collections of data that are periodically updated and cross-referenced. The major purposes of databases are to make biological data available, systematize the data, and allow analysis of computed biological data. The document provides a brief history of biological databases and sequencing efforts. It also classifies biological databases based on data type, maintenance status, data access, data sources, database design, and organism. Specific databases discussed include DDBJ, EMBL, GenBank, Swiss-Prot, and NCB
Sustainable Software for Computational Chemistry and Materials ModelingSoftwarePractice
This document discusses the formation of an institute called S2I2C2M2 that aims to promote more sustainable software development practices for computational chemistry. It outlines challenges with current software, including complexity, lack of parallelism, and outdated development practices. The institute will focus on developing portable parallel infrastructures, general tensor algebra algorithms, standards for code and data sharing, and education initiatives to train the next generation of computational chemists. The goal is to overcome barriers and change the culture around computational chemistry software to better support collaborative open science.
protein sturcture prediction and molecular modellingDileep Paruchuru
This document discusses molecular modeling and protein structure prediction. It begins by introducing molecular modeling as a combination of computational chemistry and computer graphics that allows scientists to generate and present molecular data. It then discusses the two main computational methods for molecular modeling - molecular mechanics and quantum mechanics. The document goes on to discuss molecular mechanics in more detail and its applications. It also discusses protein structure and function, the challenges of protein structure prediction, and the goals of protein structure prediction.
Hamid-ur-Rahman studied Zoology at Islamia College in Peshawar. The document defines bioinformatics as the unified discipline combining biology, computer science, and information technology, aiming to solve biological problems using DNA, amino acid sequences, and related information, as described by Frank Tekaia.
This document provides an overview of bioinformatics and bioinformatics databases. It defines bioinformatics as the application of information technology to molecular biology to analyze and interpret biological data. This includes tasks like mapping and analyzing DNA and protein sequences. The document discusses how bioinformatics databases are used to store and manage the large amounts of biological data generated. It describes the characteristics of biological databases and how they are used for querying and retrieving sequence information. Key areas of bioinformatics research and important sequence databases are also summarized.
This document discusses different types of occupational hazards that workers may be exposed to, including physical, chemical, biological, mechanical, and psychosocial hazards. Some key physical hazards mentioned are heat, cold, light, noise, vibration, and radiation. Chemical hazards can enter the body through local skin contact, inhalation of dusts, gases or metal compounds, or ingestion. Biological hazards come from infectious agents. Mechanical hazards stem from injuries from machinery. Psychosocial hazards relate to stress, relationships, and mental health issues from an unfamiliar work environment. Overall the document provides a detailed classification and explanation of various occupational health risks.
1. Bioinformatics uses computer science and information technology to analyze biological data and assist with drug discovery. It helps identify drug targets and design drug candidates.
2. The drug design process involves identifying a disease target, studying compounds of interest, detecting molecular disease bases, rational drug design, refinement, and testing. Bioinformatics tools assist with each step.
3. CADD uses computational methods to simulate drug-receptor interactions and is heavily dependent on bioinformatics tools and databases. It supports techniques like virtual screening, sequence analysis, homology modeling, and physicochemical modeling to aid drug development.
Bioinformatics analyzes massive amounts of biological data like DNA sequences to uncover hidden biological information. It has many applications like molecular medicine, drug development, and microbial genome analysis. Common bioinformatics tools like BLAST are used to compare query sequences against databases to find similar sequences. BLAST works through a heuristic algorithm that finds short matches between sequences to locate potential homologs in an efficient manner. Other algorithms like Smith-Waterman and FASTA also perform sequence alignment but with different tradeoffs in accuracy and speed.
Occupational health refers to potential risks to worker health and safety from their jobs outside the home. It aims to promote worker well-being, prevent job-related illness, protect workers from health risks, and ensure a balance between workers and their occupational environments. Occupational health hazards include physical, chemical, biological, and psychosocial risks that can cause diseases, injuries, stress, and disorders if not properly controlled. Preventing occupational diseases involves measures related to the work environment, medical care, health education, and protecting workers from various hazards.
DNA sequencing: rapid improvements and their implicationsJeffrey Funk
these slides analyze the rapid improvements in DNA sequencers and the implications for these rapid improvements for drug discovery, new crops, materials creation, and new bio-fuels. Many of the rapid improvements are from "reductions in scale." As with integrated circuits, reducing the size of features on DNA sequencers has enabled many orders of magnitude improvements in them. Unlike integrated circuits, the improvements are also due to changes in technology. For example, changes from pyrosequencing to semiconductor and nanopore sequencing have also been needed to achieve the reductions in scale. Second, pyrosequencing also benefited from improvements in lasers and camera chips.
Presentation by Tina Graves-Lindsay at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on production of reference grade assemblies for various human populations.
Total workflow solutions that cater every budget, performance or throughput requirement for confirmatory dioxin analysis were discussed in the Thermo Scientific Lunch Seminar at the Dioxin 2014 conference. D. Hope, CEO & Owner Pacific Rim Laboratoris, presented about the economies of POPs analysis from the point of view of a leading laboratory using the very latest dioxin method kits. C. Cojocariu, Thermo Fisher Scientific, discussed recent changes in EU regulations which bring new opportunities for more labs to participate in dioxin analysis and about validating methods using Gas Chromatography triple quadrupole for PCDD/Fs with reference to the new EU Commission Regulation No. 709/2014.
This lecture introduces next-generation sequencing and its applications in biomedical research. It discusses how next-gen sequencing is transforming genetic disease diagnosis and personalized medicine. The lecture covers sequencing workflows including read alignment, variant calling, and annotation. It also describes different sequencing experiments like whole genome, exome, RNA-seq, and ChIP-seq. Finally, it discusses how next-gen sequencing is advancing research into genetic diseases and cancer genomics.
Generating haplotype phased reference genomes for the dikaryotic wheat strip...Benjamin Schwessinger
The document summarizes work to generate haplotype phased reference genomes for the wheat stripe rust fungus Puccinia striformis f. sp. tritici. High quality DNA was extracted and sequenced using PacBio long reads, resulting in an assembly of under 400 contigs. Mapping of the primary and associated contigs showed heterozygosity between the two dikaryotic nuclei. Future work includes repeat annotation, RNAseq mapping, sequencing additional isolates, and single nucleus sequencing to better understand the dikaryotic nature of the fungus and its success. The work aims to generate chromosomally-level assemblies of both dikaryotic nuclei.
This document summarizes the design of the Magnetag V3.0 system, which aims to advance gameplay for a magnetic tag vest system by adding hit strength and location tracking capabilities. It describes the design of the coil matrix sensor, analog conditioning circuitry, power supply, analog-to-digital conversion system, and Bluetooth Low Energy transmitter. The coil matrix was designed with 8 coils to provide 75% torso coverage. Testing showed the system can differentiate hit strengths and locations. The BLE transmitter was shown to reliably transmit 8-bit and 16-bit packets with adequate power up to 2 meters. Next steps include refining the hit detection algorithm and migrating components to the nRF51422 chip.
Real-time PCR is a technique that monitors DNA amplification during the PCR process in real-time using fluorescence detection. It allows for both quantification of DNA present and detection of DNA amplification as it occurs. Real-time PCR has advantages over traditional PCR such as higher sensitivity, specificity, and ability to provide quantitative results. It uses sequence-specific DNA probes labeled with fluorescent dyes and quenchers to detect amplification of target DNA sequences. Data analysis can provide both absolute and relative quantification of DNA targets. Real-time PCR has many applications including gene expression analysis, disease diagnosis, and food and environmental testing.
The Challenges of Analytical Method Validation for Hallucinogens and Designer...NMS Labs
The Challenges of Analytical Method Validation for Hallucinogens and Designer Stimulants in Biological Samples Using LC-TOF
Hosted by Agilent Technologies on October 8, 2012
Presented by Barry K Logan, Ph.D., NMS Labs National Director, Forensic Services
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
The document provides an overview of plant genome sequence assembly, including:
1) A brief history of sequencing technologies and their improvements over time, from Sanger sequencing to newer technologies producing longer reads.
2) Key steps in a sequencing project including read processing, filtering, and corrections before assembly into contigs and scaffolds using appropriate software.
3) Factors to consider for experimental design and assembly optimization such as sequencing depth, library types, and software choices depending on the genome and data characteristics.
This document discusses heart rate variability (HRV) analysis techniques including time domain, frequency domain, and non-linear analyses. It provides examples of HRV analyses on data collected from tilt table tests and baroreflex sensitivity tests using nitroglycerin and phenylephrine drugs. Graphs show changes in heart period and interbeat interval over time during the tests and frequency domain analyses breaking down power in very low, low, and high frequency bands.
3. What is bioinformatics?
Bioinformatics is an interdisciplinary field that
develops methods and software tools for
understanding biological data. As an
interdisciplinary field of science, bioinformatics
combines computer science, statistics,
mathematics, and engineering to study and
process biological data.
http://en.wikipedia.org/wiki/Bioinformatics
2015-03-23 3
4. A little bit of history
• 1951 – Sequencing peptide (Frederick Sanger)
• 1965 – Sequencing RNA (Robert Holley)
• 1970 – Term BIOINFORMATICS coined by
Paulien Hogeweg & Ben Hesper
• 1977 – Sequencing DNA (Frederick Sanger)
• 1990 – Human Genome Project started
(expected duration 15 years)
• 2003 – Human Genome Project completed
5. Why is bioinformatics so important?
• It’s all about money!!!!
6. Cost of sequencing
Sboner et al. Genome Biology 2011 12:125 doi:10.1186/gb-2011-12-8-125
7. Cost of sequencing & data analysis
Sboner et al. Genome Biology 2011 12:125 doi:10.1186/gb-2011-12-8-125
9. Future of biological research
• With rapidly advancing automation, less
human effort will be needed for
sample preparation
• With the increasing amount of information, data
analysis will become more important
• The information output of experiments is
growing beyond human capability: high-level
summaries and statistics are needed
16. Quality filtering and trimming
TAGCGCAATACTTTCTGTTAGCGCAAATCCTAGTAGTGCAT
AGTGGTATCAACGCAGAGTACGGG
17. Sequence search (BLAST)
• BLAST is one of the most commonly used
bioinformatics programs
• It finds small sub-sequences of your query
in the subject sequence
• It matches short words from the query against
the subject database and then uses heuristics
to verify and extend each match
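The word-matching idea above can be sketched in a few lines of Python. This is only an illustration of "seed and extend" under simplifying assumptions: words must match exactly, extension is ungapped and stops at the first mismatch, and the function names (index_words, extend_hit, seed_and_extend) are made up for this example. Real BLAST additionally uses scoring matrices, neighbourhood words, gapped extension and statistics.

```python
from collections import defaultdict


def index_words(subject, w):
    """Map every length-w word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def extend_hit(query, subject, q, s, w):
    """Extend an exact word hit left and right while characters still match."""
    left = 0
    while (q - left > 0 and s - left > 0
           and query[q - left - 1] == subject[s - left - 1]):
        left += 1
    right = w
    while (q + right < len(query) and s + right < len(subject)
           and query[q + right] == subject[s + right]):
        right += 1
    # (query start, subject start, length of the ungapped match)
    return q - left, s - left, left + right


def seed_and_extend(query, subject, w=4):
    """Return extended hits, longest first (toy version, exact matches only)."""
    index = index_words(subject, w)
    hits = set()
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            hits.add(extend_hit(query, subject, q, s, w))
    return sorted(hits, key=lambda h: -h[2])


if __name__ == "__main__":
    for hit in seed_and_extend("CCACGTACGGCC", "TTTACGTACGGAAA"):
        print(hit)
```

Indexing the subject once and looking up each query word is what makes the search fast: only positions that share a word with the query are ever examined in detail.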
20. Sequence/genome alignment
• Global alignment
– global optimization that "forces" the alignment
to span the entire sequences
(Needleman–Wunsch algorithm or Clustal style)
• Local alignment
– identify short regions of similarity within long
divergent sequences
(Smith–Waterman algorithm or BLAST style)
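The difference between the two alignment styles is easiest to see in the dynamic programming table itself. The sketch below computes only the alignment score, not the alignment, and the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for the example. With local=False it follows the Needleman–Wunsch recurrence; with local=True it follows Smith–Waterman, where cell values are never allowed to drop below zero.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Score of the best global (default) or local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment: restart from zero instead of going negative.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global score sits in the last cell; local score is the table maximum.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    a, b = "GGGACGTCCC", "TTTACGTAAA"
    print("global:", alignment_score(a, b))
    print("local: ", alignment_score(a, b, local=True))
```

For two sequences that share only the short region ACGT, the global score is dragged down by the forced mismatches at both ends, while the local score reflects just the shared region.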
22. Genome alignment
• Glocal alignment
• Uses a word-matching method
• Creates a suffix tree for faster search
• Searches the suffix tree for exact word
matches, clusters them, and then uses local
alignment methods to extend the matches
25. Assembly
• Short read assembly is extremely difficult
and computationally intensive!
• For longer reads, Overlap-Layout-Consensus
(OLC) assemblers are used
• For shorter reads (in
high numbers), de Bruijn
graph assemblers work
better
Source: Commins, Toft & Fares (CC BY-SA 2.5)
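As a rough illustration of the de Bruijn approach, the Python sketch below breaks reads into k-mers, links each k-mer's prefix to its suffix, and walks the graph while the path is unambiguous. The reads, the value of k and the function names are invented for this example. It assumes error-free reads from one strand; real assemblers also have to deal with sequencing errors, reverse complements, coverage and repeats.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; every k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Follow edges from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # toy overlapping reads
    graph = de_bruijn_graph(reads, k=4)
    print(walk_unambiguous_path(graph, "ATG"))
```

Because overlapping reads contribute the same k-mers, the overlaps never have to be computed pairwise, which is why this approach scales to very large numbers of short reads.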
34. PDB and structural information
• The Protein Data Bank (PDB) holds structural
information on proteins, nucleic acids and
complexes: over 100 000 entries!
• The 3D structure can be resolved by:
– X-ray diffraction
– NMR
– Electron microscopy
– Simulations
35. PDB and structural information
HEADER TRANSCRIPTION 18-MAR-04 1VD4
TITLE SOLUTION STRUCTURE OF THE ZINC FINGER DOMAIN OF TFIIE ALPHA
COMPND 2 MOLECULE: TRANSCRIPTION INITIATION FACTOR IIE, ALPHA
COMPND 8 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 10 EXPRESSION_SYSTEM_PLASMID: PET11D
KEYWDS ZINC FINGER, TRANSCRIPTION
EXPDTA SOLUTION NMR
NUMMDL 20
AUTHOR M.OKUDA,A.TANAKA,Y.ARAI,M.SATOH,H.OKAMURA,A.NAGADOI,
REMARK 500 CHOLOGY: RAMACHANDRAN REVISITED. STRUCTURE 4, 1395 - 1400
REMARK 500
REMARK 500 M RES CSSEQI PSI PHI
REMARK 500 1 GLU A 118 -36.12 -163.20
REMARK 500 1 ARG A 119 -92.03 -138.92
REMARK 500 1 THR A 122 -70.74 -110.33
SITE 1 AC1 5 CYS A 129 CYS A 132 CYS A 154 CYS A 157
SITE 2 AC1 5 THR A 159
CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1
ORIGX1 1.000000 0.000000 0.000000 0.00000
ORIGX3 0.000000 0.000000 1.000000 0.00000
SCALE1 1.000000 0.000000 0.000000 0.00000
SCALE3 0.000000 0.000000 1.000000 0.00000
MODEL 1
ATOM 1 N ARG A 113 1.980 -19.277 -19.127 1.00 0.00 N
ATOM 2 CA ARG A 113 1.202 -19.280 -17.853 1.00 0.00 C
ATOM 3 C ARG A 113 0.666 -17.875 -17.557 1.00 0.00 C
ATOM 4 O ARG A 113 0.625 -17.023 -18.421 1.00 0.00 O
ATOM 5 CB ARG A 113 2.199 -19.713 -16.778 1.00 0.00 C
ATOM 6 CG ARG A 113 2.435 -21.222 -16.875 1.00 0.00 C
ATOM 7 CD ARG A 113 3.604 -21.619 -15.971 1.00 0.00 C
ATOM 8 NE ARG A 113 2.986 -21.899 -14.645 1.00 0.00 N
ATOM 9 CZ ARG A 113 3.125 -23.073 -14.094 1.00 0.00 C
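The ATOM records in the excerpt above can be read with very little code. The Python sketch below is a simplified assumption-laden example: it splits each line on whitespace, which works for lines like those shown on the slide, whereas the official PDB format is fixed-width and fields can run together in real files. The Atom tuple and the function names are made up for this illustration.

```python
from collections import namedtuple

# Hypothetical record type for this example; not part of any PDB library.
Atom = namedtuple("Atom", "serial name residue chain res_seq x y z")


def parse_atom_line(line):
    """Parse one whitespace-separated ATOM line; return None for other records."""
    fields = line.split()
    if len(fields) < 9 or fields[0] != "ATOM":
        return None
    return Atom(
        serial=int(fields[1]),
        name=fields[2],
        residue=fields[3],
        chain=fields[4],
        res_seq=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
    )


def centroid(atoms):
    """Mean x, y, z coordinate of a non-empty list of atoms."""
    n = len(atoms)
    return (
        sum(a.x for a in atoms) / n,
        sum(a.y for a in atoms) / n,
        sum(a.z for a in atoms) / n,
    )


if __name__ == "__main__":
    lines = [
        "MODEL 1",
        "ATOM 1 N ARG A 113 1.980 -19.277 -19.127 1.00 0.00 N",
        "ATOM 2 CA ARG A 113 1.202 -19.280 -17.853 1.00 0.00 C",
    ]
    atoms = [a for a in map(parse_atom_line, lines) if a is not None]
    print(atoms[0])
    print(centroid(atoms))
```

For real work it is safer to slice the fixed columns defined in the PDB format specification, or to use an established parser, than to rely on whitespace.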
38. Molecular networks
• Bioinformatics is needed to describe
interactions between proteins, DNA, drugs…
• When thousands of interactions are
analyzed, network science comes into play
• The set of all protein-protein interactions in
a single cell is called the interactome
• A single interaction can be studied in
vivo or in vitro, but more complex networks can
only be investigated in silico
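A minimal way to treat interactions as a network is to store them as an undirected graph and count each node's partners. The Python sketch below does that; the protein names P1 to P4 and the function names are placeholders invented for this example, not real data.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected graph: each protein maps to its set of partners."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def hubs(network, top=1):
    """Proteins with the most partners (ties broken alphabetically)."""
    return sorted(network, key=lambda n: (-len(network[n]), n))[:top]


if __name__ == "__main__":
    # Hypothetical pairwise interactions, for illustration only.
    pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
    net = build_network(pairs)
    print(hubs(net))
    print(sorted(net["P2"]))
```

The same structure scales from a handful of pairs to thousands, which is where questions such as "which proteins are hubs?" can no longer be answered by inspecting single interactions.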
40. Metabolic pathways
• Bioinformatics is also useful for describing
series of biochemical reactions that often
happen in different cellular compartments
• Describing pathways required specially
designed (graph) databases
• Modeling the flow of metabolites through a
pathway is virtually impossible without the use
of computers
43. Simulation of biological systems
• Simulation of cell-cell interactions
• Description of interactions within a population
• Interactions between species
• Food chains => food web
• Social relations
• Evolution of populations
• Modeling in pharmacology
46. Databases
• Different types of public resources are available:
– Sequence data: nucleic sequence, protein sequence, EST, genome
– Functional annotation: metadata/ontologies, gene models, gene ontologies
– Structural data: protein structure, complex structure, RNA structure
– Variation data: SNP, SSR, indels
– Metabolic data: interactions, pathways
48. Databases
• How to use them?
– Browsing websites
directly
– Downloading
– Using API
49. Text/data mining
• Obtaining information from several
scientific resources is becoming more
difficult as the volume of information grows
• The number of different resources/databases is
growing, and even a simple search has to be
repeated for each of them
• Filtering relevant information is a big
intellectual/computational burden
50. Text mining
• Retrieval, analysis and formatting (parsing)
of information into searchable databases
• Recognition of patterns
• Recognition of natural language
• Extraction of semantic or grammatical
relationships
• Coreference: terms that refer to the same
object
51. Text mining example
• Query: Find promoters known to work in
E. coli with the σ70 holoenzyme (Eσ70), aka σD
PREFIX sbol: <http://sbols.org/sbol.owl#>
PREFIX pr: <http://partsregistry.org/#>
SELECT DISTINCT ?name
WHERE {
  ?part a sbol:Part ;
        sbol:status ?st ;
        sbol:name ?name ;
        sbol:dnaSequence ?seq ;
        a pr:promoter ;
        a ?cl .
  FILTER (?cl = pr:sigma70_ecoli_prokaryote_rnap
          && ?st != 'Deleted')
}
52. Open source software
• Software that anyone can use, modify, share
and distribute
• Source code is available and can (should!) be
modified to fit the user's requirements
• Community-driven development
• Dynamic development and early releases
• Security and transparency
53. Open source software repositories
• CRAN (The Comprehensive R Archive Network)
• CodePlex
55. CAN I BE A BIOINFORMATICIAN, TOO?
56. How to become a bioinformatician?
• Get a computer with Linux
• Learn how to use the bash shell
and how to run programs from the
command line
• Learn to code in Python or Perl
• Try solving basic problems on
57. How to become a bioinformatician?
• Read blogs:
• Read fora for geeks:
• Get an account on:
58. Want to know more?
• Join my network on
http://nl.linkedin.com/in/andrzejstefanczech
• Come to Wageningen for an internship at
Genetwister Technologies B.V.
http://www.genetwister.nl/
• Slides from this lecture are also available on
SlideShare