Quantitation and quantitative tools are indispensable in modern biology. Gregor Mendel and Thomas Morgan, by simply counting genetic variations of plants and fruit flies, were able to discover the principles of genetic inheritance.
For more sophisticated uses of quantitative tools, one may find applications that model animal behavior and evolution, or the use of millions of nonlinear partial differential equations to model cardiac blood flow. It is clear that mathematical and computational tools have become an integral part of modern-day biological research.
Bioinformatics is a union of biology and informatics: it involves the technology that uses computers for storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.
Not all computational work in biology is bioinformatics. For example, mathematical modeling of ecosystems, population dynamics, and phylogenetic construction using fossil records all employ computational tools, but do not necessarily involve biological macromolecules.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, and is often regarded as computational molecular biology.
Bioinformatics consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems.
These two subfields are complementary to each other. Tool development includes writing software for sequence, structural, and functional analysis, as well as the construction and curation of biological databases.
The analyses of biological data often generate new problems and challenges that in turn spur the development of new and better computational tools.
The goal is to better understand a living cell and how it functions at the molecular level. By analyzing raw molecular sequence and structural data, bioinformatics research can generate new insights and provide a “global” perspective of the cell.
The functions of a cell can be better understood by analyzing sequence data because the flow of genetic information is dictated by the “central dogma” of biology, in which DNA is transcribed to RNA, which is translated to proteins.
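The flow of information in the central dogma can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the input is taken to be the coding (sense) strand, translation starts at the first base, and the function names `transcribe` and `translate` are made up for this example. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")
protein = translate(mrna)
print(mrna, protein)  # AUGGCCUGGUAA MAW
```

Real sequence analysis also has to locate the correct reading frame and handle ambiguous bases, which this sketch ignores.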
Knowledge-based drug design (identification of novel leads for synthetic drugs):
Knowledge of the three-dimensional structures of proteins allows molecules to be designed that are capable of binding to the receptor site of a target protein with great affinity and specificity.
This informatics-based approach significantly reduces the time and cost necessary to develop drugs with higher potency, fewer side effects, and less toxicity, compared with the traditional trial-and-error approach.
Biomedical science: High-speed genomic sequencing coupled with sophisticated informatics technology will allow a doctor to quickly sequence a patient’s genome, easily detect potentially harmful mutations, and engage in early diagnosis and effective treatment of diseases.
What is a SNP? A single nucleotide polymorphism (SNP) represents a substitution of one base for another, e.g., C to T or A to G.
SNPs are the most common variation in the human genome and occur approximately once every 100 to 300 bases.
A SNP is terminologically distinguished from a mutation based on an arbitrary population frequency cutoff of 1%: variants with a frequency above 1% are called SNPs, and those below 1% are called mutations.
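The 1% cutoff can be made concrete with a small Python sketch. It assumes the alleles observed at one site across a sample of chromosomes are available as a list of bases; the function names are hypothetical, and a frequency of exactly 1% is treated as a mutation only because the definition above leaves that boundary open.

```python
from collections import Counter

FREQUENCY_CUTOFF = 0.01  # the arbitrary 1% population frequency cutoff


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele observed at one site."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0  # only one allele was observed
    return counts[1][1] / len(alleles)


def classify_variant(alleles):
    """Label a site as 'SNP', 'mutation', or 'invariant' by allele frequency."""
    maf = minor_allele_frequency(alleles)
    if maf == 0.0:
        return "invariant"
    # Assumption: a frequency of exactly 1% falls on the 'mutation' side.
    return "SNP" if maf > FREQUENCY_CUTOFF else "mutation"


common_site = ["C"] * 95 + ["T"] * 5   # T seen in 5% of chromosomes
rare_site = ["C"] * 999 + ["T"]        # T seen in 0.1% of chromosomes
print(classify_variant(common_site), classify_variant(rare_site))
```

In practice the estimate depends heavily on sample size and on the population sampled, so the label for a borderline variant can differ between studies.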
A key aspect of research in genetics is associating sequence variations with heritable phenotypes. Because SNPs are expected to facilitate large-scale association genetics studies, there has been an increasing interest in SNP discovery and detection.
Identification of SNPs that contribute to susceptibility to common diseases will provide highly accurate diagnostic information that will facilitate early diagnosis, prevention, and treatment of human diseases.
Common SNPs, ranging from a minor allele frequency of 5 to >20%, are of interest because it has been argued that common genetic variation can explain a proportion of common human disease. This is the common variant/common disease (CV/CD) hypothesis.
There are two types of coding SNPs: nonsynonymous SNPs and synonymous SNPs.
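A synonymous SNP leaves the encoded amino acid unchanged, while a nonsynonymous SNP changes it. The distinction can be sketched in Python using the standard genetic code. The function name `classify_coding_snp` and the example codons are illustrative assumptions, not taken from any particular gene.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution inside one DNA codon.

    position is the 0-based index (0, 1, or 2) of the substituted base.
    """
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


print(classify_coding_snp("GCC", 2, "T"))  # GCC (Ala) to GCT (Ala)
print(classify_coding_snp("CGT", 1, "A"))  # CGT (Arg) to CAT (His)
```

This sketch treats a change to or from a stop codon as nonsynonymous, which is a simplification; such changes are often reported as a separate class.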
“Direct” approach: because nonsynonymous SNPs directly affect protein function, many investigators focus on genotyping coding SNPs in genetic association studies. However, the difficulty of this approach lies in predicting or determining a priori which SNPs are likely to be causative or predictive of the phenotype of interest.
The “indirect” approach to genetic association studies differs from the direct approach in that the causal SNP is not assayed directly.
The assumption is that the assayed or genotyped SNPs will be in linkage disequilibrium or associated with the causative SNP.
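Linkage disequilibrium between two biallelic sites is commonly summarized by the coefficient D = p_AB - p_A * p_B and by r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The Python sketch below computes both from a list of phased haplotypes. It assumes each haplotype is given as a two-character string (the allele at the assayed SNP followed by the allele at the causal SNP), and the function name is hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    haplotypes: list of 2-character strings, one per chromosome, giving
    the allele at the first site followed by the allele at the second.
    """
    n = len(haplotypes)
    # Use the alleles of the first haplotype as the reference alleles A and B.
    ref_a, ref_b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == ref_a for h in haplotypes) / n
    p_b = sum(h[1] == ref_b for h in haplotypes) / n
    p_ab = sum(h[0] == ref_a and h[1] == ref_b for h in haplotypes) / n

    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either site is invariant; report 0.0 in that case.
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, r_squared


# The two sites always travel together: the assayed SNP tags the causal SNP.
linked = ["CT"] * 6 + ["GA"] * 4
# All four allele combinations are equally frequent: no association.
unlinked = ["CT", "CA", "GT", "GA"]
print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(unlinked))
```

An r^2 near 1 means the genotyped SNP is a good proxy for the causal SNP, which is the situation the indirect approach relies on.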
Although cystic fibrosis is caused by mutations in a single gene encoding CFTR, the disease has a variable clinical phenotype.
The most common mutation associated with cystic fibrosis, deletion of a phenylalanine at position 508 (frequency, 67%), is associated with severe disease. But some mutations, in which arginine is replaced by histidine at residue 117 (R117H; 0.8%), by tryptophan at residue 334 (0.4%), or by proline at residue 347 (0.5%), are associated with milder disease.
NCBI: CFTR, cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7) [Homo sapiens]
Agricultural biotechnology: Plant genome databases and gene expression profile analyses have played an important role in the development of new crop varieties that have higher productivity and more resistance to disease, insecticides, and insects.
The quality of bioinformatics predictions depends on the quality of data and the sophistication of the algorithms being used. Sequence data from high-throughput analysis often contain errors, so it is important to maintain a realistic perspective of the role of bioinformatics.
In addition to providing more reliable and more rigorous computational tools for sequence, structural, and functional analysis, the major challenge for future bioinformatics development is to develop tools for elucidation of the functions and interactions of all gene products in a cell.
This molecular simulation of all the cellular processes is termed systems biology. The ultimate goal of this endeavor is to transform biology from a qualitative science to a quantitative and predictive science.
Databases: DNA Sequence Data
EBI (European Bioinformatics Institute): http://www.ebi.ac.uk/
NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
DDBJ: http://www.ddbj.nig.ac.jp/
Sequence Retrieval System (SRS)
A database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria.
Databases are composed of computer hardware and software for data management.
Each record, also called an entry, should contain a number of fields that hold the actual data items, for example, fields for names, phone numbers, addresses, and dates.
To retrieve a particular record from the database, a user can specify a particular piece of information, called a value, to be found in a particular field and expect the computer to retrieve the whole data record. This process is called making a query.
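The idea of records, fields, values, and queries can be sketched in a few lines of Python. This is a toy in-memory illustration, not how a real database management system is implemented; the records, field names, and the `query` function are made up for the example.

```python
# Each record (entry) is a dict whose keys are the fields.
records = [
    {"name": "Alice Smith", "phone": "555-0101", "address": "1 Elm St", "date": "2004-03-12"},
    {"name": "Bob Jones", "phone": "555-0102", "address": "2 Oak Ave", "date": "2005-07-30"},
    {"name": "Carol White", "phone": "555-0103", "address": "1 Elm St", "date": "2005-07-30"},
]


def query(records, field, value):
    """Return every whole record whose given field holds the given value."""
    return [record for record in records if record.get(field) == value]


# Specify a value for one field and get back the complete matching records.
for record in query(records, "address", "1 Elm St"):
    print(record["name"], record["phone"])
```

A real database would use indexes so that a query does not have to scan every record, but the user-facing idea is the same: a field, a value, and whole records in return.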
Construction and query of an object-oriented database. Three objects are constructed and are linked by pointers shown as arrows. Finding specific information relies on navigating through the objects by way of pointers. For simplicity, some of the pointers are omitted.
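Navigation by pointers can be imitated in Python, where an attribute that holds another object acts as the pointer. The three classes below (`Gene`, `Protein`, `Organism`) are hypothetical stand-ins chosen for illustration and are not the objects drawn in the figure.

```python
from dataclasses import dataclass


@dataclass
class Organism:
    name: str


@dataclass
class Protein:
    name: str
    organism: Organism  # pointer to the Organism object


@dataclass
class Gene:
    symbol: str
    product: Protein  # pointer to the Protein object


# Construct three objects and link them through their attributes.
human = Organism("Homo sapiens")
cftr_protein = Protein("cystic fibrosis transmembrane conductance regulator", human)
cftr_gene = Gene("CFTR", cftr_protein)


def organism_of(gene: Gene) -> str:
    """Answer a query by following pointers: gene -> protein -> organism."""
    return gene.product.organism.name


print(organism_of(cftr_gene))  # Homo sapiens
```

Unlike the field-and-value lookup of a flat record, the answer here is found by walking from one object to the next.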
Historically there are two main methods of DNA sequencing: Maxam & Gilbert, using chemical sequencing, and Sanger, using dideoxynucleotides. Modern sequencing equipment uses the principles of the Sanger technique.
The sample wells are loaded with the DNA to be sequenced. Great care needs to be taken to ensure that each sample goes into its assigned well.
Reagents are added (water, dye, primers) in the required amounts.
The sample wells are ‘spun’ to ensure that the DNA and reagents are mixed and at the bottom of the sample wells.
Sample tray and micropipettor. Each tray holds 96 samples.
Step 7 - The samples are run through a cycle sequencing process so that the fluorescent dyes are incorporated into the DNA. The DNA and reagents are alternately heated and cooled over a 2.5-hour period.