Intro bioinfo


hi from vini

Published in: Education, Technology
  • DNA sequences of genes are rarely of any functional value on their own; it is the proteins they encode that matter to the organism. The process of reading the code in DNA and converting it into a functional protein is highly conserved across almost all branches of life. First, an enzyme called RNA polymerase builds an RNA copy of a gene's DNA sequence on a chromosome; this process is called transcription. The RNA molecule is then read by ribosomes, which link amino acids together, in the order the RNA specifies, into an amino acid chain; this latter process is called translation. To summarize: DNA sequences are transcribed into RNA sequences, which are then translated into proteins.
  • A gene sequence is not simply a series of codons; it has several key components. Promoter sequences help RNA polymerase attach to the DNA template. Once the DNA sequence has been transcribed, the resulting RNA still needs processing. One of the most unexpected findings in the history of molecular genetics was the discovery that many (especially eukaryotic) genes are split into pieces: exons, composed of codons, are often interrupted by intron sequences that do not encode amino acids. Before translation can occur, the introns must be spliced out of the RNA and the exons joined back together for translation into protein.
  • Here we see a representation of the steps involved in creating a protein from a DNA sequence.
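The transcription step described in the notes above can be sketched as a simple string operation. This is a minimal Python sketch, assuming both strands are written 5'→3' in uppercase A/C/G/T with no ambiguity codes; the function names are hypothetical, and promoters, termination and RNA processing are not modelled. The example sequence is the one from the Central Dogma slide.

```python
# Minimal sketch of transcription as string manipulation (assumption:
# sequences are uppercase A/C/G/T, written 5'->3'; no promoters or
# termination are modelled).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe_coding_strand(coding: str) -> str:
    """The mRNA has the same 5'->3' sequence as the coding strand, with U for T."""
    return coding.upper().replace("T", "U")


def transcribe_template_strand(template: str) -> str:
    """RNA polymerase reads the template strand, so the mRNA is the template's
    reverse complement (both written 5'->3'), with U in place of T."""
    rev_comp = "".join(COMPLEMENT[base] for base in reversed(template.upper()))
    return rev_comp.replace("T", "U")


if __name__ == "__main__":
    coding = "ATGCTACTTCACTGA"
    template = "TCAGTGAAGTAGCAT"  # reverse complement of `coding`, 5'->3'
    print(transcribe_coding_strand(coding))      # AUGCUACUUCACUGA
    print(transcribe_template_strand(template))  # AUGCUACUUCACUGA
```

Both calls produce the same mRNA, which is the point: the coding strand and the template strand describe one gene from opposite sides.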

    1. Introduction to Bioinformatics
       Shivani Chandra, The Birla Institute of Scientific Research
    2. What is Bioinformatics?
       Bioinformatics is the development and use of computer applications for the analysis, interpretation, simulation and prediction of biological systems, together with the corresponding experimental methods in the natural sciences.
    3. What is bioinformatics?
       • The interface of biology and computers
       • The analysis of proteins, genes and genomes using computer algorithms and computer databases
       • Genomics is the analysis of genomes. The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.
    4. History of Bioinformatics
       • Biologists were searching for algorithms to analyze and interpret their huge amounts of empirical biological data.
       • Computer-aided modeling and simulation
       • International molecular biological databases arose to make data internationally accessible and comparable.
    5. History of Bioinformatics
       • Algorithms for gene and protein prediction were developed.
       • These efforts led to the use of artificial neural networks, genetic algorithms and evolution strategies.
    6. Bioinformatics
       Bioinformatics offers an ever more essential input to:
       • Molecular biology
       • Pharmacology (drug design)
       • Agriculture
       • Biotechnology
       • Clinical medicine
       • Forensic science
       • Chemical industries (detergent industries, etc.)
    7. The Central Dogma
    8. Central Dogma
       DNA → (transcription) → RNA → (translation) → Protein
       DNA:     ATG CTA CTT CAC TGA
       RNA:     AUG CUA CUU CAC UGA
       Protein: M   L   L   H   (stop)
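The translation in the Central Dogma example can be reproduced in a few lines. This is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and a coding sequence that is in frame from its first base; `translate` and `CODON_TABLE` are hypothetical names, and codons containing letters other than A, C, G and T/U raise a `KeyError`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in T, C, A, G order (first position varies slowest);
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate a DNA or RNA coding sequence codon by codon from its first
    base, stopping at the first stop codon (the stop is not included)."""
    seq = seq.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    for i in range(0, len(seq) - 2, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

`translate("ATGCTACTTCACTGA")` and `translate("AUGCUACUUCACUGA")` both return `"MLLH"`, matching the slide: ATG/AUG gives M, CTA and CTT give L, CAC gives H, and TGA/UGA is a stop codon.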
    9. Anatomy of a Gene (diagram labels: promoter, introns, exons)
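Splicing removes the introns and joins the exons into a continuous coding sequence. This is a minimal Python sketch, assuming the exon positions are already known and given as 0-based, half-open intervals in 5'→3' order; `splice` is a hypothetical name, the pre-mRNA below is made up for illustration, and real splice-site recognition is not modelled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals [start, end) in order, dropping the introns.

    Raises ValueError if an interval is empty, out of order, overlaps the
    previous exon, or runs past the end of the sequence.
    """
    previous_end = 0
    for start, end in exons:
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"bad exon interval ({start}, {end})")
        previous_end = end
    return "".join(pre_mrna[start:end] for start, end in exons)


if __name__ == "__main__":
    # Made-up pre-mRNA: exon 1 (0-6) + intron (6-16) + exon 2 (16-25).
    pre_mrna = "AUGCUA" + "GUAAGUUUAG" + "CUUCACUGA"
    print(splice(pre_mrna, [(0, 6), (16, 25)]))  # AUGCUACUUCACUGA
```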
    10. DNA to RNA to Protein
    11. Molecular Sequences
        • Two primary types:
          ◦ DNA (4 nucleotides: A, C, G, T)
          ◦ Protein (20 amino acid residues)
        • Strings of nucleotides can form genes, most of which code for the production of chains of amino acids called proteins.
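Because the two sequence types use different alphabets, a program can often tell them apart from the letters alone. This is a minimal Python sketch, assuming uppercase one-letter codes with no ambiguity symbols (such as N or X); `guess_sequence_type` is a hypothetical name. Note that A, C, G and T are also the one-letter codes for alanine, cysteine, glycine and threonine, so a string made only of those letters is ambiguous; this sketch simply reports it as DNA.

```python
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def guess_sequence_type(seq: str) -> str:
    """Return "DNA", "protein" or "unknown" based only on the letters used."""
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    if letters <= DNA_ALPHABET:
        return "DNA"  # could in principle also be a protein made of A/C/G/T
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"
```

For example, `guess_sequence_type("ATGCTACTTCACTGA")` returns `"DNA"`, while `guess_sequence_type("MLLH")` returns `"protein"`.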
    12. Proteins
        Proteins fulfill a variety of roles:
        • they act as enzymes that rearrange chemical bonds;
        • they carry signals to and from the outside of the cell, and within the cell;
        • they transport small molecules;
        • they form many of the cellular structures;
        • they regulate cell processes, turning them on and off and controlling their rates.
    13. Proteins – Amino Acids
        • There are 20 different types of amino acids.
        • Different sequences of amino acids fold into different 3-D shapes.
        • Proteins range from fewer than 20 to more than 5,000 amino acids in length.
        • Each protein that an organism can produce is encoded in a piece of DNA called a "gene".
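Because each amino acid is specified by one three-nucleotide codon (as in the Central Dogma example), a protein's length fixes the minimum length of its coding sequence. A worked example for a protein of n amino acids, counting one stop codon and ignoring introns and untranslated regions:

```latex
\[
  \ell_{\text{coding}} = 3n + 3,
  \qquad
  n = 20 \;\Rightarrow\; \ell_{\text{coding}} = 63 \text{ nt},
  \qquad
  n = 5000 \;\Rightarrow\; \ell_{\text{coding}} = 15{,}003 \text{ nt}
\]
```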
    14. Proteins – Amino Acids
        • The single-celled bacterium E. coli has about 4,300 different genes.
        • The properties of amino acids play a role in the construction of 3-D structures in proteins.
    15. In Summary
        • DNA sequence determines protein sequence.
        • Protein sequence determines protein structure (how the chain folds).
        • Protein structure determines protein function.
    16. Databases in Bioinformatics
        There are three major public DNA databases: GenBank, EMBL and DDBJ. The underlying raw DNA sequences are identical across all three.
    17. GenBank, EMBL and DDBJ
        There are three major public DNA databases:
        • GenBank, housed at NCBI (National Center for Biotechnology Information)
        • EMBL, housed at EBI (European Bioinformatics Institute)
        • DDBJ, housed in Japan
    18. More than 100,000 species are represented in GenBank
        • All species: 128,941
        • Viruses: 6,137
        • Bacteria: 31,262
        • Archaea: 2,100
        • Eukaryota: 87,147
    19. The most sequenced organisms in GenBank
        • Homo sapiens (6.9 million entries)
        • Mus musculus (5.0 million)
        • Zea mays (896,000)
        • Rattus norvegicus (819,000)
        • Gallus gallus (567,000)
        • Arabidopsis thaliana (519,000)
        • Danio rerio (492,000)
        • Drosophila melanogaster (350,000)
        • Oryza sativa (221,000)
    20. National Center for Biotechnology Information (NCBI)
    22. PubMed is…
        • the National Library of Medicine's search service
        • 11 million citations in MEDLINE
        • links to participating online journals
        • PubMed tutorial (via “Education” on the sidebar)
    23. Entrez integrates…
        • the scientific literature;
        • DNA and protein sequence databases;
        • 3D protein structure data;
        • population study data sets;
        • assemblies of complete genomes.
    24. Entrez is a search and retrieval system that integrates NCBI databases.
    25. BLAST is…
        • the Basic Local Alignment Search Tool
        • NCBI's sequence similarity search tool
        • a tool that supports analysis of DNA and protein databases
        • used for 80,000 searches per day
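BLAST looks for local alignments: the best-matching stretches of two sequences rather than an end-to-end match. BLAST itself uses a fast heuristic (short word matches extended into longer alignments) instead of an exhaustive search. The sketch below swaps in the exact Smith–Waterman dynamic-programming algorithm, which computes the optimal local alignment score that BLAST approximates. The scores (+1 match, -1 mismatch, -2 per gap position) are arbitrary assumptions, not BLAST defaults; real searches use substitution matrices such as BLOSUM62 for proteins and affine gap penalties. `smith_waterman_score` is a hypothetical name.

```python
def smith_waterman_score(a: str, b: str, match: int = 1,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    # prev[j] is the best score of a local alignment ending at a[i-2] and
    # b[j-1] (the previous row); only two rows are kept in memory.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a local alignment restart anywhere instead of going negative.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman_score("TTACGTTT", "GGACGTGG")` returns 4: the shared stretch ACGT scores +1 per base, and the mismatched flanks are simply left out of the local alignment.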
    26. OMIM is…
        • Online Mendelian Inheritance in Man
        • a catalog of human genes and genetic disorders
        • edited by Dr. Victor McKusick and others at JHU (Johns Hopkins University)
    27. Books is…
        • a searchable resource of online books
    28. TaxBrowser is…
        • a browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
        • taxonomy information such as genetic codes
        • molecular data on extinct organisms
    29. Question #1: How can I use PubMed at NCBI to find literature information?
    30. PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,000 journals published in the United States and in 70 foreign countries. It has 12 million records dating back to 1966.
    31. MeSH is the acronym for "Medical Subject Headings." MeSH is the controlled vocabulary of terms that NLM uses for the subject analysis of biomedical literature, including the indexing of journal articles for MEDLINE. The MeSH controlled vocabulary imposes uniformity and consistency on the indexing of biomedical literature.
    32. PubMed search strategies
        • Try the tutorial (“Education” on the left sidebar).
        • Use Boolean queries, e.g. lipocalin AND disease.
        • Try using “Limits”.
        • Try “LinkOut” to find external resources.
        • Obtain articles online via the Welch Medical Library (and download PDF files):
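A Boolean query like the one above can also be sent programmatically through NCBI's E-utilities, the web interface behind Entrez. This minimal Python sketch only builds the ESearch request URL and makes no network call; it assumes the public endpoint `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi` with its `db`, `term` and `retmax` parameters, and `esearch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an Entrez ESearch URL. Fetching it returns the IDs of matching
    records (XML by default); this function only constructs the string."""
    # urlencode escapes the query; spaces become '+'.
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


if __name__ == "__main__":
    print(esearch_url("pubmed", "lipocalin AND disease"))
    # https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=lipocalin+AND+disease&retmax=20
```

Passing `db="nucleotide"` instead would search NCBI's nucleotide sequence records, which include GenBank.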
    33. Sequence Databases
        • GenBank: DNA sequences and derived protein sequences
        • EMBL: DNA sequences and derived protein sequences
        • DDBJ: DNA sequences and derived protein sequences
        • SWISS-PROT: protein sequences
        • PDB: three-dimensional structures of proteins
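Sequences from these databases are commonly downloaded in FASTA format, in which a header line starting with `>` is followed by one or more sequence lines. This is a minimal parser sketch, assuming well-formed plain-text FASTA; `parse_fasta` is a hypothetical name, the sample records are made up, lines before the first header are ignored, and a repeated header overwrites the earlier record.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's header (without the leading '>') to its sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    sample = ">seq1 example\nATGCTA\nCTTCAC\n>seq2\nMLLH\n"
    print(parse_fasta(sample))
    # {'seq1 example': 'ATGCTACTTCAC', 'seq2': 'MLLH'}
```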
    34. GenBank, EMBL & DDBJ
        • GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
        • A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.
        • These three organizations exchange data on a daily basis.
    35. GenBank, EMBL & DDBJ
        • GenBank Release 122.0, Feb. 15, 2001: 10,897,000 sequence records; 11,720,000,000 bases
        • EMBL Release 66, Mar. 2, 2000: 11,169,673 sequence records; 11,916,112,872 bases
        • DDBJ: the center operating DDBJ at the National Institute of Genetics (NIG), Japan, was established in April 1995.
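As a rough sense of scale, dividing each release's base count by its record count gives the average record length (this uses only the figures on this slide):

```latex
\[
  \frac{11{,}720{,}000{,}000 \text{ bases}}{10{,}897{,}000 \text{ records}}
  \approx 1{,}076 \text{ bases per record (GenBank 122.0)}
\]
\[
  \frac{11{,}916{,}112{,}872 \text{ bases}}{11{,}169{,}673 \text{ records}}
  \approx 1{,}067 \text{ bases per record (EMBL 66)}
\]
```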
    36. Next Topic: Protein Databases