Intro bioinfo


Published in: Education, Technology

  1. 1. Introduction to Bioinformatics Shivani Chandra The Birla Institute of Scientific Research
  2. 2. What is Bioinformatics? <ul><li>Bioinformatics is the development and use of computer applications for the analysis, interpretation, simulation and prediction of biological systems and the corresponding experimental methods in the natural sciences. </li></ul>
  3. 3. <ul><li>What is bioinformatics? </li></ul><ul><li>The interface of biology and computers </li></ul><ul><li>Analysis of proteins, genes and genomes using computer algorithms and computer databases </li></ul><ul><li>Genomics is the analysis of genomes. </li></ul><ul><li>The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects. </li></ul>
  4. 4. History of Bioinformatics <ul><li>Biologists were searching for algorithms to analyze and interpret their large amounts of empirical biological data </li></ul><ul><li>Computer-aided modeling and simulation </li></ul><ul><li>International molecular biological databases arose to make data internationally accessible and comparable </li></ul>
  5. 5. History of Bioinformatics <ul><li>Algorithms for gene and protein prediction were developed </li></ul><ul><li>These efforts led to the development of artificial neural networks, genetic algorithms and evolution strategies </li></ul>
  6. 7. Bioinformatics <ul><li>Offers an ever more essential input to </li></ul><ul><li>Molecular Biology </li></ul><ul><li>Pharmacology (drug design) </li></ul><ul><li>Agriculture </li></ul><ul><li>Biotechnology </li></ul><ul><li>Clinical medicine </li></ul><ul><li>Forensic science </li></ul><ul><li>Chemical industries (detergent industries, etc.) </li></ul>
  7. 8. The Central Dogma
  8. 9. Central Dogma: DNA → RNA (transcription) → protein (translation). Example: DNA ATG CTA CTT CAC TGA; RNA AUG CUA CUU CAC UGA; protein M L L H (UGA is a stop codon)
  9. 10. Anatomy of a Gene: promoter, introns, exons
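The slide's example can be reproduced with a minimal Python sketch. This is an illustration only, not part of the original deck: the function names `transcribe` and `translate` are hypothetical, the codon table is deliberately partial (it holds just the five codons shown on the slide), and the DNA is assumed to be the coding strand, so transcription reduces to replacing T with U.

```python
# Partial codon table (assumption: only the codons from the slide's example).
# "*" marks a stop codon.
CODON_TABLE = {"AUG": "M", "CUA": "L", "CUU": "L", "CAC": "H", "UGA": "*"}


def transcribe(dna):
    """Coding-strand DNA -> mRNA: replace every T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read the mRNA codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGCTACTTCACTGA"
rna = transcribe(dna)
protein = translate(rna)
print(rna)      # AUGCUACUUCACUGA
print(protein)  # MLLH
```

A full implementation would need all 64 codons; a codon missing from this partial table raises a `KeyError`.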
  10. 11. DNA to RNA to Protein
  11. 12. Molecular Sequences <ul><li>Two primary types </li></ul><ul><ul><li>DNA (4 nucleotides: A, C, G, T) </li></ul></ul><ul><ul><li>Protein (20 amino acid residues) </li></ul></ul><ul><li>Strings of nucleotides can form genes, most of which code for the production of chains of amino acids called proteins. </li></ul>
  12. 13. Proteins <ul><li>Proteins have a variety of roles that they must fulfill: </li></ul><ul><li>they are the enzymes that rearrange chemical bonds. </li></ul><ul><li>they carry signals to and from the outside of the cell, and within the cell. </li></ul><ul><li>they transport small molecules. </li></ul><ul><li>they form many of the cellular structures. </li></ul><ul><li>they regulate cell processes, turning them on and off and controlling their rates. </li></ul>
  13. 14. Proteins – Amino Acids <ul><li>There are 20 different types of amino acids. </li></ul><ul><li>Different sequences of amino acids fold into different 3-D shapes. </li></ul><ul><li>Proteins can range from fewer than 20 to more than 5000 amino acids in length. </li></ul><ul><li>Each protein that an organism can produce is encoded in a piece of DNA called a “gene”. </li></ul>
  14. 15. Proteins – Amino Acids <ul><li>The single-celled bacterium E. coli has about 4300 different genes. </li></ul><ul><li>The properties of amino acids play a role in the construction of 3-D structures in proteins. </li></ul>
  15. 18. In Summary <ul><li>DNA sequence determines protein sequence </li></ul><ul><li>Protein sequence determines protein structure </li></ul><ul><li>Protein structure determines protein folding and function </li></ul>
  16. 19. Databases in Bioinformatics: There are three major public DNA databases: GenBank, EMBL and DDBJ. The underlying raw DNA sequences are identical.
  17. 20. There are three major public DNA databases: GenBank, housed at NCBI (National Center for Biotechnology Information); EMBL, housed at EBI (European Bioinformatics Institute); DDBJ, housed in Japan.
  18. 21. >100,000 species are represented in GenBank: all species 128,941; viruses 6,137; bacteria 31,262; archaea 2,100; eukaryota 87,147
  19. 24. The most sequenced organisms in GenBank: Homo sapiens (6.9 million entries); Mus musculus (5.0 million); Zea mays (896,000); Rattus norvegicus (819,000); Gallus gallus (567,000); Arabidopsis thaliana (519,000); Danio rerio (492,000); Drosophila melanogaster (350,000); Oryza sativa (221,000)
  20. 25. National Center for Biotechnology Information (NCBI) www.ncbi.nlm.nih.gov
  21. 26. www.ncbi.nlm.nih.gov
  22. 28. <ul><li>PubMed is… </li></ul><ul><li>National Library of Medicine's search service </li></ul><ul><li>11 million citations in MEDLINE </li></ul><ul><li>links to participating online journals </li></ul><ul><li>PubMed tutorial (via “Education” on the sidebar) </li></ul>
  23. 29. <ul><li>Entrez integrates… </li></ul><ul><li>the scientific literature; </li></ul><ul><li>DNA and protein sequence databases; </li></ul><ul><li>3D protein structure data; </li></ul><ul><li>population study data sets; </li></ul><ul><li>assemblies of complete genomes </li></ul>
  24. 30. Entrez is a search and retrieval system that integrates NCBI databases
  25. 31. <ul><li>BLAST is… </li></ul><ul><li>Basic Local Alignment Search Tool </li></ul><ul><li>NCBI's sequence similarity search tool </li></ul><ul><li>supports analysis of DNA and protein databases </li></ul><ul><li>80,000 searches per day </li></ul>
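To make "local alignment" concrete, here is a minimal Python sketch of Smith-Waterman scoring, the classic dynamic-programming method for finding the best-matching local region between two sequences. This is not BLAST itself: BLAST is a faster heuristic for database-scale searches. The function name `smith_waterman_score` and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, since only the score is needed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))  # 8: four matches at 2 each
print(smith_waterman_score("AAAA", "TTTT"))  # 0: nothing in common
```

Recovering the aligned region itself would require keeping the full matrix and tracing back from the best-scoring cell.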
  26. 32. <ul><li>OMIM is… </li></ul><ul><li>Online Mendelian Inheritance in Man </li></ul><ul><li>catalog of human genes and genetic disorders </li></ul><ul><li>edited by Dr. Victor McKusick and others at JHU </li></ul>
  27. 33. <ul><li>Books is… </li></ul><ul><li>searchable resource of on-line books </li></ul>
  28. 34. <ul><li>TaxBrowser is… </li></ul><ul><li>browser for the major divisions of living organisms </li></ul><ul><li>(archaea, bacteria, eukaryota, viruses) </li></ul><ul><li>taxonomy information such as genetic codes </li></ul><ul><li>molecular data on extinct organisms </li></ul>
  29. 35. Question #1: How can I use PubMed at NCBI to find literature information?
  30. 36. PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,000 journals published in the United States and in 70 foreign countries. It has 12 million records dating back to 1966.
  31. 37. MeSH is the acronym for "Medical Subject Headings." MeSH is the list of vocabulary terms used for subject analysis of biomedical literature at NLM. MeSH vocabulary is used for indexing journal articles for MEDLINE. The MeSH controlled vocabulary imposes uniformity and consistency on the indexing of biomedical literature.
  32. 40. PubMed search strategies: Try the tutorial (“Education” on the left sidebar). Use boolean queries, e.g. lipocalin AND disease. Try using “limits”. Try “LinkOut” to find external resources. Obtain articles on-line via Welch Medical Library (and download PDF files): http://www.welch.jhu.edu/
  33. 41. Sequence Databases <ul><li>GenBank -- DNA sequences and derived protein sequences </li></ul><ul><li>EMBL -- DNA sequences and derived protein sequences </li></ul><ul><li>DDBJ -- DNA sequences and derived protein sequences </li></ul><ul><li>SWISS-PROT -- Protein sequences </li></ul><ul><li>PDB -- three-dimensional structures of proteins </li></ul>
  34. 42. GenBank, EMBL & DDBJ <ul><li>GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. </li></ul><ul><li>A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. </li></ul><ul><li>These three organizations exchange data on a daily basis. </li></ul>
  35. 43. GenBank, EMBL & DDBJ <ul><li>GenBank Release 122.0, Feb. 15, 2001 </li></ul><ul><li>10,897,000 sequence records </li></ul><ul><li>11,720,000,000 bases </li></ul><ul><li>EMBL Release 66, Mar. 2, 2000 </li></ul><ul><li>11,169,673 sequence records </li></ul><ul><li>11,916,112,872 bases </li></ul><ul><li>DDBJ: the center operating DDBJ, at the National Institute of Genetics (NIG), Japan, was established in April 1995. </li></ul>
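A boolean query such as the one above can also be expressed as a search URL. The sketch below is an illustration, not part of the original slides: it assumes NCBI's E-utilities `esearch` endpoint with its `db`, `term` and `retmax` parameters, and the helper name `pubmed_search_url` is hypothetical. It only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities esearch endpoint (not mentioned in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(*terms, operator="AND", retmax=20):
    """Join search terms with a boolean operator and build an esearch URL."""
    query = f" {operator} ".join(terms)
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = pubmed_search_url("lipocalin", "disease")
print(url)
```

Swapping `operator="OR"` broadens the search; the same terms can be typed directly into the PubMed search box.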
  36. 44. Next Topic: Protein Databases
