bioinformatics simple

1,128 views

Published on

Published in: Education, Technology
  • Be the first to comment

bioinformatics simple

  1. 1. 1
  2. 2.   Science of collecting, analyzing and conceptualizing biological data by implication of informatics techniques. 2 Bioinformatics Biology Informa- tics Bioinformatics
  3. 3. Biological Data Computer Analysis+ Mouse Genome: 2.5 billion base pairs Human Genome: 3 billion base pairs 3
  4. 4.   Manage biological information  organize biological information using databases  Process, analyze, and visualize biological data  Share biological information to the public using the Internet. 4 Goals of Bioinformatics
  5. 5.   Bio – informatics  Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale.  Bioinformatics is a practical discipline with many applications. 5 Definition
  6. 6.  Computational biology 6 Bioinformatics Systems biology Genomics Bioinformatics
  7. 7.  7 Biological Information  Central Dogma of Molecular Biology DNA -> RNA -> Protein -> Phenotype -> DNA  Molecules  Sequence, Structure, Function, Interaction  Processes  Mechanism, Specificity, Regulation  Central Paradigm for Bioinformatics Genomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Protein Interaction -> Phenotype  Large Amounts of Information  Statistical  Computer Processing
  8. 8.  Systems Analysis Information Theory Graph Theory Robotics Algorithms Artificial IntelligenceStatistics 8
  9. 9.  9 Domains of bioinformatics Bio-informatist Development of new software Algorithms Bio-informaticians. Using different algorithms and computer software
  10. 10.   Could not have been achieved without bioinformatics  Goals  3 billion DNA subunits  Discover all the human genes  Make them accessible for further biological study  then ?  Need to bring together and store vast amounts of information from  Lab equipment and experiments  Computer Analysis  Human Analysis  Make visible to the world’s scientists 10 Human genome project
  11. 11.  11 How to analyze information  Data  –Management.  –Analysis.  –Derive Hypothesis.  –Design and Implement an in silico experiment.  –Confirm in the wet lab.
  12. 12.   Find an answer quickly  Most in silico biology is faster than in vitro  2. Massive amounts of data to analyze  Need to make use of all information  Not possible to do analysis by hand  Can’t organize and store information only using lab note books•  Automation is key  However!  Verification ? 12 Why bioinformatics
  13. 13.  1. Computational biology-  Computing methods for classical biology  Primarily concerned ----> Evolutionary, population and theoretical biology,  Cellular/Molecular biology ? 2. Medical informatics-  Computing methods to improve communication, understanding, and management of medical data  Data Manipulation Applications
  14. 14.  3. Chemo -informatics  Chemical and biological technology, for drug design and development 4. Genomics  Analysis and comparison of the entire genome of a single species or of multiple species  Genomics existed before any genomes were completely sequenced, but in a very primitive state Continued…
  15. 15.  5. Proteomics  Study of how the genome is expressed in proteins, and of how these proteins function and interact  Concerned with the actual states of specific cells, rather than the potential states described by the genome 6. Pharmacogenomics  The application of genomic methods to identify drug targets  For example, searching entire genomes for potential drug receptors, or by studying gene expression patterns in tumors Continued….
  16. 16.  7. Pharmacogenetics :  The use of genomic methods to determine what causes variations in individual response to drug treatments  The goal is to identify drugs that may be only be effective for subsets of patients, or to tailor drugs for specific individuals or groups
  17. 17.  17 Main Goal: ? Annotation Comparative genomics Structural genomics Functional genomics The “post-genomics” era
  18. 18. 18 Annotation Identify the genes within a given sequence of DNA Identify the sites Which regulate the gene Predict the function
  19. 19.   A gene is characterized by several features (promoter, ORF…)  some are easier and some harder to detect… 19 How do we identify a gene in a genome?
  20. 20. 20 Comparative genomics
  21. 21.  21 Comparison between the full drafts of the human and chimp genomes revealed that they differ only by 1.23% How humans are chimps? Perhaps not surprising!!!
  22. 22.  So where are we different ?? 22 Human ATAGCGGGGGGATGCGGGCCCTATACCC Chimp ATAGGGG - - GGATGCGGGCCCTATACCC Mouse ATAGCG - - - GGATGCGGCGC -TATACCA
  23. 23.  23 Structural Genomics
  24. 24. 24 The protein three dimensional structure can tell much more than the sequence alone Protein-ligand complexes Functional sites fold Evolutionary relationship Shape and electrostatics Active sites protein complexes Biologic processes
  25. 25.  The different types of data are collected in database  Sequence databases  Structural databases  Databases of Experimental Results All databases are connected 25 Resources and Databases
  26. 26.  Gene database Genome database Disease related mutation database 26 Sequence databases
  27. 27.   3-dimensional structures of proteins, nucleic acids, molecular complexes etc  3-d data is available due to techniques such as NMR and X-Ray crystallography 27 Structure Databases
  28. 28.   Data such as experimental microarray images- gene expression data  Proteomic data- protein expression data  Metabolic pathways, protein-protein interaction data, regulatory networks 28 Databases of Experimental Results
  29. 29.  29 PubMed Service of the National Library of Medicine http://www.ncbi.nlm.nih.gov/pubmed/ Literature Databases
  30. 30.   Each Database contains specific information  Like other biological systems also these databases are interrelated 30 Putting it all Together
  31. 31. 31 GENOMIC DATA GenBank DDBJ EMBL ASSEMBLED GENOMES GoldenPath WormBase TIGR PROTEIN PIR SWISS-PROT STRUCTURE PDB MMDB SCOP LITERATURE PubMed PATHWAY KEGG COG DISEASE LocusLink OMIM OMIA GENES RefSeq AllGenes GDBSNPs dbSNP ESTs dbEST unigene MOTIFS BLOCKS Pfam Prosite GENE EXPRESSION Stanford MGDB NetAffx ArrayExpress
  32. 32.  Applications I-- Genomics  Finding Genes in Genomic DNA  introns  exons  Promotors  Characterizing Repeats in Genomic DNA  Statistics  Patterns  Expression Analysis  Time Course Clustering  Identifying regulatory Regions  Measuring Differences • Genome Comparisons  Ortholog Families  Genome annotation  Evolutionary Phylogenetic trees • Characterizing Intergenic Regions  Finding Pseudo genes  Patterns • Duplications in the Genome  Large scale genomic alignment
  33. 33.  Application II- Protein Sequence  Sequence Alignment  non-exact string matching, gaps  How to align two strings optimally via Dynamic Programming  Local vs Global Alignment  Suboptimal Alignment  Hashing to increase speed (BLAST, FASTA)  Amino acid substitution scoring matrices  Multiple Alignment and Consensus Patterns  How to align more than one sequence and then fuse the result in a consensus representation  Transitive Comparisons  HMMs, Profiles  Motifs  Scoring schemes and Matching statistics  How to tell if a given alignment or match is statistically significant  A P-value (or an e-value)?  Score Distributions (extreme val. dist.)  Low Complexity Sequences  Evolutionary Issues  Rates of mutation and change
  34. 34.  Application III-- Protein Structure  Secondary Structure “Prediction”  via Propensities  Neural Networks, Genetic Algorithm.  Simple Statistics  Trans Membrane Regions  Assessing Secondary Structure Prediction  Tertiary Structure Prediction  Fold Recognition  Threading  Ab initio  Function Prediction  Active site identification  Relation of Sequence Similarity to Structural Similarity
  35. 35.  Example Application IV: Finding Homologs Core
  36. 36.   Overall Occurrence of a Certain Feature in the Genome  e.g. how many kinases in Yeast  Compare Organisms and Tissues  Expression levels in Cancerous vs Normal Tissues  Databases, Statistics Example Application IV: Overall Genome Characterization
  37. 37.  37 Thanks

×