Bioinformatics: biology by other means

               Alberto Labarga
             UGR, November 2008
Computational Biology
          Bioinformatics
     [Biological Information]
Toward a scientific theory of heredity




1859         1866         1870             1900   1902
In 1859 Charles Darwin publishes
              'The Origin of Species',
              which proposes that living beings
        ...
Mendel's laws,
                     published in 1866,
                     rediscovered in 1900




1859   1866   18...
In 1870, a German scientist named
Friedrich Miescher isolates the
components stored in the
nucleus, composed mainly...
At the turn of the century, Phoebus Levene
discovered that DNA is a chain of
nucleotides, in which each nucleotide is
com...
Walter Sutton, a graduate student in E. B. Wilson’s
lab at Columbia University, observed that in the
process of cell divis...
The discovery of DNA




1928        1944       1949   1952   1953
1928 Frederick Griffith: the transforming principle


                    if he mixed R pneumococci
                ...
In 1944 Oswald Avery and his colleagues, who
were studying the bacterium that causes
pneumonia, Pneumococcus, discov...
Life can be seen as a process
              of storage and transmission of
              biological information.
...
1949 DNA is duplicated during cell division
     Chargaff: A = T and G = C




1928        1944       1949            ...
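Chargaff's observation (A = T, G = C) follows from base pairing once both strands of the double helix are counted. A minimal sketch of that bookkeeping, assuming the short example sequence shown later in the deck; the helper names `reverse_complement` and `base_counts` are illustrative and not part of the talk:

```python
from collections import Counter

# Watson-Crick pairing: A with T, G with C
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def base_counts(strand: str) -> Counter:
    """Count bases over both strands of the duplex."""
    return Counter(strand) + Counter(reverse_complement(strand))

# Example sequence taken from the sequencing slide of this deck
counts = base_counts("GTGAGGCGCTGC")
```

A single strand need not satisfy the rule; the equality only appears when the complementary strand is included, which is why `base_counts` adds the two counters.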
1952 - Hershey-Chase Experiment




 1928       1944      1949        1952   1953
M.H.F. Wilkins, A.R. Stokes, H.R. Wilson:
               Molecular Structure of Deoxypentose Nucleic
               Acids....
MOLECULAR STRUCTURE
OF NUCLEIC ACIDS
“We wish to propose a
structure for the salt of
desoxyribose nucleic acid
(DNA). This...
“It has not escaped our
              attention that the specific
              pairing we have
              postulated i...
The base pairs
In 1955 Ochoa published, in the Journal of the American
       Chemical Society with the French-Russian biochemist
       Marianne Gru...
1955   1959   1962   1966
When Perutz arrived in Cambridge, the
largest molecular structure that had
been solved was that of the natural pigment
phycoc...
1955   1959   1963   1966
Over the course of several years,
Marshall Nirenberg, Har Khorana and
Severo Ochoa and their colleagues
elucidated the gen...
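The genetic code reads messenger RNA three nucleotides at a time, so a four-letter alphabet gives 4 x 4 x 4 = 64 possible codons. A small sketch of that count and of reading a message in triplets; the function name `split_into_codons` and the example message are assumptions made for illustration:

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # mRNA alphabet

# Every ordered triple of nucleotides is one codon
codons = ["".join(triple) for triple in product(NUCLEOTIDES, repeat=3)]

def split_into_codons(mrna: str) -> list[str]:
    """Read an mRNA three letters at a time, dropping an incomplete tail."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

message = split_into_codons("AUGGCCUAA")
```

With 64 codons available for 20 amino acids plus stop signals, several codons can map to the same amino acid.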
From DNA to protein
Understanding the mechanisms, creating the tools




1970    1971           1975         1977             1980
The Central Dogma




1970   1971       1975   1977   1980
Created in 1971
                     with seven
                     structures




1970   1971   1975       1977         ...
Recombinant DNA is
               a DNA molecule formed by joining
               two m...
1970   1971   1975   1977   1980
A precursor-RNA may often be matured to
                      mRNAs with alternative structures. An example
              ...
Understanding the mechanisms, creating the tools




1981   1982   1983       1985       1987             1990
Read out the letters from a DNA sequence




                                GTGAGGCGCTGC




1981   1982   1983   1985   ...
1983 The polymerase chain reaction,
                          known as PCR
        ...
Total nucleotides                Number of entries
  (Nov 07: 188,490,792,445)          (Nov 07: 106,144,026)




1981   1...
1981   1982   1983   1985   1987   1990
The Human Genome Project (HGP)
consists of
determining the relative positions of all ...
“Imagine several copies of a book, each cut into
10 million small pieces, in such a way
that the pieces overlap. S...
HUGO: Idealized representation of the hierarchical shotgun sequencing strategy. A library is constructed by
fragmenting th...
Deciphering the book of life




1990              1995     1996      1997 1998 1999   2001
S.F. Altschul, et al. (1990), "Basic Local
                Alignment Search Tool," J. Molec.
                Biol., 215(3)...
• SSAHA (Ning et al., 2001)
   •   http://www.sanger.ac.uk/Software/analysis/SSAHA/
   •   SSAHA is an algorithm for very ...
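SSAHA stands for Sequence Search and Alignment by Hashing Algorithm: the subject sequence is turned into a hash table of short words, which can then be looked up very quickly. The toy sketch below illustrates only that lookup idea and is not the SSAHA implementation; the word length `k`, the function names and the sequences are assumptions:

```python
from collections import defaultdict

def build_kmer_index(subject: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in the subject to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(subject) - k + 1):
        index[subject[pos:pos + k]].append(pos)
    return dict(index)

def find_seed_hits(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, spos))
    return hits

index = build_kmer_index("GTGAGGCGCTGC", k=4)
hits = find_seed_hits("AGGCGC", index, k=4)
```

Hits that share the same offset (subject position minus query position) lie on one diagonal and point to a candidate alignment; here all three hits have offset 3.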
J. Thompson, T. Gibson, D.
                Higgins (1994), CLUSTAL W:
                improving the sensitivity of
       ...
Flowchart of computation steps in
Clustal W (Thompson et al., 1994)

    Pairwise alignment: calculation of distance matri...
Other methods


Notredame, C., Higgins, D.G., Heringa, J. (2000) T-Coffee: a novel method for
   fast and accurate multipl...
Tree of Life




http://tolweb.org/tree/phylogeny.html       http://itol.embl.de/
1995
              • The first complete genome
              of an organism,
               Haemophilus influenzae.




199...
1996
• The yeast genome is completed:
approximately 6,000 genes and
14,000,000 base pairs




 1990        ...
1990   1995   1996   1997 1998 1999   2001
1997

• The genome of the bacterium
E. coli is sequenced: 4,600 genes,
4.5 million nucleotides.




1990           1995   199...
1998

The genome of the worm
Caenorhabditis elegans
has 18,000 genes and about
100 million nucleotides




1990         19...
1999
   • The complete sequence of
   chromosome 22 is obtained.
   The HGP is ahead of
   schedule.
   Surprisingly, the...
Fire A, Xu S, Montgomery M, Kostas
S, Driver S, Mello C (1998). "Potent
and specific genetic interference by
double-strand...
Hamilton A, Baulcombe D
(1999). "A species of small
antisense RNA in
posttranscriptional gene
silencing in plants". Scienc...
Dr Alan Wolffe (1999)
• Epigenetics is heritable
  changes in gene expression
  that occur without a change
  in DNA seque...
1990   1995   1996   1997 1998 1999   2001
Gene prediction




            Where are the genes?




                  In humans:

                  ~22,000 genes
   ...
the gencode pipeline




1.   mapping of known transcripts sequences (ESTs, cDNAs, proteins) into the
     human genome
2....
Genome annotation - building a pipeline

                          Genome sequence



       Map repeats             Map E...
Genefinding - ab initio predictions

    Use compositional features of the DNA sequence to define coding
   segments (ess...
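One of the compositional features an ab initio gene finder looks at is the open reading frame (ORF): a start codon followed in frame by a stop codon. The sketch below scans only the forward strand and ignores coding bias, splice sites and scoring, so it is far simpler than the tools named on this slide; `find_orfs`, `min_codons` and the test sequence are hypothetical:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) slices of forward-strand ORFs: ATG ... in-frame stop.

    min_codons counts the start and stop codons as part of the ORF length.
    """
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # close it and look for the next ATG
    return orfs

sequence = "CCATGGCCTGATT"
orfs = find_orfs(sequence)
```

Real gene finders score such candidates together with the other features and pick the best-scoring combination, typically with dynamic programming.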
ab initio prediction

Genome


Coding
potential

ATG & Stop
codons

Splice sites
ATG & Stop
codons
Coding
potential




  ...
ab initio prediction

Genome


Coding
potential

ATG & Stop
codons

Splice sites
ATG & Stop
codons
Coding
potential




  ...
ab initio prediction

Genome


Coding
potential

ATG & Stop
codons

Splice sites
ATG & Stop
codons
Coding
potential
      ...
Genefinding - similarity

  Use known coding sequence to define coding regions
        EST sequences
        Peptide se...
Similarity-based prediction


Genome


                                                Align
           cDNA/peptide



  ...
Example of a simple HMM




    Top: model architecture and parameters. Bottom: sequence generation process.
    green: st...
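The quantity Prob(sequence, path | model) on this slide is the product of one start probability and then, for each position, a transition probability and an emission probability. A minimal sketch under an invented two-state model; the state names and all parameter values below are illustrative assumptions, not the values on the slide:

```python
import math

# Hypothetical two-state model (values are illustrative, not the slide's)
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
EMIT = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

def joint_log_prob(sequence: str, path: list[str]) -> float:
    """log P(sequence, path | model) for one given state path."""
    logp = math.log(START[path[0]]) + math.log(EMIT[path[0]][sequence[0]])
    for prev, state, symbol in zip(path, path[1:], sequence[1:]):
        logp += math.log(TRANS[prev][state]) + math.log(EMIT[state][symbol])
    return logp

p = math.exp(joint_log_prob("ATGC", ["AT_rich", "AT_rich", "GC_rich", "GC_rich"]))
```

Working in log space avoids numerical underflow on long sequences, where the raw product quickly becomes as small as the 6.8e-8 quoted on the slide.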
Automatic Annotation vs Manual



Automatic Annotation                   Manual Annotation
• Quick whole genome analysis ~...
Analysis EGASP predictions vs manual annotation
                  ...
And this is only the beginning




2002           2004         2005   2007   2010
2002   2004   2005   2007   2010
10/3/02   8/28/03    5/07   10/08

Published complete genomes:      104      156       500      874

Ongoing prokaryotic g...
32,000,000
                                                                                                     454-GS20

...
Although human beings share
              99.9 percent of our genetic information,
              we have small...
VARIATION IN THE HUMAN DNA
      SEQUENCE




Mutation rate = 10^-8 /site/generation
No. of generations, common ancestor...
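The two figures on this slide can be combined into an order-of-magnitude estimate of how much two human sequences differ. The sketch assumes the 10^4 to 10^5 generations given in the slide transcript, neutral mutations, no site hit twice, and a factor of two because both lineages accumulate changes since the common ancestor; the function name and that factor are assumptions, not part of the slide:

```python
MUTATION_RATE = 1e-8  # per site per generation, as stated on the slide

def expected_divergence(generations: float, lineages: int = 2) -> float:
    """Expected fraction of sites that differ between two sequences.

    Each lineage accumulates MUTATION_RATE * generations changes per site.
    """
    return MUTATION_RATE * generations * lineages

low = expected_divergence(1e4)   # about 0.02 % of sites
high = expected_divergence(1e5)  # about 0.2 % of sites
```

The resulting range brackets the roughly 0.1 % difference implied by the 99.9 % identity quoted on the SNP slide.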
ENCyclopedia Of DNA Elements




2002   2004   2005       2007    2010
2002   2004   2005   2007   2010
Functional genomics
Sequence (DNA/RNA)
Comparative                                                                & phylogeny
 genomics

     ...
Copies of the DNA of the         The RNA samples
genes of interest are prepared   of interest are prepared       ...
Microarray analysis
            Clinical prediction of Leukemia type

• 2 types
    – Acute lymphoid (ALL)
    – Acute mye...
Biomarkers discovery

   Data        statistical
Management      analysis                  Network
                       ...
RT-PCR Standard Processing Procedure

                  TaqMan
                  Assays
                                  ...
Example of Array CGH Technology*




Chari et al, Cancer Informatics, 2006, 2, 48-58
Chip-on-chip




     Source: http://www.chiponchip.org/
ChIP (Chromatin ImmunoPrecipitation)

• Chromatin immunoprecipitation, or ChIP, refers to a procedure
  used to determine ...
Protein Microarray
         G. MacBeath and S.L. Schreiber, 2000, Science 289:1760




                                   ...
Different Kinds of Protein Arrays*



Antibody Array    Antigen Array       Ligand Array




 Detection by: SELDI MS, fluo...
The Microarray Study Process
Preprocessing
Some Questions:



• Which genes have expression levels that are correlated
  with some external variable?
• For a given p...
Challenges for Data Analysis


• Normalization (removing systematic measurement effects)
• Variable Selection (Identificat...
Data Analysis Methods
Dimension Reduction
 • PCA (Principal Component Analysis)
 • ICA (Independent Component Analysis)
 •...
Matrix factorization
Popular Classification Methods

• Decision Trees/Rules
   – Find smallest gene sets, but not robust – poor performance
• N...
Support Vector Machine (SVM)




• Main idea: Select hyperplane that is more likely to
  generalize on a future datum
Best Practices



• Capture the complete process, from raw data to final
  results
• Gene (feature) selection inside cross...
Enrichment Analysis

•   What are major enriched GO terms?
•   What are the highly active pathways?
•   What are the frequ...
Meta-analysis example: “Creation and
    implications of a phenome-genome network”
Butte and Kohane. Nat Biotech. 2006
Meta-analysis example: “Creation and
     implications of a phenome-genome network”
Butte and Kohane. Nat Biotech. 2006
• ...
Systems biology
PPI ANNOTATION AND DATABASES


Database    Reference                                      URL
MINT        (Zanoni et al., ...
Complex networks


• Many systems can be represented as
  networks (graphs)
   – Nodes: individual component (proteins)
  ...
Detecting Hierarchical Organization
Summary: Network Measures

• Degree ki
   The number of edges involving node i
• Degree distribution P(k)
   The probabili...
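The degree k_i counts the edges involving node i, and P(k) is commonly defined as the fraction of nodes with degree k. A minimal sketch for an undirected network stored as an edge list; the toy protein names and the function names are hypothetical:

```python
from collections import Counter

def degrees(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Degree k_i: number of edges involving node i (undirected, no self-loops)."""
    k = Counter()
    for a, b in edges:
        k[a] += 1
        k[b] += 1
    return dict(k)

def degree_distribution(k: dict[str, int]) -> dict[int, float]:
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(k.values())
    n = len(k)
    return {deg: c / n for deg, c in sorted(counts.items())}

# Hypothetical toy interaction network with one hub protein "A"
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
k = degrees(edges)
pk = degree_distribution(k)
```

Nodes that appear in no edge are absent from the edge list and would have to be added separately with degree 0.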
Mapping the phenotypic data to the network
                             •Systematic phenotyping
                          ...
The Role of Proteomics


• The existence of an ORF does not imply the
    existence of a functional gene.
•   Limitations ...
Structural proteomics



•   Folding
•   Structure and function
•   Protein structure prediction
•   Secondary structure
•...
What kind of methods around?


5 main levels of protein Structure prediction:

  1. Extensive Sequence Search
  2. Threadi...
Prediction of Protein Structures


• Examples – a few good examples




       actual              predicted   actual     ...
MODPIPE: Large-Scale Comparative Protein Structure Modeling
                          START


                            ...
Structural Proteomics:
                            The Motivation*


            2000000                                  ...
The hierarchies of protein structure
Docking Programs

• Dock (UCSF)
• Autodock (Scripps)
• Glide
  (Schrodinger)
• ICM (Molsoft)
• FRED (Open Eye)
• Gold, Fle...
Cell cycle network from KEGG
Graphical Notation: a necessity for the conceptual representation
    of biopathways



      Qualitative                 ...
Strategies: simulate or analyse?
                 (or rather what to do first)


                                         ...
130
         stochsim
                        Boolean
                        networks
                                   ...
Continuum of modeling approaches




Top-down                          Bottom-up
Frazier et al. (2003) Science 11 April Vol 300:290-293
Data integration
Nucleic Acids Research article lists
      1078 public databases




  Nucleic Acids Research, 2008, Vol. 36, Database iss...
Growth in Available Bioinformatics Databases
Too much unintegrated data



• Data sources incompatible
• No (or few) standard naming convention
• No common interface (...
– Large experiments or large research    – Small, isolated, independent,
  groups/labs, possibly distributed        groups...
Challenges: Names and Identity

•     WSL-1 protein                             Q93038 = Tumor necrosis factor
•     Apopt...
Why must we support standards?


• Unambiguous representation, description
  and communication
  – Final results and metadata...
What to standardize?


•   CONTENT: Minimal/Core Information to be reported
•   MIBBI (http://www.mibbi.org)
•   SEMANTIC: ...
MIBBI: Standard Content




Promoting Coherent Minimum Reporting Requirements for
Biological and Biomedical Investigations...
Link Integration: Integration Lite




                                      Application interface



                    ...
Warehouse




                                     Wrappers Wrappers




                                                 ...
View integration




                                    Wrappers Wrappers




                                           ...
Specialist Integrating Application




                                  Wrappers Wrappers




                           ...
Workflows


                                                  Workflow
                                                  E...
Mash-Up Data Marshalling
                          objects




                                      Protocol




        ...
Composite applications
Can the Semantic Web help?




                                                   Access and Query
                               ...
Service Oriented Architecture



                               Advanced Search
                               Retrieve da...
Distributed Annotation System
Distributed Annotation System
An Integrative Analysis Example


Relational
   data
    Decision
 mining                                     Text
  tree ...
From experiments to scientific publications

1- Experiments    2- Results          3- Scientific Peer-
                   ...
PubMed/Medline database at NCBI

                       - Developed at the National
                         Center for Bi...
Data in scientific articles

   Scientific      Free Text                         Tables              Figures
   Journals
...
BioCreative
BioCreative
BioCreative results

                      TP: prediction evaluated as protein
                      and GO terms correct
...
 Data Integration
   • Standards, DBs                                 Infrastructure

 Knowledge Discovery
   • Algorith...
The challenges of biology over the next
                    50 years
• A list of all the molecular components that
 ...
Challenges of Bioinformatics
Talk given at the University of Granada


  1. 1. Bioinformatics: biology by other means Alberto Labarga UGR, November 2008
  2. 2. Computational Biology Bioinformatics [Biological Information]
  3. 3. Toward a scientific theory of heredity 1859 1866 1870 1900 1902
  4. 4. In 1859 Charles Darwin publishes 'The Origin of Species', which proposes that living beings are the result of natural selection and that all creatures have evolved over the generations through small changes. 1859 1866 1870 1900 1902
  5. 5. Mendel's laws, published in 1866, rediscovered in 1900 1859 1866 1870 1900 1902
  6. 6. In 1870, a German scientist named Friedrich Miescher isolates the components stored in the nucleus, composed mainly of proteins and nucleic acids. At the time it was believed that the element storing hereditary information had to be protein, built from 20 amino acids, whereas nucleic acids had only 4 components. 1859 1866 1870 1900 1902
  7. 7. At the turn of the century, Phoebus Levene discovered that DNA is a chain of nucleotides, in which each nucleotide is composed of a sugar (deoxyribose), a phosphate group and a nitrogenous base, which could be one of four types: adenine, thymine, guanine and cytosine. 1859 1866 1870 1900 1902
  8. 8. Walter Sutton, a graduate student in E. B. Wilson’s lab at Columbia University, observed that in the process of cell division, called meiosis, that produces sperm and egg cells, each sperm or egg receives only one chromosome of each type. (In other parts of the body, cells have two chromosomes of each type, one inherited from each parent.) The segregation pattern of chromosomes during meiosis matched the segregation patterns of Mendel’s genes. 1859 1866 1870 1900 1902
  9. 9. The discovery of DNA 1928 1944 1949 1952 1953
  10. 10. 1928 Frederick Griffith: the transforming principle. If he mixed R pneumococci with S pneumococci previously killed by heat, the mice died. Moreover, in the blood of these dead mice Griffith found encapsulated (S) pneumococci. 1928 1944 1949 1952 1953
  11. 11. In 1944 Oswald Avery and his colleagues, who were studying the bacterium that causes pneumonia, Pneumococcus, discovered that bacteria have nucleic acids and that the DNA molecule is the one responsible for storing genes. Other studies with viruses went on to confirm this theory, even though DNA was still believed to be too simple. 1928 1944 1949 1952 1953
  12. 12. Life can be seen as a process of storage and transmission of biological information. Chromosomes are the carriers of this information. The information is stored in the form of a molecular code. To understand life we must identify these molecules and decipher the code. 1928 1944 1949 1952 1953
  13. 13. 1949 DNA is duplicated during cell division. Chargaff: A = T and G = C 1928 1944 1949 1952 1953
  14. 14. 1952 - Hershey-Chase Experiment 1928 1944 1949 1952 1953
  15. 15. M.H.F. Wilkins, A.R. Stokes, H.R. Wilson: Molecular Structure of Deoxypentose Nucleic Acids. Nature 171, 738 (1953) R.E. Franklin and R.G. Gosling Molecular Configuration in Sodium Thymonucleate, Nature 171, 740 (1953) 1928 1944 1949 1952 1953
  16. 16. MOLECULAR STRUCTURE OF NUCLEIC ACIDS “We wish to propose a structure for the salt of desoxyribose nucleic acid (DNA). This structure has novel features which are of considerable biological interest” Nature, 25 April 1953 1928 1944 1949 1952 1953
  17. 17. “It has not escaped our attention that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.” 1928 1944 1949 1952 1953
  18. 18. The base pairs
  19. 19. In 1955 Ochoa, together with the French-Russian biochemist Marianne Grunberg-Manago, published in the Journal of the American Chemical Society the isolation of an enzyme from the colon bacillus (E. coli) that catalyzes the synthesis of RNA, the intermediary between DNA and proteins. The discoverers named the enzyme "polynucleotide phosphorylase"; it was at first taken to be an RNA polymerase. The discovery of polynucleotide phosphorylase made it possible to prepare synthetic polynucleotides of varying base composition, with which Severo Ochoa's group, in parallel with Marshall Nirenberg's group, went on to decipher the genetic code. 1955 1959 1962 1966
  20. 20. 1955 1959 1962 1966
  21. 21. When Perutz arrived in Cambridge, the largest molecular structure that had been solved was that of the natural pigment phycocyanin, with 58 atoms. A protein has thousands of atoms. Bernal, his supervisor, had taken some X-ray diffraction images of crystals of a protein, pepsin, but had not managed to interpret them. The subject Perutz chose for his thesis was another protein, hemoglobin, the oxygen carrier that gives our blood its red color. Hemoglobin has no fewer than 11,000 atoms. It took him 23 years. 1955 1959 1962 1966
  22. 22. 1955 1959 1963 1966
  23. 23. Over the course of several years, Marshall Nirenberg, Har Khorana and Severo Ochoa and their colleagues elucidated the genetic code – showing how nucleic acids with their 4-letter alphabet determine the order of the 20 kinds of amino acids in proteins. Messenger RNA is interpreted three letters at a time; a set of three nucleotides forms a "codon" that encodes an amino acid. A three-letter word made of four possible letters can have 64 (4 x 4 x 4) permutations, which is more than enough to encode the 20 amino acids in living beings. 1955 1959 1962 1966
  24. 24. From DNA to protein
  25. 25. Understanding the mechanisms, creating the tools 1970 1971 1975 1977 1980
  26. 26. The Central Dogma 1970 1971 1975 1977 1980
  27. 27. Created in 1971 with seven structures 1970 1971 1975 1977 1980
  28. 28. Recombinant DNA is a DNA molecule formed by joining two heterologous molecules, that is, molecules of different origin. It is made using restriction enzymes, which are able to "cut" DNA at specific points. Put very simply, we "cut" a human gene and "paste" it into the DNA of a bacterium; if, for example, it is the gene that controls the production of insulin, inserting it into a bacterium "forces" the bacterium to produce insulin. 1970 1971 1975 1977 1980
  29. 29. 1970 1971 1975 1977 1980
  30. 30. A precursor-RNA may often be matured to mRNAs with alternative structures. An example where alternative splicing has a dramatic consequence is somatic sex determination in the fruit fly Drosophila melanogaster. In this system, the female-specific sxl-protein is a key regulator. It controls a cascade of alternative RNA splicing decisions that finally result in female flies. 1970 1971 1975 1977 1980
  31. 31. Understanding the mechanisms, creating the tools 1981 1982 1983 1985 1987 1990
  32. 32. Read out the letters from a DNA sequence GTGAGGCGCTGC 1981 1982 1983 1985 1987 1990
  33. 33. 1983 The polymerase chain reaction, known as PCR, is a molecular biology technique described in 1986 by Kary Mullis, whose aim is to obtain a large number of copies of a particular DNA fragment starting from a minimal amount; in theory a single copy of the original fragment, or template, is enough. 1981 1982 1983 1985 1987 1990
  34. 34. Total nucleotides Number of entries (Nov 07: 188,490,792,445) (Nov 07: 106,144,026) 1981 1982 1983 1985 1987 1990
  35. 35. 1981 1982 1983 1985 1987 1990
  36. 36. The Human Genome Project (HGP) consists of determining the relative positions of all the nucleotides (or base pairs) and identifying the 100,000 genes present in the genome. The project, funded with 3 billion dollars, was launched in 1990 by the United States Department of Energy and National Institutes of Health, with a planned duration of 15 years. 1981 1982 1983 1985 1987 1990
  37. 37. “Imagine several copies of a book, each cut into 10 million small pieces in such a way that the pieces overlap. Suppose that 1 million pieces have been lost and that the other 9 million are stained with ink. Recover the original text.”
  38. 38. HUGO: Idealized representation of the hierarchical shotgun sequencing strategy. A library is constructed by fragmenting the target genome and cloning it into a large-fragment cloning vector; here, BAC vectors are shown. The genomic DNA fragments represented in the library are then organized into a physical map and individual BAC clones are selected and sequenced by the random shotgun strategy. Finally, the clone sequences are assembled to reconstruct the sequence of the genome.
  39. 39. Deciphering the book of life 1990 1995 1996 1997 1998 1999 2001
  40. 40. S.F. Altschul, et al. (1990), "Basic Local Alignment Search Tool," J. Molec. Biol., 215(3): 403-10, 1990. 15,306 citations Altschul, S.F. et al (1997), “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs”, Nucleic Acids Res., vol. 25, no. 17, pp. 3389-402. 1990 1995 1996 1997 1998 1999 2001
  41. 41. • SSAHA (Ning et al., 2001) • http://www.sanger.ac.uk/Software/analysis/SSAHA/ • SSAHA is an algorithm for very fast matching and alignment of DNA sequences. It stands for Sequence Search and Alignment by Hashing Algorithm. It achieves its fast search speed by converting sequence information into a `hash table' data structure, which can then be searched very rapidly for matches. • BLAT (J. Kent, 2002) • http://genome.ucsc.edu/cgi-bin/hgBlat • BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes find them down to 20 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more.
  42. 42. J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment … Nuc. Acids. Res. 22, 4673 - 4680 1990 1995 1996 1997 1998 1999 2001
  43. 43. Flowchart of computation steps in Clustal W (Thompson et al., 1994) Pairwise alignment: calculation of distance matrix Creation of unrooted neighbor-joining tree Rooted nJ tree (guide tree) and calculation of sequence weights Progressive alignment following the guide tree
  44. 44. Other methods Notredame, C., Higgins, D.G., Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol, 302, 205–217. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797. Katoh, K., Kuma, K., Toh, H., Miyata, T. (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res, 33, 511–518. Lassmann, T., Sonnhammer, E. (2005) Kalign – an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics, 6, 298. Larkin M.A. et al. (2007) ClustalW and ClustalX version 2. Bioinformatics 2007 23(21): 2947-2948.
  45. 45. Tree of Life http://tolweb.org/tree/phylogeny.html http://itol.embl.de/
  46. 46. 1995 • The first complete genome of an organism, Haemophilus influenzae. 1990 1995 1996 1997 1998 1999 2001
  47. 47. 1996 • The yeast genome is completed: approximately 6,000 genes and 14,000,000 base pairs 1990 1995 1996 1997 1998 1999 2001
  48. 48. 1990 1995 1996 1997 1998 1999 2001
  49. 49. 1997 • The genome of the bacterium E. coli is sequenced: 4,600 genes, 4.5 million nucleotides. 1990 1995 1996 1997 1998 1999 2001
  50. 50. 1998 The genome of the worm Caenorhabditis elegans has 18,000 genes and about 100 million nucleotides 1990 1995 1996 1997 1998 1999 2001
  51. 51. 1999 • The complete sequence of chromosome 22 is obtained. The HGP is ahead of schedule. The small number of genes found (about 300) comes as a surprise. 1990 1995 1996 1997 1998 1999 2001
  52. 52. Fire A, Xu S, Montgomery M, Kostas S, Driver S, Mello C (1998). "Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans". Nature 391 (6669): 806–11. doi:10.1038/35888. PMID 9486653
  53. 53. Hamilton A, Baulcombe D (1999). "A species of small antisense RNA in posttranscriptional gene silencing in plants". Science 286 (5441): 950–2. PMID 10542148
  54. 54. Dr Alan Wolffe (1999) • Epigenetics is heritable changes in gene expression that occur without a change in DNA sequence • Such changes cannot be attributed to changes in DNA sequence (mutations) • They are as Irreversible as mutations (or difficult to reverse)
  55. 55. 1990 1995 1996 1997 1998 1999 2001
  56. 56. Gene prediction Where are the genes? In humans: ~22,000 genes ~1.5% of human DNA
  57. 57. the gencode pipeline 1. mapping of known transcripts sequences (ESTs, cDNAs, proteins) into the human genome 2. manual curation to resolve conflicting evidence 3. additional computational predictions 4. experimental verification 5. FINAL ANNOTATION
  58. 58. Genome annotation - building a pipeline Genome sequence Map repeats Map ESTs Map Peptides Genefinding nc-RNAs Protein-coding genes Functional annotation (slide footer: Bioinformatics tools for Comparative Genomics of Vectors, August 2008)
  59. 59. Genefinding - ab initio predictions • Use compositional features of the DNA sequence to define coding segments (essentially exons) • ORFs • Coding bias • Splice site consensus sequences • Start and stop codons • Each feature is assigned a log likelihood score • Use dynamic programming to find the highest scoring path • Need to be trained using a known set of coding sequences • Examples: Genefinder, Augustus, Glimmer, SNAP, fgenesh
  60. 60. ab initio prediction Genome Coding potential ATG & Stop codons Splice sites ATG & Stop codons Coding potential
  61. 61. ab initio prediction Genome Coding potential ATG & Stop codons Splice sites ATG & Stop codons Coding potential
  62. 62. ab initio prediction Genome Coding potential ATG & Stop codons Splice sites ATG & Stop codons Coding potential Find best prediction
  63. 63. Genefinding - similarity • Use known coding sequence to define coding regions • EST sequences • Peptide sequences • Needs to handle fuzzy alignment regions around splice sites • Needs to attempt to find start and stop codons • Examples: EST2Genome, exonerate, genewise • Use 2 or more genomic sequences to predict genes based on conservation of exon sequences • Examples: Twinscan and SLAM
  64. 64. Similarity-based prediction Genome Align cDNA/peptide Create prediction
  65. 65. Example of a simple HMM Top: model architecture and parameters. Bottom: sequence generation process. green: state transition probabilities, red: emission probabilities. Prob(sequence, path|model) = 6.8e-8. EPFL – Bioinformatics I – 05 Dec 2005
  66. 66. Automatic Annotation vs Manual. Automatic Annotation: • Quick whole genome analysis ~ weeks • Consistent annotation • Use unfinished sequence/shotgun assembly • No polyA sites/signals, pseudogene • Predicts ~70% loci. Manual Annotation: • Extremely slow ~3 months Chr 6 • Need finished seq • Flexible, can deal with inconsistencies in data • Most rules have exception • Consult publications as well as databases
  67. 67. Analysis: EGASP predictions vs manual annotation [Figure: four bar charts of sensitivity (Sn) and specificity (Sp) at the nucleotide, exon, transcript and gene levels for 9_101_1, 20_79_1, 36_46_1 and 41_77_1]
  68. 68. And this is only the beginning 2002 2004 2005 2007 2010
  70. 70. Date                          10/3/02  8/28/03  5/07   10/08
           Published complete genomes:   104      156      500    874
           Ongoing prokaryotic genomes:  316      386      1500   2124
           Ongoing eukaryotic genomes:   218      246      700    1004
           http://www.genomesonline.org 4000
  71. 71. [Figure: bases per run (millions) by date of introduction, 1994-2006, for the ABI 370/377, 3700, 3730 and 3730XL and the 454-GS20 (32,000,000 bases per run)] • ABI: 1 Mb / day • Roche / 454 Genome Sequencer FLX: 100 Mb / run • Illumina / Solexa Genetic Analyzer: 2000 Mb / run • Applied Biosystems SOLiD: 3000 Mb / run
  72. 72. Although humans share 99.9 percent of their genetic information, we carry small variations called single nucleotide polymorphisms, or SNPs (pronounced "snip"). An estimated 10 million SNPs exist in the human species, and these differences are thought to be related to greater resistance or susceptibility to diseases and drugs.
  73. 73. VARIATION IN THE HUMAN DNA SEQUENCE • Mutation rate = 10^-8 /site/generation • Number of generations from the common ancestor to present-day humans: 10^4-10^5
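As a rough worked example, the two figures on this slide (a mutation rate of 10^-8 per site per generation and 10^4 to 10^5 generations back to the common ancestor) can be combined into an expected per-site difference between two present-day sequences. The sketch assumes neutral mutations, no repeated hits at the same site, and that both lineages accumulate changes independently, hence the factor of 2. The function name is hypothetical.

```python
MUTATION_RATE = 1e-8           # per site per generation (from the slide)
GENERATIONS = (1e4, 1e5)       # range given on the slide


def expected_divergence(mu, generations):
    """Expected differences per site between two lineages separated
    `generations` ago (simplifying assumption: neutral, no back-mutation)."""
    return 2 * mu * generations


low, high = (expected_divergence(MUTATION_RATE, t) for t in GENERATIONS)
print(f"expected differences per site: {low:.1e} to {high:.1e}")
```

Under these assumptions the range is 2 x 10^-4 to 2 x 10^-3 differences per site, which brackets the roughly 0.1 percent difference implied by humans sharing 99.9 percent of their sequence.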
  74. 74. ENCyclopedia Of DNA Elements
  76. 76. Functional genomics
  77. 77. • Sequence (DNA/RNA) & phylogeny • Comparative genomics • Protein sequence analysis & evolution • Regulation of gene expression; transcription factors & micro RNAs • Protein structure & function: computational crystallography • Protein families, motifs and domains • Chemical biology • Protein interactions & complexes: modelling and prediction • Pathway analysis • Data integration & literature mining • Image analysis • Systems modelling
  78. 78. • Copies of the DNA of the genes of interest are prepared and printed on the chip • RNA samples of interest are prepared (control and sample): reverse transcription, then add fluorescence • The samples are hybridized on the microarray • The chip is excited with two different lasers (Laser 1, Laser 2): the control responds to one and the sample to the other • Comparing the two images shows which genes are expressed differently. Schena et al. Science 1995
  79. 79. Microarray analysis: Clinical prediction of Leukemia type • 2 types – Acute lymphoid (ALL) – Acute myeloid (AML) • Different treatment & outcomes • Predict type before treatment? Golub et al. Science 286:531-537 (1999)
  80. 80. Biomarkers discovery • Stages: data management, statistical analysis, annotation, network analysis, selection • 30,000 genes → 1500 genes → 150 genes → 50 elements → 10 targets
  81. 81. RT-PCR Standard Processing Procedure (TaqMan Assays) • Step 1: Calculate Ct with SDS and export text file (raw values) • Step 2: Retrieve data and define experiment design • Step 3: Biological replicates • Step 4: Selection of optimal endogenous controls & calculation of ΔCt • Step 5: Differential expression analysis (ΔΔCt) • Quality control checkpoints: overview of plates & samples, discard samples, ΔCt overview
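The ΔCt and ΔΔCt steps can be sketched as follows. This assumes the widely used 2^-ΔΔCt relative quantification approach with roughly 100% amplification efficiency, which the slide does not state explicitly. Function names and the Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, control_cts):
    """ΔCt: mean Ct of the target gene minus mean Ct of the endogenous control."""
    return mean(target_cts) - mean(control_cts)


def fold_change(sample_target, sample_control, ref_target, ref_control):
    """Relative expression of sample vs reference as 2^-ΔΔCt
    (assumes near-100% PCR efficiency for both assays)."""
    ddct = delta_ct(sample_target, sample_control) - delta_ct(ref_target, ref_control)
    return 2.0 ** (-ddct)


# Made-up replicate Ct values for illustration only.
fc = fold_change(
    sample_target=[24.0, 24.0], sample_control=[18.0, 18.0],
    ref_target=[26.0, 26.0], ref_control=[18.0, 18.0],
)
```

Here the sample's ΔCt is 6 and the reference's is 8, so ΔΔCt is -2 and the target is estimated to be four times more abundant in the sample.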
  82. 82. Example of Array CGH Technology. Chari et al., Cancer Informatics, 2006, 2, 48-58
  84. 84. ChIP-on-chip Source: http://www.chiponchip.org/
  85. 85. ChIP (Chromatin ImmunoPrecipitation) • Chromatin immunoprecipitation, or ChIP, refers to a procedure used to determine whether a given protein binds to a specific DNA sequence in vivo • Steps: 1. DNA-binding proteins are crosslinked to DNA with formaldehyde in vivo 2. Isolate the chromatin; shear DNA along with bound proteins into small fragments 3. Bind antibodies specific to the DNA-binding protein to isolate the complex by precipitation 4. Reverse the cross-linking to release the DNA and digest the proteins 5. Use PCR (Polymerase Chain Reaction) to amplify specific DNA sequences to see if they were precipitated with the antibody
  86. 86. Protein Microarray. G. MacBeath and S.L. Schreiber, 2000, Science 289:1760. arrayIT™ spotting platform and protein microarray
  87. 87. Different Kinds of Protein Arrays • Antibody Array • Antigen Array • Ligand Array • Detection by: SELDI MS, fluorescence, SPR, electrochemical, radioactivity, microcantilever
  88. 88. The Microarray Study Process
  89. 89. Preprocessing
  90. 90. Some Questions: • Which genes have expression levels that are correlated with some external variable? • For a given pathway, which of the genes in our collection are most likely to be involved? • For a diffuse disease, which genes are associated with different outcomes?
  91. 91. Challenges for Data Analysis • Normalization (removing systematic measurement effects) • Variable Selection (Identification of relevant Variables) • Large sample Effects: Type I and Type II errors (False positives / False negatives) • Dimensionality Reduction • Identification of new disease classes • Classification of data into known disease classes
  92. 92. Data Analysis Methods. Dimension Reduction • PCA (Principal Component Analysis) • ICA (Independent Component Analysis) • Multidimensional Scaling. Unsupervised Learning • K-Means / K-Medoid • Hierarchical Clustering Algorithms. Supervised Learning • Linear Discriminant Analysis • Maximum Likelihood Discrimination • Nearest Neighbor Methods • Decision Trees • Random Forests
  93. 93. Matrix factorization
  94. 94. Popular Classification Methods • Decision Trees/Rules – Find smallest gene sets, but not robust – poor performance • Neural Nets - work well for reduced number of genes • K-nearest neighbor – good results for small number of genes, but no model • Naïve Bayes – simple, robust, but ignores gene interactions • Support Vector Machines (SVM) – Good accuracy, does own gene selection, but hard to understand • Specialized methods, D/S/A (Dudoit), …
  95. 95. Support Vector Machine (SVM) • Main idea: Select hyperplane that is more likely to generalize on a future datum
  96. 96. Best Practices • Capture the complete process, from raw data to final results • Gene (feature) selection inside cross-validation • Randomization testing • Robust classification algorithms – Simple methods give good results – Advanced methods can be better • Wrapper approach for best gene subset selection • Use bagging to improve accuracy • Remove/relabel mislabeled or poorly differentiated samples
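"Gene (feature) selection inside cross-validation" means that the held-out sample must not influence which genes are chosen. Otherwise the accuracy estimate is optimistically biased. The sketch below illustrates this with leave-one-out cross-validation. The signal-to-noise ranking and the nearest-centroid classifier are stand-ins chosen for brevity, not methods prescribed by the slide, and the data are made up.

```python
from statistics import mean, pstdev


def select_genes(X, y, k):
    """Rank genes by a signal-to-noise score using only the samples given."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        score = abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)
        scores.append((score, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def predict(x, X, y, genes):
    """Nearest-centroid prediction restricted to the selected genes."""
    dists = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroid = [mean([r[g] for r in rows]) for g in genes]
        dists[label] = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
    return 0 if dists[0] <= dists[1] else 1


def loocv_accuracy(X, y, k):
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, k)  # selection sees training data only
        correct += predict(X[i], X_train, y_train, genes) == y[i]
    return correct / len(X)


# Toy data: 6 samples x 3 genes; only gene 0 separates the two classes.
X = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [0.9, 4.9, 0.4],
     [3.0, 5.0, 0.7], [3.2, 5.2, 0.1], [2.9, 4.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(X, y, k=1)
```

The design point is the position of the `select_genes` call: it sits inside the loop, so it is rerun for every fold rather than once on the full data set.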
  97. 97. Enrichment Analysis • What are major enriched GO terms? • What are the highly active pathways? • What are the frequently interacting proteins? • What are the known disease associations? Alistair Chalk, 2008
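One common way to answer "what are the major enriched GO terms?" is a one-sided hypergeometric test per term, sketched below. The slide does not name a specific test, so this is an assumption. The function name, parameter names and counts in the example are hypothetical, and a real analysis would also correct for testing many terms.

```python
from math import comb


def enrichment_p_value(total_genes, annotated, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric: the chance of seeing at least
    `overlap` annotated genes when `selected` genes are drawn at random from
    `total_genes`, of which `annotated` carry the term."""
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, i) * comb(total_genes - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, selected)


# Toy numbers: 10 genes in total, 4 carry the term, 3 were selected, 2 overlap.
p = enrichment_p_value(total_genes=10, annotated=4, selected=3, overlap=2)
```

A small p-value suggests the term is over-represented in the selected list relative to chance.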
  98. 98. Meta-analysis example: “Creation and implications of a phenome-genome network” Butte and Kohane. Nat Biotech. 2006
  99. 99. Meta-analysis example: “Creation and implications of a phenome-genome network” Butte and Kohane. Nat Biotech. 2006 • Clustered experiments based on mapping concepts found in sample annotations to UMLS meta-thesaurus. • Relationships found between phenotype (e.g., aging), disease (e.g., leukemia), environmental (e.g., injury) and experimental (e.g., muscle cells) factors and genes with differential expression. • “the ease and accuracy of automating inferences across data are crucially dependent on the accuracy and consistency of the human annotation process, which will only happen when every investigator has a better prospective understanding of the long-term value of the time invested in improving annotations.”
  100. 100. Systems biology
  101. 101. PPI ANNOTATION AND DATABASES. Database (reference), URL: • MINT (Zanoni et al., 2002), http://mint.bio.uniroma2.it/mint • IntAct (Hermjakob et al., 2004), http://www.ebi.ac.uk/intact • DIP (Xenarios et al., 2002), http://dip.doe-mbi.ucla.edu/ • HPID (Han et al., 2004), http://www.hpid.org • HPRD (Peri et al., 2004), http://www.hprd.org/ • iMEX agreement to share curation efforts • Proteomics Standards Initiative (PSI) recommendation • Molecular Interaction (MI) Ontology • Large scale experiments • Literature curation
  102. 102. Complex networks • Many systems can be represented as networks (graphs) – Nodes: individual component (proteins) – Edges: relationships (interactions) • They share common properties – Scale-free – Hierarchical – Clustering • Some properties may be intrinsic and can be better understood when put into the context of evolution
  103. 103. Detecting Hierarchical Organization
  104. 104. Summary: Network Measures • Degree k_i: the number of edges involving node i • Degree distribution P(k): the probability (frequency) of nodes of degree k • Mean path length: the average shortest path between all node pairs • Network diameter: the longest shortest path • Clustering coefficient: a high CC is found for modules
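The measures summarised above can be computed on a small undirected graph as in the sketch below. The four-node network is made up, the function names are hypothetical, and the path-length code assumes the graph is connected.

```python
from collections import Counter, deque


def build_graph(edges):
    """Undirected graph as a dict mapping each node to its set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_distribution(graph):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in graph.values())
    return {k: c / len(graph) for k, c in counts.items()}


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2 * links / (k * (k - 1))


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_path_length_and_diameter(graph):
    """Average and maximum shortest-path length (assumes a connected graph)."""
    lengths = [d for s in graph
               for t, d in shortest_path_lengths(graph, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)


g = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
mean_len, diameter = mean_path_length_and_diameter(g)
```

In this toy network node C has degree 3, node A has a clustering coefficient of 1 because its two neighbours interact, and the diameter is 2.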
  105. 105. Mapping the phenotypic data to the network •Systematic phenotyping of 1615 gene knockout strains in yeast •Evaluation of growth of each strain in the presence of MMS (and other DNA damaging agents) •Screening against a network of 12,232 protein interactions Begley TJ, Rosenbach AS, Ideker T, Samson LD. Damage recovery pathways in Saccharomyces cerevisiae revealed by genomic phenotyping and interactome mapping. Mol Cancer Res. 2002 Dec;1(2):103-12.
  106. 106. The Role of Proteomics • The existence of an ORF does not imply the existence of a functional gene. • Limitations of comparative genomics. • mRNA levels may not correlate with protein levels. • Protein modifications: post-transcriptional modifications, isoforms, post-translational modifications, mutants. • Issues of proteolysis, sequestration, etc. relevant only at the protein level. • Protein complex composition, protein-protein interactions, structures.
  107. 107. Structural proteomics • Folding • Structure and function • Protein structure prediction • Secondary structure • Tertiary structure • Function • Post-translational modification • Prot.-Prot. Interaction -- Docking algorithm • Molecular dynamics/Monte Carlo
  108. 108. What kinds of methods are around? 5 main levels of protein structure prediction: 1. Extensive Sequence Search 2. Threading and 1D-3D profiles 3. Ab initio prediction of protein structure 4. Comparative Modelling 5. Docking (domain interaction prediction)
  109. 109. Prediction of Protein Structures • Examples – a few good examples [Figure: four pairs of actual vs predicted structures]
  110. 110. MODPIPE: Large-Scale Comparative Protein Structure Modeling • START • For each target sequence: get profile for sequence (NR) with PSI-BLAST; scan sequence profile against representative PDB chains; scan PDB chain profiles against sequence; select templates using permissive E-value cutoff • For each template structure: expand match to cover complete domains; align matched parts of sequence and structure; build model for target segment by satisfaction of spatial restraints (MODELLER); evaluate model • END. R. Sánchez & A. Šali, Proc. Natl. Acad. Sci. USA 95, 13597, 1998. N. Eswar, M. Marti-Renom, M.S. Madhusudhan, B. John, A. Fiser, R. Sánchez, F. Melo, N. Mirkovic, A. Šali. 3/25/03
  111. 111. Structural Proteomics: The Motivation [Figure: number of known sequences (scale to 2,000,000) and structures (scale to 200,000) per year, 1980-2005]
  112. 112. The hierarchies of protein structure
  113. 113. Docking Programs • Dock (UCSF) • Autodock (Scripps) • Glide (Schrödinger) • ICM (Molsoft) • FRED (OpenEye) • Gold, FlexX, etc.
  114. 114. Cell cycle network from KEGG
  115. 115. Graphical Notation: a necessity for the conceptual representation of biopathways • Qualitative, mechanistic • Various degrees of detail, mixed level of presentation. Aladjem et al., Science STKE pe8 (2004); Thiery & Sleeman, Nat. Rev. Mol. Cell. Biol. 7:131 (2006)
  116. 116. Strategies: simulate or analyse? (or rather what to do first) • Route 1: convert diagram into a quantitative model; simulate model behavior numerically; obtain qualitative understanding through numerical results and model reduction • Route 2: build and identify “elementary modes”; analyze network topology, stability, etc.; qualitatively simulate a reduced model
  117. 117. Space of modeling methods: continuous ↔ discrete [Figure labels: stochsim, Boolean networks]
  118. 118. Continuum of modeling approaches Top-down Bottom-up
  119. 119. Frazier et al. (2003) Science 11 April Vol 300:290-293
  120. 120. Data integration
  121. 121. Nucleic Acids Research article lists 1078 public databases Nucleic Acids Research, 2008, Vol. 36, Database issue http://nar.oxfordjournals.org/cgi/reprint/36/suppl_1/D2
  122. 122. Growth in Available Bioinformatics Databases
  123. 123. Too much unintegrated data • Data sources incompatible • No (or few) standard naming conventions • No common interface (varying tools for browsing, querying and visualizing data)
  124. 124. Large groups: – Large experiments or large research groups/labs, possibly distributed – Large service provider institutes – Tightly coupled provider-consumer of resources – Commonly resource providers – Some or lots of access to sys admin. Small groups: – Small, isolated, independent groups/individuals – Loosely coupled provider-consumer of resources – Commonly resource consumers – Boutique suppliers – Poor access to systems admins
  125. 125. Challenges: Names and Identity • Q93038 = Tumor necrosis factor receptor superfamily member 25 precursor • Other names: WSL-1 protein; Apoptosis-mediating receptor DR3; Apoptosis-mediating receptor TRAMP; Death domain receptor 3; WSL protein; Apoptosis-inducing receptor AIR; Apo-3; Lymphocyte-associated receptor of death; LARD • GENE: Name=TNFRSF25 • Annotation history: O00275, O00276, O00277, O00278, O00279, O00280, O14865, O14866, P78507, P78515, Q92983, Q93036, Q93037, Q99722, Q99830, Q99831, Q9BY86, Q9UME0, Q9UME1, Q9UME5 • GUIDs, Life Science Identifier? Normalisation. http://www.expasy.org/uniprot/Q93038
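The normalisation problem on this slide can be sketched as a lookup from every known accession or name to a single primary identifier. The sketch treats the accessions listed under "Annotation history" as older identifiers that now resolve to Q93038, which is an interpretation of the slide. Only a few of the listed accessions and names are included, and the function names are hypothetical.

```python
PRIMARY = "Q93038"

# Subset of the accessions and names shown on the slide.
OLD_ACCESSIONS = ["O00275", "O00276", "P78507", "P78515", "Q92983", "Q9UME5"]
NAMES = ["WSL-1 protein", "Apo-3", "LARD", "Death domain receptor 3", "TNFRSF25"]


def _key(identifier):
    """Case- and whitespace-insensitive lookup key."""
    return identifier.strip().casefold()


def build_index(primary, accessions, names):
    """Map every known alias of an entry to its primary accession."""
    index = {_key(primary): primary}
    for alias in list(accessions) + list(names):
        index[_key(alias)] = primary
    return index


def normalise(identifier, index):
    """Return the primary accession for an alias, or None if it is unknown."""
    return index.get(_key(identifier))


index = build_index(PRIMARY, OLD_ACCESSIONS, NAMES)
```

Returning `None` for unknown identifiers, rather than guessing, keeps unresolved names visible so that they can be curated by hand.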
  126. 126. Why must we support standards? • Unambiguous representation, description and communication – Final results and metadata • Interoperability – Data management and analysis • Integration of OMICS → systems biology
  127. 127. What to standardize? • CONTENT: Minimal/Core Information to be reported • MIBBI (http://www.mibbi.org) • SEMANTIC: Terminology Used -> Ontologies • OBI (http://obi-ontology.org) • SYNTAX: Data Model, Data Exchange • FuGE (http://fuge.sourceforge.net/)
  128. 128. MIBBI: Standard Content. “Promoting Coherent Minimum Reporting Requirements for Biological and Biomedical Investigations: The MIBBI Project”, Taylor et al., Nature Biotech.
  129. 129. Link Integration: Integration Lite [Diagram: application interface, user interface, application, ontology authority, identity authority]
  130. 130. Warehouse [Diagram: wrappers, unified model, data access and query, application, user interface] • Copy the data sets, clean and massage data into shape • Combine them into a (different) pre-determined model before query • ATLAS, MRS, e-Fungi, GIMS, Medicel Integrator, MIPS, BioMART • Often called “Knowledge bases”
  131. 131. View integration [Diagram: wrappers, unified model, data access and query, application, user interface] • Data at source; virtual integrating database view • Global as View / Local as View mappings between models • Map from model to databases dynamically so always fresh • TAMBIS, Information Integrator, K4, ComparaGrid, UTOPIA, caCORE
  132. 132. Specialist Integrating Application [Diagram: wrappers, application, user interface] E.g. Ensembl, UTOPIA • Very popular. Known to be one application.
  133. 133. Workflows [Diagram: wrapper, workflow engine, application, user interface] • Data flow protocol. Automated data chaining. • General technique for describing and enacting a process • Describes what you want to do, not how you want to do it • Various degrees of data type compliance anticipated
  134. 134. Mash-Up [Diagram: data marshalling objects, protocols, mash-up application, user interface] • Content syndication and feeds • Emphasis on the user creating a specific integration by mapping • Just in time, just enough design • On demand integration
  135. 135. Composite applications
  136. 136. Semantic Web help? [Diagram: wrappers, access and query, application, user interface; Semantic Enrichment, Model flattening, Mapping Transparency] • Slight problem: we have no first class metadata migration and management infrastructure, where metadata is outside the application and in the middleware, and we can handle progressive curation
  137. 137. Service Oriented Architecture [Diagram: advanced search, retrieve data, submit data, submission and curation, each exposed as a web service (ws) and linked by dataflow and workflow]
  138. 138. Distributed Annotation System
  140. 140. An Integrative Analysis Example [Figure: analysis workflow combining relational data mining, text mining and a decision tree model of metabonomic profile with visualization of serial/spectrum data, cluster statistics, chemical structure, sequence data, pathways, multidimensional data and relational data]
  141. 141. From experiments to scientific publications • 1- Experiments: planning and carrying out experiments (lab work) • 2- Results: processing and interpretation of obtained results • 3- Scientific peer-reviewed articles: 'relevant' results are published in scientific journals
  142. 142. PubMed/Medline database at NCBI - Developed at the National Center for Biotechnology Information (NCBI). - The core 'Textome'. - Repository of citation entries of scientific articles. - PubMed titles and abstracts are the primary data source for Bio-NLP. - ~450,000 new abstracts per year - > 4,800 biomedical journals - ENTREZ search engine
  143. 143. Data in scientific articles • Scientific journals: free text (title, abstract, keywords, text body, references), tables, figures • Journal-specific information: format, paper structure (sections), article type • Biomedical literature characteristics: - Heavy use of domain-specific terminology (12% biochemistry-related technical terms). - Polysemic words (word sense disambiguation). - Most words with low frequency (data sparseness). - New names and terms created. - Typographical variants - Different writing styles (native languages)
  144. 144. BioCreative
  146. 146. BioCreative results TP: prediction evaluated as protein and GO terms correct Precision: TP / Total nr. of evaluated submissions 1: Chiang et al. 2: Couto et al. 3: Ehrler et al. 4: Ray et al. 5: Rice et al. 6: Verspoor et al.
  147. 147. • Data Integration – Standards, DBs • Knowledge Discovery – Algorithms, Informatics, Machine Learning • Integrate knowledge – Text mining, Ontologies • Modelling – Pathways, Circuits, Abstraction [Side labels: Infrastructure, Research Support]
  148. 148. The challenges of biology in the next 50 years • List all the molecular components that make up an organism: – Genes, proteins, and other functional elements • Understand the function of each component • Understand how they interact • Study how function has evolved • Find the genetic defects that cause diseases • Design drugs and therapies rationally • Sequence the genome of each individual and use it in personalized medicine • Bioinformatics is an essential component in achieving all these goals
