L. ARAVIND National Center for Biotechnology Information Apprehending Life’s complexity: Making and communicating biological discoveries
We are becoming meme and teme machines! It is all about replicators, biological and otherwise: the good old genes, memes, temes
Summary of issues Discovery in biology Different philosophies: Natural history versus “hypothesis driven science” Evolutionary theory and computation as a bridge between the philosophical antipodes Example of the PAS domain A rich breeding ground for memes Levels of organization in the living world and its complexity Microscopic, mesoscopic  and macroscopic world views need integration Seeking gold at the end of the maze Following natural order: hierarchies and networks Examples of classifications and hierarchies The meme machine: transmission  of discoveries Databases and search tools Scientific collaboration and competition  Journal systems
The two philosophies in biology Natural history: discovery of new forms, cataloguing and classification Hypothesis-> attempt at falsification->paradigms: Popper’s world view Largely a history of clash or neglect
Building the bridge: Evolutionary theory and computation. Sequence profile analysis. Structure similarity comparisons. Contextual analysis. Understanding and predicting protein (biomolecule) function. Systems biology: Ensembles of biomolecules in functional guilds. The “omics” (regular and meta): From sequence to organismal biology and ecology
Early domain universe. The protein universe shows enormous diversity but an underlying unity. These relationships are powerful predictors of protein evolution, function and behavior. The largest assemblage of homologous domains that can be unified by sequence features is formally a superfamily. Several superfamilies may share a common folding pattern and arrangement of secondary structure elements: these are unified into a fold
ALL LIFE FORMS BACTERIA ARCHAEA EUKARYA The ribosome, and the associated enzymes like some RNases (including RNase HII), PseudoU synthases, RNA methylases, thioU synthases, Clamp loader ATPase, RecA, RNA polymerases, translation GTPases, AATRS, ABC, MinD ATPases, OSGP-like chaperone/protease. PCNA, DNA ligases, rRNA and tRNAs. DNA polymerases, Holliday junction resolvases, Primases, Replicative Helicases, Origin recognition complexes. Ribozymes are well-known: so an RNA world of sorts must have existed. There was a common ancestor of all life; the main functions of this life form revolved around RNA metabolism and translation; some cellular functions related to DNA had developed but modern DNA replication “crystallized” later. So there was an RNA-centered ancestral form with a possible DNA intermediate in replication. Unifying life and inferring the common ancestor
Getting behind biological clocks, photodetectors and oxygen sensors Regulation of circadian rhythms in animals Periodic growth and sporulation in fungi Light regulated expression of photosynthetic pigments Oxygen-seeking behavior in aerobic bacteria A master regulator of the clock the period protein (per) WC-1 and WC-2 two light sensory regulators of gene-expression in  Neurospora BAT a regulator of photosynthetic pigment expression The aerotaxis receptor of E.coli and other bacteria
The PAS domain: a ligand-binding domain which binds diverse ligands like heme, tetrahydropyrrole and flavin nucleotides. Thus, it can sense diverse stimuli like light, redox or both. Transmits this stimulus to a diverse range of other “effector” domains. Reference: Ponting CP, Aravind L. PAS: a multifunctional domain family comes to light. Curr Biol. 1997 Nov 1;7(11):R674-7. [Figure: domain architectures of PAS-containing transcription regulators; labels include WC-1, SIM, PER, PAS, bHLH, AAA+, HTH, GATA and C6.]
Birth of a meme… The predictions: ERG channels: redox sensing in animal hearts. Phytochrome: light sensing in plants and bacteria. Signaling intracellular redox states. Small-molecule based regulation of signaling enzymes. [Figure: domain architectures combining PAS and GAF domains; labels include S/T-kinase, H-kinase and adenylyl cyclase.] Detection of the PAS domain allows a definitive functional prediction. The mechanisms of critical molecules across the entire diversity of life could be predicted. It was a very successful meme indeed: 887 publications following up on the original characterization and function prediction of the PAS domain have emerged since, around 80-90 per year.
Overview of biological complexity Discovery and classification of domains  Mesoscopic Characterization of biological functional systems Function prediction & classification Microscopic Computational analysis of whole biological systems or networks Reconstructing organismal biology and whole ecosystems Macroscopic Evolutionary trajectories: Genomes to Biology
Eukaryotic signaling proteins show non-linear scaling with proteome size… However, major superfamilies of signaling proteins show largely linear trends: invention of many lineage-specific systems independent of the large superfamilies Deviations point to important functional adaptations: convergent evolution of LRR+kinase architectures
Lineage-specific expansion of a domain family (prolyl hydroxylases). Definition: the increase in numbers of a domain in a particular lineage with respect to its number in a sister reference lineage. [Figure labels: Rs, Pbcv1, Dm, Arabidopsis, Drosophila, Homo.]
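The definition of lineage-specific expansion above can be illustrated with a small sketch. The slide only says "increase in numbers"; expressing it as a ratio with a pseudocount is an assumption made here for illustration, and the function name and the counts are hypothetical, not data from the talk.

```python
def expansion_ratio(count_in_lineage, count_in_sister, pseudocount=1.0):
    """Ratio of domain copies in a lineage to copies in its sister reference lineage.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when the domain is absent from the sister lineage.
    """
    return (count_in_lineage + pseudocount) / (count_in_sister + pseudocount)


# Made-up counts: 9 copies in the lineage of interest, 1 in the sister lineage.
ratio = expansion_ratio(9, 1)
print(f"expansion ratio: {ratio:.1f}")
```

A ratio well above 1 would flag a candidate expansion; what threshold counts as "well above" is a judgment call that depends on the proteome sizes being compared.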
Section of the contextual network for the Ub pathway. [Figure: network of domain associations; node labels include UB, UBX, UBA, UBCH, E2, OTU-DUB, WLM (metallopeptidase), PUG, PUL domain, PAW, PNGase, PPPDE, Thioredoxin, Asp protease, Calpain, ZZ finger, C2H2, A20 ZnF, An1 ZnF, WD40, RAB, RAB-GEF, Yif1, TM and LF. Asterisks mark predicted DUBs.]
Domain architectural “complexity” of eukaryotic signaling proteins Complexity can vary drastically even between sister lineages: parasitism causes a general fall in complexity The complexity in free-living forms is high in the chromalveolate+crown group clade. Multicellularity and cellular complexity resulted in increases in domain architectural complexity but clearly the increase was greatest in the animal lineage alone. Fungi as a whole show a reduction of complexity concomitant with their gene loss with respect to the ancestor of the crown group lineage.
Biology of Networks (nodes, links, interaction type). Protein interaction network: nodes are proteins, links are physical interactions (protein-protein). Metabolic network: nodes are metabolites, links are enzymatic conversions (protein-metabolite). Transcriptional network: nodes are transcription factors and target genes, links are transcriptional interactions (protein-DNA).
Datasets. E. coli transcriptional regulatory network: 112 TFs, 711 TGs, 1295 interactions (small-scale biochemical experiments). Yeast transcriptional regulatory network: 157 TFs, 4410 TGs, 12873 interactions (large-scale ChIP-chip experiments and genetic deletion and over-expression data).
Scale-free structure: presence of few nodes with many links and many nodes with few links. Transcriptional networks are scale-free. Scale-free structure provides robustness to the system. $N(k) \propto 1/k^{\gamma}$ (Albert & Barabasi, Rev Mod Phys (2002))
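To make the "few nodes with many links, many nodes with few links" idea concrete, here is a minimal sketch that tabulates the degree distribution N(k) of a network and estimates the slope of log N(k) against log k. The toy edge list is made up for illustration and is not taken from the E. coli or yeast datasets; a plain least-squares fit on log-log values is only a rough estimator of the exponent, used here as a stand-in for more careful methods.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: N(k)}, the number of nodes that have exactly k links."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(Counter(degree.values()))


def loglog_slope(dist):
    """Least-squares slope of log N(k) versus log k (roughly -gamma for a power law)."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(n) for n in dist.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy network: one hub regulating six targets, plus a small regulator.
toy_edges = [("hub", f"g{i}") for i in range(6)] + [("tf2", "g0"), ("tf2", "g6")]
dist = degree_distribution(toy_edges)
slope = loglog_slope(dist)
print(sorted(dist.items()), round(slope, 2))
```

In this toy case most nodes have a single link and only one node has six, so the fitted slope is negative, which is the qualitative signature the slide describes.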
Regulatory hubs which are condition-specific can be either lost or replaced. The same protein in organisms with different lifestyles may confer different adaptive value. Hence it may emerge as a regulatory hub in the organism to which it confers high adaptive value and not in the others. Different proteins should emerge as hubs in organisms with different lifestyles. [Figure: the regulators Crp and NarL compared across E. coli, H. influenzae and B. pertussis.]
Apprehending the diversity of eukaryotes. [Figure labels: “crown group” (most studied); “microbial eukaryotes” (most diverse and prevalent). Taxa shown: animals, fungi, slime molds, plants, chlorophytes, rhodophytes, diatoms, heteroloboseans, parabasalids, diplomonads, Euglenozoa, ciliates, apicomplexans.]
Some notable associations that might favor inter-eukaryotic gene flow Primary endosymbiosis with cyanobacterium Secondary endosymbiosis with different plant lineages Plant lineages Karyoklepty (e.g. ciliates) Endosymbiosis Engulfment Parasitic nucleus Nuclear invasion Karyoparasitism (e.g. Rhodophytes) Endoparasitism (e.g. apicomplexa)
Composite selves: bacterial origins for Vitamin B12 receptors We discovered a novel domain that forms the common denominator for Vitamin B12 binding and recognition in both bacteria and animals. This helped us understand how B12 is taken up by animal guts Domain architectures and unusual phyletic  distribution of this domain strongly suggested a bacterial origin for the primary  animal Vitamin B12 receptor
The medium for biological discovery The Dali Database BLAST PSI-BLAST HMMER HHPRED DALI MUSTANG KALIGN MUSCLE … . Labs (including “Omics” centers) Primary archival databases Search  methods and strategies Secondary databases Journals Lost in the blackhole
Sociology of the process: Complexity, competition and currency Complexity Dispersion of efforts Lack of integration Gold rush for the “hot” issues Publications  seen as currency in scientific community Intense competition Secrecy and strife  Transmission of discoveries is hampered  Can we / should we intercede? Increased Collaboration
Genes: Natural selection; scientific memes: peer review? Does the axe of peer review, as it stands, hamper effective scientific transmission? Great science was done without modern-style peer review. Long delays in publishing: damaging in a competitive scientific environment. Inane reviews with hardly any constructive value. Nitpicking: surely a primate instinct, but does it help in science? Obstructionists: peer review as a tool against competitors. Closed one-sided process. Crackpot science: what do we do about it? Enormous volume of scientific production: strain on referees and journal editors. Constructive criticism helps! Open peer review system: a viable compromise? A test case for the model: Biology Direct at BMC journals
Conclusions Given the “special” interests: 1)Journals and publishers 2)Evaluation of scientists by host institutions 3)Triaging  scientific publications 4)Allocating Funds for Biological research 5) Need to bar crackpots  Given the competition: 1)Blogs 2)Wikis 3)Open access, open peer-review etc. 4)The ubiquity of the internet 5) The drive from the memes and temes! Will out of the box thinking help?
