Presentazione 2012 05_03

  1. Ph.D. Day, 2012/05/16. METAL PDB: A DATABASE OF METALLOPROTEINS. XXVI cycle of “International Doctorate in Mechanistic and Structural Systems Biology”. Serena Lorenzini, 2nd year Ph.D. student. Tutor: Claudia Andreini.
  2. Biological Databases: Why?
     1. Biology has increasingly turned into a data-rich science.
     2. To make biological data available to scientists. A particular type of information should be available in one single place (book, site, database). Collecting data from literature is TIME-CONSUMING!
     3. To organize data in order to produce knowledge.
     4. To make biological data available in computer-readable form. Analysis of biological data almost always involves computers. Having the data in computer-readable form is a necessary first step.
  3. Biological Databases: Milestones
     1. "Atlas of Protein Sequence and Structure" by Margaret Dayhoff and colleagues, 1965 (PIR database). 65 sequences.
     2. Protein Data Bank (PDB), a joint effort between CCDC and BNL, 1971. 9 structures.
     3. GenBank, December 1982. 606 sequences.
     … Data require algorithms to be analyzed
     4. The FASTA algorithm is published by Pearson and Lipman, 1985.
     5. The BLAST program (Altschul et al.) is implemented, 1990.
     … The omics era
     6. The E. coli genome is published, 1997.
     Today: 1783 biological databases (NAR database issue 2012 [1]).
     [1] http://www.oxfordjournals.org/nar/database/cap/
  4. Biological Databases and Metalloproteins: A troubled relationship
     Not informative (lack of bio-inorganic background). Out of date (difficult to update).
     The problem: exceptional variability; lack of a formal description for metals in proteins.
     Few dedicated resources (10 of 1783 databases found using the “metal” keyword in the NAR database issue). At least indicative.
  5. Solution: 3D models of metal sites
     Advantages: automatic extraction of 3D models; formal description of features; systematic organization of data; easy update.
     Metal sites must be thought of as functional units composed of the metal and its LOCAL environment.
     Basis for database architecture.
  6. Metal PDB Architecture
     First level: automatically filled. 1- Information on the entire structures from multiple resources. 2- Information on metal sites.
     Second level: manually filled. Functional information on metal sites.
     Diagram label: PDB.
  7. Metal PDB Architecture. FIRST LEVEL: Information on the entire protein structure
     Diagram labels: Pfam, SCOP, CATH, EC-PDB, GO, …, PDB.
     PDB code; resolution; Protein name; Uniprot code; Cluster 50% sequence identity; CATH domain-s; SCOP domain-s; PFAM domain-s; Enzyme Classification number-s; Taxonomy names; Organism of Expression.
  8. Metal PDB Architecture. FIRST LEVEL: Information on the metal site only
     Metal/s type; Nuclearity; Coordination number; Bond distances; Coordination geometry [1]; Ligands; Proximal residues; Binding pattern; Conservation rates of residues; Secondary Structure pattern; H bonds; ...
     Figure labels: Metal sites; His 96, His 94, His 119.
     [1] Andreini C., Cavallaro G., Lorenzini S., “FindGeo: a tool for determining metal coordination geometry”, Bioinformatics, April 2012.
  9. Metal PDB Interface
  10. Metal PDB Interface
  11. Metal PDB Interface
  12. Metal PDB Interface
  13. Metal PDB Architecture: from First to Second Level
     Data from First Level.
     Problem: there are 151683 metal sites in the PDB; 28193 PDB entries actually bind metals.
     Solution: create clusters of equivalent sites (same function) which can be annotated together [1,2].
     Diagram labels: Superfamily 1: zincins; Superfamily 2: endostatins; Cluster 1, Cluster 2, Cluster 3, Cluster 4.
     [1] Andreini et al., “Structural analysis of metal sites in proteins: non-heme iron sites as a case study”, J Mol Biol, 2009.
     [2] Andreini et al., “Minimal functional sites allow a classification of zinc sites in proteins”, PLoS ONE, 2011.
  14. Characterization of a Database: Metal PDB
     Type of data: metal-containing 3D sub-structures.
     Data entry and quality control: appointed curators add, remove and update data.
     Primary or derived data: secondary database (results of analysis of primary databases).
     Links to other data items: combination of data.
     Technical design: relational database (SQL).
     Maintainer status: academic group.
     Availability: publicly available, no restrictions.
  15. THANK YOU FOR YOUR ATTENTION
     Thanks to Professor Ivano Bertini and Professor Lucia Banci.
     Thanks to Dr. Claudia Andreini, Dr. Gabriele Cavallaro, Prof. Antonio Rosato, Tech. Enrico Morelli.
  16. Metal PDB Architecture: from First to Second Level. What are equivalent sites?
     1. Automatically extract all sites.
     2. Single linkage clustering to create groups of sites (CATH, SCOP, PFAM, CLUSTER50%).
     3. Structural alignment among sites in the same group (the reference template is the longest chain).
  17. Metal PDB Architecture: from First to Second Level. What are equivalent sites?
     1. Automatically extract all sites.
     2. Single linkage clustering to create groups of sites (CATH, SCOP, PFAM, CLUSTER50%).
     3. Structural alignment among sites in the same group (the reference template is the longest chain).
  18. Metal PDB Architecture: from First to Second Level. What are equivalent sites?
     1. Automatically extract all sites.
     2. Single linkage clustering to create groups of sites (CATH, SCOP, PFAM, CLUSTER50%).
     3. Structural alignment among sites in the same group (the reference template is the longest chain).
     4. Single linkage clustering to create sub-groups of structural similarity.
  19. Metal PDB Architecture: from First to Second Level. What are equivalent sites?
     1 cluster of structural similarity; 2 clusters of functional similarity.
  20. Metal PDB Architecture: from First to Second Level. What are equivalent sites?
     1. Automatically extract all sites.
     2. Single linkage clustering to create groups of sites (CATH, SCOP, PFAM, CLUSTER50%).
     3. Structural alignment among sites in the same group (the reference template is the longest chain).
     4. Single linkage clustering to create sub-groups of structural similarity.
     5. Cluster equivalent (nuclearity and type) metal sites among members of the same structural similarity group.
     EQUIVALENT SITES ARE DEFINED
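
The grouping logic on slide 20 can be illustrated with a small Python sketch. This is not the Metal PDB implementation. It assumes that single linkage in step 2 means "two sites join the same group if they share at least one CATH, SCOP, Pfam or 50%-identity cluster label", and it skips the structural alignment and sub-grouping of steps 3 and 4, going straight to the split by metal type and nuclearity in step 5. The `Site` class, the label strings and the function names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical record for one extracted metal site."""
    site_id: str
    metal: str            # metal type, e.g. "Zn"
    nuclearity: int       # number of metal ions in the site
    labels: frozenset     # made-up classification labels (CATH, SCOP, Pfam, cluster50)


class UnionFind:
    """Disjoint-set structure used to merge sites that share a label."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def single_linkage_by_labels(sites):
    """Step 2 (assumed reading): link sites that share any classification label."""
    uf = UnionFind([s.site_id for s in sites])
    first_seen = {}  # label -> first site id carrying it
    for s in sites:
        for label in s.labels:
            if label in first_seen:
                uf.union(first_seen[label], s.site_id)
            else:
                first_seen[label] = s.site_id
    groups = defaultdict(list)
    for s in sites:
        groups[uf.find(s.site_id)].append(s)
    return list(groups.values())


def equivalent_sites(sites):
    """Step 5: within each group, split by metal type and nuclearity.

    Steps 3 and 4 (structural alignment and structural sub-groups) are omitted here.
    """
    result = []
    for group in single_linkage_by_labels(sites):
        buckets = defaultdict(list)
        for s in group:
            buckets[(s.metal, s.nuclearity)].append(s.site_id)
        result.extend(sorted(ids) for ids in buckets.values())
    return sorted(result)


# Toy input: a and b share a Pfam label, b and c share a SCOP label,
# so a, b and c are chained into one group even though a and c share nothing.
demo = [
    Site("a", "Zn", 1, frozenset({"CATH:X", "Pfam:P1"})),
    Site("b", "Zn", 1, frozenset({"Pfam:P1", "SCOP:S1"})),
    Site("c", "Zn", 2, frozenset({"SCOP:S1"})),
    Site("d", "Fe", 1, frozenset({"CATH:Y"})),
]
print(equivalent_sites(demo))
```

The chaining of a, b and c is the characteristic behaviour of single linkage: one shared label is enough to merge two groups, which is why a later split by structure, metal type and nuclearity is needed before sites are treated as equivalent.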