UniView

Published in: Education, Technology
Transcript

  • 1. Web-based application to survey properties of homologous proteins. Candidate: Diego Poggioli. Supervisor: Prof. Rita Casadio. Co-supervisor: Dr. Brigitte Boeckmann
  • 2. • Bio-problem: visualization of and interaction with biological data, and comparative protein analysis • Info-solution: web application (CGI). The portal gives access to four web pages: 1) Function-related annotation derived from UniProtKB/Swiss-Prot; 2) Features of the protein group; 3) Conservation score; 4) Tree.
  • 3. Members of a protein family normally perform a general biochemical function in common, but one or more subgroups may evolve a slightly different function, such as different substrate specificity.
  • 4. By comparing groups and subgroups of proteins it is possible to identify or estimate: • similarities and differences between the protein sequences, as well as in the information available for the given protein group; • the ranges within which functional information can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms; • errors in the annotation of proteins.
  • 5. Visualization of and interaction with biological data
  • 6. Available from any PC; system and browser independent. CGI, dynamic pages: HTML, JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…
  • 7. Form filling and data type. Example UniProtKB/Swiss-Prot entry (lines truncated on the slide):
      ID AVID_CHICK Reviewed; 152 AA.
      AC P02701; Q91958; Q98SH4;
      DT 21-JUL-1986, integrated into
      DT 11-SEP-2007, sequence version
      DT 10-JUN-2008, entry version 87.
      DE Avidin precursor.
      GN Name=AVD;
      OS Gallus gallus (Chicken).
      OC Eukaryota; Metazoa; Chordata
      OC Archosauria; Dinosauria
      OC Neognathae; Galliformes
      OX NCBI_TaxID=9031;
      RN [1]
      RP NUC
      RX MEDLINE=87203384; PubMed
      RA Gope M.L., Keinaenen R.A.,
      RA Zarucki-Schulz T., O'Malley B.
      RT "Molecular cloning of the chic
      RL Nucleic Acids Res. 15:3595
      RN [2]
      RP NUCLEOTIDE SEQUENCE [MR
      RX MEDLINE=90355928; PubMed
      RA Chandra G., Gray J.G.;
      RT "Cloning and expression of
      RL Methods Enzymol. 184:70
      …
      Entry names: AVID_CHICK AVR2_CHICK AVR4_CHICK AVR1_CHICK AVR3_CHICK AVR6_CHICK AVR7_CHICK
      Accession numbers: P02701 P56732 P56734 O13153 P56733 P56735 P56736
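The flat-file entry on slide 7 is organized by two-letter line codes (ID, AC, DE, OS, ...). As a minimal sketch of how a page could pull the entry name and accession numbers out of such an entry, the Python below groups lines by their code. The function names `parse_flat_entry` and `accessions` are hypothetical and not taken from the slides, and the sample entry is a shortened version of the one shown above.

```python
from collections import defaultdict

def parse_flat_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and the '//' entry terminator.
        if len(line) < 2 or line.startswith("//"):
            continue
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    return dict(fields)

def accessions(fields):
    """Return every accession number found on the AC lines, in file order."""
    found = []
    for value in fields.get("AC", []):
        found.extend(acc.strip() for acc in value.split(";") if acc.strip())
    return found

# Shortened example entry (hypothetical excerpt based on slide 7).
entry = """ID   AVID_CHICK   Reviewed;   152 AA.
AC   P02701; Q91958; Q98SH4;
DE   Avidin precursor.
GN   Name=AVD;
OS   Gallus gallus (Chicken).
OX   NCBI_TaxID=9031;
//"""

fields = parse_flat_entry(entry)
entry_name = fields["ID"][0].split()[0]
entry_accessions = accessions(fields)
```

Keeping the raw values per line code makes it easy to add further fields (for example CC or FT lines) without changing the parser.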
  • 8. BioView • Overview of biological information • Taxonomic descriptive statistics. A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to analyze in detail each set of entries sharing the same characteristic. - gene name, functional (catalytic activity, enzyme regulation, pathway…) and general descriptive information; - organism classification (OC) and organism species (OS); - non-experimental qualifiers (by similarity, putative or probable).
  • 9. Pipeline of the BioView page. Lines used: ID, AC, DE, CC ('FUNCTION', 'PATHWAY', 'CATALYTIC ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT', 'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION', 'TISSUE SPECIFICITY'), OS, OC. Taxonomy stored as node (parent): Eukaryota (-); Viridiplantae (Eukaryota); Streptophyta (Viridiplantae); Embryophyta (Streptophyta); Tracheophyta (Embryophyta); ... (...)
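Slide 9 shows the organism classification turned into a list of taxa with their parents. The sketch below illustrates one way to derive such pairs from an OC lineage and to count how many entries fall under each taxon. It is an illustration only: `lineage_pairs` and `count_taxa` are hypothetical names, and the second lineage in the example is invented for demonstration.

```python
from collections import Counter

def lineage_pairs(oc_value):
    """Split an OC lineage into (taxon, parent) pairs; the root has parent '-'."""
    taxa = [t.strip() for t in oc_value.rstrip(".").split(";") if t.strip()]
    parents = ["-"] + taxa[:-1]
    return list(zip(taxa, parents))

def count_taxa(lineages):
    """Count, for every taxon, how many lineages (entries) pass through it."""
    counts = Counter()
    for oc_value in lineages:
        for taxon, _parent in lineage_pairs(oc_value):
            counts[taxon] += 1
    return counts

pairs = lineage_pairs(
    "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta."
)
taxon_counts = count_taxa([
    "Eukaryota; Viridiplantae; Streptophyta.",
    "Eukaryota; Metazoa; Chordata.",  # hypothetical second entry
])
```

The per-taxon counts are the kind of number a collapsible hierarchy view can display next to each node.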
  • 10. Screenshot legend: number of entries; non-redundant annotation; number of entries with non-experimental qualifier; number of entries with annotated experimental qualifier
  • 11. On mouse-click the relevant entry names are listed. Expand the whole hierarchy
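The counts listed on slide 10 can be sketched as a grouping step: identical annotation texts are merged into one non-redundant annotation, and the entries behind it are split by whether the text carries a non-experimental qualifier. The code below is a rough illustration under two assumptions that are not stated on the slides: qualifiers appear in parentheses as "(By similarity)", "(Potential)" or "(Probable)", and an annotation without such a qualifier is treated as experimental. The name `summarize` and the example annotations are hypothetical.

```python
import re
from collections import defaultdict

# Assumed qualifier spellings; adjust to the annotation source actually used.
QUALIFIER = re.compile(r"\((by similarity|potential|probable)\)", re.IGNORECASE)

def summarize(annotations):
    """Group entries by non-redundant annotation text and qualifier status.

    annotations: dict mapping entry name -> annotation text.
    Returns: dict mapping normalized text -> lists of entry names.
    """
    groups = defaultdict(lambda: {"experimental": [], "non_experimental": []})
    for entry, text in annotations.items():
        # Normalize: drop the qualifier, collapse whitespace, drop final period.
        key = QUALIFIER.sub("", text)
        key = " ".join(key.split()).rstrip(".").strip()
        kind = "non_experimental" if QUALIFIER.search(text) else "experimental"
        groups[key][kind].append(entry)
    return dict(groups)

# Hypothetical annotations for three entries.
summary = summarize({
    "ENTRY_A": "Binds biotin.",
    "ENTRY_B": "Binds biotin (By similarity).",
    "ENTRY_C": "Binds avidin (Probable).",
})
```

The number of keys in `summary` is the number of non-redundant annotations, and the list lengths give the per-annotation entry counts.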
  • 12. FeatureView • Interactive interface for visualizing function-related features on the protein sequence and 3D structure • This page should allow the user to analyze sequence and structure together on a broad set of data, showing as much of the available information as possible in a clear and intuitive way.
  • 13. Function-related features derived from the FT lines of UniProtKB (active sites, binding sites, domains, transmembrane regions, DNA-binding domains…) are mapped on the alignment and highlighted to give a clear and compact presentation of the relevant information. The features are mapped on the structure in the same way, making it possible to identify conserved regions and sites. Sequence, FT, Structure
  • 14. FeatureView • Choose the best structure • Alignment • Mapping the feature on the alignment and on the structure
  • 15. Choose the best structure. SSMap* mapping (excerpt): ... '91 ' => '91', '25 ' => '25', '92 ' => '92', '81 ' => '82', '71 ' => '71', '21 ' => '23', '-' => 'x', '61 ' => '61', '37 ' => '37', '68 ' => '68', '50 ' => '50', '18 ' => '15', ... * F.P.A. David and Y.L. Yip. SSMap: a new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase. Submitted.
  • 16. Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
  • 17. FeatureView • Choose the best structure • Alignment • Mapping the feature on the alignment and on the structure
  • 18. Alignment Input file Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32(5), 1792-97.
  • 19. FeatureView • Choose the best structure • Alignment • Mapping the feature on the alignment and on the structure
  • 20. Alignment Input file FT (Feature Table) lines I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'); II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED');
  • 21. FT (Feature Table) lines. Group I ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'): distinct font color, with a toolbox containing the description of the feature (entry name, feature key, sequence position, description). Group II ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED'): different background colour, with a toolbox with the same content. Overlaps within the first group are represented in the toolbox; overlaps within the second group are shown with a different background color.
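The two feature groups on slides 20 and 21 translate naturally into a lookup that decides how each alignment column is decorated. The following sketch assumes features arrive as (key, start, end, description) tuples with 1-based inclusive positions; that tuple layout, the style labels "font" and "background", and the function names are assumptions for illustration, not the application's actual data model.

```python
# Feature keys as listed on the slides.
GROUP_I = {
    "CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
    "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK",
}
GROUP_II = {
    "PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
    "DNA_BIND", "REGION", "COILED",
}

def style_for(feature_key):
    """Group I features change the font color, group II the background."""
    if feature_key in GROUP_I:
        return "font"
    if feature_key in GROUP_II:
        return "background"
    return None  # feature keys outside both groups are not displayed

def annotate_columns(length, features):
    """Collect, per sequence position, the features to show in each style."""
    columns = [{"font": [], "background": []} for _ in range(length)]
    for key, start, end, description in features:
        style = style_for(key)
        if style is None:
            continue
        # Clamp the range to the sequence and convert to 0-based indices.
        for pos in range(max(start, 1), min(end, length) + 1):
            columns[pos - 1][style].append((key, description))
    return columns

# Hypothetical features on a five-residue sequence.
columns = annotate_columns(5, [
    ("ACT_SITE", 3, 3, "Proton acceptor"),
    ("DOMAIN", 2, 4, "Example domain"),
    ("CHAIN", 1, 5, "Not in either group"),
])
```

A position whose list holds more than one item is an overlap, which is where the toolbox (group I) or an alternative background color (group II) would come into play.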
  • 22. PDB ATOM records with the value to display written in the last column:
      ATOM 1817 N MET B 3 -31.380 87.126 39.296 1.0 100.00
      ATOM 1818 CA MET B 3 -30.684 88.400 39.176 1.0 100.00
      ATOM 1819 C MET B 3 -30.858 88.967 37.771 1.0 100.00
      ATOM 1820 O MET B 3 -30.195 88.514 36.832 1.0 100.00
      ATOM 1821 CB MET B 3 -29.190 88.285 39.498 1.0 100.00
      ATOM 1822 CG MET B 3 -28.465 89.628 39.501 1.0 100.00
      ATOM 1823 SD MET B 3 -26.671 89.415 39.661 1.0 100.00
      ATOM 1824 CE MET B 3 -26.312 90.705 40.863 1.0 100.00
      ATOM 1825 N GLU B 4 -31.750 89.938 37.638 1.0 50.00
      ATOM 1826 CA GLU B 4 -31.927 90.498 36.300 1.0 50.00
      …
      (figure labels: Alignment position; 00.00, 50.00, 100.00)
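Slide 22 shows ATOM records whose last numeric column carries a per-residue value, which a viewer such as Jmol can then use for coloring. A minimal sketch of that step, assuming the standard fixed-column PDB layout (chain ID in column 22, residue number in columns 23-26, temperature factor in columns 61-66), is given below. The helper `atom_line` exists only to build a well-formed example record; both function names and the example values are assumptions.

```python
def atom_line(serial, name, res, chain, resnum, x, y, z, occ=1.0, b=0.0):
    """Build one fixed-column PDB ATOM record (helper for the example only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resnum:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")

def set_bfactor(pdb_text, values, default=0.0):
    """Overwrite the temperature-factor column with one value per residue.

    values: dict mapping (chain_id, residue_number) -> float.
    Residues missing from the dict receive `default`.
    """
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            chain = line[21]             # column 22
            resnum = int(line[22:26])    # columns 23-26
            value = values.get((chain, resnum), default)
            # Columns 61-66 hold the temperature factor, formatted as %6.2f.
            line = line[:60] + f"{value:6.2f}" + line[66:]
        out.append(line)
    return "\n".join(out)

example = "\n".join([
    atom_line(1817, "N", "MET", "B", 3, -31.380, 87.126, 39.296),
    atom_line(1825, "N", "GLU", "B", 4, -31.750, 89.938, 37.638),
])
colored = set_bfactor(example, {("B", 3): 100.0, ("B", 4): 50.0})
```

Only the six characters of the temperature-factor field are replaced, so coordinates and all other columns stay untouched.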
  • 23. On mouse-click run blastp on UniProt web page
  • 24. On mouse-click start Jalview applet
  • 25. Conservation • Interactive interface for visualizing the structural conservation of protein groups on the protein sequence and 3D structure • Highlights positions and regions conserved in the group of proteins • Conservation scores are mapped onto the multiple sequence alignment (MSA) and onto the 3D structure
  • 26. Scoring residue conservation Input file
  • 27. Scoring methods
      Method name | Type of score | Description
      basicmdm | Sum-of-Pairs (SP), matrix score | Simplest SP score possible
      entropynorm7 | Entropic | Normalized Shannon entropy with 7 symbol types
      entropynorm21 | Entropic | Normalized Shannon entropy with 21 symbol types
      trident | Entropic, matrix score, sequence weighted | Mixed model score
      valdar01 | SP, matrix score, sequence weighted | Score used in Valdar & Thornton 2001
      Example output (score # alignment column):
      0.000 # ---S--------
      0.000 # ---T--------
      0.000 # ---S--------
      0.000 # ---T--------
      0.000 # ---S--------
      0.024 # ---TM-M-----
      0.320 # MMMSV-VVMM--
      0.278 # VVVDHMHHGGG-
      0.500 # LLLYLLWWLLL-
      0.603 # SSSSTTTSSSS-
      0.391 # PAAAPAAEDDD-
      0.424 # AAAAEEEVGGQT
      0.809 # DDDDEEEEEEEE
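To make the entropic scores in the table concrete, here is a small sketch of a normalized Shannon entropy over 21 symbol types (20 amino acids plus the gap). It assumes the conservation score is defined as 1 - H / ln(21), with the gap counted as an ordinary symbol and no sequence weighting. That definition is an assumption chosen for illustration; it is not claimed to reproduce the numbers in the example output above, which come from the scoring program used in the project.

```python
import math
from collections import Counter

def entropy_conservation(column, n_symbols=21):
    """Return 1 - H / ln(n_symbols) for one alignment column.

    H is the Shannon entropy of the symbol frequencies in the column.
    1.0 means fully conserved; lower values mean more variability.
    """
    if not column:
        raise ValueError("empty alignment column")
    total = len(column)
    counts = Counter(column)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(n_symbols)

def column_scores(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [entropy_conservation("".join(col)) for col in zip(*alignment)]

# Hypothetical three-sequence alignment: one conserved and one variable column.
scores = column_scores(["DA", "DC", "DE"])
```

In this toy example the first column is fully conserved and scores 1.0, while the second column, with three different residues, scores lower.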
  • 28. At the moment this page is a framework into which the visualization of information such as annotation, and of sites that differ in conservation between protein subgroups, can be integrated. • Develop a method to compare two or more protein subgroups • Profile. Input file
  • 29. Tree. The phylogenetic tree of the protein group will be shown on this page.
  • 30. Software for phylogenetic tree visualization and manipulation http://bioinfo.unice.fr/biodiv/Tree_editors.html - Treedyn: works on a local machine but not server side (graphical applet needed) - Phylodendron: trouble with the CGI script - phyfi: private program that cannot be installed on one's own server; possibly usable via URL request - nexplorer: NEXUS format needed and cannot be installed on one's own server - dnd2svg.pl: strict sequence number; output only in SVG format - TreeFam: private program only - ATV 1.92
  • 31. Input file
      Gascuel O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14:685-695.
      Tree in Newick format:
      ((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);
      http://www.phylosoft.org/atv/ Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.
      http://www.jalview.org/ Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004) The Jalview Java Alignment Editor. Bioinformatics, 20, 426-7.
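A tree in Newick format, like the one on slide 31, lists leaf names directly after an opening parenthesis or a comma. The sketch below uses that property to extract the entry names from a tree string, for instance to check that every sequence in the alignment appears in the tree. It is a simplified reading of the format: quoted labels and unnamed leaves are not handled, and the function name `newick_leaves` is hypothetical.

```python
import re

def newick_leaves(newick):
    """Return leaf labels in order of appearance.

    A leaf label follows '(' or ','; labels after ')' belong to internal
    nodes and are skipped. Quoted labels are not supported.
    """
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)

# Small excerpt in the style of the tree shown on the slide.
tree = ("((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,"
        "ACADM_PIG:0.057735);")
leaves = newick_leaves(tree)
```

For anything beyond a quick consistency check, a dedicated tree library would be the safer choice, since Newick allows more syntax than this pattern covers.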
  • 32. Future plans • Normalize HTML pages according to the W3C standard • Improve the use of CSS • Test the application on different web browsers • Write the application in a server-side language • Integrate the application with other databases • Ensure multiple access to the application and an analysis history • Develop a view of the phylogenetic tree to show and interact with additional information • Hierarchical phylogeny-based classification in UniProtKB
  • 33. Following the hierarchical phylogeny-based classification in UniProtKB
  • 34. Acknowledgements • Brigitte Boeckmann & Rita Casadio • Swiss-Prot lab, Biocomputing group • Fabrice David & Marco Vassura • All my friends and Fra • Dolores and Davide. And now?
  • 35. Practical examples - identify similarities and differences between the protein sequences, as well as in the information available for the given protein group; - estimate the ranges within which functional information can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms; - identify errors in the annotation of proteins.
  • 36. A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to analyze in detail each set of entries sharing the same characteristic. Example: Acetylglutamate kinase family
  • 37. Acyl-CoA dehydrogenase family
  • 38. gatB/gatE family
  • 39. IPP transferase family
