UniView
UniView Presentation Transcript

  • Web-based application to survey properties of homologous proteins. Candidate: Diego Poggioli. Supervisor: Prof. Rita Casadio. Co-supervisor: Dr. Brigitte Boeckmann
  • Bio-problem: visualizing and interacting with biological data and performing a comparative protein analysis. Info-solution: a web application (CGI). The portal gives access to four web pages: 1) function-related annotation derived from UniProtKB/Swiss-Prot; 2) features of the protein group; 3) conservation score; 4) tree.
  • Members of a protein family normally perform a general biochemical function in common, but one or more subgroups may evolve a slightly different function, such as different substrate specificity.
  • By comparing groups and subgroups of proteins it is possible to identify or estimate: • similarities and differences between the protein sequences, as well as in the information available for the given protein group; • the range within which functional information can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms; • errors in the annotation of proteins.
  • Visualization of and interaction with biological data
  • Available from any PC; system and browser independent. PHP, CGI, dynamic HTML pages. JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…
  • Form filling and data type. Example UniProtKB entry (lines are cut off on the slide):
    ID AVID_CHICK Reviewed; 152 AA.
    AC P02701; Q91958; Q98SH4;
    DT 21-JUL-1986, integrated into
    DT 11-SEP-2007, sequence version
    DT 10-JUN-2008, entry version 87.
    DE Avidin precursor.
    GN Name=AVD;
    OS Gallus gallus (Chicken).
    OC Eukaryota; Metazoa; Chordata
    OC Archosauria; Dinosauria
    OC Neognathae; Galliformes
    OX NCBI_TaxID=9031;
    RN [1]
    RP NUC
    RX MEDLINE=87203384; PubMed
    RA Gope M.L., Keinaenen R.A.,
    RA Zarucki-Schulz T., O'Malley B.
    RT "Molecular cloning of the chic
    RL Nucleic Acids Res. 15:3595
    RN [2]
    RP NUCLEOTIDE SEQUENCE [MR
    RX MEDLINE=90355928; PubMed
    RA Chandra G., Gray J.G.;
    RT "Cloning and expression of
    RL Methods Enzymol. 184:70
    …
    Entry names: AVID_CHICK AVR2_CHICK AVR4_CHICK AVR1_CHICK AVR3_CHICK AVR6_CHICK AVR7_CHICK
    Accession numbers: P02701 P56732 P56734 O13153 P56733 P56735 P56736
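A minimal sketch of how an entry like the one on this slide could be read, assuming the standard flat-file layout in which every line starts with a two-letter line code. The function name `parse_entry` and the shortened `SAMPLE` text are illustrative and not taken from the original application.

```python
from collections import defaultdict

def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and the "//" entry terminator.
        if len(line) < 2 or line.startswith("//"):
            continue
        code, content = line[:2], line[2:].strip()
        fields[code].append(content)
    return dict(fields)

# Shortened sample built from the fields visible on the slide (hypothetical layout).
SAMPLE = """\
ID   AVID_CHICK              Reviewed;         152 AA.
AC   P02701; Q91958; Q98SH4;
DE   Avidin precursor.
GN   Name=AVD;
OS   Gallus gallus (Chicken).
OX   NCBI_TaxID=9031;
//
"""

entry = parse_entry(SAMPLE)
entry_name = entry["ID"][0].split()[0]
accessions = [a.strip() for a in entry["AC"][0].split(";") if a.strip()]
print(entry_name, accessions)
```

Keeping every line code as a list preserves multi-line fields such as DT, OC and RA, which can then be joined or inspected one by one.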
  • BioView • Overview of biological information • Taxonomic descriptive statistics. A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to examine in detail the entries that share the same feature. - gene name, functional (catalytic activity, enzyme regulation, pathway…) and general descriptive information; - organism classification (OC) and organism species (OS); - non-experimental qualifiers (by similarity, putative or probable).
  • Pipeline of the BioView page. Fields read from each entry: ID, AC, DE; CC topics: 'FUNCTION', 'PATHWAY', 'CATALYTIC ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT', 'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION', 'TISSUE SPECIFICITY'; OS, OC. Taxon and parent taxon from the OC lines:
    Eukaryota      -
    Viridiplantae  Eukaryota
    Streptophyta   Viridiplantae
    Embryophyta    Streptophyta
    Tracheophyta   Embryophyta
    ...            ...
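The taxon and parent pairs on this slide could be derived from OC lines roughly as follows. This is a sketch under assumptions: the helper names `lineage` and `build_hierarchy` are hypothetical, and the chicken lineage is the shortened one visible in the example entry, not the full classification.

```python
from collections import Counter

def lineage(oc_lines):
    """Join OC line contents and split them into an ordered list of taxa."""
    joined = " ".join(oc_lines).rstrip(".")
    return [t.strip() for t in joined.split(";") if t.strip()]

def build_hierarchy(lineages):
    """Return (parent_of, entry_count) for a collection of lineages."""
    parent_of = {}
    count = Counter()
    for taxa in lineages:
        parent = "-"  # "-" marks the root, as on the slide
        for taxon in taxa:
            parent_of.setdefault(taxon, parent)
            count[taxon] += 1  # one more entry below this taxon
            parent = taxon
    return parent_of, count

plant = lineage(["Eukaryota; Viridiplantae; Streptophyta; Embryophyta;",
                 "Tracheophyta."])
chicken = lineage(["Eukaryota; Metazoa; Chordata."])
parent_of, count = build_hierarchy([plant, chicken])

for taxon, parent in parent_of.items():
    print(f"{taxon:15s} {parent:15s} {count[taxon]}")
```

The per-taxon counts are the kind of number a taxonomic summary can display next to each node of an expandable hierarchy.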
  • Number of entries; non-redundant annotation; number of entries with a non-experimental qualifier; number of entries with an annotated experimental qualifier
  • On mouse-click the relevant entry names are listed. Expand the whole hierarchy
  • FeatureView • Interactive interface for visualizing function-related features on the protein sequence and 3D structure • This page should let the user analyze sequence and structure together on a broad set of data, showing as much of the available information as possible in a clear and intuitive way.
  • Function-related features derived from the FT lines of UniProtKB (active sites, binding sites, domains, transmembrane regions, DNA-binding domains…) are mapped on the alignment and highlighted to give a clear and compact presentation of the relevant information. The features are mapped on the structure in the same way, making it possible to identify conserved regions and sites. Sequence, FT, Structure
  • FeatureView • Choose the best structure • Alignment • Mapping the feature on the alignment and on the structure
  • Choose the best structure * ... '91 ' => '91', '25 ' => '25', '92 ' => '92', '81 ' => '82', '71 ' => '71', '21 ' => '23', '-' => 'x', '61 ' => '61', '37 ' => '37', '68 ' => '68', '50 ' => '50', '18 ' => '15', ... F.P.A. David and Y.L. Yip. SSMap*: a new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase. Submitted
  • Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
  • Alignment Input file Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32(5), 1792-97.
  • Alignment. Input file. FT (Feature Table) lines. Group I: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'); group II: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED').
  • FT (Feature Table) lines. Group I ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'): distinct font color and a toolbox containing the description of the feature (entry name, feature key, sequence position, description). Group II ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED'): different background colour and a toolbox with the content as described above. - Overlaps within the first group are represented in the toolbox. - Overlaps within the second group get a different background color.
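The two feature groups and their display rules could be applied per sequence position as in the sketch below. The key sets are copied from the slide; the function `styles_per_position`, the tuple layout `(key, start, end)` and the example features are assumptions made for illustration, and the mapping from sequence positions to alignment columns is left out.

```python
# Feature keys as listed on the slide.
GROUP_I = {"CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
           "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK"}
GROUP_II = {"PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
            "DNA_BIND", "REGION", "COILED"}

def styles_per_position(features, length):
    """Collect, for each 1-based sequence position, the group I keys (font colour)
    and the group II keys (background colour) that cover it."""
    cols = [{"font": [], "background": []} for _ in range(length)]
    for key, start, end in features:
        if key in GROUP_I:
            slot = "font"
        elif key in GROUP_II:
            slot = "background"
        else:
            continue  # keys outside both groups are not displayed
        for pos in range(max(start, 1), min(end, length) + 1):
            cols[pos - 1][slot].append(key)
    return cols

# Hypothetical features on a sequence of length 8.
features = [("DOMAIN", 2, 6), ("ACT_SITE", 4, 4), ("BINDING", 4, 4)]
cols = styles_per_position(features, 8)
print(cols[3])
```

A position with more than one key in the same slot is an overlap: for group I both keys would go into the toolbox text, for group II the page would pick a separate overlap background.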
  • ATOM records of the structure file:
    ATOM 1817 N  MET B 3 -31.380 87.126 39.296 1.0 100.00
    ATOM 1818 CA MET B 3 -30.684 88.400 39.176 1.0 100.00
    ATOM 1819 C  MET B 3 -30.858 88.967 37.771 1.0 100.00
    ATOM 1820 O  MET B 3 -30.195 88.514 36.832 1.0 100.00
    ATOM 1821 CB MET B 3 -29.190 88.285 39.498 1.0 100.00
    ATOM 1822 CG MET B 3 -28.465 89.628 39.501 1.0 100.00
    ATOM 1823 SD MET B 3 -26.671 89.415 39.661 1.0 100.00
    ATOM 1824 CE MET B 3 -26.312 90.705 40.863 1.0 100.00
    ATOM 1825 N  GLU B 4 -31.750 89.938 37.638 1.0 50.00
    ATOM 1826 CA GLU B 4 -31.927 90.498 36.300 1.0 50.00
    …
    Legend: Alignment position; values 100.00, 00.00, 50.00
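The last column of the ATOM records on this slide appears to carry a per-residue value (100.00, 50.00) rather than a measured temperature factor, presumably so that the viewer can colour residues by it. A minimal sketch of writing such a value, assuming the standard fixed-column PDB layout in which the temperature factor occupies columns 61 to 66; the function name `set_b_factor` and the sample record are illustrative.

```python
def set_b_factor(atom_line, value):
    """Return the ATOM/HETATM record with columns 61-66 replaced by `value`."""
    if not atom_line.startswith(("ATOM", "HETATM")):
        return atom_line  # leave all other record types untouched
    padded = atom_line.rstrip("\n").ljust(66)
    return padded[:60] + f"{value:6.2f}" + padded[66:]

# Sample record assembled column by column (coordinates taken from the slide).
line = (f"ATOM  {1817:5d}  N   MET B{3:4d}    "
        f"{-31.380:8.3f}{87.126:8.3f}{39.296:8.3f}{1.0:6.2f}{0.0:6.2f}")

new_line = set_b_factor(line, 50.0)
print(new_line)
```

Every atom of a residue would receive the same value, so a viewer that colours by temperature factor shows the residue in one colour.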
  • On mouse-click run blastp on UniProt web page
  • On mouse-click start Jalview applet
  • Conservation • Interactive interface for visualizing the structural conservation of protein groups on the protein sequence and 3D structure • Highlights positions and regions conserved in the group of proteins • Conservation scores are mapped onto the multiple sequence alignment (MSA) and onto the 3D structure
  • Scoring residue conservation Input file
  • Scoring methods
    Method name | Type of score | Description
    basicmdm | Sum-of-Pairs (SP), matrix score | Simplest SP score possible
    entropynorm7 | Entropic | Normalized Shannon entropy with 7 symbol types
    entropynorm21 | Entropic | Normalized Shannon entropy with 21 symbol types.
    trident | Entropic, matrix score, sequence weighted | Mixed model score.
    valdar01 | SP, matrix score, sequence weighted | Score used in Valdar & Thornton 2001
    Example output (score # alignment column):
    0.000 # ---S--------
    0.000 # ---T--------
    0.000 # ---S--------
    0.000 # ---T--------
    0.000 # ---S--------
    0.024 # ---TM-M-----
    0.320 # MMMSV-VVMM--
    0.278 # VVVDHMHHGGG-
    0.500 # LLLYLLWWLLL-
    0.603 # SSSSTTTSSSS-
    0.391 # PAAAPAAEDDD-
    0.424 # AAAAEEEVGGQT
    0.809 # DDDDEEEEEEEE
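To illustrate the entropic scores listed on this slide, here is a sketch of one common form of normalized Shannon entropy for a single alignment column: 1 - H / log(K), with K symbol types. It is not the scoring program used in the presentation; treating the gap as an ordinary symbol and normalizing by log(K) are assumptions, so the values will not necessarily match the example output on the slide.

```python
import math
from collections import Counter

def entropy_conservation(column, n_symbols=21):
    """Return 1 - H/log(K) for one alignment column.

    1.0 means a fully conserved column; lower values mean more variability.
    Gaps ('-') are counted like any other symbol (an assumption of this sketch).
    """
    if not column:
        raise ValueError("empty alignment column")
    n = len(column)
    counts = Counter(column)
    # Shannon entropy of the observed symbol frequencies (natural log).
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - h / math.log(n_symbols)

for col in ["DDDDDDDDDDDD", "DDDDEEEEEEEE", "AAAAEEEVGGQT"]:
    print(col, round(entropy_conservation(col), 3))
```

Matrix-based scores such as sum-of-pairs additionally reward substitutions between similar residues, which a purely entropic score like this one cannot do.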
  • At the moment it is an integrated framework for developing the visualization of information such as annotation, and of sites that differ in conservation between protein subgroups. • Develop a method to compare two or more protein subgroups • Profile. Input file
  • Tree • The phylogenetic tree of the protein group is shown on this page.
  • Software for phylogenetic tree visualization and manipulation, http://bioinfo.unice.fr/biodiv/Tree_editors.html - Treedyn: works on a local machine but not server-side (graphical applet needed) - Phylodendron: trouble with the CGI script - phyfi: private program, cannot be installed on one's own server, possibly usable via URL request - nexplorer: NEXUS format needed, and cannot be installed on one's own server - dnd2svg.pl: strict sequence number, output only in SVG format - TreeFam: only private program. ATV 1.92
  • Input file. Gascuel O. 1997. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14:685-695. Tree in Newick format: ((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577); http://www.phylosoft.org/atv/ Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384. http://www.jalview.org/ Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004). The Jalview Java Alignment Editor. Bioinformatics, 20, 426-7
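A small sketch of reading the leaves of a Newick string like the one on this slide, for example to link tree tips back to entry names. It relies on one observation about the format: a leaf label follows "(" or ",", while an internal branch length follows ")". The function name `newick_leaves` is hypothetical, the sample is a shortened tree built from the first three tips, and quoted labels or comments are not handled.

```python
import re

# A leaf is "name:length" directly after "(" or ","; internal nodes follow ")".
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)\s*:\s*([0-9.eE+-]+)")

def newick_leaves(newick):
    """Return (leaf name, branch length) pairs in the order they appear."""
    return [(name, float(length)) for name, length in LEAF.findall(newick)]

tree = "((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579);"
leaves = newick_leaves(tree)
for name, length in leaves:
    print(name, length)
```

For anything beyond listing tips, such as rerooting or drawing, a full Newick parser or a dedicated tree viewer is the better tool.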
  • Future plans • Normalize HTML pages according to the W3C standard • Improve the use of CSS • Test the application on different web browsers • Write the application in a server-side language • Integrate the application with other databases • Ensure multi-user access to the application and an analysis history • Develop a view of the phylogenetic tree to show and interact with additional information • Hierarchical phylogeny-based classification in UniProtKB
  • Following the hierarchical phylogeny-based classification in UniProtKB
  • Acknowledgements • Brigitte Boeckmann & Rita Casadio • Swiss-Prot lab, Biocomputing group • Fabrice David & Marco Vassura • All my friends and Fra • Dolores and Davide. And now?
  • Practical examples - identify similarities and differences between the protein sequences, as well as in the information available for the given protein group; - estimate the range within which functional information can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms; - identify errors in the annotation of proteins.
  • A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to examine in detail the entries that share the same feature. Acetylglutamate kinase family
  • Acyl-CoA dehydrogenase family
  • gatB/gatE family
  • IPP transferase family