VAMSAS Project
September 1st, 2005 – February 2006
Pierre Marguerite
What is VAMSAS?
An open framework that facilitates the interoperation of advanced tools for phylogenetics, sequence analysis, and structural bioinformatics by providing a common model for bioinformatic data exchange, web service discovery and interaction.
Bringing three programs together…
The VAMSAS Framework: Visual Analysis of Molecular Sequences, Alignments and Structures
  • Jalview: alignment visualization, sequence analysis (University of Dundee; Geoff Barton & David Martin; Jim Procter, Andrew Waterhouse)
  • TOPALi: DNA recombination, phylogenetic analysis (Biomathematics and Statistics Scotland (BioSS) at the Scottish Crop Research Institute (SCRI); Frank Wright & David Marshall; Iain Milne)
  • [email_address]: molecular graphics, conformation analysis, reaction diagrams (Tom Oldfield & Kim Henrick; Pierre Marguerite)
http://www.vamsas.ac.uk
 
The VAMSAS Document
  • XML model for core biological data types and annotation
  • Database-like primary keys
  • Provenance for primary and derived data: what was done and when
  • References for primary data: sequence IDs, database cross references
  • Data storage for each VAMSAS application
[Diagram labels: biological sequence and analysis data; Jalview data; AstexViewer data; TOPALi data; references; provenance]
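The ideas on this slide (primary keys, provenance, cross references, per-application storage) can be illustrated with a minimal sketch. Every element and attribute name below is hypothetical and is not taken from the actual VAMSAS schema; the sequence, accession and timestamp are placeholders.

```python
import xml.etree.ElementTree as ET


def build_document():
    """Build a toy document; all element names are hypothetical, not the VAMSAS schema."""
    doc = ET.Element("document")

    # Core biological data with a database-like primary key ("id").
    seq = ET.SubElement(doc, "sequence", id="seq1", name="example_protein")
    seq.text = "MKTAYIAKQR"

    # Reference for the primary data: a database cross reference (placeholder values).
    ET.SubElement(seq, "dbref", source="ExampleDB", accession="X00000")

    # Provenance: what was done and when (placeholder timestamp).
    prov = ET.SubElement(doc, "provenance")
    ET.SubElement(prov, "entry", ref="seq1", action="imported",
                  when="2005-01-01T00:00:00")

    # Private data store for one application.
    ET.SubElement(doc, "appdata", application="ExampleViewer")
    return doc


def find_by_id(doc, object_id):
    """Return the first element whose 'id' attribute matches, or None."""
    for elem in doc.iter():
        if elem.get("id") == object_id:
            return elem
    return None


doc = build_document()
xml_text = ET.tostring(doc, encoding="unicode")
```

Looking objects up by key, as in `find_by_id`, is what lets several applications refer to the same sequence without copying it.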
VAMSAS Applications
VAMSAS apps have three main functions:
  • Data import: translate data into VAMSAS XML
  • Data analysis: extend core data set, add new annotations
  • Data presentation: visualization and export; parameters recorded in the application's datastore
[Diagram labels: Control; Import, Analyse, Present; Map, Render, Filter; Data; VAMSAS]
VAMSAS Client Sessions
VAMSAS client library
  • Data exchange (many applications connect to one document)
      • Locked IO
      • Transport objects to/from document
      • Object ID queries
      • IO streams for application's own data
  • Session events
      • Handler chains
      • Document updates
      • Others join or leave session
Client library is 'lightweight'
  • Easy to adapt existing programs
  • 'Only' need to write mapping between VAMSAS and legacy data model
  • Easy to make new clients / add applications
[Diagram labels: existing bioinformatics application; adaptor; VAMSAS Client API; VAMSAS]
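A rough sketch of how locked IO and a session event handler chain might fit together. This is not the VAMSAS client API: the class, method and event names are invented for illustration, and a single in-process lock stands in for whatever locking the real library uses across applications.

```python
import threading


class Session:
    """Toy shared session; names and behaviour are assumptions, not the VAMSAS client library."""

    def __init__(self):
        self._lock = threading.Lock()   # guards the shared document store
        self._objects = {}              # object ID -> stored object
        self._handlers = []             # handler chain, called in order
        self.clients = set()

    def add_handler(self, handler):
        """Register handler(event, payload); returning True stops the chain."""
        self._handlers.append(handler)

    def _fire(self, event, payload):
        for handler in self._handlers:
            if handler(event, payload):
                break

    def join(self, client):
        self.clients.add(client)
        self._fire("client_joined", client)

    def leave(self, client):
        self.clients.discard(client)
        self._fire("client_left", client)

    def put(self, object_id, value):
        # Locked IO: only one writer touches the document at a time.
        with self._lock:
            self._objects[object_id] = value
        # Fire events outside the lock so handlers may call get() safely.
        self._fire("document_update", object_id)

    def get(self, object_id):
        with self._lock:
            return self._objects.get(object_id)
```

An existing program would then need only a thin adaptor that translates its own data model into `put` and `get` calls and registers a handler for updates.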
My work
  • Integrate the AstexViewer@MSD-EBI into the VAMSAS workflow
  • Conversion/processing of data
  • Export from AV-MSD (annotations for other VAMSAS applications)
  • Separate, dedicated application (VAMAV)
[Diagram labels: [email_address]; [email_address]; adaptor; VAMSAS Client API]
Conversion of data from VAMSAS document
Input of the AstexViewer@MSD-EBI:
  • Attribute file (grouping information)
  • ClustalW/FASTA alignment
  • PDB files
Process of conversion
  • Extract alignment sequences and annotations
  • Mapping with PDB ID
  • Sequence grouping
  • Generate required files (attribute, alignment, …)
  • Visualise data in AstexViewer@MSD-EBI
[Diagram labels: VAMSAS document (XML, zip); document processing; sequence grouping; generate required files; visualisation in [email_address]]
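The conversion steps above can be sketched as follows. This is an illustration under stated assumptions, not the VAMAV code: the function names are invented, the sequence names and PDB IDs are placeholders, and the tab-separated attribute layout is hypothetical because the slides do not describe the real attribute file format.

```python
from collections import defaultdict


def group_by_pdb(sequences, pdb_mapping):
    """Group sequence names by mapped PDB ID; unmapped sequences are skipped."""
    groups = defaultdict(list)
    for name in sequences:
        pdb_id = pdb_mapping.get(name)
        if pdb_id is not None:
            groups[pdb_id].append(name)
    return dict(groups)


def to_fasta(sequences):
    """Write aligned sequences in FASTA format (gaps kept as '-')."""
    lines = []
    for name, residues in sequences.items():
        lines.append(f">{name}")
        lines.append(residues)
    return "\n".join(lines) + "\n"


def to_attribute_text(groups):
    """Hypothetical attribute layout: one 'group<TAB>sequence' line per member."""
    lines = []
    for pdb_id in sorted(groups):
        for name in groups[pdb_id]:
            lines.append(f"{pdb_id}\t{name}")
    return "\n".join(lines) + "\n"


# Placeholder alignment and mapping, as if already extracted from the document.
alignment = {"seqA": "MK-TA", "seqB": "MKQTA", "seqC": "M--TA"}
mapping = {"seqA": "1abc", "seqB": "1abc", "seqC": "2xyz"}

groups = group_by_pdb(alignment, mapping)
fasta_text = to_fasta(alignment)
attribute_text = to_attribute_text(groups)
```

Keeping extraction, grouping and file generation as separate functions means the grouping rule can be changed without touching the file writers.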
Exporting from AV
  • Export of pre-calculated data
      • Context-dependent
      • Example: active site
  • Export of functional aspects
      • Current context (as button state)
First Version (18th October 2005)
  • Displays data only from VAMSAS document in the AV@EBI
  • Sequence grouping per structure
  • Java 1.5
  • Socket communication between the application and AV client
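As an illustration of socket communication between an application and a viewer client, here is a minimal sketch. The original used Java 1.5; Python is used here only for brevity. The newline-terminated "COMMAND argument" protocol and the command name are assumptions, since the slides do not describe the messages actually exchanged.

```python
import socket


def send_command(sock, command, argument):
    """Send one newline-terminated 'COMMAND argument' message (hypothetical protocol)."""
    line = f"{command} {argument}\n"
    sock.sendall(line.encode("utf-8"))


def read_command(sock):
    """Read bytes up to the next newline and split into (command, argument)."""
    data = bytearray()
    while not data.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:          # peer closed the connection
            break
        data.extend(chunk)
    text = data.decode("utf-8").rstrip("\n")
    command, _, argument = text.partition(" ")
    return command, argument


# A connected socket pair stands in for the application and the viewer client.
app_side, viewer_side = socket.socketpair()
send_command(app_side, "LOAD", "alignment.fasta")
received = read_command(viewer_side)
app_side.close()
viewer_side.close()
```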
Current version
  • Proper design: flexibility, configuration
  • Session management
  • Export of precalculated data: active site export, BMean
  • Sequence grouping per PFAM domain
  • Align structures: rotation-translation matrix (SSM, MSD API)
  • Execute ClustalW alignment
  • Web interface (JSP, Servlet), GUI
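Aligning structures with a rotation-translation matrix comes down to applying x' = R·x + t to every atom coordinate. The sketch below assumes the superposition is already available as a 3×3 rotation and a translation vector; how SSM returns these values is not shown in the slides, and the function names are illustrative only.

```python
import math


def apply_rt(rotation, translation, point):
    """Return R * point + t for a 3x3 rotation and 3-vector translation."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def rotation_z(angle_degrees):
    """Rotation about the z axis, used here only to build an example matrix."""
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def transform_structure(rotation, translation, coordinates):
    """Apply the same superposition to every atom coordinate."""
    return [apply_rt(rotation, translation, p) for p in coordinates]


# Placeholder coordinates: rotate 90 degrees about z, then shift by 1 along x.
atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
moved = transform_structure(rotation_z(90), (1.0, 0.0, 0.0), atoms)
```

The viewer can then display the transformed coordinates of one structure on top of the untouched reference structure.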
Future plans
  • Service for grouping/alignment using SSM and ClustalW
  • User management
  • Data validation/highlight
  • Integrate the new version of the VAMSAS client API
  • SIFTS initiative: spats web service
  • Meeting @ EBI (March 28th-29th)
  • E-family
Demonstration VAMSAS @ EBI

Group Meeting Vamsas Project Final


Editor's Notes

  • #3 A key objective of this project is to simplify collaboration between the phylogeny and protein structure communities by providing easy-to-use tools for complex methods. We will achieve this by adding significant new functions to three popular Java-based applications that span phylogenetics on DNA (TOPALi), protein sequence analysis and prediction (Jalview), and visualisation of three-dimensional structure (AstexViewer@MSD-EBI) vs
  • #4 Collaboration: University of Dundee (BBSRC, the Biotechnology and Biological Sciences Research Council) and Biomathematics & Statistics Scotland (BioSS). Jalview is a multiple alignment editor. TOPALi provides an intuitive graphical interface for working with existing statistical methods designed for detecting recombination in DNA sequence alignments. AstexViewer@EBI is a structural data browser, a version of the Astex Viewer™ for molecular structure developed by the EBI's Macromolecular Structure Database group. TOPALi is a tool for phylogenetic and recombination analysis developed at the Scottish Crop Research Institute; it detects the position of recombination breakpoints within DNA multiple alignments.
  • #5 Visualisation & Analysis of Biological Sequences, Alignments & Structures. The aim of this project is to simplify collaboration between the phylogeny and protein structure communities by providing easy-to-use tools for complex methods. A significant part of this challenge is to make independent tools interact through transparent data exchange, by adding SOAP support and significant new functions to three popular Java-based applications that span phylogenetics on DNA (TOPALi), protein sequence analysis and prediction (Jalview), and visualisation of three-dimensional structure (AstexViewer™@EBI-MSD (AV@EBI-MSD)). Each of these programs has a wide user base (Jalview has over 25,000 Google hits) and analytical tools essential to integrative biology. The project started in October 2004. It covers three tools: TOPALi, Jalview, and AstexViewer@EBI, which is developed within the EBI and which I am responsible for integrating.
  • #6 Sequence sets, alignments, annotations, history, phylogenetic tree (global tree of species)
  • #9 Two applications. AV is owned by Astex Technology. Lightweight AV client and browser compatibility (Java 1.1).
  • #10 Rotation translation matrix -> SSM
  • #11 The "Structure integration with function, taxonomy and sequence (SIFTS) initiative" aims to work towards the integration of various bioinformatics resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt, which are primary archival databases for structure and sequence data, is the absence of an up-to-date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. The project was started in 2001 and has resulted in robust mechanisms for exchanging data between the two primary data resources. This has dramatically improved the quality of annotation in both databases and is aiding the continuing improvement of legacy data. In the longer term this project will not only allow better and closer integration of derived-data resources but will also continue to improve the quality of all data in the primary resources.
  • #14 Avoid duplicated files Clusta
  • #15 Omega torsion angle