A Prlic - BioJava update




Presentation by Andreas Prlić at BOSC 2012: "BioJava Update"




Usage Rights

© All Rights Reserved
Presentation Transcript

  • How to use BioJava to calculate one billion protein structure alignments at the RCSB PDB website (Andreas Prlić)
  • My two hats: RCSB PDB and BioJava
  • Overview [Figure: number of released PDB entries by year, www.pdb.org]
  • Some of the things you can do at the RCSB PDB site • Advanced queries • Custom reports • Visualization (Jmol, Ligand Explorer) • Education section • Comparisons across the PDB, based on sequence and 3D structure similarities
  • Systematic Structural Alignment (www.pdb.org). Objective: find novel relationships. Example: Green Fluorescent Protein (GFP) and Nidogen-1 • Similar 11-stranded beta-barrel and internal helices • 3 Å RMSD, only 9% sequence identity • Nidogen-1: component of the basement membrane, no chromophore • GFP and NID-1 may share a common ancestor
  • Open Science Grid • Based on the FATCAT (rigid) algorithm: Ye Y, Godzik A. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics. 2003;19 Suppl 2:ii246-ii255 • Systematic comparisons of representative chains from 40% sequence identity clusters • 22,000 sequence clusters • 33,000 representative domains
  • Custom job management (PDB) • Sends out instructions to clients on the Open Science Grid • Java clients can run anywhere • Writes results to disk
  • Initial calculation on a frozen snapshot of the PDB: ~170k CPU hours on OSG • Incremental weekly updates (~1-2 million alignments): <1000 CPU hours • 1 billion alignments freely available at www.rcsb.org • Code: www.biojava.org
  • BioJava • Major rewrite: BioJava 3
  • BioJava 1 / BioJava 3 • Core data model: symbols/alphabets, counts, distributions • Genome/sequencing • Multiple sequence alignment • Structure alignment • Modfinder • AA properties • Protein disorder • Hmmer3 web service • NCBI web service • Parsers: GenBank/EMBL/BLAST
  • Acknowledgments • RCSB PDB: Spencer Bliven, Peter Rose, Phil Bourne • BioJava: all contributors, including A. Yates, J. Jacobsen, P. Troshin, M. Chapman, J. Gao, C.H. Koh, S. Foisy, R. Holland, G. Rimsa, M. Heuer, H. Brandstaetter-Mueller, S. Willis • Funding and support: RCSB PDB, Google Summer of Code, Open Science Grid
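The GFP/Nidogen-1 example describes the match with two numbers: 3 Å RMSD and 9% sequence identity. The sketch below shows how both measures are computed. It assumes the two structures are already superimposed and aligned residue by residue; the superposition step itself (for example the Kabsch algorithm) is not shown. The class name `SimilarityMeasures` and the gap-handling convention for identity are illustrative choices, not BioJava's API.

```java
/**
 * Illustrative sketch (hypothetical class, not BioJava API): two similarity
 * measures for an existing pairwise alignment. Assumes equivalent atoms are
 * already superimposed and paired index-by-index.
 */
final class SimilarityMeasures {

    private SimilarityMeasures() {}

    /** RMSD between equivalent atoms; coordinates are [n][3] arrays (e.g. in Angstrom). */
    static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need equal, non-empty atom sets");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int d = 0; d < 3; d++) {
                double diff = a[i][d] - b[i][d];
                sum += diff * diff; // squared deviation per coordinate
            }
        }
        return Math.sqrt(sum / a.length); // mean over atoms, then square root
    }

    /**
     * Percent identity over aligned columns, skipping columns where either
     * sequence has a gap ('-'). This is one common convention; others divide
     * by the length of the shorter sequence instead.
     */
    static double percentIdentity(String alnA, String alnB) {
        if (alnA.length() != alnB.length()) {
            throw new IllegalArgumentException("aligned strings must have equal length");
        }
        int aligned = 0;
        int identical = 0;
        for (int i = 0; i < alnA.length(); i++) {
            char x = alnA.charAt(i);
            char y = alnB.charAt(i);
            if (x == '-' || y == '-') {
                continue; // gap column: not counted
            }
            aligned++;
            if (x == y) {
                identical++;
            }
        }
        return aligned == 0 ? 0.0 : 100.0 * identical / aligned;
    }

    public static void main(String[] args) {
        double[][] a = {{0, 0, 0}, {1, 0, 0}};
        double[][] b = {{0, 0, 1}, {1, 0, 1}}; // every atom shifted by 1 along z
        System.out.println(rmsd(a, b));                        // 1.0
        System.out.println(percentIdentity("MK-VL", "MKAIL")); // 75.0
    }
}
```

Structure-based alignment reports both numbers because they can diverge sharply. A pair like GFP and Nidogen-1 can superimpose closely even though very few aligned residues are identical.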
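The Open Science Grid slide describes systematic comparisons among representative chains. The sketch below does not implement FATCAT. It only shows how an all-vs-all job list over a set of representatives can be enumerated, giving n(n-1)/2 unordered pairs. The class name `AllVsAllPairs` and the example IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (hypothetical names): enumerate every unordered pair
 * (i, j) with i < j from a list of representative IDs, i.e. an all-vs-all
 * job list. The number of pairs is n(n-1)/2.
 */
final class AllVsAllPairs {

    private AllVsAllPairs() {}

    /** Number of unordered pairs among n items; long avoids int overflow. */
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    /** Materializes the pair list; only sensible for small inputs. */
    static List<String[]> pairs(List<String> ids) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                out.add(new String[] {ids.get(i), ids.get(j)});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> reps = List.of("repA", "repB", "repC", "repD"); // hypothetical IDs
        System.out.println(pairs(reps).size()); // 6
        // Unique pairs among 22,000 items; plain arithmetic, not a figure reported on the slide.
        System.out.println(pairCount(22000));   // 241989000
    }
}
```

Because the count grows quadratically with the number of representatives, clustering at 40% sequence identity first keeps the comparison set tractable.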
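The architecture slide shows a custom job manager at the PDB that sends instructions to Java clients, with results written to disk. The following is a minimal local sketch of that fan-out/collect pattern, under two assumptions. First, a thread pool stands in for remote grid clients, so there is no networking and no Open Science Grid API. Second, `align()` is a hypothetical placeholder that returns a dummy score instead of running a real structure alignment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Illustrative sketch (hypothetical names): a manager submits alignment jobs
 * to workers and writes one tab-separated result line per job to a file.
 */
final class JobManagerSketch {

    private JobManagerSketch() {}

    /** Hypothetical placeholder for a real alignment; returns a dummy score. */
    static double align(String a, String b) {
        return (a.length() + b.length()) / 10.0;
    }

    static List<String> run(List<String[]> jobs, int workers, Path out)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String[] job : jobs) {
                // Each job is one pair; workers compute independently.
                futures.add(pool.submit(() -> job[0] + "\t" + job[1] + "\t" + align(job[0], job[1])));
            }
            List<String> lines = new ArrayList<>();
            for (Future<String> f : futures) {
                lines.add(f.get()); // collect in submission order
            }
            Files.write(out, lines); // "writes results to disk"
            return lines;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String[]> jobs = List.of(new String[] {"repA", "repB"}, new String[] {"repA", "repC"});
        Path out = Files.createTempFile("alignments", ".tsv");
        System.out.println(run(jobs, 2, out).size()); // 2
    }
}
```

Collecting futures in submission order keeps the output deterministic. A real grid deployment would also need retries for clients that disappear mid-job, which this sketch does not model.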
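The slide contrasts ~170k CPU hours for the initial snapshot with <1000 CPU hours for each weekly update. The sketch below illustrates why an incremental update is so much cheaper, assuming old-vs-old results are simply reused. When k new representatives join n existing ones, only pairs involving at least one new item must be computed: k·n new-vs-old plus k(k-1)/2 new-vs-new. The class name and sizes are hypothetical.

```java
/**
 * Illustrative sketch (hypothetical names): count of comparisons needed when
 * k new items are added to n existing ones, reusing all old-vs-old results.
 */
final class IncrementalUpdate {

    private IncrementalUpdate() {}

    /** New-vs-old pairs plus new-vs-new pairs. */
    static long newPairs(long n, long k) {
        return k * n + k * (k - 1) / 2;
    }

    public static void main(String[] args) {
        long n = 1000; // hypothetical existing set size
        long k = 10;   // hypothetical number of new entries
        System.out.println(newPairs(n, k)); // 10045
        // Check: a full recount equals old pairs plus the incremental pairs.
        long total = (n + k) * (n + k - 1) / 2;
        System.out.println(total == n * (n - 1) / 2 + newPairs(n, k)); // true
    }
}
```

For small k, the update cost grows roughly linearly with n rather than quadratically, which matches the large gap between the initial and weekly CPU budgets.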