A Prlic - BioJava update

Presentation by Andreas Prlić at BOSC 2012: "BioJava Update"

  1. How to use BioJava to calculate one billion protein structure alignments at the RCSB PDB website (Andreas Prlić)
  2. My two hats: RCSB PDB and BioJava
  3. Overview: number of released entries per year (chart, www.pdb.org)
  4. Some of the things you can do at the RCSB PDB site:
     • Advanced queries
     • Custom reports
     • Visualization (Jmol, Ligand Explorer)
     • Education section
     • Comparisons across the PDB, based on sequence and 3D structure similarities
  5. Systematic Structural Alignment (www.pdb.org)
     Objective: find novel relationships.
     Example: Green Fluorescent Protein (GFP) and Nidogen-1
     • Similar 11-stranded beta-barrel and internal helices
     • 3 Å RMSD, only 9% sequence identity
     • Nidogen-1: component of the basement membrane, no chromophore
     • GFP and NID-1 may share a common ancestor
  6. Open Science Grid
     • Based on the FATCAT (rigid) algorithm: Yuzhen Ye & Adam Godzik, "Flexible structure alignment by chaining aligned fragment pairs allowing twists", Bioinformatics 19 (Suppl. 2): ii246-ii255, 2003
     • Systematic comparisons of representative chains from 40% sequence identity clusters
     • 22,000 sequence clusters
     • 33,000 representative domains
  7. Custom job management at the PDB (diagram):
     • The job manager sends out instructions and is open to clients
     • Java clients can run anywhere, including the Open Science Grid
     • Results are written to disk
  8. Computing the alignments
     • Initial calculation on a frozen snapshot of the PDB: ~170k CPU hours on the OSG
     • Incremental weekly updates (~1-2 million alignments): <1000 CPU hours
     • 1 billion alignments freely available at www.rcsb.org
     • Code: www.biojava.org
  9. BioJava
     • Major rewrite: BioJava 3
  10. From BioJava 1 to BioJava 3 (diagram). Components shown:
      • Core data model: symbols/alphabets, counts, distributions
      • Genome/sequencing
      • Multiple sequence alignment
      • Structure alignment
      • Modfinder
      • AA properties
      • Protein disorder
      • HMMER3 web service
      • NCBI web service
      • Parsers: GenBank/EMBL/BLAST
  11. Acknowledgments
      • RCSB PDB: Spencer Bliven, Peter Rose, Phil Bourne
      • BioJava: all contributors, A. Yates, J. Jacobsen, P. Troshin, M. Chapman, J. Gao, C.H. Koh, S. Foisy, R. Holland, G. Rimsa, M. Heuer, H. Brandstaetter-Mueller, S. Willis
      • Funding: RCSB PDB, Google Summer of Code, Open Science Grid
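
Slide 5 summarizes the GFP/Nidogen-1 match as 3 Å RMSD at only 9% sequence identity. As a minimal sketch of what those two numbers measure, the hypothetical helper below (plain Java, not the BioJava API) assumes the residues of both chains have already been paired by a structural alignment and superimposed, which is the job an algorithm such as FATCAT performs. It computes RMSD over the paired CA coordinates and percent identity over gap-free aligned columns; other identity definitions, such as dividing by the shorter sequence length, are also in common use.

```java
/**
 * Minimal sketch of the two numbers quoted for GFP vs. Nidogen-1.
 * Hypothetical helper, not part of BioJava: it assumes the residues of both
 * chains were already paired by a structural alignment and superimposed.
 */
final class AlignmentMetrics {

    private AlignmentMetrics() {}

    /** RMSD over paired CA coordinates; row i of a and b holds x, y, z of pair i. */
    static double rmsd(double[][] a, double[][] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("need the same, non-zero number of paired positions");
        }
        double sumSq = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int d = 0; d < 3; d++) {
                double diff = a[i][d] - b[i][d];
                sumSq += diff * diff;
            }
        }
        return Math.sqrt(sumSq / a.length);
    }

    /** Percent identity over aligned columns where neither sequence has a gap ('-'). */
    static double percentIdentity(String alnA, String alnB) {
        if (alnA.length() != alnB.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int aligned = 0;
        int identical = 0;
        for (int i = 0; i < alnA.length(); i++) {
            char x = alnA.charAt(i);
            char y = alnB.charAt(i);
            if (x == '-' || y == '-') {
                continue; // gap columns are not counted
            }
            aligned++;
            if (x == y) {
                identical++;
            }
        }
        return aligned == 0 ? 0.0 : 100.0 * identical / aligned;
    }

    public static void main(String[] args) {
        double[][] a = {{0, 0, 0}, {1, 0, 0}};
        double[][] b = {{0, 0, 1}, {1, 0, 1}};
        System.out.println(rmsd(a, b));                        // 1.0
        System.out.println(percentIdentity("AC-DE", "ACGDF")); // 75.0
    }
}
```

A low identity paired with a low RMSD, as in the GFP example, is exactly the signal a sequence-only comparison would miss.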
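
Slide 6 lists 22,000 sequence clusters and 33,000 representative domains, and slide 8 contrasts a ~170k CPU-hour initial run with weekly updates under 1000 CPU hours. Counting pairs gives a feel for that scale. The sketch below assumes an all-vs-all comparison among representatives, which the slides imply but do not spell out, so its numbers illustrate orders of magnitude and are not the RCSB PDB's actual job counts. It also shows why updates can be incremental: adding k representatives to n existing ones only requires k*n new-vs-old plus k(k-1)/2 new-vs-new comparisons. The class and method names are hypothetical.

```java
/**
 * Sketch: counting pairwise structure comparisons, assuming an all-vs-all design
 * over representatives. Hypothetical helper, not part of BioJava or the RCSB PDB pipeline.
 */
final class ComparisonCounts {

    private ComparisonCounts() {}

    /** Unordered pairs among n representatives: n(n-1)/2. */
    static long allVsAll(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return n * (n - 1) / 2;
    }

    /**
     * Pairs added when k new representatives join n existing ones:
     * k*n new-vs-old plus k(k-1)/2 new-vs-new, i.e. allVsAll(n + k) - allVsAll(n).
     */
    static long newPairs(long n, long k) {
        if (n < 0 || k < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return k * n + allVsAll(k);
    }

    public static void main(String[] args) {
        System.out.println(allVsAll(22_000)); // 241989000
        System.out.println(allVsAll(33_000)); // 544483500
        System.out.println(newPairs(10, 3));  // 33
    }
}
```

For a fixed number of new representatives, the update cost grows linearly with n, whereas a full recomputation grows quadratically.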
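
Slide 7 sketches the architecture: a custom job manager sends out instructions, and Java clients that can run anywhere, such as on Open Science Grid nodes, do the work. The pull-based sketch below mimics that pattern inside a single JVM, under loudly stated assumptions: a queue plays the job manager, worker threads play the grid clients, a concurrent map stands in for results written to disk, and `alignPlaceholder` is a dummy in place of a real FATCAT alignment. All names, including the chain identifiers, are hypothetical; the slides do not show the actual RCSB PDB implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a pull-based job manager. Hypothetical names throughout: a queue plays the
 * job manager, worker threads play the grid clients, and a map stands in for results on disk.
 */
final class JobManagerSketch {

    private JobManagerSketch() {}

    /** Placeholder for a real structure alignment (e.g. FATCAT); returns a dummy score. */
    static double alignPlaceholder(String chainA, String chainB) {
        return chainA.length() + chainB.length();
    }

    /** Queues one job per unordered pair of chains and lets `workers` threads drain the queue. */
    static Map<String, Double> runAllPairs(List<String> chains, int workers) {
        BlockingQueue<String[]> jobs = new LinkedBlockingQueue<>();
        for (int i = 0; i < chains.size(); i++) {
            for (int j = i + 1; j < chains.size(); j++) {
                jobs.add(new String[] {chains.get(i), chains.get(j)});
            }
        }
        Map<String, Double> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                // Each "client" keeps pulling instructions until the queue is empty.
                String[] job;
                while ((job = jobs.poll()) != null) {
                    results.put(job[0] + "|" + job[1], alignPlaceholder(job[0], job[1]));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical chain identifiers.
        List<String> chains = List.of("1abc.A", "2xyz.B", "3pqr.A", "4lmn.C");
        System.out.println(runAllPairs(chains, 3).size()); // 6
    }
}
```

One reason to prefer a pull model on a grid: faster clients simply take more jobs, and the manager never needs to know in advance how many clients will show up.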