Your SlideShare is downloading. ×
0
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Hedlund_biogrid_BOSC2009
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Hedlund_biogrid_BOSC2009

377

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
377
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
2
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Biogrid – Bioinformatics for the grid Joel Hedlund <yohell@ifm.liu.se> Biogrid User and Developer Linköping University, Sweden Birds-of-a-feather session tonight: see me after this talk!
  • 2. Outline • What is it? • What is it good for? • Does it really work? • Gory details. • Why did we do this? • Profit!
  • 3. What is it? NDGF BIO Community Grid Bioinformatics for the Grid
  • 4. What is it? • Unified interface ...to popular bioinformatic applications ...on shared, distributed computational resources ...using versioned and cached databases
  • 5. What is it good for? • Burst computing – High demand for short periods of time • high during development / production • low during analysis / writing papers – Share resources to enable more efficient use • Database accessibility • Availibility • Unified interface
  • 6. What is NDGF?
  • 7. What is NDGF? • Nordic Data Grid Facility • A WLCG Tier1 facility – Worldwide LHC Computational Grid – Stores and processes data from LHC at CERN • peak rate ≈ 1.6Gb/s, when the accelerator is running (and that’s after most of the data have been filtered away)
  • 8. ”Does it really work, this distributed thingie?”
  • 9. ”Does it really work, this distributed thingie?” Why yes, very well thank you!
  • 10. NDGF • 96% availablity (highest of all Tier1 facilities) • Third largest Tier1 facility in the world • Lowest ratio of failed ATLAS jobs • Production goals met, and beyond – Goal: 8% of all ATLAS resources (10.5% provided) – Goal: 9% of all ALICE resources (12% provided) * Data graciously stolen from Leif Nixons NorduNet 2008 talk. Thank you Leif :-)
  • 11. DISTRIBUTION IS A STRENGTH
  • 12. It enforces unification It ensures availability
  • 13. Does it really work? It’s good enough for LHC. It’s good enough for Bioinformatics.
  • 14. Gory details
  • 15. Biogrid provides Optimised applications: – BLAST – ClustalW – HMMER – Muscle – Mafft Planned: molecular dynamics, phylogeny...
  • 16. Biogrid provides Versioned, indexed and cached databases – UniProtKB (subreleases) – Uniref (subreleases) Planned: genomes (EnsEMBL), nucleotides (EMBL)...
  • 17. Cached database access Database files are transfered to the cluster at most once per project.
  • 18. Unified Interface
  • 19. Unified Interface
  • 20. Unified Interface DATA RESULTS
  • 21. Unified Interface • XRSL Job Description Standard in ARC Grid Middleware • Well defined runtime environments $HMMERDIR: node local (fast) scratch dir containing db files prepare_db: download and unpack db files on the fly from front node to $HMMERDIR
  • 22. XRSL Job Description (jobName=refinehmm-family023) (runTimeEnvironment=APPS/BIO/HMMER2.3.2) (cpuTime=3000) (executable=refinehmm.jobscript.sh) (inputFiles= (sp.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz) (tr.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_trembl.fasta.gz) (family023.hmm ””) ) (outputfiles= (family023.refined.hmm ””) )
  • 23. XRSL Job Description (jobName=refinehmm-$HMM_NAME) (runTimeEnvironment=APPS/BIO/HMMER2.3.2) (cpuTime=3000) (executable=refinehmm.jobscript.sh) (inputFiles= (sp.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz) (tr.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_trembl.fasta.gz) ($HMM_NAME.hmm ””) ) (outputfiles= ($HMM_NAME.refined.hmm ””) )
  • 24. Unified Interface • Run on any resource I can access: $ ngsub myjob.xrsl • ...or run on my buddy’s cluster: $ ngsub -c kiniini.csc.fi myjob.xrsl • Check jobs: $ ngstat refinehmm-family023 (or use Grid Monitor web interface at www.nordugrid.org) • Fetch results: $ ngget refinehmm-family* DATA GRID RESULTS
  • 25. What do I need? 1. A resource with ARC and Biogrid REs 2. An ARC client 3. A Grid Certificate (available from a number of global certificate authorities) 4. Time allowance on the resource ( 5. Biogrid VO Membership Not really necessary, but it will get you 1 & 4 )
  • 26. What do I need? ...or you can just grab the RE scripts off the biogrid website, and your db of choice from the biogrid dCache.
  • 27. Why did we do this? Bioinformatic applications... – CPU intensive – Small input and output files – ”Large” databases can be cached ...are very well suited for distributed computing.
  • 28. Profit!
  • 29. Subclassification of the MDR superfamily • 15000 members from all kingdoms of life • 500 families 25% sequence identity • 40 human members • Different substrate specificities • Different subunit & cofactor count • 2 HMMs available for superfamily detection • None for any of the individual families
  • 30. Subclassification of the MDR superfamily • We made HMMs for all MDR (sub)families with 20+ members. • 86 families • 34 detected subfamilies to 14 of these • 11579 / 15000 sequences classified • ≈5000*hmmsearch vs UniProtKB Manuscript in preparation
  • 31. refinehmm • Algorithm for automated HMM refinement • Produces stable and reliable HMMs • Developed using Biogrid REs and resources Will also be open source software once the paper is out.
  • 32. Acknowledgements • Olli Tourunen Supercomputing centers Biogrid developer • NSC • Bengt Persson Jens Larsson, Leif Nixon Biogrid PI • HPC2N • NDGF Åke Sandgren Michael Grønager Josva Kleist • Others C3SE, CSC, Uppmax, Lunarc, PDC, • Biogrid co-applicants Aalborg University, Oslo University Ann-Charlotte Berglund Sonnhammer Erik Sonnhammer Inge Jonassen Joel Hedlund yohell@ifm.liu.se Biogrid User and Developer Linköping University, Sweden Birds-of-a-feather session tonight: see me after the talk!
  • 33. Acknowledgements • Olli Tourunen Supercomputing centers Biogrid developer • NSC • Bengt Persson Jens Larsson, Leif Nixon Biogrid PI • HPC2N • NDGF Åke Sandgren Michael Grønager Josva Kleist • Others C3SE, CSC, Uppmax, Lunarc, PDC, • Biogrid co-applicants Aalborg University, Oslo University Ann-Charlotte Berglund Sonnhammer Erik Sonnhammer Inge Jonassen Joel Hedlund yohell@ifm.liu.se Biogrid User and Developer Linköping University, Sweden Birds-of-a-feather session tonight: see me after the talk!

×