Biogrid – Bioinformatics for the grid

    Joel Hedlund <yohell@ifm.liu.se>
       Biogrid User and Developer
      Linköping University, Sweden

      Birds-of-a-feather session tonight: see me after this talk!
Outline
•   What is it?
•   What is it good for?
•   Does it really work?
•   Gory details.
•   Why did we do this?
•   Profit!
What is it?



NDGF BIO Community Grid
   Bioinformatics for the Grid
What is it?
• Unified interface
  ...to popular bioinformatic applications
  ...on shared, distributed computational resources
  ...using versioned and cached databases
What is it good for?
• Burst computing
  – High demand for short periods of time
     • high during development / production
     • low during analysis / writing papers
  – Share resources to enable more efficient use
• Database accessibility
• Availability
• Unified interface
What is NDGF?
• Nordic Data Grid Facility
• A WLCG Tier1 facility
  – Worldwide LHC Computing Grid
  – Stores and processes data from LHC at CERN
     • peak rate ≈ 1.6Gb/s, when the accelerator is running
       (and that’s after most of the data have been filtered away)
”Does it really work, this
  distributed thingie?”
 Why yes, very well thank you!
NDGF
• 96% availability
  (highest of all Tier1 facilities)

• Third largest Tier1 facility in the world
• Lowest ratio of failed ATLAS jobs
• Production goals met, and beyond
   – Goal: 8% of all ATLAS resources (10.5% provided)
   – Goal: 9% of all ALICE resources (12% provided)




* Data graciously stolen from Leif Nixon's NorduNet 2008 talk. Thank you Leif :-)
DISTRIBUTION
    IS A
 STRENGTH
It enforces unification

It ensures availability
Does it really work?


 It’s good enough for LHC.
It’s good enough for Bioinformatics.
Gory details
Biogrid provides
Optimised applications:
  – BLAST
  – ClustalW
  – HMMER
  – Muscle
  – Mafft




                          Planned: molecular dynamics, phylogeny...
Biogrid provides
Versioned, indexed and cached databases
  – UniProtKB (subreleases)
  – Uniref (subreleases)




                       Planned: genomes (EnsEMBL), nucleotides (EMBL)...
Cached database access




Database files are transferred to the cluster at most once per project.
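One way to get "at most once" behaviour is to key a local cache on the versioned database path and transfer only on a cache miss. The following is a minimal sketch and not Biogrid's actual mechanism: the function `ensure_cached`, the cache directory layout and the `fetch` callback are all hypothetical names introduced for illustration.

```python
import os
import tempfile
from typing import Callable
from urllib.parse import urlsplit


def ensure_cached(url: str, cache_dir: str, fetch: Callable[[str, str], None]) -> str:
    """Return the local path for url, calling fetch(url, path) only on a cache miss."""
    # Keep the full remote path (including the release directory) in the cache key,
    # so that different database versions never overwrite each other.
    relative = urlsplit(url).path.strip("/").split("/")
    local_path = os.path.join(cache_dir, *relative)
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # Transfer under a temporary name first, so an interrupted transfer
        # never leaves a partial file under the final name.
        partial = local_path + ".part"
        fetch(url, partial)
        os.replace(partial, local_path)
    return local_path


if __name__ == "__main__":
    transfers = []

    def fake_fetch(url: str, path: str) -> None:
        # Stand-in for a real transfer tool; it only records the call.
        transfers.append(url)
        with open(path, "w") as handle:
            handle.write("placeholder")

    db = "srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz"
    with tempfile.TemporaryDirectory() as cache:
        ensure_cached(db, cache, fake_fetch)
        ensure_cached(db, cache, fake_fetch)
    print(len(transfers))
```

The second call finds the file already in place and skips the transfer, which is the property the slide describes.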
Unified Interface


[Diagram: DATA, RESULTS]
Unified Interface
• XRSL Job Description
  Standard in ARC Grid Middleware

• Well defined runtime environments
   $HMMERDIR: node local (fast) scratch dir containing db files
   prepare_db: download and unpack db files on the fly from front node to $HMMERDIR
XRSL Job Description
&
(jobName=refinehmm-family023)
(runTimeEnvironment=APPS/BIO/HMMER2.3.2)
(cpuTime=3000)
(executable=refinehmm.jobscript.sh)
(inputFiles=
  (sp.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz)
  (tr.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_trembl.fasta.gz)
  (family023.hmm "")
)
(outputfiles=
  (family023.refined.hmm "")
)
XRSL Job Description
&
(jobName=refinehmm-$HMM_NAME)
(runTimeEnvironment=APPS/BIO/HMMER2.3.2)
(cpuTime=3000)
(executable=refinehmm.jobscript.sh)
(inputFiles=
  (sp.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz)
  (tr.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_trembl.fasta.gz)
  ($HMM_NAME.hmm "")
)
(outputfiles=
  ($HMM_NAME.refined.hmm "")
)
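The $HMM_NAME placeholders in the job description above have to be filled in once per family before submission. The slides do not say which tool was used for that; as a minimal sketch, Python's `string.Template` happens to use the same `$NAME` syntax. The function name `render_job` and the family names are hypothetical, and the leading `&` (the xRSL conjunction of all attributes) is added here on the assumption that the ARC client expects it.

```python
from string import Template

# Job description text as on the slide, with $HMM_NAME as the only placeholder.
# The leading "&" is an assumption (see the note above).
XRSL_TEMPLATE = Template('''\
&
(jobName=refinehmm-$HMM_NAME)
(runTimeEnvironment=APPS/BIO/HMMER2.3.2)
(cpuTime=3000)
(executable=refinehmm.jobscript.sh)
(inputFiles=
  (sp.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz)
  (tr.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_trembl.fasta.gz)
  ($HMM_NAME.hmm "")
)
(outputfiles=
  ($HMM_NAME.refined.hmm "")
)
''')


def render_job(hmm_name: str) -> str:
    """Fill in the template for one family; raises KeyError if a placeholder is left unset."""
    return XRSL_TEMPLATE.substitute(HMM_NAME=hmm_name)


if __name__ == "__main__":
    # Hypothetical family names, one job description per family.
    for name in ["family023", "family024"]:
        print(render_job(name))
```

Writing each rendered string to its own `.xrsl` file gives one submittable job per family.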
Unified Interface
• Run on any resource I can access:
  $ ngsub myjob.xrsl

• ...or run on my buddy’s cluster:
  $ ngsub -c kiniini.csc.fi myjob.xrsl

• Check jobs:
  $ ngstat refinehmm-family023
  (or use Grid Monitor web interface at www.nordugrid.org)

• Fetch results:
  $ ngget refinehmm-family*



[Diagram: DATA, GRID, RESULTS]
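For many jobs, the three commands above are easiest to drive from a script. The sketch below only builds the command lines, using nothing beyond the invocations shown on the slide (`ngsub`, `ngsub -c`, `ngstat`, `ngget`); it does not run them, so it works without an ARC client installed. The helper names and the one-file-per-family naming scheme are assumptions made for illustration.

```python
import shlex
from typing import List, Optional


def submit_command(xrsl_file: str, cluster: Optional[str] = None) -> List[str]:
    """Build the ngsub call; -c restricts submission to one named cluster."""
    command = ["ngsub"]
    if cluster is not None:
        command += ["-c", cluster]
    command.append(xrsl_file)
    return command


def status_command(job_name: str) -> List[str]:
    """Build the ngstat call for one job name."""
    return ["ngstat", job_name]


def fetch_command(job_name: str) -> List[str]:
    """Build the ngget call for one job name."""
    return ["ngget", job_name]


def plan_batch(families: List[str], cluster: Optional[str] = None) -> List[List[str]]:
    """Return one submit command per family, assuming a <family>.xrsl file for each."""
    return [submit_command(f"{family}.xrsl", cluster) for family in families]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in plan_batch(["family023", "family024"], cluster="kiniini.csc.fi"):
        print(shlex.join(command))
    print(shlex.join(status_command("refinehmm-family023")))
    print(shlex.join(fetch_command("refinehmm-family023")))
```

On a machine with the ARC client, each list could be passed to `subprocess.run(command, check=True)` to perform the actual submission.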
What do I need?
    1. A resource with ARC and Biogrid REs
    2. An ARC client
    3. A Grid Certificate
       (available from a number of global certificate authorities)

    4. Time allowance on the resource



    (5. Biogrid VO Membership
       Not really necessary, but it will get you 1 & 4.)
What do I need?



...or you can just grab the RE scripts off the biogrid website,
        and your db of choice from the biogrid dCache.
Why did we do this?
Bioinformatic applications...
  – CPU intensive
  – Small input and output files
  – ”Large” databases can be cached

...are very well suited for distributed computing.
Profit!
Subclassification of the MDR superfamily

• 15000 members
    from all kingdoms of life

• 500 families
    25% sequence identity

•   40 human members
•   Different substrate specificities
•   Different subunit & cofactor count
•   2 HMMs available for superfamily detection
•   None for any of the individual families
Subclassification of the MDR superfamily

• We made HMMs for all MDR (sub)families
  with 20+ members.
• 86 families
• 34 subfamilies detected within 14 of these
• 11579 / 15000 sequences classified
• ≈5000 hmmsearch runs against UniProtKB
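As a quick check of the coverage figures on this slide and the previous one, the proportions can be computed directly. This is a sketch only: the totals of 15000 members and 500 families are the round numbers quoted on the slides, so the percentages are approximate.

```python
# Figures as quoted on the slides; the totals are round numbers.
classified_sequences = 11579
total_sequences = 15000
families_with_hmms = 86
total_families = 500

sequence_coverage = classified_sequences / total_sequences
family_coverage = families_with_hmms / total_families

print(f"sequences classified: {sequence_coverage:.1%}")  # about 77.2%
print(f"families with HMMs:   {family_coverage:.1%}")    # about 17.2%
```

Roughly a sixth of the families account for more than three quarters of the sequences, which is consistent with building HMMs only for families with 20+ members.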



                                Manuscript in preparation
refinehmm
• Algorithm for automated HMM refinement
• Produces stable and reliable HMMs
• Developed using Biogrid REs and resources




                Will also be open source software once the paper is out.
Acknowledgements
  • Olli Tourunen
    Biogrid developer
  • Bengt Persson
    Biogrid PI
  • NDGF
    Michael Grønager
    Josva Kleist
  • Biogrid co-applicants
    Ann-Charlotte Berglund Sonnhammer
    Erik Sonnhammer
    Inge Jonassen

Supercomputing centers
  • NSC
    Jens Larsson, Leif Nixon
  • HPC2N
    Åke Sandgren
  • Others
    C3SE, CSC, Uppmax, Lunarc, PDC,
    Aalborg University, Oslo University

Joel Hedlund
yohell@ifm.liu.se
Biogrid User and Developer
Linköping University, Sweden

Birds-of-a-feather session tonight: see me after the talk!

Hedlund_biogrid_BOSC2009
