Providing Bioinformatics Services on Cloud


Improvements in experimental technologies force biologists to face a deluge of data that requires relevant tools and sufficient resources to analyze. The cloud helps bioinformatics experts define virtual appliances with pre-installed tools and workflows, and helps scientists deploy them, on demand, on national research infrastructures.

Presented by Christophe Blanchet and Clément Gauthey at the EGI Community Forum in Manchester, UK in April 2013.

Published in: Technology, Business

  1. Providing Bioinformatics Services on Cloud
     C. Blanchet and C. Gauthey (Christophe Blanchet, Clément Gauthey)
     EGI CF13, Manchester, 9 April 2013
     Infrastructure Distributed for Biology - IDB, CNRS-IBCP FR3302, Lyon, FRANCE
     http://idee-b.ibcp.fr
     IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)
  2. Bioinformatics Today
     • Biological data are big data
       • 1512 online databases (NAR Database Issue 2013)
       • Sanger Institute, UK, 5 PB
       • Beijing Genome Institute, China, 4 sites, 10 PB
       ➡ Big data in many places
     • Analysing such data has become difficult
       • Scale-up of the analyses: gene/protein to complete genome/proteome, ...
       • Many different tools in daily use
       • That need to be combined in workflows
       • Usual interfaces: portals, Web services, federation, ...
       ➡ Datacenters with ease of access/use
     • Distributed resources
       • Experimental platforms: NGS, imaging, ...
       • Bioinformatics platforms
       ➡ Federation of datacenters
  3. Sequencing Genomes
     • Genome sequencing has become a lab commodity with NGS (cheap and efficient)
  4. Infrastructures in Biology
     • Lots of tools and web services to process and visualize lots of data
  5. The scene
     • Bioinformatics service providers
       • Is it easy to deploy lots of (incompatible) tools?
       • To connect them to public databases?
       • To limit transfer of huge data?
       • To provide users with their own computing resources?
       • With their own isolated storage?
     • Scientists
       • Is it easy to access/use these tools?
       • To adapt them to your usage?
       • To get your/other tools deployed on a datacenter?
       • To combine them?
       • To get your own computing/storage resources?
  6. IDB's Cloud
     • Cloud workbench for Biology
       • 13 turnkey bioinformatics appliances (as of Apr. 2013)
       • Running since Sept. 2011, open to the Biology community
       • Lyon, FRANCE
     • Powered by
       • StratusLab
         • Compute nodes, block storage
         • +900 cores, +4 TB RAM, 36 TB vdisks
         • Mainly Intel SandyBridge servers with 32 cores and 128 GB
         • Bigmem servers with 64 cores and 768 GB
         • VMs from 1 to 64 cores, 512 MB to 760 GB RAM
       • + OpenStack
         • Object storage (Swift)
         • +200 TB redundant & scalable storage
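The VM and server sizes quoted on this slide can be turned into a small capacity check. The sketch below is illustrative only: the class names `standard` and `bigmem`, the `VmRequest` type, and the 1 GB = 1024 MB conversion are assumptions, and it ignores hypervisor overhead and whatever placement policy StratusLab actually applies.

```python
from dataclasses import dataclass

# Limits quoted on the slide: VMs from 1 to 64 cores, 512 MB to 760 GB RAM.
MIN_CORES, MAX_CORES = 1, 64
MIN_RAM_MB, MAX_RAM_MB = 512, 760 * 1024  # assumes 1 GB = 1024 MB

# Server classes quoted on the slide as (cores, RAM in GB).
# The dictionary keys are hypothetical labels, not names used by IDB.
SERVER_CLASSES = {
    "standard": (32, 128),  # Intel SandyBridge servers
    "bigmem": (64, 768),    # large-memory servers
}


@dataclass(frozen=True)
class VmRequest:
    cores: int
    ram_mb: int


def is_valid(request):
    """True if the request lies inside the VM size range given on the slide."""
    return (MIN_CORES <= request.cores <= MAX_CORES
            and MIN_RAM_MB <= request.ram_mb <= MAX_RAM_MB)


def candidate_servers(request):
    """Return the server classes whose raw capacity covers the request."""
    if not is_valid(request):
        return []
    return [
        name
        for name, (cores, ram_gb) in SERVER_CLASSES.items()
        if request.cores <= cores and request.ram_mb <= ram_gb * 1024
    ]
```

Under these assumptions, an 8-core, 16 GB request fits both server classes, while a 64-core, 760 GB request fits only the large-memory servers.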
  7. Driven through a simple web interface
  8. Integrate Bioinformatics Tools in Cloud
     • Diagram: bioinformatics tools (BLAST, GOR4, FastA, SSearch, Abyss, ClustalW, Ray, BWA, PhyML) and Linux virtual machines (RedHat, CentOS, Debian, Ubuntu, Suse) are combined to create a new appliance, published in the Bioinformatics Marketplace (Sequence, NGS, Structure, Galaxy, ARIA, (…))
     • Appliances are virtual machines
       • Small: a few GB, easy to convert to most virtualization formats
       • Installed and pre-configured with common bioinformatics tools
       • e.g. BLAST, ClustalW, ARIA, MEME, HMMer, TopHat, BWA, Samtools, etc.
  9. Bioinformatics Appliances
  10. Select your bioinformatics tools
  11. Run Bioinformatics Cloud Instances
      • Diagram: appliances from the Bioinformatics Marketplace (Sequence, NGS, Structure, Galaxy, ARIA, (…)) are launched as instances on IBCP's cloud resources
      • Diagram labels: IaaS; PaaS; BLAST, Clustal, etc.; ARIA Portal; Master & Storage VM; Workers VM (CNS); shared FS; launch jobs (ssh)
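The diagram on this slide shows a master VM launching jobs on worker VMs over ssh, with a shared file system between them. A minimal sketch of that pattern follows. The round-robin policy, the function names, and the worker host names are assumptions made for illustration; the slides do not say how jobs are actually scheduled.

```python
from itertools import cycle


def assign_round_robin(jobs, workers):
    """Spread jobs over workers in turn; returns {worker: [jobs...]}."""
    if not workers:
        raise ValueError("at least one worker is required")
    plan = {worker: [] for worker in workers}
    for job, worker in zip(jobs, cycle(workers)):
        plan[worker].append(job)
    return plan


def ssh_command(worker, remote_command):
    """Build the argv list that runs one command on a worker through ssh."""
    return ["ssh", worker, remote_command]


if __name__ == "__main__":
    # Hypothetical worker names and job commands. Because the file system is
    # shared, each job can refer to the same paths on every worker.
    jobs = ["run_job.sh sample1", "run_job.sh sample2", "run_job.sh sample3"]
    plan = assign_round_robin(jobs, ["worker1", "worker2"])
    for worker, assigned in plan.items():
        for job in assigned:
            print(ssh_command(worker, job))
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` on the master once key-based ssh access to the workers is in place.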
  12. Manage your Cloud Instances
  13. Biological Data in Cloud
      • Public data sources: UNIPROT, PDB, EMBL, PROSITE, genomes
        • Shared (NFS) with the bioinformatics cloud
      • User persistent data: pdisk (iSCSI)
      • Upload your data: scp, http/S3
      • Get your results: scp, http/S3
      • Diagram labels: IaaS; PaaS; BLAST, Clustal, etc.; ARIA Portal; Master & Storage VM; Workers VM (CNS); shared FS; launch jobs (ssh)
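The slide lists scp as one way to upload data and retrieve results. As a hedged sketch, the helpers below only build the standard `scp source target` command lines; the user name, host name, and paths are hypothetical placeholders, not values from the IDB cloud.

```python
import shlex


def build_scp_upload(local_path, user, host, remote_dir):
    """Return the argv list for copying one local file to a cloud instance."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}"]


def build_scp_download(user, host, remote_path, local_dir):
    """Return the argv list for copying one result file back from the instance."""
    return ["scp", f"{user}@{host}:{remote_path}", local_dir]


if __name__ == "__main__":
    # Hypothetical instance address and file names, for illustration only.
    upload = build_scp_upload("reads.fastq", "alice", "vm.example.org", "/data/")
    download = build_scp_download("alice", "vm.example.org", "/data/results.tsv", ".")
    print(shlex.join(upload))
    print(shlex.join(download))
```

Returning argv lists rather than shell strings lets the caller pass them straight to `subprocess.run(cmd, check=True)` without quoting problems.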
  14. Example: 'biocompute' Appliance
      • Use your own instance(s)
      • With pre-installed standard bioinformatics tools
        • BLAST, FastA, SSearch, HMM, ...
        • ClustalW2, Clustal-Omega, Muscle, ...
        • Bowtie(2), BWA, samtools, ...
        • MEME, R, etc.
      • Connected to public reference data
        • Uniprot, EMBL, genomes, PDB, etc.
        • Automatically shared with the VMs
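On an instance of this kind, a typical first step is a BLAST search of user sequences against one of the shared reference databases. The sketch below assumes the NCBI BLAST+ `blastp` executable is the BLAST installed on the appliance, which the slides do not specify; the file names and the database path are hypothetical.

```python
def build_blastp_command(query_fasta, database, output_path,
                         threads=1, evalue=1e-5):
    """Build a BLAST+ blastp command line with tabular output (-outfmt 6)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        "blastp",
        "-query", query_fasta,          # FASTA file with the user's proteins
        "-db", database,                # path prefix of a formatted database
        "-out", output_path,
        "-outfmt", "6",                 # tab-separated hit table
        "-evalue", str(evalue),
        "-num_threads", str(threads),   # use several cores of the VM
    ]


if __name__ == "__main__":
    # Hypothetical paths: where the shared databases are mounted is not
    # stated in the slides.
    cmd = build_blastp_command("my_proteins.fasta", "/shared/db/uniprot",
                               "hits.tsv", threads=8)
    print(" ".join(cmd))
```

The command can be run with `subprocess.run(cmd, check=True)`; matching `threads` to the number of cores chosen for the VM is the main tuning knob here.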
  15. Example: Galaxy portal for NGS analyses
      • Analyse NGS data
        • The Galaxy portal is widely used in the community
        • Connected to large public data: sequences and indexes
        • Large user data (GBs)
      • Preserve workflows and results (persistent storage)
  16. Example: Proteomics
      • Motivation
        • Collaboration with a mass spectroscopy platform
        • Running out of space on their local resources
      • Protein identification
        • Mass experimental data
        • Reference databases: nr, Swiss-Prot
        • Reference screening tools: OMSSA, X!Tandem
      • User interface
        • Remote display
          • NX
        • Reference GUIs
          • SearchGUI
          • PeptideShaker
      source: PeptideShaker site
  17. Conclusion
      • Provide turnkey bioinformatics appliances
        • Standard tools and pipelines
        • Interoperability: ready to run on cloud
        • Easier to transfer appliances than data (GB vs TB)
      • Provide a cloud infrastructure tightly connected to existing bioinformatics infrastructure
        • IDB's public bioinformatics cloud
        • Linked to public biological databases
        • In collaboration with the French Bioinformatics Institute
      • Ease the usage by scientists
        • Usual bioinformatics gateways
        • Persistent and large ubiquitous storage
        • Web interface for cloud management
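The "GB vs TB" point in the conclusion can be made concrete with a back-of-the-envelope transfer-time estimate. The sizes (a 5 GB appliance, a 5 TB dataset) and the 1 Gbit/s link are assumptions chosen for illustration, not figures from the talk, and the estimate ignores protocol overhead.

```python
GB = 10**9   # bytes, decimal units
TB = 10**12  # bytes, decimal units


def transfer_time_seconds(size_bytes, bandwidth_bits_per_s):
    """Ideal transfer time: size in bits divided by link bandwidth."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes * 8 / bandwidth_bits_per_s


if __name__ == "__main__":
    link = 10**9  # assumed 1 Gbit/s network link
    appliance = transfer_time_seconds(5 * GB, link)  # hypothetical 5 GB image
    dataset = transfer_time_seconds(5 * TB, link)    # hypothetical 5 TB dataset
    print(f"appliance: {appliance:.0f} s")
    print(f"dataset:   {dataset / 3600:.1f} h")
```

With these assumed numbers the appliance moves in well under a minute while the dataset needs roughly eleven hours, which is the argument for shipping the appliance to the datacenter that already holds the data.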
  18. Perspectives
      • Define good practices to provide the academic community and industry with bioinformatics services!
      • French Bioinformatics Institute - IFB
        • Goals are to provide core bioinformatics resources to the national and international life science research community in key fields such as genomics, proteomics, systems biology, etc.
        • Aims at building a national academic cloud devoted to Bioinformatics, inspired by the model evaluated through the IDB's cloud.
      • European ELIXIR infrastructure
        • To build a sustainable European infrastructure for biological information, supporting life science research and its translation
        • IFB will be the French representative in ELIXIR.
  19. Acknowledgment
      • StratusLab members
      • Co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and by the French National Research Agency's Arpege Programme (ANR-10-SEGI-001).
      Questions?