SeqWare on the Cloud: Porting a Genome Center's Infrastructure to Amazon Web Services

This is a talk I (Brian O'Connor) gave at Genome Informatics 2012 describing how SeqWare was used to build the Next-Generation Sequencing (NGS) software infrastructure needed for multiple genome centers (OICR & UNC) and how that was leveraged on Amazon's cloud. If you are interested in using SeqWare for your NGS analysis needs, see our open source site at http://seqware.github.com or, for a commercially supported version, see http://nimbusinformatics.com.

Transcript

  • 1. SeqWare on the Cloud: Porting a Genome Center’s Infrastructure to Amazon Web Services. Brian O’Connor, SeqWare Software Architect & Manager for Software Engineering, The Ontario Institute for Cancer Research
  • 2. Effective Scaling Integration Expertise & Sharing Effective System Compute & Storage. SeqWare was designed to scale in these ways
  • 3. Effective Scaling Integration Expertise & Sharing Effective System Compute & Storage (callout: Query Engine Poster). SeqWare was designed to scale in these ways
  • 4. The Open Source SeqWare Project: SeqWare Query Engine, SeqWare Web Service, SeqWare Portal, SeqWare Pipeline, SeqWare MetaDB; Local Cluster, Cloud; Big Data, Small Data
  • 5. Distinguishing Features of SeqWare ● Infrastructure Toolkit ● Developer Framework ● Automation ● Environment-Agnostic ● Tailored for Big Projects ● User-Created Workflows ● Packaging Format ● Provenance Tracking ● Fault Tolerant ● Tools-Agnostic ● Open Source. Diagram labels: Firehose, Taverna, Open Source/Community, Commercial
  • 6. Projects Using SeqWare. Centers and projects: UNC Lineberger Cancer Center (+ local projects); Ontario Institute for Cancer Research (+ local projects); Iceman, HuRef 300x, Others...; Plant Genome Assembly; Clinical Sequencing. Data types: Exome, Targeted Resequencing, Whole Genome, RNASeq. Scale figures shown: hundreds of patient samples; 2 genomes; JBrowse on iPad; 1.5 TBase, 927 samples, 982 “lanes”; 38 TBase, 1,522 samples, 2,297 “lanes”; 9 genomes, a 300x genome
  • 7. Scaling Expertise: Analyzing Illumina Data @ OICR ● September 2011: rolled out SeqWare at OICR ● Goal: to deploy SeqWare and streamline production analysis through automation ● 4 groups working together ● SeqWare Workflows for – Large projects and common tasks – Projects with “public uploads”
  • 8. SeqWare at OICR: SeqWare Query Engine, SeqWare Web Service, SeqWare Portal, SeqWare Pipeline, SeqWare MetaDB; Local Cluster, Cloud; Big Data, Small Data
  • 9. SeqWare at OICR (same diagram as slide 8), highlighting: Software Engineering
  • 10. SeqWare at OICR (same diagram as slide 8), highlighting: Pipeline & Tool Evaluation
  • 11. SeqWare at OICR (same diagram as slide 8), highlighting: Sequencing Facility LIMS
  • 12. SeqWare at OICR (same diagram as slide 8), highlighting: Production Informatics; User + Data = “deciders”
  • 13. OICR Production Workflows: Multiple groups contributed, including the new Pipeline and Evaluation Team. In Production; Staging, Testing, & Development Workflows
  • 14. OICR SeqWare Results: 38 Trillion Bases Aligned (chart of Samples over Time, ~2 years) ● Automated key components in production ● What about sharing our infrastructure? ● To the Cloud!
  • 15. “The Cloud”: Want to share infrastructure without sharing infrastructure. Core Services: Elastic Compute Cloud (EC2), Simple Storage Service, Linux tools for disk and file encryption. Transfer & Web Services: Elastic Beanstalk, Import/Export, DirectConnect. Other Nifty Services: Glacier, Elastic Map/Reduce, DynamoDB, HBase. Hardware through API calls
  • 16. Scaling Computation: Analyzing SOLiD Data on Amazon ● Life Collaborations Division had 9 human genomes, such as the Iceman’s genome and HuRef, resequenced at high depth ● Goal: to deploy SeqWare infrastructure on AWS and analyze data in a scalable way ● Without building infrastructure ● Using open source tools
  • 17. SeqWare Infrastructure on EC2. Diagram labels: Instance or Cluster; Workflow Bundle; Command Line Tools; Launcher; Config; SeqWare MetaDB; SeqWare Web Service; Amazon EC2; Amazon S3; Import; SeqWare Pipeline; Result Files: BAM, VCF, Reports; SeqWare Portal; User Interfaces; Amazon Instance or Cluster
  • 18. Workflow Outputs ● Results via Project Website on EC2 ● Variants Loaded in JBrowse Genome Browser on Elastic Beanstalk ● http://icemangenome.net ● Variants in Database and Files in S3: BAM, VCF, Annotated VCF, Variant Database
  • 19. Results ● Cloud delivered fantastic computational and storage scalability ● Analyzed 9 human genomes, one at 300x! ● Costs ● 8 node HPC cluster, about 4 days ● 30x coverage genome was ~$1000 (<$15/GBase) ● ~$150 per exome ($10/GBase) ● ~$50/month/genome storage, website, & browser
  • 20. The Future of SeqWare ● Scalability ● Cloud-based cluster launching (StarCluster/CloudMan) ● Release encryption and distributed filesystem tools ● Better documentation and easier setup ● Expertise ● Simplify pipeline language(s) and development process ● Release OICR public workflows ● Integration ● Expand NoSQL variant/annotation database ● Support for other tools like Galaxy
  • 21. Availability ● SeqWare available at: http://seqware.github.com, @SeqWare, VirtualBox & AMI ● Brian O’Connor, boconnor@oicr.on.ca
  • 22. Acknowledgements ● SeqWare @ OICR: Morgan Taschuk, Denis Yuen, Yong Liang ● OICR SeqProdBio: Tim Beck, Zheng Zha, Tony DeBat ● OICR Bioinformatics Core: Francis Ouellette, Zhibin Lu ● SeqWare @ UNC: Neil Hayes, Sara Grimm, Stuart Jefferys, Matt Solloway, and the Lineberger group ● Tim Harkins ● Barry Merriman ● Jason Warner ● Kevin McKernan ● Vincent Ferretti ● Lincoln Stein
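
The cost figures on slide 19 can be sanity-checked with a little arithmetic. The sketch below is not from the talk. It assumes a human genome of roughly 3.1 GBase (an outside figure, not stated on the slides), and the function names are hypothetical helpers introduced only for illustration. It checks that ~$1000 for a 30x genome falls under the quoted $15/GBase, derives the sequence volume implied by a ~$150 exome at $10/GBase, and annualizes the ~$50/month/genome hosting cost.

```python
# Assumption (not from the slides): approximate human genome size in gigabases.
HUMAN_GENOME_GBASE = 3.1


def sequenced_gbase(coverage: float, target_gbase: float = HUMAN_GENOME_GBASE) -> float:
    """Total sequence volume in GBase for a target of the given size at the given coverage."""
    return coverage * target_gbase


def cost_per_gbase(total_cost_usd: float, gbase: float) -> float:
    """Cost in USD per GBase of sequence analyzed."""
    if gbase <= 0:
        raise ValueError("gbase must be positive")
    return total_cost_usd / gbase


# Slide 19: a 30x coverage genome cost about $1000 to analyze.
genome_gbase = sequenced_gbase(30)
genome_rate = cost_per_gbase(1000, genome_gbase)

# Slide 19: about $150 per exome at $10/GBase implies this much sequence per exome.
exome_gbase = 150 / 10

# Slide 19: about $50 per month per genome for storage, website, and browser.
storage_per_year = 50 * 12

print(f"30x genome: {genome_gbase:.0f} GBase, ${genome_rate:.2f}/GBase")
print(f"Exome volume implied by the quoted rate: {exome_gbase:.0f} GBase")
print(f"Hosting per genome per year: ${storage_per_year}")
```

Under the assumed genome size, the per-GBase rate for the 30x genome stays below the $15/GBase bound quoted on the slide; a different genome size assumption shifts the rate proportionally.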