SeqWare on the Cloud: Porting a Genome Center's Infrastructure to Amazon Web Services

This is a talk I (Brian O'Connor) gave at Genome Informatics 2012 describing how SeqWare was used to build the Next-Generation Sequencing (NGS) software infrastructure needed for multiple genome centers (OICR & UNC) and how that infrastructure was then deployed on Amazon's cloud. If you are interested in using SeqWare for your NGS analysis needs, see our open source site at http://seqware.github.com or, for a commercially supported version, see http://nimbusinformatics.com.

Transcript

  • 1. SeqWare on the Cloud: Porting a Genome Center's Infrastructure to Amazon Web Services. Brian O'Connor, SeqWare Software Architect & Manager for Software Engineering, The Ontario Institute for Cancer Research
  • 2. Effective Scaling: SeqWare was designed to scale in these ways: Integration; Expertise & Sharing; Effective System Compute & Storage
  • 3. Effective Scaling (build of slide 2, with the labels "Query Engine" and "Poster" added to the diagram): Integration; Expertise & Sharing; Effective System Compute & Storage. SeqWare was designed to scale in these ways
  • 4. The Open Source SeqWare Project: SeqWare Query Engine, SeqWare Web Service, SeqWare Portal, SeqWare MetaDB, and SeqWare Pipeline, spanning Local Cluster to Cloud and Big Data to Small Data
  • 5. Distinguishing Features of SeqWare (slide columns: Firehose, Taverna, Open Source/Community, Commercial): ● Infrastructure Toolkit ● Developer Framework ● Automation ● Environment-Agnostic ● Tailored for Big Projects ● User-Created Workflows ● Packaging Format ● Provenance Tracking ● Fault Tolerant ● Tools-Agnostic ● Open Source
  • 6. Projects Using SeqWare: UNC Lineberger Cancer Center (+ local projects); Ontario Institute for Cancer Research (+ local projects); Iceman, HuRef 300x, and others; Plant Genome Assembly; Clinical Sequencing. Data types: Exome, Targeted Resequencing, Whole Genome, RNASeq. Scale figures shown: 38 TBase; 1.5 TBase; 1,522 samples / 2,297 "lanes"; 927 samples / 982 "lanes"; 9 genomes, including a 300x genome; 2 genomes; hundreds of patient samples; JBrowse on iPad
  • 7. Scaling Expertise: Analyzing Illumina Data @ OICR ● September 2011: rolled out SeqWare at OICR ● Goal: to deploy SeqWare and streamline production analysis through automation ● 4 groups working together ● SeqWare Workflows for – large projects and common tasks – projects with "public uploads"
  • 8. SeqWare at OICR: SeqWare Query Engine, SeqWare Web Service, SeqWare Portal, SeqWare MetaDB, and SeqWare Pipeline, spanning Local Cluster to Cloud and Big Data to Small Data
  • 9. SeqWare at OICR (same diagram), highlighting: Software Engineering
  • 10. SeqWare at OICR (same diagram), highlighting: Pipeline & Tool Evaluation
  • 11. SeqWare at OICR (same diagram), highlighting: Sequencing Facility LIMS
  • 12. SeqWare at OICR (same diagram), highlighting: Production Informatics; User + Data = "deciders"
  • 13. OICR Production Workflows: Multiple groups contributed, including the new Pipeline and Evaluation Team. Workflows are grouped as In Production and as Staging, Testing, & Development
  • 14. OICR SeqWare Results: 38 trillion bases aligned [chart: samples over time, ~2 years] ● Automated key components in production ● What about sharing our infrastructure? ● To the Cloud!
  • 15. "The Cloud": Want to share infrastructure without sharing infrastructure; hardware through API calls. Service groups: Core Services; Transfer & Web Services; Other Nifty Services. Services shown: Elastic Compute Cloud (EC2), Simple Storage Service (S3), Glacier, Elastic Beanstalk, Import/Export, Direct Connect, Elastic MapReduce, DynamoDB, HBase, and Linux tools for disk and file encryption
  • 16. Scaling Computation: Analyzing SOLiD Data on Amazon ● The Life Collaborations Division had 9 human genomes, such as the Iceman's genome and HuRef, resequenced at high depth ● Goal: to deploy SeqWare infrastructure on AWS and analyze the data in a scalable way – without building infrastructure – using open source tools
  • 17. SeqWare Infrastructure on EC2 (diagram components): Workflow Bundle; Command Line Tools; Cluster Launcher Config; SeqWare MetaDB; SeqWare Web Service; SeqWare Pipeline; SeqWare Portal; User Interfaces; Amazon EC2 Instance or Cluster; Amazon S3 Import; Result Files: BAM, VCF, Reports
  • 18. Workflow Outputs ● Results via a project website on EC2 ● Variants loaded in the JBrowse genome browser on Elastic Beanstalk ● http://icemangenome.net ● Variants in a database and files in S3: BAM, VCF, annotated VCF, variant database
  • 19. Results ● The cloud delivered fantastic computational and storage scalability ● Analyzed 9 human genomes, one at 300x! ● Costs: – 8-node HPC cluster, about 4 days – a 30x coverage genome was ~$1000 (<$15/GBase) – ~$150 per exome ($10/GBase) – ~$50/month/genome for storage, website, & browser
  • 20. The Future of SeqWare ● Scalability – Cloud-based cluster launching (StarCluster/CloudMan) – Release encryption and distributed filesystem tools – Better documentation and easier setup ● Expertise – Simplify pipeline language(s) and development process – Release OICR public workflows ● Integration – Expand NoSQL variant/annotation database – Support for other tools like Galaxy
  • 21. Availability ● SeqWare is available at http://seqware.github.com and @SeqWare, as a VirtualBox image and an AMI ● Brian O'Connor, boconnor@oicr.on.ca
  • 22. Acknowledgements ● SeqWare @ OICR: Morgan Taschuk, Denis Yuen, Yong Liang ● OICR SeqProdBio: Tim Beck, Zheng Zha, Tony DeBat ● OICR Bioinformatics Core: Francis Ouellette, Zhibin Lu ● SeqWare @ UNC: Neil Hayes, Sara Grimm, Stuart Jefferys, Matt Solloway, and the Lineberger group ● Tim Harkins, Barry Merriman, Jason Warner, Kevin McKernan, Vincent Ferretti, Lincoln Stein
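
Slides 5 and 12 list "Provenance Tracking" and "deciders" (User + Data) without showing how the two fit together. The Python below is a hypothetical, in-memory sketch, not SeqWare's actual MetaDB schema or decider API: each file records the workflow that produced it and its input files, and a toy decider schedules a workflow only for inputs that workflow has not already processed. The names `FileRecord`, `ToyMetaDB`, and `decide`, and the metatype labels, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileRecord:
    """One file tracked by a toy, in-memory stand-in for a metadata database."""
    path: str
    metatype: str                                      # hypothetical type label, e.g. "fastq"
    sample: str
    parents: List[str] = field(default_factory=list)  # input files (provenance links)
    workflow: Optional[str] = None                     # workflow that produced this file


class ToyMetaDB:
    """Hypothetical provenance store; not SeqWare's real MetaDB schema."""

    def __init__(self) -> None:
        self.files = {}

    def add(self, record: FileRecord) -> None:
        self.files[record.path] = record

    def lineage(self, path: str) -> List[str]:
        """Walk parent links depth-first to list every ancestor of a file."""
        seen: List[str] = []
        stack = list(self.files[path].parents)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.files[parent].parents)
        return seen

    def already_processed(self, path: str, workflow: str) -> bool:
        """True if some output of `workflow` lists `path` as one of its inputs."""
        return any(f.workflow == workflow and path in f.parents
                   for f in self.files.values())


def decide(db: ToyMetaDB, workflow: str, input_metatype: str) -> List[str]:
    """Toy decider: inputs of the given type that `workflow` has not yet processed."""
    return sorted(f.path for f in db.files.values()
                  if f.metatype == input_metatype
                  and not db.already_processed(f.path, workflow))


if __name__ == "__main__":
    db = ToyMetaDB()
    db.add(FileRecord("s1_L1.fastq", "fastq", "s1"))
    db.add(FileRecord("s2_L1.fastq", "fastq", "s2"))
    db.add(FileRecord("s1.bam", "bam", "s1", ["s1_L1.fastq"], "Aligner"))
    db.add(FileRecord("s1.vcf", "vcf", "s1", ["s1.bam"], "VariantCaller"))
    print(decide(db, "Aligner", "fastq"))      # ['s2_L1.fastq']
    print(decide(db, "VariantCaller", "bam"))  # []
    print(db.lineage("s1.vcf"))                # ['s1.bam', 's1_L1.fastq']
```

Because provenance is stored as parent links, "has this input already been run through workflow X?" becomes a lookup rather than a manual bookkeeping step, which is the kind of automation slide 7 describes.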

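Slide 18 mentions loading variants into a database alongside the BAM and VCF files kept in S3. As a minimal sketch of that idea, assuming a plain relational table rather than SeqWare's own query engine (which slide 20 describes as a NoSQL variant/annotation database), the Python below parses VCF data lines and loads them into an in-memory table with the standard `sqlite3` module. The table name, columns, helper names, and the sample records are all made up for illustration; they are not real Iceman calls.

```python
import sqlite3

# Made-up VCF records for illustration (tab-separated; header lines start with '#').
VCF_TEXT = """##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t10177\t.\tA\tAC\t45\tPASS\tDP=30
chr1\t10352\trs555\tT\tTA\t60\tPASS\tDP=41
chr2\t50000\t.\tG\tA,C\t12\tLowQual\tDP=8
"""


def parse_vcf(text):
    """Yield one row per alternate allele from the first seven VCF columns."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, filt = line.split("\t")[:7]
        for allele in alt.split(","):
            yield (chrom, int(pos), None if vid == "." else vid, ref, allele,
                   None if qual == "." else float(qual), filt)


def load_variants(conn, text):
    """Create a hypothetical `variant` table and bulk-insert parsed rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS variant ("
                 "chrom TEXT, pos INTEGER, vid TEXT, ref TEXT, alt TEXT, "
                 "qual REAL, filter TEXT)")
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?, ?)",
                     parse_vcf(text))
    conn.execute("CREATE INDEX IF NOT EXISTS variant_pos ON variant (chrom, pos)")
    conn.commit()


def passing_in_region(conn, chrom, start, end):
    """Return PASS variants with start <= pos <= end on one chromosome."""
    rows = conn.execute(
        "SELECT chrom, pos, ref, alt FROM variant "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? AND filter = 'PASS' "
        "ORDER BY pos",
        (chrom, start, end))
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_variants(conn, VCF_TEXT)
    print(passing_in_region(conn, "chr1", 10000, 11000))
    # [('chr1', 10177, 'A', 'AC'), ('chr1', 10352, 'T', 'TA')]
```

Splitting multi-allelic ALT fields into one row per allele keeps region and filter queries simple; a production store would also keep INFO annotations and sample genotypes.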
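
The cost figures on slide 19 can be sanity-checked with simple arithmetic. Assuming a haploid human genome of roughly 3.1 gigabases (an assumption, not a number from the talk), 30x coverage is about 93 GBase, so ~$1000 works out to roughly $10.75/GBase, consistent with the slide's "<$15/GBase"; likewise ~$150 at $10/GBase implies about 15 GBase sequenced per exome. The helper names and the 9-genome, 12-month storage scenario below are hypothetical.

```python
GENOME_SIZE_GBASE = 3.1  # assumption: approximate haploid human genome size in GBase


def gbase_sequenced(coverage, target_gbase=GENOME_SIZE_GBASE):
    """Approximate bases sequenced (GBase) = mean coverage x target size."""
    return coverage * target_gbase


def cost_per_gbase(total_cost_usd, gbase):
    """Cost per gigabase for a run that produced `gbase` GBase."""
    return total_cost_usd / gbase


def implied_gbase(total_cost_usd, usd_per_gbase):
    """GBase implied by a total cost at a given per-GBase rate."""
    return total_cost_usd / usd_per_gbase


def storage_cost(months, usd_per_month=50.0, genomes=1):
    """Storage, website & browser cost at the slide's ~$50/month/genome rate."""
    return months * usd_per_month * genomes


if __name__ == "__main__":
    genome_gb = gbase_sequenced(30)
    print(f"30x genome: {genome_gb:.0f} GBase, "
          f"${cost_per_gbase(1000, genome_gb):.2f}/GBase")
    print(f"Exome at $10/GBase for $150: {implied_gbase(150, 10):.0f} GBase")
    print(f"9 genomes for 12 months: ${storage_cost(12, genomes=9):,.0f}")
```

Changing `GENOME_SIZE_GBASE` shifts the per-GBase figure only slightly (for example, 3.0 gives about $11.11/GBase), so the slide's "<$15/GBase" bound holds under any reasonable genome-size assumption.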