SeqWare on the Cloud: Porting a Genome Center's Infrastructure to Amazon Web Services

This is a talk I (Brian O'Connor) gave at Genome Informatics 2012 describing how SeqWare was used to build the Next-Generation Sequencing (NGS) software infrastructure needed for multiple genome centers (OICR & UNC) and how that infrastructure was then deployed on Amazon's cloud. If you are interested in using SeqWare for your NGS analysis needs, see our open source site at http://seqware.github.com or, for a commercially supported version, see http://nimbusinformatics.com.

Published in: Health & Medicine, Technology

  1. SeqWare on the Cloud: Porting a Genome Center’s Infrastructure to Amazon Web Services. Brian O'Connor, SeqWare Software Architect & Manager for Software Engineering, The Ontario Institute for Cancer Research
  2. Effective Scaling (diagram labels: Integration & Sharing; Expertise; Compute & Storage; Effective System). SeqWare was designed to scale in these ways.
  3. Effective Scaling (diagram labels: Integration & Sharing; Expertise; Compute & Storage; Effective System; added label: Query Engine Poster). SeqWare was designed to scale in these ways.
  4. The Open Source SeqWare Project (architecture diagram: SeqWare Web Service; SeqWare Query Engine; SeqWare Portal; SeqWare Pipeline; SeqWare MetaDB; Local Cluster; Cloud; Big Data; Small Data)
  5. Distinguishing Features of SeqWare (diagram labels: Firehose; Taverna; Open Source/Community; Commercial)
     ● Infrastructure Toolkit
     ● Developer Framework
     ● Automation
     ● Environment-Agnostic
     ● Tailored for Big Projects
     ● User-Created Workflows
     ● Packaging Format
     ● Provenance Tracking
     ● Fault Tolerant
     ● Tools-Agnostic
     ● Open Source
  6. Projects Using SeqWare
     ● UNC Lineberger Cancer Center (+ local projects): 1.5 TBase, 927 samples, 982 “lanes”
     ● Ontario Institute for Cancer Research (+ local projects): 38 TBase, 1,522 samples, 2,297 “lanes”
     ● Iceman, HuRef 300x, Others...: 9 genomes, a 300x genome
     ● Plant Genome Assembly: 2 genomes, JBrowse on iPad
     ● Clinical Sequencing: Hundreds of patient samples
     ● Assay types listed: Exome, Whole Genome, Targeted Resequencing, RNASeq
  7. Scaling Expertise: Analyzing Illumina Data @ OICR
     ● September 2011: rolled out SeqWare at OICR
     ● Goal: to deploy SeqWare and streamline production analysis through automation
        ● 4 groups working together
        ● SeqWare Workflows for
           – Large projects and common tasks
           – Projects with “public uploads”
  8. SeqWare at OICR (architecture diagram: SeqWare Web Service; SeqWare Query Engine; SeqWare Portal; SeqWare Pipeline; SeqWare MetaDB; Local Cluster; Cloud; Big Data; Small Data)
  9. SeqWare at OICR (same architecture diagram; added label: Software Engineering)
  10. SeqWare at OICR (same architecture diagram; added label: Pipeline & Tool Evaluation)
  11. SeqWare at OICR (same architecture diagram; added labels: Sequencing Facility; LIMS)
  12. SeqWare at OICR (same architecture diagram; added labels: User + Data = “deciders”; Production Informatics)
  13. OICR Production Workflows. Multiple groups contributed, including the new Pipeline and Evaluation Team. (diagram labels: In Production; Staging, Testing, & Development Workflows)
  14. OICR SeqWare Results (chart: Samples over Time (~2 years); 38 Trillion Bases Aligned)
     ● Automated key components in production
     ● What about sharing our infrastructure?
     ● To the Cloud!
  15. “The Cloud”: Want to share infrastructure without sharing infrastructure
     ● Core Services: Elastic Compute Cloud (EC2); Simple Storage Service (S3); Linux tools for disk and file encryption
     ● Transfer & Web Services: Elastic Beanstalk; Import/Export; DirectConnect
     ● Other Nifty Services: Glacier; Elastic MapReduce; DynamoDB; HBase
     ● Hardware through API calls
  16. Scaling Computation: Analyzing SOLiD Data on Amazon
     ● Life Collaborations Division had 9 human genomes, such as the Iceman's genome and HuRef, resequenced at high depth
     ● Goal: to deploy SeqWare infrastructure on AWS and analyze data in a scalable way
        ● Without building infrastructure
        ● Using open source tools
  17. SeqWare Infrastructure on EC2 (diagram labels: Workflow Bundle; Command Line Tools; Launcher Config; Instance or Cluster; SeqWare MetaDB; SeqWare Web Service; Amazon EC2; Amazon S3; Import; SeqWare Pipeline; Result Files: BAM, VCF, Reports; SeqWare Portal; User Interfaces; SeqWare; Amazon Instance or Cluster)
  18. Workflow Outputs
     ● Results via Project Website on EC2
     ● Variants Loaded in JBrowse Genome Browser on Elastic Beanstalk
     ● http://icemangenome.net
     ● Variants in Database and Files in S3: BAM; VCF; Annotated VCF; Variant Database
  19. Results
     ● Cloud delivered fantastic computational and storage scalability
     ● Analyzed 9 human genomes, one at 300x!
     ● Costs
        ● 8 node HPC cluster, about 4 days
        ● 30x coverage genome was ~$1000 (<$15/GBase)
        ● ~$150 per exome ($10/GBase)
        ● ~$50/month/genome storage, website, & browser
  20. The Future of SeqWare
     ● Scalability
        ● Cloud-based cluster launching (StarCluster/CloudMan)
        ● Release encryption and distributed filesystem tools
        ● Better documentation and easier setup
     ● Expertise
        ● Simplify pipeline language(s) and development process
        ● Release OICR public workflows
     ● Integration
        ● Expand NoSQL variant/annotation database
        ● Support for other tools like Galaxy
  21. Availability
     ● SeqWare available at: http://seqware.github.com, @SeqWare
        ● VirtualBox & AMI
     ● Brian O'Connor, boconnor@oicr.on.ca
  22. Acknowledgements
     ● SeqWare @ OICR
        ● Morgan Taschuk, Denis Yuen, Yong Liang
     ● OICR SeqProdBio
        ● Tim Beck, Zheng Zha, Tony DeBat
     ● OICR Bioinformatics Core
        ● Francis Ouellette, Zhibin Lu
     ● SeqWare @ UNC
        ● Neil Hayes, Sara Grimm, Stuart Jefferys, Matt Solloway, and the Lineberger group
     ● Tim Harkins
     ● Barry Merriman
     ● Jason Warner
     ● Kevin McKernan
     ● Vincent Ferretti
     ● Lincoln Stein
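
The per-GBase figures on the Results slide (slide 19) can be sanity-checked with a little arithmetic. The sketch below assumes a human genome of roughly 3.1 GBase, a figure that does not appear on the slides, and the function names are purely illustrative; only the dollar amounts and coverage come from the talk.

```python
# Rough check of the cost figures quoted on the Results slide.
# ASSUMPTION (not from the slides): a human genome is ~3.1 GBase.
HUMAN_GENOME_GBASE = 3.1


def sequenced_gbase(coverage, genome_gbase=HUMAN_GENOME_GBASE):
    """Total bases analyzed, in GBase, for a genome at the given fold coverage."""
    return coverage * genome_gbase


def cost_per_gbase(total_cost_usd, gbase):
    """Average analysis cost in USD per GBase."""
    return total_cost_usd / gbase


# Slide figure: a 30x coverage genome cost ~$1000 to analyze.
genome_gbase = sequenced_gbase(30)
genome_rate = cost_per_gbase(1000, genome_gbase)

# Slide figures: ~$150 per exome at $10/GBase, which implies the
# amount of exome data analyzed per sample.
exome_gbase_implied = 150 / 10

print(f"30x genome: {genome_gbase:.0f} GBase, ${genome_rate:.2f}/GBase")
print(f"Exome: about {exome_gbase_implied:.0f} GBase implied per sample")
```

Under that genome-size assumption the whole-genome rate lands below the "<$15/GBase" bound quoted on the slide, and the exome figures imply about 15 GBase of data per exome.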
