SeqWare on the Cloud: Porting a Genome Center's Infrastructure to Amazon Web Services

This is a talk I (Brian O'Connor) gave at Genome Informatics 2012 describing how SeqWare was used to build the Next-Generation Sequencing (NGS) software infrastructure needed for multiple genome centers (OICR & UNC) and how that infrastructure was then deployed on Amazon's cloud. If you are interested in using SeqWare for your NGS analysis needs, see our open source site at http://seqware.github.com or, for a commercially supported version, see http://nimbusinformatics.com.

Usage Rights

CC Attribution-NonCommercial License

Presentation Transcript

  • SeqWare on the Cloud: Porting a Genome Center’s Infrastructure to Amazon Web Services. Brian O'Connor, SeqWare Software Architect & Manager for Software Engineering, The Ontario Institute for Cancer Research
  • Effective Scaling. Diagram labels: Integration; Expertise & Sharing; Compute & Storage; Effective System. SeqWare was designed to scale in these ways
  • Effective Scaling. Same diagram with an added "Query Engine Poster" label. SeqWare was designed to scale in these ways
  • The Open Source SeqWare Project. Components: SeqWare Query Engine; SeqWare Web Service; SeqWare Portal; SeqWare Pipeline; SeqWare MetaDB. Other diagram labels: Local Cluster; Cloud; Big Data; Small Data
  • Distinguishing Features of SeqWare ● Infrastructure Toolkit ● Developer Framework ● Automation ● Environment-Agnostic ● Tailored for Big Projects ● User-Created Workflows ● Packaging Format ● Provenance Tracking ● Fault Tolerant ● Tools-Agnostic ● Open Source. Other labels on the slide: Firehose; Taverna; Open Source/Community; Commercial
  • Projects Using SeqWare. Sites: UNC Lineberger Cancer Center (+ local projects); Ontario Institute for Cancer Research (+ local projects); Others... (Iceman, Plant Genome Assembly, HuRef 300x, Clinical Sequencing). Analysis types listed: Exome; Targeted Resequencing; Whole Genome; RNASeq. Figures shown on the slide: 2 genomes, 1.5 TBase; 38 TBase; 9 genomes; hundreds of patient samples; 927 samples, 982 “lanes”; 1,522 samples, 2,297 “lanes”; a 300x genome; JBrowse on iPad
  • Scaling Expertise: Analyzing Illumina Data @ OICR ● September 2011: rolled out SeqWare at OICR ● Goal: to deploy SeqWare and streamline production analysis through automation ● 4 groups working together ● SeqWare Workflows for – Large projects and common tasks – Projects with “public uploads”
  • SeqWare at OICR. Component diagram: SeqWare Query Engine; SeqWare Web Service; SeqWare Portal; SeqWare Pipeline; SeqWare MetaDB; Local Cluster; Cloud; Big Data; Small Data
  • SeqWare at OICR. Same diagram, with the added label: Software Engineering
  • SeqWare at OICR. Same diagram, with the added label: Pipeline & Tool Evaluation
  • SeqWare at OICR. Same diagram, with the added label: Sequencing Facility LIMS
  • SeqWare at OICR. Same diagram, with the added labels: Production Informatics; User + Data = “deciders”
  • OICR Production Workflows. Multiple groups contributed, including the new Pipeline and Evaluation Team. Chart labels: In Production; Staging, Testing, & Development; Workflows
  • OICR SeqWare Results. 38 Trillion Bases Aligned. Chart labels: Samples; Time (~2 years) ● Automated key components in production ● What about sharing our infrastructure? ● To the Cloud!
  • “The Cloud”. Want to share infrastructure without sharing infrastructure. Hardware through API calls. Core Services: Elastic Compute Cloud (EC2); Simple Storage Service (S3); Linux tools for disk and file encryption. Transfer & Web Services: Elastic Beanstalk; Import/Export; DirectConnect. Other Nifty Services: Glacier; Elastic Map/Reduce; DynamoDB; HBase
  • Scaling Computation: Analyzing SOLiD Data on Amazon ● Life Collaborations Division had 9 human genomes, such as the Iceman's genome and HuRef, resequenced at high depth ● Goal: to deploy SeqWare infrastructure on AWS and analyze data in a scalable way ● Without building infrastructure ● Using open source tools
  • SeqWare Infrastructure on EC2. Diagram labels: Workflow Bundle; Command Line Tools; Launcher Config; SeqWare MetaDB; SeqWare Web Service; SeqWare Pipeline; SeqWare Portal; User Interfaces; Import; Amazon EC2; Amazon S3; Instance or Cluster; Amazon Instance or Cluster; Result Files: BAM, VCF, Reports
  • Workflow Outputs ● Results via Project Website on EC2 ● Variants Loaded in JBrowse Genome Browser on Elastic Beanstalk ● http://icemangenome.net ● Variants in Database and Files in S3. File and data types shown: BAM; VCF; Annotated VCF; Variant Database
  • Results ● Cloud delivered fantastic computational and storage scalability ● Analyzed 9 human genomes, one at 300x! ● Costs ● 8 node HPC cluster, about 4 days ● 30x coverage genome was ~$1000 (<$15/GBase) ● ~$150 per exome ($10/GBase) ● ~$50/month/genome for storage, website, & browser
  • The Future of SeqWare ● Scalability ● Cloud-based cluster launching (StarCluster/CloudMan) ● Release encryption and distributed filesystem tools ● Better documentation and easier setup ● Expertise ● Simplify pipeline language(s) and development process ● Release OICR public workflows ● Integration ● Expand NoSQL variant/annotation database ● Support for other tools like Galaxy
  • Availability ● SeqWare available at: http://seqware.github.com, @SeqWare ● VirtualBox & AMI ● Brian O'Connor, boconnor@oicr.on.ca
  • Acknowledgements ● SeqWare @ OICR: Morgan Taschuk, Denis Yuen, Yong Liang ● OICR SeqProdBio: Tim Beck, Zheng Zha, Tony DeBat ● OICR Bioinformatics Core: Francis Ouellette, Zhibin Lu ● SeqWare @ UNC: Neil Hayes, Sara Grimm, Stuart Jefferys, Matt Solloway, and the Lineberger group ● Tim Harkins ● Barry Merriman ● Jason Warner ● Kevin McKernan ● Vincent Ferretti ● Lincoln Stein
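
The per-GBase cost figures on the Results slide can be sanity-checked with a little arithmetic. The sketch below is not from the talk: it assumes a human genome of roughly 3.1 GBase (an outside figure, not stated on the slides) and uses hypothetical helper names. It only shows that ~$1000 for a 30x genome is consistent with the quoted "<$15/GBase", and that ~$150 per exome at $10/GBase implies about 15 GBase of sequence per exome.

```python
# Back-of-the-envelope check of the cost figures quoted on the Results slide.
# ASSUMPTION: a human genome is roughly 3.1 GBase; this number is not in the talk.
HUMAN_GENOME_GBASE = 3.1


def sequenced_gbase(coverage, genome_gbase=HUMAN_GENOME_GBASE):
    """Total sequenced bases (in GBase) at the given fold coverage."""
    return coverage * genome_gbase


def cost_per_gbase(total_cost_usd, gbase):
    """Analysis cost in USD per GBase of sequence."""
    return total_cost_usd / gbase


def implied_gbase(total_cost_usd, usd_per_gbase):
    """Amount of sequence (in GBase) implied by a total cost and a per-GBase rate."""
    return total_cost_usd / usd_per_gbase


# 30x whole genome at ~$1000 (slide figure).
genome_gbase = sequenced_gbase(30)
genome_rate = cost_per_gbase(1000, genome_gbase)

# Exome at ~$150 and $10/GBase (slide figures).
exome_gbase = implied_gbase(150, 10)

print(f"30x genome: {genome_gbase:.0f} GBase, about ${genome_rate:.2f}/GBase")
print(f"Exome: about {exome_gbase:.0f} GBase of sequence implied")
```

Under that genome-size assumption the whole-genome rate comes out a little under $11/GBase, which sits below the "<$15/GBase" bound on the slide.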