Virtual Proteomics Analysis Cluster in the Cloud

It's always sunny on top of the Cloud! An intro to Amazon Web Services
Simon Twigger, Ph.D.
Medical College of Wisconsin, Milwaukee
ViPDAC, a stand-alone Proteomics Analysis Suite in the Cloud

‘How the humble pipette tip helped us rethink our computing strategy...’

Meet Joe ‘the’ Researcher...

Proteomics - Finding and identifying proteins
Workflow: Rat/Tissue Sample -> LC MS/MS -> Peptide Identification (against DB) -> Results & Analysis

Current architecture
Windows (head node, preprocessing, storage)
IBM Blade Cluster (Sequest)
Data flow: Raw File -> .dtas -> Protein IDs

Finite resource, wait your turn
1 MCW Cluster

Here’s the lab’s pipette tip. Let me have it when you’re done...

What would you do if there was only one tip?
Wait in line to use it
Run fewer experiments (due to waiting in line)
Do small scale things (It’s a small tip, pipetting 5 L takes all week!)
Try fewer things (it’s a real pain to keep washing it up)
Not try anything weird (What happens if it gets permanently clogged!?)

OK, more computers might be better... but...
we don’t have the money!
we don’t have an IT guy/gal
we don’t have a sysadmin
we don’t know how to install a cluster
we won’t use it all the time

Virtual Proteomics Analysis Cluster (ViPDAC)
http://proteomics.mcw.edu/vipdac

Current architecture with Sequest
Windows (head node, preprocessing, storage)
IBM Blade Cluster (Sequest)
Data flow: Raw File -> .dtas -> Protein IDs

ViPDAC & Amazon Components
S3 (Data Store)
EC2 (OMSSA, X!Tandem)
Data flow: Raw File -> .dtas -> Protein IDs

ViPDAC & Amazon Components
S3 (Data Store)
Data flow: Raw File -> .dtas -> Protein IDs
2x, 3x, 20x
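One way to picture the 2x, 3x, 20x scaling on this slide: the .dta files from one run are dealt out across however many worker nodes are started. The sketch below is an illustration only and is not ViPDAC's actual scheduler; the function name `partition_spectra` and the file names are hypothetical.

```python
from typing import List


def partition_spectra(dta_files: List[str], n_workers: int) -> List[List[str]]:
    """Deal files out round-robin so each worker gets a near-equal share."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    batches: List[List[str]] = [[] for _ in range(n_workers)]
    for index, name in enumerate(dta_files):
        batches[index % n_workers].append(name)
    return batches


if __name__ == "__main__":
    # Hypothetical file names standing in for the .dta files of one run.
    files = [f"scan_{i:04d}.dta" for i in range(10)]
    for workers in (2, 3, 20):
        sizes = [len(batch) for batch in partition_spectra(files, workers)]
        print(f"{workers} workers -> batch sizes {sizes}")
```

With more workers than files, some batches stay empty, which is the point at which adding nodes stops helping for a single run.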

ViPDAC: Create a new analysis job

Wait in line vs On Demand
1 MCW Cluster vs Molly’s ViPDAC, Shama’s ViPDAC, Brian’s ViPDAC, Bassam’s ViPDAC
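The contrast on this slide can be put in numbers with a toy calculation. It assumes each of the four users submits one job at the same moment and that the shared cluster runs jobs one at a time in order; the 2-hour job length is a made-up figure, not a measurement from the MCW cluster, and instance start-up time is ignored.

```python
def shared_cluster_waits(job_hours):
    """Hours each job waits before starting when jobs run one at a time, in order."""
    waits = []
    elapsed = 0.0
    for hours in job_hours:
        waits.append(elapsed)
        elapsed += hours
    return waits


def on_demand_waits(job_hours):
    """With one cluster per user, no job waits on another."""
    return [0.0 for _ in job_hours]


if __name__ == "__main__":
    # Hypothetical 2-hour searches for the four users named on the slide.
    users = ["Molly", "Shama", "Brian", "Bassam"]
    jobs = [2.0, 2.0, 2.0, 2.0]
    shared = shared_cluster_waits(jobs)
    cloud = on_demand_waits(jobs)
    for user, s, c in zip(users, shared, cloud):
        print(f"{user}: waits {s} h on the shared cluster, {c} h on demand")
```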

Equal-opportunity computing - Clusters for All
1 PC vs 1 ViPDAC or n ViPDACs

Observations
Sign up & Start up is hard for biologists.
http://www.directthought.com/
http://www.elasticpod.com/

Now what?
No need to wait in line to use it
No need to run fewer analyses
No need to do small scale things
No need to try fewer things
No need to not try anything weird
Molly’s ViPDAC, Shama’s ViPDAC, Brian’s ViPDAC, Bassam’s ViPDAC

Internal Hybrid Solution – Local and Cloud
Scale up/down/off

Clouds & Bioinformatics: Our observations so far
Use it as a software delivery method
Use it to provide computing to virtually anyone
Get fast access to large data files (Ensembl, GenBank, etc.)
Use it to COMPLEMENT existing clusters/grids
AMIs/Apps not easy for non-informatics folks to get going
‘Cloud-friendly’ licensing structures for commercial software?
‘Grant-friendly’ billing options
Data transfer for large datasets (NextGen sequencing?)

Acknowledgements
Joey Geiger, Brian Halligan and Andrew Vallejos
Molly Pellitteri-Hahn, Shama Mirsa
Mike Olivier, Andy Greene
NHLBI National Proteomics Center
Low Cost, Scalable Proteomics Data Analysis Using Amazon’s Cloud Computing Services and Open Source Search Algorithms. J. Proteome Res., 2009, 8 (6), pp 3148–3153
