
Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps - Session Sponsored by DiUS



Medical researchers are constantly looking for ways to run more experiments, innovate faster and reach meaningful research outcomes sooner. One of the major barriers is long processing times caused by very large datasets. A combined industry and research partnership, large-scale on-demand compute and the cloud have been key to making inroads into this very common challenge.

DiUS and the Walter and Eliza Hall Institute of Medical Research (WEHI) have been working on approaches to accelerate the capture, processing and analysis of bioimagery and microscopy data used in the research labs at WEHI. In this talk, Pavi and Lachlan will share a case study, starting with a background on microscope development and a synopsis of state-of-the-art microscopy techniques that require large-scale compute. The session will then move into a discussion of scaling complex image analysis using Fiji, a bio-science image analysis package, and of dealing with ever-growing bioimaging datasets.

You will learn about the development of tailored high performance compute (HPC) platforms on AWS to enable this kind of research as well as the 'convention-over-configuration' framework developed by DiUS as a repeatable solution. Lower level technical considerations around network integration, efficient data movement and cluster compute approaches using CfnCluster on AWS will also be discussed in detail.

Speakers: Lachlan Whitehead, PhD, BioImage Analyst and Microscopy, Walter and Eliza Hall Institute of Medical Research & Pavi De Alwis, Snr. Software Engineer, DiUS

Published in: Technology


  1. 1. Faster Time to Science: Scaling BioMedical Research in the Cloud with SciOps Pavi De Alwis (@paviOO), Lachlan Whitehead (@DrLachie)
  2. 2. Talk Outline Who? § Who are we and what are we doing here? Why? § Microscopy and Image Analysis How? § How we utilised AWS to speed up the science What? § Our solution (WIA framework)
  3. 3. DiUS DiUS is an Australian-based technology services crew with a DNA that's cloud-first, human-powered, ‘small-a’ agile, lean and outcome focused. We use AWS to help our customers transform the way they develop and deliver digital products; to experiment better, move faster, enter new markets, compete and win.
  4. 4. About Me: Pavi De Alwis § Software Engineer at DiUS § Many hats across all SDLC activities § Experience across different domains, languages and tools § Applying Software Engineering and DevOps to Scientific Computing
  5. 5. Walter and Eliza Hall Institute § Oldest medical research institute in Australia § Discovery, Translation and Education § Cancer § Inflammation § Infection and Immunity § Stem Cells § Personalised Medicine § Etc.
  6. 6. About Me: Lachlan Whitehead § PhD in Physics from the University of Melbourne in ARC Centre of Excellence for Coherent X-ray science § BioImage Analyst at the Walter and Eliza Hall Institute in the imaging laboratory § What do I do: § Pretty pictures are no longer good enough § This is a quickly developing field and heavy on computation – something most biologists have no experience in
  7. 7. What is a Microscope?
  8. 8. What is a Microscope Image? From a raw data perspective? § XY (+ Intensity) § Color Channel § Z (depth) § Time Microscope companies keep inventing more: Position, Plate number, Block, Wavelength etc.
  9. 9. Cutting Edge Microscopy Pros: § Very fast § Very gentle § Very high resolution Cons: § Very fast § Very gentle § Very high resolution (Chen et al., Science 2014)
  10. 10. The Problem – Data Size § Uncompressed 8-bit or 16-bit files § A 3 channel, 15 slice image with 200 time points is nearly 30GB
  11. 11. The Problem - Compute § It can be hard to just move files that size around § Generally a whole image loaded into RAM § Large numbers of small files similarly problematic
  12. 12. Walter and Eliza Hall Institute Art of Science Competition The Problem - Variety of Experiments
  13. 13. What can we analyse? § Object counts – cell death / proliferation § Intensity – protein / gene expression § Morphology – size / shape / location of tumours § Motion – cell behaviour over time, speed and direction of migration Image Analysis § Analysis ‘arms race’ § Tools of the trade § Microscope companies also provide their own (limited) tools for dealing with data § Many tools are open-source, others are extremely expensive § Image analysis is embarrassingly parallel
  14. 14. Aside - Embarrassingly Parallel “an embarrassingly parallel workload or problem (also called perfectly parallel or pleasingly parallel) is one where little or no effort is needed to separate the problem into a number of parallel tasks” -Wikipedia
  15. 15. The Brief From: § Locking up my desktop and running analysis for hours at a time To: § Running parallelised analysis on the AWS cloud Requirements: § Must be simple § Must be efficient § Must be reusable
  16. 16. SciOps [sic] § Software Engineering and DevOps techniques § Pairing with Scientists and Labs § Conventions, light-weight frameworks and exposed configuration § Seamless - dev on desktop, workload on cluster § Data from instrument to cloud § Ad-hoc custom compute
  17. 17. AWS § EC2 - Elastic Compute Cloud § S3 - Simple Storage Service § IAM - Identity and Access Management § AMI - Amazon Machine Image
  18. 18. Setting Up Compute [Diagram: a local machine holding ‘local’ data, an S3 storage ‘bucket’, one EC2 master node and four EC2 compute nodes]
  19. 19. Cluster Setup cfnCluster § CLI tool to build and manage HPC clusters § Provide configuration § Press enter and wait a couple of minutes § Custom spec cluster (user defined) § Services involved: CloudFormation, IAM, SNS, SQS, EC2, AutoScaling, EBS, CloudWatch, S3, DynamoDB
  20. 20. Deploying Clusters in AWS
  21. 21. What’s My Parallel Processing Model? 1. Get the directory or file 2. List the files or ‘dimensions’ in a file 3. Run the same analysis across files / dimensions 4. Display steps live on screen interactively
  22. 22. What’s My Runtime Model in AWS? 1. Run headless on EC2 2. Start Fiji with a macro and a configuration file 3. Configuration file contains the ‘subset’ to analyse (i.e. files or dimensions) 4. Write results to disk
  23. 23. Fiji and AWS 1. Custom AMI with Fiji pre-installed 2. Modify analysis macros to run online 3. Some Fiji plugins can’t run headless 4. Multiple instances of Fiji on EC2 cause all sorts of problems - RMI
  24. 24. Project Lifecycle § New project on the LAN/local machine § Sync - to AWS § Kick off image processing workloads on the HPC cluster § Multiple jobs queued per node § Sync - from AWS Choices to make § Size of machines § Level of parallelisation § Time costs / benefits Considerations § Costs of cluster § How long cluster will be up § Data transfer isn’t instant
  25. 25. Cost § S3 (Storage): ~3c per gigabyte per month § EC2 (Compute): scales with machine type
      Machine Name   Specs                              Price/hr
      t2.micro       1 CPU, 500MB RAM, cloud storage    2c
      m4.2xlarge     8 CPU, 32GB RAM, SSD storage       67.3c
      r3.8xlarge     32 CPU, 244GB RAM, SSD storage     $3.192
  26. 26. Cost § m4.large + 4x m4.2xlarge = $3.03 / hour. Only 4 m4.2xlarge nodes are needed for pretty large image data. § What if we had lots of small images? t2.large + 10x t2.large = $1.93 / hour § Cost per hour scales more or less linearly with the number of machines. Computation time shrinks by about the same factor! [Diagram: m4.large master node, S3 storage ‘bucket’, four m4.2xlarge compute nodes]
  27. 27. Fiji Optimization on EC2
  28. 28. WIA (Imaging on AWS) § Fully documented SciOps framework § Contains CLI tools: § Create new project structure § Generate config files § Sync data into S3 and back § Create AMI with customised Fiji § Submit and manage jobs on HPC queues § Also contains: § cfnCluster config file and instructions § Generic Fiji macro launcher § Project structure:
      projectName/
      ├── bin
      ├── input
      ├── output
      └── src
  29. 29. Conventions Established by § Analytical need § Software tools § Varies by problem § Custom compute frameworks § Experiment, build, automate § Repeatable templates § Short lived clusters § HPC on demand via SciOps
  30. 30. The Future § Our lab is building and expanding § Many labs don’t have access to local cluster compute § Faster development and turnaround from acquisition to analysis to publication § If this had been around when I was a PhD student, I would have completed sooner
  31. 31. Stay in Touch Pavi De Alwis (@paviOO), Lachlan Whitehead (@DrLachie) Acknowledgments § AWS: Adrian White § DiUS: Paula Ngov § DiUS: Voon Wong § WEHI: Kelly Rogers § WEHI: Andrew Webb Find us @ location G1, right next to the AWS Booth
  32. 32. Resources – From DiUS and AWS Read the case study: § Proving High-Performance Cloud Computing Can Support Disease Prevention Check out our technical blogs: § Scientific image processing in the cloud with Fiji/ImageJ § Building an auto-scaling R cluster using CfnCluster Read the AWS Blog: § High Performance Cloud Computing Supports Disease Prevention
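
The data-size figure on slide 10 (a 3 channel, 15 slice image with 200 time points at nearly 30GB) can be sanity-checked with a short calculation. This is only a sketch: the slide does not give the frame size or which bit depth applies, so the 1280 x 1280 frame and 16-bit depth below are assumptions chosen for illustration, and `image_stack_bytes` is a hypothetical helper name.

```python
def image_stack_bytes(width, height, channels, slices, timepoints, bit_depth):
    """Uncompressed size in bytes of a multi-dimensional image stack."""
    bytes_per_pixel = bit_depth // 8
    return width * height * channels * slices * timepoints * bytes_per_pixel


# Channels, slices and time points come from slide 10.
# The 1280 x 1280 frame and 16-bit depth are assumed, not taken from the talk.
size = image_stack_bytes(1280, 1280, channels=3, slices=15,
                         timepoints=200, bit_depth=16)
print(f"{size / 1e9:.1f} GB")  # prints "29.5 GB"
```

Every dimension multiplies the total, which is why adding time points or positions to an experiment pushes a single acquisition past what a desktop can hold in RAM.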
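
Slide 19 describes providing a configuration to cfnCluster and letting it build the cluster. As a rough illustration, the fragment below generates part of such a configuration with Python's standard `configparser`. The section and key names follow the CfnCluster configuration file format as generally documented; the values (instance types from slide 26, a queue of up to four nodes, the `sge` scheduler) are assumptions, and a real file also needs entries such as a key pair and VPC settings that are omitted here. `build_cluster_config` is a hypothetical helper, not part of WIA.

```python
import configparser
import io


def build_cluster_config(compute_type, max_nodes, scheduler="sge"):
    """Return a partial CfnCluster-style config as INI text (fragment only)."""
    config = configparser.ConfigParser()
    config["global"] = {"cluster_template": "default"}
    config["cluster default"] = {
        "master_instance_type": "m4.large",      # master node, as on slide 26
        "compute_instance_type": compute_type,   # worker nodes
        "initial_queue_size": "0",               # assumed: start with no workers
        "max_queue_size": str(max_nodes),        # upper bound for auto scaling
        "scheduler": scheduler,                  # assumed scheduler choice
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


config_text = build_cluster_config("m4.2xlarge", max_nodes=4)
print(config_text)
```

Keeping the cluster shape in one small generated file is what makes short-lived, per-project clusters practical: changing machine size or node count is a one-line edit.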
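
The runtime model on slides 21 and 22 (list the files, give each headless Fiji run a configuration file naming its 'subset', write results to disk) might look roughly like the sketch below. The JSON layout, the file names and the helper functions are all hypothetical; the talk does not show WIA's actual config format. The launch line uses Fiji's documented `--headless` and `-macro` options, with the config path passed as the macro argument.

```python
import json
import tempfile
from pathlib import Path


def split_into_subsets(items, n_jobs):
    """Deal items round-robin into at most n_jobs non-empty subsets."""
    subsets = [items[i::n_jobs] for i in range(n_jobs)]
    return [subset for subset in subsets if subset]


def write_job_configs(files, n_jobs, config_dir):
    """Write one JSON config per job; each lists the files that job analyses."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for job_id, subset in enumerate(split_into_subsets(files, n_jobs)):
        path = config_dir / f"job_{job_id:03d}.json"
        path.write_text(json.dumps({"job_id": job_id, "subset": subset}, indent=2))
        paths.append(path)
    return paths


def fiji_command(fiji_binary, macro_path, config_path):
    """Command line for one headless Fiji run; the macro receives the config path."""
    return [str(fiji_binary), "--headless", "-macro", str(macro_path), str(config_path)]


# Ten hypothetical image files split across four jobs.
files = [f"img_{i:02d}.tif" for i in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    config_paths = write_job_configs(files, n_jobs=4, config_dir=tmp)
    commands = [fiji_command("ImageJ-linux64", "analysis.ijm", p) for p in config_paths]
    for command in commands:
        print(" ".join(command))
```

Because each job only reads its own subset, the jobs share no state, which is the embarrassingly parallel property from slide 14. Each command can then be handed to the cluster's job queue independently.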
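
The 'Sync - to AWS' and 'Sync - from AWS' steps in the project lifecycle (slide 24) could be driven by the AWS CLI's `aws s3 sync` command. The talk does not say how WIA implements its sync tool, so this is an assumption, and the bucket name and paths below are placeholders. The sketch only builds the command; the call that would actually run it is left commented out.

```python
import subprocess


def s3_sync_command(source, destination, dry_run=False):
    """Build an `aws s3 sync` command that copies new or changed files."""
    command = ["aws", "s3", "sync", source, destination]
    if dry_run:
        command.append("--dryrun")  # list what would be copied without copying
    return command


# Hypothetical project paths and bucket name.
push = s3_sync_command("projectName/input", "s3://example-bucket/projectName/input")
pull = s3_sync_command("s3://example-bucket/projectName/output", "projectName/output",
                       dry_run=True)
print(" ".join(push))
print(" ".join(pull))

# To execute for real (requires the AWS CLI and credentials):
# subprocess.run(push, check=True)
```

A dry run before the first upload is a cheap way to see how much data is about to move, which matters given slide 24's note that data transfer isn't instant.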
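
The cost reasoning on slide 26 can be made concrete with a small model. It assumes a perfectly parallel workload, so wall-clock time is the single-node time divided by the number of compute nodes. The m4.2xlarge price is the 67.3c per hour from slide 25; the master node price is not listed in the talk, so it is inferred here from the slide's $3.03 total, and the 8 hour single-node workload is invented for illustration.

```python
def cluster_rate(master_price, compute_price, n_compute):
    """Hourly price of one master node plus n_compute compute nodes."""
    return master_price + n_compute * compute_price


def job_cost(master_price, compute_price, n_compute, single_node_hours):
    """Wall-clock hours and total cost, assuming perfect parallel speed-up."""
    wall_clock_hours = single_node_hours / n_compute
    cost = cluster_rate(master_price, compute_price, n_compute) * wall_clock_hours
    return wall_clock_hours, cost


M4_2XLARGE = 0.673                # $/hour, from slide 25
MASTER = 3.03 - 4 * M4_2XLARGE    # $/hour, inferred from slide 26's total (assumption)

for n in (1, 2, 4, 8):
    hours, cost = job_cost(MASTER, M4_2XLARGE, n, single_node_hours=8.0)
    rate = cluster_rate(MASTER, M4_2XLARGE, n)
    print(f"{n} nodes: ${rate:.2f}/hour for {hours:.1f} h, total ${cost:.2f}")
```

Under these assumptions the compute portion of the bill stays constant as nodes are added, while the wait shrinks. The total actually drops slightly because the master node runs for fewer hours. Real workloads will fall short of perfect scaling once data transfer and job start-up are included.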