
GT-Scan2: Bringing bioinformatics to the cloud - May Tech Talk



GT-Scan2: Bringing bioinformatics to the cloud - ANDS May Tech Talk presentation by Laurence Wilson and Aidan O'Brien (CSIRO), 5th May 2017

Published in: Science


  1. GT-Scan2: bringing bioinformatics to the cloud
     HEALTH & BIOSECURITY
     Laurence Wilson and Aidan O’Brien
     AWS Tech Talk, 5th May, 2017
  2. What is bioinformatics?
     • Biological research using computational methods
     • It is a data-driven science
     [Figure labels: Astronomy, Twitter, YouTube, Genomics]
     GT-Scan2 | Laurence Wilson
  3. My research field
     • My research focuses on genome editing with CRISPR-Cas9
     • CRISPR-Cas9 is a recent discovery that has revolutionized the genome-editing field
     • CRISPR-Cas9 target sites are identified through pipelines
  4. GT-Scan2: our CRISPR-Cas9 pipeline
     • We developed a pipeline that accurately identifies and evaluates potential CRISPR-Cas9 target sites
     [Figure: pipeline diagram. Labels: Step 1: Identify; Step 2: Evaluate; Off-targets; Sequence; Chromatin environment; ACGTCAGA…, GACATCGA…, TTACAGGG…; Step 3: Rank; Activity]
     @gt_scan2 @dr_lwilson
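The "Identify" step can be illustrated with a minimal sketch. It assumes the common SpCas9 convention of a 20-nt protospacer followed by an NGG PAM; the slides do not state which rules GT-Scan2 applies, and the function names here are hypothetical.

```python
import re

# Assumption: a candidate site is 20 nt followed by an NGG PAM (23 nt total).
# The lookahead lets overlapping candidates be reported.
SITE = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")
SITE_LEN = 23

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(seq: str) -> list:
    """Return (strand, start, site) for each candidate on both strands.

    `start` is always given in forward-strand coordinates.
    """
    seq = seq.upper()
    sites = [("+", m.start(), m.group(1)) for m in SITE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in SITE.finditer(rc):
        # Map the offset on the reverse strand back to the forward strand.
        start = len(seq) - (m.start() + SITE_LEN)
        sites.append(("-", start, m.group(1)))
    return sites
```

Each returned candidate would then go on to the evaluation and ranking steps.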
  5. The issue of scale…
     • The potential search space can range from small (~100 bp, ~10 target sites) to very large (~20,000 × ~10,000 bp, millions of target sites)
     • All potential target sites need to be evaluated
     • While individually the tasks are small, the scale makes it difficult
     • We needed a way to make the approach scalable
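As a rough check on the scale quoted in slide 5, the arithmetic can be worked through directly. This assumes the site density of the small case (about 10 sites per 100 bp) also holds for the large case, which the slide does not claim.

```python
# Approximate figures quoted on the slide.
small_bp, small_sites = 100, 10
regions, region_bp = 20_000, 10_000

total_bp = regions * region_bp
# Assumption: the small-case density carries over to the large search space.
estimated_sites = total_bp * small_sites // small_bp

print(f"{total_bp:,} bp -> about {estimated_sites:,} candidate sites")
```

Under that assumption the large case comes to tens of millions of candidates, consistent with the "millions of target sites" on the slide.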
  6. AWS Lambda
     “Applications built as a collection of microservices are more resilient and easier to scale” – AWS guy
     • Serverless compute service
     • Event-driven
     • Designed for small, quick tasks that communicate with each other to complete a larger goal
     • 300 s maximum wall time (the default is 3 s)
     • 512 MB storage space (scratch/tmp)
     • Memory from 128 MB to 1536 MB
     • With Lambda, we can break our pipeline into individual functions that run in parallel
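A small, event-driven Lambda function of the kind described in slide 6 might look like the sketch below. The event shape (`targets` with `id` and `sequence`) and the GC-fraction score are hypothetical stand-ins, not GT-Scan2's actual payload or scoring; `get_remaining_time_in_millis()` is the standard method on the Lambda context object.

```python
def evaluate_target(target: dict) -> dict:
    """Hypothetical stand-in for one small scoring task (GC fraction)."""
    seq = target["sequence"].upper()
    gc = sum(base in "GC" for base in seq) / len(seq)
    return {"id": target["id"], "gc_fraction": round(gc, 3)}


def handler(event, context):
    """Lambda entry point: score targets until the time budget runs short."""
    done, unprocessed = [], []
    for target in event["targets"]:
        # Keep a safety margin before the configured timeout is reached.
        if context.get_remaining_time_in_millis() < 1000:
            unprocessed.append(target)
        else:
            done.append(evaluate_target(target))
    return {"results": done, "unprocessed": unprocessed}
```

Returning the unprocessed targets lets a caller hand them to another invocation instead of losing them to a timeout.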
  7. [Image-only slide]
  8. Overcoming some limits
     • For evaluation of off-targets, GT-Scan2 uses Bowtie2
     • Bowtie2 searches the genome for matches to the target site
     • This requires the reference genome to be uploaded with the Lambda function
     • The human genome is ~2 GB, but Lambda has a limit of 512 MB
  9. The solution…
     • Split the genome into bite-sized pieces
     • This way each Lambda instance can download a single genome piece (1-3 chromosomes) and search it for targets
     • This is not always a valid solution, because splitting the reference can produce different results
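One way to group chromosomes into pieces that fit under the storage limit is a greedy first-fit packing, sketched below. The slides do not say how GT-Scan2 chooses its pieces; the function name, chromosome names, and sizes are hypothetical placeholders.

```python
def split_into_parts(sizes_mb: dict, limit_mb: int) -> list:
    """Pack items into as few parts as a greedy first-fit pass finds.

    Largest items are placed first; each goes into the first part
    that still has room, or opens a new part.
    """
    parts = []  # each part: {"names": [...], "total": int}
    ordered = sorted(sizes_mb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > limit_mb:
            raise ValueError(f"{name} alone exceeds the {limit_mb} MB limit")
        for part in parts:
            if part["total"] + size <= limit_mb:
                part["names"].append(name)
                part["total"] += size
                break
        else:
            parts.append({"names": [name], "total": size})
    return parts


# Placeholder index sizes in MB (not real measurements).
example = {"chrA": 300, "chrB": 250, "chrC": 200, "chrD": 150, "chrE": 100}
parts = split_into_parts(example, limit_mb=512)
```

In practice the limit would need headroom below 512 MB for the function's other temporary files.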
  10. In GT-Scan2 we use SNS
      • We have a Lambda function that watches the target table
      • This function sends a message to the SNS topic
      • This “message” can be a payload of 64 KB
      • JSON format
      • Contains information about the targets (sequence, genome, etc.)
      • Sends one message per “genome-part”
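The fan-out in slide 10 (one JSON message per genome part, each within the payload limit) can be sketched as follows. The field names and function names are hypothetical; the 64 KB figure is taken from the slide.

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024  # payload size quoted on the slide


def build_messages(targets: list, genome: str, parts: list) -> list:
    """Build one JSON message body per genome part."""
    messages = []
    for part in parts:
        body = json.dumps(
            {"genome": genome, "genome_part": part, "targets": targets}
        )
        if len(body.encode("utf-8")) > MAX_PAYLOAD_BYTES:
            raise ValueError(f"message for {part} exceeds the payload limit")
        messages.append(body)
    return messages


def publish_all(messages: list, publish) -> None:
    """Send every message through the supplied publish callable."""
    for body in messages:
        publish(body)
```

With boto3, `publish` could be something like `lambda body: sns.publish(TopicArn=topic_arn, Message=body)`, where `sns = boto3.client("sns")` and `topic_arn` is the topic in use. Passing the callable in keeps the message-building logic testable without AWS access.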
  11. [Image-only slide]
  12. GT-Scan2
      • We implemented the final pipeline as a web service, available at (URL not captured in the transcript)
  13. What’s next? Step Functions
      • Allow you to coordinate microservices using visual workflows
      • For example:
        • Run a second function if the first function completes successfully
        • Run two functions in parallel and merge the results
        • Choose between two functions based on the output of the first function
      • Although we can (and do) achieve some of this without Step Functions, some parts are inefficient and complicated
      • Step Functions allow for better management of task timeouts, failures, etc.
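The three patterns in slide 13 map onto the Task, Parallel, and Choice state types of the Amazon States Language. The sketch below builds such a definition as a Python dictionary and serializes it to JSON. The state names, the `$.target_count` field, and the function ARNs are hypothetical; this is not GT-Scan2's actual workflow.

```python
import json

# Hypothetical ARN prefix; REGION and ACCOUNT_ID are placeholders.
ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"


def task(name: str) -> dict:
    """A single-state branch that runs one (hypothetical) function."""
    return {
        "StartAt": name,
        "States": {name: {"Type": "Task", "Resource": ARN + name, "End": True}},
    }


state_machine = {
    "Comment": "Illustrative workflow sketch only",
    "StartAt": "FindTargets",
    "States": {
        # Sequential step with timeout, retry, and failure handling.
        "FindTargets": {
            "Type": "Task",
            "Resource": ARN + "FindTargets",
            "TimeoutSeconds": 300,
            "Retry": [{"ErrorEquals": ["States.Timeout"], "MaxAttempts": 2}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failed"}],
            "Next": "AnyTargets",
        },
        # Branch on the output of the first function.
        "AnyTargets": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.target_count",
                    "NumericGreaterThan": 0,
                    "Next": "Evaluate",
                }
            ],
            "Default": "NoTargets",
        },
        # Run two functions in parallel; outputs arrive as one array.
        "Evaluate": {
            "Type": "Parallel",
            "Branches": [task("OffTargets"), task("Activity")],
            "Next": "Rank",
        },
        "Rank": {"Type": "Task", "Resource": ARN + "Rank", "End": True},
        "NoTargets": {"Type": "Succeed"},
        "Failed": {"Type": "Fail", "Error": "PipelineError"},
    },
}

definition = json.dumps(state_machine, indent=2)
```

The resulting `definition` string is what would be supplied when creating the state machine.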
  14. Transformational Bioinformatics | Denis C. Bauer | @allPowerde
      Team and collaborators: Natalie Twine, Denis Bauer, Oscar Luo, Rob Dunne, Piotr Szul, Aidan O’Brien, Laurence Wilson, Adrian White, Mia Champion, Gaetan Burgio, David Levy, Dan Andrews, Kaitao Lai, Kaylene Simpson, Iva Nikolic, Ian Blair, Kelly Williams