Delivering Bioinformatics Training Using
Cloud Computing Infrastructure
Nathan S. Watson-Haigh
  • Once we had the beginnings of the content, we wanted to work out how that content would be delivered. Once we knew the mode of delivery, we wanted to draw on existing technology and infrastructure to develop a training environment.
    1. Delivering Bioinformatics Training Using Cloud Computing Infrastructure
        Nathan S. Watson-Haigh
    2. Take-Home Message
        • Cloud computing infrastructure
          – Solves some issues in delivering hands-on bioinformatics training
          – Has its own unique set of issues
        • Code and materials (CC and Open Access)
          – NGS Workshop
          – Try rolling your own! github.com/BPA-CSIRO-Workshops
        Watson-Haigh, N.S., et al. (2013). Next-generation sequencing: a challenge to meet the increasing demand for training workshops in Australia. Brief Bioinform 14, 563–574. http://bib.oxfordjournals.org/content/14/5/563
    3. ACKNOWLEDGEMENTS
        • Catherine Shang (Bioplatforms Australia)
        • Nathan Watson-Haigh (ACPFG)
        • Nandan Deshpande (Systems Biology Initiative, UNSW)
        • Paula Moolhuijzen (CCG, Murdoch University)
        • Sonika Tyagi (Australian Genome Research Facility)
        • Matthew Field (ANU)
        • Annette McGrath (CSIRO Bioinformatics Core, Digital Productivity and Services Flagship)
        • Konsta Duesing (Food & Nutrition Flagship, CSIRO)
        • Xi (Sean) Li (CSIRO Bioinformatics Core, Digital Productivity and Services Flagship)
        • Sean McWilliam (CSIRO Agricultural Productivity Flagship)
        • Paul Greenfield (CSIRO Digital Productivity and Services Flagship)
        • Cath Brooksbank (EBI)
        • Vicky Schneider (TGAC)
        • Matthias Haimel (University of Cambridge)
        • Myrto Kostadima (University of Cambridge)
        • Remco Loos (EBI)
        • Alex Mitchell (EBI)
        • Hubert Denise (EBI)
        • Jerico Revote (Monash e-Research Centre)
        • Simon Michnowicz (Monash e-Research Centre)
        • Steve Quenette (Monash University)
        • Mark Crowe (QFAB)
        • Peter Sterk (Oxford e-Research Centre)
    4. THE SETUP
        [Diagram labels: Office, Host, Workshop VM]
    5. SETTING UP
        [Diagram labels: Office, Host, Admin VM, Cloud API Tools]
    6. DEALING WITH HICCUPS
        [Diagram labels: Office, Host, Sysadmin Computer, Portable Apps, Admin VM, parallel-ssh / -scp / -slurp]
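The slide names parallel-ssh, -scp and -slurp as the tools run from the Admin VM when something goes wrong across many workshop VMs. As a rough sketch only, the command lines could be assembled like this. The hosts file name, the login user and the example paths are assumptions for illustration and do not come from the talk. The commands are built and printed, not executed.

```python
import shlex


def pssh_cmd(hosts_file, user, command):
    """Command line to run one shell command on every host in hosts_file."""
    # -h: file with one host per line, -l: login user, -i: show output inline
    return ["parallel-ssh", "-h", hosts_file, "-l", user, "-i", command]


def pscp_cmd(hosts_file, user, local_path, remote_path):
    """Command line to copy one local file to every host."""
    return ["parallel-scp", "-h", hosts_file, "-l", user, local_path, remote_path]


def pslurp_cmd(hosts_file, user, remote_path, local_name, out_dir):
    """Command line to fetch a file from every host into out_dir/<host>/local_name."""
    return ["parallel-slurp", "-h", hosts_file, "-l", user,
            "-L", out_dir, remote_path, local_name]


def render(argv):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Hypothetical names: workshop_vms.txt, trainee, fix_path.sh
    print(render(pssh_cmd("workshop_vms.txt", "trainee", "df -h /")))
    print(render(pscp_cmd("workshop_vms.txt", "trainee", "fix_path.sh", "/tmp/fix_path.sh")))
    print(render(pslurp_cmd("workshop_vms.txt", "trainee",
                            "/home/trainee/.bash_history", "history.txt", "collected")))
```

Building the argument list separately from running it makes it easy to review exactly what would be sent to every VM before anything is executed.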
    7. Drivers
        • A need for bioinformatics training
          – Good bioinformaticians work at, and understand, command line tools
        • Take the workshops to the trainees
          – Maximise participation
    8. Goals
        • Minimise maintenance of the training environment
          – No monolithic installs
        • Minimise cognitive burden on trainees
          – The training environment should go unseen
        • Make everything publicly accessible and as reusable as possible
    9. NGS Workshop: Key Elements
        • Knowledgeable, friendly trainers
          – Obviously
        • Content
          – Tools, data, handout
        • Mode of delivery
          – Dedicated training suite, BYO laptop, roadshow
        • Training environment
          – Tailored to mode of delivery
    10. THE REUSABLE RESOURCES: VM SETUP
        [Diagram labels: Office, Vanilla Ubuntu VM, NGS Workshop VM]
        • Gnome
        • FreeNX
        • Generic Tools
        • NGS tools
        • NGS data
        • NGS handout
    11. THE REUSABLE RESOURCES: HANDOUT
        [Diagram labels: Office, Trainee Handout, Trainer Handout]
    12. Rolling Your Own Handout
        • Style file provided
          – Makes it easy(er) to write/edit LaTeX
        • Trainee Handout
        • Trainer Handout
        https://github.com/BPA-CSIRO-Workshops/handout-template
    13. Simplified Styling
        \begin{information}
        Information to be provided to the trainee.
        \end{information}
    14. Simplified Styling
        \begin{questions}
        First question.
        \begin{answer}
        Answer to first question.
        \end{answer}
        Second question.
        \begin{answer}
        Answer to second question.
        \end{answer}
        \end{questions}
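The deck lists a Trainee Handout and a Trainer Handout, with answers marked up in their own `answer` environment. One plausible reading, and it is only an assumption here, is that the trainee version omits the answers. The handout template itself may well do this inside LaTeX. The Python sketch below is not the template's mechanism. It only shows how cleanly delimited answer blocks can be dropped from the same plain-text source.

```python
import re

# Matches one answer environment plus any whitespace that follows it.
# Non-greedy so that each answer block is removed separately.
ANSWER_BLOCK = re.compile(r"\\begin\{answer\}.*?\\end\{answer\}\s*", re.DOTALL)


def strip_answers(latex_source):
    """Return the LaTeX source with every answer environment removed."""
    return ANSWER_BLOCK.sub("", latex_source)


if __name__ == "__main__":
    trainer_source = (
        "\\begin{questions}\n"
        "First question.\n"
        "\\begin{answer}\n"
        "Answer to first question.\n"
        "\\end{answer}\n"
        "Second question.\n"
        "\\begin{answer}\n"
        "Answer to second question.\n"
        "\\end{answer}\n"
        "\\end{questions}\n"
    )
    print(strip_answers(trainer_source))
```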
    15. Simplified Styling
        \begin{lstlisting}
        # several lines of code
        cd ~/
        ls -l
        # a long command that line wraps automatically
        tophat --solexa-quals -g 2 --library-type fr-unstranded -j annotation/Danio_rerio.Zv9.66.spliceSites -o tophat/ZV9_2cells genome/ZV9 data/2cells_1.fastq data/2cells_2.fastq
        \end{lstlisting}
    16. Resources Refresher
        • Plain text files (Bash, LaTeX) for
          – Generic tools install
          – Workshop-specific tool install
          – Workshop-specific data download/configuration
          – Handout document
        • Why plain text?
          – Version control
          – Collaboration
          – Reuse
    17. Cloud Pros and Cons
        Pros
        • Consistent training environment
        • No “alien” OS on host network
        • Minimal host network configuration and traffic
          – Firewall (port 22)
        • Minimal local computer specification and configuration
          – NX Client plus session files
        • Scalable resources
        • Encourages reproducible work
        Cons
        • Remote vs local confusion
          – Hide this using NX
        • How to analyse own data?
        • Requires a computer suite
        • Sysadmin skills required
    18. Workshops Using This System
        [Timeline figure. Labels in extracted order, with attendee numbers in parentheses: Feb 2012; Jul 2012; MEL (22); SYD (22); Nov 2012; BNE (33); ADL (29); Feb 2013 CAN (35); Jun 2013 PER (38); Jul 2013 MEL (60); Jul 2013 MEL (38); Nov 2013; SYD (38); BNE (30); Feb 2014; SYD (38); MEL (34); Jul 2014 SYD (37); Jul 2014 CAN (60); Jul 2014 CAN (15); Sep 2013; Feb 2012; Dec 2012 BioInfoSummer (100); Dec 2013 BioInfoSummer (100); Nov 2013 ACAD (30); Nov 2012 ACAD (30); Jul 2014 R&rQTL (40); Apr 2013 Linux&RNA-Seq (30); Nov 2014 ACAD (30)]
        • BPA/CSIRO Competitive Courses: ~650 applicants for ~400 places
        • EMBL Australia PhD Program: 120 places
        • Other workshops: 360 places
        • Total: ~900 places in 2.5 yrs
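As a quick check on the summary figures, the three category counts from the slide can be added up. The sketch below uses only the numbers quoted on the slide. The applicant and competitive-course place counts are approximate there ("~"), so the results are approximate too.

```python
# Places per category, as quoted on the slide (the first figure is approximate).
places = {
    "BPA/CSIRO competitive courses": 400,
    "EMBL Australia PhD Program": 120,
    "Other workshops": 360,
}
applicants_competitive = 650  # approximate, from the slide

total_places = sum(places.values())
rounded_total = round(total_places, -2)  # nearest hundred
acceptance_rate = places["BPA/CSIRO competitive courses"] / applicants_competitive

if __name__ == "__main__":
    print(f"Total places: {total_places} (about {rounded_total})")
    print(f"Competitive course acceptance rate: {acceptance_rate:.0%}")
```

The sum is 880, which matches the slide's "~900 places" when rounded to the nearest hundred. Roughly three in five applicants to the competitive courses got a place.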
    19. Future Directions
        • Better “glue” to enable easier reuse on
          – Local VMs
          – NeCTAR Research Cloud
          – Amazon Web Services
        • Better documentation - ugh!
          – Easier for others to contribute and roll their own
          – Tagging workshop versions
    20. Take-Home Message
        • Cloud computing infrastructure
          – Solves some issues in delivering hands-on bioinformatics training
          – Has its own unique set of issues
        • Code and materials (CC and Open Access)
          – NGS Workshop
          – Try rolling your own! github.com/BPA-CSIRO-Workshops
        Watson-Haigh, N.S., et al. (2013). Next-generation sequencing: a challenge to meet the increasing demand for training workshops in Australia. Brief Bioinform 14, 563–574. http://bib.oxfordjournals.org/content/14/5/563
    21. ACKNOWLEDGEMENTS
        • Catherine Shang (Bioplatforms Australia)
        • Nathan Watson-Haigh (ACPFG)
        • Nandan Deshpande (Systems Biology Initiative, UNSW)
        • Paula Moolhuijzen (CCG, Murdoch University)
        • Sonika Tyagi (Australian Genome Research Facility)
        • Matthew Field (ANU)
        • Annette McGrath (CSIRO Bioinformatics Core, Digital Productivity and Services Flagship)
        • Konsta Duesing (Food & Nutrition Flagship, CSIRO)
        • Xi (Sean) Li (CSIRO Bioinformatics Core, Digital Productivity and Services Flagship)
        • Sean McWilliam (CSIRO Agricultural Productivity Flagship)
        • Paul Greenfield (CSIRO Digital Productivity and Services Flagship)
        • Cath Brooksbank (EBI)
        • Vicky Schneider-Gricar (TGAC)
        • Matthias Haimel (University of Cambridge)
        • Myrto Kostadima (University of Cambridge)
        • Remco Loos (EBI)
        • Alex Mitchell (EBI)
        • Hubert Denise (EBI)
        • Jerico Revote (Monash e-Research Centre)
        • Simon Michnowicz (Monash e-Research Centre)
        • Steve Quenette (Monash University)
        • Mark Crowe (QFAB)
        • Peter Sterk (Oxford e-Research Centre)
    22. Good, Bad and Ugly: Trainee
        [Table. Columns: Good, Bad, Ugly. Rows: BYO laptop, Dedicated training room, Remote server, Cloud virtualisation. Cell entries in extracted order:]
        • Familiar environment
        • Accessible afterwards
        • Permissions
        • Poor hardware specification
        • First hour (or 3) wasted by sorting out “issues”
        • A dedicated facility
        • Everything should just work
        • Costs of residential courses
        • Access to more powerful hardware
        • Usually CLI
        • Remote vs local confusion
        • Limited access
        • I want a GUI
        • Users competing over compute resources
        • Accessible afterwards
        • What’s a cloud!?
        • Can I use this for my own data?
    23. Good, Bad and Ugly: Trainer
        [Table. Columns: Good, Bad, Ugly. Rows: BYO laptop, Dedicated training room, Remote server, Cloud virtualisation. Cell entries in extracted order:]
        • Just need a room
        • Network access
        • Multiple OSes
        • We look like idiots
        • We wasted so much time
        • We know what works
        • Local IT support
        • Maintaining up-to-date hardware
        • No access afterwards
        • Control over the OS
        • A single OS to maintain
        • Managing users
        • Resource management
        • Enables roadshows
        • 1 VM per trainee
        • New skills required
        • Post-workshop access
    24. Puppet
        • Helps sysadmins automate many repetitive tasks
        • Puppet config files
          – Plain text (version control)
          – Define the required state “B”; Puppet figures out how to get from “A” to “B”
        • Workshops defined in terms of tools and data needed using plain text
          – Collaborate and share on workshops
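The idea of declaring a required state “B” and letting the tool work out how to get there from “A” can be illustrated with a small Python sketch. This is not Puppet's implementation or its configuration language. The tool names and version strings are hypothetical, and the "state" is just a dictionary mapping a tool to its installed version.

```python
def plan(current, desired):
    """Work out the actions that move `current` (state A) to `desired` (state B).

    Both arguments map a tool name to a version string. Tools that are
    installed but not mentioned in `desired` are left alone.
    """
    actions = []
    for name in sorted(desired):
        wanted = desired[name]
        installed = current.get(name)
        if installed is None:
            actions.append(("install", name, wanted))
        elif installed != wanted:
            actions.append(("change", name, wanted))
    return actions


def apply_actions(current, actions):
    """Return the state reached after carrying out the planned actions."""
    new_state = dict(current)
    for _kind, name, version in actions:
        new_state[name] = version
    return new_state


if __name__ == "__main__":
    # Hypothetical workshop definition: versions are placeholders.
    state_a = {"tophat": "1.0"}
    state_b = {"tophat": "2.0", "samtools": "0.1"}

    steps = plan(state_a, state_b)
    print(steps)
    reached = apply_actions(state_a, steps)
    # Planning again from the reached state yields nothing to do.
    print(plan(reached, state_b))
```

The useful property is that the description is idempotent: once the machine matches the declared state, re-running the plan produces no actions, which is what makes a plain-text workshop definition safe to apply repeatedly.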