'Omics in extreme Environments (Lightweight bioinformatics)

Presentation on lightweight bioinformatics (Raspi / cloud computing) for real-time field-based analyses.

Presented at iEOS2015, St. Andrews, 3-6 July 2015.

Published in: Science

  1. 'Omics in extreme Environments (Lightweight bioinformatics)
     Joe Parker, Royal Botanic Gardens, Kew
  2. Compute time is (much) cheaper than you think… and much cheaper than your time.
     Physical portability requires software portability.
  3. Kew
     One of the largest living and tissue collections in the world: ca. 6,000 genera (~1/3 of all plant genera)
     2020 Strategic Output: Plant and Fungal Trees of Life
  4. Why in the field
     • Spatial analysis
     • ID & naming
     • Image recognition
  5. Why in the field
  6. The ‘micro’ computer: Raspi (Raspberry Pi)
     • Low cost and low energy (and power) use
     • Highly portable
     • Hackable form factor
  7. Laptops
     • Portable
     • Very costly form factor
     • Maté? Beer?
  8. Clusters
     • Not portable; high setup costs
  9. The cloud
     • Power closely linked to budget (only as limited as your budget)
     • Almost infinitely scalable
     • Need a connection to get data up there (and down!)
     • Fiddly setup
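  10. Comparison
      | System             | Arch | CPU type @ clock (GHz) | Cores | RAM (GB) | Storage (GB) @ type |
      |--------------------|------|------------------------|-------|----------|---------------------|
      | Pandanus           | i686 | Xeon E5620 @ 2.4       | 4     | 33       | 1000 @ SATA         |
      | Raspberry Pi 2 B+  | ARM  | ARMv7 @ 1.0            | 1     | 1        | 8 @ flash card      |
      | MacBook Pro (2011) | x64  | Core i7 @ 2.2          | 4     | 8        | 250 @ SSD           |
      | EC2 m4.10xlarge    | x64  | Xeon E5 @ 2.4          | 40    | 160      | 320 @ SSD           |
      | MidPlus            | x64  | Westmere @ 2.8         | 2500+ | 24-512   | 2x320 @ SSD         |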
  11. Workflow
      • Setup: set up the workflow, binaries, and reference/alignment data; deploy to machines.
      • BLAST 2.2.30: protein-protein BLAST of short reads (from the MG-RAST repository, Bass Strait oil field) against 458 core eukaryotic genes from CEGMA. Keep only the top hits; use the maximum num_threads available.
      • Concatenate hits to CEGMA alignments: append the top-hit sequences to the CEGMA alignments.
      • For each alignment:
        • Align in MUSCLE 3.8.31 using default parameters.
        • Infer a de novo phylogeny in RAxML 7.2.8+ under the Dayhoff model, with a random starting tree and the maximum number of PTHREADS.
      • Output and parse run times.
  12. Results
      | Platform          | Hardware capital | Data                                               | Running   |
      |-------------------|------------------|----------------------------------------------------|-----------|
      | Pi (new)          | £26              | NA                                                 | NA        |
      | MBP               | ~£2000           | NA                                                 | NA        |
      | AWS (m4.10xlarge) | 0                | ~£1/Mb (BGAN); £1/day (Virgin mobile-only tariff)  | £1.78/hr  |
      | AWS (t2.micro)    | 0                | ~£1/Mb (BGAN); £1/day (Virgin mobile-only tariff)  | £0.01/hr  |
      [Figure: log(time) (user, s) against log(number of queries)]
  13. Overall performance
  14. Raspi in practice
      • Stability
      • ARM, not x86, architecture
      • 2 GB RAM…
  15. The cloud in practice
      • Fiddly setup, but easy to replicate
      • Need a connection to get data up there (and down!)
  16. Conclusions
      • The Pi offers opportunities but is not there yet; you'll also still need a connection unless you're very lucky.
      • Installation in situ?
      • Consider cloud computing (connections can only improve).
      • Portability of the workflow enhances portability of the system!
        • …which you should be embracing anyway for reproducibility…
  17. Thanks
      • Kew: Matt Blissett, Abigail Barker, Rob Turner
      • Others: Daniel Barker (4273π), Tim Booth (BioLinux), Alexandros Stamatakis (RAxML)
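
The Comparison slide (10) characterises each platform by architecture, CPU, core count, RAM and storage. Below is a minimal Python sketch for recording the same columns on whatever machine a run lands on, using only the standard library: `platform.machine()` gives the architecture string, `os.cpu_count()` the logical cores, and RAM comes from `os.sysconf` where the OS supports it (the sketch returns `None` elsewhere, e.g. on Windows). Clock speed has no portable standard-library call, so it is omitted; the helper names are illustrative.

```python
import os
import platform


def physical_ram_gb():
    """Total physical RAM in GB (1024**3 bytes), or None if it cannot be read."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        n_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (Windows); ValueError: name not supported
        return None
    if page_size <= 0 or n_pages <= 0:
        return None  # sysconf can report -1 for "indeterminate"
    return page_size * n_pages / 1024 ** 3


def machine_profile(name):
    """Describe this machine using the columns of the Comparison slide."""
    ram = physical_ram_gb()
    return {
        "system": name,
        "arch": platform.machine(),                # e.g. 'x86_64', 'armv7l'
        "cpu": platform.processor() or "unknown",  # often blank on Linux
        "cores": os.cpu_count() or 1,              # logical cores seen by the OS
        "ram_gb": round(ram, 1) if ram is not None else None,
    }


if __name__ == "__main__":
    print(machine_profile("this-machine"))
```

Storing one such profile per run makes it easy to rebuild a comparison table from the machines actually used.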
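
The Workflow slide (11) chains BLAST, MUSCLE and RAxML. The sketch below only builds the command lines, times them, and filters BLAST output to top hits; it is an illustration, not the scripts used for the talk. Assumptions: BLAST+ `blastp` with tabular output (`-outfmt 6`, whose 12th column is the bit score), MUSCLE 3.8 `-in`/`-out` syntax, and the `raxmlHPC-PTHREADS` binary with `-m PROTGAMMADAYHOFF` (the slide says only "Dayhoff", so the GAMMA variant is a guess), `-d` for a random starting tree and an arbitrary seed via `-p`. File names, the database name and all helper names are hypothetical. Depending on the build, RAxML may need PHYLIP rather than FASTA input.

```python
import os
import subprocess
import time
from collections import namedtuple

# One row kept from BLAST tabular output (-outfmt 6).
Hit = namedtuple("Hit", "query subject bitscore")


def blastp_cmd(query_faa, db, out_tsv, threads):
    """blastp against a CEGMA protein database, tabular output, all threads."""
    return ["blastp", "-query", query_faa, "-db", db, "-outfmt", "6",
            "-num_threads", str(threads), "-out", out_tsv]


def muscle_cmd(in_fasta, out_fasta):
    """MUSCLE 3.8 alignment with default parameters."""
    return ["muscle", "-in", in_fasta, "-out", out_fasta]


def raxml_cmd(alignment, run_name, threads, seed=12345):
    """RAxML PTHREADS run: Dayhoff (+GAMMA assumed), random starting tree."""
    return ["raxmlHPC-PTHREADS", "-T", str(threads), "-s", alignment,
            "-n", run_name, "-m", "PROTGAMMADAYHOFF", "-d", "-p", str(seed)]


def top_hits(tabular_lines):
    """Keep the highest-bitscore subject for each query read."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        hit = Hit(fields[0], fields[1], float(fields[11]))  # col 12 = bitscore
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


def timed_run(cmd):
    """Run one command; return (wall-clock s, user CPU s used by the child)."""
    wall0, user0 = time.perf_counter(), os.times().children_user
    subprocess.run(cmd, check=True)
    return time.perf_counter() - wall0, os.times().children_user - user0


THREADS = os.cpu_count() or 1  # "use max. num_threads available"
```

A driver might call `timed_run(blastp_cmd("reads.faa", "cegma_db", "hits.tsv", THREADS))`, group the `top_hits` of `hits.tsv` by subject gene before appending sequences to each CEGMA alignment, then time one `muscle_cmd` and one `raxml_cmd` per gene. Child user time is what the Results plot reports; note that `os.times().children_user` is always zero on Windows.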
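
The Results slide (12) lists capital, data and running costs per platform. Here is a rough cost model built only from those figures, with hours of compute and days of connectivity as inputs. It is a sketch under stated simplifications: electricity and staff time are ignored, "NA" entries are treated as zero, and the BGAN rate is kept in the slide's own "Mb" unit. On these numbers, the £26 Pi costs as much as about 14.6 hours of m4.10xlarge time or 2,600 hours of t2.micro time.

```python
# Figures from the Results slide (GBP, 2015). "NA" is treated as zero cost.
PLATFORMS = {
    # name: (hardware capital, running cost per hour, needs a data connection)
    "Pi (new)": (26.0, 0.0, False),
    "MBP": (2000.0, 0.0, False),
    "AWS m4.10xlarge": (0.0, 1.78, True),
    "AWS t2.micro": (0.0, 0.01, True),
}
MOBILE_PER_DAY = 1.0  # £1/day, Virgin mobile-only tariff
BGAN_PER_MB = 1.0     # ~£1/Mb via BGAN satellite, unit as quoted on the slide


def total_cost(name, hours, days=0, bgan_mb=0.0):
    """Capital + running + connection cost; electricity and staff time ignored."""
    capital, per_hour, needs_link = PLATFORMS[name]
    link = (MOBILE_PER_DAY * days + BGAN_PER_MB * bgan_mb) if needs_link else 0.0
    return capital + per_hour * hours + link


def break_even_hours(local, cloud):
    """Cloud compute hours costing the same as buying the local machine."""
    return PLATFORMS[local][0] / PLATFORMS[cloud][1]


if __name__ == "__main__":
    for cloud in ("AWS m4.10xlarge", "AWS t2.micro"):
        hours = break_even_hours("Pi (new)", cloud)
        print(f"£26 Pi = {hours:.1f} h of {cloud}")
```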
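
The Results plot (slide 12) shows log user time against log number of queries. One way to summarise such a plot, sketched here under the assumption that run time follows a power law t = exp(a) · n^b, is an ordinary least-squares fit on the logged values: the slope b is the scaling exponent (b ≈ 1 means run time grows linearly with the number of queries) and does not depend on the log base, while a compares platforms at a fixed query count. No data from the talk are reproduced; supply your own (queries, seconds) pairs.

```python
import math


def loglog_fit(n_queries, user_seconds):
    """Least-squares fit of log(t) = a + b*log(n); returns (a, b)."""
    if len(n_queries) != len(user_seconds) or len(n_queries) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(n) for n in n_queries]
    ys = [math.log(t) for t in user_seconds]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct query counts")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # scaling exponent (slope on the log-log plot)
    return my - b * mx, b  # intercept a, slope b


def predict_seconds(a, b, n):
    """Predicted user time for n queries from a fitted (a, b)."""
    return math.exp(a + b * math.log(n))
```

Fitting each platform separately and comparing the exponents shows whether one machine is simply slower by a constant factor or also scales worse as the query set grows.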
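
Slide 14 notes that the Pi is ARM rather than x86, so x86 binaries copied from a laptop or cloud image will not run on it. A small sketch of how a portable workflow driver might pick a per-architecture build follows; the `bin/<family>/<tool>` directory layout and the `ARCH_FAMILY` mapping are hypothetical, covering common `platform.machine()` values.

```python
import os
import platform

# Hypothetical mapping from platform.machine() values to build directories.
ARCH_FAMILY = {
    "x86_64": "x86_64", "amd64": "x86_64",
    "i386": "x86", "i686": "x86",
    "armv6l": "arm32", "armv7l": "arm32",
    "aarch64": "arm64", "arm64": "arm64",
}


def binary_path(tool, root="bin", machine=None):
    """Return e.g. bin/arm32/muscle for the current (or given) architecture."""
    machine = (machine or platform.machine()).lower()
    family = ARCH_FAMILY.get(machine)
    if family is None:
        raise RuntimeError(f"no {tool} build for architecture {machine!r}")
    return os.path.join(root, family, tool)
```

Failing loudly on an unknown architecture is deliberate: it is better than silently launching an x86 binary on an ARM board and getting an "exec format error" mid-run in the field.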
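
Slide 16 argues that workflow portability also serves reproducibility. A minimal sketch of one supporting habit is writing a JSON manifest of tool versions and platform details next to each run's results. The versions below are those named on the Workflow slide; the field names and the `run_manifest`/`write_manifest` helpers are illustrative, and in practice versions should be read from the binaries actually installed.

```python
import json
import os
import platform
from datetime import datetime, timezone

# Tool versions named on the Workflow slide; update to match your own builds.
TOOL_VERSIONS = {"blast": "2.2.30", "muscle": "3.8.31", "raxml": "7.2.8"}


def run_manifest(tools=TOOL_VERSIONS, extra=None):
    """Snapshot of software and hardware context to store beside results."""
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "tools": dict(tools),           # copy so callers cannot mutate defaults
        "python": platform.python_version(),
        "os": platform.system(),
        "arch": platform.machine(),
        "cores": os.cpu_count(),
    }
    if extra:
        manifest.update(extra)          # e.g. platform label, dataset ID
    return manifest


def write_manifest(path, **kwargs):
    """Write the manifest as pretty-printed, key-sorted JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(run_manifest(**kwargs), fh, indent=2, sort_keys=True)
```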
