The case for cloud computing in Life Sciences


  1. The case for cloud computing in the life sciences. Ola Spjuth, Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University
  2. About me • Ola Spjuth, Docent • Associate Professor at Uppsala University – Data-intensive and translational bioinformatics • Head of Bioinformatics Compute and Storage facility at SciLifeLab – Responsible for managing resources – Strategic e-infra planning and procurement for SciLifeLab • Deputy Director at SNIC-UPPMAX HPC center • Guest Researcher at Karolinska Institutet – e-Science for Cancer Prevention and Control (eCPC), flagship project at SeRC
  3. Molecular biology is a field in transition… From conventional microscopes… to digital video-microscopes and image analysis
  4. From manual operations… to automated robotized laboratories
  5. Today: We have access to high-throughput technologies to study biological phenomena
  6. Science for Life Laboratory: An internationally leading center that develops and applies large-scale technologies for molecular biosciences with a focus on health and environment. Became a national platform in 2013. Stockholm node, Uppsala node
  7. Massively parallel sequencing… 2017: Human whole genome sequenced in 3 days for ~$1100 … requires supercomputers for analysis and storage. 2017: Illumina HiSeq X systems, 15K whole human genomes per year. 2016: NGI data velocity 950 Mbp/hour ≈ 16 Mbp/min
  8. Mode of operation: Analysis, Scientists, Sample transfer, Platforms, Pre-processing (NGI), Research (SNIC), Data delivery
  9. UPPMAX: A national e-infrastructure. Software + reference data, Support, Education, Compute resources, Storage resources, Efficiency + automation
  10. Some statistics: Storage usage; Projects at SNIC-UPPMAX (Data-intensive bioinformatics, Other disciplines); Support tickets
  11. New challenges: Data management and analysis • Storage • Analysis methods, pipelines • Scaling • Automation • Data integration, security • Predictions • …
  12. Why cloud in the life sciences? • Access to resources – Flexible configurations – On-demand – Cost-efficient? • Collaborate at an international level – Publish/federate data – E.g. large sequencing initiatives, “move compute to the data” • New types of analysis environments – Hadoop/Spark/Flink etc. – Microservices, Docker, Kubernetes, Mesos
  13. Challenges with cloud • Tradition: Strong HPC tradition in academia – Existing resources funded by the Research Council, with personnel at 6 centers in Sweden (SNIC) • Economy: Cost model is new – Difficult to assess the costs • Legal: Working with sensitive data • Educational: New technology for many
  14. Needs in bioinformatics • Primarily resources with a lot of RAM and storage (high I/O) • Preferably a transparent system; users don’t want to deal with e-infrastructure at all • How to work with storage (tiered?)
  15. My research focus • e-infrastructure development: Automation, Big Data • e-Science methods development: Prediction models, machine learning • Applied e-Science research: Drug discovery and individualized diagnostics
  16. Selected research in my group: Privacy preservation, Workflows, Big Data frameworks, Data management and predictive modeling, Data federation, Compute federation
  17. Reactive/continuous modeling: Data sources, Coordinate, Integrate, Version, Monitor, Publish models, Archive models, User, Bioclipse, Train and assess model
  18. Virtual Research Environments: Researcher, Other researchers, Tools, Data. VREs aim to bridge this gap!
  19. Virtual Research Environments: Researcher, Other researchers, Tools, Data, Compute and storage resources. Virtual Research Environment!
  20. PhenoMeNal approach and stack • Enable users to deploy their own virtual infrastructure on an IaaS provider • Containerize tools, orchestrate with workflow systems on top of Kubernetes. KubeNow: Cloudflare, kubeadm, Terraform, kubectl, Packer
  21. Hierarchical Analysis of Temporal and Spatial Image Data • Carolina Wählby: PI, PhD, Professor in Quantitative Microscopy • Andreas Hellander: Co-PI, Associate Professor • Ola Spjuth: Co-PI, Associate Professor
  22. Presenting at Spark Summit 2017: “EasyMapReduce: Leverage the power of Spark And Docker To scale scientific tools in MapReduce fashion” power-of-spark-and-docker-to-scale-scientific-tools-in-mapreduce-fashion/
  23. Our most recent scientific publication
  24. European Open Science Cloud (EOSC) • The vast majority of all data in the world (in fact up to 90%) has been generated in the last two years. • Scientific data is in direct need of openness, better handling, careful management, machine actionability and sheer re-use. • European Open Science Cloud: A vision of a future infrastructure to support Open Research Data and Open Science in Europe – It should enable trusted access to services, systems and the re-use of shared scientific data across disciplinary, social and geographical borders – Research data should be findable, accessible, interoperable and re-usable (FAIR) – Provide the means to analyze datasets of huge sizes
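
As a rough check on the data velocity quoted on slide 7, the small Python sketch below converts the 950 Mbp/hour figure into per-minute and per-second rates. Only the 950 Mbp/hour value comes from the slides; the function names are illustrative.

```python
# Unit conversion for the sequencing data rate quoted on slide 7.
# Only the 950 Mbp/hour figure is taken from the slides; everything
# else here is an illustrative helper.

MBP_PER_HOUR = 950.0  # NGI data velocity, 2016 (slide 7)


def per_minute(rate_per_hour: float) -> float:
    """Convert an hourly rate to a per-minute rate."""
    return rate_per_hour / 60.0


def per_second(rate_per_hour: float) -> float:
    """Convert an hourly rate to a per-second rate."""
    return rate_per_hour / 3600.0


if __name__ == "__main__":
    print(f"{per_minute(MBP_PER_HOUR):.1f} Mbp/min")
    print(f"{per_second(MBP_PER_HOUR):.2f} Mbp/s")
```

At 950 Mbp/hour the rate is close to 16 Mbp per minute, or roughly a quarter of a Mbp per second.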

Editor's Notes

  • Strategic funding to enable:
    Infrastructure for high-throughput analysis
    Multi-disciplinary research environment
    Competence in technology and analysis methodology
  • Access to computers (many if you need)
    Access to storage (a lot if you need)
    Pre-installed software and reference genomes
  • How can we improve efficiency on shared HPC for data-intensive bioinformatics?
    Can Cloud Computing and Big Data frameworks aid data-intensive research?
    How useful are Scientific Workflows in data-intensive research?
    Can predictive modeling aid data acquisition, storage and analysis?
    How can we continuously improve predictive models as data changes?
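
To make the "MapReduce fashion" of slide 22 concrete, here is a minimal sketch in plain Python. It is not the EasyMapReduce API and uses neither Spark nor Docker. The function `gc_count_tool` is a hypothetical stand-in for a containerized command-line tool, and `partition`, `combine` and `map_reduce` are illustrative names. The pattern is: split the records into partitions, apply the tool to each partition independently (map), then merge the partial results (reduce).

```python
from functools import reduce
from typing import List, Tuple

Counts = Tuple[int, int]  # (GC bases, total bases)


def gc_count_tool(chunk: List[str]) -> Counts:
    """Hypothetical stand-in for a containerized tool run on one partition."""
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in chunk)
    total = sum(len(seq) for seq in chunk)
    return gc, total


def combine(a: Counts, b: Counts) -> Counts:
    """Reduce step: merge two partial results element-wise."""
    return a[0] + b[0], a[1] + b[1]


def partition(records: List[str], size: int) -> List[List[str]]:
    """Split the input into partitions of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_reduce(records: List[str], size: int) -> Counts:
    """Apply the tool to every partition (map), then merge (reduce)."""
    partials = map(gc_count_tool, partition(records, size))
    return reduce(combine, partials, (0, 0))


if __name__ == "__main__":
    gc, total = map_reduce(["GATTACA", "CCGG", "ATAT"], size=2)
    print(f"GC content: {gc}/{total}")
```

Because the reduce step is associative and each partition is processed independently, the result does not depend on the partition size, which is what allows a framework to distribute the map step across many nodes.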