Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Description

Presented at Big Data in Medicine, Uppsala, Sweden on 2016-11-17

Transcript

  1. Analyzing Big Data in Medicine with Virtual Research Environments and Microservices. Ola Spjuth <ola.spjuth@farmbio.uu.se>, Department of Pharmaceutical Biosciences, Science for Life Laboratory, Uppsala University
  2. Today: We have access to high-throughput technologies to study biological phenomena
  3. New challenges: Data management and analysis • Storage • Analysis methods, pipelines • Scaling • Automation • Data integration, security • Predictions • …
  4. European Open Science Cloud (EOSC) • The vast majority of all data in the world (in fact up to 90%) has been generated in the last two years. • Scientific data is in direct need of openness, better handling, careful management, machine actionability and sheer re-use. • European Open Science Cloud: A vision of a future infrastructure to support Open Research Data and Open Science in Europe – It should enable trusted access to services, systems and the re-use of shared scientific data across disciplinary, social and geographical borders – Research data should be findable, accessible, interoperable and re-usable (FAIR) – It should provide the means to analyze datasets of huge sizes http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud
  5. Contemporary Big Data analysis in bioinformatics • High-Performance Computing with shared storage – Linux, Terminal, batch queue • Problems/challenges – Access to resources is limited – Dependency management for tools is cumbersome; installing software requires help from system administrators – Privacy-related issues – Difficult to share/integrate data – Accessibility issues • A common approach: Internet-based services – Retrieve data – Analysis tools
  6. Workflows
  7. Service-Oriented Architectures (SOA) in the life sciences • Standardize – Agree on e.g. interfaces, data formats, protocols etc. • Decompose and compartmentalize – Experts (scientists) should provide services – Do one thing and do it well – Achieve interoperability by exposing data and tools as Web services • Integrate – Users should access and integrate remote services (Diagram labels: API; Scientist, service; Scientist, consume)
  8. Service-Oriented Architectures (SOA) in the life sciences, ~2005 • Difficult to sustain, unreliable solutions (Diagram labels: Scientist; downtime; API changed; Not maintained; API)
  9. Cloud Computing • Cloud computing offers advantages over contemporary e-infrastructures in the life sciences – On-demand elastic resources and services – No up-front costs, pay-per-use • A lot of businesses (and software development) moving into the cloud – Vibrant ecosystem of frameworks and tools, including for big data • High potential for science
  10. Virtual Machines and Containers • Virtual machines – Package entire systems (heavy) – Completely isolated – Suitable in cloud environments • Containers – Share OS – Smaller, faster, portable – Docker!
  11. Microservices • Similar to Web services: Decompose functionality into smaller, loosely coupled services communicating via APIs – “Do one thing and do it well” • Preferably smaller, light-weight and fast to instantiate on demand • Easy to replace, language-agnostic – Suitable for loosely coupled teams (which we have in science) – Portable: easy to deploy and scale – Maximize agility for developers • Suitable to deploy as containers in cloud environments
  12. Scaling microservices http://martinfowler.com/articles/microservices.html
  13. Shipping containers?
  14. Orchestrating containers
  15. Kubernetes: Orchestrating containers • Origin: Google • A declarative language for launching containers • Start, stop, update, and manage a cluster of machines running containers in a consistent and maintainable way • Suitable for microservices (Diagram labels: Containers; Scheduled and packed containers on nodes)
  16. Virtual Research Environment (VRE) • Virtual (online) environments for research – Easy and user-friendly access to computational resources, tools and data, commonly for a scientific domain • Multi-tenant VRE – Log into shared system • Private VRE – Deploy on your favorite cloud provider
  17. PhenoMeNal • Horizon 2020 project, €8 M, 2015-2018 – “standardized e-infrastructure for the processing, analysis and information-mining of the massive amount of medical molecular phenotyping and genotyping data generated by metabolomics applications.” • Enable users to provision their own virtual infrastructure (VRE) – Public cloud, private cloud, local servers – Easy access to compatible tools exposed as microservices – Will set up and configure a complete data center in minutes (compute nodes, storage, networks, DNS, firewall etc.) – Can achieve high availability, scalability and fault tolerance • Use modern and established tools and frameworks supported by industry – Reduce risk and improve sustainability • Offer an agile and scalable environment to use, and a straightforward platform to extend http://phenomenal-h2020.eu/
  18. Users should not see this…
  19. Deployment and user access • Launch on reference installation • Launch on public cloud • Private VRE
  20. In-house deployment scenarios • MRC-NIHR Phenome Centre – Medium-sized IT infrastructure – Dedicated IT personnel – Users: ICL staff • Hospital environment – Dedicated server – No IT personnel – User: Clinical researcher (Diagram label: Private VRE)
  21. Development: Container lifecycle (Diagram labels: Source code repositories; Build and test tools, images, infrastructure; PhenoMeNal Jenkins; PhenoMeNal Container Hub; Docker Hub)
  22. Two proofs of concept so far • Kultima group • Pablo Moreno
  23. Implications • Improve sustainability – Not dependent on specific data centers • Improve reliability and security – Users can run their own service environments (VREs) within isolated environments – High availability and fault tolerance • Scalability – Deploy in elastic environments • Agile development – Automate “from develop to deploy” • Agile science – Simple access to discoverable, scalable tools on elastic compute resources with no up-front costs • NB: Many problems of interoperability remain! – Data – APIs – etc.
  24. Ongoing research on VREs • Data federation • Compute federation • Privacy preservation • Workflows • Big Data frameworks • Data management and modeling
  25. Acknowledgements • Wesley Schaal, Jonathan Alvarsson, Staffan Arvidsson, Arvid Berg, Samuel Lampa, Marco Capuccini, Martin Dahlö, Valentin Georgiev, Anders Larsson, Polina Georgiev, Maris Lapins • AstraZeneca: Lars Carlsson, Ernst Ahlberg • University Vienna: David Kreil, Maciej Kańduła • SNIC Science Cloud: Andreas Hellander, Salman Toor • Caramba.clinic: Kim Kultima, Stephanie Herman, Payam Emami • ToxHQ team: Barry Hardy, Thomas Exner, Joh Dokler, Daniel Bachler
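
The microservices slide describes small, loosely coupled services that "do one thing and do it well" and communicate via an API. A minimal sketch of that idea, using only the Python standard library, might look like the following. The service name, the `/summary` path, the JSON payload shape, and the helper functions are all hypothetical and chosen for illustration; they are not taken from the talk or from any PhenoMeNal tool.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(values):
    """The single job of this (hypothetical) service: summary statistics."""
    if not values:
        raise ValueError("values must be a non-empty list")
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


class SummaryHandler(BaseHTTPRequestHandler):
    """Exposes summarize() as POST /summary with a JSON body."""

    def do_POST(self):
        if self.path != "/summary":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(summarize(payload["values"])).encode()
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected JSON with a non-empty 'values' list")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_service(port=0):
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), SummaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(port, values):
    """A client that only knows the API, not the implementation behind it."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/summary",
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would run `server = start_service()`, read the chosen port from `server.server_address[1]`, and pass it to `call_service`. Because the client depends only on the HTTP interface, the service behind it could be replaced or rewritten in another language, which is the property the slide calls "easy to replace, language-agnostic".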

Editor's Notes

  • Idea with SOA (~2005)
    Achieve interoperability by exposing data and functionality as Web services
    Experts (scientists) should set up and host their own Web services
    Users should integrate a multitude of distributed services, connect into workflows (e.g. Taverna), and share (parts of) workflows

    What happened?
    Users could not rely on Web services (downtime, API changes, abandonment), and the services could not be mirrored
    Workflows never gained widespread popularity
    Today, stable Web services mainly remain at large data and tool providers (EBI, NCBI, etc.)
  • Drop applications into VMs running Docker in different clouds.
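
The Kubernetes slide characterizes orchestration as declarative: the user states which containers should run, and the system starts, stops, and updates them to match. The toy sketch below illustrates that idea only; it is not the Kubernetes API, and the function name and service names are hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return (action, service, count) steps that move `running` toward `desired`.

    Both arguments map a service name to a number of container replicas.
    """
    actions = []
    for name in sorted(set(desired) | set(running)):
        want = desired.get(name, 0)
        have = running.get(name, 0)
        if want > have:
            actions.append(("start", name, want - have))
        elif want < have:
            actions.append(("stop", name, have - want))
    return actions


# Hypothetical desired state versus what is currently observed.
desired_state = {"annotator": 3, "normalizer": 1}
observed_state = {"annotator": 1, "legacy": 2}

for step in reconcile(desired_state, observed_state):
    print(step)
```

The user edits only the desired state; the start and stop steps are derived from it. A real orchestrator repeats this comparison continuously and also decides which node each container is scheduled on, which this sketch leaves out.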