Adoption of Cloud Computing
   in Scientific Research

              Yehia El-khatib
     School of Computing & Communications
               Lancaster University
Obligatory cloud image…
Outline
• Cloud Computing in Business
• Cloud Computing in Research
  – What does it offer
  – Comparison with other distributed paradigms
  – Different solutions
  – Examples
  – Challenges
• Conclusions
Cloud Computing
• Computational and storage resources provided in an
  on-demand fashion by large clusters of commodity
  computers.
• Offers opportunities:
   – Customised and isolated computing resources are
     obtained as and when required to handle user demand.
   – Pay per use model allows feasibility and sustainability
     through harnessing economies of scale.
   – Management via web service APIs.
   – Universal Internet-based access (all you need is / / /  / … ).
Cloud Computing in Business
        • Used to curb computing expenses without
          restricting the business.
             – Scale to meet user demand.
             – Dynamically mitigate system failures.
             – Seamlessly roll out new capabilities.
        • Numerous users:



        • Cloud computing market
             – Worth $40.7bn in 2010
             – Expected $177bn in 2015
             – Expected $241bn in 2020

         http://www.forrester.com/rb/Research/sizing_cloud/q/id/58161/t/2
         http://www.gartner.com/it/page.jsp?id=1735214
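
The market figures above imply very different growth rates for the two five-year periods. A minimal sketch of that arithmetic follows, assuming the three figures are directly comparable (the slide cites two analyst sources, so they may not be) and that growth compounds annually. The function name `cagr` and the dictionary `market_bn` are illustrative names, not from the cited reports.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of US dollars, as quoted on the slide.
market_bn = {2010: 40.7, 2015: 177.0, 2020: 241.0}

growth_2010_2015 = cagr(market_bn[2010], market_bn[2015], 5)
growth_2015_2020 = cagr(market_bn[2015], market_bn[2020], 5)

print(f"2010-2015: {growth_2010_2015:.1%} per year")
print(f"2015-2020: {growth_2015_2020:.1%} per year")
```

Under these assumptions the forecast implies growth of roughly a third per year up to 2015, slowing to single digits afterwards.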
Academic Research
• Researchers do not spend their
  entire time in the lab, field, etc.
• Collected data needs to be
  processed in order to distil some
  meaning.
• Such analysis processes range from
  scripts and spreadsheets to very
  complex computationally-intensive
  workflows.
• More data is being gathered using
  innovative methods (e.g. remote
  sensing).
Cloud in Academia
• People in academic circles are slowly adopting
  cloud computing for particular applications.
• What does the cloud offer?
  – ‘Everything as a service’ promotes integration and
    relatively easy collaboration across institutions,
    communities and disciplines.
  – Customised environments.
  – Elastic computing infrastructure.
  – More load off the users, i.e. scientists.
     More time to focus on their scientific processes.
Distributed Computing Paradigms
                        HPC                Grid                                P2P                         Cloud
Ownership (management)  My university      Our universities                    Our partners                3rd party
Trust                   Very High          High                                Trust in partners           ?
Reliability             High               High                                Depends on size & partners  Very high
Accounting              Individual quotas  Individual & organisational quotas  Difficult…                  Pay per use
Customisation           Very bad           Bad                                 Fairly flexible             Very flexible
Access                  Easy               Complicated                         Complicated                 Easy
Support                 Local sysadmin     Remote sysadmin                     Local/Remote sysadmin       24x7 support
What solutions do clouds offer?
• Generic solutions (research support):
  – Infrastructure (e.g. EmuLab)
  – Analysis (e.g. Biocep-R, CloudNumbers)
  – Space to discover (e.g. Academia.edu), share (e.g.
    myExperiment) and collaborate (e.g. Mendeley)
• Domain-driven solutions (research):
  – Workflow execution
  – Data normalisation
  – Data discovery, based on content rather than
    problem area
Domain-driven Cloud Solutions
• Environmental Virtual Observatory pilot
  (EVOp)
               http://www.EnvironmentalVirtualObservatory.org
  – To help:
     • Environmental scientists solve ‘big questions’.
     • Policy makers understand implications of decisions.
     • Raise awareness in and interact with local communities.
  – Use case for pilot phase: hydrology.
  – Deal with both geospatial and time series data.
  – Customisable modelling workflows for scientists.
  – Predefined analysis tools for non-specialists.
Domain-driven Cloud Solutions
• Penn State Integrated Hydrologic Model (PIHM)
                                http://slidesha.re/pFFMWp

  – Terrestrial watershed modelling in order to predict
    water distribution.
  – Data is sourced through a repository.
  – Cloud offers seamless access to abundant
    resources to carry out modelling workflows and
    simulations.
  – Results are delivered using bespoke visualisation
    (SaaS).
Domain-driven Cloud Solutions
• Coaddition of SDSS Astronomical Images
                                http://arxiv.org/abs/1010.1015
  – Using Apache Hadoop for coaddition of images from
    the Sloan Digital Sky Survey. (Coaddition increases the
    signal-to-noise ratio).
  – Runs over NSF cloud maintained by Google and IBM.
  – Experimented with different approaches to coaddition
    using the MapReduce framework.
  – Improved performance was achieved by reducing job
    initialisation overhead using index files.
  – 300 million pixels processed in 3 minutes.
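
Coaddition fits the MapReduce pattern naturally: the map step emits each pixel keyed by its position, and the reduce step combines all values that share a key. The sketch below is a toy, single-process illustration in plain Python, not the Hadoop code from the cited paper. It assumes the exposures are already aligned to a common pixel grid and are combined by a simple mean; the names `map_image`, `reduce_pixel` and `coadd` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def map_image(image):
    """Map step: emit ((row, col), value) for every pixel of one exposure."""
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            yield (r, c), value


def reduce_pixel(values):
    """Reduce step: combine all values observed at one position (simple mean)."""
    return mean(values)


def coadd(images):
    """Group mapped pixels by position (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for image in images:
        for key, value in map_image(image):
            groups[key].append(value)
    return {key: reduce_pixel(vals) for key, vals in groups.items()}


# Three tiny 2x2 exposures of the same patch of sky (made-up values).
exposures = [
    [[10.0, 12.0], [11.0, 9.0]],
    [[12.0, 10.0], [9.0, 11.0]],
    [[11.0, 11.0], [10.0, 10.0]],
]
stacked = coadd(exposures)
print(stacked)
```

Averaging N exposures with independent noise reduces the noise by about a factor of √N, which is why the stacked image has a higher signal-to-noise ratio than any single exposure.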
Domain-driven Cloud Solutions
• Cell structure analysis http://books.google.co.uk/books?id=C_aQqAa6rEoC
    – Hadoop jobs to analyse videos of single cell structures
      under varying conditions.
• European Space Agency                            http://www.esa.int
    – Uses AWS EC2 & S3 to deliver data about the current state
      of the planet to scientists, governmental agencies and other
      organizations worldwide.
• MD Anderson Cancer Center                        http://bit.ly/o0zDwl
    – Large private cloud (8,000 processors) maintained by The
      University of Texas.
    – Used to execute genomic processes against large clinical
      datasets (~1.4PB) on cancer.
Domain-driven Cloud Solutions
• NSF        http://www.nsf.gov/news/news_summ.jsp?cntn_id=119248

  – Approx. $4.5m to fund 13 research projects.
  – Mostly CS, but also bioinformatics & earth sciences.
• VENUS-C                                     http://www.venus-c.eu

  – 15 year-long pilots in different disciplines: architecture,
    biology, bioinformatics, chemistry, earth sciences, healthcare,
    maritime surveillance, mathematics, physics and social media.
• Masters @ SCC Lancaster
  – Corpus linguistics
  – Hydrological modelling
  – 3D imaging (volcanology)
Challenges
• Trust: security and privacy (required by law in some
  circumstances).
• Great divide between different disciplines.
• Data ownership.
   – Most data producers don’t mind sharing as long as they
     retain ownership.
• Software licenses.
• Belief that cloud/grid/etc. is only for certain apps.
• Investment into delivering cloud-based solutions
  to scientists.
   – Legacy applications & infrastructures.
Conclusions
• Need for cloud computing for scientific research:
   – Mainly: “I need more number crunching!”
   – Also: “I need to bridge data/discipline gaps.”
• Overall adoption is still relatively limited.
   – Various reasons, including trust. But also cloud-unrelated
     problems such as data ownership and software licensing.
• Investment into cloud-enabled research is important.
   – Not to browse articles via a mobile app while on the tube…
   – But for the added value of building and nurturing
     relationships.
   – And the economic model (lower up-front costs).
• Impact:
   – Better scientific tools, with less overhead on the scientists.
   – Potential for more integration.
Thank you!
                         Questions
Flickr credits:
• theaucitron        • stacylynn
• theplanetdotcom    • bpamerica
• Pnnl               • soilscience


                         http://www.comp.lancs.ac.uk/~elkhatib/
       Yehia El-khatib   @yelkhatib

                         http://www.EnvironmentalVirtualObservatory.org
                         @EVOpilot
Discussion
• Trust is not the problem; it is the perception of trust.
• Different academic communities have varying attitudes
  towards new technologies such as the cloud.
• More examples of funding to adopt cloud computing:
   o research: http://www.jisc.ac.uk/news/stories/2011/02/umf.aspx
   o Gov’t: http://www.cabinetoffice.gov.uk/content/government-ict-strategy
