Cloud-Based Solutions for Scientific Computing

Written for IST 402 – December 06, 2015
Ian Lewis

Abstract:
Scientists and researchers from various disciplines (examples include molecular biology, genomics, and
climatology) collect a wealth of data for analysis from a variety of instruments and methods. This data
needs to be stored and shared among researchers, and is often used in complex modeling and simulation.
The flexibility, scalability, and low up-front costs offered by cloud computing could allow researchers to
engage in such research without the barrier of acquiring and configuring physical IT resources. For this
reason, there has been a proliferation of the use of cloud computing within scientific communities, of
cloud computing environments tailored for use by scientists, and of research into the creation of
mechanisms to enable scientists to better leverage cloud computing in their research. This paper is a
discussion of dedicated scientific cloud environments, concerns related to scientific computing in the
cloud, and current research into improving cloud environments for scientists and researchers.


Table of Contents
I – Introduction
II – Microsoft Azure for Research
III – Nimbus: Cloud Computing for Science
IV – Public vs. Private Cloud Infrastructures for Scientific Applications
V – Azure and BLAST
VI – Nimbus and CyberGIS
VII – Scientific Data sets in the Cloud
VIII – Current Research
IX – Conclusions
References

I – Introduction

Scientific computing involves the “construction of mathematical models and numerical
solution techniques to solve scientific, social scientific and engineering problems” (Vecchiola,
Pandey, and Buyya, 2009). In the past, researchers have used on-premise and remote
supercomputers, pooled machines, and computing grids to perform complex operations on
large-scale data sets (for example, the Open Science Grid consists of 25,000 machines to
support storage and analysis of data from the Large Hadron Collider along with other projects).
Obtaining these resources can prove to be a barrier to innovative research.

According to Vecchiola, Pandey, and Buyya (2009), for researchers attempting to secure
high-power computing resources, challenges come in the form of bureaucratic and technical
issues. Organizations or individuals requesting access to computing grids (or on-premise
resources at their own institution) must submit proposals to the owners of these resources,
which means that projects deemed high-priority are served ahead of those deemed
low-priority. These resources often use “pre-packaged
environments,” that require researchers to use specific APIs and/or tools. These environments
may not be compatible with software or experiments that the user requires.

Leasing cloud-based IT resources for scientific computing allows researchers to avoid the
bureaucratic issues associated with securing computing resources from their (or another)
institution, and the technical issues associated with relying on one specific type of computing
resource. Researchers using cloud-based IT resources can custom-configure the cloud
environment for their specific needs, scale up resource use during experiment runtime, scale
back resource use after completion, and lease cloud-based storage to hold the associated data
and/or results, all without any up-front hardware costs. This sort of flexibility could enable a
whole generation of researchers to engage in computational science without the constraints of
securing and provisioning physical IT resources.

As the use of cloud computing has proliferated, researchers and scientists have used
cloud services sourced from public clouds (such as Amazon’s EC2) and open source
private cloud frameworks (such as OpenNebula) to implement their own solutions in fields such
as molecular biology, medicine, bioinformatics, neurology, astrophysics, and the social sciences.
However, there are few cloud services specifically marketed to researchers. Of these, two
solutions came up frequently in the literature – Microsoft’s Azure for Research (a PaaS project
that allows researchers to leverage Azure for scientific computation), and Nimbus: Cloud
Computing for Science, an open-source private cloud framework produced by a team within the
University of Chicago.

II – Microsoft Azure for Research

Microsoft Azure is a set of cloud services that draw from Microsoft’s vast data centers,
which are distributed across the globe. These services include storage, infrastructure,
development, management, integration solutions, and many others. Azure for Research is not
its own service, but rather a bundle of Microsoft Azure services marketed to researchers and
scientists (Microsoft Corporation, 2013). These include:

Websites/Web Applications: Azure contains easily configurable templates, hosting, and
other services for researchers to communicate their findings to others through their own
webpage.

Virtual Machines: researchers can choose standard VM images to run or build applications
on, or capture a VM image from their own operating environment. Azure allows researchers
to run VMs in clusters, enabling them to perform multiple simulations/computations
simultaneously.

Cloud Services: if a researcher would prefer to perform computation without configuring a
VM’s operating system or any other aspect of the computing environment, they can choose
to use Microsoft Azure’s Cloud Services. Job submissions are entered into a web interface,
computation occurs on virtual infrastructure completely obscured from the user, and
associated data is managed through Azure’s data services.

Mobile Services: Azure offers support for mobile devices – this could be relevant for
researchers who wish to enter data remotely, or send push notifications to mobile devices
once computation is complete.

Storage/Data Services: Azure offers NoSQL storage in the form of Blobs and Tables. Blobs
are chunks of files, up to 200 GB in size (1TB for server backups) stored in Azure’s
infrastructure that can be shared with others or with cloud applications. Tables provides
NoSQL data tables that can hold up to 200 TB of typed data – within a research/science
context, Tables could be used to store raw data from instruments for later analysis. Azure
also offers SQL Database service for those interested in using a standard relational database
– these can be accessed by cloud and on-premise applications.


Queues: an asynchronous messaging tool that allows users to set up communication
between applications hosted in Azure, or between tiers of an application hosted in Azure.

HDInsight: an Apache Hadoop-based storage system that allows users to securely store
large-scale unstructured or structured data (that might be too large to fit into an SQL
database).

High-Performance Computing: a service that works with HDInsight to analyze massive data
sets through clustered computing nodes that work in parallel. This technique is discussed
more in-depth below in the AzureBlast case study.

From Microsoft Azure for Research Overview (Microsoft Corporation, 2013).

III – Nimbus: Cloud Computing for Science

Nimbus: Cloud Computing for Science is an alternative solution for researchers and
scientists seeking to leverage cloud computing in their work, developed by a team at the
University of Chicago. It is a free and open source IaaS framework that enables users to
dynamically allocate their own, private physical IT resources (or lease and access others’ private
IT resources) (Keahey & Freeman, 2008). The core mechanisms and services within Nimbus
include:

Workspace Service: this service consists of a front-end workspace that connects the user to
a back-end resource manager in order to deploy and manage virtual machines.

Workspace Resource Manager/Workspace Pilot: deploys and manages VMs through
various sub-mechanisms.

Workspace Control Tools: used to start, stop, pause, connect, and manage VMs.

IaaS Gateway: allows a user to extend Nimbus with another IaaS infrastructure by mapping
the user’s PKI credentials from the second infrastructure back to their Nimbus credentials.
Nimbus IaaS Gateway is configured to support Amazon EC2’s REST API, working with the
Nimbus Context Broker to allow users to easily extend their private cloud resources with
public cloud resources if computing demand exceeds capacity.

Context Broker: creates a common configuration and security context across resources
provisioned from one or many cloud infrastructures, enabling the user to operate in a multi-
cloud environment (combining private and public cloud capabilities).

Nimbus Storage Service: allows the user to manage their cloud storage space and VM
image repository, and works in conjunction with GridFTP to enable users to connect Nimbus
to storage area networks, and a range of other distributed file systems.

From Science Clouds: Early Experiences in Cloud Computing for Scientific Applications (Keahey and Freeman,
2008).

IV – Public vs. Private Cloud Infrastructures for Scientific Applications

The benefits presented by cloud computing for scientific applications – dynamic scalability,
low implementation cost, and flexibility, among others – are fairly obvious, but there are
challenges presented when using cloud resources in scientific computation. Researchers
employing cloud computing in their work can choose to leverage public cloud resources, such as
Microsoft’s Azure or Amazon EC2, or open-source frameworks to leverage private cloud
resources, such as Nimbus, OpenNebula, or Eucalyptus. Each of these options presents a unique
set of costs and benefits for researchers and scientists, stemming from the requirements
commonly posed by scientific applications.

Scientific computation is different from other forms of computation, in that it often requires
data to be transferred at high volume between different applications, components of an
application, or tiers of an application (Tudoran et al., 2012). For example, the Biomass
Succession Extension of LANDIS-II is a scientific computation model that simulates annual forest
growth across space using a collection of ecosystem process models. LANDIS-II breaks landmass
down into small cells, and computes forest growth within these cells using shade tolerance,
maximum annual net primary productivity, maximum aboveground live biomass, probability of
establishment [of an organism], growth shape parameter, mortality shape parameter, effective
seed dispersal distance, species longevity, and minimum threshold for shade (Simons-Legaard,
Legaard, and Weiskittel, 2015). In order to generate accurate predictions, each of these
variables is associated with a fairly complex mathematical model.

Running this model requires data from each cell to be fed through each component of
the application, and re-combined to generate the output prediction from initial input data and
parameters. Most non-science applications do not require data to undergo as many
abstractions, or move between as many components or tiers. Therefore, public cloud
computing services were not specifically designed to support these types of applications, as
architects did not place inter-VM throughput as high on their list of priorities as scientific
computation requires (Tudoran et al., 2012).

Other possible issues presented when using public cloud services include variability of
performance due to multi-tenancy (i.e. when demand from other users spikes, performance
may decline), the financial cost of leasing resources, and portability of data to and from “the
cloud.” Tudoran et al. (2012) explore these problems, and examine the costs and benefits of
using public clouds vs. open-source private cloud frameworks.

Tudoran et al. (2012) compared Microsoft Azure and Nimbus’s performance for use with
scientific applications by configuring them to run an instance of “A-Brain,” a reference
application used by researchers to compare brain regions of MRI images with genes in order to
find links between the two. Like LANDIS-II, A-Brain requires large data sets to be fed through
and transported between multiple components of the application in order to generate results.

They found that, in terms of performance, Nimbus was better suited for this type of use due
to its open-source and configurable nature, as well as the fact that it is a private cloud
framework. All physical IT resources were leased from private providers, with unshared
bandwidth close in proximity to one another (as opposed to using shared, globally-distributed
public computing resources). However, though downloading and using the Nimbus cloud
framework is free, Tudoran et al. (2012) found that it is 13.5% cheaper per year to use Azure.

Their estimate of how much it would cost to host Nimbus on private resources for scientific
computing (the cost of ownership or lease from one or many institutions) exceeds the cost of
using Azure. One additional concern is researchers’ ability to directly manage their cloud
infrastructure. Nimbus requires a higher degree of direct involvement and configuration than
Azure, as Microsoft does much of this for their clients.

Thus, which platform a researcher chooses to use boils down to (1) the availability of
private IT resources for use with Nimbus, (2) the value they place on Nimbus’s higher level of
customization and inter-VM throughput, and (3) the degree to which they are able to directly
manage their cloud resources. This example in Tudoran et al. (2012) provides a framework for
researchers to perform an effective cost/benefit analysis when making this choice. The
following two sections discuss case studies of researchers and scientists using cloud-based
resources for computational research: one example using Azure, and a second using Nimbus.

V – Azure and BLAST

The Basic Local Alignment Search Tool (BLAST) is an algorithm that detects similarities between
bio-sequences. It has been used heavily since the 1990s within genetics, bioinformatics,
molecular biology, and other related fields. BLAST can be used to compare DNA and amino acid
sequences, and score their similarity based on the frequency and length of identical segments
(Altschul et al., 1990).

Liu, Jackson, and Barga (2010) created an implementation of the BLAST algorithm using
Microsoft’s Azure, which they refer to as “AzureBlast.” Like Tudoran et al. (2012), Liu, Jackson,
and Barga (2010) note that Azure’s architecture is not optimized for low latency communication
between different nodes of computation. However, since BLAST is a single (albeit data-
intensive) algorithm that does not require high-volume communication between VMs, they
argue that BLAST is very well suited for implementation in cloud environments.

AzureBlast uses thousands of concurrently operating instances of BLAST within Azure’s
Cloud Services framework (refer to Section II for a definition of Azure’s various mechanisms and
services), each handling subsections of a larger bio-sequence. The mechanisms used within
AzureBlast include:

Job Submission Portal: this is the front-end portal from which users can submit jobs, which
consist of the two bio-sequences to be compared, along with any parameters researchers
wish to set.

Job Scheduler: this mechanism accepts jobs submitted through the front-end portal, and
uses rules to schedule computation tasks via the dispatch queue.

Tasks: the job scheduler mechanism breaks down the job (i.e. the two bio-sequences in full)
into smaller segments. Each individual segment comparison is considered a “task.”
Thousands of tasks are carried out by worker role instances simultaneously to expedite the
process, and the resulting data is combined at the end to provide the final result.

Worker Role Instances: instances of applications within the Azure framework – these
instances automatically “poll” the dispatch queue to find work, and store the resulting data
from each task in an Azure storage mechanism (in this case, a Blob). Workers poll the
dispatch queue once again after completing a task, repeating the process until no tasks
remain.

Dispatch Queue: using the Queues feature of Azure, the dispatch queue mechanism
provides an asynchronous message delivery system between compute roles – the dispatch
queue is filled with “tasks” by the job scheduler, and these tasks are retrieved by worker role
instances seeking work.

From Liu, Jackson, and Barga (2010).

In addition to these mechanisms, configured from Azure’s native features, Liu, Jackson, and
Barga (2010) added their own abstraction layer (referred to as a “Task Parallel Library”) to
coordinate the simultaneous operations of each worker role instance. The Task Parallel Library
uses a “fork” function to split bio-sequences into manageable segments, and a “join” function
to string the resulting data back together.
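
The fork, dispatch, poll, and join pattern described above can be illustrated with a minimal Python sketch. It is not the AzureBlast code and does not use any Azure API: an in-process queue and threads stand in for the dispatch queue and worker role instances, and the function names (fork, placeholder_task, join_results, run_job) as well as the GC-counting task that stands in for a BLAST comparison are assumptions made purely for illustration.

```python
import queue
import threading


def fork(sequence, segment_length):
    """Split one long sequence into fixed-length segments (the "fork" step)."""
    return [sequence[i:i + segment_length]
            for i in range(0, len(sequence), segment_length)]


def placeholder_task(segment):
    """Hypothetical stand-in for a BLAST comparison: count G and C characters."""
    return sum(1 for base in segment if base in "GC")


def worker(dispatch_queue, results, lock):
    """Poll the dispatch queue until it is empty, storing one result per task."""
    while True:
        try:
            index, segment = dispatch_queue.get_nowait()
        except queue.Empty:
            return  # no tasks remain, so this worker stops polling
        value = placeholder_task(segment)
        with lock:
            results[index] = value


def join_results(results):
    """Recombine per-task results in their original order (the "join" step)."""
    return [results[index] for index in sorted(results)]


def run_job(sequence, segment_length=4, worker_count=3):
    """Fork a job into tasks, let workers drain the queue, then join the results."""
    dispatch_queue = queue.Queue()
    for task in enumerate(fork(sequence, segment_length)):
        dispatch_queue.put(task)

    results = {}
    lock = threading.Lock()
    workers = [threading.Thread(target=worker, args=(dispatch_queue, results, lock))
               for _ in range(worker_count)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()
    return join_results(results)
```

Because each task carries its own index, workers can finish in any order and the join step still reassembles the results correctly, which is the property that lets the work be spread across many independent instances.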

After extensive testing, they found that there is roughly a linear relationship between the
number of worker instances in use, and the number of sequences processed per minute. They
also found that the dollar cost of each sequence decreases until the number of input sequences
reaches about 100. AzureBlast processes about the same number of sequences per dollar at
any scale larger than this. While they note that the messaging API within Azure is “inconvenient
and unintuitive for developing parallel applications for science” (stating that researchers must
be careful when coordinating more than one queue and handling errors/exceptions), their
overall conclusion is that Azure is well suited for deploying instances of BLAST.

VI – Nimbus and CyberGIS

CyberGIS is a collaborative project headed by Professor Shaowen Wang at the University of
Illinois, Urbana-Champaign. CyberGIS uses National Science Foundation XSEDE supercomputers,
and OpenGrid computing resources to perform geo-spatial computation as a service. CyberGIS
applications are available for users at no cost through a web portal, referred to as the CyberGIS
Gateway – http://sandbox.cigi.illinois.edu/home/.

CyberGIS employs a three-tiered architecture: (1) the web portal, (2) “GISolve” middleware,
which serves as a bridge between the web portal and physical IT resources, and (3) physical
computing architecture. Examples of applications hosted on CyberGIS include BioScope, a
biofuel supply chain optimization application, TauDEM, which extracts hydrological information
from topography, and FluMapper, which analyzes large-scale, location-based social media data.

CyberGIS remotely uses physical IT resources from two sources – resources owned by
OpenGrid, and supercomputers owned by the National Science Foundation. Riteau et al. (2014)
identify auto-scaling as a major concern for CyberGIS, for two reasons. (1) There is highly
variable demand on the CyberGIS system, due to its accessibility and multi-user nature. For
example, if a professor decides to use CyberGIS for complex computation in an assignment for
their students, demand spikes, as there are now many students placing high demand on the
system all at once. (2) Many researchers take an exploratory approach to their research with
CyberGIS – it is common for researchers to run complex simulations many times with different
parameters. To implement auto-scaling within the CyberGIS architecture, Riteau et al. used
Nimbus: Cloud Computing for Science, as it is a free, open-source private IaaS framework
specifically tailored for use with scientific computation.

The CyberGIS Gateway normally used static VM clusters to run the various spatial
regression services offered by CyberGIS applications. Using Nimbus, Riteau et al. (2014)
implemented a queuing load balancer, and a dynamic scaling mechanism (referred to as the
“Decision Engine”) that provisioned and terminated VM instances on-demand in response to
the queuing load balancer.

Queuing Load Balancer: this mechanism distributes HTTP requests from the CyberGIS
gateway among a pool of back-end servers, queues requests when all servers are in use,
and provides metrics on requests and workloads.

Decision Engine: uses the API native to Nimbus to request changes in the number of
deployed VM instances. When new instances are provisioned, this mechanism integrates
them into the pool of back-end servers used by the queuing load balancer to distribute
workloads.
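
The interaction between the two mechanisms can be sketched as a simple scaling rule driven by the load balancer's metrics. The sketch below is only an illustration under stated assumptions: the function names, the per-instance capacity parameter, and the minimum and maximum pool sizes are hypothetical, and it does not call the Nimbus API or reproduce the policy used by Riteau et al. (2014).

```python
import math


def desired_instances(queued, in_flight, per_instance_capacity,
                      min_instances=1, max_instances=10):
    """Return how many back-end VMs the pool should have for the current load.

    queued and in_flight are request counts reported by the load balancer;
    per_instance_capacity is an assumed number of requests one VM can serve at once.
    """
    demand = queued + in_flight
    needed = math.ceil(demand / per_instance_capacity)
    # Clamp to the configured pool limits (both limits are hypothetical settings).
    return max(min_instances, min(max_instances, needed))


def scaling_action(current, queued, in_flight, per_instance_capacity, **limits):
    """Positive result: VMs to provision. Negative result: VMs to terminate."""
    target = desired_instances(queued, in_flight, per_instance_capacity, **limits)
    return target - current
```

For example, with two VMs running, nine queued requests, three in flight, and an assumed capacity of four requests per VM, the rule asks for one additional instance. When the queue is empty, it shrinks the pool back to the configured minimum.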

Riteau et al. (2014) compared average response times for requests made through the
CyberGIS Gateway from the original, static VM configuration, and the dynamically scaled,
Nimbus configuration. They found that while response time increased in proportion to the
number of simultaneous requests submitted to the static configuration, response time
remained stable when they increased the number of simultaneous requests submitted to the
dynamic configuration. They tested these configurations for both small and large, complex file
submissions – the size of submitted jobs did not affect response time. They conclude that
Nimbus provides a viable auto-scaling solution for their existing framework – response time for
many simultaneous requests was reduced from 150 seconds to 50 seconds for large files, and
from over 120 seconds to 29.5 seconds for small files.

VII – Scientific Data sets in the Cloud

In addition to cloud-based computation, there are also many public data sets hosted on
cloud services for use by researchers and scientists. These cloud providers not only enable
researchers to perform complex operations on large-scale data, they also provide them with a
way to store and manage these massive data sets that are simply too large for local storage.
Two notable examples of these data set storage services are (1) Amazon Web Services (AWS)
Public Data Sets, and (2) the Open Science Data Cloud.

Amazon AWS Public Data Sets

Amazon hosts several large-scale data sets on their own servers, which can be quickly and
easily processed with the applications of researchers’ choice on Amazon’s EC2 cloud services.
This is extremely beneficial for researchers, as they no longer need to spend time and resources
on locating and migrating cumbersome files (Amazon Web Services, Inc., 2015). Some examples
of these data sets include:

1000 Genomes Project: an international project cataloguing human genetic variation – this
contains data sequenced from over 2,500 individuals.

CCAFS-Climate Data: this is a climate data set of open-access climate projections, primarily
targeted at researchers seeking to assess the impact of climate change on agriculture.

Denisova Genome: this is a “high-coverage” genome of a Denisovan (a sister species to
Neanderthals), one of the most closely related extinct species to humans.

LandSat on AWS: a collaboration between the USGS and NASA, containing continuous
satellite imagery of Earth’s entire surface from 1972 to present – images are updated
consistently.

Petroleum Public Data Set: public domain data from a variety of petroleum organizations
worldwide.

From AWS Public Data Sets (Amazon Web Services Inc., 2015).

Open Science Data Cloud

The Open Science Data Cloud (OSDC) is a “petabyte-scale” resource for scientists to store,
share, and analyze large-scale data sets. It hosts material similar to AWS Public Data Sets, as
well as generalized computing resources – one for general computation, and another for
restricted computation (e.g. a project using sensitive medical data). Like Nimbus, this resource
is directly geared toward researchers and scientists. However, researchers must first be
approved by the OSDC after submitting a proposal before they are allocated resources, rather
than being provided free and open access to configure their own resources (Open Science Data
Cloud, 2015). Examples of these data sets include:

City of Chicago Public Data Sets: a large set of social data from the City of Chicago, in both
tabular and “raw” form.

Complete Genomics Public Data: entire human gene sequences provided by Complete
Genomics. These include samples from disease-free individuals and individuals with cancer.

Earth Observing-1 Mission: 80.5 terabytes of data from NASA’s Earth Observing-1 satellite
mission.

Large-Scale Data Analysis and Visualization Symposium Data: data from a global climate
dynamics simulation run on Oak Ridge National Laboratory’s Titan supercomputer.

Sloan Digital Sky Survey: consists of “a series of three interlocking images and
spectroscopic surveys, carried out over an eight-year period” from Apache Point
Observatory in New Mexico.

From OSDC: Open Science Data Cloud (Open Science Data Cloud, 2015).

These cloud-based, public data sets will not only assist researchers employed at dedicated
institutions and universities – they may create opportunities for enterprises to leverage public
data for business purposes. This data could facilitate international collaboration. If a group of
specialists located in disparate locations decides to collaborate on a project, geographic or
political barriers to porting this data are removed, as anyone with a strong Internet connection
can access it.

VIII – Current Research

As discussed in Section IV, there are certain requirements common for scientific
applications that do not exist for most other classes of applications. As the use of cloud
computing for scientific applications has increased in the last few years, researchers have
sought to develop mechanisms to make cloud computing environments more scientific
application-friendly. Two examples are discussed below – the Cloud Resource Broker
(CLOUDRB), which manages resources and jobs according to user-specified deadlines, and a
“transparent elastic disk throughput” mechanism, which works to optimize cloud storage
provisions for scientific applications.

CLOUDRB

Scientific applications often require large amounts of data to be processed by a certain
deadline – cloud environments designated to host scientific applications (e.g. a deployment of
Nimbus) may have many applications running simultaneously, each with its own deadline.
Somasundaram and Govindarajan (2013) propose a Cloud Resource Broker (CLOUDRB) to
prioritize resource provisioning for certain jobs over others in an open-source science cloud
environment, given a set of deadlines.

This mechanism includes components that accept jobs with deadlines assigned by users,
prioritize these jobs according to their deadline and computation requirements (i.e. the
complexity of a job), and dynamically assign resources to complete jobs within these deadlines.
Somasundaram and Govindarajan (2013) compared their CLOUDRB with three commonly
used resource provisioning mechanisms. Their experiment involved running 200, 400, 600, 800,
and 1000 simultaneous instances of an ant pheromone comparison algorithm to compare the
impact of these provisioning mechanisms on completion time and job rejection rate. They
found that their mechanism completed jobs at or before deadline 1.3-1.77 times more often
than the three alternative mechanisms, and rejected jobs 15%-35% less often. This is significant, in
that CLOUDRB will allow researchers to more accurately predict when results will be available
from complex operations.
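
The general idea of accepting, prioritizing, and rejecting jobs against deadlines can be sketched with a textbook earliest-deadline-first rule. This is a swapped-in, simplified technique chosen for illustration and is not the scheduling algorithm of CLOUDRB: it assumes a single resource that runs one job at a time, and the Job fields and the schedule function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    runtime: float   # estimated run time in seconds (assumed to be known up front)
    deadline: float  # seconds from now by which the job must finish


def schedule(jobs):
    """Earliest-deadline-first admission on a single resource.

    Returns (accepted, rejected): accepted job names in execution order, and the
    names of jobs that could not finish by their deadline and were turned away.
    """
    accepted, rejected = [], []
    clock = 0.0
    for job in sorted(jobs, key=lambda j: j.deadline):
        if clock + job.runtime <= job.deadline:
            clock += job.runtime        # the job runs next and finishes on time
            accepted.append(job.name)
        else:
            rejected.append(job.name)   # admitting it would miss its deadline
    return accepted, rejected
```

Under this rule, three jobs with (runtime, deadline) pairs of (30, 100), (20, 40), and (60, 70) seconds yield two accepted jobs and one rejection, since the 60-second job cannot finish by its deadline once the most urgent job has run. A broker that can also add resources on demand, as CLOUDRB does, has more options than this fixed-capacity sketch.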

Transparent Elastic Disk Throughput Mechanism

Nicolae, Riteau, and Keahey (2015) argue that while the dynamic provisioning of computing
resources in cloud environments has been studied heavily, dynamic storage allocation has
received comparatively little attention. While users can expect to be dynamically allocated
computing resources (e.g. RAM and CPUs) in response to demand, they are still required to
manually allocate storage resources. Also, users must either choose cheap, low I/O throughput
disks or expensive, high I/O throughput disks, without the option of changing storage disk type
in response to throughput demand. In scientific computing, this becomes an issue because it is
quite common for applications to iteratively demand periods of very high storage throughput
(when data exits an application in bulk) followed by periods of very low storage throughput
(when the actual computation is occurring).

Nicolae, Riteau, and Keahey (2015) designed the transparent elastic disk throughput
mechanism to dynamically allocate small, high throughput storage disks to supplement
cheaper, low throughput storage disks that are manually provisioned by the user and persist
throughout computation. During periods of high throughput demand, this dynamic storage
provisioning mechanism enables the user to avoid the higher cost associated with self-
provisioning high throughput disks, while increasing throughput to levels associated with
higher-throughput disks when necessary.

Testing this mechanism with an atmospheric phenomena simulator, they found that this
dynamic storage allocation increased cost by only 3.3%, and reduced completion time from
1,471 seconds to 1,231 seconds. By contrast, statically provisioning high-throughput disks for
use through the entire run increased costs by 23%, but only reduced completion time from
1,471 seconds to 1,190 seconds – a marginal improvement in comparison to their solution. This
mechanism will enable researchers and scientists to more efficiently take advantage of cloud
computing resources.
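
The trade-off reported above can be made concrete with a short calculation. The sketch below uses only the figures quoted in this section (completion times of 1,471, 1,231, and 1,190 seconds, and cost increases of 3.3% and 23%); the function and variable names are illustrative, and the "reduction per percent of added cost" ratio is a convenience measure introduced here, not a metric from Nicolae, Riteau, and Keahey (2015).

```python
BASELINE_SECONDS = 1471  # completion time with cheap, low-throughput disks only


def percent_reduction(baseline, observed):
    """Percentage by which the observed time improves on the baseline."""
    return 100.0 * (baseline - observed) / baseline


# Completion-time reduction for each configuration, in percent.
ELASTIC = percent_reduction(BASELINE_SECONDS, 1231)  # dynamic allocation
STATIC = percent_reduction(BASELINE_SECONDS, 1190)   # statically provisioned fast disks

# Percentage points of time saved per percentage point of added cost.
ELASTIC_PER_COST = ELASTIC / 3.3
STATIC_PER_COST = STATIC / 23.0

print(f"elastic: {ELASTIC:.1f}% faster, {ELASTIC_PER_COST:.1f} per cost point")
print(f"static:  {STATIC:.1f}% faster, {STATIC_PER_COST:.1f} per cost point")
```

The dynamic mechanism cuts completion time by roughly 16.3% and static provisioning by roughly 19.1%, so the extra 2.8 percentage points of speed-up from static high-throughput disks come at about seven times the added cost.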

IX – Conclusions

After reviewing the literature and the cloud services marketed to researchers and scientists, it
can be concluded that both the availability of cloud computing resources for scientific
applications and the use of these resources have proliferated in recent years. These services include
cloud-based data, storage, and computation, which can be used in a variety of ways in the work
of scientists from a variety of knowledge domains. For fields where many teams and individuals
might require complex modeling and simulation software for large data sets, this trend is likely
to continue. If Microsoft and Amazon wish to continue to market their services to scientists,
they will need to improve inter-VM throughput, as discussed in Tudoran et al. (2012).
Otherwise, they may be edged out by emerging competitors that leverage the use of open-
source cloud frameworks.

References

Altschul, S. F. et al. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3):
403-410. Retrieved from http://www.blastalgorithm.com/

Amazon Web Services, Inc. (2015). AWS public data sets. Retrieved from
http://aws.amazon.com/public-data-sets/

Keahey, K., & Freeman, T. (2008). Science clouds: Early experiences in cloud computing for
scientific applications. Retrieved from http://www.nimbusproject.org/files/Science-Clouds-
CCA08.pdf

Liu, W., Jackson, J., & Barga, R. (2010). AzureBlast: A case study of developing science
applications on the cloud. Proceedings from the 1st Workshop on Scientific Cloud
Computing: Indianapolis, Indiana.

Microsoft Corporation. (2013). Microsoft azure for research overview. Retrieved from
http://research.microsoft.com/en-us/projects/azure/windows-azure-for-research-
overview.pdf

Nicolae, B., Riteau, P., & Keahey, K. (2015). Towards transparent throughput elasticity for IaaS
cloud storage: Exploring the benefits of adaptive block-level caching. International Journal
of Distributed Systems and Technologies (IJDST): 6(4).

Open Science Data Cloud. (2015). OSDC: Open science data cloud. Retrieved from
https://www.opensciencedatacloud.org/

Riteau, P. et al. (2014). A cloud computing approach to on-demand and scalable CyberGIS
analytics. Proceedings from 5th Workshop on Scientific Cloud Computing (ScienceCloud
2014): Vancouver, Canada.

Sempolinski, P., & Thain, D. (2010). A comparison and critique of Eucalyptus, OpenNebula, and
Nimbus. Proceedings from the 2nd IEEE International Conference on Cloud Computing
Technology and Science: Indianapolis, Indiana.

Simons-Legaard, E., Legaard, K., & Weiskittel, A. (2015). Predicting aboveground biomass with
LANDIS-II: A global and temporal analysis of parameter sensitivity. Ecological Modelling: 313,
325-332.

Somasundaram, T. S., & Govindarajan, K. (2013). CLOUDRB: A framework for scheduling and
managing High-Performance Computing (HPC) applications in science cloud. Future
Generation Computer Systems: 34, 47-65.

Tudoran, R. et al. (2012). A performance evaluation of Azure and Nimbus clouds for scientific
applications. Proceedings from the 2nd International Workshop on Cloud Computing
Platforms: Bern, Switzerland.

Vecchiola, C., Pandey, S., & Buyya, R. (2009). High-performance cloud computing: A view of
scientific applications. Proceedings from The 2009 10th International Symposium on
Pervasive Systems, Algorithms, and Networks: Melbourne, Australia.
