Written	for	IST	402	–	December	06,	2015	
Cloud-Based	Solutions	for	Scientific	
Computing	
Ian	Lewis	
	
Abstract:	
Scientists	and	researchers	from	various	disciplines	(examples	include	molecular	biology,	genomics,	and	
climatology)	collect	a	wealth	of	data	for	analysis	from	a	variety	of	instruments	and	methods.	This	data	
needs	to	be	stored	and	shared	among	researchers,	and	is	often	used	in	complex	modeling	and	simulation.	
The	flexibility,	scalability,	and	low	up-front	costs	offered	by	cloud	computing	could	allow	researchers	to	
engage	in	such	research	without	the	barrier	of	acquiring	and	configuring	physical	IT	resources.	For	this	
reason, cloud computing use within scientific communities, cloud computing environments tailored
for scientists, and research into mechanisms that help scientists better leverage cloud computing
in their research have all proliferated. This paper is a
discussion	of	dedicated	scientific	cloud	environments,	concerns	related	to	scientific	computing	in	the	
cloud,	and	current	research	into	improving	cloud	environments	for	scientists	and	researchers.
	
Table	of	Contents	
I – Introduction
II – Microsoft Azure for Research
III – Nimbus: Cloud Computing for Science
IV – Public vs. Private Cloud Infrastructures for Scientific Applications
V – Azure and BLAST
VI – Nimbus and CyberGIS
VII – Scientific Data sets in the Cloud
VIII – Current Research
IX – Conclusions
References
I	–	Introduction	
	
Scientific	computing	involves	the	“construction	of	mathematical	models	and	numerical	
solution	techniques	to	solve	scientific,	social	scientific	and	engineering	problems”	(Vecchiola,	
Pandey,	and	Buyya,	2009).	In	the	past,	researchers	have	used	on-premise	and	remote	
supercomputers,	pooled	machines,	and	computing	grids	to	perform	complex	operations	on	
large-scale	data	sets	(for	example,	The	Open	Science	Grid	consists	of	25,000	machines	to	
support	storage	and	analysis	of	data	from	the	Large	Hadron	Collider	along	with	other	projects).	
Obtaining	these	resources	can	prove	to	be	a	barrier	to	innovative	research.		
	
According	to	Vecchiola,	Pandey,	and	Buyya	(2009),	for	researchers	attempting	to	secure	
high-power	computing	resources,	challenges	come	in	the	form	of	bureaucratic	and	technical	
issues. Organizations or individuals requesting access to computing grids (or on-premise
resources at their own institution) must submit proposals to the owners of these resources,
who then prioritize some projects over others. These resources often use “pre-packaged
environments” that require researchers to use specific APIs and/or tools. These environments
may not be compatible with software or experiments that the user requires.
	
Leasing	cloud-based	IT	resources	for	scientific	computing	allows	researchers	to	avoid	the	
bureaucratic	issues	associated	with	securing	computing	resources	from	their	(or	another)	
institution,	and	the	technical	issues	associated	with	relying	on	one	specific	type	of	computing	
resource.	Researchers	using	cloud-based	IT	resources	can	custom-configure	the	cloud	
environment	for	their	specific	needs,	scale	up	resource	use	during	experiment	runtime,	scale	
back	resource	use	after	completion,	and	lease	cloud-based	storage	to	hold	the	associated	data	
and/or	results,	all	without	any	up-front	hardware	costs.	This	sort	of	flexibility	could	enable	a	
whole	generation	of	researchers	to	engage	in	computational	science	without	the	constraints	of	
securing	and	provisioning	physical	IT	resources.	
	
As the use of cloud computing has proliferated, many researchers and scientists have used
cloud services sourced from public clouds (such as Amazon’s EC2) and open source
private cloud frameworks (such as OpenNebula) to implement their own solutions in fields such
as molecular biology, medicine, bioinformatics, neurology, astrophysics, and the social sciences.
However, there are few cloud services specifically marketed to researchers. Of these, two
solutions	came	up	frequently	in	the	literature	–	Microsoft’s	Azure	for	Research	(a	PaaS	project	
that	allows	researchers	to	leverage	Azure	for	scientific	computation),	and	Nimbus:	Cloud
Computing	for	Science,	an	open-source	private	cloud	framework	produced	by	a	team	within	the	
University	of	Chicago.	
	
II	–	Microsoft	Azure	for	Research	
	
Microsoft	Azure	is	a	set	of	cloud	services	that	draw	from	Microsoft’s	vast	data	centers,	
which	are	distributed	across	the	globe.	These	services	include	storage,	infrastructure,	
development,	management,	integration	solutions,	and	many	others.	Azure	for	Research	is	not	
its	own	service,	but	rather	a	bundle	of	Microsoft	Azure	services	marketed	to	researchers	and	
scientists	(Microsoft	Corporation,	2013).	These	include:	
	
Websites/Web	Applications:	Azure	contains	easily	configurable	templates,	hosting,	and	
other	services	for	researchers	to	communicate	their	findings	to	others	through	their	own	
webpage.	
	
Virtual	Machines:	researchers	can	choose	standard	VM	images	to	run	or	build	applications	
on,	or	capture	a	VM	image	from	their	own	operating	environment.	Azure	allows	researchers	
to	run	VMs	in	clusters,	enabling	them	to	perform	multiple	simulations/computations	
simultaneously.	
	
Cloud	Services:	if	a	researcher	would	prefer	to	perform	computation	without	configuring	a	
VM’s	operating	system	or	any	other	aspect	of	the	computing	environment,	they	can	choose	
to	use	Microsoft	Azure’s	Cloud	Services.	Job	submissions	are	entered	into	a	web	interface,	
computation	occurs	on	virtual	infrastructure	completely	obscured	from	the	user,	and	
associated	data	is	managed	through	Azure’s	data	services.	
	
Mobile	Services:	Azure	offers	support	for	mobile	devices	–	this	could	be	relevant	for	
researchers	who	wish	to	enter	data	remotely,	or	send	push	notifications	to	mobile	devices	
once	computation	is	complete.	
	
Storage/Data Services: Azure offers NoSQL storage in the form of Blobs and Tables. Blobs
are chunks of files, up to 200 GB in size (1 TB for server backups), stored in Azure’s
infrastructure that can be shared with others or with cloud applications. Tables provides
NoSQL data tables that can hold up to 200 TB of typed data; within a research/science
context, Tables could be used to store raw data from instruments for later analysis. Azure
also offers a SQL Database service for those interested in using a standard relational database;
these databases can be accessed by cloud and on-premise applications.
	
Queues:	an	asynchronous	messaging	tool	that	allows	users	to	set	up	communication	
between	applications	hosted	in	Azure,	or	between	tiers	of	an	application	hosted	in	Azure.	
	
HDInsight:	an	Apache	Hadoop-based	storage	system	that	allows	users	to	securely	store	
large-scale	unstructured	or	structured	data	(that	might	be	too	large	to	fit	into	an	SQL	
database).	
	
High-Performance	Computing:	a	service	that	works	with	HDInsight	to	analyze	massive	data	
sets through clustered computing nodes that work in parallel. This technique is discussed
in more depth below in the AzureBlast case study.
	
From	Microsoft	Azure	for	Research	Overview	(Microsoft	Corporation,	2013).	
	
III	–	Nimbus:	Cloud	Computing	for	Science	
	
Nimbus:	Cloud	Computing	for	Science	is	an	alternative	solution	for	researchers	and	
scientists	seeking	to	leverage	cloud	computing	in	their	work,	developed	by	a	team	at	the	
University	of	Chicago.	It	is	a	free	and	open	source	IaaS	framework	that	enables	users	to	
dynamically	allocate	their	own,	private	physical	IT	resources	(or	lease	and	access	others’	private	
IT	resources)	(Keahey	&	Freeman,	2008).	The	core	mechanisms	and	services	within	Nimbus	
include:	
	
Workspace	Service:	this	service	consists	of	a	front-end	workspace	that	connects	the	user	to	
a	back-end	resource	manager	in	order	to	deploy	and	manage	virtual	machines.	
	
Workspace	Resource	Manager/Workspace	Pilot:	deploys	and	manages	VMs	through	
various	sub-mechanisms.	
	
Workspace	Control	Tools:	used	to	start,	stop,	pause,	connect,	and	manage	VMs.	
	
IaaS	Gateway:	allows	a	user	to	extend	Nimbus	with	another	IaaS	infrastructure	by	mapping	
the user’s PKI credentials from the second infrastructure back to their Nimbus credentials.
Nimbus	IaaS	Gateway	is	configured	to	support	Amazon	EC2’s	REST	API,	working	with	the	
Nimbus	Context	Broker	to	allow	users	to	easily	extend	their	private	cloud	resources	with	
public	cloud	resources	if	computing	demand	exceeds	capacity.
Context	Broker:	creates	a	common	configuration	and	security	context	across	resources	
provisioned	from	one	or	many	cloud	infrastructures,	enabling	the	user	to	operate	in	a	multi-
cloud	environment	(combining	private	and	public	cloud	capabilities).	
	
Nimbus	Storage	Service:	allows	the	user	to	manage	their	cloud	storage	space	and	VM	
image	repository,	and	works	in	conjunction	with	GridFTP	to	enable	users	to	connect	Nimbus	
to	storage	area	networks,	and	a	range	of	other	distributed	file	systems.	
	
From Science Clouds: Early Experiences in Cloud Computing for Scientific Applications (Keahey and Freeman,
2008).
IV	–	Public	vs.	Private	Cloud	Infrastructures	for	Scientific	Applications	
	
The	benefits	presented	by	cloud	computing	for	scientific	applications	–	dynamic	scalability,	
low	implementation	cost,	and	flexibility,	among	others	–	are	fairly	obvious,	but	there	are	
challenges	presented	when	using	cloud	resources	in	scientific	computation.	Researchers	
employing	cloud	computing	in	their	work	can	choose	to	leverage	public	cloud	resources,	such	as	
Microsoft’s	Azure	or	Amazon	EC2,	or	open-source	frameworks	to	leverage	private	cloud	
resources,	such	as	Nimbus,	OpenNebula,	or	Eucalyptus.	Each	of	these	options	presents	a	unique	
set	of	costs	and	benefits	for	researchers	and	scientists,	stemming	from	the	requirements	
commonly	posed	by	scientific	applications.	
	
Scientific	computation	is	different	from	other	forms	of	computation,	in	that	it	often	requires	
data	to	be	transferred	at	high	volume	between	different	applications,	components	of	an	
application,	or	tiers	of	an	application	(Tudoran	et	al.,	2012).	For	example,	the	Biomass	
Succession	Extension	of	LANDIS-II	is	a	scientific	computation	model	that	simulates	annual	forest	
growth	across	space	using	a	collection	of	ecosystem	process	models.	LANDIS-II	breaks	landmass	
down	into	small	cells,	and	computes	forest	growth	within	these	cells	using	shade	tolerance,	
maximum	annual	net	primary	productivity,	maximum	aboveground	live	biomass,	probability	of	
establishment	[of	an	organism],	growth	shape	parameter,	mortality	shape	parameter,	effective	
seed	dispersal	distance,	species	longevity,	and	minimum	threshold	for	shade	(Simons-Legaard,	
Legaard,	and	Weiskittel,	2015).	In	order	to	generate	accurate	predictions,	each	of	these	
variables	is	associated	with	a	fairly	complex	mathematical	model.	
	
Running this model requires data from each cell to be fed through each component of
the	application,	and	re-combined	to	generate	the	output	prediction	from	initial	input	data	and	
parameters.	Most	non-science	applications	do	not	require	data	to	undergo	as	many	
abstractions,	or	move	between	as	many	components	or	tiers.	Therefore,	public	cloud
computing	services	were	not	specifically	designed	to	support	these	types	of	applications,	as	
architects	did	not	place	inter-VM	throughput	as	high	on	their	list	of	priorities	as	scientific	
computation	requires	(Tudoran	et	al.,	2012).			
	
Other	possible	issues	presented	when	using	public	cloud	services	include	variability	of	
performance	due	to	multi-tenancy	(i.e.	when	demand	from	other	users	spikes,	performance	
may	decline),	the	financial	cost	of	leasing	resources,	and	portability	of	data	to	and	from	“the	
cloud.”	Tudoran	et	al.	(2012)	explore	these	problems,	and	examine	the	costs	and	benefits	of	
using	public	clouds	vs.	open-source	private	cloud	frameworks.		
	
Tudoran	et	al.	(2012)	compared	Microsoft	Azure	and	Nimbus’s	performance	for	use	with	
scientific	applications	by	configuring	them	to	run	an	instance	of	“A-Brain,”	a	reference	
application	used	by	researchers	to	compare	brain	regions	of	MRI	images	with	genes	in	order	to	
find	links	between	the	two.	Like	LANDIS-II,	A-Brain	requires	large	data	sets	to	be	fed	through	
and	transported	between	multiple	components	of	the	application	in	order	to	generate	results.	
	
They	found	that,	in	terms	of	performance,	Nimbus	was	better	suited	for	this	type	of	use	due	
to	its	open-source	and	configurable	nature,	as	well	as	the	fact	that	it	is	a	private	cloud	
framework. All physical IT resources were leased from private providers, were located close to
one another, and used unshared bandwidth (as opposed to shared, globally distributed
public computing resources). However, though downloading and using the Nimbus cloud
framework	is	free,	Tudoran	et	al.	(2012)	found	that	it	is	13.5%	cheaper	per	year	to	use	Azure.		
	
Their	estimate	of	how	much	it	would	cost	to	host	Nimbus	on	private	resources	for	scientific	
computing	(the	cost	of	ownership	or	lease	from	one	or	many	institutions)	exceeds	the	cost	of	
using	Azure.	One	additional	concern	is	researchers’	ability	to	directly	manage	their	cloud	
infrastructure.	Nimbus	requires	a	higher	degree	of	direct	involvement	and	configuration	than	
Azure,	as	Microsoft	does	much	of	this	for	their	clients.	
	
Thus,	which	platform	a	researcher	chooses	to	use	boils	down	to	(1)	the	availability	of	
private	IT	resources	for	use	with	Nimbus,	(2)	the	value	they	place	on	Nimbus’s	higher	level	of	
customization	and	inter-VM	throughput,	and	(3)	the	degree	to	which	they	are	able	to	directly	
manage	their	cloud	resources.	This	example	in	Tudoran	et	al.	(2012)	provides	a	framework	for	
researchers	to	perform	an	effective	cost/benefit	analysis	when	making	this	choice.	The	
following	two	sections	discuss	case	studies	of	researchers	and	scientists	using	cloud-based	
resources	for	computational	research:	one	example	using	Azure,	and	a	second	using	Nimbus.
V	–	Azure	and	BLAST	
	
The Basic Local Alignment Search Tool (BLAST) is an algorithm that detects similarities between
bio-sequences. It has been used heavily since the 1990s within genetics, bioinformatics,
molecular biology, and other related fields. BLAST can be used to compare DNA and amino acid
sequences, and score their similarity based on the frequency and length of identical segments
(Altschul et al., 1990).
	
Liu,	Jackson,	and	Barga	(2010)	created	an	implementation	of	the	BLAST	algorithm	using	
Microsoft’s	Azure,	which	they	refer	to	as	“AzureBlast.”	Like	Tudoran	et	al.	(2012),	Liu,	Jackson,	
and	Barga	(2010)	note	that	Azure’s	architecture	is	not	optimized	for	low	latency	communication	
between	different	nodes	of	computation.	However,	since	BLAST	is	a	single	(albeit	data-
intensive)	algorithm	that	does	not	require	high-volume	communication	between	VMs,	they	
argue	that	BLAST	is	very	well	suited	for	implementation	in	cloud	environments.		
	
AzureBlast	uses	thousands	of	concurrently	operating	instances	of	BLAST	within	Azure’s	
Cloud	Services	framework	(refer	to	Section	II	for	a	definition	of	Azure’s	various	mechanisms	and	
services),	each	handling	subsections	of	a	larger	bio-sequence.	The	mechanisms	used	within	
AzureBlast	include:	
	
Job	Submission	Portal:	this	is	the	front-end	portal	from	which	users	can	submit	jobs,	which	
consist	of	the	two	bio-sequences	to	be	compared,	along	with	any	parameters	researchers	
wish	to	set.	
	
Job	Scheduler:	this	mechanism	accepts	jobs	submitted	through	the	front-end	portal,	and	
uses	rules	to	schedule	computation	tasks	via	the	dispatch	queue.	
	
Tasks:	the	job	scheduler	mechanism	breaks	down	the	job	(i.e.	the	two	bio-sequences	in	full)	
into	smaller	segments.	Each	individual	segment	comparison	is	considered	a	“task.”	
Thousands	of	tasks	are	carried	out	by	worker	role	instances	simultaneously	to	expedite	the	
process,	and	the	resulting	data	is	combined	at	the	end	to	provide	the	final	result.	
	
Worker Role Instances: instances of applications within the Azure framework. These
instances automatically “poll” the dispatch queue to find work, then store the resulting data
from each task in an Azure storage mechanism (in this case, a Blob). Workers poll the
dispatch queue once again after completing a task, repeating the process until no tasks
remain.
Dispatch Queue: using the Queues feature of Azure, the dispatch queue mechanism
provides an asynchronous message delivery system between compute roles. The dispatch
queue is filled with “tasks” by the job scheduler, and these tasks are retrieved by worker role
instances seeking work.
	
From	Liu,	Jackson,	and	Barga	(2010).	
	
In	addition	to	these	mechanisms,	configured	from	Azure’s	native	features,	Liu,	Jackson,	and	
Barga	(2010)	added	their	own	abstraction	layer	(referred	to	as	a	“Task	Parallel	Library”)	to	
coordinate	the	simultaneous	operations	of	each	worker	role	instance.	The	Task	Parallel	Library	
uses	a	“fork”	function	to	split	bio-sequences	into	manageable	segments,	and	a	“join”	function	
to	string	the	resulting	data	back	together.		
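
The fork, dispatch, poll, and join pattern described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AzureBlast implementation: an in-process queue stands in for the Azure dispatch queue, a dictionary stands in for Blob storage, threads stand in for worker role instances, and the function names (fork, compare, worker, join, run_job) are hypothetical. The compare function is a toy position-by-position match count, not the BLAST algorithm.

```python
import queue
import threading


def fork(query, reference, segment_length):
    """Split a job (two sequences) into indexed segment-comparison tasks."""
    tasks = []
    for start in range(0, len(query), segment_length):
        end = start + segment_length
        tasks.append((start, query[start:end], reference[start:end]))
    return tasks


def compare(segment_a, segment_b):
    """Toy stand-in for BLAST: count positions where the two segments agree."""
    return sum(1 for a, b in zip(segment_a, segment_b) if a == b)


def worker(dispatch_queue, results, lock):
    """Poll the dispatch queue until it is empty, storing one result per task."""
    while True:
        try:
            start, seg_a, seg_b = dispatch_queue.get_nowait()
        except queue.Empty:
            return  # no tasks remain
        score = compare(seg_a, seg_b)
        with lock:
            results[start] = score  # the dict is a stand-in for Blob storage


def join(results):
    """Combine the per-task results into the final answer."""
    return sum(results.values())


def run_job(query, reference, segment_length=4, worker_count=3):
    """Schedule all tasks, run the workers, and join their results."""
    dispatch_queue = queue.Queue()
    for task in fork(query, reference, segment_length):
        dispatch_queue.put(task)
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(dispatch_queue, results, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return join(results)
```

Because each task is independent, adding workers in this sketch only changes how quickly the queue drains, which is the property that makes this style of workload a natural fit for the architecture described above.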
	
	After	extensive	testing,	they	found	that	there	is	roughly	a	linear	relationship	between	the	
number	of	worker	instances	in	use,	and	the	number	of	sequences	processed	per	minute.	They	
also	found	that	the	dollar	cost	of	each	sequence	decreases	until	the	number	of	input	sequences	
reaches	about	100.	AzureBlast	processes	about	the	same	number	of	sequences	per	dollar	at	
any	scale	larger	than	this.	While	they	note	that	the	messaging	API	within	Azure	is	“inconvenient	
and	unintuitive	for	developing	parallel	applications	for	science”	(stating	that	researchers	must	
be	careful	when	coordinating	more	than	one	queue	and	handling	errors/exceptions),	their	
overall	conclusion	is	that	Azure	is	well	suited	for	deploying	instances	of	BLAST.	
VI	–	Nimbus	and	CyberGIS	
	
CyberGIS	is	a	collaborative	project	headed	by	Professor	Shaowen	Wang	at	the	University	of	
Illinois,	Urbana-Champaign.	CyberGIS	uses	National	Science	Foundation	XSEDE	supercomputers,	
and	OpenGrid	computing	resources	to	perform	geo-spatial	computation	as	a	service.	CyberGIS	
applications	are	available	for	users	at	no	cost	through	a	web	portal,	referred	to	as	the	CyberGIS	
Gateway	–	http://sandbox.cigi.illinois.edu/home/.		
	
CyberGIS	employs	a	three-tiered	architecture:	(1)	the	web	portal,	(2)	“GISolve”	middleware,	
which	serves	as	a	bridge	between	the	web	portal	and	physical	IT	resources,	and	(3)	physical	
computing	architecture.	Examples	of	applications	hosted	on	CyberGIS	include	BioScope,	a	
biofuel	supply	chain	optimization	application,	TauDEM,	which	extracts	hydrological	information	
from	topography,	and	FluMapper,	which	analyzes	large-scale,	location-based	social	media	data.	
	
CyberGIS	remotely	uses	physical	IT	resources	from	two	sources	–	resources	owned	by	
OpenGrid,	and	supercomputers	owned	by	the	National	Science	Foundation.	Riteau	et	al.	(2014)
identify	auto-scaling	as	a	major	concern	for	CyberGIS,	for	two	reasons.	(1)	There	is	highly	
variable	demand	on	the	CyberGIS	system,	due	to	its	accessibility	and	multi-user	nature.	For	
example,	if	a	professor	decides	to	use	CyberGIS	for	complex	computation	in	an	assignment	for	
their	students,	demand	spikes,	as	there	are	now	many	students	placing	high	demand	on	the	
system	all	at	once.	(2)	Many	researchers	take	an	exploratory	approach	to	their	research	with	
CyberGIS	–	it	is	common	for	researchers	to	run	complex	simulations	many	times	with	different	
parameters.	To	implement	auto-scaling	within	the	CyberGIS	architecture,	Riteau	et	al.	used	
Nimbus:	Cloud	Computing	for	Science,	as	it	is	a	free,	open-source	private	IaaS	framework	
specifically	tailored	for	use	with	scientific	computation.	
	
The CyberGIS Gateway originally used static VM clusters to run the various spatial
regression services offered by CyberGIS applications. Using Nimbus, Riteau et al. (2014)
implemented	a	queuing	load	balancer,	and	a	dynamic	scaling	mechanism	(referred	to	as	the	
“Decision	Engine”)	that	provisioned	and	terminated	VM	instances	on-demand	in	response	to	
the	queuing	load	balancer.		
	
Queuing	Load	Balancer:	this	mechanism	distributes	HTTP	requests	from	the	CyberGIS	
gateway	among	a	pool	of	back-end	servers,	queues	requests	when	all	servers	are	in	use,	
and	provides	metrics	on	requests	and	workloads.	
	
Decision Engine: uses the API native to Nimbus to request changes in the number of
deployed	VM	instances.	When	new	instances	are	provisioned,	this	mechanism	integrates	
them	into	the	pool	of	back-end	servers	used	by	the	queuing	load	balancer	to	distribute	
workloads.	
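
The interaction between the two mechanisms can be illustrated with a short Python sketch. The scaling policy shown here is an assumption made for illustration (the source does not state the rule the Decision Engine applies), and the function names, the per-instance capacity parameter, and the minimum and maximum pool sizes are all hypothetical. The sketch takes the metrics a queuing load balancer might report and returns the change in pool size a decision engine might request.

```python
import math


def desired_instances(queued, in_progress, per_instance_capacity,
                      min_instances=1, max_instances=10):
    """Return the target number of back-end VMs for the current load.

    Hypothetical policy: enough instances to serve every queued and
    in-progress request, clamped to a fixed range.
    """
    demand = queued + in_progress
    needed = math.ceil(demand / per_instance_capacity)
    return max(min_instances, min(max_instances, needed))


def scaling_action(current, queued, in_progress, per_instance_capacity):
    """Translate the target pool size into a provision/terminate decision."""
    target = desired_instances(queued, in_progress, per_instance_capacity)
    if target > current:
        return ("provision", target - current)
    if target < current:
        return ("terminate", current - target)
    return ("hold", 0)
```

For example, with two instances running, ten requests queued, four in progress, and an assumed capacity of two requests per instance, scaling_action(2, 10, 4, 2) asks for five more instances.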
	
Riteau	et	al.	(2014)	compared	average	response	times	for	requests	made	through	the	
CyberGIS	Gateway	from	the	original,	static	VM	configuration,	and	the	dynamically	scaled,	
Nimbus	configuration.	They	found	that	while	response	time	increased	in	proportion	to	the	
number	of	simultaneous	requests	submitted	to	the	static	configuration,	response	time	
remained stable when they increased the number of simultaneous requests submitted to the
dynamic	configuration.	They	tested	these	configurations	for	both	small	and	large,	complex	file	
submissions	–	the	size	of	submitted	jobs	did	not	affect	response	time.	They	conclude	that	
Nimbus	provides	a	viable	auto-scaling	solution	for	their	existing	framework	–	response	time	for	
many	simultaneous	requests	was	reduced	from	150	seconds	to	50	seconds	for	large	files,	and	
from	over	120	seconds	to	29.5	seconds	for	small	files.	
VII	–	Scientific	Data	sets	in	the	Cloud
In	addition	to	cloud-based	computation,	there	are	also	many	public	data	sets	hosted	on	
cloud	services	for	use	by	researchers	and	scientists.	These	cloud	providers	not	only	enable	
researchers	to	perform	complex	operations	on	large-scale	data,	they	also	provide	them	with	a	
way	to	store	and	manage	these	massive	data	sets	that	are	simply	too	large	for	local	storage.	
Two notable examples of these data set storage services are (1) Amazon Web Services (AWS)
Public Data Sets, and (2) the Open Science Data Cloud.
	
Amazon	AWS	Public	Data	Sets	
	
Amazon	hosts	several	large-scale	data	sets	on	their	own	servers,	which	can	be	quickly	and	
easily	processed	with	the	applications	of	researchers’	choice	on	Amazon’s	EC2	cloud	services.	
This	is	extremely	beneficial	for	researchers,	as	they	no	longer	need	to	spend	time	and	resources	
on	locating	and	migrating	cumbersome	files	(Amazon	Web	Services,	Inc.,	2015).	Some	examples	
of	these	data	sets	include:		
	
1000	Genomes	Project:	an	international	project	cataloguing	human	genetic	variation	–	this	
contains	data	sequenced	from	over	2,500	individuals.	
	
CCAFS-Climate	Data:	this	is	a	climate	data	set	of	open-access	climate	projections,	primarily	
targeted	at	researchers	seeking	to	assess	the	impact	of	climate	change	on	agriculture.	
	
Denisova Genome: this is a “high-coverage” genome of a Denisovan (a sister species to
Neanderthals),	one	of	the	most	closely	related	extinct	species	to	humans.	
	
Landsat on AWS: a collaboration between the USGS and NASA, containing continuous
satellite	imagery	of	Earth’s	entire	surface	from	1972	to	present	–	images	are	updated	
consistently.	
	
Petroleum	Public	Data	Set:	public	domain	data	from	a	variety	of	petroleum	organizations	
worldwide.	
	
From	AWS	Public	Data	Sets	(Amazon	Web	Services	Inc.,	2015).	
	
Open	Science	Data	Cloud	
	
The	Open	Science	Data	Cloud	(OSDC)	is	a	“petabyte-scale”	resource	for	scientists	to	store,	
share,	and	analyze	large-scale	data	sets.	It	hosts	material	similar	to	AWS	Public	Data	Sets,	as	
well	as	generalized	computing	resources	–	one	for	general	computation,	and	another	for
restricted	computation	(e.g.	a	project	using	sensitive	medical	data).	Like	Nimbus,	this	resource	
is	directly	geared	toward	researchers	and	scientists.	However,	researchers	must	first	be	
approved by the OSDC after submitting a proposal before they are allocated resources, rather
than being provided free and open access to configure their own resources (Open Science Data
Cloud,	2015).	Examples	of	these	data	sets	include:	
	
City	of	Chicago	Public	Data	Sets:	a	large	set	of	social	data	from	the	City	of	Chicago,	in	both	
tabular	and	“raw”	form.	
	
Complete	Genomics	Public	Data:	entire	human	gene	sequences	provided	by	Complete	
Genomics.	These	include	samples	from	disease-free	individuals	and	individuals	with	cancer.	
	
Earth	Observing-1	Mission:	80.5	terabytes	of	data	from	NASA’s	Earth	Observing-1	satellite	
mission.	
	
Large-Scale	Data	Analysis	and	Visualization	Symposium	Data:	data	from	a	global	climate	
dynamics	simulation	run	on	Oak	Ridge	National	Laboratory’s	Titan	supercomputer.	
	
Sloan	Digital	Sky	Survey:	consists	of	“a	series	of	three	interlocking	images	and	
spectroscopic	surveys,	carried	out	over	an	eight-year	period”	from	Apache	Point	
Observatory	in	New	Mexico.	
	
From	OSDC:	Open	Science	Data	Cloud	(Open	Science	Data	Cloud,	2015).	
	
These	cloud-based,	public	data	sets	will	not	only	assist	researchers	employed	at	dedicated	
institutions	and	universities	–	they	may	create	opportunities	for	enterprises	to	leverage	public	
data for business purposes. This data could also facilitate international collaboration. If a group of
specialists in disparate locations decides to collaborate on a project, geographic or
political barriers to porting this data are removed, as anyone with a strong Internet connection
can access it.
VIII	–	Current	Research	
	
As	discussed	in	Section	IV,	there	are	certain	requirements	common	for	scientific	
applications	that	do	not	exist	for	most	other	classes	of	applications.	As	the	use	of	cloud	
computing	for	scientific	applications	has	increased	in	the	last	few	years,	researchers	have	
sought to develop mechanisms to make cloud computing environments friendlier to scientific
applications. Two examples are discussed below: the Cloud Resource Broker
(CLOUDRB),	which	manages	resources	and	jobs	according	to	user-specified	deadlines,	and	a	
“transparent	elastic	disk	throughput”	mechanism,	which	works	to	optimize	cloud	storage	
provisions	for	scientific	applications.	
	
CLOUDRB	
	
Scientific	applications	often	require	large	amounts	of	data	to	be	processed	by	a	certain	
deadline	–	cloud	environments	designated	to	host	scientific	applications	(e.g.	a	deployment	of	
Nimbus)	may	have	many	applications	running	simultaneously,	each	with	its	own	deadline.	
Somasundaram and Govindarajan (2013) propose a Cloud Resource Broker (CLOUDRB) to
prioritize	resource	provisioning	for	certain	jobs	over	others	in	an	open-source	science	cloud	
environment,	given	a	set	of	deadlines.		
	
This	mechanism	includes	components	that	accept	jobs	with	deadlines	assigned	by	users,	
prioritize	these	jobs	according	to	their	deadline	and	computation	requirements	(i.e.	the	
complexity	of	a	job),	and	dynamically	assign	resources	to	complete	jobs	within	these	deadlines.	
Somasundaram and Govindarajan (2013) compared CLOUDRB with three commonly
used resource provisioning mechanisms. Their experiment involved running 200, 400, 600, 800,
and 1000 simultaneous instances of an ant pheromone comparison algorithm to compare the
impact of these provisioning mechanisms on completion time and job rejection rate. They
found that their mechanism completed jobs at or before deadline 1.3-1.77 times more often
than the three alternative mechanisms, and rejected jobs 15%-35% less often. This is significant, in
that	CLOUDRB	will	allow	researchers	to	more	accurately	predict	when	results	will	be	available	
from	complex	operations.	
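
The general idea of deadline-aware provisioning can be illustrated with a small Python sketch. It is not the CLOUDRB algorithm: it substitutes a plain earliest-deadline-first rule over identical resource slots, and the Job fields, the schedule function, and the notion of a "slot" are hypothetical names introduced for this example. A job is rejected when no slot could finish it by its deadline.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    runtime: float   # estimated run time, in seconds
    deadline: float  # seconds from now by which the job must finish


def schedule(jobs, slot_count):
    """Place jobs on identical slots in earliest-deadline-first order.

    Returns (accepted, rejected): accepted maps each job name to its
    finish time, and rejected lists jobs that would miss their deadline.
    """
    free_at = [0.0] * slot_count  # min-heap of times at which slots free up
    heapq.heapify(free_at)
    accepted, rejected = {}, []
    for job in sorted(jobs, key=lambda j: j.deadline):
        start = heapq.heappop(free_at)  # earliest available slot
        finish = start + job.runtime
        if finish <= job.deadline:
            accepted[job.name] = finish
            heapq.heappush(free_at, finish)
        else:
            rejected.append(job.name)
            heapq.heappush(free_at, start)  # slot stays free for later jobs
    return accepted, rejected
```

Completion rate and rejection rate, the two measures reported above, fall directly out of the accepted and rejected collections this sketch returns.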
	
Transparent	Elastic	Disk	Throughput	Mechanism	
	
Nicolae,	Riteau,	and	Keahey	(2015)	argue	that	while	the	dynamic	provisioning	of	computing	
resources	in	cloud	environments	has	been	studied	heavily,	dynamic	storage	allocation	has	
received	comparatively	little	attention.	While	users	can	expect	to	be	dynamically	allocated	
computing	resources	(e.g.	RAM	and	CPUs)	in	response	to	demand,	they	are	still	required	to	
manually	allocate	storage	resources.	Also,	users	must	either	choose	cheap,	low	I/O	throughput	
disks	or	expensive,	high	I/O	throughput	disks,	without	the	option	of	changing	storage	disk	type	
in	response	to	throughput	demand.	In	scientific	computing,	this	becomes	an	issue	because	it	is	
quite	common	for	applications	to	iteratively	demand	periods	of	very	high	storage	throughput	
(when	data	exits	an	application	in	bulk)	followed	by	periods	of	very	low	storage	throughput	
(when	the	actual	computation	is	occurring).
Nicolae,	Riteau,	and	Keahey	(2015)	designed	the	transparent	elastic	disk	throughput	
mechanism	to	dynamically	allocate	small,	high	throughput	storage	disks	to	supplement	
cheaper,	low	throughput	storage	disks	that	are	manually	provisioned	by	the	user	and	persist	
throughout	computation.	During	periods	of	high	throughput	demand,	this	dynamic	storage	
provisioning	mechanism	enables	the	user	to	avoid	the	higher	cost	associated	with	self-
provisioning	high	throughput	disks,	while	increasing	throughput	to	levels	associated	with	
higher-throughput	disks	when	necessary.		
	
Testing	this	mechanism	with	an	atmospheric	phenomena	simulator,	they	found	that	this	
dynamic	storage	allocation	increased	cost	by	only	3.3%,	and	reduced	completion	time	from	
1,471	seconds	to	1,231	seconds.	By	contrast,	statically	provisioning	high-throughput	disks	for	
use	through	the	entire	run	increased	costs	by	23%,	but	only	reduced	completion	time	from	
1,471	seconds	to	1,190	seconds	–	a	marginal	improvement	in	comparison	to	their	solution.	This	
mechanism	will	enable	researchers	and	scientists	to	more	efficiently	take	advantage	of	cloud	
computing	resources.	
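
The trade-off in these figures is easier to see when each option is expressed as run time saved relative to added cost. The short Python calculation below uses only the numbers reported above; the ratio it computes (percent of run time saved per percent of added cost) is a summary measure introduced here for illustration and does not come from Nicolae, Riteau, and Keahey (2015).

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Figures as reported above for the atmospheric simulator run.
BASELINE_SECONDS = 1471.0
ELASTIC = {"seconds": 1231.0, "cost_increase_pct": 3.3}
STATIC = {"seconds": 1190.0, "cost_increase_pct": 23.0}


def time_saved_per_cost_point(option):
    """Percent of run time saved per one percent of added cost."""
    saved = percent_reduction(BASELINE_SECONDS, option["seconds"])
    return saved / option["cost_increase_pct"]


elastic_saved = percent_reduction(BASELINE_SECONDS, ELASTIC["seconds"])  # about 16.3
static_saved = percent_reduction(BASELINE_SECONDS, STATIC["seconds"])    # about 19.1
```

On these figures the elastic mechanism saves roughly 4.9 percent of run time per percent of added cost, against roughly 0.8 for statically provisioned high-throughput disks.
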
IX	–	Conclusions	
	
A review of the literature and of cloud services marketed to researchers and scientists
shows that both cloud computing resources for scientific applications and the use of
these resources have proliferated in recent years. These services include
cloud-based	data,	storage,	and	computation,	which	can	be	used	in	a	variety	of	ways	in	the	work	
of	scientists	from	a	variety	of	knowledge	domains.	For	fields	where	many	teams	and	individuals	
might	require	complex	modeling	and	simulation	software	for	large	data	sets,	this	trend	is	likely	
to	continue.	If	Microsoft	and	Amazon	wish	to	continue	to	market	their	services	to	scientists,	
they will need to improve inter-VM throughput, as discussed in Tudoran et al. (2012).
Otherwise,	they	may	be	edged	out	by	emerging	competitors	that	leverage	the	use	of	open-
source	cloud	frameworks.
References	
	
Altschul, S. F. et al. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3):
403-410.	Retrieved	from	http://www.blastalgorithm.com/	
	
Amazon	Web	Services,	Inc.	(2015).	AWS	public	data	sets.	Retrieved	from	
http://aws.amazon.com/public-data-sets/	
	
Keahey, K., & Freeman, T. (2008). Science clouds: Early experiences in cloud computing for
scientific applications. Retrieved from
http://www.nimbusproject.org/files/Science-Clouds-CCA08.pdf
	
Liu, W., Jackson, J., & Barga, R. (2010). AzureBlast: A case study of developing science
applications on the cloud. Proceedings from the 1st Workshop on Scientific Cloud
Computing: Indianapolis, Indiana.
	
Microsoft Corporation. (2013). Microsoft Azure for research overview. Retrieved from
http://research.microsoft.com/en-us/projects/azure/windows-azure-for-research-overview.pdf
	
Nicolae,	B.,	Riteau,	P.,	&	Keahey,	K.	(2015).	Towards	transparent	throughput	elasticity	for	IaaS	
cloud	storage:	Exploring	the	benefits	of	adaptive	block-level	caching.	International	Journal	
of	Distributed	Systems	and	Technologies	(IJDST):	6(4).	
	
Open	Science	Data	Cloud.	(2015).	OSDC:	Open	science	data	cloud.	Retrieved	from	
https://www.opensciencedatacloud.org/	
	
Riteau, P. et al. (2014). A cloud computing approach to on-demand and scalable CyberGIS
analytics. Proceedings from the 5th Workshop on Scientific Cloud Computing (ScienceCloud
2014): Vancouver, Canada.
	
Sempolinski, P., & Thain, D. (2010). A comparison and critique of Eucalyptus, OpenNebula, and
Nimbus. Proceedings from the 2nd IEEE International Conference on Cloud Computing
Technology and Science: Indianapolis, Indiana.
Simons-Legaard, E., Legaard, K., & Weiskittel, A. (2015). Predicting aboveground biomass with
LANDIS-II: A global and temporal analysis of parameter sensitivity. Ecological Modelling: 313,
325-332.
	
Somasundaram,	T.	S.,	&	Govindarajam,	K.	(2013).	CLOUDRB:	A	framework	for	scheduling	and	
managing	High-Performance	Computing	(HPC)	applications	in	science	cloud.	Future	
Generation	Computer	Systems:	34,	47-65.	
	
Tudoran, R. et al. (2012). A performance evaluation of Azure and Nimbus clouds for scientific
applications. Proceedings from the 2nd International Workshop on Cloud Computing
Platforms: Bern, Switzerland.
	
Vecchiola, C., Pandey, S., & Buyya, R. (2009). High-performance cloud computing: A view of
scientific applications. Proceedings from the 10th International Symposium on
Pervasive Systems, Algorithms, and Networks: Melbourne, Australia.
