Data Curation
Why should we care?
Yasmin AlNoamany
Old Dominion University
Web Science and Digital Libraries Group
ws-dl.cs.odu.edu
@yasmina_anwar @WebSciDL
1
Presented for the CLIR Postdoctoral Fellowship in Data Curation at Vanderbilt University
About me
2
Academic degrees
3
Yasmin AlNoamany
Ph.D. Candidate at ODU
yasmin@cs.odu.edu
• Bachelor's degree in Computer Science
• Master's degree in Computer Science
• Doctor of Philosophy in Computer Science
Old Dominion University (2011-2016)
• Research Assistant: integrating the past with the present, “Storytelling for Summarizing Collections in Web Archives”
• Teaching Assistant
4
Figure: Archived collections, Storytelling services, Archived enriched stories
Internet Archive (summer 2014-fall 2014)
• Log analysis
• Tools for managing seed URIs
5
0.11.160.135 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20070519015308im_/http://www.jcdl.org/images/jcdl2007-edie.jpg HTTP/1.1" 200 2137 "-" "Mozilla/5.0"
0.11.160.135 [02/Feb/2012:00:01:03] "GET http://staticweb.archive.org/images/toolbar/wayback-toolbar-logo.png HTTP/1.1" 200 3700 "-" "Mozilla/5.0"
0.151.147.108 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20100102003557/about:blank HTTP/1.1" 302 0 "www.xx.com" "Mozilla/4.0"
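The log entries above follow a layout close to the Apache combined log format: client IP, timestamp, request line, status code, response size, referrer, and user agent. The following minimal Python sketch assumes exactly that field order (inferred from these three sample lines, not from any documented Internet Archive schema) and parses each entry. For Wayback URLs it also splits out the 14-digit archival timestamp, the optional rewrite modifier such as im_, and the original URI. The names LOG_RE, WAYBACK_RE, and parse_entry are hypothetical.

```python
import re
from typing import Optional

# Assumed layout, inferred from the sample lines (not an official IA schema):
# IP [timestamp] "METHOD URL PROTOCOL" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Wayback URLs embed a 14-digit timestamp (YYYYMMDDhhmmss), an optional
# two-letter modifier such as "im_", and then the original URI.
WAYBACK_RE = re.compile(
    r'/web/(?P<datetime>\d{14})(?P<modifier>[a-z]{2}_)?/(?P<original>.+)$'
)


def parse_entry(line: str) -> Optional[dict]:
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # Only Wayback playback URLs carry an archival timestamp.
    wayback = WAYBACK_RE.search(entry["url"])
    if wayback:
        entry.update(wayback.groupdict())
    return entry


if __name__ == "__main__":
    sample = ('0.11.160.135 [02/Feb/2012:00:01:03] "GET '
              'http://web.archive.org/web/20070519015308im_/'
              'http://www.jcdl.org/images/jcdl2007-edie.jpg '
              'HTTP/1.1" 200 2137 "-" "Mozilla/5.0"')
    entry = parse_entry(sample)
    print(entry["status"], entry["datetime"], entry["original"])
```

Counting requests by status code or by archival year is then a matter of iterating over a log file and tallying the parsed fields.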
Personal
• Women in Tech communities: @anitaborg, @systers, @arabwic
• Photography
• A mom to this adorable 7-year-old
6
Awards and publications
• Best Teaching Award
• Best Student Paper Award
• 9 papers, 3 of them in journals
7
Data Curation
8
Why should we care?
9
Data Management
10
11
Data Management
How I got the logs from the Internet Archive (IA)
12
Source: http://www.tamr.com/real-data-scientists-enterprise/
Even if we save the data, how will it be shared and re-used?
13
Metadata is important
14
The call for a revolution in Egypt
• It all started on Facebook
15
Multiple initiatives for documenting the Egyptian Revolution
16
Several studies and books about the Egyptian Revolution
17
These studies and books cited these sites
18
They do not exist anymore!
19
Data preservation is important for posterity
• A year after the Egyptian Revolution, 11% of the social media documentation is gone.
Source: http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
20
Data Curation is important for scholarly research
• Managing your research data saves time
• Universities and other research organizations invest very large sums of money in research activities
• Digital data is inherently prone to loss
• Future access to valuable digital assets depends upon curation/preservation actions taken today
• Funding agency requirements
• Research data should be shared and publicly accessible:
• Increase the impact of your research
• Make attribution easy
• Answer the call for accountability and transparency
• Permit others to replicate the findings of a study
• Scholarly communication chain: connecting data to publications
21
What is Data Curation?
• “Data curation is the active and ongoing management of research data through its lifecycle of interest and usefulness to scholarship, science, and education.” – Carole Palmer, UIUC GSLIS
• Data management
• Adding value to data
• Data preservation for later re-use
22
DCC Curation life cycle
Source: http://www.dcc.ac.uk/resources/curation-lifecycle-model
23
CONCEPTUALIZE
Step-by-step instructions and templates for creating, publishing, and sharing data management plans that satisfy funding agency mandates
24
CREATE OR RECEIVE
25
A collaborative working space and data-sharing platform
APPRAISE & SELECT
26
Identification, Validation, Characterization
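Identification and validation are often done by comparing a file's first bytes (its "magic number") against known format signatures, then checking that the result agrees with what the file claims to be; characterization records properties such as size and detected format. The following minimal Python sketch illustrates that idea with a tiny, hypothetical signature table; production tools such as DROID (backed by the PRONOM registry) or JHOVE are far more thorough. The names SIGNATURES, Characterization, identify, and characterize are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# A tiny, illustrative signature table (magic bytes -> format label).
# Real format registries hold thousands of far more detailed entries.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"PK\x03\x04": "zip",
}


@dataclass
class Characterization:
    path: str
    size_bytes: int
    detected_format: Optional[str]  # None when no signature matched
    extension_matches: bool         # simple validation: does the extension agree?


def identify(header: bytes) -> Optional[str]:
    """Identify a format by comparing the file header with known magic bytes."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    return None


def characterize(path: Path) -> Characterization:
    """Identify, validate (against the extension), and record basic properties."""
    with path.open("rb") as f:
        header = f.read(16)  # long enough for every signature in the table
    fmt = identify(header)
    ext = path.suffix.lower().lstrip(".")
    if ext == "jpeg":
        ext = "jpg"  # treat both common JPEG extensions alike
    return Characterization(
        path=str(path),
        size_bytes=path.stat().st_size,
        detected_format=fmt,
        extension_matches=(fmt is not None and fmt == ext),
    )
```

A file named report.pdf whose bytes do not start with %PDF- would come back with detected_format None and extension_matches False, flagging it for review before ingest.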
INGEST
27
• Handle a wide variety of transfer processes
• Ensure the availability of the research data across institutions and publishers and keep it discoverable
PRESERVATION ACTION
28
• Extract metadata in XML format
• Create checksums (hashes) for the data objects
• Facilitate data discovery and re-use
• Raise interest in your research
• Facilitate preservation
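A checksum (hash) recorded when data is preserved lets a repository later detect silent corruption by recomputing it and comparing the two values. The following minimal Python sketch computes a SHA-256 checksum with the standard hashlib module and stores it, with a few descriptive fields, in a small XML record built with xml.etree.ElementTree. The element names (record, title, creator, filename, size, checksum) and the function names are hypothetical; a real repository would use a standard schema such as Dublin Core or PREMIS.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def metadata_record(path: Path, creator: str, title: str) -> str:
    """Build a small XML metadata record (hypothetical element names) with a fixity value."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "creator").text = creator
    ET.SubElement(root, "filename").text = path.name
    ET.SubElement(root, "size").text = str(path.stat().st_size)
    ET.SubElement(root, "checksum", algorithm="SHA-256").text = sha256_of(path)
    return ET.tostring(root, encoding="unicode")


def verify(path: Path, expected: str) -> bool:
    """Fixity check: recompute the checksum and compare it with the stored value."""
    return sha256_of(path) == expected
```

Running verify on a schedule, and again after every migration or transfer, is what turns the stored checksum into an actual preservation safeguard.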
STORE
29
• Get credit for your data and build your reputation.
• Your data is discoverable and can be attributed to you.
• Other researchers can find data associated with a publication and explore new ways to use it.
ACCESS, USE & REUSE
30
TRANSFORM
31
Migrating the data into another format
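Format migration rewrites data into a new format without losing content, and a round trip back to the original format is one simple way to check that nothing was lost. The following minimal Python sketch, assuming plain tabular CSV input, migrates it to JSON and back using only the standard csv and json modules; csv_to_json and json_to_csv are hypothetical names.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Migrate CSV to JSON: keep the column order and one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    return json.dumps({"fields": reader.fieldnames, "rows": rows}, indent=2)


def json_to_csv(json_text: str) -> str:
    """Reverse migration, used here only to verify that the transform is lossless."""
    data = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=data["fields"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(data["rows"])
    return out.getvalue()


if __name__ == "__main__":
    source = "id,city\n1,Cairo\n2,Alexandria\n"
    migrated = csv_to_json(source)
    print("lossless:", json_to_csv(migrated) == source)
```

Values are deliberately kept as strings: converting a value such as "007" to the number 7 during migration would silently change the data.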
Summary
• What is Data Curation?
  • Annotation
  • Management
  • Validation
  • Preservation
  • Sharing
  • Access and Re-use
  • Authentication
32
• Why do we need Data Curation?
  • Long-term access
  • Re-use
  • Interoperability
  • Reproducibility
  • Cost-effectiveness
  • Time savings
  • Credibility
  • Accountability
“Data curation systems should be integrated with the active research phase”
33
Yasmin AlNoamany
Old Dominion University
Web Science and Digital Libraries Group
ws-dl.cs.odu.edu
http://www.cs.odu.edu/~yasmin/
https://www.linkedin.com/in/yasminalnoamany
https://github.com/yasmina85/
@yasmina_anwar @WebSciDL
Backup Slides
34
Data & Complexity
• Research problems are increasingly interdisciplinary and complex
• Collaboration requires open sharing of data
• Data are highly heterogeneous and largely incompatible in their native forms
• The semantics and contexts within which data are gathered and interpreted are important to preserve
35
36
(Comic from The Official Dilbert Store)
37
http://www.christianitytoday.com/edstetzer/2015/february/3-ways-social-media-benefits-church-leaders.html
