IASSIST 2011 presentation: Tracking Data Reuse Motivations, Methods, and Obstacles

Tracking Data Reuse
Motivations, Methods, and Obstacles

Heather Piwowar
DataONE postdoc with NESCent and Dryad
@researchremix

IASSIST2011
#iassist

http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm

http://www.flickr.com/photos/jsmjr/62443357/

http://www.flickr.com/photos/camilleharrington/3587294608/

http://www.flickr.com/photos/rkuhnau/3318245976/

http://www.flickr.com/photos/conformpdx/1796399674/

http://www.flickr.com/photos/rkuhnau/3317418699/

http://www.flickr.com/photos/zemlinki/261617721/

http://www.flickr.com/photos/tracenmatt/3020786491/

http://www.flickr.com/photos/the-o/2078239333/

?
http://www.flickr.com/photos/ryanr/142455033/

http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png

http://www.flickr.com/photos/archeon/2941655917/

In 2009, 116 articles cited ORNL DAAC data.

Finding these articles took 70-80 hours

across at least 12 resources
all chosen from a deep understanding
of this specific research domain

then the full text of all the hits was
manually reviewed
Valerie Enriquez interview with James Kidder
http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data

How to identify Dataset Reuse in the published literature

Start: publicly archived dataset

Dataset has an identifier? (DOI, URL, accession #)
-> with dataset unique ID, search in reference sections of all papers
-> with dataset unique ID, search in full text of all papers

Dataset submission record has submitter name or dataset title?
-> with (submitter surname AND repository name), and also (dataset title AND repository name), search in full text of all papers
-> sort hits to disambiguate reuse from submission

Dataset submission record mentions data collection article publication? (journal, volume, page, etc.)
-> with (first author surname AND repository name), search in full text of all papers
-> gather papers that cite the data collection paper
-> sort hits to disambiguate data reuse from the collection article's other citation contexts

Dataset ID in reference sections:
This citation pattern (dataset DOI/ID in references section) is used almost exclusively for dataset reuse.
Manual disambiguation not required: can be automated pending API support.
Does not require access to full text.
This citation pattern is currently rare.
This citation pattern is difficult to track with existing tool limitations.
DOI/ID reference search is possible in full-text portals like PubMed Central and HighWire Press, however portal coverage is limited and search is not restricted to the references section.
DOI/ID search works in Google Scholar, but scope is poorly defined and results are messy.
DOI/ID search is not supported by ISI Web of Science or Scopus.

Dataset ID, names, and titles in full text:
This citation pattern (accession numbers in full text) is very common in some subdisciplines, so probably finds most reuses.
IDs are difficult to unambiguously identify in full text unless they have a unique pattern (DOI) or an unusual prefix or suffix.
Names and titles are messy identifiers.
Disambiguation is time consuming.
Requires ability to query full text across all literature that may contain reuse.
Requires access to full text of search hits for sorting.

Citations to the data collection paper:
This citation pattern (citation to data creation paper) is very common in some subdisciplines, so probably finds most reuses.
Disambiguation is time consuming: most citations are not in the context of reuse.
Citation history export is time consuming: automation not supported.
Link to data collection paper is often missing from the dataset submission record, especially when dataset submission predates article publication.
Only finds citations indexed by citation databases.
Requires access to full text of search hits for sorting.

This flow still misses attributions embedded in supplementary information, reuses attributed through a query description, etc.

Heather Piwowar, v1.0, CC-BY
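The search branches of the flow above could be sketched as a small query builder. This is only an illustration under assumptions: the `SubmissionRecord` fields, the `build_queries` helper, and the quoted boolean syntax are hypothetical and not taken from the slides, and real search portals may expect a different query syntax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubmissionRecord:
    """Hypothetical view of a dataset submission record (field names are assumptions)."""
    repository: str
    dataset_id: Optional[str] = None            # DOI, URL, or accession number
    submitter_surname: Optional[str] = None
    dataset_title: Optional[str] = None
    first_author_surname: Optional[str] = None  # from the data collection article


def quote(term: str) -> str:
    """Wrap a term in double quotes so it is searched as a phrase."""
    return '"' + term.replace('"', "") + '"'


def build_queries(record: SubmissionRecord) -> List[str]:
    """Return one full-text query string per branch of the flow that applies."""
    repo = quote(record.repository)
    queries = []
    if record.dataset_id:
        # Branch: dataset has an identifier
        queries.append(quote(record.dataset_id))
    if record.submitter_surname:
        # Branch: (submitter surname AND repository name)
        queries.append(f"({quote(record.submitter_surname)} AND {repo})")
    if record.dataset_title:
        # Branch: (dataset title AND repository name)
        queries.append(f"({quote(record.dataset_title)} AND {repo})")
    if record.first_author_surname:
        # Branch: (first author surname AND repository name)
        queries.append(f"({quote(record.first_author_surname)} AND {repo})")
    return queries
```

The hits returned for each query would still need the manual sorting step described in the flow, since names and titles are messy identifiers.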

1. following citations to the
paper that describes the data
collection, then filtering.

2. searching for accession
numbers, URLs, and DOIs in
full text
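As a rough sketch of the second method, the snippet below scans a block of full text for DOIs, URLs, and accession numbers with regular expressions. The patterns are assumptions chosen for illustration: the DOI pattern is a common heuristic rather than a formal definition, and the accession pattern only covers one repository's style (a `GSE` prefix followed by digits), so it would need replacing for other repositories. The function name `find_identifiers` is hypothetical.

```python
import re

# Heuristic patterns (assumptions, not a formal specification).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
URL_RE = re.compile(r'https?://[^\s"<>]+')
ACCESSION_RE = re.compile(r'\bGSE\d+\b')  # example prefix; repository specific


def _clean(match: str) -> str:
    """Drop trailing punctuation that usually belongs to the sentence, not the ID."""
    return match.rstrip('.,;:)]')


def find_identifiers(text: str) -> dict:
    """Return the unique DOIs, URLs, and accession numbers found in the text."""
    return {
        "doi": sorted({_clean(m) for m in DOI_RE.findall(text)}),
        "url": sorted({_clean(m) for m in URL_RE.findall(text)}),
        "accession": sorted(set(ACCESSION_RE.findall(text))),
    }
```

Matching identifiers this way only flags candidate mentions; a mention in full text can be a submission statement rather than a reuse, so the hits still need sorting.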

http://api.plos.org/2011/05/31/announcing_the_plos_search_api/

2005 long time ago

biomedicine familiar, also very
dominant

search interfaces not well designed
for this task

helpdesks are very helpful

stay tuned for results
poster at ASIS&T, SIGUSE

I post my data, code, and statistical scripts:
http://researchremix.org
Share yours too!
-> Open Notebook Science

http://www.flickr.com/photos/myklroventine/892446624/

https://notebooks.dataone.org/tracking1000datasets/

thank you
Todd Vision,
Estephanie Sta Maria,
Jonathan Carlson,
Dryad and DataONE teams
The open science online community and those who
release their articles, datasets and photos openly
