NISO Webinar: Part 2: Managing Data for Scholarly Communications

http://www.niso.org/news/events/2011/nisowebinars/semanticweb/

Managing Data for
Scholarly Communications
PART 2: Technical Management

October 19, 2011

Speakers: Joan Starr, Mark McFarland,
and MacKenzie Smith

Dataset Identification & Citation:
DataCite and EZID

Joan Starr
California Digital Library
October, 2011

Dataset Identification & Citation

Introduction

The Researchers’ Challenge
Identifiers are a tool for researchers

DataCite
“Helping you find, access and reuse data.”

EZID
Easy creation and management of DataCite DOIs and other identifiers.

Next steps
For DataCite, EZID and you!

California Digital Library (CDL)

The Researchers’ Challenge

Early in the research life cycle

Data-intensive research + Writing up the results

Where’s the data? What if I move it?

PERSISTENT IDENTIFIERS make the difference

by Dave Rogers, http://www.flickr.com/photos/dave-rogers/2815036285/

Working on a federated team

Data-intensive research + Regional research center + Aging infrastructure

Where’s the data? We have to move it!

PERSISTENT IDENTIFIERS make the difference

© All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5405812887

Making a career move

•  Data-intensive research +
•  Researcher(s) on the move

I know where my data is and I’m taking it with me!

PERSISTENT IDENTIFIERS make the difference

© All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5406308654

Meeting funder requirements

•  Data-intensive research +
•  Grantor requirements for data management

What do we plan to put here? How do we track the data?

PERSISTENT IDENTIFIERS make the difference

By David Mellis, http://www.flickr.com/photos/mellis/7675610/

DataCite

German National Library of Economics (ZBW)
German National Library of Science and Technology (TIB)
German National Library of Medicine (ZB MED)
GESIS - Leibniz Institute for the Social Sciences, Germany
Australian National Data Service (ANDS)
ETH Zurich, Switzerland
The Swedish National Data Service (SNDS)
California Digital Library (CDL), USA
Purdue University Library
Canada Institute for Scientific and Technical Information (CISTI)
Technical Information Center of Denmark
Institute for Scientific & Technical Information (INIST-CNRS), France
TU Delft Library, The Netherlands
The British Library, UK
Office of Scientific & Technical Information (OSTI), USA

DataCite Metadata V. 2.2

•  Small required set = citation elements
•  Optional descriptive set:
–  extendable lists
–  can refer to other standards, schemes
–  domain-neutral
–  rich ability to describe relationships to other digital objects
•  Metadata Search (MDS) is full-text indexed

DataCite Metadata V. 2.2

Required properties

1.  Identifier (with type attribute)
2.  Creator (with name identifier attributes)
3.  Title (with optional type attribute)
4.  Publisher
5.  PublicationYear

Optional properties

6.  Subject (with schema attribute)
7.  Contributor (with type & name identifier attributes)
8.  Date (with type attribute)
9.  Language
10.  ResourceType (with description attribute)
11.  AlternateIdentifier (with type attribute)
12.  RelatedIdentifier (with type & relation type attributes)
13.  Size
14.  Format
15.  Version
16.  Rights
17.  Description (with type attribute)
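
As a rough illustration of "small required set = citation elements", the sketch below checks that the five required properties are present and assembles them into a citation string. It is not DataCite's schema or tooling: the class, the field names, the citation layout, and the sample values (including the identifier) are assumptions made for this example only.

```python
from dataclasses import dataclass

# The five required properties from the slide, as hypothetical field names.
REQUIRED = ("identifier", "creator", "title", "publisher", "publication_year")


@dataclass
class DatasetRecord:
    """Illustrative record holding only the required properties."""
    identifier: str = ""
    creator: str = ""
    title: str = ""
    publisher: str = ""
    publication_year: str = ""


def missing_required(record):
    """Return the names of required properties that are empty."""
    return [name for name in REQUIRED if not getattr(record, name).strip()]


def format_citation(record):
    """Build a citation string; the layout here is an assumed example."""
    missing = missing_required(record)
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return (
        f"{record.creator} ({record.publication_year}): {record.title}. "
        f"{record.publisher}. {record.identifier}"
    )


# All values below are placeholders, not a real dataset.
example = DatasetRecord(
    identifier="doi:10.5072/EXAMPLE",
    creator="Doe, Jane",
    title="Sample Dataset",
    publisher="Example Data Center",
    publication_year="2011",
)
print(format_citation(example))
```

A record missing any of the five fields raises an error instead of producing a partial citation, which mirrors the idea that the required set is exactly what a citation needs.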

•  Get identifiers
•  Add location
•  Add metadata
•  Update location
•  Update metadata
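
The five operations above can be pictured with a toy in-memory registry. This is a minimal sketch of the idea of a persistent identifier, not EZID's actual interface: the class name, method names, the "example:" prefix, and the URLs are all hypothetical.

```python
import itertools


class IdentifierRegistry:
    """Toy registry: the identifier stays fixed while its record changes."""

    def __init__(self, prefix="example:"):
        self._prefix = prefix                 # hypothetical naming scheme
        self._counter = itertools.count(1)    # simple sequential minting
        self._records = {}

    def mint(self):
        """Get an identifier (no location or metadata attached yet)."""
        identifier = f"{self._prefix}{next(self._counter):06d}"
        self._records[identifier] = {"location": None, "metadata": {}}
        return identifier

    def set_location(self, identifier, url):
        """Add or update the location the identifier points to."""
        self._records[identifier]["location"] = url

    def set_metadata(self, identifier, **fields):
        """Add or update metadata fields; existing fields are kept."""
        self._records[identifier]["metadata"].update(fields)

    def resolve(self, identifier):
        """Return the current location for the identifier."""
        return self._records[identifier]["location"]

    def metadata(self, identifier):
        """Return a copy of the metadata for the identifier."""
        return dict(self._records[identifier]["metadata"])


registry = IdentifierRegistry()
dataset_id = registry.mint()
registry.set_location(dataset_id, "http://old-server.example/data/42")
registry.set_metadata(dataset_id, title="Sample Dataset")

# The data moves; only the registry entry changes, not the identifier.
registry.set_location(dataset_id, "http://new-server.example/data/42")
print(dataset_id, "->", registry.resolve(dataset_id))
```

Anyone who cited `dataset_id` before the move still reaches the data afterwards, which is the sense in which persistent identifiers "make the difference" in the scenarios earlier in the talk.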

Next Steps

DataCite
•  Dublin Core application profile
•  Content Service
•  Metadata v. 2.3

EZID
•  UI redesign
•  Automated link checking
•  Exposure for citations

By Nicola Whitaker, http://www.flickr.com/photos/nicolawhitaker/111009156/

Next Steps for you

•  Get more information, and
•  Try EZID for yourself!

By Nicola Whitaker, http://www.flickr.com/photos/nicolawhitaker/111009156/

For more information

EZID
EZID application: http://n2t.net/ezid/
EZID website: http://www.cdlib.org/services/uc3/ezid/
UC3 website: http://www.cdlib.org/services/uc3/

DataCite
DataCite Home: http://datacite.org/
DataCite Metadata Schema: http://schema.datacite.org/meta/kernel-2.2/index.html
DataCite Metadata Search: http://search.datacite.org

Contact Joan Starr at uc3@ucop.edu

Questions?

by Horia Varlan, http://www.flickr.com/photos/horiavarlan/4273168957/in/photostream/

Digital Library Services in the Cloud

Mark McFarland
Director, Texas Digital Library

Outline

•  Who: Texas Digital Library
•  Where: on the cloud
•  Why: motivations
•  When: late 2010
•  What: lessons learned

June 2011

Who: Texas Digital Library

•  Consortium of higher education institutions in Texas
•  Current services include:
–  Institution: IR (DSpace), ETD system
–  Faculty: OJS, OCS, blogs, wikis
–  Approximately 70 customer-facing service instances
•  Legacy hardware included
–  Compute servers
–  Storage servers
–  Network support devices


Where: on the cloud

•  Migrated customer-facing services to AWS
–  50 AWS VM instances
•  Maintained some services on local hardware
•  Simplified and consolidated system architecture


Why: motivations / When: late 2010

•  Disaster recovery plan
–  Prepare for data center move
•  Elastic capacity
–  New members, collections
•  Personnel savings
–  Fewer competencies, responsibilities
•  Began Oct 2010


What: lessons learned

•  The Good
–  Elastic capacity; customers did not notice change
–  No hardware purchase cycle
•  The Mixed
–  Lower personnel costs; failover
•  The Unexpected
–  Development tools; concerns about AWS being in U.S.; excellent management console


Future

•  Preservation
–  DuraCloud
•  Continue to evaluate
–  AWS is flexible and feature rich, but may still not be cost effective


For more information about the TDL, please visit the Texas Digital Library website at http://www.tdl.org or contact us at info@tdl.org.

Data Governance and Legal Interoperability

MacKenzie Smith, Science Fellow

© Creative Commons, 2011. This work is licensed under a Creative Commons Attribution 3.0 United States License.

Why Data Sharing is Good

•  research reproducibility
•  fiscal responsibility
•  broadest possible impact
•  large-scale data interoperability
–  Includes technical, social, legal and policy aspects
–  usual focus on technical/social
–  focus here on legal/policy aspects

Why Data Sharing is Hard

•  No incentives to improve data quality, provide missing documentation
•  Confidentiality and privacy concerns (e.g. HIPAA, endangered species)
•  Patents and commercial potential
•  Closed Access to journal articles (i.e. results)
•  IP issues very complicated

Definitions

Data governance is the system of decision rights and accountabilities that describe who can take what actions with what data, and when, under what circumstances, using what methods

•  strategies for data quality control and management, and processes that ensure important data assets are formally managed throughout an organization;
–  organizations can be legal entities like universities, or virtual organizations (e.g. distributed research collaborations)
–  Includes business processes and risk management;
•  laws and policies associated with data;
•  ensures that data can be trusted and that people are accountable for actions affecting the data

Definitions

•  Attribution is legally-imposed, remedy is lawsuit
•  Credit is what researchers want
•  Citation is the norm in scholarly communication, to provide supporting evidence, now proxy for credit

Attribution does not ensure credit or citation.

Legal Mechanisms for Sharing Data

1.  licenses
Require attribution
2.  contracts
3.  waivers
No attribution requirement

Copyright for Data

•  Does not apply to facts, e.g., most scientific data
•  Can apply to a collection of facts, but only to original aspects, not facts themselves
•  Can extract facts from a copyrighted database without infringing

Licenses

•  Licenses are not contracts
–  depend on underlying rights, e.g. copyright or sui generis rights
–  Copyright is a bundle of rights, automatic when fixed, limited in scope and duration
•  US and EU differ (EU has sui generis data rights) so different licenses cover copyright, sui generis rights, or both

Licenses

•  Creative Commons (CC-BY) example
–  applies to data and databases to the extent they’re copyrightable
–  Only data uses that implicate copyright trigger attribution requirement
–  uses of data that do not implicate copyright, e.g. data in the public domain, do not trigger attribution

Licenses

•  Hard to assess copyright for particular data and databases
•  Hard to know when license applies, creates risks:
–  data provider may be misled
–  data user will under or over comply

Licenses

•  Attribution requirements are inflexible, causing absurd situations
–  e.g. providing attribution to 1,000 providers in 1,000 different ways
–  known as ‘attribution stacking’
•  Could provide attribution and still not satisfy norms or expectations

Contracts

•  Do not require underlying right
–  rely on offer/acceptance, click through, terms of use
–  require formalities, e.g. attribution
•  Downsides
–  confusing obligations, no standardization, each user agreement can have different requirements
•  Researchers may avoid data if they can’t understand the terms of use

Contracts

Unlike licenses, contracts only bind parties

•  If someone obtains licensed data and shares it, anyone who obtains data from that user is still bound by the license
•  If data had been shared by contract, anyone obtaining data from the second party is not bound by the contract since they aren’t a party to the contract
•  In this respect, contracts are more limited than licenses

Contracts

•  Have broader reach than licenses
–  not tied to a legal right
–  can take away rights of public

Waivers

•  Provide legal certainty
–  No need to decipher copyright protection or sift through confusing legalese
–  Better than silence, to avoid forcing people to guess what their risks are
•  Mean loss of control
–  Can’t require attribution or other terms
•  Avoid problems and rely on scholarly norms
–  no attribution stacking or inappropriate obligations

3 levels: Waiver, Fall-back license, Non-assertion pledge

Summary

•  Law is messy, each approach has consequences
•  Licenses – (1) legal uncertainty about scope, (2) requirements can be inconsistent with norms
•  Contracts – (1) burdensome requirements with custom terms, (2) exceed scope of rights with requirements that take away normal rights
•  Waivers – (1) avoid problems, but (2) lose control and rely on norms

Summary

•  Each approach requires loss of control
•  No mechanism imposes legally-binding obligations in a way that perfectly maps to scholarly credit, e.g. citation
•  Ideal solution creates the least friction to scientific progress while giving credit where due, i.e., waivers and norms (the community governs itself)
