
Chuck Henry and Christa Williford, DLF Forum, November 2011

Lessons from the Digging Into Data Challenge
What Information Professionals Should Know about Computationally Intensive Research in the Humanities and Social Sciences

For the past two years, the Council on Library and Information Resources (CLIR) has partnered with the National Endowment for the Humanities Office of Digital Humanities (NEH-ODH) in an intensive assessment of the inaugural year of the Digging Into Data grant program. Launched in 2009, this unprecedented international initiative involved four funding agencies in three countries and supported eight international collaborative research projects in the social sciences and humanities, all of which bring innovative applications of computer technology to bear on the collection, mining, and interpretation of large data corpora. Here is a sampling of what CLIR has learned:

Lesson 1: Computationally intensive research requires open sharing of resources among participants. Essential resources include hardware, software, data corpora, and communication tools. Information professionals can facilitate open sharing by helping researchers forge partnership agreements based upon trust and transparency.

Example: To support the project “Digging Into Data to Answer Authorship Related Questions,” participants drafted a Memorandum of Understanding that made clear how shared resources would be funded and established a plan for project communication and credit sharing. See: Michael Simeone, Jennifer Guiliano, Rob Kooper, and Peter Bajcsy, “Digging into Data Using New Collaborative Infrastructures Supporting Humanities-based Computer Science Research.” First Monday 16.5 (2 May 2011): http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/3372/2950

Lesson 2: Computationally intensive research projects rely upon diverse kinds of expertise: domain (or subject) expertise, analytical expertise, data management expertise, and project management expertise. Information professionals can offer and/or develop skills and knowledge in each of these areas, enabling them to participate actively as research partners.

Example: For their project, “Digging Into the Enlightenment: Mapping the Republic of Letters,” Stanford University provided resources and project management support to their international partners through “embedded” information professional Nicole Coleman, who is based at the Stanford Humanities Center. As Academic Technology Specialist, Nicole’s focus is on finding new research opportunities and supporting the production of new knowledge, and she has developed expertise in the kinds of infrastructure and management practices that contribute to successful research collaborations. For more information about this project, see: http://enlightenment.humanitiesnetwork.org/

Lesson 3: When it comes to analytical tools, one size does not fit all. As their questions evolve throughout their projects, researchers want the flexibility to alternate between looking closely at select data and performing “distant” readings of entire corpora. Information professionals can educate researchers to help them refine their questions, select appropriate tools, and use their tools effectively.
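
To make the contrast concrete, here is a minimal sketch, not taken from any of the funded projects, of what the two modes can look like in practice. The corpus directory and file names are hypothetical; the point is simply that the same collection supports both a corpus-wide (“distant”) summary and retrieval of a single text for close study.

    # Minimal sketch: "distant" vs. "close" reading of a small plain-text corpus.
    # The corpus directory and file names below are hypothetical.
    from collections import Counter
    from pathlib import Path

    def distant_reading(corpus_dir, top_n=20):
        """Corpus-wide view: word frequencies aggregated over every document."""
        counts = Counter()
        for doc in Path(corpus_dir).glob("*.txt"):
            counts.update(doc.read_text(encoding="utf-8").lower().split())
        return counts.most_common(top_n)

    def close_reading(corpus_dir, filename):
        """Single-document view: return one text in full for careful study."""
        return (Path(corpus_dir) / filename).read_text(encoding="utf-8")

    # Hypothetical usage:
    # print(distant_reading("corpus/"))               # pattern across the whole corpus
    # print(close_reading("corpus/", "letter1.txt"))  # one document, read closely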

Example: While both close and distant readings of evidence characterized most of the Digging Into Data project methodologies, Richard Healey, co-principal investigator of “Railroads and the Making of Modern America,” has an interesting take on why humanities and social science data requires the continual adaptation and evolution of analytical tools. He hypothesizes many “different levels of data-related operations,” and each level determines the research outcomes that are possible. He writes:

The levels relate to the degree of scholarly input involved and I see them…as a data ‘hierarchy’:
• Level 0 - Data so riddled with error it should come with a serious intellectual health warning! (We have much more of this than most people seem willing to admit and much of the Google data from scanned railroad reports admirably fits into this category).
• Level 1 - Raw datasets…corrected for obvious errors.

• Level 2 - Value-added datasets: those that have been standardised/coded etc. in a consistent fashion according to some recognised scheme or procedure, which may require significant domain expertise [to produce]…)
• Level 3 - Integrated data resources: These will contain value-added datasets but…explicit linkages have been made between multiple related datasets (or they have been coded/tagged in such a way that the linkages can be made by software). Hence, these are not just data because so much additional research time has been invested in them, which is why I prefer the word ‘resource’…. Many GIS resources are of this kind, because they require linkage of spatial and non-spatial data.
• Level 4 - Digging Enabler or Digging Key data/classificatory resources: These require extensive domain expertise, and use of/analysis of multiple sources/relevant literature to create. They facilitate extensive additional types of digging activity to be undertaken on substantive projects beyond those of the investigators who created them, i.e. they become authority files for the wider research community. Gazetteers, structured occupational coding systems, data cross-classifiers etc. fit into this category.
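
Healey’s levels are descriptive rather than a formal schema, but as a purely illustrative sketch (our own, not part of any Digging Into Data project), an information professional might record them as a simple enumeration when appraising the datasets in a collection:

    # Illustrative only: labelling datasets with the quoted "hierarchy" levels.
    from enum import IntEnum

    class DataLevel(IntEnum):
        ERROR_RIDDEN = 0         # Level 0: data so error-ridden it needs a health warning
        CORRECTED_RAW = 1        # Level 1: raw datasets corrected for obvious errors
        VALUE_ADDED = 2          # Level 2: standardised/coded to a recognised scheme
        INTEGRATED_RESOURCE = 3  # Level 3: explicit linkages across related datasets
        DIGGING_KEY = 4          # Level 4: authority files for the wider research community

    # Hypothetical catalogue record tagged with its level
    record = {"title": "Scanned railroad reports (OCR)", "level": DataLevel.ERROR_RIDDEN}
    print(record["level"].name, int(record["level"]))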

Lesson 4: Big data isn’t just for scientists anymore. Not only do humanists and social scientists work with big data, their research can also produce large data corpora. Some scholars engaged in computationally intensive research see the new data they create as their most significant research outcomes. Researchers risk losing their valuable data unless they take steps to protect and sustain them. As practices for publishing research data evolve, information professionals can curate this data, working with scholars to appraise, normalize, validate, provide access to and, ultimately, preserve research data for the long term.
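
As one hedged illustration of what those steps can look like at a small scale (the file name, fields, and helper functions below are our own assumptions, not any project’s actual workflow), a curator might validate and normalize a tabular dataset and record a fixity checksum so it can be re-verified over the long term:

    # Illustrative curation steps: validate records, normalise a field, record fixity.
    import csv
    import hashlib
    from pathlib import Path

    def validate_rows(path, required):
        """Keep only rows that supply every required field."""
        with open(path, newline="", encoding="utf-8") as f:
            return [row for row in csv.DictReader(f) if all(row.get(k) for k in required)]

    def normalize_field(rows, field):
        """Trim whitespace and lowercase one field so values are consistent."""
        for row in rows:
            row[field] = row[field].strip().lower()
        return rows

    def fixity_checksum(path):
        """SHA-256 checksum, stored alongside the file for later integrity checks."""
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    # Hypothetical usage with an imaginary file "letters.csv":
    # rows = normalize_field(validate_rows("letters.csv", ["author", "date"]), "author")
    # print(fixity_checksum("letters.csv"))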

Example: In the final white paper for “Mining a Year of Speech,” John Coleman draws a compelling comparison between the sizes of data sets with which current major science and humanities projects are engaged. This paper is available at: http://www.phon.ox.ac.uk/files/pdfs/MiningaYearofSpeechWhitePaper.pdf

 

