A presentation given at the "Data Stewardship: Increasing the Integrity and Effectiveness of Science and Scholarship" session on Friday, June 8, 2012, at the IASSIST 2012 conference in Washington, DC.
This presentation introduced data publishing, using a social science (archaeology) case study to explore editorial processes and dissemination outcomes that increasingly demand “Linked Data” capabilities.
IASSIST Kansa Presentation
1. Case-Study: Publishing to the “Web of Data” in Archaeology
Quality and Workflows
Eric Kansa
UC Berkeley / OpenContext.org
Unless otherwise indicated, this work is licensed under a Creative Commons
Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
2. “Small Science” data sharing is hard:
(1) Complexity
(2) Scalability
(3) Ethics, cultural property claims, IP
(4) Incentives
(5) Preservation
Image Credit: “Grand Canyon NPS” via Flickr (CC-By)
http://www.flickr.com/photos/grand_canyon_nps/5975537378/
3. Thousand Flowers
● Open Context: Open access, open licensed data for archaeology
● Archiving by California Digital Library
● Persistent Identifiers (DOIs, ARKs)
● Web services
● NSF/NEH links for data management plans
4. Thousand Flowers
Fills a Gap:
Most data sources are institutional. Open Context publishes individual, small group contributions.
5. Thousand Flowers
Fills a Gap:
Most data sources are institutional. Open Context publishes individual, small group contributions.
Challenge:
Diverse contributions, needing lots of work to clean up and “link” to the Web of Data.
6. • 3-year project, Oct 2010 – Sep 2013
• Funded with a National Leadership Grant from the Institute of Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse”
• Ixchel Faniel, PI & Elizabeth Yakel, Co-PI
http://www.dipir.org
8. The Big DIPIR Questions
Research Questions
1. What are the significant properties of data that facilitate reuse by the designated communities at the three sites?
2. How can these significant properties be expressed as representation information to ensure the preservation of meaning and enable data reuse?
9. Open Context Interviewees
• 22 Ph.D. holders or graduate students interviewed
– 13 men
– 9 women
• Novices / Experts
– 19 experts
– 3 novices
• Interviewees who were curators, or professors who also held a curatorial role = 6
11. Data Documentation Practices
“I use an Excel spreadsheet…which I…inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started…then quickly, I was like, ‘This is ridiculous.’…I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added…color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but…we all know that a 14 is a sheep.” (CCU13)
12. Data Documentation Practices
A long way to go before we get usable, intelligible data
14. Thousand Flowers
● Clean up and document contributed data
● Map to ArchaeoML (general ontology)
● Mint URIs for entities (potsherds, projects, contexts, people)
● Link to important vocabularies / collections (Pleiades, Encyclopedia of Life)
● Working on CIDOC-CRM (RDF) representations (not straightforward)
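The “mint URIs” step above can be sketched as follows. The base URL pattern and the record key are hypothetical (Open Context's actual URI scheme may differ); the point is only that each entity receives exactly one stable, opaque identifier, even across repeated imports:

```python
import uuid

# Hypothetical base URL; Open Context's real URI patterns may differ.
BASE = "http://opencontext.org/subjects/"

def mint_uri(existing, record_key):
    """Assign a stable, opaque URI to a record exactly once.

    `existing` maps record keys (e.g. a project-local find number)
    to already-minted URIs, so re-running an import never re-mints.
    """
    if record_key not in existing:
        existing[record_key] = BASE + str(uuid.uuid4())
    return existing[record_key]

minted = {}
uri1 = mint_uri(minted, "potsherd-0042")
uri2 = mint_uri(minted, "potsherd-0042")  # same key -> same URI
```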
16. Open Context: Record
● XHTML + RDFa (Dublin Core, Open Annotation, etc.)
● XML (ArchaeoML)
● Atom
● RDF (draft CIDOC)
● Link to GitHub versioned file
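As a rough illustration of the XHTML + RDFa output, a record can be rendered with Dublin Core terms embedded as RDFa attributes. This is a sketch only, not Open Context's actual markup; the URI and field names are made up:

```python
from xml.sax.saxutils import escape, quoteattr

def record_as_rdfa(uri, title, creator):
    """Render a record as a minimal XHTML+RDFa fragment using
    Dublin Core terms (illustrative; not Open Context's markup)."""
    return (
        '<div xmlns:dc="http://purl.org/dc/terms/" about=%s>\n'
        '  <h1 property="dc:title">%s</h1>\n'
        '  <p property="dc:creator">%s</p>\n'
        '</div>'
    ) % (quoteattr(uri), escape(title), escape(creator))

html = record_as_rdfa("http://example.org/subjects/1",
                      "Potsherd 42", "E. Kansa")
```

Embedding the terms as attributes lets the same page serve human readers and Linked Data crawlers, which is the motivation for the XHTML + RDFa format listed above.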
23. Publishing
Data Quality and Standards Alignment
(1) Check consistency
(2) Edit functions
(3) Align to common standards (“Linked Data” if applicable)
(4) Issue tracking, version control
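Step (1), consistency checking, can be sketched as validating coded values against a contributor's code book. The code book below is hypothetical, echoing the “14 is a sheep” example from the interviews:

```python
# Hypothetical contributor code book (cf. "we all know that a 14
# is a sheep" from interviewee CCU13).
CODE_BOOK = {14: "sheep", 15: "goat", 16: "cattle"}

def check_codes(rows, column):
    """Return (row_index, value) pairs whose code is not in the book,
    flagging them for editorial follow-up."""
    problems = []
    for i, row in enumerate(rows):
        if row.get(column) not in CODE_BOOK:
            problems.append((i, row.get(column)))
    return problems

rows = [{"taxon_code": 14}, {"taxon_code": 99}, {"taxon_code": 16}]
bad = check_codes(rows, "taxon_code")
```

Flagged rows would then feed into step (4): each becomes a ticket in the issue tracker rather than a silent in-place fix.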
24. Publishing
Tools of the Trade
(1) Google Refine (check, edit, consistency)
(2) Mantis (issue tracker; coordinate edits, metadata creation)
25. Publishing
Tools of the Trade
(1) Domain scientists (Editorial Board) check data
(2) Iterative “coproduction” between contributors and editors
27. Web of Data (2011)
Main Contributors:
● Institutions (esp. government)
● Thematic collections / projects
28. Publishing
Entity Reconciliation
(1) With Google Refine
(2) Implemented: EOL and Pleiades (gazetteer)
(3) Use existing mappings to improve future reconciliation
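A minimal sketch of the reconciliation idea, assuming a toy gazetteer with example.org URIs (real work targets Pleiades for places and EOL for taxa, typically via Google Refine's reconciliation services). Note how confirmed matches are cached so prior editorial decisions improve later runs, as in step (3):

```python
import difflib

# Toy gazetteer of name -> URI; the names and URIs are invented.
GAZETTEER = {
    "Petra": "https://example.org/places/petra",
    "Gordion": "https://example.org/places/gordion",
}

def reconcile(name, known_mappings, cutoff=0.8):
    """Match a local place name to a gazetteer URI by fuzzy matching.

    `known_mappings` holds previously confirmed matches, so earlier
    editorial decisions carry forward to future reconciliation."""
    if name in known_mappings:
        return known_mappings[name]
    hits = difflib.get_close_matches(name, list(GAZETTEER), n=1,
                                     cutoff=cutoff)
    if hits:
        known_mappings[name] = GAZETTEER[hits[0]]
        return known_mappings[name]
    return None  # no close match: leave for a human editor

known = {}
match = reconcile("Petr", known)   # misspelled local name
```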
29. ● CDL Archiving Service
● EZID for persistent identifiers: DOIs (aggregate resources), ARKs (granular resources), and the Merritt Repository
● Helps build trust in the community
30. CDL as Infrastructure
● Platform / Services disciplinary communities can use for “Data Publishing”
● Different communities work out semantic/interoperability needs, editorial policies, incentives, etc.
[Diagram: University of California (System) Repository, all disciplines (UC-funded library, grants)]
31. CDL as Infrastructure
[Same diagram as slide 30, with “Future data publisher” boxes added]
35. Summary
Outcomes of Publishing Data:
(1) Communicate and set expectations about content and quality
(2) Organize workflows to improve data quality and usability
(3) Make “datasets” first-class citizens in the world of scholarly communications
36. Final Thoughts
Publication needs to evolve!
(1) Participating in Linked Data is a great goal, but far removed from most everyday practice
(2) Researchers need help
(3) 19th-century publication norms are poorly suited to 21st-century methods, research, and public goals