This presentation outlines the need to invest intellectual and expert human effort in data publication in order to see compelling research outcomes.
I gave this presentation on April 10, 2014, at the University of Pennsylvania, at an event sponsored by the Penn Humanities Forum (http://humanities.sas.upenn.edu/13-14/dhf_opendata.shtml).
Publishing and Pushing Linked Open Data
1. Publishing and Pushing
Linked Data in Archaeology
Unless otherwise indicated, this work is licensed under a Creative Commons
Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
Eric C. Kansa (@ekansa)
UC Berkeley D-Lab
& Open Context
4. My Precious Data:
Dysfunctional incentives
(poorly constructed metrics)
limit the scope and diversity of
publications
Image Credit: “Lord of the Rings” (2003, New
Line), All Rights Reserved Copyright
7. Need more carrots!
1. Citation, credit,
intellectually valued
2. Research outcomes
(new insights from data
reuse!)
Why linked data
is so important
11. Large scale data sharing &
integration for exploring the
origins of farming.
Funded by EOL / NEH
12. 1. 300,000 bone specimens
2. Complex: dozens, up to 110
descriptive fields
3. 34 contributors from 15
archaeological sites
4. More than 4 person-years
of effort to create the data!
15. 1. Referenced by the US National
Science Foundation and the
National Endowment for the
Humanities for data
management
2. “Data sharing as
publishing” metaphor
32. Linking to UBERON
1. Needed a controlled vocabulary for
bone anatomy
2. Better data modeling than is common in
zooarchaeology; adds quality.
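A minimal Python sketch of what linking to UBERON can look like in practice: contributor anatomy labels are first normalized to a canonical label and then mapped to an UBERON term URI. The synonym table, the function name `anatomy_uri`, and the two-term list are hypothetical illustrations, not Open Context's actual mapping; the UBERON IDs for femur and humerus should be checked against the current ontology release before reuse.

```python
from typing import Optional

# OBO PURL prefix for UBERON terms.
UBERON_BASE = "http://purl.obolibrary.org/obo/UBERON_"

# Canonical anatomy label -> UBERON numeric ID (verify against the current release).
CANONICAL_TO_UBERON = {
    "femur": "0000981",
    "humerus": "0000976",
}

# Hypothetical contributor spellings -> canonical label.
SYNONYMS = {
    "fem": "femur",
    "femur": "femur",
    "hum": "humerus",
    "humerus": "humerus",
}


def anatomy_uri(label: str) -> Optional[str]:
    """Return the UBERON URI for a contributor's anatomy label, or None if unmapped."""
    canonical = SYNONYMS.get(label.strip().lower())
    if canonical is None:
        return None  # unmapped: flag for editorial review rather than guess
    return UBERON_BASE + CANONICAL_TO_UBERON[canonical]


if __name__ == "__main__":
    for raw in ["Femur", "hum", "metapodial"]:
        print(raw, "->", anatomy_uri(raw))
```

Returning `None` for unmapped labels keeps the gap visible, so an editor can resolve it with the data author instead of the script guessing.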
33. Linking to UBERON
1. Models links between anatomy,
developmental biology, and genetics
2. Unexpected links between the
Humanities and Bioinformatics!
36. 7000 BC (many pigs, cattle)
7500 BC (sheep + goat dominate, few pigs, few cattle)
6500 BC (few pigs, mixing with wild animals?)
8000 BC (cattle, pigs, sheep + goats)
• Not a neat model of progress to adopt a more productive
economy. Very different, sometimes piecemeal adoption in
different regions.
• Separate coastal and inland routes for the spread of domestic
animals over a 1,000-year period.
37. Easy to Align
1. Animal taxonomy
2. Bone anatomy
3. Sex determinations
4. Side of the animal
5. Fusion (bone growth, up to
a point)
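These fields are easy to align because each can be expressed as a small, closed vocabulary. A sketch, assuming hypothetical lookup tables built together with the data authors, of mapping contributor values for side and sex onto shared terms while reporting anything left unmapped rather than silently dropping it:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup tables: contributor values -> shared vocabulary terms.
SIDE = {"l": "left", "lt": "left", "left": "left",
        "r": "right", "rt": "right", "right": "right",
        "axial": "axial"}
SEX = {"m": "male", "male": "male", "f": "female", "female": "female",
       "?": "indeterminate", "indet": "indeterminate"}


def align(record: Dict[str, str]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Map 'side' and 'sex' to shared terms; list any values left unmapped."""
    aligned: Dict[str, Optional[str]] = {}
    unmapped: List[str] = []
    for field, table in (("side", SIDE), ("sex", SEX)):
        raw = record.get(field)
        if raw is None:
            aligned[field] = None  # not recorded for this specimen
            continue
        term = table.get(raw.strip().lower())
        if term is None:
            unmapped.append(f"{field}={raw!r}")  # needs a decision, not a guess
        aligned[field] = term
    return aligned, unmapped


if __name__ == "__main__":
    print(align({"side": "Lt", "sex": "F"}))
    print(align({"side": "R", "sex": "juv"}))
```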
38. Hard to Align (poor modeling, recording)
1. Tooth wear (age)
2. Fusion data
3. Measurements
Despite common research methods!!
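Measurements are hard to align partly because the unit (and exactly which dimension was measured) is often not recorded. A minimal sketch, assuming a hypothetical `to_mm` helper and conversion table, that converts a value only when its unit is known and otherwise returns `None` so the value can be queried with its author:

```python
from typing import Optional

# Hypothetical conversion factors to millimetres.
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def to_mm(value: float, unit: Optional[str]) -> Optional[float]:
    """Convert a measurement to millimetres; return None if the unit is missing or unknown."""
    if unit is None:
        return None  # unit not recorded: cannot be aligned safely
    factor = TO_MM.get(unit.strip().lower())
    if factor is None:
        return None  # unrecognised unit: query the data author
    return value * factor


if __name__ == "__main__":
    for value, unit in [(2.5, "cm"), (31.0, "mm"), (30.0, None)]:
        print(value, unit, "->", to_mm(value, unit))
```

Units are only part of the problem; the sketch does not address whether two contributors measured the same dimension, which needs explicit measurement definitions in the data model.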
39. Professional expectations for data reuse
1. Need better data modeling
(than feasible with, cough,
Excel)
2. Data validation,
normalization
3. Requires training &
incentives for researchers
to care more about quality
of their data!
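Validation can start very simply. The following sketch checks each record against a hypothetical schema of required fields and allowed values and returns readable problem reports; the field names and vocabularies are illustrative assumptions, not a standard used by any particular project.

```python
from typing import Dict, List, Optional, Set

# Hypothetical schema: field -> allowed values (None means any non-empty value).
SCHEMA: Dict[str, Optional[Set[str]]] = {
    "specimen_id": None,
    "taxon": None,
    "side": {"left", "right", "axial"},
    "fusion": {"fused", "unfused", "fusing"},
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems: List[str] = []
    for field, allowed in SCHEMA.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing {field}")
        elif allowed is not None and value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    return problems


if __name__ == "__main__":
    print(validate({"specimen_id": "A2", "taxon": "", "side": "L", "fusion": "fused"}))
```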
41. … and not just academic
researchers, linked open
data involves many sectors!
42. Digital Index of North
American Archaeology
(DINAA)
1. State “site files” created
to comply with federal
preservation laws
2. Main record of human
occupation in North
America
3. PIs: David G. Anderson
and Josh Wells
43. DINAA
1. Stable URI for
each site file.
2. CC-Zero (public
domain)
3. Beginning to link
to controlled
vocabularies
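A sketch of a site-file record with a stable URI, a CC-Zero license, and one link to a controlled vocabulary, expressed as JSON-LD-style Python data. The base URI, site identifier, and period URI are hypothetical placeholders (DINAA's actual URI pattern is not given here); the CC0 1.0 license URI and the Dublin Core terms namespace are real.

```python
import json
from urllib.parse import quote

# Hypothetical base URI for site records (not DINAA's actual pattern).
BASE = "https://example.org/sites/"
# Creative Commons CC0 1.0 public domain dedication.
CC0 = "https://creativecommons.org/publicdomain/zero/1.0/"


def site_record(site_id: str, title: str, period_uri: str) -> dict:
    """Build a minimal JSON-LD-style description of one site file."""
    return {
        "@context": {"dcterms": "http://purl.org/dc/terms/"},
        # The URI depends only on the persistent ID, never on mutable fields like the title.
        "@id": BASE + quote(site_id, safe=""),
        "dcterms:identifier": site_id,
        "dcterms:title": title,
        "dcterms:license": {"@id": CC0},
        # Link to a (hypothetical) controlled vocabulary term for the time period.
        "dcterms:temporal": {"@id": period_uri},
    }


if __name__ == "__main__":
    record = site_record("XX0001", "Example site", "https://example.org/periods/example-period")
    print(json.dumps(record, indent=2))
```

Deriving the URI only from the persistent identifier means descriptive metadata can be revised without breaking links that others have already made to the record.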
44. Data are challenging!
1. Decoding takes 10x longer
2. Data management plans should
also cover data modeling, quality
control (esp. validation)
3. More work needed modeling
research methods (esp. sampling)
4. Editing, annotation requires lots of
back-and-forth with data authors
5. Data need investment to be
useful!
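Part of why decoding is slow is that coded values (for example `1` = left) depend on codebooks that may be incomplete. A sketch, assuming one hypothetical contributor codebook, that decodes what it can and collects undocumented codes as queries for the back-and-forth with data authors:

```python
from typing import Dict, List, Tuple

# Hypothetical codebook from one contributor: spreadsheet codes -> documented meanings.
CODEBOOK: Dict[str, Dict[str, str]] = {
    "side": {"1": "left", "2": "right", "3": "axial"},
    "fusion": {"1": "unfused", "2": "fusing", "3": "fused"},
}


def decode(rows: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Replace coded values with labels; collect codes the codebook does not document."""
    decoded: List[Dict[str, str]] = []
    queries: List[str] = []
    for i, row in enumerate(rows):
        out = dict(row)
        for field, codes in CODEBOOK.items():
            code = row.get(field)
            if code is None:
                continue
            if code in codes:
                out[field] = codes[code]
            else:
                # Leave the raw code in place and record a question for the author.
                queries.append(f"row {i}: {field} code {code!r} not in codebook")
        decoded.append(out)
    return decoded, queries


if __name__ == "__main__":
    rows, questions = decode([{"side": "1", "fusion": "3"}, {"side": "9", "fusion": "2"}])
    print(rows)
    print(questions)
```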
46. Investing in Data is a Continual Need
1. Data and code co-evolve. New
visualizations, analysis may reveal
unseen problems in data.
2. Data and metadata change routinely
(revised stratigraphy requires ongoing
updates to data in this analysis)
3. Problems, interpretive issues in data
(and annotations) keep cropping up.
4. Is publishing a bad metaphor implying
a static product?
48. Data sharing as publication
Data sharing as open source
release cycles?
50. Data sharing as publication
AND
Data sharing as open source
release cycles
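Treating data sharing as a release cycle implies being able to say what changed between releases. A minimal sketch, assuming each release is a dictionary of records keyed by a stable record ID (a hypothetical structure, not Open Context's workflow), that produces a record-level changelog, for example after revised stratigraphy changes a specimen's context:

```python
from typing import Dict, List

Record = Dict[str, str]


def changelog(old: Dict[str, Record], new: Dict[str, Record]) -> List[str]:
    """Summarise record-level differences between two releases keyed by record ID."""
    lines: List[str] = []
    for rid in sorted(new.keys() - old.keys()):
        lines.append(f"added {rid}")
    for rid in sorted(old.keys() - new.keys()):
        lines.append(f"removed {rid}")
    for rid in sorted(old.keys() & new.keys()):
        # Compare every field present in either version of the record.
        for field in sorted(old[rid].keys() | new[rid].keys()):
            before, after = old[rid].get(field), new[rid].get(field)
            if before != after:
                lines.append(f"changed {rid}.{field}: {before!r} -> {after!r}")
    return lines


if __name__ == "__main__":
    v1 = {"A1": {"context": "Layer 3", "taxon": "Sus"}, "A2": {"context": "Layer 4"}}
    v2 = {"A1": {"context": "Layer 2", "taxon": "Sus"}, "A3": {"context": "Layer 5"}}
    for line in changelog(v1, v2):
        print(line)
```

A changelog like this could accompany each tagged release, so reusers can see whether an update affects records they have already cited.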
53. Image Credit: “Brainchildvn” via Flickr (CC-By)
http://www.flickr.com/photos/brainchildvn/3957949195
Not an easy environment to
seek new investments.
56. Bethany Nowviskie (University of Virginia)
Shifts in career paths and professions
(#alt-academy) and different publishing
incentives are emerging as data assume
greater emphasis
57. Bethany Nowviskie (University of Virginia)
Alt-ac positions (contingent, low status) are not a
good answer, but they reflect a wider need
for institutional reform.
58. One does not simply
walk into Mordor
Academia and share
usable data…
Image Credit: Copyright New Line Cinema
59. Final Thoughts
Data require intellectual
investment, methodological and
theoretical innovation.
Institutional structures are poorly
configured to support data-powered
research
New professional roles needed,
but who will pay for it?
The removal of objects is now forbidden in most countries and at many sites in the US. As a result, data collection methods have shifted from describing a physical object that remains accessible in the US to creating a full surrogate for an object that may be reburied in the ground. Data collection has increased as the collection of objects has decreased. Still, individual systems of data collection (see examples on the right) have emerged that:
• have developed over time;
• are handed down from mentors;
• show some technological adoption, particularly the use of Excel spreadsheets rather than relational databases.
In none of our interviews did anyone refer to existing guides on archaeological documentation, such as those of the UK's Archaeology Data Service or the Netherlands' DANS.
We used archaeology as a case study. In our 22 semi-structured interviews, archaeologists were asked about their:
1. background and research interests
2. data reuse experiences:
• actual: using the critical incident (i.e., the last time they reused someone else's data for their research)
• aspirational: for those who had not reused someone else's data, what they would need or want in order to do so
3. views on digital data repositories
4. data sharing practices