Publishing and Pushing:
Mixing Models for Communicating
Research Data in Archaeology
Eric C. Kansa (@ekansa)
UC Berkeley D...
Introduction

Challenges in Reusing Data
1. Background
2. Data publishing workflow
3. Data curation and dynamism
Need more carrots!
1. Citation, credit, intellectually valued
2. Research outcomes
(new insights from data
reuse!)
EOL Computable Data
Challenge
(Ben Arbuckle, Sarah
W. Kansa, Eric Kansa)
Large scale data sharing &
integration for exploring the
origins of farming.
Funded by EOL / NEH
1. 300,000 bone specimens
2. Complex: dozens, up to 110
descriptive fields
3. 34 contributors from 15
archaeological sites...
Relatively collaborative
bunch, Ben Arbuckle cultivated
relationships & built trust over
years prior to EOL funding.
“204: Dynamics of Data Reuse when Aggregating Data through Time and
Space: The Case of Archaeology and Zoology”
Elizabeth ...
Introduction

Challenges in Reusing Data
1. Background
2. Data publishing workflow
3. Data curation and dynamism
1. Referenced by US National
Science Foundation and
National Endowment for the
Humanities for Data
Management
2. “Data sha...
Raw Data:
Idiosyncratic, sometimes
highly coded, often
inconsistent
Raw Data Can Be Unappetizing
Publishing Workflow

Improve / Enhance
1. Consistency
2. Context (intelligibility)
Sometimes data is better
served cooked
- Documentation
- Review, editing
- Annotation
“Ovis orientalis”

Sheep,
wild

Code: 16

Wild
sheep

O.
orientalis

Code: 70

Ovis orientalis

Code: 15
Code: 14

Sheep
(...
- Documentation
- Review, editing
- Annotation
“Ovis orientalis”
http://eol.org/pages/311906/

Sheep,
wild

Code: 16

Wild
sheep

O.
orientalis

Code: 70

Ovis orientali...
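
The annotation step on this slide, where each contributor's own label or code for wild sheep is linked to one shared EOL concept, might be sketched as follows. This is only an illustrative sketch, not Open Context's actual code: the dataset names (`site_a` and so on), the assignment of labels to datasets, and the function names are all hypothetical. Only the labels, codes, and the EOL URI come from the slide.

```python
# Illustrative sketch only; not Open Context's implementation.
# The EOL URI and the local labels/codes are taken from the slide.
EOL_OVIS_ORIENTALIS = "http://eol.org/pages/311906/"

# Local terms are keyed by (dataset, term) because a bare code such as "16"
# only has meaning inside the dataset that defined it.
# ASSUMPTION: the dataset names and which dataset used which term are invented.
ANNOTATIONS = {
    ("site_a", "Sheep, wild"): EOL_OVIS_ORIENTALIS,
    ("site_b", "16"): EOL_OVIS_ORIENTALIS,
    ("site_c", "Wild sheep"): EOL_OVIS_ORIENTALIS,
    ("site_d", "O. orientalis"): EOL_OVIS_ORIENTALIS,
    ("site_e", "70"): EOL_OVIS_ORIENTALIS,
}


def annotate(record):
    """Return a copy of the record with a 'taxon_uri' field added.

    The contributor's original value is kept untouched; unmapped terms
    get None so that an editor can follow up with the data author.
    """
    key = (record["dataset"], record["taxon"].strip())
    return {**record, "taxon_uri": ANNOTATIONS.get(key)}


def count_by_uri(records):
    """Count specimens per shared concept across all datasets."""
    counts = {}
    for rec in map(annotate, records):
        uri = rec["taxon_uri"]
        counts[uri] = counts.get(uri, 0) + 1
    return counts


records = [
    {"dataset": "site_a", "taxon": "Sheep, wild"},
    {"dataset": "site_b", "taxon": "16"},
    {"dataset": "site_e", "taxon": "70"},
    {"dataset": "site_b", "taxon": "70"},  # code 70 is not defined for site_b
]
print(count_by_uri(records))
```

Keeping the original value next to the added URI means the annotation can be revised later without losing what the contributor actually recorded.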
● Controlled vocabulary
● Linked Data applications
“Sheep/goat”
http://eol.org/pages/32609438/

1. Needed to mint new
concepts like
“sheep/goat”
2. Vocabularies need to
be r...
Linking to UBERON
1. Needed a controlled vocabulary for
bone anatomy
2. Better data modeling than common in
zooarchaeology...
Linking to UBERON
1. Models links between anatomy,
developmental biology, and genetics
2. Unexpected links between the
Hum...
6500 BC (few pigs, mixing with wild animals?)

7500 BC (sheep + goat dominate, few pigs, few cattle)

7000 BC (many pigs, ...
Easy to Align
1. Animal taxonomy
2. Bone anatomy
3. Sex determinations
4. Side of the animal
5. Fusion (bone growth, up to...
Hard to Align (poor modeling, recording)
1. Tooth wear (age)
2. Fusion data
3. Measurements
Despite common research method...
“Under the hood” exposure
will lead to better data
documentation practices?
Nobody expected their data
to see wider scrutiny either..
Professional expectations for data reuse

1. Need better data modeling
(than feasible
with, cough, Excel)
2. Data
validati...
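
As a rough illustration of what validation and normalization can mean for a spreadsheet export, the sketch below checks one row at a time. Everything specific in it is an assumption made for the example: the field names (`side`, `length_mm`), the allowed values, and the abbreviations being normalized are hypothetical and are not drawn from the project's data.

```python
# Illustrative sketch only. Field names and vocabularies are hypothetical.
ALLOWED_SIDES = {"left", "right", "unknown"}  # assumed controlled vocabulary

# Assumed shorthand that contributors might type into a spreadsheet.
SIDE_ABBREVIATIONS = {"l": "left", "r": "right", "": "unknown", "?": "unknown"}


def normalize_side(value):
    """Trim, lowercase, and expand common abbreviations for the side field."""
    cleaned = value.strip().lower()
    return SIDE_ABBREVIATIONS.get(cleaned, cleaned)


def validate(row):
    """Return a list of problems found in one row (values are strings)."""
    problems = []

    side = normalize_side(row.get("side", ""))
    if side not in ALLOWED_SIDES:
        problems.append(f"side: unrecognized value {row.get('side')!r}")

    raw_length = row.get("length_mm", "")
    if raw_length != "":
        try:
            if float(raw_length) <= 0:
                problems.append(f"length_mm: must be positive, got {raw_length!r}")
        except ValueError:
            problems.append(f"length_mm: not a number: {raw_length!r}")

    return problems


print(validate({"side": " L ", "length_mm": "23.5"}))
print(validate({"side": "both", "length_mm": "n/a"}))
```

Checks like these are easy to run at data entry time in a database, which is part of why free-form spreadsheets make later cleanup more expensive.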
Data are challenging!
1. Decoding takes 10x longer
2. Data management plans should also cover data modeling, qu...
Introduction

Challenges in Reusing Data
1. Background
2. Data publishing workflow
3. Data curation and dynamism
Investing in Data is a Continual Need
1. Data and code co-evolve. New visualizations, analysis may reveal uns...
Data sharing as publication
Data sharing as open source
release cycles?
Data sharing as publication
AND
Data sharing as open source
release cycles
One does not simply
walk into Mordor
Academia and share
usable data…
Image Credit: Copyright New Line Cinema
Final Thoughts
Data require intellectual
investment, methodological and
theoretical innovation.
Institutional structures p...
Thank you!

IDCC reviewers
(excellent, very helpful
comments!)
Presentation for the San Francisco #IDCC14 conference (http://www.dcc.ac.uk/events/idcc14/day-two-papers). The presentation covers publishing zooarchaeology data with Open Context (http://opencontext.org) to study the spread of farming from the Near East to Europe through Anatolia. It looks at editorial processes, linked data annotation, and other workflow concerns relating to making raw data more usable for comparative analysis.

  • The removal of objects is now forbidden in most countries and at many sites in the US. As a result, data collection methods have changed from describing a physical object accessible in the US to creating a full surrogate for an object that might be re-buried in the ground. Data collection has increased as the collection of objects has decreased. Still, individual systems of data collection (see examples on the right) have emerged, which have developed over time, are handed down from mentors, and involve some technological adoption, particularly the adoption of Excel spreadsheets over relational databases. In none of our interviews was there any reference to existing guides on archaeological documentation, such as those of the Archaeology Data Service (UK) or DANS (Netherlands).
  • We used archaeology as a case study. During our 22 semi-structured interviews, archaeologists were asked about: (1) their background and research interests; (2) their data reuse experiences, either actual (using the critical incident, i.e. the last time they reused someone else’s data for their research) or aspirational (those who had not reused someone else’s data were asked what they would need or want in order to do so); (3) their views on digital data repositories; and (4) their data sharing practices.

    1. Publishing and Pushing: Mixing Models for Communicating Research Data in Archaeology. Eric C. Kansa (@ekansa), UC Berkeley D-Lab & Open Context; Sarah Whitcher Kansa, The Alexandria Archive Institute & Open Context; Benjamin Arbuckle, University of North Carolina, Chapel Hill. Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
    2. Introduction. Challenges in Reusing Data: 1. Background 2. Data publishing workflow 3. Data curation and dynamism
    3. Need more carrots! 1. Citation, credit, intellectually valued 2. Research outcomes (new insights from data reuse!)
    4. EOL Computable Data Challenge (Ben Arbuckle, Sarah W. Kansa, Eric Kansa)
    5. Large scale data sharing & integration for exploring the origins of farming. Funded by EOL / NEH
    6. 1. 300,000 bone specimens 2. Complex: dozens, up to 110 descriptive fields 3. 34 contributors from 15 archaeological sites 4. More than 4 person years of effort to create the data!
    7. Relatively collaborative bunch, Ben Arbuckle cultivated relationships & built trust over years prior to EOL funding.
    8. “204: Dynamics of Data Reuse when Aggregating Data through Time and Space: The Case of Archaeology and Zoology” Elizabeth Yakel; Ixchel Faniel; Rebecca Frank
    9. Introduction. Challenges in Reusing Data: 1. Background 2. Data publishing workflow 3. Data curation and dynamism
    10. 1. Referenced by US National Science Foundation and National Endowment for the Humanities for Data Management 2. “Data sharing as publishing” metaphor
    11. Raw Data: Idiosyncratic, sometimes highly coded, often inconsistent
    12. Raw Data Can Be Unappetizing
    13. Publishing Workflow: Improve / Enhance 1. Consistency 2. Context (intelligibility)
    14. Sometimes data is better served cooked
    15. - Documentation - Review, editing - Annotation
    16. - Documentation - Review, editing - Annotation
    17. - Documentation - Review, editing - Annotation
    18. - Documentation - Review, editing - Annotation
    19. - Documentation - Review, editing - Annotation
    20. “Ovis orientalis” Sheep, wild Code: 16 Wild sheep O. orientalis Code: 70 Ovis orientalis Code: 15 Code: 14 Sheep (wild)
    21. - Documentation - Review, editing - Annotation
    22. “Ovis orientalis” http://eol.org/pages/311906/ Sheep, wild Code: 16 Wild sheep O. orientalis Code: 70 Ovis orientalis Code: 15 Code: 14 Sheep (wild)
    23. ● Controlled vocabulary ● Linked Data applications
    24. “Sheep/goat” http://eol.org/pages/32609438/ 1. Needed to mint new concepts like “sheep/goat” 2. Vocabularies need to be responsive for multidisciplinary applications
    25. Linking to UBERON 1. Needed a controlled vocabulary for bone anatomy 2. Better data modeling than common in zooarchaeology, adds quality.
    26. Linking to UBERON 1. Models links between anatomy, developmental biology, and genetics 2. Unexpected links between the Humanities and Bioinformatics!
    27. 6500 BC (few pigs, mixing with wild animals?) 7500 BC (sheep + goat dominate, few pigs, few cattle) 7000 BC (many pigs, cattle) 8000 BC (cattle, pigs, sheep + goats) • Not a neat model of progress to adopt a more productive economy. Very different, sometimes piecemeal adoption in different regions. • Separate coastal and inland routes for the spread of domestic animals, over a 1000-year time period.
    28. Easy to Align 1. Animal taxonomy 2. Bone anatomy 3. Sex determinations 4. Side of the animal 5. Fusion (bone growth, up to a point)
    29. Hard to Align (poor modeling, recording) 1. Tooth wear (age) 2. Fusion data 3. Measurements Despite common research methods!!
    30. “Under the hood” exposure will lead to better data documentation practices?
    31. Nobody expected their data to see wider scrutiny either..
    32. Professional expectations for data reuse 1. Need better data modeling (than feasible with, cough, Excel) 2. Data validation, normalization 3. Requires training & incentives for researchers to care more about quality of their data!
    33. Data are challenging! 1. Decoding takes 10x longer 2. Data management plans should also cover data modeling, quality control (esp. validation) 3. More work needed modeling research methods (esp. sampling) 4. Editing, annotation requires lots of back-and-forth with data authors 5. Data needs investment to be useful!
    34. Introduction. Challenges in Reusing Data: 1. Background 2. Data publishing workflow 3. Data curation and dynamism
    35. Investing in Data is a Continual Need 1. Data and code co-evolve. New visualizations, analysis may reveal unseen problems in data. 2. Data and metadata change routinely (revised stratigraphy requires ongoing updates to data in this analysis) 3. Problems, interpretive issues in data (and annotations) keep cropping up. 4. Is publishing a bad metaphor implying a static product?
    36. Data sharing as publication Data sharing as open source release cycles?
    37. Data sharing as publication Data sharing as open source release cycles?
    38. Data sharing as publication AND Data sharing as open source release cycles
    39. One does not simply walk into Mordor Academia and share usable data… Image Credit: Copyright New Line Cinema
    40. Final Thoughts: Data require intellectual investment, methodological and theoretical innovation. Institutional structures poorly configured to support data powered research. New professional roles needed, but who will pay for it?
    41. Thank you! IDCC reviewers (excellent, very helpful comments!)
