Data Publishing in Archaeozoology


Published on

Published in: Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Data Publishing in Archaeozoology

  1. 1. Data Publishing in Archaeozoologyor “Everybody knows that a 14 is a Sheep” Sarah Whitcher Kansa Alexandria Archive Institute Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <>
  2. 2. Main Points- Reproducibility and new research opportunities require data sharing- Raw data are not sufficient- Publishing open data on the Web is a solution- Publishing data takes special expertise
  3. 3. Good scientific practice requires data sharing.We cannot trust results based on hidden data.
  4. 4. The Challenges • Limits of print (entrenched practice but not best practice) • Data preservation crisis (wasted effort) • Hard to compare and integrate data now
  5. 5. Policy Consensus: Urgent Need forBetter Data Practices!
  6. 6. DIPIR ( 3-Year project, Oct. 2010-Sept. 2013 National Leadership Grant from the Institute for Museum and Library Services (LG-06-10-0140-10) Ixchel Faniel (PI), Elizabeth Yakel (Co-PI)
  7. 7. Raw Data Can Be Unappetizing
  8. 8. Data Documentation Practices“I use an Excel spreadsheet…which I … inherited from myresearch advisers. …my dissertation advisor was still recordingdata for each specimen on paper when I was in graduate schoolso thats what I started …then quickly, I was like, ‘This isridiculous.’… I just started using an Excel spreadsheet that hassort of slowly gotten bigger and bigger over time with morevariables or columns…Ive added …color coding…I also use…a verysort of primitive numerical coding system, again, that I inheritedfrom my research advisers…So, this little book that goes with meof codes which is sort of odd, but …we all know that a 14 is asheep.” (CCU13) A long way to go before we get usable, intelligible data
  9. 9. Sometimes data isbetter servedcooked.
  10. 10. Adapt “publishing” metaphor to digital data
  11. 11. What is Data Publication? Putting editorially-vetted data on the Web • Cleaned, described, organized • More intelligible and cohesive • Open access • Linked to other resources (including print publications) • Machine-readable for discovery and reuse • Archived and curated (CDL)
  12. 12. Benefits & ChallengesThe Good: • Enhanced presentation • Enhanced search, discovery, understanding • Depth & breadth (linked to project data, other datasets, print publications, etc.) • Allowing for Linked Open Data = facilitates future use • Professional advancement The Bad: • Takes time, effort • Requires informatics expertise Benefits need to outweigh challenges
  13. 13. Thousand Flowers  Started in 2007  Integrates and publishes various forms of archaeological documentation (structured data, media, documents)  Not a repository, but archived with California Digital Library  Interoperability via web services, increasing emphasis on Linked Data
  14. 14. Data Publishing Data Quality and Standards Alignment (1) Check consistency (2) Edit functions (3) Align to common standards (“Linked Data” if applicable) (4) Issue tracking, version control
  15. 15. Data PublishingData Publishing  Comprehensive (Kenan Tepe: 30K photos, documents, object descriptions)  Added capabilities (search, analysis, visualization)  More attractive, usable data  Interactions with data editors improve data
  16. 16. • Citation provided for each item• CDL archival service to give permanence
  17. 17. Beyond the Silo  Often too much emphasis on single systems, need to consider relationships across systems  Even if one reaches some scale, it cant be isolated from the rest of the Web  Machines are important “audiences” (e.g. RESTful Services: Atom, AtomPub, JSON, etc.)
  18. 18. Linked Open Data Regarded as best practice for sharing data (among informatics researchers)
  19. 19. Web of Data (2009) Growing, Decentralized Innovation
  20. 20. Web of Data (2011)
  21. 21. Web of Data (2011) Need Archaeology on the Map Contributions should not be isolated from other communities
  22. 22. Open Context: Record  HTTP URIs to identify resources at a meaningful level of granulaity (“a URL per potsherd”)  Use HTTP URIs published by others  URIs act as “primary keys” allow data to be related
  23. 23. Concept: Bos taurus (
  24. 24. Concept: Bos taurus (
  25. 25. Open Context: Record
  26. 26. Open Context Entity Reconciliation Authors / Editors relate project-specific terminologies to global terminologies “Common name : Cattle, domestic” = (Bos taurus)
  27. 27. Open Context Entity Reconciliation Authors / Editors Many project- relate project-specific specific terms terminologies to related to global global terminologies terminologies Project Specific Property EOL Link (Global Terminology) Species : Sheep / Goat (Caprinae) Taxon : Bos taurus (Bos taurus) Species : Deer (Dama sp.) Type : Deer (Odocoileus sp.) Taxon : Ovis / Capra (Caprinae) Species : Cattle (Bos taurus) Species : Goat (Capra hircus)
  28. 28. Open Context Entity Reconciliation Authors / Editors Many project- Editorial work-flow relate project-specific specific terms helps annotate terminologies to related to global data for global terminologies terminologies interoperability Project Specific Property EOL Link (Global Terminology) Species : Sheep / Goat (Caprinae) Taxon : Bos taurus (Bos taurus) Species : Deer (Dama sp.) Type : Deer (Odocoileus sp.) Taxon : Ovis / Capra (Caprinae) Species : Cattle (Bos taurus) Species : Goat (Capra hircus)
  29. 29. Data Publishing Projects EOL (2012) funding for publishing additional zooarchaeology datasets (Neolithic Anatolia), in project led by Ben Arbuckle (Baylor University)
  30. 30. Data Publishing Projects NEH (2012) funding for publishing trade + exchange related datasets (Bronze- Iron Age Mediterranean)
  31. 31. Data Publishing Projects Complement Conventional Publishing  Lockwood Press (“Archaeobiology Series”), Cotsen Institute Press (UCLA)
  32. 32. Data Publishing Projects Driven by research interests and publication goals among researchers wanting to compare datasets, create reference collections, and have citable, full datasets linked to synthetic publications.
  33. 33. Summary Outcomes of Publishing Data: (1) Make “datasets” first class citizens in world of scholarly communications (2) Provide needed transparency to published interpretations (3) Enable new kinds of multi-disciplinary research across many datasets
  34. 34. Thank you!Special Thanks!Canan Ҫakırlar, RCAC, Koҫ University, ICAZ, and other sponsors