Data Publishing in Archaeozoology

Published in: Education
Transcript of "Data Publishing in Archaeozoology"

  1. Data Publishing in Archaeozoology, or “Everybody knows that a 14 is a Sheep”
     Sarah Whitcher Kansa, Alexandria Archive Institute, OpenContext.org
     Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
  2. Main Points
     - Reproducibility and new research opportunities require data sharing
     - Raw data are not sufficient
     - Publishing open data on the Web is a solution
     - Publishing data takes special expertise
  3. Good scientific practice requires data sharing. We cannot trust results based on hidden data.
  4. The Challenges
     • Limits of print (entrenched practice but not best practice)
     • Data preservation crisis (wasted effort)
     • Hard to compare and integrate data now
  5. Policy Consensus: Urgent Need for Better Data Practices!
  6. DIPIR (http://www.dipir.org)
     3-Year project, Oct. 2010-Sept. 2013
     National Leadership Grant from the Institute for Museum and Library Services (LG-06-10-0140-10)
     Ixchel Faniel (PI), Elizabeth Yakel (Co-PI)
  7. Raw Data Can Be Unappetizing
  8. Data Documentation Practices
     “I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, ‘This is ridiculous.’… I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)
     A long way to go before we get usable, intelligible data
  9. Sometimes data is better served cooked.
  10. Adapt “publishing” metaphor to digital data
  11. What is Data Publication?
      Putting editorially-vetted data on the Web
      • Cleaned, described, organized
      • More intelligible and cohesive
      • Open access
      • Linked to other resources (including print publications)
      • Machine-readable for discovery and reuse
      • Archived and curated (CDL)
  12. Benefits & Challenges
      The Good:
      • Enhanced presentation
      • Enhanced search, discovery, understanding
      • Depth & breadth (linked to project data, other datasets, print publications, etc.)
      • Allowing for Linked Open Data = facilitates future use
      • Professional advancement
      The Bad:
      • Takes time, effort
      • Requires informatics expertise
      Benefits need to outweigh challenges
  13. Thousand Flowers
      • Started in 2007
      • Integrates and publishes various forms of archaeological documentation (structured data, media, documents)
      • Not a repository, but archived with California Digital Library
      • Interoperability via web services, increasing emphasis on Linked Data
  14. Data Publishing: Data Quality and Standards Alignment
      (1) Check consistency
      (2) Edit functions
      (3) Align to common standards (“Linked Data” if applicable)
      (4) Issue tracking, version control
  15. Data Publishing
      • Comprehensive (Kenan Tepe: 30K photos, documents, object descriptions)
      • Added capabilities (search, analysis, visualization)
      • More attractive, usable data
      • Interactions with data editors improve data
  16. • Citation provided for each item
      • CDL archival service to give permanence
  17. Beyond the Silo
      • Often too much emphasis on single systems; need to consider relationships across systems
      • Even if one reaches some scale, it can't be isolated from the rest of the Web
      • Machines are important “audiences” (e.g. RESTful Services: Atom, AtomPub, JSON, etc.)
  18. Linked Open Data
      Regarded as best practice for sharing data (among informatics researchers)
  19. Web of Data (2009)
      Growing, Decentralized Innovation
  20. Web of Data (2011)
  21. Web of Data (2011)
      Need Archaeology on the Map
      Contributions should not be isolated from other communities
  22. Open Context: Record
      • HTTP URIs to identify resources at a meaningful level of granularity (“a URL per potsherd”)
      • Use HTTP URIs published by others
      • URIs act as “primary keys” that allow data to be related
  23. Concept: Bos taurus (http://eol.org/pages/328699/)
  24. Concept: Bos taurus (http://eol.org/pages/328699/)
  25. Open Context: Record
  26. Open Context Entity Reconciliation
      Authors / Editors relate project-specific terminologies to global terminologies
      “Common name : Cattle, domestic” = http://eol.org/pages/328699/ (Bos taurus)
  27. Open Context Entity Reconciliation
      Authors / Editors relate project-specific terminologies to global terminologies
      Many project-specific terms related to global terminologies

      Project Specific Property    EOL Link (Global Terminology)
      Species : Sheep / Goat       http://eol.org/pages/2851411/ (Caprinae)
      Taxon : Bos taurus           http://eol.org/pages/328699/ (Bos taurus)
      Species : Deer               http://eol.org/pages/38816/ (Dama sp.)
      Type : Deer                  http://eol.org/pages/34547/ (Odocoileus sp.)
      Taxon : Ovis / Capra         http://eol.org/pages/2851411/ (Caprinae)
      Species : Cattle             http://eol.org/pages/34548/ (Bos taurus)
      Species : Goat               http://eol.org/pages/328660/ (Capra hircus)

  28. Open Context Entity Reconciliation
      Authors / Editors relate project-specific terminologies to global terminologies
      Many project-specific terms related to global terminologies
      Editorial work-flow helps annotate data for interoperability

      Project Specific Property    EOL Link (Global Terminology)
      Species : Sheep / Goat       http://eol.org/pages/2851411/ (Caprinae)
      Taxon : Bos taurus           http://eol.org/pages/328699/ (Bos taurus)
      Species : Deer               http://eol.org/pages/38816/ (Dama sp.)
      Type : Deer                  http://eol.org/pages/34547/ (Odocoileus sp.)
      Taxon : Ovis / Capra         http://eol.org/pages/2851411/ (Caprinae)
      Species : Cattle             http://eol.org/pages/34548/ (Bos taurus)
      Species : Goat               http://eol.org/pages/328660/ (Capra hircus)

  29. Data Publishing Projects
      EOL (2012) funding for publishing additional zooarchaeology datasets (Neolithic Anatolia), in project led by Ben Arbuckle (Baylor University)
  30. Data Publishing Projects
      NEH (2012) funding for publishing trade + exchange related datasets (Bronze-Iron Age Mediterranean)
  31. Data Publishing Projects
      Complement Conventional Publishing
      • Lockwood Press (“Archaeobiology Series”), Cotsen Institute Press (UCLA)
  32. Data Publishing Projects
      Driven by research interests and publication goals among researchers wanting to compare datasets, create reference collections, and have citable, full datasets linked to synthetic publications.
  33. Summary
      Outcomes of Publishing Data:
      (1) Make “datasets” first class citizens in world of scholarly communications
      (2) Provide needed transparency to published interpretations
      (3) Enable new kinds of multi-disciplinary research across many datasets
  34. Thank you! Special Thanks!
      Canan Çakırlar, RCAC, Koç University, ICAZ, and other sponsors
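
The entity reconciliation described on slides 26-28 can be pictured with a small sketch. This is only an illustration under stated assumptions, not Open Context's actual implementation: the lookup rows are copied from the slide table, while the function names, the project names, and the one-entry codebook (built from the "a 14 is a sheep" quote on slide 8) are hypothetical.

```python
from collections import defaultdict

# Rows copied from the reconciliation table on slides 27-28:
# (project-specific property, value) -> (EOL URI, EOL label)
RECONCILIATION = {
    ("Species", "Sheep / Goat"): ("http://eol.org/pages/2851411/", "Caprinae"),
    ("Taxon", "Bos taurus"): ("http://eol.org/pages/328699/", "Bos taurus"),
    ("Species", "Deer"): ("http://eol.org/pages/38816/", "Dama sp."),
    ("Type", "Deer"): ("http://eol.org/pages/34547/", "Odocoileus sp."),
    ("Taxon", "Ovis / Capra"): ("http://eol.org/pages/2851411/", "Caprinae"),
    ("Species", "Cattle"): ("http://eol.org/pages/34548/", "Bos taurus"),
    ("Species", "Goat"): ("http://eol.org/pages/328660/", "Capra hircus"),
}

# Hypothetical private codebook. Only "14 = sheep" comes from the slide 8 quote.
CODEBOOK = {"14": "Sheep"}


def reconcile(prop, value):
    """Return the EOL URI for a project-specific term, or None if unreconciled."""
    value = CODEBOOK.get(value, value)  # expand a private numeric code first
    match = RECONCILIATION.get((prop, value))
    return match[0] if match else None


def group_by_uri(records):
    """Group (project, property, value) records by shared URI, the 'primary key'."""
    groups = defaultdict(list)
    for project, prop, value in records:
        groups[reconcile(prop, value)].append(project)
    return dict(groups)


# Hypothetical records from three projects using different local vocabularies.
records = [
    ("Project A", "Species", "Sheep / Goat"),
    ("Project B", "Taxon", "Ovis / Capra"),
    ("Project C", "Species", "14"),
]
groups = group_by_uri(records)
```

In this sketch Projects A and B land under the same Caprinae URI despite using different local terms, while Project C's raw code "14" expands to "Sheep" but has no row in the table, so it falls under `None` and would still need editorial attention.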