IASSIST Kansa Presentation

A presentation given at the "Data Stewardship: Increasing the Integrity and Effectiveness of Science and Scholarship" session on Friday, June 8, 2012, at the IASSIST 2012 conference in Washington, DC.

This presentation introduced data publishing, using a social science (archaeology) case study to explore editorial processes and dissemination outcomes that increasingly demand “Linked Data” capabilities.

  1. Case-Study: Publishing to the “Web of Data” in Archaeology: Quality and Workflows. Eric Kansa, UC Berkeley / OpenContext.org. Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
  2. “Small Science” data sharing is hard: (1) Complexity (2) Scalability (3) Ethics, cultural property claims, IP (4) Incentives (5) Preservation. Image Credit: “Grand Canyon NPS” via Flickr (CC-By) http://www.flickr.com/photos/grand_canyon_nps/5975537378/
  3. Thousand Flowers ● Open Context: Open access, open licensed data for archaeology ● Archiving by California Digital Library ● Persistent Identifiers (DOIs, ARKs) ● Web services ● NSF/NEH links for data management plans
  4. Thousand Flowers. Fills a Gap: Most data sources are institutional. Open Context publishes individual, small group contributions.
  5. Thousand Flowers. Fills a Gap: Most data sources are institutional. Open Context publishes individual, small group contributions. Challenge: Diverse contributions, needing lots of work to clean up and “link” to the Web of Data.
  6. • 3-year project, Oct 2010 – Sep 2013 • Funded with a National Leadership Grant from the Institute for Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse” • Ixchel Faniel, PI & Elizabeth Yakel, Co-PI. http://www.dipir.org
  7. DIPIR Collaboration
  8. The Big DIPIR Questions. Research Questions: 1. What are the significant properties of data that facilitate reuse by the designated communities at the three sites? 2. How can these significant properties be expressed as representation information to ensure the preservation of meaning and enable data reuse?
  9. Open Context Interviewees • 22 Ph.D. or graduate students interviewed – 13 men – 9 women • Novices / Experts – 19 experts – 3 novices • Interviewees who were curators or professors also with a curatorial role = 6
  10. Raw Data is Unappetizing?
  11. Data Documentation Practices. “I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, "This is ridiculous." … I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)
  12. Data Documentation Practices. “I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, "This is ridiculous." … I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13) A long way to go before we get usable, intelligible data.
  13. Sometimes data is better served cooked.
  14. Thousand Flowers ● Clean up and document contributed data ● Map to ArchaeoML (general ontology) ● Mint URIs for entities (potsherds, projects, contexts, people) ● Link to important vocabularies / collections (Pleiades, Encyclopedia of Life) ● Working on CIDOC-CRM (RDF) representations (not straightforward)
  15. Open Context: Record
  16. Open Context: Record ● XHTML + RDFa (Dublin Core, Open Annotation, etc.) ● XML (ArchaeoML) ● Atom ● RDF (draft CIDOC) ● Link to GitHub versioned file
  17. Open Context: Record
  18. Open Context: Record
  19. Open Context: Visualization of Data Linked to the EOL
  20. My Precious Data. Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright
  21. Data sharing as publication
  22. Data Publishing
  23. Publishing: Data Quality and Standards Alignment (1) Check consistency (2) Edit functions (3) Align to common standards (“Linked Data” if applicable) (4) Issue tracking, version control
  24. Publishing: Tools of the Trade (1) Google Refine (check, edit, consistency) (2) Mantis (issue-tracker, coordinate edits, metadata creation)
  25. Publishing: Tools of the Trade (1) Domain scientists (Editorial Board) check data (2) Iterative “coproduction” between contributors and editors
  26. Publishing: Project Metadata, Column Descriptions
  27. Web of Data (2011). Main Contributors: ● Institutions (esp. government) ● Thematic collections / projects
  28. Publishing: Entity Reconciliation (1) With Google Refine (2) Implemented, EOL and Pleiades (gazetteer) (3) Use existing mappings to improve future reconciliation
  29. ● CDL Archiving Service ● EZID for persistent identity: DOIs (aggregate resources), ARKs (granular resources) and Merritt Repository ● Helps build trust in community
  30. CDL as Infrastructure ● Platform / Services disciplinary communities can use for “Data Publishing” ● Different communities work out semantic/interoperability needs, editorial policies, incentives, etc. Diagram label: University of California (System) Repository, All disciplines (UC-funded library, grants)
  31. CDL as Infrastructure ● Platform / Services disciplinary communities can use for “Data Publishing” ● Different communities work out semantic/interoperability needs, editorial policies, incentives, etc. Diagram labels: Future data publisher; Future data publisher; University of California (System) Repository, All disciplines (UC-funded library, grants)
  32. eScholarship: UC’s OA Publishing Platform
  33. Platform for traditional publishing
  34. Also supports new genres
  35. Summary Outcomes of Publishing Data: (1) Communicate and set expectations about content and quality (2) Organize workflows to improve data quality and usability (3) Make “datasets” first class citizens in world of scholarly communications
  36. Final Thoughts. Publication needs to evolve! (1) Participating in Linked Data is a great goal, but far removed from most everyday practice (2) Researchers need help. (3) 19th century publication norms poorly suited to 21st century methods, research, public goals
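
The editorial steps named on slides 23 and 28 (check consistency, reconcile entities, reuse existing mappings) can be illustrated with a small Python sketch. It is not Open Context's workflow, which the slides describe as using Google Refine and Mantis. The codebook, the `editorial_pass` function, the `Report` type, and the example.org URIs are all hypothetical; only "a 14 is a sheep" comes from the interview quote on slide 11.

```python
from dataclasses import dataclass, field

# Hypothetical codebook. Only "14 is a sheep" comes from the interview quote;
# code 15 is invented for illustration.
CODEBOOK = {14: "Sheep", 15: "Goat"}

# Hypothetical label-to-URI mappings kept from earlier reconciliation work.
# The example.org URIs are placeholders, not real vocabulary identifiers.
MAPPINGS = {"sheep": "https://example.org/taxon/sheep"}


@dataclass
class Report:
    linked: dict = field(default_factory=dict)  # row index -> vocabulary URI
    issues: list = field(default_factory=list)  # (row index, message) pairs


def normalize(label):
    """Lower-case and collapse whitespace so label variants compare equal."""
    return " ".join(label.lower().split())


def editorial_pass(rows, codebook, mappings):
    """Check each coded cell, then link it to a vocabulary URI if possible."""
    report = Report()
    for index, raw in enumerate(rows):
        # Consistency check: the cell must hold an integer code.
        try:
            code = int(str(raw).strip())
        except ValueError:
            report.issues.append((index, f"non-numeric code {raw!r}"))
            continue
        # Consistency check: the code must be documented in the codebook.
        label = codebook.get(code)
        if label is None:
            report.issues.append((index, f"code {code} is not in the codebook"))
            continue
        # Reconciliation: look the label up in the existing mappings.
        uri = mappings.get(normalize(label))
        if uri is None:
            report.issues.append((index, f"no mapping yet for {label!r}"))
            continue
        report.linked[index] = uri
    return report


rows = [14, " 14", "sheep?", 99, 15]
first = editorial_pass(rows, CODEBOOK, MAPPINGS)

# An editor resolves the unmatched label once; later passes reuse the decision.
MAPPINGS[normalize("Goat")] = "https://example.org/taxon/goat"
second = editorial_pass(rows, CODEBOOK, MAPPINGS)
```

In this sketch the first pass links the two sheep rows and raises three issues for an editor to follow up. Once the goat mapping is recorded, the second pass links that row as well, which is the sense in which existing mappings improve future reconciliation.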
