A Publication Approach to
Linked Data in Archaeology
Eric C. Kansa
UC Berkeley / OpenContext.org
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution
3.0 License <http://creativecommons.org/licenses/by/3.0/>
• Started in 2007
• Open access / open data publishing for archaeology
• Archiving by California Digital Library
• Referenced by NSF and NEH for grant data management
My Precious Data
?
Data Sharing as Publication
• Several projects studying editorial + publishing workflows
• Current Funding: ACLS, NEH, Sloan, EOL
Web of Data
Cross-discipline Connections
Open Context links with
humanities data (CIDOC,
Pleiades, British Museum), and
natural sciences (EOL, UBERON)
Pelagios API
EOL Computable Data
Challenge
(Ben Arbuckle, Sarah Kansa,
Eric Kansa)
EOL Computable Data
Challenge
1. 15 different sites
2. 34 zooarchaeologists
3. Publishing: decoding, cleanup, metadata documentation
4. Linked Data annotation (EOL, UBERON, biometrics)
5. Collaborative analysis
6. Reuse itself studied by DIPIR.org (U. Michigan iSchool)
Data Publishing
Google / Open Refine
1. Check consistency
2. Edit functions
3. All changes logged, can be rolled back
Bibliography
• Bibliographic references expressed as Linked Data (modeled after S. Heath)
• Associates publication citation with Open Access variants
Why UBERON?
1. Expresses relevant expert knowledge, tremendous effort. Why ignore or duplicate this effort?
2. Anatomic entities related to embryology, genetic networks. New research opportunities for zooarch?
3. Zooarchaeology gains stakeholders (biometric data of wide interest)
“Ovis aries”
http://eol.org/pages/311906/
[Figure: local codes and labels for the same taxon: Code: 14; Domestic sheep; Code: 70; Code: 16; Ovis aries; Code: 15; Sheep; O. aries; Schaf; Sh.]
“Distal epiphysis unfused”
http://opencontext.org/vocabularies/open-context-zooarch/zoo-0058
[Figure: local codes and labels for the same observation: dist. unfused; d. uf.; 30; uf. dist., f. prox.; Distal epiph. unfused; Distal end unf.]
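The labels shown above all point at one shared identifier each. A minimal sketch of that annotation step follows, assuming a hand-built lookup table. The name `annotate` and the table layout are illustrative assumptions, not Open Context's implementation. Numeric codes are left out on purpose, since the slides do not say which project used which code.

```python
# URIs exactly as shown on the slides.
EOL_SHEEP = "http://eol.org/pages/311906/"
ZOO_DISTAL_UNFUSED = (
    "http://opencontext.org/vocabularies/open-context-zooarch/zoo-0058"
)

# Hypothetical lookup table built from the free-text labels on the slides.
# Bare numbers (14, 70, 30, ...) only have meaning inside one project's own
# code book, so they would need a separate table per project.
LABEL_TO_URI = {
    "ovis aries": EOL_SHEEP,
    "o. aries": EOL_SHEEP,
    "domestic sheep": EOL_SHEEP,
    "sheep": EOL_SHEEP,
    "schaf": EOL_SHEEP,
    "sh.": EOL_SHEEP,
    "distal epiphysis unfused": ZOO_DISTAL_UNFUSED,
    "distal epiph. unfused": ZOO_DISTAL_UNFUSED,
    "distal end unf.": ZOO_DISTAL_UNFUSED,
    "dist. unfused": ZOO_DISTAL_UNFUSED,
    "d. uf.": ZOO_DISTAL_UNFUSED,
}


def annotate(label):
    """Return the shared URI for a local label, or None if it is unknown."""
    # Normalize case and collapse runs of whitespace before the lookup.
    normalized = " ".join(label.lower().split())
    return LABEL_TO_URI.get(normalized)
```

Unknown labels return `None` rather than a guess, so they can be queued for an editor to review.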
Sheep/Goat Distal Femur Fusion
[Chart: percentage (0% to 100%) of unfused vs. fused specimens per site: Karain B Cave (N=53), Pınarbaşı (N=3), Çukuriçi Höyük (N=13), Suberde (N=0), Domuztepe (N=28), Ulucak (N=15)]
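Once records from different sites share the same fusion annotation, the per-site percentages in a chart like this reduce to simple arithmetic. The sketch below is illustrative only. The function name `fusion_percentages` is an assumption, and the counts in the usage note are invented, not taken from any of the sites.

```python
def fusion_percentages(fused, unfused):
    """Return (percent_fused, percent_unfused) for one site.

    Returns None when no specimens were recorded, since a site with N=0
    has no proportions to plot.
    """
    total = fused + unfused
    if total == 0:
        return None
    return (100.0 * fused / total, 100.0 * unfused / total)
```

For example, with made-up counts, `fusion_percentages(3, 1)` gives `(75.0, 25.0)`, and `fusion_percentages(0, 0)` gives `None`.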
“Distal epiphysis unfused”
http://opencontext.org/vocabularies/open-context-zooarch/zoo-0058
DIPIR: Data Documentation Practices
“I use an Excel spreadsheet…which I … inherited from my research
advisers. …my dissertation advisor was still recording data for each
specimen on paper when I was in graduate school so that's what I
started …then quickly, I was like, "This is ridiculous."… I just started
using an Excel spreadsheet that has sort of slowly gotten bigger and
bigger over time with more variables or columns…I've added …color
coding…I also use…a very sort of primitive numerical coding system,
again, that I inherited from my research advisers…So, this little book
that goes with me of codes which is sort of odd, but …we all know
that a 14 is a sheep.” (CCU13)
A long way to go before we
get usable, intelligible data
CC-BY (Eduardo Otubo)
http://www.flickr.com/photos/otubo/5091378744
SPARQL endpoint easy to break (too big of a graph to query).
Needed a work-around, so I also use the normal (“plain web”) index to query the British Museum.
(1) Keyword search for relevant term.
(2) Scrape results (blech!) for item identifiers (“objectid” parameter in URLs).
(3) Use ObjectIDs in SPARQL queries (limits size of graph queried, so server doesn’t die).
SELECT ?s ?oPart ?oThes ?oLab
WHERE
{
  ?s <http://collection.britishmuseum.org/id/crm/bm-extensions/codex_id> '$objectID' ;
     <http://collection.britishmuseum.org/id/crm/P46F.is_composed_of> ?oPart .
  ?oPart <http://collection.britishmuseum.org/id/crm/P45F.consists_of> ?oThes .
  ?oThes <http://www.w3.org/2004/02/skos/core#prefLabel> ?oLab .
} LIMIT 10
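Steps (2) and (3) of the workaround can be sketched roughly as below. This is a minimal illustration, not the code actually used. It assumes the scraped results expose identifiers as a numeric `objectid=` URL parameter, and the function names are hypothetical. Fetching the search page and sending the query are left out, since the slides do not give the endpoint details.

```python
import re
from string import Template

# The query from the slide, with $objectID as the placeholder to fill in.
QUERY = Template("""\
SELECT ?s ?oPart ?oThes ?oLab
WHERE
{
  ?s <http://collection.britishmuseum.org/id/crm/bm-extensions/codex_id> '$objectID' ;
     <http://collection.britishmuseum.org/id/crm/P46F.is_composed_of> ?oPart .
  ?oPart <http://collection.britishmuseum.org/id/crm/P45F.consists_of> ?oThes .
  ?oThes <http://www.w3.org/2004/02/skos/core#prefLabel> ?oLab .
} LIMIT 10
""")


def extract_object_ids(html):
    """Step (2): pull "objectid" values out of links in scraped results.

    Assumes the identifiers are numeric; duplicates are dropped while
    keeping first-seen order.
    """
    ids = re.findall(r"objectid=(\d+)", html, flags=re.IGNORECASE)
    return list(dict.fromkeys(ids))


def build_queries(html):
    """Step (3): one small SPARQL query per object ID."""
    return [QUERY.substitute(objectID=oid) for oid in extract_object_ids(html)]
```

Matching digits only also keeps scraped text from injecting anything unexpected into the query string. Each query touches a single object, which is the point of the workaround: the endpoint never has to search the whole graph.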
Why is linked
data important?
1. Improve data quality, expert curation of concepts + vocabularies
2. Develop ties with other research communities (can feed back to collect new / different data)
3. Increasingly sophisticated open source tools, support services
4. Part of the Web, not just on the Web
… but
participating
in Linked Data
requires
effort!
Image Credit: Copyright Newline Cinema
One does not simply
share usable data…
Data are challenging
1. “Raw data” often problematic, even with documentation (10X effort needed with decoded data)
2. Tension between modeling needs and familiarity with tools (Excel)
3. More work needed modeling research methods (esp. sampling, see DIPIR.org outcomes)
4. You’re never going to be done!

#LAWDI Open Context, publishing linked data in archaeology

  • 1. A Publication Approach to Linked Data in Archaeology. Eric C. Kansa, UC Berkeley / OpenContext.org. Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
  • 2. • Started in 2007 • Open access / open data publishing for archaeology • Archiving by California Digital Library • Referenced by NSF and NEH for grant data management
  • 3. My Precious Data?
  • 4.
  • 5. Data Sharing as Publication • Several projects studying editorial + publishing workflows • Current Funding: ACLS, NEH, Sloan, EOL
  • 6.
  • 7.
  • 8. Web of Data: Cross-discipline Connections. Open Context links with humanities data (CIDOC, Pleiades, British Museum) and natural sciences (EOL, UBERON)
  • 10. EOL Computable Data Challenge (Ben Arbuckle, Sarah Kansa, Eric Kansa)
  • 11. EOL Computable Data Challenge (1) 15 different sites (2) 34 zooarchaeologists (3) Publishing: decoding, cleanup, metadata documentation (4) Linked Data annotation (EOL, UBERON, biometrics) (5) Collaborative analysis (6) Reuse itself studied by DIPIR.org (U. Michigan ISchool)
  • 12. Data Publishing: Google / Open Refine (1) Check consistency (2) Edit functions (3) All changes logged, can be rolled back
  • 13.
  • 14. Bibliography • Bibliographic references expressed as Linked Data (modeled after S. Heath) • Associates publication citation with Open Access variants
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22. Why UBERON? (1) Expresses relevant expert knowledge, tremendous effort. Why ignore or duplicate this effort? (2) Anatomic entities related to embryology, genetic networks. New research opportunities for zooarch? (3) Zooarchaeology gains stakeholders (biometric data of wide interest)
  • 23.
  • 24.
  • 25. “Ovis aries” http://eol.org/pages/311906/ (diagram labels: Code: 14; Domestic sheep; Code: 70; Code: 16; Ovis aries; Code: 15; Sheep; O. aries; Schaf; Sh.)
  • 26.
  • 27.
  • 29. Sheep/Goat Distal Femur Fusion (chart: percentage unfused vs. fused, 0% to 100%, by site): Karain B Cave (N=53), Pınarbaşı (N=3), Çukuriçi Höyük (N=13), Suberde (N=0), Domuztepe (N=28), Ulucak (N=15)
  • 31.
  • 32.
  • 33.
  • 34.
  • 35. DIPIR: Data Documentation Practices. “I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, "This is ridiculous."… I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13) A long way to go before we get usable, intelligible data
  • 37. SPARQL endpoint easy to break (too big of a graph to query). Needed a work-around, so I also use the normal (“plain web”) index to query the British Museum.
  • 38. (1) Keyword search for relevant term. (2) Scrape results (blech!) for item identifiers (“objectid” parameter in URLs) (3) Use ObjectIDs in SPARQL queries (limits size of graph queried, so server doesn’t die).
  • 39. SELECT ?s ?oPart ?oThes ?oLab WHERE { ?s <http://collection.britishmuseum.org/id/crm/bm-extensions/codex_id> '$objectID'; <http://collection.britishmuseum.org/id/crm/P46F.is_composed_of> ?oPart. ?oPart <http://collection.britishmuseum.org/id/crm/P45F.consists_of> ?oThes. ?oThes <http://www.w3.org/2004/02/skos/core#prefLabel> ?oLab. } LIMIT 10
  • 40. Why is linked data important? (1) Improve data quality, expert curation of concepts + vocabularies (2) Develop ties with other research communities (can feedback to collect new / different data) (3) Increasingly sophisticated open source tools, support services (4) Part of the Web, not just on the Web
  • 41. Why is linked data important? … but participating in Linked Data requires effort!
  • 42. Image Credit: Copyright New Line Cinema
  • 43. One does not simply share usable data…
  • 44. Data are challenging (1) “Raw data” often problematic, even with documentation (10X effort needed with decoded data) (2) Tension between modeling needs and familiarity with tools (Excel) (3) More work needed modeling research methods (esp. sampling, see DIPIR.org outcomes) (4) You’re never going to be done!
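
The British Museum work-around on slides 37 to 39 (keyword search, scrape the "objectid" parameter from result URLs, then run one small SPARQL query per object) could be sketched in Python roughly as follows. This is only an illustration, not the code used by Open Context: the sample HTML, its link paths, the function names, and the assumption that object IDs are purely numeric are all hypothetical, and the step that actually sends each query to the endpoint is left out.

```python
import html
import re
from string import Template
from urllib.parse import parse_qs, urlparse

# SPARQL query from slide 39; $objectID is filled in per object.
QUERY_TEMPLATE = Template("""
SELECT ?s ?oPart ?oThes ?oLab WHERE {
  ?s <http://collection.britishmuseum.org/id/crm/bm-extensions/codex_id> '$objectID' ;
     <http://collection.britishmuseum.org/id/crm/P46F.is_composed_of> ?oPart .
  ?oPart <http://collection.britishmuseum.org/id/crm/P45F.consists_of> ?oThes .
  ?oThes <http://www.w3.org/2004/02/skos/core#prefLabel> ?oLab .
} LIMIT 10
""")

# Crude link extraction, in the spirit of the "scrape results" step.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def extract_object_ids(page_html):
    """Return distinct 'objectid' URL parameter values, in page order."""
    ids = []
    for href in HREF_PATTERN.findall(page_html):
        query_string = urlparse(html.unescape(href)).query
        for key, values in parse_qs(query_string).items():
            if key.lower() != "objectid":
                continue
            for value in values:
                # Assumption: IDs are numeric. Rejecting anything else also
                # keeps stray characters out of the SPARQL string literal.
                if value.isdigit() and value not in ids:
                    ids.append(value)
    return ids


def build_queries(object_ids):
    """Build one small query per object ID, keeping each query's scope narrow."""
    return [QUERY_TEMPLATE.substitute(objectID=oid) for oid in object_ids]


# Hypothetical search-result markup; real pages will differ.
sample_page = """
<a href="/hypothetical/details?objectid=12345&amp;partid=1">Item A</a>
<a href="/hypothetical/details?objectid=67890&amp;partid=1">Item B</a>
<a href="/hypothetical/details?objectid=12345&amp;partid=2">Item A, part 2</a>
<a href="/hypothetical/about">About</a>
"""

object_ids = extract_object_ids(sample_page)
queries = build_queries(object_ids)
```

Issuing one query per object ID, rather than one query over the whole collection, is what keeps each request small enough for the endpoint to answer.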