Beyond Preservation: 
Situating Archaeological Data in 
Professional Practice 
Eric C. Kansa (@ekansa) 
UC Berkeley D-Lab 
& Open Context 
2014-2015 Harvard Center for 
Hellenic Studies & German 
Archaeological Institute 
Research Fellow 
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 
License <http://creativecommons.org/licenses/by/3.0/>
Data Sharing as Publication 
• Started in 2007 
• Open data (mainly CC-BY) 
• Archiving by California Digital 
Library 
• Part of a broader reform 
movement in scholarly 
communications
Introduction 
Visions for Digital 
Data in Archaeology 
1. “Optimizing the status quo” 
2. Opportunity for fundamentally better 
ways to conduct and communicate 
research
Introduction 
Digital Data in Archaeology 
1. Why discuss data? 
2. Data in (bad) institutional contexts 
3. Open Context's approach 
4. Need for more & wider intellectual 
investment
Data source: Arif Jinha (2010). Article 50 million: an estimate of the number of scholarly articles in 
existence. Learned Publishing, 23 (3), 258-263. DOI: 10.1087/20100308. 
Image Source: http://www.cs.cmu.edu/~comar/open-science/
Paper and paper-like 
digital files (PDFs) do 
not scale well: 
● Discovery 
● Reuse 
Image Credit: Wikimedia Commons (Public Domain) 
http://commons.wikimedia.org/wiki/File:Archives_entreprises.jpg
Image Credit: Wikimedia Commons (CC-BY-SA) 
http://commons.wikimedia.org/wiki/File:BigData_2267x1146_white.png
Lots of investment in 
“Big Data” 
● Corporate 
● Government 
● 'STEM' academia
Image Credit: 'gin soak' (CC-BY-NC-ND) 
https://www.flickr.com/photos/gin_soak/2215398726 
Structured Data – Creativity 
1. New forms of communication 
2. New forms of collaboration 
3. New research opportunities
'Mash-ups' 
(informal 
integrations) 
Open Context & 
Arachne
Experiment in open, distributed 
post-publication peer-review
Text-mining literature to identify 
references to ancient places 
2010 (renewed 2012) Google Digital Humanities Awards: with 
Elton Barker, Leif Isaksen, Kate Byrne, Nick Rabinowitz
Project limited to public domain 
(pre-1920) resources
Introduction 
Digital Data in Archaeology 
1. Why discuss data? 
2. Data in (bad) institutional contexts 
3. Open Context's approach 
4. Need for more & wider intellectual 
investment
Commercial interests and 
public policy 
Conditions of 
academic labor 
Neoliberalism: 
(Loosely associated ideologies / 
assumptions / interests)
Source: The Occasional Pamphlet - Harvard University 
(http://blogs.law.harvard.edu/pamphlet/2013/01/29/why-open-access-is-better-for-scholarly-societies/)
Neoliberalism: 
Taylorism, 
“Audit Culture” and fierce 
job/grant competition 
Data contributions don’t 
count! 
Image Credit: Wikimedia Commons (Public Domain) 
http://en.wikipedia.org/wiki/Frederick_Winslow_Taylor#mediaviewer/File:Frederick_Winslow_Taylor_crop.jpg
Ironies of data: Publications 
counted as data, but data don’t 
count!
☹Frowns at 
Many researchers (esp. junior 
scholars) lack academic freedom
My Precious Data 
Image Credit: “Lord of the Rings” (2003, New Line), 
All Rights Reserved Copyright
Data sharing as 
compliance
Need more carrots! 
1. Citation, credit, intellectually 
valued 
2. Research outcomes (new 
insights from data reuse!)
Adapt Academic Taylorism: 
● DataCite (metadata, citation 
for datasets) 
● Alt-metrics (social media, 
view counts, download 
counts, etc.) 
Make data count!
Introduction 
Digital Data in Archaeology 
1. Why discuss data? 
2. Data in (bad) institutional contexts 
3. Open Context's approach 
4. Need for more & wider intellectual 
investment
Data Sharing as Publication 
• Started in 2007 
• Open data (mainly CC-BY) 
• Archiving by California Digital 
Library 
• Part of a broader reform 
movement in scholarly 
communications
Publishing Workflow 
Improve / Enhance 
1. Consistency 
2. Context (intelligibility, 
interoperability)
Digital Index of North American 
Archaeology (DINAA) 
1. Rich metadata (cultures, 
chronology, site-types) 
2. Reduced precision location data 
(site security, legal) 
3. Data modeling challenges (using 
GeoJSON-LD, CIDOC-CRM, 
event models)
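As a rough illustration of point 2 (reduced precision location data), here is a minimal Python sketch that snaps a site's coordinates to the center of a coarse grid cell before writing a GeoJSON feature. The function names, the 0.25 degree cell size, the site identifier, and the "location_precision_deg" property are all assumptions made for this example. They are not taken from DINAA or Open Context.

```python
import json
import math


def reduce_precision(lat, lon, cell_deg=0.25):
    """Snap a point to the center of the grid cell that contains it.

    cell_deg is an arbitrary, illustrative cell size in decimal degrees.
    """
    snapped_lat = (math.floor(lat / cell_deg) + 0.5) * cell_deg
    snapped_lon = (math.floor(lon / cell_deg) + 0.5) * cell_deg
    return snapped_lat, snapped_lon


def site_feature(site_id, lat, lon, cell_deg=0.25):
    """Build a GeoJSON Feature that exposes only the reduced location."""
    lat_c, lon_c = reduce_precision(lat, lon, cell_deg)
    return {
        "type": "Feature",
        "id": site_id,
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon_c, lat_c]},
        # Hypothetical property recording how coarse the location is.
        "properties": {"location_precision_deg": cell_deg},
    }


if __name__ == "__main__":
    # "site-0001" and its coordinates are made-up example values.
    print(json.dumps(site_feature("site-0001", 35.96, -83.92), indent=2))
```

Every site inside the same cell ends up with the same published point, so the public record supports regional mapping without disclosing where to dig.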
Using site file 
data to 
examine the 
impacts of sea 
level rise 
In 100 years, 19,676 
sites will be covered!
Digital Index of North American 
Archaeology (DINAA) 
1. ~ 500,000 site records curated by 
state officials 
2. Key (Linked Data!) reference for N. 
American archaeology 
3. PIs/Co-PIs: David G. Anderson, 
Joshua Wells, Eric Kansa, Sarah 
Kansa, Stephen Yerka
Stable Web URI: 
Reference this to disambiguate between 
“Alexandria” (Egypt) and other places 
called “Alexandria” (many of which are 
also ancient)
Pelagios: 
Heat map of museum collections, 
archives, databases referencing places 
in Pleiades 
(PIs Leif Isaksen, Elton Barker)
Web of Data (2011) 
Need Archaeology on the Map 
Contributions should not be isolated 
from other communities
Linked Data: 
Annotations to community vocabularies 
part of Open Context editorial process
Introduction 
Digital Data in Archaeology 
1. Why discuss data? 
2. Data in (bad) institutional contexts 
3. Open Context's approach 
4. Need for more & wider intellectual 
investment
“I just started using an Excel spreadsheet that 
has sort of slowly gotten bigger and bigger 
over time with more variables or columns…I've 
added …color coding…I also use…a very sort of 
primitive numerical coding system, again, that I 
inherited from my research advisers…So, this 
little book that goes with me of codes which is 
sort of odd, but …we all know that a 14 is a 
sheep.” (CCU13) 
Need to do more than 
“Optimize the Status Quo”
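The private code book in the quote above (where "14 is a sheep") can be made explicit and shareable. The Python sketch below decodes a local numeric code into a readable label plus a vocabulary URI. Only the pairing of 14 with sheep comes from the quote. The field names, the second code, and the example.org URIs are placeholders invented for illustration, not identifiers from any real vocabulary.

```python
# Illustrative code book. Only 14 -> sheep comes from the interview quote;
# the entry for 15 and all URIs are hypothetical placeholders.
CODE_BOOK = {
    14: {"label": "sheep", "uri": "https://example.org/vocab/taxon/sheep"},
    15: {"label": "goat", "uri": "https://example.org/vocab/taxon/goat"},
}


def annotate(record, code_book=CODE_BOOK):
    """Return a copy of the record with a label and URI for its taxon code.

    Unknown codes are kept but flagged with None, so they can be reviewed
    by an editor instead of being silently dropped.
    """
    entry = code_book.get(record.get("taxon_code"))
    annotated = dict(record)
    annotated["taxon_label"] = entry["label"] if entry else None
    annotated["taxon_uri"] = entry["uri"] if entry else None
    return annotated


if __name__ == "__main__":
    rows = [
        {"specimen": "B-001", "taxon_code": 14},
        {"specimen": "B-002", "taxon_code": 99},  # not in the code book
    ]
    for row in rows:
        print(annotate(row))
```

Keeping the original code alongside the annotation means the contributor's recording system stays intact while other researchers no longer need the "little book" to read the data.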
Raw Data Can Be Unappetizing
Sometimes data is better 
served cooked
Large scale data sharing & 
integration for exploring the 
origins of farming. 
Funded by EOL / NEH
1. 300,000 bone specimens 
2. Complex: dozens, up to 110 
descriptive fields 
3. 34 contributors from 15 
archaeological sites 
4. More than 4 person-years of 
effort to create the data!
6500 BC (few pigs, mixing with wild animals?) 
7500 BC (sheep + goat dominate, few pigs, few cattle) 
7000 BC (many pigs, cattle) 
8000 BC (cattle, pigs, 
sheep + goats) 
• Not a neat model of progress to adopt a more productive economy. Very 
different, sometimes piecemeal adoption in different regions. 
Arbuckle BS, Kansa SW, Kansa E, Orton D, Çakırlar C, et al. (2014) Data Sharing Reveals Complexity in 
the Westward Spread of Domestic Animals across Neolithic Turkey. PLoS ONE 9(6): e99845. 
doi:10.1371/journal.pone.0099845
Easy to Align 
1. Animal taxonomy 
2. Skeletal elements 
3. Sex determinations 
4. Side of the animal 
5. Fusion (bone growth, up to a 
point)
Hard to Align (poor modeling, recording) 
1. Tooth wear (age) 
2. Fusion data 
3. Measurements 
Despite common research methods!!
“Under the hood” exposure 
and reuse attempts critical! 
Fundamental method & theory 
issues in data modeling!
Investing in Data is a Continual Need 
1. Data and code co-evolve. New 
visualizations, analysis may reveal unseen 
problems in data. 
2. Data and metadata change routinely 
(revised stratigraphy requires ongoing 
updates to data in this analysis) 
3. Problems, interpretive issues in data (and 
annotations) keep cropping up. 
4. Is publishing a bad metaphor implying a 
static product?
Data sharing as publication 
Data sharing as open source 
release cycles?
Data sharing as publication 
AND 
Data sharing as open source 
release cycles
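One way to picture the "release cycle" idea is a dataset whose versions are tracked the way software releases are. The Python sketch below is only an illustration under stated assumptions: the function names, the shape of the release entries, and the sample records are invented here and do not describe how Open Context versions data. It fingerprints a dataset and records a new release only when the content has actually changed.

```python
import hashlib
import json


def fingerprint(records):
    """Hash a canonical JSON form of the records (key order does not matter)."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def release(history, records, note):
    """Return the release history, extended only if the data changed."""
    digest = fingerprint(records)
    if history and history[-1]["sha256"] == digest:
        return history  # identical content, so no new release
    entry = {"version": len(history) + 1, "sha256": digest, "note": note}
    return history + [entry]


if __name__ == "__main__":
    # Made-up records standing in for a published dataset.
    v1 = [{"id": 1, "phase": "IIa"}]
    v2 = [{"id": 1, "phase": "IIb"}]  # e.g. after a stratigraphy revision

    history = release([], v1, "initial publication")
    history = release(history, v1, "no change")
    history = release(history, v2, "revised stratigraphy")
    for entry in history:
        print(entry["version"], entry["sha256"][:12], entry["note"])
```

A citation can then point at a specific version, which keeps the stability expected of a publication while allowing the corrections and updates described in the preceding slides.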
Go beyond Optimization 
of the Status Quo 
More to data than 'compliance' 
Data require intellectual investment, 
methodological and theoretical 
innovation. 
New professional roles needed, but 
who will pay for it?
Thank you! 
Special Thanks! 
Harvard Center for Hellenic 
Studies & the German 
Archaeological Institute (DAI)
