An Open Context for Archaeology   Publishing Research Data on the Web  Eric Kansa UC Berkeley School of Information Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
Today My background Sharing Field Documentation Open Context Unresolved Issues and Next Steps
Personal Background Anthropology Cultural Archaeology (social) Co-founder of the AAI, a “.org” Currently Executive Director of ISD Program
Career Directions Frustrated with the practice of archaeology Data sharing hard / nonexistent Publication = paper Impressionistic, hard to verify claims Opportunity for research Focus on data sharing / communication
Independent NGO / nonprofit corporation  dedicated to promoting open content for cultural heritage research and education Explore approaches for the community to share (technical, copyright, academic) NOT a repository:  Promoting data sharing by creating tools, methods, and exemplars.
Today My background Sharing Field Documentation Open Context Unresolved Issues and Next Steps
Why Focus on Field Projects and Collections? New Research Opportunities Encourage use and reuse of primary evidence Enable broad scale, analytically rigorous investigations Reduce costs and enhance effectiveness of preservation & access Informal estimates: 15-27% of research ever gets published 1,2, often in inaccessible formats 1 James H. Ottaway, Jr. “Publish or Be Damned”, a lecture presented for the University of Cincinnati Classics Department, 5/2001. 2 Morag Kersel. “Publishing the Past: Some Shocking Statistics”, a lecture presented for the American Schools of Oriental Research annual conference, 2005.
Why Primary Research Content? Bumpus (1898) House Sparrow Data Carey Bumpus published all of his  raw data  along with his syntheses 10 subsequent groundbreaking papers reanalyzed these data Invaluable dataset used for instruction Key Point:  Dataset becomes 10X more valuable with dissemination!
Ads Screenshot
The Conceptual Challenge The content of field documentation (represented in spreadsheets and databases) varies greatly Discipline has 1 foot in humanities, 1 in sciences Archaeological documentation is also rich in media and narrative text
Our Approach Explore ways to pool data without overly constraining standards Find / create tools for non-tech expert use and contribution Find / create tools that enable casual browsing / exploratory analyses
Our Approach Stay cost-effective!  Most archaeological data sharing initiatives are site / project specific. More general solutions needed
Global Schemas:  ArchaeoML Simple, general schema makes it easier to pool diverse content Not overly determined, support multiple research agendas Hard to implement .  But we’re gaining experience UML Diagram of a subset of ArchaeoML
Other Web Initiatives  Web resources using highly generalized data structures see growing popularity Example: OpenRecord  Dojo Foundation (leading open source AJAX) “ Wiki for databases” Data expressed in RDF triples (queried with SPARQL) Likely needs some added meaningful structure to facilitate discipline specific use OpenRecord (www.openrecord.org)
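The idea of expressing data as triples can be illustrated with a minimal sketch. This is plain Python pattern matching, not SPARQL and not OpenRecord's actual data model; the predicate names ("taxon", "foundIn") are hypothetical, and the item labels are borrowed from the example items used later in this talk.

```python
# Each statement is a (subject, predicate, object) triple.
# Predicate names are hypothetical, chosen only for illustration.
triples = [
    ("Bone 231", "taxon", "Ovis aries"),
    ("Bone 231", "foundIn", "Lot 1939"),
    ("Pot 232", "foundIn", "Lot 1939"),
]


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Which items were found in Lot 1939?"
found = [subject for subject, _, _ in match(p="foundIn", o="Lot 1939")]
```

Because every fact has the same three-part shape, very different datasets can share one store; the cost, as noted above, is that discipline-specific meaning has to be layered on top.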
Other Web Initiatives  Freebase (freebase.com)
Other Web Initiatives  Freebase (freebase.com) More about this later…
Today My background Sharing Field Documentation Open Context Unresolved Issues and Next Steps
OCHRE, Open Context OCHRE: Fully supports the ArchaeoML global schema using a native XML database and a free Java client Open Context: Uses a subset of the ArchaeoML schema (via MySQL/PHP) for web-browser access and Internet search engine indexing. Common services, including complex querying and analysis for pooled content
General Approach “ Organic” Development  Originally planned just to use OCHRE PHP/MySQL:  Drives many dynamic content websites, relatively simple standard technology. “Bleeding edge” difficult for our target community. Open to Search Engines:  Increasingly important research tools (Harley 2006) Easy integration of Open Source Tools  (RSS-feeds, ping-back, etc.) 2   Diane Harley, Sarah Earl-Novell, Jennifer Arter, Shannon Lawrence, and C. Judson King, Jr. “The Influence of Academic Values on Scholarly Publication and Communication Practices”, Research & Occasional Paper Series: CSHE.13.06 <http://cshe.berkeley.edu/publications/docs/ROP.Harley.AcademicValues.13.06.pdf>.
Faceted Browse  Data from multiple projects browsed, queried (even with Boolean algebra), and results pooled together
Records in Open Context
Media Record
Records in Open Context Contextual relationships: (Spatial containment)
Records in Open Context Contextual relationships: (Stratigraphy)
Ownership in Open Context Copyright ownership and Creative Commons license information, including metadata Internet-wide standard metadata, links ownership & permissions
Ownership in Open Context Citation information with stable URL direct to the item being cited.
Ownership in Open Context Zotero  (www.zotero.org) uses COinS (a micro-format) metadata to make bibliographic references
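For readers unfamiliar with COinS, the following is a minimal sketch of how a citation can be embedded in a page so that tools such as Zotero can detect it. COinS places an OpenURL ContextObject in the title attribute of an empty span with class "Z3988". The function name, the choice of Dublin Core fields, and the example URL are assumptions for illustration, not Open Context's actual output.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, creator, identifier):
    """Build a COinS <span> carrying Dublin Core citation metadata."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),                 # OpenURL version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),  # Dublin Core format
        ("rft.title", title),
        ("rft.creator", creator),
        ("rft.identifier", identifier),
    ]
    context_object = urlencode(fields)
    # The query string lives in an HTML attribute, so "&" must be escaped.
    return '<span class="Z3988" title="{}"></span>'.format(
        escape(context_object, quote=True)
    )


# Hypothetical item and URL, used only as sample input.
span = coins_span("Bone 231", "Example Project",
                  "http://example.org/items/bone-231")
```

The span is invisible to human readers, so the same page can serve both people and citation software without a separate export step.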
Complex Querying  Data from multiple projects can be queried (with Boolean algebra), and results pooled together
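Boolean querying with pooled results can be sketched with ordinary set operations. This is an illustration only, not Open Context's PHP/MySQL implementation; the item labels come from the example slides in this talk, but which property belongs to which item is an assumption.

```python
# Records from two projects, each keeping its own descriptive fields.
# Property assignments are assumed for illustration.
records = [
    {"project": "Domuztepe", "id": "Bone 231",
     "Taxon": "Ovis aries", "Element": "metacarpal"},
    {"project": "Domuztepe", "id": "Pot 232", "Material": "ceramic"},
    {"project": "Pinarbasi", "id": "Find ID1-A",
     "Material": "ceramic", "Type": "Spindle-whorl"},
]


def match(field, value):
    """Return the set of (project, id) keys whose field equals value."""
    return {(r["project"], r["id"]) for r in records if r.get(field) == value}


ceramic = match("Material", "ceramic")
sheep = match("Taxon", "Ovis aries")

# OR: pool everything that satisfies either condition, across projects.
pooled = sorted(ceramic | sheep)

# AND: narrow to items satisfying both conditions.
ceramic_whorls = sorted(ceramic & match("Type", "Spindle-whorl"))
```

Each condition yields a set of item keys, so AND, OR, and NOT reduce to intersection, union, and difference, and results from different projects pool naturally.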
Summary Statistics
Making Meaningful Links  ArchaeoML essentially describes a network of atomic units and their relationships Units and their links typically derived from source data Domuztepe Lot 1939 Bone 231 Pot 232 Pot 233 Pinarbasi Cave Unit A Find ID1-A Find ID2-A Find ID3-A Taxon:  Ovis aries Modification: Ground Point Element:  metacarpal Material:  ceramic Color: Buff-orange Type:  Spindle-whorl
Making Meaningful Links Current (limited) approach with “tags” Assigned to 1 item or a whole set of items (esp. a query selection set) Express a meaningful link between items Domuztepe Lot 1939 Bone 231 Pot 232 Pot 233 Pinarbasi Cave Unit A Find ID1-A Find ID2-A Find ID3-A Taxon:  Ovis aries Modification: Ground Point Element:  metacarpal Material:  ceramic Color: Buff-orange Type:  Spindle-whorl “ Weaving tool”
Future Extensions  Extend tagging concept for more structure Users can apply variable/value pairs. Assign calendar dates to items Apply more sophisticated ontologies / thesauri (Getty?) Domuztepe Lot 1939 Bone 231 Pot 232 Pot 233 Pinarbasi Cave Unit A Find ID1-A Find ID2-A Find ID3-A Taxon:  Ovis aries Modification: Ground Point Element:  metacarpal Material:  ceramic Color: Buff-orange Type:  Spindle-whorl “ Tool Type”: “ Weaving tool”
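The step from plain tags to variable/value pairs can be sketched as follows. This is a hypothetical in-memory model, not the Open Context tagging system; the item labels and property values come from the slide, but the assignment of properties to particular items is an assumption.

```python
from collections import defaultdict

# Items from two different projects, described in each project's own terms.
# Which properties belong to which item is assumed for illustration.
items = {
    "Domuztepe/Bone 231": {"Taxon": "Ovis aries", "Element": "metacarpal",
                           "Modification": "Ground Point"},
    "Pinarbasi/Find ID2-A": {"Material": "ceramic", "Color": "Buff-orange",
                             "Type": "Spindle-whorl"},
}

# (variable, value) -> set of item ids; variable is None for a plain tag.
annotations = defaultdict(set)


def tag(item_ids, value, variable=None):
    """Apply a tag, or a variable/value pair, to one item or a whole set."""
    for item_id in item_ids:
        if item_id not in items:
            raise KeyError(item_id)
        annotations[(variable, value)].add(item_id)


def tagged(value, variable=None):
    """Return the sorted ids of all items carrying the given annotation."""
    return sorted(annotations.get((variable, value), set()))


# Current approach: a bare tag linking items across projects.
tag(items.keys(), "Weaving tool")
# Future extension: the same link, now with an explicit variable.
tag(items.keys(), "Weaving tool", variable="Tool Type")
```

Neither project recorded a "Tool Type" field, yet both items can now be retrieved together, which is the kind of meaningful cross-project link the tags are meant to express.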
Using Tagged Sets Pingback:  Register of a link made to a set tagged as “weaving tools” from a weblog
Integration at a General Level Speed and ease of mapping content into ArchaeoML systems Significant cost reduction if most contributors can do it themselves Important for small, individual or project generated research Enables powerful query and analysis across multiple projects But  NOT  very specific.  Example:  Composing queries still uses each project’s  local  recording system (even though several projects can be queried simultaneously and their results pooled)
Schema Mapping into ArchaeoML The importer is an important part: most people work with Excel, FileMaker, Access… Goal: Individuals can upload their own data, map them into ArchaeoML, and submit them for review and publishing
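The mapping step can be sketched roughly as below. The output here is a generic Python structure, not real ArchaeoML; the column names, the mapping keys, and the sample rows are all hypothetical. The point is only that a contributor declares which columns hold labels, contexts, and descriptive properties, and that the declaration itself is saved.

```python
import csv
import io
import json

# A contributor-supplied mapping: which spreadsheet column plays which role.
# All names here are hypothetical.
mapping = {
    "item_label": "Find No.",
    "context": "Trench",
    "properties": ["Material", "Colour"],
}


def apply_mapping(csv_text, mapping):
    """Turn spreadsheet rows into generic units with variable/value pairs."""
    units = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        units.append({
            "label": row[mapping["item_label"]],
            "contained_in": row[mapping["context"]],
            # The project's original column names are kept as variables;
            # empty cells are skipped rather than stored as blank values.
            "properties": [
                {"variable": col, "value": row[col]}
                for col in mapping["properties"] if row[col]
            ],
        })
    return units


sample = (
    "Find No.,Trench,Material,Colour\n"
    "232,Lot 1939,ceramic,buff-orange\n"
    "233,Lot 1939,ceramic,\n"
)
units = apply_mapping(sample, mapping)

# Saving the mapping parameters documents how the import was done.
saved_mapping = json.dumps(mapping, sort_keys=True)
```

Keeping the column names as variables preserves the project's original terminology, and storing the mapping alongside the data makes the import repeatable and reviewable.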
Data expressed in ArchaeoML Ready for Open Context, OCHRE dissemination Interoperability, longevity advantages Project’s original terminology is maintained Data described with high-level metadata Dublin Core, TimeMap. Can be expressed in RDF, COinS, ArchaeoML (XML), etc. Data schema mappings recorded Import process saves mapping parameters. Internet Archive Accession Outcomes
Petra Open City Eric C. Kansa Executive Director, ISD Program, School of Information, UC Berkeley
Faceted Browse  Petra Great Temple: 128,187 locations / objects 1.1 million descriptions 1626 media objects ( more to come ) 298,500 relationships
Penelope 2 Petra Great Temple: 12 individual databases (some very large) ~200 text documents “mined” 1600+ related media files
Penelope 1
Penelope 2
Penelope 3
Penelope 4
Today My background Sharing Field Documentation Open Context Unresolved Issues and Next Steps
Bugs, interface problems Truncated development (I got a new job…) Just beginning user evaluations Schema mapping is major challenge Recent collaborations, hiring should help (stay tuned!) User Experience Image by Jeff Kubina via Flickr (CC-by license) <http://www.flickr.com/photos/kubina/296367267/>
Unlocking Open Context
Unlocking Open Context Web services Clear need to facilitate “mash-ups” Community / organization specific portals and views of content Example Application: Second Life or Croquet Most current virtual visualizations are one-off projects, have little applicability to other sites / collections Dynamically link online data stores so visualizations can be easier to develop / more meaningful
Records in Open Context XML data output, enables: (1) Sharing between web resources (2) Custom presentation (Brown University-specific style templates, etc.)
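How one XML record can feed several presentations can be sketched as follows. The element and attribute names in the sample record are hypothetical, not the actual Open Context or ArchaeoML output, and the two "portals" differ only in heading level to keep the example short.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML record; element names are assumed for illustration.
record_xml = """<item id="bone-231">
  <label>Bone 231</label>
  <property name="Taxon">Ovis aries</property>
  <property name="Element">metacarpal</property>
</item>"""


def render(xml_text, heading_tag):
    """Render the same record as HTML using a site-specific heading style."""
    item = ET.fromstring(xml_text)
    label = item.findtext("label")
    props = [
        "<li>{}: {}</li>".format(p.get("name"), p.text)
        for p in item.findall("property")
    ]
    return "<{0}>{1}</{0}><ul>{2}</ul>".format(
        heading_tag, label, "".join(props)
    )


# Two different sites present the same underlying data in their own style.
portal_a = render(record_xml, "h1")
portal_b = render(record_xml, "h3")
```

The data is fetched once in a neutral format and styled by whoever consumes it, which is what makes both sharing between web resources and institution-specific templates possible.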
“ Cultural Resource Management” 90% of US archaeology Un-circulated “gray literature” reports Collaboration with the San Diego Archaeological Center 400 datasets, representing 500,000 locations and objects 4-5 “Petras” worth of data Scaling issues becoming paramount! Data Inundation Image by “Doegox” via Flickr (CC-by license) <http://www.flickr.com/photos/doegox/2085419215/>
Metaweb  Exploring Metaweb ArchaeoML seems to map well to their data store Powerful API Large user community Scale, performance Need to understand concerns!
Open Data Protocol Advocated by Science Commons Use of “CC-zero” license  Public domain data, reliance on social norms for appropriate use Solves important problems Questions Multiple stakeholder communities Very different, conflicting norms
Glocal Backlash The Internet is one of the principal ways traditional knowledge and heritage is/will be accessed Local claims and notions of privacy, propriety, spirituality often missing Can have a dark side too! (essentialism, ethno-nationalism, fundamentalisms) Captain Hook award winner for bio-piracy 2006 1 1 Andrew Donoghue, ZDNet UK (March 29, 2006) http://news.zdnet.co.uk/business/0,39020645,39260264,00.htm
Traditional Knowledge Jason Schulz and Ahrash Bissell (co-authors) How CC-licenses can be applied, where they may be inappropriate Some Rights Reserved
Public Tensions Prospects to collaborate with amateur communities? Site security “Fantastic” archaeology “Pothunters” destroying an archaeological site on the Columbia River (Oregon, USA) Image by “gbaku” via Flickr (CC-By-SA license) <http://www.flickr.com/photos/gbaku/1074322614/>
Today My background Sharing Field Documentation Open Context Unresolved Issues and Next Steps … Now for the thanks!
Open Context Developers Eric Kansa (Lead developer, tagging system, interface design) Ahrash Bissell (Penelope design, usability) Nathan Hirth (XML, XSLT, schema mapping) David Schloen (ArchaeoML schema) Sarah W. Kansa (Usability, interface design, documentation) Jeanne Lopiparo (Interface and graphic design, usability) Michael Ashley (Filemaker item-view mockup) Chris Hoffman (Usability, optimization )
Special Thanks University of Chicago: OCHRE Project The Electronic Frontier Foundation Doris and Donald Fisher Presidio Archaeology: NPS, Golden Gate National Rec. Area Science Commons Internet Archive (media repository services) “Friday Afternoon Seminar”
