An Open Context for Archaeology - Presentation Transcript
An Open Context for Archaeology: Publishing Research Data on the Web
Eric Kansa, UC Berkeley School of Information
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
Today
My background
Sharing Field Documentation
Open Context
Unresolved Issues and Next Steps
Personal Background
Anthropology
Cultural
Archaeology (social)
Co-founder of the AAI, a “.org”
Currently Executive Director of ISD Program
Career Directions
Frustrated with the practice of archaeology
Data sharing hard / nonexistent
Publication = paper
Impressionistic, hard to verify claims
Opportunity for research
Focus on data sharing / communication
Independent NGO / nonprofit corporation dedicated to promoting open content for cultural heritage research and education
Explore approaches for the community to share (technical, copyright, academic)
NOT a repository: Promoting data sharing by creating tools, methods, and exemplars.
Reduce costs and enhance effectiveness of preservation & access
Informal estimates: only 15-27% of research ever gets published [1, 2], often in inaccessible formats
[1] James H. Ottaway, Jr. “Publish or Be Damned”, a lecture presented for the University of Cincinnati Classics Department, 5/2001.
[2] Morag Kersel. “Publishing the Past: Some Shocking Statistics”, a lecture presented for the American Schools of Oriental Research annual conference, 2005.
Why Primary Research Content?
Bumpus (1898) House Sparrow Data
Carey Bumpus published all of his raw data along with his syntheses
10 subsequent groundbreaking papers reanalyzed these data
Invaluable dataset used for instruction
Key Point: Dataset becomes 10X more valuable with dissemination!
Ads Screenshot
The Conceptual Challenge
The content of field documentation (represented in spreadsheets and databases) varies greatly
Discipline has 1 foot in humanities, 1 in sciences
Archaeological documentation is also rich in media and narrative text
Our Approach
Explore ways to pool data without overly constraining standards
Find / create tools for non-tech expert use and contribution
Stay cost-effective! Most archaeological data sharing initiatives are site / project specific.
More general solutions needed
Global Schemas: ArchaeoML
Simple, general schema makes it easier to pool diverse content
Not overly determined, support multiple research agendas
Hard to implement, but we’re gaining experience
UML Diagram of a subset of ArchaeoML
Other Web Initiatives
Web resources using highly generalized data structures see growing popularity
Example: OpenRecord
Dojo Foundation (leading open source AJAX)
“Wiki for databases”
Data expressed in RDF triples (queried with SPARQL)
Likely needs some added meaningful structure to facilitate discipline specific use
OpenRecord (www.openrecord.org)
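The idea of expressing data as triples and querying them by pattern can be sketched in a few lines. This is only an illustration of the data shape, not OpenRecord's implementation and not a SPARQL engine; the identifiers (`find:231`, `ex:taxon`, and so on) and the `match` helper are invented for the example.

```python
from typing import Optional

# A triple is (subject, predicate, object). All names below are hypothetical.
TRIPLES = [
    ("find:231", "rdf:type", "ex:Bone"),
    ("find:231", "ex:taxon", "Ovis aries"),
    ("find:232", "rdf:type", "ex:Pot"),
    ("find:232", "ex:material", "ceramic"),
]


def match(triples, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples that fit a pattern; None acts as a wildcard.

    This mimics, very loosely, a single triple pattern in a SPARQL query.
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything known about one find, and every item typed as a bone.
about_232 = match(TRIPLES, s="find:232")
bones = match(TRIPLES, p="rdf:type", o="ex:Bone")
```

Because every statement has the same three-part shape, very different datasets can share one store; the discipline-specific structure mentioned above would have to be layered on top.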
Other Web Initiatives Freebase (freebase.com)
More about this later…
Today
My background
Sharing Field Documentation
Open Context
Unresolved Issues and Next Steps
OCHRE, Open Context
OCHRE: Fully supports the ArchaeoML global schema using a native XML database and a free Java client
Open Context: Uses a subset of the ArchaeoML schema (via MySQL/PHP) for web-browser access and Internet search engine indexing.
Common services, including complex querying and analysis for pooled content
General Approach
“Organic” development: originally planned just to use OCHRE
PHP/MySQL: Drives many dynamic content websites, relatively simple standard technology. “Bleeding edge” difficult for our target community.
Open to Search Engines: Increasingly important research tools (Harley 2006)
Easy integration of Open Source Tools (RSS-feeds, ping-back, etc.)
Diane Harley, Sarah Earl-Novell, Jennifer Arter, Shannon Lawrence, and C. Judson King, Jr. (2006). “The Influence of Academic Values on Scholarly Publication and Communication Practices”, Research & Occasional Paper Series: CSHE.13.06 <http://cshe.berkeley.edu/publications/docs/ROP.Harley.AcademicValues.13.06.pdf>.
Faceted Browse
Data from multiple projects browsed, queried (even with Boolean algebra), and results pooled together
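A faceted browse over pooled records boils down to two operations: counting the values of a field, and narrowing the record set by a chosen value. The sketch below assumes a flat list of dictionaries; the field names and sample records are invented and do not reflect Open Context's actual tables.

```python
from collections import Counter

# Hypothetical pooled records from two projects.
RECORDS = [
    {"project": "Domuztepe", "category": "Bone", "material": "bone"},
    {"project": "Domuztepe", "category": "Pot", "material": "ceramic"},
    {"project": "Pinarbasi", "category": "Find", "material": "ceramic"},
    {"project": "Pinarbasi", "category": "Find", "material": "bone"},
]


def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    return Counter(r[field] for r in records if field in r)


def narrow(records, **selected):
    """Keep only the records whose fields equal every selected value."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in selected.items())
    ]


# Pick the "ceramic" facet, then see how the remainder splits by project.
ceramic = narrow(RECORDS, material="ceramic")
ceramic_by_project = facet_counts(ceramic, "project")
```

Each click in a faceted interface corresponds to one `narrow` call followed by fresh `facet_counts` for the remaining fields.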
Records in Open Context
Media Record
Records in Open Context: contextual relationships (spatial containment)
Records in Open Context: contextual relationships (stratigraphy)
Ownership in Open Context: copyright ownership and Creative Commons license information, including Internet-wide standard metadata that links ownership and permissions
Ownership in Open Context: citation information with a stable URL direct to the item being cited
Ownership in Open Context: Zotero (www.zotero.org) uses COinS (a micro-format) metadata to make bibliographic references
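COinS embeds an OpenURL ContextObject in the `title` attribute of an empty HTML `span` with class `Z3988`, which tools such as Zotero can read. A minimal sketch of generating such a span follows; the function name, the choice of Dublin Core keys, and the example citation values (including the `example.org` URL) are assumptions for illustration, not Open Context's actual output.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, creator, date, identifier):
    """Build a COinS span describing one item with Dublin Core keys."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),                 # ContextObject version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),  # Dublin Core format
        ("rft.title", title),
        ("rft.creator", creator),
        ("rft.date", date),
        ("rft.identifier", identifier),
    ]
    # URL-encode the key/value pairs, then escape "&" for use in HTML.
    context_object = escape(urlencode(fields), quote=True)
    return '<span class="Z3988" title="%s"></span>' % context_object


# Hypothetical item; the URL is a placeholder, not a real stable URL.
span = coins_span(
    title="Bone 231",
    creator="Example Project",
    date="2008",
    identifier="http://example.org/item/231",
)
```

The span is invisible to readers of the page, so it can sit next to the human-readable citation without changing the layout.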
Complex Querying
Data from multiple projects can be queried (with Boolean algebra), and results pooled together
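Boolean queries over pooled data can be thought of as set algebra on item identifiers: each criterion yields a set of matching items, and AND, OR, and NOT become intersection, union, and difference. The identifiers and criteria below are invented for the sketch.

```python
# Hypothetical item identifiers pooled from two projects.
ALL_ITEMS = {"D-231", "D-232", "D-233", "P-1A", "P-2A", "P-3A"}

# Hypothetical result sets, one per simple criterion.
MATCHES = {
    "material=ceramic": {"D-232", "D-233", "P-2A"},
    "type=spindle-whorl": {"D-233", "P-2A", "P-3A"},
    "project=Domuztepe": {"D-231", "D-232", "D-233"},
}


def and_(a, b):
    """Items satisfying both criteria."""
    return a & b


def or_(a, b):
    """Items satisfying either criterion."""
    return a | b


def not_(a, universe=ALL_ITEMS):
    """Items in the universe that do not satisfy the criterion."""
    return universe - a


# Ceramic spindle whorls that are NOT from Domuztepe.
result = and_(
    and_(MATCHES["material=ceramic"], MATCHES["type=spindle-whorl"]),
    not_(MATCHES["project=Domuztepe"]),
)
```

Pooling results across projects falls out naturally, since the sets hold identifiers regardless of which project contributed them.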
Summary Statistics
Making Meaningful Links
ArchaeoML essentially describes a network of atomic units and their relationships
Units and their links typically derived from source data
Diagram:
Domuztepe: Lot 1939: Bone 231, Pot 232, Pot 233
Pinarbasi: Cave Unit A: Find ID1-A, Find ID2-A, Find ID3-A
Example properties: Taxon: Ovis aries; Modification: Ground Point; Element: metacarpal; Material: ceramic; Color: Buff-orange; Type: Spindle-whorl
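The network of atomic units and relationships on this slide can be modelled, in a simplified way, as units that point to a containing unit and carry descriptive properties. The class and function names are invented, and which properties belong to which find is a guess made for the example; this is not the ArchaeoML schema itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Unit:
    """One atomic unit: a site, a context, or an individual find."""
    label: str
    parent: Optional["Unit"] = None          # containing unit, if any
    properties: Dict[str, str] = field(default_factory=dict)


def context_path(unit: Optional[Unit]) -> str:
    """Walk up the containment links and return the full context path."""
    labels = []
    while unit is not None:
        labels.append(unit.label)
        unit = unit.parent
    return " / ".join(reversed(labels))


# Units taken from the slide; the property assignment is an assumption.
domuztepe = Unit("Domuztepe")
lot = Unit("Lot 1939", parent=domuztepe)
bone = Unit(
    "Bone 231",
    parent=lot,
    properties={"Taxon": "Ovis aries", "Element": "metacarpal"},
)
```

Calling `context_path(bone)` gives the path from the site down to the find, which is the kind of link that is typically derived directly from the source data.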
Making Meaningful Links
Current (limited) approach with “tags”
Assigned to 1 item or a whole set of items (esp. a query selection set)
Express a meaningful link between items
Diagram:
Domuztepe: Lot 1939: Bone 231, Pot 232, Pot 233
Pinarbasi: Cave Unit A: Find ID1-A, Find ID2-A, Find ID3-A
Example properties: Taxon: Ovis aries; Modification: Ground Point; Element: metacarpal; Material: ceramic; Color: Buff-orange; Type: Spindle-whorl
Tag linking items across projects: “Weaving tool”
Future Extensions
Extend tagging concept for more structure
Users can apply variable/value pairs.
Assign calendar dates to items
Apply more sophisticated ontologies / thesauri (Getty?)
Diagram:
Domuztepe: Lot 1939: Bone 231, Pot 232, Pot 233
Pinarbasi: Cave Unit A: Find ID1-A, Find ID2-A, Find ID3-A
Example properties: Taxon: Ovis aries; Modification: Ground Point; Element: metacarpal; Material: ceramic; Color: Buff-orange; Type: Spindle-whorl
Variable/value pair linking items across projects: “Tool Type”: “Weaving tool”
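One way to picture the step from plain tags to variable/value pairs is a tag record with an optional variable name, applied to a whole set of items at once. The `Tag` and `TagIndex` names are hypothetical, and the choice of which finds receive the tag is made up for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    """A user-supplied tag; 'variable' is None for a plain tag."""
    value: str
    variable: Optional[str] = None


class TagIndex:
    """Remembers which items each tag has been applied to."""

    def __init__(self):
        self._items_by_tag = {}

    def apply(self, tag, item_ids):
        """Apply a tag to one item or to a whole query selection set."""
        self._items_by_tag.setdefault(tag, set()).update(item_ids)

    def items(self, tag):
        """Return a copy of the set of items carrying the tag."""
        return set(self._items_by_tag.get(tag, set()))


index = TagIndex()

# Current approach: a plain tag links finds from two different projects.
plain = Tag("Weaving tool")
index.apply(plain, {"Bone 231", "Find ID2-A"})

# Future extension: the same idea expressed as a variable/value pair.
structured = Tag("Weaving tool", variable="Tool Type")
index.apply(structured, {"Bone 231", "Find ID2-A", "Pot 233"})
```

Keeping the variable optional means existing plain tags remain valid while structured ones can be queried by variable name.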
Using Tagged Sets: Pingback registers a link made from a weblog to a set tagged as “weaving tools”
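Pingback is an XML-RPC mechanism: the linking site calls `pingback.ping` with the source URI (the weblog post) and the target URI (the page being linked to). The sketch below only builds the request body with Python's standard library and does not send anything; both URLs are placeholders, and how Open Context handles the call on its side is not shown here.

```python
import xmlrpc.client


def pingback_request(source_uri, target_uri):
    """Serialize a pingback.ping call as an XML-RPC request body."""
    return xmlrpc.client.dumps(
        (source_uri, target_uri),
        methodname="pingback.ping",
    )


# Hypothetical weblog post linking to a hypothetical tagged-set page.
body = pingback_request(
    "http://blog.example.org/2008/weaving-tools-post",
    "http://example.org/sets/weaving-tools",
)
```

A receiving server would typically verify that the source page really links to the target before recording the link.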
Integration at a General Level
Speed and ease of mapping content into ArchaeoML systems
Significant cost reduction if most contributors can do it themselves
Important for small, individual or project generated research
Enables powerful query and analysis across multiple projects
But NOT very specific.
Example: Composing queries still uses each project’s local recording system (even though several projects can be queried simultaneously and their results pooled)
Schema Mapping into ArchaeoML
The importer is an important part: most people work with Excel, FileMaker, Access…
Goal: Individuals can upload their own data, map them into ArchaeoML and submit for review and publishing
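The mapping step can be illustrated with a small importer that reads a spreadsheet export, applies user-chosen mapping parameters, and keeps the project's own column names as variable names. The record layout here is a generic simplification, not the ArchaeoML schema; the column names, the mapping keys, and the sample row are all invented.

```python
import csv
import io
import json

# Hypothetical export from a project spreadsheet.
SOURCE_CSV = """Find No,Lot,Species,Bone
231,1939,Ovis aries,metacarpal
"""

# Mapping parameters chosen by the contributor: which column labels the
# item, which gives its context, and which columns are descriptive.
MAPPING = {
    "item_label": "Find No",
    "context": "Lot",
    "properties": ["Species", "Bone"],
}


def import_rows(csv_text, mapping):
    """Turn spreadsheet rows into generic item records."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "item": row[mapping["item_label"]],
            "context": row[mapping["context"]],
            # The original column names are preserved as variable names.
            "observations": [
                {"variable": column, "value": row[column]}
                for column in mapping["properties"]
            ],
        })
    return records


records = import_rows(SOURCE_CSV, MAPPING)

# Saving the mapping alongside the data documents how the import was done.
saved_mapping = json.dumps(MAPPING, indent=2)
```

Because the mapping is plain data, a reviewer can inspect it, and the same parameters can be reused when the project uploads a later season's spreadsheet.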
Data expressed in ArchaeoML
Ready for Open Context, OCHRE dissemination
Interoperability, longevity advantages
Project’s original terminology is maintained
Data described with high-level metadata
Dublin Core, TimeMap. Can be expressed in RDF, COinS, ArchaeoML (XML), etc.
Data schema mappings recorded
Import process saves mapping parameters.
Internet Archive Accession
Outcomes
Petra Open City
Eric C. Kansa, Executive Director, ISD Program, School of Information, UC Berkeley
Faceted Browse
Petra Great Temple:
128,187 locations / objects
1.1 million descriptions
1626 media objects (more to come)
298,500 relationships
Penelope 2
Petra Great Temple:
12 individual databases (some very large)
~200 text documents “mined”
1600+ related media files
Penelope 1
Penelope 2
Penelope 3
Penelope 4
Today
My background
Sharing Field Documentation
Open Context
Unresolved Issues and Next Steps
Bugs, interface problems
Truncated development (I got a new job…)
Just beginning user evaluations
Schema mapping is major challenge
Recent collaborations, hiring should help (stay tuned!)
User Experience Image by Jeff Kubina via Flickr (CC-by license) <http://www.flickr.com/photos/kubina/296367267/>
Unlocking Open Context
Web services
Clear need to facilitate “mash-ups”
Community / organization specific portals and views of content
Example Application: Second Life or Croquet
Most current virtual visualizations are one-off projects, have little applicability to other sites / collections
Dynamically link online data stores so visualizations are easier to develop and more meaningful
Records in Open Context: XML data output enables (1) sharing between web resources and (2) custom presentation (Brown University-specific style templates, etc.)
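As a rough picture of XML output for a single record, the sketch below serializes an item with Python's standard `xml.etree.ElementTree`. The element and attribute names (`item`, `context`, `property`, `variable`) are invented for illustration and are not ArchaeoML; the sample values are likewise assumptions.

```python
import xml.etree.ElementTree as ET


def item_to_xml(label, context, properties):
    """Serialize one item record as a small XML document (as a string)."""
    root = ET.Element("item", {"label": label})
    ET.SubElement(root, "context").text = context
    props = ET.SubElement(root, "properties")
    for variable, value in properties.items():
        prop = ET.SubElement(props, "property", {"variable": variable})
        prop.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; values are placeholders for illustration.
xml_text = item_to_xml(
    "Pot 233",
    "Domuztepe / Lot 1939",
    {"Material": "ceramic", "Color": "Buff-orange"},
)

# Another web resource can parse the same text back into a tree.
parsed = ET.fromstring(xml_text)
```

Once records are available in a form like this, a stylesheet or another site can restyle or recombine them without touching the underlying database.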
“Cultural Resource Management”
90% of US archaeology
Un-circulated “gray literature” reports
Collaboration with the San Diego Archaeological Center
400 datasets, representing 500,000 locations and objects
4-5 “Petras” worth of data
Scaling issues becoming paramount!
Data Inundation Image by “Doegox” via Flickr (CC-by license) <http://www.flickr.com/photos/doegox/2085419215/>
Metaweb
Exploring Metaweb
ArchaeoML seems to map well to their data store
Powerful API
Large user community
Scale, performance
Need to understand concerns!
Open Data Protocol
Advocated by Science Commons
Use of “CC-zero” license
Public domain data, reliance on social norms for appropriate use
Solves important problems
Questions
Multiple stakeholder communities
Very different, conflicting norms
Glocal Backlash
The Internet is one of the principal ways traditional knowledge and heritage are, and will be, accessed
Local claims and notions of privacy, propriety, spirituality often missing
Can have a dark side too! (essentialism, ethno-nationalism, fundamentalisms)
Captain Hook award winner for bio-piracy 2006 [1]
[1] Andrew Donoghue, ZDNet UK (March 29, 2006) http://news.zdnet.co.uk/business/0,39020645,39260264,00.htm
Traditional Knowledge
Jason Schultz and Ahrash Bissell (co-authors)
How CC-licenses can be applied, where they may be inappropriate
Some Rights Reserved
Public Tensions
Prospects to collaborate with amateur communities?
Site security
“Fantastic” archaeology
"Pothunters" destroying an archaeological site on the Columbia River (Oregon, USA) Image by “gbaku” via Flickr (CC-By-SA license) <http://www.flickr.com/photos/gbaku/1074322614/>
Today
My background
Sharing Field Documentation
Open Context
Unresolved Issues and Next Steps
… Now for the thanks!
Open Context Developers
Eric Kansa (Lead developer, tagging system, interface design)
Ahrash Bissell (Penelope design, usability)
Nathan Hirth (XML, XSLT, schema mapping)
David Schloen (ArchaeoML schema)
Sarah W. Kansa (Usability, interface design, documentation)
Jeanne Lopiparo (Interface and graphic design, usability)
Michael Ashley (Filemaker item-view mockup)
Chris Hoffman (Usability, optimization)
Special Thanks
University of Chicago: OCHRE Project
The Electronic Frontier Foundation
Doris and Donald Fisher
Presidio Archaeology: NPS, Golden Gate National Rec. Area
Science Commons
Internet Archive (media repository services)
“Friday Afternoon Seminar”
The common use by archaeologists of ubiquitous technologies such as computers and digital cameras means that archaeological research projects now produce huge amounts of diverse, digital documentation. However, while the technology is available to collect this documentation, we still largely lack community-accepted dissemination channels appropriate for such torrents of data. Open Context (http://www.opencontext.org) aims to help fill this gap by providing open access data publication services for archaeology. Open Context has a flexible and generalized technical architecture that can accommodate most archaeological datasets, despite the lack of common recording systems or other documentation standards. Open Context includes a variety of tools to make data dissemination easier and more worthwhile. Authorship is clearly identified through citation tools, a web-based publication system enables individuals to upload their own data for review, and collaboration is facilitated through easy download and other features. While we have demonstrated a potentially valuable approach for data sharing, we face significant challenges in scaling Open Context up to serve large quantities of data from multiple projects.