Archaeology, Informatics and Knowledge Representation
 

Presented by Anthony Beck

Usage Rights

CC Attribution License

  • Time team
  • Understand complex relationships in the fragmented archaeological record. Archaeology as human ecology
  • Sections, plans, sensors
  • Classification and identification of material (artefacts - finds)
  • Environmental Sampling – ecofacts
  • Environmental Processing
  • Environmental Analysis – identifying and counting grains
  • Data Rich. Data Archiving - Building the silo
  • INFORMATION OVERLOAD: Unstructured
  • Too Much
  • Formal structures inhibit collaboration and access. Informal networks established to make the data work effectively
  • The nature of knowledge: From a policy perspective there are different levels of knowledge awareness: we know what we know (the data we have access to), we know there are things we don't know (the relevant data which is not accessible), and we recognise there are things that we are unaware of which may be extremely important (the potential knowledge advances gained by integrating all data, collaborating with different domains and future research avenues). Ideally we want to increase the size of the accessible knowledge so that policy can be formed from a position of ‘perfect’, or ‘near-perfect’, knowledge.
  • Templates take tabular data as input: directly from a delimited data file, or as the output of an SQL query on an internal database

Archaeology, Informatics and Knowledge Representation: Presentation Transcript

  • Housekeeping • Notes: http://goo.gl/l6xGa • Mindmap: http://goo.gl/4O1QX • Presentation: – Slideshare: – Prezi:
  • What is archaeology?
  • Work is not conducted in 3 days!
  • The reality is more mundane
  • The past is a foreign place • Archaeological knowledge acquisition is a dynamic process • Dynamic feedback allows theories/practice to be tested or revised (diagram labels: Interpretation, Synthesis)
  • Archaeology is led by
  • Theories structure this evidence
  • Archaeological evidence (practice)
  • Primary data – Excavation records – Remote sensing transcriptions – NMP – Lab Analysis – Specialist reports • Decoupled synthetic data – Site reports – SMR – NMR
  • Grey Literature
  • Grey Literature • Archaeology units conduct most excavations in the UK • Problem – Predominantly paper based recording (still) – Primary record is difficult to access – Excavations are written up as site reports (interpretative and data summary) – These reports are not published: hence Grey Literature
  • Isn’t this wrong?
  • The Corpus helps Decision Making
  • Ideally we want to increase the size of the accessible knowledge so that policy can be formed from a position of ‘perfect’, or ‘near-perfect’, knowledge.
  • Why do we need to model? • Problems – An idealised view of the world • Basic models do not support rich data • Do not support change management • Are not dynamic – Semantically unclear – Fundamental database issues • Lack of atomicity • Poor URIs – INCONSISTENCY
  • Why do we need to model? • Benefits – Archaeoinformatics – to appropriately represent the archaeological record • Variations in – Quality – Certainty • Provenance (who did what) • Presence/absence – Support and enhance process • Inference • Evidence • Multivocality (handling multiple interpretations) – Improving structure
  • Modelling: Ontologies • An ontology is an explicit specification of a domain in terms of entities and the relationships between these entities. • The relationships between entities provide the semantics or meaning of the data. • This information is normally hidden in a dataset, e.g., databases, but it is made more explicit in an ontology. • This explicit specification facilitates – Querying – Complex reasoning – Extension of the knowledge by inference
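The idea on this slide can be sketched in a few lines. This is an illustration only, not code from the presentation: entities and relationships are held as subject-predicate-object triples, and a single inference rule (an instance of a class is also an instance of its superclasses) extends the knowledge base before it is queried. The predicate names `is_a` and `type`, and the entities `Pit`, `Context`, `Place` and `context_1015`, are assumptions made for the example.

```python
# Hypothetical toy ontology held as (subject, predicate, object) triples.
TRIPLES = {
    ("Pit", "is_a", "Context"),
    ("Context", "is_a", "Place"),
    ("context_1015", "type", "Pit"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following is_a links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is_a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def infer_types(triples):
    """Extend the triples: an instance of a class is also an instance
    of each of that class's superclasses."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "type":
            for parent in superclasses(o, triples):
                inferred.add((s, "type", parent))
    return inferred


def instances_of(cls, triples):
    """Query: which subjects are typed as cls?"""
    return sorted(s for s, p, o in triples if p == "type" and o == cls)


if __name__ == "__main__":
    print(instances_of("Place", TRIPLES))               # explicit facts only
    print(instances_of("Place", infer_types(TRIPLES)))  # after inference
```

Querying the explicit triples for places finds nothing; the same query after inference finds `context_1015`, which is the kind of knowledge extension the slide refers to.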
  • Technologies
  • Modelling: SKOS • SKOS provides a standard way to represent knowledge organization systems in RDF: – thesauri, – classification schemes, – subject heading systems – taxonomies • SKOS is a data model for data that can be published on the web.
  • Modelling: SKOS
  • Modelling: CiDOC CRM ontology • provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation. • enables information exchange and interchange between heterogeneous sources of cultural heritage information by providing a common and extensible semantic framework • promote a shared understanding of cultural heritage information by transforming disparate, localised information sources into a coherent global resource.
  • Modelling: CiDOC CRM ontology • This is the most dominant ontology in cultural heritage. • It is intended to cover the full spectrum of cultural heritage knowledge - from Archaeology to Art history, literary and musical entities. • An ontology of 86 classes and 137 properties for culture and more. • With the capacity to explain hundreds of (meta)data formats. • International standard since 2006 - ISO 21127:2006. • The ontology has been encoded in OWL2.0, OWLDL and RDFS.
  • [Diagram: a Conceptualization approximates World Phenomena and explains/motivates the Data structures & Presentation models, which organize Data in various forms (databases, legacy systems)]
  • [Diagram: CIDOC CRM example. Attila meeting Pope Leo I: the meeting is P14 carried out by (performed) Attila and Leo I, and P4 has time-span (is time-span of) a period P82 at some time within AD452. Birth of Leo I and Birth of Attila (P92 brought into existence) are before the meeting; Death of Leo I (P82 at some time within AD461) and Death of Attila (P82 at some time within AD453) (P93 took out of existence, P11 had participant) are after it. The “before” links between events support deduction.]
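The deduction step in this diagram can be illustrated with a transitive closure over "before" relations. This is a sketch, assuming the four "before" links that the slide labels suggest (each birth before the meeting, the meeting before each death); the event names are plain strings chosen for the example and no CIDOC CRM library is used.

```python
from itertools import product

# Assumed "before" facts, read off the slide's labels.
BEFORE = {
    ("Birth of Leo I", "Meeting"),
    ("Birth of Attila", "Meeting"),
    ("Meeting", "Death of Leo I"),
    ("Meeting", "Death of Attila"),
}


def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are already known,
    repeating until nothing new can be derived."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


if __name__ == "__main__":
    for earlier, later in sorted(transitive_closure(BEFORE) - BEFORE):
        print(f"deduced: {earlier} before {later}")
```

The closure adds facts that were never recorded directly, for example that the birth of Leo I precedes the death of Attila. It does not order the two deaths relative to each other; that would need the dated time-spans.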
  • Modelling: CIDOC-CRM-EH • Archaeology extension to CiDOC CRM • Mostly extensions to classes not properties • Archaeological concepts expand the scope of CIDOC concepts, e.g. – EHE0003 AreaOfInvestigation (IsA E53: Place) – EHE0007 Context (IsA E53: Place) – EHE0005 Group (IsA E53: Place)
  • Modelling: CIDOC-CRM-EH
  • What can we do with these modelling tools • The basics: effective data management • Linked Open Data – With limited semantic inconsistencies • Deposition of data in CRM-EH RDF • Leverage the archive – Mapping to Ontologies and SKOS – Dealing with documents (which in some instances are the only archive) • Open Access
  • Examples – Populating the record from grey literature
  • Grey Literature • Prof. Richard Bradley – Reading Uni – http://www.nature.com/news/2010/100407/full/464826a.html – Recognised the potential of Grey Literature – Visited contract units • Collated ‘grey literature’ – Analysed for • Understanding of Bronze Age settlement patterns and dynamics – Transformed • Theory and interpretative frameworks • What about the backlog?
  • STAR project • http://hypermedia.research.glam.ac.uk/kos/star/ • Open up the grey literature to scholarly research. • Develop new methods for enhancing linkages between digital archive database resources and to associated grey literature, exploiting the potential of a high level, core ontology.
  • NLP - Rule Based Information Extraction • Aims to enable ‘rich’, semantic aware indexing of Archaeology fieldwork reports (Grey Literature) with respect to the CRM-EH Conceptual Reference Model (Ontology) • Grey Literature: source materials that cannot be found through the conventional means of publication • OASIS – Online AccesS to the Index of archaeological investigationS – Coordinated by ADS – Online index to Archaeological Grey Literature – Accessed via ADS ArchSearch online Service (http://www.oasis.ac.uk)
  • NLP - General Architecture for Text Engineering • Java Pattern Engine • EH Thesaurus • Gazetteer Lists • ADS – OASIS Grey Literature • XML structures to represent semantic properties
  • Named Entity Recognition (NER) • The NER phase is targeted to extract the following annotation types with respect to the CIDOC-CRM. – E4.Period – E19.Physical Object – E53.Place – E57.Material • Supports ambiguous and context searches.
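A gazetteer-driven NER pass of the kind described here can be approximated in plain Python. This is a stand-in sketch: it does not use GATE or its pattern engine. The gazetteer entries, the way terms are assigned to the four annotation types, and the longest-match rule for overlapping candidates are all assumptions made for the illustration.

```python
import re

# Hypothetical gazetteer lists keyed by the annotation types on the slide.
GAZETTEER = {
    "E4.Period": ["Roman", "Bronze Age"],
    "E19.Physical Object": ["coin", "brooch"],
    "E53.Place": ["pit", "ditch"],
    "E57.Material": ["bronze", "animal bone"],
}


def annotate(text):
    """Return (start, end, matched_text, label) tuples, ordered by position.

    Overlaps are resolved by keeping the longest match, so "Bronze Age"
    is tagged as a period rather than "bronze" as a material.
    """
    candidates = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                candidates.append((m.start(), m.end(), m.group(0), label))

    # Longest candidates first; ties broken by position in the text.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    kept = []
    for cand in candidates:
        if all(cand[1] <= k[0] or cand[0] >= k[1] for k in kept):
            kept.append(cand)
    return sorted(kept)


if __name__ == "__main__":
    sentence = "A Roman coin was found in the pit with a Bronze Age brooch."
    for start, end, matched, label in annotate(sentence):
        print(f"{matched!r:15} {label}")
```

The "Bronze Age" versus "bronze" overlap is a small instance of the ambiguity the slide mentions; a real rule-based system would use context rules rather than a simple longest-match heuristic.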
  • Example of Grey Literature Annotations
  • Examples – Landscape stratigraphy The matrix reflects the relative position and stratigraphic contacts of observable stratigraphic units, or contexts.
  • Examples – Landscape stratigraphy • Computerized stratigraphy has a long research history – (e.g. ArchEd, Stratify) – Satisfies many visualization and error checking issues • Some problems concern: – the difficulty to manage large/huge datasets – the difficulty to integrate a digital matrix representation with other software (e.g. GIS) – the difficulty to handle multilinear stratigraphic sequences – the difficulty to manage uncertain or insufficient knowledge • This is a real problem for DYNAMIC data
  • M. Cattani et al – Prolog system
    Alphabet
    – Variables (‘Terms’): X, Y, Z, ...
    – Constants (‘Terms’): us1, us2, us3, ...
    – Unary Predicates (‘Atoms’): cutUnit(X), trench(X), wall(X), ...
    – Binary Predicates (‘Atoms’): cover(X,Y), fill(X,Y), cut(X,Y), ...
    Axiom
    – Positive & Negative Literals (‘Atoms’ or ‘Classically Negated Atoms’): wall(X), cut(X,Y), ..., cover(X,Y), fill(X,Y), ...
    – Ground Literals: cover(u6,u3), filledBy(u4,u8), ...
    – naf-Literals (‘Atoms’ or ‘Atoms preceded by not’): ..., not cover(X,Y), not fill(X,Y), ...
    – Rules: dirPostTo(Z,Y) :- equalTo(X,Y), cover(Z,X).
  • Archeometrical SpecS
    – DATA: cover(us1,us2). cover(us1,us3). fill(us2,us4). fill(us3,us4).
    – UPDATE DATA: cover(us1,us2). cover(us1,us3). fill(us2,us4). fill(us3,us4). equalTo(us2,us4).
    – DLV [build BEN/Oct 11 2007 gcc 3.4.5 (mingw special)], output for DATA:
      {posteriorTo(us1,us2), posteriorTo(us1,us3), posteriorTo(us1,us4), posteriorTo(us2,us4), posteriorTo(us3,us4), posteriorTo(us2,us3)}
      {posteriorTo(us1,us2), posteriorTo(us1,us3), posteriorTo(us1,us4), posteriorTo(us2,us4), posteriorTo(us3,us4), posteriorTo(us3,us2)}
      {posteriorTo(us1,us2), posteriorTo(us1,us3), posteriorTo(us1,us4), posteriorTo(us2,us4), posteriorTo(us3,us4), contemporary(us3,us2), contemporary(us2,us3)}
    – DLV [build BEN/Oct 11 2007 gcc 3.4.5 (mingw special)], output for UPDATE DATA:
      {posteriorTo(us1,us2), posteriorTo(us1,us3), posteriorTo(us1,us4), posteriorTo(us2,us4), posteriorTo(us3,us4), contemporary(us3,us2), contemporary(us2,us3)}
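The certain part of this output can be reproduced without a logic solver. The sketch below is plain Python, not DLV, and it assumes that both cover(X,Y) and fill(X,Y) imply that X is posterior to Y. It derives the posteriorTo relation by transitive closure and then lists the pairs of units whose relative order the data leaves open.

```python
from itertools import combinations

# Facts from the DATA column of the slide.
COVER = {("us1", "us2"), ("us1", "us3")}
FILL = {("us2", "us4"), ("us3", "us4")}


def posterior_closure(direct):
    """Transitive closure: if a is posterior to b and b to d, then a to d."""
    closure = set(direct)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def undetermined_pairs(units, posterior):
    """Pairs of units with no derived order in either direction."""
    return [
        (x, y)
        for x, y in combinations(sorted(units), 2)
        if (x, y) not in posterior and (y, x) not in posterior
    ]


if __name__ == "__main__":
    direct = COVER | FILL  # assumption: both relations mean "later than"
    posterior = posterior_closure(direct)
    units = {u for pair in direct for u in pair}
    print(sorted(posterior))
    print(undetermined_pairs(units, posterior))
```

The only undetermined pair is (us2, us3). Each such pair admits three readings (one order, the other order, or contemporary), which appears consistent with the three answer sets printed for DATA on the slide.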
  • The pruned Harris’s example – using Prolog and inferencing [Solver output: a series of answer sets over units u1–u12, each listing contemporary(X,Y) and linPostTo(X,Y) atoms, e.g. linPostTo(u6,u3), linPostTo(u3,u2), linPostTo(u2,u12), linPostTo(u12,u5), linPostTo(u7,u8), linPostTo(u8,u9). Facts shown alongside the output: equalTo(u1,u4). equalTo(u9,u10). One answer set includes contemporary(u1,u4) and contemporary(u9,u10).]
  • Examples – Landscape stratigraphy • Provides a formal framework – Scale stratigraphic calculations – Manage and articulate uncertainty – Web-based • A foundation to develop new approaches • Based on: M. Cattani, G. Mantegari, A. Mosca, M. Palmonari • http://goo.gl/T5yH7
  • Examples – Heterogeneous Pottery Sequences • Pottery is important for dating sites and deposits • Classification based on form and fabric variations • Dates derived from stratified sequences (e.g. wells) • Pottery sequences developed locally and integrated – Regionally – Nationally
  • Clumping and splitting • Periodically sequences are reviewed – Clumping (owl:sameAs) – Splitting – Refining date ranges • Date changes impact on: – Interpretation – Knowledge – Policy – Think “Grey literature” but bigger! • Unfortunately the data is decoupled and not linked. The primary and synthetic data are rarely, if ever, re-interpreted
  • Examples – Data integration through CiDOC CRM • Need to produce an information model that reflects continuing best archaeological practice • Not fossilize structures of existing systems • Model new information requirements • Map to specific old legacy data and new data fields • Integration of old & new project recording systems (CSV, MDB, MySQL et al) • Use ontology to model conceptual relationships between data
  • STAR & STELLAR • Current situation is one of fragmented datasets and applications, with different terminology systems • Need for integrative conceptual framework – English Heritage extended CIDOC CRM ontology for archaeology • Need for terminology control – English Heritage Thesauri – Recording Manual glossaries augmented with dataset glossaries
  • STAR/STELLAR architecture [Diagram, top to bottom] • Applications – Server Side, Rich Client, Browser • Web Services, SQL, SPARQL • RDF Based Common Ontology Data Layer (CRM / CRMEH / SKOS) • Indexing, SKOS Conversion, Data Mapping / Normalisation • Sources: Grey literature; EH thesauri, glossaries; STAN, RRAD, IADB, LEAP, RPRE
  • STELLAR Data Conversions [Diagram] • Between Delimited Data and Database: SQL2CSV, SQL2TAB, CSV2DB, TAB2DB, SQLEXECUTE • Through a User-defined template: CSV2STG, TAB2STG, SQL2STG, DELIM2STG • Outputs: XML, RDF [other textual formats]
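The template step can be pictured with Python's standard `csv` and `string.Template` modules. This is a sketch of the general technique only (tabular rows pushed through a user-defined text template) and does not reproduce STELLAR's own template syntax or commands. The column names, the `hasType` property under `example.org`, and the sample rows are hypothetical.

```python
import csv
import io
from string import Template

# Hypothetical user-defined template: $context_id and $context_type are
# column names in the delimited input. The output resembles N-Triples.
TEMPLATE = Template(
    "<http://stellar/silchester/EHE0007_$context_id> "
    '<http://example.org/vocab/hasType> "$context_type" .'
)


def render_rows(delimited_text, template, delimiter=","):
    """Apply the template once per data row of a delimited text table."""
    reader = csv.DictReader(io.StringIO(delimited_text), delimiter=delimiter)
    return [template.substitute(row) for row in reader]


if __name__ == "__main__":
    data = "context_id,context_type\n1015,pit\n1016,hearth\n"
    for line in render_rows(data, TEMPLATE):
        print(line)
```

Because the template is ordinary text, the same rows could be rendered to XML or any other textual format by swapping the template, which matches the range of outputs shown on the slide.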
  • Consistent URIs - Convention • Namespace prefix – E.g. http://stellar/silchester/ • Entity type – E.g. “EHE0007” (i.e. Context) • Identifier (data value) – E.g. “1015” • URI pattern: {prefix}{entity type}_{value} – E.g. http://stellar/silchester/EHE0007_1015 • Consistent identifiers facilitate incremental enrichment of data
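The convention on this slide is simple enough to express as one function. The sketch below follows the {prefix}{entity type}_{value} pattern. Adding a missing trailing slash and percent-encoding the value are assumptions added for robustness; the slide does not state them.

```python
from urllib.parse import quote


def mint_uri(prefix, entity_type, value):
    """Build a URI following {prefix}{entity type}_{value}.

    Assumptions beyond the slide: the prefix is given a trailing slash
    if it lacks one, and the value is percent-encoded.
    """
    if not prefix.endswith("/"):
        prefix += "/"
    return f"{prefix}{entity_type}_{quote(str(value), safe='')}"


if __name__ == "__main__":
    print(mint_uri("http://stellar/silchester/", "EHE0007", "1015"))
```

Since the function is deterministic, two datasets that refer to the same context number independently arrive at the same URI, which is what allows data to be enriched incrementally.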
  • Cross Searching • The STAR demonstrator – Making use of the decoupled RDF files – Cross searching between grey literature and datasets – A SPARQL engine supports the semantic search • Semantic Search Examples – Context of type X containing Find of type Y: “hearth” containing “coin”, – Context Find of type X within Context of type Y: “Animal Remains” within “pit”.
  • Cross Searching • “all domestic fire-places that contain money” • “the pit produced a range of artefactual material which included animal bone”
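The two phrases on this slide correspond to the query pattern "Context of type X containing Find of type Y" once free text is mapped to controlled terms. The sketch below shows that mapping and the query in plain Python. It is a stand-in for the SPARQL engine in the demonstrator, and the thesaurus entries, context records and find records are all invented for the example.

```python
# Hypothetical thesaurus: free-text phrases mapped to controlled terms.
THESAURUS = {
    "domestic fire-places": "hearth",
    "money": "coin",
    "animal bone": "animal remains",
}

# Hypothetical records: context id -> context type, and
# finds as (find id, find type, id of the containing context).
CONTEXTS = {"c1": "hearth", "c2": "pit", "c3": "hearth"}
FINDS = [
    ("f1", "coin", "c1"),
    ("f2", "animal remains", "c2"),
    ("f3", "pottery", "c3"),
]


def normalise(term):
    """Map a free-text term to its controlled term, if one is known."""
    key = term.lower()
    return THESAURUS.get(key, key)


def contexts_containing(context_term, find_term):
    """Ids of contexts of the given type that contain a find of the given type."""
    context_type = normalise(context_term)
    find_type = normalise(find_term)
    return sorted(
        {
            context_id
            for _, ftype, context_id in FINDS
            if ftype == find_type and CONTEXTS.get(context_id) == context_type
        }
    )


if __name__ == "__main__":
    print(contexts_containing("domestic fire-places", "money"))
    print(contexts_containing("pit", "animal bone"))
```

The terminology control does most of the work: without the thesaurus step, a search for "money" would never reach a find recorded as "coin".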
  • Long term vision and the future
  • Long term vision and the future • Archaeological data sources are fragmentary by nature. • Theoretical approaches used by practitioners are diverse. • “Data is sacred” - expressing one’s knowledge base in terms of another’s ontology may not always be “acceptable”. • Adoption of the Semantic Web by the heritage sector depends upon the syntactical and semantic mark-up of content. • The sector should coordinate its efforts to ensure that the fundamental building blocks that can enable its success on the semantic web are in place. • Try not to “reinvent the wheel” in terms of metadata - use existing annotation schemes.
  • Heritage: a complex tapestry
  • Implications of silo-ed data • No synergy • Cripples the knowledge frameworks • Less effective – Research – Policy – Impact (diagram labels: Interpretation, Synthesis)
  • Credits and Thanks • Ceri Binding • Paul Cripps • Glauco Mantegari • Keith May • Monika Solanki • Doug Tudhope • Andreas Vlachidis