Invited report in Proceedings of "Digital Presentation and Preservation of Cultural and Scientific Heritage" (DiPP2012) conference, September 2012, Veliko Tarnovo, Bulgaria.
This document discusses linked open data, which organizes data using URIs and RDF triples so that the data can be connected across sources. It provides examples of datasets that use linked open data principles like DBpedia, Wikidata, and Schema.org. The document outlines the semantic web stack for linked data, including RDF, RDF Schema, and OWL. It discusses benefits like preparing data for unknown future uses and enabling automation through linking related information.
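The triple model this abstract describes can be sketched in a few lines of Python. This is an illustrative sketch only: the URIs, predicates, and the population value are assumptions for the example, not data quoted from the document.

```python
# RDF data as (subject, predicate, object) triples; URIs identify entities,
# so statements from different datasets about the same thing can be joined.
triples = [
    ("http://dbpedia.org/resource/Sofia", "rdf:type", "dbo:City"),
    ("http://dbpedia.org/resource/Sofia", "owl:sameAs",
     "http://www.wikidata.org/entity/Q472"),
    # a statement contributed by a second dataset (illustrative value):
    ("http://www.wikidata.org/entity/Q472", "wdt:P1082", "1307439"),
]

def describe(uri, graph, seen=None):
    """Collect all facts about a URI, following owl:sameAs links."""
    seen = seen if seen is not None else set()
    if uri in seen:
        return []
    seen.add(uri)
    facts = [(p, o) for s, p, o in graph if s == uri]
    for p, o in list(facts):
        if p == "owl:sameAs":
            facts.extend(describe(o, graph, seen))
    return facts
```

Calling `describe` on the DBpedia URI also surfaces the statement made about the linked Wikidata entity; this cross-source joining is what "preparing data for unknown future uses" buys.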
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, Airbnb… and any large enterprise that would like to have a holistic (360-degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
This document provides guidance on developing Horizon 2020 proposals, including building organization and team profiles, finding synergies to build consortia, and preparing proposal budgets. It discusses creating online profiles on websites, LinkedIn, and other platforms to showcase capabilities and experience. Guidelines are given for participating in or coordinating a proposal, including defining a concept note, mapping capabilities to calls, finding partners, and coordinating preparation. The document also covers preparing personnel costs and budgets in detail, emphasizing the importance of accurate assumptions and practical planning.
The Galleries, Libraries, Archives and Museums (GLAM) sector deals with complex and varied data. Integrating that data, especially across institutions, has always been a challenge. Semantic data integration is the best approach to deal with such challenges. Linked Open Data (LOD) enables large-scale Digital Humanities (DH) research, collaboration and aggregation, allowing DH researchers to make connections between (and make sense of) the multitude of digitized Cultural Heritage (CH) resources available on the web. An upsurge of interest in semantic technologies and LOD has swept the CH and DH communities. An active Linked Open Data for Libraries, Archives and Museums (LODLAM) community exists, CH data is published as LOD, and international collaborations have emerged. The value of LOD is especially high in the GLAM sector, since culture by its very nature is cross-border and interlinked. We present interesting LODLAM projects, datasets, and ontologies, as well as Ontotext's experience in this domain. An extended paper on these topics is also available: it has 77 pages, 67 figures, and detailed info about CH content and XML standards, Wikidata, and global authority control.
Describes our experiences with the CRM. The presentation points out three main problems technicians will face when they decide to implement the CRM in a real-world application: no technical specification, mapping ambiguities, and the complexity of mapping chains. It also proposes introducing mapping guidelines that support potential CRM adopters in producing more homogeneous mappings.
Two methods for semi-automated feature extraction from lidar-derived DEMs, designed for cairn-fields and burial mounds
Benjamin Štular
Znanstvenoraziskovalni Center Slovenske Akademije Znanosti In Umetnosti (ZRC SAZU)
Scientific Research Centre of the Slovenian Academy of Sciences and Arts
CAA 2016, Oslo
Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM). Vladimir Alexiev, PhD, PMP
Vladimir Alexiev, Dimitar Manov, Jana Parvanova and Svetoslav Petrov. In proceedings of the workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013) at TPDL 2013, 26 Sep 2013, Valletta, Malta
Metadata for 3D models, presentation given by Sheena Bassett at the ArcheoLandscapes Conference in Romania in October 2014.
The presentation describes the aims of the 3D ICONS project, the uses of metadata, the CARARE metadata schema, paradata and the requirements for metadata in the 3D ICONS project and for Europeana.
Use Cases on Data Integration with CIDOC CRM and on Standards
Philipp Gerth, Wolfgang Schmidle and Sebastian Cuy
German Archaeological Institute
CAA 2016, Oslo
Presentation and supporting videos also available at: https://prezi.com/tpxt7fr-0f4y
This document discusses challenges in sharing cultural heritage data online and potential solutions. It notes that raw data is difficult for most users to understand and use without additional context and expertise. Effective data sharing requires representing objects and their relationships within meaningful contexts like events, actors, places and time periods. It outlines how the ResearchSpace project at the British Museum aims to address these issues by working with subject experts to enrich data with historical, social and semantic contexts so it can better support research, education and public engagement.
This document discusses the importance of documentation for museums in the future. It argues that documentation is no longer just about cataloguing physical objects, but facilitating dynamic access to collections-based knowledge through various digital assets and narratives. Effective documentation requires following standards like SPECTRUM and adopting principles like "create once, publish everywhere" to make collection information reusable across different platforms and experiences. Strategic collection management is also needed to ensure documentation supports organizational goals and provides rich experiences for diverse audiences. Going forward, documentation will be key to museums maintaining their position as trusted civic institutions in an increasingly connected world.
This document provides an overview of a course on digital humanities. It outlines the topics that will be covered in each of the 12 classes, including introductions to digital humanities, semantic modeling, crowdsourcing and visualization. One class focuses specifically on semantic coding and modeling using standards like RDF, URIs, OWL and SPARQL. It also discusses ontologies like CIDOC-CRM that can be used to semantically represent cultural heritage data.
This document discusses architectures for integrating diverse data sources. It presents three levels of architecture: 1) the data level, where data is extracted, transformed and loaded into a data warehouse or accessed via ontology-based data access. 2) The ontology level, where ontologies are used to provide a common conceptualization of the data sources. 3) The application level, where applications can query over the integrated data sources via the ontologies. Ontologies play an important role in data integration, both by providing mappings between data sources and a common vocabulary for communication between different domains.
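The ontology level described above can be illustrated with a minimal sketch: two heterogeneous sources are mapped onto one shared vocabulary so that applications can query both uniformly. All field names, source names, and the `dc:` properties are illustrative assumptions, not taken from the document.

```python
# Hypothetical mappings from two source schemas to a common vocabulary.
MAPPINGS = {
    "museum_db":  {"obj_title": "dc:title", "made_by": "dc:creator"},
    "library_db": {"name": "dc:title", "author": "dc:creator"},
}

def to_common(source: str, record: dict) -> dict:
    """Translate a source record into the shared vocabulary;
    unmapped source fields are dropped."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}
```

With this in place, an application-level query only ever sees `dc:title` and `dc:creator`, regardless of which source a record came from; this is the "common conceptualization" role the abstract assigns to ontologies.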
Ariadne Booklet 2016: Building a research infrastructure for Digital Archaeology
Authors:
Kate Fernie (PIN and 2Culture Associates Ltd)
Franco Niccolucci (PIN)
Julian Richards (University of York)
Contributors:
Achille Felicetti, Ilenia Galluccio and Paola Ronzino (PIN),
Bruno Fanini (ITABC CNR)
Carlo Meghini, Matteo Dellepiane and Roberto Scopigno (ISTI CNR)
Dimitris Gavrilis (Athena Research Centre)
Douglas Tudhope (University of South Wales)
Elizabeth Fentress (AIAC)
Guntram Geser (Salzburg Research)
Holly Wright (University of York)
Johan Fihn (SND)
Maria Theodoridou (ICS-FORTH)
The use of ontologies in the context of Biblissima. Equipex Biblissima
Presentation of work in progress on the Biblissima portal at the "Ontologie en Sciences Humaines et Sociales" workshop days, by Stefanie Gehrke (MSH Val de Loire, Tours, 09/11/2015)
Presentation for NEC Lab Europe.
Knowledge graphs are increasingly built using complex, multifaceted machine-learning-based systems relying on a wide range of different data sources. To be effective, these must constantly evolve and thus be maintained. I present work on combining knowledge graph construction (e.g. information extraction) and refinement (e.g. link prediction) in end-to-end systems. In particular, I will discuss recent work on using inductive representations for link prediction. I then discuss the challenges of ongoing system maintenance, knowledge graph quality and traceability.
Event-based MultiMedia Search and Retrieval for Question Answering. Benoit HUET
The document describes a framework for event-based multimedia search and retrieval for question answering. It discusses parsing user queries to extract semantic entities like location and time. Events are collected from news sources and clustered. Textual and visual media are gathered from social media to summarize events. Visual concepts are mapped to queries to enrich text-based search. Experimental results show combining textual and visual scores using concept mapping improves search performance. The optimal approach weights textual scores higher than visual scores.
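The score-combination step reported above can be sketched as a simple linear late fusion. The 0.7 weight and the score values are illustrative assumptions; the abstract only states that textual scores are weighted higher than visual ones.

```python
def fuse(text_score: float, visual_score: float, w_text: float = 0.7) -> float:
    """Combine normalized [0, 1] relevance scores; higher is more relevant.
    w_text > 0.5 encodes the finding that text should dominate."""
    return w_text * text_score + (1.0 - w_text) * visual_score

# Rank hypothetical candidate events by fused score.
results = {"event_a": fuse(0.9, 0.4), "event_b": fuse(0.5, 0.8)}
ranked = sorted(results, key=results.get, reverse=True)
```

Here a strong textual match ("event_a") outranks a strong visual match, mirroring the reported optimum.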
This document provides an introduction to data science and analytics, including:
1. It defines data science as the extraction of knowledge from large volumes of structured and unstructured data through data mining, predictive analytics, and other techniques.
2. It outlines the typical data science pipeline of collecting data, cleaning it, exploring it, engineering features, building models, and communicating results.
3. It discusses various data analytics tools like Spark, Complex Event Processing, machine learning libraries, and dashboards that can be used to analyze data, build models, and communicate insights.
Introduction to Data Science and Analytics. Srinath Perera
This webinar serves as an introduction to WSO2 Summer School. It will discuss how to build a pipeline for your organization and for each use case, and the technology and tooling choices that need to be made along the way.
This session will explore analytics under four themes:
Hindsight (what happened)
Oversight (what is happening)
Insight (why is it happening)
Foresight (what will happen)
Recording http://t.co/WcMFEAJHok
In this webinar Lorenz Bühmann presents the ontology repair and enrichment tool ORE as well as DL-Learner, a machine learning tool that solves supervised learning tasks and supports knowledge engineers in constructing knowledge. These two neighboring tools in the LOD2 Stack are used for classification and the subsequent quality analysis of Linked Data.
This document provides an overview of using deep learning techniques for recommender systems. It begins with establishing the need for recommender systems due to increasing information overload. It then gives a basic introduction and agenda for the talk, covering motivation, basics, deep learning for vehicle recommendations, and scalability/production. The talk discusses using deep learning approaches like wide and deep learning as well as sequential models to improve recommendation relevance for applications like vehicle recommendations. It provides details on preprocessing, training a classifier, candidate generation and ranking for recommendations. The document concludes with discussing deploying such a system at scale and current trends in recommender system research.
Recommender systems support the decision making processes of customers with personalized suggestions. These widely used systems influence the daily life of almost everyone across domains like ecommerce, social media, and entertainment. However, the efficient generation of relevant recommendations in large-scale systems is a very complex task. In order to provide personalization, engines and algorithms need to capture users’ varying tastes and find mostly nonlinear dependencies between them and a multitude of items. Enormous data sparsity and ambitious real-time requirements further complicate this challenge. At the same time, deep learning has been proven to solve complex tasks like object or speech recognition where traditional machine learning failed or showed mediocre performance.
Join Marcel Kurovski to explore a use case for vehicle recommendations at mobile.de, Germany’s biggest online vehicle market. Marcel shares a novel regularization technique for the optimization criterion and evaluates it against various baselines. To achieve high scalability, he combines this method with strategies for efficient candidate generation based on user and item embeddings—providing a holistic solution for candidate generation and ranking.
The proposed approach outperforms collaborative filtering and hybrid collaborative-content-based filtering by 73% and 143% for MAP@5. It also scales well for millions of items and users returning recommendations in tens of milliseconds.
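For reference, the MAP@5 metric used in this comparison can be computed as below. This is the standard definition of mean average precision at k, not code from the talk; the example lists in the test are hypothetical.

```python
def ap_at_k(recommended, relevant, k=5):
    """Average precision at k for one user's ranked recommendation list."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at each hit position
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(recs_per_user, relevant_per_user, k=5):
    """Mean of per-user average precision at k."""
    aps = [ap_at_k(r, rel, k) for r, rel in zip(recs_per_user, relevant_per_user)]
    return sum(aps) / len(aps)
```

A relative improvement of 73% for MAP@5 thus means the mean of these per-user scores rose by that factor over the collaborative-filtering baseline.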
Event: O'Reilly Artificial Intelligence Conference, New York, 18.04.2019
Speaker: Marcel Kurovski, inovex GmbH
More tech talks: inovex.de/vortraege
More tech articles: inovex.de/blog
Data enrichment is vital for leveraging heterogeneous data sources in various business analyses, AI applications, and data-driven services. Knowledge Graphs (KGs) support the enrichment of heterogeneous data sources by making entities first-class citizens: links to entities help interconnect heterogeneous data pieces or even ease access to external data sources to eventually augment the original data. Data annotation algorithms to find and link entities in reference KGs, as well as to identify out-of-KG entities have been proposed and applied to different types of data, such as tables, and texts. However, despite recent progress in annotation algorithms, the output of these algorithms does not always meet the quality requirements that make the enriched data valuable in downstream applications. As a result, semantic data enrichment remains an effort-consuming and error-prone task. In this seminar, we discuss the relationships between annotation algorithms, data enrichment, and KG construction, highlighting challenges and open problems. In addition, we advocate for a native human-in-the-loop perspective that enables users to control the outcome of the enrichment and, eventually, improve the quality of the enriched data. We focus in particular on the annotation and enrichment of tabular data and briefly discuss the application of a similar paradigm to the enrichment of textual data in the legal domain, e.g., on court decisions and criminal investigation documents.
Boost your data analytics with open data and public news content. Ontotext
Get guidance through the gigantic sea of freely available Open Data and learn how it can empower your analysis of any kind of sources.
This webinar is a live demo of news and data analytics, based on rich links within big knowledge graphs. It will show you how to:
Build ranking reports (e.g. for people and organisations)
View topics linked implicitly (e.g. daughter companies, key personnel, products …)
Draw trend lines
Extend your analytics with additional data sources
Benchmarking Commercial RDF Stores with Publications Office Dataset. Ghislain Atemezing
The slides present a benchmark of RDF stores with real-world datasets and queries from the EU Publications Office (PO). The study compares the performance of four commercial triple stores: Stardog 4.3 EE, GraphDB 8.0.3 EE, Oracle 12.2c and Virtuoso 7.2.4.2 with respect to the following requirements: bulk loading, scalability, stability and query execution.
The document provides an overview of the Common European Research Information Format (CERIF). CERIF is a data model and exchange format for research information management. It allows representing entities like projects, organizations, people, and their relationships. Attributes of entities can be multilingual. The tutorial explains key CERIF concepts like entities, attributes, link entities, roles, and the semantic layer for classifying entities and defining roles. CERIF can be implemented in relational databases and exchanged in XML or linked data formats.
The IP LodB project (for more details see iplod.io) capitalizes on LOD database thinking to build bridges between patented information and scientific knowledge, while focusing on individuals who codify new knowledge and their connected organizations, including those who apply patents in new products and services.
As its main outputs, the IP LodB project produced an intellectual property rights (IPR) linked open data (LOD) map (the IP LOD map) and tested the linkability of the European patent (EP) LOD database, while increasing the uniqueness of data using different harmonization techniques.
These slides were developed for a NIPO workshop.
Presentation in the First Workshop on Digital Information Management. The workshop is organized by the Laboratory on Digital Libraries and Electronic Publication, Department of Archives and Library Sciences, Ionian University, Greece and aims to create a venue for unfolding research activity on the general field of Information Science. The workshop features sessions for the dissemination of the research results of the Laboratory members, as well as tutorial sessions on interesting issues.
The document summarizes the plans and activities of DRESD, a research group on dynamic reconfigurability in embedded system design at Politecnico di Milano. It discusses DRESD's research objectives, collaboration with other universities, involvement in teaching courses, and plans to hold workshops and become an official association to support its research vision.
EUBrasilCloudFORUM Working Group on future scenarios and research challenges. EUBrasilCloudFORUM
Congresso Sociedade Brasileira de Computação CSBC2016 Porto Alegre (Brazil)
Workshop on Cloud Networks & Cloudscape Brazil
- James Clarke, Waterford Institute of Technology, Ireland
- Moacyr Martucci, Department of Engineering and Digital Systems of Polytechnic School of University of São Paulo, EUBrasilCloudFORUM member and Brazilian National Contact Point Coordinator for the Horizon 2020 and for ERC, Brazil
Talk at the 3rd Keystone Training School, Keyword Search in Big Linked Data, Institute for Software Technology and Interactive Systems, TU Wien, Austria, 2017
RO-Crate: packaging metadata love notes into FAIR Digital Objects. Carole Goble
Abstract
slides available at: https://zenodo.org/record/7147703#.Y7agoxXP2F4
The Helmholtz Metadata Collaboration aims to make the research data [and software] produced by Helmholtz Centres FAIR for their own and the wider science community by means of metadata enrichment [1]. Why metadata enrichment and why FAIR? Because the whole scientific enterprise depends on a cycle of finding, exchanging, understanding, validating, reproducing, integrating and reusing research entities across a dispersed community of researchers.
Metadata is not just “a love note to the future” [2], it is a love note to today’s collaborators and peers. Moreover, a FAIR Commons must cater for the metadata of all the entities of research – data, software, workflows, protocols, instruments, geo-spatial locations, specimens, samples, people (as well as traditional articles) – and their interconnectivity. That is a lot of metadata love notes to manage, bundle up and move around. Notes written in different languages at different times by different folks, produced and hosted by different platforms, yet referring to each other, and building an integrated picture of a multi-part and multi-party investigation. We need a crate!
RO-Crate [3] is an open, community-driven, and lightweight approach to packaging research entities along with their metadata in a machine-readable manner. Following key principles – “just enough” and “developer and legacy friendliness” – RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility and citability. As a self-describing and unbounded “metadata middleware” framework, RO-Crate shows that a little bit of packaging goes a long way to realise the goals of FAIR Digital Objects (FDO) [4], and to not just overcome platform diversity but celebrate it while retaining investigation contextual integrity.
In this talk I will present the why, and how Research Object packaging eases Metadata Collaboration using examples in big data and mixed object exchange, mixed object archiving and publishing, mass citation, and reproducibility. Some examples come from the HMC, others from EOSC, USA and Australia, and from different disciplines.
Metadata is a love note to the future, RO-Crate is the delivery package.
[1] https://helmholtz-metadaten.de/en
[2] Scott, Jason The Metadata Mania, http://ascii.textfiles.com/archives/3181, June 2011
[3] Soiland-Reyes, Stian et al. “Packaging Research Artefacts with RO-Crate”. Data Science, 2022; 5(2):97-138, DOI: 10.3233/DS-210053
[4] De Smedt K, Koureas D, Wittenburg P. “FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units”. Publications. 2020; 8(2):21. https://doi.org/10.3390/publications8020021
An overview of the ICARUS project provided during the European Big Data Value Forum, Parallel Session 1.3 “Transforming Transport”, on November 12th, 2018, in Vienna.
This document summarizes the key new features in Spark 2.0, including a new Spark Session entry point that unifies SQLContext and HiveContext, unified Dataset and DataFrame APIs, enhanced SQL features like subqueries and window functions, built-in CSV support, machine learning pipeline persistence across languages, approximate query functions, whole-stage code generation for performance improvements, and initial support for structured streaming. It provides examples of using several of these new features and discusses Combient's role in helping customers with analytics.
Similar to Methodological tips for mappings to CIDOC CRM (20)
This document introduces a workshop about the Visual Media Service provided by the ARIADNEplus project. It provides information about the tools offered by the Visual Media Service and invites participants to provide feedback. The workshop will introduce the service and its features like 3D modeling, relightable images, and high resolution images. Participants are asked to comment on useful features and potential improvements.
DANS Data Trail: Data Management Tools for Archaeologists (ariadnenetwork)
With the arrival of ARIADNEplus there is a searchable catalogue of datasets that helps archaeological researchers navigate the “maze” of data and archives. Especially for archaeological researchers, support staff and data managers, a set of tools has now been developed that helps in making your data management plan. Hella Holander, Peter Doorn and Paola Ronzino introduced the tools to the participants during the workshop.
The ARIADNEplus online toolset for data management consists of three parts:
a protocol for archaeological data management,
a template for researchers to create a data management plan with archaeological data,
a manual containing all guidelines, recommendations and practical examples of data management.
In just six steps, the protocol takes you through the entire process of making a Data Management Plan (DMP) for archaeological research. By using the templates and the accompanying manual, with its clear set of guidelines and advice, it becomes much easier to meet the requirements of research-funding organisations. The DMP is then also in line with standards in the archaeological domain, which ultimately makes the data more findable, accessible, interoperable and reusable (FAIR).
EAA 2021 476: Natália Botica, From 2ArchIS to DataRepositóriUM (ariadnenetwork)
To promote open science and data reuse, it is necessary to have data available in open repositories that guarantee their accessibility and permanence, while facilitating their reuse.
Data classified as FAIR (Findable, Accessible, Interoperable and Reusable) must follow guidelines that ensure the use of an appropriate metadata scheme, persistent identifiers, well-defined vocabularies, procedures to standardize and improve data quality, and sustainable file formats. We will present the methodology used for recording the coin finds from an archaeological excavation carried out by the Archaeology Unit of the University of Minho (UAUM) in the intervention of Casa da Bica, starting with the recording of data in the UAUM's 2ArchIS information system and ending with its availability in the scientific repository "DataRepositóriUM". We will also present some visualization and research work as examples of the reuse of these datasets, whose reach can be wider when they are integrated into structures of greater visibility such as ARIADNE.
1. The Archaeological Map of Bulgaria project digitizes archaeological site data from Bulgaria and makes it accessible through the ARIADNE portal. This allows Bulgarian archaeological data to be discovered and used internationally.
2. Site records and GIS data from the Archaeological Map of Bulgaria have been mapped to international standards like CIDOC CRM, AAT, and PeriodO to improve interoperability. Around 1000 Bulgarian archaeological records are now available through the ARIADNE portal.
3. Making this archaeological data digital and accessible on the web provides opportunities for researchers and helps overcome data isolation, especially during COVID when fieldwork is limited. It also supports data-driven rescue excavations and research.
This contribution will present digital assets and initiatives at the Museum of Cultural History (MCH), University of Oslo (UiO), and aims at sharing data. The COVID-19 restrictions have elevated the importance of digital assets. At the beginning of this period, metadata for the archaeological collections were, to a large degree, already digitized and accessible online. This is the result of a national collaboration that began in the 1990s and continues today in UniMus:Kultur. MCH had also published a map-based overview of all excavations in Eastern/Southern Norway, and begun to release excavation reports through UiO's science archive. Recently, focus has shifted towards 3D documentation of exhibits and publication of existing 3D models on 3DHOP, available through humgis.uiocloud.no. MCH now concentrates on digitizing artefacts at the Viking Ship Museum. The 3D models from here will be included in the BItFROST project, which will address the active role of 3D models in research and education. BItFROST will work on FAIRification of 3D models and promote dialogue with researchers. The 3DHOP platform enables the creation of interactive user interfaces for researchers and a public audience. Collaboration with DarkLab in Lund, Sweden, will create common user interfaces for Swedish and Norwegian collections. The project will also utilize AR and VR in the presentation of data.
In addition, the infrastructure project ADED (Archaeological Digital Excavation Documentation) provides open access to excavations in Norway. The five Norwegian university museums and the Directorate of Cultural Heritage take part in the project. ADED's map-based webpages will integrate excavation documentation and the museums' artefact/photograph databases, making it possible to have both an overview and detailed information of excavations and finds. As part of migrating the data to a common repository, mapping it to CIDOC CRMarchaeo facilitates further mapping to ARIADNEplus and/or other datasets.
Abstracts for the ten presentations at EAA 2021 Session 476: Understanding and expanding capacity in archaeological data management beyond western Europe organised by ARIADNEplus and SEADDA under Theme 3: The new normality of heritage management and museums in post-Covid times on 8th September 2021.
EAA 2021 476: Ways and Capacity in Archaeological Data Management in Serbia (ariadnenetwork)
Over the past year, due to the COVID-19 pandemic, the entire world has witnessed inequalities across borders and societies. These include access to archaeological resources, both physical and digital. Archaeological data creators and users alike spent a lot of time working from their homes, away from artefact collections and research data. However, this was the perfect moment to understand the importance of making data freely and openly available, both nationally and internationally.
This is why the authors of this paper chose to examine a selection of databases from various institutions responsible for the preservation and protection of cultural heritage, in order to understand their policies regarding accessibility and usage of the data they keep. This will be done by simple visits to various websites and databases. The authors intend to check the volume, content and importance of the archaeological heritage on offer. In addition, they will estimate whether the heritage has been adequately classified and described, and check whether data is available in foreign languages. It needs to be seen whether it is possible to access digital objects (documents and the accompanying metadata), whether access is open to all users or requires a certain access hierarchy, and what the policies on usage, reuse and distribution are. It also remains to be seen whether there are public APIs and whether it is possible to collect data through them. Where a public API exists, one needs to check whether the datasets are interoperable or messy, requiring data cleaning.
After having visited a certain number of websites, the authors expect to collect enough data to draw a satisfactory conclusion about the accessibility and usage of Serbian archaeological data on the web.
EAA 2021 476: Izeta & Cattaneo, IDACORDIG and Suquía (ariadnenetwork)
The COVID-19 pandemic unleashed during 2020 implied a change in the way of doing archaeology on a global scale. In Argentina in particular, activities had to move to the domestic sphere and, most of the time, the possibility of carrying out fieldwork, material analysis and collection management in the usual workplaces was lost. This experience showed the need for repositories, libraries and online databases that would allow access to archaeological information. Suquía, the institutional repository of IDACOR, has been compiling and disseminating archaeological information since 2016, although it had not yet developed the capacity to include databases that would allow meta-analysis of the hosted information. The needs raised by the lockdown therefore led to an action aimed at incorporating data from 1938 archaeological sites in the Province of Córdoba (Argentina) together with IDACORDIG (an implementation of the Arches software), which links this set to a spatial database, creating a gazetteer of archaeological sites for the region. This integration is the first of its kind in Argentina, and it increases the visibility of primary information and grey literature, together with preprints and publications that allow continuity in the study of archaeology on a regional scale. In this presentation we will characterize this process and its technical aspects to raise awareness of the potential of this type of platform for integration into digital infrastructures of global impact.
EAA 2021 476: Preserving Historic Building Documentation in Pakistan (ariadnenetwork)
Like many countries around the world, Pakistan was forced into a COVID-19 national lockdown in March 2020. While this confined most people to their homes, it also had the unintended consequence of catapulting many institutions into going digital. At the National College of Arts (NCA), Pakistan's oldest art school, this meant embracing online tools and digital resources that had previously been resisted or underutilized in the teaching of art, design, and architecture. The experiences of lockdown have highlighted inadequacies and inequities within our systems, and as Pakistan returns to normal there is a renewed will to maintain the momentum gained during the pandemic, and an increased realization of the need for developing and sustaining digital infrastructures. The National College of Arts Archives collect and preserve the records, manuscripts, and other artefacts of historical and archaeological significance at the National College of Arts. From March 2021, the NCA Archives are initiating a project to collect, preserve, and digitize historic building documentation created at the NCA over the past 145 years. This paper will follow this process and document the NCA Archives' attempt at creating a Findable, Accessible, Interoperable, and Reusable (FAIR) database of historic building documentation in Pakistan. It will summarize the experiences of the six-month pilot project, including opportunities that have arisen in the aftermath of the COVID-19 pandemic and in light of the Government of Pakistan's ongoing Digital Pakistan initiative. The paper will also document and analyze the difficulties and hurdles that might emerge during the course of the project as the NCA Archives' digital infrastructure is built from the ground up in a post-colonial setting and a post-COVID world.
The COVID-19 pandemic has exacerbated or made more visible many known inequalities across borders and societies. This includes access to archaeological resources, both physical and digital. As both the creators and users of archaeological data adapted to working from their homes, cut off from artefact collections and research data siloed within organisations and institutions, the importance of making data freely and openly
available internationally became even more pronounced. The ARIADNE infrastructure (ariadne-infrastructure.eu) for archaeological data, and the SEADDA COST Action
(seadda.eu) are working to secure the sustainable future of archaeological data across Europe and beyond, in ways that are Findable, Accessible, Interoperable and Re-usable (FAIR). Experience within the ARIADNE partnership during the pandemic was largely positive, with many partners able to carry on as usual with accessing their digital resources, emphasising what is possible, while also emphasising what is not achievable
across archaeology, due to lack of capacity. ARIADNE and SEADDA invite papers discussing the challenges, opportunities and lessons learned across all aspects of archaeological data management during the pandemic, and how it may change and
inform our best practice going forward. We particularly invite papers from outside of Western Europe on how the COVID-19 pandemic created barriers or opportunities for accessing archaeological resources, so that we may better understand capacity building during a post-COVID era.
The Portable Antiquities of the Netherlands (PAN) portal and the data model behind the description of the finds are discussed in detail, along with how this approach leads to publishing data that is FAIR.
The Innovation Strategy and Targeted Activities report presents the ARIADNEplus innovation strategy, addressing its different dimensions and how each of these will be approached.
The main dimensions of the strategy are:
Research policies: Alignment with the European research policies on FAIR data, Open Science practices, and the European Open Science Cloud (EOSC) initiative.
Data integration: Increase of the ARIADNE data pool through incorporation of datasets from more archaeological research domains.
Data infrastructure: Implementation and operation of a Cloud-based platform for data aggregation, integration, discovery, access and use across institutional, national and disciplinary boundaries.
Service portfolio: Provision of enhanced and new services for digital archaeology on the Cloud-based platform.
Stakeholder and user base: Extension of the stakeholder and user base in Europe and beyond, taking account of user needs regarding data, technical services and training.
The report concludes with the methodology that is being used to evaluate the impact of ARIADNEplus on the wider archaeological community.
ARIADNEplus Community Needs Survey - Key Results (ariadnenetwork)
The survey of over 700 archaeologists and data managers found that 65% now share some or all project data through repositories, up from 50% in 2013. Respondents said they most need recognition for data sharing but also find it time-consuming. While awareness of sharing is growing, more must be done to increase readiness. Respondents have reused others' data for research and building databases. They are most interested in ARIADNEplus services for discovering, accessing, and registering datasets across countries. Training in open data principles is the top priority.
The objectives for the ARIADNEplus online survey were to collect information on needs of the ARIADNEplus user community regarding data sharing, access and (re)use, new services (as developed by the project), and related training needs. Results of the ARIADNEplus survey were to be compared, where possible, to those of the ARIADNE 2013 survey (ARIADNE 2014) and, particularly, to planned new technical and other services. Furthermore, the analysis of the results had to focus on the match between the perceived user needs and planned ARIADNEplus services, and suggestions to be provided on activities likely to enable an optimal match.
This presentation provides an insightful view into the digitisation agenda in Czech archaeology. A cornerstone of this is the Archaeological Information System of the Czech Republic (AIS CR), a national solution for research management, data gathering, curation and presentation. A key component of AIS CR is the Archaeological Map of the Czech Republic (AMCR), operational since 2017.
OpenArchaeo is an application to query archaeological data via CIDOC CRM, developed by the MASA Consortium (Mémoire des archéologues et des sites archéologiques). This tool can query both the MASA triplestore and other sources of archaeological data mapped to the CIDOC CRM, and can be used by other interfaces such as the ARIADNE portal.
INRAP is one of the biggest European institutions in charge of immovable archaeological heritage. Although centralised, INRAP is so large that considerable diversity in standards and tools existed. ARIADNE was therefore very helpful for Kai, Amala and their co-workers in applying some of ARIADNE's tools and approaches at INRAP. One of INRAP's top achievements thanks to ARIADNE was 'changing the culture of sharing'.
DANS, the Dutch Data Archiving and Networked Services, provides facilities for the deposit and archiving of archaeological data and maintains a Trusted Digital Repository. Challenges involved mass ingestion of datasets and making use of thesauri, data mining and Linked Open Data techniques.
The Swedish National Data Service (SND) took part in the original ARIADNE project and learned how to organise and classify their data for both the Portal and their own web service. They can now display map, marker and polygon information, using Elasticsearch, the AAT and PeriodO.
1. Methodological tips for mappings to CIDOC CRM
Maria Theodoridou, George Bruseker, Maria Daskalaki, Martin Doerr
FORTH - Institute of Computer Science
{bruseker, martin, maria}@ics.forth.gr
daskalakim@hotmail.com
2. OSLO CAA 2016, March 30, 2016
The CIDOC Conceptual Reference Model (ISO 21127:2006) has been chosen as the core model for use in several Cultural Heritage projects:
http://www.ariadne-infrastructure.eu/
http://www.itn-dch.eu/
http://www.researchspace.org/
http://www.parthenos-project.eu/
http://wiss-ki.eu/
http://americanartcollaborative.org/
3. CIDOC CRM extension suite
[Diagram: the CIDOC-CRM core (few concepts, high recall - Thing, Actor, Event, and properties such as "was present at" and "happened at") surrounded by its specialised extensions (special concepts, high precision): CRMinf, CRMsci, CRMarchaeo, CRMdig, CRMba]
4. Mapping one schema to another
Foundational activity:
Convert data stored in existing (source) schemata to an expression in a form (RDF) compatible with CIDOC CRM and its extensions (target schema).
It is not a matching exercise: it is not a matching of terms, but an interpretation of relations.
It is a consideration of the meaning of the source and a re-expression of that meaning in the target.
The common point of reference is interpreting source and target as descriptions of the same reality.
Goal:
Preserve as much as possible the initial 'meaning' of the data.
Enable information exchange and integration between heterogeneous sources of cultural heritage information.
5. Mapping one schema to another
Example mapping (coin weight):
Source: Domain: Coin; Path: weights; Range: WEIGHT
Target: Domain: E22 Man-Made Object -> P43 has dimension -> Intermediate Node: E54 Dimension
E54 Dimension -> P90 has value -> Literal
E54 Dimension -> P91 has unit -> Constant Node: E58 Measurement Unit "gr"
E54 Dimension -> P2 has type -> Constant Node: E55 Type "weight"
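The target side of this mapping can be serialized as RDF. A minimal sketch in Turtle, assuming a hypothetical ex: namespace and coin URI (the weight value 8.5 is invented for illustration; the crm: prefix is the published CIDOC CRM namespace):

```turtle
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical URIs for the coin and constant nodes

ex:coin42 a crm:E22_Man-Made_Object ;
    crm:P43_has_dimension ex:coin42-weight .       # intermediate node

ex:coin42-weight a crm:E54_Dimension ;
    crm:P90_has_value "8.5"^^xsd:decimal ;         # literal from the source WEIGHT field
    crm:P91_has_unit ex:unit-gr ;                  # constant node: E58 Measurement Unit
    crm:P2_has_type ex:type-weight .               # constant node: E55 Type

ex:unit-gr a crm:E58_Measurement_Unit ; rdfs:label "gr" .
ex:type-weight a crm:E55_Type ; rdfs:label "weight" .
```

The intermediate E54 Dimension node is what lets a single source field ("weights") expand into value, unit and type statements in the target.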
6. Mapping & Training Activities
Several mapping activities were initiated to convert existing schemata of cultural heritage data to CIDOC CRM:
ARIADNE: diverse archaeological databases
Research Space: British Museum and Rijksmuseum data
Training activities:
Research Space workshops at Yale and Oxford Universities
ITN-DCH CIDOC CRM board game
ARIADNE Summer Schools and Transnational Access
7. General Mapping Principles & Approach
8. Define primary domain
Define the primary domain - it typically represents the focus of interest.
Identify the CIDOC CRM class that models it best.
Is there a need for a new class?
Do we need properties that are not available in CIDOC CRM?
The introduction of a new class should comply with the “Minimality” modelling principle of CIDOC CRM (CIDOC CRM, 2015):
“A class is not declared unless it is required as the domain or range of a property not appropriate to its superclass, or it is a key concept in the practical scope”
9. Define primary domain
Example: Two approaches for defining Coin
1. Introduce a specialization of E22 Man-Made Object:
Exx Coin subclass of E22 Man-Made Object
2. Define the Type of an E22 Man-Made Object:
E22 Man-Made Object . P2 has type : E55 Type = “Coin”
To choose, we need to answer the question: does the new class Coin have new properties that are not available in E22?
A practical approach: create the new class, work with it, and if it turns out to have no new properties, revert to its CRM superclass.
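The two approaches can be sketched in Turtle. This is illustrative only: the ex: namespace, the class name ex:Coin and the instance URI are assumptions, not part of CIDOC CRM:

```turtle
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Approach 1: a new subclass (justified only if Coin needs properties E22 lacks)
ex:Coin rdfs:subClassOf crm:E22_Man-Made_Object .
ex:coin627 a ex:Coin .

# Approach 2: typing an E22 instance with an E55 Type term
ex:coin627 a crm:E22_Man-Made_Object ;
    crm:P2_has_type ex:type-coin .
ex:type-coin a crm:E55_Type ; rdfs:label "Coin" .
```

Approach 2 keeps the schema stable and pushes the distinction into a controlled vocabulary, which is usually preferable under the minimality principle.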
11. Common database fields: Local identifiers
We map local identifiers in relational database tables explicitly only if these identifiers are visible in the user interface and used in other documents as well. Alternatively, we use the local database identifiers only for generating URIs for the record instance, if the thing identified belongs to our holdings or was created/invented by us. Otherwise, it is better to use a global authority.
Golden rules for identifiers:
they should be widely known for the item they identify
they should be related to those who can confirm the relation between the identifier and the identified (e.g. museum object identifiers)
12. Common database fields: Local identifiers
Source record:
<COIN>
<ID>627</ID>
<COUNTRY_ID>1</COUNTRY_ID>
<FIND_SPOT_ID>242</FIND_SPOT_ID>
<FIND_MANNER_ID>2</FIND_MANNER_ID>
………………..
</COIN>
Option 1 - only a local URI:
E22 Man-Made Object http://www.oeaw.ac.at/coins/627
Option 2 - a local URI and an Identifier:
E22 Man-Made Object http://www.oeaw.ac.at/coins/627 -> P1 is identified by -> E42 Identifier http://www.oeaw.ac.at/id/627
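In Turtle, the two options look as follows (a sketch using the URIs from the slide; the rdfs:label on the identifier is an assumption about where the literal "627" would be attached):

```turtle
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Option 1: the local database ID only mints the URI
<http://www.oeaw.ac.at/coins/627> a crm:E22_Man-Made_Object .

# Option 2: the ID is additionally modelled as an explicit E42 Identifier
<http://www.oeaw.ac.at/coins/627>
    crm:P1_is_identified_by <http://www.oeaw.ac.at/id/627> .
<http://www.oeaw.ac.at/id/627> a crm:E42_Identifier ;
    rdfs:label "627" .
```

Option 2 is only worth the extra nodes when the identifier itself is visible to users and cited in other documents, as the previous slide advises.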
13. Common database fields: Local identifiers
Whenever possible use a global authority.
El Greco (Dominico Theotokopoulos):
VIAF: E39 Actor http://viaf.org/viaf/100215785/
or
Getty ULAN: E39 Actor http://vocab.getty.edu/ulan/500010916
14. Common database fields: Appellations
Label and Appellation are alternative implementations in RDF.
For contemporary names (not historical), the use of rdfs:label with a language attribute is sufficient.
We use the E41 Appellation class only if there is a need to assign some property to the Appellation.
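The distinction can be shown side by side in Turtle. A sketch: the appellation URI and its type term are hypothetical, invented to illustrate when the heavier E41 pattern pays off:

```turtle
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical URIs

# Contemporary name: a plain language-tagged label is sufficient
<http://viaf.org/viaf/100215785/> a crm:E39_Actor ;
    rdfs:label "El Greco"@en .

# E41 Appellation: only when the name itself needs properties of its own
<http://viaf.org/viaf/100215785/>
    crm:P1_is_identified_by ex:appellation-el-greco .
ex:appellation-el-greco a crm:E41_Appellation ;
    rdfs:label "Dominico Theotokopoulos" ;
    crm:P2_has_type ex:type-birth-name .   # e.g. a name-type term, illustrative
```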
15. Common database fields: Yes/No fields
Yes/No fields of relational databases are typically mapped to types.
Table name: Gefaesse (vessels) (E22 Man-Made Object)
Field name: GefFragment: yes/no field
If the value of the field is yes, then Gefaesse.GefFragment maps to E55 Type "fragment":
E22 Man-Made Object "Vessel 1" -> P2 has type -> E55 Type "fragment"
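The resulting triples, sketched in Turtle (the ex: URIs are assumptions; note that a "no" value simply produces no statement):

```turtle
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# GefFragment = yes  =>  assert the type; GefFragment = no  =>  emit nothing
ex:vessel1 a crm:E22_Man-Made_Object ;
    rdfs:label "Vessel 1" ;
    crm:P2_has_type ex:type-fragment .
ex:type-fragment a crm:E55_Type ; rdfs:label "fragment" .
```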
16. Common database fields: Implicit and contextual information
All the hidden constants - the common context information of a database - can be added as constant node information at a suitable object type, typically in the domain node.
Quite often semantics are built into:
the user interface
the queries
the values of the identifiers
17. Common database fields: Implicit and contextual information
Example mapping (coin weight, with the implicit keeper made explicit):
Source: Domain: Coin; Path: weights; Range: WEIGHT
Target: Domain: E22 Man-Made Object -> P43 has dimension -> Intermediate Node: E54 Dimension
E54 Dimension -> P90 has value -> Literal
E54 Dimension -> P91 has unit -> Constant Node: E58 Measurement Unit "gr"
E54 Dimension -> P2 has type -> Constant Node: E55 Type "weight"
E22 Man-Made Object -> P50 has current keeper -> Constant Node: E40 Legal Body "OEAW"
18. Common database fields: Categorical vs factual info
Need to separate categorical and factual data.
Otherwise the information becomes inconsistent:
Find spot -> refers to a specific coin
Historical facts -> refer to a category of coins
19. Common database fields: Categorical vs factual info
Need to extend the model in order to support categorical production (similar to FRBRoo R26 produced things of type):
E22 Man-Made Object "MyCoin" -> P108i was produced by -> E12 Production "production of coins AU"
E22 Man-Made Object "MyCoin" -> P137 exemplifies -> E55 Type "AU from Rome"
E12 Production "production of coins AU" -> PC1 produced things of type -> E55 Type "AU from Rome"
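Sketched in Turtle, keeping the factual and categorical statements apart (the ex: URIs are assumptions, and PC1 produced things of type is the extension property introduced on the slide, so its namespace here is also assumed):

```turtle
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace, incl. the PC1 extension property

ex:MyCoin a crm:E22_Man-Made_Object ;
    crm:P108i_was_produced_by ex:production-coins-AU ;   # factual: this coin's production
    crm:P137_exemplifies ex:type-AU-from-Rome .          # this coin exemplifies the category

ex:production-coins-AU a crm:E12_Production ;
    ex:PC1_produced_things_of_type ex:type-AU-from-Rome .  # categorical: output type of the event

ex:type-AU-from-Rome a crm:E55_Type ; rdfs:label "AU from Rome" .
```

The historical facts (mint, issuer, date) attach to the production event or the type, never to the individual find, whose own record keeps only factual data such as the find spot.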
20. People: Groups, Nationality, Origins
Nationality can be modeled as:
membership in an E74 Group = citizenship, social participation
(an actor may join and/or leave the group - former or current member)
E67 Birth = provenance
E55 Type = behavior; suitable for ambiguous, fuzzy characterizations
21. People: Groups, Nationality, Origins
Example: El Greco (photo: Art Institute Chicago, M. Doerr, 2010)
E39 Actor http://viaf.org/viaf/100215785/
-> P2 has type -> E55 Type "Spanish"
-> P107i is current or former member of -> E74 Group "Cretan"
-> P98i was born -> E67 Birth "The birth of El Greco"
E67 Birth "The birth of El Greco" -> P7 took place at -> E53 Place "Greece"
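The three modelling options combine on one actor, as this Turtle sketch shows (the ex: URIs for the unnamed type, group, birth and place nodes are assumptions):

```turtle
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical URIs for the unnamed nodes

<http://viaf.org/viaf/100215785/> a crm:E39_Actor ;
    crm:P2_has_type ex:type-spanish ;                           # behaviour: "Spanish"
    crm:P107i_is_current_or_former_member_of ex:group-cretan ;  # social participation
    crm:P98i_was_born ex:birth-el-greco .                       # provenance

ex:type-spanish a crm:E55_Type ; rdfs:label "Spanish" .
ex:group-cretan a crm:E74_Group ; rdfs:label "Cretan" .
ex:birth-el-greco a crm:E67_Birth ;
    rdfs:label "The birth of El Greco" ;
    crm:P7_took_place_at ex:place-greece .
ex:place-greece a crm:E53_Place ; rdfs:label "Greece" .
```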
22. People: Groups, Nationality, Origins
Art Institute Chicago: http://www.artic.edu/aic/collections/artwork/87479
23. People: Accidental roles
Accidental roles are roles that do not characterize an actor independently from a particular context of activity. In CIDOC CRM they are expressed by the property of property P14.1 in the role of on P14 carried out by, but a property of a property cannot be represented directly in RDF.
They should NOT be assigned to the actor.
E7 Activity -> P14 carried out by (performed) [P14.1 in the role of: E55 Type] -> E39 Actor
24. People: Accidental roles
Solution 1: Specializing P14 carried out by for each role
suitable if we know a priori all the possible values
E7 Activity http://www.ics.forth.gr/Activity/2013HeraklionCRM-SIGmeeting -> P14.1.1 carried out by as researcher -> E39 Actor http://www.ics.forth.gr/Actor/Doerr
E7 Activity http://www.ics.forth.gr/Activity/2013MaladesPlantingOliveTrees -> P14.1.2 carried out by as farmer -> E39 Actor http://www.ics.forth.gr/Actor/Doerr
25. People: Accidental roles
Solution 2: Introducing class PC0 Typed CRM Property and its subclass PC14 carried out by, together with the properties P01 has domain and P02 has range
Schema: PC0 Typed CRM Property -> P01 has domain (P01i is domain of) -> E1 CRM Entity; PC0 Typed CRM Property -> P02 has range (P02i is range of) -> E1 CRM Entity; PC14 carried out by -> P14.1 in the role of -> E55 Type
Instance 1:
PC14 carried out by http://www.ics.forth.gr/Actor/Doerr_as_Researcher
-> P01 has domain -> E7 Activity http://www.ics.forth.gr/Activity/2013HeraklionCRM-SIGmeeting
-> P02 has range -> E39 Actor http://www.ics.forth.gr/Actor/Doerr
-> P14.1 in the role of -> E55 Type http://www.ics.forth.gr/Role/Researcher
Instance 2:
PC14 carried out by http://www.ics.forth.gr/Actor/Doerr_as_Farmer
-> P01 has domain -> E7 Activity http://www.ics.forth.gr/Activity/2013MaladesPlantingOliveTrees
-> P02 has range -> E39 Actor http://www.ics.forth.gr/Actor/Doerr
-> P14.1 in the role of -> E55 Type http://www.ics.forth.gr/Role/Farmer
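Serialized in Turtle, the reified statements above might look like this. A sketch: the pc: namespace for the PC classes and P01/P02/P14.1 properties is an assumption here (published CRM RDF keeps these in a separate extension namespace), while the instance URIs come from the slide:

```turtle
@prefix pc: <http://example.org/crmpc/> .   # assumed namespace for PC0/PC14, P01, P02, P14.1

<http://www.ics.forth.gr/Actor/Doerr_as_Researcher> a pc:PC14_carried_out_by ;
    pc:P01_has_domain <http://www.ics.forth.gr/Activity/2013HeraklionCRM-SIGmeeting> ;
    pc:P02_has_range  <http://www.ics.forth.gr/Actor/Doerr> ;
    pc:P14.1_in_the_role_of <http://www.ics.forth.gr/Role/Researcher> .

<http://www.ics.forth.gr/Actor/Doerr_as_Farmer> a pc:PC14_carried_out_by ;
    pc:P01_has_domain <http://www.ics.forth.gr/Activity/2013MaladesPlantingOliveTrees> ;
    pc:P02_has_range  <http://www.ics.forth.gr/Actor/Doerr> ;
    pc:P14.1_in_the_role_of <http://www.ics.forth.gr/Role/Farmer> .
```

Each PC14 node reifies one P14 statement, so the role attaches to the statement rather than to the actor.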
26. People: Accidental roles
Solution 2: equivalence with the CIDOC CRM definition
The reified form:
PC14 carried out by http://www.ics.forth.gr/Actor/Doerr_as_Researcher
-> P01 has domain -> E7 Activity http://www.ics.forth.gr/Activity/2013HeraklionCRM-SIGmeeting
-> P02 has range -> E39 Actor http://www.ics.forth.gr/Actor/Doerr
-> P14.1 in the role of -> E55 Type http://www.ics.forth.gr/Role/Researcher
is equivalent to the CIDOC CRM statement:
E7 Activity http://www.ics.forth.gr/Activity/2013HeraklionCIDOC-CRM-SIGmeeting
-> P14 carried out by (performed) [P14.1 in the role of: E55 Type http://www.ics.forth.gr/Role/Researcher] -> E39 Actor http://www.ics.forth.gr/Actor/Doerr
27. People: Accidental roles
Solution 3: Add a note to each Activity with the person's role
does not require any extension of the model
E7 Activity http://www.ics.forth.gr/Activity/2013HeraklionCRM-SIGmeeting
-> P14 carried out by -> E39 Actor http://www.ics.forth.gr/Actor/Doerr
-> P3 has note -> Literal "The Activity was carried out by Doerr in the role of Researcher"
E7 Activity http://www.ics.forth.gr/Activity/2013MaladesPlantingOliveTrees
-> P14 carried out by -> E39 Actor http://www.ics.forth.gr/Actor/Doerr
-> P3 has note -> Literal "The Activity was carried out by Doerr in the role of Farmer"
28. OSLO CAA 2016, March 30, 2016
People:
Accidental roles

Solution 4: Assign a type to each Activity
(does not require any extension of the model)

E7 Activity   http://www.ics.forth.gr/Activity/2013HeraklionCRM-SIGmeeting
  P14 carried out by → E39 Actor http://www.ics.forth.gr/Actor/Doerr
  P2 has type → E55 Type http://www.ics.forth.gr/Role/Research

E7 Activity   http://www.ics.forth.gr/Activity/2013MaladesPlantingOliveTrees
  P14 carried out by → E39 Actor http://www.ics.forth.gr/Actor/Doerr
  P2 has type → E55 Type http://www.ics.forth.gr/Role/Farming
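With Solution 4 the role question becomes a plain filter over the data. A minimal sketch, assuming (subject, predicate, object) string triples; the URIs are the slide's, and `activities_of` is an illustrative helper, not an established API:

```python
DOERR = "http://www.ics.forth.gr/Actor/Doerr"

triples = [
    ("http://www.ics.forth.gr/Activity/2013HeraklionCRM-SIGmeeting",
     "P14 carried out by", DOERR),
    ("http://www.ics.forth.gr/Activity/2013HeraklionCRM-SIGmeeting",
     "P2 has type", "http://www.ics.forth.gr/Role/Research"),
    ("http://www.ics.forth.gr/Activity/2013MaladesPlantingOliveTrees",
     "P14 carried out by", DOERR),
    ("http://www.ics.forth.gr/Activity/2013MaladesPlantingOliveTrees",
     "P2 has type", "http://www.ics.forth.gr/Role/Farming"),
]

def activities_of(actor, role_type, triples):
    """Activities carried out by `actor` whose P2 type is `role_type`."""
    by_actor = {s for s, p, o in triples
                if p == "P14 carried out by" and o == actor}
    typed = {s for s, p, o in triples
             if p == "P2 has type" and o == role_type}
    return sorted(by_actor & typed)

farming = activities_of(DOERR, "http://www.ics.forth.gr/Role/Farming", triples)
```

Here `farming` contains only the olive-tree planting activity, since the SIG meeting carries the type Research instead.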
29. OSLO CAA 2016, March 30, 2016
People:
Accidental roles - Issuer

Mapping of a numismatic source to the CRM:
  Source Domain: //COIN             Target Domain: E22 Man-Made Object
  Source Range:  //ISSUER           Target Range:  E39 Actor
  Source Path:   ISSUER_ID == PR_ID

Target path:
E22 Man-Made Object (the coin)
  P108i was produced by → E12 Production p1
    P14 carried out by → E39 Actor (the issuer)
    P17 was motivated by ("gave order") → E7 Activity ia1, of E55 Type "Issuing"

"Issuer" is an accidental role: it does not characterize an actor independently
of particular contexts of activity. Therefore the Actor does not carry the type
"Issuer"; only the activity carries the type "Issuing".
30. OSLO CAA 2016, March 30, 2016
People:
Political office

Political office is modeled as an E74 Group with one member;
an actor may join and/or leave the group (former vs. current member).

https://en.wikipedia.org/wiki/Alexis_Tsipras
"Alexis Tsipras is the 185th and current Prime Minister of Greece, having been sworn in
on 21 September 2015. He previously served as the 183rd Prime Minister of Greece from
26 January 2015 to 27 August 2015."

E74 Group "The Prime Minister of Greece"
  P144i gained member by → E85 Joining
    P143 joined → E39 Actor Alexis Tsipras
    P4 has time-span → E52 Time-Span 26 January 2015
  P146i lost member by → E86 Leaving
    P145 separated → E39 Actor Alexis Tsipras
    P4 has time-span → E52 Time-Span 27 August 2015
  P144i gained member by → E85 Joining
    P143 joined → E39 Actor Alexis Tsipras
    P4 has time-span → E52 Time-Span 21 September 2015
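The join/leave events above are enough to reconstruct terms of office. A minimal sketch, assuming the events are flattened to (kind, actor, date) tuples; the dates are the Wikipedia example's, and `terms` is an illustrative helper:

```python
from datetime import date

events = [  # (kind, actor, time-span)
    ("joining", "Alexis Tsipras", date(2015, 1, 26)),
    ("leaving", "Alexis Tsipras", date(2015, 8, 27)),
    ("joining", "Alexis Tsipras", date(2015, 9, 21)),
]

def terms(actor, events):
    """Pair each joining with the next leaving; an open term ends with None."""
    spans, start = [], None
    for kind, who, when in sorted(events, key=lambda e: e[2]):
        if who != actor:
            continue
        if kind == "joining":
            start = when
        elif kind == "leaving" and start is not None:
            spans.append((start, when))
            start = None
    if start is not None:
        spans.append((start, None))  # current member: no leaving event yet
    return spans
```

`terms("Alexis Tsipras", events)` yields two spans: 26 January 2015 to 27 August 2015, and an open span starting 21 September 2015 (the current membership).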
31. OSLO CAA 2016, March 30, 2016
Places:
Countries

Geopolitical units map to a Period with the type "State", etc.
Map them as Place, but build the identifier from the name of the country plus the
date when that name was assigned. This snapshot keeps every reference in the right
spatio-temporal context.
Always refer to a global authority, e.g. TGN (falls within, overlaps, is identified by).
32. OSLO CAA 2016, March 30, 2016
Places:
Imaginary Places

"Why hell is not a good place"
E53 Place always represents a real extent in space, independent of time.
An imaginary place can be represented as an E89 Propositional Object.
The ambiguity between real and imaginary is an ambiguity of the phrase, not an
ambiguity in the mind of the author of the phrase (as long as he is sane).
Alternatives should be made explicit: the principle of recall over precision.
33. OSLO CAA 2016, March 30, 2016
Places:
Imaginary Places

How many interpretations of hell?
[Images: Taoist Hell; Catholic Hell, Pisa, 13th century; Byzantine-Orthodox Hell, Crete]
34. OSLO CAA 2016, March 30, 2016
Places:
Imaginary Places

William Blake's depiction of "The Vestibule of Hell and the Souls Mustering to Cross
the Acheron" in his Illustrations to Dante's "Divine Comedy", object 5, c. 1824-27.
The original of the work is held by the National Gallery of Victoria.
Example taken from https://en.wikipedia.org/wiki/Acheron

E73 Information Object "The Vestibule of Hell and the Souls Mustering to Cross the Acheron"
  P67 refers to → E89 Propositional Object "Hell"
  P67 refers to → E53 Place "Acheron"
35. OSLO CAA 2016, March 30, 2016
Places:
Imaginary Places

Italian archaeologists working at the Greco-Roman site of ancient Hierapolis
(modern-day Pamukkale) in Turkey have uncovered that city's gate to the underworld.
Example taken from http://news.nationalgeographic.com/news/2013/04/130414-hell-underworld-archaeology-mount-olympus--greece/

E73 Information Object "Archaeologists Find a Classic Entrance to Hell"
  P67 refers to → E89 Propositional Object "Entrance to Hell"
An E13 Attribute Assignment (P140 was attributed by / P141 assigned) records that
the propositional object possibly refers to the E53 Place "ancient Hierapolis".
36. OSLO CAA 2016, March 30, 2016
Objects:
Pieces of a broken object

Pieces of a broken object are mapped as products of an event that destroyed the
object, not as parts of it:
  for provenance reasons;
  for semantic clarity: as long as the object existed, the respective pieces of
  matter were not distinct and had no identity.

E81 Transformation "destruction of object"
  P124 transformed → E22 Man-Made Object "broken object"
  P123 resulted in → E22 Man-Made Object "Piece 1"
    P2 has type → E55 Type "fragment"
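The provenance argument can be made concrete: because the pieces are products of the transformation event, the original object stays reachable from any piece through that event. A minimal sketch with illustrative node names, assuming plain string triples:

```python
triples = [
    ("destruction_of_object", "rdf:type", "E81 Transformation"),
    ("destruction_of_object", "P124 transformed", "broken_object"),
    ("destruction_of_object", "P123 resulted in", "piece_1"),
    ("piece_1", "P2 has type", "fragment"),
]

def origin_of(piece, triples):
    """Object(s) whose transformation produced `piece`: walk P123i, then P124."""
    events = {s for s, p, o in triples
              if p == "P123 resulted in" and o == piece}
    return sorted(o for s, p, o in triples
                  if p == "P124 transformed" and s in events)
```

`origin_of("piece_1", triples)` returns `["broken_object"]`; a plain part_of mapping would not record which event created the fragment.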
37. OSLO CAA 2016, March 30, 2016
Objects:
Pieces of a set

Pieces of a set are mapped as parts: a chess set has as parts the chess board and
the chess pieces.
Collective objects in the general sense, like a tomb full of gifts, a folder with
stamps or a set of chessmen, should be documented as instances of E19 Physical
Object and not as instances of E78 Collection. This is because they form wholes:
either they are physically bound together or they are kept together for their
functionality.

E22 Man-Made Object "Chess set"
  P46 is composed of → E22 Man-Made Object "Chess board"
  P46 is composed of → E22 Man-Made Object "Chess pieces"
38. OSLO CAA 2016, March 30, 2016
Conclusions

Always consult the reference document:
  Definition of the CIDOC Conceptual Reference Model
  http://www.cidoc-crm.org/comprehensive_intro.html
Read carefully the scope note of each class and property that you intend to use.
Learning to use a formal ontology requires understanding the basic principles
expressed in its scope notes, classes and relations.
http://www.cidoc-crm.org/index.html
New site (available for testing): http://new.cidoc-crm.org/