The presentation of my public PhD defense on March 10, 2022. The related video is available at https://www.youtube.com/watch?v=NofQSwc3Svk
This doctoral thesis tackles how to support users in assessing, creating and using Knowledge Graph restrictions.
More concretely, this dissertation contributes the FAIR Montolo statistics, which support users in assessing existing Knowledge Graphs based on the restrictions they use.
The two visual notations ShapeUML and ShapeVOWL are presented and evaluated: they represent all constraint types of the Shapes Constraint Language (SHACL) and thus advance the state of the art.
Finally, the use of restrictions to represent formal meaning and to assess data quality is demonstrated for a social media archiving use case in the BESOCIAL project of the Royal Library of Belgium (KBR).
Statistics about Data Shape Use in RDF Data - Sven Lieber
The presentation of the poster paper "Statistics about Data Shape Use in RDF Data" presented during the demo/poster session at the International Semantic Web Conference (ISWC) 2020.
Joint work with Ben De Meester, Anastasia Dimou and Ruben Verborgh.
The related video is available on YouTube: https://www.youtube.com/watch?v=6-OdjYdEpeU
BESOCIAL: A Knowledge Graph for Social Media Archiving - Sven Lieber
The presentation of our paper "BESOCIAL: A Sustainable Knowledge Graph-based Workflow for Social Media Archiving" presented at the SEMANTiCS EU conference 2021 in Amsterdam.
Joint work with Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz and Anastasia Dimou.
The related video is available online at https://youtu.be/oYmzD3e8rBE?t=1912
Development of Semantic Web based Disaster Management System - NIT Durgapur
A Semantic Web model for the field of disaster management that structures the data so that any information needed during an emergency is easily available.
Context, Perspective, and Generalities in a Knowledge Ontology - Mike Bergman
This presentation to the Ontolog Forum in Dec 2016 presents the knowledge graph (ontology) design for KBpedia, a system of six major knowledge bases and 20 minor ones for conducting knowledge-based artificial intelligence (KBAI). The talk emphasizes the roots of the system in the triadic logic of Charles Sanders Peirce. It also discusses the use of KBpedia for the more-or-less automatic ways it can help create training corpora, training sets, and reference standards for supervised, unsupervised and deep machine learning. Uses of the system include entity and relation extraction and tagging, classification, clustering, sentiment analysis, and other AI tasks.
An introduction to the Joint Information Systems Committee Resource Discovery iKit. Includes a look at controlled vocabularies declared in the Resource Description Framework (RDF)/Simple Knowledge Organisation System (SKOS) and Wikipedia entries. Presented by Tony Ross at the CILIPS Centenary Conference Branch and Group Day which took place on 5 June 2008.
The document introduces a semantic wiki that aims to reduce the steep learning curve of developing and deploying semantic web applications. The semantic wiki allows for easy publishing and smart data propagation for end users, as well as fast prototyping in the browser and lightweight concept modeling for developers. It integrates semantic technologies like RDF, OWL, and SPARQL to enable knowledge management, data organization, data sharing, personalization, privacy protection, and provenance tracking within wikis. Challenges addressed include ontology modeling, relational modeling with rules, semantic querying across multiple wikis, and annotation extensions.
Research Data Sharing: A Basic Framework - Paul Groth
Some thoughts on thinking about data sharing. Prepared for the 2016 LERU Doctoral Summer School - Data Stewardship for Scientific Discovery and Innovation.
http://www.dtls.nl/fair-data/fair-data-training/leru-summer-school/
Knowledge Technologies: Opportunities and Challenges - Fariz Darari
How to be one step ahead of leveraging knowledge technologies for your apps!
When: Dec 8, 2017
Where: Fl. 6, Multimedia Tower, Central Jakarta
Thanks to Ragil for the invitation!
This document discusses open medical knowledge bases and Wikidata in particular. It describes Wikidata as a free and multilingual knowledge base that has grown from 30k facts in 2013 to over 346m facts in 2017. The document provides examples of medical information represented in Wikidata, including anatomy, diseases, and drugs. It also describes ongoing efforts to improve medicine data in Wikidata and gives examples of applications that utilize Wikidata's medical knowledge, such as virtual doctors that can identify potential diseases based on reported symptoms.
This document discusses approaches to developing globally interoperable metadata standards like RDA. It describes the failure of top-down approaches and issues with both top-down and bottom-up mapping strategies. Bottom-up risks multiple overlapping element sets while top-down may not fully represent local practices. The author advocates balancing global needs with flexibility for local implementation.
RDF and Open Linked Data, a first approach - horvadam
This document discusses the potential benefits of libraries publishing their data as linked open data using semantic web technologies. It describes how linked data allows for standardized access to data across the web as a single API. Libraries can make their data more discoverable on the web and searchable by services like Google by publishing it as linked open data. Semantic web technologies like RDF and SPARQL allow for more powerful search capabilities. Several large libraries are already publishing portions of their data as linked open data, including authority files and entire catalogs. The document outlines some semantic web applications libraries could use to enhance discovery and provides examples of vocabularies for describing different types of metadata.
The document discusses the evolution of the web from documents to data, and introduces linked data which publishes machine-readable data on the web that is explicitly defined and linked to other datasets. It then discusses question answering systems that take natural language questions and locate answers from document collections, including both closed-domain systems with restricted knowledge bases and open-domain systems that retrieve answers from the web. The document also presents the linked data technology stack and some examples of linked open data clouds from 2007 to 2011 to demonstrate the growth of linked data on the web.
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques - Prateek Jain
The recent emergence of the “Linked Data” approach for publishing data represents a major step forward in realizing the original vision of a web that can “understand and satisfy the requests of people and machines to use the web content” – i.e. the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 70 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud – as we will illustrate – are too shallow to realize much of the benefits promised. If this limitation is left unaddressed, then the LOD Cloud will merely be more data that suffers from the same kinds of problems, which plague the Web of Documents, and hence the vision of the Semantic Web will fall short.
This thesis presents a comprehensive solution to address these issues using a bootstrapping based approach. It showcases using bootstrapping based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, have provided evidence of the feasibility and applicability of the solution.
This document discusses Neo4j and its applications in bioinformatics. It describes Bio4j, an open source bioinformatics graph database built using Neo4j that integrates data from sources like Uniprot, NCBI taxonomy, Gene Ontology, and more. Bio4j models biological data as nodes and relationships in a graph structure rather than tables. This allows for more flexible querying and knowledge integration. The document provides examples of how Bio4j can be accessed through its Java API, Cypher query language, Gremlin traversal language, and REST API. It also describes some tools and visualizations for exploring and analyzing Bio4j data.
A QA system takes in a natural language question, analyzes it to understand the type of question and information sought, searches structured and unstructured data sources for relevant information, and generates a natural language answer. It consists of modules for question analysis, information retrieval from knowledge bases and documents, answer generation, and response formatting. The goal is to delegate more interpretation work to machines so users can get direct answers to complex questions over heterogeneous data.
Introduction to question answering for linked data & big data - Andre Freitas
This document discusses question answering (QA) systems in the context of big data and heterogeneous data scenarios. It outlines the motivation and challenges for developing natural language interfaces for databases. The document covers the basic concepts and taxonomy of QA systems, including question types, answer types, data sources, and domains. It also discusses the anatomy and components of a typical QA system.
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Cataldo Musto
This document provides an overview and agenda for a tutorial on semantics-aware techniques for social media analysis, user modeling, and recommender systems. The tutorial will discuss how to represent content to improve information access and build new services for social media. It will cover why intelligent information access is needed to effectively cope with information overload, and how semantics can be introduced through natural language processing and by encoding endogenous and exogenous semantics. The agenda includes explaining recommendations, semantic user profiles based on social data, and semantic analysis of social streams.
Developing Linked Data and Semantic Web-based Applications (Expotec 2015) - Ig Bittencourt
The document discusses developing Linked Data and Semantic Web applications. It begins with key concepts related to Linked Data, the Semantic Web, and applications. It then describes two key steps in developing such applications: publishing data as Linked Data and consuming Linked Data to build applications. Examples are provided of extracting, enriching, and linking different datasets to build a real estate recommendation application that performs semantic searches over the integrated data. Ontologies are created and reused to represent the domains and support interoperability. The document emphasizes integrating the data and software engineering perspectives in developing Semantic Web applications.
The Datalift Project aims to publish and interconnect government open data. It develops tools and methodologies to transform raw datasets into interconnected semantic data. The project's first phase focuses on opening data by developing an infrastructure to ease publication. The second phase will validate the platform by publishing real datasets. The goal of Datalift is to move data from its raw published state to being fully interconnected on the Semantic Web.
Datalift is a project that aims to catalyze the publication and interconnection of data on the web. It provides tools and services to help with various steps in the data publication process including:
- Dataset publication and conversion tools to automate publishing raw data as linked data using RDF.
- Infrastructure for storing and querying published RDF data using SPARQL endpoints and RDF stores.
- Linkage tools to help interconnect published datasets by finding equivalence links between resources.
- Applications that visualize and make use of published and interlinked datasets to demonstrate the value of linked open data.
Riding the wave - Paradigm shifts in information access - datacite
The document discusses the paradigm shifts in scientific information access over time from empirical observation to computational simulation. It outlines the challenges libraries now face in providing access to non-textual scientific content like research data and simulations. The document also introduces DataCite, a global consortium that issues digital object identifiers (DOIs) to datasets to help make them accessible, citable, and traceable like scholarly articles.
The document discusses a webinar presented by NISO and DCMI on Schema.org and Linked Data. The webinar provides an overview of Schema.org and Linked Data, examines the advantages and challenges of using RDF and Linked Data, looks at Schema.org in more detail, and discusses how Schema.org and Linked Data can be combined. The goals of the webinar are to illustrate the different design choices for identifying entities and describing structured data, integrating vocabularies, and incentives for publishing accurate data, as well as to help guide adoption of Schema.org and Linked Data approaches.
This document provides an overview of the RDF data model. It discusses the history and development of RDF standards from 1997 to 2014. It explains that an RDF graph is made up of triples consisting of a subject, predicate, and object. It provides examples of RDF triples and their N-triples representation. It also describes RDF syntaxes like Turtle and features of RDF like literals, blank nodes, and language-tagged strings.
This document summarizes work being done to express the Data Documentation Initiative (DDI) metadata standard in Resource Description Framework (RDF) format to improve discovery and linking of microdata on the Web of Linked Data. It describes background on the DDI to RDF mapping effort, the goals of making microdata more accessible and interoperable online, and examples of how the RDF representation would support common discovery use cases. It also provides information on tools and next steps for the ongoing work, acknowledging contributions from participants in workshops where this effort was discussed.
Introduction to Ontology Concepts and Terminology - Steven Miller
The document introduces an ontology tutorial that will cover basic concepts of the Semantic Web, Linked Data, and the Resource Description Framework data model as well as the ontology languages RDFS and OWL. The tutorial is intended for information professionals who want to gain an introductory understanding of ontologies, ontology concepts, and terminology. The tutorial will explain how to model and structure data as RDF triples and create basic RDFS ontologies.
Objectification Is A Word That Has Many Negative Connotations - Beth Johnson
Here is an introduction to social web mining and big data:
Social web mining is the process of extracting useful information and knowledge from social media data. With the rise of big data, social media platforms are generating massive amounts of unstructured data every day in the form of posts, comments, shares, likes, etc. This user-generated data holds valuable insights about people's opinions, interests, behaviors and more.
Big data analytics provides tools and techniques to analyze this large, complex social data at scale. Social web mining applies data mining and machine learning algorithms to big social data to discover patterns and relationships. Areas of focus include sentiment analysis to understand public opinions on brands, products or issues; network analysis to map relationships and influence; and
Project Credit: Melissa Haendel - On the Nature of Credit - CASRAI
This document discusses credit and attribution in research. It provides an example scenario of researchers involved in a project and publication. It also discusses modeling relationships between people, publications, datasets and other research entities. The document recommends using standards like PROV-O and the W3C Dataset Description standard to represent these relationships and enable attribution and reproducibility. Questions that can be asked by representing roles and contributions in a formal language are also presented.
Talk at the 3rd Keystone Training School - Keyword Search in Big Linked Data - Institute for Software Technology and Interactive Systems, TU Wien, Austria, 2017
VIZ-VIVO: Towards Visualizations-driven Linked Data Navigation - Muhammad Javed
Paper published in the ISWC co-located workshop VOILA 2016.
Abstract: Scholars@Cornell is a new project of Cornell University Library (CUL) that provides linked data and novel visualizations of the scholarly record. Our goal is to enable easy discovery of explicit and latent patterns that can reveal high-impact research areas, the dynamics of scholarly collaboration, and expertise of faculty and researchers. We describe VIZ-VIVO, an extension for the VIVO framework that enables end-user exploration of a scholarly knowledge-base through a configurable set of data-driven visualizations. Unlike systems that provide web pages of researcher profiles using lists and directory-style metaphors, our work explores the power of visual metaphors for navigating a rich semantic network of scholarly data modeled with the VIVO-ISF ontology. We produce dynamic web pages using D3 visualizations and bridge the user experience layer with the underlying semantic triple-store layer. Our selection of visual metaphors enables end users to start with the big picture of scholarship and navigate to individual faculty and researchers within a macro visual context. The D3-enabled interactive environment can guide the user through a sea of scholarly data depending on the questions the user wishes to answer. In this paper, we discuss our process for selection, design, and development of an initial set of visualizations as well as our approach to the underlying technical architecture. By engaging an initial set of pilot partners we are evaluating the use of these data-driven visualizations by multiple stakeholders, including faculty, students, librarians, administrators, and the public.
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas - Angelo Salatino
Ontologies of research areas are important tools for characterising, exploring, and analysing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 15K topics and 70K semantic relationships. It was created by applying the Klink-2 algorithm on a very large dataset of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO we have developed the CSO Portal, a web application that enables users to download, explore, and provide granular feedback on CSO at different levels. Users can use the portal to rate topics and relationships, suggest missing relationships, and visualise sections of the ontology. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various communities engaged with scholarly data.
Searching for patterns in crowdsourced information - Silvia Puglisi
This document introduces crowdsourcing and discusses discovering patterns in crowdsourced data. It discusses defining the context of volunteered information on the internet in order to understand relationships between data. A network model is proposed where different types of context define nodes and relationships between context determine edges. Properties of small world networks are discussed including how they could be used to model relationships between crowdsourced data and evaluate data quality. Finally, applications to search ranking, privacy and security are briefly mentioned.
Scholars@Cornell: Visualizing the Scholarship Data - Muhammad Javed
Short paper published in the IEEE Visualizations in Practice workshop, Phoenix, AZ.
A new project of CUL is Scholars@Cornell, a data and visualization service built upon VIVO's semantic, linked data knowledge-base that represents the record of scholarship produced by Cornell faculty and researchers. While adhering to the VIVO ontology, our work on Scholars@Cornell helps move VIVO forward in the technology areas that require a looser coupling of backend and frontend technologies. One key question we set out to answer was "how can visual mediation help users navigate the rich semantic data that represent the scholarship data recorded in the VIVO knowledge-base?" Can visualizations be used to make the content more consumable and answer questions that cannot easily be answered by browsing list views?
UKOLN supports repositories and provides repository infrastructure support through several JISC-funded projects. It has developed a Dublin Core Application Profile for Scholarly Works that defines a richer metadata model based on FRBR and expresses it using Dublin Core. This profile aims to provide consistent, unambiguous metadata to enable added-value services for repositories. UKOLN is working to promote community adoption of the profile.
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums - Jon Voss
This document discusses practical applications of Linked Open Data (LOD) for libraries, archives, and museums. It describes how LOD allows these institutions to publish structured data on the web in ways that are interoperable and can be connected to other open datasets. Examples are given of how LOD is being used by various institutions to share metadata, images, and other cultural heritage assets on the web in open, machine-readable formats. The presenter argues that LOD represents a new paradigm that these cultural organizations should embrace to make their collections more accessible and useful on the web.
This document discusses the role of thesauri and standard vocabularies in linking data on the semantic web. It explains how thesauri were traditionally used to ensure consistency in library indexing but are now being used as building blocks for the semantic web. The document outlines how AGROVOC, FAO's multilingual controlled vocabulary, has been converted to SKOS and linked to other vocabularies to facilitate integration of agricultural data from different sources on the semantic web. It also describes how AGROVOC is being used to semantically tag unstructured text by tools like AgroTagger to help structure and link more agricultural information online.
This document describes using collaborative knowledge bases like Wikipedia to support exploratory search tasks. It presents an approach that extracts concepts and their relationships from Wikipedia to build a concept network. Documents are then ranked based on their relationships to these concepts. An experiment ranks journal abstracts given a seed abstract, comparing the proposed Wikipedia-based approach to a maximal marginal relevance technique. The Wikipedia approach provided more diverse results while maintaining high relevance, showing potential for improving exploratory search.
folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, wordle,context-preserving word cloud visualisation, CPEWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology
Edited and revised: Overview of the international and interdisciplinary Gordon Research Conference on Visualization in Science and Education and info on key cognitive science and other visualization researchers. History of the conference, NSF workshop, and research on learning with visualizations.
Research Inventy: International Journal of Engineering and Science - researchinventy
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an open access journal, available both online and in print, that provides rapid (monthly) publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic and computer engineering, as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by a rapid process within 20 days after acceptance, and the peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
Similar to Assessing, Creating and Using Knowledge Graph Restrictions (20)
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data - Kiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... - Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Codeless Generative AI Pipelines (GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
Open Source Contributions to Postgres: The Basics, POSETTE 2024 - ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
End-to-end pipeline agility - Berlin Buzzwords 2024 - Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake - Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Build applications with generative AI on Google Cloud - Márton Kodok
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, and they come in different versions. We are going to cover how to use them via API to:
- execute prompts in text and chat
- cover multimodal use cases with image prompts
- fine-tune and distill to improve knowledge domains
- run function calls with foundation models to optimize them for specific tasks
At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative AI industry trends.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... - Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai... - Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Assessing, Creating and Using Knowledge Graph Restrictions
1. Assessing, Creating and Using Knowledge Graph Restrictions
Sven Lieber, supervised by Anastasia Dimou and Ruben Verborgh
10.03.2022 - public PhD defense
2. Assessing, Creating and Using Knowledge Graph Restrictions
Sven Lieber, supervised by Anastasia Dimou and Ruben Verborgh
10.03.2022 - public PhD defense
?
3. Assessing, Creating and Using ? ? ?
Sven Lieber, supervised by Anastasia Dimou and Ruben Verborgh
10.03.2022 - public PhD defense
?
5. This PhD is about information processing
Telescope science?
=> Astronomy!
Microscope science?
=> Biology!
Computer science?
=> Information!
“Computer science involves the study of or the practice of computation, automation, and information” - Wikipedia
8. “24 hours in photos”, 2011 from Erik Kessels
350k printed images uploaded to Flickr in a single day
Large amount of unconnected data
What is on these two images, and are they connected somehow?
9. What is what? We need semantics!
“a separate seat for one person, typically with a back and four legs.” - Oxford Languages
“the person in charge of a meeting or of an organization (used as a neutral alternative to chairman or chairwoman)” - Oxford Languages
Please think of “a chair”
10. Different definitions and understanding about data
[Diagram: three data silos, each with its own definition of a person]
- A person is alive, has a first and last name and has a residence address
- A person is real or fictional
- A person is the user of the app identified via an Email address
How many persons can we reach with our marketing campaign in Ghent?
11. Data and data modeling using a graph
[Diagram: an example knowledge graph - classes Person, PhD student, Supervisor, Organization, University (with “is subclass” links); instances Sven, Anastasia, Ruben, UGent; relations “is a”, “knows”, “is enrolled at”]
A Knowledge Graph: (i) real world entities in a graph structure, (ii) classes and relations in a schema, (iii) linking of arbitrary entities, (iv) covers various topical domains - “Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods”, Semantic Web Journal, 2016, Heiko Paulheim
12. Link data in a flexible way
[Same example knowledge graph and definition as slide 11]
13. Express the data model in a flexible way
[Same example knowledge graph and definition as slide 11]
14. A uniform graph representation
[Same example knowledge graph and definition as slide 11]
15. Data integration because of reused definitions of things
[Diagram: the three data silos from slide 10, now integrated through a shared definition of a person]
“A vocabulary defines the concepts and relationships describing an area of concern” - World Wide Web Consortium (W3C)
16. Crash course about the context
-> Represent data in a uniform graph structure
PhD presentation
17. But how can this be used by a computer?
[Diagram: the three data silos from slide 10 with their differing person definitions]
18. Keep the flexible graph representation
in a computer readable text format by using “triples”
Person is a Class .
PhD student is subclass Person .
Supervisor is subclass Person .
University is subclass Organization .
Anastasia is a Supervisor .
Ruben is a Supervisor .
Sven is a PhD student .
UGent is a University .
Sven is enrolled at UGent .
19. Reuse the web as global information system
Person is a Class .
becomes, with web identifiers (URIs):
<http://xmlns.com/foaf/0.1/Person> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
abbreviated using prefixes:
foaf:Person rdf:type rdfs:Class .
20. Reuse the web as global information system
-> reuse of definitions for shared understanding
-> link to existing data
foaf:Person rdf:type rdfs:Class .
ex:PhDStudent rdfs:subClassOf foaf:Person .
ex:Supervisor rdfs:subClassOf foaf:Person .
ex:University rdfs:subClassOf foaf:Organization .
data:anastasia rdf:type ex:Supervisor .
data:ruben rdf:type ex:Supervisor .
data:sven rdf:type ex:PhDStudent .
data:ugent rdf:type ex:University .
data:sven ex:enrolledAt data:ugent .
data:sven foaf:givenName "Sven" .
data:sven foaf:familyName "Lieber" .
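Note that these triples only parse once the namespaces are declared; a minimal runnable Turtle sketch, where the ex: and data: namespaces are placeholders assumed for this example:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/ontology#> . # assumed placeholder namespace
@prefix data: <http://example.org/data#> .     # assumed placeholder namespace

foaf:Person rdf:type rdfs:Class .
ex:PhDStudent rdfs:subClassOf foaf:Person .
data:sven rdf:type ex:PhDStudent ;
    ex:enrolledAt data:ugent ;
    foaf:givenName "Sven" ;
    foaf:familyName "Lieber" .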
21. Crash course about the context
-> Data in a uniform graph structure
-> Use the web to represent the graph
22. This does not seem right to us …
(Figure: a graph linking Sven, UGent and “Train 123” via “is enrolled at” and “wroteBook”)
… but okay for a computer because we did not restrict possible links
23. Let’s talk about semantics … again
(the two “chair” definitions from slide 9)
24. Let’s talk about semantics … again
(the same two definitions)
25. We can now distinguish between different things
(the same two “chair” definitions, now told apart)
26. Without restrictions a computer cannot differentiate
(Figure: Sven “knows” an unknown node “?”)
Domain and Range axioms: “knows” connects two instances of class Person
Axioms are “statements that are asserted to be true in the domain being described” - OWL 2 Structural Specification and Functional-Style Syntax, W3C 2012
27. Provide formal meaning using axioms, which supports inferring new knowledge
(Figure: Sven “knows” another node; both are now inferred to be instances of Person)
Domain and Range axioms: “knows” connects two instances of class Person
new “is a” relationships inferred!
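In Turtle, such axioms and their effect could look as follows; a minimal sketch, assuming a hypothetical ex:knows property (FOAF's own foaf:knows carries exactly these domain and range axioms):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/ontology#> . # assumed placeholder namespace
@prefix data: <http://example.org/data#> .     # assumed placeholder namespace

# axioms: whoever knows or is known must be a Person
ex:knows rdfs:domain foaf:Person ;
         rdfs:range  foaf:Person .

# plain data without any type statements
data:sven ex:knows data:anastasia .

# a reasoner infers two new "is a" relationships:
#   data:sven      rdf:type foaf:Person .
#   data:anastasia rdf:type foaf:Person .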
28. What can be inferred here?
(Figure: Sven “knows” an unknown node “?”, which “has legs” 4)
Axiom: something with 4 legs is a chair
29. Oops, we created a Person-Chair
(Figure: the node Sven knows has 4 legs and is inferred to be both a Person and a Chair)
Axiom: something with 4 legs is a chair
new “is a” relationships inferred!
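The troublesome axiom is expressible in OWL as a subclass axiom over a hasValue restriction; a minimal sketch, reusing the hypothetical ex:knows from the previous sketch and introducing ex:hasLegs and data:something as hypothetical names:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/ontology#> . # assumed placeholder namespace
@prefix data: <http://example.org/data#> .     # assumed placeholder namespace

# axiom: anything with the value 4 for ex:hasLegs is a Chair
[ rdf:type owl:Restriction ;
  owl:onProperty ex:hasLegs ;
  owl:hasValue 4 ] rdfs:subClassOf ex:Chair .

data:sven      ex:knows   data:something .
data:something ex:hasLegs 4 .

# together with the range axiom on ex:knows, a reasoner infers that
# data:something is both a foaf:Person and an ex:Chair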
30. Use constraints to define what is valid
Data shapes express “structural constraints to validate instance data” - SHACL Use Cases and Requirements, W3C 2017
(Figure: a visual Person shape with Birth date, Last name and First name)
For example: persons need a birth date, last name and first name (see the sketch below)
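In SHACL, this example shape could be written as follows; a minimal sketch, assuming FOAF name properties and a hypothetical ex:birthDate property:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/ontology#> . # assumed placeholder namespace

ex:PersonShape rdf:type sh:NodeShape ;
    sh:targetClass foaf:Person ;
    # every person needs at least a first name, a last name and a birth date
    sh:property [ sh:path foaf:givenName  ; sh:minCount 1 ] ;
    sh:property [ sh:path foaf:familyName ; sh:minCount 1 ] ;
    sh:property [ sh:path ex:birthDate    ; sh:minCount 1 ] .

Unlike an axiom, this shape infers nothing: a person missing a birth date simply makes the data invalid.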
31. Vocabulary -> Ontology
“An Ontology is a formal, explicit specification of a shared conceptualization” - Thomas R. Gruber (1993)
(Figure: the schema part of the Knowledge Graph from slide 11: Person, PhD student, Supervisor, Organization and University with their “is subclass”, “knows” and “is enrolled at” relations)
“A Conceptualization is an intensional semantic structure which encodes the implicit rules constraining the structure of a piece of reality” - Guarino et al. (1995)
“The OWL 2 RDF-Based Semantics gives a formal meaning to every RDF graph” - OWL 2 RDF-Based Semantics, W3C 2012
32. The use of restrictions varies in practice
A spectrum: from only subclasses, e.g. structured metadata in websites using schema.org, to different restrictions defining formal meaning, e.g. the Neuro Behavior Ontology (NBO)
A program to infer knowledge (a reasoner) needs formal meaning
33. Crash course about the context
-> Data in a uniform graph structure
-> Use the web to represent the graph
-> We can restrict meaning using axioms or restrict what is valid using constraints
34. Crash course about the context
Congratulations, you passed Knowledge Graphs 101
35. Assessing, Creating and Using Knowledge Graph Restrictions
Sven Lieber, supervised by Anastasia Dimou and Ruben Verborgh
10.03.2022 - public PhD defense
36. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
37. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
38. Imagine you want to create an application (data model)
Reuse existing concepts which fit your use case, for example an event planning app
39. Reusing ontologies is usually a multi-step process
Discovery of reuse candidates
Selection of relevant ontologies
Customization and integration of reused ontologies
40. Imagine you want to create an application (data model)
Reuse existing concepts which fit your use case, for example an event planning app
Create your own local constraints, for example Corona measures that apply temporarily
41. Creating constraints
(Figure: a user next to the visual Person shape from slide 30, contrasted with the textual SHACL shape below)
schema:DatedMoneySpecification
    rdf:type sh:NodeShape ;
    sh:closed "true"^^xsd:boolean ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [
        sh:path schema:amount ;
        sh:datatype xsd:float ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
    ] ;
    sh:property [
        sh:path schema:currency ;
        rdfs:comment "The currency code (here) is a mandatory property consisting of three upper-case letters" ;
        sh:datatype xsd:string ;
        sh:flags "i" ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:pattern "^[A-Z]{3}$" ;
    ] .
What users get!
What users want is visual support!
42. Main research question
How can we support users in the assessment and in the
creation of Knowledge Graph restrictions?
43. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
44. The use of restrictions varies in practice
(the restriction-use spectrum from slide 32)
45. Different types of restrictions are available in RDFS/OWL
(the restriction-use spectrum, annotated with example restriction types: Domain, Disjoint, Properties, Literal ranges, feeding a Reasoner)
46. But some restriction types come with a high (computational) complexity … not always needed
(the same annotated spectrum as slide 45)
47. Reusing ontologies is usually a multi-step process
(the same three steps as slide 39: discovery, selection, customization and integration)
49. Does this vocabulary fit our use case?
Existing statistics do not provide any information about which restrictions exist in the vocabulary
50. Currently only a manual assessment of ontologies, one by one
(Screenshots: ontology documentation pages created by Widoco; an ontology loaded into the editor tool Protégé)
51. Discover and assess ontologies based on restriction use
(Figure: a use case matched against possible ontology reuse candidates; colors = different restriction types)
52. Discover and assess ontologies based on restriction use
(the same figure, now extended with restriction type use statistics)
54. Created statistics are FAIR
The statistics are described using Knowledge Graphs
Dataset available via a repository or consultable via a website
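Purely as an illustration of describing such statistics as a Knowledge Graph (the actual Montolo vocabulary may differ), a single restriction-use statistic could be recorded roughly like this, with all stat: terms hypothetical:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix stat: <http://example.org/statistics#> . # hypothetical namespace, not the real Montolo terms

stat:observation1 rdf:type stat:RestrictionUseObservation ; # hypothetical class
    stat:ontology <http://xmlns.com/foaf/0.1/> ;            # the assessed ontology
    stat:restrictionType stat:SubClassAxiom ;               # hypothetical restriction type term
    stat:count "12"^^xsd:integer .                          # hypothetical count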
55. How many ontologies use each restriction type?
A few often-used restriction types and a long tail, both in LOV and BioPortal
(chart: ontology count per restriction type)
56. Negligible number of literal value restrictions
Almost no literalRanges restrictions; literalPattern not used at all
57. Property and cardinality restrictions in the tail
The tail mostly consists of property-based and cardinality-based restrictions expressed using OWL terms
58. LOV vs BioPortal: qualified cardinalities
Qualified cardinalities are preferred in BioPortal ontologies
59. LOV vs BioPortal: unqualified cardinalities
Unqualified cardinalities are preferred in LOV ontologies
61. Domain and range used less in BioPortal
More domain/range restrictions in LOV
62. Commonly used constraint types and unused potential
Data shapes are relatively new; here we could only investigate 19 data sources
63. Besides assessment support, we learned from the statistics and can ask more questions
Only half of the ontologies use OWL-based axioms
Little attention is paid to literal values
Editing tools warrant attention regarding a self-fulfilling prophecy
64. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
65. Creating constraints
(the same user and textual SHACL shape as on slide 41)
What users want is visual support!
What users get!
66. Different constraint types need to be visualized
For example Or, Disjoint and Not constraints
What users want is visual support!
67. Existing tools do not specify how to visualize all SHACL core constraints
(Figure: existing visual tools cover only some constraint types, e.g. Or and Disjoint)
68. Based on existing cognitive theories and experiments we can define how to systematically visualize constraint types
Moody, Daniel. “The ‘Physics’ of Notations: Toward a Scientific Basis for Constructing Visual Notations in Software Engineering.” IEEE Transactions on Software Engineering 35.6 (2009): 756-779.
70. Chapter: Constraint creation
How can we support users familiar with Linked Data in viewing RDF constraints?
Hypothesis: users familiar with Linked Data can answer questions about visually represented RDF constraints more accurately with a VOWL-based visual notation than with a UML-based visual notation
72. Compare visual notations in a user study with 12 participants
Two visual notations (ShapeVOWL and ShapeUML) to visualize the same semantic constructs
Test case     Group 1    Group 2
Test case 1   ShapeUML   ShapeVOWL
Test case 2   ShapeVOWL  ShapeUML
Test case 3   ShapeUML   ShapeVOWL
Test case 4   ShapeVOWL  ShapeUML
Pre-assessment (social demographics + skills)
Main questionnaire to assess accuracy of answers to provided questions
Post-assessment (opinion)
75. Besides having 2 new visual notations,
we gained new qualitative insights!
Space-efficient representation using ShapeUML
Good to have several notations because of familiarity bias
Visual features are important and can also improve ShapeUML
76. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
78. Valuable information in archived records
Historical records: historic government records or early climate data, e.g. demographics or taxes on crop yields
Invaluable data lost: NASA is unable to locate the original high-quality video of the 1960s moon landing
The web and social media, how about 21st-century data? Social media content influences the real world; what if Twitter and co. are gone?
79. BESOCIAL: a cross-institutional research project to develop a social media archiving strategy for Belgium
Follow-up of a project for general web archiving
Led by the Royal Library of Belgium
Research partners with different expertise
Funded by the Belgian Science Policy Office
84. Knowledge Graph-based workflow for data stewardship
(Figure: heterogeneous data sources from society, e.g. #meToo and #IchBinHanna, in data formats A and B, integrated into a Knowledge Graph, from which views on the data are generated in different formats)
85. Quality is use-case specific and can be systematically defined and measured
For example the quality dimension “Rich collection description”
86. Quality Assessment using Knowledge Graphs and restrictions
40 user stories such as “As an archive-user, I want to see descriptive information about the collection from the archivist, so I can assess if the content is relevant to me.”
Derive quality requirements such as “The description of each collection should at least have 200 characters” (a constraint sketch follows below)
Metric: Missing collection description (number of missing descriptions)
Metric: Insufficient collection description (number of insufficient descriptions)
(Figure: a quality assessment consumes these metrics and produces a report)
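Such a requirement maps naturally to a SHACL shape; a minimal sketch, with ex:CollectionShape, ex:Collection and the dcterms:description property as assumptions rather than the project's actual model:

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/quality#> . # assumed placeholder namespace

# "The description of each collection should at least have 200 characters"
ex:CollectionShape rdf:type sh:NodeShape ;
    sh:targetClass ex:Collection ; # assumed placeholder class
    sh:property [
        sh:path dcterms:description ;
        sh:minCount 1 ;    # violations feed the "missing description" metric
        sh:minLength 200 ; # violations feed the "insufficient description" metric
    ] .

A SHACL engine's validation report then yields the counts behind both metrics.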
88. Knowledge Graph- and restriction-supported data stewardship for social media archiving
Provided an integrated view on the data (with formal meaning)
Assisted in an automated quality assessment by using constraints
The workflow is generalizable and thus helpful in other use cases
89. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
90. How can we support users in the assessment and in the creation of Knowledge Graph restrictions?
Montolo statistics support restriction assessments with FAIR data, which was not possible before
We can rethink the value we give to restrictions: why and how do we use restrictions systematically?
91. How can we support users in the assessment and in the creation of Knowledge Graph restrictions?
There are now 2 visual notations covering all SHACL core constraints
First steps to make Knowledge Graph constraints more accessible to domain experts
92. How can we support users in the assessment and in the creation of Knowledge Graph restrictions?
The BESOCIAL use case demonstrated the use of restrictions to tackle data stewardship challenges
The future is less about tools and more about workflows and data!
93. A circle representing the human knowledge
“The illustrated guide to a Ph.D.” - Matt Might
94. Little knowledge after elementary school
“The illustrated guide to a Ph.D.” - Matt Might
95. More knowledge after high school
“The illustrated guide to a Ph.D.” - Matt Might
96. Gaining speciality with the Bachelor’s degree
“The illustrated guide to a Ph.D.” - Matt Might
97. Deepen speciality with the Master’s degree
“The illustrated guide to a Ph.D.” - Matt Might
98. Reading research papers takes you to the edge of human knowledge
“The illustrated guide to a Ph.D.” - Matt Might
99. You focus at the boundary
“The illustrated guide to a Ph.D.” - Matt Might
100. You focus at the boundary for a few years
“The illustrated guide to a Ph.D.” - Matt Might
101. One day the boundary gives way
“The illustrated guide to a Ph.D.” - Matt Might
102. The dent you have made is called a PhD
“The illustrated guide to a Ph.D.” - Matt Might
103. The world looks different to you now
“The illustrated guide to a Ph.D.” - Matt Might
104. Don’t forget the bigger picture
“The illustrated guide to a Ph.D.” - Matt Might
105. Newly raised questions: future work
Montolo provides metrics, but what about higher-level dimensions, tools and their usability?
Why and how are restrictions used in the first place?
How do we build our future Knowledge Graphs from a methodological point of view?
106. Questions & Answers
Dissertation available as PDF at https://sven-lieber.org/phd
SvenLieber sven-lieber.org
knows.idlab.ugent.be