An overview of existing solutions for link discovery, looking into some state-of-the-art algorithms for the rapid execution of link discovery tasks, with a focus on algorithms that guarantee result completeness.
(HOBBIT project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.)
This document provides an overview of hands-on tasks for a link discovery tutorial using the Limes framework. It describes a test dataset, and three tasks: 1) executing a provided Limes configuration to detect duplicate authors, 2) creating a configuration to find similar publications based on keywords, and 3) using the Limes GUI.
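Limes itself is driven by an XML configuration, but the core of task (1), detecting duplicate authors, is simply comparing author labels from two datasets with a string similarity measure and keeping pairs above a threshold. The sketch below only illustrates that underlying idea; the URIs, names, trigram measure, and the 0.3 threshold are invented for illustration and are not the tutorial's actual configuration.

```python
def trigrams(s):
    # Pad the lowercased string so boundary characters also form trigrams.
    s = f"  {s.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    # Jaccard similarity over character trigrams.
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# Hypothetical source and target datasets: URI -> author label.
source = {"ex:a1": "Axel-Cyrille Ngonga Ngomo", "ex:a2": "Jens Lehmann"}
target = {"ex:b1": "A. Ngonga Ngomo", "ex:b2": "J. Lehmann", "ex:b3": "Julien Plu"}

# Keep all cross-dataset pairs whose similarity clears the threshold.
links = [(s, t, round(similarity(sn, tn), 2))
         for s, sn in source.items()
         for t, tn in target.items()
         if similarity(sn, tn) >= 0.3]
```

In Limes the same elements appear as source/target endpoint declarations, a metric expression, and an acceptance threshold in the XML configuration.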
A paper presented at the 1st International Workshop on Benchmarking Linked Data (BLINK). We present experimental results with the instance matching benchmark generator LANCE that is developed in the context of HOBBIT.
(HOBBIT project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.)
Can Deep Learning Techniques Improve Entity Linking? - Julien PLU
Julien Plu presented on using deep learning techniques to improve entity linking. He discussed using word embeddings and neural networks to better recognize and link entities in documents by understanding semantic relationships between words. Current supervised methods require large training sets and are not robust to new entity types, while unsupervised methods have difficulties computing relatedness between candidate entities. Deep learning approaches may help address these issues through their ability to learn complex patterns from large amounts of unlabeled text data.
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a... - Julien PLU
The document presents research on developing an adaptive entity linking system called ADEL. It discusses 6 problems in entity linking and proposes research questions to address adaptivity to different text, entity types, knowledge bases, and languages. It describes ADEL's modular framework including extraction, linking, and pruning modules. Evaluation shows ADEL achieves state-of-the-art results on multiple datasets. Future work focuses on knowledge base and language adaptivity, improving the system, and engineering a distributed architecture.
Enhancing Entity Linking by Combining NER Models - Julien PLU
The document describes enhancements made to the ADEL entity linking framework. ADEL combines multiple named entity recognition models and uses a combination of linguistic and dictionary-based approaches. New features in ADEL include using a generic API to interface with NLP tools, combining multiple CRF models for entity extraction, clustering nil entities, and developing a new backend using Elasticsearch and Couchbase. The document compares the performance of the original 2015 version of ADEL to the new 2016 version on standard entity linking tasks and datasets.
Framester: A Wide Coverage Linguistic Linked Data Hub - Mehwish Alam
Framester is a linguistic linked data hub that aims to improve coverage of FrameNet by extending mappings between FrameNet and other resources like WordNet and BabelNet. Framester represents over 40 million triples linking linguistic and factual resources and aligning frames, roles, and types to foundational ontologies. It provides a word frame disambiguation service and was evaluated on annotated corpora, showing improved performance over previous approaches.
This document discusses representing computing concepts like Turing machines, programming patterns, and virtual machines using semantic networks and RDF graphs. It describes how instructions, data structures, objects, and software patterns can be modeled as nodes and relationships in a graph. It also introduces RDF as a standardized data model for semantic networks and triplestores for efficiently storing and querying large semantic graphs.
DS2014: Feature selection in hierarchical feature spaces - Petar Ristoski
The document describes a proposed approach called SHSEL for hierarchical feature selection in machine learning. SHSEL exploits the hierarchical structure of feature spaces, where more specific features imply more general ones. It initially selects ranges of similar features in each branch based on relevance similarity. It then prunes the set further by selecting only the most relevant remaining features. The authors evaluate SHSEL on real and synthetic datasets compared to other feature selection methods, finding it achieves comparable or improved accuracy while significantly reducing the feature space.
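The core intuition, that a specific feature and its more general ancestor carry near-identical information, can be sketched as follows. The hierarchy, relevance scores, and the 0.9 similarity threshold below are invented for illustration; the paper's actual procedure differs in detail.

```python
def shsel(parents, relevance, sim_threshold=0.9):
    """Prune hierarchically redundant features, then keep the most relevant."""
    selected = set(relevance)
    # Step 1: along each child -> parent edge, drop the more specific feature
    # when the two relevance scores are nearly identical (ratio near 1).
    for child, parent in parents.items():
        if child in selected and parent in selected:
            lo, hi = sorted([relevance[child], relevance[parent]])
            if hi > 0 and lo / hi >= sim_threshold:
                selected.discard(child)
    # Step 2: of the survivors, keep only features at or above average relevance.
    avg = sum(relevance[f] for f in selected) / len(selected)
    return {f for f in selected if relevance[f] >= avg}

# Toy hierarchy: "dog" is-a "mammal" is-a "animal"; scores are made up.
parents = {"dog": "mammal", "cat": "mammal", "mammal": "animal"}
relevance = {"dog": 0.58, "cat": 0.30, "mammal": 0.60, "animal": 0.61}
```

Here "dog" and "mammal" are pruned because their relevance barely differs from their parents', leaving the general feature "animal" to represent the branch.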
RDF2Vec: RDF Graph Embeddings for Data Mining - Petar Ristoski
Linked Open Data has been recognized as a valuable source for background information in data mining. However, most data mining tools require features in propositional form, i.e., a vector of nominal or numerical features associated with an instance, while Linked Open Data sources are graphs by nature. In this paper, we present RDF2Vec, an approach that uses language modeling approaches for unsupervised feature extraction from sequences of words, and adapts them to RDF graphs. We generate sequences by leveraging local information from graph sub-structures, harvested by Weisfeiler-Lehman Subtree RDF Graph Kernels and graph walks, and learn latent numerical representations of entities in RDF graphs. Our evaluation shows that such vector representations outperform existing techniques for the propositionalization of RDF graphs on a variety of different predictive machine learning tasks, and that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.
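The first stage of the approach, turning graph neighbourhoods into token sequences that a word2vec-style model can consume, can be illustrated on a toy graph. The entities, predicates, and walk parameters below are made up, and this sketch covers only the random-walk strategy, not the Weisfeiler-Lehman subtree kernels the paper also uses.

```python
import random

# Toy RDF graph: subject -> list of (predicate, object) edges.
graph = {
    "dbr:Berlin": [("dbo:country", "dbr:Germany"), ("rdf:type", "dbo:City")],
    "dbr:Germany": [("dbo:capital", "dbr:Berlin"), ("rdf:type", "dbo:Country")],
}

def random_walks(graph, entity, n_walks=4, depth=2, seed=42):
    """Generate sequences of alternating entity and predicate tokens."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        node, walk = entity, [entity]
        for _ in range(depth):
            edges = graph.get(node)
            if not edges:
                break  # reached a node with no outgoing edges
            pred, obj = rng.choice(edges)
            walk += [pred, obj]
            node = obj
        walks.append(walk)
    return walks

walks = random_walks(graph, "dbr:Berlin")
```

Each walk is a token sequence such as ['dbr:Berlin', 'dbo:country', 'dbr:Germany', ...]; treating these sequences as "sentences", a word2vec implementation then learns a vector per entity and predicate.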
Applications of Word Vectors in Text Retrieval and Classification - shakimov
Applications of word vectors (word2vec, BERT, etc.) to problems such as text retrieval and the classification of textual documents for tasks such as sentiment analysis and spam detection.
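Ranking or classifying texts by cosine similarity between averaged word vectors is the simplest of these applications. The sketch below uses tiny hand-made 2-d vectors for illustration; a real system would load pretrained word2vec or BERT embeddings instead.

```python
import math

# Hypothetical 2-d word vectors; real ones come from a pretrained model.
vectors = {"cheap": (0.9, 0.1), "pills": (0.8, 0.3),
           "meeting": (0.1, 0.9), "agenda": (0.2, 0.8)}

def doc_vector(words):
    # Represent a document as the average of its word vectors.
    dims = zip(*(vectors[w] for w in words if w in vectors))
    return [sum(d) / len(words) for d in dims]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

spam = doc_vector(["cheap", "pills"])
ham = doc_vector(["meeting", "agenda"])
query = doc_vector(["cheap"])
# The query embedding sits closer to the spam-like document than to the other.
```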
A presentation at DH2014-Lausanne of the eMOP (at the IDHMC at Texas A&M) on our post-processing triage method, along with our expanded treatment and diagnosis queues for correcting and analysing Tesseract OCR results.
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat... - shakimov
The document summarizes a PhD dissertation defense talk on learning multilingual semantic parsers for question answering over linked data. It discusses comparing neural and probabilistic graphical model architectures for semantic parsing to map natural language to formal meaning representations. The talk outlines introducing dependency parse tree-based approaches, evaluating different model architectures, and addressing challenges in building multilingual question answering systems over structured knowledge bases.
The document describes the Early Modern OCR Project (eMOP), which aims to apply Optical Character Recognition (OCR) to over 300,000 documents and 45 million page images from the Early English Books Online (EEBO) and Eighteenth Century Collections Online (ECCO) collections. It outlines the challenges of OCRing early modern texts, including inconsistent fonts, layouts, and quality issues. It then details eMOP's workflow, which includes training Tesseract with manually labeled glyphs, de-noising and scoring OCR output, and making corrections through the TypeWright tool. The fully-searchable OCR and corrections will be made freely available through aggregators like 18thConnect to improve access and reuse of these texts.
A presentation at DH2014-Lausanne of the Tesseract training methods and tools developed by eMOP (at the IDHMC at Texas A&M), and their uses for other book history and typeface history research.
ESWC 2013 Poster: Representing and Querying Negative Knowledge in RDF - Fariz Darari
Typically, only positive data can be represented in RDF. However, negative knowledge representation is required in some application domains such as food allergies, software incompatibility, and school absence. We present an approach to representing and querying negative data in RDF, and provide the syntax, semantics, and an example. We argue that this approach fits into the open-world semantics of RDF according to the notion of certain answers.
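The idea can be mimicked outside RDF by tagging each statement with a polarity and distinguishing explicit denial from mere absence, which is what open-world semantics requires. The predicates, data, and query function below are illustrative assumptions, not the paper's actual syntax.

```python
# Each statement is (subject, predicate, object, polarity).
data = [
    ("ex:alice", "ex:allergicTo", "ex:peanuts", True),
    ("ex:alice", "ex:allergicTo", "ex:milk", False),  # explicitly negated
]

def ask(s, p, o):
    """Return True/False for explicit knowledge, None when unknown (open world)."""
    for subj, pred, obj, polarity in data:
        if (subj, pred, obj) == (s, p, o):
            return polarity
    return None  # neither asserted nor denied
```

The three-valued answer is the point: an unlisted triple is unknown rather than false, while the explicit negative statement licenses a definite "no".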
Get more information at http://dx.doi.org/10.1007/978-3-642-41242-4_40
Entity Linking in Queries: Tasks and Evaluation - Faegheh Hasibi
The document discusses entity linking in queries for tasks like ad-hoc document retrieval and query understanding. It describes three main tasks for entity linking in queries: entity linking, semantic mapping, and interpretation finding. Semantic mapping allows mentions to overlap and links mentions to multiple entities, while interpretation finding returns sets of semantically related entity sets where mentions do not overlap within a set. The document also discusses test collections, evaluation metrics, and methods like mention detection, candidate entity ranking, and greedy interpretation finding for linking entity mentions in queries to knowledge bases.
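The greedy interpretation-finding step mentioned above can be sketched as repeatedly taking the highest-scoring mention that does not overlap anything already chosen. The mention spans, entities, and scores below are invented; a real system would first detect mentions and rank candidate entities against a knowledge base.

```python
# Candidate mentions in the query "new york pizza": (start, end, entity, score).
candidates = [
    (0, 8, "New_York", 0.9),
    (0, 14, "New_York-style_pizza", 0.7),
    (9, 14, "Pizza", 0.6),
]

def greedy_interpretation(candidates):
    chosen = []
    # Consider candidates in descending score order; keep a candidate only if
    # its character span is disjoint from every span already selected.
    for start, end, entity, score in sorted(candidates, key=lambda c: -c[3]):
        if all(end <= s or start >= e for s, e, _, _ in chosen):
            chosen.append((start, end, entity, score))
    return sorted(chosen)

interpretation = greedy_interpretation(candidates)
```

On this toy input the high-scoring "New_York" blocks the overlapping full-query mention, so the interpretation becomes {New_York, Pizza}, one example of a non-overlapping entity set.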
Talk given at the SSSW 2013 Semantic Web Summerschool.
Part 1: What is "Semantic Web" (in 4 principles and 1 movie)
Part 2: What question can we ask now that we couldn't ask 10 years ago
Part 3: Treat Computer Science as a *science*, not just as engineering!
(this part is a short version of http://slidesha.re/SaUhS4 )
Entity Search: The Last Decade and the Next - krisztianbalog
Keynote talk given at the 10th Russian Summer School in Information Retrieval (RuSSIR ’16), Saratov, Russia, August 2016.
Note: part of the work is still under review; those slides are not yet included.
Approximation and Self-Organisation on the Web of Data - Kathrin Dentler
This document discusses using computational intelligence techniques like evolutionary computing and collective intelligence to handle challenges posed by the growing Web of Data. It describes how these techniques can provide adaptive, scalable, and robust approaches to tasks like ontology mapping, query answering, and reasoning. Evolutionary computing is proposed for optimization problems, while collective intelligence approaches may enable emergent behaviors from decentralized data flows and reasoning. While computational intelligence loses precision, it gains properties like adaptation, simplicity, scalability, and interactive behavior that are well-suited to the dynamic, distributed nature of the Web of Data.
We now have larger Knowledge Bases than ever before. (10 billion facts is now a small number).
We now have the instruments to observe and analyse these very large Knowledge Bases.
We can use these insights for better tools for querying, inferencing, publishing, maintaining, visualising and explaining.
The Early Modern OCR Project (eMOP) aims to improve the optical character recognition (OCR) of over 300,000 documents and 45 million pages from the Early English Books Online and Eighteenth Century Collections Online databases dating from 1475-1800. The project develops techniques to train OCR engines on the specific typefaces in these collections. Undergraduate students create ground truth data to train Tesseract, while tools like Aletheia and Franken+ standardize the process. Documents undergo processing, triage, treatment and correction. The resulting OCR and page tags will be made available through tools like TypeWright to allow scholars to search and correct texts.
This document provides a quick tour of data mining. It begins with an overview of the evolution of data management techniques from manual record keeping to modern big data and data science. It then discusses what data mining is, focusing on algorithms for discovering patterns in existing data. Various examples of data mining applications are also presented, as well as the origins of data mining in fields such as machine learning and databases. Finally, an overview of the key steps in the knowledge discovery process is given, including data preprocessing, data mining, and pattern evaluation.
Unsupervised Learning of an Extensive and Usable Taxonomy for DBpedia - Marco Fossati
Talk given by fellow Claus Stadler at the 11th International Conference on Semantic Systems - SEMANTiCS 2015
Paper available here: http://jens-lehmann.org/files/2015/semantics_dbtax.pdf
QALD-7 @ ESWC 2017 Portoroz, Slovenia
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Presentation of QALD 7 challenge at ESWC2017: Question Answering over Linked Data.
Exploiting Semantic Information for Graph-based Recommendations of Learning R... - Mojisola Erdt née Anjorin
This document summarizes a presentation on exploiting semantic information for graph-based recommendations of learning resources using the CROKODIL platform. It introduces CROKODIL's extended folksonomy model, which incorporates semantic tag types, activities, learner groups, and friendships. It describes two approaches for calculating resource scores using the extended graph model: AScore, which runs a graph-based ranking algorithm on the extended folksonomy graph; and AInheritScore, which leverages activity hierarchies to calculate scores. The presentation evaluates these approaches on a real-world dataset using leave-one-out cross-validation, finding that AScore and GFolkRank perform best. It concludes by discussing future work, including additional evaluation
Articulating the connection between Learning Design and Learning Analytics - Abelardo Pardo
Learning analytics is a discipline that uses data captured by technology during a learning experience to increase our understanding of that experience, improve its quality, and improve the environment in which it occurs. But these experiences need to be designed first. In this talk we start from the statement that there is no such thing as a neutral design. In an era of increasing technology mediation, learning experiences need to be designed considering the capacity to capture data, the possibility of making sense of and deriving knowledge from that data, and the need to act on that knowledge. We will explore some initiatives to make these connections explicit in a learning design. Using a flipped learning experience, we will explore how to embed data and data analysis as part of the design tasks.
Slides for the following paper: NLP Data Cleansing Based on Linguistic Ontology Constraints
Abstract: Linked Data comprises an unprecedented volume of structured data on the Web and is adopted by an increasing number of domains. However, the varying quality of published data forms a barrier to further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology for Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting, and describe how the method is applied in the Natural Language Processing (NLP) area. NLP is, compared to other domains such as biology, a late Linked Data adopter. However, it has seen a steep rise of activity in the creation of data and ontologies, and data quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets using the lemon and NIF vocabularies in 277 test cases and point out common quality issues.
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a... - Julien PLU
Julien Plu defended his PhD thesis on knowledge extraction from web media. His research addressed three main challenges: 1) extracting entities from different types of texts and languages, 2) linking entities to multiple knowledge bases, and 3) adapting entity linking pipelines for different contexts. To extract entities, he evaluated various natural language processing techniques including phrase matching, sequence labeling using neural networks, and coreference resolution. He found that combining multiple named entity recognition models improved performance over using a single model. Plu's research provided methods for extracting and linking entities from diverse textual sources in an adaptable manner.
"4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QALD-9 Question Answering over Linked Data Challenge" as presented in the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Keynote presentation made by Associate Professor Rodney Clarke, Associate Fellow, SMART Infrastructure Facility, University of Wollongong on Day 1 of ISNGI 2016.
An overview of the workshop as presented at the 1st International Workshop on Benchmarking Linked Data (BLINK).
(HOBBIT project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.)
"An Evaluation of Models for Runtime Approximation in Link Discovery" as presented in the IEEE/WIC/ACM WI, August 25th, 2017, held in Leipzig, Germany.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Machine Learning Methods for Analysing and Linking RDF Data – Jens Lehmann
Invited Talk at the 8th International Conference on Scalable Uncertainty Management (SUM)
The talk outlines applications of supervised structured machine learning and presents a specific refinement operator based approach for RDF/OWL. It also outlines how similar ideas can be used in other (formal) languages, in particular link specifications.
CNI Fall 2009: Enhanced Publications – John Doove, SURFfoundation
- SURF is an organization in the Netherlands that works to improve ICT infrastructure for higher education and research.
- SURF is working on projects to develop "enhanced publications" which combine traditional publications like text with additional materials like data, maps, images and annotations.
- Several projects have been funded to create enhanced publications in fields like archaeology and psychology. Challenges include presentation, identification, long-term preservation and developing tools and infrastructure to support enhanced publications.
- Moving forward, SURF will work on developing repository infrastructure to store and share enhanced publications, creating guidelines and incentivizing their creation through things like legal reports and reward systems.
Deep Learning with H2O and R
In my previous TNO4U talk I gave an introduction to how I addressed the classification problem for autonomous driving using fuzzy-logic-based insights. I also gave a very concise introduction to deep learning. In this talk I want to go more into the details of deep learning: what is it, and why do people think it is so important? Given the duration of the talk I will not go through the complete history of Artificial Intelligence from the perceptron, via the Hopfield net, towards modern Restricted Boltzmann Machines and Convolutional Neural Networks. Nor will I get philosophical and do a Gödel, Escher, Bach exposé.
I will just give some basic theoretical considerations and demonstrate how easy it is to get results with deep learning using open-source tools like R and H2O. You can install these for free on any computer: Windows, Linux or Mac. R is of course the computer language of choice for data science; H2O is an easy-to-use interface between R and Big Data (like Spark).
During the talk we will do some small workshop-style examples: handwriting recognition with a Restricted Boltzmann Machine, analysing heartbeats with machine learning, and a little predictive modelling on an industrial process.
Are these the heartbeats of a healthy person? Let's ask our algorithm (the computer has seen more heartbeats than any living doctor).
This document provides an overview of the SHEBANQ project, which provides tools for querying annotated Hebrew text data. It describes the data sources and contributors that have built up the underlying text corpus over many years. It also outlines the steps taken to make this data and related tools more accessible, including developing a website, depositing data in archives, running demonstration projects, and integrating the data and tools into broader research environments through additional projects and publications. The goal has been to facilitate wider use of this linguistic resource and foster more digital humanities and data science work based on its contents.
Oana Tifrea-Marciuska's research focuses on artificial intelligence for social good, specifically how people reason and communicate. Her work includes developing tools for automatic humor recognition, adaptive learning systems, personalized search incorporating social and semantic data, fact checking, and semantic parsing of natural language questions into logical forms. She has published papers on representing and reasoning with preferences in ontology languages like Datalog± to enable more personalized search and query answering over semantic data.
Statistical Analysis of Results in Music Information Retrieval: Why and How – Julián Urbano
This document summarizes an introduction given by Julián Urbano and Arthur Flexer on statistical analysis in music information retrieval. It discusses why evaluation is important, including addressing questions about how good a system is and comparing different systems. It describes how the Cranfield paradigm addresses these questions by using fixed test collections with documents, topics and relevance judgments to simulate users and allow reproducible experiments. Finally, it outlines different types of music information retrieval tasks like retrieval, annotation and extraction and how the evaluation approach may differ depending on the specific task or use case.
Analyzing OpenCourseWare usage by means of social tagging – Julià Minguillón
1) The document analyzes how users tag OpenCourseWare resources on the social bookmarking site Delicious in order to understand how these resources are organized and described.
2) The researchers collected tagging data for OpenCourseWare URLs from Delicious and performed data analysis to identify themes in how URLs and their content were categorized.
3) Preliminary results identified factors that grouped tags by semantic relationships, showing potential to describe course domains, resource types, and formats to improve searching of OpenCourseWare content.
"EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs" as presented in the 17th International Semantic Web Conference ISWC, 9th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Benchmarking Big Linked Data: The case of the HOBBIT Project" as presented in the First International Workshop on Semantic Web Technologies for Health Data Management (SWH 2018), co-located with ISWC 2018, 9th October, 2018 held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning Benchmark" as presented in SSWS 2018 co-located with the 17th International Semantic Web Conference ISWC, 9th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"The DEBS Grand Challenge 2018" as presented in the 12th ACM International Conference on Distributed and Event-Based Systems (DEBS 2018), 25 - 29 June, 2018 held in Hamilton, New Zealand
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Benchmarking of distributed linked data streaming systems" as presented in the Stream Reasoning Workshop 2018, January 16-17, 2018, held by the Department of Informatics DDIS (University of Zurich) in Zurich, Switzerland
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"SQCFramework: SPARQL Query Containment Benchmarks Generation Framework" as presented in the 9th International Conference on Knowledge Capture (K-Cap 2017), December 4th-6th, 2017, held in Austin, USA.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation" as presented in the 17th International Semantic Web Conference ISWC (journal track), 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"The DEBS Grand Challenge 2017" as presented in the 11th ACM International Conference on Distributed and Event-Based Systems, 19 - 23 June, 2017 held in Barcelona, Spain
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Scalable Link Discovery for Modern Data-Driven Applications" poster presented at ECAI 2016, September 2016, held in The Hague, Netherlands.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Scalable Link Discovery for Modern Data-Driven Applications" as presented in the 15th International Semantic Web Conference ISWC, Doctoral Consortium, October 18th, 2016, held in Kobe, Japan
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint Federation" as presented in SSWS 2018 co-located with the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"SPgen: A Benchmark Generator for Spatial Link Discovery Tools" as presented in Ontology Matching (OM) hosted by the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign" was presented in Ontology Matching (OM) hosted by the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
OKE Challenge was hosted by European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
MOCHA Challenge was hosted by European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Paper presented at European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
HOBBIT project overview presented at European Big Data Value Forum, 21-23 Nov 2017, held in Versailles, France (Palais des Congres).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Leopard ISWC Semantic Web Challenge 2017 at ISWC2017.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Benchmarking Link Discovery Systems for Geo-Spatial Data presented at BLINK2017 at ISWC2017.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Link Discovery Tutorial Introduction
1. Link Discovery Tutorial
Introduction
Axel-Cyrille Ngonga Ngomo(1)
, Irini Fundulaki(2)
, Mohamed Ahmed Sherif(1)
(1) Institute for Applied Informatics, Germany
(2) FORTH, Greece
October 18th, 2016
Kobe, Japan
Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial:Intro October 17, 2016 1 / 24
3. Introduction
Disclaimer
No pursuit of completeness [Nen+15]
Focus on
Basic ideas and principles
Principles
Evaluation
Open questions and challenges
Guidelines
http://iswc2016ldtutorial.aksw.org/
Question? Just ask!
Comment? Go for it!
Be kind :)
6. Why Link Discovery?
1 Linked Open Data Cloud
130+ billion triples
≈ 0.5 billion links
Mostly owl:sameAs
2 Decentralized dataset creation
3 Complex information needs ⇒ need to consume data across knowledge bases
4 Links are central for
Cross-ontology QA
Data Integration
Reasoning
Federated Queries
...
7. Cross-Ontology QA
Example
Give me the name and description of all drugs that cure their side-effect. [SNA13]
1 Need information from
Drugbank (Drug description)
Sider (Side-effects)
DBpedia (Description)
2 Gathering information via SPARQL query using links
8. Cross-Ontology QA
Example
Give me the name and description of all drugs that cure their side-effect.
SELECT ?drug ?name ?desc WHERE
{
?drug a drugbank:Drug .
?drug rdfs:label ?name .
?drug drugbank:cures ?disease .
?drug owl:sameAs ?drug2 .
?drug owl:sameAs ?drug3 .
?drug2 sider:hasSideEffect ?effect .
?effect owl:sameAs ?disease .
?drug3 dbo:hasWikiPage ?desc .
}
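The join logic of this query, hopping from Drugbank to Sider via owl:sameAs links, can be sketched in plain Python over toy in-memory data. All URIs and facts below are invented for illustration; real queries would run against SPARQL endpoints:

```python
# Toy illustration of cross-dataset joins via owl:sameAs links.
# All identifiers and data are hypothetical, not taken from the real datasets.

drugbank = {  # drug -> disease it cures
    "db:aspirin": "db:headache",
}
sider = {  # drug -> side effect it causes
    "sider:aspirin": "sider:headache",
}
same_as = {  # cross-dataset identity links
    "db:aspirin": "sider:aspirin",
    "db:headache": "sider:headache",
}

def drugs_curing_their_side_effect():
    results = []
    for drug, disease in drugbank.items():
        drug2 = same_as.get(drug)            # ?drug owl:sameAs ?drug2
        if drug2 is None:
            continue
        effect = sider.get(drug2)            # ?drug2 sider:hasSideEffect ?effect
        if effect and same_as.get(disease) == effect:  # ?effect owl:sameAs ?disease
            results.append(drug)
    return results

print(drugs_curing_their_side_effect())  # ['db:aspirin']
```

Without the two owl:sameAs entries the join produces nothing, which is exactly why link discovery matters for cross-ontology question answering.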
9. Cross-Ontology QA
Example (DEQA)
Give me flats near kindergartens in Kobe. [Leh+12]
SELECT ?flat WHERE
{
?flat a deqa:Flat .
?flat deqa:near ?school .
?school a lgdo:School .
?school lgdo:city lgdo:Kobe .
}
10. Data Integration
Federated Queries on Patient Data [Kha+14]
11. Federated Queries
Example (FedBench CD2)
Return Barack Obama’s party membership and news pages. [Sal+15]
SELECT ?party ?page WHERE
{
dbr:Barack_Obama dbo:party ?party .
?x nytimes:topicPage ?page .
?x owl:sameAs dbr:Barack_Obama .
}
14. Definition
Definition (Link Discovery, informal)
Given two knowledge bases S and T, find links of type R between S and T
Here, declarative link discovery
Definition (Declarative Link Discovery, formal, similarities)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Common approach: Find M = {(s, t) ∈ S × T : σ(s, t) ≥ θ}
Definition (Declarative Link Discovery, formal, distances)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Common approach: Find M = {(s, t) ∈ S × T : δ(s, t) ≤ τ}
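A minimal sketch of the similarity-based formulation, computing M = {(s, t) ∈ S × T : σ(s, t) ≥ θ} with a trigram Jaccard similarity as σ. This is an illustrative toy, not the implementation of any of the frameworks discussed:

```python
# Declarative link discovery sketch: keep pairs whose similarity reaches theta.
# sigma here is a trigram Jaccard similarity; real frameworks support many measures.

def trigrams(s):
    s = f"  {s.lower()} "          # pad so short strings still yield trigrams
    return {s[i:i + 3] for i in range(len(s) - 2)}

def sigma(s, t):
    a, b = trigrams(s), trigrams(t)
    return len(a & b) / len(a | b)

def link(S, T, theta):
    return {(s, t) for s in S for t in T if sigma(s, t) >= theta}

S = ["Kobe", "Leipzig", "Heraklion"]
T = ["Kobe-shi", "Leipzig", "Iraklio"]
print(link(S, T, 0.5))
```

The dual distance-based formulation simply replaces `sigma(s, t) >= theta` with `delta(s, t) <= tau` for a distance function δ and threshold τ.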
15. Definition
Most common: R = owl:sameAs
Also known as deduplication [Nen+15]
17. Definition
Goal: Address all possible relations R
Declarative Link Discovery: Similarity/distance defined using property values (incl. property chains)
Example: R = :sameModel
:s770fm rdfs:label "S770FM"@en
:s770fm rdf:type :SABER
:s770fm :model :770
:s770fm :top :FlamedMaple
:s770fm :producer :Ibanez
:s770bem rdfs:label "S770BEM"@en
:s770bem rdf:type :SABER
:s770bem :model :770
:s770bem :top :BirdEyeMaple
:s770bem :producer :Ibanez
19. Why is it difficult?
1 Time complexity
Large number of triples (e.g., LinkedTCGA with 20.4 billion triples [Sal+14])
Quadratic a-priori runtime
69 days for mapping cities from DBpedia to Geonames
Solutions usually in-memory (insufficient heap space)
2 Accuracy
Combination of several attributes required for high precision
Tedious discovery of most adequate mapping
Dataset-dependent similarity functions
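To see why the quadratic a-priori runtime matters, here is a toy comparison of brute-force pairing against a naive blocking key. The data and the first-character key are invented for illustration; real systems such as those covered next use far stronger space-tiling and filtering guarantees:

```python
# Brute force compares every (s, t) pair: |S| * |T| comparisons.
# A simple blocking key (first character) prunes most candidate pairs.
from collections import defaultdict

S = ["Berlin", "Boston", "Kobe", "Kyoto", "Leipzig"]
T = ["Berlin", "Bonn", "Kobe", "Kyoto", "Lisbon"]

brute_force_pairs = [(s, t) for s in S for t in T]         # 5 * 5 = 25 comparisons

blocks = defaultdict(list)
for t in T:
    blocks[t[0]].append(t)                                 # index targets by key
blocked_pairs = [(s, t) for s in S for t in blocks[s[0]]]  # compare within blocks only

print(len(brute_force_pairs), len(blocked_pairs))
```

A naive key like this can lose recall (matches whose keys differ are never compared); the algorithms discussed in this tutorial are precisely those that reduce the candidate space while guaranteeing result completeness.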
20. Structure
1 Time complexity (≈ 60 min)
LIMES algorithm [NA11]
MultiBlock [IJB11]
HR3 [Ngo12]
AEGLE [GSN16]
Summary and Challenges
2 Accuracy (≈ 30 min)
RAVEN [Ngo+11]
EAGLE [NL12]
COALA [NLC13]
Summary and Challenges
3 Benchmarking (≈ 30 min)
Benchmarking [NGF16]
Synthetic Benchmarks [Sav+15]
Real Benchmarks [Mor+11]
Summary and Challenges
4 Hands-On Session (≈ 45 min)
21. That’s all Folks!
Axel Ngonga
AKSW Research Group
Institute for Applied Informatics
ngonga@informatik.uni-leipzig.de
Irini Fundulaki
ICS FORTH
fundul@ics.forth.gr
Mohamed Ahmed Sherif
AKSW Research Group
Institute for Applied Informatics
sherif@informatik.uni-leipzig.de
22. Acknowledgment
This work was supported by grants from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).
23. References I
Kleanthi Georgala, Mohamed Ahmed Sherif, and
Axel-Cyrille Ngonga Ngomo. “An Efficient Approach for the
Generation of Allen Relations”. In: ECAI 2016 - 22nd European
Conference on Artificial Intelligence, 29 August-2 September 2016,
The Hague, The Netherlands - Including Prestigious Applications of
Artificial Intelligence (PAIS 2016). Ed. by Gal A. Kaminka et al.
Vol. 285. Frontiers in Artificial Intelligence and Applications. IOS
Press, 2016, pp. 948–956. isbn: 978-1-61499-671-2. doi:
10.3233/978-1-61499-672-9-948. url:
http://dx.doi.org/10.3233/978-1-61499-672-9-948.
Robert Isele, Anja Jentzsch, and Christian Bizer. “Efficient
Multidimensional Blocking for Link Discovery without losing Recall”.
In: Proceedings of the 14th International Workshop on the Web and
Databases 2011, WebDB 2011, Athens, Greece, June 12, 2011. Ed. by
Amélie Marian and Vasilis Vassalos. 2011. url:
http://webdb2011.rutgers.edu/papers/Paper%2039/silk.pdf.
24. References II
Yasar Khan et al. “SAFE: Policy Aware SPARQL Query Federation
Over RDF Data Cubes”. In: Proceedings of the 7th International
Workshop on Semantic Web Applications and Tools for Life Sciences,
Berlin, Germany, December 9-11, 2014. Ed. by Adrian Paschke et al.
Vol. 1320. CEUR Workshop Proceedings. CEUR-WS.org, 2014. url:
http://ceur-ws.org/Vol-1320/Preface_SWAT4LS2014.pdf.
Jens Lehmann et al. “deqa: Deep Web Extraction for Question
Answering”. In: The Semantic Web - ISWC 2012 - 11th International
Semantic Web Conference, Boston, MA, USA, November 11-15, 2012,
Proceedings, Part II. Ed. by Philippe Cudré-Mauroux et al. Vol. 7650.
Lecture Notes in Computer Science. Springer, 2012, pp. 131–147.
isbn: 978-3-642-35172-3. doi: 10.1007/978-3-642-35173-0_9.
url: http://dx.doi.org/10.1007/978-3-642-35173-0_9.
25. References III
Mohamed Morsey et al. “DBpedia SPARQL Benchmark -
Performance Assessment with Real Queries on Real Data”. In: The
Semantic Web - ISWC 2011 - 10th International Semantic Web
Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part
I. Ed. by Lora Aroyo et al. Vol. 7031. Lecture Notes in Computer
Science. Springer, 2011, pp. 454–469. isbn: 978-3-642-25072-9. doi:
10.1007/978-3-642-25073-6_29. url:
http://dx.doi.org/10.1007/978-3-642-25073-6_29.
Axel-Cyrille Ngonga Ngomo and Sören Auer. “LIMES - A
Time-Efficient Approach for Large-Scale Link Discovery on the Web
of Data”. In: IJCAI 2011, Proceedings of the 22nd International Joint
Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July
16-22, 2011. Ed. by Toby Walsh. IJCAI/AAAI, 2011, pp. 2312–2317.
isbn: 978-1-57735-516-8. doi:
10.5591/978-1-57735-516-8/IJCAI11-385. url:
http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-385.
26. References IV
Markus Nentwig et al. “A survey of current Link Discovery
frameworks”. In: Semantic Web Preprint (2015), pp. 1–18.
Axel-Cyrille Ngonga Ngomo, Alejandra Garcıa-Rojas, and
Irini Fundulaki. “HOBBIT: Holistic Benchmarking of Big Linked
Data”. In: ERCIM News 2016.105 (2016). url:
http://ercim-news.ercim.eu/en105/r-i/hobbit-holistic-benchmarking-of-big-linked-data.
Axel-Cyrille Ngonga Ngomo et al. “RAVEN - active learning of link
specifications”. In: Proceedings of the 6th International Workshop on
Ontology Matching, Bonn, Germany, October 24, 2011. Ed. by
Pavel Shvaiko et al. Vol. 814. CEUR Workshop Proceedings.
CEUR-WS.org, 2011. url:
http://ceur-ws.org/Vol-814/om2011_Tpaper3.pdf.
27. References V
Axel-Cyrille Ngonga Ngomo. “Link Discovery with Guaranteed
Reduction Ratio in Affine Spaces with Minkowski Measures”. In: The
Semantic Web - ISWC 2012 - 11th International Semantic Web
Conference, Boston, MA, USA, November 11-15, 2012, Proceedings,
Part I. Ed. by Philippe Cudré-Mauroux et al. Vol. 7649. Lecture Notes
in Computer Science. Springer, 2012, pp. 378–393. isbn:
978-3-642-35175-4. doi: 10.1007/978-3-642-35176-1_24. url:
http://dx.doi.org/10.1007/978-3-642-35176-1_24.
Axel-Cyrille Ngonga Ngomo and Klaus Lyko. “EAGLE: Efficient Active
Learning of Link Specifications Using Genetic Programming”. In: The
Semantic Web: Research and Applications - 9th Extended Semantic
Web Conference, ESWC 2012, Heraklion, Crete, Greece, May 27-31,
2012. Proceedings. Ed. by Elena Simperl et al. Vol. 7295. Lecture
Notes in Computer Science. Springer, 2012, pp. 149–163. isbn:
978-3-642-30283-1. doi: 10.1007/978-3-642-30284-8_17. url:
http://dx.doi.org/10.1007/978-3-642-30284-8_17.
28. References VI
Axel-Cyrille Ngonga Ngomo, Klaus Lyko, and Victor Christen.
“COALA - Correlation-Aware Active Learning of Link Specifications”.
In: The Semantic Web: Semantics and Big Data, 10th International
Conference, ESWC 2013, Montpellier, France, May 26-30, 2013.
Proceedings. Ed. by Philipp Cimiano et al. Vol. 7882. Lecture Notes
in Computer Science. Springer, 2013, pp. 442–456. isbn:
978-3-642-38287-1. doi: 10.1007/978-3-642-38288-8_30. url:
http://dx.doi.org/10.1007/978-3-642-38288-8_30.
Muhammad Saleem et al. “TopFed: TCGA Tailored Federated Query
Processing and Linking to LOD”. In: J. Biomedical Semantics 5
(2014), p. 47. doi: 10.1186/2041-1480-5-47. url:
http://dx.doi.org/10.1186/2041-1480-5-47.
Muhammad Saleem et al. “A fine-grained evaluation of SPARQL
endpoint federation systems”. In: Semantic Web 7.5 (2016),
pp. 493–518. doi: 10.3233/SW-150186. url:
http://dx.doi.org/10.3233/SW-150186.
29. References VII
Tzanina Saveta et al. “LANCE: Piercing to the Heart of Instance
Matching Tools”. In: The Semantic Web - ISWC 2015 - 14th
International Semantic Web Conference, Bethlehem, PA, USA,
October 11-15, 2015, Proceedings, Part I. Ed. by Marcelo Arenas
et al. Vol. 9366. Lecture Notes in Computer Science. Springer, 2015,
pp. 375–391. isbn: 978-3-319-25006-9. doi:
10.1007/978-3-319-25007-6_22. url:
http://dx.doi.org/10.1007/978-3-319-25007-6_22.
Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, and Sören Auer.
“Question answering on interlinked data”. In: 22nd International
World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May
13-17, 2013. Ed. by Daniel Schwabe et al. International World Wide
Web Conferences Steering Committee / ACM, 2013, pp. 1145–1156.
isbn: 978-1-4503-2035-1. url:
http://dl.acm.org/citation.cfm?id=2488488.