Question Answering - Application and Challenges (Jens Lehmann)
This document provides an overview of question answering applications and challenges. It defines question answering as receiving natural language questions and providing concise answers. Recent developments in question answering systems are discussed, including IBM Watson. Challenges for question answering over semantic data are explored, such as lexical gaps, ambiguity, granularity, and alternative resources. Large-scale linguistic resources and machine learning approaches for question answering are also covered. Applications of question answering technologies are examined.
The document presents two neural network models for named entity recognition (NER) without language-specific resources: an LSTM-CRF model and a transition-based stack LSTM (S-LSTM) model. The LSTM-CRF model uses a bidirectional LSTM layer followed by a CRF layer to label input sequences, while the S-LSTM model directly constructs labeled entity chunks. Both models represent words as character-level representations from a bidirectional LSTM combined with word embeddings. The models are evaluated on four languages and achieve state-of-the-art performance on three of the languages without external labeled data.
This document presents an overview of named entity recognition (NER) and the conditional random field (CRF) algorithm for NER. It defines NER as the identification and classification of named entities like people, organizations, locations, etc. in unstructured text. The document discusses the types of named entities, common NER techniques including rule-based and supervised methods, and explains the CRF algorithm and its mathematical model. It also covers the advantages of CRF for NER and examples of its applications in areas like information extraction.
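The decoding step of a linear-chain CRF can be sketched with the Viterbi algorithm. The emission and transition scores below are illustrative values, not a trained model:

```python
# Viterbi decoding, the inference step of a linear-chain CRF for NER.
# Scores are hand-picked for illustration, not learned parameters.

def viterbi(obs_scores, trans_scores, labels):
    """obs_scores: list of {label: emission score} per token;
    trans_scores: {(prev_label, label): transition score}.
    Returns the highest-scoring label sequence."""
    # best[i][y] = (score of best path ending in label y at token i, backpointer)
    best = [{y: (obs_scores[0][y], None) for y in labels}]
    for i in range(1, len(obs_scores)):
        row = {}
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p][0] + trans_scores.get((p, y), 0.0) + obs_scores[i][y])
                 for p in labels),
                key=lambda t: t[1])
            row[y] = (score, prev)
        best.append(row)
    # backtrack from the best final label
    y = max(labels, key=lambda l: best[-1][l][0])
    path = [y]
    for i in range(len(best) - 1, 0, -1):
        y = best[i][y][1]
        path.append(y)
    return list(reversed(path))

labels = ["O", "B-PER", "I-PER"]
# Three tokens, e.g. "John Smith runs"
obs = [{"O": 0.1, "B-PER": 0.9, "I-PER": 0.0},
       {"O": 0.2, "B-PER": 0.3, "I-PER": 0.5},
       {"O": 0.9, "B-PER": 0.0, "I-PER": 0.1}]
trans = {("B-PER", "I-PER"): 0.5, ("O", "I-PER"): -1.0, ("B-PER", "B-PER"): -1.0}
print(viterbi(obs, trans, labels))  # → ['B-PER', 'I-PER', 'O']
```

The transition scores are what distinguish a CRF from per-token classification: here they reward B-PER followed by I-PER and penalize I-PER appearing after O.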
WISS QA Do it yourself Question answering over Linked Data (Andre Freitas)
This document describes a challenge to build a question answering system over linked data from DBpedia and Wikipedia. Participants will work in groups to develop components of the QA system, such as question analysis, entity search, query generation, graph extraction, evaluation, and a user interface. The goal is to have a working QA system by the end of the challenge that can answer natural language questions over linked data.
This document describes an approach for bridging the gap between natural language queries and linked data concepts using BabelNet. The approach uses BabelNet for word sense disambiguation, named entity recognition and disambiguation. It parses queries, matches terms to ontology concepts and properties, generates candidate triples, and integrates the triples to produce SPARQL queries. The approach was evaluated on test data from QALD-2, where it answered a promising 76% of questions correctly.
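The final integration step, turning matched triples into a SPARQL query, could look roughly like this; the prefixes and helper function are hypothetical, not the paper's actual code:

```python
# A minimal sketch of assembling matched candidate triples into a
# SPARQL SELECT query. In the described pipeline the triples would come
# from query parsing and ontology matching; here they are hard-coded.

def build_sparql(triples, target_var):
    """triples: list of (subject, predicate, object) strings, where
    names starting with '?' are variables."""
    patterns = " .\n  ".join(f"{s} {p} {o}" for s, p, o in triples)
    return f"SELECT DISTINCT {target_var} WHERE {{\n  {patterns} .\n}}"

query = build_sparql(
    [("?city", "rdf:type", "dbo:City"),
     ("?city", "dbo:country", "dbr:Germany")],
    "?city")
print(query)
```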
This document provides an overview of the OpenNLP natural language processing tool. It discusses the various NLP tasks that OpenNLP can perform, including tokenization, POS tagging, named entity recognition, chunking, parsing, and co-reference resolution. It also describes how models for these tasks are trained in OpenNLP using annotated training data. The document concludes by listing some advantages and limitations of OpenNLP.
A Context-Based Semantics for SPARQL Property Paths over the Web (Olaf Hartig)
- The document proposes a formal context-based semantics for evaluating SPARQL property path queries over the Web of Linked Data.
- This semantics defines how to compute the results of such queries in a well-defined manner and ensures the "web-safeness" of queries, meaning they can be executed directly over the Web without prior knowledge of all data.
- The paper presents a decidable syntactic condition for identifying SPARQL property path queries that are web-safe based on their sets of conditionally bounded variables.
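Over a fixed, fully known dataset (so ignoring the web-safeness question the paper addresses), evaluating a property path such as `p+` reduces to graph reachability:

```python
# Evaluating the SPARQL property path p+ (one or more steps over
# predicate p) over an in-memory triple set, sketched as BFS.
from collections import deque

def eval_path_plus(triples, start, predicate):
    """Return all nodes reachable from `start` via one or more
    `predicate` edges in the given set of (s, p, o) triples."""
    adj = {}
    for s, p, o in triples:
        if p == predicate:
            adj.setdefault(s, []).append(o)
    reached, queue = set(), deque(adj.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in reached:
            reached.add(node)
            queue.extend(adj.get(node, []))
    return reached

triples = [("a", "knows", "b"), ("b", "knows", "c"), ("c", "likes", "d")]
print(sorted(eval_path_plus(triples, "a", "knows")))  # → ['b', 'c']
```

The paper's contribution is precisely what this sketch assumes away: on the Web, the full triple set is not known up front, so the semantics must define which documents may be dereferenced during evaluation.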
A short tutorial on R, aimed at beginners who want to do data mining, especially text data mining.
Related code and data can be found at the following link: http://textanalytics.in/wm/R%20tutorial%20(DATA2014).zip
This document describes a paraphrase detection algorithm that uses semantic similarity scores from various NLP toolkits and machine translation engines. It evaluates pairs of sentences to classify them as non-paraphrases, near-paraphrases, or precise paraphrases (Task 1), or simply as paraphrases and non-paraphrases (Task 2). The algorithm uses feature vectors containing similarity scores from tools like SEMILAR, DKPro Similarity, NLTK WordNet, Swoogle, and BLEU, fed into a gradient boosting classifier. Evaluation on test data showed an accuracy of 0.5695 for Task 1 and 0.7153 for Task 2, placing the system in the middle of the submissions.
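The feature-vector idea can be illustrated with two toy similarity scores standing in for the SEMILAR / DKPro / BLEU features the system actually uses:

```python
# Sketch of the feature-vector construction for a sentence pair: each
# pair becomes a list of similarity scores that a classifier (gradient
# boosting in the described system) would consume. Jaccard overlap and
# a length ratio are toy stand-ins for the real toolkit scores.

def pair_features(s1, s2):
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    jaccard = len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0
    length_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2))
    return [jaccard, length_ratio]

print(pair_features("the cat sat on the mat", "a cat sat on a mat"))
```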
What one needs to know to work in Natural Language Processing field and the aspects of developing an NLP project using the example of a system to identify text language
Concepts in Application Context (How we may think conceptually) (Steffen Staab)
Formal concept analysis (FCA) derives a hierarchy of concepts in a formal context that relates objects with attributes. This approach is very well aligned with the traditions of Frege, Saussure and Peirce, which relate a signifier (e.g. a word/an attribute) to a mental concept evoked by this word and meant to refer to a specific object in the real world. However, in the practice of natural languages as well as artificial languages (e.g. programming languages), the application context often constitutes a latent variable that influences the interpretation of a signifier. We present some of our current work that analyzes the usage of words in natural language in varying application contexts, as well as the usage of variables in programming languages in varying application contexts, in order to provide conceptual constraints on these signifiers.
Schema-agnostic queries over large-schema databases: a distributional semanti... (Andre Freitas)
This document provides an overview and summary of André Freitas' PhD thesis defense presentation on schema-agnostic queries for large schema databases using distributional semantics. The presentation motivates the need for schema-agnostic queries due to the rise of very large and dynamic database schemas. It proposes using distributional semantics to provide an accurate, comprehensive and low maintenance approach to cope with semantic heterogeneity in schema-agnostic queries. The key aspects of the approach include semantic pivoting to reduce semantic complexity, distributional semantic models to enable semantic matching, and a hybrid distributional-relational semantic model called τ-Space to support the development of a schema-agnostic query mechanism.
This document provides an overview of natural language processing (NLP) including the linguistic basis of NLP, common NLP problems and approaches, sources of NLP data, and steps to develop an NLP system. It discusses tokenization, part-of-speech tagging, parsing, machine learning approaches like naive Bayes classification and dependency parsing, measuring word similarity, and distributional semantics. The document also provides advice on going from research to production systems and notes areas not covered like machine translation and deep learning methods.
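Of the approaches listed, naive Bayes classification is compact enough to sketch in full (toy training data, add-one smoothing):

```python
# A tiny multinomial naive Bayes text classifier with add-one
# smoothing, one of the classic approaches the overview mentions.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label). Returns (priors, word counts, vocab)."""
    priors, counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in docs:
        priors[label] += 1
        for w in text.lower().split():
            counts[label][w] += 1
            vocab.add(w)
    return priors, counts, vocab

def classify_nb(model, text):
    priors, counts, vocab = model
    total = sum(priors.values())
    def log_prob(label):
        lp = math.log(priors[label] / total)
        denom = sum(counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((counts[label][w] + 1) / denom)
        return lp
    return max(priors, key=log_prob)

model = train_nb([("great movie loved it", "pos"),
                  ("terrible boring movie", "neg"),
                  ("loved the acting", "pos"),
                  ("boring and terrible", "neg")])
print(classify_nb(model, "great acting loved it"))  # → pos
```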
This document discusses practical aspects of natural language processing (NLP) work. It contrasts research work, which involves setting goals, devising algorithms, training models, and testing accuracy, with development work, which focuses on implementing algorithms as scalable APIs. The document emphasizes that obtaining data is crucial for NLP and describes sources for structured, semi-structured, and unstructured data. It recommends Lisp as a language that supports the interactivity, flexibility, and tree processing needed for NLP research and development work.
This document discusses using Lisp for practical natural language processing (NLP). It begins with an overview of NLP practice, including research work like setting goals, devising algorithms, training models, and testing accuracy. It then discusses some pros and cons of using Lisp for NLP, including its support for interactivity, mathematical foundations, and tree structures. Examples are given of interactive Lisp programs and APIs. The document emphasizes that data is key for NLP and discusses sources for collecting data. It concludes that Lisp is well-suited for NLP research and development due to its interactive and flexible nature.
The document provides an agenda for the second day of the WISS Challenge on question answering over linked data. It encourages full effort on projects and offers free coffee. It also includes links to training question and answer datasets for the QALD-4 challenge and the DBpedia endpoint to use.
Webinar: Simpler Semantic Search with Solr (Lucidworks)
Hear from Lucidworks Senior Solutions Consultant Ted Sullivan about how you can leverage Apache Solr and Lucidworks Fusion to improve semantic awareness of your search applications.
This document discusses algorithms for transforming queries between different query languages. It focuses on transformations between Prolog, SPARQL, and λ-DCS queries. The document provides background on these query languages and explains why query transformations are useful, such as for linking natural language to database queries. It then describes two algorithms: one for transforming Prolog to SPARQL queries, and one for transforming SPARQL to λ-DCS queries. The algorithms are tested on a small geography database and the results are analyzed to evaluate the algorithms' performance and limitations.
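The Prolog-to-SPARQL direction can be illustrated on the simplest case, a single binary goal; this is a deliberately reduced sketch, not the paper's algorithm:

```python
# A toy illustration of the Prolog-to-SPARQL direction: a binary Prolog
# goal pred(Subject, Object) maps naturally onto one SPARQL triple
# pattern. The ':' prefix and variable naming are assumptions.
import re

def prolog_goal_to_sparql(goal):
    m = re.fullmatch(r"(\w+)\(\s*(\w+)\s*,\s*(\w+)\s*\)", goal)
    if not m:
        raise ValueError(f"unsupported goal: {goal}")
    pred, s, o = m.groups()
    # Prolog variables start with an uppercase letter; map them to ?vars.
    to_term = lambda t: f"?{t.lower()}" if t[0].isupper() else f":{t}"
    return f"SELECT * WHERE {{ {to_term(s)} :{pred} {to_term(o)} }}"

print(prolog_goal_to_sparql("capital(State, City)"))
# → SELECT * WHERE { ?state :capital ?city }
```

Conjunctions of goals, shared variables across goals, and the λ-DCS direction are exactly where the real algorithms earn their keep; this sketch only shows why the base case is straightforward.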
Semantics at Scale: A Distributional Approach (Andre Freitas)
1) The document discusses using distributional semantics to build robust semantic models that can handle large amounts of data and enable semantic computing at scale.
2) It describes how distributional semantic models can be used to represent word meanings based on their linguistic contexts, allowing semantic knowledge bases to be automatically constructed from large text corpora.
3) The author proposes a schema-agnostic approach using distributional semantics to enable querying databases without prior knowledge of schemas, addressing problems of vocabulary and structural differences between queries and data.
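The second point, deriving word meanings from linguistic contexts, can be sketched with plain co-occurrence counts and cosine similarity:

```python
# Distributional sketch: represent each word by its co-occurrence
# counts within a +/-1 word window, then compare words by cosine
# similarity. Real models use far larger corpora and dimensionality
# reduction; the principle is the same.
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=1):
    vecs = defaultdict(Counter)
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vecs[w][words[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

vecs = cooccurrence_vectors([
    "the cat drinks milk", "the dog drinks water",
    "the cat chases the dog"])
# "cat" and "dog" share contexts ("the", "drinks"), so they come out
# more similar to each other than either is to "milk".
```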
The document describes a character-level convolutional neural network approach for sentence paraphrase detection. It evaluates standard and non-standard models using word and character embeddings as inputs to the CNN. The standard model using character embeddings achieved the best results, obtaining an accuracy of 72.74% and F1 score of 78.8%, outperforming the standard word-based model and non-standard model. The document discusses related work applying CNNs to other NLP tasks and analyzes the results.
This document discusses using data augmentation techniques to improve character-level neural networks for Russian language processing. It proposes augmenting data by replacing words with synonyms, randomly shuffling word order, and inserting adjectives near nouns. The techniques are tested on sentiment analysis tasks using a character-level convolutional network model. Results show that augmenting with synonyms significantly improved accuracy on the test set, while other techniques did not provide clear benefits or reduced accuracy on an out-of-domain validation set. The study demonstrates that data augmentation can help for some NLP tasks when applied carefully for the language and model.
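The two simplest techniques, synonym replacement and word-order shuffling, might look like this (the synonym table is a toy stand-in for a real thesaurus):

```python
# Sketch of two of the augmentation techniques described: synonym
# replacement and random word-order shuffling. A seeded RNG keeps the
# augmentation reproducible.
import random

SYNONYMS = {"good": ["great", "fine"], "film": ["movie"]}

def augment_synonyms(sentence, rng):
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w
                    for w in sentence.split())

def augment_shuffle(sentence, rng):
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)

rng = random.Random(0)
print(augment_synonyms("a good film indeed", rng))
print(augment_shuffle("a good film indeed", rng))
```

The study's caveat applies directly here: shuffling destroys word order, which is harmless for some tasks and models but damaging for others, so each technique has to be validated per task.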
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation (Alessandro Benedetti)
Every team working on Information Retrieval software struggles with the task of evaluating how well their system performs in terms of search quality (at a specific point in time and historically).
Evaluating search quality is important both to understand and size the improvement or regression of your search application across the development cycles, and to communicate such progress to relevant stakeholders.
To satisfy these requirements, a helpful tool must be:
flexible and highly configurable for a technical user
immediate, visual and concise for an optimal business utilization
In the industry, and especially in the open source community, the landscape is quite fragmented: such requirements are often met with ad hoc partial solutions that each require a considerable amount of development and customization effort.
To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed of a core library and a set of modules and plugins that give it the flexibility to be integrated into automated evaluation processes and continuous integration flows.
This talk will introduce RRE, describe its latest developments, and demonstrate how it can be integrated into a project to measure and assess the search quality of your search application.
The presentation will focus on a live demo showing an example project with a set of initial relevancy issues that we solve iteration after iteration, using RRE's output feedback to gradually drive the improvement process until we reach an optimal balance between quality evaluation measures.
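A graded metric of the kind such a tool reports, NDCG@k, can be computed from relevance judgments as follows (a generic sketch, not RRE's actual implementation):

```python
# NDCG@k from graded relevance judgments: discounted cumulative gain of
# the returned ranking, normalized by the gain of the ideal ranking.
import math

def ndcg_at_k(ranked_gains, ideal_gains, k):
    """ranked_gains: relevance grades in the order the engine returned
    them; ideal_gains: the same grades, to be sorted best-first."""
    dcg = lambda gains: sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = dcg(sorted(ideal_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal else 0.0

# A perfect ranking scores 1.0; putting the irrelevant document first
# lowers the score.
print(ndcg_at_k([3, 2, 0], [3, 2, 0], 3))  # → 1.0
print(ndcg_at_k([0, 2, 3], [3, 2, 0], 3) < 1.0)  # → True
```

Tracking a metric like this per query set across development cycles is exactly the "improvement or regression" signal the abstract describes.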
The document discusses question answering over knowledge graphs. It introduces question answering and describes how knowledge graphs can be used to answer natural language questions. It summarizes three proposed papers on learning knowledge graphs for question answering through dialogs, automated template generation for question answering over knowledge graphs, and generating knowledge questions from knowledge graphs. The document also covers motivation for question answering, defining characteristics, different methods like template-based and dialog-based systems, evaluating knowledge quality, and examples of question answering systems.
Every team working on information retrieval software struggles with the task of evaluating how well their system performs in terms of search quality (currently and historically). Evaluating search quality is important both to understand and size the improvement or regression of your search application across the development cycles, and to communicate such progress to relevant stakeholders. In the industry, and especially in the open source community, the landscape is quite fragmented: such requirements are often met with ad hoc partial solutions that each require a considerable amount of development and customization effort. To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend.
I used these slides for an introductory lecture (90min) to a seminar on SPARQL. This slideset introduces the semantics of the RDF query language SPARQL.
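The heart of that semantics, evaluating a basic graph pattern as the set of variable bindings under which every triple pattern occurs in the data, can be sketched directly:

```python
# The core of SPARQL's evaluation semantics: a basic graph pattern is
# the set of bindings (solution mappings) that make every triple
# pattern occur in the data. A minimal in-memory sketch.

def match_bgp(triples, patterns, binding=None):
    """Yield bindings (dicts var -> value) that satisfy all patterns.
    Pattern terms starting with '?' are variables."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        b = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if b.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_bgp(triples, rest, b)

data = [("alice", "knows", "bob"), ("bob", "knows", "carol")]
print(list(match_bgp(data, [("?x", "knows", "?y"), ("?y", "knows", "?z")])))
# → [{'?x': 'alice', '?y': 'bob', '?z': 'carol'}]
```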
How the Web can change social science research (including yours) (Frank van Harmelen)
A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and even better the Web of Data, as an instrument in their research.
Different Semantic Perspectives for Question Answering Systems (Andre Freitas)
Question Answering systems define one of the most complex tasks in computational semantics. The intrinsic complexity of the QA task allows researchers of QA systems to investigate and explore different perspectives of semantics. However, this complexity also induces a bias towards a systems perspective, where researchers are alienated from a deeper reasoning on the semantic principles that are in place within the different components of the system. In this talk we will explore the semantic challenges, principles and perspectives behind the components of QA systems, aiming at providing a principled map and overview on the contribution of each component within the QA semantic interpretation goal.
This document summarizes Spotify's approach to music discovery and recommendations using machine learning techniques. It discusses how Spotify analyzes billions of user streams to find patterns and make recommendations using collaborative filtering and latent factor models. It also explores combining multiple models like recurrent neural networks, word2vec, and gradient boosted decision trees to improve recommendations. The challenges of evaluating recommendations and optimizing for the right metrics are also summarized.
Semantic Technologies and Programmatic Access to Semantic Data Steffen Staab
This is a talk given at the Semantics@Roche Forum on September 8, 2015. It is a short version of the talk I gave in July at Summer School Semantic Web and really a subset of the slides I showed then.
Query Translation for Ontology-extended Data SourcesJie Bao
This document summarizes an approach for querying ontology-extended data sources. It describes how data sources can be semantically extended with ontologies and mappings to allow for flexible querying. It presents an approach for translating queries formulated over one ontology into equivalent queries over another ontology, while ensuring the translations are sound and complete. It discusses tools developed for ontology editing, mapping, data access and query translation over ontology-extended data sources.
Information-Rich Programming in F# with Semantic DataSteffen Staab
Programming with rich data frequently implies that one needs to search for, understand, integrate and program with new data, with each of these steps constituting a major obstacle to successful data use.
In this talk we will explain and demonstrate how our approach, LITEQ (Language Integrated Types, Extensions and Queries for RDF Graphs), which is realized as part of the F# / Visual Studio environment, supports the software developer. Using the extended IDE the developer may now
a. explore new, previously unseen data sources, which are either natively in RDF or mapped into RDF;
b. use the exploration of schemata and data in order to construct types and objects in the F# environment;
c. automatically map between data and programming language objects in order to make them persistent in the data source;
d. have extended typing functionality added to the F# environment, resulting from the exploration of the data source and its mapping into F#.
Core to this approach is the novel node path query language, NPQL, which allows for interactive, intuitive exploration of data schemata and the data proper, as well as for the mapping and definition of types, object collections and individual objects. Beyond the existing type provider mechanism for F#, our approach also allows for property-based navigation and runtime querying for data objects.
Lidia Pivovarova is a PhD student at Saint-Petersburg State University working on natural language understanding and conceptual modeling under the supervision of Dr. V. Sh. Rubashkin. Their goals include developing an ontology and conceptual model to support information extraction from newspaper texts by identifying key factors and patterns related to the factors. They are building an attribute tree ontology with over 100 domains and testing it on Russian language texts.
Filtering Inaccurate Entity Co-references on the Linked Open Dataebrahim_bagheri
A method for identifying incorrect sameAs links on the Linked Open Data cloud
Details published in:
John Cuzzola, Ebrahim Bagheri, Jelena Jovanovic:
Filtering Inaccurate Entity Co-references on the Linked Open Data. DEXA (1) 2015: 128-143
Improving Semantic Search Using Query Log AnalysisStuart Wrigley
Despite the attention Semantic Search is continuously gaining, several challenges affecting tool performance and user experience remain unsolved. Among these are: matching user terms with the search space, adopting view-based interfaces in the Open Web, as well as supporting users while building their queries. This paper proposes an approach to move a step forward towards tackling these challenges by creating models of the usage of Linked Data concepts and properties, extracted from semantic query logs as a source of collaborative knowledge. We use two sets of query logs from the USEWOD workshops to create our models and show the potential of using them in the mentioned areas.
A system called Natural Language Interface, which transforms a user's natural language question into a SPARQL query.
Find related papers here: https://sites.google.com/site/fadhlinams81/publication
Many Linked Data datasets model elements in their domains in the form of lists: a countable number of ordered resources.
When publishing these lists in RDF, an important concern is making them easy to consume.
Therefore, a well-known recommendation is to find an existing list modelling solution, and reuse it.
However, a specific domain model can be implemented in different ways and vocabularies may provide alternative solutions.
In this paper, we argue that a wrong decision could have a significant impact in terms of performance and, ultimately, the availability of the data.
We take the case of RDF Lists and make the hypothesis that the efficiency of retrieving sequential linked data depends primarily on how they are modelled (triple-store invariance hypothesis).
To demonstrate this, we survey different solutions for modelling sequences in RDF, and propose a pragmatic approach for assessing their impact on data availability.
Finally, we derive good (and bad) practices on how to publish lists as linked open data.
By doing this, we sketch the foundations of an empirical, task-oriented methodology for benchmarking linked data modelling solutions.
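To make the hypothesis concrete, here is an illustrative comparison (a plain-Python toy, not from the paper) of retrieving the same ordered data modelled as an rdf:List, which needs one rdf:first/rdf:rest hop per element, versus an rdf:Seq, whose rdf:_n membership properties carry positions directly:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The same three-element list under two modelling choices.
list_triples = {
    (":songs", RDF + "first", ":a"), (":songs", RDF + "rest", "_:b1"),
    ("_:b1", RDF + "first", ":b"), ("_:b1", RDF + "rest", "_:b2"),
    ("_:b2", RDF + "first", ":c"), ("_:b2", RDF + "rest", RDF + "nil"),
}
seq_triples = {
    (":songs", RDF + "_1", ":a"),
    (":songs", RDF + "_2", ":b"),
    (":songs", RDF + "_3", ":c"),
}

def read_rdf_list(triples, head):
    """Walk rdf:first/rdf:rest links; each hop is a separate lookup."""
    index = {(s, p): o for s, p, o in triples}
    items = []
    while head != RDF + "nil":
        items.append(index[(head, RDF + "first")])
        head = index[(head, RDF + "rest")]
    return items

def read_rdf_seq(triples, head):
    """Membership properties rdf:_n carry the position directly."""
    numbered = sorted((int(p.rsplit("_", 1)[1]), o)
                      for s, p, o in triples
                      if s == head and p.startswith(RDF + "_"))
    return [o for _, o in numbered]
```

The point is not the Python itself but the access pattern: the linked-list shape forces sequential dereferencing, while the indexed shape allows position-based retrieval; this is exactly the kind of difference the paper's benchmarking methodology is meant to expose.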
This document describes a technique called MinHashing that can be used to efficiently find near-duplicate documents among a large collection. MinHashing works in three steps: 1) it converts documents to sets of shingles, 2) it computes signatures for the sets using MinHashing to preserve similarity, 3) it uses Locality-Sensitive Hashing to focus on signature pairs likely to be from similar documents, finding candidates efficiently. This avoids comparing all possible document pairs.
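The three steps can be sketched in miniature (a toy pure-Python illustration; salted MD5 hashes stand in for the random hash permutations):

```python
import hashlib

def shingles(text, k=4):
    """Step 1: represent a document as its set of k-character shingles."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, num_hashes=64):
    """Step 2: for each salted hash function, keep the minimum hash
    value over the set; the fraction of agreeing minima between two
    signatures approximates the Jaccard similarity of the sets."""
    sig = []
    for salt in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{salt}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two documents agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Step 3, Locality-Sensitive Hashing, would then split each signature into bands (say, 16 bands of 4 values), hash each band into buckets, and compare only documents that share at least one bucket, avoiding the all-pairs comparison.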
1. The document proposes Granulated LDA (GLDA), a regularized version of LDA, to improve topic modeling stability.
2. It introduces measures like Kullback-Leibler divergence and Jaccard coefficient to evaluate topic similarity and modeling stability across runs.
3. An experiment applies LDA, SLDA, and GLDA to a large Russian text corpus, finding that GLDA produces more stable topics across multiple runs according to these measures.
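The two similarity measures named above can be sketched as follows (toy re-implementations for intuition, not the paper's code): Jaccard overlap between a topic's top-word sets across runs, and a symmetric Kullback-Leibler divergence between topic-word distributions:

```python
import math

def jaccard(top_words_a, top_words_b):
    """Overlap of a topic's top-word sets from two training runs."""
    a, b = set(top_words_a), set(top_words_b)
    return len(a & b) / len(a | b)

def sym_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler divergence between two topic-word
    distributions over the same vocabulary (eps avoids log of zero)."""
    def kl(x, y):
        return sum(xi * math.log((xi + eps) / (yi + eps))
                   for xi, yi in zip(x, y))
    return 0.5 * (kl(p, q) + kl(q, p))
```

A stable topic model yields high Jaccard and low divergence when matching topics are compared across independent runs.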
Joint Keynote at Int. Conference on Knowledge Engineering and Semantic Web and Prague Computer Science Seminar, Prague, September 22, 2016
The challenges of Big Data are frequently explained by dealing with Volume, Velocity, Variety and Veracity. The large variety of data in organizations results from accessing different information systems with heterogeneous schemata or ontologies. In this talk I will present the research efforts that target the management of such broad data.
They include: (i) an integrated development environment for programming with broad data, (ii) a query language that allows for typing of query results, (iii) a typed lambda-calculus based on description logics, and (iv) efficient access to data repositories via schema indices.
Intelligent Methods in Models of Text Information Retrieval: Implications for...inscit2006
This document summarizes a conference paper on intelligent information retrieval methods and their implications for society. It discusses topics like digital inclusion, the digital divide, effects on the work environment, intellectual property issues, privacy, security, censorship, and spam/optimization techniques used to artificially increase search engine rankings. It also describes a collaborative research project using various artificial intelligence techniques like fuzzy sets, genetic algorithms, and rough sets to improve information retrieval system usability.
Music Personalization : Real time Platforms.Esh Vckay
1. The document discusses music personalization techniques at Spotify, including understanding users and music content, using collaborative filtering and latent vector models to make recommendations, and building real-time recommendation systems using Apache Storm.
2. It describes how Spotify uses machine learning techniques like matrix factorization and word2vec to generate latent vectors for users, songs, artists and playlists to measure similarity and make personalized recommendations at scale for its 75 million users.
3. The key challenges are processing huge amounts of data from 1 billion playlists and 1TB of logs daily to provide recommendations for each new user within 3 seconds and in real-time as listening behaviors change.
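The matrix-factorization idea behind such latent vectors can be illustrated with a tiny SGD sketch (plain Python; the dimensions, learning rate and data are illustrative, not Spotify's setup):

```python
import random

def factorize(ratings, n_users, n_items, dim=8, lr=0.05, reg=0.02, epochs=300):
    """Learn user and item latent vectors so that their dot product
    approximates the observed interaction strengths."""
    rng = random.Random(0)
    users = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    items = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(a * b for a, b in zip(users[u], items[i]))
            err = r - pred
            for d in range(dim):
                ud, vd = users[u][d], items[i][d]
                users[u][d] += lr * (err * vd - reg * ud)
                items[i][d] += lr * (err * ud - reg * vd)
    return users, items

# Similar users and items end up with nearby vectors, so recommendation
# reduces to nearest-neighbour search in the shared latent space.
```

At Spotify's scale the same objective is optimized with distributed implicit-feedback factorization rather than per-example SGD, but the latent-vector output is the same kind of object.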
This document provides an overview of topic modeling. It defines topic modeling as discovering the thematic structure of a corpus by modeling relationships between words and documents through learned topics. The document introduces Latent Dirichlet Allocation (LDA) as a widely used topic modeling technique. It outlines LDA's generative process and inference methods like Gibbs sampling and variational inference. The document also discusses extensions to LDA, evaluation strategies, open questions, and applications like topic labeling and browsing.
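LDA's generative process can be sketched directly (a toy sampler; stdlib gamma draws implement the Dirichlet, and the symmetric hyperparameters play the usual roles of alpha and beta):

```python
import random

rng = random.Random(42)

def dirichlet(alpha, size):
    """Sample a probability vector from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def categorical(probs):
    """Draw an index according to a probability vector."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate_corpus(n_docs=3, doc_len=10, n_topics=2, vocab_size=6,
                    alpha=0.5, beta=0.5):
    """For each topic draw a word distribution; for each document draw a
    topic mixture; for each word position draw a topic, then a word."""
    topics = [dirichlet(beta, vocab_size) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        theta = dirichlet(alpha, n_topics)
        corpus.append([categorical(topics[categorical(theta)])
                       for _ in range(doc_len)])
    return corpus
```

Inference methods such as Gibbs sampling or variational inference run this story in reverse: given only the word IDs, they recover plausible topic distributions and mixtures.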
Natural Language Processing in R (rNLP)fridolin.wild
The introductory slides of a workshop given to the doctoral school at the Institute of Business Informatics of the Goethe University Frankfurt. The tutorials are available on http://crunch.kmi.open.ac.uk/w/index.php/Tutorials
SPARQL is a query language for retrieving and manipulating data stored in RDF format. It allows users to write queries against remote SPARQL endpoints to query RDF triples stored in a database. SPARQL queries are composed of triple patterns, similar to RDF triples, that can include variables to retrieve variable bindings from the queried data. Query results are returned as solutions that assign values to the variables. Common queries include SELECT, ASK, CONSTRUCT, and DESCRIBE. SPARQL endpoints provide programmatic access to issue SPARQL queries against remote SPARQL-accessible stores.
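The core mechanism of a SELECT query, matching triple patterns containing variables against stored triples and joining the resulting variable bindings, can be sketched in miniature (a toy matcher, not a SPARQL engine):

```python
def match(pattern, triple, binding):
    """Extend a binding if the triple fits the pattern; terms starting
    with '?' are variables, everything else must match exactly."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding

def select(patterns, triples):
    """Join the solutions of all triple patterns, as a SPARQL basic
    graph pattern does, returning all consistent variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in triples
                     for b2 in [match(pattern, t, b)] if b2 is not None]
    return solutions

data = {(":alice", ":knows", ":bob"),
        (":bob", ":knows", ":carol"),
        (":alice", ":age", "42")}
# Who do the people Alice knows know?
query = [(":alice", ":knows", "?x"), ("?x", ":knows", "?y")]
```

A real endpoint evaluates the same kind of pattern joins over its store, then serializes the bindings (for SELECT) or builds graphs from them (for CONSTRUCT and DESCRIBE).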
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012taxonbytes
Presentation on reconciling taxonomic concepts using the Euler approach, given at the 2012 Annual Meeting of Entomological Society of America, Knoxville, TN.
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
Data spaces in distributed environments should be allowed to evolve in agile ways, providing data space owners with great flexibility about which data they store. Agility and heterogeneity, however, jeopardize data exchange, because representations may build on varying ontologies and data consumers may not be able to rely on the semantic correctness of their queries in the context of semantically heterogeneous, evolving data spaces. Graph data spaces are one example of a powerful model for representing and querying data whose semantics may change over time. To assert and enforce conditions on individual graph data spaces, shape languages (e.g. SHACL) have been developed. We investigate the question of how querying and programming can be guarded by reasoning over SHACL constraints in a distributed setting, and we sketch a picture of how a future landscape based on semantically heterogeneous data spaces might look.
Knowledge graphs for knowing more and knowing for sureSteffen Staab
Knowledge graphs have been conceived to collect heterogeneous data and knowledge about large domains, e.g. medical or engineering domains, and to allow versatile access to such collections by means of querying and logical reasoning. A surge of methods has responded to additional requirements in recent years. (i) Knowledge graph embeddings use similarity and analogy of structures to speculatively add to the collected data and knowledge. (ii) Queries with shapes and schema information can be typed to provide certainty about results. We survey both developments and find that the development of techniques happens in disjoint communities that mostly do not understand each other, thus limiting the proper and most versatile use of knowledge graphs.
Symbolic Background Knowledge for Machine LearningSteffen Staab
Machine learning aims at learning complex functions from data. Very often, this challenge remains ill-defined given the available amount of data; however, background knowledge that is available as knowledge graphs, ontologies or symbolic (physical) equations allows for an improved specification of the targeted solution. In this talk, we discuss several use cases that include symbolic background knowledge as regularizing priors, as constraints or as other inductive biases in machine learning tasks.
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...Steffen Staab
Presentation by Oul Han and Steffen Staab
Workshop "Soziale Netzwerke und Medien" (Social Networks and Media) at the meeting of the Fakultätentag Informatik, 14 November 2019, Hamburg
Web Futures: Inclusive, Intelligent, SustainableSteffen Staab
Almost from its very beginning, the Web has been ambivalent.
It has facilitated freedom for information, but this also included the freedom to spread misinformation. It has facilitated intelligent personalization, but at the cost of intrusion into our private lives. It has included more people than any other system before, but at the risk of exploiting them.
The Web is full of such ambivalences and the usage of artificial intelligences threatens to further amplify these ambivalences. To further the good and to contain the negative consequences, we need a research agenda studying and engineering the Web, as well as numerous activities by societies at large. In this talk, I will present and discuss a joint effort by an interdisciplinary team of Web Scientists to prepare and pursue such an agenda.
This document summarizes Steffen Staab's keynote presentation on eye tracking and web interaction. It discusses how eye tracking can be used to understand how users interact with and understand websites. It presents a framework for discovering active visual stimuli on websites using eye tracking data and machine learning. It also introduces GazeTheWeb, a system that aims to optimize gaze-based interaction with websites by adapting the interaction based on semantic understanding of page elements and dynamics. A lab study found that GazeTheWeb improved task completion times, usability and workload compared to traditional gaze emulation.
Storing and Querying Semantic Data in the CloudSteffen Staab
Daniel Janke and Steffen Staab. Tutorial at Reasoning Web
With proliferation of semantic data, there is a need to cope with trillions of triples by horizontally scaling data management in the cloud. To this end one needs to advance (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failure of compute and storage nodes. In this tutorial, we want to review challenges and how they have been addressed by research and development in the last 15 years.
Talk at Leopoldina Symposium on Digitization and its Effects on Man and Society
(Die Digitalisierung und ihre Auswirkungen auf Mensch und Gesellschaft)
leopoldina.org/de/veranstaltungen/veranstaltung/event/2464/
The document discusses Steffen Staab's presentation on "The Web We Want" at the WebSci '17 conference. It covers several topics related to making the web more inclusive, healthy, and useful. For social inclusion, it describes the MAMEM project which aims to measure how accessible the web is for people with disabilities. For a healthy web, it discusses using techniques from social network analysis to identify harmful roles and behaviors. For a useful semantic web, it presents principles for interlinking data sets in ways that meaningfully extend entity descriptions and connectivity. The overall goal is to engineer and measure how well the web achieves important values like inclusion, health, and usefulness.
This document summarizes a presentation on the next 10 years of Web Science. It discusses social challenges like discrimination and trust, legal challenges regarding regulation and tracking, political challenges from misinformation and participation, and technical challenges from artificial intelligence and security. The presentation outlines the 10 year initiative of the Web Science Network of laboratories and highlights talks from researchers at companies like Google, Facebook, and Stanford. It promotes collaborative projects like the Web Science Observatory and Summer School.
(Semi-)Automatic analysis of online contentsSteffen Staab
How can media and discourse analyses combine approaches from the humanities with statistical methods to deeply analyse large amounts of online content?
Invited talk at Fachgruppen-Workshop der Deutschen Gesellschaft für Publizistik und Kommunikationswissenschaft
Soziale Medien – Echo-Kammer oder öffentlicher Raum? (Social media: echo chamber or public space?)
Ansätze zur computergestützten Analyse von Internet-Korpora (Approaches to the computer-assisted analysis of Internet corpora)
6 October 2016, Karlsruher Institut für Technologie (KIT)
We use metadata of various kind to improve and enrich text document clustering using an extension of Latent Dirichlet Allocation (LDA). The methods are fully implemented, evaluated and software is available on github.
These are the slides of an invited talk I gave September 8 at the Alexandria Workshop of TPDL-2016: http://alexandria-project.eu/events/3rd-workshop/
This document provides an overview of a workshop on web science. It includes an agenda with topics such as an introduction to web science, aspects of the web, observing the web through web observatories, modeling aspects of the web, and the past and future of the web. It also provides details about project work sessions and social events during the workshop. Examples of bias in the web are discussed, such as bias in devices, software, content and data, and social networks. Methods for observing and collecting data from the web are addressed, along with challenges around data collection and publishing.
This document discusses the past 10 years and future of Web Science. It provides an overview of how the Web has evolved from a place to retrieve documents to a platform for coordination, monitoring, delivering services and understanding data. Web Science has progressed from case studies to developing concepts like the "Social Machine" and models of tagging. The document poses questions to a panel of experts about the strengths, weaknesses, opportunities and threats for Web Science over the past 10 years and what the next 10 years may bring.
The document summarizes the closing session of ISWC 2015, including award winners. It lists the winners of the People's Choice Poster Award, People's Choice Demo Award, Best Poster Award, Best Demo Award, Best Applied Paper Award, and Best Research Paper Award. It thanks attendees for their participation at ISWC 2015 and looks forward to ISWC 2016 in Kobe, Japan.
This document provides an overview and schedule for ISWC 2015 held from October 11-15, 2015. It summarizes attendance statistics, the research and applied paper submission and review process, award nominees, and highlights of the program including keynotes, paper sessions, and social events. The general chair is Steffen Staab from the University of Koblenz-Landau and University of Southampton. ISWC 2015 aims to bring together researchers and practitioners in the fields of semantic web and linked data.
This document discusses biases that can occur in social machines and algorithms. It summarizes research on observing bias in data and algorithms. Geographic bias in topic modeling algorithms is explored through an example using tagged photos. The document also examines biases that can occur in liquid feedback systems, using data from the German Pirate Party's system to analyze biases in voting weights, the delegation network, and the impact of delegations on approval rates. Novel power indices are proposed that aim to better measure potential and exercised power by accounting for voting biases.
Invited Talk at Summer School on Semantic Web, Bertinoro, 2015
Abstract:
Two decades ago, one discussed how to build seamless digital workflows such that the medium for data in a workflow would not switch between paper, fax, phone, and digital, because each transcription from one medium to another would be laborious and cost-inefficient. Thus, the issue was avoiding *medium discontinuities*. Today, we have all-digital data workflows, but we still have plenty of *semantic discontinuities*.
In this talk, I first want to describe reasons for these discontinuities, including: autonomy of data providers, the need for agility and flexibility, and decentralized organization of world-wide data spaces.
Then I want to describe several semantic discontinuities and some efforts to ameliorate them by:
1. Semantic programming (horizontal workflow paradigm)
2. Core ontologies (vertical workflow paradigm)
3. Semantic data production and consumption (sticky semantics)
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfUndress Baby
The quest for the best AI face swap solution is marked by an amalgamation of technological prowess and artistic finesse, where cutting-edge algorithms seamlessly replace faces in images or videos with striking realism. Leveraging advanced deep learning techniques, the best AI face swap tools meticulously analyze facial features, lighting conditions, and expressions to execute flawless transformations, ensuring natural-looking results that blur the line between reality and illusion, captivating users with their ingenuity and sophistication.
Web:- https://undressbaby.com/
E-commerce Application Development Company.pdfHornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
DDS Security Version 1.2 was adopted in 2024. This revision strengthens support for long-running systems, adding new cryptographic algorithms, certificate revocation, and hardening against DoS attacks.
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppGoogle
https://sumonreview.com/ai-fusion-buddy-review
AI Fusion Buddy Review: Key Features
✅Create Stunning AI App Suite Fully Powered By Google's Latest AI technology, Gemini
✅Use Gemini to build high-converting sales video scripts, ad copies, trending articles, blogs, etc. 100% unique!
✅Create Ultra-HD graphics with a single keyword or phrase that commands 10x eyeballs!
✅Fully automated AI articles bulk generation!
✅Auto-post or schedule stunning AI content across all your accounts at once—WordPress, Facebook, LinkedIn, Blogger, and more.
✅With one keyword or URL, generate complete websites, landing pages, and more…
✅Automatically create & sell AI content, graphics, websites, landing pages, & all that gets you paid non-stop 24*7.
✅Pre-built High-Converting 100+ website Templates and 2000+ graphic templates logos, banners, and thumbnail images in Trending Niches.
✅Say goodbye to wasting time logging into multiple Chat GPT & AI Apps once & for all!
✅Save over $5000 per year and kick out dependency on third parties completely!
✅Brand New App: Not available anywhere else!
✅ Beginner-friendly!
✅ZERO upfront cost or any extra expenses
✅Risk-Free: 30-Day Money-Back Guarantee!
✅Commercial License included!
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry, Quarterly Incident Report, provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows a seven-step method for delivering its services to customers, called the Software Development Life Cycle (SDLC) process.
Requirement — collecting the requirements is the first phase in the SDLC process.
Feasibility Study — once the requirements are collected, their feasibility is assessed before moving on to design.
Design — in this phase, they start designing the software.
Coding — when the design is complete, the developers start coding the software.
Testing — once coding is done, the testing team starts testing.
Installation — after testing is complete, the application is deployed to the live server and launched.
Maintenance — after delivery, the software is maintained while customers use it.
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it's all too common to encounter bumpy, pothole-filled solutions that slow down any application. Symfony apps are no exception.
In this talk, I will take you for a spin around the performance racetrack. We'll explore common pitfalls, those hidden potholes in your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early and, more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed the data quality capabilities that are integrated with the Incident Manager, providing a complete solution for your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can easily be extended to your needs. This session will showcase various tooling extensions that can boost your development experience considerably: work truly offline, transpile the code in your project to use even newer versions of ECMAScript (beyond ES2022, which the UI5 tooling supports today), consume any npm package of your choice, use different kinds of proxies, and even stitch UI5 projects together during development to mimic your target environment.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest innovations from Neo4j, including the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
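The edge-pair centric computation of transitive closures can be illustrated, at toy scale, by a plain in-memory worklist algorithm. Graspan's contribution is making this disk-based and scalable to very large program graphs; the sketch below does not attempt that and is only a conceptual illustration.

```java
import java.util.*;

// Conceptual illustration only: a tiny in-memory worklist computation of a
// transitive closure over an edge set. Each edge is a pair (a, b); whenever
// two edges (a, b) and (b, c) meet, the derived edge (a, c) is added.
public class TransitiveClosure {
    public static Set<List<Integer>> close(Set<List<Integer>> edges) {
        Set<List<Integer>> closure = new HashSet<>(edges);
        Deque<List<Integer>> work = new ArrayDeque<>(edges);
        while (!work.isEmpty()) {
            List<Integer> e = work.poll();              // e = (a, b)
            for (List<Integer> f : new ArrayList<>(closure)) {
                if (f.get(0).equals(e.get(1))) {        // f = (b, c) -> add (a, c)
                    List<Integer> g = List.of(e.get(0), f.get(1));
                    if (closure.add(g)) work.add(g);
                }
                if (e.get(0).equals(f.get(1))) {        // f = (c, a) -> add (c, b)
                    List<Integer> g = List.of(f.get(0), e.get(1));
                    if (closure.add(g)) work.add(g);
                }
            }
        }
        return closure;
    }

    public static void main(String[] args) {
        Set<List<Integer>> edges = new HashSet<>(Set.of(List.of(1, 2), List.of(2, 3)));
        System.out.println(close(edges).contains(List.of(1, 3))); // true
    }
}
```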
1. Steffen Staab Semantics Reloaded 1
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
Web and Internet Science Group · ECS · University of Southampton, UK &
Semantics Reloaded
Steffen Staab
@ststaab
http://west.uni-koblenz.de
http://wais.soton.ac.uk
2. Steffen Staab Semantics Reloaded 2
Semantics
• is the linguistic and philosophical study of meaning, in language, programming languages, formal logics, and semiotics.
• It is concerned with the relationship between signifiers—like words, phrases, signs, and symbols—and what they stand for, their denotation.
From Wikipedia
3. Steffen Staab Semantics Reloaded 3
My Team @ Institute for Web Science and Technologies
Semantic Data: Deduction, Semantic Web, RDF, OWL, Commonsense, Forgetting, Programming
Data Analytics: Induction, Social Media, Text, RDF, Sensors, Transfer Learning in Networks, Argumentation, Deep Learning, LDA, FCA, TDA, Responsibility
Semantic Interaction: Interaction, Eye Tracking, Multimodal Browser, GazeTheWeb, Gaze Mining
5. Steffen Staab Semantics Reloaded 5
Let's grab semantics
wherever we can find it!
What is a cup?
6. Steffen Staab Semantics Reloaded 6
“For a large class of cases of the employment of
the word ‘meaning’—though not for all—this
word can be explained in this way:
the meaning of a word is its use in the language”
Wittgenstein, Philosophical Investigations
8. Steffen Staab Semantics Reloaded 8
Semantics
• is the linguistic and philosophical study of meaning, in language, programming languages, formal logics, and semiotics.
• It is concerned with the relationship between signifiers—like words, phrases, signs, and symbols—and what they stand for, their denotation.
9. Steffen Staab Semantics Reloaded 9
Waterfall development
Conceptualization → Database → Code
strings are passed back and forth here
What can we learn about these "strings"?
10. Steffen Staab Semantics Reloaded 10
Example ontology and data
T-Box:
∃recorded.Song ⊑ Musician
Actor ⊓ Musician ⊑ MusicalActor
A-Box:
(hendrix, machineGun) : recorded
machineGun : Song
elvis : Musician
elvis : Actor
(hendrix, elvis) : influencedBy
We consider a powerful ontology language, thus we can handle simpler ones, too.
11. Steffen Staab Semantics Reloaded 11
Description Logics
• Atomic elements of a dataset (knowledge base) 𝒦 defined in signature Sig(𝒦) = (𝒞, ℛ, 𝒪)
– Concept identifiers, e.g., Musician
– Role identifiers, e.g., recorded
– Object identifiers, e.g., hendrix, beatles
• Concept expressions built using connectives
– Intersection: Musician⊓Painter
– Existential Quantification: ∃recorded.Song
– Inverse roles: ∃recorded⁻.⊤
– Concepts through enumeration: beatles
12. Steffen Staab Semantics Reloaded 12
Knowledge base and inference
• Knowledge base 𝒦: Schema and data
– Subsumption: MusicGroup ⊑ Musician
– Concept assertions: beatles : MusicGroup
– Role assertions: hendrix, beatles : influencedBy
• Interpretation-based semantics
– Formula true or false in specific interpretation
– If all formulas of 𝒦 are true: Model of 𝒦
• If formula 𝐹 true in all models: 𝒦 ⊨ 𝐹
– Answering queries: ans(𝒦, ?X : 𝐶) = { x ∣ 𝒦 ⊨ x : 𝐶 }
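As an illustration of answering ?X : C, the toy sketch below (all names hypothetical; not the talk's tooling, and no real reasoner) forward-chains two axioms from the running example, ∃recorded.Song ⊑ Musician and MusicGroup ⊑ Musician, and then collects the certain answers:

```java
import java.util.*;

// Deliberately naive sketch of certain answers ans(K, ?X : C) = { x | K |= x : C },
// obtained by forward-chaining the two example T-Box axioms before looking up C.
public class ToyDL {
    public static Map<String, Set<String>> types = new HashMap<>();    // x -> concepts of x
    public static Map<String, Set<String>> recorded = new HashMap<>(); // role: x recorded y

    public static void type(String x, String c) {
        types.computeIfAbsent(x, k -> new HashSet<>()).add(c);
    }

    public static Set<String> answers(String concept) {
        // Axiom: Exists recorded.Song is subsumed by Musician
        recorded.forEach((x, ys) -> {
            for (String y : ys)
                if (types.getOrDefault(y, Set.of()).contains("Song"))
                    type(x, "Musician");
        });
        // Axiom: MusicGroup is subsumed by Musician
        for (var e : new ArrayList<>(types.entrySet()))
            if (e.getValue().contains("MusicGroup"))
                type(e.getKey(), "Musician");
        // Collect every object asserted or derived to be an instance of the concept
        Set<String> result = new TreeSet<>();
        types.forEach((x, cs) -> { if (cs.contains(concept)) result.add(x); });
        return result;
    }

    public static void main(String[] args) {
        type("machineGun", "Song");
        recorded.put("hendrix", Set.of("machineGun"));
        type("elvis", "Musician");
        type("beatles", "MusicGroup");
        System.out.println(answers("Musician")); // [beatles, elvis, hendrix]
    }
}
```

A real system delegates this to a DL reasoner; the point here is only that query answering means entailment over all models, not lookup of asserted facts.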
13. Steffen Staab Semantics Reloaded 13
Issues arising
• Mixture of nominal (Musician) and structural typing (∃recorded.Song)
• Lack of formal conceptualization: e.g., influencedBy
• Logical reasoning
• Number of concepts: >1,148,230 different concepts in Wikidata
14. Steffen Staab Semantics Reloaded 14
Related Work: Generic Representations
• Only types: Node, Edge, Axiom, ...
• Implication: typing statements are manually programmed, e.g.
if x:Node and hasType(x, Person) and hasName(x, y)
then print("Person: " + y)
if x:Node and hasType(x, Company) and hasName(x, y)
then print("Company: " + y)
No static typing!
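The pattern above can be made concrete in plain Java. Everything here is a hypothetical illustration (a made-up Node class and property names); the point is precisely that the compiler checks none of the type strings:

```java
import java.util.*;

// Sketch of the "generic representation" pitfall: everything is a Node, so
// typing statements must be programmed by hand and nothing is checked statically.
public class GenericRepr {
    public record Node(Map<String, String> props) {
        public String get(String key) { return props.get(key); }
    }

    public static String describe(Node x) {
        // Manually programmed "typing": the compiler cannot tell a Person node
        // from a Company node, nor catch a misspelled type string.
        String t = x.get("hasType");
        if ("Person".equals(t))  return "Person: "  + x.get("hasName");
        if ("Company".equals(t)) return "Company: " + x.get("hasName");
        return "unknown"; // silently reached on typos like "Persn"
    }

    public static void main(String[] args) {
        Node n = new Node(Map.of("hasType", "Person", "hasName", "Elvis"));
        System.out.println(describe(n)); // Person: Elvis
    }
}
```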
15. Steffen Staab Semantics Reloaded 15
Related Work: Mappings
Conceptualization → Database → Code
"strings are passed back and forth here"
Code/mapping generation: ActiveRDF, Liteq, Owl2Java, ...
Issues:
- Queries imply nominal and structural typing
- Large number of possible types
[J. Pan et al., 2013]
16. Steffen Staab Semantics Reloaded 16
public class Influences {
  static RDFNode MUSICIAN = ...
  static Property INFLUENCED_BY = ...
  private static String getInfluence(Resource r) {
    return r.getProperty(INFLUENCED_BY).getObject().toString();
  }
  public static void main(String args[]) {
    Model model = ...; // load datasource
    for (Resource musician :
         model.listSubjectsWithProperty(RDF.type, MUSICIAN))
      System.out.format("%s was influenced by %s",
          musician.toString(),
          getInfluence(musician));
  }
}
Jena style program – with error:
Musician is not a subclass of ∃influencedBy.⊤, so getInfluence may hit a musician with no influencedBy property!
17. Steffen Staab Semantics Reloaded 17
Erroneous program in JavaDL
import static semantics.util.names; // helper for converting to IRI
public class Influences knows "music.rdf" {
  private static String getInfluences(∃«:influencedBy».⊤ artist) {
    return String.join(" ", names(artist.«:influencedBy»));
  }
  // Query for all music artists and print their influences
  public static void main(String args[]) {
    for («:Musician» m : query-for(":Musician"))
      System.out.format("%s was influenced by %s", m.getName(),
          getInfluences(m));
  }
}
Static typing finds that «:Musician» is not a subclass of ∃«:influencedBy».⊤!
18. Steffen Staab Semantics Reloaded 18
Type-checked program in JavaDL
import static semantics.util.names; // helper for converting to IRI
public class Influences knows "music.rdf" {
  private static String getInfluences(«:Musician» artist) {
    switch-type (artist) {
      ∃«:influencedBy».⊤ influencable {
        return String.join(" ", names(influencable.«:influencedBy»));
      }
      default: return "no influence known";
    }
  }
  // Query for all music artists and print their influences
  public static void main(String args[]) {
    for («:Musician» m : query-for(":Musician"))
      System.out.format("%s was influenced by %s", m.getName(),
          getInfluences(m));
  }
}
19. Steffen Staab Semantics Reloaded 19
Implementation
The compiler for the extended syntactic forms issues queries during type-checking to a reasoning service (HermiT), which loads the semantic data. The resulting extended Java code is compiled with the standard Java compiler; the produced bytecode runs on a standard JVM and issues queries during runtime.
21. Steffen Staab Semantics Reloaded 21
λDL in a nutshell
Core principles & language constructs, including powerful type inference
(Leinberger, Lämmel, Staab; ESOP 2017)
22. Steffen Staab Semantics Reloaded 22
Core principles
1. Use concept expressions as types
– Extended syntax for types (e.g., MusicArtist ⊓ Painter)
2. Subtype inferences
– Forward subtyping of concept expressions (𝐶 ⊑ 𝐷) to 𝒦
– E.g., λx:MusicArtist. … (beatles as MusicGroup)
3. Typing queries
– Use concept expression queries (e.g., query MusicArtist ⊓ Painter)
– Check for satisfiability
– Queries always return lists
23. Steffen Staab Semantics Reloaded 24
λDL rules for static typing
• Subtyping: additional rule for concept expressions
• Abstraction, application, recursion not affected
– Simple, static types with well-defined subtyping behavior
• if-then-else, cons, … need a join type (least upper bound)
– Separation between normal types and concept types
– Straightforward due to disjunction: lub(𝐶, 𝐷) = 𝐶 ⊔ 𝐷
– Same for the greatest lower bound
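The join rule can be sketched with a toy representation of concept types (the ConceptType record below is a hypothetical stand-in, not λDL's implementation; DL disjunction and conjunction are rendered in ASCII):

```java
// Toy sketch of the lambda-DL join rule: the least upper bound of two concept
// types is their DL disjunction, lub(C, D) = C OR D, and the greatest lower
// bound is their conjunction. ConceptType is a hypothetical stand-in.
public class Join {
    public record ConceptType(String expr) {
        // Join (least upper bound) via DL disjunction.
        public ConceptType lub(ConceptType other) {
            return new ConceptType("(" + expr + " OR " + other.expr + ")");
        }
        // Meet (greatest lower bound) via DL conjunction.
        public ConceptType glb(ConceptType other) {
            return new ConceptType("(" + expr + " AND " + other.expr + ")");
        }
    }

    public static void main(String[] args) {
        ConceptType a = new ConceptType("MusicArtist");
        ConceptType b = new ConceptType("Painter");
        // The type of "if c then (a-typed) else (b-typed)" is the join:
        System.out.println(a.lub(b).expr()); // (MusicArtist OR Painter)
    }
}
```

The design point on the slide: because DL already has ⊔ and ⊓, the join and meet needed for typing if-then-else come for free, with no extra lattice machinery.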
24. Steffen Staab Semantics Reloaded 25
Objects & Queries
• Typing of objects with the most specific concept
– Concepts via enumeration make it straightforward: beatles : {beatles}
• Queries are straightforward: query MusicArtist
• All songs recorded by beatles: beatles.recorded
– Satisfiable, but empty query result
– head nil is an exception to type safety
25. Steffen Staab Semantics Reloaded 26
Down casting
• All influences of hendrix: hendrix.influencedBy
– Result type: ∃influencedBy⁻.{hendrix} list
• Typecase to allow for down casting:
case (head hendrix.influencedBy) of
  type Painter as x → …
  type ¬Painter as y → …
  default …
• For beatles it is not known whether they are a painter or not
– Default case necessary to not get stuck
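λDL's typecase has no direct Java counterpart, but its shape (branch on a runtime type, with a mandatory default) can be approximated with instanceof tests; the Artist/Painter classes below are hypothetical stand-ins, and the ¬Painter branch is modelled by a second concrete class:

```java
// Approximation of lambda-DL's typecase: branch on the runtime type of a value,
// with a default branch that keeps evaluation from getting stuck.
public class TypeCase {
    public sealed interface Artist permits Painter, Other {}
    public record Painter(String name) implements Artist {}
    public record Other(String name) implements Artist {}

    // Mirrors: case e of type Painter as x -> ... | type not-Painter as y -> ... | default ...
    public static String describe(Artist a) {
        if (a instanceof Painter p) return p.name() + " paints";
        if (a instanceof Other o)   return o.name() + " does not paint";
        return "unknown"; // the mandatory default case
    }

    public static void main(String[] args) {
        System.out.println(describe(new Painter("hendrix"))); // hendrix paints
    }
}
```

Note the mismatch the slide highlights: in the open world of a knowledge base it may be unknown whether beatles is a Painter, which is why λDL forces the default branch, whereas Java's closed class hierarchy makes every case decidable.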
26. Steffen Staab Semantics Reloaded 27
Result for λDL
Theorem: A well-typed closed term does not get stuck during evaluation (with common exceptions, such as taking the head of an empty list).
Typing is a safety net, but does not solve the halting problem.
27. Steffen Staab Semantics Reloaded 28
Outlook: DFG project LISeQ granted
• Type inference
– Not possible in Java, but in modern languages
• SPARQL BGP queries
• Epistemic concept expressions: K ∃«:influencedBy»
– Class of instances whose influencers are known
• Shape constraints: SHACL
28. Steffen Staab Semantics Reloaded 29
Induction
With Jun Sun, Jerome Kunegis
and the previous teams of ROBUST and REVEAL
(Sun, Staab, Kunegis;
Submitted to IEEE Computer Special Issue on Web Science)
29. Steffen Staab Semantics Reloaded 30
What do we want: Healthy social networks
• Benefit from experience with social networks
• Early response to trolls, attacks, spam
• Social networks are easy to ruin!
30. Steffen Staab Semantics Reloaded 31
• Role: two nodes belong to the same role if they have
similar structural behavior
• Using structural features of nodes for classification
Roles in Social Networks
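As a minimal sketch of structural features, the hypothetical helper below derives a two-dimensional feature vector (degree and mean neighbour degree) from a plain adjacency map; a real role classifier would use a much richer feature set:

```java
import java.util.*;

// Sketch: structural features of a node, used as input for role classification.
// The adjacency-map "graph API" is hypothetical, not the talk's implementation.
public class RoleFeatures {
    public static double[] features(Map<String, Set<String>> adj, String node) {
        Set<String> nbrs = adj.getOrDefault(node, Set.of());
        double degree = nbrs.size();
        // Mean degree of the node's neighbours: a simple second-order structural feature.
        double meanNbrDegree = nbrs.stream()
                .mapToInt(n -> adj.getOrDefault(n, Set.of()).size())
                .average().orElse(0.0);
        return new double[] { degree, meanNbrDegree };
    }

    public static void main(String[] args) {
        Map<String, Set<String>> adj = Map.of(
                "a", Set.of("b", "c"),
                "b", Set.of("a"),
                "c", Set.of("a"));
        System.out.println(Arrays.toString(features(adj, "a"))); // [2.0, 1.0]
    }
}
```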
31. Steffen Staab Semantics Reloaded 32
Transfer Learning
• Idea: learn knowledge from one domain (source domain) and apply it to another domain (target domain), exploiting their power-law structure
• Challenge: feature distributions differ between the source and target domains
32. Steffen Staab Semantics Reloaded 33
Value transfer from one power law to another via cumulation (quantiles) [figure]
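One way to make quantile-based value transfer between power laws concrete: for Pareto-style distributions with F(x) = 1 − (x_min/x)^α, mapping a source value to the same quantile of the target distribution has a closed form. This is a sketch of the idea on the slide, not necessarily the paper's exact procedure:

```java
// Transfer a feature value between two power-law (Pareto) distributions by
// matching quantiles: u = F_s(x), then x' = F_t^{-1}(u). With
// F(x) = 1 - (xmin/x)^alpha this simplifies to a closed form.
public class PowerLawTransfer {
    public static double transfer(double x, double xminS, double alphaS,
                                  double xminT, double alphaT) {
        // 1 - u = (xminS / x)^alphaS, hence F_t^{-1}(u) = xminT * (x/xminS)^(alphaS/alphaT)
        return xminT * Math.pow(x / xminS, alphaS / alphaT);
    }

    public static void main(String[] args) {
        // A value at the same quantile of a heavier-tailed target maps further out.
        System.out.println(transfer(4.0, 1.0, 2.0, 1.0, 1.0)); // 16.0
    }
}
```

When source and target share the same exponent and x_min, the mapping is the identity, which is a useful sanity check.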
35. Steffen Staab Semantics Reloaded 36
Related Work
• None (lower baseline): applying the source-trained classifier on the target without transfer
• SVD: Agirre and De Lacalle (ACL 2008) use SVD for feature transformation for word sense disambiguation in different domains
• TrAda: Dai et al. propose TrAdaBoost, using partially labelled data from the target network
• TraNet: our approach
• Trad. (upper baseline): training and evaluating on the target network
36. Steffen Staab Semantics Reloaded 37
Evaluation
Target dataset:
• Software AG ARIS Community user interaction
• 9566 threads and 20538 comments by 4216 people
38. Steffen Staab Semantics Reloaded 39
Evaluation: Wiki talk data sets
14 Wiki-talk social networks (different languages)
• Registered users who discuss with each other
• At least 25 users marked as administrators
39. Steffen Staab Semantics Reloaded 40
Wiki Talk: A set of user interactions in
different Wikipedias
SVD and TrAda omitted due
to poor performance
Network properties
– power laws –
are key for
transfer learning
40. Steffen Staab Semantics Reloaded 41
Wiki Talk: A set of user interactions in
different Wikipedias
Semantics of a concept "trusted" relative to context
41. Steffen Staab Semantics Reloaded 42
• How to better describe each node
– Algebraic topology
• How to apply to RDF and knowledge graphs?
Outlook:
EU Project Cutler just started
EU Project Co-inform about to start
43. Steffen Staab Semantics Reloaded 44
Semantics
• is the linguistic and philosophical study of meaning, in language, programming languages, formal logics, and semiotics.
• It is concerned with the relationship between signifiers—like words, phrases, signs, and symbols (web pages!?)—and what they stand for, their denotation.
• Semiotics is the study of meaning-making, the study of sign processes and meaningful communication.... The semiotic tradition explores the study of signs and symbols as a significant part of communications. Unlike linguistics, however, semiotics also studies non-linguistic sign systems.
49. Steffen Staab Semantics Reloaded 50
• What are eye gaze patterns
– in dynamic environments?
• Scrolling
• Drop-down menus
• Rotating web banners
• ...
– over many people?
Outlook: KMU Innovativ GazeMining just started
Digital Imagination Challenge Finale
Berlin, 15 February 2018
50. Steffen Staab Semantics Reloaded 51
Find patterns in eye gaze data
– Layers of presentation activities
• Fixed elements
• Carousels
• Canvas
• ...
– Layers of different users
Visualize resulting analysis for intuitive understanding
Outlook: KMU Innovativ GazeMining just started
52. Steffen Staab Semantics Reloaded 53
“For a large class of cases of the employment of
the word ‘meaning’—though not for all—this
word can be explained in this way:
the meaning of a word is its use in the language”
Wittgenstein, Philosophical Investigations
53. Steffen Staab Semantics Reloaded 54
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
Web and Internet Science Group · ECS · University of Southampton, UK &
Thanks to my team members and all
the other collaborators:
Martin Leinberger, Ralf Lämmel, Raphael
Menges, Daniel Müller, Chandan Kumar,
Korok Sengupta, Jun Sun, Jerome Kunegis,
Tina Walber,...
Project teams:
MAMEM, ROBUST, REVEAL