ESWC 2016 talk on how to compute types (ontology classes) for literals in RDF triples, enriching them with semantics, and then utilizing these types in an entity summarization use case.
Gleaning Types for Literals in RDF with Application to Entity Summarization
1. Gleaning Types for Literals in RDF Triples with Application to Entity Summarization
Kalpa Gunaratna 1, Krishnaprasad Thirunarayan 1, Amit Sheth 1, Gong Cheng 2
1 Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA
2 National Key Laboratory for Novel Software Technology, Nanjing University, China
13th Extended Semantic Web Conference (ESWC) 2016
Greece, 05.31.2016
2. Talk Overview
o Literals and background of Entity Summarization
o Typing literals in knowledge graphs
o Entity Summarization (FACES-E)
o Evaluation
– Typing
– Entity Summarization with datatype properties
o Conclusion and Future Work
3. Motivating Facts – Literals and Semantics
o A considerable amount of information is captured in datatype properties.
– 1600 datatype properties vs. 1079 object properties in DBpedia
o Many literals can be "easily typed" for proper interpretation and use.
– Example: in DBpedia, http://dbpedia.org/property/location has ~100,000 unique simple literals that can be directly mapped to entities.
o The added semantics can be used in practical and useful applications like (i) entity summarization, (ii) property alignment, (iii) data integration, and (iv) dataset profiling.
4. Let's Focus on Entity Summarization Now …
o Datasets and knowledge graphs on the Web continue to grow in number and size.
– DBpedia (3.9) has around 200 triples on average per entity.
o All the facts of an entity are difficult to process when browsing.
o Better presentation is required. Good-quality summaries can help!
5. Importance of Entities and Summaries
Google has its own knowledge graph, the Google Knowledge Graph (GKG), to facilitate search. Google made summarization its second priority in building the GKG*.
* Singhal, A. 2012. Introducing the knowledge graph: things, not strings. Official Google Blog, May.
6. Diversity-Aware Entity Summaries (FACES Approach) – Background
o Introduced the FACES (FACeted Entity Summaries) approach*.
o FACES follows two main steps.
o First, it groups "conceptually" similar features.
– Two groups will have different facts from each other.
o Second, it picks features (property-value pairs) from these groups, improving diversity, for the summaries.
* Kalpa Gunaratna, Krishnaprasad Thirunarayan, and Amit Sheth. 'FACES: Diversity-Aware Entity Summarization using Incremental Hierarchical Conceptual Clustering'. 29th AAAI Conference on Artificial Intelligence (AAAI 2015), AAAI, 2015.
7. Faceted Entity Summary – Example
[Diagram: the entity Marie Curie with its feature values – Pierre Curie (spouse), Warsaw (birth place), Passy, Haute-Savoie (death place), ESPCI ParisTech (alma mater), University of Paris (work institution), Radioactivity (known for), and Chemistry (field).]
A concise and comprehensive summary could be: {f1, f2, f6}. Another summary could be: {f4, f6, f7}.
8. Information Coming from Literals???
o FACES utilizes the type semantics of objects in grouping features.
o Literals in RDF triples do not have "semantic" types. They only have primitive datatypes (e.g., date, integer, string).
o Can we try to add semantic types to literals? How?
9. Typing Literals in RDF Triples for Entity Summarization
o FACES can only handle object-property-based features.
o Why? – any specific reason?
– The values of the features are not URIs and have no "semantic" types.
– Hence, the adapted grouping algorithm (Cobweb) cannot get types for the property values.
– It cannot create the partitions for faceted entity summaries.
o Our contributions are to:
– First compute types for the values of datatype-property-based features (data enrichment).
– Then adapt and improve the ranking algorithms (summarization).
⇒ The FACES-E system.
10. Typing Datatype Property Values – Example
[Diagram: an object property vs. datatype properties.
dbr:Barack_Obama dbp:vicePresident dbr:Joe_Biden ; dbr:Joe_Biden rdf:type dbo:Politician .
dbr:Barack_Obama dbp:shortDescription "44th President of the United States"^^xsd:string → suggests type dbo:President .
dbr:Calvin_Coolidge dbp:orderInOffice "48th Governor of Massachusetts"^^xsd:string → suggests type dbo:Governor .
dbo:President and dbo:Governor are subclasses (rdfs:subClassOf) of dbo:Politician.]
11. Why is it Hard?
o The focus of the literal is not clear, unlike with URIs.
o A literal may contain several entities or labels matching ontology classes.
o The literal can be long.
– In this work, we focus on literals that are one sentence long.
– For paragraph-like text, finding a single focus is hard and needs different techniques.
[Diagram: the literal "44th President of the United States" with three alternative interpretation spans marked option 1, option 2, and option 3.]
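To make the ambiguity concrete, here is a minimal sketch (not part of the deck) that enumerates the candidate n-gram spans of the example literal; each span could match a class label or an entity mention, which is exactly why a single literal admits several interpretation options.

```python
def ngrams(tokens, n):
    """All contiguous n-token spans of the token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

literal = "44th President of the United States"
tokens = literal.split()

# Longest spans first: every one is a potential class-label or entity match.
for n in range(len(tokens), 0, -1):
    for span in ngrams(tokens, n):
        print(n, span)
```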
12. Focus Term Identification
o We expect that the focus of the sentence or phrase leads to the representative entity/type of the sentence.
o There are prominent works on identifying the head word of a sentence/phrase.
– Example: member of committee → member
o We use existing head word detection algorithms to identify the focus term.
– Collins' head word detection algorithm (a rough stand-in is sketched below)
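The deck names Collins' head word rules, which operate on a constituency parse. As a lightweight stand-in, the syntactic root of a dependency parse gives a comparable focus term for short phrases. A minimal sketch assuming spaCy with its small English model installed — an approximation, not the paper's exact method:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the model has been downloaded

def focus_term(phrase: str) -> str:
    """Approximate the focus (head) term of a short phrase as the
    root token of its dependency parse."""
    doc = nlp(phrase)
    return doc[:].root.text

print(focus_term("member of committee"))             # expect "member"
print(focus_term("48th Governor of Massachusetts"))  # expect "Governor"
```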
13. Deriving the Type (Class) from the Head Word
We filter out date and numeric values, then derive a type in three steps (see the sketch after this slide):
1. Exact matching of the focus term to class labels.
– E.g., "48th Governor of Massachusetts" → Governor (class)
2. Get the n-grams and look for a matching class using n-gram and focus term overlap (maximal match).
I. Check for a matching class for an overlapping n-gram.
II. If a type is not found, spot entities in the n-grams and get their types.
• E.g., for "United States Senate", the 3-gram "United States Senate" matches the entity in DBpedia.
3. Semantic matching of the focus term to class labels.
– We compare the pairwise similarity of the focus term with all the class labels and pick the highest (we utilize the UMBC similarity service).
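A condensed sketch of this three-step cascade. All four helpers are hypothetical stand-ins (not from the paper): `class_labels` maps ontology classes to their labels, `link_entity` and `entity_types` stand in for an entity spotter and a DBpedia type lookup, and `similarity` stands in for the UMBC semantic similarity service.

```python
def derive_types(literal, focus, class_labels, link_entity, entity_types, similarity):
    """Three-step typing cascade for a (non-numeric, non-date) literal.
    class_labels: dict class -> label; link_entity: span -> entity URI or None;
    entity_types: entity URI -> iterable of classes; similarity: (a, b) -> float.
    All four are hypothetical stand-ins for KB / similarity-service calls."""
    # Step 1: exact match of the focus term against class labels.
    for cls, label in class_labels.items():
        if focus.lower() == label.lower():
            return {cls}

    # Step 2: n-grams overlapping the focus term, maximal match first;
    # try class labels, then fall back to spotting entities in the n-grams.
    tokens = literal.split()
    spans = [" ".join(tokens[i:j]) for i in range(len(tokens))
             for j in range(i + 1, len(tokens) + 1) if focus in tokens[i:j]]
    spans.sort(key=len, reverse=True)
    for span in spans:
        for cls, label in class_labels.items():
            if span.lower() == label.lower():
                return {cls}
    for span in spans:
        entity = link_entity(span)
        if entity is not None:
            return set(entity_types(entity))

    # Step 3: semantic similarity of the focus term to all class labels.
    best = max(class_labels, key=lambda c: similarity(focus, class_labels[c]))
    return {best}
```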
16. Ranking Datatype Property Features
o The ranking mechanism for objects (in FACES) does not work.
– Why? Two literals can be unique even if their types and the main entities are the same.
• Example: "United States President" vs. "President of the United States" (counting is affected).
• It is not desirable to search using the whole phrase.
– Hence, use entities.
– A literal can have several entities. Which one to choose?
17. Idea for Ranking
o We observe that humans recognize popular entities.
– Entities can appear in literals with variations.
o We use the popular entities in literals, not the literals themselves, for ranking (see the sketch below).
o Functions
– The function ES(v) returns all entities present in the value v.
– The function max(ES(v)) returns the most popular entity in ES(v).
Example: v = "44th President of the United States"; ES(v) = {db:President, db:United_States}; max(ES(v)) = db:United_States.
Remember: the goal and objective of ranking are disjoint from the typing mechanism.
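A minimal sketch of these two functions, assuming a hypothetical `spot_entities` entity spotter and a `popularity` table (e.g., counts of triples or inlinks per entity); neither is prescribed by the slides, and the numbers below are made up for illustration.

```python
from typing import Callable, Dict, Set

def ES(v: str, spot_entities: Callable[[str], Set[str]]) -> Set[str]:
    """All entities spotted in the literal value v."""
    return spot_entities(v)

def most_popular(entities: Set[str], popularity: Dict[str, int]) -> str:
    """max(ES(v)): the most popular entity among those spotted."""
    return max(entities, key=lambda e: popularity.get(e, 0))

# The slide's example (popularity counts are illustrative only):
popularity = {"db:President": 12_000, "db:United_States": 450_000}
spotted = {"db:President", "db:United_States"}
assert most_popular(spotted, popularity) == "db:United_States"
```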
18. Modified Ranking Equations
If you really wanted to know …
o Informativeness Inf(f)' is inversely proportional to the number of entities that are associated with overlapping values containing the most popular entity of feature f.
o Popularity Po(v)' is the frequency of the most popular entity in v.
o The final ranking score is tf-idf based.
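The equations themselves are images in the deck and did not survive extraction. Based on the captions above, the speaker notes, and the tf-idf form used in the original FACES work, a plausible reconstruction (an assumption, not a verbatim copy of the paper's formulas) is:

```latex
% N: number of entities; v: the value of feature f; pe(v) = max(ES(v)).
\mathrm{Inf}(f)' = \log \frac{N}{\bigl|\{\, e \mid \exists (p, v') \in FS(e):\ p = \mathrm{prop}(f) \wedge pe(v) \in ES(v') \,\}\bigr|}
\qquad
\mathrm{Po}(v)' = \bigl|\{\, \text{triples whose value contains } pe(v) \,\}\bigr|
\qquad
\mathrm{Rank}(f)' = \mathrm{Po}(v)' \cdot \mathrm{Inf}(f)'
```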
19. Facet Ranking
o Aggregate the feature ranking scores for each facet.
o Rank the facets based on the aggregated scores.
Rank(f) is the original function and Rank(f)' is the modified one for datatype-property-based features.
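One natural reading of "aggregate" (an assumption; the slide does not show the formula) is a sum of the ranking scores over each facet's features:

```latex
\mathrm{FacetRank}(C) = \sum_{f \in C} \mathrm{Rank}(f), \qquad C \in F(e)
```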
20. FACES-E Entity Summary Generation
1. Extract the features of the entity e.
2. Enrich each feature and get the WordSet WS(f).
3. The enriched feature set FS(e) is input to the partitioning algorithm, yielding the facet set F(e).
4. First compute the feature ranking scores R(f), and then compute the facet ranking scores FacetRank for each facet in F(e).
5. Pick top-ranked features from the top-ranked facets, in order, to form the faceted entity summary. The constraints defined in the definition of the faceted entity summary hold.
[Pipeline diagram: steps (1)–(5), with literal enrichment feeding step (2) and the modified ranking feeding step (4).]
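Gluing the five steps together, a high-level sketch under stated assumptions: `extract_features`, `enrich`, `partition`, and `rank` are hypothetical stand-ins for the system's components (the real system uses the Cobweb-based partitioner and the modified ranking above), features are assumed hashable, and k is smaller than the total number of features.

```python
def faces_e_summary(entity, k, extract_features, enrich, partition, rank):
    """Sketch of FACES-E faceted summary generation (size-k summary).
    The four callables are hypothetical stand-ins for feature extraction,
    WordSet enrichment, facet partitioning, and the modified ranking."""
    features = [enrich(f) for f in extract_features(entity)]  # steps 1-2
    facets = partition(features)                              # step 3: F(e)
    score = {f: rank(f) for f in features}                    # step 4: R(f)

    # Step 4 (cont.): order facets by their aggregated feature scores.
    ordered = sorted(facets, key=lambda c: -sum(score[f] for f in c))

    # Step 5: round-robin over the ranked facets, taking each facet's best
    # unpicked feature, until the summary reaches k features.
    summary = []
    while len(summary) < k:
        for facet in ordered:
            rest = [f for f in sorted(facet, key=lambda f: -score[f])
                    if f not in summary]
            if rest:
                summary.append(rest[0])
            if len(summary) == k:
                break
    return summary
```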
21. Type Computation Samples (with supertypes, excluding owl:Thing)

Literal                                          | Types
-------------------------------------------------|----------------------------------------------------------------
United States Ambassador to the United Nations   | Agent, Ambassador, Person
Chairman of the Republican National Committee    | Agent, Politician, Person, President
United States Navy                               | Agent, Organisation, Military Unit
Member of the New York State Senate              | Agent, OrganisationMember, Person
Senate Minority Leader                           | Agent, Politician, Person, President
United States Senate                             | Agent, Organisation, Legislature
from Virginia                                    | Administrative Region, Place, Region, Populated Place
Denison, Texas, U.S.                             | Administrative Region, Place, Country, Region, Populated Place
22. Evaluation – Type Generation Metrics
o The Type Set TS(v) is the generated set of types for the value v.
o n is the total number of features.
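The metric formulas are images in the deck. A plausible reconstruction from the metric names and the definitions above (an assumption, not verbatim from the paper): Mean Precision averages the per-feature precision of TS(v), Any Mean Precision credits a feature if at least one generated type is correct, and Coverage is the fraction of features that receive any type at all.

```latex
\mathrm{MP} = \frac{1}{n} \sum_{i=1}^{n} \frac{|\{\, t \in TS(v_i) \mid t \text{ correct} \,\}|}{|TS(v_i)|}
\qquad
\mathrm{AMP} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\bigl[\exists\, t \in TS(v_i):\ t \text{ correct}\bigr]
\qquad
\mathrm{Coverage} = \frac{|\{\, i \mid TS(v_i) \neq \emptyset \,\}|}{n}
```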
23. Evaluation – Type Generation
o DBpedia Spotlight is used as the baseline; there were 1117 unique property-value pairs (features).
o 118 pairs (consisting of labelling properties and noisy features) were removed.
o The results convey that special care should be taken in deciding types for literals.

             | Mean Precision (MP) | Any Mean Precision (AMP) | Coverage
Our approach | 0.8290              | 0.8829                   | 0.8529
Baseline     | 0.4867              | 0.5825                   | 0.5533
24. Evaluation – Summarization Metrics
o Average pairwise agreement of the ideal summaries.
o Average summary overlap between the system-generated and ideal summaries.
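The formulas are again images; assuming the standard formulation used in the RELIN/FACES line of work (an assumption), with Summ(e) the system summary and SummI_1(e), …, SummI_m(e) the m ideal summaries for entity e:

```latex
\mathrm{Agreement}(e) = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \bigl| Summ^{I}_{i}(e) \cap Summ^{I}_{j}(e) \bigr|
\qquad
\mathrm{Quality}(Summ(e)) = \frac{1}{m} \sum_{i=1}^{m} \bigl| Summ(e) \cap Summ^{I}_{i}(e) \bigr|
```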
25. Evaluation – FACES-E Summary Generation
o The gold standard consists of 20 random entities used in FACES, taken from DBpedia 3.9, and 60 random entities taken from DBpedia 2015-04.
o 17 human users created the ideal summaries (900 in total). Each entity received at least 4 ideal summaries for each length.

System         | k = 5: Avg. Quality | % Increase | k = 10: Avg. Quality | % Increase
FACES-E        | 1.5308              | –          | 4.5320               | –
RELIN          | 0.9611              | 59 %       | 3.0988               | 46 %
RELINM         | 1.0251              | 49 %       | 3.6514               | 24 %
Avg. Agreement | 2.1168              |            | 5.4363               |

k is the summary length.
26. Future Work
o Consider the meaning of the property name when computing types.
o Literals and properties are noisy.
– Identify those automatically, to filter them out.
– Filter out labelling properties (automatic identification). This is hard.
o A formal model to capture the semantic types of literals in RDF.
– Without changing their original representation (as literals).
29. Preliminaries
o Entities are described by features.
o Feature: a property-value pair is called a feature.
o Feature Set: all the features that describe an entity.
o Entity Summary of size k: a subset of the feature set of an entity, constrained to size k.

Entity – Marie Curie (feature set FS)
Feature | Property         | Value
f1      | spouse           | Pierre_Curie
f2      | birthPlace       | Warsaw
f3      | deathPlace       | Passy_Haute-Savoie
f4      | almaMater        | ESPCI_ParisTech
f5      | workInstitutions | University_of_Paris
f6      | knownFor         | Radioactivity
f7      | field            | Chemistry

Entity summaries for k = 3: {f1, f2, f5}, {f4, f6, f7}, {f3, f4, f5}, …
30. Faceted Entity Summary
Facets (partition)
Given an entity e, a set of facets F(e) of e is a partition of the feature set FS(e). That is, F(e) = {C1, C2, …, Cn} such that F(e) satisfies:
(i) Non-empty: ∅ ∉ F(e).
(ii) Collectively exhaustive: C1 ∪ C2 ∪ … ∪ Cn = FS(e).
(iii) Mutually (pairwise) disjoint: if Ci ≠ Cj, then Ci ∩ Cj = ∅.
Faceted entity summary
Given an entity e and a positive integer k < |FS(e)|, a faceted entity summary of e of size k, FSumm(e,k), is a collection of features such that FSumm(e,k) ⊂ FS(e) and |FSumm(e,k)| = k. Further, either (i) k > |F(e)| and ∀X ∈ F(e), X ∩ FSumm(e,k) ≠ ∅, or (ii) k ≤ |F(e)| and ∀X ∈ F(e), |X ∩ FSumm(e,k)| ≤ 1 holds, where F(e) is a set of facets of FS(e). (A checker for these constraints is sketched below.)
Faceted entity summary examples –
k = 2: {f1, f6}
k = 3: {f1, f2, f6}
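A small sketch that checks the two definitions directly, using the Marie Curie feature set from the Preliminaries slide. The facet grouping below is illustrative only; the real facets come from the clustering step, not from hand-picking.

```python
from itertools import combinations

def is_partition(facets, feature_set):
    """Check the three facet conditions: non-empty, exhaustive, disjoint."""
    if any(len(c) == 0 for c in facets):
        return False
    if set().union(*facets) != feature_set:
        return False
    return all(a.isdisjoint(b) for a, b in combinations(facets, 2))

def is_faceted_summary(summ, facets, k):
    """Check the faceted-entity-summary constraints for a size-k summary."""
    if len(summ) != k:
        return False
    if k > len(facets):                                  # case (i): hit every facet
        return all(summ & c for c in facets)
    return all(len(summ & c) <= 1 for c in facets)       # case (ii): ≤ 1 per facet

features = {"f1", "f2", "f3", "f4", "f5", "f6", "f7"}
# Illustrative grouping (spouse / places / institutions / scientific work).
facets = [{"f1"}, {"f2", "f3"}, {"f4", "f5"}, {"f6", "f7"}]

assert is_partition(facets, features)
assert is_faceted_summary({"f1", "f6"}, facets, k=2)        # the slide's k=2 example
assert is_faceted_summary({"f1", "f2", "f6"}, facets, k=3)  # the slide's k=3 example
```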
Editor's Notes
Property facts taken from DBpedia 3.9 statistics (I believe)
Easily typed literal examples: "California", "Greece", etc., related to the location property.
We talk about entity summarization as a use case from here onwards, together with typing.
DBpedia 2.0 had 1.95 million things and DBpedia 2015-04 has 5.9 million things.
LOD had 295 datasets in 2011 and in 2014 it had 1014 datasets.
Entity – A real-world thing (e.g., person, book, place) at the data level that encapsulates facts and is represented by a URI.
Entity summary is a subset of facts that represent the entity.
Explain conceptual idea -> we want facts talking about one “theme” to be grouped together.
Conceptually similar features are colored in the same color.
Refer to the previous slide on grouping using types. Explain with an example: the almaMater and workInstitutions properties both talk about places.
Possible reasons for creating most of the literals instead of URI resources:
(i) the creator was unable to find a suitable entity URI for the object value, and hence chose to use a literal instead,
(ii) the creator of the triple did not want to attach more details to the value and hence represented it in plain text,
(iii) the value contains only basic implementation types like integer, boolean, and date, and hence it is not meaningful to create an entity, or
(iv) the value has a lengthy description spanning several sentences (e.g., the dbo:abstract property in DBpedia) that covers a diverse set of entities and facts.
Options 1 and 2 seem to be the right pick (one of them).
Another example: "48th Governor of Massachusetts" → a person and a populated place.
Another one: "United States Ambassador to the United Nations".
We used the DBpedia 2015-04 dataset at the time of processing.
For numeric and date values, we cannot derive more than these primitive types.
("Governor" matches the class Governor; the entity Governor is of owl:Thing.)
Case 2:
"Senate" matches an entity of class Thing, so it didn't get a type from step 1.
"United States Senate" matches Legislature.
For "Harvard Law School", the focus term is "school"; the 3-gram leads to the Harvard_Law_School entity → Educational Institution.
3-gram leads to Hardvard_Law_School entitity Educational Institute
Head word detection – Colin’s Head Word Detection algorithm.
Directly matches head word to class
Matches N-grams and head word to class label or else, match entities to N-grams and head word and then get the types.
Semantic matcher of head word using UMBC matching service.
Inf(f)' – count the number of entities having the feature: the property should match, and the value has to contain the most popular entity of the input feature's value.
Po(v)' – count the number of triples that have the matching feature with the most popular entity of the value.
F(e) is the facet set
Colored parts are the new additions/modifications
We got superclasses other than Thing in this evaluation for both the baseline and our approach.
Recall is not measured because it is hard to do so (too many pairs to check).
For the DBpedia 2015-04 version.
Summ(e) is the system-generated summary.
SummI(e) is the ideal summary.
For this evaluation we used both the DBpedia 3.9 and DBpedia 2015-04 versions.
Labelling properties should not be typed; they are probably there just to serve as a reference label. For example, typing the value of the "name" property as a Person looks odd.