This document discusses measuring vocabulary relatedness and its application to recommender systems. It presents 6 measures of vocabulary relatedness based on semantic relatedness, content similarity, expressivity closeness, and distributional relatedness. It analyzes the measures empirically using a dataset of 2,996 vocabularies and 4 billion RDF triples. The measures can be applied to post-selection vocabulary recommendation in vocabulary search tools.
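The summary does not define the six measures. What follows is only a minimal sketch of one plausible "content similarity" measure. It assumes each vocabulary is reduced to the set of its term local names and that two vocabularies are compared with the Jaccard coefficient. The IRIs and the choice of Jaccard are illustrative assumptions, not the paper's definitions.

```python
def local_name(iri: str) -> str:
    """Return the lower-cased part of an IRI after its last '#' (or, failing that, '/')."""
    if "#" in iri:
        return iri.rsplit("#", 1)[1].lower()
    return iri.rsplit("/", 1)[-1].lower()


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient len(a & b) / len(a | b); 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_similarity(vocab_a: list, vocab_b: list) -> float:
    """Hypothetical content-similarity measure: Jaccard over term local names."""
    return jaccard({local_name(t) for t in vocab_a}, {local_name(t) for t in vocab_b})


# Illustrative term IRIs only; they are not drawn from the paper's dataset.
vocab_x = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/knows",
]
vocab_y = [
    "http://example.org/people#Person",
    "http://example.org/people#name",
    "http://example.org/people#age",
]
print(content_similarity(vocab_x, vocab_y))  # 2 shared of 4 distinct names -> 0.5
```

Comparing local names ignores namespaces on purpose, so that similar terms from different vocabularies can match. A real measure would likely also use labels and comments.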
Characterising the Emergent Semantics in Twitter Lists (Oscar Corcho)
This document summarizes research analyzing the emergent semantics of lists and list names on Twitter. The researchers investigated whether related keywords can be identified from list names according to how they are used by different user roles (curators, subscribers, members). They used a dataset of over 297,000 lists to extract keywords from list names and model their relationships based on these user roles. Their experiments analyzed the semantics of related keyword pairs using techniques like WordNet searches and found that relationships identified based on members had the highest percentage of direct semantic relations like synonyms.
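Here is a minimal sketch of role-based keyword relatedness. It assumes, hypothetically, that each keyword from a list name is represented by the set of users who hold a given role on lists containing that keyword. Two keywords are then compared by cosine similarity over those user sets. The records and the whitespace tokenization are made up for illustration, and the study's actual modelling may differ.

```python
import math
from collections import defaultdict

# Hypothetical input: (list name, role, user id). Roles follow the summary:
# "curator", "subscriber" or "member".
records = [
    ("semantic web", "member", "u1"),
    ("semantic web", "member", "u2"),
    ("linked data", "member", "u1"),
    ("linked data", "member", "u2"),
    ("linked data", "member", "u3"),
    ("football", "member", "u4"),
]


def keyword_users(records, role):
    """Map each keyword of a list name to the set of users holding `role` on that list."""
    index = defaultdict(set)
    for list_name, r, user in records:
        if r != role:
            continue
        for keyword in list_name.lower().split():
            index[keyword].add(user)
    return index


def cosine(a, b):
    """Cosine similarity of two binary user vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


members = keyword_users(records, "member")
print(round(cosine(members["semantic"], members["linked"]), 3))  # prints 0.816
```

Running the same code with role "curator" or "subscriber" gives the other role-specific relatedness views that the summary compares.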
DynaLearn: Problem-based learning supported by semantic techniques (Oscar Corcho)
This document describes a system that supports problem-based learning through semantic techniques. The system grounds learner models in semantic repositories to enable semantic-based feedback. It analyzes learner models and reference models to identify discrepancies in terminology, taxonomy, and qualitative reasoning structures. Suggestions are generated and filtered based on agreement across multiple reference models. The system aims to bridge gaps between learner and expert terminology and provide automated feedback to support the learning process.
A survey on phrase structure learning methods for text classification (ijnlc)
Text classification is the task of automatically assigning text to one of a set of predefined categories. The
problem of text classification has been widely studied in different communities, such as natural language
processing, data mining and information retrieval. Text classification is an important component of many
information management tasks, such as topic identification, spam filtering, email routing, language
identification, genre classification and readability assessment. The performance of text classification
improves notably when phrase patterns are used, because phrase patterns help capture non-local
behaviours. Phrase structure extraction is the first step towards phrase pattern identification. This
survey presents a detailed study of phrase structure learning methods. It aims to support future work on
NLP tasks that use syntactic information from phrase structure, such as grammar checkers, question
answering, information extraction, machine translation and text classification. The paper also provides
different levels of classification and a detailed comparison of the phrase structure learning methods.
Lexical Distribution in Citation Contexts through the IMRaD Standard - ECIR-2... (Iana Atanassova)
This study analyzed the distribution of verbs used in citation contexts within scientific papers structured using the IMRaD format (Introduction, Methods, Results, Discussion). Over 450,000 citation contexts from biology journals were identified and categorized by section. The top verbs used in each section were identified, with "show" being most common in introductions, "use" in methods, "use" in results, and "show" in discussions. Verb distributions followed Zipf's law, with a small number of verbs accounting for most uses. Citations were found to play different roles depending on their section, with verbs expressing these citation acts.
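Zipf's law predicts that frequency falls roughly as a power of rank. Plotting log-frequency against log-rank should therefore give something close to a straight line with slope near -1. The sketch below checks this by fitting the slope with ordinary least squares. The verb counts are invented to follow 60/rank exactly and are not the study's data.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Fit log(freq) = a + s * log(rank) by ordinary least squares and return s.

    A slope near -1 is the classic Zipf signature.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented verb counts that follow 60 / rank exactly.
counts = {"show": 60, "use": 30, "suggest": 20, "report": 15, "describe": 12, "find": 10}
tokens = [verb for verb, c in counts.items() for _ in range(c)]
print(round(zipf_slope(tokens), 3))  # -1.0 for this exactly Zipfian toy data
```

On real citation-context verbs, the fitted slope would only approximate -1, and the tail usually deviates from the line.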
Translating natural language competency questions into SPARQL queries, WEB 2013 (Leila Zemmouchi-Ghomari)
This is our presentation at WEB 2013, the First International Conference on Building and Exploring Web Based Environments, held in Seville, Spain (January 27 to February 1).
Towards Exploratory Relationship Search: A Clustering-based Approach (Gong Cheng)
This document presents an approach for exploratory relationship search through hierarchical clustering. It aims to address the challenge of too many relationship search results by organizing them into a cluster hierarchy based on common relationship patterns. An evaluation with participants performing lookup and exploratory search tasks on DBpedia data found that the clustering approach outperformed simple listing and faceted categorization alternatives. User feedback suggested areas for improvement like more concise visualizations and cognitive support. The authors conclude it is a promising approach and future work could combine facets and clustering or explore alternatives.
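Here is a minimal sketch of grouping relationship search results by a shared pattern. It assumes that a path can be abstracted to its sequence of (property, direction) steps. For illustration, a two-level hierarchy is used: first path length, then property pattern. The paths and property names are hypothetical, and the paper's actual clustering algorithm is not reproduced here.

```python
from collections import defaultdict

# Hypothetical relationship paths between two entities. Each path is abstracted
# (an assumption of this sketch) to its sequence of (property, direction) steps.
paths = {
    "p1": (("birthPlace", "->"), ("country", "->")),
    "p2": (("birthPlace", "->"), ("country", "->")),
    "p3": (("almaMater", "->"), ("city", "->")),
    "p4": (("spouse", "->"),),
}


def cluster_hierarchy(paths):
    """Group path ids into a two-level hierarchy: path length -> pattern -> ids."""
    tree = defaultdict(lambda: defaultdict(list))
    for pid, steps in sorted(paths.items()):
        tree[len(steps)][steps].append(pid)
    return tree


tree = cluster_hierarchy(paths)
for length in sorted(tree):
    print(f"paths of length {length}")
    for pattern, ids in tree[length].items():
        print("   ", " / ".join(prop + arrow for prop, arrow in pattern), ids)
```

A user can then expand one pattern cluster at a time instead of scanning every path in a flat list.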
Falcons Explorer: Tabular and Relational End-user Programming for the Web of ... (Gong Cheng)
This presentation introduces Falcons Explorer at the Semantic Web Challenge (SWC) 2010. Falcons Explorer is a tabular and relational interface for exploring the Web of data.
This document outlines an image morphing thesis that aims to create a database of expressions without using images of real people. The thesis will generate a database of 350 morphed images across 10 subjects, 5 faces each, and 7 expressions. An experiment will be conducted to test recognition rates using various methods on a training set of 40 images and testing set of 10 images. The results show that RCV, FisherFace and CV methods achieve over 90% recognition rates while EigenFace and NNC are lower at around 60%. Future work will compare the results to other related research.
This document discusses summarizing semantic data, including entity descriptions, entity associations, and semantic datasets. It describes extractive and abstractive summarization methods. For entity descriptions, intrinsic metrics like frequency, centrality, informativeness, and diversity are used to rank property-value pairs for the summary. Extrinsic metrics also utilize external knowledge and context. Similar methods are applied to summarizing entity associations by ranking paths between entities. Summarizing semantic datasets involves selecting a representative subset of the data.
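To illustrate one intrinsic metric, the sketch below scores informativeness as -log2 of the fraction of entities that share a given property-value pair. It then keeps the top-k pairs as the entity summary. The toy knowledge base, the exact formula and the tie-breaking rule are assumptions; a real summarizer would combine several metrics.

```python
import math
from collections import Counter

# Hypothetical knowledge base: entity -> set of (property, value) pairs.
kb = {
    "Alice": {("type", "Person"), ("nationality", "UK"), ("award", "Turing Award")},
    "Bob": {("type", "Person"), ("nationality", "UK")},
    "Carol": {("type", "Person"), ("nationality", "FR")},
    "Dave": {("type", "Person"), ("nationality", "UK")},
}


def informativeness(kb):
    """-log2 of the fraction of entities sharing each (property, value) pair."""
    n = len(kb)
    counts = Counter(pair for pairs in kb.values() for pair in pairs)
    return {pair: -math.log2(c / n) for pair, c in counts.items()}


def summarize(entity, kb, k=2):
    """Return the k most informative pairs of `entity`; ties are broken by pair order."""
    info = informativeness(kb)
    return sorted(kb[entity], key=lambda pair: (-info[pair], pair))[:k]


print(summarize("Alice", kb))
```

Here ("type", "Person") scores 0 because every entity has it, so it drops out of Alice's two-pair summary in favour of the rarer award and nationality facts.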
When a modem connects to an internet service provider (ISP), it connects to a router called an access concentrator located at the ISP's Point of Presence, usually near a telephone facility. High-speed leased lines connect these Points of Presence to the ISP's main data center, where email and other servers are located. Each ISP connects to major internet backbones via very high-speed fiber-optic links, and these backbones are all interconnected so traffic automatically reroutes if one fails. Traffic then moves between data centers, businesses, universities, government agencies, and end users.
The document describes the NanJing Vocabulary Repository (NJVR), a freely accessible collection of real-world vocabularies created by crawling over 4 billion RDF triples from thousands of domains. NJVR contains RDF descriptions of over 2,900 vocabularies identified from 261 domains as well as statistical data on their usage. It was constructed through crawling, vocabulary identification, and analysis of vocabulary instantiations. The goal is to provide a large test collection for research on topics like vocabulary ranking and matching.
Presentation given at ISWC2008. It analyzes complex network characteristics of dependence between terms (i.e. classes and properties) on the Semantic Web as well as dependence between Web ontologies.
Taking up the Gaokao Challenge: An Information Retrieval Approach (Gong Cheng)
This document describes an information retrieval approach for answering questions from the Gaokao, China's national college entrance exam. It disambiguates terms and retrieves relevant concept pages and quotes from Wikipedia. It ranks pages by their centrality within vector spaces of words, links, and categories, restricting candidates to relevant historical categories. It assesses each answer option by the extent to which the question and the retrieved pages entail that option. In experiments, it correctly answered 43.09% of questions answerable from Wikipedia and 31.28% of questions outside Wikipedia's scope.
Explass: Exploring Associations between Entities via Top-K Ontological Patter... (Gong Cheng)
This document describes Explass, a system for exploring associations between entities via top-k ontological patterns and facets. It discusses challenges in exploring the over 1,000 associations within 4 hops in DBpedia and proposes two exploration methods: clustering associations into patterns and using entity/property classes as facets. The key steps involve mining significant patterns as frequent itemsets and selecting k patterns based on frequency, informativeness, and overlap. A demo of Explass on DBpedia is presented along with results of a user study comparing it to other approaches.
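The pattern-selection step can be pictured as a greedy trade-off between coverage and overlap. In this minimal sketch, which is an assumption rather than Explass's actual formula, each candidate pattern is scored as the number of associations it covers times an informativeness weight. A penalty is then subtracted for associations already covered by previously selected patterns. Pattern names, weights and `overlap_weight` are hypothetical.

```python
def select_patterns(patterns, k, overlap_weight=1.0):
    """Greedily pick up to k patterns.

    patterns: dict mapping pattern name -> (set of association ids it covers,
              informativeness weight)
    Each step picks the pattern with the highest
        len(covered) * weight - overlap_weight * len(covered & already_covered);
    ties go to the alphabetically first name.
    """
    chosen, already_covered = [], set()
    remaining = dict(patterns)
    while remaining and len(chosen) < k:
        gains = {
            name: len(cov) * w - overlap_weight * len(cov & already_covered)
            for name, (cov, w) in remaining.items()
        }
        best = max(sorted(gains), key=gains.get)
        chosen.append(best)
        already_covered |= remaining.pop(best)[0]
    return chosen


# Hypothetical patterns: covered association ids and an informativeness weight.
patterns = {
    "birthPlace": ({1, 2, 3, 4}, 1.0),
    "birthPlace/country": ({1, 2, 3}, 1.2),
    "almaMater": ({5, 6}, 1.5),
}
print(select_patterns(patterns, 2))
```

With the overlap penalty, the second pick is "almaMater" rather than the largely redundant "birthPlace/country". Setting `overlap_weight=0.0` reverses that choice.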
Surviving (and Thriving in) the Online Identity Wars (John McCrea)
The document discusses strategies for websites to survive and thrive in the current environment of online identity wars. It introduces concepts like the social web ecosystem and virtuous cycle from previous presentations. It notes the rapid shift towards an open social web driven by major players like Facebook, Google, and Microsoft competing to be preferred identity providers. The document outlines "do's" and "don'ts" for websites, advising them to implement open standards like Facebook Connect rather than proprietary APIs, focus on activity streams, build APIs around unique content, and closely monitor the evolving landscape.
Joseph Smarr shares results of a Plaxo/Google hybrid OpenID/OAuth "two-click signup" experiment at the OpenID Design Summit at Facebook on February 10, 2009.
Searching Semantic Web Objects Based on Class Hierarchies (Gong Cheng)
This document summarizes a presentation on searching semantic web objects based on class hierarchies. The presentation discusses using class hierarchies to filter and restrict queries, recommending subclasses to refine searches, and using an inverted index to match query terms to classes. It also covers heuristics for determining the provenance of class typing information and an algorithm that recommends subclasses based on their coverage of search results.
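Here is a minimal sketch of coverage-based subclass recommendation. It assumes the recommender ranks candidate subclasses by how many current results they cover. Subclasses that cover none of the results, or all of them, are skipped, because neither kind narrows the search. The class names, typing data and this exact rule are illustrative assumptions.

```python
def recommend_subclasses(results, types, subclasses, k=2):
    """Rank candidate subclasses by how many search results they cover.

    results: list of object ids returned by the search
    types: object id -> set of class names the object is typed with
    subclasses: candidate subclass names of the query class
    Subclasses covering no result or every result are skipped.
    """
    n = len(results)
    coverage = {c: sum(1 for r in results if c in types.get(r, set())) for c in subclasses}
    useful = [c for c in subclasses if 0 < coverage[c] < n]
    return sorted(useful, key=lambda c: (-coverage[c], c))[:k]


# Hypothetical results for a query restricted to the class Person.
results = ["o1", "o2", "o3", "o4"]
types = {
    "o1": {"Scientist", "LivingPerson"},
    "o2": {"Scientist", "LivingPerson"},
    "o3": {"Artist", "LivingPerson"},
    "o4": {"LivingPerson"},
}
subclasses = ["Scientist", "Artist", "Politician", "LivingPerson"]
print(recommend_subclasses(results, types, subclasses))
```

"LivingPerson" is left out because selecting it would return the same four objects. "Politician" is left out because it would return nothing.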
Acute Fatty Liver of Pregnancy (AFLP) is a rare but serious condition that affects 1 in 7,000-11,000 pregnancies. It is characterized by fatty infiltration and cellular dysfunction in the liver during late pregnancy or early postpartum. Prompt delivery is the recommended treatment as the condition does not typically improve until after delivery, and maternal and fetal mortality rates are high if not treated properly. Diagnosis is based on clinical presentation and lab tests in the absence of a definitive causative agent or diagnostic test. Close monitoring of future pregnancies is advised for women previously affected by AFLP.
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions (Gong Cheng)
This document describes an approach called BipRank for ranking and summarizing RDF vocabulary descriptions. BipRank models the relationships between terms and sentences using a bipartite graph and ranks sentences based on their salience. It also considers the patterns of RDF sentences. The top-ranked sentences are then selected to generate a summary. Evaluation shows that BipRank correlates better with human judgments of sentence importance than prior work. It also generates summaries that experts rate as more relevant and cohesive.
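One way to picture mutual reinforcement on a term-sentence bipartite graph is a HITS-style iteration. A term scores highly if it appears in high-scoring sentences, and a sentence scores highly if it mentions high-scoring terms. This sketch is that generic iteration, not BipRank's published formula. It also ignores the RDF sentence patterns mentioned in the summary. The sentence ids and terms are hypothetical.

```python
import math


def bipartite_rank(edges, iterations=50):
    """Mutual-reinforcement ranking on a sentence-term bipartite graph.

    edges: dict sentence id -> set of terms that the RDF sentence mentions.
    Returns (sentence_scores, term_scores), each L2-normalised.
    """
    terms = sorted({t for ts in edges.values() for t in ts})
    s_score = {s: 1.0 for s in edges}
    t_score = {t: 1.0 for t in terms}
    for _ in range(iterations):
        # A term is salient if it appears in salient sentences.
        t_score = {t: sum(s_score[s] for s, ts in edges.items() if t in ts) for t in terms}
        norm = math.sqrt(sum(v * v for v in t_score.values())) or 1.0
        t_score = {t: v / norm for t, v in t_score.items()}
        # A sentence is salient if it mentions salient terms.
        s_score = {s: sum(t_score[t] for t in ts) for s, ts in edges.items()}
        norm = math.sqrt(sum(v * v for v in s_score.values())) or 1.0
        s_score = {s: v / norm for s, v in s_score.items()}
    return s_score, t_score


# Hypothetical vocabulary description: each RDF sentence mentions some terms.
edges = {
    "s1": {"Person", "name"},
    "s2": {"Person", "knows"},
    "s3": {"Person", "Agent"},
    "s4": {"Document"},
}
s_scores, t_scores = bipartite_rank(edges)
summary = sorted(s_scores, key=lambda s: (-s_scores[s], s))[:2]
print(summary)
```

The sentences about the heavily used term "Person" reinforce each other. The isolated "Document" sentence fades towards zero and never enters the two-sentence summary.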
Rahul Biswas is a computational physicist working as a postdoctoral research associate at the Center for Gravitational Wave and Astronomy. He obtained his PhD in Physics from the University of Wisconsin Milwaukee in 2010. His current research involves analyzing 250GB of astrophysical data from LIGO and Virgo detectors to classify noise transients and understand their origins using techniques like time series analysis, machine learning algorithms, and statistical modeling. Previously as a research assistant, he performed data analysis of LIGO-Virgo experiments to search for gravitational wave sources.
Continuous bag of words cbow word2vec word embedding work.pdf (devangmittal4)
The continuous bag of words (CBOW) word2vec model learns word embeddings by predicting the
probability of a word given a context. A context may be a single word or a group of words, but for
simplicity, I will take a single context word and try to predict a single target word.
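Here is a minimal sketch of the single-context-word CBOW setup described above. The context word's input vector serves as the hidden layer. A full softmax over the vocabulary predicts the target word, and plain stochastic gradient descent updates both weight matrices. The toy corpus (Firth's phrase from the data set), the one-word window on each side, the dimension, the learning rate and the epoch count are all arbitrary choices. Production word2vec uses negative sampling or hierarchical softmax instead of a full softmax.

```python
import math
import random


def train_cbow_single_context(pairs, vocab, dim=5, lr=0.1, epochs=200, seed=0):
    """Train a one-context-word CBOW model with a full softmax.

    pairs: list of (context_word, target_word)
    Returns (w_in, w_out, losses); the rows of w_in are the word embeddings.
    """
    rng = random.Random(seed)
    idx = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(v)]   # V x N
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(v)] for _ in range(dim)]  # N x V
    losses = []
    for _ in range(epochs):
        total = 0.0
        for context, target in pairs:
            c, t = idx[context], idx[target]
            h = w_in[c]  # hidden layer = the context word's input vector
            u = [sum(h[k] * w_out[k][j] for k in range(dim)) for j in range(v)]
            m = max(u)
            exps = [math.exp(x - m) for x in u]  # numerically stable softmax
            z = sum(exps)
            y = [e / z for e in exps]
            total += -math.log(y[t])  # cross-entropy loss for the true target
            err = [y[j] - (1.0 if j == t else 0.0) for j in range(v)]  # dLoss/du
            # Gradient w.r.t. the hidden layer, computed with the old w_out.
            grad_h = [sum(w_out[k][j] * err[j] for j in range(v)) for k in range(dim)]
            for k in range(dim):
                for j in range(v):
                    w_out[k][j] -= lr * h[k] * err[j]
            for k in range(dim):
                w_in[c][k] -= lr * grad_h[k]
        losses.append(total / len(pairs))
    return w_in, w_out, losses


corpus = "a word is characterized by the company it keeps".split()
vocab = sorted(set(corpus))
# Each neighbouring word (one on each side) acts as a single context for the target.
pairs = [(corpus[j], corpus[i])
         for i in range(len(corpus))
         for j in (i - 1, i + 1) if 0 <= j < len(corpus)]
w_in, w_out, losses = train_cbow_single_context(pairs, vocab)
embeddings = dict(zip(vocab, w_in))
print(f"average loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The rows of `w_in` are the learned embeddings. On a corpus this small, the example mainly shows that the loss goes down, not that the vectors are semantically meaningful.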
The purpose of this question is to create word embeddings for the given data set.
Data set text:
In linguistics, word embeddings were discussed in the research area of distributional semantics, which
aims to quantify and categorize semantic similarities between linguistic items based on their
distributional properties in large samples of language data. The underlying idea that "a word is
characterized by the company it keeps" was popularized by Firth.
The technique of representing words as vectors has roots in the 1960s with the development of
the vector space model for information retrieval. Reducing the number of dimensions using
singular value decomposition then led to the introduction of latent semantic analysis in the late
1980s. In 2000, Bengio et al. provided, in a series of papers on "Neural probabilistic language
models", a way to reduce the high dimensionality of word representations in contexts by "learning a
distributed representation for words" (Bengio et al., 2003). Word embeddings come in two different
styles: one in which words are expressed as vectors of co-occurring words, and another in which
words are expressed as vectors of the linguistic contexts in which they occur; these two
styles are studied in Lavelli et al. (2004). Roweis and Saul published in Science how to use
"locally linear embedding" (LLE) to discover representations of high-dimensional data structures.
The area developed gradually and really took off after 2010, partly because important advances
had been made since then in the quality of vectors and the training speed of models.
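The first style mentioned above, in which words are expressed as vectors of co-occurring words, can be sketched with plain counts. Each word becomes a vector recording how often other words appear within a small window around it, and words are compared by cosine similarity. The sentences and the window size of 2 are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


sentences = ["the cat drinks milk", "the dog drinks water", "the cat chases the dog"]
vectors = cooccurrence_vectors(sentences)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))
print(round(cosine(vectors["cat"], vectors["milk"]), 3))
```

"cat" and "dog" share contexts such as "the", "drinks" and "chases", so they come out far more similar than "cat" and "milk". This is the distributional idea in miniature.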
There are many branches and many research groups working on word embeddings. In 2013, a
team at Google led by Tomas Mikolov created word2vec, a word embedding toolkit which can train
vector space models faster than the previous approaches. Most new word embedding techniques
rely on a neural network architecture instead of more traditional n-gram models and unsupervised
learning.
Limitations
One of the main limitations of word embeddings (and of word vector space models in general) is that
all possible meanings of a word are conflated into a single representation (a single vector in the
semantic space). Sense embeddings address this problem: individual meanings of words
are represented as distinct vectors in the space.
For biological sequences: BioVectors
Asgari and Mofrad have proposed word embeddings for n-grams in biological sequences (e.g. DNA, RNA,
and proteins) for bioinformatics applications. Named bio-vectors (BioVec) to refer to biological
sequences in general, with protein-vectors (ProtVec) for proteins (amino-acid sequences) and
gene-vectors (GeneVec) for gene sequences, this representa.
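To train such vectors, a sequence first has to be turned into "sentences" made of n-gram "words". Here is a minimal sketch that assumes 3-grams and the commonly described scheme of splitting one sequence into three reading frames of non-overlapping 3-grams (offsets 0, 1 and 2). The resulting lists could then be fed to any word2vec trainer. The example sequence is made up.

```python
def split_into_ngram_sentences(sequence, n=3):
    """Split a biological sequence into n 'sentences' of non-overlapping n-grams.

    Sentence number `offset` starts reading at position `offset`; any trailing
    residues that do not fill a whole n-gram are dropped.
    """
    sequence = sequence.upper()
    sentences = []
    for offset in range(n):
        grams = [sequence[i:i + n] for i in range(offset, len(sequence) - n + 1, n)]
        sentences.append(grams)
    return sentences


for frame in split_into_ngram_sentences("MAFSAEDVLKEY"):
    print(frame)
```

Using all three frames means every 3-gram in the sequence shows up in exactly one training sentence.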
Semantic Relatedness for Evaluation of Course Equivalencies (Beibei Yang)
The document outlines a doctoral dissertation defense presentation on using semantic relatedness to evaluate course equivalencies. The presentation includes an introduction, outlines knowledge sources and related work, describes two approaches to measuring semantic relatedness between courses, and discusses experimental results comparing the approaches.
This document discusses ontology engineering and practices. It begins with a review of ontology and compares ontology-like artifacts such as controlled vocabularies, taxonomies, thesauri, and data models. It defines an ontology as a formal, explicit specification of a shared conceptualization of a domain. The document outlines ontology development methods and the typical life cycle of an ontology building project, which includes investigation, design, implementation, evaluation, and documentation. It provides an example ontology and closes with a summary of ontology building.
Context Representation for the Semantic Web (Jie Bao)
This document discusses representing contexts on the semantic web. It defines contexts as capturing the relativity of meaning of data and knowledge. Contexts have aspects like assumptions, sources, scopes, and relations to other contexts. The document proposes extending McCarthy's context theory to represent contexts as objects on the semantic web with jurisdictions over interpreting data. It suggests using relations between contexts to control knowledge flow and inference with respect to contexts.
Latent Topic-semantic Indexing based Automatic Text Summarization (Elaheh Barati)
Automatic summarization, a difficult but pressing problem in natural language processing, aims to shorten source documents while retaining their main information. In recent years, statistical machine learning methods have increasingly been applied to automatic summarization. In this paper, we propose a novel approach to summarization based on a hierarchical Bayesian model of topic-semantic indexing (TSI) and an extraction strategy based on average log-likelihood. The new method is tested on the Brown corpus, and its performance is analyzed through a blind experiment on human reviews, evaluated with one-way ANOVA. The experimental results show that the TSI model is promising for topic-driven summarization.
Towards Content-Based Dataset Search - Test Collections and Beyond (Gong Cheng)
The document discusses content-based dataset search (CBDS) as an improvement over metadata-based dataset search (MBDS). It presents ACORDAR, a test collection for ad hoc CBDS that uses synthetic and TREC queries over RDF datasets. Evaluation results showed that both metadata and dataset content are useful, and that TREC queries are more difficult. CBDS faces challenges including scalability, tractability, and heterogeneity, but it is likely to become a trend because it offers higher relevance and explainability than MBDS.
Context Representation for the Semantic Web Jie Bao
This document discusses representing contexts on the semantic web. It defines contexts as capturing the relativity of meaning of data and knowledge. Contexts have aspects like assumptions, sources, scopes, and relations to other contexts. The document proposes extending McCarthy's context theory to represent contexts as objects on the semantic web with jurisdictions over interpreting data. It suggests using relations between contexts to control knowledge flow and inference with respect to contexts.
Latent Topic-semantic Indexing based Automatic Text SummarizationElaheh Barati
Automatic summarization, a difficult but pressing problem in natural language processing, aims at shortening source documents while retaining main information. In recent years, more statistical machine learning methods have been applied to automatic summarization. In this paper, we propose a novel approach for summarization, based on hierarchical Bayesian model of topic-semantic indexing (TSI) and extraction strategy of average log-likelihood. The new method is tested on Brown corpus, and its performance is analyzed by a well-designed blind experiment of one-way ANOVA on human reviews. The experimental results show that TSI model is promising on topic- driven summarization.
An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems
1. ws.nju.edu.cn
An Empirical Study of Vocabulary Relatedness
and Its Application to Recommender Systems
Gong Cheng, Saisai Gong, Yuzhong Qu
State Key Laboratory for Novel Software Technology, Nanjing University, China
gcheng@nju.edu.cn
Presented at ISWC 2011
2. Measuring term similarity
[Figure: vocabulary matching scores term pairs from two vocabularies, e.g. FacultyMember vs. Faculty (0.9), FullProfessor vs. Professor (0.8), AssistantProfessor vs. AssistantProfessor (1.0)]
Gong Cheng (程龚) gcheng@nju.edu.cn 2 of 36
3. Measuring vocabulary similarity
[Figure: pairwise similarity scores, obtained by vocabulary matching or vocabulary distance, among Semantic Web for Research Communities (SWRC), eBiquity Person, Foundational Model of Anatomy (FMA), GALEN and NCBI organismal classification (NCBITaxon)]
4. Measuring vocabulary relatedness
[Figure: vocabulary relatedness contrasted with vocabulary distance and vocabulary matching. A vocabulary with FacultyMember, FullProfessor and AssistantProfessor and one with Postgraduate-Research-Degree, PhD and EngD are not that similar, but somewhat related.]
5. Contributions
How to measure vocabulary relatedness?
6 measures, from 4 aspects
What does vocabulary relatedness look like in real-life cases?
Empirical analysis of 2,996 vocabularies and 4 billion other RDF triples
Where to apply vocabulary relatedness?
Post-selection vocabulary recommendation in vocabulary search
6. Outline
Data set
Vocabulary relatedness
Post-selection vocabulary recommendation
Conclusions
7. Data set statistics
Crawled from February 2010 to May 2011 by
8. Data set distributions
RDF documents over pay-level domains
9. Data set distributions
Vocabularies over top-level domains
10. Outline
Data set
Vocabulary relatedness
Post-selection vocabulary recommendation
Conclusions
11. Vocabulary relatedness
6 numerical measures, from 4 aspects
Semantic relatedness
Explicit
Implicit
Hybrid
Content similarity
Expressivity closeness
Distributional relatedness
Comparison
12. Measure 1: explicit semantic relatedness
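$$R_S^E(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G^E}$$
[Figure: example graph $G^E$ over vocabularies $v_1$, $v_2$, $v_3$, with weighted edges (example weights 1 and 2) derived from explicit relations such as owl:imports, owl:priorVersion and rdfs:seeAlso]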
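All three semantic relatedness measures share one formula: the reciprocal of the weight of a shortest path between two vocabularies in a vocabulary graph. Below is a minimal Python sketch of that formula using Dijkstra's algorithm. The graph is given as hypothetical (vocabulary, vocabulary, weight) triples with positive weights. The slides do not show how edge weights are assigned, and returning 0 for unconnected pairs is an assumption.

```python
import heapq
from collections import defaultdict


def shortest_path_weight(edges, source, target):
    """Dijkstra's algorithm on an undirected graph given as (u, v, weight) triples.

    Returns the total weight of a shortest path, or None if target is unreachable.
    Assumes all weights are positive.
    """
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
        graph[v].append((u, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist[node]:
            continue  # stale queue entry; a shorter path was already found
        for neighbour, w in graph[node]:
            nd = d + w
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return None


def semantic_relatedness(edges, vi, vj):
    """R_S(vi, vj) = 1 / weight of a shortest path between vi and vj.

    Assumption: unconnected pairs score 0; the slide does not cover vi == vj.
    """
    if vi == vj:
        raise ValueError("this sketch expects two distinct vocabularies")
    weight = shortest_path_weight(edges, vi, vj)
    return 0.0 if weight is None else 1.0 / weight


# Hypothetical explicit graph G^E; the edge weights are illustrative only.
g_e = [("v1", "v2", 1.0), ("v2", "v3", 2.0)]
print(semantic_relatedness(g_e, "v1", "v3"))  # 1 / (1 + 2)
```

The same function applies unchanged to $G^I$ and $G^{E+I}$. Only the edge list differs.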
13. Measure 2: implicit semantic relatedness
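$$R_S^I(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G^I}$$
[Figure: terms $t_2$, $t_3$, $t_4$ of vocabularies $v_2$, $v_3$, $v_4$ are linked by owl:inverseOf and rdfs:subClassOf axioms, yielding the weighted graph $G^I$ over $v_2$, $v_3$, $v_4$ (example edge weights 1 and 2)]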
14. Measure 3: hybrid semantic relatedness
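$$R_S^{E+I}(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G^{E+I}}$$
[Figure: example graph $G^{E+I}$ over vocabularies $v_1$ to $v_4$, i.e. the vocabularies of $G^E$ and $G^I$ together (example edge weights 1, 1 and 2)]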
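The slides suggest that $G^I$ links vocabularies whose terms are connected by axioms such as owl:inverseOf and rdfs:subClassOf, and that $G^{E+I}$ spans the vocabularies of both graphs. The sketch below shows one way to build such graphs, under assumptions the talk does not confirm. The caller supplies a hypothetical term-to-vocabulary mapping. Every cross-vocabulary axiom yields an edge of fixed weight. Where the two graphs share an edge, the smaller weight is kept.

```python
def lift_term_axioms(axioms, term_vocab, weight=1.0):
    """Derive vocabulary-level edges (an implicit graph G^I) from term-level axioms.

    axioms: (term_a, predicate, term_b) triples, e.g. owl:inverseOf or rdfs:subClassOf.
    term_vocab: hypothetical mapping from each term to the vocabulary defining it.
    Assumption: each cross-vocabulary axiom yields one undirected edge of fixed weight.
    """
    edges = {}
    for term_a, _predicate, term_b in axioms:
        va, vb = term_vocab.get(term_a), term_vocab.get(term_b)
        if va is None or vb is None or va == vb:
            continue  # unknown terms and intra-vocabulary axioms add no edge
        edges[tuple(sorted((va, vb)))] = weight
    return edges


def merge_graphs(*graphs):
    """Hypothetical hybrid graph G^{E+I}: union of edges, keeping the smaller weight."""
    merged = {}
    for graph in graphs:
        for pair, w in graph.items():
            merged[pair] = min(merged.get(pair, w), w)
    return merged


term_vocab = {"t2": "v2", "t3": "v3", "t4": "v4", "t2b": "v2"}
axioms = [
    ("t2", "owl:inverseOf", "t3"),
    ("t3", "rdfs:subClassOf", "t4"),
    ("t2", "rdfs:subClassOf", "t2b"),  # same vocabulary: ignored
]
g_i = lift_term_axioms(axioms, term_vocab)
g_e = {("v1", "v2"): 1.0, ("v2", "v3"): 2.0}  # illustrative explicit edges
g_ei = merge_graphs(g_e, g_i)
print(g_ei)
```

Passing the resulting edge dictionaries to any shortest-path routine yields $R_S^I$ and $R_S^{E+I}$.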
15. Empirical analysis (1)
Statistical properties of $G^E$, $G^I$ and $G^{E+I}$
16. Empirical analysis (2)
Explicit relations between vocabularies
17. Measure 4: content similarity
Harmonic mean
Maximum similarity between their labels
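This slide names only two ingredients of content similarity: a maximum similarity between labels and a harmonic mean. The sketch below is one plausible reading, not the formula from the paper. Each term is matched by label to its most similar term in the other vocabulary. The best scores are averaged in each direction, and the two directions are combined by a harmonic mean. Token-level Jaccard similarity with camelCase splitting stands in for the unspecified label similarity. The vocabulary layout (term mapped to a list of labels) and the example terms are hypothetical.

```python
import re


def tokens(label):
    """Lower-cased words of a label; also splits camelCase local names like 'FullProfessor'."""
    return {t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", label)}


def label_similarity(a, b):
    """Assumption: token-level Jaccard similarity (the slide does not name the measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def term_similarity(labels_a, labels_b):
    """Maximum similarity between any label of one term and any label of the other."""
    return max((label_similarity(x, y) for x in labels_a for y in labels_b), default=0.0)


def directional_score(vi, vj):
    """Average over the terms of vi of their best match in vj (vocabularies: term -> labels)."""
    if not vi:
        return 0.0
    best = [max((term_similarity(la, lb) for lb in vj.values()), default=0.0)
            for la in vi.values()]
    return sum(best) / len(best)


def content_similarity(vi, vj):
    """Harmonic mean of the two directional scores (0 when both are 0)."""
    s1, s2 = directional_score(vi, vj), directional_score(vj, vi)
    return 0.0 if s1 + s2 == 0 else 2 * s1 * s2 / (s1 + s2)


# Hypothetical vocabularies: term IRI -> labels (label-property values or local names).
staff = {"ex:FacultyMember": ["FacultyMember"],
         "ex:FullProfessor": ["FullProfessor", "full professor"]}
people = {"ex2:Faculty": ["Faculty"], "ex2:Professor": ["Professor"]}
print(content_similarity(staff, people))
```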
18. Empirical analysis (3)
Labels: values of 86 label-like properties (rdfs:label, dc:title, and their subproperties, e.g. skos:prefLabel), and local names
[Figure: "Terms and their labels" pie chart (36.33% / 63.67%) and "Vocabulary distribution" pie chart (36.21% / 63.79%), each split into w/ and w/o labels]
20. Empirical analysis (4)
4,978 meta-level terms, 469 (9.42%) of which occur in more than one vocabulary
Most popular meta-level terms
1. rdf:type
2. rdfs:domain
3. rdfs:range
4. …
and after excluding language constructs
On average 10.13 meta-level terms per vocabulary
At most 20 meta-level terms in 92.96% of vocabularies, but hundreds in Cyc
21. Measure 6: distributional relatedness
Distributional profile:
$$DP(v) = \big(p(v_1 \mid v),\ p(v_2 \mid v),\ \ldots,\ p(v_n \mid v)\big)$$
$$R_D(v_i, v_j) = \cos\big(DP(v_i), DP(v_j)\big)$$
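The distributional profile turns each vocabulary into a vector of conditional probabilities, and $R_D$ is the cosine of two such vectors. The slide does not show how $p(v_k \mid v)$ is estimated. This sketch assumes it is the fraction of RDF documents instantiating $v$ that also instantiate $v_k$, and represents documents as hypothetical sets of vocabulary identifiers. Here an uninstantiated vocabulary gets an all-zero profile and relatedness 0.

```python
import math


def distributional_profile(v, docs, vocabularies):
    """DP(v) = (p(v1 | v), ..., p(vn | v)).

    Assumption: p(vk | v) is estimated as the share of documents instantiating v
    that also instantiate vk; each document is a set of vocabulary ids.
    """
    with_v = [d for d in docs if v in d]
    if not with_v:
        return [0.0] * len(vocabularies)  # uninstantiated vocabulary
    return [sum(vk in d for d in with_v) / len(with_v) for vk in vocabularies]


def cosine(x, y):
    """Cosine of two equal-length vectors; 0 if either is the zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 0.0 if norm_x == 0 or norm_y == 0 else dot / (norm_x * norm_y)


def distributional_relatedness(vi, vj, docs, vocabularies):
    """R_D(vi, vj) = cos(DP(vi), DP(vj))."""
    return cosine(distributional_profile(vi, docs, vocabularies),
                  distributional_profile(vj, docs, vocabularies))


# Hypothetical documents, each listing the vocabularies it instantiates.
docs = [{"foaf", "dc"}, {"foaf", "dc", "sioc"}, {"foaf"}, {"sioc", "dc"}]
vocabularies = ["foaf", "dc", "sioc"]
print(distributional_relatedness("foaf", "dc", docs, vocabularies))
```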
22. Empirical analysis (5)
Instantiation found for 1,874 (62.55%) vocabularies
Most popular vocabularies (excluding languages)
23. Empirical analysis (6)
Co-instantiation found for 9,763 pairs of vocabularies
Most popular vocabulary co-instantiation (excluding languages)
24. Vocabulary relatedness
6 numerical measures, from 4 aspects
Semantic relatedness
Explicit
Implicit
Hybrid
Content similarity
Expressivity closeness
Distributional relatedness
Comparison
25. Agreement between measures
Spearman’s rank correlation coefficient (ρ∈[-1,1])
Single-link hierarchical clustering
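To compare measures, treat each one as a list of scores over the same vocabulary pairs. Spearman's ρ compares the rankings those scores induce, and single-link clustering then groups the measures that rank alike. The sketch below uses average ranks for ties. It assumes $1 - \rho$ as the distance between measures, and the measure names and scores are made up for illustration.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors (undefined if all scores tie)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def single_link(names, dist):
    """Agglomerative single-link clustering; returns (merged members, merge distance) steps."""
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        d, a, b = min(
            ((min(dist[frozenset((x, y))] for x in a for y in b), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a | b), d))
    return merges


# Hypothetical scores of three measures over the same five vocabulary pairs.
scores = {
    "RS_E": [0.9, 0.5, 0.1, 0.0, 0.3],
    "RS_I": [0.8, 0.6, 0.2, 0.1, 0.3],
    "RD": [0.1, 0.2, 0.9, 0.7, 0.4],
}
# Assumption: the distance between two measures is 1 - rho.
dist = {frozenset((a, b)): 1 - spearman(scores[a], scores[b])
        for a, b in combinations(scores, 2)}
for members, d in single_link(list(scores), dist):
    print(members, round(d, 3))
```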
26. Outline
Data set
Vocabulary relatedness
Post-selection vocabulary recommendation
Conclusions
27. Relatedness-based ranking
Ranking by a single measure:
Ranking by multiple measures:
28. Popularity-based re-ranking
Degree of influence of popularity
Number of pay-level domains instantiating vi
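Only the ingredients of the re-ranking survive in this transcript: a parameter controlling the degree of influence of popularity, and the number of pay-level domains (PLDs) instantiating $v_i$. The function below is a hypothetical blend, not the talk's formula, which is not recoverable here. It mixes relatedness with log-scaled, max-normalized PLD counts, weighted by a parameter `alpha`.

```python
import math


def rerank(candidates, relatedness, pld_count, alpha):
    """Hypothetical popularity-based re-ranking.

    candidates: vocabulary ids; relatedness: id -> score in [0, 1];
    pld_count: id -> number of pay-level domains instantiating it;
    alpha in [0, 1]: degree of influence of popularity (0 ignores popularity).
    """
    max_log = max((math.log1p(pld_count.get(v, 0)) for v in candidates), default=0.0)

    def score(v):
        # Log-scale PLD counts so a few very popular vocabularies do not dominate.
        pop = math.log1p(pld_count.get(v, 0)) / max_log if max_log > 0 else 0.0
        return (1 - alpha) * relatedness.get(v, 0.0) + alpha * pop

    return sorted(candidates, key=score, reverse=True)


cands = ["a", "b", "c"]
rel = {"a": 0.9, "b": 0.6, "c": 0.5}  # illustrative relatedness scores
pld = {"a": 1, "b": 200, "c": 0}      # illustrative PLD counts
print(rerank(cands, rel, pld, 0.5))
```

With `alpha = 0` the ranking is by relatedness alone. Raising `alpha` lets widely instantiated vocabularies move up.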
29. Evaluation settings
20 “selections” randomly chosen from 1,302 moderate-sized vocabularies
Depth-10 pooling, assessed by 2 experts
Ratings
Closely related: 2
Somewhat related: 1
Unrelated: 0
Metric: NDCG
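With graded ratings of 2, 1 and 0, NDCG rewards placing closely related vocabularies near the top. The sketch below uses the common exponential-gain form $(2^{rel} - 1)/\log_2(i + 1)$. The slides do not say which DCG variant was used, so treat that as an assumption. The ideal ranking comes from all pooled ratings for a selection, which depth-10 pooling provides for every top-10 result.

```python
import math


def dcg(gains, k):
    """Exponential-gain DCG over the first k positions (0-based i, so log2(i + 2))."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg(ranked_ratings, pooled_ratings, k):
    """ranked_ratings: ratings (2/1/0) of the recommended list, in rank order.
    pooled_ratings: every rating in the assessment pool for this selection.
    """
    ideal = dcg(sorted(pooled_ratings, reverse=True), k)
    return dcg(ranked_ratings, k) / ideal if ideal > 0 else 0.0


# Hypothetical judged ranking for one selection and its assessment pool.
print(ndcg([1, 2, 0, 0], [2, 1, 0, 0, 0], 1))
```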
30. Gold standard
739 assessments
Assessments:
Closely related: 7.85%
Somewhat related: 10.55%
Unrelated: 81.60%
Agreement between experts: 80%, or 91% when “closely related” and “somewhat related” are merged into “related”
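The 80% and 91% figures read as raw percentage agreement between the two experts, first over all three rating levels and then with the two "related" levels merged. The slide does not mention a chance-corrected statistic such as Cohen's kappa. Under that reading, the computation is simple. The ratings below are toy data.

```python
def percent_agreement(ratings_a, ratings_b, merge_related=False):
    """Share of items on which two experts give the same rating (2, 1 or 0).

    With merge_related=True, 'closely related' (2) and 'somewhat related' (1)
    both count as 'related'.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    norm = (lambda r: min(r, 1)) if merge_related else (lambda r: r)
    same = sum(norm(a) == norm(b) for a, b in zip(ratings_a, ratings_b))
    return same / len(ratings_a)


expert_1 = [2, 1, 0, 0, 2]  # hypothetical ratings
expert_2 = [1, 1, 0, 0, 2]
print(percent_agreement(expert_1, expert_2), percent_agreement(expert_1, expert_2, True))
```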
31. Evaluation results --- individual measures
56.88% of vocabularies are isolated in $G^E$; 37.45% of vocabularies are uninstantiated
32. Evaluation results --- combinations of measures
33. Relatedness vs. popularity
NDCG@1 vs. number of pay-level domains instantiating it
34. Outline
Data set
Vocabulary relatedness
Post-selection vocabulary recommendation
Conclusions
36. Takeaways
Vocabulary meta-descriptions are incomplete.
Terms lack labels.
Co-instantiated ∝ explicitly related
http://ws.nju.edu.cn/falcons/ontologysearch/