Topic Modeling: Clustering of Deep Webpages (csandit)
The internet comprises a massive amount of information in the form of zillions of web pages. This information can be categorized into the surface web and the deep web. Existing search engines can effectively make use of surface web information, but the deep web remains unexploited. Machine learning techniques have been commonly employed to access deep web content.
Under machine learning, topic models provide a simple way to analyze large volumes of unlabeled text. A "topic" consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between words with multiple meanings. Clustering is one of the key solutions for organizing deep web databases. In this paper, we cluster deep web databases based on the relevance found among deep web forms by employing a generative probabilistic model called Latent Dirichlet Allocation (LDA) to model content representative of deep web databases. This is implemented after preprocessing the set of web pages to extract page contents and form contents. Further, we derive the distributions of "topics per document" and "words per topic" using Gibbs sampling. Experimental results show that the proposed method clearly outperforms existing clustering methods.
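As a concrete illustration of the modelling step described above, the sketch below implements a collapsed Gibbs sampler for LDA and clusters a handful of toy "forms" by their dominant topic. The corpus, topic count, and hyperparameters are illustrative assumptions for exposition, not the paper's actual data or settings.

```python
# Minimal collapsed Gibbs sampler for LDA, sketching the "topics per
# document" / "words per topic" estimation the abstract describes.
import numpy as np

def lda_gibbs(docs, vocab_size, n_topics=3, alpha=0.1, beta=0.01, iters=200):
    ndk = np.zeros((len(docs), n_topics))   # topic counts per document
    nkw = np.zeros((n_topics, vocab_size))  # word counts per topic
    nk = np.zeros(n_topics)                 # total words assigned to each topic
    rng = np.random.default_rng(0)
    z = []                                  # current topic of every token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                 # withdraw this token's assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional: p(k) ~ (ndk+alpha)*(nkw+beta)/(nk+V*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)

# Toy "form contents" as word-id lists; forms sharing vocabulary should end
# up with the same dominant topic, which serves as their cluster label.
docs = [[0, 1, 2, 1], [0, 2, 2, 3], [4, 5, 6, 5], [4, 6, 6, 7]]
theta = lda_gibbs(docs, vocab_size=8)
print(theta.argmax(axis=1))  # e.g. [0 0 2 2]: two clusters of two forms
```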
Full-Text Retrieval in Unstructured P2P Networks using Bloom Cast Efficiently (ijsrd.com)
Efficient and effective full-text retrieval in unstructured peer-to-peer networks remains a challenge in the research community. First, it is difficult, if not impossible, for unstructured P2P systems to effectively locate items with guaranteed recall. Second, existing schemes to improve search success rate often rely on replicating a large number of item replicas across the wide area network, incurring large communication and storage costs. In this paper, we propose BloomCast, an efficient and effective full-text retrieval scheme for unstructured P2P networks. By leveraging a hybrid P2P protocol, BloomCast replicates items uniformly at random across the P2P network, achieving guaranteed recall at a communication cost of O(N), where N is the size of the network. Furthermore, by casting Bloom filters instead of the raw documents across the network, BloomCast significantly reduces the communication and storage costs of replication. Results show that BloomCast achieves an average query recall that outperforms the existing WP algorithm by 18 percent, while greatly reducing the search latency of query processing by 57 percent.
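To see why casting Bloom filters rather than raw documents saves bandwidth, the sketch below summarizes a document's terms in a fixed-size bit array that remote peers can test query terms against. The filter size and hash count are illustrative assumptions, not BloomCast's actual parameters.

```python
# A minimal Bloom filter: membership for any number of terms is summarized
# in a fixed-size bit array, so a peer ships 128 bytes instead of full text.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, term):
        # Derive k bit positions from one SHA-256 digest of the term.
        digest = hashlib.sha256(term.encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i: 4 * i + 4], "big") % self.m

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        # False positives are possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(term))

# A peer casts only the filter; remote peers test query terms against it.
doc_terms = "full text retrieval in unstructured peer to peer networks".split()
bf = BloomFilter()
for t in doc_terms:
    bf.add(t)
print(bf.might_contain("retrieval"), bf.might_contain("latency"))
```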
Effective Feature Selection for Mining Text Data with Side-Information (IJTET Journal)
Abstract— Many text documents contain side-information. Many web documents carry meta-data corresponding to different kinds of attributes, such as the source or other information related to the origin of the document. Data such as location, ownership or even temporal information may be considered side-information. This additional information may be used when performing text clustering. It can either improve the quality of the representation for the mining process or add noise to it. When the information is noisy, mining along with the side-information can be risky: the noise can reduce the quality of clustering, whereas informative side-information can improve it. In the existing system, the Gini index is used as the feature selection method to filter informative side-information from text documents. It is effective to a certain extent, but the number of remaining features is still huge. It is important to use feature selection methods to handle the high dimensionality of data for effective text categorization. In the proposed system, in order to improve document clustering and classification accuracy as well as reduce the number of selected features, a novel feature selection method is proposed. To improve the accuracy and purity of document clustering with lower time complexity, a new method called Effective Feature Selection (EFS) is introduced. This three-stage procedure includes feature subset selection, feature ranking and feature re-ranking.
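The abstract does not spell out the three EFS stages, so the sketch below shows only the generic shape of such a pipeline: subset selection, ranking, and re-ranking, with scikit-learn's chi-squared scores as a stand-in scorer and document frequency as a stand-in re-ranking criterion.

```python
# Generic three-stage feature selection sketch (not the paper's EFS method).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

docs = ["cheap pills online", "meeting agenda attached",
        "cheap offer online", "project meeting notes"]
labels = [1, 0, 1, 0]          # toy side-information, e.g. document origin

vec = CountVectorizer()
X = vec.fit_transform(docs)

# Stage 1 + 2: select a candidate subset and rank it by chi-squared score.
scores, _ = chi2(X, labels)
top = np.argsort(scores)[::-1][:4]
ranked = [vec.get_feature_names_out()[i] for i in top]

# Stage 3: re-rank, here by preferring features seen in more documents.
df = (X > 0).sum(axis=0).A1
reranked = sorted(ranked, key=lambda t: -df[vec.vocabulary_[t]])
print(ranked, reranked)
```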
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel... (University of Bari, Italy)
The current abundance of electronic documents requires automatic techniques that support users in understanding their content and extracting useful information. To this aim, improving retrieval performance must necessarily go beyond simple lexical interpretation of user queries and pass through an understanding of their semantic content and aims. It goes without saying that any digital library would benefit enormously from the availability of effective Information Retrieval techniques to offer its users. This paper proposes an approach to Information Retrieval based on a correspondence of the domain of discourse between the query and the documents in the repository. The association is based on standard general-purpose linguistic resources (WordNet and WordNet Domains) and on a novel similarity assessment technique. Although the work is at a preliminary stage, interesting initial results suggest continuing to extend and improve the approach.
In this research paper, we present an overview of research issues in web mining. We discuss mining with respect to web data, referred to here as web data mining. In particular, our focus is on web data mining research in the context of our web warehousing project. We have categorized web data mining into three areas: web content mining, web structure mining and web usage mining. We have highlighted and discussed various research issues involved in each of these web data mining categories. We believe that web data mining will be a topic of exploratory research in the near future.
Beyond Seamless Access: Meta-data In The Age of Content Integration (New York University)
This was an example of meta-data research that I did before the dot-com bubble hit the East Coast in 2000. Much of what we envisioned for content integration shaped today's meta-data movement. Its full potential has not been reached yet, e.g. the level of intelligent data for semantic apps, personalized delivery, interactive and bidirectional-linking services, repurposed services, etc. It was the first of its kind in weaving content from scholarly publications (particularly in the context of formal and informal communications) into library mission-critical applications in authority control, meta-data, directory services, ILS, ILL, and knowledge bases for site maps, etc.
A Survey on Text Mining - Techniques and Applications (Ryota Eisaki)
Abstract:
Text mining is the process of extracting interesting information, knowledge or patterns from unstructured text drawn from different sources. There is a vast amount of financial information on companies' financial performance available to investors today. While automatic analysis of financial figures is common, it has been difficult to automatically extract meaning from the textual part of financial reports. The textual part of an annual report contains richer information than the financial ratios. In this paper, we combine data mining methods for analyzing quantitative and qualitative data from financial reports, in order to see whether the textual part of a report contains some indication of future financial performance. The quantitative analysis has been performed using self-organizing maps, and the qualitative analysis using prototype-matching text clustering. The analysis is performed on the quarterly reports of three leading companies in the telecommunications sector.
Towards Webpage Steganography with Attribute Truth Table (Sreekanth Reddy)
The idea is to conceal the secret information in the source code of the webpage. The decoding of the secret information is carried out using a tailor-made client-side plug-in.
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ... (IJERA Editor)
Although publicly accessible databases containing speech documents exist, the time and effort required to keep them up to date is often burdensome. In an effort to help identify the speaker of speech when text is available, text-mining tools from the machine learning discipline can be applied to help in this process. Here, we describe and evaluate document classification algorithms, i.e. a combination of text mining and classification. This task asked participants to design classifiers for identifying documents containing speech-related information in the main literature, and evaluated them against one another. The proposed system utilizes a novel approach to k-nearest neighbour classification and compares its performance for different values of k.
A NEAR-DUPLICATE DETECTION ALGORITHM TO FACILITATE DOCUMENT CLUSTERING (IJDKP)
Web mining faces huge problems due to duplicate and near-duplicate web pages. Detecting near-duplicates is very difficult in a large collection of data like the internet. The presence of these web pages plays an important role in performance degradation when integrating data from heterogeneous sources. These pages either increase the index storage space or increase the serving costs. Detecting these pages has many potential applications; for example, it may indicate plagiarism or copyright infringement. This paper concerns detecting, and optionally removing, duplicate and near-duplicate documents in order to perform clustering of documents. We demonstrate our approach in the domain of web news articles. The experimental results show that our algorithm performs well in terms of similarity measures. The identification of near-duplicate and duplicate documents has resulted in reduced memory use in repositories.
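The abstract does not reproduce the algorithm itself; as a hint of the usual machinery, the sketch below applies word shingling plus Jaccard similarity, a common near-duplicate test. The shingle size and threshold are illustrative choices.

```python
# Near-duplicate test via word shingles and Jaccard overlap (illustrative).
def shingles(text, k=2):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = "web mining faces huge problems due to duplicate web pages"
doc2 = "web mining faces big problems due to duplicate web pages"
sim = jaccard(shingles(doc1), shingles(doc2))
print(round(sim, 2), "near-duplicate" if sim > 0.6 else "distinct")
```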
An Efficient Annotation of Search Results Based on Feature Ranking Approach f... (Computer Science Journals)
With the increased number of web databases, a major part of the deep web now consists of database content. In several search engines, the encoded data in result pages returned from the web often comes from structured databases, which are referred to as Web databases (WDB).
A Soft Set-based Co-occurrence for Clustering Web User Transactions (TELKOMNIKA JOURNAL)
Grouping web transactions into clusters is essential to gain a better understanding of user behavior, and e-commerce companies widely use this grouping process. Clustering web transactions is therefore important, even though it is a challenging data mining problem. The problems arise because there is uncertainty when forming clusters. Clustering of web user transactions has used rough set theory for managing uncertainty in the clustering process; however, it suffers from high computational complexity and low cluster purity. In this study, we propose a soft set-based co-occurrence for clustering web user transactions. Unlike rough set approaches, which rely on similarity, the novelty of this approach is its use of a co-occurrence approach from soft set theory. We compare the proposed approach and rough set approaches regarding computational complexity and cluster purity. The results demonstrate better performance and greater effectiveness, achieving lower computational complexity with an improvement of more than 100% and higher cluster purity compared to two previous rough set-based approaches.
CAA 2014 - To Boldly or Bravely Go? Experiences of using Semantic Technologie... (Keith.May)
This paper is based upon practical experiences of Conceptual modelling, using CIDOC CRM, of the single context recording system at English Heritage and mapping it to other 'single context' based systems. It also presents recent work on identifying conceptual commonalities that may exist in different archaeological recording methodologies, whether 'single context recording' or otherwise, along with practical challenges based on experiences of trying to integrate, or simply search across, data from different archaeological recording systems. In addition it introduces the work to date on developing http://www.heritagedata.org/ and suggests opportunities for sharing and aligning further archaeological vocabularies using SKOS and Linked Open Data technologies.
PoolParty Semantic Search Server is described from a technological perspective: how to use SKOS thesauri to map data from different sources, how to generate a semantic index, and how to build precise faceted search.
Navigate, search and link SharePoint content by use of semantic technologies based on Semantic SP.
Semantic technologies build the basis for smart content management systems. Functionalities of such technologies range from automatic tagging / text mining to taxonomy / ontology management. From a user perspective, improved search, contextualisation of information, e.g. automatic content recommendation, and means for a better understanding of interlinked information are key for professional information management.
SharePoint is a frequently used carrier-system of enterprise content which offers some basic functionalities for semantic information management out-of-the-box. In this webinar, you will see how these features are usually used, e.g. SharePoint’s Term Store, and how those components can be extended by a set of additional functionalities provided by Semantic SP.
We demonstrate and discuss the benefit of use cases based on the following components of the Semantic SP product family:
PowerTagging for SharePoint: Automatic tagging and semantic indexing of documents by use of text mining based on enterprise vocabularies. Semantic search based on SharePoint’s standard search component.
Semantic Knowledge Base for SharePoint: See how to publish and navigate enterprise vocabularies, complex semantic networks and/or ontologies within a SharePoint server.
Taxonomy Creator for SharePoint: See how to create and maintain very large and complex taxonomies by use of PoolParty Thesaurus Server, to import into SP Term Store or to enable PowerTagging for SharePoint.
From the 11th to 16th of November the Norwegian University of Life Sciences Library is hosting a meeting with people from Sokoine Agricultural University (SUA), University of Dar es Salaam (UDSM), Ardhi University (ARU) and Tanzania Meteorological Institute (TMA) working on the project "Strengthening documentation, communication and dissemination of information related to climate change impacts, adaptation and mitigation in Tanzania". The objective is to build the Tanzania Climate Change Repository, TaCCIRe, a subject repository based on DSpace.
The Tanzania Climate Change Information Repository is a digital collection of intellectual output which is online, free of charge and free from most copyright and licensing restrictions. This repository aims at documenting and enhancing access to relevant information resources produced by the CCIAM programme and related information generated from Tanzania. Through TaCCIRe, research on climate change related to Tanzania and Africa is more visible to the rest of the world.
Next Friday (15th of November) people participating in the project and the AIMS editorial team will participate in a webinar to share the importance of using controlled vocabularies like AGROVOC on DSpace.
Semantic annotation, considered one of the applicative aspects of the semantic web, has been adopted by researchers from different communities as a paramount solution for improving the searching and retrieval of information by promoting the richness of the content. However, researchers face challenges concerning both the quality and the relevance of the semantic annotations attached to a document with respect to its content and semantics, as well as challenges regarding the automation process, which is supposed to ensure an optimal system for information indexing and retrieval. In this article, we introduce the semantic annotation concept by presenting a state of the art including definitions, features and a classification of annotation systems. Systems and approaches proposed in the field are cited, along with a study of some existing annotation tools. This study also pinpoints various problems and limitations related to annotation in order to offer solutions for our future work.
GReAT Model: A Model for the Automatic Generation of Semantic Relations betwee... (ijcsity)
The large available amount of non-structured texts that belong to different domains such as healthcare (e.g. medical records), justice (e.g. laws, declarations), insurance (e.g. declarations), etc. increases the effort required for the analysis of information in a decision-making process. Different projects and tools have proposed strategies to reduce this complexity by classifying, summarizing or annotating the texts. Particularly, text summary strategies have proven to be very useful to provide a compact view of an original text. However, the available strategies to generate these summaries do not fit very well within domains that require taking into consideration the temporal dimension of the text (e.g. a recent piece of text in a medical record is more important than a previous one) and the profile of the person who requires the summary (e.g. the medical specialization). To cope with these limitations, this paper presents "GReAT", a model for automatic summary generation that relies on natural language processing and text mining techniques to extract the most relevant information from narrative texts and discover new information from the detection of related information. The GReAT model was implemented in software to be validated in a health institution, where it has shown to be very useful for displaying a preview of the information in medical health records and for discovering new facts and hypotheses within the information. Several tests were executed, covering functionality, usability and performance of the implemented software. In addition, precision and recall measures were applied to the results obtained through the implemented tool, as well as to the loss of information incurred by providing a text much shorter than the original.
A Comprehensive Survey on Comparisons across Contextual Pre-Filtering, Contex... (TELKOMNIKA JOURNAL)
Recently, there has been growing interest in recommender systems (RS) and particularly in context-aware RS. Methods for generating context-aware recommendations are classified into pre-filtering, post-filtering and contextual modelling approaches. In this paper, we present several novel approaches for the different variants of each of these three contextualization paradigms and a complete survey of the state-of-the-art comparisons across them. We then identify the significant challenges that need to be addressed by current RS researchers, which will help academics and practitioners compare these three approaches and select the best alternative for their strategies.
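As a minimal illustration of the pre-filtering paradigm named above, the sketch below filters ratings to the target context before running a conventional (here, deliberately trivial) 2D recommender. The data and the context attribute are illustrative assumptions.

```python
# Contextual pre-filtering sketch: restrict data to the context, then apply
# an ordinary context-free recommender (item averages here).
from collections import defaultdict

ratings = [  # (user, item, rating, context)
    ("u1", "movieA", 5, "weekend"), ("u1", "movieB", 2, "weekday"),
    ("u2", "movieA", 4, "weekend"), ("u2", "movieB", 5, "weekend"),
    ("u3", "movieB", 1, "weekday"),
]

def prefilter_recommend(target_context):
    # Pre-filtering step: keep only ratings made in the target context...
    relevant = [r for r in ratings if r[3] == target_context]
    sums = defaultdict(lambda: [0, 0])
    for _, item, score, _ in relevant:
        sums[item][0] += score
        sums[item][1] += 1
    # ...then rank items with a plain 2D recommender over the filtered data.
    return sorted(((s / n, item) for item, (s, n) in sums.items()), reverse=True)

print(prefilter_recommend("weekend"))  # [(5.0, 'movieB'), (4.5, 'movieA')]
```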
Data Integration in Multi-sources Information Systems (ijceronline)
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING (cscpconf)
In the last decade, ontologies have played a key technological role for information sharing and agent interoperability in different application domains. In the semantic web domain, ontologies are efficiently used to face the great challenge of representing the semantics of data, in order to bring the actual web to its full power and, hence, achieve its objective. However, using ontologies as common and shared vocabularies requires a certain degree of interoperability between them. To meet this requirement, mapping ontologies is a solution that cannot be avoided. Indeed, ontology mapping builds a meta layer that allows different applications and information systems to access and share their information, after resolving the different forms of syntactic, semantic and lexical mismatches. In the contribution presented in this paper, we have integrated the semantic aspect based on an external lexical resource, WordNet, to design a new algorithm for fully automatic ontology mapping. This fully automatic character is the main difference between our contribution and most existing semi-automatic algorithms for ontology mapping, such as Chimaera, Prompt, Onion, Glue, etc. To further enhance the performance of our algorithm, the mapping discovery stage is based on the combination of two sub-modules: the former analyses the concepts' names and the latter analyses their properties. Each of these two sub-modules is itself based on the combination of lexical and semantic similarity measures.
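A minimal sketch of the lexical-resource step described above, assuming NLTK's WordNet interface: two concept names are scored by the best path similarity between their synsets. The concept names are illustrative, and the paper's property-analysis sub-module is not reproduced here.

```python
# Lexical concept-name matching via WordNet path similarity (illustrative).
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def name_similarity(name_a, name_b):
    best = 0.0
    for s1 in wn.synsets(name_a):
        for s2 in wn.synsets(name_b):
            sim = s1.path_similarity(s2)  # 1.0 = same node in the hierarchy
            if sim is not None:
                best = max(best, sim)
    return best

# Concepts from two hypothetical ontologies map when the score is high.
print(name_similarity("car", "automobile"))  # 1.0: synonyms share a synset
print(name_similarity("car", "person"))      # much lower score
```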
Abstract:
A growing number of resources are available for enriching documents with semantic annotations. While originally focused on a few standard classes of annotations, the ecosystem of annotators is now becoming increasingly diverse. Although annotators often have very different vocabularies, with both high-level and specialist concepts, they also have many semantic interconnections. We will show that both the overlap and the diversity in annotator vocabularies motivate the need for semantic annotation integration: middleware that produces a unified annotation on top of diverse semantic annotators. On the one hand, the diversity of vocabulary allows applications to benefit from the much richer vocabulary available in an integrated vocabulary. On the other hand, we present evidence that the most widely-used annotators on the web suffer from serious accuracy deficiencies: the overlap in vocabularies from individual annotators allows an integrated annotator to boost accuracy by exploiting inter-annotator agreement and disagreement.
The integration of semantic annotations leads to new challenges, both compared to usual data integration scenarios and to standard aggregation of machine learning tools. We overview an approach to these challenges that performs ontology-aware aggregation. We introduce an approach that requires no training data, making use of ideas from database repair. We experimentally compare this with a supervised approach, which adapts maximal entropy Markov models to the setting of ontology-based annotations. We further experimentally compare both these approaches with respect to ontology-unaware supervised approaches, and to individual annotators.
Similar to Semantically indexed hypermedia linking information disciplines (20)
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Tudhope: Semantically Indexed Hypermedia: Linking Information Disciplines (ACM Computing Surveys, Vol. 31, Number 4es, December 1999)
... reasoning. Different types of indexing system are possible. It is useful to categorise indexing systems according to three dimensions [van Rijsbergen 1979]:
1. whether index terms are automatically derived or manually assigned;
2. whether index terms belong to a controlled vocabulary or are uncontrolled ('free');
3. whether terms can be combined as ordered strings representing a single concept when indexing (pre-coordinated terms), e.g. "Association of Computing Machinery", or must be post-coordinated on retrieval. The latter allows the possibility of 'false positives', where items are returned that have no connection between the different terms in the source string.
Information Retrieval (IR) has tended towards automatically generated free text index terms (post-coordinated),
weighted by statistical frequency of terms in documents and collections. On the other hand, distinguishing features
of a semantic index are that semantic relationships exist between controlled index terms, usually (but not
necessarily) the result of manual cataloguing. Semantically indexed hypermedia links are, by definition, computed,
corresponding to Intensional-Retrieval links [DeRose 1989]. This allows the possibility of flexible query-based
navigation tools.
2 Thesauri and Classification Systems
The semantic index approach employs a set of semantic relationships between index terms, following the well
established thesaurus tradition in information science (ISO 2788, ISO 5964). A large number of thesauri exist,
covering a variety of subject domains, for example the Medical Subject Headings [MeSH 1999] and the Art and
Architecture Thesaurus [AAT 1999]. Classification systems, such as Dewey Decimal or Library of Congress, focus
on hierarchical relationships. These controlled vocabularies are part of standard cataloguing practice in libraries and
museums and are now being applied to digital hypertexts via thematic keywords in metadata resource descriptors.
For example, the Dublin Core [DC 1999] standard metadata set includes elements for Title, Creator, Date, Format,
etc. in addition to the more complex notion of the Subject (or theme) of a resource. Guidelines recommend that,
where possible, the Subject element be taken from a relevant controlled vocabulary. Links between concepts in the
subject domain can be expressed by the semantic relationships in a thesaurus. The three main thesaurus relationships
are Equivalence (equivalent terms), Hierarchical (broader/narrower terms), and Associative (more loosely Related
Terms). Sometimes specialisations of the three main relationships are included (for example distinguishing
taxonomic and instance hierarchical relationships). Following a minimalist approach to semantic modelling by
restricting the set of relationships permits interoperability of cataloguing/retrieval tools and techniques. It also
facilitates automated reasoning over this core set of relationships.
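To make the three core relationships concrete, the sketch below models a small thesaurus fragment as a term graph in Python. The terms and links are illustrative (loosely AAT-flavoured), not taken from an actual thesaurus file.

```python
# A thesaurus fragment as a term graph: equivalence, hierarchical (BT/NT)
# and associative (RT) relationships, following the ISO 2788 conventions.
thesaurus = {
    "canals": {
        "USE_FOR": ["waterways, artificial"],        # equivalence
        "BT": ["hydraulic structures"],              # broader term
        "NT": ["ship canals", "irrigation canals"],  # narrower terms
        "RT": ["locks", "aqueducts"],                # related terms
    },
    "hydraulic structures": {"NT": ["canals", "dams"]},
}

def expand(term, relation):
    """Follow one relationship type out of a term, e.g. for query expansion."""
    return thesaurus.get(term, {}).get(relation, [])

print(expand("canals", "NT"))  # ['ship canals', 'irrigation canals']
print(expand("canals", "RT"))  # ['locks', 'aqueducts']
```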
3 Using semantic index links
Navigation is provided indirectly by queries to the semantic index space, as opposed to directly following explicit
links between information items. The queries can be simple or complex. The conventional hypermedia navigation
techniques may be implemented by relatively simple queries [Tudhope 1994], although there would be no particular
reason to use a semantic index to achieve that functionality. One additional possibility provided by a semantic index
space is an organised set of browsable concept descriptors, as a means of comprehending the associated layer of
media items [Bruza 1990], [Pollard 1993]. The user can browse the index space, 'beam down' to view media items of
interest, and conversely 'beam up' to the index space from media items. Additionally, when index terms are
combined, the user may browse around each term, broadening and narrowing the specificity of description and
seeing the effect on likely 'hits' [Pollitt 1997]. Alternatively, the combined terms can be considered as locating a
position in a 'hyperindex', permitting a string of terms to be broadened or narrowed in one navigation action [Bruza
1990]. If a user enters a set of query terms as opposed to browsing the index space, equivalence relationships permit
a broad entry vocabulary of synonyms to be tied together for retrieval purposes, without the user having to specify
the exact term employed for indexing. As a simple example, this document is indexed by a set of controlled
vocabulary terms from the ACM Computing Classification [ACM 1998] (see Categories and Subject Descriptors
above). In the ACM Digital Library pages, explicit hypertext links can be navigated. In addition, controlled
vocabulary index terms can be combined with free text terms when searching the library and the hypertext version
of the classification can be browsed as a subject index in order to select terms for searching.
Beyond this, the inclusion of semantic information in the index space provides the opportunity for knowledge-based
hypermedia systems that provide intelligent navigation support and retrieval, with the system taking a more active
role in the navigation process than relying on manual browsing alone. For example, rules governing permitted
combinations of terms can filter a user's possible navigation options [Arents 1993], [Rada 1993]. Work at the
University of Glamorgan explores the potential of reasoning over the semantic relationships in the index space.
Traversal of transitive relationships makes possible imprecise matching between query and media item, or between
two media items, rather than relying on an exact match of controlled vocabulary terms [Tudhope 1997]. Expanding
terms offers an augmented browsing capacity based on measures of distance in the semantic index space. Results
can be post-processed for expression in a particular retrieval tool. Various possibilities exist for indirect computed
links with such hybrid query/navigation tools [Cunliffe 1997]. For example, information items with semantically
close terms can be ranked in the result or destination set, or the system might automatically suggest terms to be
considered for inclusion in a query. If facets exist for time and place in the index space, then a result set can be
returned as a dynamic guided tour based on temporal or spatial relationships (or indeed other orderings).
Alternatively, the focus of a user's navigation can remain in the document (media) space, typically requiring less
cognitive overhead than constructing a formal query [Marchionini 1995]. In this case, having found an information
item of interest, the navigation action consists of requesting "More items like this one", with the system responsible
for a (best-match) similarity measure of the item's index terms. At the cost of greater cognitive demand on the user,
the source context for the navigation may be modified and particular media items or terms (de)emphasised (cf.
relevance feedback techniques in IR).
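The sketch below illustrates, under simplified assumptions, the flavour of such hybrid query/navigation tools: a query term is expanded through transitive hierarchical links, each hop discounting the match weight, and indexed items are ranked by semantic closeness. The graph, decay factor, and items are invented for exposition and are not the Glamorgan system's actual measure.

```python
# "Navigation via similarity": rank items by distance in the index space.
from collections import deque

links = {  # hierarchical (transitive) narrower-term links
    "vessels": ["bowls", "jugs"],
    "bowls": ["libation bowls"],
}
items = {"doc1": {"bowls"}, "doc2": {"libation bowls"}, "doc3": {"jugs"}}

def expanded_terms(term, decay=0.5):
    # Breadth-first traversal; each hop away halves the match weight.
    weights, queue = {term: 1.0}, deque([term])
    while queue:
        t = queue.popleft()
        for nt in links.get(t, []):
            if nt not in weights:
                weights[nt] = weights[t] * decay
                queue.append(nt)
    return weights

def rank(term):
    w = expanded_terms(term)
    scores = {d: sum(w.get(t, 0.0) for t in ts) for d, ts in items.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank("vessels"))  # near matches first, distant descendants later
```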
4 Key application to RDF and the WWW
Semantically based retrieval underpins diverse efforts to provide access to distributed multimedia resources, such as
the many projects involving SGML (XML) and Z39.50 for networked access to cross-platform information. Major
efforts are underway to create subject-based gateways to Internet resources, sometimes combining manually indexed
and robot harvested metadata. The W3C Recommendation for a 'machine-understandable' Resource Description
Framework supports the thrust of this research [Lassila 1999]. An RDF descriptor might include the Dublin Core
element, Subject, specifying a classification or thesaurus to which keywords belong. Precise semantic index retrieval
tools will be required to provide a manageable set of results to requests that may span several collections [Doerr
1997], and may involve networked terminology servers and more than one thesaurus or classification. One point
worth emphasising is the social dimension to access and the link with existing cataloguing practice. Controlled
vocabularies are often the result of standards efforts in subject domains, continue to evolve, and are part of a
network of practice and education/training in the information science community. They have the potential to act as a
bridge between information provider and seeker, "a semantic road map for searchers and indexers" [Soergel 1995],
if tools can be devised that visualise their structure and how they may be used.
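Section 4 notes that an RDF descriptor might include the Dublin Core Subject element tied to a classification or thesaurus. The sketch below shows one way such a descriptor could look using the rdflib library; the URI and the literal encoding of the vocabulary scheme are illustrative assumptions, not a prescribed encoding.

```python
# A Dublin Core descriptor whose Subject is drawn from a controlled
# vocabulary, serialized as RDF (illustrative sketch).
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
doc = URIRef("http://example.org/docs/semantically-indexed-hypermedia")
g.add((doc, DC.title, Literal("Semantically Indexed Hypermedia")))
# Subject keyword taken from the ACM Computing Classification (H.3.3), so
# retrieval tools can resolve it against the vocabulary's relationships.
g.add((doc, DC.subject, Literal("Information Search and Retrieval (ACM H.3.3)")))
print(g.serialize(format="turtle"))
```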
5 Research issues
A number of key issues for research remain if the potential of significant gains in precision of information access is
to be realised.
• An advantage of building query functionality into hypertext navigation is a smooth transition between querying and browsing. Can we identify the appropriate extent of cognitive effort demanded by interfaces to navigation tools? How far should the internal workings of matching functions or the detail of the underlying semantic network be brought to the surface?
• Some applications may lend themselves to the specialisation of the standard thesaurus relationships into richer sets, particularly the associative relationship. For example, in some situations it may be useful to distinguish various kinds of causal relationships from the generic associative relationship.
• The problem of expressing similarity between pre-coordinated strings of semantic index terms needs further investigation. How much should be pre-computed and what can be left to dynamic computation? How best can we express syntax or structure in such strings? This effort converges with work on description logic ontologies [Bullock 1998], [Weinstein 1998].
• Various efforts attempt to combine statistical IR and semantic controlled vocabulary approaches. For example, Agosti et al. [Agosti 1995] propose a three-layer architecture for Hypermedia IR systems combining a statistical index layer and a semantic (thesaurus) layer (see also [Aslandogan 1997], [Chiaramella 1996]). Studies of online searching behaviour have investigated conditions influencing the choice of free text or controlled vocabulary terms (e.g. [Fidel 1991]). How should the two approaches best be integrated: should they be seen as different components of a toolkit, or should a matching function incorporate both statistical weighting and semantic measures? In addition, indirect semantic links and explicit authored links will soon be combined in link/search engines. What principles should guide this integration?
• The semantic interoperability of overlapping but different thesauri is an important issue for remote access to distributed sets of resources employing controlled vocabularies in metadata. A concept may exist in one vocabulary but not another, or may map (partially) to various concepts.
References
[AAT 1999] Art and Architecture Thesaurus Browser, [Online: http://shiva.pub.getty.edu/aat_browser/], 1999.
[ACM 1998] ACM Computing Classification. http://www.acm.org/class/1998/
[Agosti 1995] Maristella Agosti, Massimo Melucci, and Fabio Crestani. "Automatic Authoring and Construction of
Hypermedia for Information Retrieval" in ACM Multimedia Systems, 3(1), 15-24, 1995.
[Arents 1993] Hans C. Arents and Walter F. L. Bogaerts. "Navigation without Links and Nodes without Contents:
Intensional Navigation in a Third-Order Hypermedia System" in Hypermedia, 5(3), 187-204, 1993.
[Aslandogan 1997] Y. Alp Aslandogan, Chuck Thier, Clement T. Yu, Jon Zou, and Naphtali Rishe. "Using
Semantic Contents and WordNet in Image Retrieval" in Proceedings of ACM SIGIR '97, 286-295, 1997.
[Berners-Lee 1998a] Tim Berners-Lee. World Wide Web Design Issues: A Roadmap to the Semantic Web,
[Online: http://www.w3.org/DesignIssues/Semantic.html], 1998.
[Bruza 1990] Peter Bruza. "Hyperindices: A Novel Aid for Searching in Hypermedia" in Proceedings of the ACM
European Conference on Hypertext '90 (ECHT '90), Versailles, France,109-122, November 1990.
[Bullock 1998] Joseph Bullock and Carole Goble. "TourisT: The Application of a Description Logic based
Semantic Hypermedia System for Tourism" in Proceedings of ACM Hypertext '98, Pittsburgh PA, 132-141, June
1998.
[Chiaramella 1996] Yves Chiaramella and Ammar Kheirbek. "An Integrated Model for Hypermedia and
Information Retrieval" in Information Retrieval and Hypertext, Maristella Agosti and Alan Smeaton (editors),
Kluwer, 139-178, 1996.
[Collier 1987] George Collier. "Thoth-II: Hypertext with Explicit Semantics" in Proceedings of ACM Hypertext
'87, Chapel Hill, NC, 269-289, November 1987.
[Cunliffe 1997] Daniel Cunliffe, Carl Taylor, and Douglas Tudhope. "Query-based Navigation in Semantically
Indexed Hypermedia" in Proceedings of ACM Hypertext 97, Southampton, UK, 87-95, April 1997.
[DC 1999] Dublin Core. [Online: http://purl.org/metadata/dublin_core], 1999.
[DeRose 1989] Steven J. DeRose. "Expanding the Notion of Links" in Proceedings of ACM Hypertext '89,
Pittsburgh, PA, 249-257, November 1989.
[Doerr 1997] Martin Doerr, Irene Fundulaki and Vassilis Christophidis. "The Specialist Seeks Expert Views:
Managing Digital Folders in the AQUARELLE Project" in Proceedings of Museums and the Web, David Bearman
and Jennifer Trant (editors), 261-270, 1997.
[Fidel 1991] Raya Fidel. "Searchers' Selection of Search Keys (I-III)" in Journal of American Society for
Information Science, 42(7), 490-527, 1991.
[Frisse 1989] Mark E. Frisse and Steven B. Cousins. "Information retrieval from hypertext: Update on the Dynamic
Medical Handbook" in Proceedings of ACM Hypertext '89, Pittsburgh, PA, 199-211, November 1989.
[Lassila 1999] Ora Lassila and Ralph Swick (editors), "Resource Description Framework (RDF) Model and Syntax
Specification" World Wide Web Consortium Recommendation, [Online: http://www.w3.org/TR/REC-rdf-syntax/],
February 22 1999.
[Marchionini 1995] Gary Marchionini. Information Seeking in Electronic Environments. Cambridge University
Press, 1995.
[MeSH 1999] MeSH 1999. Medical Subject Headings homepage. http://www.nlm.nih.gov/mesh/meshhome.html
[Nanard 1991] Jocelyne Nanard and Mark Nanard. "Using structured types to incorporate knowledge in hypertext"
in Proceedings of ACM Hypertext '91, San Antonio, TX, 329-344, December 1991.
[Pollard 1993] Richard Pollard. "A hypertext-based thesaurus as a subject browsing aid for bibliographic databases"
in Information Processing and Management, 29(3), 345-357, 1993.
[Pollitt 1997] Steven Pollitt, Martin P Smith and Patrick A J Braekevelt. "View-based Searching Systems" in
Proceedings of Joint Workshop of BCS IR and HCI Specialist Groups, (Johnson and Dunlop eds.) 73-77.
[Rada 1993] Roy Rada, Weigang Wang, Alex Birchall. "Retrieval hierarchies in hypertext" in Information
Processing and Management 29(3), 359-371, 1993.
[Schnase 1993] John L. Schnase, John J. Leggett, David L. Hicks, and Ron L. Szabo. "Semantic Data Modeling of Hypermedia Associations" in ACM Transactions on Information Systems (TOIS), 11(1), 27-49, January 1993.
[Soergel 1995] Dagobert Soergel. "The Art and Architecture Thesaurus (AAT): a critical appraisal" in Visual
Resources, 10(4), 369-400, 1995.
[Trigg 1986] Randall H. Trigg and Mark Weiser. "Textnet: A Network-based Approach to Text Handling" in ACM
Transactions on Office Information Systems (TOIS), 4(1), 1-23, January 1986.
[Tudhope 1994] Douglas Tudhope, Paul Beynon-Davies, Carl Taylor, and Chris B. Jones. "Virtual Architecture
Based on a Binary Relational Model: A Museum Hypermedia Application" in Hypermedia, 6(3), 174-192, 1994.
[Tudhope 1997] Douglas Tudhope and Carl Taylor. "Navigation via Similarity: Automatic Linking Based on Semantic Closeness" in Information Processing and Management, 33(2), 233-242, 1997.
[van Rijsbergen 1979] C. J. "Keith" van Rijsbergen. Information Retrieval. Butterworth, 1979.
[Weinstein 1998] Peter C. Weinstein. "Ontology-based metadata: transforming the MARC legacy" in Proceedings
of ACM Digital Libraries '98, 254-263, 1998.