Using Page Size for Controlling Duplicate Query Results in Semantic Web
International Journal of Web & Semantic Technology (IJWesT) Vol.4, No.2, April 2013
DOI: 10.5121/ijwest.2013.4205
Oumair Naseer (1), Ayesha Naseer (2), Atif Ali Khan (3), Humza Naseer (4)
(1) School of Engineering, University of Warwick, Coventry, UK - o.naseer@warwick.ac.uk
(2) Department of Computer Science, NUST, Islamabad, Pakistan - ayesha.naseer@emc.edu.pk
(3) School of Engineering, University of Warwick, Coventry, UK - atif.khan@warwick.ac.uk
(4) Business Intelligence Consultant, Bilytica Private Limited, Lahore, Pakistan - humza.naseer@bilytica.com
Abstract. Semantic web is a web of future. The Resource Description Framework (RDF) is a language
to represent resources in the World Wide Web. When these resources are queried the problem of duplicate
query results occurs. The present techniques used hash index comparison to remove duplicate query
results. The major drawback of using the hash index to remove duplicate query results is that, if there is a
slight change in formatting or word order, then hash index is changed and query results are no more
considered as duplicate even though they have same contents. We presented an algorithm for detection and
elimination of duplicate query results from semantic web using hash index and page size comparisons.
Experimental results showed that the proposed technique removed duplicate query results from semantic
web efficiently, solved the problems of using hash index for duplicate handling and could be embedded in
existing SQL-Based query system for semantic web. Research could be carried out for certain flexibilities
in existing SQL-Based query system of semantic web to accommodate other duplicate detection techniques
as well.
Keywords: Duplicate query results; Semantic web; Hash index; SQL-Based query system
1. Introduction
The volume of data on the web is growing day by day. The World Wide Web has made it necessary for users to locate their desired information resources and to assess their usage patterns [1], which motivates the need for server-side and client-side intelligent systems that can mine this data for knowledge successfully. The semantic web is an extension of the current web in which the semantics of information and services on the web are defined, making it possible for the web to understand and satisfy the requests of people and machines to use web content [2]. The advancement of the semantic web has given rise to several related problems, one of which is duplicate query results: multiple data sources referring to the same real-world entity lead to duplicate results. A duplicate can be defined as the exact same syntactic terms and sequences, without formatting differences or layout changes. The current SQL-based approach to querying Resource Description Framework (RDF) data on the semantic web does not handle the problem of duplicate query results generated from multiple data sources [3].
In existing techniques [3, 4] duplicate query results are removed by computing and comparing the hash indexes of the query results. The major drawback of using a hash index to remove duplicate query results is that even a slight change in formatting or word order changes the hash index, so the query results are no longer considered duplicates even though they have the same contents.
This paper presents a technique to remove duplicate query results. Its success depends on the distribution of document sizes: duplicates with slight changes in formatting are not detected through hash collisions alone, but they retain almost the same size. Using document size comparison along with the hash value therefore reduces the problems of relying on the hash index for duplicate detection. The technique has very low computational cost, reduces the memory requirements of the system, and can easily be embedded in the existing SQL-based query system for the semantic web.
2. Related Work
In [5] the authors presented a novel technique for estimating the degree of similarity among pairs of documents. The approach, called shingling, does not rely on any linguistic knowledge other than the ability to tokenize documents into a list of words, i.e., it is purely syntactic. In shingling, all word sequences of adjacent words are extracted. If two documents contain the same set of shingles they are considered equivalent, and if their sets of shingles appreciably overlap they are exceedingly similar. In order to reduce the set of shingles to a reasonably small size, the authors employed an unbiased deterministic sampling technique that reduces the storage requirements for retaining information about each document and also reduces the computational effort of comparing documents. The experiment was performed on a set of 30 million web pages obtained from an AltaVista crawl, which were grouped into clusters of extremely similar documents. Compact features for the detection of near-duplicates in distributed retrieval are presented in [6]. In [7], the authors describe an approach to distinguishing between different URLs having similar text. In [8], the authors describe efficient crawling through URL ordering. In [9], the authors describe an approach to finding replicated web collections. Materialized views in Oracle are presented in [10].
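To make the shingling idea of [5] concrete, the short Python sketch below extracts w-word shingles from two documents and estimates their resemblance as the Jaccard overlap of the shingle sets. The function names, the shingle length of 4 words, and the sample strings are illustrative assumptions and are not taken from [5].

# Illustrative sketch of shingling in the spirit of [5]; names and parameters
# are assumptions for demonstration, not the original implementation.
def shingles(text, w=4):
    """Return the set of w-word shingles of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}

def resemblance(doc_a, doc_b, w=4):
    """Jaccard overlap of shingle sets; 1.0 means the sets are identical."""
    a, b = shingles(doc_a, w), shingles(doc_b, w)
    return len(a & b) / len(a | b) if a | b else 1.0

d1 = "the semantic web is an extension of the current web"
d2 = "the semantic  web is an extension of the current web"   # formatting change only
print(resemblance(d1, d2))   # prints 1.0, so the pair would be treated as near-duplicates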
In [4] the author proposed a pass-through architecture using hash techniques to remove duplicate query results. The system receives a query from the user and issues it to a first data source. After receiving the result of the first query from the first data source, the hash index is calculated for the first query result and the result is passed to the user. The system then receives the results of the second query and calculates the hash index of the second query result. The first hash value is compared with the second hash value to check for duplicate query results. If there is a hash collision, the first data source is queried for the results of the second query, and if the first data source contains data for the second query, the second query result is considered a duplicate and discarded.
In [3], the authors proposed a SQL table function, RDF_MATCH, to query RDF data, which can search and infer from RDF data available on the semantic web and enables further processing using standard SQL constructs [11-13]. The structure of the RDF_MATCH function [14-16] enables it to capture a graph pattern to be searched, the RDF models and rule bases consisting of the RDF data to be queried, and then provide the query results based on inference rules [17-19]. The RDF_MATCH function is implemented by generating SQL queries against tables that contain RDF data.
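As a rough illustration of how a graph pattern can be answered by generating SQL over a table of RDF triples, the hypothetical sketch below translates a two-triple pattern into a self-join query. The table name triples, its column names, and the pattern syntax are assumptions made for this example; they do not reproduce the actual RDF_MATCH signature or the SQL it generates.

# Hypothetical sketch: turning a triple pattern into SQL self-joins over a
# generic triples(subject, predicate, object) table. This only illustrates the
# idea of SQL generation, not the Oracle RDF_MATCH internals.
def pattern_to_sql(pattern):
    """pattern is a list of (subject, predicate, object); '?name' marks variables."""
    selects, froms, wheres, bound = [], [], [], {}
    for i, (s, p, o) in enumerate(pattern):
        alias = f"t{i}"
        froms.append(f"triples {alias}")
        for col, term in (("subject", s), ("predicate", p), ("object", o)):
            if term.startswith("?"):                 # variable: project or join
                if term in bound:
                    wheres.append(f"{alias}.{col} = {bound[term]}")
                else:
                    bound[term] = f"{alias}.{col}"
                    selects.append(f"{bound[term]} AS {term[1:]}")
            else:                                    # constant: filter
                wheres.append(f"{alias}.{col} = '{term}'")
    sql = "SELECT " + ", ".join(selects) + " FROM " + ", ".join(froms)
    return sql + (" WHERE " + " AND ".join(wheres) if wheres else "")

print(pattern_to_sql([("?x", "rdf:type", "ex:Student"), ("?x", "ex:age", "?age")]))
# SELECT t0.subject AS x, t1.object AS age FROM triples t0, triples t1
# WHERE t0.predicate = 'rdf:type' AND t0.object = 'ex:Student'
#   AND t1.subject = t0.subject AND t1.predicate = 'ex:age'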
3. Methodology
Subject-property matrix materialized join views and indexes on the data and rule bases are used to improve efficiency, and further kernel enhancements have been provided to reduce run-time overheads [20-24]. This query scheme efficiently retrieves RDF data on the semantic web but does not handle the duplicate query results problem. The proposed methodology addresses this with the following steps.
Step 1: Issuing the first query to the first data source
In this first step the system receives a first query from the user as input and passes it to the first data source to obtain the first query results as output.
Step 2: Computation of the hash index and page size of the first query result
After receiving the results from the first data source, we compute the hash index of the result and then the page size of the first query result. The calculated hash index and page size of the first query result are stored in a hash table along with a pointer to the first data source. After this information is stored in the hash table, the first query result is passed to the user.
Step 3: Issuing the second query to the second data source
In the third phase the system receives a second query from the user as input and passes it to the second data source to obtain the second query results as output.
Step 4: Computation of the hash index and page size of the second query result
After receiving the results from the second data source, we compute the hash index of the result and then the page size of the second query result. The calculated hash index and page size of the second query result are stored in the hash table along with a pointer to the second data source, as sketched in the code after these steps.
Step 5: Comparison of hash indexes
The system then compares the hash indexes of the first and second query results. If the first hash index collides with the second hash index, the first data source is queried for the results of the second query. If the first data source contains data in response to the second query, the second query results are considered duplicates and are discarded.
Step 6: Comparison of page sizes
If the first and second hash indexes are not the same, the page sizes of the first and second query results are compared. If the page sizes are the same or vary within the threshold of 0 to 50 KB, the first data source is queried for the results of the second query. If the first data source contains data in response to the second query, the second query results are considered duplicates and are discarded.
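A minimal sketch of the bookkeeping in Steps 2 and 4 is given below, assuming the hash index is an MD5 digest of the result content and the page size is its length in bytes; the class name, field names, and sample strings are illustrative assumptions rather than details of the original system.

import hashlib

# Illustrative bookkeeping for Steps 2 and 4: fingerprint a query result with
# an MD5 hash index and a page size, and record them in a hash table together
# with a pointer to the data source the result came from.
class FingerprintTable:
    def __init__(self):
        self.entries = []           # (hash_index, page_size_bytes, source_pointer)

    def add(self, content, source_pointer):
        data = content.encode("utf-8")
        hash_index = hashlib.md5(data).hexdigest()   # Step 2/4: compute hash index
        page_size = len(data)                        # Step 2/4: compute page size
        self.entries.append((hash_index, page_size, source_pointer))
        return hash_index, page_size

table = FingerprintTable()
table.add("<html>first query result</html>", source_pointer="data_source_1")
table.add("<html>second query result</html>", source_pointer="data_source_2")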
3.1 Algorithm
The modified processing steps for the approach of [3] can be written in algorithmic form as follows; a Python sketch of the same loop is given after the pseudocode.
ALGORITHM: Building a module to remove duplicate query results from the semantic web.
INPUT: Query results of RDF data from a semantic data source.
OUTPUT: Query results free of duplicates.
STEP 1: /* Display the first query results to the user after receiving the query. */
1.1: READ the user query / problem statement.
1.2: WRITE the query to the data source.
1.3: READ the query results.
1.4: COMPUTE the hash index.
1.5: COMPUTE the page size.
1.6: SAVE the hash index, page size, and pointer in the hash table.
1.7: DISPLAY the results to the user.
STEP 2: /* Display the further query results to the user after removing duplicate results. */
2.1: WHILE not end of query results DO
2.2: READ the query results.
2.3: COMPUTE the hash index.
2.4: COMPUTE the page size.
2.5: IF the hash index from STEP 2.3 IS EQUAL to the hash index from STEP 1.4 THEN DISCARD the result.
2.6: ELSE IF the page size from STEP 2.4 IS EQUAL to the page size from STEP 1.5 THEN DISCARD the result.
ELSE GOTO STEP 1.6
END IF
END WHILE
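Read as executable code, the pseudocode above corresponds roughly to the loop below: the first result is fingerprinted and displayed, and each further result is discarded when its hash index matches a stored one, or when its page size falls within the 0 to 50 KB threshold of a stored entry and the earlier data source confirms that it holds the same data. The function names, the result format, and the confirmation stub are assumptions for illustration only.

import hashlib

SIZE_THRESHOLD = 50 * 1024          # the 0 to 50 KB page-size threshold of Step 6

def fingerprint(content):
    """MD5 hash index and page size in bytes of a query result."""
    data = content.encode("utf-8")
    return hashlib.md5(data).hexdigest(), len(data)

def remove_duplicates(results, confirm_with_source=lambda source, content: True):
    """Yield query results that are not considered duplicates.

    results is an iterable of (content, source_pointer) pairs; the first pair
    plays the role of STEP 1 and later pairs are handled as in STEP 2.
    confirm_with_source stands in for re-querying the earlier data source
    before discarding (Steps 5 and 6) and is stubbed out here.
    """
    seen = []                       # (hash_index, page_size, source_pointer)
    for content, source in results:
        h, size = fingerprint(content)
        is_duplicate = any(
            (h == h0 or abs(size - s0) <= SIZE_THRESHOLD)
            and confirm_with_source(src0, content)
            for h0, s0, src0 in seen
        )
        if is_duplicate:
            continue                                # 2.5 / 2.6: DISCARD the result
        seen.append((h, size, source))              # 1.6: SAVE in the hash table
        yield content                               # 1.7: DISPLAY to the user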
3.2 Query Handling
The detailed description of the process is as follows:
Figure 1: Dataflow diagram of proposed algorithm.
The proposed algorithm involves receiving the first query from the user and issuing it to a first data source. After receiving the result of the first query from the first data source, the hash index and page size are calculated for the first query result and the result is passed to the user. The system then receives the results of the second query and calculates the hash index and page size of the second query result. The first hash value is compared with the second hash value to check for duplicate query results. If there is a hash collision, the first data source is queried for the results of the second query, and if the first data source contains data for the second query, the second query result is considered a duplicate and is discarded. If the first and second hash indexes are not the same, the first page size is compared with the second page size. If the page sizes are the same or vary within the threshold of 0 to 50 KB, the first data source is again queried for the results of the second query, and if it contains data for the second query, the second query result is considered a duplicate and is discarded. The proposed technique is an enhancement of the SQL-based scheme for querying RDF data on the semantic web; it extends its functionality to remove duplicate query results.
4. Results
We now present the results of the proposed technique. Its success depends on the distribution of document sizes, since duplicates of the same size that differ only in slight changes in formatting and word order are not detected through hash collisions alone. We performed an experiment on a data set of 16 randomly collected web pages; the links of the collected web pages are given in the following tables. We calculated the hash index and page size values for the data set, then ran the proposed algorithm on this data set, which updates the formatting and word order of the web pages, and computed the hash index and page size values of the same data set again. Tables 1 and 2 and Figures 2 and 3 show the computed values and their variations.
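The effect that the experiment measures can be reproduced in miniature: in the snippet below a small formatting change (extra whitespace) changes the MD5 hash completely while the byte size barely moves. The sample strings are invented for illustration and are unrelated to the 16 pages listed in Table 1.

import hashlib

# Toy illustration of the experiment: a formatting change flips the MD5 hash
# index while the page size (in bytes) stays essentially the same.
original    = "Download WinRAR 32 bit for Windows today"
reformatted = "Download WinRAR 32 bit  for  Windows today"   # two extra spaces

for label, page in (("original", original), ("reformatted", reformatted)):
    data = page.encode("utf-8")
    print(label, hashlib.md5(data).hexdigest(), len(data), "bytes")
# The two hashes share nothing, but the sizes differ by only 2 bytes, which is
# why the page-size comparison still flags the pair as potential duplicates.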
Table 1: Download links of web pages, their corresponding hash values and page sizes.
No. | Download Link of Web Page | MD5 Hash Value | Page Size
1 | http://download.cnet.com/WinRAR-32-bit/3000-2250_4-10007677.html | 123D463728394FDB876A6534B | 31.1 KB
2 | http://www.hotmail.co.uk | 35645623FBD8786A667F76B8A | 20.5 KB
3 | https://mail.google.com/mail/u/0/?shva=1 | 4565778898965320B98A9F5D12 | 20.7 KB
4 | http://www.winzip.com/index.htm?sc_cid=go_uk_b_search_wz_brand | 768735D6544A4433B767F24432 | 61.5 KB
5 | http://download.cnet.com/Skype/3000-2349_4-10225260.html | 87B68634AF8768B76767D87676 | 265 KB
6 | http://www.google.com | 40877123B656D565F564356A11 | 368 KB
7 | http://vlc-media-player.todownload.com/?gclid=CO7D44CR0bQCFaTKtAodbg0ATA | 3635B78987F56567D434A989B1 | 87.5 KB
8 | http://www.youtube.com | 837735299BFD26536723A565FA | 5.57 KB
9 | http://www.uet.edu.pk | 9897860B798790D098A454F767 | 8.57 KB
10 | http://www.mcs.edu.pk | 365236B6767F5465A766D5767F | 3.66 KB
11 | http://www.rect.edu.pk | 213625371825637BFDA768C76C | 24.5 KB
12 | http://www.jackson.com | 5656C67B8768F97898D7878A19 | 32.7 KB
13 | http://www.w3.org/TR/rdf-primer/ | 986765320BCFA565DF6767C767 | 65.4 KB
14 | http://www.bru.com | 386827368BFDAC79878CDA897 | 92.3 KB
15 | http://www.education.com | 213567153BFDA987897CFCA92 | 89.7 KB
16 | http://www.baqai.com | 387867FCDA87878CFDA7887B6 | 91.9 KB
Table 2: Hash values and page sizes of web pages after changes in formatting and word order.
No. | Duplicate Web Page | MD5 Hash Value | Page Size
1 | WinRAR Web Installation Page | 27BA8126354FCD345235CF | 31.1 KB
2 | Hotmail Web Page | 123546AAF786534Act67325 | 20.5 KB
3 | Gmail Web Page | 2443547FCA544335DF678A | 20.7 KB
4 | WinZip Web Installation Page | 10293645ACF534627DCB878 | 61.5 KB
5 | Skype Web Page | 23454789CFD4234ACB878F | 265 KB
6 | Google Web Page | 34546678FC5D4A099B2431 | 368 KB
7 | VLC Web Installation Page | 12343241CD6789AD98976B | 87.5 KB
8 | YouTube Web Page | 3546728CFB8767B6754FA6 | 5.57 KB
9 | UET Web Page | 234DCF655F665A776B556A | 8.57 KB
10 | MCS Web Page | 778644AA5F5DD4434B56AF | 3.66 KB
11 | RECT Web Page | 2341325364786FCDA565DF5 | 24.5 KB
12 | Jackson Web Page | 0902365542536ADF675B78A | 32.7 KB
13 | RDF Primer Page | 534098FB7653455AD89B7D | 65.4 KB
14 | Bru Web Page | 23416354688439DCF67A5BF | 92.3 KB
15 | Education Web Page | 2234B564840986AF37846BA | 89.7 KB
16 | Baqai Web Page | 455463787846546BFCAD677 | 91.9 KB
Figure 2: Variation of page size and hash values of collected data set of original web pages before
formatting changes.
Figure 3: Variation of page size and hash values of collected data set of web pages with formatting
changes.
In Figure 2 the thick line represents the hash indexes of the original web pages and the thin line represents their page size values. In Figure 3 the thick line represents the hash values of the collected web pages after formatting changes, while the thin line shows their page size values after formatting changes. Comparing the two figures shows that the page size values of the web pages remain the same after formatting changes, as shown by the thin lines in Figures 2 and 3, while the hash indexes change with formatting changes in the web pages, as shown by the thick lines. Thus, with formatting changes the page sizes remain similar while the hash index values change even though the contents of the web pages are similar. It can be concluded that using page size comparison along with the hash index solves the problems of using the hash index alone for duplicate detection. It does not make the system computationally complex, reduces the memory requirements of the system, and can easily be embedded in the existing SQL-based query system for the semantic web.
5. Conclusions
A technique to remove duplicate query results on the semantic web using hash index and page size comparison has been presented in this paper. The proposed technique involves receiving the query from the user and issuing it to a first data source. After receiving the result of the first query from the first data source, the hash index and page size are calculated for the first query result and the result is passed to the user. The system then receives the results of the second query and calculates the hash index and page size of the second query result. The first hash value is compared with the second hash value to check for duplicate query results. If there is a hash collision, the first data source is queried for the results of the second query, and if the first data source contains data for the second query, the second query result is considered a duplicate and is discarded. If the first and second hash indexes are not the same, the first page size is compared with the second page size. If the page sizes are the same or vary within the threshold of 0 to 50 KB, the first data source is again queried for the results of the second query, and if it contains data for the second query, the second query result is considered a duplicate and is discarded.
6. Future Recommendations
Using page size comparison along with the hash index solves the problems of relying on the hash index alone for duplicate detection. It does not make the system computationally complex and it reduces the memory requirements of the system. The proposed technique is an enhancement of the SQL-based scheme for querying RDF data on the semantic web; it further extends its functionality to remove duplicate query results.
Research can be carried out on certain flexibilities in the existing SQL-based query system of the semantic web to accommodate other duplicate detection techniques as well. Optimizing the self-join queries that usually occur when querying RDF data could further improve the efficiency of query processing on the semantic web.
References
[1] Berners-Lee, T., Hendler, J., and Lassila, O. "The semantic web." Scientific American, 284(5), pp. 28-37, 2001.
[2] Dumbill, E. "Building the semantic web." XML.com, 2001.
[3] Chong, E. I., et al. "An efficient SQL-based RDF querying scheme." Proceedings of the 31st International Conference on Very Large Data Bases, VLDB Endowment, 2005.
[4] Guha, R. V. "Pass-through architecture via hash techniques to remove duplicate query results." U.S. Patent No. 6,081,805, 27 Jun. 2000.
[5] Broder, A., Glassman, S., Manasse, M., and Zweig, G. "Syntactic clustering of the Web." 6th International World Wide Web Conference, pp. 393-404, 1997.
[6] Bernstein, Y., Shokouhi, M., and Zobel, J. "Compact features for detection of near-duplicates in distributed retrieval." Proceedings of the String Processing and Information Retrieval Symposium, Glasgow, Scotland, 2006.
[7] Bar-Yossef, Z., Keidar, I., and Schonfeld, U. "Do not crawl in the DUST: different URLs with similar text." 16th International World Wide Web Conference, Alberta, Canada, Data Mining Track, 8-12 May 2007.
[8] Cho, J., Garcia-Molina, H., and Page, L. "Efficient crawling through URL ordering." Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 161-172, 1998.
[9] Cho, J., Shivakumar, N., and Garcia-Molina, H. "Finding replicated web collections." ACM SIGMOD Record, Vol. 29, No. 2, pp. 355-366, June 2000.
[10] Bello, R. G., et al. "Materialized views in Oracle." Proceedings of the International Conference on Very Large Data Bases, Vol. 24, 1998.
[11] Taylor, C. "An introduction to metadata." University of Queensland Library, 07-29, 2003.
[12] Bizer, C., and Cyganiak, R. "D2R Server - publishing relational databases on the semantic web." 5th International Semantic Web Conference, 2006.
[13] Karvounarakis, G., et al. "RQL: a declarative query language for RDF." Proceedings of the 11th International Conference on World Wide Web, ACM, 2002.
[14] Wilkinson, K., et al. "Efficient RDF storage and retrieval in Jena2." Proceedings of SWDB, Vol. 3, 2003.
[15] Haase, P., et al. "A comparison of RDF query languages." The Semantic Web - ISWC 2004, pp. 502-517, 2004.
[16] Miller, L., Seaborne, A., and Reggiori, A. "Three implementations of SquishQL, a simple RDF query language." The Semantic Web - ISWC 2002, pp. 423-435, 2002.
[17] Prud'hommeaux, E., and Seaborne, A. "SPARQL query language for RDF." 2006.
[18] Shivakumar, N., and Garcia-Molina, H. "Finding near-replicas of documents on the web." The World Wide Web and Databases, pp. 204-212, 1999.
[19] Shivakumar, N., and Garcia-Molina, H. "SCAM: a copy detection mechanism for digital documents." 1995.
[20] Denehy, T. E., and Hsu, W. W. "Duplicate management for reference data." Computer Sciences Department, University of Wisconsin, and IBM Research Division, Almaden Research Center, Research Report, 2003.
[21] Honicky, R. J., and Miller, E. L. "A fast algorithm for online placement and reorganization of replicated data." Proceedings of the International Parallel and Distributed Processing Symposium, IEEE, 2003.
[22] Gomes, D., Santos, A. L., and Silva, M. J. "Managing duplicates in a web archive." Proceedings of the 2006 ACM Symposium on Applied Computing, ACM, 2006.
[23] Narayana, V. A., Premchand, P., and Govardhan, A. "Effective detection of near duplicate web documents in web crawling." International Journal of Computational Intelligence Research, 5(1), pp. 83-96, 2009.
[24] Su, W., Wang, J., and Lochovsky, F. H. "Record matching over query results from multiple web databases." IEEE Transactions on Knowledge and Data Engineering, 22(4), pp. 578-589, 2010.
Corresponding Author:
Oumair Naseer
Department of Computer Science,
University of Warwick, Coventry, United Kingdom
o.naseer@warwick.ac.uk