Context based Web Indexing for Storage of Relevant Web Pages
ABSTRACT
• A focused crawler downloads web pages that are relevant to a user-specified topic.
• This paper proposes a technique for indexing the keywords extracted from web documents along with their contexts. It uses a height-balanced binary search (AVL) tree for indexing in order to enhance the performance of the retrieval system.
INTRODUCTION
• The basic aim is to select the best collection of information according to the user's needs.
• Existing focused crawlers do not analyze the context of a keyword in a web page before downloading it.
• The use of an AVL tree
AVL Tree
• In an AVL (height-balanced binary search) tree [3], the height of a tree is defined as the length of the longest path from the root node of the tree to one of its leaf nodes.
• The balance factor (BF) of a node is: (height of left subtree - height of right subtree). An AVL tree is balanced when the BF of every node is -1, 0 or 1.
• This strategy keeps the height of the tree logarithmic in the number of nodes, so a search needs only O(log n) comparisons for n keywords, which makes searching faster.
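To make the height and BF definitions concrete, here is a minimal Python sketch. It assumes the slide's convention that height counts edges on the longest root-to-leaf path (so a leaf has height 0 and an empty subtree -1); the `Node` class and function names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    keyword: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(node: Optional[Node]) -> int:
    """Edges on the longest root-to-leaf path; -1 for an empty subtree."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node: Node) -> int:
    """BF = height of left subtree - height of right subtree."""
    return height(node.left) - height(node.right)


def is_avl_balanced(node: Optional[Node]) -> bool:
    """True if the BF of every node is -1, 0 or 1."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_avl_balanced(node.left)
            and is_avl_balanced(node.right))


balanced = Node("index", Node("crawler"), Node("web"))
chain = Node("crawler", right=Node("index", right=Node("web")))
print(height(balanced), balance_factor(balanced), is_avl_balanced(balanced))  # 1 0 True
print(height(chain), balance_factor(chain), is_avl_balanced(chain))           # 2 -2 False
```

The right-leaning chain is what a plain binary search tree degenerates into when keywords arrive in sorted order; its root BF of -2 is exactly the condition the AVL rules forbid.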
RELATED WORK
• F. Silvestri, R. Perego and Orlando proposed a reordering algorithm.
• Oren Zamir and Oren Etzioni proposed a threshold-based clustering algorithm.
• C. Zhou, W. Ding and Na Yang introduced a double indexing mechanism for search engines based on a campus net.
• N. Chauhan and A. K. Sharma proposed the context driven focused crawler (CDFC).
• P. Gupta and A. K. Sharma worked on context based indexing in search engines using ontology.
PROPOSED WORK
• This paper proposes an algorithm for indexing the keywords extracted from web documents along with their context.
• The indexing technique uses a height-balanced binary search (AVL) tree. In addition to improving the performance of information retrieval, this data structure supports dynamic indexing, which is especially important in environments where documents change frequently.
Architecture of context based indexing.
Context Based Retrieval Interface
Steps involved in the construction of the context based index using an AVL tree.
• Step 1: Preprocess the crawled web documents and extract the keywords along with their frequency of occurrence.
• Step 2: Input the keywords to the context generator, which extracts the multiple contextual senses of each word. The context is looked up in a thesaurus (an online dictionary of words from thesaurus.com, which contains words as well as their multiple meanings).
• Step 3: The keywords, along with their contexts, are indexed using the AVL tree.
• Step 4: Compare the entered keyword with the keyword field of the AVL tree's nodes until a matching word is found.
• Step 5: If the search is unsuccessful, create a node containing the fields (leftchild, keyword, rightchild, link), as shown in Figure 4. The link is a pointer variable that points to the database where the context of the keyword and the corresponding document_id are stored.
• Step 6: Arrange the node in the AVL tree according to the balance factor (BF), rebalancing the tree if necessary.
• Step 7: Repeat Steps 4, 5 and 6 until all the extracted keywords are arranged.
• Step 8: When the user fires a query with the context explicitly specified, the index is searched. Because the tree stays balanced, the search needs O(log n) comparisons for n indexed keywords instead of the O(n) of a linear search.
• Step 9: Thus, the AVL indexing technique provides fast access to document context and structure.
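Steps 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `STOP_WORDS` and `THESAURUS` are hypothetical stand-ins (the slides consult thesaurus.com but do not say how), and tokenization is a simple lowercase regular expression.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real preprocessor would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "on", "for"}

# Hypothetical local stand-in for the thesaurus lookup described in Step 2.
THESAURUS = {
    "bank": ["financial institution", "river side"],
    "mouse": ["rodent", "pointing device"],
}


def preprocess(text: str) -> Counter:
    """Step 1: tokenize, drop stop words, count keyword frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def generate_contexts(keywords: Counter) -> dict[str, tuple[int, list[str]]]:
    """Step 2: attach every contextual sense the thesaurus lists for each keyword."""
    return {kw: (freq, THESAURUS.get(kw, [])) for kw, freq in keywords.items()}


doc = "The mouse ran to the bank of the river; the mouse hid."
for kw, (freq, senses) in sorted(generate_contexts(preprocess(doc)).items()):
    print(kw, freq, senses)
```

Keywords with no thesaurus entry keep an empty context list; how the paper treats such words is not stated on the slides.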
Node structure.
Create_BST()   // initially the tree is empty
{
    create a new node containing the fields (leftchild, keyword, rightchild, link)
    leftchild  = NULL
    rightchild = NULL
    link       = address of the database where the context and the corresponding document_id are stored
    Insert_node();
}

Insert_node()
{
    Check whether the value in the current node and the new keyword value are equal.
    If so, a duplicate is found and no new node is created.
    Otherwise, if the new keyword value is less than the current node's value:
        if the current node has no left child, the place for insertion has been found;
        otherwise, handle the left child with the same algorithm.
        Compute_height();
    If the new keyword value is greater than the current node's value:
        if the current node has no right child, the place for insertion has been found;
        otherwise, handle the right child with the same algorithm.
        Compute_height();
}
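The Insert_node() pseudocode stops at Compute_height(), while Step 6 also requires rearranging unbalanced nodes. The following minimal Python sketch combines both, assuming the standard AVL single and double rotations (the slides do not spell them out). The `link` field is modelled as an in-memory list of (context, document_id) records rather than a database address, and appending a record on a duplicate keyword is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keyword: str
    # Assumption: link holds (context, document_id) records in memory.
    link: list = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 0  # a leaf has height 0


def h(node):
    return node.height if node else -1


def compute_height(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    compute_height(y)
    compute_height(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    compute_height(x)
    compute_height(y)
    return y


def insert_node(node, keyword, context, doc_id):
    """BST insertion, then Compute_height() and AVL rebalancing on the way up."""
    if node is None:
        return Node(keyword, [(context, doc_id)])
    if keyword == node.keyword:
        node.link.append((context, doc_id))  # duplicate: no new node
        return node
    if keyword < node.keyword:
        node.left = insert_node(node.left, keyword, context, doc_id)
    else:
        node.right = insert_node(node.right, keyword, context, doc_id)
    compute_height(node)
    bf = h(node.left) - h(node.right)
    if bf > 1:                             # left-heavy
        if keyword > node.left.keyword:    # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf < -1:                            # right-heavy
        if keyword < node.right.keyword:   # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def in_order(node):
    if node:
        yield from in_order(node.left)
        yield node
        yield from in_order(node.right)


root = None
for kw, ctx, doc in [("web", "internet", 1), ("index", "catalogue", 2),
                     ("crawler", "spider", 3), ("bank", "finance", 4),
                     ("bank", "river side", 5)]:
    root = insert_node(root, kw, ctx, doc)

print([n.keyword for n in in_order(root)])  # ['bank', 'crawler', 'index', 'web']
print(root.keyword, root.height)            # index 2
```

Inserting "crawler" after "web" and "index" makes "web" left-heavy (BF 2), so a right rotation lifts "index" to the root; this is the rearrangement Step 6 refers to.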
The rearrangement of the node can eliminate the imbalance. 
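A single left rotation illustrates this rearrangement. The sketch below is illustrative only, with hypothetical names: three keywords inserted in sorted order form a right-leaning chain whose root has BF -2, and rotating the root leaves a tree of height 1 with BF 0 while preserving in-order (alphabetical) order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    keyword: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(n: Optional[Node]) -> int:
    return -1 if n is None else 1 + max(height(n.left), height(n.right))


def rotate_left(x: Node) -> Node:
    """Lift x's right child to the top; x becomes its left child."""
    y = x.right
    x.right = y.left  # y's left subtree keeps its in-order position
    y.left = x
    return y


chain = Node("bank", right=Node("crawler", right=Node("index")))
print(height(chain.left) - height(chain.right))  # -2
fixed = rotate_left(chain)
print(fixed.keyword, fixed.left.keyword, fixed.right.keyword)  # crawler bank index
print(height(fixed))  # 1
```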
Representation of keywords using binary search tree
Example
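For the keyword representation and the context-qualified query of Step 8, a plain binary search tree is enough to show the lookup path. In this minimal sketch the `link` records, the document IDs and the exact-match context filter are all assumptions for illustration; the AVL version would only add rebalancing on insertion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keyword: str
    link: list = field(default_factory=list)  # hypothetical (context, document_id) records
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, keyword, context, doc_id):
    """Plain (unbalanced) BST insertion keyed on the keyword."""
    if root is None:
        return Node(keyword, [(context, doc_id)])
    if keyword == root.keyword:
        root.link.append((context, doc_id))
    elif keyword < root.keyword:
        root.left = insert(root.left, keyword, context, doc_id)
    else:
        root.right = insert(root.right, keyword, context, doc_id)
    return root


def search(root, keyword, context):
    """Step 8: find the keyword node, then keep documents whose context matches."""
    node = root
    while node is not None and node.keyword != keyword:
        node = node.left if keyword < node.keyword else node.right
    if node is None:
        return []
    return [doc for ctx, doc in node.link if ctx == context]


root = None
for kw, ctx, doc in [("mouse", "rodent", "d1"), ("mouse", "pointing device", "d2"),
                     ("bank", "finance", "d3"), ("web", "internet", "d4"),
                     ("mouse", "pointing device", "d5")]:
    root = insert(root, kw, ctx, doc)

print(search(root, "mouse", "pointing device"))  # ['d2', 'd5']
print(search(root, "bank", "river side"))        # []
```

Storing the context beside each keyword is what lets the query for "mouse" in the sense "pointing device" skip the document about rodents.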
CONCLUSION
• This paper proposes a technique for indexing the keywords extracted from web documents along with their context.
• The AVL tree based indexing technique supports dynamic indexing and improves accuracy and efficiency in retrieving more relevant documents according to the user's requirements, since the context of each keyword is stored along with it.
• Thus, the indexing technique provides fast access to document context and structure, along with optimized searching.
Thank You

Electric Fetus - Record Store Scavenger Hunt
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
 
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptxChapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptx
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
 

Context based Web Indexing for Storage of Relevant Web Pages

  • 1. Context based Web Indexing for Storage of Relevant Web Pages
  • 2. ABSTRACT • A focused crawler downloads web pages that are relevant to a user-specified topic. • This paper proposes a technique for indexing the keywords extracted from web documents along with their contexts. It uses a height-balanced binary search tree (AVL tree) as the index structure to improve the performance of the retrieval system.
  • 3. INTRODUCTION • The basic aim is to select the best collection of information according to the user's needs. • Existing focused crawlers do not analyze the context of a keyword in a web page before downloading the page. • The use of an AVL tree for indexing
  • 4. AVL Tree • In an AVL tree (i.e. a height-balanced binary search tree) [3], the height of a tree is defined as the length of the longest path from the root node to one of its leaf nodes. • The balance factor (BF) of a node is: BF = height of left subtree - height of right subtree. An AVL tree is balanced when the BF of every node is -1, 0 or 1. • Keeping the tree balanced bounds its height to O(log n) for n keywords, so a search needs only O(log n) comparisons, which makes searching faster.
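The height and balance-factor definitions can be made concrete with a short sketch. The names `Node`, `height`, `balance_factor` and `is_avl_balanced` are illustrative, not taken from the paper. Following the slide, height counts the edges on the longest root-to-leaf path; we assume an empty subtree has height -1 so that a single leaf has height 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    keyword: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(node: Optional[Node]) -> int:
    """Length (in edges) of the longest path from this node down to a leaf.

    Assumption: an empty subtree has height -1, so a leaf has height 0.
    """
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node: Node) -> int:
    """BF = height of left subtree - height of right subtree."""
    return height(node.left) - height(node.right)


def is_avl_balanced(node: Optional[Node]) -> bool:
    """True when every node in the tree has a BF of -1, 0 or 1."""
    if node is None:
        return True
    return (
        balance_factor(node) in (-1, 0, 1)
        and is_avl_balanced(node.left)
        and is_avl_balanced(node.right)
    )
```

For example, `Node("mouse", Node("bank"), Node("tree"))` has BF 0 and is balanced, while the chain `Node("tree", Node("mouse", Node("bank")))` has BF 2 at its root and is not. This version recomputes heights recursively for clarity; a practical AVL tree stores each node's height and updates it during insertion.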
  • 5. RELATED WORK • F. Silvestri, R. Perego and Orlando proposed the reordering algorithm. • Oren Zamir and Oren Etzioni proposed a threshold-based clustering algorithm. • C. Zhou, W. Ding and Na Yang introduced a double indexing mechanism for search engines based on campus Net. • N. Chauhan and A. K. Sharma proposed the context-driven focused crawler (CDFC). • P. Gupta and A. K. Sharma worked on context-based indexing in search engines using ontology.
  • 6. PROPOSED WORK • This paper proposes an algorithm for indexing the keywords extracted from web documents along with their context. • The indexing technique uses a height-balanced binary search (AVL) tree. In addition to improving the performance of information retrieval, this data structure supports dynamic indexing, which is especially important in environments where documents change frequently.
  • 7. Architecture of context based indexing.
  • 9. Steps involved in the construction of the context-based index using an AVL tree. • Step 1: Preprocess the crawled web documents and extract the keywords along with their frequency of occurrence. • Step 2: Input the keywords to the context generator, which extracts the multiple contextual senses of each word. The context is looked up in a thesaurus (an online dictionary from thesaurus.com that lists words together with their multiple meanings). • Step 3: Index the keywords along with their contexts using the AVL tree. • Step 4: Starting at the root, compare the entered keyword with the keyword field of the AVL tree nodes until a matching word is found. • Step 5: If the search is unsuccessful, create a node containing the fields (leftchild, keyword, rightchild, link) as shown in figure 4. The link is a pointer to the database where the context of the keyword and the corresponding document_id are stored. • Step 6: Place the node in the AVL tree and rearrange the tree according to the balance factor (BF).
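Steps 1 and 2 might look roughly like the following minimal Python sketch. It is an illustration, not the paper's implementation: the stop-word list and the `THESAURUS` dictionary are hypothetical placeholders (the slides query thesaurus.com, and no real thesaurus API is called here), and tokenization is a simple lowercase regex split.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real preprocessor would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "on", "to", "is", "at", "by"}

# Hypothetical in-memory stand-in for the thesaurus lookup: each keyword
# maps to its possible contextual senses.
THESAURUS = {
    "bank": ["financial institution", "river side"],
    "mouse": ["rodent", "computer device"],
}


def extract_keywords(text: str) -> Counter:
    """Step 1: tokenize, drop stop words, and count keyword frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def generate_contexts(keywords) -> dict:
    """Step 2: attach every known contextual sense to each keyword.

    Keywords without a thesaurus entry get an empty list of senses.
    """
    return {kw: THESAURUS.get(kw, []) for kw in keywords}
```

For the text "The bank of the river and the bank on Main Street", `extract_keywords` counts "bank" twice and "river", "main" and "street" once each; `generate_contexts` then attaches both senses of "bank" and empty sense lists to the others.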
  • 10. Steps involved in the construction of the context-based index using an AVL tree. • Step 7: Repeat steps 4, 5 and 6 until all the extracted keywords are arranged. • Step 8: When the user fires a query with the context explicitly specified, the index is searched; the paper states that this reduces search time to half that of a linear search (a balanced tree needs only O(log n) comparisons, compared with O(n) for a linear scan). • Step 9: Thus, the AVL indexing technique provides fast access to document context and structure.
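Step 8 can be sketched as an ordinary binary-search-tree lookup followed by a context filter. This is a sketch under stated assumptions, not the paper's code: the `link` field is modelled as an in-memory list of hypothetical `(context, document_id)` pairs instead of a database pointer, and the function also returns the number of keyword comparisons so the search cost is visible.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keyword: str
    # Assumption: stand-in for the "link" field, which the slides describe
    # as a pointer to a database of (context, document_id) records.
    link: List[Tuple[str, int]] = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root: Optional[Node], keyword: str, context: str):
    """Find `keyword` in the tree, then keep only documents whose stored
    context equals the context the user specified.

    Returns (document_ids, number_of_keyword_comparisons).
    """
    node, comparisons = root, 0
    while node is not None:
        comparisons += 1
        if keyword == node.keyword:
            docs = [doc_id for ctx, doc_id in node.link if ctx == context]
            return docs, comparisons
        # Smaller keywords live in the left subtree, larger in the right.
        node = node.left if keyword < node.keyword else node.right
    return [], comparisons
```

With "mouse" at the root, "bank" as its left child and "tree" as its right child, a query for "bank" in the context "financial institution" reaches the matching node after two comparisons. The comparison count is bounded by the tree height, which stays O(log n) when the tree is kept balanced.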
  • 12. Node structure.
    Create_BST()   // initially the tree is empty
    {
        create a new node containing the fields (leftchild, keyword, rightchild, link).
        Leftchild value = NULL
        Rightchild value = NULL
        Link = address of the database where the context and the corresponding document_id are stored
        Insert_node();
    }
    Insert_node()
    {
        Check whether the keyword in the current node and the new keyword are equal.
        If so, a duplicate is found.
        Otherwise, if the new keyword is less than the current node's keyword:
            if the current node has no left child, the place for insertion has been found;
            otherwise, handle the left child with the same algorithm.
            Compute_height();
        If the new keyword is greater than the current node's keyword:
            if the current node has no right child, the place for insertion has been found;
            otherwise, handle the right child with the same algorithm.
            Compute_height();
    }
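The pseudocode leaves the rearrangement step implicit. A minimal runnable Python sketch of AVL insertion with the same node fields follows: each node stores its height, `update_height` recomputes it on the way back up (the role of `Compute_height()`), and the standard single and double rotations are applied whenever a node's BF leaves {-1, 0, 1}. Two assumptions are ours, not the paper's: `link` is a list of `(context, document_id)` pairs rather than a database address, and a duplicate keyword appends its new context to the existing node instead of creating a second node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keyword: str
    # Assumption: in-memory stand-in for the "link" field (a database pointer
    # to the keyword's (context, document_id) records in the slides).
    link: List[Tuple[str, int]] = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 0  # a new leaf has height 0


def node_height(node: Optional[Node]) -> int:
    return node.height if node is not None else -1


def update_height(node: Node) -> None:
    """Plays the role of Compute_height() in the pseudocode."""
    node.height = 1 + max(node_height(node.left), node_height(node.right))


def balance_factor(node: Node) -> int:
    return node_height(node.left) - node_height(node.right)


def rotate_right(y: Node) -> Node:
    """Lift y's left child x into y's place; y becomes x's right child."""
    x = y.left
    y.left, x.right = x.right, y
    update_height(y)  # y now sits below x, so update it first
    update_height(x)
    return x


def rotate_left(x: Node) -> Node:
    """Mirror image of rotate_right."""
    y = x.right
    x.right, y.left = y.left, x
    update_height(x)
    update_height(y)
    return y


def insert(node: Optional[Node], keyword: str, context: str, doc_id: int) -> Node:
    """Insert a keyword (Steps 4-6) and return the root of the rebalanced subtree."""
    if node is None:
        return Node(keyword, [(context, doc_id)])
    if keyword == node.keyword:
        # Assumption: a duplicate records its new context/document on the
        # existing node rather than creating a second node.
        node.link.append((context, doc_id))
        return node
    if keyword < node.keyword:
        node.left = insert(node.left, keyword, context, doc_id)
    else:
        node.right = insert(node.right, keyword, context, doc_id)
    update_height(node)
    bf = balance_factor(node)
    if bf > 1:  # left subtree too tall
        if balance_factor(node.left) < 0:  # left-right case: double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf < -1:  # right subtree too tall
        if balance_factor(node.right) > 0:  # right-left case: double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def inorder(node: Optional[Node]) -> List[str]:
    """Keywords in sorted order, useful for checking the tree."""
    if node is None:
        return []
    return inorder(node.left) + [node.keyword] + inorder(node.right)
```

Inserting the seven keywords "apple" through "goat" in alphabetical order leaves "dog" at the root with height 2, whereas a tree without rotations would degenerate into a chain for the same input.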
  • 13. Node structure. Rearranging (rotating) the nodes eliminates the imbalance. • Representation of keywords using a binary search tree.
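To see why the rearrangement matters, this sketch inserts keywords into a plain binary search tree with no rebalancing. When keywords arrive in sorted order, every new node becomes a right child and the tree degenerates into a chain, so a search costs as much as a linear scan. The helper names `bst_insert` and `height` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    keyword: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_insert(root: Optional[Node], keyword: str) -> Node:
    """Plain BST insertion with no rebalancing; returns the (unchanged) root."""
    new = Node(keyword)
    if root is None:
        return new
    cur = root
    while True:
        if keyword < cur.keyword:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(node: Optional[Node]) -> int:
    """Edges on the longest root-to-leaf path (empty tree = -1)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))
```

For seven sorted keywords the chain has height 6, while the minimum possible height of a binary tree with seven nodes is 2 (that is, floor(log2 7)), which is what AVL rebalancing achieves for this input.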
  • 16. CONCLUSION • This paper proposes a technique for indexing the keywords extracted from web documents along with their context. • The AVL tree based indexing technique supports dynamic indexing and improves accuracy and efficiency in retrieving more relevant documents for the user's requirements, since the context of each keyword is stored along with it. • Thus, the indexing technique provides fast access to document context and structure, along with optimized searching.