International Journal of Computer Applications (0975 – 8887)
Volume 95– No.8, June 2014
Data Mining in Web Search Engine Optimization and
User Assisted Rank Results
Minky Jindal
Institute of Technology and Management
Gurgaon 122017, Haryana, India
Nisha Kharb
Institute of Technology and Management
Gurgaon 122017, Haryana, India
ABSTRACT
With the use of the web increasing day by day, the
requirements of users with respect to web search are also
increasing. Content search over the web is an important
research area within web content mining. A traditional search
engine ranks results by content-based matching. But when a
site has been optimized with SEO tools, this kind of search is
not effective in all cases. The aim of this research is to design
a user-assisted, reliable search based on keyword analysis that
provides user-assisted ranked results, so that the user can
select priority links and discard spam links, and to provide an
efficient search optimization model over the open web. The
main objective of the work is to implement the approach in a
user-friendly environment and to analyze it under different
parameters.
Keywords
web pages, data mining, web mining, extreme programming,
software tool.
1. INTRODUCTION
1.1 Data Mining
Web usage mining is a subset of web mining, which is itself a
subset of data mining in general. The aim is to use the data
and information extracted from web systems to gain
knowledge of the system itself. Data mining is different from
information extraction, although the two are closely related.
To better understand these concepts, brief definitions of the
key terms are given below [1].
• Data: “A class of information objects, made up of
units of binary code that are intended to be stored,
processed, and transmitted by digital computers”.
• Information: “is a set of facts with processing
capability added, such as context, relationships to
other facts about the same or related objects,
implying an increased usefulness. Information
provides meaning to data”.
• Knowledge: “is the summation of information
into independent concepts and rules that can explain
relationships or predict outcomes”.
Information extraction is the process of extracting information
from data sources, whether structured, unstructured or
semi-structured, into structured, computer-understandable
data formats. One area where data mining is widely used is
bioinformatics, where very large data sets about protein
structures, networks and genetic material are analyzed. The
sub-category of interest in this work is web mining, which
acts on the data made available on World Wide Web (WWW)
data servers.
1.1.1 Web Mining
Web mining consists of a set of operations defined on data
residing on WWW data servers. It has been defined as “…the
discovery and analysis of useful information from the World
Wide Web”. Web mining as a sub-category of data mining is
fairly recent compared to other areas, since the introduction of
the internet and its widespread usage are themselves recent.
However, the incentive to mine the data available on the
internet is quite strong. Both the number of users around the
world accessing online data and the volume of the data itself
motivate the stakeholders of web sites to consider analyzing
the data and user behavior. Web mining is mainly categorized
into two subsets, namely web content mining and web usage
mining. While content mining approaches focus on the
content of single web pages, web usage mining uses server
logs that detail past accesses to the web site data made
available to the public.
1.1.2 Web Content Mining
“Web content mining describes the automatic search of
information resources available on-line”. The focus is on the
content of web pages themselves. Content mining approaches
can be divided into agent-based approaches, where intelligent
web agents such as crawlers autonomously crawl the web and
classify data, and database approaches, where information
retrieval tasks are employed to store web data in databases
where the data mining process can take place. Most web
content mining studies have focused on textual and graphical
data, since the early years of the internet mostly featured
textual or graphical information. Recent studies have started
to focus on visual and aural data such as sound and video
content too [2,3].
1.1.3 Web Usage Mining
The main topic of this work is web usage mining. Usage
mining, as the name implies, focuses on how the users of a
web site interact with it: the web pages visited, the order of
the visits, and the timestamps and durations of the visits. The
main source of data for web usage mining is the server logs,
which record each visit to each web page, possibly with the
IP address, referrer, time, browser and accessed page link.
Although many areas and applications can be cited where
usage mining is useful, the main idea behind web usage
mining is to let the users of a web site use it easily and
efficiently, and to predict and recommend parts of the web
site to a user based on their own and previous users' actions
on the web site [4].
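As an illustration of the log fields listed above, the following minimal sketch extracts IP, time, page, referrer and browser from a single server log entry. It assumes the log uses the common "combined" access-log layout; the names `LOG_PATTERN` and `parse_log_line` and the sample entry are hypothetical and are not taken from the paper.

```python
import re
from datetime import datetime

# Assumed layout (the "combined" access-log format):
# ip ident user [time] "method page protocol" status bytes "referrer" "browser"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<browser>[^"]*)"'
)


def parse_log_line(line):
    """Return the usage-mining fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the timestamp so visit order and durations can be computed later.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Hypothetical sample entry, made up for illustration only.
sample = ('192.0.2.10 - - [10/Jun/2014:13:55:36 +0530] '
          '"GET /courses/index.html HTTP/1.1" 200 2326 '
          '"http://example.org/home" "Mozilla/5.0"')
visit = parse_log_line(sample)
print(visit["ip"], visit["page"], visit["referrer"])
```

Records parsed this way can be grouped by IP address and sorted by time to reconstruct the order of visits that usage mining works on.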
1.2 LITERATURE SURVEY
In 2011, D. Choi defined an approach for querying the web
and extracting web documents. The author also presented an
approach for assigning rankings to these web documents.
With the development of web search engines, one of the
major tasks is to retrieve these documents from the web
effectively. Search engines use a ranking algorithm to present
the results in an effective way.
The author studied the existing ranking algorithms used by
different search engines and explored the advantages and
limitations of these algorithms. The major contribution of the
author was the definition of query-based information
retrieval. The author defined a classification over the query
and performed query filtration. Based on this analysis, the
ranking is improved and refined [5].
Zhou Hui [6] presented work on search engine optimization
based on keyword analysis along with face link analysis and
back link analysis. The author defined a relational
environment based on search engine optimization so that the
search ranking is improved. The author also discussed
various aspects of search engine optimization, including the
optimization vector, ranking and working principle.
Ping-Tsai Chung [7] presented a search engine optimization
approach based on an analysis of the current market scenario.
The author defined a web service analysis to improve the
business dictating and to support small organizations, so that
an effective keyword-analysis-based search can be performed.
The author presented the work for text search as well as for
image search over the web. A pattern analysis is defined to
perform effective search over the web.
One common model for web page ranking and prediction is
the Markov model. Such a model describes navigational
behavior over the web graph and defines the transition
probabilities used in the ranking analysis. The author not only
defined the work for a single web page access but also
presented work on web path generation. A web path is a
series of web pages that a user can visit after visiting a
specific web page. To perform this kind of analysis, a
Markov-model-based prediction system is defined. The
prediction is defined under web usage mining, which supplies
the structural information for the prediction of web pages. To
perform the analysis, the author defined a web page graph and
applied the Markov model over it to analyze the frequency
match. Based on this, an acyclic web path is generated, and
the prediction is performed based on the weightage assigned
to this web path [8]. Another work in web page ranking is the
comparison of different web pages and web sites. M. Klein
performed this comparison on the web sites of college
football teams. The analysis is performed using web page
metrics for quality assessment. The author defined the page
comparison and the ranking system using graph theory [9].
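To make the Markov-model idea surveyed above more concrete, the following minimal sketch estimates first-order transition probabilities from recorded navigation sessions and predicts the most likely next page. It is a generic illustration, not the method of [8]; the function names and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from lists of visited pages."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive (current, next) pair in the session.
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    probabilities = {}
    for page, followers in counts.items():
        total = sum(followers.values())
        probabilities[page] = {nxt: n / total for nxt, n in followers.items()}
    return probabilities


def predict_next(probabilities, page):
    """Return the most probable next page, or None if the page was never left."""
    followers = probabilities.get(page)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical navigation sessions, made up for illustration only.
sessions = [
    ["home", "courses", "fees"],
    ["home", "courses", "contact"],
    ["home", "news"],
    ["courses", "fees"],
]
model = transition_probabilities(sessions)
print(predict_next(model, "home"))
```

Chaining `predict_next` from a starting page, while skipping pages already visited, would give one simple way to build an acyclic web path of the kind described above.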
2. PROPOSED APPROACH
The proposed work aims to optimize the topic-based Web
Service crawling process by excluding duplicate pages. For
this a new architecture is proposed that uses a rank-based
service selection approach. In this work the ranking is
performed with respect to a main criterion called User
Interest Analysis. The user interest analysis is further divided
into three sub-categories: User Query Relevancy Analysis,
User Recommendation of the service in terms of a
like/dislike factor, and the user's Web service visits. The
basic approach is shown below.
Figure 1: Proposed Web Search Architecture
As shown in the proposed architecture, the user interacts with
the Web Service through a topic-based query to retrieve the
Web Service pages. Once the query is submitted, a request is
sent to the Web Service and the basic URL list is generated.
The data is then retrieved from the Web Service. For the URL
collection, concepts such as indexing and ranking are used.
Indexing provides fast access to the Web Service page,
whereas ranking arranges the list according to priority. When
a Web Service page is fetched, the proposed approach
retrieves the keywords from the document and performs the
relevancy match by matching the service keywords against
the user query. When a new page is retrieved, a suffix tree is
generated and a suffix-tree-based comparison is performed to
analyze the relevancy ratio. Based on this factor the initial
ranking is assigned to the Web service. Figure 1 shows the
proposed web architecture, which is divided into three main
stages. In the first stage, keyword analysis is performed:
keywords are identified in the query and stop-list words are
removed. After this stage, the extracted keyword list is
obtained. This keyword list is then treated as the query and
passed over the web. As the web contents are extracted, link
analysis is performed. This analysis obtains the web contents
and performs a content-based match to find the most relevant
pages on the web. The steps involved in the proposed work
are presented in Figure 2.
Crawling the Web Page based on query
Parsing the Web Contents to Text Form
Identify the relevancy Vector and assign initial ranking
Obtain User response
Estimate ranking
Result will be displayed
Figure 2: Process Model of Proposed Work
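The keyword analysis stage described above (stop-list removal followed by keyword extraction) can be sketched as follows. This is a minimal illustration under stated assumptions: the stop list is a small hypothetical sample, "similar words" are treated simply as repeated words, and the function name `extract_keywords` is not taken from the paper.

```python
import re

# Small illustrative stop list (an assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to",
              "and", "or", "is", "are", "what", "how"}


def extract_keywords(query, stop_words=STOP_WORDS):
    """Lower-case the query, drop stop words and repeated words, keep word order."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    keywords = []
    for token in tokens:
        # Skip stop-list words and words that were already kept.
        if token in stop_words or token in keywords:
            continue
        keywords.append(token)
    return keywords


keywords = extract_keywords(
    "What is the best Education in India for the education of engineers")
print(keywords)
```

The resulting keyword list is what would be passed on to the web as the filtered query.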
3. PROPOSED ALGORITHM
Algorithm {
1. Initialize the web environment
2. Get the user query
3. Filter the user query to retrieve the keywords using
the following steps
a. Remove the stop-list words from the query
b. Remove similar words
c. Extract the keywords from the query
4. Use the extracted keywords as the main query to
the web system
5. Extract the web contents and find the occurrences of
the keywords in the web pages
6. Find the maximum-match web page with respect to
its contents as well as its internal link contents
7. Find the list of M web pages from the web that
satisfy the relevancy vector
8. For i = 1 to M
[Perform the content-based similarity measure as]
{
9. RelevancyVector = 0
For j = 1 to Length(UserKeywords)
{
RelevancyVector = RelevancyVector +
KeywordOccurrence(Page(i), Keyword(j))
/ TotalKeywords(Page(i), Keyword(j));
}
10. Check whether the particular web server exists in
the database; if it does not, set this relevancy
vector as the initial ranking parameter
11. Obtain the rank based on the user response
parameters, i.e. like, dislike and visit count
12. Update the rank based on the user response
}
13. Show the ranked list of pages to the user
}
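The relevancy vector computed in steps 8 and 9 can be sketched in a few lines of Python. The paper does not define TotalKeywords precisely, so this sketch assumes it is the total number of words on the page; the function names and the two sample pages are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevancy_vector(page_text, user_keywords):
    """Sum, over the query keywords, of keyword occurrences / total words on the page.

    Assumption: TotalKeywords in the algorithm is taken as the page's word count.
    """
    words = tokenize(page_text)
    if not words:
        return 0.0
    score = 0.0
    for keyword in user_keywords:
        score += words.count(keyword.lower()) / len(words)
    return score


# Hypothetical page contents, made up for illustration only.
pages = {
    "page_a": "education news and education policy",
    "page_b": "sports news today",
}
scores = {name: relevancy_vector(text, ["education", "news"])
          for name, text in pages.items()}
# Initial ranking: highest relevancy first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Dividing by the page length keeps long pages from ranking highly merely because they contain more words.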
4. EXPERIMENTAL RESULTS
Figure 3 shows the graphical screen on which the user passes
the query to the search engine. The results are obtained for
the web user based on this query by performing a server-side
check under different parameters.
Figure 3: Graphical Screen
Figure 4 shows the results obtained from the web server
based on the user query. These query results are presented as
the base results, and the initial ranking is decided based on
the relevancy vector.
Figure 4: Initial Results
Figure 5 shows the results based on the proposed
user-assisted weighted model. Ranks are assigned to the
links. The primary ranking is based on the relevancy vector.
As the like vector increases, the rank improves, and as the
dislike vector increases, the rank decreases.
Figure 5: Ranked Results
Figure 6 shows the impact of the like vector. As the like
clicks on the web link “education.com” increase, the ranking
of that link increases.
Figure 6: Modified Ranked Results Based On Response
Figure 7 shows the analysis of the crawled pages under the
different parameters on which the ranking of the web pages
is based. These parameters include like, dislike, visit count
and ranking. As user responses are provided for these pages,
the ranking changes. As shown in the figure, when the like
vector increases, the ranking of the particular page also
increases. In the same way, the dislike vector and the visit
count also affect the ranking.
Figure 7: Ranked Page Analysis for Education Keyword
Figure 8 likewise shows the analysis of the crawled pages
under the same parameters: like, dislike, visit count and
ranking. As user responses are provided for these pages, the
ranking changes. When the like vector increases, the ranking
of the particular page also increases, and the dislike vector
and the visit count affect the ranking in the same way.
Figure 8: Ranked Page Analysis for Education Keyword
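The behavior reported in these results (rank rising with likes and falling with dislikes) can be illustrated with a small sketch. The paper does not give the exact update formula, so the linear weighting below, the weight values, the names `PageStats`, `user_assisted_score` and `rank_pages`, and the sample pages are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    relevancy: float   # initial rank from the relevancy vector
    likes: int = 0
    dislikes: int = 0
    visits: int = 0


def user_assisted_score(page, like_weight=0.1, dislike_weight=0.1, visit_weight=0.01):
    """Assumed linear combination: likes and visits raise the score, dislikes lower it."""
    return (page.relevancy
            + like_weight * page.likes
            - dislike_weight * page.dislikes
            + visit_weight * page.visits)


def rank_pages(pages):
    """Return the pages ordered from highest to lowest score."""
    return sorted(pages, key=user_assisted_score, reverse=True)


# Hypothetical pages, made up for illustration only.
pages = [PageStats("site-a.example", 0.50), PageStats("site-b.example", 0.45)]
before = [p.url for p in rank_pages(pages)]

pages[1].likes += 2          # two users like the second page
after_likes = [p.url for p in rank_pages(pages)]

pages[1].dislikes += 3       # three users then dislike it
after_dislikes = [p.url for p in rank_pages(pages)]

print(before, after_likes, after_dislikes)
```

With these assumed weights, the second page overtakes the first after two likes and drops back below it after three dislikes, which mirrors the qualitative trend shown in the figures.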
5. CONCLUSION
In this paper we have presented work on performing
effective search in the Web environment based on the user
query relevancy factor. The relevancy of the query is
analyzed under three main factors: keyword-based analysis,
user recommendation analysis and user Web service visit
analysis. Based on all these factors a ranking criterion is
decided, and based on these ranking vectors the Web
services are ordered. The user can get the best Web
service as well as recommend it to others for the best service
selection. In this work, the Google App is used as the public
Web repository to perform the query analysis. The work is
implemented in a web environment to run the user query
and to derive the ordered results from the query.
6. REFERENCES
[1] Rajeev Motwani, "Evolution of Page Popularity
under Random Web Graph Models", PODS'06, June
26–28, 2006, Chicago, Illinois, USA. ACM
1-59593-318-2/06/0006.
[2] Ravi Kumar, "Rank Quantization", WSDM'13,
February 4–8, 2013, Rome, Italy. ACM
978-1-4503-1869-3/13/02.
[3] Paul Alexandru Chirita, "Using ODP Metadata to
Personalize Search", SIGIR'05, August 15–19, 2005,
Salvador, Brazil. ACM 1-59593-034-5/05/0008.
[4] Ricardo Baeza-Yates, "Web Page Ranking using Link
Attributes", WWW2004, May 17–22, 2004, New York,
USA. ACM 1-58113-912-8/04/0005.
[5] Donjung Choi, "An Approach to Use Query-related
Web Context on Document Ranking", ICUIMC'11,
February 21–23, 2011, Seoul, Korea. ACM
978-1-4503-0571-6.
[6] Zhou Hui, Qin Shigang, Liu Jinhua, Chen Jianli,
"Study on Website Search Engine Optimization",
International Conference on Computer Science and
Service System, pp 930-933, 2012.
[7] Ping-Tsai Chung, "A Web Server Design Using
Search Engine Optimization Techniques for Web
Intelligence for Small Organizations", Proceedings of
IEEE Conference, pp 1-6, 2013.
[8] Magdalini Eirinaki, "Web Path Recommendations
based on Page Ranking and Markov Models",
WIDM'05, November 5, 2005, Bremen, Germany.
ACM 1-59593-194-5/05/0011.
[9] Martin Klein, "Comparing the Performance of US
College Football Teams in the Web and on the Field",
HT'09, June 29–July 1, 2009, Torino, Italy. ACM
978-1-60558-486-7/09/06.
[10] John B. Killoran, "How to Use Search Engine
Optimization Techniques to Increase Website
Visibility", IEEE Transactions on Professional
Communication, Vol. 56, No. 1, pp 50-66, March 2013.
[11] Chen Wang, "Extracting Search-Focused Key
N-Grams for Relevance Ranking in Web Search",
WSDM'12, February 8–12, 2012, Seattle,
Washington, USA. ACM 978-1-4503-0747-5/12/02.
[12] Bin Gao, "Semi-Supervised Ranking on Very Large
Graphs with Rich Metadata", KDD'11, August 21–24,
2011, San Diego, California, USA. ACM
978-1-4503-0813-7/11/08.
IJCATM : www.ijcaonline.org

More Related Content

What's hot

Web Page Recommendation Using Web Mining
Web Page Recommendation Using Web MiningWeb Page Recommendation Using Web Mining
Web Page Recommendation Using Web Mining
IJERA Editor
 
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine
Multi Similarity Measure based Result Merging Strategies in Meta Search EngineMulti Similarity Measure based Result Merging Strategies in Meta Search Engine
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine
IDES Editor
 
An Effective Approach for Document Crawling With Usage Pattern and Image Base...
An Effective Approach for Document Crawling With Usage Pattern and Image Base...An Effective Approach for Document Crawling With Usage Pattern and Image Base...
An Effective Approach for Document Crawling With Usage Pattern and Image Base...
Editor IJCATR
 
A Survey on Web Page Recommendation and Data Preprocessing
A Survey on Web Page Recommendation and Data PreprocessingA Survey on Web Page Recommendation and Data Preprocessing
A Survey on Web Page Recommendation and Data Preprocessing
IJCERT
 
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...
IJSRD
 
A comprehensive study of mining web data
A comprehensive study of mining web dataA comprehensive study of mining web data
A comprehensive study of mining web data
eSAT Publishing House
 
Personalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledgePersonalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledge
Rishikesh Pathak
 
Recommendation generation by integrating sequential pattern mining and semantics
Recommendation generation by integrating sequential pattern mining and semanticsRecommendation generation by integrating sequential pattern mining and semantics
Recommendation generation by integrating sequential pattern mining and semantics
eSAT Journals
 
Recommendation generation by integrating sequential
Recommendation generation by integrating sequentialRecommendation generation by integrating sequential
Recommendation generation by integrating sequential
eSAT Publishing House
 
Efficient way of user search location in query processing
Efficient way of user search location in query processingEfficient way of user search location in query processing
Efficient way of user search location in query processing
eSAT Publishing House
 
Sekhon final 1_ppt
Sekhon final 1_pptSekhon final 1_ppt
Sekhon final 1_ppt
Manant Sweet
 
IRJET-A Survey on Web Personalization of Web Usage Mining
IRJET-A Survey on Web Personalization of Web Usage MiningIRJET-A Survey on Web Personalization of Web Usage Mining
IRJET-A Survey on Web Personalization of Web Usage Mining
IRJET Journal
 
Identifying the Number of Visitors to improve Website Usability from Educatio...
Identifying the Number of Visitors to improve Website Usability from Educatio...Identifying the Number of Visitors to improve Website Usability from Educatio...
Identifying the Number of Visitors to improve Website Usability from Educatio...
Editor IJCATR
 
WEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESSWEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESS
acijjournal
 
A270104
A270104A270104
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions
IRJET- A Novel Technique for Inferring User Search using Feedback SessionsIRJET- A Novel Technique for Inferring User Search using Feedback Sessions
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions
IRJET Journal
 
H0314450
H0314450H0314450
H0314450
iosrjournals
 
A Study on Web Structure Mining
A Study on Web Structure MiningA Study on Web Structure Mining
A Study on Web Structure Mining
IRJET Journal
 

What's hot (18)

Web Page Recommendation Using Web Mining
Web Page Recommendation Using Web MiningWeb Page Recommendation Using Web Mining
Web Page Recommendation Using Web Mining
 
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine
Multi Similarity Measure based Result Merging Strategies in Meta Search EngineMulti Similarity Measure based Result Merging Strategies in Meta Search Engine
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine
 
An Effective Approach for Document Crawling With Usage Pattern and Image Base...
An Effective Approach for Document Crawling With Usage Pattern and Image Base...An Effective Approach for Document Crawling With Usage Pattern and Image Base...
An Effective Approach for Document Crawling With Usage Pattern and Image Base...
 
A Survey on Web Page Recommendation and Data Preprocessing
A Survey on Web Page Recommendation and Data PreprocessingA Survey on Web Page Recommendation and Data Preprocessing
A Survey on Web Page Recommendation and Data Preprocessing
 
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...
An Enhanced Approach for Detecting User's Behavior Applying Country-Wise Loca...
 
A comprehensive study of mining web data
A comprehensive study of mining web dataA comprehensive study of mining web data
A comprehensive study of mining web data
 
Personalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledgePersonalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledge
 
Recommendation generation by integrating sequential pattern mining and semantics
Recommendation generation by integrating sequential pattern mining and semanticsRecommendation generation by integrating sequential pattern mining and semantics
Recommendation generation by integrating sequential pattern mining and semantics
 
Recommendation generation by integrating sequential
Recommendation generation by integrating sequentialRecommendation generation by integrating sequential
Recommendation generation by integrating sequential
 
Efficient way of user search location in query processing
Efficient way of user search location in query processingEfficient way of user search location in query processing
Efficient way of user search location in query processing
 
Sekhon final 1_ppt
Sekhon final 1_pptSekhon final 1_ppt
Sekhon final 1_ppt
 
IRJET-A Survey on Web Personalization of Web Usage Mining
IRJET-A Survey on Web Personalization of Web Usage MiningIRJET-A Survey on Web Personalization of Web Usage Mining
IRJET-A Survey on Web Personalization of Web Usage Mining
 
Identifying the Number of Visitors to improve Website Usability from Educatio...
Identifying the Number of Visitors to improve Website Usability from Educatio...Identifying the Number of Visitors to improve Website Usability from Educatio...
Identifying the Number of Visitors to improve Website Usability from Educatio...
 
WEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESSWEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESS
 
A270104
A270104A270104
A270104
 
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions
IRJET- A Novel Technique for Inferring User Search using Feedback SessionsIRJET- A Novel Technique for Inferring User Search using Feedback Sessions
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions
 
H0314450
H0314450H0314450
H0314450
 
A Study on Web Structure Mining
A Study on Web Structure MiningA Study on Web Structure Mining
A Study on Web Structure Mining
 

Viewers also liked

Search Engine Optimization and Analytics for CSEPP Advanced Training Course
Search Engine Optimization and Analytics for CSEPP Advanced Training CourseSearch Engine Optimization and Analytics for CSEPP Advanced Training Course
Search Engine Optimization and Analytics for CSEPP Advanced Training Course
Bryan Campbell
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
Krish_ver2
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
DataminingTools Inc
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
Jewel Refran
 
Search Engine Powerpoint
Search Engine PowerpointSearch Engine Powerpoint
Search Engine Powerpoint
201014161
 
Data mining
Data miningData mining
Data mining
Akannsha Totewar
 

Viewers also liked (6)

Search Engine Optimization and Analytics for CSEPP Advanced Training Course
Search Engine Optimization and Analytics for CSEPP Advanced Training CourseSearch Engine Optimization and Analytics for CSEPP Advanced Training Course
Search Engine Optimization and Analytics for CSEPP Advanced Training Course
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Search Engine Powerpoint
Search Engine PowerpointSearch Engine Powerpoint
Search Engine Powerpoint
 
Data mining
Data miningData mining
Data mining
 

Similar to Data mining in web search engine optimization

PageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey reportPageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey report
IOSR Journals
 
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
IRJET Journal
 
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web CrawlerIRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET Journal
 
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
IOSR Journals
 
A Study On Web Structure Mining
A Study On Web Structure MiningA Study On Web Structure Mining
A Study On Web Structure Mining
Nicole Heredia
 
IRJET - Re-Ranking of Google Search Results
IRJET - Re-Ranking of Google Search ResultsIRJET - Re-Ranking of Google Search Results
IRJET - Re-Ranking of Google Search Results
IRJET Journal
 
K1803057782
K1803057782K1803057782
K1803057782
IOSR Journals
 
Comparable Analysis of Web Mining Categories
Comparable Analysis of Web Mining CategoriesComparable Analysis of Web Mining Categories
Comparable Analysis of Web Mining Categories
theijes
 
Webmining ppt
Webmining pptWebmining ppt
Webmining ppt
kiransatyawada
 
A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...
butest
 
A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...
butest
 
Farthest first clustering in links reorganization
Farthest first clustering in links reorganizationFarthest first clustering in links reorganization
Farthest first clustering in links reorganization
IJwest
 
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
Web Usage Mining: A Survey on User's Navigation Pattern from Web LogsWeb Usage Mining: A Survey on User's Navigation Pattern from Web Logs
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
ijsrd.com
 
Literature Survey on Web Mining
Literature Survey on Web MiningLiterature Survey on Web Mining
Literature Survey on Web Mining
IOSR Journals
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
Zac Darcy
 
Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms
dannyijwest
 
COST-SENSITIVE TOPICAL DATA ACQUISITION FROM THE WEB
COST-SENSITIVE TOPICAL DATA ACQUISITION FROM THE WEBCOST-SENSITIVE TOPICAL DATA ACQUISITION FROM THE WEB
COST-SENSITIVE TOPICAL DATA ACQUISITION FROM THE WEB
IJDKP
 
Pf3426712675
Pf3426712675Pf3426712675
Pf3426712675
IJERA Editor
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
IJERD Editor
 
G017254554
G017254554G017254554
G017254554
IOSR Journals
 

Data mining in web search engine optimization

International Journal of Computer Applications (0975 – 8887)
Volume 95 – No. 8, June 2014

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Minky Jindal
Institute of Technology and Management
Gurgaon 122017, Haryana, India

Nisha Kharb
Institute of Technology and Management
Gurgaon 122017, Haryana, India

ABSTRACT
In a fast-moving world, use of the web is increasing day by day, and so are users' requirements for web search. Content search over the web is an important research area within web content mining. A traditional search engine ranks results by content-based matching, but when a site has been optimized with SEO tools, this kind of search is not always effective. The aim of this research is to design a reliable, user-assisted search based on keyword analysis that provides user-assisted ranked results, so that the user can select priority links and discard spam links, and to provide an efficient search optimization model over the open web. The main objective is to implement the work in a user-friendly environment and to analyze it under different parameters.

Keywords
web pages, data mining, web mining, extreme programming, software tool.

1. INTRODUCTION
1.1 Data Mining
Web usage mining is a subset of web mining, which is itself a subset of data mining in general. The aim is to use the data and information extracted from web systems to gain knowledge of the system itself. Data mining differs from information extraction, although the two are closely related. To better understand these concepts, brief definitions of the key terms can be given as follows [1].
• Data: “A class of information objects, made up of units of binary code that are intended to be stored, processed, and transmitted by digital computers”.
• Information: “is a set of facts with processing capability added, such as context, relationships to other facts about the same or related objects, implying an increased usefulness. Information provides meaning to data”.
• Knowledge: “is the summation of information into independent concepts and rules that can explain relationships or predict outcomes”.

Information extraction is the process of extracting information from data sources, whether structured, unstructured or semi-structured, into structured, computer-understandable data formats. One area where data mining is widely used is bioinformatics, where very large data sets about protein structures, networks and genetic material are analyzed. The subcategory of interest in this work is web mining, which acts on the data made available on World Wide Web (WWW) data servers.

1.1.1 Web Mining
Web mining consists of a set of operations defined on data residing on WWW data servers; it has been defined as “…the discovery and analysis of useful information from the World Wide Web”. Web mining is a fairly recent subcategory of data mining compared to other areas, since the introduction of the internet and its widespread use are themselves recent. However, the incentive to mine the data available on the internet is strong: both the number of users around the world accessing online data and the volume of the data itself motivate the stakeholders of web sites to analyze the data and user behavior. Web mining is mainly categorized into two subsets, web content mining and web usage mining. While content mining approaches focus on the content of single web pages, web usage mining uses server logs that detail past accesses to the web site data made available to the public.

1.1.2 Web Content Mining
“Web content mining describes the automatic search of information resources available on-line”. The focus is on the content of the web pages themselves.
Web content mining approaches are divided into agent-based approaches, where intelligent web agents such as crawlers autonomously crawl the web and classify data, and database approaches, where information retrieval tasks are employed to store web data in databases on which the data mining process can take place. Most web content mining studies have focused on textual and graphical data, since the early years of the internet mostly featured textual or graphical information. Recent studies have also started to focus on visual and aural data such as sound and video content [2,3].

1.1.3 Web Usage Mining
The main topic of this work is web usage mining. Usage mining, as the name implies, focuses on how users interact with a web site: the web pages visited, the order of the visits, and their timestamps and durations. The main source of data for web usage mining is the server logs, which record each visit to each web page, possibly with the IP address, referrer, time, browser and accessed page link. Although many areas and applications can be cited where usage mining is useful, the main idea behind it is to let users of a web site use it easily and efficiently, and to predict and recommend parts of the web site to a user based on that user's and previous users' actions on the site [4].

1.2 LITERATURE SURVEY
In 2011, D. Choi defined an approach to perform a query over the web and extract the web documents. The author also presented an approach to assign a ranking to these web documents. With the development of web search engines, one of the major tasks is to retrieve these documents from the web effectively. Search engines use a ranking algorithm to present the results in an effective way.
The author studied the existing ranking algorithms used by different search engines and explored their advantages and limitations. The author's major contribution was the definition of query-based information retrieval: the query is classified and filtered, and based on this analysis the ranking is improved and refined [5].

Zhou Hui [6] presented work on search engine optimization under keyword analysis along with face link analysis and back link analysis. The author defined a relational environment based on search engine optimization so that the search ranking is improved, and also discussed various aspects of search engine optimization, including the optimization vector, ranking and working principle.

Ping-Tsai Chung [7] presented a search engine optimization approach based on an analysis of the current market scenario. The author defined a web service analysis to improve the business dictating and to make the work usable by small organizations, so that an effective keyword-analysis-based search can be performed. The work covers text search as well as image search over the web, and pattern analysis is defined to perform an effective search.

One common model for web page ranking and prediction is the Markov model. Such a model describes navigational behavior over the web graph and defines the transition probabilities used in the ranking analysis. The author presents work not only for a single web page access but also for web path generation. A web path is defined as a series of web pages that a user can visit after visiting a specific web page. To perform this kind of analysis, a Markov model based prediction system is defined.
The prediction is defined under web usage mining, which supplies the structural information for predicting web pages. To perform this analysis, the author defined a web page graph and applied the Markov model to it to analyze the frequency match. From this an acyclic web path is generated, and the prediction is performed based on the weight assigned to this web path [8].

Another line of work in web page ranking is the comparison of different web pages and web sites. M. Klein performed this comparison on the web sites of college football teams. The analysis is performed using web page metrics for quality assessment, and the page comparison and ranking system are defined under graph theory [9].

2. PROPOSED APPROACH
The proposed work optimizes the topic-based Web Service crawling process by excluding duplicate pages. For this, a new architecture is proposed that uses a rank-based service selection approach. In this work the ranking is performed with respect to a main criterion called User Interest Analysis. The user interest analysis is further divided into three subcategories: User Query Relevancy Analysis, User Recommendation of a service in terms of a like/dislike factor, and User Web Service Visits. The basic scheme is shown in Figure 1.

Figure 1: Proposed Web Search Architecture

As the proposed architecture shows, the user interacts with the Web Service through a topic-based query to retrieve the Web Service pages. Once the query is submitted, a request is sent to the Web Service and a basic URL list is generated. The data is then retrieved from the Web Service. The URL collection relies on indexing and ranking: indexing provides fast access to a Web Service page, whereas ranking arranges the list according to priority.
When a Web Service page is fetched, the proposed approach retrieves the keywords from the document and performs the relevancy match by matching the service keywords against the user query. For each newly retrieved page it generates a suffix tree and performs a suffix tree based comparison to determine the relevancy ratio. Based on this factor, the initial ranking is assigned to the Web Service.

Figure 1 shows the proposed web architecture, which is divided into three main stages. In the first stage, keyword analysis is performed: the keywords in the query are identified and the stop-list words are removed. The result of this stage is the list of keywords extracted from the query. This keyword list is then used as the query and passed to the web. As the web contents are extracted, link analysis is performed. This analysis obtains the web contents and performs a content-based match to find the most relevant pages on the web. The steps involved in the proposed work are presented in Figure 2.

• Crawl the web pages based on the query
• Parse the web contents to text form
• Identify the relevancy vector and assign the initial ranking
• Obtain the user response
• Estimate the ranking
• Display the result

Figure 2: Process Model of Proposed Work
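The keyword stage and the initial relevancy ranking described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop list is a tiny hypothetical one, "similar words" are treated simply as repeated words, plain token counting stands in for the suffix tree comparison, and the denominator of the relevancy sum is read as the total number of tokens on the page. The function names (tokenize, extract_keywords, relevancy, initial_ranking) are hypothetical.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop list; the paper does not publish its own.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "what"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(query):
    """Drop stop words and repeated words, keeping first-seen order."""
    seen = set()
    keywords = []
    for token in tokenize(query):
        if token not in STOP_WORDS and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords


def relevancy(page_text, keywords):
    """Sum, over the query keywords, of each keyword's share of the page tokens.

    Assumption: the page's total token count is used as the denominator.
    """
    tokens = tokenize(page_text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[keyword] / len(tokens) for keyword in keywords)


def initial_ranking(pages, query):
    """Return (url, score) pairs sorted by descending relevancy.

    `pages` maps a URL to the page text already parsed to plain text.
    """
    keywords = extract_keywords(query)
    scored = [(url, relevancy(text, keywords)) for url, text in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling initial_ranking with a dictionary of already-fetched page texts and the raw user query returns the pages in the order in which they would first be shown, before any user response is taken into account.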
3. PROPOSED ALGORITHM
Algorithm
{
1. Initialize the web environment.
2. Get the user query.
3. Accept the user query and filter it to retrieve the keywords in the following steps:
   a. Remove the stop-list words from the query list.
   b. Remove the similar words.
   c. Extract the keywords from the query.
4. Use these extracted keywords as the main query to the web system.
5. Extract the web contents and find the occurrences of the keywords in the web pages.
6. Find the maximum-match web page with respect to its contents as well as its internal link contents.
7. Find the list of M web pages that satisfy the relevancy vector.
8. For i = 1 to M [perform the content-based similarity measure as]
   {
9.    RelevancyVector = 0
      For j = 1 to Length(UserKeywords)
      {
         RelevancyVector = RelevancyVector + KeywordOccurrence(Page(i), Keyword(j)) / TotalKeywords(Page(i), Keyword(j));
      }
10.   Check whether the particular web server exists in the database; if it does not, set this relevancy vector as the initial ranking parameter.
11.   Obtain the rank based on the user response parameters, i.e. like, dislike and visit count.
12.   Update the rank based on the user response.
   }
13. Show the ranked list of pages to the user.
}

4. EXPERIMENTAL RESULTS
Figure 3 shows the graphical screen on which the user passes the query to the search engine. The results for the web user are obtained from this query by performing a server-side check under different parameters.

Figure 3: Graphical Screen

Figure 4 shows the results obtained from the web server for the user query. These query results are presented as the base results, and the initial ranking is decided from the relevancy vector.

Figure 4: Initial Results

Figure 5 shows the results of the proposed user-assisted weighted model, in which ranks are assigned to the links. The primary ranking is based on the relevancy vector.
As the like vector increases, the rank improves, and as the dislike vector increases, the rank decreases.
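The user-assisted update described above can be sketched in Python as follows. The paper does not state how the relevancy vector, likes, dislikes and visit count are combined, so the linear formula and the three weights below are hypothetical, chosen only so that likes raise a page's rank and dislikes lower it; PageRecord, rank_score and ranked are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    url: str
    relevancy: float  # initial rank taken from the keyword match
    likes: int = 0
    dislikes: int = 0
    visits: int = 0


# Hypothetical weights: the paper does not say how the signals are combined.
LIKE_WEIGHT = 0.10
DISLIKE_WEIGHT = 0.10
VISIT_WEIGHT = 0.01


def rank_score(page):
    """Relevancy adjusted upward by likes and visits, downward by dislikes."""
    return (
        page.relevancy
        + LIKE_WEIGHT * page.likes
        - DISLIKE_WEIGHT * page.dislikes
        + VISIT_WEIGHT * page.visits
    )


def ranked(pages):
    """Return the pages ordered from highest to lowest score."""
    return sorted(pages, key=rank_score, reverse=True)
```

With these assumed weights, a page that starts slightly below another on relevancy overtakes it after a couple of likes and falls back again if dislikes accumulate, which is the qualitative behaviour reported for the ranked results.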
Figure 5: Ranked Results

Figure 6 shows the impact of the like vector. As the like clicks on the web link “education.com” increase, the ranking of that link increases.

Figure 6: Modified Ranked Results Based On Response

Figure 7 shows the analysis of crawled pages under the parameters on which the ranking of the web pages is based: like, dislike, visit count and ranking. As user responses are provided for these pages, the ranking changes. As the like vector increases, the ranking of the particular page also increases; in the same way, the dislike vector and the visit count also affect the ranking.

Figure 7: Ranked Page Analysis for Education Keyword

Figure 8 shows the same analysis of crawled pages under the like, dislike, visit count and ranking parameters; again, the ranking changes as user responses are provided.

Figure 8: Ranked Page Analysis for Education Keyword

5. CONCLUSION
In this paper we have presented work on performing an effective search in the Web environment based on the user query relevancy factor. The relevancy of the query is analyzed under three main factors: Keyword based Analysis, User Recommendation Analysis and User Web Service Visit Analysis. Based on all of these factors a ranking criterion is decided, and the Web services are ordered according to these ranking vectors. The user can get the best Web service and can also recommend it to others for the best service selection. In this work, the Google App is used as the public Web repository to perform the query analysis. The work is implemented in a web environment to run the user query and derive the ordered results from it.

6. REFERENCES
[1] Rajeev Motwani, "Evolution of Page Popularity under Random Web Graph Models", PODS’06, June 26–28, 2006, Chicago, Illinois, USA. ACM 1-59593-318-2/06/0006.
[2] Ravi Kumar, "Rank Quantization", WSDM’13, February 4–8, 2013, Rome, Italy. ACM 978-1-4503-1869-3/13/02.
[3] Paul Alexandru Chirita, "Using ODP Metadata to Personalize Search", SIGIR’05, August 15–19, 2005, Salvador, Brazil. ACM 1-59593-034-5/05/0008.
[4] Ricardo Baeza-Yates, "Web Page Ranking using Link Attributes", WWW2004, May 17–22, 2004, New York, USA. ACM 1-58113-912-8/04/0005.
[5] Donjung Choi, "An Approach to Use Query-related Web Context on Document Ranking", ICUIMC ’11, February 21–23, 2011, Seoul, Korea. ACM 978-1-4503-0571-6.
[6] Zhou Hui, Qin Shigang, Liu Jinhua, Chen Jianli, "Study on Website Search Engine Optimization", International Conference on Computer Science and Service System, pp 930-933, 2012.
[7] Ping-Tsai Chung, "A Web Server Design Using Search Engine Optimization Techniques for Web Intelligence for Small Organizations", Proceedings of IEEE Conference, pp 1-6, 2013.
[8] Magdalini Eirinaki, "Web Path Recommendations based on Page Ranking and Markov Models", WIDM’05, November 5, 2005, Bremen, Germany. ACM 1-59593-194-5/05/0011.
[9] Martin Klein, "Comparing the Performance of US College Football Teams in the Web and on the Field", HT’09, June 29–July 1, 2009, Torino, Italy. ACM 978-1-60558-486-7/09/06.
[10] John B. Killoran, "How to Use Search Engine Optimization Techniques to Increase Website Visibility", IEEE Transactions on Professional Communication, Vol. 56, No. 1, pp 50-66, March 2013.
[11] Chen Wang, "Extracting Search-Focused Key N-Grams for Relevance Ranking in Web Search", WSDM’12, February 8–12, 2012, Seattle, Washington, USA. ACM 978-1-4503-0747-5/12/02.
[12] Bin Gao, "Semi-Supervised Ranking on Very Large Graphs with Rich Metadata", KDD’11, August 21–24, 2011, San Diego, California, USA. ACM 978-1-4503-0813-7/11/08.

IJCATM : www.ijcaonline.org