International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
RESEARCH ON CLASSIFICATION ALGORITHMS AND ITS IMPACT ON
WEB MINING
Prof. Sindhu P Menon1, Assistant Professor, Dept. of CSE, KLEIT, Gokul Road, Hubli, Karnataka, India
Dr. Nagaratna P Hegde2, Professor, Dept. of CSE, VCE, Ibrahimbagh, Hyderabad, India
ABSTRACT
Web mining is the application of data mining techniques to discover patterns from the web.
It draws on two research fields: data mining and the World Wide Web (WWW). Data mining, a field of
computer science, is the process of discovering new patterns in large datasets using methods from
artificial intelligence and database systems; its goal is to draw out information from the
dataset [19]. The World Wide Web provides an effective and simple way for users to search, browse
and retrieve information. Web mining can be broadly classified into i) Web Structure Mining,
ii) Web Content Mining and iii) Web Usage Mining [15].
This paper is a survey of published papers on web mining, with the main focus on web usage
mining, which is an application of web mining. Web usage mining is used to extract patterns from
log data, which are then examined to obtain behavioral patterns that can be analyzed further. The
input for all these operations is the log file.
Keywords: Clustering, Classification, Web Mining, Associative Rule Mining.
I. INTRODUCTION
The term web mining refers to applying data mining techniques to extract useful information
from the web. As comparatively little research has been carried out in web mining, its
implementation is complex. Web mining covers many other areas such as databases, information
retrieval, and artificial intelligence. The term web mining was first coined by Oren Etzioni in
1996 [21], who claimed that web mining is the use of mining techniques to extract information from
the WWW and other services.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING &
TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 4, July-August (2013), pp. 495-504
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com
To begin with, the web contains three types of data: i) web structure data, ii) content (the
data of the web) and iii) usage (web log data) [11]. Web mining is usually categorized into
i) Web Structure Mining.
ii) Web Content Mining.
iii) Web Usage Mining.
Web structure mining [15], one of the three categories of web mining, is used to identify the
relationships between Web pages that are linked by information or by direct links. This connection
allows data relating to a search query to be extracted directly from the linking Web page.
Structure mining addresses two problems of the World Wide Web. The first is irrelevant search
results: relevance becomes a problem because search engines often allow low-precision criteria.
The second is that recall in content mining is low because of the vast amount of information
provided on the Web. Uncovering the Web hyperlink structure helps to reduce both problems. The
concept behind structure mining is to retrieve unexplored relationships from web data. Structure
mining is used in a number of applications; for example, a business can link its website details so
that users can navigate and cluster information. Users can then access information through keyword
association and content mining.
Web content mining (text mining) [15] is generally the second step in Web data mining.
Content mining is the scanning of the text, pictures and graphs of a Web page. This scanning is
usually completed after the clustering done through structure mining, and its results build on that
clustering. Given the huge amount of information available on the World Wide Web, content mining
supplies results to search engines. It is directed toward the specific information a user enters
into a search engine, which allows the entire Web to be scanned for it. Text mining is very
effective when handling specific topics. The main use of this type of data mining is to gather and
organize the information available on the WWW in response to a user request. The results thus
obtained improve navigation on the web.
Web usage mining [15] is the main category of web mining. It collects Web access information
for Web pages, including the paths leading to the accessed pages. This information is often
gathered automatically into web logs. Usage mining yields information about how a site is likely
to function in the future, some of which can be derived from the collected information. The usage
data gathered makes it possible to produce results more effectively.
Usage mining is useful in a number of areas such as online trading and web-based businesses.
The information gives an idea of the number of users and the amount of business on each site. It
also enables Web-based businesses to provide the best access routes to services or advertisements.
Usage processing is used for complete pattern discovery. Pattern discovery [8] is difficult because
only fragments of information such as IP addresses and URLs are available, and with so little
information it is hard to identify the user. Another use is structure processing, which consists
of analyzing the structure of each page contained in a Web site.
II. RELATED WORK
Vijayalakshmi [15] explored a sequential technique called AWAPT (Adaptive Web Access Pattern
Tree) for Frequent Sequential Pattern (FSP) mining, a widely used data mining method. The main
focus is on discovering the relationships between the access patterns obtained from the web log
file. Pattern discovery is carried out by applying the FSP method to the log file, and the result
is then analyzed. The main steps are as follows. The algorithm first scans the raw log file to find
the separate user activities. A web access pattern (WAP) tree, an efficient data structure, is
constructed from the frequent access patterns. Later, an intermediate WAP tree is constructed.
Considering memory and time as the critical issues, AWAPT is proposed, which avoids recursion [15].
A. Rule of AWAPT [15]:
Given a WAP-tree with some nodes, the binary position code of each node can be assigned by the
following rule. The root has a null position code, and the leftmost child of the root has the
code 1. The code of any other node is derived by appending 1 to the position code of its parent if
the node is the leftmost child, 10 if it is the second leftmost child, 100 if it is the third
leftmost child, and so on. In general, for the nth leftmost child, the position code is obtained by
appending the binary number for 2^(n-1) to the parent’s code. A node α is an ancestor of another
node β if and only if the position code of α with “1” appended to its end equals the first x bits
of the position code of β, where x is (the number of bits in the position code of α) + 1.
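The position-code rule above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class `WapNode`, its fields, and the function names are hypothetical and are not taken from [15].

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WapNode:
    """A WAP-tree node; `label` is the page or event name (hypothetical field)."""
    label: str
    children: List["WapNode"] = field(default_factory=list)
    code: str = ""  # binary position code; the root keeps the empty (null) code


def assign_position_codes(root: WapNode) -> None:
    """Give the n-th leftmost child its parent's code plus binary(2**(n-1))."""
    root.code = ""
    stack = [root]  # explicit stack, so no recursion is needed
    while stack:
        node = stack.pop()
        for n, child in enumerate(node.children, start=1):
            # binary(2**(n-1)) is a '1' followed by n-1 zeros: 1, 10, 100, ...
            child.code = node.code + "1" + "0" * (n - 1)
            stack.append(child)


def is_ancestor(a: WapNode, b: WapNode) -> bool:
    """True iff code(a) + '1' equals the first len(code(a)) + 1 bits of code(b)."""
    x = len(a.code) + 1
    return len(b.code) >= x and b.code[:x] == a.code + "1"


# Small example tree: root -> (a -> (b, c), d)
root = WapNode("root", [WapNode("a", [WapNode("b"), WapNode("c")]), WapNode("d")])
assign_position_codes(root)
```

With this example tree the codes are a = 1, d = 10, b = 11 and c = 110, so the ancestor test needs only a string prefix comparison and no tree traversal.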
The authors of [18] proposed rules for every phase of data preprocessing. Because the raw web
log file contains much irrelevant data, they applied several heuristics to reduce its size and
improve its quality. User identification is another main topic of that paper: the IP address was
used as the key element in distinguishing users, and the access log was used along with the
referrer log to construct access patterns. Path completion is also critical, because many access
records are not kept in the log file and the missing references must be filled in. In [10],
clustering was the main goal. Prior knowledge about the tasks brought accuracy up to 99%. Using the
viewing time of pages as a clustering feature increased effectiveness and robustness. Feature
vectors are created for each web page, and the page is described as a multi-modal vector. User
sessions are then modeled as multi-modal vectors, and finally the sessions are clustered into
different categories.
Many researchers are working in this field to identify and classify users effectively, which
helps in personalization. The first step in web usage mining (WUM) is data cleaning, because the
raw web log file contains a lot of noise that has to be removed. Yan Wang [11] described pattern
discovery in the form of statistical analysis, association rules, clustering, classification,
sequential patterns and dependency modeling. He also used a web usage mining framework called
WebSIFT, which uses the content and structure information from a Web site to identify the
interesting results from mining usage data [6]. With the extensive growth of e-commerce, privacy
has become a critical topic for many researchers. Applications of web mining have led to conflicts
such as spam and the theft of users' sensitive information during online shopping and online
banking. He also carried out work related to user privacy and navigation pattern discovery.
The authors of [12] described two categories of features that are effective in identifying the goal
of a query: past user-click behavior and anchor-link distribution. The aim is to understand the
goal of the user's query clearly, which may help in improving the search engine’s service. They
used a taxonomy of query goals, relying mainly on [19], as follows:
1) Navigational queries: The user already has a particular website in mind, possibly one visited
before, and issues the query only to reach or verify it. The result will therefore be accurate.
2) Informational queries: The user does not have any particular web page in mind, so there will
be multiple results, and the user discovers new websites while searching, depending on the
requirement.
An approach to developing user profiles is presented in [14]. Profiles help in identifying
users' interests more effectively, enabling personalized services. They may also be used to obtain
information such as the location of the user and previously visited sites. Constructing a user
profile requires user details, so users have to be identified uniquely. The available methods are
software agents, logins, enhanced proxy servers, cookies, and session ids. After identification,
the user's information must be collected. It can be collected through HTML forms in which users
fill in their details using checkboxes and radio buttons; this is explicit collection, and it
requires the user’s cooperation. The other method is implicit collection, which does not require
intervention by the user. Its techniques include browser caches, proxy servers, browser agents,
desktop agents, web logs and search logs. User profiles are constructed from the information
obtained.
The authors of [9] described recent developments in WUM, which is receiving increasing
attention. Most of this research effort focuses on three main paradigms: association rules,
sequential patterns, and clustering. Association rules are used to find associations among pages
that often appear together in users’ sessions. A typical result has the form “X.html, Y.html
=> Z.html”, which states that if a user has visited page X.html and page Y.html, it is very likely
that the same user has also visited page Z.html in the same session. Sequential patterns are used
to find frequent sequences in large amounts of sequential data. In web usage mining, they are
discovered to find navigational patterns that occur in users’ sessions. Clustering is mainly used
to group similar sessions. Multi-modal clustering [10] is a technique that builds clusters using
multiple information features. [20] presents an application of matrix clustering to web usage
data. [3] integrated a naïve Bayesian multi-net to perform the user identification task. It
captures the distinct patterns of each user, which helps to retain customers. The approach is
carried out with e-transaction data; click stream data is another possible source. Transactions
are the main criterion: the user identification function is derived from the set of transactions
and the set of users.
The approach in [3] has the following steps:
1. Characteristic pattern mining: The characteristic patterns are mined from a training transaction
set by filtering out user extrinsic behaviors, such as common patterns for most users and
accidental behavior patterns for a particular user.
2. Identification function construction: The identification function is built on the learned
characteristic patterns using a naïve Bayesian multi-net.
3. User identification: The characteristic patterns in each given transaction are recognized and
employed to determine the user of the transaction by the identification function.
Characteristic pattern mining and related concepts are also defined. The user confidence
C(ptk, uj) of pattern ptk and user uj is the conditional probability that a transaction containing
ptk came from user uj.
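As a hedged illustration of the user confidence defined above, the following Python sketch estimates C(ptk, uj) by counting over a labeled transaction set. The function name and the toy transactions are hypothetical, and the relative-frequency estimate is an assumption of this sketch rather than the exact procedure of [3].

```python
from typing import FrozenSet, Iterable, List, Tuple

Transaction = Tuple[str, FrozenSet[str]]  # (user id, items in the transaction)


def user_confidence(pattern: Iterable[str], user: str,
                    transactions: List[Transaction]) -> float:
    """Estimate C(pt_k, u_j) = P(transaction is from u_j | it contains pt_k)."""
    pattern = frozenset(pattern)
    # Users of all transactions that contain every item of the pattern.
    owners = [u for u, items in transactions if pattern <= items]
    if not owners:
        return 0.0  # the pattern never occurs, so no evidence for any user
    return owners.count(user) / len(owners)


# Toy e-transaction data (made up for illustration only).
transactions = [
    ("u1", frozenset({"a", "b", "c"})),
    ("u1", frozenset({"a", "b"})),
    ("u2", frozenset({"a", "b", "d"})),
    ("u2", frozenset({"c", "d"})),
]
confidence = user_confidence({"a", "b"}, "u1", transactions)
```

Here the pattern {a, b} occurs in three transactions, two of which belong to u1, so the estimated confidence for u1 is 2/3.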
In [4], a general architecture for Web usage mining is developed, which is also presented in [6]
and [21]. WEBMINER is a system that implements parts of this general architecture. The
architecture divides the Web usage mining process into two main parts. The first part includes the
domain dependent processes of transforming the Web data into suitable transaction form. This
includes preprocessing, transaction identification, and data integration components. The second part
includes the largely domain independent application of generic data mining and pattern matching
techniques (such as the discovery of association rule and sequential patterns) as part of the system's
data mining engine.
The authors of [7] presented a probabilistic latent semantic analysis (PLSA) model, which can
uncover the hidden semantic factors and derive user access patterns from session-page observation
data. The data from two different sources, namely web access log files (i.e. usage data) and the
web site map (i.e. linkage information), are integrated to generate linkage-enhanced usage data.
The integrated usage data, in turn, are viewed as user session data in terms of page view-weight
pairs and utilized to derive the user access patterns based on the PLSA model. In addition, the
user access patterns are characterized by user profiles, which are presented as weighted page
sets. The weights in the user profiles can be used to determine the main theme of individual and
common user access patterns, which provides useful information for other web applications such as
web recommendation or personalization.
III. PHASES IN WEB USAGE CLASSIFICATION
Figure 1 depicts the various phases in user classification.
Fig.1 Phases of User Classification
This model was proposed in [16]. Its sub-modules are described below.
A. User identification
User identification is a crucial step in the data preprocessing model. It is very challenging to
pick out a particular user from the heaps of log records stored on the server, and there are
several ways to identify users [14]:
1) Software agents
These are small application modules installed on the user's computer that keep track of all the
user's web transactions. The only assumption is that both the server and the user have the same
application and information.
2) User id
Identification is accurate here, since users themselves supply their identification data through
user ids and passwords. The only assumption is that every server provides registration forms.
3) Cookies
These are chunks of information stored on the user’s computer by the server. This is the most
efficient technique, but the user must have cookies enabled on the machine; otherwise no
information is stored.
An algorithm for user identification [8] is as follows:
Algorithm: User Identification
Input: Log Database.
Output: Unique Users Database.
Step1: Initialize
IPList = empty; UserList = empty; BrowserList = empty;
OSList = empty; No-of-users = 0;
Step2: Read Record from Log Database
Step3: If Record.IPaddress is not in IPList then
add Record.IPaddress to IPList
add Record.Browser to BrowserList
add Record.OS to OSList
increment No-of-users
insert new user into UserList
Else
If Record.Browser is not in BrowserList OR
Record.OS is not in OSList
then
add Record.Browser to BrowserList
add Record.OS to OSList
increment No-of-users
insert as new user into UserList
End of If
End of If
Step4: Repeat steps 2 to 3
until eof (Log Database).
Step5: Stop the process.
The outcome of this algorithm is a Unique Users Database, which gives the total number of
individual users along with each user's IP address, user agent and browser [1].
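The heuristic behind this algorithm (a request from a known IP address but with a different browser or operating system is treated as a different user) can be sketched in Python as follows. This is a simplified illustration, not the implementation of [8]: it keys users on the (IP, browser, OS) combination instead of keeping three separate lists, and the `LogRecord` type and its field names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class LogRecord(NamedTuple):
    """One parsed web log entry (hypothetical, minimal set of fields)."""
    ip: str
    browser: str
    os: str
    url: str


def identify_users(records: Iterable[LogRecord]) -> Dict[Tuple[str, str, str], int]:
    """Assign a user id to every distinct (ip, browser, os) combination."""
    users: Dict[Tuple[str, str, str], int] = {}
    for rec in records:
        key = (rec.ip, rec.browser, rec.os)
        if key not in users:
            # New IP, or known IP with a new browser/OS: count as a new user.
            users[key] = len(users) + 1
    return users


log = [
    LogRecord("10.0.0.1", "Firefox", "Linux", "/index.html"),
    LogRecord("10.0.0.1", "Firefox", "Linux", "/about.html"),
    LogRecord("10.0.0.1", "Chrome", "Windows", "/index.html"),
    LogRecord("10.0.0.2", "Firefox", "Linux", "/index.html"),
]
unique_users = identify_users(log)
```

In this toy log the four records yield three users: two share the address 10.0.0.1 but differ in browser and operating system. The heuristic cannot separate users behind the same proxy who also share the same user agent.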
In [3], transactions are used to identify users. The identification is described as a Boolean
function
$$f : \{0,1\}^{|I|} \rightarrow \{0,1\}^{|U|+1}$$
where |I| is the total number of possible items in the transaction set T, and |U| + 1 is the total
number of possible users plus the unknown user.
The algorithm of [3] is given below:
Algorithm: Identifying users
Input: A transaction t whose user is to be identified
Output: The identified user û, or the unknown user u0
Procedure user_identification(t) {
Identify all characteristic patterns ptk in t, i.e., the set PTt of patterns ptk such that
ptk is contained in t and ptk belongs to PTc;
If PTt is empty
Return u0
Else
Let PTt=PTc-PTt;
Remove any pattern that is a sub pattern of another pattern
in PTt
Remove any pattern that is a super pattern of another
pattern in vector PT
For all ui belonging to U, compute the identification function
Return uj with the maximum value
End if
}
B. Session Identification
A session is normally defined as the set of page visits made by a user within a certain time
window. A given user may have a single session or multiple sessions. The user’s access pattern
within a particular session can also be reconstructed using reconstruction techniques.
In [9], two steps for session identification are mentioned:
• Identifying the different sessions of the user from the very sparse data available in the
server log file.
• Reconstructing the user’s navigation patterns within the identified sessions.
They used cookies and URL rewriting for this purpose, and they also addressed the web browser
caching problem. In [4], users' click streams are used, and the process is named sessionization.
A transaction is defined as a subset of a user session having homogeneous pages. The two methods
described are:
1. Time Oriented Heuristics
This is based on total session time. A session is taken to be the set of pages visited by a
specific user within a specific time window. The window can vary from 10 minutes to 10 hours
depending on the patterns.
Another method compares the time stamps of two consecutive records of the same user against the
maximum period of time the user is assumed to stay on a page.
2. Navigation Oriented Heuristics
Here the web topology is used in graph form. The referrer field in the web log file is used to
decide whether a page request starts a new session or must be added to the previous session.
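The time-oriented heuristic based on the gap between consecutive requests can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, the requests are assumed to belong to one already identified user, and the 30-minute default gap is an assumed value chosen for the example, not one prescribed by [4].

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Request = Tuple[datetime, str]  # (time stamp, requested URL) for ONE user


def split_sessions(requests: List[Request],
                   max_gap: timedelta = timedelta(minutes=30)) -> List[List[Request]]:
    """Start a new session whenever two consecutive requests are more than max_gap apart."""
    sessions: List[List[Request]] = []
    for ts, url in sorted(requests):  # order by time stamp
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((ts, url))   # continue the current session
        else:
            sessions.append([(ts, url)])     # first request, or gap too long
    return sessions


t0 = datetime(2013, 7, 1, 9, 0)
clicks = [
    (t0, "/"),
    (t0 + timedelta(minutes=5), "/a"),
    (t0 + timedelta(minutes=50), "/b"),
    (t0 + timedelta(minutes=55), "/c"),
]
sessions = split_sessions(clicks)
```

With the assumed 30-minute gap the 45-minute pause between "/a" and "/b" splits the click stream into two sessions; a navigation-oriented variant would instead check whether the referrer of a request is a page already in the current session.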
In [7], the authors give an algorithm for grouping the sessions of users. The algorithm
proposed by them is as follows:
Algorithm: Grouping User Sessions
Input: P(zk|si), user session-page matrix SPij, threshold μ
Output: A set of clusters SCL=(SCL1,SCL2,….,SCLk)
Begin
Step1: SCL1=SCL2=...=SCLk=∅
Step2: For each si belonging to S, select P(zk|si); if
P(zk|si)>=μ then SCLk=SCLk U si
Step3: If there are still user sessions to be clustered, go
back to step2
Step4: Return clusters SCL= {SCLk}
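A minimal Python sketch of this thresholding step is given below. It assumes the probabilities P(zk|si) have already been estimated by a PLSA model and are supplied as a list of rows, one per session; the function name and the toy numbers are hypothetical. The session-page matrix is not needed for the grouping itself, so the sketch omits it.

```python
from typing import Dict, List, Sequence


def group_sessions(p_z_given_s: Sequence[Sequence[float]],
                   mu: float) -> Dict[int, List[int]]:
    """p_z_given_s[i][k] holds P(z_k | s_i); returns {factor k: [session indices]}."""
    n_factors = len(p_z_given_s[0]) if p_z_given_s else 0
    clusters: Dict[int, List[int]] = {k: [] for k in range(n_factors)}  # Step 1
    for i, probs in enumerate(p_z_given_s):                             # Steps 2-3
        for k, p in enumerate(probs):
            if p >= mu:
                clusters[k].append(i)  # SCL_k = SCL_k U {s_i}
    return clusters                                                     # Step 4


# Toy probabilities for three sessions and three latent factors (made up).
probabilities = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.45, 0.45, 0.10],
]
clusters = group_sessions(probabilities, mu=0.4)
```

Because the test is a threshold and not an argmax, a session may fall into several clusters (the third session above) or into none, depending on the chosen threshold.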
C. User Classification
Classification means supervised learning: the classes are defined in advance. The other way of
learning from data is unsupervised learning, also called clustering, in which the classes are not
fixed beforehand.
1. Decision tree induction
A decision tree [2] is a flow-chart-like tree structure.
Each internal node denotes a test on an attribute, each branch represents an outcome of the
test, and each leaf node represents one of the predefined classes.
Decision tree generation consists of two steps:
a. Tree construction
All the training examples start at the root and are partitioned recursively using
attributes or features.
b. Tree pruning
Branches that reflect noise or defects are identified and removed from the tree;
identifying them is a complex process.
Algorithm: Decision tree
Step1: Construct the tree in a top-down recursive
divide-and-conquer manner.
Step2: Keep all the training samples at the root.
Step3: If attributes have continuous values, discretize
them.
Step4: Treat the attributes as categorical.
Step5: Partition the samples recursively according to the
selected attributes.
Step6: Select attributes based on a statistical measure such
as information gain.
Step7: Stop partitioning if all samples at a node belong to
the same class.
Step8: Stop partitioning if there are no remaining
attributes or samples.
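The attribute selection step can be made concrete with a short Python sketch of information gain. The helper names and the toy training rows are hypothetical; only the selection measure is shown, not the full tree construction or pruning.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], labels: List[str], attr: str) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    partitions: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows: List[Dict[str, str]], labels: List[str],
                   attrs: List[str]) -> str:
    """Pick the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Toy user sessions (made up): does the user end up buying?
rows = [
    {"visits": "high", "device": "pc"},
    {"visits": "high", "device": "mobile"},
    {"visits": "low", "device": "pc"},
    {"visits": "low", "device": "mobile"},
]
labels = ["buyer", "buyer", "visitor", "visitor"]
root_attribute = best_attribute(rows, labels, ["visits", "device"])
```

In this toy set "visits" separates the two classes perfectly (gain of 1 bit) while "device" carries no information (gain of 0), so "visits" would be tested at the root.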
2. Bayesian classifier
Given training data D, the posterior probability of a hypothesis h, P(h|D), follows Bayes'
theorem:
$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
The MAP (maximum a posteriori) hypothesis is
$$h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
The classification problem may be formalized using a posteriori probabilities:
P(C|X) = probability that the sample tuple X=<x1,…,xk> is of class C
E.g. P(class=N | outlook=sunny,windy=true,…)
Idea: assign to sample X the class label C such that P(C|X) is maximal.
Bayes theorem:
P(C|X) = P(X|C)·P(C) / P(X)
where P(X) is constant for all classes
and P(C) = relative frequency of class C samples.
The C for which P(C|X) is maximum is the C for which P(X|C)·P(C) is maximum.
3. Other Classification approaches
3.1) K-nearest neighbor:
Here instances are represented as points.
Algorithm: K-Nearest Neighbor
Step 1: All samples are represented as points in an n-dimensional space.
Step 2: The nearest neighbors of a point are determined using Euclidean distance.
Step 3: The target function may be either discrete or real valued.
Step 4: If a discrete target function is used, K-NN returns the most common value among the k
training examples nearest to the query point xq.
In distance-weighted K-NN, the weight of a neighbor xi is calculated as
$$w = \frac{1}{d(x_q, x_i)^2}$$
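The K-NN steps, including the inverse-square distance weighting, can be sketched in Python as follows. The function name and the toy points are hypothetical, and the handling of a zero distance (return the label of the exact match) is an assumption added so that the weight is always defined.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available from Python 3.8
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (point in n-D space, class label)


def knn_classify(train: List[Sample], query: Sequence[float],
                 k: int = 3, weighted: bool = False) -> str:
    """Return the majority (optionally distance-weighted) label of the k nearest points."""
    neighbours = sorted(train, key=lambda s: dist(s[0], query))[:k]
    votes: Dict[str, float] = defaultdict(float)
    for point, label in neighbours:
        d = dist(point, query)
        if weighted:
            if d == 0:
                return label          # assumption: an exact match decides the class
            votes[label] += 1.0 / d ** 2   # w = 1 / d(xq, xi)^2
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)


train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
plain = knn_classify(train, (1, 1), k=5)                    # every neighbour counts equally
weighted = knn_classify(train, (1, 1), k=5, weighted=True)  # close neighbours dominate
```

With k = 5 the unweighted vote is won by the three distant "B" points, whereas the weighted vote follows the two nearby "A" points, which is the motivation for the weighting.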
3.2) Genetic Algorithm:
It is an analogy to biological evolution. Each rule is represented as a string of bits. An
initial population is created from randomly generated rules. The notion of survival of the
fittest is applied: the fitness of a rule is the accuracy with which it classifies a set of
training examples. Crossover and mutation of rules generate offspring.
3.3) Rough set approach
In this approach, equivalence classes are used to define classes approximately, or
roughly. A given class C is approximated by two sets: a lower approximation (objects
certain to be in C) and an upper approximation (objects that cannot be described as not
being in C). Finding the minimal subset of attributes is NP-hard, and a discernibility
matrix is used to reduce the computational intensity.
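The lower and upper approximations can be computed directly from the equivalence classes induced by a set of attributes. The following Python sketch is a minimal illustration; the function name, attribute names and the toy user table are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def approximations(objects: Dict[str, Dict[str, str]], attrs: List[str],
                   target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (lower, upper) approximation of the class `target` under `attrs`."""
    # Objects with identical values on attrs are indiscernible: one equivalence class.
    blocks: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for name, values in objects.items():
        blocks[tuple(values[a] for a in attrs)].add(name)
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in blocks.values():
        if block <= target:   # whole block inside C: certainly in C
            lower |= block
        if block & target:    # block overlaps C: cannot be ruled out
            upper |= block
    return lower, upper


users = {
    "u1": {"visits": "high", "buys": "yes"},
    "u2": {"visits": "high", "buys": "yes"},
    "u3": {"visits": "low", "buys": "no"},
    "u4": {"visits": "low", "buys": "no"},
    "u5": {"visits": "high", "buys": "no"},
}
loyal = {"u1", "u2", "u3"}  # the class C to approximate (made up)
lower, upper = approximations(users, ["visits", "buys"], loyal)
```

Here u3 and u4 are indiscernible but only u3 is in C, so both appear in the upper approximation and neither in the lower one.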
3.4) Fuzzy logic
It uses truth values between 0.0 and 1.0 to represent the degree of membership.
Attribute values are converted to fuzzy values, and more than one fuzzy value may apply
to a given data value. Each applicable rule contributes a vote for membership in its
class. Finally, the truth values are summed for each class.
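A small Python sketch of this voting scheme is given below. Everything in it is an illustrative assumption: the triangular membership functions, their break points, the two attributes (pages viewed and minutes spent) and the two classes are invented for the example and do not come from the surveyed papers.

```python
from typing import Dict, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership that rises from a to a peak of 1.0 at b and falls back to 0.0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_classify(pages: float, minutes: float) -> Tuple[str, Dict[str, float]]:
    """Each rule votes with its truth value; the votes are summed per class."""
    votes = {
        # Rules for class "casual": few pages, short visit.
        "casual": [triangular(pages, 0, 2, 6), triangular(minutes, 0, 3, 10)],
        # Rules for class "engaged": many pages, long visit.
        "engaged": [triangular(pages, 2, 8, 14), triangular(minutes, 5, 20, 35)],
    }
    totals = {cls: sum(values) for cls, values in votes.items()}
    return max(totals, key=totals.get), totals


label, totals = fuzzy_classify(pages=4, minutes=8)
```

A session with 4 pages belongs partly to "few pages" (0.5) and partly to "many pages" (about 0.33), which illustrates how one data value can carry more than one fuzzy value before the votes are totaled.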
Classification on Web Mining
As stated earlier, classification refers to supervised learning, wherein the classes are
predefined by the researcher. Clustering is unsupervised learning, wherein there are no predefined
classes. In classification we have two sets of data, the training set and the testing set. The
training set is fed in to build the model (the database of learned characteristics). Then, based
on this model, the users in the testing set are classified. A number of classification
methodologies exist.
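As one illustration of the training and testing workflow described above, the following Python sketch trains a naive Bayesian classifier on labeled user records and then classifies unseen ones by maximizing P(X|C)·P(C). The function names and the toy data are hypothetical, and two assumptions go beyond the text: attributes are treated as conditionally independent given the class (the naive assumption), and Laplace smoothing is added to avoid zero probabilities.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], str]  # (attribute values, class label)


def train_naive_bayes(samples: List[Sample]):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (label, attr) -> Counter of values
    values_seen = defaultdict(set)       # attr -> distinct values in training data
    for features, label in samples:
        for attr, value in features.items():
            value_counts[(label, attr)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def nb_classify(model, features: Dict[str, str]) -> str:
    """Return the class C that maximizes P(X|C) * P(C)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # P(C): relative frequency of class C samples
        for attr, value in features.items():
            # P(x|C) with Laplace smoothing (an assumption of this sketch)
            score *= (value_counts[(label, attr)][value] + 1) / (n + len(values_seen[attr]))
        scores[label] = score
    return max(scores, key=scores.get)


training_set = [
    ({"visits": "high", "referrer": "search"}, "buyer"),
    ({"visits": "high", "referrer": "direct"}, "buyer"),
    ({"visits": "low", "referrer": "search"}, "visitor"),
    ({"visits": "low", "referrer": "direct"}, "visitor"),
    ({"visits": "high", "referrer": "search"}, "visitor"),
]
model = train_naive_bayes(training_set)
prediction = nb_classify(model, {"visits": "high", "referrer": "direct"})
```

P(X) is the same for every class, so it is left out of the score, exactly as in the maximization of P(X|C)·P(C) given in the Bayesian classifier section.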
IV. CONCLUSION
In this paper, we survey research in the area of Web mining with a focus on classification in
Web Usage Mining. Around the key topic of this paper, usage mining, we provide a detailed
description of user identification as well as classification. Among classification methods, the
Bayesian approach is the oldest and, in our view, the best approach for classifying users. User
identification using advanced techniques is better than constructing user profiles, which requires
information about the users. Many areas in this domain are yet to be explored, which may lead to
better personalization and delivery of the desired information to users.
V. REFERENCES
[1] Suneetha K.R and Dr. R. Krishnamoorthi (2009), Data Preprocessing and Easy Access
Retrieval of Data Through Data Ware House, Proceedings of the World Congress on
Engineering and Computer Science IWCECS, October 20-22, pp.306-311, San Francisco,
USA.
[2] Micheline Kamber, Lara Winstone, Wan Gong, Shan Cheng, Jiawei Han, Generalization and
Decision Tree Induction: Efficient Classification, Database Systems Research Laboratory School
of Computing Science
[3] Oren Etzioni(1996), The world-wide web: Quagmire or gold mine, Communications of the ACM,
Vol. 39(11) pp. 65–68.
[4] V. Chitra and Dr. Antony Selvdoss Davamani(2010),A Survey on Preprocessing Methods for Web
Usage Data. (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7,
No. 3,pp. 78-83, ISSN 1947-5500
[5] Alice Marascu and Florent Masseglia (2006) ,Mining Sequential Patterns from Data Streams: a
Centroid Approach, Journal of Intelligent Information Systems, Volume 27, Issue 3, pp 291-307
[6] R. Cooley, B. Mobasher, and J. Srivastava,(1997) , Web Mining: Information and Pattern
Discovery on the World Wide Web, University of Minnesota, Dept. of Computer Science,
Minneapolis, ACM SIGKDD, Vol.1, Issue 2 , pp. 12-23
[7] Guandong Xu, Yanchun Zhang, Jiangang Ma, Xiaofang Zhou, Discovering User Access Pattern
Based on Probabilistic Latent Factor Model, in ADC ’05: Proceedings of the sixteenth
Australasian database conference, Darlinghurst, Australia, Australian Computer Society, pp.27-35
[8] K.R.Suneetha, R. Krishnamoorti(2011), IRS: Intelligent Recommendation System for Web
Personalization, European Journal of Scientific Research, Inc., ISSN 1450-216X, Vol.65 Issue 2,
pp.175-186.
[9] Federico Michele Facca and Pier Luca Lanzi (2003), Recent Developments in Web Usage Mining
Research, in Proceedings of DaWaK, Prague, Czech Republic, LNCS, Springer Verlag
[10] Jeffrey Heer, Ed H. Chi (2002), Separating the swarm: categorization methods for user sessions on
the web, in Proceedings of ACM CHI 2002, Conference on Human Factors in Computing
Systems, pp.243-250, ACM Press, Minneapolis
[11] Yan Wang (2000), Web Mining and Knowledge Discovery of Usage Pattern, in Web Age
Information Management System, pp 227-232
[12] Uichin Lee, Zhenyu Liu, Junghoo Cho (2005), Automatic Identification of User Goals in Web
Search, University of California Los Angeles, CA 90095, in WWW2005: The 14th International
World Wide Web Conference.
[13] R. Kosala, H. Blockeel (2000), Web Mining Research: A Survey, in SIGKDD Explorations,
ACM Press, Vol 2 Issue 1, pp.1-15.
[14] Susan Gauch, Mirco Speretta, Aravind Chandramouli and Alessandro Micarelli(2007), User
Profiles for Personalized Information Access, Electrical Engineering and Computer Science
Information & Telecommunication Technology Center, The Adaptive Web, LNCS 4321, pp.54-
89, Springer-Verlag Berlin Heidelberg
[15] S. Vijayalakshmi and V.Mohan (2010), Mining of users access behavior for frequent sequential
pattern from web logs, International Journal of Database Management Systems ( IJDMS) Vol.2,
No.3,pp.31-45.
[16] Suneetha K.R, Dr. R. Krishnamoorthi(2009), Data Preprocessing and Easy Access Retrieval of
Data Through Data Ware House, Proceedings of the World Congress on Engineering and
Computer Science , Vol IWCECS 2009, October 20-22, San Francisco, USA,ISBN :978-988-
17012-6-8
[17] Li Chaofeng (2006), Research and Development of Data Preprocessing in Web Usage Mining,
School of Management, South-Central University for Nationalities, Wuhan 430074, P.R. China,
International Conference on Management Science and Engineering, pp.1311-1315
[18] Sayeesh and Dr. Nagaratna P. Hegde, “A Comparison of Multiple Wavelet Algorithms for Iris
Recognition”, International Journal of Computer Engineering & Technology (IJCET), Volume 4,
Issue 2, 2013, pp. 386 - 395, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[19] Sumana M and Hareesha K S, “Preprocessing and Secure Computations for Privacy Preservation
Data Mining”, International Journal of Computer Engineering & Technology (IJCET), Volume 4,
Issue 4, 2013, pp. 203 - 212, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
50320130403007
 
Applying association rules and co location techniques on geospatial web services
Applying association rules and co location techniques on geospatial web servicesApplying association rules and co location techniques on geospatial web services
Applying association rules and co location techniques on geospatial web services
 
Internet Prospective Study
Internet Prospective StudyInternet Prospective Study
Internet Prospective Study
 
Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)
 
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...PAS: A Sampling Based Similarity Identification Algorithm for compression of ...
PAS: A Sampling Based Similarity Identification Algorithm for compression of ...
 
Web mining
Web miningWeb mining
Web mining
 
Sekhon final 1_ppt
Sekhon final 1_pptSekhon final 1_ppt
Sekhon final 1_ppt
 
Minning www
Minning wwwMinning www
Minning www
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
International Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data ScienceInternational Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data Science
 
Client Forensics: An Assessment of Existing Research And Future Directions
Client Forensics: An Assessment of Existing Research And Future DirectionsClient Forensics: An Assessment of Existing Research And Future Directions
Client Forensics: An Assessment of Existing Research And Future Directions
 
Web Content Mining
Web Content MiningWeb Content Mining
Web Content Mining
 

Similar to Research on classification algorithms and its impact on web mining

AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB US...
AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB US...AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB US...
AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB US...James Heller
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine ScrapperIRJET Journal
 
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...IAEME Publication
 
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...cscpconf
 
A Review on Pattern Discovery Techniques of Web Usage Mining
A Review on Pattern Discovery Techniques of Web Usage MiningA Review on Pattern Discovery Techniques of Web Usage Mining
A Review on Pattern Discovery Techniques of Web Usage MiningIJERA Editor
 
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAMINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAcscpconf
 
Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data csandit
 
Development of pattern knowledge discovery framework using
Development of pattern knowledge discovery framework usingDevelopment of pattern knowledge discovery framework using
Development of pattern knowledge discovery framework usingIAEME Publication
 
A Novel Framework on Web Usage Mining
A Novel Framework on Web Usage MiningA Novel Framework on Web Usage Mining
A Novel Framework on Web Usage MiningIRJET Journal
 
A Novel Method for Data Cleaning and User- Session Identification for Web Mining
A Novel Method for Data Cleaning and User- Session Identification for Web MiningA Novel Method for Data Cleaning and User- Session Identification for Web Mining
A Novel Method for Data Cleaning and User- Session Identification for Web MiningIJMER
 
The Data Records Extraction from Web Pages
The Data Records Extraction from Web PagesThe Data Records Extraction from Web Pages
The Data Records Extraction from Web Pagesijtsrd
 
User Navigation Pattern Prediction from Web Log Data: A Survey
User Navigation Pattern Prediction from Web Log Data:  A SurveyUser Navigation Pattern Prediction from Web Log Data:  A Survey
User Navigation Pattern Prediction from Web Log Data: A SurveyIJMER
 
C03406021027
C03406021027C03406021027
C03406021027theijes
 
IRJET-A Survey on Web Personalization of Web Usage Mining
IRJET-A Survey on Web Personalization of Web Usage MiningIRJET-A Survey on Web Personalization of Web Usage Mining
IRJET-A Survey on Web Personalization of Web Usage MiningIRJET Journal
 
A detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniquesA detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniquesijctet
 
STRATEGY AND IMPLEMENTATION OF WEB MINING TOOLS
STRATEGY AND IMPLEMENTATION OF WEB MINING TOOLSSTRATEGY AND IMPLEMENTATION OF WEB MINING TOOLS
STRATEGY AND IMPLEMENTATION OF WEB MINING TOOLSAM Publications
 
WEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESSWEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESSacijjournal
 

Similar to Research on classification algorithms and its impact on web mining (20)

Pf3426712675
Pf3426712675Pf3426712675
Pf3426712675
 
AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB US...
AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB US...AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB US...
AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB US...
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine Scrapper
 
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
ANALYTICAL IMPLEMENTATION OF WEB STRUCTURE MINING USING DATA ANALYSIS IN ONLI...
 
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
 
50120140504006
5012014050400650120140504006
50120140504006
 
A Review on Pattern Discovery Techniques of Web Usage Mining
A Review on Pattern Discovery Techniques of Web Usage MiningA Review on Pattern Discovery Techniques of Web Usage Mining
A Review on Pattern Discovery Techniques of Web Usage Mining
 
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAMINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
 
Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data
 
Development of pattern knowledge discovery framework using
Development of pattern knowledge discovery framework usingDevelopment of pattern knowledge discovery framework using
Development of pattern knowledge discovery framework using
 
A Novel Framework on Web Usage Mining
A Novel Framework on Web Usage MiningA Novel Framework on Web Usage Mining
A Novel Framework on Web Usage Mining
 
A Novel Method for Data Cleaning and User- Session Identification for Web Mining
A Novel Method for Data Cleaning and User- Session Identification for Web MiningA Novel Method for Data Cleaning and User- Session Identification for Web Mining
A Novel Method for Data Cleaning and User- Session Identification for Web Mining
 
The Data Records Extraction from Web Pages
The Data Records Extraction from Web PagesThe Data Records Extraction from Web Pages
The Data Records Extraction from Web Pages
 
User Navigation Pattern Prediction from Web Log Data: A Survey
User Navigation Pattern Prediction from Web Log Data:  A SurveyUser Navigation Pattern Prediction from Web Log Data:  A Survey
User Navigation Pattern Prediction from Web Log Data: A Survey
 
C03406021027
C03406021027C03406021027
C03406021027
 
IRJET-A Survey on Web Personalization of Web Usage Mining
IRJET-A Survey on Web Personalization of Web Usage MiningIRJET-A Survey on Web Personalization of Web Usage Mining
IRJET-A Survey on Web Personalization of Web Usage Mining
 
A detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniquesA detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniques
 
A Clustering Based Approach for knowledge discovery on web.
A Clustering Based Approach for knowledge discovery on web.A Clustering Based Approach for knowledge discovery on web.
A Clustering Based Approach for knowledge discovery on web.
 
STRATEGY AND IMPLEMENTATION OF WEB MINING TOOLS
STRATEGY AND IMPLEMENTATION OF WEB MINING TOOLSSTRATEGY AND IMPLEMENTATION OF WEB MINING TOOLS
STRATEGY AND IMPLEMENTATION OF WEB MINING TOOLS
 
WEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESSWEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESS
 

More from IAEME Publication

IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME Publication
 
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...IAEME Publication
 
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSA STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSIAEME Publication
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSBROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSIAEME Publication
 
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSDETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSIAEME Publication
 
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSIAEME Publication
 
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOVOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOIAEME Publication
 
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IAEME Publication
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYVISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYIAEME Publication
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...IAEME Publication
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEIAEME Publication
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...IAEME Publication
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...IAEME Publication
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...IAEME Publication
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...IAEME Publication
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...IAEME Publication
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...IAEME Publication
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...IAEME Publication
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...IAEME Publication
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTIAEME Publication
 

More from IAEME Publication (20)

IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdf
 
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
 
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSA STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSBROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
 
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSDETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
 
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
 
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOVOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
 
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYVISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICE
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
 

Recently uploaded

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Research on classification algorithms and its impact on web mining

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), pp. 495-504, © IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

RESEARCH ON CLASSIFICATION ALGORITHMS AND ITS IMPACT ON WEB MINING

Prof. Sindhu P Menon1, Assistant Professor, Dept of CSE, KLEIT, Gokul Road, Hubli, Karnataka, India
Dr. Nagaratna P Hegde2, Professor, Dept. of CSE, VCE, Ibrahimbagh, Hyderabad, India

ABSTRACT

Web mining is the application of data mining techniques to discover patterns from the web. Web mining draws on two research fields: data mining and the World Wide Web (WWW). Data mining, a field of computer science, is the process of discovering new patterns from large datasets using methods from artificial intelligence and database systems. The goal of data mining is to extract information from the dataset [19]. The World Wide Web provides an effective and simple way for users to search, browse and retrieve information. Web mining can be broadly classified into i) Web Structure Mining, ii) Web Content Mining and iii) Web Usage Mining [15].

This paper is a survey of published papers on web mining, with the main focus on web usage mining, which is an application of web mining. Web usage mining is used to extract patterns from log data; these patterns are then examined to obtain behavioral patterns that can be analyzed for further processing. The input for all these operations is the log file.

Keywords: Clustering, Classification, Web Mining, Associative Rule Mining.

I. INTRODUCTION

The term web mining refers to applying data mining techniques to extract useful information from the web. Because comparatively little research has been carried out in web mining, its implementation remains complex. Web mining covers many other areas such as databases, information retrieval, and artificial intelligence. The term web mining was first coined by Oren Etzioni in 1996 [21], who described web mining as the use of mining techniques to extract information from the WWW and other services.
To start with, the web contains three types of data: i) web structure data, ii) content (the data of the web), and iii) usage (web log data) [11]. Web mining is accordingly categorized into i) Web Structure Mining, ii) Web Content Mining, and iii) Web Usage Mining.

Web structure mining [15], one of the three categories of web mining, is used to identify the relationship between Web pages linked by information or by a direct link connection. This connection allows data relating to a search query to be extracted directly from the linking Web page. Structure mining addresses two problems of the World Wide Web. The first is irrelevant search results: relevance becomes a problem because search engines often allow only low-precision criteria. The second is that recall in content mining is low because of the vast amount of information provided on the Web. Structure mining reduces both problems by uncovering the hyperlink structure of the Web. The concept behind structure mining is to retrieve previously unexplored relationships from web data. Structure mining is used in a number of applications; a business, for example, can use it to link the details of its website so that users can navigate and cluster information. Users can then access information through keyword association and content mining.

Web content mining (text mining) [15] is generally the second step in Web data mining. Content mining is the scanning of the text, pictures and graphs of a Web page. This scanning is usually completed after the clustering process performed by structure mining, and it provides results based on that clustering. Given the huge amount of information available on the World Wide Web, content mining provides the results to search engines.
Content mining is directed toward the specific information that the customer enters into a search engine, which in turn allows the entire Web to be scanned. Text mining is very effective when dealing with specific topics. The main use of this type of data mining is to gather and organize the information available on the WWW in response to a user request. The large number of results obtained in this way improves navigation patterns on the web.

Web usage mining [15] is the main category in web mining. It allows Web access information for Web pages to be collected, including the paths that lead to the accessed pages. This information is often gathered automatically into web logs. Usage mining lets a business produce useful information about the future of its operations, some of which can be derived from the collected information. The usage data that is gathered makes it possible to produce results more effectively. Usage mining is useful in a number of areas such as online trading and businesses based on web trading. The information gives an idea of the number of users and the amount of business on each site. It also enables Web-based businesses to provide the best access routes to services or advertisements. Usage processing is used for complete pattern discovery. Pattern discovery [8] is difficult because only small pieces of information such as IP addresses and URLs are available, and with so little information it is hard to identify the user. Another use is structure processing, which consists of analyzing the structure of each page contained in a Web site.

II. RELATED WORK

Vijayalakshmi [15] explored a sequential technique called AWAPT (Adaptive Web Access Pattern Tree) for Frequent Sequential Pattern (FSP) mining, one of the most common methods of data mining. The main focus is on discovering the relationships between the access patterns obtained from the web log file.
The pattern discovery process is carried out by applying the FSP method to the log file, and the result is analyzed afterwards. The main steps of the technique are as follows. The algorithm first scans the raw log file to find the separate user activities. A web access pattern (WAP) tree, an efficient data structure, is then constructed from the frequent access patterns. Later, intermediate WAP trees are constructed. Because memory and time are the critical issues, AWAPT is proposed as a variant that avoids recursion [15].

A. Rule of AWAPT [15]:

Given a WAP-tree with some nodes, the binary position code of each node is assigned by the following rule. The root has a null position code and the leftmost child of the root has the code 1. The code of any other node is derived by appending 1 to the position code of its parent if the node is the leftmost child, appending 10 if it is the second leftmost child, appending 100 if it is the third leftmost child, and so on. In general, for the nth leftmost child, the position code is obtained by appending the binary number for $2^{n-1}$ to the parent's code. A node α is an ancestor of another node β if and only if the position code of α, with "1" appended to its end, equals the first x bits of the position code of β, where x is (the number of bits in the position code of α) + 1.

The authors of [18] proposed rules for every phase of data preprocessing. Because the raw web log file contains a great deal of irrelevant data, they applied several heuristics to reduce the size of the file and improve its quality. User identification is also a main concern of that paper; the IP address was used as the key element for distinguishing users. The access log is used together with the referrer log to construct the access patterns. Path completion is also critical because many access records are not kept in the log file, so the missing references have to be filled in.

In [10], clustering was the main goal. Because there was prior knowledge about the tasks, accuracy reached up to 99%.
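The position-code rule of subsection A can be illustrated with a minimal Python sketch. The function names `child_code` and `is_ancestor` are hypothetical and chosen only for this illustration; position codes are represented as strings of "0" and "1", which is an assumption of the sketch and not something the source prescribes.

```python
def child_code(parent_code: str, n: int) -> str:
    """Position code of the n-th leftmost child (n starts at 1).

    The binary number for 2**(n-1) is a '1' followed by n-1 zeros,
    and it is appended to the parent's position code.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return parent_code + "1" + "0" * (n - 1)


def is_ancestor(code_a: str, code_b: str) -> bool:
    """True if the node with code_a is an ancestor of the node with code_b.

    Implements the rule: code_a with '1' appended must equal the
    first x bits of code_b, where x = len(code_a) + 1.
    """
    x = len(code_a) + 1
    return code_a + "1" == code_b[:x]


ROOT = ""                       # the root has a null position code
first = child_code(ROOT, 1)     # leftmost child of the root
second = child_code(ROOT, 2)    # second leftmost child of the root
grandchild = child_code(second, 1)

print(first, second, grandchild)
print(is_ancestor(second, grandchild), is_ancestor(first, grandchild))
```

Because the ancestor test is a plain prefix comparison, it needs no tree traversal, which fits the stated aim of avoiding recursion.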
Using the viewing time of pages as a clustering feature increased effectiveness and robustness. Vectors are created to identify the features of each web page, and the page is then described as a multi-modal vector. The user sessions are later modeled as multi-modal vectors as well. Finally, the sessions are clustered, which yields the different categories. Many researchers are working in this field to identify and classify users effectively, which helps in personalization. The first step in web usage mining (WUM) is data cleaning, because the raw web log file includes a lot of noise that has to be removed.

Yan Wang [11] described pattern discovery in the form of statistical analysis, association rules, clustering, classification, sequential patterns and dependency modeling. He also used a web usage mining framework called WebSIFT, which uses the content and structure information from a Web site and identifies the interesting results from mining usage data [6]. Due to the extensive growth of e-commerce, privacy has become a critical topic for many researchers. Applications of web mining have led to problems such as spam and the theft of users' sensitive information during online shopping and online banking. He carried out work related to user privacy and navigation pattern discovery.

The authors of [12] described two categories of effective features for identifying the goal of a query: past user-click behavior and anchor-link distribution. The aim is to understand the goal of the user query clearly, which may help in improving the search engine's service. They used a taxonomy of query goals that relies mainly on [19]:

1) Navigational queries: Here users already have a particular website in mind, and the query is issued only to reach or verify it. The users might have visited the site previously, so the result will be accurate.
2) Informational queries: Here users do not assume any particular web page in advance and there will be multiple results; the user discovers new websites while searching, depending on the requirement.

An approach to developing user profiles is given in [14]. It helps in identifying users' interests more accurately and efficiently, enabling effective personalized services. User profiles may also be used to obtain user information such as the location of the user and previously visited sites. The basic requirement for constructing a user profile is user details, so users have to be identified uniquely. The available methods are software agents, logins, enhanced proxy servers, cookies, and session ids. After identification, user information must be collected. It can be collected through HTML forms, in which users fill in their details using checkboxes and radio buttons. This is explicit collection, and it requires the user's cooperation. The other method is implicit collection, which does not require the user's intervention. Its techniques are browser cache, proxy servers, browser agents, desktop agents, web logs, and search logs. User profiles are constructed from the information obtained.

The authors of [9] described recent developments in WUM, which is receiving more and more attention. Most of this research effort focuses on three main paradigms: association rules, sequential patterns, and clustering. Association rules are used to find associations among pages that often appear together in users' sessions. A typical result has the form "X.html, Y.html => Z.html", which states that if a user has visited page X.html and page Y.html, it is very likely that the same user has also visited page Z.html in the same session. Sequential patterns are used to explore frequent sequences in large amounts of sequential data. In web usage mining, sequential patterns are discovered to find navigational patterns that occur in users' sessions. Clustering is mainly used to group similar sessions. Multi-modal clustering [10] builds clusters by using multiple information data features. [20] presents an application of matrix clustering to web usage data.

[3] integrated a naïve Bayesian multi-net to perform the user identification task. This mainly interprets the distinct user patterns, which helps to retain customers.
This approach is carried out with e-transaction data; another strategy uses click stream data. Transactions are the main criterion. The user identification function is derived from the set of transactions and the set of users. The approach in [3] has the following steps:

1. Characteristic pattern mining: The characteristic patterns are mined from a training transaction set by filtering out user-extrinsic behaviors, such as patterns common to most users and accidental behavior patterns of a particular user.
2. Identification function construction: The identification function is built on the learned characteristic patterns using a naïve Bayesian multi-net.
3. User identification: The characteristic patterns in each given transaction are recognized and used by the identification function to determine the user of the transaction.

Characteristic pattern mining and related concepts are also defined. The user confidence $C(pt_k, u_j)$ of pattern $pt_k$ and user $u_j$ is the conditional probability that a transaction containing $pt_k$ was from user $u_j$.

In [4], they developed a general architecture for Web usage mining, which is presented in [6] and [21]. WEBMINER is a system that implements parts of this general architecture. The architecture divides the Web usage mining process into two main parts. The first part includes the domain-dependent processes of transforming the Web data into a suitable transaction form. This includes the preprocessing, transaction identification, and data integration components. The second part includes the largely domain-independent application of generic data mining and pattern matching techniques (such as the discovery of association rules and sequential patterns) as part of the system's data mining engine.

[7] presented a probabilistic latent semantic analysis (PLSA) model, which can reveal the hidden semantic factors and present user access patterns from the session-page observation data.
The data from two different sources, namely web access log files (usage data) and the web site map (linkage information), are integrated to generate linkage-enhanced usage data. The integrated usage data are in turn viewed as user session data in terms of page view-weight pairs and are used to derive the user access patterns based on the PLSA model. In addition, the user access patterns are characterized by user profiles, which are presented as weighted page sets. The weights in the user profiles can be used to determine the main theme of individual and common user access patterns, which provides useful information for other web applications such as web recommendation or personalization.

III. PHASES IN WEB USAGE CLASSIFICATION

Fig. 1 depicts the various phases in user classification.

Fig.1 Phases of User Classification

This model has been proposed in [16]. The sub-modules are described below.

A. User identification

User identification is the crucial step in the data preprocessing model. Because it is very challenging to find a particular user among the heaps of log records stored on the server, there are several ways to identify users [14]:

1) Software agents: These are small application modules installed on the user's computer. They keep track of all the web transactions of the user. The only assumption is that both the server and the user have the same application and information.
2) User id: Here identification is accurate, since users themselves give their identification data through user ids and passwords. The only assumption is that all servers provide registration forms.
3) Cookies: These are chunks of information stored on the user's computer by the server. This is the most efficient technique, but the user must have allowed the cookie to be set on the machine; otherwise no information is stored.

An algorithm for user identification [8] is as follows:

Algorithm: User Identification
Input: Log Database.
Output: Unique Users Database.
Step 1: Initialize IPList, UsersList, BrowserList and OSList to empty; No-of-users = 0.
Step 2: Read Record from Log Database.
Step 3: If Record.IPaddress is not in IPList then
            add Record.IPaddress to IPList
            add Record.Browser to BrowserList
            add Record.OS to OSList
            increment No-of-users
            insert new user into UsersList
        Else if Record.IPaddress is in IPList and (Record.Browser is not in BrowserList or Record.OS is not in OSList) then
            increment No-of-users
            insert as new user into UsersList
        End if
Step 4: Repeat steps 2 to 3 until eof(Log Database).
Step 5: Stop the process.

The outcome of this algorithm is a Unique Users Database, which gives the total number of individual users together with each user's IP address, user agent and browser [1].

In [3], transactions are used to identify users. The identification function is described as a Boolean function

$f : \{0,1\}^{|I|} \rightarrow \{0,1\}^{|U|+1}$

where $|I|$ is the total number of possible items in the transaction set T, and $|U|+1$ is the total number of possible users plus the unknown user. The algorithm [3] is given below:

Algorithm: Identifying users
Input: A transaction t whose user is to be identified
Output: Identified user u^, or the unknown user u0
Procedure user_identification(t)
{
    Identify all characteristic patterns ptk in t, i.e., ptk ⊆ t and ptk ∈ PTc, and collect them in PTt;
    If PTt is empty
        Return u0
    Else
        Let PTt = PTc - PTt;
        Remove any pattern that is a sub-pattern of another pattern in PTt
        Remove any pattern that is a super-pattern of another pattern in vector PT
        For all ui ∈ U, compute the value of the identification function
        Return uj with the maximum value
    End if
}
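The IP/browser/OS heuristic from the user identification algorithm of [8] can be sketched in Python as follows. This is a simplified illustration and not the authors' implementation: instead of three separate global lists it treats every distinct (IP address, browser, OS) triple as one user, which is an assumption of this sketch. The `LogRecord` type and the `identify_users` function are hypothetical names.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class LogRecord(NamedTuple):
    """One parsed web log entry (hypothetical, minimal set of fields)."""
    ip: str
    browser: str
    os: str
    url: str


UserKey = Tuple[str, str, str]


def identify_users(
    records: Iterable[LogRecord],
) -> Tuple[Dict[UserKey, int], List[Tuple[int, LogRecord]]]:
    """Assign a user id to every record.

    A new user is created when the IP address is new, or when the IP
    address is known but appears with a new browser or operating system.
    """
    users: Dict[UserKey, int] = {}               # triple -> user id
    assignments: List[Tuple[int, LogRecord]] = []
    for rec in records:
        key = (rec.ip, rec.browser, rec.os)
        if key not in users:
            users[key] = len(users) + 1          # ids start at 1
        assignments.append((users[key], rec))
    return users, assignments


records = [
    LogRecord("10.0.0.1", "Firefox", "Linux", "/index.html"),
    LogRecord("10.0.0.1", "Firefox", "Linux", "/products.html"),
    LogRecord("10.0.0.1", "Chrome", "Windows", "/index.html"),
    LogRecord("10.0.0.2", "Firefox", "Linux", "/contact.html"),
]
users, assignments = identify_users(records)
print(len(users), [uid for uid, _ in assignments])
```

The heuristic cannot separate two people who share one machine and browser, nor recognize one person who switches devices, which is why the section also lists logins and cookies as alternatives.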
B. Session Identification

A session is normally defined as the set of page visits made by a user within a certain period of time. A particular user may have a single session or multiple sessions. The user's access pattern within a particular session can also be reconstructed using reconstruction techniques. In [9], two steps are mentioned for session identification:

• Identifying the different sessions of the user from the very sparse data available in the server log file.
• Reconstructing the user's navigation patterns within the sessions already identified.

They used cookies to rewrite the URLs, and they also addressed the web browser caching problem.

In [4], click streams of users are used, and the process is called sessionization. Here a transaction is defined as a subset of a user session that consists of homogeneous pages. The two methods described are:

1. Time Oriented Heuristics: This method is based on total session time. A page view is defined as the set of pages visited by a specific user at a specific time. The session time limit can vary from 10 minutes to 10 hours depending on the patterns. Another method uses the time stamps of two records of the same user to bound the maximum time the user has stayed on a page.
2. Navigation Oriented Heuristics: Here the web topology is used in graph form. The referrer field of the web log file is used to decide whether a page starts a new session or must be added to the previous session.

In [7], the authors give an algorithm for grouping user sessions.
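Before turning to that grouping algorithm, the two time-oriented heuristics above can be illustrated with a short Python sketch. It is only one possible reading of the description: the function name `sessionize` is hypothetical, and the default limits of 30 minutes per session and 10 minutes per page are illustrative assumptions, not values taken from [4].

```python
from typing import List, Tuple

Request = Tuple[int, str]   # (timestamp in seconds, requested URL)


def sessionize(
    requests: List[Request],
    max_session: int = 30 * 60,    # assumed limit on total session time
    max_page_stay: int = 10 * 60,  # assumed limit on time spent on one page
) -> List[List[Request]]:
    """Split the time-ordered requests of ONE user into sessions.

    A new session starts when the total session time would exceed
    max_session, or when the gap to the previous request exceeds
    max_page_stay.
    """
    sessions: List[List[Request]] = []
    current: List[Request] = []
    for ts, url in requests:
        if current:
            too_long = ts - current[0][0] > max_session
            too_idle = ts - current[-1][0] > max_page_stay
            if too_long or too_idle:
                sessions.append(current)
                current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions


log = [(0, "/a"), (100, "/b"), (1000, "/c"), (1100, "/d"), (2000, "/e")]
for session in sessionize(log):
    print([url for _, url in session])
```

A navigation-oriented variant would replace the two time checks with a test on the referrer field, opening a new session whenever the referrer is not a page already in the current session.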
The session-grouping algorithm proposed in [7] is as follows:

Algorithm: Grouping User Sessions
Input: P(zk|si), user session-page matrix SPij, threshold μ
Output: A set of clusters SCL = (SCL1, SCL2, …, SCLk)
Begin
Step 1: SCL1 = SCL2 = ... = SCLk = ∅
Step 2: For each si ∈ S, select P(zk|si); if P(zk|si) >= μ then SCLk = SCLk ∪ si
Step 3: If there are still user sessions to be clustered, go back to Step 2
Step 4: Return clusters SCL = {SCLk}

C. User Classification

Classification means supervised learning, in which the classes are defined in advance. The alternative, unsupervised learning, is also called clustering; there the classes are dynamic.

1. Decision tree induction

A decision tree [2] is a flow-chart-like tree structure. The internal nodes denote a test on an attribute or a condition, and the branches represent the outcomes of that test. The leaf nodes represent the classes, which are already defined. Decision tree generation consists of two steps:

a. Constructing the tree: all training examples start at the root and are partitioned recursively using attributes or features.
b. Tree pruning: noise and defects are removed from the tree, especially from the branches. Identifying this noise is a complex process.

Algorithm: Decision tree
Step 1: Construct the tree in a top-down recursive manner using the divide and conquer method.
Step 2: Keep all the training samples at the root.
Step 3: If attributes have continuous values, discretize them.
Step 4: Categorize the attributes.
Step 5: Partition the samples recursively according to the selected attributes.
Step 6: Select attributes on the basis of a statistical measure such as information gain.
Step 7: Stop partitioning if all samples at a particular node are in the same class.
Step 8: Stop partitioning if there are no remaining attributes or samples.

2. Bayesian classifier

Given training data D, the posterior probability of a hypothesis h, P(h|D), follows Bayes' theorem:

$P(h \mid D) = \dfrac{P(D \mid h)\,P(h)}{P(D)}$

The MAP (maximum a posteriori) hypothesis is

$h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\,P(h)$

The classification problem may be formalized using posterior probabilities: P(C|X) is the probability that the sample tuple X = <x1, …, xk> is of class C, e.g. P(class=N | outlook=sunny, windy=true, …). The idea is to assign to sample X the class label C such that P(C|X) is maximal. By Bayes' theorem,

$P(C \mid X) = \dfrac{P(X \mid C)\,P(C)}{P(X)}$

where P(X) is constant for all classes and P(C) is the relative frequency of class C samples. Hence the C for which P(C|X) is maximum is the C for which P(X|C)·P(C) is maximum.

3. Other Classification approaches

3.1) K-nearest neighbor: Here instances are represented as points.

Algorithm: K-Nearest Neighbor
Step 1: All samples are represented as points in an n-dimensional space.
Step 2: The nearest neighbors of a point are determined using Euclidean distance.
Step 3: The target function may be either discrete or real valued.
Step 4: If a discrete target function is used, K-NN returns the most common value among the k training examples nearest to the query point xq. The weight of a neighbor xi is calculated as $w = 1 / d(x_q, x_i)^2$.

3.2) Genetic Algorithm: This approach is analogous to biological evolution. Each rule is represented as a string of bits. An initial population is created from randomly generated rules. Following the notion of survival of the fittest, the fitness of a rule is the accuracy with which it classifies a set of training examples. Crossover and mutation of rules generate offspring.

3.3) Rough set approach: In this approach the equivalence classes are defined approximately or roughly. A given class C is approximated by two sets: a lower approximation (samples certain to be in C) and an upper approximation (samples that cannot be described as not in C). Finding the minimum set of attributes is NP-hard, and a discernibility matrix is used to reduce the computation intensity.

3.4) Fuzzy logic: It uses truth values between 0.0 and 1.0, which represent the degree of fuzzy membership. Attribute values are converted to fuzzy values, and more than one fuzzy value can apply to a given data item. Each rule that is applied contributes a vote for membership in the corresponding class. Finally, the truth values for each class are summed.

Classification on Web Mining

As stated earlier, classification refers to supervised learning, in which the classes are predefined by the researcher, whereas clustering is unsupervised learning with no predefined classes. In classification we have two sets of data, the training set and the testing set. Using the training set, we feed in data according to the characteristics of the model and build the database. Then, based on this database, we classify the users in the testing set.
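The train-then-classify procedure and the rule "assign the class C that maximizes P(X|C)·P(C)" from the Bayesian classifier subsection can be made concrete with a small Python sketch. Several things here are assumptions of the sketch and not part of the source: attributes are treated as conditionally independent given the class (the naive Bayes assumption), add-one smoothing is used so that unseen values do not produce zero probabilities, and the toy attributes ("pages", "referrer") and class labels are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], str]   # (attribute values, class label)


def train(samples: List[Sample]):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)
    for feats, label in samples:
        for attr, val in feats.items():
            value_counts[label][attr][val] += 1
            values_seen[attr].add(val)
    return class_counts, value_counts, values_seen


def classify(model, feats: Dict[str, str]) -> str:
    """Return the class C with the largest P(X|C) * P(C)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / total                       # P(C): relative frequency
        for attr, val in feats.items():
            count = value_counts[label][attr][val]
            # P(x|C) with add-one smoothing (assumption of this sketch)
            score *= (count + 1) / (n_c + len(values_seen[attr]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training_set = [
    ({"pages": "many", "referrer": "search"}, "buyer"),
    ({"pages": "many", "referrer": "direct"}, "buyer"),
    ({"pages": "few", "referrer": "search"}, "visitor"),
    ({"pages": "few", "referrer": "direct"}, "visitor"),
    ({"pages": "few", "referrer": "search"}, "visitor"),
]
model = train(training_set)
print(classify(model, {"pages": "many", "referrer": "direct"}))
print(classify(model, {"pages": "few", "referrer": "search"}))
```

P(X) is never computed because it is the same for every class and therefore does not change which class attains the maximum.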
A number of classification methodologies exist.

IV. CONCLUSION

In this paper, we survey research in the area of Web mining with a focus on classification in Web Usage Mining. Around the key topic of this paper, usage mining, we provide a detailed description of user identification as well as classification. In classification, we regard the Bayesian approach as the oldest and best approach for classifying users. User identification using advanced techniques is a better option than constructing user profiles, which requires information supplied by the users. Many areas in this domain are yet to be explored, and they may lead to better personalization and to providing users with the information they want.

V. REFERENCES

[1] Suneetha K.R and Dr. R. Krishnamoorthi (2009), Data Preprocessing and Easy Access Retrieval of Data Through Data Ware House, Proceedings of the World Congress on Engineering and Computer Science IWCECS, October 20-22, pp. 306-311, San Francisco, USA.
[2] Micheline Kamber, Lara Winstone, Wan Gong, Shan Cheng, Jiawei Han, Generalization and Decision Tree Induction: Efficient Classification, Database Systems Research Laboratory, School of Computing Science.
[3] Oren Etzioni (1996), The world-wide web: Quagmire or gold mine, Communications of the ACM, Vol. 39(11), pp. 65-68.
[4] V. Chitra and Dr. Antony Selvdoss Davamani (2010), A Survey on Preprocessing Methods for Web Usage Data, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 3, pp. 78-83, ISSN 1947-5500.
[5] Alice Marascu and Florent Masseglia (2006), Mining Sequential Patterns from Data Streams: a Centroid Approach, Journal of Intelligent Information Systems, Volume 27, Issue 3, pp. 291-307.
[6] R. Cooley, B. Mobasher, and J. Srivastava (1997), Web Mining: Information and Pattern Discovery on the World Wide Web, University of Minnesota, Dept. of Computer Science, Minneapolis, ACM SIGKDD, Vol. 1, Issue 2, pp. 12-23.
[7] Guandong Xu, Yanchun Zhang, Jiangang Ma, Xiaofang Zhou, Discovering User Access Pattern Based on Probabilistic Latent Factor Model, in ADC '05: Proceedings of the Sixteenth Australasian Database Conference, Darlinghurst, Australia, Australian Computer Society, pp. 27-35.
[8] K.R. Suneetha, R. Krishnamoorthi (2011), IRS: Intelligent Recommendation System for Web Personalization, European Journal of Scientific Research, Inc., ISSN 1450-216X, Vol. 65, Issue 2, pp. 175-186.
[9] Federico Michele Facca and Pier Luca Lanzi (2003), Recent Developments in Web Usage Mining Research, in Proceedings of DaWaK, Prague, Czech Republic, LNCS, Springer Verlag.
[10] Jeffrey Heer, Ed H. Chi (2002), Separating the swarm: categorization methods for user sessions on the web, in Proceedings of ACM CHI 2002, Conference on Human Factors in Computing Systems, pp. 243-250, ACM Press, Minneapolis.
[11] Yan Wang (2000), Web Mining and Knowledge Discovery of Usage Pattern, in Web Age Information Management System, pp. 227-232.
[12] Uichin Lee, Zhenyu Liu, Junghoo Cho (2005), Automatic Identification of User Goals in Web Search, University of California Los Angeles, CA 90095, in WWW2005: The 14th International World Wide Web Conference.
[13] R. Kosala, H. Blockeel (2000), Web Mining Research: A Survey, in SIGKDD Explorations, ACM Press, Vol. 2, Issue 1, pp. 1-15.
[14] Susan Gauch, Mirco Speretta, Aravind Chandramouli and Alessandro Micarelli (2007), User Profiles for Personalized Information Access, Electrical Engineering and Computer Science, Information & Telecommunication Technology Center, The Adaptive Web, LNCS 4321, pp. 54-89, Springer-Verlag Berlin Heidelberg.
[15] S. Vijayalakshmi and V. Mohan (2010), Mining of users access behavior for frequent sequential pattern from web logs, International Journal of Database Management Systems (IJDMS), Vol. 2, No. 3, pp. 31-45.
[16] Suneetha K.R, Dr. R. Krishnamoorthi (2009), Data Preprocessing and Easy Access Retrieval of Data Through Data Ware House, Proceedings of the World Congress on Engineering and Computer Science, Vol. IWCECS 2009, October 20-22, San Francisco, USA, ISBN: 978-988-17012-6-8.
[17] Li Chaofeng (2006), Research and Development of Data Preprocessing in Web Usage Mining, School of Management, South-Central University for Nationalities, Wuhan 430074, P.R. China, International Conference on Management Science and Engineering, pp. 1311-1315.
[18] Sayeesh and Dr. Nagaratna P. Hegde, "A Comparison of Multiple Wavelet Algorithms for Iris Recognition", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 386-395, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[19] Sumana M and Hareesha K S, "Preprocessing and Secure Computations for Privacy Preservation Data Mining", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 203-212, ISSN Print: 0976-6367, ISSN Online: 0976-6375.