International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), © IAEME
RESEARCH ON CLASSIFICATION ALGORITHMS AND ITS IMPACT ON
WEB MINING
Prof. Sindhu P Menon, Assistant Professor, Dept. of CSE, KLEIT, Gokul Road, Hubli, Karnataka, India
Dr. Nagaratna P Hegde, Professor, Dept. of CSE, VCE, Ibrahimbagh, Hyderabad, India
ABSTRACT
Web mining is the application of data mining techniques to discover patterns from the web. It draws on two research fields: data mining and the World Wide Web (WWW). Data mining, a field of computer science, is the process of discovering new patterns from large datasets using methods from artificial intelligence and database systems; the goal of data mining is to extract information from the dataset [19]. The World Wide Web provides an effective and simple way for users to search, browse and retrieve information from the web. Web mining can be broadly classified into i) Web Structure Mining, ii) Web Content Mining and iii) Web Usage Mining [15].
This paper is a survey based on published papers on web mining, with the main focus on web usage mining. Web usage mining is an application of web mining used to extract patterns from log data; these patterns are then examined to obtain behavioral patterns that can be analyzed for further processing. The input for all of these operations is the log file.
Keywords: Clustering, Classification, Web Mining, Associative Rule Mining.
I. INTRODUCTION
The term web mining refers to applying data mining techniques to extract useful information from the web. Since comparatively little research has been carried out in web mining, its implementation remains complex. Web mining also draws on a number of other areas, such as databases, information retrieval and artificial intelligence. The term web mining was first coined by Oren Etzioni in 1996 [21]; he claimed that web mining [21] is the use of mining techniques to extract information from the WWW and other services.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976-6367 (Print), ISSN 0976-6375 (Online)
Volume 4, Issue 4, July-August (2013), pp. 495-504
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com
To begin with, the web contains three types of data: i) structure data, ii) content data (the data of web pages) and iii) usage data (web log data) [11]. Accordingly, web mining is usually categorized into:
i) Web Structure Mining
ii) Web Content Mining
iii) Web Usage Mining
Web structure mining [15], one of the three categories of web mining, is used to identify the relationships between Web pages linked by information or by direct link connections. These connections allow data relating to a search query to be extracted directly from the linking Web page. Structure mining addresses two problems of the World Wide Web. The first is irrelevant search results: relevance becomes a problem because search engines often allow low-precision criteria. The second is low recall in content mining, caused by the vast amount of information provided on the Web; uncovering the Web's hyperlink structure helps reduce this information overload. The concept behind structure mining is to retrieve previously unexplored relationships from web data. Structure mining finds use in a number of applications; a business, for example, can use it to link its website details so that users can navigate and cluster information, accessing it through keyword linkage and content mining.
Web content mining (text mining) [15] is generally the second step in Web data mining. Content mining is the scanning of the text, pictures and graphs of a Web page. This scanning is usually performed after the clustering done by structure mining and provides results based on it. Given the huge amount of information on the World Wide Web, content mining supplies results to search engines. It is directed toward the specific information the user gives to a search engine, which in turn allows the entire Web to be scanned. Text mining is particularly effective when dealing with specific topics. Its main use is to gather and organize the information available on the WWW that matches the user's request; the large number of results thus obtained improves navigation patterns on the web.
Web usage mining [15] is the main category of web mining. It collects access information for Web pages, including the paths leading to the accessed pages. This information is usually gathered automatically into web server logs. Usage mining can produce useful information about how a site will function in the future, some of which can be derived from the collected information, and the gathered usage data makes it possible to produce results more effectively.
Usage mining is useful in a number of areas such as online trading and other businesses based on web trading. The information it yields gives an idea of the number of users and the amount of business on each site, and enables Web-based businesses to provide the best access routes to services or other advertisements. Usage processing is used for complete pattern discovery. Pattern discovery [8] is difficult because only small pieces of information, such as IP addresses and URLs, are available, and with so little information it is hard to identify a user. Another use is structure processing, which consists of analyzing the structure of each page contained in a Web site.
II. RELATED WORK
Vijayalakshmi [15] has explored an innovative sequential technique called AWAPT (Adaptive Web Access Pattern Tree) for Frequent Sequential Pattern (FSP) mining, the most common method of data mining. The main focus is on discovering the relationships between the access patterns obtained from the web log file. The pattern discovery process applies the FSP method to the log file, which is then analyzed. The main steps of the technique can be summarized as follows: the algorithm first scans the raw log file to find the separate user activities; a web access pattern (WAP) tree, an efficient data structure, is constructed from the frequent access patterns; an intermediate WAP tree is then constructed. Since memory and time are the critical issues, AWAPT is proposed because it avoids recursion [15].
A. Rule of AWAPT [15]
Given a WAP-tree with some nodes, the binary position code of each node can be assigned by the following rule: the root has the null position code, and the leftmost child of the root has the code 1. The code of any other node is derived by appending 1 to the position code of its parent if the node is the leftmost child, appending 10 if it is the second leftmost child, appending 100 if it is the third leftmost child, and so on. In general, for the nth leftmost child, the position code is obtained by appending the binary representation of 2^(n-1) to the parent's code. A node α is an ancestor of another node β if and only if the position code of α with 1 appended to its end equals the first x bits of the position code of β, where x is (the number of bits in the position code of α) + 1.
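The position-code rule above can be sketched in a few lines of Python; the tree shape used in the example is illustrative, not taken from [15].

```python
def position_code(parent_code, n):
    """Code of the nth leftmost child (n >= 1): parent's code plus the binary of 2^(n-1)."""
    return parent_code + bin(2 ** (n - 1))[2:]

def is_ancestor(code_a, code_b):
    """A is an ancestor of B iff code_a + '1' equals the first len(code_a)+1 bits of code_b."""
    prefix = code_a + "1"
    return len(code_b) >= len(prefix) and code_b[: len(prefix)] == prefix

root = ""                         # the root has the null position code
c1 = position_code(root, 1)       # "1"   (leftmost child of the root)
c2 = position_code(root, 2)       # "10"  (second leftmost child)
g1 = position_code(c1, 2)         # "110" (second leftmost child of c1)

print(is_ancestor(c1, g1))        # True: "1" + "1" prefixes "110"
print(is_ancestor(c2, g1))        # False
```

Note how the ancestor test needs only the two codes, which is what lets AWAPT avoid recursive tree traversal.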
The authors of [18] proposed rules for every phase of data preprocessing: since the raw web log file contains much irrelevant data, they applied several heuristics to reduce the file's size and improve its quality. User identification is also a main topic of that paper; the IP address is used as the key element in distinguishing users. The access log is used together with the referrer log to construct the access patterns. Path completion is also critical, since many access records are not kept in the log file and these missing references must be filled in. In [10], clustering was the main goal. Because prior knowledge of the tasks was available, accuracy reached 99%. Using the viewing time of pages as the main clustering feature increased effectiveness and robustness. Vectors are created to identify the features of each web page, and each page is then described as a multi-modal vector. The user sessions are likewise modeled as multi-modal vectors, and finally the sessions are clustered into the different categories.
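A minimal cleaning pass of the kind [18] describes can be sketched over Common Log Format lines. The sample lines, file extensions and status heuristics below are illustrative assumptions, not the exact rules of [18].

```python
import re

# Parse the parts of a Common Log Format entry we need: IP, time, URL, status.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3})'
)
IRRELEVANT = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".ico")

def clean(log_lines):
    records = []
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue                      # discard malformed entries
        url, status = m.group("url"), int(m.group("status"))
        if url.lower().endswith(IRRELEVANT):
            continue                      # embedded resource, not a page view
        if not 200 <= status < 300:
            continue                      # failed request
        records.append((m.group("ip"), m.group("time"), url))
    return records

sample = [
    '10.0.0.1 - - [10/Jul/2013:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Jul/2013:10:00:02 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Jul/2013:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 120',
]
print(clean(sample))   # only the /index.html page view survives
```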
Many researchers are working in this field on effectively identifying users and classifying them, which helps in personalization. The first step in WUM is data cleaning, since the raw web log file contains a lot of noise that has to be removed. Yan Wang [11] described pattern discovery in the forms of statistical analysis, association rules, clustering, classification, sequential patterns and dependency modeling. He also used a web usage mining framework called WebSIFT, which uses the content and structure information of a Web site and finally identifies the interesting results from mining the usage data [6]. Due to the extensive growth of e-commerce, privacy has become a critical topic for many researchers: applications of web mining have led to conflicts such as spam and the theft of users' sensitive information during online shopping and online banking, and work has been carried out on user privacy and navigation pattern discovery. [12] described two categories of features that are effective in identifying the goal of a query: past user-click behavior and anchor-link distribution. The attempt there is to clearly understand the goal of the user's query, which may help improve the search engine's service. The taxonomy of query goals used relies mainly on [19]:
1) Navigational queries: the user already has a clear website in mind, so the query is just for verification. The user may have previously visited the site, so the result will be accurate.
2) Informational queries: the user does not presuppose any particular web page, there will be multiple results, and the user discovers new websites while searching, depending on the requirement.
[14] presents an approach to developing user profiles. Profiles help identify a user's interests in a better and more efficient way, enabling effective personalized services, and may also be used to obtain user information such as the user's location and previously visited sites. The basic requirement for user profile construction is user details, so users have to be identified uniquely. The available methods are software agents, logins, enhanced proxy servers, cookies and session ids. After identification, user information must be collected. One way is through HTML forms, where users fill in their details using checkboxes and radio buttons; this explicit collection requires the user's cooperation. The other is implicit collection, which may not require the user's intervention; its techniques include the browser cache, proxy servers, browser agents, desktop agents, web logs and search logs. User profiles are then constructed from the information obtained.
The authors of [9] describe the recent developments in WUM, which is receiving more attention day by day. Most of this research effort focuses on three main paradigms: association rules, sequential patterns and clustering. Association rules are used to find associations among pages that often appear together in users' sessions. The typical result has the form "X.html, Y.html => Z.html", which states that if a user has visited pages X.html and Y.html, it is very likely that, in the same session, the same user has also visited page Z.html. Sequential patterns are used to find frequent sequences in large amounts of sequential data; in web usage mining they are discovered to identify navigational patterns that occur in users' sessions. Clustering is mainly used to group similar sessions. Multi-modal clustering [10] is a technique that builds clusters using multiple information data features, and [20] presents an application of matrix clustering to web usage data. [3] integrated a naïve Bayesian multi-net to perform the user identification task. This mainly interprets the distinct user patterns, which helps to retain customers. The approach is carried out on e-transaction data; another strategy uses click-stream data. Transactions are the main criterion: the user identification function is derived from the set of transactions and the set of users.
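The support and confidence behind a rule like "X.html, Y.html => Z.html" are easy to compute directly from sessions. A small sketch with made-up session data:

```python
# Toy session data (hypothetical page names); each session is the set of pages visited.
sessions = [
    {"X.html", "Y.html", "Z.html"},
    {"X.html", "Y.html", "Z.html"},
    {"X.html", "Y.html"},
    {"Y.html", "Z.html"},
]

def support(itemset):
    """Fraction of sessions containing every page in the itemset."""
    return sum(itemset <= s for s in sessions) / len(sessions)

def confidence(antecedent, consequent):
    # confidence(A => B) = support(A union B) / support(A)
    return support(antecedent | consequent) / support(antecedent)

rule_conf = confidence({"X.html", "Y.html"}, {"Z.html"})
print(rule_conf)   # 2 of the 3 sessions containing X and Y also contain Z
```

Algorithms such as Apriori avoid enumerating every candidate itemset, but the measures they rank rules by are exactly these.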
The approach in [3] has the following steps:
1. Characteristic pattern mining: the characteristic patterns are mined from a training transaction set by filtering out user-extrinsic behaviors, such as patterns common to most users and accidental behavior patterns of a particular user.
2. Identification function construction: the identification function is built on the learned characteristic patterns using a naïve Bayesian multi-net.
3. User identification: the characteristic patterns in each given transaction are recognized and used by the identification function to determine the user of the transaction.
Characteristic pattern mining and related concepts are also defined. The user confidence C(pt_k, u_j) of pattern pt_k and user u_j is the conditional probability that a transaction containing pt_k came from user u_j.
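The user confidence defined above is an empirical conditional probability and can be computed directly from a transaction set. A minimal sketch with made-up training data:

```python
# Each training transaction is a (user, item-set) pair (hypothetical data).
transactions = [
    ("u1", {"a", "b", "c"}),
    ("u1", {"a", "b"}),
    ("u2", {"a", "b", "d"}),
    ("u2", {"d", "e"}),
]

def user_confidence(pattern, user):
    """C(pattern, user): fraction of transactions containing `pattern` that belong to `user`."""
    containing = [u for u, items in transactions if pattern <= items]
    return containing.count(user) / len(containing) if containing else 0.0

print(user_confidence({"a", "b"}, "u1"))   # 2 of the 3 transactions with {a, b} are u1's
```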
In [4], the general architecture for Web usage mining presented in [6] and [21] is developed further. WEBMINER is a system that implements parts of this general architecture, which divides the Web usage mining process into two main parts. The first part comprises the domain-dependent processes of transforming the Web data into a suitable transaction form, including the preprocessing, transaction identification and data integration components. The second part comprises the largely domain-independent application of generic data mining and pattern matching techniques (such as the discovery of association rules and sequential patterns) as part of the system's data mining engine.
[7] presents a probabilistic latent semantic analysis (PLSA) model, which can infer the hidden semantic factors and reveal user access patterns from the session-page observation data. Data from two sources, web access log files (i.e. usage data) and the web site map (i.e. linkage information), are integrated to generate linkage-enhanced usage data. The integrated usage data, in turn, are viewed as user session data in terms of page view-weight pairs and used to derive the user access patterns through the PLSA model. In addition, the user access patterns are characterized by user profiles, expressed as weighted page sets. The weights in the user profiles can be used to determine the main theme of individual and common user access patterns, which provides useful information for other web applications such as web recommendation or personalization.
III. PHASES IN WEB USAGE CLASSIFICATION
The figure below depicts the various phases of user classification.
Fig. 1 Phases of User Classification
This model was proposed in [16]. The sub-modules are described below.
A. User identification
User identification is the crucial step in the data preprocessing model. Since it is a very challenging task to pick out a particular user from the heaps of log records stored on the server, there are several ways to identify users [14]. They are as follows:
1) Software agents
These are small application modules installed on the user's computer which keep track of all the user's web transactions. The only assumption is that both the server and the user have the same application and information.
2) User id
Here identification is accurate, since users themselves supply their identification data through user ids and passwords. The only assumption is that all servers provide registration forms.
3) Cookies
These are chunks of information stored on the user's computer by the server. This is the most efficient technique, but the user must have cookies enabled on his machine, otherwise no information is stored.
An algorithm for user identification [8] is as follows:
Algorithm: User Identification
Input: Log Database
Output: Unique Users Database
Step 1: Initialize IPList = 0; UsersList = 0; BrowserList = 0; OSList = 0; No-of-users = 0
Step 2: Read a Record from the Log Database
Step 3: If Record.IPaddress is not in IPList then
            add Record.IPaddress to IPList
            add Record.Browser to BrowserList
            add Record.OS to OSList
            increment No-of-users
            insert new user into UsersList
        Else
            If Record.Browser is not in BrowserList OR Record.OS is not in OSList then
                increment No-of-users
                insert as new user into UsersList
            End If
        End If
Step 4: Repeat Steps 2 to 3 until eof(Log Database)
Step 5: Stop
The outcome of this algorithm is a Unique Users Database giving information about the total number of individual users and each user's IP address, user agent and browser [1].
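The heuristic above (a new IP, or a known IP with a different browser or OS, means a new user) can be sketched compactly; the record fields and sample log below are illustrative.

```python
def identify_users(records):
    """records: iterable of (ip, browser, os) tuples. Returns the distinct users found."""
    users = []                                  # each unique (ip, browser, os) is one user
    for ip, browser, os_name in records:
        if (ip, browser, os_name) not in users:
            users.append((ip, browser, os_name))
    return users

log = [
    ("10.0.0.1", "Firefox", "Linux"),
    ("10.0.0.1", "Firefox", "Linux"),     # same user, repeated visit
    ("10.0.0.1", "Chrome",  "Windows"),   # same IP, different agent: counted as a new user
    ("10.0.0.2", "Firefox", "Linux"),     # new IP: new user
]
print(len(identify_users(log)))   # 3 unique users
```

As the survey notes, this deliberately over-counts rather than merge distinct users behind one proxy IP.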
In [3], transactions are used to identify users. The identification function is described as a Boolean function
f : {0,1}^|I| → {0,1}^(|U|+1)
where |I| is the total number of possible items in the transaction set T, and |U| + 1 is the total number of possible users plus the unknown user.
The algorithm of [3] is given below:
Algorithm: Identifying users
Input: a transaction t whose user is to be identified
Output: the identified user u^, or the unknown user u0
Procedure user_identification(t) {
    Identify all characteristic patterns pt_k in t, i.e. pt_k ⊆ t and pt_k ∈ PT_c, collecting them in PT_t;
    If PT_t is empty
        Return u0
    Else
        Let PT_t = PT_c − PT_t;
        Remove any pattern that is a sub-pattern of another pattern in PT_t;
        Remove any pattern that is a super-pattern of another pattern in vector PT;
        For all u_i ∈ U, compute the user confidence over the remaining patterns;
        Return the u_j with the maximum value
    End If
}
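The identification step can be sketched as follows. The data and the simple product-of-confidences score below are illustrative stand-ins, not the exact naïve Bayesian multi-net scoring of [3].

```python
# Hypothetical characteristic patterns with per-user confidences C(pt_k, u_j).
characteristic = {
    frozenset({"a", "b"}): {"u1": 0.8, "u2": 0.2},
    frozenset({"d"}):      {"u1": 0.1, "u2": 0.9},
}
users = ["u1", "u2"]

def identify(transaction):
    """Return the best-scoring user for a transaction, or 'u0' if no pattern matches."""
    present = [p for p in characteristic if p <= transaction]
    if not present:
        return "u0"                          # unknown user
    def score(u):
        s = 1.0
        for p in present:                    # combine confidences of matched patterns
            s *= characteristic[p][u]
        return s
    return max(users, key=score)

print(identify({"a", "b", "c"}))   # u1
print(identify({"x"}))             # u0
```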
B. Session Identification
Sessions are normally defined as the set of page visits made by a user within a certain time allotment. Whether a user has a single session or multiple sessions depends on the particular user. The user's access pattern within a session can even be redefined using reconstruction techniques.
[9] mentions two steps for session identification:
• identifying the different sessions of a user from the very poor data available in the server log file;
• reconstructing the user's navigation patterns within the already identified sessions.
They use cookies to rewrite the URLs, and they also address the web browser caching problem. In [4], the click streams of users are used, and the process is called sessionization. There, a transaction is defined as a subset of a user session containing homogeneous pages. The two methods described are:
1. Time-Oriented Heuristics
This is based on total session time. Page viewing time is defined over the set of pages visited by a specific user at a specific time; the threshold can vary from 10 minutes to 10 hours depending on the patterns. Another variant compares the timestamps of two consecutive records of the same user against the maximum period of time a user stays on a page.
2. Navigation-Oriented Heuristics
Here the web topology is used in graph form. The referrer field of the web log file is used to decide whether a page starts a new session or must be added to the previous one.
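The time-oriented heuristic can be sketched as a single pass over time-sorted requests; the 30-minute timeout and the sample data are illustrative (the survey notes thresholds vary widely).

```python
def sessionize(requests, timeout=1800):
    """requests: list of (user, timestamp_seconds, page), assumed sorted by time.
    Starts a new session whenever a user's gap between requests exceeds `timeout`."""
    sessions, last_seen = {}, {}
    for user, ts, page in requests:
        prev = last_seen.get(user)
        if prev is None or ts - prev > timeout:
            sessions.setdefault(user, []).append([])   # open a new session
        sessions[user][-1].append(page)
        last_seen[user] = ts
    return sessions

reqs = [
    ("u1", 0,    "/a"),
    ("u1", 600,  "/b"),     # 10 minutes later: same session
    ("u1", 6000, "/c"),     # 90 minutes after /b: new session
]
print(sessionize(reqs)["u1"])   # [['/a', '/b'], ['/c']]
```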
In [7] the authors give an algorithm for grouping user sessions. The algorithm they propose is as follows:
Algorithm: Grouping User Sessions
Input: P(z_k|s_i), the user session-page matrix SP_ij, and a threshold
Output: a set of clusters SCL = (SCL_1, SCL_2, ..., SCL_k)
Begin
Step 1: SCL_1 = SCL_2 = ... = SCL_k = ∅
Step 2: For each s_i ∈ S, select P(z_k|s_i); if P(z_k|s_i) >= threshold then SCL_k = SCL_k ∪ {s_i}
Step 3: If there are still user sessions to be clustered, go back to Step 2
Step 4: Return the clusters SCL = {SCL_k}
End
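The grouping step reduces to a thresholded assignment of each session to every factor whose probability is high enough. In the sketch below the probabilities are made up; in [7] they come from the trained PLSA model.

```python
def group_sessions(p_z_given_s, threshold=0.5):
    """p_z_given_s: {session: [P(z_1|s), ..., P(z_K|s)]}. Returns K clusters (sets)."""
    k = len(next(iter(p_z_given_s.values())))
    clusters = [set() for _ in range(k)]
    for session, probs in p_z_given_s.items():
        for i, p in enumerate(probs):
            if p >= threshold:
                clusters[i].add(session)   # a session may join several clusters
    return clusters

probs = {
    "s1": [0.9, 0.1],
    "s2": [0.2, 0.8],
    "s3": [0.6, 0.6],   # ambiguous session: falls in both clusters
}
print(group_sessions(probs))
```

Note that, unlike hard clustering, a session can appear in more than one group, which matches the overlapping-interest interpretation of the latent factors.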
C. User Classification
Classification is supervised learning: the classes are defined in advance. The other method of training on data, unsupervised learning, is called clustering; there the classes are dynamic.
1. Decision tree induction
A decision tree [2] is a flow-chart-like tree structure in which each internal node denotes a test or condition, each branch represents an outcome of that test or condition, and each leaf node represents one of the predefined classes.
Decision tree induction proceeds in two steps:
a. Tree construction
The training examples start at the root and are partitioned recursively using attributes or features.
b. Tree pruning
Any noise or defects are removed from the tree, especially from the branches; identifying this noise is a complex process.
Algorithm: Decision tree
Step 1: Construct a top-down recursive tree using the divide-and-conquer method.
Step 2: Keep all the training samples at the root.
Step 3: If attributes have continuous values, discretize them.
Step 4: Categorize the attributes.
Step 5: Partition the samples recursively according to the selected attributes.
Step 6: Select attributes based on a statistical measure such as information gain.
Step 7: Stop partitioning a node if all of its samples are in the same class.
Step 8: Stop partitioning if there are no remaining attributes or samples.
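The attribute-selection measure named in Step 6, information gain, is entropy before a split minus the weighted entropy after it. A small self-contained sketch (the labels and split are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Gain from splitting `labels` into the label-lists in `groups`."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]     # a perfect split on some attribute
print(information_gain(labels, split))     # 1.0: the split removes all uncertainty
```

At each node, the induction algorithm evaluates this gain for every candidate attribute and splits on the best one.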
2. Bayesian classifier
Given training data D, the posterior probability of a hypothesis h, P(h|D), follows Bayes' theorem:
P(h|D) = P(D|h) P(h) / P(D)
The MAP (maximum a posteriori) hypothesis is
h_MAP ≡ argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h)
The classification problem may be formalized using posterior probabilities: P(C|X) is the probability that the sample tuple X = <x1, ..., xk> is of class C, e.g. P(class = N | outlook = sunny, windy = true, ...). The idea is to assign to sample X the class label C such that P(C|X) is maximal. By Bayes' theorem,
P(C|X) = P(X|C) · P(C) / P(X)
where P(X) is constant for all classes and P(C) is the relative frequency of class C samples, so the C that maximizes P(C|X) is the C that maximizes P(X|C) · P(C).
3. Other classification approaches
3.1) K-nearest neighbor:
Here instances are represented as points.
Algorithm: K-Nearest Neighbor
Step 1: All samples are represented as points in an n-dimensional space.
Step 2: The Euclidean distance is used to find the nearest neighbors of each point.
Step 3: A target function is defined, which may be discrete- or real-valued.
Step 4: For a discrete target function, K-NN returns the most common value among the k nearest training examples.
The weight of a neighbor is calculated as
w = 1 / d(x_q, x_i)^2
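Steps 1-4, with the inverse-square distance weighting just given, can be sketched directly; the sample points and labels are illustrative.

```python
from math import dist

def knn_predict(samples, query, k=3):
    """samples: list of (point, label). Returns the distance-weighted majority label,
    with weight w = 1 / d(x_q, x_i)^2 as in the text."""
    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    votes = {}
    for point, label in nearest:
        d = dist(point, query)
        w = float("inf") if d == 0 else 1.0 / d ** 2   # an exact match dominates the vote
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

samples = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(samples, (1, 0)))   # A: the two closest neighbours are class A
```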
3.2) Genetic Algorithm:
This is an analogy to biological evolution. Each rule is represented as a string of bits, and an initial population is created from randomly generated rules. The notion of survival of the fittest is represented by the accuracy with which a rule classifies a set of training examples; crossover and mutation of rules generate offspring.
3.3) Rough set approach
In this approach the equivalence classes are defined approximately, or roughly. For a given class C, approximation is done with two sets: the lower approximation (objects certain to be in C) and the upper approximation (objects that cannot be described as not in C). Finding the minimal set of attributes is NP-hard, and a discernibility matrix is used to reduce the computational intensity.
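The lower and upper approximations can be computed in a few lines from the equivalence classes; the objects and classes below are illustrative.

```python
def approximations(equivalence_classes, c):
    """Lower approximation: union of classes wholly inside c.
    Upper approximation: union of classes that intersect c."""
    lower, upper = set(), set()
    for e in equivalence_classes:
        if e <= c:
            lower |= e           # certainly in C
        if e & c:
            upper |= e           # possibly in C
    return lower, upper

classes = [{1, 2}, {3, 4}, {5}]      # equivalence classes of the objects
c = {1, 2, 3}                        # the target class C
lower, upper = approximations(classes, c)
print(lower)   # {1, 2}: the only equivalence class wholly inside C
print(upper)   # {1, 2, 3, 4}: every class that intersects C
```

Objects in the upper but not the lower approximation form the boundary region, the "rough" part of the class.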
3.4) Fuzzy logic
Fuzzy logic uses truth values between 0.0 and 1.0 that represent the degree of fuzzy membership. Attribute values are converted to fuzzy values, and more than one fuzzy value can be defined for a given datum. Each rule that applies casts a vote for membership in the pertaining class, and finally the truth values are summed for each class.
Classification in Web Mining
As stated earlier, classification refers to supervised learning, wherein the classes are predefined by the researcher; in clustering we have unsupervised learning, with no predefined classes. In classification we have two sets of data, the training set and the testing set. Using the training set, we feed in data and build the model database according to the characteristics of the model; then, based on this database, we classify the users with the testing set. A number of classification methodologies exist.
IV. CONCLUSION
In this paper, we have surveyed research in the area of Web mining, with the focus on classification in Web Usage Mining. Around the key topic of the paper, usage mining, we have given a detailed description of user identification as well as classification. Among classification methods, the Bayesian approach is the oldest and best approach to classify users. User identification using advanced techniques is preferable to constructing user profiles, which requires information from the users. Many areas in this domain are yet to be explored, and they may lead to personalization and to providing users with the information they desire.
V. REFERENCES
[1] Suneetha K.R and Dr. R. Krishnamoorthi (2009), Data Preprocessing and Easy Access
Retrieval of Data Through Data Ware House, Proceedings of the World Congress on
Engineering and Computer Science IWCECS, October 20-22, pp.306-311, San Francisco,
USA.
[2] Micheline Kamber, Lara Winstone, Wan Gong, Shan Cheng, Jiawei Han, Generalization and
Decision Tree Induction: Efficient Classification, Database Systems Research Laboratory School
of Computing Science
[3] Oren Etzioni(1996), The world-wide web: Quagmire or gold mine, Communications of the ACM,
Vol. 39(11) pp. 65–68.
[4] V. Chitra and Dr. Antony Selvdoss Davamani(2010),A Survey on Preprocessing Methods for Web
Usage Data. (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7,
No. 3,pp. 78-83, ISSN 1947-5500
[5] Alice Marascu and Florent Masseglia (2006) ,Mining Sequential Patterns from Data Streams: a
Centroid Approach, Journal of Intelligent Information Systems, Volume 27, Issue 3, pp 291-307
[6] R. Cooley, B. Mobasher, and J. Srivastava,(1997) , Web Mining: Information and Pattern
Discovery on the World Wide Web, University of Minnesota, Dept. of Computer Science,
Minneapolis, ACM SIGKDD, Vol.1, Issue 2 , pp. 12-23
[7] Guandong Xu, Yanchun Zhang, Jiangang Ma, Xiaofang Zhou, Discovering User Access Pattern
Based on Probabilistic Latent Factor Model, in ADC ’05: Proceedings of the sixteenth
Australasian database conference, Darlinghurst, Australia, Australian Computer Society, pp.27-35
[8] K.R.Suneetha, R. Krishnamoorti(2011), IRS: Intelligent Recommendation System for Web
Personalization, European Journal of Scientific Research, Inc., ISSN 1450-216X, Vol.65 Issue 2,
pp.175-186.
[9] Federico Michele Facca and Pier Luca Lanzi(2003) , Recent Developments in Web Usage Mining
Research, in proceddings of DaWaK ,Prague,Czech republic, LNCS, Springer Verlag
[10] Jeffrey Heer, Ed H. Chi(2002) , Separating the swarm: categorization methods for user sessions on
the web , in Proceedings of ACM CHI 2002, Conference on Human factors in Computing
Systems, pp.243-250, ACM Press, Minnapolis
[11] Yan Wang(2000), Web Mining and Knowledge Discovery of Usage Pattern,in Web Age
Information Management System, pp 227-232
[12] Uichin Lee, Zhenyu Liu, Junghoo Cho (2005), Automatic Identification of User Goals in Web Search, University of California Los Angeles, CA 90095, in WWW2005: The 14th International World Wide Web Conference.
[13] R. Kosala, H. Blockeel (2000), Web Mining Research: A Survey, in SIGKDD Explorations, ACM Press, Vol. 2, Issue 1, pp. 1-15.
[14] Susan Gauch, Mirco Speretta, Aravind Chandramouli and Alessandro Micarelli(2007), User
Profiles for Personalized Information Access, Electrical Engineering and Computer Science
Information & Telecommunication Technology Center, The Adaptive Web, LNCS 4321, pp.54-
89, Springer-Verlag Berlin Heidelberg
[15] S. Vijayalakshmi and V.Mohan (2010), Mining of users access behavior for frequent sequential
pattern from web logs, International Journal of Database Management Systems ( IJDMS) Vol.2,
No.3,pp.31-45.
[16] Suneetha K.R, Dr. R. Krishnamoorthi(2009), Data Preprocessing and Easy Access Retrieval of
Data Through Data Ware House, Proceedings of the World Congress on Engineering and
Computer Science , Vol IWCECS 2009, October 20-22, San Francisco, USA,ISBN :978-988-
17012-6-8
[17] Li Chaofeng (2006), Research and Development of Data Preprocessing in Web Usage Mining, School of Management, South-Central University for Nationalities, Wuhan 430074, P.R. China, International Conference on Management Science and Engineering, pp. 1311-1315.
[18] Sayeesh and Dr. Nagaratna P. Hegde, “A Comparison of Multiple Wavelet Algorithms for Iris
Recognition”, International Journal of Computer Engineering & Technology (IJCET), Volume 4,
Issue 2, 2013, pp. 386 - 395, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[19] Sumana M and Hareesha K S, “Preprocessing and Secure Computations for Privacy Preservation
Data Mining”, International Journal of Computer Engineering & Technology (IJCET), Volume 4,
Issue 4, 2013, pp. 203 - 212, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.