International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Vol...

Research on classification algorithms and its impact on web mining

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), © IAEME

RESEARCH ON CLASSIFICATION ALGORITHMS AND ITS IMPACT ON WEB MINING

Prof. Sindhu P Menon, Assistant Professor, Dept. of CSE, KLEIT, Gokul Road, Hubli, Karnataka, India
Dr. Nagaratna P Hegde, Professor, Dept. of CSE, VCE, Ibrahimbagh, Hyderabad, India

ABSTRACT

Web mining is the application of data mining techniques to discover patterns from the web. It draws on two research fields: data mining and the World Wide Web (WWW). Data mining, a field of computer science, is the process of discovering new patterns from large datasets using methods from artificial intelligence and database systems; its goal is to extract information from the dataset [19]. The World Wide Web provides an effective and simple way for users to search, browse and retrieve information from the web. Web mining can be broadly classified into i) Web Structure Mining, ii) Web Content Mining and iii) Web Usage Mining [15]. This paper is a survey of published work on web mining, with a focus on web usage mining. Web usage mining, an application of web mining, extracts patterns from log data, which are then examined to obtain behavioral patterns that can be analyzed for further processing. The input for all these operations is the log file.

Keywords: Clustering, Classification, Web Mining, Associative Rule Mining.

I. INTRODUCTION

The term web mining refers to applying data mining techniques to extract useful information from the web. Since relatively little research has been carried out in web mining, its implementation remains complex. Web mining also draws on several other areas, such as databases, information retrieval, and artificial intelligence.
The term web mining was first coined by Oren Etzioni in 1996 [21], who defined web mining [21] as the use of mining techniques to extract information from the WWW and related services.
To start with, the web contains three types of data: i) web structure data, ii) web content data, and iii) web usage data (web logs) [11]. Web mining is accordingly categorized into i) Web Structure Mining, ii) Web Content Mining, and iii) Web Usage Mining.

Web structure mining [15], the first of the three categories, identifies the relationships between Web pages connected by shared information or direct hyperlinks. This connection allows data relating to a search query to be extracted directly from the linking Web page. Structure mining addresses two problems of the World Wide Web. The first is irrelevant search results: relevance suffers because search engines often operate with low-precision criteria. The second is low recall in content mining, caused by the vast amount of information on the Web; uncovering the Web's hyperlink structure reduces this information overload. The concept behind structure mining is to retrieve previously unexplored relationships from web data. Structure mining finds use in applications such as business, where a company links its website details so that users can navigate and cluster information; through this, users can access information via keyword linkage and content mining.

Web content mining (text mining) [15] is generally the second step in Web data mining. Content mining scans the text, pictures and graphs of a Web page, usually after the clustering carried out by structure mining, and provides results based on it. With the huge amount of information available on the World Wide Web, content mining supplies the results used by search engines.
Content mining is directed toward the specific information requested by the user in a search engine, which in turn allows the entire Web to be scanned. Text mining becomes very effective when dealing with specific topics. The main uses of this type of mining are to gather and organize the information available on the WWW that matches the user's request; the large number of results thus obtained improves navigation patterns on the web.

Web usage mining [15] is the main category of web mining. It collects Web access information for Web pages, including the paths leading to accessed pages; this information is typically gathered automatically into web logs. Usage mining produces information about how a site is likely to be used in future, some of which can be derived from the collected data, and the usage data gathered makes it possible to produce results more effectively. Usage mining is useful in areas such as online trading and other businesses based on web trading: it gives an idea of the number of users and the business done on each site, and it enables Web-based businesses to provide the best access routes to services or advertisements. Usage processing is used for complete pattern discovery. Pattern discovery [8] is difficult because only fragments of information, such as IP addresses and URLs, are available, and with so little information it is hard to identify the user. Another use is structure processing, which consists of analyzing the structure of each page contained in a Web site.

II. RELATED WORK

Vijayalakshmi [15] explored an innovative sequential technique called AWAPT (Adaptive Web Access Pattern Tree) for Frequent Sequential Pattern (FSP) mining, the most common method of data mining. The main focus is on discovering the relationships between the access patterns obtained from the web log file.
The pattern discovery process applies the FSP method to the log file, and the result is then analyzed. The main steps of the technique are as follows: the algorithm first scans the raw log file to identify the separate user activities; a web access pattern (WAP) tree, an efficient data structure, is then constructed from the frequent access patterns. Later,
an intermediate WAP tree is constructed. With memory and time as the critical constraints, AWAPT is proposed, which avoids recursion [15].

A. Rule of AWAPT [15]: Given a WAP-tree, the binary position code of each node is assigned as follows. The root has a null position code, and the leftmost child of the root has the code 1. The code of any other node is derived by appending 1 to the position code of its parent if the node is the leftmost child, appending 10 if it is the second leftmost child, appending 100 if it is the third leftmost child, and so on. In general, for the nth leftmost child, the position code is obtained by appending the binary representation of 2^(n-1) to the parent's code. A node α is an ancestor of another node β if and only if the position code of α with "1" appended to its end equals the first x bits of the position code of β, where x is (the number of bits in the position code of α) + 1.

The authors of [18] proposed rules for every phase of data preprocessing; since the raw web log file contains much irrelevant data, they applied several heuristics to reduce its size and improve its quality. User identification is also a main concern of that paper: the IP address is used as the key element for distinguishing users, and the access log is used together with the referrer log to construct the access patterns. Path completion is equally critical, since many access records are not kept in the log file and the missing references need to be filled in.

In [10], clustering was the main goal. Because there was prior knowledge about the tasks, this brought accuracy up to 99%.
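As a rough illustration of the AWAPT position-code rule above, the following Python sketch (the tree class and all names are our own, not taken from [15]) assigns position codes and tests the ancestor relation:

```python
class Node:
    """Minimal WAP-tree node; children are kept in left-to-right order."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.code = None

def assign_position_codes(node, code=""):
    """The root gets the null (empty) code; the nth leftmost child appends
    the binary form of 2^(n-1) to its parent's code: 1, 10, 100, ..."""
    node.code = code
    for n, child in enumerate(node.children, start=1):
        assign_position_codes(child, code + format(1 << (n - 1), "b"))

def is_ancestor(a, b):
    """a is an ancestor of b iff a's code with '1' appended is a prefix of b's code."""
    return b.code.startswith(a.code + "1")

# Example tree: a root with two children; the leftmost child has one child.
leaf = Node("c")
root = Node("root", [Node("a", [leaf]), Node("b")])
assign_position_codes(root)
# codes: root = "", a = "1", b = "10", c = "11"
```

Note how the ancestor test needs only the codes, not a tree traversal, which is what lets AWAPT avoid recursion during mining.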
The main features used in the clustering were the viewing times of the pages, which increased effectiveness and robustness. Vectors are created to capture the features of each web page, and each page is then described as a multi-modal vector; user sessions are in turn modeled as multi-modal vectors. Finally, the sessions are clustered, which yields the different categories. Many researchers work in this field in order to identify users effectively and classify them, which helps in personalization. The first step in WUM is data cleaning, since the raw web log file includes a lot of noise that has to be removed.

Yan Wang [11] described pattern discovery in the form of statistical analysis, association rules, clustering, classification, sequential patterns and dependency modeling. He also used a web usage mining framework called WebSIFT that exploits the content and structure information of a Web site and finally identifies the interesting results from mining the usage data [6]. Due to the extensive growth of e-commerce, privacy has become a critical topic for many researchers: applications of web mining have led to conflicts such as spam, and users' sensitive information can be compromised during online shopping and online banking. Work has therefore been carried out on user privacy and navigation pattern discovery.

[12] described two categories of features that are effective in identifying the goal of a query: past user-click behavior and anchor-link distribution. The attempt is to clearly understand the goal of the user's query, which may help improve the search engine's service. The taxonomy of query goals used relies mainly on [19]: 1) Navigational queries: the user already has a clear website in mind, so the query serves only for verification; the user may have previously visited the site, so the result will be accurate.
2) Informational queries: the user does not presuppose any particular web page, there will be multiple results, and the user discovers new websites while searching, depending on the requirement.

[14] presents an approach to developing user profiles. Profiles help identify users' interests in a better and more efficient way, enabling effective personalized services, and may also be used to obtain user information such as the user's location or previously visited sites. The basic requirement for user profile construction is user details, so the users first have to be
identified uniquely. The available methods are software agents, logins, enhanced proxy servers, cookies, and session ids. After identification, user information must be collected. One way is through HTML forms, in which users fill in their details using checkboxes and radio buttons; this is explicit collection and requires the user's cooperation. The other is implicit collection, which needs no intervention from the user; its techniques include the browser cache, proxy servers, browser agents, desktop agents, web logs, and search logs. User profiles are then constructed from the information obtained.

The authors of [9] described recent developments in WUM, which is receiving more attention day by day. Most research effort focuses on three main paradigms: association rules, sequential patterns, and clustering. Association rules find associations among pages that often appear together in users' sessions. A typical result has the form "X.html, Y.html => Z.html", which states that if a user has visited pages X.html and Y.html, it is very likely that, in the same session, the same user has also visited page Z.html. Sequential patterns are used to discover frequent sequences in large amounts of sequential data; in web usage mining they reveal navigational patterns that occur in users' sessions. Clustering is mainly used to group similar sessions. Multi-modal clustering [10] is a technique that builds clusters using multiple information data features, and [20] presents an application of matrix clustering to web usage data. [3] integrated a naïve Bayesian multi-net to perform the user identification task; this mainly interprets the distinct user patterns, which helps to retain customers.
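The association-rule form "X.html, Y.html => Z.html" described above can be made concrete with a minimal Apriori-style sketch in Python (the sessions and thresholds are invented for illustration; this is not the system of [9]):

```python
from itertools import combinations

def association_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Naive enumeration of rules (A => b) over the page sets of user sessions.
    support(S) = fraction of sessions containing every page in S."""
    n = len(sessions)
    support = lambda items: sum(1 for s in sessions if items <= s) / n
    pages = set().union(*sessions)
    rules = []
    for size in (2, 3):                      # itemsets of 2 or 3 pages
        for itemset in combinations(sorted(pages), size):
            s = support(set(itemset))
            if s < min_support:
                continue
            for rhs in itemset:              # one page on the right-hand side
                lhs = set(itemset) - {rhs}
                conf = s / support(lhs)      # confidence of lhs => rhs
                if conf >= min_confidence:
                    rules.append((tuple(sorted(lhs)), rhs, round(conf, 2)))
    return rules

sessions = [{"X.html", "Y.html", "Z.html"},
            {"X.html", "Y.html", "Z.html"},
            {"X.html", "Y.html", "Z.html"},
            {"A.html"}]
rules = association_rules(sessions)
# contains (("X.html", "Y.html"), "Z.html", 1.0), i.e. "X.html, Y.html => Z.html"
```

Real systems use Apriori's candidate pruning rather than this brute-force enumeration, but the support/confidence computation is the same.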
This approach is carried out with e-transaction data; another strategy uses clickstream data. Transactions are the main unit of analysis, and a user identification function is derived from the set of transactions and the set of users. The approach in [3] has the following steps: 1. Characteristic pattern mining: the characteristic patterns are mined from a training transaction set by filtering out user-extrinsic behaviors, such as patterns common to most users and accidental behavior patterns of a particular user. 2. Identification function construction: the identification function is built on the learned characteristic patterns using a naïve Bayesian multi-net. 3. User identification: the characteristic patterns in each given transaction are recognized and used by the identification function to determine the user of the transaction. Characteristic pattern mining and related concepts are also defined: the user confidence C(ptk, uj) of pattern ptk and user uj is the conditional probability that a transaction containing ptk came from user uj.

In [4], a general architecture for Web usage mining, originally presented in [6] and [21], is developed. WEBMINER is a system that implements parts of this general architecture. The architecture divides the Web usage mining process into two main parts. The first part comprises the domain-dependent processes of transforming the Web data into a suitable transaction form, including preprocessing, transaction identification, and data integration components. The second part comprises the largely domain-independent application of generic data mining and pattern matching techniques (such as the discovery of association rules and sequential patterns) as the system's data mining engine.

[7] presented a probabilistic latent semantic analysis (PLSA) model, which can infer the hidden semantic factors and derive user access patterns from the session-page observation data.
The data from two different sources, namely web access log files (i.e. usage data) and the web site map (i.e. linkage information), are integrated to generate linkage-enhanced usage data. The integrated usage data are in turn viewed as user session data in terms of page view-weight pairs and used to derive the user access patterns based on the PLSA model. In addition, the user access patterns are characterized by user profiles, expressed as weighted page sets. The weights in the user profiles can be used to determine the main theme of individual and
common user access patterns, which provides useful information for further web applications, such as web recommendation or personalization.

III. PHASES IN WEB USAGE CLASSIFICATION

The figure below depicts the various phases in user classification.

Fig. 1 Phases of User Classification

This model was proposed in [16]. The sub-modules are described below.

A. User identification

User identification is the crucial step in the data preprocessing model. Since it is a very challenging task to single out a particular user from the heaps of log records stored on the server, several ways of identifying users exist [14]:

1) Software agents: small application modules installed on the user's computer that keep track of all the user's web transactions. The only assumption is that both the server and the user run the same application and share the same information.

2) User id: identification is accurate here, since users themselves supply their identification data through user ids and passwords. The only assumption is that every server provides a registration form.

3) Cookies: chunks of information stored on the user's computer by the server. This is the most efficient technique, but the user must have enabled cookies on his machine; otherwise no information is stored.

An algorithm for user identification [8] is as follows:

Algorithm: User Identification
Input: Log Database.
Output: Unique Users Database.
Step 1: Initialize IPList = 0; UsersList = 0; BrowserList = 0; OSList = 0; No-of-users = 0;
Step 2: Read a Record from the Log Database.
Step 3: If Record.IPaddress is not in IPList then
            add the new Record.IPaddress to IPList;
            add Record.Browser to BrowserList;
            add Record.OS to OSList;
            increment No-of-users;
            insert the new user into UserList.
        Else if Record.Browser is not in BrowserList OR Record.OS is not in OSList then
            increment No-of-users;
            insert as a new user into UserList.
        End of If
Step 4: Repeat Steps 2 to 3 until eof(Log Database).
Step 5: Stop the process.

The outcome of this algorithm is a Unique Users Database, which gives the total number of individual users together with each user's IP address, user agent, and browser [1].

In [3], transactions are used to identify users. The identification function is given as the Boolean function

f : {0,1}^|I| -> {0,1}^(|U|+1)

where |I| is the total number of possible items in the transaction set T, and |U| + 1 is the total number of possible users plus the unknown user. The algorithm [3] is given below:

Algorithm: Identifying users
Input: a transaction t whose user is to be identified
Output: the identified user u^, or the unknown user u0
Procedure user_identification(t) {
    Identify all characteristic patterns ptk in t, i.e. ptk ⊆ t and ptk ∈ PTc, collecting them in PTt;
    If PTt is empty
        Return u0
    Else
        Remove any pattern that is a sub-pattern of another pattern in PTt;
        Remove any pattern that is a super-pattern of another pattern in PTt;
        For all uj ∈ U, compute the confidence of uj given the patterns in PTt;
        Return the uj with the maximum confidence;
    End if
}
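A runnable sketch of the IP/browser/OS user-identification heuristic above (the log-record format here is an assumption for illustration, not the format used in [8]):

```python
def identify_users(log_records):
    """Count distinct users following the heuristic above: a new IP address
    is a new user; a known IP with a previously unseen browser or OS is
    also counted as a new user."""
    ip_list, browser_list, os_list = set(), set(), set()
    users = []
    for rec in log_records:                  # rec: (ip, browser, os)
        ip, browser, os_name = rec
        if ip not in ip_list:
            ip_list.add(ip)
            browser_list.add(browser)
            os_list.add(os_name)
            users.append(rec)
        elif browser not in browser_list or os_name not in os_list:
            browser_list.add(browser)
            os_list.add(os_name)
            users.append(rec)
    return users

log = [("10.0.0.1", "Firefox", "Linux"),
       ("10.0.0.1", "Firefox", "Linux"),    # same user, repeated request
       ("10.0.0.1", "Chrome",  "Linux"),    # same IP, new browser -> new user
       ("10.0.0.2", "Firefox", "Linux")]    # new IP -> new user
unique_users = identify_users(log)          # three users identified
```

As the example shows, the heuristic over-counts when one person switches browsers, which is exactly why the later work in [3] turns to transaction patterns for identification.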
B. Session Identification

Sessions are normally defined as the page visits made by a user within a certain time window. A particular user may have a single session or multiple sessions, and the user's access pattern within a session can be reconstructed using various techniques. In [9], two steps for session identification are mentioned:

• Identifying the different sessions of the user from the very sparse data available in the server log file.
• Reconstructing the user's navigation patterns within the identified sessions.

They used cookies and URL rewriting, and also addressed the web browser caching problem in their paper. In [4], the click streams of users are used, and the process is called sessionization. Here a transaction is defined as a subset of a user session containing homogeneous pages. The two methods described are:

1. Time-oriented heuristics: based on the total session time. Page viewing time is defined over the set of pages visited by a specific user within a specific period; this period can vary from 10 minutes to 10 hours depending on the patterns. Another variant compares the timestamps of two records of the same user against the maximum period of time a user can have stayed on a page.

2. Navigation-oriented heuristics: here the web topology is used in graph form. The referrer field of the web log file indicates whether a page starts a new session or must be added to a previous one.

In [7], the authors give an algorithm for grouping user sessions.
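The time-oriented heuristic just described can be sketched as follows (the 30-minute timeout is a commonly used default, not a value prescribed in [4]):

```python
from datetime import datetime, timedelta

def sessionize(records, timeout=timedelta(minutes=30)):
    """Split one user's log records (page, timestamp) into sessions: a gap
    longer than `timeout` between consecutive requests starts a new session."""
    records = sorted(records, key=lambda r: r[1])
    sessions, current = [], []
    last_time = None
    for page, ts in records:
        if last_time is not None and ts - last_time > timeout:
            sessions.append(current)
            current = []
        current.append(page)
        last_time = ts
    if current:
        sessions.append(current)
    return sessions

t0 = datetime(2013, 7, 1, 9, 0)
hits = [("index.html", t0),
        ("about.html", t0 + timedelta(minutes=5)),
        ("index.html", t0 + timedelta(hours=2))]   # long gap -> second session
parts = sessionize(hits)
# parts == [["index.html", "about.html"], ["index.html"]]
```

A navigation-oriented heuristic would instead inspect the referrer of each hit, starting a new session whenever the referrer does not belong to the pages seen so far.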
The algorithm proposed by them is as follows:

Algorithm: Grouping User Sessions
Input: P(zk|si), user session-page matrix SPij, threshold µ
Output: a set of clusters SCL = (SCL1, SCL2, ..., SCLk)
Begin
Step 1: SCL1 = SCL2 = ... = SCLk = ∅
Step 2: For each si ∈ S, select P(zk|si); if P(zk|si) >= µ then SCLk = SCLk ∪ si
Step 3: If there are still user sessions to be clustered, go back to Step 2
Step 4: Return the clusters SCL = {SCLk}

C. User Classification

Classification is supervised learning: the classes are defined in advance. The other way of learning from data, unsupervised learning, is called clustering; there the classes are dynamic.

1. Decision tree induction

A decision tree [2] is a flow-chart-like tree structure. The internal nodes denote a test on an attribute or a condition, the branches represent the outcomes of that test or condition, and the leaf nodes represent the predefined classes. Decision tree induction has two steps:

a. Constructing the tree: the training examples start at the root and are partitioned recursively using attributes or features.
b. Tree pruning: any noise or defects, especially in the branches, are removed from the tree; identifying this noise is a complex process.

Algorithm: Decision tree
Step 1: Construct a top-down recursive tree using the divide-and-conquer method.
Step 2: Keep all the training samples at the root.
Step 3: If attributes have continuous values, discretize them.
Step 4: Categorize the attributes.
Step 5: Partition the samples recursively according to the selected attributes.
Step 6: Select attributes based on a statistical measure such as information gain.
Step 7: Stop partitioning a node if all its samples belong to the same class.
Step 8: Stop partitioning if there are no remaining attributes or samples.

2. Bayesian classifier

Given training data D, the posterior probability of a hypothesis h, P(h|D), follows Bayes' theorem:

P(h|D) = P(D|h) P(h) / P(D)

and the MAP (maximum a posteriori) hypothesis is

h_MAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h).

The classification problem may be formalized using posterior probabilities: P(C|X) is the probability that the sample tuple X = <x1, ..., xk> is of class C, e.g. P(class = N | outlook = sunny, windy = true, ...). The idea is to assign to sample X the class label C such that P(C|X) is maximal. By Bayes' theorem, P(C|X) = P(X|C)·P(C) / P(X), where P(X) is constant for all classes and P(C) is the relative frequency of class C samples; hence the C that maximizes P(C|X) is the C that maximizes P(X|C)·P(C).

3. Other classification approaches

3.1) K-nearest neighbor: instances are represented as points.

Algorithm: K-Nearest Neighbor
Step 1: All samples are represented as points in an n-dimensional space.
Step 2: The Euclidean distance is used to find the nearest neighbors of a point.
Step 3: A further function, discrete- or real-valued, is defined; it is called the target function.
Step 4: If a discrete target function is used, K-NN returns the most common value among the k training examples nearest to the query point. The weight of a neighbor is calculated as w = 1 / d(xq, xi)^2.

3.2) Genetic algorithms: an analogue of biological evolution. Each rule is represented as a string of bits, and an initial population is created from randomly generated rules. The notion of survival of the fittest is represented by the accuracy with which a rule classifies a set of training examples; crossover and mutation of rules generate offspring.

3.3) Rough set approach: the equivalence classes are defined approximately, or roughly. For a given class C, approximation is done with two sets: the lower approximation (certain to be in C) and the upper approximation (cannot be described as not in C). Finding the minimal set of attributes is NP-hard, and a discernibility matrix is used to reduce the computational intensity.

3.4) Fuzzy logic: uses truth values between 0.0 and 1.0, which represent the degree of fuzzy membership. Attribute values are converted to fuzzy values, and more than one fuzzy value can be defined for a given datum. Each applied rule casts a vote for membership in the corresponding class, and finally the truth values are totaled for each class.

Classification in Web Mining

As stated earlier, classification refers to supervised learning, wherein the classes are predefined by the researcher; in clustering we have unsupervised learning, wherein there are no predefined classes. In classification we have two sets of data, the training set and the testing set. With the training set, based on the characteristics of the model, we feed in data and build the database; then, using this built database, we classify the users in the testing set.
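As a minimal illustration of the K-NN procedure above (the toy data points are invented):

```python
from collections import Counter
from math import dist

def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `training` is a list of (point, label) pairs; distance is Euclidean."""
    neighbors = sorted(training, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
label = knn_classify(training, (0.05, 0.05))   # the 3 nearest points are all "A"
```

The same skeleton accommodates the distance weighting w = 1/d^2 mentioned above by summing weights per label instead of counting votes.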
A number of classification methodologies exist.

IV. CONCLUSION

In this paper we survey research in the area of web mining, with a focus on classification in web usage mining. Around the key topic of this paper, usage mining, we provide a detailed description of user identification as well as classification. Among classification methods, the Bayesian approach is the oldest and best approach to classify the users. User identification using advanced techniques is a better way compared to the construction of user profiles, which requires information from the users. Many areas in this domain are yet to be explored, which may lead to personalization and to providing users with the desired information.

V. REFERENCES

[1] Suneetha K.R and Dr. R. Krishnamoorthi (2009), Data Preprocessing and Easy Access Retrieval of Data Through Data Warehouse, Proceedings of the World Congress on Engineering and Computer Science, IWCECS, October 20-22, pp. 306-311, San Francisco, USA.
[2] Micheline Kamber, Lara Winstone, Wan Gong, Shan Cheng, Jiawei Han, Generalization and Decision Tree Induction: Efficient Classification, Database Systems Research Laboratory, School of Computing Science.
[3] Oren Etzioni (1996), The World-Wide Web: Quagmire or Gold Mine, Communications of the ACM, Vol. 39(11), pp. 65-68.
[4] V. Chitra and Dr. Antony Selvdoss Davamani (2010), A Survey on Preprocessing Methods for Web Usage Data, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 3, pp. 78-83, ISSN 1947-5500.
[5] Alice Marascu and Florent Masseglia (2006), Mining Sequential Patterns from Data Streams: a Centroid Approach, Journal of Intelligent Information Systems, Volume 27, Issue 3, pp. 291-307.
[6] R. Cooley, B. Mobasher, and J. Srivastava (1997), Web Mining: Information and Pattern Discovery on the World Wide Web, University of Minnesota, Dept. of Computer Science, Minneapolis, ACM SIGKDD, Vol. 1, Issue 2, pp. 12-23.
[7] Guandong Xu, Yanchun Zhang, Jiangang Ma, Xiaofang Zhou, Discovering User Access Pattern Based on Probabilistic Latent Factor Model, in ADC '05: Proceedings of the Sixteenth Australasian Database Conference, Darlinghurst, Australia, Australian Computer Society, pp. 27-35.
[8] K.R. Suneetha, R. Krishnamoorti (2011), IRS: Intelligent Recommendation System for Web Personalization, European Journal of Scientific Research, ISSN 1450-216X, Vol. 65, Issue 2, pp. 175-186.
[9] Federico Michele Facca and Pier Luca Lanzi (2003), Recent Developments in Web Usage Mining Research, in Proceedings of DaWaK, Prague, Czech Republic, LNCS, Springer-Verlag.
[10] Jeffrey Heer, Ed H.
Chi (2002), Separating the Swarm: Categorization Methods for User Sessions on the Web, in Proceedings of ACM CHI 2002, Conference on Human Factors in Computing Systems, pp. 243-250, ACM Press, Minneapolis.
[11] Yan Wang (2000), Web Mining and Knowledge Discovery of Usage Pattern, in Web Age Information Management System, pp. 227-232.
[12] Uichin Lee, Zhenyu Liu, Junghoo Cho (2005), Automatic Identification of User Goals in Web Search, University of California, Los Angeles, CA 90095, in WWW2005: The 14th International World Wide Web Conference.
[13] R. Kosala, H. Blockeel (2000), Web Mining Research: A Survey, in SIGKDD Explorations, ACM Press, Vol. 2, Issue 1, pp. 1-15.
[14] Susan Gauch, Mirco Speretta, Aravind Chandramouli and Alessandro Micarelli (2007), User Profiles for Personalized Information Access, The Adaptive Web, LNCS 4321, pp. 54-89, Springer-Verlag, Berlin Heidelberg.
[15] S. Vijayalakshmi and V. Mohan (2010), Mining of Users' Access Behavior for Frequent Sequential Pattern from Web Logs, International Journal of Database Management Systems (IJDMS), Vol. 2, No. 3, pp. 31-45.
[16] Suneetha K.R, Dr. R. Krishnamoorthi (2009), Data Preprocessing and Easy Access Retrieval of Data Through Data Warehouse, Proceedings of the World Congress on Engineering and Computer Science, Vol. IWCECS 2009, October 20-22, San Francisco, USA, ISBN: 978-988-17012-6-8.
[17] Li Chaofeng (2006), Research and Development of Data Preprocessing in Web Usage Mining, School of Management, South-Central University for Nationalities, Wuhan 430074, P.R. China, International Conference on Management Science and Engineering, pp. 1311-1315.
[18] Sayeesh and Dr. Nagaratna P. Hegde, "A Comparison of Multiple Wavelet Algorithms for Iris Recognition", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 386-395, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[19] Sumana M and Hareesha K S, "Preprocessing and Secure Computations for Privacy Preservation Data Mining", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 203-212, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
