Applying web mining application for user behavior understanding



  1. APPLYING WEB MINING APPLICATION FOR USER BEHAVIOR UNDERSTANDING. Dr. Zakaria Suliman Zubi, Associate Professor, Computer Science Department, Faculty of Science, Sirte University, Libya
  2. Contents
  3. Abstract: Web usage mining (WUM) focuses on discovering potential knowledge from users' browsing patterns, which lets us find correlations between pages in the analysis stage. The primary data source for web usage mining is the server log files (web logs). A user browsing web pages leaves a lot of information in the log file, and analyzing this information helps us understand the user's behavior. The web log is therefore essential to web mining for extracting usage patterns and studying users' visiting characteristics. Our paper focuses on using web mining techniques to classify web page types according to user visits; this classification helps us understand user behavior. We also use classification and association rule techniques to discover potential knowledge from the browsing patterns.
  5. 5. LOGO INTRODUCTION The Internet offers a huge, widely global information center for News, advertising, consume information, financial management, education, government, and e-commerce . The aim of using web mining techniques for understanding user behavior is to profile user characteristics. Web mining can be organized into three main categories: web content mining, web structure mining, and web usage mining.
  6. INTRODUCTION (cont.) [Figure: Web Mining divided into Web Structure Mining, Web Content Mining, and Web Usage Mining] 1- Web content mining analyzes web content such as text, multimedia data, and structured data (within web pages or linked across web pages). 2- Web structure mining uses graph and network mining theory and methods to analyze the nodes and connection structures of the Web. 3- Web usage mining is a type of web mining that discovers knowledge hidden in users' browsing patterns and analyzes their visiting characteristics.
  7. INTRODUCTION (cont.) The primary data of web usage mining: 1- Web server logs. 2- Data about the sites' visitors. 3- Registration forms. [Fig 2: Portion of a typical server log] A standard log file has the following format: remotehost, logname, username, date, request, status, bytes, where: remotehost is the remote hostname or its IP address; logname is the remote log name of the user; username is the name with which the user has authenticated; date is the date and time of the request; request is the exact request line as it came from the client; status is the HTTP status code returned to the client; and bytes is the content length of the document transferred.
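The field list above matches the NCSA Common Log Format used by servers such as Apache. In that format the date is enclosed in brackets and the request line in quotes. The Python sketch below shows one way to parse a single line into its seven fields. The regular expression, the helper name parse_clf_line, and the sample line are illustrative assumptions. This is not the parser used in the C# application described in this deck.

```python
import re
from datetime import datetime

# One Common Log Format line:
# remotehost logname username [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<remotehost>\S+) (?P<logname>\S+) (?P<username>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict with the seven log fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Dates look like 10/Oct/2000:13:55:36 -0700
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # "-" means no body was transferred (e.g. a 304 response).
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_clf_line(sample)["request"])  # GET /apache_pb.gif HTTP/1.0
```

Lines that fail to match return None. The caller can then count or skip malformed entries before preprocessing begins.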
  9. THE PHASES OF WEB USAGE MINING: Web usage mining is a complete process that covers the stages of the data mining cycle: data preprocessing, pattern discovery, and pattern analysis. First, in the data preprocessing stage, the web log is cleaned, integrated, and transformed into a common log. In the pattern discovery stage, data mining techniques are applied to uncover interesting characteristics hidden in the patterns. Pattern analysis is the final stage; it validates the interesting patterns produced by pattern discovery so that they can be used to predict user behavior.
  10. THE PHASES OF WEB USAGE MINING: Data Preprocessing Process [Figure: Data Cleaning → Pageview Identification → User Identification → Session Identification] Data Cleaning: The log file is first examined to remove irrelevant entries, such as those representing multimedia data and scripts, and uninteresting entries, such as those belonging to top/bottom frames. Pageview Identification: Identifying pageviews depends heavily on the intra-page structure of the site, as well as on the page contents and the underlying site domain knowledge. Each pageview can be viewed as a collection of Web objects or resources representing a specific "user event".
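The following Python sketch illustrates only the data cleaning step. It filters parsed log entries, each a dictionary with request and status fields. Two parts are assumptions made for the example: the IRRELEVANT_EXTENSIONS list and the rule that keeps only successful 2xx responses. Removing top/bottom frame entries would require site-specific knowledge, so the sketch does not attempt it.

```python
# Hypothetical list of extensions treated as embedded resources, not page views.
IRRELEVANT_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico", ".mp3", ".mp4")

def requested_path(request):
    """Extract the path from a request line such as 'GET /news/a.html HTTP/1.0'."""
    parts = request.split()
    return parts[1] if len(parts) >= 2 else ""

def clean_log(entries):
    """Keep successful page requests, dropping multimedia and script entries."""
    cleaned = []
    for entry in entries:
        # Ignore any query string when checking the file extension.
        path = requested_path(entry["request"]).lower().split("?", 1)[0]
        if path.endswith(IRRELEVANT_EXTENSIONS):
            continue
        # Assumption: only 2xx responses count as real page views.
        if not 200 <= entry["status"] < 300:
            continue
        cleaned.append(entry)
    return cleaned

log = [
    {"request": "GET /sport/match.html HTTP/1.0", "status": 200},
    {"request": "GET /images/logo.gif HTTP/1.0", "status": 200},
    {"request": "GET /news/today.html HTTP/1.0", "status": 404},
]
print([e["request"] for e in clean_log(log)])  # ['GET /sport/match.html HTTP/1.0']
```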
  11. THE PHASES OF WEB USAGE MINING: Data Preprocessing Process. User Identification: Since several users may share a single machine name, certain heuristics are used to identify users. We use the phrase user activity record to refer to the sequence of logged activities belonging to the same user. Session Identification: This step splits each user's page accesses into separate sessions. A timeout sets a time limit of 30 minutes on the access to a particular web page; if the limit is exceeded, the user's activity is divided into more than one session. [Figure: Sample of user and session identification]
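Below is a minimal sketch of user and session identification. It assumes parsed entries carrying remotehost, username, and a datetime date field. The user heuristic treats the same host plus the same username as the same user. That rule is a hypothetical simplification, not necessarily the heuristic used in the paper. The 30-minute timeout is applied as the maximum gap between two consecutive requests of one user. The names identify_users and split_sessions are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def identify_users(entries):
    """Group entries into user activity records keyed by (remotehost, username).

    Hypothetical heuristic: same host and same username means same user.
    """
    users = defaultdict(list)
    for entry in entries:
        users[(entry["remotehost"], entry["username"])].append(entry)
    # Each activity record is sorted chronologically.
    for record in users.values():
        record.sort(key=lambda e: e["date"])
    return dict(users)

def split_sessions(record, timeout=SESSION_TIMEOUT):
    """Start a new session whenever the gap between consecutive requests exceeds timeout."""
    sessions = []
    for entry in record:
        if sessions and entry["date"] - sessions[-1][-1]["date"] <= timeout:
            sessions[-1].append(entry)
        else:
            sessions.append([entry])
    return sessions

t0 = datetime(2024, 1, 1, 10, 0)
log = [
    {"remotehost": "10.0.0.1", "username": "-", "date": t0},
    {"remotehost": "10.0.0.1", "username": "-", "date": t0 + timedelta(minutes=10)},
    {"remotehost": "10.0.0.1", "username": "-", "date": t0 + timedelta(minutes=55)},
]
users = identify_users(log)
print([len(s) for s in split_sessions(users[("10.0.0.1", "-")])])  # [2, 1]
```

The third request comes 45 minutes after the second, so it opens a new session.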
  12. THE PHASES OF WEB USAGE MINING: Pattern Discovery Process. Discovering user access patterns from the access log files is the main purpose of web usage mining. Association Rule Mining: Association rule discovery and statistical correlation analysis can find groups of web page types that are commonly accessed together; in other words, association rule mining can discover correlations between the page types found in a web log. We apply this technique to the output of user and session identification, where each transaction consists of items and every item represents a page type. We also use the Apriori algorithm to find correlations between page types based on minimum support and confidence thresholds. Typical questions are: Which sets of page types are frequently accessed together by web users (e.g., Sport, News, Social)? Which page type will be fetched next (e.g., Entertainment)?
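The Apriori step can be sketched in Python as follows. Frequent itemsets of page types grow level by level and are pruned with the Apriori property. A rule is kept only if its confidence passes a threshold. This is a textbook-style illustration, not the C# implementation described in this deck. The sample sessions and the thresholds (minimum support 0.5, minimum confidence 0.8) are invented for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Support is the fraction of transactions (sessions) containing the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every individual page type.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the sessions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Generate rules X -> Y with confidence = support(X ∪ Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                # By the Apriori property lhs is frequent, so its support is known.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules

# Hypothetical sessions, each already reduced to its set of page types.
sessions = [
    {"Sport", "News", "Social"},
    {"Sport", "News"},
    {"News", "Entertainment"},
    {"Sport", "News", "Entertainment"},
]
freq = apriori(sessions, min_support=0.5)
for lhs, rhs, sup, conf in association_rules(freq, min_confidence=0.8):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

With these sessions, the sketch reports two rules: Sport -> News and Entertainment -> News, each with confidence 1.00. The printing order may vary because candidate itemsets are held in sets.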
  13. THE PHASES OF WEB USAGE MINING: Classification. Classification techniques play an important role in Web analytics applications for modeling users according to various predefined metrics. In the Web domain, we are interested in developing a profile of users belonging to a particular class or category. This requires extracting and selecting the features that best describe the properties of a given class or category. We also focus on k-nearest neighbor (K-NN), a predictive technique for building classification models, where k is the number of most similar cases (neighbors) used to decide the class.
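Here is a minimal K-NN sketch. It assumes each user (or session) is described by a numeric feature vector and labeled with one predefined class. The features, visit counts for sport, politics, and education pages, are hypothetical, and so is the training data. Distance is Euclidean, and the class is decided by majority vote among the k nearest examples.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Assign query to the majority class among its k nearest training examples.

    training: list of (feature_vector, class_label) pairs.
    On a tied vote, Counter.most_common keeps the label met first, i.e. the nearest one.
    """
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    neighbours = [label for _, label in by_distance[:k]]
    return Counter(neighbours).most_common(1)[0][0]

# Hypothetical features: (sport visits, politics visits, education visits).
training = [
    ((8, 1, 0), "sport"),
    ((6, 2, 1), "sport"),
    ((1, 7, 1), "politics"),
    ((0, 9, 2), "politics"),
    ((1, 0, 6), "education"),
    ((0, 1, 8), "education"),
]
print(knn_classify(training, (7, 0, 1), k=3))  # sport
```

A small odd k reduces ties in two-class problems. In practice, features on different scales would normally be normalized before computing distances.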
  14. THE PHASES OF WEB USAGE MINING: Pattern Analysis Process. In this stage, the discovered patterns are further processed and filtered, possibly resulting in aggregate user models that can be fed into visualization tools. [Figure: Summary of the whole web usage mining process]
  16. RESULTS OF USING ASSOCIATION RULES: The log file in flat-file format. The log file is imported as a database into our implemented application.
  17. RESULTS OF USING ASSOCIATION RULES: We extract a transactional database from the web server log for every user, where every transaction represents a session. We then find association rules of user behavior by applying the Apriori algorithm to the user's transactional database.
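The short sketch below illustrates how a session can become one transaction of page types. The PAGE_TYPES table, which maps URL prefixes to page types, is hypothetical, as are the sample sessions. A real site would need its own mapping or a classifier to assign page types.

```python
# Hypothetical URL-prefix to page-type mapping for illustration only.
PAGE_TYPES = {
    "/sport/": "Sport",
    "/news/": "News",
    "/social/": "Social",
    "/entertainment/": "Entertainment",
}

def page_type(path):
    """Return the page type for a URL path, or None if no prefix matches."""
    for prefix, ptype in PAGE_TYPES.items():
        if path.startswith(prefix):
            return ptype
    return None

def build_transactions(sessions):
    """Turn each session (a list of requested paths) into a set of page types."""
    transactions = []
    for session in sessions:
        items = {page_type(path) for path in session} - {None}
        if items:  # skip sessions with no recognised page type
            transactions.append(items)
    return transactions

user_sessions = [
    ["/sport/match1.html", "/news/today.html", "/sport/match2.html"],
    ["/news/world.html", "/entertainment/film.html"],
    ["/about.html"],
]
print(build_transactions(user_sessions))
```

The result is a list of sets, one per session. This is the transactional form an Apriori routine expects as input. Repeated visits to the same page type within a session collapse into a single item.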
  19. CONCLUSION: We used web data containing the information users leave behind when they access web pages; this data is called web logs (server logs). We also applied statistical methods such as classification, association rule mining, and statistical correlation analysis, which can find groups of web page types that are commonly accessed together. Classification maps a data item into one of several predefined classes, each corresponding to one category such as sport, politics, or education. We also used the k-nearest neighbor (K-NN) algorithm, a common classification method, to select the best class. Association rule mining was used to discover correlations between the page types found in a web log. The application was implemented in the C# programming language.
  20. Any questions?