This document gives an overview of web usage mining, the application of data mining techniques to discover usage patterns from web data. The data can be collected at the server, client, or proxy level. The goals are to analyze users' behavioral patterns and profiles and to understand how to better serve the needs of web-based applications. The process involves preprocessing the data, discovering patterns with methods such as statistical analysis and clustering, and analyzing the patterns, including filtering out uninteresting ones. Web usage mining can benefit applications such as personalized marketing and can help increase profitability.
2. INTRODUCTION
Web Usage mining is the process of
applying data mining techniques for the
discovery of usage patterns from Web data,
targeted towards various applications.
4. INTRODUCTION
Goal:
➢ Analyze the behavioral patterns and
profiles of users interacting with a Web
site
➢ Understand and better serve the needs
of Web-based applications
6. INTRODUCTION
Importance:
➢ Growth of e-commerce
○ Provides a cost-effective way of
doing business.
➢ Hidden useful information
○ Visitors’ profile
○ Measure online marketing effort
8. PREPROCESSING
Preprocessing consists of converting the:
➢ usage information
➢ content information
➢ structure information
contained in the various available data
sources into the data abstractions necessary
for pattern discovery.
13. Preprocessing of Web Usage Mining
Data Cleaning: removes irrelevant references
and fields in server logs, removes erroneous
references, and adds missing references due
to caching.
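The removal part of data cleaning can be sketched as follows. This is a minimal illustration, assuming the server writes Common Log Format lines; the suffix list, the `clean_log` name, and the sample entries are hypothetical and would differ per site. Adding references lost to caching is not handled here.

```python
import re

# Hypothetical rule: which suffixes count as "irrelevant" is site-specific.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

# Common Log Format: host ident user [time] "method url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def clean_log(lines):
    """Keep only successful page requests; drop everything else."""
    cleaned = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # erroneous or unparseable entry
        entry = match.groupdict()
        if not entry["status"].startswith("2"):
            continue  # failed or redirected request
        if entry["url"].lower().endswith(IGNORED_SUFFIXES):
            continue  # embedded image, stylesheet, or script
        cleaned.append(entry)
    return cleaned


sample = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 0',
    'garbage line',
]
cleaned = clean_log(sample)
```

Of the four sample lines, only the request for `/index.html` survives: the image, the failed request, and the malformed line are dropped.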
14. Preprocessing of Web Usage Mining
Sessionization: grouping a user's requests into
sessions. A session is the set of activities
performed by a user from the moment she enters
the site until the moment she leaves it.
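Server logs do not record when a user leaves, so sessions are usually approximated. The sketch below assumes a common time-out heuristic: a gap of more than 30 minutes between two requests from the same user starts a new session. The threshold, the `sessionize` name, and the `(user, timestamp, url)` tuple layout are assumptions for illustration, not part of the slides.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; assumed heuristic, tune per site


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) tuples into per-user sessions."""
    by_user = defaultdict(list)
    for user, timestamp, url in requests:
        by_user[user].append((timestamp, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by time
        current = [hits[0][1]]
        for (prev_time, _), (time, url) in zip(hits, hits[1:]):
            if time - prev_time > timeout:
                # Long silence: close the session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


requests = [
    ("u1", 0, "/home"),
    ("u1", 60, "/products"),
    ("u1", 4000, "/home"),
    ("u2", 10, "/home"),
]
sessions = sessionize(requests)
```

Here user `u1` ends up with two sessions because more than 30 minutes pass between the second and third request.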
18. Preprocessing of Web Usage Mining
A page view consists of every file that
contributes to the display on a user's browser
at one time.
19. Preprocessing of Web Usage Mining
Conceptually, each page view can be viewed
as a collection of Web objects or resources
representing a specific “user event,” e.g.,
reading an article, viewing a product page, or
adding a product to the shopping cart.
20. Preprocessing of Web Usage Mining
Path Completion: Client- or proxy-side
caching can often result in missing access
references to those pages or objects that
have been cached.
21. Preprocessing of Web Usage Mining
Path Completion: For instance,
➢ if a user returns to a page A during the
same session, the second access to A will
likely result in viewing the previously
downloaded version of A that was
cached on the client side, and therefore
no request is made to the server.
22. Preprocessing of Web Usage Mining
Path Completion:
➢ This results in the second reference to A
not being recorded in the server logs.
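One way to recover such missing references is to use the site's link structure. The sketch below is a simplified heuristic under stated assumptions: the `links` dictionary (which page links to which) is supplied by the caller, and whenever the next logged page is not linked from the current one, the user is assumed to have pressed "back" through cached pages until reaching a page that does link to it. The `complete_path` name and the sample site are hypothetical.

```python
def complete_path(session, links):
    """Insert page views that were probably served from cache.

    session: pages in the order the server logged them.
    links:   dict mapping each page to the set of pages it links to.
    """
    if not session:
        return []
    completed = [session[0]]
    history = [session[0]]  # pages the user can go "back" through
    for page in session[1:]:
        # Find the most recent page in the history that links to `page`.
        idx = len(history) - 1
        while idx >= 0 and page not in links.get(history[idx], set()):
            idx -= 1
        if idx >= 0:
            # Pages revisited via the back button, assumed served from cache.
            revisited = history[idx:-1][::-1]
            completed.extend(revisited)
            history = history[: idx + 1]
        # If no page links to it, the request is kept as logged.
        completed.append(page)
        history.append(page)
    return completed


links = {"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": set()}
logged = ["A", "B", "D", "C"]
completed = complete_path(logged, links)
```

In this example, page C is only reachable from A, so the cached back-navigation D to B to A is inserted before C.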
24. Preprocessing of Web Usage Mining
Episode is a subset or subsequence of a
session comprised of semantically or
functionally related page views.
25. PATTERN DISCOVERY
Pattern discovery draws upon methods and
algorithms developed from several fields such as
statistics, data mining, machine learning and
pattern recognition.
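As a small illustration of the statistical end of pattern discovery, the sketch below computes two simple descriptive statistics over sessions: the most viewed pages and the average session length. The sample sessions are made up for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical sessions, each a list of page views in order.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
]

# Frequency of each page across all sessions.
page_counts = Counter(page for session in sessions for page in session)
most_viewed = page_counts.most_common(2)

# Average number of page views per session.
average_length = mean(len(session) for session in sessions)
```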
27. PATTERN ANALYSIS
The motivation behind pattern analysis is to filter
out uninteresting rules or patterns from the set
found in the pattern discovery phase.
28. PATTERN ANALYSIS
Methods:
➢ A knowledge query mechanism such as SQL.
➢ Another method is to load usage data into a
data cube in order to perform Online
Analytical Processing (OLAP) operations.
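A query mechanism for filtering patterns might look like the following sketch, which uses Python's built-in `sqlite3` module. The `rules` table, its columns, the sample rules, and the thresholds are all assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules ("
    "antecedent TEXT, consequent TEXT, support REAL, confidence REAL)"
)
# Hypothetical rules produced by the pattern discovery phase.
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        ("/home", "/products", 0.40, 0.80),
        ("/products", "/cart", 0.10, 0.65),
        ("/about", "/contact", 0.01, 0.30),
    ],
)

# Keep only rules above the chosen support and confidence thresholds.
interesting = conn.execute(
    "SELECT antecedent, consequent FROM rules "
    "WHERE support >= ? AND confidence >= ? "
    "ORDER BY confidence DESC",
    (0.05, 0.60),
).fetchall()
conn.close()
```

With these thresholds the rarely observed `/about` to `/contact` rule is filtered out as uninteresting.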
29. PATTERN ANALYSIS
Methods:
➢ Visualization techniques, such as graphing
patterns or assigning colors to different
values.
➢ Content and structure information can be
used to filter out patterns containing pages of
a certain usage type, content type, or pages
that match a certain hyperlink structure.
33. What is collaborative filtering?
Collaborative filtering is a method of making
automatic predictions about the interest of a
user by collecting preferences or taste
information from other users (i.e.,
collaborating on their interests).
35. This is how it works….
1. Weight all users with respect to similarity with the active user
2. Select a subset of Users to use as a set of predictors
3. Compute a prediction from a weighted combination of
selected neighbors’ ratings
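The three steps above can be sketched as follows. This is a minimal illustration, assuming ratings are positive numbers stored as nested dictionaries and using cosine similarity over co-rated items as the weight; the similarity measure, the `predict` name, and the sample ratings are assumptions, and other measures such as Pearson correlation are equally common.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, active, item, k=2):
    # Step 1: weight every other user who rated the item
    # by similarity to the active user.
    weights = {
        user: cosine_similarity(ratings[active], theirs)
        for user, theirs in ratings.items()
        if user != active and item in theirs
    }
    # Step 2: keep the k most similar users as predictors.
    neighbours = sorted(weights, key=weights.get, reverse=True)[:k]
    # Step 3: weighted average of the neighbours' ratings for the item.
    total = sum(weights[u] for u in neighbours)
    if total == 0:
        return None  # no usable neighbours
    return sum(weights[u] * ratings[u][item] for u in neighbours) / total


ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "C": 2},
}
prediction = predict(ratings, "alice", "C", k=2)
```

Because bob's ratings match alice's more closely than carol's do, the prediction for item C lands nearer to bob's rating of 4 than to carol's rating of 2.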
36. Collaborative filtering types
Memory Based: uses user rating data to compute
similarity between users or items, e.g.,
neighbourhood-based and item-based approaches.
Model Based: uses data mining and machine learning
models such as Bayesian networks, neural embedding
models, clustering models, and latent semantic models
such as SVD.
37. Approaches for CF (memory based)
User-Based CF: compute similarity between users
Item-Based CF: compute similarity between items
38. User based CF
Look for users who share the same rating
patterns with the active user (the user whom
the prediction is for).
Use the ratings from those like-minded users
to calculate a prediction for the active user.
40. Item based CF
1. Build an item-item matrix determining
relationships between pairs of items
2. Infer the tastes of the current user by
examining the matrix and matching that
user's data
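The two steps above can be sketched as follows. This is a minimal illustration, assuming positive ratings in nested dictionaries and cosine similarity as the item-item relationship; the function names and sample ratings are hypothetical.

```python
import math
from itertools import combinations


def item_similarity_matrix(ratings):
    """Step 1: cosine similarity between every pair of items."""
    by_item = {}
    for user, user_ratings in ratings.items():
        for item, value in user_ratings.items():
            by_item.setdefault(item, {})[user] = value
    matrix = {item: {} for item in by_item}
    for a, b in combinations(by_item, 2):
        common = set(by_item[a]) & set(by_item[b])
        if not common:
            continue  # no user rated both items
        dot = sum(by_item[a][u] * by_item[b][u] for u in common)
        norm_a = math.sqrt(sum(v * v for v in by_item[a].values()))
        norm_b = math.sqrt(sum(v * v for v in by_item[b].values()))
        matrix[a][b] = matrix[b][a] = dot / (norm_a * norm_b)
    return matrix


def recommend(ratings, matrix, user, top_n=1):
    """Step 2: score unseen items by similarity to the user's rated items."""
    seen = ratings[user]
    scores = {}
    for item, value in seen.items():
        for other, sim in matrix[item].items():
            if other not in seen:
                scores[other] = scores.get(other, 0.0) + sim * value
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 1, "B": 2, "C": 5},
    "u3": {"A": 5},
}
matrix = item_similarity_matrix(ratings)
picks = recommend(ratings, matrix, "u3")
```

User `u3` has only rated item A highly; since B is rated more like A than C is, B is recommended first.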
46. Collaborative Filtering problem
Cold-start: There should be enough other users
already in the system to find a match. New items
need to get enough ratings before they can be recommended.
Popularity Bias: Hard to recommend items to
someone with unique tastes.
48. Definition
● Estimate a utility function that automatically predicts how a
user will like an item
● Based on
○ Past Behavior
○ Relations to other users
○ Item similarity
○ Context
59. ● The system creates a user profile based on the user's likes or dislikes,
which are explicitly stated.
● Every purchase updates the user profile.
● A content-based recommender system matches the profile of
an item to the user profile to decide its relevance to the user.
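The profile matching described above can be sketched as follows. This is a minimal illustration, assuming both profiles are feature-weight dictionaries and that relevance is measured by cosine similarity; the feature names, the catalog, and the function names are hypothetical.

```python
import math


def update_profile(profile, item_features, weight=1.0):
    """Add an item's features to the user profile (e.g., after a purchase)."""
    for feature, value in item_features.items():
        profile[feature] = profile.get(feature, 0.0) + weight * value
    return profile


def relevance(profile, item_features):
    """Cosine similarity between the user profile and an item profile."""
    dot = sum(profile.get(f, 0.0) * v for f, v in item_features.items())
    norm_p = math.sqrt(sum(v * v for v in profile.values()))
    norm_i = math.sqrt(sum(v * v for v in item_features.values()))
    if norm_p == 0 or norm_i == 0:
        return 0.0
    return dot / (norm_p * norm_i)


catalog = {
    "thriller_novel": {"book": 1, "thriller": 1},
    "mystery_novel": {"book": 1, "mystery": 1},
    "desk_lamp": {"home": 1, "lighting": 1},
}

profile = {}
update_profile(profile, {"thriller": 1})           # explicitly stated like
update_profile(profile, catalog["thriller_novel"])  # a purchase

# Rank the items the user has not bought yet.
ranked = sorted(
    (name for name in catalog if name != "thriller_novel"),
    key=lambda name: relevance(profile, catalog[name]),
    reverse=True,
)
```

The mystery novel shares the `book` feature with the profile and therefore ranks above the desk lamp, which shares none.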
63. Content Representation
● Structured data
○ Small number of attributes
○ Each item described by same set of attributes
○ Known set of values of attributes
64. Content Representation (contd.)
● Unstructured data
○ No attribute names with well-defined values
○ Need to impose structure on text before use
○ Natural language complexity
■ Same word with different meanings
■ Different words with the same meaning
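One simple way to impose structure on text is a bag-of-words representation, sketched below. The stop-word list and the `bag_of_words` name are illustrative assumptions. The example sentence also shows the ambiguity problem: both senses of "bank" collapse into one term.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def bag_of_words(text):
    """Turn free text into a term -> count mapping."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


# "bank" appears once as a river bank and once as a financial bank,
# but the representation cannot tell the two meanings apart.
vector = bag_of_words("The bank of the river is near the bank.")
```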
66. ● The system uses additional data about the context in which an item
is consumed.
● Example: Time may be used as an additional component to
recommend restaurants to consumers, i.e., different restaurants
for breakfast, lunch and so on. Further, information about
whether you are going out to eat with your friends or family
should also change the recommendation.
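The restaurant example can be sketched with a simple approach often called contextual pre-filtering: ratings are first restricted to those given in the same context, and only then ranked. The record layout, the context keys `meal` and `company`, the restaurant names, and the `recommend_in_context` function are all hypothetical.

```python
from collections import defaultdict


def recommend_in_context(ratings, context, top_n=1):
    """Rank items by mean rating among records that match the context."""
    totals = defaultdict(lambda: [0.0, 0])  # item -> [rating sum, count]
    for record in ratings:
        if all(record.get(key) == value for key, value in context.items()):
            entry = totals[record["item"]]
            entry[0] += record["rating"]
            entry[1] += 1
    means = {item: total / count for item, (total, count) in totals.items()}
    return sorted(means, key=means.get, reverse=True)[:top_n]


ratings = [
    {"item": "Pancake House", "meal": "breakfast", "company": "family", "rating": 5},
    {"item": "Pancake House", "meal": "dinner", "company": "friends", "rating": 2},
    {"item": "Grill Bar", "meal": "dinner", "company": "friends", "rating": 5},
    {"item": "Grill Bar", "meal": "breakfast", "company": "family", "rating": 1},
]
breakfast_pick = recommend_in_context(ratings, {"meal": "breakfast"})
dinner_pick = recommend_in_context(ratings, {"meal": "dinner", "company": "friends"})
```

The same two restaurants are ranked differently depending on the meal and the company, which is the behaviour the example describes.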
67. Major obstacles for contextual computing
● Obtain sufficient and reliable data describing user context
● Understand the impact of contextual dimensions on
the personalisation process
● Build a computational model of contextual dimensions
within more classical recommendation technology
● For instance: How to extend Collaborative filtering to
include contextual dimensions?
69. ● A shared or group intelligence that emerges
from the collaboration and competition of many
individuals.
● Groups of people and computers, connected by
the Internet, collectively doing intelligent
things.
71. It can be understood as an emergent property from
the synergies among:
1) Data - knowledge-information
2) Software-hardware
3) Experts
72. For instance,
Google technology harvests knowledge generated
by millions of people creating and linking web
pages and then uses this knowledge to answer
queries in ways that often seem amazingly
intelligent.
73. In Wikipedia, thousands of people around the world
have collectively created a very large and high quality
intellectual product with almost no centralized control,
and almost all as volunteers!
74. Online multi-player games are another example
of collective intelligence. Games such as Dota 2,
Second Life and Call of Duty rely on gamers
coming together as a community to form the
game’s identity.
75. Other examples:
● Social networking (perhaps the
most popular example of collective intelligence)
● Amazon, Hamrobazaar & other ecommerce sites
● etc