Web Usage Mining Process and its Applications

WEB USAGE MINING
Monu Chaudhary
071BCT522

INTRODUCTION
Web usage mining is the process of
applying data mining techniques for the
discovery of usage patterns from Web data,
targeted towards various applications.

INTRODUCTION
Data collected at different levels:
➢ Server level
➢ Client level
➢ Proxy level

INTRODUCTION
Goal:
➢ Analyze the behavioral patterns and
profiles of users interacting with a Web
site
➢ Understand and better serve the needs
of Web-based applications

INTRODUCTION
Classification based on Usage Data:
➢ Web server Data
➢ Application Server Data
➢ Application Level Data

INTRODUCTION
Importance:
➢ Growth of e-commerce
○ Provides a cost-effective way of
doing business.
➢ Hidden useful information
○ Visitors’ profile
○ Measure online marketing effort

INTRODUCTION
3 Phases:
➢ Preprocessing
➢ Pattern Discovery
➢ Pattern Analysis

PREPROCESSING
Preprocessing consists of converting the:
➢ usage information
➢ content information
➢ structure information
contained in the various available data
sources into the data abstractions necessary
for pattern discovery.

Preprocessing of Web Usage Mining

Data Cleaning: removes irrelevant references
and fields from server logs, removes erroneous
references, and adds missing references due
to caching.
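
The cleaning step can be sketched as below. This is a minimal illustration that assumes the server writes Common Log Format lines; the suffix list, the "keep only 2xx responses" rule, and the sample log lines are assumptions made for the example, not part of the slides.

```python
import re

# Common Log Format: host ident user [time] "METHOD path PROTO" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+'
)

# Assumed list of embedded-resource suffixes treated as irrelevant references.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def clean_log(lines):
    """Keep only successful page requests; drop malformed and irrelevant entries."""
    cleaned = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed entry
        entry = match.groupdict()
        if not entry["status"].startswith("2"):
            continue  # erroneous reference (for example a 404)
        if entry["path"].lower().endswith(IRRELEVANT_SUFFIXES):
            continue  # image, stylesheet, or script request
        cleaned.append(entry)
    return cleaned


# Hypothetical sample log lines.
sample = [
    '10.0.0.1 - - [01/Jan/2017:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2017:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2017:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 0',
]
pages = [entry["path"] for entry in clean_log(sample)]
```

Only the request for `/index.html` survives here: the image request is irrelevant to usage analysis and the 404 is an erroneous reference.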

Sessionization: splitting a user's activity
into sessions. A session is the set of activities
performed by a user from the moment she enters
the site until the moment she leaves it.
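
One common way to approximate session boundaries is a time-oriented heuristic: a long gap between two requests from the same user starts a new session. The sketch below assumes a 30-minute timeout and hypothetical request data; the slides do not prescribe either.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold, tune per site


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Split one user's time-ordered (timestamp, page) requests into sessions."""
    sessions = []
    last_time = None
    for timestamp, page in requests:
        if last_time is None or timestamp - last_time > timeout:
            sessions.append([])  # gap too long: start a new session
        sessions[-1].append(page)
        last_time = timestamp
    return sessions


# Hypothetical requests from a single identified user.
requests = [
    (datetime(2017, 1, 1, 10, 0), "/home"),
    (datetime(2017, 1, 1, 10, 5), "/products"),
    (datetime(2017, 1, 1, 11, 30), "/home"),
    (datetime(2017, 1, 1, 11, 32), "/cart"),
]
sessions = sessionize(requests)
```

The 85-minute gap between the second and third request exceeds the timeout, so the four requests fall into two sessions.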

User Identification: associates the multiple
sessions recorded for each user. The resulting
log is called the user activity record.

A page view consists of every file that
contributes to the display on a user's browser
at one time.

Conceptually, each page view can be viewed
as a collection of Web objects or resources
representing a specific “user event,” e.g.,
reading an article, viewing a product page, or
adding a product to the shopping cart.

Path Completion: Client- or proxy-side
caching can often result in missing access
references to those pages or objects that
have been cached.

Path Completion: For instance,
➢ if a user returns to a page A during the
same session, the second access to A will
likely result in viewing the previously
downloaded version of A that was
cached on the client side, and therefore,
no request is made to the server.

Path Completion:
➢ This results in the second reference to A
not being recorded on the server logs.
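
A simple path completion heuristic uses the site's link structure: if the requested page is not linked from the page the user was last on, assume the user pressed "back" through cached pages until reaching one that does link to it. The sketch below is one possible reading of that idea; the `links` map and the page names are hypothetical.

```python
def complete_path(logged_pages, links):
    """Insert inferred back-navigation steps that are missing from the server log.

    `links` maps each page to the set of pages it links to (assumed to be
    known from the site structure).
    """
    completed = []
    stack = []  # pages on the current navigation path, most recent last
    for page in logged_pages:
        # Walk back along the path until some page links to the requested one.
        depth = len(stack) - 1
        while depth >= 0 and page not in links.get(stack[depth], set()):
            depth -= 1
        if 0 <= depth < len(stack) - 1:
            # The user backtracked through cached pages: re-insert those views.
            completed.extend(stack[depth:-1][::-1])
            stack = stack[: depth + 1]
        completed.append(page)
        stack.append(page)
    return completed


# Hypothetical site structure and logged session.
links = {"A": {"B", "C"}, "B": {"D"}}
logged = ["A", "B", "D", "C"]
completed = complete_path(logged, links)
```

Page C is only reachable from A, so the revisits to B and A (served from the cache and never logged) are restored between D and C.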

An episode is a subset or subsequence of a
session composed of semantically or
functionally related page views.

PATTERN DISCOVERY
Pattern discovery draws upon methods and
algorithms developed from several fields such as
statistics, data mining, machine learning and
pattern recognition.

PATTERN DISCOVERY
Methods:
➢ Statistical Analysis
➢ Association Rules
➢ Clustering
➢ Classification
➢ Sequential Patterns
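
As an illustration of one of these methods, association rules over sessions can be found by counting how often pages occur together. The sketch below handles page pairs only and uses assumed thresholds and hypothetical sessions; a full implementation would typically use an algorithm such as Apriori for larger itemsets.

```python
from collections import Counter
from itertools import combinations


def page_pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) rules for page pairs."""
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = set(session)  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(combinations(sorted(pages), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / page_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical sessions produced by preprocessing.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = page_pair_rules(sessions)
```

With these sessions, every visit to `/cart` also includes `/products`, so the rule `/cart` → `/products` has confidence 1.0 at support 0.5.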

PATTERN ANALYSIS
The motivation behind pattern analysis is to filter
out uninteresting rules or patterns from the set
found in the pattern discovery phase.

PATTERN ANALYSIS
Methods:
➢ A knowledge query mechanism such as SQL.
➢ Another method is to load usage data into a
data cube in order to perform Online
Analytical Processing (OLAP) operations.
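
A SQL-based knowledge query can be sketched with Python's built-in `sqlite3` module. The `rules` table layout, the stored rule values, and the thresholds are all hypothetical; the point is only that discovered patterns can be stored and then filtered declaratively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules ("
    "antecedent TEXT, consequent TEXT, support REAL, confidence REAL)"
)
# Hypothetical output of the pattern discovery phase.
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        ("/home", "/products", 0.50, 0.67),
        ("/cart", "/products", 0.50, 1.00),
        ("/about", "/home", 0.25, 1.00),
    ],
)
# Keep only rules that are both frequent and reliable.
interesting = conn.execute(
    "SELECT antecedent, consequent FROM rules "
    "WHERE support >= ? AND confidence >= ? "
    "ORDER BY confidence DESC",
    (0.4, 0.8),
).fetchall()
conn.close()
```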

PATTERN ANALYSIS
Methods:
➢ Visualization techniques, such as graphing
patterns or assigning colors to different
values.
➢ Content and structure information can be
used to filter out patterns containing pages of
a certain usage type, content type, or pages
that match a certain hyperlink structure.

Application of Web Usage Mining

Advantages
➢ Personalized marketing.
➢ Fight against terrorism.
➢ Customer Relationship.
➢ Increase profitability by target pricing.

COLLABORATIVE FILTERING
Subodh chandra shakya
071BCT543

What is collaborative filtering?
Collaborative filtering is a method of making
automatic predictions about the interests of a
user by collecting preference or taste
information from many other users (i.e.,
collaborating on interests).

Application
Mostly in e-commerce recommendation
system
Amazon
Netflix

How it works:
1. Weight all users with respect to their similarity to the active user.
2. Select a subset of users to use as a set of predictors.
3. Compute a prediction from a weighted combination of
the selected neighbors’ ratings.
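
The three steps can be sketched as follows. This is a minimal illustration, not a production recommender: it assumes strictly positive ratings stored as `user -> {item: rating}`, uses cosine similarity over co-rated items as the weight, and only considers users who have rated the target item. The user names and ratings are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(ratings, active_user, item, k=2):
    """Predict the active user's rating for `item` from the k nearest users."""
    # Step 1: weight the other users (those who rated the item) by similarity.
    weights = {
        user: cosine_similarity(ratings[active_user], user_ratings)
        for user, user_ratings in ratings.items()
        if user != active_user and item in user_ratings
    }
    # Step 2: keep the k most similar users as predictors.
    neighbors = sorted(weights, key=weights.get, reverse=True)[:k]
    # Step 3: similarity-weighted average of the neighbors' ratings.
    total_weight = sum(weights[u] for u in neighbors)
    if total_weight == 0:
        return None  # no usable neighbors
    return sum(weights[u] * ratings[u][item] for u in neighbors) / total_weight


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "C": 2},
}
prediction = predict(ratings, "alice", "C", k=2)
```

Bob's ratings match Alice's exactly on the shared items, so his rating of 4 for item C dominates the prediction, while Carol's lower rating pulls it down somewhat.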

Collaborative filtering types
Memory based: uses user rating data to compute
similarity between users or items, e.g.
neighbourhood-based and item-based methods.
Model based: uses data mining and machine learning
models, e.g. Bayesian networks, neural embedding
models, clustering models, and latent semantic models
such as SVD.

Approaches for CF (memory based)
User-Based CF - compute similarity based on users
Item-Based CF - compute similarity based on items

User based CF
Look for users who share the same rating
patterns with the active user (the user whom
the prediction is for).
Use the ratings from those like-minded users
to calculate a prediction for the active user.

Item based CF
1. Build an item-item matrix determining
relationships between pairs of items
2. Infer the tastes of the current user by
examining the matrix and matching that
user's data
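
Both steps can be sketched in plain Python. The example assumes strictly positive ratings stored as `user -> {item: rating}` and uses cosine similarity between item rating columns as the "relationship"; the data and function names are hypothetical.

```python
from math import sqrt


def item_similarity_matrix(ratings):
    """Step 1: build an item-item cosine similarity table."""
    items = sorted({i for user_ratings in ratings.values() for i in user_ratings})
    # Rating column of each item: user -> rating.
    columns = {i: {u: r[i] for u, r in ratings.items() if i in r} for i in items}
    matrix = {}
    for a in items:
        for b in items:
            if a == b:
                continue
            common = set(columns[a]) & set(columns[b])
            dot = sum(columns[a][u] * columns[b][u] for u in common)
            norm_a = sqrt(sum(v * v for v in columns[a].values()))
            norm_b = sqrt(sum(v * v for v in columns[b].values()))
            matrix[(a, b)] = dot / (norm_a * norm_b) if dot else 0.0
    return matrix


def recommend(ratings, matrix, user, top_n=1):
    """Step 2: score unseen items by similarity to the items the user rated."""
    seen = ratings[user]
    candidates = {b for (_, b) in matrix} - set(seen)
    scores = {
        c: sum(matrix[(s, c)] * rating for s, rating in seen.items())
        for c in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "u1": {"book": 5, "film": 4},
    "u2": {"book": 4, "film": 5, "game": 1},
    "u3": {"film": 1, "game": 5, "album": 4},
    "u4": {"book": 5},
}
matrix = item_similarity_matrix(ratings)
suggestion = recommend(ratings, matrix, "u4")
```

User `u4` has only rated the book; because users who liked the book also tended to like the film, the film is the top suggestion.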

A simple similarity measure is cosine similarity
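
In the usual formulation (the notation here is an assumption, since the slide shows no formula), the cosine similarity between users $u$ and $v$ is computed over the set $I_{uv}$ of items both have rated, where $r_{u,i}$ is the rating of user $u$ for item $i$:

```latex
\[
\operatorname{sim}_{\cos}(u, v) =
\frac{\sum_{i \in I_{uv}} r_{u,i}\, r_{v,i}}
     {\sqrt{\sum_{i \in I_{uv}} r_{u,i}^{2}}\;
      \sqrt{\sum_{i \in I_{uv}} r_{v,i}^{2}}}
\]
```

The value is 1 when the two rating vectors point in the same direction and 0 when they are orthogonal.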

Pearson correlation similarity
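
In the usual formulation (the notation here is an assumption, since the slide shows no formula), the Pearson correlation between users $u$ and $v$ is computed over the set $I_{uv}$ of items both have rated, where $r_{u,i}$ is the rating of user $u$ for item $i$ and $\bar{r}_u$ is the mean rating of user $u$:

```latex
\[
\operatorname{sim}_{\text{pearson}}(u, v) =
\frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
     {\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^{2}}\;
      \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^{2}}}
\]
```

Subtracting each user's mean compensates for users who rate consistently high or low; the value ranges from -1 (opposite tastes) to 1 (identical rating pattern).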

Collaborative Filtering problems
Cold start: there must be enough other users
already in the system to find a match, and new items
need enough ratings before they can be recommended.
Popularity bias: it is hard to recommend items to
someone with unique tastes.

RECOMMENDER SYSTEMS
Atul Khatri
071bct509

Definition
● Estimate a utility function that automatically predicts how a
user will like an item
● Based on
○ Past Behavior
○ Relations to other users
○ Item similarity
○ Context

Impact
Apparent
● Advertisement
● Restaurants, cafes
● Movies, TV shows, Music
● Books
● News articles
● Social sites including dating services

Impact (continued)
Not so apparent
● Courses in E-learning
● Drug components
● Research papers
● Citations
● Code modules

Types
● Collaborative Filtering system
● Content-based system
● Hybrid recommender system
○ Context-based system
○ Knowledge-based system

Paradigms of recommender
systems

Content-Based
Recommender System

● System creates a user profile based on the user's likes and dislikes,
which are explicitly stated
● Every purchase updates the user profile.
● A content-based recommender system matches the profile of
an item to the user profile to decide its relevance to the user
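
Profile matching can be sketched with simple attribute counts. This illustration assumes each item is described by a set of attributes and scores an item by summing the user's profile weights for its attributes; the catalog, titles, and scoring rule are hypothetical choices made for the example.

```python
from collections import Counter


def build_profile(liked_items, disliked_items, catalog):
    """Sum attribute counts: +1 for each liked item, -1 for each disliked one."""
    profile = Counter()
    for item in liked_items:
        for attribute in catalog[item]:
            profile[attribute] += 1
    for item in disliked_items:
        for attribute in catalog[item]:
            profile[attribute] -= 1
    return profile


def rank_items(profile, catalog, exclude=()):
    """Order unseen items by how well their attributes match the profile."""
    scores = {
        item: sum(profile[a] for a in attributes)
        for item, attributes in catalog.items()
        if item not in exclude
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical item profiles (structured content representation).
catalog = {
    "Alien": {"sci-fi", "horror"},
    "Blade Runner": {"sci-fi", "noir"},
    "Notting Hill": {"romance", "comedy"},
    "Interstellar": {"sci-fi", "drama"},
    "Love Actually": {"romance", "comedy", "drama"},
}
liked = ["Alien", "Blade Runner"]
disliked = ["Notting Hill"]
profile = build_profile(liked, disliked, catalog)
ranked = rank_items(profile, catalog, exclude=liked + disliked)
```

The profile ends up weighted towards "sci-fi" and against "romance" and "comedy", so the remaining sci-fi title ranks ahead of the romantic comedy.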

Content Representation
● Structured data
○ Small number of attributes
○ Each item described by same set of attributes
○ Known set of values of attributes

Content Representation (contd.)
● Unstructured data
○ No attribute names with well defined values
○ Need to impose structure on text before use
○ Natural language complexity
■ Same word with different meaning
■ Different word with same meaning

Context-Based
Recommender Systems

● System uses additional data about the context in which an item
is consumed.
● Example: the time of day may be used to
recommend restaurants to consumers, i.e. different restaurants
for breakfast, lunch and so on. Likewise, whether
you are going out to eat with friends or family
should also change the recommendation.
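
One simple way to use context is contextual pre-filtering: discard ratings given in other contexts before ranking. The sketch below applies this to the restaurant example; the restaurant names, ratings, and the choice of a plain average are assumptions for illustration.

```python
from collections import defaultdict


def recommend_for_context(ratings, context, top_n=1):
    """Rank items using only the ratings that were given in `context`."""
    by_item = defaultdict(list)
    for item, rating_context, rating in ratings:
        if rating_context == context:
            by_item[item].append(rating)
    averages = {item: sum(values) / len(values) for item, values in by_item.items()}
    return sorted(averages, key=averages.get, reverse=True)[:top_n]


# Hypothetical (restaurant, meal time, rating) observations.
ratings = [
    ("Pancake House", "breakfast", 5),
    ("Pancake House", "dinner", 2),
    ("Steak Grill", "dinner", 5),
    ("Steak Grill", "breakfast", 1),
    ("Noodle Bar", "dinner", 4),
]
breakfast_pick = recommend_for_context(ratings, "breakfast")
dinner_pick = recommend_for_context(ratings, "dinner")
```

The same data yields a different top recommendation for breakfast and for dinner, which is the effect the time-of-day example describes.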

Major obstacles for contextual computing
● Obtain sufficient and reliable data describing user context
● Understand the impact of contextual dimensions on
the personalisation process
● Build a computational model of contextual dimensions within more
classical recommendation technology
● For instance: how can collaborative filtering be extended to
include contextual dimensions?

Collective Intelligence
Sagun Nakarmi
071bct533

● A shared or group intelligence that emerges
from the collaboration and competition of many
individuals.
● Groups of people and computers, connected by
the Internet, collectively doing intelligent
things.

It can be understood as an emergent property from
the synergies among:
1) Data - knowledge-information
2) Software-hardware
3) Experts

For instance,
Google technology harvests knowledge generated
by millions of people creating and linking web
pages and then uses this knowledge to answer
queries in ways that often seem amazingly
intelligent.

In Wikipedia, thousands of people around the world
have collectively created a very large and high quality
intellectual product with almost no centralized control,
and almost all as volunteers!

Online multi-player games are another example
of collective intelligence. Games such as Dota 2,
Second Life and Call of Duty rely on gamers
coming together as a community to form the
game’s identity.

Other examples:
● Social networking (perhaps the
most popular form of collective intelligence)
● Amazon, Hamrobazaar & other e-commerce sites
● etc.
