SlideShare a Scribd company logo
Chapter 12: Web Usage Mining
- An introduction
Chapter written by Bamshad Mobasher
Many slides are from a tutorial given by
B. Berendt, B. Mobasher, M. Spiliopoulou
Introduction
 Web usage mining: automatic discovery of
patterns in clickstreams and associated data
collected or generated as a result of user
interactions with one or more Web sites.
 Goal: analyze the behavioral patterns and
profiles of users interacting with a Web site.
 The discovered patterns are usually
represented as collections of pages, objects,
or resources that are frequently accessed by
groups of users with common interests.
Bing Liu 3
Introduction
 Data in Web Usage Mining:
 Web server logs
 Site contents
 Data about the visitors, gathered from external channels
 Further application data
 Not all these data are always available.
 When they are, they must be integrated.
 A large part of Web usage mining is about
processing usage/ clickstream data.
 After that various data mining algorithm can be applied.
Web server logs
Bing Liu 4
1 2006-02-01 00:08:43 1.2.3.4 - GET /classes/cs589/papers.html - 200 9221
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)
http://dataminingresources.blogspot.com/
2 2006-02-01 00:08:46 1.2.3.4 - GET /classes/cs589/papers/cms-tai.pdf - 200 4096
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)
http://maya.cs.depaul.edu/~classes/cs589/papers.html
3 2006-02-01 08:01:28 2.3.4.5 - GET /classes/ds575/papers/hyperlink.pdf - 200
318814 HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)
http://www.google.com/search?hl=en&lr=&q=hyperlink+analysis+for+the+web+survey
4 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/announce.html - 200 3794
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
http://maya.cs.depaul.edu/~classes/cs480/
5 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/styles2.css - 200 1636
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
http://maya.cs.depaul.edu/~classes/cs480/announce.html
6 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/header.gif - 200 6027
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
http://maya.cs.depaul.edu/~classes/cs480/announce.html
Web usage mining process
Bing Liu 5
Bing Liu 6
Data preparation
Pre-processing of web usage data
Bing Liu 7
Data cleaning
 Data cleaning
 remove irrelevant references and fields in server
logs
 remove references due to spider navigation
 remove erroneous references
 add missing references due to caching (done after
sessionization)
Bing Liu 8
Identify sessions (sessionization)
 In Web usage analysis, these data are the
sessions of the site visitors: the activities
performed by a user from the moment she
enters the site until the moment she leaves it.
 Difficult to obtain reliable usage data due to
proxy servers and anonymizers, dynamic IP
addresses, missing references due to
caching, and the inability of servers to
distinguish among different visits.
Bing Liu 9
Sessionization strategies
Bing Liu 10
Sessionization heuristics
Bing Liu 11
Sessionization example
Bing Liu 12
User identification
Bing Liu 13
User identification: an example
Bing Liu 14
Pageview
 A pageview is an aggregate representation of
a collection of Web objects contributing to the
display on a user’s browser resulting from a
single user action (such as a click-through).
 Conceptually, each pageview can be viewed
as a collection of Web objects or resources
representing a specific “user event,” e.g.,
reading an article, viewing a product page, or
adding a product to the shopping cart.
Bing Liu 15
Path completion
 Client- or proxy-side caching can often result
in missing access references to those pages
or objects that have been cached.
 For instance,
 if a user returns to a page A during the same
session, the second access to A will likely result in
viewing the previously downloaded version of A
that was cached on the client-side, and therefore,
no request is made to the server.
 This results in the second reference to A not
being recorded on the server logs.
Bing Liu 16
Missing references due to caching
Bing Liu 17
Path completion
 The problem of inferring missing user
references due to caching.
 Effective path completion requires extensive
knowledge of the link structure within the site
 Referrer information in server logs can also
be used in disambiguating the inferred paths.
 Problem gets much more complicated in
frame-based sites.
Bing Liu 18
Integrating with e-commerce events
 Either product oriented or visit oriented
 Used to track and analyze conversion of
browsers to buyers.
 Major difficulty for E-commerce events is defining
and implementing the events for a site, however,
in contrast to clickstream data, getting reliable
preprocessed data is not a problem.
 Another major challenge is the successful
integration with clickstream data
Bing Liu 19
Product-Oriented Events
 Product View
 Occurs every time a product is displayed on a
page view
 Typical Types: Image, Link, Text
 Product Click-through
 Occurs every time a user “clicks” on a product to
get more information
Bing Liu 20
Product-Oriented Events
 Shopping Cart Changes
 Shopping Cart Add or Remove
 Shopping Cart Change - quantity or other feature
(e.g. size) is changed
 Product Buy or Bid
 Separate buy event occurs for each product in the
shopping cart
 Auction sites can track bid events in addition to
the product purchases
Bing Liu 21
Web usage mining process
Bing Liu 22
Integration with page content
Bing Liu 23
Integration with link structure
Bing Liu 24
E-commerce data analysis
Bing Liu 25
Session analysis
 Simplest form of analysis: examine individual
or groups of server sessions and e-
commerce data.
 Advantages:
 Gain insight into typical customer behaviors.
 Trace specific problems with the site.
 Drawbacks:
 LOTS of data.
 Difficult to generalize.
Bing Liu 26
Session analysis: aggregate reports
Bing Liu 27
OLAP
Bing Liu 28
Data mining
Bing Liu 29
Data mining (cont.)
Bing Liu 30
Some usage mining applications
Bing Liu 31
Personalization application
Bing Liu 32
Standard approaches
Bing Liu 33
Summary
 Web usage mining has emerged as the essential
tool for realizing more personalized, user-friendly
and business-optimal Web services.
 The key is to use the user-clickstream data for
many mining purposes.
 Traditionally, Web usage mining is used by e-
commerce sites to organize their sites and to
increase profits.
 It is now also used by search engines to improve
search quality and to evaluate search results, etc,
and by many other applications.
Bing Liu 34

More Related Content

What's hot

The impact of web on ir
The impact of web on irThe impact of web on ir
The impact of web on ir
Primya Tamil
 
Web mining slides
Web mining slidesWeb mining slides
Web mining slides
mahavir_a
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
Milind Gokhale
 
Power Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer PhenomenaPower Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer Phenomena
Ai Sha
 
Web mining
Web miningWeb mining
Web mining
Daminda Herath
 
Web Mining
Web MiningWeb Mining
Web Mining
dataminers.ir
 
Online shopping report-6 month project
Online shopping report-6 month projectOnline shopping report-6 month project
Online shopping report-6 month project
Ginne yoffe
 
Food Order Management System
Food Order Management SystemFood Order Management System
Food Order Management System
SheikhAmikUllahPrott
 
Web Search and Mining
Web Search and MiningWeb Search and Mining
Web Search and Mining
sathish sak
 
Case Study Of Webgraph
Case Study Of WebgraphCase Study Of Webgraph
Case Study Of Webgraph
Suraksha Sanghavi
 
Machine Learning in Healthcare
Machine Learning in HealthcareMachine Learning in Healthcare
Machine Learning in Healthcare
BigR.io
 
Module 2nd USER INTERFACE DESIGN (15CS832) - VTU
Module 2nd USER INTERFACE DESIGN (15CS832) - VTUModule 2nd USER INTERFACE DESIGN (15CS832) - VTU
Module 2nd USER INTERFACE DESIGN (15CS832) - VTU
Sachin Gowda
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
Neha Kulkarni
 
Information Retrieval Techniques of Google
Information Retrieval Techniques of Google Information Retrieval Techniques of Google
Information Retrieval Techniques of Google
Cyr Ish
 
Project Report of Faculty feedback system
Project Report of Faculty feedback systemProject Report of Faculty feedback system
Project Report of Faculty feedback system
BalajeeSofTech
 
Amazon Item-to-Item Recommendations
Amazon Item-to-Item RecommendationsAmazon Item-to-Item Recommendations
Amazon Item-to-Item Recommendations
Roger Chen
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
nimmyjans4
 
Web mining (structure mining)
Web mining (structure mining)Web mining (structure mining)
Web mining (structure mining)
Amir Fahmideh
 
Unit2 hci
Unit2 hciUnit2 hci
Unit2 hci
pradeepgupta266
 
Ppt of blogs
Ppt of blogsPpt of blogs
Ppt of blogs
Kritika Chauhan
 

What's hot (20)

The impact of web on ir
The impact of web on irThe impact of web on ir
The impact of web on ir
 
Web mining slides
Web mining slidesWeb mining slides
Web mining slides
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Power Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer PhenomenaPower Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer Phenomena
 
Web mining
Web miningWeb mining
Web mining
 
Web Mining
Web MiningWeb Mining
Web Mining
 
Online shopping report-6 month project
Online shopping report-6 month projectOnline shopping report-6 month project
Online shopping report-6 month project
 
Food Order Management System
Food Order Management SystemFood Order Management System
Food Order Management System
 
Web Search and Mining
Web Search and MiningWeb Search and Mining
Web Search and Mining
 
Case Study Of Webgraph
Case Study Of WebgraphCase Study Of Webgraph
Case Study Of Webgraph
 
Machine Learning in Healthcare
Machine Learning in HealthcareMachine Learning in Healthcare
Machine Learning in Healthcare
 
Module 2nd USER INTERFACE DESIGN (15CS832) - VTU
Module 2nd USER INTERFACE DESIGN (15CS832) - VTUModule 2nd USER INTERFACE DESIGN (15CS832) - VTU
Module 2nd USER INTERFACE DESIGN (15CS832) - VTU
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Information Retrieval Techniques of Google
Information Retrieval Techniques of Google Information Retrieval Techniques of Google
Information Retrieval Techniques of Google
 
Project Report of Faculty feedback system
Project Report of Faculty feedback systemProject Report of Faculty feedback system
Project Report of Faculty feedback system
 
Amazon Item-to-Item Recommendations
Amazon Item-to-Item RecommendationsAmazon Item-to-Item Recommendations
Amazon Item-to-Item Recommendations
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
 
Web mining (structure mining)
Web mining (structure mining)Web mining (structure mining)
Web mining (structure mining)
 
Unit2 hci
Unit2 hciUnit2 hci
Unit2 hci
 
Ppt of blogs
Ppt of blogsPpt of blogs
Ppt of blogs
 

Similar to Web usage-mining

Web usage mining
Web usage miningWeb usage mining
Web usage mining
shabnamfsayyad
 
Implementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server MonitoringImplementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server Monitoring
iosrjce
 
C017231726
C017231726C017231726
C017231726
IOSR Journals
 
Automatic recommendation for online users using web usage mining
Automatic recommendation for online users using web usage miningAutomatic recommendation for online users using web usage mining
Automatic recommendation for online users using web usage mining
IJMIT JOURNAL
 
Automatic Recommendation for Online Users Using Web Usage Mining
Automatic Recommendation for Online Users Using Web Usage Mining Automatic Recommendation for Online Users Using Web Usage Mining
Automatic Recommendation for Online Users Using Web Usage Mining
IJMIT JOURNAL
 
Making Web Analytics actionable with Web Content Management
Making Web Analytics actionable with Web Content ManagementMaking Web Analytics actionable with Web Content Management
Making Web Analytics actionable with Web Content Management
Amplexor
 
Clickstream Analysis
Clickstream AnalysisClickstream Analysis
Clickstream Analysis
intuitiv.de
 
Pxc3893553
Pxc3893553Pxc3893553
Pxc3893553
Ouzza Brahim
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technology
anchalsinghdm
 
Personal web usage mining
Personal web usage miningPersonal web usage mining
Personal web usage mining
Daminda Herath
 
Personal Web Usage Mining
Personal Web Usage MiningPersonal Web Usage Mining
Personal Web Usage Mining
Daminda Herath
 
Google Analytics tutorial by Jay Murphy
Google Analytics tutorial by Jay Murphy Google Analytics tutorial by Jay Murphy
Google Analytics tutorial by Jay Murphy
Newport Interactive Marketers
 
Web Data mining-A Research area in Web usage mining
Web Data mining-A Research area in Web usage miningWeb Data mining-A Research area in Web usage mining
Web Data mining-A Research area in Web usage mining
IOSR Journals
 
Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)
OUM SAOKOSAL
 
Web analytics white paper Quiterian
Web analytics white paper QuiterianWeb analytics white paper Quiterian
Web analytics white paper Quiterian
Josep Arroyo
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 

Similar to Web usage-mining (20)

Web usage mining
Web usage miningWeb usage mining
Web usage mining
 
Implementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server MonitoringImplementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server Monitoring
 
C017231726
C017231726C017231726
C017231726
 
Automatic recommendation for online users using web usage mining
Automatic recommendation for online users using web usage miningAutomatic recommendation for online users using web usage mining
Automatic recommendation for online users using web usage mining
 
Automatic Recommendation for Online Users Using Web Usage Mining
Automatic Recommendation for Online Users Using Web Usage Mining Automatic Recommendation for Online Users Using Web Usage Mining
Automatic Recommendation for Online Users Using Web Usage Mining
 
Making Web Analytics actionable with Web Content Management
Making Web Analytics actionable with Web Content ManagementMaking Web Analytics actionable with Web Content Management
Making Web Analytics actionable with Web Content Management
 
Clickstream Analysis
Clickstream AnalysisClickstream Analysis
Clickstream Analysis
 
Pxc3893553
Pxc3893553Pxc3893553
Pxc3893553
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technology
 
Personal web usage mining
Personal web usage miningPersonal web usage mining
Personal web usage mining
 
Personal Web Usage Mining
Personal Web Usage MiningPersonal Web Usage Mining
Personal Web Usage Mining
 
Google Analytics tutorial by Jay Murphy
Google Analytics tutorial by Jay Murphy Google Analytics tutorial by Jay Murphy
Google Analytics tutorial by Jay Murphy
 
Web Data mining-A Research area in Web usage mining
Web Data mining-A Research area in Web usage miningWeb Data mining-A Research area in Web usage mining
Web Data mining-A Research area in Web usage mining
 
Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)
 
Web analytics white paper Quiterian
Web analytics white paper QuiterianWeb analytics white paper Quiterian
Web analytics white paper Quiterian
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 

Recently uploaded

Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 

Recently uploaded (20)

Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 

Web usage-mining

  • 1. Chapter 12: Web Usage Mining - An introduction Chapter written by Bamshad Mobasher Many slides are from a tutorial given by B. Berendt, B. Mobasher, M. Spiliopoulou
  • 2. Introduction  Web usage mining: automatic discovery of patterns in clickstreams and associated data collected or generated as a result of user interactions with one or more Web sites.  Goal: analyze the behavioral patterns and profiles of users interacting with a Web site.  The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common interests.
  • 3. Bing Liu 3 Introduction  Data in Web Usage Mining:  Web server logs  Site contents  Data about the visitors, gathered from external channels  Further application data  Not all these data are always available.  When they are, they must be integrated.  A large part of Web usage mining is about processing usage/ clickstream data.  After that various data mining algorithm can be applied.
  • 4. Web server logs Bing Liu 4 1 2006-02-01 00:08:43 1.2.3.4 - GET /classes/cs589/papers.html - 200 9221 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727) http://dataminingresources.blogspot.com/ 2 2006-02-01 00:08:46 1.2.3.4 - GET /classes/cs589/papers/cms-tai.pdf - 200 4096 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727) http://maya.cs.depaul.edu/~classes/cs589/papers.html 3 2006-02-01 08:01:28 2.3.4.5 - GET /classes/ds575/papers/hyperlink.pdf - 200 318814 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) http://www.google.com/search?hl=en&lr=&q=hyperlink+analysis+for+the+web+survey 4 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/announce.html - 200 3794 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1) http://maya.cs.depaul.edu/~classes/cs480/ 5 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/styles2.css - 200 1636 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1) http://maya.cs.depaul.edu/~classes/cs480/announce.html 6 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/header.gif - 200 6027 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1) http://maya.cs.depaul.edu/~classes/cs480/announce.html
  • 5. Web usage mining process Bing Liu 5
  • 6. Bing Liu 6 Data preparation
  • 7. Pre-processing of web usage data Bing Liu 7
  • 8. Data cleaning  Data cleaning  remove irrelevant references and fields in server logs  remove references due to spider navigation  remove erroneous references  add missing references due to caching (done after sessionization) Bing Liu 8
  • 9. Identify sessions (sessionization)  In Web usage analysis, these data are the sessions of the site visitors: the activities performed by a user from the moment she enters the site until the moment she leaves it.  Difficult to obtain reliable usage data due to proxy servers and anonymizers, dynamic IP addresses, missing references due to caching, and the inability of servers to distinguish among different visits. Bing Liu 9
  • 14. User identification: an example Bing Liu 14
  • 15. Pageview  A pageview is an aggregate representation of a collection of Web objects contributing to the display on a user’s browser resulting from a single user action (such as a click-through).  Conceptually, each pageview can be viewed as a collection of Web objects or resources representing a specific “user event,” e.g., reading an article, viewing a product page, or adding a product to the shopping cart. Bing Liu 15
  • 16. Path completion  Client- or proxy-side caching can often result in missing access references to those pages or objects that have been cached.  For instance,  if a user returns to a page A during the same session, the second access to A will likely result in viewing the previously downloaded version of A that was cached on the client-side, and therefore, no request is made to the server.  This results in the second reference to A not being recorded on the server logs. Bing Liu 16
  • 17. Missing references due to caching Bing Liu 17
  • 18. Path completion  The problem of inferring missing user references due to caching.  Effective path completion requires extensive knowledge of the link structure within the site  Referrer information in server logs can also be used in disambiguating the inferred paths.  Problem gets much more complicated in frame-based sites. Bing Liu 18
  • 19. Integrating with e-commerce events  Either product oriented or visit oriented  Used to track and analyze conversion of browsers to buyers.  Major difficulty for E-commerce events is defining and implementing the events for a site, however, in contrast to clickstream data, getting reliable preprocessed data is not a problem.  Another major challenge is the successful integration with clickstream data Bing Liu 19
  • 20. Product-Oriented Events  Product View  Occurs every time a product is displayed on a page view  Typical Types: Image, Link, Text  Product Click-through  Occurs every time a user “clicks” on a product to get more information Bing Liu 20
  • 21. Product-Oriented Events  Shopping Cart Changes  Shopping Cart Add or Remove  Shopping Cart Change - quantity or other feature (e.g. size) is changed  Product Buy or Bid  Separate buy event occurs for each product in the shopping cart  Auction sites can track bid events in addition to the product purchases Bing Liu 21
  • 22. Web usage mining process Bing Liu 22
  • 23. Integration with page content Bing Liu 23
  • 24. Integration with link structure Bing Liu 24
  • 26. Session analysis  Simplest form of analysis: examine individual or groups of server sessions and e- commerce data.  Advantages:  Gain insight into typical customer behaviors.  Trace specific problems with the site.  Drawbacks:  LOTS of data.  Difficult to generalize. Bing Liu 26
  • 27. Session analysis: aggregate reports Bing Liu 27
  • 31. Some usage mining applications Bing Liu 31
  • 34. Summary  Web usage mining has emerged as the essential tool for realizing more personalized, user-friendly and business-optimal Web services.  The key is to use the user-clickstream data for many mining purposes.  Traditionally, Web usage mining is used by e- commerce sites to organize their sites and to increase profits.  It is now also used by search engines to improve search quality and to evaluate search results, etc, and by many other applications. Bing Liu 34