Web Usage Mining
UNDER THE GUIDANCE OF Ms. Reshma.R.Owhal
Dr. S. F. Sayyad ME (Computer)
Roll No: 17MCO004
• Introduction
• Data Collection and Pre-Processing
• Data Modeling for Web Usage Mining
• Discovery and Analysis of Web Usage Patterns
• Conclusions
• References
• Web usage mining
– can be broadly defined as the discovery and analysis of useful information from the WWW;
– more specifically, the automatic discovery of patterns in clickstreams and associated data collected or generated as a result of user interactions with one or more Web sites.
• Goal: analyze the behavioral patterns and profiles of users interacting with a Web site.
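• Creating a suitable target data set is particularly important in Web usage mining because of the characteristics of clickstream data.
• This process is critical to the successful extraction of useful patterns from the data.
• It may involve pre-processing the original data, integrating data from multiple sources, and transforming the integrated data into a form suitable for specific data mining operations; collectively, this is known as data preparation.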
• Data cleaning
– remove irrelevant references and fields in server logs
– remove references due to spider/robot navigation
– add missing references due to caching (done after sessionization)
• Data fusion/integration
– synchronize data from multiple server logs
– integrate e-commerce and application server data
– integrate meta-data (e.g., content labels)
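The data cleaning steps above can be illustrated with a minimal Python sketch. It assumes the server writes the NCSA Combined Log Format (a common Apache default, assumed here rather than stated in the slides); the file-suffix and robot-keyword lists are hypothetical examples, not a complete robot-detection method.

```python
import re
from dataclasses import dataclass

# Assumption: log lines follow the NCSA Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical rules: embedded objects are not pageviews of interest,
# and these keywords are only a crude robot filter.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_KEYWORDS = ("bot", "crawler", "spider", "slurp")


@dataclass
class LogEntry:
    host: str
    time: str
    url: str
    referrer: str
    agent: str


def clean(lines):
    """Keep only successful GET requests for pages made by non-robot agents."""
    entries = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # malformed or truncated line
        path = m["url"].split("?", 1)[0].lower()
        agent = m["agent"].lower()
        if m["method"] != "GET" or int(m["status"]) >= 400:
            continue  # irrelevant request type or failed request
        if path.endswith(IRRELEVANT_SUFFIXES):
            continue  # image/style/script file, not a pageview
        if path == "/robots.txt" or any(k in agent for k in ROBOT_KEYWORDS):
            continue  # spider/robot navigation
        entries.append(LogEntry(m["host"], m["time"], m["url"], m["referrer"], m["agent"]))
    return entries
```

Keyword matching on the user agent only catches robots that identify themselves; other robots must be detected from navigation behavior.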
• Data transformation
– user identification
– sessionization
– pageview identification
  ◦ a pageview is a set of page files and associated objects that contribute to a single display in a Web browser
• Data reduction
– sampling and dimensionality reduction (ignoring certain pageviews/items)
• Identifying user transactions
– i.e., sets or sequences of pageviews, possibly with associated weights
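To make "transactions with associated weights" and data reduction concrete, here is a hedged Python sketch. It assumes each session is already a time-ordered list of (seconds, pageview) pairs and weights each pageview by the time until the next request; that weighting and the `min_support` cutoff are illustrative assumptions, not the slides' prescription.

```python
from collections import Counter


def session_to_transaction(session):
    """Turn a time-ordered session [(seconds, pageview), ...] into a
    {pageview: weight} transaction, weighting by time spent (dwell time).

    The dwell time of the last pageview cannot be observed in the log;
    this sketch assigns it 0, which is only one possible convention.
    """
    weights = {}
    for i, (ts, page) in enumerate(session):
        dwell = session[i + 1][0] - ts if i + 1 < len(session) else 0
        weights[page] = weights.get(page, 0) + dwell
    return weights


def reduce_pageviews(transactions, min_support):
    """Data reduction: ignore pageviews that occur in fewer than
    min_support transactions, then build a transaction-by-pageview matrix."""
    counts = Counter(page for t in transactions for page in t)
    kept = sorted(page for page, c in counts.items() if c >= min_support)
    matrix = [[t.get(page, 0) for page in kept] for t in transactions]
    return kept, matrix
```

Because the last page of a session gets weight 0, it looks like an unvisited page in the matrix; binary (visited/not visited) weights avoid that ambiguity at the cost of losing dwell-time information.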
Sessionization (identifying sessions)
– Sessionization is the process of segmenting the activity record of each user into sessions, each representing a single visit to the site.
– The goal of a sessionization heuristic is to reconstruct, from the clickstream data, the actual sequence of actions performed by one user during one visit to the site.
It is difficult to obtain reliable usage data due to
– proxy servers,
– dynamic IP addresses, and
– the inability of servers to distinguish among different visits.
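Reconstructing one visit per session is often approximated with a time-oriented heuristic. The sketch below assumes users have already been identified and starts a new session after 30 minutes of inactivity; 30 minutes is a commonly used default, treated here as an assumption rather than something the slides specify.

```python
from datetime import datetime, timedelta


def sessionize(requests: list[tuple[str, datetime, str]],
               timeout: timedelta = timedelta(minutes=30)):
    """Split each user's clickstream into sessions.

    requests: (user_id, timestamp, url) tuples, in any order.
    A new session starts whenever the gap between two consecutive
    requests from the same user exceeds `timeout`.
    Returns a list of (user_id, [(timestamp, url), ...]) pairs.
    """
    by_user = {}
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        by_user.setdefault(user, []).append((ts, url))

    sessions = []
    for user, clicks in by_user.items():
        current = [clicks[0]]
        for prev, cur in zip(clicks, clicks[1:]):
            if cur[0] - prev[0] > timeout:
                sessions.append((user, current))  # gap too long: close the session
                current = []
            current.append(cur)
        sessions.append((user, current))
    return sessions
```

A timeout cannot separate two visits that happen close together, nor merge one long visit with a pause, which is part of why usage data remains hard to make fully reliable.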
Pageview identification
– Depends on the intra-page structure of the site.
– Identifies the collection of Web files representing a specific “user event” corresponding to a clickthrough (e.g., viewing a product page or adding a product to a shopping cart).
– Another example is the purchase of a product on an e-commerce site.
User Identification
– The analysis of Web usage does not require knowledge of a user’s identity. However, it is necessary to distinguish among different users.
– Since a user may visit a site more than once, the server logs record multiple sessions for each user.
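Distinguishing users without knowing who they are can be sketched as below. The sketch assumes hypothetical `login` and `cookie` fields and, when neither is present, falls back to the combination of IP address and user agent, a common heuristic that can merge users behind a proxy or split a user whose IP address changes.

```python
def user_key(entry):
    """Best available identifier for the user who made a request.

    entry: dict with 'host' and 'agent', and optionally 'login' or
    'cookie' (hypothetical field names).
    """
    if entry.get("login"):
        return ("login", entry["login"])
    if entry.get("cookie"):
        return ("cookie", entry["cookie"])
    return ("ip+agent", entry["host"], entry["agent"])


def assign_user_ids(entries):
    """Map each request to an anonymous ID such as 'user1'; no real identity is needed."""
    ids = {}
    labels = []
    for entry in entries:
        key = user_key(entry)
        if key not in ids:
            ids[key] = f"user{len(ids) + 1}"
        labels.append(ids[key])
    return labels
```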
Path completion
– Client- or proxy-side caching can often result in missing access references to pages or objects that have been cached.
– For instance, if a user goes back to a page A during the same session, the second access to A will likely display the previously downloaded version of A cached on the client side, so no request is made to the server.
– As a result, the second reference to A is not recorded in the server logs.
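One way to restore such missing references is sketched here, assuming each logged request carries a referrer expressed with the same page identifiers as the URLs. If a request's referrer is an earlier page rather than the page just viewed, the user is assumed to have pressed "back" through cached pages, which are re-inserted into the path.

```python
def complete_path(session):
    """Re-insert pages the user most likely revisited from cache.

    session: time-ordered list of (url, referrer) pairs from one session.
    """
    path = []
    for url, referrer in session:
        if path and referrer != path[-1] and referrer in path:
            # position of the most recent occurrence of the referrer
            idx = len(path) - 1 - path[::-1].index(referrer)
            # pages passed through via "back", ending at the referrer itself
            path.extend(reversed(path[idx:-1]))
        path.append(url)
    return path
```

For example, requests A, B (referrer A), C (referrer B), D (referrer A) become the path A → B → C → B → A → D, because reaching D from A implies backtracking through the cached B and A.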
• The discovered patterns are usually represented as
– collections of pages, objects, or resources that are frequently accessed by groups of users with common interests.
• Decision Trees
◦ a flowchart of questions leading to a decision
◦ Ex: car-buying decision tree
• Path Analysis
◦ Uses a graph model
◦ Provides insight into navigational problems
◦ Examples of information discovered by path analysis:
– 78%: “company” → “what’s new” → “sample” → “order”
– 60% of visitors left the site after four or fewer page references
=> the most important information should be within the first four pages from the site’s entry points.
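The graph model behind path analysis can be sketched in a few lines of Python. Sessions are assumed to be lists of page names (the names in the usage check are hypothetical); the functions count page-to-page transitions and compute the kinds of percentages quoted above.

```python
from collections import Counter


def transition_graph(sessions):
    """Weighted directed graph of navigation: (from_page, to_page) -> count."""
    edges = Counter()
    for pages in sessions:
        edges.update(zip(pages, pages[1:]))
    return edges


def path_frequency(sessions, path):
    """Fraction of sessions that contain `path` as consecutive pageviews."""
    k = len(path)
    hits = sum(
        any(tuple(pages[i:i + k]) == tuple(path) for i in range(len(pages) - k + 1))
        for pages in sessions
    )
    return hits / len(sessions)


def share_leaving_early(sessions, max_pages=4):
    """Fraction of sessions that end after max_pages or fewer page references."""
    return sum(len(pages) <= max_pages for pages in sessions) / len(sessions)
```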
• Grouping
◦ Groups similar information to help draw higher-level conclusions
◦ Ex: all URLs containing the word “Yahoo”…
• Filtering
◦ Allows answering specific questions such as:
– How many visitors came to the site this week?
• Cookies
◦ An ID randomly assigned by the Web server to the browser
◦ Cookies are beneficial to both Web site developers and visitors
◦ The cookie field in the log file can be used by Web traffic analysis software to track repeat visitors → loyal customers.
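Filtering and cookie-based tracking can be illustrated together. This sketch assumes visits have already been reduced to (cookie ID, visit date) pairs; counting a cookie seen on more than one day as a repeat visitor is a hypothetical criterion chosen for illustration.

```python
from datetime import date


def visitors_between(visits: list[tuple[str, date]], start: date, end: date) -> int:
    """Filtering: number of distinct visitors (cookie IDs) with start <= visit date < end."""
    return len({cookie for cookie, day in visits if start <= day < end})


def repeat_visitors(visits: list[tuple[str, date]]) -> list[str]:
    """Cookie IDs seen on more than one distinct day."""
    days = {}
    for cookie, day in visits:
        days.setdefault(cookie, set()).add(day)
    return sorted(cookie for cookie, seen in days.items() if len(seen) > 1)
```

Because cookies identify browsers rather than people, a user who clears cookies or switches devices is counted as a new visitor.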
• Association Rules
◦ help find spending patterns on related products
◦ e.g., 30% of clients who accessed /company/products/bread.html also accessed /company/products/milk.htm.
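A statement such as "30% of clients who accessed X also accessed Y" is the confidence of the rule X ⇒ Y. The sketch below computes support and confidence for single-page rules only; it is a simplified stand-in for a full frequent-itemset algorithm such as Apriori, and the thresholds are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(transactions, min_support, min_confidence):
    """Mine rules X => Y between single pages.

    support(X => Y)    = fraction of transactions containing both X and Y
    confidence(X => Y) = count(X and Y) / count(X)
    Returns sorted (X, Y, support, confidence) tuples.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)
```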
• Sequential Patterns
◦ help find inter-transaction patterns
◦ e.g., 50% of clients who bought items in /pcworld/computers/ also bought items in /pcworld/accessories/ within 15 days.
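The time-constrained pattern above can be checked directly for one candidate sequence. This sketch assumes purchase histories keyed by a hypothetical customer ID and matches URL prefixes; it only verifies a given two-step pattern rather than discovering all frequent sequences as algorithms such as GSP do.

```python
from datetime import date, timedelta


def followed_within(purchases: dict[str, list[tuple[date, str]]],
                    first_prefix: str, then_prefix: str,
                    window: timedelta = timedelta(days=15)) -> float:
    """Fraction of customers with a purchase under first_prefix who also
    bought under then_prefix on the same day or within `window` afterwards."""
    eligible = 0
    matched = 0
    for events in purchases.values():
        firsts = [d for d, url in events if url.startswith(first_prefix)]
        if not firsts:
            continue  # customer never bought in the first category
        eligible += 1
        thens = [d for d, url in events if url.startswith(then_prefix)]
        if any(f <= t <= f + window for f in firsts for t in thens):
            matched += 1
    return matched / eligible if eligible else 0.0
```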
• Clustering
◦ Identifies visitors with common characteristics based on visitors’ profiles
◦ One straightforward approach to creating an aggregate view of each cluster is to compute the cluster’s centroid.
◦ e.g., 50% of clients who applied for a Discover Platinum card via /discovercard/customerService/newcard were in the 25–35 age group, with annual incomes between $40,000 and $50,000.
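The centroid approach can be sketched as follows, assuming the clustering itself (for example, k-means) has already been run and each session in the cluster is a pageview-weight vector. The `min_weight` cutoff for keeping a pageview in the aggregate profile is a hypothetical parameter.

```python
def centroid(vectors):
    """Component-wise mean of equal-length pageview-weight vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def aggregate_profile(pageviews, cluster_vectors, min_weight=0.5):
    """Summarize a cluster by its centroid, keeping only pageviews whose
    mean weight reaches min_weight."""
    center = centroid(cluster_vectors)
    return {page: w for page, w in zip(pageviews, center) if w >= min_weight}
```

With binary vectors, each centroid component is simply the fraction of the cluster's sessions that visited that page.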
• Web mining supports ongoing, continuous improvement for e-businesses.
• Mining Web usage data to find patterns is a growing area as Web-based applications grow.
• Web usage data can be used to better understand how a site is used, and this knowledge can be applied to serve users better.
• Web usage patterns and data mining can be the basis for a great deal of future research.
• Bamshad Mobasher, “Web Usage Mining,” chapter in Bing Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer.
• Roopa Datla and Jinguang Liu, “Web Usage Mining: What, Why, hoW” (presentation).
• J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data,” SIGKDD Explorations, Vol. 1, Issue 2, 2000.
• Qiaoyuan Jiang, “Web Usage Mining: Processes and Applications,” CSE 8331, November 24, 2003.
Thank you…..
