WEB MINING
Shree Ram Rauniyar (072-BCT-536)
Suman Kharti (072-BCT-542)
Ujjwal Gewali (072-BCT-546)
Vijay Yadav (072-BCT-547)
What is data mining?
Data mining is the process of extracting useful information and patterns from a larger set of raw data.
What is web mining?
Web mining is the application of data mining techniques to find information patterns in web data.
TYPES OF WEB MINING
1. Web usage mining
2. Web content mining
3. Web structure mining
WEB USAGE MINING
Introduction
▰ Discovers interesting usage patterns from web data.
▰ Captures the identity or origin of web users along with their browsing behavior.
▰ Web server data: IP address, page reference, access time, etc.
▰ Application server data: significant in e-commerce applications, e.g. tracking and logging business events.
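As a minimal sketch of how the web server data above (IP address, page reference, access time) can be turned into usage patterns, the Python below parses a few log lines and counts page hits and per-visitor click paths. The log lines are made-up samples in the Common Log Format, and the names LOG_PATTERN, SAMPLE_LOG, parse_log and summarize are illustrative assumptions, not part of the original slides.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed input format: Common Log Format, as written by many web servers.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical sample data (documentation-only IP addresses).
SAMPLE_LOG = """\
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 1500
198.51.100.7 - - [10/Oct/2023:14:02:11 +0000] "GET /index.html HTTP/1.1" 200 2326
"""


def parse_log(text):
    """Extract IP address, page reference and access time from each log line."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        records.append({
            "ip": match["ip"],
            "page": match["page"],
            "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        })
    return records


def summarize(records):
    """Return (hits per page, ordered list of pages visited per IP)."""
    page_hits = Counter(r["page"] for r in records)
    paths = defaultdict(list)
    for r in sorted(records, key=lambda r: r["time"]):
        paths[r["ip"]].append(r["page"])
    return page_hits, dict(paths)


records = parse_log(SAMPLE_LOG)
page_hits, paths = summarize(records)
print(page_hits.most_common())
print(paths)
```

Real usage mining would add session splitting and bot filtering on top of this; the sketch only shows the first step from raw log to countable records.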
Advantages of Web Usage Mining
▰ Target potential customers.
▰ Enhance the web surfing experience through personalization.
▰ Identify potential advertisement locations.
▰ Build better customer relationships.
▰ Build a strategy for digital marketing.
▰ Identify criminal activity.
Disadvantages of Web Usage Mining
▰ Invasion of privacy.
▰ Trading of personal data.
Applications
▰ E-banking
▰ Search engines
▰ Online auctions
▰ E-learning
▰ E-commerce
How much is your data worth?
▰ $84 - $251 per email address.
▰ Answering a survey: $1 - $5
▰ Providing feedback over a period of 2-4 weeks on usage of a product: $50 - $100
▰ Allowing an insurance company to track your speed, location and acceleration: $100 - $300, depending on the car
▰ Facebook earns $45 - $190 and Google earns $5000 per user per year.
How much is your data worth? (continued)
In the US, this $5000 means
▰ 625 beers per year
▰ Or 352 pizzas (and yes, we are talking about the good stuff)
So this is the price we unknowingly pay each year for using Google.
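As a quick check of the arithmetic, the short Python sketch below works out the unit prices implied by the slide's own figures. The variable names are illustrative, and the prices are simply what the numbers above imply, not independent data.

```python
# Figures taken from the slide above.
revenue_per_user = 5000  # US dollars per user per year
beers_per_year = 625
pizzas_per_year = 352

# Unit prices implied by those figures.
price_per_beer = revenue_per_user / beers_per_year
price_per_pizza = revenue_per_user / pizzas_per_year

print(f"Implied price per beer:  ${price_per_beer:.2f}")
print(f"Implied price per pizza: ${price_per_pizza:.2f}")
```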
Some companies making billions through web usage mining
PRACTICAL EXAMPLE of web usage mining
[Figure: usage according to country (all data taken from YouTube)]
[Figure: usage according to age group]
[Figure: usage according to OS and device type]
[Figure: geographies, traffic sources, gender]
WEB STRUCTURE MINING
Introduction
▰ Discovers the link structure formed by hyperlinks.
▰ Mines structure, i.e. the links and graphs of the web.
▰ Identifies whether web pages are linked by information or by a direct link connection.
▰ The purpose is to produce a structural summary of websites and similar web pages.
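A minimal sketch of what "mining the link structure" can look like in practice: the Python below extracts the hyperlinks from a few in-memory HTML pages and builds a link graph (page to list of linked pages). The pages in SAMPLE_PAGES are made up, and LinkExtractor and build_link_graph are illustrative names; a real system would fetch pages with a crawler and resolve relative URLs.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_link_graph(pages):
    """Map each page URL to the list of known pages it links to.

    `pages` maps URL -> HTML text. Links to pages outside `pages`
    are dropped so the graph only describes the site itself.
    """
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        graph[url] = [link for link in parser.links if link in pages]
    return graph


# Hypothetical three-page site.
SAMPLE_PAGES = {
    "/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "/b": '<a href="/c">C</a>',
    "/c": '<a href="/a">A</a> <a href="http://example.com/">external</a>',
}

print(build_link_graph(SAMPLE_PAGES))
```

A graph in this form (a dictionary of outgoing links) is the usual input for link-analysis algorithms.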
Techniques for web structure mining
▰ PageRank
▰ CLEVER
Helps in
▰ Quality of pages
▰ Interesting web structure
▰ Web page classification
▰ Finding related pages
▰ Detection of duplicate pages
PageRank
▰ A mathematical formula that judges the value of pages.
▰ A probability distribution representing the likelihood that a person randomly clicking on links will arrive at any particular page.
▰ Developed by Larry Page and Sergey Brin at Stanford University; first published in 1998.
How does it work?
PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
where
▰ d = damping factor (usually 0.85)
▰ T1…Tn = the pages that point to page A
▰ C(A) = number of links going out of page A
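The formula can be applied repeatedly until the values settle. The Python below is a minimal sketch of that iteration, assuming a small made-up graph and a fixed number of rounds; the function name pagerank and the sample graph are illustrative, not taken from the slides. With this form of the formula the scores sum to the number of pages when every page has at least one outgoing link; dividing by the number of pages gives a probability distribution.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    Pages without outgoing links keep their own rank but pass none on
    (a simplification; production systems redistribute that rank).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    out_count = {p: len(links.get(p, [])) for p in pages}
    pr = {p: 1.0 for p in pages}  # initial guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            inbound = sum(
                pr[src] / out_count[src]
                for src in pages
                if page in links.get(src, [])
            )
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
sample_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(sample_links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this graph C ends up ranked highest because it receives links from both other pages, and A follows because it receives all of C's rank.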
Three factors used in PageRank
▰ The quantity and quality of inbound linking pages.
▰ The number of outbound links on each linking page.
▰ The PageRank of each linking page.
CLEVER
▰ System developed by IBM.
▰ Uses the HITS (Hyperlink-Induced Topic Search) algorithm.
▰ Authoritative pages: contain valuable information on a topic.
▰ Hub pages: contain links to highly important (authoritative) pages.
Hubs and Authorities
▰ A good hub page is one that points to many good authority pages.
▰ A good authority page is one that is pointed to by many good hub pages.
▰ Many in-links: authority
▰ Many out-links: hub
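The mutual definition of hubs and authorities can be computed by alternating two updates. The Python below is a minimal sketch of a HITS-style iteration, assuming a small made-up graph; the function name hits and the page names are illustrative, and the query-specific subgraph that full HITS starts from is left out.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) scores for a graph given as page -> linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages pointing to this page.
        auth = {
            p: sum(hub[src] for src in pages if p in links.get(src, ()))
            for p in pages
        }
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of authority scores of the pages this page points to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: "directory" links out the most, "docs" is linked to the most.
sample_graph = {
    "portal": ["docs", "news"],
    "directory": ["docs", "news", "blog"],
    "blog": ["docs"],
}
hub_scores, auth_scores = hits(sample_graph)
print("best hub:", max(hub_scores, key=hub_scores.get))
print("best authority:", max(auth_scores, key=auth_scores.get))
```

Normalizing after each update keeps the scores from growing without bound; only their relative order matters.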
PRACTICAL EXAMPLE of web structure mining
Challenges
▰ The web is too huge for finding relevant information easily.
▰ Complexity of web pages: support for multimedia files, not just text search.
▰ Diversity in usage.
▰ Relevancy of information: read tags? titles? URLs?
▰ Complex queries and page indexing.
A key problem in information retrieval is how to improve the correlation of information.
Pros and cons
Pros
▰ Commercial applications
▰ Security
▰ Efficient search
▰ Check authenticity
Cons
▰ Privacy issues
▰ Trading of personal data
▰ Denial of services based on personal attributes