Web Mining
Presented by:
Sarthak Kumar Sahoo
Computer Science & Engineering
Section: B
Regdno: 1501209160
Contents
• Introduction
• Information Discovered by Web Mining
• Steps in Web Mining
• Different types of Web Mining
• Web Usage Mining
• Web Structure Mining
• Web Structure Mining Terminologies
• Web Content Mining
• Different Methods Used in Web Mining
• Web Mining Applications
• Difference Between Web Mining & Data Mining
Introduction
• Web mining is the application of data mining techniques and algorithms to
extract information from the Web: web documents and services, web content,
hyperlinks and server logs.
• Its main goal is to find patterns in web data by collecting and analyzing
information, in order to gain insight into trends, the industry and users
in general.
• The primary data source is the World Wide Web.
• Three general classes of information can be discovered by web mining.
Information Discovered by Web Mining
• Web Activity: server logs and web browser activity tracking
• Web Graph: links between pages, people and other data
• Web Content: data found on web pages and inside documents
Steps in Content Web Mining
Web data -> Collect -> Parse -> Analyze -> Produce -> report, search index, etc.
• Collect: fetch the content from the web
• Parse: extract usable data from formatted data
• Analyze: tokenize, rate, classify, cluster, filter, sort, etc.
• Produce: turn the results of analysis into something useful
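The four steps can be sketched as a small pipeline. This is only an illustration using Python's standard library: the sample page, the function names and the choice of word counting as the analysis step are assumptions, not part of the original slides. A real collector would download pages instead of returning a fixed string.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stand-in for a fetched page (assumption, not from the slides).
SAMPLE_HTML = """
<html><head><title>Web mining</title><style>p {color: red}</style></head>
<body><h1>Web Mining</h1>
<p>Web mining applies data mining to web data.</p>
<p>Web data includes content, links and logs.</p>
</body></html>
"""


class TextExtractor(HTMLParser):
    """Parse step: keep visible text, skip script and style blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def collect():
    """Collect: a real crawler would fetch content from the web here."""
    return SAMPLE_HTML


def parse(html):
    """Parse: extract usable text from the formatted (HTML) data."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def analyze(text):
    """Analyze: tokenize into lowercase words and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def produce(counts, n=3):
    """Produce: turn the analysis into a small report of top terms."""
    return counts.most_common(n)


report = produce(analyze(parse(collect())))
print(report)
```

Each stage takes the previous stage's output, so any one of them can be swapped (for example, clustering instead of counting) without touching the others.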
Different types of Web Mining
Web mining is divided into three types:
• Web Usage Mining
• Web Content Mining
• Web Structure Mining
Web Usage Mining
• Web usage mining discovers interesting usage patterns in web data in order
to understand and better serve the needs of web-based applications.
• Usage data captures the identity and origin of web users along with their
browsing behaviour at a website.
Web Usage Mining classification according to usage data
• Web Server Data: data logged by the web server, such as IP address, page
reference and access time.
• Application Server Data: the ability to track various kinds of business
events and log them in application server logs.
• Application Level Data: new kinds of events can be defined in an
application, and logging can be turned on for them, generating histories of
these specially defined events.
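As an illustration of mining web server data, the sketch below pulls the IP address, page reference and access time out of access-log lines. It assumes the server writes the widely used Common Log Format; the sample lines, the regular expression and the names `parse_line`, `page_hits` and `pages_by_ip` are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: Common Log Format (host ident user [time] "request" status size).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Made-up sample lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '192.0.2.7 - - [10/Oct/2023:14:02:11 +0000] "GET /index.html HTTP/1.1" 200 2326',
    'malformed line',
]


def parse_line(line):
    """Return IP, page and access time for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "page": match.group("page"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
    }


records = [r for r in map(parse_line, SAMPLE_LOG) if r is not None]

# Usage pattern 1: how often each page is requested.
page_hits = Counter(r["page"] for r in records)

# Usage pattern 2: the sequence of pages requested from each IP address.
pages_by_ip = {}
for r in records:
    pages_by_ip.setdefault(r["ip"], []).append(r["page"])

print(page_hits.most_common())
print(pages_by_ip)
```

Grouping by IP address is only a rough proxy for a user session; real usage mining usually adds session timeouts or cookies to separate visitors.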
Web Structure Mining
• Web structure mining is the process of discovering structure information
from the web.
• It uses graph theory to analyze the node and connection structure of a
website.
• Web structure mining can be divided into 2 types:
  - Extracting patterns from hyperlinks
  - Mining the document structure: analysis of the tree-like structure of a
    page
(Figure: web documents connected by hyperlinks)
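Both kinds of structure mining can be illustrated on a single page. The sketch below, using only Python's standard `html.parser`, collects the hyperlinks of a page and measures how deeply its tags are nested as a crude view of the tree-like document structure. The sample page and the `StructureParser` class are hypothetical, and the depth count assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they do not add a tree level.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}

# Hypothetical sample page (assumption, not from the slides).
SAMPLE_PAGE = """
<html><body>
<div><p>See <a href="/a.html">A</a> and <a href="/b.html">B</a>.<br></p></div>
<a href="https://example.org/">External</a>
</body></html>
"""


class StructureParser(HTMLParser):
    """Collect hyperlinks and track the nesting depth of the tag tree."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


parser = StructureParser()
parser.feed(SAMPLE_PAGE)
parser.close()

print(parser.links)
print(parser.max_depth)
```

The extracted links are the raw material for hyperlink pattern mining, while the depth counter stands in for a fuller analysis of the document tree.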
Web Structure Mining Terminology
• Web graph: directed graph representing the web
• Node: a web page in the graph
• Edge: a hyperlink between pages
• In-degree: number of links pointing to a particular node
• Out-degree: number of links going out from a particular node
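These terms map directly onto a small data structure. The sketch below stores a hypothetical web graph as a dictionary from each page to the pages it links to, then computes in-degree and out-degree for every node. The page names and the `degrees` function are illustrative assumptions; repeated links between the same two pages are counted once.

```python
# Hypothetical web graph: page -> pages it links to (directed edges).
web_graph = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": [],
}


def degrees(graph):
    """Return (in_degree, out_degree) dictionaries for a directed graph."""
    out_degree = {node: len(set(targets)) for node, targets in graph.items()}
    in_degree = {node: 0 for node in graph}
    for node, targets in graph.items():
        for target in set(targets):
            # A target may be a page that has no entry of its own in the graph.
            in_degree[target] = in_degree.get(target, 0) + 1
            out_degree.setdefault(target, 0)
    return in_degree, out_degree


in_degree, out_degree = degrees(web_graph)
print(in_degree)
print(out_degree)
```

A page with high in-degree is linked to by many others, which is the intuition behind link-based ranking methods.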
Web Content Mining
• Web content mining is the mining, extraction and integration of useful data,
information and knowledge from web page content.
• The contents of web pages are mostly text, images, video and audio files.
• For information retrieval purposes, techniques from Natural Language
Processing and intelligent web agents are used.
• The agent-based approach to web mining leads to the development of
sophisticated AI systems.
• Web content mining can be seen from 2 points of view: the information
retrieval view and the database view.
• In the information retrieval view, research works with unstructured data and
semi-structured data (HTML structure & hyperlink structure).
Web Content Mining (contd)
• In the database view, the aim is better information management and
querying on the web, so the mining tries to infer the structure of a
website in order to transform it into a database.
• A feature selection approach can be applied with the help of a
multi-scanning approach.
Different Methods used in Web Mining
• Pattern analysis
• Classification accuracy
• Information Score
• Information gain
• Cross entropy
• Mutual information
• Odds Ratio
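Several of these measures score how well a term separates one class of documents from the rest. The sketch below computes information gain, pointwise mutual information and the odds ratio from a 2x2 table of document counts. The slides do not give formulas, so these are common textbook formulations chosen as an assumption; the function names, the argument order and the 0.5 smoothing constant are likewise illustrative.

```python
import math

# Counts of documents for one term and one class (hypothetical layout):
#   n11: in class, contains term      n10: not in class, contains term
#   n01: in class, lacks term         n00: not in class, lacks term


def entropy(probs):
    """Shannon entropy in bits, ignoring zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(n11, n10, n01, n00):
    """Reduction in class entropy from knowing whether the term is present."""
    n = n11 + n10 + n01 + n00
    h_class = entropy([(n11 + n01) / n, (n10 + n00) / n])
    h_conditional = 0.0
    for in_class, out_class in ((n11, n10), (n01, n00)):
        total = in_class + out_class
        if total:
            h_conditional += total / n * entropy([in_class / total, out_class / total])
    return h_class - h_conditional


def pointwise_mutual_information(n11, n10, n01, n00):
    """log2 of P(term, class) / (P(term) * P(class)); requires n11 > 0."""
    n = n11 + n10 + n01 + n00
    return math.log2(n11 * n / ((n11 + n10) * (n11 + n01)))


def odds_ratio(n11, n10, n01, n00, smoothing=0.5):
    """Odds of the term inside the class divided by the odds outside it."""
    a, b, c, d = (x + smoothing for x in (n11, n10, n01, n00))
    return (a * d) / (b * c)


# A term that appears in every in-class document and in no other document.
print(information_gain(5, 0, 0, 5))
# A term that is spread evenly and tells nothing about the class.
print(information_gain(5, 5, 5, 5))
```

Ranking terms by any of these scores and keeping the top ones is a simple form of feature selection for web content classification.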
Web Mining Applications
• E-Commerce
• Information Filtering
• Fraud Detection
• Education & Research
Difference between Web Mining & Data Mining
• Data mining: in the traditional data mining approach, processing 1 million
records from a database is a large job.
  Web mining: here even 10 million pages wouldn’t be a big number.
• Data mining: when mining corporate information, the data is private and
often requires access rights to read.
  Web mining: the data is public and rarely requires access rights.
• Data mining: a traditional data mining task gets information from a
database, which provides some level of explicit structure.
  Web mining: a typical web mining task processes unstructured or
semi-structured data from web pages. Even when the underlying information
for web pages comes from a database, this is often obscured by HTML markup.
THANK YOU.
