Web Mining
Presented by:
Sarthak Kumar Sahoo
Computer Science & Engineering
Section: B
Regdno: 1501209160
Contents
• Introduction
• Information Discovered by Web Mining
• Steps in Web Mining
• Different types of Web Mining
• Web Usage Mining
• Web Structure Mining
• Web Structure Mining Terminologies
• Web Content Mining
• Different Methods Used in Web Mining
• Web Mining Applications
• Difference Between Web mining & Data Mining
Introduction
• Web mining is the application of data mining techniques and
algorithms to extract information directly from the Web: from web
documents and services, web content, hyperlinks and server logs.
• The main goal of Web mining is to find patterns in web data by
collecting and analyzing information in order to gain insight into
trends, the industry and users in general.
• The primary data source is the World Wide Web.
• Three general classes of information can be discovered by Web
mining.
Information Discovered by Web Mining
• Web Activity: server logs and web browser activity tracking
• Web Graph: links between pages, people and other data
• Web Content: data found on web pages and inside documents
Steps in Content Web Mining
Web data → Collect → Parse → Analyze → Produce → Report, search index, etc.
• Collect: fetch the content from the web
• Parse: extract usable data from formatted data
• Analyze: tokenize, rate, classify, cluster, filter, sort, etc.
• Produce: turn the result of the analysis into something useful
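The four steps above (collect, parse, analyze, produce) can be sketched as a tiny pipeline. This is only an illustrative sketch using the Python standard library: the function names, the TextExtractor class and the hard-coded sample page are assumptions made for the example, and a real collector would fetch pages over HTTP instead.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Parse step helper: keep visible text, skip <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def collect():
    # Collect: hypothetical sample page, hard-coded so the sketch runs offline.
    return ("<html><body><h1>Web mining</h1>"
            "<p>Mining the web for web data.</p></body></html>")


def parse(html):
    # Parse: extract usable text from the formatted (HTML) data.
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def analyze(text):
    # Analyze: tokenize and count; rating, classifying or clustering
    # would slot in at this step.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def produce(counts, top_n=3):
    # Produce: turn the analysis into a small report (most frequent terms).
    return counts.most_common(top_n)


report = produce(analyze(parse(collect())))
print(report)
```

Keeping each step as a separate function mirrors the slide: any stage can be swapped (for example, a different analysis) without touching the others.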
Different types of Web Mining
Web Mining
• Web Usage Mining
• Web Content Mining
• Web Structure Mining
Web Usage Mining
• Web usage mining discovers interesting usage patterns from Web data in order
to understand and better serve the needs of web-based applications.
• Usage data captures the identity and origin of web users along with their
browsing behaviour at a website.
Web Usage Mining classification according to usage data
• Web Server Data: data collected by the web server, such as IP address, page
reference and access time.
• Application Server Data: the ability to track various kinds of business events
and log them in application server logs.
• Application Level Data: new kinds of events can be defined in an application,
and logging can be turned on for them, generating histories of these specially
defined events.
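As a rough illustration of mining web server data, the sketch below pulls the IP address, page reference and access time out of log lines. It assumes the server writes the widely used Common Log Format; the sample lines, the LOG_PATTERN regular expression and the parse_line helper are made up for the example.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: Common Log Format, e.g.
# host ident user [time] "METHOD /page PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical log lines (documentation-range IP addresses).
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.10 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '192.0.2.77 - - [10/Oct/2023:14:02:11 +0000] "GET /index.html HTTP/1.1" 200 2326',
    'malformed line',
]


def parse_line(line):
    """Return a dict with ip, time, method, page, status, or None if unparseable."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


records = [r for r in map(parse_line, SAMPLE_LOG) if r is not None]

# Simple usage patterns: hits per page and the page sequence per visitor IP.
page_hits = Counter(r["page"] for r in records)
pages_by_ip = {}
for r in records:
    pages_by_ip.setdefault(r["ip"], []).append(r["page"])

print(page_hits.most_common())
print(pages_by_ip)
```

Grouping by IP address is only a crude stand-in for identifying a user; real usage mining usually needs session reconstruction on top of this.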
Web Structure Mining
• Web structure mining is the process of discovering structure information from the
web.
• It uses graph theory to analyze the node and connection structure of a website.
• Web structure mining can be divided into 2 types:
  • Extracting patterns from hyperlinks
  • Mining the document structure: analysis of the tree-like structure of a page.
(Diagram: web documents connected by hyperlinks)
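Both kinds of structure mining can be illustrated on a single page. The sketch below, which assumes reasonably well-formed HTML, collects the hyperlinks in a page and measures how deeply its tags are nested. The StructureParser class, the VOID_TAGS set and the sample PAGE are hypothetical names introduced for the example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class StructureParser(HTMLParser):
    """Collect hyperlinks and track nesting depth of the tag tree."""

    def __init__(self):
        super().__init__()
        self.links = []      # href values of <a> tags (hyperlink structure)
        self.depth = 0       # current nesting depth
        self.max_depth = 0   # deepest nesting seen (document structure)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


# Hypothetical sample page.
PAGE = """
<html><body>
  <div><p>See <a href="/about.html">about</a> and
  <a href="https://example.com/">example</a>.</p></div>
  <img src="logo.png">
</body></html>
"""

parser = StructureParser()
parser.feed(PAGE)
parser.close()

print(parser.links)      # outgoing hyperlinks of this page
print(parser.max_depth)  # depth of the tree-like tag structure
```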
Web Structure Mining Terminology
• Web graph: a directed graph representing the web
• Node: a web page in the graph
• Edge: a hyperlink between pages
• In-degree: the number of links pointing to a particular node
• Out-degree: the number of links leaving a particular node
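To make the terminology concrete, the following sketch computes the in-degree and out-degree of every node in a small web graph. The page names and the edge list are invented for the example; each pair (source, target) stands for one hyperlink from the source page to the target page.

```python
from collections import defaultdict

# Hypothetical web graph: each tuple is a hyperlink (source page, target page).
edges = [
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("C", "A"),
    ("D", "C"),
]


def degrees(edge_list):
    """Return (in_degree, out_degree) dicts covering every node in the graph."""
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for source, target in edge_list:
        out_degree[source] += 1
        in_degree[target] += 1
        # Make sure both endpoints appear in both dicts, even with a count of 0.
        in_degree[source] += 0
        out_degree[target] += 0
    return dict(in_degree), dict(out_degree)


in_deg, out_deg = degrees(edges)
print(in_deg)
print(out_deg)
```

In this toy graph page C has the highest in-degree, which is the kind of signal link-analysis methods build on.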
Web Content Mining
• Web content mining is the mining, extraction and integration of useful data,
information and knowledge from web page content.
• The contents of web pages are mostly text, images, video and audio files.
• For information retrieval purposes, techniques from Natural Language Processing and
intelligent web agents are used.
• The agent-based approach to web mining leads to the development of sophisticated
AI systems.
• Web content mining can be seen from 2 points of view: the information retrieval
view and the database view.
• In the information retrieval view, the research works with unstructured data and
semi-structured data (HTML structure and hyperlink structure).
Web Content Mining (contd.)
• In the database view, the mining tries to infer the structure of a
website so that the site can be treated as a database, which allows
better information management and querying on the web.
• Feature selection can be carried out with the help of a
multi-scanning approach.
Different Methods used in Web Mining
• Pattern analysis
• Classification accuracy
• Information Score
• Information gain
• Cross entropy
• Mutual information
• Odds Ratio
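Measures such as those listed above are commonly used to score how useful a term is for telling classes of documents apart. As one illustration, the sketch below computes information gain for the presence or absence of a term. The tiny document collection, the labels and the function names are assumptions made up for the example, not data from the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Reduction in label entropy after splitting on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    # Weighted entropy of the two groups; empty groups contribute nothing.
    remainder = sum(len(part) / total * entropy(part)
                    for part in (with_term, without_term) if part)
    return entropy(labels) - remainder


# Hypothetical pages, each reduced to a set of tokens, with a class label.
docs = [
    {"buy", "cheap", "offer"},
    {"cheap", "deal"},
    {"meeting", "agenda"},
    {"agenda", "notes"},
]
labels = ["spam", "spam", "ham", "ham"]

for term in ("cheap", "offer", "notes"):
    print(term, round(information_gain(docs, labels, term), 4))
```

A term that separates the classes perfectly (here "cheap") gets the highest score, while a term that appears in only one document gains less.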
Web Mining Applications
• E-Commerce
• Information Filtering
• Fraud Detection
• Education & Research
Difference between Web Mining & Data Mining
• Data mining: in a traditional data mining approach, processing 1 million
records from a database is a large job.
  Web mining: even 10 million pages would not be a big number.
• Data mining: when mining corporate information, the data is private and
often requires access rights to read.
  Web mining: the data is public and rarely requires access rights.
• Data mining: a traditional task gets information from a database, which
provides some level of explicit structure.
  Web mining: a typical task processes unstructured or semi-structured data
from web pages. Even when the underlying information for web pages comes
from a database, this is often obscured by HTML markup.
THANK YOU.
