A general overview of web mining: its methods and terminology, its uses, and how it differs from data mining.
2. Contents
• Introduction
• Information Discovered by Web Mining
• Steps in Web Mining
• Different types of Web Mining
• Web Usage Mining
• Web Structure Mining
• Web Structure Mining Terminologies
• Web Content Mining
• Different Methods Used in Web Mining
• Web Mining Applications
• Difference Between Web mining & Data Mining
3. Introduction
• Web mining is the application of data mining techniques and
algorithms to extract information directly from the Web: from
web documents and services, web content, hyperlinks and server logs.
• The main goal of web mining is to find patterns in web data by
collecting and analyzing information, in order to gain insight into
trends, the industry and users in general.
• The primary data source is the World Wide Web.
• There are 3 general classes of information that can be discovered by
Web mining.
4. Information Discovered by Web Mining
Web Activity: server logs and web browser activity tracking
Web Graph: links between pages, people and other data
Web Content: data found on web pages and inside documents
5. Steps in Content Web Mining
Web data -> Collect -> Parse -> Analyze -> Produce -> Report, search index etc.
Collect: fetch the content from the web
Parse: extract usable data from formatted data
Analyze: tokenize, rate, classify, cluster, filter, sort etc.
Produce: turn the result of analysis into something useful
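The collect, parse, analyze and produce steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page in `SAMPLE_HTML` is made up, the collect step is stubbed rather than fetching over the network, and the function names are hypothetical. "Analyze" here just counts words.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sample page; a real collector would fetch this over HTTP.
SAMPLE_HTML = """
<html><head><title>Shop</title></head>
<body><h1>Web mining</h1>
<p>Web mining applies data mining to web data.</p>
<a href="/about">About</a></body></html>
"""


class TextExtractor(HTMLParser):
    """Parse step: keep the text content, drop the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def collect():
    # Collect step (stubbed): return the raw page content.
    return SAMPLE_HTML


def parse(html):
    # Parse step: extract usable text from the formatted data.
    extractor = TextExtractor()
    extractor.feed(html)
    return " ".join(extractor.chunks)


def analyze(text):
    # Analyze step: tokenize on whitespace and count lowercase words.
    tokens = [t.strip(".,").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def produce(counts, top=3):
    # Produce step: turn the counts into a small report.
    return counts.most_common(top)


report = produce(analyze(parse(collect())))
print(report)
```

In a real system each step would be far richer (crawling, boilerplate removal, classification or clustering, index building), but the data flow between the steps stays the same.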
6. Different types of Web Mining
Web Mining
• Web Usage Mining
• Web Content Mining
• Web Structure Mining
7. Web Usage Mining
• Web usage mining discovers interesting usage patterns in web data
in order to understand and better serve the needs of web-based applications.
• Usage data captures the identity and origin of web users along with their browsing
behaviour at a website.
Web usage mining is classified according to the kind of usage data:
Web Server Data: data collected by the web server, such as IP address, page reference and access time.
Application Server Data: the ability to track various kinds of business events and log them in application server logs.
Application Level Data: new kinds of events can be defined in an application, and logging can be turned on for them, generating histories of these specially defined events.
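As a small illustration of mining web server data (IP address, page reference and access time), the sketch below parses a few log lines and counts page hits and per-visitor page sequences. It assumes the server writes the Common Log Format; the log lines, the regular expression and all names are made up for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log lines, assumed to be in Common Log Format.
LOG_LINES = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
]

# Captures IP address, access time, requested page and status code.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


records = [r for r in map(parse_line, LOG_LINES) if r]

# Usage pattern 1: how often each page was requested.
page_hits = Counter(r["page"] for r in records)

# Usage pattern 2: the sequence of pages requested from each IP address.
paths_by_ip = defaultdict(list)
for r in records:
    paths_by_ip[r["ip"]].append(r["page"])

print(page_hits.most_common())
print(dict(paths_by_ip))
```

Grouping by IP address is only a rough stand-in for identifying a user. Real usage mining usually adds session reconstruction on top of it.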
8. Web Structure Mining
• Web Structure mining is the process of discovering structure information from the
web.
• Web Structure mining uses graph theory to analyze the node and connection structure
of a website.
• Web Structure mining can be divided into 2 types:
Extracting patterns from hyperlinks
Mining the document structure: analysis of the tree-like structure of a page.
[Figure: web documents connected by hyperlinks]
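Both kinds of structure mining can be illustrated on a single page with Python's standard `html.parser`. The sketch collects the hyperlinks and measures how deeply the page's tags are nested as a simple proxy for its tree-like structure. The sample page and the class name `StructureMiner` are hypothetical. It also assumes every tag is properly closed, which real pages often violate.

```python
from html.parser import HTMLParser

# Hypothetical page containing only properly paired tags.
SAMPLE_PAGE = (
    '<html><body><div><p>See <a href="/a">A</a> '
    'and <a href="/b">B</a>.</p></div></body></html>'
)


class StructureMiner(HTMLParser):
    """Collects hyperlinks and tracks the nesting depth of the tag tree."""

    def __init__(self):
        super().__init__()
        self.links = []      # hyperlink structure: targets of <a href=...>
        self.depth = 0       # current depth in the tag tree
        self.max_depth = 0   # deepest nesting seen so far

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Guard against stray closing tags in malformed pages.
        self.depth = max(0, self.depth - 1)


miner = StructureMiner()
miner.feed(SAMPLE_PAGE)
print(miner.links)
print(miner.max_depth)
```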
9. Web Structure Mining Terminology
• Web graph: directed graph representing the web
• Node: web page in the graph
• Edge: hyperlink between two pages
• In-degree: number of links pointing to a particular node
• Out-degree: number of links originating from a particular node
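These terms can be made concrete with a toy web graph. The sketch below stores the graph as a list of directed edges (source page, target page) and computes the in-degree and out-degree of every node. The pages "A", "B" and "C" and the function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical web graph: each edge is a hyperlink (from_page, to_page).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]


def degrees(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1   # link leaves src
        in_deg[dst] += 1    # link points to dst
    nodes = sorted({node for edge in edges for node in edge})
    return {node: (in_deg[node], out_deg[node]) for node in nodes}


for node, (in_d, out_d) in degrees(EDGES).items():
    print(f"{node}: in-degree={in_d}, out-degree={out_d}")
```

Here page C has the highest in-degree (two pages link to it), while page A has the highest out-degree (it links to two pages).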
10. Web Content Mining
• Web content mining is the mining, extraction and integration of useful data,
information and knowledge from web page content.
• The contents of web pages are mostly text, images, video and audio files.
• For information retrieval purposes, Natural Language Processing techniques and
intelligent web agents are used.
• The agent-based approach to web mining leads to the development of sophisticated
AI systems.
• Web content mining can be seen from 2 points of view: the information retrieval
view and the database view.
• In the information retrieval view, research works with unstructured
and semi-structured data (HTML structure & hyperlink structure).
11. Web Content Mining (contd.)
• In the database view, the mining tries to infer the structure of
a website and transform it into a database, so that information
on the web can be managed and queried more effectively.
• Feature selection can be performed using a multi-scanning
approach.
12. Different Methods used in Web Mining
• Pattern analysis
• Classification accuracy
• Information Score
• Information gain
• Cross entropy
• Mutual information
• Odds Ratio
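As an example of one of these measures, information gain scores a term by how much knowing whether it occurs in a document reduces the entropy of the class labels: the entropy of all labels minus the weighted entropy of the labels with and without the term. The sketch below computes it on a tiny collection. The documents, the "spam"/"ham" labels and the function names are hypothetical.

```python
from math import log2

# Hypothetical labeled documents: (set of terms, class label).
DOCS = [
    ({"cheap", "offer"}, "spam"),
    ({"cheap", "pills"}, "spam"),
    ({"meeting", "notes"}, "ham"),
    ({"project", "offer"}, "ham"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        result -= p * log2(p)
    return result


def information_gain(docs, term):
    """Entropy of all labels minus the entropy left after splitting on the term."""
    labels = [label for _, label in docs]
    with_term = [label for terms, label in docs if term in terms]
    without_term = [label for terms, label in docs if term not in terms]
    remaining = (
        len(with_term) / len(docs) * entropy(with_term)
        + len(without_term) / len(docs) * entropy(without_term)
    )
    return entropy(labels) - remaining


for term in ("cheap", "offer"):
    print(term, information_gain(DOCS, term))
```

In this toy data "cheap" separates the two classes perfectly and gets the maximum gain of 1 bit, while "offer" appears equally in both classes and gets a gain of 0.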
13. Web Mining Applications
• E-Commerce
• Information Filtering
• Fraud Detection
• Education & Research
14. Difference between Web Mining & Data Mining
Data Mining: In the traditional data mining approach, processing 1 million records from a database is a large job.
Web Mining: Here even 10 million pages wouldn’t be a big number.

Data Mining: When doing data mining for corporate information, the data is private and often requires access rights to read.
Web Mining: For web mining, the data is public and rarely requires access rights.

Data Mining: A traditional data mining task gets information from a database, which provides some level of explicit structure.
Web Mining: A typical web mining task processes unstructured or semi-structured data from web pages. Even when the underlying information for web pages comes from a database, this is often obscured by HTML markup.