SEMINAR ON WEB MINING
CONTENTS
Introduction
Definition of web mining
Data mining vs. Web mining
Taxonomy
Web content mining
Web structure mining
Web usage mining
Applications
Pros & Cons
Conclusion
INTRODUCTION
• Nowadays, it has become necessary for users to
utilise automated tools to find, extract, filter, and
evaluate desired information and resources.
• Search engines aim only to discover
resources on the Web.
WEB MINING
• Web mining refers to the overall process of
discovering potentially useful and previously
unknown information or knowledge from
Web data.
• Web mining research integrates work from
several research communities, such as:
◦ Database (DB)
◦ Information retrieval (IR)
DATA MINING VS. WEB MINING
• Data mining: extraction of useful
patterns from data
sources such as
databases, text, the Web,
images, etc.
• Web mining: extraction of relevant
information hidden in
Web-related data,
such as the hypertext
documents on the Web.
Web Mining Taxonomy
Web Mining
◦ Web Content Mining
◦ Web Usage Mining
◦ Web Structure Mining
WEB CONTENT MINING
Discovery of useful information from web contents
/ data / documents
◦ Web data contents: text, image, audio, video.
• Information Retrieval View (Unstructured + Semi-Structured)
• Database View
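As a rough illustration of content mining on the text portion of a page, the sketch below pulls the visible text and the hyperlink targets out of an HTML document using only Python's standard library. The class name `ContentExtractor` and the helper `extract_content` are hypothetical names chosen for this example; the slides do not prescribe any particular tool or method.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects visible text and hyperlink targets from an HTML document.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments, in document order
        self.links = []        # href values of <a> tags
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_content(html):
    """Return (visible_text, links) for one HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


if __name__ == "__main__":
    page = '<p>Web <b>mining</b></p><a href="/taxonomy">Taxonomy</a>'
    print(extract_content(page))
```

The extracted text could then feed a keyword or classification step, while the links are the raw material for structure mining.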
WEB STRUCTURE MINING
Discovering structure information from the Web
Web graph: web pages as nodes and hyperlinks as
edges
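The web graph described here can be sketched in a few lines of Python. In this minimal example, pages are nodes, hyperlinks are directed edges, and the number of incoming links is used as a crude popularity signal. The function names `build_web_graph` and `in_degrees` and the sample page names are assumptions made for illustration; the slides do not name a specific algorithm.

```python
def build_web_graph(links):
    """Build a directed graph from (source_page, target_page) pairs.

    Returns a dict mapping each page to the set of pages it links to.
    Duplicate links between the same two pages collapse into one edge.
    """
    graph = {}
    for source, target in links:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # make sure the target is a node too
    return graph


def in_degrees(graph):
    """Count incoming hyperlinks for every page in the graph."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical link data: (page containing the link, page linked to).
    sample_links = [
        ("home", "products"),
        ("home", "contact"),
        ("products", "contact"),
    ]
    web_graph = build_web_graph(sample_links)
    print(in_degrees(web_graph))
```

In-degree is only the simplest structural measure; link-analysis methods that weight links by the importance of the linking page build on the same graph representation.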
WEB USAGE MINING
• Web usage mining is also known as Web log
mining
◦ It applies mining techniques to discover interesting
usage patterns from the secondary data
derived from the interactions of users
while surfing the Web.
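A minimal sketch of usage mining on server logs follows. It assumes the log lines are in the Common Log Format, which the slides do not specify, and counts successful requests per page as a very simple usage pattern. The names `LOG_PATTERN` and `page_hits` and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Assumed input: Common Log Format, for example
# 10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def page_hits(log_lines):
    """Count successful (2xx) requests per requested path.

    Lines that do not match the assumed format are ignored.
    """
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status").startswith("2"):
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    sample_log = [
        '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "GET /missing HTTP/1.0" 404 209',
    ]
    for path, count in page_hits(sample_log).most_common():
        print(path, count)
```

Grouping the same records by host and time window would give per-visitor sessions, a common next step before looking for navigation patterns.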
WEB MINING ISSUES
• Size
◦ Grows at about 1 million pages a day
◦ Google indexes 9 billion documents
◦ Number of web sites: 72 million, according to a Netcraft survey
• Diverse types of data
APPLICATIONS
• Personalized services
• Improving website design
• System improvement
• Predicting trends
• Carrying out intelligent business
ADVANTAGES
• High trade volumes
• Classify threats & fight against terrorism
• Establish better customer relationships
• Increase profitability
DISADVANTAGES
Invasion of Privacy
Discrimination by controversial attributes
CONCLUSION
Web mining is a rapidly growing area and a promising area for future
research. The proposed techniques aim to help
Web users learn an unfamiliar topic in depth
and systematically.
ANY QUERIES?
THANK YOU
