Presentation Outline
Introduction
Similarities and Differences between Data Mining and Web Mining
Reasons for Web Mining
Types of Web Mining
Architecture of Web Mining
Applications of Web Mining
Challenges of Web Mining
Conclusion and Recommendations
INTRODUCTION
Data mining is the set of methodologies used for:
• analyzing data from various dimensions and perspectives,
• finding previously unknown, hidden patterns,
• classifying and grouping the data, and
• summarizing the identified relationships.
Cont…
Web mining can be broadly defined as the discovery and analysis of useful information from the World Wide Web.
In web mining, data is collected from servers, clients, and databases.
Web mining is a subset of data mining.
Differences between Data Mining and Web Mining
• In data mining (DM), data is stored in a data warehouse; in web mining (WM), data is stored in web server databases and web logs.
• DM uses structured data, while WM uses both structured and unstructured data.
Similarities between Data Mining and Web Mining
• Their common goal is to extract and discover hidden knowledge.
• Both rest on identifying patterns in the data available in the system or on the web.
• Both are useful for decision making and prediction.
• Both follow the same process.
• Both need input (source) data to complete their process.
REASONS FOR WEB MINING
• Working with web data raises the following problems.
• User-side problem: users browse or use a search service to find relevant information on the web.
• They face problems such as:
  • low precision (many of the retrieved results are irrelevant) and
  • low recall (many relevant pages are not retrieved).
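The two user-side problems above can be put into numbers. The following is a minimal sketch, assuming a hypothetical query whose retrieved page IDs and hand-labelled relevant page IDs are both made up for illustration; the function name `precision_recall` is not from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant pages retrieved / all pages retrieved
    recall    = relevant pages retrieved / all relevant pages
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: five pages returned, four pages actually relevant.
retrieved = ["p1", "p2", "p3", "p4", "p5"]
relevant = ["p2", "p5", "p8", "p9"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.40 recall=0.50
```

Here only two of the five returned pages are relevant (low precision), and two of the four relevant pages are missed (low recall).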
Cont…
• Information provider (server-side) problem:
  • What do customers do?
  • What do customers want?
  • How can web data be used effectively to market products and services to customers?
Web Mining Tools
• Data Miner (web content mining tool)
• Google Analytics (web usage mining tool)
• Majestic (web structure mining tool)
• Scrapy (web content mining tool)
• Oracle Data Mining (web usage mining tool)
• Bixo (web structure mining tool)
• Weka (web usage mining tool)
TYPES OF WEB MINING
• Web mining is generally divided into three categories, based on the data to be mined, as shown in Figure 1: web content mining, web structure mining, and web usage mining.
Figure 1: Types of Web Mining
Web Content Mining
• Web content mining is the process of collecting useful data from websites.
• This content includes news, comments, company information, product catalogs, etc.
• It extracts information or knowledge from the collected sources.
• The content may consist of text, images, video, sound, or structured records such as lists and tables.
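As an illustration of the idea, the sketch below pulls free text and list items out of a small page using only Python's standard `html.parser` module. The class name `ContentExtractor` and the sample page are assumptions made for this example; real content mining tools do considerably more (crawling, cleaning, handling images and video).

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect visible text and list items from one HTML page."""

    SKIP = {"script", "style"}  # tags whose content is not visible text

    def __init__(self):
        super().__init__()
        self.text = []    # free text fragments (titles, paragraphs, ...)
        self.items = []   # text of <li> elements (structured records)
        self._skip_depth = 0
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "li":
            self._in_li = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "li":
            self._in_li = False

    def handle_data(self, data):
        data = data.strip()
        if not data or self._skip_depth:
            return
        (self.items if self._in_li else self.text).append(data)


# Hypothetical page used only for this example.
PAGE = """
<html><head><title>Acme News</title><style>p {color: red}</style></head>
<body><h1>Product catalog</h1><p>New items this week.</p>
<ul><li>Laptop</li><li>Phone</li></ul></body></html>
"""

extractor = ContentExtractor()
extractor.feed(PAGE)
print(extractor.text)   # free text found on the page
print(extractor.items)  # list records found on the page
```

Keeping list items apart from free text mirrors the distinction the slide draws between unstructured content and structured records.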
Web Structure Mining
• It is the process of extracting structural information from the web.
  • Hyperlinks: a hyperlink is a structural component that connects a web page to a different location.
  • Document structure: the organization of a web page's content in a tree-structured format, based on the HTML and XML tags within the page.
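Both kinds of structure can be read from a page with Python's standard library. This is a minimal sketch, assuming well-formed HTML; the class name `StructureParser`, the sample page, and the `example.com` URLs are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags that never have a closing tag, so they must not deepen the tree.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class StructureParser(HTMLParser):
    """Record outgoing hyperlinks and the tag tree of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # absolute URLs of outgoing hyperlinks
        self.outline = []   # (depth, tag) pairs in document order
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        self.outline.append((self._depth, tag))
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))
        if tag not in VOID_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._depth:
            self._depth -= 1


# Hypothetical page used only for this example.
PAGE = (
    "<html><body><h1>Home</h1>"
    '<p>See <a href="/about">about</a> and '
    '<a href="http://example.org/x">x</a>.</p>'
    "</body></html>"
)

parser = StructureParser("http://example.com/index.html")
parser.feed(PAGE)

print(parser.links)
for depth, tag in parser.outline:
    print("  " * depth + tag)   # indented view of the tag tree
```

Collecting `links` over many pages gives the hyperlink graph between pages, while `outline` is the tree structure inside a single page.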
Web Usage Mining
• It is the application of data mining techniques to discover usage patterns from web data, in order to better understand and meet the needs of users.
• It is classified into three types, based on the kind of usage data:
  • Web server data: user logs collected by the web server, including IP address, page reference, and access time.
  • Application server data: the ability to track various kinds of business events.
  • Application-level data: new kinds of events that are defined and logged, generating histories of those events.
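The web server data described above (IP address, page reference, access time) can be extracted from a server log and summarized in a few lines. This is a sketch only: it assumes the log follows the Common Log Format, and the sample lines, the `LOG_PATTERN` expression, and the `parse_log` helper are invented for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: Common Log Format, e.g.
# 192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical log lines used only for this example.
SAMPLE_LOG = """\
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 512
198.51.100.7 - - [10/Oct/2023:14:02:10 +0000] "GET /index.html HTTP/1.1" 200 2326
"""


def parse_log(text):
    """Turn raw log text into records with ip, access time, and page."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed format
        records.append({
            "ip": match["ip"],
            # Month names such as "Oct" are parsed with the current locale.
            "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
            "page": match["page"],
        })
    return records


records = parse_log(SAMPLE_LOG)
page_hits = Counter(r["page"] for r in records)          # most requested pages
requests_per_ip = Counter(r["ip"] for r in records)      # activity per visitor

print(page_hits.most_common())
print(requests_per_ip.most_common())
```

Simple counts like these are a starting point; usage mining proper would go on to group requests into sessions and look for patterns across them.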
Architecture of Web Mining
Figure 2: Architecture of Web Mining
APPLICATIONS OF WEB MINING
• Web mining has many applications in different sectors.
Figure 3: Applications of Web Mining
Cont…
E-Learning:
• Web mining can be used to improve and enhance e-learning environments.
• Applications of web mining to e-learning are usually based on web usage mining.
• Machine learning techniques and web usage mining enhance web-based learning environments.
Cont…
Electronic commerce:
• A major challenge for e-commerce is to understand visitors' or customers' needs and value orientations as much as possible.
• Web mining can improve service capacity for consumers and provide competitive advantages.
Cont…
Security and Crime Investigation:
• Web mining techniques are also used to protect user systems or logging information against cybercrimes such as:
  • hacking,
  • internet fraud,
  • fraudulent websites,
  • illegal online gambling,
  • virus spreading,
  • child pornography distribution, and
  • cyber terrorism.
Cont…
Electronic Business:
Web mining techniques can support a web-enabled electronic business in improving:
• marketing,
• customer support, and
• sales operations.
Advantages of Web Mining
• Increases the profits of companies or organizations by selling products.
• Protects user systems or logging information from cybercrimes.
• Improves service capacity for consumers and competitive advantage.
• Improves and enhances e-learning environments.
• Opens the door to business intelligence and the knowledge economy.
• Supports decision making and prediction.
• Mines and discovers hidden knowledge.
• Can be used for data analysis.
Disadvantages
• URLs can be tracked to access the data.
• Multiplicity of events and URLs.
• A large amount of data remains unused.
• Because web data can be updated at any time, it is hard to judge whether it can be trusted.
WEB MINING CHALLENGES
Web mining faces various technical and non-technical challenges.
The non-technical challenges include:
• lack of management support,
• inadequate funding, and
• lack of required resources, such as skilled professionals.
Cont…
The technical issues are:
Incorrect and inaccurate data
• Data may be inaccurate.
• Data may be incomplete or unavailable.
Lack of tools
• Available tools support only one mining technique, such as classification or clustering.
CONCLUSION
• As web usage and the information sources on the World Wide Web grow continuously, web mining offers a good opportunity to extract hidden knowledge from the web.
• One weakness is that some researchers treat web mining as interchangeable with text mining. This is incorrect: web mining deals with a large amount of multimedia information, whereas text mining handles only textual data.
RECOMMENDATION
In the future, web mining tools should support clustering, classification, and association techniques together.
Since privacy is a major challenge that hinders web mining, more data should be released publicly in the future, and society's habit of knowledge sharing should be strengthened by providing training and opportunities for collaboration.
