Advanced IS Design
Lecture 1
Web Mining
Overview
• Challenges in Web Mining
• Basics of Web Mining
• Classification of Web Mining
Web Mining
Web mining is the application of data mining techniques to
automatically discover and extract information from
Web data, including Web documents, hyperlinks
between documents, usage logs of websites, etc.
• Web mining is a multidisciplinary field, drawing on:
  - Data mining,
  - Machine learning,
  - Natural language processing,
  - Statistics,
  - Databases,
  - Information retrieval, multimedia, etc.
Web mining challenges
• The Web has many unique characteristics, which make
mining useful information and knowledge a fascinating and
challenging task.
• The amount of information on the Web is huge and easily
accessible.
• Information/data of almost all types exists on the Web, e.g.,
structured tables, texts, multimedia data, etc.
• Much of the Web information is redundant. The same piece of
information or its variants may appear in many pages.
• The Web is noisy. A Web page typically contains a mixture of many
kinds of information, e.g., main contents, advertisements,
navigation panels, copyright notices, etc.
Web mining challenges
• The Web is dynamic. Information on the Web changes
constantly. Keeping up with the changes and monitoring the
changes are important issues.
• Above all, the Web is a virtual society. It is not only about data,
information and services, but also about interactions among
people, organizations and automatic systems, i.e., communities.
Classification of Web Mining Techniques
• Web Structure Mining
• Web Usage Mining
• Web Content Mining
Web-Structure Mining
• Discovering useful knowledge from hyperlinks,
which represent the structure of the Web.
• Link mining refers to data mining techniques that
explicitly consider these links when building predictive or
descriptive models of the linked data. Such models are used in
applications such as:
  - Search engines: discovering important Web
  pages.
  - Social network analysis: discovering
  communities of users who share common interests.
  - Citation analysis (co-citation and bibliographic coupling).
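The two citation-analysis measures named above can be sketched in a few lines. Co-citation counts how many documents cite both members of a pair; bibliographic coupling counts how many references two citing documents share. The `cites` dictionary and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citation data: each key cites the documents in its set.
cites = {
    "p1": {"a", "b"},
    "p2": {"a", "b", "c"},
    "p3": {"b", "c"},
}

def co_citation(cites):
    """For each pair of cited documents, count how many documents cite both."""
    counts = Counter()
    for cited in cites.values():
        for i, j in combinations(sorted(cited), 2):
            counts[(i, j)] += 1
    return counts

def bibliographic_coupling(cites):
    """For each pair of citing documents, count the references they share."""
    counts = Counter()
    for i, j in combinations(sorted(cites), 2):
        shared = len(cites[i] & cites[j])
        if shared:
            counts[(i, j)] = shared
    return counts

cocit = co_citation(cites)
coupling = bibliographic_coupling(cites)
```

In this toy data, documents a and b are cited together twice, and p2 and p3 share two references.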
Web-Usage Mining
• Discovery of user access patterns from Web
usage logs, which record user clickstreams.
• Clickstream
  - A clickstream is the record of what a computer user clicks on
  while browsing the Web. As the user clicks anywhere on
  a Web page, the action is logged on the client or on
  the Web server, and possibly by other sources.
Web-Usage Mining
• Clickstream analysis answers the following questions:
  - Which Web page is the most common point of entry for users?
  - Are visitors entering through the gateway constructed by the
  website developers, or are they somehow bypassing the
  gateway and landing in the middle of the Web site?
  - In which order have the pages been viewed?
  - Is this page sequencing what the developers expected,
  or is there something the users are trying to tell us about how the
  Web site should be structured?
  - Which other Web sites referred the users to our Web site?
  - Which referrer sites are providing us with the greatest number of
  referrals?
  - How many Web pages are viewed in a typical visit?
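Several of these questions reduce to simple counting once clicks are grouped by visit. The following is a minimal sketch, assuming a hypothetical record layout of `(session_id, page, referrer)` in time order, where `referrer` is set only on the first hit of a visit. Real logs would first need parsing and session identification.

```python
from collections import Counter

# Hypothetical clickstream records in time order: (session_id, page, referrer).
clicks = [
    ("s1", "/home", "search.example"),
    ("s1", "/products", None),
    ("s1", "/cart", None),
    ("s2", "/products", "blog.example"),
    ("s2", "/cart", None),
    ("s3", "/home", "search.example"),
]

def summarize(clicks):
    sessions = {}          # session_id -> ordered list of pages viewed
    referrers = Counter()  # referring site -> number of referrals
    for session_id, page, referrer in clicks:
        sessions.setdefault(session_id, []).append(page)
        if referrer is not None:
            referrers[referrer] += 1
    # Entry page = first page of each visit.
    entry_pages = Counter(pages[0] for pages in sessions.values())
    # Average number of pages viewed per visit.
    avg_pages = sum(len(p) for p in sessions.values()) / len(sessions)
    return sessions, entry_pages, referrers, avg_pages

sessions, entry_pages, referrers, avg_pages = summarize(clicks)
```

Here `entry_pages` answers the point-of-entry question, `sessions` keeps the viewing order, `referrers` ranks referring sites, and `avg_pages` gives the pages per visit.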
Web-Usage Mining Benefits
• Restructure a website
• Extract user access patterns to target ads
• Count the number of accesses to individual files
• Predict user behavior based on previously learned
rules and user profiles
Web-Usage Mining Techniques
• Data Preprocessing
- Conversion of the raw data in usage logs into
data suitable for mining (e.g., data cleaning).
• Pattern Discovery
- Uses algorithms and techniques from data mining,
sequential pattern mining, machine learning, statistics, and pattern
recognition.
- Common data mining techniques are association rules
and sequential pattern mining.
• Pattern Analysis
- Validation and interpretation of the mined patterns.
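The first two stages can be illustrated with a small sketch. It assumes a hypothetical log layout of `(session_id, path, http_status)` and treats image, style, and script requests as noise. The pattern-discovery step is deliberately simplified: it counts page-to-page transitions by the number of sessions containing them, which is only a toy stand-in for full association-rule or sequential pattern mining.

```python
from collections import Counter

# Hypothetical raw log records: (session_id, requested_path, http_status).
raw_log = [
    ("s1", "/home", 200), ("s1", "/logo.png", 200), ("s1", "/products", 200),
    ("s1", "/cart", 200),
    ("s2", "/home", 200), ("s2", "/products", 200), ("s2", "/missing", 404),
    ("s3", "/home", 200), ("s3", "/style.css", 200), ("s3", "/products", 200),
]

# Assumption: requests for embedded resources are noise for usage mining.
IGNORED_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")

def preprocess(raw_log):
    """Data cleaning: drop failed requests and embedded resources, group by session."""
    sessions = {}
    for session_id, path, status in raw_log:
        if status != 200 or path.endswith(IGNORED_SUFFIXES):
            continue
        sessions.setdefault(session_id, []).append(path)
    return sessions

def frequent_transitions(sessions, min_support):
    """Pattern discovery: transitions that occur in at least min_support sessions."""
    support = Counter()
    for pages in sessions.values():
        # Count each transition at most once per session.
        support.update(set(zip(pages, pages[1:])))
    return {pair: n for pair, n in support.items() if n >= min_support}

sessions = preprocess(raw_log)
patterns = frequent_transitions(sessions, min_support=2)
```

Pattern analysis would then decide whether a surviving pattern such as `/home` followed by `/products` is interesting or merely reflects the site's navigation design.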
Web Content Mining
• Discovering useful information or knowledge
from Web page contents.
• Web data contents include text, images, audio, video,
metadata, and hyperlinks.
• Technologies normally used in Web
content mining are NLP (Natural Language
Processing) and IR (Information Retrieval).
Web Content Mining Applications
• Web Information Integration and Schema
Matching.
  - (Lecture 2)
• Opinion extraction from online sources.
  - (Lecture 3)
• Knowledge synthesis (representation).
  - (Lecture 4)
Social Network Analysis
CS583, Bing Liu, UIC 15
Social network analysis
• Social network analysis is the study of social entities (people
in an organization, called actors) and their
interactions and relationships.
• The interactions and relationships can be
represented with a network or graph, where
  - each vertex (or node) represents an actor, and
  - each link represents a relationship.
• From the network, we can study the properties of its
structure and find various kinds of sub-graphs, e.g.,
communities formed by groups of actors.
• We study two types of social network analysis, centrality
and prestige, which are closely related to hyperlink
analysis and search on the Web.
Centrality
• Important or prominent actors are those that
are linked or involved with other actors
extensively.
• A person with extensive contacts (links) or
communications with many other people in
the organization is considered more important
than a person with relatively fewer contacts.
• The links can also be called ties. A central
actor is one involved in many ties.
Centrality
Based on the varying notions of importance of
vertices or edges, different centrality measures
were developed:
1. Degree centrality
2. Betweenness centrality
3. Closeness centrality
Degree Centrality
Central actors are the most active actors, those that have the most links or ties
with other actors. Let the total number of actors in the network be n.
• Undirected Graph: In an undirected graph, the degree centrality of an
actor i, denoted by $C_D(i)$, is simply the node degree (the number of edges)
of the actor node, denoted by $d(i)$, normalized by the maximum degree, $n-1$:
$$C_D(i) = \frac{d(i)}{n-1}$$
• The value of this measure ranges between 0 and 1, as $n-1$ is the maximum
value of $d(i)$.
• Directed Graph: In this case, we need to distinguish the in-links of actor i
(links pointing to i) from its out-links (links pointing out from i). The degree
centrality is defined based only on the out-degree (the number of out-links or
edges), $d_o(i)$:
$$C'_D(i) = \frac{d_o(i)}{n-1}$$
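A minimal sketch of the degree centrality described above: the degree (or the out-degree, for a directed graph) divided by n-1. The node names, the edge list, and the function name are hypothetical. Each edge `(u, v)` is read as a link from u to v when `directed=True`.

```python
def degree_centrality(nodes, edges, directed=False):
    """Degree (undirected) or out-degree (directed) of each node, divided by n - 1."""
    n = len(nodes)
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1          # out-link of u, or one end of an undirected tie
        if not directed:
            degree[v] += 1      # the other end of an undirected tie
    return {v: degree[v] / (n - 1) for v in nodes}

# Hypothetical 4-actor network.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

undirected = degree_centrality(nodes, edges)
directed = degree_centrality(nodes, edges, directed=True)
```

In the undirected reading, actor a is tied to all three other actors and reaches the maximum value of 1. In the directed reading, actors c and d have no out-links and score 0.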
Degree Centrality
[Figure: example graph; the slide asks for the degree of its nodes.]
Closeness Centrality
This view of centrality is based on closeness, or distance. The basic
idea is that an actor i is central if it can easily interact with all other
actors, that is, if its distance to all other actors is short. Thus, we can use
the shortest distance to compute this measure. Let the shortest
distance from actor i to actor j be $d(i, j)$ (measured as the number of
links in a shortest path).
• Undirected Graph: The closeness centrality $C_C(i)$ of actor i is defined as
$$C_C(i) = \frac{n-1}{\sum_{j=1}^{n} d(i, j)}$$
• The value of this measure also ranges between 0 and 1, as $n-1$ is the
minimum value of the denominator, which is the sum of the shortest
distances from i to all other actors.
• Directed Graph: The same equation can be used for a directed graph. The
distance computation needs to consider the directions of links or edges.
Closeness Centrality
• $C_C(d) = 0.75$
• d is at distance 1 from 4 nodes
and at distance 2 from 2 nodes.
• Then
$\sum_{j \neq d} \mathrm{dist}(d, j) = 1+1+1+1+2+2 = 8$
• Since there are 7 nodes in the
network, the numerator of the
closeness formula is $n - 1 = 6$, so the
closeness centrality of d is
$6/8 = 0.75$
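The calculation can be reproduced with a breadth-first search. The slide's figure is not available here, so the edge list below is a hypothetical 7-node graph chosen only to match the stated distances (d at distance 1 from four nodes and at distance 2 from two). The sketch assumes an undirected, connected graph.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj, i):
    """(n - 1) divided by the sum of shortest distances from i to all other actors."""
    dist = bfs_distances(adj, i)
    total = sum(d for node, d in dist.items() if node != i)
    return (len(adj) - 1) / total

# Hypothetical graph (assumption): not taken from the slide's figure.
edges = [("a", "b"), ("c", "b"), ("b", "d"), ("d", "e"), ("d", "f"), ("d", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cc_d = closeness_centrality(adj, "d")
```

For this graph the distances from d sum to 8, so `cc_d` is 6/8.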
Betweenness Centrality
• If two non-adjacent actors j and k want to
interact and actor i is on the path between j
and k, then i may have some control over the
interactions between j and k.
• Betweenness measures this control of i over
other pairs of actors. Thus,
  - if i is on the paths of many such interactions, then
  i is an important actor.
Betweenness Centrality (cont …)
• Undirected graph: Let $p_{jk}$ be the number of
shortest paths between actor j and actor k.
• The betweenness of an actor i is defined as the
number of shortest paths that pass through i, $p_{jk}(i)$,
normalized by the total number of shortest paths and summed over all pairs of other actors ($j \neq i$, $k \neq i$):
$$C_B(i) = \sum_{j<k} \frac{p_{jk}(i)}{p_{jk}}$$
Betweenness Centrality
• $C_B(b) = 16$
• as all the shortest paths from
any node in the set {a, c}
to any node in the set {d, e, f, g}
pass through b
THANK YOU

Web Mining .ppt

  • 2. Overview  Challenges in Web Mining  Basics of Web Mining  Classification of Web Mining
  • 3. Web Mining It is the application of data mining techniques to automatically discover and extract information from Web data, including web documents, hyperlinks between documents, usage logs of websites, etc.  Web mining is a multidisciplinary field:  Data mining,  Machine learning,  Natural language processing,  Statistics,  Databases,  Information retrieval, multimedia, etc.
  • 4. Web mining challenges  The Web has many unique characteristics, which make mining useful information and knowledge a fascinating and challenging task.  The amount of information on the Web is huge, and easily accessible.  Information/data of almost all types exist on the Web, e.g., structured tables, texts, multimedia data, etc.  Much of the Web information is redundant. The same piece of information or its variants may appear in many pages.  The Web is noisy. A Web page typically contains a mixture of many kinds of information, e.g., main contents, advertisements, navigation panels, copyright notices, etc.
  • 5. Web mining challenges  The Web is dynamic. Information on the Web changes constantly. Keeping up with the changes and monitoring the changes are important issues.  Above all, the Web is a virtual society. It is not only about data, information and services, but also about interactions among people, organizations and automatic systems, i.e., communities.
  • 6. Classification of Web Mining Techniques  Web Structure Mining  Web Usage Mining  Web Content Mining
  • 7. Web-Structure Mining
      • Discovering useful knowledge from hyperlinks, which represent the structure of the Web.
      • Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. These techniques have useful applications, e.g.:
          • In search engines: for discovering important Web pages.
          • In social network analysis: for discovering communities of users who share common interests.
          • Citation analysis (co-citation and bibliographic coupling).
  • 8. Web-Usage Mining
      • Discovery of user access patterns from Web usage logs, which record user clickstreams.
      • Clickstream
          • A clickstream is the recording of what a computer user clicks on while browsing the Web. Whenever the user clicks anywhere in a Web page, the action is logged on the client or on the Web server, as well as in other sources.
  • 9. Web-Usage Mining
      • Clickstream analysis answers the following questions:
          • Which Web page is the most common point of entry for users?
          • Are visitors entering through the gateway constructed by the website developers, or are they somehow bypassing the gateway and landing in the middle of the Web site?
          • In which order have the pages been viewed?
          • Is this page sequencing what the developers expected, or are the users trying to tell us something about how the Web site should be structured?
          • Which other Web sites referred the users to our Web site?
          • Which referrer sites are providing us with the greatest number of referrals?
          • How many Web pages are viewed in a typical visit?
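Several of the questions above reduce to simple counting once the log has been grouped into visits. The following is a minimal sketch under stated assumptions: the `visits` records, the `.example` referrer names and the `/home` gateway page are all hypothetical, and real logs would first need to be parsed and split into visits.

```python
from collections import Counter

# Hypothetical visits: each records the referring site and the ordered pages viewed.
visits = [
    {"referrer": "search.example", "pages": ["/home", "/products", "/cart"]},
    {"referrer": "news.example", "pages": ["/products", "/cart"]},
    {"referrer": "search.example", "pages": ["/home", "/about"]},
    {"referrer": "search.example", "pages": ["/home"]},
]


def entry_page_counts(visits):
    """Count how often each page is the first page of a visit."""
    return Counter(v["pages"][0] for v in visits)


def referrer_counts(visits):
    """Count how many visits each referring site provided."""
    return Counter(v["referrer"] for v in visits)


def gateway_bypass_rate(visits, gateway="/home"):
    """Fraction of visits that did not enter through the assumed gateway page."""
    bypassed = sum(1 for v in visits if v["pages"][0] != gateway)
    return bypassed / len(visits)


def average_pages_per_visit(visits):
    """Mean number of page views per visit."""
    return sum(len(v["pages"]) for v in visits) / len(visits)


top_entry = entry_page_counts(visits).most_common(1)[0]
top_referrer = referrer_counts(visits).most_common(1)[0]
bypass_rate = gateway_bypass_rate(visits)
avg_pages = average_pages_per_visit(visits)
```

The page-order questions are not covered here; they call for sequence analysis rather than plain counts.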
  • 10. Web-Usage Mining Benefits
      • Restructure a website.
      • Extract user access patterns to target ads.
      • Count the number of accesses to individual files.
      • Predict user behavior based on previously learned rules and users' profiles.
  • 11. Web-Usage Mining Techniques
      • Data Preprocessing: conversion of the raw data in usage logs into data suitable for mining (e.g., data cleaning).
      • Pattern Discovery: applies algorithms and techniques from data mining, sequential pattern mining, machine learning, statistics, pattern recognition, etc. Common data mining techniques are association rules and sequential pattern mining.
      • Pattern Analysis: validation and interpretation of the mined patterns.
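The first two stages can be illustrated with a small sketch. It assumes the log has already been split into sessions (lists of requested paths), treats requests for images, stylesheets and scripts as noise to be cleaned away, and uses the support of page pairs as a simple stand-in for association-rule mining. The suffix list, the sample sessions and the 0.5 threshold are illustrative assumptions, not part of the lecture.

```python
from collections import Counter
from itertools import combinations

# Assumed noise: requests for embedded resources rather than pages.
IGNORED_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")


def clean(session):
    """Data preprocessing: drop requests that are not page views."""
    return [path for path in session if not path.endswith(IGNORED_SUFFIXES)]


def frequent_pairs(sessions, min_support):
    """Pattern discovery: page pairs that co-occur in at least min_support of the sessions."""
    counts = Counter()
    for session in sessions:
        pages = sorted(set(clean(session)))
        counts.update(combinations(pages, 2))
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical sessions extracted from a usage log.
sessions = [
    ["/home", "/logo.png", "/products", "/cart"],
    ["/home", "/products", "/style.css"],
    ["/products", "/cart"],
    ["/home", "/about"],
]

pairs = frequent_pairs(sessions, min_support=0.5)
```

Pattern analysis would then decide which of the returned pairs are actually interesting, for example by discarding pairs that are frequent only because one page links directly to the other.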
  • 12. Web Content Mining
      • Discovering useful information or knowledge from Web page contents.
      • Web data contents include text, images, audio, video, metadata and hyperlinks.
      • The technologies normally used in Web content mining are NLP (Natural Language Processing) and IR (Information Retrieval).
  • 13. Web Content Mining Applications
      • Web information integration and schema matching (Lecture 2).
      • Opinion extraction from online sources (Lecture 3).
      • Knowledge synthesis (representation) (Lecture 4).
  • 15. CS583, Bing Liu, UIC: Social network analysis
      • Social network analysis is the study of social entities (people in an organization, called actors) and their interactions and relationships.
      • The interactions and relationships can be represented with a network or graph, in which
          • each vertex (or node) represents an actor and
          • each link represents a relationship.
      • From the network, we can study the properties of its structure and find various kinds of sub-graphs, e.g., communities formed by groups of actors.
      • We study two types of social network analysis, centrality and prestige, which are closely related to hyperlink analysis and search on the Web.
  • 16. Centrality
      • Important or prominent actors are those that are linked or involved with other actors extensively.
      • A person with extensive contacts (links) or communications with many other people in the organization is considered more important than a person with relatively fewer contacts.
      • The links can also be called ties. A central actor is one involved in many ties.
  • 17. Centrality
      Based on varying notions of the importance of vertices or edges, different centrality measures have been developed:
      1. Degree centrality
      2. Betweenness centrality
      3. Closeness centrality
  • 18. Degree Centrality
      Central actors are the most active actors, those that have the most links or ties with other actors. Let the total number of actors in the network be n.
      • Undirected graph: the degree centrality of an actor i, denoted $C_D(i)$, is simply the node degree (the number of edges) of the actor node, denoted $d(i)$, normalized by the maximum degree, $n-1$:
          $C_D(i) = \frac{d(i)}{n-1}$
      • The value of this measure ranges between 0 and 1, as $n-1$ is the maximum value of $d(i)$.
      • Directed graph: in this case, we need to distinguish the in-links of actor i (links pointing to i) from its out-links (links pointing out from i). The degree centrality is defined based only on the out-degree (the number of out-links or edges), $d_o(i)$:
          $C_D'(i) = \frac{d_o(i)}{n-1}$
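Both definitions divide a node's degree by n-1. A minimal sketch, assuming the graph is stored as a dictionary that maps each actor to the set of actors it links to (the four-actor graphs below are made up for illustration):

```python
def degree_centrality(adj):
    """Undirected case: d(i) / (n - 1), with adj[i] the set of neighbours of i."""
    n = len(adj)
    return {i: len(neighbours) / (n - 1) for i, neighbours in adj.items()}


def out_degree_centrality(out_links):
    """Directed case: d_o(i) / (n - 1), with out_links[i] the set of actors i points to."""
    n = len(out_links)
    return {i: len(targets) / (n - 1) for i, targets in out_links.items()}


# Hypothetical undirected graph (each edge is listed in both directions).
undirected = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

# Hypothetical directed graph: d links to everyone, c links to no one.
directed = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": set(),
    "d": {"a", "b", "c"},
}

cd = degree_centrality(undirected)
cd_out = out_degree_centrality(directed)
```

Here actor b is linked to all three other actors, so its degree centrality reaches the maximum value of 1.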
  • 20. Closeness Centrality
      This view of centrality is based on closeness or distance. The basic idea is that an actor i is central if it can easily interact with all other actors, that is, if its distance to all other actors is short. Thus, we can use the shortest distance to compute this measure. Let the shortest distance from actor i to actor j be $d(i, j)$ (measured as the number of links in a shortest path).
      • Undirected graph: the closeness centrality $C_C(i)$ of actor i is defined as
          $C_C(i) = \frac{n-1}{\sum_{j=1}^{n} d(i, j)}$
      • The value of this measure also ranges between 0 and 1, as $n-1$ is the minimum value of the denominator, which is the sum of the shortest distances from i to all other actors.
      • Directed graph: the same equation can be used for a directed graph. The distance computation needs to consider the directions of the links or edges.
  • 21. Closeness Centrality (example)
      • $C_C(d) = 0.75$
      • d is at distance 1 from 4 nodes and at distance 2 from 2 nodes.
      • Then $\sum_{j \neq d} d(d, j) = 1+1+1+1+2+2 = 8$.
      • Since there are 7 nodes in the network, the numerator $n-1$ of the closeness formula is 6, so the closeness centrality of d is $6/8 = 0.75$.
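The arithmetic of this example can be checked with a breadth-first search. The slide's figure is not reproduced in this transcript, so the 7-node edge list below is an assumption, chosen only so that d is at distance 1 from four nodes and at distance 2 from two nodes. The sketch also assumes a connected, unweighted, undirected graph.

```python
from collections import deque


def shortest_distances(adj, source):
    """Breadth-first search: number of links on a shortest path from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj, i):
    """(n - 1) divided by the sum of shortest distances from i to all other actors."""
    n = len(adj)
    dist = shortest_distances(adj, i)
    total = sum(d for node, d in dist.items() if node != i)
    return (n - 1) / total


# Assumed edge list (hypothetical reconstruction, not the slide's actual figure).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("d", "e"), ("d", "f"), ("d", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cc_d = closeness_centrality(adj, "d")  # 6 / 8
```

On this assumed graph the distances from d sum to 8, which gives 6/8 = 0.75 as in the example.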
  • 22. Betweenness Centrality
      • If two non-adjacent actors j and k want to interact and actor i is on the path between j and k, then i may have some control over the interactions between j and k.
      • Betweenness measures this control of i over other pairs of actors. Thus, if i is on the paths of many such interactions, then i is an important actor.
  • 23. Betweenness Centrality (cont.)
      • Undirected graph: let $p_{jk}$ be the number of shortest paths between actor j and actor k.
      • The betweenness of an actor i is defined as the number of shortest paths that pass through i, $p_{jk}(i)$, normalized by the total number of shortest paths:
          $C_B(i) = \sum_{j<k} \frac{p_{jk}(i)}{p_{jk}}$
  • 24. Betweenness Centrality (example)
      • $C_B(b) = 16$, as all the shortest paths from any node in the set {a, c} to any node in the set {d, e, f, g} pass through b.
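The definition can be evaluated directly by counting shortest paths with breadth-first search. This is a sketch under stated assumptions, not the method used in the lecture: the edge list is a hypothetical graph (the slide's figure is not reproduced in this transcript) in which every shortest path from {a, c} to {d, e, f, g} passes through b, and the computation is a brute-force pass over all pairs that only suits small graphs.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): shortest distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness(adj, i):
    """Sum over unordered pairs j < k (both different from i) of p_jk(i) / p_jk."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_i, sigma_i = info[i]
    total = 0.0
    for j, k in combinations([v for v in adj if v != i], 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j or i not in dist_j:
            continue  # j and k (or j and i) are not connected
        if dist_j[i] + dist_i[k] == dist_j[k]:
            # i lies on a shortest j-k path: paths through i = p_ji * p_ik
            total += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return total


# Assumed edge list (hypothetical reconstruction, not the slide's actual figure).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("d", "e"), ("d", "f"), ("d", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cb_b = betweenness(adj, "b")
```

On this assumed graph the sum over unordered pairs gives 8 for b (2 source nodes times 4 target nodes). Counting each pair in both directions doubles that to 16, which would match the value quoted in the example if the slide counts ordered pairs.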