Advanced IS Design
Lecture 1
Web Mining
Overview
 Challenges in Web Mining
 Basics of Web Mining
 Classification of Web Mining
Web Mining
It is the application of data mining techniques to
automatically discover and extract information from
Web data, including web documents, hyperlinks
between documents, usage logs of websites, etc.
 Web mining is a multidisciplinary field:
 Data mining,
 Machine learning,
 Natural language processing,
 Statistics,
 Databases,
 Information retrieval, multimedia, etc.
Web mining challenges
 The Web has many unique characteristics, which make
mining useful information and knowledge a fascinating and
challenging task.
 The amount of information on the Web is huge, and easily
accessible.
 Information/data of almost all types exist on the Web, e.g.,
structured tables, texts, multimedia data, etc.
 Much of the Web information is redundant. The same piece of
information or its variants may appear in many pages.
 The Web is noisy. A Web page typically contains a mixture of many
kinds of information, e.g., main contents, advertisements,
navigation panels, copyright notices, etc.
Web mining challenges
 The Web is dynamic. Information on the Web changes
constantly. Keeping up with the changes and monitoring the
changes are important issues.
 Above all, the Web is a virtual society. It is not only about data,
information and services, but also about interactions among
people, organizations and automatic systems, i.e., communities.
Classification of Web Mining Techniques
 Web Structure Mining
 Web Usage Mining
 Web Content Mining
Web-Structure Mining
 Discovering useful knowledge from hyperlinks,
which represent the structure of the Web.
 Link mining refers to data mining techniques that
explicitly consider these links when building predictive or
descriptive models of the linked data. Such models are used in
applications such as:
 In search engines: for discovering important Web
pages.
 In social network analysis: for discovering
communities of users who share common interests.
 Citation analysis (co-citation & bibliographic coupling)
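The two citation-analysis measures named above can be sketched as simple counts. This is a minimal illustration, assuming a made-up citation mapping (`cites`) and hypothetical helper names; it is not taken from the lecture.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citation data: citing paper -> set of papers it cites.
cites = {
    "p1": {"a", "b"},
    "p2": {"a", "b", "c"},
    "p3": {"b", "c"},
}

def co_citation(cites):
    """For each pair of cited papers, count how many papers cite both."""
    counts = Counter()
    for cited in cites.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

def bibliographic_coupling(cites):
    """For each pair of citing papers, count the references they share."""
    counts = Counter()
    for x, y in combinations(sorted(cites), 2):
        shared = len(cites[x] & cites[y])
        if shared:
            counts[(x, y)] = shared
    return counts

cocit = co_citation(cites)
coupling = bibliographic_coupling(cites)
```

Co-citation relates two documents through the papers that cite them both, while bibliographic coupling relates two documents through the references they have in common.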
Web-Usage Mining
 Discovery of user access patterns from Web
usage logs, which record user clickstreams.
 Clickstream
 It is the recording of what a computer user clicks on
while Web browsing. As the user clicks anywhere in
the webpage, the action is logged on the client, on
the Web server, or by other sources.
Web-Usage Mining
 Clickstream Analysis answers the following questions:
 Which web page is the most common point of entry for users?
 Are visitors entering through the gateway constructed by the
website developers, or are they somehow bypassing the
gateway and landing in the middle of the Web site?
 In which order have the pages been viewed?
 Is this page sequencing as the developers might have expected,
or is there something the users are trying to tell us about how the
Web site should be structured?
 Which other Web sites referred the users to your Web site?
 Which referrer sites are providing us with the greatest number of
referrals?
 How many web pages have been viewed in the typical visit?
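Several of the questions above reduce to counting over sessions. The sketch below assumes the clickstream has already been grouped into sessions of the form (referrer, ordered page list); the data, the site names, and the choice of "/home" as the intended gateway are all hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical sessions: (referrer site, pages viewed in order).
sessions = [
    ("search.example", ["/home", "/products", "/cart"]),
    ("news.example", ["/products", "/cart"]),
    ("search.example", ["/home", "/about"]),
    ("search.example", ["/blog/post-1"]),
]

# Most common point of entry: first page of each session.
entry_pages = Counter(pages[0] for _, pages in sessions)

# Share of visits that bypass the assumed gateway page "/home".
bypass_rate = sum(pages[0] != "/home" for _, pages in sessions) / len(sessions)

# Order of page views: counts of consecutive page-to-page transitions.
transitions = Counter(
    (a, b) for _, pages in sessions for a, b in zip(pages, pages[1:])
)

# Which referrer sites send the most visits.
referrers = Counter(ref for ref, _ in sessions)

# Pages viewed in a typical visit (median is one possible notion of "typical").
pages_per_visit = median(len(pages) for _, pages in sessions)
```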
Web-Usage Mining Benefits
 Restructure a website
 Extract user access patterns to target ads
 Count the number of accesses to individual files
 Predict user behavior based on previously learned
rules and user profiles
Web-Usage Mining Techniques
 Data Preprocessing
Conversion of the raw data in usage logs into data
suitable for mining (e.g., data cleaning).
 Pattern Discovery
- Uses algorithms and techniques from data mining,
machine learning, statistics, and pattern recognition.
- Common data mining techniques are association rule mining
and sequential pattern mining.
 Pattern Analysis
Validation and interpretation of the mined patterns.
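The first two stages can be sketched on a toy log. This is only an illustration under stated assumptions: the record layout, the list of static-file suffixes, and the 30-minute session timeout are hypothetical choices, and the pattern-discovery step computes plain pair support rather than a full association-rule or sequential-pattern algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical raw log records: (user, timestamp in seconds, URL, HTTP status).
raw_log = [
    ("u1", 0, "/home", 200),
    ("u1", 5, "/logo.png", 200),
    ("u1", 40, "/products", 200),
    ("u1", 4000, "/home", 200),
    ("u1", 4030, "/cart", 200),
    ("u2", 10, "/home", 200),
    ("u2", 20, "/missing", 404),
    ("u2", 60, "/products", 200),
]

STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")  # assumed noise
SESSION_TIMEOUT = 1800  # assumed 30-minute inactivity threshold

def clean(log):
    """Data cleaning: drop failed requests and embedded static files."""
    return [r for r in log if r[3] == 200 and not r[2].endswith(STATIC_SUFFIXES)]

def sessionize(log, timeout=SESSION_TIMEOUT):
    """Split each user's requests into sessions at long inactivity gaps."""
    by_user = defaultdict(list)
    for user, ts, url, _ in sorted(log, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))
    sessions = []
    for hits in by_user.values():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions

def pair_support(sessions):
    """Pattern discovery: fraction of sessions containing each page pair."""
    counts = Counter()
    for s in sessions:
        for pair in combinations(sorted(set(s)), 2):
            counts[pair] += 1
    return {pair: c / len(sessions) for pair, c in counts.items()}

sessions = sessionize(clean(raw_log))
support = pair_support(sessions)
```

Pattern analysis would then keep only the pairs whose support passes a chosen threshold and interpret them, for example as candidates for cross-linking pages.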
Web Content Mining
 Discovering useful information or knowledge
from Web page contents.
 Web data contents include text, images, audio, video,
metadata and hyperlinks.
 Technologies that are normally used in web
content mining are NLP (Natural Language
Processing) and IR (Information Retrieval).
Web Content Mining Applications
 Web Information Integration and Schema
Matching.
 (Lecture 2)
 Opinion extraction from online sources.
 (Lecture 3)
 Knowledge synthesis (representation).
 (Lecture 4)
Social Network Analysis
CS583, Bing Liu, UIC 15
Social network analysis
 Social network analysis is the study of social entities (people
in an organization, called actors), and their
interactions and relationships.
 The interactions and relationships can be
represented with a network or graph,
 each vertex (or node) represents an actor and
 each link represents a relationship.
 From the network, we can study the properties of its
structure, and find various kinds of sub-graphs, e.g.,
communities formed by groups of actors.
 We study two types of social network analysis, centrality
and prestige, which are closely related to hyperlink
analysis and search on the Web.
Centrality
 Important or prominent actors are those that
are linked or involved with other actors
extensively.
 A person with extensive contacts (links) or
communications with many other people in
the organization is considered more important
than a person with relatively fewer contacts.
 The links can also be called ties. A central
actor is one involved in many ties.
Centrality
Based on the varying notions of importance of
vertices or edges, different centrality measures
were developed:
1. Degree centrality
2. Betweenness centrality
3. Closeness centrality
Degree Centrality
Central actors are the most active actors, those that have the most links or ties
with other actors. Let the total number of actors in the network be n.
 Undirected Graph: In an undirected graph, the degree centrality of an
actor i (denoted by $C_D(i)$) is simply the node degree (the number of edges)
of the actor node, denoted by $d(i)$, normalized by the maximum degree, $n-1$:
$$C_D(i) = \frac{d(i)}{n-1}$$
 The value of this measure ranges between 0 and 1 as n-1 is the maximum
value of d(i).
 Directed Graph: In this case, we need to distinguish in-links of actor i
(links pointing to i), and out-links (links pointing out from i). The degree
centrality is defined based on only the out-degree (the number of out-links or
edges), $d_o(i)$:
$$C'_D(i) = \frac{d_o(i)}{n-1}$$
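Degree centrality as defined above can be sketched in a few lines. Both example graphs and the function names are hypothetical; they are not the figure used in the lecture.

```python
# Hypothetical undirected graph: node -> set of neighbours.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e", "f", "g"},
    "e": {"d"},
    "f": {"d"},
    "g": {"d"},
}

# Hypothetical directed graph: node -> set of nodes it links to.
out_links = {"x": {"y", "z"}, "y": {"z"}, "z": set()}

def degree_centrality(adj):
    """Undirected case: d(i) / (n - 1) for every actor i."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def out_degree_centrality(links):
    """Directed case: out-degree d_o(i) / (n - 1) for every actor i."""
    n = len(links)
    return {v: len(targets) / (n - 1) for v, targets in links.items()}

cd = degree_centrality(graph)
out_cd = out_degree_centrality(out_links)
```

In this toy graph, d has the highest degree (4 of a possible 6), so it is the most central actor by this measure.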
Degree Centrality
[Figure: example network, with the prompt "degree?"]
Closeness Centrality
This view of centrality is based on the closeness or distance. The basic
idea is that an actor i is central if it can easily interact with all other
actors. That is, its distance to all other actors is short. Thus, we can use
the shortest distance to compute this measure. Let the shortest
distance from actor i to actor j be d(i, j) (measured as the number of
links in a shortest path).
 Undirected Graph: The closeness centrality $C_C(i)$ of actor i is defined as
$$C_C(i) = \frac{n-1}{\sum_{j=1}^{n} d(i, j)}$$
 The value of this measure also ranges between 0 and 1 as n-1 is the
minimum value of the denominator, which is the sum of the shortest
distances from i to all other actors.
 Directed Graph: The same equation can be used for a directed graph. The
distance computation needs to consider directions of links or edges.
Closeness Centrality
 CC(d)=0.75
 d is at distance 1 from 4 nodes
and at distance 2 from 2 nodes.
 Then
$\sum_{j \neq d} \mathrm{dist}(d, j) = 1+1+1+1+2+2 = 8$
 Since there are 7 nodes in the
network, the numerator of the
closeness formula is n - 1 = 6, so the
closeness centrality of d is
6/8 = 0.75
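The calculation can be checked with a breadth-first search. The slide's figure is not available, so the graph below is an assumed 7-node network chosen only because d is at distance 1 from four nodes and at distance 2 from two nodes, as in the example; the function names are likewise hypothetical.

```python
from collections import deque

# Assumed 7-node undirected graph consistent with the distances stated for d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e", "f", "g"},
    "e": {"d"},
    "f": {"d"},
    "g": {"d"},
}

def shortest_distances(adj, source):
    """Breadth-first search: number of links on a shortest path to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj, i):
    """(n - 1) divided by the sum of shortest distances from i."""
    dist = shortest_distances(adj, i)
    if len(dist) < len(adj):
        raise ValueError("graph is not connected; closeness is undefined here")
    return (len(adj) - 1) / sum(dist.values())

cc_d = closeness_centrality(graph, "d")  # 6 / 8
```

For a directed graph the same functions apply if `adj` maps each node to the nodes it points to, so that distances follow link direction.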
Betweenness Centrality
 If two non-adjacent actors j and k want to
interact and actor i is on the path between j
and k, then i may have some control over the
interactions between j and k.
 Betweenness measures this control of i over
other pairs of actors. Thus,
 if i is on the paths of many such interactions, then
i is an important actor.
Betweenness Centrality (cont …)
 Undirected graph: Let pjk be the number of
shortest paths between actor j and actor k.
 The betweenness of an actor i is defined as the
number of shortest paths that pass through i ($p_{jk}(i)$),
normalized by the total number of shortest paths ($p_{jk}$)
and summed over all pairs j, k with j ≠ i and k ≠ i:
$$C_B(i) = \sum_{j<k} \frac{p_{jk}(i)}{p_{jk}}$$
Betweenness Centrality
 CB(b)=16
 as all the shortest paths from
any node in the set {a, c} to any node in the set {d, e, f, g}
pass through b
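Betweenness can be sketched by counting shortest paths with breadth-first search. The slide's figure is not available, so the graph below is an assumed network in which b separates {a, c} from {d, e, f, g}; the function names are hypothetical. On this assumed graph the sum over unordered pairs j < k gives 8 for b, and counting each pair in both directions gives 16.

```python
from collections import deque
from itertools import combinations

# Assumed undirected graph in which b lies between {a, c} and {d, e, f, g}.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e", "f", "g"},
    "e": {"d"},
    "f": {"d"},
    "g": {"d"},
}

def bfs_counts(adj, source):
    """Return shortest distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma

def betweenness(adj, i):
    """Sum over pairs j < k (both != i) of p_jk(i) / p_jk, undirected graph."""
    info = {v: bfs_counts(adj, v) for v in adj}
    dist_i, sigma_i = info[i]
    total = 0.0
    for j, k in combinations(sorted(adj), 2):
        if i in (j, k):
            continue
        dist_j, sigma_j = info[j]
        if k not in dist_j or i not in dist_j:
            continue  # j cannot reach k or i; no path to count
        # i lies on a shortest j-k path iff the distances add up exactly.
        if dist_j[i] + dist_i[k] == dist_j[k]:
            total += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return total

cb_b = betweenness(graph, "b")
```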
THANK YOU
25

More Related Content

Similar to Web Mining .ppt

Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Margaret Wang
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Margaret Wang
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)paperpublications3
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)paperpublications3
 
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSAPPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSIJwest
 
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSAPPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSIJwest
 
Sampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkSampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkEditor IJCATR
 
Sampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkSampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkEditor IJCATR
 
Nt1310 Unit 1 Literature Review
Nt1310 Unit 1 Literature ReviewNt1310 Unit 1 Literature Review
Nt1310 Unit 1 Literature ReviewCamella Taylor
 
Nt1310 Unit 1 Literature Review
Nt1310 Unit 1 Literature ReviewNt1310 Unit 1 Literature Review
Nt1310 Unit 1 Literature ReviewCamella Taylor
 
20142014_20142015_20142115
20142014_20142015_2014211520142014_20142015_20142115
20142014_20142015_20142115Divita Madaan
 
20142014_20142015_20142115
20142014_20142015_2014211520142014_20142015_20142115
20142014_20142015_20142115Divita Madaan
 
Exploring the Current Trends and Future Prospects in Terrorist Network Mining
Exploring the Current Trends and Future Prospects in Terrorist Network Mining Exploring the Current Trends and Future Prospects in Terrorist Network Mining
Exploring the Current Trends and Future Prospects in Terrorist Network Mining cscpconf
 
Exploring the Current Trends and Future Prospects in Terrorist Network Mining
Exploring the Current Trends and Future Prospects in Terrorist Network Mining Exploring the Current Trends and Future Prospects in Terrorist Network Mining
Exploring the Current Trends and Future Prospects in Terrorist Network Mining cscpconf
 

Similar to Web Mining .ppt (20)

Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
 
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSAPPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
 
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSAPPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
 
Sampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkSampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social Network
 
Sampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkSampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social Network
 
Nt1310 Unit 1 Literature Review
Nt1310 Unit 1 Literature ReviewNt1310 Unit 1 Literature Review
Nt1310 Unit 1 Literature Review
 
Nt1310 Unit 1 Literature Review
Nt1310 Unit 1 Literature ReviewNt1310 Unit 1 Literature Review
Nt1310 Unit 1 Literature Review
 
20142014_20142015_20142115
20142014_20142015_2014211520142014_20142015_20142115
20142014_20142015_20142115
 
20142014_20142015_20142115
20142014_20142015_2014211520142014_20142015_20142115
20142014_20142015_20142115
 
Exploring the Current Trends and Future Prospects in Terrorist Network Mining
Exploring the Current Trends and Future Prospects in Terrorist Network Mining Exploring the Current Trends and Future Prospects in Terrorist Network Mining
Exploring the Current Trends and Future Prospects in Terrorist Network Mining
 
Exploring the Current Trends and Future Prospects in Terrorist Network Mining
Exploring the Current Trends and Future Prospects in Terrorist Network Mining Exploring the Current Trends and Future Prospects in Terrorist Network Mining
Exploring the Current Trends and Future Prospects in Terrorist Network Mining
 

More from NaglaaFathy42

reverse engineering.ppt
reverse engineering.pptreverse engineering.ppt
reverse engineering.pptNaglaaFathy42
 
introduction to web engineering.pptx
introduction to web engineering.pptxintroduction to web engineering.pptx
introduction to web engineering.pptxNaglaaFathy42
 
introduction to web engineering.pdf
introduction to web engineering.pdfintroduction to web engineering.pdf
introduction to web engineering.pdfNaglaaFathy42
 
understanding computers.ppt
understanding computers.pptunderstanding computers.ppt
understanding computers.pptNaglaaFathy42
 
semantic integration.ppt
semantic integration.pptsemantic integration.ppt
semantic integration.pptNaglaaFathy42
 
semantic web tech.ppt
semantic web tech.pptsemantic web tech.ppt
semantic web tech.pptNaglaaFathy42
 
Bioinformatic_Databases_2.ppt
Bioinformatic_Databases_2.pptBioinformatic_Databases_2.ppt
Bioinformatic_Databases_2.pptNaglaaFathy42
 
Lec2_Information Integration.ppt
 Lec2_Information Integration.ppt Lec2_Information Integration.ppt
Lec2_Information Integration.pptNaglaaFathy42
 
ch5-georeferencing.ppt
ch5-georeferencing.pptch5-georeferencing.ppt
ch5-georeferencing.pptNaglaaFathy42
 

More from NaglaaFathy42 (10)

reverse engineering.ppt
reverse engineering.pptreverse engineering.ppt
reverse engineering.ppt
 
introduction to web engineering.pptx
introduction to web engineering.pptxintroduction to web engineering.pptx
introduction to web engineering.pptx
 
introduction to web engineering.pdf
introduction to web engineering.pdfintroduction to web engineering.pdf
introduction to web engineering.pdf
 
understanding computers.ppt
understanding computers.pptunderstanding computers.ppt
understanding computers.ppt
 
semantic integration.ppt
semantic integration.pptsemantic integration.ppt
semantic integration.ppt
 
semantic web tech.ppt
semantic web tech.pptsemantic web tech.ppt
semantic web tech.ppt
 
Bioinformatic_Databases_2.ppt
Bioinformatic_Databases_2.pptBioinformatic_Databases_2.ppt
Bioinformatic_Databases_2.ppt
 
Lec2_Information Integration.ppt
 Lec2_Information Integration.ppt Lec2_Information Integration.ppt
Lec2_Information Integration.ppt
 
ch5-georeferencing.ppt
ch5-georeferencing.pptch5-georeferencing.ppt
ch5-georeferencing.ppt
 
intro to gis
intro to gisintro to gis
intro to gis
 

Recently uploaded

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

Web Mining .ppt

  • 2. Overview  Challenges in Web Mining  Basics of Web Mining  Classification of Web Mining
  • 3. Web Mining It is the application of data mining techniques to automatically discover and extract information from Web data, including web documents, hyperlinks between documents, usage logs of websites, etc.  Web mining is a multidisciplinary field:  Data mining,  Machine learning,  Natural language processing,  Statistics,  Databases,  Information retrieval, multimedia, etc.
  • 4. Web mining challenges  The Web has many unique characteristics, which make mining useful information and knowledge a fascinating and challenging task.  The amount of information on the Web is huge, and easily accessible.  Information/data of almost all types exist on the Web, e.g., structured tables, texts, multimedia data, etc.  Much of the Web information is redundant. The same piece of information or its variants may appear in many pages.  The Web is noisy. A Web page typically contains a mixture of many kinds of information, e.g., main contents, advertisements, navigation panels, copyright notices, etc.
  • 5. Web mining challenges  The Web is dynamic. Information on the Web changes constantly. Keeping up with the changes and monitoring the changes are important issues.  Above all, the Web is a virtual society. It is not only about data, information and services, but also about interactions among people, organizations and automatic systems, i.e., communities.
  • 6. Classification of Web Mining Techniques  Web Structure Mining  Web Usage Mining  Web Content Mining
  • 7. Web-Structure Mining  Discovering useful knowledge from hyperlinks, which represent the structure of the Web.  Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data are used for beneficial applications i.e.,:  In search engines: for discovering important Web pages.  In social network analysis: for discovering communities of users who share common interests.  Citation analysis (co-citation & bibliographic coupling)
  • 8. Web-Usage Mining  Discovery of user access patterns from Web usage logs, which record user clickstreams.  Clickstream  It is the recording of what a computer user clicks on while Web browsing. As the user clicks anywhere in the webpage, the action is logged on a client or inside the Web server, as well as other sources.
  • 9. Web-Usage Mining  Clickstream Analysis answers the following questions:  Which web page is the most common point of entry for users?  Are visitors entering through the gateway constructed by the website developers, or are they somehow by passing the gateway and landing in the middle of the Web site?  In which order have the pages been viewed?  Is this page sequencing as the developers might have expected, or is there something the users are trying to tell us about how the Web site should be structured?  Which other Web sites referred the users to your Web site?  Which referrer sites are providing us with the greatest number of referrals?  How many web pages have been viewed in the typical visit?
  • 10. Web-Usage Mining Benefits  Restructure a website  Extract user access patterns to target ads  Number of access to individual files  Predict user behavior based on previously learned rules and users’ profile
  • 11. Web-Usage Mining Techniques
      - Data preprocessing: conversion of the raw data in usage logs into data suitable for mining (e.g., data cleaning).
      - Pattern discovery: uses algorithms and techniques from data mining, machine learning, statistics, pattern recognition, etc. Common data mining techniques are association rules and sequential pattern mining.
      - Pattern analysis: validation and interpretation of the mined patterns.
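The preprocessing and pattern discovery steps can be illustrated with a small Python sketch. It assumes a hypothetical log of `(session_id, url)` pairs, treats requests for images, stylesheets and scripts as noise to be cleaned out, and then computes the support and confidence of an association rule between pages. The suffix list, the sample log, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Assumption: requests with these suffixes are not page views and are dropped.
IGNORED_SUFFIXES = (".png", ".jpg", ".css", ".js")

# Hypothetical raw usage log: (session id, requested URL).
raw_log = [
    ("s1", "/home"), ("s1", "/logo.png"), ("s1", "/products"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/style.css"), ("s2", "/products"),
    ("s3", "/home"), ("s3", "/about"),
    ("s4", "/products"), ("s4", "/cart"),
]


def preprocess(log):
    """Data cleaning: drop non-page requests and group page views by session."""
    sessions = defaultdict(set)
    for session_id, url in log:
        if not url.endswith(IGNORED_SUFFIXES):
            sessions[session_id].add(url)
    return list(sessions.values())


def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    pages = set(pages)
    return sum(1 for s in sessions if pages <= s) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (0.0 if never seen)."""
    base = support(sessions, antecedent)
    if base == 0:
        return 0.0
    return support(sessions, set(antecedent) | set(consequent)) / base


sessions = preprocess(raw_log)
print(support(sessions, {"/home", "/products"}))
print(confidence(sessions, {"/cart"}, {"/products"}))
```

Pattern analysis would then amount to checking which of the discovered rules are strong enough, and meaningful enough, to act on.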
  • 12. Web Content Mining
      - Discovering useful information or knowledge from Web page contents.
      - Web data contents include text, images, audio, video, metadata and hyperlinks.
      - The technologies normally used in Web content mining are NLP (Natural Language Processing) and IR (Information Retrieval).
  • 13. Web Content Mining Applications
      - Web information integration and schema matching (Lecture 2)
      - Opinion extraction from online sources (Lecture 3)
      - Knowledge synthesis (representation) (Lecture 4)
  • 15. Social network analysis (CS583, Bing Liu, UIC)
      - Social network analysis is the study of social entities (people in an organization, called actors) and their interactions and relationships.
      - The interactions and relationships can be represented with a network or graph, in which each vertex (or node) represents an actor and each link represents a relationship.
      - From the network, we can study the properties of its structure and find various kinds of sub-graphs, e.g., communities formed by groups of actors.
      - We study two types of social network analysis, centrality and prestige, which are closely related to hyperlink analysis and search on the Web.
  • 16. Centrality
      - Important or prominent actors are those that are linked or involved with other actors extensively.
      - A person with extensive contacts (links) or communications with many other people in the organization is considered more important than a person with relatively fewer contacts.
      - The links can also be called ties. A central actor is one involved in many ties.
  • 17. Centrality
      Based on the varying notions of importance of vertices or edges, different centrality measures were developed:
      1. Degree centrality
      2. Betweenness centrality
      3. Closeness centrality
  • 18. Degree Centrality
      Central actors are the most active actors, those that have the most links or ties with other actors. Let the total number of actors in the network be n.
      - Undirected graph: the degree centrality of an actor i, denoted by $C_D(i)$, is simply the node degree (the number of edges) of the actor node, denoted by $d(i)$, normalized by the maximum degree, $n - 1$: $C_D(i) = \frac{d(i)}{n - 1}$.
      - The value of this measure ranges between 0 and 1, as $n - 1$ is the maximum value of $d(i)$.
      - Directed graph: in this case, we need to distinguish the in-links of actor i (links pointing to i) from its out-links (links pointing out from i). The degree centrality is defined based only on the out-degree (the number of out-links or edges), $d_o(i)$: $C'_D(i) = \frac{d_o(i)}{n - 1}$.
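A minimal Python sketch of degree centrality as described above. It assumes the network is stored as a dictionary mapping each actor to the set of actors it links to (for a directed graph, only its out-links), so the same function covers both cases. The function name and both sample networks are hypothetical.

```python
def degree_centrality(graph, actor):
    """Degree (or out-degree) of `actor` divided by the maximum degree, n - 1."""
    n = len(graph)
    if n < 2:
        raise ValueError("need at least two actors")
    return len(graph[actor]) / (n - 1)


# Hypothetical undirected star network: every link is listed in both directions.
undirected = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
}

# Hypothetical directed network: each set holds only the actor's out-links.
directed = {
    "x": {"y", "z"},
    "y": {"z"},
    "z": set(),
}

for name, g in (("undirected", undirected), ("directed", directed)):
    print(name, {actor: round(degree_centrality(g, actor), 2) for actor in g})
```

In the star network the hub `a` reaches the maximum value of 1, while in the directed network `z` scores 0 because it has no out-links, even though two actors point to it.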
  • 20. Closeness Centrality
      This view of centrality is based on closeness, or distance. The basic idea is that an actor i is central if it can easily interact with all other actors, that is, if its distance to all other actors is short. We can therefore use shortest distances to compute this measure. Let the shortest distance from actor i to actor j be $d(i, j)$, measured as the number of links in a shortest path, and let n be the total number of actors.
      - Undirected graph: the closeness centrality $C_C(i)$ of actor i is defined as $C_C(i) = \frac{n - 1}{\sum_{j=1}^{n} d(i, j)}$.
      - The value of this measure also ranges between 0 and 1, as $n - 1$ is the minimum value of the denominator, which is the sum of the shortest distances from i to all other actors.
      - Directed graph: the same equation can be used for a directed graph. The distance computation needs to consider the directions of links or edges.
  • 21. Closeness Centrality
      - $C_C(d) = 0.75$
      - Node d is at distance 1 from 4 nodes and at distance 2 from 2 nodes.
      - Then $\sum_{j \neq d} \mathrm{dist}(d, j) = 1 + 1 + 1 + 1 + 2 + 2 = 8$.
      - Since there are 7 nodes in the network, the numerator $n - 1$ is 6, so the closeness centrality of d is $6/8 = 0.75$.
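The calculation above can be reproduced with a breadth-first search. The figure for this example is not part of the transcript, so the graph below is an assumption: a hypothetical 7-node network chosen only so that d is at distance 1 from four nodes and at distance 2 from two nodes. Function names are illustrative.

```python
from collections import deque


def shortest_distances(graph, source):
    """Breadth-first search: number of links on a shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(graph, actor):
    """(n - 1) divided by the sum of shortest distances to all other actors."""
    n = len(graph)
    dist = shortest_distances(graph, actor)
    if n < 2 or len(dist) < n:
        raise ValueError("sketch assumes a connected graph with >= 2 actors")
    return (n - 1) / sum(dist.values())


# Hypothetical undirected graph (assumed edges, not taken from the slides).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e", "f", "g"},
    "e": {"d"},
    "f": {"d"},
    "g": {"d"},
}

print(closeness_centrality(graph, "d"))
```

For a directed graph the same functions apply if each set holds only out-links, provided every other actor is reachable from the chosen one.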
  • 22. Betweenness Centrality
      - If two non-adjacent actors j and k want to interact and actor i is on the path between j and k, then i may have some control over the interactions between j and k.
      - Betweenness measures this control of i over other pairs of actors. Thus, if i is on the paths of many such interactions, then i is an important actor.
  • 23. Betweenness Centrality (cont.)
      - Undirected graph: let $p_{jk}$ be the number of shortest paths between actor j and actor k.
      - The betweenness of an actor i is defined as the number of shortest paths that pass through i, $p_{jk}(i)$, normalized by the total number of shortest paths and summed over all pairs of actors j and k other than i: $C_B(i) = \sum_{j < k} \frac{p_{jk}(i)}{p_{jk}}$.
  • 24. Betweenness Centrality
      - $C_B(b) = 16$, as all the shortest paths from any node in the set {a, c} to any node in the set {d, e, f, g} pass through b.
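A minimal Python sketch of betweenness as a sum over unordered pairs j < k. The figure for this example is not part of the transcript, so the graph is an assumption: a hypothetical network in which b separates {a, c} from {d, e, f, g}. On this assumed graph the sum over unordered pairs gives 8 for b; counting each pair in both directions would double that to 16. Function names are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Return (dist, paths): shortest distance and number of shortest paths."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to v.
                paths[v] += paths[u]
    return dist, paths


def betweenness(graph, i):
    """Sum over pairs j < k (both != i) of p_jk(i) / p_jk, undirected graph."""
    info = {v: bfs_counts(graph, v) for v in graph}
    dist_i, paths_i = info[i]
    total = 0.0
    for j, k in combinations([v for v in graph if v != i], 2):
        dist_j, paths_j = info[j]
        if k not in dist_j or i not in dist_j:
            continue  # j cannot reach k or i: no shortest path to count
        if dist_j[i] + dist_i[k] == dist_j[k]:
            # i lies on a shortest j-k path; count the paths going through it.
            total += paths_j[i] * paths_i[k] / paths_j[k]
    return total


# Hypothetical undirected graph (assumed edges, not taken from the slides).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e", "f", "g"},
    "e": {"d"},
    "f": {"d"},
    "g": {"d"},
}

print(betweenness(graph, "b"), 2 * betweenness(graph, "b"))
```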