Link Analysis (RBY)

Link Analysis on the Web
The big picture, the small picture and the medium-sized picture

Ricardo Baeza-Yates (3, 4)

Joint work with: L. Becchetti (1), P. Boldi (2), C. Castillo (1, 3), D. Donato (1, 3), S. Leonardi (1), B. Poblete (5)

1. Università di Roma "La Sapienza", Rome, Italy
2. Università degli Studi di Milano, Milan, Italy
3. Yahoo! Research Barcelona, Catalunya, Spain
4. Yahoo! Research Latin America, Santiago, Chile
5. Universitat Pompeu Fabra, Catalunya, Spain

1 Levels of Link Analysis
2 Generalizing PageRank
3 Other Functional Rankings
4 Web Spam
5 Web Spam Detection
6 Topological Web Spam
7 Direct Counting of Supporters
8 Spam Detection Results

How to find meaningful patterns?

Several levels of analysis:
Macroscopic view: overall structure
Microscopic view: nodes
Mesoscopic view: regions

Macroscopic view, e.g. Bow-tie
[Broder et al., 2000]

Macroscopic view, e.g. Bow-tie, migration
[Baeza-Yates and Poblete, 2006]

Macroscopic view, e.g. Jellyfish
[Tauro et al., 2001] - Internet Autonomous Systems (AS) Topology

Microscopic view, e.g. Degree
[Barabási, 2002] and others

Microscopic view, e.g. Degree
[Figure: degree distribution panels for Greece, Chile, Spain and Korea]
[Baeza-Yates et al., 2006b] - compares this distribution in 8 countries . . . guess what is the result?

Mesoscopic view, e.g. Hop-plot
[Figure: hop-plots (frequency, 0.0 to 0.3, against distance, 5 to 30) in four panels: .it (40M pages), .uk (18M pages), .eu.int (800K pages), Synthetic graph (100K pages)]
[Baeza-Yates et al., 2006a]

Notation

Let $P_{N \times N}$ be the normalized link matrix of a graph
Row-normalized
No "sinks"

Definition (PageRank)
Stationary state of:

$$\alpha P + \frac{(1-\alpha)}{N}\,\mathbf{1}_{N \times N}$$

Follow links with probability α
Random jump with probability 1 − α
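The random-surfer reading of the definition can be sketched as a power iteration. This is a minimal illustration, not the implementation used in the talk: the function name `pagerank`, the adjacency-list format and the toy graph are assumptions, and every node is assumed to have at least one out-link (no sinks).

```python
def pagerank(out_links, alpha=0.85, iterations=100):
    """Approximate the stationary state of alpha*P + (1-alpha)/N * 1.

    out_links[i] lists the nodes that node i links to (assumed non-empty).
    """
    n = len(out_links)
    rank = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(iterations):
        nxt = [(1.0 - alpha) / n] * n         # random jump with probability 1 - alpha
        for i, targets in enumerate(out_links):
            share = alpha * rank[i] / len(targets)   # follow links with probability alpha
            for j in targets:
                nxt[j] += share
        rank = nxt
    return rank


# Hypothetical toy graph without sinks.
TOY_GRAPH = [[1, 2], [2], [0, 1, 3], [0]]
ranks = pagerank(TOY_GRAPH)
```

The scores stay a probability distribution at every step, because each iteration redistributes a total mass of α along links and 1 − α uniformly.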

Explicit Formulas

Formulas for PageRank [Newman et al., 2001, Boldi et al., 2005]

$$r(\alpha) = \frac{(1-\alpha)}{N} \sum_{t=0}^{\infty} (\alpha P)^t .$$

$$r_i(\alpha) = \sum_{p \in \mathrm{Path}(-,i)} \frac{(1-\alpha)\,\alpha^{|p|}}{N}\, \mathrm{branching}(p)$$

Path(−, i) are incoming paths in node i
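A short derivation may help connect the stationary-state definition with the series. It assumes that r is a row vector whose entries sum to 1, and it writes $\mathbf{1}$ for the row vector of N ones (a notation introduced here, not taken from the slide):

```latex
\begin{align*}
r &= r\left(\alpha P + \frac{1-\alpha}{N}\,\mathbf{1}_{N\times N}\right)
   = \alpha\, r P + \frac{1-\alpha}{N}\,\mathbf{1}
   && \text{since } r\,\mathbf{1}_{N\times N} = \mathbf{1} \text{ when } \textstyle\sum_i r_i = 1 \\
r\,(I - \alpha P) &= \frac{1-\alpha}{N}\,\mathbf{1} \\
r &= \frac{1-\alpha}{N}\,\mathbf{1}\,(I-\alpha P)^{-1}
   = \frac{1-\alpha}{N}\,\mathbf{1}\sum_{t=0}^{\infty}(\alpha P)^t
   && \text{(Neumann series, valid for } 0 \le \alpha < 1\text{)}
\end{align*}
```

Entry i of $\mathbf{1}P^t$ adds up the branching contributions of the paths with t links that end at node i, which is how the path-based formula arises from the matrix one.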

Branching contribution

Definition (Branching contribution of a path)
Given a path $p = x_1, x_2, \ldots, x_t$ of length $t = |p|$

$$\mathrm{branching}(p) = \frac{1}{d_1 d_2 \cdots d_{t-1}}$$

where $d_i$ are the out-degrees of the members of the path

For every node i and every length t

$$\sum_{p \in \mathrm{Path}(i,-),\, |p| = t} \mathrm{branching}(p) = 1.$$
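The normalization property can be checked by brute force on a small graph. The sketch below is illustrative only: the helper names and the toy graph are assumptions, a path is represented as a list of nodes, and exact fractions are used so that the sums can be compared with 1 without rounding.

```python
from fractions import Fraction


def branching(path, out_links):
    """1 / (d_1 * ... * d_{t-1}): out-degrees of every node on the path except the last."""
    weight = Fraction(1)
    for node in path[:-1]:
        weight /= len(out_links[node])
    return weight


def paths_from(start, num_nodes, out_links):
    """Yield every path with `num_nodes` nodes that starts at `start`."""
    if num_nodes == 1:
        yield [start]
        return
    for nxt in out_links[start]:
        for tail in paths_from(nxt, num_nodes - 1, out_links):
            yield [start] + tail


# Hypothetical toy graph without sinks.
OUT_LINKS = {0: [1, 2], 1: [2], 2: [0, 1, 3], 3: [0]}


def total_branching(start, num_nodes, out_links=OUT_LINKS):
    """Sum of branching(p) over all paths of a fixed size leaving `start`."""
    return sum(branching(p, out_links) for p in paths_from(start, num_nodes, out_links))
```

Each step splits the weight of a path evenly among the out-links of its last node, so the total leaving a node is preserved for every path size.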

Functional ranking

General functional ranking [Baeza-Yates et al., 2006a]

$$r_i = \sum_{p \in \mathrm{Path}(-,i)} \frac{\mathrm{damping}(|p|)}{N}\, \mathrm{branching}(p)$$

PageRank is a particular case of path-based ranking

Exponential damping = PageRank

[Figure: damping(t) with α = 0.8 and α = 0.7; weight (0.00 to 0.30) against length of the path (t), 1 to 10]

$$\mathrm{damping}(t) = (1-\alpha)\,\alpha^{t}$$

Most of the contribution is on the first few levels.

Linear damping

[Figure: damping(t) with L = 15 and L = 10; weight (0.00 to 0.30) against length of the path (t), 1 to 10]

$$\mathrm{damping}(t) = \begin{cases} \dfrac{2(L-t)}{L(L+1)} & t < L \\ 0 & t \ge L \end{cases}$$
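Both damping functions can be written down directly and checked to distribute a total weight of 1 over all path lengths. A small sketch, with function names chosen here for illustration and the exponential form taken as (1 − α)α^t, the one that appears in the path-based PageRank formula:

```python
import math


def exponential_damping(t, alpha):
    """PageRank-style damping: (1 - alpha) * alpha**t."""
    return (1.0 - alpha) * alpha ** t


def linear_damping(t, L):
    """Linear damping: 2(L - t) / (L(L + 1)) for t < L, zero from L onwards."""
    if t >= L:
        return 0.0
    return 2.0 * (L - t) / (L * (L + 1))


# Total weight over all path lengths (the exponential tail is cut at t = 500).
exp_total = sum(exponential_damping(t, 0.8) for t in range(500))
lin_total = sum(linear_damping(t, 10) for t in range(30))
```

Linear damping reaches exactly zero at t = L, so a ranking based on it needs only a bounded number of passes over the graph.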

Example: Calculating LinearRank

For calculating LinearRank we use:

$$\mathrm{LinearRank} = \frac{1}{N} \sum_{t=0}^{\infty} \mathrm{damping}(t)\, P^t = \frac{1}{N} \sum_{t=0}^{L-1} \frac{2(L-t)}{L(L+1)}\, P^t$$

However, we cannot hold the temporary $P^t$ in memory!

Re-write the damping as a recursion

We have to rewrite to be able to calculate:

$$R^{(0)} = \frac{2}{L+1}$$

$$R^{(k+1)} = \frac{(L-k-1)}{(L-k)}\, R^{(k)} P$$

$$\mathrm{LinearRank} = \sum_{k=0}^{L-1} R^{(k)}$$

Now we can give the algorithm . . .

Algorithm

for i : 1 . . . N do {Initialization}
    Score[i] ← R[i] ← 2/(L+1)
end for
for k : 0 . . . L − 2 do {Iteration step}
    Aux ← 0
    for i : 1 . . . N do {Follow links in the graph}
        for all j such that there is a link from i to j do
            Aux[j] ← Aux[j] + R[i]/outdegree(i)
        end for
    end for
    for i : 1 . . . N do {Add to ranking value}
        R[i] ← Aux[i] × (L−k−1)/(L−k)
        Score[i] ← Score[i] + R[i]
    end for
end for
return Score
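The listing translates almost line by line into Python. The sketch below is an illustration under stated assumptions: the graph is an adjacency list without sinks, the iteration counter runs over k = 0, ..., L − 2 so that the factor (L−k−1)/(L−k) matches the recursion for R^(k+1), and the scores are left unnormalized (they add up to N, as with the initial value 2/(L+1)).

```python
def linear_rank(out_links, L):
    """LinearRank by the recursion R(k+1) = (L-k-1)/(L-k) * R(k) P."""
    n = len(out_links)
    r = [2.0 / (L + 1)] * n               # R(0)
    score = r[:]                          # running sum of R(0), R(1), ...
    for k in range(L - 1):                # k = 0 .. L-2 gives R(1) .. R(L-1)
        aux = [0.0] * n
        for i, targets in enumerate(out_links):   # follow links in the graph
            share = r[i] / len(targets)
            for j in targets:
                aux[j] += share
        factor = (L - k - 1) / (L - k)
        r = [a * factor for a in aux]             # R(k+1)
        score = [s + x for s, x in zip(score, r)]  # add to ranking value
    return score


# Hypothetical toy graph without sinks.
TOY_GRAPH = [[1, 2], [2], [0, 1, 3], [0]]
scores = linear_rank(TOY_GRAPH, L=10)
```

Only two vectors of size N are kept between passes, so no power of P is ever materialized.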

Algorithm (general)

for i : 1 . . . N do {Initialization}
    Score[i] ← R[i] ← INIT
end for
for k : 1 . . . STOP do {Iteration step}
    Aux ← 0
    for i : 1 . . . N do {Follow links in the graph}
        for all j such that there is a link from i to j do
            Aux[j] ← Aux[j] + R[i]/outdegree(i)
        end for
    end for
    for i : 1 . . . N do {Add to ranking value}
        R[i] ← Aux[i] × FACTOR
        Score[i] ← Score[i] + R[i]
    end for
end for
return Score
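One way to read INIT, FACTOR and STOP is as parameters of a single routine. The sketch below makes that reading concrete; the function name, the choice to pass FACTOR as a function of the iteration counter, and the PageRank instantiation (INIT = (1 − α)/N, FACTOR = α, a large STOP) are assumptions made for illustration.

```python
def functional_rank(out_links, init, factor, stop):
    """Generic damping-based ranking: `stop` passes, each scaled by factor(k)."""
    n = len(out_links)
    r = [init] * n
    score = r[:]
    for k in range(stop):                 # same number of passes as k = 1 .. STOP
        aux = [0.0] * n
        for i, targets in enumerate(out_links):
            share = r[i] / len(targets)
            for j in targets:
                aux[j] += share
        f = factor(k)
        r = [a * f for a in aux]
        score = [s + x for s, x in zip(score, r)]
    return score


# Hypothetical toy graph without sinks.
TOY_GRAPH = [[1, 2], [2], [0, 1, 3], [0]]
ALPHA = 0.85
N = len(TOY_GRAPH)

# PageRank as an instance: exponential damping, truncated after 300 passes.
pagerank_scores = functional_rank(TOY_GRAPH, (1 - ALPHA) / N, lambda k: ALPHA, 300)
```

With `init = 2 / (L + 1)`, `factor = lambda k: (L - k - 1) / (L - k)` and `stop = L - 1`, the same routine would compute the unnormalized LinearRank.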

Other damping functions

Empirical damping:
[Figure: average text similarity (0.2 to 0.7) against link distance (1 to 5)]

Using LinearRank to approximate PageRank

Experimental comparison: 18-million nodes in the U.K. Web Graph
Calculated PageRank with α = 0.1, 0.2, . . . , 0.9
Calculated LinearRank with L = 5, 10, . . . , 25

For certain combinations of parameters, the rankings are almost equal!

Experimental comparison

Experimental Comparison in the U.K. Web Graph
[Figure: τ (0.80 to 1.00) as a function of L (5 to 25) and α (0.5 to 0.9), with the region τ ≥ 0.95 marked]
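The agreement between two rankings is summarized here by τ, which a later slide names as Kendall's τ. A minimal sketch of the statistic follows, assuming the simple variant that ignores ties and using a quadratic pairwise loop, which is fine for an illustration but not for a graph with millions of nodes:

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau (tau-a): (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1       # both rankings order the pair the same way
        elif sign < 0:
            discordant += 1       # the rankings disagree on this pair
    pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / pairs


same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
opposite = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
one_swap = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

A value of 1 means the two score vectors order every pair of pages identically, and a single swapped pair among four items already lowers τ to 4/6.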

Prediction of best parameter combination

Prediction of Best Parameter Combinations (Analysis)
[Figure: L that maximizes Kendall's τ (5 to 25) against exponent α (0.5 to 0.9); series: Actual optimum, Predicted optimum with length=5]

What is on the Web?

Information + Porn + On-line casinos + Free movies + Cheap software + Buy a MBA diploma + Prescription-free drugs + V!-4-gra + Get rich now now now!!!

Graphic: www.milliondollarhomepage.com

Opportunities for Web spam

Spamdexing
Keyword stuffing
Link farms
Scraper, "Made for Advertising" sites
Spam blogs (splogs)
Cloaking
Click spam

Adversarial relationship
Every undeserved gain in ranking for a spammer is a loss of precision for the search engine.

Typical Web Spam (1) [Figure]
Typical Web Spam (2) [Figure]
Hidden text [Figure]
Made for Advertising (1) [Figure]
Search engine? [Figure]
Fake search engine [Figure]
Problem: "normal" pages that are spam [Figure]
Machine Learning [Figure]
Machine Learning (cont.) [Figure]
Feature Extraction [Figure]

Challenges: Machine Learning

Machine Learning Challenges:
Learning with interdependent variables (graph)
Learning with few examples
Scalability

Challenges: Information Retrieval

Information Retrieval Challenges:
Feature extraction: which features?
Feature aggregation: page/host/domain
Feature propagation (graph)
Recall/precision tradeoffs
Scalability

Topological spam: link farms
[Figure]

Single-level farms can be detected by searching for groups of nodes sharing their out-links [Gibson et al., 2005]

Motivation

[Fetterly et al., 2004] hypothesized that studying the distribution of statistics about pages could be a good way of detecting spam pages:

"in a number of these distributions, outlier values are associated with web spam"

Test collection

U.K. collection
18.5 million pages downloaded from the .UK domain
5,344 hosts manually classified (6% of the hosts)

Classified entire hosts:
A few hosts are mixed: spam and non-spam pages
More coverage: sample covers 32% of the pages

In-degree
[Figure: distribution of in-degree, Normal vs. Spam; x-axis: number of in-links (1 to 10000); δ = 0.35]
(δ = max. difference in C.D.F. plot)
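The δ reported with each feature can be computed from two samples as the largest gap between their empirical cumulative distribution functions. The sketch below assumes that the two samples are the feature values of the normal and of the spam hosts; the function name and the sample values are made up for illustration.

```python
from bisect import bisect_right


def max_cdf_difference(sample_a, sample_b):
    """Largest absolute difference between the two empirical C.D.F.s."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    best = 0.0
    for x in sorted(set(a) | set(b)):          # the gap can only change at data points
        cdf_a = bisect_right(a, x) / len(a)    # fraction of sample_a that is <= x
        cdf_b = bisect_right(b, x) / len(b)
        best = max(best, abs(cdf_a - cdf_b))
    return best


# Made-up in-degree samples, only to exercise the function.
normal_in_degree = [1, 2, 3, 4]
spam_in_degree = [3, 4, 5, 6]
delta = max_cdf_difference(normal_in_degree, spam_in_degree)
```

A δ of 0 means the two distributions coincide and a δ of 1 means they do not overlap at all, so larger values point to features that separate the two classes better.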

Out-degree
[Figure: distribution of out-degree, Normal vs. Spam; x-axis: number of out-links (1 to 100); δ = 0.28]

Edge reciprocity
[Figure: reciprocity of max. PR page, Normal vs. Spam; x-axis: fraction of reciprocal links (0 to 1); δ = 0.35]

Assortativity
[Figure: degree / degree of neighbors, Normal vs. Spam; x-axis: degree/degree ratio of home page (0.001 to 1000); δ = 0.31]

Variance of PageRank
Suggested in [Benczúr et al., 2005]
[Figure]

Variance of PageRank of in-neighbors
[Figure: stdev. of PR of neighbors (home), Normal vs. Spam; x-axis: σ² of the logarithm of PageRank (0 to 1); δ = 0.41]

TrustRank

TrustRank [Gyöngyi et al., 2004]
A node with high PageRank, but far away from a core set of "trusted nodes" is suspicious

Start from a set of trusted nodes, then do a random walk, returning to the set of trusted nodes with probability 1 − α at each step

Trusted nodes: data from http://www.dmoz.org/
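The random walk described above differs from PageRank only in where the random jump lands. A minimal sketch, assuming a uniform jump over the trusted set, an adjacency-list graph without sinks, and hypothetical names throughout (this is not the reference implementation of TrustRank):

```python
def trustrank(out_links, trusted, alpha=0.85, iterations=100):
    """Random walk that returns to the trusted set with probability 1 - alpha."""
    n = len(out_links)
    jump = [0.0] * n
    for node in trusted:
        jump[node] = 1.0 / len(trusted)       # assumption: uniform over trusted nodes
    score = jump[:]                           # start from the trusted nodes
    for _ in range(iterations):
        nxt = [(1.0 - alpha) * j for j in jump]
        for i, targets in enumerate(out_links):
            share = alpha * score[i] / len(targets)
            for j in targets:
                nxt[j] += share
        score = nxt
    return score


# Hypothetical graph: nodes 0 and 1 link to each other, node 2 links to 0
# but receives no links, so no trust can reach it.
TOY_GRAPH = [[1], [0], [0]]
trust = trustrank(TOY_GRAPH, trusted={0})
```

Trust only flows along out-links from the trusted set, so a page that nobody reachable from that set links to keeps a score of zero, however many links it creates itself.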

TrustRank Idea
[Figure]

TrustRank score
[Figure: TrustRank score of home page, Normal vs. Spam; x-axis: TrustRank (1e−06 to 0.001); δ = 0.59]

TrustRank / PageRank
[Figure: estimated relative non-spam mass, Normal vs. Spam; x-axis: TrustRank score/PageRank (0.3 to 100); δ = 0.59]

Truncated PageRank

Proposed in [Becchetti et al., 2006b]. Idea: reduce the direct contribution of the first levels of links:

$$\mathrm{damping}(t) = \begin{cases} 0 & t \le T \\ C\,\alpha^{t} & t > T \end{cases}$$

No extra reading of the graph after PageRank
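Because the damping differs from PageRank's only in which path lengths are counted, both scores can be accumulated in the same passes over the graph. The sketch below illustrates this under assumptions that are not on the slide: C is chosen as (1 − α)/α^(T+1) so that the damping sums to 1, the series is cut after `max_t` steps, and the graph has no sinks.

```python
def pagerank_and_truncated(out_links, T, alpha=0.85, max_t=300):
    """Accumulate PageRank and Truncated PageRank in the same graph passes."""
    n = len(out_links)
    c = (1.0 - alpha) / alpha ** (T + 1)      # assumed normalization constant C
    walk = [1.0 / n] * n                      # (alpha**t / N) * 1 * P**t, at t = 0
    pagerank = [0.0] * n
    truncated = [0.0] * n
    for t in range(max_t + 1):
        for i in range(n):
            pagerank[i] += (1.0 - alpha) * walk[i]   # damping (1 - alpha) * alpha**t
            if t > T:
                truncated[i] += c * walk[i]          # damping C * alpha**t, only t > T
        nxt = [0.0] * n
        for i, targets in enumerate(out_links):      # one pass over the links
            share = alpha * walk[i] / len(targets)
            for j in targets:
                nxt[j] += share
        walk = nxt
    return pagerank, truncated


# Hypothetical toy graph without sinks.
TOY_GRAPH = [[1, 2], [2], [0, 1, 3], [0]]
pr, trunc = pagerank_and_truncated(TOY_GRAPH, T=2)
```

The ratio `trunc[i] / pr[i]` is then the kind of per-node feature plotted on the following slide: a page whose score comes mostly from very short paths ends up with a low ratio.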

Truncated PageRank(T=2) / PageRank
[Figure: TruncatedPageRank T=2 / PageRank, Normal vs. Spam; x-axis: TruncatedPageRank(T=2) / PageRank (0.2 to 1.6); δ = 0.30]

Max. change of Truncated PageRank
[Figure: maximum change of Truncated PageRank, Normal vs. Spam; x-axis: max(TrPR_{i+1}/TrPR_i) (0.85 to 1.1); δ = 0.29]

High and low-ranked pages are different
[Figure: number of nodes (×10^4) against distance (1 to 20) for pages ranked in the top 0%-10%, top 40%-50% and top 60%-70%]
Areas below the curves are equal if we are in the same strongly-connected component

Probabilistic counting
[Figure: propagation of bits using the "OR" operation towards a target page; count bits set to estimate supporters]

[Becchetti et al., 2006b] shows an improvement of the ANF algorithm [Palmer et al., 2002] based on probabilistic counting [Flajolet and Martin, 1985]

General algorithm

Require: N: number of nodes, d: distance, k: bits
for node : 1 . . . N, bit : 1 . . . k do
    INIT(node, bit)
end for
for distance : 1 . . . d do {Iteration step}
    Aux ← 0^k
    for src : 1 . . . N do {Follow links in the graph}
        for all links from src to dest do
            Aux[dest] ← Aux[dest] OR V[src,·]
        end for
    end for
    V ← Aux
end for
for node : 1 . . . N do {Estimate supporters}
    Supporters[node] ← ESTIMATE( V[node,·] )
end for
return Supporters
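
The general algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it follows the pseudocode literally, packs each node's k bits into one integer, and numbers nodes from 0 rather than 1. The names `count_supporters`, `one_hot_init` and `popcount` are hypothetical, and the exact-counting setup (one bit per node) is only an assumption chosen so that the result on a tiny graph can be checked by hand.

```python
from typing import Callable, List, Tuple


def count_supporters(
    num_nodes: int,
    links: List[Tuple[int, int]],
    d: int,
    k: int,
    init: Callable[[int, int], bool],
    estimate: Callable[[int], float],
) -> List[float]:
    """Propagate k-bit vectors along links for d steps, then estimate."""
    # V[node] holds the node's k bits packed into one Python int.
    V = [0] * num_nodes
    for node in range(num_nodes):
        for bit in range(k):
            if init(node, bit):
                V[node] |= 1 << bit

    for _ in range(d):  # iteration step
        aux = [0] * num_nodes  # all-zero bit vectors
        for src, dest in links:  # follow links in the graph
            aux[dest] |= V[src]  # bitwise OR of the whole vector
        V = aux

    return [estimate(V[node]) for node in range(num_nodes)]


# Hypothetical exact-counting setup for a tiny graph: one bit per node.
def one_hot_init(node: int, bit: int) -> bool:
    return node == bit


def popcount(bits: int) -> int:
    return bin(bits).count("1")


# Links 0 -> 2, 1 -> 2, 2 -> 3.
links = [(0, 2), (1, 2), (2, 3)]
after_1_step = count_supporters(4, links, d=1, k=4, init=one_hot_init, estimate=popcount)
after_2_steps = count_supporters(4, links, d=2, k=4, init=one_hot_init, estimate=popcount)
```

With one bit per node the count is exact but needs N bits per node; the probabilistic variants replace `init` and `estimate` so that a small fixed k suffices.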

Our estimator

Initialize all bits to one with probability $\epsilon$

Estimator: $\text{neighbors}(node) = \log_{(1-\epsilon)} \left( 1 - \frac{\text{ones}(node)}{k} \right)$

Adaptive estimation
Repeat the above process for $\epsilon$ = 1/2, 1/4, 1/8, . . . , and look
for the transitions from more than (1 − 1/e)k ones to less
than (1 − 1/e)k ones.
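
As a hedged illustration of the estimator, the sketch below evaluates the base-(1 − ε) logarithm of (1 − ones/k) and scans a halving sequence of ε values for the first drop below (1 − 1/e)k ones. The function names and the example counts are hypothetical, and the scan is only one plausible reading of "look for the transitions", not the authors' code. One reason the threshold is natural: when the number of neighbors is close to 1/ε, the expected fraction of bits still at zero is (1 − ε)^(1/ε) ≈ 1/e.

```python
import math
from typing import List, Optional, Tuple


def estimate_neighbors(ones: int, k: int, eps: float) -> float:
    """Invert ones/k = 1 - (1 - eps)**n to recover n."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    if not 0 <= ones < k:
        raise ValueError("the estimate is undefined when all k bits are set")
    return math.log(1.0 - ones / k) / math.log(1.0 - eps)


def transition_epsilon(ones_by_eps: List[Tuple[float, int]], k: int) -> Optional[float]:
    """Return the first eps whose count falls below (1 - 1/e) * k
    right after a count above it; eps values are given in decreasing order."""
    threshold = (1.0 - 1.0 / math.e) * k
    previous_above = False
    for eps, ones in ones_by_eps:
        if previous_above and ones < threshold:
            return eps
        previous_above = ones > threshold
    return None


# Hypothetical numbers: with eps = 1/2 and k = 64, 48 ones give
# log_{0.5}(1 - 48/64) = log_{0.5}(0.25) = 2 neighbors.
two_neighbors = estimate_neighbors(ones=48, k=64, eps=0.5)

# Hypothetical counts of ones observed for eps = 1/2, 1/4, 1/8, 1/16.
runs = [(0.5, 60), (0.25, 50), (0.125, 30), (0.0625, 17)]
eps_at_transition = transition_epsilon(runs, k=64)
```

The estimator is only usable while some bits remain at zero, which is why a saturated vector is rejected here and why smaller values of ε are needed for nodes with many supporters.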

Convergence

[Figure: fraction of nodes with estimates (0% to 100%) versus iteration (5 to 20), one curve per distance d = 1, . . . , 8.]

Error rate

[Figure: average relative error (0 to 0.5) versus distance (1 to 8) for four methods: Ours 64 bits, epsilon-only estimator; Ours 64 bits, combined estimator; ANF 24 bits × 24 iterations (576 b×i); ANF 24 bits × 48 iterations (1152 b×i). Points on the curves for our method are labeled with their cost, from 512 b×i to 1408 b×i.]

Hosts at distance 4

[Figure: "Hosts at Distance Exactly 4". Distribution of S4 − S3 (x-axis ticks 1, 100, 1000; y-axis 0 to 0.4) for normal and spam hosts; δ = 0.39.]

Minimum change of supporters

[Figure: "Minimum change of supporters". Distribution of min(S2/S1, S3/S2, S4/S3) (x-axis ticks 1, 5, 10; y-axis 0 to 0.4) for normal and spam hosts; δ = 0.39.]
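
The two attributes plotted on these slides can be computed from the supporter estimates as sketched below. This is an illustration under assumptions: S1 to S4 are taken to be the estimated numbers of supporters within distances 1 to 4 (as the plot title "Hosts at Distance Exactly 4" for S4 − S3 suggests), and the function name and example values are hypothetical.

```python
from typing import Sequence, Tuple


def supporter_features(s: Sequence[float]) -> Tuple[float, float]:
    """Given S1..S4, return (S4 - S3, min(S2/S1, S3/S2, S4/S3))."""
    if len(s) != 4 or any(x <= 0 for x in s):
        raise ValueError("expected four positive supporter counts S1..S4")
    s1, s2, s3, s4 = s
    return s4 - s3, min(s2 / s1, s3 / s2, s4 / s3)


# Hypothetical host with 10, 50, 200 and 300 supporters within distances 1..4.
hosts_at_4, min_change = supporter_features([10, 50, 200, 300])
```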

Detection rates

Detection rate of 60% (UK-2006) to 80% (UK-2002), with an error rate of 4% to 2%, by combining different attributes [Becchetti et al., 2006a].

✗ No magic bullet in link analysis
✗ Precision still low compared to e-mail spam filters
✓ Measure both home page and max. PageRank page
✓ Host-based counts of neighbors are important

Next step: combine link analysis and content analysis

Upcoming Web Spam Challenge on UK-2006

We asked 20+ volunteers to classify entire hosts
We provided several examples
We asked them to classify each host as normal / borderline / spam
Do they agree? Mostly . . .

Agreement between humans

Result: first public Web Spam collection

Public spam collection
Web graph with ∼80 million pages
∼11,000 hosts
Labels for ∼4,000 hosts by at least 2 humans each

Upcoming Web Spam challenge
Machine learning
Information retrieval
webspam-announces-subscribe@yahoogroups.com

Thank you!

Baeza-Yates, R., Boldi, P., and Castillo, C. (2006a).
Generalizing PageRank: Damping functions for link-based ranking algorithms.
In Proceedings of ACM SIGIR, pages 308–315, Seattle, Washington, USA. ACM Press.

Baeza-Yates, R., Castillo, C., and Efthimiadis, E. (2006b).
Characterization of national web domains.
To appear in ACM TOIT.

Baeza-Yates, R. and Poblete, B. (2006).
Dynamics of the Chilean web structure.
Comput. Networks, 50(10):1464–1473.

Barabási, A.-L. (2002).
Linked: The New Science of Networks.
Perseus Books Group.

Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and Baeza-Yates, R. (2006a).
Link-based characterization and detection of Web Spam.
In Second International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), Seattle, USA.

Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and Baeza-Yates, R. (2006b).
Using rank propagation and probabilistic counting for link-based spam detection.
In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD), Pennsylvania, USA. ACM Press.

Benczúr, A. A., Csalogány, K., Sarlós, T., and Uher, M. (2005).
SpamRank: fully automatic link spam detection.
In Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web, Chiba, Japan.

Boldi, P., Santini, M., and Vigna, S. (2005).
PageRank as a function of the damping factor.
In Proceedings of the 14th international conference on World Wide Web, pages 557–566, Chiba, Japan. ACM Press.

Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. (2000).
Graph structure in the web: Experiments and models.
In Proceedings of the Ninth Conference on World Wide Web, pages 309–320, Amsterdam, Netherlands. ACM Press.

Fetterly, D., Manasse, M., and Najork, M. (2004).
Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages.
In Proceedings of the seventh workshop on the Web and databases (WebDB), pages 1–6, Paris, France.

Flajolet, P. and Martin, N. G. (1985).
Probabilistic counting algorithms for data base applications.
Journal of Computer and System Sciences, 31(2):182–209.

Gibson, D., Kumar, R., and Tomkins, A. (2005).
Discovering large dense subgraphs in massive graphs.
In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 721–732. VLDB Endowment.

Gyöngyi, Z., Garcia-Molina, H., and Pedersen, J. (2004).
Combating web spam with TrustRank.
In Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB), pages 576–587, Toronto, Canada. Morgan Kaufmann.

Newman, M. E., Strogatz, S. H., and Watts, D. J. (2001).
Random graphs with arbitrary degree distributions and their applications.
Physical Review E, 64(2 Pt 2).

Palmer, C. R., Gibbons, P. B., and Faloutsos, C. (2002).
ANF: a fast and scalable tool for data mining in massive graphs.
In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 81–90, New York, NY, USA. ACM Press.

Tauro, L., Palmer, C., Siganos, G., and Faloutsos, M. (2001).
A simple conceptual model for the internet topology.
In Global Internet, San Antonio, Texas, USA. IEEE CS Press.
