The document analyzes the degree distribution of nodes in real-world networks using probabilistic models. It studies networks from datasets like Amazon, California roads, DBLP co-authorships, and others. Maximum likelihood and information criteria are used to determine the best fitting distributions, including Altmann, discrete Weibull, and MOEZipf. The analysis finds the MOEZipf distribution provides the best fit for most networks, followed by the discrete Weibull and Altmann distributions. Future work is proposed to integrate the best fitting distributions into a graph generation tool and analyze correlations between degree distribution and other network structures.
1. Analysing the degree distribution of real graphs by means of several probabilistic models
A. Duarte-López, A. Prat-Pérez, M. Pérez-Casany
DAMA-UPC, Universitat Politècnica de Catalunya
18th March, 2015
2. Introduction
Networks are complex structures in which the connections among
the nodes can exhibit complicated patterns.
Given a graph G with set of nodes V, the node degree is the number
of connections that a node has.
Objective: To determine the probability distribution of the degree
random variable of a real graph.
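As a small illustration of the definition above, the sketch below counts node degrees from an undirected edge list and tabulates how many nodes share each degree value. The function names and the toy edge list are hypothetical and are not taken from the studied networks; nodes that appear in no edge are not listed.

```python
from collections import Counter


def node_degrees(edges):
    """Return {node: degree} for an undirected edge list of (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        # Each edge contributes one connection to both of its endpoints.
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def degree_frequencies(degrees):
    """Return {degree value: number of nodes having that degree}."""
    return dict(Counter(degrees.values()))


# Hypothetical toy graph, used only to show the calls.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
degrees = node_degrees(edges)
frequencies = degree_frequencies(degrees)
```

The values in `degrees` form the sample of the degree random variable; `frequencies` is the empirical counterpart of the distribution to be modelled.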
3. Networks
Several real-life networks from SNAP were studied.
Network              Nodes      Edges       Type
Amazon               262111     1234876     Directed
CA roadnet           1965206    5533213     Undirected
DBLP Co-authorship   317080     1049865     Undirected
Live Journal         3997962    34681188    Undirected
NotreDame            325729     1497133     Directed
Patents              3774767    16518947    Directed
TX roadnet           1379917    3843319     Undirected
Wikipedia            2394385    5021409     Directed
Youtube              1134890    2987623     Undirected
4. Probabilistic Models
The non-negative integer probability distributions considered
are:
Uni-parametric Models    Bi-parametric Models
Geometric                Right-truncated Zipf
Poisson                  MOEZipf
Zipf                     Altmann
                         Negative Binomial
                         Discrete Weibull
All models whose domain contains zero were zero-truncated, so
that their support starts at 1.
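Zero truncation can be sketched as follows, assuming the usual construction in which the probability of zero is removed and the remaining probabilities are rescaled by 1 - p(0). The helper names are hypothetical, and the Poisson and geometric models are used only because their probability functions are short to write; the slides do not show how the truncation was implemented.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability function on {0, 1, 2, ...}."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def geometric_pmf(k, p):
    """Geometric probability function on {0, 1, 2, ...} (failures before a success)."""
    return p * (1.0 - p) ** k


def zero_truncated(pmf):
    """Turn a probability function on {0, 1, ...} into its zero-truncated version."""
    def truncated(k, *params):
        if k < 1:
            return 0.0
        # Rescale by the probability mass that remains once zero is excluded.
        return pmf(k, *params) / (1.0 - pmf(0, *params))
    return truncated


zt_poisson = zero_truncated(poisson_pmf)
zt_geometric = zero_truncated(geometric_pmf)

# The truncated probabilities should again sum to (approximately) one.
total = sum(zt_poisson(k, 2.5) for k in range(1, 60))
```

The same wrapper applies to any model defined on the non-negative integers, as long as its probability function can be evaluated at zero.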
5. Estimation and Goodness of Fit
Maximum likelihood method.
Given a sample k = (k_1, k_2, ..., k_N) of node degrees, we
maximize the log-likelihood:
\[ l(\theta; k) = \sum_{i=1}^{N} \log\big(p_\theta(k_i)\big) \]
Let M be the dimension of the parameter vector.
Akaike Information Criterion
\[ \mathrm{AIC} = -2\, l(\hat{\theta}; k) + 2M \, \frac{N}{N - M - 1} \]
Bayesian Information Criterion
\[ \mathrm{BIC} = -2\, l(\hat{\theta}; k) + M \log(N) \]
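The formulas above can be sketched in code as follows. The example uses the zero-truncated geometric model because its maximum likelihood estimate has a closed form (the reciprocal of the sample mean); the other models would generally need a numerical optimiser, which is not shown. The function names and the sample are hypothetical and serve only as an illustration.

```python
import math


def log_likelihood(pmf, params, sample):
    """l(theta; k): sum of log-probabilities over the sample."""
    return sum(math.log(pmf(k, *params)) for k in sample)


def aic(loglik, m, n):
    """AIC with the factor N / (N - M - 1), as written on the slide."""
    return -2.0 * loglik + 2.0 * m * n / (n - m - 1)


def bic(loglik, m, n):
    """BIC = -2 l + M log(N)."""
    return -2.0 * loglik + m * math.log(n)


def zt_geometric_pmf(k, p):
    """Zero-truncated geometric probability function on {1, 2, ...}."""
    return p * (1.0 - p) ** (k - 1)


# Hypothetical degree sample, not taken from any of the studied networks.
sample = [1, 1, 2, 1, 3, 5, 1, 2, 8, 1]
n = len(sample)
p_hat = n / sum(sample)  # closed-form maximum likelihood estimate
loglik = log_likelihood(zt_geometric_pmf, (p_hat,), sample)
aic_value = aic(loglik, 1, n)
bic_value = bic(loglik, 1, n)
```

Lower AIC and BIC values indicate a better trade-off between fit and number of parameters, so models are compared by fitting each one to the same sample and ranking the resulting values.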
6. Results
[Figure: Youtube degree distribution (log-log scale). x-axis: Degree; y-axis: Frequency. Series: observations and the fitted Altmann, Discrete Weibull, and MOEZipf distributions.]
Parameters
Distribution        Parameters
Altmann             $\hat{\gamma} = 1.64$; $\hat{\delta} = 0.0036$
Discrete Weibull    $\hat{v} = 0.1424$; $\hat{p} = 0.0044$
MOEZipf             $\hat{\gamma} = 2.089$; $\hat{\beta} = 2.4101$
Goodness of fit
Distribution        AIC           ΔAIC
Altmann             1688294.56    13402.66
Discrete Weibull    1676831.47    1939.57
MOEZipf             1674891.9     0

Distribution        BIC           ΔBIC
Altmann             1688316.23    13402.66
Discrete Weibull    1676853.14    1939.57
MOEZipf             1674913.57    0
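The Δ columns are consistent with the common convention of subtracting the smallest criterion value from each model's value, so that the best model has Δ = 0. A short check under that assumption, using the Youtube figures reported above (the function name is hypothetical):

```python
def deltas(scores):
    """Difference between each model's score and the best (smallest) score."""
    best = min(scores.values())
    return {name: round(value - best, 2) for name, value in scores.items()}


# AIC and BIC values reported for the Youtube network.
aic_scores = {
    "Altmann": 1688294.56,
    "Discrete Weibull": 1676831.47,
    "MOEZipf": 1674891.9,
}
bic_scores = {
    "Altmann": 1688316.23,
    "Discrete Weibull": 1676853.14,
    "MOEZipf": 1674913.57,
}

delta_aic = deltas(aic_scores)
delta_bic = deltas(bic_scores)
best_model = min(aic_scores, key=aic_scores.get)
```

Both criteria rank the three models in the same order for this network, with MOEZipf first.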
7. Conclusions
The models giving the best fits are:
MOEZipf model (54%)
Zero-truncated discrete Weibull model (38%)
Altmann model (8%)
8. Future work
To integrate into DataGen (LDBC project) functions that
generate random graphs whose node degrees follow the
MOEZipf, the zero-truncated discrete Weibull, or the
Altmann distribution.
To analyse the correlation between the degree distribution
and some other structural characteristics of the network
such as the clustering coefficient, the degree assortativity,
etc.
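One possible first step towards the graph generation item is drawing a degree sequence from a fitted distribution. The sketch below uses inverse-transform sampling over a finite range of degrees; it is not DataGen code, and the cut-off `max_degree`, the function names, and the zero-truncated geometric stand-in model are all assumptions made for illustration. Turning the sampled degrees into an actual graph is a separate step that is not shown.

```python
import bisect
import itertools
import random


def zt_geometric_pmf(k, p):
    """Zero-truncated geometric probability function on {1, 2, ...} (stand-in model)."""
    return p * (1.0 - p) ** (k - 1)


def build_cdf(pmf, params, max_degree):
    """Cumulative probabilities for degrees 1..max_degree, renormalised to end at 1."""
    weights = [pmf(k, *params) for k in range(1, max_degree + 1)]
    total = sum(weights)
    return list(itertools.accumulate(w / total for w in weights))


def sample_degrees(cdf, n, rng):
    """Draw n degrees by inverse-transform sampling from the tabulated CDF."""
    top = len(cdf) - 1
    # bisect_left returns the first index whose cumulative probability is >= u;
    # index i corresponds to degree i + 1. The min() guards against rounding
    # error in the last cumulative value.
    return [min(bisect.bisect_left(cdf, rng.random()), top) + 1 for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
cdf = build_cdf(zt_geometric_pmf, (0.4,), max_degree=1000)
degrees = sample_degrees(cdf, 10_000, rng)
```

Any of the three best-fitting models could be plugged in through the `pmf` argument once its probability function is available.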
9. Why should you come to our poster?
You will find a relatively easy approach that allows you to
get more information about your network structure.
You can share with us your experience with respect to the
node degree distribution.
Further potential collaborations.