Fractality of Massive Graphs: Scalable Analysis with Sketch-Based Box-Covering Algorithm
1. Fractality of Massive Graphs: Scalable Analysis with Sketch-Based Box-Covering Algorithm
Takuya Akiba (Preferred Networks, Inc.)
Kenko Nakamura (Recruit Communications Co., Ltd.)
Taro Takaguchi (National Institute of Information and Communications Technology)
*Work done while all authors were at the National Institute of Informatics
2. Akiba+ | Fractality of Massive Graphs: Scalable Analysis with Sketch-Based Box-Covering Algorithm
Fractality of networks
Some real-world networks are fractal.
[Song+, Nature’05]
3. Definition of Graph Fractality
▶ box := set of vertices within a radius of ℓ
▶ b(ℓ) := number of boxes needed to cover the whole graph
▶ A graph is said to be fractal ⇔ b(ℓ) ∝ ℓ^{-d}
[Figure: fractal network model]
4. Box-Covering Problem
▶ b(ℓ) := number of boxes needed to cover the whole graph
Box-Covering Problem: determination of fractality
▶ Minimize b(ℓ)
▶ The box-covering problem is NP-hard
▶ Approximation algorithms are used
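A minimal sketch of one such approximation, assuming boxes are balls of radius ℓ around each vertex and the graph is given as an adjacency dictionary. The function names `ball` and `greedy_box_cover` and the toy path graph are illustrative, not taken from the slides. It instantiates every box by BFS and then picks boxes greedily, so it needs quadratic space in the worst case.

```python
from collections import deque


def ball(adj, center, radius):
    """Return the set of vertices within `radius` hops of `center` (BFS)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def greedy_box_cover(adj, radius):
    """Greedy set cover over all radius-`radius` balls; returns the chosen centers."""
    boxes = {v: ball(adj, v, radius) for v in adj}  # explicit boxes
    uncovered = set(adj)
    centers = []
    while uncovered:
        # Pick the box that covers the most still-uncovered vertices.
        best = max(boxes, key=lambda v: len(boxes[v] & uncovered))
        centers.append(best)
        uncovered -= boxes[best]
    return centers


# Toy example (hypothetical data): a path graph 0-1-2-3-4-5-6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
num_boxes = len(greedy_box_cover(path, 1))
```

The length of the returned list is an upper bound on b(ℓ) for that radius; repeating the call for several radii gives the points that are later fitted against a power law.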
5. Box-Covering Problem
Previous Algorithms
computation time is too long!
infeasible for networks with millions of vertices
This Work
near-linear time complexity
works with tens of millions of vertices
6. Compared with Previous Method
Previous Naive Method [Song+’05]
▶ Step 1: Instantiate all boxes
– BFS from each vertex
▶ Step 2: Solve set cover problem
– Greedy algorithm with approximation ratio 1 + ln n
Proposed Method
▶ Step 1: Instantiate Min-Hash of all boxes
– Similar to algorithms for All-Distances Sketches
▶ Step 2: Solve set cover problem in the sketch-space
– Near-linear time complexity by using BST and Heap
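To illustrate Step 1 of the proposed method, here is a rough sketch of bottom-k Min-Hash sketches for all radius-ℓ boxes. It is not the authors' implementation: it uses simple round-by-round merging of neighbor sketches rather than the pruned searches used for All-Distances Sketches, and the names `ball_sketches`, `estimate_size`, `k`, and `seed` are assumptions made for this example. It relies on the fact that the k smallest ranks of a union can be computed from the k smallest ranks of each part.

```python
import heapq
import random


def ball_sketches(adj, radius, k, seed=0):
    """Bottom-k Min-Hash sketch of the radius-`radius` ball around every vertex.

    Each vertex gets a random rank in [0, 1). The sketch of a set is the
    sorted list of its k smallest ranks.
    """
    rng = random.Random(seed)
    rank = {v: rng.random() for v in adj}
    sketch = {v: [rank[v]] for v in adj}  # radius 0: the vertex itself
    for _ in range(radius):
        new = {}
        for v in adj:
            # Ball(v, r) is the union of Ball(v, r-1) and Ball(u, r-1)
            # over all neighbors u, so merging their sketches suffices.
            merged = set(sketch[v])
            for u in adj[v]:
                merged.update(sketch[u])
            new[v] = heapq.nsmallest(k, merged)
        sketch = new
    return rank, sketch


def estimate_size(sk, k):
    """Estimate the size of a set from its bottom-k sketch."""
    if len(sk) < k:
        return float(len(sk))  # the sketch holds the whole set: exact
    return (k - 1) / sk[-1]    # standard bottom-k cardinality estimator
```

Each sketch stores at most `k` values regardless of how large the box is, which is what avoids the explicit representation of boxes. Step 2, the greedy selection in the sketch space with a binary search tree and a heap, is not shown here.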
7. Experimental Results
Environment: Intel Xeon 2.67GHz, 96GB
At least 10 times faster than the previous algorithms
[Figures: computation time and memory usage; left: flower model, right: BA model]
8. Real Large Network
▶ Web graph with 1M vertices and 17M edges (in-2004)
– 11.7 hours in total
▶ Fractality analysis of a million-scale network for the first time
9. Summary
Background: Fractality of real-world networks
▶ Some real-world networks are fractal.
▶ Lack of an efficient algorithm
Proposed Method: Box-Covering on Min-Hash
▶ Avoid explicit representation of boxes
▶ Efficient Min-Hash computation: similar to ADS
▶ Efficient greedy algorithm using a binary search tree and a heap
▶ Fractality analysis of a network with 17M edges
Editor's Notes
Welcome to my presentation.
I am Kenko Nakamura, a software engineer at Recruit Communications.
Today, I would like to talk about the fractality of massive graphs and its scalable analysis with a sketch-based box-covering algorithm.
For data mining on networks,
we can use many kinds of properties of networks,
such as vertex degree, average distances and so on.
As a non-local property,
the fractality of complex networks was found in network science.
The fractality of a network suggests that
the network has a self-similar structure, like the one shown here.
This is the definition of graph fractality.
The set of vertices within a radius of ℓ is called a “box”.
Then, if the number of boxes follows a power-law function of ℓ,
the network is said to be fractal.
This figure illustrates the comparison for a fractal network model.
The plotted numbers of boxes are closer to the power-law function
than to the exponential function.
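One simple way to make this comparison concrete is sketched below, under the assumption that both candidates are fitted by ordinary least squares in log space: a power law b(ℓ) ∝ ℓ^{-d} is a straight line in (log ℓ, log b), while an exponential is a straight line in (ℓ, log b). The function names and the synthetic data are illustrative only, not the fitting procedure or results of this work.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = a*x + c; returns (a, c, sum of squared residuals)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = my - a * mx
    sse = sum((y - (a * x + c)) ** 2 for x, y in zip(xs, ys))
    return a, c, sse


def compare_fits(ells, counts):
    """Fit log b against log l (power law) and against l (exponential)."""
    log_b = [math.log(b) for b in counts]
    slope_pl, _, sse_pl = linear_fit([math.log(l) for l in ells], log_b)
    _, _, sse_exp = linear_fit(list(ells), log_b)
    return {
        "d": -slope_pl,              # estimated fractal dimension
        "sse_power_law": sse_pl,     # smaller value means a better fit
        "sse_exponential": sse_exp,
    }


# Synthetic data that follows b(l) = 1000 * l^(-2) exactly.
ells = list(range(1, 9))
counts = [1000 * l ** -2 for l in ells]
result = compare_fits(ells, counts)
```

For a fractal network the power-law residual should be clearly smaller than the exponential one, and `d` is the slope-based estimate of the exponent.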
Determination of the fractality is based on the box-covering problem.
We have to minimize the number of boxes.
However, it is known to be an NP-hard problem.
So, to determine the fractality of networks,
approximation algorithms are used.
With previous algorithms, the computation time is too long.
Because they generate all boxes explicitly, which takes quadratic space,
they are infeasible for large-scale networks with millions of vertices.
In this work,
our algorithm achieves near-linear time complexity
and works with tens of millions of vertices.
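A rough worked example of why quadratic space is prohibitive. All figures here are illustrative assumptions, not measurements from this work: n = 10^6 vertices, a radius at or beyond the graph diameter so that every box contains all n vertices, 4 bytes per stored entry, and a hypothetical fixed sketch size of k = 128 per vertex.

```latex
\begin{align*}
\text{explicit boxes: } & n^2 = (10^6)^2 = 10^{12} \text{ vertex IDs}
  \approx 4\,\text{TB at 4 bytes per ID} \\
\text{sketches: } & nk = 10^6 \times 128 = 1.28 \times 10^{8} \text{ entries}
  \approx 0.5\,\text{GB at 4 bytes per entry}
\end{align*}
```

Under these assumptions the explicit representation is far beyond the 96 GB machine used in the experiments, while fixed-size sketches fit comfortably.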
Compared with the previous method, our method differs in two points.
First, all boxes are represented as Min-Hash sketches
instead of being instantiated explicitly.
The generation algorithm is similar to the one used for All-Distances Sketches.
Second, the set cover problem is solved in the sketch space.
These are the experimental results, compared with previous methods.
Our method is shown by the red lines.
These figures are plotted on a log-log scale.
The left figures are for fractal networks,
and the right figures are for non-fractal networks.
They show that our algorithm runs at least 10 times faster
than the previous algorithms.
This is the experimental result for a large real-world network.
This network is a crawled web graph with 1M vertices and 17M edges.
A large part of the points fall on the line of the fitted power-law function,
which suggests the fractality of this network.
The fractality of a million-scale network is unveiled for the first time.