Fractality of Massive Graphs: Scalable Analysis with Sketch-Based Box-Covering Algorithm


Presentation for ICDM 2016

Takuya Akiba (Preferred Networks, Inc.)
Kenko Nakamura (Recruit Communications Co., Ltd.)
Taro Takaguchi (National Institute of Information and Communications Technology)
*Work done while all authors were at the National Institute of Informatics

Akiba+ | Fractality of Massive Graphs: Scalable Analysis with Sketch-Based Box-Covering Algorithm
Fractality of networks
Some real-world networks are fractal. [Song+, Nature’05]

Definition of Graph Fractality
▶ box := set of vertices within a radius of ℓ
▶ b(ℓ) := number of boxes needed to cover the whole graph
▶ graph said to be fractal ⇔ b(ℓ) ∝ ℓ^{−d}
(Figure: fractal network model)

Box-Covering Problem
▶ b(ℓ) := number of boxes needed to cover the whole graph
Box-Covering Problem: Determination of the fractality
▶ Minimize b(ℓ)
▶ Box-Covering Problem is NP-hard
▶ Approximation algorithms are used

Box-Covering Problem
Previous Algorithms
computation time is too long!
infeasible for networks with millions of vertices
This Work
near-linear time complexity
works with tens of millions of vertices

Compared with Previous Method
Previous Naive Method [Song+’05]
▶ Step 1: Instantiate all boxes
BFS from each vertex
▶ Step 2: Solve set cover problem
Greedy algorithm with approximation ratio 1 + ln n
Proposed Method
▶ Step 1: Instantiate Min-Hash of all boxes
Similar to algorithms for All-Distances Sketches
▶ Step 2: Solve set cover problem in the sketch-space
Near-linear time complexity by using a binary search tree (BST) and a heap
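The two steps of the naive baseline above can be sketched in a few lines of Python. This is only an illustration, not the code of [Song+’05] or of the authors: the function names `ball` and `greedy_box_cover`, the adjacency-dict graph format, and the tie-breaking rule (smallest vertex id) are assumptions made here.

```python
from collections import deque


def ball(adj, center, radius):
    """Return the set of vertices within `radius` hops of `center` (plain BFS)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def greedy_box_cover(adj, radius):
    """Step 1: build every box explicitly. Step 2: greedy set cover."""
    boxes = {v: ball(adj, v, radius) for v in adj}
    uncovered = set(adj)
    centers = []
    while uncovered:
        # Pick the box covering the most uncovered vertices
        # (ties go to the smallest vertex id because of sorted()).
        best = max(sorted(boxes), key=lambda c: len(boxes[c] & uncovered))
        centers.append(best)
        uncovered -= boxes[best]
    return centers


# Toy input: the path graph 0-1-2-3-4-5-6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
centers = greedy_box_cover(path, radius=1)  # b(1) is len(centers)
```

Because `boxes` holds every ball explicitly, memory grows with the total size of all balls, which is the cost the proposed method avoids by working on Min-Hash sketches instead.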

Experimental Results
Environment: Intel Xeon 2.67GHz, 96GB
(Figures: computation time and memory usage, for the flower model and the BA model)
10 times faster than the previous algorithms

Real Large Network
▶ Web graph with 1M vertices and 17M edges (in-2004)
– 11.7 hours in total
▶ Fractality analysis of million-scale network for the first time

Summary
Background: Fractality of real-world network
▶ Some of the real-world networks are fractal.
▶ Lack of an efficient algorithm
Proposed Method: Box-Covering on Min-Hash
▶ Avoid explicit representation of boxes
▶ Efficient Min-Hash computation: Similar to ADS
▶ Efficient Greedy by Binary Search Tree and Heap
▶ Fractality analysis of the network with 17M edges

Editor's Notes

Welcome to my presentation. I am Kenko Nakamura, a software engineer at Recruit Communications. Today, I would like to talk about the fractality of massive graphs and its scalable analysis with a sketch-based box-covering algorithm.
For data mining on networks, we can use many kinds of network properties, such as vertex degree and average distance. Fractality, a non-local property of complex networks, was discovered in network science. The fractality of a network suggests that the network has a self-similar structure (like the one shown on the slide).
This is the definition of graph fractality. The set of vertices within a radius of ℓ is called a “box”. If the number of boxes follows a power-law function of ℓ, the network is said to be fractal. This figure illustrates the comparison for a fractal network model: the plotted box counts are closer to the power-law function than to the exponential function.
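The power-law versus exponential comparison in this note can be checked numerically by fitting a straight line to log b(ℓ) against log ℓ (power law) and against ℓ (exponential) and comparing the residuals. The following is a minimal sketch under assumptions: the function names are invented here, the box counts are synthetic values that follow an exact power law, and a plain least-squares fit stands in for whatever fitting procedure the authors used.

```python
import math


def linear_fit(xs, ys):
    """Least-squares fit y = a*x + c; returns (a, c, sum of squared residuals)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = mean_y - a * mean_x
    sse = sum((y - (a * x + c)) ** 2 for x, y in zip(xs, ys))
    return a, c, sse


def compare_fits(radii, box_counts):
    """Fit log b against log l (power law) and against l (exponential)."""
    log_b = [math.log(b) for b in box_counts]
    slope_pl, _, sse_pl = linear_fit([math.log(r) for r in radii], log_b)
    _, _, sse_ex = linear_fit(list(radii), log_b)
    # For a power law b ~ l^(-d), the log-log slope is -d.
    return {"d": -slope_pl, "sse_power_law": sse_pl, "sse_exponential": sse_ex}


# Synthetic (made-up) box counts following b = 1000 * l^(-2).
radii = [1, 2, 4, 8, 16]
counts = [1000 * r ** -2 for r in radii]
result = compare_fits(radii, counts)
```

On this synthetic input the power-law residual is essentially zero while the exponential residual is not, which mirrors the visual comparison on the slide; radii must be positive for the logarithm to be defined.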
Determination of the fractality is based on the box-covering problem. We have to minimize the number of boxes. However, it is known to be an NP-hard problem. So, to determine the fractality of networks, approximation algorithms are used.
In previous algorithms, the computation time is too long. Because they generate all boxes explicitly, which takes quadratic space, they are infeasible for large-scale networks with millions of vertices. In this work, our algorithm achieves near-linear time complexity and works with tens of millions of vertices.
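As a rough back-of-the-envelope check of the quadratic-space remark, assume a worst case in which each of the n boxes is stored as an n-bit membership vector. This storage layout is an assumption made for illustration, not a description of the previous implementations.

```latex
\[
n = 10^{6}
\;\Rightarrow\;
n^{2} = 10^{12}\ \text{bits}
= 1.25 \times 10^{11}\ \text{bytes}
= 125\ \text{GB}
\]
```

Under that assumption, a graph with one million vertices would already exceed the 96GB listed for the experimental environment, assuming that figure refers to main memory.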
Compared with the previous method, there are two differences. First, in our method, all boxes are generated as Min-Hash sketches. This generation algorithm is similar to the one used for All-Distances Sketches. Second, the set cover problem is solved in the sketch space.
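To make the first difference concrete, here is a small Python sketch of how bottom-k Min-Hash sketches of all boxes can be obtained without materializing the boxes, using a pruned BFS in increasing rank order in the style of All-Distances Sketches. It is an illustration under assumptions, not the authors' implementation: the function names, the parameter `k`, the random-permutation ranks, and the 5x5 grid graph are all hypothetical, and the set cover step in the sketch space is not shown.

```python
import random
from collections import deque


def all_distances_sketches(adj, k, seed=0):
    """For every vertex v, collect (distance, rank) pairs of the vertices whose
    rank is among the k smallest within that distance of v (pruned BFS)."""
    order = list(adj)
    random.Random(seed).shuffle(order)  # order[r] is the vertex with rank r
    ads = {v: [] for v in adj}
    for rank, source in enumerate(order):  # process ranks in increasing order
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            d = dist[v]
            # Prune: v already has k lower-ranked vertices within distance d,
            # so this source cannot enter the sketch of v or of anything
            # reached through v.
            if sum(1 for dv, _ in ads[v] if dv <= d) >= k:
                continue
            ads[v].append((d, rank))
            for w in adj[v]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return ads, order


def ball_sketch(ads_v, radius, k):
    """Bottom-k Min-Hash sketch (k smallest ranks) of the ball of `radius` around v."""
    return sorted(r for d, r in ads_v if d <= radius)[:k]


# Hypothetical test graph: a 5x5 grid with 4-neighbour connectivity.
grid = {
    (x, y): [(x + dx, y + dy)
             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= x + dx < 5 and 0 <= y + dy < 5]
    for x in range(5) for y in range(5)
}
ads, order = all_distances_sketches(grid, k=3)
sketch = ball_sketch(ads[(2, 2)], radius=2, k=3)  # 3 smallest ranks within 2 hops
```

One pass of `all_distances_sketches` serves every radius: the sketch of a box of any size ℓ is read off afterwards with `ball_sketch`.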
These are the experimental results in comparison with previous methods. Our method is shown as the red lines. The figures are plotted on a log-log scale. The left figures are for fractal networks, and the right figures are for non-fractal networks. They show that our algorithm runs at least 10 times faster than the previous algorithms.
This is the experimental result for a large real-world network. This network is a crawled web graph with 1M vertices and 17M edges. A large part of the points fall on the line of the fitted power-law function, which suggests that this network is fractal. The fractality of a million-scale network is unveiled for the first time.