The problem of finding bicliques in graphs arises in areas such as bioinformatics, social networks, citation networks and data mining. Most of the algorithms in the literature are based on enumerating bicliques, but a graph may contain an exponential number of them, and the maximum edge biclique problem is NP-complete. When looking for maximum edge bicliques on large graphs, enumeration-based algorithms may therefore face an explosive number of enumerated bicliques. Even though very efficient enumeration-based solver packages exist, there is also a need for a solver that reports just a single maximum edge biclique. We contribute an integer programming based algorithm for finding the maximum edge biclique. Our algorithm, called BIIP, makes use of quaternary search and other optimizations to locate an optimal solution by making calls to an integer programming solver. We provide timing results of our solver on graphs from the bioinformatics field. Our BIIP solver is available at https://github.com/melihsozdinler/biip.
Finding Maximum Edge Biclique in Bipartite Networks by Integer Programming
1. FINDING MAXIMUM EDGE BICLIQUE IN BIPARTITE NETWORKS BY INTEGER PROGRAMMING
CSE 2018 – 21st IEEE International Conference on Computational Science and Engineering
Melih Sözdinler
30 October 2018
2. PERSONAL INTRODUCTION
PhD Candidate at Boğaziçi University, Computer Engineering Department, İstanbul
My Advisor: Professor Can ÖZTURAN, co-author of this paper
MSc at Işık University, Computer Engineering Department, İstanbul
BSc at Işık University, Computer Engineering Department, İstanbul
Working as Senior Software and System Engineer at Huawei Turkey R&D, İstanbul
3. PAPER TERMINOLOGY
Bipartite Graphs/Networks – A Two Layered Graph
A bipartite graph 𝐺 is a special graph whose vertices, 𝑉, can be divided into two independent sets, 𝑉𝑙 and 𝑉𝑟, such that every edge of the graph connects one vertex in 𝑉𝑙 to one vertex in 𝑉𝑟
Complete Bipartite Graphs – Bicliques
A Complete Bipartite Graph contains two independent sets of vertices, 𝑉𝑙 and 𝑉𝑟, that are connected with an edge for every possible pair of 𝑣𝑙 from 𝑉𝑙 and 𝑢𝑟 from 𝑉𝑟
𝐾(𝐺, 𝐼, 𝐽) denotes an edge biclique 𝐺(𝑉𝑙′, 𝑉𝑟′, 𝐸) in 𝐺(𝑉𝑙, 𝑉𝑟, 𝐸) with vertices 𝑉𝑙′ ⊆ 𝑉𝑙, 𝐼 = 𝑉𝑙′ and 𝑉𝑟′ ⊆ 𝑉𝑟, 𝐽 = 𝑉𝑟′
This paper focuses on finding a Maximum Edge Biclique.
4. MAXIMUM EDGE AND VERTEX BICLIQUES
An edge maximum biclique B1({u1,u2},{v1,v2,v3}) with 5 vertices and 6 edges.
A vertex maximum biclique B2({u3,u4,u5,u6,u7},{v5}) with 6 vertices and 5 edges.
Both B1 and B2 are maximal.
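The vertex and edge counts quoted for B1 and B2 can be checked with a short sketch. The helper names and the edge list are hypothetical; the edge list contains only the edges implied by the two bicliques named on the slide, not the full example graph.

```python
from itertools import product

def is_biclique(edges, left, right):
    """Return True if every pair (u, v) with u in left and v in right is an edge."""
    edge_set = set(edges)
    return all((u, v) in edge_set for u, v in product(left, right))

def biclique_size(left, right):
    """Return (number of vertices, number of edges) of the biclique left x right."""
    return len(left) + len(right), len(left) * len(right)

# Hypothetical edge list: only the edges implied by B1 and B2 on the slide.
B1 = (["u1", "u2"], ["v1", "v2", "v3"])
B2 = (["u3", "u4", "u5", "u6", "u7"], ["v5"])
edges = list(product(*B1)) + list(product(*B2))

b1_size = biclique_size(*B1)  # B1 has more edges
b2_size = biclique_size(*B2)  # B2 has more vertices
```

A biclique with the most edges is therefore not necessarily the one with the most vertices, which is why the two problems are treated separately.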
5. MOTIVATION
Finding a Maximum Edge Biclique in Bipartite Graphs is NP-Complete [1].
Biclique enumeration algorithms exhaustively enumerate the maximal bicliques [3,4]. The maximum one is only known once the enumeration has finished.
A biclique found in a bipartite network can be collapsed into Super Nodes to simplify the graph.
This yields simpler visualization outputs.
6. HOW EXHAUSTIVE ENUMERATION
Using the Taste Sweet Human Disease and Gene Association Network [2], BIMAX and MICA do exhaustive enumeration to return distinct Edge Bicliques:
7. PROBLEM DEFINITION FOR IP FORMULATION
Our formulation requires the Vertex-Edge Incidence Matrix.
Since we focus on finding the Maximum Edge Biclique, the objective is to maximize the number of edges.
For a given 𝐼, 𝐽, the formulation yields a result when a biclique 𝐾(𝐺, 𝐼, 𝐽) exists.
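A minimal sketch of the two ingredients named on this slide, assuming 𝐼 and 𝐽 are the numbers of left and right vertices in the biclique. The function names are hypothetical, and the brute-force search only stands in for the call to an integer programming solver; it is not the authors' formulation and does not scale beyond toy graphs.

```python
from itertools import combinations

def incidence_matrix(left, right, edges):
    """Vertex-edge incidence matrix: one row per vertex, one column per edge.
    Assumes left and right vertex names are disjoint."""
    vertices = list(left) + list(right)
    row = {v: r for r, v in enumerate(vertices)}
    matrix = [[0] * len(edges) for _ in vertices]
    for col, (u, v) in enumerate(edges):
        matrix[row[u]][col] = 1  # left endpoint of edge `col`
        matrix[row[v]][col] = 1  # right endpoint of edge `col`
    return vertices, matrix

def find_biclique(left, right, edges, i, j):
    """Stand-in for the solver call (assumption): return (L, R) with
    |L| = i and |R| = j forming a biclique, or None if none exists."""
    edge_set = set(edges)
    for ls in combinations(left, i):
        for rs in combinations(right, j):
            if all((u, v) in edge_set for u in ls for v in rs):
                return set(ls), set(rs)
    return None

# Hypothetical toy graph.
left = ["u1", "u2", "u3"]
right = ["v1", "v2"]
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v1")]

vertices, matrix = incidence_matrix(left, right, edges)
found = find_biclique(left, right, edges, 2, 2)    # a 2 x 2 biclique exists
missing = find_biclique(left, right, edges, 3, 2)  # no 3 x 2 biclique exists
```

Each column of the incidence matrix contains exactly two ones, one per endpoint of the corresponding edge.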
8. APPLYING QUATERNARY SEARCH
The IP formulation returns 𝐾(𝐺, 𝐼, 𝐽) for a given 𝐼, 𝐽 if one exists inside 𝐺(𝑉𝑙, 𝑉𝑟, 𝐸).
We defined the BIIP algorithm, which uses the IP formulation to find the maximum 𝐾(𝐺, 𝐼, 𝐽) by quaternary search.
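The slides do not spell out how BIIP partitions its search space, so the following is only one plausible reading, not the authors' algorithm. It assumes 𝐼 and 𝐽 are the left and right sizes, fixes 𝐼, and uses a four-way split of the 𝐽 range (three probe points per round) to find the largest feasible 𝐽. This works because feasibility is monotone in 𝐽: dropping a right vertex from a biclique leaves a biclique. All names are hypothetical, and the brute-force oracle stands in for the IP solver call.

```python
from itertools import combinations

def find_biclique(left, right, edges, i, j):
    """Stand-in for the IP solver call (assumption): an i x j biclique or None."""
    edge_set = set(edges)
    for ls in combinations(left, i):
        for rs in combinations(right, j):
            if all((u, v) in edge_set for u in ls for v in rs):
                return set(ls), set(rs)
    return None

def largest_feasible(lo, hi, feasible):
    """Largest value in [lo, hi] for which feasible() holds, assuming feasible
    is True up to some threshold and False after it. Returns lo - 1 if none."""
    best = lo - 1
    while lo <= hi:
        step = (hi - lo) // 4
        # Up to three probe points split [lo, hi] into four parts.
        probes = sorted({lo + step, lo + 2 * step, lo + 3 * step})
        for p in probes:
            if feasible(p):
                best, lo = p, p + 1  # everything up to p is feasible
            else:
                hi = p - 1           # everything from p on is infeasible
                break
    return best

def max_edge_biclique(left, right, edges):
    """Try every left size i and keep the i x j biclique with the most edges."""
    best = (0, set(), set())
    for i in range(1, len(left) + 1):
        def feasible(j, i=i):
            return find_biclique(left, right, edges, i, j) is not None
        j = largest_feasible(1, len(right), feasible)
        if j >= 1 and i * j > best[0]:
            ls, rs = find_biclique(left, right, edges, i, j)
            best = (i * j, ls, rs)
    return best

# Hypothetical graph holding only the two bicliques B1 and B2 from slide 4.
left = ["u1", "u2", "u3", "u4", "u5", "u6", "u7"]
right = ["v1", "v2", "v3", "v5"]
edges = [(u, v) for u in left[:2] for v in right[:3]] + [(u, "v5") for u in left[2:]]
result = max_edge_biclique(left, right, edges)
```

In this sketch the probes of one round are evaluated one after another; since they are independent solver calls, they could also be issued concurrently.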
9. RESEARCH DATASET
Bipartite Networks exist in many areas of scientific study.
Our paper focuses on sample networks from bioinformatics
Disease to Gene Networks; First layer as Diseases and Second layer as Genes
We use the DisGeNET [2] platform to create Disease to Gene Networks.
Gene Expression Networks; First layer as Genes/Proteins and Second layer as Conditions
MovieLens Database
Stable Benchmark Database with 100K ratings from 1000 users and 1700 movies
10. RESULTS
BIMAX [3] and MICA [4] are state-of-the-art algorithms, chosen for comparison of results:
11. CONCLUSION & FUTURE WORK
We proposed an ILP formulation to find 𝐾(𝐺, 𝐼, 𝐽) bicliques and the BIIP algorithm to find a Maximum Edge Biclique in Bipartite Graphs.
Future Work
Quaternary Search Parallelism
Gurobi allows us to run tasks in parallel. The quaternary search for 𝐾(𝐺, 𝐼, 𝐽) bicliques can be done in parallel.
Finding all distinct maximum bicliques
Iteratively apply BIIP by setting found maximum biclique edges to 0.
Create software to visualize bipartite graphs with the embedded supernode strategy mentioned previously.
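The iterative idea in the future work list can be sketched as follows, assuming that "setting found maximum biclique edges to 0" means removing those edges before the next solve. The brute-force `max_edge_biclique` is a hypothetical stand-in for BIIP, usable only on toy graphs, and the loop returns successive bicliques of the remaining graph rather than only those tied for the maximum.

```python
from itertools import combinations

def max_edge_biclique(left, edges):
    """Brute-force stand-in for BIIP (assumption): try every non-empty subset
    of left vertices together with the right vertices they all share."""
    neighbours = {u: set() for u in left}
    for u, v in edges:
        neighbours[u].add(v)
    best = (0, frozenset(), frozenset())
    for size in range(1, len(left) + 1):
        for subset in combinations(left, size):
            common = set.intersection(*(neighbours[u] for u in subset))
            if size * len(common) > best[0]:
                best = (size * len(common), frozenset(subset), frozenset(common))
    return best

def peel_bicliques(left, edges):
    """Repeatedly find a maximum edge biclique and delete its edges."""
    remaining = set(edges)
    found = []
    while remaining:  # any single remaining edge is itself a biclique
        _, ls, rs = max_edge_biclique(left, remaining)
        found.append((ls, rs))
        remaining -= {(u, v) for u in ls for v in rs}
    return found

# Hypothetical graph holding only the two bicliques B1 and B2 from slide 4.
left = ["u1", "u2", "u3", "u4", "u5", "u6", "u7"]
edges = [(u, v) for u in left[:2] for v in ["v1", "v2", "v3"]]
edges += [(u, "v5") for u in left[2:]]
found = peel_bicliques(left, edges)
```

On this graph the first pass returns B1 (6 edges) and the second returns B2 (5 edges), after which no edges remain.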
13. REFERENCES
[1] M. J. P. Peeters, “The Maximum Edge Biclique Problem is NP-Complete,” Tilburg University, Faculty of Economics and Business Administration, Research Memorandum, 2000.
[2] J. P. Gonzalez, A. Bravo, N. Queralt-Rosinach, A. Gutierrez-Sacristan, J. Deu-Pons, E. Centeno, J. García-Garcia, F. Sanz, and L. I. Furlong, “Disgenet: a comprehensive platform integrating information on human disease-associated genes and variants,” Nucleic Acids Research, vol. 45, no. Database-Issue, pp. D833–D839, 2017.
[3] A. Prelic, S. Bleuler, P. Zimmermann, A. Wille, P. Bühlmann, W. Gruissem, L. Hennig, L. Thiele, and E. Zitzler, “A systematic comparison and evaluation of biclustering methods for gene expression data,” Bioinformatics, vol. 22, no. 9, pp. 1122–1129, 2006.
[4] G. Alexe, S. Alexe, Y. Crama, S. Foldes, P. L. Hammer, and B. Simeone, “Consensus algorithms for the generation of all maximal bicliques,” Discrete Applied Mathematics, vol. 145, no. 1, pp. 11–21, 2004.