A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs
1. A New Method for Vertical Parallelisation of TAN Learning Based on
Balanced Incomplete Block Designs
Anders L. Madsen12 Frank Jensen1 Antonio Salmerón3 Martin
Karlsen1 Helge Langseth4 Thomas D. Nielsen2
HUGIN EXPERT A/S, Aalborg, Denmark
Department of Computer Science, Aalborg University, Denmark
Department of Mathematics, University of Almería , Spain
Department of Computer and Information Science, Norwegian University of Science and
Technology, Norway
PGM, Sept 17-19, 2014 1
2. Outline
1 Introduction
2 Learning TAN Using BIB Designs
3 Experimental Analysis
4 Conclusion & Future Work
PGM, Sept 17-19, 2014 2
3. Introduction
A Bayesian network (BN) is an excellent knowledge representation,
but learning a BN from data can be a challenge
I Data sets are increasing in size w.r.t. both variables and cases
I The computational power of computers is increasing
Research Problem
Learning the structure of a Tree-Augmented Naive Bayes model
from massive amounts of data in parallel
This work is performed as part of the AMIDST project (no. 619209)
PGM, Sept 17-19, 2014 3
4. Learning a TAN model
Let F be a set of features and C be the class variable
C
F1 F2 · · · Fn
The structure of a TAN can be constructed as ([11] and [3]):
1. Compute mutual information I (Fi ; Fj jC) for each pair, i6= j .
2. Build a complete graph G over F with edges annotated by
I (Fi ; Fj jC).
3. Build a maximal spanning tree T from G.
4. Select a vertex and recursively direct edges outward from it.
5. Add C as parent of each F 2 F.
PGM, Sept 17-19, 2014 4
5. Parallel TAN learning
I Step 1 (scoring all pairs w.r.t. MI) is the most time-consuming
element
I Both horizontal and vertical parallelization of learning is
possible
I Data should be distributed, we assume each variable has its
own file
Challenge
Given a set of features F we need to distribute MI-computations
such that each pair is scored exactly once
We use Balanced Incomplete Block (BIB) Designs to achieve
vertical parallelisation
PGM, Sept 17-19, 2014 5
6. Balanced Incomplete Block (BIB) Designs[23]
Definition (Design)
A design is a pair (X;A) s. t. the following properties are satisfied:
1. X is a set of elements called points, and
2. A is a collection of nonempty subsets of X called blocks.
Definition (BIB design)
Let v, k and be positive integers s. t. v k 2. A (v; k; )-BIB
design is a design (X;A) s. t. the following properties are satisfied:
1. jXj = v,
2. each block contains exactly k points, and
3. every pair of distinct points is contained in exactly blocks.
If v = b where b = jAj, then the design is symmetric
PGM, Sept 17-19, 2014 6
7. BIB Design Intuition
Let F be the features with jFj = 7 and let (v = 7; k = 3; = 1)
be a symmetric BIB design, i.e., v = b, with blocks
f013g; f124g; f235g; f346g; f450g; f561g; f602g:
We observe
I = 1 - we compute mutual information exactly once
I a symmetric BIB design, i.e., v = b
I k = 3 is the number of (subset) of features in each block
We take the following approach
I create a process for each block to compute mutual information
I a point p represents Fp when jFj v.
I each process generates its block based on its rank
PGM, Sept 17-19, 2014 7
8. Difference Sets for Symmetric BIB designs
BIB design Difference set k=v
(3,2,1) (0,1) 0:67
(7,3,1) (0,1,3) 0:43
(13,4,1) (0,1,3,9) 0:31
(21,5,1) (0,1,4,14,16) 0:24
(31,6,1) (0,1,3,8,12,18) 0:19
(57,8,1) (0,1,3,13,32,36,43,52) 0:14
(73,9,1) (0,1,3,7,15,31,36,54,63) 0:12
(91,10,1) (0,1,3,9,27,49,56,61,77,81) 0:11
(133,12,1) (0,9,10,12,26,30,67,74,82,109,114,120) 0:09
(183,14,1) (0,12,19,20,22,43,60,71,76,85,89,115,121,168) 0:08
Difference sets for (v; k; )-BIB designs where v is the number of points and k is the
number of points in a block
PGM, Sept 17-19, 2014 8
9. BIB Design and TAN Learning
All processes should compute the same number of I (Fi ; Fj jC)
scores
I There is
n2
= n(n 1)=2 pairs where n = jFj
I We assume each variable is in one file
If we have p processes and each process reads data from k
variables, then the following must be satisfied
p
k
2
n
2
:
Solving for k produces k n
p
p, and equality holds only when
p = 1.
PGM, Sept 17-19, 2014 9
10. BIB Design Example
Consider the (7; 3; 1)-BIB design and assume jFj = 14
I Each point represents two features
I Each process is assigned six features
The seven blocks (b = 7) are:
f013g; f124g; f235g; f346g; f450g; f561g; f602g
The pairwise scoring is performed as
p = 1
0 1 3
X1 X2 X3 X4 X7 X8
x1 x2 x3 x4 x7 x8
· · ·
p = 7
6 0 2
X13 X14 X1 X2 X5 X6
x13 x14 x1 x2 x5 x6
Intra
Inter
PGM, Sept 17-19, 2014 10
11. Experimental Setup
I Software implementation based on HUGIN software and using
Message Passing Interface (MPI)
I Client-server architecture; a master and worker processes
I Average run time (wall-clock) is computed over ten runs of the
algorithm on each dataset with specific number of processes
I Clock starts before reading data and ends after all results are
received by master process
data set jX j N
Munin1 189 750,000
Munin2 1,003 750,000
Bank 1,823 1,140,000
PGM, Sept 17-19, 2014 11
12. Three Different Test Computers
1. Ubuntu: a linux server running - 1 Intel Xeon(TM) E3-1270
Processor (4 cores) and 32 GB RAM
2. Fyrkat: cluster with 80 nodes - each has 2 Intel Xeon (TM)
X5260 Processors (2 cores) and 16GB RAM
3. Vilje: cluster with 1404 nodes - each has 2 Intel Xeon E5-2670
Processors (8 cores) and 32GB
Test computers have different resource management systems
For 2. and 3. we allocate one worker node for each process
PGM, Sept 17-19, 2014 12
13. Ubuntu - Linux Server One Node
50
45
40
35
30
25
20
15
10
5
0
Munin1 on Ubuntu
0 5 10 15 20 25
Average run time in seconds
Number of processors
1000
900
800
700
600
500
400
300
200
100
0
Munin2 on Ubuntu
0 5 10 15 20 25
Average run time in seconds
Number of processors
Ubuntu: A linux server running Ubuntu - Intel Xeon(TM) E3-1270 Processor (4 cores) and 32 GB RAM
PGM, Sept 17-19, 2014 13
14. Fyrkat - Cluster 80 Nodes
1400
1200
1000
800
600
400
200
0
Munin2 on Fyrkat
0 5 10 15 20 25 30 35
Average run time in seconds
Number of processors
7000
6000
5000
4000
3000
2000
1000
0
Bank on Fyrkat
0 5 10 15 20 25 30 35
Average run time in seconds
Number of processors
Fyrkat: 80 nodes - each has 2 Intel Xeon (TM) X5260 Processors (2 cores) and 16GB RAM
PGM, Sept 17-19, 2014 14
15. Vilje - Cluster 1404 Nodes
1800
1600
1400
1200
1000
800
600
400
200
0
Munin2 on Vilje
0 10 20 30 40 50 60 70 80 90 100
Average run time in seconds
Number of processors
7000
6000
5000
4000
3000
2000
1000
0
Bank on Vilje
0 20 40 60 80 100 120 140 160 180 200
Average run time in seconds
Number of processors
Vilje: 1404 nodes - each has 2 Intel Xeon E5-2670 Processors (8 cores) and 32GB
PGM, Sept 17-19, 2014 15
16. Speedup Factors - Vilje Bank
Relative performance compared to case of using one processor
25
20
15
10
5
0
0 20 40 60 80 100 120 140 160 180 200
Speedup factor
Number of processors
Time
Files
Optimal
Time speed-up factors for reading data
Files theoretical number of files read by
each process (k=v jX j)
Optimal square root of the number of
processors
180
160
140
120
100
80
60
40
20
0
0 20 40 60 80 100 120 140 160 180 200
Speedup factor
Number of processors
Score
Score speed-up factor for computing
mutual information
PGM, Sept 17-19, 2014 16
17. Conclusion Future Work
Conclusion
I Introduced a new algorithm for learning TAN structure using
parallel computing
I Empirical evaluation shows that significant performance
improvements are possible
Future Work
I Apply principles to structure learning in general TAN
I Multithread implementation Multiprocess
I Horizontal parallelization Vertical parallelization
PGM, Sept 17-19, 2014 17
18. This project has received funding from the European Union’s
Seventh Framework Programme for research, technological
development and demonstration under grant agreement no 619209
PGM, Sept 17-19, 2014 18