A Generate-Test-Aggregate
Parallel Programming Library
Yu Liu1, Kento Emoto2, Zhenjiang Hu3
1The Graduate University for Advanced Studies
2The University of Tokyo
3National Institute of Informatics
PPoPP PMAM 2013
Systematic Parallel Programming for MapReduce
Outline
Introduction to GTA
The GTA library
▪ Implementation strategy
▪ Programming interface
▪ Automatic parallelization and optimization
Applications and evaluations
Conclusions
The GTA Programming Methodology
➢ Simple programming pattern
1. Generate all possible solution candidates;
2. Test and filter candidates;
3. Aggregate the valid candidates.
➢ Expressive and code efficient
▪ Covers a large class of problems
▪ Automatic optimization and parallelization
Kento Emoto et al. [ESOP'12]
An Example: The Knapsack Problem
Writing a parallel (MapReduce) program for the
knapsack problem is not easy.
input: [ (1 $, 2 Kg), (2 $, 6 Kg), (3 $, 10 Kg) ]
weight limit = 15 Kg
generate:
[ [ ], [ (1 $, 2 Kg) ], [ (2 $, 6 Kg) ], [ (3 $, 10 Kg) ],
[ (1 $, 2 Kg), (2 $, 6 Kg) ], [ (1 $, 2 Kg), (3 $, 10 Kg) ],
[ (2 $, 6 Kg), (3 $, 10 Kg) ],
[ (1 $, 2 Kg), (2 $, 6 Kg), (3 $, 10 Kg) ] ]
test: [true, true, true, true, true, true, false, false]
filter: [ [ ], [ (1 $, 2 Kg) ], [ (2 $, 6 Kg) ], [ (3 $, 10 Kg) ],
[ (1 $, 2 Kg), (2 $, 6 Kg) ], [ (1 $, 2 Kg), (3 $, 10 Kg) ] ]
aggregate: max(0 $, 1 $, 2 $, 3 $, 3 $, 4 $) = 4 $
Naively implementing Knapsack is inefficient (O(2^n)).
Input (length)   Time (ms)
8                30
12               86
16               97
20               2829
24               java.lang.OutOfMemoryError: Java heap space
Performance of the naïve Knapsack program
The GTA fusion theorem is introduced to resolve this efficiency problem.
GTA Fusion
generator, predicates, aggregator → GTA fusion → mapReduceable
map ( mapReduceable.f ) . reduce ( mapReduceable.combine )
MapReduce
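To make the map/reduce shape concrete, here is a small Python sketch of an object that carries an `f` and a `combine`. The class and helper names are assumptions made for illustration and are not the library's Scala definitions. It shows why an associative `combine` matters: the input can be split into chunks, each chunk reduced independently, and the partial results merged to give the same answer as a sequential run.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable

@dataclass(frozen=True)
class MapReduceable:
    f: Callable[[Any], Any]             # applied to each input element (map phase)
    combine: Callable[[Any, Any], Any]  # associative merge of two partial results

def run_sequential(mr, xs):
    """map(f) followed by reduce(combine) over a non-empty input."""
    return reduce(mr.combine, map(mr.f, xs))

def run_chunked(mr, xs, n_chunks):
    """Simulate workers: reduce each contiguous chunk, then merge the partials."""
    size = max(1, -(-len(xs) // n_chunks))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    partials = [run_sequential(mr, chunk) for chunk in chunks]
    return reduce(mr.combine, partials)

# Hypothetical example: (count, total) pairs merged component-wise.
count_and_sum = MapReduceable(
    f=lambda x: (1, x),
    combine=lambda a, b: (a[0] + b[0], a[1] + b[1]),
)

data = list(range(1, 11))
seq_result = run_sequential(count_and_sum, data)
par_result = run_chunked(count_and_sum, data, 4)
print(seq_result, par_result)  # (10, 55) (10, 55)
```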
Definitions of G, T, A
Class Name    Algebraic Structure
Generator     polymorphic semiring generator
Predicate     almost list homomorphism
Aggregator    semiring homomorphism
Ref: K. Emoto [ESOP'12]
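One way to read "polymorphic semiring generator" is as a generator written once against an abstract semiring, so that choosing the semiring decides what is computed. The Python sketch below follows that reading under stated assumptions: the `Semiring` record, the `sublists` function, and the two instances are illustrative names, not the library's types. With a bag-of-lists semiring the generator enumerates all sublists; with the max-plus semiring the same generator yields the best total value directly, without enumerating anything.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # choice between alternatives
    times: Callable[[Any, Any], Any]  # sequencing of parts
    zero: Any                         # identity of plus
    one: Any                          # identity of times

def sublists(semiring, embed, xs):
    """For each element, either take it (embed(x)) or skip it (one)."""
    s = semiring
    return reduce(s.times, (s.plus(embed(x), s.one) for x in xs), s.one)

# Bags of lists: plus is bag union, times is pairwise concatenation.
bag_of_lists = Semiring(
    plus=lambda a, b: a + b,
    times=lambda a, b: [l + r for l in a for r in b],
    zero=[],
    one=[[]],
)

# Max-plus: plus is max, times is ordinary addition.
max_plus = Semiring(plus=max, times=lambda a, b: a + b,
                    zero=float("-inf"), one=0)

items = [(1, 2), (2, 6), (3, 10)]  # (value, weight)
all_candidates = sublists(bag_of_lists, lambda x: [[x]], items)
best_unconstrained = sublists(max_plus, lambda x: x[0], items)
print(len(all_candidates), best_unconstrained)  # 8 6
```

Here the weight test is left out on purpose, so the max-plus result is the best value over all sublists; handling the predicate is what the fusion step adds.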
Main Contributions
The implementation of a GTA library
▪ A simple and statically typed GTA-DSL is implemented
▪ Algebraic structures and computations/transformations of them are implemented
Evaluation of GTA methodology
Object-oriented Functional Style
We defined the basic algebraic structures.
Relations/transformations of the algebras are well typed.
Examples
G·T·A Programming DSL
Users write GTA expressions like:
generate(g:GEN) filter(t:Predicate)* aggregate(a:Aggregator)
GEN, Aggregator, and Predicate are Scala traits defined in the GTA library.
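The shape of such an expression, one generator, zero or more filters, and one aggregator, can be imitated in a few lines of Python. This is a hypothetical fluent wrapper written for illustration; it is not the library's Scala API, and unlike the library it evaluates the pipeline naively instead of fusing it.

```python
from itertools import combinations

class Pipeline:
    """Hypothetical fluent wrapper around a generator and a list of predicates."""
    def __init__(self, generator, predicates=()):
        self._generator = generator
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Zero or more filters may be chained, mirroring filter(...)*.
        return Pipeline(self._generator, self._predicates + (predicate,))

    def aggregate(self, aggregator, xs):
        candidates = (c for c in self._generator(xs)
                      if all(p(c) for p in self._predicates))
        return aggregator(candidates)

def generate(generator):
    return Pipeline(generator)

def all_sublists(xs):
    return (list(c) for r in range(len(xs) + 1) for c in combinations(xs, r))

def within_weight(limit):
    return lambda c: sum(w for _, w in c) <= limit

def max_total_value(candidates):
    return max(sum(v for v, _ in c) for c in candidates)

items = [(1, 2), (2, 6), (3, 10)]  # (value, weight)
result = (generate(all_sublists)
          .filter(within_weight(15))
          .aggregate(max_total_value, items))
print(result)  # 4
```

Adding a second `.filter(...)` call narrows the candidates further without touching the generator or the aggregator, which is the composability the expression form is meant to give.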
GTA-fusion
G + A + T → MapReduceable[f, ⊕]
Input: x1, x2, x3, …, xn
MAP: f is applied to each xi, producing table1, table2, …, tablen
REDUCE: table1 ⊕ table2 ⊕ … ⊕ tablen
[EuroPar'11]
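For the knapsack example, a fused program of this shape can be sketched by hand. The Python below is an illustration under assumptions, not output of the library: weights are taken to be non-negative integers, `item_table` plays the role of f, and `merge_tables` plays the role of ⊕. Each table maps a total weight to the best total value reachable with that weight, so a table never holds more than limit + 1 entries.

```python
from functools import reduce

def item_table(item, limit):
    """MAP step: table {total weight: best value} over the sublists of one item."""
    value, weight = item
    table = {0: 0}  # the empty sublist
    if weight <= limit:
        table[weight] = max(table.get(weight, 0), value)
    return table

def merge_tables(left, right, limit):
    """REDUCE step: combine two tables, dropping entries above the weight limit."""
    merged = {}
    for w1, v1 in left.items():
        for w2, v2 in right.items():
            w = w1 + w2
            if w <= limit:
                merged[w] = max(merged.get(w, float("-inf")), v1 + v2)
    return merged

def knapsack(items, limit):
    tables = [item_table(it, limit) for it in items]
    total = reduce(lambda a, b: merge_tables(a, b, limit), tables, {0: 0})
    return max(total.values())

items = [(1, 2), (2, 6), (3, 10)]  # (value, weight)
best = knapsack(items, 15)

# Merging is associative, so the halves can be reduced separately and then joined.
left = reduce(lambda a, b: merge_tables(a, b, 15),
              [item_table(it, 15) for it in items[:1]])
right = reduce(lambda a, b: merge_tables(a, b, 15),
               [item_table(it, 15) for it in items[1:]])
best_from_halves = max(merge_tables(left, right, 15).values())
print(best, best_from_halves)  # 4 4
```

The cost now depends on the number of items and the weight limit instead of on the number of sublists.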
Implementation of GTA Fusion/Optimization
The main difficulties:
How to define a polymorphic generator
How to define a predicate for test
How to define intermediate data structures and other algebraic structures
More Examples
More examples in the paper and source package:
▪ Extended Knapsack problems
▪ The maximum-segment-sum problem
▪ Finding the most probable sequence (Viterbi algorithm)
More information at: https://bitbucket.org/inii/gtalib
G·T·A Building Blocks
Our library provides commonly used G·T·A building blocks, and users can also implement their own G, T, and A components.
Performance Evaluations
Evaluations on EdubaseCluster (Cloud)
▪ Up to 32 VM nodes, each with 3 GB RAM and one single-core CPU
▪ Executed on Spark, an in-memory MapReduce (MR) cluster
Execution Time (Knapsack)
Number of VM nodes   Time (seconds), 1.00E+07 items   Time (seconds), 1.00E+08 items
4                    203.63                           1727.973
8                    92.83                            679.305
12                   64.64                            637.33
16                   47.76                            471.2
20                   37.06                            362.36
24                   29.78                            287.08
28                   25.17                            234.25
32                   23.25                            223.44
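The Knapsack execution times can be turned into relative speedups with a short calculation. The Python sketch below assumes that the shorter series belongs to the 1.00E+07-item run and the longer one to the 1.00E+08-item run, and it takes the 4-node run as the baseline; the slides do not state which baseline the speedup chart uses.

```python
nodes = [4, 8, 12, 16, 20, 24, 28, 32]
# Knapsack times in seconds as listed on the slide; series assignment is an assumption.
times_small = [203.63, 92.83, 64.64, 47.76, 37.06, 29.78, 25.17, 23.25]
times_large = [1727.973, 679.305, 637.33, 471.2, 362.36, 287.08, 234.25, 223.44]

def relative_speedup(times):
    """Speedup of each run relative to the first (4-node) run."""
    base = times[0]
    return [base / t for t in times]

speedup_small = relative_speedup(times_small)
speedup_large = relative_speedup(times_large)

for n, s, l in zip(nodes, speedup_small, speedup_large):
    print(f"{n:2d} nodes: {s:5.2f}x (1.00E+07 items)  {l:5.2f}x (1.00E+08 items)")
```

With eight times as many nodes as the baseline, an ideal relative speedup at 32 nodes would be 8.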
Linear Speedup
[Figure: speedup (axis 0 to 9) versus number of VM nodes (4 to 32) for Knapsack, ViterbiAlg, and MSS]
Conclusions
We show that GTA can be implemented efficiently
GTA-DSL can simplify parallel programming
▪ Simple programming model
▪ Good code efficiency
GTA-DSL is architecture independent
Future Work
Enrich the library with more building blocks in terms of G, T, and A
GTA-DSL can be extended to process more complex data structures such as trees and graphs
Q&A
Thank you very much!
