Minimizing cost in distributed multiquery processing applications

Minimizing Communication Cost in Distributed Multi-query Processing
Jian Li, Amol Deshpande, Samir Khuller
Department of Computer Science, University of Maryland
Presented by: Luis Galárraga, Saarland University, July 7th, 2010

Outline
Justification and related work
Proposed methods and analysis (graph theory concepts)
Experimental results
Conclusions


Justification
Large-scale distributed query processing is emerging in applications such as wireless sensor networks and distributed stream processing.

Common need: minimize data movement!

Justification
The cost of transferring data from one node to another may be heterogeneous: it can differ from link to link.


Problem formulation
Minimize data movement across multiple queries.

Assumptions:
Non-uniform communication cost model.
Query plans are part of the input.
Intermediate result sizes are known.

More formally
Input:
A set of relations, or data sources.
The topology, an undirected weighted graph.
An assignment of relations to nodes in the topology.
For each query, a plan in the form of a directed tree.

Figure: a query plan tree. Data sources S_i and S_j, with data sizes z(S_i) and z(S_j), are combined into S_i × S_j, of size z(S_i × S_j), which is delivered to the destination node w.

More formally
Given the topology graph G_c and a set of trees representing the query plans, our goal is to find a data movement plan that minimizes the total communication cost incurred while executing the queries.

Figure: example topology G_c, with nodes A to F each hosting one of the data sources S_1 to S_6, and the query plans to execute (S_1 × S_2 × S_4, S_2 × S_6 and S_2 × S_5); numbers in parentheses are data sizes.

Problem formulation
If a block of data of size S is sent along an edge e with weight w(e), the communication cost is S · w(e); sending it along a path costs the sum of these terms over the path's edges.

For simplicity, the examples assume w(e) = 1 for all edges, but the algorithm handles arbitrary edge weights.
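To make the cost model concrete, here is a minimal Python sketch for a single query, under assumptions that go beyond the slides: the plan is a hypothetical dict from each operator (and a `dest` vertex) to its children, every plan vertex has already been placed on a topology node, and each intermediate result travels along a cheapest path, so shipping it costs its size times the path weight. Unlike the actual problem, it ignores sharing of data movement across queries.

```python
import heapq
from collections import defaultdict


def shortest_distances(edges, source):
    """Dijkstra over an undirected topology; edges maps (u, v) -> w(e)."""
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def plan_cost(edges, plan, placement, size):
    """Total communication cost of one placed query plan (hypothetical format).

    plan: maps each operator (or destination) to the list of its children.
    placement: maps every plan vertex to the topology node it runs on.
    size: maps every non-root plan vertex to the size of its output.
    Each child's output goes along a cheapest path: cost = size * path weight.
    """
    total = 0
    for parent, children in plan.items():
        dist = shortest_distances(edges, placement[parent])
        for child in children:
            total += size[child] * dist[placement[child]]
    return total
```

With unit weights on a path A, B, C, sources of size 10 at A and C, and a join result of size 7 delivered to C, placing the join at B costs 10 + 10 + 7 = 27, while placing it at C costs 20. Choosing such placements well is what a data movement plan is about.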

Proposed methods and analysis: graph theory concepts, tree topology

Problem analysis
The problem has been proved NP-hard via a reduction from the Steiner tree problem.
Is everything lost? No: if the topology graph is a tree, there is a polynomial-time algorithm.

For general topologies, approximation algorithms are known.

Steiner tree problem
Given an undirected graph G = (V, E) with non-negative edge weights and a subset S ⊆ V of terminal vertices:

Find a tree of minimum total weight that connects all vertices in S. The tree may contain vertices not in S, known as Steiner points.

Figure: example Steiner tree with edge weights, distinguishing terminals from Steiner points.
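The definition can be checked on tiny instances by exhaustive search. This Python sketch is our own illustration, exponential in the number of non-terminal vertices and so only suitable for toy graphs. It relies on the fact that an optimal Steiner tree is a minimum spanning tree of the subgraph induced by the terminals plus some set of Steiner points, and simply tries every such set.

```python
from itertools import combinations


def mst_weight(vertices, edges):
    """Kruskal on the subgraph induced by `vertices`; None if it is disconnected."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    total, used = 0, 0
    for (u, v), w in sorted(edges.items(), key=lambda item: item[1]):
        if u in parent and v in parent:  # edge lies inside the induced subgraph
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += w
                used += 1
    return total if used == len(vertices) - 1 else None


def steiner_tree_weight(nodes, edges, terminals):
    """Exact minimum Steiner tree weight by brute force (tiny graphs only).

    Tries every subset of non-terminals as Steiner points and keeps the
    cheapest connected MST over terminals plus those points.
    """
    others = [v for v in nodes if v not in terminals]
    best = None
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            w = mst_weight(set(terminals) | set(extra), edges)
            if w is not None and (best is None or w < best):
                best = w
    return best
```

In a star where terminals a, b and c connect to a centre x with weight 1 and to each other with weight 3, the best tree uses x as a Steiner point and weighs 3, versus 6 without it.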

The algorithm
It involves solving a series of min-cut problems on appropriately constructed hypergraphs. Umm... hypergraphs?

Hypergraphs
A hypergraph is a generalization of a graph. In ordinary graphs, an edge can be seen as a pair of vertices.

A hyperedge can group any number of vertices.
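Grouping many vertices into one hyperedge changes how cuts are counted: a hyperedge is cut, and pays its weight once, whenever its vertices fall on both sides of a partition. A small Python sketch, assuming hyperedges are given in a hypothetical (vertex set, weight) format:

```python
def hypergraph_cut_weight(hyperedges, side_s):
    """Weight of the cut (side_s, rest) in a hypergraph.

    hyperedges: list of (set_of_vertices, weight) pairs (hypothetical format).
    side_s: set of vertices on the source side.
    A hyperedge contributes its weight once if it has vertices on both sides.
    """
    total = 0
    for vertices, weight in hyperedges:
        inside = len(vertices & side_s)
        if 0 < inside < len(vertices):
            total += weight
    return total
```

For hyperedges {a, b, c} (weight 4), {c, d} (weight 1) and {a, d} (weight 2), the partition {a, b} versus {c, d} cuts the first and third hyperedges, for a total of 6.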

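Max-flow/Min-cut
Given a directed graph with edge capacities c(e) and two nodes s and t, known as the source and the sink:

Find a flow, i.e., a mapping f from edges to non-negative values, of maximum value, such that f(e) ≤ c(e) for every edge (capacity constraint) and, at every node other than s and t, the flow in equals the flow out (conservation). The value of the flow is the net flow leaving s.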

Figure: example flow network; each edge is labeled flow / capacity.
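The two conditions on a flow (each edge carries at most its capacity, and every node other than s and t has equal inflow and outflow) can be checked mechanically. A minimal Python sketch, assuming capacities and flows are hypothetical dicts keyed by directed (u, v) pairs:

```python
def flow_value(capacity, flow, s, t):
    """Check that `flow` is a valid s-t flow and return its value.

    capacity, flow: dicts mapping directed edges (u, v) to numbers.
    Raises ValueError if a capacity or conservation constraint is violated.
    """
    if not set(flow) <= set(capacity):
        raise ValueError("flow on an edge that does not exist")
    net = {}  # outflow minus inflow at each node
    for (u, v), c in capacity.items():
        f = flow.get((u, v), 0)
        if not 0 <= f <= c:
            raise ValueError(f"capacity violated on {(u, v)}")
        net[u] = net.get(u, 0) + f
        net[v] = net.get(v, 0) - f
    for node, excess in net.items():
        if node not in (s, t) and excess != 0:
            raise ValueError(f"conservation violated at {node}")
    return net.get(s, 0)  # the value is the net flow leaving s
```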

Max-flow/Min-cut
A min-cut is a set of edges of minimum total capacity whose removal leaves no path from s to t.

Max-flow min-cut theorem: the maximum value of an s-t flow is equal to the minimum capacity of an s-t cut.

Figure: the example network again (edges labeled flow / capacity), illustrating a minimum s-t cut.
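The theorem can be seen in action with the Edmonds-Karp algorithm, one standard polynomial-time max-flow method (not necessarily the one used in the paper). In this Python sketch, with capacities given as a hypothetical dict keyed by (u, v), the nodes still reachable from s in the residual graph once no augmenting path remains form the source side of a minimum cut, so the capacity crossing that cut equals the flow value.

```python
from collections import defaultdict, deque


def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths.

    capacity: dict mapping directed edges (u, v) -> capacity.
    Returns (max flow value, set of nodes on the source side of a min cut).
    """
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0  # ensure the reverse residual edge exists
    value = 0
    while True:
        # BFS for a shortest augmenting path from s to t.
        pred = {s: None}
        queue = deque([s])
        while queue and t not in pred:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in pred:
                    pred[v] = u
                    queue.append(v)
        if t not in pred:
            # No augmenting path: nodes reachable from s form the min-cut source side.
            return value, set(pred)
        # Bottleneck residual capacity along the path.
        bottleneck = float("inf")
        v = t
        while pred[v] is not None:
            bottleneck = min(bottleneck, residual[pred[v]][v])
            v = pred[v]
        # Push the bottleneck along the path, updating reverse edges.
        v = t
        while pred[v] is not None:
            residual[pred[v]][v] -= bottleneck
            residual[v][pred[v]] += bottleneck
            v = pred[v]
        value += bottleneck
```

On the small network s→a (3), s→b (2), a→t (2), a→b (1), b→t (3), the maximum flow is 5 and the minimum cut consists of the two edges leaving s.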

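Max-flow/Min-cut in hypergraphs
For ordinary graphs, the problem is solvable in polynomial time.

What about max-flow in hypergraphs? For every hyperedge, add two new nodes and a directed edge between them whose capacity equals the weight of the hyperedge; then connect each vertex of the hyperedge to the first new node, and the second new node to each vertex, with edges of infinite capacity.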

Figure: a hyperedge replaced by two new nodes joined by an edge of capacity w.
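The hyperedge gadget can be written down directly. This Python sketch follows the standard construction (two new nodes per hyperedge, a finite edge of the hyperedge's weight between them, and infinite-capacity edges linking them to the member vertices). The node naming and input format are hypothetical, and it assumes original vertex names never collide with the ("in", i) and ("out", i) tuples. Running any s-t max-flow routine on the output, with s and t chosen among the original vertices, then yields the hypergraph min cut.

```python
INF = float("inf")


def hypergraph_to_flow_network(hyperedges):
    """Replace each weighted hyperedge by a gadget of ordinary directed edges.

    hyperedges: list of (set_of_vertices, weight) pairs (hypothetical format).
    For hyperedge i, new nodes ("in", i) and ("out", i) are joined by an edge
    whose capacity is the hyperedge weight; each member vertex v gets
    infinite-capacity edges v -> ("in", i) and ("out", i) -> v.
    Only the finite gadget edges can appear in a finite cut, so the min s-t cut
    value equals the minimum weight of an s-t cut in the hypergraph.
    """
    capacity = {}
    for i, (vertices, weight) in enumerate(hyperedges):
        e_in, e_out = ("in", i), ("out", i)
        capacity[(e_in, e_out)] = weight
        for v in vertices:
            capacity[(v, e_in)] = INF
            capacity[(e_out, v)] = INF
    return capacity
```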
