Approximation Algorithms for Problems
on Networks and Streams of Data
Luca Foschini - Ph.D. Defense

Committee: Subhash Suri (chair), John Gilbert, Teofilo Gonzalez
Why Approximation Algorithms?

Exact algorithms require many resources

[Figure: Hardware, Apps, and Data, with a region marked "Problems solvable exactly"]
A Long History,
and Work in Progress

✤   Early '70s - many combinatorial
    problems found to be NP-hard

✤   Recently - more restrictive
    computation models proposed, e.g.,
    data streams

          Heuristics are not sufficient; provable guarantees are needed
Content of the Dissertation

    Networks
                   Partitioning        STACS12 + Algorithmica
                   Shortest Paths      SODA11 + Algorithmica

    Data Streams
                   Time Series         ICDE10
                   Burst Detection     NSDI11

    Also listed: ICISS08, ICIP11, ALENEX10, ESA11, WOOT11, WAW09
Roadmap

    Networks
                   Partitioning        STACS12 + Algorithmica
                   Shortest Paths      SODA11 + Algorithmica

    Data Streams
                   Time Series         ICDE10
                   Burst Detection     NSDI11
k-Balanced Partitioning Problem
 Given: an unweighted graph G on n
 vertices; an integer k

 Find: a partition of the vertices of G
 into k sets $V_i$ s.t.

    ✤   $|V_i| \le \lceil n/k \rceil$
    ✤   Cut size (number of edges
        connecting vertices in
        different $V_i$) is minimized


                  joint work with Andi Feldmann (ETHz)
             (appeared in STACS12, submitted to Algorithmica)
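
The two quantities in the problem statement can be checked mechanically. The following is a minimal sketch, assuming the graph is given as an edge list and the partition as a vertex-to-part mapping; the names `cut_size`, `is_balanced`, and `part_of` are illustrative and do not come from the dissertation. The `eps` parameter is included so the same check covers a relaxed size bound of (1 + eps) * ceil(n / k).

```python
from math import ceil

def cut_size(edges, part_of):
    """Count edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part_of[u] != part_of[v])

def is_balanced(part_of, n, k, eps=0.0):
    """True if at most k parts are used and each has size <= (1 + eps) * ceil(n / k)."""
    sizes = {}
    for part in part_of.values():
        sizes[part] = sizes.get(part, 0) + 1
    cap = (1 + eps) * ceil(n / k)
    return len(sizes) <= k and all(s <= cap for s in sizes.values())

# Hypothetical example: a path on 6 vertices split into k = 2 halves.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
part_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(cut_size(edges, part_of), is_balanced(part_of, n=6, k=2))
```

With `eps=0` the check is the perfect-balance constraint from the slide; a positive `eps` allows parts to exceed ceil(n / k) by that fraction.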
Motivation & Complexity

✤   Divide-and-conquer algorithms

✤   VLSI design

✤   Parallel computing



✤   NP-hard to approximate cut size within any finite value alpha
    [Andreev and Räcke 2006]
Related Work
General Graphs & Trees

✤   An algorithm is an $\alpha$-approximation if it
    finds a cut at most $\alpha$ times optimal

✤   NP-hard to approximate cut size
    within any finite $\alpha$ [Andreev and
    Räcke 2006]


        Trees - simple instances?

[Figure: two example partitions. n=31, k=8: cut size = 10; n=31, k=9: cut size = 8]
Trees Are Hard

✤   NP-hard to approximate cut size within $\alpha = n^c$
    (for any c < 1), even for trees of constant diameter

✤   APX-hard to approximate cut size, even for trees of
    constant degree




            Most NP-hard problems become trivial on trees
Relax!

 Balance constraint relaxed:
   $|V_i| \le (1 + \varepsilon)\lceil n/k \rceil$

 Bicriteria approximation: cut size approximation $\alpha$ is
 measured w.r.t. the perfectly balanced optimum

[Figure: "Perfect balance, optimal cut size" compared with "Balance relaxed, cut size approximated" within a factor $\alpha$]
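
Written out, the bicriteria guarantee combines two conditions. This is a sketch in the notation of the slides; the symbols $\mathrm{cut}(\cdot)$ and $\mathrm{OPT}$ are introduced here for illustration only.

```latex
\[
  |V_i| \le (1+\varepsilon)\lceil n/k \rceil \quad \text{for all } i,
  \qquad
  \mathrm{cut}(V_1,\dots,V_k) \le \alpha \cdot \mathrm{OPT},
\]
\[
  \mathrm{OPT} = \min\bigl\{\mathrm{cut}(U_1,\dots,U_k) :
      |U_i| \le \lceil n/k \rceil \text{ for all } i\bigr\}.
\]
```

The returned partition may use the relaxed size bound, while the benchmark $\mathrm{OPT}$ ranges only over perfectly balanced partitions.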
$0 < \varepsilon < 1$ on General Graphs



✤   $\varepsilon > 1$ -- $\alpha$ in .... spreading metric techniques

✤   $0 < \varepsilon < 1$: not much improvement, $\alpha = O(\log^{1.5} n / \varepsilon^2)$

✤   What about trees?
Summary of PTAS for Trees


✤   Compute optimal cut size for each coarse signature using DP

✤   Pack each coarse signature into bins of size $(1 + \varepsilon)\lceil n/k \rceil$

✤   Pick solution with smallest cut size among those fitting into k bins

✤   Total time complexity $O\left(n^4 (k/\varepsilon)^{1 + 3\lceil \frac{1}{\varepsilon} \log(\frac{1}{\varepsilon}) \rceil}\right)$


                               Show that $\alpha = 1$
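
The packing step can be illustrated with a simple feasibility check. The sketch below uses first-fit decreasing, a standard bin-packing heuristic that is swapped in here for illustration; the slides do not specify the packing procedure, and the heuristic may report failure even when a valid packing exists. It also assumes a signature can be expanded into a plain list of component sizes, which the slides do not state. The name `fits_in_k_bins` is hypothetical.

```python
from math import ceil

def fits_in_k_bins(component_sizes, n, k, eps):
    """First-fit decreasing into at most k bins of capacity (1 + eps) * ceil(n / k).

    Returns the bins (lists of component sizes) on success, or None when the
    heuristic cannot place every component.
    """
    cap = (1 + eps) * ceil(n / k)
    bins = []
    for size in sorted(component_sizes, reverse=True):
        if size > cap:
            return None  # a single component already exceeds the capacity
        # first existing bin with enough room, if any
        target = next((b for b in bins if sum(b) + size <= cap), None)
        if target is None:
            if len(bins) == k:
                return None  # no room left and no new bin allowed
            target = []
            bins.append(target)
        target.append(size)
    return bins

# Hypothetical example: n = 12 vertices cut into components of these sizes, k = 3.
print(fits_in_k_bins([4, 3, 3, 1, 1], n=12, k=3, eps=0.0))
```

In the summarized scheme, a check of this kind would be run for every candidate signature, and the feasible candidate with the smallest cut size would be kept.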
Extension to General Graphs


✤   Decompose the graph into a collection of trees [Räcke, Madry]; cut
    size worsens by at most a factor of O(log n) for at least one tree

✤   Apply PTAS for trees to each instance

✤   Return partition for tree with minimum cut

✤   $\alpha = O(\log n)$ improves on the previous best $O(\log^{1.5} n / \varepsilon^2)$
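
The O(log n) factor can be traced through a short chain of inequalities. This is a sketch under an assumption not stated on the slide: every tree $T$ in the decomposition dominates the graph $G$, meaning $\mathrm{cut}_T(P) \ge \mathrm{cut}_G(P)$ for every partition $P$. The symbols are introduced here for illustration: $P_T$ is the partition the tree PTAS returns on $T$, $\mathrm{OPT}_T$ is the perfectly balanced optimum on $T$, and $P^*$ is a perfectly balanced optimum in $G$.

```latex
\begin{align*}
  \mathrm{cut}_G(P_T)
    &\le \mathrm{cut}_T(P_T)
       && \text{(assumed: $T$ dominates $G$)} \\
    &\le \mathrm{OPT}_T
       && \text{(tree PTAS with $\alpha = 1$)} \\
    &\le \mathrm{cut}_T(P^*)
       && \text{($P^*$ is perfectly balanced)} \\
    &\le O(\log n) \cdot \mathrm{cut}_G(P^*)
       && \text{(holds for at least one tree $T$)}
\end{align*}
```

Since the algorithm returns the best partition over all trees, its cut is no larger than the one obtained from the tree for which the last inequality holds.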
Tree Decomposition
Analysis of Embedding
Extensions & Open Problems
✤   Tree embedding techniques allow the $\alpha = 1$ tree PTAS to translate to an
    $\alpha = O(\log n)$ approximation for general weighted graphs

✤   Improves on previous best $\alpha = O(\log^{1.5} n / \varepsilon^2)$

[Figure: Graphs | Trees]
Approximating Time Series



✤   Represent a time series with B
    linear segments

✤   When a new value arrives in the
    time series, the segments need to
    be reallocated
Old Algorithms, New Proofs

✤   We prove that a popular greedy merge
    scheme gives constant (bicriteria)
    approx. for many L_p norms. (ICDE10;
    joint with Gandhi, Suri)

✤   Results implemented in the Linux kernel
    and used to detect traffic bursts in
    networks (NSDI11; joint with Uyeda,
    Suri, Varghese, Baker)


           Next steps: Extend results in ICDE10 to other norms
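
A greedy merge scheme of the kind mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the ICDE10 algorithm itself: each segment is the straight line joining its first and last point, the error is the maximum vertical deviation (an L_inf-style measure), and every segment keeps all the points it covers, which a real streaming implementation would avoid. The names `segment_error` and `greedy_merge` are hypothetical.

```python
def segment_error(points):
    """Max vertical deviation of points from the line joining the first and last point."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    slope = (y1 - y0) / (x1 - x0)
    return max(abs(y - (y0 + slope * (x - x0))) for x, y in points)

def greedy_merge(stream, B):
    """Maintain at most B linear segments over a stream of values.

    Each arriving value opens a new segment; when more than B segments exist,
    the adjacent pair whose merged segment has the smallest error is merged.
    Returns the segments as (start_point, end_point) pairs.
    """
    if B < 1:
        raise ValueError("B must be at least 1")
    segments = []  # each segment is the list of (time, value) points it covers
    prev = None
    for t, value in enumerate(stream):
        point = (t, value)
        if prev is not None:
            segments.append([prev, point])
        prev = point
        if len(segments) > B:
            # candidate merges of each adjacent pair (they share one endpoint)
            merged = [segments[i] + segments[i + 1][1:]
                      for i in range(len(segments) - 1)]
            i = min(range(len(merged)), key=lambda j: segment_error(merged[j]))
            segments[i:i + 2] = [merged[i]]
    return [(seg[0], seg[-1]) for seg in segments]

# Hypothetical example: a ramp up followed by a ramp down, kept in B = 2 segments.
print(greedy_merge([0, 1, 2, 3, 2, 1, 0], B=2))
```

Because merges are never undone, the result can be worse than the best B-segment approximation, which is one reason guarantees for such schemes are stated in bicriteria form.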
Conclusion


✤   Approximation is necessary to reduce resource utilization

✤   Presented approximation algorithms for problems from different
    domains that we cannot afford to solve exactly

✤   Presented basic building blocks that can be used across the board to
    design approximation algorithms
