IEEE CLOUD '11
Presentation of the paper "Deadline Queries: Leveraging the Cloud to Produce On-Time Results"

  • ----- Meeting Notes (10/20/10 14:48) ----- Generic notes: Be sharper. Focus on the audience. I never say how I am going to evaluate the system. The Gantt chart is too small.
  • In particular, I'd like to refer to two practical cases. The first is a Portuguese bank that must finish processing 10M transactions and produce the corresponding reports by morning, but has no idea how much machine power it needs to do so. The second is a Portuguese telecom company that is building the largest private cloud in Portugal, but still has trouble allocating nodes to tasks so that they are guaranteed to complete on time.
  • Create an animation across one or two slides that shows how the problem was previously dealt with and how our solution handles it; introduce the running example here. Story of the slide: start processing documents (start moving the document arrow to the cluster); when the system predicts the deadline will be missed (clock turns red)… it starts discarding data or reducing accuracy (put documents in the trash). Mention orally that other systems discard data, but don't dwell on it; mention that in many cases data cannot be thrown away (previous examples).
  • Story of the slide: start processing documents (start moving the document arrow to the cluster); when we see the deadline will be missed (clock turns red)… start expanding the resources in use.
  • Mencionarqueadoptamos streaming mapreduceparasermoscapazes de lidar com alteracoes no cluster
  • Transform the task into a dataflow and split the data into partitions. Request nodes and assign dataflow parts to them. Nodes fetch partitions from a queue and insert them into the dataflow. Nodes send report updates to the master, which decides whether more nodes are needed, and if so… **CLICK** New nodes are added to the computation. Using streaming MapReduce allows us to: deal with data skew, by using streaming routing techniques; deal with faults relatively quickly.
  • Transform the task into a dataflow and split the data into partitions. Request nodes and assign dataflow parts to them. Nodes fetch partitions from a queue and insert them into the dataflow. Nodes send report updates to the master, which decides whether more nodes are needed, and if so… **CLICK** New nodes are added to the computation. Use load-balanced, content-insensitive routing where possible; use load-balanced, content-sensitive routing where needed. ----- Meeting Notes (6/27/11 19:18) ----- Put in the machine names. ----- Meeting Notes (6/29/11 14:33) ----- "Efficient" is too relative a word.
  • Experiment 1 – Varying Initial Cluster Size. Click 1 – The experiment starts with 1 node. Click 2 – At first we
  • Series 1. Let's see the experiments. We begin by executing the query starting with one node. **CLICK** The system starts execution with 1 node. At first there are no statistics on progress, so nothing can be said about whether the deadline will be met. **CLICK** As soon as the system detects the deadline will be missed ----- Meeting Notes (6/29/11 14:33) ----- Threshold on deadline fault detection; threshold on the maximum number of machines; an interesting story to tell about how the system would behave under different cost models. peculOS
  • Clarify orally how the perturbations were injected (Linux commands).
  • Normal operation: Maps process single partitions and tag results with part_id. Partial reduces maintain per-partition windows. Total reduces maintain a tentative set in which results are separated by partition. Upon receiving a part_end punctuation… When faults occur: the master notifies the remaining nodes that the node has failed (so they know not to receive data from that node). Nodes discard all data from that partition (partial reduces discard the partition's window set and total reduces discard the partition's group in the tentative set). ----- Meeting Notes (6/27/11 18:55) ----- Turn this into two slides.
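The per-partition bookkeeping in the fault-tolerance notes can be illustrated with a minimal, single-process sketch of a total reduce computing avg(value) per symbol. It is not the authors' implementation: the class `TentativeAvgReducer` and its methods are hypothetical, it models only the total-reduce side (not the partial-reduce windows), and, because the note on part_end is cut off, it assumes that a part_end punctuation means a partition is complete and can be merged into the final result.

```python
from collections import defaultdict


class TentativeAvgReducer:
    """Hypothetical total reduce computing avg(value) per symbol.

    Results are kept in a tentative set separated by partition, so the
    contributions of one partition can be dropped if its node fails.
    """

    def __init__(self):
        # part_id -> symbol -> [sum, count]; not yet committed
        self.tentative = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
        # symbol -> [sum, count]; only partitions that reached part_end
        self.committed = defaultdict(lambda: [0.0, 0])

    def on_record(self, part_id, symbol, value):
        acc = self.tentative[part_id][symbol]
        acc[0] += value
        acc[1] += 1

    def on_part_end(self, part_id):
        # ASSUMPTION: part_end marks the partition as complete, so its
        # tentative group is merged into the committed result.
        for symbol, (total, count) in self.tentative.pop(part_id, {}).items():
            acc = self.committed[symbol]
            acc[0] += total
            acc[1] += count

    def on_partition_failed(self, part_id):
        # Discard everything received for the failed partition; the
        # partition is expected to be reprocessed from the start.
        self.tentative.pop(part_id, None)

    def result(self):
        return {symbol: total / count
                for symbol, (total, count) in self.committed.items()}
```

For example, if partition 2's node fails after sending one record, that record is dropped; when partition 2 is later reprocessed and closed with part_end, only the fresh values count toward the average.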

IEEE CLOUD '11 Presentation Transcript

  • Deadline Queries: Leveraging the Cloud to Produce On-Time Results
    Authors: David Ribeiro Alves, Pedro Bizarro, Paulo Marques
  • In a nutshell
    Cluster computing is widely used to solve “Big Data” problems
    Users express the computation with programming abstractions, e.g., MapReduce, but are left with some difficult questions:
    how many nodes?
    how long will it take?
    Proposed solution: users define a deadline; the cluster expands/contracts to meet it.
    CLOUD '11
  • Introducing Deadline Queries
    Cluster computing tasks that complete within a deadline…
    … while minimizing cost/resource consumption
    Independently of:
    Processing Capacity per Machine
    Faults or Perturbations
    Initial Number of Nodes
    Data Size, Content or Skew
    Computation Complexity
  • Approaches in current systems
    … make the task fit the cluster.
  • Our Approach
    … make the cluster fit the task.
  • Architecture and Runtime
    Ex: SELECT symbol, avg(value), avg(volume) FROM Stocks
    GROUP BY symbol FINISH IN 900 SEC
    [Diagram: the Query goes to the Master Node, which requests nodes from the IaaS Provider, receives metrics, and modifies the cluster; Worker Nodes each process a data partition (Part. 1, Part. 2, Part. 3 … Part. n).]
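The worker side of this architecture, where nodes fetch partitions from a queue (per the speaker notes), can be sketched in a single process, with threads standing in for worker nodes. `run_workers`, `process`, and `extra_workers` are hypothetical names. The real system distributes partitions over the network and lets the master decide when to request more nodes from the IaaS provider.

```python
import queue
import threading


def run_workers(partitions, process, initial_workers=1, extra_workers=0):
    """Hypothetical sketch: workers pull partitions from a shared queue.

    Because work is pulled rather than pre-assigned, workers started after
    the job has begun (extra_workers) simply take whatever partitions
    remain, with no re-planning of the job.
    """
    work = queue.Queue()
    for part in partitions:
        work.put(part)

    results = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                part = work.get_nowait()
            except queue.Empty:
                return  # no partitions left
            out = process(part)
            with lock:
                results.append((name, part, out))

    threads = [threading.Thread(target=worker, args=(f"w{i}",))
               for i in range(initial_workers)]
    for t in threads:
        t.start()
    # Simulates the master adding nodes after processing has started.
    for i in range(extra_workers):
        t = threading.Thread(target=worker, args=(f"new{i}",))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results
```

Pulling from a queue is what lets arriving nodes contribute immediately: a late worker needs no assignment, only access to the queue.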
  • Stream Processing
    Continuous processing allows phases to start before previous phases complete
    Continuous processing makes it possible to continuously gather progress metrics about the computation as a whole
    SP provides continuous load balancing, which makes it possible to:
    take immediate advantage of arriving nodes
    deal with temporary or permanent asymmetries
    deal with data skew
    SP fault tolerance allows a quick response to faults
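The continuous load balancing above, together with the speaker note on content-insensitive versus content-sensitive routing, can be illustrated with two hypothetical routing functions. This is a sketch under stated assumptions: the function names are invented, and the content-sensitive case uses plain hash partitioning. The load-balanced content-sensitive variant the notes mention (needed for skewed keys) is not shown.

```python
import zlib


def route_content_insensitive(pending):
    """Pick the node with the fewest pending records (least loaded).

    Any node can take any record, so this suits stateless operators such
    as the map (select/project) step. `pending` lists per-node backlogs.
    """
    return min(range(len(pending)), key=lambda node: pending[node])


def route_content_sensitive(key, num_nodes):
    """Send every record with the same key to the same node.

    A grouped reduce (e.g. GROUP BY symbol) needs all values for a key on
    one node. zlib.crc32 is used because Python's built-in str hash is
    randomized per process and would route inconsistently across nodes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_nodes
```

The trade-off matches the note: route by load wherever the operator allows it, and fall back to key-based routing only where state forces it.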
  • MapReduce
    SELECT symbol, avg(value), avg(volume)
    FROM Stocks
    GROUP BY symbol
    FINISH IN 900 sec
    MapReduce Decomposition:
    Fetch & Transform
    Map (Select/Project)
    Group
    Reduce (Aggregate)
    Store Results
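To illustrate the decomposition above, the single-process sketch below runs the example query through the Fetch & Transform, Map, Group and Reduce phases, returning the result in place of Store Results. It assumes, hypothetically, that each Stocks record arrives as a `symbol,value,volume` text line, and all function names are illustrative. A streaming engine would keep a running (sum, count) per symbol instead of materializing whole groups.

```python
from collections import defaultdict


def fetch_and_transform(lines):
    # ASSUMPTION: records are "symbol,value,volume" text lines.
    for line in lines:
        symbol, value, volume = line.split(",")
        yield symbol, float(value), float(volume)


def map_phase(rows):
    # Select/project: key each row by symbol, keep only needed columns.
    for symbol, value, volume in rows:
        yield symbol, (value, volume)


def group_phase(pairs):
    groups = defaultdict(list)
    for key, payload in pairs:
        groups[key].append(payload)
    return groups


def reduce_phase(groups):
    # Aggregate: avg(value), avg(volume) per symbol.
    out = {}
    for symbol, payloads in groups.items():
        n = len(payloads)
        out[symbol] = (sum(v for v, _ in payloads) / n,
                       sum(vol for _, vol in payloads) / n)
    return out


def run_query(lines):
    # The returned dict stands in for the Store Results phase.
    return reduce_phase(group_phase(map_phase(fetch_and_transform(lines))))
```

For instance, `run_query(["AAPL,10,100", "IBM,5,50", "AAPL,20,300"])` yields one (avg value, avg volume) pair per symbol.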
  • Streaming MapReduce - Scaling
    Stream Processing => load balancing and fault tolerance in a changing cluster
    MapReduce => Simple, parallel, scalable programming and execution model
  • Progress estimation
    Track consumed vs. remaining data and use linear regression to estimate the finish time.
    React accordingly by either expanding or contracting the cluster.
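A minimal sketch of this progress estimation, assuming progress is sampled as (time, consumed) pairs and the estimate is an ordinary least-squares line through those samples. The slide does not say how many nodes to add or remove. The `nodes_needed` rule below is hypothetical and assumes throughput scales linearly with cluster size. Its `max_nodes` parameter echoes the speaker note about a threshold on the maximum number of machines.

```python
import math


def fit_line(ts, ys):
    """Ordinary least-squares fit of ys = a + b * ts; returns (a, b)."""
    n = len(ts)
    if n < 2:
        raise ValueError("need at least two progress samples")
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var = sum((t - mean_t) ** 2 for t in ts)
    if var == 0:
        raise ValueError("samples must be taken at different times")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    b = cov / var
    return mean_y - b * mean_t, b


def estimate_finish(ts, consumed, total):
    """Predicted time at which the consumed data reaches `total`."""
    a, b = fit_line(ts, consumed)
    if b <= 0:
        return math.inf  # no measurable progress
    return (total - a) / b


def nodes_needed(ts, consumed, total, deadline, current_nodes, max_nodes=None):
    """Hypothetical expand/contract rule.

    ASSUMPTION: throughput grows linearly with the number of nodes, which
    real clusters only approximate.
    """
    _, rate = fit_line(ts, consumed)
    remaining = total - consumed[-1]
    time_left = deadline - ts[-1]
    if remaining <= 0:
        return current_nodes  # nothing left to process
    if time_left <= 0 or rate <= 0:
        return None  # no basis for a prediction
    required_rate = remaining / time_left
    needed = max(1, math.ceil(current_nodes * required_rate / rate))
    return needed if max_nodes is None else min(needed, max_nodes)
```

With samples at t = 0, 100, 200, 300 s showing 0, 50, 100, 150 units consumed out of 600, the fitted rate is 0.5 units/s and `estimate_finish` predicts completion at t = 1200 s. Under a 900 s deadline, 4 nodes would therefore expand to 6. Under an 1800 s deadline, the same cluster could contract to 3.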
  • Experimental Evaluation - Setup
    Real-world environment experiments
    On top of Amazon EC2
    Running Query:
    SELECT symbol, avg(value), avg(volume) FROM Stocks GROUP BY symbol FINISH IN 900 sec
    Used between 1 and 27 machines (m1.large)
    2 × dual-core Xeon (2.66 GHz)
    7.5 GB of RAM
    Experiments show:
    Predicted remaining time
    Number of nodes
  • Exp. 1 – Varying Initial Cluster Size
  • Exp. 2 – Varying Deadline
  • Exp. 3 – Introducing Perturbations
  • Conclusions
    Cloud computing, e.g., IaaS, allows new approaches to cluster computing and new optimization goals.
    Deadline Queries may help express computation provisioning requirements beyond the number of nodes.
    Deadline Queries are a viable alternative for implementing hard time limits on query execution.
    A real implementation and evaluation show that the approach is feasible and works as expected.
  • Questions?
  • Fault Tolerance