IEEE CLOUD '11

Presentation of the paper "Deadline Queries: Leveraging the Cloud to Produce On-Time Results".

Published in: Technology, Business

  • Meeting Notes (10/20/10 14:48): Generic notes: be more "sharp". Focus on the audience. I never say how I am going to evaluate the system. The Gantt chart is too small.
  • In particular I’d like to refer to two practical cases. The first is that of a Portuguese bank that must finish processing 10M transactions and produce the respective reports by the morning, but has no idea how much machine power it needs to do so. The second is that of a Portuguese telecom company that is building the largest Portuguese private cloud, but still has problems allocating nodes to tasks so that they are guaranteed to complete in time.
  • Create an animation in a slide or two that describes how the problem was previously dealt with and our solution; introduce the running example here. Story of the slide: start processing documents (start moving the doc arrow to the cluster); when the system predicts the deadline will be missed (clock turns red), it starts discarding data or reducing accuracy (put documents in the trash). Mention orally that other systems discard data, but do not dwell on this for long. Mention that in many cases data cannot be thrown away (previous examples).
  • Story of the slide: start processing documents (start moving the doc arrow to the cluster); when we see the deadline will be missed (clock turns red), start expanding the resources used.
  • Mention that we adopted streaming MapReduce so that we are able to deal with changes in the cluster.
  • Transform the task into a dataflow and split the data into partitions. Request nodes and assign dataflow parts to them. Nodes fetch partitions from a queue and insert them into the dataflow. Nodes send report updates to the master, which decides if more nodes are needed, and if so… **CLICK** New nodes are added to the computation. The fact that we use streaming MapReduce allows us to: deal with data skew, by using streaming routing techniques; deal with faults relatively quickly.
  • Transform the task into a dataflow and split the data into partitions. Request nodes and assign dataflow parts to them. Nodes fetch partitions from a queue and insert them into the dataflow. Nodes send report updates to the master, which decides if more nodes are needed, and if so… **CLICK** New nodes are added to the computation. Use load-balanced content-insensitive routing where possible. Use load-balanced content-sensitive routing where needed. Meeting Notes (6/27/11 19:18): Add the names of the machines. Meeting Notes (6/29/11 14:33): "efficient" is too relative a word.
  • Experiment 1 – Varying Initial Cluster Size. Click 1 – The experiment starts with 1 node. Click 2 – At first we
  • Series 1. Let's see the experiments. We begin by executing the query starting with one node. **CLICK** The system starts execution with 1 node. At first there are no statistics on progress, so nothing can be said about whether the deadline will be met. **CLICK** As soon as the system detects the deadline will be missed… Meeting Notes (6/29/11 14:33): threshold on deadline fault detection; threshold on max number of machines; interesting story to tell: how it would behave under different cost models; speculOS.
  • Clarify orally how the perturbations were injected (Linux commands).
  • Normal operation: Maps process single partitions and tag results with part_id. Partial reduces maintain per-partition windows. Total reduces maintain a tentative set, where results are separated partition-wise. Upon receiving a part_end punctuation… When faults occur: the master notifies the remaining nodes that the node has failed (so they know not to receive data from that node). Nodes discard all data from that partition (partial reduces discard the partition's window set and total reduces discard the partition's group in the tentative set). Meeting Notes (6/27/11 18:55): Turn this into two slides.
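
The notes above describe a master that uses progress reports to decide whether more nodes are needed. A minimal Python sketch of such a check follows. It assumes progress arrives as (elapsed seconds, fraction of data consumed) samples and that a least-squares line through them is extrapolated to full consumption. The function names and the `slack` parameter are hypothetical illustrations, not taken from the paper or its implementation.

```python
from typing import List, Tuple


def estimate_finish_time(samples: List[Tuple[float, float]]) -> float:
    """Fit a least-squares line through (elapsed_sec, fraction_consumed)
    samples and return the elapsed time at which the fraction reaches 1.0."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two progress samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must be taken at different times")
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope <= 0:
        return float("inf")  # no measurable progress yet
    intercept = mean_f - slope * mean_t
    return (1.0 - intercept) / slope


def scaling_decision(samples: List[Tuple[float, float]],
                     deadline_sec: float,
                     slack: float = 0.1) -> str:
    """Return 'expand', 'contract', or 'hold'.

    ASSUMPTION: `slack` is a hypothetical margin; the cluster contracts only
    when the predicted finish is more than `slack` ahead of the deadline."""
    finish = estimate_finish_time(samples)
    if finish > deadline_sec:
        return "expand"
    if finish < deadline_sec * (1.0 - slack):
        return "contract"
    return "hold"
```

For instance, with samples showing 10% of the data consumed every 100 seconds, the extrapolated finish is about 1000 seconds, so a 900-second deadline leads to an "expand" decision.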

    1. Deadline Queries: Leveraging the Cloud to Produce On-Time Results
       Authors: David Ribeiro Alves, Pedro Bizarro, Paulo Marques
    2. In a nutshell
       Cluster computing is widely used to solve "Big Data" problems.
       Users use programming abstractions to express the computation, e.g., MapReduce, but are left with some difficult questions:
         • How many nodes?
         • How long will it take?
       Proposed solution: users define a deadline; the cluster expands/contracts to meet it.
    3. Introducing Deadline Queries
       Cluster computing tasks that complete within a deadline…
       … while minimizing cost/resource consumption
       Independently of:
         • Processing capacity per machine
         • Faults or perturbations
         • Initial number of nodes
         • Data size, content or skew
         • Computation complexity
    4. Approaches in current systems
       … make the task fit the cluster.
    5. Our Approach
       … make the cluster fit the task.
    6. Architecture and Runtime
       Ex: SELECT symbol, avg(value), avg(volume) FROM Stocks GROUP BY symbol FINISH IN 900 SEC
       [Architecture diagram; labels: Query, Master Node, IaaS Provider, request nodes, metrics, mod. cluster, Worker Nodes, Part. 1 … Part. n]
    7. Stream Processing
       Continuous processing allows phases to start before previous phases complete.
       Continuous processing makes it possible to continuously gather progress metrics about the computation as a whole.
       Stream processing (SP) provides continuous load balancing, which makes it possible to:
         • take immediate advantage of arriving nodes
         • deal with temporary or permanent asymmetries
         • deal with data skew
       SP fault tolerance makes it possible to respond quickly to faults.
    8. MapReduce
       SELECT symbol, avg(value), avg(volume)
       FROM Stocks
       GROUP BY symbol
       FINISH IN 900 sec
       MapReduce decomposition:
         • Fetch & Transform
         • Map (Select/Project)
         • Group
         • Reduce (Aggregate)
         • Store Results
    9. Streaming MapReduce - Scaling
       Stream Processing => load balancing and fault tolerance in a changing cluster
       MapReduce => simple, parallel, scalable programming and execution model
    10. Progress estimation
        Consumed vs. remaining data + linear regression to estimate finish time.
        React accordingly by either expanding or contracting the cluster.
    11. Experimental Evaluation - Setup
        Real-world environment experiments
        On top of Amazon EC2
        Running query:
        SELECT symbol, avg(value), avg(volume) FROM Stocks GROUP BY symbol FINISH IN 900 sec
        Used between 1 and 27 machines (m1.large):
          • 2× dual-core Xeon (2.66 GHz)
          • 7.5 GB of RAM
        Experiments show:
          • Predicted remaining time
          • Number of nodes
    12. Exp. 1 – Varying Initial Cluster Size
    13. Exp. 2 – Varying Deadline
    14. Exp. 3 – Introducing Perturbations
    15. Conclusions
        Cloud computing, e.g., IaaS, allows new approaches to cluster computing and new optimization goals.
        Deadline Queries may help in expressing computation prov. requirements beyond the number of nodes.
        Deadline Queries are a viable alternative for implementing hard time limits for query execution.
        A real implementation and evaluation show the approach is feasible and works as expected.
    16. Questions?
    17. Fault Tolerance
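
The speaker notes for the fault tolerance slide describe results tagged with a part_id, a tentative set grouped by partition, and discarding a partition's data when its node fails. A minimal Python sketch of that bookkeeping for a total reduce computing per-symbol averages is given below. The class and method names are hypothetical. What happens on a part_end punctuation is not spelled out in the notes, so committing the partition's tentative results at that point is an assumption made here.

```python
from collections import defaultdict
from typing import Dict


class TotalReduce:
    """Per-symbol averages with uncommitted results kept per partition."""

    def __init__(self) -> None:
        # symbol -> [running sum, count] for results that are final
        self.committed = defaultdict(lambda: [0.0, 0])
        # part_id -> symbol -> [running sum, count]: the tentative set
        self.tentative = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))

    def on_result(self, part_id: int, symbol: str, value: float) -> None:
        """Accumulate a mapped value under the partition it came from."""
        acc = self.tentative[part_id][symbol]
        acc[0] += value
        acc[1] += 1

    def on_part_end(self, part_id: int) -> None:
        """ASSUMPTION: a part_end punctuation makes the partition final,
        so its tentative group is merged into the committed results."""
        for symbol, (total, count) in self.tentative.pop(part_id, {}).items():
            acc = self.committed[symbol]
            acc[0] += total
            acc[1] += count

    def on_partition_failed(self, part_id: int) -> None:
        """Discard the partition's group in the tentative set, as the notes
        describe; the partition can then be reprocessed from scratch."""
        self.tentative.pop(part_id, None)

    def averages(self) -> Dict[str, float]:
        """Averages over committed data only."""
        return {s: total / count
                for s, (total, count) in self.committed.items() if count}
```

Keeping tentative results separated by partition means a failure only throws away the work of the affected partition, and the committed averages never include partially processed data.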
