Green scheduling


My first seminar at ISISLab on green scheduling

Published in: Education, Technology, Business
  1. 1. Green scheduling<br />Vincenzo De Maio<br />
  2. 2. Outline<br />Introduction<br />Theoretical Model<br />Computation model<br />Energy consumption model<br />Throttling model<br />Simulator<br />Green Heuristics<br />Results and future works<br />References<br />
  3. 3. What is green computing?<br />“The study and practice of designing, manufacturing, using, and disposing of computers, servers, and associated subsystems such as monitors, printers, storage devices, and networking and communications systems efficiently and effectively with minimal or no impact on the environment.”[1]<br />Professor Dr San Murugesan, Faculty of Management, Multimedia University, Cyberjaya, Malaysia<br />
  4. 4. Why does green computing matter?<br />Some numbers:<br />2 Google searches = 14 g of CO2 (about as much as boiling a kettle!) (Alex Wissner-Gross, Harvard University physicist) [2][3]<br />Windows 7 + Microsoft Office 2007 requires 70 times more RAM than Windows 98 + Office 2000 to write exactly the same text or send the same email [4]<br />In 2010, servers were responsible for 2.5% of the total energy consumption of the USA. A further 2.5% was used for cooling them. [5]<br />It was estimated that by 2020, servers would use more of the world's energy than air travel if current trends continued [5]<br />
  5. 5. Further references<br />Green500<br />GreenIT<br />CO2Stats<br />
  6. 6. Why green scheduling?<br />A green scheduler could provide<br />Energy-oriented task assignment<br />Setting the correct power level for the current workload<br />Improved use of power management<br />Learning the power usage profile of job types<br />It could be part of the operating system's power management<br />
  7. 7. What do we want from a green scheduler?<br />Efficiency<br />Simplicity<br />Time is money! <br />
  8. 8. Outline<br />Introduction<br />Theoretical Model<br />Computation model<br />Energy consumption model<br />Throttling model<br />Simulator<br />Green Heuristics<br />Results and future works<br />References<br />
  9. 9. Computation model<br />Tasks usually depend on each other<br />DAGs: Directed Acyclic Graphs<br />If task v depends on task u, we put an arc from node u to node v<br />
  10. 10. Computation model<br />SP-DAGs: Series-parallel DAGs<br />A DAG made of two terminals (source and target) joined by a single arc is an SP-DAG<br />Larger SP-DAGs are built by parallel and series composition of other SP-DAGs<br />
  11. 11. Why SP-DAGs?<br />They describe several significant classes of computations (for instance divide-and-conquer algorithms)<br />They are the natural abstraction for several parallel programming languages (such as Cilk) [10]<br />We can recognize whether a DAG is an SP-DAG in linear time<br />We can easily transform an arbitrary DAG into an SP-DAG in linear time, using SP-ization<br />
  12. 12. LEGO® DAGs<br />Assessing the computational benefits of AREA-Oriented DAG-Scheduling (Gennaro Cordasco, Rosario De Chiara, Arnold L. Rosenberg, 2009)<br />SP-DAGs built from a repertoire of Connected Bipartite Building Block DAGs, which represent the various subcomputations<br />
  13. 13. Further definitions on DAGs and SP-DAGs<br />A node v in the DAG can be<br />Ineligible: v has at least one non-executed parent<br />Eligible: all of v's parents have been executed<br />Assigned/executed: v has been scheduled for execution or executed<br />Schedule:<br />A topological sort of the DAG<br />Obtained by a rule for selecting which eligible node to execute at each step of the computation<br />
  14. 14. Critical path<br />Longest path from the source to the sink<br />Why is it so important?<br />We clearly can't finish our computation before executing every node on the critical path<br />So the time needed to execute the critical path is a trivial lower bound on the makespan.<br />
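The critical path length can be computed in linear time by visiting the nodes in topological order. The following is a minimal sketch, assuming the DAG is given as successor lists and each node carries its own task length; the names `CriticalPathSketch` and `criticalPathLength` are illustrative and are not taken from the simulator.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Illustrative sketch: length of the longest (critical) path in a node-weighted DAG. */
final class CriticalPathSketch {

    /**
     * @param successors successors.get(u) lists the nodes that depend on u
     * @param length     length[u] is the execution time of task u
     * @return total length of the longest path in the DAG
     */
    static double criticalPathLength(List<List<Integer>> successors, double[] length) {
        int n = length.length;
        int[] pendingParents = new int[n];
        for (List<Integer> out : successors) {
            for (int v : out) {
                pendingParents[v]++;
            }
        }
        // finish[u] = earliest completion time of u if cores were unlimited
        double[] finish = new double[n];
        Deque<Integer> ready = new ArrayDeque<>();
        for (int u = 0; u < n; u++) {
            if (pendingParents[u] == 0) {
                finish[u] = length[u];
                ready.add(u);
            }
        }
        double longest = 0.0;
        while (!ready.isEmpty()) {
            int u = ready.poll();
            longest = Math.max(longest, finish[u]);
            for (int v : successors.get(u)) {
                // v cannot finish earlier than its slowest parent plus its own length
                finish[v] = Math.max(finish[v], finish[u] + length[v]);
                if (--pendingParents[v] == 0) {
                    ready.add(v);
                }
            }
        }
        return longest;
    }
}
```

No schedule, on any number of cores, can produce a makespan below the value returned here, which is why it serves as the lower bound mentioned on the slide.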
  15. 15. Further definitions on DAGs and SP-DAGs<br />Yield of a node: number of nodes that become eligible when the given node completes its execution.<br />$E_\Sigma(i)$: number of eligible nodes at step $i$ of schedule $\Sigma$<br />$\mathrm{AREA}(\Sigma) \triangleq \sum_{i=0}^{n} E_\Sigma(i)$<br />
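As a worked illustration of the AREA definition, the sketch below replays a schedule and sums the number of eligible nodes after each step. It assumes that a step executes exactly one node and that a node counts as eligible only while it is still unexecuted; the class `AreaSketch` and its methods are hypothetical names, and the quadratic recount is chosen for clarity rather than speed.

```java
import java.util.List;

/** Illustrative sketch: AREA of a schedule, i.e. the sum of eligible-node counts. */
final class AreaSketch {

    /**
     * @param parents parents.get(v) lists the nodes v depends on
     * @param order   the schedule: all nodes in execution order (a topological sort)
     * @return sum over steps i = 0..n of the number of eligible, unexecuted nodes
     */
    static int area(List<List<Integer>> parents, int[] order) {
        boolean[] executed = new boolean[order.length];
        int area = countEligible(parents, executed); // step 0: nothing executed yet
        for (int node : order) {
            executed[node] = true;
            area += countEligible(parents, executed);
        }
        return area;
    }

    private static int countEligible(List<List<Integer>> parents, boolean[] executed) {
        int count = 0;
        for (int v = 0; v < executed.length; v++) {
            if (executed[v]) {
                continue;
            }
            boolean allParentsDone = true;
            for (int p : parents.get(v)) {
                if (!executed[p]) {
                    allParentsDone = false;
                    break;
                }
            }
            if (allParentsDone) {
                count++;
            }
        }
        return count;
    }
}
```

For a DAG with arcs 0→1, 0→2 and 1→3, the schedule 0, 1, 2, 3 has AREA 6 while 0, 2, 1, 3 has AREA 5: executing node 1 early pays off because it has a positive yield.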
  16. 16. Outline<br />Introduction<br />Theoretical Model<br />Computation model<br />Energy consumption model<br />Throttling model<br />Simulator<br />Green Heuristics<br />Results and future works<br />References<br />
  17. 17. Energy consumption model<br />We need a realistic model for energy consumption<br />We should check<br />Circuit dissipation<br />Throttling models<br />
  18. 18. Energy consumption model<br />CMOS circuit dissipation:<br />$P = C V^2 f + I_{mean} V + V I_{leakage}$<br />(we won’t consider short circuit power and leakage)<br />We assume a linear relationship between voltage and frequency<br />$f = kV$<br />
  19. 19. Energy consumption model<br />Our model:<br />$E = C \times T \times f^3$<br />Where:<br />$T = \frac{\text{clock cycles}}{f}$<br />$f$ = clock cycles per second<br />$C$ collects several constants, such as the capacitance, $k$ and the clock multiplier<br />
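The cubic term follows from keeping only the dynamic part of the dissipation formula and substituting the linear voltage-frequency relationship. In the derivation below, $C_0$ (the switched capacitance) and $N$ (the number of clock cycles of the task) are symbols introduced here for illustration; the slides fold $C_0$ and $k$ into the single constant $C$.

```latex
\begin{align*}
P &= C_0 V^2 f, \qquad f = kV \;\Rightarrow\; V = \frac{f}{k} \\
P &= \frac{C_0}{k^2}\, f^3 = C f^3 \\
E &= P \times T = C \times T \times f^3 \\
E &= C \times \frac{N}{f} \times f^3 = C\, N f^2 \qquad \text{since } T = \frac{N}{f}
\end{align*}
```

Under these assumptions, halving the frequency doubles the execution time of a task but cuts its energy to one quarter.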
  20. 20. Outline<br />Introduction<br />Theoretical Model<br />Computation model<br />Energy consumption model<br />Throttling model<br />Simulator<br />Green Heuristics<br />Results and future works<br />References<br />
  21. 21. CPU throttling models<br />Which throttling model is commonly used by modern processors?<br />ACPI: Advanced Configuration and Power Interface [6]<br />A fully platform-independent standard that provides:<br />Monitoring<br />Configuration<br />Hardware discovery<br />Power management<br />Defines power states for every device<br />
  22. 22. Performance vs power states<br />Power states:<br />C0: Operational power state<br />C1: Halt state<br />C2: Stop-clock<br />C3: Sleep<br />Performance states:<br />P0: Highest-performance state<br />P1: Lower than P0, frequency/voltage scaled<br />Pn: Lower than Pn-1, frequency/voltage scaled<br />In our model, we implement only the C0 power state and the P0, P1, P2 performance states.<br />
  23. 23. Our throttling model<br />We use a DFS (Dynamic Frequency Scaling) model, assuming that scaling adds no energy overhead<br />P0: $1.0 \times f$<br />P1: $0.7 \times f$<br />P2: $0.5 \times f$<br />
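To see what the three performance states mean for a single task, the sketch below evaluates the energy model $E = C \times T \times f^3$ with $T$ = clock cycles / $f$ at each state. The class `ThrottlingSketch`, its method names and the numeric values in `main` are made up for illustration and do not come from the simulator.

```java
/** Illustrative sketch: time and energy of one task at the P0, P1 and P2 states. */
final class ThrottlingSketch {

    /** Frequency multipliers of the performance states P0, P1, P2. */
    static final double[] SPEED_FACTORS = {1.0, 0.7, 0.5};

    /** Execution time T = cycles / f, with f = factor * maxFrequency. */
    static double executionTime(double cycles, double maxFrequency, double factor) {
        return cycles / (factor * maxFrequency);
    }

    /** Energy E = C * T * f^3 for a task with the given number of clock cycles. */
    static double energy(double c, double cycles, double maxFrequency, double factor) {
        double f = factor * maxFrequency;
        return c * executionTime(cycles, maxFrequency, factor) * f * f * f;
    }

    public static void main(String[] args) {
        double c = 1.0;              // lumped hardware constant (arbitrary value)
        double cycles = 1000.0;      // task size in clock cycles (arbitrary value)
        double maxFrequency = 100.0; // cycles per second at P0 (arbitrary value)
        for (int p = 0; p < SPEED_FACTORS.length; p++) {
            double s = SPEED_FACTORS[p];
            System.out.printf("P%d: time=%.2f energy=%.0f%n", p,
                    executionTime(cycles, maxFrequency, s),
                    energy(c, cycles, maxFrequency, s));
        }
    }
}
```

In this model a task run at P2 takes twice as long as at P0 and uses a quarter of the energy, which is the trade-off the heuristics try to exploit.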
  24. 24. Further considerations<br />In our model, an idle core consumes no energy<br />We do not track the energy used to run the scheduling algorithm itself<br />We do not track the energy dissipated by memory use<br />Energy is unbounded<br />We assume that the throttling level of each core can be set individually<br />
  25. 25. Outline<br />Introduction<br />Theoretical Model<br />Computation model<br />Energy consumption model<br />Throttling model<br />Simulator<br />Green Heuristics<br />Results and future works<br />References<br />
  26. 26. The simulator<br />We implemented this model in a DAG-Scheduling simulator, <br />Providing classes and methods to calculate energy consumption<br />Implementing the energy model we discussed earlier<br />Paying attention to extensibility<br />
  27. 27. A typical simulation<br />Loads a DAG<br />Computes the graph's critical path<br />Initializes the schedulers that need to be tested<br />Executes the schedulers on the given graph for a given number of trials (usually 100, because randomness influences the schedulers)<br />At the end of the iterations, it collects statistics about the executions, specifically<br />Makespan (min, max, average)<br />Average energy consumption<br />Repeats for each DAG<br />
  28. 28. How we implemented the model<br />Our focus: Extensibility<br />We wanted our simulator to support multiple kinds of models<br />Providing<br />Core abstraction<br />Throttling level abstraction<br />Energy-aware scheduler abstraction<br />Totally decoupled from core and throttling level<br />Making it easier to add<br />Different scheduling algorithms<br />Different core types<br />Different energy models<br />
  29. 29. Core abstraction<br />A core can<br />Execute tasks<br />Set its own throttling level<br />Track its power consumption<br />Problem: different cores could implement different throttling strategies<br />Solution: <br />Every core has its own throttling levels array<br />Throttling level is a nested class in the core implementation<br />
  30. 30. Throttling level abstraction<br />A throttling level contains<br />Information about frequency and consumption<br />Methods to calculate<br />Due date of a task at a given level (the lower the level, the slower the task execution)<br />Power consumption at a given level<br />
  31. 31. Energy package<br />Core interface<br />We assume that every core can execute tasks and set its own throttling<br />Abstract class ThrottlingLevel<br />Implements a throttling level, with energy consumption info and frequency.<br />Class DummyCore<br />Base Core implementation<br />Class DefaultThrottlingLevel<br />DummyCore nested class, implements our performance states<br />
  32. 32. Core interface<br />/**<br /> * Executes a task on this core<br /> * @param node The node that models the task<br /> * @param length Task length if executed at max power<br /> * @return the real task length (this could differ from the input<br /> * if the core is set to a different throttling level)<br /> */<br />public double executeTask(ICONode node, double length);<br />/**<br /> * Sets the core's power consumption to the idle consumption<br /> * of its current throttling level<br /> */<br />public void setIdle();<br />/**<br /> * Sets the core to a higher power level<br /> */<br />public void increaseThrottlingLevel();<br />/**<br /> * Sets the core to a lower power level<br /> */<br />public void decreaseThrottlingLevel();<br />
  33. 33. ThrottlingLevel<br />/**<br /> * Calculates the power consumption for a given task length,<br /> * according to the power consumption unit and any other<br /> * parameters chosen by the implementing class.<br /> * <br /> * @param length The task length<br /> * @return Power consumption for this task<br /> */<br />abstract double getPowerConsumptionPerTask(double length);<br />/**<br /> * Calculates how the task length is modified<br /> * for the given throttling level<br /> * <br /> * @param length ideal length of the task<br /> * @return the real task length for the given throttling level<br /> */<br />abstract double getRealLength(double length);<br />
  34. 34. Throttling level initialization<br />public void initializeThrottlingLevels(double hardwareConstant,<br /> double maxFreq, double maxVoltage, int throttlingLevels) {<br /> this.levels = new ThrottlingLevel[throttlingLevels];<br /> // scaled levels: 1/2, 2/3, ... of the maximum frequency and voltage<br /> for (int i = 0; i < throttlingLevels - 1; i++) {<br /> double fraction = (i + 1.0) / (i + 2.0);<br /> levels[i] = new DefaultThrottlingLevel("LEVEL" + i,<br /> hardwareConstant, fraction * maxFreq, fraction * maxVoltage);<br /> }<br /> // the last level runs at full frequency and voltage<br /> this.levels[throttlingLevels - 1] = new DefaultThrottlingLevel("LEVEL" + (throttlingLevels - 1),<br /> hardwareConstant, maxFreq, maxVoltage);<br /> // necessary for correct use of increase and decrease<br /> Arrays.sort(levels);<br /> // by default we set the maximum power level, assumed to be last after an ascending sort<br /> this.throttlingLevelIndex = throttlingLevels - 1;<br /> this.currentThrottlingLevel = levels[this.throttlingLevelIndex];<br /> this.dissipatedPower = 0.0;<br />}<br />
  35. 35. Energy-aware scheduler abstraction<br />An energy-aware scheduler has to<br />Work with different types of cores<br />Track the makespan and the energy consumption<br />Implement logic for<br />Core selection<br />Eligible node selection<br />Choosing the right throttling level<br />
  36. 36. Energy-aware scheduler package<br />CoreSelector<br />Implements the free core selection strategy (in these tests we use the DefaultCoreSelector class)<br />EnergyAwareScheduler<br />Base class for every scheduler that tracks energy consumption<br />
  37. 37. Inspecting the EnergyAwareScheduler class<br />/**<br /> * Instantiates a new EnergyAwareScheduler<br /> * @param numCores number of cores<br /> * @param coreClass class that models the desired core type<br /> * @throws InstantiationException<br /> * @throws IllegalAccessException<br /> * @throws IllegalArgumentException if numCores <= 0<br /> */<br />public EnergyAwareScheduler(int numCores, Class<? extends Core> coreClass)<br /> throws InstantiationException, IllegalAccessException, IllegalArgumentException<br />/**<br /> * Calculates the task length on a given core<br /> * @param coreIndex index of the core in the corePool<br /> * @param eventLength ideal length of the task<br /> * @param node node to be executed<br /> * @return the task length if executed on the core at coreIndex<br /> */<br />protected double getTimeOffsetForCore(int coreIndex, double eventLength,<br /> ICONode node)<br />
  38. 38. Inspecting the EnergyAwareScheduler class<br />/**<br /> * Sets throttling for the core that is going to execute a task in this step<br /> * @param coreIndex the core id<br /> */<br />protected void setBusyThrottling(int coreIndex)<br />/**<br /> * Sets the throttling state for cores that will remain idle<br /> */<br />protected void setIdleThrottling()<br />public double getTotalPowerConsumption()<br />private void calculateIdleConsumptions()<br />
  39. 39. What about scheduling?<br />Schedule steps are implemented using the TimeLine object<br />A priority queue containing two types of TimeEvent<br />processorsArrives<br />clientFinishes<br />At each scheduling step, the first event is removed from the TimeLine<br />The scheduling logic is implemented in the runBatchedMakespan method<br />Further initialization is done in the initBatchedMakespan method<br />
  40. 40. runBatchedMakespan method<br />While (executedNode != target)<br />event := timeline.pollNextEvent();<br />setOverallThrottlingLevel();<br />Switch(event)<br />Case(processorsArrives)<br />ne := min(availableCores, elegibleNodesNum)<br />For i := 0 to ne - 1<br />nextNode := getNextElegibleNode();<br />coreIndex := coreSelector.getCoreIndex();<br />corePool[coreIndex].setBusy();<br />setBusyThrottling(coreIndex);<br />timeOffset := getTimeOffsetForCore(coreIndex, eventLength, nextNode);<br />timeline.add(new TimeEvent(event.getTime() + timeOffset, clientFinishes, nextNode));<br />
  41. 41. runBatchedMakespan method<br />Case(clientFinishes)<br />executedNode := event.getNode();<br />Execute(executedNode);<br />corePool[event.getOwnerCore()].setFree();<br />
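The timeline that drives this loop is a priority queue of timed events. The following is a minimal sketch of that idea, assuming events are ordered by their time stamp only; the class `TimelineSketch`, its nested `TimeEvent` and the enum constants are stand-ins written for this example and are not the simulator's actual classes.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Illustrative sketch: a timeline that hands out events in time order. */
final class TimelineSketch {

    /** The two event kinds named on the slides. */
    enum EventType { PROCESSORS_ARRIVE, CLIENT_FINISHES }

    /** A timed event; node is -1 when the event is not tied to a task. */
    static final class TimeEvent {
        final double time;
        final EventType type;
        final int node;

        TimeEvent(double time, EventType type, int node) {
            this.time = time;
            this.type = type;
            this.node = node;
        }
    }

    // Min-heap on the event time, so the earliest event is always polled first.
    private final PriorityQueue<TimeEvent> events =
            new PriorityQueue<>(Comparator.comparingDouble((TimeEvent e) -> e.time));

    void add(TimeEvent event) {
        events.add(event);
    }

    /** Removes and returns the earliest event, or null if the timeline is empty. */
    TimeEvent pollNextEvent() {
        return events.poll();
    }

    boolean isEmpty() {
        return events.isEmpty();
    }
}
```

Because a finishing task is inserted at its start time plus its throttled length, slowing a core down simply pushes its completion event further along the timeline.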
  42. 42. Default strategies<br />getNextElegibleCore() is abstract (every core has to implement it)<br />setBusyThrottling(coreIndex) by default sets the maximum throttling level, as does setOverallThrottlingLevel()<br />Further initialization is done in the initBatchedMakespan method<br />
  43. 43. What about core selection?<br />Core selection is implemented as a separate class implementing the CoreSelector interface<br />CoreSelector provides the getCoreIndex method<br />In our simulation we use only the DefaultCoreSelector, which simply takes the free core with the highest frequency<br />
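The default selection rule can be sketched in a few lines. This is an illustration only: `SimpleCore` is a hypothetical stand-in for the simulator's core type, and returning -1 when no core is free is an assumption of the sketch, not documented behavior of `DefaultCoreSelector`.

```java
import java.util.List;

/** Illustrative sketch: pick the free core with the highest frequency. */
final class CoreSelectorSketch {

    /** Hypothetical stand-in for the simulator's core type. */
    static final class SimpleCore {
        final double frequency;
        boolean busy;

        SimpleCore(double frequency, boolean busy) {
            this.frequency = frequency;
            this.busy = busy;
        }
    }

    /** @return index of the free core with the highest frequency, or -1 if none is free */
    static int getCoreIndex(List<SimpleCore> corePool) {
        int best = -1;
        for (int i = 0; i < corePool.size(); i++) {
            SimpleCore core = corePool.get(i);
            if (!core.busy && (best < 0 || core.frequency > corePool.get(best).frequency)) {
                best = i;
            }
        }
        return best;
    }
}
```

Keeping this rule behind its own interface means a different policy, for example preferring the slowest free core, could be swapped in without touching the scheduler.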
  44. 44. Outline<br />Introduction<br />Theoretical Model<br />Computation model<br />Energy consumption model<br />Throttling model<br />Simulator<br />Green Heuristics<br />Results and future works<br />References<br />
  45. 45. Green heuristics<br />CPScheduler<br />AOSPDScheduler<br />TFIHeuristicScheduler<br />MarathonHeuristic<br />Every heuristic has been implemented as an EnergyAwareScheduler subclass<br />
  46. 46. CRITICAL PATH Based scheduling<br />Compute the graph's critical path<br />Select the free core with the highest frequency<br />Set the core to maximum power<br />Select the node with the maximum distance from the sink<br />To implement this scheduler, only the method getNextElegibleCore() has been overridden<br />
  47. 47. AOSPD SCHEDULING<br />On scheduling DAGs to maximize AREA (Gennaro Cordasco, Arnold L. Rosenberg)<br />An idea from the Internet computing scenario<br />It is nearly impossible to predict when new processors will become available for task execution<br />So… what can we do?<br />Solutions:<br />Maximize the AREA at each execution step<br />GREAT!<br />Not always possible [7]<br />Maximize the average AREA over the execution steps<br />Good!<br />Always possible!<br />
  48. 48. More on AOSPD scheduling<br />At step 1, we have to choose B or C for execution<br />To maximize AREA at step 1, we choose C<br />What happens in step 2?<br />Whichever eligible node we choose in step 2, we can't maximize AREA<br />To maximize AREA in step 2 we should have chosen B, which was not AREA-maximizing for step 1<br />
  49. 49. Adding energy tracking to AOSPD scheduling<br />We already had this algorithm implemented, without energy tracking<br />How to plug AOSPD in?<br />Solution:<br />Extending the EnergyAwareScheduler<br />Refactoring the class so that it provides the getNextElegibleNode() method<br />
  50. 50. TFI HEURISTIC<br />The idea: if we have to wait for a task that requires much more time than others, we could slow down the faster ones to save energy<br />TFI: Max due date for critical path value i<br />
  51. 51. TFI HEURISTIC<br />Compute the graph's critical path<br />Select the free core with the highest frequency<br />Sort eligible nodes by their critical path value and yield<br />Find the maximum due date<br />TFINode := node with maximum critical path value and due date<br />TFI := maximum task length<br />ne := min(coreAvailable, elegibleNodesNum)<br />For i := 1 to ne<br />Node := elegibleNodes[i]<br />If Node == TFINode<br />Execute Node at max power<br />Else if (elegibleNodes.size() < numCores)<br />Execute Node at the minimum throttling level that keeps its length below TFI<br />Else execute Node at the default throttling level<br />
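The core of the TFI idea, slowing shorter tasks down as long as they still finish within the longest one, can be sketched as follows. This is a simplification under stated assumptions: it ignores critical path values, yields and the check against the number of cores, it treats a throttled length equal to TFI as acceptable, and the class `TfiSketch` with its speed factors 0.5, 0.7 and 1.0 (taken from the throttling model) is written for this example only.

```java
/** Illustrative sketch: choose the slowest speed that keeps each task within TFI. */
final class TfiSketch {

    /** Speed factors from slowest to fastest (P2, P1, P0 in the throttling model). */
    static final double[] SPEED_FACTORS = {0.5, 0.7, 1.0};

    /**
     * @param idealLengths task lengths at full speed, one per eligible node
     * @return the speed factor chosen for each task
     */
    static double[] chooseSpeedFactors(double[] idealLengths) {
        // TFI is the longest task length at full speed.
        double tfi = 0.0;
        for (double length : idealLengths) {
            tfi = Math.max(tfi, length);
        }
        double[] chosen = new double[idealLengths.length];
        for (int i = 0; i < idealLengths.length; i++) {
            chosen[i] = 1.0; // fallback: full speed
            for (double factor : SPEED_FACTORS) {
                // throttled length = ideal length / speed factor
                if (idealLengths[i] / factor <= tfi) {
                    chosen[i] = factor; // slowest level that still fits within TFI
                    break;
                }
            }
        }
        return chosen;
    }
}
```

With ideal lengths 4.0, 1.0, 3.0 and 2.5 the sketch keeps the tasks of length 4.0 and 3.0 at full speed, runs the task of length 1.0 at factor 0.5 and the task of length 2.5 at factor 0.7, so no task ends after the longest one.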
  52. 52. Marathon heuristic<br />The idea: our problem resembles a marathon…<br />We have to come first…<br />… and possibly alive (with enough energy to get back home)<br />By being lazier we save more energy<br />How should we run a marathon?<br />According to my uncle:<br />It’s better to keep a steady average pace than to squander energy running faster for a short stretch<br />When you can’t overtake (the road is too narrow or you’re too tired), it’s better to slow down a little and wait for better conditions<br />
  53. 53. Marathon heuristic<br />Compute the graph's critical path<br />Select the free core with the highest frequency<br />Sort eligible nodes by their critical path value and yield<br />ne := min(availableCores, elegibleNodesNum)<br />Front := sum of the yields of the first ne nodes<br />For i := 1 to ne<br />Node := elegibleNodes[i]<br />If Front + n <= numCores - (numCores / DELTA)<br />Execute Node at minimum power<br />Else<br />Execute Node at average power<br />
  54. 54. Outline<br />Introduction<br />Theoretical Model<br />Computation model<br />Energy consumption model<br />Throttling model<br />Simulator<br />Green Heuristics<br />Results and future works<br />References<br />
  55. 55. Assessing results<br />Remember “time is money”?<br />Solution: $ET^2$<br />Remember area-time complexity in VLSI design? [8][9]<br />We use energy-time complexity to plot our schedulers' performance<br />The lower the $ET^2$ score, the better the scheduler<br />
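The score itself is a one-line computation, shown here as a small sketch; the class and method names are illustrative, and $T$ is taken to be the makespan of the schedule.

```java
/** Illustrative sketch: the ET^2 score used to compare schedulers. */
final class EnergyTimeSketch {

    /** ET^2 score: energy multiplied by the square of the makespan. */
    static double et2(double energy, double makespan) {
        return energy * makespan * makespan;
    }
}
```

Squaring the time penalizes slow schedules more than wasteful ones. A scheduler that uses 100 units of energy with makespan 10 scores 10000, while one that uses 60 units with makespan 12 scores 8640, so the second is preferred even though it is slower.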
  56. 56. Tests<br />Test parameters:<br />Number of cores: 4, 8, 16<br />Standard deviation: 1, 2, 4, 8<br />The standard deviation affects task due dates, which are drawn from a Gaussian distribution with mean 1.0 and a standard deviation taken from the given set<br />
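Task lengths of this kind can be generated with the standard library's Gaussian sampler. The sketch below is an illustration, not the simulator's generator: the slides do not say how non-positive samples are handled, so redrawing them is an assumption made here, and the class `TaskLengthSketch` and the seed parameter are hypothetical.

```java
import java.util.Random;

/** Illustrative sketch: Gaussian task lengths with a given mean and standard deviation. */
final class TaskLengthSketch {

    /**
     * Draws task lengths from a Gaussian distribution.
     * Non-positive samples are redrawn; this is an assumption of the sketch.
     */
    static double[] sampleLengths(int count, double mean, double stdev, long seed) {
        Random random = new Random(seed);
        double[] lengths = new double[count];
        for (int i = 0; i < count; i++) {
            double value;
            do {
                // nextGaussian() has mean 0 and standard deviation 1
                value = mean + stdev * random.nextGaussian();
            } while (value <= 0.0);
            lengths[i] = value;
        }
        return lengths;
    }
}
```

With mean 1.0 and a standard deviation of 8, close to half of the raw samples are negative, so the way they are handled noticeably changes the resulting distribution; clamping or taking absolute values would be equally plausible choices.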
  57. 57. 4 cores, stdev = 1<br />
  58. 58. 4 cores, stdev = 2<br />
  59. 59. 4 cores, stdev = 4<br />
  60. 60. 4 cores, stdev = 8<br />
  61. 61. 8 cores, stdev = 1<br />
  62. 62. 8 cores, stdev = 2<br />
  63. 63. 8 cores, stdev = 4<br />
  64. 64. 8 cores, stdev = 8<br />
  65. 65. 16 cores, stdev = 1<br />
  66. 66. 16 cores, stdev = 2<br />
  67. 67. 16 cores, stdev = 8<br />
  68. 68. Conclusions<br />We can’t obtain a makespan better than critical-path scheduling<br />AREA and yield considerations don’t seem to add much in terms of energy savings<br />At least in a multicore scenario…<br />We should probably focus only on the critical path<br />Task due dates don’t seem to influence the makespan too much<br />
  69. 69. Future works<br />Tracking scheduler efficiency<br />Adding a model for idle core consumption<br />Considering a “finite energy” model<br />Extending it to a volunteer computing scenario<br />We could consider a scenario with many cores on different dies<br />Adding an extra cost to switch them on<br />Adding thermal parameters<br />
  70. 70. Outline<br />Introduction<br />Theoretical Model<br />Computation model<br />Energy consumption model<br />Throttling model<br />Simulator<br />Green Heuristics<br />Results and future works<br />References<br />
  71. 71. References<br />[1] Harnessing Green IT: Principles and Practices (San Murugesan, 2009)<br />[2] "Research reveals environmental impact of Google searches". Fox News. 2009-01-12.,2933,479127,00.html. Retrieved 2009-01-15.<br />[3] "Powering a Google search". Official Google Blog. Google. Retrieved 2009-10-01.<br />[4] "Office suite require 70 times more memory than 10 years ago". 2010-05-24. Retrieved 2010-05-24.<br />
  72. 72. References<br />[5] "ARM chief calls for low-drain wireless". The Inquirer. 29 June 2010. Retrieved 30 June 2010.<br />[6] Advanced Configuration and Power Interface Specification, 2010<br />[7] Toward a theory for scheduling dags in Internet-based computing (G. Malewicz, A. L. Rosenberg, M. Yurkewych, 2006)<br />[8] Lower bounds for VLSI (Richard J. Lipton, Robert Sedgewick, 1981)<br />
  73. 73. References<br />[9] Area-time complexity for VLSI (C.D. Thompson, 1979)<br />[10] Cilk: an efficient multithreaded runtime system (R.D. Blumofe, C.F. Joerg, B.C. Kuszmaul, C.E. Leiserson, K.H. Randall, Y. Zhou), 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’95)<br />
  74. 74. That’s all, folks!<br />Thanks for your attention!<br />