2. Outline Introduction Theoretical Model Computation model Energy consumption model Throttling model Simulator Green Heuristics Results and future works References
3. What is green computing? “The study and practice of designing, manufacturing, using, and disposing of computers, servers, and associated subsystems such as monitors, printers, storage devices, and networking and communications systems efficiently and effectively with minimal or no impact on the environment.”[1] Professor Dr San Murugesan, Faculty of Management, Multimedia University, Cyberjaya, Malaysia
4. Why does green computing matter? Some numbers: 2 Google searches = 14 g of CO2 (about the same as boiling a kettle!) (Alex Wissner-Gross, Harvard University physicist) [2][3] Windows 7 + Microsoft Office 2007 requires 70 times more RAM than Windows 98 + Office 2000 to write exactly the same text or send the same email [4] In 2010, servers were responsible for 2.5% of the total energy consumption of the USA. A further 2.5% was used for their cooling. [5] It was estimated that by 2020, servers would use more of the world's energy than air travel if current trends continued [5]
5. Further references Green500 (www.green500.com) GreenIT (www.greenit.fr) CO2Stats (www.co2stats.com)
6. Why green scheduling? A green scheduler could provide: Energy-oriented task assignment Setting the correct power level for the current workload Improved use of power management Learning the power usage profile of job types It could be a part of the operating system's power management
7. What do we want from a green scheduler? Efficiency Simplicity Time is money!
9. Computation model Tasks usually depend on each other DAGs: Directed Acyclic Graphs If task v depends on task u, we put an arc from node u to node v
10. Computation model SP-DAGs: Series-parallel DAGs A DAG with 2 terminals (source and target) and an arc between them is an SP-DAG Larger SP-DAGs are made by parallel and series composition of other SP-DAGs
11. Why SP-DAGs? They describe several significant classes of computation (for instance divide-and-conquer algorithms) They are the natural abstraction for several parallel programming languages (such as Cilk) [10] We can recognize whether a DAG is an SP-DAG in linear time We can easily transform an arbitrary DAG into an SP-DAG in linear time, using SP-ization
12. LEGO® DAGs Assessing the computational benefits of AREA-Oriented DAG-Scheduling (Gennaro Cordasco, Rosario De Chiara, Arnold L. Rosenberg) 2009 SP-DAGs made from a repertoire of Connected Bipartite Building Blocks: DAGs representing the various subcomputations
13. Further definitions on DAGs and SP-DAGs A node v in the DAG can be: Ineligible: v has at least one non-executed parent; Eligible: all of v's parents have been executed; Assigned/executed: v has been scheduled for execution or executed. Schedule: a topological sort of the DAG, obtained by a rule for selecting which eligible node to execute at each step of the computation
14. Critical path Longest path from the source to the sink Why is it so important? We cannot finish the computation before executing every node on the critical path So the time needed to execute the critical path is a trivial lower bound on the makespan.
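As a minimal sketch of this lower bound, the following Java code computes the critical path length of a small DAG by visiting nodes in topological order. The adjacency-list representation, the node lengths, and the class and method names are illustrative assumptions; none of them come from the simulator.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper, not part of the simulator: longest weighted path in a DAG. */
class CriticalPathSketch {

    /**
     * @param successors successors.get(u) lists the nodes that depend on u
     * @param length     length[u] is the execution time of node u
     * @return the length of the longest path, counting node execution times
     */
    static double criticalPathLength(List<List<Integer>> successors, double[] length) {
        int n = length.length;
        int[] pendingParents = new int[n];
        for (List<Integer> out : successors) {
            for (int v : out) {
                pendingParents[v]++;
            }
        }
        double[] finish = new double[n]; // earliest possible finish time of each node
        Deque<Integer> ready = new ArrayDeque<>();
        for (int u = 0; u < n; u++) {
            if (pendingParents[u] == 0) {
                finish[u] = length[u];
                ready.add(u);
            }
        }
        double best = 0.0;
        while (!ready.isEmpty()) {
            int u = ready.poll();
            best = Math.max(best, finish[u]);
            for (int v : successors.get(u)) {
                // v cannot finish before its slowest parent has finished
                finish[v] = Math.max(finish[v], finish[u] + length[v]);
                if (--pendingParents[v] == 0) {
                    ready.add(v);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Diamond DAG: 0 -> {1, 2} -> 3, where node 2 is the slow branch.
        List<List<Integer>> successors =
                List.of(List.of(1, 2), List.of(3), List.of(3), List.<Integer>of());
        double[] length = {1.0, 1.0, 3.0, 1.0};
        System.out.println(criticalPathLength(successors, length)); // path 0 -> 2 -> 3
    }
}
```

No schedule of this diamond can finish sooner than the path through the slow branch, regardless of how many cores are available.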
15. Further definitions on DAGs and SP-DAGs Yield of a node: number of nodes that become eligible when the given node completes its execution. $E_\Sigma(i)$: number of eligible nodes at step $i$ of schedule $\Sigma$ $AREA(\Sigma) \triangleq \sum_{i=0}^{n} E_\Sigma(i)$
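To make the AREA definition concrete, here is a small Java sketch that replays a schedule on a DAG and sums the number of eligible nodes after each step. The class name, the adjacency-list input, and the example graph are assumptions made for illustration only.

```java
import java.util.List;

/** Hypothetical helper, not part of the simulator: AREA of a schedule. */
class AreaSketch {

    /**
     * @param successors successors.get(u) lists the nodes that depend on u
     * @param schedule   node ids in execution order (must be a topological sort)
     * @return the sum over steps i = 0..n of the number of eligible nodes after i executions
     */
    static int area(List<List<Integer>> successors, int[] schedule) {
        int n = successors.size();
        int[] pendingParents = new int[n];
        for (List<Integer> out : successors) {
            for (int v : out) {
                pendingParents[v]++;
            }
        }
        int eligible = 0;
        for (int u = 0; u < n; u++) {
            if (pendingParents[u] == 0) {
                eligible++;
            }
        }
        int total = eligible; // step 0: nothing executed yet
        for (int u : schedule) {
            if (pendingParents[u] != 0) {
                throw new IllegalArgumentException("node " + u + " is not eligible");
            }
            pendingParents[u] = -1; // mark as executed
            eligible--;
            for (int v : successors.get(u)) {
                if (--pendingParents[v] == 0) {
                    eligible++;
                }
            }
            total += eligible;
        }
        return total;
    }

    public static void main(String[] args) {
        // Node 0 enables nodes 1 and 2; node 1 has yield 2, node 2 has yield 1.
        List<List<Integer>> successors = List.of(
                List.of(1, 2), List.of(3, 4), List.of(5),
                List.<Integer>of(), List.<Integer>of(), List.<Integer>of());
        System.out.println(area(successors, new int[] {0, 1, 2, 3, 4, 5}));
        System.out.println(area(successors, new int[] {0, 2, 1, 5, 3, 4}));
    }
}
```

In this example, executing the node with the larger yield first gives the larger AREA, which is the intuition behind AREA-oriented scheduling.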
17. Energy consumption model We need a realistic model for energy consumption We should consider: Circuit dissipation Throttling models
18. Energy consumption model CMOS circuit dissipation: $P = C V^2 f + I_{mean} V + V I_{leakage}$ (we won’t consider short-circuit power and leakage) We assume a linear relationship between voltage and frequency: $f = kV$
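19. Energy consumption model Our model: $E = C \times T \times f^3$ Where: $T = \frac{\text{clock cycles}}{f}$ is the execution time, $f$ is the clock frequency (clock cycles per second), and $C$ encloses several constants such as capacitance, $k$ and the clock multiplier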
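The cubic dependence follows from the two assumptions of the model. A short derivation, keeping only the dynamic term of the dissipation formula and writing $N$ for the number of clock cycles of a task and $C'$ for the lumped constant (both symbols are introduced here for illustration):

```latex
\begin{aligned}
P &= C V^2 f, \qquad V = \frac{f}{k}
  \;\Longrightarrow\; P = \frac{C}{k^2}\, f^3 = C' f^3 \\
E &= P \cdot T = C' \, T \, f^3, \qquad T = \frac{N}{f}
  \;\Longrightarrow\; E = C' \, N \, f^2
\end{aligned}
```

For a fixed amount of work $N$, halving the frequency therefore doubles the execution time and reduces the energy to one quarter.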
21. CPU throttling models Which is the common throttling model used by modern processors? ACPI: Advanced Configuration and Power Interface [6] A fully platform-independent standard that provides: Monitoring Configuration Hardware discovery Power management It defines power states for every device
22. Performance vs power states Power states: C0: Operational power state C1: Halt state C2: Stop-clock C3: Sleep Performance states: P0: Highest performance state P1: Less than P0, frequency/voltage scaled Pn: Less than Pn-1, frequency/voltage scaled In our model, we implement only the C0 power state and the P0, P1, P2 performance states.
23. Our throttling model We use a DFS (Dynamic Frequency Scaling) model, assuming that scaling doesn’t add energy overhead P0: $1.0 \cdot f$ P1: $0.7 \cdot f$ P2: $0.5 \cdot f$
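The trade-off between the three performance states can be sketched in Java as follows. The class, the method names, and the normalized input values are assumptions for illustration; the formulas are the ones from the energy model, E = C × T × f³ with T = clock cycles / f.

```java
/** Hypothetical sketch, not the simulator's code: time and energy of one task per performance state. */
class ThrottlingSketch {
    // Frequency scale factors of P0, P1 and P2, as listed on the slide.
    static final double[] SCALE = {1.0, 0.7, 0.5};

    /** Execution time T = cycles / f of a task, with f = scale * maxFreq. */
    static double time(double cycles, double maxFreq, double scale) {
        return cycles / (scale * maxFreq);
    }

    /** Energy E = C * T * f^3 of the same task (c is an assumed hardware constant). */
    static double energy(double c, double cycles, double maxFreq, double scale) {
        double f = scale * maxFreq;
        return c * time(cycles, maxFreq, scale) * f * f * f;
    }

    public static void main(String[] args) {
        double c = 1.0, cycles = 1.0, maxFreq = 1.0; // normalized, illustrative values
        for (int p = 0; p < SCALE.length; p++) {
            System.out.printf("P%d: time=%.3f energy=%.3f%n", p,
                    time(cycles, maxFreq, SCALE[p]), energy(c, cycles, maxFreq, SCALE[p]));
        }
    }
}
```

With these normalized values, dropping from P0 to P2 doubles the execution time and cuts the energy to one quarter, because the energy of a fixed number of cycles is proportional to the square of the frequency.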
24. Further considerations In our model, an idle core consumes no energy We do not track the energy used to execute the scheduling algorithm itself We do not track the energy dissipated by memory use The available energy is unbounded We assume that the throttling level can be set for each core individually
26. The simulator We implemented this model in a DAG-Scheduling simulator, Providing classes and methods to calculate energy consumption Implementing the energy model we discussed earlier Paying attention to extensibility
27. A typical simulation Loads a DAG Computes the graph's critical path Initializes the schedulers that need to be tested Executes the schedulers on the given graph for a given number of trials (usually 100, because randomness influences the schedulers) At the end of the iterations, it collects statistics about the executions, specifically: Makespan (min, max, average) Average energy consumption Repeats for each DAG
28. How we implemented the model Our focus: Extensibility We wanted our simulator to support multiple kinds of models Providing: Core abstraction Throttling level abstraction Energy-aware scheduler abstraction, totally decoupled from core and throttling level Making it easier to add: Different scheduling algorithms Different core types Different energy models
29. Core abstraction A core can Execute tasks Set its own throttling level Track its power consumption Problem: different cores could implement different throttling strategies Solution: Every core has its own throttling levels array Throttling level is a nested class in the core implementation
30. Throttling level abstraction A throttling level contains Information about frequency and consumption Methods to calculate: Due date of a task at a given level (the lower the level, the slower the task executes) Power consumption at a given level
31. Energy package Core interface: we assume that every core can execute a task and set its own throttling Abstract class ThrottlingLevel: implements a throttling level, with energy consumption info and frequency Class DummyCore: base Core implementation Class DefaultThrottlingLevel: nested class of DummyCore, implements our performance states
32. Core interface
/**
 * Executes a task on this core.
 * @param node The node that models the task
 * @param length Task length if executed at max power
 * @return the real task length (this could differ from the input
 *         if the core is set to a different throttling level)
 */
public double executeTask(ICONode node, double length);

/**
 * Sets the core's power consumption to the idle consumption
 * of its current throttling level.
 */
public void setIdle();

/**
 * Sets the core to a greater power level.
 */
public void increaseThrottlingLevel();

/**
 * Sets the core to a lesser power level.
 */
public void decreaseThrottlingLevel();
33. ThrottlingLevel
/**
 * Calculates the power consumption for a given task length,
 * according to the power consumption unit and any other parameters
 * chosen by the programmer who implements it.
 *
 * @param length The task length
 * @return Power consumption for this task
 */
abstract double getPowerConsumptionPerTask(double length);

/**
 * Calculates how the task length is modified
 * for the given throttling level.
 *
 * @param length ideal length of the task
 * @return the real task length for the given throttling level
 */
abstract double getRealLength(double length);
34. Throttling level initialization
// Requires java.util.Arrays; ThrottlingLevel must implement Comparable for Arrays.sort to work.
public void initializeThrottlingLevels(double hardwareConstant, double maxFreq,
                                       double maxVoltage, int throttlingLevels) {
    this.levels = new ThrottlingLevel[throttlingLevels];
    // Reduced levels: level i runs at (i + 1) / (i + 2) of the maximum frequency and voltage
    for (int i = 0; i < throttlingLevels - 1; i++) {
        double numerator = i + 1.0;
        double denominator = i + 2.0;
        double fraction = numerator / denominator;
        levels[i] = new DefaultThrottlingLevel("LEVEL" + i, hardwareConstant,
                fraction * maxFreq, fraction * maxVoltage);
    }
    // The last level runs at full frequency and voltage
    this.levels[throttlingLevels - 1] = new DefaultThrottlingLevel(
            "LEVEL" + (throttlingLevels - 1), hardwareConstant, maxFreq, maxVoltage);
    // necessary for correct use of increase and decrease
    Arrays.sort(levels);
    // by default we set the maximum power level
    // (the index is derived from throttlingLevels instead of being hard-coded to 2)
    this.currentThrottlingLevel = levels[throttlingLevels - 1];
    this.throttlingLevelIndex = throttlingLevels - 1;
    this.dissipatedPower = 0.0;
}
35. Energy-aware scheduler abstraction An energy-aware scheduler has to Work with different types of cores Track the makespan and the energy consumption Implement logic for: Core selection Eligible node selection Choosing the right throttling level
36. Energy-aware scheduler package CoreSelector: implements the free core selection strategy (in these tests we use the DefaultCoreSelector class) EnergyAwareScheduler: base class for every scheduler that tracks energy consumption
37. Inspecting the EnergyAwareScheduler class
/**
 * Instantiates a new EnergyAwareScheduler
 * @param numCores number of cores
 * @param coreClass class that models the desired core type
 * @throws InstantiationException
 * @throws IllegalAccessException
 * @throws IllegalArgumentException if numCores <= 0
 */
public EnergyAwareScheduler(int numCores, Class<? extends Core> coreClass)
        throws InstantiationException, IllegalAccessException, IllegalArgumentException

/**
 * Calculates the task length on a given core
 * @param coreIndex index of the core in the corePool
 * @param eventLength ideal length of the task
 * @param node node to be executed
 * @return the task length if executed on the coreIndex core
 */
protected double getTimeOffsetForCore(int coreIndex, double eventLength, ICONode node)
38. Inspecting the EnergyAwareScheduler class
/**
 * Sets the throttling level for a core that is going to execute a task in this step
 * @param coreIndex the core id
 */
protected void setBusyThrottling(int coreIndex)

/**
 * Sets the throttling state for cores that will remain idle
 */
protected void setIdleThrottling()

public double getTotalPowerConsumption()

private void calculateIdleConsumptions()
39. What about scheduling? Schedule steps are implemented using the TimeLine object: a priority queue containing two types of TimeEvent: processorsArrives clientFinishes Each scheduling step removes the first event from the TimeLine The scheduling logic is implemented in the runBatchedMakespan method Further initialization is done in the initBatchedMakespan method
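A time-ordered event queue of this kind can be sketched with `java.util.PriorityQueue`. The names TimeLine, TimeEvent, pollNextEvent and the two event types come from the slides; the fields, the constructor, and the ordering by timestamp are assumptions about how such a class might look, not the simulator's actual code.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical sketch of a TimeLine: events are served in order of increasing time. */
class TimeLineSketch {
    enum EventType { PROCESSORS_ARRIVES, CLIENT_FINISHES }

    static final class TimeEvent {
        final double time;
        final EventType type;
        final int nodeId; // assumed: the DAG node the event refers to, -1 if none

        TimeEvent(double time, EventType type, int nodeId) {
            this.time = time;
            this.type = type;
            this.nodeId = nodeId;
        }
    }

    private final PriorityQueue<TimeEvent> events =
            new PriorityQueue<>(Comparator.comparingDouble((TimeEvent e) -> e.time));

    void add(TimeEvent event) {
        events.add(event);
    }

    /** Removes and returns the earliest event, or null if the timeline is empty. */
    TimeEvent pollNextEvent() {
        return events.poll();
    }

    boolean isEmpty() {
        return events.isEmpty();
    }

    public static void main(String[] args) {
        TimeLineSketch timeline = new TimeLineSketch();
        timeline.add(new TimeEvent(2.0, EventType.CLIENT_FINISHES, 7));
        timeline.add(new TimeEvent(0.5, EventType.PROCESSORS_ARRIVES, -1));
        timeline.add(new TimeEvent(1.0, EventType.CLIENT_FINISHES, 3));
        while (!timeline.isEmpty()) {
            TimeEvent e = timeline.pollNextEvent();
            System.out.println(e.time + " " + e.type);
        }
    }
}
```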
40. runBatchedMakespan method
While (executedNode != target)
    event := timeline.pollNextEvent();
    setOverallThrottlingLevel();
    Switch (event)
        Case (processorsArrives)
            ne := min(availableCores, elegibleNodesNum)
            For i := 1 to ne
                nextNode := getNextElegibleNode();
                coreIndex := coreSelector.getCoreIndex();
                corePool[coreIndex].setBusy();
                setBusyThrottling(coreIndex);
                timeOffset := getTimeOffsetForCore(coreIndex, eventLength, nextNode);
                timeline.add(new TimeEvent(event.getTime() + timeOffset, ClientFinishes, nextNode));
42. Default strategies getNextElegibleCore() is abstract (every core has to implement it) setBusyThrottling(coreIndex) sets the maximum throttling level by default, as does setOverallThrottlingLevel() Further initializations are made in the initBatchedMakespan method
43. What about core selection? Core selection is implemented as a separate class implementing the CoreSelector interface CoreSelector provides the getCoreIndex method In our simulation we use only the DefaultCoreSelector, which simply takes the free core with the highest frequency
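The default selection rule (take the free core with the highest frequency) could look roughly like the sketch below. Only the method name getCoreIndex comes from the slides; the CoreState type, the static signature, and the -1 return value for a fully busy pool are assumptions.

```java
import java.util.List;

/** Hypothetical sketch of the default core selection rule; not the simulator's DefaultCoreSelector. */
class CoreSelectorSketch {

    /** Assumed minimal view of a core: its current frequency and whether it is busy. */
    static final class CoreState {
        final double frequency;
        final boolean busy;

        CoreState(double frequency, boolean busy) {
            this.frequency = frequency;
            this.busy = busy;
        }
    }

    /** Returns the index of the free core with the highest frequency, or -1 if every core is busy. */
    static int getCoreIndex(List<CoreState> pool) {
        int best = -1;
        for (int i = 0; i < pool.size(); i++) {
            CoreState core = pool.get(i);
            if (!core.busy && (best < 0 || core.frequency > pool.get(best).frequency)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<CoreState> pool = List.of(
                new CoreState(1.0, true),   // fastest core, but busy
                new CoreState(0.7, false),
                new CoreState(0.5, false));
        System.out.println(getCoreIndex(pool));
    }
}
```

Keeping the rule behind its own interface, as the slides describe, lets a different selection strategy be swapped in without touching the scheduler.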
45. Green heuristics CPScheduler AOSPDScheduler TFIHeuristicScheduler MarathonHeuristic Every heuristic has been implemented as an EnergyAwareScheduler subclass
46. CRITICAL PATH Based scheduling Computes the graph's critical path Selects the free core with the highest energy Sets the core to maximum power Selects the node with the maximum distance from the sink To implement this scheduler, only the method getNextElegibleCore() has been overridden
47. AOSPD SCHEDULING On scheduling DAGs to maximize AREA (Gennaro Cordasco, Arnold L. Rosenberg) An idea from the Internet computing scenario: it is practically impossible to determine when new processors become available for task execution So... what can we do? Solutions: Maximize the AREA at each execution step: great, but not always possible [7] Maximize the average AREA over the execution steps: good, and always possible!
48. More on AOSPD scheduling At step 1, we have to choose B or C for execution To maximize AREA at step 1, we choose C What happens in step 2? Whichever eligible node we choose in step 2, we can’t maximize AREA To maximize AREA in step 2 we should have chosen B, which was not AREA-maximizing for step 1
50. TFI HEURISTIC The idea: if we have to wait for a task that requires much more time than others, we could slow down the faster ones to save energy TFI: Max due date for critical path value i
51. TFI HEURISTIC
Compute the graph's critical path
Select the free core with the highest frequency
Sort eligible nodes by their critical path value and yield
Find the maximum due date
TFINode := node with maximum critical path value and due date
TFI := maximum task length
ne := min(coreAvailable, elegibleNodesNum)
For i := 1 to ne
    Node := elegibleNodes[i]
    If Node == TFINode
        Execute Node at max power
    Else if (elegibleNodes.size() < numCores)
        Execute Node at the minimum throttling level that keeps its length below TFI
    Else
        Execute Node at the default throttling level
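The central step of the heuristic, picking the slowest throttling level that does not stretch a task past TFI, can be sketched as below. The class and method names are hypothetical, the real length is assumed to be the ideal length divided by the frequency scale factor, and a stretched length exactly equal to TFI is treated as acceptable (an assumption; the slide says "lesser than").

```java
/** Hypothetical sketch of the TFI level choice; not the simulator's TFIHeuristicScheduler. */
class TfiLevelSketch {

    /**
     * @param scales      frequency scale factors sorted from slowest to fastest, e.g. {0.5, 0.7, 1.0}
     * @param idealLength task length when executed at maximum frequency
     * @param tfi         length of the longest task in the current batch
     * @return index of the slowest level whose stretched length does not exceed tfi,
     *         or the fastest level if no level fits
     */
    static int chooseLevel(double[] scales, double idealLength, double tfi) {
        for (int i = 0; i < scales.length; i++) {
            double realLength = idealLength / scales[i]; // assumed slowdown model
            if (realLength <= tfi) {
                return i;
            }
        }
        return scales.length - 1;
    }

    public static void main(String[] args) {
        double[] scales = {0.5, 0.7, 1.0};
        double tfi = 1.0; // the slowest task of the batch takes 1.0 time units
        System.out.println(chooseLevel(scales, 0.5, tfi)); // short task: can run at half speed
        System.out.println(chooseLevel(scales, 0.6, tfi)); // needs the intermediate level
        System.out.println(chooseLevel(scales, 0.8, tfi)); // must run at full speed
    }
}
```

A task slowed down this way finishes no later than the TFI node, so under these assumptions the batch does not take longer while the faster tasks consume less energy.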
52. Marathon heuristic The idea: our problem resembles a marathon... We have to come first... ... and possibly alive (with enough energy to get back home) By being lazier we’ll save more energy How should we run a marathon? According to my uncle: It’s better to keep a steady average pace than to squander energy running faster for a short stretch When you can’t overtake (the road is too narrow or you’re too tired), it’s better to slow down a little and wait for better conditions
53. Marathon heuristic
Compute the graph's critical path
Select the free core with the highest frequency
Sort eligible nodes by their critical path value and yield
ne := min(availableCores, elegibleNodesNum)
Front := sum of the yields of the first ne nodes
For i := 1 to ne
    Node := elegibleNodes[i]
    If Front + n <= numCores - (numCores / DELTA)
        Execute Node at minimum power
    Else
        Execute Node at average power
55. Assessing results Remember “time is money”? Solution: $ET^2$ Remember area-time complexity in VLSI design? [8][9] We use energy-time complexity to plot our schedulers' performance The lower the $ET^2$ score, the better the scheduler
56. Tests Test parameters: Number of cores: 4, 8, 16 Standard deviation: 1, 2, 4, 8 The standard deviation influences task due dates, which are drawn from a Gaussian distribution with mean 1.0 and a standard deviation taken from the given set
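As a small illustration of the metric, the Java sketch below computes the ET² score from an energy value and a makespan. The class name and the two sets of numbers are invented for the example and are not measured results.

```java
/** Hypothetical helper: energy-time score of one scheduler run. */
class EnergyTimeSketch {

    /** ET^2 = energy * makespan^2; squaring the makespan weights time more heavily than energy. */
    static double et2(double energy, double makespan) {
        return energy * makespan * makespan;
    }

    public static void main(String[] args) {
        // Made-up values: scheduler A is faster, scheduler B is more frugal.
        double scoreA = et2(10.0, 4.0);
        double scoreB = et2(6.0, 5.0);
        System.out.println("A: " + scoreA + ", B: " + scoreB);
        System.out.println(scoreA < scoreB ? "A scores better" : "B scores better");
    }
}
```

Because the makespan is squared, an energy saving has to be substantial to compensate for even a modest slowdown.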
68. Conclusions We can’t obtain a makespan better than that of critical path scheduling AREA and yield considerations don’t seem to add much in terms of energy savings At least in a multicore scenario... Probably we should focus only on the critical path Task due dates don’t seem to influence the makespan too much
69. Future works Tracking scheduler efficiency Adding a model for idle cores’ consumption Considering a “finite energy” model Extending it to a volunteer computing scenario We could consider a scenario with many cores on different dies: Adding an extra cost to switch them on Adding thermal parameters
71. References [1] Harnessing Green IT: Principles and Practices (San Murugesan, 2009) [2] "Research reveals environmental impact of Google searches". Fox News. 2009-01-12. http://www.foxnews.com/story/0,2933,479127,00.html. Retrieved 2009-01-15. [3] "Powering a Google search". Official Google Blog. Google. http://googleblog.blogspot.com/2009/01/powering-google-search.html. Retrieved 2009-10-01. [4] "Office suite require 70 times more memory than 10 years ago". GreenIT.fr. 2010-05-24. http://www.greenit.fr/article/logiciels/logiciel-la-cle-de-l-obsolescence-programmee-du-materiel-informatique-2748. Retrieved 2010-05-24.
72. References [5] "ARM chief calls for low-drain wireless". The Inquirer. 2010-06-29. http://www.theinquirer.net/inquirer/news/1719749/arm-chief-calls-low-drain-wireless. Retrieved 2010-06-30. [6] Advanced Configuration and Power Interface Specification, 2010 (www.acpi.info) [7] Toward a theory for scheduling dags in Internet-based computing (G. Malewicz, A. L. Rosenberg, M. Yurkewych, 2006) [8] Lower bounds for VLSI (Richard J. Lipton, Robert Sedgewick, 1981)
73. References [9] Area-time complexity for VLSI (C. D. Thompson, 1979) [10] Cilk: an efficient multithreaded runtime system (R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, Y. Zhou), 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’95)