Team Backrow
Team 8: Charlotte Steinichen, Josh Dierberger, Kenedy Thorne, Phillip Stephens
Introduction
The high percentage of time that computers sit idle reveals wasted resources at
institutions and their labs. When a resource isn't being used, its value is
essentially depreciating, especially in a time of rapidly evolving technology. To
capitalize on this idle time and make systems more efficient, our team proposes
running backend processes on idle machines, such as mining bitcoin, performing
calculations, or supporting other systems' workloads such as rendering or
simulation. Idle machines would then support research tasks in time that would
otherwise be wasted. This helps institutions maximize the value of their
resources and pursue other ventures, while significantly improving cluster usage
and the user experience for difficult or frustrating tasks.
Architecture/Setup
For simplicity and practicality, we structured the system as a set of distributed
hypervisors coordinated by a single supervisor.
Each hypervisor switches its machine between a user VM and a research VM.
The supervisor decides which research jobs to run based on which hypervisors are
available.
System Diagram
[Figure: three-level hierarchy. Level 0: Commander; Level 1: VM-Admin; Level 2: User/Research VMs]
Hypervisor (VM-Admin)
● A hypervisor is needed to switch the machine between running research tasks and
providing a user's desktop environment
● VM-Admin serves as the local hosted (software) hypervisor on each machine,
switching between these VMs
○ This offers modularity that a native/bare-metal hypervisor can't offer.
● Job overhead ≅ 1-2 seconds
● Context switch from the research VM to the user VM ≅ 10 seconds
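The switching behavior above can be pictured as a two-state machine. This is a minimal sketch, not the team's VM-Admin: the `VMAdmin` class, its `on_user_activity` method, and the idea of driving it from a boolean "user present" signal are all hypothetical. The 10-second research-to-user figure comes from the slide; the 2-minute user-to-research default is the simulator's assumption from the Scheduling section.

```python
from enum import Enum


class ActiveVM(Enum):
    USER = "user"
    RESEARCH = "research"


class VMAdmin:
    """Hypothetical per-machine VM-Admin that switches between two VMs."""

    def __init__(self, research_to_user_s: int = 10, user_to_research_s: int = 120):
        # Defaults: ~10 s context switch (slide) and 2 min (simulator assumption).
        self.research_to_user_s = research_to_user_s
        self.user_to_research_s = user_to_research_s
        self.active = ActiveVM.USER
        self.switch_seconds = 0  # cumulative time spent switching

    def on_user_activity(self, user_present: bool) -> ActiveVM:
        """Switch VMs when user presence changes; otherwise stay put."""
        if user_present and self.active is ActiveVM.RESEARCH:
            # A user arrived: pause the research VM and restore the desktop.
            self.active = ActiveVM.USER
            self.switch_seconds += self.research_to_user_s
        elif not user_present and self.active is ActiveVM.USER:
            # The machine went idle: hand it over to research work.
            self.active = ActiveVM.RESEARCH
            self.switch_seconds += self.user_to_research_s
        return self.active


admin = VMAdmin()
print(admin.on_user_activity(False))  # ActiveVM.RESEARCH
print(admin.on_user_activity(True))   # ActiveVM.USER
print(admin.switch_seconds)           # 130
```

Keeping the switch costs as constructor parameters makes it easy to plug in measured values instead of the slide's approximations.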
[Figure: Hypervisor (VM-Admin) Structure]
Commander
Supervisor (Commander) Setup
● Since the hypervisors are distributed, we moved research-job management and
scheduling into a single Commander program.
● The hypervisors don't have a holistic view of the system; the Commander does.
● This also provides modularity, since the Commander doesn't require any specific
hypervisor setup.
● The Commander knows:
○ The time of day (to estimate machine availability)
○ All pending jobs and their priorities
■ A priority can be an arbitrary number, or a metric such as how long the job has run so far
○ All running jobs
○ The availability of every VM-Admin, and which VM-Admins are paused on or storing specific jobs
○ The saved state of every job and the messages the jobs have passed or are passing
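One way to picture the Commander's knowledge listed above is as a single state record. This is a minimal sketch under our own assumptions: the `Job` and `CommanderState` names, the field layout, and the VM-Admin names used below are hypothetical, not the team's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: str
    # Arbitrary number, or e.g. how long the job has run so far.
    priority: float


@dataclass
class CommanderState:
    """Hypothetical layout of what the Commander tracks."""
    pending: Dict[str, Job] = field(default_factory=dict)
    running: Dict[str, str] = field(default_factory=dict)            # job_id -> VM-Admin
    vm_admin_available: Dict[str, bool] = field(default_factory=dict)
    paused_on: Dict[str, str] = field(default_factory=dict)          # job_id -> VM-Admin storing it
    saved_state: Dict[str, bytes] = field(default_factory=dict)      # job_id -> snapshot
    messages: Dict[str, List[bytes]] = field(default_factory=dict)   # job_id -> messages

    def hour_of_day(self, now: Optional[datetime] = None) -> int:
        """Time of day, used to guess how many machines will be free."""
        return (now if now is not None else datetime.now()).hour

    def free_vm_admins(self) -> List[str]:
        return [name for name, free in self.vm_admin_available.items() if free]

    def highest_priority_job(self) -> Optional[Job]:
        if not self.pending:
            return None
        return max(self.pending.values(), key=lambda job: job.priority)


state = CommanderState()
state.pending["a"] = Job("a", 1.0)
state.pending["b"] = Job("b", 5.0)
state.vm_admin_available.update({"lab-01": True, "lab-02": False})
print(state.highest_priority_job().job_id)  # b
print(state.free_vm_admins())               # ['lab-01']
```

Because only the Commander holds this record, the VM-Admins can stay simple and interchangeable, which matches the modularity argument on the slide.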
Supervisor (Commander) Setup (cont.)
Major actions the Commander can take:
● Deploy a job, either from a fresh specification or from a saved state (if there
is a difference in a specific VM-Admin setup)
● Query for and/or delete the saved state of a paused job
● Pass messages between client jobs
These actions are sufficient for the Commander to implement any non-preemptive
scheduling policy.
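The three actions above can be sketched as a small class. This is a minimal sketch, not the team's Commander: `Commander`, `StubVMAdmin`, and the method names `deploy`, `query_saved_state`, `delete_saved_state`, `relay`, and `receive` are hypothetical, and the stub only records what it was asked to start instead of talking to a real hypervisor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class StubVMAdmin:
    """Stand-in for a remote VM-Admin; it only records what it was asked to run."""
    name: str
    launched: List[Tuple[str, str]] = field(default_factory=list)

    def start(self, job_id: str, spec: dict, state: Optional[bytes]) -> None:
        self.launched.append((job_id, "saved state" if state is not None else "fresh spec"))


class Commander:
    def __init__(self) -> None:
        self.saved_state: Dict[str, bytes] = {}
        self.mailboxes: Dict[str, List[Tuple[str, bytes]]] = {}

    def deploy(self, job_id: str, spec: dict, vm_admin: StubVMAdmin) -> None:
        """Start a job from its saved state if one exists, else from its spec."""
        vm_admin.start(job_id, spec, self.saved_state.get(job_id))

    def query_saved_state(self, job_id: str) -> Optional[bytes]:
        return self.saved_state.get(job_id)

    def delete_saved_state(self, job_id: str) -> None:
        self.saved_state.pop(job_id, None)

    def relay(self, src: str, dst: str, payload: bytes) -> None:
        """Queue a message from one job for delivery to another."""
        self.mailboxes.setdefault(dst, []).append((src, payload))

    def receive(self, job_id: str) -> Optional[Tuple[str, bytes]]:
        box = self.mailboxes.get(job_id)
        return box.pop(0) if box else None


cmd = Commander()
node = StubVMAdmin("lab-01")
cmd.saved_state["job-1"] = b"snapshot"
cmd.deploy("job-1", {}, node)
cmd.deploy("job-2", {}, node)
print(node.launched)  # [('job-1', 'saved state'), ('job-2', 'fresh spec')]
cmd.relay("job-1", "job-2", b"125250")
print(cmd.receive("job-2"))  # ('job-1', b'125250')
```

Deleting saved state is kept separate from deploying, so a scheduler can decide on its own whether interrupted progress is kept (SAVE) or discarded (DELETE).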
MVP
MVP Testing: Distributed Sum
● Our MVP was a program that ran a distributed sum
● It ran with 1 Commander, 2 VM-Admins, and 2 Nodes
Job 1:
    local_sum = 0
    for i = 1 to 500 (inclusive):
        local_sum += i
    send local_sum to Job 2
Job 2:
    local_sum = 0
    for i = 501 to 999 (inclusive):
        local_sum += i
    global_sum = (local_sum received from Job 1) + local_sum
    return global_sum
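The pseudocode above can be run locally as a minimal sketch: two threads stand in for the two jobs, and a `queue.Queue` stands in for the Commander relaying Job 1's message to Job 2. The function names are illustrative, not the team's code. Job 1 computes 125250, Job 2 computes 374250, and the total is 1 + 2 + ... + 999 = 999 × 1000 / 2 = 499500.

```python
import queue
import threading


def job1(outbox: queue.Queue) -> None:
    local_sum = 0
    for i in range(1, 501):      # 1..500 inclusive
        local_sum += i
    outbox.put(local_sum)        # "send local_sum to Job 2"


def job2(inbox: queue.Queue, result: list) -> None:
    local_sum = 0
    for i in range(501, 1000):   # 501..999 inclusive
        local_sum += i
    # Blocks until Job 1's partial sum arrives.
    result.append(inbox.get() + local_sum)


def run_distributed_sum() -> int:
    channel: queue.Queue = queue.Queue()  # stands in for Commander-relayed messages
    result: list = []
    t1 = threading.Thread(target=job1, args=(channel,))
    t2 = threading.Thread(target=job2, args=(channel, result))
    t2.start()  # Job 2 may start first; it simply waits for the message
    t1.start()
    t1.join()
    t2.join()
    return result[0]


print(run_distributed_sum())  # 499500
```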
[Figure: MVP setup with one Commander and two VM-Admins, each hosting one Node]
Commander
[Figure: Commander output; notice the correct final sum computed in the last line]
Commander: Final View
Finally, the Commander receives the job's 'FINISHED' packet, which contains the
job's output. Because this job was an echo, the output is simply the initial phrase.
The first job has now been run!
Scheduling
Solving scheduling
Because we couldn't test our product in a realistic environment, we built a
simulator to estimate the usage gained by scheduling jobs during computer
downtime and to find the most effective scheduling policy.
Simulator assumptions:
● 20 computers
● 300 jobs: 100 “small” (15 min), 100 “medium” (45 min), and 100 “large” (3 hours)
● Can set policy to either save state or delete progress when users interrupt
jobs
● Can use FIFO, SJF, or SRTF* as scheduling policies
● It takes 2 minutes to switch from a user VM to a research VM
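A simulator with these assumptions can be sketched in under a hundred lines. This is a minimal sketch, not the team's simulator, and it will not reproduce their numbers: the user-presence model (a user arrives at an idle machine with probability `p_arrive` per minute and leaves with probability `p_leave`), the one-day default length, and the `seed` parameter are all hypothetical. Like the Commander, it only makes decisions when a computer frees up, so "SRTF" here means picking the job with the least remaining time at dispatch; the source does not explain the asterisk on SRTF.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

SWITCH_MIN = 2  # minutes to switch from the user VM to the research VM (slide assumption)


@dataclass(eq=False)
class Job:
    length: int    # total minutes of work
    done: int = 0  # minutes completed so far

    @property
    def remaining(self) -> int:
        return self.length - self.done


def pick(pending: List[Job], policy: str) -> Job:
    """Choose the next job when a computer frees up."""
    if policy == "FIFO":
        return pending[0]  # interrupted jobs rejoin at the back of the queue
    if policy == "SJF":
        return min(pending, key=lambda j: j.length)
    if policy == "SRTF":
        return min(pending, key=lambda j: j.remaining)
    raise ValueError(f"unknown policy: {policy}")


def simulate(policy: str = "SRTF", save: bool = True, computers: int = 20,
             minutes: int = 24 * 60, p_arrive: float = 0.01,
             p_leave: float = 0.02, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    # 100 small (15 min), 100 medium (45 min), 100 large (3 h) jobs.
    pending = [Job(n) for n in [15] * 100 + [45] * 100 + [180] * 100]
    user = [False] * computers
    warmup = [SWITCH_MIN] * computers  # every machine starts by booting the research VM
    running: List[Optional[Job]] = [None] * computers
    finished: List[Job] = []
    interruptions = 0

    for _ in range(minutes):
        for c in range(computers):
            # Hypothetical user model: random arrivals and departures each minute.
            if user[c]:
                if rng.random() >= p_leave:
                    continue                 # user is still at the machine
                user[c] = False
                warmup[c] = SWITCH_MIN       # switch back to the research VM
            elif rng.random() < p_arrive:
                user[c] = True
                job = running[c]
                if job is not None:          # the user interrupts a research job
                    interruptions += 1
                    if not save:
                        job.done = 0         # DELETE policy discards progress
                    pending.append(job)
                    running[c] = None
                continue
            if warmup[c] > 0:
                warmup[c] -= 1
                continue
            if running[c] is None and pending:
                job = pick(pending, policy)
                pending.remove(job)
                running[c] = job
            job = running[c]
            if job is not None:
                job.done += 1
                if job.done == job.length:
                    finished.append(job)
                    running[c] = None

    counts = {size: sum(j.length == size for j in finished) for size in (15, 45, 180)}
    return {
        "small": counts[15], "medium": counts[45], "large": counts[180],
        "completed": len(finished),
        "interruptions": interruptions,
        "work_hours": sum(j.length for j in finished) / 60,
    }


for policy in ("FIFO", "SJF", "SRTF"):
    for save in (False, True):
        print(policy, "SAVE" if save else "DELETE", simulate(policy, save))
```

Swapping in real idle-time traces for the random user model would be the natural next step once campus data is available.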
Scheduler Policy Comparison
Metric                  FIFO+DELETE FIFO+SAVE   SJF+DELETE  SJF+SAVE    SRTF+SAVE
Jobs completed (total)  202         241         221         269         270
Small jobs completed    87          97          100         100         100
Medium jobs completed   79          90          100         100         100
Large jobs completed    36          54          21          69          70
Job interruptions       353         341         215         380         367
Jobs interrupted        67%         56%         18%         34%         32%
Work hours done         189         253         163         307         310
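The "work hours done" row appears to equal the total length of the completed jobs, truncated to whole hours; for example, FIFO+SAVE gives 97 × 0.25 + 90 × 0.75 + 54 × 3 = 253.75 h, shown as 253. That reading is our inference from the numbers, not something the slide states. The check below recomputes it for every column of the table above; it also shows that most of the gain from SAVE comes from large jobs, which are worth 12 small jobs each.

```python
# Job lengths in hours: small = 15 min, medium = 45 min, large = 3 h.
HOURS = (0.25, 0.75, 3.0)

# (small, medium, large) jobs completed, from the table above.
COMPLETED = {
    "FIFO+DELETE": (87, 79, 36),
    "FIFO+SAVE": (97, 90, 54),
    "SJF+DELETE": (100, 100, 21),
    "SJF+SAVE": (100, 100, 69),
    "SRTF+SAVE": (100, 100, 70),
}


def completed_job_hours(counts):
    """Total length, in hours, of all completed jobs."""
    return sum(n * h for n, h in zip(counts, HOURS))


for policy, counts in COMPLETED.items():
    print(f"{policy:12} {sum(counts):4d} jobs {completed_job_hours(counts):7.2f} h")
```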
Simulator Graphs
[Figure: Number of Jobs Running Per Type, FIFO/SAVE and SRTF/SAVE; x-axis: time, y-axis: # of computers]
Simulator Graphs (cont.)
[Figure: % Type of Job Done Per Computer; x-axis: computer, y-axis: % usage]
Simulator Experiment: Multiple Days
This time, we ran the simulator over 5 weekdays with a pool of 1,000 jobs of each
size. The new SRTF+LRTF scheduler switches from SRTF to LRTF (longest remaining
time first) at night, from 1 AM to 9 AM.
Metric                  FIFO+DELETE FIFO+SAVE   SJF+DELETE  SJF+SAVE    SRTF        SRTF+LRTF
Jobs completed (total)  1218        1246        2060        2206        2207        2010
Small jobs completed    594         600         1000        1000        1000        1000
Medium jobs completed   433         450         1000        1000        1000        778
Large jobs completed    191         196         60          206         207         232
Job interruptions       2112        2108        1897        2053        2037        2023
Jobs interrupted        61%         61%         31%         36%         38%         51%
Work hours done         1046        1075        1180        1618        1621        1529
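The day/night switch described above can be sketched as a dispatch rule. This is a minimal sketch assuming each pending job exposes its remaining minutes; the `Job` class and `pick_job` function are hypothetical, and only the 1 AM to 9 AM window comes from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

NIGHT_START_HOUR, NIGHT_END_HOUR = 1, 9  # 1 AM to 9 AM, per the slide


@dataclass
class Job:
    name: str
    remaining_min: int


def is_night(now: datetime) -> bool:
    return NIGHT_START_HOUR <= now.hour < NIGHT_END_HOUR


def pick_job(pending: List[Job], now: datetime) -> Job:
    """SRTF during the day, LRTF (longest remaining time first) at night."""
    if is_night(now):
        return max(pending, key=lambda job: job.remaining_min)
    return min(pending, key=lambda job: job.remaining_min)


jobs = [Job("small", 15), Job("medium", 45), Job("large", 180)]
print(pick_job(jobs, datetime(2020, 4, 6, 3, 0)).name)   # large
print(pick_job(jobs, datetime(2020, 4, 6, 14, 0)).name)  # small
```

Running long jobs overnight, when few users are likely to interrupt them, is what lets this policy finish more large jobs than plain SRTF.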
SRTF vs SRTF + LRTF
[Figure: Number of Jobs Running Per Type, SRTF/SAVE and SRTF+LRTF/SAVE; x-axis: time, y-axis: # of computers]
SRTF+LRTF thoughts
SRTF+LRTF distributes work more equitably and increases the number of long jobs
completed (232, versus 207 under SRTF). In work hours done, it outperforms every
DELETE policy and both FIFO policies. It is also more realistic than SRTF, which
looks attractive because it completes the most work hours but ignores all long
jobs for multiple days. That is unlikely to be acceptable in a real-life
situation, where there would be no cap on the number of “short” jobs in the
queue, so SRTF could ignore longer jobs indefinitely; SRTF+LRTF addresses this.
However, this scheduler is still not entirely realistic, because it ignores
“medium” length jobs, which get completed only because the simulator runs out of
short jobs. A next step might be to give medium-length jobs priority during
medium-traffic times and small jobs priority during high-traffic times.
Group Limitations
● The coronavirus pandemic moved everyone off campus, so we couldn't collect any
real data from the GT libraries and computers.
● Conflicting schedules and distance forced us to work remotely over video chat,
which caused some communication to be lost in translation.
● Integration testing became very difficult: it is hard to build a CI pipeline for
hypervisor testing, and coordinating integration tests remotely is a logistical
challenge.
● We didn't have much experience with VM provisioning, and most industry
solutions didn't suit our needs, so developing a solution from scratch took a
long time.
Lessons
● Value of Communication and Teamwork – differing skills and strengths
● Variety of Testing in terms of Data sets & Timings – luxury of real time data vs
simulated data
● Breakdown of jobs & Delegation of Work – starting small and working up in
complexity
● The Reach & Scalability of the Project – just the beginning
Conclusions / Findings
In the end, our system ran smoothly at our test scale. Among the schedulers we
implemented, SRTF with SAVE was the most effective: it completed the most jobs
with little waste because it prioritizes smaller jobs first, although longer-term
data might reveal other effective schedulers. Our architecture, with a Commander
coordinating VM-Admins that switch each machine between the user VM and our
research VM, showed promising results by detecting a system's use and switching
to the proper state. Our simulation revealed that a large amount of time and
resources goes to waste while computers are not in use, and that our system
could make use of about 90% of that idle time. Overall, our system shows
promising results and could be very beneficial if deployed at Georgia Tech. Time
is valuable; let's not waste it.
Questions?

Editor's Notes

  1. Kenedy - Distributed Computing Enabled by Idle Georgia Tech Computers
  2. Kenedy
  3. Kenedy
  4. Phillip
  5. Phillip
  6. Phillip
  7. Josh
  8. Josh
  9. Charlotte
  10. Charlotte
  11. Charlotte
  12. Charlotte
  13. Kenedy - Testing was typically done in person on the same network, so moving off-campus made integrating all the parts, and making sure they were properly tested, significantly more complex