Introduction Off-Line Problem On-Line Problem Summary Appendix
Dynamic Fractional Resource Scheduling
for Cluster Platforms
Mark Stillwell
Department of Information and Computer Sciences
University of Hawai’i at Mānoa
Achievement Rewards for College Scientists
2010 Scholarship Awards Program
Mark Stillwell UH Mānoa ICS
DFRS for Clusters
Clusters
Definition
A cluster is a group of independent computers, or nodes,
working together closely, usually connected by a high-speed
network.
Jobs
The system can accept user requests to run jobs, or the
administrator can instantiate jobs directly
Running jobs are made up of nearly identical tasks
The number of tasks is specified by user/administrator
Tasks can block while communicating with each other
The assignment of resources to tasks is called scheduling
Current Approaches
Service Hosting (Off-Line Scheduling)
Traditionally, dedicated machines
Current interest in server consolidation
Few good theoretical models
Heavy “engineering” bias
High-Performance Computing (On-Line Scheduling)
Usually First-Come-First-Served (FCFS) with backfilling
Backfilling needs (unreliable) compute time estimates
Unbounded wait times
Inefficient use of nodes/resources
Our Proposal
Use virtual machine technology.
Multiple tasks on one node
Performance isolation
Sharing of fractional resources
Define a run-time computable metric that captures notions
of performance and fairness.
Design heuristics that allocate resources to jobs while
explicitly trying to achieve high ratings by our metric.
Requirements, Needs, and Yield
Tasks have memory requirements and CPU needs
All tasks of a job have the same requirements and needs
For a task to be placed on a node there must be memory
available at least equal to its requirements
A task can be allocated less CPU than its need, and the
ratio of the allocation to the need is the yield
All tasks of a job must have the same yield, so we can also
speak of the yield of a job
The yield of a job gives its performance relative to a run
on a dedicated system
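The definitions on this slide can be illustrated with a minimal sketch. The class and function names below are hypothetical and not from the talk; resources are expressed as fractions of one node, and capping the yield at 1 is an assumption that reflects a task never using more CPU than its need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Both fields are fractions of a single node's capacity (an assumption).
    memory_requirement: float  # memory the task must have to be placed at all
    cpu_need: float            # CPU the task would use on a dedicated node


def fits(task: Task, free_memory: float) -> bool:
    """A task can be placed only if the node has enough free memory."""
    return free_memory >= task.memory_requirement


def task_yield(task: Task, cpu_allocation: float) -> float:
    """Ratio of the CPU allocated to the CPU needed, capped at 1."""
    if task.cpu_need <= 0:
        raise ValueError("cpu_need must be positive")
    return min(1.0, cpu_allocation / task.cpu_need)
```

For example, a task that needs half of a node's CPU but is allocated a quarter of it runs at a yield of 0.5, that is, at half the speed it would reach on a dedicated system.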
Off-Line Problem Assumptions
Steady-state execution with infinite jobs
Models an ideal service hosting environment
Makes the problem more tractable [Marchal et al., 2006]
Good when job durations are longer than the scheduling time
Jobs are serial (single task)
Objective Function
The performance of a job is correlated with its yield
Maximizing the average or sum of the yields may be unfair
to some jobs
Instead, we seek to maximize the minimum yield
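A small worked example may help show why the sum of yields can be unfair. The sketch below is an illustration under simplifying assumptions (a single node with CPU capacity 1.0, memory ignored); the function names and the example numbers are hypothetical.

```python
from typing import List, Sequence


def yields(cpu_needs: Sequence[float], allocations: Sequence[float]) -> List[float]:
    """Yield of each job: CPU allocated divided by CPU needed."""
    return [alloc / need for need, alloc in zip(cpu_needs, allocations)]


def equal_yield_allocation(cpu_needs: Sequence[float], capacity: float = 1.0) -> List[float]:
    """Give every job on one node the same yield, as large as capacity allows."""
    total_need = sum(cpu_needs)
    common_yield = min(1.0, capacity / total_need)
    return [common_yield * need for need in cpu_needs]


needs = [1.0, 0.5, 0.5]

# An allocation that maximizes the sum of yields: serve only the two small jobs.
sum_first = yields(needs, [0.0, 0.5, 0.5])

# An allocation that maximizes the minimum yield on a single node.
min_first = yields(needs, equal_yield_allocation(needs))
```

Here `sum_first` has the larger total (2.0 against 1.5) but starves the first job with a yield of 0, while `min_first` gives every job a yield of 0.5.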
Task Placement Heuristics
Greedy Task Placement – Incremental, puts each task on
the node with the lowest computational load on which it
can fit without violating memory constraints
Randomized Rounding Task Placement – Based on
relaxing an MILP to an LP and rounding probabilistically
MCB Task Placement – Global, iteratively applies
multi-capacity (vector) bin-packing heuristics during a
binary search for the maximized minimum yield
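Two of these ideas can be sketched roughly as follows. This is not the authors' implementation: tasks are hypothetical (cpu_need, memory_requirement) pairs given as fractions of one node, the minimum yield of a placement is assumed to be set by the most CPU-loaded node, and a simple first-fit packing stands in for the multi-capacity bin-packing heuristics used in the talk.

```python
from typing import List, Optional, Sequence, Tuple

Task = Tuple[float, float]  # (cpu_need, memory_requirement), fractions of one node


def greedy_placement(tasks: Sequence[Task], num_nodes: int) -> Optional[List[int]]:
    """Put each task on the least CPU-loaded node that has enough free memory."""
    cpu_load = [0.0] * num_nodes
    free_mem = [1.0] * num_nodes
    placement = []
    for cpu_need, mem in tasks:
        candidates = [n for n in range(num_nodes) if free_mem[n] >= mem]
        if not candidates:
            return None  # no node can hold this task
        node = min(candidates, key=lambda n: cpu_load[n])
        cpu_load[node] += cpu_need
        free_mem[node] -= mem
        placement.append(node)
    return placement


def min_yield(tasks: Sequence[Task], placement: Sequence[int], num_nodes: int) -> float:
    """Assumed model: the most loaded node limits the yield every job can get."""
    cpu_load = [0.0] * num_nodes
    for (cpu_need, _), node in zip(tasks, placement):
        cpu_load[node] += cpu_need
    busiest = max(cpu_load)
    return 1.0 if busiest <= 1.0 else 1.0 / busiest


def first_fit_pack(tasks: Sequence[Task], num_nodes: int, y: float) -> Optional[List[int]]:
    """Try to pack tasks whose CPU needs are scaled by yield y (first-fit stand-in)."""
    order = sorted(range(len(tasks)),
                   key=lambda i: max(tasks[i][0] * y, tasks[i][1]), reverse=True)
    cpu_free = [1.0] * num_nodes
    mem_free = [1.0] * num_nodes
    placement = [-1] * len(tasks)
    for i in order:
        cpu, mem = tasks[i][0] * y, tasks[i][1]
        for n in range(num_nodes):
            if cpu_free[n] >= cpu and mem_free[n] >= mem:
                cpu_free[n] -= cpu
                mem_free[n] -= mem
                placement[i] = n
                break
        else:
            return None  # task i fits on no node at this yield
    return placement


def binary_search_min_yield(tasks: Sequence[Task], num_nodes: int,
                            tolerance: float = 1e-3) -> Optional[Tuple[float, List[int]]]:
    """Binary search for the largest yield at which the packing still succeeds."""
    best = first_fit_pack(tasks, num_nodes, 0.0)
    if best is None:
        return None  # memory alone makes the instance infeasible for this packer
    full = first_fit_pack(tasks, num_nodes, 1.0)
    if full is not None:
        return 1.0, full
    lo, hi = 0.0, 1.0
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        packed = first_fit_pack(tasks, num_nodes, mid)
        if packed is not None:
            lo, best = mid, packed
        else:
            hi = mid
    return lo, best
```

The greedy pass is incremental and touches each task once, whereas the binary search repacks all tasks at every candidate yield, which matches the "incremental" versus "global" distinction drawn on the slide.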
Large Problem Set: Minimum Yield vs. Free Memory
[Figure: minimum yield (y-axis, 0.3 to 0.55) plotted against memory slack (x-axis, 0.1 to 0.9). Series: bound, MCB8, GR, GB, SG, SGB, RRND, RRNZ.]
Conclusions
The problem is tractable
Multi-capacity bin packing algorithms perform well
Greedy algorithms perform okay, and are very fast
Randomized rounding approaches are not very promising
On-Line Problem Assumptions
Job arrival/completion times are not known in advance
Jobs are temporary
The user wants a final result
Quick turnaround relative to runtime is desired
Jobs are not interactive
So they can wait to start until resources are available
We avoid the use of runtime estimates
Stretch
Our goal: minimize maximum stretch (aka slowdown)
Stretch: the time a job spends in the system divided by the
time that would be spent in a dedicated
system [Bender et al., 1998]
Popular to quantify schedule quality post-mortem
Not generally used to make scheduling decisions
Runtime computation requires (unreliable) user estimates.
Minimizing average stretch is prone to starvation
Minimizing maximum stretch captures notions of both
performance and fairness.
Our approach: try to maximize minimum yield
Similar to, but not the same as, minimizing maximum stretch
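The stretch definition can be written out as a short sketch. The function names and the job tuples are hypothetical. The last function rests on an added assumption: a job that starts as soon as it is submitted and runs at a constant yield y spends its dedicated runtime divided by y in the system, so its stretch is 1/y. This is one way to see why maximizing the minimum yield resembles minimizing the maximum stretch, and why the two differ once jobs wait or yields change over time.

```python
from typing import Sequence, Tuple

Job = Tuple[float, float, float]  # (submit_time, finish_time, dedicated_runtime)


def stretch(submit_time: float, finish_time: float, dedicated_runtime: float) -> float:
    """Time spent in the system divided by the runtime on a dedicated system."""
    if dedicated_runtime <= 0:
        raise ValueError("dedicated_runtime must be positive")
    return (finish_time - submit_time) / dedicated_runtime


def max_stretch(jobs: Sequence[Job]) -> float:
    """Worst stretch over all jobs: the quantity the talk aims to keep small."""
    return max(stretch(*job) for job in jobs)


def average_stretch(jobs: Sequence[Job]) -> float:
    """Mean stretch; it can look good while one job is badly delayed."""
    values = [stretch(*job) for job in jobs]
    return sum(values) / len(values)


def stretch_at_constant_yield(job_yield: float) -> float:
    """Assumed special case: no waiting and a constant yield give stretch 1/yield."""
    if not 0 < job_yield <= 1:
        raise ValueError("yield must be in (0, 1]")
    return 1.0 / job_yield
```

Note that `stretch` needs the dedicated runtime, which is known only after the job finishes unless a user estimate is supplied, whereas the yield is available while the job runs.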
Max Stretch Degradation vs. Load, no migration cost
[Figure: stretch degradation factor (y-axis, 1 to 1000) plotted against load (x-axis, 0.1 to 0.9). Series: EASY, FCFS, GREEDY, GREEDYP, GREEDYPM, DynMCB8, MCB8P 600, MCB8PG 600.]
Conclusions
DFRS approaches can significantly outperform traditional
approaches
Aggressive repacking can lead to much better resource
allocations
But also to heavy migration costs
Greedy migration is not that useful
Summary
We have proposed a novel approach to job scheduling on
clusters, Dynamic Fractional Resource Scheduling, that
makes use of modern virtual machine technology and
seeks to optimize a runtime-computable, user-centric
measure of performance called the minimum yield
Multi-capacity bin packing heuristics can be used to find
good solutions
Our approach avoids the use of unreliable runtime
estimates
This approach has the potential to lead to
order-of-magnitude improvements in performance over
current technology
References I
Bender, M. A., Chakrabarti, S., and Muthukrishnan, S. (1998). Flow and Stretch Metrics for Scheduling Continuous Job Streams. In Proc. of the 9th ACM-SIAM Symp. on Discrete Algorithms, pages 270–279.
Marchal, L., Yang, Y., Casanova, H., and Robert, Y. (2006). Steady-state scheduling of multiple divisible load applications on wide-area distributed computing platforms. Intl. J. of High Performance Computing Applications, 20(3):365–381.
References II
Stillwell, M., Schanzenbach, D., Vivien, F., and Casanova, H. (2009). Resource Allocation using Virtual Clusters. In CCGrid, pages 260–267. IEEE.
Stillwell, M., Vivien, F., and Casanova, H. (2010). Dynamic fractional resource scheduling for HPC workloads. In IPDPS. To appear.

More Related Content

Viewers also liked

Main
MainMain
ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版
Naoki Kitamura
 
Resource Allocation using Virtual Clusters
Resource Allocation using Virtual ClustersResource Allocation using Virtual Clusters
Resource Allocation using Virtual Clusters
Mark Stillwell
 
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, LyonDynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Mark Stillwell
 
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Mark Stillwell
 
H2C
H2CH2C
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and docker
Mark Stillwell
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and docker
Mark Stillwell
 
HARNESS project Demo
HARNESS project DemoHARNESS project Demo
HARNESS project Demo
Mark Stillwell
 

Viewers also liked (9)

Main
MainMain
Main
 
ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版
 
Resource Allocation using Virtual Clusters
Resource Allocation using Virtual ClustersResource Allocation using Virtual Clusters
Resource Allocation using Virtual Clusters
 
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, LyonDynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
 
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
 
H2C
H2CH2C
H2C
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and docker
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and docker
 
HARNESS project Demo
HARNESS project DemoHARNESS project Demo
HARNESS project Demo
 

Similar to Dynamic Fractional Resource Scheduling -- 2010, ARCS

Virtualization Business Case
Virtualization Business CaseVirtualization Business Case
Virtualization Business Case
TopLine Strategies
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
Shiyong Lu
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
IRJET Journal
 
Compare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL ServerCompare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL Server
AlexDepo
 
Migration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication TechnologyMigration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication Technology
Donna Guazzaloca-Zehl
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehouses
Finalyear Projects
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
Finalyear Projects
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
Wei Ting Chen
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System Uptime
YogeshIJTSRD
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
blewington
 
An Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
An Optimized-Throttled Algorithm for Distributing Load in Cloud ComputingAn Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
An Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
IRJET Journal
 
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & WieckIBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM Events
 
CirrusDB Cloud Effect
CirrusDB Cloud EffectCirrusDB Cloud Effect
CirrusDB Cloud Effect
Ashok Sami
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
Ryousei Takano
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
Masashi Narumoto
 
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons LearnedVMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
VMworld
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud system
JPINFOTECH JAYAPRAKASH
 
DB2 for z/O S Data Sharing
DB2 for z/O S  Data  SharingDB2 for z/O S  Data  Sharing
DB2 for z/O S Data Sharing
Surekha Parekh
 
High Availability Disaster Recovery Customer Success Stories[1]
High Availability Disaster Recovery Customer Success Stories[1]High Availability Disaster Recovery Customer Success Stories[1]
High Availability Disaster Recovery Customer Success Stories[1]
Michael Hudak
 
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl osIEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEEFINALYEARSTUDENTPROJECTS
 

Similar to Dynamic Fractional Resource Scheduling -- 2010, ARCS (20)

Virtualization Business Case
Virtualization Business CaseVirtualization Business Case
Virtualization Business Case
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
 
Compare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL ServerCompare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL Server
 
Migration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication TechnologyMigration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication Technology
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehouses
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System Uptime
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
 
An Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
An Optimized-Throttled Algorithm for Distributing Load in Cloud ComputingAn Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
An Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
 
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & WieckIBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
 
CirrusDB Cloud Effect
CirrusDB Cloud EffectCirrusDB Cloud Effect
CirrusDB Cloud Effect
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons LearnedVMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud system
 
DB2 for z/O S Data Sharing
DB2 for z/O S  Data  SharingDB2 for z/O S  Data  Sharing
DB2 for z/O S Data Sharing
 
High Availability Disaster Recovery Customer Success Stories[1]
High Availability Disaster Recovery Customer Success Stories[1]High Availability Disaster Recovery Customer Success Stories[1]
High Availability Disaster Recovery Customer Success Stories[1]
 
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl osIEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
 

Recently uploaded

Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 

Recently uploaded (20)

Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 

Dynamic Fractional Resource Scheduling -- 2010, ARCS

  • 1. Introduction Off-Line Problem On-Line Problem Summary Appendix Dynamic Fractional Resource Scheduling for Cluster Platforms Mark Stillwell Department of Information and Computer Sciences University of Hawai’i at M¯anoa Achievement Rewards for College Scientists 2010 Scholarship Awards Program Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 2. Introduction Off-Line Problem On-Line Problem Summary Appendix Clusters Definition A cluster is a group of independent computers, or nodes, working together closely, usually connected by a high-speed network. Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 3. Introduction Off-Line Problem On-Line Problem Summary Appendix Jobs The system can accept user requests to run jobs or the administrator can instantiate jobs directly Running jobs are made up of nearly identical tasks The number of tasks is specified by user/administrator Tasks can block while communicating with each other The assignment of resources to tasks is called scheduling Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 4. Introduction Off-Line Problem On-Line Problem Summary Appendix Current Approaches Service Hosting (Off-Line scheduling) traditionally, dedicated machines current interest in server consolidation few good theoretical models heavy “engineering” bias High-Performance Computing (On-Line Scheduling) Usually First-Come-First-Served (FCFS) with backfilling Backfilling needs (unreliable) compute time estimates Unbounded wait times Inefficient use of nodes/resources Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 5. Introduction Off-Line Problem On-Line Problem Summary Appendix Our Proposal Use virtual machine technology. Multiple tasks on one node Performance isolation Sharing of fractional resources Define a run-time computable metric that captures notions of performance and fairness. Design heuristics that allocate resources to jobs while explicitly trying to achieve high ratings by our metric. Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 6. Introduction Off-Line Problem On-Line Problem Summary Appendix Requirements, Needs, and Yield Tasks have memory requirements and CPU needs All tasks of a job have the same requirements and needs For a task to be placed on a node there must be memory available at least equal to its requirements A task can be allocated less CPU than its need, and the ratio of the allocation to the need is the yield All tasks of a job must have the same yield, so we can also speak of the yield of a job The yield of a job gives its performance relative to if it were run on a dedicated system Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 7. Off-Line Problem Assumptions. Steady-state execution with infinite jobs. Models an ideal service hosting environment. Makes the problem more tractable [Marchal et al., 2006]. Good when job duration is longer than schedule time. Jobs are serial (single task).
  • 8. Objective Function. The performance of a job is correlated with its yield. Maximizing the average or sum of the yields may be unfair to some jobs. Instead, we seek to maximize the minimum yield.
  • 9. Task Placement Heuristics. Greedy Task Placement: incremental; puts each task on the node with the lowest computational load on which it can fit without violating memory constraints. Randomized Rounding Task Placement: based on relaxing an MILP to an LP and rounding probabilistically. MCB Task Placement: global; iteratively applies multi-capacity (vector) bin-packing heuristics during a binary search for the maximized minimum yield.
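The greedy rule described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the GR, GB, SG, or SGB variants: tasks are `(mem_req, cpu_need)` tuples, node capacities are normalized to 1.0, tasks are placed in the order given, and the function names are hypothetical.

```python
def greedy_place(tasks, num_nodes, mem_capacity=1.0):
    """Place each task on the least CPU-loaded node that has enough free memory.

    tasks: list of (mem_req, cpu_need) tuples, processed in the order given.
    Returns a list of node indices (one per task), or None if some task
    does not fit on any node.
    """
    mem_used = [0.0] * num_nodes
    cpu_load = [0.0] * num_nodes
    placement = []
    for mem_req, cpu_need in tasks:
        # Only nodes with enough free memory are candidates.
        candidates = [n for n in range(num_nodes)
                      if mem_used[n] + mem_req <= mem_capacity]
        if not candidates:
            return None
        # Among candidates, pick the node with the lowest CPU load so far.
        best = min(candidates, key=lambda n: cpu_load[n])
        mem_used[best] += mem_req
        cpu_load[best] += cpu_need
        placement.append(best)
    return placement


def minimum_yield(tasks, placement, num_nodes, cpu_capacity=1.0):
    """Minimum yield over nodes, assuming each node shares CPU proportionally."""
    cpu_load = [0.0] * num_nodes
    for (_, cpu_need), node in zip(tasks, placement):
        cpu_load[node] += cpu_need
    yields = [min(1.0, cpu_capacity / load) for load in cpu_load if load > 0]
    return min(yields, default=1.0)


if __name__ == "__main__":
    demo = [(0.6, 0.8), (0.6, 0.4), (0.3, 0.85)]
    where = greedy_place(demo, num_nodes=2)
    print(where, minimum_yield(demo, where, num_nodes=2))
```

Because the rule is incremental, the result depends on the order in which tasks are considered, which is one reason a global method such as the bin-packing approach can find better placements.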
  • 10. [Figure: Large Problem Set, Minimum Yield vs. Free Memory. Axes: Slack and Minimum Yield. Series: bound, MCB8, GR, GB, SG, SGB, RRND, RRNZ.]
  • 11. Conclusions. The problem is tractable. Multi-capacity bin packing algorithms perform well. Greedy algorithms perform okay, and are very fast. Randomized rounding approaches are not very promising.
  • 12. On-Line Problem Assumptions. Job arrival and completion times are not known in advance. Jobs are temporary: the user wants a final result, and quick turnaround relative to runtime is desired. Jobs are not interactive, so they can wait until resources are available to start. We avoid the use of runtime estimates.
  • 13. Stretch. Our goal: minimize maximum stretch (also known as slowdown). Stretch: the time a job spends in the system divided by the time it would spend in a dedicated system [Bender et al., 1998]. Popular for quantifying schedule quality post-mortem. Not generally used to make scheduling decisions: runtime computation requires (unreliable) user estimates. Minimizing average stretch is prone to starvation. Minimizing maximum stretch captures notions of both performance and fairness. Our approach: try to maximize minimum yield, which is similar to, but not the same as, minimizing maximum stretch.
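As a small worked illustration of the stretch definition, the sketch below computes per-job and maximum stretch from a finished schedule. The tuple layout `(release, completion, dedicated_runtime)` and the function names are assumptions made for the example. The last helper shows why yield and stretch are related: a job that starts immediately and runs at a constant yield y takes 1/y times its dedicated runtime, so its stretch is 1/y. The two measures diverge once jobs wait or their yields change over time.

```python
def stretch(release, completion, dedicated_runtime):
    """Time spent in the system divided by the time needed on a dedicated system."""
    if dedicated_runtime <= 0:
        raise ValueError("dedicated_runtime must be positive")
    return (completion - release) / dedicated_runtime


def max_stretch(jobs):
    """Maximum stretch over jobs given as (release, completion, dedicated_runtime)."""
    return max(stretch(r, c, d) for r, c, d in jobs)


def average_stretch(jobs):
    """Average stretch over the same job tuples."""
    return sum(stretch(r, c, d) for r, c, d in jobs) / len(jobs)


def stretch_at_constant_yield(y):
    """Stretch of a job that starts on arrival and runs at constant yield y."""
    if not 0 < y <= 1:
        raise ValueError("yield must be in (0, 1]")
    return 1.0 / y


if __name__ == "__main__":
    finished = [(0, 10, 10), (5, 35, 10), (20, 26, 2)]
    print(max_stretch(finished), average_stretch(finished))
    print(stretch_at_constant_yield(0.5))
```

Note that these functions need the dedicated runtime, which is only known after a job finishes; this is the post-mortem use of stretch that the slide mentions.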
  • 14. [Figure: Max Stretch Degradation vs. Load, no migration cost. Axes: Load and Stretch Degradation Factor. Series: EASY, FCFS, GREEDY, GREEDYP, GREEDYPM, DynMCB8, MCB8P 600, MCB8PG 600.]
  • 15. Conclusions. DFRS approaches can significantly outperform traditional approaches. Aggressive repacking can lead to much better resource allocations, but also to heavy migration costs. Greedy migration is not that useful.
  • 16. Summary. We have proposed a novel approach to job scheduling on clusters, Dynamic Fractional Resource Scheduling (DFRS). It makes use of modern virtual machine technology and seeks to optimize a runtime-computable, user-centric measure of performance called the minimum yield. Multi-capacity bin packing heuristics can be used to find good solutions. Our approach avoids the use of unreliable runtime estimates. This approach has the potential to lead to order-of-magnitude improvements in performance over current technology.
  • 17. References I. Bender, M. A., Chakrabarti, S., and Muthukrishnan, S. (1998). Flow and Stretch Metrics for Scheduling Continuous Job Streams. In Proc. of the 9th ACM-SIAM Symp. on Discrete Algorithms, pages 270–279. Marchal, L., Yang, Y., Casanova, H., and Robert, Y. (2006). Steady-state scheduling of multiple divisible load applications on wide-area distributed computing platforms. Intl. J. of High Performance Computing Applications, 20(3):365–381.
  • 18. References II. Stillwell, M., Schanzenbach, D., Vivien, F., and Casanova, H. (2009). Resource Allocation using Virtual Clusters. In CCGrid, pages 260–267. IEEE. Stillwell, M., Vivien, F., and Casanova, H. (2010). Dynamic fractional resource scheduling for HPC workloads. In IPDPS. To appear.