Resource Overbooking and Application Profiling in Shared Hosting Platforms
Bhuvan Urgaonkar, Prashant Shenoy, Timothy Roscoe†
UMASS Amherst and †Intel Research
Motivation [Figure: clients connect over the Internet to a cluster hosting streaming, games, and e-commerce applications]
Hosting Platforms
Design Challenges
Talk Outline
Hosting Platform Model
Provisioning By Overbooking
Application Profiling [Figure: timeline of ON and OFF periods, with the begin and end of a CPU quantum marked]
Resource Usage Distribution [Figure: usage over time, sampled per measurement interval; probability distribution of fractional usage on [0, 1]; cumulative distribution marking r(99) at probability 0.99 and r(100) at probability 1]
Capturing Burstiness: Token Bucket. Algorithm by Tang et al. [Figure: usage versus time, bounded by the lines σ₁·t + ρ₁ and σ₂·t + ρ₂]
Profiles of Server Applications [Figure, three panels: Postgres Server, 10 clients (probability vs. fraction of CPU); Apache Web Server, 50% cgi-bin (probability vs. fraction of CPU); Streaming Media Server, 20 clients (probability vs. fraction of NW bandwidth)]
Resource Overbooking
Mapping Capsules to Nodes [Figure: graph of capsules 1-3 against nodes 1-4, and the final mapping]
Handling Flash Crowds [Figure, three panels, each probability vs. fraction of CPU: Apache Web Server, Offline Profile; Apache Web Server, Expected Workload; Apache Web Server, Overload]
Talk Outline
The SHARC Prototype
Experimental Setup
Resource Overbooking Benefits [Figure: Placement of Apache Web Servers]
Capsule Placement Algorithms
Performance with Overbooking
Application  Metric          Isolated  100th  99th   95th   Avg
Apache       Tput (req/s)    67.9      67.51  66.91  64.81  39.8
PostgreSQL   Tput (trans/s)  22.8      22.46  22.21  21.78  9.04
Streaming    Viol (sec)      0         0      0.31   0.59   5.23
Related Work
Concluding Remarks


Editor's Notes

  1. Good morning. I am Bhuvan Urgaonkar from UMASS Amherst. The title of my paper is … This is joint work with my advisor Prashant Shenoy and Timothy Roscoe from Intel Research.
  2. Let me begin by providing the motivation behind this work. During the past few years there has been a proliferation of Internet applications. Examples include e-commerce applications, streaming media servers, and online game servers. Due to falling hardware prices and improvements in networking technology, clusters of commodity servers have become a popular alternative to large multiprocessors for hosting these applications. This picture shows an example of a cluster of servers that hosts an e-commerce application, a streaming server and a game server.
  3. We refer to such server clusters that run third-party Internet applications as hosting platforms. The application providers use the platform's resources, such as CPU, memory, and network bandwidth, and pay the platform for these resources. In turn, the platform provides certain guarantees on resource availability to the applications so that they may provide desired levels of performance to their clients. Since the platform provider earns revenue by renting out the platform's resources to the application providers, a central challenge in building economically profitable hosting platforms is to have mechanisms that maximize the platform's revenue while providing useful resource guarantees to the applications. To maximize its revenue, the platform would want to maximize the number of applications that it can support with the resources it has. There are a number of problems that the platform must address to achieve this.
  4. When a new application needs to be hosted on the platform, its resource needs have to be determined to decide how many resources to reserve for it. The platform needs to employ mechanisms that let it provision these resources for the application. It needs to map the application to the nodes in the cluster. Finally, the resource requirements of an application may change with time. The platform may need to deal with such variations to continue meeting the application's requirements.
  5. Here is the outline for the rest of this talk. I'll talk about how a shared hosting platform (SHP) can infer an application's resource requirements and provision resources for it. Next I'll talk about how variations in workload can be handled. Finally I'll present our experimental evaluation and related work.
  6. First, let me talk a bit about the model of hosting platforms considered in our work. Hosting platforms may be either dedicated or shared. In dedicated platforms, resources are allocated at the granularity of a single server. This means each application is assigned an integral number of cluster nodes. In shared hosting platforms (SHPs), applications may be assigned a fractional number of nodes. For example, an application may be assigned half the CPU capacity on a node. In our work, we consider SHPs. An implication of fractional resource allocation is that the nodes in the cluster can have components of multiple applications competing for the resources on the node. The third-party applications that are run on hosting platforms are typically distributed, with multiple logical components. As an example, an e-commerce application may have a front-end web server, a Java application server and a back-end database server. To give applications flexibility over how their components get placed on the nodes in the cluster, we introduce the notion of a capsule. A capsule is a component of an application that needs to run on a node separate from the other capsules of the application. So each application has at least one capsule, and possibly more. To take the example of the e-commerce application again, it may want its three components to be placed on one node, in which case it would bundle them as one capsule. If it wants them to go to different nodes, it would make each of them a capsule.
  7. Keeping in mind the platform’s goal of maximizing the number of applications that it can support, let us consider how it should allocate its resources among applications. One possibility is to find out the peak or worst-case resource requirements of the applications and reserve that many resources for them. However, worst-case provisioning can waste platform resources. This is because whenever an application’s requirements are below its peak needs, the resources assigned to it may remain idle. So this can result in low utilization of the resources. How can we improve upon provisioning based on the peak needs? A key observation is that several applications may be tolerant to occasional violations of resource guarantees. As an example, an application may be willing to tolerate its requirements being met only 99% of the time. In such cases, the platform can reserve a smaller amount of resources than the peak needs. As a result the platform may be able to house an increased number of applications. Provisioning in this manner amounts to overbooking the resources of the nodes, because now the sum of the peak needs of the capsules hosted on a node may exceed the capacity of the node. So although overbooking can help improve utilization of platform resources, it needs to be done carefully to bound the likelihood of resource guarantees being violated. For this we need to accurately determine the resource usage of hosted applications. Next I’ll describe how we do this.
  8. The first step in hosting a new application is to determine its resource usage behavior. We refer to this process as application profiling. To profile an application, we first run it on a set of isolated nodes. By isolated we mean a node which has no other competing applications to interfere with the resource usage of the new application. The application is then subjected to a realistic workload. In this work we focus on two resources, namely CPU and network bandwidth. For each capsule of the application, its usage of these resources is modeled as an ON-OFF process where the ON durations correspond to when the capsule was using the resource and the OFF durations are when it was not using the resource. A part of such an ON-OFF process for the CPU usage of a hypothetical capsule is shown here. The green blocks are the ON periods and the remaining durations are OFF periods. We use the Linux Trace Toolkit to log CPU scheduling instants and packet transmission times and use these to construct this information for the capsules.
  9. Once we have this ON-OFF series for a capsule, we process it as follows. We divide the series into measurement intervals of a suitable length and measure the fractional resource usage in each of the intervals. These usage values are histogrammed to obtain a probability density function for the fractional usage over a measurement interval. This is an example PDF. The x axis is fractional usage and goes from 0 to 1. The y axis plots the probability of the fractional usage taking a certain value. From this, we obtain the cumulative distribution function or CDF for the fractional usage. This figure shows an example CDF. As shown, this capsule has a worst-case fractional usage of r(100). So if we provision resources for the worst case, we would have to reserve an r(100) fraction of the resource for this capsule. We also see that in 99% of the measurement intervals, the capsule’s fractional usage was less than or equal to r(99). So if the capsule is willing to tolerate violations of resource guarantees 1% of the time, we need only reserve an r(99) fraction of the resource for it, which is the 99th percentile of its usage distribution. So we see that this CDF captures the rate at which a capsule desires a resource. It does not, however, capture the variability or burstiness in a capsule’s resource usage. Two capsules may have the same rate of resource consumption, but one may be inherently more bursty than the other. To capture this, we associate a CPU and a network token bucket with each capsule.
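The processing described in this note can be sketched in a few lines of Python. This is only an illustrative sketch, not the profiling code used in the work: the function names, the representation of the ON-OFF series as a list of (start, end) ON periods, and the nearest-rank percentile definition are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def fractional_usage(on_periods: List[Tuple[float, float]],
                     total_time: float,
                     interval: float) -> List[float]:
    """Fraction of each measurement interval during which the resource was ON.

    on_periods is a hypothetical representation of the ON-OFF series:
    a list of (start, end) times of the ON periods.
    """
    n = int(total_time // interval)
    busy = [0.0] * n
    for start, end in on_periods:
        # Visit every measurement interval that this ON period overlaps.
        first = int(start // interval)
        last = min(int(end // interval), n - 1)
        for k in range(first, last + 1):
            lo = max(start, k * interval)
            hi = min(end, (k + 1) * interval)
            if hi > lo:
                busy[k] += hi - lo
    return [b / interval for b in busy]


def percentile(usages: List[float], p: float) -> float:
    """r(p): smallest observed usage that at least p percent of intervals stay at or below.

    Uses the nearest-rank definition, which is an assumption of this sketch.
    """
    ordered = sorted(usages)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


# Four one-second measurement intervals of a made-up capsule.
usages = fractional_usage([(0.0, 0.5), (1.0, 1.25), (2.5, 4.0)], 4.0, 1.0)
worst_case = percentile(usages, 100)   # r(100)
relaxed = percentile(usages, 75)       # r(75)
```

With real traces one would use many more intervals, so that r(99) is meaningfully different from r(100).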
  10. Let me briefly define a token bucket. A token bucket has two parameters, a rate and a burst, associated with it. A capsule’s usage of a resource is said to conform to a token bucket with parameters sigma and rho if its resource usage over any time interval of length t is at most sigma*t + rho. Given an ON-OFF process describing the resource usage of a capsule, there can be multiple token buckets to which its usage conforms. For example, for this capsule, the resource usage conforms to the token bucket (sigma1, rho1) denoted by this red line. It also conforms to a second token bucket denoted by this second red line. We use an algorithm by Tang and others to derive a set of token bucket pairs, all of which describe the capsule’s resource usage behavior. Different applications may have different requirements on the time granularity over which they need resource guarantees. To capture this we allow applications to specify a third parameter T. With this parameter T, the application requires that a capsule with rate sigma and burst rho may ask for sigma*t + rho units of the resource only for intervals of length > T. Once we have these token bucket pairs, all of which describe a capsule’s resource usage, we need to pick one pair to allocate resources for it. What is a good way to do this? In the next slide we’ll look at the profiles of some real applications, and that will give us a meaningful way of selecting a token bucket pair.
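To make the conformance condition concrete, here is a small Python sketch that, for a chosen rate sigma, finds the smallest burst rho such that a usage trace conforms to (sigma, rho). It is not the algorithm by Tang and others mentioned in the talk; it is a simple maximum-excess scan over a trace that is assumed to be discretized into equal slots, so only windows aligned to slot boundaries are checked. The names `min_burst` and `conforms` are hypothetical.

```python
from typing import List


def min_burst(usage: List[float], rate: float, dt: float = 1.0) -> float:
    """Smallest rho such that usage over any run of k slots is <= rate*k*dt + rho.

    usage[i] is the amount of resource consumed in slot i (assumed input format).
    """
    best = 0.0
    running = 0.0
    for u in usage:
        # Carry over accumulated excess only if it is positive, then add this slot's excess.
        running = max(0.0, running) + (u - rate * dt)
        best = max(best, running)
    return best


def conforms(usage: List[float], rate: float, burst: float, dt: float = 1.0) -> bool:
    """True if the trace conforms to the token bucket (rate, burst) at slot granularity."""
    return min_burst(usage, rate, dt) <= burst


trace = [0.5, 1.0, 1.0, 0.0, 0.5]      # made-up per-slot usage
rho_low_rate = min_burst(trace, 0.5)   # lower rate needs a larger burst
rho_high_rate = min_burst(trace, 1.0)  # higher rate needs no burst here
```

The same trace conforms to both (0.5, 1.0) and (1.0, 0.0), which illustrates why a set of token bucket pairs, rather than a single pair, describes a capsule.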
  11. Here are profiles of some server applications that would run on a hosting platform. This figure presents profiles for three classes of servers. The first figure shows the CPU usage distribution for an Apache web server. The web server was run on an isolated node and its workload was generated using the SPECWEB99 benchmark and consisted of 50% cgi-bin requests. For this capsule we see that the peak CPU requirement was close to 0.7, but its 99th percentile is 29%, which is much smaller than 70%. This is an example of a highly bursty capsule with a long tail. The second profile is the network bandwidth usage of a streaming media server. The workload for this application consisted of 20 clients receiving 1.5 Mbps variable bit rate MPEG-1 streams. The 100th percentile and the 99th percentile are 63% and 49% respectively. So this shows less variability than the web server. Finally we have the CPU profile of a Postgres server whose workload was generated using a benchmark conforming to the TPC-B standard. This server is at the other extreme in that it has most of its mass in the high percentile regions, meaning for example that the 99th percentile is not very different from the 100th percentile. From these profiles and others, we find that server applications exhibit different degrees of burstiness. Some of these may have a long tail. These observations suggest that a good way to pick CPU and network token bucket pairs for a capsule is to take the resource guarantee it desires, say that its CPU requirements be met 99% of the time, determine a rate corresponding to that percentile of its distribution, and obtain the corresponding burst as described in the last slide.
  12. Once we have the feasible nodes for each of the capsules, we need to pick a single node to place each capsule on. The situation that the platform is confronted with is represented using a bipartite graph in which there is a vertex for each capsule and each node. We add an edge between a capsule and a node if the node is feasible for the capsule. As an example, here is an application with 3 capsules to be placed on a cluster of 4 nodes. Capsule 1 has just one feasible node, so in this graph there is just one edge incident on capsule 1. We propose a set of greedy algorithms to find a placement for an application. Any such algorithm works as follows. It considers the capsules in non-decreasing order of their degrees. If there are multiple feasible nodes for a capsule, a node may be picked for placing the capsule in several ways. One policy could be to pick a node at random out of the feasible nodes. It turns out that any such algorithm will find a placement if one exists. Besides random we consider two other placement policies. These are best-fit and worst-fit. Best-fit chooses the feasible node with the least available capacity, whereas worst-fit chooses the node with the most available capacity. Note that this is different from the knapsack problem. Unlike the knapsack problem, which is NP-hard, this problem is solvable in polynomial time. If an application has c capsules, then any of these algorithms can find a placement in O(c log c) time.
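The greedy procedure in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the control plane's actual code: the function name `place`, the dictionary inputs, and the capacity numbers are made up for the example, and the sketch assumes that the capsules of one application must land on distinct nodes, as in the capsule definition earlier in the talk.

```python
import random
from typing import Dict, List, Optional


def place(feasible: Dict[str, List[str]],
          available: Dict[str, float],
          policy: str = "worst-fit",
          rng: Optional[random.Random] = None) -> Optional[Dict[str, str]]:
    """Greedily map each capsule to one feasible node; None if the greedy pass gets stuck.

    feasible[c]  : nodes that can host capsule c (edges of the bipartite graph)
    available[n] : spare capacity of node n, used by best-fit and worst-fit
    """
    rng = rng or random.Random(0)
    used = set()
    mapping: Dict[str, str] = {}
    # Consider capsules in non-decreasing order of degree (number of feasible nodes).
    for capsule in sorted(feasible, key=lambda c: len(feasible[c])):
        candidates = [n for n in feasible[capsule] if n not in used]
        if not candidates:
            return None
        if policy == "best-fit":
            node = min(candidates, key=lambda n: available[n])
        elif policy == "worst-fit":
            node = max(candidates, key=lambda n: available[n])
        elif policy == "random":
            node = rng.choice(candidates)
        else:
            raise ValueError("unknown policy: " + policy)
        mapping[capsule] = node
        used.add(node)
    return mapping


# A made-up instance: 3 capsules, 4 nodes, capsule c1 has a single feasible node.
feasible = {"c1": ["n1"], "c2": ["n1", "n2", "n3"], "c3": ["n3", "n4"]}
available = {"n1": 0.6, "n2": 0.3, "n3": 0.5, "n4": 0.8}
worst = place(feasible, available, "worst-fit")
best = place(feasible, available, "best-fit")
```

On this instance worst-fit spreads the capsules onto the emptiest feasible nodes, while best-fit packs them onto the fullest ones.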
  13. So far we have looked at how a new application’s resource requirements may be inferred and how it may be placed on a platform. So long as the application continues to see a workload similar to what it was subjected to during profiling, it continues to meet its resource requirements. However, as time goes on, unanticipated events such as flash crowds may occur, causing the application’s requirements to change. The platform needs to detect such events and then react to them. We propose to conduct online profiling of the resource usage of capsules, construct profiles of recent usage and compare these with the original profiles. An overload would manifest itself through a change in the capsule’s profile. We conducted an experiment to illustrate how this would work. We profiled an Apache web server using the SPECWEB benchmark. The workload had 50 clients and 50% dynamic HTTP requests. The obtained PDF is shown in the extreme left figure. We then decided to associate an overbooking tolerance of 1% with this web server. Correspondingly, we determined its 99th percentile, which was 29%, and ran it on a node where 29% of the CPU was reserved for it while the rest was taken up by a greedy application. The figure in the middle shows the online profile when this web server was subjected to the same workload as during the original profiling. We find that this PDF is very similar to the original except for being upper bounded by its reservation. The tail of the original PDF has shrunk and caused a small bump near the upper bound on reservation. Next we subjected this capsule to a higher request rate of 70 clients and 70% dynamic requests to simulate a flash crowd. The resulting PDF is shown in the right graph. We find that it is drastically different from the original PDF, with a lot of mass near the 30% value. This is used as an indicator of overload. Once such overload has been detected, the platform needs to react to it. This is part of our ongoing research.
The first step would be to compute new allocations based on the recent profile. Then the platform can take a variety of actions to achieve these allocations, such as changing existing allocations, moving capsules or adding new servers.
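One simple way to turn the "mass near the reservation" observation into a check is sketched below. The talk only says that this mass is used as an indicator of overload; the specific rule here (a margin around the reservation and a fixed threshold on the fraction of intervals) and the names `mass_near_reservation` and `looks_overloaded` are assumptions made for illustration, and the usage samples are made up.

```python
from typing import List


def mass_near_reservation(usages: List[float], reservation: float,
                          margin: float = 0.02) -> float:
    """Fraction of recent measurement intervals whose usage is within margin of the reservation."""
    if not usages:
        return 0.0
    near = sum(1 for u in usages if u >= reservation - margin)
    return near / len(usages)


def looks_overloaded(usages: List[float], reservation: float,
                     threshold: float = 0.5, margin: float = 0.02) -> bool:
    """Flag overload when too many intervals sit at the reservation (threshold is hypothetical)."""
    return mass_near_reservation(usages, reservation, margin) > threshold


reservation = 0.29  # the 99th-percentile CPU reservation from the experiment
normal = [0.05, 0.10, 0.12, 0.08, 0.29, 0.07, 0.11, 0.09, 0.06, 0.10]
flash_crowd = [0.29, 0.28, 0.29, 0.15, 0.29, 0.29, 0.28, 0.29, 0.28, 0.29]
normal_flag = looks_overloaded(normal, reservation)
crowd_flag = looks_overloaded(flash_crowd, reservation)
```

A fuller comparison would look at the whole recent distribution against the original profile rather than a single statistic.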
  14. Let me now describe our prototype implementation and experimental results.
  15. We built a prototype shared hosting platform on a cluster of 6 Dell PowerEdge 1550 servers running Linux 2.??. The servers were connected using a gigabit Ethernet link. Here is a picture of our cluster of servers. There are three software components. First, there is the profiling infrastructure, which consists of vanilla Linux 2.2.17 with the Linux Trace Toolkit and runs applications on isolated nodes to infer their requirements. Second is a component that we call the control plane. It runs on a dedicated node and is responsible for placing applications and overbooking resources. Finally, each node that hosts applications runs a QoS-enhanced Linux kernel that implements the HSFQ proportional-share schedulers for CPU and network bandwidth. These schedulers are used to provide specified amounts of CPU and network bandwidth to hosted applications.
  16. For our experiments, we had the control plane running on a dedicated node and the applications ran on the other four nodes. Our workload consisted of a mix of server applications. We chose four classes of server applications. We had the PostgreSQL database server, for which the workload was generated using the pgbench benchmark that comes with the PostgreSQL distribution. We had the Apache web server, for which the workload was generated using the SPECWEB99 benchmark. We used a home-grown MPEG streaming server that serves 1.5 Mbps VBR clients. Finally we had the Quake I game server with terminator bots generating the workload. Not only do these applications cover a wide spectrum of the applications that run on hosting platforms, but they also have diverse resource usage behavior and performance requirements, and so such a mix provides a good way of testing our overbooking and placement mechanisms.
  17. We demonstrate the benefits of resource overbooking experimentally. Here we have results for two types of applications. The figure on the left shows the results for a platform hosting Apache web servers. We profiled the Apache web server under a variety of workloads generated using the SPECWEB benchmark. We then constructed hypothetical web server applications by picking requirements from this set of profiles and placed them on platforms of different sizes until the platform was saturated. The x axis plots the number of nodes in the platform and the y axis plots the number of applications that could be placed. The three different lines indicate different levels of overbooking tolerance. The bottom curve shows the case when there was no overbooking. The blue curve corresponds to an overbooking of 1%. We see that even this small overbooking yields large gains. When the overbooking is 5%, the gains are even bigger. Now let us look at the figure on the right. Here we had a platform on which we placed streaming media servers. We find that though overbooking provides gains, the gains are not as big as with the web server. This is because the streaming media server’s workload is not as bursty as that of the web server. So this confirms that the more bursty an application, the bigger the gains from overbooking.
  18. Next we present an experimental comparison of three application placement policies. These policies are random, best-fit and worst-fit. Using our profiles we constructed two types of applications: a replicated web server, and an ecommerce application with some front-end web servers and back-end database servers. The overbooking tolerance was chosen to be 5%. The figures here plot the number of applications that these policies could place on platforms of different sizes. In the figure on the left, the applications had widely varying numbers of capsules. For such a mix of applications we find that worst-fit outperforms random and best-fit. However, when applications have similar numbers of capsules, all the policies perform similarly. The reason worst-fit does better on the varied mix is that it tends to keep the load balanced across the nodes in the cluster. Best-fit, on the other hand, fills up certain nodes before others. So if applications with a large number of capsules arrive, there may not be enough nodes with spare capacity left to host all their capsules, and they may get turned down.
  19. Our final set of results shows how overbooking affects the performance of server applications. For this we ran our server applications on nodes with varying levels of overbooking. This was done by first profiling the applications on isolated nodes and obtaining resource allocations corresponding to the 100th percentile, the 99th percentile and so on. The applications were then run with these allocations and the remaining capacity assigned to a greedy application. This table shows the performance that three applications saw when run in isolation, which is this column, and the performance for different levels of overbooking. For the Apache web server, we measured the throughput in requests/sec. For the database server, we measured transactions/sec, and for the streaming server we measured the total length of time for which violations occurred. We find that for all these applications, the performance degradation due to overbooking was within the specified overbooking tolerance. Also note that provisioning based on the average needs results in substantial degradation in performance, and so it is not recommended for shared hosting platforms.
  20. There has been a lot of work in resource management in the context of single nodes as well as for clusters of nodes. Two main classes of schedulers, proportional-share and reservation-based, have been designed for guaranteed allocation of resources on a single node. Examples of proportional-share schedulers include weighted fair queuing, start-time fair queuing and Borrowed Virtual Time. Reservation-based schedulers have also been proposed. Example systems that implement such schedulers are the Nemesis operating system and Rialto. The specific problem of QoS-aware resource management for clustered environments has been addressed in the Cluster Reserves work. Cluster Reserves builds on single-node QoS-aware resource allocation mechanisms and extends their benefits to clustered environments. Aron’s thesis proposes a system where applications are profiled to infer their requirements and then online profiling is done to detect changes in requirements. While there are similarities with our work, the main difference is that the primary concern in that work is meeting application contracts, whereas in our work the emphasis is on maximizing revenue. MUSE provides a mechanism for managing resources in shared hosting platforms using an economic approach for sharing resources. Pradhan et al. have proposed mechanisms that use online profiling to identify bottleneck resources for applications running on hosting servers. These observations are then used to dynamically change resource allocations to optimize some metric that is an indicator of system-wide satisfaction. Something about planetary computing and Oceano.
  21. To summarize, we looked at the problem of building effective and economically viable shared hosting platforms. Our approach was based on the following key insight. Many applications have highly bursty resource requirements, and also tolerate occasional and small violations of resource availability guarantees. We proposed taking advantage of these observations by overbooking the resources in a shared hosting platform. To accurately determine the resource requirements of applications, we proposed kernel-based profiling techniques. We proposed techniques to do resource overbooking in a controlled manner and to place applications on a cluster. We also demonstrated how online profiling can be used to detect changes in workload. Mechanisms to handle such changes are part of our ongoing research on hosting platforms. Finally, please visit our group URL for more information on our work.