Service Oriented Computing
Reading Assignment #2
Cloud Mirror
Mesos Cluster
Google Omega
Aris Cahyadi Risdianto
- 20132095 -
CloudMirror: Background Problem and Challenges
Cloud-Hosted Application Problem
• Not as simple as Hadoop or Pregel
• Interactive applications need predictable throughput & latency
• A 100 ms latency increase = 1% sales loss (Amazon)
• Interactive workloads use at least as much CPU as batch workloads
• Oversubscribing bandwidth to guarantee application performance is very expensive
• No bandwidth-to-vCPU ratio is available to guarantee bandwidth usage
Key Challenges
• An “easy” network abstraction model for specifying bandwidth requirements
• A workload placement algorithm for efficient resource allocation
• A scalable runtime to enforce bandwidth guarantees and efficient usage
CloudMirror: Proposed Solutions
TAG* (Tenant Application Graph)
*) New network abstraction based on the application communication structure
Workload Placement Algorithm (CloudMirror)
TAG Deployment
• Bandwidth allocation at the DC uplinks matches the TAG model requirements
• Bandwidth is saved by colocating VMs in the same subtree
• The VM placement algorithm bridges the gap between the high-level TAG and the low-level infrastructure
• Guaranteed anti-affinity for HA and opportunistic anti-affinity for non-HA
TAG model
• Each vertex of the graph represents an application component/tier
• Intuitive, descriptive, efficient, and flexible
• Produced by extensions to OpenStack Heat and AWS CloudFormation
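As a rough illustration of the TAG model, the sketch below stores tiers as vertices and per-VM bandwidth guarantees as directed edges, then estimates the uplink bandwidth a subtree would reserve for one edge. The class and function names, the per-VM send/receive edge labels, and the min-based reservation rule are assumptions made for this example; the slides only state that each vertex is an application component/tier and that colocating VMs in a subtree saves bandwidth.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical Tenant Application Graph: tiers plus directed bandwidth edges."""
    tiers: dict = field(default_factory=dict)   # tier name -> number of VMs
    edges: dict = field(default_factory=dict)   # (src, dst) -> (send_mbps, recv_mbps) per VM

    def add_tier(self, name, vms):
        self.tiers[name] = vms

    def add_edge(self, src, dst, send_mbps, recv_mbps):
        self.edges[(src, dst)] = (send_mbps, recv_mbps)


def uplink_out_mbps(tag, src, dst, src_inside, dst_inside):
    """Outbound uplink bandwidth a subtree needs for the edge src -> dst.

    src_inside / dst_inside are the VM counts of each tier placed inside the
    subtree. Traffic leaving the subtree is bounded by what the inside senders
    can send and by what the outside receivers can accept (assumed rule).
    """
    send, recv = tag.edges[(src, dst)]
    dst_outside = tag.tiers[dst] - dst_inside
    return min(src_inside * send, dst_outside * recv)


tag = Tag()
tag.add_tier("web", 4)
tag.add_tier("db", 2)
tag.add_edge("web", "db", send_mbps=100, recv_mbps=200)

# All web VMs in the subtree, db VMs elsewhere vs. db VMs colocated with them.
spread = uplink_out_mbps(tag, "web", "db", src_inside=4, dst_inside=0)
colocated = uplink_out_mbps(tag, "web", "db", src_inside=4, dst_inside=2)
print(spread, colocated)
```

In this toy case, placing both tiers under the same subtree removes the uplink reservation for the web-to-db edge entirely, which is the kind of saving the colocation bullet refers to.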
CloudMirror: Simulation and Evaluation Result
Evaluation
1) Efficiency
a) Reserving less network bandwidth
b) Accepting more tenant requests
2) Ability of the placement to guarantee and improve availability
3) Feasibility of deployment in a real testbed
Result Highlights
• Resource balancing is beneficial in network topologies with constrained bandwidth capacity
• Tenant rejection rate is below 2.2%, usually because of large VM/bandwidth requirements
• Guaranteeing high availability with a higher WCS (worst-case survivability) requirement increases the rejection rate
• Scalability: 200 ms for 100 VMs per tenant, or a few seconds for 1000 VMs per tenant
Mesos: Background Problem + Challenges and
Mesos Target Environment
Cluster Computing Frameworks Today
• New frameworks keep emerging, but no single framework fits all applications
• Multiplexing improves utilization and allows sharing of data that is costly to replicate
• Static partitioning or per-framework VM allocation achieves neither high utilization nor efficient sharing
>> no fine-grained sharing across frameworks
Key Challenges
• Complexity: a scheduler API that captures all framework requirements, plus online optimization for millions of tasks
• New frameworks and new scheduling policies: current frameworks are still being developed
• Expensive refactoring: moving the scheduling of many individual frameworks into a global scheduler
Target Environment:
Clusters that run Hadoop jobs/tasks as well as MPI jobs at the same time
(Facebook or Yahoo data warehouse)
Mesos: Proposed Solutions
Key Features
1) Resource allocation
• Two allocation modules: max-min fairness for multiple resources, and strict priorities (similar to Hadoop & Dryad)
• Task revocation mechanism: kills low-impact tasks when revocation is triggered
2) Resource isolation between framework executors
• Leverages several existing OS isolation mechanisms (modules)
• Currently uses Linux Containers and Solaris Projects
3) Scalable and robust resource offers, with 3 mechanisms
• Some frameworks always reject certain resources
• Response timer for a framework that receives an offer
• If one framework does not respond, the resources are re-offered to another framework
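The max-min fairness module mentioned above can be illustrated with a single-resource progressive-filling sketch. This is a simplification for illustration, not Mesos code: the function name and the demand figures are hypothetical, and the slides describe fairness over multiple resources rather than the single resource used here.

```python
def max_min_fair(capacity, demands):
    """Single-resource max-min fair allocation via progressive filling."""
    alloc = {name: 0.0 for name in demands}
    remaining = float(capacity)
    unsatisfied = {name for name, d in demands.items() if d > 0}
    while unsatisfied and remaining > 1e-9:
        # Split what is left equally among frameworks that still want more.
        share = remaining / len(unsatisfied)
        for name in sorted(unsatisfied):
            give = min(share, demands[name] - alloc[name])
            alloc[name] += give
            remaining -= give
        unsatisfied = {n for n in unsatisfied if demands[n] - alloc[n] > 1e-9}
    return alloc


# Hypothetical CPU demands of three frameworks sharing 10 CPUs.
shares = max_min_fair(10, {"hadoop": 2, "mpi": 6, "spark": 8})
print(shares)
```

The framework with the small demand is fully satisfied, and the capacity it leaves unused is split evenly between the two larger ones.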
• The master process manages the Mesos slave daemons running on each cluster node
• Frameworks run on the cluster and run their tasks on the slaves
• Each framework has two components: a scheduler (registers with the master to get resources) and an executor (runs the tasks)
Mesos API functions for schedulers & executors
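The offer-based interaction between the master and framework schedulers can be sketched as follows. All class and method names here are hypothetical and do not mirror the real Mesos API; the `min_cpus` filter and the CPU-only resource model are assumptions chosen to keep the example small.

```python
class Scheduler:
    """Hypothetical framework scheduler; not the real Mesos API."""

    def __init__(self, name, cpus_per_task, pending_tasks, min_cpus=0):
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.pending_tasks = pending_tasks
        self.min_cpus = min_cpus          # filter: smaller offers are always rejected

    def resource_offer(self, slave, cpus):
        """Return the number of tasks to launch on this offer (0 means decline)."""
        if cpus < self.min_cpus or self.pending_tasks == 0:
            return 0
        n = min(self.pending_tasks, cpus // self.cpus_per_task)
        self.pending_tasks -= n
        return n


class Master:
    """Hypothetical master that offers each slave's free CPUs to frameworks in turn."""

    def __init__(self, slaves):
        self.free = dict(slaves)          # slave name -> free CPUs
        self.launched = []                # (framework, slave, tasks) records

    def offer_round(self, schedulers):
        for slave in sorted(self.free):
            for sched in schedulers:
                cpus = self.free[slave]
                if cpus == 0:
                    break
                tasks = sched.resource_offer(slave, cpus)
                if tasks:                 # accepted: an executor on the slave runs the tasks
                    self.free[slave] -= tasks * sched.cpus_per_task
                    self.launched.append((sched.name, slave, tasks))
                # a declined offer falls through and is re-offered to the next framework


master = Master({"slave1": 4, "slave2": 8})
mpi = Scheduler("mpi", cpus_per_task=8, pending_tasks=1, min_cpus=8)
hadoop = Scheduler("hadoop", cpus_per_task=1, pending_tasks=6)
master.offer_round([mpi, hadoop])
print(master.launched)
```

Here the MPI scheduler declines the 4-CPU offer from `slave1`, so the same resources are offered to Hadoop, while the 8-CPU offer from `slave2` is accepted by MPI.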
Mesos: Simulation and Evaluation Result
Evaluation
1) Macrobenchmark workloads (Facebook Hadoop mix, large Hadoop mix, Spark, Torque/MPI)
2) Overhead
3) Data Locality through Delay Scheduling
4) Iterative jobs using Spark
5) Mesos Scalability
6) Failure Recovery
7) Performance Isolation
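Delay scheduling, listed in item 3 above, can be sketched as a framework that declines offers on nodes without its data until it has waited long enough. The function name, the `(time, node)` offer format, and the `max_wait` parameter are assumptions for illustration only; the slides do not describe the mechanism in detail.

```python
def delay_schedule(offers, preferred_nodes, max_wait):
    """Return (node, time) of the first offer accepted under delay scheduling.

    offers: list of (time, node) tuples in arrival order. A non-local offer
    is declined until max_wait time units have passed since the first offer.
    """
    if not offers:
        return None
    start = offers[0][0]
    for time, node in offers:
        if node in preferred_nodes or time - start >= max_wait:
            return node, time
    return None


offers = [(0, "n3"), (1, "n4"), (2, "n1"), (6, "n5")]
local = delay_schedule(offers, preferred_nodes={"n1"}, max_wait=5)
fallback = delay_schedule(offers, preferred_nodes={"n9"}, max_wait=5)
no_delay = delay_schedule(offers, preferred_nodes={"n1"}, max_wait=0)
print(local, fallback, no_delay)
```

Waiting a short time lets the task land on the node holding its data (`n1`); without any delay it would take the first, non-local offer.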
Implementation
• 10,000 lines of C++ code
• Runs on Linux, Solaris, and OS X
• Supports frameworks written in Java, C++, and Python
• ZooKeeper for leader election
• Linux Containers for CPU and memory isolation
• Tested frameworks: Hadoop, Torque, MPICH2, and Spark
Figures: Resource Utilization; Mesos Scalability; Macrobenchmark Speedup Result
Omega: Background Problem + Requirement and
Solutions Approach
Cluster Scheduler Problem
• Many different goals (high resource utilization, rapid decisions, business constraints, etc.), yet the scheduler must be robust and always available
• Clusters and workloads keep growing fast
• Monolithic and two-level scheduling are not satisfactory (difficult to add new policies and difficult to schedule)
• Complexity from hardware and workload heterogeneity
Design Issues for Cluster Schedulers
• Partitioning the scheduling work
• Choice of Resources from Cluster
• Interference (optimistic & pessimistic)
• Allocation Granularity (policy flexible)
• Cluster-wide behavior
Omega: Proposed Solutions
Key Features
1) Grant every scheduler full access to the entire cluster (schedulers compete in a free-for-all manner)
2) Optimistic concurrency control mediates clashes when updating the cluster state
3) No central resource allocator (all decisions are made in the schedulers)
4) Each scheduler keeps a copy of the resource allocations (called “cell state”)
5) Changes are synchronized to the cell state as a transaction; if it fails, the scheduler tries again
6) Schedulers run in parallel and do not wait for other jobs (no inter-scheduler blocking)
7) Different policies per scheduler, with a common notion of the relative importance of jobs (called “precedence”)
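The shared-state idea in items 2, 4, and 5 can be sketched with a versioned cell state that schedulers copy, modify locally, and commit back. This is a minimal sketch under stated assumptions, not Omega's implementation: `CellState`, `commit`, `schedule`, the per-machine version counters, and the CPU-only resource model are all hypothetical.

```python
import copy


class CellState:
    """Hypothetical shared cell state: free CPUs per machine, each with a version."""

    def __init__(self, machines):
        self.free = dict(machines)
        self.version = {m: 0 for m in machines}

    def snapshot(self):
        # Each scheduler works on its own private copy of the shared state.
        return copy.deepcopy(self)

    def commit(self, snap, claims):
        """Apply claims {machine: cpus}; fail if any claimed machine changed since snap."""
        for m in claims:
            if self.version[m] != snap.version[m]:
                return False              # conflict: another scheduler got there first
        for m, cpus in claims.items():
            self.free[m] -= cpus
            self.version[m] += 1
        return True


def schedule(cell, cpus_needed, max_retries=5, snap=None):
    """Pick the first machine with enough free CPUs; resync and retry on conflict."""
    for attempt in range(1, max_retries + 1):
        if snap is None:
            snap = cell.snapshot()
        choice = next((m for m in sorted(snap.free) if snap.free[m] >= cpus_needed), None)
        if choice is None:
            return None, attempt
        if cell.commit(snap, {choice: cpus_needed}):
            return choice, attempt
        snap = None                       # stale copy: take a fresh one and try again
    return None, max_retries


cell = CellState({"m1": 4, "m2": 4})
stale = cell.snapshot()                   # a second scheduler copies the state early
first = schedule(cell, 3)                 # the first scheduler claims m1
second = schedule(cell, 3, snap=stale)    # the second one conflicts on m1, then retries
print(first, second, cell.free)
```

No scheduler ever holds a lock: the second scheduler only discovers the clash at commit time and pays for it with one retry.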
• Monolithic: used in HPC; a single instance applies the same algorithm to all jobs
• Two-level: used by Mesos and Hadoop-on-Demand; many different schedulers controlled by a central scheduler
• Shared state: used by Omega; avoids two-level scheduling and its limited parallelism
Omega: a new parallel scheduler built around “shared state”, with lock-free optimistic concurrency control
Omega: Simulation and Evaluation Result
Evaluation for Trace-Driven Simulation
1) Scheduling performance: how service scheduler busyness varies with jobs and tasks
2) Scaling the workload: time to scale the tasks when there are conflicts
3) Load-balancing the batch scheduler: more decision time for large batch jobs
4) Dealing with conflicts, with two choices: coarse-grained conflict detection and all-or-nothing scheduling
5) MapReduce scheduler impact on utilization and job completion time
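The two conflict-handling choices in item 4 can be contrasted in a small sketch. The `commit` function, its flags, and the CPU-only state are hypothetical and only illustrate the distinction; the slides do not specify how these options are implemented.

```python
def commit(free, version, snap_version, claims, coarse=False, all_or_nothing=False):
    """Apply claims [(machine, cpus)] to the shared state; return the accepted claims.

    coarse=True          reject a claim if its machine changed at all since the snapshot
    coarse=False         reject a claim only if the machine lacks enough free CPUs now
    all_or_nothing=True  apply nothing unless every claim is conflict-free
    """
    accepted, budget = [], dict(free)
    for machine, cpus in claims:
        changed = version[machine] != snap_version[machine]
        fits = budget[machine] >= cpus
        if fits and not (coarse and changed):
            budget[machine] -= cpus
            accepted.append((machine, cpus))
    if all_or_nothing and len(accepted) != len(claims):
        return []
    for machine, cpus in accepted:
        free[machine] -= cpus
        version[machine] += 1
    return accepted


def fresh_state():
    # Both machines were modified by another scheduler after our snapshot (version 0 -> 1).
    return {"m1": 2, "m2": 8}, {"m1": 1, "m2": 1}, {"m1": 0, "m2": 0}


claims = [("m1", 4), ("m2", 4)]
fine_incremental = commit(*fresh_state(), claims)
fine_gang = commit(*fresh_state(), claims, all_or_nothing=True)
coarse_incremental = commit(*fresh_state(), claims, coarse=True)
print(fine_incremental, fine_gang, coarse_incremental)
```

With fine-grained detection and incremental commits, the claim on `m2` still succeeds; either the coarse-grained check or the all-or-nothing rule causes the whole job to be rejected and retried, which is why those options raise the conflict cost.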
Figure: Lightweight Simulator Result
Simulator
1) Lightweight simulator: compares scheduler architectures under the same conditions and identical workloads
2) High-fidelity simulator: replays historical Google workload traces

Comparison between Cloud Mirror, Mesos Cluster, and Google Omega

  • 1. Service Oriented Computing, Reading Assignment #2: Cloud Mirror, Mesos Cluster, Google Omega. Aris Cahyadi Risdianto (20132095)
  • 2. CloudMirror: Background Problem and Challenges
    Problems of cloud-hosted applications:
    • Not as simple as Hadoop or Pregel
    • Interactive applications need predictable throughput and latency
    • A 100 ms latency increase means a 1% sales loss (Amazon)
    • Interactive workload ≥ batch workload (CPU)
    • Oversubscribing bandwidth to give applications guarantees is very expensive
    • There is no bandwidth-to-vCPU ratio to guarantee bandwidth usage
    Key challenges:
    • An “easy” network abstraction model for specifying bandwidth requirements
    • A workload placement algorithm for efficient resource allocation
    • A scalable runtime that enforces bandwidth guarantees and efficient usage
  • 3. CloudMirror: Proposed Solutions
    TAG (Tenant Application Graph): a new network abstraction based on the application communication structure
    Workload placement algorithm and CloudMirror TAG deployment:
    • Bandwidth allocation at the DC uplinks matches the TAG model requirements
    • Bandwidth is saved by collocating VMs in the same subtree
    • The VM placement algorithm bridges the gap between the high-level TAG and the low-level infrastructure
    • Anti-affinity is guaranteed for HA and applied opportunistically for non-HA
    TAG model:
    • Each vertex of the graph represents an application component/tier
    • Intuitive, descriptive, efficient, and flexible
    • Produced by an OpenStack Heat and AWS CloudFormation extension
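The TAG idea on this slide can be illustrated with a small sketch. It is only an illustration, not CloudMirror's implementation: the class names, the per-VM send/receive labels on each edge, and the `min(...)` rule for an edge's aggregate demand are assumptions made for the example. It shows why collocating communicating tiers in one subtree lowers the bandwidth that must be reserved on the subtree's uplink.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One vertex of the graph: an application component/tier (hypothetical type)."""
    name: str
    vm_count: int


@dataclass
class TenantApplicationGraph:
    tiers: dict = field(default_factory=dict)
    # (src, dst) -> (per-VM send Mbps, per-VM receive Mbps); an assumed labelling
    edges: dict = field(default_factory=dict)

    def add_tier(self, name, vm_count):
        self.tiers[name] = Tier(name, vm_count)

    def add_edge(self, src, dst, send_mbps, recv_mbps):
        self.edges[(src, dst)] = (send_mbps, recv_mbps)

    def edge_demand(self, src, dst):
        # Aggregate demand is capped by whichever side is the bottleneck.
        send, recv = self.edges[(src, dst)]
        return min(send * self.tiers[src].vm_count,
                   recv * self.tiers[dst].vm_count)

    def outbound_demand(self, colocated):
        """Bandwidth leaving a subtree that hosts every VM of the tiers in `colocated`."""
        return sum(self.edge_demand(s, d) for (s, d) in self.edges
                   if s in colocated and d not in colocated)


tag = TenantApplicationGraph()
tag.add_tier("web", 4)
tag.add_tier("app", 2)
tag.add_tier("db", 2)
tag.add_edge("web", "app", send_mbps=100, recv_mbps=200)
tag.add_edge("app", "db", send_mbps=50, recv_mbps=100)

# Placing "app" next to "web" removes the web->app traffic from the uplink.
print(tag.outbound_demand({"web"}))
print(tag.outbound_demand({"web", "app"}))
```

In this toy tenant, hosting only the web tier in a subtree requires reserving the full web-to-app demand on the uplink, while hosting web and app together leaves only the app-to-db demand.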
  • 4. CloudMirror: Simulation and Evaluation Result
    Evaluation:
    1) Efficiency: a) reserving less network bandwidth; b) accepting more tenant requests
    2) Ability of the placement to guarantee and improve availability
    3) Feasibility of deployment in a real testbed
    Result highlights:
    • Resource balancing brings benefits in network topologies with constrained bandwidth capacity
    • The tenant rejection rate is below 2.2%, and rejections are usually caused by large VM/bandwidth requirements
    • Guaranteeing high availability with a higher WCS requirement increases the rejection rate
    • Scalability: 200 ms for 100 VMs/tenant, or a few seconds for 1000 VMs/tenant
  • 5. Mesos: Background Problem + Challenges and Mesos Target Environment
    Cluster computing frameworks today:
    • New frameworks keep emerging, but no single framework suits all applications
    • Multiplexing improves utilization and allows sharing of data that is costly to replicate
    • Static partitioning or per-framework VM allocation achieves neither high utilization nor efficient sharing, so there is no fine-grained sharing across frameworks
    Key challenges:
    • Complexity: a scheduler API that captures the requirements of all frameworks, plus online optimization for millions of tasks
    • New frameworks and new scheduling policies: current frameworks are still being developed
    • Expensive refactoring: moving the scheduling of many individual frameworks into a global scheduler
    Target environment: a cluster that runs Hadoop jobs/tasks and MPI jobs at the same time (Facebook or Yahoo data warehouse)
  • 6. Mesos: Proposed Solutions
    Key features:
    1) Resource allocation
    • Two allocation modules: max-min fairness for multiple resources, and strict priorities (similar to Hadoop and Dryad)
    • Task revocation mechanism: low-impact tasks are killed when revocation is triggered
    2) Resource isolation between framework executors
    • Leverages several existing OS isolation mechanisms (as modules)
    • Currently uses Linux Containers and Solaris Projects
    3) Scalable and robust resource offers, with three mechanisms
    • Handling frameworks that always reject certain resources
    • A response timer for frameworks that receive an offer
    • If a framework does not respond, the resources are re-offered to other frameworks
    Architecture:
    • The master process manages the Mesos slave daemons running on each cluster node
    • Frameworks run on the cluster and launch their tasks on the slaves
    • Each framework has two components: a scheduler (registers with the master to be offered resources) and an executor (runs the tasks)
    Mesos API functions for schedulers and executors
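The "max-min fairness for multiple resources" module can be illustrated with a short sketch. The slide does not name a specific algorithm; the code below assumes Dominant Resource Fairness (DRF), a well-known generalisation of max-min fairness to several resource types, and the function name and data layout are hypothetical. At each step the framework with the smallest dominant share is given one more task, until nothing else fits.

```python
def drf_allocate(capacity, demands):
    """Greedy DRF sketch (assumed policy, not the Mesos source).

    capacity: resource name -> total amount in the cluster
    demands:  framework name -> {resource name -> amount needed per task}
    Returns:  framework name -> number of tasks launched
    """
    used = {r: 0 for r in capacity}
    tasks = {f: 0 for f in demands}
    active = set(demands)

    def dominant_share(f):
        # Largest fraction of any single resource the framework currently holds.
        return max(tasks[f] * demands[f][r] / capacity[r] for r in capacity)

    while active:
        # Sorting makes tie-breaking deterministic (alphabetical by name).
        f = min(sorted(active), key=dominant_share)
        need = demands[f]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            tasks[f] += 1
        else:
            # The next task of this framework no longer fits; stop offering to it.
            active.discard(f)
    return tasks


cluster = {"cpu": 9, "mem_gb": 18}
per_task = {
    "A": {"cpu": 1, "mem_gb": 4},   # memory-heavy framework
    "B": {"cpu": 3, "mem_gb": 1},   # CPU-heavy framework
}
print(drf_allocate(cluster, per_task))  # {'A': 3, 'B': 2}
```

With these numbers both frameworks end up with a dominant share of 2/3: framework A holds 12 of 18 GB of memory and framework B holds 6 of 9 CPUs.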
  • 7. Mesos: Simulation and Evaluation Result
    Evaluation:
    1) Macrobenchmark workloads (Facebook Hadoop mix, large Hadoop mix, Spark, Torque/MPI)
    2) Overhead
    3) Data locality through delay scheduling
    4) Iterative jobs using Spark
    5) Mesos scalability
    6) Failure recovery
    7) Performance isolation
    Implementation:
    • 10,000 lines of C++ code
    • Runs on Linux, Solaris, and OS X
    • Supports frameworks written in Java, C++, and Python
    • ZooKeeper for leader election
    • Linux containers for CPU and memory isolation
    • Tested frameworks: Hadoop, Torque, MPICH2, and Spark
    Figures: resource utilization, Mesos scalability, macrobenchmark speedup results
  • 8. Omega: Background Problem + Requirement and Solutions Approach
    Cluster scheduler problem:
    • Many different goals (high resource utilization, rapid decisions, business constraints, etc.), while the scheduler must stay robust and always available
    • Clusters and workloads keep growing fast
    • Monolithic and two-level scheduling are not satisfactory (hard to add new policies and hard to schedule)
    • Complexity from hardware and workload heterogeneity
    Design issues for cluster schedulers:
    • Partitioning the scheduling work
    • Choice of resources from the cluster
    • Interference (optimistic and pessimistic)
    • Allocation granularity (flexible policy)
    • Cluster-wide behavior
  • 9. Omega: Proposed Solutions
    Key features:
    1) Every scheduler is granted full access to the entire cluster (schedulers compete in a free-for-all manner)
    2) Optimistic concurrency control mediates clashes when updating the cluster state
    3) No central resource allocator (all decisions are made in the schedulers)
    4) Each scheduler holds a copy of the resource allocations (the “cell state”)
    5) Changes are synchronized to the cell state in a transaction; if it fails, the scheduler tries again
    6) Schedulers run in parallel and do not wait for other jobs (no inter-scheduler blocking)
    7) Each scheduler can apply its own policy, with the relative importance of jobs expressed as “precedence”
    Scheduler architectures:
    • Monolithic: used in HPC, with a single instance and the same algorithm for all jobs
    • Two-level: used by Mesos and Hadoop-on-Demand, with many different schedulers controlled by a central scheduler
    • Shared state: used by Omega, avoiding the two-level design and its limited parallelism
    Omega: a new parallel scheduler built around “shared state”, with lock-free optimistic concurrency control
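Features 2, 4, and 5 can be sketched as a snapshot, decide, commit, retry loop. This is a toy model under stated assumptions, not Omega's code: `CellState`, `schedule`, the per-machine version counters, and the first-fit policy are all hypothetical. The small lock only makes the commit step atomic inside this single-process example; schedulers never hold it while making a decision, which is the point of the optimistic approach.

```python
import threading


class CellState:
    """Shared cluster state (hypothetical stand-in for the cell state)."""

    def __init__(self, free_cpus):
        self._free = dict(free_cpus)                # machine -> free CPUs
        self._version = {m: 0 for m in free_cpus}   # bumped on every change
        self._lock = threading.Lock()               # guards only the short commit

    def snapshot(self):
        # A scheduler works on a private copy while it decides.
        with self._lock:
            return dict(self._free), dict(self._version)

    def commit(self, claims, seen_version):
        """Apply claims (machine -> CPUs) all-or-nothing; False means conflict."""
        with self._lock:
            for machine, cpus in claims.items():
                stale = self._version[machine] != seen_version[machine]
                if stale or self._free[machine] < cpus:
                    return False
            for machine, cpus in claims.items():
                self._free[machine] -= cpus
                self._version[machine] += 1
            return True


def schedule(cell, cpus_needed, max_attempts=5):
    """First-fit placement with retry on conflict; returns (machine, attempts)."""
    for attempt in range(1, max_attempts + 1):
        free, version = cell.snapshot()
        machine = next((m for m in sorted(free) if free[m] >= cpus_needed), None)
        if machine is None:
            return None, attempt
        if cell.commit({machine: cpus_needed}, version):
            return machine, attempt
        # Conflict: another scheduler changed the machine first, so re-sync and retry.
    return None, max_attempts


cell = CellState({"m1": 4, "m2": 4})
_, stale_view = cell.snapshot()                # two schedulers read the same state
print(cell.commit({"m1": 3}, stale_view))      # first writer wins
print(cell.commit({"m1": 2}, stale_view))      # second writer sees a conflict
print(schedule(cell, 2))                       # retry with a fresh snapshot
```

The second commit fails because its view of `m1` is stale, and the retry through `schedule` takes a fresh snapshot and lands on `m2`. Checking versions per machine is one possible conflict-detection granularity chosen for this sketch.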
  • 10. Omega: Simulation and Evaluation Result
    Evaluation with trace-driven simulation:
    1) Scheduling performance: how service scheduler busyness varies with jobs and tasks
    2) Scaling the workload: time needed to scale the tasks when conflicts occur
    3) Load-balancing the batch scheduler: more decision time for large batch jobs
    4) Dealing with conflicts, with two choices: coarse-grained conflict detection and all-or-nothing scheduling
    5) Impact of the MapScheduler on utilization and job completion time
    Simulators:
    1) Lightweight simulator: compares scheduler architectures under the same conditions and identical workloads
    2) High-fidelity simulator: uses historical Google workload traces
    Figure: lightweight simulator result