Mitigating Common CloudStack Instance Deployment Failures

Session Description: A discussion of common failures when using CloudStack, taking instance deployment as an example. The session covers 15 specific failure scenarios, their causes, and possible mitigation steps.

Speaker Bio: Jithin is a cloud architect at ShapeBlue. Over the past 10 years, he has helped organisations around the globe use commercial distributions of Apache CloudStack.

On Friday 18th August, the Apache CloudStack India User Group 2023 took place in Bangalore, where CloudStack enthusiasts, experts, and industry leaders from across the country discussed the open-source project. The meetup served as a vibrant platform to delve into the depths of Apache CloudStack, share insights, and forge new connections.

Mitigating Common CloudStack Instance Deployment Failures
Jithin Raju

About Me
• Cloud Architect at ShapeBlue
• Involved with CloudStack/forks since 2013
• Citrix > Accelerite > ShapeBlue

Agenda
• Common CloudStack Instance Deployment Failures
• Q&A and discussion

Insufficient Capacity
Cause:
• Capacity is fully utilized.
Mitigation:
• Add more capacity (compute/storage).
• Delete unused instances and volumes.
• Use resource limits.
• Monitor utilization.

Insufficient Address Capacity
Cause:
• Public IPs are fully allocated.
Mitigation:
• Add a new public IP address range/subnet.
• Use resource limits.
• Plan capacity well.

Unable to allocate vnet
Cause:
• Guest VLAN range in the zone is fully utilized.
Mitigation:
• Extend the VLAN range.
• Use resource limits on networks.
• Plan capacity well.

Tag Mismatch
Causes:
• The host/storage tag is not configured correctly.
• Tagged hosts/storage are fully utilized.
Mitigation:
• Review the configuration.
• Add the tag to more hosts/storage.
• Avoid using tags if not required.

Overconfidence with over-provisioning
Cause:
• Higher used capacity compared to allocated capacity.
Mitigation:
• Use realistic values.
• Test thoroughly.
• Leave headroom.

Deployment options
Cause:
• Conflicting choices.
Mitigation:
• Avoid using combinations which can't be deployed.
• Reduce the choices.

No destination found for a deployment for VM instance
Causes:
• Hosts are disconnected.
• Lost vCenter access.
• No free capacity.
Mitigation:
• Ensure host/hardware health.
• Ensure free capacity.
• Use hypervisor monitoring.

Capability Mismatch
Cause:
• Unable to find hosts with a suitable number of vCPUs, CPU MHz, or any other specification in the compute offering.
Mitigation:
• Revise the compute offerings along with hardware changes.
• Review compute offerings.

Resource Limits
Cause:
• Resource limit on account or domain for instances, volumes, primary storage, public IPs, and networks.
Mitigation:
• Increase the limit.
• Free up resources.

Null Pointer Exceptions
Causes:
• Manual DB changes.
• CloudStack bugs.
Mitigation:
• Avoid DB modifications.
• Apply fixes.
• Report/fix bugs.

Database Errors
Causes:
• DB server performance.
• JDBC errors.
• Errors executing statements.
• DB server filesystem filled up.
Mitigation:
• Monitor the server.
• Ensure good connectivity to the database.
• Use tested and supported MySQL versions.

Instance not booting
Causes:
• Incorrect guest OS mapping.
• Unsupported controller type.
• Corrupted template/ISO.
Mitigation:
• Fix the guest OS mapping.
• Use supported controller types.
• Fix the template.

Timeouts
Causes:
• "Wait" timeouts.
• Job timeout.
Mitigation:
• Update timeouts to match the environment and use cases.
• Identify the sub-task causing the delay.
• Review underlying platform performance.

Issues with VR
Causes:
• Unresponsive VR.
• Filesystem filled up.
• Unreachable VR.
• Storage issues.
• CloudStack bugs.
Mitigation:
• Inspect the VR.
• Restart the network with clean-up.
• Fix/report the bug.

Editor's Notes

We are going to discuss 15 common instance deployment failure scenarios, their usual causes, and suggested mitigation steps.
If you have used CloudStack already, you must have seen this error. Insufficient capacity is a generic error thrown for many failures, including, of course, when there is genuinely no capacity available. Most of the time the capacity is fully utilized and the ways to resolve it are straightforward. We could add more compute or storage resources, depending on the situation. If there is an opportunity to delete unused resources such as instances or volumes, that could also help. Another way to handle this situation is efficient use of resource limits at the account and domain levels.
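The "monitor utilization" advice can be sketched as a simple threshold check. This is only an illustration: the `Capacity` record, its field names, the sample numbers, and the 85% threshold are all assumptions made for this example, not CloudStack data structures or recommended values.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    """Hypothetical capacity record; field names are assumptions."""
    name: str
    used: float
    total: float


def over_threshold(capacities, threshold=0.85):
    """Return (name, utilization) pairs at or above the threshold."""
    flagged = []
    for c in capacities:
        if c.total <= 0:
            continue  # skip records with no usable total
        utilization = c.used / c.total
        if utilization >= threshold:
            flagged.append((c.name, round(utilization, 2)))
    return flagged


# Made-up sample figures, for illustration only.
sample = [
    Capacity("cpu_mhz", 90_000, 100_000),
    Capacity("memory_mb", 200_000, 512_000),
    Capacity("primary_storage_gb", 4_300, 5_000),
]
print(over_threshold(sample))
```

Flagging resources before they reach 100% leaves time to add capacity or clean up unused instances and volumes.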
When you are out of public IPs and the new network requires a public IP, the instance deployment could fail. The way around this situation is mostly to add more public IPs. To avoid getting into this situation, we could use resource limits efficiently. If you know the use case and the expected usage of public IPs upfront, you could add the capacity accordingly. If you pay attention to resource utilization, you should be able to provision new IPs before it results in any failure.
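As a rough sketch of paying attention to public IP utilization, the helper below lists the free addresses in a range using Python's standard `ipaddress` module. The range, the allocated addresses (taken from the documentation prefix 203.0.113.0/24), and the low-water mark are assumptions for illustration; in practice the inputs would come from your own inventory.

```python
import ipaddress


def free_ips(start, end, allocated):
    """Return the addresses in [start, end] that are not allocated."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    used = {int(ipaddress.IPv4Address(a)) for a in allocated}
    return [
        str(ipaddress.IPv4Address(i))
        for i in range(first, last + 1)
        if i not in used
    ]


def needs_new_range(free, low_water_mark):
    """True when fewer free addresses remain than the chosen minimum."""
    return len(free) < low_water_mark


# Hypothetical range and allocations.
free = free_ips("203.0.113.10", "203.0.113.13",
                ["203.0.113.10", "203.0.113.12"])
print(free, needs_new_range(free, low_water_mark=5))
```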
When you deploy an instance where the network needs to be implemented, a new VLAN needs to be allocated. There are situations where the provisioned VLAN IDs are fully consumed, and this could also result in an instance deployment failure. The typical way around this is to extend the VLAN range. If you have used resource limits effectively, you may not see this issue that often. If the VLAN capacity is planned well, you may not face this issue at all.
The deployment planner is unable to find a host or storage that matches the tag and has enough capacity. We can solve this by reviewing the current host/storage tagging configuration and fixing it. We could avoid using tags if they are not required. Or we could add the tag to more resources.
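The two causes named here (wrong tag, or tagged resources already full) can be told apart with a small check like the one below. It is a simplified illustration and not the CloudStack deployment planner; the host dictionaries, their keys, and the sample values are assumptions.

```python
def matching_hosts(hosts, required_tag, needed_mb):
    """Split hosts into those that fit and the reason each other host was rejected."""
    fits, rejected = [], {}
    for h in hosts:
        if required_tag and required_tag not in h["tags"]:
            rejected[h["name"]] = "tag mismatch"
        elif h["free_mb"] < needed_mb:
            rejected[h["name"]] = "insufficient capacity"
        else:
            fits.append(h["name"])
    return fits, rejected


# Hypothetical inventory.
hosts = [
    {"name": "host1", "tags": {"ssd"}, "free_mb": 1024},
    {"name": "host2", "tags": set(), "free_mb": 65536},
    {"name": "host3", "tags": {"ssd", "gpu"}, "free_mb": 16384},
]
print(matching_hosts(hosts, "ssd", 8192))
```

Here `host2` has plenty of free memory but lacks the tag, which is the situation where adding the tag to more hosts, or dropping the tag requirement, resolves the failure.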
Over-provisioning cannot magically give us more capacity than the infrastructure has. We need to use realistic over-provisioning values. If you do thorough testing, you should be able to find suitable values. Also keep some headroom to avoid resource contention.
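A little arithmetic shows why an aggressive factor is risky. The sketch assumes that allocatable capacity is the physical capacity multiplied by the over-provisioning factor; the function name and all numbers are made up for illustration.

```python
def allocation_view(physical_mhz, factor, allocated_mhz, used_mhz):
    """Compare what has been allocated on paper with what the hardware is really doing."""
    allocatable = physical_mhz * factor  # assumed model: physical x factor
    return {
        "allocatable_mhz": allocatable,
        "allocated_ratio": allocated_mhz / allocatable,
        "physical_used_ratio": used_mhz / physical_mhz,
    }


# Hypothetical host: 40 GHz of physical CPU with a 4x factor.
view = allocation_view(physical_mhz=40_000, factor=4.0,
                       allocated_mhz=80_000, used_mhz=38_000)
print(view)
```

With these numbers only half of the allocatable CPU has been handed out, yet the physical CPU is already 95% busy, so further deployments would land on hardware with no headroom left.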
We should avoid choosing mutually exclusive deployment options, such as selecting a dedicated affinity group together with an offering that requires a host or storage that is not available in the dedicated resources. We can avoid this by selecting only the appropriate choices for instance deployment.
You might see this instance deployment error when hosts are unavailable for any reason. It could be hardware, network, or software issues. This is also seen when there is no free capacity. You can avoid this by ensuring the health of the host and its hardware. Make sure there is free capacity. Implement hypervisor monitoring.
I have seen this capability mismatch error during instance deployment after a server hardware upgrade or replacement. The older CPUs could be, say, 3500 MHz, and the compute offerings were created based on them, so we may have used a CPU MHz value of 3500. If the new CPU is 2000 MHz, the instance deployment would fail. It's always good practice to revise the compute offerings according to the hardware changes.
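The 3500 MHz versus 2000 MHz example can be written as a pre-check to run after a hardware change. This is a minimal sketch: the dictionary keys are hypothetical, and the rule that a host must offer at least the offering's core count and per-core speed is a simplification of the behaviour described above.

```python
def capability_problems(offering, host):
    """List the reasons a host cannot satisfy a compute offering (empty list means it can)."""
    problems = []
    if host["cpu_cores"] < offering["cpu_number"]:
        problems.append("not enough cores")
    if host["cpu_speed_mhz"] < offering["cpu_speed_mhz"]:
        problems.append("CPU speed too low")
    return problems


# Offering created for the old 3500 MHz CPUs, checked against a new 2000 MHz host.
old_offering = {"cpu_number": 2, "cpu_speed_mhz": 3500}
new_host = {"cpu_cores": 16, "cpu_speed_mhz": 2000}
print(capability_problems(old_offering, new_host))
```

Running such a check over every offering after a hardware refresh would show which offerings need revising before users hit the deployment failure.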
Although I suggested using resource limits as a solution a couple of times earlier, resource limits themselves could lead to instance deployment failure. If the instance resource limit is reached at the account, domain, or project level, the result is a failure. We do not have many options in this case: we can either increase the limit or free up resources.
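To find out which level is blocking a deployment, the limit at each level can be compared with current usage. The sketch below uses made-up limits and usage figures, and it treats -1 as "unlimited" purely as a convention of this example.

```python
UNLIMITED = -1  # assumption for this sketch: -1 means no limit is set


def blocked_by(limits, usage, requested=1):
    """Return the levels whose limit would be exceeded by the request."""
    blocked = []
    for level, maximum in limits.items():
        if maximum == UNLIMITED:
            continue
        if usage.get(level, 0) + requested > maximum:
            blocked.append(level)
    return blocked


# Hypothetical instance limits and current instance counts.
limits = {"account": 20, "domain": 50, "project": UNLIMITED}
usage = {"account": 20, "domain": 35, "project": 4}
print(blocked_by(limits, usage))
```

In this example the account is at its limit, so either the account limit is raised or some of its instances are removed.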
Java null pointer exceptions are another set of errors that can cause any operation to fail, instance deployment included. Mostly these appear after incorrect manual DB changes. We hit this error because of bugs as well. If we avoid manual DB changes, we will most likely not be affected by this error. If it is a bug, we can find the fixed version and upgrade, report the bug, or even fix it ourselves if that's an option.
Any issue with the database can break the CloudStack installation. It is essential to keep it free of errors. DB server performance issues, JDBC errors, incorrect MySQL statements, and similar problems are the most common ones leading to failures such as instance deployment failure. We need to monitor the DB server to avoid any service outages. We need to ensure good network connectivity to the DB server. We can also avoid some issues if we use supported and tested MySQL versions.
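One of the listed causes, a DB server filesystem filling up, is easy to watch for. The sketch below uses `shutil.disk_usage` from the Python standard library; the path to check and the 80% alert threshold are assumptions, and a real setup would point it at the volume holding the MySQL data directory.

```python
import shutil


def disk_percent_used(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # guard against filesystems that report no size
    return 100.0 * usage.used / usage.total


def should_alert(percent_used, threshold=80.0):
    """True when usage is at or above the chosen alert threshold."""
    return percent_used >= threshold


# "." is a placeholder; substitute the database volume's mount point.
percent = disk_percent_used(".")
print(f"{percent:.1f}% used, alert={should_alert(percent)}")
```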
Sometimes the instance deployment job succeeds but the guest OS fails to boot. This could happen due to an incorrect guest OS mapping, an unsupported guest OS type, an unsupported controller type, or a corrupted template or ISO. If you access the console, you should be able to tell what caused the issue. Depending on the cause, you should be able to fix it by correcting the guest OS mapping, using a supported controller type, or fixing the template.
Timeouts are another common failure scenario. We could avoid them by increasing the timeout. We can also identify the sub-tasks causing the delay and fix them. Most of the time the underlying platform is slow; it is worth reviewing it, as fixing it solves the problem.
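Identifying the sub-task that causes the delay amounts to comparing per-step durations with the overall timeout. The step names, durations, and the 1500-second timeout below are invented for illustration; real figures would come from timestamps in your own logs.

```python
def slowest_step(durations):
    """Return the (step, seconds) pair that took the longest."""
    name = max(durations, key=durations.get)
    return name, durations[name]


def exceeds_timeout(durations, timeout_s):
    """True when the steps together take longer than the timeout."""
    return sum(durations.values()) > timeout_s


# Hypothetical per-step timings in seconds.
durations = {
    "allocate_network": 12,
    "copy_template": 1400,
    "start_vr": 95,
    "start_instance": 40,
}
print(slowest_step(durations), exceeds_timeout(durations, timeout_s=1500))
```

When one step dominates like this, raising the timeout only hides the problem; reviewing the platform behind that step (here, whatever performs the template copy) is the more durable fix.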
Unless the network is configured, the instance won't be deployed. Any system issue with the VR (virtual router) could result in instance deployment failure. The VR could be unresponsive, or its filesystem could be full. The VR may be unreachable. There could be storage issues where the VR is hosted. There could be a bug causing the network configuration on the VR to fail. In these situations we should inspect the VR to identify what is causing the issue. Sometimes restarting the network with clean-up solves the issue. Sometimes it is best to report the bug, or fix it, if that's the case.
