The document provides troubleshooting strategies for CloudStack installations, including network issues, security groups, host connectivity, virtual routers, templates, and log analysis. It discusses common problems such as VLAN misconfigurations, security group rules not being applied, hosts showing in the "avoid set", template preparation errors, and exceptions in the logs. It emphasizes analyzing logs at the management server, hypervisor, and job levels to find the root cause of failures.
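Job-level log analysis can be illustrated with a small filter. The sketch below is a minimal example. It assumes that management server log lines mention the async job they belong to as a `job-<id>` token (for example `job-123`). The function names `job_trace` and `first_exception` are hypothetical helpers, not part of CloudStack.

```python
import re
from typing import Iterable, Optional


def job_trace(lines: Iterable[str], job_id: int) -> list[str]:
    """Return the log lines that mention async job `job_id` as a `job-<id>` token."""
    # Word boundaries stop job-12 from also matching job-123.
    pattern = re.compile(rf"\bjob-{job_id}\b")
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def first_exception(lines: Iterable[str]) -> Optional[str]:
    """Return the first line mentioning an exception, often a good place to start reading."""
    for line in lines:
        if "Exception" in line:
            return line
    return None
```

On a typical package-based install, the management server log is `/var/log/cloudstack/management/management-server.log`; check the path on your own system. You can pass an open file handle for that log directly to `job_trace`, because it accepts any iterable of lines.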
CloudStack - Top 5 Technical Issues and Troubleshooting (ShapeBlue)
CloudStack is a mature product used by companies worldwide. In more than five years of involvement in CloudStack development, Abhi has come across technical issues that occasionally affect CloudStack deployments. This presentation brings together the top five such issues and analyzes their symptoms. It examines each issue from the perspective of CloudStack's architecture and the distributed nature of cloud orchestration. It then looks at ways to avoid these issues and to troubleshoot them when they occur.
CloudStack supports various lifecycle operations for a Virtual Machine (VM) and maintains internal queues to synchronize and perform these operations. This talk explains how CloudStack maintains job queues to execute VM operations, followed by a demo.
Suresh Anaparti is a software architect at ShapeBlue, the largest independent integrator of CloudStack technologies globally. He has over 15 years of end-to-end product development experience in cloud infrastructure, telecom and geospatial technologies. He is an active Apache CloudStack committer and contributor and has been working on CloudStack development for more than 5 years.
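Per-VM job queues can be sketched as follows. This is a simplified, single-threaded illustration. It assumes that operations on the same VM must run one at a time in submission order, while different VMs proceed independently. `VmWorkQueues` and its methods are hypothetical names, not CloudStack's actual (Java) classes.

```python
from collections import defaultdict, deque
from typing import Optional, Tuple

Operation = str  # e.g. "start", "stop", "migrate" (illustrative labels only)


class VmWorkQueues:
    """Hypothetical per-VM work queues: one FIFO per VM, at most one running op per VM."""

    def __init__(self) -> None:
        self._queues: dict[str, deque] = defaultdict(deque)
        self._busy: set[str] = set()

    def submit(self, vm_id: str, op: Operation) -> None:
        """Queue an operation behind any earlier operations on the same VM."""
        self._queues[vm_id].append(op)

    def next_runnable(self) -> Optional[Tuple[str, Operation]]:
        """Return the queue head of the first idle VM (in order of first submission) and mark it busy."""
        for vm_id, queue in self._queues.items():
            if queue and vm_id not in self._busy:
                self._busy.add(vm_id)
                return vm_id, queue.popleft()
        return None

    def complete(self, vm_id: str) -> None:
        """Release the VM so its next queued operation can be scheduled."""
        self._busy.discard(vm_id)
```

With this design, a "stop" submitted while a "start" is still running on the same VM waits for the start to complete. Operations on other VMs are not blocked.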
-----------------------------------------
The CloudStack European User Group 2022 took place on 7th April. The day saw a virtual get-together of the European CloudStack community, hosting 265 attendees from 25 countries. The event hosted 10 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, including technical talks, user stories, presentations of new features and integrations, and more.
------------------------------------------
About CloudStack: https://cloudstack.apache.org/
Paul Angus – Backup & Recovery in CloudStack (ShapeBlue)
CloudStack users have long had to resort to using snapshots as a form of VM backup, with varying success. In this talk, Paul explains the capabilities of the forthcoming backup and recovery feature.
IBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery (Rob Convery)
This covers the various aspects of configuring IBM Integration Bus when implementing a highly available system and a comprehensive disaster recovery plan.
4.17.0 is the latest Apache CloudStack major release. In this talk, Nicolas goes through the new features introduced in this version from an administrator and user perspective. He explains their benefits and the problems those features resolve, and runs a live demo showing the new features in action.
Nicolas Vazquez is a Senior Software Engineer at ShapeBlue and a PMC member of the Apache CloudStack project. He spends his time designing and implementing features in Apache CloudStack and also acts as a release manager. Nicolas is based in Uruguay and is the father of a young girl. He is a sports fan and enjoys playing tennis and football. In his free time, he also enjoys reading and listening to material on economics and politics.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria, and virtually. The event was a hybrid get-together of the global CloudStack community, hosting 370 attendees. It featured 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, including technical talks, user stories, presentations of new features and integrations, and more.
In this session, Lucian talks about monitoring CloudStack and its related components: what the best practices are, and what you need to track closely to ensure your cloud's reliability.
Lucian is a long-time sysadmin and Apache CloudStack user and contributor. He has a background in hosting, virtualisation and datacentre operations, but now works full time on CloudStack.
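Host state is one example of what to track, and you can read it from the output of the CloudStack `listHosts` API. The sketch assumes a JSON response shaped like `{"listhostsresponse": {"host": [...]}}`, with per-host `name`, `state` and `resourcestate` fields. It leaves out fetching and signing the API request. The function name is hypothetical.

```python
def hosts_needing_attention(response: dict) -> list[str]:
    """Return names of hosts that are not Up, or not Enabled for allocation.

    Assumes the listHosts JSON shape described above; adjust the field names
    if your CloudStack version differs.
    """
    hosts = response.get("listhostsresponse", {}).get("host", [])
    flagged = []
    for host in hosts:
        # "state" reflects agent connectivity (e.g. Up, Down, Disconnected, Alert).
        not_up = host.get("state") != "Up"
        # "resourcestate" reflects the admin state (e.g. Enabled, Disabled, Maintenance).
        not_enabled = host.get("resourcestate", "Enabled") != "Enabled"
        if not_up or not_enabled:
            flagged.append(host.get("name", "<unnamed>"))
    return flagged
```

A check like this fits naturally into an existing monitoring system as a periodic probe that alerts whenever the returned list is non-empty.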
Backroll: Production Grade KVM Backup Solution Integrated in CloudStack (ShapeBlue)
Backroll is not only a production-grade KVM backup solution; it is also being integrated into Apache CloudStack using the backup and restore framework. Pierre and Quentin show how it works, its feature list, and how the integration has been done.
Quentin is in charge of DIMSI custom developments on Apache CloudStack deployments: customer portal and backup solution. On a daily basis, he helps our customers and our developers use and embrace DevOps methodology by building CI/CD pipelines (GitLab, Azure DevOps), dockerizing apps and automating as much as possible. When not DevOps'ing, Quentin loves binge-watching series and movies and playing with his cat "Boogie", and is a crazy fan of street food.
Grégoire is a software architect who spends most of his time designing infrastructure applications and CRM systems, on-premise or multi-cloud based. He has been using Apache CloudStack for many years and likes to keep knowledge and data outside black boxes. A father of four children, he can be found in Southern Brittany, sailing a Hobie Cat or supporting Lorient football club at the Moustoir stadium.
Pierre is in charge of Backroll integration inside CloudStack. Pierre has a proven track record of successful C# and Java projects. When not playing with his keyboard, Pierre is surfing, wing foiling or bodyboarding on the Brittany coast.
KMIP stands for Key Management Interoperability Protocol. It provides a simple binary protocol using TTLV (Tag-Type-Length-Value) encoding to manage the lifecycle of cryptographic keys for enterprise needs, such as enterprise applications and data encryption.
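TTLV encoding can be sketched in a few lines. This follows the TTLV layout in the KMIP specification: a 3-byte tag, a 1-byte type, and a 4-byte big-endian length. The value follows, zero-padded to a multiple of 8 bytes. Only two item types are covered: Integer (0x02) and Text String (0x07). The tag values in the usage example are arbitrary placeholders, not real KMIP tags.

```python
import struct

# Item type codes from the KMIP TTLV encoding (only two shown here).
TYPE_INTEGER = 0x02
TYPE_TEXT_STRING = 0x07


def _pad8(value: bytes) -> bytes:
    """Zero-pad a value to the next multiple of 8 bytes."""
    return value + b"\x00" * (-len(value) % 8)


def encode_ttlv(tag: int, item_type: int, value: bytes) -> bytes:
    """Encode one item: 3-byte tag, 1-byte type, 4-byte big-endian length, padded value."""
    header = tag.to_bytes(3, "big") + bytes([item_type]) + struct.pack(">I", len(value))
    return header + _pad8(value)


def encode_integer(tag: int, n: int) -> bytes:
    # KMIP Integers are 4-byte signed big-endian values, padded to 8 bytes.
    return encode_ttlv(tag, TYPE_INTEGER, struct.pack(">i", n))


def encode_text(tag: int, text: str) -> bytes:
    # Text Strings are UTF-8; the length field counts the unpadded bytes.
    return encode_ttlv(tag, TYPE_TEXT_STRING, text.encode("utf-8"))


def decode_ttlv(buf: bytes) -> tuple[int, int, bytes, int]:
    """Decode the first item in buf; return (tag, type, value, bytes consumed)."""
    tag = int.from_bytes(buf[0:3], "big")
    item_type = buf[3]
    (length,) = struct.unpack(">I", buf[4:8])
    value = buf[8:8 + length]
    consumed = 8 + length + (-length % 8)
    return tag, item_type, value, consumed
```

For example, `encode_text(0x540001, "abc")` yields 16 bytes: an 8-byte header plus "abc" padded to 8 bytes.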
Checking in your deployment configuration as code
Helm is a tool that streamlines the creation, deployment and management of Kubernetes-native applications. In this talk, we look at how Helm enables you to manage your deployment configurations as code, and demonstrate how it can be used to power your continuous integration and continuous delivery (CI/CD) pipeline.
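Treating configuration as code means a CI job can rebuild the same Helm invocation from values files kept in version control. The sketch below only assembles and runs the command line. `helm upgrade --install`, `--namespace` and repeated `-f` flags are standard Helm CLI options. The release, chart and file names are hypothetical.

```python
import subprocess


def helm_upgrade_cmd(release: str, chart: str, values_files: list[str], namespace: str) -> list[str]:
    """Build a `helm upgrade --install` command from version-controlled values files.

    When -f is given several times, Helm gives priority to the right-most file,
    so list shared defaults first and environment-specific overrides last.
    """
    cmd = ["helm", "upgrade", "--install", release, chart, "--namespace", namespace]
    for path in values_files:
        cmd += ["-f", path]
    return cmd


def deploy(cmd: list[str]) -> None:
    # check=True makes the CI step fail if helm exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

Because `upgrade --install` creates the release if it does not exist and upgrades it otherwise, the same command works for both the first and subsequent deployments.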
Dimsi have developed a backup solution for virtual machines running on KVM hypervisors. Every layer of the product uses open-source libraries or components (Python, Vue.js, Celery, BorgBackup, Redis, Socket.IO, Flask), and no agent is needed on the VMs. Dimsi have implemented a feature to group hosts by their use (CloudStack hosts or management hosts) and apply specific policies to each group. In the CloudStack context, this product can help you easily back up and restore all your VMs, provided the hypervisors are KVM-based. Moreover, restoring VMs is effortless because KVM and CloudStack use the same ID for the VM disks, so there is no need to hack the database to match them.
Quentin Roccia: Senior DevOps Engineer, Cloud Enabler
Quentin is in charge of DIMSI custom developments on top of Apache CloudStack deployments: customer portal and backup solutions.
On a daily basis, he helps our customers build and improve their DevOps strategy, including GitLab, CloudStack APIs and Python development.
Quentin is the main contributor to the KVM backup solution.
Joffrey Luangsaysana: Senior Cloud Engineer, Platform Specialist
Joffrey is responsible for our core platform, including compute, storage, networking and Apache CloudStack services.
He is focused on providing maximum performance and uptime to our customers, and is dedicated to guaranteeing fast and reliable backups of customer VMs.
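The host-grouping feature in the Dimsi abstract can be pictured as a simple mapping from group to backup policy. The sketch below is a hypothetical illustration, not Backroll's code. The group keys, the `BackupPolicy` fields, and the schedule and retention values are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupPolicy:
    name: str
    schedule: str        # cron-style string, illustrative only
    retention_days: int


# Hypothetical policies for the two host groups named in the abstract.
POLICIES = {
    "cloudstack": BackupPolicy("vm-daily", "0 2 * * *", 14),
    "management": BackupPolicy("mgmt-weekly", "0 3 * * 0", 60),
}


def assign_policies(hosts: list[dict]) -> dict[str, BackupPolicy]:
    """Map each host name to the backup policy of its group."""
    assigned = {}
    for host in hosts:
        group = host["group"]
        if group not in POLICIES:
            raise ValueError(f"no policy defined for group {group!r}")
        assigned[host["name"]] = POLICIES[group]
    return assigned
```

Attaching policies to groups rather than to individual hosts means a newly added host only needs to be placed in the right group to inherit its backup schedule.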
This presentation provides a comprehensive overview of Maven 3, including its lifecycles, with a detailed look at the default lifecycle and its associated phases.
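A key property of a Maven lifecycle is that invoking a phase first runs every earlier phase of that lifecycle. The sketch below models this for the default lifecycle. The phase list follows the Maven build lifecycle reference. The helper name `phases_executed` is hypothetical.

```python
# Phases of Maven's default lifecycle, in execution order.
DEFAULT_LIFECYCLE = [
    "validate", "initialize",
    "generate-sources", "process-sources",
    "generate-resources", "process-resources",
    "compile", "process-classes",
    "generate-test-sources", "process-test-sources",
    "generate-test-resources", "process-test-resources",
    "test-compile", "process-test-classes", "test",
    "prepare-package", "package",
    "pre-integration-test", "integration-test", "post-integration-test",
    "verify", "install", "deploy",
]


def phases_executed(target: str) -> list[str]:
    """Return the phases Maven runs for `mvn <target>`: all earlier phases, then the target."""
    if target not in DEFAULT_LIFECYCLE:
        raise ValueError(f"{target!r} is not a default-lifecycle phase")
    return DEFAULT_LIFECYCLE[: DEFAULT_LIFECYCLE.index(target) + 1]
```

`mvn package`, for instance, runs `compile` and `test` before packaging. A failing unit test therefore stops the build before an artifact is produced, unless tests are explicitly skipped.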
Dev and test environments require frequent, repeatable deployments of CloudStack, which can be time-consuming and error-prone. In this presentation, Kaloyan shows how StorPool uses Ansible to automatically deploy and configure complete CloudStack clouds.
Kaloyan Kotlarski is a system administrator in StorPool's support team. He has been with the company for two years and is responsible for building CI/CD automation and helping clients integrate StorPool Storage into their cloud deployments.
As Atlassian Connect is the way forward for building add-ons on Atlassian Cloud, Spring Boot is the way forward for building Spring web applications. Now you can combine the best of both worlds with the new open source library: Atlassian Connect Starter for Spring Boot. This will get you bootstrapped with an Atlassian Connect add-on in just a few minutes. In this talk you will learn:
What is Spring Boot
What is a Spring Boot Starter and how do starters benefit you
How to use the Atlassian Connect Starter to easily build Atlassian Connect add-ons
The Atlassian Connect architecture and how it interacts with your add-ons
We will write a simple macro for Confluence and show how much time Spring Boot can save you.
Building a redundant CloudStack management cluster - Vladimir Melnik (ShapeBlue)
Building and maintaining an open-source-driven clustered environment for the Apache CloudStack management server with GNU/Linux, HAProxy, Heartbeat, BIND, OpenLDAP and other tools.
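In a redundant management cluster, a load balancer such as HAProxy sends API and UI traffic only to management servers that pass a health check. The sketch below illustrates that behaviour (round-robin with health checks) in plain Python. It is not HAProxy configuration. The server names and the `is_healthy` callback are hypothetical.

```python
import itertools
from typing import Callable, Iterable


class HealthCheckedPool:
    """Round-robin selection over backends, skipping those that fail a health check."""

    def __init__(self, backends: Iterable[str], is_healthy: Callable[[str], bool]) -> None:
        self._backends = list(backends)
        self._is_healthy = is_healthy
        self._cycle = itertools.cycle(self._backends)

    def pick(self) -> str:
        """Return the next healthy backend, trying each backend at most once per call."""
        for _ in range(len(self._backends)):
            backend = next(self._cycle)
            if self._is_healthy(backend):
                return backend
        raise RuntimeError("no healthy management server available")
```

In practice, HAProxy runs these checks itself. The sketch only shows the selection logic that keeps a failed management server out of rotation.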
Building a serverless company on AWS Lambda and the Serverless Framework (Luciano Mammino)
Planet9energy.com is a new electricity company building a sophisticated analytics and energy trading platform for the UK market. From the earliest draft of the platform, we took the unconventional decision to go serverless and build the product on top of AWS Lambda and the Serverless Framework using Node.js. In this talk, I want to discuss why we made this radical decision, the pros and cons of this approach, and the main issues we faced as a tech team in our design and development experience. We will discuss how everyday concerns such as testing and deployment need to be rethought to work in a serverless fashion. We will also cover the benefits of (almost) infinite self-scalability and the peace of mind of not having to manage hundreds of servers. Finally, we will highlight how Node.js seems to fit naturally into this scenario and makes developing serverless applications extremely convenient.
Technologies:
Backend
Frontend
Application architecture
JavaScript
Cloud computing
Introduction To Managing VMware With PowerShell (Hal Rottenberg)
An introduction to the VI Toolkit, which is available at http://vmware.com/go/powershell. A companion to my book, available at http://sapienpress.com/vmware.asp
Vladimir Melnik is from Tucha Cloud Services in Ukraine, another company running IaaS services on Apache CloudStack. Vladimir is the original author and maintainer of Monkeyman, a Perl 5 framework for Apache CloudStack automation.
Are your Oracle databases highly available? You have deployed Real Application Clusters (RAC), Data Guard or Failover Clusters and are well protected against server failures? Great: the prerequisites for a highly available environment are in place. However, to ensure that backend infrastructure failures also remain transparent to the client, an appropriate configuration is required.
This lecture discusses the Oracle technologies that can be used to achieve automatic client failover. What are their advantages, and what are their limitations?
VMworld 2013: vCloud Powered HPC is Better and Outperforming Physical (VMworld)
VMworld Europe 2013
Theo van Drimmelen, Bitbrains IT Services
Willem van Engeland, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
VMworld 2013: VMware Horizon View Troubleshooting: Looking under the Hood (VMworld)
VMworld 2013
Matt Coppinger, VMware
Jack McMichaels, VMware
New Jersey Red Hat Users Group Presentation: Provisioning anywhere (Rodrique Heron)
This presentation is from the October 10, 2017, Red Hat Users Group meeting. Please check us out on meetup.com.
https://www.meetup.com/NorthernNJRHUG
Tools like Docker and Ansible enable new capabilities and speed, and this session will help you and your organization to put it all in context and be more successful and collaborative than ever before.
This session will provide both practical advice to improve your organization's provisioning process, as well as discuss best practices to achieve the much sought-after "push button infrastructure" across multi-cloud environments.
Provisioning means more than simply deploying VMs (or cloud instances) and participants will leave this session with a fresh understanding of the various aspects that go into providing a reliable, flexible and portable platform to their businesses' workloads.
Our Speaker: Andre Pitanga, Red Hat Solutions Architect
Andre is at heart just a chill and optimistic guy. He's delivered agile infrastructure projects with some of the world's biggest banks, financial analytics and media companies, but he swears he didn't break anything. When not reviewing or writing Ansible playbooks, he can be found working shoulder-to-shoulder with his awesome clients to build better platforms the open source way.
The thredUP team shares key learnings from post-migration processes: which technologies and solutions worked best for us, and where we spent time troubleshooting and improving. In particular, we focus on the development and staging experience, user authentication, cloud-native CI pipelines, application telemetry and service mesh. We also share our experience with Kubernetes security hardening and autoscaling, and describe how a new service is created within our infrastructure.
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag... (Timofey Turenko)
This presentation describes the CI environment for our product, MaxScale, a database proxy server. Testing such a product requires a setup of tens of machines: locally hosted virtual machines as well as machines from different clouds. All our Jenkins jobs are implemented as Jenkins Job Builder code. The presentation also covers MDBCI, our tool for managing virtual machines (a wrapper over Vagrant).
Presentation at the March 2019 Dutch Postgres User Group meetup on lessons learnt while migrating from Oracle to Postgres, demonstrated with Vagrant test environments and generic pgbench datasets.
The Future of SDN in CloudStack by Chiradeep Vittal (buildacloud)
The core of CloudStack networking has always been software-defined. As the networking industry evolves to a software-defined future, CloudStack will have to evolve with it.
The presentation will examine the present state of SDN in CloudStack, look at some industry directions and attempt to predict the evolution of CloudStack with those trends.
Bio
Chiradeep Vittal is a Distinguished Engineer in the Converged Infrastructure Group at Citrix where he has technology leadership responsibilities around Citrix Cloud Platform, Citrix Lifecycle Manager and Citrix Workspace Pod. He is also a Project Management Committee member of the Apache CloudStack Project. At cloud.com (acquired by Citrix), he was a founding engineer, often tasked with the thorny details of virtualized networking and storage. Prior to cloud.com, he worked at several Silicon Valley startups in various architectural roles.
Chiradeep has a B.Tech in Computer Science from IIT Bombay and an M.Sc from the University of Alberta. He has spoken at several conferences, including CloudStack Collab, LISA, OSCON, ONS, SDN Summit and LinuxCon. His Twitter handle is @chiradeep, and he occasionally blogs at http://cloudierthanthou.wordpress.com
Policy Based SDN Solution for DC and Branch Office by Suresh Boddapati (buildacloud)
In this talk, Suresh discusses how the Nuage Networks Virtualized Services Platform (VSP) helps overcome the challenges that cloud service providers and large enterprises face in delivering and managing large multi-tenant clouds. He explains how Nuage Networks delivers a massively scalable SDN solution that ensures datacenters and wide area networks can respond instantly to demand and are boundary-less. The talk also provides an overview of the SDN capabilities that Nuage VSP adds to CloudStack.
Bio
Suresh is the VP of Engineering at Nuage Networks. He has over 19 years of experience in software development, building great teams and delivering high-quality software. As the first engineer at Nuage Networks, Suresh played a key role in shaping the architecture of the Nuage Virtualized Services Platform (VSP). His experience includes extensive protocol development: he developed IP routing and multicast protocols from scratch and deployed them in large ISPs. Suresh was part of the original TiMetra team before joining Alcatel-Lucent as a Principal Engineer. He then became Director of Engineering at Juniper, where he worked on the QFabric product. Earlier in his career, Suresh worked in software engineering at Shasta Networks (acquired by Nortel) and Fore Systems (acquired by Marconi, then Ericsson).
L4-L7 services for SDN and NVF by Youcef Laribibuildacloud
In this talk, we will discuss how L4-L7 devices can integrate in various SDN architectures, discuss benefits and some of the challenges that such integration represents. We will also talk about how SDN and NFV relate, and what are the different challenges to successfully deploy L4-L7 devices as Virtual Network Functions (VNFs) or provide such services to the NFV Infrastructure (VIM).
Bio
Youcef Laribi is a Principal Architect in the Delivery Networks BU at Citrix. He is responsible for driving the integration projects of the NetScaler ADC product with several Cloud, SDN and Automation environments including OpenStack, CloudStack, VMware NSX and Cisco ACI. He is also the Citrix representative on the OpenDaylight Technical Steering Committee. His background is mainly in Operating Systems and Distributed Systems, and he worked on several middleware technologies from DCE and CORBA in the early days, to J2EE and .NET to SOA and micro-services today. Youcef speaks 4 languages and holds a PhD and an MSc in Computer Science from the French INPG Institute in Grenoble, France.
Jenkins, jclouds, CloudStack, and CentOS by David Nalleybuildacloud
Setting up continuous integration for a single project can be a pretty daunting task. Doing that for hundreds of projects becomes a challenge of a different magnitude. Not only are their capacity problems, but some tests are destructive to the testing environment, some have esoteric environment demands. See how this is solved in the real world using Jenkins, jclouds, CloudStack to build an on-demand build infrastructure.
About David Nalley
David Nalley is the Vice President, Infrastructure at the Apache Software Foundation and a CloudStack PMC member.
This session will introduce monitoring CloudStack with Zenoss, and the CloudStack ZenPack. I will cover in detail what you get out of monitoring CloudStack with Zenoss. Additionally I will cover installation of Zenoss, interacting with our community and Q&A.
About Andrew Kirch
Andrew D Kirch is the Community Manager at Zenoss, a software development company specializing in Unified Monitoring with 130 employees, headquartered in Austin, Texas. The company offers an open source network and systems monitoring product called Zenoss Core, and a commercial product called Zenoss Service Dynamics. The company has over 35,000 users in over 180 countries. Customers include major organizations such as Chic-fil-a, Huntington Bank, Netflix, SunGard, Accenture, NASA, FIS Global, and many more.
As Community Manager, Andrew works directly with product users every day. He has over 10 years of experience as a Systems/Network Administrator, with specialization including SNMP and network monitoring. Prior to working at Zenoss he was principal at a unified communications VAR focused in the Midwest. In his spare time he puts computer crackers in prison.
Guaranteeing Storage Performance by Mike Tutkowskibuildacloud
This session will introduce the basics of primary storage in CloudStack. Additionally, I discuss the challenges of guaranteeing storage performance in a cloud and how by leveraging the latest enhancements to CloudStack, storage administrators can deliver consistent, repeatable performance to 10s, 100s or 1,000s of application workloads in parallel. I'll review the CloudStack enhancements in detail, outline the management benefits they provide and discuss common go-to-market approaches.
About Mike Tutkowski
Mike Tutkowski, a member of the CloudStack PMC, develops software for the Apache Software Foundation's CloudStack project to help drive improvements in its storage component and to integrate SolidFire more deeply into the product.
Cloud Application Blueprints with Apache Brooklyn by Alex Henevaldbuildacloud
So you have your cloud running, what now? Extend the devops agility from infrastructure to applications by learning how to use Brooklyn, the Apache-incubating project for application management. Create blueprints for applications to enable one-click deployment into Cloudstack, Docker, localhost, or other targets. Leverage your favourite server management tools, from Bash to Chef. Automatically change the deployment after it's deployed. Attach policies to support scaling, failover, and alerting in the way your application needs.
In this session we'll show how with just a few lines of YAML, you can build powerful application blueprints by composing pre-existing components, from polyglot web stacks to big data tools such as Riak. We'll also cover defining new blueprints using custom scripts, configuring machine selection and runtime policies, and managing new locations such as Clocker -- the cloud of docker.
About Alex Henevald
Alex brings twenty years experience designing software solutions in the enterprise, start-up, and academic sectors. Most recently Alex was with Enigmatec Corporation where he led the development of what is now the Monterey® Middleware Platform™. Previous to that, he founded PocketWatch Systems, commercialising results from his doctoral research. Alex holds a PhD (Informatics) and an MSc (Cognitive Science) from the University of Edinburgh and an AB (Mathematics) from Princeton University. Alex was both a USA Today Academic All-Star and a Marshall Scholar.
Introduction to Apache CloudStack by David Nalleybuildacloud
Apache CloudStack is a mature, easy to deploy IaaS platform. That doesn't mean that it can be done without thought or preparation. Learn how CloudStack can be most efficiently deployed, and the problems to avoid in the process.
About David Nalley
David is a recovering sysadmin with a decade of experience. He’s a committer on the Apache CloudStack (incubating) project, a contributor to the Fedora Project and the Vice President of Infrastructure at the Apache Software Foundation.
Monitoring CloudStack in context with Converged Infrastructure by Mike Turnlundbuildacloud
CloudStack is a powerful, flexible technology that greatly expands the economic potential for a datacenter. Performance management of CloudStack in context with the rest of the datacenter is critical for quick fault diagnostics, proactive management of bottlenecks and quickly bringing up or tearing down services. Learn how proper tooling can make the difference in running an excellent service versus a problem plagued environment.
Mike is a 25+ year technology veteran with past roles in software engineering, product development, planning, and operations at CA Technologies, Cisco, and AMD. He currently leads a business development team at CA Technologies driving their partnerships in virtualized infrastructure and converged compute environments. Mike is based in Santa Clara, California. His time outside of work is spent with wife and four children, biking, and running triathlons. He has bachelors and masters degrees from the University of California, Santa Barbara.
As you go into the cloud, the applications you are building will often be built on service-oriented architectures that communicate through RESTful APIs. Where API design and development used to be an uncommon thing, today it has become a basic application requirement. George Reese will cover the basic considerations in designing and implementing an API for your applications.
George Reese is the author of a number of technology books and a regular speaker on RESTful APIs, cloud computing, Java, and database systems. His most recent books are The REST API Design Handbook and O’Reilly’s Cloud Application Architectures. Professionally, he is the Executive Director of Cloud Computing at Dell as a result of Dell's recent acquisition of Enstratius, a company George co-founded. George has also led a number of Open Source projects, including several MUD libraries and the Imaginary Home home automation libraries for Java. He is also the primary maintainer of Dasein Cloud, a cloud abstraction API for Java.
George holds a BA from Bates College in Maine and an MBA from the Kellogg School of Management at Northwestern University.
Enterprise grade firewall and ssl termination to ac by will stevensbuildacloud
CloudOps has add support for enterprise grade security products in ACS. CloudOps has developed an integration with the Palo Alto Networks firewall appliance to enable ACS to orchestrate network features such as network creation, Source NAT, Static NAT, Port Forwarding and Firewall rules on the Palo Alto device. Additionally, CloudOps has extended ACS to support SSL certificate management as well as SSL termination by external load balancers. The existing ACS NetScaler plugin has been improved to support this new SSL termination functionality. The talk will cover the features added as well as a basic overview of how they are used.
Will Stevens is the Lead Developer at CloudOps. He has been directly involved in extending ACS to support more enterprise grade security functionality. Will has over 10 years experience as a software developer and is primarily focused on cloud integrations at CloudOps.
Securing Your Cloud With the Xen Hypervisor by Russell Pavlicekbuildacloud
The Xen Project produces a mature, enterprise-grade virtualization technology designed for the Cloud featuring many advanced and unique security features. For this reason, it's a hypervisor of choice for government agencies like NSA and the DoD, as well as for new security-minded projects the QubesOS Secure Desktop. However, while much of the security of Xen is inherent in its design, many of the advanced security features, such as stub domains, driver domains, and Xen Security Modules (XSM), are not enabled by default. This session will describe many of the advanced security features of Xen, as well as explaining why Xen is an excellent choice for secure Clouds
DevCloud - Setup and Demo on Apache CloudStack buildacloud
Hands-on Hacking Session by Amogh Vasekar
1. Demo of CloudStack using DevCloud
2. How we got there -
A) Building CloudStack from scratch
B) Deploying databases
C) Configuring your own DevCloud using Marvin
Cloud Network Virtualization with Juniper Contrailbuildacloud
Description: Contrail Technology will be discussed covering architecture, capabilities and use cases. It will be followed by a demonstration on current Contrail implementation on CloudStack/Openstack.
Parantap works as a Sr. Director of Solutions Engineering for Contrail Product within Juniper. Before Juniper, Parantap led the network architecture team for Microsoft Online Services (Windows Azure, MS Bing). Prior to Microsoft, Parantap worked as a core engineering manager for UUNet Technologies building Internet backbones.
9. Templates
● eth0, or is it eth1? Or maybe p192p1?
● “sysprep” for Windows, your own solution for Linux
● Prepare in CloudStack environment?
● Can't “import” them?
13. What to look for?
● Warnings and errors and exceptions, oh my!
● WARN, ERROR, Exception, Unable, Failed
● VM name
● Type of task that failed
● Enable TRACE if necessary
● The dreaded “avoid set”
● Error text from UI/API
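A minimal Python sketch of that first pass over the log, assuming a plain-text management-server.log with one entry per line. `interesting_lines` is a hypothetical helper, not part of CloudStack: it keeps only lines containing the keywords above and can optionally narrow them to one VM name.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Keywords from the checklist above; "Exception" also catches stack-trace headers.
KEYWORDS = re.compile(r"WARN|ERROR|Exception|Unable|Failed")


def interesting_lines(
    lines: Iterable[str], vm_name: Optional[str] = None
) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for log lines worth reading first.

    If vm_name is given (for example "i-2-1607-VM"), only lines that mention
    that VM are kept.
    """
    for number, line in enumerate(lines, start=1):
        if not KEYWORDS.search(line):
            continue
        if vm_name is not None and vm_name not in line:
            continue
        yield number, line.rstrip("\n")
```

Pass it an open log file, e.g. `interesting_lines(log_file, "i-2-1607-VM")`, and print the results; the line numbers make it easy to jump back into the full log and scroll up from there.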
14. Jobs and Sequences
● Job ID
  – Visible at API level
  – ID versus UUID
● Sequence ID
  – Subordinate to Jobs
  – Sent to hosts or management servers
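Job and sequence identifiers can be pulled out of log lines with two regular expressions. This sketch assumes the thread-tag format `(Job-Executor-4:job-17638)` and the agent sequence format `Seq 57-57802800` that appear in the log excerpts later in this deck, and it reads the first number of a sequence as the host ID (those excerpts come from host 57 and show `via: 57`). `index_log` is a hypothetical helper. Because `job-NNN` in the log is the numeric job ID, a UUID returned by the API has to be translated to that ID before searching.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Thread tags such as "(Job-Executor-4:job-17638)" carry the numeric job ID.
JOB_RE = re.compile(r"\bjob-(\d+)")
# Agent traffic such as "Seq 57-57802800"; assumed to be "<host ID>-<sequence number>".
SEQ_RE = re.compile(r"\bSeq (\d+)-(\d+)")


def index_log(
    lines: Iterable[str],
) -> Tuple[Dict[int, List[str]], Dict[Tuple[int, int], List[str]]]:
    """Group log lines by job ID and by (host ID, sequence number)."""
    by_job: Dict[int, List[str]] = defaultdict(list)
    by_seq: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for line in lines:
        job = JOB_RE.search(line)
        if job:
            by_job[int(job.group(1))].append(line)
        seq = SEQ_RE.search(line)
        if seq:
            by_seq[(int(seq.group(1)), int(seq.group(2)))].append(line)
    return by_job, by_seq
```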
15. What to do?
● Depends on the errors
● Check capacity
● Check network
● Keep waiting
● Hack the database and retry
17. UI Error
● Start with an error from the UI
Find the related log entry (at the end of the job).
2013-02-25 16:39:40,567 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-1:job-318) Complete async job-318, jobStatus: 2, resultCode: 530, result: Error Code: 533 Error text: Unable to create a deployment for VM[User|holybigvm]
The actual error, logged earlier in the same job:
2013-02-25 16:39:40,459 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-1:job-318) No clusters found having a host with enough capacity, returning.
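To go from UI errors to job IDs in bulk, the "Complete async job" entries can be scanned directly. A minimal sketch; `failed_jobs` is a hypothetical helper, and it assumes (based on the `resultCode: 530` above) that a failed job is one whose completion entry carries a non-zero resultCode.

```python
import re
from typing import Dict, Iterable

# Matches the job-completion entry shown above.
COMPLETE_RE = re.compile(
    r"Complete async job-(\d+), jobStatus: (\d+), resultCode: (\d+), result: (.*)"
)


def failed_jobs(lines: Iterable[str]) -> Dict[int, str]:
    """Map job ID -> result text for completed jobs with a non-zero resultCode."""
    failures = {}
    for line in lines:
        match = COMPLETE_RE.search(line)
        if match and int(match.group(3)) != 0:
            failures[int(match.group(1))] = match.group(4).strip()
    return failures
```

Each returned ID (318 here) is then the string to search for, reading backwards from the completion entry to find the real cause.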
18. The avoid set...
Hey guys, why is my host in the avoid set, and how do I remove it?
2012-05-14 16:04:54,772 DEBUG [allocator.impl.FirstFitAllocator] (Job-Executor-4:job-17638 FirstFitRoutingAllocator) Host name: somehost.example.local, hostId: 5 is in avoid set, skipping this and trying other available hosts
19. ...means SCROLL UP
The real error will always be earlier in the job.
2012-05-14 16:04:54,735 ERROR [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-4:job-17638) Unable to set dhcp entry for VM[User|i-2-1607-VM] on domR: r-16-VM due to
2012-05-14 16:04:54,735 INFO [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-4:job-17638) Unable to contact resource.
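The "scroll up" rule can be automated: for every avoid-set entry, collect the earlier problem lines that belong to the same job. A minimal sketch, assuming one log entry per line and the `:job-NNN` thread tag shown above; `explain_avoid_set` is a hypothetical helper that keeps the whole log in memory, which is fine for an excerpt but not for a multi-gigabyte file.

```python
import re
from typing import Iterable, List, Tuple

JOB_RE = re.compile(r":job-(\d+)")
AVOID_RE = re.compile(r"Host name: ([^,]+), hostId: (\d+) is in avoid set")
PROBLEM_RE = re.compile(r"\b(?:WARN|ERROR)\b|Unable|Exception|Failed")


def explain_avoid_set(lines: Iterable[str]) -> List[Tuple[str, int, List[str]]]:
    """Return (host name, host ID, earlier problem lines) per avoid-set entry.

    The earlier lines come from the same job, because the real cause is
    logged before the host is put in the avoid set.
    """
    history = []  # (job ID or None, line) for every line read so far
    report = []
    for line in lines:
        job = JOB_RE.search(line)
        job_id = job.group(1) if job else None
        avoid = AVOID_RE.search(line)
        if avoid and job_id is not None:
            earlier = [old for old_job, old in history
                       if old_job == job_id and PROBLEM_RE.search(old)]
            report.append((avoid.group(1), int(avoid.group(2)), earlier))
        history.append((job_id, line))
    return report
```

For the excerpts above, host 5 is reported together with the two job-17638 entries about the DHCP entry on r-16-VM, which point at the virtual router rather than at the host itself.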
21. Hypervisor Errors (KVM)
management-server.log
2013-01-08 13:47:17,256 WARN [cloud.vm.VirtualMachineManagerImpl] (AgentManager-Handler-32:null) Cleanup failed due to Exception: org.libvirt.LibvirtException
Message: internal error '/bin/umount /mnt/dd7d684f-255f-3a24-87ab-512168a207b5' exited with non-zero status 16 and signal 0: umount.nfs: /mnt/dd7d684f-255f-3a24-87ab-512168a207b5: device is busy
umount.nfs: /mnt/dd7d684f-255f-3a24-87ab-512168a207b5: device is busy
/var/log/messages
2012-09-19 12:14:37,002 WARN [resource.computing.LibvirtComputingResource] (Agent-Handler-5:null) Exception
org.libvirt.LibvirtException: internal error '/bin/umount /mnt/513d2d1f-38a7-3e3b-b2c5-ea3f4e0db6ba' exited with non-zero status 16 and signal 0: umount.nfs: /mnt/513d2d1f-38a7-3e3b-b2c5-ea3f4e0db6ba: device is busy
umount.nfs: /mnt/513d2d1f-38a7-3e3b-b2c5-ea3f4e0db6ba: device is busy
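Both excerpts boil down to an NFS mount under /mnt/<uuid> that could not be unmounted. On KVM that directory name matches the primary storage pool UUID (compare the `path` and `storagePoolUuid` fields in the DestroyCommand later in this deck). A minimal sketch that lists the affected mount points so they can be inspected on the host, for example with `fuser -vm <mountpoint>` or `lsof`; `busy_mounts` is a hypothetical helper and the regex assumes the exact `umount.nfs` wording above.

```python
import re
from typing import List

# "umount.nfs: /mnt/<uuid>: device is busy", as in both excerpts above.
BUSY_RE = re.compile(r"umount\.nfs: (/mnt/[0-9a-f-]+): device is busy")


def busy_mounts(text: str) -> List[str]:
    """Return distinct NFS mount points reported busy, in order of first appearance."""
    found: List[str] = []
    for mount in BUSY_RE.findall(text):
        if mount not in found:
            found.append(mount)
    return found
```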
22. Hypervisor Errors (vSphere)
2013-01-16 10:12:28,313 ERROR [vmware.resource.VmwareResource] (DirectAgent-161:esx102.example.com) CreateCommand failed due to Exception: java.io.FileNotFoundException
Message: https://stor1.example.com/folder/39b7d77f7dd547f695172b07882daec9/ff7d2b46103e4595a1b422990a0799e2.vmdk?dcPath=Petaluma&dsName=aaf29b8e-b548-3fad-8c59-0dad849b2704
java.io.FileNotFoundException: https://stor1.example.com/folder/39b7d77f7dd547f695172b07882daec9/ff7d2b46103e4595a1b422990a0799e2.vmdk?dcPath=Petaluma&dsName=aaf29b8e-b548-3fad-8c59-0dad849b2704
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1401)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
<snip>
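The missing file in that exception is addressed through vSphere's datastore file URL, which encodes the datacenter (`dcPath`) and datastore (`dsName`) in its query string. A small sketch using only `urllib.parse` to split such a URL, so you know which datastore and file to look for in vSphere; `describe_vmdk_url` is a hypothetical helper.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, unquote, urlsplit


def describe_vmdk_url(url: str) -> Dict[str, Optional[str]]:
    """Split a datastore file URL from a VmwareResource error into its parts."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "file": unquote(parts.path.rsplit("/", 1)[-1]),
        "datacenter": query.get("dcPath", [None])[0],
        "datastore": query.get("dsName", [None])[0],
    }
```

For the URL above this yields datacenter `Petaluma`, datastore `aaf29b8e-b548-3fad-8c59-0dad849b2704` and file `ff7d2b46103e4595a1b422990a0799e2.vmdk`.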
23. Exceptions
2013-02-06 15:36:45,023 ERROR [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-112:job-56957) Failed to start instance VM[User|a9167fdc-7d43-49f1-93cb-6c6902a666fa]
com.cloud.utils.exception.CloudRuntimeException: Unable to acquire lock on VMTemplateStoragePool: 2570
	at com.cloud.template.TemplateManagerImpl.prepareTemplateForCreate(TemplateManagerImpl.java:638)
	at com.cloud.utils.db.DatabaseCallback.intercept(DatabaseCallback.java:30)
	at com.cloud.storage.StorageManagerImpl.createVolume(StorageManagerImpl.java:3418)
	at com.cloud.storage.StorageManagerImpl.prepare(StorageManagerImpl.java:3327)
	at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:749)
	at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:467)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2944)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2616)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2604)
<snip>
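In a long stack trace the useful parts are usually the exception class, its message, and the first `at` frame, which names the method that threw. A minimal sketch that extracts those three from the lines following an ERROR entry; `summarize_exception` is a hypothetical helper and assumes standard Java trace formatting (`Class: message`, then `at method(File.java:line)` frames).

```python
import re
from typing import Iterable, Optional, Tuple

EXCEPTION_RE = re.compile(r"^([\w.$]+(?:Exception|Error)): (.*)$")
FRAME_RE = re.compile(r"^\s*at (\S+)\((\S+?):(\d+)\)")


def summarize_exception(
    lines: Iterable[str],
) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (exception class, message, first stack frame) from a Java stack trace."""
    exception = None
    for line in lines:
        if exception is None:
            match = EXCEPTION_RE.match(line.strip())
            if match:
                exception = (match.group(1), match.group(2))
            continue
        frame = FRAME_RE.match(line)
        if frame:
            location = f"{frame.group(1)} ({frame.group(2)}:{frame.group(3)})"
            return exception[0], exception[1], location
    return (exception[0], exception[1], None) if exception else None
```

Here it points at `TemplateManagerImpl.prepareTemplateForCreate`, i.e. the VM start failed while preparing the template on primary storage, not in the hypervisor.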
25. TRACE Example
● Cannot snapshot a volume
The error shown in the UI wasn't logged at DEBUG level. It was still possible to find the job (from job type and volume ID).
2013-01-15 13:01:43,249 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-9:null) submit async job-36626, details: AsyncJobVO {id:36626, userId: 215, accountId: 180, sessionKey: null, instanceType: Snapshot, instanceId: 19951, cmd: com.cloud.api.commands.CreateSnapshotCmd, cmdOriginator: null, cmdInfo: {"id":"19951","response":"json","sessionkey":"asdf","ctxUserId":"215","tenant":"be04d916-a916-483e-a96c-925e0cb89575","volumeid":"4500","ctxAccountId":"180","ctxStartEventId":"136303","signature":"asdf","apikey":"asdf"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 172136269028748, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
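Finding a job "from job type and volume ID" amounts to matching the command class and the `cmdInfo` parameters of "submit async job" entries. A minimal sketch, assuming the entry shape shown above and that `cmdInfo` is a flat JSON object with string values; `find_jobs` is a hypothetical helper.

```python
import json
import re
from typing import Iterable, List

# Shape of the "submit async job" entry shown above; cmdInfo is a flat JSON object.
SUBMIT_RE = re.compile(
    r"submit async job-(\d+), details: AsyncJobVO \{.*?cmd: ([\w.]+),"
    r".*?cmdInfo: (\{.*?\}), cmdVersion"
)


def find_jobs(lines: Iterable[str], command: str, **params: object) -> List[int]:
    """Return IDs of submitted jobs whose command class ends with `command`
    and whose cmdInfo contains every given parameter (compared as strings)."""
    jobs = []
    for line in lines:
        match = SUBMIT_RE.search(line)
        if not match or not match.group(2).endswith(command):
            continue
        info = json.loads(match.group(3))
        if all(info.get(key) == str(value) for key, value in params.items()):
            jobs.append(int(match.group(1)))
    return jobs
```

`find_jobs(lines, "CreateSnapshotCmd", volumeid=4500)` would return job 36626 for the entry above.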
26. Could have found UI error with TRACE enabled.
2013-01-15 13:01:48,346 TRACE [cloud.async.AsyncJobManagerImpl] (catalina-exec-14:null) Job status: AsyncJobResult {jobId:36626, jobStatus: 2, processStatus: 0, resultCode: 530, result: com.cloud.api.response.ExceptionResponse/null/{"errorcode":530,"errortext":"Internal error executing command, please contact your system administrator"}}
27. The actual error didn't require TRACE.
2013-01-15 13:01:43,314 ERROR [cloud.api.ApiDispatcher] (Job-Executor-41:job-36626) Exception while executing CreateSnapshotCmd:
com.cloud.utils.exception.CloudRuntimeException: There is other active snapshot tasks on the instance to which the volume is attached, please try again later
	at com.cloud.storage.snapshot.SnapshotManagerImpl.createSnapshot(SnapshotManagerImpl.java:384)
	at com.cloud.utils.component.ComponentLocator$InterceptorDispatcher.intercept(ComponentLocator.java:1128)
	at com.cloud.storage.snapshot.SnapshotManagerImpl.createSnapshot(SnapshotManagerImpl.java:118)
<snip>
35. Check agent.log on host 57...
2012-06-13 09:00:43,501 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Lost connection to the server. Dealing with the remaining commands...
2012-06-13 09:00:48,502 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
2012-06-13 09:00:48,520 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 192.168.114.102:8250
2012-06-13 09:00:48,799 ERROR [utils.nio.NioConnection] (Agent-Selector:null) Unable to connect to remote
...
2012-06-13 09:14:43,275 ERROR [utils.nio.NioConnection] (Agent-Selector:null) Unable to connect to remote
2012-06-13 09:14:48,275 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
2012-06-13 09:14:48,276 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 192.168.114.102:8250
2012-06-13 09:14:50,936 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
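How long the host was cut off can be read straight from agent.log timestamps. A minimal sketch, assuming each entry starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp and treating "SSL: Handshake done" as the sign that the agent is connected again (an assumption based on the excerpt above); `outage_duration` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # e.g. "2012-06-13 09:00:43,501"


def outage_duration(
    lines: Iterable[str],
    lost: str = "Lost connection to the server",
    back: str = "SSL: Handshake done",
) -> Optional[timedelta]:
    """Time from the first `lost` entry to the next `back` entry, or None."""
    start = None
    for line in lines:
        if start is None and lost in line:
            start = datetime.strptime(line[:23], TS_FORMAT)
        elif start is not None and back in line:
            return datetime.strptime(line[:23], TS_FORMAT) - start
    return None
```

For the excerpt above this gives 0:14:07.435, so the agent could not reach the management server at 192.168.114.102:8250 for roughly 14 minutes.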
36. It did not make it to the host until 40 minutes later (also agent.log).
2012-06-13 21:44:20,660 DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:null) Request:Seq 57-57802800: { Cmd , MgmtId: 345050807280, via: 57, Ver: v1, Flags: 100111, [{"storage.DestroyCommand":{"vmName":"r-9218-VM","volume":{"id":10062,"name":"ROOT-9218","mountPoint":"/pools/HKPool/kvm-primary","path":"/mnt/25a4ee3a-7463-3bca-9e0e-cb0418f91557/09284c44-a940-4ec8-bec4-a44bf63f3576","size":2097152000,"type":"ROOT","storagePoolType":"NetworkFilesystem","storagePoolUuid":"25a4ee3a-7463-3bca-9e0e-cb0418f91557","deviceId":0},"wait":0}}] }
It did succeed, though.
2012-06-13 21:44:20,672 DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:null) Seq 57-57802800: { Ans: , MgmtId: 345050807280, via: 57, Ver: v1, Flags: 110, [{"Answer":{"result":true,"details":"Success","wait":0}}] }
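Pairing each `Request:Seq` entry in agent.log with its answer shows how long the host spent on a command and whether it succeeded. A minimal sketch, assuming the agent.log entry shapes shown above (`Request:Seq <id>: { Cmd` and `Seq <id>: { Ans: ... "result":true|false`); `pair_sequences` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Tuple

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
REQUEST_RE = re.compile(r"Request:Seq (\d+-\d+): \{ Cmd")
ANSWER_RE = re.compile(r'Seq (\d+-\d+): \{ Ans: .*?"result":(true|false)')


def pair_sequences(lines: Iterable[str]) -> Dict[str, Tuple[float, bool]]:
    """Map sequence ID -> (seconds from request to answer, result) for agent.log."""
    sent = {}
    paired = {}
    for line in lines:
        request = REQUEST_RE.search(line)
        if request:
            sent[request.group(1)] = datetime.strptime(line[:23], TS_FORMAT)
            continue
        answer = ANSWER_RE.search(line)
        if answer and answer.group(1) in sent:
            elapsed = datetime.strptime(line[:23], TS_FORMAT) - sent[answer.group(1)]
            paired[answer.group(1)] = (elapsed.total_seconds(), answer.group(2) == "true")
    return paired
```

For sequence 57-57802800 the host answered in about 12 ms with a successful result: the delay was entirely in getting the command to the host, not in executing it.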
37. 20 minutes after the first destroyRouter, the administrator hacked the database and ran destroyRouter again.
2012-06-13 09:26:27,038 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-100:job-135208) Destroying vm VM[DomainRouter|r-9218-VM]
39. The sequence for the second destroyRouter finally became unstuck after host connectivity was restored and the previous sequence completed.
2012-06-13 09:44:29,004 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-9:null) Seq 57-57802807: Sending now. is current sequence.
It failed because the volume was already gone.
2012-06-13 09:44:31,788 DEBUG [agent.transport.Request] (AgentManager-Handler-16:null) Seq 57-57802807: Processing: { Ans: , MgmtId: 345050807280, via: 57, Ver: v1, Flags: 110, [{"Answer":{"result":false,"details":"org.libvirt.LibvirtException: Storage volume not found: no storage vol with matching key","wait":0}}] }
41. Nearby Entries
●
Usually not relevant
●
Still can take a look
HA triggered on a few randomly dispersed VMs immediately
after restarting CloudStack.
2013-02-07 00:56:09,999 INFO
[cloud.ha.HighAvailabilityManagerImpl](main:null) Schedule
vm for HA: VM[User|i-2-12423-VM]
No sign of problems that might cause HA. But check
nearby entries that like even slightly related.
2013-02-07 00:56:09,985 INFO
[cloud.vm.VirtualMachineManagerImpl] (main:null) Handling
unfinished work item:ItWork[7d8a11b4-3c72-476b-9b79-
d9ce4171e131-Starting-12423-Release]
46. VLAN Issues
● Symptoms
  – Switch misconfiguration
    ● All VLANs trunked by default? Or denied?
  – Router problems
  – Bad or mislabeled cabling
Symptom – cannot ping across hosts; VMs sometimes cannot get DHCP (when the DHCP server is on another host).
Detection – Where does the traffic stop? XS (bridge) / KVM: tcpdump; ESXi, XS (OVS): dummy VMs? Check the ARP / MAC address tables.
Switch misconfiguration – common. Confirm the switchport is in trunk mode, not access; confirm the VLANs are actually allowed on the trunk.
Cabling – traffic “randomly” dropped, traffic showing up on the wrong switchports or not at all.
Solution – Fix the switch/router config; replace switches/routers/cables.
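For the “are the VLANs actually allowed?” check, a small helper can compare the VLANs a zone uses against a trunk's allowed list. This is a minimal Python sketch. It assumes the allowed list uses the common comma-and-range notation (such as `100-105,200`, or the keyword `all`) that many switch configurations print. The function names are hypothetical, and you still have to copy the list out of your own switch config.

```python
from __future__ import annotations


def parse_vlan_list(spec: str) -> set[int]:
    """Expand an allowed-VLAN list such as "100-105,200" (or "all") into VLAN IDs."""
    spec = spec.replace(" ", "").lower()
    if spec == "all":
        return set(range(1, 4095))  # usable 802.1Q VLAN IDs are 1-4094
    vlans: set[int] = set()
    for part in spec.split(","):
        if not part:
            continue
        if "-" in part:  # a range such as 100-105, inclusive at both ends
            start, end = (int(x) for x in part.split("-", 1))
            vlans.update(range(start, end + 1))
        else:
            vlans.add(int(part))
    return vlans


def missing_vlans(allowed_spec: str, required: set[int]) -> list[int]:
    """VLANs the zone needs that the trunk's allowed list does not carry."""
    return sorted(required - parse_vlan_list(allowed_spec))
```

For example, `missing_vlans("100-105,200", set(range(100, 111)))` returns `[106, 107, 108, 109, 110]`. Those are guest VLANs whose traffic would not cross that trunk to another host.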
47. More VLAN Issues
● Hypervisor problems
  – NIC drivers
  – Bonding
  – Open vSwitch
  – VLAN Scalability
Bad drivers – What, you actually want to use VLANs?
NIC bonding:
Symptoms – similar to switch misconfiguration.
Detection – NIC drivers / bonding – similar to switch misconfiguration, but the traffic stops elsewhere.
Bonding – check the config (XS); check whether traffic is dropped on the bond interface (ifconfig); disable one slave NIC, or force a failover; confirm the VLAN subinterfaces are on the right interface (the bond); change the bond mode (active-passive vs. SLB).
VLAN scaling – XS – high dom0 CPU, slowness.
Check the iptables/ebtables rules on the host (KVM, XS).
DB hacking – “wrong” VLANs in use.
Solutions – Update drivers, replace NICs, undo the DB hacks.
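You can check whether traffic is being dropped on the bond (or a slave NIC) without eyeballing ifconfig output. On Linux, ifconfig takes its drop counters from `/proc/net/dev`, so you can read them there directly. The sketch below parses that file's text and reports which interfaces' drop counters grew between two snapshots; the function names are hypothetical. The counters are cumulative, so a delta taken while you reproduce the problem tells you more than the absolute values.

```python
from __future__ import annotations


def parse_drops(proc_net_dev: str) -> dict[str, tuple[int, int]]:
    """Map interface name -> (rx_drop, tx_drop) from the text of /proc/net/dev."""
    drops: dict[str, tuple[int, int]] = {}
    for line in proc_net_dev.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:
            continue
        # Receive block: bytes packets errs drop ...; the transmit block starts at
        # field 8, also with bytes packets errs drop, so tx drop is field 11.
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


def drop_deltas(before: str, after: str) -> dict[str, tuple[int, int]]:
    """Interfaces whose (rx, tx) drop counters grew between two snapshots."""
    old, new = parse_drops(before), parse_drops(after)
    deltas: dict[str, tuple[int, int]] = {}
    for name, (rx, tx) in new.items():
        rx0, tx0 = old.get(name, (0, 0))
        if rx - rx0 or tx - tx0:
            deltas[name] = (rx - rx0, tx - tx0)
    return deltas
```

To use it on a KVM host or XS dom0:

1. Read `/proc/net/dev` once.
2. Reproduce the problem, for example a failing ping between hosts.
3. Read `/proc/net/dev` again and pass both texts to `drop_deltas`.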
48. Security Groups
● KVM
● XenServer / XCP
  – Switch backend
  – CSP
● vSphere...
Symptoms – VMs inaccessible (ingress) or unable to reach something (egress), partially or completely.
KVM/XS – check iptables/ebtables.
XS – Is the CSP installed? ARE YOU SURE?! Some patches will blow it away.
Don't “optimize” your XS. It only looks like a CentOS machine; general optimization advice doesn't apply.
vSphere – no SG support.
SGs are applied at the host level – migrate the VM to another host and see what happens.
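When you check iptables on a KVM host (or on XS dom0 with the CSP), it helps to look only at the chains that belong to the affected VM. The sketch below counts rules per chain from `iptables-save` output.

It relies on an unverified assumption: that the security group script names each VM's chains after its instance name (for example `i-2-12423-VM`, possibly with suffixes). Check the actual chain names on your host first.

If the VM should have rules but its chain is empty or missing, the SG rules were probably never applied.

```python
from __future__ import annotations


def vm_chain_rule_counts(iptables_save: str, vm_name: str) -> dict[str, int]:
    """Count '-A' rules in every chain whose name starts with the VM's instance name.

    Assumes (hypothetically) that per-VM chains are named after the instance name.
    """
    counts: dict[str, int] = {}
    for line in iptables_save.splitlines():
        if line.startswith(":"):  # chain declaration, e.g. ':i-2-12423-VM - [0:0]'
            chain = line[1:].split()[0]
            if chain.startswith(vm_name):
                counts.setdefault(chain, 0)
        elif line.startswith("-A "):  # a rule appended to a chain
            chain = line.split()[1]
            if chain.startswith(vm_name):
                counts[chain] = counts.get(chain, 0) + 1
    return counts
```

`iptables-save` needs root. Feed its stdout into `vm_chain_rule_counts`, for example via `subprocess.run(["iptables-save"], capture_output=True, text=True, check=True).stdout`.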
49. “Host” Connectivity
● Hypervisors
● System VMs
● Secondary Storage
  – Alert status is normal
Symptoms – connectivity and HA errors in the log.
Other network problems – firewalls blocking ports; “weird” problems with application-layer firewalls (e.g. ping and nmap to port 22 work, but ssh fails); a bad load balancer (when used with the “host” Global Setting).
Requirements – Mgmt to hypervisor and vice versa – varies by hypervisor:
XS: SSH, HTTPS; KVM: SSH; vSphere: 443/tcp to vCenter
System VMs to mgmt – 8250/tcp (“host” param)
System VMs to Internet (ping gateway)
Mgmt to system VMs – 3922/tcp, ssh via the hypervisor (KVM, XS) or direct (vSphere)
Mgmt to sec store, hypervisors to sec store – varies by hypervisor; SSVM to sec store
Mgmt to Mgmt (w/ multi-Mgmt) – 9090, 8250/tcp
CPVM must reach the mgmt server and hypervisors (management/private network) and the end user (public network)
CPVM proxies VNC from the hypervisors to the end user
Public IPs must be accessible to end users
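Most of the requirements above come down to plain TCP reachability. You can script them and repeat the same checks after every firewall change. A minimal sketch follows; the host names in `CHECKS` are hypothetical placeholders.

Each check has to run from the source machine it describes. For example, run the system VM check from inside the system VM.

A successful TCP connect tells you nothing about application-layer firewalls, which is exactly the “nmap to 22 works, ssh fails” case. Treat a `True` as necessary, not sufficient.

```python
from __future__ import annotations

import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolvable name
        return False


# Hypothetical host names; run each check from the machine on the left of the arrow.
CHECKS = [
    ("system VM -> mgmt (agent port)", "mgmt1.example.local", 8250),
    ("mgmt -> mgmt (multi-mgmt)", "mgmt2.example.local", 9090),
    ("mgmt -> vCenter", "vcenter.example.local", 443),
]


def run_checks(checks=CHECKS, timeout: float = 3.0) -> dict[str, bool]:
    """Label -> reachable? for each (label, host, port) entry."""
    return {label: tcp_port_open(host, port, timeout) for label, host, port in checks}
```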
CPVM uses the realhostip.com domain by default – NOT a placeholder, it's a real domain.
Traffic is over HTTPS using the *.realhostip.com wildcard cert by default.
Change the domain and cert – the cert must be valid! Potential URLs are a-b-c-d.yourdomain.tld, where a-b-c-d is an IP from the public network with the dots replaced by hyphens (s/./-/g).
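The CPVM host name rule above is easy to get wrong by hand. It also explains why the dots in the IP are replaced: a wildcard certificate covers exactly one DNS label. This sketch builds the name and checks whether a wildcard certificate name would cover it; the helper names are hypothetical. It only compares names and does not validate a certificate.

```python
import ipaddress


def cpvm_hostname(public_ip: str, domain: str = "realhostip.com") -> str:
    """a.b.c.d -> a-b-c-d.<domain> (the s/./-/g rule)."""
    ip = ipaddress.IPv4Address(public_ip)  # raises ValueError for a malformed address
    return f"{str(ip).replace('.', '-')}.{domain}"


def wildcard_covers(cert_name: str, hostname: str) -> bool:
    """True if a name like *.example.com covers hostname (wildcards match one label)."""
    if not cert_name.startswith("*."):
        return cert_name.lower() == hostname.lower()
    label, _, rest = hostname.partition(".")
    return bool(label) and rest.lower() == cert_name[2:].lower()
```

For example, `cpvm_hostname("203.0.113.10", "yourdomain.tld")` gives `203-0-113-10.yourdomain.tld`, which `*.yourdomain.tld` covers. The dotted form `203.0.113.10.yourdomain.tld` would not be covered.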
50. Virtual Router (domR)
● Dnsmasq
● HAProxy
● Password resets
● User- and Meta-data
DNS/DHCP – provided by Dnsmasq.
HAProxy – LB function.
Reset script problems – check the DHCP client and its version for the template/VM.
Check the domR for daemon problems (8080/tcp on the virtual router / socat process, serve_password.sh).
User/Meta-data – Apache on 80/tcp (standard location – /var/www/html).
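To test user-data and meta-data from inside a guest, fetch the items directly from the virtual router's Apache on 80/tcp. The sketch below makes two assumptions, which you should confirm for your version:

- the `/latest/user-data` and `/latest/meta-data/<item>` paths that CloudStack documents for the virtual router;
- the router's address being the guest's DHCP server.

Item names such as `local-ipv4` are examples, not a complete list.

```python
import urllib.request


def metadata_url(router_ip: str, item: str = "user-data") -> str:
    """URL for user-data or a single meta-data item on the virtual router (80/tcp)."""
    if item == "user-data":
        return f"http://{router_ip}/latest/user-data"
    return f"http://{router_ip}/latest/meta-data/{item}"


def fetch(router_ip: str, item: str = "user-data", timeout: float = 5.0) -> str:
    """Fetch the item from inside the guest; raises urllib.error.URLError on failure."""
    with urllib.request.urlopen(metadata_url(router_ip, item), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

From the guest, `fetch("10.1.1.1")` (router address hypothetical) returns the raw user-data. An HTTP error or a timeout here points at Apache or the router rather than at the template.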
51. Templates
● eth0, or is it eth1? Or maybe p192p1?
● “sysprep” for Windows, your own solution for Linux
● Prepare in CloudStack environment?
● Can't “import” them?
Template preparation should follow the best practices from the OS vendor.
Use “sysprep” for Windows, scripts / deployment tools for Linux.
Linux suggestions – clear udev persistent network device names, SSH keys, bash history, logs, temp files.
It can be easier to prepare templates outside of CS (PV-mode Ubuntu on XS, especially, is much easier that way), but that is not always an option (e.g. slow connectivity to the CS environment).
Set up the password reset script.
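You can turn the Linux clean-up list above into a small script. Run it as the last step before you shut the VM down and create the template. The sketch below is dry-run by default and works on any root directory, so you can also point it at a mounted image.

The paths are assumptions, typical for RHEL/CentOS and Ubuntu guests of that era. Check them against your distribution, and make sure something regenerates the SSH host keys on first boot.

```python
from __future__ import annotations

from pathlib import Path

# Assumed typical locations; adjust for your distribution.
REMOVE_GLOBS = (
    "etc/udev/rules.d/70-persistent-net.rules",  # persistent NIC names (eth0 vs eth1...)
    "etc/ssh/ssh_host_*",                        # SSH host keys (must be regenerated on first boot)
    "root/.bash_history",
    "home/*/.bash_history",
    "tmp/*",                                     # files only; subdirectories are left alone
    "var/tmp/*",
)
TRUNCATE_GLOBS = ("var/log/**/*.log", "var/log/messages")  # keep the files, drop their contents


def plan_cleanup(root: Path | str) -> tuple[list[Path], list[Path]]:
    """Return (files to delete, files to truncate) under an image's root directory."""
    root = Path(root)
    remove = {p for g in REMOVE_GLOBS for p in root.glob(g) if p.is_file()}
    truncate = {p for g in TRUNCATE_GLOBS for p in root.glob(g) if p.is_file()}
    return sorted(remove), sorted(truncate)


def clean_template(root: Path | str, dry_run: bool = True) -> tuple[list[Path], list[Path]]:
    """Apply the plan unless dry_run is set; always return what was (or would be) touched."""
    remove, truncate = plan_cleanup(root)
    if not dry_run:
        for path in remove:
            path.unlink()
        for path in truncate:
            path.write_bytes(b"")
    return remove, truncate
```

Run `clean_template("/")` first to review the two lists, then `clean_template("/", dry_run=False)` as root.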
54. Hypervisor Hosts
● XenServer / XCP
  – /var/log/SMlog, xensource.log
● KVM
  – /var/log/cloud/agent/agent.log
  – /var/log/libvirt/libvirtd.log
● vSphere
  – vCenter logs
XS – CS logs mainly to SMlog (the Storage Manager log). Errors encountered by the hypervisor often go to xensource.log.
KVM – agent.log for CloudStack errors (note: not DEBUG by default); libvirtd.log for libvirt errors.
vSphere – host logs are not useful; check the vCenter logs.
XS/KVM – /var/log/messages can be useful: libvirt and qemu errors, host problems (power failure).
55. What to look for?
● Warnings and errors and exceptions, oh my!
● WARN, ERROR, Exception, Unable, Failed
● VM name
● Type of task that failed
● Enable TRACE if necessary
● The dreaded “avoid set”
● Error text from UI/API
CS problems usually have something error-related to grep for – WARN, ERROR, etc.
Grepping for errors in general often gives too many false positives.
Also, grep can drop useful entries (for example, stack-trace lines that don't contain the search term); I normally use “less”.
If there is a specific problem, grep for things specific to it: VM name, task type, etc.
Hosts, storage, etc. in the “avoid set”? THIS IS NOT THE PROBLEM. SCROLL UP IN THE LOG TO FIND THE REAL PROBLEM.
UI/API errors can be useful to grep for – quote the exact text for best results.
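Grepping loses the continuation lines of multi-line entries, such as stack traces and “due to” causes. A filter that works on whole entries sits between grep and reading everything in less.

The sketch below assumes the management-server.log timestamp layout shown in this deck (`2013-02-06 15:36:45,023 ...`). It combines the generic keywords with something specific to the problem, such as a VM name or a job tag. The names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator

# Entries start with "YYYY-MM-DD HH:MM:SS,mmm "; anything else continues the previous entry.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
KEYWORDS = ("WARN", "ERROR", "Exception", "Unable", "Failed")


def entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield whole entries, keeping continuation lines (stack traces) with their entry."""
    current: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


def matching(lines: Iterable[str], *terms: str,
             keywords: tuple[str, ...] = KEYWORDS) -> Iterator[str]:
    """Entries containing every specific term and, unless keywords=(), any error keyword."""
    for entry in entries(lines):
        if all(t in entry for t in terms) and (not keywords or any(k in entry for k in keywords)):
            yield entry
```

Open management-server.log as `log_file`. Iterating over `matching(log_file, "i-2-1607-VM")` then yields every error-looking entry that mentions that VM, stack traces included. Pass `keywords=()` to see all of that VM's entries.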
56. Jobs and Sequences
● Job ID
  – Visible at API level
  – ID versus UUID
● Sequence ID
  – Subordinate to Jobs
  – Sent to hosts or management servers
The job ID from the API is not shown in the UI – too complicated?
That job ID is not useful by itself since it's not logged! The log uses the numeric job ID, the “id” from the database (versus the “uuid” from the db returned in the API response), as “job-<id#>”:
select id from async_job where uuid = '<job ID from API>';
Jobs can contain sequences.
Sequences can depend on each other, even across jobs – this can make a job get “stuck”.
They can go to hosts – look for “Executing request” and “Response received”.
Or to other management servers – look for “Forwarding Seq <id>”.
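You can wrap the UUID-to-id lookup above so it returns the exact tag to search for in the log. This sketch uses Python's DB-API. With a MySQL driver such as PyMySQL or MySQLdb, the placeholder is `%s` instead of sqlite's `?`.

The in-memory sqlite table in `demo()` is only a stand-in with the two columns the query uses. It is not the real `async_job` schema, and the job UUID in it is made up.

```python
from __future__ import annotations

import sqlite3

# DB-API placeholder styles: sqlite3 uses "qmark"; MySQLdb/PyMySQL use "format".
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def job_log_tag(conn, job_uuid: str, paramstyle: str = "qmark") -> str | None:
    """Translate the job UUID returned by the API into the 'job-<id>' tag used in the log."""
    sql = f"SELECT id FROM async_job WHERE uuid = {PLACEHOLDERS[paramstyle]}"
    cur = conn.cursor()
    cur.execute(sql, (job_uuid,))  # parameterized, so the UUID is never pasted into SQL
    row = cur.fetchone()
    return None if row is None else f"job-{row[0]}"


def demo() -> str | None:
    """Hypothetical stand-in database with only the two columns the query needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE async_job (id INTEGER PRIMARY KEY, uuid TEXT)")
    conn.execute("INSERT INTO async_job (id, uuid) VALUES (318, 'hypothetical-job-uuid')")
    return job_log_tag(conn, "hypothetical-job-uuid")
```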
57. What to do?
● Depends on the errors
● Check capacity
● Check network
● Keep waiting
● Hack the database and retry
Capacity – beware, there may be errors about capacity that are not the real error. Look at the job from beginning to end before drawing a conclusion.
Network – “no route to host”, “connection refused” – “it's the network” (usually).
Patience – they don't say it's a virtue for nothing.
Please, don't hack the database.
59. UI Error
● Start with an error from the UI
Find the related log entry (at the end of the job).
2013-02-25 16:39:40,567 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-1:job-318) Complete async job-318, jobStatus: 2, resultCode: 530, result: Error Code: 533 Error text: Unable to create a deployment for VM[User|holybigvm]
The actual error.
2013-02-25 16:39:40,459 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-1:job-318) No clusters found having a host with enough capacity, returning.
You can usually find the error shown in the UI in the log.
Once you find the job (job-318 here), it's normally best to start from the beginning of the job and work your way down.
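This routine of finding the completion line and then starting from the top of the job can be sketched as two helpers with hypothetical names:

- `job_for_ui_error` finds the job whose completion line contains the UI error text.
- `job_window` returns every line between the job's first and last mention, not only the lines tagged with it. That keeps interleaved entries and untagged continuation lines in view.

The sketch assumes one log entry per line, as in the real management-server.log.

```python
from __future__ import annotations

import re

COMPLETE = re.compile(r"Complete async job-(\d+)")


def job_for_ui_error(lines: list[str], error_text: str) -> int | None:
    """Job ID from the last 'Complete async job-N' line containing the UI error text."""
    job_id = None
    for line in lines:
        m = COMPLETE.search(line)
        if m and error_text in line:
            job_id = int(m.group(1))
    return job_id


def job_window(lines: list[str], job_id: int) -> list[str]:
    """All lines from the job's first log line to its last, including interleaved ones."""
    tag = re.compile(rf"\bjob-{job_id}\b")  # \b stops job-318 from matching job-3180
    hits = [i for i, line in enumerate(lines) if tag.search(line)]
    return lines[hits[0] : hits[-1] + 1] if hits else []
```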
60. The avoid set...
Hey guys, why is my host in the avoid set, and how do I remove it?
2012-05-14 16:04:54,772 DEBUG [allocator.impl.FirstFitAllocator] (Job-Executor-4:job-17638 FirstFitRoutingAllocator) Host name: somehost.example.local, hostId: 5 is in avoid set, skipping this and trying other available hosts
61. ...means SCROLL UP
The real error will always be earlier in the job.
2012-05-14 16:04:54,735 ERROR [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-4:job-17638) Unable to set dhcp entry for VM[User|i-2-1607-VM] on domR: r-16-VM due to
2012-05-14 16:04:54,735 INFO [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-4:job-17638) Unable to contact resource.
Also note that “grep job-<id>” does not always work well... in the example, the error is “due to” what?!?
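You can also automate scrolling up from the avoid-set message. The sketch below uses hypothetical names and the same timestamp assumption as the rest of this section.

It first groups each continuation line with the entry above it, which is exactly what `grep job-<id>` loses. It then returns the job's WARN/ERROR entries that precede its first “is in avoid set” line, oldest first.

```python
from __future__ import annotations

import re

TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
WARN_OR_ERROR = re.compile(r"^\S+ \S+ (WARN|ERROR) ")  # date, time, then the level


def group_entries(lines: list[str]) -> list[str]:
    """Attach continuation lines (no timestamp) to the entry above them."""
    grouped: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not grouped:
            grouped.append(line)
        else:
            grouped[-1] += "\n" + line
    return grouped


def errors_before_avoid_set(lines: list[str], job_id: int) -> list[str]:
    """WARN/ERROR entries of the job that precede its first 'is in avoid set' entry."""
    tag = re.compile(rf"\bjob-{job_id}\b")
    entries = group_entries(lines)
    for i, entry in enumerate(entries):
        if tag.search(entry) and "is in avoid set" in entry:
            return [e for e in entries[:i] if tag.search(e) and WARN_OR_ERROR.match(e)]
    return []
```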
62. Hypervisor Errors (XS/XCP)
2012-07-13 08:08:05,149 WARN [xen.resource.CitrixResourceBase] (DirectAgent-73:null) Task failed! Task record:
uuid: 95e36660-702f-8e07-5025-3bcf1573e18c
nameLabel: Async.VM.start_on
nameDescription:
allowedOperations: []
currentOperations: {}
created: Fri Jul 13 08:08:04 PDT 2012
finished: Fri Jul 13 08:08:04 PDT 2012
status: FAILURE
residentOn: com.xensource.xenapi.Host@fed4b193
progress: 1.0
type: <none/>
result:
errorInfo: [SR_BACKEND_FAILURE_46, , The VDI is not available [opterr=VDI e7f5571b-44b1-48b7-ba84-34d9c3ee879b already attached RW]]
otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
subtasks: []
Errors from the hypervisor may show up in management-server.log.
Example from XenServer.
Typical solution for this:
1. Get the SR UUID and name-label for the affected VDI (the UUID is shown in the error in management-server.log, and also in the volumes table of the database):
xe vdi-list uuid=<UUID of affected VDI>
2. Forget the affected VDI:
xe vdi-forget uuid=<UUID of affected VDI>
3. Rescan the SR:
xe sr-scan uuid=<SR UUID from step 1>
4. Give the VDI the correct name-label:
xe vdi-param-set uuid=<UUID of affected VDI> name-label=<name-label from step 1>
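If you do this often, it can help to generate the commands from the `xe vdi-list` output instead of copying UUIDs by hand. The sketch below only prints commands for review and runs nothing.

It assumes the usual `name ( RO): value` layout of `xe` list output and uses only the `xe` commands from the steps above. The function names are hypothetical.

```python
from __future__ import annotations

import re
import shlex

# One "name ( RO): value" field per line, as in typical `xe ...-list` output (assumed layout).
FIELD = re.compile(r"^\s*([\w-]+) \(\s*[A-Z]+\)\s*:\s?(.*)$")


def parse_xe_record(text: str) -> dict[str, str]:
    """Fields of a single xe record, e.g. uuid, name-label, sr-uuid."""
    record: dict[str, str] = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            record[m.group(1)] = m.group(2).strip()
    return record


def vdi_fix_commands(vdi_uuid: str, sr_uuid: str, name_label: str) -> list[list[str]]:
    """Steps 2-4 as argv lists, using the SR UUID and name-label found in step 1."""
    return [
        ["xe", "vdi-forget", f"uuid={vdi_uuid}"],
        ["xe", "sr-scan", f"uuid={sr_uuid}"],
        ["xe", "vdi-param-set", f"uuid={vdi_uuid}", f"name-label={name_label}"],
    ]


def dry_run(commands: list[list[str]]) -> str:
    """Shell-quoted commands to review (or paste) before running anything."""
    return "\n".join(shlex.join(argv) for argv in commands)
```

Quoting with `shlex.join` keeps a name-label that contains spaces intact as a single argument.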
63. Hypervisor Errors (KVM)
management-server.log
2013-01-08 13:47:17,256 WARN [cloud.vm.VirtualMachineManagerImpl] (AgentManager-Handler-32:null) Cleanup failed due to Exception: org.libvirt.LibvirtException
Message: internal error '/bin/umount /mnt/dd7d684f-255f-3a24-87ab-512168a207b5' exited with non-zero status 16 and signal 0: umount.nfs: /mnt/dd7d684f-255f-3a24-87ab-512168a207b5: device is busy
umount.nfs: /mnt/dd7d684f-255f-3a24-87ab-512168a207b5: device is busy
/var/log/messages
2012-09-19 12:14:37,002 WARN [resource.computing.LibvirtComputingResource] (Agent-Handler-5:null) Exception org.libvirt.LibvirtException: internal error '/bin/umount /mnt/513d2d1f-38a7-3e3b-b2c5-ea3f4e0db6ba' exited with non-zero status 16 and signal 0: umount.nfs: /mnt/513d2d1f-38a7-3e3b-b2c5-ea3f4e0db6ba: device is busy
umount.nfs: /mnt/513d2d1f-38a7-3e3b-b2c5-ea3f4e0db6ba: device is busy
This is caused by a bug in versions of libvirt prior to 0.9.4:
http://www.redhat.com/archives/libvir-list/2011-February/msg00637.html
So upgrade your libvirt packages... on all KVM hosts.
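Checking every KVM host by hand is tedious. The version comparison itself is simple, but it is easy to get wrong with string comparison. The sketch below assumes you collect something like the output of `libvirtd --version` or the installed package name from each host.

Distribution packages sometimes backport fixes without changing the upstream version. Treat a positive result as “check this host” rather than proof of the bug.

```python
from __future__ import annotations

import re

FIXED_IN = (0, 9, 4)  # the text above says versions prior to 0.9.4 have the bug


def parse_version(text: str) -> tuple[int, int, int]:
    """First X.Y.Z in e.g. 'libvirtd (libvirt) 0.8.7' or 'libvirt-0.9.10-21.el6.x86_64'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"no version number in {text!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def affected_by_umount_bug(version_text: str) -> bool:
    # Compare integer tuples: as strings, "0.9.10" < "0.9.4" would give the wrong answer.
    return parse_version(version_text) < FIXED_IN
```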
64. Hypervisor Errors (vSphere)
2013-01-16 10:12:28,313 ERROR [vmware.resource.VmwareResource] (DirectAgent-161:esx102.example.com) CreateCommand failed due to Exception: java.io.FileNotFoundException
Message: https://stor1.example.com/folder/39b7d77f7dd547f695172b07882daec9/ff7d2b46103e4595a1b422990a0799e2.vmdk?dcPath=Petaluma&dsName=aaf29b8e-b548-3fad-8c59-0dad849b2704
java.io.FileNotFoundException: https://stor1.example.com/folder/39b7d77f7dd547f695172b07882daec9/ff7d2b46103e4595a1b422990a0799e2.vmdk?dcPath=Petaluma&dsName=aaf29b8e-b548-3fad-8c59-0dad849b2704
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1401)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
<snip>
“Sometimes” vCenter doesn't like paths that are “too long”. So in this case, rename the datastore to not include hyphens.
65. Exceptions
2013-02-06 15:36:45,023 ERROR [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-112:job-56957) Failed to start instance VM[User|a9167fdc-7d43-49f1-93cb-6c6902a666fa]
com.cloud.utils.exception.CloudRuntimeException: Unable to acquire lock on VMTemplateStoragePool: 2570
at com.cloud.template.TemplateManagerImpl.prepareTemplateForCreate(TemplateManagerImpl.java:638)
at com.cloud.utils.db.DatabaseCallback.intercept(DatabaseCallback.java:30)
at com.cloud.storage.StorageManagerImpl.createVolume(StorageManagerImpl.java:3418)
at com.cloud.storage.StorageManagerImpl.prepare(StorageManagerImpl.java:3327)
at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:749)
at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:467)
at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2944)
at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2616)
at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2604)
<snip>
There is often some hint in the first few lines.
Here there is a problem in TemplateManagerImpl.java – even if you're not a dev, it points to something template-related.
In this case they were trying to deploy a lot of VMs from a new template within a short time, which failed with an exception instead of gracefully (bug).
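You can pull out “the first few lines” mechanically. The exception class, its message, and the first frame in CloudStack's own packages (`com.cloud.` in this era) usually tell you which area is involved. This is a minimal sketch with hypothetical names; it expects one entry with its stack trace lines attached.

```python
from __future__ import annotations

import re

# "some.package.SomeException: message" (message optional)
EXCEPTION_LINE = re.compile(r"^([\w$.]+(?:Exception|Error))(?::\s*(.*))?$")
# "at some.package.Class.method(File.java:123)"
FRAME = re.compile(r"^\s*at\s+([\w$.]+)\.([\w$<>]+)\(([^)]*)\)")


def summarize_trace(entry: str, own_prefix: str = "com.cloud.") -> dict[str, str | None]:
    """Exception class and message, first frame, and first frame under own_prefix."""
    exception = message = first_frame = first_own_frame = None
    for raw in entry.splitlines():
        line = raw.strip()
        m = EXCEPTION_LINE.match(line)
        if m and exception is None:
            exception, message = m.group(1), m.group(2)
            continue
        f = FRAME.match(line)
        if f:
            where = f"{f.group(1)}.{f.group(2)} ({f.group(3)})"
            first_frame = first_frame or where
            if first_own_frame is None and f.group(1).startswith(own_prefix):
                first_own_frame = where
    return {"exception": exception, "message": message,
            "first_frame": first_frame, "first_own_frame": first_own_frame}
```

On the trace above, `first_own_frame` points at `TemplateManagerImpl.prepareTemplateForCreate`. On the vSphere trace, only `sun.net...` frames are present, so `first_own_frame` is `None`.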
66. Forwarding Sequences
2013-01-24 14:09:28,063 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-13:null) Seq 6-1372586715: Forwarding Seq 6-1372586715: { Cmd , MgmtId: 83533443070, via: 6, Ver: v1, Flags: 100111, [{"StopCommand":{"isProxy":false,"vmName":"i-7-121-VM","wait":0}}] } to 83533443165
In this case, the admin said one particular task was failing.
Further investigation revealed that “everything” was getting “stuck”. There was only one mgmt server, so the mgmt server endlessly forwarded everything to itself – a bad situation, caused by duplicate management servers in the mshost table, which “should never happen”.
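On the log side, you can check for this situation by listing the management server IDs that commands are forwarded to. Flag any ID that is not among the servers you know are running. `live_msids` is something you supply, for example the MgmtId values your running servers print in their own log lines. The helper names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# "... Forwarding Seq 6-1372586715: { Cmd , MgmtId: 83533443070, ... } to 83533443165"
FORWARD = re.compile(r"Forwarding Seq \S+: \{ Cmd , MgmtId: (\d+),.*\} to (\d+)\s*$")


def forwarding_pairs(lines) -> Counter:
    """Count (from MgmtId, to MgmtId) pairs over all 'Forwarding Seq' lines."""
    pairs: Counter = Counter()
    for line in lines:
        m = FORWARD.search(line)
        if m:
            pairs[(m.group(1), m.group(2))] += 1
    return pairs


def forwards_to_unknown(lines, live_msids: set[str]) -> dict[tuple[str, str], int]:
    """Forwarding targets that are not among the management servers you know are running."""
    return {pair: n for pair, n in forwarding_pairs(lines).items() if pair[1] not in live_msids}
```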
67. TRACE Example
● Cannot snapshot a volume
The error shown in the UI wasn't logged at DEBUG level. It was still possible to find the job (from the job type and volume ID).
2013-01-15 13:01:43,249 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-9:null) submit async job-36626, details: AsyncJobVO {id:36626, userId: 215, accountId: 180, sessionKey: null, instanceType: Snapshot, instanceId: 19951, cmd: com.cloud.api.commands.CreateSnapshotCmd, cmdOriginator: null, cmdInfo: {"id":"19951","response":"json","sessionkey":"asdf","ctxUserId":"215","tenant":"be04d916-a916-483e-a96c-925e0cb89575","volumeid":"4500","ctxAccountId":"180","ctxStartEventId":"136303","signature":"asdf","apikey":"asdf"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 172136269028748, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
Need to find the job – couldn't find the error from the UI with the default DEBUG log.
Found it here from “CreateSnapshot” and the volume ID (4500).
68. Could have found UI error with TRACE enabled.
2013-01-15 13:01:48,346 TRACE [cloud.async.AsyncJobManagerImpl] (catalina-exec-14:null) Job status: AsyncJobResult {jobId:36626, jobStatus: 2, processStatus: 0, resultCode: 530, result: com.cloud.api.response.ExceptionResponse/null/{"errorcode":530,"errortext":"Internal error executing command, please contact your system administrator"}}
The error in the UI was logged at TRACE level instead of DEBUG for some reason (bug).
To enable TRACE logging, edit /etc/cloud/management/log4j-xml.conf and set the com.cloud category to TRACE as shown below. There is no need to restart CloudStack.
<category name="com.cloud">
  <priority value="TRACE"/>
</category>
69. The actual error didn't require TRACE.
2013-01-15 13:01:43,314 ERROR [cloud.api.ApiDispatcher] (Job-Executor-41:job-36626) Exception while executing CreateSnapshotCmd:
com.cloud.utils.exception.CloudRuntimeException: There is other active snapshot tasks on the instance to which the volume is attached, please try again later
at com.cloud.storage.snapshot.SnapshotManagerImpl.createSnapshot(SnapshotManagerImpl.java:384)
at com.cloud.utils.component.ComponentLocator$InterceptorDispatcher.intercept(ComponentLocator.java:1128)
at com.cloud.storage.snapshot.SnapshotManagerImpl.createSnapshot(SnapshotManagerImpl.java:118)
<snip>
The issue was another snapshot in progress on the VM... or at least that is what CloudStack thought – there was a snapshot “stuck” in “BackingUp” status for some reason (bug).
70. Patience level over 9000!
Reboot of virtual router initiated.
2012-08-31 12:13:06,333 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-14:null) submit async job-5473, details: AsyncJobVO {id:5473, userId: 161, accountId: 2, sessionKey: null, instanceType: DomainRouter, instanceId: 4054, cmd: com.cloud.api.commands.RebootRouterCmd, cmdOriginator: null, cmdInfo: {"response":"json","id":"4054","sessionkey":"84uOYXRynfqNDk7QTvYO4Nek238u003d","ctxUserId":"161","_":"1346411586256","ctxAccountId":"2","ctxStartEventId":"22455"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 345052684411, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
In some cases, you just need to be patient.
Example – an administrator complained that a virtual router took a long time to reboot.
Note: job-5461
71. But it has to wait for Seq 1415446801.
2012-08-31 12:13:06,377 DEBUG [agent.transport.Request] (Job-Executor-131:job-5473) Seq 16-1415446919: Waiting for Seq 1415446801 Scheduling: { Cmd , MgmtId: 345052684411, via: 16, Ver: v1, Flags: 100111, [{"StopCommand":{"isProxy":false,"privateRouterIpAddress":"10.255.105.3","vmName":"r-4054-VM","wait":0}}] }
Much later the job proceeds...
2012-08-31 12:41:14,749 DEBUG [agent.transport.Request] (DirectAgent-238:null) Seq 16-1415446919: Executing: { Cmd , MgmtId: 345052684411, via: 16, Ver: v1, Flags: 100111, [{"StopCommand":{"isProxy":false,"privateRouterIpAddress":"10.255.105.3","vmName":"r-4054-VM","wait":0}}] }
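You can script the search for these waits in a busy log. Pair each “Waiting for Seq” line with the later “Executing” line of the same sequence and report how long it waited.

The sketch assumes the line formats shown above; the names are hypothetical. A sequence that never reaches “Executing” is reported with `None`, and that is the one worth chasing.

```python
from __future__ import annotations

import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
WAITING = re.compile(r"Seq (\d+-\d+): Waiting for Seq (\d+)")
EXECUTING = re.compile(r"Seq (\d+-\d+): Executing")


def _ts(line: str) -> datetime:
    return datetime.strptime(line[:23], TS_FORMAT)  # e.g. "2012-08-31 12:13:06,377"


def wait_report(lines: list[str]) -> list[tuple[str, str, float | None]]:
    """(seq, seq it waited for, seconds until it executed, or None if it never did)."""
    waiting: dict[str, tuple[str, datetime]] = {}
    report: list[tuple[str, str, float | None]] = []
    for line in lines:
        if m := WAITING.search(line):
            waiting.setdefault(m.group(1), (m.group(2), _ts(line)))
        elif (m := EXECUTING.search(line)) and m.group(1) in waiting:
            blocker, since = waiting.pop(m.group(1))
            report.append((m.group(1), blocker, (_ts(line) - since).total_seconds()))
    report.extend((seq, blocker, None) for seq, (blocker, _) in waiting.items())
    return report
```

On the two lines above, this reports `16-1415446919` waiting for `1415446801` for about 1688 seconds, roughly 28 minutes.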
75. Less Patience :-(
destroyRouter received by the management server.
2012-06-13 09:05:36,321 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-88:job-135158) Destroying vm VM[DomainRouter|r-9218-VM]
Host 57 instructed to destroy the volume.
2012-06-13 09:05:36,387 DEBUG [agent.transport.Request] (Job-Executor-88:job-135158) Seq 57-57802800: Sending { Cmd , MgmtId: 345050807280, via: 57, Ver: v1, Flags: 100111, [{"storage.DestroyCommand":{"vmName":"r-9218-VM","volume":{"id":10062,"name":"ROOT-9218","mountPoint":"/pools/HKPool/kvm-primary","path":"/mnt/25a4ee3a-7463-3bca-9e0e-cb0418f91557/09284c44-a940-4ec8-bec4-a44bf63f3576","size":2097152000,"type":"ROOT","storagePoolType":"NetworkFilesystem","storagePoolUuid":"25a4ee3a-7463-3bca-9e0e-cb0418f91557","deviceId":0},"wait":0}}] }
The admin reported that they tried to destroy a virtual router. “It didn't work” and the router was “stuck” in the Stopping state.
Then they hacked the db to put the virtual router back into the “Running” state, tried to destroy the router again, and it worked.
Why did it fail the first time?
76. Completed 40 minutes later.
2012-06-13 09:44:20,863 DEBUG [agent.transport.Request] (Job-Executor-88:job-135158) Seq 57-57802800: Received: { Ans: , MgmtId: 345050807280, via: 57, Ver: v1, Flags: 110, { Answer } }
2012-06-13 09:44:20,863 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-88:job-135158) Cleanup succeeded. Details Success
2012-06-13 09:44:20,883 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-88:job-135158) Volume successfully expunged from 208
2012-06-13 09:44:20,883 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-88:job-135158) Expunged VM[DomainRouter|r-9218-VM]
The job eventually succeeded. But there are two problems:
1. It contradicts the administrator's description of events.
2. Why did it take 40 minutes to destroy a router?
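You can answer the 40-minute question from the management server log alone by timing the sequence from “Sending” to “Received”. The sketch below uses the line formats above; the helper names are hypothetical.

Compare timestamps only within one log. The agent.log entries on the next slides show 21:44 for the same command that management-server.log records at 09:44. Clocks or time zones on different machines may not line up.

```python
from __future__ import annotations

import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
SEQ_EVENT = re.compile(r"Seq (\d+-\d+): (Sending|Received|Executing|Processing)")


def seq_events(lines: list[str], seq: str) -> list[tuple[datetime, str]]:
    """(timestamp, event) for every line that reports an event for the given sequence."""
    events = []
    for line in lines:
        m = SEQ_EVENT.search(line)
        if m and m.group(1) == seq:
            events.append((datetime.strptime(line[:23], TS_FORMAT), m.group(2)))
    return events


def send_to_answer_seconds(lines: list[str], seq: str) -> float | None:
    """Seconds between the first 'Sending' and the first 'Received' of a sequence."""
    first: dict[str, datetime] = {}
    for when, event in seq_events(lines, seq):
        first.setdefault(event, when)
    if "Sending" in first and "Received" in first:
        return (first["Received"] - first["Sending"]).total_seconds()
    return None
```

For Seq 57-57802800, the Sending and Received lines above give 2324.476 seconds, about 38.7 minutes.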
77. Check agent.log on host 57...
2012-06-13 09:00:43,501 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Lost connection to the server. Dealing with the remaining commands...
2012-06-13 09:00:48,502 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
2012-06-13 09:00:48,520 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 192.168.114.102:8250
2012-06-13 09:00:48,799 ERROR [utils.nio.NioConnection] (Agent-Selector:null) Unable to connect to remote
...
2012-06-13 09:14:43,275 ERROR [utils.nio.NioConnection] (Agent-Selector:null) Unable to connect to remote
2012-06-13 09:14:48,275 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Reconnecting...
2012-06-13 09:14:48,276 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 192.168.114.102:8250
2012-06-13 09:14:50,936 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
No DEBUG (disabled by default) for agent.log.
78. It did not make it to the host until 40 minutes later (also agent.log).
2012-06-13 21:44:20,660 DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:null) Request:Seq 57-57802800: { Cmd , MgmtId: 345050807280, via: 57, Ver: v1, Flags: 100111, [{"storage.DestroyCommand":{"vmName":"r-9218-VM","volume":{"id":10062,"name":"ROOT-9218","mountPoint":"/pools/HKPool/kvm-primary","path":"/mnt/25a4ee3a-7463-3bca-9e0e-cb0418f91557/09284c44-a940-4ec8-bec4-a44bf63f3576","size":2097152000,"type":"ROOT","storagePoolType":"NetworkFilesystem","storagePoolUuid":"25a4ee3a-7463-3bca-9e0e-cb0418f91557","deviceId":0},"wait":0}}] }
It did succeed, though.
2012-06-13 21:44:20,672 DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:null) Seq 57-57802800: { Ans: , MgmtId: 345050807280, via: 57, Ver: v1, Flags: 110, [{"Answer":{"result":true,"details":"Success","wait":0}}] }
Connectivity errors finally cleared up, and host 57 received and processed the job.
79. 20 minutes after the first destroyRouter, the administrator hacked the database and ran destroyRouter again.
2012-06-13 09:26:27,038 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-100:job-135208) Destroying vm VM[DomainRouter|r-9218-VM]
81. Sequence for second destroyRouter finally unstuck after the host connectivity was restored and the previous sequence completed.
2012-06-13 09:44:29,004 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-9:null) Seq 57-57802807: Sending now. is current sequence.
It failed since the volume was gone.
2012-06-13 09:44:31,788 DEBUG [agent.transport.Request] (AgentManager-Handler-16:null) Seq 57-57802807: Processing: { Ans: , MgmtId: 345050807280, via: 57, Ver: v1, Flags: 110, [{"Answer":{"result":false,"details":"org.libvirt.LibvirtException: Storage volume not found: no storage vol with matching key","wait":0}}] }
82. Log Anomalies
● Lost time
● Out of order
● Zero byte management-server.log
  – lol no.
These can result from high load, such as log rotation of multiple >1GB log files.
Change the rotation in the log4j configuration.
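You can spot lost time and out-of-order entries mechanically by walking the timestamps. The sketch below assumes the management-server.log timestamp format used throughout this deck and a gap threshold you choose. The five-minute default is arbitrary. On a quiet system a gap is not necessarily an anomaly, so read each one in context.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def timestamp_anomalies(lines, max_gap: timedelta = timedelta(minutes=5)) -> list[tuple[int, str, timedelta]]:
    """(line number, 'out of order' or 'gap', size) for suspicious jumps between entries."""
    found: list[tuple[int, str, timedelta]] = []
    latest: datetime | None = None
    for number, line in enumerate(lines, start=1):
        m = TS.match(line)
        if not m:
            continue  # continuation line (stack trace etc.)
        t = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if latest is not None:
            if t < latest:
                found.append((number, "out of order", latest - t))
            elif t - latest > max_gap:
                found.append((number, "gap", t - latest))
        # Track the latest time seen so one backwards entry doesn't also flag a gap.
        latest = t if latest is None else max(latest, t)
    return found
```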
83. Nearby Entries
● Usually not relevant
● Still worth a look
HA triggered on a few randomly dispersed VMs immediately after restarting CloudStack.
2013-02-07 00:56:09,999 INFO [cloud.ha.HighAvailabilityManagerImpl](main:null) Schedule vm for HA: VM[User|i-2-12423-VM]
No sign of problems that might cause HA. But check nearby entries that look even slightly related.
2013-02-07 00:56:09,985 INFO [cloud.vm.VirtualMachineManagerImpl] (main:null) Handling unfinished work item:ItWork[7d8a11b4-3c72-476b-9b79-d9ce4171e131-Starting-12423-Release]
Often there is too much going on to find related entries nearby, so you need to filter by job ID, Seq ID, etc.
But sometimes you don't have a job or Seq to look for.
Then you can look for things like the VM ID (the “id” from the database, not the “uuid” from the db that is reported as the ID in the UI).
In this example there are stale entries in op_it_work and/or op_ha_work due to some bug.
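The sketch below turns an instance name into its database id and then searches for both. The name layouts are assumptions:

- `i-2-12423-VM` for a user VM and `r-9218-VM` for a router, as in the examples in this deck.
- The usual `s-`/`v-` prefixes for the SSVM and CPVM.
- A configurable “-VM” suffix, hence `\w+` in the patterns.

The id patterns are deliberately loose, so expect some noise.

```python
from __future__ import annotations

import re

# Assumed layouts: user VMs i-<account id>-<vm id>-<suffix>; system VMs r-/s-/v-<vm id>-<suffix>.
USER_VM = re.compile(r"^i-\d+-(\d+)-\w+$")
SYSTEM_VM = re.compile(r"^[rsv]-(\d+)-\w+$")


def vm_db_id(instance_name: str) -> int:
    """Database id (the "id", not the "uuid") encoded in an instance name."""
    for pattern in (USER_VM, SYSTEM_VM):
        m = pattern.match(instance_name)
        if m:
            return int(m.group(1))
    raise ValueError(f"unrecognized instance name: {instance_name!r}")


def related_lines(lines, instance_name: str) -> list[str]:
    """Lines naming the VM, or carrying its id as '-<id>-' or 'instanceId: <id>'."""
    vm_id = vm_db_id(instance_name)
    pattern = re.compile(rf"{re.escape(instance_name)}|-{vm_id}-|instanceId: {vm_id}\b")
    return [line for line in lines if pattern.search(line)]
```

With the HA example above, `related_lines` on `i-2-12423-VM` also picks up the `ItWork[...-Starting-12423-Release]` entry, which is the stale work item.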