#CCCEU14
TROUBLESHOOTING APACHE CLOUDSTACK
#CCCEU14 
JORIS VAN LIESHOUT 
Working @ Schuberg Philis since 2010 
Mission Critical 
3+ years in the IaaS team 
Part of the initial CloudStack vs OpenStack comparison 
Started with CloudStack 2.2.14
#CCCEU14 
Reading the logs 
Understanding System VMs 
Use the source, Luke! 
DB API, hands off? 
Employee Cloud 
Questions 
Side notes 
• Have you worked with ACS? 
• CloudStack 4.4.1 
• XenServer 6.2.0 SP1 
• Default log settings 
AGENDA
#CCCEU14 
Less is more <=> 
• less /var/log/cloudstack/management/management-server.log 
Keep management server ids at hand 
• select * from cloud.mshost where removed is null; 
Stack traces 
• Look at the first instead of the last 
Search for API key 
Log4j 1.2 EnhancedPatternLayout 
• /etc/cloudstack/management/log4j-cloud.xml 
READING THE LOGS
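The "look at the first instead of the last" advice for stack traces can be sketched as a small scan. This is a minimal sketch, assuming stack frames appear in the log as indented `at ...` lines in the usual Java style; the function name and the sample lines are hypothetical and not taken from a real log.

```python
import re

# A Java stack frame as printed in the log, e.g. "\tat com.cloud.Foo.bar(Foo.java:10)".
FRAME = re.compile(r"^\s+at [\w.$<>]+\(.*\)$")


def first_stack_trace(lines):
    """Return (index, header) of the first stack trace header, or None.

    The header is the line directly above the first run of "at ..." frames,
    which normally names the exception that was thrown.
    """
    previous = None
    for index, line in enumerate(lines):
        if FRAME.match(line) and previous is not None and not FRAME.match(previous):
            return index - 1, previous
        previous = line
    return None


# Hypothetical excerpt: the later trace is a consequence of the earlier one.
SAMPLE = [
    "2014-11-20 10:00:01,000 ERROR [c.c.a.ApiServer] (thread-1:ctx-aaaa) unhandled exception",
    "java.lang.NullPointerException",
    "\tat com.cloud.Foo.bar(Foo.java:10)",
    "\tat com.cloud.Foo.baz(Foo.java:20)",
    "com.cloud.utils.exception.CloudRuntimeException: later failure",
    "\tat com.cloud.Qux.run(Qux.java:5)",
]
```

Calling `first_stack_trace(SAMPLE)` points at the `NullPointerException` header, which is where reading should start.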
#CCCEU14 
READING THE LOGS 
(Date, Time), Log priority 
Class 
• Will match a .java file in the source
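The first fields of a log line can be pulled apart with a regular expression. This is a minimal sketch, assuming the default layout puts date, time, priority and the bracketed class at the start of the line; the sample line and the helper names are hypothetical.

```python
import re

# Date, time, priority and the bracketed class at the start of a log line.
PREFIX = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<priority>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]"
)


def parse_prefix(line):
    """Return a dict with date, time, priority and logger, or None."""
    match = PREFIX.match(line)
    return match.groupdict() if match else None


def source_file_name(logger):
    """Map a (possibly abbreviated) class name to the .java file to look for."""
    return logger.rsplit(".", 1)[-1] + ".java"


# Hypothetical line in the assumed default layout.
SAMPLE = (
    "2014-11-20 10:15:30,123 DEBUG [c.c.a.ApiServlet] "
    "(http-6443-exec-5:ctx-1a2b3c4d) ===START===  10.0.0.5 -- GET  command=listZones"
)
```

Because the package part is abbreviated, only the last segment of the class is used; searching the source tree for `ApiServlet.java` then finds the matching file.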
#CCCEU14 
READING THE LOGS 
Thread Name 
Thread context 
Optional Job id
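Thread name, thread context and the optional job id sit together in the parenthesised part of the line. A minimal sketch for extracting them, assuming context ids look like `ctx-<hex>` and job ids like `job-<number>`; the sample lines are hypothetical.

```python
import re

# "(thread-name:context ...)" directly after the bracketed class.
THREAD = re.compile(r"\]\s+\((?P<thread>[^:()]+):(?P<context>[^)]*)\)")


def parse_thread(line):
    """Return thread name, context ids and job ids found in a log line, or None."""
    match = THREAD.search(line)
    if not match:
        return None
    context = match.group("context")
    return {
        "thread": match.group("thread"),
        "contexts": re.findall(r"ctx-[0-9a-f]+", context),
        "jobs": re.findall(r"job-\d+", context),
    }


# Hypothetical lines: one async job executor, one thread without a context.
WITH_JOB = (
    "2014-11-20 10:15:31,456 DEBUG [c.c.a.ApiAsyncJobDispatcher] "
    "(API-Job-Executor-7:ctx-1a2b3c4d job-42) working"
)
NO_CONTEXT = "2014-11-20 10:15:31,500 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-9:null) ping"
```

Searching the log for the combination of thread name and first context id then follows one piece of work within a thread.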
#CCCEU14 
READING THE LOGS 
When forwarding commands 
• Host_id-Sequence_nr 
• MgmtId 
• Can be the other server 
• Via 
• Command
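The forwarding fields can be read out of a `Seq` line as follows. This is a minimal sketch, assuming the line has the shape `Seq <host_id>-<sequence>: Sending { Cmd , MgmtId: ..., via: ..., ...`; the exact spacing and the sample values are assumptions, not copied from a real log.

```python
import re

# Host id, sequence number, management server id and "via" of a forwarded command.
SEQ = re.compile(
    r"Seq (?P<host_id>\d+)-(?P<sequence>-?\d+): (?P<action>\w+):?\s+\{\s*(?P<kind>\w+):?\s*,"
    r"\s*MgmtId: (?P<mgmt_id>-?\d+), via: (?P<via>[^,]+),"
)


def parse_seq(line):
    """Return the forwarding fields of a Seq line as a dict, or None."""
    match = SEQ.search(line)
    if not match:
        return None
    fields = match.groupdict()
    for key in ("host_id", "sequence", "mgmt_id"):
        fields[key] = int(fields[key])
    return fields


# Hypothetical line; ids and host name are made up for illustration.
SAMPLE = (
    "2014-11-20 10:15:32,001 DEBUG [c.c.a.t.Request] (API-Job-Executor-7:ctx-5e6f7a8b job-42) "
    "Seq 1-1609302082: Sending  { Cmd , MgmtId: 345049296663, via: 1(xenserver-01), "
    "Ver: v1, Flags: 100011, [] }"
)
```

The `mgmt_id` value can be compared with the management server ids kept at hand to see which server handled the command.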
#CCCEU14 
READING THE LOGS 
Find the first call 
• API call 
• API key 
• Name 
Thread name and 
first context 
Async creates a job
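Finding the first call comes down to reading the `===START===` line. A minimal sketch, assuming that line carries the client address, the HTTP method and the query string in that order; the sample line, key and signature values are hypothetical.

```python
import re
from urllib.parse import parse_qs

# Client address, HTTP method and query string on a ===START=== line.
START = re.compile(r"===START===\s+(?P<client>\S+) -- (?P<method>[A-Z]+)\s+(?P<query>\S+)")


def parse_api_call(line):
    """Return client, method, API command and API key of a ===START=== line, or None."""
    match = START.search(line)
    if not match:
        return None
    # Parameter names are matched case-insensitively (apiKey vs apikey).
    params = {key.lower(): values[0] for key, values in parse_qs(match.group("query")).items()}
    return {
        "client": match.group("client"),
        "method": match.group("method"),
        "command": params.get("command"),
        "api_key": params.get("apikey"),
    }


# Hypothetical line with placeholder key and signature.
SAMPLE = (
    "2014-11-20 10:15:30,123 DEBUG [c.c.a.ApiServlet] (http-6443-exec-5:ctx-1a2b3c4d) "
    "===START===  10.0.0.5 -- GET  command=deployVirtualMachine&apiKey=EXAMPLEKEY"
    "&response=json&signature=EXAMPLESIG"
)
```

Calls without an API key in the query string simply return `None` for `api_key`.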
#CCCEU14 
READING THE LOGS 
Find the first call 
Thread name and 
first context 
Might create a job 
Picked up by…
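Following a job from the thread that created it to the thread that picked it up can be sketched as a filter on the job id. This is a minimal sketch, assuming the job id appears in the line as `job-<number>`; the sample lines and message texts are hypothetical.

```python
import re

# Thread name in the "(thread-name:context ...)" part of a line.
THREAD = re.compile(r"\((?P<thread>[^:()\s]+):[^)]*\)")


def threads_for_job(lines, job_id):
    """Return the thread names that logged a given job id, in order of appearance."""
    # Word boundaries keep job-42 from also matching job-421.
    wanted = re.compile(r"\bjob-%d\b" % job_id)
    seen = []
    for line in lines:
        if not wanted.search(line):
            continue
        match = THREAD.search(line)
        if match and match.group("thread") not in seen:
            seen.append(match.group("thread"))
    return seen


# Hypothetical lines: the API thread submits the job, an executor picks it up.
SAMPLE = [
    "2014-11-20 10:15:30,200 DEBUG [c.c.a.ApiServer] (http-6443-exec-5:ctx-1a2b3c4d) submit async job-42",
    "2014-11-20 10:15:30,250 DEBUG [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-7:ctx-5e6f7a8b job-42) executing",
    "2014-11-20 10:15:30,300 DEBUG [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-8:ctx-0f0f0f0f job-421) executing",
]
```

With two management servers, the same filter has to be run on both logs, since the job may be picked up on the other server.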
#CCCEU14 
READING THE LOGS 
Thread name and 
first context 
Sending… 
• Track sequence id 
• Keep an eye on MgmtID 
Executing…
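Tracking one sequence id while keeping an eye on the MgmtId can be sketched as below. This is a minimal sketch, assuming each step is logged as `Seq <host_id>-<sequence>: <Action> ... MgmtId: <id>`, with or without a colon after the action; the sample lines are hypothetical.

```python
import re

# Sequence id, action (Sending, Executing, ...) and management server id.
STEP = re.compile(r"Seq (?P<seq>\d+-\d+): (?P<action>\w+):?\s.*?MgmtId: (?P<mgmt_id>-?\d+)")


def trace_sequence(lines, seq):
    """Return (action, mgmt_id) for every line that mentions the given sequence id."""
    steps = []
    for line in lines:
        match = STEP.search(line)
        if match and match.group("seq") == seq:
            steps.append((match.group("action"), int(match.group("mgmt_id"))))
    return steps


# Hypothetical lines for one command plus an unrelated sequence.
SAMPLE = [
    "(API-Job-Executor-7:ctx-5e6f7a8b job-42) Seq 1-1609302082: Sending  { Cmd , MgmtId: 345049296663, via: 1(xenserver-01), Ver: v1, Flags: 100011, [] }",
    "(DirectAgent-12:ctx-9c8d7e6f) Seq 1-1609302082: Executing:  { Cmd , MgmtId: 345049296663, via: 1(xenserver-01), Ver: v1, Flags: 100011, [] }",
    "(DirectAgent-13:ctx-aaaa0000) Seq 2-77: Executing:  { Cmd , MgmtId: 345049296664, via: 2(xenserver-02), Ver: v1, Flags: 100011, [] }",
]
```

If the management server ids in the result differ from the id of the server whose log is being read, the rest of the conversation is in the other server's log.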
#CCCEU14 
UNDERSTANDING SYSTEM VMS 
ssh to SVM 
• From hypervisor 
• Port 3922 
• /root/.ssh/id_rsa.cloud 
xensource.log, SMlog and cloud/vmops.log 
XAPI call to vmops plugin
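The ssh settings for a system VM can be captured in a small helper that builds the command line to run on the hypervisor. This is a minimal sketch: the port and key path come from the slide, the link-local check follows the 169.254.0.0 control network mentioned in the speaker notes, and logging in as `root` is an assumption.

```python
import ipaddress


def svm_ssh_command(control_ip, user="root"):
    """Build the ssh argument list for a system VM, to be run on the hypervisor.

    The user name is an assumption; port and key path are as given on the slide.
    """
    address = ipaddress.ip_address(control_ip)
    if not address.is_link_local:
        raise ValueError("expected a link-local control IP, got %s" % control_ip)
    return [
        "ssh",
        "-i", "/root/.ssh/id_rsa.cloud",
        "-p", "3922",
        "%s@%s" % (user, address),
    ]
```

For example, `" ".join(svm_ssh_command("169.254.1.23"))` gives a command line that can be pasted into a shell on the host; the IP here is a made-up value.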
#CCCEU14 
USE THE SOURCE, LUKE! 
GitHub 
• https://github.com/apache/cloudstack 
Eclipse for Java EE 
• https://cwiki.apache.org/confluence/display/CLOUDSTACK/Using+Eclipse+With+CloudStack 
DevCloud4 
• https://github.com/imduffy15/devcloud4
#CCCEU14 
Read only! 
Unless… 
• After code review 
• Bug solution also changes db 
Why? 
• Change state => data mismatch 
• Incorrect value => ACS start fails 
• DB data is not always leading 
• Effects of DB change can stick around 
Marvin and CloudMonkey 
• And find a way out without DB change 
DB API, HANDS OFF!
#CCCEU14 
EMPLOYEE CLOUD 
Realistic workload 
Use as UAT environment 
To reproduce bugs and workarounds 
Technology test ground
#CCCEU14 
jvanlieshout@schubergphilis.com 
@JorizvL 
QUESTIONS?

Troubleshooting Apache CloudStack at #ccceu14 by @jorizvl


Editor's Notes

  • #3 Everyone has seen <screenshot>. Troubleshooting can be daunting. <CLICK> Is it an infra issue, a bug, or something else? Quick poll: Dev, Ops, Other? Today's talk: fair share of outages (boot launch failed). From an operational perspective: some bugs only show up in an environment with production workload. With this: a better DevOps relationship.
  • #4 As MCE (Mission Critical Engineer). CloudStack vs OpenStack: easily won by CloudStack.
  • #5 Bit much text, but understanding it will help a lot. Rohit API call life. See how commands are forwarded. Check out the presentation by Sten. Use the source code and dev tools. Common discussion, my take on it. Best tip I can give you. Q: if time. <CLICK> Notes: ask who has worked with ACS. Based on DevCloud4 (4.4.1 and XS62ESP1). We never had the need to adjust the log4j config.
  • #6 Use less instead of grep or tail, or something like Logstash or Splunk. Management server ids: needed when a command gets forwarded to the other management server. Also keep host ids at hand (including CVM and SSVM). The last stack trace is frequently the result of an earlier fault. Search for instance name, network name, API call or API key. API key: trace what a user did, for instance Citrix Studio. Browser plugin to see calls: Firefox, Firebug.
  • #7 Breakdown of a log line. Date and time are removed in all examples. Debug is the default priority. Class (abbreviated as of 4.4) matches a source file on git.
  • #8 Name and number. Can be many, for instance DirectAgent threads. ID: unique per cycle. Usually search for this combination. For async calls, a job id tracks the work across threads. We'll look at this in a bit.
  • #9 Forwarding commands to other hosts: hosts, CVMs, SSVMs. The host_id can be found in the DB: cloud.host. The sequence number is for tracking. MgmtId might be the other server when the host is managed by that server. Via is the same as host_id; as of 4.4 it is expanded to the full name.
  • #10 Finding the call: search for the API call, the API key, or the name of the instance, network, template, etc., between ===START=== and ===END===. Thread name and context can be seen as a conversation. An async call will create a job for async execution.
  • #11 Search for the job id. Again search for thread name and context. Repeat for the next job; it can jump management server. Different thread and context.
  • #12 In a conversation, a task for a host is sent to a thread. It is picked up by a thread and executed; this can jump management server. In the case of XenServer this results in XAPI calls.
  • #13 Very brief on SVMs: check out the next talk by Sten. This applies to XenServer. Get the control IP from ACS (169.254.0.0). Depending on the call. Calls for SVMs use vmops.
  • #14 Have a look at GitHub; a clone is essential. Eclipse is recommended. DevCloud4 is good to have.
  • #15 As a read-only source the DB is really valuable, although most info is available using the API. Be really careful. Unless: know how the code responds to a change in the DB; stay in line with the fix in code. State change => example: NIC not removed, ref tables. Incorrect state => example: instance state Expunged, network state Destroyed. Host ping => DB is backup. Instance removed, NIC still there: network will never return to Allocated. Both very powerful; example => host in Alert with ping timeout, cluster unmanage.
  • #16 Load bugs, race conditions. Also hardware load. Not only ACS upgrades but also other components: hypervisors, storage, networking. Tested the solution for the snapshot Dom0 load bug. Adoption of cloud tech. In use for Chef, Graphite, new OSes, many PoCs. The employee rack is now the employee cloud => lower power consumption.