What should you do if you think you’ve got a problem with Perforce Helix? In this session, learn what the common issues are, what to look for, where to find help and how our Support engineers can assist you.
“For a moment, nothing happened. Then, after a second or so, nothing continued to happen.”
Nothing Happened…
What is the “Baseline” of your Server?
Questions to ask yourself
• How fast should commands come back? What’s a reasonable time for my server?
• Which errors do I normally see? Which ones can I ignore?
• What are my normal load averages? CPU, RAM, I/O
• How quiet are my end users?
What do you consider normal?
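Once you have recorded what is normal, a simple statistical check can flag when a new measurement falls outside it. This is a minimal Python sketch under stated assumptions: the baseline is a list of earlier measurements of one metric (for example, how long `p4 info` takes to return), and "abnormal" means more than k standard deviations from the mean. The helper name `is_abnormal` and the default k=3 are illustrative choices, not Perforce guidance.

```python
import statistics


def is_abnormal(baseline, sample, k=3.0):
    """Return True if `sample` lies more than k standard deviations
    from the mean of the `baseline` measurements (illustrative threshold)."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline measurements")
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any different value is unusual.
        return sample != mean
    return abs(sample - mean) > k * stdev


# Hypothetical baseline: 'p4 info' response times in seconds.
baseline_secs = [0.21, 0.19, 0.25, 0.22, 0.20, 0.23]
print(is_abnormal(baseline_secs, 0.24))  # False: within the usual spread
print(is_abnormal(baseline_secs, 2.5))   # True: far outside it
```

The check is only as good as the baseline behind it, so collect measurements over a representative period (busy and quiet hours) before trusting it.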
“Funny, how just when you think life can’t possibly get any worse it suddenly does.”
Funny…
What is no longer normal?
When?
Where is something broken?
• Server side?
• Client side?
• Network?
What is the actual error?
What does the user say is happening?
Operating System vs Perforce
Perforce client error:
Connect to server failed; check $P4PORT.
TCP connect to asoida:1666 failed.
Name or service not known
vs
Perforce client error:
Connect to server failed; check $P4PORT.
TCP connect to 10.3.0.2:1668 failed.
connect: 10.3.0.2:1668: Connection refused
Operating System
Perforce client error:
Connect to server failed; check $P4PORT.
TCP connect to 10.3.0.2:1668 failed.
connect: 10.3.0.2:1668: Connection refused
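To tell the two operating-system failures above apart without p4, a plain TCP check can help: a name-resolution failure corresponds to "Name or service not known", while "Connection refused" means the host answered but nothing is listening on that port. This is a minimal Python sketch, not a Perforce tool; `diagnose_p4port` is a hypothetical helper name, and it only tests the network path, not whether p4d is the service listening.

```python
import socket


def diagnose_p4port(host, port, timeout=5.0):
    """Try a plain TCP connection and report which OS-level step failed."""
    try:
        # Name resolution: a failure here matches "Name or service not known".
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"name resolution failed: {exc}"
    family, socktype, proto, _, sockaddr = addrs[0]
    with socket.socket(family, socktype, proto) as s:
        s.settimeout(timeout)
        try:
            s.connect(sockaddr)
        except ConnectionRefusedError:
            # The host answered, but nothing is listening on that port.
            return "connection refused: host reachable, nothing listening"
        except OSError as exc:
            # Timeouts, unreachable networks, firewalls silently dropping, etc.
            return f"connect failed: {exc}"
    return "ok: TCP connection established"
```

For example, `diagnose_p4port("asoida", 1666)` would show whether the first error above is a DNS problem rather than a server problem.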
Operating System
Perforce server error:
Operation: lbr-SubmitFile
Operation 'lbr-SubmitFile' failed.
Librarian checkin //depot/dev/tomahony/hitchhickers.txt failed.
lock on depot/dev/tomahony/hitchhickers.txt failed
open for write: depot/dev/tomahony/hitchhickers.txt,v : Permission denied
Perforce Errors
Submit validation failed -- fix problems then use 'p4 submit -c 835231'.
'check_in_test' validation failed: no error message
Client map too twisted for directory list.
$ p4 add install
install - no permission for operation on file(s).
Perforce password (P4PASSWD) invalid or unset.
Bad Vogon Poetry
Long lapse times with large CPU usage:
--- lapse 908s
--- usage 148058+740688us 0+0io 1+0net 0k 0pf
--- killed by client disconnect
Long database locking:
--- db.revdx
--- pages in+out+cached 131340+0+96
--- locks read/write 1/1 rows get+pos+scan put+del 0+18+269548 0+0
--- total lock wait+held read/write 0ms+0ms/0ms+686753ms
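Culprit vs victim: not all long-running commands are bad things. A command that waits a long time for a lock is usually a victim; the culprit is the command that held the lock (a long “held” time, like the write lock on db.revdx above).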
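Separating culprits from victims in a large log is easier once the lock summary line is parsed. This minimal Python sketch assumes the exact `total lock wait+held read/write` line format shown above; the function names and the 10-second threshold are hypothetical, illustrative choices rather than Perforce defaults.

```python
import re

# Matches the lock summary line from server tracking output, e.g.
# "--- total lock wait+held read/write 0ms+0ms/0ms+686753ms"
LOCK_RE = re.compile(
    r"total lock wait\+held read/write (\d+)ms\+(\d+)ms/(\d+)ms\+(\d+)ms"
)


def lock_times(line):
    """Return read/write lock wait and held times in ms, or None if no match."""
    m = LOCK_RE.search(line)
    if m is None:
        return None
    read_wait, read_held, write_wait, write_held = map(int, m.groups())
    return {
        "read_wait": read_wait,
        "read_held": read_held,
        "write_wait": write_wait,
        "write_held": write_held,
    }


def classify(times, threshold_ms=10_000):
    """Rough label: a long held time suggests a culprit, a long wait a victim.

    threshold_ms is an arbitrary illustrative cut-off.
    """
    held = max(times["read_held"], times["write_held"])
    waited = max(times["read_wait"], times["write_wait"])
    if held >= threshold_ms:
        return "possible culprit"
    if waited >= threshold_ms:
        return "possible victim"
    return "unremarkable"
```

For the db.revdx example, `lock_times` reports a write lock held for 686753 ms (about 11 minutes), so `classify` labels it a possible culprit. Treat the label as a starting point for reading the full log entry, not a verdict.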
Vogon Poetry
Knowledge base articles to help you:
Interpreting server log files
• http://answers.perforce.com/articles/KB/2525
Simple P4D Log Analysis
• http://answers.perforce.com/articles/KB/2514
Using the Log Analyzer
• http://answers.perforce.com/articles/KB/1266
Manage your log files
Tracking and verbose logging can get big very quickly…
• no bigger than that.
Consider where you are storing your logs
How much can you store?
• 1 day, 2 days, 7 days?
Include non-Perforce logging in your baseline too.
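Once you have decided how many days of logs you can keep, a small cleanup job can enforce it. This is a minimal Python sketch, assuming rotated logs sit in one directory and match a glob such as `log.*.gz` (a hypothetical naming scheme); `prune_logs` is a hypothetical helper that deletes by modification time, so try it on a copy of the directory first.

```python
import time
from pathlib import Path


def prune_logs(log_dir, pattern, keep_days):
    """Delete files in log_dir matching `pattern` whose modification time is
    older than keep_days. Returns the names of the deleted files."""
    cutoff = time.time() - keep_days * 86400
    removed = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

For example, `prune_logs("/p4/logs", "log.*.gz", keep_days=7)` would keep a week of compressed logs, assuming that hypothetical path and naming.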
“It has the words DON’T PANIC inscribed in large friendly letters on its cover.”
Don’t Panic
BTree is corrupt! dbverify
• When should I run “p4d -xv”?
• When should I run “p4d -xx”?
• p4d -xf
• p4 dbverify
• p4 dbstat
-xv vs -xx
• -xv: validates the db files
• -xx: finds inconsistencies between tables
- writes jnl.fix files
They are not the same thing.
Check with Support!
Validating db.have
**** Checksum mismatch ****
possible data corruption Problems Summary:
pages which are not connect to tree or freelist
pages which are not valid or uninterruptable
pages which are visited multiple times in tree and freelist
data is out of order - table restore required
B-tree does not have consistent level count
p4d -r $P4ROOT -xv
• Output: as bad as it gets
p4d -r $P4ROOT -xv: recovery
• Checkpoints, checkpoints and checkpoints
• p4d -r $P4ROOT -jds -z dump.gz
• Grep the journal
• Partially restore tables
• Is it a secondary index? Can you rebuild it using -xx?
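"Grep the journal" usually means pulling out every record for the damaged table. This minimal Python sketch assumes each journal record sits on one line and begins `@op@ <version> @db.table@` (for example `@pv@ ... @db.have@ ...`); records whose text fields contain newlines span several lines and would be split or missed by this line-based approach, so check the journal format for your server version before relying on it. `grep_journal` is a hypothetical helper name.

```python
import re

# Assumed record prefix: @op@ <table version> @db.<table>@ ...
RECORD_RE = re.compile(r"^@(\w+)@ (\d+) @(db\.[\w.]+)@")


def grep_journal(lines, table):
    """Yield (operation, record) for journal lines that touch `table`.

    `table` must match exactly, e.g. "db.have"; "db.rev" will not match
    "db.revdx" records.
    """
    for line in lines:
        m = RECORD_RE.match(line)
        if m and m.group(3) == table:
            yield m.group(1), line.rstrip("\n")
```

You could pass an open journal file (opened with `errors="replace"` in case of odd bytes) as `lines`, then review the matching records with Support before replaying anything.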
Babel Fish
What should I have ready?
• Version details (p4 -Ztag info)
• Server configurables (p4 configure show)
• Server logs (ask us for an FTP account to upload them)
• Reproduction cases…
• The actual “error”
• Checkpoints/journals at hand; at least a list of what you have.
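Part of that checklist can be scripted so the details are ready before you open a case. This minimal Python sketch runs the two commands named above and saves each output to a file; it assumes `p4` is on PATH with P4PORT and P4USER already set, that `p4 configure show` is run by a user with super access, and the output file names are hypothetical.

```python
import subprocess
from pathlib import Path

# Commands from the checklist above; output file names are illustrative.
DEFAULT_COMMANDS = {
    "info.txt": ["p4", "-Ztag", "info"],
    "configure.txt": ["p4", "configure", "show"],
}


def collect(outdir, commands=DEFAULT_COMMANDS, timeout=60):
    """Run each command and save its stdout+stderr (or the failure) to outdir.

    Returns the list of files written, in command order.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in commands.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # e.g. p4 not on PATH: record the problem instead of stopping.
            text = f"failed to run {argv}: {exc}\n"
        (out / filename).write_text(text)
        written.append(filename)
    return written
```

Server logs, reproduction steps and your list of checkpoints and journals still need to be gathered by hand.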
Support wants to know the big picture
We want to know what and where you’re going.
We want to know why you’re doing something.
Support wants to know the small details
It’s the small things that count.
These can give the biggest clues but are easily missed
“It is a mistake to think you can solve any major problems just with potatoes.”
Solving issues with Potatoes
Don’t turn it on and off again
Killing processes can cause worse things to happen
If you must shut down, see “Shutting down the Server”:
http://answers.perforce.com/articles/KB/2580
Who’s running what commands?
Look at p4 monitor and netstat -anp.
Read the full error!
Corruption: restore from checkpoints/journals.
Who changed what and when?
Know your timings of events
Check what started in the logs around those times
Can you back out the change? Does this help?
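Checking what started in the logs around a given time is easy to script. This minimal Python sketch assumes log entries begin with a timestamp of the form `YYYY/MM/DD HH:MM:SS` (optionally tab-indented), as in typical P4LOG entries; lines without a leading timestamp, such as tracking lines, are skipped, so a fuller tool would keep the lines that follow each matching entry. `entries_between` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed entry prefix, e.g. "\t2015/03/10 12:34:56 pid 1234 user@ws ..."
STAMP_RE = re.compile(r"^\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")


def entries_between(lines, start, end):
    """Yield log lines whose leading timestamp falls within [start, end]."""
    for line in lines:
        m = STAMP_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
            if start <= ts <= end:
                yield line.rstrip("\n")
```

Pick a window that starts a little before the first report of trouble, since the culprit command often started well before anyone noticed.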
42
Going forward
Server log, storage, know what errors you see
Steps to take
Create baselines
Things to consider
Disaster Recovery
Creating Baselines
• df
• sar
• iostat
• vmstat
• mpstat
Run all of these commands at regular intervals
Nagios plugins
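If sar is not available, a small sampler run at regular intervals can start a baseline. This minimal Python sketch appends load averages and disk usage to a CSV file; it assumes a Unix host (`os.getloadavg()` is Unix-only), and `watch_path` would typically be the filesystem holding P4ROOT, which is an assumption to adjust for your layout. `sample_baseline` is a hypothetical helper name.

```python
import csv
import os
import shutil
import time
from datetime import datetime

HEADER = ["timestamp", "load1", "load5", "load15", "disk_used_pct"]


def sample_baseline(csv_path, watch_path="/", samples=1, interval_s=60.0):
    """Append `samples` rows of load average and disk usage to csv_path,
    sleeping interval_s seconds between rows. Returns the rows written."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(HEADER)
        for i in range(samples):
            load1, load5, load15 = os.getloadavg()
            usage = shutil.disk_usage(watch_path)
            used_pct = 100.0 * usage.used / usage.total
            writer.writerow([
                datetime.now().isoformat(timespec="seconds"),
                f"{load1:.2f}", f"{load5:.2f}", f"{load15:.2f}",
                f"{used_pct:.1f}",
            ])
            f.flush()  # keep the file useful if the process is killed
            if i < samples - 1:
                time.sleep(interval_s)
    return samples
```

Running it from cron with `samples=1` gives evenly spaced rows without a long-lived process; the same CSV can later feed a threshold check or a Nagios-style alert.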
iostat
• iostat without any arguments displays CPU usage and I/O statistics for the devices on the system.
vmstat
• By default, vmstat displays memory usage (including swap) alongside process, I/O, system and CPU statistics.
• Procs – r: Number of runnable processes (running or waiting for CPU time)
• Procs – b: Number of processes blocked waiting for I/O (uninterruptible sleep)
• Memory – swpd: Swap (virtual) memory in use
• Memory – free: Idle (unused) memory
• Memory – buff: Memory used as buffers
• Memory – cache: Memory used as cache
• Swap – si: Memory swapped in from disk (per second)
• Swap – so: Memory swapped out to disk (per second)
• IO – bi: Blocks received from a block device (per second)
• IO – bo: Blocks sent to a block device (per second)
• System – in: Interrupts per second
• System – cs: Context switches per second
• CPU – us, sy, id, wa, st: Percentage of CPU time spent in user code, system (kernel) code, idle, waiting for I/O, and stolen by the hypervisor
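To turn these columns into baseline data, vmstat output can be parsed into named fields. This minimal Python sketch reads the column names from the header line beginning `r b` rather than hard-coding them, because the exact column set can differ between vmstat versions; it assumes the default output format (no extra options such as timestamps), and `parse_vmstat` is a hypothetical helper name.

```python
def parse_vmstat(text):
    """Parse default vmstat output into a list of {column: int} rows."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["r", "b"]:
            columns = fields  # (re)read the header; vmstat repeats it
            continue
        if columns is None or len(fields) != len(columns):
            continue  # group header ("procs ---memory---") or malformed line
        if not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(columns, map(int, fields))))
    return rows
```

Feeding it the captured output of, say, `vmstat 5 3` yields one dictionary per report, so values such as `wa` or `si` can be compared against your baseline by name.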
mpstat
• By default, mpstat displays CPU statistics averaged across all CPUs.
• With the -P ALL option, it displays statistics for each individual CPU (or core).
Nagios Plugin
Written by Karl Wirth, UK Tech Support
• Available in the Perforce Workshop:
https://swarm.workshop.perforce.com/files/guest/karl_wirth/Nagios
So long, and thanks for all the Fish!
https://www.perforce.com/support-services
Tim O’Mahony
tomahony@perforce.com