Is Evergreen Slowing Down – Basic Network Troubleshooting
Rogan Hamby, June 13th 2013
SCLENDS basic network troubleshooting

  1. Is Evergreen Slowing Down – Basic Network Troubleshooting. Rogan Hamby, June 13th 2013
  2. Is Evergreen slowing down?
  3. There could be several culprits.
  4. Staff Client Issues
  5. There are known memory leaks in the staff client. These are being actively addressed by the community.
  6. If this is happening, it probably isn’t happening the same way at all stations. Reboot the troubled station.
  7. Network Issues
  8. From your local switch having fits, to a router in Tennessee dying, to someone in Atlanta doing a thirteen-terabit backup, we are at the mercy of the pipes in between.
  9. Usually these problems will grow slowly. All machines will be affected, but it may not seem like that at first, as some activities are more prone to interruption.
  10. Staff facing patrons and those performing functions that move large data frames (e.g. cataloging) will usually notice first, because lost packets and latency have the greatest perceivable impact there.
  11. Now it’s important to look at your network path. There are many common elements in the paths from SCLENDS member libraries to the hosting facility, but no universal ones except the last few.
  12. If you use ICMP- or UDP-based tools, be aware of the false positives they can give, since those protocols are often blocked.
  13. I recommend that you use TCP-based traceroutes.
  14. Windows – PingPlotter Pro: http://www.pingplotter.com/pro/
  15. Linux – traceroute -T
  16. Mac – Path Analyzer Pro. Uses protocol paths, not just hops. http://www.pathanalyzer.com/
  17. If the issue is on your local LAN or anywhere in SC and is ongoing, you need to address it either internally or with the state-level E-rate board.
  18. If the issue is outside SC, we can look at trying to appeal for a remedy or some kind of routing, but we can’t guarantee results.
  19. If the issue is at the hosting facility, we can fix it immediately.
  20. Standard Traceroute
  21. TCP-Based Path
  22. So… what if everything so far looks clear?
  23. It’s a SERVER(s)!
  24. Our Setup: Load Balancer; App Servers; Database Servers (Production; Replication and Reporting)
  25. How can I tell which has just gone to meet Werner Jacob? (Warning: broad simplifications ahead.)
  26. If it’s the DB servers, then everything goes to heck, starting with database retrieval, and the errors will usually say ‘SQL’ in them somewhere. But it’s quick!
  27. If it’s only the replication one, then only reports will be affected, including notices.
  28. App bricks – it’s very rare for all four app bricks to fail at once, so usually some machines will do fine while others have issues, or it appears random.
  29. Example: when catalogers have template issues, they may have lost them on one brick but not others.
  30. When a brick crashes, you will usually get errors referencing various PM files (Perl modules) or specific scripts.
  31. When it’s the load balancer – everything slows down painfully and everything goes to heck. Eventually stations will time out and errors will reflect that.
  32. Don’t jump to conclusions, but these examples should give you some insight into the kinds of things to look for.
  33. Copy errors. Observe and report. Communicate on the listserv. An IRC channel specific to SCLENDS is also available. Call Rogan in an emergency (he’s not always at his desk).
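
For the network checks in slides 12 to 16, a rough sketch of a TCP-level check in Python follows. It is not a traceroute and will not show which hop is at fault; it only times the TCP handshake end to end, which is one way to watch for loss and latency when ICMP is blocked. The function names, the default of five attempts, and the three-second timeout are assumptions for illustration, not part of the SCLENDS setup.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Time one TCP handshake in milliseconds; return None if it fails."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return None
    return (time.perf_counter() - start) * 1000.0


def probe(host, port, attempts=5, timeout=3.0):
    """Repeat the handshake and summarise failures and latency.

    The attempt count and timeout are illustrative defaults (assumptions).
    """
    samples = [tcp_connect_ms(host, port, timeout) for _ in range(attempts)]
    ok = [s for s in samples if s is not None]
    return {
        "attempts": attempts,
        "failed": attempts - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

A call such as `probe("evergreen.example.org", 443)` (a placeholder host, not a real SCLENDS address) could be run from a slow station and from a healthy one. A higher failure count or median at only one site would point toward that site's path rather than the servers.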
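
The rules of thumb in slides 26 to 31 can be written down as a small lookup. This is a sketch only and carries the same broad simplifications: the patterns, the component labels, and the function name are assumptions, and real error text may not match them.

```python
import re

# Ordered rules taken from the slides' rules of thumb (patterns are assumptions):
#   'SQL' somewhere in the error      -> database servers
#   a .pm file or Perl mentioned      -> an app brick
#   a timeout                         -> load balancer (or the network path)
RULES = [
    (re.compile(r"sql", re.IGNORECASE), "database server"),
    (re.compile(r"\.pm\b|\bperl\b", re.IGNORECASE), "app brick"),
    (re.compile(r"time(d)?[ -]?out", re.IGNORECASE), "load balancer or network path"),
]


def likely_component(error_text):
    """Return a first guess at the failing component for one copied error."""
    for pattern, component in RULES:
        if pattern.search(error_text):
            return component
    return "unknown"
```

For example, `likely_component("Request timed out")` returns `"load balancer or network path"`. The result is a first guess to include when reporting the copied error, not a diagnosis.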
