1. Network troubleshooting checklist
[Slide 1 chart: incident timeline plotting the times of Detection, Diagnosis, Repair, Recovery, Restore, Wrap-up, Workaround and Escalation, together with the Unavailable and Degraded periods, against investigation (proximate cause), component and transport checks marked Critical or Important]
2. Description Check Comments Tracking
Description of incident or outage including the symptoms displayed or experienced In progress <Problem statement> 0
SLA targets achieved? In progress <SLA measure> 0
Last known change on component impacted? Not applicable <Description of change> 1
Are suspect weather conditions associated with the incident? In progress <Comments> 0
Is the incident electrical/power related? In progress <This is an important check to complete when load shedding is occurring> 0
Are there any issues highlighted by the component checks? In progress <Description of any other suspected causes> 0
Is there a component utilization problem? In progress <Review of long term, short term and real time utilization> 0
Is the component accessible and manageable via CLI and SNMP? In progress <Comments> 0
Is the Ethernet port speed setting correct? In progress <Ethernet port speed settings> 0
Is the Ethernet port duplex setting correct? In progress <Ethernet port duplex setting> 0
Are there any CRC errors on the port? In progress <Comments> 0
Has a visual check of the cabling been conducted, and is it acceptable? In progress <Comments> 0
Has the correct rate limit, as ordered by the customer, been applied? In progress <Comments> 0
Is there a congestion problem on either the primary or backup transmission path? In progress <Comments> 0
Is there a layer 2 loop (or a symptom of one) present on any transmission path? In progress <Comments> 0
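The checklist above pairs the Ethernet speed/duplex settings with a CRC-error check because the two are related symptoms. A minimal sketch of the usual diagnostic heuristic, with illustrative field names and counter values (not a vendor API):

```python
# Sketch: classifying a likely duplex mismatch from port counters.
# Field names and thresholds are illustrative assumptions.

def likely_duplex_mismatch(local_duplex, remote_duplex, crc_errors, late_collisions):
    """A duplex mismatch typically shows CRC/alignment errors on the
    full-duplex side and late collisions on the half-duplex side."""
    if local_duplex != remote_duplex:
        return True
    # Even with matching configured values, these joint symptoms suggest a
    # mismatch somewhere on the path (e.g. one side auto-negotiated differently).
    return crc_errors > 0 and late_collisions > 0

print(likely_duplex_mismatch("full", "half", 0, 0))       # configs disagree -> True
print(likely_duplex_mismatch("full", "full", 1523, 87))   # symptoms present -> True
print(likely_duplex_mismatch("full", "full", 0, 0))       # healthy -> False
```

In practice the counters would come from the switch CLI or SNMP; the point is that CRC errors alone are ambiguous, while CRC errors combined with late collisions point strongly at duplex.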
3. Descriptions Checks Comments Tracking
Describe how the business was impacted by stating the undesired outcome In progress <Undesired outcome> 0
Conditions (the business and IT conditions present when the incident occurred) In progress <Conditions present, provide business and IT context> 0
Time when incident started 12:00:00 AM The actual time when a component was impacted or an event occurred 0
Time when incident was detected 12:01:00 AM Detected either by monitoring tools, IT resources or, worst case, the user/customer 1
Detection time was acceptable? In progress Detected either by monitoring tools, IT resources or, worst case, the user/customer 0
Is this a component failure that is hardware related? In progress <Related to component hardware fault> 0
Have any other suspected causes been identified? In progress <Description of any other suspected causes> 0
Is the component temperature status acceptable? In progress <Measurements> 0
Are the MAC addresses being learned correct? In progress <MAC addresses from location A and B> 0
Has an Ethernet OAM test been performed on the last mile link? In progress <Results of OAM> 0
Are the patches damaged? In progress <Comments> 0
Is the fibre damaged or is signal loss suspected? In progress <Comments> 0
Are pings and mtrs operating as expected? In progress <Comments> 0
Do pings with different packet sizes fail? In progress <If path MTUs are incorrectly set, black holes will occur> 0
Are any path protection flaps being recorded or logged? In progress <Comments> 0
Is the configuration of the path protection correct? In progress <Comments> 0
Are there any visible LOS issues? In progress <Comments> 0
Is the expected throughput over the distance of the link acceptable? (budget) In progress <Comments> 0
Are all MTUs along the transport path aligned? In progress <Comments> 0
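The final check above asks whether the expected throughput over the link distance is acceptable against the budget. For an optical link this is usually answered with a simple power-budget calculation; a minimal sketch, using assumed planning figures (0.35 dB/km fibre loss, 0.5 dB per connector, 0.1 dB per splice, 3 dB margin), not values from the original slides:

```python
# Sketch: a simple optical power-budget check for a fibre link.
# All loss figures are typical planning assumptions for illustration.

def link_budget_ok(tx_power_dbm, rx_sensitivity_dbm, length_km,
                   connectors=2, splices=0, margin_db=3.0,
                   fibre_loss_db_per_km=0.35):
    loss = (length_km * fibre_loss_db_per_km   # fibre attenuation
            + connectors * 0.5                 # connector losses
            + splices * 0.1)                   # splice losses
    received = tx_power_dbm - loss
    # The link is viable if the received power clears the receiver
    # sensitivity with the design margin to spare.
    return received - margin_db >= rx_sensitivity_dbm

# 20 km link: loss = 20*0.35 + 2*0.5 = 8 dB; received = -3 - 8 = -11 dBm
print(link_budget_ok(tx_power_dbm=-3, rx_sensitivity_dbm=-20, length_km=20))   # True
# 60 km link: loss = 22 dB; received = -25 dBm, below sensitivity plus margin
print(link_budget_ok(tx_power_dbm=-3, rx_sensitivity_dbm=-20, length_km=60))   # False
```

Real checks would use the measured TX/RX power from the SFP diagnostics rather than planning values.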
4. Description Check Comments Tracking Priority
Description of incident or outage including the symptoms displayed or experienced In progress <Problem statement> 0 C
Describe how the business was impacted by stating the undesired outcome In progress <Undesired outcome> 0 I
Is it a loss of connectivity? In progress <Failure or outage> 0 R
Is it a degradation of services? In progress <Typically referred to as a brown-out> 0 R
Are there any photos available? In progress <Use a smartphone to take pictures of the incident or component involved> 0 R
Has a physical inspection been conducted? In progress <Important to put eyes on the problem> 0 R
Conditions (the business and IT conditions present when the incident occurred) In progress <Conditions present, provide business and IT context> 0 I
Diagram of solution available? In progress <Reference> 0 R
Configuration of solution available? In progress <Reference> 0 R
Design and process documentation of solution available? In progress <Reference> 0 R
Main component/circuit being investigated In progress <Main circuit id> 0 R
Component/circuit location A In progress <Details of location A> 0 R
Component/circuit location B In progress <Details of location B, termination point of a circuit> 0 R
What is the type of component/circuit? In progress <Details of circuit type: access, distribution, core, dot1q, QinQ> 0 R
What is the type of cabling being used? In progress <Copper, fibre> 0 R
5. Event Date Time Description Delta Analysis Check Comments Tracking Priority
Time when incident started 2004/04/15 00:00:00 The actual time when a component was impacted or an event occurred I
Time when incident was detected 2004/04/15 00:01:00 Detection 00:01:00 Detection time was acceptable? In progress Detected either by monitoring tools, IT resources or, worst case, the user/customer 0 I
Time of diagnosis 2004/04/15 00:02:00 Diagnosis 00:01:00 Underlying cause - we know what happened R
Time of repair 2004/04/15 00:03:00 Repair 00:01:00 Repair time was acceptable? In progress Process to fix failure or corrective action initiated 0 R
Time of recovery 2004/04/15 00:04:00 Recovery 00:01:00 Component recovered, the component is back in production, service ready to be resumed R
Time of restoration 2004/04/15 00:05:00 Restore 00:01:00 Time to restore service to operational state was acceptable? In progress Normal operations resumed, the service is back in production 0 R
Time of resolution 2004/04/15 00:06:00 Wrap-up 00:01:00 Time to close incident was acceptable? In progress Customers/users informed and incident is verified as closed 0 R
Time of workaround 2004/04/15 00:07:00 Workaround 00:06:00 Time to implement workaround was acceptable? In progress Service is back in production with workaround 0 R
Time of escalation 2004/04/15 00:08:00 Escalation 00:07:00 Escalation times were acceptable? In progress Third level escalation (if required) 0 R
Time period service was unavailable 00:10:00 Unavailable 00:10:00 SLA targets achieved? In progress <SLA measure> 0 R
Time period service was degraded 00:11:00 Degraded 00:11:00 C
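The delta column in the timeline above is simply the gap between consecutive events, and the SLA-relevant unavailability runs from the start of the incident until service is restored. A small sketch reproducing the example timestamps from the table:

```python
# Sketch: computing the per-phase deltas in the slide's incident timeline.
# The timestamps reproduce the example values from the table above.
from datetime import datetime

fmt = "%Y/%m/%d %H:%M:%S"
events = [
    ("Started",    "2004/04/15 00:00:00"),
    ("Detected",   "2004/04/15 00:01:00"),
    ("Diagnosed",  "2004/04/15 00:02:00"),
    ("Repaired",   "2004/04/15 00:03:00"),
    ("Recovered",  "2004/04/15 00:04:00"),
    ("Restored",   "2004/04/15 00:05:00"),
    ("Wrapped up", "2004/04/15 00:06:00"),
]
times = [(name, datetime.strptime(ts, fmt)) for name, ts in events]

# Delta for each phase = its timestamp minus the previous event's timestamp.
deltas = {name: t - times[i][1] for i, (name, t) in enumerate(times[1:])}

# Unavailability runs from the start until service is restored (index 5).
unavailable = times[5][1] - times[0][1]

print(deltas["Detected"])   # 0:01:00
print(unavailable)          # 0:05:00
```

Feeding real monitoring timestamps through the same calculation gives the Detection, Diagnosis, Repair, Recovery and Restore figures the SLA checks ask about.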
6. Description Check Comments Tracking Priority
Have all proximate cause investigations been completed? In progress <Checkpoint to make certain all known proximate causes have been investigated> 0 R
Is this a component failure that is hardware related? In progress <Related to component hardware fault> 0 I
Were there any changes logged for this time period? In progress <List and description of changes logged> 0 R
Last known change on component impacted? Not applicable <Description of change> 1 C
Was any planned maintenance scheduled for this time period? In progress <List and time of planned maintenance tasks> 0 R
Is the incident related to a planned change or scheduled maintenance? In progress <Comments> 0 R
Are there any other components with similar issues? In progress <List of associated components> 0 R
Prevalent weather conditions In progress <Details of weather at location A and location B> 0 R
Are suspect weather conditions associated with the incident? In progress <Comments> 0 C
Has preventative maintenance been conducted at the relevant locations within an acceptable review period? In progress <Comments> 0 R
Is the incident associated with an environmental problem? In progress <Comments> 0 R
Is there scheduled load shedding? In progress <Comments> 0 I
Is the incident electrical/power related? In progress <This is an important check to complete when load shedding is occurring> 0 C
Is the incident lightning strike related? In progress <Comments> 0 R
Is there water damage? In progress <Comments> 0 R
Is the incident heat related? In progress <Comments> 0 R
Have vendor and software bugs been eliminated? In progress <Comments> 0 R
Have contractor or service provider faults been eliminated? In progress <Comments> 0 R
Have any other suspected causes been identified? In progress <Description of any other suspected causes> 0 I
7. Description Check Comment Tracking Priority
Are there any issues highlighted by the component checks? In progress <Measurement> 0 C
Is the component fan speed status acceptable? In progress <Measurement> 0 R
Is the component temperature status acceptable? In progress <Measurements> 0 I
Is the component power supply status acceptable? In progress <Measurement> 0 R
Does the component pass power-on self tests? In progress <Results> 0 R
Any unexplained component resets or violations? In progress <Description of any resets or violations> 0 R
Are all CPU, memory and file system statuses acceptable? In progress <Measurements> 0 R
Are all the VLANs correct? In progress <Descriptions of VLANs> 0 R
Is the correct firmware deployed on the component? In progress <Description of firmware> 0 R
Do the latest firmware release notes of the component describe or identify any of the problems being experienced? In progress <Latest firmware release notes> 0 R
Are the MAC addresses being learned correct? In progress <MAC addresses from location A and B> 0 I
Are RFC 2544 test results available? In progress <Results of RFC 2544 tests> 0 R
Are the SLA measurements using Y.1731 acceptable? In progress <Results of Y.1731 tests> 0 R
Is there a component utilization problem? In progress <Review of long term, short term and real time utilization> 0 C
Is the capacity being offered to the component being exceeded? In progress <Review of drops> 0 R
Is the component accessible and manageable via CLI and SNMP? In progress <Comments> 0 C
Has the component been separately tested? In progress <e.g. CPE moved from customer to high site> 0 R
Is there a Wireshark capture of the traffic involving the component? In progress <Wireshark file> 0 R
Are there any deductions that can be made from the Wireshark capture? In progress <Deductions from Wireshark file> 0 R
Is the Ethernet port speed setting correct? In progress <Ethernet port speed settings> 0 C
Is the Ethernet port duplex setting correct? In progress <Ethernet port duplex setting> 0 C
Are the Ethernet port statistics within acceptable limits? In progress <Ethernet port statistics> 0 R
Do the link LEDs indicate proper cable connection? In progress <Comments> 0 R
Are there any CRC errors on the port? In progress <Comments> 0 C
Are there any cabling related problems? In progress <Comments> 0 R
Are there any connected radio related problems? In progress <Comments> 0 R
Are there any pause frames on the component? In progress <Switch or radio pause frames> 0 R
Are traffic counters incremented in both directions on both facing ports of the component/link? In progress <Comments> 0 R
Have ports been administratively reset? In progress <Comments> 0 R
Have ports been physically reset? In progress <Comments> 0 R
Are the switch ports configured correctly? In progress <Trunk or access requirements> 0 R
Is there a disabled port due to a fault? In progress <Comments> 0 R
Are the ports and cables correctly labeled? In progress <Comments> 0 R
Do the physical connections correspond to the labeling and documentation? In progress <Comments> 0 R
Are the descriptions on the switch port configuration correct? In progress <Comments> 0 R
Has an Ethernet OAM test been performed on the last mile link? In progress <Results of OAM> 0 I
Are all PoE components functioning correctly? In progress <Comments> 0 R
8. Description Check Comment Tracking Priority
Has a visual check of the cabling been conducted, and is it acceptable? In progress <Comments> 0 C
Are there photos available of the cabling? In progress <Comments> 0 R
Are the patches damaged? In progress <Comments> 0 I
Are the port SFPs/XFPs damaged? In progress <Comments> 0 R
Has the link been tested? In progress <Test results> 0 R
Are the maximum link lengths being exceeded? In progress <Comments> 0 R
Has a cable recently been moved from one port to another? In progress <Comments> 0 R
Are the fibre pigtails correctly connected (RX to TX)? In progress <Comments> 0 R
Does the fibre pigtail have a half break? In progress <Comments> 0 R
Is the fibre within allowable attenuation limits? In progress <Comments> 0 R
Are the links being connected at compatible frequencies/types? In progress <Comments> 0 R
Has the fibre been recently cleaned? In progress <Comments> 0 R
Is the fibre damaged or is signal loss suspected? In progress <Comments> 0 I
Has the correct rate limit, as ordered by the customer, been applied? In progress <Comments> 0 C
Are the management VLANs correctly provisioned? In progress <Comments> 0 R
Is the ratio between broadcasts and unicasts acceptable? In progress <Less than 10%> 0 R
Has a broadcast filter been configured? In progress <Comments> 0 R
Is there a broadcast problem? In progress <Comments> 0 R
Is the correct IP address assigned? In progress <Comments> 0 R
Are the correct subnet, subnet mask and associated gateway assigned? In progress <Comments> 0 R
Are pings and mtrs operating as expected? In progress <Comments> 0 I
Do pings with different packet sizes fail? In progress <If path MTUs are incorrectly set, black holes will occur> 0 I
Is there a congestion problem on either the primary or backup transmission path? In progress <Comments> 0 C
Is there a layer 2 loop (or a symptom of one) present on any transmission path? In progress <Comments> 0 C
Is there path protection? In progress <Comments> 0 R
Are any path protection flaps being recorded or logged? In progress <Comments> 0 I
Is the configuration of the path protection correct? In progress <Comments> 0 I
Any indication of asymmetrical traffic? In progress <Comments> 0 R
Is there any radio interference problem? In progress <Comments> 0 R
Are there any LOS photos? In progress <Comments> 0 R
Are there any visible LOS issues? In progress <Comments> 0 I
Is the expected throughput over the distance of the link acceptable? (budget) In progress <Comments> 0 I
Does the radio report BERs? In progress <Comments> 0 R
Are links synchronized? In progress <Comments> 0 R
Has self-interference been eliminated? In progress <Comments> 0 R
Has external interference been eliminated? In progress <Comments> 0 R
Are all MTUs along the transport path aligned? In progress <Comments> 0 I
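The "pings with different packet sizes" check above works because an ICMP payload only fits unfragmented if it leaves room for the IP and ICMP headers; when a path element silently carries a smaller MTU, large pings with DF set vanish while small ones succeed. A sketch of the arithmetic, assuming IPv4 (20-byte header, no options) and the 8-byte ICMP header:

```python
# Sketch: largest ping payload that fits in one unfragmented IPv4 packet.
# Assumes a 20-byte IP header (no options) and an 8-byte ICMP header.

def max_ping_payload(path_mtu):
    """Largest `ping -s` payload size that fits within path_mtu."""
    return path_mtu - 20 - 8

# With a standard 1500-byte Ethernet MTU:
print(max_ping_payload(1500))   # 1472
# Over a path with a smaller effective MTU (e.g. a tunnel at 1400 bytes),
# 1472-byte pings with DF set will fail while smaller pings succeed:
print(max_ping_payload(1400))   # 1372
```

Binary-searching the payload size between the two values pinpoints the effective path MTU, which can then be compared against the configured MTUs along the transport path.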
9. Web site: http://www.deesmith.co.za
Linkedin: https://www.linkedin.com/company/1286803
Spinning the wheel of continuous network improvement
One of the main tools for understanding IT maturity is the Deming wheel. The wheel was taken up and promoted very effectively from the 1950s onwards by the famous quality management authority, W. Edwards Deming.
Plan-Do-Check-Act
• Plan to improve service management by determining what is going wrong (that is, identify the problems).
• Do changes designed to solve the problems on a small and incremental scale first. This minimises disruption to the live environment while testing whether the changes are workable.
• Check whether the small and incremental changes are achieving the desired result. Also, continuously monitor nominated key activities to ensure that you know what the quality of the output is at all times.
• Act to implement changes on a larger scale if the small changes are successful. This means scheduling the changes as part of the standard maintenance and administrative tasks. Also act to involve the resources (people, partners, products and processes) affected by the changes and obtain buy-in to implement them.
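The four steps above form a loop that repeats as long as improvement continues. A minimal sketch of that loop in code, with stand-in functions whose names are assumptions for illustration, not part of any real framework:

```python
# Sketch: the Plan-Do-Check-Act cycle as a loop. The plan/do/check/act
# callables are illustrative stand-ins.

def pdca(plan, do, check, act, iterations=3):
    state = plan()                  # Plan: identify what is going wrong
    for _ in range(iterations):
        trial = do(state)           # Do: a small, incremental change
        if check(trial):            # Check: did it achieve the desired result?
            state = act(trial)      # Act: roll the change out on a larger scale
    return state

# Toy example: drive a measured error rate down toward a target.
result = pdca(plan=lambda: 10,          # baseline measurement
              do=lambda s: s - 2,       # incremental improvement
              check=lambda t: t < 10,   # better than the baseline?
              act=lambda t: t)          # adopt the improved state
print(result)   # 4
```

The point of the structure is that failed checks leave the current state untouched, so unsuccessful small changes never reach the larger-scale Act step.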
Dee Smith and Associates has made a tool available that uses some of these and other principles. It has its roots in the Major Incident Process. During a network event, which is typically a failure or outage, one of the steps is to perform a diagnosis, and a tool to assist in diagnosis is a network troubleshooting checklist.
The purpose of the tool is to let a resource, even one who is less than expert, execute the checklist to assist in diagnosis. The checklist is constructed from previous experiences, and checks that are often common causes are highlighted as critical, followed by important and then routine. This is crucial because you want common causes to be prioritized, a well-known principle from the Italian economist Pareto.
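The Pareto-style ordering described above, with the Critical/Important/Routine priorities used throughout the slides, can be sketched as a simple sort. The sample items are drawn from the checklists; the data structure is an illustrative assumption:

```python
# Sketch: running checklist items in priority order (Critical, then
# Important, then Routine), matching the C/I/R flags in the slides.

PRIORITY = {"C": 0, "I": 1, "R": 2}

checks = [
    ("Has a visual check of the cabling been conducted?", "C"),
    ("Are pings and mtrs operating as expected?", "I"),
    ("Are the ports and cables correctly labeled?", "R"),
    ("Is the incident electrical/power related?", "C"),
]

# Python's sort is stable, so items of equal priority keep their
# original checklist order.
ordered = sorted(checks, key=lambda c: PRIORITY[c[1]])

for question, prio in ordered:
    print(prio, question)
```

Run in this order, the common-cause (critical) checks are always investigated first, which is exactly the prioritisation the text argues for.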
As an example, during periods of load shedding it is crucial to make certain that the outage or failure is not related to power. One method of checking is to use an app such as Grid Watch. Power issues can relate not only to the absence of power but also to equipment failures caused by power cycles and surges. Thus power checks are marked as critical.
Another consideration is weather conditions. When radio is used in microwave and satellite links, adverse weather conditions can trigger an outage. It is possible to view weather conditions via AccuWeather or subscribe to a weather alert service.
A checklist is used to compensate for the weaknesses of human memory and to help ensure consistency and completeness in carrying out a diagnosis. Pilots were the first to use this methodology to overcome these weaknesses.
A checklist is a form used to standardise a process. It includes a list of items to check, steps to take, or information to gather. Pilots use checklists to do pre-flight checks. Checklists are useful to make sure no items are missed, which may happen if a person just does what they remember. Checklists can also aid in guiding less experienced resources.
Although we refer specifically to networks here as an example, the methods described can be used for any ICT infrastructure service. This forms an important part of a greater due diligence related to IT.