2. HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
Contents
1. Overview of GPON System Troubleshooting
2. Categorized System Fault Troubleshooting
3. Case Study
GPON System Troubleshooting Overview
System faults are service faults caused by host system failures, registration problems, or upgrade problems.
In practice, system faults mainly include:
Host software fault
- Software version mismatch
- Version upgrade failure
Board fault
- Service board fails to communicate with the control board
- Board is aging or damaged
Communication with the NMS failed
- Incorrect configuration of network management parameters
- Uplink fault
……
Preparations
GOALS: fast locating, fault prediction
Professional knowledge
- Software: version features, system logical structure, and software components
- Hardware: system structure, board features, and data forwarding procedure
Basic operation
- Basic command line operation
- System maintenance operations such as backup and loading
Typical faults
- Typical system faults
- Typical communication faults
System Troubleshooting Procedure
START
1. Confirm the system environment
2. Check the LED status
3. Check the alarms
4. Troubleshoot according to the fault type
END
Confirm System Environment
- Check whether the inside of the cabinet is clean and free of heavy dust.
- Check whether the cabinet and the cable distribution frame are properly grounded.
- Check whether the temperature in the room, in the cabinet, or on the boards is too high.
- In FAN mode, run the display fan alarm command to check whether the fan is working normally.
- Check whether the power supply is normal and whether it conforms to the project standard.
Checking the LED Status
[Figure: board front panels with LED indicators: RUN/Alarm, ACT, PON, BSY]
Checking Alarms
How to obtain alarm information:
- NMS (NCE) alarm panel
- The display alarm history command, run on the host
- LED indicators on the board panels
Alarm classification: communication, service quality, processing error, equipment and hardware, environment
Alarm level: critical, major, minor, warning
Alarm reason: all kinds
Checking Alarms
display alarm history all
ALARM 1031 FAULT MAJOR 0x2e314020 EQUIPMENT 2020-01-14 17:58:59+08:00
ALARM NAME : The optical transceiver of the PON port is absent
SRVEFF : SA
PARAMETERS : FrameID: 0, SlotID: 4, PortID: 0
DESCRIPTION : The optical transceiver of the PON port is absent and the service is interrupted
CAUSE : (1) The optical transceiver does not exist
(2) The optical transceiver is inserted loosely
ADVICE : (1) Check whether the optical transceiver exists in the PON port
(2) Ensure the optical transceiver is inserted tight
--- END
ALARM 1032 RECOVERY CLEARED 0x2e324020 EQUIPMENT 2020-01-14 17:59:39+08:00
ALARM NAME : The optical transceiver of the PON port recovers to the normal state
SRVEFF : NSA
PARAMETERS : FrameID: 0, SlotID: 4, PortID: 0
DESCRIPTION : The optical transceiver of the PON port recovers to the normal state
CAUSE : None
ADVICE : No need to proceed
--- END
Tip: Alarms can be queried by alarm ID, alarm serial number, alarm type, alarm time, or alarm level.
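When many alarms have to be reviewed, the text output of display alarm history can be turned into records for filtering. The following is a minimal Python sketch, not a Huawei tool. It assumes the output keeps the layout shown above, and it assumes that the header fields are, in order, serial number, alarm type, level, alarm ID, alarm class, and time. The function name parse_alarm_history and the dictionary keys are hypothetical.

```python
import re

# Header line, for example:
# ALARM 1031 FAULT MAJOR 0x2e314020 EQUIPMENT 2020-01-14 17:58:59+08:00
# The meaning assigned to each position is an assumption based on the sample.
HEADER = re.compile(
    r"^ALARM\s+(?P<serial>\d+)\s+(?P<type>\w+)\s+(?P<level>\w+)\s+"
    r"(?P<alarm_id>0x[0-9a-fA-F]+)\s+(?P<category>\w+)\s+(?P<time>\S+ \S+)$"
)
# Field line, for example: "ALARM NAME : The optical transceiver ..."
FIELD = re.compile(r"^(?P<key>[A-Z ]+?)\s*:\s*(?P<value>.*)$")


def parse_alarm_history(text):
    """Parse alarm history text into a list of dictionaries."""
    alarms, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A header line starts a new alarm record.
            current = header.groupdict()
            alarms.append(current)
            last_key = None
        elif current is None or not line or line.startswith("---"):
            # Skip the command echo, blank lines, and "--- END" markers.
            continue
        else:
            field = FIELD.match(line)
            if field:
                last_key = field.group("key").lower().replace(" ", "_")
                current[last_key] = field.group("value")
            elif last_key:
                # Continuation line such as "(2) ..." under CAUSE or ADVICE.
                current[last_key] += " " + line
    return alarms
```

For example, [a for a in parse_alarm_history(output) if a["level"] == "MAJOR"] keeps only the major alarms.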
Category and Cause of Common Faults
S/N  Fault Category  Possible Cause
1  Board registration failure
   ◆ The board version and the host version do not match.
   ◆ The slot is already registered with another board, and upgrade between the two boards is not supported.
   ◆ The board is loose in its slot.
   ◆ The power supply, temperature, or fan is abnormal.
   ◆ The board is damaged.
2  Inband NMS disconnection
   ◆ The NMS version and the device version do not match.
   ◆ The upper-layer device is faulty.
   ◆ The upstream board is faulty.
   ◆ The transmission line is faulty.
   ◆ The NMS data configuration is incorrect (such as community name, access list, MTU).
   ◆ There is no route between the device and
   ◆ The transmission format is not compatible.
3  Repeated reboot of the control board
   ◆ The components of the control board are damaged.
   ◆ The backplane pins are damaged.
   ◆ The environment or the fan is abnormal.
   ◆ The subscriber ring network is faulty.
   ◆ The control board is not inserted tightly.
   ◆ The CPLD, BIOS, or programs are loaded incorrectly.
4  Switchover failure
   ◆ The active/standby software versions do not match.
   ◆ The active/standby hardware versions do not match.
   ◆ The standby board or the upstream port is faulty.
   ◆ The data between the active/standby control boards is not fully synchronized.
   ◆ The system is saving the configuration.
Board Registration Failure
- Check the version integration
- Change the board slot
- Check the type of the new board
- Check the power, temperature, and fan
- Check the board status
- Check the board LED
Inband NMS Disconnection
1. Confirm the fault scope: large-scale disconnection, or only one MA5800 failure.
2. Check the upstream board status: check the LEDs or run "display board 0", check the interface board, and check the fiber.
3. Check the route: run display ip routing-table and ping the gateway.
4. Check the configuration data: run display snmp-agent xx.
5. Check the ACL configured on the host: run display acl.
6. Check the traffic on the upstream router, and check the fiber.
Repeated Reboot of the Control Board
The possible causes are as follows:
The components of the control board are damaged
The backplane pins are damaged
The environment and the fan are abnormal
The subscriber ring network is faulty
The control board is not inserted tightly
The loaded version is incorrect
Switchover Failure
Checks:
- Check the software version of the active/standby control boards.
- Check the subboard of the active/standby control boards.
- Check the standby board and upstream ports.
- Check the status of data synchronization between the active/standby control boards.
Commands:
- display language/version
- display version
- display board
- display data sync state
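The switchover checks can be summarized as a pre-check that lists every condition that would block a switchover. The following Python sketch is illustrative only. It assumes the versions and states have already been read manually from the display commands. The ControlBoard class, its fields, the status value "normal", and the function name switchover_blockers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControlBoard:
    """Values read manually from the device; the field names are hypothetical."""
    software_version: str
    hardware_version: str
    status: str  # "normal" is an illustrative value for a healthy board


def switchover_blockers(active, standby, data_synchronized, saving_configuration):
    """Return the reasons why a switchover is likely to fail (empty if none)."""
    blockers = []
    if active.software_version != standby.software_version:
        blockers.append("active/standby software versions do not match")
    if active.hardware_version != standby.hardware_version:
        blockers.append("active/standby hardware versions do not match")
    if standby.status != "normal":
        blockers.append("standby board is faulty")
    if not data_synchronized:
        blockers.append("data is not fully synchronized")
    if saving_configuration:
        blockers.append("system is saving the configuration")
    return blockers
```

An empty list means that none of the listed causes applies. It does not guarantee that the switchover will succeed.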
Case1: MA5800 Fails to Send Trap Messages
Description
An MA5800 can be added to the NMS successfully and the NMS can manage the NE, but the NMS cannot receive any alarm messages from the device. A packet capture on the uplink port of the MA5800 shows no trap messages.
Alarm
No
Cause analysis
OLT hardware fault
Faulty link between the OLT and the NMS
"Trap report" function disabled on the OLT
Other SNMP parameters incorrect
Troubleshooting
The OLT runs normally, which excludes a hardware fault.
The OLT can ping the NMS successfully, which indicates that the NMS is reachable.
Check the SNMP configuration on the MA5800:
1. snmp-agent community read public
2. snmp-agent community write private // read/write community
3. snmp-agent target-host trap-hostname NCE address 192.168.2.8 udp-port 162 trap-paramsname huawei // destination IP address of trap messages, with port number 162
4. snmp-agent trap enable standard // enable the trap function
5. snmp-agent trap source vlanif 28 // set the source of trap messages
6. snmp-agent sys-info version v2c // set the SNMP version
One command is found to be missing:
7. snmp-agent target-host trap-paramsname huawei v2c securityname WangGuan
// The trap-paramsname in this command must be the same as the trap-paramsname in command 3. The value can be any string.
After this command is run, the NMS receives trap messages from the OLT normally.
Experience & Conclusion:
Many engineers forget the command "snmp-agent target-host trap-paramsname". This command is essential: if it is missing or configured incorrectly, the NMS cannot receive trap messages.
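The check in this case can be automated against a saved configuration. The following Python sketch looks for trap-paramsname values that a trap-hostname command refers to but that no trap-paramsname command defines. It assumes the command layout shown in this case and is not a complete SNMP configuration validator. The function name missing_trap_params is hypothetical.

```python
def missing_trap_params(config_lines):
    """Return trap-paramsname values that are referenced but never defined."""
    referenced, defined = set(), set()
    for line in config_lines:
        # Drop trailing "// ..." comments, then split into tokens.
        tokens = line.split("//")[0].split()
        if tokens[:2] != ["snmp-agent", "target-host"]:
            continue
        if tokens[2:3] == ["trap-hostname"] and "trap-paramsname" in tokens:
            # The name follows the "trap-paramsname" keyword.
            index = tokens.index("trap-paramsname")
            if index + 1 < len(tokens):
                referenced.add(tokens[index + 1])
        elif tokens[2:3] == ["trap-paramsname"] and len(tokens) > 3:
            defined.add(tokens[3])
    return sorted(referenced - defined)
```

With commands 1 to 6 only, the function returns ["huawei"]. After command 7 is added, it returns an empty list.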
Case2: Failure to Log In to the Device Remotely Through Telnet
Description
When a user logs in to the MA5800 through Telnet, the system prompts "too many
users" and the login fails.
Alarm
No
Cause Analysis
The possible causes are as follows:
The number of remote login users has exceeded the system limit
The system software is faulty
The hardware such as the control board is faulty
Virus attacks
Other causes
Troubleshooting
Run the display client command or the display terminal user online command to check the user information. Only users logged in through the serial port are found.
The collected information shows that certain fixed IP addresses repeatedly attempt to log in to the MA5800 through port 23. Because these IP addresses are not owned by the operator, a virus or malicious attack is suspected.
To solve the problem, enable the firewall and configure ACL rules on the MA5800.
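The analysis of the collected information can be sketched as follows: count Telnet login attempts per source address and keep the sources that repeat and fall outside the operator's address ranges. This Python sketch assumes the source addresses have already been extracted from a packet capture or log. The function name suspicious_sources, the min_attempts threshold, and the input format are hypothetical.

```python
from collections import Counter
from ipaddress import ip_address, ip_network


def suspicious_sources(login_sources, operator_networks, min_attempts=3):
    """Return {source IP: attempts} for repeated logins from foreign addresses.

    login_sources: source IP strings, one per attempt on port 23.
    operator_networks: CIDR strings for the address ranges the operator owns.
    """
    networks = [ip_network(network) for network in operator_networks]
    counts = Counter(login_sources)
    return {
        ip: attempts
        for ip, attempts in counts.items()
        if attempts >= min_attempts
        and not any(ip_address(ip) in network for network in networks)
    }
```

The returned addresses are candidates for deny rules in the ACL. They should be reviewed before anything is blocked.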
Case3: A Private Line User Fails to Open a Web Page
Description
New private line users under the OLT fail to open web pages. The terminal fails to ping the
gateway (BRAS) and services of other PPPoE users on the OLT are normal. These private
line users can open web pages if they access the Internet through PPPoE dialup.
Alarm
No
Cause Analysis
The possible causes are as follows:
Data configurations of the private line service are incorrect.
Data configurations on the BRAS are incorrect.
Troubleshooting
Check whether data configurations on the BRAS are correct. Specifically, disconnect the OLT from the PC, and configure the same VLAN on the Layer 3 switch (replace the OLT with the Layer 3 switch for networking). Connect the PC to the Layer 3 switch, configure a static IP address for the PC, and test services. Service provisioning is normal, which indicates that data configurations on the BRAS are correct.
Connect the OLT to the PC again, check whether data configurations on the OLT are correct, and ping the gateway (BRAS) by using the static IP address. The ping operation fails. Query packets on the BRAS: no packets are found. This indicates that data configurations on the OLT are incorrect.
Check data configurations on the OLT further. Service flows and VLANs are correct. Configure a Layer 3 interface on the OLT with an IP address in the same network segment, and use this IP address to ping the PC and the BRAS. Pinging the PC succeeds but pinging the BRAS fails. This indicates that a fault occurs on the upstream port of the OLT.
Check the configurations further. The OLT is found to have the following ACL configurations:
huawei(config)#acl number 2000
huawei(config)#rule 5 deny
huawei(config)#rule 10 permit source x.x.x.x 0
huawei(config)#rule 15 permit source x.x.x.x 0
huawei(config)#packet-filter inbound ip-group 2000 rule 5 port 0/3/0
huawei(config)#packet-filter inbound ip-group 2000 rule 10 port 0/3/0
huawei(config)#packet-filter inbound ip-group 2000 rule 15 port 0/3/0
With these ACL configurations, upstream port 0/3/0 permits only IP packets carrying source IP address x.x.x.x and drops all other IP packets. As a result, Internet access packets of users with static IP addresses are dropped. ACL rules (2000-2999) apply to IP packets: they can filter only packets that use the standard IPoE encapsulation format and do not match packets that use the PPPoE encapsulation format. Therefore, these ACL rules have no effect on PPPoE users.
Delete these ACL configurations. Then, the Internet access service is normal.
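The different behavior for static-IP users and PPPoE users can be illustrated with a simplified model of the net effect described above. This Python sketch is not the device's rule-matching algorithm. It assumes that the IP ACL can examine only frames with the IPv4 EtherType (0x0800) and that PPPoE session frames (EtherType 0x8864) pass without being matched. The function name filter_action and the sample addresses are hypothetical.

```python
ETHERTYPE_IPV4 = 0x0800           # standard IPoE frame
ETHERTYPE_PPPOE_SESSION = 0x8864  # PPPoE session frame


def filter_action(ethertype, source_ip, permitted_sources):
    """Return "permit" or "deny" for a frame arriving on the filtered port."""
    if ethertype != ETHERTYPE_IPV4:
        # The IP ACL cannot match this frame, so the frame is not filtered.
        return "permit"
    # IPoE frame: only the explicitly permitted sources pass, the rest drop.
    return "permit" if source_ip in permitted_sources else "deny"


permitted = {"192.0.2.1"}  # stands in for the x.x.x.x entries, illustrative only

# A private line (static IP) user is dropped.
print(filter_action(ETHERTYPE_IPV4, "198.51.100.20", permitted))
# A PPPoE user is unaffected.
print(filter_action(ETHERTYPE_PPPOE_SESSION, "198.51.100.20", permitted))
```

The first call prints deny and the second prints permit, which matches the symptom: private line users fail while PPPoE users on the same OLT work normally.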
Case4: Some Users Encounter Slow Internet Access
Description
Some users connected to an OLT encounter slow Internet access. A check shows that the aggregation switch also connects to four other OLTs, and Internet access of the users connected to those four OLTs is normal.
Alarm
No
Cause Analysis
The possible causes are as follows:
The fault may occur on the upper-layer device or the aggregation switch.
Port negotiation information and statistics are incorrect.
The MAC address table is abnormal.
Troubleshooting
Check the port configurations of the other four OLTs. The configurations are correct; therefore, the upper-layer device and the aggregation switch are normal.
Log in to the OLT that encounters slow Internet access and check the port negotiation information and statistics:
- Check the negotiation result on the upstream port of the OLT. The port has negotiated full-duplex, which is correct.
- Check the statistics of received and transmitted packets. The number of discarded frames on the upstream port increases dramatically, but the symptom does not occur on other normal nodes. The discarded frames may be the cause of the low download speed.
Check whether the MAC address table is abnormal.
- It is found that the MAC address aging time has been set to No aging (mac-address timer no-aging command), and the MAC address of the BRAS is learned by multiple service ports.
- Run the mac-address timer 300 command to set the MAC address aging time to 300s (default value).
- After 10 minutes, check again. The MAC address table has returned to normal and only the upstream board learns the MAC address of the BRAS. Meanwhile, the number of discarded frames no longer increases and the download speed becomes normal. Therefore, the fault occurred because an operator set the MAC address aging time to No aging during a test and did not restore the setting after the test.
Suggestion:
Keep the MAC address aging time at the default value of 300s to ensure normal Internet access.
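The abnormal MAC table in this case has a recognizable signature: one MAC address, here that of the BRAS, is learned on more than one port. The following Python sketch detects that pattern in a list of (MAC, port) pairs. It assumes the pairs have already been extracted from the device's MAC address table. The function name macs_on_multiple_ports and the port strings are hypothetical.

```python
from collections import defaultdict


def macs_on_multiple_ports(mac_table):
    """Return {mac: [ports]} for every MAC address learned on several ports.

    mac_table: iterable of (mac, port) pairs taken from the MAC address table.
    """
    ports_by_mac = defaultdict(set)
    for mac, port in mac_table:
        # Normalize case so the same address is not counted twice.
        ports_by_mac[mac.lower()].add(port)
    return {
        mac: sorted(ports)
        for mac, ports in ports_by_mac.items()
        if len(ports) > 1
    }
```

If the MAC address of the BRAS appears in the result with service ports listed next to the upstream port, the table matches the fault described in this case.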
Case5: Error 676 Occasionally Occurs in PPPoE Dialup
Description
Network topology: ONU -> optical line terminal (OLT) -> switch -> broadband remote access
server (BRAS)
An ONU user occasionally encounters error 676 in PPPoE dialup. In addition, the fault
duration varies.
Alarm
No
Cause Analysis
The possible causes are as follows:
An ONU user forges the MAC address of the BRAS.
An OLT user forges the MAC address of the BRAS.
The configuration of the BRAS or remote authentication dial in user service (RADIUS)
is incorrect.
A VLAN conflict occurs.
Troubleshooting
Check the configurations of the BRAS and RADIUS. The configurations are correct.
Check the VLAN configurations on the upper-layer aggregation switch and OLT. The VLAN configurations are correct.
When an ONU user in VLAN A encounters error 676, another user in the same VLAN is already online on the BRAS. Therefore, it is determined that a VLAN conflict occurs on the BRAS. The BRAS is configured to allow a maximum of one concurrent online user in PPPoE dialup. Therefore, if a user is already online in the conflicting VLAN, other users in the same VLAN encounter error 676 in PPPoE dialup.
Query the configurations of all ONUs connected to the OLT. It is found that some ONUs have the same outer VLAN tag and the same inner VLAN tag. Modify the VLAN settings so that they have different outer tags and different inner tags. Then, the fault is rectified.
Suggestion
Properly plan VLANs for the PON ports on an OLT. It is suggested to configure a separate outer VLAN tag for each PON port. In this way, the fault location scope is minimized even when an inner VLAN conflict occurs.
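The query of ONU configurations can be sketched as a search for ONUs that share the same (outer VLAN, inner VLAN) pair. This Python sketch assumes the VLAN tags of each ONU have already been collected from the OLT. The function name vlan_conflicts and the ONU labels are hypothetical.

```python
from collections import defaultdict


def vlan_conflicts(onu_vlans):
    """Return {(outer, inner): [onus]} for tag pairs used by more than one ONU.

    onu_vlans: mapping of ONU label -> (outer VLAN, inner VLAN).
    """
    onus_by_tags = defaultdict(list)
    for onu, tags in onu_vlans.items():
        onus_by_tags[tuple(tags)].append(onu)
    return {
        tags: sorted(onus)
        for tags, onus in onus_by_tags.items()
        if len(onus) > 1
    }
```

An empty result means that every ONU has a unique combination of outer and inner tags. Any entry in the result identifies users that the BRAS would see in the same VLAN.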
Summary
PON system troubleshooting procedure
Categorized common system faults and handling methods
Typical system faults and troubleshooting