
The Action Against Soft-Errors to Prevent Service Outage


The Action Against Soft-Errors to Prevent Service Outage presented by Hidenori Iwashita - Nippon Telegraph and Telephone Corporation (NTT). NTT has reproduced soft errors using a compact accelerator-driven neutron source and reduced service outages and failure recovery costs due to soft errors.



  1. Copyright 2015 QuEST Forum. All Rights Reserved. The Action Against Soft Errors to Prevent Service Outages. NTT Network Service Systems Laboratories, Hidenori Iwashita. 2015 APAC QuEST Forum APAC Best Practices Conference, April 2015.
  2. Agenda. (1) Soft error problems: laboratory non-reproducible errors; silent errors. (2) Soft error mechanisms: soft errors are caused by cosmic rays. (3) The increase of soft errors: with miniaturization of LSI design rules, soft errors are increasing rapidly. (4) Practices: soft error tests using a compact accelerator neutron source. (5) Results. (6) Conclusion: NTT can reduce service outages and failure recovery costs due to soft errors.
  3. 1. Soft error problems: laboratory non-reproducible errors. ① An error occurs in the network system. ② An alarm reaches the network operations center. ③ The unit is returned to the manufacturer's factory. ④ The factory runs tests. ⑤ The tests pass ("Test OK"), so the error is not reproduced.
  4. 1. Soft error problems: silent errors. ① A user complains to the network operations center: "I can't connect!" The error is not alarmed and the faulty node is unknown, so the failure is prolonged and becomes significant → press release (newspaper, TV).
  5. 2. Soft error mechanisms: neutrons generated by cosmic rays. Cosmic rays (high-energy particles) from the Sun and from supernova explosions reach the Earth. Through nuclear reactions in the atmosphere, these high-energy particles destroy nuclei (O or N) and produce neutrons, protons, muons, and π-mesons.
  6. 2. Soft error mechanisms: nuclear reactions in the device. Neutrons that reach the network system destroy silicon nuclei in the device. The secondary ions (and protons) released by this reaction cause a soft error (bit error).
  7. 3. The increase of soft errors. Miniaturization of LSI design rules (higher integration) increases soft errors. In the past, soft errors occurred only in space or in the sky; now they also occur at ground level.
  8. 3. The increase of soft errors: how often do soft errors occur? An FPGA contains a large amount of SRAM. Because SRAM has a lower critical charge (it is more sensitive), soft errors occur more frequently, and without soft error mitigation an FPGA exceeds 10,000 FIT (1 FIT is one failure per 10^9 device-hours). E.g., with 6 FPGAs per unit and 1000 units in the network, about 1.5 devices fail per day.
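
The "about 1.5 devices per day" figure on this slide can be checked with a few lines of arithmetic. The sketch below assumes the standard definition of FIT (one failure per 10^9 device-hours) and takes the slide's example numbers at face value; the function name `failures_per_day` is made up for illustration.

```python
# 1 FIT = 1 failure per 10^9 device-hours (standard definition).
DEVICE_HOURS_PER_FIT = 1e9


def failures_per_day(fit_per_device, devices_per_unit, units):
    """Expected failures per day across the whole network."""
    total_fit = fit_per_device * devices_per_unit * units
    failures_per_hour = total_fit / DEVICE_HOURS_PER_FIT
    return failures_per_hour * 24


# Slide example: 10,000 FIT per FPGA, 6 FPGAs per unit, 1000 units.
rate = failures_per_day(10_000, 6, 1_000)
print(f"{rate:.2f} failures per day")
```

With these inputs the result is about 1.44 failures per day, which the slide rounds to 1.5.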
  9. 4. Practices: developing and applying soft error countermeasures.
  10. 4. Practices, Step 1: specifying requirements. Start from the planned network scale (e.g., 1000 units on the network) and specify the requirement (e.g., 1 failure per month on the network ⇒ about 1300 FIT per unit).
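
The conversion from a network-level failure target to a per-unit FIT budget can be sketched as follows. The slide does not say how many hours it counts per month, so the default of 730 hours (8760 / 12) is an assumption, and `fit_budget_per_unit` is a hypothetical helper name.

```python
def fit_budget_per_unit(failures_per_month, units, hours_per_month=730.0):
    """Per-unit FIT budget for a network-wide failure target.

    hours_per_month defaults to 730 (8760 h / 12), which is an
    assumption; the slide does not state the value it used.
    """
    failures_per_unit_hour = failures_per_month / hours_per_month / units
    return failures_per_unit_hour * 1e9  # 1 FIT = 1 failure per 10^9 hours


budget = fit_budget_per_unit(failures_per_month=1, units=1000)
print(f"about {budget:.0f} FIT per unit")
```

With a 730-hour month this gives roughly 1370 FIT per unit; a 744-hour (31-day) month gives roughly 1344. Both are consistent with the slide's "about 1300 FIT / unit".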
  11. 4. Practices, Step 2: simulating soft errors. We simulate the soft error rate of each device to find the devices with high rates. Example values:

      Device       Design rule [nm]   Size [Mb]   Soft error rate [FIT]
      CPU SRAM     65                 2           200
      FPGA SRAM    28                 100         10000
      ASIC SRAM    90                 2           150
      DRAM ①       40                 500         10
      DRAM ②       40                 500         10
      DRAM ③       40                 500         10
      DRAM ④       40                 500         10
      SRAM ①       65                 10          1000
      SRAM ②       65                 1           100
      SRAM ③       65                 10          1000
      SRAM ④       65                 2           200
      SRAM ⑤       65                 10          1000
      Flash Mem    90                 50          50

      [Figure: substrate layout showing the FPGA, ASIC, CPU, SRAMs, DRAMs, and flash memory, with four devices marked "High".]
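
One way to use the table values from this slide is to add up the per-device rates and compare the total with the per-unit requirement from Step 1. This is only a sketch: the slide does not show this calculation, the 1300 FIT budget is carried over from the Step 1 example, and the 1000 FIT cutoff used to flag "high" devices is an assumption chosen for illustration.

```python
# (device, design rule [nm], size [Mb], soft error rate [FIT]) from the slide.
DEVICES = [
    ("CPU SRAM", 65, 2, 200),
    ("FPGA SRAM", 28, 100, 10000),
    ("ASIC SRAM", 90, 2, 150),
    ("DRAM 1", 40, 500, 10),
    ("DRAM 2", 40, 500, 10),
    ("DRAM 3", 40, 500, 10),
    ("DRAM 4", 40, 500, 10),
    ("SRAM 1", 65, 10, 1000),
    ("SRAM 2", 65, 1, 100),
    ("SRAM 3", 65, 10, 1000),
    ("SRAM 4", 65, 2, 200),
    ("SRAM 5", 65, 10, 1000),
    ("Flash Mem", 90, 50, 50),
]

UNIT_BUDGET_FIT = 1300     # per-unit requirement from the Step 1 example
HIGH_THRESHOLD_FIT = 1000  # assumed cutoff, not stated on the slide

total_fit = sum(fit for _, _, _, fit in DEVICES)
high = [name for name, _, _, fit in DEVICES if fit >= HIGH_THRESHOLD_FIT]

print(f"unmitigated total: {total_fit} FIT (budget {UNIT_BUDGET_FIT} FIT)")
print("devices to mitigate first:", ", ".join(high))
```

Under these assumptions the unmitigated unit is far over budget, and the FPGA SRAM alone accounts for most of the total, which is why countermeasures are chosen per device in the next step.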
  12. 4. Practices, Step 3: applying soft error countermeasures. Select the appropriate countermeasures to suit each function. (1) Reducing soft errors: use devices with low soft error rates (MRAM: a special device, low spec, high cost). (2) Protection from soft errors: use memory devices with error correction functions such as ECC (Error Correction Code); 1-bit correction with 2-bit detection is low cost, 2-bit correction with 3-bit detection is high cost. (3) Recovery from soft errors: systems automatically restart or overwrite if a soft error occurs (firmware: low cost; ASIC: long-term development).
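
To make "1-bit correction, 2-bit detection" concrete, here is a minimal sketch of an extended Hamming (8,4) code, a textbook example of this class of ECC. The slides do not say which code the products use, so this is an illustration only, and the names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    code = [0] * 8
    for bit, pos in enumerate((3, 5, 6, 7)):
        code[pos] = (nibble >> bit) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (nibble, status); nibble is None for a detected 2-bit error."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall_odd = sum(code) % 2 == 1
    if overall_odd:
        # Single-bit error at index `syndrome` (0 means the parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even overall parity but nonzero syndrome: two bits flipped.
        return None, "double-error detected"
    else:
        status = "ok"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
word[6] ^= 1                 # simulate a soft error (one flipped bit)
print(decode(word))          # the data is recovered
word[2] ^= 1                 # a second flipped bit in the same word
print(decode(word))          # reported as uncorrectable
```

A single flipped bit is repaired transparently, while two flipped bits are reported rather than silently miscorrected, which is the point at which a recovery mechanism such as restart or overwrite has to take over.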
  13. 4. Practices, Step 4: soft error tests with real products. We developed soft error testing technology using Hokkaido University's compact accelerator-driven neutron source.
  14. 4. Practices, Step 4: soft error tests with real products.
  15. 5. Results: comparison of neutron soft error rates. We measured the devices with the accelerator neutron source to confirm the reduction in soft error rate. [Figure: three charts on a 0 to 10 scale comparing an FPGA-based device and an ASIC-based device, without and with the ECC function, and without and with the auto-recovery function; the reductions shown are 80%, 90%, and 80%.] On the real network, the number of soft errors decreased substantially.
  16. 6. Conclusion. We successfully reproduced soft errors using a compact accelerator-driven neutron source. We were able to investigate soft error tolerance, and to check the fault detection process and the process of switching to a backup network system. We conclude that NTT can reduce service outages and failure recovery costs due to soft errors.
  17. Message. Have you ever experienced trouble with unknown causes on your network? It might be caused by soft errors! Soft errors can be dealt with! We hope that carriers and manufacturers around the world will be freed from this problem!
  18. Special thanks: Fujitsu, Ltd.; Hitachi, Ltd.; NEC Corp.
