2. Introduction
• Implementation of a High Integrity NTP system for Air Traffic Control
– Air Traffic Control
– Supporting Systems
– Safety Requirements
– Failure Modes
– Solution to provide NTP service
– Conclusion
5. Safety Requirements
• Depends on criticality of service
– Voice Comms
– Surveillance
• Probability of Failure <1 in 10,000,000 hours
• No undesirable failure modes
• Safety Management System
• Rarely achieved by COTS products
6. Reliability
• Electronic hardware – random
– Typical equipment MTBF 50k-100k hours
• Software – systematic
– For commercial software the limit is 10k hours
• How do we meet the Safety Requirements?
– Bespoke
– Innovative use of commercially available equipment
7. Time Distribution
• Time data by serial interface
• Originally bespoke
• Network Time Protocol
• Improved performance at less cost
11. Conclusions
• NTP service for ATC
– Meets safety requirements using COTS equipment
– Better performance
– Less cost
• Sometimes only a bespoke solution will do.
Today we will be looking at the way we tackled the implementation of a new Time Distribution system for ATC, with very high integrity requirements.
An example of data integration between systems: this is a controller’s surveillance screen. Data from the Flight Data Processing (FDP) system is added so that the controller can identify each aircraft by flight number, along with its destination and cleared flight level.
To devise a way to meet the safety requirements we need to consider how systems fail.
System reliability covers every part of the system: hardware, software, supporting infrastructure (including critical external interfaces), operators and procedures.
Hardware faults occur randomly due to component failures and are not usually related to how the equipment is used; they are state independent. Hardware reliability is usually simple to calculate, but the calculation treats all failures as equal, which is not realistic. Further analysis is needed to identify the dangerous failures.
Software faults are usually bugs, caused by the software meeting a situation its designers did not anticipate. Resetting it gets it working again, but the fault will reappear whenever the same situation occurs; hence software faults are systematic.
Typical quoted hardware reliability is an MTBF of 50k to 100k hours, short of our targets. If the system uses software (most do), the best you can claim is 10,000 hours (CAA/SRG figures, the allowance for commercial systems), with systematic failures dominating.
A typical requirement is a probability of failure below 1 in 10^7 hours, which is very difficult to achieve. Bespoke solutions are possible, and were common in the past, but they are very expensive and risky.
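To put these figures in proportion, here is a rough calculation. It is a sketch under the usual assumption that random (hardware) failures follow a constant failure rate, λ = 1/MTBF; systematic software failures do not really behave this way, so the 10,000 hour figure is treated simply as a claim limit. The names `TARGET_HOURS`, `failure_rate`, `prob_failure_within` and `shortfall` are illustrative only.

```python
import math

# Safety target: probability of failure below 1 in 10^7 hours
TARGET_HOURS = 1e7

# Reliability figures quoted in the talk (hours)
QUOTED_MTBF = {
    "hardware (low)": 50_000,
    "hardware (high)": 100_000,
    "commercial software limit": 10_000,
}

HOURS_PER_YEAR = 8760


def failure_rate(mtbf_hours: float) -> float:
    """Constant failure rate per hour for random failures: lambda = 1 / MTBF."""
    return 1.0 / mtbf_hours


def prob_failure_within(mtbf_hours: float, hours: float) -> float:
    """Probability of at least one failure in `hours`, assuming exponentially
    distributed (random) failures."""
    return 1.0 - math.exp(-failure_rate(mtbf_hours) * hours)


def shortfall(mtbf_hours: float) -> float:
    """Factor by which the MTBF falls short of the target."""
    return TARGET_HOURS / mtbf_hours


if __name__ == "__main__":
    for name, mtbf in QUOTED_MTBF.items():
        print(f"{name}: short by a factor of {shortfall(mtbf):,.0f}; "
              f"P(failure within a year) = {prob_failure_within(mtbf, HOURS_PER_YEAR):.2f}")
```

Even the best hardware figure falls short of the target by a factor of 100, and software-based equipment by a factor of 1,000.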
What we need is an innovative solution using inexpensive equipment that meets the requirement.
Many ATC systems need a time of day feed, ranging from wall and console clocks to Surveillance Data Processing. The latter has a major safety requirement, of the order of 1 failure in 10^7 hours.
The earliest systems were bespoke designs, and some are still in use today. Serial ASCII data was the norm, with dedicated cabling to each clock and system; each application was different, and the systems were expensive to own.
Network Time Protocol has now emerged as the standard for distributing time of day. It evolved in the internet world and can give a very accurate indication of time, even over packet networks. It is also cheaper to implement, since the time is distributed over a WAN or LAN with no dedicated cabling.
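NTP can stay accurate over a packet network because each exchange records four timestamps, from which the client estimates both its offset from the server and the network round-trip delay. A minimal sketch of that on-wire calculation, following the NTPv4 formulas in RFC 5905; the `NtpExchange` class and the timestamps in the example are hypothetical, in seconds.

```python
from dataclasses import dataclass


@dataclass
class NtpExchange:
    t1: float  # client transmit time (read on the client clock)
    t2: float  # server receive time (read on the server clock)
    t3: float  # server transmit time (read on the server clock)
    t4: float  # client receive time (read on the client clock)


def offset(x: NtpExchange) -> float:
    """Estimated offset of the server clock relative to the client clock."""
    return ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2


def round_trip_delay(x: NtpExchange) -> float:
    """Network round-trip delay, excluding the server's processing time."""
    return (x.t4 - x.t1) - (x.t3 - x.t2)


if __name__ == "__main__":
    # Hypothetical exchange: client clock 0.5 s behind the server,
    # 10 ms each way on the network, 1 ms processing at the server.
    ex = NtpExchange(t1=99.500, t2=100.010, t3=100.011, t4=99.521)
    print(f"offset = {offset(ex):.3f} s, delay = {round_trip_delay(ex):.3f} s")
    # prints: offset = 0.500 s, delay = 0.020 s
```

The offset estimate assumes the outbound and return paths take equal time; any asymmetry introduces an error of at most half the round-trip delay.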
Stratum 0 clocks are atomic reference clocks. GPS signals are based on atomic clocks, so a GPS receiver provides access to a Stratum 0 source.
The NTP servers are at Stratum 1.
Client systems interface at Stratum 2 and may distribute time to lower strata.
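The stratum numbering follows a simple rule: a node's stratum is one more than that of the source it synchronizes to. The sketch below assumes the NTPv4 convention (RFC 5905) that stratum 16 means unsynchronized; `stratum_of`, `label_chain` and the chain names are hypothetical placeholders for the GPS reference, NTP server, ATC client and a downstream clock.

```python
from __future__ import annotations

# NTPv4 (RFC 5905) convention: strata 1-15 are synchronized, 16 means unsynchronized.
MAX_STRATUM = 16


def stratum_of(source_stratum: int) -> int:
    """Stratum of a node that synchronizes to a source at `source_stratum`.
    A result of 16 means the node counts as unsynchronized."""
    if not 0 <= source_stratum < MAX_STRATUM:
        raise ValueError(f"cannot synchronize to a stratum {source_stratum} source")
    return source_stratum + 1


def label_chain(names: list[str]) -> list[tuple[str, int]]:
    """Assign strata down a chain whose first element is a stratum 0 reference clock."""
    if not names:
        return []
    chain = [(names[0], 0)]
    for name in names[1:]:
        chain.append((name, stratum_of(chain[-1][1])))
    return chain


if __name__ == "__main__":
    # Hypothetical chain matching the description above
    for name, stratum in label_chain(["GPS reference", "NTP server", "ATC client", "console clock"]):
        print(f"stratum {stratum}: {name}")
```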
Ways to use COTS. Put in two? No: common cause failures due to software mean that redundancy gains little, and two servers give NTP a problem if they disagree, because it cannot tell which one is right. The answer is to put in three, with minimal (and understood) common causes of failure: different software, hardware, chipset, GPS engine, everything.
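Why two servers are not enough, and three are, can be seen with a simplified selection rule. Each server's offset estimate is treated as an interval of plus or minus its error bound, Marzullo's algorithm finds the region where the most intervals overlap, and only a strict majority is trusted. NTP's own selection algorithm is derived from Marzullo's but is more elaborate, so this is an illustration rather than the NTP implementation; the function names, offsets and error bounds are hypothetical.

```python
from __future__ import annotations


def best_intersection(intervals: list[tuple[float, float]]) -> tuple[int, float, float]:
    """Marzullo's algorithm: return the largest number of intervals sharing a
    common region, together with the bounds of that region."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # -1 marks the start of an interval
        edges.append((hi, +1))  # +1 marks the end of an interval
    # Sorting puts starts before ends at the same position, so intervals
    # that merely touch still count as overlapping.
    edges.sort()
    best, count = 0, 0
    best_lo = best_hi = 0.0
    for i, (pos, kind) in enumerate(edges):
        count -= kind  # a start adds one, an end removes one
        if count > best:
            best = count
            best_lo, best_hi = pos, edges[i + 1][0]
    return best, best_lo, best_hi


def truechimers(estimates: list[tuple[float, float]]) -> list[int] | None:
    """Given (offset, error_bound) per server, return the indices of servers
    in the majority region, or None if no strict majority agrees."""
    intervals = [(off - err, off + err) for off, err in estimates]
    n, lo, hi = best_intersection(intervals)
    if n <= len(estimates) // 2:
        return None  # no majority: the client cannot tell which source is right
    return [i for i, (a, b) in enumerate(intervals) if a <= hi and b >= lo]


if __name__ == "__main__":
    two = [(0.000, 0.005), (0.200, 0.005)]        # two servers, 200 ms apart
    three = two[:1] + [(0.002, 0.005)] + two[1:]   # add a third that agrees with the first
    print(truechimers(two))    # None
    print(truechimers(three))  # [0, 1]
```

With two servers that disagree, neither forms a majority, so the client has no basis for choosing; with three, the faulty server is simply outvoted.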
How do you assess the designs when suppliers want to keep them secret? You source from the ones who co-operate!
We arrive at a system design like this, a 1 out of 3 architecture. The three NTP servers come from different suppliers, and each has been thoroughly assessed to satisfy ourselves that common cause failures are minimal.
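The value of sourcing the three servers from different suppliers can be put in numbers with the standard beta-factor model of common cause failure. In this sketch each server is faulty with a hypothetical probability p at any moment, a fraction β of those faults is assumed to be a common cause that affects every server at once, and service is assumed to be lost when two or more of the three servers are faulty (k = 2 is an illustrative criterion; substitute whatever the real architecture requires). The function names are illustrative.

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent servers are faulty at the
    same moment, each being faulty with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_loss(p: float, beta: float, n: int = 3, k: int = 2) -> float:
    """Beta-factor approximation: a fraction `beta` of each server's faults is a
    common cause that affects all n servers at once; the remainder are
    independent. Service is assumed lost when k or more servers are faulty."""
    common_cause = beta * p
    independent = p_at_least(k, n, (1 - beta) * p)
    return common_cause + independent


if __name__ == "__main__":
    p = 1e-3  # hypothetical per-server probability of being faulty
    for beta in (0.0, 0.01, 0.1):
        print(f"beta = {beta:<4}: P(loss) = {p_loss(p, beta):.2e}")
```

Even a modest common cause fraction swamps the benefit: with p = 10^-3 and β = 0.1, the common cause term alone is 10^-4, far larger than the roughly 3 × 10^-6 that three truly independent servers would give.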
So if ATC has difficult requirements, what about the customer?
The wing is a critical component. It cannot be duplicated, so it must be designed to be fit for purpose.
Two engines, and the aircraft can fly on just one. Failures are usually independent, but common causes have been known: for example, the 777 that only just reached Heathrow after ice in the fuel system caused both engines to lose thrust.
Airbus use a 1 out of 3 fly-by-wire system. Other on-board systems use multiple instances to ensure reliability.
The requirement is achieved by using features of the protocol together with a 1 out of 3 architecture.