This talk is about data science and statistics applied to flight safety in commercial aviation worldwide. In the introductory part, we will stress the importance of monitoring flight data and show some real records from flight data recorders (aircraft “black boxes”). We will then explain how the data is recorded, downloaded, analysed, converted to safety events and, finally, validated by experts in the field: flight data analysts. Aggregating data across many flights produces a statistical picture of safety risks in an airline’s operations. However, this valuable tool can turn into a deadly weapon if used negligently, and we will support this claim with examples. We expect the audience will already know some of these traps, whatever industry they come from, but hopefully there will be something valuable to take home, too.
In the second part, we say goodbye to the data analyst and the statistician, the two dominant characters of the first part. A data scientist will then emerge, building on the experience and domain knowledge of the two. He will walk the audience through three simple but working examples. The first shows how to improve the accuracy of automated analysis by using historic data and a probabilistic, Bayesian approach. The second is about finding novel safety risks in airlines’ operations with simple principal component analysis. Lastly, we will use a Markov model to detect aircraft whose data download frequency has changed, so that we collect as much flight data as possible. We will try to make this chat as interesting and as interactive as we can, and we look forward to meeting you at this fun and interesting conference!
6. 3400 G shock for 6.5 ms
500 lb. dropped from 10 ft with a ¼-inch-diameter contact point
1100 ºC flame for 30 minutes
260 ºC for 10 hours
Immersion in aircraft fluids for 24 hours
Immersion in sea water for 30 days
5,000 pounds crush for 5 minutes on each axis
Pressure equivalent to depth of 20,000 ft
9. Avoiding Black Boxes with Data Science
Raffaele Rainone & Marko Vasiljevski
Data Science Conference
11-12 October 2016
Belgrade, Serbia
11. “I don’t know…, pure mathematician, Python developer, data scientist, pizza lover, feeling a bit home sick for Italy, …”
12. Chatting about…
Aviation Safety
A story about flight data monitoring (FDM)
A story of a flight data analyst
A story of a flight safety statistician
14. Chatting about…
Data science in (a bit of) action
Jet-engine health - Working with Mr. Bayes
Detecting safety concerns – PCA, a friend
Finding cuckoo’s eggs – Mr. Markov’s chains
18. Flight Data Monitoring (FDM) what?!
Part of safety management system (SMS)
Airlines worldwide obliged to do FDM
Spotting deviations from safe operation
Non-punitive – learn from mistakes
Confidential
41. Uncertainty - good companion
Statistic             Normal acceleration - Tdwn    Normal acceleration - Lift-off
Total count           80,663                        80,663
Average               1.31                          1.19
(Min, Max)            (1.03, 2.40)                  (0.66, 1.63)
Range                 1.37                          0.97
Standard deviation    0.10                          0.05
Wider histogram, less confidence in mean value
Narrower histogram, more confidence in mean value
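The link between histogram width and confidence in the mean can be made concrete with the standard error of the mean. The sketch below is only an illustration using the summary statistics on this slide; it assumes independent samples and a normal approximation, and the function names are hypothetical.

```python
import math

def standard_error(std_dev, count):
    """Standard error of the mean for `count` independent samples."""
    return std_dev / math.sqrt(count)

def mean_interval(mean, std_dev, count, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    half_width = z * standard_error(std_dev, count)
    return mean - half_width, mean + half_width

# Summary statistics taken from the slide (normal acceleration)
touchdown = mean_interval(mean=1.31, std_dev=0.10, count=80663)
lift_off = mean_interval(mean=1.19, std_dev=0.05, count=80663)

print("Touchdown mean interval:", touchdown)
print("Lift-off mean interval: ", lift_off)
```

With the same number of flights, the distribution with half the standard deviation gives an interval half as wide, which is the "narrower histogram, more confidence" point.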
65. End of take-off – a problem
If not detected – problems with engine health
Difficult to do with classical signal processing
What if a parameter fails (which happens not so rarely)?
66. End of take-off – a solution
Time at vertical navigation mode selected (VNAV)
Drop/rise in fuel flow (FF) and gas temperature (EGT)
Search for changes around that point in time
Gather the knowledge
Calculate probability for EOT for unseen flight
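The steps above can be sketched as a small discrete Bayes update. This is not the presenters' implementation: the prior format (`offset_prior`), the exponential likelihood based on the fuel-flow drop, and the `noise_sd` scale are all assumptions chosen for illustration. The prior stands for the "gathered knowledge" about where EOT usually falls relative to VNAV selection in historic flights.

```python
import numpy as np

def eot_posterior(fuel_flow, vnav_index, offset_prior, noise_sd=50.0):
    """Posterior probability that end of take-off (EOT) is at each sample.

    fuel_flow    : 1-D sequence of fuel-flow samples for one flight
    vnav_index   : sample index at which VNAV mode was selected
    offset_prior : {offset_in_samples: probability}, assumed to be learned
                   from validated historic flights (hypothetical format)
    noise_sd     : assumed scale that turns a fuel-flow drop into evidence
    """
    fuel_flow = np.asarray(fuel_flow, dtype=float)
    n = len(fuel_flow)

    # Prior: where EOT usually sits relative to VNAV selection
    prior = np.zeros(n)
    for offset, p in offset_prior.items():
        idx = vnav_index + offset
        if 0 <= idx < n:
            prior[idx] += p

    # Likelihood: a larger drop in fuel flow makes a sample more plausible.
    # Missing samples (NaN) carry no evidence, so the prior decides there.
    drop = np.zeros(n)
    drop[1:] = fuel_flow[:-1] - fuel_flow[1:]
    drop = np.nan_to_num(drop, nan=0.0)
    log_like = np.clip(drop, 0.0, None) / noise_sd
    likelihood = np.exp(log_like - log_like.max())  # rescaled for stability

    unnormalised = prior * likelihood
    total = unnormalised.sum()
    if total == 0:
        raise ValueError("prior has no mass inside this flight")
    return unnormalised / total

# Synthetic flight: fuel flow steps down at sample 10, VNAV selected at 9
fuel_flow = [5000.0] * 10 + [4000.0] * 10
posterior = eot_posterior(fuel_flow, vnav_index=9,
                          offset_prior={0: 0.2, 1: 0.6, 2: 0.2})
print(int(np.argmax(posterior)))
```

If the fuel-flow parameter fails completely, the likelihood becomes flat and the estimate falls back on the historic prior, which is one way to cope with a failed parameter.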
82. Formally…
The previous 700 x 123 matrix is A
Multiply the transpose of A by A (after centring each column) to get the covariance matrix C, up to a constant factor. It’s 123 x 123
Do a singular value decomposition of C to find its eigenvectors and eigenvalues
The leading eigenvectors coincide with the directions of highest variance
Keep just the first couple of eigenvectors to reduce dimensionality
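A minimal NumPy sketch of these steps follows. The 700 x 123 matrix here is random stand-in data, not the flight data from the talk, and the function name is hypothetical. Columns are centred and the product is divided by n - 1 so that C is the sample covariance matrix.

```python
import numpy as np

def pca_via_covariance(A, n_components=2):
    """PCA through the covariance matrix, following the steps on the slide."""
    A = np.asarray(A, dtype=float)
    centred = A - A.mean(axis=0)                # centre every column
    C = centred.T @ centred / (A.shape[0] - 1)  # covariance matrix, 123 x 123
    # C is symmetric positive semi-definite, so its SVD yields the
    # eigenvectors (columns of U) and eigenvalues (s), largest first.
    U, s, _ = np.linalg.svd(C)
    components = U[:, :n_components]            # directions of highest variance
    scores = centred @ components               # data in reduced dimensions
    return scores, components, s[:n_components]

rng = np.random.default_rng(0)
A = rng.normal(size=(700, 123))                 # stand-in for the real matrix
scores, components, variances = pca_via_covariance(A)
print(scores.shape, components.shape)
```

Each eigenvalue equals the variance of the data projected onto its eigenvector, which is why keeping the first few vectors preserves most of the spread.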
99. Flight data upload
State1 = M x State0
State2 = M x State1 = M x (M x State0) = M^2 x State0
StateN = M^N x State0
Don’t have to know the intermediate states, just powers of M (probabilities)
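The recurrence above can be sketched in a few lines. The three upload states and every transition probability below are invented for illustration and are not taken from the talk; M is column-stochastic so that it matches the form State1 = M x State0.

```python
import numpy as np

# Hypothetical upload behaviour states for one aircraft
STATES = ("regular", "late", "silent")

# M[i, j] = probability of moving to state i given current state j
# (each column sums to 1); the values are illustrative only.
M = np.array([
    [0.90, 0.50, 0.10],
    [0.08, 0.40, 0.20],
    [0.02, 0.10, 0.70],
])

def state_after(M, state0, n):
    """Distribution over states after n steps: M^n x State0."""
    return np.linalg.matrix_power(M, n) @ state0

state0 = np.array([1.0, 0.0, 0.0])  # aircraft starts as a regular uploader
state5 = state_after(M, state0, 5)
for name, p in zip(STATES, state5):
    print(f"{name}: {p:.3f}")
```

An aircraft whose observed upload pattern is very unlikely under the predicted distribution would be a candidate for a change in behaviour.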
Depending on where we are in the world, FDRs need to comply with different standards; they are orange so that they are visible in wreckage
The crash was caused primarily by the aircraft's automated reaction which was triggered by a faulty radio altimeter. This caused the autothrottle to decrease the engine power to idle during approach. The crew noticed this too late to take appropriate action to increase the thrust and recover the aircraft before it stalled and crashed.[9] Boeing has since issued a bulletin to remind pilots of all 737 series and BBJ aircraft of the importance of monitoring airspeed and altitude, advising against the use of autopilot or autothrottle while landing in cases of radio altimeter discrepancies.
“I don’t know…, pure mathematician, Python developer, data scientist, pizza lover, feeling a bit home sick for Italy, …”
Holds a PhD in theoretical mathematics, more specifically, in group theory. He is also a very capable developer and can easily communicate results of analyses. All of that makes him a great data scientist. Last but not least, he cooks great pasta carbonara - main reason for employing him.
Intro: Legal requirement, explain SMS, how the airlines are obliged
Aircraft flying
By the way, is it trying to stop or is it accelerating down the runway? Where is the second black box?
Aircraft on the ground
Every manufacturer and aircraft type has a different set of recorded parameters. Even the same aircraft type can have different parameters. It is like an operating system: imagine the same file on multiple computers, stored in different locations but still comparable once opened
Data cleansing, Different frequencies, Parameter naming convention important
Why is there missing data – normal or problem with the parameter?; change in behaviour of acceleration normal – why?
Mainly with charts and numbers coming from the data
Very often forgotten
We live in a data-driven world
We used to rely on our gut feeling and on flying planes
Nowadays, we are overwhelmed by quantities of data and automation (autopilot)
We are deceived by charts and numbers
The solution: somehow combine previous knowledge and data that are constantly being stored in databases across the globe
Examples of 1) missing data; 2) corrupt parameter
The first outlier: remarkably difficult to analyse; an outlier due to E1N1 Hi after TDWN. Causes: AT engaged until a couple of seconds after TDWN in a high cross wind at 45 degrees (the pilot had to handle the control column instead of monitoring the speed). Concerns: deep, balloon and hard landings
Contrary to the previous example, AT was disengaged while landing into a high cross wind at 45 degrees
AT Mach mode engaged during the landing: poor airmanship, speed too low, can stall
Firm landing. The 1.9g peak was a result of the deployment of the speed brakes. The trace is a perfect example of a normal-g trace: two close peaks after KTI (two wheels), then a smaller one corresponding to the nose wheel, and a third, more pronounced peak for the brakes