Applying Machine Learning
to Aerospace Training
Mikhail Klassen
Chief Data Scientist
Royal Aeronautical Society Conference Simulation-Based Training in the Digital Generation
London, UK, 11–12 November 2015
Background in computational
astrophysics and the study of
star formation.
Ph.D. (Almost), McMaster University

B.Sc., Columbia University

Applied Physics & Applied Mathematics
Data Scientist

Paladin:Paradigm Knowledge Solutions
Mikhail Klassen
@mikhailklassen
mikhail.klassen@ppksgroup.com
Artist’s conception of a newborn star
Supercomputer simulation of star birth
from Klassen et al. (2015, in prep.)
Data Science
Data science is a relatively
new interdisciplinary field
combining skills from:
• Mathematics, statistics
• Computer science,
artificial intelligence,
data mining
• Data visualization,
databases
Teaching Machines to “Learn”
Supervised Learning
• Developing a statistical model that improves as
it is given more labeled examples
• Examples: Automatic classification, image
recognition, handwriting digitization
Teaching Machines to “Learn”
Unsupervised Learning
• Automatic pattern extraction
• Examples: clustering, personalized
recommendations
What is Big Data?
“Big Data” refers to the exponential growth in data…
• …Volume: data sets are too large to fit in standard
memory and challenge typical available storage
• …Velocity: data streams (e.g. Twitter, stock prices)
pose challenges for real-time analysis
• …Variety: a mixture of structured and unstructured
data poses challenges for database paradigms
Big Data in Aerospace
• Aircraft and other aerospace products
are some of the most instrumented
products in the world
• Etihad is using big data analytics to
measure pilot aptitude
• GE sponsored a competition to optimize
flight routes
• PASSUR Aerospace created RightETA to
better predict arrival time at airports
Competency-Based Training
• Competency-based training is an
approach to teaching and
learning applicable wherever a
subject can be finely decomposed
into discrete skills and concepts,
and where the mastery of these
can be measured.
• In aerospace, this is in contrast with some
traditional approaches that required reaching
prescribed time quotas in a simulator or in the air
Measuring Achievement
The challenges include
selecting the right metrics and
knowing how to measure.
• Subject matter experts remain vital
Approaches to measurement
• Item Response Theory
• Bayesian Knowledge Tracing
Item Response Theory
• Item Response Theory is a way of ‘measuring’ the
skill level of a trainee based on their responses to
assessment problems
• Does not assume that every assessment is equal
‣ Variable difficulty
‣ Variable discriminatory power
[Figure: probability of answering correctly (0.0 to 1.0) vs. student ability θ (-6 to 6), one curve per problem difficulty]
$P(\theta) = \frac{1}{1 + e^{-(\theta - b)}}$
Problem Difficulty
b = -1 (Easy)
b = 0 (Average)
b = 1 (Hard)
[Figure: probability of answering correctly (0.0 to 1.0) vs. student ability θ (-6 to 6), one curve per problem discrimination]
$P(\theta) = \frac{1}{1 + e^{-a\theta}}$
Problem Discrimination
a = 0.1
a = 1.0
a = 5.0
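The difficulty and discrimination parameters can be folded into one item response function. The sketch below is a minimal illustration, assuming the common two-parameter logistic form that combines the two curves; the function name `p_correct` is hypothetical and does not come from the talk.

```python
import math

def p_correct(theta, a=1.0, b=0.0):
    """Probability that a trainee of ability theta answers an item correctly.

    a is the item discrimination and b the item difficulty
    (two-parameter logistic form, assumed here for illustration).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An average trainee (theta = 0) facing easy, average, and hard items
for b in (-1.0, 0.0, 1.0):
    print(f"b = {b:+.0f}: P = {p_correct(0.0, b=b):.3f}")
```

With a = 1 this reduces to the difficulty-only curve, and with b = 0 to the discrimination-only curve.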
Knowledge Tracing
• Does not assume that a single parameter
characterizes the trainee’s entire ability
• Instead, a trainee is measured against many
individual skills or ‘knowledge components’
• After each assessment, the probability that a
trainee has learned the skill is updated in a Bayesian way
• Over many assessments, we can build a clear
picture of the trainee’s mastery of many discrete
skills
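[Diagram: two knowledge states, Not Learned and Learned. A trainee starts in Learned with probability P(L0) and moves from Not Learned to Learned with probability P(T). A correct answer is given with probability P(G) from Not Learned and 1 - P(S) from Learned.]
P(L0) Probability the student had previously learned this skill
P(T) Probability the skill is learned at each opportunity to use it
P(G) Probability the student will guess correctly if the skill is not known
P(S) Probability the student will ‘slip’ if the skill is already known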
Bayesian Knowledge Tracing
The Equations
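The equations on this slide did not survive extraction. As a stand-in, the sketch below applies the standard Bayesian Knowledge Tracing update using the four parameters P(L0), P(T), P(G) and P(S); it is an assumption that this matches what the slide showed. The function name `bkt_update` and the parameter values are illustrative only.

```python
def bkt_update(p_learned, correct, p_transit, p_guess, p_slip):
    """Return the updated probability that a skill is learned after one response."""
    if correct:
        # Correct answers come from knowing the skill (no slip) or guessing
        evidence = p_learned * (1.0 - p_slip)
        posterior = evidence / (evidence + (1.0 - p_learned) * p_guess)
    else:
        # Incorrect answers come from slipping or from not knowing (no guess)
        evidence = p_learned * p_slip
        posterior = evidence / (evidence + (1.0 - p_learned) * (1.0 - p_guess))
    # Allow for the skill being learned at this practice opportunity
    return posterior + (1.0 - posterior) * p_transit

# Illustrative parameter values only (not from the talk)
p = 0.3  # P(L0)
for outcome in (True, True, False, True):
    p = bkt_update(p, outcome, p_transit=0.1, p_guess=0.2, p_slip=0.1)
    print(f"{'correct' if outcome else 'incorrect'}: P(L) = {p:.3f}")
```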
Cohort Analysis
• When you already have training data for
hundreds of candidates, you can use
supervised learning models to find predictors
for candidate success
• In our research on pilot e-learning, we use
supervised learning to predict completion rates
• With each successive cohort, you get better
results, and more predictive power
Primer on Predictive Analytics
The decision tree algorithm repeatedly splits a data set on
input variables (“features”), selecting and giving primacy to
those features with the most discriminative power.
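One way to make "most discriminative power" concrete is to score every candidate split by how pure the resulting groups are. The sketch below is a minimal illustration, assuming Gini impurity as the criterion (the talk does not name one); the function names and the toy data are invented for this example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (feature_index, threshold, impurity) of the best single split."""
    best = (None, None, gini(labels))
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[2]:
                best = (j, threshold, impurity)
    return best

# Toy data, invented for illustration: [module score, flight hours] -> passed?
rows = [[90, 12], [85, 30], [60, 25], [55, 14], [88, 20], [58, 28]]
labels = [1, 1, 0, 0, 1, 0]
print(best_split(rows, labels))
```

A full decision tree applies this search recursively to each resulting group until the groups are pure or a depth limit is reached.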
Trainee  Name          …  Performance:  Performance:  Performance:  …  Flight time  Predicted Final
                          Module 1      Module 2      Module 3         (hours)      Evaluation
10234    John Doe      …  90%           68%           80%           …  …            85%
10235    Jane Philips  …  85%           90%           86%           …  …            87%
10236    Sam Wilson    …  87%           75%           91%           …  …            79%
…        …             …  …             …             …             …  …            …
• Through comparison against past cohorts, these
types of regression algorithms can predict final
scores, even as the candidate is still mid-training
• This allows for early identification of weaknesses
• Because the feature weights of various training
inputs have already been calculated, the system
knows where remedial action is most effective
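The idea of predicting a final score from mid-training data can be sketched with the simplest possible model. The example below assumes ordinary least squares on a single feature, which is a simplification chosen for illustration rather than the method used in the talk; the cohort numbers are invented and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented numbers for a past cohort: Module 1 score -> final evaluation
past_module1 = [60.0, 70.0, 80.0, 90.0]
past_final = [65.0, 72.0, 79.0, 86.0]

slope, intercept = fit_line(past_module1, past_final)

# Predict the final evaluation of a trainee who is still mid-training
predicted = slope * 85.0 + intercept
print(f"Predicted final evaluation: {predicted:.1f}%")
```

The fitted slope plays the role of a feature weight: a larger slope means that improving this input is expected to move the final evaluation more.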
Analytics Engine
Adaptive Training
Assessing Potential
Competence by Knowledge Component (KC):
KC1: 84%
KC2: 90%
KC3: 77%
KC4: 78%
KC5: 54%
KC6: 71%
…
Through evaluation across a range
of core skills, knowledge tracing
algorithms can identify areas for
remediation or certify a candidate.
This is how competency-based
training could work.
Data on career performance can
then inform training metrics.
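Flagging areas for remediation from the knowledge component scores listed above can be sketched as a simple threshold check. This treats the slide's percentages as mastery estimates, which is an assumption, and the 75% cut-off and the function name `needs_remediation` are hypothetical choices for illustration.

```python
# Knowledge component scores from the slide, read here as mastery estimates
mastery = {"KC1": 0.84, "KC2": 0.90, "KC3": 0.77,
           "KC4": 0.78, "KC5": 0.54, "KC6": 0.71}

MASTERY_THRESHOLD = 0.75  # hypothetical cut-off, not from the talk

def needs_remediation(estimates, threshold=MASTERY_THRESHOLD):
    """Return the components below the threshold, weakest first."""
    weak = [kc for kc, p in estimates.items() if p < threshold]
    return sorted(weak, key=estimates.get)

print(needs_remediation(mastery))
```

A candidate with an empty list would meet the threshold on every component and could be put forward for certification.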
Admission & Recruitment
Why would you want to
use predictive analytics
in admissions, hiring or
recruitment?
• Avoid bias
• Predict outcome
Promises & Perils
[Figure: bar chart of hiring assessment methods, percent scale from 0 to 30]
Unstructured interviews: 14%
Reference checks: 7%
Number of years of work experience: 3%
Work sample test: 29%
General cognitive test: 26%
Structured interview: 26%
Adapted from Work Rules! by Laszlo Bock,
Senior Vice President of People Operations at
Google
Conclusion
• Machine learning and other AI-based systems are
disrupting many industries and bringing us smarter,
more targeted products and services
• Education & training are already feeling the wave of
these technologies and will be dramatically
transformed by them
• Data-driven adaptive training will become the
industry standard as we move towards
competency-based training
Thank You!
@mikhailklassen
mikhail.klassen@ppksgroup.com
Get in touch!
