Airline Passenger Profiling
Based on
Fuzzy Deep Machine Learning
by Ayman Qaddumi
Profiling History
• Passenger profiling plays a vital part in commercial
aviation security
• In 1996, Northwest Airlines developed the first
computer-assisted passenger prescreening system, named
CAPPS, a rule-based screening system utilizing
about 39 pieces of pre-boarding data to identify
high-risk passengers
Profiling History
• After the 9/11 attacks, the Transportation Security
Administration (TSA) proposed CAPPS II
• CAPPS II utilized an additional set of attributes
(e.g., income and demography) retrieved from
governmental databases to calculate a risk score
for each passenger.
• Based on their risk scores, passengers are labeled
as “green”, “yellow”, or “red” security risks and then
subjected to correspondingly intrusive scrutiny
Profiling History
• In 2004, the TSA replaced CAPPS II with a next-
generation system “Secure Flight”, which improves the
selection process through a terrorist screening database
• The new system shifted the responsibility for comparing
passenger identification with sensitive government data
from the privatized airlines to the federal government
• Since 2010, the TSA has required airlines to share pre-
boarding data with the government to perform centralized
watch list matching.
Problems with
Current Profiling Systems
• Given that normal passengers constitute a much larger fraction,
a small imperfection in classifying them will result in a large
number of false positives due to the base rate fallacy
• With a profiling system of 99% accuracy, identifying
the 19 hijackers of 9/11 in a population of 300 million would
flag about 3 million innocent people as false positives.
• Maliciously manipulated records can throw off the
profiler as well
• In general, classical passenger profiling methods are
inefficient in handling the rapidly increasing amounts of
electronic records
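The base rate arithmetic above can be checked with a few lines of Python. This is a minimal sketch: it assumes that "99% accuracy" means the profiler is right 99% of the time on attackers and on normal passengers alike, and the function name `screening_outcomes` is made up for illustration.

```python
def screening_outcomes(population, attackers, accuracy):
    """Expected counts when the profiler is correct with probability
    `accuracy` on every passenger (an assumption of this sketch)."""
    normal = population - attackers
    true_positives = attackers * accuracy
    false_positives = normal * (1.0 - accuracy)
    # Share of flagged passengers who really are attackers.
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# 300 million people, 19 attackers, 99% accuracy.
tp, fp, precision = screening_outcomes(300_000_000, 19, 0.99)
print(f"expected false positives: {fp:,.0f}")
print(f"chance a flagged person is an attacker: {precision:.2e}")
```

Even with a very accurate profiler, almost everyone who gets flagged is innocent, because normal passengers outnumber attackers so heavily.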
Recommended Solution
• Emerging deep learning models combined with highly parallel
computing have exhibited promising performance for feature
extraction and abstraction, but their applications in aviation security
management have rarely been reported.
• Although fuzzy set theory provides a powerful tool for handling
imperfect data and has been applied in risk assessment, profiling
methods based on it are few.
• This paper proposes a deep learning approach to passenger
profiling.
• At the center of the approach is a Pythagorean fuzzy deep
Boltzmann machine (PFDBM).
PFDBM
• For training the PFDBM, the authors propose a
hybrid algorithm that combines a gradient-based
method and an evolutionary algorithm
• Based on the novel learning model, they develop a
deep neural network (DNN) for classifying normal
passengers and potential attackers.
• They claim their approach provides much higher
learning ability and classification accuracy than
existing profilers
Boltzmann Machines
• Introduced in 1983 as a basic connectionist model
using stochastic networks
• Units have one of two states (binary)
• Bidirectional connections (undirected, no arrows)
• Probabilistic state transition
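The three properties listed above (binary units, symmetric connections, probabilistic state transitions) can be sketched in plain Python. This is an illustrative toy, not the PFDBM: the 3-unit network, its weights, and the name `gibbs_sweep` are assumptions, and the update rule is the standard logistic rule for Boltzmann machines.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gibbs_sweep(state, weights, biases, rng, temperature=1.0):
    """Update every binary unit once, in place, using
    P(s_i = 1) = sigmoid((b_i + sum_j w_ij * s_j) / T)."""
    for i in range(len(state)):
        net = biases[i] + sum(
            weights[i][j] * state[j] for j in range(len(state)) if j != i
        )
        state[i] = 1 if rng.random() < sigmoid(net / temperature) else 0
    return state


# Toy network: weights are symmetric (w_ij == w_ji) with no self-connections.
weights = [
    [0.0, 1.5, -1.0],
    [1.5, 0.0, 0.5],
    [-1.0, 0.5, 0.0],
]
biases = [0.1, -0.2, 0.0]
rng = random.Random(0)
state = [rng.randint(0, 1) for _ in range(3)]
for _ in range(10):
    gibbs_sweep(state, weights, biases, rng)
print(state)
```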
Pythagorean fuzzy set
• Any two items are usually compared by a user as either a ‘match’ or a
‘mismatch’
• In fuzzy logic, the range between a perfect match and a perfect mismatch is
extended. For instance, the interval [0,1] can be subdivided into the grades
{0, 0.25, 0.5, 0.75, 1}
• Membership in an extended subset can be expressed in terms of r(x), the
strength of membership, and d(x), the direction of commitment (intuitionistic
fuzzy set, IFS)
• A Pythagorean fuzzy set (PFS) is more general than an IFS: it only requires
that the sum of the squares of the membership degree and the nonmembership
degree be less than or equal to 1, whereas an IFS requires the plain sum to
be at most 1
• Fuzzified neural networks can handle inputs with fuzzy (labeled) and/or
incomplete features, which are inevitable in passenger profilers
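The difference between the IFS and PFS constraints can be illustrated with a short check. This is a sketch with made-up function names. The hesitation degree sqrt(1 - mu^2 - nu^2) is the usual PFS definition and is not taken from these slides.

```python
import math


def is_ifs_grade(mu, nu):
    """Intuitionistic fuzzy set: mu + nu <= 1."""
    return 0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0


def is_pfs_grade(mu, nu):
    """Pythagorean fuzzy set: mu^2 + nu^2 <= 1."""
    return 0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu**2 + nu**2 <= 1.0


def hesitation(mu, nu):
    """Indeterminacy left over by a Pythagorean grade."""
    return math.sqrt(max(0.0, 1.0 - mu**2 - nu**2))


# 0.8 + 0.5 exceeds 1, but 0.64 + 0.25 = 0.89 does not.
mu, nu = 0.8, 0.5
print(is_ifs_grade(mu, nu), is_pfs_grade(mu, nu), round(hesitation(mu, nu), 3))
```

The pair (0.8, 0.5) is rejected as an intuitionistic grade but accepted as a Pythagorean one, which is the sense in which a PFS is more general.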
PFDBM Training
• For training the PFDBM to meet the high accuracy
requirements of tasks such as passenger profiling,
the authors propose a hybrid algorithm integrating
the greedy layer-wise training method with
biogeography-based optimization (BBO)
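To give a feel for the BBO half of the hybrid, here is a minimal sketch of plain biogeography-based optimization on a toy objective. It is not the authors' training algorithm: the sphere function stands in for a training loss, there is no layer-wise training, and the linear migration model, elitism, and all parameter values are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Toy objective to minimize; stands in for a network's training loss."""
    return sum(v * v for v in x)


def bbo_minimize(cost, dim, pop_size=20, generations=50,
                 mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)                     # best habitat first
        history.append(cost(pop[0]))
        # Linear migration model: good habitats emigrate more, immigrate less.
        emigration = [(pop_size - k) / (pop_size + 1) for k in range(pop_size)]
        immigration = [1.0 - e for e in emigration]
        new_pop = [pop[0][:]]                  # elitism: keep the best unchanged
        for k in range(1, pop_size):
            child = pop[k][:]
            for d in range(dim):
                if rng.random() < immigration[k]:
                    # Roulette-wheel choice of the emigrating habitat.
                    source = rng.choices(range(pop_size), weights=emigration)[0]
                    child[d] = pop[source][d]
                if rng.random() < mutation_rate:
                    child[d] = rng.uniform(-5, 5)
            new_pop.append(child)
        pop = new_pop
    pop.sort(key=cost)
    history.append(cost(pop[0]))
    return pop[0], history


best, history = bbo_minimize(sphere, dim=5)
print(f"best cost went from {history[0]:.3f} to {history[-1]:.3f}")
```

In a hybrid scheme, each habitat would presumably encode network parameters and the cost would be a training or validation error, but that mapping is not specified in these slides.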
FUZZY DNN FOR
PASSENGER PROFILING
A PFDBM-based DNN identifies the risk of each individual
passenger. The input to the DNN includes the following.
1) The passenger name record, including the identity
information of the passenger and booking information of the
flight.
2) Travel statistics of the passenger’s flight history from the
Aviation Administration (collected from different airlines).
3) Travel statistics from other transportation methods, such as
railway and marine.
DNN Inputs (continued)
4) Travel statistics from the Tourism Administration (collected from travel agencies).
5) Criminal records from the Public Security Department.
6) Educational records from the Education Department.
7) Tax records from the Tax Department.
8) Reported information from banks and the Housing Administration.
9) Consumption records or patterns from large retailers (the detailed records have
typically been preprocessed, summarized, and/or mined by retailers’ systems).
10) (Preprocessed) telecommunication behavior records or patterns from telecom
operators.
11) (Authenticated and preprocessed) Internet behavior records or patterns from
Internet operators.
Input Class Divisions
• Based on their data types, the input features can be divided into the following
four classes.
1. Static binary-valued features, which can be input to the PFDBM directly.
2. Static real-valued features, which are first transformed into binary values
using a Gaussian-Bernoulli RBM (GRBM) and then input to the PFDBM.
GRBMs are commonly used to model real-valued data such as image pixel intensities.
3. Static labeled features, which are first transformed into real values using
membership functions of fuzzy sets defined on the domain of the labels, and
then transformed into binary values using GRBM.
4. Dynamic (temporal) features, which are first transformed into real values
using an additional temporal filter, and then transformed into binary values
using GRBM.
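The preprocessing paths for classes 3 and 4 can be sketched as follows. Everything here is hypothetical: the label set, the membership degrees, the exponential moving average used as a temporal filter, and the hand-picked weights. In particular, `binarize` only mimics the sampling step of a GRBM hidden unit, whereas a real GRBM would learn its weights from data.

```python
import math
import random

# Class 3: a fuzzy set "travels often" defined on a hypothetical label domain.
TRAVELS_OFTEN = {"rare": 0.1, "occasional": 0.5, "frequent": 0.9}


def label_to_real(label, membership):
    """Map a labeled feature to its membership degree in the fuzzy set."""
    return membership[label]


def exponential_filter(series, alpha=0.5):
    """Class 4: hypothetical temporal filter (exponentially weighted average)."""
    value = series[0]
    for x in series[1:]:
        value = alpha * x + (1.0 - alpha) * value
    return value


def binarize(real_values, weights, bias, rng):
    """Stand-in for one GRBM hidden unit: sample a bit from a sigmoid of the
    weighted real-valued inputs."""
    net = bias + sum(w * v for w, v in zip(weights, real_values))
    p_on = 1.0 / (1.0 + math.exp(-net))
    return 1 if rng.random() < p_on else 0


rng = random.Random(0)
often = label_to_real("occasional", TRAVELS_OFTEN)
trend = exponential_filter([0, 1, 0, 3, 4])   # e.g. bookings per month
bit = binarize([often, trend], weights=[1.0, 0.5], bias=-1.0, rng=rng)
print(often, trend, bit)
```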
Structure of the
PFDBM-based DNN
PFDBM-based DNN
• The Gaussian mixture model (GMM) is trained with the expectation-maximization algorithm. The higher its output value z, the more likely the passenger is an attacker. The classification uses a predefined threshold τ: the claim that the passenger is a (potential) attacker is accepted if z ≥ τ and rejected otherwise. Here, τ can be tuned according to the inspection capability and the current pressure of terrorism, e.g., it can be turned down after the airport receives a threatening phone call.
• Thus, the DNN has two preprocessing layers, the PFDBM, and the GMM, as shown in Fig. 6. In the authors' current implementation, the PFDBM has four layers, and the dimension of the input to its first layer is about 7000 (which is expected to increase when more external information is available).
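The threshold rule can be sketched in a few lines. The output values and the two settings of τ below are made up for illustration, and the sketch does not model the GMM itself.

```python
def classify(z, tau):
    """Accept the 'potential attacker' claim when the output z reaches tau."""
    return z >= tau


# Made-up output values for three passengers.
scores = {"passenger_a": 0.12, "passenger_b": 0.58, "passenger_c": 0.91}

# A normal threshold, then a lowered one under heightened alert.
flagged_by_tau = {}
for tau in (0.8, 0.5):
    flagged_by_tau[tau] = [name for name, z in scores.items() if classify(z, tau)]
    print(f"tau={tau}: {flagged_by_tau[tau]}")
```

Lowering τ widens the set of passengers sent to extra inspection, so the choice trades inspection workload against the risk of missing an attacker.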
PFDBM vs Other Models
Conclusion
• The authors believe that their PFDBM model can be
adapted and applied to many other deep learning tasks
in computer science and management science
• Currently, they are developing a multigrade DNN in which
a low-grade DNN uses only nonsensitive features. If its
output passes a threshold, the high-grade model
is used
• The paper does not study the impact of passenger
profiling on privacy, which the authors believe should be seriously
addressed by relevant laws and regulations.
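The multigrade idea can be sketched as a two-stage cascade. This is only one reading of the description above: it assumes that "passes a threshold" means the low-grade score reaches or exceeds it, and that the high-grade model sees both nonsensitive and sensitive features. The averaging model is a stand-in, not a trained DNN.

```python
def multigrade_risk(passenger, low_grade, high_grade, escalate_at):
    """Run the low-grade model first; consult the high-grade model only
    when the low-grade score reaches the escalation threshold."""
    low_score = low_grade(passenger["nonsensitive"])
    if low_score < escalate_at:
        return low_score, "low-grade"
    features = passenger["nonsensitive"] + passenger["sensitive"]
    return high_grade(features), "high-grade"


def mean_model(features):
    """Hypothetical stand-in model: averages its input features."""
    return sum(features) / len(features)


calm = {"nonsensitive": [0.1, 0.2], "sensitive": [0.9, 0.9]}
odd = {"nonsensitive": [0.7, 0.9], "sensitive": [0.2, 0.4]}
results = [multigrade_risk(p, mean_model, mean_model, escalate_at=0.5)
           for p in (calm, odd)]
print(results)
```

The sensitive features of the first passenger are never read, which is the privacy benefit such a cascade aims for.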
References
• Yu-Jun Zheng, Wei-Guo Sheng, Xing-Ming Sun, and Sheng-Yong Chen,
“Airline Passenger Profiling Based on Fuzzy Deep Machine
Learning,” IEEE Transactions on Neural Networks and
Learning Systems.
• H. Cavusoglu, Y. Kwark, B. Mai, and S. Raghunathan, “Passenger profiling
and screening for aviation security in the presence of strategic attackers,”
Decision Anal., vol. 10, no. 1, pp. 63–81, 2013.
