This course is prepared under the Erasmus+ KA-210-YOU Project titled
«Skilling Youth for the Next Generation Air Transport Management»
Machine Learning
Applications in Aviation
Feature Selection
Asst. Prof. Dr. Emircan Özdemir
Eskişehir Technical University
• In data mining, a commonly cited rule of thumb is that about 80% of the analysis effort
goes into data cleaning and preparation, and only about 20% into modeling.
• Data cleaning and preparation are learned better through experience than from a book
or a course. Feature selection is one of the important data preparation steps that
affects the success of the model.
• Feature selection is the process of choosing a subset of relevant features from a larger
set to improve model performance, reduce computational complexity, and enhance
interpretability.
• Aviation datasets often contain a large number of features, including sensor data, flight
parameters, weather conditions, and historical data. Selecting relevant features can
substantially improve the accuracy, efficiency, and interpretability of machine
learning models in aviation applications.
• Specific challenges and considerations arise when dealing with aviation data. These
include the large size of datasets, high dimensionality due to various sensor inputs,
potential noise in data, and the need to handle time-series and sequential data.
Introduction
• Large datasets in aviation:
Aviation datasets are expansive, encompassing a wealth of information from flight records,
sensor readings, and historical data. Managing and processing such large volumes of data pose
challenges in terms of storage, processing speed, and computational resources.
• High dimensionality due to various sensor inputs:
Aviation data is characterized by high dimensionality, arising from the diverse set of sensors
capturing parameters like altitude, airspeed, and GPS coordinates. The multitude of sensor
inputs increases the complexity of the data, emphasizing the need for effective feature selection
to enhance model performance.
• Noisy data and potential outliers:
Aviation datasets often contain noise, stemming from sensor inaccuracies or external factors.
Additionally, the presence of potential outliers can impact model training and prediction
accuracy. Implementing preprocessing techniques and outlier detection methods becomes
crucial to ensure the reliability and robustness of machine learning models in aviation
applications.
Aviation Data Characteristics
Sensor data:
Sensor data encompasses information collected from various onboard sensors, such as
altitude sensors, airspeed indicators, and GPS devices. These real-time measurements
provide crucial insights into the aircraft's operational status and environmental conditions.
Flight parameters:
Flight parameters include data related to the aircraft's performance and navigation, such as
pitch, roll, and yaw angles. These parameters are essential for understanding the dynamics
of flight and optimizing control systems.
Types of Features in Aviation Data
Weather conditions:
Weather features cover meteorological data such as temperature, humidity, wind speed, and
precipitation. Incorporating this data is vital for assessing potential hazards and optimizing
flight paths in response to changing weather patterns.
Historical data:
Historical data comprises information from past flights, maintenance records, and incidents.
Analyzing historical patterns aids in predicting potential issues, optimizing maintenance
schedules, and enhancing overall safety and efficiency.
Aircraft specifications:
Aircraft specifications include details about the aircraft's make, model, engine type, and
other technical specifications. This information is crucial for tailoring machine learning
models to specific aircraft types and optimizing performance.
Customer/Passenger data:
Customer data may include information about passengers, their preferences, and feedback.
While privacy considerations are paramount, leveraging anonymized customer data can
contribute to personalized services, improved customer experiences, and operational
efficiency.
• Data quality and preprocessing issues:
Ensuring the quality of aviation data is paramount, as inaccuracies and inconsistencies may
arise from sensor malfunctions or external factors. Preprocessing challenges include
cleaning noisy data, handling missing values, and normalizing features to enhance the
reliability of the machine learning models.
• High dimensionality and computational complexity:
Aviation datasets often exhibit high dimensionality due to numerous sensors and diverse
parameters. The computational complexity associated with processing and analyzing such
datasets can be a challenge. Feature selection methods must be efficient to handle the
large number of features without compromising model performance.
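As a minimal sketch of the preprocessing steps named above (handling missing values and normalizing features), the following Python example fills missing readings with the column median and then standardizes each column. The function name `impute_and_standardize` and the sample values are illustrative assumptions, not part of the course material.

```python
import numpy as np

def impute_and_standardize(X):
    """Fill NaNs with the column median, then scale each column to mean 0, std 1."""
    X = np.array(X, dtype=float)          # copy, so the caller's data is untouched
    medians = np.nanmedian(X, axis=0)     # per-column median, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]         # median imputation
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                   # avoid dividing by zero for constant columns
    return (X - mean) / std

# Hypothetical readings: column 0 has one missing value.
raw = [[1.0, 10.0],
       [np.nan, 20.0],
       [3.0, 30.0]]
Z = impute_and_standardize(raw)
print(Z)
```

In a real project the medians, means, and standard deviations would normally be computed on the training data only and then reused for validation and test data.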
Challenges in Aviation Feature Selection
• Incorporating domain knowledge:
In aviation, domain expertise is crucial for identifying relevant features and understanding
the intricacies of the data. Integrating domain knowledge into the feature selection process
ensures that selected features align with operational requirements and contribute to
meaningful insights.
• Handling time-series and sequential data:
Aviation data frequently involves time-series and sequential information, such as flight
trajectories and sensor readings over time. Feature selection methods need to account for
the temporal nature of the data, considering how features evolve during different phases of
flight and adapting to the sequential nature of events.
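One simple way to respect the temporal structure described above is to summarize each sensor signal per flight phase, so that feature selection can later compare, for example, climb statistics against cruise statistics. The sketch below is only an illustration under assumed inputs: the phase labels, the altitude values, and the helper `phase_summary` are hypothetical.

```python
import numpy as np

def phase_summary(values, phases):
    """Return mean and standard deviation of a signal for each flight phase."""
    values = np.asarray(values, dtype=float)
    phases = np.asarray(phases)
    summary = {}
    # dict.fromkeys keeps phases in the order they first appear
    for phase in dict.fromkeys(phases.tolist()):
        segment = values[phases == phase]
        summary[f"{phase}_mean"] = float(segment.mean())
        summary[f"{phase}_std"] = float(segment.std())
    return summary

# Hypothetical altitude samples (feet) with a phase label per sample.
altitude_ft = [0, 1000, 2000, 3000, 3000, 3000, 2000, 1000]
phase = ["climb"] * 4 + ["cruise"] * 2 + ["descent"] * 2

features = phase_summary(altitude_ft, phase)
print(features)
```

Each entry of `features` becomes one candidate column for the selection methods, so a method can keep the cruise statistics of a sensor while discarding its climb statistics.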
• Remove irrelevant features/attributes
• Increase the performance of your model
• Make model training faster
• Build your model more easily with fewer, more relevant features
• Build models that are easy to understand
• With fewer features, models are easier to debug
Advantages of Feature Selection
Feature Selection:
• Filter Methods: correlation analysis, information gain, mutual information
• Wrapper Methods: Recursive Feature Elimination (RFE), forward selection, backward
elimination
• Embedded Methods: LASSO (Least Absolute Shrinkage and Selection Operator), Decision
Trees and Random Forests, regularized regression models
Feature Selection Techniques
Filter Methods: Filter methods evaluate features directly from the data, using statistical
measures, without training the model or considering a feature's impact on it. These methods
are computationally efficient and include:
• Correlation analysis: Assesses the linear relationship between features and identifies
highly correlated ones. It helps in selecting a subset of features that are less redundant.
Suitable for numerical data.
• Information gain: Measures the reduction in uncertainty about the target variable when
considering a particular feature. Features with high information gain are prioritized.
Primarily used for categorical target variables, but can be adapted for numerical data.
• Mutual information: Quantifies the amount of information shared between a feature and
the target variable. It aids in selecting features that contribute significantly to predictive
accuracy. Can be applied to both numerical and categorical data.
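The filter methods above can be sketched in Python as follows. This is a minimal illustration on synthetic data, assuming NumPy and scikit-learn are available. The feature names (`airspeed`, `ground_speed`, `humidity`), the target, the 0.9 correlation threshold, and the helper `drop_correlated` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 500
airspeed = rng.normal(250.0, 30.0, n)
ground_speed = airspeed + rng.normal(0.0, 2.0, n)  # nearly redundant with airspeed
humidity = rng.uniform(0.0, 100.0, n)              # unrelated to the target
names = ["airspeed", "ground_speed", "humidity"]
X = np.column_stack([airspeed, ground_speed, humidity])
y = (airspeed > 250.0).astype(int)                 # hypothetical binary target

def drop_correlated(X, names, threshold=0.9):
    """Greedy correlation filter: keep a feature only if its absolute
    correlation with every feature kept so far is at most the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

kept = drop_correlated(X, names)
print("kept after correlation filter:", kept)

# Mutual information between each feature and the target (higher = more informative).
mi = mutual_info_classif(X, y, random_state=0)
mi_by_name = dict(zip(names, mi))
print(mi_by_name)
```

The correlation filter looks only at relationships between features (redundancy), while mutual information scores each feature against the target (relevance), so the two are often used together.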
Wrapper Methods: Wrapper methods determine feature subsets based on the model's
performance. These methods involve iterative model training and selection and include:
• Recursive Feature Elimination (RFE): Systematically removes the least important
features by training the model iteratively. RFE helps identify the most critical features for
optimal model performance. Applicable to both numerical and categorical data.
• Forward selection: Builds the feature set incrementally by adding the most relevant
feature in each iteration. It continues until a predefined criterion is met, optimizing for
model accuracy. Typically used with numerical data.
• Backward elimination: Starts with all features and removes the least important ones
iteratively. It aims to find the minimal subset of features that maximizes model
performance. Similar to forward selection, it is often applied to numerical data.
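The three wrapper methods above could be tried with scikit-learn roughly as shown below. This is a sketch on synthetic data, not a recommended configuration: the logistic regression model, the choice of three features, and the 3-fold cross-validation are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a labeled dataset: 8 features, 3 of them informative.
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
model = LogisticRegression(max_iter=1000)

# RFE: fit, drop the feature with the smallest coefficient, repeat.
rfe = RFE(estimator=model, n_features_to_select=3).fit(X, y)

# Forward selection: start empty, add the feature that most improves the CV score.
forward = SequentialFeatureSelector(model, n_features_to_select=3,
                                    direction="forward", cv=3).fit(X, y)

# Backward elimination: start with all features, remove the least useful one each round.
backward = SequentialFeatureSelector(model, n_features_to_select=3,
                                     direction="backward", cv=3).fit(X, y)

for name, selector in [("RFE", rfe), ("forward", forward), ("backward", backward)]:
    print(name, selector.get_support(indices=True))
```

Because every candidate subset requires retraining the model, wrapper methods are usually much slower than filter methods on datasets with many features.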
Embedded Methods: Embedded methods integrate feature selection into the model
training process. These methods include:
• LASSO (Least Absolute Shrinkage and Selection Operator): Introduces a penalty term
during model training that encourages sparsity in feature weights, effectively selecting a
subset of important features. Suitable for numerical data.
• Decision Trees and Random Forests: Provide built-in feature selection through the
tree-building process. These models naturally highlight important features based on
their contribution to decision-making. Can handle both numerical and categorical data.
• Regularized regression models: Incorporate regularization terms in regression models,
penalizing the inclusion of unnecessary features. This encourages the selection of
relevant features. Primarily designed for numerical data.
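To illustrate embedded methods, the sketch below fits a LASSO model and a Random Forest on synthetic data in which only the first two of six features affect the target. The data, the penalty strength `alpha=0.1`, and the number of trees are assumptions chosen for the example and would need tuning on real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
# Only features 0 and 1 influence the target; features 2 to 5 are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# LASSO: the L1 penalty drives the coefficients of unhelpful features to exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)
lasso_selected = np.flatnonzero(lasso.coef_)
print("LASSO keeps features:", lasso_selected)

# Random Forest: impurity-based importances rank features as a by-product of training.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
forest_top2 = np.argsort(forest.feature_importances_)[::-1][:2]
print("Random Forest top-2 features:", forest_top2)
```

In both cases the selection happens during training itself, which is what distinguishes embedded methods from filter and wrapper methods.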
• Thoroughly understand your dataset, including feature types and inherent patterns.
• Clearly define your goal for feature selection, such as improving accuracy or interpretability.
• Choose methods aligned with your data types—numerical or categorical.
• Assess computational complexity, considering the size of your dataset.
• Use methods like correlation analysis or regularization for correlated features.
• Decide on model-agnostic or model-specific feature selection based on your preference.
• Leverage domain expertise to guide feature selection based on contextual insights.
• Explore ensemble methods like Random Forests, which naturally perform feature selection.
• Assess the stability and consistency of the feature selection method.
• Use cross-validation to ensure selected features generalize well to unseen data.
• Experiment with multiple methods and compare outcomes to identify the most suitable.
• Understand trade-offs between simplicity, accuracy, and computational efficiency.
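Several of the points above (use cross-validation, experiment with multiple methods, compare outcomes) can be combined in one small experiment. The following sketch assumes scikit-learn and synthetic data. The two candidate pipelines and `k=4` are illustrative choices, not a prescribed procedure.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)

# Putting the selector inside the pipeline means it is refit on each training fold,
# so the held-out fold never influences which features are chosen.
candidates = {
    "all features": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "top 4 by ANOVA F-score": make_pipeline(
        StandardScaler(), SelectKBest(f_classif, k=4),
        LogisticRegression(max_iter=1000)),
}

results = {}
for name, pipeline in candidates.items():
    scores = cross_val_score(pipeline, X, y, cv=5)  # 5-fold accuracy
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {results[name]:.3f}")
```

Further candidates (for example a wrapper or an embedded method) can be added to the `candidates` dictionary and compared on the same folds.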
How to choose the right feature selection technique?
Domain-Specific Feature Selection refers to the process of selecting relevant features for a
machine learning model based on the specific knowledge and characteristics of a particular
domain or industry. In other words, it involves tailoring the feature selection process to the
intricacies and requirements of a specific field or domain of expertise.
Key components of Domain-Specific Feature Selection are:
• Domain Knowledge
• Collaboration with Domain Experts
• Custom Criteria for Selection
• Relevance to Industry-Specific Goals
• Enhanced Model Performance
Domain-Specific Feature Selection
• Importance of domain knowledge in aviation: You should recognize the critical role of
domain knowledge in aviation feature selection. Experts in the field possess insights into
the significance of certain features and can guide the selection process for optimal model
performance.
• Collaboration with domain experts: The collaboration with aviation domain experts is
invaluable. Working closely with professionals who understand the intricacies of aviation
data ensures that feature selection aligns with operational requirements, safety
considerations, and industry-specific nuances.
• Custom feature selection based on aviation-specific criteria: There is a need for
custom feature selection criteria tailored to aviation. Generic approaches may not capture
the unique aspects of aviation data. Creating bespoke selection criteria based on
industry-specific considerations enhances the relevance and effectiveness of the chosen
features.
• Optimizing Flight Safety
An airline may implement feature selection to identify critical flight parameters from a vast
array of sensor data. By focusing on key indicators such as altitude, airspeed, and engine
performance, the airline can successfully enhance its predictive models for detecting
potential safety issues. This results in more accurate and timely alerts, contributing to
improved overall flight safety.
• Efficient Aircraft Maintenance Scheduling
An aviation maintenance facility may utilize historical data for predictive maintenance. By
employing feature selection techniques, the team can identify the most relevant features
related to aircraft health and performance. This can streamline the maintenance scheduling
process, reducing downtime and operational costs, while ensuring optimal aircraft reliability.
Case Studies
• Enhanced Air Traffic Management
Air traffic control agencies face challenges in processing large volumes of data for optimal
route planning. Feature selection methods can be applied to prioritize weather
conditions, airspace congestion, and historical flight patterns. This enables the
development of more efficient air traffic management systems, reducing delays and
improving overall airspace utilization.
• Fuel Efficiency Improvement
An airline may aim to optimize fuel consumption by identifying the most influential
factors affecting fuel efficiency. Feature selection can be conducted focusing on variables
such as weather conditions, aircraft weight, and engine performance. The resulting model
provides actionable insights, leading to fuel-efficient operational strategies and substantial
cost savings.
• Customized Aircraft Design
An aircraft manufacturer may leverage feature selection to identify key specifications for
designing customized aircraft. By considering factors such as passenger preferences,
operational requirements, and fuel efficiency, the company can optimize its design process.
This can result in the production of aircraft that better meet the unique needs of specific
markets and clients.
• Enhanced Passenger Experience
An airline may aim to improve the overall passenger experience by tailoring services and
operations to individual preferences. The airline can access a diverse set of passenger
data, including demographic information, travel history, and in-flight behaviors. By utilizing a
combination of filter and wrapper methods, the airline can identify key features
influencing passenger satisfaction. This can lead to the implementation of personalized
services such as tailored in-flight entertainment recommendations, optimized seating
arrangements aligned with passenger preferences, and an efficient onboard retail selection.
• Integration of deep learning techniques
For enhanced predictive modeling, integration of deep learning techniques in aviation is
becoming important. Deep learning algorithms, with their capacity to automatically extract
intricate patterns from large datasets, hold the potential to improve the accuracy and
efficiency of feature selection, especially in scenarios where complex relationships exist
within the data.
• Explainable AI for aviation applications
Explainable AI (XAI) is of growing importance in aviation, which needs transparent and
interpretable machine learning models. As aviation systems become more reliant on AI,
ensuring the explainability of model decisions becomes crucial for safety, regulatory
compliance, and gaining the trust of industry stakeholders.
Future Trends and Technologies
• Advances in real-time feature selection
Real-time feature selection, where models dynamically adapt to changing data conditions, is
an emerging trend. With the advancements in computational capabilities, the ability to
perform feature selection in real-time allows aviation systems to respond promptly to
evolving circumstances, optimizing decision-making processes and enhancing overall
system responsiveness.
• In RapidMiner, using the Repository window, follow the path
Training Resources > Model > Unsupervised > Feature Weights and open the
Hotel App Select by Weight Solution process.
• In this example, three different feature selection methods are provided in the model:
Information Gain, Correlation, and Relief. All three are weighting methods used to
select features.
• Data is imported using the ETL subprocess.
RapidMiner Example on Feature Selection
• In this model, feature weighting is implemented in three different ways, using the
Feature Weights operators:
- Information Gain
- Correlation (using squared correlation)
- Relief
• Weights are normalized and sorted in descending order.
• For each of the three sets of weights, the Select by Weights operator keeps only the
most important attributes (threshold is set to 0.5).
• You can inspect the outputs using the Results view.
• You can select the most relevant features based on the weights you obtained.
• As mentioned earlier, you should also use domain expertise in the selection process.
• The selection threshold was set to 0.5. You can try different thresholds and make decisions
considering different scenarios.
• Building the model with the right combination of features will help you obtain more
successful and accurate outputs/predictions.
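The weight-and-threshold logic of this example can also be explored outside RapidMiner. The Python sketch below is not the RapidMiner implementation: it assumes that "normalized" means min-max scaling to the range 0 to 1, and the attribute names and raw weights are made up for illustration (they are not taken from the Hotel App process).

```python
def normalize(weights):
    """Min-max scale weights to [0, 1] (an assumption about the normalization)."""
    lo, hi = min(weights.values()), max(weights.values())
    span = hi - lo
    if span == 0:
        return {name: 1.0 for name in weights}
    return {name: (w - lo) / span for name, w in weights.items()}

def select_by_weight(weights, threshold=0.5):
    """Sort normalized weights in descending order and keep those >= threshold."""
    ranked = sorted(normalize(weights).items(), key=lambda item: item[1], reverse=True)
    return [name for name, w in ranked if w >= threshold]

# Hypothetical raw weights, e.g. squared correlations with the label.
raw_weights = {"attr_a": 0.42, "attr_b": 0.05, "attr_c": 0.30, "attr_d": 0.12}

for threshold in (0.1, 0.5, 0.9):
    print(threshold, select_by_weight(raw_weights, threshold))
```

Lower thresholds keep more attributes and higher thresholds keep fewer, so rerunning the loop with different values shows how sensitive the selected feature set is to this choice.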
• In summary, feature selection is crucial in aviation's data-driven narrative. It's not just a
tool; it's the essence of constructing precise and efficient machine learning models.
Navigating through the challenges posed by complex aviation data, we unveiled smart
strategies to enhance accuracy and efficiency. Tailored feature selection is the game-
changer, shaping a path towards more accurate predictions and optimized aviation
operations.
• Tailoring methodologies to aviation data's unique characteristics not only boosts model
accuracy but also ensures safety and operational efficiency. In aviation, selecting the right
features is like fine-tuning an instrument, orchestrating a harmonious symphony of data
insights.
Conclusion
• Given the rapid advances in deep learning and real-time processing, it is important to
keep discussing open challenges and to collaborate. Data scientists, aviation experts,
and researchers should collaborate to refine feature selection techniques.

Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 

ml-09x01.pdf

  • 1. This course is prepared under the Erasmus+ KA-210-YOU Project titled «Skilling Youth for the Next Generation Air Transport Management» Machine Learning Applications in Aviation Feature Selection Asst. Prof. Dr. Emircan Özdemir Eskişehir Technical University
  • 2. • In data mining, 80% of the analysis effort is spent on data cleaning and preparation and only 20% is typically spent on modeling. • Data cleansing and preparation are things that are better learned through experience and not so much from a book or a course. Feature selection is one of the important steps that affects the success of the model during the data preparation phase. • Feature selection is the process of choosing a subset of relevant features from a larger set to improve model performance, reduce computational complexity, and enhance interpretability. • Aviation datasets often contain a large number of features, including sensor data, flight parameters, weather conditions, and historical data. Selecting relevant features has important impacts on improving the accuracy, efficiency, and interpretability of machine learning models in aviation applications. • Specific challenges and considerations arise when dealing with aviation data. This may include the large size of datasets, high dimensionality due to various sensor inputs, potential noise in data, and the need to handle time-series and sequential data. Feature Selection 2 Introduction
  • 3. • Large datasets in aviation: Aviation datasets are expansive, encompassing a wealth of information from flight records, sensor readings, and historical data. Managing and processing such large volumes of data pose challenges in terms of storage, processing speed, and computational resources. • High dimensionality due to various sensor inputs: Aviation data is characterized by high dimensionality, arising from the diverse set of sensors capturing parameters like altitude, airspeed, and GPS coordinates. The multitude of sensor inputs increases the complexity of the data, emphasizing the need for effective feature selection to enhance model performance. • Noisy data and potential outliers: Aviation datasets often contain noise, stemming from sensor inaccuracies or external factors. Additionally, the presence of potential outliers can impact model training and prediction accuracy. Implementing preprocessing techniques and outlier detection methods becomes crucial to ensure the reliability and robustness of machine learning models in aviation applications. Feature Selection 3 Aviation Data Characteristics
  • 4. Sensor data: Sensor data encompasses information collected from various onboard sensors, such as altitude sensors, airspeed indicators, and GPS devices. These real-time measurements provide crucial insights into the aircraft's operational status and environmental conditions. Flight parameters: Flight parameters include data related to the aircraft's performance and navigation, such as pitch, roll, and yaw angles. These parameters are essential for understanding the dynamics of flight and optimizing control systems. Feature Selection 4 Types of Features in Aviation Data
  • 5. Weather conditions: Weather conditions feature meteorological data like temperature, humidity, wind speed, and precipitation. Incorporating this data is vital for assessing potential hazards and optimizing flight paths in response to changing weather patterns. Historical data: Historical data comprises information from past flights, maintenance records, and incidents. Analyzing historical patterns aids in predicting potential issues, optimizing maintenance schedules, and enhancing overall safety and efficiency. Feature Selection 5 Types of Features in Aviation Data
  • 6. Aircraft specifications: Aircraft specifications include details about the aircraft's make, model, engine type, and other technical specifications. This information is crucial for tailoring machine learning models to specific aircraft types and optimizing performance. Customer/Passenger data: Customer data may include information about passengers, their preferences, and feedback. While privacy considerations are paramount, leveraging anonymized customer data can contribute to personalized services, improved customer experiences, and operational efficiency. Feature Selection 6 Types of Features in Aviation Data
  • 7. • Data quality and preprocessing issues: Ensuring the quality of aviation data is paramount, as inaccuracies and inconsistencies may arise from sensor malfunctions or external factors. Preprocessing challenges include cleaning noisy data, handling missing values, and normalizing features to enhance the reliability of the machine learning models. • High dimensionality and computational complexity: Aviation datasets often exhibit high dimensionality due to numerous sensors and diverse parameters. The computational complexity associated with processing and analyzing such datasets can be a challenge. Feature selection methods must be efficient to handle the large number of features without compromising model performance. Feature Selection 7 Challenges in Aviation Feature Selection
  • 8. • Incorporating domain knowledge: In aviation, domain expertise is crucial for identifying relevant features and understanding the intricacies of the data. Integrating domain knowledge into the feature selection process ensures that selected features align with operational requirements and contribute to meaningful insights. • Handling time-series and sequential data: Aviation data frequently involves time-series and sequential information, such as flight trajectories and sensor readings over time. Feature selection methods need to account for the temporal nature of the data, considering how features evolve during different phases of flight and adapting to the sequential nature of events. Feature Selection 8 Challenges in Aviation Feature Selection
  • 9. • Remove irrelevant features/attributes • Increase the performance of your model • Make model training faster • Build your model more easily with fewer, more relevant features • Build models that are easy to understand • With fewer features, it’s easier to debug your models Feature Selection 9 Advantages of Feature Selection
  • 10. Feature Selection Filter Methods Correlation analysis Information gain Mutual information Wrapper Methods Recursive Feature Elimination (RFE) Forward selection Backward elimination Embedded Methods LASSO (Least Absolute Shrinkage and Selection Operator) Decision Trees and Random Forests Regularized regression models Feature Selection 10 Feature Selection Techniques
  • 11. Filter Methods: Filter methods involve the direct evaluation of individual features without considering the impact on the model. These methods are computationally efficient and include: • Correlation analysis: Assesses the linear relationship between features and identifies highly correlated ones. It helps in selecting a subset of features that are less redundant. Suitable for numerical data. • Information gain: Measures the reduction in uncertainty about the target variable when considering a particular feature. Features with high information gain are prioritized. Primarily used for categorical target variables, but can be adapted for numerical data. • Mutual information: Quantifies the amount of information shared between a feature and the target variable. It aids in selecting features that contribute significantly to predictive accuracy. Can be applied to both numerical and categorical data. Feature Selection 11 Feature Selection Techniques
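As a minimal sketch of one filter method, the snippet below computes information gain for categorical features against a categorical target and ranks the features by it. The feature names and the toy delay data are hypothetical, chosen only to illustrate the calculation; they do not come from the course material.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels inside each feature value group.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: was each flight delayed?
delayed = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
features = {
    "weather": ["storm", "storm", "clear", "clear", "storm", "clear", "clear", "storm"],
    "gate_side": ["left", "right", "left", "right", "left", "right", "left", "right"],
}

gains = {name: information_gain(values, delayed) for name, values in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

In this toy set the hypothetical `weather` feature separates the classes perfectly, while `gate_side` carries no information about the target, so a filter based on information gain would keep the former and drop the latter.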
  • 12. Wrapper Methods: Wrapper methods determine feature subsets based on the model's performance. These methods involve iterative model training and selection and include: • Recursive Feature Elimination (RFE): Systematically removes the least important features by training the model iteratively. RFE helps identify the most critical features for optimal model performance. Applicable to both numerical and categorical data. • Forward selection: Builds the feature set incrementally by adding the most relevant feature in each iteration. It continues until a predefined criterion is met, optimizing for model accuracy. Typically used with numerical data. • Backward elimination: Starts with all features and removes the least important ones iteratively. It aims to find the minimal subset of features that maximizes model performance. Similar to forward selection, it is often applied to numerical data. Feature Selection 12 Feature Selection Techniques
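Forward selection can be sketched as a greedy loop around any model-scoring function. The example below is only an illustration under stated assumptions: it uses leave-one-out accuracy of a 1-nearest-neighbour classifier as the model score, stops when no candidate improves the score, and runs on a small hypothetical dataset. The function names are made up for this sketch.

```python
def loo_accuracy(rows, labels, columns):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given columns."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum((row[c] - other[c]) ** 2 for c in columns)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += int(best_label == labels[i])
    return correct / len(rows)


def forward_selection(rows, labels, n_features, max_selected):
    """Greedily add the feature that most improves the model score."""
    selected, best_score = [], 0.0
    while len(selected) < max_selected:
        candidates = [c for c in range(n_features) if c not in selected]
        scored = [(loo_accuracy(rows, labels, selected + [c]), c) for c in candidates]
        score, column = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # stopping criterion: no candidate improves the score
        selected.append(column)
        best_score = score
    return selected, best_score


# Hypothetical data: column 0 is informative, columns 1 and 2 are noise.
rows = [
    [1.0, 5.0, 0.3],
    [1.2, 1.0, 0.7],
    [0.9, 3.0, 0.2],
    [3.0, 5.0, 0.6],
    [3.1, 1.0, 0.1],
    [2.8, 3.0, 0.8],
]
labels = [0, 0, 0, 1, 1, 1]

selected, best_score = forward_selection(rows, labels, n_features=3, max_selected=3)
print(selected, best_score)
```

Backward elimination follows the same pattern in reverse: start from all columns and repeatedly remove the one whose removal hurts the score least.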
  • 13. Embedded Methods: Embedded methods integrate feature selection into the model training process. These methods include: • LASSO (Least Absolute Shrinkage and Selection Operator): Introduces a penalty term during model training that encourages sparsity in feature weights, effectively selecting a subset of important features. Suitable for numerical data. • Decision Trees and Random Forests: Built-in feature selection mechanisms within decision tree algorithms. These models naturally highlight important features based on their contribution to decision-making. Can handle both numerical and categorical data. • Regularized regression models: Incorporate regularization terms in regression models, penalizing the inclusion of unnecessary features. This encourages the selection of relevant features. Primarily designed for numerical data. Feature Selection 13 Feature Selection Techniques
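To make the sparsity effect of LASSO concrete, here is a small sketch that minimizes (1/(2n))·||y − Xw||² + alpha·||w||₁ by coordinate descent with soft-thresholding. It assumes the features are already centered, has no intercept, and that no column is all zeros. The data and the value of `alpha` are hypothetical, and this is not a replacement for a tested library implementation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Coordinate descent for LASSO; assumes centered features and no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Hypothetical centered data: y depends strongly on column 0 and only weakly on column 1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0 * a + 0.05 * b for a, b in X]

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, weight in enumerate(w) if weight != 0.0]
print(w, selected)
```

The weak feature's weight is driven to exactly zero by the penalty, which is what makes LASSO act as a feature selector rather than only a shrinkage method.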
  • 14. • Thoroughly understand your dataset, including feature types and inherent patterns. • Clearly define your goal for feature selection, such as improving accuracy or interpretability. • Choose methods aligned with your data types—numerical or categorical. • Assess computational complexity, considering the size of your dataset. • Use methods like correlation analysis or regularization for correlated features. • Decide on model-agnostic or model-specific feature selection based on your preference. • Leverage domain expertise to guide feature selection based on contextual insights. • Explore ensemble methods like Random Forests, which naturally perform feature selection. • Assess the stability and consistency of the feature selection method. • Use cross-validation to ensure selected features generalize well to unseen data. • Experiment with multiple methods and compare outcomes to identify the most suitable. • Understand trade-offs between simplicity, accuracy, and computational efficiency. Feature Selection 14 How to choose the right feature selection technique?
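The advice on correlated features can be illustrated with a simple redundancy filter: keep a feature only if its absolute Pearson correlation with every feature already kept stays below a threshold. The column names, values, and the 0.9 threshold below are assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_redundant(columns, threshold=0.9):
    """Keep features, in order, that are not highly correlated with any kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: altitude in metres duplicates altitude in feet.
altitude_ft = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
columns = {
    "altitude_ft": altitude_ft,
    "altitude_m": [v * 0.3048 for v in altitude_ft],
    "wind_speed_kt": [12.0, 5.0, 9.0, 14.0, 7.0],
}

kept = drop_redundant(columns)
print(kept)
```

Which member of a correlated pair to keep is a judgment call; here the first one encountered wins, but domain expertise may favor a different choice.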
  • 15. Domain-Specific Feature Selection refers to the process of selecting relevant features for a machine learning model based on the specific knowledge and characteristics of a particular domain or industry. In other words, it involves tailoring the feature selection process to the intricacies and requirements of a specific field or domain of expertise. Key components of Domain-Specific Feature Selection are: • Domain Knowledge • Collaboration with Domain Experts • Custom Criteria for Selection • Relevance to Industry-Specific Goals • Enhanced Model Performance Feature Selection 15 Domain-Specific Feature Selection
  • 16. • Importance of domain knowledge in aviation: You should recognize the critical role of domain knowledge in aviation feature selection. Experts in the field possess insights into the significance of certain features and can guide the selection process for optimal model performance. • Collaboration with domain experts: The collaboration with aviation domain experts is invaluable. Working closely with professionals who understand the intricacies of aviation data ensures that feature selection aligns with operational requirements, safety considerations, and industry-specific nuances. • Custom feature selection based on aviation-specific criteria: There is a need for custom feature selection criteria tailored to aviation. Generic approaches may not capture the unique aspects of aviation data. Creating bespoke selection criteria based on industry-specific considerations enhances the relevance and effectiveness of the chosen features. Feature Selection 16 Domain-Specific Feature Selection
  • 17. • Optimizing Flight Safety An airline may implement feature selection to identify critical flight parameters from a vast array of sensor data. By focusing on key indicators such as altitude, airspeed, and engine performance, the airline can successfully enhance its predictive models for detecting potential safety issues. This results in more accurate and timely alerts, contributing to improved overall flight safety. • Efficient Aircraft Maintenance Scheduling An aviation maintenance facility may utilize historical data for predictive maintenance. By employing feature selection techniques, the team can identify the most relevant features related to aircraft health and performance. This can streamline the maintenance scheduling process, reducing downtime and operational costs, while ensuring optimal aircraft reliability. Feature Selection 17 Case Studies
  • 18. • Enhanced Air Traffic Management Air traffic control agencies face challenges in processing large volumes of data for optimal route planning. Feature selection methods can be applied to prioritize weather conditions, airspace congestion, and historical flight patterns. This enables the development of more efficient air traffic management systems, reducing delays and improving overall airspace utilization. • Fuel Efficiency Improvement An airline may aim to optimize fuel consumption by identifying the most influential factors affecting fuel efficiency. Feature selection can be conducted focusing on variables such as weather conditions, aircraft weight, and engine performance. The resulting model provides actionable insights, leading to fuel-efficient operational strategies and substantial cost savings. Feature Selection 18 Case Studies
  • 19. • Customized Aircraft Design An aircraft manufacturer may leverage feature selection to identify key specifications for designing customized aircraft. By considering factors such as passenger preferences, operational requirements, and fuel efficiency, the company can optimize its design process. This can result in the production of aircraft that better meet the unique needs of specific markets and clients. • Enhanced Passenger Experience An airline may aim to improve the overall passenger experience by tailoring services and operations to individual preferences. The airline can access a diverse set of passenger data, including demographic information, travel history, and in-flight behaviors. By utilizing a combination of filter and wrapper methods, the airline can identify key features influencing passenger satisfaction. This can lead to the implementation of personalized services such as tailored in-flight entertainment recommendations, optimized seating arrangements aligned with passenger preferences, and an efficient onboard retail selection. Feature Selection 19 Case Studies
  • 20. • Integration of deep learning techniques For enhanced predictive modeling, integration of deep learning techniques in aviation is becoming important. Deep learning algorithms, with their capacity to automatically extract intricate patterns from large datasets, hold the potential to improve the accuracy and efficiency of feature selection, especially in scenarios where complex relationships exist within the data. • Explainable AI for aviation applications Explainable AI (XAI) in aviation has a growing importance for transparent and interpretable machine learning models. As aviation systems become more reliant on AI, ensuring the explainability of model decisions becomes crucial for safety, regulatory compliance, and gaining the trust of industry stakeholders. Feature Selection 20 Future Trends and Technologies
  • 21. • Advances in real-time feature selection Real-time feature selection, where models dynamically adapt to changing data conditions, is an emerging trend. With the advancements in computational capabilities, the ability to perform feature selection in real-time allows aviation systems to respond promptly to evolving circumstances, optimizing decision-making processes and enhancing overall system responsiveness. Feature Selection 21 Future Trends and Technologies
  • 22. • In RapidMiner, using the Repository window, follow the path Training Resources-Model-Unsupervised-Feature Weights and open the Hotel App Select by Weight Solution process. • In this example, three different feature selection methods are provided in the model: Information Gain, Correlation, and Relief. All three are weighting methods used to select features. • Data is imported using an ETL subprocess. Feature Selection 22 RapidMiner Example on Feature Selection
  • 23. • In this model, feature weighting is implemented in three different ways, using the Feature Weights operators: - Information Gain - Correlation (using squared correlation) - Relief • Weights are normalized and sorted in descending order. • For each of the three sets of weights, the Select by Weights operator keeps only the most important attributes (the threshold is set to 0.5). Feature Selection 23 RapidMiner Example on Feature Selection
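The weight-then-threshold step can be sketched outside RapidMiner as follows. This is not the RapidMiner implementation: it assumes min-max normalization of the weights to the range 0 to 1 and a "greater than or equal to" comparison against the 0.5 threshold. The attribute names and raw weights are hypothetical and are not taken from the Hotel App data.

```python
def normalize_weights(weights):
    """Min-max scale weights to [0, 1]; identical weights all map to 1.0."""
    low, high = min(weights.values()), max(weights.values())
    span = high - low
    if span == 0:
        return {name: 1.0 for name in weights}
    return {name: (w - low) / span for name, w in weights.items()}


def select_by_weight(weights, threshold=0.5):
    """Return attribute names whose normalized weight meets the threshold, best first."""
    normalized = normalize_weights(weights)
    ranked = sorted(normalized.items(), key=lambda item: item[1], reverse=True)
    return [name for name, w in ranked if w >= threshold]


# Hypothetical raw weights, e.g. from an information-gain weighting step.
raw_weights = {"price": 0.42, "rating": 0.30, "distance": 0.10, "room_count": 0.02}

selected = select_by_weight(raw_weights, threshold=0.5)
print(selected)
```

Lowering the threshold (for example to 0.2) keeps more attributes, which is a quick way to compare scenarios before building the final model.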
  • 24. • You can inspect the outputs using Results view. Feature Selection 24 RapidMiner Example on Feature Selection
  • 25. • You can select the most relevant features based on the weights you obtained. • As mentioned earlier, you should also use domain expertise in the selection process. • The selection threshold was set to 0.5. You can try different thresholds and make decisions considering different scenarios. • Building the model with the right combination of features will help you obtain more successful and accurate outputs/predictions. Feature Selection 25 RapidMiner Example on Feature Selection
  • 26. • In summary, feature selection is crucial in aviation's data-driven narrative. It's not just a tool; it's the essence of constructing precise and efficient machine learning models. Navigating through the challenges posed by complex aviation data, we unveiled smart strategies to enhance accuracy and efficiency. Tailored feature selection is the game-changer, shaping a path towards more accurate predictions and optimized aviation operations. • Tailoring methodologies to aviation data's unique characteristics not only boosts model accuracy but also ensures safety and operational efficiency. In aviation, selecting the right features is like fine-tuning an instrument, orchestrating a harmonious symphony of data insights. Feature Selection 26 Conclusion
  • 27. • Given the rapid advances in deep learning and real-time processing, discussing challenges and collaborating are important. Data scientists, aviation experts, and researchers should collaborate to refine feature selection techniques. Feature Selection 27 Conclusion