ml-06x01.pdf

This course is prepared under the Erasmus+ KA-210-YOU Project titled
«Skilling Youth for the Next Generation Air Transport Management»
Machine Learning
Applications in Aviation
Clustering
Asst. Prof. Dr. Emircan Özdemir
Eskişehir Technical University

• Clustering, as a fundamental unsupervised learning technique, holds paramount
importance in discerning inherent structures within aviation datasets. It's the compass that
guides analysts through the vast sea of information, allowing them to uncover hidden
relationships, group similar entities, and derive meaningful insights.
• At its core, clustering is the art of finding natural groupings or clusters within a dataset.
These clusters represent entities that share similarities, creating a valuable framework for
understanding the underlying structure of aviation data.
Introduction

• Clustering operates in the unsupervised learning realm, where the algorithm explores the
data without predefined labels. In aviation analytics, where the intricacies of flight
patterns, maintenance records, and passenger behaviors are multi-faceted, unsupervised
learning becomes the compass guiding analysts through uncharted territories. Clustering,
in particular, becomes the lens through which intricate patterns and relationships come
into focus.
• Consider clustering as the air traffic controller of data points, guiding them to form
coherent patterns and groupings. This grouping mechanism is crucial in aviation for
various applications – from categorizing aircraft maintenance profiles with shared
characteristics to segmenting passenger behaviors for targeted marketing strategies.
Introduction

• K-Means clustering is a foundational algorithm in unsupervised learning, widely
employed for its simplicity and efficiency. This algorithm partitions data points into K
clusters, where each cluster is represented by its centroid. The iterative process refines
cluster assignments until convergence, making it a valuable tool for data analysis.
Types of Clustering Algorithms
Source (left): https://medium.com/data-folks-indonesia/step-by-step-to-understanding-k-means-clustering-and-implementation-with-sklearn-b55803f519d6
Source (right): https://www.ejable.com/tech-corner/ai-machine-learning-and-deep-learning/k-means-clustering/
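
The assign-and-update loop described above can be sketched in plain Python. This is a minimal illustration of the standard K-Means iteration (often called Lloyd's algorithm), not the implementation used by RapidMiner or any particular library. The function name `k_means`, the random initialization from data points, and the toy aircraft data are assumptions made for this example.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Partition `points` (tuples of floats) into k clusters."""
    rng = random.Random(seed)
    # Assumption: start from k data points picked at random as centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical toy data: (flight hours since last check, defect reports).
aircraft = [(100.0, 1.0), (120.0, 2.0), (110.0, 1.0),
            (900.0, 9.0), (950.0, 8.0), (920.0, 10.0)]
labels, centroids = k_means(aircraft, k=2)
print(labels)
print(centroids)
```

Note that K has to be chosen by the analyst in advance, and that the two attributes here are on very different scales, which is why normalization is usually applied before running K-Means on real data.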

• Hierarchical clustering stands out for its ability to create a hierarchical tree-like structure
of clusters. This method is particularly advantageous when the hierarchy of relationships
within the data is of interest. In data analytics, hierarchical clustering finds utility in
scenarios where data exhibits nested patterns.
Source: https://codinginfinite.com/hierarchical-clustering-applications-advantages-and-disadvantages/
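
The tree-like structure mentioned above comes from merging clusters step by step. The sketch below shows one common bottom-up (agglomerative) variant. The slide does not say which linkage rule is meant, so single linkage (distance between the two closest members) is an assumption here, as are the function name `agglomerative` and the sample points.

```python
import math


def agglomerative(points, num_clusters):
    """Bottom-up clustering with single linkage.

    Returns the final clusters (lists of point indices) and the merge
    history, which is the information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters, merges


# Hypothetical 2-D points; stopping at 2 clusters "cuts" the tree there.
pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0), (20.0, 20.0)]
clusters, merges = agglomerative(pts, num_clusters=2)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 2))
print(clusters)
```

Running the function with `num_clusters=1` records the full merge history, so the nested groupings can be inspected at every level.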

• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based
clustering algorithm that excels in identifying clusters of varying shapes and sizes. Unlike
K-Means, DBSCAN doesn't require specifying the number of clusters beforehand and can
uncover outliers or noise in the data.
Source: https://towardsdatascience.com/understanding-dbscan-and-implementation-with-python-5de75a786f9f
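
How DBSCAN finds clusters without a preset cluster count, and how it flags noise, can be illustrated with a short plain-Python sketch. It follows the usual textbook formulation with two parameters that the slide does not name: a neighborhood radius (`eps`) and a minimum number of neighbors (`min_pts`). The function name, the parameter values, and the sample points are assumptions for the example.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""

    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be re-labelled later as a border point
            continue
        cluster_id += 1  # point i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue  # already processed
            labels[j] = cluster_id
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nb)
    return labels


# Hypothetical data: two dense groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

The number of clusters is a result of the run rather than an input; what the analyst tunes instead is how dense a region must be to count as a cluster.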

Aircraft Maintenance Grouping
• Clustering can be used to categorize the maintenance profiles of aircraft. This application
involves grouping similar maintenance patterns based on historical data. By identifying
commonalities in maintenance needs, aviation experts can proactively plan and optimize
maintenance schedules, ensuring the fleet's operational efficiency and safety.
Clustering Use Cases in Aviation
Source: https://investinestonia.com/magnetic-mro-wants-to-conquer-the-world/

Passenger Segmentation
• In the dynamic world of aviation, understanding passenger behavior is akin to deciphering
a complex code. Clustering steps in as the linguist, segmenting passenger behaviors into
meaningful groups. Whether it's frequent flyers, leisure travelers, or business executives,
clustering enables airlines to tailor services, marketing, and experiences for distinct
passenger segments, enhancing overall customer satisfaction and loyalty.
Source: https://investor-relations.lufthansagroup.com/fileadmin/downloads/en/charts-speeches/capital-markets-day-2019/capital-markets-day-2019-presentations.pdf

Route Optimization
• Flight paths are the arteries of aviation operations, and clustering serves as the compass
for optimal route planning. By analyzing geographical patterns, clustering algorithms
assist in grouping destinations with similar characteristics. This allows airlines to optimize
flight routes, considering factors like weather, fuel efficiency, and airspace constraints.
The results are streamlined operations, reduced fuel costs, and improved on-time
performance.
Source: http://coolinfographics.squarespace.com/blog/2016/6/3/the-global-air-transportation-network.html;jsessionid=E11686681A0ACCCF95D74B92A8C72E4E.v5-web014

Feature Selection
Feature selection stands as a crucial consideration, emphasizing the
importance of choosing the most relevant variables for effective clustering.
Like a skilled pilot choosing the essential instruments for a smooth flight,
feature selection ensures that the clustering algorithm focuses on the data
aspects most pertinent to the aviation context.
Considerations and Challenges in Aviation Clustering

Scalability
• Scalability emerges as a challenge that requires careful attention. As
datasets soar in size and complexity, clustering algorithms must efficiently
handle the increasing volume of information. Like managing air traffic,
scalability in clustering ensures that algorithms remain effective and
responsive even when dealing with vast aviation datasets, contributing to
the seamless analysis of patterns and insights.
Considerations and Challenges in Aviation Clustering

Data Preprocessing: Data preprocessing emerges as a critical best practice, emphasizing
the significance of cleaning and preparing data for clustering tasks. Similar to ensuring that
an aircraft is in optimal condition before takeoff, data preprocessing ensures that the input
data is refined and ready for the intricate process of pattern recognition through clustering.
Evaluation Metrics: Assessing the success of clustering models requires reliable metrics.
Incorporating evaluation metrics is essential for gauging the performance of clustering
algorithms. These metrics act as measurement tools, allowing analysts to quantitatively
understand how well the clustering process aligns with the goals of aviation analytics.
Best Practices for Aviation Clustering
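
As one concrete example of an evaluation metric, the silhouette coefficient compares how close each point is to its own cluster with how close it is to the nearest other cluster. The slide does not name a specific metric, so choosing the silhouette here is an assumption, as are the function name and the sample data. Values near 1 suggest well-separated clusters; values near or below 0 suggest overlapping or misassigned points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it adds nothing).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical points with a sensible and a poor assignment.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

Because the metric needs no label attribute, it can be used to compare runs with different values of K or different attribute sets on the same data.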

• In RapidMiner, using the Repository window, follow the
path Training Resources-Model-Unsupervised-
Segmentation and open the Credit Risk k-Means
Clustering solution process.
• In this example, the goal is to segment the customers
into groups according to their credit risk.
• There is no label attribute; several attributes related to
the customers' credit risk are taken into account to
segment them.
• Therefore, a clustering model is chosen to reach this goal.
RapidMiner Example on Clustering

• In the process window, there are a data importing (ETL) operator, a clustering model
operator, and a cluster model visualizer operator. The ETL operator contains suboperators
that preprocess the data; Z-transformation (normalization) appears here in the subprocess
window. Model parameters (the value of k, the numerical measure type, etc.) can also be
set in the panel on the right.
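
The Z-transformation mentioned above rescales each attribute to mean 0 and standard deviation 1, so that attributes measured in large units do not dominate the distance calculation. A minimal sketch follows. The function name and the sample rows are hypothetical, and the use of the population standard deviation is an assumption (tools differ on whether they divide by n or n - 1).

```python
import statistics


def z_transform(rows):
    """Rescale every column of `rows` to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Assumption: population standard deviation (divide by n).
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((v - m) / s if s > 0 else 0.0  # constant columns map to 0
              for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Hypothetical customer rows: (age in years, yearly income).
rows = [(25, 20000.0), (45, 21000.0), (30, 80000.0), (50, 82000.0)]
scaled = z_transform(rows)
for original, z in zip(rows, scaled):
    print(original, "->", tuple(round(v, 2) for v in z))
```

Before scaling, the distance between any two customers is determined almost entirely by income; after scaling, age and income contribute on a comparable footing.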

• After you run the model, you can
find the outputs in the Results view.
• You can find the cluster graph, the
members of each cluster, the centroid
table, and a plot view of the clusters
here.
• The figure on the right shows the
cluster graph (tree).

• In the Results view, the cluster model
visualizer operator provides further
insights.
• In the Overview tab, you can see
the clusters and their breakdowns
based on attributes.

• In the Scatter Plot tab, you can
create scatter plots focusing on
each cluster.

In the Plot tab, you can see:
- The attributes on which the clusters differ most
- The attributes on which the clusters do not differ
- The complexity of the differences
between clusters

• Moreover, if you want to analyze the performance of the clustering model, you can use
the performance operators under the segmentation folder in the operators window.
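
To give a sense of what a clustering performance measure computes, the sketch below implements the Davies-Bouldin index, a widely used internal metric based on cluster centroids. The slide does not state which measures the RapidMiner performance operators report, so this choice is an assumption, as are the function name and the sample data. Lower values indicate compact, well-separated clusters.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower means tighter, better separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster (column-wise mean of its members).
    centroids = {
        lab: tuple(sum(col) / len(members) for col in zip(*members))
        for lab, members in clusters.items()
    }
    # Scatter: mean distance from the members to their centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }
    # For each cluster, take its worst (largest) similarity to another one.
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Hypothetical points with a sensible and a poor assignment.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
db_good = davies_bouldin(pts, [0, 0, 1, 1])
db_bad = davies_bouldin(pts, [0, 1, 0, 1])
print(round(db_good, 3), round(db_bad, 3))
```

Comparing such a score across models built with different K values or attribute sets gives a quantitative basis for choosing between them.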

• Incorporation of AI in Clustering
As aviation analytics evolves, the integration of Artificial Intelligence (AI) into clustering
methods represents a significant trend. Advanced AI techniques enhance the capabilities of
clustering algorithms, allowing them to adapt and discover intricate patterns within aviation
datasets, leading to more accurate and nuanced insights.
• Explainable Clustering
The future of aviation clustering emphasizes the importance of transparency and
interpretability. As clustering models become more sophisticated, the need for
understanding the rationale behind clustering outcomes grows. The concept of explainable
clustering ensures that results are not only accurate but also comprehensible, fostering trust
in the decision-making processes driven by clustering algorithms.
Future Trends in Clustering

• In this lesson, we explored clustering in aviation comprehensively.
• Various clustering algorithms were introduced, and their main characteristics and
differences were explained.
• You can further explore clustering algorithms using different datasets in RapidMiner.
• Feature selection (selection of attributes) is key to building accurate clustering
models. So, try different combinations of attributes in your clustering models and try
to figure out the differences between the outputs.
• Also remember to compare the performance of your clustering models.
Conclusion
