Regression methods play an important role in aviation by enabling the prediction of variables like fuel consumption, flight times, and maintenance needs. Different regression techniques can be used, including linear regression, ridge regression, and lasso regression. Key considerations for applying regression in aviation include feature selection, addressing multicollinearity, and selecting the appropriate model. Regression analysis has various applications and can help optimize aspects of aviation operations and management.
1. This course is prepared under the Erasmus+ KA-210-YOU Project titled
«Skilling Youth for the Next Generation Air Transport Management»
Machine Learning
Applications in Aviation
Regression Methods
Asst. Prof. Dr. Emircan Özdemir
Eskişehir Technical University
2. • Regression Methods play a crucial role in predicting numerical outcomes
in the context of machine learning.
• In the aviation industry, whether forecasting fuel consumption, optimizing flight
paths, predicting maintenance needs, anticipating passenger demand for a
particular route, optimizing marketing strategies for increased ticket sales,
or predicting customer satisfaction levels, regression methods offer a
versatile toolkit for modeling intricate relationships between variables.
Regression Methods 2
Introduction
3. • Simple Linear Regression
Simple Linear Regression is a foundational regression method used in machine learning
and statistics to model the relationship between two variables: one independent variable
(predictor) and one dependent variable (outcome). The goal is to establish a linear
equation that best fits the observed data points, allowing for predictions or estimations of
the dependent variable based on changes in the independent variable.
The equation of a simple linear regression line is typically expressed as
y = b0 + b1·x, where b0 is the intercept and b1 is the slope.
Types of Regression Problems
4. • Simple Linear Regression
The regression analysis aims to find the values of b0 and b1 that minimize the difference
between the observed and predicted values, often using methods like the least squares
approach.
Source: https://medium.com/@sachin.hs20/simple-linear-regression-using-example-e4e2a89df54c
5. • Simple Linear Regression
In aviation, this model could help predict fuel consumption based on the number of
passengers, offering insights into how changes in passenger load may impact fuel
efficiency. This, in turn, aids in optimizing flight operations, resource planning, and cost
management.
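As a concrete sketch of this passengers-versus-fuel example, simple linear regression can be fit with ordinary least squares in a few lines of NumPy. The passenger counts and fuel figures below are invented for illustration, not real flight data:

```python
import numpy as np

# Hypothetical per-flight records: passenger count vs. fuel burned (kg)
passengers = np.array([120, 135, 150, 160, 142, 128, 170, 155, 138, 165], float)
fuel_kg = np.array([5100, 5400, 5750, 5900, 5550, 5250, 6150, 5800, 5450, 6000], float)

# Least-squares estimates: slope b1 = cov(x, y) / var(x), intercept from the means
b1 = np.cov(passengers, fuel_kg, bias=True)[0, 1] / np.var(passengers)
b0 = fuel_kg.mean() - b1 * passengers.mean()

# Estimated fuel for a 145-passenger flight
pred_145 = b0 + b1 * 145
```

Here b1 estimates the additional fuel per extra passenger and b0 the baseline burn; this is the same least-squares fit that regression libraries compute internally.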
6. • Multiple Linear Regression
Multiple Linear Regression is an extension of Simple Linear Regression, allowing for the
modeling of the relationship between a dependent variable and multiple independent
variables. In this method, the goal is to create a linear equation that best fits the observed
data points by considering the impact of multiple predictors on the outcome variable.
The general form of the Multiple Linear Regression equation is
y = b0 + b1·x1 + b2·x2 + … + bn·xn, with one coefficient for each predictor.
7. • Multiple Linear Regression
In the context of aviation, consider a scenario where the fuel consumption of an aircraft
(dependent variable) is influenced not only by the number of passengers (as in Simple
Linear Regression) but also by additional factors such as flight distance, weather conditions,
and aircraft type.
Multiple Linear Regression enables the creation of a model that incorporates all these
variables to provide a more comprehensive understanding of how they collectively influence
fuel consumption.
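The multiple-regression idea above can be sketched by stacking an intercept column and solving the least-squares problem with NumPy. The passenger, distance, and fuel numbers are fabricated for illustration and were generated from fuel = 1000 + 10·passengers + 3·distance, so the fitted coefficients are known in advance:

```python
import numpy as np

# Hypothetical flights: passengers and distance (km) as predictors of fuel (kg)
X = np.array([[120, 800], [150, 1200], [135, 950],
              [170, 1500], [142, 1100], [160, 1300]], float)
y = np.array([4600, 6100, 5200, 7200, 5720, 6500], float)

# Prepend an intercept column and solve the least-squares problem
X1 = np.column_stack([np.ones(len(X)), X])
(b0, b_pass, b_dist), *_ = np.linalg.lstsq(X1, y, rcond=None)
```

Because the target was generated without noise, the solver recovers the generating coefficients (intercept 1000, 10 per passenger, 3 per km) essentially exactly.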
Source: https://www.mathworks.com/help/stats/regress.html
8. • Polynomial Regression
Polynomial Regression is a type of regression analysis that extends the concept of linear
regression by introducing polynomial terms, allowing for the modeling of non-linear
relationships between the dependent and independent variables. In other words, it
accommodates situations where the relationship between the variables cannot be
adequately represented by a straight line.
9. • Polynomial Regression
The Polynomial Regression equation takes the form
y = b0 + b1·x + b2·x² + … + bn·xⁿ, where n is the degree of the polynomial.
10. • Polynomial Regression
The choice of the degree of the polynomial (n) depends on the complexity of the
relationship in the data. A higher-degree polynomial can fit the data more closely but may
risk overfitting, capturing noise rather than true patterns.
In the context of aviation, Polynomial Regression can be applied when predicting variables
with non-linear trends.
For example, forecasting the relationship between aircraft altitude and fuel efficiency might
involve a polynomial model, as the impact of altitude on fuel efficiency may not follow a
straight line. Polynomial Regression allows for a more flexible representation of such
relationships, enhancing the accuracy of predictions in scenarios where non-linear patterns
are evident.
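The altitude example can be sketched with a degree-2 polynomial fit. The efficiency index below is a made-up curve that rises toward a cruise altitude and then falls, the kind of non-linear pattern a straight line cannot capture:

```python
import numpy as np

# Hypothetical efficiency index that peaks near cruise altitude (thousand ft)
altitude_kft = np.array([20, 25, 30, 35, 40, 45], float)
efficiency = np.array([70, 82, 90, 92, 88, 78], float)

# Degree-2 fit: efficiency ≈ c2*a**2 + c1*a + c0
c2, c1, c0 = np.polyfit(altitude_kft, efficiency, deg=2)

# Vertex of the fitted parabola: the altitude the model considers most efficient
best_altitude = -c1 / (2 * c2)
```

A negative leading coefficient confirms the fitted curve is concave, with its peak inside the observed altitude band.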
11. • Linear Regression
It serves as the foundational algorithm for modeling linear relationships
between variables.
This method is particularly useful in scenarios where a clear, straight-line
relationship can be established, such as predicting fuel consumption based
on flight time.
Popular Regression Algorithms
12. • Ridge Regression
It introduces regularization techniques to improve model performance.
In aviation, where datasets often exhibit multicollinearity among variables,
Ridge Regression's regularization helps mitigate overfitting and enhances
the model's generalizability, contributing to more accurate predictions.
The aim of ridge regression is to minimize the sum of squared residuals (like
ordinary linear regression), but with an additional penalty for the sum of
squared coefficients (L2 regularization term).
It helps prevent overfitting, particularly when there is multicollinearity among
predictor variables.
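The penalized objective described above has a closed-form solution, b = (XᵀX + αI)⁻¹Xᵀy, which can be sketched directly. The design matrix below is synthetic, with one deliberately near-duplicate column to mimic multicollinearity:

```python
import numpy as np

# Synthetic near-collinear design (columns 0 and 1 are almost identical)
rng = np.random.default_rng(0)
z = rng.normal(size=(40, 2))
X = np.column_stack([z[:, 0], z[:, 0] + 0.01 * z[:, 1], rng.normal(size=40)])
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=40)

def ridge_fit(X, y, alpha):
    # Closed form: b = (X'X + alpha*I)^(-1) X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

b_ols = ridge_fit(X, y, 0.0)     # alpha = 0 recovers ordinary least squares
b_ridge = ridge_fit(X, y, 5.0)   # the L2 penalty shrinks the coefficients
```

With collinear columns the unpenalized solution is unstable and can have a large norm; the ridge solution is strictly smaller in norm, which is exactly the shrinkage the slide describes.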
13. • Lasso Regression
Lasso Regression, on the other hand, brings in L1 regularization, offering a
unique advantage in feature selection.
This is crucial in aviation scenarios where selecting the most relevant
features can significantly impact the accuracy and efficiency of regression
models. Lasso Regression aids in identifying and prioritizing the key
variables that influence outcomes, such as predicting aircraft maintenance
needs.
Its aim is similar to Ridge Regression, but with an L1 regularization term,
which penalizes the absolute values of the coefficients.
It promotes sparsity in the coefficient values, effectively performing
automatic feature selection by pushing certain coefficients to exactly zero.
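The exact-zero behaviour described above can be seen with a minimal coordinate-descent lasso. This is a toy solver on synthetic data (true coefficients [3, 0, 0, −2, 0]), not a production implementation:

```python
import numpy as np

# Toy coordinate-descent lasso: minimize (1/2n)||y - Xb||^2 + lam*||b||_1
def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # residual excluding feature j
            rho = X[:, j] @ r / n
            # Soft-thresholding: coefficients with |rho| < lam become exactly 0
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.05 * rng.normal(size=60)
b = lasso_cd(X, y, lam=0.5)
```

The three irrelevant features come back with coefficients exactly zero — automatic feature selection — while the two true predictors are kept (slightly shrunk toward zero by the penalty).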
14. • Elastic Net Regression
Elastic Net Regression is a hybrid regularization technique that combines
aspects of both Ridge Regression and Lasso Regression. It incorporates
both L1 (Lasso) and L2 (Ridge) regularization terms into the linear
regression objective function.
It aims to minimize the sum of squared residuals, like ordinary linear
regression, including both L1 and L2 regularization terms to penalize the
sum of absolute values (L1) and the sum of squared values (L2) of the
coefficients.
It’s suitable for situations where there are many correlated features, and
automatic feature selection is desired.
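Relative to a plain lasso coordinate update, the elastic net changes only the coordinate step: the L1 term still soft-thresholds, while the L2 term enlarges the denominator, adding extra shrinkage. A toy sketch on synthetic data (true coefficients [2, 0, −1.5, 0]):

```python
import numpy as np

# Toy elastic-net coordinate descent:
# minimize (1/2n)||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||^2
def enet_cd(X, y, l1, l2, n_iter=200):
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            # L1 soft-thresholds rho; L2 adds l2 to the denominator (extra shrink)
            b[j] = np.sign(rho) * max(abs(rho) - l1, 0.0) / (col_sq[j] + l2)
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
y = X @ np.array([2.0, 0.0, -1.5, 0.0]) + 0.05 * rng.normal(size=60)
b_en = enet_cd(X, y, l1=0.3, l2=0.5)
```

The irrelevant features are driven to (near) zero by the L1 part, while the surviving coefficients are shrunk more strongly than the lasso alone would shrink them, thanks to the L2 part.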
15. • Building the right model is important to avoid overfitting and underfitting.
In some cases, Ridge, Lasso and Elastic Net regression algorithms help
us to make better long-term predictions and avoid over- or underfitting.
Overfitting and Underfitting
Source: https://www.geeksforgeeks.org/lasso-vs-ridge-vs-elastic-net-ml/
16. • Mean Squared Error (MSE)
MSE is a regression evaluation metric that quantifies the average squared difference
between predicted values and actual values in the dataset.
A lower MSE indicates better model performance, with smaller errors between predicted
and actual values. It provides a measure of the accuracy of the model's predictions.
Regression Evaluation Metrics
17. • Root Mean Squared Error (RMSE)
The RMSE is a commonly used metric for evaluating the accuracy of a regression model. It
represents the square root of the average squared differences between the actual and
predicted values.
The lower the RMSE, the better the model's performance, as it indicates smaller prediction
errors.
18. • Mean Absolute Error (MAE)
MAE is another metric for evaluating the accuracy of a regression model. It measures the
average absolute differences between the actual and predicted values.
Unlike the squared differences in RMSE, MAE provides a more straightforward
interpretation of the average prediction error.
The lower the MAE, the better the model's performance in terms of absolute prediction
accuracy.
19. • R-squared (R²)
R-squared is a statistical metric that represents the proportion of the variance in the
dependent variable that is predictable from the independent variables. It ranges from 0 to 1,
where 1 indicates a perfect fit.
A higher R-squared value signifies a better fit of the regression model to the data. It
quantifies the proportion of variability in the dependent variable that can be explained by the
independent variables. However, caution is needed, as high R-squared does not
necessarily imply causation.
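All four metrics above can be computed directly from the prediction errors. With the small made-up vectors below, the values can be checked by hand: MSE 0.25, RMSE 0.5, MAE 0.5, R² 0.95.

```python
import numpy as np

# Tiny illustrative example: four actual vs. predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

err = y_true - y_pred
mse = np.mean(err ** 2)                    # average squared error
rmse = np.sqrt(mse)                        # back on the scale of y
mae = np.mean(np.abs(err))                 # average absolute error
# R^2 = 1 - SS_res / SS_tot
r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```

Note that RMSE and MAE carry the units of the response (e.g. kg of fuel), while R² is unitless, which is why R² is convenient for comparing fits across different targets.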
20. Conclusion:
• Lower values of MAE, MSE, and RMSE imply higher accuracy of a
regression model.
• However, a higher value of R-squared is considered desirable.
21. • Predictive Maintenance
Regression is employed to forecast maintenance needs based on historical data. By
analyzing patterns and trends in past maintenance records, aviation companies can
proactively schedule maintenance activities, reducing the risk of unplanned downtime and
improving overall operational efficiency.
• Predictors: Historical maintenance records, Sensor data from aircraft components, Flight
hours and cycles
• Outcome(s): Predicted time until the next maintenance event, Identification of specific
components requiring attention
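As an illustrative sketch of this use case, a linear model can map the predictors above (here, flight hours and cycles since overhaul) to the hours remaining until the next maintenance event. All numbers are fabricated — the target was generated from remaining = 2000 − 0.8·hours − 0.5·cycles — so the fitted coefficients are known in advance:

```python
import numpy as np

# Made-up records: [flight hours since overhaul, cycles since overhaul]
X = np.array([[500, 300], [800, 450], [1200, 700], [300, 200], [950, 600]], float)
# Hypothetical target: hours remaining until the next maintenance event
y = np.array([1450, 1135, 690, 1660, 940], float)

# Fit with an intercept column via least squares
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Forecast for an aircraft at 1000 hours and 550 cycles
predicted_remaining = np.array([1.0, 1000.0, 550.0]) @ coef
```

The negative coefficients on hours and cycles match the operational intuition: the more an aircraft has flown since its last overhaul, the sooner it needs attention.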
Regression Use Cases in Aviation
22. • Fuel Consumption Optimization
Regression models play a crucial role in optimizing fuel efficiency in aviation. By considering
factors such as aircraft type, weather conditions, and flight routes, regression helps in
developing models that guide fuel consumption strategies. This contributes to cost savings
and environmental sustainability.
• Predictors: Aircraft type and model, Weather conditions, Flight route and altitude
• Outcome(s): Optimal fuel consumption rate, Cost-effective fuel usage strategies
23. • Flight Time Prediction
Regression is utilized for estimating flight durations and enhancing scheduling processes.
By analyzing various variables such as departure and arrival locations, historical data, and
potential delays, regression models assist in predicting more accurate flight times,
improving overall airline scheduling efficiency.
• Predictors: Departure and arrival locations, Historical flight data, Weather forecasts
• Outcome(s): Predicted flight duration, More accurate scheduling of arrival and departure
times
24. • Passenger Loyalty Prediction
Airlines can leverage regression methods to analyze historical data related to passenger
behavior, including factors like booking frequency, travel preferences, and demographic
information. By building regression models, airlines can predict the likelihood of passengers
remaining loyal to their services. This allows for targeted strategies to enhance passenger
experience, offer personalized incentives, and ultimately foster stronger customer loyalty in
the highly competitive aviation industry.
• Predictors: Historical passenger behavior, Booking frequency, Travel preferences and
patterns, Demographic information
• Outcome(s): Predicted likelihood of a passenger choosing the airline for future travels,
Identification of factors influencing passenger loyalty
25. • Feature Selection
The selection of features plays a pivotal role. Features are the variables
used to predict the outcome, and choosing relevant ones is crucial for
accurate regression models.
Aviation datasets may contain numerous variables, and careful
consideration is needed to identify which features significantly contribute to
the predictive power of the model.
Feature selection ensures that the chosen variables align with the specific
objectives of the regression task, enhancing the model's efficacy and
interpretability.
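One simple screening approach for the feature selection described above is to rank candidate features by the absolute value of their correlation with the response. This is only a univariate sketch on synthetic data (real pipelines often use more robust methods such as regularization or cross-validated subset search):

```python
import numpy as np

# Synthetic data: only features 0 and 2 actually drive the response
rng = np.random.default_rng(4)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=n)

# Rank candidate features by |correlation| with the response
corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(4)])
selected = np.argsort(corrs)[::-1][:2]   # keep the two strongest
```

On this data the screen correctly keeps features 0 and 2 and discards the two pure-noise columns.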
Considerations and Challenges
26. • Multicollinearity
Multicollinearity refers to the presence of correlated predictor variables in a
regression model. In aviation datasets, certain features may exhibit high
correlation, potentially posing challenges.
When predictor variables are highly correlated, it becomes difficult to
distinguish the individual impact of each variable on the outcome.
Addressing multicollinearity is essential for maintaining the stability and
reliability of regression models.
Techniques such as variance inflation factor (VIF) analysis can be employed
to identify and mitigate multicollinearity, ensuring the robustness of the
regression analysis in aviation applications.
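VIF can be computed directly: regress each predictor on the others and set VIF_j = 1 / (1 − R_j²). A hand-rolled sketch on synthetic data (the near-collinearity between distance and flight time is contrived for illustration):

```python
# Illustrative VIF computation: regress each predictor on the rest;
# VIF_j = 1 / (1 - R_j^2). Values well above ~10 flag multicollinearity.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 400
dist = rng.uniform(500, 4000, n)                 # flight distance
time = dist / 800 + rng.normal(0, 0.1, n)        # flight time, nearly collinear with distance
pax = rng.integers(80, 300, n).astype(float)     # passenger count, independent

X = np.column_stack([dist, time, pax])

def vif(X, j):
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print([round(v, 1) for v in vifs])  # distance and time show high VIF, pax does not
```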
27. • Data Preprocessing
Data preprocessing is a critical step in aviation regression, emphasizing the
importance of cleaning and preparing data before applying regression
models.
Aviation datasets can be complex, containing various variables with missing
values, outliers, or inconsistencies. Data preprocessing techniques involve
handling missing data, addressing outliers, and transforming variables to
ensure they align with the assumptions of regression models.
By cleaning and preparing the data meticulously, analysts create a robust
foundation for regression analysis, enhancing the accuracy and reliability of
the subsequent modeling.
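The three steps named above (missing data, outliers, transformation) can be sketched in a few lines; the tiny array below is invented for illustration:

```python
# Minimal preprocessing sketch: impute missing values, clip outliers,
# and standardize. The raw matrix is a made-up example.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

raw = np.array([
    [120.0,  35.0],
    [np.nan, 40.0],    # missing flight time
    [135.0,  np.nan],  # missing fuel figure
    [900.0,  38.0],    # obvious outlier in the first column
])

X = SimpleImputer(strategy="median").fit_transform(raw)   # handle missing data
lo, hi = np.percentile(X, [5, 95], axis=0)
X = np.clip(X, lo, hi)                                    # tame outliers
X = StandardScaler().fit_transform(X)                     # align with model assumptions
print(X.round(2))
```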
Best Practices
28. • Model Selection
Model selection is a key facet of successful aviation regression. Different
regression algorithms may be suited to varying aviation problems, and
selecting the right algorithm is crucial for achieving accurate and meaningful
results.
Considerations such as the nature of the data, the relationship between
variables, and the specific goals of the regression task guide the choice of
regression algorithm. Whether it's linear regression, ridge regression, or
other advanced techniques, thoughtful model selection ensures that the
regression model aligns optimally with the characteristics of the aviation
dataset and the objectives of the analysis.
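Model selection can be made data-driven by comparing candidate algorithms under cross-validation. A hedged sketch on synthetic data, comparing the two algorithms the slide names:

```python
# Sketch of data-driven model selection: compare linear regression
# against ridge regression with cross-validated R^2 (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 200
X = rng.normal(size=(n, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(0, 1, n)

for name, model in [("linear", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```

Whichever candidate wins on held-out data is the one aligned with the dataset's characteristics, which is the criterion the slide describes.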
29. • Ensemble Regression Models
Ensemble regression models are an emerging trend in aviation analytics:
combining multiple regression models can yield more accurate and more
robust predictions.
Ensemble methods, such as Random Forest or Gradient Boosting, leverage
the strength of diverse models to mitigate individual weaknesses and
improve overall performance. By aggregating predictions from multiple
models, ensemble techniques offer a sophisticated approach to handle
complex relationships within aviation datasets, contributing to more reliable
regression outcomes.
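The two ensemble methods named above can be tried in a few lines; this is an illustrative sketch on a synthetic nonlinear target, not an aviation dataset:

```python
# Hedged sketch: ensemble regressors (Random Forest, Gradient Boosting)
# on a synthetic nonlinear relationship that a single linear model would miss.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(400, 2))
# Nonlinear target: interaction plus a quadratic term
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 1, 400)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
gb = GradientBoostingRegressor(random_state=0).fit(X, y)
print(f"RF R^2: {rf.score(X, y):.3f}")
print(f"GB R^2: {gb.score(X, y):.3f}")
```

Each ensemble aggregates many weak trees, which is the "diverse models mitigating individual weaknesses" idea in practice.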
Future Trends
30. • Explainable AI in Regression:
The evolving trend of Explainable AI in regression underscores the
increasing importance of transparency and interpretability in regression
models. As regression techniques become more advanced and complex,
understanding how models arrive at specific predictions becomes crucial,
especially in aviation scenarios where decisions impact safety and
operational efficiency. Explainable AI in regression aims to demystify the
decision-making process of advanced models, providing insights into the
factors influencing predictions and fostering trust in the outcomes. This trend
aligns with the industry's growing emphasis on accountability and
comprehension of AI-driven regression analyses.
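One concrete interpretability technique in this spirit (an illustration, not something the slides prescribe) is permutation importance: shuffle one feature at a time and measure how much the model's accuracy degrades:

```python
# Illustrative sketch: permutation importance as a model-agnostic
# explanation tool. Only feature 0 actually drives the synthetic target.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
y = 4.0 * X[:, 0] + rng.normal(0, 0.5, 300)  # only feature 0 matters

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("Importances:", result.importances_mean.round(3))
```

The importance scores reveal which factors drive the predictions, supporting exactly the transparency goal described above.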
31. • In RapidMiner, using the Repository window, follow the path
Training Resources-Model-Supervised-Linear Regression
and open the Hotel App CLV Linear Regression solution
process.
• In this example, the outcome (label) variable is «Customer
Lifetime Value (CLV)».
• CLV is the total revenue or profit generated by a customer
over the entire course of their relationship with your business.
• Predicting CLV using regression is of paramount importance
in the field of business and marketing.
• Since the CLV variable is also relevant to the airline business, we will
use this regression example.
RapidMiner Example on Linear Regression Model
32. • In the process window of this model, you can see the main operators for data import and
cross-validation. The cross-validation operator also contains the model operators (train and
test), the data split function, and the performance operator.
33. • When you double-click the cross-validation operator, you can see two main areas for
training and testing. You can simply drag and drop the model and performance operators
into these areas.
34. Cross Validation is a technique used for assessing the performance and generalization ability of a
predictive model. Cross-validation helps to evaluate how well a model will perform on an independent
dataset by partitioning the available data into subsets. The basic idea is to train and test the model
multiple times on different subsets of the data, and then average the performance metrics to get a
more reliable estimate of the model's performance. This operator:
• Splits data into k subsets (folds), typically with a value of k such as 5 or 10.
• The model is trained on k-1 folds and tested on the remaining fold. This process is repeated k
times, each time using a different fold as the test set.
• The performance metrics are recorded for each iteration.
• The average performance across all iterations is calculated to provide a more robust estimate of the
model's performance.
• You can simply set the «number of folds (k)» and «sampling type» using the operator parameters
window on the left.
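The k-fold procedure listed above can be sketched explicitly; RapidMiner's Cross Validation operator performs the equivalent steps internally (this sketch uses scikit-learn and synthetic data for illustration):

```python
# The k-fold steps above, written out with scikit-learn's KFold:
# split into k folds, train on k-1, test on the held-out fold, average.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(11)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.3, 100)

fold_mse = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # train on k-1 folds
    pred = model.predict(X[test_idx])                           # test on the held-out fold
    fold_mse.append(mean_squared_error(y[test_idx], pred))      # record the metric

print(f"Mean CV MSE over 5 folds: {np.mean(fold_mse):.3f}")    # average for a robust estimate
```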
35. • After you run the model, you can find the outputs in the Results view.
• You can find the coefficients of each predictor and the p-values of these variables in the
regression model equation.
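The coefficients and p-values reported in the Results view come from the classical OLS t-test on each coefficient; a minimal hand computation on synthetic data (for cross-checking the idea, not the course dataset):

```python
# Illustrative OLS sketch: estimate coefficients, then compute the standard
# t-statistic and two-sided p-value for each one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 3.0, 0.0])          # third coefficient is truly zero
y = X @ beta_true + rng.normal(0, 1, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof                   # residual variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_vals = beta / se
p_vals = 2 * stats.t.sf(np.abs(t_vals), dof)   # two-sided p-values
print("coef:", beta.round(3), "p:", p_vals.round(4))
```

A small p-value indicates the predictor's coefficient is statistically distinguishable from zero, which is how the Results view is typically read.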
36. • You can find the values of performance indicators such as MSE, RMSE, and R-squared.
Also, keep in mind that you can obtain more performance indicators by selecting the
corresponding options in the performance operator.
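The metrics named above are simple to compute directly, which makes it easy to cross-check the numbers in the performance output (the values below are a made-up example):

```python
# Computing MSE, RMSE, and R^2 by hand for a tiny made-up example.
import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 11.0, 14.0])
y_pred = np.array([10.5, 11.5, 14.0, 11.5, 14.5])

mse = np.mean((y_true - y_pred) ** 2)            # mean squared error
rmse = np.sqrt(mse)                              # same units as the label
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                         # fraction of variance explained
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```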
37. • You can find the predictions for each row in the Results view. These predictions can
be used for further visualization or discussion.
38. • For example, you can create graphs using the Visualizations tab in the left menu.
39. After running the regression model you can:
• Explain the relationship between variables.
• Obtain predicted values for the label attribute.
• Get performance metrics for the model and interpret them.
• Visualize the outputs.
• Get statistical outputs for the model.
40. • In conclusion, this chapter provides a comprehensive exploration of
Regression Methods in aviation.
• With a solid understanding of predictive modeling, regression methods
offer a robust toolkit for predicting numerical outcomes and optimizing
various aspects of the aviation industry.
• To build proficiency in applying regression techniques to nuanced
problems in aviation analytics, do not only grasp these foundational
concepts but also apply them in practical scenarios using RapidMiner.
Conclusion