Predicting Suicide Rates, Hotel Booking
Cancellation and Bittering Units using Machine
Learning Algorithms
Pooja Kumar
MSc Data Analytics
National College of Ireland
Dublin, Ireland
x18181929@student.ncirl.ie
Abstract—The aim of this analysis is to apply different
machine learning algorithms to three different datasets to
identify patterns and insights. Multiple linear regression and
gradient boosting regression are applied to the Suicide Rates
Overview dataset to predict the suicide rate of a country based
on various factors of that country. On the Hotel Booking
Cancellation dataset, Support Vector Machine (SVM) and Naïve
Bayes algorithms are applied to predict booking cancellation
based on booking information. K-nearest neighbor (KNN) and
gradient boosting algorithms are applied to the Bittering units
of Beer dataset to predict the bitterness of beer based on the
brewing style. All the methods are evaluated and validated
to find which method performs better on each dataset.
Keywords—Multiple linear regression, gradient boosting, K-
nearest neighbors (KNN) regression, Support Vector Machine
(SVM), Naïve Bayes, K-fold validation.
I. INTRODUCTION
A. Suicide Rates Overview
Globally, nearly 800,000 people commit suicide every
year, which means that one person commits suicide every 40
seconds. Suicide is the third leading cause of death among
15-19 year olds. The suicide rate in men is just over twice
that in women. In 2017, “5% in South Korea, 3.9% in Qatar, 3.6%
in Sri Lanka” were the highest recorded shares of deaths due to
suicide [17]. The World Health Organization (WHO) has
recognized suicide as a public health priority and is raising
awareness about suicide prevention. Many studies are being
carried out on its prevention. This problem needs to be
addressed because it affects a country’s human resources. This
study analyzes the factors on which the suicide rate
depends and creates a predictive model for it.
Research Question: What are the parameters that help in
predicting the suicide rate of a country?
B. Hotel Booking Cancellation
Booking cancellation has a notable influence on
management decisions in the hospitality industry. Hotels
apply strict cancellation policies to reduce cancellations,
which can damage the reputation of the hotel. Hotels may also
have to refuse service as a consequence of overbooking, which
has a negative impact on immediate revenue. A hotel guest has
the option to cancel a reservation by paying a small fee, but
for the hotel manager a cancellation diminishes revenue.
Online reservations, which differ from traditional
reservations made directly by guests, make cancellations even
harder to handle. In this study, guest and booking attributes
are analyzed to classify whether a booking is likely to be
cancelled.
Research Question: How can past customer interaction with
hotel management be utilized to identify booking
cancellations in the future?
C. Bittering units of Beer
Beer is the most consumed alcoholic beverage and is made from
barley, water and yeast. It contains both nutrient and
non-nutrient compounds and, if consumed in sensible
amounts, can contribute to a healthy diet. The taste and quality
of beer depend on the quantity and quality of the ingredients.
Bitterness is an important quality parameter in beer
production. Machine learning is increasingly being applied to
the traditional brewing process. This study focuses on
analyzing the factors that affect the bitterness of beer.
Research Question: What are the factors that contribute to
predicting the bittering units of beer?
In this paper, Section I gives the introduction and the
research questions that motivate the study.
Section II describes the related work and various aspects of
other research in the same domains. Section III
describes the detailed methodology used to carry out the
process. Section IV describes the machine learning
algorithms applied to the datasets and interprets the
results. Section V concludes the analysis and is
followed by the references.
II. RELATED WORK
A. Suicide Rates Overview
Boonkwang et al. [4] try to identify suicidal
characteristics by applying machine learning methods to
data collected from the self-harm surveillance report of a
psychiatric hospital, which contained information about suicide
attempts. Different machine learning models were applied to
identify individuals with suicidal characteristics who
repeatedly attempt suicide. The decision tree yielded a
better result in classifying suicidal characteristics than
Naïve Bayes. Because the distribution of the characteristics
was imbalanced, the Synthetic Minority Oversampling
Technique (SMOTE) was applied. Compared with bagging, the
SMOTE ensemble technique gave better accuracy.
Joseph et al. [5] applied six different machine
learning methods, including Logistic regression, Random forest,
Decision tree, Classification via Regression and Sequential
Minimal Optimization (SMO), to parasuicidal patients’ data.
Classification via Regression had higher efficiency than
the other methods. However, the patients’ information
has to be updated very frequently to obtain a good result,
because this method uses psychological measures to predict
a person’s behavior.
Walsh et al. [6] apply a machine learning method
to classify psychopathological behavior in order to understand
how a person develops problematic behavior and tends to
self-harm. First, physical injury data were validated to
identify suicide attempts. An explanatory study was conducted
to analyze how specific risk factors change over time.
Random forest was used for classification as it can handle
nominal variables, and the bootstrap method was used to
optimize the model. This method gave a more accurate result,
but there was no difference in performance between single and
repeat suicide attempts. Predictor importance changed over
time within the model.
Hayes et al. [7] used statistical analysis to
identify suicidal behavior among college students who are
psychotherapy patients. Regression analysis showed
that suicidal behavior depended on
depression, behavior and self-injury. The complete analysis
found that a few students attempted
suicide during the treatment period. Suicidal behavior can be
identified based on whether the self-injury was intentional
or not.
Bae et al. [8] conducted a study on Korean
adolescents in which decision tree analysis was applied to
high school students to predict suicide attempts based on
sociodemographic, intrapersonal, and extrapersonal
variables. Depression was the strongest predictor of
suicide attempts. Based on depression, students
were classified into three groups; the depression and
potential depression groups had a higher rate of suicide
attempts than the non-depression group. In the non-depression
group, students who experienced a high level of stress
tended to attempt suicide.
B. Hotel Booking Cancellation
Antonio et al. [9] developed a model to predict booking
cancellation in order to reduce the impact of cancellations
and overbooking on hotel reputation and revenue. Property
management system data from four resorts with high booking
cancellation rates were used.
First, the factors which influenced booking cancellation were
identified. A model was developed separately for each resort
and different algorithms, such as boosted decision tree and
decision forest, were applied. The models were validated using
k-fold cross-validation. The boosted decision tree had better
accuracy. Bookings with a high chance of cancellation can be
identified, which allows hotel managers to prevent these
cancellations by offering discounts and other services. As this
model was applied to only four resorts, the prediction result
may vary if it is applied to different hotel data.
Falk et al. [10] model booking cancellation using a probit
model. Separate estimates are given for rooms booked offline,
bookings made via online travel agencies and bookings made
through traditional travel agencies. The probit estimates
indicate that the probability of cancellation is
higher for early bookings, for bookings without children,
for offline bookings in the high season, for guests from
specific countries and for bookings made online. This study is
based on the records of hotels belonging to a particular chain.
Thus, the results cannot be generalized to other hotel groups.
In the tourism industry, the tourist destination hotel is the
revenue indicator. Precise travel forecasting can bring a lot of
revenue to hotel management [16]. Uncertainty in
predicting the number of passengers during the peak season
may lead hotel management to overestimate
or underestimate passenger presence, which results in
wasted resources. Present prediction models use
linear and non-linear techniques. The authors of [15] explain
the drawback of machine learning algorithms for hotel
booking cancellation prediction: prediction performance
increases as the sample size increases, but it tends to
level off after some time. To overcome this problem deep
learning methods are used; long short-term memory
(LSTM) is better suited to time series forecasting. For this
prediction, data from Hainan province, a tourist destination in
China, was taken. The model was trained on one-night
passenger flow and then tested. A prediction of 12 months
of passenger flow was made in one fold. The model
predicted well and is suitable for predicting dynamic
characteristics.
C. Bittering units of Beer
Popescu et al. [11] examine the characteristics of beer
during different stages of the Romanian brewery beer
production process using statistical methods. Romanian light
beer contains 3.4% - 3.9% alcohol and dark beer contains
3.7% - 4.6% alcohol. Beer color is influenced by the
usage of wheat. Bitterness gradually decreases at each stage.
The expected loss of bitterness ranges from 24.7% to 41.54%
across the boiling, fermentation and bottling processes.
Beer properties have been identified using the ultraviolet-visible
(UV-VIS) spectrum together with an artificial neural network
(ANN) and principal component regression (PCR) [12]. The
diluted beer was scanned, and the absorbance data were collected
and used for modelling. The PCR model showed no acceptable
correlations, whereas the ANN exhibited appropriate accuracy. To
examine the nutraceutical and mineral properties of beer,
three beers with different alcohol content from the same brand
were taken [13]. Mineral analysis of each sample
found that the zero beer had few important minerals
compared with regular and light beers. Statistical differences
were evaluated using ANOVA.
III. DATA MINING METHODOLOGY
The datasets belong to three different domains, to which
different machine learning methodologies have been applied.
The Knowledge Discovery in Databases (KDD) methodology is
used to extract information from the datasets. It is an
iterative process [18] that begins with identifying objectives
and ends with a model implemented based on the
knowledge discovered. Each step in fig. 1 is explained with
respect to the datasets.
Fig. 1. KDD Process Overview
Step 1: In this step the application domain needs to be understood
from the end user's perspective, and appropriate prior knowledge
has to be acquired. For the analysis three different domains were
chosen. First, Suicide Rates Overview, to understand the risk
factors of suicide. Second, Hotel Booking Cancellation, to
identify how the revenue and reputation of a hotel are affected.
Third, Bittering units of beer, to identify the bitterness based
on the brewing style and alcohol content.
Step 2: The objectives should be defined and the datasets to which
the machine learning models are applied should be identified.
The three datasets used for analysis are taken from Kaggle.
The Suicide Rates Overview dataset contains 27820 records
and 12 columns [1]. The Hotel Booking Cancellation dataset
contains 119390 records and 32 columns and holds booking
information about a resort hotel and a city hotel
along with guests’ requirement specifications [2]. The Bittering
units of beer dataset contains 73861 records and 23 columns
and holds information about homebrewed beer [3].
Step 3: Data pre-processing takes place by handling
missing data, removing outliers and preparing the data for
analysis.
In the Suicide Rates Overview dataset, the HDI_for_year
column had missing values, which were imputed using the median
value. Fig. 2 shows the count of missing values.
In the Hotel Booking Cancellation dataset four
values were missing in the children column and were replaced
by 0. The agent and company columns were dropped as they
were not used in the analysis. Fig. 3 shows the missing value
count.
Fig. 2. Missing Values in Suicide Rates Overview Dataset
Fig. 3. Missing values in Hotel Booking Cancellation dataset
In the Bittering units of beer dataset many values were
missing, as seen in fig. 4. PitchRate and BoilingGravity
were imputed using mean values. The other columns with missing
values were dropped as they were not involved in the analysis.
Fig. 4. Missing values in Bittering units of beer dataset
In all three datasets the columns which were irrelevant to
analysis were dropped.
Step 4: The required variables in the dataset need to be
transformed into an appropriate format. For example, for
classification the response variable should be categorical and
for regression it should be continuous. In the Hotel Booking
Cancellation dataset deposit_type and customer_type were
converted into factors and the independent variables were
normalized by applying a scaler.
Step 5: In the fifth step, based on the KDD objective, a decision
has to be made on whether classification, regression or clustering
is applied to the dataset. For Suicide Rates Overview and
Bittering units of beer, regression methods are used. For Hotel
Booking Cancellation, a classification method is used.
Step 6: The data mining algorithms have to be selected for each
dataset. For the Suicide Rates Overview dataset, multiple linear
regression and gradient boosting regression are applied.
Support Vector Machine (SVM) classification and Naïve
Bayes classification are applied to the Hotel Booking
Cancellation dataset. For the Bittering units of beer dataset,
K-nearest neighbors (KNN) regression and gradient boosting
regression are applied.
Step 7: In this step the algorithms are applied to identify
patterns and obtain results. To apply the
machine learning algorithms, all three datasets were split in an
80:20 ratio into training and testing data.
Step 8: The results obtained are evaluated using
metrics such as Root mean squared error (RMSE), Mean
absolute error (MAE), R-square, accuracy and the confusion
matrix.
Step 9: Finally, the obtained result is stored and accessed when
it is required.
In Section IV, the machine learning algorithms are applied to
each dataset and the models are evaluated based on the
results obtained.
A. Suicide Rates Overview
For this dataset, as per the KDD process, a target has to be set.
Here the target is to predict the suicide rates for a country. To
predict this, a dataset was taken from Kaggle which has 27820
records and contains information such as country, population,
suicides per 100k population, sex, age, year, suicide numbers,
human development index (HDI) for year, gross domestic
product (GDP) for year, GDP per capita and generation [1].
The data was checked for missing values. The HDI for year
column had missing values, which were imputed using the median.
As the other variables were in the proper format, no
transformation was required. Two regression models, multiple
linear regression and gradient boosting, are applied to
this dataset. For multiple linear regression, the independent
variables were chosen based on their correlation.
Population of a country, HDI for year, GDP per
capita and suicides per 100k population are used as
explanatory variables to predict suicide numbers in a country.
The data was split in an 80:20 ratio into training and testing
data. The model was trained and tested, and it is evaluated in
Section IV.
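The median imputation described above for the HDI for year column (and the mean imputation used for the beer dataset) can be sketched as follows. This is a minimal, library-free illustration: the helper `impute` and the toy values are assumptions made for the example and are not taken from the study's code or data.

```python
from statistics import mean, median


def impute(values, strategy="median"):
    """Replace None entries with the median or mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical toy columns; the numbers are illustrative only.
hdi_for_year = [0.70, None, 0.80, 0.90, None]
pitch_rate = [0.35, 0.75, None, 1.0]

hdi_filled = impute(hdi_for_year, "median")  # gaps take the median of the observed values
pitch_filled = impute(pitch_rate, "mean")    # gaps take the mean of the observed values
print(hdi_filled)
print(pitch_filled)
```

The median is less sensitive to extreme values than the mean, which is one common reason to prefer it for a skewed column.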
B. Hotel Booking Cancellation
For this dataset, as per KDD, the target is to predict
booking cancellation. This dataset was taken from Kaggle
[2]. It has 119390 records and 32 columns and contains
the booking information of customers. The data was checked for
missing values: agent, company and children values were
missing. In the children column 4 values were missing and were
replaced with 0. The agent and company columns were dropped
because they were not used in the analysis. The categorical
variables, which had labels instead of values, are transformed
using dummy variables: a new variable is created for each level,
where 1 represents the presence of the level and 0 represents
its absence. Scaling was applied to the explanatory
variables to normalize the data. The classification models SVM
and Naïve Bayes are applied to this dataset.
The lead time, stays in week nights, number of
adults, previous cancellations, booking type and customer type
are used as explanatory variables to predict booking
cancellation. The data was split into training and testing data
in an 80:20 ratio, and the models were trained and tested. The
models are evaluated in Section IV.
Fig. 5. Hotel Booking Cancellation
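The dummy-variable encoding and scaling steps can be sketched as below. The text does not say which scaler was applied, so standardization to zero mean and unit variance is an assumption here; the helper names and the toy rows are also hypothetical.

```python
from statistics import mean, pstdev


def one_hot(column):
    """Map a categorical column to 0/1 indicator columns, one per level."""
    levels = sorted(set(column))
    return {level: [1 if v == level else 0 for v in column] for level in levels}


def standardize(column):
    """Rescale a numeric column to zero mean and unit variance (assumed scaler)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


# Hypothetical rows; the level names are illustrative.
deposit_type = ["No Deposit", "Non Refund", "No Deposit", "Refundable"]
lead_time = [10, 20, 30, 40]

dummies = one_hot(deposit_type)
scaled = standardize(lead_time)
print(dummies["No Deposit"])  # 1 where the level is present, 0 where it is absent
print(scaled)
```

Scaling matters most for distance- or margin-based models such as SVM, where a feature with a large numeric range would otherwise dominate.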
C. Bittering units of Beer
For this dataset, as per the KDD process, the target is to
predict the bittering units of beer. For this analysis the dataset
was taken from Kaggle [3]. It has 73861 records and 23
variables and contains beer brewing information. The dataset
was checked for missing values. It was found that boil
gravity, mash thickness, pitch rate, primary temperature,
priming method, priming amount and user Id had missing
values. Boil gravity and pitch rate were imputed using the mean.
The other variables were dropped as they are not used in the
analysis. Two regression models, KNN and gradient boosting
regression, are used. The alcohol by volume (ABV),
color, boil time and pitch rate are used as explanatory
variables to predict the bittering units of beer. The dataset was
split in an 80:20 ratio for training and testing. The models were
trained and tested. The evaluation metrics are discussed in
Section IV.
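The 80:20 split used for all three datasets can be illustrated with a short sketch. The function name, the fixed seed and the shuffling step are assumptions for the example; the text does not state how the split was drawn.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and hold out the last test_ratio share for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for dataset records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so the reported metrics can be regenerated on the same held-out records.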
IV. EVALUATION
In this section, the machine learning models are applied to
each dataset and evaluated based on the results
obtained.
A. Suicide Rates Overview
Multiple linear regression and gradient boosting regression
models are applied to this dataset.
Multiple linear Regression: It is useful for modelling the
relation between a response variable and multiple explanatory
variables. Suicide number is the response variable, and
country population, number of suicides/100k population,
Human development index (HDI) for year and Gross
Domestic product (GDP) per capita are the explanatory
variables. The variables were checked for multicollinearity
using a correlation matrix. As no correlation coefficient
is greater than 0.7, there is no multicollinearity, as seen in
fig. 6.
Fig. 6. Correlation matrix
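The multicollinearity check can be sketched as a pairwise Pearson correlation test against the 0.7 threshold. The column values below are toy numbers chosen so that one pair is flagged; they are not from the dataset, where no pair exceeded the threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def collinear_pairs(columns, threshold=0.7):
    """Return the pairs of column names whose absolute correlation exceeds the threshold."""
    return [(a, b) for a, b in combinations(columns, 2)
            if abs(pearson(columns[a], columns[b])) > threshold]


# Hypothetical toy columns (illustrative values only).
columns = {
    "population": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gdp_per_capita": [2.0, 1.0, 4.0, 3.0, 5.0],
    "hdi_for_year": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = collinear_pairs(columns)
print(flagged)
```

An empty result corresponds to the situation reported in the text, where all explanatory variables are kept.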
The model was trained and results were predicted. The
predicted results were validated using the test data. To evaluate
the predictions, Mean Absolute Error (MAE), Root Mean
Squared Error (RMSE) and R-square values were obtained.
Fig. 7 shows the evaluation metrics of this model. The
R-square value is 0.514.
Fig. 7. Evaluation metrics Multiple linear Regression
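For reference, the three regression metrics can be written out directly from their standard definitions. The sketch below uses toy observed and predicted values, not the study's results.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_square(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r_square(y_true, y_pred))
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently when a few predictions are far off.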
If the RMSE values of the training and testing data are similar,
then the model fits the data well. As seen in fig. 8, there is
a slight difference between the training and testing RMSE values.
Here the training RMSE is greater than the testing RMSE, and the
model has limited predictive value when tested on a sample.
Fig. 8. Multiple linear regression RMSE value
K-fold cross-validation is a procedure that resamples the data
and evaluates the machine learning model on a limited
data sample. The R-square value obtained after validating the
multiple linear regression is 0.449. Compared with the R-square
of the regression model (0.514), the validation value (0.449)
is lower by a modest 0.065.
Fig. 9. Multiple linear Regression Validation
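The K-fold procedure can be sketched without any library as below. The number of folds (k=5), the unshuffled contiguous folds and the trivial mean-predicting baseline model are assumptions for illustration; any model with a fit and a score step could be plugged in.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; every sample is held out exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_scores(xs, ys, fit, score, k=5):
    """Fit on k-1 folds, score on the held-out fold, and collect the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return scores


def fit_mean(train_x, train_y):
    """Baseline model (hypothetical): always predict the training mean."""
    return sum(train_y) / len(train_y)


def mae_score(model, test_x, test_y):
    """Mean absolute error of the constant prediction on the held-out fold."""
    return sum(abs(t - model) for t in test_y) / len(test_y)


xs = list(range(10))
ys = [2.0 * x for x in xs]
scores = cross_val_scores(xs, ys, fit_mean, mae_score, k=5)
print(scores, sum(scores) / len(scores))
```

Comparing the average fold score with the single train/test score is the check the text relies on to judge over- or under-fitting.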
In fig. 10, the observed vs predicted values of multiple linear
regression are plotted. Observed values are shown in yellow and
predicted values in blue.
Fig. 10. Observed vs Predicted value Multiple linear Regression
Gradient Boosting Regression: This can be used for both
classification and regression. For this dataset it is used
as a regressor. Boosting builds the model in a stage-wise
manner, typically from decision trees. At each stage a weak
learner is added to correct the errors of the current model.
The model was trained and the predicted results were tested.
MAE, RMSE and R-square values were obtained. The evaluation
metrics of gradient boosting regression are shown in fig. 11.
The R-square value of this model is 0.871.
Fig. 11. Evaluation metrics Gradient Boosting Regression
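The stage-wise idea can be illustrated with a small from-scratch sketch for squared-error loss, where each stage fits a one-split tree (a stump) to the current residuals. This is a simplified single-feature illustration under assumed settings (50 stages, learning rate 0.1, toy data); it is not the implementation used in the study.

```python
def fit_stump(xs, residuals):
    """Find the threshold on x that minimises the squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, lmean, rmean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_stages=50, learning_rate=0.1):
    """Start from the mean and add one shrunken stump per stage, fitted to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, lmean, rmean = fit_stump(xs, residuals)
        stumps.append((threshold, lmean, rmean))
        predictions = [p + learning_rate * (lmean if x <= threshold else rmean)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)


# Toy data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [x * x for x in xs]
model = fit_gradient_boosting(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_error = sum((y - mean_y) ** 2 for y in ys)
boosted_error = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))
print(baseline_error, boosted_error)
```

With squared-error loss the residuals are the negative gradient of the loss, which is why fitting each new learner to the residuals amounts to a gradient step.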
In the gradient boosting regression model, there is not much
difference between the RMSE values of the training and testing
datasets, as seen in fig. 12, which indicates that the model
generalizes better.
Fig. 12. RMSE value of Gradient Boosting Regression
The R-square value after K-fold validation of the gradient
boosting regression is 0.895, which is similar to the R-square
of the model (0.871).
Fig. 13. Gradient Boosting Regression Validation
The observed vs predicted values of the gradient boosting
regression are plotted in fig. 14, where yellow represents the
observed values and blue represents the predicted values.
Fig. 14. Observed vs Predicted value Gradient Boosting Regression
Algorithm                      MAE      RMSE     R-square
Multiple Linear Regression     246.71   535.27   0.514
Gradient Boosting Regression   94.96    275.23   0.871
Fig. 15. Evaluation metric for Suicide Rates Overview
For the Suicide Rates Overview dataset, comparing the evaluation
metrics of the two regression models shows that the gradient
boosting regression model is better at predicting the suicide
number, as it has lower MAE and RMSE values. In addition, this
model has a high R-square value of 0.871, i.e., close to 1, which
supports the prediction. The RMSE values of the training and
testing sets are 268.18 and 275.23, so there is not much
difference between them. Even after validation the R-square
value remains similar (0.895).
B. Hotel Booking Cancellation
Support Vector Machine (SVM) and Naïve Bayes
classification models are applied to this dataset to classify
booking cancellation.
Support Vector Machine: It can be used for both
classification and regression analysis. Here, it is used for
classification. The data points are separated and
classified into one of two categories such that the distance
between the categories is maximal; that is, the model finds the
optimal hyperplane between the two classes. In this dataset the
dependent variable is_cancelled is a categorical variable and
the task is a binary classification, as there are only two
classes, 0 and 1, where 0 means the booking is not cancelled and
1 means the booking is cancelled. The model learns from the
training instances and classifies the testing instances.
Fig. 16. SVM classification Result
The classification result of booking cancelled or not is
seen in fig. 16. To evaluate the model, accuracy, confusion
matrix, precision, recall and F1-score values were obtained.
In this model 6034 misclassifications can be observed and
74.73% accuracy is obtained based on the confusion matrix. The
F1-score value is 0.83.
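The classification metrics named above all follow from the four cells of the confusion matrix. A minimal sketch with toy labels (not the study's predictions) is given below; the function name is an assumption for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Toy labels: 1 = cancelled, 0 = not cancelled.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The number of misclassifications is the sum of the two off-diagonal cells (false positives plus false negatives).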
K-fold cross-validation was then applied to the
SVM model. Based on the result shown in fig. 17, it
is evident that there is no overfitting or underfitting, as
the result obtained from k-fold is similar to the result
obtained from the SVM model.
Fig. 17. K-fold validation on SVM model
The average accuracy score across all 5 folds is shown in fig. 18.
Fig. 18. Kfold Accuracy
Fig. 19 shows the predicted probability of booking
cancellation. Here 1 is cancelled and 0 is not cancelled.
Fig. 19. SVM Predicted probability
Naïve Bayes: It finds the probability of an event based on the
occurrence of another event. It works efficiently when the
independence assumption holds. This method can
handle both discrete and continuous data by making
probabilistic predictions. As the response variable is binary,
a Bernoulli Naïve Bayes model is used for prediction.
Evaluation metrics such as accuracy, confusion matrix,
precision, recall and F1-score were obtained to evaluate the
model. The result is shown in fig. 20. It has 76.22% accuracy
and 5677 misclassifications were observed. The F1-score of the
model is 0.84.
Fig. 20. Naïve Bayes classification Result
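A compact sketch of a Bernoulli Naïve Bayes classifier is shown below. It assumes the features have already been reduced to 0/1 values and uses Laplace smoothing (alpha = 1); the feature meanings and the toy rows are hypothetical and only illustrate how class priors and per-feature probabilities combine under the independence assumption.

```python
from math import log


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-feature Bernoulli probabilities with smoothing."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / len(X)
        probs = [(sum(col) + alpha) / (len(rows) + 2 * alpha) for col in zip(*rows)]
        model[label] = (prior, probs)
    return model


def predict_bernoulli_nb(model, x):
    """Return the class with the highest log posterior for a 0/1 feature vector."""
    def log_posterior(label):
        prior, probs = model[label]
        return log(prior) + sum(log(p if v else 1 - p) for v, p in zip(x, probs))
    return max(model, key=log_posterior)


# Hypothetical binary features: [has_previous_cancellation, paid_deposit]
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = cancelled, 0 = not cancelled
model = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb(model, [1, 0]), predict_bernoulli_nb(model, [0, 1]))
```

Working with log probabilities avoids numerical underflow when many features are multiplied together.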
K-fold cross-validation was applied to the Naïve Bayes model
and the result is shown in fig. 21. As there is no notable
difference between the result obtained by the Naïve Bayes model
and the k-fold result, there is no underfitting or overfitting.
Thus, the model generalizes well.
Fig. 21. K-fold validation on Naïve Bayes model
From fig. 22, the average accuracy score across all 5 folds is
76.58%.
Fig. 22. Average Kfold
Fig. 23 shows the Naïve Bayes predicted probability of
booking cancellation.
Fig. 23. Naïve Bayes Predicted Probability
Algorithm            Accuracy   F1-score   Precision
SVM model            74.73%     0.83       0.71
Naïve Bayes model    76.22%     0.84       0.73
Fig. 24. Evaluation metrics Hotel Booking Cancellation
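The reported accuracies can be cross-checked against the reported misclassification counts. The sketch below assumes the test set is exactly 20% of the 119390 records, i.e. that no rows were removed before the split; under that assumption the counts of 6034 and 5677 reproduce the accuracies in the table.

```python
records = 119390
test_size = round(records * 0.2)  # 23878 test records, assuming no rows were dropped


def accuracy_from_errors(misclassified, n_test):
    """Accuracy is the share of test records that were not misclassified."""
    return (n_test - misclassified) / n_test


svm_accuracy = accuracy_from_errors(6034, test_size)
nb_accuracy = accuracy_from_errors(5677, test_size)
print(f"{svm_accuracy:.2%} {nb_accuracy:.2%}")  # prints "74.73% 76.22%"
```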
When the SVM and Naïve Bayes results are compared,
there is not much difference. However, the Naïve Bayes model
has slightly better accuracy and F1-score, and fewer
misclassifications are observed for it.
When the predicted values are compared, Naïve Bayes
gives the more accurate predictions. Thus, the Naïve Bayes model
has better performance.
There is, however, a drawback in this prediction: most of the
samples have the value 0, which means booking not cancelled.
Because the model is trained on this imbalanced data, there is
a chance that a cancelled booking is predicted as not cancelled
when the model is tested.
C. Bittering units of beer
To predict the bittering units of beer, K-nearest neighbors
(KNN) regression and gradient boosting regression are
applied.
K-nearest neighbors: It can be used for both regression
and classification. It predicts the target variable based on a
similarity measure. In KNN regression more than one nearest
neighbor can be used, and the average of the neighbors' targets
is the prediction. This model is used to predict the bittering
units of beer based on independent variables such as alcohol by
volume, color, pitch rate and boiling time of the beer. Using
this model, the data was trained and the predicted values were
tested. To evaluate the model, MAE, RMSE and R-square were
obtained. The evaluation metrics are shown in fig. 25.
Fig. 25. KNN model Evaluation metrics
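KNN regression as described above can be sketched in a few lines. The choice of Euclidean distance, k = 3 and the toy recipes are assumptions for illustration; the text does not state the distance measure or the number of neighbors used.

```python
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical recipes: (ABV, color) -> bittering units; values are illustrative.
train_X = [(4.0, 5.0), (5.0, 6.0), (6.0, 8.0), (8.0, 30.0), (9.0, 35.0)]
train_y = [20.0, 25.0, 30.0, 60.0, 70.0]
estimate = knn_predict(train_X, train_y, (5.0, 7.0), k=3)
print(estimate)
```

Because the prediction depends on distances, features on larger numeric scales dominate unless the inputs are scaled first.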
The RMSE values of both the training and testing datasets were
obtained for the KNN model. There is not much difference between
the values, as shown in fig. 26.
Fig. 26. RMSE value of KNN Regression
There is not much difference in R-square after applying the
K-fold validation technique, as seen in fig. 27.
Fig. 27. KNN regression validation
A graph of actual vs predicted values for KNN regression was
plotted, where red signifies the actual values and green
signifies the predicted values, as seen in fig. 28.
Fig. 28. Observed vs Predicted value KNN Regression
Gradient Boosting Regression: As discussed above, this
method builds the model stage-wise. The model was trained and
tested. The evaluation metrics obtained are shown
in fig. 29.
Fig. 29. Evaluation metrics Gradient Boosting Regression
RMSE values for the training and testing datasets were obtained.
As seen in fig. 30, the RMSE value is almost the same for the
training and testing datasets.
Fig. 30. RMSE value Gradient Boosting Regression
After applying the validation technique there is not much
difference in the R-square value, as seen in fig. 31.
Fig. 31. Gradient boosting regression validation
Fig. 32 shows the plot of actual vs predicted values for
gradient boosting regression, where red indicates the actual
values and green indicates the predicted values.
Fig. 32. Observed vs Predicted value Gradient Boosting Regression
Algorithm                      MAE      RMSE     R-square
KNN Regression                 0.996    1.503    0.287
Gradient Boosting Regression   1.022    1.529    0.263
Fig. 33. Evaluation metrics Bittering units of beer
When the evaluation metrics are compared, neither KNN nor
gradient boosting regression is efficient in predicting the
bittering units of beer. KNN has slightly higher
performance, with an R-square value of 0.287, whereas gradient
boosting regression has an R-square value of 0.263.
V. CONCLUSION AND FUTURE WORK
This study analyzes different machine learning
algorithms implemented on three different datasets.
In predicting “Suicide Rates”, the gradient boosting
regression algorithm has better performance, and cross-validation
shows that there is no under-fitting or over-fitting.
In the classification of “Hotel Booking Cancellation”, SVM
and Naïve Bayes have accuracies of 74.73% and 76.22%
respectively. Even after cross-validation the accuracy remains
similar. Naïve Bayes tends to predict better, so it can be
concluded that the Naïve Bayes model is the better classifier of
hotel booking cancellation.
KNN regression is better at predicting the units of
bitterness, with smaller MAE and RMSE values.
The findings of this research point to possibilities for
improving the models in the future. To reduce the number of
dimensions, Principal Component Analysis can be
applied to the datasets. Artificial intelligence techniques can
also be explored to obtain more efficient results.
REFERENCES
[1] Rusty, "Suicide Rates Overview 1985-2016 | Kaggle," 2018.
[Online]. Available:
https://www.kaggle.com/russellyates88/suiciderates-overview-1985-to-2016/metadata
[Accessed on Mar. 4, 2020].
[2] J. Mostipak, "Hotel Booking Demand | Kaggle," 2020. [Online].
Available: https://www.kaggle.com/jessemostipak/hotel-
bookingdemand [Accessed on Mar. 4, 2020].
[3] Jtrofe, “Brewer’s Friend Beer Recipes | Kaggle,” 2018. [Online].
Available: https://www.kaggle.com/jtrofe/beer-recipes/metadata
[Accessed on Mar. 4, 2020].
[4] K. Boonkwang, S. Kasemvilas, S. Kaewhao and O. Youdkang, "A
Comparison of Data Mining Techniques for Suicide Attempt
Characteristics Mapping and Prediction", International Seminar on
Application for Technology of Information and Communication, 2018.
[5] A. Joseph and B. Murthy, "Suicidal Behavior Prediction Using Data
Mining Techniques," IJMET, vol. 9, no. 4, pp. 293-301, Apr. 2018.
[6] C. Walsh, J. Ribeiro and J. Franklin, "Predicting Risk of Suicide
Attempts Over Time Through Machine Learning", SAGE publishing,
2017.
[7] J. Hayes, J. Petrovich, R. Janis, Y. Yang, L. Castonguay and B. Locke,
"Suicide Among College Students in Psychotherapy: Individual
Predictors and Latent Classes", Journal of Counseling Psychology, vol.
67, no. 1, pp. 104-114, 2020.
[8] S. Bae, S. Lee and S. Lee, "Prediction by data mining, of suicide attempts
in Korean adolescents: a national study", Neuropsychiatric Disease and
Treatment, pp. 2367-2375, Sep. 16, 2015. [Accessed 3 May 2020].
[9] N. Antonio, A. Almeida, and L. Nunes, "Predicting Hotel Bookings
Cancellation with a Machine Learning Classification Model," IEEE
Conf. Machine Learning and Applications, December 16, 2017,
pp. 1049-1054. [Online]. Available: IEEE Xplore,
https://www.ieee.org/ [Accessed on Mar. 4, 2020].
[10] M. Falk and M. Vieru, "Modelling the cancellation behaviour of hotel
guests", International Journal of Contemporary Hospitality
Management, vol. 30, no. 10, pp. 3100-3116, Jan. 18, 2018. [Accessed
3 May 2020].
[11] V. Popescu, A. Soceanu, S. Dobrinas and G. Stanciu, "A study of beer
bitterness loss during the various stages of the Romanian beer
production process", Institute of Brewing & Distilling, pp. 111-115,
Aug. 15, 2013. [Accessed 3 May 2020].
[12] H. Oliveira, J. Filho, J. Rocha and E. Núñez, "Rapid monitoring of
beer-quality attributes based on UV-Vis spectral
data", International Journal of Food Properties, vol.
20, no. 2, pp. 1686-1699, Jul. 5, 2017.
[13] D. Muy-Rangel, V. Urias-Orona, B. Heredia, L. Hernadez-Garcia, W.
Rubio-Carrasco, L. Contreras-Angulo, R. Contreras-Martinez, and G.
Nino-Medina, “Differences In Physicochemical, Mineral and
Nutraceutical Properties Between Regular, Light and Zero Beers,”
Farmacia, vol. 66, no. 4, Jan. 2018. Doi:
org/10.31925/farmacia.2018.4.20R
[14] C. Cernuda, E. Lughofer, H. Klein, C. Forster, M. Pawliczek and M.
Brandstetter, "Improved quantification of important beer quality
parameters based on nonlinear calibration methods applied to FT-MIR
spectra", Springer, pp. 841-857, Aug. 20, 2016.
[15] B. Zhang, Y. Pu, Y. Wang and J. Li, "Forecasting Hotel
Accommodation Demand Based on LSTM Model Incorporating
Internet Search Index", Sustainability, vol. 11, no. 4708, p. 14, Aug.
29, 2019.
[16] L. Weatherford and S. Kimes, "A comparison of forecasting methods
for hotel revenue management", International Institute of Forecasters,
vol. 19, no. 3, pp. 401-415, Sep. 2003.
[17] H. Ritchie, M. Roser and E. Ortiz-Ospina, "Suicide", Our World in
Data, 2020. [Online]. Available: https://ourworldindata.org/suicide.
[Accessed on May 3, 2020].
[18] F. Gullo, "From Patterns in Data to Knowledge Discovery: What Data
Mining Can Do", 3rd International Conference Frontiers in Diagnostic
Technologies, vol. 62, pp. 18-22, 2015.

Data mining and Machine learning

  • 1. Prediciting Suicide Rates, Hotel Booking Cancellation and Bittering Units using Machine Learning Algorithms Pooja Kumar Msc.Data Analytics National College of Ireland Dublin, Ireland x18181929@student.ncirl.ie Abstract—The aim of this analysis is to apply different machine learning algorithm on three different datasets to identify the patterns/insights. Multiple linear regression and gradient boosting regression will be applied on Suicide Rates Overview dataset to predict the suicide rate of a country based on various factor of a country. On Hotel Booking Cancellation dataset, Support Vector Machine (SVM) and Naïve Bayes algorithm is applied to predict the booking cancellation based on booking information. K-nearest neighbor (KNN) and gradient boosting algorithm will be applied on Bittering units of Beer dataset to predict the bitterness of beer based on the brewing style. All the methods will be evaluated and validated to find the which method has better performance on each datset. Keywords—Multiple linear regression, gradient boosting, K- nearest neighbors (KNN) regression, Support Vector Machine (SVM), Naïve Bayes, Kfold validation. I. INTRODUCTION A. Suicide Rates Overview Globally, nearly 800,000 people commit suicide every year. That means every 40 seconds one person commit suicide. The third leading causes of death for the age 15-19 years is suicide. In men suicide rates is just over twice as women. In 2017, “5% in South Korea, 3.9% in Qatar, 3.6% in Sri Lanka” were the highest recorded deaths due to suicides [17]. World Health Organization (WHO) has recognized suicide as a public health priority. WHO is creating the awareness about suicide prevention. Many studies are being carried out for its prevention. This problem needs to be resolved as it is affecting the country’s human resource. This study is to analyze the factors that on which the suicide rate depends and to create a predicting model for the same. 
Research Question: What are the parameters that helps in predicting the suicide rate of a country? B. Hotel Booking Cancellation Booking cancelation has a notable influence on management decisions in the hospitality industry. The hotel applied strict cancelation policies to lessen the cancellations which damaged the reputation of the hotel. Hotel tends to refuse the provision of service as a consequence of overbooking this will have negative impact on its immediate revenue. Hotel guest have an option to cancel a reservation by paying a small sum of amount but for hotel manager it is a factor that diminish the revenue. Online reservation further challenge hotels to handle cancellations, which is completely different from the traditional reservation made by guests. By estimating both guest and booking specifications an analysis made to classify whether booking tends to cancel or not. Research Question: How past customer interaction with hotel management can be utilized to identify the booking cancellation in future? C. Bittering units of Beer Beer is most consumed alcoholic beverage, made from barley, water and yeast. Nutrient and non-nutrient compounds both are present in it and if consumed in sensible amounts can contribute to a healthy diet. The taste and quality of beer depends on the quantity and quality of the ingredients. Bitterness is an important parameter for quality in beer production. Machine learning is slowly taking over the traditional process of preparing beer. This study focuses on analyzing the factors that affect the bitterness in beer. Research Question: What are the factors that contribute to predict bittering unit of beer? In this paper, Section I describes Introduction and research question of the topic which motivates the study. Section II describes the related work and various aspects of the other researches on the same domain. Section III describes the detailed methodology used to carry out the process. 
Section IV describes the various machine learning algorithm which are applied on the datasets and results are interpreted. In Section V the analysis is concluded and followed by references. II. RELATED WORK A. Suicide Rates Overview Boonkwang et al., [4], tries to identify the suicidal characteristics by applying machine learning methods to the data collected from self-harm surveillance report of a Psychiatric hospital which had information about the suicide attempts. Different machine learning model was applied to identify the individual who has suicidal characteristics tries to commit suicide repeatedly. The decision tree yielded a better result in classifying a suicidal characteristic when compared to Naïve Bayes. There was imbalance in the distribution of the characteristics so Synthetic minority oversampling technique (SMOTE) was applied. When compared to bagging, SMOTE ensemble technique gave a better accuracy. Joseph et al., [5], the author applied six different machine learning methods like Logistic regression, Random forest, Decision tree, Classification via Regression and Sequential
  • 2. Minimal Optimization (SMO) on para suicidal patient’s data. Classification via Regression had higher efficiency when compared with other methods. But the patient’s information has to be updated very frequently to get a better result because this method makes use of psychological measures to predict a person behavior. Walsh et al., [6], aims to apply a machine learning method to classify the psychopathology behavior to know how a person develop a problematic behavior and tends to self- harm. First using physical injury data was validated to identify suicide attempt. An explanatory study was conducted to analyze how specific risk factor changes over time. Random forest was used to classify as it can handle nominal variables. To optimize the model bootstrap method was used. From this method more accurate result was obtained. But there was no difference in performance for single and repeat suicide attempts. Predictor importance used to change over time within the model. Hayes et al., [7], statistical analysis was made use to identify suicidal behavior among college student who are psychotherapy patients. From regression analysis it was identified that suicidal behavior was dependent on depression, behavior and self-injury. After complete analysis it was found that few students were attempted to commit suicide during treatment period. Based on the self-injurious factor whether it was intentional or not behavior of suicidal can be identified. Bae et al., [8], the study was conducted on Korean adolescents using decision tree analysis was made on high school students to predict suicide attempt based on sociodemographic, intrapersonal, and extrapersonal variables. Depression had a stronger severity to predict the suicide attempt. Based on the factor of depression students were classified into three groups in that the depression and potential depression group had high suicide attempts rate when compared to non- depression group. 
In non-depression group the students who experienced high level stress where tend to attempt suicide. B. Hotel Booking Cancellation Antonio et al., [9], to avoid the risk of booking cancellation and overbooking impact on hotel reputation and revenue. A model was developed to predict the booking cancellation. The Property management system data was used from four resorts which had higher booking cancelation. First the factor which influenced booking cancellation was identified. A model was developed separately for each resort and different algorithms like Boosted decision tree and decision forest were applied. It was validated using k fold cross validation. Boosted decision tree had better accuracy. The bookings with high chance of cancellation can be identified. This allows hotel managers to prevent this cancellation by offering discounts and other services. As this model is applied only for four resorts the prediction result may vary if it is applied to different hotel data. Falk et al., [10], using probit model the booking cancellation is determined. For rooms booked offline, booking made via online travel agencies and through traditional travel agencies, separate estimates are given. This probit study estimates the reason for booking cancelation is higher when it is an early booking, when children are not involved, offline high booking season, guest bookings from specific countries and booking made via online. This study is based on the record hotels belonging to particular chain. Thus, no other group of hotels can generalize the results. In tourism industry, the tourist destination hotel is the revenue indicator. Precise travel forecasting can bring lot of revenue to hotel managements [16]. The ambiguity in prediction of the passenger during the peak season may put the hotel management in confusion, which will overestimate or underestimate the passenger presence due to this there will be wastage of resources. 
The present predicting model uses linear and non-linear technologies. The author [15], explains the drawback of the machine learning algorithm on hotel booking cancellation prediction. The performance of prediction increases as sample size increase’s, but it tends to normalize after some time. To overcome this problem deep learning methods are used, the long-short term memory (LSTM) is better for time series forecast. For this prediction data from Hainan province which a tourist destination in China was taken. The model was trained for one-night passenger flow and it was tested. A prediction for 12 months of passenger flow was done in a fold. Thus, the model predicted well and suits for dynamic characteristics prediction. C. Bittering units of Beer Popescu et al., [11], examine the characteristics of beer during different stages of the Romanian brewery beer production process using statistical method. Romanian light beer contains 3.4% - 3.9% of alcohol and dark beer contains 3.7% - 4.6% of alcohol. Beer color is influenced based on the usage of wheat. Bitterness gradually decreases in each stage. The loss of bitterness is expected from 24.7 - 41.54% units during boiling, fermentation and bottling process. To identify the beer properties using Ultraviolet-visible (UV-VIS) spectrum along with Artificial neural network (ANN) and Principal component regression (PCR). The diluted beer was scanned, absorbance data was collected and used for modelling. PCR model showed no accepted correlations. ANN exhibited appropriate accuracy [12]. To examine the nutraceutical and mineral properties in beer, three beer with different alcohol content from same brand was taken. On each sample mineral analysis was done and found the zero beer had few important minerals when compared with regular and light beers [13]. By using ANOVA statistical difference was evaluated. 
To identify the beer properties using Ultraviolet-visible (UV-VIS) spectrum along with Artificial neural network (ANN) and Principal component regression (PCR). The diluted beer was scanned, absorbance data was collected and used for modelling. PCR model showed no accepted correlations. ANN exhibited appropriate accuracy [12]. To
  • 3. examine the nutraceutical and mineral properties in beer, three beer with different alcohol content from same brand was taken. On each sample mineral analysis was done and found the zero beer had few important minerals when compared with regular and light beers [13]. By using ANOVA statistical difference was evaluated. III. DATA MINING METHODOLOGY The dataset belongs to three different domains on which different machine learning methodologies have been applied. Knowledge Discovery in Databases (KDD) methodology is used to extract the information from the datasets. It is an iterative process [18]. First begins with identifying objectives and by the end model will be implemented based on the knowledge discovered. Each step-in fig. 1, is explained with respect to the dataset. Fig. 1. KDD Process Overview Step 1: In this step application domain needs to be understood for end-user target and prior appropriate knowledge has to be possessed. For the analysis three different domain were chosen. First, Suicide Rates Overview to understand the risk factor of suicide. Second, Hotel booking cancellation to identify how revenue and reputation of a hotel gets affected. Third, Bittering units of beer based on the brewery style and alcohol content to identify the bitterness. Step 2: The objectives should be defined and dataset should be identified on which machine learning model is applied. The three datasets used for analysis are taken from Kaggle. The Suicide Rates Overview dataset contains 27820 records and 12 columns [1]. There are 119390 records and 32 columns in Hotel Booking Cancellation dataset and it contains booking information about resort and city hotel along with guest’s requirement specification [2]. In Bittering units of beer dataset there are 73861 records and 23 columns present in it and has information about homebrewed beer [3]. Step 3: The data pre-processing takes place by handling missing data, removing outliers and preparing data for the analysis. 
In Suicide Rates Overview dataset, HDI_for_year column had missing values and it was imputed using median values. Fig. 2 shows the count of missing values. In Hotel Booking Cancellation dataset there was four values missing in children column and the value was replaced by 0. The columns agent and company were dropped as they were not used in the analysis. Fig. 3. Shows the missing value count. Fig. 2. Missing Values in Suicide Rates Overview Dataset Fig. 3. Missing values in Hotel Booking Cancellation dataset In Bittering units of beer dataset there were many record missing seen in fig. 4. PitchRate and BoilingGravity were imputed using mean values. And other columns were dropped as they were not involved in the analysis. Fig. 4 Missing values in Bittering units of beer dataset In all three datasets the columns which were irrelevant to analysis were dropped. Step 4: The required variables in the dataset needs to be transformed into appropriate format. For example, for classification method the variables should be categorical and for regression it should be continuous. In Hotel Booking Cancellation dataset deposit_type, customer_type was converted into factor and the independent variables were normalized by applying scaler method. Step 5: In fifth step, based on KDD objective decision has to be made whether if classification, regression or clustering has to be applied on the dataset. For Suicide Rates Overview and Bittering units of beer regression methods are used. For Hotel Booking Cancellation classification method is used.
  • 4. Step 6: The data mining algorithm has to be selected for each dataset. For Suicide Rates Overview dataset Multilinear regression and Gradient Boosting regression will be applied. Support Vector Machine (SVM) classification and Naïve Bayes classification is applied on Hotel Booking Cancellation dataset. On Bittering units of beer dataset K- nearest neighbors (KNN) regression and Gradient Boosting Regression will be applied. Step 7: In this step the algorithm has to be applied to identify the pattern and result has to be obtained. In order to apply machine learning algorithm all three dataset were split into 80:20 ratio as training and testing data. Step 8: The results obtained are evaluated based on the parameters like Root mean squared error (RMSE), Mean absolute error (MAE), R-square, accuracy, confusion matrix, etc. Step 9: At last the obtained result is stored and accessed when it is required. In IV Section, machine learning algorithm will be applied on each dataset, based on the obtained result model will be evaluated. A. Suicide Rates Overview For this dataset as per KDD process, target has to be set. Here target is to predict the suicide rates for a country. To predict this a dataset was taken from Kaggle which has 27820 records and contains information like country, population, suicide per 100k population, sex, age, year, suicide numbers, human development index (HDI) for year, gross domestic product (GDP) for year, GDP per capita and generation [1]. The data was checked for missing values. HDI for year column had missing values and was imputed using median. As the other variables were in proper format, it required no transformation. A regression model like multiple linear regression and gradient boosting algorithms are applied on this dataset. As multiple linear regression is used for the prediction based on correlation factor independent variables were taken. 
The population of a country, HDI for year, GDP per capita and suicides per 100k population are used as explanatory variables to predict the suicide numbers in a country. The data was split into training and testing data in an 80:20 ratio, and the model was trained and tested. The evaluation is presented in Section IV.
B. Hotel Booking Cancellation
For this dataset, following the KDD process, the target is to predict booking cancellation. The dataset was taken from Kaggle [2]. It has 119390 records and 32 columns and contains the booking information of customers. The data was checked for missing values; values were missing in the agent, company and children columns. In the children column 4 values were missing and were replaced with 0. The agent and company columns were dropped because they were not used in the analysis. The categorical variables that had labels instead of values were transformed into dummy variables: a new variable is created for each level, where 1 represents the presence of the level and 0 its absence. Scaling was applied to the explanatory variables to normalize the data. The classification models SVM and Naïve Bayes are applied to this dataset. Lead time, stays in week nights, number of adults, previous cancellations, booking type and customer type are used as explanatory variables to predict booking cancellation. The data was split into training and testing data in an 80:20 ratio, and the model was trained and tested. The model is evaluated in Section IV.
Fig. 5. Hotel Booking Cancellation
C. Bittering Units of Beer
For this dataset, following the KDD process, the target is to predict the bittering units of beer. The dataset was taken from Kaggle [3]. It has 73861 records and 23 variables and contains beer brewing information. The dataset was checked for missing values. Boil gravity, mash thickness, pitch rate, primary temperature, priming method, priming amount and user ID had missing values. Boil gravity and pitch rate were imputed using the mean.
The other variables were dropped as they are not used in the analysis. Two regression models, KNN and gradient boosting, are used. Alcohol by volume (ABV), color, boil time and pitch rate are used as explanatory variables to predict the bittering units of beer. The dataset was split in an 80:20 ratio for training and testing, and the model was trained and tested. The evaluation metrics are discussed in Section IV.
IV. EVALUATION
In this section, the machine learning models are applied to each dataset and evaluated based on the results obtained.
A. Suicide Rates Overview
Multiple linear regression and gradient boosting regression models are applied to this dataset.
Multiple linear regression: It models the relation between a response variable and multiple explanatory variables. Suicide number is the response variable, and country population, suicides per 100k population, human development index (HDI) for year and gross domestic product (GDP) per capita are the explanatory variables. The variables were checked for multicollinearity using a correlation matrix. As no correlation coefficient is greater than 0.7, no multicollinearity is observed, as seen in Fig. 6.
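The multicollinearity check described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in this study: the function names `pearson` and `collinear_pairs` and the toy feature values are hypothetical; only the 0.7 threshold comes from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def collinear_pairs(columns, threshold=0.7):
    """Return the feature pairs whose absolute correlation exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) > threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged

# Toy columns for illustration only; they are not taken from the dataset.
features = {
    "population": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gdp_per_capita": [2.0, 4.1, 5.9, 8.2, 9.8],  # nearly linear in population
    "hdi": [0.9, 0.5, 0.8, 0.4, 0.7],
}
flagged = collinear_pairs(features)
```

An empty result from `collinear_pairs` corresponds to the situation reported here, where no pair of explanatory variables exceeds the threshold. In the toy data one pair is deliberately constructed to be flagged.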
Fig. 6. Correlation matrix
The model was trained and the predictions were validated using the test data. To evaluate the predictions, the mean absolute error (MAE), root mean squared error (RMSE) and R-square values were obtained. Fig. 7 shows the evaluation metrics of this model. The R-square value is 0.514.
Fig. 7. Evaluation metrics Multiple linear Regression
If the RMSE values of the training and testing data are similar, the model fits the data well. As seen in Fig. 8, there is a slight difference between the training and testing RMSE values. Here the training set value is greater than the testing set value; when tested on a sample, the model has less predictive value.
Fig. 8. Multiple linear regression RMSE value
K-fold cross-validation is a resampling procedure that evaluates a machine learning model on a limited data sample. The R-square value obtained after validating the multiple linear regression is 0.449. Comparing the R-square of the regression model (0.514) with that of the validation (0.449) shows little difference between them.
Fig. 9. Multiple linear Regression Validation
In Fig. 10, the observed and predicted values of the multiple linear regression are plotted. Observed values are in yellow and predicted values are in blue.
Fig. 10. Observed vs Predicted value Multiple linear Regression
Gradient boosting regression: Gradient boosting can be used for both classification and regression; for this dataset it is used as a regressor. Boosting builds the model stage by stage, typically from decision trees, so that a weak learner is successively improved. The model was trained and the predicted results were tested. MAE, RMSE and R-square values were obtained. The evaluation metrics of gradient boosting regression are shown in Fig. 11. The R-square value of this model is 0.871.
Fig. 11. Evaluation metrics Gradient Boosting Regression
In the gradient boosting regression model, there is not much difference between the RMSE values of the training and testing datasets, as seen in Fig. 12, which indicates that the prediction is better.
Fig. 12. RMSE value of Gradient Boosting Regression
The R-square value after k-fold validation of the gradient boosting regression is 0.895, which is similar to the R-square of the model (0.871).
Fig. 13. Gradient Boosting Regression Validation
The observed and predicted values of the gradient boosting regression are plotted in Fig. 14, where yellow represents the observed values and blue represents the predicted values.
Fig. 14. Observed vs Predicted value Gradient Boosting Regression
Algorithm                      MAE      RMSE     R-square
Multiple Linear Regression     246.71   535.27   0.514
Gradient Boosting Regression   94.96    275.23   0.871
Fig. 15. Evaluation metric for Suicide Rates Overview
For the Suicide Rates Overview dataset, comparing the evaluation metrics of both regression models makes it evident that the gradient boosting regression model is better at predicting the suicide number, as it has lower MAE and RMSE values. In addition, this model has a high R-square value of 0.871, i.e., close to 1, which supports the prediction. The RMSE values of the training and testing sets are 268.18 and 275.23, so there is not much difference between them. Even after validation the R-square value remains similar.
B. Hotel Booking Cancellation
Support Vector Machine (SVM) and Naïve Bayes classification models are applied to this dataset to classify booking cancellation.
Support Vector Machine: It can be used for both classification and regression analysis; here it is used for classification. The data points are separated by a hyperplane and classified into one of two categories such that the distance between the categories is maximal; that is, SVM finds the optimal hyperplane between the two classes. In this dataset the dependent variable is_cancelled is categorical, and the task is a binary classification as there are only two classes, 0 and 1, where 0 means the booking is not cancelled and 1 means the booking is cancelled. The model learns from the training instances and classifies the testing instances.
Fig. 16. SVM classification Result
The classification result is seen in Fig. 16. To evaluate the model, accuracy, confusion matrix, precision, recall and F1-score values were obtained. In this model 6034 misclassifications are observed, and an accuracy of 74.73% is obtained based on the confusion matrix. The F1-score is 0.83. K-fold cross-validation was then applied to the SVM model. Based on the result shown in Fig. 17, there is no evidence of overfitting or underfitting, as the result obtained from k-fold is similar to the result obtained from the SVM model.
Fig. 17. K-fold validation on SVM model
The average accuracy score over all 5 folds is shown in Fig. 18.
Fig. 18. Kfold Accuracy
Fig. 19 shows the predicted probability of booking cancellation. Here 1 is cancelled and 0 is not cancelled.
Fig. 19. SVM Predicted probability
Naïve Bayes: It finds the probability of an event based on the occurrence of another event.
It works efficiently when the independence assumption holds. This method can handle both discrete and continuous data by making probabilistic predictions. As the response variable is binary, the Bernoulli Naïve Bayes model is used for prediction. Evaluation metrics such as accuracy, confusion matrix, precision, recall and F1-score were obtained to evaluate the model. The result is shown in Fig. 20. The model has 76.22% accuracy, and 5677 misclassifications were observed. The F1-score of the model is 0.84.
Fig. 20. Naïve Bayes classification Result
K-fold cross-validation was applied to the Naïve Bayes model, and the result is shown in Fig. 21. As there is little difference between the result obtained by the Naïve Bayes model and by the k-fold technique, there is no sign of underfitting or overfitting. Thus, the model generalizes well.
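The k-fold procedure applied above can be sketched as follows. This is a minimal illustration, not the code used in this study: `k_fold_indices` and `cross_val_accuracy` are hypothetical helpers, the labels are toy data, and `majority_class` is a deliberately trivial stand-in for the SVM and Naïve Bayes models.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled, roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def majority_class(labels):
    """Stand-in model (an assumption): always predict the most frequent training label."""
    return max(set(labels), key=labels.count)

def cross_val_accuracy(labels, k=5, seed=0):
    """Return the accuracy of the stand-in model on each of the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        prediction = majority_class([labels[j] for j in train_idx])
        correct = sum(1 for j in test_idx if labels[j] == prediction)
        scores.append(correct / len(test_idx))
    return scores

# Toy labels: 0 = not cancelled, 1 = cancelled (imbalanced on purpose).
labels = [0] * 70 + [1] * 30
scores = cross_val_accuracy(labels)
mean_accuracy = sum(scores) / len(scores)
```

On these toy labels the stand-in reaches a mean accuracy of 70% simply by always predicting 0, which is why accuracy on an imbalanced target is usually read against such a baseline.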
Fig. 21. K-fold validation on Naïve Bayes model
From Fig. 22, the average accuracy score over all 5 folds is 76.58%.
Fig. 22. Average Kfold
Fig. 23 shows the Naïve Bayes predicted probability of booking cancellation.
Fig. 23. Naïve Bayes Predicted Probability
Algorithm           Accuracy   F1-score   Precision
SVM model           74.73%     0.83       0.71
Naïve Bayes model   76.22%     0.84       0.73
Fig. 24. Evaluation metrics Hotel Booking Cancellation
When the SVM and Naïve Bayes results are compared, there is not much difference. The Naïve Bayes model nevertheless has slightly better accuracy and F1-score, and fewer misclassifications are observed. When the predicted values are compared, Naïve Bayes predicts almost accurate results. Thus, the Naïve Bayes model has better performance. There is, however, a drawback in this prediction: most of the samples have the value 0, meaning the booking was not cancelled, so after training on this data there is a chance that a cancelled booking is predicted as not cancelled.
C. Bittering Units of Beer
To predict the bittering units of beer, K-nearest neighbors (KNN) regression and gradient boosting regression are applied.
K-nearest neighbors: It can be used for both regression and classification. It predicts the target variable based on a similarity measure. In KNN regression more than one nearest neighbor can be used, and the prediction is the average of the neighbors' values. This model is used to predict the bittering units of beer from independent variables such as alcohol by volume, color, pitch rate and boil time of the beer. The model was trained on the dataset and the predicted values were tested. To evaluate the model, MAE, RMSE and R-square were obtained. The evaluation metrics are shown in Fig. 25.
Fig. 25. KNN model Evaluation metrics
The RMSE values of both the training and testing datasets were obtained for the KNN model; there is not much difference between them, as shown in Fig. 26.
Fig. 26. RMSE value of KNN Regression
There is not much difference in R-square after applying k-fold validation, as seen in Fig. 27.
Fig. 27. KNN regression validation
The actual and predicted values of the KNN regression are plotted in Fig. 28, where red signifies the actual values and green the predicted values.
Fig. 28. Observed vs Predicted value KNN Regression
Gradient boosting regression: As discussed above, this method builds the model stage-wise. The model was trained and tested. The evaluation metrics obtained are shown in Fig. 29.
Fig. 29. Evaluation metrics Gradient Boosting Regression
The RMSE values for the training and testing datasets were obtained. As seen in Fig. 30, the RMSE value is almost the same for the training and testing datasets.
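The regression metrics used throughout this section, and the comparison of training and testing RMSE, can be sketched as follows. The function names and the toy observed and predicted values are hypothetical; they only illustrate how MAE, RMSE and R-square are computed and are not results from this study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy observed and predicted values for illustration only.
y_train_true, y_train_pred = [3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0]
y_test_true, y_test_pred = [4.0, 6.0, 8.0], [4.5, 5.0, 8.5]

train_rmse = rmse(y_train_true, y_train_pred)
test_rmse = rmse(y_test_true, y_test_pred)
# A small gap between training and testing RMSE suggests the model generalizes.
rmse_gap = abs(train_rmse - test_rmse)
```

A testing RMSE far above the training RMSE would point to overfitting, whereas similar values, as reported for the models here, indicate a comparable fit on both sets.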
Fig. 30. RMSE value Gradient Boosting Regression
After applying the validation technique there is not much difference in the R-square value, as seen in Fig. 31.
Fig. 31. Gradient boosting regression validation
Fig. 32 shows the plot of actual vs predicted values for gradient boosting regression, where red indicates the actual values and green the predicted values.
Fig. 32. Observed vs Predicted value Gradient Boosting Regression
Algorithm                      MAE     RMSE    R-square
KNN Regression                 0.996   1.503   0.287
Gradient Boosting Regression   1.022   1.529   0.263
Fig. 33. Evaluation metrics Bittering units of beer
When the evaluation metrics are compared, neither KNN nor gradient boosting regression is efficient at predicting the bittering units of beer. KNN has slightly higher performance, with an R-square value of 0.287 against 0.263 for gradient boosting regression.
V. CONCLUSION AND FUTURE WORK
This study analyzes different machine learning algorithms implemented on three different datasets. In predicting "Suicide Rates", the gradient boosting regression algorithm has better performance, and cross-validation shows no under-fitting or over-fitting. In the classification of "Hotel Booking Cancellation", SVM and Naïve Bayes have accuracies of 74.73% and 76.22% respectively. Even after cross-validation the accuracy remains nearly the same, but Naïve Bayes tends to predict better. Thus, it can be concluded that the Naïve Bayes model is the better classifier of hotel booking cancellation. KNN regression is better at predicting the units of bitterness, with smaller MAE and RMSE values. The findings of this research point to ways of improving the models in the future. Principal Component Analysis could be applied to the datasets to reduce dimensions. Artificial intelligence techniques could also be used to obtain more efficient results.
REFERENCES
[1] Rusty, "Suicide Rates Overview 1985-2016 | Kaggle," 2018. [Online].
Available: https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016/metadata [Accessed on Mar. 4, 2020].
[2] J. Mostipak, "Hotel Booking Demand | Kaggle," 2020. [Online]. Available: https://www.kaggle.com/jessemostipak/hotel-booking-demand [Accessed on Mar. 4, 2020].
[3] Jtrofe, "Brewer's Friend Beer Recipes | Kaggle," 2018. [Online]. Available: https://www.kaggle.com/jtrofe/beer-recipes/metadata [Accessed on Mar. 4, 2020].
[4] K. Boonkwang, S. Kasemvilas, S. Kaewhao and O. Youdkang, "A Comparison of Data Mining Techniques for Suicide Attempt Characteristics Mapping and Prediction", International Seminar on Application for Technology of Information and Communication, 2018.
[5] A. Joesph and B. Murthy, "Suicidal Behavior Prediction Using Data Mining Techniques," IJMET, vol. 9, no. 4, pp. 293-301, Apr. 2018.
[6] C. Walsh, J. Ribeiro and J. Franklin, "Predicting Risk of Suicide Attempts Over Time Through Machine Learning", SAGE publishing, 2017.
[7] J. Hayes, J. Petrovich, R. Janis, Y. Yang, L. Castonguay and B. Locke, "Suicide Among College Students in Psychotherapy: Individual Predictors and Latent Classes", Journal of Counseling Psychology, vol. 67, no. 1, pp. 104-114, 2020.
[8] S. Bae, S. Lee and S. Lee, "Prediction by data mining, of suicide attempts in Korean adolescents: a national study", Neuropsychiatric Disease and Treatment, pp. 2367-2375, Sep. 16, 2015. [Accessed 3 May 2020].
[9] N. Antonio, A. Almedia, and L. Nunes, "Predicting Hotel Bookings Cancellation with a Machine Learning Classification Model," IEEE Conf. Machine Learning and Applications, Dec. 16, 2017, pp. 1049-1054. [Online]. Available: IEEE Xplore, https://www.ieee.org/ [Accessed on Mar. 4, 2020].
[10] M. Falk and M. Vieru, "Modelling the cancellation behaviour of hotel guests", International Journal of Contemporary Hospitality Management, vol. 30, no. 10, pp. 3100-3116, Jan. 18, 2018. [Accessed 3 May 2020].
[11] V. Popescu, A. Soceanu, S. Dobrinas and G. Stanciu, "A study of beer bitterness loss during the various stages of the Romanian beer production process", Institute of Brewing & Distilling, pp. 111-115, Aug. 15, 2013. [Accessed 3 May 2020].
[12] H. Oliveira, J. Filho, J. Rocha and E. Núñez, "Rapid monitoring of beer-quality attributes based on UV-Vis spectral data", International Journal of Food Properties, vol. 20, no. 2, pp. 1686-1699, Jul. 5, 2017.
[13] D. Muy-Rangel, V. Urias-Orona, B. Heredia, L. Hernadez-Garcia, W. Rubio-Carrasco, L. Contreras-Angulo, R. Contreras-Martinez, and G. Nino-Medina, "Differences In Physicochemical, Mineral and Nutraceutical Properties Between Regular, Light and Zero Beers," Farmacia, vol. 66, no. 4, Jan. 2018. Doi: org/10.31925/farmacia.2018.4.20R
[14] C. Cernuda, E. Lughofer, H. Klein, C. Forster, M. Pawliczek and M. Brandstetter, "Improved quantification of important beer quality parameters based on nonlinear calibration methods applied to FT-MIR spectra", Springer, pp. 841-857, Aug. 20, 2016.
[15] B. Zhang, Y. Pu, Y. Wang and J. Li, "Forecasting Hotel Accommodation Demand Based on LSTM Model Incorporating Internet Search Index", Sustainability, vol. 11, no. 4708, p. 14, Aug. 29, 2019.
[16] L. Weatherford and S. Kimes, "A comparison of forecasting methods for hotel revenue management", International Institute of Forecasters, vol. 19, no. 3, pp. 401-415, Sep. 2003.
[17] H. Ritchie, M. Roser and E. Ortiz-Ospina, "Suicide", Our World in Data, 2020. [Online]. Available: https://ourworldindata.org/suicide. [Accessed on May 3, 2020].
[18] F. Gullo, "From Patterns in Data to Knowledge Discovery: What Data Mining Can Do", 3rd International Conference Frontiers in Diagnostic Technologies, vol. 62, pp. 18-22, 2015.