Data mining and Machine learning

Predicting Suicide Rates, Hotel Booking
Cancellation and Bittering Units using Machine
Learning Algorithms
Pooja Kumar
MSc Data Analytics
National College of Ireland
Dublin, Ireland
x18181929@student.ncirl.ie
Abstract—The aim of this analysis is to apply different
machine learning algorithms to three different datasets to
identify patterns and insights. Multiple linear regression and
gradient boosting regression are applied to the Suicide Rates
Overview dataset to predict the suicide rate of a country based
on various country-level factors. On the Hotel Booking
Cancellation dataset, Support Vector Machine (SVM) and Naïve
Bayes algorithms are applied to predict booking cancellation
based on booking information. K-nearest neighbors (KNN) and
gradient boosting regression are applied to the Bittering units
of Beer dataset to predict the bitterness of beer based on the
brewing style. All the methods are evaluated and validated
to find which method performs better on each dataset.
Keywords—Multiple linear regression, gradient boosting,
K-nearest neighbors (KNN) regression, Support Vector Machine
(SVM), Naïve Bayes, k-fold cross-validation.
I. INTRODUCTION
A. Suicide Rates Overview
Globally, nearly 800,000 people die by suicide every
year, which means one person every 40 seconds. Suicide is
the third leading cause of death among those aged 15-19
years. The suicide rate among men is just over twice that
among women. In 2017, “5% in South Korea, 3.9% in Qatar, 3.6%
in Sri Lanka” were the highest recorded shares of deaths due
to suicide [17]. The World Health Organization (WHO) has
recognized suicide as a public health priority and is raising
awareness of suicide prevention. Many studies are being
carried out on its prevention. The problem needs to be
addressed because it affects a country’s human resources. This
study analyzes the factors on which the suicide rate
depends and builds a predictive model for it.
Research Question: Which parameters help in
predicting the suicide rate of a country?
B. Hotel Booking Cancellation
Booking cancellation has a notable influence on
management decisions in the hospitality industry. Hotels have
applied strict cancellation policies to reduce cancellations,
which has damaged their reputation. Overbooking, in turn,
can force a hotel to refuse service to guests, which has a
negative impact on its immediate revenue. Guests have the
option to cancel a reservation by paying a small fee, but for
the hotel manager each cancellation diminishes revenue.
Online reservations, which differ completely from
traditional reservations made by guests, make cancellations
even harder to handle. In this study, guest and booking
attributes are analyzed to classify whether a booking is
likely to be cancelled.
Research Question: How can past customer interactions with
hotel management be used to identify future booking
cancellations?
C. Bittering units of Beer
Beer is the most consumed alcoholic beverage and is made
from barley, water and yeast. It contains both nutrient and
non-nutrient compounds and, if consumed in sensible
amounts, can contribute to a healthy diet. The taste and quality
of beer depend on the quantity and quality of its ingredients.
Bitterness is an important quality parameter in beer
production. Machine learning is slowly being introduced into
the traditional process of preparing beer. This study focuses
on analyzing the factors that affect the bitterness of beer.
Research Question: Which factors contribute to
predicting the bittering units of beer?
In this paper, Section I presents the introduction and the
research questions that motivate the study.
Section II describes the related work in the same
domains. Section III describes the methodology used to
carry out the process. Section IV applies the machine
learning algorithms to the datasets and interprets the
results. Section V concludes the analysis and is
followed by the references.
II. RELATED WORK
A. Suicide Rates Overview
Boonkwang et al. [4] try to identify suicidal
characteristics by applying machine learning methods to
data collected from the self-harm surveillance reports of a
psychiatric hospital, which contained information about
suicide attempts. Different machine learning models were
applied to identify individuals with suicidal characteristics
who attempt suicide repeatedly. The decision tree yielded
better results than Naïve Bayes in classifying suicidal
characteristics. Because the distribution of the
characteristics was imbalanced, the synthetic minority
oversampling technique (SMOTE) was applied. Compared
with bagging, the SMOTE ensemble technique gave
better accuracy.
Joseph et al. [5] applied six different machine
learning methods, including Logistic regression, Random
forest, Decision tree, Classification via Regression and
Sequential Minimal Optimization (SMO), to data on
parasuicidal patients. Classification via Regression was more
efficient than the other methods. However, the patient
information has to be updated very frequently to get good
results, because this method uses psychological measures to
predict a person's behavior.
Walsh et al. [6] apply a machine learning method
to classify psychopathological behavior in order to understand
how a person develops problematic behavior and tends towards
self-harm. First, physical injury data was validated to
identify suicide attempts. An explanatory study was conducted
to analyze how specific risk factors change over time.
Random forest was used for classification because it can
handle nominal variables, and the bootstrap method was used
to optimize the model. This approach gave more accurate
results, but there was no difference in performance between
single and repeat suicide attempts. Predictor importance
changed over time within the model.
Hayes et al. [7] used statistical analysis to
identify suicidal behavior among college students who are
psychotherapy patients. Regression analysis showed
that suicidal behavior depended on
depression, behavior and self-injury. The complete analysis
found that a few students attempted
suicide during the treatment period. Suicidal behavior can be
identified based on whether the self-injury was intentional
or not.
Bae et al. [8] conducted a study on Korean
adolescents, applying decision tree analysis to high
school students to predict suicide attempts based on
sociodemographic, intrapersonal, and extrapersonal
variables. Depression was the strongest predictor of
suicide attempts. Based on depression, students
were classified into three groups, of which the depression and
potential depression groups had higher suicide attempt rates
than the non-depression group. In the non-depression
group, students who experienced high levels of stress
tended to attempt suicide.
B. Hotel Booking Cancellation
Antonio et al. [9] developed a model to predict booking
cancellation in order to reduce the impact of cancellations
and overbooking on hotel reputation and revenue.
Property management system data from four resorts with
high cancellation rates was used.
First, the factors that influenced booking cancellation were
identified. A separate model was developed for each resort,
and different algorithms such as boosted decision tree and
decision forest were applied and validated using k-fold
cross-validation. The boosted decision tree had better
accuracy. Bookings with a high chance of cancellation can
thus be identified, which allows hotel managers to prevent
cancellations by offering discounts and other services. As the
model was applied to only four resorts, the prediction results
may vary when it is applied to data from other hotels.
Falk et al. [10] use a probit model to determine booking
cancellation. Separate estimates are given for rooms booked
offline, bookings made via online travel agencies and
bookings made through traditional travel agencies. The
probit estimates indicate that the probability of cancellation
is higher for early bookings, when children are not
involved, for offline bookings in the high season, for guests
from specific countries and for bookings made online. The
study is based on the records of hotels belonging to a
particular chain, so the results cannot be generalized to
other groups of hotels.
In the tourism industry, hotels at tourist destinations are
an indicator of revenue. Precise travel forecasting can bring
considerable revenue to hotel management [16]. Uncertainty in
predicting passenger numbers during the peak season can lead
hotel management to overestimate or underestimate passenger
presence, which results in wasted resources. Present
prediction models use linear and non-linear techniques. The
authors of [15] explain the drawback of machine learning
algorithms for hotel booking cancellation prediction:
prediction performance increases as the sample size
increases, but it tends to level off after some time. To
overcome this problem deep learning methods are used; long
short-term memory (LSTM) is well suited to time series
forecasting. For this prediction, data from Hainan province,
a tourist destination in China, was used. The model was
trained and tested on one-night passenger flow, and 12 months
of passenger flow were predicted in one fold. The model
predicted well and is suitable for predicting dynamic
characteristics.
C. Bittering units of Beer
Popescu et al. [11] examine the characteristics of beer
during different stages of the Romanian brewery beer
production process using statistical methods. Romanian light
beer contains 3.4% - 3.9% alcohol and dark beer contains
3.7% - 4.6% alcohol. Beer color is influenced by the
use of wheat. Bitterness gradually decreases at each stage.
A bitterness loss of 24.7% - 41.54% is expected
during the boiling, fermentation and bottling processes.
Ultraviolet-visible (UV-VIS) spectra were used together
with an Artificial neural network (ANN) and Principal
component regression (PCR) to identify beer properties. The
diluted beer was scanned, and the absorbance data was
collected and used for modelling. The PCR model showed no
acceptable correlations, whereas the ANN exhibited
appropriate accuracy [12]. To examine the nutraceutical and
mineral properties of beer, three beers with different
alcohol content from the same brand were taken. Mineral
analysis was performed on each sample, and the zero beer was
found to have fewer important minerals than the regular and
light beers [13]. Statistical differences were evaluated
using ANOVA.
III. DATA MINING METHODOLOGY
The datasets belong to three different domains, to which
different machine learning methods have been applied.
The Knowledge Discovery in Databases (KDD) methodology is
used to extract information from the datasets. It is an
iterative process [18] that begins with identifying the
objectives and ends with a model implemented on the basis of
the knowledge discovered. Each step in Fig. 1 is explained
with respect to the datasets.
Fig. 1. KDD Process Overview
Step 1: In this step the application domain has to be
understood from the end user's perspective, and appropriate
prior knowledge has to be acquired. Three different domains
were chosen for the analysis: first, Suicide Rates Overview,
to understand the risk factors of suicide; second, Hotel
Booking Cancellation, to identify how the revenue and
reputation of a hotel are affected; third, Bittering units of
Beer, to identify bitterness based on the brewing style and
alcohol content.
Step 2: The objectives should be defined and the datasets to
which the machine learning models are applied should be
identified. The three datasets used for analysis are taken
from Kaggle. The Suicide Rates Overview dataset contains
27820 records and 12 columns [1]. The Hotel Booking
Cancellation dataset contains 119390 records and 32 columns,
with booking information for a resort hotel and a city hotel
along with guests' requirements [2]. The Bittering units of
Beer dataset contains 73861 records and 23 columns, with
information about homebrewed beer [3].
Step 3: The data is pre-processed by handling
missing data, removing outliers and preparing the data for the
analysis.
In the Suicide Rates Overview dataset, the HDI_for_year
column had missing values, which were imputed using the
median. Fig. 2 shows the count of missing values.
In the Hotel Booking Cancellation dataset, four
values were missing in the children column and were replaced
by 0. The agent and company columns were dropped as they
were not used in the analysis. Fig. 3 shows the missing value
count.
Fig. 2. Missing Values in Suicide Rates Overview Dataset
Fig. 3. Missing values in Hotel Booking Cancellation dataset
In the Bittering units of Beer dataset many values were
missing, as seen in Fig. 4. PitchRate and BoilingGravity
were imputed using mean values. The other columns with
missing values were dropped as they were not involved in the
analysis.
Fig. 4. Missing values in Bittering units of beer dataset
In all three datasets, the columns that were irrelevant to
the analysis were dropped.
Step 4: The required variables in the dataset need to be
transformed into an appropriate format. For example, the
response variable should be categorical for classification
and continuous for regression. In the Hotel Booking
Cancellation dataset, deposit_type and customer_type were
converted into factors and the independent variables were
normalized by applying a scaler.
Step 5: In the fifth step, based on the KDD objective, a
decision has to be made whether classification, regression or
clustering is to be applied to the dataset. Regression methods
are used for Suicide Rates Overview and Bittering units of
Beer, and a classification method is used for Hotel Booking
Cancellation.

Step 6: The data mining algorithm has to be selected for each
dataset. Multiple linear regression and gradient boosting
regression are applied to the Suicide Rates Overview dataset.
Support Vector Machine (SVM) and Naïve Bayes classification
are applied to the Hotel Booking Cancellation dataset.
K-nearest neighbors (KNN) regression and gradient boosting
regression are applied to the Bittering units of Beer dataset.
Step 7: In this step the algorithm is applied to identify
patterns and obtain results. To apply the machine learning
algorithms, all three datasets were split into training and
testing data in an 80:20 ratio.
Step 8: The results obtained are evaluated using metrics
such as Root mean squared error (RMSE), Mean
absolute error (MAE), R-square, accuracy and the confusion
matrix.
Step 9: Finally, the obtained result is stored and accessed
when required.
In Section IV, the machine learning algorithms are applied to
each dataset and the models are evaluated based on the
results obtained.
A. Suicide Rates Overview
For this dataset, as per the KDD process, a target has to be
set. Here the target is to predict the suicide rate of a
country. To predict this, a dataset with 27820 records was
taken from Kaggle; it contains information such as country,
population, suicides per 100k population, sex, age, year,
number of suicides, human development index (HDI) for year,
gross domestic product (GDP) for year, GDP per capita and
generation [1]. The data was checked for missing values. The
HDI for year column had missing values, which were imputed
using the median. As the other variables were in a proper
format, no transformation was required. The regression models
multiple linear regression and gradient boosting are applied
to this dataset. For multiple linear regression, the
independent variables were chosen based on their correlation.
Population of a country, HDI for year, GDP per
capita and suicides per 100k population are used as
explanatory variables to predict the number of suicides in a
country. The data was split into training and testing data in
an 80:20 ratio, and the model was trained and tested. It is
evaluated in Section IV.
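The median imputation described above can be illustrated with a minimal sketch. It uses only the Python standard library; the helper name impute and the toy values are assumptions made for illustration and are not taken from the Kaggle data or from the code used in this study.

```python
from statistics import mean, median

def impute(values, strategy="median"):
    """Replace None entries with the median or mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in values]

# Toy values only; they are not taken from the Kaggle dataset.
hdi_for_year = [0.70, None, 0.80, 0.90, None]
hdi_filled = impute(hdi_for_year, strategy="median")
print(hdi_filled)
```

The same helper with strategy="mean" corresponds to the mean imputation used for the beer dataset. The median is less sensitive to extreme values than the mean, which is one common reason for preferring it on skewed columns.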
B. Hotel Booking Cancellation
For this dataset, as per KDD, the target is to predict
booking cancellation. The dataset was taken from Kaggle
[2]. It has 119390 records and 32 columns and contains
the booking information of customers. The data was checked
for missing values; the agent, company and children columns
had missing values. In the children column 4 values were
missing and were replaced with 0. The agent and company
columns were dropped because they were not used in the
analysis. The categorical variables, which had labels instead
of values, were transformed using dummy variables: a new
variable is created for each level, where 1 represents the
presence of the level and 0 its absence. Scaling was applied
to the explanatory variables to normalize the data. The
classification models SVM and Naïve Bayes are applied to this
dataset. Lead time, stays in week nights, number of
adults, previous cancellations, booking type and customer type
are used as explanatory variables to predict booking
cancellation. The data was split into training and testing
data in an 80:20 ratio, and the model was trained and tested.
The model is evaluated in Section IV.
Fig. 5. Hotel Booking Cancellation
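The dummy-variable encoding and scaling steps can be sketched as follows. This is a standard-library illustration under stated assumptions: the text does not say which scaler was used, so standardization to zero mean and unit variance is assumed here, and the helper names and toy values are hypothetical.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Create one 0/1 indicator per level (levels sorted for a stable order)."""
    levels = sorted(set(values))
    return levels, [[1 if v == level else 0 for level in levels] for v in values]

def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) std. dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Toy rows only; the level names are illustrative.
deposit_type = ["No Deposit", "Non Refund", "No Deposit", "Refundable"]
lead_time = [10, 20, 30, 40]

levels, dummies = one_hot(deposit_type)
scaled = standardize(lead_time)
print(levels)
print(dummies)
```

Scaling matters for SVM in particular, because distance-based decision boundaries are otherwise dominated by the variables with the largest numeric range.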
C. Bittering units of Beer
For this dataset, as per the KDD process, the target is to
predict the bittering units of beer. The dataset for this
analysis was taken from Kaggle [3]. It has 73861 records and
23 variables and contains beer brewing information. The
dataset was checked for missing values. Boil
gravity, mash thickness, pitch rate, primary temperature,
priming method, priming amount and user ID had missing
values. Boil gravity and pitch rate were imputed using the
mean. The other variables were dropped as they were not used
in the analysis. The regression models KNN and gradient
boosting are used. Alcohol by volume (ABV),
color, boil time and pitch rate are used as explanatory
variables to predict the bittering units of beer. The dataset
was split into training and testing data in an 80:20 ratio,
and the model was trained and tested. The evaluation metrics
are discussed in Section IV.
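The 80:20 split used for all three datasets can be sketched as below. The function name, the fixed seed and the toy rows are assumptions for illustration; the study's own splitting code is not shown in the text.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and hold out the last `test_ratio` share."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before the cut avoids a test set made only of the last records in the file, and fixing the seed makes the split reproducible.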
IV. EVALUATION
In this section, machine learning models are applied to
each dataset and evaluated based on the results
obtained.
A. Suicide Rates Overview
Multiple linear regression and gradient boosting regression
models are applied to this dataset.
Multiple linear Regression: It is useful for modelling the
relation between a response variable and multiple explanatory
variables. The number of suicides is the response variable,
and country population, suicides per 100k population,
Human development index (HDI) for year and Gross
Domestic Product (GDP) per capita are the explanatory
variables. The variables were checked for multicollinearity
using a correlation matrix. As no correlation coefficient
is greater than 0.7, there is no multicollinearity, as seen
in Fig. 6.

Fig. 6. Correlation matrix
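The multicollinearity check can be sketched as a pairwise Pearson correlation screen with the 0.7 threshold mentioned above. The column names and values below are toy assumptions chosen so that one pair is flagged; they do not reproduce Fig. 6.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(columns, threshold=0.7):
    """List the column pairs whose absolute correlation exceeds the threshold."""
    names = list(columns)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(pearson(columns[a], columns[b])) > threshold]

# Toy columns only; the real explanatory variables have thousands of rows.
columns = {
    "population": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gdp_per_capita": [2.0, 1.0, 4.0, 3.0, 5.0],
    "hdi_for_year": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = collinear_pairs(columns)
print(flagged)
```

An empty list from such a screen corresponds to the situation reported for the suicide dataset, where no pair exceeded 0.7.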
The model was trained on the training data, and the
predictions were validated using the test data. To evaluate
the predictions, Mean Absolute Error (MAE), Root Mean
Squared Error (RMSE) and R-square values were obtained.
Fig. 7 shows the evaluation metrics of this model. The
R-square value is 0.514.
Fig. 7. Evaluation metrics Multiple linear Regression
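The three regression metrics can be written out directly from their definitions. The sketch below uses toy observed and predicted values (assumptions, not the study's data) to show how MAE, RMSE and R-square are computed.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]  # toy observed values
y_pred = [2.0, 5.0, 8.0, 9.0]  # toy predictions
print(mae(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

MAE and RMSE are in the units of the response variable, so they can only be compared between models fitted to the same target, whereas R-square is unitless.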
If the RMSE values of the training and testing data are
similar, the model fits the data well. As seen in Fig. 8,
there is a slight difference between the training and testing
RMSE values. Here the training RMSE is greater than the
testing RMSE, and the model has low predictive value when
tested on a sample.
Fig. 8. Multiple linear regression RMSE value
K-fold cross-validation is a resampling procedure for
evaluating a machine learning model on a limited data
sample. The data is split into k folds, and each fold is used
once for testing while the remaining folds are used for
training. The R-square value obtained after validating the
multiple linear regression is 0.449. Comparing the R-square
of the regression model (0.514) with that of the validation
(0.449) shows little difference between them.
Fig. 9. Multiple linear Regression Validation
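A minimal sketch of 5-fold cross-validation is given below. To stay self-contained it fits a one-feature least-squares line on noiseless toy data; the helper names, the single feature and the data are assumptions for illustration, whereas the study validates a multiple linear regression on the suicide dataset.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx); each sample is in a test fold exactly once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def r_squared(model, xs, ys):
    """R-square of the fitted line on the given (held-out) points."""
    slope, intercept = model
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]  # noiseless toy data

fold_scores = []
for train_idx, test_idx in k_fold_indices(len(xs), k=5):
    model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    fold_scores.append(
        r_squared(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
mean_score = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_score)
```

Comparing the mean fold score with the score of the single train/test split is the comparison made in the text: a large gap between the two would suggest that the single split was unrepresentative or that the model overfits.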
In Fig. 10, the observed and predicted values of the multiple
linear regression are plotted. Observed values are shown in
yellow and predicted values in blue.
Fig. 10. Observed vs Predicted value Multiple linear Regression
Gradient Boosting Regression: Gradient boosting can be used
for both classification and regression. For this dataset it is
used as a regressor. Boosting builds the model in a stage-wise
manner from weak learners, typically decision trees, with
each stage correcting the errors of the previous ones, so that
weak learners are combined into a stronger model. The model
was trained and the predictions were tested. MAE, RMSE and
R-square values were obtained. The evaluation metrics of the
gradient boosting regression are shown in Fig. 11. The
R-square value of this model is 0.871.
Fig. 11. Evaluation metrics Gradient Boosting Regression
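The stage-wise idea can be sketched with one-split regression stumps as the weak learners and squared error as the loss. This is a simplified illustration under stated assumptions (one feature, toy data, hypothetical function names and hyperparameters); it is not the library implementation used to produce the reported results.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def fit_gradient_boosting(xs, ys, n_stages=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_stages):
        # For squared loss the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps

def predict(model, x):
    """Sum the shrunken contributions of all stages on top of the base value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy feature
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]  # toy target with one step
model = fit_gradient_boosting(xs, ys)
print(predict(model, 2.0), predict(model, 5.0))
```

The learning rate shrinks each stage's correction, so more stages are needed but each one has less chance of fitting noise.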
In the gradient boosting regression model there is not much
difference between the RMSE values of the training and testing
datasets, as seen in Fig. 12, which indicates better
prediction.
Fig. 12. RMSE value of Gradient Boosting Regression
The R-square value after k-fold validation of the gradient
boosting regression is 0.895, which is similar to the R-square
of the model (0.871).
Fig. 13. Gradient Boosting Regression Validation
The observed and predicted values of the gradient boosting
regression are plotted in Fig. 14, where yellow represents the
observed values and blue represents the predicted values.
Fig. 14. Observed vs Predicted value Gradient Boosting Regression
Algorithm                       MAE      RMSE     R-square
Multiple Linear Regression      246.71   535.27   0.514
Gradient Boosting Regression    94.96    275.23   0.871
Fig. 15. Evaluation metric for Suicide Rates Overview
For the Suicide Rates Overview dataset, comparing the
evaluation metrics of both regression models shows that the
gradient boosting regression model is better at
predicting the number of suicides, as it has lower MAE and
RMSE values. In addition, this model has a high R-square value
of 0.871, i.e., close to 1, which supports the prediction. The
RMSE values of the training and testing sets are 268.18 and
275.23, which are not far apart. Even after
validation the R-square value remains similar.
B. Hotel Booking Cancellation
Support Vector Machine (SVM) and Naïve Bayes
classification models are applied to this dataset to classify
booking cancellation.
Support Vector Machine: It can be used for both
classification and regression analysis. Here it is used for
classification. The data points are separated by a hyperplane
and classified into one of two categories such that the
distance (margin) between the categories is maximal; that is,
SVM finds the optimal hyperplane between the two classes. In
this dataset the dependent variable is_cancelled is
categorical and the problem is a binary classification, as
there are only two classes, 0 and 1, where 0 means the booking
is not cancelled and 1 means the booking is cancelled. The
model learns from the training instances and classifies the
testing instances.
Fig. 16. SVM classification Result
The classification result of booking cancelled or not is
seen in Fig. 16. To evaluate the model, accuracy, the
confusion matrix, precision, recall and F1-score were
obtained. In this model 6034 misclassifications are observed
and an accuracy of 74.73% is obtained from the confusion
matrix. The F1-score is 0.83.
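The relation between the confusion matrix and the reported metrics can be sketched as follows. The labels below are toy values (assumptions), with 1 meaning cancelled and 0 meaning not cancelled as in the text.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "misclassified": fp + fn,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]  # toy actual labels
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # toy predicted labels
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision, recall and F1 depend on which class is treated as positive, so a score computed for the not-cancelled class can differ noticeably from the score for the cancelled class on the same predictions.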
K-fold cross-validation was then applied to the SVM model.
Based on the result shown in Fig. 17, there is no evidence of
overfitting or underfitting, as the result obtained from
k-fold validation is similar to the result obtained from the
SVM model.
Fig. 17. K-fold validation on SVM model
The average accuracy score across all 5 folds is shown in Fig. 18.
Fig. 18. Kfold Accuracy
Fig. 19 shows the predicted probability of booking
cancellation. Here 1 is cancelled and 0 is not cancelled.
Fig. 19. SVM Predicted probability
Naïve Bayes: It finds the probability of an event based on
the occurrence of another event. It works efficiently when the
independence assumption holds. The method can
handle both discrete and continuous data by making
probabilistic predictions. As the response variable is
binary, a Bernoulli Naïve Bayes model is used for prediction.
Accuracy, the confusion matrix, precision, recall and
F1-score were obtained to evaluate the model. The result is
shown in Fig. 20. The model has 76.22% accuracy,
and 5677 misclassifications were observed. The F1-score of
the model is 0.84.
Fig. 20. Naïve Bayes classification Result
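A Bernoulli Naïve Bayes classifier can be sketched in a few lines. Bernoulli Naïve Bayes models each feature as a 0/1 indicator, so the sketch assumes binary features; the two toy features, the smoothing value alpha and the function names are assumptions for illustration and are not the study's implementation.

```python
from math import log

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and P(feature = 1 | class) with Laplace smoothing."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(y)
        probs = [(sum(col) + alpha) / (len(rows) + 2 * alpha) for col in zip(*rows)]
        model[c] = (prior, probs)
    return model

def predict_bernoulli_nb(model, x):
    """Pick the class with the highest log-posterior under independence."""
    def log_posterior(c):
        prior, probs = model[c]
        return log(prior) + sum(
            log(p) if v else log(1 - p) for v, p in zip(x, probs))
    return max(model, key=log_posterior)

# Toy binary features: [had_previous_cancellation, non_refundable_deposit]
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = cancelled, 0 = not cancelled

model = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb(model, [1, 1]), predict_bernoulli_nb(model, [0, 0]))
```

Working with log-probabilities avoids numerical underflow when many features are multiplied together, and the smoothing term keeps an unseen feature value from forcing a probability of zero.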
K-fold cross-validation was applied to the Naïve Bayes model
and the result is shown in Fig. 21. As there is little
difference between the result obtained by the Naïve Bayes
model and by k-fold validation, there is no evidence of
underfitting or overfitting. Thus, the model generalizes well.

Fig. 21. K-fold validation on Naïve Bayes model
From Fig. 22, the average accuracy score across all 5 folds is
76.58%.
Fig. 22. Average Kfold
Fig. 23 shows the Naïve Bayes predicted probability of
booking cancellation.
Fig. 23. Naïve Bayes Predicted Probability
Algorithm Accuracy F1-score Precision
SVM model 74.73% 0.83 0.71
Naïve Bayes model 76.22% 0.84 0.73
Fig. 24. Evaluation metrics Hotel Booking Cancellation
When the SVM and Naïve Bayes results are compared,
there is not much difference. However, the Naïve Bayes model
has slightly better accuracy and F1-score, and fewer
misclassifications are observed. When the predicted values are
compared, Naïve Bayes predicts the more accurate result. Thus,
the Naïve Bayes model has better performance.
However, there is a drawback in this prediction: most of the
samples are labelled 0, which means the booking was not
cancelled. When a model trained on such data is tested, there
is a chance of predicting a cancelled booking as not
cancelled.
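One way to put this drawback in context is to compare the reported accuracy with a baseline that always predicts the majority class. The sketch below uses a hypothetical class balance; the actual proportion of cancelled bookings is not stated in the text.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

labels = [0] * 63 + [1] * 37  # hypothetical class balance, for illustration only
baseline = majority_baseline_accuracy(labels)
print(baseline)
```

A classifier is only informative to the extent that its accuracy exceeds this baseline, which is why recall on the cancelled class is worth reporting alongside overall accuracy.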
C. Bittering units of beer
To predict the bittering units of beer, K-nearest neighbors
(KNN) regression and gradient boosting regression are
applied.
K-nearest neighbors: It can be used for both regression
and classification. It predicts the target variable based on
a similarity measure. In KNN regression more than one nearest
neighbor can be used, and the average of the neighbors' target
values is the prediction. This model is used to predict the
bittering units of beer based on independent variables such as
alcohol by volume, color, pitch rate and boil time. The model
was trained on the dataset and the predicted values were
tested. To evaluate the model, MAE, RMSE and R-square were
obtained. The evaluation metrics are shown in Fig. 25.
Fig. 25. KNN model Evaluation metrics
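The averaging step of KNN regression can be sketched as follows. Euclidean distance, k = 3 and the two toy features (ABV and color) are assumptions for illustration; the value of k and the distance measure used in the study are not stated.

```python
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / k

# Toy rows: (alcohol by volume, color) -> bittering units. Not real recipes.
train_X = [(5.0, 10.0), (5.5, 12.0), (6.0, 11.0), (9.0, 40.0), (9.5, 38.0)]
train_y = [30.0, 36.0, 33.0, 70.0, 80.0]

estimate = knn_predict(train_X, train_y, query=(5.4, 11.0), k=3)
print(estimate)
```

Because the prediction depends entirely on distances, features measured on larger numeric scales dominate the neighbor search unless the features are scaled first.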
The RMSE values of both the training and testing datasets
were obtained for the KNN model; there is not much difference
between the values, as shown in Fig. 26.
Fig. 26. RMSE value of KNN Regression
There is not much difference in R-square after applying
k-fold validation, as seen in Fig. 27.
Fig. 27. KNN regression validation
The actual and predicted values of the KNN regression are
plotted in Fig. 28, where red signifies the actual values and
green signifies the predicted values.
Fig. 28. Observed vs Predicted value KNN Regression
Gradient Boosting Regression: As discussed above, this
method builds the model stage-wise. The model was trained and
tested. The evaluation metrics obtained are shown
in Fig. 29.
Fig. 29. Evaluation metrics Gradient Boosting Regression
RMSE values for the training and testing datasets were
obtained. As seen in Fig. 30, the RMSE value is almost the
same for the training and testing datasets.

Fig. 30. RMSE value Gradient Boosting Regression
After applying the validation technique there is not much
difference in the R-square value, as seen in Fig. 31.
Fig. 31. Gradient boosting regression validation
Fig. 32 shows the actual and predicted values for the
gradient boosting regression, where red indicates the actual
values and green indicates the predicted values.
Fig. 32. Observed vs Predicted value Gradient Boosting Regression
Algorithm                       MAE     RMSE    R-square
KNN Regression                  0.996   1.503   0.287
Gradient Boosting Regression    1.022   1.529   0.263
Fig. 33. Evaluation metrics Bittering units of beer
When the evaluation metrics are compared, neither KNN nor
gradient boosting regression is efficient in predicting the
bittering units of beer. However, KNN has slightly higher
performance, with an R-square value of 0.287 compared with
0.263 for gradient boosting regression.
V. CONCLUSION AND FUTURE WORK
This study analyzes different machine learning
algorithms implemented on three different datasets.
In predicting “Suicide Rates”, the gradient boosting
regression algorithm has better performance, and
cross-validation shows no evidence of under-fitting or
over-fitting.
In the classification of “Hotel Booking Cancellation”,
SVM and Naïve Bayes have accuracies of 74.73% and 76.22%
respectively. Even after cross-validation the accuracy remains
similar. Naïve Bayes tends to predict slightly better, so it
can be concluded that the Naïve Bayes model is the better
classifier of hotel booking cancellation.
KNN regression is better at predicting the bittering units
of beer, with smaller MAE and RMSE values.
The findings of this research open up the
possibility of improving the models in the future. Principal
Component Analysis can be applied to the datasets to reduce
dimensions. Artificial intelligence techniques can be used to
obtain more efficient results.
REFERENCES
[1] Rusty, "Suicide Rates Overview 1985-2016 | Kaggle," 2018.
[Online]. Available:
https://www.kaggle.com/russellyates88/suiciderates-overview-1985-
to-2016/metadata [Accessed on Mar. 4, 2020].
[2] J. Mostipak, "Hotel Booking Demand | Kaggle," 2020. [Online].
Available: https://www.kaggle.com/jessemostipak/hotel-
bookingdemand [Accessed on Mar. 4, 2020].
[3] Jtrofe, “Brewer’s Friend Beer Recipes | Kaggle,” 2018. [Online].
Available: https://www.kaggle.com/jtrofe/beer-recipes/metadata
[Accessed on Mar. 4, 2020].
[4] K. Boonkwang, S. Kasemvilas, S. Kaewhao and O. Youdkang, "A
Comparison of Data Mining Techniques for Suicide Attempt
Characteristics Mapping and Prediction", International Seminar on
Application for Technology of Information and Communication, 2018.
[5] A. Joseph and B. Murthy, “Suicidal Behavior Prediction Using Data
Mining Techniques,” IJMET, vol. 9, no. 4, pp. 293-301, Apr.2018.
[6] C. Walsh, J. Ribeiro and J. Franklin, "Predicting Risk of Suicide
Attempts Over Time Through Machine Learning", SAGE publishing,
2017.
[7] J. Hayes, J. Petrovich, R. Janis, Y. Yang, L. Castonguay and B. Locke,
"Suicide Among College Students in Psychotherapy: Individual
Predictors and Latent Classes", Journal of Counseling Psychology, vol.
67, no. 1, pp. 104-114, 2020.
[8] S. Bae, S. Lee and S. Lee, "Prediction by data mining, of suicide attempts
in Korean adolescents: a national study", Neuropsychiatric Disease and
Treatment, pp. 2367-2375, Sep. 16, 2015. [Accessed 3 May 2020].
[9] N. Antonio, A. Almeida, and L. Nunes, “Predicting Hotel Bookings
Cancellation with a Machine Learning Classification Model,” IEEE
Conf. Machine learning and Applications, December 16, 2017,
pp.1049-1054. [Online]. Available: IEEE Xplore,
https://www.ieee.org/ [Accessed on Mar. 4, 2020].
[10] M. Falk and M. Vieru, "Modelling the cancellation behaviour of hotel
guests", International Journal of Contemporary Hospitality
Management, vol. 30, no. 10, pp. 3100-3116, Jan. 18, 2018. [Accessed
3 May 2020].
[11] V. Popescu, A. Soceanu, S. Dobrinas and G. Stanciu, "A study of beer
bitterness loss during the various stages of the Romanian beer
production process", Institute of Brewing & Distilling, pp. 111-115,
Aug. 15 2013. [Accessed 3 May 2020].
[12] H. Oliveira, J. Filho, J. Rocha and E. Núñez, "Rapid monitoring of
beer-quality attributes based on UV-Vis spectral
data", International Journal of Food Properties, vol.
20, no. 2, pp. 1686-1699, Jul. 5, 2017.
[13] D. Muy-Rangel, V. Urias-Orona, B. Heredia, L. Hernadez-Garcia, W.
Rubio-Carrasco, L. Contreras-Angulo, R. Contreras-Martinez, and G.
Nino-Medina, “Differences In Physicochemical, Mineral and
Nutraceutical Properties Between Regular, Light and Zero Beers,”
Farmacia, 2018, Vol. 66, no. 4, Jan. 2018. Doi:
org/10.31925/farmacia.2018.4.20R
[14] C. Cernuda, E. Lughofer, H. Klein, C. Forster, M. Pawliczek and M.
Brandstetter, "Improved quantification of important beer quality
parameters based on nonlinear calibration methods applied to FT-MIR
spectra", Springer, pp. 841-857, Aug. 20, 2016.
[15] B. Zhang, Y. Pu, Y. Wang and J. Li, "Forecasting Hotel
Accommodation Demand Based on LSTM Model Incorporating
Internet Search Index", Sustainability, vol. 11, no. 4708, p. 14, Aug.
29, 2019.
[16] L. Weatherford and S. Kimes, "A comparison of forecasting methods
for hotel revenue management", International Institute of Forecasters,
vol. 19, no. 3, pp. 401-415, Sep. 2003.
[17] H. Ritchie, M. Roser and E. Ortiz-Ospina, "Suicide", Our World in
Data, 2020. [Online]. Available: https://ourworldindata.org/suicide.
[Accessed on May, 3, 2020].

[18] F. Gullo, "From Patterns in Data to Knowledge Discovery: What Data
Mining Can Do", 3rd International Conference Frontiers in Diagnostic
Technologies, vol. 62, pp. 18-22, 2015.
