In recent years, pre-trained language models such as BERT have achieved state-of-the-art results in several natural language processing tasks. However, these models are typically characterized by a large number of parameters and high demands on memory and processing power. Their use in resource-limited environments, such as edge applications, is therefore often difficult.

In the context of this diploma thesis, various techniques for distilling knowledge into simple BiLSTM models are investigated, with the aim of compressing the Greek-BERT model. The term "Knowledge Distillation" refers to a set of techniques for transferring knowledge from a large, complex model to a smaller one. Greek-BERT is a monolingual BERT language model that has proven very effective on various natural language processing problems in Modern Greek. To this end, GloVe word embeddings for Modern Greek, which were not previously available, are trained and evaluated. The GloVe embeddings are trained on a large corpus of Modern Greek text totalling over 30 GB. To allow a fair comparison, the text corpus was crawled from the same web sources used for the pre-training of Greek-BERT. The models are evaluated on the XNLI dataset and on a text classification dataset from the newspaper "Makedonia". To maximize knowledge transfer from Greek-BERT to the BiLSTM models, a data augmentation algorithm based on the GloVe word embeddings is developed. It is shown that this process significantly improves the performance of the models, especially for small datasets.

Experiments indicate that knowledge distillation can improve the performance of simple BiLSTM models for natural language understanding in Modern Greek. The final single-layer model is 28.6 times faster than Greek-BERT, achieving 96.0% of its performance on text classification tasks and 86.9% on NLI tasks. The two-layer model is 10.7 times faster, achieving 88.4% of the performance of Greek-BERT on NLI tasks.