Thai textual memes have become popular on social media as a form of image information summarization. Unfortunately, many memes contain hateful content that easily causes controversy in Thailand. At the global level, Facebook AI launched the Hateful Memes Challenge, one of the NeurIPS’20 competitions, so that researchers could compete with algorithms for combating hate speech in memes. For Thailand, this paper introduces Thai textual meme detection as a new research problem in Thai natural language processing (Thai-NLP) that links scene text localization, Thai optical character recognition (Thai-OCR), and language understanding. The results show that both regular and irregular text positions can be localized by a one-stage detection pipeline. More scene text can be obtained by augmenting with different resolutions and rotations. The accuracy of Thai-OCR using a convolutional neural network (CNN) can be improved by a recurrent neural network (RNN). Since misspelled Thai words are frequently used on social media, this paper categorizes them as synonyms for training a multi-task pre-trained language model.
Fine grained irony classification through transfer learning approach CSITiaesprime
Irony is now pervasive in social media discussion forums and chats, which poses further obstacles to sentiment analysis. The aim of this work is to detect irony and its types in English tweets. We propose a new irony detection system built on distilled bidirectional encoder representations from transformers (DistilBERT), a light transformer model based on the bidirectional encoder representations from transformers (BERT) architecture, strengthened by a bidirectional long short-term memory (Bi-LSTM) network. This configuration minimizes data preprocessing. The proposed model was tested on SemEval-2018 Task 3, for which 3,834 samples were provided. Experimental results show that the system achieved a precision of 81% for the not-irony class and 66% for the irony class, a recall of 77% for not irony and 72% for irony, and an F1 score of 79% for not irony and 69% for irony. Since earlier researchers produced binary classification models, this study extends the work to multiclass classification of irony. It is significant and will serve as a foundation for future research on different types of irony in tweets.
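The reported F1 scores can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below only redoes that arithmetic from the figures in the abstract; the function name `f1_score` is an illustrative choice, not something taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall per class, as reported in the abstract.
reported = {
    "not irony": (0.81, 0.77),
    "irony": (0.66, 0.72),
}

for label, (p, r) in reported.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}")
# not irony: F1 = 0.79
# irony: F1 = 0.69
```

Both values agree with the 79% and 69% stated in the abstract.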
Cognitive level classification on information communication technology skill... IJECEIAES
Learners can study and update their knowledge continually due to the rapid growth of online content. The Medium blog is a well-known open platform that encourages authors who want to share their experiences to publish content on various topics in multiple languages. Meanwhile, readers can query interesting content by searching for a related topic. However, finding suitable content is still challenging for learners, especially information communication technology (ICT) content in Thai, and needs to be classified into beginner, intermediate, and advanced cognitive levels. Moreover, ICT blog content is usually a mix of Thai language and technical terms in English. To overcome the challenge of content classification, a deep neural network (DNN) classification model was constructed to classify the ICT content from the Medium blog into three levels based on cognition. We examined and compared the classification results with strong baseline models, including logistic regression, multinomial naïve bayes, support vector machine (SVM), and multilayer perceptron (MLP). The experimental results indicate that the proposed DNN model attained the highest accuracy (0.878), precision (0.882), recall (0.878), and F1-score (0.875).
Hate speech detection on Indonesian text using word embedding method-global v... IAESIJAI
Hate speech is defined as communication directed toward a specific individual or group that involves hatred or anger; language that argues forcefully for someone's opinion can also cause social conflict. Individuals have ample opportunity to communicate their thoughts on online platforms because the number of Internet users globally, including in Indonesia, is continually rising. This study aims to observe the impact of pre-trained global vector (GloVe) word embedding on accuracy in the classification of hate speech and non-hate speech. Using pre-trained GloVe (Indonesian text) with single and multi-layer long short-term memory (LSTM) classifiers gives performance that is more resistant to overfitting than pre-trainable embedding for hate-speech detection. The accuracy is 81.5% with a single-layer LSTM and 80.9% with a double-layer LSTM. Future work is to provide embeddings pre-trained on a corpus of formal and non-formal language; pre-processing to handle non-formal words is very challenging.
The COVID-19 fake news detection in Thai social texts journalBEEI
One important obstruction against Thai COVID-19 recovery is fake news shared on social media that is one of the “Artificial Intelligence Open Issues against COVID-19” reported by Montreal.AI. Misinformation spread is one of the main cyber-security threats that should be filtered out as the IDS for maintaining COVID-19 information quality. To detect fake news in Thai texts, Thai-NLP techniques are necessary. This paper proposes a state-of-the-art Thai COVID-19 fake news detection among word relations using transfer learning models. For pre-training from the global open COVID-19 datasets, the source dataset is constructed by English to Thai translating. The novel feature shifting is formulated to enlarge Thai text examples in target dataset. Machine translation can be used for constructing Thai source dataset to cope with the lack of local dataset for future Thai-NLP applications. To lead the knowledge in Thai text understanding forward, feature shifting is a promising accuracy improvement in fine-tuning stage.
Smart detection of offensive words in social media using the soundex algorith... IJECEIAES
Offensive posts on social media that are inappropriate for a specific age, level of maturity, or impression quite often reach underage participants more than adult ones. The growing number of masked offensive words on social media is an ethically challenging problem, and there has been growing interest in developing methods that can automatically detect posts containing such words. This study aimed to develop a method that detects masked offensive words, in which a partial alteration of the word may trick conventional monitoring systems when posted on social media. The proposed method proceeds in three phases: a pre-processing phase, which includes filtering, tokenization, and stemming; an offensive word extraction phase, which relies on the soundex algorithm and a permuterm index; and a post-processing phase that classifies the users’ posts in order to highlight the offensive content. Accordingly, the method detects masked offensive words in written text, thus preventing certain types of offensive words from being published. Evaluation of the proposed method indicates a 99% accuracy in detecting offensive words.
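To illustrate why a phonetic code helps with masked words, here is a minimal sketch of standard American Soundex in Python. It is not the paper's implementation: the permuterm index is omitted, stripping non-letter characters before encoding is an assumption made here to handle masks such as `*`, and `LEXICON` is a hypothetical one-word stand-in for a real offensive-word list.

```python
import re

# Standard American Soundex digit groups.
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
_CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code, or '' if the word has no letters."""
    # Assumption: masking symbols and digits are dropped before encoding.
    letters = re.sub(r"[^A-Za-z]", "", word).upper()
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]


LEXICON = {"stupid"}  # hypothetical stand-in for an offensive-word list
LEXICON_CODES = {soundex(w) for w in LEXICON}


def looks_masked(token: str) -> bool:
    """True if the token is not a listed word but sounds like one."""
    return token.lower() not in LEXICON and soundex(token) in LEXICON_CODES


print(soundex("stupid"), soundex("st*pid"), soundex("stoopid"))
print(looks_masked("st*pid"), looks_masked("hello"))
```

Soundex is deliberately coarse, so unrelated words can share a code with a listed word; a real system would need a further filtering step to limit false positives.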
Temporal Exploration in 2D Visualization of Emotions on Twitter Stream TELKOMNIKA JOURNAL
As people freely express their opinions toward a product on Twitter streams without being bound
by time, visualizing the time pattern of customers' emotional behavior can play a crucial role in decision-making.
We analyze how emotions fluctuate over time and demonstrate how these patterns can be explored through
useful visualizations with an appropriate framework. We manually customized the current framework in
order to improve on the state of the art in crawling and visualizing Twitter data. The data, posts or status
updates on the Twitter website about iPhone, were collected from the U.S.A., Japan, Indonesia, and Taiwan
using a geographical bounding box and visualized as a two-dimensional heat map, an interactive stream
graph, and a context-focus view via brushing. The results show that our proposed system can
reveal the uniqueness of the temporal pattern of customers' emotional behavior.
MIMEME ATTRIBUTE CLASSIFICATION USING LDV ENSEMBLE MULTIMODEL LEARNING CSEIJJournal
One of the most common types of social networking interaction is memes. Memes are innately multimodal,
so studying and processing them is a hot issue currently. This study's analysis of the DV dataset comprises
classifying memes according to their irony, humour, motive, and overarching mood. The effectiveness of
three different creative transformer-based strategies has been carefully examined. The DV dataset used
here was created from our own meme data for this analysis of hateful memes. Out of all of our
strategies, the proposed ensemble model LDV obtained a macro F1 score of 0.737 for humour
classification, 0.775 for motivation classification, 0.69 for sarcasm classification, and 0.756 for overall
sentiment of the meme.
Big five personality prediction based in Indonesian tweets using machine lea... IJECEIAES
The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction analysis of English texts, but Indonesian has received scant attention. Therefore, this research aims to predict users’ personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profile. We predict the personality based on the Big Five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality dimensions exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.
Semantic Massage Addressing based on Social Cloud Actor's Interests CSCJournals
Wireless communication with mobile terminals has become a popular tool for collecting and sending information and data. With mobile communication comes the Short Message Service (SMS) technology, which is an ideal way to stay connected with anyone, anywhere, anytime, and helps maintain business relationships with customers. Sending individual SMS messages to a long list of mobile numbers can be very time consuming; it faces the usual problems of wireless communication, such as variable and asymmetric bandwidth, geographical mobility, and high usage costs, as well as the rigidity of lists. This paper proposes a technique that ensures a message is sent to a semantically specified group of recipients. A recipient group is automatically identified based on personal information (interests, work place, publications, social relationships, etc.) and behavior, using a populated ontology created by integrating publicly available FOAF (Friend-of-a-Friend) documents. We demonstrate that our simple technique can, first, extract groups effectively according to the descriptive attributes and, second, send SMS effectively, while helping to combat unintentional spam and preserve the privacy of mobile numbers and even individual identities. The technique provides a fast, effective, and dynamic solution that saves time in constructing lists and sending group messages, and it can be applied at both the personal and business level.
Performance analysis in text clustering using k-means and k-medoids algorith... IJECEIAES
Few studies on text clustering for the Malay language have been conducted due to some limitations that need to be addressed. The purpose of this article is to compare the k-means and k-medoids clustering algorithms, using Euclidean distance as the similarity measure, to determine which method is best for clustering documents. Both algorithms are applied to 1,000 documents pertaining to housebreaking crimes involving a variety of different modus operandi. The comparison results indicate that the k-means algorithm performed best at clustering the relevant documents, with a 78% accuracy rate. K-means clustering also achieves better performance than k-medoids in cluster evaluation based on the average within-cluster distance. However, k-medoids performs exceptionally well on the Davies-Bouldin index (DBI). Furthermore, the accuracy of k-means depends on the number of initial clusters, and the appropriate cluster number can be determined using the elbow method.
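For readers unfamiliar with the Davies-Bouldin index, the sketch below computes it with Euclidean distance for an already-given partition. It is a standard-library illustration on toy 2D points, not the paper's document vectors or code; it assumes at least two clusters with distinct centroids, and the function names are illustrative.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (lower is better)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance from each member to its cluster centroid.
    scatter = [fmean([dist(p, ctr) for p in c]) for c, ctr in zip(clusters, centers)]
    worst = []
    for i in range(len(clusters)):
        ratios = [
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst.append(max(ratios))  # most similar (worst) neighbouring cluster
    return fmean(worst)


# Toy data: the same four points, partitioned well and badly.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
mixed = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]

print(f"{davies_bouldin(tight):.2f}")  # 0.20
print(f"{davies_bouldin(mixed):.2f}")  # 5.00
```

The compact, well-separated partition scores far lower, which is the sense in which a low DBI indicates better clustering.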
A prior case study of natural language processing on different domain IJECEIAES
In the present digital world, computers do not understand ordinary human language, which is a great barrier between humans and digital systems. Researchers have therefore developed advanced technology that delivers information from digital machines to users. Natural language processing (NLP) is a branch of AI with significant implications for the ways computers and humans can interact, and it has become an essential technology for bridging the communication gap between humans and digital data. This study presents the necessity of NLP in the current computing world along with different approaches and their applications. It also highlights the key challenges in the development of new NLP models.
Indonesian multilabel classification using IndoBERT embedding and MBERT class... IJECEIAES
The rapid increase in social media activity has triggered various discussion spaces and information exchanges on social media. Social media users can easily tell stories or comment on many things without limits. However, this often triggers open debates that lead to fights on social media. This is because many social media users use toxic comments that contain elements of racism, radicalism, pornography, or slander to argue and corner individuals or groups. These comments can easily spread and trigger users vulnerable to mental disorders due to unhealthy and unfair debates on social media. Thus, a model is needed to classify comments, especially toxic ones, in Indonesian. Transformer-based model development and natural language processing approaches can be applied to create classification models. Some previous research related to the classification of toxic comments has been done, but the classification results of the model still require exploration to get optimal results. So, this research uses the proposed model by using different pre-trained models at the embedding and classification stages, in the embedding stage using Indonesia bidirectional encoder representations from transformers (IndoBERT), and classification using multilingual bidirectional encoder representations from transformers (MBERT). The proposed model provides optimal results with an F1 value of 0.9032.
Synonym based feature expansion for Indonesian hate speech detection IJECEIAES
Online hate speech is one of the negative impacts of internet-based social media development. Hate speech occurs due to a lack of public understanding of the difference between criticism and hate speech. The Indonesian government has regulations regarding hate speech, and most of the existing research about hate speech only focuses on feature extraction and classification methods. Therefore, this paper proposes methods to identify hate speech before a crime occurs. This paper presents an approach to detect hate speech by expanding synonyms in word embedding and compares classification results for Word2Vec and FastText with bidirectional long short-term memory, both with and without the synonym expanding process. The goal is to classify hate speech and non-hate speech. The best accuracy without the synonym expanding process is 0.90, and with the synonym expanding process it is 0.93.
An in-depth exploration of Bangla blog post classification journalBEEI
Bangla blogs are increasing rapidly in the information era, and consequently they have diverse layouts and categorizations. In this situation, automated blog post classification is a comparatively efficient way to organize Bangla blog posts in a standard manner so that users can easily find the articles they are interested in. In this research, nine supervised learning models, namely support vector machine (SVM), multinomial naïve Bayes (MNB), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), stochastic gradient descent (SGD), decision tree, perceptron, ridge classifier, and random forest, are utilized and compared for the classification of Bangla blog posts. Moreover, to assess performance on predicting blog posts against eight categories, three feature extraction techniques are applied, namely unigram TF-IDF (term frequency-inverse document frequency), bigram TF-IDF, and trigram TF-IDF. The majority of the classifiers show above 80% accuracy. Other performance evaluation metrics also show good results when comparing the selected classifiers.
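The difference between unigram, bigram, and trigram TF-IDF features can be sketched in a few lines. The code below is an illustration only: it uses the plain `tf * log(N / df)` weighting (libraries often apply smoothed variants), whitespace tokenization, and three hypothetical English placeholder documents instead of Bangla blog posts.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """One {ngram: weight} dict per document, using tf * log(N / df)."""
    counts_per_doc = [Counter(ngrams(doc.split(), n)) for doc in docs]
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())  # document frequency: one count per document
    total = len(docs)
    vectors = []
    for counts in counts_per_doc:
        length = sum(counts.values()) or 1  # guard against documents shorter than n
        vectors.append(
            {g: (c / length) * math.log(total / df[g]) for g, c in counts.items()}
        )
    return vectors


# Hypothetical placeholder documents.
docs = ["the cricket match", "the election result", "the cricket team"]

for n, name in ((1, "unigram"), (2, "bigram"), (3, "trigram")):
    print(name, tfidf(docs, n)[0])
```

A term that occurs in every document, such as "the" here, gets weight zero at the unigram level, while longer n-grams become sparser and more document-specific.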
Sentiment analysis of comments in social media IJECEIAES
Social media platforms are witnessing significant growth in both size and purpose. One specific aspect of social media platforms is sentiment analysis, by which insights into the emotions and feelings of a person can be inferred from their posted text. Research related to sentiment analysis is attracting substantial interest, as it is a promising field that can improve user experience and provide countless personalized services. Twitter is one of the most popular social media platforms; it has users from different regions with a variety of cultures and languages. It can thus provide a large and diverse amount of data that can be used to improve decision making. In this paper, the sentiment orientation of textual features and emoji-based components is studied, targeting tweets and comments posted in Arabic on Twitter during the 2018 World Cup. This study also measures the significance of analyzing texts with or without emojis. The data is obtained from thousands of extracted tweets, and sentiment analysis is run on texts and emojis separately. Results show that emojis support the sentiment orientation of the texts and that neither texts nor emojis alone provide reliable information, as they complement each other to give the intended meaning.
Sentimental classification analysis of polarity multi-view textual data using... IJECEIAES
The data and information available in most community environments are complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations, usually handled by different analytical models. Data resources with these characteristics can form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical effort and capability. In particular, data mining practices can provide exceptional results in handling textual data formats. Moreover, when the textual data exists in multi-view or unstructured formats, hybrid and integrated use of text data mining algorithms is vital to obtain helpful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, by classifying polarity documents into two different categories of useful information. This paper discusses a proposed framework with integrated data mining algorithms, which applies the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying sentimental multi-view textual data into two categories when the proposed framework is applied to an online polarity user-reviews dataset on given topics.
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio... IJECEIAES
The heavy involvement of Arabic internet users has spread data written in the Arabic language and created a vast research area in natural language processing (NLP). Sentiment analysis is a growing field of research of great importance, given its high potential for decision-making and for predicting upcoming actions using the texts produced in social networks. The Arabic used on microblogging websites, especially Twitter, is highly informal. It complies with neither standards nor spelling rules, which makes it quite challenging for automatic machine-learning techniques. In this paper, we propose a new approach based on AutoML methods to improve the efficiency of the sentiment classification process for dialectal Arabic. This approach was validated through benchmark testing on three different datasets that represent three vernacular forms of Arabic. The obtained results show that the presented framework achieves significantly higher accuracy than similar works in the literature.
A number of benefits have been reported for computer-based assessments over traditional paper-based exams: IT support for question development, reduced distribution and test administration costs, and possible automated support for ranking. However, existing computerized assessment systems do not support all kinds of questions, in particular open questions that require written solutions. To overcome these limitations, the objective of this work is to build an intelligent evaluation system (IES) that responds to the problems identified and adapts to different types of questions, especially open-ended questions whose answers require sentence writing or programming.
The generation of digital content has increased greatly in recent years due to the
development of new technologies that allow content to be created quickly and easily. A further step in this
evolution is the generation of content by automatic systems without human intervention. Thus, for decades,
models for Natural Language Generation (NLG) have been developed that allow content to be transformed into
the form of narratives. At present, there are several systems that enable generation in text format. In this
paper we present the Narrative system, which allows the generation of text narratives from different sources
that are indistinguishable to the user from those written by a human being.
Twitter is a popular microblogging service where users create status messages (called
“tweets”). These tweets sometimes express opinions about different topics, and they are presented to
the user in chronological order. This format of presentation is useful to the user since the
latest tweets are rich in recent news, which is generally more interesting than tweets about
an event that occurred a long time ago. However, merely presenting tweets in chronological order may
be overwhelming to the user, especially if they have many followers. Therefore, there is a need
to separate the tweets into different categories and then present the categories to the user.
Text Categorization (TC) is becoming more significant, especially for the Arabic
language, which is one of the most complex languages.
In this paper, in order to improve the accuracy of tweet categorization, a system based on
Rough Set Theory is proposed to enrich the document’s representation. The effectiveness
of our system was evaluated and compared in terms of the F-measure of the Naïve Bayesian
classifier and the Support Vector Machine classifier.
Applying Clustering Techniques for Efficient Text Mining in Twitter Dataijbuiiir1
Knowledge is the ultimate output of decisions on a dataset. The revolution of the Internet has shortened global distances to a touch on a hand-held electronic device. Usage of social media sites has increased in the past decades. One of the most popular social media microblogs is Twitter, which has millions of users worldwide. In this paper, Twitter data is analyzed through the text contained in hashtags. After preprocessing, clustering algorithms are applied to the text data. The different clusters formed are compared using various parameters. Visualization techniques are used to portray the results, from which inferences such as time series and topic flow can easily be made. The observed results show that the hierarchical clustering algorithm performs better than the other algorithms.
A New Framework for Information System Development on Instant Messaging for L...TELKOMNIKA JOURNAL
The increasingly inexpensive Internet has spurred the growth of online information system
services in various companies. Almost all services are available in the form of web or mobile applications.
For small companies, this particular system is more difficult to implement as it requires a substantial cost
allocated for hosting, domain and server devices. The solution is to develop a framework for building
information system services through Instant Messaging (IM) such as Telegram, Line or XMPP / Jabber
using the Design Science Research Methodology. This proposed framework has the ability to transform
the existing information system services into chat services with RBAC role, session, validation and natural
interaction using Indonesian-language conversations. The framework, which consists of Initiate layers,
business process and communication, memory group and OLTP DBMS, will produce a low-cost solution for
the development of integrated information system services.
Text mining has become one of the trending fields and has been incorporated into several research
fields, for example computational linguistics, Information Retrieval (IR) and data mining. Natural
Language Processing (NLP) methods are used to extract knowledge from text that is
written by people. Text mining reads an unstructured form of data to provide meaningful
information patterns in the shortest possible time. Social networking sites
are a great source of communication, as most people today
use these sites in their everyday lives to stay connected with each other. It
has become common practice not to write sentences with correct grammar and spelling. This
practice may lead to various kinds of ambiguity, such as lexical, syntactic, and semantic, and because of
this kind of unclear data it is hard to find the actual order of the data. Accordingly, we are conducting
an investigation with the aim of finding different text mining techniques for extracting
textual patterns from social media websites. This survey aims to describe how
studies in social media have used text analytics and text mining methods to
identify the key topics in the data. The survey concentrates on examining the text mining
studies related to Facebook and Twitter, the two dominant social media platforms
in the world. The results of this survey can serve as baselines for future text mining research.
One-shot learning Batak Toba character recognition using siamese neural networkTELKOMNIKA JOURNAL
The Siamese neural network (SINN) is an image processing model that compares the scores of two patterns. The SINN algorithm combines two convolutional neural networks (CNNs). By combining SINN with a one-shot learning algorithm, we can build an image model without requiring thousands of images for training. The test results from the SINN algorithm and one-shot learning show that this process was successful in matching two data items but was unable to produce labels for the data being tested. Because of this, the researchers decided to continue the implementation using the CNN algorithm combined with single shot detection (SSD). Using a dataset of 5000, the recognition and translation of the Toba Batak script was successful. The average accuracy of CNN and SSD in recognizing Toba Batak characters is 84.08% for single characters and 74.13% for mixed characters. The average accuracy for testing the breadth first search algorithm is 75.725%.
Development of depth map from stereo images using sum of absolute differences...nooriasukmaningtyas
This article proposes a framework for depth map reconstruction using stereo images. Fundamentally, this map provides important information that is commonly used in essential applications such as autonomous vehicle navigation, drone navigation and 3D surface reconstruction. To develop an accurate depth map, the framework must be robust against challenging regions of low texture, plain color and repetitive pattern in the input stereo image. The development of this map requires several stages, starting with matching cost calculation, followed by cost aggregation, optimization and refinement. Hence, this work develops a framework with the sum of absolute differences (SAD) and a combination of two edge-preserving filters to increase robustness against the challenging regions. The SAD is computed using a block matching technique to increase the efficiency of the matching process in low-texture and plain-color regions. Moreover, the two edge-preserving filters increase accuracy in repetitive-pattern regions. The results, obtained on the Middlebury standard dataset, show that the proposed method is accurate and capable of working with the challenging regions. The framework is also efficient and can be applied to 3D surface reconstruction. Moreover, this work is highly competitive with previously available methods.
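The matching cost stage described in this abstract can be illustrated with a minimal sketch. It assumes rectified grayscale images held as NumPy arrays and picks, for each pixel, the disparity with the lowest SAD over a small block (winner-take-all). The function name `sad_disparity` and the parameters `max_disp` and `block` are hypothetical; the cost aggregation, edge-preserving filtering and refinement stages of the paper are not reproduced here.

```python
import numpy as np

def sad_disparity(left, right, max_disp=8, block=3):
    """Winner-take-all disparity from the sum of absolute differences (SAD).

    Illustrative only: assumes rectified grayscale images of equal shape.
    """
    h, w = left.shape
    r = block // 2
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    disp = np.zeros((h, w), dtype=np.int64)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            best_cost, best_d = np.inf, 0
            # A left-image pixel at column x is searched at column x - d on the right.
            for d in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = np.abs(patch - cand).sum()  # SAD over the block
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic check: the left view is the right view shifted by 3 columns.
rng = np.random.default_rng(0)
right_img = rng.random((12, 24))
left_img = np.roll(right_img, 3, axis=1)
disparity = sad_disparity(left_img, right_img)
```

On real images, low-texture regions make many candidate blocks look alike, which is why the abstract adds filtering stages on top of the raw SAD cost.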
Model predictive controller for a retrofitted heat exchanger temperature cont...nooriasukmaningtyas
This paper aims to demonstrate the practical aspects of process control theory for undergraduate students at the Department of Chemical Engineering at the University of Bahrain. Both the ubiquitous proportional integral derivative (PID) and model predictive control (MPC) controllers and their auxiliaries were designed and implemented in a real-time framework. The latter was realized by retrofitting an existing plate-and-frame heat exchanger unit that had been operated using an analog PID temperature controller. The upgraded control system consists of a personal computer (PC), a low-cost signal conditioning circuit, a National Instruments USB 6008 data acquisition card, and LabVIEW software. LabVIEW control design and simulation modules were used to design and implement the PID and MPC controllers. The performance of the designed controllers was evaluated while controlling the outlet temperature of the retrofitted plate-and-frame heat exchanger. The distinguishing feature of the MPC controller, its handling of input and output constraints, was observed in real time. From a pedagogical point of view, realizing the theory of process control through practical implementation substantially enhanced the students’ learning and the instructor’s teaching experience.
Similar to Combating the hate speech in Thai textual memes
Big five personality prediction based in Indonesian tweets using machine lea...IJECEIAES
The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction from English texts, while Indonesian has received scant attention. Therefore, this research aims to predict users’ personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profiles. We predict personality based on the Big Five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality dimensions exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.
Semantic Massage Addressing based on Social Cloud Actor's InterestsCSCJournals
Wireless communication with mobile terminals has become a popular tool for collecting and sending information and data. With mobile communication comes the Short Message Service (SMS) technology, which is an ideal way to stay connected with anyone, anywhere, anytime, and to help maintain business relationships with customers. Sending individual SMS messages to a long list of mobile numbers can be very time consuming, faces the problems of wireless communication such as variable and asymmetric bandwidth, geographical mobility and high usage costs, and is constrained by the rigidity of lists. This paper proposes a technique that ensures a message is sent to a semantically specified group of recipients. A recipient group is automatically identified based on personal information (interests, work place, publications, social relationships, etc.) and behavior, using a populated ontology created by integrating publicly available FOAF (Friend-of-a-Friend) documents. We demonstrate that our simple technique can, first, extract groups effectively according to descriptive attributes and, second, send SMS effectively, and that it can help combat unintentional spam and preserve the privacy of mobile numbers and even individual identities. The technique provides a fast, effective, and dynamic solution that saves time in constructing lists and sending group messages, and it can be applied at both the personal and the business level.
Performance analysis in text clustering using k-means and k-medoids algorith...IJECEIAES
Few studies on text clustering for the Malay language have been conducted due to some limitations that need to be addressed. The purpose of this article is to compare two clustering algorithms, k-means and k-medoids, using Euclidean distance similarity to determine which method is best for clustering documents. Both algorithms are applied to 1,000 documents pertaining to housebreaking crimes involving a variety of modus operandi. The comparison results indicate that the k-means algorithm performed best at clustering the relevant documents, with a 78% accuracy rate. K-means clustering also achieves better cluster evaluation performance than the k-medoids algorithm when comparing the average within-cluster distance. However, k-medoids performs exceptionally well on the Davies-Bouldin index (DBI). Furthermore, the accuracy of k-means depends on the number of initial clusters, where the appropriate cluster number can be determined using the elbow method.
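The elbow method mentioned at the end of this abstract can be sketched as follows. This is a minimal illustration on synthetic 2-D points rather than the Malay document vectors of the study: it runs a plain k-means (Lloyd's algorithm with a deterministic farthest-first initialization, an assumption made here for reproducibility) and records the within-cluster sum of squared Euclidean distances for several values of k. The function name `kmeans_inertia` is hypothetical.

```python
import numpy as np

def kmeans_inertia(X, k, iters=50):
    """Run a basic k-means and return the within-cluster sum of squared distances."""
    # Farthest-first initialization: start from the first point, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its points; keep empty clusters in place.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()

# Three well-separated synthetic groups; the curve should flatten after k = 3.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(loc=c, scale=0.5, size=(30, 2))
                    for c in [(0, 0), (10, 0), (0, 10)]])
curve = {k: kmeans_inertia(X, k) for k in range(1, 7)}
```

Plotting `curve` against k and looking for the point where the decrease levels off gives the candidate cluster number; for text data, the rows of `X` would be document vectors such as TF-IDF features.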
A prior case study of natural language processing on different domain IJECEIAES
In the present state of the digital world, computers do not understand ordinary human language. This is a great barrier between humans and digital systems. Hence, researchers have sought advanced technology that provides information to users from digital machines. Natural language processing (NLP) is a branch of AI that has significant implications for the ways computers and humans can interact. NLP has become an essential technology in bridging the communication gap between humans and digital data. Thus, this study presents the necessity of NLP in the current computing world along with different approaches and their applications. It also highlights the key challenges in the development of new NLP models.
Indonesian multilabel classification using IndoBERT embedding and MBERT class...IJECEIAES
The rapid increase in social media activity has triggered various discussion spaces and information exchanges on social media. Social media users can easily tell stories or comment on many things without limits. However, this often triggers open debates that lead to fights on social media. This is because many social media users use toxic comments that contain elements of racism, radicalism, pornography, or slander to argue and corner individuals or groups. These comments can easily spread and, through unhealthy and unfair debates on social media, affect users vulnerable to mental disorders. Thus, a model is needed to classify comments, especially toxic ones, in Indonesian. Transformer-based model development and natural language processing approaches can be applied to create classification models. Some previous research on the classification of toxic comments has been done, but the classification results still require exploration to reach optimal results. This research therefore proposes a model that uses different pre-trained models at the embedding and classification stages: Indonesian bidirectional encoder representations from transformers (IndoBERT) for embedding and multilingual bidirectional encoder representations from transformers (MBERT) for classification. The proposed model provides optimal results with an F1 value of 0.9032.
Synonym based feature expansion for Indonesian hate speech detectionIJECEIAES
Online hate speech is one of the negative impacts of internet-based social media development. Hate speech occurs due to a lack of public understanding of the difference between criticism and hate speech. The Indonesian government has regulations regarding hate speech, and most existing research on hate speech focuses only on feature extraction and classification methods. Therefore, this paper proposes methods to identify hate speech before a crime occurs. This paper presents an approach to detecting hate speech by expanding synonyms in word embeddings and compares the classification results of Word2Vec and FastText with bidirectional long short-term memory, with and without the synonym expansion process. The goal is to classify hate speech and non-hate speech. The best accuracy without the synonym expansion process is 0.90, and with it is 0.93.
An in-depth exploration of Bangla blog post classificationjournalBEEI
The number of Bangla blogs is increasing rapidly in the era of information, and consequently blogs have diverse layouts and categorizations. In such a setting, automated blog post classification is a comparatively more efficient way to organize Bangla blog posts in a standard manner so that users can easily find the articles that interest them. In this research, nine supervised learning models, namely Support Vector Machine (SVM), multinomial naïve Bayes (MNB), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), stochastic gradient descent (SGD), decision tree, perceptron, ridge classifier and random forest, are utilized and compared for the classification of Bangla blog posts. Moreover, to assess the performance of predicting blog posts across eight categories, three feature extraction techniques are applied, namely unigram TF-IDF (term frequency-inverse document frequency), bigram TF-IDF, and trigram TF-IDF. The majority of the classifiers show above 80% accuracy. Other performance evaluation metrics also show good results when comparing the selected classifiers.
Sentiment analysis of comments in social media IJECEIAES
Social media platforms are witnessing significant growth in both size and purpose. One specific aspect of social media platforms is sentiment analysis, by which insights into the emotions and feelings of a person can be inferred from their posted text. Research related to sentiment analysis is attracting substantial interest as it is a promising field that can improve user experience and provide countless personalized services. Twitter is one of the most popular social media platforms; it has users from different regions with a variety of cultures and languages. It can thus provide a large and diverse amount of data that can be used to improve decision making. In this paper, the sentiment orientation of textual features and emoji-based components is studied, targeting tweets and comments posted in Arabic on Twitter during the 2018 World Cup event. This study also measures the significance of analyzing texts including or excluding emojis. The data is obtained from thousands of extracted tweets to find the results of sentiment analysis for texts and emojis separately. Results show that emojis support the sentiment orientation of the texts and that texts or emojis alone cannot provide reliable information, as they complement each other to give the intended meaning.
Control of a servo-hydraulic system utilizing an extended wavelet functional ... (nooriasukmaningtyas)
Servo-hydraulic systems have been extensively employed in various industrial applications. However, these systems are characterized by highly complex and nonlinear dynamics, which complicates the control design stage. In this paper, an extended wavelet functional link neural network (EWFLNN) is proposed to control the displacement response of the servo-hydraulic system. To optimize the controller's parameters, a recently developed optimization technique called the modified sine cosine algorithm (M-SCA) is used as the training method. The proposed controller has achieved remarkable results in terms of tracking two different displacement signals and handling external disturbances. In a comparative study, the proposed EWFLNN controller attained the best control precision compared with other controllers, namely a proportional-integral-derivative (PID) controller, an artificial neural network (ANN) controller, a wavelet neural network (WNN) controller, and the original wavelet functional link neural network (WFLNN) controller. Moreover, compared to the genetic algorithm (GA) and the original sine cosine algorithm (SCA), the M-SCA has shown better optimization results in finding the optimal values of the controller's parameters.
Decentralised optimal deployment of mobile underwater sensors for covering la... (nooriasukmaningtyas)
This paper presents the problem of sensing coverage of layers of the ocean in three dimensional underwater environments. We propose distributed control laws to drive mobile underwater sensors to optimally cover a given confined layer of the ocean. By applying this algorithm at first the mobile underwater sensors adjust their depth to the specified depth. Then, they make a triangular grid across a given area. Afterwards, they randomly move to spread across the given grid. These control laws only rely on local information also they are easily implemented and computationally effective as they use some easy consensus rules. The feature of exchanging information just among neighbouring mobile sensors keeps the information exchange minimum in the whole networks and makes this algorithm practicable option for undersea. The efficiency of the presented control laws is confirmed via mathematical proof and numerical simulations.
Evaluation quality of service for internet of things based on fuzzy logic: a ... (nooriasukmaningtyas)
The development of internet of things (IoT) technology has made sustainability of quality of service (SQoS) a major concern in terms of efficiency, measurement, and evaluation of services, as in our smart home case study. This article deals with quality of service (QoS) based on several ambiguous linguistic and standard criteria. We used fuzzy logic to select the most appropriate and efficient services. For this reason, we have introduced a new paradigmatic approach to assess QoS. To measure SQoS, linguistic terms were collected to identify ambiguous criteria. This paper collects the results of other work to compare the traditional assessment methods and techniques in IoT. The comparison shows that traditional valuation methods and techniques could not effectively deal with these metrics. Therefore, fuzzy logic is a worthy method to provide a good measure of QoS with ambiguous linguistic criteria. The proposed model, which is constantly being improved, addresses all the main axes of the QoS for a smart home. The results obtained also indicate that the model, with its fuzzy performance importance index (FPII), efficiently evaluates the multiple services of SQoS.
Low power architecture of logic gates using adiabatic techniques (nooriasukmaningtyas)
The growing need for portable systems to limit power consumption in very high density ultra-large-scale-integration chips has recently led to rapid and inventive progress in low-power design. The most effective technique for energy-efficient hardware is adiabatic logic circuit design. This paper presents two adiabatic approaches for the design of low-power circuits: modified positive feedback adiabatic logic (modified PFAL) and direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improve the whole system performance. In this paper, low-power circuit designs of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches, and their results are analyzed for power dissipation, delay, power-delay product and rise time and compared with other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with the DC-DB PFAL technique outperform the modified PFAL technique at 10 MHz, with improvements of 65% for the NOR gate, 7% for the NAND gate and 34% for the XNOR gate.
A review on techniques and modelling methodologies used for checking electrom... (nooriasukmaningtyas)
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Smart monitoring system using NodeMCU for maintenance of production machines (nooriasukmaningtyas)
Maintenance is an activity that helps to reduce risk, increase productivity, improve quality, and minimize production costs. Maintenance actions increase efficiency and enhance the safety and quality of products and processes. To achieve these conditions, it is necessary to implement a monitoring system that observes machines' conditions from time to time, especially the machine parts that often experience problems. This paper presents a low-cost intelligent monitoring system using NodeMCU to continuously monitor machine conditions and provide warnings in the case of machine failure. Besides providing alerts, the monitoring system also sends historical data on machine conditions to the Google Cloud (Google Sheet), including which machines were down, downtime, issues that occurred, repairs made, and the technician handling them. As a result, machine operators no longer lose a relatively long time calling the technician. Likewise, technicians are assisted in carrying out machine maintenance activities and online reporting, so that errors that often occur due to human error do not happen again. The system succeeded in reducing the technician-calling time and maintenance work-reporting time by up to 50%. The availability of online and real-time maintenance historical data will support further maintenance strategy.
Design and simulation of a software defined networking-enabled smart switch, f... (nooriasukmaningtyas)
Using sustainable energy is the future of our planet; it has become not only economically efficient but also a necessity for the preservation of life on earth. Because of this necessity, smart grids have become a very important research topic. Much of the literature discusses this topic, and with the development of the internet of things (IoT) and smart sensors, smart grids are being developed even further. On the other hand, software defined networking (SDN) is a technology that separates the control plane from the data plane of the network. It centralizes the management and orchestration of network tasks by using a network controller. The network controller is the heart of the SDN-enabled network, and it can control other networking devices using SDN protocols such as OpenFlow. A smart switching mechanism for the smart grid, called SDN-smgrid-sw, is modeled and controlled using SDN. We modeled the environment that interacts with the sensors for the sun and wind elements. The algorithm is modeled and programmed for smart, efficient power sharing that is managed centrally and monitored using the SDN controller. Also, all of the smart grid elements (power sources) are connected to the IP network using IoT protocols.
Efficient wireless power transmission to remote the sensor in restenosis coro... (nooriasukmaningtyas)
In this study, the researchers have proposed an alternative technique for designing an asymmetric 4 coil-resonance coupling module based on the series-to-parallel topology at 27 MHz industrial scientific medical (ISM) band to avoid the tissue damage, for the constant monitoring of the in-stent restenosis coronary artery. This design consisted of 2 components, i.e., the external part that included 3 planar coils that were placed outside the body and an internal helical coil (stent) that was implanted into the coronary artery in the human tissue. This technique considered the output power and the transfer efficiency of the overall system, coil geometry like the number of coils per turn, and coil size. The results indicated that this design showed an 82% efficiency in the air if the transmission distance was maintained as 20 mm, which allowed the wireless power supply system to monitor the pressure within the coronary artery when the implanted load resistance was 400 Ω.
Grid reactive voltage regulation and cost optimization for electric vehicle p... (nooriasukmaningtyas)
Expecting large electric vehicle (EV) usage in the future due to environmental issues, state subsidies, and incentives, the impact of EV charging on the power grid is required to be closely analyzed and studied for power quality, stability, and planning of infrastructure. When a large number of energy storage batteries are connected to the grid as a capacitive load the power factor of the power grid is inevitably reduced, causing power losses and voltage instability. In this work large-scale 18K EV charging model is implemented on IEEE 33 network. Optimization methods are described to search for the location of nodes that are affected most due to EV charging in terms of power losses and voltage instability of the network. Followed by optimized reactive power injection magnitude and time duration of reactive power at the identified nodes. It is shown that power losses are reduced and voltage stability is improved in the grid, which also complements the reduction in EV charging cost. The result will be useful for EV charging stations infrastructure planning, grid stabilization, and reducing EV charging costs.
Topology network effects for double synchronized switch harvesting circuit on... (nooriasukmaningtyas)
Energy extraction takes place using several different technologies, depending on the type of energy and how it is used. The objective of this paper is to study the influence of topology for a smart network based on piezoelectric materials using the double synchronized switch harvesting (DSSH) circuit. This work presents network topologies for the DSSH circuit (standard DSSH, independent DSSH, DSSH in parallel, mono DSSH, and DSSH in series). Using simulation based on a structure with embedded piezoelectric harvesters, the different DSSH circuit topologies are compared to learn how to connect the DSSH circuits together and how to implement this circuit strategy accurately to maximize the total output power. The DSSH network topology allows a gain in maximal power output compared with the standard network topology at the resonant frequency. The simulation results show that, with the same input parameters, the DSSH in parallel topology produces 120% more energy than the DSSH in series topology. In addition, the energy harvested by mono DSSH exceeds that of DSSH in series by 650% and that of independent DSSH by 240%.
Improving the design of super-lift Luo converter using hybrid switching capac... (nooriasukmaningtyas)
In this article, an improvement to the positive output super-lift Luo converter (POSLC) is proposed to obtain high gain at a low duty cycle, reduce the stress on the switch and diodes, and reduce the current through the inductors, which lowers losses and increases efficiency. The main inductor in the elementary circuit is replaced by a hybrid switch unit composed of four inductors and two capacitors. The unit is charged in parallel from the same input voltage and discharged in series. The output voltage increases with the number of components. The gain equation is modeled. The boundary condition between continuous conduction mode (CCM) and discontinuous conduction mode (DCM) has been derived. Passive components are designed to obtain a high output voltage (8 times at D=0.5) and low ripple (about 0.004). The circuit is simulated and analyzed using MATLAB/Simulink. A maximum power point tracker (MPPT) controls the converter to extract the most benefit from solar energy.
Third harmonic current minimization using third harmonic blocking transformer (nooriasukmaningtyas)
Zero sequence blocking transformers (ZSBTs) are used to suppress third harmonic currents in 3-phase systems. In three-phase systems where single-phase loading is present, there is every chance that the load is not balanced. If there is zero-sequence current due to unequal load currents, the ZSBT will impose a high impedance and the supply voltage at the load end will vary, which is not desired. This paper presents the third harmonic blocking transformer (THBT), which suppresses only higher harmonic zero sequences. Construction using all windings on a single core and construction using three single-phase transformers are explained. The paper discusses the constructional features, full details of circuit usage, design considerations, and simulation results for different supply and load conditions. A comparison of THBT with ZSBT is made with simulation results by considering four different cases.
Power quality improvement of distribution systems asymmetry caused by power d... (nooriasukmaningtyas)
With the increase of non-linear loads in today’s electrical power systems, power quality drops and the source voltage and frequency deteriorate if not properly compensated with an appropriate device. Filters are the most common technique employed to overcome this problem and improve power quality. In this paper, an improved filter optimization technique is applied to the power system, based on particle swarm optimization combined with an artificial neural network applied to the unified power flow quality conditioner (PSO-ANN UPQC). Designing particle swarm optimization and an artificial neural network together results in a very high-performance flexible AC transmission system (FACTS) controller, which is implemented in the system to compensate all types of power quality disturbances. This technique is very powerful for minimizing the total harmonic distortion of source voltages and currents to within the limits permitted by IEEE-519. The work creates a power system model in MATLAB/Simulink to investigate the proposed optimization technique for improving the control circuit of filters. The work also measured all power quality disturbances of the electric arc furnace of a steel factory and suggests this filter technique to improve the power quality.
Studies enhancement of transient stability by single machine infinite bus sys... (nooriasukmaningtyas)
Maintaining network synchronization is important to customer service. Low fluctuations cause voltage instability, loss of synchronization in the power system, or electrical system disturbances such as harmonic currents and voltages and voltage inflation and contraction. Proper tuning of the stabilizer parameters is essential for validating the stabilizer. Many techniques have been developed to overcome instability problems and improve the performance of the power system. A genetic algorithm was applied to optimize the parameters and suppress oscillation. The robust composite capacitance system of a single machine infinite bus was simulated using MATLAB, which was also used for the optimization. The critical time indicates the maximum possible time during which the error can persist in the system while stability is still obtained in the simulation. The improvement in effectiveness has been shown in the system.
Renewable energy based dynamic tariff system for domestic load management (nooriasukmaningtyas)
To deal with the present power-scenario, this paper proposes a model of an advanced energy management system, which tries to achieve peak clipping, peak to average ratio reduction and cost reduction based on effective utilization of distributed generations. This helps to manage conventional loads based on flexible tariff system. The main contribution of this work is the development of three-part dynamic tariff system on the basis of time of utilizing power, available renewable energy sources (RES) and consumers’ load profile. This incorporates consumers’ choice to suitably select for either consuming power from conventional energy sources and/or renewable energy sources during peak or off-peak hours. To validate the efficiency of the proposed model we have comparatively evaluated the model performance with existing optimization techniques using genetic algorithm and particle swarm optimization. A new optimization technique, hybrid greedy particle swarm optimization has been proposed which is based on the two aforementioned techniques. It is found that the proposed model is superior with the improved tariff scheme when subjected to load management and consumers’ financial benefit. This work leads to maintain a healthy relationship between the utility sectors and the consumers, thereby making the existing grid more reliable, robust, flexible yet cost effective.
Energy harvesting maximization by integration of distributed generation based... (nooriasukmaningtyas)
The purpose of distributed generation systems (DGS) is to enhance the distribution system (DS) performance to be better known with its benefits in the power sector as installing distributed generation (DG) units into the DS can introduce economic, environmental and technical benefits. Those benefits can be obtained if the DG units' site and size is properly determined. The aim of this paper is studying and reviewing the effect of connecting DG units in the DS on transmission efficiency, reactive power loss and voltage deviation in addition to the economical point of view and considering the interest and inflation rate. Whale optimization algorithm (WOA) is introduced to find the best solution to the distributed generation penetration problem in the DS. The result of WOA is compared with the genetic algorithm (GA), particle swarm optimization (PSO), and grey wolf optimizer (GWO). The proposed solutions methodologies have been tested using MATLAB software on IEEE 33 standard bus system
Intelligent fault diagnosis for power distribution system comparative studies (nooriasukmaningtyas)
Short circuit is one of the most common types of permanent fault in power distribution systems. Thus, fast and accurate diagnosis of short circuit failures is very important so that the power system works more effectively. In this paper, a newly enhanced support vector machine (SVM) classifier has been investigated to identify ten short-circuit fault types, including single line-to-ground faults (XG, YG, ZG), line-to-line faults (XY, XZ, YZ), double line-to-ground faults (XYG, XZG, YZG) and three-line faults (XYZ). The performance of this enhanced SVM model has been improved by using three different versions of particle swarm optimization (PSO), namely: classical PSO (C-PSO), time varying acceleration coefficients PSO (T-PSO) and constriction factor PSO (K-PSO). Further, utilizing the pseudo-random binary sequence (PRBS)-based time domain reflectometry (TDR) method makes it possible to obtain a reliable dataset for the SVM classifier. The experimental results performed on a two-branch distribution line show the most optimal variant of PSO for short-circuit fault diagnosis.
A deep learning approach based on stochastic gradient descent and least absol... (nooriasukmaningtyas)
Eighty-five to ninety percent of diabetic patients are affected by diabetic retinopathy (DR), an eye disorder that leads to blindness. Computational techniques can help to detect DR by using retinal images. However, it is hard to measure DR from the raw retinal image. This paper proposes an effective method for identifying DR from retinal images. In this research work, the Wiener filter is first used to preprocess the raw retinal image. The preprocessed image is then segmented using the fuzzy c-means technique. From the segmented image, the features are extracted using the grey level co-occurrence matrix (GLCM). After feature extraction, feature selection is performed using stochastic gradient descent and the least absolute shrinkage and selection operator (LASSO) for accurate identification during the classification process. The inception v3-convolutional neural network (IV3-CNN) model is then used to classify the image as a DR image or a non-DR image. By applying the proposed method, the classification performance of the IV3-CNN model in identifying DR is studied. Using the proposed method, DR is identified with an accuracy of about 95%, and the processed retinal image is identified as mild DR.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx (R&R Consult)
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Quality defects in TMT Bars, Possible causes and Potential Solutions. (PrashantGoswami42)
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Cosmetic shop management system project report.pdf (Kamal Acharya)
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of the general workflow and administration process of the shop. The main processes of the system focus on the customer's request, where the system is able to search for the most appropriate products and deliver them to the customers. It should help the employees to quickly identify the list of cosmetic products that have reached the minimum quantity and also keep track of the expiry date of each cosmetic product. It should help the employees to find the rack number in which the product is placed. It is also a faster and more efficient way of working.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE (DuvanRamosGarzon1)
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Automobile Management System Project Report.pdf (Kamal Acharya)
The proposed project is developed to manage automobiles in an automobile dealer company. The main modules in this project are login, automobile management, customer management, sales, complaints and reports. The first module is the login. The automobile showroom owner should log in to the project to use it. The username and password are verified and, if they are correct, the next form opens. If the username and password are not correct, an error message is shown.
When a customer searches for an automobile and the automobile is available, they are taken to a page that shows the details of the automobile, including automobile name, automobile ID, quantity, price etc. “Automobile Management System” is useful for maintaining automobiles and customers effectively and hence helps to establish a good relation between the customer and the automobile organization. It contains various customized modules for maintaining automobile and stock information accurately and safely.
When an automobile is sold to a customer, stock is reduced automatically. When a new purchase is made, stock is increased automatically. While selecting automobiles for sale, the proposed software automatically checks the total available stock of that particular item; if the total stock of that item is less than 5, the software notifies the user to purchase the item.
Also, when the user tries to sell items which are not in stock, the system prompts the user that the stock is not enough. Customers of this system can search for an automobile and can purchase one easily and quickly by selecting it. On the other hand, the stock of automobiles can be maintained accurately by the automobile shop manager, overcoming the drawbacks of the existing system.
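The stock rules in this description (a sale reduces stock, a purchase increases it, a notice is raised below 5 units, and an oversized sale is refused) can be sketched as follows. This is a hypothetical illustration, not code from the project report: the class, method and exception names are assumptions, and only the threshold of 5 comes from the text.

```python
LOW_STOCK_THRESHOLD = 5  # the description notifies when stock is "less than 5"


class InsufficientStockError(Exception):
    """Raised when a sale asks for more units than are in stock."""


class Inventory:
    """Hypothetical in-memory stock ledger; all names are illustrative."""

    def __init__(self):
        self._stock = {}

    def quantity(self, item):
        return self._stock.get(item, 0)

    def purchase(self, item, qty):
        """A new purchase increases the stock of the item."""
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._stock[item] = self.quantity(item) + qty

    def sell(self, item, qty):
        """A sale reduces the stock, or is refused if stock is not enough."""
        if qty <= 0:
            raise ValueError("qty must be positive")
        available = self.quantity(item)
        if qty > available:
            raise InsufficientStockError(
                f"only {available} unit(s) of {item!r} in stock")
        self._stock[item] = available - qty

    def needs_reorder(self, item):
        """True when the user should be told to purchase more of the item."""
        return self.quantity(item) < LOW_STOCK_THRESHOLD


# Small usage example with made-up item names and quantities.
inv = Inventory()
inv.purchase("sedan", 6)
reorder_before_sale = inv.needs_reorder("sedan")
inv.sell("sedan", 2)
reorder_after_sale = inv.needs_reorder("sedan")
```

Keeping the reorder check separate from `sell` mirrors the description, where the check runs while items are being selected for sale rather than as part of the sale itself.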
Courier management system project report.pdf (Kamal Acharya)
Nowadays it is very important for people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend heavily on different transport systems, which mostly use manual methods of receiving and delivering the articles. There is no way to track the articles until they are received, and there is no way to let the customer know what happened in transit once he has booked some articles. In such a situation, we need a system which completely computerizes the cargo activities, including time-to-time tracking of the articles sent. This need is fulfilled by the Courier Management System software, which is online software for cargo management staff that enables them to receive the goods from a source, send them to a required destination and track their status from time to time.
Event Management System Vb Net Project Report.pdf (Kamal Acharya)
In the present era, the scope of information technology is growing very fast. We do not see any area untouched by this industry. The scope of information technology has become wider and includes business and industry, household business, communication, education, entertainment, science, medicine, engineering, distance learning, weather forecasting, career searching and so on.
My project, named “Event Management System”, is software that stores and maintains all events coordinated in college. It also helps to print related reports. My project will help to record the events coordinated by faculties with their name, event subject, date and details in an efficient and effective way.
In my system, a user can record all events coordinated by a particular faculty. In our proposed system, some more features are added which differentiate it from the existing system, such as security.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL) (MdTanvirMahtab2)
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL), a government-owned company of Bangladesh Chemical Industries Corporation under the Ministry of Industries.
block diagram and signal flow graph representation
Combating the hate speech in Thai textual memes
Indonesian Journal of Electrical Engineering and Computer Science
Vol. 21, No.3, March 2021, pp. 1493~1502
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v21.i3.pp1493-1502
Journal homepage: http://ijeecs.iaescore.com
Lawankorn Mookdarsanit¹, Pakpoom Mookdarsanit²
¹Faculty of Management Science, Chandrakasem Rajabhat University, Bangkok, Thailand
²Faculty of Science, Chandrakasem Rajabhat University, Bangkok, Thailand
Article Info ABSTRACT
Article history:
Received Sep 23, 2020
Revised Dec 9, 2020
Accepted Dec 23, 2020
Thai textual memes have become popular on social media as a form of image
information summarization. Unfortunately, many memes contain hateful
content that easily causes controversy in Thailand. For global protection,
the Hateful Memes Challenge is provided by Facebook AI as one of the
NeurIPS’20 competitions, enabling researchers to compete with their
algorithms for combating hate speech in memes. For Thailand, this paper
introduces Thai textual meme detection as a new research problem in Thai
natural language processing (Thai-NLP) that links scene text localization,
Thai optical character recognition (Thai-OCR) and language understanding.
The results show that text in both regular and irregular positions can be
localized by a one-stage detection pipeline. More scene text can be
generated by augmentation with different resolutions and rotations. The
accuracy of Thai-OCR using a convolutional neural network (CNN) can be
improved by a recurrent neural network (RNN). Since misspelled Thai words
are frequently used on social media, this paper categorizes them as
synonyms to train a multi-task pre-trained language model.
Keywords:
Frequent misspelling words
Hateful meme detection
Scene text localization
Thai Language understanding
Thai printed text recognition
This is an open access article under the CC BY-SA license.
Corresponding Author:
Lawankorn Mookdarsanit
Faculty of Management Science
Chandrakasem Rajabhat University
39/1 Rachadapisek Road, Chan Kasem District, Chatuchak, Bangkok, 10900, Thailand
Email: lawankorn.s@chandra.ac.th
1. INTRODUCTION
Over more than 15 years of social media growth [1], conflict and violence in Thailand have
continued to escalate as one of the main national problems [2], as reported by the minister of the Ministry of
Higher Education, Science, Research and Innovation, Thailand (MHESI). Under MHESI’s research policy and
support, solutions to this problem are still required [2] to improve the nation’s unity. Many
controversies are easily provoked [3-4] by sharing hate speech on social media [4]. Under the umbrella of
the United Nations (UN) [5], hate speech, which can be either legal or illegal [5-6], is any direct or indirect
attack on characteristics [6] that negatively defames, insults, mocks or scoffs at a person or a group of people.
As a solution, social media in Thailand (e.g., Facebook, Twitter, Instagram and LINE) [7] is seen
as the largest information centre, one that plays the main role in relieving controversy. Coupled with the
short communication style of social media, image information categorization [8], known as “infographic”
[8-9], is popular for presenting well-classified knowledge. Beyond the infographic representation, the
“meme” [10], a form of image information summarization (with joking stories or explanations through
laughing actions) [10-11], is also a favorite of Thai social media users [12-13] for storytelling that is short
and clear, instead of reading the full article, e.g., the “Mag-Ina meme (Thai: แม่ลูกไปไหน-กลับมาทำไม) [14]”.
With respect to text rotation, Thai texts in memes, as shown in Figure 1, can be located in both regular and
irregular positions. Contrary to the purpose of memes [15], some Thai textual memes on Facebook are easily misused to
2. ISSN: 2502-4752
Indonesian J ElecEng& Comp Sci, Vol. 21, No. 3, March 2021: 1493 - 1502
1494
create malicious content [16] called “hate speech” [17], which is a main cause of escalating conflict and violence in Thailand. Since hate speech widely distributed on social media is a global problem, the Hateful Memes Challenge 2020 [17] was launched by Facebook Artificial Intelligence (Facebook AI) [18-19] as a track of the Conference on Neural Information Processing Systems (NeurIPS’20) [20], inviting researchers around the world to compete with their hate speech detection algorithms for a $100K prize pool. Hate speech detection in textual memes is therefore still a new problem in artificial intelligence (AI) [21], framed as multimodal (textual description and image) classification of English textual memes.
Figure 1. Thai textual meme examples by scene text rotationality: (a) regular text position, (b) irregular text position (note: textual memes with “rudeness” or “social controversy” were censored)
Formerly, hate speech was detected only from textual information [22] using language understanding based on NLP [23-24]. Repeated hate speech on social media could be treated like spam [25] or a phishing link [26]. Since social media provided a rich source for mining negative/positive Thai trends in other forms (e.g., images, emoji symbols, GIFs, stickers) as well as in text, many social apps appeared for creating textual memes. The textual meme became a popular medium to share on social media, especially for conveying hateful information. Hate speech detection in memes could thus be seen as an extension of an optical character recognition (OCR) application that required image-to-text conversion (img2txt).
Hateful meme detection was therefore the meeting point of computer vision (CV) for image information and natural language processing (NLP) for textual information. With respect to Thai controversy, most hateful memes on Thai social media already included hateful or bullying Thai texts within the images, so the task could be seen as a Thai printed character recognition (or Thai optical character recognition: Thai-OCR) application [27-28]. Thai-OCR had been researched for more than 29 years [29]. In the history of AI in Thailand in the 1990s [30], both Thai-OCR [31-33] and Thai handwritten recognition [34-35] were traditional open topics in Thai natural language processing (Thai-NLP) [36], which was seen as computing applied to the Thai language [37]. Although neural network based models demonstrated a high recognition rate in Thai-OCR [33, 38-40], they consumed large computational resources. Many works were proposed to avoid neural networks: Thai characters could be recognized by rough sets [41-42] or numerical feature extraction [43-44]. The support vector machine (SVM) also provided state-of-the-art recognition accuracy [45] for a large number of target classes. As a well-known competition, the local Thai-OCR challenge was hosted by Thailand’s National Electronics and Computer Technology Center (NECTEC), which invited Thai researchers and students to compete with their algorithms in a contest called “Benchmark for Enhancing the Standard of Thai Language Processing (BEST)” [46].
Combating the hate speech in Thai textual memes (Lawankorn Mookdarsanit)
Since Thai writing varies between writers, Thai handwritten recognition appeared more challenging than Thai-OCR [47-48], and the later competitions shifted to Thai handwritten recognition from BEST 2014 onward. However, Thai-OCR had its own challenges, such as scene text localization [49] and recognition [50] in the wild, where the scene might contain multiple objects, as in multi-view Thai license plate recognition [51-52]. In terms of scene text rotation, Thai texts in a scene [53] could be detected [54-55] in both regular and irregular positions (as categorized by Clova AI, LINE Corporation [50]).
To link scene text localization [49, 51-55] with Thai-OCR [27-28, 31-33, 38-46] and Thai hateful text understanding [56-57] as a new Thai-NLP problem (Figure 2), this paper proposes an end-to-end approach to “combating the hate speech in Thai textual memes” as a solution to the problem stated by MHESI and Facebook AI. Like visual question answering (VQA) [58-59] and image captioning (IC) [60-61], this new problem needs both CV and NLP tasks. Since multiple objects can appear with Thai texts in the same scene, the positions of texts are localized by the single shot detector (SSD) [62]. For img2txt conversion, Thai text has several vertical positions (top, upper, middle and lower level) that form sequence data [63]; this makes character-level embedding necessary, coupled with character recognition [64] based on the residual network (ResNet) [65] and bidirectional long short-term memory (Bi-LSTM) [66] with connectionist temporal classification (CTC) [67]. The hate speech in the converted Thai text is finally detected by a multi-task transfer learning architecture, generative pre-training transformer 2 (GPT-2) [68] by OpenAI. The main contributions are as follows:
a) Thai hateful meme detection is proposed as the meeting of CV and NLP, posing a new research problem in Thai-NLP.
b) The regular Thai texts are multiplied by different power law distributions and rotated by different angles to enlarge the dataset for training on irregular texts. Both regular and irregular Thai texts in memes can be detected by SSD.
c) The accuracy of Thai-OCR by ResNet on character-level sequence data can be improved by Bi-LSTM.
d) Frequently misspelled words are treated as Thai synonyms and combined with multi-task GPT-2 to produce state-of-the-art results.
This paper is organized into 4 sections. The research method is described in Section 2. Section 3 presents the results and discussion, and the conclusion is given in Section 4.
Figure 2. The proposed “hate speech detection in Thai memes” as the transmission linkage between scene text localization, Thai-OCR and Thai hateful text understanding
2. RESEARCH METHOD
Most storytelling in hateful memes is illustrated by visual image features together with bullying texts that satirize or mock activities. To formulate Thai hate speech detection on memes as in Figure 2, this paper poses an extension of Thai natural language processing (Thai-NLP). The modular framework consists of three parts: scene text localization, which localizes the position of Thai texts within the meme; Thai-OCR, which converts the Thai textual image into a character-level sequence of text (also called Thai-img2txt); and Thai hateful text understanding, which learns in a supervised manner to classify the converted Thai textual information using a word-level sequence-to-sequence (seq2seq) network.
2.1. Scene text localization
For a scene textual meme (unlike the classical Thai-OCR problem), Thai characters cannot be directly segmented from the background. Thai text can be located at any position within the meme. Moreover, the scene contains a large number of objects alongside the textual description (called scene complexity). Detection based on the convolutional neural network (CNN) has been shown to be better than traditional detection at localizing Thai text [53] in multi-object scenes. CNN-based detection can be classified into 2 types [69]: 1) the one-stage pipeline (without region feature extraction) and 2) the two-stage pipeline (with region feature extraction). According to the reports [69-70], the one-stage pipeline is faster than the two-stage pipeline but provides lower correctness. Two-stage detection is designed to localize multiple objects rather than text. In general, Thai textual features can be sufficiently localized by a one-stage pipeline, which also keeps the processing time short.
The single shot detector (SSD) [62] is a one-stage pipeline that has shown acceptable correctness for Thai text localization [49] in multi-object scenes. SSD is based on the Visual Geometry Group network (VGGNet) with ImageNet pre-training [71]. SSD was the first to provide multi-reference and multi-resolution detection (called pyramid representation) in one stage for objects of various sizes, which suits Thai texts. Accordingly, this paper uses SSD to localize the positions of Thai textual features in memes, as text localization in the wild.
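The multi-reference, multi-resolution idea can be illustrated with a minimal sketch of how default box sizes are spread over several feature maps. The scale range (0.2 to 0.9), the six feature maps and the aspect ratios follow the commonly cited SSD convention and are assumptions here, not settings reported in this paper; the function names are hypothetical.

```python
def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced default-box scales, one per feature map (assumed range)."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [round(s_min + step * k, 4) for k in range(num_maps)]


def default_box_sizes(scale, aspect_ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    """Relative (width, height) of the default boxes on one feature map."""
    return [(scale * ar ** 0.5, scale / ar ** 0.5) for ar in aspect_ratios]


# Small scales on early feature maps suit small text; wide aspect ratios
# (ar > 1) suit horizontal text lines.
scales = ssd_scales(6)
finest_map_boxes = default_box_sizes(scales[0])
```

Each aspect ratio keeps the box area at the square of the scale, so wide boxes for text lines trade height for width rather than growing larger.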
2.2. Thai-OCR
Thai optical character recognition (Thai-OCR) is based on a convolutional neural network (CNN) for feature extraction and a recurrent neural network (RNN) for the character-level sequence [64]. The localized Thai textual feature is processed in 2 stages. The first stage, Thai textual feature extraction, recognizes the Thai characters (44 consonants, 18 vowels, 4 tone marks, 5 diacritics, 19 numbers and 6 symbols) [37, 46]. For Thai character recognition, the block is automatically divided into 4 levels (top, upper, middle and lower position) as shown in Figure 3(a). The residual network (ResNet) [65] pretrained on ImageNet is applied to learn and classify those characters from the localized textual features in a supervised manner.
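As a rough illustration of the 4-level layout, the sketch below assigns Thai Unicode code points to a level and groups a string into left-to-right columns. The mapping is a simplifying assumption made for this example (tone marks always "top", above-base vowels and signs "upper", below-base vowels "lower", everything else "middle"); the paper derives levels from the image block, not from code points, and the function names are hypothetical.

```python
# Assumed level mapping over the Thai Unicode block (U+0E00 to U+0E7F).
TOP = set(range(0x0E48, 0x0E4C))                      # tone marks
UPPER = {0x0E31, 0x0E47, 0x0E4C, 0x0E4D, 0x0E4E} | set(range(0x0E34, 0x0E38))
LOWER = set(range(0x0E38, 0x0E3B))                    # below-base vowels


def level_of(ch):
    """Return the vertical level assumed for a single Thai character."""
    cp = ord(ch)
    if cp in TOP:
        return "top"
    if cp in UPPER:
        return "upper"
    if cp in LOWER:
        return "lower"
    return "middle"


def group_columns(text):
    """Group characters into left-to-right columns keyed by vertical level."""
    columns = []
    for ch in text:
        level = level_of(ch)
        if level == "middle" or not columns:
            columns.append({level: ch})   # a base character opens a new column
        else:
            columns[-1][level] = columns[-1].get(level, "") + ch
    return columns


# Two columns: a consonant with a lower vowel and a tone mark, then a
# consonant with an upper vowel.
columns = group_columns("\u0e01\u0e39\u0e49\u0e14\u0e35")
```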
The second stage, Thai character-level sequencing, orders the extracted textual features using bidirectional long short-term memory (Bi-LSTM) [66]. The character features are sorted from left to right, position by position. At each position, the lower, upper and top levels are also checked, as shown in Figure 3(b). The stream of extracted features is fed into the Bi-LSTM [66]. For prediction, connectionist temporal classification (CTC) [67] maps the extracted features to a Thai character sequence, producing the output Thai textual information.
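The CTC mapping step can be sketched with greedy (best-path) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop the blank symbol. This is a minimal sketch under the assumption that index 0 is the blank; the paper does not state which decoding strategy was used, and the toy alphabet and scores below are made up for illustration.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    decoded = []
    previous = blank
    for index in best_path:
        # Emit a symbol only when it is not blank and differs from the last frame.
        if index != blank and index != previous:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Toy alphabet: index 0 is the blank placeholder, then two Thai characters.
alphabet = ["-", "\u0e01", "\u0e32"]
frames = [
    [0.9, 0.1, 0.0],   # blank
    [0.1, 0.8, 0.1],   # first character
    [0.2, 0.7, 0.1],   # same character repeated (merged)
    [0.6, 0.2, 0.2],   # blank
    [0.1, 0.1, 0.8],   # second character
]
text = ctc_greedy_decode(frames, alphabet)
```

A blank between two identical predictions keeps them as separate characters, which is how CTC represents genuinely doubled letters.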
Figure 3. Thai character-level sequence: (a) Thai characters in 4 levels: top, upper, middle and lower position, (b) sorting each character feature from left to right, and at each of the lower, upper and top levels
2.3. Thai hateful text understanding
In line with infographic representation, most Thai textual memes carry a short description. Hate speech understanding can be seen as a problem of aspect-level sentiment analysis [72] on Thai textual information; it involves frequent misspelling words and a pre-trained language model.
2.3.1. Frequent misspelling words
Although the Thai textual information in a meme is a short message, real Thai usage still contains a great deal of noise and character repetition, which are the main causes of model misclassification. Such spamming characters are filtered out using the pyThaiNLP library [73], except for frequent misspelling words. Likewise, Thai word tokenization is still an important issue in Thai natural language processing (Thai-NLP) for word-level language understanding. Instead of correcting them with a spelling checker, the frequent misspellings can be treated as synonyms on social media, as shown in Table 1. These frequent misspellings are also used to augment the data for multi-task fine-tuning of the pre-trained language model.
Table 1. Some official spellings and frequent misspellings
Official Thai spelling | Frequent misspelling forms
บอกตรงๆ | บ่องตง
ครับ | คร้าฟ, ครัช, ครัฟ, คับ, คัฟ, ฮ๊าฟ, งับ, ฮัฟ
ค่ะ/คะ | ค่า, ค๊า, ขา, ค๊ะ, ข๊ะ, ข่ะ
รำคาญ | รามคาญ, รัมคาน, ราค่าน, รามคาน
อะไร | อะรัย, อาไร, อาราย
ทำไม | ทามมาย, ทัมมัย, ทามัย, ทะมาย
จังเลย | จุงเบย, จังเบย, จังเรย, จางเรย
อะไร | อะรัย, อัลไล, อาราย, อาไร
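The two text-cleaning ideas above, collapsing spammed character repetition and treating frequent misspellings as synonyms for augmentation, can be sketched as follows. This is only an illustration: it uses the standard `re` module as a stand-in for the pyThaiNLP filtering mentioned in the paper, the dictionary is a small excerpt of Table 1, and the function names and the three-repeat threshold are assumptions.

```python
import re

# Excerpt of Table 1: official spelling -> frequent misspelling forms.
MISSPELLING_SYNONYMS = {
    "ครับ": ["คับ", "ครัฟ"],
    "อะไร": ["อะรัย", "อาราย"],
    "จังเลย": ["จุงเบย", "จังเบย"],
}


def collapse_repetition(text):
    """Reduce any character repeated 3+ times in a row to one occurrence."""
    return re.sub(r"(.)\1{2,}", r"\1", text)


def augment_with_misspellings(text, synonyms=MISSPELLING_SYNONYMS):
    """Return one synthetic variant per misspelling of each official word found."""
    variants = []
    for official, misspellings in synonyms.items():
        if official in text:
            variants.extend(text.replace(official, wrong) for wrong in misspellings)
    return variants


cleaned = collapse_repetition("อะไรครับบบบบ")
synthetic = augment_with_misspellings(cleaned)
```

Each synthetic variant keeps the label of the original text, so the misspelled forms are learned as alternative spellings rather than being corrected away.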
2.3.2. Pre-trained language model
Pre-trained language models have demonstrated state-of-the-art results [74] in word2vec language understanding tasks for Thai text classification [75]. Generative pre-training transformer 2 (GPT-2) by OpenAI is pre-trained and fine-tuned on Thai textual information from the collected dataset. For benchmarking, the Wisesight sentiment analysis dataset [76] from a Kaggle competition [77] is used; it has 4 target classes: positive, negative, neutral and question. GPT-2 is an improved version of GPT [78] for semi-supervised learning on large-scale datasets (both labeled and unlabeled data); it is transformer-based and deeper than GPT. GPT-2 also supports fine-tuning on multiple tasks at the same time, which improves on the performance of standard transfer learning. The fine-tuning configuration is 3 epochs with a batch size of 32 and 1.5B parameters. The frequent misspelling Thai words are augmented into the fine-tuning data so that the model is fine-tuned in a multi-task manner.
3. RESULTS AND DISCUSSION
In line with the requirement of Thai hateful meme detection on social media (as well as the Hateful Memes Challenge by Facebook AI), this section describes the implementation, which was run on a Tesla V100 GPU in Colab. By component, the results are divided into dataset and extension, Thai textual meme results and language understanding results.
3.1. Dataset and extension
The dataset in this paper was divided into 2 parts: the textual image dataset and the textual information dataset. Both datasets were augmented to add synthetic data.
3.1.1. Textual image dataset
Cleaned Thai textual images (without any other objects) in different fonts were cropped from a variety of topics, e.g., dhamma, religion, politics, celebrities, business, science, quotes or hate speech. To enlarge the dataset, these textual images were multiplied by a power law distribution (with a degree between 0 and 1) to produce more multi-resolution texts, and rotated in both the left and right directions (with an angle between 30 and 60 degrees) to synthesize more irregular texts, in line with how text is presented to the reader, as shown in Figure 4. The textual image dataset contained 16,469 images, with their textual information as the target class.
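One possible reading of this augmentation is sketched below: each regular text crop receives a resolution factor in (0, 1] and a left or right rotation, and the rotation is applied to the corners of the text box. The sampling scheme, the function names and the interpretation of the "degree" as a scale factor are assumptions made for illustration; the paper does not give the exact procedure.

```python
import math
import random


def rotate_point(x, y, angle_deg, cx=0.0, cy=0.0):
    """Rotate (x, y) counter-clockwise by angle_deg around (cx, cy)."""
    theta = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (
        cx + dx * math.cos(theta) - dy * math.sin(theta),
        cy + dx * math.sin(theta) + dy * math.cos(theta),
    )


def rotate_box(width, height, angle_deg):
    """Corners of a width x height text box rotated about its centre."""
    cx, cy = width / 2.0, height / 2.0
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    return [rotate_point(x, y, angle_deg, cx, cy) for x, y in corners]


def sample_augmentations(n, seed=0, min_angle=30.0, max_angle=60.0):
    """Sample (scale, angle) pairs; the sign of the angle picks left or right."""
    rng = random.Random(seed)
    params = []
    for _ in range(n):
        scale = 1.0 - rng.random()                      # falls in (0, 1]
        angle = rng.uniform(min_angle, max_angle) * rng.choice((-1, 1))
        params.append((scale, angle))
    return params


params = sample_augmentations(5)
irregular_corners = [rotate_box(200 * s, 50 * s, a) for s, a in params]
```

The rotated corner coordinates can serve as the localization target for the synthesized irregular sample, so the label is transformed together with the image.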
Figure 4. Synthesizing more textual images: (a) multi-resolution degree, (b) rotation
3.1.2. Textual information dataset
For the pre-training dataset, 325,530 raw Thai texts concerning opinion analysis were crawled from Pantip.com, YouTube, Facebook and Twitter; after cleaning and preparation, 298,212 texts remained for the pre-trained model. Instead of replacing misspellings, the frequent misspelling words were treated as synonyms, which were used to synthesize another 72,398 Thai texts for multi-task fine-tuning. In addition, the Wisesight sentiment analysis dataset [76] from a Kaggle competition [77], which had 26,737 Thai texts, was used as the benchmarking dataset for the proposed language model.
3.2. Thai textual meme results
The data were partitioned into training and testing sets at 70:30. Following object detection and classification in computer vision, the Thai optical character recognition (Thai-OCR, also called img2txt conversion) results were divided into text localization and character recognition evaluation.
3.2.1. Text localization evaluation
Thai text appears with multiple objects in the scene (called scene complexity), which made text localization in this problem differ from classical Thai-OCR. The Thai characters (44 consonants, 18 vowels, 4 tone marks, 5 diacritics, 19 numbers and 6 symbols) were not complex features, so two-stage detection, e.g., Faster R-CNN [79] or FPN [80], was unnecessary. For short localization time, one-stage detection (e.g., YOLO [81], SSD [62]), which does not formulate region features, was more suitable for these characters. According to the reported results [70], SSD had the best speed in object detection, and one-stage detection was faster than two-stage detection, as shown in Figure 5. The text localization correctness of SSD [62] was divided into regular, irregular and combined results, as shown in Table 2. Since many Thai textual memes had slightly rotated text positions, it was essential to improve performance by synthesizing more Thai textual samples for the deep learning model.
Figure 5. Reported speed (in FPS) comparison between SSD and other detection pipelines
Table 2. Comparison between regular and irregular scene text detection
Thai text position | Precision (%)
Regular | 93.2
Irregular | 88.4
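For readers unfamiliar with how a localization precision such as those in Table 2 is typically obtained, the sketch below counts a predicted box as correct when its intersection over union (IoU) with an unmatched ground-truth box reaches a threshold. The 0.5 threshold, the axis-aligned box format (x1, y1, x2, y2) and the greedy matching are common conventions assumed here; the paper does not state its exact evaluation protocol.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(predicted, ground_truth, threshold=0.5):
    """Share of predicted boxes matched to a distinct ground-truth box."""
    matched = set()
    true_positives = 0
    for box in predicted:
        best_index, best_score = None, 0.0
        for index, truth in enumerate(ground_truth):
            if index in matched:
                continue                      # each ground truth matches once
            score = iou(box, truth)
            if score > best_score:
                best_index, best_score = index, score
        if best_index is not None and best_score >= threshold:
            matched.add(best_index)
            true_positives += 1
    return true_positives / len(predicted) if predicted else 0.0


# One correct detection and one false alarm give a precision of 0.5.
example = precision([(0, 0, 2, 2), (10, 10, 12, 12)], [(0, 0, 2, 2)])
```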
3.2.2. Text recognition evaluation
All Thai character features were trained and recognized by the ResNet architecture [65] with CTC prediction [67]. According to the results, Bi-LSTM [66] improved the recognition accuracy on character-level sequence data, at the cost of longer processing time (7.2 versus 5.1 ms/image in Table 3). Many text recognition papers used ResNet or VGGNet. With a larger number of parameters and skip connections, ResNet outperformed VGGNet. As a trade-off between correctness and speed, higher recognition accuracy (%) was paid for with more processing time (ms/image). The accuracy and time curve is shown in Figure 6, and the overall character-level recognition results under CTC prediction are shown in Table 3.
Figure 6. Accuracy and time curve
Table 3. The overall character-level recognition results under CTC prediction
Thai textual feature extraction | Character-level sequence | Time (ms/image) | Accuracy (%) | Parameters
VGGNet | None | 1.8 | 67.9 | 5.6M
ResNet | None | 5.1 | 76.3 | 46M
ResNet | Bi-LSTM | 7.2 | 78.0 | 48M
3.3. Thai textual understanding results
The data for pre-training were divided into 3 parts: training, validation and testing sets at 70:15:15. For fine-tuning, the data were split into training and testing sets at 70:30. From Table 4, standard GPT-2 might look better than GPT-2 with multi-tasking in the pre-training stage. In contrast, GPT-2 with multi-task fine-tuning, which treated official spellings and frequent misspellings as different writing styles, reached a better final accuracy on the Wisesight private and public datasets: 0.7634 and 0.7832. Multi-tasking (multi-task fine-tuning at the same time) improved fairness in the augmented transformer-based model, in that no task had higher precedence than the others. On the Wisesight challenge, these promising results demonstrated that multi-task transfer learning improved classification accuracy in Thai text categorization.
Table 4. Comparison between word-level GPT-2 with and without multi-tasking
Language model | Wisesight Challenge (Private) | Wisesight Challenge (Public)
Multi-tasking GPT-2 | 0.7634 | 0.7832
GPT-2 | 0.6891 | 0.7011
4. CONCLUSION
In response to the requirement on social media literacy of the Ministry of Higher Education, Science, Research and Innovation, Thailand (MHESI), this paper addressed combating the hate speech in Thai textual memes as a new research problem in Thai natural language processing (Thai-NLP). The research method contained 3 parts: scene text localization by the single shot detector (SSD), Thai optical character recognition (Thai-OCR) by the residual network (ResNet) and bidirectional long short-term memory (Bi-LSTM), and Thai hateful text understanding by multi-task GPT-2. As the main finding, affine augmentation of regular text by multi-resolution degree and rotation could improve localization and recognition.
Bi-LSTM improved the recognition accuracy on character-level sequence data. Instead of cleansing misspelled Thai words, they were treated as synonyms for multi-task GPT-2. Given the nature of Thai comparisons, many sarcastic objects within the image (e.g., criminals, buffalos, dogs, garbage or other caricatures) are also useful for combating hateful memes. Moreover, Deepfake can dangerously generate multiple emotions from a person’s face, which could easily be applied to making hateful memes. Fake face detection will converge with hateful meme detection toward the same research goal.
ACKNOWLEDGEMENTS
Given the long-running controversy in Thailand, a problem stated by the Ministry of Higher Education, Science, Research and Innovation, Thailand (MHESI), this paper introduced Thai textual meme detection as a new Thai-NLP application by linking scene text localization with Thai optical character recognition (Thai-OCR) and Thai hateful text understanding. The rude texts could not be shown. The local data and code for extensions can be requested via the corresponding author’s email. Thanks go to the expert from Thailand’s National Electronics and Computer Technology Center (NECTEC) for providing technical knowledge and data. In keeping with the quote “Rajabhat means the king’s men”, the computational resources were supported by Chandrakasem Rajabhat University (CRU) towards media literacy and long-term social development.
REFERENCES
[1] D. Hendricks, “Complete History of Social Media: Then and Now,” 2013. [Online]. Available:
https://smallbiztrends.com/2013/05/the-complete-history-of-social-media-infographic.html. Accessed: September
24, 2020.
[2] MHESI, “Media’s Rights and Liberties Are Necessary,” 2020. [Online]. Available:
https://www.mhesi.go.th/index.php/en/all-media/infographic/2757-2020-11-22-07-44-13.html. Accessed: September
24, 2020. [in Thai].
[3] W. Schaffar, “New Social Media and Politics in Thailand: The Emergence of Fascist Vigilante Groups on
Facebook,” Austrian Journal of South-East Asian Studies, vol. 9, no. 2, pp. 215-233, 2016.
[4] Xinhua News Agency, “Gov’t Warns Social Media, Websites over Publishing ‘MisInfo’,” Khaosod English, 2020.
[Online]. Available: https://www.khaosodenglish.com/news/crimecourtscalamity/2020/08/21/govt-warns-social-
media-websites-over-publishing-misinfo/. Accessed: September 24, 2020.
[5] United Nations, “United Nations Strategy and Plan of Action on Hate Speech,” pp. 1-5, 2019. [Online]. Available:
https://www.un.org/en/genocideprevention/documents/UN%20Strategy%20and%20Plan%20of%20Action%20on%
20Hate%20Speech%2018%20June%20SYNOPSIS.pdf. Accessed: September 24, 2020.
[6] C. George, “Hate Speech Law and Policy,” The International Encyclopedia of Digital Communication and Society,
First edition, John Wiley & Sons, Inc, 2015. [Online]. Available:
https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118767771.wbiedcs139.
[7] N. Mahittivanicha, “Most Active Social Media Platforms in Thailand during July 2020,” 2020. [Online]. Available:
https://www.twfdigital.com/blog/2020/04/summary-of-social-network-in-thailand-march2020/. Accessed:
September 24, 2020. [in Thai].
[8] J. Lankow, et al., “Infographics: The Power of Visual Storytelling,” John Wiley & Sons, Inc, 2012.
[9] M. K. Mancini, “Infographic: Social Media, Then and Now,” 2015. [Online]. Available:
https://www.tech4pub.com/2015/07/08/infographic-social-media-then-and-now/. Accessed: September 24, 2020.
[10] C. M. Castaño Díaz, “Defining and Characterizing the Concept of Internet Meme,” Revista CES Psicología, vol. 6,
no. 1, pp. 82-104, 2013.
[11] B. H. Spitzberg, “Toward a Model of Meme Diffusion (M3D),” Communication Theory, vol. 24, no. 3, pp. 311-339,
2014.
[12] V. Taecharungroj and P. Nueangjamnong, “The Effect of Humour on Virality: The Study of Internet Memes on
Social Media,” in the 2014 International Forum on Public Relations and Advertising Media Impacts on Culture and
Social Communication, 2014.
[13] V. Taecharungroj and P. Nueangjamnong, “Humour 2.0: Styles and Types of Humour and Virality of Memes on
Facebook,” Journal of Creative Communications, vol. 10, no. 3, pp. 288-302, 2015.
[14] A. Thaitrakulpanich, “Filipino Mom-and-kid ‘Mag-Ina’ Meme Spreads to ThaiNet,” Khaosod English, 2019.
[Online]. Available: https://www.khaosodenglish.com/culture/net/2019/12/12/filipino-mom-and-kid-mag-ina-meme-
spreads-to-thainet/. Accessed: September 24, 2020.
[15] L. Shifman, “Memes in a Digital World: Reconciling with a Conceptual Troublemaker,” Journal of Computer-
Mediated Communication, vol. 18, pp. 362-377, 2013.
[16] R. Sittichai and P. K. Smith, “Bullying and Cyberbullying in Thailand: Coping Strategies and Relation to Age,
Gender, Religion and Victim Status,” Journal of New Approaches in Educational Research, vol. 7, no. 1, pp. 24-30,
2018.
[17] D. Kiela, et al., “The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes,” 2020. [Online].
Available: https://arxiv.org/pdf/2005.04790.pdf. Accessed: September 24, 2020.
[18] Facebook AI, “Hateful Memes Challenge and Data Set,” 2020. [Online]. Available:
https://ai.facebook.com/tools/hatefulmemes/. Accessed: October 4, 2020.
[19] Facebook AI, “Hateful Memes Challenge and Data Set for Research on Harmful Multimodal Content,” 2020. [Online].
Available: https://ai.facebook.com/blog/hateful-memes-challenge-and-data-set/. Accessed: October 4, 2020.
[20] NeurIPS, “NeurIPS 2020 Competition Track,” 2020. [Online]. Available:
https://neurips.cc/Conferences/2020/CompetitionTrack. Accessed: October 4, 2020.
[21] H. Zhong, et al., “Content-Driven Detection of Cyberbullying on the Instagram Social Network,” in the 2016
International Joint Conference on Artificial Intelligence, pp. 3952-3958, 2016.
[22] Z. Waseem, et al., “Understanding Abuse: A Typology of Abusive Language Detection Subtasks,” in the 2017
Workshop on Abusive Language Online, pp. 78-84, 2017.
[23] P. Fortuna and S. Nunes, “A Survey on Automatic Detection of Hate Speech in Text,” ACM Computing Surveys,
vol. 51, no. 4, 2018.
[24] A. Schmidt and M. Wiegand, “A Survey on Hate Speech Detection using Natural Language Processing,” in the 2017
International Workshop on Natural Language Processing for Social Media, pp. 1-10, 2017.
[25] Y. Vernanda, et al., “Indonesian Language Email Spam Detection using N-gram and Naïve Bayes Algorithm,”
Bulletin of Electrical Engineering and Informatics, vol. 9, no. 5, pp. 96-108, pp. 2012-2019, 2020.
[26] J. A. Jupin, et al., “Review of the Machine Learning Methods in the Classification of Phishing Attack,” Bulletin of
Electrical Engineering and Informatics, vol. 8, no. 4, pp. 1545-1555, 2019.
[27] T. Siriteerakul, et al., “Character Classification Framework based on Support Vector Machine and k-Nearest
Neighbour Schemes,” Science Asia, vol. 42, no. 1, pp. 46-51, 2016.
[28] K. Kesorn, et al., “Optical Character Recognition (OCR) Enhancement using an Approximate String Matching
Technique,” Engineering and Applied Science Research, vol. 45, no. 4, pp. 282-289, 2018.
[29] V. Sornlertlamvanich, “A 29-year Journey of Thai NLP,” [Online]. Available:
https://www.slideshare.net/virach/a-29year-journey-of-thai-nlp. Accessed: October 4, 2020.
[30] B. Kijsirikul and T. Theeramunkong, “Survey on Artificial Intelligence Technology in Thailand,” 1999. [Online].
Available: https://www.cp.eng.chula.ac.th/~boonserm/publication/AISurvey.pdf. Accessed: October 4, 2020.
[31] C. Kimpan and S. Walairacht, “Thai Characters Recognition,” in the 1994 Symposium on Natural Language
Processing, 1994.
[32] D. Cooper, “How Do Thais Tell Letters Apart?,” 1996 International Symposium on Language and Linguistics, 1996.
[33] C. Tanprasert and T. Koanantakool, “Thai OCR: a Neural Network Application,” in the 1996 Digital Processing
Applications, pp. 90-95, 1996.
[34] C. Lursinsap and C. Khunasaraphan, “Simulated Light Sensitive Model for Handwritten Digit Recognition,” in
IJCNN Joint Conference on Neural Networks, pp. 13-18, 1992.
[35] C. Khunasaraphan and C. Lursinsap, “Simulated Light Sensitive Model for Thai Handwritten Alphabets
Recognition,” in the 1994 Symposium on Natural Language Processing, 1994.
[36] V. Sornlertlamvanich, et al., “The State of the Art in Thai Language Processing,” in the 2000 Annual Meeting of
Association for Computational Linguistics, pp. 1-2, 2000.
[37] H. T. Koanatakool, et al., “Computers and the Thai Language,” IEEE Annals of the History of Computing, vol. 31,
no. 2, pp. 50-58, 2009.
[38] C. Tanpraset, et al., “Improved Mixed Thai & English OCR using Two-step Neural Net Classification,” NECTEC
Technical Journal, vol. 1, no. 1, pp. 41-46, 1996.
[39] B. Kijsirikul, et al., “Thai Printed Character Recognition by Combining Inductive Logic Programming with
Backpropagation Neural Network,” in IEEE Asia-Pacific Conference on Circuits and Systems Microelectronics and
Integrating Systems, 1998.
[40] U. Marang, et al., “Recognition of Printed Thai Characters using Boundary Normalization and Fuzzy Neural
Networks,” in the 2000 Symposium on Natural Language Processing, 2000.
[41] W. Kasemsiri and C. Kimpan, “Printed Thai Character Recognition using Fuzzy-Rough Sets,” in IEEE Region 10
International Conference on Electrical and Electronic Technology, 2001.
[42] P. Phokharatkul and P. Pantaragphong, “Printed Thai Characters Recognition using Rough Sets and Fractal
Dimension,” in International Symposium on Nonlinear Theory and Its Applications, 2002.
[43] A. Kawtrakul and P. Waewsawangwong, “Multi-feature Extraction for Printed Thai Character Recognition,” in the
2000 Symposium on Natural Language Processing, 2000.
[44] U. Suttapakti, et al., “Font Descriptor Construction for Printed Thai Character Recognition,” in the 2013 IAPR
International Conference on Machine Vision Applications, 2013.
[45] T. Siriteerakul, et al., “Character Classification Framework based on Support Vector Machine and k-Nearest
Neighbour Schemes,” Science Asia, vol. 42, no. 1, pp. 46-51, 2016.
[46] I. Methasate and S. Marukatut, “BEST 2013: Thai Printed Character Recognition Competition,” in the 2013
Symposium on Natural Language Processing, 2013.
[47] O. Surinta, et al., “Recognition of Handwritten Characters using Local Gradient Feature Descriptors,” Engineering
Applications of Artificial Intelligence, vol. 15, no. 1, pp. 405-414, 2015.
[48] P. Mookdarsanit and L. Mookdarsanit, “ThaiWrittenNet: Thai Handwritten Script Recognition using Deep Neural
Networks,” Azerbaijan Journal of High Performance Computing, vol. 3, no. 1, pp. 75-93, 2020.
[49] Y. Baek, et al., “Character Region Awareness for Text Detection,” in the 2019 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pp. 9357-9366, 2019.
[50] J. Baek, et al., “What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis,” in
the 2019 IEEE/CVF Conference on Computer Vision, pp. 4714-4722, 2019.
[51] W. Puarungroj and N. Boonsirisumpun, “Thai License Plate Recognition based on Deep Learning,” in the 2018
International Conference on Computer Science and Computational Intelligence, pp. 214-221, 2018.
[52] A. Kitvimonrat and S. Watcharabutsarakham, “A Robust Method for Thai License Plate Recognition,” in the 2020
International Conference on Image and Graphics Processing, pp. 17-21, 2020.
[53] T. Kobchaisawat and T. H. Chalidabhongse, “Thai Text Localization in Natural Scene Images using Convolutional Neural
Network,” in the 2014 Signal and Information Processing Association Annual Summit and Conference, pp. 1-7, 2014.
[54] X. Zhou, et al., “EAST: An Efficient and Accurate Scene Text Detector,” in the 2017 IEEE Conference on
Computer Vision and Pattern Recognition, pp. 5551-5560, 2017.
[55] T. Kobchaisawat, et al., “Scene Text Detection with Polygon Offsetting and Border Augmentation,” Electronics,
vol. 9, no. 117, 2020.
[56] S. Sirihattasak, et al., “Annotation and Classification of Toxicity for Thai Twitter,” in the 2018 Text Analytics for
Cybersecurity and Online Safety, 2018.
[57] P. Mookdarsanit and L. Mookdarsanit, “TGF-GRU: A Cyber-bullying Autonomous Detector of Lexical Thai across
Social Media,” NKRAFA Journal of Science and Technology, vol. 15, no. 1, pp. 50-58, 2019.
[58] M. Malinowski, et al., “Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images,” in
the 2015 IEEE International Conference on Computer Vision, pp. 1-9, 2015.
[59] K. Marino, et al., “OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge,” in the
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3190-3199, 2019.
[60] K. Shuster, et al., “Engaging Image Captioning via Personality,” in the 2019 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pp. 12508-12518, 2019.
[61] P. Mookdarsanit and L. Mookdarsanit, “Thai-IC: Thai Image Captioning based on CNN-RNN Architecture,”
International Journal of Applied Computer and Information Systems, vol. 10, no. 1, pp. 40-45, 2020.
[62] W. Liu, et al., “SSD: Single Shot MultiBox Detector,” in the 2016 European Conference on Computer Vision, pp.
21-37, 2016.
[63] G. W. Nie, et al., “Deep Stair Walking Detection using Wearable Inertial Sensor via Long Short-Term Memory
Network,” Bulletin of Electrical Engineering and Informatics, vol. 9, no. 1, pp. 238-246, 2020.
[64] T. Emsawas and B. Kijsirikul, “Thai Printed Character Recognition using Long Short-Term Memory and Vertical
Component Shifting,” in the 2016 Pacific Rim International Conference on Artificial Intelligence, pp. 106-115, 2016.
[65] K. He, et al., “Deep Residual Learning for Image Recognition,” in the 2016 IEEE Conference on Computer Vision
and Pattern Recognition, pp. 770-778, 2016.
[66] A. Graves, et al., “Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition,” in the
2005 International Conference on Artificial Neural Networks, pp. 799-804, 2005.
[67] A. Graves, et al., “Connectionist Temporal Classification: Labeling Unsegmented Sequence Data with Recurrent
Neural Networks,” in the 2006 International Conference on Machine Learning, pp. 369-376, 2006.
[68] A. Radford, et al., “Language Models are Unsupervised Multitask Learners,” 2019. [Online]. Available:
https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf. Accessed: October 4, 2020.
[69] L. Liu, et al., “Deep Learning for Generic Object Detection: A Survey,” 2018. [Online]. Available:
https://arxiv.org/pdf/1809.02165.pdf. Accessed: October 4, 2020.
[70] Z. Zou, et al., “Object Detection in 20 Years: A Survey,” 2019. [Online]. Available:
https://arxiv.org/pdf/1905.05055.pdf. Accessed: October 4, 2020.
[71] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-scale Image Recognition,” in the
2015 International Conference on Learning Representations, 2015.
[72] A. Torfi, et al., “Natural Language Processing Advancements by Deep Learning: A Survey,” 2020. [Online].
Available: https://arxiv.org/pdf/2003.01200.pdf. Accessed: October 4, 2020.
[73] PyThaiNLP, “PyThaiNLP: Thai Natural Language Processing in Python,” 2019. [Online]. Available:
https://github.com/PyThaiNLP/pythainlp. Accessed: October 4, 2020.
[74] M. Raghu and E. Schmidt, “A Survey of Deep Learning for Scientific Discovery,” 2020. [Online]. Available:
https://arxiv.org/pdf/2003.11755.pdf. Accessed: October 4, 2020.
[75] T. Horsuwan, et al., “A Comparative Study of Pretrained Language Models on Thai Social Text Categorization,” in
the 2020 Asian Conference on Intelligent Information and Database Systems, pp. 63-75, 2020.
[76] PyThaiNLP, “wisesight-sentiment,” 2018. [Online]. Available: https://github.com/PyThaiNLP/wisesight-sentiment.
Accessed: October 4, 2020.
[77] Kaggle, “Wisesight Sentiment Analysis,” 2018. [Online]. Available: https://www.kaggle.com/c/wisesight-sentiment.
Accessed: October 4, 2020.
[78] A. Radford, et al., “Improving Language Understanding by Generative Pre-Training,” 2018. [Online]. Available:
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf. Accessed: October 4, 2020.
[79] S. Ren, et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017. doi: 10.1109/TPAMI.2016.2577031.
[80] T-Y. Lin, et al., “Feature Pyramid Networks for Object Detection,” in the 2017 IEEE Conference on Computer
Vision and Pattern Recognition, pp. 936-944, 2017. doi: 10.1109/CVPR.2017.106.
[81] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” in the 2017 IEEE Conference on Computer
Vision and Pattern Recognition, pp. 6517-6525, 2017. doi: 10.1109/CVPR.2017.690.