Every written text in any language has one author or more authors (authors have their individual sublanguage). An analysis of text if authors are not known could be done using methods of data analysis, data mining, and structural analysis. In this paper, two methods are described for anomaly detections: ngrams method and a system of Self-Organizing Maps working on sequences built from a text. there are
analyzed and compared results of usable methods for discrepancies detection based on character n-gram
profiles (the set of character n-gram normalized frequencies of a text) for Arabic texts. Arabic texts were
analyzed from many statistical characteristics point of view. We applied some heuristics for measurements
of text parts dissimilarities. We evaluate some Arabic texts and show its parts they contain discrepancies and
they need some following analysis for anomaly detection. The analysis depends on selected parameters
prepared in experiments. The system is trained to input sequences after which it determines text parts with
anomalies using a cumulative error and winner analysis in the networks. Both methods have been tested on
Arabic texts and they have a perspective contribution to text analysis
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
In this paper, we describe the developed model of the Convolutional Neural Networks CNN to a
classification of advertisements. The developed method has been tested on both texts (Arabic and Slovak
texts).The advertisements are chosen on a classified advertisements websites as short texts. We evolved a
modified model of the CNN, we have implemented it and developed next modifications. We studied their
influence on the performing activity of the proposed network. The result is a functional model of the
network and its implementation in Java and Python. And analysis of model results using different
parameters for the network and input data. The results on experiments data show that the developed model
of CNN is useful in the domains of Arabic and Slovak short texts, mainly for some classification of
advertisements. This paper gives complete guidelines for authors submitting papers for the AIRCC
Journals.
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
In this paper, we describe the developed model of the Convolutional Neural Networks CNN to a
classification of advertisements. The developed method has been tested on both texts (Arabic and Slovak
texts).The advertisements are chosen on a classified advertisements websites as short texts. We evolved a
modified model of the CNN, we have implemented it and developed next modifications. We studied their
influence on the performing activity of the proposed network. The result is a functional model of the
network and its implementation in Java and Python. And analysis of model results using different
parameters for the network and input data. The results on experiments data show that the developed model
of CNN is useful in the domains of Arabic and Slovak short texts, mainly for some classification of
advertisements.
Indexing of Arabic documents automatically based on lexical analysis kevig
The continuous information explosion through the Internet and all information sources makes it
necessary to perform all information processing activities automatically in quick and reliable
manners. In this paper, we proposed and implemented a method to automatically create and Index
for books written in Arabic language. The process depends largely on text summarization and
abstraction processes to collect main topics and statements in the book. The process is developed
in terms of accuracy and performance and results showed that this process can effectively replace
the effort of manually indexing books and document, a process that can be very useful in all
information processing and retrieval applications.
Indexing of Arabic documents automatically based on lexical analysiskevig
The continuous information explosion through the Internet and all information sources makes it necessary to perform all information processing activities automatically in quick and reliable manners. In this paper, we proposed and implemented a method to automatically create and Index for books written in Arabic language. The process depends largely on text summarization and abstraction processes to collect main topics and statements in the book. The process is developed in terms of accuracy and performance and results showed that this process can effectively replace the effort of manually indexing books and document, a process that can be very useful in all information processing and retrieval applications.
Indexing of Arabic documents automatically based on lexical analysis kevig
The continuous information explosion through the Internet and all information sources makes it
necessary to perform all information processing activities automatically in quick and reliable
manners. In this paper, we proposed and implemented a method to automatically create and Index
for books written in Arabic language. The process depends largely on text summarization and
abstraction processes to collect main topics and statements in the book. The process is developed
in terms of accuracy and performance and results showed that this process can effectively replace
the effort of manually indexing books and document, a process that can be very useful in all
information processing and retrieval applications.
final Year Projects, Final Year Projects in Chennai, Software Projects, Embedded Projects, Microcontrollers Projects, DSP Projects, VLSI Projects, Matlab Projects, Java Projects, .NET Projects, IEEE Projects, IEEE 2009 Projects, IEEE 2009 Projects, Software, IEEE 2009 Projects, Embedded, Software IEEE 2009 Projects, Embedded IEEE 2009 Projects, Final Year Project Titles, Final Year Project Reports, Final Year Project Review, Robotics Projects, Mechanical Projects, Electrical Projects, Power Electronics Projects, Power System Projects, Model Projects, Java Projects, J2EE Projects, Engineering Projects, Student Projects, Engineering College Projects, MCA Projects, BE Projects, BTech Projects, ME Projects, MTech Projects, Wireless Networks Projects, Network Security Projects, Networking Projects, final year projects, ieee projects, student projects, college projects, ieee projects in chennai, java projects, software ieee projects, embedded ieee projects, "ieee2009projects", "final year projects", "ieee projects", "Engineering Projects", "Final Year Projects in Chennai", "Final year Projects at Chennai", Java Projects, ASP.NET Projects, VB.NET Projects, C# Projects, Visual C++ Projects, Matlab Projects, NS2 Projects, C Projects, Microcontroller Projects, ATMEL Projects, PIC Projects, ARM Projects, DSP Projects, VLSI Projects, FPGA Projects, CPLD Projects, Power Electronics Projects, Electrical Projects, Robotics Projects, Solor Projects, MEMS Projects, J2EE Projects, J2ME Projects, AJAX Projects, Structs Projects, EJB Projects, Real Time Projects, Live Projects, Student Projects, Engineering Projects, MCA Projects, MBA Projects, College Projects, BE Projects, BTech Projects, ME Projects, MTech Projects, M.Sc Projects, Final Year Java Projects, Final Year ASP.NET Projects, Final Year VB.NET Projects, Final Year C# Projects, Final Year Visual C++ Projects, Final Year Matlab Projects, Final Year NS2 Projects, Final Year C Projects, Final Year Microcontroller Projects, Final Year ATMEL Projects, Final Year PIC Projects, Final Year ARM Projects, Final Year DSP Projects, Final Year VLSI Projects, Final Year FPGA Projects, Final Year CPLD Projects, Final Year Power Electronics Projects, Final Year Electrical Projects, Final Year Robotics Projects, Final Year Solor Projects, Final Year MEMS Projects, Final Year J2EE Projects, Final Year J2ME Projects, Final Year AJAX Projects, Final Year Structs Projects, Final Year EJB Projects, Final Year Real Time Projects, Final Year Live Projects, Final Year Student Projects, Final Year Engineering Projects, Final Year MCA Projects, Final Year MBA Projects, Final Year College Projects, Final Year BE Projects, Final Year BTech Projects, Final Year ME Projects, Final Year MTech Projects, Final Year M.Sc Projects, IEEE Java Projects, ASP.NET Projects, VB.NET Projects, C# Projects, Visual C++ Projects, Matlab Projects, NS2 Projects, C Projects, Microcontroller Projects, ATMEL Projects, PIC Projects, ARM Projects, DSP Projects, VLSI Projects, FPGA Projects, CPLD Projects, Power Electronics Projects, Electrical Projects, Robotics Projects, Solor Projects, MEMS Projects, J2EE Projects, J2ME Projects, AJAX Projects, Structs Projects, EJB Projects, Real Time Projects, Live Projects, Student Projects, Engineering Projects, MCA Projects, MBA Projects, College Projects, BE Projects, BTech Projects, ME Projects, MTech Projects, M.Sc Projects, IEEE 2009 Java Projects, IEEE 2009 ASP.NET Projects, IEEE 2009 VB.NET Projects, IEEE 2009 C# Projects, IEEE 2009 Visual C++ Projects, IEEE 2009 Matlab Projects, IEEE 2009 NS2 Projects, IEEE 2009 C Projects, IEEE 2009 Microcontroller Projects, IEEE 2009 ATMEL Projects, IEEE 2009 PIC Projects, IEEE 2009 ARM Projects, IEEE 2009 DSP Projects, IEEE 2009 VLSI Projects, IEEE 2009 FPGA Projects, IEEE 2009 CPLD Projects, IEEE 2009 Power Electronics Projects, IEEE 2009 Electrical Projects, IEEE 2009 Robotics Projects, IEEE 2009 Solor Projects, IEEE 2009 MEMS Projects, IEEE 2009 J2EE P
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsLiwei Ren任力偉
Near-duplicate document detection is a well-known problem in the area of information retrieval. It is an important problem to be solved for many applications in IT industry. It has been studied with profound research literatures. This article provides a novel solution to this classic problem. We present the problem with abstract models along with additional concepts such as text models, document fingerprints and document similarity. With these concepts, the problem can be transformed into keyword like search problem with results ranked by document similarity. There are two major techniques. The first technique is to extract robust and unique fingerprints from a document. The second one is to calculate document similarity effectively. Algorithms for both fingerprint extraction and document similarity calculation are introduced as a complete solution.
Recognition of Words in Tamil Script Using Neural NetworkIJERA Editor
In this paper, word recognition using neural network is proposed. Recognition process is started with the partitioning of document image into lines, words, and characters and then capturing the local features of segmented characters. After classifying the characters, the word image is transferred into unique code based on character code. This code ideally describes any form of word including word with mixed styles and different sizes. Sequence of character codes of the word form input pattern and word code is a target value of the pattern. Neural network is used to train the patterns of the words. Trained network is tested with word patterns and is recognized or unrecognized based on the network error value. Experiments have been conducted with a local database to evaluate the performance of the word recognizing system and obtained good accuracy. This method can be applied for any language word recognition system as the training is based on only unique code of the characters and words belonging to the language.
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
In this paper, we describe the developed model of the Convolutional Neural Networks CNN to a
classification of advertisements. The developed method has been tested on both texts (Arabic and Slovak
texts).The advertisements are chosen on a classified advertisements websites as short texts. We evolved a
modified model of the CNN, we have implemented it and developed next modifications. We studied their
influence on the performing activity of the proposed network. The result is a functional model of the
network and its implementation in Java and Python. And analysis of model results using different
parameters for the network and input data. The results on experiments data show that the developed model
of CNN is useful in the domains of Arabic and Slovak short texts, mainly for some classification of
advertisements. This paper gives complete guidelines for authors submitting papers for the AIRCC
Journals.
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
In this paper, we describe the developed model of the Convolutional Neural Networks CNN to a
classification of advertisements. The developed method has been tested on both texts (Arabic and Slovak
texts).The advertisements are chosen on a classified advertisements websites as short texts. We evolved a
modified model of the CNN, we have implemented it and developed next modifications. We studied their
influence on the performing activity of the proposed network. The result is a functional model of the
network and its implementation in Java and Python. And analysis of model results using different
parameters for the network and input data. The results on experiments data show that the developed model
of CNN is useful in the domains of Arabic and Slovak short texts, mainly for some classification of
advertisements.
Indexing of Arabic documents automatically based on lexical analysis kevig
The continuous information explosion through the Internet and all information sources makes it
necessary to perform all information processing activities automatically in quick and reliable
manners. In this paper, we proposed and implemented a method to automatically create and Index
for books written in Arabic language. The process depends largely on text summarization and
abstraction processes to collect main topics and statements in the book. The process is developed
in terms of accuracy and performance and results showed that this process can effectively replace
the effort of manually indexing books and document, a process that can be very useful in all
information processing and retrieval applications.
Indexing of Arabic documents automatically based on lexical analysiskevig
The continuous information explosion through the Internet and all information sources makes it necessary to perform all information processing activities automatically in quick and reliable manners. In this paper, we proposed and implemented a method to automatically create and Index for books written in Arabic language. The process depends largely on text summarization and abstraction processes to collect main topics and statements in the book. The process is developed in terms of accuracy and performance and results showed that this process can effectively replace the effort of manually indexing books and document, a process that can be very useful in all information processing and retrieval applications.
Indexing of Arabic documents automatically based on lexical analysis kevig
The continuous information explosion through the Internet and all information sources makes it
necessary to perform all information processing activities automatically in quick and reliable
manners. In this paper, we proposed and implemented a method to automatically create and Index
for books written in Arabic language. The process depends largely on text summarization and
abstraction processes to collect main topics and statements in the book. The process is developed
in terms of accuracy and performance and results showed that this process can effectively replace
the effort of manually indexing books and document, a process that can be very useful in all
information processing and retrieval applications.
final Year Projects, Final Year Projects in Chennai, Software Projects, Embedded Projects, Microcontrollers Projects, DSP Projects, VLSI Projects, Matlab Projects, Java Projects, .NET Projects, IEEE Projects, IEEE 2009 Projects, IEEE 2009 Projects, Software, IEEE 2009 Projects, Embedded, Software IEEE 2009 Projects, Embedded IEEE 2009 Projects, Final Year Project Titles, Final Year Project Reports, Final Year Project Review, Robotics Projects, Mechanical Projects, Electrical Projects, Power Electronics Projects, Power System Projects, Model Projects, Java Projects, J2EE Projects, Engineering Projects, Student Projects, Engineering College Projects, MCA Projects, BE Projects, BTech Projects, ME Projects, MTech Projects, Wireless Networks Projects, Network Security Projects, Networking Projects, final year projects, ieee projects, student projects, college projects, ieee projects in chennai, java projects, software ieee projects, embedded ieee projects, "ieee2009projects", "final year projects", "ieee projects", "Engineering Projects", "Final Year Projects in Chennai", "Final year Projects at Chennai", Java Projects, ASP.NET Projects, VB.NET Projects, C# Projects, Visual C++ Projects, Matlab Projects, NS2 Projects, C Projects, Microcontroller Projects, ATMEL Projects, PIC Projects, ARM Projects, DSP Projects, VLSI Projects, FPGA Projects, CPLD Projects, Power Electronics Projects, Electrical Projects, Robotics Projects, Solor Projects, MEMS Projects, J2EE Projects, J2ME Projects, AJAX Projects, Structs Projects, EJB Projects, Real Time Projects, Live Projects, Student Projects, Engineering Projects, MCA Projects, MBA Projects, College Projects, BE Projects, BTech Projects, ME Projects, MTech Projects, M.Sc Projects, Final Year Java Projects, Final Year ASP.NET Projects, Final Year VB.NET Projects, Final Year C# Projects, Final Year Visual C++ Projects, Final Year Matlab Projects, Final Year NS2 Projects, Final Year C Projects, Final Year Microcontroller Projects, Final Year ATMEL Projects, Final Year PIC Projects, Final Year ARM Projects, Final Year DSP Projects, Final Year VLSI Projects, Final Year FPGA Projects, Final Year CPLD Projects, Final Year Power Electronics Projects, Final Year Electrical Projects, Final Year Robotics Projects, Final Year Solor Projects, Final Year MEMS Projects, Final Year J2EE Projects, Final Year J2ME Projects, Final Year AJAX Projects, Final Year Structs Projects, Final Year EJB Projects, Final Year Real Time Projects, Final Year Live Projects, Final Year Student Projects, Final Year Engineering Projects, Final Year MCA Projects, Final Year MBA Projects, Final Year College Projects, Final Year BE Projects, Final Year BTech Projects, Final Year ME Projects, Final Year MTech Projects, Final Year M.Sc Projects, IEEE Java Projects, ASP.NET Projects, VB.NET Projects, C# Projects, Visual C++ Projects, Matlab Projects, NS2 Projects, C Projects, Microcontroller Projects, ATMEL Projects, PIC Projects, ARM Projects, DSP Projects, VLSI Projects, FPGA Projects, CPLD Projects, Power Electronics Projects, Electrical Projects, Robotics Projects, Solor Projects, MEMS Projects, J2EE Projects, J2ME Projects, AJAX Projects, Structs Projects, EJB Projects, Real Time Projects, Live Projects, Student Projects, Engineering Projects, MCA Projects, MBA Projects, College Projects, BE Projects, BTech Projects, ME Projects, MTech Projects, M.Sc Projects, IEEE 2009 Java Projects, IEEE 2009 ASP.NET Projects, IEEE 2009 VB.NET Projects, IEEE 2009 C# Projects, IEEE 2009 Visual C++ Projects, IEEE 2009 Matlab Projects, IEEE 2009 NS2 Projects, IEEE 2009 C Projects, IEEE 2009 Microcontroller Projects, IEEE 2009 ATMEL Projects, IEEE 2009 PIC Projects, IEEE 2009 ARM Projects, IEEE 2009 DSP Projects, IEEE 2009 VLSI Projects, IEEE 2009 FPGA Projects, IEEE 2009 CPLD Projects, IEEE 2009 Power Electronics Projects, IEEE 2009 Electrical Projects, IEEE 2009 Robotics Projects, IEEE 2009 Solor Projects, IEEE 2009 MEMS Projects, IEEE 2009 J2EE P
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsLiwei Ren任力偉
Near-duplicate document detection is a well-known problem in the area of information retrieval. It is an important problem to be solved for many applications in IT industry. It has been studied with profound research literatures. This article provides a novel solution to this classic problem. We present the problem with abstract models along with additional concepts such as text models, document fingerprints and document similarity. With these concepts, the problem can be transformed into keyword like search problem with results ranked by document similarity. There are two major techniques. The first technique is to extract robust and unique fingerprints from a document. The second one is to calculate document similarity effectively. Algorithms for both fingerprint extraction and document similarity calculation are introduced as a complete solution.
Recognition of Words in Tamil Script Using Neural NetworkIJERA Editor
In this paper, word recognition using neural network is proposed. Recognition process is started with the partitioning of document image into lines, words, and characters and then capturing the local features of segmented characters. After classifying the characters, the word image is transferred into unique code based on character code. This code ideally describes any form of word including word with mixed styles and different sizes. Sequence of character codes of the word form input pattern and word code is a target value of the pattern. Neural network is used to train the patterns of the words. Trained network is tested with word patterns and is recognized or unrecognized based on the network error value. Experiments have been conducted with a local database to evaluate the performance of the word recognizing system and obtained good accuracy. This method can be applied for any language word recognition system as the training is based on only unique code of the characters and words belonging to the language.
The effect of training set size in authorship attribution: application on sho...IJECEIAES
Authorship attribution (AA) is a subfield of linguistics analysis, aiming to identify the original author among a set of candidate authors. Several research papers were published and several methods and models were developed for many languages. However, the number of related works for Arabic is limited. Moreover, investigating the impact of short words length and training set size is not well addressed. To the best of our knowledge, no published works or researches, in this direction or even in other languages, are available. Therefore, we propose to investigate this effect, taking into account different stylomatric combination. The Mahalanobis distance (MD), Linear Regression (LR), and Multilayer Perceptron (MP) are selected as AA classifiers. During the experiment, the training dataset size is increased and the accuracy of the classifiers is recorded. The results are quite interesting and show different classifiers behaviours. Combining word-based stylomatric features with n-grams provides the best accuracy reached in average 93%.
Arabic tweeps dialect prediction based on machine learning approach IJECEIAES
In this paper, we present our approach for profiling Arabic authors on Twitter, based on their tweets. We consider here the dialect of an Arabic author as an important trait to be predicted. For this purpose, many indicators, feature vectors and machine learning-based classifiers were implemented. The results of these classifiers were compared to find out the best dialect prediction model. The best dialect prediction model was obtained using random forest classifier with full forms and their stems as feature vector.
Dimensionality Reduction and Feature Selection Methods for Script Identificat...ITIIIndustries
The goal of this research is to explore effects of dimensionality reduction and feature selection on the problem of script identification from images of printed documents. The kadjacent segment is ideal for this use due to its ability to capture visual patterns. We have used principle component analysis to reduce the size of our feature matrix to a handier size that can be trained easily, and experimented by including varying combinations of dimensions of the super feature set. A modular
approach in neural network was used to classify 7 languages – Arabic, Chinese, English, Japanese, Tamil, Thai and Korean.
Proposed Arabic Text Steganography Method Based on New Coding TechniqueIJERA Editor
Steganography is one of the important fields of information security that depend on hiding
secret information in a cover media (video, image, audio, text) such that un authorized person fails to realize its
existence. One of the lossless data compression techniques which are used for a given file that contains many
redundant data is run length encoding (RLE). Sometimes the RLE output will be expanded rather than
compressed, and this is the main problem of RLE. In this paper we will use a new coding method such that its
output will be contains sequence of ones with few zeros, so modified RLE that we proposed in this paper will be
suitable for compression, finally we employ the modified RLE output for stenography purpose that based on
Unicode and non-printed characters to hide the secret information in an Arabic text.
The work shows that a program can be
syntactically represented as an oriented word tree, that is a syntactic program tree, program
words being located both in tree nodes and on tree arrows.
Scene text recognition in mobile applications by character descriptor and str...eSAT Journals
Abstract
Camera-based scene images usually have complex background filled with non-text objects in multiple shapes and colors. The existing system is sensitive to font scale changes and background interference. The main focusof this system is on two character recognition methods. In text detection, previously proposed algorithms are used to search for regions of text strings. Proposed system uses character descriptor which is effective to extract representative and discriminative text features for both recognition schemes. The local features descriptor HOG is compatible with all above key point detectors. Our method of scene text recognition from detected text regions is compatible with the application of mobile devices. Proposedsystem accurately extracts text from natural scene image in presence of background interference.The demo system gives us details of algorithm design and performance improvements of scene text extraction. It is ableto detect text region of text strings from cluttered and recognize characters in the text regions.
Keywords: Scene text detection, scene text recognition, character descriptor, stroke configuration, text understanding, text retrieval.
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING mlaij
The Volume of text resources have been increasing in digital libraries and internet. Organizing these text documents has become a practical need. For organizing great number of objects into small or minimum number of coherent groups automatically, Clustering technique is used. These documents are widely used for information retrieval and Natural Language processing tasks. Different Clustering algorithms require a metric for quantifying how dissimilar two given documents are. This difference is often measured by similarity measure such as Euclidean distance, Cosine similarity etc. The similarity measure process in text
mining can be used to identify the suitable clustering algorithm for a specific problem. This survey discusses the existing works on text similarity by partitioning them into three significant approaches; String-based, Knowledge based and Corpus-based similarities.
In This paper we presented new approach for cursive Arabic text recognition system. The objective is to propose methodology analytical offline recognition of handwritten Arabic for rapid implementation.The first part in the writing recognition system is the preprocessing phase is the preprocessing phase to prepare the data was introduces and extracts a set of simple statistical features by two methods : from a window which is sliding long that text line the right to left and the approach VH2D (consists in projecting every character on the abscissa, on the ordinate and the diagonals 45° and 135°) . It then injects the resulting feature vectors to Hidden Markov Model (HMM) and combined the two HMM by multi-stream approach.
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONkevig
In This paper we presented new approach for cursive Arabic text recognition system. The objective is to propose methodology analytical offline recognition of handwritten Arabic for rapid implementation. The first part in the writing recognition system is the preprocessing phase is the preprocessing phase to prepare the data was introduces and extracts a set of simple statistical features by two methods : from a window which is sliding long that text line the right to left and the approach VH2D (consists in projecting every character on the abscissa, on the ordinate and the diagonals 45° and 135°) . It then injects the resulting feature vectors to Hidden Markov Model (HMM) and combined the two HMM by multi-stream approach.
Artificial Neural Networks have proved their efficiency in a large number of research domains. In this paper, we have applied Artificial Neural Networks on Arabic text to prove correct language modeling, text generation, and missing text prediction. In one hand, we have adapted Recurrent Neural Networks architectures to model Arabic language in order to generate correct Arabic sequences. In the other hand, Convolutional Neural Networks have been parameterized, basing on some specific features of Arabic, to predict missing text in Arabic documents. We have demonstrated the power of our adapted models in generating and predicting correct Arabic text comparing to the standard model. The model had been trained and tested on known free Arabic datasets. Results have been promising with sufficient accuracy.
Off line system for the recognition of handwritten arabic charactercsandit
Recognition of handwritten Arabic text awaits accurate recognition solutions. There are many
difficulties facing a good handwritten Arabic recognition system such as unlimited variation in
human handwriting, similarities of distinct character shapes, and their position in the word. The
typical Optical Character Recognition (OCR) systems are based mainly on three stages,
preprocessing, features extraction and recognition.
In this paper, we present an efficient approach for the recognition of off-line Arabic handwritten
characters which is based on structural, Statistical and Morphological features from the main
body of the character and also from the secondary components. Evaluation of the accuracy of
the selected features is made. The system was trained and tested with CENPRMI dataset. The
proposed algorithm obtained promising results in terms of accuracy (success rate of 100% for
some letters at average 88%). In Comparable with other related works we find that our result is
the highest among others.
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...ijsc
Assigning the submitted text to one of the predetermined categories is required when dealing with
application-oriented texts. There are many different approaches to solving this problem, including using
neural network algorithms. This article explores using neural networks to sort news articles based on their
category. Two word vectorization algorithms are being used — The Bag of Words (BOW) and the
word2vec distributive semantic model. For this work the BOW model was applied to the FNN, whereas the
word2vec model was applied to CNN. We have measured the accuracy of the classification when applying
these methods for ad texts datasets. The experimental results have shown that both of the models show us
quite the comparable accuracy. However, the word2vec encoding used for CNN showed more relevant
results, regarding to the texts semantics. Moreover, the trained CNN, based on the word2vec architecture,
has produced a compact feature map on its last convolutional layer, which can then be used in the future
text representation. I.e. Using CNN as a text encoder and for learning transfer.
Texts Classification with the usage of Neural Network based on the Word2vec’s...ijsc
Assigning the submitted text to one of the predetermined categories is required when dealing with application-oriented texts. There are many different approaches to solving this problem, including using neural network algorithms. This article explores using neural networks to sort news articles based on their category. Two word vectorization algorithms are being used — The Bag of Words (BOW) and the
word2vec distributive semantic model. For this work the BOW model was applied to the FNN, whereas the word2vec model was applied to CNN. We have measured the accuracy of the classification when applying these methods for ad texts datasets. The experimental results have shown that both of the models show us quite the comparable accuracy. However, the word2vec encoding used for CNN showed more relevant results, regarding to the texts semantics. Moreover, the trained CNN, based on the word2vec architecture, has produced a compact feature map on its last convolutional layer, which can then be used in the future text representation. I.e. Using CNN as a text encoder and for learning transfer.
Scandal! Teasers June 2024 on etv Forum.co.zaIsaac More
Monday, 3 June 2024
Episode 47
A friend is compelled to expose a manipulative scheme to prevent another from making a grave mistake. In a frantic bid to save Jojo, Phakamile agrees to a meeting that unbeknownst to her, will seal her fate.
Tuesday, 4 June 2024
Episode 48
A mother, with her son's best interests at heart, finds him unready to heed her advice. Motshabi finds herself in an unmanageable situation, sinking fast like in quicksand.
Wednesday, 5 June 2024
Episode 49
A woman fabricates a diabolical lie to cover up an indiscretion. Overwhelmed by guilt, she makes a spontaneous confession that could be devastating to another heart.
Thursday, 6 June 2024
Episode 50
Linda unwittingly discloses damning information. Nhlamulo and Vuvu try to guide their friend towards the right decision.
Friday, 7 June 2024
Episode 51
Jojo's life continues to spiral out of control. Dintle weaves a web of lies to conceal that she is not as successful as everyone believes.
Monday, 10 June 2024
Episode 52
A heated confrontation between lovers leads to a devastating admission of guilt. Dintle's desperation takes a new turn, leaving her with dwindling options.
Tuesday, 11 June 2024
Episode 53
Unable to resort to violence, Taps issues a verbal threat, leaving Mdala unsettled. A sister must explain her life choices to regain her brother's trust.
Wednesday, 12 June 2024
Episode 54
Winnie makes a very troubling discovery. Taps follows through on his threat, leaving a woman reeling. Layla, oblivious to the truth, offers an incentive.
Thursday, 13 June 2024
Episode 55
A nosy relative arrives just in time to thwart a man's fatal decision. Dintle manipulates Khanyi to tug at Mo's heartstrings and get what she wants.
Friday, 14 June 2024
Episode 56
Tlhogi is shocked by Mdala's reaction following the revelation of their indiscretion. Jojo is in disbelief when the punishment for his crime is revealed.
Monday, 17 June 2024
Episode 57
A woman reprimands another to stay in her lane, leading to a damning revelation. A man decides to leave his broken life behind.
Tuesday, 18 June 2024
Episode 58
Nhlamulo learns that due to his actions, his worst fears have come true. Caiphus' extravagant promises to suppliers get him into trouble with Ndu.
Wednesday, 19 June 2024
Episode 59
A woman manages to kill two birds with one stone. Business doom looms over Chillax. A sobering incident makes a woman realize how far she's fallen.
Thursday, 20 June 2024
Episode 60
Taps' offer to help Nhlamulo comes with hidden motives. Caiphus' new ideas for Chillax have MaHilda excited. A blast from the past recognizes Dintle, not for her newfound fame.
Friday, 21 June 2024
Episode 61
Taps is hungry for revenge and finds a rope to hang Mdala with. Chillax's new job opportunity elicits mixed reactions from the public. Roommates' initial meeting starts off on the wrong foot.
Monday, 24 June 2024
Episode 62
Taps seizes new information and recruits someone on the inside. Mary's new job
From the Editor's Desk: 115th Father's day Celebration - When we see Father's day in Hindu context, Nanda Baba is the most vivid figure which comes to the mind. Nanda Baba who was the foster father of Lord Krishna is known to provide love, care and affection to Lord Krishna and Balarama along with his wife Yashoda; Letter’s to the Editor: Mother's Day - Mother is a precious life for their children. Mother is life breath for her children. Mother's lap is the world happiness whose debt can never be paid.
More Related Content
Similar to ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
The effect of training set size in authorship attribution: application on sho...IJECEIAES
Authorship attribution (AA) is a subfield of linguistics analysis, aiming to identify the original author among a set of candidate authors. Several research papers were published and several methods and models were developed for many languages. However, the number of related works for Arabic is limited. Moreover, investigating the impact of short words length and training set size is not well addressed. To the best of our knowledge, no published works or researches, in this direction or even in other languages, are available. Therefore, we propose to investigate this effect, taking into account different stylomatric combination. The Mahalanobis distance (MD), Linear Regression (LR), and Multilayer Perceptron (MP) are selected as AA classifiers. During the experiment, the training dataset size is increased and the accuracy of the classifiers is recorded. The results are quite interesting and show different classifiers behaviours. Combining word-based stylomatric features with n-grams provides the best accuracy reached in average 93%.
Arabic tweeps dialect prediction based on machine learning approach IJECEIAES
In this paper, we present our approach for profiling Arabic authors on Twitter, based on their tweets. We consider here the dialect of an Arabic author as an important trait to be predicted. For this purpose, many indicators, feature vectors and machine learning-based classifiers were implemented. The results of these classifiers were compared to find out the best dialect prediction model. The best dialect prediction model was obtained using random forest classifier with full forms and their stems as feature vector.
Dimensionality Reduction and Feature Selection Methods for Script Identificat...ITIIIndustries
The goal of this research is to explore effects of dimensionality reduction and feature selection on the problem of script identification from images of printed documents. The kadjacent segment is ideal for this use due to its ability to capture visual patterns. We have used principle component analysis to reduce the size of our feature matrix to a handier size that can be trained easily, and experimented by including varying combinations of dimensions of the super feature set. A modular
approach in neural network was used to classify 7 languages – Arabic, Chinese, English, Japanese, Tamil, Thai and Korean.
Proposed Arabic Text Steganography Method Based on New Coding TechniqueIJERA Editor
Steganography is one of the important fields of information security that depend on hiding
secret information in a cover media (video, image, audio, text) such that un authorized person fails to realize its
existence. One of the lossless data compression techniques which are used for a given file that contains many
redundant data is run length encoding (RLE). Sometimes the RLE output will be expanded rather than
compressed, and this is the main problem of RLE. In this paper we will use a new coding method such that its
output will be contains sequence of ones with few zeros, so modified RLE that we proposed in this paper will be
suitable for compression, finally we employ the modified RLE output for stenography purpose that based on
Unicode and non-printed characters to hide the secret information in an Arabic text.
The work shows that a program can be
syntactically represented as an oriented word tree, that is a syntactic program tree, program
words being located both in tree nodes and on tree arrows.
Scene text recognition in mobile applications by character descriptor and str...eSAT Journals
Abstract
Camera-based scene images usually have complex background filled with non-text objects in multiple shapes and colors. The existing system is sensitive to font scale changes and background interference. The main focusof this system is on two character recognition methods. In text detection, previously proposed algorithms are used to search for regions of text strings. Proposed system uses character descriptor which is effective to extract representative and discriminative text features for both recognition schemes. The local features descriptor HOG is compatible with all above key point detectors. Our method of scene text recognition from detected text regions is compatible with the application of mobile devices. Proposedsystem accurately extracts text from natural scene image in presence of background interference.The demo system gives us details of algorithm design and performance improvements of scene text extraction. It is ableto detect text region of text strings from cluttered and recognize characters in the text regions.
Keywords: Scene text detection, scene text recognition, character descriptor, stroke configuration, text understanding, text retrieval.
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING mlaij
The Volume of text resources have been increasing in digital libraries and internet. Organizing these text documents has become a practical need. For organizing great number of objects into small or minimum number of coherent groups automatically, Clustering technique is used. These documents are widely used for information retrieval and Natural Language processing tasks. Different Clustering algorithms require a metric for quantifying how dissimilar two given documents are. This difference is often measured by similarity measure such as Euclidean distance, Cosine similarity etc. The similarity measure process in text
mining can be used to identify the suitable clustering algorithm for a specific problem. This survey discusses the existing works on text similarity by partitioning them into three significant approaches; String-based, Knowledge based and Corpus-based similarities.
In This paper we presented new approach for cursive Arabic text recognition system. The objective is to propose methodology analytical offline recognition of handwritten Arabic for rapid implementation.The first part in the writing recognition system is the preprocessing phase is the preprocessing phase to prepare the data was introduces and extracts a set of simple statistical features by two methods : from a window which is sliding long that text line the right to left and the approach VH2D (consists in projecting every character on the abscissa, on the ordinate and the diagonals 45° and 135°) . It then injects the resulting feature vectors to Hidden Markov Model (HMM) and combined the two HMM by multi-stream approach.
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONkevig
In This paper we presented new approach for cursive Arabic text recognition system. The objective is to propose methodology analytical offline recognition of handwritten Arabic for rapid implementation. The first part in the writing recognition system is the preprocessing phase is the preprocessing phase to prepare the data was introduces and extracts a set of simple statistical features by two methods : from a window which is sliding long that text line the right to left and the approach VH2D (consists in projecting every character on the abscissa, on the ordinate and the diagonals 45° and 135°) . It then injects the resulting feature vectors to Hidden Markov Model (HMM) and combined the two HMM by multi-stream approach.
Artificial Neural Networks have proved their efficiency in a large number of research domains. In this paper, we have applied Artificial Neural Networks on Arabic text to prove correct language modeling, text generation, and missing text prediction. In one hand, we have adapted Recurrent Neural Networks architectures to model Arabic language in order to generate correct Arabic sequences. In the other hand, Convolutional Neural Networks have been parameterized, basing on some specific features of Arabic, to predict missing text in Arabic documents. We have demonstrated the power of our adapted models in generating and predicting correct Arabic text comparing to the standard model. The model had been trained and tested on known free Arabic datasets. Results have been promising with sufficient accuracy.
Off line system for the recognition of handwritten arabic charactercsandit
Recognition of handwritten Arabic text awaits accurate recognition solutions. There are many
difficulties facing a good handwritten Arabic recognition system such as unlimited variation in
human handwriting, similarities of distinct character shapes, and their position in the word. The
typical Optical Character Recognition (OCR) systems are based mainly on three stages,
preprocessing, features extraction and recognition.
In this paper, we present an efficient approach for the recognition of off-line Arabic handwritten
characters which is based on structural, Statistical and Morphological features from the main
body of the character and also from the secondary components. Evaluation of the accuracy of
the selected features is made. The system was trained and tested with CENPRMI dataset. The
proposed algorithm obtained promising results in terms of accuracy (success rate of 100% for
some letters at average 88%). In Comparable with other related works we find that our result is
the highest among others.
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...ijsc
Assigning the submitted text to one of the predetermined categories is required when dealing with
application-oriented texts. There are many different approaches to solving this problem, including using
neural network algorithms. This article explores using neural networks to sort news articles based on their
category. Two word vectorization algorithms are being used — The Bag of Words (BOW) and the
word2vec distributive semantic model. For this work the BOW model was applied to the FNN, whereas the
word2vec model was applied to CNN. We have measured the accuracy of the classification when applying
these methods for ad texts datasets. The experimental results have shown that both of the models show us
quite the comparable accuracy. However, the word2vec encoding used for CNN showed more relevant
results, regarding to the texts semantics. Moreover, the trained CNN, based on the word2vec architecture,
has produced a compact feature map on its last convolutional layer, which can then be used in the future
text representation. I.e. Using CNN as a text encoder and for learning transfer.
Texts Classification with the usage of Neural Network based on the Word2vec’s...ijsc
Assigning the submitted text to one of the predetermined categories is required when dealing with application-oriented texts. There are many different approaches to solving this problem, including using neural network algorithms. This article explores using neural networks to sort news articles based on their category. Two word vectorization algorithms are being used — The Bag of Words (BOW) and the
word2vec distributive semantic model. For this work the BOW model was applied to the FNN, whereas the word2vec model was applied to CNN. We have measured the accuracy of the classification when applying these methods for ad texts datasets. The experimental results have shown that both of the models show us quite the comparable accuracy. However, the word2vec encoding used for CNN showed more relevant results, regarding to the texts semantics. Moreover, the trained CNN, based on the word2vec architecture, has produced a compact feature map on its last convolutional layer, which can then be used in the future text representation. I.e. Using CNN as a text encoder and for learning transfer.
Scandal! Teasers June 2024 on etv Forum.co.zaIsaac More
Monday, 3 June 2024
Episode 47
A friend is compelled to expose a manipulative scheme to prevent another from making a grave mistake. In a frantic bid to save Jojo, Phakamile agrees to a meeting that unbeknownst to her, will seal her fate.
Tuesday, 4 June 2024
Episode 48
A mother, with her son's best interests at heart, finds him unready to heed her advice. Motshabi finds herself in an unmanageable situation, sinking fast like in quicksand.
Wednesday, 5 June 2024
Episode 49
A woman fabricates a diabolical lie to cover up an indiscretion. Overwhelmed by guilt, she makes a spontaneous confession that could be devastating to another heart.
Thursday, 6 June 2024
Episode 50
Linda unwittingly discloses damning information. Nhlamulo and Vuvu try to guide their friend towards the right decision.
Friday, 7 June 2024
Episode 51
Jojo's life continues to spiral out of control. Dintle weaves a web of lies to conceal that she is not as successful as everyone believes.
Monday, 10 June 2024
Episode 52
A heated confrontation between lovers leads to a devastating admission of guilt. Dintle's desperation takes a new turn, leaving her with dwindling options.
Tuesday, 11 June 2024
Episode 53
Unable to resort to violence, Taps issues a verbal threat, leaving Mdala unsettled. A sister must explain her life choices to regain her brother's trust.
Wednesday, 12 June 2024
Episode 54
Winnie makes a very troubling discovery. Taps follows through on his threat, leaving a woman reeling. Layla, oblivious to the truth, offers an incentive.
Thursday, 13 June 2024
Episode 55
A nosy relative arrives just in time to thwart a man's fatal decision. Dintle manipulates Khanyi to tug at Mo's heartstrings and get what she wants.
Friday, 14 June 2024
Episode 56
Tlhogi is shocked by Mdala's reaction following the revelation of their indiscretion. Jojo is in disbelief when the punishment for his crime is revealed.
Monday, 17 June 2024
Episode 57
A woman reprimands another to stay in her lane, leading to a damning revelation. A man decides to leave his broken life behind.
Tuesday, 18 June 2024
Episode 58
Nhlamulo learns that due to his actions, his worst fears have come true. Caiphus' extravagant promises to suppliers get him into trouble with Ndu.
Wednesday, 19 June 2024
Episode 59
A woman manages to kill two birds with one stone. Business doom looms over Chillax. A sobering incident makes a woman realize how far she's fallen.
Thursday, 20 June 2024
Episode 60
Taps' offer to help Nhlamulo comes with hidden motives. Caiphus' new ideas for Chillax have MaHilda excited. A blast from the past recognizes Dintle, not for her newfound fame.
Friday, 21 June 2024
Episode 61
Taps is hungry for revenge and finds a rope to hang Mdala with. Chillax's new job opportunity elicits mixed reactions from the public. Roommates' initial meeting starts off on the wrong foot.
Monday, 24 June 2024
Episode 62
Taps seizes new information and recruits someone on the inside. Mary's new job
From the Editor's Desk: 115th Father's day Celebration - When we see Father's day in Hindu context, Nanda Baba is the most vivid figure which comes to the mind. Nanda Baba who was the foster father of Lord Krishna is known to provide love, care and affection to Lord Krishna and Balarama along with his wife Yashoda; Letter’s to the Editor: Mother's Day - Mother is a precious life for their children. Mother is life breath for her children. Mother's lap is the world happiness whose debt can never be paid.
In the vast landscape of cinema, stories have been told, retold, and reimagined in countless ways. At the heart of this narrative evolution lies the concept of a "remake". A successful remake allows us to revisit cherished tales through a fresh lens, often reflecting a different era's perspective or harnessing the power of advanced technology. Yet, the question remains, what makes a remake successful? Today, we will delve deeper into this subject, identifying the key ingredients that contribute to the success of a remake.
Panchayat Season 3 - Official Trailer.pdfSuleman Rana
The dearest series "Panchayat" is set to make a victorious return with its third season, and the fervor is discernible. The authority trailer, delivered on May 28, guarantees one more enamoring venture through the country heartland of India.
Jitendra Kumar keeps on sparkling as Abhishek Tripathi, the city-reared engineer who ends up functioning as the secretary of the Panchayat office in the curious town of Phulera. His nuanced depiction of a young fellow exploring the difficulties of country life while endeavoring to adjust to his new environmental factors has earned far and wide recognition.
Neena Gupta and Raghubir Yadav return as Manju Devi and Brij Bhushan Dubey, separately. Their dynamic science and immaculate acting rejuvenate the hardships of town administration. Gupta's depiction of the town Pradhan with an ever-evolving outlook, matched with Yadav's carefully prepared exhibition, adds profundity and credibility to the story.
New Difficulties and Experiences
The trailer indicates new difficulties anticipating the characters, as Abhishek keeps on wrestling with his part in the town and his yearnings for a superior future. The series has reliably offset humor with social editorial, and Season 3 looks ready to dig much more profound into the intricacies of rustic organization and self-awareness.
Watchers can hope to see a greater amount of the enchanting and particular residents who have become fan top picks. Their connections and the one of a kind cut of-life situations give a reviving and interesting portrayal of provincial India, featuring the two its appeal and its difficulties.
A Mix of Humor and Heart
One of the signs of "Panchayat" is its capacity to mix humor with sincere narrating. The trailer features minutes that guarantee to convey giggles, as well as scenes that pull at the heartstrings. This equilibrium has been a critical calculate the show's prosperity, resounding with crowds across different socioeconomics.
Creation Greatness
The creation quality remaining parts first rate, with the beautiful setting of Phulera town filling in as a scenery that upgrades the narrating. The meticulousness in portraying provincial life, joined with sharp composition and solid exhibitions, guarantees that "Panchayat" keeps on hanging out in the packed web series scene.
Expectation and Delivery
As the delivery date draws near, expectation for "Panchayat" Season 3 is at a record-breaking high. The authority trailer has previously created critical buzz, with fans enthusiastically anticipating the continuation of Abhishek Tripathi's excursion and the new undertakings that lie ahead in Phulera.
All in all, the authority trailer for "Panchayat" Season 3 recommends that watchers are in for another drawing in and engaging ride. Yet again with its charming characters, convincing story, and ideal mix of humor and show, the new season is set to enamor crowds. Write in your schedules and prepare to get back to the endearing universe of "Panchayat."
From Slave to Scourge: The Existential Choice of Django Unchained. The Philos...Rodney Thomas Jr
#SSAPhilosophy #DjangoUnchained #DjangoFreeman #ExistentialPhilosophy #Freedom #Identity #Justice #Courage #Rebellion #Transformation
Welcome to SSA Philosophy, your ultimate destination for diving deep into the profound philosophies of iconic characters from video games, movies, and TV shows. In this episode, we explore the powerful journey and existential philosophy of Django Freeman from Quentin Tarantino’s masterful film, "Django Unchained," in our video titled, "From Slave to Scourge: The Existential Choice of Django Unchained. The Philosophy of Django Freeman!"
From Slave to Scourge: The Existential Choice of Django Unchained – The Philosophy of Django Freeman!
Join me as we delve into the existential philosophy of Django Freeman, uncovering the profound lessons and timeless wisdom his character offers. Through his story, we find inspiration in the power of choice, the quest for justice, and the courage to defy oppression. Django Freeman’s philosophy is a testament to the human spirit’s unyielding drive for freedom and justice.
Don’t forget to like, comment, and subscribe to SSA Philosophy for more in-depth explorations of the philosophies behind your favorite characters. Hit the notification bell to stay updated on our latest videos. Let’s discover the principles that shape these icons and the profound lessons they offer.
Django Freeman’s story is one of the most compelling narratives of transformation and empowerment in cinema. A former slave turned relentless bounty hunter, Django’s journey is not just a physical liberation but an existential quest for identity, justice, and retribution. This video delves into the core philosophical elements that define Django’s character and the profound choices he makes throughout his journey.
Link to video: https://youtu.be/GszqrXk38qk
Maximizing Your Streaming Experience with XCIPTV- Tips for 2024.pdfXtreame HDTV
In today’s digital age, streaming services have become an integral part of our entertainment lives. Among the myriad of options available, XCIPTV stands out as a premier choice for those seeking seamless, high-quality streaming. This comprehensive guide will delve into the features, benefits, and user experience of XCIPTV, illustrating why it is a top contender in the IPTV industry.
Meet Dinah Mattingly – Larry Bird’s Partner in Life and Loveget joys
Get an intimate look at Dinah Mattingly’s life alongside NBA icon Larry Bird. From their humble beginnings to their life today, discover the love and partnership that have defined their relationship.
As a film director, I have always been awestruck by the magic of animation. Animation, a medium once considered solely for the amusement of children, has undergone a significant transformation over the years. Its evolution from a rudimentary form of entertainment to a sophisticated form of storytelling has stirred my creativity and expanded my vision, offering limitless possibilities in the realm of cinematic storytelling.
Create a Seamless Viewing Experience with Your Own Custom OTT Player.pdfGenny Knight
As the popularity of online streaming continues to rise, the significance of providing outstanding viewing experiences cannot be emphasized enough. Tailored OTT players present a robust solution for service providers aiming to enhance their offerings and engage audiences in a competitive market. Through embracing customization, companies can craft immersive, individualized experiences that effectively hold viewers' attention, entertain them, and encourage repeat usage.
Experience the thrill of Progressive Puzzle Adventures, like Scavenger Hunt Games and Escape Room Activities combined Solve Treasure Hunt Puzzles online.
Young Tom Selleck: A Journey Through His Early Years and Rise to Stardomgreendigital
Introduction
When one thinks of Hollywood legends, Tom Selleck is a name that comes to mind. Known for his charming smile, rugged good looks. and the iconic mustache that has become synonymous with his persona. Tom Selleck has had a prolific career spanning decades. But, the journey of young Tom Selleck, from his early years to becoming a household name. is a story filled with determination, talent, and a touch of luck. This article delves into young Tom Selleck's life, background, early struggles. and pivotal moments that led to his rise in Hollywood.
Follow us on: Pinterest
Early Life and Background
Family Roots and Childhood
Thomas William Selleck was born in Detroit, Michigan, on January 29, 1945. He was the second of four children in a close-knit family. His father, Robert Dean Selleck, was a real estate investor and executive. while his mother, Martha Selleck, was a homemaker. The Selleck family relocated to Sherman Oaks, California. when Tom was a child, setting the stage for his future in the entertainment industry.
Education and Early Interests
Growing up, young Tom Selleck was an active and athletic child. He attended Grant High School in Van Nuys, California. where he excelled in sports, particularly basketball. His tall and athletic build made him a standout player, and he earned a basketball scholarship to the University of Southern California (U.S.C.). While at U.S.C., Selleck studied business administration. but his interests shifted toward acting.
Discovery of Acting Passion
Tom Selleck's journey into acting was serendipitous. During his time at U.S.C., a drama coach encouraged him to try acting. This nudge led him to join the Hills Playhouse, where he began honing his craft. Transitioning from an aspiring athlete to an actor took time. but young Tom Selleck became drawn to the performance world.
Early Career Struggles
Breaking Into the Industry
The path to stardom was a challenging one for young Tom Selleck. Like many aspiring actors, he faced many rejections and struggled to find steady work. A series of minor roles and guest appearances on television shows marked his early career. In 1965, he debuted on the syndicated show "The Dating Game." which gave him some exposure but did not lead to immediate success.
The Commercial Breakthrough
During the late 1960s and early 1970s, Selleck began appearing in television commercials. His rugged good looks and charismatic presence made him a popular brand choice. He starred in advertisements for Pepsi-Cola, Revlon, and Close-Up toothpaste. These commercials provided financial stability and helped him gain visibility in the industry.
Struggling Actor in Hollywood
Despite his success in commercials. breaking into large acting roles remained a challenge for young Tom Selleck. He auditioned and took on small parts in T.V. shows and movies. Some of his early television appearances included roles in popular series like Lancer, The F.B.I., and Bracken's World. But, it would take a
Hollywood Actress - The 250 hottest galleryZsolt Nemeth
Hollywood Actress amazon album eminent worldwide media, female-singer, actresses, alhletina-woman, 250 collection.
Highest and photoreal-print exclusive testament PC collage.
Focused television virtuality crime, novel.
The sheer afterlife of the work is activism-like hollywood-actresses point com.
173 Illustrate, 250 gallery, 154 blog, 120 TV serie logo, 17 TV president logo, 183 active hyperlink.
HD AI face enhancement 384 page plus Bowker ISBN, Congress LLCL or US Copyright.
Skeem Saam in June 2024 available on ForumIsaac More
Monday, June 3, 2024 - Episode 241: Sergeant Rathebe nabs a top scammer in Turfloop. Meikie is furious at her uncle's reaction to the truth about Ntswaki.
Tuesday, June 4, 2024 - Episode 242: Babeile uncovers the truth behind Rathebe’s latest actions. Leeto's announcement shocks his employees, and Ntswaki’s ordeal haunts her family.
Wednesday, June 5, 2024 - Episode 243: Rathebe blocks Babeile from investigating further. Melita warns Eunice to stay clear of Mr. Kgomo.
Thursday, June 6, 2024 - Episode 244: Tbose surrenders to the police while an intruder meddles in his affairs. Rathebe's secret mission faces a setback.
Friday, June 7, 2024 - Episode 245: Rathebe’s antics reach Kganyago. Tbose dodges a bullet, but a nightmare looms. Mr. Kgomo accuses Melita of witchcraft.
Monday, June 10, 2024 - Episode 246: Ntswaki struggles on her first day back at school. Babeile is stunned by Rathebe’s romance with Bullet Mabuza.
Tuesday, June 11, 2024 - Episode 247: An unexpected turn halts Rathebe’s investigation. The press discovers Mr. Kgomo’s affair with a young employee.
Wednesday, June 12, 2024 - Episode 248: Rathebe chases a criminal, resorting to gunfire. Turf High is rife with tension and transfer threats.
Thursday, June 13, 2024 - Episode 249: Rathebe traps Kganyago. John warns Toby to stop harassing Ntswaki.
Friday, June 14, 2024 - Episode 250: Babeile is cleared to investigate Rathebe. Melita gains Mr. Kgomo’s trust, and Jacobeth devises a financial solution.
Monday, June 17, 2024 - Episode 251: Rathebe feels the pressure as Babeile closes in. Mr. Kgomo and Eunice clash. Jacobeth risks her safety in pursuit of Kganyago.
Tuesday, June 18, 2024 - Episode 252: Bullet Mabuza retaliates against Jacobeth. Pitsi inadvertently reveals his parents’ plans. Nkosi is shocked by Khwezi’s decision on LJ’s future.
Wednesday, June 19, 2024 - Episode 253: Jacobeth is ensnared in deceit. Evelyn is stressed over Toby’s case, and Letetswe reveals shocking academic results.
Thursday, June 20, 2024 - Episode 254: Elizabeth learns Jacobeth is in Mpumalanga. Kganyago's past is exposed, and Lehasa discovers his son is in KZN.
Friday, June 21, 2024 - Episode 255: Elizabeth confirms Jacobeth’s dubious activities in Mpumalanga. Rathebe lies about her relationship with Bullet, and Jacobeth faces theft accusations.
Monday, June 24, 2024 - Episode 256: Rathebe spies on Kganyago. Lehasa plans to retrieve his son from KZN, fearing what awaits.
Tuesday, June 25, 2024 - Episode 257: MaNtuli fears for Kwaito’s safety in Mpumalanga. Mr. Kgomo and Melita reconcile.
Wednesday, June 26, 2024 - Episode 258: Kganyago makes a bold escape. Elizabeth receives a shocking message from Kwaito. Mrs. Khoza defends her husband against scam accusations.
Thursday, June 27, 2024 - Episode 259: Babeile's skillful arrest changes the game. Tbose and Kwaito face a hostage crisis.
Friday, June 28, 2024 - Episode 260: Two women face the reality of being scammed. Turf is rocked by breaking
Tom Selleck Net Worth: A Comprehensive Analysisgreendigital
Over several decades, Tom Selleck, a name synonymous with charisma. From his iconic role as Thomas Magnum in the television series "Magnum, P.I." to his enduring presence in "Blue Bloods," Selleck has captivated audiences with his versatility and charm. As a result, "Tom Selleck net worth" has become a topic of great interest among fans. and financial enthusiasts alike. This article delves deep into Tom Selleck's wealth, exploring his career, assets, endorsements. and business ventures that contribute to his impressive economic standing.
Follow us on: Pinterest
Early Life and Career Beginnings
The Foundation of Tom Selleck's Wealth
Born on January 29, 1945, in Detroit, Michigan, Tom Selleck grew up in Sherman Oaks, California. His journey towards building a large net worth began with humble origins. , Selleck pursued a business administration degree at the University of Southern California (USC) on a basketball scholarship. But, his interest shifted towards acting. leading him to study at the Hills Playhouse under Milton Katselas.
Minor roles in television and films marked Selleck's early career. He appeared in commercials and took on small parts in T.V. series such as "The Dating Game" and "Lancer." These initial steps, although modest. laid the groundwork for his future success and the growth of Tom Selleck net worth. Breakthrough with "Magnum, P.I."
The Role that Defined Tom Selleck's Career
Tom Selleck's breakthrough came with the role of Thomas Magnum in the CBS television series "Magnum, P.I." (1980-1988). This role made him a household name and boosted his net worth. The series' popularity resulted in Selleck earning large salaries. leading to financial stability and increased recognition in Hollywood.
"Magnum P.I." garnered high ratings and critical acclaim during its run. Selleck's portrayal of the charming and resourceful private investigator resonated with audiences. making him one of the most beloved television actors of the 1980s. The success of "Magnum P.I." played a pivotal role in shaping Tom Selleck net worth, establishing him as a major star.
Film Career and Diversification
Expanding Tom Selleck's Financial Portfolio
While "Magnum, P.I." was a cornerstone of Selleck's career, he did not limit himself to television. He ventured into films, further enhancing Tom Selleck net worth. His filmography includes notable movies such as "Three Men and a Baby" (1987). which became the highest-grossing film of the year, and its sequel, "Three Men and a Little Lady" (1990). These box office successes contributed to his wealth.
Selleck's versatility allowed him to transition between genres. from comedies like "Mr. Baseball" (1992) to westerns such as "Quigley Down Under" (1990). This diversification showcased his acting range. and provided many income streams, reinforcing Tom Selleck net worth.
Television Resurgence with "Blue Bloods"
Sustaining Wealth through Consistent Success
In 2010, Tom Selleck began starring as Frank Reagan i
Meet Crazyjamjam - A TikTok Sensation | Blog EternalBlog Eternal
Crazyjamjam, the TikTok star everyone's talking about! Uncover her secrets to success, viral trends, and more in this exclusive feature on Blog Eternal.
Source: https://blogeternal.com/celebrity/crazyjamjam-leaks/
240529_Teleprotection Global Market Report 2024.pdfMadhura TBRC
The teleprotection market size has grown
exponentially in recent years. It will grow from
$21.92 billion in 2023 to $28.11 billion in 2024 at a
compound annual growth rate (CAGR) of 28.2%. The
teleprotection market size is expected to see
exponential growth in the next few years. It will grow
to $70.77 billion in 2028 at a compound annual
growth rate (CAGR) of 26.0%.
240529_Teleprotection Global Market Report 2024.pdf
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
1. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
DOI: 10.5121/ijcsea.2021.11402 17
ANOMALY DETECTION IN ARABIC TEXTS USING N-
GRAMS AND SELF ORGANIZING MAPS
Abdulwahed Almarimi and Asmaa Salem
Department of Computer Science, Bani Waleed University, Bani Waleed, Libya
ABSTRACT
Every written text in any language has one author or more authors (authors have their individual
sublanguage). An analysis of text if authors are not known could be done using methods of data analysis,
data mining, and structural analysis. In this paper, two methods are described for anomaly detections: n-
grams method and a system of Self-Organizing Maps working on sequences built from a text. there are
analyzed and compared results of usable methods for discrepancies detection based on character n-gram
profiles (the set of character n-gram normalized frequencies of a text) for Arabic texts. Arabic texts were
analyzed from many statistical characteristics point of view. We applied some heuristics for measurements
of text parts dissimilarities. We evaluate some Arabic texts and show its parts they contain discrepancies and
they need some following analysis for anomaly detection. The analysis depends on selected parameters
prepared in experiments. The system is trained to input sequences after which it determines text parts with
anomalies using a cumulative error and winner analysis in the networks. Both methods have been tested on
Arabic texts and they have a perspective contribution to text analysis.
KEYWORDS
Anomaly Detection, N-gram of Words, N-gram of Symbols, Self Organizing Map
1. INTRODUCTION
In text processing, many problems are solved connected to authorship of texts, for example
authorship attribution, external plagiarism, internal plagiarism, authorship verification, text
verification. The Authorship Attribution problem is formulated as a problem to identify the author
of the given text from the group of potential candidate authors. Some interesting approaches to
solving of the problem can be found in [1], [2], [3] and plagiarism [4], [5],[6] but in both problems
there exist some groups of comparable authors and comparable texts. It means the results of
analysis can be compared according to texts or authors. In our problem the author is known [7][8]
and we analyze each text as one extra text. In the solution of the problem, we use Self-Organizing
Maps (SOM) models of neural networks [9]. A good description of Self-Organizing Maps
extensions for temporal structures can be found in [10], where some of the extensions are usable
for sequences. SOM models of neural networks are applied to time series in [11] and it inspired us
to apply the same in a text analysis.
This paper is written in the following structure: The second section describes the background and
statistics of some analyzed Arabic texts. The third section describes our developed method, the
algorithm of Character n-gram Profiles. The fourth section describes the second method which we
use in analysis of texts. It is a system of Self Organizing Maps. In the conclusion, we formulate
summary of results and the plan of the following research.
2. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
18
2. BASIC BACKGROUND AND TEXT STATISTICS
In the text verification we used Arabic texts from [12]. The statistics of 6 Arabic texts are shown
in the Table I.
We will use the following symbols and definitions:
- an finite alphabet of letters;
is the number of letters in ; in our texts;
V - a finite vocabulary of words in the alphabet , presented in the alphabetic order;
V
-
the numbers of words in the vocabulary V ;
T - text document; a finite sequence of words T ; T =
N
V
w
w
w n ;
;
,...,
1
- the number
of words in the text;
T =
T
t
t
t T
;
...,
, 2
1
– the number of symbols in the text T ;
g
n
– symbol for n-gram (build on symbols);
V
n
– a finite vocabulary of n-grams,
V
n
is the number of different n-grams in the
vocabulary V
n
;
Let P(A) and P(B) be the profiles of two texts A and B, respectively. We studied the performance
of various distance measures that quantify the similarity between two character n-gram profiles in
the framework of author identification experiments. The following dissimilarity measure has been
found to be both accurate and robust when the two texts significantly differ in length.
)
(
2
)
(
)
(
))
(
)
(
(
2
)
,
(
A
p
g
n
g
f
g
f
g
f
g
f
n
B
n
A
n
B
n
A
B
A
d
(1)
where fA (g) and fB (g) is the frequency of occurrence (normalized over the text length) of the n-
gram g in text A and text B, respectively P(A) the set of all n-gram in the part A.
If the numbers of occurrences of in the two parts A and B of document are known, a function on
n-grams can be defined by:
,
)
(
#
)
(
#
)
(
,
g
o
g
o
g
k n
A
n
B
n
B
A
(2)
and the formula (1) can be modified as (3) using formula (2) as follows:
2
)
( ,
,
)
(
*
)
(
*
(
2
)
,
(
A
p
g
n
B
A
n
n
n
B
A
n
n
n g
k
A
B
g
k
A
B
B
A
d
(3)
3. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
19
Table.1. Statistics of 10 Arabic Texts
Name of texts
A1 A2 A3 A4 A5 A6 A7 A8 A9 A14
# words 94197 48358 51938 31656 39340 36977 93668 40076 60503 2168
# symbols 395065 198019 247448 135573 152905 155301 375430 163212 258346 11409
#diff. words14110 9061 25755 10098 7036 3492 16384 9391 20920 1108
# words
by length
1 6 48 183 81 14 7 89 690 12 28
2 13217 7358 5365 4816 6188 6397 15396 7456 8079 312
3 23287
24.72%
12130
25.08%
9353*
18.00%
7795
24.62%
9619*
24.45%
7575*
20.48%
23520*
25.10%
8405
20.97%
12393
21.48%
529
24.40%
4 22426*
23.80%
11653*
24.09%
9779
18.82%
6324 *
19.97%
11476
29.17%
8231
22.25%
25075
26.77%
7850 *
19.58%
10592*
17.50%
456*
18.07%
Arabic
Latin
Max frq.
3-grams
الم
alm
3027
واا
waa
1797
واا
waa
4242
نال
nal
789
قال
qal
2294
الم
alm
971
الل
all
3468
اال
ala
1647
الم
alm
1983
الم
alm
57
Arabic
Latin
Max frq.
4-grams
هللا
allah
1479
فيال
fiyal
525
هوهو
huhu
797
فيال
fiyal
346
هللا
allah
1986
لبصر
lbsar
2230
هللا
allah
2958
ثمال
thmal
1030
فيال
fiyal
841
هللا
allah
27
3. CHARACTER N-GRAM PROFILES METHOD
The method is based on similarity/dissimilarity of the text parts and their occurrences of n-grams
in comparison to the complete text. We modify the dissimilarity measure defined by (3) using (2)
to normalized dissimilarity measure nd as follows :
2
)
( ,
,
)
(
*
)
(
*
*
1
)
,
(
A
p
g
n
T
A
n
n
n
n
T
A
n
n
n
n
n g
k
A
T
g
k
A
T
A
T
A
nd
(4)
where T is the complete text, P(A) and is the set of all n-grams in the text part A. The denominator
A
n
ensures that the values of this dissimilarity function lie between 0 (highest similarity) and 1.
The complete set of parameter settings for the proposed method is given in Table 2.
Table.2. Parameter settings used in this study.
Description Symbol value
Character n-gram length Arabic 4
Sliding window length w 2000
Sliding window moving s 100
Threshold of plagiarism free criterion
t1 0.2
Real window length threshold t2 1500
Sensitivity of plagiarism detection a 2
4. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
20
3.1.N-gram Profile and a Style Function
Let W be a sliding window moving through the document of length w (in letters) and a step s (in
letters). The window represents a text part and will be moved every time to the right by s letters.
The profile of the window W is defined by the value nd(W;T). It is possible to define the style
function of a text T, using profiles of the moving windows as follows:
,
/
...
1
),
,
(
)
,
( s
T
i
T
W
nd
T
i
sf i
(5)
where Wi is a window,
s
T /
is the total number of windows (it depends on a text length). If w
> s the windows are overlapping. It means, a text part in each window of the text will be evaluated
in a comparison to whole text. The size of the window and the distance of the moving of window
should have some influence for the stability of the style function nd. The different results are
illustrated in the next Figure. The figure shows sf function of Arabic text A4, for 4-grams, the
length of the moving window was 2000 letters. In the above panel the moving step was 100 letters
and in the down panel 500 letters. In the figure, a middle line is drawn representing the mean of
all sf values along with two more lines representing the +/- standard deviation [13].
Figure.1. The style function of Arabic text A4, the window of the length 2000 letters moving by 100
positions using 4-grams (up) and moving by 500 positions (down).
3.2. Algorithm Covering Anomalies in Texts Parts
We expect that the style function is relatively stable (it does not change value dramatically) if the
document is written by the same author. If the style function has very different values (some peaks
[14]) for different windows, it is necessary to analyze the covered parts.
Let M be a mean value of the sf function values. The existence of peaks can be indicated by the
standard deviation. Let S denote the standard deviation of the style function. If S is lower than a
predefined threshold and profile values are less than M + S, then the text looks like consistent text
of one author. The windows with the profile greater than M + S could be analyzed again.
5. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
21
The steps of the algorithm:
Step 1: We first remove from sf all the text windows with the profile less than M + S. The reduced
text is '
T . These windows correspond to unplagiarized sections probably.
Step 2: denote the style function after removal of the above described windows. Let '
M and '
S
be the mean and the standard deviation of )
'
,
'
( T
i
sf
Step 3: The criterion (6) is defined to detect discrepancies. '
*
'
)
,
'
( S
a
M
W
i
sf
(6)
where parameter a determines the sensitivity of the discrepancies detection method. For the higher
value a, the smaller number (and more likely problematic) sections are detected. The value of a
was determined empirically at 2:0 [15] to attain a good combination of precision and recall. We
used recommended value for the parameter a.
Step 4: Let #dsf be the number of windows which fulfil the condition (6). The percentage of
discrepancies can be described by the formula (7)
W
dsf
Pdisc
*#
100
(7)
Figure. 2. The illustration of the algorithm on Arabic text A4, the window of the length 2000 letters
moving by 100 positions. The binary function in the down panels indicates probably plagiarized passages
(values greater than 0).
3.3. Evaluation of Character N-Gram Profiles Method
Our method covered the anomalies in such texts. Fig. 3 shows the application of 4-gram profile
method on combined Arabic text (A4-A7). It means, the texts were created as an artificial
combination of two parts of different texts. The style function of the combined text has different
shape in the first part where discrepancies were identified. The percentage of anomalies is Pdisc =
25:75%. The percentage is higher than the supposed percentage (10%), meaning the text should
have some anomaly.
6. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
22
Figure. 3. The style function of combined two Arabic texts A4 (5347characters) and A7 (8155 first
characters), the window moving by 100 positions using 4-grams. The binary function (the down panel)
indicates problematic passages (high values). The percentage of discrepancies is Pdisc = 25:75%.
4. SYSTEM FOR ANOMALY DETECTIONS
4.1. Self Organizing Maps
The Self Organizing Map belongs to the class of unsupervised and competitive learning algorithms
[9]. This type of neural network is used to map the n-dimensional space to the less dimensional
space, usually two-dimension space. The neurons are arranged usually to the two dimensional
lattice, frequently called a map. This mapping is topology saved and each neuron has its own n-
dimensional weights vector to an input. If the input is represented by some sequence (for example,
time series), when the order of values is important, then it is necessary to follow the order without
changing it.
The steps of the algorithm:
1. Initialization: The weight vectors of each node (neuron) in the lattice are initialized to a
small random value from the interval
1
,
0 . The weight vectors are of the same dimensions
as the input vectors.
2. Winner identification for an input vector: Calculate the distance of the input vector to the
weight vector of each node. The node with the shortest distance is the winner. If there are
more than one node with the same distance, then the winning node is chosen randomly from
the nodes with the shortest distance. The winning node is called the Best Matching Unit
(BMU). Let i*
be index of the winning node.
3. 3. Neighbors calculation: For this, the following equation is used:
)
)
(
)
(
)
(
exp(
)
,
,
( 2
*
*
t
t
r
t
r
t
i
i
h
i
i
(8)
7. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
23
Where )
(t
means the radius of the neighborhood function, t is an iteration step,
)
(t
ri and
)
(
* t
ri
are the coordinates of units i and i_ in the output array.
4. Weights adaptation: Only the weights of the nodes within the neighborhood radius will be
adapted. The equation for that is
)
(
*
)
,
,
(
* * old
old
new
W
x
t
i
i
h
W
W
(9)
where
new
W
is the vector of the new weights,
old
W
are old weights,
(0,1) is the learning
rate,
X is the actual input vector. After the algorithm makes changes in the weights, it presents
next random input vector from the remaining input vectors to input and continues with step 2 and
so on until no input vector is left.
4.2. Description of the System Structure
In the first layer, it has SOMx; x ∈ {words, w2-grams, w3-grams, s3-grams}, neural networks are
trained to different sequences built according to the text D. The shape of the model is very similar
to the model developed in [8], but here the different sequences are used for a training. The training
of each SOMx is done on sequences Sw; Sw2g; Sw3g; Ss3g. The sequences are built according to the
probability of n-grams [16].
Figure.4. System for anomaly detections.
4.3. Description of the system computation
After the SOMx was trained it is possible to evaluate the quality of the training prepared by the
evaluation of errors for all input vectors (all windows in the text). We will use a quantization error
Erx defined by (10) as a measure of a proximity input vector x+
to the learned winner vector wi* of
i*
-th neuron (winner for input vector x+
) in the SOMx
,
)
,
( *
* i
i
x w
x
w
x
Er
(10)
8. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
24
Using formula (10) it is possible to compute the vectors of quantization errors
R
t
i
x t
w
t
x
Er 1
* ))}
(
),
(
(
{
(11)
where R is the number of training vectors, t is the order of the member in input sequence. For the
anomaly detections we will use thresholds developed by [11]. Let 𝛼 be a significance level (𝛼 =
0:01 or 𝛼 = 0:05). We suppose the percentage of normal values of the quantization error will be
100 * (1 - 𝛼). Let N𝛼 be the real number such that a percentage 100 *(1 - 𝛼) of the error values is
less than or equal to N𝛼. Then
Lower limit:
= N1-𝛼/2
Upper limit:
= N𝛼/2
The important interval is
,
the values out of it could be detected as anomalies.
The quantization vectors Erx and intervals x
x s
s
,
are computed in the panels Errorsx, x ; x
∈ {words, w2-grams, w3-grams, s3-grams} and they are used in two the following evaluations:
1- Cumulative Error :
4
4
3
3
2
2
1 *
*
*
* w
w
w
w Er
Er
Er
Er
CEr
(12)
where i
, i = 1; 2; 3; 4,
4
1
1
i i
are parameters for a contribution of Eri to the cumulative
error. The values of the parameters i
should be chosen after the analysis of all errors. If the
cumulative error has higher value as the threshold hup given by formula (13)
4
4
3
3
2
2
1 *
*
*
* Sw
Sw
Sw
Sw
up
h
(13)
then the text needs some further analysis.
2- Evaluation of SOM winners, clusters in SOM lattices:
Fig. 5 shows the System of our method on combined Arabic text (A14). It means, the texts were
created as an artificial combination of two parts from different texts (A1 and A4).
In the first part of Figure 5, we show the analysis of the cumulative error. The experiment was done
with parameters _1 = 0:3339; _2 = 0:0314; _3 = 0:0157; _4 = 0:6189. The influence of
the word probability in windows and 3-grams of symbols probability in the same window is higher
than the others. The text needs some further analysis. It should have some anomaly. The second
part shows SOM winners evaluations for windows moving through the text. The similar windows
are grouped into clusters. We can follow more clusters than one. That means there should be some
anomaly too in the text.
9. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
25
Figure .5. The evaluation of the cumulative error in the first part and evaluation of SOM winners in the
second part for Arabic text A14. In the Figure 5, the first part shows the analysis of the cumulative error of
the text A14. The experiment was done with parameters _1 = 0:3339; _2 = 0:0314; _3 = 0:0157;
_4 = 0:6189: For all types of errors, there exist error values above the thresholds and for the threshold of
the cumulative error too. The text should have some anomaly. The second part shows the clusters of SOM
winners, more clusters illustrate that in the text should be some anomaly too.
5. CONCLUSIONS
In this paper, we developed two methods that compute some characteristics of texts. In both
methods we illustrate results for Arabic text from the corpus [12],[3]. The first method is Character
n-gram Profiles and the second method is a system for anomalies detections in a text. We illustrated
results for Arabic texts. In this paper we showed the statistical analysis of Arabic texts and cover
that using 4-grams are better for it. The prepared analysis is very formal and in the next work we
will try to apply some new accesses to the problem. Both methods are capable of covering
anomalies of texts combined from two texts. The results from both methods trying to discover
dissimilarities of text parts in each text show dissimilarities and they call for an attention to the text
(or not) if the text parts were written by the same author (or not). We show the clusters of all four
trained SOM networks. The training done on 1-grams of words, 2-grams of words, 3-gram of words
10. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
26
and 3-grams of symbols. According to the evaluation we can say that declared clusters of SOM
networks trained for words better situation from symbols, they can give more information about
clusters in some text.The clusters in the SOM lattice show similar characteristics of moved
windows in the text. If it is possible to follow more clusters in the lattice, it is necessary to use
some next analysis (the text probably has some anomalies). According obtained from these
methods, the results for all texts from our point of view we can say that the second method System
of Self Organizing Maps (SOM) is the most successful in the evaluation of the results, because we
have many experiments of the text. Our next plan is to do many statistic tests to do evaluation and
find better modifications of parameters.
ACKNOWLEDGEMENTS
Thanks to Prof. Gabriela Andrejková, CSc. and Mgr. Peter Sedmák for their help.
REFERENCES
[1] Neme, A. Pulido, J.R.G. Muño, A. Hernández, S. & Dey, T, (2015) “Stylistics analysis and authorship
attribution algorithms based on self-organizing maps”, Neurocomputing, 147 5 January 2015, pp
147–159.
[2] Stamatatos, E, (2010) “A survey of modern authorship attribution methods”, J. Am. Soc. Inf. Sci.
Technol. pp 538-556.
[3] Bensalem, I. Rosso, P & Chikhi, S, (2013) “A New Corpus for the Evaluation of Arabic Intrinsic
Plagiarism Detection”, CLEF 2013, LNCS 8138, pp 53-58.
[4] Eissen, S. M. Z. Stein, B & Kulig, M, (2006) “Plagiarism detection without reference collections”, In:
Decker, R., Lenz, H.J. (eds.) GfKl. Studies in Classification, Data Analysis, and Knowledge
Organization, Springer Berlin, pp 359-366.
[5] Stamatatos, E, (2006) “Ensemble-based author indentation using character n-grams”, In Proceedings
of the 3rd International Workshop on Text-based Information Retrieval 36, pp 41-46.
[6] Hassan, F. I. H & Chaurasia, M. A, (2012) “N-gram based text author verification”, IACSIT press,
Singapore 36, pp 67-71.
[7] Almarimi, A & Andrejková, G, (2015) “Text Anomalies Detection Using Histograms of Words”,
ACSIJ Advances in Computer Science: an International Journal, Vol. 4, Issue 5, No. 17, September
2015. ISSN: 2322-5157, pp 69-75.
[8] Almarimi, A. Andrejková, G & Sedmák, P, (2016) “Self Organizing Maps in Text Anomalies
Detections”, conference Cognition and Articial Life, Telč, 2016.
[9] Kohonen , T (2007) Self Organizing Maps. Prentice-Hall, 2 ed, 2007.
[10] Hammer, B. Micheli, A. Neubauer, N. Sperduti, A & Strickert, M, (2005) “Self-organizing maps for
time series”, WSOM 2005, Paris, pp. 1-8.
[11] Barreto, G.A & Aguayo, L, (2009) “Time series clustering for anomaly detection: Using competitive
neural networks”, Proceedings WSOM 2009 LNCS (5629), 28-36, 2009.
[12] King Saud University Corpus of Classical Arabic (2011). http://ksucorpus.ksu.edu.sa .
[13] Almarimi, A & Andrejková, G, (2015) “Document Verification using N-grams and Histograms of
Words”, IEEE 13th International Scientific Conference on Informatics, November 18-20, Poprad
Slovakia, pp 21-26.
[14] Jurafsky , D & Martin, J.H (2000) Speech and Language Processing. Prentice-Hall, 1 ed., 2000.
[15] Stamatatos, E, (2009) “Intrinsic Plagiarism Detection Using Character n-gram Profiles”, SEPLN
Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse, PAN’09, pp 38-46.
[16] Stamatatos, E, (2010) “Authorship attribution based on feature set subspacing ensembles”,
International Journal on Artificial Intelligence Tools, 15.5, pp 823-838.
11. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.11, No.2/3/4, August 2021
27
AUTHORS
Master Degree from P. J. Šafárik University in Košice, Institute of Computer Science,
Faculty of Science 2012, PhD. from P. J. Šafárik University in Košice, Institute of
Computer Science, Faculty of Science2016, Faculty member at Bani Waleed University
From 2017, Libya.
Master Degree from P. J. Šafárik University in Košice, Institute of Computer Science, Faculty of Science
2014, PhD. from P. J. Šafárik University in Košice, Institute of Computer Science, Faculty of Science 2019,
Faculty member at Bani Waleed University From 2020, Libya.